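Lip Motion and Voice Consistency Recognition Based on Audio-Visual Matching of Vowel Pronunciation Events and Position Delay Analysis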

朱铮宇; 廖丽平; 杨春玲; 王泳; 蔡君; 邱华愉

doi:10.12263/DZXB.20190238


Research article | Updated: 2025-12-08

- Chinese title: 基于韵母发音事件匹配与位置时延分析的音唇一致性判决方法
- English title: Lip Motion and Voice Consistency Recognition Based on Audio-Visual Matching of Vowel Pronunciation Events and Position Delay Analysis
- Acta Electronica Sinica, 2021, Vol. 49, No. 1, pp. 140-148
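- Author affiliations:
  
  1. School of Cyber Security, Guangdong Polytechnic Normal University, Guangzhou, Guangdong 510665
  2. School of Electronic and Information Engineering, South China University of Technology, Guangzhou, Guangdong 510641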
- Funding:
  
  National Natural Science Foundation of China (No. 61672173); Young Innovative Talents Project of Guangdong Provincial Universities (No. 2018KQNCX140)
- DOI: 10.12263/DZXB.20190238
  CLC number: TP391
- Published online: 2021-01-25
  
  Print publication: 2021
Citation: ZHU Zheng-yu, LIAO Li-ping, YANG Chun-ling, et al. Lip Motion and Voice Consistency Recognition Based on Audio-Visual Matching of Vowel Pronunciation Events and Position Delay Analysis[J]. Acta Electronica Sinica, 2021, 49(1): 140-148. DOI: 10.12263/DZXB.20190238.

Abstract

Conventional lip motion and voice consistency judgment methods analyze a whole sentence (or segment) without screening its content. As a result, the dictionary is large, the computational complexity is high, and the result is easily affected by weakly correlated segments such as silence. Taking vowels (韵母, the finals of Mandarin syllables), whose lip shape changes are pronounced, as representative pronunciation events, and drawing on statistics of the distribution range of the initial audio-lip delay, this paper proposes a consistency judgment method based on vowel pronunciation event matching and position delay analysis. First, the proposed audio-visual vowel segmentation method selects the vowel segments of the dictionary learning data. The learned vowel dictionary is then used to measure the audio-lip matching degree of each vowel event, and the delay distribution at the position where each vowel occurs is statistically scored. Finally, consistency is judged by a scoring mechanism that fuses the vowel event matching score with the position delay score. Experimental results show that the proposed algorithm outperforms several compared algorithms in recognition performance and reduces the amount of computation to some extent relative to the traditional dictionary method.
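
The fused scoring step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the two scores are computed or combined, so everything below is an assumption made for illustration: the `VowelEvent` fields, the piecewise-linear `delay_score`, the weighted-average fusion, and the default delay range, weight and threshold are hypothetical placeholders, not the authors' formulas or values.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class VowelEvent:
    """One detected vowel segment (hypothetical fields, not from the paper)."""
    match_score: float  # audio-lip matching degree in [0, 1]
    delay_ms: float     # measured audio-lip delay at this vowel's position


def delay_score(delay_ms: float, low_ms: float, high_ms: float) -> float:
    """Return 1.0 inside the expected delay range, decaying linearly to 0 outside.

    The linear decay is an assumed stand-in for the statistical delay scoring.
    """
    if high_ms <= low_ms:
        raise ValueError("high_ms must be greater than low_ms")
    if low_ms <= delay_ms <= high_ms:
        return 1.0
    width = high_ms - low_ms
    distance = low_ms - delay_ms if delay_ms < low_ms else delay_ms - high_ms
    return max(0.0, 1.0 - distance / width)


def fused_score(events: Sequence[VowelEvent], low_ms: float, high_ms: float,
                weight: float = 0.5) -> float:
    """Weighted average of the mean matching score and the mean delay score."""
    if not events:
        raise ValueError("at least one vowel event is required")
    match = sum(e.match_score for e in events) / len(events)
    delay = sum(delay_score(e.delay_ms, low_ms, high_ms) for e in events) / len(events)
    return weight * match + (1.0 - weight) * delay


def is_consistent(events: Sequence[VowelEvent], low_ms: float = -100.0,
                  high_ms: float = 100.0, weight: float = 0.5,
                  threshold: float = 0.6) -> bool:
    """Judge audio-lip consistency; all default values are arbitrary placeholders."""
    return fused_score(events, low_ms, high_ms, weight) >= threshold


# Usage: two well-matched vowels with small delays versus one poor match
# whose delay lies far outside the assumed range.
genuine = [VowelEvent(0.9, 20.0), VowelEvent(0.8, -30.0)]
tampered = [VowelEvent(0.2, 400.0)]
print(is_consistent(genuine), is_consistent(tampered))
```

Only vowel events enter the score in this sketch, which mirrors the abstract's point that silence and other weakly correlated segments are excluded from the judgment.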
