National Natural Science Foundation of China (No. 61672173); Youth Innovative Talents of Colleges and Universities in Guangdong Province (No. 2018KQNCX140)
ZHU Zheng-yu, LIAO Li-ping, YANG Chun-ling, et al. Lip Motion and Voice Consistency Recognition Based on Audio-Visual Matching of Vowel Pronunciation Events and Position Delay Analysis[J]. Acta Electronica Sinica, 2021, 49(1): 140-148. DOI: 10.12263/DZXB.20190238.
Lip Motion and Voice Consistency Recognition Based on Audio-Visual Matching of Vowel Pronunciation Events and Position Delay Analysis
Mainstream methods for judging lip motion and voice consistency analyze the whole sentence (segment) without screening its content. This leads to a large dictionary size and high computational complexity, and the result is vulnerable to weakly correlated segments such as silence. Treating vowels with significant lip shape changes as representative pronunciation events, and drawing on the statistical results of the initial audio-visual delay distribution range, this paper proposes a consistency decision method based on audio-visual matching of vowel pronunciation events and position delay analysis. First, the dictionary learning data are selected by the proposed audio-visual vowel segmentation method. The vowel dictionary is then used to analyze the matching of vowel events, and the time delay distribution of each vowel position is statistically scored. A consistency judgment is made by a scoring mechanism that combines the lip matching score of vowel pronunciation events with the position delay analysis score. Experimental results show that the proposed method outperforms the compared algorithms in recognition performance and reduces the amount of computation compared with the traditional dictionary method.
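The abstract does not specify how the two scores are combined, so the following is only a minimal sketch of one plausible fusion, not the authors' method. It assumes each detected vowel event already carries a lip/audio matching score in [0, 1] and a measured audio-visual delay. The names `VowelEvent`, `matching_score`, `delay_score`, and `is_consistent`, the delay range, the weight, and the threshold are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class VowelEvent:
    """One detected vowel pronunciation event (hypothetical container)."""
    match_score: float  # assumed lip/audio matching score in [0, 1]
    delay_ms: float     # assumed audio-visual delay at this vowel position


def matching_score(events: Sequence[VowelEvent]) -> float:
    """Mean matching score over all vowel events (0.0 if there are none)."""
    if not events:
        return 0.0
    return sum(e.match_score for e in events) / len(events)


def delay_score(events: Sequence[VowelEvent],
                lo_ms: float = -100.0,
                hi_ms: float = 200.0) -> float:
    """Fraction of events whose delay lies inside an assumed plausible range.

    The range [lo_ms, hi_ms] is an illustrative placeholder; the paper derives
    its range from statistics that the abstract does not report.
    """
    if not events:
        return 0.0
    inside = sum(1 for e in events if lo_ms <= e.delay_ms <= hi_ms)
    return inside / len(events)


def is_consistent(events: Sequence[VowelEvent],
                  weight: float = 0.5,
                  threshold: float = 0.6) -> Tuple[bool, float]:
    """Fuse both scores with an assumed linear weighting and threshold the result."""
    fused = weight * matching_score(events) + (1.0 - weight) * delay_score(events)
    return fused >= threshold, fused
```

With this sketch, a clip whose vowel events match well and mostly fall inside the assumed delay range is judged consistent, while a clip with no detected vowel events gets a fused score of 0.0 and is rejected. A linear weighting is used here only because it is the simplest way to combine two normalized scores.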