CHEN Yan-xiang, LIU Ming. Research on Robustness of Audio-Visual Speaker Recognition Based on Articulatory Features[J]. Acta Electronica Sinica, 2010, 38(12): 2920-2924.
Research on Robustness of Audio-Visual Speaker Recognition Based on Articulatory Features
Human speech perception is bimodal, drawing on both audible and visible cues. This paper investigates how to fuse acoustic speech features with visual speech features. Based on a study of the articulatory mechanism, the observed audio-visual asynchrony is represented by asynchronous articulatory feature streams. An audio-visual model combining speech and lip movement is proposed based on a Dynamic Bayesian Network, and multi-level fusion is then implemented to improve the robustness of the speaker recognition system. Experiments on an audio-visual bimodal corpus show that multi-level fusion improves performance at all acoustic signal-to-noise ratio (SNR) levels from 0 to 30 dB.