Research on Robustness of Audio-Visual Speaker Recognition Based on Articulatory Features

CHEN Yan-xiang; LIU Ming

- Acta Electronica Sinica Vol. 38, Issue 12, Pages: 2920-2924 (2010)
- Author affiliations:
  
  1. School of Computer and Information, Hefei University of Technology, Hefei, Anhui 230009
  2. Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Illinois 61801
- CLC: TN912.34
- Published: 2010
CHEN Yan-xiang, LIU Ming. Research on Robustness of Audio-Visual Speaker Recognition Based on Articulatory Features[J]. Acta Electronica Sinica, 2010, 38(12): 2920-2924.

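Abstract (Chinese original, translated)

Human speech perception is multimodal: it is shaped by both hearing and vision. This work centers on the fusion of speech with its visual features. Drawing on the underlying causes of audio-visual asynchrony revealed by the mechanism of articulation, it uses the asynchronous relations among multiple articulatory features to describe the asynchrony observed on the surface between audio and video. It proposes a joint model of speech and lip movement based on a dynamic Bayesian network, and improves the robustness of a speaker recognition system through multi-level fusion of the audio and visual modalities. Experiments on an audio-visual bimodal database show that multi-level fusion achieves better performance under different acoustic signal-to-noise ratios.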

Abstract

Human speech perception is bimodal, drawing on audible and visible cues simultaneously. This paper investigates how to fuse speech and visual speech features. Based on research into the articulatory mechanism, the apparently observed audio-visual asynchrony is represented by asynchronous articulatory feature streams. An audio-visual model combining speech and lip movement is proposed based on a dynamic Bayesian network, and multi-level fusion is then implemented to improve the robustness of the speaker recognition system. Experiments on an audio-visual bimodal corpus show that multi-level fusion improves performance at all acoustic signal-to-noise ratio (SNR) levels from 0 to 30 dB.
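The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one common form of decision-level fusion, not the authors' dynamic Bayesian network model. It assumes each modality already yields a log-likelihood per enrolled speaker, and it combines them with a single weight. The names `fuse_scores`, `identify_speaker`, and `audio_weight`, and all score values, are hypothetical and chosen purely for illustration.

```python
def fuse_scores(audio_loglik, visual_loglik, audio_weight):
    """Weighted sum of per-speaker log-likelihoods from two modalities.

    audio_weight is an assumed tuning parameter in [0, 1]; a lower value
    would suit noisier audio, where the visual stream is more reliable.
    """
    if len(audio_loglik) != len(visual_loglik):
        raise ValueError("both modalities must score the same speakers")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return [
        audio_weight * a + (1.0 - audio_weight) * v
        for a, v in zip(audio_loglik, visual_loglik)
    ]


def identify_speaker(audio_loglik, visual_loglik, audio_weight):
    """Return the index of the speaker with the highest fused score."""
    fused = fuse_scores(audio_loglik, visual_loglik, audio_weight)
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up log-likelihoods for three enrolled speakers.
audio_scores = [-120.0, -118.0, -125.0]   # audio slightly favors speaker 1
visual_scores = [-80.0, -90.0, -85.0]     # video favors speaker 0

# Trusting audio (as one might at high SNR) versus trusting video (low SNR).
high_snr_choice = identify_speaker(audio_scores, visual_scores, audio_weight=0.9)
low_snr_choice = identify_speaker(audio_scores, visual_scores, audio_weight=0.2)
```

With these made-up scores the decision follows the audio stream when `audio_weight` is high and the visual stream when it is low, which is the basic intuition behind leaning on lip movement as acoustic SNR drops.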
