Acta Electronica Sinica, 2019, Vol. 47, Issue (6): 1244-1250. DOI: 10.3969/j.issn.0372-2112.2019.06.009

• Research Paper •

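Robust Speaker Verification Under Additive Noise Condition

ZHANG Er-hua, WANG Ming-he, TANG Zhen-min

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu 210094, China
  • Received: 2016-11-24 Revised: 2018-01-15 Online: 2019-06-25 Published: 2019-06-25
  • Corresponding author: ZHANG Er-hua
  • About the authors: WANG Ming-he, male, received his master's degree from Nanjing Tech University in 2009 and is currently a Ph.D. candidate at Nanjing University of Science and Technology. His research interests include signal processing, speech recognition, and speaker recognition. TANG Zhen-min, male, was born in 1961 in Yangling, Shaanxi. He received his bachelor's degree from Harbin Engineering University (formerly Harbin Shipbuilding Engineering Institute) in 1982, his master's degree from East China University of Science and Technology in 1988, and his Ph.D. from Nanjing University of Science and Technology in 2002. He is currently a professor and doctoral supervisor at Nanjing University of Science and Technology and a senior member of CCF. His research interests include speech recognition, image processing, and intelligent robots.
  • Funding:
    National Natural Science Foundation of China (No. 61473154)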

Abstract: Speech denoising based on nonnegative matrix factorization (NMF) improves the signal-to-noise ratio (SNR) of the recovered speech, but it also introduces speech distortion, which degrades the performance of speaker verification systems in noisy environments. This paper proposes a speech denoising method based on nonnegative matrix factorization with partial constraints (PCNMF), with the aim of improving the robustness of speaker verification under unknown and non-stationary noise. PCNMF constructs the speech dictionary and the noise dictionary separately, subject to the partition constraints. Because a speech dictionary obtained by conventional training on speech data usually contains some noise components, PCNMF instead generates the spectra of the pitch and its harmonics from a mathematical model and uses these spectra to imitate the formant structure of the human voice when synthesizing the dictionary, which keeps the speech dictionary free of noise. To avoid the partial loss of noise information caused by conventional noise-dictionary construction, PCNMF applies framing and the short-time Fourier transform (STFT) to noise samples separated online and then builds the noise dictionary from frame-wise linear combinations of the resulting spectra. The evaluation covers several noise types, and the results show that PCNMF effectively improves the robustness of the speaker verification system. In particular, under unknown and non-stationary noise, its equal error rate (EER) is on average 5.2% lower than that of the multi-condition baseline system.

Key words: speech processing, speaker verification, nonnegative matrix factorization, additive noise
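
The dictionary-based denoising idea summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' PCNMF: the partition constraints are not reproduced, the formant frequencies, bandwidths, pitch grid, and test signals are assumed values, and the noise dictionary simply reuses STFT frames of a noise-only sample as atoms (the paper's frame-combination weights are not given here). All function names are hypothetical. The sketch fixes a synthetic harmonic speech dictionary and a noise dictionary, estimates only the activations with the standard KL multiplicative update, and applies a Wiener-like soft mask.

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=128):
    """Magnitude spectrogram of x, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0))

def harmonic_speech_dictionary(sr, n_fft, pitches_hz,
                               formants_hz=(500.0, 1500.0, 2500.0)):
    """One atom per pitch: a harmonic comb shaped by a formant-like envelope.

    The formant centres and the 300 Hz / 20 Hz widths are assumptions
    chosen for illustration only.
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    envelope = 0.05 + sum(
        np.exp(-0.5 * ((freqs - f) / 300.0) ** 2) for f in formants_hz
    )
    atoms = []
    for f0 in pitches_hz:
        harmonics = np.arange(f0, sr / 2, f0)  # f0, 2*f0, ... below Nyquist
        comb = np.exp(
            -0.5 * ((freqs[:, None] - harmonics[None, :]) / 20.0) ** 2
        ).sum(axis=1)
        atom = comb * envelope
        atoms.append(atom / (np.linalg.norm(atom) + 1e-12))
    return np.stack(atoms, axis=1)

def noise_dictionary(noise_mag, n_atoms=32):
    """Use evenly spaced noise spectrum frames as unit-norm atoms (simplification)."""
    idx = np.linspace(0, noise_mag.shape[1] - 1, n_atoms).astype(int)
    atoms = noise_mag[:, idx]
    return atoms / (np.linalg.norm(atoms, axis=0, keepdims=True) + 1e-12)

def solve_activations(V, W, n_iter=200, eps=1e-12):
    """Estimate H >= 0 with V ~ W @ H, keeping W fixed (KL multiplicative update)."""
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    col_sums = W.sum(axis=0)[:, None] + eps  # W^T 1
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / col_sums
    return H

def denoise(V, W_speech, W_noise):
    """Soft-mask the noisy magnitude spectrogram with the speech part of the model."""
    W = np.hstack([W_speech, W_noise])
    H = solve_activations(V, W)
    k = W_speech.shape[1]
    speech = W_speech @ H[:k]
    noise = W_noise @ H[k:]
    return V * speech / (speech + noise + 1e-12)

# Toy usage: a 150 Hz harmonic tone in white noise (synthetic data, not from the paper).
sr, n_fft = 16000, 512
t = np.arange(sr) / sr
rng = np.random.default_rng(1)
clean = sum(np.sin(2 * np.pi * 150.0 * k * t) / k for k in range(1, 6))
V = stft_mag(clean + 0.5 * rng.standard_normal(sr), n_fft)
W_speech = harmonic_speech_dictionary(sr, n_fft, np.arange(80.0, 400.0, 10.0))
W_noise = noise_dictionary(stft_mag(0.5 * rng.standard_normal(sr), n_fft))
S_hat = denoise(V, W_speech, W_noise)
```

Because the speech atoms are synthesized rather than learned, no recorded training speech (and therefore no recording noise) enters the speech dictionary, which is the property the abstract emphasizes. A full system would resynthesize the waveform from `S_hat` with the noisy phase before extracting speaker features.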

CLC number: