Acta Electronica Sinica ›› 2014, Vol. 42 ›› Issue (10): 1991-1997. DOI: 10.3969/j.issn.0372-2112.2014.10.019

• Academic Papers •

基于AR-HMM在线能量调整的语音增强方法

何玉文, 鲍长春, 夏丙寅   

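  1. Speech and Audio Signal Processing Laboratory, School of Electronic Information and Control Engineering, Beijing University of Technology, Beijing 100124
  • Received: 2013-06-28 Revised: 2013-09-23 Published: 2014-10-25
    • About the authors:
    • HE Yu-wen (何玉文), female, born in Beijing in 1988, is a master's student at Beijing University of Technology. Her main research interest is speech enhancement. E-mail: iamhyw@emails.bjut.edu.cn; BAO Chang-chun (鲍长春), male, born in Chifeng, Inner Mongolia, in 1965, holds a Ph.D. and is a professor and doctoral supervisor at Beijing University of Technology. He is a senior member of the IEEE, a member of the International Speech Communication Association (ISCA), a member of the Asia-Pacific Signal and Information Processing Association (APSIPA), a council member of the Chinese Institute of Electronics, a council member of the Acoustical Society of China, and a committee member of the Signal Processing Society. His main research interest is speech and audio signal processing. E-mail: chchbao@bjut.edu.cn; XIA Bing-yin (夏丙寅), male, born in Beijing in 1986, is a doctoral student at Beijing University of Technology. His main research interests are speech coding and enhancement. E-mail: xby-abc@emails.bjut.edu.cn
    • Funding:
    • National Natural Science Foundation of China (No.61072089); Key Project of the Science and Technology Development Program of the Beijing Municipal Education Commission (No.KZ201110005005)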

Online Energy Adjustment Using AR-HMM for Speech Enhancement

HE Yu-wen, BAO Chang-chun, XIA Bing-yin   

  1. Speech and Audio Signal Processing Lab, School of Electronic Information and Control Engineering, Beijing University of Technology, Beijing 100124, China
  • Received: 2013-06-28 Revised: 2013-09-23 Online: 2014-10-25 Published: 2014-10-25
    • Supported by:
    • National Natural Science Foundation of China (No.61072089); Science and Technology Development Project of Beijing Municipal Education Commission (No.KZ201110005005)

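Abstract (translated from the Chinese):

To address the inaccurate tracking and poor suppression of non-stationary noise in single-channel speech enhancement, this paper proposes a speech enhancement method based on online energy adjustment. The method uses normalized critical-band energy as the feature and classifies the background noise with a Gaussian mixture model. Using the auto-regressive hidden Markov model (AR-HMM) of the corresponding noise type and the AR-HMM of clean speech, it estimates the power spectra of speech and noise under the minimum mean square error criterion. Because the training and test sets differ in non-stationary environments, the energies in the speech and noise models must be adjusted online. The energy of the speech model is adjusted with an iterative expectation-maximization algorithm; the energy of the noise model is adjusted with the energy re-estimation method used in model training, and the initial value of the noise energy adjustment is determined by the minima-controlled recursive averaging algorithm. The algorithm is tested under the ITU-T G.160 standard. The results show that the proposed method tracks non-stationary noise well, achieves larger noise attenuation, and converges in a shorter time.

Keywords: speech enhancement, non-stationary noise, hidden Markov model, Gaussian mixture model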
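
The abstract describes estimating speech and noise power spectra from AR models, with the model energies rescaled online. As a rough illustration only, the sketch below evaluates the power spectrum of a single AR state and forms a Wiener-type gain from one speech state and one noise state. The function names, the `speech_scale` factor, and the toy coefficients are assumptions made for this example. It is not the paper's estimator; a full HMM-based MMSE estimator would typically weight such terms over states by their posterior probabilities.

```python
import cmath
import math


def ar_power_spectrum(ar_coeffs, gain, n_bins):
    """Return gain / |A(e^{jw})|^2 at n_bins frequencies in [0, pi).

    A(e^{jw}) = 1 + sum_k a_k * e^{-jwk}, with a_k taken from ar_coeffs.
    """
    spectrum = []
    for i in range(n_bins):
        omega = math.pi * i / n_bins
        a_of_w = 1.0 + sum(
            a * cmath.exp(-1j * omega * (k + 1))
            for k, a in enumerate(ar_coeffs)
        )
        spectrum.append(gain / abs(a_of_w) ** 2)
    return spectrum


def wiener_gain(speech_psd, noise_psd):
    """Per-bin Wiener gain for one speech/noise state pair."""
    return [s / (s + n) for s, n in zip(speech_psd, noise_psd)]


# Toy values, all hypothetical.
speech_scale = 2.0  # stands in for an online energy scaling factor
speech_psd = ar_power_spectrum([-0.9], speech_scale * 1.0, 8)
noise_psd = ar_power_spectrum([], 0.5, 8)  # order-0 model: flat spectrum
gains = wiener_gain(speech_psd, noise_psd)
```

In this sketch, raising `speech_scale` raises the speech spectrum in every bin and therefore pushes each gain closer to 1, which is the intuition behind adjusting model energies when the test signal level differs from the training data.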

Abstract:

Existing single-channel speech enhancement techniques track and suppress non-stationary noise poorly. To address this, a speech enhancement method based on online energy adjustment is proposed. Normalized critical-band energies are used as features in a Gaussian mixture model (GMM) to classify the background noise. Using the auto-regressive hidden Markov model (AR-HMM) of clean speech and the AR-HMM of the identified noise type, the power spectra of speech and noise are estimated under the minimum mean square error (MMSE) criterion. Because the training and test data differ in non-stationary noise environments, the energies in the speech and noise models must be adjusted online. The scaling factor of the speech energy is estimated with an iterative expectation-maximization (EM) algorithm, and that of the noise energy is estimated with a re-estimation approach similar to the one used in the training stage; its initial value is obtained with the minima-controlled recursive averaging (MCRA) algorithm. The proposed method is evaluated under the ITU-T G.160 standard. The test results show that, compared with the two reference methods, the proposed method performs well in non-stationary noise environments, providing larger noise reduction and shorter convergence time.

Key words: speech enhancement, non-stationary noise, hidden Markov model, Gaussian mixture model
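
The noise classification step mentioned in the abstract (normalized critical-band energies scored by a GMM per noise type) can be sketched as follows. This is a minimal illustration under assumptions: diagonal covariances, a single frame per decision, two bands, and the hypothetical model names `low_band` and `high_band` are all invented for the example, and the paper's actual feature dimension and model sizes are not given here.

```python
import math


def normalize_band_energies(band_energies):
    """Scale band energies so they sum to 1 (assumes a nonzero total)."""
    total = sum(band_energies)
    return [e / total for e in band_energies]


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of vector x under a diagonal-covariance GMM."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def classify_noise(band_energies, noise_gmms):
    """Return the name of the noise GMM with the highest log-likelihood."""
    x = normalize_band_energies(band_energies)
    return max(
        noise_gmms,
        key=lambda name: gmm_log_likelihood(x, *noise_gmms[name]),
    )


# Hypothetical one-component models: (weights, means, variances).
noise_gmms = {
    "low_band": ([1.0], [[0.8, 0.2]], [[0.01, 0.01]]),
    "high_band": ([1.0], [[0.2, 0.8]], [[0.01, 0.01]]),
}
label = classify_noise([8.0, 2.0], noise_gmms)
```

Normalizing the band energies makes the feature depend on the spectral shape of the noise and not on its overall level, which is why the selected noise model's energy still has to be adjusted separately.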

CLC Number: