Acta Electronica Sinica ›› 2012, Vol. 40 ›› Issue (10): 2031-2038. DOI: 10.3969/j.issn.0372-2112.2012.10.022

• Academic Papers •

基于高斯混合模型的压缩域语音增强方法

梁岩, 鲍长春, 夏丙寅, 何玉文, 周璇, 李娜   

  1. 北京工业大学电子信息与控制工程学院, 北京 100124
  • 收稿日期:2012-05-21 修回日期:2012-08-06 出版日期:2012-10-25
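    • About the authors:
    • LIANG Yan, female, born in October 1986 in Shuozhou, Shanxi. She entered Beijing University of Technology in 2009 to pursue a master's degree. Main research interest: compressed domain speech enhancement. E-mail: liangyan861003@163.com
    • BAO Chang-chun, male, born in 1965 in Chifeng, Inner Mongolia. Ph.D., professor, and doctoral supervisor. His main research area is speech and audio signal processing. E-mail: chchbao@bjut.edu.cn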
    • 基金资助:
    • 北京市教育委员会科技发展计划重点项目 (No.KZ201110005005); 国家自然科学基金 (No.61072089); 北京市属高等学校人才强教计划

Compressed Domain Speech Enhancement Based on Gaussian Mixture Model

LIANG Yan, BAO Chang-chun, XIA Bing-yin, HE Yu-wen, ZHOU Xuan, LI Na   

  1. School of Electronic Information and Control Engineering, Beijing University of Technology, Beijing 100124, China
  • Received: 2012-05-21 Revised: 2012-08-06 Online: 2012-10-25 Published: 2012-10-25
    • Supported by:
    • Key Program of Science and Technology Development Plan of Beijing Municipality Education Commission (No.KZ201110005005); National Natural Science Foundation of China (No.61072089); Talents training program of higher education institutions in Beijing Municipality

摘要: 为了有效利用纯净语音导抗谱频率参数(ISFs)的先验知识,本文针对ITU-T G.722.2宽带语音编码标准提出了一种基于高斯混合模型的压缩域语音增强方法.首先,将含噪语音、纯净语音的导抗谱频率参数,以及对应的增益调整因子构成特征矢量,并利用高斯混合模型拟合其概率密度;然后,在最小均方误差 (MMSE) 准则下对纯净语音的特征参数进行最优贝叶斯估计.为了兼容编码器中的非连续性传输模式,当处理信号为非语音信息时,算法在保持噪声帧谱包络参数不变的前提下,按固定比例调整对数帧能量;且若出现帧擦除情况,算法不调整接收到的码流,并按正常帧处理方式调整恢复后的参数以更新相关历史.本文采用ITU-T G.160标准进行了性能测试,结果表明,与参考方法相比,所提方法在保证信噪比提高能力的同时,可以达到更大的噪声衰减量,且增强语音的客观质量更优.

关键词: 语音增强, 参数域, 高斯混合模型, 贝叶斯估计, 非连续性传输, 帧擦除

Abstract: A Gaussian Mixture Model (GMM) based compressed domain speech enhancement method for the ITU-T G.722.2 wideband speech codec is proposed to take full advantage of prior knowledge of the Immittance Spectral Frequencies (ISFs) of clean speech. First, a GMM is adopted to model the joint probability density of feature vectors composed of the ISFs of noisy speech, the ISFs of clean speech, and the corresponding gain scaling factor. Second, an optimal Bayesian estimate of the clean-speech feature parameters is obtained under the minimum mean square error (MMSE) criterion. To remain compatible with the Discontinuous Transmission (DTX) mode, when a Silence Insertion Descriptor (SID) frame is received, the logarithmic frame energy is attenuated by a fixed ratio while the ISFs are left unchanged. Furthermore, if an erased frame is received, the bit stream is left unchanged, and the proposed method is applied to the recovered parameters, as for a normal frame, to update the relevant history. The evaluation is conducted according to ITU-T G.160. The results indicate that, compared with the reference method, the proposed method achieves a larger noise level reduction and better objective speech quality while preserving the SNR improvement.
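For reference, the MMSE estimate under a joint GMM has a standard closed form, written out below. The notation is an assumption made here for illustration and is not taken from the paper: y stands for the observed noisy-speech features, x for the clean-speech features to be estimated (the clean ISFs and the gain scaling factor), and z = [y; x] for the joint feature vector.

```latex
% Joint GMM over z = [y; x] with K components (notation assumed for illustration)
p(z) = \sum_{k=1}^{K} w_k \, \mathcal{N}\!\left(z;\, \mu_k,\, \Sigma_k\right),
\qquad
\mu_k = \begin{bmatrix} \mu_k^{y} \\ \mu_k^{x} \end{bmatrix},
\quad
\Sigma_k = \begin{bmatrix} \Sigma_k^{yy} & \Sigma_k^{yx} \\ \Sigma_k^{xy} & \Sigma_k^{xx} \end{bmatrix}

% MMSE (conditional-mean) estimate of x given y
\hat{x} = \mathbb{E}[x \mid y]
        = \sum_{k=1}^{K} \gamma_k(y)
          \left[ \mu_k^{x} + \Sigma_k^{xy} \left(\Sigma_k^{yy}\right)^{-1} \left(y - \mu_k^{y}\right) \right]

% Posterior probability of component k given the observation y
\gamma_k(y) = \frac{ w_k \, \mathcal{N}\!\left(y;\, \mu_k^{y},\, \Sigma_k^{yy}\right) }
                   { \sum_{j=1}^{K} w_j \, \mathcal{N}\!\left(y;\, \mu_j^{y},\, \Sigma_j^{yy}\right) }
```

Each component contributes a linear regression from y to x, and the posterior weights blend these regressions according to how well each component explains the observed noisy features.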

Key words: speech enhancement, compressed domain, Gaussian Mixture Model, Bayesian estimation, DTX, frame erasure
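The GMM-based MMSE estimation step summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `gmm_mmse_estimate`, the ordering of the joint vector (observed features first, features to be estimated last), and the toy parameters are all assumptions made for illustration. In the paper's setting the observed part would hold the noisy-speech ISFs and the estimated part the clean-speech ISFs and gain scaling factor, with the GMM trained offline.

```python
import numpy as np

def gmm_mmse_estimate(y, weights, means, covs):
    """Return E[x | y] under a joint GMM over z = [y, x].

    Hypothetical helper: assumes the first len(y) dimensions of every
    component mean/covariance belong to the observed features y and the
    remaining dimensions to the features x being estimated.
    """
    y = np.asarray(y, dtype=float)
    dy = y.shape[0]
    log_post = np.empty(len(weights))
    cond_means = []
    for k, (w, mu, cov) in enumerate(zip(weights, means, covs)):
        mu_y, mu_x = mu[:dy], mu[dy:]
        s_yy, s_xy = cov[:dy, :dy], cov[dy:, :dy]
        diff = y - mu_y
        sol = np.linalg.solve(s_yy, diff)      # (S_yy)^-1 (y - mu_y)
        _, logdet = np.linalg.slogdet(s_yy)
        # log of w_k * N(y; mu_y, S_yy), kept in the log domain for stability
        log_post[k] = np.log(w) - 0.5 * (diff @ sol + logdet + dy * np.log(2 * np.pi))
        # per-component conditional mean of x given y
        cond_means.append(mu_x + s_xy @ sol)
    post = np.exp(log_post - log_post.max())   # unnormalized posteriors
    post /= post.sum()
    return post @ np.array(cond_means)

# Toy usage with made-up parameters: one observed and one estimated dimension.
toy_weights = np.array([0.5, 0.5])
toy_means = np.array([[-10.0, -1.0], [10.0, 1.0]])
toy_covs = np.array([np.eye(2), np.eye(2)])
x_hat = gmm_mmse_estimate([10.0], toy_weights, toy_means, toy_covs)
```

With a single component the estimator reduces to ordinary linear regression of x on y. With several components, the posterior weights select, softly, the regression belonging to the component that best explains the observation.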
