Compressed Domain Speech Enhancement Based on Gaussian Mixture Model

LIANG Yan; BAO Chang-chun; XIA Bing-yin; HE Yu-wen; ZHOU Xuan; LI Na

doi:10.3969/j.issn.0372-2112.2012.10.022

Last updated: 2025-07-16

- Acta Electronica Sinica Vol. 40, Issue 10, Pages: 2031-2038(2012)
- Author affiliation:
  
  School of Electronic Information and Control Engineering, Beijing University of Technology, Beijing 100124
- Funding:
  
  Key Program of Science and Technology Development Plan of Beijing Municipality Education Commission (No.KZ201110005005);National Natural Science Foundation of China (No.61072089);Talents training program of higher education institutions in Beijing Municipality
- DOI: 10.3969/j.issn.0372-2112.2012.10.022
  CLC: TN912.35
- Published: 2012
LIANG Yan, BAO Chang-chun, XIA Bing-yin, et al. Compressed Domain Speech Enhancement Based on Gaussian Mixture Model[J]. Acta Electronica Sinica, 2012, 40(10): 2031-2038. DOI: 10.3969/j.issn.0372-2112.2012.10.022.

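Abstract (translated from the Chinese)

To make effective use of prior knowledge of the immittance spectral frequencies (ISFs) of clean speech, this paper proposes a Gaussian mixture model based compressed-domain speech enhancement method for the ITU-T G.722.2 wideband speech coding standard. First, the ISFs of noisy speech, the ISFs of clean speech, and the corresponding gain adjustment factor are combined into a feature vector, and a Gaussian mixture model is fitted to its probability density. Then, the optimal Bayesian estimate of the clean-speech feature parameters is computed under the minimum mean square error (MMSE) criterion. For compatibility with the encoder's discontinuous transmission mode, when the processed signal is non-speech the algorithm leaves the spectral envelope parameters of the noise frame unchanged and adjusts the logarithmic frame energy by a fixed ratio. If a frame erasure occurs, the algorithm does not modify the received bit stream, and it adjusts the recovered parameters in the same way as for a normal frame in order to update the relevant history. Performance was tested according to the ITU-T G.160 standard. The results show that, compared with the reference method, the proposed method achieves greater noise attenuation and better objective quality of the enhanced speech while preserving the SNR improvement capability.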
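The handling of non-speech frames described above (spectral envelope untouched, logarithmic frame energy adjusted by a fixed ratio) can be illustrated with a minimal Python sketch. The `SidFrame` type, the `attenuate_sid_frame` helper, and the ratio value 0.8 are hypothetical and chosen only for illustration. Multiplying the log energy by the ratio is one reading of "fixed ratio"; the paper's actual rule and constant are not given in the abstract.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class SidFrame:
    """Decoded parameters of a non-speech (SID) frame; hypothetical layout."""
    isfs: List[float]   # spectral envelope parameters (ISFs)
    log_energy: float   # logarithmic frame energy


def attenuate_sid_frame(frame: SidFrame, ratio: float = 0.8) -> SidFrame:
    """Return a copy with the log energy scaled by a fixed ratio.

    The ISFs are passed through unchanged, so the noise spectrum keeps
    its shape and only its level is adjusted. The default ratio is a
    placeholder, not a value from the paper.
    """
    return replace(frame, log_energy=frame.log_energy * ratio)


sid = SidFrame(isfs=[0.1, 0.2, 0.3], log_energy=10.0)
quieter = attenuate_sid_frame(sid)
```

Keeping the ISFs fixed means the comfort noise generated at the decoder retains its spectral character, and only the level parameter carried in the frame needs to be rewritten.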

Abstract

To exploit prior knowledge of the Immittance Spectral Frequencies (ISFs) of clean speech, a Gaussian Mixture Model (GMM) based compressed-domain speech enhancement method is proposed for the ITU-T G.722.2 wideband speech codec. First, a GMM models the joint probability density of feature vectors composed of the ISFs of noisy speech, the ISFs of clean speech, and the corresponding gain scaling factor. Second, an optimal Bayesian estimate of the clean-speech feature parameters is obtained under the minimum mean square error (MMSE) criterion. For compatibility with the Discontinuous Transmission (DTX) mode, when a Silence Insertion Descriptor (SID) frame is received the logarithmic energy is attenuated while the ISFs are left unchanged. Furthermore, if an erased frame is received, the bit stream is left unchanged and the proposed method is applied to the recovered parameters to update the memory. The evaluation is conducted according to ITU-T G.160. The results indicate that, compared with the reference method, the proposed method achieves a larger noise level reduction and better objective speech quality while the SNR improvement remains acceptable.
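The estimation step in the abstract follows a standard pattern: once a GMM has been fitted to the joint density of noisy and clean features, the MMSE estimate of the clean feature is the posterior-weighted sum of each component's conditional mean. The sketch below shows that pattern for a single noisy feature x and a single clean feature y, using only the Python standard library. The component layout, the function names, and the toy parameter values are assumptions made for illustration, not the paper's model. In the vector case the scalar ratio `cov_xy / var_x` becomes a matrix product with the inverse covariance of the noisy features.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_mmse_estimate(x, components):
    """MMSE estimate E[y | x] under a joint GMM over (x, y).

    Each component is a dict with keys weight, mean_x, mean_y, var_x,
    cov_xy (a hypothetical layout chosen for this sketch).
    """
    # Posterior probability of each component given the noisy observation.
    # A production version would work in the log domain to avoid underflow.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mean_x"], c["var_x"]) for c in components
    ]
    total = sum(likelihoods)
    posteriors = [lik / total for lik in likelihoods]

    # Conditional mean of y given x within each Gaussian component.
    cond_means = [
        c["mean_y"] + c["cov_xy"] / c["var_x"] * (x - c["mean_x"])
        for c in components
    ]

    # The MMSE estimate is the posterior-weighted average of those means.
    return sum(p * m for p, m in zip(posteriors, cond_means))


# Toy two-component model; the numbers are illustrative only.
components = [
    {"weight": 0.5, "mean_x": 0.0, "mean_y": 0.0, "var_x": 1.0, "cov_xy": 0.5},
    {"weight": 0.5, "mean_x": 4.0, "mean_y": 3.0, "var_x": 1.0, "cov_xy": 0.5},
]
estimate = gmm_mmse_estimate(2.0, components)
```

At x = 2.0 the observation is equally far from both component means, so the two posteriors are equal and the estimate is the plain average of the two conditional means. Closer to one component's mean, that component dominates the estimate.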
