Online Energy Adjustment Using AR-HMM for Speech Enhancement
HE Yu-wen, BAO Chang-chun, XIA Bing-yin
Speech and Audio Signal Processing Lab, School of Electronic Information and Control Engineering, Beijing University of Technology, Beijing 100124, China
Because existing single-channel speech enhancement techniques perform poorly in tracking and suppressing non-stationary noise, a speech enhancement method based on online energy adjustment is proposed. Normalized critical-band energies are used as features in a Gaussian mixture model (GMM) to classify the background noise. Based on the autoregressive hidden Markov models (AR-HMMs) of clean speech and of the identified noise type, the power spectra of speech and noise are estimated under the minimum mean square error (MMSE) criterion. Because the training data and the test data differ in non-stationary noise environments, the speech and noise models must be adjusted online. The scaling factor of the speech energy is estimated with the iterative expectation-maximization (EM) algorithm, and that of the noise energy is estimated with a re-estimation approach similar to the one used in the training stage. The initial scaling factor of the noise energy is obtained with the minima-controlled recursive averaging (MCRA) algorithm. The proposed method is evaluated according to ITU-T G.160. The test results show that, compared with the two reference methods, the proposed method performs well in non-stationary noise environments, achieving larger noise reduction and shorter convergence time.
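The noise-classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Bark-style band edges, the 8 kHz sampling rate, the Hann window, the diagonal-covariance GMMs, and the helper names normalized_band_energies, gmm_log_likelihood and classify_noise are all assumptions made for illustration.

```python
import numpy as np

# Assumed critical-band (Bark-style) edges in Hz, truncated at 4 kHz
# for an assumed 8 kHz sampling rate. The paper's exact bands are not given.
BAND_EDGES_HZ = np.array([0, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
                          1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700,
                          4000], dtype=float)


def normalized_band_energies(frame, fs=8000):
    """Critical-band energies of one frame, normalized to sum to 1."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    # Map every FFT bin to the band whose edges enclose its frequency.
    band = np.searchsorted(BAND_EDGES_HZ, freqs, side="right") - 1
    band = np.clip(band, 0, len(BAND_EDGES_HZ) - 2)
    energies = np.bincount(band, weights=power,
                           minlength=len(BAND_EDGES_HZ) - 1)
    return energies / (energies.sum() + 1e-12)


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM.

    x: (D,), weights: (K,), means and variances: (K, D).
    """
    log_comp = (np.log(weights)
                - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
                - 0.5 * np.sum((x - means) ** 2 / variances, axis=1))
    peak = log_comp.max()  # log-sum-exp for numerical stability
    return peak + np.log(np.sum(np.exp(log_comp - peak)))


def classify_noise(features, models):
    """Pick the noise type whose GMM best explains the frames.

    features: (T, D) array of per-frame feature vectors.
    models: dict mapping a noise-type name to (weights, means, variances).
    """
    scores = {name: sum(gmm_log_likelihood(x, *params) for x in features)
              for name, params in models.items()}
    return max(scores, key=scores.get)
```

In use, one GMM per noise type would be trained offline on such features, and the type selected by classify_noise would determine which noise AR-HMM is paired with the clean-speech model.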
何玉文, 鲍长春, 夏丙寅. 基于AR-HMM在线能量调整的语音增强方法[J]. 电子学报, 2014, 42(10): 1991-1997.
HE Yu-wen, BAO Chang-chun, XIA Bing-yin. Online Energy Adjustment Using AR-HMM for Speech Enhancement. Chinese Journal of Electronics, 2014, 42(10): 1991-1997.