Abstract: In this paper, a single-channel speech enhancement method is proposed that constructs an a priori binaural cue codebook of speech and noise based on the binaural cue coding (BCC) principle. First, the binaural cues of speech and noise are trained offline as a priori information to form the codebook. Then, the weighted codebook mapping (WCBM) algorithm is used to estimate the clean cue. Finally, the noisy speech is enhanced with the BCC model. Moreover, to further improve performance, a deep neural network, namely stacked auto-encoders (SAE), is proposed to estimate the clean cue in place of the WCBM algorithm. Objective test results show that the proposed method outperforms the reference methods.
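The abstract names the processing chain but not its formulas. The Python sketch below shows one plausible reading and makes several assumptions that are not stated in the abstract. The "cue" is taken to be a per-subband speech-to-noise level difference in dB, analogous to the inter-channel level difference in BCC. The codebook is assumed to store paired noisy-feature and clean-cue codewords. WCBM is modelled as inverse-distance weighting over the k nearest codewords. Enhancement is modelled as rescaling each noisy subband with the standard BCC two-channel level scale factor. All function names and the parameter `k` are hypothetical.

```python
import numpy as np

def level_difference_db(p_speech, p_noise, eps=1e-12):
    """Assumed cue: per-subband speech-to-noise level difference in dB (ICLD-style)."""
    return 10.0 * np.log10((p_speech + eps) / (p_noise + eps))

def weighted_codebook_mapping(noisy_feat, noisy_cb, clean_cb, k=4, eps=1e-12):
    """Hypothetical WCBM: map a noisy feature vector to a clean-cue estimate.

    noisy_cb: (M, D) noisy-feature codewords; clean_cb: (M, P) paired clean-cue codewords.
    """
    # Euclidean distance from the noisy feature to every noisy codeword
    d = np.linalg.norm(noisy_cb - noisy_feat, axis=1)
    # Keep the k nearest codewords
    idx = np.argsort(d)[:k]
    # Inverse-distance weights, normalised to sum to 1
    w = 1.0 / (d[idx] + eps)
    w /= w.sum()
    # Weighted combination of the paired clean-cue codewords
    return w @ clean_cb[idx]

def bcc_speech_gain(cue_db):
    """BCC-style scale factor that extracts the speech 'channel' from the noisy 'sum' signal.

    Assumes speech and noise are uncorrelated, so their subband powers add.
    """
    r = 10.0 ** (cue_db / 10.0)
    return np.sqrt(r / (1.0 + r))
```

For example, an estimated cue of 0 dB (equal speech and noise power) yields a gain of sqrt(0.5) on that subband. The gains for the two "channels" (cue and negated cue) have squares that sum to 1, so the subband power of the noisy signal is split between speech and noise and not amplified.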