Probabilistic Combination Framework of Two Decision-Directed Algorithms for a Priori SNR Estimation
OU Shi-feng1, ZHAO Yan-lei1, SONG Peng2, GAO Ying1
1. School of Science and Technology for Opto-electronic Information, Yantai University, Yantai, Shandong 264005, China;
2. School of Computer and Control Engineering, Yantai University, Yantai, Shandong 264005, China
Abstract: Owing to its low computational complexity and acceptable ability to reduce musical noise, the decision-directed (DD) approach is widely used to estimate the a priori signal-to-noise ratio (SNR) in speech enhancement systems. However, the DD approach introduces a time delay, and its performance is very sensitive to the fixed smoothing factor. First, the performance of the DD approach in reducing musical noise and in attenuating speech distortion is analyzed using real speech and noise data, and boundary values for the smoothing factor are derived from the results. Then, a novel algorithm is proposed in which two DD estimators with different smoothing factors are probabilistically combined to bring together the best properties of each. The contribution of each DD estimator to the combination is adjusted automatically according to the speech absence probability, which is computed using the complex Gaussian model and a soft-decision technique. Experiments are carried out under different noise types and input SNR conditions, and the results demonstrate that the proposed algorithm significantly outperforms popular methods for estimating the a priori SNR.
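The combination idea can be illustrated with a minimal single-bin sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes the standard DD recursion, a posterior speech absence probability derived from the complex Gaussian likelihood ratio, and a weighting in which the heavily smoothed estimate dominates when speech is likely absent. The function names, the smoothing factors 0.98 and 0.5, and the prior absence probability 0.5 are all hypothetical choices made for illustration.

```python
import math


def dd_estimate(prev_clean_power, noise_power, gamma, alpha):
    """Standard decision-directed a priori SNR estimate for one bin.

    prev_clean_power: estimated clean-speech power of the previous frame
    noise_power:      estimated noise power
    gamma:            a posteriori SNR of the current frame
    alpha:            smoothing factor in [0, 1]
    """
    return (alpha * prev_clean_power / noise_power
            + (1.0 - alpha) * max(gamma - 1.0, 0.0))


def speech_absence_prob(xi, gamma, prior_absence=0.5):
    """Posterior speech absence probability under a complex Gaussian model.

    Assumption: likelihood ratio exp(gamma*xi/(1+xi)) / (1+xi), combined
    with a fixed prior by Bayes' rule. The exponent is clipped to avoid
    floating-point overflow.
    """
    v = min(gamma * xi / (1.0 + xi), 500.0)
    likelihood_ratio = math.exp(v) / (1.0 + xi)
    prior_ratio = (1.0 - prior_absence) / prior_absence
    return 1.0 / (1.0 + prior_ratio * likelihood_ratio)


def combined_estimate(prev_clean_power, noise_power, noisy_power,
                      alpha_smooth=0.98, alpha_fast=0.5, prior_absence=0.5):
    """Blend two DD estimates using the speech absence probability q.

    Assumption: q weights the heavily smoothed estimate (less musical
    noise), and 1 - q weights the fast-tracking estimate (less delay).
    Returns the blended a priori SNR and q.
    """
    gamma = noisy_power / noise_power
    xi_smooth = dd_estimate(prev_clean_power, noise_power, gamma, alpha_smooth)
    xi_fast = dd_estimate(prev_clean_power, noise_power, gamma, alpha_fast)
    # Assumption: q is evaluated from the smoothed estimate.
    q = speech_absence_prob(xi_smooth, gamma, prior_absence)
    return q * xi_smooth + (1.0 - q) * xi_fast, q


if __name__ == "__main__":
    # Speech onset: no speech in the previous frame, strong current frame.
    xi, q = combined_estimate(prev_clean_power=0.0, noise_power=1.0,
                              noisy_power=101.0)
    print(xi, q)
```

In the onset case the speech absence probability is close to zero, so the blend follows the fast-tracking estimate rather than the delayed, heavily smoothed one. When the probability is high, the blend leans toward the smoothed estimate.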