Abstract: Speech enhancement methods based on nonnegative matrix factorization (NMF) can improve the signal-to-noise ratio (SNR) of the recovered speech, but they also introduce speech distortion, which degrades the performance of speaker verification systems in noisy environments. This paper proposes nonnegative matrix factorization with partial constraints (PCNMF) to make speaker verification more robust to unknown and non-stationary noise. PCNMF constructs the speech and noise dictionaries subject to partial constraints. Because a speech dictionary obtained by conventional training on speech data contains some noise components, PCNMF instead generates the speech dictionary from the spectra of the pitch and its harmonics using a mathematical model, thereby imitating the formant structure of the human voice and keeping the speech dictionary pure. In addition, to reduce the loss of information from the noise samples, PCNMF applies framing and the short-time Fourier transform (STFT) to the noise samples separated online, and then generates the noise dictionary as linear combinations of the spectral frames of those samples. Experiments with unknown and non-stationary noise show that PCNMF significantly improves robustness under various noise conditions. In particular, the equal error rate (EER) of PCNMF is on average 5.2% lower than that of the baseline (a multi-condition system).
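The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the harmonic model, the constraint formulation, or the update rules, so the Gaussian-shaped harmonic peaks with 1/k amplitude decay, the random nonnegative mixing weights for the noise atoms, the standard KL-divergence multiplicative update with fixed dictionaries, and all function and parameter names below (`harmonic_dictionary`, `noise_dictionary`, `activations_kl`, `enhance`, `width_hz`, and so on) are assumptions made for illustration.

```python
import numpy as np

def stft_mag(x, frame_len=512, hop=256):
    """Magnitude spectrogram (bins x frames) from Hann-windowed frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    win = np.hanning(frame_len)
    frames = np.stack(
        [x[i * hop:i * hop + frame_len] * win for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0))

def harmonic_dictionary(f0s, sr=16000, frame_len=512, n_harm=20, width_hz=40.0):
    """One atom per candidate pitch f0, built without any training speech.

    Assumed model: Gaussian peaks at f0, 2*f0, ... with 1/k amplitude decay.
    """
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    atoms = []
    for f0 in f0s:
        atom = np.zeros_like(freqs)
        for k in range(1, n_harm + 1):
            if k * f0 >= sr / 2:  # stop at the Nyquist frequency
                break
            atom += (1.0 / k) * np.exp(-0.5 * ((freqs - k * f0) / width_hz) ** 2)
        atoms.append(atom / (atom.sum() + 1e-12))  # normalize each atom
    return np.stack(atoms, axis=1)

def noise_dictionary(noise, n_atoms=8, frame_len=512, hop=256, seed=0):
    """Atoms are nonnegative linear combinations of the noise spectrum frames.

    The random mixing weights are an assumption; the abstract does not say
    how the combination coefficients are chosen.
    """
    N = stft_mag(noise, frame_len, hop)
    rng = np.random.default_rng(seed)
    A = rng.random((N.shape[1], n_atoms))
    W = N @ A
    return W / (W.sum(axis=0, keepdims=True) + 1e-12)

def activations_kl(V, W, n_iter=100, eps=1e-12):
    """Estimate activations H >= 0 with W held fixed (KL multiplicative update)."""
    H = np.ones((W.shape[1], V.shape[1]))
    col_sums = W.sum(axis=0)[:, None] + eps
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / col_sums
    return H

def enhance(V, W_speech, W_noise, n_iter=100, eps=1e-12):
    """Soft-mask the noisy magnitude spectrogram with the speech share of the model."""
    W = np.hstack([W_speech, W_noise])
    H = activations_kl(V, W, n_iter)
    k = W_speech.shape[1]
    speech = W_speech @ H[:k]
    total = W @ H + eps
    return V * speech / total
```

A usage sketch under the same assumptions: compute `V = stft_mag(noisy)`, build `W_s = harmonic_dictionary(np.arange(80, 401, 10))` and `W_n = noise_dictionary(noise_sample)`, then call `enhance(V, W_s, W_n)` to obtain an estimated speech magnitude spectrogram. Keeping both dictionaries fixed and estimating only the activations is a simplification chosen to keep the example short.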