Abstract: Improving generalization to unseen noise types is a pressing problem for supervised speech enhancement. Deep neural networks (DNNs) address it effectively by modeling a large number of noise types. To further improve the generalization of DNN-based speech enhancement, this paper designs NoiseGAN, a model based on generative adversarial networks (GANs) that generates new noise types from real noise data. Adding the generated noise to the training set increases the diversity of its noise types and thereby improves the generalization of the speech enhancement model. Speech enhancement experiments with different network structures show that the proposed NoiseGAN can generate new noise types, increase the diversity of noise types in the training set, and effectively improve the generalization of speech enhancement models to unseen noise types.