[1] CHEN Nan,BAO Chang-chun.Speech enhancement method based on binaural cues coding principle[J].Acta Electronica Sinica,2019,47(1):227-233.(in Chinese)
[2] OU Shifeng,SONG Peng,GAO Ying.Laplacian speech model and soft decision based MMSE estimator for noise power spectral density in speech enhancement[J].Chinese Journal of Electronics,2018,27(6):1214-1220.
[3] LIU Wenju,NIE Shuai,LIANG Shan,et al.Deep learning based speech separation technology and its developments[J].Acta Automatica Sinica,2016,42(6):819-833.(in Chinese)
[4] WANG D L,CHEN J.Supervised speech separation based on deep learning:An overview[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2018,26(10):1702-1726.
[5] WANG Y,WANG D L.Towards scaling up classification-based speech separation[J].IEEE Transactions on Audio,Speech,and Language Processing,2013,21(7):1381-1390.
[6] WANG Y,NARAYANAN A,WANG D L.On training targets for supervised speech separation[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2014,22(12):1849-1858.
[7] WILLIAMSON D S,WANG D L.Time-frequency masking in the complex domain for speech dereverberation and denoising[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2017,25(7):1492-1501.
[8] XU Y,DU J,DAI L R,et al.An experimental study on speech enhancement based on deep neural networks[J].IEEE Signal Processing Letters,2014,21(1):65-68.
[9] XU Y,DU J,DAI L R,et al.A regression approach to speech enhancement based on deep neural networks[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2015,23(1):7-19.
[10] HUANG P S,KIM M,HASEGAWA-JOHNSON M,et al.Joint optimization of masks and deep recurrent neural networks for monaural source separation[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2015,23(12):2136-2147.
[11] WENINGER F,ERDOGAN H,WATANABE S,et al.Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR[A].Proceedings of International Conference on Latent Variable Analysis and Signal Separation[C].Liberec:Springer International Publishing,2015.91-99.
[12] CHEN J,WANG D.Long short-term memory for speaker generalization in supervised speech separation[J].Journal of the Acoustical Society of America,2017,141(6):4705-4714.
[13] PARK S R,LEE J.A fully convolutional neural network for speech enhancement[A].Proceedings of the Eighteenth Annual Conference of the International Speech Communication Association[C].Stockholm:ISCA,2017.1993-1997.
[14] FU S W,TSAO Y,LU X.SNR-aware convolutional neural network modeling for speech enhancement[A].Proceedings of the Seventeenth Annual Conference of the International Speech Communication Association[C].San Francisco:ISCA,2016.3768-3772.
[15] TAN K,CHEN J,WANG D.Gated residual networks with dilated convolutions for supervised speech separation[A].Proceedings of IEEE International Conference on Acoustics,Speech,and Signal Processing[C].Calgary:IEEE,2018.21-25.
[16] TAN K,CHEN J,WANG D.Gated residual networks with dilated convolutions for monaural speech enhancement[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2019,27(1):189-198.
[17] LI Y,LI X,DONG Y,et al.Densely connected network with time-frequency dilated convolution for speech enhancement[A].Proceedings of IEEE International Conference on Acoustics,Speech,and Signal Processing[C].Brighton:IEEE,2019.6860-6864.
[18] YANG Xu-kui,QU Dan,ZHANG Wen-lin,YAN Hong-gang.Adaptive voice activity detection based on long-term information[J].Acta Electronica Sinica,2018,46(4):878-885.(in Chinese)
[19] SHI X,CHEN Z,WANG H,et al.Convolutional LSTM network:A machine learning approach for precipitation nowcasting[A].Advances in Neural Information Processing Systems[C].Montreal:Curran Associates,2015.802-810.
[20] BALLAS N,YAO L,PAL C,et al.Delving deeper into convolutional networks for learning video representations[J].arXiv Preprint,2015,arXiv:1511.06432.
[21] GAROFOLO J S,LAMEL L F,FISHER W M,et al.TIMIT Acoustic-Phonetic Continuous Speech Corpus[M].Philadelphia:Linguistic Data Consortium,1993.
[22] WANG D,ZHANG X,ZHANG Z.THCHS-30:A Free Chinese Speech Corpus[OL].http://arxiv.org/abs/1512.01882,2015/2019-08-10.
[23] HU G.100 Nonspeech Environmental Sounds[OL].http://web.cse.ohio-state.edu/pnl/corpus/HuNonspeech/HuCorpus.html,2004/2019-08-10.
[24] VARGA A,STEENEKEN H J M.Assessment for automatic speech recognition:II.NOISEX-92:A database and an experiment to study the effect of additive noise on speech recognition systems[J].Speech Communication,1993,12(3):247-251.
[25] RIX A W,BEERENDS J G,HOLLIER M P,et al.Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs[A].Proceedings of IEEE International Conference on Acoustics,Speech,and Signal Processing[C].Salt Lake City:IEEE,2001.749-752.
[26] TAAL C H,HENDRIKS R C,HEUSDENS R,et al.An algorithm for intelligibility prediction of time-frequency weighted noisy speech[J].IEEE Transactions on Audio,Speech,and Language Processing,2011,19(7):2125-2136.