[1] RAMIREZ J,GORRIZ J-M,SEGURA J-C.Voice activity detection,fundamentals and speech recognition system robustness,robust speech recognition and understanding[OL].https://www.intechopen.com/books/robust_speech_ecognition_and_understanding/voice_activety_detection_fundamentals_and_speech_recognition_system_robustness, 2016-11-16.
[2] WISDOM S,OKOPAL G,ATLAS L,PITTON J.Voice activity detection using subband noncircularity[A].Proceedings of IEEE International Conference on Acoustics,Speech,and Signal Processing (ICASSP)[C].Brisbane,Australia,2015.4505-4509.
[3] HEESE F,NIERMANN M,VARY P.Speech-codebook based soft voice activity detection[A].Proceedings of IEEE International Conference on Acoustics,Speech,and Signal Processing (ICASSP)[C].Brisbane,Australia,2015.4335-4339.
[4] TAO F-J,HANSEN J-H-L,BUSSO C.An unsupervised visual-only voice activity detection approach using temporal orofacial features[A].Proceedings of 16th Annual Conference of the International Speech Communication Association (INTERSPEECH)[C].Dresden,Germany,2015.2302-2306.
[5] ZHAN G,HUANG Z-Q,et al.Spectrographic speech mask estimation using the time-frequency correlation of speech presence[A].Proceedings of 16th Annual Conference of the International Speech Communication Association (INTERSPEECH)[C].Dresden,Germany,2015.2287-2291.
[6] RAMIREZ J,SEGURA J-C,BENITEZ C,et al.Efficient voice activity detection algorithms using long-term speech information[J].Speech Communication,2004,42(3):271-287.
[7] GHOSH P-K,TSIARTAS A,NARAYANAN S.Robust voice activity detection using long-term signal variability[J].IEEE Transactions on Audio,Speech,and Language Processing,2011,19(3):600-613.
[8] MA Y,NISHIHARA A.Efficient voice activity detection algorithm using long-term spectral flatness measure[J].EURASIP Journal on Audio,Speech,and Music Processing,2013:21,DOI:10.1186/1687-4722-2013-21.
[9] YANG X-K,HE L,QU D,ZHANG W-Q.Voice activity detection algorithm based on long-term pitch information[J].EURASIP Journal on Audio,Speech,and Music Processing,2016:14,DOI:10.1186/s13636-016-0092-y.
[10] DAVIS S,MERMELSTEIN P.Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences[J].IEEE Transactions on Acoustics,Speech and Signal Processing,1980,28(4):357-366.
[11] SCHLUTER R,BEZRUKOV I,WAGNER H,NEY H.Gammatone features and feature combination for large vocabulary speech recognition[A].Proceedings of IEEE International Conference on Acoustics,Speech,and Signal Processing (ICASSP)[C].Hawaii,USA:IEEE,2007.649-652.
[12] MÜLLER M.Information Retrieval for Music and Motion[M].Berlin:Springer Verlag,2007.51-55.
[13] VAN SEGBROECK M,TSIARTAS A,NARAYANAN S-S.A robust frontend for VAD:exploiting contextual,discriminative and spectral cues of human voice[A].Proceedings of 14th Annual Conference of the International Speech Communication Association (INTERSPEECH)[C].Lyon,France,2013.704-708.
[14] KINNUNEN T,RAJAN P.A practical,self-adaptive voice activity detector for speaker verification with noisy telephone and microphone data[A].Proceedings of IEEE International Conference on Acoustics,Speech,and Signal Processing (ICASSP)[C].Vancouver,Canada,2013.7229-7233.
[15] GEORGIOU T-T.Distances between power spectral densities[R].Technical Report,arXiv:math/0607026v2,2006.
[16] JOHANNESMA P-I-M.The pre-response stimulus ensemble of neurons in the cochlear nucleus[A].Proceedings of IPO Symposium on Hearing Theory[C].Eindhoven,Netherlands,1972.58-69.
[17] GERKMANN T,HENDRIKS R-C.Unbiased MMSE-based noise power estimation with low complexity and low tracking delay[J].IEEE Transactions on Audio,Speech,and Language Processing,2012,20(4):1383-1393.
[18] GAROFOLO J-S,LAMEL L-F,FISHER W-M,et al.TIMIT Acoustic-Phonetic Continuous Speech Corpus[R].NIST Interagency/Internal Report (NISTIR)-4930,1993.
[19] VARGA A,STEENEKEN H-J-M.Assessment for automatic speech recognition:II.NOISEX-92:a database and an experiment to study the effect of additive noise on speech recognition systems[J].Speech Communication,1993,12(3):247-251.
[20] SOHN J,KIM N-S,SUNG W.A statistical model-based voice activity detection[J].IEEE Signal Processing Letters,1999,6(1):1-3.
[21] TAN L-N,BORGSTROM B-J,ALWAN A.Voice activity detection using harmonic frequency components in likelihood ratio test[A].Proceedings of IEEE International Conference on Acoustics,Speech,and Signal Processing (ICASSP)[C].Dallas,USA:IEEE,2010.4466-4469.