[1] Han Y,Song S,Zhao W.Retrieval of TV talk-show speakers by associating audio transcript to visual clusters[J].IEEE Access,2017,5:20512-20523.
[2] Schönherr L,Zeiler S,et al.Spoofing detection via simultaneous verification of audio-visual synchronicity and transcription[A].2017 IEEE Automatic Speech Recognition and Understanding Workshop[C].Okinawa:IEEE,2017.591-598.
[3] Dov D,Talmon R,et al.Sequential audio-visual correspondence with alternating diffusion kernels[J].IEEE Transactions on Signal Processing,2018,66(12):3100-3111.
[4] Liu Y,Sato Y.Recovery of audio-to-video synchronization through analysis of cross-modality correlation[J].Pattern Recognition Letters,2010,31(8):696-701.
[5] Izadinia H,Saleemi I,et al.Multimodal analysis for identification and segmentation of moving-sounding objects[J].IEEE Transactions on Multimedia,2013,15(2):378-390.
[6] Rúa E A,Bredin H,et al.Audio-visual speech asynchrony detection using co-inertia analysis and coupled hidden Markov models[J].Pattern Analysis and Applications,2009,12(3):271-284.
[7] Kumar K,Navratil J,et al.Audio-visual speech synchronization detection using a bimodal linear prediction model[A].IEEE Computer Society Conference on Computer Vision & Pattern Recognition Workshops[C].Florida:IEEE,2009.53-59.
[8] Kumagai S,Doman K,et al.Detection of inconsistency between subject and speaker based on the co-occurrence of lip motion and voice towards speech scene extraction from news videos[A].2011 IEEE International Symposium on Multimedia[C].California:IEEE,2011.311-318.
[9] ZHU Zheng-yu,HE Qian-hua,FENG Xiao-hui,et al.Lip motion and voice consistency algorithm based on fusing spatiotemporal correlation degree[J].Acta Electronica Sinica,2014,42(4):779-785.(in Chinese)
[10] Monaci G,Vandergheynst P,et al.Learning bimodal structure in audio-visual data[J].IEEE Transactions on Neural Networks,2009,20(12):1898-1910.
[11] Liu Q,Wang W,et al.Source separation of convolutive and noisy mixtures using audio-visual dictionary learning and probabilistic time-frequency masking[J].IEEE Transactions on Signal Processing,2013,61(22):5520-5535.
[12] HE Qian-hua,ZHU Zheng-yu,FENG Xiao-hui.Lip motion and voice consistency analysis algorithm based on shift-invariant dictionary[J].Journal of Huazhong University of Science and Technology (Natural Science Edition),2015,43(10):69-74.(in Chinese)
[13] El-Sallam A A,Mian A S.Correlation based speech-video synchronization[J].Pattern Recognition Letters,2011,32(6):780-786.
[14] Eg R,Griwodz C,et al.Audiovisual robustness:exploring perceptual tolerance to asynchrony and quality distortion[J].Multimedia Tools & Applications,2015,74(2):345-365.
[15] Staelens N,Meulenaere J D,et al.Assessing the importance of audio/video synchronization for simultaneous translation of video sequences[J].Multimedia Systems,2012,18(6):445-457.
[16] SUN Jin-cheng,NI Hong,MO Fu-yuan,et al.The statistical distribution of standard Chinese initials and finals[J].Journal of Applied Acoustics,1995,14(3):35-41.(in Chinese)
[17] Song T,Lee K,et al.Visual voice activity detection via chaos based lip motion measure robust under illumination changes[J].IEEE Transactions on Consumer Electronics,2014,60(2):251-257.
[18] Siatras S,Nikolaidis N,et al.Visual lip activity detection and speaker detection using mouth region intensities[J].IEEE Transactions on Circuits and Systems for Video Technology,2009,19(1):133-137.
[19] Wang Q,Shi G,et al.Analysis and design of an optimum detector for weak sinusoidal signals[A].International Conference on Signal Processing[C].Beijing:IEEE,2002.1608-1611.
[20] Gustafsson F.Determining the initial states in forward-backward filtering[J].IEEE Transactions on Signal Processing,1996,44(4):988-992.
[21] QIAN Bo,LI Yan-ping,TANG Zhen-min,et al.Self-adaptive vowel-frame detection algorithm based on energy distribution analysis in frequency domain[J].Acta Electronica Sinica,2007,35(2):279-282.(in Chinese)
[22] LI Hao,TANG Chao-jing.Initial/final segmentation using loss function and acoustic feature[J].Acta Acustica,2012,37(3):339-345.(in Chinese)
[23] HU Ying,CHEN Ning.Voiced/unvoiced classification and pitch period detection algorithm based on wavelet transform[J].Journal of Electronics and Information Technology,2008,30(2):353-356.(in Chinese)
[24] Yang B,Liu R,Chen X.Fault diagnosis for wind turbine generator bearing via sparse representation and shift-invariant K-SVD[J].IEEE Transactions on Industrial Informatics,2017,13(3):1321-1331.
[25] Eg R,Behne D,Griwodz C.Audiovisual temporal integration in reverberant environments[J].Speech Communication,2015,66:91-106.
[26] Takahashi T,Kageyama Y,Ariuntsengel B,et al.Analysis of lip motion due to the influence of vocalization[A].SICE Annual Conference 2012[C].Akita:IEEE,2012.973-978.
[27] SHAO Jian,ZHAO Qing-wei,YAN Yong-hong.Initial/final acoustic model based on separating nasal coda in Chinese Putonghua speech recognition[J].Acta Acustica,2010,35(5):587-592.(in Chinese)