Sound Event Detection at Low SNR Based on Multi-random Forests
LI Ying1,2, YIN Jia-li1,2
1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian 350116, China;
2. Key Lab of Information Security of Network Systems (Fuzhou University), Fuzhou, Fujian 350116, China
Abstract: For sound event detection under various background noises at low SNR, this paper proposes a method that mixes background noises with sound events into noisy samples to train classifiers. In the pre-processing stage, a voting method based on the 2nd to 6th intrinsic mode functions (IMFs) generated by empirical mode decomposition (EMD) is used to detect the endpoints of sound events and estimate the SNR. Subband power distribution (SPD) features are then extracted from the audio data. Finally, the background noise is mixed with all sound event samples in the database at the estimated SNR, and features extracted from these noisy samples are used to train multi-random forests (M-RF) for detecting sound events in low-SNR environments. Experiments show that the proposed method can recognize sound events in various acoustic scenes at low SNR, maintaining an average accuracy of 67.1% at −5 dB.
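The noisy-sample generation step described above scales the background noise so that, when added to a clean sound event, the mixture has the estimated SNR. A minimal sketch of that mixing step is given below; the function name and signature are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def mix_at_snr(event, noise, snr_db):
    """Add noise to a clean event so the mixture has the requested SNR (in dB).

    event, noise: 1-D float arrays of audio samples.
    """
    event = np.asarray(event, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Tile/trim the noise so it covers the whole event
    if len(noise) < len(event):
        reps = int(np.ceil(len(event) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[:len(event)]
    p_event = np.mean(event ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain g chosen so that 10*log10(p_event / (g^2 * p_noise)) == snr_db
    g = np.sqrt(p_event / (p_noise * 10 ** (snr_db / 10.0)))
    return event + g * noise
```

Applying this to every sample in the sound event database at the SNR estimated during pre-processing yields the noisy training set from which SPD features are extracted.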