Research on Family Activity Recognition Method Based on Additive Margin Capsule Network
ZHENG Qi-hang1, WANG Zhang-quan1,2, LIU Ban-teng1,2, CHEN Yang1, CHEN You-rong2
1. School of Information Science and Engineering, Changzhou University, Changzhou, Jiangsu 213164, China;
2. College of Information Science and Technology, Zhejiang Shuren University, Hangzhou, Zhejiang 310015, China
Abstract: This paper studies audio-based family activity recognition and proposes a recognition model based on an additive margin capsule neural network. The objective function of a traditional capsule neural network constrains only the length (norm) of the output capsules. To address this drawback, we take a geometric perspective and add a Transition layer to the capsule network structure, using it to rebase the spatial relationships among capsule units onto a one-dimensional space. Additive margin Softmax is then used as the objective function: taking small variation among features of the same class and large differences between features of different classes as the optimization strategy, we construct an objective function based on the spatial relationships of the capsule vectors to improve the model's classification ability. Finally, the method is evaluated by classifying family activities from audio events. Classifiers are built and tested on the dataset of Task 5 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 Challenge; the final average F1 score reaches 92.3%, which is better than other mainstream methods.
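As a rough illustration of the additive margin Softmax objective named in the abstract, the following Python sketch computes the loss from per-class cosine scores. It is not the authors' implementation: the function name `am_softmax_loss`, the assumption that the network's outputs are already available as cosine similarities in [-1, 1], and the default scale `s` and margin `m` are illustrative choices that the abstract does not specify.

```python
import math


def am_softmax_loss(cosines, labels, s=30.0, m=0.35):
    """Mean additive margin Softmax loss over a batch (illustrative sketch).

    cosines: list of rows, one per sample; each row holds one cosine
             score in [-1, 1] per class (assumed to be precomputed).
    labels:  list of target class indices, one per sample.
    s, m:    scale and additive margin; hypothetical defaults, not
             values reported in the abstract.
    """
    total = 0.0
    for row, y in zip(cosines, labels):
        # Subtract the margin from the target-class score only, then scale.
        logits = [s * (c - m) if j == y else s * c for j, c in enumerate(row)]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(v - peak) for v in logits))
        # Cross-entropy for this sample: log-partition minus target logit.
        total += log_z - logits[y]
    return total / len(labels)


if __name__ == "__main__":
    scores = [[0.8, 0.1, -0.2], [0.3, 0.6, 0.0]]  # made-up cosine scores
    targets = [0, 1]
    print("no margin  :", am_softmax_loss(scores, targets, m=0.0))
    print("with margin:", am_softmax_loss(scores, targets, m=0.35))
```

With `m=0` the function reduces to an ordinary scaled Softmax cross-entropy. A positive margin lowers the target-class score before normalization, so the loss stays high until the target cosine exceeds the others by at least the margin, which is what pushes same-class features together and different-class features apart.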