Abstract: To address the diversity of group behavior characteristics in complex scenes and the difficulty of modeling interactions, this paper proposes a new two-layer network architecture. The first layer combines a pseudo-3D residual network with a graph convolutional network to capture interaction features. The second layer uses a pseudo-3D residual network to capture the global spatio-temporal features of the group scene. Because these two kinds of features are complementary, the group behavior decisions of the two channels are fused with a weight-adaptive adjustment algorithm, which adaptively computes importance weights for the group behavior categories predicted by each channel and thereby fuses the two prediction results at the decision level. The method achieves average recognition accuracies of 91.4% and 97.9% on the CAD and CAE datasets, respectively.
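The decision-level fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting rule, so the confidence measure used here (one minus the normalized entropy of each channel's class distribution) is only an assumed stand-in, and the names `softmax`, `confidence`, and `fuse`, as well as the example logits, are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(probs):
    """Assumed confidence measure: 1 - normalized entropy.

    Returns about 1.0 for a one-hot distribution and about 0.0 for a
    uniform one. This is an illustrative choice, not the paper's rule.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return max(0.0, 1.0 - entropy / math.log(len(probs)))


def fuse(probs_interaction, probs_scene, eps=1e-8):
    """Fuse two channels' class probabilities with adaptive weights.

    The more confident channel receives the larger weight; the two
    weights sum to 1, so the fused vector is still a distribution.
    """
    c1 = confidence(probs_interaction)
    c2 = confidence(probs_scene)
    w1 = (c1 + eps) / (c1 + c2 + 2 * eps)
    w2 = 1.0 - w1
    fused = [w1 * a + w2 * b for a, b in zip(probs_interaction, probs_scene)]
    return fused, (w1, w2)


# Hypothetical scores for five group-behavior categories.
p_interaction = softmax([2.0, 0.5, 0.1, 0.1, 0.1])  # interaction channel, fairly sure
p_scene = softmax([0.4, 0.5, 0.3, 0.4, 0.4])        # scene channel, nearly uniform

fused, (w1, w2) = fuse(p_interaction, p_scene)
predicted = max(range(len(fused)), key=fused.__getitem__)
print(f"weights: {w1:.3f}, {w2:.3f}; predicted category index: {predicted}")
```

In this sketch the nearly uniform scene channel contributes almost nothing, so the fused decision follows the interaction channel. When both channels are equally uncertain, the weights fall back to an even split.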