

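1. School of Instrumentation Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang 150006, China
2. College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao, Shandong 266061, China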
Received: 09 October 2021
Revised: 11 January 2022
Published: 25 August 2022
DENG Hai-gang, WANG Chuan-xu, LI Cheng-wei, et al. Summarization of Group Activity Recognition Algorithms Based on Deep Learning Frame[J]. Acta Electronica Sinica, 2022, 50(08): 2018-2036. DOI: 10.12263/DZXB.20211359.
Group behavior recognition is currently a research hotspot in computer vision, with a wide range of applications in intelligent security surveillance, social role understanding, and sports video analysis. This article reviews group behavior recognition algorithms based on deep learning frameworks. First, according to whether a method models the interaction relationships among group members, existing algorithms are classified into two categories: “group behavior recognition without interaction relationship modeling (GBRWIR)” and “group behavior recognition based on interaction relationship description (GBRBIR)”. Second, because GBRWIR methods mainly focus on how to compute and purify the overall spatiotemporal features of a group behavior sequence, this article summarizes them as three typical types of algorithms: “multi-stream spatiotemporal feature computation and fusion”, “individual/group multi-level spatiotemporal feature computation and merging”, and “attention-based purification of group behavior spatiotemporal features”. Third, according to how they describe interaction relationships, GBRBIR algorithms are grouped into three categories: “modeling of global interaction relationships among all group members”, “interaction modeling based on dividing the group into subgroups”, and “modeling of interactions among core members centered on key actors”. Then, the datasets related to group behavior recognition are introduced, and the test performance of different recognition methods on each dataset is compared and summarized. Finally, several challenging issues and future research directions are discussed, namely the duality of group behavior category definitions, the difficulties and shortcomings of interaction relationship modeling, weakly supervised labeling and self-learning for group behavior datasets, viewpoint changes, and the comprehensive utilization of scene information.
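To make the idea of modeling global interaction relationships among all group members more concrete, here is a minimal, hedged sketch in the spirit of actor-relation-graph approaches. It is not the method of any specific surveyed paper. It assumes each detected person (actor) is already represented by a fixed-length feature vector. From these vectors it builds a relation matrix using scaled dot-product similarities normalized with a row-wise softmax, then performs one graph-propagation step with a residual connection. Finally, it max-pools the refined actor features into a single group descriptor. The helper names (`relation_graph`, `propagate`, `group_descriptor`) are hypothetical. The learned projection and weight matrices that real models use are omitted (treated as identity) so the example stays self-contained.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relation_graph(features: Matrix) -> Matrix:
    """Relation matrix A with A[i][j] = softmax_j(f_i . f_j / sqrt(d)).

    Assumption: learned embedding projections are omitted (identity).
    """
    d = len(features[0])
    return [
        softmax([dot(fi, fj) / math.sqrt(d) for fj in features])
        for fi in features
    ]


def propagate(adj: Matrix, features: Matrix) -> Matrix:
    """One graph-propagation step with a residual connection and ReLU.

    Each actor's new feature is the relation-weighted sum of all actors'
    features plus its own original feature; no learned weight matrix is used.
    """
    n, d = len(features), len(features[0])
    refined = []
    for i in range(n):
        agg = [sum(adj[i][j] * features[j][k] for j in range(n)) for k in range(d)]
        refined.append([max(0.0, agg[k] + features[i][k]) for k in range(d)])
    return refined


def group_descriptor(features: Matrix) -> Vector:
    """Element-wise max-pool of actor features into one group-level vector."""
    d = len(features[0])
    return [max(f[k] for f in features) for k in range(d)]


if __name__ == "__main__":
    # Three hypothetical actors; the first two have similar features.
    actors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    adj = relation_graph(actors)
    refined = propagate(adj, actors)
    print(group_descriptor(refined))
```

With the three toy actors above, each of the first two actors assigns a higher relation weight to the other than to the third actor, so their features are mixed more strongly during propagation. A practical model would learn the projections, stack several such layers, and feed the group descriptor to a classifier.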