

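School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou, Henan 450000, China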
Received: 6 April 2025
Accepted: 9 October 2025
Published: 25 October 2025
JIANG Wei, GUAN Meng-yi, WEI Fu-peng, et al. Enhanced Spatial-Temporal Graph Convolutional Network for Skeleton-Based Action Recognition[J]. Acta Electronica Sinica, 2025, 53(10): 3692-3704. DOI:10.12263/DZXB.20250259
Graph convolutional networks (GCNs) have been widely applied to skeleton-based action recognition and have achieved remarkable performance. However, as the number of action categories and the complexity of scenes increase, existing methods still face significant challenges in modeling fine-grained human body structure and temporal dependencies, which can be summarized as two main issues. First, when extracting relational features among joints, these methods often fail to adequately capture the interactions among peripheral joints (the hands, feet, and head) and the synergy between peripheral joints and the other joints. Second, when extracting temporal features, these methods are limited to short-term features and fail to capture long-term temporal dependencies. To address these issues, this paper proposes an enhanced spatial-temporal graph convolutional network (EST-GCN), which is built by stacking multi-branch spatial enhanced graph convolution (MSEGC) modules and multi-scale temporal enhanced convolution (MTEC) modules. MSEGC learns and propagates features from a two-stream graph convolution over multiple stages, strengthening the feature representation of peripheral joints and thereby capturing the relationships between peripheral joints and the other joints. MTEC learns and propagates temporal features from multi-scale temporal convolutions over multiple stages, enlarging the temporal span and thereby capturing broader temporal dependencies across frames. The model applies MSEGC and then MTEC to extract and fuse spatial and temporal features, jointly modeling structural correlations among joints and temporal dependencies to improve the discriminability of the spatial-temporal features. To fully exploit the spatial-temporal information in skeleton data, three types of input features (joint positions, motion velocities, and bones) are introduced and combined through multi-stream fusion to enhance the feature representation. The proposed method achieves accuracies of 92.4% and 96.2% on the X-Sub and X-View benchmarks of the NTU-RGB+D dataset, respectively, and 88.7% and 90.0% on the X-Sub and X-Setup benchmarks of the NTU-RGB+D 120 dataset, respectively, which demonstrates its effectiveness. Furthermore, to validate the model's performance in real-world scenarios, skeleton-based action recognition experiments are conducted on video samples from the NTU-RGB+D dataset, with additional tests under multi-person interaction and joint noise interference. The results show that the model still recognizes actions accurately even when some local joints are misassigned, verifying the practicality and robustness of the proposed method.
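As an illustration of the three input features named in the abstract, the following is a minimal sketch of how joint, velocity, and bone streams are commonly derived from a skeleton sequence, and how per-stream class scores can be combined. It is not the authors' implementation. The five-joint skeleton and its parent table (`TOY_BONE_PAIRS`), the function names `build_streams` and `fuse_scores`, and the choice of a weighted sum of scores as the fusion step are all assumptions made for this example; the abstract does not specify how the streams are fused.

```python
import numpy as np

# Hypothetical toy skeleton: (child, parent) pairs for 5 joints.
# Joint 0 is the root and is paired with itself, so its bone vector is zero.
# Real datasets such as NTU-RGB+D define their own 25-joint parent table.
TOY_BONE_PAIRS = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 1)]


def build_streams(joints, bone_pairs=TOY_BONE_PAIRS):
    """Derive three input streams from joints of shape (T, V, C).

    T = frames, V = joints, C = coordinate channels.
    """
    # Velocity stream: displacement between consecutive frames.
    # The last frame has no successor, so it is left as zeros.
    velocity = np.zeros_like(joints)
    velocity[:-1] = joints[1:] - joints[:-1]

    # Bone stream: vector pointing from each parent joint to its child.
    bones = np.zeros_like(joints)
    for child, parent in bone_pairs:
        bones[:, child] = joints[:, child] - joints[:, parent]

    return {"joint": joints, "velocity": velocity, "bone": bones}


def fuse_scores(scores, weights=None):
    """Assumed late fusion: weighted sum of per-stream class scores."""
    names = sorted(scores)
    if weights is None:
        weights = {name: 1.0 for name in names}
    return sum(weights[name] * scores[name] for name in names)


if __name__ == "__main__":
    # Random toy sequence: 4 frames, 5 joints, 3D coordinates.
    rng = np.random.default_rng(0)
    sequence = rng.normal(size=(4, 5, 3))
    streams = build_streams(sequence)
    for name, data in streams.items():
        print(name, data.shape)
```

In a multi-stream setup of this kind, each stream would typically be fed to its own copy of the network, and the resulting class scores combined at the end; the weights in `fuse_scores` are placeholders that would need to be tuned on validation data.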