Multi-Dimensional Dynamic Topology Learning Graph Convolution for Skeleton-Based Action Recognition

LUO Hui-lan; CAO Li-jing

doi:10.12263/DZXB.20221106

Research article | Updated: 2025-12-11

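- Multi-Dimensional Dynamic Topology Learning Graph Convolution for Skeleton-Based Action Recognition
- Acta Electronica Sinica, 2024, Vol. 52, No. 3, pp. 991-1001
- Affiliation:
  School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000
- About the authors:
  LUO Hui-lan, female, was born in Shanggao, Jiangxi, in September 1974. She is a professor and master's supervisor at the Image Processing Laboratory, Jiangxi University of Science and Technology. Her research interests include machine learning and pattern recognition. E-mail: luohuilan@sina.com
  CAO Li-jing, male, was born in Ganzhou, Jiangxi, in October 1997. He is a master's student at the School of Information Engineering, Jiangxi University of Science and Technology. His research focuses on skeleton-based action recognition. E-mail: 2870256076@qq.com
- Funding:
  National Natural Science Foundation of China (61862031); Jiangxi Provincial Leading Talents Program for Academic and Technical Leaders of Major Disciplines (20213BCJ22004); Key Project of Jiangxi Provincial Degree and Postgraduate Education and Teaching Reform Research (JXYJG-2020-120)
- DOI: 10.12263/DZXB.20221106
- CLC number: TP391.4
- Received: 2022-09-30; Revised: 2023-03-15; Published in print: 2024-03-25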
LUO Hui-lan, CAO Li-jing. Multi-Dimensional Dynamic Topology Learning Graph Convolution for Skeleton-Based Action Recognition[J]. Acta Electronica Sinica, 2024, 52(03): 991-1001. DOI: 10.12263/DZXB.20221106.

Abstract

Graph convolution is widely used in skeleton-based action recognition because of its strong capacity to represent graph data. However, existing graph convolution methods aggregate features with a graph topology that is shared across all frames or all channels, which greatly limits the representation ability of graph convolutional networks. To address this problem, this paper proposes a multi-dimensional dynamic topology learning graph convolution that dynamically models topologies with temporal and channel specificity. It consists of three main parts: pure joint topology learning graph convolution (J-GC), dynamic temporal-wise topology learning graph convolution (DTW-GC), and channel-wise topology learning graph convolution (CW-GC). In particular, DTW-GC uses a dynamic skeleton topology learning method (DSTL) to efficiently model a dynamic skeleton topology that is rich in global spatio-temporal topological features. Combining the multi-dimensional dynamic topology learning graph convolution with multi-scale temporal convolution (MS-TC) yields a graph convolutional network with strong modeling capability. In addition, to supplement the spatial information of the skeleton data, relative joint data and relative bone data are introduced for multi-stream network fusion. The proposed method achieves 92.64% and 89.29% accuracy on the NTU-RGB+D and NTU-RGB+D 120 datasets, respectively, surpassing current state-of-the-art methods.
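The difference between a shared topology and a temporal-wise or channel-wise topology can be illustrated with a minimal sketch. This is not the authors' implementation: the paper learns its topologies, whereas the sketch uses two fixed toy adjacency matrices, and the names `graph_aggregate`, `chain`, and `star` are hypothetical. It only shows how the adjacency used for feature aggregation can depend on the frame index or the channel index.

```python
from typing import Callable, List

Matrix = List[List[float]]          # V x V adjacency; adj[v][u] weights joint u for joint v
Features = List[List[List[float]]]  # x[c][t][v]: channel, frame, joint


def graph_aggregate(x: Features, topology: Callable[[int, int], Matrix]) -> Features:
    """Aggregate joint features, using topology(c, t) as the adjacency
    for channel c and frame t."""
    out = []
    for c, channel in enumerate(x):
        out_channel = []
        for t, frame in enumerate(channel):
            adj = topology(c, t)
            # Each output joint is a weighted sum over the input joints.
            out_channel.append([sum(w * f for w, f in zip(row, frame)) for row in adj])
        out.append(out_channel)
    return out


# Toy input: 2 channels, 2 frames, 3 joints (arbitrary values).
x = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
     [[0.5, 0.5, 0.5], [1.0, 0.0, 1.0]]]

# Two hypothetical fixed topologies standing in for learned ones.
chain = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
star = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.5, 0.0, 0.5]]

# Shared: the same adjacency for every channel and every frame.
shared = graph_aggregate(x, lambda c, t: chain)
# Temporal-wise: the adjacency depends on the frame index t.
temporal_wise = graph_aggregate(x, lambda c, t: [chain, star][t])
# Channel-wise: the adjacency depends on the channel index c.
channel_wise = graph_aggregate(x, lambda c, t: [chain, star][c])
```

In the shared case every frame and channel is aggregated through `chain`, so no frame-specific or channel-specific relation between joints can be expressed. The other two calls differ only in which index selects the adjacency, which is the extra degree of freedom the abstract refers to.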
