Acta Electronica Sinica (电子学报) ›› 2022, Vol. 50 ›› Issue (7): 1579-1585. DOI: 10.12263/DZXB.20210416

• Academic Paper •

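Skeleton Action Recognition Based on Multi-Stream Spatial Attention Graph Convolutional SRU Network

ZHAO Jun-nan, SHE Qing-shan, MENG Ming, CHEN Yun

  1. College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China
  • Received: 2021-03-29 Revised: 2021-12-30 Online: 2022-07-25 Published: 2022-07-30
  • About the authors: ZHAO Jun-nan, male, was born in Huzhou, Zhejiang, in December 1996. He is currently a master's student at the College of Automation, Hangzhou Dianzi University. His research interests include 3D skeleton-based action recognition and human pose estimation. E-mail: 663261972@qq.com
    SHE Qing-shan, male, was born in Songzi, Hubei, in February 1980. He is currently a professor at Hangzhou Dianzi University. His main research interests include machine learning and brain-computer interfaces, rehabilitation robotics, and image/video processing and analysis. E-mail: qsshe@hdu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61871427); Key Project of the Zhejiang Provincial Natural Science Foundation (LZ22F010003)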

Abstract:

Skeleton-based action recognition has attracted increasing attention. To address the slow inference speed and single data modality of most existing algorithms, this paper proposes a lightweight and efficient method. The network embeds the graph convolution operator in the simple recurrent unit (SRU) to construct a graph convolutional SRU (GC-SRU), which captures the spatio-temporal information of the data. To strengthen the distinction between nodes, a spatial attention network and multi-stream data fusion are then used to extend GC-SRU into the multi-stream spatial attention graph convolutional SRU (MSAGC-SRU). Finally, the proposed method is evaluated on two public datasets. Experimental results show that the method reaches a classification accuracy of 93.1% on Northwestern-UCLA, with a model cost of 4.4G FLOPs. On NTU RGB+D, the accuracy reaches 92.7% and 87.3% under the cross-view (CV) and cross-subject (CS) evaluation protocols, respectively, with a model cost of 21.3G FLOPs. The proposed model thus achieves a good trade-off between computational efficiency and classification accuracy.

Key words: action recognition, graph convolution, attention mechanism, data fusion
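
The abstract describes GC-SRU as an SRU whose internal transforms are replaced by graph convolutions. The sketch below illustrates one plausible reading of that idea in NumPy and is not the authors' implementation. It assumes the commonly cited SRU gate equations (forget gate, reset gate, highway connection) and a symmetrically normalized adjacency matrix; the function names, the toy three-joint skeleton, and the random weights are all hypothetical. Spatial attention and multi-stream fusion are left out.

```python
import numpy as np


def normalize_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2} (assumed normalization, not from the paper)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gc_sru_forward(X, A_norm, W, W_f, b_f, W_r, b_r):
    """Run a hypothetical GC-SRU layer over a skeleton sequence.

    X      : (T, N, C) array, T frames, N joints, C features per joint
    A_norm : (N, N) normalized adjacency of the skeleton graph
    W, W_f, W_r : (C, C) weights; b_f, b_r : (C,) biases
    Returns H : (T, N, C) hidden states.
    """
    T, N, C = X.shape
    # The graph convolutions depend only on the input, not on the previous
    # state, so they are computed for all frames at once (the property that
    # makes SRU-style cells cheap to run).
    AX = np.einsum("ij,tjc->tic", A_norm, X)
    X_tilde = AX @ W                         # candidate features
    F = sigmoid(AX @ W_f + b_f)              # forget gate
    R = sigmoid(AX @ W_r + b_r)              # reset (highway) gate

    c = np.zeros((N, C))
    H = np.empty_like(X_tilde)
    for t in range(T):
        # Only this element-wise recurrence is sequential in time.
        c = F[t] * c + (1.0 - F[t]) * X_tilde[t]
        H[t] = R[t] * np.tanh(c) + (1.0 - R[t]) * X[t]
    return H


# Toy usage: a 3-joint chain (0-1-2), 4 frames, 2 features per joint.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_norm = normalize_adjacency(A)
T, N, C = 4, 3, 2
X = rng.standard_normal((T, N, C))
W, W_f, W_r = (0.1 * rng.standard_normal((C, C)) for _ in range(3))
b_f, b_r = np.zeros(C), np.zeros(C)
H = gc_sru_forward(X, A_norm, W, W_f, b_f, W_r, b_r)
```

In this sketch the highway term requires the input and hidden feature sizes to match; a real model would presumably add a projection when they differ.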

CLC Number: