Recurrent Region Attention and Video Frame Attention Based Video Action Recognition Network Design

SANG Hai-feng; ZHAO Zi-yu; HE Da-kuo

doi:10.3969/j.issn.0372-2112.2020.06.002

Research article | Updated: 2025-07-16

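- Original title (Chinese): 基于循环区域关注和视频帧关注的视频行为识别网络设计
- English title: Recurrent Region Attention and Video Frame Attention Based Video Action Recognition Network Design
- Acta Electronica Sinica (电子学报), 2020, Vol. 48, No. 6, pp. 1052-1061
- Author affiliations:
  
  1. School of Information Science and Engineering, Shenyang University of Technology, Shenyang, Liaoning 110870
  2. School of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110819
- Funding:
  
  National Natural Science Foundation of China (No. 61773105, No. 61374147); Natural Science Foundation of Liaoning Province (No. 20170540675); Scientific Research Project of the Education Department of Liaoning Province (No. LQGD2017023)
- DOI: 10.3969/j.issn.0372-2112.2020.06.002
  CLC number: TP391
- Published online: 2020-06-25
  
  Print publication: 2020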

SANG Hai-feng, ZHAO Zi-yu, HE Da-kuo. Recurrent Region Attention and Video Frame Attention Based Video Action Recognition Network Design[J]. Acta Electronica Sinica, 2020, 48(6): 1052-1061. DOI: 10.3969/j.issn.0372-2112.2020.06.002.

Abstract

In video frames, complex environmental backgrounds, lighting conditions, and other visual information unrelated to the action introduce substantial redundancy and noise into the spatial features of the action, which degrades recognition accuracy to some extent. To address this, this paper proposes a recurrent region attention cell that captures the action-related regional visual information in the spatial features. Based on the sequential nature of video, a recurrent region attention (RRA) model is then proposed. Second, this paper proposes a video frame attention (VFA) model that highlights the more important frames in the video sequence of a whole action, reducing the interference caused by similar temporal context shared between video sequences of different actions. Finally, this paper presents a network that can be trained end to end: the Recurrent Region Attention and Video Frame Attention based video action recognition Network (RFANet). Experiments on two video action recognition benchmarks, UCF101 and HMDB51, show that RFANet reliably identifies the category of the action in a video. Inspired by the two-stream structure, a two-modality RFANet is also constructed. Under the same training conditions, the two-modality RFANet achieves the best performance on both datasets.
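
The abstract names two attention levels: regions within a frame, and frames within a video. As a rough illustration of that general idea only, the following sketch applies softmax-weighted pooling first over the region features of each frame and then over the resulting frame vectors. It is not the RFANet architecture: the dot-product scoring, the fixed query vectors, and the names `attention_pool` and `describe_video` are all assumptions introduced here, and the recurrence of the paper's region attention cell is omitted entirely.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    features: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Softmax-weighted sum of feature vectors.

    Each vector is scored by its dot product with `query` (an assumed,
    illustrative scoring function, not the one used in the paper).
    Returns (weights, pooled_vector).
    """
    scores = [sum(f * q for f, q in zip(vec, query)) for vec in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, features))
        for d in range(dim)
    ]
    return weights, pooled


def describe_video(
    video: Sequence[Sequence[Sequence[float]]],
    region_query: Sequence[float],
    frame_query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Two-level pooling: regions within each frame, then frames in the video.

    `video[t][r]` is the feature vector of region r in frame t.
    Returns (frame_weights, video_descriptor).
    """
    # Level 1: collapse the regions of every frame into one frame vector.
    frame_vectors = [attention_pool(regions, region_query)[1] for regions in video]
    # Level 2: weight the frame vectors and collapse them into one descriptor.
    return attention_pool(frame_vectors, frame_query)
```

In a trained network the scores would come from learned parameters rather than fixed queries, and the resulting video descriptor would feed a classifier over the action categories. Here the frame weights simply show which frames dominate the pooled descriptor.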
