Acta Electronica Sinica


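Video Compressed Sensing Reconstruction Network Based on Temporal-Attention Feature Alignment

WEI Zhi-chao, YANG Chun-ling

  1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou, Guangdong 510640, China
  • Received: 2022-01-05 Revised: 2022-04-01
  • About the authors: WEI Zhi-chao, male, born in 1996 in Yuzhou, Henan. He is a master's student at the School of Electronic and Information Engineering, South China University of Technology. His main research interest is video compressed sensing. E-mail: zcwei2306@outlook.com
    YANG Chun-ling (corresponding author), female, born in 1970 in Xinxiang, Henan. She is a doctoral supervisor at the School of Electronic and Information Engineering, South China University of Technology. Her main research interests are image/video compression coding and image quality assessment. E-mail: eeclyang@scut.edu.cn
  • Funding:
    Natural Science Foundation of Guangdong Province (2019A1515011949)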

Abstract:

Existing neural-network-based video compressed sensing reconstruction algorithms perform motion compensation by optical flow alignment or deformable convolution alignment. These approaches suffer from error accumulation and a limited information perception range, which greatly limits their effectiveness and practicality. To adaptively extract global information from the reference frame without introducing extra parameters, this paper proposes using the attention mechanism to realize motion estimation and motion compensation in video compressed sensing reconstruction, and designs the Temporal-Attention Feature Alignment Network (TAFA-Net) to implement it. On this basis, a Joint Deep Reconstruction Network Based on TAFA-Net (JDR-TAFA-Net) is proposed to achieve high-performance reconstruction of non-key frames. First, TAFA-Net aligns the reference frames to the current non-key frame; then, a fusion network based on the auto-encoder architecture fully extracts the information in the existing frames to enhance the reconstruction quality of the non-key frames. Experimental results show that the proposed method improves the peak signal-to-noise ratio (PSNR) of reconstructed frames by up to 4.74 dB compared with the best iterative optimization-based method, SSIM-InterF-GSR, and by up to 0.64 dB compared with the best deep learning-based method, STM-Net.
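The idea of aligning a reference frame to the current frame through attention, without extra parameters, can be illustrated with a minimal sketch. This is not the TAFA-Net architecture, whose details are not given in the abstract; it is plain scaled dot-product cross-attention over feature maps, and the function name `attention_align`, the `(C, H, W)` feature layout, and the `sqrt(C)` scaling are all assumptions made for illustration.

```python
import numpy as np

def attention_align(cur_feat, ref_feat):
    """Hypothetical sketch: align ref_feat to cur_feat with dot-product attention.

    cur_feat, ref_feat: arrays of shape (C, H, W).
    Returns aligned reference features of shape (C, H, W).
    No learnable parameters are involved.
    """
    c, h, w = cur_feat.shape
    q = cur_feat.reshape(c, h * w).T   # (N, C) queries from the current frame
    k = ref_feat.reshape(c, h * w).T   # (N, C) keys from the reference frame
    v = k                              # values are the reference features themselves

    # Similarity of every current position to every reference position,
    # so each output position can draw on the whole reference frame.
    scores = q @ k.T / np.sqrt(c)                  # (N, N)
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1

    aligned = weights @ v                          # (N, C) weighted sum of reference features
    return aligned.T.reshape(c, h, w)
```

Because every output position is a convex combination of reference features taken from all spatial positions, the sketch has a global perception range, in contrast to the local neighborhoods used by optical flow or deformable convolution. The cost is an N×N weight matrix, with N = H·W, so a practical network would likely apply this to downsampled features or blocks.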

Key words: video compressed sensing, neural network, temporal attention, feature alignment, motion compensation, deep reconstruction

CLC number: