Video Compressed Sensing Reconstruction Network Based on Temporal-Attention Feature Alignment

WEI Zhi-chao; YANG Chun-ling

doi:10.12263/DZXB.20220041

Research article | Updated: 2025-12-08

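- Video Compressed Sensing Reconstruction Network Based on Temporal-Attention Feature Alignment
- Acta Electronica Sinica, 2022, Vol. 50, No. 11, pp. 2584-2592
- Affiliation: School of Electronic and Information Engineering, South China University of Technology, Guangzhou, Guangdong 510640, China
- About the authors:
  WEI Zhi-chao, male, born in 1996 in Yuzhou, Henan. He is currently a master's student at the School of Electronic and Information Engineering, South China University of Technology. His main research interest is video compressed sensing. E-mail: zcwei2306@outlook.com
  YANG Chun-ling (corresponding author), female, born in 1970 in Xinxiang, Henan. She is currently a doctoral supervisor at the School of Electronic and Information Engineering, South China University of Technology. Her main research interests are image/video compression coding and image quality assessment.
- Funding: Natural Science Foundation of Guangdong Province (2019A1515011949)
- DOI: 10.12263/DZXB.20220041
  Chinese Library Classification (CLC) number: TN919.8
- Received: 2022-01-05; Revised: 2022-04-01; Published in print: 2022-11-25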
WEI Zhi-chao, YANG Chun-ling. Video Compressed Sensing Reconstruction Network Based on Temporal-Attention Feature Alignment[J]. Acta Electronica Sinica, 2022, 50(11): 2584-2592. DOI: 10.12263/DZXB.20220041.

Abstract

Existing neural-network reconstruction algorithms for video compressed sensing rely on optical-flow alignment or deformable-convolution alignment for motion compensation. These approaches suffer from error accumulation and a limited information perception range, which greatly limits their effectiveness and practicality. To adaptively extract global information from the reference frame without introducing extra parameters, this paper proposes using the attention mechanism to perform motion estimation and motion compensation in video compressed sensing reconstruction, and designs the temporal-attention feature alignment network (TAFA-Net) to implement the idea. On this basis, a joint deep reconstruction network based on TAFA-Net (JDR-TAFA-Net) is proposed to achieve high-performance reconstruction of non-key frames. First, TAFA-Net aligns the reference frames to the current non-key frame; then a fusion network built on an auto-encoder architecture fully extracts the information in the existing frames to further enhance the reconstruction quality of the non-key frames. Experimental results show that, compared with the state-of-the-art iterative optimization-based method SSIM-InterF-GSR, the proposed method improves the peak signal-to-noise ratio (PSNR) of reconstructed frames by up to 4.74 dB, and compared with the state-of-the-art deep learning-based method STM-Net, it improves PSNR by up to 0.64 dB.
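The abstract does not give the internal structure of TAFA-Net, so the following is only a minimal sketch of the general idea it describes: aligning reference-frame features to the current frame with attention, using no learnable parameters and letting every current-frame position look at every reference-frame position. The function name `attention_align`, the flat list-of-vectors feature layout, and the scaled dot-product similarity are all assumptions made for illustration, not the authors' design.

```python
import math


def attention_align(cur_feat, ref_feat):
    """Align reference features to the current frame with parameter-free attention.

    Hypothetical illustration, not the published TAFA-Net.
    cur_feat, ref_feat: lists of N feature vectors (one per spatial
    position), each a list of C floats.
    Returns N aligned vectors, each a softmax-weighted average of all
    reference vectors.
    """
    c = len(cur_feat[0])
    scale = 1.0 / math.sqrt(c)  # usual dot-product attention scaling
    aligned = []
    for q in cur_feat:
        # Similarity between this current-frame position and every
        # reference-frame position (global search range).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in ref_feat]
        # Numerically stable softmax over the reference positions.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Motion compensation step: blend reference features by weight.
        aligned.append(
            [sum(w * k[j] for w, k in zip(weights, ref_feat)) for j in range(c)]
        )
    return aligned


# Toy usage: three positions with two-channel features.
ref = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cur = [[0.0, 5.0], [5.0, 5.0], [5.0, 0.0]]
aligned = attention_align(cur, ref)
```

Because each output vector is a convex combination of reference vectors, no offset or flow field is predicted explicitly; the attention weights play the role of a soft motion estimate. A practical network would presumably restrict or factorize the all-pairs comparison to keep memory manageable, which this sketch does not attempt.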


