School of Electronic and Information Engineering, South China University of Technology, Guangzhou, Guangdong 510640, China
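WEI Zhi-chao: male, born in 1996 in Yuzhou, Henan. He is currently a master's student at the School of Electronic and Information Engineering, South China University of Technology. His main research interest is video compressed sensing. E-mail: zcwei2306@outlook.com
YANG Chun-ling (corresponding author): female, born in 1970 in Xinxiang, Henan. She is currently a doctoral supervisor at the School of Electronic and Information Engineering, South China University of Technology. Her main research interests are image/video compression coding and image quality assessment.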
Received: 2022-01-05
Revised: 2022-04-01
Published in print: 2022-11-25
WEI Zhi-chao, YANG Chun-ling. Video Compressed Sensing Reconstruction Network Based on Temporal-Attention Feature Alignment [J]. Acta Electronica Sinica, 2022, 50(11): 2584-2592. DOI: 10.12263/DZXB.20220041.
Existing neural-network reconstruction algorithms for video compressed sensing perform motion compensation through optical-flow alignment or deformable-convolution alignment, which suffer from error accumulation and a limited information perception range; these problems greatly limit their effectiveness and practicality. To adaptively extract global information from the reference frame without introducing extra parameters, this paper proposes using the attention mechanism to realize motion estimation and motion compensation in video compressed sensing reconstruction, and designs the Temporal-Attention Feature Alignment Network (TAFA-Net) to implement it. On this basis, a joint deep reconstruction network based on TAFA-Net (JDR-TAFA-Net) is proposed to achieve high-performance reconstruction of non-key frames. First, TAFA-Net adaptively aligns the reference frames to the current non-key frame; then, a fusion network based on an auto-encoder architecture fully extracts the relevant information from the existing frames to further enhance the reconstruction quality of the non-key frames. Experimental results show that, compared with the state-of-the-art iterative optimization-based method SSIM-InterF-GSR, the proposed method improves the peak signal-to-noise ratio (PSNR) of reconstructed frames by up to 4.74 dB; compared with the state-of-the-art deep learning-based method STM-Net, it improves PSNR by up to 0.64 dB.
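The abstract does not describe the internal structure of TAFA-Net, so the following is only a rough sketch of the general idea it names: using attention, with no learned parameters, to align reference-frame content to the current frame. It assumes plain scaled dot-product attention in which every current-frame position searches all reference-frame positions. The function names `softmax` and `attention_align`, the `temperature` argument, and the flat list-of-vectors feature layout are illustrative assumptions, not the authors' design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_align(cur_feats, ref_feats, ref_values=None, temperature=1.0):
    """Align reference-frame content to the current frame with
    parameter-free dot-product attention (hypothetical helper).

    cur_feats:  list of N query vectors, one per current-frame position
    ref_feats:  list of M key vectors, one per reference-frame position
    ref_values: list of M value vectors to be warped; defaults to ref_feats
    Returns (aligned, weights): N aligned vectors and the N x M attention map.
    """
    if ref_values is None:
        ref_values = ref_feats
    dim = len(cur_feats[0])
    scale = 1.0 / (math.sqrt(dim) * temperature)
    aligned, weights = [], []
    for q in cur_feats:
        # Similarity between this position and every reference position,
        # so the search range covers the whole reference frame.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in ref_feats]
        w = softmax(scores)
        # Soft motion compensation: attention-weighted sum of reference values.
        out = [sum(wj * v[c] for wj, v in zip(w, ref_values))
               for c in range(len(ref_values[0]))]
        aligned.append(out)
        weights.append(w)
    return aligned, weights
```

Because the attention map is computed directly from feature similarity, this form adds no trainable weights; a small `temperature` sharpens the map toward a hard nearest-match selection, while a larger one blends several reference positions.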