Acta Electronica Sinica, 2022, Vol. 50, Issue (11): 2584-2592. DOI: 10.12263/DZXB.20220041
WEI Zhi-chao (魏志超), YANG Chun-ling (杨春玲)
Received: 2022-01-05
Revised: 2022-04-01
Online: 2022-11-25
Published: 2022-11-19
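Abstract:
Existing neural-network reconstruction algorithms for video compressed sensing (VCS) perform motion compensation through optical-flow alignment or deformable-convolution alignment. Both approaches suffer from error accumulation and a limited information-perception range, which greatly restricts their effectiveness and practicality. To adaptively extract global information from the reference frame without introducing additional parameters, this paper proposes using the attention mechanism to carry out motion estimation/motion compensation during VCS reconstruction, and designs the Temporal-Attention Feature Alignment Network (TAFA-Net) to implement the idea. Building on it, the Joint Deep Reconstruction Network Based on TAFA-Net (JDR-TAFA-Net) is proposed for high-performance reconstruction of non-key frames. TAFA-Net first produces a frame aligned from the reference frame to the current frame; a fusion network with an autoencoder architecture then fully exploits the information in the available frames to enhance the reconstruction quality of non-key frames. Simulation results show that, compared with the best iterative-optimization algorithm, SSIM-InterF-GSR, the proposed algorithm improves the Peak Signal to Noise Ratio (PSNR) of reconstructed frames by up to 4.74 dB; compared with the best deep-learning algorithm, STM-Net, it improves the PSNR by up to 0.64 dB.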
WEI Zhi-chao, YANG Chun-ling. Video Compressed Sensing Reconstruction Network Based on Temporal-Attention Feature Alignment[J]. Acta Electronica Sinica, 2022, 50(11): 2584-2592.
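The abstract's idea of aligning reference-frame features to the current frame with attention, without extra learnable parameters, can be illustrated with a minimal NumPy sketch. This is not the TAFA-Net architecture itself: the function name `attention_align`, the `(C, H, W)` feature layout, and the `sqrt(C)` scaling are assumptions made for illustration only. Each current-frame position attends to every reference position, so the aligned feature draws on global reference information.

```python
import numpy as np

def attention_align(cur_feat, ref_feat):
    """Illustrative parameter-free attention alignment (hypothetical helper).

    cur_feat, ref_feat: float arrays of shape (C, H, W).
    Returns an array of shape (C, H, W) in which every position is a
    softmax-weighted sum of reference features over ALL reference positions.
    """
    c, h, w = cur_feat.shape
    q = cur_feat.reshape(c, h * w).T      # (N, C) queries from the current frame
    k = ref_feat.reshape(c, h * w).T      # (N, C) keys from the reference frame
    v = k                                 # values are the reference features themselves
    scores = q @ k.T / np.sqrt(c)         # (N, N) similarity between positions
    scores -= scores.max(axis=1, keepdims=True)   # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    aligned = weights @ v                 # (N, C) aligned reference features
    return aligned.T.reshape(c, h, w)
```

When the reference features are a spatially shifted copy of the current features and the features are distinctive enough, the attention weights concentrate on the matching positions and the output approximates the current features, which is the behaviour expected of a motion-compensation step.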
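Level | Output channels of the 5 cascaded convolutional layers | ||||
---|---|---|---|---|---|
l1 | 128 | 96 | 64 | 32 | 16 |
l2 | 128 | 128 | 96 | 64 | 32 |
l3 | 128 | 128 | 128 | 96 | 64 |
l4 | 128 | 128 | 128 | 128 | 96 |
Table 1 Output channels of the cascaded convolutional layers denoted by g(·) at each pyramid level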
Algorithm | Coastguard | Football | Hall | Ice | Soccer |
---|---|---|---|---|---|
SRK=0.5, SRN=0.1 | |||||
Video-MH | 29.83 / 0.85 | 26.25 / 0.72 | 31.85 / 0.95 | 30.07 / 0.92 | 28.75 / 0.84 |
2sMHR | 30.17 / 0.86 | 26.77 / 0.74 | 32.24 / 0.95 | 30.92 / 0.94 | 29.71 / 0.86 |
SSIM-InterF-GSR | 30.25 / 0.87 | 27.22 / 0.76 | 34.46 / 0.97 | 31.74 / 0.95 | 30.34 / 0.87 |
JDR-TAFA-Net | 32.98 / 0.93 | 29.68 / 0.86 | 38.25 / 0.98 | 36.23 / 0.98 | 34.79 / 0.93 |
SRK=0.5, SRN=0.05 | |||||
Video-MH | 28.14 / 0.79 | 24.59 / 0.63 | 30.97 / 0.94 | 28.42 / 0.90 | 26.81 / 0.77 |
2sMHR | 28.68 / 0.81 | 25.12 / 0.66 | 31.37 / 0.94 | 29.08 / 0.91 | 27.53 / 0.80 |
SSIM-InterF-GSR | 28.09 / 0.81 | 25.57 / 0.68 | 33.21 / 0.96 | 28.80 / 0.92 | 27.59 / 0.80 |
JDR-TAFA-Net | 32.23 / 0.91 | 27.78 / 0.79 | 37.26 / 0.98 | 34.83 / 0.97 | 32.25 / 0.90 |
SRK=0.5, SRN=0.01 | |||||
Video-MH | 21.07 / 0.44 | 20.09 / 0.43 | 23.43 / 0.78 | 21.96 / 0.73 | 21.19 / 0.52 |
2sMHR | 21.57 / 0.47 | 20.76 / 0.46 | 24.04 / 0.81 | 22.33 / 0.75 | 21.78 / 0.56 |
SSIM-InterF-GSR | 25.09 / 0.68 | 22.96 / 0.54 | 28.44 / 0.93 | 24.57 / 0.82 | 23.44 / 0.63 |
JDR-TAFA-Net | 30.45 / 0.88 | 24.78 / 0.63 | 35.39 / 0.98 | 30.03 / 0.94 | 27.56 / 0.78 |
Table 2 Reconstruction PSNR (dB) / SSIM of JDR-TAFA-Net versus iterative-optimization VCS algorithms
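The 4.74 dB figure quoted in the abstract can be checked against the table above. The short sketch below averages the PSNR values over the five sequences for each SRN setting and takes the difference between JDR-TAFA-Net and SSIM-InterF-GSR. The variable and function names are illustrative, and reading the abstract's "up to" as the largest per-setting average gain is an interpretation, although the numbers do match it.

```python
# PSNR values (dB) copied from the table above; sequence order is
# Coastguard, Football, Hall, Ice, Soccer. Keys are the SRN settings.
psnr = {
    0.1: {
        "SSIM-InterF-GSR": [30.25, 27.22, 34.46, 31.74, 30.34],
        "JDR-TAFA-Net": [32.98, 29.68, 38.25, 36.23, 34.79],
    },
    0.05: {
        "SSIM-InterF-GSR": [28.09, 25.57, 33.21, 28.80, 27.59],
        "JDR-TAFA-Net": [32.23, 27.78, 37.26, 34.83, 32.25],
    },
    0.01: {
        "SSIM-InterF-GSR": [25.09, 22.96, 28.44, 24.57, 23.44],
        "JDR-TAFA-Net": [30.45, 24.78, 35.39, 30.03, 27.56],
    },
}

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def average_gain(srn):
    """Average PSNR gain (dB) of JDR-TAFA-Net over SSIM-InterF-GSR at one SRN."""
    rows = psnr[srn]
    return mean(rows["JDR-TAFA-Net"]) - mean(rows["SSIM-InterF-GSR"])

gains = {srn: round(average_gain(srn), 2) for srn in psnr}
print(gains)                 # {0.1: 3.58, 0.05: 4.22, 0.01: 4.74}
print(max(gains.values()))   # 4.74
```

The largest average gain occurs at the lowest SRN, 0.01. Individual sequences deviate from the average: on Hall at SRN=0.01 the gap is 35.39 - 28.44 = 6.95 dB.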
Algorithm | SRN=0.037 | SRN=0.018 | SRN=0.009 |
---|---|---|---|
CSVideoNet | 26.87 / 0.81 | 25.09 / 0.77 | 24.23 / 0.74 |
2sRER-VGSR-Net | 31.52 / 0.89 | 29.87 / 0.86 | 28.60 / 0.83 |
PRCVSNet | 31.09 / - | 28.93 / - | 26.86 / - |
STM-Net | 32.50 / 0.93 | 31.14 / 0.91 | 29.98 / 0.89 |
JDR-TAFA-Net | 33.14 / 0.94 | 31.63 / 0.91 | 30.33 / 0.89 |
Table 3 Reconstruction PSNR (dB) / SSIM of JDR-TAFA-Net versus deep-learning VCS algorithms
Algorithm | SRN=0.037 | SRN=0.018 | SRN=0.009 |
---|---|---|---|
CSVideoNet | 0.0094 | 0.0085 | 0.0080 |
2sRER-VGSR-Net | 0.0152 | 0.0153 | 0.0156 |
PRCVSNet | - | - | - |
STM-Net | 0.0087 | 0.0087 | 0.0086 |
JDR-TAFA-Net | 0.0169 | 0.0169 | 0.0169 |
Table 4 Reconstruction time (s) of JDR-TAFA-Net versus deep-learning VCS algorithms
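Tables 3 and 4 together describe a quality/time trade-off against STM-Net. The sketch below pairs the PSNR gain with the ratio of reconstruction times at each SRN setting, using only values from the two tables; the variable names are illustrative.

```python
# Values copied from Table 3 (PSNR, dB) and Table 4 (reconstruction time, s).
srn = [0.037, 0.018, 0.009]
psnr_stm = [32.50, 31.14, 29.98]
psnr_jdr = [33.14, 31.63, 30.33]
time_stm = [0.0087, 0.0087, 0.0086]
time_jdr = [0.0169, 0.0169, 0.0169]

# PSNR gain of JDR-TAFA-Net over STM-Net, and how many times longer it runs.
psnr_gain = [round(j - s, 2) for j, s in zip(psnr_jdr, psnr_stm)]
time_ratio = [round(j / s, 2) for j, s in zip(time_jdr, time_stm)]

for rate, gain, ratio in zip(srn, psnr_gain, time_ratio):
    print(f"SRN={rate}: +{gain:.2f} dB PSNR at {ratio:.2f}x the reconstruction time")
# SRN=0.037: +0.64 dB PSNR at 1.94x the reconstruction time
# SRN=0.018: +0.49 dB PSNR at 1.94x the reconstruction time
# SRN=0.009: +0.35 dB PSNR at 1.97x the reconstruction time
```

The 0.64 dB gain at SRN=0.037 is the maximum cited in the abstract; it comes with roughly twice the reconstruction time of STM-Net.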
Algorithm | BasketballDrive | Cactus | ParkScene |
---|---|---|---|
SRK=0.2, SRN=0.037 | |||
STM-Net | 33.48 / 0.87 | 32.01 / 0.88 | 32.57 / 0.88 |
JDR-TAFA-Net | 33.84 / 0.88 | 32.62 / 0.88 | 32.71 / 0.89 |
SRK=0.2, SRN=0.018 | |||
STM-Net | 31.01 / 0.85 | 31.04 / 0.86 | 31.38 / 0.86 |
JDR-TAFA-Net | 31.87 / 0.86 | 31.57 / 0.87 | 32.04 / 0.88 |
SRK=0.2, SRN=0.009 | |||
STM-Net | 29.16 / 0.81 | 29.83 / 0.83 | 30.07 / 0.83 |
JDR-TAFA-Net | 29.46 / 0.81 | 29.95 / 0.84 | 30.33 / 0.84 |
Table 5 Reconstruction PSNR (dB) / SSIM of JDR-TAFA-Net versus STM-Net on high-definition sequences
Setting | FT-1 | FT-2 | FF | PP | MFE | PSNR / SSIM |
---|---|---|---|---|---|---|
Base | × | × | × | × | × | 27.75 / 0.82 |
① | √ | × | × | × | × | 31.82 / 0.91 |
② | × | √ | × | × | × | 32.04 / 0.92 |
③ | × | √ | √ | × | × | 32.14 / 0.93 |
④ | × | √ | √ | √ | × | 32.26 / 0.93 |
⑤ | × | √ | √ | √ | √ | 32.47 / 0.93 |
Table 6 Effect of different network settings on the reconstruction PSNR (dB) / SSIM of TAFA-Net
Base | TAFA-Net | Fusion network | Measurement loss | PSNR / SSIM |
---|---|---|---|---|
√ | × | × | × | 27.75 / 0.82 |
√ | √ | × | × | 32.47 / 0.93 |
√ | √ | √ | × | 33.00 / 0.93 |
√ | √ | √ | √ | 33.14 / 0.94 |
Table 7 Effect of the fusion network and the measurement loss on reconstruction PSNR (dB) / SSIM
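The ablation above is cumulative, so the contribution of each component can be read as the PSNR difference between consecutive rows. A small sketch of that arithmetic, using only the PSNR values in the table (the step labels are illustrative):

```python
# PSNR (dB) for each cumulative configuration, copied from the ablation table.
steps = [
    ("Base", 27.75),
    ("+ TAFA-Net", 32.47),
    ("+ fusion network", 33.00),
    ("+ measurement loss", 33.14),
]

# Gain contributed by each added component relative to the previous row.
increments = [
    (name, round(cur - prev, 2))
    for (_, prev), (name, cur) in zip(steps, steps[1:])
]
print(increments)
# [('+ TAFA-Net', 4.72), ('+ fusion network', 0.53), ('+ measurement loss', 0.14)]
```

Most of the total 5.39 dB improvement over the base configuration comes from TAFA-Net (4.72 dB); the fusion network and the measurement loss add smaller refinements.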
1 | DONOHO D L. Compressed sensing[J]. IEEE Transactions on Information Theory, 2006, 52(4): 1289-1306. |
2 | TRAMEL E W, FOWLER J E. Video compressed sensing with multihypothesis[C]//2011 Data Compression Conference. Snowbird: IEEE, 2011: 193-202. |
3 | OU W, YANG C, LI W, et al. A two-stage multihypothesis reconstruction scheme in compressed video sensing[C]//2016 IEEE International Conference on Image Processing. Phoenix: IEEE, 2016: 2494-2498. |
4 | LI W, YANG C, MA L. A multihypothesis-based residual reconstruction scheme in compressed video sensing[C]//2017 IEEE International Conference on Image Processing. Beijing: IEEE, 2017: 2766-2770. |
5 | HE Zhi-jie, YANG Chun-ling, TANG Rui-dong. Research on structural similarity based inter-frame group sparse representation for compressed video sensing[J]. Acta Electronica Sinica, 2018, 46(3): 544-553. (in Chinese) |
6 | XU K, REN F. CSVideoNet: A real-time end-to-end learning framework for high-frame-rate video compressive sensing[C]//2018 IEEE Winter Conference on Applications of Computer Vision. Lake Tahoe: IEEE, 2018: 1680-1688. |
7 | XUAN Yun-yi, YANG Chun-ling. Two-stage recursive enhancement reconstruction based on video inter-frame group sparse representation in compressed video sensing[J]. Acta Electronica Sinica, 2021, 49(3): 435-442. (in Chinese) |
8 | LING X, YANG C, PEI H. Compressed video sensing network based on alignment prediction and residual reconstruction[C]//2020 IEEE International Conference on Multimedia and Expo. London: IEEE, 2020: 1-6. |
9 | WEI Z, YANG C, XUAN Y. Efficient video compressed sensing reconstruction via exploiting spatial-temporal correlation with measurement constraint[C]//2021 IEEE International Conference on Multimedia and Expo. Shenzhen: IEEE, 2021: 1-6. |
10 | DAI J, QI H, XIONG Y, et al. Deformable convolutional networks[C]//2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 764-773. |
11 | CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//European Conference on Computer Vision. Virtual: Springer, 2020: 213-229. |
12 | CHEN H, WANG Y, GUO T, et al. Pre-trained image processing transformer[C]//2021 IEEE Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 12299-12310. |
13 | LIU Z, LIN Y, CAO Y, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C]//2021 IEEE International Conference on Computer Vision. Virtual: IEEE, 2021: 10012-10022. |
14 | HUANG Z, WANG X, HUANG L, et al. CCNet: Criss-cross attention for semantic segmentation[C]//2019 IEEE International Conference on Computer Vision. Seoul: IEEE, 2019: 603-612. |
15 | SHI W, JIANG F, ZHANG S, et al. Deep networks for compressed image sensing[C]//2017 IEEE International Conference on Multimedia and Expo. Hong Kong: IEEE, 2017: 877-882. |
16 | PEI Han-qi, YANG Chun-ling, WEI Zhi-chao, CAO Yan. Image compressive sensing reconstruction network based on iterative SPL theory[J]. Acta Electronica Sinica, 2021, 49(6): 1195-1203. (in Chinese) |
17 | GAN L. Block compressed sensing of natural images[C]//2007 International Conference on Digital Signal Processing. Cardiff: IEEE, 2007: 403-406. |
18 | WANG P, CHEN P, YUAN Y, et al. Understanding convolution for semantic segmentation[C]//2018 IEEE Winter Conference on Applications of Computer Vision. Lake Tahoe: IEEE, 2018: 1451-1460. |
19 | ARBELAEZ P, MAIRE M, FOWLKES C, et al. Contour detection and hierarchical image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 33(5): 898-916. |
20 | SOOMRO K, ZAMIR A R, SHAH M. UCF101: A dataset of 101 human actions classes from videos in the wild[EB/OL]. [2022-04-01]. |
21 | KINGMA D P, BA J. Adam: A method for stochastic optimization[EB/OL]. [2022-04-01]. arXiv:1412.6980. |
22 | SULLIVAN G J, OHM J R, HAN W J, et al. Overview of the high efficiency video coding(HEVC) standard[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2012, 22(12): 1649-1668. |