A Convolutional Neural Network Accelerator Based on Inter-layer Feature Map Compression

XIE Liangbo; CHEN Lin; ZHOU Mu; BU Wenjie

doi:10.12263/DZXB.20260134


Updated: 2026-06-09

- ACTA ELECTRONICA SINICA Pages: 1-17(2026)
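- Author affiliations:
  
  1. School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065
  2. School of Electronic Science and Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065
- Funding: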
  
  Chongqing Natural Science Foundation Project (CSTB2023NSCQ-MSX0249; CSTB2023NSCQ-LZX0126); Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN202300615)
- DOI: 10.12263/DZXB.20260134
  CLC: TP302; TN47
- Received: 14 February 2026; Accepted: 19 March 2026; Online First: 09 June 2026

XIE Liangbo, CHEN Lin, ZHOU Mu, et al. A Convolutional Neural Network Accelerator Based on Inter-layer Feature Map Compression[J/OL]. ACTA ELECTRONICA SINICA, 2026, 1-17. DOI: 10.12263/DZXB.20260134.

Abstract

With the rapid development of deep learning, convolutional neural networks (CNNs) have demonstrated exceptional performance in image recognition and processing tasks. However, as network depth increases, the massive volume of intermediate data places heavy pressure on the on-chip memory and memory access bandwidth of hardware accelerators. The increasingly prominent “memory wall” problem severely constrains overall system throughput and energy efficiency. Existing inter-layer feature data compression methods fall mainly into two categories. The first focuses on lightweight hardware implementation: its area overhead is low, but its compression ratio is constrained by the limited algorithmic complexity, making it difficult to relieve off-chip bandwidth pressure in high-throughput scenarios. The second pursues compression performance but incurs excessive hardware area overhead, which makes deployment on resource-constrained edge devices difficult. To address these challenges, this paper proposes a statistics-aware hybrid compression method for CNN inter-layer feature maps. Its core design goal is to achieve a high compression ratio at low hardware overhead, resolving the difficulty of balancing compression performance against resource consumption. By exploiting the sparsity and distribution characteristics of the data and adopting an “offline analysis, online compression” hardware-software co-design mechanism, the method achieves hardware-friendly data coding. In the offline analysis stage, statistical analysis of the CNN inter-layer feature data generates the required coding tables and baseline values. In the online compression stage, the feature data are classified into zero-value and non-zero-value data: zero-value data are coded with an enhanced zero run-length encoding combined with entropy coding, and non-zero data are coded with dynamic baseline-delta encoding. This differentiated coding mechanism reduces hardware area overhead by 58.7% to 72.9% while maintaining a high compression ratio, addressing the high hardware complexity of traditional compression algorithms. The compression performance of the proposed method is evaluated systematically across network structures and data formats through compression experiments on the inter-layer feature maps of four representative CNNs: AlexNet, VGG16, ResNet34, and MobileNetV2. Compared with similar studies, the proposed method improves the compression ratio by up to 58.5% under the INT8 quantization format and by up to 36.7% under the FP32/FP16 formats. With the VGG16 model deployed on the ALINX AXU5EV target platform, the accelerator based on the proposed compression method reaches an inference throughput of 242.8 GOPS; compared with the compression-free baseline architecture, computing performance and energy efficiency improve by 41.4% and 27.8%, respectively. These results demonstrate that the proposed method balances compression ratio and hardware overhead for CNN inter-layer feature map compression and provides a new solution for CNN accelerator design in resource-constrained edge scenarios.
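The split between zero-value and non-zero-value data described in the abstract can be pictured with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the coding tables, the entropy coder, or how the baseline is selected. The function names `offline_baseline`, `compress`, and `decompress`, the tuple token format, and the choice of the median as the baseline are all assumptions made for illustration, and the entropy coding of run lengths is omitted entirely.

```python
from statistics import median
from typing import List, Tuple

# Hypothetical token format: ("Z", run_length) for a run of zeros,
# ("D", delta) for one non-zero value stored as an offset from the baseline.
Token = Tuple[str, int]


def offline_baseline(calibration: List[int]) -> int:
    """Offline stage (assumed): pick a baseline from calibration data.

    The median of the non-zero values is used here only as a simple stand-in.
    """
    nonzeros = [v for v in calibration if v != 0]
    return int(median(nonzeros)) if nonzeros else 0


def compress(values: List[int], baseline: int) -> List[Token]:
    """Online stage: zeros become run-length tokens, non-zeros become deltas."""
    tokens: List[Token] = []
    run = 0
    for v in values:
        if v == 0:
            run += 1
            continue
        if run:  # flush the pending zero run before the non-zero value
            tokens.append(("Z", run))
            run = 0
        tokens.append(("D", v - baseline))
    if run:  # trailing zeros
        tokens.append(("Z", run))
    return tokens


def decompress(tokens: List[Token], baseline: int) -> List[int]:
    """Inverse of compress: expand zero runs and add the baseline back."""
    out: List[int] = []
    for kind, x in tokens:
        if kind == "Z":
            out.extend([0] * x)
        else:
            out.append(x + baseline)
    return out


# Toy INT8-like activations after ReLU: mostly zeros, non-zeros clustered.
feature_map = [0, 0, 0, 12, 14, 0, 0, 0, 0, 0, 13, 0, 11]
baseline = offline_baseline(feature_map)
tokens = compress(feature_map, baseline)
assert decompress(tokens, baseline) == feature_map  # lossless round trip
print(baseline, tokens)
```

For this sample, thirteen values become seven tokens, and every delta stays close to zero, which is the property that would let an encoder use narrow fields. A hardware design would pack such tokens into bit fields rather than tuples.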

