Convolutional neural networks involve high computational complexity and consume excessive hardware resources, which greatly increases the hardware deployment cost of deep learning algorithms. Exploiting the information redundancy in sparse activations between layers is a promising way to reduce inference latency and power consumption, with low resource overhead and almost no loss in network accuracy. To solve the low utilization of operation modules caused by coarse-grained control in sparse convolutional neural network accelerators, a sparsity-aware accelerator with flexible parallelism is designed on an FPGA. The convolution operation modules are flexibly scheduled based on the idea of operation clustering, and the parallelism across input channels and output activations is adjusted online. In addition, a parallel propagation mode for input data is designed according to the data consistency that holds when output activations are computed in parallel. The proposed hardware architecture is implemented on a Xilinx VC709. It contains up to 1,024 multiply-accumulate units and provides a peak computing power of 409.6 GOP/s. On the VGG-16 model, the operation speed reaches up to 325.8 GOP/s, which is equivalent to 794.63 GOP/s on an accelerator without sparse activation optimization. Its performance is 4.6 times that of the baseline model.
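The throughput figures above can be related to one another with a little arithmetic. The following is a minimal sketch, not the authors' method: it assumes each multiply-accumulate counts as two operations, and it assumes a 200 MHz clock, which the abstract does not state but which reproduces the 409.6 GOP/s peak for 1,024 units. The helper names `peak_gops` and `sparse_dot` are hypothetical and serve only to illustrate zero-activation skipping in software.

```python
def peak_gops(mac_units: int, clock_mhz: float) -> float:
    """Peak throughput in GOP/s, counting each MAC as 2 operations."""
    return mac_units * 2 * clock_mhz / 1000.0


def sparse_dot(activations, weights):
    """Dot product that skips zero activations.

    Returns (result, macs_issued) so the saved work is visible.
    """
    total = 0
    macs_issued = 0
    for a, w in zip(activations, weights):
        if a == 0:
            continue  # a zero activation contributes nothing, so skip the MAC
        total += a * w
        macs_issued += 1
    return total, macs_issued


MAC_UNITS = 1024            # from the abstract
ASSUMED_CLOCK_MHZ = 200.0   # assumption: not stated in the abstract
MEASURED_GOPS = 325.8       # from the abstract (VGG-16)
DENSE_EQUIV_GOPS = 794.63   # from the abstract

PEAK_GOPS = peak_gops(MAC_UNITS, ASSUMED_CLOCK_MHZ)

# Fraction of the peak that is actually sustained on VGG-16 (about 0.795).
utilization = MEASURED_GOPS / PEAK_GOPS

# If 325.8 GOP/s of issued work does the job of 794.63 GOP/s of dense work,
# roughly this fraction of dense operations is being skipped (about 0.59).
skipped_fraction = 1 - MEASURED_GOPS / DENSE_EQUIV_GOPS

if __name__ == "__main__":
    print(f"peak: {PEAK_GOPS:.1f} GOP/s")
    print(f"utilization: {utilization:.3f}")
    print(f"implied skipped fraction: {skipped_fraction:.2f}")
    print(sparse_dot([0, 2, 0, 3], [5, 1, 7, 2]))
```

Under these assumptions, the dense-equivalent figure can exceed the hardware peak because skipped multiplications by zero are credited as completed work even though they never occupy a multiply-accumulate unit.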