

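1. School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
2. National Key Laboratory of Science and Technology on Multi-spectral Information Processing, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China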
Received: 28 December 2020
Revised: 30 March 2021
Published: 25 February 2023
TANG Wei-wei, ZHONG Sheng, LU Jin-yi, et al. Network Structure Optimization and High-Efficiency Implementation of Skynet Based on FPGA[J]. Acta Electronica Sinica, 2023, 51(02): 314-323. DOI: 10.12263/DZXB.20210028.
Object detection algorithms based on convolutional neural networks (CNNs) are robust and accurate, and are widely used in computer vision tasks. However, the large parameter count and heavy computation of CNNs make them difficult to run in real time on edge computing platforms. To address this, this paper optimizes the structure of the object detection network Skynet and implements it in real time on a field programmable gate array (FPGA) using an efficient intra-layer parallel pipelined acceleration architecture. The method prunes Skynet, merges its convolutional and normalization layers, applies Kullback-Leibler (KL) relative entropy and maximum-value quantization to quantize the weights and feature maps to 8-bit fixed point, converts the bias parameters and scaling coefficients to fixed point, and merges the activation operation with the saturation truncation operation. These steps reduce storage and computation while speeding up forward inference. In addition, building on the sliding-window operation, the design computes channels and pixels in parallel and adopts a pipelining strategy for depthwise separable convolution, turning the serial forward-inference structure into a parallel pipelined one and greatly reducing inference time. Experiments on the UA-DETRAC dataset show that the implemented system achieves a recognition accuracy of 0.752 and runs at 115 FPS at an image resolution of 160×160. This is 11 times faster than the CPU and 75% of the GPU's speed, while power consumption is 10.6% of the CPU's and 7.43% of the GPU's. Compared with similar FPGA-based CNN acceleration work, the proposed method performs best in both speed and energy efficiency.
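The abstract names three preprocessing steps without showing them: merging convolution with normalization, maximum-value 8-bit quantization, and merging activation with saturation truncation. The sketch below illustrates the general idea of each in plain Python. It is not the authors' implementation. All function names and the list-based data layout are hypothetical, a ReLU-style activation is assumed, and the KL relative entropy calibration mentioned in the abstract is not shown.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel normalization parameters into conv weights and bias.

    weights: one flattened kernel (list of floats) per output channel.
    All other arguments: one float per output channel.
    After folding, conv(x) alone equals normalization(conv(x)).
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)          # per-channel multiplier
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - mu) * scale + bt)
    return folded_w, folded_b


def quantize_max_abs(values, num_bits=8):
    """Symmetric maximum-value quantization to signed integers.

    The largest magnitude in `values` maps to the largest positive code
    (127 for 8 bits); returns the integer codes and the scale factor.
    """
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def relu_saturate(acc, qmax=127):
    """Apply ReLU and saturation truncation as one clamp to [0, qmax]."""
    return max(0, min(qmax, acc))
```

Because a ReLU output is never negative, the lower bound of the saturation range can be raised to zero, so one clamp does the work of both operations. That is one plausible reading of why merging them saves a step in hardware.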