

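1. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan 450001
2. School of Information Engineering, Zhengzhou University, Zhengzhou, Henan 450001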
Received: 1 December 2020
Revised: 17 August 2021
Published: 25 October 2022
LI Bin,QI Yan-rong,ZHOU Qing-lei.Design and Optimization of Target Detection Accelerator Based on Winograd Algorithm[J].ACTA ELECTRONICA SINICA,2022,50(10):2387-2397. DOI: 10.12263/DZXB.20201371.
Convolutional neural networks (CNNs) are widely used in image processing, and CNN-based target detection models such as YOLO have proven to be state of the art in many applications. CNNs place very high demands on computing power and memory bandwidth, so they usually need to be deployed on a dedicated hardware platform. FPGAs have become effective hardware accelerators for CNNs owing to their high performance, low power consumption, and reconfigurability. Previous FPGA-based target detection accelerators mainly used the conventional convolution algorithm, whose high computational complexity limits accelerator performance. This paper therefore designs a target detection accelerator based on the Winograd algorithm. Taking the connections between modules into account, a module fusion strategy fuses the convolutional layer and pooling layer modules, which reduces data movement and off-chip memory accesses and improves the overall performance of the accelerator. Using the YOLO2 model as an example, the data access pattern, pooling kernel, parameter reordering, and data path optimization are analyzed and designed, and the accelerator is deployed on a U280 board. Experimental results show that mAP drops by 0.96% after quantization and that performance reaches 249.65 GOP/s, which is 4.4 times the figure given on the official Xilinx website.
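The complexity reduction that motivates the Winograd algorithm can be illustrated with the smallest standard case, F(2,3), which produces two outputs of a 3-tap one-dimensional convolution with 4 multiplications instead of 6. The sketch below is a generic textbook instance, not the accelerator's implementation: the abstract does not state which tile size the design uses, and the function names `direct_f23` and `winograd_f23` are hypothetical.

```python
def direct_f23(d, g):
    """Direct 1-D convolution: 2 outputs from 4 inputs and a 3-tap filter (6 multiplications)."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return [y0, y1]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs using 4 multiplications between data and filter terms.

    The filter-side terms (g0 + g1 + g2) / 2 and (g0 - g1 + g2) / 2 depend only on
    the weights, so they can be precomputed once per filter (assumed here to be
    done offline in a real design).
    """
    # Filter transform (precomputable).
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform: additions and subtractions only.
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]

    # Element-wise products: the only 4 data-by-filter multiplications.
    m1, m2, m3, m4 = v0 * u0, v1 * u1, v2 * u2, v3 * u3

    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    g = [1.0, 0.0, -1.0]
    print(direct_f23(d, g), winograd_f23(d, g))
```

Both functions return the same pair of outputs; the Winograd form trades multiplications for extra additions, which is attractive on FPGAs where multipliers (DSP blocks) are typically the scarcer resource. Two-dimensional variants nest the same transforms over tiles of the feature map.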