Acta Electronica Sinica

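Design and optimization of target detection accelerator based on Winograd algorithm

Li Bin1,2, Qi Yan-rong1,2, Zhou Qing-lei1,2

  1. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan 450001, China
  2. School of Information Engineering, Zhengzhou University, Zhengzhou, Henan 450001, China
  • Received: 2020-12-01 Revised: 2021-08-17 Published: 2022-07-04
  • Corresponding author: Qi Yan-rong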
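  • About the authors: Li Bin, male, born in 1986 in Zhengzhou, Henan. His main research interests are information security and high-performance computing. E-mail: iebinli@zzu.edu.cn
    Qi Yan-rong (corresponding author), female, born in 1995 in Puyang, Henan. Her main research interests are image processing and high-performance computing. E-mail: 2297149111@qq.com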

Abstract:

Convolutional neural networks (CNNs) are widely used in image processing, and CNN-based target detection models such as YOLO have proven to be state of the art in many applications. Because CNNs place extremely high demands on computing power and memory bandwidth, they usually need to be deployed on dedicated hardware platforms. FPGAs have become effective hardware accelerators for CNNs owing to their high performance, low power consumption, and reconfigurability. Previous FPGA-based target detection accelerators mainly used the traditional convolution algorithm, whose high computational complexity limits accelerator performance. This paper therefore designs a target detection accelerator based on the Winograd algorithm. Taking the connections between modules into account, a module fusion strategy fuses the convolutional-layer and pooling-layer modules, which reduces data movement and off-chip memory accesses and improves the overall performance of the accelerator. Taking the YOLO2 model as an example, the data access pattern, pooling kernel, parameter reordering, and data path optimization are analyzed and designed, and the accelerator is deployed on a U280 board. Experimental results show that mAP decreases by 0.96% after quantization and that performance reaches 249.65 GOP/s, which is 4.4 times the figure given on the Xilinx official website.

Key words: target detection, FPGA, Winograd algorithm, module fusion, YOLO2
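
To illustrate why the Winograd algorithm lowers the cost of convolution, the following is a minimal Python sketch of the textbook one-dimensional case F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of the 6 needed by direct sliding-window computation. The tile size and the function names `direct_conv_1d` and `winograd_f23` are assumptions chosen for illustration; the abstract does not state which Winograd variant the accelerator uses, and this is not the authors' hardware implementation.

```python
def direct_conv_1d(d, g):
    """Direct sliding-window computation: 2 outputs from 4 inputs and a
    3-tap filter, using 2 * 3 = 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs using only 4 multiplications.

    d: 4 input samples, g: 3 filter taps (illustrative 1-D case).
    """
    # Filter transform; for fixed weights this can be precomputed once.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform: additions and subtractions only.
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]

    # Element-wise products: the only 4 multiplications on the data path.
    m0, m1, m2, m3 = u0 * v0, u1 * v1, u2 * v2, u3 * v3

    # Output transform: additions and subtractions only.
    return [m0 + m1 + m2, m1 - m2 - m3]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    g = [0.5, 1.0, -2.0]
    print(direct_conv_1d(d, g))
    print(winograd_f23(d, g))
```

Both functions return the same two values for the same inputs. The saving comes from trading multiplications for additions, and multipliers are typically the scarcer resource on an FPGA.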

CLC number: