Acta Electronica Sinica, 2021, Vol. 49, Issue (4): 729-735. DOI: 10.12263/DZXB.20200409

• Academic Paper •

Reconfigurable Convolutional Neural Network Accelerator Based on ZYNQ

LIU Jie1, GE Yi-fan1, TIAN Ming2, MA Li-qiang1

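  1. College of Electrical and Electronic Engineering, Harbin University of Science and Technology, Harbin, Heilongjiang 150080, China;
    2. Heilongjiang Branch of China Telecom, Harbin, Heilongjiang 150000, China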
  • Received: 2020-04-29 Revised: 2020-07-14 Online: 2021-04-25 Published: 2021-04-25
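  • About the authors: LIU Jie, female, was born in 1980 in Qiqihar, Heilongjiang. She received her bachelor's, master's, and doctoral degrees in engineering from Harbin University of Science and Technology, Harbin Institute of Technology, and Harbin University of Science and Technology in 2002, 2005, and 2013, respectively. She is currently an associate professor at the College of Electrical and Electronic Engineering, Harbin University of Science and Technology, working mainly on FPGA design and deep learning algorithm optimization. E-mail: liujie@hrbust.edu.cn; GE Yi-fan, male, was born in 1997 in Liaocheng, Shandong. He is a master's student at the College of Electrical and Electronic Engineering, Harbin University of Science and Technology; his main research interests are the optimization and application of deep learning algorithms.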
  • Supported by:
    National Natural Science Foundation of China (No.5177090001)

Reconfigurable Convolutional Network Accelerator Based on ZYNQ

LIU Jie1, GE Yi-fan1, TIAN Ming2, MA Li-qiang1   

  1. College of Electrical and Electronic Engineering, Harbin University of Science and Technology, Harbin, Heilongjiang 150080, China;
    2. Heilongjiang Branch of China Telecom, Harbin, Heilongjiang 150000, China
  • Received: 2020-04-29 Revised: 2020-07-14 Online: 2021-04-25 Published: 2021-04-25

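Abstract: Convolution in convolutional neural networks is computationally complex and expensive, and running the algorithm on CPUs and GPUs is constrained by latency and power consumption. To raise the computing rate and lower the power consumption of existing hardware platforms, a ZYNQ-based reconfigurable neural network acceleration system with high throughput and low power consumption is designed. To make full use of the computing resources, a loop-optimization circuit for the convolution operation is explored; to reduce bandwidth demand, a special arrangement of the data in memory is designed. Taking the VGG16 network as an example and using ZYNQ to accelerate the system, an effective computing performance of 62.00 GOPS is achieved, which is 2.58 times that of the GPU and 6.88 times that of the CPU, and the MAC utilization reaches 98.20%, approaching the theoretical value of the Roofline model. The accelerator's computing power consumption is 2.0 W and its energy efficiency is 31.00 GOPS/W, which is 112.77 times that of the GPU and 334.41 times that of the CPU.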

Keywords: FPGA, convolutional neural network, Roofline model, hardware acceleration

Abstract: Convolution operations in convolutional neural networks are highly complex and computationally intensive, and latency and power consumption limit the algorithm when it runs on CPUs and GPUs. To increase the computing rate and reduce the power consumption of existing hardware platforms, a ZYNQ-based reconfigurable neural network acceleration system with high throughput and low power consumption is presented. To make full use of computing resources, a loop-optimization circuit for the convolution operation is explored; to reduce bandwidth access, a special arrangement of the data in memory is designed. Taking the VGG16 network as an example and using ZYNQ to accelerate the system, an effective computing performance of 62.00 GOPS was reached, which was 2.58 times that of the GPU and 6.88 times that of the CPU. The MAC utilization was as high as 98.20%, close to the theoretical value of the Roofline model. The computing power consumption of the accelerator was 2.0 W, and the energy efficiency was 31.00 GOPS/W, which was 112.77 times that of the GPU and 334.41 times that of the CPU.

Key words: FPGA, convolutional neural network, Roofline model, hardware acceleration
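
The figures reported in the abstract can be cross-checked with a few lines of arithmetic. The Python sketch below is illustrative only: it assumes that MAC utilization means achieved throughput divided by peak throughput, and it uses the standard Roofline bound, the minimum of peak throughput and bandwidth times operational intensity. The function and variable names are hypothetical, and the GPU and CPU throughput values are back-computed from the stated speedup ratios rather than taken from the abstract.

```python
def energy_efficiency(gops: float, watts: float) -> float:
    """Energy efficiency in GOPS/W."""
    return gops / watts


def roofline_gops(peak_gops: float, bandwidth_gb_s: float,
                  intensity_ops_per_byte: float) -> float:
    """Standard Roofline bound: the lower of the compute and memory ceilings."""
    return min(peak_gops, bandwidth_gb_s * intensity_ops_per_byte)


# Values stated in the abstract.
fpga_gops = 62.00        # effective throughput on VGG16
fpga_watts = 2.0         # computing power consumption
mac_utilization = 0.9820

# 62.00 GOPS / 2.0 W = 31.0 GOPS/W, matching the reported energy efficiency.
fpga_gops_per_watt = energy_efficiency(fpga_gops, fpga_watts)

# Assumption: utilization = achieved / peak, so peak = achieved / utilization.
implied_peak_gops = fpga_gops / mac_utilization

# Derived from the stated 2.58x and 6.88x speedups (not reported directly).
gpu_gops = fpga_gops / 2.58
cpu_gops = fpga_gops / 6.88

print(f"energy efficiency: {fpga_gops_per_watt:.2f} GOPS/W")
print(f"implied peak:      {implied_peak_gops:.2f} GOPS")
print(f"implied GPU / CPU: {gpu_gops:.2f} / {cpu_gops:.2f} GOPS")
```

As a usage example, `roofline_gops(implied_peak_gops, 4.0, 10.0)` gives 40.0 for a hypothetical memory bandwidth of 4 GB/s and an operational intensity of 10 ops/byte, which is a memory-bound point. Raising the intensity to 100 ops/byte returns the compute ceiling instead. Reducing bandwidth demand through the data arrangement in memory, as the abstract describes, is what raises the operational intensity.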

CLC Number: