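1. School of Electrical and Electronic Engineering, Harbin University of Science and Technology, Harbin, Heilongjiang 150080
2. China Telecom Heilongjiang Branch, Harbin, Heilongjiang 150000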
Print publication: 2021
LIU Jie, GE Yi-fan, TIAN Ming, et al. Reconfigurable Convolutional Network Accelerator Based on ZYNQ[J]. Acta Electronica Sinica, 2021, 49(4): 729-735. DOI: 10.12263/DZXB.20200409.
Convolution in convolutional neural networks is computationally complex and demanding, and running these algorithms on CPUs and GPUs is limited by latency and power consumption. To raise the computing rate and lower the power consumption of existing hardware platforms, this paper presents a ZYNQ-based reconfigurable neural network acceleration system with high throughput and low power consumption. To make full use of computing resources, a loop-optimization circuit for the convolution operation is explored; to reduce bandwidth demand, a special arrangement of data in memory is designed. With VGG16 as the example network, the ZYNQ-based accelerator reaches an effective computing performance of 62.00 GOPS, 2.58 times that of the GPU and 6.88 times that of the CPU. Its MAC utilization reaches 98.20%, close to the theoretical value of the Roofline model. The accelerator's computing power consumption is 2.0 W, and its energy efficiency is 31.00 GOPS/W, 112.77 times that of the GPU and 334.41 times that of the CPU.
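The headline figures in the abstract are linked by simple arithmetic: energy efficiency is throughput divided by power, so 62.00 GOPS at 2.0 W gives 31.00 GOPS/W. The sketch below checks that relation and backs out the baseline throughput, efficiency, and power that the reported ratios would imply. It assumes the ratios are plain quotients of accelerator over baseline; the function and constant names are hypothetical, and the derived baseline values are back-of-envelope estimates, not measurements reported by the authors.

```python
# Figures quoted in the abstract.
THROUGHPUT_GOPS = 62.00
POWER_W = 2.0
SPEEDUP_OVER = {"GPU": 2.58, "CPU": 6.88}
EFFICIENCY_GAIN_OVER = {"GPU": 112.77, "CPU": 334.41}


def energy_efficiency(throughput_gops: float, power_w: float) -> float:
    """Energy efficiency in GOPS/W: throughput divided by power."""
    return throughput_gops / power_w


def implied_baseline(name: str) -> dict:
    """Back out baseline figures from the reported ratios.

    Assumption: each ratio is accelerator value / baseline value.
    The results are estimates, not numbers stated in the paper.
    """
    accelerator_eff = energy_efficiency(THROUGHPUT_GOPS, POWER_W)
    throughput = THROUGHPUT_GOPS / SPEEDUP_OVER[name]
    efficiency = accelerator_eff / EFFICIENCY_GAIN_OVER[name]
    return {
        "throughput_gops": throughput,
        "gops_per_w": efficiency,
        "power_w": throughput / efficiency,
    }


if __name__ == "__main__":
    print(f"accelerator: {energy_efficiency(THROUGHPUT_GOPS, POWER_W):.2f} GOPS/W")
    for platform in ("GPU", "CPU"):
        est = implied_baseline(platform)
        print(
            f"{platform}: ~{est['throughput_gops']:.2f} GOPS, "
            f"~{est['gops_per_w']:.4f} GOPS/W, ~{est['power_w']:.1f} W"
        )
```

The implied baseline power is the least certain of the three outputs, since it compounds rounding in both reported ratios.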