Network Structure Optimization and High-Efficiency Implementation of Skynet Based on FPGA

TANG Wei-wei; ZHONG Sheng; LU Jin-yi; YAN Lu-xin; TAN Fu-zhong; ZHOU Xu; XU Wen-hui

doi:10.12263/DZXB.20210028

PAPERS | Updated: 2025-12-08

- ACTA ELECTRONICA SINICA Vol. 51, Issue 2, Pages: 314-323 (2023)
- Author affiliations:
  
  1. School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, Hubei 430074
  2. National Key Laboratory of Science & Technology on Multi-Spectral Information Processing, Huazhong University of Science and Technology, Wuhan, Hubei 430074
- Funding:
  
  National Natural Science Foundation of China (61806081)
- DOI: 10.12263/DZXB.20210028
  CLC: TN47
- Received: 28 December 2020
  
  Revised: 30 March 2021
  
  Published: 25 February 2023

TANG Wei-wei, ZHONG Sheng, LU Jin-yi, et al. Network Structure Optimization and High-Efficiency Implementation of Skynet Based on FPGA[J]. ACTA ELECTRONICA SINICA, 2023, 51(02): 314-323. DOI: 10.12263/DZXB.20210028.

Abstract

Object detection algorithms based on convolutional neural networks (CNNs) are robust and accurate, and are widely used in computer vision tasks. However, the large parameter count and computational cost of CNNs make them difficult to run in real time on edge computing platforms. To address this, this paper optimizes the structure of the object detection network Skynet and implements it in real time on a field programmable gate array (FPGA), using an efficient intra-layer parallel pipeline acceleration architecture. The method prunes Skynet, merges its convolutional and normalization layers, and applies Kullback-Leibler (KL) relative entropy and maximum-value quantization to quantize the weights and feature maps to 8 bit fixed point. It also converts the bias parameters and scaling coefficients to fixed point and merges the activation operation with the saturation truncation operation, which reduces storage and computation while speeding up forward inference. In addition, building on the sliding window operation, the design computes channels and pixels in parallel and adds a pipeline strategy for depthwise separable convolution, turning the serial forward inference structure into a parallel pipelined one and greatly reducing forward inference time. Experiments on the UA-DETRAC dataset show that the implemented system achieves a recognition accuracy of 0.752 and a frame rate of 115 FPS at an image resolution of 160×160, which is 11 times faster than the CPU and 75% of the GPU speed. Its power consumption is 10.6% of the CPU's and 7.43% of the GPU's. Compared with similar FPGA-based CNN acceleration work, the proposed method performs best in both speed and energy efficiency.
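
Two of the steps named in the abstract, merging a convolutional layer with its normalization layer and maximum-value 8 bit quantization followed by a combined activation and saturation step, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the flattened per-channel weight layout, the use of ReLU as the activation, and the symmetric signed 8 bit range are all assumptions made for illustration, and the KL relative entropy calibration is not shown.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel normalization parameters into conv weights and bias.

    Hypothetical layout: weights[c] is the flattened kernel of output
    channel c. After folding, conv(x) with the returned parameters equals
    gamma * (conv(x) - mean) / sqrt(var + eps) + beta with the originals.
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)           # per-channel multiplier
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - mu) * scale + bt)
    return folded_w, folded_b


def max_abs_scale(values, num_bits=8):
    """Maximum-value quantization: map the largest magnitude to qmax."""
    qmax = 2 ** (num_bits - 1) - 1               # 127 for 8 bit
    peak = max(abs(v) for v in values)
    return peak / qmax if peak > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round to the nearest integer step and clamp to [-qmax, qmax].

    Python's round() rounds exact halves to the nearest even integer.
    """
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def relu_saturate(acc, num_bits=8):
    """Assumed fusion of ReLU and saturation: one clamp to [0, qmax].

    The lower bound 0 plays the role of ReLU, the upper bound qmax plays
    the role of saturation truncation, so a single comparison pair
    replaces two separate operations.
    """
    qmax = 2 ** (num_bits - 1) - 1
    return max(0, min(qmax, acc))
```

Folding removes the normalization layer at inference time, so the hardware only needs a multiply-accumulate plus one bias add per output. Under these assumptions, the single clamp in `relu_saturate` shows why merging activation with saturation saves a stage in the datapath.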
