Acta Electronica Sinica ›› 2023, Vol. 51 ›› Issue (3): 639-647. DOI: 10.12263/DZXB.20210691

• Research Articles •

  • Author biographies:
  • HUANG Yun (male) was born in Xinyu, Jiangxi Province, in September 1993. He is an M.S. candidate at Information Engineering University. His research interests include neural network model quantization and compression, and endogenous network security. E-mail: yyhuangz@163.com
    ZHANG Fan (corresponding author, male) was born in September 1981. Ph.D. He is an associate research fellow and M.S. supervisor at the National Digital Switching System Engineering Technology Research Center. His research interests include proactive defense, artificial intelligence, and high-performance computing. Chinese Institute of Electronics membership No. E190013697M.
    GUO Wei (male) was born in August 1990. Ph.D. He is an assistant research fellow at the National Digital Switching System Engineering Technology Research Center. His research interests include proactive defense, artificial intelligence, and high-performance computing. Chinese Institute of Electronics membership No. E190029991M. E-mail: guowjss@126.com
    CHEN Li (male) was born in Yiwu, Zhejiang Province, in February 1997. He is an M.S. candidate at Information Engineering University. His research interest is computer vision. E-mail: 2464863136@qq.com
    YANG Guang (female) was born in Zhumadian, Henan Province, in November 1986. B.S. Her research interests include network traffic classification, intrusion detection, and artificial intelligence. E-mail: flyingaki@126.com

A Quantization Method for Convolutional Neural Networks Based on Data Standard Deviation

HUANG Yun1, ZHANG Fan2, GUO Wei2, CHEN Li1, YANG Guang3   

  1. Information Engineering University, Zhengzhou, Henan 450001, China
    2. National Digital Switching System Engineering Technology Research Center, Zhengzhou, Henan 450002, China
    3. Henan Administration of Radio and Television Monitoring Center, Zhengzhou, Henan 450002, China
  • Received:2021-05-29 Revised:2021-10-18 Online:2023-03-25 Published:2023-04-20
    • Corresponding author:
    • ZHANG Fan
    • Supported by:
    • Foundation for Innovative Research Groups of the National Natural Science Foundation of China(61521003)

Abstract:

Current convolutional neural network models are too large and computationally complex to deploy on resource-constrained computing platforms. To address this problem, this paper proposes a logarithmic quantization method based on the standard deviation of the data, suited to deployment on field programmable gate arrays (FPGAs). First, exploiting the characteristics of the FPGA, the logarithmic quantization method converts 32-bit floating-point multiplications into integer multiplications and shift operations, improving computational efficiency. Second, by studying the distribution of the data, input quantization and mixed-bit weight quantization methods based on the data's standard deviation are proposed, which effectively reduce quantization loss. In comparative experiments on efficiency and accuracy with networks such as RepVGG and EfficientNet, 8-bit quantization lowers the accuracy of large neural networks by only about 1%; with 8-bit input quantization and 10-bit weight quantization, the accuracy loss is below 0.2%, almost matching the floating-point model. The experiments show that the proposed quantization method reduces model size by about 75% while essentially preserving the original accuracy, effectively reducing power consumption and improving computational efficiency.
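The "integer multiplication and shift" conversion described in the abstract can be illustrated with power-of-two (logarithmic) quantization: each weight is rounded to a signed power of two, so multiplying by it reduces to a sign flip and a bit shift of the exponent. The sketch below is a minimal NumPy illustration, not the authors' implementation; in particular the standard-deviation-based clipping rule (`num_std`) is an assumed placeholder for the paper's range-selection scheme.

```python
import numpy as np

def log2_quantize(w, bits=8, num_std=3.0):
    """Round each value to a signed power of two, so that multiplying an
    activation by the quantized weight becomes a sign flip plus a shift.
    NOTE: the clipping rule (num_std * std) is an illustrative assumption,
    not the exact scheme from the paper."""
    # Clip outliers using a standard-deviation-based range.
    limit = num_std * w.std()
    w = np.clip(w, -limit, limit)

    sign = np.sign(w)
    with np.errstate(divide="ignore"):      # log2(0) -> -inf is harmless here
        exp = np.round(np.log2(np.abs(w)))  # nearest integer exponent
    # Restrict exponents to what a `bits`-wide signed field can encode.
    exp = np.clip(exp, -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return sign, exp

def log2_dequantize(sign, exp):
    """Reconstruct the approximation sign * 2**exp."""
    return sign * np.exp2(exp)
```

Because the exponent is rounded to the nearest integer, the relative error of any nonzero value is bounded by 2^0.5 - 1 (about 41%), which is why the paper pairs logarithmic quantization with a careful, distribution-aware choice of range and bit width.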

Key words: convolutional neural networks, field programmable gate array (FPGA), logarithmic quantization, standard deviation of the data, mixed bit number

CLC number: