Acta Electronica Sinica (电子学报)


Binary Neural Networks Based on Adaptive Gradient Optimization

WANG Zi-Wei1,2, LU Ji-Wen1,2, ZHOU Jie1,2   

  1. Department of Automation, Tsinghua University, Beijing 100084
  2. Beijing National Research Center for Information Science and Technology, Beijing 100084
  • Received: 2021-08-13; Revised: 2021-09-28; Online: 2023-02-01
    • Corresponding author:
    • LU Ji-Wen
    • About the authors:
    • WANG Zi-Wei, male, born in 1996 in Yiyang, Hunan. Ph.D. candidate at the Department of Automation, Tsinghua University. His research interests include feature learning and model compression. E-mail: wang-zw18@mails.tsinghua.edu.cn
      LU Ji-Wen, male, born in 1981 in Wuxue, Hubei. Tenured associate professor at the Department of Automation, Tsinghua University, and IAPR Fellow. His research interests include computer vision and pattern recognition. He is supported by the National Science Fund for Distinguished Young Scholars. E-mail: lujiwen@tsinghua.edu.cn
      ZHOU Jie, male, born in 1968 in Xinyang, Henan. Professor at the Department of Automation, Tsinghua University, and IAPR Fellow. His research interests include computer vision and pattern recognition. He is supported by the National Science Fund for Distinguished Young Scholars. E-mail: jzhou@tsinghua.edu.cn
    • Supported by:
    • The National Key Research and Development Program of China (2017YFA0700802); the National Natural Science Foundation of China (62125603)

Learning Adaptive Gradients for Binary Neural Networks

WANG Zi-Wei1,2, LU Ji-Wen1,2, ZHOU Jie1,2   

  1. Department of Automation, Tsinghua University, Beijing 100084
  2. Beijing National Research Center for Information Science and Technology, Beijing 100084
  • Received: 2021-08-13; Revised: 2021-09-28; Online: 2023-02-01
    • Corresponding author:
    • LU Ji-Wen
    • Supported by:
    • The National Key Research and Development Program of China (2017YFA0700802); the National Natural Science Foundation of China (62125603)

Abstract:

Binary neural networks are widely used in visual tasks owing to their efficiency in storage and computation. To train these non-differentiable binary networks, various relaxation-based optimization methods, such as the straight-through estimator (STE) and the sigmoid approximation, have been used to fit the quantization function. However, these methods suffer from two problems: (1) gradient mismatch caused by the discrepancy between the relaxation function and the quantization operator, and (2) gradient vanishing caused by activation saturation. Due to the nature of the quantization function itself, the accuracy and the validity of the gradients in binary networks cannot be guaranteed at the same time. This paper proposes adaptive gradient based binary neural networks (AdaBNN), which address gradient mismatch and gradient vanishing by adaptively seeking the optimal balance between gradient accuracy and gradient validity. Specifically, we theoretically prove the contradiction between gradient accuracy and validity, and construct a measure of this balance by comparing the norm of the relaxed gradient with the gap between the relaxed and true gradients. The binary neural network can therefore adjust its relaxation function according to the proposed measure and be trained effectively. Experiments on the ImageNet dataset show that our method improves top-1 accuracy by 17.1% over the widely used BNN.
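To make the two failure modes named above concrete, the following minimal PyTorch sketch contrasts the two standard relaxations of the sign quantizer: the straight-through estimator, whose gradients do not vanish but do not match the quantizer, and a steep tanh surrogate, whose gradients track the quantizer better but saturate. The names SignSTE, soft_sign, and the temperature value are illustrative assumptions, not code from the paper.

```python
# Minimal sketch (PyTorch) of the gradient mismatch vs. gradient vanishing trade-off.
import torch

class SignSTE(torch.autograd.Function):
    """Binarize with sign() in the forward pass; pass gradients straight through
    (clipped to [-1, 1]) in the backward pass: valid but mismatched gradients."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Identity gradient inside [-1, 1], zero outside (standard STE clipping).
        return grad_output * (x.abs() <= 1).float()

def soft_sign(x, temperature=5.0):
    """Differentiable surrogate tanh(T * x): as T grows it approaches sign(x),
    so the gradient matches the quantizer better but saturates (vanishes)
    for most inputs."""
    return torch.tanh(temperature * x)

x = torch.linspace(-2, 2, 9, requires_grad=True)

# STE: every in-range input receives gradient 1 (effective but inaccurate).
SignSTE.apply(x).sum().backward()
print("STE grad:      ", x.grad)

x.grad = None
# Steep tanh: gradients concentrate near zero and vanish elsewhere
# (accurate w.r.t. the quantizer but largely ineffective).
soft_sign(x, temperature=5.0).sum().backward()
print("soft-sign grad:", x.grad)
```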

Key words: binary neural networks, gradient saturation, gradient mismatch, adaptive gradients, image classification

Abstract:

Binary neural networks are widely employed in visual tasks because they accelerate computation and shrink storage compared with their full-precision counterparts. To train these non-differentiable networks, continuous relaxation methods, including the straight-through estimator (STE) and the sigmoid approximation, have been proposed to approximate the quantizer. However, these methods cause (1) gradient mismatch due to the discrepancy between the quantizer and the relaxed function, and (2) gradient vanishing due to activation saturation. Because of the nature of quantization, gradient accuracy and gradient validity cannot be guaranteed simultaneously for binary neural networks. In this paper, we propose AdaBNN, which solves gradient mismatch and gradient vanishing simultaneously by adaptively achieving the optimal trade-off. Specifically, we theoretically prove the contradiction between gradient accuracy and validity, and formulate an evaluation measure for the trade-off by comparing the norm of the relaxed gradient with its discrepancy from the true gradient. The binary neural network is then trained effectively by changing the relaxation function according to this measure. Experiments on ImageNet show that our method increases top-1 classification accuracy by 17.1% compared with the widely adopted BNN.
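The adaptive balance described above can be pictured with the hedged sketch below: it measures how many activations still receive a usable gradient (validity) and how far the surrogate deviates from the hard sign function (a proxy for gradient accuracy), then nudges a temperature toward the steepest relaxation that keeps gradients alive. The measure, threshold, and update rule (tradeoff_measure, adapt_temperature) are hypothetical illustrations only; the paper's actual criterion compares the relaxed gradient norm with its gap from the true gradient.

```python
# Hypothetical illustration (PyTorch) of adaptively adjusting a relaxation function;
# not the AdaBNN formulation itself.
import torch

def tradeoff_measure(x, temperature, live_threshold=0.1):
    """For the surrogate tanh(T * x), return:
    validity:     fraction of inputs whose surrogate gradient T * (1 - tanh^2(T*x))
                  exceeds live_threshold (a low fraction indicates gradient vanishing),
    accuracy_gap: mean |tanh(T*x) - sign(x)|, a proxy for gradient mismatch
                  against the hard quantizer."""
    soft = torch.tanh(temperature * x)
    surrogate_grad = temperature * (1.0 - soft ** 2)
    validity = (surrogate_grad > live_threshold).float().mean()
    accuracy_gap = (soft - torch.sign(x)).abs().mean()
    return validity, accuracy_gap

def adapt_temperature(x, temperature, target_validity=0.5, step=0.05):
    """Hypothetical controller: sharpen the relaxation (raise T) while enough
    activations still receive gradient, soften it (lower T) once too many saturate."""
    validity, _ = tradeoff_measure(x, temperature)
    return temperature + step if validity > target_validity else max(temperature - step, 0.1)

# Toy usage on random pre-activations.
x = torch.randn(4096)
T = 1.0
for _ in range(200):
    T = adapt_temperature(x, T)
validity, gap = tradeoff_measure(x, T)
print(f"T={T:.2f}  live-gradient fraction={validity.item():.2f}  mismatch={gap.item():.2f}")
```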

Key words: binary neural networks, gradient saturation, gradient mismatch, adaptive gradients, image classification
