Learning Adaptive Gradients for Binary Neural Networks

WANG Zi-wei; LU Ji-wen; ZHOU Jie

doi:10.12263/DZXB.20211084


PAPERS | Updated: 2025-12-08

- ACTA ELECTRONICA SINICA Vol. 51, Issue 2, Pages: 257-266 (2023)
- Affiliations:
  1. Department of Automation, Tsinghua University, Beijing 100084
  2. Beijing National Research Center for Information Science and Technology, Beijing 100084
- Funding:
  The National Key Research and Development Program of China (2017YFA0700802); The National Natural Science Foundation of China (62125603; 61822603; U1813218; U1713214)
- CLC: TP391.4; TP29
- Received: 13 August 2021,
  Revised: 28 September 2021,
  Published: 25 February 2023
WANG Zi-wei, LU Ji-wen, ZHOU Jie. Learning Adaptive Gradients for Binary Neural Networks[J]. ACTA ELECTRONICA SINICA, 2023, 51(02): 257-266. DOI: 10.12263/DZXB.20211084.

Abstract

Binary neural networks are widely employed in visual tasks because they accelerate computation and reduce storage compared with their floating-point counterparts. To train these non-differentiable networks, continuous relaxation methods such as the straight-through estimator (STE) and Sigmoid have been proposed to approximate the quantizer. However, these methods cause (1) gradient mismatch, due to the discrepancy between the quantizer and the relaxed function, and (2) gradient vanishing, due to activation saturation. Because of the nature of quantization, the accuracy and validity of the gradient cannot be obtained at the same time for binary neural networks. In this paper, we propose Adaptive Gradient based Binary Neural Networks (AdaBNN), which address both gradient mismatch and gradient vanishing by adaptively finding the optimal trade-off between gradient accuracy and validity. Specifically, we theoretically prove the contradiction between gradient accuracy and validity, and formulate an evaluation measure for the trade-off by comparing the norm of the relaxed gradient with the discrepancy between the relaxed and true gradients. The binary neural networks are then trained effectively by adjusting the relaxation function according to this measure. Experiments on ImageNet show that our method increases top-1 classification accuracy by 17.1% compared with the widely adopted BNN.
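The tension the abstract describes between gradient mismatch and gradient vanishing can be illustrated with a small numerical sketch. This is not the AdaBNN measure from the paper. It assumes a tanh(k·x) relaxation as a stand-in for the Sigmoid-style approximation, a hypothetical sharpness parameter `k`, and an arbitrary saturation threshold of 0.01. As `k` grows, the relaxation moves closer to the sign quantizer (less mismatch), but more inputs receive a near-zero surrogate gradient (more vanishing).

```python
import math

def sign(x: float) -> float:
    """Binary quantizer used in the forward pass: maps x to -1 or +1."""
    return 1.0 if x >= 0 else -1.0

def ste_grad(x: float) -> float:
    """Clipped straight-through estimator: gradient 1 where |x| <= 1, else 0."""
    return 1.0 if abs(x) <= 1.0 else 0.0

def soft_sign(x: float, k: float) -> float:
    """Assumed relaxation tanh(k*x); k is a hypothetical sharpness parameter."""
    return math.tanh(k * x)

def soft_sign_grad(x: float, k: float) -> float:
    """Derivative of tanh(k*x) with respect to x."""
    t = math.tanh(k * x)
    return k * (1.0 - t * t)

def mismatch(xs, k):
    """Mean absolute gap between the relaxation and the true quantizer."""
    return sum(abs(soft_sign(x, k) - sign(x)) for x in xs) / len(xs)

def saturated_fraction(xs, k, eps=0.01):
    """Fraction of inputs whose surrogate gradient is below eps (assumed threshold)."""
    return sum(1 for x in xs if soft_sign_grad(x, k) < eps) / len(xs)

# Sample pre-activations on a grid in [-2, 2], skipping 0.
xs = [i / 10.0 for i in range(-20, 21) if i != 0]

for k in (1.0, 5.0, 25.0):
    print(f"k={k:5.1f}  mismatch={mismatch(xs, k):.4f}  "
          f"saturated={saturated_fraction(xs, k):.2f}")

# The clipped STE is the other extreme: a constant gradient inside [-1, 1]
# and no gradient at all outside it.
print("STE grads at 0.5 and 1.5:", ste_grad(0.5), ste_grad(1.5))
```

Running the loop shows mismatch falling while the saturated fraction rises, which is the trade-off an adaptive scheme would have to balance when choosing the relaxation function.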
