WANG Zi-Wei1,2, LU Ji-Wen1,2, ZHOU Jie1,2
Received: 2021-08-13
Revised: 2021-09-28
Online: 2023-02-01
Abstract: Binary neural networks are widely used in vision tasks because of their efficiency in storage and computation. To train these non-differentiable binary networks, various relaxed optimization methods, such as the straight-through estimator (STE) and sigmoid approximation, are employed to fit the quantization function. However, these methods suffer from two problems: (1) gradient mismatch, caused by the discrepancy between the relaxation function and the quantization operator, and (2) gradient vanishing, caused by saturated activations. Owing to the nature of the quantization function, the accuracy and the effectiveness of gradients in binary networks cannot be guaranteed at the same time. This paper proposes Adaptive Gradient based Binary Neural Networks (AdaBNN), which resolve gradient mismatch and gradient vanishing by adaptively searching for the optimal balance between gradient accuracy and gradient effectiveness. Specifically, we theoretically prove the conflict between gradient accuracy and effectiveness, and construct a measure of this balance by comparing the norm of the relaxed gradient with the gap between the relaxed and the true gradient. The binary network can then adjust its relaxation function according to the proposed measure and thus be trained effectively. Experiments on the ImageNet dataset show that our method improves top-1 accuracy by 17.1% over the widely used BNN.
WANG Zi-Wei, LU Ji-Wen, ZHOU Jie. Learning Adaptive Gradients for Binary Neural Networks[J]. Acta Electronica Sinica, DOI: 10.12263/DZXB.20211084.
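To make the accuracy-effectiveness conflict described in the abstract concrete, here is a minimal NumPy sketch (ours, not the paper's implementation; the tanh surrogate, the Gaussian pre-activation assumption, and the 1e-3 "live gradient" threshold are all assumptions). For each steepness β it reports how closely the relaxation tracks the sign function and what fraction of inputs still receives a non-vanishing gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)  # pre-activations, assumed roughly Gaussian

def tanh_relax(x, beta):
    # Relaxed quantizer: approaches sign(x) as beta grows.
    return np.tanh(beta * x)

def tanh_relax_grad(x, beta):
    # Gradient of the relaxation; saturates for large |beta * x|.
    return beta * (1.0 - np.tanh(beta * x) ** 2)

for beta in (0.5, 1.0, 2.0, 5.0, 10.0):
    gap = np.abs(tanh_relax(x, beta) - np.sign(x)).mean()  # accuracy: gap to the quantizer
    live = (tanh_relax_grad(x, beta) > 1e-3).mean()        # effectiveness: non-vanished gradients
    print(f"beta={beta:4.1f}  gap={gap:.3f}  live_grad_fraction={live:.3f}")
```

Raising β shrinks the approximation gap (less gradient mismatch) but also shrinks the fraction of inputs with usable gradients (more gradient vanishing); the measure proposed in the paper is designed to pick the relaxation adaptively rather than fixing it.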
Figure 1 Comparison among the sign, polynomial, sigmoid, and tanh functions. The hyperparameters of a relaxation function determine how closely it approximates the sign function: a flat relaxation causes gradient mismatch, while a steep one causes gradient vanishing due to input saturation. Conventional methods use a fixed relaxation function or tune its steepness by hand, so the balance between gradient accuracy and effectiveness during training is only suboptimal. Our adaptive gradient based binary neural network instead searches for the optimal relaxation function dynamically during training, guided by the proposed measure.
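For reference, the four curves compared in Figure 1 can be written down as follows. This is a minimal sketch in which the steepness parameterization and the piecewise polynomial form (the one popularized by Bi-Real-Net) are our assumptions about the exact variants plotted:

```python
import numpy as np

def sign(x):
    # The target quantizer: maps inputs to {-1, +1}.
    return np.where(x >= 0, 1.0, -1.0)

def sigmoid_relax(x, beta=2.0):
    # Sigmoid relaxation rescaled to (-1, 1); flat for small beta.
    return 2.0 / (1.0 + np.exp(-beta * x)) - 1.0

def tanh_relax(x, beta=2.0):
    # Tanh relaxation; larger beta approximates sign more closely.
    return np.tanh(beta * x)

def poly_relax(x):
    # Piecewise polynomial relaxation (Bi-Real-Net form).
    return np.where(x < -1.0, -1.0,
           np.where(x < 0.0, 2.0 * x + x ** 2,
           np.where(x < 1.0, 2.0 * x - x ** 2, 1.0)))
```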
Figure 2 Training pipeline of the adaptive gradient based binary neural network. The full-precision weights and activations of each layer are binarized by the adaptive quantization layer (AdaBin), whose scale and steepness parameters α and β determine the form of the relaxation function. For weights, these two parameters are updated during network optimization; for activations, a dynamic adjuster computes α and β. Our method thus keeps the balance between gradient accuracy and effectiveness optimal throughout training. At inference time, the dynamic adjuster is removed and the adaptive quantization layer is replaced by the sign function.
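One possible wiring of the AdaBin layer described above, as a PyTorch sketch: the forward pass applies the hard, scaled sign, while the backward pass propagates the gradient of the tanh relaxation with learnable scale α and steepness β. The class names and the tanh surrogate are ours, and the paper's dynamic adjuster for the activation-side α and β is not reproduced here:

```python
import torch
from torch import nn

class AdaBinarize(torch.autograd.Function):
    # Forward: hard sign scaled by alpha (binary values throughout training).
    # Backward: gradient of alpha * tanh(beta * x), i.e. the relaxed quantizer.
    @staticmethod
    def forward(ctx, x, alpha, beta):
        ctx.save_for_backward(x, alpha, beta)
        return alpha * torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        x, alpha, beta = ctx.saved_tensors
        t = torch.tanh(beta * x)
        grad_x = grad_out * alpha * beta * (1.0 - t ** 2)          # relaxed gradient w.r.t. x
        grad_alpha = (grad_out * torch.sign(x)).sum()              # scale gradient
        grad_beta = (grad_out * alpha * x * (1.0 - t ** 2)).sum()  # steepness gradient
        return grad_x, grad_alpha, grad_beta

class AdaBinLayer(nn.Module):
    # Weight-side variant: alpha and beta are learned jointly with the network.
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(1.0))
        self.beta = nn.Parameter(torch.tensor(2.0))

    def forward(self, x):
        if self.training:
            return AdaBinarize.apply(x, self.alpha, self.beta)
        # At inference the layer reduces to the (scaled) sign function.
        return self.alpha * torch.sign(x)
```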
|  | Relaxation function | ResNet20 |
| --- | --- | --- |
| Full-precision | - | 92.10 |
| Binary | Identity | 89.90 |
|  | Sigmoid | 90.53 |
|  | Tanh | 90.17 |
|  | Polynomial | 89.66 |

Table 1 Classification accuracy (%) of our method on CIFAR-10 with different relaxation functions, where the network architecture is ResNet20
| m | 50 | 100 | 200 |
| --- | --- | --- | --- |
| Sigmoid | 89.99 | 90.53 | 89.97 |
| Tanh | 90.09 | 90.49 | 90.18 |

Table 2 Classification accuracy (%) on CIFAR-10 with different estimators of the true gradient, where the network architecture is ResNet20
|  | Fixed | Static | Dynamic |
| --- | --- | --- | --- |
| Fixed | 88.81 | 88.63 | 89.15 |
| Static | 89.38 | 89.18 | 89.45 |
| Dynamic | 89.86 | 89.94 | 90.53 |

Table 3 Classification accuracy (%) on CIFAR-10 with different degrees of freedom for α and β
| Method | Bit-width | VGG-small | ResNet20 |
| --- | --- | --- | --- |
| Full-precision | 32/32 | 93.20 | 92.10 |
| BC | 1/32 | 90.10 | - |
| TTQ | 2/32 | - | 91.13 |
| HWGQ | 1/2 | 92.50 | - |
| LQ-Net | 1/2 | 93.40 | 88.40 |
| PACT | 2/2 | - | 89.70 |
| BNN | 1/1 | 89.90 | - |
| DoReFa-Net | 1/1 | - | 79.30 |
| Xnor-Net | 1/1 | 89.80 | - |
| DSQ | 1/1 | 91.72 | 84.11 |
| AdaBNN | 1/1 | 92.52 | 90.53 |

Table 4 Comparison of classification accuracy (%) with existing methods on CIFAR-10, using the VGG-small and ResNet20 architectures. Bit-width denotes the number of bits for weights/activations
| Method | Bit-width | ResNet18 top-1 | ResNet18 top-5 | ResNet34 top-1 | ResNet34 top-5 |
| --- | --- | --- | --- | --- | --- |
| Full-precision | 32/32 | 69.3 | 89.2 | 73.3 | 91.3 |
| BWN | 1/32 | 60.8 | 80.3 | - | - |
| HWGQ | 1/2 | 59.6 | 82.2 | 64.3 | 85.7 |
| LQ-Net | 1/2 | 62.6 | 84.3 | 66.6 | 86.9 |
| PACT | 2/2 | 55.4 | 78.6 | - | - |
| BNN | 1/1 | 42.2 | 67.1 | - | - |
| Xnor-Net | 1/1 | 51.2 | 73.2 | - | - |
| ABC-Net | 1/1 | 42.7 | 67.6 | 52.4 | 76.5 |
| QN | 1/1 | 53.6 | 75.3 | - | - |
| AdaBNN | 1/1 | 54.3 | 77.4 | 60.6 | 82.2 |
| Bi-Real-Net | 1/1 | 56.4 | 79.5 | 62.2 | 83.9 |
| AdaBNN+SC | 1/1 | 59.3 | 81.2 | 63.8 | 84.8 |

Table 5 Comparison of top-1 and top-5 classification accuracy (%) with existing methods on ImageNet, using the ResNet18 and ResNet34 architectures. Bit-width denotes the number of bits for weights/activations
| Model | Method | Storage | FLOPs (G) |
| --- | --- | --- | --- |
| ResNet18 | Full-precision | 374.1 Mbit | 1.81 |
|  | Xnor-Net | 33.7 Mbit | 0.17 |
|  | Bi-Real-Net | 33.6 Mbit | 0.16 |
|  | AdaBNN | 33.4 Mbit | 0.15 |

Table 6 Comparison of computation and storage cost with existing binary networks on the ResNet18 architecture
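As a rough sanity check on the storage column of Table 6, the numbers follow from counting bits per parameter. The split below (about 0.7M parameters kept at full precision in the first/last layers and batch normalization) is our assumption, not a figure from the paper:

```python
total = 11.69e6   # ResNet18 parameter count (approximate)
full = 0.70e6     # parameters kept at 32 bits (first/last layers, BN) -- assumed split
binarized = total - full

fp32_mbit = total * 32 / 1e6                  # ~374.1 Mbit, matching the full-precision row
bin_mbit = (binarized * 1 + full * 32) / 1e6  # ~33.4 Mbit, matching the binary rows
print(f"full-precision: {fp32_mbit:.1f} Mbit, binary: {bin_mbit:.1f} Mbit")
```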
[1] HUBARA I, COURBARIAUX M, SOUDRY D, et al. Binarized neural networks[C]//Advances in Neural Information Processing Systems. Barcelona: NIPS, 2016: 4107-4115.
[2] RASTEGARI M, ORDONEZ V, REDMON J, et al. Xnor-net: Imagenet classification using binary convolutional neural networks[C]//European Conference on Computer Vision. Amsterdam: Springer, 2016: 525-542.
[3] QUAN Y, LI Z X, ZHANG C L, et al. Fusing deep dilated convolutions network and light-weight network for object detection[J]. Acta Electronica Sinica, 2020, 48(2): 390-397.
[4] HOU Z Q, LIU X Y, YU W S, et al. Object detection algorithm for improving non-maximum suppression using GIoU[J]. Acta Electronica Sinica, 2021, 49(4): 696-705.
[5] LI Y Q, GAI C Y, XIAO C J, et al. Object detection networks based on refined multi-scale depth feature[J]. Acta Electronica Sinica, 2020, 48(12): 2360-2366.
[6] LI W G, YE X, ZHAO Y T, et al. Strip steel surface defect detection based on improved YOLOv3 algorithm[J]. Acta Electronica Sinica, 2020, 48(7): 1284-1292.
[7] LIU Z, WU B, LUO W, et al. Bi-real net: Enhancing the performance of 1-bit CNNs with improved representational capability and advanced training algorithm[C]//European Conference on Computer Vision. Munich: Springer, 2018: 722-737.
[8] JIANG Z T, QIN J Q, ZHANG S Q. Parameterized pooling convolution neural network for image classification[J]. Acta Electronica Sinica, 2020, 48(9): 1729-1734.
[9] WEI Y, PAN X, QIN H, et al. Quantization mimic: Towards very tiny CNN for object detection[C]//European Conference on Computer Vision. Munich: Springer, 2018: 267-283.
[10] WANG Z, LU J, ZHOU J. Learning channel-wise interactions for binary convolutional neural networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3432-3445.
[11] GE S Y, GAO Z L, ZHANG B B, et al. Kernelized bilinear CNN models for fine-grained visual recognition[J]. Acta Electronica Sinica, 2019, 47(10): 2134-2141.
[12] HOU L, KWOK J T. Loss-aware weight quantization of deep networks[C]//International Conference on Learning Representations. Vancouver: ICLR, 2018: 1-11.
[13] LENG C, DOU Z, LI H, et al. Extremely low bit neural network: Squeeze the last bit out with ADMM[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New Orleans: AAAI, 2018: 3466-3473.
[14] ALIZADEH M, FERNÁNDEZ-MARQUÉS J, LANE N D, et al. An empirical study of binary neural networks' optimization[C]//International Conference on Learning Representations. Vancouver: ICLR, 2018: 1-10.
[15] YIN P, LYU J, ZHANG S, et al. Understanding straight-through estimator in training activation quantized neural nets[C]//International Conference on Learning Representations. New Orleans: ICLR, 2019: 1-12.
[16] GONG R, LIU X, JIANG S, et al. Differentiable soft quantization: Bridging full-precision and low-bit neural networks[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 4852-4861.
[17] KRIZHEVSKY A, HINTON G. Learning Multiple Layers of Features from Tiny Images[R]. Toronto: University of Toronto, Technical Report, 2009.
[18] DENG J, DONG W, SOCHER R, et al. Imagenet: A large-scale hierarchical image database[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Miami: IEEE, 2009: 248-255.
[19] WANG Z, LU J, WU Z, et al. Learning efficient binarized object detectors with information compression[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(6): 3082-3095.
[20] GONG C, LU Y, DAI S R, et al. Ultra-low loss quantization method for deep neural network compression[J]. Journal of Software, 2021, 32(8): 2391-2407.
[21] WANG Z, XIAO H, LU J, et al. Generalizable mixed-precision quantization via attribution rank preservation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Virtual Conference: IEEE, 2021: 5291-5300.
[22] WANG Z, ZHENG Q, LU J, et al. Deep hashing with active pairwise supervision[C]//European Conference on Computer Vision. Virtual Conference: Springer, 2020: 522-538.
[23] YANG J, SHEN X, XING J, et al. Quantization networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 7308-7316.
[24] WANG H L, YU J, XIAO C B. Deep non-relaxing hashing based on point pair similarity[J]. Acta Automatica Sinica, 2021, 47(5): 1077-1086.
[25] DONG Z, PEI M T. Cross-modality face retrieval based on heterogeneous hashing network[J]. Chinese Journal of Computers, 2019, 42(1): 73-84.
[26] ZHU C, HAN S, MAO H, et al. Trained ternary quantization[C]//International Conference on Learning Representations. Toulon: ICLR, 2017: 1-10.
[27] COURBARIAUX M, BENGIO Y, DAVID J P. Binaryconnect: Training deep neural networks with binary weights during propagations[C]//Advances in Neural Information Processing Systems. Montréal: NIPS, 2015: 3123-3131.
[28] ZHOU S, WU Y, NI Z, et al. Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients[C]//International Conference on Learning Representations. San Juan: ICLR, 2016: 1-13.
[29] LOUIZOS C, ULLRICH K, WELLING M. Bayesian compression for deep learning[C]//Advances in Neural Information Processing Systems. Long Beach: NIPS, 2017: 3288-3298.
[30] ZHANG D, YANG J, YE D, et al. LQ-Nets: Learned quantization for highly accurate and compact deep neural networks[C]//European Conference on Computer Vision. Munich: Springer, 2018: 365-382.
[31] ULLRICH K, MEEDS E, WELLING M. Soft weight-sharing for neural network compression[C]//International Conference on Learning Representations. Toulon: ICLR, 2017: 1-10.
[32] BANNER R, HUBARA I, HOFFER E, et al. Scalable methods for 8-bit training of neural networks[C]//Advances in Neural Information Processing Systems. Montréal: NIPS, 2018: 5145-5153.
[33] LIN X, ZHAO C, PAN W. Towards accurate binary convolutional neural network[C]//Advances in Neural Information Processing Systems. Long Beach: NIPS, 2017: 345-353.
[34] BETHGE J, YANG H, BORNSTEIN M, et al. Back to simplicity: How to train accurate BNNs from scratch?[EB/OL]. (2019-06-19)[2021-08-13].
[35] DUAN Y, LU J, WANG Z, et al. Learning deep binary descriptor with multi-quantization[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(8): 1924-1938.
[36] TANG W, HUA G, WANG L. How to train a compact binary neural network with high accuracy?[C]//Proceedings of the AAAI Conference on Artificial Intelligence. San Francisco: AAAI, 2017: 2625-2631.
[37] WANG P, HU Q, ZHANG Y, et al. Two-step quantization for low-bit neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4376-4384.
[38] GU J, LI C, ZHANG B, et al. Projection convolutional neural networks for 1-bit CNNs via discrete back propagation[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Honolulu: AAAI, 2019: 8344-8351.
[39] BOYD S, PARIKH N, CHU E, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers[J]. Foundations and Trends in Machine Learning, 2011, 3(1): 1-122.
[40] YIN P, LYU J, ZHANG S, et al. Understanding straight-through estimator in training activation quantized neural nets[C]//International Conference on Learning Representations. New Orleans: ICLR, 2019: 1-12.
[41] ANDERSON A G, BERG C P. The high-dimensional geometry of binary neural networks[C]//International Conference on Learning Representations. Vancouver: ICLR, 2018: 1-10.
[42] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
[43] KINGMA D P, BA J. Adam: A method for stochastic optimization[C]//International Conference on Learning Representations. San Diego: ICLR, 2015: 1-11.
[44] CAI Z, HE X, SUN J, et al. Deep learning with low precision by half-wave Gaussian quantization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 5918-5926.
[45] CHOI J, WANG Z, VENKATARAMANI S, et al. PACT: Parameterized clipping activation for quantized neural networks[C]//International Conference on Learning Representations. Vancouver: ICLR, 2018: 1-11.