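1. School of Computer Science, South China Normal University, Guangzhou, Guangdong 510631
2. Research Institute, Hundsun Technologies Inc., Hangzhou, Zhejiang 310053
3. School of Software and Microelectronics, Peking University, Beijing 102600
4. DataGrand Inc., Shanghai 201203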
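LI Jia-ming (李佳明): male, born in October 2002 in Meizhou, Guangdong. Master's student. His main research interests are knowledge distillation and computer vision.
BAO Zhi-qiang (鲍志强): male, born in January 1995 in Jiujiang, Jiangxi. Ph.D. His main research interests are knowledge distillation and model compression. E-mail: zhiqiangbao1995@163.com
HUANG Zhen-hua (黄震华): male, born in September 1980 in Putian, Fujian. Professor and doctoral supervisor. His main research interests are machine learning, data mining, and recommender systems. E-mail: jukiehuang@163.com
SUN Sheng-li (孙圣力): male, born in December 1978 in Changde, Hunan. Ph.D., professor. His main research interests are machine learning, data mining, and databases. E-mail: slsun@ss.pku.edu.cn
CHEN Yun-wen (陈运文): male, born in July 1981 in Nanjing, Jiangsu. Ph.D., senior engineer. His main research interests are machine learning, data mining, and natural language processing. E-mail: chenyunwen@datagrand.com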
Received: 2024-10-07
Revised: 2025-01-24
Published in print: 2025-04-25
LI Jia-ming, BAO Zhi-qiang, HUANG Zhen-hua, et al. Low-Rank Adaptation Based Flexibility-Aware Distillation Method[J]. Acta Electronica Sinica, 2025, 53(04): 1337-1346. DOI:10.12263/DZXB.20240894
Knowledge distillation is a learning paradigm that transfers knowledge from a complex, deep teacher model to a lightweight student model in order to improve the student's performance. Two problems remain: the distribution knowledge provided by the teacher model lacks diversity, and constructing a search space over student architectures consumes substantial resources. To address them, we propose a low-rank adaptation based flexibility-aware distillation (LAFA) method. LAFA constructs low-rank transformation matrices that map the teacher knowledge toward both the student model's knowledge and the class labels, thereby increasing the diversity of the distribution knowledge. LAFA also introduces a decision support module that dynamically scales the student model's capacity, balancing distillation performance against model capacity. Furthermore, we propose warm-up and relaxation strategies to optimize the decision variables. The warm-up strategy constrains the student model to increase its capacity slowly, which alleviates the convergence difficulties caused by capacity scaling. The relaxation strategy removes this constraint in the later stages of distillation, yielding a significant performance improvement at a small additional resource cost. On the CIFAR-100 dataset, integrating LAFA into 13 distillation methods improves performance by 0.28 percentage points on average. Ablation and analysis experiments further validate the effectiveness of LAFA.
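As a rough illustration of the low-rank transformation idea, the sketch below applies a rank-r correction to a teacher logit vector and then computes a temperature-softened KL distillation loss against the student. It is not the authors' implementation: the residual form z + (zA)B, the factor matrices `A` and `B`, the temperature value, and all function names are assumptions made for illustration, and the abstract does not specify how the transformed knowledge is trained or combined with the label loss.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def low_rank_transform(z, A, B):
    """Return z + (z A) B for a logit vector z of length C.

    A has shape C x r and B has shape r x C (hypothetical form),
    so the correction uses 2*C*r parameters instead of C*C.
    """
    num_classes = len(z)
    rank = len(B)
    # Project the logits into the rank-r space.
    hidden = [sum(z[i] * A[i][k] for i in range(num_classes)) for k in range(rank)]
    # Project back to class space and add the result as a residual.
    delta = [sum(hidden[k] * B[k][j] for k in range(rank)) for j in range(num_classes)]
    return [z[j] + delta[j] for j in range(num_classes)]


def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(p_teacher || p_student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Toy example with C = 3 classes and rank r = 1 (all values are made up).
teacher_logits = [2.0, 1.0, 0.1]
student_logits = [1.5, 1.2, 0.3]
A = [[0.1], [0.0], [-0.1]]   # C x r
B = [[0.5, -0.5, 0.0]]       # r x C

transformed = low_rank_transform(teacher_logits, A, B)
loss = kd_loss(transformed, student_logits)
```

With `B` set to all zeros the transform reduces to the identity, so the student would initially be distilled from the unmodified teacher output; starting one factor at zero is a common choice in low-rank adaptation, though whether LAFA does this is not stated in the abstract.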