Low-Cost Federated Learning Based on Lightweight Self-Distillation

LIU Song; LUO Yang-yu; XU Jia-pei; ZHANG Jian-zhong

doi:10.12263/DZXB.20240325

Research article | Updated: 2025-12-08

- Low-Cost Federated Learning Based on Lightweight Self-Distillation
- Acta Electronica Sinica, 2025, Vol. 53, No. 1, pp. 259-269
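- Author affiliations:
  
  1. College of Computer Science, Nankai University, Tianjin 300350, China
  2. College of Cyber Science, Nankai University, Tianjin 300350, China
  3. Key Laboratory of Data and Intelligent System Security, Ministry of Education, Tianjin 300350, China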
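- Author biographies:
  
  LIU Song, male, born in October 1998 in Xingtai, Hebei Province. He is currently a Ph.D. candidate at the College of Computer Science, Nankai University. His research interests include edge computing, federated learning, and 5G network architecture. E-mail: liusong0918@mail.nankai.edu.cn
  LUO Yang-yu, male, born in March 2001 in Zhangzhou, Fujian Province. He is currently a master's student at the College of Cyber Science, Nankai University. His research interests include edge computing, 5G network architecture, and computing power networks. E-mail: lyy@mail.nankai.edu.cn
  XU Jia-pei, male, born in May 2001 in Chengde, Hebei Province. He is currently a master's student at the College of Computer Science, Nankai University. His research interests include video transmission and edge computing. E-mail: 2120230702@nankai.edu.cn
  ZHANG Jian-zhong, male, born in June 1964 in Shijiazhuang, Hebei Province. He is currently a professor and doctoral supervisor at the College of Cyber Science, Nankai University. His research interests include computer networks and network security. E-mail: zhangjz@nankai.edu.cn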
- Funding:
  
  Tianjin Science and Technology Major Project and Engineering (18ZXZNGX00200)
- DOI: 10.12263/DZXB.20240325
  CLC number: TP391
- Received: 2024-04-09
  
  Revised: 2024-11-10
  
  Published in print: 2025-01-25
LIU Song, LUO Yang-yu, XU Jia-pei, et al. Low-Cost Federated Learning Based on Lightweight Self-Distillation[J]. Acta Electronica Sinica, 2025, 53(01): 259-269. DOI: 10.12263/DZXB.20240325

Abstract

With the development of edge computing, the training of deep learning models increasingly relies on private data generated by large numbers of edge devices. In this context, federated learning has drawn extensive attention from both academia and industry for its strong privacy protection. In practice, however, federated learning suffers from inefficient training and suboptimal model quality caused by data heterogeneity and limited computational resources. Inspired by knowledge distillation, this paper proposes efficient Federated learning with lightweight Self Knowledge Distillation (FedSKD). First, the algorithm uses self-distillation to extract intrinsic knowledge during training, which alleviates overfitting of the local models and enhances their generalization; server-side parameter aggregation then transfers this generalization capability to the global model, improving its quality and convergence speed. Second, a dynamic synchronization mechanism further improves the accuracy and training efficiency of the global model. Experimental results show that, under non-independent and identically distributed (non-IID) data partitioning, FedSKD improves model accuracy and training efficiency while reducing training cost. On the CIFAR10/100 datasets, compared with the latest baseline, FedMLD, FedSKD improves accuracy by 2% on average and reduces training cost by 56% on average.
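
The abstract names two building blocks, a local self-distillation step and server-side parameter aggregation, without giving their exact form. The sketch below illustrates one common way to write them and is not the authors' implementation. It assumes the self-distillation term is a temperature-scaled KL divergence between the model's current outputs and its own earlier outputs for the same sample, and that aggregation is a FedAvg-style average weighted by client sample counts. The names `self_distillation_loss`, `aggregate`, `alpha`, and `temperature` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def self_distillation_loss(logits, prev_logits, label, alpha=0.5, temperature=2.0):
    """Cross-entropy on the hard label plus KL(previous || current) on softened outputs.

    Assumption: prev_logits are the same model's earlier outputs for this
    sample, standing in for the "intrinsic knowledge" the abstract mentions.
    alpha and temperature are illustrative hyperparameters.
    """
    probs = softmax(logits)
    ce = -math.log(probs[label])
    teacher = softmax(prev_logits, temperature)
    student = softmax(logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)
    # The temperature**2 factor keeps the soft-target gradient scale comparable.
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl


def aggregate(client_weights, client_sizes):
    """FedAvg-style average of client parameter vectors, weighted by sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]
```

In this sketch the distillation term is zero when the current and earlier outputs agree, so the loss reduces to a scaled cross-entropy. It grows as the model drifts away from its own earlier predictions, which is the regularizing effect that self-distillation is generally used for.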
