Cloud-Edge Collaborative Retraining of Foundation Models at the Block Granularity

ZHANG Qing-long; HAN Rui; LIU Chi

doi:10.12263/DZXB.20240518

Cloud-Edge-Device Collaborative Data Management, Analysis and System | Updated: 2025-12-08

- Acta Electronica Sinica, Vol. 53, Issue 2, Pages 287-300 (2025)
- Affiliation: School of Computer Science, Beijing Institute of Technology, Beijing 100081
- Funding: National Key Research and Development Program of China (2023YFE0209100); National Natural Science Foundation of China (62272046; 62132019; 61872337)
- DOI: 10.12263/DZXB.20240518
  CLC: TP391
- Received: 04 June 2024; Revised: 27 October 2024; Published: 25 February 2025

ZHANG Qing-long, HAN Rui, LIU Chi. Cloud-Edge Collaborative Retraining of Foundation Models at the Block Granularity[J]. Acta Electronica Sinica, 2025, 53(02): 287-300. DOI：10.12263/DZXB.20240518

Abstract

Foundation models deployed at the edge operate in uncertain external environments (for example, changes in weather, lighting, and object density in roadside camera footage), so their input data distributions evolve continuously and the models must be retrained to maintain high accuracy. However, constrained by the available device resources and the retraining window, existing techniques can only train fixed compressed models, whose limited generalization ability considerably lowers accuracy. To address this issue, this paper proposes BlockTrainer, a cloud-edge collaborative retraining approach for foundation models at the block granularity. BlockTrainer first introduces a model retraining scaling law to evaluate the accuracy contribution of each block in a foundation model with respect to the latest input data at the edge. Based on this evaluation, it generates the optimal retraining solution under resource constraints and dynamically converts the most accuracy-relevant parts of the foundation model on the cloud platform into small models that can be retrained at the edge, thereby constructing a collaborative training system between large and small models. Comparative experiments on real cloud-edge platforms show that BlockTrainer improves the retraining accuracy of foundation models by 81.24% under the same resource consumption and supports retraining models of up to 33 billion parameters.
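
The abstract does not spell out how the retraining solution is computed. As a rough illustration of the idea of choosing blocks under a resource budget, the following minimal Python sketch treats the choice as a 0/1 knapsack problem. This is an assumption made for illustration, not BlockTrainer's actual algorithm: the `Block` type, the `select_blocks` helper, the memory costs, and the contribution scores are all hypothetical, and the scores stand in for whatever the paper's retraining scaling law would estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    memory_mb: int        # hypothetical cost of retraining this block at the edge
    contribution: float   # hypothetical estimated accuracy gain from retraining it


def select_blocks(blocks, budget_mb):
    """Return (total contribution, chosen block names) for the subset of
    blocks with the highest total contribution that fits in budget_mb.

    Solved as a 0/1 knapsack; this is an illustrative assumption.
    """
    # best[m] = best (contribution, names) using at most m MB of memory
    best = [(0.0, ())] * (budget_mb + 1)
    for block in blocks:
        # Iterate budgets downwards so each block is used at most once.
        for m in range(budget_mb, block.memory_mb - 1, -1):
            gain, chosen = best[m - block.memory_mb]
            candidate = (gain + block.contribution, chosen + (block.name,))
            if candidate[0] > best[m][0]:
                best[m] = candidate
    return best[budget_mb]


# Made-up numbers, purely for demonstration.
blocks = [
    Block("block_0", memory_mb=4, contribution=0.10),
    Block("block_1", memory_mb=3, contribution=0.40),
    Block("block_2", memory_mb=2, contribution=0.30),
    Block("block_3", memory_mb=5, contribution=0.45),
]
total_gain, chosen = select_blocks(blocks, budget_mb=7)
print(chosen, round(total_gain, 2))
```

With these made-up inputs and a 7 MB budget, the sketch selects `block_2` and `block_3`, because together they fit the budget exactly and their combined score exceeds that of any other feasible subset. A real system would also have to account for the retraining time window and cloud-edge communication cost, which this sketch ignores.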
