Fine-Grained Inference Task Offloading for Large Language Model in Industrial Edge-Cloud Collaborative Scenarios

LIAO Ling-ling; TAO Ming; XIE Ren-ping; ZHANG Yin; YUAN Hua-qiang

doi:10.12263/DZXB.20250411


Large Models and the Internet | Updated: 2026-02-10

- Fine-Grained Inference Task Offloading for Large Language Model in Industrial Edge-Cloud Collaborative Scenarios
- Acta Electronica Sinica, 2025, Vol. 53, No. 11, pp. 3880-3893
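- Author affiliations:
  
  1. School of Computer Science and Technology (School of Cyberspace Security), Dongguan University of Technology, Dongguan, Guangdong 523808, China
  2. School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China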
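- Author biographies:
  
  LIAO Ling-ling, female, was born in Ganzhou, Jiangxi Province, in April 2001. She is currently a master's student at the School of Computer Science and Technology, Dongguan University of Technology. Her main research interest is the Industrial Internet of Things. E-mail: liaolingling@dgut.edu.cn
  TAO Ming, male, was born in Ma'anshan, Anhui Province, in June 1986. He is currently a professor at the School of Computer Science and Technology, Dongguan University of Technology. His main research interest is the Industrial Internet of Things. E-mail: taom@dgut.edu.cn
  XIE Ren-ping, male, was born in Loudi, Hunan Province, in January 1989. He is currently a specially appointed associate researcher at the School of Computer Science and Technology, Dongguan University of Technology. His main research interest is the Industrial Internet of Things. E-mail: xierenping@dgut.edu.cn
  ZHANG Yin, male, was born in Jiujiang, Jiangxi Province, in October 1986. He is currently a researcher and doctoral supervisor at the School of Information and Communication Engineering, University of Electronic Science and Technology of China. His main research interests are edge computing and the Internet of Things. E-mail: zhangyin123@uestc.edu.cn
  YUAN Hua-qiang, male, was born in Hengyang, Hunan Province, in December 1966. He is currently a professor at the School of Computer Science and Technology, Dongguan University of Technology. His main research interest is the Industrial Internet of Things. E-mail: yuanhq@dgut.edu.cn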
- Funding:
  
  National Natural Science Foundation of China (62572122; 62572099)
- DOI: 10.12263/DZXB.20250411
  CLC number: TP393.1
- Received: 2025-05-23
  
  Accepted: 2025-10-11
  
  Published in print: 2025-11-25
LIAO Ling-ling, TAO Ming, XIE Ren-ping, et al. Fine-Grained Inference Task Offloading for Large Language Model in Industrial Edge-Cloud Collaborative Scenarios[J]. Acta Electronica Sinica, 2025, 53(11): 3880-3893. DOI: 10.12263/DZXB.20250411

Abstract

Large language models (LLMs) have demonstrated exceptional performance in task inference and related fields. However, achieving real-time, efficient inference in complex industrial scenarios remains a key open problem. Traditional centralized cloud inference architectures are constrained by the latency of long chain-of-thought (CoT) reasoning and by data-transmission congestion, so they cannot meet the stringent low-latency requirements of complex industrial inference tasks. Lightweight LLMs deployed at the edge respond quickly, but their limited reasoning capability makes it difficult to guarantee inference quality. Edge-cloud collaborative inference therefore becomes the necessary choice. Yet single-modal LLMs struggle to accommodate both modality-specific characteristics and task requirements, the high computational cost of multimodal LLMs limits their general applicability, and directly applying an LLM to complex tasks is prone to hallucination, which degrades inference quality. To address these issues, this paper proposes a fine-grained LLM inference task offloading framework based on edge-cloud collaboration. Lightweight, modality-specific LLMs are deployed at the edge to fit particular data modalities and process simple tasks efficiently with low latency, while a multimodal deep LLM with strong reasoning capability is deployed in the cloud to execute complex logical reasoning tasks and safeguard inference quality. A complex LLM inference task is divided at fine granularity into three stages and represented as a directed acyclic graph (DAG). On this basis, communication and inference-task execution models are constructed, and LLM inference is formulated as the minimization of a weighted sum of overall inference latency and cost. The problem is proved to be a discrete Markov decision process (MDP). To handle the complex interactions between subtask features and system resource states in a dynamic environment, a solution named UCB-COMA is designed, which integrates an upper confidence bound (UCB)-based action selection mechanism with counterfactual multi-agent policy gradient (COMA) to jointly optimize the subtask scheduling order and the execution location of each inference subtask. Experimental results show that the proposed scheme outperforms the comparison schemes.
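The ingredients named in the abstract (a subtask DAG, a weighted latency-plus-cost objective, UCB action selection, and the COMA counterfactual baseline) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the DAG shape and subtask names, the latency and cost figures in `PROFILE`, the fixed link delay `TRANSFER_S`, and the function names. Exhaustive search stands in for the learned UCB-COMA policy, `ucb_select` uses the textbook UCB1 bonus, and `counterfactual_advantage` follows the general COMA definition (Q of the chosen action minus the policy-weighted average over that agent's alternatives); the paper's exact formulation may differ.

```python
import itertools
import math
from graphlib import TopologicalSorter

# Hypothetical DAG: subtask -> set of predecessor subtasks.
DEPS = {
    "parse": set(),
    "reason_a": {"parse"},
    "reason_b": {"parse"},
    "merge": {"reason_a", "reason_b"},
}

# Hypothetical (latency in seconds, cost) per execution site; made-up numbers.
PROFILE = {
    "parse":    {"edge": (0.2, 0.0), "cloud": (0.1, 1.0)},
    "reason_a": {"edge": (2.0, 0.0), "cloud": (0.5, 2.0)},
    "reason_b": {"edge": (1.0, 0.0), "cloud": (0.4, 2.0)},
    "merge":    {"edge": (0.3, 0.0), "cloud": (0.1, 1.0)},
}
TRANSFER_S = 0.3  # assumed delay whenever a result crosses the edge-cloud link


def objective(placement, weight=0.5):
    """Weighted sum of DAG makespan and total cost for one placement."""
    finish = {}
    for task in TopologicalSorter(DEPS).static_order():
        ready = 0.0
        for pred in DEPS[task]:
            hop = TRANSFER_S if placement[pred] != placement[task] else 0.0
            ready = max(ready, finish[pred] + hop)
        finish[task] = ready + PROFILE[task][placement[task]][0]
    latency = max(finish.values())
    cost = sum(PROFILE[t][placement[t]][1] for t in DEPS)
    return weight * latency + (1.0 - weight) * cost


def best_placement(weight=0.5):
    """Exhaustive search over edge/cloud placements (tiny DAGs only)."""
    tasks = list(DEPS)
    candidates = (
        dict(zip(tasks, sites))
        for sites in itertools.product(("edge", "cloud"), repeat=len(tasks))
    )
    return min(candidates, key=lambda p: objective(p, weight))


def ucb_select(q_values, counts, step, c=1.0):
    """UCB1-style choice: untried actions first, then value plus exploration bonus."""
    for action, n in enumerate(counts):
        if n == 0:
            return action
    scores = [q + c * math.sqrt(math.log(step) / n)
              for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def counterfactual_advantage(q_row, policy, taken):
    """COMA-style advantage for one agent, other agents' actions held fixed.

    q_row[i] is the joint Q-value if this agent had picked action i instead.
    """
    baseline = sum(p * q for p, q in zip(policy, q_row))
    return q_row[taken] - baseline
```

With these made-up numbers, `best_placement(0.5)` keeps every subtask at the edge, while `best_placement(0.9)`, which weights latency more heavily, moves every subtask to the cloud. This shows how the weight trades latency against cost; it says nothing about the paper's experimental results.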


