College of Information Science and Technology, Jinan University, Guangzhou, Guangdong 511346
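LIU Zhong-ren (刘忠仁): male, born in January 1996 in Ganzhou, Jiangxi Province. He is currently a Ph.D. candidate at the College of Information Science and Technology, Jinan University. His main research interests are artificial intelligence and large language model inference systems. E-mail: lzrisme@stu2024.jnu.edu.cn
LI Zhe-tao (李哲涛): male, born in January 1980 in Shaoyang, Hunan Province. He is currently a professor and Ph.D. supervisor at the College of Information Science and Technology, Jinan University. His main research interests include cloud computing, intelligent networks, and artificial intelligence. E-mail: liztchina@hotmail.com
WANG Jian-hui (王建辉): male, born in December 1997 in Yongzhou, Hunan Province. He is currently a Ph.D. candidate at the College of Information Science and Technology, Jinan University. His main research interests are edge computing and artificial intelligence. E-mail: tranfer98@foxmail.com
XIAO Yong (肖勇): male, born in October 2000 in Hengyang, Hunan Province. He is currently a Ph.D. candidate at the College of Information Science and Technology, Jinan University. His main research interests are artificial intelligence and data privacy and security. E-mail: xiaoyong@stu2022.jnu.edu.cn
ZENG Xi-yu (曾曦玉): female, born in April 1999 in Yibin, Sichuan Province. She is currently a Ph.D. candidate at the College of Information Science and Technology, Jinan University. Her main research interests are artificial intelligence, privacy and security, and their applications. E-mail: xyzeng@stu2025.jnu.edu.cn
LI Jun (李俊): male, born in June 2003 in Xinyang, Henan Province. He is currently a master's student at the College of Cyber Security, Jinan University. His main research interests are large-model agent systems and their security. E-mail: koinu@qq.com
MO Guang-feng (莫光峰): male, born in February 2002 in Maoming, Guangdong Province. He is currently a master's student at the College of Information Science and Technology, Jinan University. His main research interests are artificial intelligence and model evaluation. E-mail: moguangfeng2002@126.com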
Received: 2025-06-11
Accepted: 2025-11-11
Published in print: 2025-11-25
LIU Zhong-ren, LI Zhe-tao, WANG Jian-hui, et al. Multi-Model Serial and Parallel Collaborative Inference in AI-ModelNet[J]. Acta Electronica Sinica, 2025, 53(11): 3817-3835. DOI:10.12263/DZXB.20250503
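Large language models (LLMs), with their massive parameter scales and strong semantic representation capabilities, have achieved breakthroughs in natural language processing, computer vision, and related fields, and are becoming a key foundation of intelligent systems. However, as demand grows for lightweight models, local customization, and scenario-specific specialization, specialized models developed for particular tasks are emerging rapidly. Such models usually hold an advantage within a narrow domain but cannot independently cover complex reasoning needs that span multiple tasks and domains, which has driven research on multi-model collaborative inference. Existing studies mostly focus on model fusion or on a single collaboration paradigm; they do not fully exploit the complementary strengths of different models and lack a systematic exploration of collaboration structures and path mechanisms. To this end, this paper proposes a multi-model collaborative structured inference method for model-interconnection (AI-ModelNet) scenarios and builds a collaborative inference system that evolves from linear chain structures to multi-path composite structures. At the basic collaboration level, two core paradigms are designed: serial inference (SI) and parallel inference (PI), which use stage-wise information passing and multi-model parallel processing, respectively, to improve semantic convergence and information coverage during inference. On this basis, two composite strategies are proposed at the paradigm level: serial-to-parallel (S2P) and parallel-to-serial (P2S). They dynamically schedule collaboration paths between depth and breadth, extending the structural expressiveness and the reasoning capability boundary of multi-model collaboration. A systematic experimental framework is built on three typical task types (mathematical reasoning, knowledge understanding, and symbolic reasoning) to evaluate the four collaboration strategies comprehensively. The results show that, compared with single-model inference, the four strategies improve average accuracy by 24.33, 16.66, 26.66, and 25.33 percentage points, respectively. Further analysis shows that the composite strategies combine the advantages of the serial and parallel structures while effectively compressing overall inference latency, and that they achieve higher inference accuracy with an acceptable latency increase relative to the best single model, yielding a better performance-efficiency trade-off in multi-task scenarios. In addition, this paper systematically analyzes how different model path configurations perform during collaboration, providing a theoretical basis and empirical support for the design of multi-model network structures, the optimization of collaboration mechanisms, and the construction of large-scale model-interconnection systems.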
Large language models (LLMs), empowered by massive parameter scales and strong semantic representation capabilities, have achieved breakthrough progress in natural language processing, computer vision, and related fields, and have gradually become a key foundation of modern intelligent systems. However, increasing demands for lightweight deployment, on-device customization, and scenario-specific specialization have led to the rapid emergence of task-specific models. Although these specialized models exhibit strong capabilities within their respective domains, they are insufficient for handling complex multi-task and multi-domain reasoning independently, which motivates research on multi-model collaborative inference. Existing studies primarily focus on model fusion or single collaboration paradigms, which limits the exploitation of complementary strengths across models and lacks systematic exploration of collaboration structures and path mechanisms. To address these challenges, this study proposes a collaborative inference framework for model-interconnection scenarios, enabling an evolutionary shift from linear chain structures to multi-path composite structures. The framework formalizes two basic paradigms, serial inference (SI) and parallel inference (PI), and further introduces two hybrid strategies, serial-to-parallel (S2P) and parallel-to-serial (P2S), to dynamically coordinate depth- and breadth-oriented collaboration pathways. Comprehensive experiments on mathematical reasoning, knowledge understanding, and symbolic reasoning show that SI, PI, S2P, and P2S improve accuracy by 24.33, 16.66, 26.66, and 25.33 percentage points, respectively, compared with single-model inference. Additional analysis shows that hybrid collaboration significantly reduces overall inference latency while achieving higher accuracy, demonstrating a superior performance-efficiency trade-off. Moreover, the study reveals the structural impacts of different collaboration paths, offering theoretical insights and empirical evidence for the design of multi-model networks and efficient model-interconnection systems.
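The four collaboration structures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how stage outputs are passed or how parallel outputs are combined, so the `Model` type, all function names, and the majority-vote aggregator below are assumptions made for illustration only. Each "model" is a plain function from prompt text to answer text, standing in for a real LLM call.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical stand-in for an LLM call: any function from prompt text to answer text.
Model = Callable[[str], str]
Aggregator = Callable[[List[str]], str]


def majority_vote(answers: List[str]) -> str:
    """Assumed aggregation rule (not from the paper): return the most frequent answer."""
    return Counter(answers).most_common(1)[0][0]


def serial_inference(models: List[Model], prompt: str) -> str:
    """SI: each model receives the previous model's output (stage-wise passing)."""
    text = prompt
    for model in models:
        text = model(text)
    return text


def parallel_inference(models: List[Model], prompt: str,
                       aggregate: Aggregator = majority_vote) -> str:
    """PI: all models answer the same prompt concurrently; outputs are aggregated."""
    # Requires at least one model, since max_workers must be positive.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        answers = list(pool.map(lambda m: m(prompt), models))
    return aggregate(answers)


def serial_to_parallel(serial_models: List[Model], parallel_models: List[Model],
                       prompt: str, aggregate: Aggregator = majority_vote) -> str:
    """S2P: a serial chain runs first, then a parallel group works on its output."""
    refined = serial_inference(serial_models, prompt)
    return parallel_inference(parallel_models, refined, aggregate)


def parallel_to_serial(parallel_models: List[Model], serial_models: List[Model],
                       prompt: str, aggregate: Aggregator = majority_vote) -> str:
    """P2S: a parallel group runs first, then a serial chain refines the aggregate."""
    combined = parallel_inference(parallel_models, prompt, aggregate)
    return serial_inference(serial_models, combined)


if __name__ == "__main__":
    # Toy models (hypothetical): a normalizer, three "voters", and an explainer.
    normalize = lambda q: q.strip().lower()
    voters = [lambda q: "4", lambda q: "4", lambda q: "5"]
    explain = lambda a: f"final answer: {a}"

    print(serial_to_parallel([normalize], voters, "  2 + 2 = ? "))  # prints "4"
    print(parallel_to_serial(voters, [explain], "2 + 2 = ?"))       # prints "final answer: 4"
```

In this sketch the hybrid strategies are simply compositions of the two basic paradigms, which is one way to read the abstract's description of trading off depth (serial stages) against breadth (parallel branches). Under idealized execution, a serial chain's latency is the sum of its stage latencies, while a parallel group's latency is bounded below by its slowest member.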