Multi-Agent Reinforcement Learning: From Foundational Theory to Cutting-Edge Algorithms

HAN Guang-jie; ZHU Sheng-chao; LIN Chuan; JIANG Jin-fang

doi:10.12263/DZXB.20250418


Review and Commentary | Updated: 2026-04-24

- Multi-Agent Reinforcement Learning: From Foundational Theory to Cutting-Edge Algorithms
- Acta Electronica Sinica, 2025, Vol. 53, No. 12, pp. 4756-4786
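- Author affiliations:
  
  1. College of Information Science and Engineering, Hohai University, Changzhou, Jiangsu 213200
  2. College of Computer and Software, Hohai University, Nanjing, Jiangsu 211100
  3. Software College, Northeastern University, Shenyang, Liaoning 110169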
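- Author biographies:
  
  HAN Guang-jie, male, was born in August 1972 in Suihua, Heilongjiang Province. He is currently a professor and doctoral supervisor at the College of Information Science and Engineering, Hohai University. His main research interests include underwater acoustic communication and networking, intelligent Internet of Things for water conservancy, artificial intelligence, and networks and security. Chinese Institute of Electronics membership No.: E190157962M. E-mail: hanguangjie@gmail.com
  ZHU Sheng-chao, male, was born in September 2001 in Dezhou, Shandong Province. He is currently a doctoral student at the College of Computer and Software, Hohai University. His main research interests include multi-agent reinforcement learning, software-defined networking, and smart ocean. Chinese Institute of Electronics membership No.: E190197863A. E-mail: zhushengchao77@gmail.com
  LIN Chuan, male, was born in February 1988 in Dandong, Liaoning Province. He is currently an associate professor and doctoral supervisor at the Software College, Northeastern University. His main research interests include multi-agent reinforcement learning, software-defined networking, and smart ocean. E-mail: chuanlin1988@gmail.com
  JIANG Jin-fang, female, was born in January 1988 in Lu'an, Anhui Province. She is currently a professor and doctoral supervisor at the College of Information Science and Engineering, Hohai University. Her main research interests include underwater communication and networking and underwater trust. Chinese Institute of Electronics membership No.: E190157961M. E-mail: jiangjinfang@hhu.edu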
- Funding:
  
  National Natural Science Foundation of China (U22A2011)
- DOI: 10.12263/DZXB.20250418
  Chinese Library Classification (CLC) number: TP18
- Received: 2025-05-26; accepted: 2025-12-05; published in print: 2025-12-25
HAN Guang-jie, ZHU Sheng-chao, LIN Chuan, et al. Multi-Agent Reinforcement Learning: From Foundational Theory to Cutting-Edge Algorithms[J]. Acta Electronica Sinica, 2025, 53(12): 4756-4786. DOI: 10.12263/DZXB.20250418

Abstract

Multi-Agent Reinforcement Learning (MARL), an important framework for handling cooperation and competition among agents in complex dynamic environments, has developed rapidly in both theory and application in recent years and shows broad prospects in fields such as autonomous driving, swarm robotics, intelligent scheduling, and adversarial games. However, multi-agent systems commonly suffer from environmental non-stationarity, strong policy coupling, difficult credit assignment, and complex safety constraints, so MARL faces greater challenges than single-agent reinforcement learning. This paper first reviews the foundational modeling and theoretical framework of MARL. Starting from formal descriptions such as Markov games and partially observable Markov games, and considering typical paradigms such as centralized training with decentralized execution and communication-based cooperative decision-making, it compares existing methods in terms of information utilization, computational complexity, and convergence properties, and summarizes core techniques such as value decomposition, policy gradients, multi-agent credit assignment, and communication modeling. On this basis, the paper summarizes several frontier research directions. The first is MARL based on Large Language Models (LLMs), which draws on the knowledge reasoning and high-level planning capabilities of LLMs for task decomposition, policy guidance, and natural language communication, in order to improve the generalization and collaboration of agents in open environments. The second is MARL based on meta-learning, which targets multi-task and distribution-shift scenarios and focuses on the rapid adaptation of policies to new tasks, new teammates, or new opponents, improving sample efficiency by learning "learn-to-learn" initializations or adaptation rules. The third is explainable MARL, which uses methods such as attention visualization, causal analysis, and rule extraction to make the decision-making process more transparent, supporting policy auditing, human-agent collaboration, and safety supervision. The fourth is the application and deployment of large-scale MARL, which addresses the training-efficiency, communication-overhead, and scalability problems caused by sharp growth in the number of agents and in state dimensionality, and explores mechanisms such as hierarchical structures, population modeling, and parallel training. The fifth is multi-agent safe reinforcement learning, which studies safe decision-making under adversarial perturbations, uncertainty, and strategic games from the perspectives of constraint satisfaction, risk control, and robustness. Finally, drawing on two typical classes of application scenarios, cooperative and competitive, the paper discusses the challenges MARL faces in real-world deployment, such as insufficient sample efficiency, difficult simulation-to-real transfer, and insufficient analysis of fairness and steady-state games, with the aim of providing a systematic reference for subsequent theoretical research and engineering applications of MARL.


CUI J J , LIU Y W , NALLANATHAN A . Multi-agent reinforcement learning-based resource allocation for UAV networks [J ] . IEEE Transactions on Wireless Communications , 2020 , 19 ( 2 ): 729 - 743 .

PAN Y H , WANG X C , XU Z Y , et al . GNN-empowered effective partial observation MARL method for AoI management in multi-UAV network [J ] . IEEE Internet of Things Journal , 2024 , 11 ( 21 ): 34541 - 34553 .

SHI R Y , YU X , WANG Y D , et al . Symmetry-informed MARL: A decentralized and cooperative UAV swarm control approach for communication coverage [J ] . IEEE Transactions on Mobile Computing , 2025 , 24 ( 9 ): 8039 - 8056 .

ZHANG Y , MOU Z Y , GAO F F , et al . UAV-enabled secure communications by multi-agent deep reinforcement learning [J ] . IEEE Transactions on Vehicular Technology , 2020 , 69 ( 10 ): 11599 - 11611 .

CHEN S T , LIU G J , ZHOU Z Y , et al . Robust multi-agent reinforcement learning method based on adversarial domain randomization for real-world dual-UAV cooperation [J ] . IEEE Transactions on Intelligent Vehicles , 2024 , 9 ( 1 ): 1615 - 1627 .

XIA Z Y , DU J , WANG J J , et al . Multi-agent reinforcement learning aided intelligent UAV swarm for target tracking [J ] . IEEE Transactions on Vehicular Technology , 2022 , 71 ( 1 ): 931 - 945 .

CHEN D Z , QI Q , FU Q L , et al . Transformer-based reinforcement learning for scalable multi-UAV area coverage [J ] . IEEE Transactions on Intelligent Transportation Systems , 2024 , 25 ( 8 ): 10062 - 10077 .

BIAGIONI D , ZHANG X Y , WALD D , et al . PowerGridworld: A framework for multi-agent reinforcement learning in power systems [C ] // Proceedings of the Thirteenth ACM International Conference on Future Energy Systems . New York : ACM , 2022 : 565 - 570 .

WANG J H , XU W K , GU Y J , et al . Multi-agent reinforcement learning for active voltage control on power distribution networks [C ] // Proceedings of the 35th International Conference on Neural Information Processing Systems . New York : ACM , 2021 : 3271 - 3284 .

CHEN D , CHEN K A , LI Z J , et al . PowerNet: Multi-agent deep reinforcement learning for scalable powergrid control [J ] . IEEE Transactions on Power Systems , 2022 , 37 ( 2 ): 1007 - 1017 .

SHARMA M K , ZAPPONE A , ASSAAD M , et al . Distributed power control for large energy harvesting networks: A multi-agent deep reinforcement learning approach [J ] . IEEE Transactions on Cognitive Communications and Networking , 2019 , 5 ( 4 ): 1140 - 1154 .

ROESCH M , LINDER C , ZIMMERMANN R , et al . Smart grid for industry using multi-agent reinforcement learning [J ] . Applied Sciences , 2020 , 10 ( 19 ): 10196900 .

YU T , WANG H Z , ZHOU B , et al . Multi-agent correlated equilibrium Q(λ) learning for coordinated smart generation control of interconnected power grids [J ] . IEEE Transactions on Power Systems , 2015 , 30 ( 4 ): 1669 - 1679 .

MU C X , LIU Z Y , YAN J , et al . Graph multi-agent reinforcement learning for inverter-based active voltage control [J ] . IEEE Transactions on Smart Grid , 2024 , 15 ( 2 ): 1399 - 1409 .

HU D E , YE Z H , GAO Y Q , et al . Multi-agent deep reinforcement learning for voltage control with coordinated active and reactive power optimization [J ] . IEEE Transactions on Smart Grid , 2022 , 13 ( 6 ): 4873 - 4886 .

GAO Y Q , WANG W , YU N P . Consensus multi-agent reinforcement learning for volt-VAR control in power distribution networks [J ] . IEEE Transactions on Smart Grid , 2021 , 12 ( 4 ): 3594 - 3604 .

LIU T Y , CHEN H C , HU J F , et al . Generalized multi-agent competitive reinforcement learning with differential augmentation [J ] . Expert Systems with Applications , 2024 , 238 : 121760 .

DASKALAKIS C , FOSTER D J , GOLOWICH N . Independent policy gradient methods for competitive reinforcement learning [EB/OL ] . ( 2021-01-11 )[ 2025-10-10 ] . https://arXiv.org/abs/2101.04233 https://arXiv.org/abs/2101.04233 .

CHEN C Q , YANG H N , ZHAI C J , et al . Competitive pricing for ride-sourcing platforms with MARL [J ] . Transportation Research Part C: Emerging Technologies , 2024 , 165 : 104697 .

WU J H , WANG J D , KONG X Y . Strategic bidding in a competitive electricity market: An intelligent method using Multi-Agent Transfer Learning based on reinforcement learning [J ] . Energy , 2022 , 256 : 124657 .

LIU Z , LU M , WANG Z , et al . Welfare maximization in competitive equilibrium: Reinforcement learning for markov exchange economy [C ] // International Conference on Machine Learning . Cambridge : PMLR , 2022 : 13870 - 13911 .

BAI Y , JIN C . Provable self-play algorithms for competitive reinforcement learning [EB/OL ] . ( 2020-07-09 )[ 2025-10-10 ] . https://arXiv.org/abs/2002.04017 https://arXiv.org/abs/2002.04017 .
