

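1. Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong 518055
2. School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055
3. Department of Electronic Engineering, Tsinghua University, Beijing 100084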
Received: 16 February 2026
Accepted: 19 March 2026
Published: 25 March 2026
LI Junwei, RUAN Shulan, LIANG Jiaxuan, et al. Adaptive Fusion-Based Robustness Evaluation Method for Multi-Agent Game Strategies[J]. Acta Electronica Sinica, 2026, 54(03): 912-926. DOI:10.12263/DZXB.20251264
With the rapid development of multi-agent reinforcement learning algorithms, agents' capabilities for cooperation and competition in game-based tasks have improved significantly. However, given the dynamic changes of real-world environments, performance fluctuations of strategies during cross-environment transfer have become increasingly prominent. Although robustness enhancement techniques such as adversarial training and domain randomization have emerged, existing robustness evaluation frameworks still exhibit evident limitations. Current methods often focus only on changes in a single performance metric such as average reward, while neglecting metrics that reflect safety or stability, such as the number of collisions, making it difficult to evaluate strategy stability comprehensively. In addition, the lack of unified evaluation benchmarks leads different studies to rely on specific experimental parameter settings, hindering fair comparisons across diverse scenarios. These limitations restrict the practical deployment and iterative optimization of game strategies. To address these issues, we propose a robustness evaluation method for multi-agent game strategies via multidimensional adaptive fusion, aiming to provide a quantitative analysis of strategy stability through mathematically formalized modeling. First, we design a robustness score (RS) based on the conditional coefficient of variation (CondCV) to accurately capture and fuse the fluctuation characteristics of multiple base metrics under environmental perturbations. By eliminating dimensional differences among metrics, the method establishes a standardized, general-purpose measure with strong adaptability and evaluation fairness, making it broadly applicable to strategy evaluation in cooperative, competitive, and other multi-agent environments. To address weight assignment for multidimensional metrics, we propose an adaptive weight fusion framework based on an adversarial α-Rank evolutionary game. This framework models ranking consistency among metrics as a game process, derives objective weights from the stationary distribution, and dynamically fuses them with expert prior weights, balancing objective metric stability against expert prior knowledge. To validate the effectiveness of our method, we develop highly configurable multi-agent environments based on Isaac Sim that cover typical adversarial and cooperative game scenarios, and conduct systematic experiments with various mainstream algorithms. Experimental results demonstrate that the evaluation method can effectively measure strategy stability under diverse environmental settings, exhibiting multidimensional fluctuation-capturing capability and cross-task generality, thereby providing theoretical support and reference for algorithm optimization and evaluation. Finally, we discuss the potential application of the evaluation method in sim-to-real transfer and propose corresponding feasible solutions, offering insights for future research.
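
The abstract does not state the formula for CondCV or for the weight fusion step, so the following Python sketch is only an illustration under explicit assumptions, not the authors' method. It uses the plain coefficient of variation (standard deviation divided by absolute mean) as a stand-in for CondCV, treats the objective weights as a given input (for example, a stationary distribution computed elsewhere), and fuses them with prior weights through a convex combination. The function names, the mixing coefficient `lam`, the `1 / (1 + CV)` mapping, and the sample numbers are all hypothetical.

```python
import numpy as np


def coefficient_of_variation(values, eps=1e-8):
    """Std / |mean| of one metric across perturbed environments.

    Assumption: plain CV stands in for the paper's CondCV, whose exact
    definition is not given in the abstract. The ratio is dimensionless,
    which is what lets metrics with different units be compared.
    """
    values = np.asarray(values, dtype=float)
    return float(values.std() / (abs(values.mean()) + eps))


def fuse_weights(objective, prior, lam=0.5):
    """Convex combination of objective and prior weights (hypothetical rule).

    Both inputs are normalized to sum to 1 before mixing; lam=1 trusts the
    objective weights only, lam=0 trusts the expert prior only.
    """
    objective = np.asarray(objective, dtype=float)
    prior = np.asarray(prior, dtype=float)
    fused = lam * objective / objective.sum() + (1.0 - lam) * prior / prior.sum()
    return fused / fused.sum()


def robustness_score(metrics, weights):
    """Weighted sum of per-metric stability values in (0, 1].

    Assumption: stability = 1 / (1 + CV), so a metric that never changes
    across environments contributes 1 and a highly volatile one tends to 0.
    `weights` must follow the insertion order of the `metrics` dict.
    """
    cvs = np.array([coefficient_of_variation(v) for v in metrics.values()])
    stability = 1.0 / (1.0 + cvs)
    return float(np.dot(weights, stability))


# Made-up measurements of one policy under four environment settings.
metrics = {
    "avg_reward": [10.0, 9.0, 11.0, 10.0],
    "collisions": [2.0, 4.0, 1.0, 5.0],
}
weights = fuse_weights(objective=[0.7, 0.3], prior=[0.5, 0.5], lam=0.5)
print("fused weights:", weights)
print("robustness score:", robustness_score(metrics, weights))
```

In this sketch the collision count fluctuates far more than the average reward relative to its mean, so it lowers the score even though the reward alone looks stable, which mirrors the abstract's argument for looking beyond a single performance metric.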