Adaptive Fusion-Based Robustness Evaluation Method for Multi-Agent Game Strategies

LI Junwei; RUAN Shulan; LIANG Jiaxuan; LIU Yu; HE You

doi:10.12263/DZXB.20251264


The Theory and Application of Swarm Intelligence Technology in the Information-Rich Era | Updated: 2026-06-16

- ACTA ELECTRONICA SINICA Vol. 54, Issue 3, Pages: 912-926 (2026)
- Author affiliations:
  1. Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong 518055
  2. School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055
  3. Department of Electronic Engineering, Tsinghua University, Beijing 100084
- Funding:
  National Natural Science Foundation of China (62293544; 62425117; 62506205); China Postdoctoral Science Foundation (2025T180426); Postdoctoral Fellowship Program of CPSF (GZB20250393)
- DOI: 10.12263/DZXB.20251264
  CLC: TP181
- Received: 16 February 2026; Accepted: 19 March 2026; Published: 25 March 2026

LI Junwei, RUAN Shulan, LIANG Jiaxuan, et al. Adaptive Fusion-Based Robustness Evaluation Method for Multi-Agent Game Strategies[J]. Acta Electronica Sinica, 2026, 54(03): 912-926. DOI: 10.12263/DZXB.20251264


Abstract

With the rapid development of multi-agent reinforcement learning algorithms, agents' capabilities for cooperation and competition in game-based tasks have significantly improved. However, given the dynamic changes of real-world environments, performance fluctuations of strategies during cross-environment transfer have become increasingly prominent. Although robustness enhancement techniques such as adversarial training and domain randomization have emerged, existing robustness evaluation frameworks still exhibit evident limitations. Current methods often focus only on changes in a single performance metric such as average reward, while neglecting safety or stability metrics such as collision frequency, making it difficult to comprehensively evaluate strategy stability. In addition, the lack of unified evaluation benchmarks leads different studies to rely on specific experimental parameter settings, hindering fair comparisons across diverse scenarios. These limitations restrict the practical deployment and iterative optimization of game strategies. To address these issues, we propose a robustness evaluation method for multi-agent game strategies via multidimensional adaptive fusion, aiming to provide a quantitative analysis of strategy stability through mathematically formalized modeling. First, we design a robustness score (RS) based on the conditional coefficient of variation (CondCV) to accurately capture and fuse the fluctuation characteristics of base metrics under environmental perturbations. By eliminating dimensional differences among metrics, the method establishes a standardized and generalizable measurement with strong adaptability and evaluation fairness, making it broadly applicable to strategy evaluation in cooperative, competitive, and other multi-agent environments. To address the weight assignment for multidimensional metrics, we propose an adaptive weight fusion framework based on an adversarial α-Rank evolutionary game. This framework models ranking consistency among metrics as a game process, derives objective weights from the stationary distribution, and dynamically fuses them with expert prior weights, achieving a balance between objective metric stability and expert prior knowledge. To validate the effectiveness of our method, we develop highly configurable multi-agent environments based on Isaac Sim that cover typical adversarial and cooperative game scenarios, and conduct systematic experiments with various mainstream algorithms. Experimental results demonstrate that the evaluation method can effectively measure strategy stability under diverse environmental settings, exhibiting multidimensional fluctuation-capturing capability and cross-task generality, thereby providing theoretical support and reference for algorithm optimization and evaluation. Finally, we discuss the potential application of the evaluation method in sim-to-real transfer and propose corresponding feasible solutions, offering insights for future research.
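
The abstract does not give the formulas behind RS, CondCV, or the weight fusion, so the following is only a rough sketch of the general idea under stated assumptions, not the authors' method. It uses the plain coefficient of variation as a stand-in for CondCV, a hypothetical score mapping `1 / (1 + CV)`, a generic power-iteration stationary distribution in place of the full α-Rank computation, and a convex blend with a hypothetical mixing parameter `lam` for the fusion step. All function names, the transition matrix `P`, and the sample values are illustrative.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values, eps=1e-8):
    """Population std divided by |mean|. The ratio is dimensionless, so
    metrics with different units can be compared on one scale.
    Assumption: plain CV, standing in for the paper's CondCV."""
    return pstdev(values) / (abs(mean(values)) + eps)


def robustness_score(metric_samples, weights):
    """metric_samples maps a metric name to its values across perturbed
    environments; weights maps the same names to non-negative weights.
    Hypothetical mapping: each metric contributes w_k / (1 + CV_k), so the
    score lies in (0, 1] and equals 1 when no metric fluctuates."""
    total = sum(weights.values())
    return sum(
        (weights[name] / total) / (1.0 + coefficient_of_variation(vals))
        for name, vals in metric_samples.items()
    )


def stationary_distribution(P, iters=1000):
    """Power iteration for a row-stochastic matrix P (list of rows).
    Assumption: P is given; how the paper builds it from ranking
    consistency among metrics is not stated in the abstract."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def fuse_weights(objective, prior, lam=0.5):
    """Convex blend of objective and prior weights, renormalized to sum to 1.
    lam is a hypothetical mixing parameter (1.0 = objective weights only)."""
    mixed = [lam * o + (1.0 - lam) * p for o, p in zip(objective, prior)]
    s = sum(mixed)
    return [m / s for m in mixed]


# Illustrative data only: two metrics measured in four perturbed environments.
samples = {
    "avg_reward": [10.0, 9.0, 11.0, 10.0],
    "collisions": [2.0, 2.0, 2.0, 2.0],
}
P = [[0.9, 0.1], [0.2, 0.8]]          # made-up transition matrix over metrics
objective = stationary_distribution(P)
fused = fuse_weights(objective, prior=[0.5, 0.5], lam=0.5)
score = robustness_score(samples, dict(zip(samples, fused)))
print(f"fused weights: {fused}, robustness score: {score:.4f}")
```

In this toy setup the collision metric does not fluctuate at all, so it contributes its full weight, while the reward metric is discounted in proportion to its relative spread. Swapping in the paper's actual CondCV and α-Rank definitions would change the numbers but not the overall flow of normalize, weight, and fuse.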


