

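1. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
3. School of Computer Science, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044, China
4. Huaru Research Institute, Beijing 100193, China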
Received: 19 October 2023
Revised: 27 May 2024
Published: 25 November 2024
FENG Jin-yuan, CHEN Min, LI Jun-ying, et al. Knowledge-Based and Data-Driven Integrating Design Methodology for Air Combat Strategy in Multi-Opponent Adversarial Game[J]. Acta Electronica Sinica, 2024, 52(11): 3809-3822. DOI:10.12263/DZXB.20230985
The rapid development of artificial intelligence has given autonomous air combat strategies the potential to surpass human experts. Existing intelligent air combat strategies fall into two categories according to what drives them: knowledge-based (rule-based) strategies, which depend heavily on the application scenario and on expert knowledge, and data-driven strategies, typified by reinforcement learning, which have poor interpretability and weak generalization. Taking multi-aircraft cooperative air combat in the national Air Intelligence Game (AIG) competition as the setting, this paper proposes a knowledge-based and data-driven integrating method for designing air combat strategies against multiple opponents. The knowledge-based part uses expert knowledge to design a parameterized, stylized strategy, which generates high-quality offline data and initializes the policy. The data-driven part uses graph attention networks to build targeted representations of teammate and opponent information, improving training efficiency and convergence performance. In addition, a dynamic opponent matching mechanism is used for multi-opponent reinforcement learning training, which further improves the generalization of the strategy. Against 12 of the top 16 teams in the competition, the proposed strategy achieved a statistical win rate above 70%. These teams all use the latest knowledge-based or data-driven methods, have diverse styles, and have strong combat capability.
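The abstract does not say how the dynamic opponent matching mechanism selects opponents. As a rough illustration only, the sketch below assumes one plausible rule: opponents that the learner currently loses to are sampled more often, while a small floor keeps already-beaten opponents in rotation. The class `OpponentPool`, its methods, the `floor` parameter, and the opponent names are all hypothetical and are not taken from the paper.

```python
import random


class OpponentPool:
    """Hypothetical opponent pool; an assumption, not the paper's implementation."""

    def __init__(self, names, floor=0.1, seed=0):
        # wins/games are counted from the learner's point of view.
        self.stats = {name: {"wins": 0, "games": 0} for name in names}
        self.floor = floor  # minimum weight, keeps beaten opponents in rotation
        self.rng = random.Random(seed)

    def win_rate(self, name):
        s = self.stats[name]
        # Opponents not yet played default to 0.5.
        return s["wins"] / s["games"] if s["games"] else 0.5

    def weights(self):
        # Assumed rule: weight grows as the learner's win rate falls.
        return {n: self.floor + (1.0 - self.win_rate(n)) for n in self.stats}

    def sample(self):
        # Draw one opponent with probability proportional to its weight.
        w = self.weights()
        names = list(w)
        return self.rng.choices(names, weights=[w[n] for n in names], k=1)[0]

    def report(self, name, learner_won):
        # Record the outcome of one match against the named opponent.
        self.stats[name]["games"] += 1
        self.stats[name]["wins"] += int(learner_won)


# Usage: after ten wins against one opponent and ten losses against another,
# the harder opponent receives the largest sampling weight.
pool = OpponentPool(["rule_aggressive", "rule_defensive", "rl_checkpoint"])
for _ in range(10):
    pool.report("rule_defensive", learner_won=True)
    pool.report("rule_aggressive", learner_won=False)
w = pool.weights()
next_opponent = pool.sample()
```

Under this assumed rule, training time shifts toward the opponents the current policy handles worst, which is one common way to discourage overfitting to a single opponent style.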