Knowledge-Based and Data-Driven Integrating Design Methodology for Air Combat Strategy in Multi-Opponent Adversarial Game

FENG Jin-yuan; CHEN Min; LI Jun-ying; CHEN Jia-le; PU Zhi-qiang; CHEN Min-jie; SUN Fang-yi

doi:10.12263/DZXB.20230985

PAPERS | Updated: 2025-12-11

- ACTA ELECTRONICA SINICA Vol. 52, Issue 11, Pages: 3809-3822 (2024)
- Author affiliations:
  1. Institute of Automation, Chinese Academy of Sciences, Beijing 100190
  2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049
  3. School of Computer Science, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044
  4. Huaru Research Institute, Beijing 100193
- Funding: National Natural Science Foundation of China (62073323); Strategic Priority Research Program of Chinese Academy of Sciences (XDA27030403); Beijing Nova Program (20220484077)
- DOI: 10.12263/DZXB.20230985
  CLC: TP183
- Received: 19 October 2023; Revised: 27 May 2024; Published: 25 November 2024
FENG Jin-yuan, CHEN Min, LI Jun-ying, et al. Knowledge-Based and Data-Driven Integrating Design Methodology for Air Combat Strategy in Multi-Opponent Adversarial Game[J]. Acta Electronica Sinica, 2024, 52(11): 3809-3822. DOI: 10.12263/DZXB.20230985

Abstract

The rapid development of artificial intelligence has given autonomous air combat strategies the potential to surpass human experts. Existing intelligent air combat strategies fall into two categories according to how they are driven: knowledge-based strategies, which depend heavily on the application scenario and on expert knowledge, and data-driven strategies, represented by reinforcement learning, which have poor interpretability and weak generalization. Taking the multi-aircraft cooperative air combat scenario of the national Air Intelligence Game (AIG) competition as its setting, this study proposes a knowledge-based and data-driven integrated strategy design method for multi-opponent air combat games. The knowledge-based part uses expert knowledge to design a parameterized and stylized artificial intelligence (AI) strategy, which generates high-quality offline data and initializes the learned strategy. The data-driven part employs graph attention networks to build targeted representations of teammate and opponent information, improving training efficiency and convergence performance. In addition, a dynamic opponent matching mechanism is introduced for multi-opponent reinforcement learning training, further improving strategy generalization. The proposed strategy achieved a statistical win rate of over 70% against 12 of the top 16 teams in AIG. These teams all use the latest knowledge-based or data-driven methods, have diverse styles, and have strong combat capabilities.
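The abstract does not say how the dynamic opponent matching mechanism selects opponents. As a rough illustration only, the sketch below assumes one common choice: sample each training opponent with a weight that grows with the learner's current loss rate against it, plus a small floor so that no opponent disappears from training. The class `OpponentPool`, its methods, the `floor` parameter, and the opponent names are all hypothetical and are not taken from the paper.

```python
import random


class OpponentPool:
    """Hypothetical pool that favours opponents the learner still loses to."""

    def __init__(self, names, floor=0.1, seed=0):
        # Per-opponent record of games played and games won by the learner.
        self.stats = {name: {"wins": 0, "games": 0} for name in names}
        self.floor = floor  # minimum weight, keeps every opponent in rotation
        self.rng = random.Random(seed)

    def win_rate(self, name):
        s = self.stats[name]
        # Opponents not yet played are treated as an even match.
        return s["wins"] / s["games"] if s["games"] else 0.5

    def weights(self):
        # Assumed rule: weight = loss rate against the opponent + floor.
        return {n: (1.0 - self.win_rate(n)) + self.floor for n in self.stats}

    def sample(self):
        # Draw one opponent with probability proportional to its weight.
        w = self.weights()
        names = list(w)
        return self.rng.choices(names, weights=[w[n] for n in names], k=1)[0]

    def report(self, name, won):
        # Record the outcome of one training episode against `name`.
        s = self.stats[name]
        s["games"] += 1
        s["wins"] += int(won)


if __name__ == "__main__":
    pool = OpponentPool(["rule_aggressive", "rule_defensive", "rl_baseline"])
    for _ in range(10):
        pool.report("rule_defensive", won=True)   # learner already beats this one
        pool.report("rl_baseline", won=False)     # learner still loses to this one
    print(pool.weights())
    print(pool.sample())
```

Under this assumed rule, opponents the learner already beats are sampled rarely and unbeaten ones are sampled often, which is one way to push a policy toward generalizing across opponent styles rather than overfitting to a single one.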
