• Electromagnetic Spectrum Intelligence+ •

### Distributed Cooperative Jamming Power Allocation Algorithm Based on Multi-Agent Deep Reinforcement Learning

1. Information and Navigation College, Air Force Engineering University, Xi'an, Shaanxi 710077, China
• Received: 2021-06-30  Revised: 2021-10-10  Online: 2022-06-25
• About the authors:
• RAO Ning, male, born in August 1997 in Shangrao, Jiangxi, is a master's student at the Information and Navigation College, Air Force Engineering University. His research interests are communication countermeasures and reinforcement learning. E-mail: raoningmabma@163.com
XU Hua, male, born in April 1976 in Yichang, Hubei, is a professor and doctoral supervisor at the Information and Navigation College, Air Force Engineering University. His research interests are communication countermeasures and blind signal processing. E-mail: 13720720010@139.com
JIANG Lei, male, born in June 1974 in Wuxi, Jiangsu, is an associate professor and master's supervisor at the Information and Navigation College, Air Force Engineering University. His research interests are communication countermeasures and wireless communication technology. E-mail: jleimail@126.com
SONG Bai-lin, male, born in November 1997 in Shenyang, Liaoning, is a master's student at the Information and Navigation College, Air Force Engineering University. His research interests are communication countermeasures and reinforcement learning. E-mail: songbail@126.com
SHI Yun-hao, male, born in July 1996 in Xianyang, Shaanxi, is a doctoral student at the Information and Navigation College, Air Force Engineering University. His research interests are signal recognition and deep learning. E-mail: shiyunhaoai@163.com

### Allocation Algorithm of Distributed Cooperative Jamming Power Based on Multi-Agent Deep Reinforcement Learning

RAO Ning, XU Hua, JIANG Lei, SONG Bai-lin, SHI Yun-hao

1. Information and Navigation College of Air Force Engineering University, Xi’an, Shaanxi 710077, China
• Received: 2021-06-30  Revised: 2021-10-10  Online: 2022-06-25  Published: 2022-06-25

Abstract:

To solve the problem of jamming power allocation in battlefield cooperative communication countermeasures, this paper designs a distributed cooperative jamming power allocation method based on multi-agent deep reinforcement learning. Specifically, communication jamming power allocation is modeled as a fully cooperative multi-agent task. A centralized-training, distributed-decision framework is then adopted to mitigate the non-stationary environment and high decision dimensionality of the multi-agent system while also reducing the communication overhead between agents, and a maximum policy entropy criterion is introduced to control each agent's exploration efficiency. Taking the maximization of both the cumulative jamming reward and the entropy of the jamming policy as the optimization objective accelerates the learning of cooperative strategies. Simulation results indicate that the proposed distributed method effectively solves the high-dimensional cooperative jamming power allocation problem. Compared with the existing centralized allocation method, it learns faster with less volatility, and its jamming efficiency is 16.8% higher under the same conditions.

Extended Abstract
To solve the problem of jamming power allocation for multi-device cooperative jamming in battlefield cooperative communication countermeasure scenarios, this paper designs a distributed cooperative jamming power allocation method based on multi-agent deep reinforcement learning. Specifically, communication jamming power allocation is modeled as a fully cooperative multi-agent task. Combining the advantages of centralized learning and independent learning in multi-agent systems, a centralized-training, distributed-decision framework is adopted to mitigate the non-stationarity, high decision dimensionality, and difficult training convergence of the multi-agent system while also reducing the communication overhead between agents, and a maximum policy entropy criterion is introduced to control each agent's exploration efficiency. Taking the maximization of both the cumulative jamming reward and the entropy of the jamming policy as the optimization objective accelerates the learning of cooperative strategies. The reward function jointly accounts for completing the overall jamming suppression task and optimizing jamming power utilization, so a reasonable jamming power allocation scheme can be adaptively adjusted under different jamming suppression coefficients. Simulation results indicate that the proposed distributed method effectively solves the high-dimensional cooperative jamming power allocation problem. Compared with the existing centralized allocation method, it learns faster with less volatility, and its jamming efficiency is 16.8% higher under the same conditions. An ablation experiment shows that maximum policy entropy further improves exploration efficiency and finds the optimal scheme faster.
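The optimization objective described above — maximizing the cumulative jamming reward together with the entropy of the jamming policy — is the standard entropy-regularized return J(π) = E[Σ_t γ^t (r_t + α·H(π(·|s_t)))]. The following is a minimal illustrative sketch of that quantity, not the authors' implementation; the per-step rewards, the discrete action distributions, and the temperature α are all hypothetical values chosen for demonstration:

```python
import numpy as np

def policy_entropy(probs):
    """Shannon entropy H(pi) of a discrete action distribution."""
    probs = np.asarray(probs, dtype=float)
    return -np.sum(probs * np.log(probs + 1e-12))

def entropy_regularized_return(rewards, action_dists, gamma=0.99, alpha=0.2):
    """Discounted return with a maximum-entropy bonus:
    J = sum_t gamma^t * (r_t + alpha * H(pi(.|s_t)))."""
    return sum(gamma**t * (r + alpha * policy_entropy(p))
               for t, (r, p) in enumerate(zip(rewards, action_dists)))

# Hypothetical per-step jamming rewards and per-step action distributions
# over four discrete jamming power levels.
rewards = [1.0, 0.5, 2.0]
dists = [
    [0.25, 0.25, 0.25, 0.25],    # uniform: high entropy, strong exploration
    [0.70, 0.10, 0.10, 0.10],
    [0.97, 0.01, 0.01, 0.01],    # near-deterministic: low entropy
]

j_max_ent = entropy_regularized_return(rewards, dists)
j_plain = entropy_regularized_return(rewards, dists, alpha=0.0)
print(f"entropy-regularized return: {j_max_ent:.3f}")
print(f"plain discounted return:    {j_plain:.3f}")
```

Because the entropy bonus is always non-negative, the regularized return exceeds the plain discounted return; during training this term rewards policies that keep exploring rather than collapsing prematurely onto one power-allocation action, which is the exploration-control role the paper assigns to the maximum policy entropy criterion.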