Acta Electronica Sinica, 2022, Vol. 50, Issue (6): 1301-1309. DOI: 10.12263/DZXB.20210814

• Column: Electromagnetic Spectrum Intelligence+

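A Collaborative Communication Jamming Decision Algorithm Based on Deep Reinforcement Learning

SONG Bai-lin, XU Hua, QI Zi-sen, RAO Ning, PENG Xiang

  1. Information and Navigation School, Air Force Engineering University, Xi'an, Shaanxi 710077, China
  • Received: 2021-06-30  Revised: 2022-01-05  Online: 2022-06-25  Published: 2022-06-25
  • About the authors: SONG Bai-lin, male, born in 1997 in Shenyang, Liaoning. He is currently a master's student at Air Force Engineering University. His main research interests are intelligent decision-making for communication countermeasures and deep reinforcement learning. E-mail: songbail@126.com
    XU Hua, male, born in 1976 in Yichang, Hubei. He is currently a professor and doctoral supervisor at the Information and Navigation School, Air Force Engineering University. His main research interests are communication countermeasures and blind signal processing. E-mail: 13720720010@139.com
  • Funding:
    Young Scientists Fund of the National Natural Science Foundation of China (6190656)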

Abstract:

To address the problem of collaborative decision-making for frequency-hopping communication jamming in collaborative electronic warfare, this paper builds a collaborative decision-making model based on "overall optimization, station-by-station decision-making" and, using deep reinforcement learning, designs a decision algorithm that incorporates the advantage function into the Actor-Critic framework. An expert incentive mechanism is embedded in the reward function to improve the algorithm's exploration ability, and the decision network is optimized by centralized training with distributed execution. As a result, the algorithm outputs the jamming scheme with the highest resource utilization and greatly improves decision-making efficiency. Simulation results show that, compared with existing intelligent decision algorithms, the jamming schemes produced by the proposed algorithm save 8% of jamming resources and improve decision efficiency by more than 50%, which is of considerable practical value.

Key words: deep reinforcement learning, communication jamming decision-making, jamming resource allocation, advantage function, expert incentive
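The abstract combines three ingredients: station-by-station decisions, an Actor-Critic learner driven by the advantage function, and an expert incentive added to the reward. The paper's networks, state encoding and reward design are not given here, so the following is only a minimal tabular sketch under stated assumptions. It uses a hypothetical toy problem in which three jamming stations choose, one at a time, which of three frequency-hopping networks to jam. The base reward (+1 for a newly jammed target, -1 for a redundant assignment), the "expert" rule (jam the first uncovered target), the bonus weight `EXPERT_BONUS` and the helper names `train` and `greedy_plan` are all invented for illustration. The advantage is estimated with the one-step TD error A(s, a) ≈ r + γV(s') - V(s), a standard actor-critic choice that is not necessarily the authors'. It is a single tabular learner and does not model the deep networks or the centralized training scheme.

```python
import math
import random

# Hypothetical toy setting (not from the paper): stations decide one by one
# which frequency-hopping network (target) to jam.
N_TARGETS = 3
N_STATIONS = 3
GAMMA = 0.9          # discount factor
ALPHA_ACTOR = 0.1    # actor (policy) step size
ALPHA_CRITIC = 0.2   # critic (value) step size
EXPERT_BONUS = 0.5   # hypothetical weight of the expert-incentive term


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def expert_action(covered):
    """Hypothetical expert rule: jam the first target not yet covered."""
    for t in range(N_TARGETS):
        if not covered[t]:
            return t
    return 0


def step(covered, action):
    """Base reward: +1 for jamming a new target, -1 for a redundant assignment."""
    reward = -1.0 if covered[action] else 1.0
    new_covered = list(covered)
    new_covered[action] = True
    return tuple(new_covered), reward


def shaped_reward(base, action, covered):
    """Expert incentive: add a bonus when the action matches the expert rule."""
    bonus = EXPERT_BONUS if action == expert_action(covered) else 0.0
    return base + bonus


def advantage(r, v_s, v_next, done):
    """One-step TD estimate of A(s, a) = r + gamma * V(s') - V(s)."""
    return r + (0.0 if done else GAMMA * v_next) - v_s


def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    prefs = {}   # actor: state -> softmax preferences over targets
    values = {}  # critic: state -> estimated V(s)
    for _ in range(episodes):
        covered = (False,) * N_TARGETS
        for station in range(N_STATIONS):
            s = (station, covered)  # state = (station index, coverage so far)
            p = prefs.setdefault(s, [0.0] * N_TARGETS)
            probs = softmax(p)
            a = rng.choices(range(N_TARGETS), weights=probs)[0]
            next_covered, base = step(covered, a)
            r = shaped_reward(base, a, covered)
            done = station == N_STATIONS - 1
            s_next = (station + 1, next_covered)
            adv = advantage(r, values.get(s, 0.0), values.get(s_next, 0.0), done)
            # Critic: move V(s) toward the TD target.
            values[s] = values.get(s, 0.0) + ALPHA_CRITIC * adv
            # Actor: for a softmax policy, d log pi(a|s) / d pref[b] = 1[b == a] - pi(b|s).
            for b in range(N_TARGETS):
                p[b] += ALPHA_ACTOR * adv * ((1.0 if b == a else 0.0) - probs[b])
            covered = next_covered
    return prefs


def greedy_plan(prefs):
    """Follow the highest-preference action station by station."""
    covered = (False,) * N_TARGETS
    plan = []
    for station in range(N_STATIONS):
        p = prefs.get((station, covered), [0.0] * N_TARGETS)
        a = p.index(max(p))
        plan.append(a)
        covered, _ = step(covered, a)
    return plan


if __name__ == "__main__":
    print(greedy_plan(train()))
```

Calling `greedy_plan(train())` returns one target index per station; in this toy setting a good plan sends each station to a different target, so no jamming resource is spent twice on the same network. The expert bonus here is a simple form of reward shaping that biases exploration toward the heuristic's choices; the paper's actual expert incentive mechanism may be defined differently.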
