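School of Information and Navigation, Air Force Engineering University, Xi'an, Shaanxi 710077, China
RAO Ning, male, born in August 1997 in Shangrao, Jiangxi. He is a master's student at the School of Information and Navigation, Air Force Engineering University. His main research interests are communication countermeasures and reinforcement learning. E-mail: raoningmabma@163.com
XU Hua, male, born in April 1976 in Yichang, Hubei. He is a professor and doctoral supervisor at the School of Information and Navigation, Air Force Engineering University. His main research interests are communication countermeasures and blind signal processing. E-mail: 13720720010@139.com
JIANG Lei, male, born in June 1974 in Wuxi, Jiangsu. He is an associate professor and master's supervisor at the School of Information and Navigation, Air Force Engineering University. His main research interests are communication countermeasures and wireless communication technology. E-mail: jleimail@126.com
SONG Bailin, male, born in November 1997 in Shenyang, Liaoning. He is a master's student at the School of Information and Navigation, Air Force Engineering University. His main research interests are communication countermeasures and reinforcement learning. E-mail: songbail@126.com
SHI Yunhao, male, born in July 1996 in Xianyang, Shaanxi. He is a doctoral student at the School of Information and Navigation, Air Force Engineering University. His main research interests are signal recognition and deep learning. E-mail: shiyunhaoai@163.com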
Received: 2021-06-30
Revised: 2021-10-10
Published in print: 2022-06-25
RAO Ning, XU Hua, JIANG Lei, et al. Allocation Algorithm of Distributed Cooperative Jamming Power Based on Multi-Agent Deep Reinforcement Learning[J]. Acta Electronica Sinica, 2022, 50(06): 1319-1330. DOI: 10.12263/DZXB.20210818.
To address the jamming power allocation problem in cooperative jamming for battlefield communication countermeasures, this paper designs a distributed cooperative jamming power allocation algorithm based on multi-agent deep reinforcement learning. Specifically, communication jamming power allocation is modeled as a fully cooperative multi-agent task. A framework of centralized training and distributed decision-making is adopted to alleviate the non-stationary environment and high decision dimensionality of multi-agent systems and to reduce the communication overhead between agents. A maximum policy entropy criterion is introduced to control the exploration efficiency of each agent: taking the maximization of both the cumulative jamming reward and the entropy of the jamming policy as the optimization objective accelerates the learning of cooperative policies among the agents. Simulation results show that the proposed distributed algorithm effectively solves the high-dimensional cooperative jamming power allocation problem. Compared with the existing centralized allocation algorithm, it learns faster and fluctuates less, and under the same conditions its jamming efficiency can exceed that of the centralized algorithm by 16.8%.
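The optimization objective named in the abstract, cumulative jamming reward plus policy entropy, can be illustrated with a minimal sketch. It is not the authors' implementation: the discrete power levels, the temperature `alpha`, the discount `gamma`, and the function names are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

def softmax(logits):
    """Turn raw preferences over power levels into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a discrete policy."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def max_entropy_return(rewards, policies, alpha=0.2, gamma=0.99):
    """Discounted sum of r_t + alpha * H(pi(.|s_t)) along one agent's trajectory.

    alpha (entropy weight) and gamma (discount) are assumed values.
    """
    total = 0.0
    for t, (r, probs) in enumerate(zip(rewards, policies)):
        total += (gamma ** t) * (r + alpha * entropy(probs))
    return total

# Hypothetical jammer choosing among 4 discrete power levels for 3 steps.
uniform = softmax([0.0, 0.0, 0.0, 0.0])   # highly exploratory policy
peaked = softmax([5.0, 0.0, 0.0, 0.0])    # nearly deterministic policy
rewards = [1.0, 1.0, 1.0]                 # same jamming reward for both

j_uniform = max_entropy_return(rewards, [uniform] * 3)
j_peaked = max_entropy_return(rewards, [peaked] * 3)
```

With equal rewards, the more exploratory policy scores higher under this objective, which is how an entropy term keeps agents from collapsing onto one power level too early. Setting `alpha=0` recovers the ordinary discounted return.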