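1. School of Advanced Manufacturing, Fuzhou University, Jinjiang, Fujian 362251
2. College of Physics and Information Engineering, Fuzhou University, Fuzhou, Fujian 350108
3. School of Information Engineering, Guangdong University of Technology, Guangzhou, Guangdong 510006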
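CHEN Ping-ping: male, born in December 1986, professor at the School of Advanced Manufacturing, Fuzhou University. His main research interests include reinforcement learning, compressed sensing, channel coding, and wireless communications. Chinese Institute of Electronics member No. E190021215M.
ZHANG Xu: male, born in September 1997. He received his master's degree from the School of Advanced Manufacturing, Fuzhou University, in 2023. His main research interests are reinforcement learning and wireless communications.
XIE Zhao-peng: male, born in July 1995, lecturer at the School of Advanced Manufacturing, Fuzhou University. His main research interests include reinforcement learning, channel coding, and wireless communications. Chinese Institute of Electronics member No. E190156454M. Email: xzp_fzu@163.com
FANG Yi: male, born in August 1986, professor at the School of Information Engineering, Guangdong University of Technology. His research interests are information theory and channel coding, wireless communications, and coding for data storage. Chinese Institute of Electronics member No. E190028682M.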
Received: 2023-07-14,
Revised: 2024-01-16,
Published in print: 2024-06-25
CHEN Ping-ping, ZHANG Xu, XIE Zhao-peng, et al. Multi-Channel Dynamic Spectrum Access Based on Multi-Agent Proximal Policy Optimization[J]. Acta Electronica Sinica, 2024, 52(06): 1824-1831. DOI:10.12263/DZXB.20230663
To improve communication efficiency and guarantee user fairness when applying dynamic spectrum access (DSA) in multi-user, multi-channel communication scenarios, this paper proposes the MAPPO-DSA algorithm, which is based on multi-agent proximal policy optimization (MAPPO). The algorithm first adopts multi-channel access to address the spectrum waste that single-channel access incurs when several channels are idle at the same time. Multi-channel access, however, makes the state and action spaces grow exponentially, which raises the computational cost and makes learning harder. To cope with this, the paper introduces the MAPPO deep reinforcement learning (DRL) algorithm to learn and optimize access policies efficiently in complex environments. User fairness is ensured by designing and optimizing reinforcement learning elements in MAPPO, such as the observation and the reward, and by sharing network parameters. Experimental results in different scenarios show that the proposed MAPPO-DSA learns near-optimal access policies, with network throughput approaching the theoretical upper bound in some scenarios, significantly outperforms existing algorithms, and effectively ensures user fairness.
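The exponential growth mentioned in the abstract, and the clipping idea behind proximal policy optimization, can be made concrete with a small sketch. It is only an illustration under stated assumptions: the action encoding (each user may transmit on any subset of the channels in a slot, including none) is assumed here and is not taken from the paper, and the function names and the clipping parameter `eps = 0.2` are hypothetical choices. The clipped term follows the standard PPO surrogate, min(r·A, clip(r, 1−ε, 1+ε)·A).

```python
def single_channel_actions(num_channels: int) -> int:
    """Actions per user when it may pick at most one channel (or stay idle)."""
    return num_channels + 1


def multi_channel_actions(num_channels: int) -> int:
    """Actions per user when it may pick any subset of channels (assumed encoding)."""
    return 2 ** num_channels


def joint_actions(num_users: int, num_channels: int) -> int:
    """Joint action count when every user independently picks a channel subset."""
    return multi_channel_actions(num_channels) ** num_users


def ppo_clip_term(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for a single sample.

    ratio is pi_new(a|o) / pi_old(a|o); advantage is the advantage estimate.
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Compare how the per-user and joint action counts scale with channel count.
    for n in (2, 4, 8):
        print(n, single_channel_actions(n), multi_channel_actions(n), joint_actions(3, n))
    # A large policy ratio with positive advantage is capped by the clip.
    print(ppo_clip_term(1.5, 1.0))
```

Under this assumed encoding the per-user action count doubles with each added channel, whereas single-channel access grows only linearly. That gap is the kind of blow-up that motivates using a policy-gradient method with a bounded update step.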