Multi-Channel Dynamic Spectrum Access Based on Multi-Agent Proximal Policy Optimization

CHEN Ping-ping; ZHANG Xu; XIE Zhao-peng; QIU Yu-ping; FANG Yi

doi:10.12263/DZXB.20230663


Research article | Updated: 2025-12-11

- Multi-Channel Dynamic Spectrum Access Based on Multi-Agent Proximal Policy Optimization
- Acta Electronica Sinica, 2024, Vol. 52, No. 6, pp. 1824-1831
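- Author affiliations:
  
  1. School of Advanced Manufacturing, Fuzhou University, Jinjiang, Fujian 362251
  2. College of Physics and Information Engineering, Fuzhou University, Fuzhou, Fujian 350108
  3. School of Information Engineering, Guangdong University of Technology, Guangzhou, Guangdong 510006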
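- Author biographies:
  
  CHEN Ping-ping, male, born in December 1986, is a professor at the School of Advanced Manufacturing, Fuzhou University. His research interests include reinforcement learning, compressed sensing, channel coding, and wireless communications. Chinese Institute of Electronics (CIE) member number: E190021215M.
  ZHANG Xu, male, born in September 1997, received his master's degree from the School of Advanced Manufacturing, Fuzhou University, in 2023. His research interests include reinforcement learning and wireless communications.
  XIE Zhao-peng, male, born in July 1995, is a lecturer at the School of Advanced Manufacturing, Fuzhou University. His research interests include reinforcement learning, channel coding, and wireless communications. CIE member number: E190156454M. Email: xzp_fzu@163.com
  FANG Yi, male, born in August 1986, is a professor at the School of Information Engineering, Guangdong University of Technology. His research interests include information theory and channel coding, wireless communications, and data storage coding. CIE member number: E190028682M.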
- Funding:
  
  National Natural Science Foundation of China (62171135; 62322106; 62071131); Fujian Provincial Science Fund for Distinguished Young Scholars (2022J06010)
- Chinese Library Classification (CLC) number: TP317.4
- Received: 2023-07-14; revised: 2024-01-16; published in print: 2024-06-25

CHEN Ping-ping, ZHANG Xu, XIE Zhao-peng, et al. Multi-Channel Dynamic Spectrum Access Based on Multi-Agent Proximal Policy Optimization[J]. Acta Electronica Sinica, 2024, 52(06): 1824-1831. DOI: 10.12263/DZXB.20230663


Abstract

To enhance communication efficiency and ensure user fairness when applying dynamic spectrum access (DSA) technology in multi-user multi-channel communication scenarios, this paper proposes the MAPPO-DSA algorithm based on multi-agent proximal policy optimization (MAPPO). The algorithm addresses the spectrum waste that single-channel access incurs when multiple channels are idle at the same time by adopting multi-channel access. However, multi-channel access makes the state and action spaces grow exponentially, resulting in high computational cost and learning difficulty. To tackle this, the paper introduces the MAPPO deep reinforcement learning (DRL) algorithm to efficiently learn and optimize access strategies in complex environments. User fairness is ensured by designing and optimizing the reinforcement learning elements of MAPPO, such as observations and rewards, and by sharing network parameters among the agents. Experimental results in different scenarios demonstrate that the proposed MAPPO-DSA algorithm learns near-optimal access strategies, approaches the theoretical throughput upper bound in some scenarios, significantly outperforms existing algorithms, and effectively ensures user fairness.
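To make the spectrum-waste argument concrete, here is a minimal sketch that counts successful transmissions in one time slot. It assumes a simple collision model that the abstract does not spell out: a channel delivers one packet only if it is idle (not occupied by a primary user) and exactly one user transmits on it, so the per-slot upper bound is the number of idle channels. The function `slot_throughput` and the example channel states are hypothetical illustrations, not the paper's simulator.

```python
from collections import Counter


def slot_throughput(idle, choices):
    """Count successful transmissions in one slot (hypothetical collision model).

    idle: list of booleans, idle[c] is True when channel c is free.
    choices: one set of channel indices per user (empty set = stay silent).
    A channel succeeds only if it is idle and exactly one user transmits on it.
    """
    load = Counter(c for chosen in choices for c in chosen)
    return sum(1 for c, n in load.items() if idle[c] and n == 1)


# Assumed example: four channels, three of them idle, two users.
idle = [True, True, False, True]

# Single-channel access: each user picks at most one channel,
# so at most 2 of the 3 idle channels can carry traffic.
single = [{0}, {1}]

# Multi-channel access: one user may occupy several idle channels at once.
multi = [{0, 3}, {1}]

upper_bound = sum(idle)  # at most one packet per idle channel in this model
print(slot_throughput(idle, single), slot_throughput(idle, multi), upper_bound)
# prints: 2 3 3
```

With two users and three idle channels, single-channel access leaves one idle channel unused in every slot, while multi-channel access can reach the bound.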

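The claim that multi-channel access makes the action space grow exponentially can be checked by simple counting. The sketch assumes an action encoding the abstract does not give: under single-channel access a user either stays silent or picks one of N channels (N + 1 actions), while under multi-channel access it may transmit on any subset of the N channels (2^N actions, the empty subset meaning silence). The joint action space of M users is the per-user count raised to the power M. The function names are hypothetical.

```python
def per_user_actions(n_channels, multi_channel):
    """Number of actions for one user (hypothetical action encoding).

    Single-channel access: stay silent or pick one of n channels -> n + 1.
    Multi-channel access: transmit on any subset of the n channels -> 2**n
    (the empty subset corresponds to staying silent).
    """
    return 2 ** n_channels if multi_channel else n_channels + 1


def joint_actions(n_channels, n_users, multi_channel):
    """Size of the joint action space if all users were treated as one agent."""
    return per_user_actions(n_channels, multi_channel) ** n_users


for n in (2, 4, 8):
    print(n, joint_actions(n, 3, False), joint_actions(n, 3, True))
# prints:
# 2 27 64
# 4 125 4096
# 8 729 16777216
```

Under these assumptions, three users on eight channels already face 256^3 = 16,777,216 joint multi-channel actions, which illustrates the computational cost and learning difficulty the abstract attributes to multi-channel access.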

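MAPPO builds on the clipped surrogate objective of standard PPO, and the abstract states that the agents share network parameters. The sketch below computes the per-sample PPO objective min(r·A, clip(r, 1 − ε, 1 + ε)·A), where r is the probability ratio between the new and old policies and A is the advantage, and averages it over samples pooled from all agents, as one would when a single shared policy serves every agent. It is plain Python with made-up numbers, assumes ε = 0.2 (the value used in the PPO paper's experiments), and omits the centralized critic, the entropy bonus, and the paper's specific observation and reward design; the batch layout and function names are hypothetical.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantage, eps=0.2):
    """Standard PPO clipped objective for one sample (to be maximized).

    ratio = pi_new(a|o) / pi_old(a|o), computed from log-probabilities.
    """
    ratio = math.exp(new_logp - old_logp)
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


def shared_policy_objective(batches, eps=0.2):
    """Average the clipped objective over samples from all agents.

    With parameter sharing every agent uses the same policy network, so the
    samples of all agents are pooled into one objective. Hypothetical layout:
    batches[i] is a list of (new_logp, old_logp, advantage) for agent i.
    """
    terms = [clipped_surrogate(n, o, a, eps)
             for agent in batches for (n, o, a) in agent]
    return sum(terms) / len(terms)


# Two agents, two samples each (illustrative numbers only).
batches = [
    [(math.log(0.6), math.log(0.4), 1.0), (math.log(0.5), math.log(0.5), -0.5)],
    [(math.log(0.2), math.log(0.4), 2.0), (math.log(0.9), math.log(0.3), -1.0)],
]
print(round(shared_policy_objective(batches), 4))
```

Clipping caps the gain from a sample whose ratio has drifted above 1 + ε when its advantage is positive, while the outer `min` keeps the full penalty when the advantage is negative, which is what keeps each shared-policy update conservative.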
