

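1. School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044
2. The 63rd Research Institute, National University of Defense Technology, Nanjing, Jiangsu 210007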
Received: 28 March 2022
Revised: 13 July 2022
Published: 25 May 2024
WANG Wei-nian, SU Jian, CHEN Yong, et al. Multi-Agent Reinforcement Learning Enabled Spectrum Sharing for Vehicular Networks[J]. Acta Electronica Sinica, 2024, 52(05): 1690-1699. DOI:10.12263/DZXB.20220320
In highly dynamic vehicular networks, it is difficult for the base station to collect and manage instantaneous channel state information (CSI). To address this problem, a spectrum allocation algorithm for vehicular networks based on multi-agent deep reinforcement learning is proposed. The algorithm aims to maximize network throughput subject to the latency and reliability constraints of vehicular communication, and uses learning to improve the spectrum and power allocation policy. First, implicitly cooperative agents are trained with an improved deep Q-network (DQN) model and the Exp3 strategy. Second, hysteretic Q-learning and concurrent experience replay trajectories are used to address the non-stationarity caused by the concurrent learning of multiple agents. Simulation results show that the average successful payload delivery rate of the proposed algorithm reaches 95.89%, which is 16.48% higher than that of the random baseline algorithm. The algorithm quickly obtains near-optimal solutions and has a significant advantage in reducing the signaling overhead of vehicular communication systems.
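The abstract relies on two standard building blocks, the Exp3 adversarial-bandit strategy and hysteretic Q-learning. The sketch below shows only their textbook forms, assuming a single tabular learner rather than the paper's improved DQN. How the authors combine them with the DQN, their state and reward design, and their parameter values are not stated here, so the parameters (`gamma=0.1`, `alpha=0.1`, `beta=0.01`, `discount=0.9`) and the toy `channel_demo` with made-up per-sub-band success probabilities are hypothetical illustrations, not the authors' implementation. Concurrent experience replay trajectories are not shown.

```python
# Hypothetical sketch: textbook Exp3 and tabular hysteretic Q-learning.
# Not the paper's DQN-based method; all parameter values are assumptions.
import math
import random
from collections import defaultdict


class Exp3:
    """Textbook Exp3 over K arms (e.g. K sub-bands); rewards must lie in [0, 1]."""

    def __init__(self, n_arms, gamma=0.1, rng=None):
        self.k = n_arms
        self.gamma = gamma                  # exploration mix, 0 < gamma <= 1
        self.weights = [1.0] * n_arms
        self.rng = rng or random.Random()

    def probabilities(self):
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.k
                for w in self.weights]

    def select(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.k), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, reward, prob):
        # Importance-weighted estimate: unbiased for the pulled arm's reward.
        estimate = reward / prob
        self.weights[arm] *= math.exp(self.gamma * estimate / self.k)
        # Rescale to avoid float overflow; the probabilities are unchanged.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def hysteretic_update(q, state, action, reward, next_state, n_actions,
                      alpha=0.1, beta=0.01, discount=0.9):
    """One tabular hysteretic Q-learning step (Matignon et al.), beta < alpha.

    Positive TD errors are learned at rate alpha; negative ones, which may be
    caused by teammates' exploration rather than by a bad action, at rate beta.
    Terminal-state handling is omitted for brevity.
    """
    best_next = max(q[(next_state, a)] for a in range(n_actions))
    delta = reward + discount * best_next - q[(state, action)]
    q[(state, action)] += (alpha if delta >= 0 else beta) * delta
    return delta


def channel_demo(success_probs, rounds=5000, seed=0):
    """Hypothetical toy: one agent picks a sub-band; reward 1 if delivery succeeds."""
    rng = random.Random(seed)
    bandit = Exp3(len(success_probs), gamma=0.1, rng=rng)
    for _ in range(rounds):
        arm, prob = bandit.select()
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        bandit.update(arm, reward, prob)
    return bandit.probabilities()


if __name__ == "__main__":
    probs = channel_demo([0.2, 0.5, 0.9, 0.3])
    print([round(p, 3) for p in probs])
    q = defaultdict(float)
    hysteretic_update(q, "s", 0, 1.0, "s2", n_actions=2)   # TD error +1.0, step +0.1
    hysteretic_update(q, "s", 0, -1.0, "s2", n_actions=2)  # TD error -1.1, step -0.011
    print(round(q[("s", 0)], 3))
```

The asymmetric learning rates are the point of hysteretic Q-learning: with `beta` much smaller than `alpha`, an agent does not quickly unlearn a good choice just because a teammate's exploratory action (for example, a transmission that causes interference) produced one bad outcome.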