Multi-Agent Reinforcement Learning Enabled Spectrum Sharing for Vehicular Networks

WANG Wei-nian; SU Jian; CHEN Yong; ZHANG Jian-zhao; TANG Zhen

doi:10.12263/DZXB.20220320

PAPERS | Updated: 2025-12-11

- ACTA ELECTRONICA SINICA Vol. 52, Issue 5, Pages: 1690-1699(2024)
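- Author affiliations:
  1. School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044
  2. The 63rd Research Institute, National University of Defense Technology, Nanjing, Jiangsu 210007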
- DOI: 10.12263/DZXB.20220320
  CLC: TP393.1
- Received: 28 March 2022, Revised: 13 July 2022, Published: 25 May 2024
WANG Wei-nian, SU Jian, CHEN Yong, et al. Multi-Agent Reinforcement Learning Enabled Spectrum Sharing for Vehicular Networks[J]. Acta Electronica Sinica, 2024, 52(05): 1690-1699. DOI: 10.12263/DZXB.20220320

Abstract

To address the difficulty that base stations face in collecting and managing instantaneous channel state information in highly dynamic vehicular network environments, a spectrum allocation algorithm for vehicular networks based on multi-agent deep reinforcement learning is proposed. The algorithm aims to maximize network throughput under constraints on vehicular communication delay and reliability, and uses learning to improve the spectrum and power allocation strategy. First, implicitly cooperative agents are trained using an improved DQN model and the Exp3 strategy. Second, hysteretic Q-learning and concurrent experience replay trajectories are used to address the non-stationarity caused by concurrent multi-agent learning. Simulation results show that the proposed algorithm achieves an average successful payload delivery rate of 95.89%, which is 16.48% higher than the random baseline algorithm. It quickly obtains near-optimal solutions and has significant advantages in reducing the signaling overhead of vehicular communication systems.
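The abstract names two standard building blocks, hysteretic Q-learning and the Exp3 bandit strategy, without giving their update rules. The Python sketch below illustrates the textbook forms of both, as an aid to the reader rather than a reproduction of the paper's method. It is tabular, whereas the paper uses a DQN-based model. The function names, the dictionary-based Q-table, and the default values of `alpha`, `beta`, and `gamma` are all assumptions made for illustration. How the paper combines Exp3 with the DQN is not stated in the abstract and is not modeled here.

```python
import math


def hysteretic_q_update(q, state, action, reward, next_state,
                        alpha=0.1, beta=0.01, gamma=0.9):
    """One tabular hysteretic Q-learning step (illustrative sketch).

    q is a dict mapping state -> {action: value}. alpha and beta are
    hypothetical learning rates with beta < alpha.
    """
    td_error = reward + gamma * max(q[next_state].values()) - q[state][action]
    # Increases are learned at rate alpha, decreases at the smaller rate
    # beta, so an agent is slow to penalize an action whose poor outcome
    # may have been caused by other agents still exploring.
    rate = alpha if td_error >= 0 else beta
    q[state][action] += rate * td_error
    return q[state][action]


def exp3_probabilities(weights, gamma):
    """Exp3 action distribution: weighted choice mixed with uniform."""
    total = sum(weights)
    k = len(weights)
    return [(1 - gamma) * w / total + gamma / k for w in weights]


def exp3_update(weights, arm, reward, gamma):
    """Update the weight of the played arm; reward must lie in [0, 1]."""
    probs = exp3_probabilities(weights, gamma)
    # Importance-weighted estimate keeps the reward estimate unbiased
    # even though only the played arm is observed.
    estimate = reward / probs[arm]
    weights[arm] *= math.exp(gamma * estimate / len(weights))
    return weights


if __name__ == "__main__":
    q = {"s0": {"a": 0.0, "b": 0.0}, "s1": {"a": 1.0, "b": 0.0}}
    print(hysteretic_q_update(q, "s0", "a", 1.0, "s1"))   # positive TD error
    print(hysteretic_q_update(q, "s0", "a", -1.0, "s1"))  # negative TD error
    print(exp3_update([1.0, 1.0, 1.0, 1.0], 2, 1.0, 0.1))
```

In this sketch the asymmetry between `alpha` and `beta` is what makes the update "hysteretic": setting `beta` equal to `alpha` recovers ordinary Q-learning, while `beta = 0` gives a fully optimistic learner.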
