A Multi-Agent Graph Reinforcement Learning Method for Many-to-Many Communication Routing in SDWN

WEN Peng; YE Miao; WANG Yong; HE Qian; QIU Hongbing

doi:10.12263/DZXB.20240980


Research article | Updated: 2025-10-16

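- A Multi-Agent Graph Reinforcement Learning Method for Many-to-Many Communication Routing in SDWN
- Acta Electronica Sinica, 2025, Vol. 53, No. 6, pp. 1885-1905
- Author affiliations:
  
  1. School of Information and Communication, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  2. Guangxi Key Laboratory of Wireless Wideband Communication and Signal Processing, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  3. Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  4. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China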
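- Author biographies:
  
  WEN Peng, male, born in 1994 in Bijie, Guizhou. He is a doctoral student at the School of Information and Communication, Guilin University of Electronic Technology. His research interests include software-defined networking, reinforcement learning, and stochastic optimization and its applications. E-mail: 22021101006@mails.guet.edu.cn
  YE Miao, male, born in 1977 in Guilin, Guangxi. He is a professor and doctoral supervisor at the School of Information and Communication, Guilin University of Electronic Technology. His research interests include edge storage and cloud storage, software-defined networking, wireless sensor networks, and pattern recognition and machine learning. E-mail: yemiao@guet.edu.cn
  WANG Yong, male, born in 1964 in Chengdu, Sichuan. He is a professor and doctoral supervisor at the School of Computer Science and Information Security, Guilin University of Electronic Technology. His research interests include cloud computing, network traffic analysis, and information security. Chinese Institute of Electronics member No. E190013611S. E-mail: ywang@guet.edu.cn
  HE Qian, male, born in 1979 in Chenzhou, Hunan. He is a professor and doctoral supervisor at the School of Computer Science and Information Security, Guilin University of Electronic Technology. His research interests include pattern recognition, machine learning, software-defined networking, and sensor networks. Chinese Institute of Electronics member No. E190021935S. E-mail: heqian@guet.edu.cn
  QIU Hongbing, male, born in 1963 in Rugao, Jiangsu. He is a professor and doctoral supervisor at the School of Information and Communication, Guilin University of Electronic Technology. His research interests include broadband wireless communication, communication signal processing, and emitter localization. E-mail: qiuhb@guet.edu.cn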
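- Funding:
  
  National Natural Science Foundation of China (62161006; 62372353); Fund of the Guangxi Key Laboratory of Wireless Wideband Communication and Signal Processing (Guike AD25069102); Innovation Project of Guangxi Graduate Education (YCBZ2023134); Director Fund of the Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education (CRKL220103)
- DOI: 10.12263/DZXB.20240980
- CLC number: TP393
- Received: 2024-10-29; revised: 2025-05-05; published in print: 2025-06-25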
WEN Peng, YE Miao, WANG Yong, et al. A Multi-Agent Graph Reinforcement Learning Method for Many-to-Many Communication Routing in SDWN[J]. Acta Electronica Sinica, 2025, 53(06): 1885-1905. DOI: 10.12263/DZXB.20240980

Abstract

The many-to-many communication routing problem is an NP (nondeterministic polynomial time)-hard combinatorial optimization problem. Constructing efficient many-to-many routing paths also requires timely acquisition of global network state information in order to adapt to highly dynamic network states. In software-defined wireless networks (SDWN), existing data-driven multi-agent deep reinforcement learning methods suffer from high computational and deployment costs, difficulty in adapting to the non-Euclidean structure of network topologies, excessive invalid actions during training that increase storage and time overheads, and slow convergence. To address these issues, this paper designs a new framework for collaborative sensing and intelligent decision-making between the SDN control plane and data plane, and proposes a two-stage multi-agent routing method for the many-to-many communication routing problem: a multi-agent graph reinforcement learning method based on an intelligent node deployment strategy (MAGDS-M2M). To reduce the computational and deployment costs of placing an agent on every node, a Q-learning-based intelligent node deployment algorithm determines the network nodes on which agents need to be deployed. Once the agents are deployed, a many-to-many routing decision method based on multi-agent graph reinforcement learning is developed within the actor-critic (AC) framework. The method redesigns the actor and critic networks using graph convolutional networks (GCN) and graph neural networks (GNN), addressing the weak adaptability of convolutional neural networks (CNN) to topology-structured data in existing multi-agent reinforcement learning approaches. In addition, to solve the problem that the fixed-length action space of the actor network generates a large number of invalid actions during training, a new local observation method for the action space is proposed. Experimental results show that the proposed method reduces task completion delay by 29.33% compared with the baseline experiments, and verify that a balance between task completion delay and the standard deviation of cumulative energy consumption across nodes can be achieved by adjusting parameters. The source code of this work is available on the open-source platform at

https://github.com/GuetYe/MAGDS-M2M
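The abstract attributes the invalid actions to the actor's fixed-length action space. One common, generic way to keep an agent from selecting a non-adjacent node is to mask the actor's outputs with the current node's neighbor set before normalizing. The sketch below illustrates that general idea only; it is not the paper's local observation method, whose details are not given on this page. The topology, the logits, and the names `masked_policy` and `neighbor_mask` are all hypothetical.

```python
import math

def masked_policy(logits, mask):
    """Softmax over logits, restricted to actions whose mask entry is True."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    valid = [i for i, ok in enumerate(mask) if ok]
    if not valid:
        raise ValueError("node has no valid next hop")
    # Subtract the largest valid logit for numerical stability.
    peak = max(logits[i] for i in valid)
    weights = [math.exp(logits[i] - peak) if mask[i] else 0.0
               for i in range(len(logits))]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical 4-node topology as an adjacency list (illustrative only).
adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
num_nodes = len(adjacency)

def neighbor_mask(node):
    """True for each node index that is a one-hop neighbor of `node`."""
    neighbors = set(adjacency[node])
    return [j in neighbors for j in range(num_nodes)]

# Made-up actor outputs for an agent located at node 0 (assumption).
logits = [0.5, 1.0, 1.0, 2.0]
probs = masked_policy(logits, neighbor_mask(0))
```

In this toy case node 3 has the largest logit but is not adjacent to node 0, so it receives zero probability and the probability mass is split between neighbors 1 and 2. Masking at the output layer keeps the network's output size fixed while ensuring that no invalid next hop is ever sampled.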


