Acta Electronica Sinica ›› 2020, Vol. 48 ›› Issue (11): 2170-2177. DOI: 10.3969/j.issn.0372-2112.2020.11.011

• Research Article •

An Intelligent Routing Technology Based on Deep Reinforcement Learning

SUN Peng-hao, LAN Ju-long, SHEN Juan, HU Yu-xiang

  1. PLA Strategic Support Force Information Engineering University, Zhengzhou, Henan 450002, China
  • Received: 2019-09-23 Revised: 2019-12-23 Online: 2020-11-25 Published: 2020-11-25
    • About the authors:
    • SUN Peng-hao, male, born in 1992 in Jimo, Shandong. He is a master's student at the National Digital Switching System Engineering and Technological Research Center. His main research interest is the programmable network data plane. E-mail: sphshine@126.com
    • LAN Ju-long, male, born in 1962 in Zhangbei, Hebei. He is a professor and doctoral supervisor at the National Digital Switching System Engineering and Technological Research Center. His main research interests are the key theories and technologies of new-generation information networks. E-mail: ndscljl@163.com
    • Supported by:
    • National Natural Science Foundation of China (No.61521003, No.61702547, No.61872382); National Key Research and Development Program of China (No.2017YFB0803204); Key-Area Research and Development Program of Guangdong Province (No.2018B010113001)

Abstract: As networks grow in scale and complexity, traditional routing algorithms struggle to balance computational complexity against routing performance when traffic fluctuates sharply in both space and time. In recent years, with the rise of Software-Defined Networking (SDN) and Artificial Intelligence (AI), machine-learning-based automatic generation of routing policies has attracted growing attention. This paper proposes SmartPath, an intelligent routing technology based on Deep Reinforcement Learning (DRL). SmartPath dynamically collects network state and uses DRL to generate routing policies automatically, so that the policies adapt to changes in network traffic. Experimental results show that the proposed scheme updates network routing dynamically without relying on manual traffic modeling and, in the test environment, reduces the average end-to-end transmission delay by at least 10% compared with the current state-of-the-art schemes.

Key words: routing optimization, software-defined networking (SDN), artificial intelligence (AI), deep reinforcement learning (DRL)
