End-to-End Autonomous Driving Decision Based on Deep Reinforcement Learning

HUANG Zhi-qing; QU Zhi-wei; ZHANG Ji; ZHANG Yan-xin; TIAN Rui

doi:10.3969/j.issn.0372-2112.2020.09.007


Last updated: 2025-12-08

- Acta Electronica Sinica, Vol. 48, Issue 9, Pages 1711-1719 (2020)
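- Author affiliations:
  
  1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124
  2. School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044
  3. Beijing Engineering Research Center for IoT Software and Systems, Beijing 100124
- Funding: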
  
  National Natural Science Foundation of China (No.61502018)
- DOI: 10.3969/j.issn.0372-2112.2020.09.007
  CLC: TP242.6
- Published Online: 25 September 2020
  
  Published: 2020

HUANG Zhi-qing, QU Zhi-wei, ZHANG Ji, et al. End-to-End Autonomous Driving Decision Based on Deep Reinforcement Learning[J]. Acta Electronica Sinica, 2020, 48(9): 1711-1719. DOI: 10.3969/j.issn.0372-2112.2020.09.007.

Abstract

End-to-end driving decision-making is an active research topic in autonomous driving. This paper studies end-to-end driving decisions with continuous action output, based on the Deep Deterministic Policy Gradient (DDPG) deep reinforcement learning algorithm. First, an end-to-end decision and control model based on DDPG is established. The model takes continuously acquired perception information (such as vehicle angle, vehicle speed, and road distance) as its input state and outputs continuous control quantities for the driving actions (acceleration, braking, and steering). The model is then trained and validated in different driving environments on the TORCS (The Open Racing Car Simulator) platform, and the results show that it can realize end-to-end autonomous driving decisions. Finally, the model is compared with a DQN (Deep Q-learning Network) model with discrete action output, and the experimental results show that the DDPG decision model achieves better decision and control performance.
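To make the continuous-versus-discrete distinction in the abstract concrete, the following is a minimal sketch of a deterministic actor that maps a state vector to continuous steering, acceleration, and braking commands. It is not the authors' network: the layer sizes, the `tanh`/sigmoid output squashing, the three-feature state, and all function names (`init_actor`, `actor_forward`, `nearest_discrete_steer`) are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np

def init_actor(state_dim, hidden_dim, rng):
    """Randomly initialise a one-hidden-layer actor (untrained, illustration only)."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(state_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, size=(hidden_dim, 3)),
        "b2": np.zeros(3),
    }

def actor_forward(params, state):
    """Deterministic policy: map a state vector to (steer, accel, brake)."""
    h = np.maximum(0.0, state @ params["W1"] + params["b1"])  # ReLU hidden layer
    z = h @ params["W2"] + params["b2"]
    steer = np.tanh(z[0])                # assumed range [-1, 1]
    accel = 1.0 / (1.0 + np.exp(-z[1]))  # assumed range [0, 1]
    brake = 1.0 / (1.0 + np.exp(-z[2]))  # assumed range [0, 1]
    return np.array([steer, accel, brake])

def nearest_discrete_steer(steer, levels=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Snap a steering value to a fixed set, as a discrete-action agent would have to."""
    levels = np.asarray(levels)
    return float(levels[np.argmin(np.abs(levels - steer))])

rng = np.random.default_rng(0)
params = init_actor(state_dim=3, hidden_dim=16, rng=rng)

# Hypothetical normalised inputs: vehicle angle, vehicle speed, road distance.
state = np.array([0.05, 0.3, 0.1])
action = actor_forward(params, state)
print("continuous action (steer, accel, brake):", action)
print("steer after discretisation:", nearest_discrete_steer(action[0]))
```

The actor can emit any steering value in its range, whereas the discretised variant is limited to the listed levels. That is the kind of difference the comparison with DQN in the abstract is about.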
