Reinforcement Learning for Continuous Spaces Based on Gaussian Process Classifier

WANG Xue-song; ZHANG Yi-yang; CHENG Yu-hu

Paper | Updated: 2025-07-16

- Reinforcement Learning for Continuous Spaces Based on Gaussian Process Classifier
- Acta Electronica Sinica, 2009, Vol. 37, No. 6, pp. 1153-1158
- Author affiliations:
  
  1. School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116
  2. Institute of Automation, Chinese Academy of Sciences, Beijing 100190
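- Funding:
  
  Program for New Century Excellent Talents in University, Ministry of Education (No. NCET-08-0836); National Natural Science Foundation of China (No. 60804022); Natural Science Foundation of Jiangsu Province (No. BK2008126); Specialized Research Fund for the Doctoral Program of Higher Education (No. 20070290537, 200802901506); China Postdoctoral Science Foundation (No. 20070411064)
- CLC number: TP18
- Print publication: 2009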

WANG Xue-song, ZHANG Yi-yang, CHENG Yu-hu. Reinforcement Learning for Continuous Spaces Based on Gaussian Process Classifier[J]. Acta Electronica Sinica, 2009, 37(6): 1153-1158.

Abstract

Generalizing reinforcement learning methods to large-scale or continuous spaces is key to their wide application and has become a major focus of reinforcement learning research. Unlike existing reinforcement learning methods for continuous spaces, which rely on value-function approximation, this work formulates reinforcement learning as a simple binary classification problem and uses a classification algorithm to obtain the control policy. A reinforcement learning method for continuous state and continuous action spaces based on a Gaussian process classifier is proposed. First, the continuous action space is discretized into a fixed number of discrete actions. The Gaussian process classifier then labels each pair of a continuous state and a discrete action as positive or negative and predicts its class probability. Finally, the discrete actions classified as positive are combined in a sum weighted by their probability values, which gives the continuous action actually applied to the system. Simulation results on a boat docking problem show that the proposed method can effectively handle the continuous-space representation problem in reinforcement learning.
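The action-selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 0.5 decision threshold, the normalization of the weights, and the fallback used when no action is classified as positive are all assumptions made for illustration, and `toy_positive_prob` is a hypothetical stand-in for a trained Gaussian process classifier.

```python
import math
from typing import Callable, List, Sequence


def discretize_actions(low: float, high: float, n: int) -> List[float]:
    """Return n evenly spaced discrete actions covering [low, high] (n >= 2)."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def continuous_action(
    state: Sequence[float],
    actions: Sequence[float],
    positive_prob: Callable[[Sequence[float], float], float],
    threshold: float = 0.5,  # assumed decision boundary, not given in the abstract
) -> float:
    """Combine the actions classified as positive, weighted by their probabilities."""
    probs = [positive_prob(state, a) for a in actions]
    positive = [(a, p) for a, p in zip(actions, probs) if p > threshold]
    if not positive:
        # Assumed fallback: no action is positive, so take the most probable one.
        return max(zip(actions, probs), key=lambda ap: ap[1])[0]
    total = sum(p for _, p in positive)
    # Assumption: weights are normalized so the result stays inside the action range.
    return sum(a * p for a, p in positive) / total


def toy_positive_prob(state: Sequence[float], action: float) -> float:
    """Hypothetical stand-in for a trained Gaussian process classifier."""
    target = -state[0]  # pretend the best action opposes the first state component
    return 1.0 / (1.0 + math.exp(4.0 * abs(action - target) - 2.0))


if __name__ == "__main__":
    actions = discretize_actions(-1.0, 1.0, 9)
    print(continuous_action([0.3], actions, toy_positive_prob))
```

In a real system, `positive_prob` would be replaced by the predicted positive-class probability of a Gaussian process classifier trained on labelled state-action pairs; how those labels are generated is not specified in the abstract.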
