Model-Based Factored Bayesian Online Reinforcement Learning

WU Bo (仵博); ZHENG Hong-yan (郑红燕); FENG Yan-peng (冯延蓬); CHEN Xin (陈鑫)

doi:10.3969/j.issn.0372-2112.2014.07.029

Research Communication | Updated: 2025-07-16

- Model-Based Factored Bayesian Online Reinforcement Learning
- Acta Electronica Sinica, 2014, Vol. 42, No. 7, pp. 1429-1434
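- Author affiliations:
  
  1. Education Technology and Information Center, Shenzhen Polytechnic, Shenzhen, Guangdong 518055
  2. School of Information Science and Engineering, Central South University, Changsha, Hunan 410083
  3. Hunan Engineering Laboratory for Advanced Control and Intelligent Automation, Changsha, Hunan 410083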
- Funding:
  
  National Natural Science Foundation of China (No. 61074058, No. 60874042); Shenzhen Natural Science Foundation (No. JCYJ20120617134831736)
- DOI: 10.3969/j.issn.0372-2112.2014.07.029
  CLC number: TP181
- Print publication: 2014

WU Bo, ZHENG Hong-yan, FENG Yan-peng, et al. Model-Based Factored Bayesian Online Reinforcement Learning[J]. Acta Electronica Sinica, 2014, 42(7): 1429-1434. DOI: 10.3969/j.issn.0372-2112.2014.07.029.

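Abstract (translated from the Chinese)

To address the problems that Bayesian reinforcement learning involves a huge number of parameters, converges slowly, and therefore cannot learn online, this paper proposes a model-based factored Bayesian reinforcement learning method. First, the learning parameters are given a factored representation, which reduces their number. Next, a Bayesian method learns from prior knowledge and observed data, optimizing the balance between exploration and exploitation. Finally, a point-based Bayesian reinforcement learning method makes the learning process converge quickly, which enables online learning. Simulation results show that the algorithm meets the performance requirements of real-time systems.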

Abstract

The enormous number of parameters and slow convergence are the major obstacles to online learning in model-based Bayesian reinforcement learning. To address them, this paper presents a model-based factored Bayesian reinforcement learning approach. First, factored representations are used to describe the dynamics with fewer parameters. Then, based on prior knowledge and observed data, the model is learned in a Bayesian manner, which provides an elegant solution to the optimal exploration-exploitation tradeoff. Finally, a point-based Bayesian reinforcement learning approach is proposed to speed up convergence and thereby achieve online learning. The experimental results show that the proposed approach approximates the underlying Bayesian reinforcement learning task well with guaranteed real-time performance.
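The first two steps of the abstract (a factored model with fewer parameters, then Bayesian learning from a prior plus observed data) can be illustrated with a small sketch. It is not the paper's algorithm: the function and class names are hypothetical, the Dirichlet prior is an assumption (the abstract does not name a prior family), and the point-based planning step is not covered.

```python
def flat_param_count(num_vars, arity, num_actions):
    """Free parameters of an unfactored transition model P(s' | s, a)."""
    num_states = arity ** num_vars
    return num_states * num_actions * (num_states - 1)


def factored_param_count(parent_counts, arity, num_actions):
    """Free parameters when next-state variable i depends on parent_counts[i] parents."""
    return sum(arity ** k * num_actions * (arity - 1) for k in parent_counts)


class DirichletFactor:
    """Dirichlet posterior over one conditional P(x_i' | parent values, action).

    Assumption: a symmetric Dirichlet prior with `prior_count` pseudo-counts.
    """

    def __init__(self, arity, prior_count=1.0):
        self.arity = arity
        self.prior_count = prior_count
        self.counts = {}  # (parent_values, action) -> list of pseudo-counts

    def update(self, parent_values, action, next_value):
        """Bayesian update: add one count for the observed next value."""
        key = (tuple(parent_values), action)
        row = self.counts.setdefault(key, [self.prior_count] * self.arity)
        row[next_value] += 1.0

    def posterior_mean(self, parent_values, action):
        """Expected transition probabilities under the current posterior."""
        key = (tuple(parent_values), action)
        row = self.counts.get(key, [self.prior_count] * self.arity)
        total = sum(row)
        return [c / total for c in row]


# Ten binary state variables, four actions, two parents per variable.
print(flat_param_count(10, 2, 4))            # 4190208
print(factored_param_count([2] * 10, 2, 4))  # 160

factor = DirichletFactor(arity=2)
for observed in (1, 1, 1, 0):
    factor.update(parent_values=(0, 1), action=0, next_value=observed)
print(factor.posterior_mean((0, 1), 0))      # prior [1, 1] plus counts [1, 3]
```

In this toy setting the factored model needs 160 free parameters instead of about 4.2 million, which is the kind of reduction that makes per-step posterior updates cheap enough to consider online learning.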
