Model-Based Factored Bayesian Online Reinforcement Learning

WU Bo; ZHENG Hong-yan; FENG Yan-peng; CHEN Xin

doi:10.3969/j.issn.0372-2112.2014.07.029

Last updated: 2025-07-16

- Acta Electronica Sinica Vol. 42, Issue 7, Pages: 1429-1434(2014)
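- Author affiliations:
  
  1. Education Technology and Information Center, Shenzhen Polytechnic, Shenzhen, Guangdong 518055, China
  2. School of Information Science and Engineering, Central South University, Changsha, Hunan 410083, China
  3. Hunan Engineering Laboratory for Advanced Control and Intelligent Automation, Changsha, Hunan 410083, China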
- Funding:
  
  National Natural Science Foundation of China (No. 61074058, No. 60874042); Shenzhen Natural Science Foundation (No. JCYJ20120617134831736)
- DOI: 10.3969/j.issn.0372-2112.2014.07.029
  CLC: TP181
- Published: 2014

WU Bo, ZHENG Hong-yan, FENG Yan-peng, et al. Model-Based Factored Bayesian Online Reinforcement Learning[J]. Acta Electronica Sinica, 2014, 42(7): 1429-1434. DOI: 10.3969/j.issn.0372-2112.2014.07.029.

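Abstract (translated from the Chinese)

Bayesian reinforcement learning suffers from a very large number of parameters and slow convergence, which together prevent online learning. To address these problems, a model-based factored Bayesian reinforcement learning method is proposed. First, the learning parameters are given a factored representation, which reduces the number of parameters to be learned. Then, Bayesian learning is applied to prior knowledge and observed data to optimize the balance between exploration and exploitation. Finally, a point-based Bayesian reinforcement learning method is used to make the learning process converge quickly, so that learning can be done online. Simulation results show that the algorithm meets the performance requirements of real-time systems.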

Abstract

The enormous number of parameters and slow convergence are the major obstacles to online learning in model-based Bayesian reinforcement learning. To address them, the paper presents a model-based factored Bayesian reinforcement learning approach. First, factored representations are used to describe the dynamics with fewer parameters. Then, according to prior knowledge and observed data, model-based Bayesian reinforcement learning is used to provide an elegant solution to the optimal exploration-exploitation tradeoff. Finally, a point-based Bayesian reinforcement learning approach is proposed to speed up convergence and thereby achieve online learning. The experimental results show that the proposed approach approximates the underlying Bayesian reinforcement learning task well with guaranteed real-time performance.
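
The parameter reduction from a factored representation can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the authors' implementation: the toy problem (four binary state variables, two actions), the parent structure, the helper names `flat_param_count`, `factored_param_count`, and `DirichletFactor`, and the use of Dirichlet counts as the prior family, which the abstract does not specify.

```python
from math import prod

import numpy as np


def flat_param_count(var_sizes, n_actions):
    """Free parameters of a flat transition table P(s' | s, a)."""
    n_states = prod(var_sizes)
    return n_states * n_actions * (n_states - 1)


def factored_param_count(var_sizes, parents, n_actions):
    """Free parameters when each next-state variable depends only on its parents."""
    total = 0
    for i, size in enumerate(var_sizes):
        n_parent_configs = prod(var_sizes[p] for p in parents[i])
        total += n_parent_configs * n_actions * (size - 1)
    return total


class DirichletFactor:
    """Dirichlet posterior over one conditional P(x_i' | parent config, action).

    Assumed prior family; the posterior is updated by counting observations.
    """

    def __init__(self, n_values, prior_count=1.0):
        self.counts = np.full(n_values, prior_count, dtype=float)

    def update(self, observed_value):
        self.counts[observed_value] += 1.0

    def mean(self):
        return self.counts / self.counts.sum()


# Hypothetical toy problem: 4 binary variables, 2 actions.
# Variable 0 depends on itself; each other variable on itself and its left neighbour.
var_sizes = [2, 2, 2, 2]
parents = {0: [0], 1: [0, 1], 2: [1, 2], 3: [2, 3]}

flat = flat_param_count(var_sizes, n_actions=2)               # 16 * 2 * 15 = 480
factored = factored_param_count(var_sizes, parents, n_actions=2)  # 4 + 8 + 8 + 8 = 28

# Posterior for one factor after observing value 1 three times and value 0 once.
factor = DirichletFactor(n_values=2)
for value in [1, 1, 0, 1]:
    factor.update(value)
posterior_mean = factor.mean()  # counts [2, 4] -> [1/3, 2/3]
```

In this toy setting the factored model needs 28 free parameters instead of 480, and each factor's posterior is updated with a single count increment per observation, which is the kind of saving that makes online updates plausible.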
