A Bayesian Temporal Difference Algorithm Based on Random Projection

LIU Quan; YU Jun; WANG Hui; FU Qi-ming; ZHU Fei

doi:10.3969/j.issn.0372-2112.2016.11.026

- Acta Electronica Sinica Vol. 44, Issue 11, Pages: 2752-2757(2016)
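- Author affiliations:
  
  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006
  2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012
  3. Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, Jiangsu 210023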
- DOI：10.3969/j.issn.0372-2112.2016.11.026
  CLC： TP181
- Published Online: 25 November 2016
  Published: 2016

LIU Quan, YU Jun, WANG Hui, et al. A Bayesian Temporal Difference Algorithm Based on Random Projection[J]. Acta Electronica Sinica, 2016, 44(11): 2752-2757. DOI： 10.3969/j.issn.0372-2112.2016.11.026.

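Abstract (translated from the Chinese)

In reinforcement learning, most algorithms are based on value function evaluation. The Gaussian process temporal difference algorithm evaluates the value function with a Bayesian approach: using the Bellman equation and Bayes' rule, it builds a probabilistic generative model relating immediate rewards to the value function. In the state space, online kernel sparsification combined with least squares is used to compute an approximate linear representation of each new sample, which speeds up the algorithm, but the time complexity remains high. To address the selection of approximating states in the state space, a Bayesian temporal difference algorithm based on random projection is proposed within the Gaussian process framework. The algorithm uses a hash function to map the elements of the dictionary state set to hash values and groups them by hash value, thereby reducing the number of comparisons between states. Experimental results show that the method not only increases execution speed but also strikes a good balance between the accuracy of the estimated state value function and the execution time.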
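
For orientation, the generative model and the sparsification test mentioned in the abstract are usually written as follows in the Gaussian process temporal difference literature. This page does not reproduce the paper's own equations, so the notation here (kernel $k$, discount factor $\gamma$, noise variance $\sigma^2$, threshold $\nu$, dictionary kernel matrix $\tilde{\mathbf{K}}_{t-1}$) is an assumption based on the standard formulation for deterministic transitions and may differ from the authors' symbols.

```latex
% Assumed standard GPTD notation, not taken from the paper itself.
% Generative model linking the immediate reward R to the value function V:
R(x_t) = V(x_t) - \gamma V(x_{t+1}) + N(x_t),
\qquad V \sim \mathcal{GP}(0, k),
\quad N(x_t) \sim \mathcal{N}(0, \sigma^2)

% Online kernel sparsification (approximate linear dependence test):
% x_t joins the dictionary only if its least-squares residual exceeds \nu.
\delta_t = k(x_t, x_t)
  - \tilde{\mathbf{k}}_{t-1}(x_t)^{\top} \tilde{\mathbf{K}}_{t-1}^{-1} \tilde{\mathbf{k}}_{t-1}(x_t) > \nu
```

Evaluating $\delta_t$ involves every state currently in the dictionary, which is the per-sample cost that the hash-based grouping described in the abstract aims to reduce.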

Abstract

In reinforcement learning, most algorithms are based on policy evaluation. Gaussian process temporal difference is an algorithm that uses a Bayesian approach to evaluate value functions. In this method, a Gaussian process builds a probabilistic generative model between the immediate reward and the value function through the Bellman equation and Bayes' rule. To improve the efficiency of the algorithm, an approximate linear representation of each new sample is computed by online kernel sparsification and least squares in the state space. However, the time complexity is still high. To deal with this problem, a Bayesian temporal difference algorithm based on random projection is proposed. The elements of the dictionary state set are mapped to hash values by a hash function. The states are then grouped according to their hash values, which reduces the number of comparisons between states. The experimental results show that this algorithm not only improves execution speed but also achieves a balance between execution time and the precision of the state value function.
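
The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the hash function, so the sketch assumes a sign-of-random-projection hash, an RBF kernel, and a linear dependence test restricted to the states in the same bucket. The class name `HashedDictionary` and the parameters `n_bits`, `nu`, and `sigma` are hypothetical.

```python
import numpy as np
from collections import defaultdict


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two state vectors (assumed kernel choice)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))


class HashedDictionary:
    """Dictionary of states grouped by a random-projection hash (illustrative only)."""

    def __init__(self, dim, n_bits=4, nu=0.1, sigma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Each row is a random hyperplane; the sign pattern of the projections is the hash.
        self.planes = rng.standard_normal((n_bits, dim))
        self.nu = nu          # novelty threshold for admitting a state
        self.sigma = sigma    # kernel width
        self.buckets = defaultdict(list)
        self.comparisons = 0  # kernel evaluations against stored states

    def hash_key(self, x):
        """Map a state to a tuple of booleans, one per random hyperplane."""
        bits = self.planes @ np.asarray(x, dtype=float) >= 0.0
        return tuple(bool(b) for b in bits)

    def residual(self, x, members):
        """Least-squares residual of x against the members of one bucket only."""
        kxx = rbf_kernel(x, x, self.sigma)
        if not members:
            return kxx
        self.comparisons += len(members)
        K = np.array([[rbf_kernel(a, b, self.sigma) for b in members] for a in members])
        K += 1e-8 * np.eye(len(members))  # small jitter for numerical stability
        k = np.array([rbf_kernel(a, x, self.sigma) for a in members])
        coeffs = np.linalg.solve(K, k)
        return kxx - float(k @ coeffs)

    def observe(self, x):
        """Add x to its bucket if it is novel there; return True when added."""
        members = self.buckets[self.hash_key(x)]
        if self.residual(x, members) > self.nu:
            members.append(np.asarray(x, dtype=float))
            return True
        return False

    def size(self):
        """Total number of stored states across all buckets."""
        return sum(len(m) for m in self.buckets.values())
```

In this sketch a new state is compared only with the stored states that share its hash value, so the cost of each test depends on the bucket size rather than on the whole dictionary. The price is that two nearby states separated by a hyperplane can both be stored, which is one way to read the trade-off between execution time and value-function precision that the abstract reports.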
