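1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006
2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012
3. Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, Jiangsu 210023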
Published online: 2016-11-25
Published in print: 2016
LIU Quan, YU Jun, WANG Hui, et al. A Bayesian Temporal Difference Algorithm Based on Random Projection[J]. Acta Electronica Sinica, 2016, 44(11): 2752-2757. DOI: 10.3969/j.issn.0372-2112.2016.11.026.
In reinforcement learning, most algorithms are based on value function evaluation. The Gaussian process temporal difference (GPTD) algorithm evaluates the value function with a Bayesian method: using the Bellman equation and Bayes' rule, it builds a probabilistic generative model relating immediate rewards to the value function. To speed up execution, it applies online kernel sparsification in the state space and uses least squares to solve for an approximate linear representation of each new sample, but the time complexity remains high. To address the problem of selecting approximating states in the state space, a Bayesian temporal difference algorithm based on random projection is proposed within the Gaussian process framework. The algorithm uses a hash function to map the elements of the dictionary state set to hash values and groups them by hash value, which reduces the number of comparisons between states. Experimental results show that the method not only improves execution speed but also strikes a good balance between the accuracy of the estimated state value function and the execution time.
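The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sign-of-random-projection hash, the Gaussian kernel, the class name `HashedDictionary`, and all parameter values are assumptions chosen for illustration. A plain kernel-similarity threshold also stands in for the least-squares approximate-linear-dependence test the abstract mentions. The point is only that a new state is compared against dictionary states in its own hash bucket rather than against the whole dictionary.

```python
import math
import random


def make_projection(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions in R^dim (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_state(state, projection):
    """Sign-of-projection hash: one bit per random direction."""
    return tuple(
        1 if sum(w * x for w, x in zip(row, state)) >= 0 else 0
        for row in projection
    )


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two states."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


class HashedDictionary:
    """Dictionary of representative states, grouped by hash value.

    Assumption: a similarity threshold replaces the least-squares
    sparsification test, purely to keep the sketch short.
    """

    def __init__(self, dim, n_bits=4, threshold=0.9, seed=0):
        self.projection = make_projection(dim, n_bits, seed)
        self.threshold = threshold
        self.buckets = {}      # hash value -> list of stored states
        self.comparisons = 0   # kernel evaluations performed so far

    def add(self, state):
        """Store state unless its bucket already holds a similar one."""
        key = hash_state(state, self.projection)
        bucket = self.buckets.setdefault(key, [])
        for stored in bucket:  # compare only within the same group
            self.comparisons += 1
            if gaussian_kernel(state, stored) >= self.threshold:
                return False
        bucket.append(state)
        return True

    def size(self):
        return sum(len(b) for b in self.buckets.values())


# Usage: feed a few 2-D states and inspect how many were kept.
dictionary = HashedDictionary(dim=2)
for s in [(0.1, 0.2), (0.1, 0.21), (-3.0, 1.0)]:
    dictionary.add(s)
```

More hash bits produce smaller buckets and fewer comparisons, but they raise the chance that two similar states land in different buckets, so the dictionary may keep near-duplicates. This is the speed versus accuracy trade-off the abstract refers to.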