LIU Quan, YU Jun, WANG Hui, et al. A Bayesian Temporal Difference Algorithm Based on Random Projection[J]. Acta Electronica Sinica, 2016, 44(11): 2752-2757. DOI: 10.3969/j.issn.0372-2112.2016.11.026.
A Bayesian Temporal Difference Algorithm Based on Random Projection
In reinforcement learning, most algorithms are based on policy evaluation. Gaussian process temporal difference learning is an algorithm that evaluates value functions using a Bayesian approach. In this method, a Gaussian process builds a probabilistic generative model relating the immediate reward to the value function through the Bellman equation and Bayes' rule. To improve the efficiency of the algorithm, an approximate linear representation of each new sample is computed in state space using on-line kernel sparsification and least squares. However, the time complexity is still high. To deal with this problem, a Bayesian temporal difference algorithm based on random projection is proposed. The elements of the dictionary state set are mapped to hash values by a hash function. The states are divided into groups according to their hash values, which reduces the number of comparisons between states. The experimental results show that this algorithm not only improves execution speed but also achieves a balance between execution time and the precision of the state value function.
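
The grouping idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a sign-based random projection hash (one bit per random hyperplane) and replaces the sparsification test with a simple Gaussian-kernel similarity threshold. The names `make_projection`, `hash_state`, `HashedDictionary`, `n_bits`, and `threshold` are hypothetical and do not come from the paper.

```python
import math
import random


def make_projection(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in R^dim (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_state(state, planes):
    """Map a state vector to a tuple of sign bits, one bit per hyperplane."""
    return tuple(
        int(sum(w * x for w, x in zip(plane, state)) >= 0.0) for plane in planes
    )


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two state vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


class HashedDictionary:
    """Dictionary of representative states, grouped by hash value.

    Assumption: a new state counts as redundant when its kernel similarity
    to some stored state in the same group reaches `threshold`. This is a
    simplified stand-in for the sparsification test used in the paper.
    """

    def __init__(self, dim, n_bits=4, threshold=0.9, seed=0):
        self.planes = make_projection(dim, n_bits, seed)
        self.threshold = threshold
        self.buckets = {}      # hash value -> list of stored states
        self.comparisons = 0   # number of kernel evaluations performed

    def add_if_novel(self, state):
        """Compare `state` only against its own group; store it if novel."""
        key = hash_state(state, self.planes)
        bucket = self.buckets.setdefault(key, [])
        for member in bucket:
            self.comparisons += 1
            if gaussian_kernel(state, member) >= self.threshold:
                return False
        bucket.append(tuple(state))
        return True

    def size(self):
        """Total number of stored states across all groups."""
        return sum(len(bucket) for bucket in self.buckets.values())


if __name__ == "__main__":
    rng = random.Random(1)
    dictionary = HashedDictionary(dim=2, n_bits=4)
    for _ in range(200):
        dictionary.add_if_novel((rng.uniform(-3, 3), rng.uniform(-3, 3)))
    print("stored states:", dictionary.size())
    print("kernel comparisons:", dictionary.comparisons)
```

Because each new state is compared only with the stored states that share its hash value, the number of kernel evaluations per sample depends on the group size rather than on the size of the whole dictionary. The price is that similar states falling on opposite sides of a hyperplane are never compared, which is one way to read the trade-off between execution time and precision mentioned above.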