ZHU Fei, LIU Quan, FU Qi-ming, et al. A Policy Search and Transfer Approach in the Non-stationary Environment[J]. Acta Electronica Sinica, 2017, 45(2): 257-266. DOI: 10.3969/j.issn.0372-2112.2017.02.001.
A Policy Search and Transfer Approach in the Non-stationary Environment
Reinforcement learning, which obtains the optimal policy with the maximum expected cumulative reward by interacting with the environment, is mostly based on the stationary Markov Decision Process (MDP). It cannot handle the non-stationary case, because the MDP model no longer holds once the agent has interacted with the environment, so traditional reinforcement learning algorithms cannot learn an optimal policy directly. To address this, a novel policy search algorithm based on a formula set (FSPS) is proposed, where the formula set is generated from features extracted from collected historical sample trajectories. The algorithm adopts the formula with the best performance as the optimal policy. It also draws on transfer learning by transferring the learned policy between two similar MDP distributions; the performance of the transferred policy depends mainly on the distance between the two MDP distributions and on the performance of the learned policy in the original MDP distribution. Simulation results on the Markov Chain problem show that the algorithm handles the non-stationary case well.
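
The selection step described in the abstract (score every candidate formula, keep the best one, then reuse it on a nearby MDP distribution) can be illustrated with a minimal sketch. This is not the paper's FSPS implementation: the five-state chain, its rewards, the `slip` probability that stands in for a sampled MDP, and the three hand-written formulas are all hypothetical choices made for illustration, whereas the paper generates its formula set from features of historical trajectories.

```python
import random

# Hypothetical 5-state chain (an assumption, not the paper's benchmark):
# action 1 moves right and pays 10 on entering the last state,
# action 0 moves left and pays 1 on entering state 0.
# With probability `slip` the chosen action is reversed.
N_STATES = 5
ACTIONS = (0, 1)

def step(state, action, slip, rng):
    if rng.random() < slip:
        action = 1 - action
    if action == 1:
        nxt = min(state + 1, N_STATES - 1)
        reward = 10.0 if nxt == N_STATES - 1 else 0.0
    else:
        nxt = max(state - 1, 0)
        reward = 1.0 if nxt == 0 else 0.0
    return nxt, reward

# Hand-written candidate formulas (illustrative only). Each maps a
# (state, action) pair to a score; the induced policy is greedy in it.
FORMULAS = {
    "prefer_right": lambda s, a: float(a),
    "prefer_left": lambda s, a: float(-a),
    "right_if_past_middle":
        lambda s, a: float(a) if s >= N_STATES // 2 else float(-a),
}

def greedy_action(formula, state):
    return max(ACTIONS, key=lambda a: formula(state, a))

def rollout_return(formula, slip, rng, horizon=30, gamma=0.95):
    """Discounted return of one episode on a chain with the given slip."""
    state, total, discount = 0, 0.0, 1.0
    for _ in range(horizon):
        action = greedy_action(formula, state)
        state, reward = step(state, action, slip, rng)
        total += discount * reward
        discount *= gamma
    return total

def evaluate(formula, slip_range, rng, n_mdps=200):
    """Average return over chains whose slip is drawn uniformly from
    slip_range; each draw plays the role of one sampled MDP."""
    lo, hi = slip_range
    returns = [rollout_return(formula, rng.uniform(lo, hi), rng)
               for _ in range(n_mdps)]
    return sum(returns) / n_mdps

def search(slip_range, seed=0):
    """Score every formula and return the best name plus all scores."""
    rng = random.Random(seed)
    scores = {name: evaluate(f, slip_range, rng)
              for name, f in FORMULAS.items()}
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    source, target = (0.0, 0.2), (0.1, 0.3)  # two nearby distributions
    best, scores = search(source)
    transferred = evaluate(FORMULAS[best], target, random.Random(1))
    print(best, scores[best], transferred)
```

In this sketch, transfer simply means re-evaluating the formula selected on the source distribution against a shifted target distribution, so the gap between the two printed scores gives a rough sense of how performance changes as the distributions move apart.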