[1] Taylor J,Precup D,Panangaden P.Bounding performance loss in approximate MDP homomorphisms[A].Proceedings of the 22nd Annual Conference on Neural Information Processing Systems[C].NY:Curran Associates,2008.1660-1667.
[2] Wang Hao,Gao Yang,Chen Xingguo.Transfer of reinforcement learning:The state of the art[J].Acta Electronica Sinica,2008,36(12A):39-43.(in Chinese)
[3] Sunmola F T,Wyatt J L.Model transfer for Markov decision tasks via parameter matching[A].Proceedings of the 25th Workshop of the UK Planning and Scheduling Special Interest Group[C].Nottingham,England,2006.17-24.
[4] Konidaris G D,Barto A G.Building portable options:skill transfer in reinforcement learning[A].Proceedings of the 20th International Joint Conference on Artificial Intelligence[C].CA:Morgan Kaufmann Publishers,2007.895-901.
[5] Ferrante E,Lazaric A,Restelli M.Transfer of task representation in reinforcement learning using policy-based proto-value functions[A].Proceedings of the 7th International Conference on Autonomous Agents and Multi-Agent Systems[C].Estoril,2008.1329-1332.
[6] Lazaric A,Restelli M,Bonarini A.Transfer of samples in batch reinforcement learning[A].Proceedings of the 25th International Conference on Machine Learning[C].NY:ACM Press,2008.544-551.
[7] Sorg J,Singh S.Transfer via soft homomorphisms[A].Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems[C].Hungary,2009.741-748.
[8] Ammar H B,Taylor M,Tuyls K,Weiss G.Reinforcement learning transfer using a sparse coded inter-task mapping[A].Proceedings of the 9th European Workshop on Multi-agent Systems[C].Berlin:Springer-Verlag,2012.1-16.
[9] Konidaris G D,Scheidwasser I,Barto A G.Transfer in reinforcement learning via shared features[J].Journal of Machine Learning Research,2012,13:1333-1371.
[10] Sutton R S,Barto A G.Reinforcement Learning:An Introduction[M].Cambridge:MIT Press,1998.
[11] Givan R,Dean T,Greig M.Equivalence notions and model minimization in Markov decision processes[J].Artificial Intelligence,2003,147(1-2):163-223.
[12] Ferns N,Panangaden P,Precup D.Metrics for finite Markov decision processes[A].Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence[C].Arlington:AUAI Press,2004.162-169.
[13] Wiering M,Hasselt H V.The QV family compared to other reinforcement learning algorithms[A].Proceedings of IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning[C].Nashville:IEEE,2009.101-108.