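1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215000
2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education (Jilin University), Changchun, Jilin 130012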
Print publication: 2013
LIU Quan, LI Jin, FU Qi-ming, et al. A Multiple-Goal Sarsa(λ) Algorithm Based on Lost Reward of Greatest Mass[J]. Acta Electronica Sinica, 2013, 41(8): 1469-1473. DOI: 10.3969/j.issn.0372-2112.2013.08.003.
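To address RoboCup, a typical multiple-goal reinforcement learning problem, this paper proposes LRGM-Sarsa(λ), a multiple-goal reinforcement learning algorithm based on the lost reward of greatest mass. The algorithm estimates the lost reward of greatest mass for each goal and, while balancing the goals against one another, selects the best joint action to produce an optimal joint policy. When training a single goal, it uses a Sarsa(λ) algorithm based on an improved MSBR error function and optimizes both the action-selection probability function and the step-size parameter. This resolves the instability and non-convergence of reinforcement learning when nonlinear function approximation is used for generalization. Applied to training a local shooting policy in RoboCup, the algorithm achieves good results, which demonstrates its effectiveness.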
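The balancing of goals described above can be illustrated with a small sketch. The abstract does not give the formula for the lost reward of greatest mass, so the version below rests on an assumption: the loss of a goal for a given action is taken to be the gap between that goal's best Q-value and its Q-value for the action, and the joint action is the one with the smallest summed loss. The function names `lost_reward` and `select_joint_action` and the sample Q-values are hypothetical, not taken from the paper.

```python
from typing import Sequence


def lost_reward(q_values: Sequence[float], action: int) -> float:
    """Assumed loss: what one goal forgoes if `action` replaces its own best action."""
    return max(q_values) - q_values[action]


def select_joint_action(q_per_goal: Sequence[Sequence[float]]) -> int:
    """Return the action index with the smallest total assumed loss across goals."""
    n_actions = len(q_per_goal[0])
    totals = [
        sum(lost_reward(q, a) for q in q_per_goal)
        for a in range(n_actions)
    ]
    return min(range(n_actions), key=totals.__getitem__)


# Hypothetical Q-values: two goals, three shared actions.
q_per_goal = [
    [1.0, 0.0, 0.8],  # goal 0 prefers action 0
    [0.0, 1.0, 0.9],  # goal 1 prefers action 1
]
chosen = select_joint_action(q_per_goal)  # action 2 is the compromise
```

With an unweighted sum, this rule picks the same action as maximizing the summed Q-values, which is the classic greatest-mass rule. The estimate used in the paper may weight or compute the loss differently.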
To solve the multiple-goal problem in RoboCup, a novel multiple-goal reinforcement learning algorithm named LRGM-Sarsa(λ) is proposed. The algorithm estimates the lost reward of the greatest mass of every sub-goal and trades off the long-term rewards of the sub-goals to obtain a composite policy. In the single-goal learning module, a B error function, based on the MSBR error function, is proposed. The B error function guarantees convergence of value prediction under nonlinear function approximation. The action-selection probability function and the step-size parameter α are also improved with respect to the B error function. The algorithm is applied to shooting training in RoboCup 2D. The experimental results show that the proposed algorithm is more stable and converges faster.
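For readers unfamiliar with the base learner, the following is a minimal sketch of one textbook tabular Sarsa(λ) update with accumulating eligibility traces. It is not the paper's variant: the B error function and nonlinear function approximation are not reproduced here, since the abstract does not define them. The function name `sarsa_lambda_step`, the state and action labels, and the default values of `alpha`, `gamma`, and `lam` are illustrative assumptions.

```python
from collections import defaultdict


def sarsa_lambda_step(q, e, s, a, r, s_next, a_next,
                      alpha=0.1, gamma=0.9, lam=0.8, done=False):
    """Apply one on-policy Sarsa(lambda) update in place and return the TD error."""
    # TD target bootstraps from the action actually chosen in the next state.
    target = r if done else r + gamma * q[(s_next, a_next)]
    delta = target - q[(s, a)]
    # Accumulating trace: mark the visited state-action pair.
    e[(s, a)] += 1.0
    # Propagate the TD error to every traced pair, then decay the traces.
    for key in list(e):
        q[key] += alpha * delta * e[key]
        e[key] *= gamma * lam
    return delta


# Hypothetical transition: states "s0"/"s1" and action "shoot" are placeholders.
q = defaultdict(float)  # action values, default 0.0
e = defaultdict(float)  # eligibility traces, default 0.0
delta = sarsa_lambda_step(q, e, "s0", "shoot", 1.0, "s1", "shoot")
```

In an episodic task, the traces in `e` would normally be cleared at the start of each episode.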