A Multiple-Goal Sarsa(λ) Algorithm Based on Lost Reward of Greatest Mass

LIU Quan; LI Jin; FU Qi-ming; CUI Zhi-ming; FU Yu-chen

doi:10.3969/j.issn.0372-2112.2013.08.003

Academic paper | Updated: 2025-07-16

- Chinese title: 一种最大集合期望损失的多目标Sarsa(λ)算法
- English title: A Multiple-Goal Sarsa(λ) Algorithm Based on Lost Reward of Greatest Mass
- Journal: Acta Electronica Sinica (电子学报), 2013, Vol. 41, No. 8, pp. 1469-1473
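- Author affiliations:
  
  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215000, China
  2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education (Jilin University), Changchun, Jilin 130012, China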
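- Funding:
  
  National Natural Science Foundation of China (No.61070223, No.61103045, No.61272005, No.61170020); Natural Science Foundation of Jiangsu Province (No.BK2012616); Natural Science Research Project of Jiangsu Higher Education Institutions (No.09KJA520002, No.09KJB520012); Project of the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (No.93K172012K04)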
- DOI: 10.3969/j.issn.0372-2112.2013.08.003
- Chinese Library Classification (CLC) number: TP181
- Print publication: 2013

LIU Quan, LI Jin, FU Qi-ming, et al. A Multiple-Goal Sarsa(λ) Algorithm Based on Lost Reward of Greatest Mass[J]. Acta Electronica Sinica, 2013, 41(8): 1469-1473. DOI: 10.3969/j.issn.0372-2112.2013.08.003.

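Abstract (translated from the Chinese)

RoboCup is a typical multiple-goal reinforcement learning problem. To address it, this paper proposes LRGM-Sarsa(λ), a multiple-goal reinforcement learning algorithm based on the lost reward of greatest mass. The algorithm estimates the lost reward of greatest mass for each goal and, while balancing the goals against one another, selects the best joint action so as to produce an optimal joint policy. When training each individual goal, it uses a Sarsa(λ) algorithm based on an improved MSBR error function and optimizes both the action-selection probability function and the step-size parameter. This resolves the instability and non-convergence that arise when reinforcement learning generalizes with a nonlinear function. Applied to training a local shooting policy in RoboCup, the algorithm achieved fairly good results, which indicates that the learning algorithm is effective.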

Abstract

To solve the multiple-goal problem in RoboCup, a novel multiple-goal reinforcement learning algorithm, named LRGM-Sarsa(λ), is proposed. The algorithm estimates the lost reward of the greatest mass of every sub-goal and trades off the long-term rewards of the sub-goals to obtain a composite policy. For the single-goal learning module, a B error function, which is based on the MSBR error function, is proposed. The B error function guarantees the convergence of value prediction under non-linear function approximation. The action-selection probability function and the step-size parameter are also improved with respect to the B error function. The algorithm is applied to shooting training in RoboCup 2D. The experimental results show that the proposed algorithm is more stable and converges faster.
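The abstract does not spell out the lost-reward criterion or the B error function, so the following is only a minimal illustrative sketch of the two standard building blocks it names, not the authors' method. It assumes tabular Q-values (the paper uses non-linear function approximation), plain greatest-mass action selection (sum the per-goal Q-values and take the best action), and textbook Sarsa(λ) with accumulating eligibility traces. The function names `greatest_mass_action` and `sarsa_lambda_step`, the toy Q-tables, and the default hyperparameters are all hypothetical.

```python
import numpy as np


def greatest_mass_action(q_tables, state):
    """Return the action with the largest Q-value summed over all sub-goals.

    q_tables: list of arrays of shape (n_states, n_actions), one per sub-goal.
    This is plain greatest-mass selection, assumed here for illustration;
    the paper's lost-reward criterion is not reproduced.
    """
    mass = sum(q[state] for q in q_tables)  # shape: (n_actions,)
    return int(np.argmax(mass))


def sarsa_lambda_step(q, e, s, a, r, s_next, a_next,
                      alpha=0.1, gamma=0.9, lam=0.8):
    """Apply one tabular Sarsa(lambda) update in place and return the TD error.

    q: Q-table of shape (n_states, n_actions) for a single sub-goal.
    e: eligibility traces with the same shape as q (accumulating traces).
    alpha, gamma, lam: step size, discount, trace decay (assumed values).
    """
    delta = r + gamma * q[s_next, a_next] - q[s, a]  # TD error
    e[s, a] += 1.0                                   # mark the visited pair
    q += alpha * delta * e                           # credit all traced pairs
    e *= gamma * lam                                 # decay every trace
    return delta


# Toy example: one state, three actions, two sub-goals (made-up numbers).
q_goal_a = np.array([[1.0, 0.0, 0.5]])
q_goal_b = np.array([[0.0, 0.9, 0.8]])
chosen = greatest_mass_action([q_goal_a, q_goal_b], state=0)

# One learning step for a single sub-goal on a 2-state, 2-action table.
q = np.zeros((2, 2))
e = np.zeros((2, 2))
td_error = sarsa_lambda_step(q, e, s=0, a=1, r=1.0, s_next=1, a_next=0)
```

In the toy example the summed values are 1.0, 0.9 and 1.3, so the selected action is the third one, which neither sub-goal ranks first on its own. That is the kind of trade-off between sub-goals that a composite policy is meant to capture.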
