A Novel Fast Sarsa Algorithm Based on Value Function Transfer

傅启明; 刘全; 尤树华; 黄蔚; 章晓芳

doi:10.3969/j.issn.0372-2112.2014.11.005


Academic paper | Updated: 2025-07-16

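- A Novel Fast Sarsa Algorithm Based on Value Function Transfer
- Acta Electronica Sinica, 2014, Vol. 42, No. 11, pp. 2157-2161
- Author affiliations:
  
  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006
  2. Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, Jilin 130012
- Funding:
  
  National Natural Science Foundation of China (No.61103045, No.61303108); Natural Science Foundation of Jiangsu Province (No.BK2012616); Natural Science Research Project of Jiangsu Higher Education Institutions (No.13KJB520020); Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University (No.93K17012K04)
- DOI: 10.3969/j.issn.0372-2112.2014.11.005
- CLC number: TP181
- Print publication: 2014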

FU Qi-ming, LIU Quan, YOU Shu-hua, et al. A Novel Fast Sarsa Algorithm Based on Value Function Transfer[J]. Acta Electronica Sinica, 2014, 42(11): 2157-2161.

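Abstract (translated from the Chinese)

Knowledge transfer is a new research hotspot in machine learning. Its basic idea is to transfer experience from historical tasks to a target task in order to improve an algorithm's convergence speed and convergence accuracy. To address the slow convergence of classical reinforcement learning algorithms, this paper proposes transferring value function information during learning, which reduces the number of samples needed for convergence and speeds up convergence. Building on the learning framework of the classical on-policy Sarsa algorithm, and using value function transfer to optimize the setting of the initial value function, the paper proposes a new fast Sarsa algorithm based on value function transfer, VFT-Sarsa. In its early stage, the algorithm introduces a bisimulation metric to measure the distance between states of the target task and states of the historical task, under the condition that the two tasks share the same state space and action space. It transfers the value function for states that are similar and satisfy certain conditions, and then continues with the learning algorithm. VFT-Sarsa is applied to the Random Walk problem and compared with the classical Sarsa algorithm, the Q-learning algorithm, and the QV algorithm, which has a relatively good convergence speed. The experimental results show that the algorithm converges faster while maintaining convergence accuracy.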

Abstract

Knowledge transfer has gradually become a research hotspot in machine learning. It transfers knowledge from historical tasks to a target task in order to speed up convergence and improve the performance of algorithms. To address the slow convergence of traditional reinforcement learning algorithms, this paper proposes transferring the value function between similar learning tasks that share the same state space and action space, which reduces the number of samples needed in the target task and speeds up convergence. Based on the framework of the on-policy Sarsa algorithm, combined with the value function transfer method, this paper puts forward a novel fast Sarsa algorithm based on value function transfer, VFT-Sarsa. At the beginning, the algorithm uses a bisimulation metric to measure the distance between states in the target task and the historical task, on the condition that these tasks have the same state space and action space. It then transfers the value function if the distance meets a certain condition, and finally executes the learning algorithm. The proposed algorithm is applied to the Random Walk problem and compared with the Sarsa, Q-learning, and QV algorithms. The results show that the proposed algorithm achieves a better convergence rate with good performance.
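The three stages named in the abstract (measure state distances, transfer values for close states, then run Sarsa) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a small deterministic chain task standing in for Random Walk, so the Kantorovich term of the bisimulation metric reduces to the distance between successor states. The function names, the weights `c_r` and `c_t`, the nearest-state rule, and the `threshold` test are all hypothetical, since the abstract does not state the transfer condition.

```python
import random

N = 5             # non-terminal states 0..N-1 (assumed chain length)
END = N           # absorbing terminal state with zero reward
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def make_task(right_reward):
    """Deterministic chain; leaving the right end pays right_reward."""
    def step(s, a):
        if s == END:
            return END, 0.0
        nxt = s + (1 if a == 1 else -1)
        if nxt < 0:
            return END, 0.0
        if nxt >= N:
            return END, right_reward
        return nxt, 0.0
    return step


def bisim_distance(step_target, step_source, c_r=0.5, c_t=0.5, sweeps=100):
    """Fixed-point iteration for a bisimulation-style distance d[s][t]
    between target state s and source state t. Valid only for
    deterministic transitions (assumption of this sketch)."""
    size = N + 1  # include the terminal state
    d = [[0.0] * size for _ in range(size)]
    for _ in range(sweeps):
        new = [[0.0] * size for _ in range(size)]
        for s in range(size):
            for t in range(size):
                worst = 0.0
                for a in ACTIONS:
                    ns, rs = step_target(s, a)
                    nt, rt = step_source(t, a)
                    worst = max(worst, c_r * abs(rs - rt) + c_t * d[ns][nt])
                new[s][t] = worst
        d = new
    return d


def transfer_q(q_source, dist, threshold):
    """Copy Q-values from the nearest source state when it is close enough
    (hypothetical condition); otherwise keep a zero initialization."""
    q = [[0.0 for _ in ACTIONS] for _ in range(N)]
    for s in range(N):
        t = min(range(N), key=lambda j: dist[s][j])
        if dist[s][t] <= threshold:
            q[s] = list(q_source[t])
    return q


def sarsa(step, q, episodes, alpha=0.1, gamma=0.9, eps=0.1, seed=0,
          max_steps=1000):
    """Tabular on-policy Sarsa with epsilon-greedy action selection."""
    rng = random.Random(seed)

    def pick(s):
        if rng.random() < eps:
            return rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: q[s][a])

    for _ in range(episodes):
        s = N // 2
        a = pick(s)
        for _ in range(max_steps):
            s2, r = step(s, a)
            if s2 == END:
                q[s][a] += alpha * (r - q[s][a])
                break
            a2 = pick(s2)
            q[s][a] += alpha * (r + gamma * q[s2][a2] - q[s][a])
            s, a = s2, a2
    return q


if __name__ == "__main__":
    source, target = make_task(1.0), make_task(0.8)
    q_src = sarsa(source, [[0.0, 0.0] for _ in range(N)], episodes=200)
    dist = bisim_distance(target, source)
    q_init = transfer_q(q_src, dist, threshold=0.2)
    print(sarsa(target, q_init, episodes=50, seed=1))
```

The only difference from plain Sarsa is the initial table passed to `sarsa`. Setting `threshold` below zero disables transfer and gives the zero-initialized baseline to compare against.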

