Acta Electronica Sinica, 2022, Vol. 50, Issue (7): 1744-1752. DOI: 10.12263/DZXB.20211252

• Research Paper

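    • About the authors:
      DUAN Shi-hong, female, born 1973, from Taiyuan, Shanxi. Associate professor at University of Science and Technology Beijing. Research interests: multi-agent systems, embedded computing and wireless localization, numerical optimization and distributed computing.
      HE Hao, male, born 1997, from Yongzhou, Hunan. Research interests: multi-agent systems, wireless localization.
      XU Cheng, male, born 1988, from Kaiyuan, Liaoning. Associate professor at University of Science and Technology Beijing. Research interests: swarm intelligence and collaborative computing, multi-agent systems and distributed security. E-mail: xucheng@ustb.edu.cn
      YIN Nan, female, born 1996, from Xinzhou, Shanxi. Research interests: multi-agent systems, wireless localization.
      WANG Ran, female, born 1991, from Beijing. Main research interests: swarm intelligence and collaborative computing, multi-agent systems and distributed security.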

DS-MCTS: A Deep Sequential Monte-Carlo Tree Search Method for Source Navigation in Unknown Environments

DUAN Shi-hong (1, 2), HE Hao (1, 2), XU Cheng (1, 2), YIN Nan (1), WANG Ran (1, 2)

  1. School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
    2. Shunde Graduate School, University of Science and Technology Beijing, Foshan, Guangdong 528399, China
  • Received: 2021-09-13  Revised: 2021-12-28  Online: 2022-07-25  Published: 2022-07-30
    • Supported by:
    • National Natural Science Foundation of China (62101029); Postdoctoral Innovative Talents Support Program (BX20190033); Joint Fund of Basic and Applied Basic Research Foundation of Guangdong Province (2019A1515110325); Postdoctoral Foundation of China (2020M670135); Postdoctoral Research Fund of Shunde Graduate School, University of Science and Technology Beijing (2020BH001); Fundamental Research Funds for the Central Universities (06500127)


Abstract:

Source navigation has important applications in emergency rescue, industrial inspection, and other hazardous operations. In practice, the state of the environment is often difficult to observe fully; that is, the environment is partially observable. How to make real-time decisions from partially observed environmental information, and how to predict the system's future states effectively from historical sequence information, have become challenging problems in source navigation research. This paper proposes a source navigation algorithm and system framework based on deep sequential Monte-Carlo tree search (DS-MCTS). A sequential action prediction (SAP) network provides prior knowledge for MCTS decision-making, and a reward allocation prediction (RAP) network is constructed to improve the accuracy of reward allocation, ultimately achieving optimal decision-making for the system. Simulation results show that DS-MCTS provides an end-to-end source navigation solution that can effectively predict the agent's actions and achieve efficient and robust path planning.
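The abstract says only that the SAP network supplies prior knowledge to MCTS and that the RAP network improves reward allocation; it does not give the selection rule or how the two networks are wired into the search. As a point of reference, the sketch below shows one common way a learned prior and a learned value estimate can enter MCTS: the PUCT selection rule popularized by AlphaZero, applied to a toy grid. Everything here is an assumption for illustration. The 5×5 grid, `SOURCE`, and the distance-based `signal` model are invented. The functions `sap_prior` and `rap_value` are hypothetical, hand-written heuristics standing in for trained SAP and RAP networks. The sketch also ignores the partial observability and sequence modelling that DS-MCTS is built around, so it should not be read as the authors' method.

```python
import math
from dataclasses import dataclass, field

# Toy environment (illustrative assumptions, not from the paper).
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
SIZE = 5
SOURCE = (4, 4)


def step(pos, action):
    """Move one cell, clamped to the grid boundary."""
    dx, dy = ACTIONS[action]
    return (min(max(pos[0] + dx, 0), SIZE - 1), min(max(pos[1] + dy, 0), SIZE - 1))


def signal(pos):
    """Hypothetical signal strength that decays with Manhattan distance to the source."""
    return 1.0 / (1 + abs(pos[0] - SOURCE[0]) + abs(pos[1] - SOURCE[1]))


def sap_prior(pos):
    """Hypothetical stand-in for the SAP network: softmax over each action's signal gain."""
    gains = {a: signal(step(pos, a)) - signal(pos) for a in ACTIONS}
    weights = {a: math.exp(50 * g) for a, g in gains.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def rap_value(pos):
    """Hypothetical stand-in for the RAP network: value estimate of a leaf state."""
    return signal(pos)


@dataclass
class Node:
    pos: tuple
    prior: float = 1.0
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    def q(self):
        """Mean backed-up value (0 for an unvisited node)."""
        return self.value_sum / self.visits if self.visits else 0.0


def select(node, c_puct=1.5):
    """PUCT: mean value Q plus an exploration bonus proportional to the prior."""
    sqrt_n = math.sqrt(node.visits)
    return max(
        node.children.items(),
        key=lambda item: item[1].q() + c_puct * item[1].prior * sqrt_n / (1 + item[1].visits),
    )


def search(root_pos, simulations=200):
    """Run PUCT-style MCTS from root_pos and return the most-visited root action."""
    root = Node(root_pos)
    for _ in range(simulations):
        node, path = root, [root]
        # 1. Selection: descend through already-expanded nodes.
        while node.children:
            _, node = select(node)
            path.append(node)
        # 2. Expansion with priors from the SAP stand-in (the source cell is terminal).
        if node.pos != SOURCE:
            for action, p in sap_prior(node.pos).items():
                node.children[action] = Node(step(node.pos, action), prior=p)
        # 3. Evaluation by the RAP stand-in instead of a random rollout.
        value = rap_value(node.pos)
        # 4. Backup: update every node on the visited path.
        for n in path:
            n.visits += 1
            n.value_sum += value
    return max(root.children, key=lambda a: root.children[a].visits)


print(search((3, 4)))  # prints: right
```

From (3, 4), the stand-in prior puts almost all of its mass on the move into the source cell. That child's backed-up value of 1.0 then dominates every later selection. If the leaf evaluation in step 3 were replaced by a random rollout, this would reduce to plain MCTS with priors. Using a value estimate instead is the design choice that lets a trained network stand in for rollouts.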

Key words: source navigation, Monte-Carlo tree search, sequential decision-making, path planning, deep reinforcement learning
