1.北京科技大学计算机与通信工程学院,北京 100083
2.北京科技大学顺德研究生院,广东佛山 528399
[ "段世红 女,1973年生,山西太原人. 北京科技大学副教授. 研究方向为多智能体系统、嵌入式计算与无线定位、数值优化与分布式计算." ]
[ "何昊 男,1997年生,湖南永州人. 研究方向为多智能体系统、无线定位." ]
[ "徐诚 男,1988年生,辽宁开原人. 北京科技大学副教授. 研究方向为群体智能与协同计算、多智能体系统与分布式安全.E-mail: xucheng@ustb.edu.cn" ]
[ "殷楠 女,1996年生,山西忻州人. 研究方向为多智能体系统、无线定位." ]
[ "王然 女,1991年生,北京人. 主要研究方向为群体智能与协同计算、多智能体系统与分布式安全." ]
收稿:2021-09-13,
修回:2021-12-28,
纸质出版:2022-07-25
移动端阅览
段世红,何昊,徐诚等.未知环境下基于深度序列蒙特卡罗树搜索的信源导航方法[J].电子学报,2022,50(07):1744-1752.
DUAN Shi-hong,HE Hao,XU Cheng,et al.DS-MCTS: A Deep Sequential Monte-Carlo Tree Search Method for Source Navigation in Unknown Environments[J].ACTA ELECTRONICA SINICA,2022,50(07):1744-1752.
段世红,何昊,徐诚等.未知环境下基于深度序列蒙特卡罗树搜索的信源导航方法[J].电子学报,2022,50(07):1744-1752. DOI: 10.12263/DZXB.20211252.
DUAN Shi-hong,HE Hao,XU Cheng,et al.DS-MCTS: A Deep Sequential Monte-Carlo Tree Search Method for Source Navigation in Unknown Environments[J].ACTA ELECTRONICA SINICA,2022,50(07):1744-1752. DOI: 10.12263/DZXB.20211252.
信源导航在应急救援、工业巡检及其他危险作业中具有重要应用意义.在实际应用中,环境的状态信息往往是难以完全观测的,即部分可观测环境.如何利用观测到的部分环境信息做出实时决策,并基于历史序列信息对系统未来状态进行有效的预测,成为信源导航相关研究所面临的挑战性问题.本文提出一种基于深度序列蒙特卡洛树搜索(Deep Sequential Monte-Carlo Tree Search,DS-MCTS)的信源导航算法和系统框架,基于序列动作预测(Sequential Action Prediction,SAP)网络为MCTS决策提供先验知识,构建奖励分配预测(Reward Allocation Prediction,RAP)网络提高奖励分配精度,最终实现系统的最优化决策.仿真实验表明,DS-MCTS方法提供了一种端到端的信源导航解决方案,可以实现智能体动作的有效预测,实现高效、鲁棒的路径规划.
Source navigation has important application significance in emergency rescue
industrial patrol
and other dangerous operations. In practical applications
it is often difficult to fully observe the state information of the environment
that is
a partially observable environment. Making real-time decisions using part of the observed environmental information and effectively predicting the system's future state based on the historical sequence information have become a challenge faced by research institutes related to source navigation. This paper proposes a source navigation algorithm and system framework based on deep sequential Monte-Carlo tree search(DS-MCTS). Prior knowledge is provided to MCTS decision-making based on a sequential action prediction(SAP) network. A reward allocation prediction(RAP) network is built to improve the accuracy of reward distribution and finally realize the system's optimal decision-making. The simulation results show that the DS-MCTS method provides an end-to-end source navigation solution
which can effectively predict agents' actions and achieve efficient and robust path planning.
路永鑫 , 魏云冰 , 赵启承 , 等 . 基于层次分析法和改进A*算法的电力应急机器人路径规划 [J]. 电力系统保护与控制 , 2021 , 49 ( 9 ): 82 - 89 .
LU Y X , WEI Y B , ZHAO Q C , et al . Path planning of a power emergency robot based on an analytic hierarchy process and improved A* algorithm [J]. Power System Protection and Control , 2021 , 49 ( 9 ): 82 - 89 . (in Chinese)
MANDAL S , SAHA D , MAHANTI A . A heuristic search for generalized cellular network planning [C]// 2002 IEEE International Conference on Personal Wireless Communications . New Delhi, India : IEEE , 2002 : 105 - 109 .
LIN Q Z , LIU S B , ZHU Q L , et al . Particle swarm optimization with a balanceable fitness estimation for many-objective optimization problems [J]. IEEE Transactions on Evolutionary Computation , 2018 , 22 ( 1 ): 32 - 46 .
ROJAS SANTIAGO M , MUTHUSWAMY S , HULETT M . An ACO algorithm for scheduling a flow shop with setup times [J]. International Journal of Industrial and Systems Engineering , 2020 , 36 ( 1 ): 98 - 109 .
LI J , SHU Z . Research on path planning based on improved D* lite genetic algorithm [J]. Machine Tool & Hydraulics , 2019 , 47 ( 11 ): 39 - 42 .
YAN X S , LI P P , TANG K , et al . Clonal selection based intelligent parameter inversion algorithm for prestack seismic data [J]. Information Sciences , 2020 , 517 : 86 - 99 .
WANG H Q , HU Y Y , LIAO W D , et al . Path planning algorithm based on improved artificial bee colony algorithm [J]. Control Engineering , 2016 , 23 ( 95 ): 1407 - 1411 .
VITERBI A J . CDMA: Principles of Spread Spectrum Communication [M]. Wokingham : Addison-Wesley , 1995 .
SKOLNIK M I . Radar Handbook [M]. 3rd ed . New York : McGraw-Hill , 2008 .
XIE G , SHANGGUAN A Q , FEI R , et al . Motion trajectory prediction based on a CNN-LSTM sequential model [J]. Science China Information Sciences , 2020 , 63 ( 11 ): 1 - 21 .
许凯波 , 鲁海燕 , 黄洋 , 等 . 基于双层蚁群算法和动态环境的机器人路径规划方法 [J]. 电子学报 , 2019 , 47 ( 10 ): 2166 - 2176 .
XU K B , LU H Y , HUANG Y , et al . Robot path planning based on double-layer ant colony optimization algorithm and dynamic environment [J]. Acta Electronica Sinica , 2019 , 47 ( 10 ): 2166 - 2176 . (in Chinese)
WEN T , YANG D C , LIU W F , et al . A novel integrated path planning algorithm for warehouse AGVs [J]. Chinese Journal of Electronics , 2021 , 30 ( 2 ): 331 - 338 .
陈劲峰 , 黄卫华 , 章政 , 等 . 动态环境下基于改进人工势场法的路径规划算法 [J]. 组合机床与自动化加工技术 , 2020 ( 12 ): 6 - 9, 14 .
CHEN J F , HUANG W H , ZHANG Z , et al . Path planning algorithm based on improved artificial potential field method in dynamic environment [J]. Modular Machine Tool & Automatic Manufacturing Technique , 2020 ( 12 ): 6 - 9, 14 . (in Chinese)
TRAUTMAN P , KRAUSE A . Unfreezing the robot: Navigation in dense, interacting crowds [C]// 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems . Taipei, China : IEEE , 2010 : 797 - 803 .
HELBING D , MOLNÁR P . Social force model for pedestrian dynamics [J]. Physical Review . E, Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics, 1995 , 51 ( 5 ): 4282 - 4286 .
SUTTON R S , Barto A G . Reinforcement learning: An introduction [M]. MIT press , 2018 .
PUTERMAN M L . Markov Decision Processes: Discrete Stochastic Dynamic Programming [M]. Hoboken : John Wiley & Sons , 1994 .
PENG P , ZHU F , LIU Q , et al . Achieving safe deep reinforcement learning via environment comprehension mechanism [J]. Chinese Journal of Electronics , 2021 , 30 ( 6 ): 1049 - 1058 .
VODOPIVEC T , SAMOTHRAKIS S , STER B . On Monte Carlo tree search and reinforcement learning [J]. Journal of Artificial Intelligence Research , 2017 , 60 : 881 - 936 .
SILVER D , HUANG A , MADDISON C J , et al . Mastering the game of Go with deep neural networks and tree search [J]. Nature , 2016 , 529 ( 7587 ): 484 - 489 .
KEARNS M , MANSOUR Y , NG A Y . A sparse sampling algorithm for near-optimal planning in large Markov decision processes [J]. Machine Learning , 2002 , 49 ( 2/3 ): 193 - 208 .
SILVER D , SCHRITTWIESER J , SIMONYAN K , et al . Mastering the game of Go without human knowledge [J]. Nature , 2017 , 550 ( 7676 ): 354 - 359 .
LILLICRAP T P , HUNT J J , PRITZEL A , et al . Continuous control with deep reinforcement learning [J]. arXiv preprint , arXiv: 1509.02971 , 2015 .
黄志清 , 曲志伟 , 张吉 , 等 . 基于深度强化学习的端到端无人驾驶决策 [J]. 电子学报 , 2020 , 48 ( 9 ): 1711 - 1719 .
HUANG Z Q , QU Z W , ZHANG J , et al . End-to-end autonomous driving decision based on deep reinforcement learning [J]. Acta Electronica Sinica , 2020 , 48 ( 9 ): 1711 - 1719 . (in Chinese)
CHEN X , LI Z , WANG K , et al . MDP-based network selection with reward optimization in HetNets [J]. Chinese Journal of Electronics , 2018 , 27 ( 1 ): 183 - 190 .
COULOM R . Efficient selectivity and backup operators in Monte-Carlo tree search [C]// International Conference on Computers and Games . Turin, Italy : Springer , 2007 : 72 - 83 .
CHASLOT G , BAKKES S , SZITA I , et al . Monte-Carlo tree search: A new framework for game AI [C]// Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment . Palo Alto, California, USA : The AAAI Press . 2008 , 4 ( 1 ): 216 - 217 .
BROWNE C B , POWLEY E , WHITEHOUSE D , et al . A survey of monte carlo tree search methods [J]. IEEE Transactions on Computational Intelligence and AI in games , 2012 , 4 ( 1 ): 1 - 43 .
KRONBERGER G , BRAUNE R . Bandit-based Monte-Carlo planning for the single-machine total weighted tardiness scheduling problem [C]// International Conference on Computer Aided Systems Theory . Las Palmas de Gran Canaria, Spain : Springer , 2007 : 837 - 844 .
ZHANG J J , LIU H Y , CHANG Q , et al . Recurrent neural network for motion trajectory prediction in human-robot collaborative assembly [J]. CIRP Annals , 2020 , 69 ( 1 ): 9 - 12 .
徐诚 , 何昊 , 段世红 , 等 . 一种基于深度蒙特卡洛树搜索的信源导航方法及装置 : 202110316103.9 [P]. 2021 .
DENKER A , İŞERI M C . Design and implementation of a semi-autonomous mobile search and rescue robot: SALVOR [C]// 2017 International Artificial Intelligence and Data Processing Symposium (IDAP) . Malatya, Turkey : IEEE , 2017 : 1 - 6 .
方朋朋 , 杨家富 , 施杨洋 , 等 . 基于梯度下降法和改进人工势场法的无人车避障方法 [J]. 制造业自动化 , 2018 , 40 ( 11 ): 81 - 84 .
FANG P P , YANG J F , SHI Y Y , et al . Gradient descent method and improved artificial potential field method for obstacle avoidance of unmanned vehicle [J]. Manufacturing Automation , 2018 , 40 ( 11 ): 81 - 84 . (in Chinese)
VISERAS A , SHUTIN D , MERINO L . Robotic active information gathering for spatial field reconstruction with rapidly-exploring random trees and online learning of Gaussian processes [J]. Sensors(Basel, Switzerland) , 2019 , 19 ( 5 ): 1016 .
0
浏览量
14
下载量
1
CSCD
关联资源
相关文章
相关作者
相关机构
京公网安备11010802024621