

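1. Engineering Research Center of Mine Digitization, Ministry of Education, Xuzhou, Jiangsu 221000
2. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221000
3. Ningbo Rail Transit Group Co., Ltd., Ningbo, Zhejiang 315000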
Received: 09 March 2020
Revised: 14 May 2020
Published: 25 September 2021
XIAO Shuo, HUANG Zhen-zhen, ZHANG Guo-peng, et al. Deep Reinforcement Learning Algorithm of Multi-agent Based on SAC[J]. Acta Electronica Sinica, 2021, 49(09): 1675-1681. DOI: 10.12263/DZXB.20200243.
In a multi-agent environment the dynamics change continually, and each agent's decisions affect the other agents, so single-agent deep reinforcement learning algorithms struggle to remain stable. To adapt to multi-agent environments, this paper uses the Centralized Training with Decentralized Execution (CTDE) framework to improve the single-agent deep reinforcement learning algorithm Soft Actor-Critic (SAC), introduces an inter-agent communication mechanism, and constructs the Multi-Agent Soft Actor-Critic (MASAC) algorithm. In MASAC, agents share observation information and historical experience, which effectively reduces the impact of environmental instability on the algorithm. Finally, the performance of MASAC is analyzed experimentally on cooperative tasks and on mixed cooperative-competitive tasks. The results show that MASAC is more stable than SAC in multi-agent environments.
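The data flow described in the abstract (shared experience, centralized training, decentralized execution) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `SharedReplayBuffer`, `decentralized_act`, and `centralized_critic_input` are hypothetical, and the sketch assumes that sharing observations means the critic input is the concatenation of every agent's observation and action, as is common in CTDE methods. The SAC policy networks and entropy term are omitted.

```python
import random
from collections import deque
from typing import Callable, List, Sequence, Tuple

Obs = List[float]
Act = List[float]
# One joint transition: every agent's observation, action, reward, next observation.
Transition = Tuple[List[Obs], List[Act], List[float], List[Obs]]


class SharedReplayBuffer:
    """Hypothetical buffer that pools the experience of all agents."""

    def __init__(self, capacity: int) -> None:
        # A bounded deque discards the oldest transition once it is full.
        self._data: deque = deque(maxlen=capacity)

    def add(self, obs: List[Obs], acts: List[Act],
            rewards: List[float], next_obs: List[Obs]) -> None:
        self._data.append((obs, acts, rewards, next_obs))

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        # Sample without replacement, capped at the current buffer size.
        return rng.sample(list(self._data), min(batch_size, len(self._data)))

    def __len__(self) -> int:
        return len(self._data)


def decentralized_act(policies: Sequence[Callable[[Obs], Act]],
                      observations: Sequence[Obs]) -> List[Act]:
    """Execution: each agent's policy sees only its own observation."""
    return [policy(obs) for policy, obs in zip(policies, observations)]


def centralized_critic_input(observations: Sequence[Obs],
                             actions: Sequence[Act]) -> List[float]:
    """Training: assumed critic input, all observations then all actions."""
    flat: List[float] = []
    for obs in observations:
        flat.extend(obs)
    for act in actions:
        flat.extend(act)
    return flat


if __name__ == "__main__":
    # Two toy agents with fixed (untrained) stand-in policies.
    policies = [lambda o: [0.5 * o[0]], lambda o: [-o[1]]]
    obs = [[1.0, 2.0], [3.0, 4.0]]
    acts = decentralized_act(policies, obs)
    buffer = SharedReplayBuffer(capacity=100)
    buffer.add(obs, acts, [0.0, 0.0], obs)
    print(centralized_critic_input(obs, acts))
```

In this sketch the actors never read another agent's observation at execution time; only the training-side critic input and the pooled buffer use joint information, which is the separation the CTDE framework relies on.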