Deep Reinforcement Learning Algorithm of Multi-agent Based on SAC

XIAO Shuo; HUANG Zhen-zhen; ZHANG Guo-peng; YANG Shu-song; JIANG Hai-fei; LI Tian-xu

doi:10.12263/DZXB.20200243

PAPERS | Updated: 2025-12-08

- ACTA ELECTRONICA SINICA Vol. 49, Issue 9, Pages: 1675-1681 (2021)
- Author affiliations:
  1. Engineering Research Center of Mine Digitization, Ministry of Education, Xuzhou, Jiangsu 221000, China
  2. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221000, China
  3. Ningbo Rail Transit Group Co., Ltd., Ningbo, Zhejiang 315000, China
- DOI: 10.12263/DZXB.20200243
  CLC: TP391
- Received: 09 March 2020, Revised: 14 May 2020, Published: 25 September 2021

XIAO Shuo, HUANG Zhen-zhen, ZHANG Guo-peng, et al. Deep Reinforcement Learning Algorithm of Multi-agent Based on SAC[J]. ACTA ELECTRONICA SINICA, 2021, 49(09): 1675-1681. DOI: 10.12263/DZXB.20200243.

Abstract

In a multi-agent environment, the environment changes dynamically and each agent's decisions affect the other agents, which makes it difficult for single-agent deep reinforcement learning algorithms to remain stable. To adapt to multi-agent environments, this paper uses the centralized training with decentralized execution (CTDE) framework to improve the single-agent deep reinforcement learning algorithm soft actor-critic (SAC), introducing an agent communication mechanism to construct the multi-agent soft actor-critic (MASAC) algorithm. In MASAC, agents share observation information and historical experience, which effectively reduces the impact of environmental instability on the algorithm. Finally, the performance of MASAC is analyzed experimentally on cooperative tasks and on mixed cooperative-competitive tasks. The results show that MASAC is more stable than SAC in multi-agent environments.
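The abstract does not give MASAC's network layout or update rules, so the following is only a minimal sketch of the two ideas it names: agents sharing observations and historical experience under CTDE, and a SAC-style entropy-regularized target. Every name here (`JointTransition`, `SharedReplayBuffer`, `centralized_critic_input`, `soft_td_target`) is hypothetical, the twin-critic minimum is the common SAC formulation rather than something confirmed for this paper, and the default `gamma` and `alpha` values are illustrative assumptions.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Sequence


@dataclass
class JointTransition:
    """One environment step for all agents at once (hypothetical layout)."""
    obs: List[List[float]]       # obs[i] is agent i's local observation
    actions: List[List[float]]   # actions[i] is agent i's action
    rewards: List[float]         # rewards[i] is agent i's reward
    next_obs: List[List[float]]  # next_obs[i] is agent i's next observation
    done: bool


class SharedReplayBuffer:
    """A single experience buffer that all agents sample from in training."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._storage: Deque[JointTransition] = deque(maxlen=capacity)

    def add(self, transition: JointTransition) -> None:
        self._storage.append(transition)

    def sample(self, batch_size: int) -> List[JointTransition]:
        return random.sample(list(self._storage), batch_size)

    def __len__(self) -> int:
        return len(self._storage)


def centralized_critic_input(
    obs: Sequence[Sequence[float]], actions: Sequence[Sequence[float]]
) -> List[float]:
    """Flatten all agents' observations, then all actions, into one vector.

    During centralized training a critic can see this joint vector, while
    each actor still acts on its own local observation at execution time.
    """
    flat: List[float] = []
    for agent_obs in obs:
        flat.extend(agent_obs)
    for agent_action in actions:
        flat.extend(agent_action)
    return flat


def soft_td_target(
    reward: float,
    done: bool,
    next_q1: float,
    next_q2: float,
    next_log_prob: float,
    gamma: float = 0.99,  # assumed discount factor
    alpha: float = 0.2,   # assumed entropy temperature
) -> float:
    """SAC-style target: r + gamma * (min(Q1', Q2') - alpha * log pi(a'|o'))."""
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + (0.0 if done else gamma * soft_value)
```

In this sketch the critic's input grows with the number of agents, which is the usual price of conditioning on joint information; whether MASAC shares observations this way or through a separate communication channel is not stated in the abstract.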


