

School of Software, Northwestern Polytechnical University, Xi'an, Shaanxi 710065, China
Received: 17 September 2021
Revised: 8 March 2022
Published: 25 April 2022
ZHANG Tao, ZHANG Wen-tao, DAI Ling, et al. Integrated Modular Avionics System Reconstruction Method Based on Sequential Game Multi-agent Reinforcement Learning[J]. Acta Electronica Sinica, 2022, 50(4): 954-966. DOI: 10.12263/DZXB.20211268.
Dynamic reconfiguration is an efficient fault-tolerance approach for integrated modular avionics (IMA) systems. The reconfiguration blueprint defines the application migration and resource reconfiguration scheme to apply when the system fails, and it is the key to restoring system functions through reconfiguration at minimum cost. The difficulty lies in generating effective reconfiguration blueprints rapidly and automatically under complex, multi-level correlated failure modes. To solve this problem, this paper proposes an IMA system reconfiguration method based on sequential-game multi-agent reinforcement learning. The method introduces a sequential game model in which each application that must be migrated because of a failure is defined as an agent, and the order of play is determined by application priority. To handle the competition and cooperation among agents during the sequential game, the algorithm applies the policy gradient from reinforcement learning and optimizes the reconfiguration result by controlling the action-selection probabilities during interaction with the environment. A policy-gradient Monte Carlo tree search algorithm based on biased estimation is used to update the game strategy, which resolves the oscillation, poor convergence, and long computation time of the traditional policy gradient algorithm. Experimental results indicate that, compared with differential evolution and Q-learning methods, the proposed algorithm has significant advantages in convergence and efficiency.
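The abstract describes the method only at a high level. The Python sketch below illustrates two of its ingredients on a hypothetical instance with three applications and two surviving modules: agents (applications to be migrated) move one at a time in descending priority, so each later agent sees the capacity already consumed by earlier ones, and each agent's softmax policy is updated by policy-gradient ascent on a shared team reward. All names, costs, capacities and the penalty value are invented for illustration. The gradient here is computed exactly by enumerating every joint action, which is only feasible for tiny instances; this is a swapped-in simplification, not the paper's biased-estimation policy-gradient Monte Carlo tree search, which estimates the update from sampled interaction instead.

```python
import itertools
import math

# Hypothetical toy instance (not from the paper): applications that must be
# migrated after a failure, each with (priority, resource demand), plus the
# spare capacity left on the surviving modules and per-placement costs.
APPS = {"nav": (3, 4), "display": (2, 3), "comms": (1, 2)}
MODULES = ["M1", "M2"]
CAPACITY = {"M1": 5, "M2": 4}
COST = {("nav", "M1"): 1.0, ("nav", "M2"): 2.0,
        ("display", "M1"): 1.5, ("display", "M2"): 1.0,
        ("comms", "M1"): 0.5, ("comms", "M2"): 1.0}
PENALTY = 10.0  # assumed penalty for a placement that exceeds capacity

# Sequential game: agents move in descending priority order.
ORDER = sorted(APPS, key=lambda app: -APPS[app][0])


def softmax(prefs):
    """Turn a dict of action preferences into action probabilities."""
    top = max(prefs.values())
    exps = {m: math.exp(v - top) for m, v in prefs.items()}
    total = sum(exps.values())
    return {m: e / total for m, e in exps.items()}


def team_reward(joint):
    """Shared reward of one blueprint; joint[i] is the module chosen by ORDER[i].
    Later movers see the capacity consumed by earlier, higher-priority movers."""
    remaining = dict(CAPACITY)
    reward = 0.0
    for app, module in zip(ORDER, joint):
        demand = APPS[app][1]
        if remaining[module] >= demand:
            remaining[module] -= demand
            reward -= COST[(app, module)]
        else:
            reward -= PENALTY
    return reward


def policy_gradient_step(theta, lr):
    """One ascent step on J(theta) = E[team_reward], using the exact
    likelihood-ratio gradient obtained by enumerating all joint actions."""
    probs = {app: softmax(theta[app]) for app in ORDER}
    grad = {app: {m: 0.0 for m in MODULES} for app in ORDER}
    for joint in itertools.product(MODULES, repeat=len(ORDER)):
        p_joint = math.prod(probs[app][m] for app, m in zip(ORDER, joint))
        r = team_reward(joint)
        for app, chosen in zip(ORDER, joint):
            for m in MODULES:
                # d log pi(chosen) / d theta[m] = 1[m == chosen] - pi(m)
                grad[app][m] += p_joint * r * (float(m == chosen) - probs[app][m])
    for app in ORDER:
        for m in MODULES:
            theta[app][m] += lr * grad[app][m]


def train(steps=200, lr=0.5):
    """Start from uniform policies and run repeated policy-gradient steps."""
    theta = {app: {m: 0.0 for m in MODULES} for app in ORDER}
    for _ in range(steps):
        policy_gradient_step(theta, lr)
    return theta


theta = train()
blueprint = {app: max(theta[app], key=theta[app].get) for app in ORDER}
print(blueprint, team_reward(tuple(blueprint[a] for a in ORDER)))
```

On this toy instance the greedy blueprint converges to nav→M2, display→M1, comms→M1 with reward -4.0. The highest-priority agent gives up its cheapest module (M1) so that both lower-priority agents still fit, which is the kind of cooperation-under-competition the sequential formulation has to capture; a per-agent greedy choice (nav→M1) leaves comms without capacity.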