电子学报 ›› 2022, Vol. 50 ›› Issue (4): 954-966.DOI: 10.12263/DZXB.20211268
张涛, 张文涛, 代凌, 陈婧怡, 王丽, 魏倩茹
收稿日期:
2021-09-17
修回日期:
2022-03-08
出版日期:
2022-04-25
发布日期:
2022-04-25
作者简介:
基金资助:
ZHANG Tao, ZHANG Wen-tao, DAI Ling, CHEN Jing-yi, WANG Li, WEI Qian-ru
Received:
2021-09-17
Revised:
2022-03-08
Online:
2022-04-25
Published:
2022-04-25
摘要:
动态重构是一种有效的综合模块化航空电子系统故障容错方法.重构蓝图定义了系统故障环境下的应用迁移与资源重配置方案,是以最小代价重构恢复系统功能的关键.在复杂多级关联故障模式下,如何快速自动生成有效重构蓝图是其难点.针对该问题,本文提出一种基于序贯博弈多智能体强化学习的综合模块化航空电子系统重构方法.该方法引入序贯博弈模型,将因受故障影响而需要迁移重构的应用软件定义为博弈中的智能体,根据应用软件优先级确定序贯博弈的顺序.针对序贯博弈过程中多智能体间竞争与合作的问题,算法使用强化学习中的策略梯度,通过控制与环境交互中的动作选择概率来优化重构效果.应用基于有偏估计的策略梯度蒙特卡洛树搜索算法更新博弈策略,解决了传统策略梯度算法震荡难收敛、计算耗时长问题.实验结果表明,与差分进化、Q学习等方法相比,所提算法的优化性能和稳定性均具有显著优势.
中图分类号:
张涛, 张文涛, 代凌, 陈婧怡, 王丽, 魏倩茹. 基于序贯博弈多智能体强化学习的综合模块化航空电子系统重构方法[J]. 电子学报, 2022, 50(4): 954-966.
ZHANG Tao, ZHANG Wen-tao, DAI Ling, CHEN Jing-yi, WANG Li, WEI Qian-ru. Integrated Modular Avionics System Reconstruction Method Based on Sequential Game Multi-agent Reinforcement Learning[J]. Acta Electronica Sinica, 2022, 50(4): 954-966.
应用软件 Id | 截止时间 /ms | 最坏情况执行时间/ms | 内存 /kb | 优先级 |
---|---|---|---|---|
M1 | 15 | 2 | 20 | 3 |
M2 | 40 | 6 | 90 | 4 |
M3 | 40 | 8 | 90 | 4 |
M4 | 30 | 5 | 80 | 5 |
M5 | 20 | 4 | 80 | 2 |
M6 | 40 | 6 | 70 | 2 |
M7 | 30 | 4 | 70 | 4 |
M8 | 20 | 3 | 60 | 2 |
M9 | 20 | 2 | 60 | 3 |
M10 | 30 | 4 | 50 | 4 |
M11 | 15 | 2 | 50 | 3 |
M12 | 30 | 2 | 40 | 1 |
M13 | 40 | 3 | 40 | 3 |
M14 | 30 | 4 | 30 | 2 |
M15 | 15 | 1 | 30 | 4 |
表1 应用软件属性数据
应用软件 Id | 截止时间 /ms | 最坏情况执行时间/ms | 内存 /kb | 优先级 |
---|---|---|---|---|
M1 | 15 | 2 | 20 | 3 |
M2 | 40 | 6 | 90 | 4 |
M3 | 40 | 8 | 90 | 4 |
M4 | 30 | 5 | 80 | 5 |
M5 | 20 | 4 | 80 | 2 |
M6 | 40 | 6 | 70 | 2 |
M7 | 30 | 4 | 70 | 4 |
M8 | 20 | 3 | 60 | 2 |
M9 | 20 | 2 | 60 | 3 |
M10 | 30 | 4 | 50 | 4 |
M11 | 15 | 2 | 50 | 3 |
M12 | 30 | 2 | 40 | 1 |
M13 | 40 | 3 | 40 | 3 |
M14 | 30 | 4 | 30 | 2 |
M15 | 15 | 1 | 30 | 4 |
硬件资源 | 主框架时间/ms | 分区 | 内存 /kb | 执行时间/ms | 应用软件 |
---|---|---|---|---|---|
C1 | 50 | P1 | 128 | 15 | M15 |
P2 | 256 | 20 | M2 | ||
P3 | 128 | 15 | M13 | ||
C2 | 50 | P1 | 512 | 50 | M3 M4 M12 |
C3 | 50 | P1 | 256 | 30 | M6 M7 |
P2 | 256 | 20 | M8 | ||
C4 | 50 | P1 | 128 | 15 | M9 |
P2 | 128 | 10 | / | ||
P3 | 256 | 25 | M10 M14 | ||
C5 | 50 | P1 | 256 | 30 | M1 M5 |
P2 | 256 | 20 | M11 |
表2 IMA初始E0配置信息
硬件资源 | 主框架时间/ms | 分区 | 内存 /kb | 执行时间/ms | 应用软件 |
---|---|---|---|---|---|
C1 | 50 | P1 | 128 | 15 | M15 |
P2 | 256 | 20 | M2 | ||
P3 | 128 | 15 | M13 | ||
C2 | 50 | P1 | 512 | 50 | M3 M4 M12 |
C3 | 50 | P1 | 256 | 30 | M6 M7 |
P2 | 256 | 20 | M8 | ||
C4 | 50 | P1 | 128 | 15 | M9 |
P2 | 128 | 10 | / | ||
P3 | 256 | 25 | M10 M14 | ||
C5 | 50 | P1 | 256 | 30 | M1 M5 |
P2 | 256 | 20 | M11 |
参数 | 值 |
---|---|
系统应用软件总数 | 15 |
故障应用软件数 | 3,3,6,6,6,6,9,9 |
可用分区数 | 8,8,7,6,8,6,5,3 |
最大博弈次数 | |
有偏长度 | |
博弈竞争学习率 | 0.01 |
博弈合作学习率 | 0.9 |
分区负载参数 | 0.5,0.5 |
负载约束 | 0.8,0.8 |
评价指标参数 | 0.1,0.35,0.35,0.2 |
表3 实验参数设置
参数 | 值 |
---|---|
系统应用软件总数 | 15 |
故障应用软件数 | 3,3,6,6,6,6,9,9 |
可用分区数 | 8,8,7,6,8,6,5,3 |
最大博弈次数 | |
有偏长度 | |
博弈竞争学习率 | 0.01 |
博弈合作学习率 | 0.9 |
分区负载参数 | 0.5,0.5 |
负载约束 | 0.8,0.8 |
评价指标参数 | 0.1,0.35,0.35,0.2 |
故障环境 | 算法 | 最大值平均值 | 算法时延/(ms/次) | 最大值收敛时间/s |
---|---|---|---|---|
E1 | BE-PGMCTS | 0.957 2 | 4.93 | 2.474 8 |
PGMCTS | 0.957 2 | 18.60 | 8.686 2 | |
Qlearn | 0.955 2 | 3.24 | 16.251 0 | |
DE | 0.957 2 | 51.40 | 17.294 0 | |
E2 | BE-PGMCTS | 0.970 6 | 3.53 | 2.308 6 |
PGMCTS | 0.970 6 | 9.05 | 10.887 0 | |
Qlearn | 0.970 6 | 3.83 | 6.736 9 | |
DE | 0.970 6 | 61.60 | 45.707 0 | |
E3 | BE-PGMCTS | 0.956 2 | 3.46 | 2.283 6 |
PGMCTS | 0.956 2 | 8.94 | 3.272 0 | |
Qlearn | 0.955 8 | 3.31 | 13.518 0 | |
DE | 0.956 2 | 48.40 | 8.905 6 | |
E4 | BE-PGMCTS | 0.965 3 | 3.22 | 1.120 6 |
PGMCTS | 0.965 3 | 7.10 | 2.307 5 | |
Qlearn | 0.965 2 | 2.96 | 13.897 0 | |
DE | 0.965 3 | 36.60 | 14.274 0 | |
E5 | BE-PGMCTS | 0.954 1 | 5.87 | 8.893 1 |
PGMCTS | 0.954 1 | 38.10 | 74.486 0 | |
Qlearn | 0.953 3 | 4.32 | 117.930 0 | |
DE | 0.954 1 | 69.10 | 84.762 0 | |
E6 | BE-PGMCTS | 0.961 8 | 4.71 | 27.954 0 |
PGMCTS | 0.961 8 | 20.30 | 123.780 0 | |
Qlearn | 0.960 2 | 4.14 | 41.806 0 | |
DE | 0.961 8 | 50.70 | 153.690 0 | |
E7 | BE-PGMCTS | 0.892 4 | 6.42 | 83.556 0 |
PGMCTS | 0.892 4 | 32.00 | 416.480 0 | |
Qlearn | 0.866 4 | 4.30 | 105.380 0 | |
DE | 0.882 3 | 42.70 | 108.500 0 | |
E8 | BE-PGMCTS | 0.894 9 | 5.79 | 26.512 0 |
PGMCTS | 0.894 9 | 18.30 | 66.411 0 | |
Qlearn | 0.880 1 | 3.92 | 67.134 0 | |
DE | 0.885 8 | 23.10 | 80.873 0 |
表4 算法性能对比
故障环境 | 算法 | 最大值平均值 | 算法时延/(ms/次) | 最大值收敛时间/s |
---|---|---|---|---|
E1 | BE-PGMCTS | 0.957 2 | 4.93 | 2.474 8 |
PGMCTS | 0.957 2 | 18.60 | 8.686 2 | |
Qlearn | 0.955 2 | 3.24 | 16.251 0 | |
DE | 0.957 2 | 51.40 | 17.294 0 | |
E2 | BE-PGMCTS | 0.970 6 | 3.53 | 2.308 6 |
PGMCTS | 0.970 6 | 9.05 | 10.887 0 | |
Qlearn | 0.970 6 | 3.83 | 6.736 9 | |
DE | 0.970 6 | 61.60 | 45.707 0 | |
E3 | BE-PGMCTS | 0.956 2 | 3.46 | 2.283 6 |
PGMCTS | 0.956 2 | 8.94 | 3.272 0 | |
Qlearn | 0.955 8 | 3.31 | 13.518 0 | |
DE | 0.956 2 | 48.40 | 8.905 6 | |
E4 | BE-PGMCTS | 0.965 3 | 3.22 | 1.120 6 |
PGMCTS | 0.965 3 | 7.10 | 2.307 5 | |
Qlearn | 0.965 2 | 2.96 | 13.897 0 | |
DE | 0.965 3 | 36.60 | 14.274 0 | |
E5 | BE-PGMCTS | 0.954 1 | 5.87 | 8.893 1 |
PGMCTS | 0.954 1 | 38.10 | 74.486 0 | |
Qlearn | 0.953 3 | 4.32 | 117.930 0 | |
DE | 0.954 1 | 69.10 | 84.762 0 | |
E6 | BE-PGMCTS | 0.961 8 | 4.71 | 27.954 0 |
PGMCTS | 0.961 8 | 20.30 | 123.780 0 | |
Qlearn | 0.960 2 | 4.14 | 41.806 0 | |
DE | 0.961 8 | 50.70 | 153.690 0 | |
E7 | BE-PGMCTS | 0.892 4 | 6.42 | 83.556 0 |
PGMCTS | 0.892 4 | 32.00 | 416.480 0 | |
Qlearn | 0.866 4 | 4.30 | 105.380 0 | |
DE | 0.882 3 | 42.70 | 108.500 0 | |
E8 | BE-PGMCTS | 0.894 9 | 5.79 | 26.512 0 |
PGMCTS | 0.894 9 | 18.30 | 66.411 0 | |
Qlearn | 0.880 1 | 3.92 | 67.134 0 | |
DE | 0.885 8 | 23.10 | 80.873 0 |
1 | PARR G R, EDWARDS R. Integrated modular avionics[J]. Air & Space Europe, 1999, 1(2): 72-75. |
2 | 丁全心. 综合模块化航空电子系统标准述评[J]. 电光与控制, 2013, 20(6): 1-3. |
DING X Q. Remarks on standards of integrated module avionic system[J]. Electronics Optics & Control, 2013,20(6): 1-3. (in Chinese) | |
3 | PARTON D. Blueprint for the future[J]. Mental Health Today, 2011, 63(2): 10. |
4 | JOLLIFFE G, NICHOLSON M. Exploring the Possibilities Towards a Preliminary Safety Case for IMA Blueprints[M]. London, UK: Springer, 2005: 8.1-8.43. |
5 | 王震,朱剑锋. 基于在线加载分区机制的重构方案的设计与实现[J]. 航空电子技术, 2016, 47(1): 6. |
WANG Z, ZHU J F. Design and implementation of a reconfiguration blueprint based on online-loaded partition mechanism[J]. Avionics Technology, 2016, 47(1): 6. (in Chinese) | |
6 | BRIAO E W, BARCELOS D, WRONSKI F, et al. Impact of task migration in NoC-based MPSoCs for soft real-time applications[C]//2007 IFIP International Conference on Very Large Scale Integration. Virtual Conference: IEEE, 2007: 296-299. |
7 | ANNIGHOEFER B, NIL C, SEBALD J, et al. Structured and symmetric IMA architecture optimization: Use case Ariane launcher[C]//IEEE/AIAA Digital Avionics Systems Conference. New York: IEEE, 2015: 6B3-1-6B3-14. |
8 | 刘若辰, 李建霞, 刘静, 等. 动态多目标优化研究综述[J]. 计算机学报, 2020, 43(7): 1246-1278. |
LIU R C, LI J X, LIU J, et al. A survey on dynamic multi-objective optimization[J]. Chinese Journal of Computers, 2020, 43(7): 1246-1278. (in Chinese) | |
9 | CALABOUGH J. Software configuration—an NP-complete problem[J]. ACM Sigmis Database, 1988, 19(2):29-34. |
10 | HOU X Y, GAO H B, DENG Z Q, et al. Path planning of lunar rover group based on theory of dynamic programming and multi-objective optimization[C]//IEEE Conference on Industrial Electronics & Applications. Virtual Conference: IEEE, 2007: 1308-1313 |
11 | 赵玉芳, 唐立新. 极小化总完工时间的单机连续型批调度问题[J]. 电子学报, 2008, 36(2): 367-370. |
ZHAO Y F, TANG L X. Scheduling a single continuous batch processing machine to minimize total completion time[J]. Acta Electronica Sinica, 2008, 36(2): 367-370. (in Chinese) | |
12 | Z̆ILINSKAS A, ZHIGLJAVSKY A. Branch and probability bound methods in multi-objective optimization[J]. Optimization Letters, 2016, 10(2): 341-353. |
13 | SINGH H K, RAY T, SMITH W. C-PSA: Constrained pareto simulated annealing for constrained multi-objective optimization[J]. Information Sciences, 2010, 180(13): 2499-2513. |
14 | ZHANG J, SHANG Y, GAO R, et al. An improved multi-objective adaptive niche genetic algorithm based on pareto front[C]//2009 IEEE International Advance Computing Conference. Virtual Conference: IEEE, 2009: 300-304. |
15 | LEI R, CHENG Y. A pareto-based differential evolution algorithm for multi-objective optimization problems[C]//2010 Chinese Control and Decision Conference. New York: IEEE, 2010: 1608- 1613. |
16 | ZHANG T, CHEN J, LV D, et al. Automatic generation of reconfiguration blueprints for ima systems using reinforcement learning[J]. IEEE Embedded Systems Letters, 2021, 13(4): 182-185 |
17 | 罗庆, 张涛, 单鹏, 等. 基于改进Q学习的IMA系统重构蓝图生成方法[J].航空学报, 2021, 42(8): 525792-525792. |
LUO Q, ZHANG T, SHAN P, et al. Generating reconfiguration blueprints for IMA systems based on improved Q-learning[J].Acta Aeronautica et Astronautica Sinica, 2021, 42(8): 525792-525792. (in Chinese) | |
18 | HUANG R T, YU T Y, DING Z H, et al. Deep Reinforcement Learning[M]. Singapore: Springer, 2020: 161-212. |
19 | HE Q, HOU X.WD3: Taming the estimation bias in deep reinforcement learning[C]//2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI). Virtual Conference: IEEE, 2020: 391-398. |
20 | CHASLOT J B, WINANDS M, HERIK H, et al. Progressive Strategies for Monte-Carlo Tree Search[J]. New Mathematics & Natural Computation, 2008, 4(3): 343-357. |
21 | PATTERSON W, WINSTON-PROCTOR C E. Game Theory[M]. Part of the Springer Undergraduate Mathematics Series book series(SUMS). London:Springer, 2020: 87-106. |
22 | MEZZETTI C, RENOU L. implementation in mixed nash equilibrium[J]. Warwick Economics Research Paper, 2012, 147(6): 2357-2375. |
23 | KRISHNAMURTHY V, ABAD F. Gradient Based Policy Optimization of Constrained Markov Decision Processes[M]. Singapore: World Scientific, 2012: 503-547. |
24 | POURMOHSENI B, WILDERMANN S, GLA M, et al. Hard real-time application mapping reconfiguration for NoC-based many-core systems[J]. Real-Time Systems, 2019, 55(2): 433-469. |
25 | SUTTON R S, MCALLESTER D, SINGH S, et al. Policy gradient methods for reinforcement learning with function approximation[C]//Submitted to Advances in Neural Information Processing Systems(NIPS). Vertual Conference: The MIT Press, 1999: 1057-1063. |
26 | SUTTON R S, BARTO A G, et al. Reinforcement Learning: An Introduction Second edition[M]. London: The MIT Press, 2015: 265-278. |
27 | SOEMERS D, PIETTE R, STEPHENSON M, et al. Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates[C]//2019 IEEE Conference on Games(CoG). Virtual Conference: IEEE, 2019: 1-8. |
[1] | 朱旭, 杨健, 杨涛. 中心频率与输出端口可重构的新型滤波分支线电桥[J]. 电子学报, 2022, 50(6): 1331-1335. |
[2] | 吕明久, 马建朝, 韦旭, 陈文峰, 杨军, 马晓岩. 基于矩阵化原子范数的高频雷达距离-多普勒二维估计方法[J]. 电子学报, 2022, 50(5): 1150-1158. |
[3] | 杨晓城, 杨真乙, 阎敬业, 武林, 蒋明峰. 基于拉普拉斯算子的综合孔径辐射计成像算法[J]. 电子学报, 2022, 50(2): 339-345. |
[4] | 裴翰奇, 杨春玲, 魏志超, 曹燕. 基于SPL迭代思想的图像压缩感知重构神经网络[J]. 电子学报, 2021, 49(6): 1195-1203. |
[5] | 禤韵怡, 杨春玲. 基于帧间组稀疏的两阶段递归增强视频压缩感知重构网络[J]. 电子学报, 2021, 49(3): 435-442. |
[6] | 江家宝, 张晓峰, 沈云付, 欧阳山, 周时强, 彭俊杰, 刘跃军, 金翊. 三值光学计算机中SJ-MSD加法器的设计与实现[J]. 电子学报, 2021, 49(2): 275-285. |
[7] | 王丽, 杜鹏程, 许一鸣, 李必信. 基于分层架构模式识别的软件架构重构技术[J]. 电子学报, 2021, 49(1): 201-208. |
[8] | 孙兵, 吴晨曦, 阮怀林, 叶文强, 苏宝桐. 阵元失效条件下的高精度DOA估计方法[J]. 电子学报, 2020, 48(9): 1688-1694. |
[9] | 黄志清, 曲志伟, 张吉, 张严心, 田锐. 基于深度强化学习的端到端无人驾驶决策[J]. 电子学报, 2020, 48(9): 1711-1719. |
[10] | 李永伟, 谢文冲. 互耦效应下端射阵机载雷达STAP方法研究[J]. 电子学报, 2020, 48(6): 1091-1098. |
[11] | 董道广, 芮国胜, 田文飚, 张洋, 张海波. 基于稀疏贝叶斯学习的时域流信号鲁棒动态压缩感知算法[J]. 电子学报, 2020, 48(5): 990-996. |
[12] | 马彬, 王宏明, 谢显中. 基于二项分布改进的宽带压缩频谱检测方案[J]. 电子学报, 2020, 48(2): 243-248. |
[13] | 袁子晗, 蒋明峰, 李杨, 支明豪, 朱志军. 基于DAWGAN-GP的磁共振图像重构方法研究[J]. 电子学报, 2020, 48(10): 1883-1890. |
[14] | 吴霞, 吴晓军, 史素真, 张其进, 张玉梅. 隐相空间的DUPSO-RPSOVF语音预测模型研究[J]. 电子学报, 2019, 47(9): 1875-1882. |
[15] | 杨高明, 朱海明, 方贤进, 苏树智. 局部差分隐私约束的关联属性不变后随机响应扰动[J]. 电子学报, 2019, 47(5): 1079-1085. |
阅读次数 | ||||||
全文 |
|
|||||
摘要 |
|
|||||