电子学报 ›› 2022, Vol. 50 ›› Issue (4): 954-966.DOI: 10.12263/DZXB.20211268
所属专题: 机器学习交叉融合创新
张涛, 张文涛, 代凌, 陈婧怡, 王丽, 魏倩茹
收稿日期:
2021-09-17
修回日期:
2022-03-08
出版日期:
2022-04-25
作者简介:
基金资助:
ZHANG Tao, ZHANG Wen-tao, DAI Ling, CHEN Jing-yi, WANG Li, WEI Qian-ru
Received:
2021-09-17
Revised:
2022-03-08
Online:
2022-04-25
Published:
2022-04-25
Supported by:
摘要:
动态重构是一种有效的综合模块化航空电子系统故障容错方法.重构蓝图定义了系统故障环境下的应用迁移与资源重配置方案,是以最小代价重构恢复系统功能的关键.在复杂多级关联故障模式下,如何快速自动生成有效重构蓝图是其难点.针对该问题,本文提出一种基于序贯博弈多智能体强化学习的综合模块化航空电子系统重构方法.该方法引入序贯博弈模型,将因受故障影响而需要迁移重构的应用软件定义为博弈中的智能体,根据应用软件优先级确定序贯博弈的顺序.针对序贯博弈过程中多智能体间竞争与合作的问题,算法使用强化学习中的策略梯度,通过控制与环境交互中的动作选择概率来优化重构效果.应用基于有偏估计的策略梯度蒙特卡洛树搜索算法更新博弈策略,解决了传统策略梯度算法震荡难收敛、计算耗时长问题.实验结果表明,与差分进化、Q学习等方法相比,所提算法的优化性能和稳定性均具有显著优势.
中图分类号:
张涛, 张文涛, 代凌, 等. 基于序贯博弈多智能体强化学习的综合模块化航空电子系统重构方法[J]. 电子学报, 2022, 50(4): 954-966.
Tao ZHANG, Wen-tao ZHANG, Ling DAI, et al. Integrated Modular Avionics System Reconstruction Method Based on Sequential Game Multi-agent Reinforcement Learning[J]. Acta Electronica Sinica, 2022, 50(4): 954-966.
应用软件 Id | 截止时间 /ms | 最坏情况执行时间/ms | 内存 /kb | 优先级 |
---|---|---|---|---|
M1 | 15 | 2 | 20 | 3 |
M2 | 40 | 6 | 90 | 4 |
M3 | 40 | 8 | 90 | 4 |
M4 | 30 | 5 | 80 | 5 |
M5 | 20 | 4 | 80 | 2 |
M6 | 40 | 6 | 70 | 2 |
M7 | 30 | 4 | 70 | 4 |
M8 | 20 | 3 | 60 | 2 |
M9 | 20 | 2 | 60 | 3 |
M10 | 30 | 4 | 50 | 4 |
M11 | 15 | 2 | 50 | 3 |
M12 | 30 | 2 | 40 | 1 |
M13 | 40 | 3 | 40 | 3 |
M14 | 30 | 4 | 30 | 2 |
M15 | 15 | 1 | 30 | 4 |
表1 应用软件属性数据
应用软件 Id | 截止时间 /ms | 最坏情况执行时间/ms | 内存 /kb | 优先级 |
---|---|---|---|---|
M1 | 15 | 2 | 20 | 3 |
M2 | 40 | 6 | 90 | 4 |
M3 | 40 | 8 | 90 | 4 |
M4 | 30 | 5 | 80 | 5 |
M5 | 20 | 4 | 80 | 2 |
M6 | 40 | 6 | 70 | 2 |
M7 | 30 | 4 | 70 | 4 |
M8 | 20 | 3 | 60 | 2 |
M9 | 20 | 2 | 60 | 3 |
M10 | 30 | 4 | 50 | 4 |
M11 | 15 | 2 | 50 | 3 |
M12 | 30 | 2 | 40 | 1 |
M13 | 40 | 3 | 40 | 3 |
M14 | 30 | 4 | 30 | 2 |
M15 | 15 | 1 | 30 | 4 |
硬件资源 | 主框架时间/ms | 分区 | 内存 /kb | 执行时间/ms | 应用软件 |
---|---|---|---|---|---|
C1 | 50 | P1 | 128 | 15 | M15 |
P2 | 256 | 20 | M2 | ||
P3 | 128 | 15 | M13 | ||
C2 | 50 | P1 | 512 | 50 | M3 M4 M12 |
C3 | 50 | P1 | 256 | 30 | M6 M7 |
P2 | 256 | 20 | M8 | ||
C4 | 50 | P1 | 128 | 15 | M9 |
P2 | 128 | 10 | / | ||
P3 | 256 | 25 | M10 M14 | ||
C5 | 50 | P1 | 256 | 30 | M1 M5 |
P2 | 256 | 20 | M11 |
表2 IMA初始E0配置信息
硬件资源 | 主框架时间/ms | 分区 | 内存 /kb | 执行时间/ms | 应用软件 |
---|---|---|---|---|---|
C1 | 50 | P1 | 128 | 15 | M15 |
P2 | 256 | 20 | M2 | ||
P3 | 128 | 15 | M13 | ||
C2 | 50 | P1 | 512 | 50 | M3 M4 M12 |
C3 | 50 | P1 | 256 | 30 | M6 M7 |
P2 | 256 | 20 | M8 | ||
C4 | 50 | P1 | 128 | 15 | M9 |
P2 | 128 | 10 | / | ||
P3 | 256 | 25 | M10 M14 | ||
C5 | 50 | P1 | 256 | 30 | M1 M5 |
P2 | 256 | 20 | M11 |
参数 | 值 |
---|---|
系统应用软件总数 | 15 |
故障应用软件数 | 3,3,6,6,6,6,9,9 |
可用分区数 | 8,8,7,6,8,6,5,3 |
最大博弈次数 | |
有偏长度 | |
博弈竞争学习率 | 0.01 |
博弈合作学习率 | 0.9 |
分区负载参数 | 0.5,0.5 |
负载约束 | 0.8,0.8 |
评价指标参数 | 0.1,0.35,0.35,0.2 |
表3 实验参数设置
参数 | 值 |
---|---|
系统应用软件总数 | 15 |
故障应用软件数 | 3,3,6,6,6,6,9,9 |
可用分区数 | 8,8,7,6,8,6,5,3 |
最大博弈次数 | |
有偏长度 | |
博弈竞争学习率 | 0.01 |
博弈合作学习率 | 0.9 |
分区负载参数 | 0.5,0.5 |
负载约束 | 0.8,0.8 |
评价指标参数 | 0.1,0.35,0.35,0.2 |
故障环境 | 算法 | 最大值平均值 | 算法时延/(ms/次) | 最大值收敛时间/s |
---|---|---|---|---|
E1 | BE-PGMCTS | 0.957 2 | 4.93 | 2.474 8 |
PGMCTS | 0.957 2 | 18.60 | 8.686 2 | |
Qlearn | 0.955 2 | 3.24 | 16.251 0 | |
DE | 0.957 2 | 51.40 | 17.294 0 | |
E2 | BE-PGMCTS | 0.970 6 | 3.53 | 2.308 6 |
PGMCTS | 0.970 6 | 9.05 | 10.887 0 | |
Qlearn | 0.970 6 | 3.83 | 6.736 9 | |
DE | 0.970 6 | 61.60 | 45.707 0 | |
E3 | BE-PGMCTS | 0.956 2 | 3.46 | 2.283 6 |
PGMCTS | 0.956 2 | 8.94 | 3.272 0 | |
Qlearn | 0.955 8 | 3.31 | 13.518 0 | |
DE | 0.956 2 | 48.40 | 8.905 6 | |
E4 | BE-PGMCTS | 0.965 3 | 3.22 | 1.120 6 |
PGMCTS | 0.965 3 | 7.10 | 2.307 5 | |
Qlearn | 0.965 2 | 2.96 | 13.897 0 | |
DE | 0.965 3 | 36.60 | 14.274 0 | |
E5 | BE-PGMCTS | 0.954 1 | 5.87 | 8.893 1 |
PGMCTS | 0.954 1 | 38.10 | 74.486 0 | |
Qlearn | 0.953 3 | 4.32 | 117.930 0 | |
DE | 0.954 1 | 69.10 | 84.762 0 | |
E6 | BE-PGMCTS | 0.961 8 | 4.71 | 27.954 0 |
PGMCTS | 0.961 8 | 20.30 | 123.780 0 | |
Qlearn | 0.960 2 | 4.14 | 41.806 0 | |
DE | 0.961 8 | 50.70 | 153.690 0 | |
E7 | BE-PGMCTS | 0.892 4 | 6.42 | 83.556 0 |
PGMCTS | 0.892 4 | 32.00 | 416.480 0 | |
Qlearn | 0.866 4 | 4.30 | 105.380 0 | |
DE | 0.882 3 | 42.70 | 108.500 0 | |
E8 | BE-PGMCTS | 0.894 9 | 5.79 | 26.512 0 |
PGMCTS | 0.894 9 | 18.30 | 66.411 0 | |
Qlearn | 0.880 1 | 3.92 | 67.134 0 | |
DE | 0.885 8 | 23.10 | 80.873 0 |
表4 算法性能对比
故障环境 | 算法 | 最大值平均值 | 算法时延/(ms/次) | 最大值收敛时间/s |
---|---|---|---|---|
E1 | BE-PGMCTS | 0.957 2 | 4.93 | 2.474 8 |
PGMCTS | 0.957 2 | 18.60 | 8.686 2 | |
Qlearn | 0.955 2 | 3.24 | 16.251 0 | |
DE | 0.957 2 | 51.40 | 17.294 0 | |
E2 | BE-PGMCTS | 0.970 6 | 3.53 | 2.308 6 |
PGMCTS | 0.970 6 | 9.05 | 10.887 0 | |
Qlearn | 0.970 6 | 3.83 | 6.736 9 | |
DE | 0.970 6 | 61.60 | 45.707 0 | |
E3 | BE-PGMCTS | 0.956 2 | 3.46 | 2.283 6 |
PGMCTS | 0.956 2 | 8.94 | 3.272 0 | |
Qlearn | 0.955 8 | 3.31 | 13.518 0 | |
DE | 0.956 2 | 48.40 | 8.905 6 | |
E4 | BE-PGMCTS | 0.965 3 | 3.22 | 1.120 6 |
PGMCTS | 0.965 3 | 7.10 | 2.307 5 | |
Qlearn | 0.965 2 | 2.96 | 13.897 0 | |
DE | 0.965 3 | 36.60 | 14.274 0 | |
E5 | BE-PGMCTS | 0.954 1 | 5.87 | 8.893 1 |
PGMCTS | 0.954 1 | 38.10 | 74.486 0 | |
Qlearn | 0.953 3 | 4.32 | 117.930 0 | |
DE | 0.954 1 | 69.10 | 84.762 0 | |
E6 | BE-PGMCTS | 0.961 8 | 4.71 | 27.954 0 |
PGMCTS | 0.961 8 | 20.30 | 123.780 0 | |
Qlearn | 0.960 2 | 4.14 | 41.806 0 | |
DE | 0.961 8 | 50.70 | 153.690 0 | |
E7 | BE-PGMCTS | 0.892 4 | 6.42 | 83.556 0 |
PGMCTS | 0.892 4 | 32.00 | 416.480 0 | |
Qlearn | 0.866 4 | 4.30 | 105.380 0 | |
DE | 0.882 3 | 42.70 | 108.500 0 | |
E8 | BE-PGMCTS | 0.894 9 | 5.79 | 26.512 0 |
PGMCTS | 0.894 9 | 18.30 | 66.411 0 | |
Qlearn | 0.880 1 | 3.92 | 67.134 0 | |
DE | 0.885 8 | 23.10 | 80.873 0 |
1 | PARRG R, EDWARDSR. Integrated modular avionics[J]. Air & Space Europe, 1999, 1(2): 72-75. |
2 | 丁全心. 综合模块化航空电子系统标准述评[J]. 电光与控制, 2013, 20(6): 1-3. |
DINGX Q. Remarks on standards of integrated module avionic system[J]. Electronics Optics & Control, 2013,20(6): 1-3. (in Chinese) | |
3 | PARTOND. Blueprint for the future[J]. Mental Health Today, 2011, 63(2): 10. |
4 | JOLLIFFEG, NICHOLSONM. Exploring the Possibilities Towards a Preliminary Safety Case for IMA Blueprints[M]. London, UK: Springer, 2005: 8.1-8.43. |
5 | 王震,朱剑锋. 基于在线加载分区机制的重构方案的设计与实现[J]. 航空电子技术, 2016, 47(1): 6. |
WANGZ, ZHUJ F. Design and implementation of a reconfiguration blueprint based on online-loaded partition mechanism[J]. Avionics Technology, 2016, 47(1): 6. (in Chinese) | |
6 | BRIAOE W, BARCELOSD, WRONSKIF, et al. Impact of task migration in NoC-based MPSoCs for soft real-time applications[C]//2007 IFIP International Conference on Very Large Scale Integration. Virtual Conference: IEEE, 2007: 296-299. |
7 | ANNIGHOEFERB, NIL C, SEBALDJ, et al. Structured and symmetric IMA architecture optimization: Use case Ariane launcher[C]//IEEE/AIAA Digital Avionics Systems Conference. New York: IEEE, 2015: 6B3-1-6B3-14. |
8 | 刘若辰, 李建霞, 刘静, 等. 动态多目标优化研究综述[J]. 计算机学报, 2020, 43(7): 1246-1278. |
LIUR C, LIJ X, LIUJ, et al. A survey on dynamic multi-objective optimization[J]. Chinese Journal of Computers, 2020, 43(7): 1246-1278. (in Chinese) | |
9 | CALABOUGHJ. Software configuration—an NP-complete problem[J]. ACM Sigmis Database, 1988, 19(2):29-34. |
10 | HOUX Y, GAOH B, DENGZ Q, et al. Path planning of lunar rover group based on theory of dynamic programming and multi-objective optimization[C]//IEEE Conference on Industrial Electronics & Applications. Virtual Conference: IEEE, 2007: 1308-1313 |
11 | 赵玉芳, 唐立新. 极小化总完工时间的单机连续型批调度问题[J]. 电子学报, 2008, 36(2): 367-370. |
ZHAOY F, TANGL X. Scheduling a single continuous batch processing machine to minimize total completion time[J]. Acta Electronica Sinica, 2008, 36(2): 367-370. (in Chinese) | |
12 | Z̆ILINSKASA, ZHIGLJAVSKYA. Branch and probability bound methods in multi-objective optimization[J]. Optimization Letters, 2016, 10(2): 341-353. |
13 | SINGHH K, RAY T, SMITHW. C-PSA: Constrained pareto simulated annealing for constrained multi-objective optimization[J]. Information Sciences, 2010, 180(13): 2499-2513. |
14 | ZHANGJ, SHANGY, GAOR, et al. An improved multi-objective adaptive niche genetic algorithm based on pareto front[C]//2009 IEEE International Advance Computing Conference. Virtual Conference: IEEE, 2009: 300-304. |
15 | LEIR, CHENGY. A pareto-based differential evolution algorithm for multi-objective optimization problems[C]//2010 Chinese Control and Decision Conference. New York: IEEE, 2010: 1608- 1613. |
16 | ZHANGT, CHENJ, LVD, et al. Automatic generation of reconfiguration blueprints for ima systems using reinforcement learning[J]. IEEE Embedded Systems Letters, 2021, 13(4): 182-185 |
17 | 罗庆, 张涛, 单鹏, 等. 基于改进Q学习的IMA系统重构蓝图生成方法[J].航空学报, 2021, 42(8): 525792-525792. |
LUOQ, ZHANGT, SHANP, et al. Generating reconfiguration blueprints for IMA systems based on improved Q-learning[J].Acta Aeronautica et Astronautica Sinica, 2021, 42(8): 525792-525792. (in Chinese) | |
18 | HUANGR T, YUT Y, DINGZ H, et al. Deep Reinforcement Learning[M]. Singapore: Springer, 2020: 161-212. |
19 | HEQ, HOUX.WD3: Taming the estimation bias in deep reinforcement learning[C]//2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI). Virtual Conference: IEEE, 2020: 391-398. |
20 | CHASLOTJ B, WINANDSM, HERIKH, et al. Progressive Strategies for Monte-Carlo Tree Search[J]. New Mathematics & Natural Computation, 2008, 4(3): 343-357. |
21 | PATTERSONW, WINSTON-PROCTORC E. Game Theory[M]. Part of the Springer Undergraduate Mathematics Series book series(SUMS). London:Springer, 2020: 87-106. |
22 | MEZZETTIC, RENOUL. implementation in mixed nash equilibrium[J]. Warwick Economics Research Paper, 2012, 147(6): 2357-2375. |
23 | KRISHNAMURTHYV, ABADF. Gradient Based Policy Optimization of Constrained Markov Decision Processes[M]. Singapore: World Scientific, 2012: 503-547. |
24 | POURMOHSENIB, WILDERMANNS, GLA M, et al. Hard real-time application mapping reconfiguration for NoC-based many-core systems[J]. Real-Time Systems, 2019, 55(2): 433-469. |
25 | SUTTONR S, MCALLESTERD, SINGHS, et al. Policy gradient methods for reinforcement learning with function approximation[C]//Submitted to Advances in Neural Information Processing Systems(NIPS). Vertual Conference: The MIT Press, 1999: 1057-1063. |
26 | SUTTONR S, BARTOA G, et al. Reinforcement Learning: An Introduction Second edition[M]. London: The MIT Press, 2015: 265-278. |
27 | SOEMERSD, PIETTER, STEPHENSONM, et al. Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates[C]//2019 IEEE Conference on Games(CoG). Virtual Conference: IEEE, 2019: 1-8. |
[1] | 袁海英, 成君鹏, 曾智勇, 武延瑞. Mobile_BLNet:基于Big-Little Net的轻量级卷积神经网络优化设计[J]. 电子学报, 2023, 51(1): 180-191. |
[2] | 段世红, 何昊, 徐诚, 殷楠, 王然. 未知环境下基于深度序列蒙特卡罗树搜索的信源导航方法[J]. 电子学报, 2022, 50(7): 1744-1752. |
[3] | 朱旭, 杨健, 杨涛. 中心频率与输出端口可重构的新型滤波分支线电桥[J]. 电子学报, 2022, 50(6): 1331-1335. |
[4] | 吕明久, 马建朝, 韦旭, 陈文峰, 杨军, 马晓岩. 基于矩阵化原子范数的高频雷达距离-多普勒二维估计方法[J]. 电子学报, 2022, 50(5): 1150-1158. |
[5] | 杨晓城, 杨真乙, 阎敬业, 武林, 蒋明峰. 基于拉普拉斯算子的综合孔径辐射计成像算法[J]. 电子学报, 2022, 50(2): 339-345. |
[6] | 刘元安, 王卫民, 张震, 王珩, 高华强. 基于可重构OTA装置的非平稳多维无线信道重构模型与应用性能分析[J]. 电子学报, 2022, 50(12): 2935-2944. |
[7] | 邓若琪, 张雨童, 张浩波, 邸博雅, 张泓亮, 宋令阳. 全息无线电:全息超表面赋能的超大规模MIMO新范式[J]. 电子学报, 2022, 50(12): 2984-2995. |
[8] | 王鹏, 邹彬, 刘金枝, 周丹阳. 基于Xilinx型FPGA系统单粒子效应评估方法研究[J]. 电子学报, 2022, 50(11): 2716-2721. |
[9] | 魏志超, 杨春玲. 时域注意力特征对齐的视频压缩感知重构网络[J]. 电子学报, 2022, 50(11): 2584-2592. |
[10] | 陈文俊, 杨春玲. 图像压缩感知的特征域优化及自注意力增强神经网络重构算法[J]. 电子学报, 2022, 50(11): 2629-2637. |
[11] | 田典, 余宁梅, 周广霖, 李娜. 双线阵无透镜超分辨率扫描成像方法研究[J]. 电子学报, 2022, 50(10): 2503-2516. |
[12] | 裴翰奇, 杨春玲, 魏志超, 曹燕. 基于SPL迭代思想的图像压缩感知重构神经网络[J]. 电子学报, 2021, 49(6): 1195-1203. |
[13] | 禤韵怡, 杨春玲. 基于帧间组稀疏的两阶段递归增强视频压缩感知重构网络[J]. 电子学报, 2021, 49(3): 435-442. |
[14] | 江家宝, 张晓峰, 沈云付, 欧阳山, 周时强, 彭俊杰, 刘跃军, 金翊. 三值光学计算机中SJ-MSD加法器的设计与实现[J]. 电子学报, 2021, 49(2): 275-285. |
[15] | 王丽, 杜鹏程, 许一鸣, 李必信. 基于分层架构模式识别的软件架构重构技术[J]. 电子学报, 2021, 49(1): 201-208. |
阅读次数 | ||||||
全文 |
|
|||||
摘要 |
|
|||||