

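1. School of Computer Science, School of Software, and School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023
2. Jiangsu Provincial Key Laboratory of Internet of Things Intelligent Perception and Computing, Nanjing, Jiangsu 210023
3. School of Educational Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023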
Received: 24 September 2025
Accepted: 05 March 2026
Published: 25 April 2026
SHA Letian, CHEN Xiao, ZHENG Hongmei, et al. A Reinforcement Learning-Based Framework for Memory Layout Optimization in Ubiquitous Operating Systems[J]. Acta Electronica Sinica, 2026, 54(04): 1612-1628. DOI:10.12263/DZXB.20250835
With the advent of the era of human-cyber-physical ternary integration, ubiquitous operating systems (UOS) impose stringent requirements for elasticity and adaptability on memory management in resource-constrained edge nodes with variable workloads. However, most existing memory management strategies rely on predefined static rules and cannot perceive dynamic workload characteristics at runtime. Consequently, when facing bursty, multi-modal tasks, systems often suffer from severe memory fragmentation and degraded real-time responsiveness, which has become a critical bottleneck for the practical deployment of ubiquitous computing. To address these challenges, this paper proposes AIMO, a memory layout optimization framework based on deep reinforcement learning. The framework manages memory resources dynamically through a closed "perception-decision-execution" loop. First, a multi-dimensional state perception module uses the processor's performance monitoring unit (PMU) and kernel instrumentation to capture, in real time, hardware-level metrics (e.g., cache miss rate, media wear), object-level behaviors (e.g., access frequency, hot/cold status), and system-level context (e.g., memory fragmentation rate, application scenario type), and encodes them into high-dimensional state vectors. Second, a reinforcement learning decision module models the memory layout problem as a Markov decision process (MDP) and uses the proximal policy optimization (PPO) algorithm to generate optimal decisions online. To accommodate the resource constraints of embedded environments, this module adopts a strategy of "offline pre-training and lightweight online fine-tuning", which significantly reduces computational overhead while ensuring decision determinism. Finally, a policy execution module installs lightweight hook functions in kernel space to implement allocation interception, object relocation, policy-driven object placement, and proactive defragmentation, thereby closing the adaptive control loop over the memory layout. We implemented AIMO in the OpenHarmony kernel and evaluated it empirically on the Raspberry Pi Zero 2W platform. Experimental results show that, compared with TLSF, the real-time performance benchmark, AIMO maintains an equivalent worst-case allocation time while reducing the large-block allocation failure rate from 3.4% to 0.9%, increasing the maximum available contiguous free block from 318 KB to 476 KB, and reducing the preloading latency of upper-layer applications by 15.8%. The framework is also lightweight: the average CPU utilization of its decision module is only 1.9%, and its metadata storage overhead is less than 0.001% of the 512 MB total system memory. This study verifies the effectiveness of reinforcement learning for operating system kernel memory management and provides a new approach to building efficient, adaptive UOS kernels.
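The abstract names PPO as the decision algorithm but does not state its training objective. As a rough illustration only, the sketch below computes the standard clipped surrogate objective from the general PPO literature for a small batch of decisions. It is not AIMO's implementation: the function name `ppo_clip_objective`, the clipping range `eps=0.2`, and the sample numbers are all assumptions chosen for the example.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    updated policy and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for those actions.
    eps: clipping range; 0.2 is an assumed, commonly used default.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Identical policies: every ratio is 1, so the objective is the mean advantage.
same = ppo_clip_objective([0.0, 0.0], [0.0, 0.0], [1.0, 3.0])

# Ratio of 2 with a positive advantage: the gain is capped at (1 + eps) * adv.
capped = ppo_clip_objective([math.log(2.0)], [0.0], [1.0])
```

Clipping limits how far a single update can move the policy away from the one that collected the data, which is one reason PPO is often regarded as a stable choice when only lightweight online fine-tuning is affordable.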