A Reinforcement Learning-Based Framework for Memory Layout Optimization in Ubiquitous Operating Systems

SHA Letian; CHEN Xiao; ZHENG Hongmei; PAN Jiaye; DONG Jiankuo; XIAO Fu

doi:10.12263/DZXB.20250835

Research Progress on Ubiquitous Operating Systems and Environments for Human-Cyber-Physical Integrated Scenarios | Updated: 2026-06-17

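- ACTA ELECTRONICA SINICA Vol. 54, Issue 4, Pages: 1612-1628 (2026)
- Affiliations:
  
  1. School of Computer Science, School of Software, School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023
  2. Jiangsu Provincial Key Laboratory of Internet of Things Intelligent Perception and Computing, Nanjing, Jiangsu 210023
  3. School of Education Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023
- Funding:
  
  National Natural Science Foundation of China (62572255; 62302238); The 2024 Frontier Technology Research and Development Program of Jiangsu (BF2024071)
- DOI: 10.12263/DZXB.20250835
  CLC: TP316.4
- Received: 24 September 2025
  
  Accepted: 05 March 2026
  
  Published: 25 April 2026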
SHA Letian, CHEN Xiao, ZHENG Hongmei, et al. A Reinforcement Learning-Based Framework for Memory Layout Optimization in Ubiquitous Operating Systems[J]. Acta Electronica Sinica, 2026, 54(04): 1612-1628. DOI: 10.12263/DZXB.20250835

Abstract

With the advent of human-cyber-physical ternary integration, ubiquitous operating systems (UOS) impose stringent requirements for elasticity and adaptability on memory management in resource-constrained edge nodes that handle variable workloads. However, most existing memory management strategies rely on predefined static rules and cannot perceive dynamic workload characteristics at runtime. Consequently, when facing bursty, multi-modal tasks, systems often suffer from severe memory fragmentation and degraded real-time responsiveness, which has become a critical bottleneck for the practical deployment of ubiquitous computing. To address these challenges, this paper proposes AIMO, a memory layout optimization framework based on deep reinforcement learning. The framework manages memory resources intelligently and dynamically through a closed “perception-decision-execution” loop. First, a multi-dimensional state perception module uses the processor’s performance monitoring unit (PMU) and kernel instrumentation to capture, in real time, hardware-level metrics (e.g., cache miss rate, media wear), object-level behaviors (e.g., access frequency, hotness/coldness), and system-level context (e.g., memory fragmentation rate, application scenario type), and encodes them into high-dimensional state vectors. Second, a reinforcement learning decision module models the memory layout problem as a Markov decision process (MDP) and uses the proximal policy optimization (PPO) algorithm to generate decisions online. To accommodate the resource constraints of embedded environments, this module adopts a strategy of “offline pre-training and lightweight online fine-tuning”, which significantly reduces computational overhead while preserving decision determinism. Finally, a policy execution module installs lightweight hook functions in kernel space to implement allocation interception, object relocation, policy-driven object placement, and proactive defragmentation, thereby closing the adaptive control loop over the memory layout. We implemented AIMO in the OpenHarmony kernel and evaluated it empirically on the Raspberry Pi Zero 2W platform. Experimental results show that, compared with TLSF, a benchmark allocator for real-time performance, AIMO reduces the large-block memory allocation failure rate from 3.4% to 0.9% and increases the largest available contiguous free block from 318 KB to 476 KB, while maintaining an equivalent worst-case allocation time. It also reduces the preloading latency of upper-layer applications by 15.8%. In addition, the framework is lightweight: the average CPU utilization of its decision module is only 1.9%, and its metadata storage overhead accounts for less than 0.001% of the 512 MB total system memory. This study verifies the effectiveness of reinforcement learning for memory management in operating system kernels and offers a new approach to building efficient and adaptive UOS kernels.
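The headline figures above can be restated as relative changes and an absolute size bound. The short Python sketch below does only that arithmetic on the numbers reported in the abstract; the helper name `relative_change` is hypothetical, and reading "MB" as MiB (2**20 bytes) is an assumption, not something the paper states.

```python
# Illustrative arithmetic only: derived from the figures reported in the abstract.
# The helper name and the MiB interpretation are assumptions, not from the paper.

def relative_change(before: float, after: float) -> float:
    """Signed relative change from `before` to `after`, as a fraction."""
    return (after - before) / before

# Large-block allocation failure rate: 3.4% (TLSF) -> 0.9% (AIMO).
failure_change = relative_change(3.4, 0.9)

# Largest contiguous free block: 318 KB (TLSF) -> 476 KB (AIMO).
block_change = relative_change(318, 476)

# Metadata overhead is reported as less than 0.001% of 512 MB total memory.
# Assumption: "MB" is read as MiB (2**20 bytes).
total_bytes = 512 * 2**20
metadata_bound_bytes = total_bytes * 0.001 / 100

print(f"failure rate change: {failure_change:+.1%}")
print(f"largest free block change: {block_change:+.1%}")
print(f"metadata bound: < {metadata_bound_bytes:.0f} bytes")
```

Under these assumptions, the failure rate falls by about 73.5% in relative terms, the largest contiguous free block grows by about 49.7%, and the metadata stays below roughly 5.4 KB.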


