

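1. School of Computer Science, Peking University, Beijing 100871
2. Strategic Assessment and Consultation Center, Academy of Military Sciences, Beijing 100091
3. College of Computer Science, National University of Defense Technology, Changsha, Hunan 410073
4. Academy of Military Sciences, Beijing 100091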
Received: 07 December 2025,
Accepted: 19 January 2026,
Published: 25 January 2026
WANG Ziyao, TIAN Yu, HUANG Junjie, et al. Research and Comparative Analysis of SSD Failure Prediction Methods Based on Machine Learning and Rule-Based Reasoning[J]. Acta Electronica Sinica, 2026, 54(01): 115-124. DOI: 10.12263/DZXB.20250975
With the rapid evolution of cloud computing, big data, and artificial intelligence applications, data centers continue to grow in scale, and the reliability of storage systems has become a critical factor in their stable operation and service availability. As a key component of data center storage systems, solid-state drives (SSDs) are widely deployed in the core storage layer owing to their high throughput, low latency, and low power consumption. However, under large-scale, long-term operation, SSD failures tend to occur abruptly and follow complex evolution patterns, posing severe challenges to service continuity and data security. To improve the accuracy and practicality of SSD failure prediction, this paper proposes a machine learning prediction method based on classification models and feature engineering, and a rule-based reasoning prediction method based on an explicit rule engine and dynamic feature compensation. The machine learning method, through multi-stage feature engineering and ensemble learning, achieves a macro-average F1 score of 0.968 when data are complete; however, its "black-box" nature limits its industrial applicability to some extent. The rule-based reasoning method builds an explicit rule engine that fuses multiple algorithms and introduces a dynamic feature compensation mechanism based on SHAP (SHapley Additive exPlanations) values. It attains an accuracy of 0.988 with complete data and maintains an accuracy of 0.941 in the extreme case of eight missing features, demonstrating strong robustness. Comparative analysis of the experimental results indicates that the machine learning method achieves high prediction accuracy when data are complete, while the rule-based reasoning method offers advantages in interpretability, real-time performance, and adaptability to missing data. The paper further explores pathways for integrating the two methods, providing theoretical support and practical reference for building next-generation intelligent operations and maintenance systems that combine perceptual capability with transparent reasoning.
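The abstract reports a macro-average F1 score for the machine learning method and accuracy for the rule-based method. The sketch below illustrates, under stated assumptions, how the two metrics can diverge when failed drives are rare. The binary healthy/failed labeling, the class counts, and the function names `macro_f1` and `accuracy` are hypothetical and are not taken from the paper or its dataset.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); define it as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 95 healthy drives (label 0) and 5 failed drives (label 1).
y_true = [0] * 95 + [1] * 5
# A degenerate predictor that labels every drive as healthy.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, macro-F1 = {mf1:.3f}")
```

On this toy data the degenerate predictor scores high on accuracy yet low on macro-F1, because the failed class contributes an F1 of zero to the unweighted mean. This is one reason the two headline figures in the abstract are not directly comparable without knowing the class balance of the evaluation set.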