Log Fault Diagnosis Based on Large Language Models

XU Ting (许婷); XIAO Tong (肖桐); ZHANG Sheng-lin (张圣林); SUN Yi-dan (孙一丹); SUN Yong-qian (孙永谦); PEI Dan (裴丹)

doi:10.12263/DZXB.20240801

Special Column: Chinese Institute of Electronics Science and Technology Award | Updated: 2025-07-24

- Chinese title: 基于LLM的日志故障诊断
- English title: Log Fault Diagnosis Based on Large Language Models
- Acta Electronica Sinica, 2025, Vol. 53, No. 4, pp. 1123-1141
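- Author affiliations:
  
  1. College of Software, Nankai University, Tianjin 300457
  2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084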
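- Author biographies:
  
  XU Ting (许婷), female, was born in October 1991 in Zhumadian, Henan Province. She is currently a Ph.D. candidate in software engineering at the College of Software, Nankai University. Her main research interests include anomaly detection, fault localization, root cause analysis, and failure prediction. E-mail: xuting@mail.nankai.edu.cn
  XIAO Tong (肖桐), male, was born in 1990 in Shaoyang, Hunan Province. He is currently a postdoctoral researcher at Tsinghua University. His main research interests include log-based anomaly detection, root cause localization, and failure prediction. E-mail: xiaotong@tsinghua.edu.cn
  ZHANG Sheng-lin (张圣林), male, was born in July 1989 in Binzhou, Shandong Province. He is currently an associate professor, associate dean, and doctoral and master's supervisor at Nankai University. His main research interest is machine-learning-based intelligent operations (AIOps), including anomaly detection, fault localization, root cause analysis, and failure prediction. E-mail: zhangsl@nankai.edu.cn
  SUN Yi-dan (孙一丹), male, was born in November 2002 in Tianjin. He undertook his undergraduate studies at the College of Software, Nankai University, and his master's studies at the College of Software, Zhejiang University. His main research interests include large language models and representation learning. E-mail: syd20021134@163.com
  SUN Yong-qian (孙永谦), male, was born in 1988 in Shijiazhuang, Hebei Province. He is currently an associate professor and doctoral and master's supervisor at Nankai University. His main research interests include intelligent operations, artificial intelligence, and intelligent network management. E-mail: sunyongqian@nankai.edu.cn
  PEI Dan (裴丹), male, was born in Hebei Province. He is currently a tenured associate professor and doctoral supervisor in the Department of Computer Science and Technology, Tsinghua University. His main research interests include intelligent operations and time-series intelligence. E-mail: peidan@tsinghua.edu.cn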
- Funding:
  
  National Natural Science Foundation of China (62272249; 62302244)
- DOI: 10.12263/DZXB.20240801
  CLC number: TP391
- Received: 2024-09-03; revised: 2025-04-18; print publication: 2025-04-25
Cite as: XU Ting, XIAO Tong, ZHANG Sheng-lin, et al. Log Fault Diagnosis Based on Large Language Models[J]. Acta Electronica Sinica, 2025, 53(04): 1123-1141. DOI: 10.12263/DZXB.20240801

Abstract

As software service systems grow larger and more complex, log-based fault diagnosis is critical to ensuring the reliability of software services. Existing log-based fault diagnosis methods can identify the type of a fault, but they cannot explain their reasoning in a way that convinces operations and maintenance personnel, which makes them difficult to deploy in production environments. To address this, this paper proposes LogCoT (Log Chain of Thought), a new framework that performs log fault diagnosis by automatically constructing chain-of-thought prompts (CoT-Prompting). LogCoT uses the Auto-Few-Shot-CoT (Auto-FSC) algorithm, a two-stage chain-of-thought prompt engineering method, to extract the semantic information of logs through a large language model (LLM) and thereby generate interpretable root cause analysis reports. In addition, LogCoT combines prompt-tuning on data without category labels and preference-tuning on data with category labels to fine-tune the Mistral base model. The large language model feedback identity preference optimisation (LLMf-IPO) algorithm is then used to correct the erroneous diagnosis results generated by Mistral, so that the output aligns better with user intent. Finally, LogCoT is evaluated comprehensively on two log datasets collected from the production environments of a top-tier global Internet service provider and a cloud service provider. The experimental results show that LogCoT outperforms current typical baseline models on three performance metrics (Accuracy, Macro-F1, and Weighted-F1), and that its Accuracy exceeds that of the best existing model by 31.88 and 10.51 percentage points on the two datasets, respectively.
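For readers unfamiliar with the three metrics named in the abstract, the sketch below computes them from their standard definitions in plain Python. It is an illustration only, not the authors' evaluation code: the function names and the fault-type labels (`disk`, `network`, `memory`) are hypothetical and do not come from the paper's datasets.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return a dict mapping each label to its one-vs-rest F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every fault type counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's true support."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Hypothetical fault-type labels, used only to exercise the functions.
y_true = ["disk", "disk", "network", "network", "network", "memory"]
y_pred = ["disk", "network", "network", "network", "network", "memory"]

print(round(accuracy(y_true, y_pred), 4))     # 0.8333
print(round(macro_f1(y_true, y_pred), 4))     # 0.8413
print(round(weighted_f1(y_true, y_pred), 4))  # 0.8175
```

Macro-F1 treats rare and frequent fault types alike, while Weighted-F1 follows the class distribution, so reporting both alongside Accuracy shows whether a model performs well only on the dominant fault types.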
