Enhancing a Log-analyzer with GraphRAG
(2025) Department of Automatic Control
- Abstract
- Sifting through complex software logs for the root cause of errors can be a daunting and time-consuming task. This thesis explores a novel approach using knowledge graphs and GraphRAG to empower Large Language Models (LLMs) for more efficient and accurate analysis. The methodology involves utilizing log template mining and a CMake-generated file dependency tree to build the knowledge graph, which is then queried for contextual paths to augment LLM prompts. To evaluate our GraphRAG implementation, various LLMs were tested, and performance was measured using F1-score, accuracy, precision, and recall based on the predicted root cause lines. Additionally, a developer survey was conducted to assess the quality of the generated descriptions and solutions.
Our evaluation demonstrates that the GraphRAG implementation can significantly improve root cause line identification, increasing the F1-score by up to 58.9% for CMake-related errors and up to 23.56% on a randomized test set, depending on the LLM. Code-oriented LLMs, such as Qwen2.5 coder (achieving an F1-score of 0.77 with GraphRAG), proved to be the most effective both with and without GraphRAG. Survey results regarding the descriptions and solutions generated with GraphRAG compared to those without were inconclusive, suggesting comparable quality.
The thesis also discusses potential threats to validity and outlines areas for future research in GraphRAG for log-based root cause analysis, including the expansion of domain-specific knowledge within the graph.
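As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch of how precision, recall, F1-score, and accuracy could be computed for predicted root cause lines. The abstract does not state how the thesis computes these values, so the set-based treatment of line numbers, the function name `root_cause_metrics`, and the `total_lines` parameter are all assumptions made for this example.

```python
def root_cause_metrics(predicted, actual, total_lines):
    """Score predicted root-cause line numbers against labelled ones.

    Hypothetical helper: each log line is treated as one binary
    classification (root cause or not). This is an assumption for
    illustration, not the procedure documented in the thesis.
    """
    predicted, actual = set(predicted), set(actual)

    tp = len(predicted & actual)          # lines correctly flagged
    fp = len(predicted - actual)          # lines flagged but not labelled
    fn = len(actual - predicted)          # labelled lines that were missed
    tn = total_lines - tp - fp - fn       # remaining lines, correctly ignored

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total_lines if total_lines else 0.0

    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Made-up line numbers for a 100-line log, purely for demonstration.
    scores = root_cause_metrics(predicted=[10, 42, 57],
                                actual=[42, 57, 80],
                                total_lines=100)
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
```

Because root cause lines are usually a small fraction of a log, accuracy in this formulation is dominated by true negatives, which is one reason F1-score is the more informative figure for this kind of task.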
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9208051
- author
- Johansson, Tobias and Khan, Ishaaq
- supervisor
- organization
- year
- 2025
- type
- H3 - Professional qualifications (4 Years - )
- subject
- report number
- TFRT-6283
- other publication id
- 0280-5316
- language
- English
- id
- 9208051
- date added to LUP
- 2025-08-08 15:08:20
- date last changed
- 2025-08-08 15:08:20
@misc{9208051,
  abstract = {{Sifting through complex software logs for the root cause of errors can be a daunting and time-consuming task. This thesis explores a novel approach using knowledge graphs and GraphRAG to empower Large Language Models (LLMs) for more efficient and accurate analysis. The methodology involves utilizing log template mining and a CMake-generated file dependency tree to build the knowledge graph, which is then queried for contextual paths to augment LLM prompts. To evaluate our GraphRAG implementation, various LLMs were tested, and performance was measured using F1-score, accuracy, precision, and recall based on the predicted root cause lines. Additionally, a developer survey was conducted to assess the quality of the generated descriptions and solutions. Our evaluation demonstrates that the GraphRAG implementation can significantly improve root cause line identification, increasing the F1-score by up to 58.9% for CMake-related errors and up to 23.56% on a randomized test set, depending on the LLM. Code-oriented LLMs, such as Qwen2.5 coder (achieving an F1-score of 0.77 with GraphRAG), proved to be the most effective both with and without GraphRAG. Survey results regarding the descriptions and solutions generated with GraphRAG compared to those without were inconclusive, suggesting comparable quality. The thesis also discusses potential threats to validity and outlines areas for future research in GraphRAG for log-based root cause analysis, including the expansion of domain-specific knowledge within the graph.}},
  author = {{Johansson, Tobias and Khan, Ishaaq}},
  language = {{eng}},
  note = {{Student Paper}},
  title = {{Enhancing a Log-analyzer with GraphRAG}},
  year = {{2025}},
}