LUP Student Papers

LUND UNIVERSITY LIBRARIES

Towards Automated Log Message Embeddings for Anomaly Detection

Murphy, Adrian and Larsson, Daniel (2024)
Department of Automatic Control
Abstract
Log messages are implemented by developers to record important runtime information about a system. For that reason, system logs can provide insight into the state and health of a system and potentially be used to anticipate and discover errors. Manually inspecting these logs becomes impractical due to the high volume of messages generated by modern systems. Consequently, the research field of machine learning-based log anomaly detection has emerged to automatically identify irregularities. Parsing log messages into a structured, tractable format is a vital step in log anomaly detection. This degree project investigates the application of log message embeddings, a recently proposed log parsing method, for anomaly detection in complex IT systems and measures their resilience to concept drift, where the format of log messages changes over time, in comparison with a traditional parsing approach. Empirical analyses are conducted on two benchmark datasets, revealing that log message embeddings not only achieve anomaly detection results on par with traditional methods but also demonstrate considerable robustness against concept drift. A key focus of this project is on the application of large language models to automate the log embedding pipeline by handling out-of-vocabulary words and extracting synonymous and antonymous word relationships. These capabilities are important for distinguishing log messages that are identical except for one or more synonymous or antonymous word pairs. While large language models show promise in these tasks, experiments highlight the need for further refinement to match the performance achieved through manual operator feedback.
author
Murphy, Adrian and Larsson, Daniel
year
2024
type
H3 - Professional qualifications (4 Years - )
report number
TFRT-6222
other publication id
0280-5316
language
English
id
9148775
date added to LUP
2024-02-22 11:29:49
date last changed
2024-02-22 11:29:49
@misc{9148775,
  abstract     = {{Log messages are implemented by developers to record important runtime information about a system. For that reason, system logs can provide insight into the state and health of a system and potentially be used to anticipate and discover errors. Manually inspecting these logs becomes impractical due to the high volume of messages generated by modern systems. Consequently, the research field of machine learning-based log anomaly detection has emerged to automatically identify irregularities. Parsing log messages into a structured, tractable format is a vital step in log anomaly detection. This degree project investigates the application of log message embeddings, a recently proposed log parsing method, for anomaly detection in complex IT systems and measures their resilience to concept drift, where the format of log messages changes over time, in comparison with a traditional parsing approach. Empirical analyses are conducted on two benchmark datasets, revealing that log message embeddings not only achieve anomaly detection results on par with traditional methods but also demonstrate considerable robustness against concept drift. A key focus of this project is on the application of large language models to automate the log embedding pipeline by handling out-of-vocabulary words and extracting synonymous and antonymous word relationships. These capabilities are important for distinguishing log messages that are identical except for one or more synonymous or antonymous word pairs. While large language models show promise in these tasks, experiments highlight the need for further refinement to match the performance achieved through manual operator feedback.}},
  author       = {{Murphy, Adrian and Larsson, Daniel}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Towards Automated Log Message Embeddings for Anomaly Detection}},
  year         = {{2024}},
}