Log Anomaly Detection of Structured Logs in a Distributed Cloud System

Nilsson, David; Olsson, Albin

Nilsson, David and Olsson, Albin (2022)
Department of Automatic Control

Abstract: As computer systems grow larger and more complex, the task of maintaining the system and finding potential security threats or other malfunctions becomes increasingly hard. Traditionally, this has had to be done by manually examining the logs. In modern systems, this can become infeasible due to either the large volume of logs or the complexity of the system. By using machine-learning-based anomaly detection to analyze system logs, this can be done automatically.
In this thesis, the authors have researched the area of anomaly detection and implemented an anomaly detection pipeline for a specific system. Three different machine-learning-based anomaly detection models were implemented, namely a clustering algorithm, PCA, and a neural network in the form of an autoencoder. These models were compared and evaluated against a baseline error detection system, which was already in place for the target system. They were also compared against each other to find which models performed best, and under which circumstances. To compare the models, six different types of known anomalies were injected into the data.
When comparing the performance of the different methods, all of them were found to outperform the baseline system. In the first experiment, where the models were trained and tested using data from the same time period, PCA achieved the highest F1-score of 0.990. In the second experiment, the models were trained and tested using data from separate time periods. In this scenario, the clustering algorithm outperformed the others, with an F1-score of 0.879. Both PCA and the autoencoder produced many false positives, reducing their precision and thereby their F1-score.
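
The link between false positives, precision, and F1-score noted in the abstract can be illustrated with a minimal sketch. The labels below are hypothetical toy data, not results from the thesis, and the function name `precision_recall_f1` is an assumption introduced here for illustration. It uses the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as the harmonic mean of the two.

```python
def precision_recall_f1(true_labels, predicted_labels):
    """Return (precision, recall, F1) for binary labels, where 1 marks an anomaly."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal data wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies that were missed

    # Guard against division by zero when nothing is flagged or nothing is anomalous.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical toy data: 10 log windows, 4 of which contain injected anomalies.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
exact = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # every anomaly found, no false alarms
noisy = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]  # every anomaly found, four false alarms

print(precision_recall_f1(truth, exact))
print(precision_recall_f1(truth, noisy))
```

In this toy case both detectors find every injected anomaly, so recall is identical. The four false positives alone halve the precision of the second detector and pull its F1-score down to roughly two thirds, which is the kind of effect the abstract describes for PCA and the autoencoder in the second experiment.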

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9096767

author: Nilsson, David and Olsson, Albin
supervisor:
organization: Department of Automatic Control
year: 2022
type: H3 - Professional qualifications (4 Years - )
subject: Technology and Engineering
report number: TFRT-6176
ISSN: 0280-5316
language: English
id: 9096767
date added to LUP: 2022-08-12 09:49:20
date last changed: 2022-08-12 09:49:20

@misc{9096767,
  abstract     = {{As computer systems grow larger and more complex, the task of maintaining the system and finding potential security threats or other malfunctions becomes increasingly hard. Traditionally, this has had to be done by manually examining the logs. In modern systems, this can become infeasible due to either the large volume of logs or the complexity of the system. By using machine-learning-based anomaly detection to analyze system logs, this can be done automatically.
In this thesis, the authors have researched the area of anomaly detection and implemented an anomaly detection pipeline for a specific system. Three different machine-learning-based anomaly detection models were implemented, namely a clustering algorithm, PCA, and a neural network in the form of an autoencoder. These models were compared and evaluated against a baseline error detection system, which was already in place for the target system. They were also compared against each other to find which models performed best, and under which circumstances. To compare the models, six different types of known anomalies were injected into the data.
When comparing the performance of the different methods, all of them were found to outperform the baseline system. In the first experiment, where the models were trained and tested using data from the same time period, PCA achieved the highest F1-score of 0.990. In the second experiment, the models were trained and tested using data from separate time periods. In this scenario, the clustering algorithm outperformed the others, with an F1-score of 0.879. Both PCA and the autoencoder produced many false positives, reducing their precision and thereby their F1-score.}},
  author       = {{Nilsson, David and Olsson, Albin}},
  issn         = {{0280-5316}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Log Anomaly Detection of Structured Logs in a Distributed Cloud System}},
  year         = {{2022}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
