
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Automatic Log Based Anomaly Detection in Cloud Operations using Machine Learning

Gummesson Atroshi, Jacob and Le, Christian (2021)
Department of Automatic Control
Abstract
For modern large-scale cloud services, fast and reliable anomaly detection is of utmost importance. Traditionally, developers perform simple keyword searches in the log data, for keywords such as "error" or "fail"; the log data is one of the main data sources that depict the state of the system. In today’s large-scale systems, however, several TB of log messages can be output every day, making manual search highly ineffective. To address the problem, many anomaly detection methods have been developed based on the few publicly available log data sets. In this thesis we present a unique data collection method using a virtualized OpenStack cloud system to collect log data from six simulated anomaly scenarios. Three different detection methods are presented, using both the dynamic and static parts of the individual log messages. The impact of parameters such as time window size is investigated by evaluating the various anomaly types. Among the four conventional machine learning models based on the static parts, good performance was achieved, with a 50% detection rate and a 0.35% false alarm rate. In addition, the results show better LSTM model performance when using the dynamic rather than the static parts. For the LSTM using dynamic parameters, the results depended on the anomaly type and the parameter, with the best average scores at around a 55-65% detection rate and a false alarm rate of around 0.5-1%.
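As a rough illustration of the keyword-search baseline and the two metrics quoted in the abstract, the following minimal sketch flags log lines containing "error" or "fail" and then computes a detection rate and a false alarm rate. The definitions used here (detection rate = TP / (TP + FN), false alarm rate = FP / (FP + TN)) are the common ones and are an assumption; the thesis may define them differently. The sample log lines, labels, and function names are hypothetical and are not taken from the thesis data.

```python
from typing import Iterable, List, Tuple

KEYWORDS = ("error", "fail")  # the two keywords named in the abstract


def keyword_flag(message: str, keywords: Iterable[str] = KEYWORDS) -> bool:
    """Return True if the message contains any keyword (case-insensitive)."""
    lowered = message.lower()
    return any(k in lowered for k in keywords)


def rates(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Return (detection_rate, false_alarm_rate).

    Assumed definitions (not confirmed by the source):
      detection rate   = TP / (TP + FN)
      false alarm rate = FP / (FP + TN)
    """
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    fn = sum((not p) and a for p, a in pairs)
    fp = sum(p and (not a) for p, a in pairs)
    tn = sum((not p) and (not a) for p, a in pairs)
    detection = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm = fp / (fp + tn) if (fp + tn) else 0.0
    return detection, false_alarm


# Hypothetical labeled log lines: (message, is_anomalous)
sample = [
    ("Failed to bind port 8774", True),
    ("Instance spawned successfully", False),
    ("Timeout waiting for compute node", True),   # anomaly without a keyword
    ("error_count=0 after health check", False),  # keyword in a normal line
]

predicted = [keyword_flag(message) for message, _ in sample]
actual = [label for _, label in sample]
detection_rate, false_alarm_rate = rates(predicted, actual)
print(f"detection rate: {detection_rate:.2f}, false alarm rate: {false_alarm_rate:.2f}")
```

The toy sample is built so that keyword matching misses one anomaly and raises one false alarm, which is the kind of weakness that motivates learned detectors over plain keyword search.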
author: Gummesson Atroshi, Jacob and Le, Christian
year: 2021
type: H3 - Professional qualifications (4 Years - )
report number: TFRT-6129
other publication id: 0280-5316
language: English
id: 9049552
date added to LUP: 2021-06-04 14:45:23
date last changed: 2021-06-04 14:45:23
@misc{9049552,
  abstract     = {{For modern large-scale cloud services, fast and reliable anomaly detection is of utmost importance. Traditionally, developers perform simple keyword searches in the log data, for keywords such as "error" or "fail"; the log data is one of the main data sources that depict the state of the system. In today’s large-scale systems, however, several TB of log messages can be output every day, making manual search highly ineffective. To address the problem, many anomaly detection methods have been developed based on the few publicly available log data sets. In this thesis we present a unique data collection method using a virtualized OpenStack cloud system to collect log data from six simulated anomaly scenarios. Three different detection methods are presented, using both the dynamic and static parts of the individual log messages. The impact of parameters such as time window size is investigated by evaluating the various anomaly types. Among the four conventional machine learning models based on the static parts, good performance was achieved, with a 50% detection rate and a 0.35% false alarm rate. In addition, the results show better LSTM model performance when using the dynamic rather than the static parts. For the LSTM using dynamic parameters, the results depended on the anomaly type and the parameter, with the best average scores at around a 55-65% detection rate and a false alarm rate of around 0.5-1%.}},
  author       = {{Gummesson Atroshi, Jacob and Le, Christian}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Automatic Log Based Anomaly Detection in Cloud Operations using Machine Learning}},
  year         = {{2021}},
}