Automatic Log Based Anomaly Detection in Cloud Operations using Machine Learning

Gummesson Atroshi, Jacob; Le, Christian

Gummesson Atroshi, Jacob and Le, Christian (2021)
Department of Automatic Control

Abstract: For modern large-scale cloud services, fast and reliable anomaly detection is of utmost importance. Traditionally, developers perform simple keyword searches for terms such as "error" or "fail" in the log data, one of the main data sources depicting the state of the system. In today’s large-scale systems, however, several TB of log messages can be output every day, making manual search highly ineffective. To address this problem, many anomaly detection methods have been proposed based on the few publicly available log data sets. In this thesis we present a unique data collection method that uses a virtualized OpenStack cloud system to collect log data from six simulated anomaly scenarios. Three different detection methods are presented, using both the dynamic and the static parts of the individual log messages. The impact of parameters such as time window size is investigated by evaluating the various anomaly types. Among the four conventional machine learning models based on the static parts, the best performance was a 50% detection rate with a 0.35% false alarm rate. The results also show better LSTM model performance when using the dynamic rather than the static parts. For the LSTM using dynamic parameters, the results depended on the anomaly type and the parameter, with the best average scores around a 55-65% detection rate at a false alarm rate around 0.5-1%.
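To make the terms in the abstract concrete, the following is a minimal, hypothetical Python sketch; it is not the authors' method or code. It assumes a naive regex masking step in which UUIDs, IPv4 addresses and numbers are treated as the dynamic parts of a log message and the remainder as the static template, groups templates into fixed-size time windows as count vectors, and computes detection rate and false alarm rate under their usual definitions (true positive rate and false positive rate). All function names, regexes and the window size are illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical masking rule (assumption): tokens matching any alternative are
# dynamic parameters; everything else forms the static template. Alternatives
# are tried left to right, so UUIDs and IPs win over plain numbers.
_DYNAMIC = re.compile(
    r"\b[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\b"  # UUIDs
    r"|\b\d{1,3}(?:\.\d{1,3}){3}\b"  # IPv4 addresses
    r"|\b\d+(?:\.\d+)?\b"  # integers and decimals
)


def split_message(message):
    """Return (static_template, dynamic_params) for one log message."""
    params = _DYNAMIC.findall(message)  # dynamic parts, in positional order
    template = _DYNAMIC.sub("<*>", message)  # static part with placeholders
    return template, params


def window_counts(events, window_seconds):
    """Group (timestamp, message) pairs into fixed windows of template counts.

    Returns a list of (window_index, Counter) pairs sorted by window index;
    windows without any events are omitted.
    """
    windows = {}
    for timestamp, message in events:
        template, _ = split_message(message)
        index = int(timestamp // window_seconds)
        windows.setdefault(index, Counter())[template] += 1
    return sorted(windows.items())


def detection_metrics(y_true, y_pred):
    """Detection rate TP/(TP+FN) and false alarm rate FP/(FP+TN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, false_alarm_rate
```

For example, `split_message("Took 12.31 seconds to spawn the instance on 10.0.0.5")` yields the template `"Took <*> seconds to spawn the instance on <*>"` with the parameters `["12.31", "10.0.0.5"]`. A count-vector model would see only the windowed templates, whereas a model on the dynamic parts would consume the extracted parameter values.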

- Open Access

Related Materials

Related object is popular science:
Popular science summary

Please use this url to cite or link to this publication: https://lup.lub.lu.se/student-papers/record/9049552

author

Gummesson Atroshi, Jacob and Le, Christian

organization

Department of Automatic Control

year

2021

type

H3 - Professional qualifications (4 Years - )

subject

Technology and Engineering

report number

TFRT-6129

other publication id

0280-5316

language

English

id

9049552

date added to LUP

2021-06-04 14:45:23

date last changed

2021-06-04 14:45:23

@misc{9049552,
  author       = {{Gummesson Atroshi, Jacob and Le, Christian}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Automatic Log Based Anomaly Detection in Cloud Operations using Machine Learning}},
  year         = {{2021}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES