LUP Student Papers

LUND UNIVERSITY LIBRARIES

Real-time unsupervised log event anomaly detection in public transportation

Segui, Felicia and Timürtas, Andreas (2022)
Department of Automatic Control
Abstract
Detecting log data anomalies in real time is useful because it makes it possible to apply logic that corrects the anomalies as they happen. This project presents a method for detecting anomalies in public transportation bus event log data in real time, without a labeled data set. Initially, each unique bus trip is represented by its event frequencies, a representation that is not suitable for real-time detection. With a data set assumed to contain only normal data, an autoencoder, a PCA model and a clustering algorithm label each data point in the frequency domain as normal or anomalous. The labeled data is then split into sequences of events with a rolling window, a representation that is suitable for detecting anomalies in real time. To separate the anomalous event sequences from the normal event sequences that occur during the same bus trip, the event sequences are grouped and counted together with their labels. By comparing the frequency of each event sequence in anomalous trips with the frequency of the corresponding event sequence in normal trips, the sequences that are overrepresented in anomalous trips are detected, and each sequence receives a final label of normal or anomalous. These labeled sequences are then used in the real-time detector. From the three base labeling models (autoencoder, PCA and clustering algorithm), different model combinations are created by taking either the union or the intersection of the journeys labeled anomalous by the models involved. This results in 11 different models, all of which are tested and evaluated. The evaluation calculates the recall, precision and F1-score of experiments performed on a data set of assumed normal journeys with injected simulated anomalies. It is performed at two points in the method: once after the initial labeling and once after the real-time detector.
The results obtained with this evaluation method show that combining the autoencoder and the clustering algorithm through intersection is the best model combination, based on the F1-score calculated after the real-time detection. This combination achieves a median recall of 0.89 and a median precision of 0.72, which results in an F1-score of 0.79.
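
The steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the event names, the window size, the overrepresentation rule (a simple ratio threshold) and all function names are hypothetical, chosen only to make the ideas concrete. The sketch shows rolling-window splitting, flagging sequences that are overrepresented in anomalous trips, how three base models lead to 11 model variants under one plausible reading (three single models plus the union and intersection of every group of two or three models), and the standard F1 formula.

```python
from collections import Counter
from itertools import combinations


def rolling_windows(events, size):
    """Split one trip's event list into overlapping windows of `size` events."""
    return [tuple(events[i:i + size]) for i in range(len(events) - size + 1)]


def overrepresented_sequences(anomalous_trips, normal_trips, size=2, ratio=2.0):
    """Return windows whose relative frequency in anomalous trips is at least
    `ratio` times their relative frequency in normal trips.
    The ratio rule is an assumption; the abstract does not specify the test."""
    anom = Counter(w for trip in anomalous_trips for w in rolling_windows(trip, size))
    norm = Counter(w for trip in normal_trips for w in rolling_windows(trip, size))
    anom_total = sum(anom.values()) or 1
    norm_total = sum(norm.values()) or 1
    flagged = set()
    for window, count in anom.items():
        anom_freq = count / anom_total
        norm_freq = norm[window] / norm_total  # 0.0 if never seen in normal trips
        if anom_freq >= ratio * norm_freq:
            flagged.add(window)
    return flagged


def model_combinations(base_models):
    """Each base model alone, plus the union and the intersection of every
    group of two or more base models."""
    combos = [("single", (m,)) for m in base_models]
    for k in range(2, len(base_models) + 1):
        for group in combinations(base_models, k):
            combos.append(("union", group))
            combos.append(("intersection", group))
    return combos


def combine_labels(operation, label_sets):
    """Combine the sets of trip ids that each model labeled anomalous."""
    if operation == "intersection":
        return set.intersection(*label_sets)
    return set.union(*label_sets)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: event names are invented for illustration.
normal_trips = [
    ["door_open", "door_close", "depart"],
    ["door_open", "door_close", "depart"],
]
anomalous_trips = [["door_open", "depart", "door_close"]]

flagged = overrepresented_sequences(anomalous_trips, normal_trips)
combos = model_combinations(["autoencoder", "pca", "clustering"])
both = combine_labels("intersection", [{"trip1", "trip2"}, {"trip2", "trip3"}])
print(len(combos), sorted(flagged), both)
```

With the toy data, both windows of the anomalous trip are flagged because neither occurs in the normal trips, and the enumeration yields 11 variants for three base models. Note that applying `f1_score` directly to the reported median precision and recall gives roughly 0.80 rather than 0.79; one plausible explanation is that the reported F1-score is itself a median over experiments, which need not equal the F1-score of the median precision and recall.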
author
Segui, Felicia and Timürtas, Andreas
supervisor
organization
year
type
H3 - Professional qualifications (4 Years - )
subject
report number
TFRT-6184
ISSN
0280-5316
language
English
id
9101077
date added to LUP
2022-09-29 15:03:08
date last changed
2022-09-29 15:03:08
@misc{9101077,
  abstract     = {{Detecting log data anomalies in real time is useful because it makes it possible to apply logic that corrects the anomalies as they happen. This project presents a method for detecting anomalies in public transportation bus event log data in real time, without a labeled data set. Initially, each unique bus trip is represented by its event frequencies, a representation that is not suitable for real-time detection. With a data set assumed to contain only normal data, an autoencoder, a PCA model and a clustering algorithm label each data point in the frequency domain as normal or anomalous. The labeled data is then split into sequences of events with a rolling window, a representation that is suitable for detecting anomalies in real time. To separate the anomalous event sequences from the normal event sequences that occur during the same bus trip, the event sequences are grouped and counted together with their labels. By comparing the frequency of each event sequence in anomalous trips with the frequency of the corresponding event sequence in normal trips, the sequences that are overrepresented in anomalous trips are detected, and each sequence receives a final label of normal or anomalous. These labeled sequences are then used in the real-time detector. From the three base labeling models (autoencoder, PCA and clustering algorithm), different model combinations are created by taking either the union or the intersection of the journeys labeled anomalous by the models involved. This results in 11 different models, all of which are tested and evaluated. The evaluation calculates the recall, precision and F1-score of experiments performed on a data set of assumed normal journeys with injected simulated anomalies. It is performed at two points in the method: once after the initial labeling and once after the real-time detector.
The results obtained with this evaluation method show that combining the autoencoder and the clustering algorithm through intersection is the best model combination, based on the F1-score calculated after the real-time detection. This combination achieves a median recall of 0.89 and a median precision of 0.72, which results in an F1-score of 0.79.}},
  author       = {{Segui, Felicia and Timürtas, Andreas}},
  issn         = {{0280-5316}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Real-time unsupervised log event anomaly detection in public transportation}},
  year         = {{2022}},
}