LUP Student Papers

LUND UNIVERSITY LIBRARIES

Anomaly Detection in Streaming Time Series Data Using Active Learning and Metalearning

Lundgren, Jonas (2020)
Department of Automatic Control
Abstract
This thesis proposes a framework for finding anomalies in streaming data. The framework is not limited to anomaly detection and could be applied to other problems as well. It rests on three main concepts: (i) Active learning, in which a learning algorithm can query a human specialist for labels of instances so that the model can improve on an otherwise unlabeled data set. (ii) Ensembles, which combine several models, often weak ones, on the idea that the combined result mitigates the errors of the individual models and thus provides better results. (iii) Metalearning, in which a second model learns model characteristics for a problem. In this thesis, metalearning is used to weight the ensemble members.
The framework is displayed in Figure 0.1. The meta learner takes instances as input and outputs weights for each ensemble member according to that member's performance on previous similar instances. The total output is thus a dynamically weighted ensemble output, where the weighting depends on the input. When a human expert provides label feedback on misclassified instances, only the meta learner is updated, not the entire ensemble, so that the new weights suppress the error.
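The dynamic weighting described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the linear-plus-softmax meta learner, the parameters `W` and `b`, and all function names are assumptions made for the example, since the abstract does not specify the network architecture.

```python
import math

def softmax(logits):
    """Turn raw scores into non-negative weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def meta_weights(instance, W, b):
    """Hypothetical linear meta learner: one logit per ensemble member.

    W has one row per ensemble member and one column per input feature.
    """
    logits = [
        sum(w_ij * x_j for w_ij, x_j in zip(row, instance)) + b_i
        for row, b_i in zip(W, b)
    ]
    return softmax(logits)

def ensemble_score(instance, member_scores, W, b):
    """Weighted sum of member anomaly scores; the weights depend on the instance."""
    weights = meta_weights(instance, W, b)
    return sum(w * s for w, s in zip(weights, member_scores))

if __name__ == "__main__":
    # Three ensemble members, two input features (toy values).
    W = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0, 0.0]
    print(ensemble_score([1.0, 2.0], [0.2, 0.8, 0.5], W, b))
```

With all parameters at zero the weights are uniform and the output reduces to the plain average of the member scores. Updating only `W` and `b` after expert feedback changes the weighting without retraining any ensemble member.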
We want to leverage the fact that different ensemble members have different characteristics, which make them more or less suitable for making predictions on certain instances. We weight the ensemble members using a neural network that takes the instance as input and weights each member according to its capacity to make a prediction for that instance. The loss used to train the neural network has two parts: a supervised part, loss_AAD, which uses the labels provided by a human expert, and a second part, loss_prior, which places a uniform prior on the ensemble members. When new labels are provided, the meta learner is updated so as not to misclassify any of the labeled instances.
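The two-part loss can be illustrated with a small sketch. The abstract does not give the exact form of either term, so both functions below are stand-ins chosen for the example: binary cross-entropy plays the role of the supervised part, and a squared distance from uniform weights plays the role of the prior part. The trade-off factor `lam` and the choice to apply the prior to every instance are likewise assumptions.

```python
import math

def supervised_loss(score, label):
    """Stand-in for the supervised part: binary cross-entropy.

    score is the weighted ensemble output in (0, 1); label is 1 for
    an anomaly and 0 for a normal instance.
    """
    eps = 1e-12
    score = min(max(score, eps), 1.0 - eps)  # avoid log(0)
    return -(label * math.log(score) + (1 - label) * math.log(1.0 - score))

def prior_loss(weights):
    """Stand-in for the prior part: squared distance from uniform weights 1/M."""
    m = len(weights)
    return sum((w - 1.0 / m) ** 2 for w in weights)

def total_loss(batch, lam=1.0):
    """Average loss over a batch of (weights, member_scores, label) triples.

    label is None for instances without expert feedback; those contribute
    only the prior term (an assumption of this sketch).
    """
    loss = 0.0
    for weights, member_scores, label in batch:
        loss += lam * prior_loss(weights)
        if label is not None:
            score = sum(w * s for w, s in zip(weights, member_scores))
            loss += supervised_loss(score, label)
    return loss / len(batch)

if __name__ == "__main__":
    batch = [
        ([0.5, 0.5], [0.9, 0.1], 1),     # labeled anomaly
        ([0.5, 0.5], [0.2, 0.3], None),  # unlabeled instance
    ]
    print(total_loss(batch))
```

The prior term keeps the weights near a plain average where no feedback exists, while the supervised term pulls them toward members that score the labeled instances correctly.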
The framework was tested on the Yahoo Webscope benchmark dataset, which consists of four different types of time series. The proposed framework achieved AUCs of 0.9088, 0.9787, 0.8998 and 0.8123 on the four datasets, which was the second-highest AUC on two of them and the third-highest on the other two among the models used for comparison.
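For readers unfamiliar with the metric, the AUC (area under the ROC curve) equals the probability that a randomly chosen anomaly receives a higher score than a randomly chosen normal point, with ties counted as one half. The sketch below computes it that way on toy data. The function name and the example values are illustrative only and are unrelated to the results reported above.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (anomaly, normal) pairs ranked correctly.

    labels: 1 for anomaly, 0 for normal. Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one anomaly and one normal point")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Toy data: two normal points followed by two anomalies.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise loop is quadratic in the number of points, which is acceptable for illustration; a rank-based formulation would be preferable on long time series.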
author
Lundgren, Jonas
type
H3 - Professional qualifications (4 Years - )
report number
TFRT-6101
other publication id
0280-5316
language
English
id
9020289
date added to LUP
2020-07-16 09:03:43
date last changed
2020-07-16 09:03:43
@misc{9020289,
  abstract     = {{In this thesis a framework for finding anomalies in streaming data is proposed. The framework proposed is not necessarily applicable only to problems in anomaly detection, but could be applied to other problems as well. There are three main concepts at play in the framework: (i) Active Learning, a learning algorithm which can query a human specialist for labels of instance such that the model can improve, from an otherwise unlabeled data set. (ii) Ensemble which is a combination of models, often weaker models, where the idea is that the combined result from all models will mitigate the error in every single model and thus provide better results. (iii) Metalearning which is the concept of having a second model learn model characteristics for a problem. In this thesis metalearning will be used to weight ensemble members.
 The framework is displayed in Figure 0.1. The meta learner takes instances as input and output weights for each ensemble member according to its performance of previous similar instances. Thus the total output is a dynamically weighted ensemble output where the weighting is based on
the input. When a human expert provides label feedback on misclassified instances only the meta learner is updated in order to provide new weights for the ensemble to suppress the error and not the entire ensemble.
 We want to leverage the fact that different ensemble members have different characteristics which makes them more or less suitable to make predictions for certain instances. We weight the ensemble members using a neural network, taking the instance as input to weight the ensemble members in accordance with their capacity to make a prediction for certain instances. The loss to train the neural network is composed of two parts, the first a supervised part lossAAD, using the labels provided by a human expert, and a second part lossprior which places a uniform prior on the ensemble members. When new labels are provided the meta learner is updated so as not to
misclassify any of the labeled instances. 
 The framework was tested on the Yahoo Webscope benchmark dataset consisting of four different types of time series. The proposed framework had an AUC of 0.9088, 0.9787, 0.8998 and 0.8123 for the four datasets corresponding to the second highest AUC for 2 data sets and third highest for the remaining 2 data sets out of the models that were used for comparison.}},
  author       = {{Lundgren, Jonas}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Anomaly Detection in Streaming Time Series Data Using Active Learning and Metalearning}},
  year         = {{2020}},
}