Lund University Publications

Real-Time Anomaly Detection Using Distributed Tracing in Microservice Cloud Applications

Raeiszadeh, Mahsa; Ebrahimzadeh, Amin; Saleem, Ahsan; Glitho, Roch; Eker, Johan (LU) and Mini, Raquel (LU) (2023)
Abstract
Distributed tracing plays a vital role in microservice infrastructure, and learning-based trace analysis has been utilized to detect anomalies within such systems. However, existing approaches for learning-based trace-based anomaly detection face certain limitations. Some assume that trace patterns can be learned solely from normal executions, while others depend on anomaly injection to generate labeled traces categorized as normal or anomalous. However, in practical scenarios, anomalies may also happen during the normal execution. Moreover, a wide variety of anomalies may occur in practice, which cannot be captured solely through anomaly injection. To address these issues, we propose a Trace-Driven Anomaly Detection (TDAD) approach based on a Span Causal Graph (SCG) representation, which trains a model using a Graph Neural Network (GNN) and Positive and Unlabeled (PU) learning. This technique allows the model parameters to be optimized by estimating the underlying data distribution. As a result, TDAD can be effectively trained using a small number of labeled anomalous traces along with a relatively large number of unlabeled traces. Our evaluation reveals that TDAD outperforms not only the existing unsupervised trace-based anomaly detection methods by 11.9% in terms of F1-score but also a supervised learning-based benchmark by 12x in terms of detection time.
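For readers unfamiliar with PU learning, the sketch below shows one well-known way to train from a few labeled positives plus many unlabeled samples: the non-negative PU risk estimator. It is an illustration only and is not taken from the paper, since the abstract does not say which PU objective TDAD uses. The function names, the sigmoid surrogate loss, and the `prior` parameter (an assumed fraction of anomalous traces among the unlabeled ones) are all hypothetical choices made for this example. Here "positive" means a labeled anomalous trace, and each score stands in for the output of some trace model such as a GNN.

```python
import math


def sigmoid_loss(score: float, label: int) -> float:
    """Sigmoid surrogate loss 1 / (1 + exp(label * score)), label in {+1, -1}.

    Written with tanh so that large scores cannot overflow.
    """
    return 0.5 * (1.0 - math.tanh(0.5 * label * score))


def mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)


def nn_pu_risk(pos_scores, unl_scores, prior: float) -> float:
    """Non-negative PU risk estimate (illustrative, not the paper's objective).

    pos_scores: model scores for labeled anomalous traces (positives)
    unl_scores: model scores for unlabeled traces
    prior:      assumed share of anomalies among unlabeled traces (hypothetical)
    """
    # Risk of misclassifying true positives, weighted by the class prior.
    risk_pos = prior * mean(sigmoid_loss(s, +1) for s in pos_scores)
    # Negative-class risk: treat unlabeled data as negative, then subtract
    # the share of that loss that is attributable to hidden positives.
    risk_neg = mean(sigmoid_loss(s, -1) for s in unl_scores) - prior * mean(
        sigmoid_loss(s, -1) for s in pos_scores
    )
    # Clamping at zero keeps the estimate from going negative.
    return risk_pos + max(0.0, risk_neg)


if __name__ == "__main__":
    # Toy scores: higher means "more anomalous".
    labeled_anomalies = [2.1, 1.7, 3.0]
    unlabeled = [-2.0, -1.5, 0.3, -2.2, 2.5, -1.8]
    print(nn_pu_risk(labeled_anomalies, unlabeled, prior=0.2))
```

In a training loop this quantity would be minimized with respect to the model parameters that produce the scores. The class prior is usually unknown in practice and has to be estimated or tuned.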
author
Raeiszadeh, Mahsa; Ebrahimzadeh, Amin; Saleem, Ahsan; Glitho, Roch; Eker, Johan and Mini, Raquel
publishing date
2023-11
type
Chapter in Book/Report/Conference proceeding
publication status
published
host publication
Proceedings of IEEE CloudNet 2023
project
AORTA: Advanced Offloading for Real-Time Applications
language
English
LU publication?
no
id
1f629ae5-15c8-46b3-aa34-9ac31d7d407c
date added to LUP
2023-11-07 21:43:42
date last changed
2023-12-04 07:48:54
@inproceedings{1f629ae5-15c8-46b3-aa34-9ac31d7d407c,
  abstract     = {{Distributed tracing plays a vital role in microservice infrastructure, and learning-based trace analysis has been utilized to detect anomalies within such systems. However, existing approaches for learning-based trace-based anomaly detection face certain limitations. Some assume that trace patterns can be learned solely from normal executions, while others depend on anomaly injection to generate labeled traces categorized as normal or anomalous. However, in practical scenarios, anomalies may also happen during the normal execution. Moreover, a wide variety of anomalies may occur in practice, which cannot be captured solely through anomaly injection. To address these issues, we propose a Trace-Driven Anomaly Detection (TDAD) approach based on a Span Causal Graph (SCG) representation, which trains a model using a Graph Neural Network (GNN) and Positive and Unlabeled (PU) learning. This technique allows the model parameters to be optimized by estimating the underlying data distribution. As a result, TDAD can be effectively trained using a small number of labeled anomalous traces along with a relatively large number of unlabeled traces. Our evaluation reveals that TDAD outperforms not only the existing unsupervised trace-based anomaly detection methods by 11.9% in terms of F1-score but also a supervised learning-based benchmark by 12x in terms of detection time.}},
  author       = {{Raeiszadeh, Mahsa and Ebrahimzadeh, Amin and Saleem, Ahsan and Glitho, Roch and Eker, Johan and Mini, Raquel}},
  booktitle    = {{Proceedings of IEEE CloudNet 2023}},
  language     = {{eng}},
  month        = {{11}},
  title        = {{Real-Time Anomaly Detection Using Distributed Tracing in Microservice Cloud Applications}},
  url          = {{https://lup.lub.lu.se/search/files/165150749/Real_Time_Anomaly_Detection_Using_Distributed_Tracing_in_Microservice_Cloud_Applications.pdf}},
  year         = {{2023}},
}