Skip to main content

Lund University Publications

LUND UNIVERSITY LIBRARIES

Asynchronous Real-Time Federated Learning for Anomaly Detection in Microservice Cloud Applications

Raeiszadeh, Mahsa ; Ebrahimzadeh, Amin ; Glitho, Roch ; Eker, Johan LU orcid and Mini, Raquel LU (2025) In IEEE Transactions on Machine Learning in Communications and Networking 3.
Abstract
The complexity and dynamicity of microservice architectures in cloud environments present substantial challenges to the reliability and availability of the services built on these architectures. Therefore, effective anomaly detection is crucial to prevent impending failures and resolve them promptly. Distributed data analysis techniques based on machine learning (ML) have recently gained attention in detecting anomalies in microservice systems. ML-based anomaly detection techniques mostly require centralized data collection and processing, which may raise scalability and computational issues in practice. In this paper, we propose an Asynchronous Real-Time Federated Learning (ART-FL) approach for anomaly detection in cloud-based... (More)
The complexity and dynamicity of microservice architectures in cloud environments present substantial challenges to the reliability and availability of the services built on these architectures. Therefore, effective anomaly detection is crucial to prevent impending failures and resolve them promptly. Distributed data analysis techniques based on machine learning (ML) have recently gained attention in detecting anomalies in microservice systems. ML-based anomaly detection techniques mostly require centralized data collection and processing, which may raise scalability and computational issues in practice. In this paper, we propose an Asynchronous Real-Time Federated Learning (ART-FL) approach for anomaly detection in cloud-based microservice systems. In our approach, edge clients perform real-time learning with continuous streaming local data. At the edge clients, we model intra-service behaviors and inter-service dependencies in multi-source distributed data based on a Span Causal Graph (SCG) representation and train a model through a combination of Graph Neural Network (GNN) and Positive and Unlabeled (PU) learning. Our FL approach updates the global model in an asynchronous manner to achieve accurate and efficient anomaly detection, addressing computational overhead across diverse edge clients, including those that experience delays. Our trace-driven evaluations indicate that the proposed method outperforms the state-of-the-art anomaly detection methods by 4% in terms of F1 -score while meeting the given time efficiency and scalability requirements. (Less)
Abstract (Swedish)
The complexity and dynamicity of microservice architectures in cloud environments present substantial challenges to the reliability and availability of the services built on these architectures. Therefore, effective anomaly detection is crucial to prevent impending failures and resolve them promptly. Distributed data analysis techniques based on machine learning (ML) have recently gained attention in detecting anomalies in microservice systems. ML-based anomaly detection techniques mostly require centralized data collection and processing, which may raise scalability and computational issues in practice. In this paper, we propose an Asynchronous Real-Time Federated Learning (ART-FL) approach for anomaly detection in cloud-based... (More)
The complexity and dynamicity of microservice architectures in cloud environments present substantial challenges to the reliability and availability of the services built on these architectures. Therefore, effective anomaly detection is crucial to prevent impending failures and resolve them promptly. Distributed data analysis techniques based on machine learning (ML) have recently gained attention in detecting anomalies in microservice systems. ML-based anomaly detection techniques mostly require centralized data collection and processing, which may raise scalability and computational issues in practice. In this paper, we propose an Asynchronous Real-Time Federated Learning (ART-FL) approach for anomaly detection in cloud-based microservice systems. In our approach, edge clients perform real-time learning with continuous streaming local data. At the edge clients, we model intra-service behaviors and inter-service dependencies in multi-source distributed data based on a Span Causal Graph (SCG) representation and train a model through a combination of Graph Neural Network (GNN) and Positive and Unlabeled (PU) learning. Our FL approach updates the global model in an asynchronous manner to achieve accurate and efficient anomaly detection, addressing computational overhead across diverse edge clients, including those that experience delays. Our trace-driven evaluations indicate that the proposed method outperforms the state-of-the-art anomaly detection methods by 4% in terms of F1 -score while meeting the given time efficiency and scalability requirements. (Less)
Please use this url to cite or link to this publication:
author
; ; ; and
organization
publishing date
type
Contribution to journal
publication status
published
subject
keywords
Cloud computing
in
IEEE Transactions on Machine Learning in Communications and Networking
volume
3
publisher
IEEE - Institute of Electrical and Electronics Engineers Inc.
ISSN
2831-316X
DOI
10.1109/TMLCN.2025.3527919
project
AORTA: Advanced Offloading for Real-Time Applications
language
English
LU publication?
yes
id
cdab3389-7db2-4f93-92f5-ef71483f8483
date added to LUP
2025-01-07 14:17:08
date last changed
2025-04-04 14:45:03
@article{cdab3389-7db2-4f93-92f5-ef71483f8483,
  abstract     = {{The complexity and dynamicity of microservice architectures in cloud environments present substantial challenges to the reliability and availability of the services built on these architectures. Therefore, effective anomaly detection is crucial to prevent impending failures and resolve them promptly. Distributed data analysis techniques based on machine learning (ML) have recently gained attention in detecting anomalies in microservice systems. ML-based anomaly detection techniques mostly require centralized data collection and processing, which may raise scalability and computational issues in practice. In this paper, we propose an Asynchronous Real-Time Federated Learning (ART-FL) approach for anomaly detection in cloud-based microservice systems. In our approach, edge clients perform real-time learning with continuous streaming local data. At the edge clients, we model intra-service behaviors and inter-service dependencies in multi-source distributed data based on a Span Causal Graph (SCG) representation and train a model through a combination of Graph Neural Network (GNN) and Positive and Unlabeled (PU) learning. Our FL approach updates the global model in an asynchronous manner to achieve accurate and efficient anomaly detection, addressing computational overhead across diverse edge clients, including those that experience delays. Our trace-driven evaluations indicate that the proposed method outperforms the state-of-the-art anomaly detection methods by 4% in terms of F1 -score while meeting the given time efficiency and scalability requirements.}},
  author       = {{Raeiszadeh, Mahsa and Ebrahimzadeh, Amin and Glitho, Roch and Eker, Johan and Mini, Raquel}},
  issn         = {{2831-316X}},
  keywords     = {{Cloud computing}},
  language     = {{eng}},
  month        = {{01}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  series       = {{IEEE Transactions on Machine Learning in Communications and Networking}},
  title        = {{Asynchronous Real-Time Federated Learning for Anomaly Detection in Microservice Cloud Applications}},
  url          = {{http://dx.doi.org/10.1109/TMLCN.2025.3527919}},
  doi          = {{10.1109/TMLCN.2025.3527919}},
  volume       = {{3}},
  year         = {{2025}},
}