Asynchronous Real-Time Federated Learning for Anomaly Detection in Microservice Cloud Applications
(2025) In IEEE Transactions on Machine Learning in Communications and Networking 3.
- Abstract
- The complexity and dynamicity of microservice architectures in cloud environments present substantial challenges to the reliability and availability of the services built on these architectures. Therefore, effective anomaly detection is crucial to prevent impending failures and resolve them promptly. Distributed data analysis techniques based on machine learning (ML) have recently gained attention in detecting anomalies in microservice systems. ML-based anomaly detection techniques mostly require centralized data collection and processing, which may raise scalability and computational issues in practice. In this paper, we propose an Asynchronous Real-Time Federated Learning (ART-FL) approach for anomaly detection in cloud-based microservice systems. In our approach, edge clients perform real-time learning with continuous streaming local data. At the edge clients, we model intra-service behaviors and inter-service dependencies in multi-source distributed data based on a Span Causal Graph (SCG) representation and train a model through a combination of Graph Neural Network (GNN) and Positive and Unlabeled (PU) learning. Our FL approach updates the global model in an asynchronous manner to achieve accurate and efficient anomaly detection, addressing computational overhead across diverse edge clients, including those that experience delays. Our trace-driven evaluations indicate that the proposed method outperforms the state-of-the-art anomaly detection methods by 4% in terms of F1-score while meeting the given time efficiency and scalability requirements.
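The abstract does not state how the global model is updated, so the following is only a rough sketch of what an asynchronous update can look like in general. It uses a generic staleness-discounted mixing rule (in the style of FedAsync), which is an assumption and not the ART-FL rule from the paper. The class `AsyncServer`, the parameter `base_mix`, and the toy weight vectors are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AsyncServer:
    """Toy parameter server that merges client updates as they arrive."""
    weights: List[float]
    base_mix: float = 0.5  # hypothetical mixing rate, not taken from the paper
    version: int = 0       # number of client updates merged so far

    def merge(self, client_weights: List[float], client_version: int) -> float:
        """Blend one client's model into the global model and return the rate used.

        The mixing rate shrinks with staleness, i.e. with the number of global
        updates merged since the client last pulled the model.
        """
        staleness = self.version - client_version
        alpha = self.base_mix / (1.0 + staleness)
        self.weights = [
            (1.0 - alpha) * g + alpha * c
            for g, c in zip(self.weights, client_weights)
        ]
        self.version += 1
        return alpha


server = AsyncServer(weights=[0.0, 0.0])
# A client that pulled version 0 reports first, so its update is fresh.
alpha_fresh = server.merge([1.0, 1.0], client_version=0)
# A delayed client that also pulled version 0 reports one merge later.
alpha_stale = server.merge([1.0, 1.0], client_version=0)
```

In this sketch the server never waits for a full round of clients: each update is merged on arrival, and a delayed client simply contributes with a smaller weight.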
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/cdab3389-7db2-4f93-92f5-ef71483f8483
- author
- Raeiszadeh, Mahsa; Ebrahimzadeh, Amin; Glitho, Roch; Eker, Johan (LU) and Mini, Raquel (LU)
- organization
- publishing date
- 2025-01-04
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- Cloud computing
- in
- IEEE Transactions on Machine Learning in Communications and Networking
- volume
- 3
- publisher
- IEEE - Institute of Electrical and Electronics Engineers Inc.
- ISSN
- 2831-316X
- DOI
- 10.1109/TMLCN.2025.3527919
- project
- AORTA: Advanced Offloading for Real-Time Applications
- language
- English
- LU publication?
- yes
- id
- cdab3389-7db2-4f93-92f5-ef71483f8483
- date added to LUP
- 2025-01-07 14:17:08
- date last changed
- 2025-04-04 14:45:03
@article{cdab3389-7db2-4f93-92f5-ef71483f8483,
  abstract  = {{The complexity and dynamicity of microservice architectures in cloud environments present substantial challenges to the reliability and availability of the services built on these architectures. Therefore, effective anomaly detection is crucial to prevent impending failures and resolve them promptly. Distributed data analysis techniques based on machine learning (ML) have recently gained attention in detecting anomalies in microservice systems. ML-based anomaly detection techniques mostly require centralized data collection and processing, which may raise scalability and computational issues in practice. In this paper, we propose an Asynchronous Real-Time Federated Learning (ART-FL) approach for anomaly detection in cloud-based microservice systems. In our approach, edge clients perform real-time learning with continuous streaming local data. At the edge clients, we model intra-service behaviors and inter-service dependencies in multi-source distributed data based on a Span Causal Graph (SCG) representation and train a model through a combination of Graph Neural Network (GNN) and Positive and Unlabeled (PU) learning. Our FL approach updates the global model in an asynchronous manner to achieve accurate and efficient anomaly detection, addressing computational overhead across diverse edge clients, including those that experience delays. Our trace-driven evaluations indicate that the proposed method outperforms the state-of-the-art anomaly detection methods by 4% in terms of F1-score while meeting the given time efficiency and scalability requirements.}},
  author    = {{Raeiszadeh, Mahsa and Ebrahimzadeh, Amin and Glitho, Roch and Eker, Johan and Mini, Raquel}},
  issn      = {{2831-316X}},
  keywords  = {{Cloud computing}},
  language  = {{eng}},
  month     = {{01}},
  publisher = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  series    = {{IEEE Transactions on Machine Learning in Communications and Networking}},
  title     = {{Asynchronous Real-Time Federated Learning for Anomaly Detection in Microservice Cloud Applications}},
  url       = {{http://dx.doi.org/10.1109/TMLCN.2025.3527919}},
  doi       = {{10.1109/TMLCN.2025.3527919}},
  volume    = {{3}},
  year      = {{2025}},
}