Enhancing DevOps with Autonomous Monitors: A Proactive Approach to Failure Detection
(2024)
- Abstract
- Software engineering practices, including continuous integration, continuous testing, and continuous deployment, aim to streamline and automate the software development process. A cultural and professional movement that builds upon continuous practices, DevOps, seeks to bridge the gap between development and operations. By fostering a collaborative environment, DevOps supports faster, more frequent, and reliable software releases, inherently promoting agile methodologies throughout the software development lifecycle.
By introducing agility, there is a higher risk of operational failures in cloud-based software systems. Recognizing this challenge, the objective of this thesis is to understand and present approaches for mitigating the cascading effects of operational failures across interconnected system components. In collaboration with two Swedish companies, we investigated how proactive monitoring strategies inspired by state-of-the-art machine learning (ML) solutions can prevent failure propagation and ensure seamless system operations.
The conducted research activities span from practice to theory and from problem to solution domain, including problem conceptualization, solution design, instantiation, and empirical validation. This complies with the main principles of the design science paradigm mainly used to frame problem-driven studies aiming to improve specific areas of practice.
The main contributions of this thesis are threefold. First, an in-depth overview of operational challenges and matching solutions in cloud-based software systems, focusing on alert management and monitoring data through two case studies and extensive literature reviews. Second, a proactive alert strategy called autonomous monitors to enhance early detection and prevention of operational failures. Finally, the practical applicability of these monitors is confirmed via empirical studies, highlighting their effectiveness in various industrial contexts.
We demonstrated the practical effectiveness of the proposed ML-based monitoring solution to pave the way for its widespread adoption for enhancing DevOps.
Please use this URL to cite or link to this publication:
https://lup.lub.lu.se/record/16456ba3-7482-4846-956c-8cd4782fc557
- author
- Hrusto, Adha
LU
- supervisor
- Per Runeson LU
- Emelie Engström LU
- Magnus C Ohlsson LU
- opponent
- Prof. Mäntylä, Mika, Helsinki University, Finland.
- publishing date
- 2024-10-01
- type
- Thesis
- publication status
- published
- pages
- 194 pages
- publisher
- Computer Science, Lund University
- defense location
- Lecture Hall E:A, building E, Klas Anshelms väg 10, Faculty of Engineering LTH, Lund University, Lund. The dissertation will be live streamed, but part of the premises is to be excluded from the live stream. Zoom: https://lu-se.zoom.us/j/68537887917
- defense date
- 2024-11-08 09:15:00
- ISBN
- 978-91-8104-210-8
- 978-91-8104-209-2
- language
- English
- LU publication?
- yes
- id
- 16456ba3-7482-4846-956c-8cd4782fc557
- date added to LUP
- 2024-10-01 14:14:22
- date last changed
- 2024-10-18 09:17:10
@phdthesis{16456ba3-7482-4846-956c-8cd4782fc557,
  abstract  = {{Software engineering practices, including continuous integration, continuous testing, and continuous deployment, aim to streamline and automate the software development process. A cultural and professional movement that builds upon continuous practices, DevOps, seeks to bridge the gap between development and operations. By fostering a collaborative environment, DevOps supports faster, more frequent, and reliable software releases, inherently promoting agile methodologies throughout the software development lifecycle.

By introducing agility, there is a higher risk of operational failures in cloud-based software systems. Recognizing this challenge, the objective of this thesis is to understand and present approaches for mitigating the cascading effects of operational failures across interconnected system components. In collaboration with two Swedish companies, we investigated how proactive monitoring strategies inspired by state-of-the-art machine learning (ML) solutions can prevent failure propagation and ensure seamless system operations.

The conducted research activities span from practice to theory and from problem to solution domain, including problem conceptualization, solution design, instantiation, and empirical validation. This complies with the main principles of the design science paradigm mainly used to frame problem-driven studies aiming to improve specific areas of practice.

The main contributions of this thesis are threefold. First, an in-depth overview of operational challenges and matching solutions in cloud-based software systems, focusing on alert management and monitoring data through two case studies and extensive literature reviews. Second, a proactive alert strategy called autonomous monitors to enhance early detection and prevention of operational failures. Finally, the practical applicability of these monitors is confirmed via empirical studies, highlighting their effectiveness in various industrial contexts.

We demonstrated the practical effectiveness of the proposed ML-based monitoring solution to pave the way for its widespread adoption for enhancing DevOps.}},
  author    = {{Hrusto, Adha}},
  isbn      = {{978-91-8104-210-8}},
  language  = {{eng}},
  month     = {{10}},
  publisher = {{Computer Science, Lund University}},
  school    = {{Lund University}},
  title     = {{Enhancing DevOps with Autonomous Monitors: A Proactive Approach to Failure Detection}},
  url       = {{https://lup.lub.lu.se/search/files/196213572/PHD_Thesis_Adha_Hrusto.pdf}},
  year      = {{2024}},
}