Lund University Publications

Getting Started with Chaos Engineering – design of an implementation framework in practice

Jernberg, Hugo; Runeson, Per and Engström, Emelie (2020)
Abstract
Background. Chaos Engineering is proposed as a practice to verify a system’s resilience under real, operational conditions. It employs fault injection, is originally developed at Netflix, and supported by several tools from there and other sources. Aims. We aim to introduce Chaos Engineering at ICA Gruppen AB, a group of companies whose core business is grocery retail, to improve their systems’ resilience, and to capture our knowledge gained from literature and interviews in a process framework for the introduction of Chaos Engineering. Method. The research is conducted under the design science paradigm, where the problem is conceptualized through a literature study of Chaos Engineering and exploratory interviews in the company. The solution framework is designed based on the literature and a tool survey, and validated by letting software engineers at ICA apply parts of it to the software systems of ica.se website, including its e-shop. Results. The main contributions are a synthesis of Chaos Engineering literature and tools, in depth understanding of the needs of the case company, and guidelines for introducing Chaos Engineering. Conclusions. The applied parts were concluded to be feasible and they successfully discovered a set of initial improvement opportunities for the system’s resilience, as well as a suitable Chaos Engineering practice for future resilience testing of the system. We recommend companies using the framework as a guide for the implementation of Chaos Engineering.
author
Jernberg, Hugo; Runeson, Per and Engström, Emelie
organization
publishing date
2020
type
Chapter in Book/Report/Conference proceeding
publication status
published
subject
host publication
Proceedings of the ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) Industry Track
editor
Cataldo, Marcelo and Ciolkowski, Marcus
article number
43
pages
10 pages
publisher
Association for Computing Machinery (ACM)
external identifiers
  • scopus:85095833015
ISBN
978-1-4503-7580-1
DOI
10.1145/3382494.3421464
project
WASP Software Engineering Cluster
language
English
LU publication?
yes
id
f5f14990-3b0e-43db-848a-66a44f01a841
date added to LUP
2020-09-17 17:10:53
date last changed
2022-05-12 06:42:49
@inproceedings{f5f14990-3b0e-43db-848a-66a44f01a841,
  abstract     = {{Background. Chaos Engineering is proposed as a practice to verify a system’s resilience under real, operational conditions. It employs fault injection, is originally developed at Netflix, and supported by several tools from there and other sources. Aims. We aim to introduce Chaos Engineering at ICA Gruppen AB, a group of companies whose core business is grocery retail, to improve their systems’ resilience, and to capture our knowledge gained from literature and interviews in a process framework for the introduction of Chaos Engineering. Method. The research is conducted under the design science paradigm, where the problem is conceptualized through a literature study of Chaos Engineering and exploratory interviews in the company. The solution framework is designed based on the literature and a tool survey, and validated by letting software engineers at ICA apply parts of it to the software systems of ica.se website, including its e-shop. Results. The main contributions are a synthesis of Chaos Engineering literature and tools, in depth understanding of the needs of the case company, and guidelines for introducing Chaos Engineering. Conclusions. The applied parts were concluded to be feasible and they successfully discovered a set of initial improvement opportunities for the system’s resilience, as well as a suitable Chaos Engineering practice for future resilience testing of the system. We recommend companies using the framework as a guide for the implementation of Chaos Engineering.}},
  author       = {{Jernberg, Hugo and Runeson, Per and Engström, Emelie}},
  booktitle    = {{Proceedings of the ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) Industry Track}},
  editor       = {{Cataldo, Marcelo and Ciolkowski, Marcus}},
  isbn         = {{978-1-4503-7580-1}},
  language     = {{eng}},
  publisher    = {{Association for Computing Machinery (ACM)}},
  title        = {{Getting Started with Chaos Engineering – design of an implementation framework in practice}},
  url          = {{http://dx.doi.org/10.1145/3382494.3421464}},
  doi          = {{10.1145/3382494.3421464}},
  year         = {{2020}},
}