Skip to main content

LUP Student Papers

LUND UNIVERSITY LIBRARIES

An implementation and evaluation of different change point detection algorithms for identification of performance regressions in systems

Wahldén, Klara LU (2025) In Master's Theses in Mathematical Sciences FMSM01 20242
Mathematical Statistics
Abstract
This project was completed in collaboration with the graph database company Neo4j with the aim of developing a new algorithm for automatic detection of software bugs that degrade system performance, commonly referred to as performance regressions in the industry. Due to the large number of benchmarks used to monitor different
aspects of the product environment, a flexible and robust solution was needed that could perform well over various data sequences with differing characteristics. Change point detection is a field within data analytics and statistics, originating from the introduction of quality control protocols in the manufacturing industry. It is aimed at identifying the presence, and often also the location, of changes in the... (More)
This project was completed in collaboration with the graph database company Neo4j with the aim of developing a new algorithm for automatic detection of software bugs that degrade system performance, commonly referred to as performance regressions in the industry. Due to the large number of benchmarks used to monitor different
aspects of the product environment, a flexible and robust solution was needed that could perform well over various data sequences with differing characteristics. Change point detection is a field within data analytics and statistics, originating from the introduction of quality control protocols in the manufacturing industry. It is aimed at identifying the presence, and often also the location, of changes in the probability distribution in ordered data. When benchmarks are run, the output provides a measure of either latency---the time it takes to execute the task---or throughput---the number of tasks performed in a unit of time---and a sequence of such test results can be regarded as a stochastic process. Performance regressions are typically characterised by an increase in latency or a decrease in throughput, i.e. a change in the mean of the underlying distribution of the process. Three different change point detection algorithms were implemented in the project; one based on the traditional CUSUM algorithm, one based on energy statistics and one that is kernel based. In addition, variations on the search approach and hypothesis testing were applied to the three algorithms for further comparison. In evaluating the alternative implementations, simulated data was used to test the algorithms’ performance on sequences designed to highlight specific interesting properties. The final results indicated that the energy statistics algorithm did perform best overall. However, user-specific needs may require deeper analysis to ensure that the selected algorithm is efficient in handling any key scenarios. (Less)
Popular Abstract
While the statistical tools to perform quality controls were developed for the manufacturing industry in the 1920s, ensuring high product quality is just as relevant to modern-day software companies. So, can the same concepts be adapted to this digital setting?
Please use this url to cite or link to this publication:
author
Wahldén, Klara LU
supervisor
organization
course
FMSM01 20242
year
type
H2 - Master's Degree (Two Years)
subject
keywords
Change point detection, Performance regressions, Benchmarking
publication/series
Master's Theses in Mathematical Sciences
report number
LUTFMS-3512--2025
ISSN
1404-6342
other publication id
2025:E9
language
English
id
9185477
date added to LUP
2025-02-19 13:25:57
date last changed
2025-04-08 16:05:44
@misc{9185477,
  abstract     = {{This project was completed in collaboration with the graph database company Neo4j with the aim of developing a new algorithm for automatic detection of software bugs that degrade system performance, commonly referred to as performance regressions in the industry. Due to the large number of benchmarks used to monitor different
aspects of the product environment, a flexible and robust solution was needed that could perform well over various data sequences with differing characteristics. Change point detection is a field within data analytics and statistics, originating from the introduction of quality control protocols in the manufacturing industry. It is aimed at identifying the presence, and often also the location, of changes in the probability distribution in ordered data. When benchmarks are run, the output provides a measure of either latency---the time it takes to execute the task---or throughput---the number of tasks performed in a unit of time---and a sequence of such test results can be regarded as a stochastic process. Performance regressions are typically characterised by an increase in latency or a decrease in throughput, i.e. a change in the mean of the underlying distribution of the process. Three different change point detection algorithms were implemented in the project; one based on the traditional CUSUM algorithm, one based on energy statistics and one that is kernel based. In addition, variations on the search approach and hypothesis testing were applied to the three algorithms for further comparison. In evaluating the alternative implementations, simulated data was used to test the algorithms’ performance on sequences designed to highlight specific interesting properties. The final results indicated that the energy statistics algorithm did perform best overall. However, user-specific needs may require deeper analysis to ensure that the selected algorithm is efficient in handling any key scenarios.}},
  author       = {{Wahldén, Klara}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{An implementation and evaluation of different change point detection algorithms for identification of performance regressions in systems}},
  year         = {{2025}},
}