An implementation and evaluation of different change point detection algorithms for identification of performance regressions in systems
(2025) In Master's Theses in Mathematical Sciences FMSM01 20242Mathematical Statistics
- Abstract
- This project was completed in collaboration with the graph database company Neo4j with the aim of developing a new algorithm for automatic detection of software bugs that degrade system performance, commonly referred to as performance regressions in the industry. Due to the large number of benchmarks used to monitor different
aspects of the product environment, a flexible and robust solution was needed that could perform well over various data sequences with differing characteristics. Change point detection is a field within data analytics and statistics, originating from the introduction of quality control protocols in the manufacturing industry. It is aimed at identifying the presence, and often also the location, of changes in the... (More) - This project was completed in collaboration with the graph database company Neo4j with the aim of developing a new algorithm for automatic detection of software bugs that degrade system performance, commonly referred to as performance regressions in the industry. Due to the large number of benchmarks used to monitor different
aspects of the product environment, a flexible and robust solution was needed that could perform well over various data sequences with differing characteristics. Change point detection is a field within data analytics and statistics, originating from the introduction of quality control protocols in the manufacturing industry. It is aimed at identifying the presence, and often also the location, of changes in the probability distribution in ordered data. When benchmarks are run, the output provides a measure of either latency---the time it takes to execute the task---or throughput---the number of tasks performed in a unit of time---and a sequence of such test results can be regarded as a stochastic process. Performance regressions are typically characterised by an increase in latency or a decrease in throughput, i.e. a change in the mean of the underlying distribution of the process. Three different change point detection algorithms were implemented in the project; one based on the traditional CUSUM algorithm, one based on energy statistics and one that is kernel based. In addition, variations on the search approach and hypothesis testing were applied to the three algorithms for further comparison. In evaluating the alternative implementations, simulated data was used to test the algorithms’ performance on sequences designed to highlight specific interesting properties. The final results indicated that the energy statistics algorithm did perform best overall. However, user-specific needs may require deeper analysis to ensure that the selected algorithm is efficient in handling any key scenarios. (Less) - Popular Abstract
- While the statistical tools to perform quality controls were developed for the manufacturing industry in the 1920s, ensuring high product quality is just as relevant to modern-day software companies. So, can the same concepts be adapted to this digital setting?
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9185477
- author
- Wahldén, Klara LU
- supervisor
- organization
- course
- FMSM01 20242
- year
- 2025
- type
- H2 - Master's Degree (Two Years)
- subject
- keywords
- Change point detection, Performance regressions, Benchmarking
- publication/series
- Master's Theses in Mathematical Sciences
- report number
- LUTFMS-3512--2025
- ISSN
- 1404-6342
- other publication id
- 2025:E9
- language
- English
- id
- 9185477
- date added to LUP
- 2025-02-19 13:25:57
- date last changed
- 2025-04-08 16:05:44
@misc{9185477, abstract = {{This project was completed in collaboration with the graph database company Neo4j with the aim of developing a new algorithm for automatic detection of software bugs that degrade system performance, commonly referred to as performance regressions in the industry. Due to the large number of benchmarks used to monitor different aspects of the product environment, a flexible and robust solution was needed that could perform well over various data sequences with differing characteristics. Change point detection is a field within data analytics and statistics, originating from the introduction of quality control protocols in the manufacturing industry. It is aimed at identifying the presence, and often also the location, of changes in the probability distribution in ordered data. When benchmarks are run, the output provides a measure of either latency---the time it takes to execute the task---or throughput---the number of tasks performed in a unit of time---and a sequence of such test results can be regarded as a stochastic process. Performance regressions are typically characterised by an increase in latency or a decrease in throughput, i.e. a change in the mean of the underlying distribution of the process. Three different change point detection algorithms were implemented in the project; one based on the traditional CUSUM algorithm, one based on energy statistics and one that is kernel based. In addition, variations on the search approach and hypothesis testing were applied to the three algorithms for further comparison. In evaluating the alternative implementations, simulated data was used to test the algorithms’ performance on sequences designed to highlight specific interesting properties. The final results indicated that the energy statistics algorithm did perform best overall. However, user-specific needs may require deeper analysis to ensure that the selected algorithm is efficient in handling any key scenarios.}}, author = {{Wahldén, Klara}}, issn = {{1404-6342}}, language = {{eng}}, note = {{Student Paper}}, series = {{Master's Theses in Mathematical Sciences}}, title = {{An implementation and evaluation of different change point detection algorithms for identification of performance regressions in systems}}, year = {{2025}}, }