Real-World Applications of Anomaly Detection : Detecting the Unexpected Through Distributional Modelling
(2026) In Licentiate theses in mathematical sciences 2027(1).
- Abstract
- In many machine learning tasks, the problem is framed around predetermined targets and clear expectations of model behaviour. In such cases, the optimal mapping between inputs and outputs is directly defined and can be learned given sufficiently large datasets and models. In many real-world scenarios, however, tasks are less well-posed and are instead defined around detecting the unexpected: the anomalies.
There are many ways of modelling the distribution of data points, but for complex high-dimensional data such as images, traditional parametric distributions often fall short. Strong non-linear dependencies between pixel values and the cluster-like structure of natural categories make image distributions difficult to model. Recent years have instead seen advances from treating neural networks as parametric distributions in order to construct probabilistic models of natural images.
This thesis investigates how such methods hold up in real-world applications. Modelling data in the wild poses several challenges that do not arise under the controlled conditions of many benchmarks. At the same time, applying these methods in real-world settings allows them to be evaluated on their impact and usefulness for downstream tasks. By moving research and method development closer to the intended applications, this thesis aims to highlight some of the benefits of bridging the gap between theory and practice.
This thesis contains three main research contributions. The first is a theoretical method development paper that examines the statistical and machine learning techniques used in anomaly detection. It investigates how conditional distributions can be better modelled in variational autoencoder (VAE) models. Such models commonly use class-conditional clusters whose positions are fully learned. This paper finds that VAE-style models can generalize better when a small amount of rigidity is imposed on the cluster positions.
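The abstract does not say how rigidity is imposed on the cluster positions. As a hedged illustration only, the sketch below assumes a class-conditional Gaussian prior N(mu_c, I) whose centre mu_c is a fixed anchor plus a learned offset squashed by tanh and scaled by a hypothetical `max_shift`. Setting `max_shift = 0` gives fully fixed clusters, while large values approach freely learned ones. The names `fixed_anchors`, `prior_mean`, `kl_to_class_prior`, `offset`, and `max_shift` are illustrative assumptions, not taken from the thesis.

```python
import math

# Hypothetical sketch: a class-conditional Gaussian prior whose cluster centres
# are fixed anchors plus a bounded learned offset. This is an assumed
# parametrization, not necessarily the formulation used in the thesis.

def fixed_anchors(num_classes, latent_dim, scale=3.0):
    """Fixed cluster anchors: class c is placed at scale * e_c in latent space."""
    if latent_dim < num_classes:
        raise ValueError("latent_dim must be >= num_classes for one-hot anchors")
    return [[scale if d == c else 0.0 for d in range(latent_dim)]
            for c in range(num_classes)]

def prior_mean(anchor, offset, max_shift):
    """Semi-rigid cluster centre: tanh bounds the learned offset, so each
    coordinate moves at most max_shift away from its anchor.
    max_shift = 0 gives fully fixed clusters."""
    return [a + max_shift * math.tanh(o) for a, o in zip(anchor, offset)]

def kl_to_class_prior(mu_q, logvar_q, mu_p):
    """KL( N(mu_q, diag(exp(logvar_q))) || N(mu_p, I) ), summed over latent dims.
    This is the regularization term of the ELBO for a sample of class c."""
    return sum(0.5 * (math.exp(lq) + (mq - mp) ** 2 - 1.0 - lq)
               for mq, lq, mp in zip(mu_q, logvar_q, mu_p))

anchors = fixed_anchors(num_classes=2, latent_dim=3)
centre = prior_mean(anchors[0], offset=[100.0, -100.0, 0.0], max_shift=0.1)
print(centre)  # each coordinate stays within 0.1 of the anchor [3.0, 0.0, 0.0]
```

In a full model, `offset` would be a trainable parameter per class and `mu_q`, `logvar_q` would come from the encoder. Increasing `max_shift` interpolates toward fully learned cluster positions.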
The second paper applies these techniques to breast cancer diagnosis. Traditional mammography is a reliable way of diagnosing breast cancer but is not available globally due to economic constraints. Point-of-care ultrasound (POCUS) is a promising alternative; however, such images are harder to capture and can contain artifacts that make diagnosis difficult. By modelling the distribution of properly captured POCUS images, we are able to filter out images whose artifacts make them unsuitable for diagnosis.
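A common pattern for this kind of filtering, sketched here under stated assumptions and not as the thesis's pipeline, is to score each image by its negative log-likelihood under a density model fitted to well-captured images and to reject images whose score exceeds a threshold chosen on in-distribution data. To keep the sketch self-contained, a diagonal Gaussian over hypothetical precomputed feature vectors stands in for the VAE; the 95% quantile and all function names are illustrative choices.

```python
import math

# Hypothetical sketch: a diagonal Gaussian over precomputed image features stands
# in for the deep generative model (e.g. a VAE) used to score POCUS images.

def fit_diag_gaussian(samples):
    """Fit per-dimension mean and variance to in-distribution feature vectors."""
    n, dim = len(samples), len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dim)]
    variances = [max(sum((s[d] - means[d]) ** 2 for s in samples) / n, 1e-6)
                 for d in range(dim)]
    return means, variances

def anomaly_score(x, means, variances):
    """Negative log-likelihood of x; higher scores indicate less typical images."""
    return sum(0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, means, variances))

def quantile_threshold(scores, q=0.95):
    """Score below which roughly a fraction q of in-distribution scores fall."""
    ordered = sorted(scores)
    return ordered[min(int(q * len(ordered)), len(ordered) - 1)]

def filter_images(features, means, variances, threshold):
    """Return indices of images kept for diagnosis; the rest are flagged."""
    return [i for i, x in enumerate(features)
            if anomaly_score(x, means, variances) <= threshold]

good = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
means, variances = fit_diag_gaussian(good)
threshold = quantile_threshold([anomaly_score(x, means, variances) for x in good])
print(filter_images([[0.5, 0.5], [10.0, 10.0]], means, variances, threshold))  # [0]
```

In practice the threshold would be set on a held-out set of properly captured images rather than on the data used to fit the model, so that it reflects how typical unseen good images score.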
Paper three applies distributional modelling to the agricultural sector, using graph neural networks to model how crop yield is distributed over fields. Based on publicly available remote sensing data from the Sentinel-1 and Sentinel-2 satellites, the model is able to estimate how harvest levels were distributed in past years and how yield will vary in future years. The goal of this study is to give farmers more information on how yield is distributed, thereby reducing costs and mitigating eutrophication caused by over-fertilization.
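The abstract does not describe the network architecture. As an illustrative sketch only, assume a field is divided into cells that form graph nodes, with edges between adjacent cells and hypothetical per-cell features derived from Sentinel-1 and Sentinel-2 imagery. One GraphSAGE-style mean-aggregation layer, followed by a head that predicts a Gaussian mean and standard deviation per cell, shows how neighbouring cells can inform each prediction. Weights are set by hand rather than trained, and every name below is hypothetical.

```python
import math

# Hypothetical sketch: field cells are graph nodes, edges join adjacent cells, and
# node features are per-cell statistics derived from satellite imagery.

def softplus(x):
    """Numerically stable log(1 + exp(x)), used to keep standard deviations positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def mean_aggregate(features, neighbours, i):
    """Average the feature vectors of node i's neighbours (zeros if it has none)."""
    nbrs = neighbours.get(i, [])
    if not nbrs:
        return [0.0] * len(features[i])
    return [sum(features[j][d] for j in nbrs) / len(nbrs)
            for d in range(len(features[i]))]

def sage_layer(features, neighbours, w_self, w_neigh):
    """GraphSAGE-style layer: h_i' = relu(W_self h_i + W_neigh * mean of neighbour h_j)."""
    out = []
    for i, h_i in enumerate(features):
        agg = mean_aggregate(features, neighbours, i)
        pre = [a + b for a, b in zip(matvec(w_self, h_i), matvec(w_neigh, agg))]
        out.append([max(0.0, v) for v in pre])
    return out

def gaussian_head(h, w_mean, w_std):
    """Per-cell predicted yield distribution as a (mean, standard deviation) pair."""
    return [(sum(a * b for a, b in zip(w_mean, h_i)),
             softplus(sum(a * b for a, b in zip(w_std, h_i))))
            for h_i in h]

# Three cells in a row with one hypothetical feature each; weights set by hand.
features = [[1.0], [2.0], [3.0]]
neighbours = {0: [1], 1: [0, 2], 2: [1]}
hidden = sage_layer(features, neighbours, w_self=[[1.0]], w_neigh=[[1.0]])
print(gaussian_head(hidden, w_mean=[1.0], w_std=[0.0]))
# means 3.0, 4.0, 5.0; standard deviation log(2) for every cell
```

Predicting a distribution per cell, rather than a single value, is one way to express the uncertainty in how yield is spread over a field; a trained model would learn `w_self`, `w_neigh`, `w_mean`, and `w_std` from historical harvest data.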
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/3e4deb70-61cf-4efc-9a7a-ca91c4126666
- author
- Åström, Oskar LU
- supervisor
- Alexandros Sopasakis LU
- Tony Stillfjord LU
- Ola Hall LU
- organization
- publishing date
- 2026
- type
- Thesis
- publication status
- published
- subject
- keywords
- anomaly detection, out-of-distribution detection, breast cancer, remote sensing, variational autoencoder, VAE, food production, agriculture, point-of-care ultrasound
- in
- Licentiate theses in mathematical sciences
- volume
- 2027
- issue
- 1
- publisher
- Lund University / Centre for Mathematical Sciences /LTH
- ISSN
- 1404-028X
- ISBN
- 978-91-8104-839-1
- 978-91-8104-838-4
- language
- English
- LU publication?
- yes
- id
- 3e4deb70-61cf-4efc-9a7a-ca91c4126666
- date added to LUP
- 2026-01-19 15:55:48
- date last changed
- 2026-01-20 09:34:55
@misc{3e4deb70-61cf-4efc-9a7a-ca91c4126666,
abstract = {{In many machine learning tasks, the problem is framed around predetermined targets and clear expectations of model behaviour. In such cases, the optimal mapping between inputs and outputs is directly defined and can be learned given sufficiently large datasets and models. In many real-world scenarios, however, tasks are less well-posed and are instead defined around detecting the unexpected: the anomalies.
There are many ways of modelling the distribution of data points, but for complex high-dimensional data such as images, traditional parametric distributions often fall short. Strong non-linear dependencies between pixel values and the cluster-like structure of natural categories make image distributions difficult to model. Recent years have instead seen advances from treating neural networks as parametric distributions in order to construct probabilistic models of natural images.
This thesis investigates how such methods hold up in real-world applications. Modelling data in the wild poses several challenges that do not arise under the controlled conditions of many benchmarks. At the same time, applying these methods in real-world settings allows them to be evaluated on their impact and usefulness for downstream tasks. By moving research and method development closer to the intended applications, this thesis aims to highlight some of the benefits of bridging the gap between theory and practice.
This thesis contains three main research contributions. The first is a theoretical method development paper that examines the statistical and machine learning techniques used in anomaly detection. It investigates how conditional distributions can be better modelled in variational autoencoder (VAE) models. Such models commonly use class-conditional clusters whose positions are fully learned. This paper finds that VAE-style models can generalize better when a small amount of rigidity is imposed on the cluster positions.
The second paper applies these techniques to breast cancer diagnosis. Traditional mammography is a reliable way of diagnosing breast cancer but is not available globally due to economic constraints. Point-of-care ultrasound (POCUS) is a promising alternative; however, such images are harder to capture and can contain artifacts that make diagnosis difficult. By modelling the distribution of properly captured POCUS images, we are able to filter out images whose artifacts make them unsuitable for diagnosis.
Paper three applies distributional modelling to the agricultural sector, using graph neural networks to model how crop yield is distributed over fields. Based on publicly available remote sensing data from the Sentinel-1 and Sentinel-2 satellites, the model is able to estimate how harvest levels were distributed in past years and how yield will vary in future years. The goal of this study is to give farmers more information on how yield is distributed, thereby reducing costs and mitigating eutrophication caused by over-fertilization.}},
author = {{Åström, Oskar}},
isbn = {{978-91-8104-839-1}},
issn = {{1404-028X}},
keywords = {{anomaly detection; out-of-distribution detection; breast cancer; remote sensing; variational autoencoder; VAE; food production; agriculture; point-of-care ultrasound}},
language = {{eng}},
note = {{Licentiate Thesis}},
number = {{1}},
publisher = {{Lund University / Centre for Mathematical Sciences /LTH}},
series = {{Licentiate theses in mathematical sciences}},
title = {{Real-World Applications of Anomaly Detection : Detecting the Unexpected Through Distributional Modelling}},
url = {{https://lup.lub.lu.se/search/files/239928439/Avhandling_Oskar_A_stro_m_LUCRIS.pdf}},
volume = {{2027}},
year = {{2026}},
}