Lund University Publications

Real-World Applications of Anomaly Detection : Detecting the Unexpected Through Distributional Modelling

Åström, Oskar LU (2026) In Licentiate theses in mathematical sciences 2027(1).
Abstract
In many machine learning tasks, the premise is designed around predetermined targets and clear expectations of model behaviour. In such cases, there is a direct definition of the optimal mappings between inputs and outputs, which can be learned given sufficiently large datasets and models. However, many real-world tasks are not as well-posed and are instead defined around detecting the unexpected: the anomalies.

There are many ways of modelling distributions of data points, but in cases of complex high-dimensional data, like images, traditional parametric distributions often fall short. The large non-linear dependencies between pixel values and the cluster-like properties of natural categories make image distributions difficult to model. Instead, recent years have seen advances by using neural networks recontextualized as parametric distributions to construct probabilistic models of natural images.

This thesis investigates how such methods hold up in real-world applications. Modelling data in the wild presents several challenges that are absent from the controlled conditions of many benchmarks. Applying these methods in real-world settings instead allows them to be evaluated on their impact and usefulness in downstream tasks. By moving research and method development closer to the intended applications, this thesis aims to highlight some of the benefits that can be gained from bridging the gap between theory and practice.

This thesis contains three main research contributions. The first is a theoretical method development paper that delves into the statistics and machine learning techniques used in the field of anomaly detection. This paper investigates how conditional distributions can be modelled better in variational autoencoder (VAE) models. Commonly, such methods use conditional class clusters which are fully learned by the model. This paper finds that VAE-style models can generalize better with small amounts of rigidity in cluster positions.

The second paper applies these techniques to the field of breast cancer diagnosis. Traditional mammography is a reliable way of diagnosing breast cancer, but is not available globally due to economic constraints. Point-of-care Ultrasound (POCUS) is a promising alternative. However, such images are harder to capture and can contain artifacts that make diagnosis difficult. By modelling the distribution of properly captured POCUS images, we are able to filter out images with artifacts that make them unsuitable for diagnosis.

Paper three applies distributional modelling to the agricultural sector to model how crop yield is distributed over fields using graph neural networks. Using publicly available remote sensing data from the Sentinel-1 and Sentinel-2 satellites, the model is able to estimate how harvest levels were distributed in the past and how the yield will vary in future years. The goal of this study is to provide farmers with more information on how yield is distributed, thereby decreasing cost and mitigating eutrophication caused by over-fertilization.
type
Thesis
publication status
published
keywords
anomaly detection, out-of-distribution detection, breast cancer, remote sensing, variational autoencoder, VAE, food production, agriculture, point-of-care ultrasound
in
Licentiate theses in mathematical sciences
volume
2027
issue
1
publisher
Lund University / Centre for Mathematical Sciences /LTH
ISSN
1404-028X
ISBN
978-91-8104-839-1
978-91-8104-838-4
language
English
LU publication?
yes
id
3e4deb70-61cf-4efc-9a7a-ca91c4126666
date added to LUP
2026-01-19 15:55:48
date last changed
2026-01-20 09:34:55
@misc{3e4deb70-61cf-4efc-9a7a-ca91c4126666,
  abstract     = {{In many machine learning tasks, the premise is designed around predetermined targets and clear expectations of model behaviour. In such cases, there is a direct definition of the optimal mappings between inputs and outputs, which can be learned given sufficiently sized datasets and models. However, in many real-world scenarios, tasks are often not as well-posed and instead defined around detecting the unexpected, the anomalies. There are many ways of modelling distributions of data points, but in cases of complex high-dimensional data, like images, traditional parametric distributions often fall short. The large non-linear dependencies between pixel values and the cluster-like properties of natural categories make image distributions difficult to model. Instead, recent years have seen advances by using neural networks recontextualized as parametric distributions to construct probabilistic models of natural images. This thesis investigates how such methods hold up in real-world applications. Modelling data in the wild results in several challenges compared to the controlled conditions of many benchmarks. Instead, by applying these methods in real-world settings, they can be evaluated on their impact and usefulness on downstream tasks. By moving research and method development closer to the intended applications, this thesis aims to highlight some of the benefits that can be gained from bridging the gap between theory and practice. This thesis contains three main research contributions. The first is a theoretical method development paper that delves into the statistics and machine learning techniques used in the field of anomaly detection. This paper investigates how conditional distributions can be modelled better in variational autoencoder (VAE) models. Commonly, such methods use conditional class clusters which are fully learned by the model. This paper finds that VAE-style models can generalize better with small amounts of rigidity in cluster positions. The second paper applies these techniques to the field of breast cancer diagnosis. Traditional mammography is a reliable way of diagnosing breast cancer, but is not available globally due to economic constraints. Point-of-care Ultrasound (POCUS) is a promising alternative. However, such images are harder to capture and can contain artifacts that make diagnosis difficult. By modelling the distribution of properly captured POCUS images, we are able to filter out images with artifacts that make them unsuitable for diagnosis. Paper three applies distributional modelling to the agricultural sector to model how crop yield is distributed over fields using graph neural networks. Using publicly available remote sensing data from the Sentinel-1 and Sentinel-2 satellites, the model is able to estimate how harvest levels were distributed in the past and how the yield will vary in future years. The goal of this study is to provide farmers with more information on how yield is distributed, thereby decreasing cost and mitigating eutrophication caused by over-fertilization.}},
  author       = {{Åström, Oskar}},
  isbn         = {{978-91-8104-839-1}},
  issn         = {{1404-028X}},
  keywords     = {{anomaly detection; out-of-distribution detection; breast cancer; remote sensing; variational autoencoder; VAE; food production; agriculture; point-of-care ultrasound}},
  language     = {{eng}},
  note         = {{Licentiate Thesis}},
  number       = {{1}},
  publisher    = {{Lund University / Centre for Mathematical Sciences /LTH}},
  series       = {{Licentiate theses in mathematical sciences}},
  title        = {{Real-World Applications of Anomaly Detection : Detecting the Unexpected Through Distributional Modelling}},
  url          = {{https://lup.lub.lu.se/search/files/239928439/Avhandling_Oskar_A_stro_m_LUCRIS.pdf}},
  volume       = {{2027}},
  year         = {{2026}},
}