The value of human data annotation for machine learning based anomaly detection in environmental systems

Russo, Stefania; Besmer, Michael D.; Blumensaat, Frank; Bouffard, Damien; Disch, Andy; Hammes, Frederik; Hess, Angelika; Lürig, Moritz; Matthews, Blake; Minaudo, Camille; Morgenroth, Eberhard; Tran-Khac, Viet; Villez, Kris

Russo, Stefania; Besmer, Michael D.; Blumensaat, Frank; Bouffard, Damien; Disch, Andy; Hammes, Frederik; Hess, Angelika; Lürig, Moritz; Matthews, Blake; Minaudo, Camille, et al. (2021) In Water Research 206.

Abstract: Anomaly detection is the process of identifying unexpected data samples in datasets. Automated anomaly detection is either performed using supervised machine learning models, which require a labelled dataset for their calibration, or unsupervised models, which do not require labels. While academic research has produced a vast array of tools and machine learning models for automated anomaly detection, the research community focused on environmental systems still lacks a comparative analysis that is simultaneously comprehensive, objective, and systematic. This knowledge gap is addressed for the first time in this study, where 15 different supervised and unsupervised anomaly detection models are evaluated on 5 different environmental datasets from engineered and natural aquatic systems. To this end, anomaly detection performance, labelling efforts, as well as the impact of model and algorithm tuning are taken into account. As a result, our analysis reveals the relative strengths and weaknesses of the different approaches in an objective manner without bias for any particular paradigm in machine learning. Most importantly, our results show that expert-based data annotation is extremely valuable for anomaly detection based on machine learning.

Please use this URL to cite or link to this publication: https://lup.lub.lu.se/record/afff622f-9e04-4125-9b13-d8d00df2fa3b

author

Russo, Stefania; Besmer, Michael D.; Blumensaat, Frank; Bouffard, Damien; Disch, Andy; Hammes, Frederik; Hess, Angelika; Lürig, Moritz; Matthews, Blake; Minaudo, Camille; Morgenroth, Eberhard; Tran-Khac, Viet; Villez, Kris

publishing date

2021-11-01

type

Contribution to journal

publication status

published

keywords

Anomaly detection, Environmental systems, Labels, Machine learning

in

Water Research

volume

206

article number

117695

publisher

Elsevier

external identifiers

pmid:34626884
scopus:85116532784

ISSN

0043-1354

DOI

10.1016/j.watres.2021.117695

language

English

LU publication?

yes

id

afff622f-9e04-4125-9b13-d8d00df2fa3b

date added to LUP

2021-10-20 11:29:34

date last changed

2025-10-21 00:51:43

@article{afff622f-9e04-4125-9b13-d8d00df2fa3b,
  abstract     = {{Anomaly detection is the process of identifying unexpected data samples in datasets. Automated anomaly detection is either performed using supervised machine learning models, which require a labelled dataset for their calibration, or unsupervised models, which do not require labels. While academic research has produced a vast array of tools and machine learning models for automated anomaly detection, the research community focused on environmental systems still lacks a comparative analysis that is simultaneously comprehensive, objective, and systematic. This knowledge gap is addressed for the first time in this study, where 15 different supervised and unsupervised anomaly detection models are evaluated on 5 different environmental datasets from engineered and natural aquatic systems. To this end, anomaly detection performance, labelling efforts, as well as the impact of model and algorithm tuning are taken into account. As a result, our analysis reveals the relative strengths and weaknesses of the different approaches in an objective manner without bias for any particular paradigm in machine learning. Most importantly, our results show that expert-based data annotation is extremely valuable for anomaly detection based on machine learning.}},
  author       = {{Russo, Stefania and Besmer, Michael D. and Blumensaat, Frank and Bouffard, Damien and Disch, Andy and Hammes, Frederik and Hess, Angelika and Lürig, Moritz and Matthews, Blake and Minaudo, Camille and Morgenroth, Eberhard and Tran-Khac, Viet and Villez, Kris}},
  issn         = {{0043-1354}},
  keywords     = {{Anomaly detection; Environmental systems; Labels; Machine learning}},
  language     = {{eng}},
  month        = {{11}},
  publisher    = {{Elsevier}},
  journal      = {{Water Research}},
  title        = {{The value of human data annotation for machine learning based anomaly detection in environmental systems}},
  url          = {{https://doi.org/10.1016/j.watres.2021.117695}},
  doi          = {{10.1016/j.watres.2021.117695}},
  volume       = {{206}},
  year         = {{2021}},
}
