
Lund University Publications

LUND UNIVERSITY LIBRARIES

Efficient and precise annotation of local structures in data

Martinsson, John LU (2024) In Licentiate Theses in Mathematical Sciences 2024(3).
Abstract
Machine learning models are used to help scientists analyze large amounts of data across all fields of science. These models improve with more data and larger model sizes, mainly through supervised learning. Both supervised learning and model validation benefit from annotated datasets with high-quality annotations. A key challenge is to annotate the amount of data needed to train large machine learning models, because annotation is a costly process and the collected labels can vary in quality. Methods that enable cheap, high-quality annotation are therefore needed.

In this thesis we consider ways to reduce the annotation cost and improve the label quality when annotating local structures in data. Examples of local structures are a sound event in an audio recording and a visual object in an image. By automatically detecting the boundaries of these structures, we allow the annotator to focus on assigning a textual description to the local structure within those boundaries. In this setting we analyze the limits of a commonly used annotation method and compare it to an oracle method, which acts as an upper bound on what can be achieved. Further, we propose new ways to perform this kind of annotation that result in higher label quality at a reduced cost for the studied datasets. Finally, we study more general modelling approaches that reduce annotation cost by making the most of each annotation given.
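The abstract does not say how the boundaries are detected. The sketch below is illustrative only and is not the method used in the thesis. One simple way to propose event boundaries in audio is to threshold frame-wise event activations (for example, the per-frame output of a sound event detection model) and keep contiguous runs above the threshold. Each proposed segment then goes to the annotator, who only supplies a label. The function names `detect_segments` and `annotate`, the `ask_label` callback, the 0.5 threshold, the `min_length` filter and the frame hop size are all hypothetical choices made for this example.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: (start_frame, end_frame), end exclusive.
Segment = Tuple[int, int]


def detect_segments(activations: List[float], threshold: float = 0.5,
                    min_length: int = 1) -> List[Segment]:
    """Propose event boundaries as contiguous runs of frames whose
    activation is at or above `threshold` (hypothetical helper)."""
    segments: List[Segment] = []
    start: Optional[int] = None
    for i, a in enumerate(activations):
        if a >= threshold and start is None:
            start = i  # a candidate event begins at this frame
        elif a < threshold and start is not None:
            if i - start >= min_length:  # drop runs shorter than min_length
                segments.append((start, i))
            start = None
    # Close an event that is still active at the last frame.
    if start is not None and len(activations) - start >= min_length:
        segments.append((start, len(activations)))
    return segments


def annotate(segments: List[Segment], hop_seconds: float,
             ask_label: Callable[[float, float], str]) -> List[Tuple[float, float, str]]:
    """Convert frame boundaries to seconds and ask the annotator
    (here a hypothetical callback) for one label per segment."""
    labelled = []
    for s, e in segments:
        onset, offset = s * hop_seconds, e * hop_seconds
        labelled.append((onset, offset, ask_label(onset, offset)))
    return labelled


if __name__ == "__main__":
    acts = [0.1, 0.7, 0.9, 0.2, 0.1, 0.6, 0.8, 0.7]
    segs = detect_segments(acts, threshold=0.5, min_length=2)
    print(segs)  # [(1, 3), (5, 8)]
    # A stand-in annotator that always answers "event".
    print(annotate(segs, hop_seconds=0.5, ask_label=lambda on, off: "event"))
    # [(0.5, 1.5, 'event'), (2.5, 4.0, 'event')]
```

Requiring a minimum run length is one way to suppress spurious single-frame detections. A practical pipeline would likely also smooth the activations or merge nearby segments, which this sketch leaves out.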
author
Martinsson, John
publishing date
2024-10
type
Thesis
publication status
published
keywords
Annotation efficiency, Sound event detection, Machine learning
in
Licentiate Theses in Mathematical Sciences
volume
2024
issue
3
pages
120 pages
publisher
Centre for Mathematical Sciences, Lund University
ISSN
1404-028X
ISBN
978-91-8104-199-6
978-91-8104-200-9
language
English
LU publication?
yes
id
d2b8fa34-a413-47a4-af4d-99f1bb8ccb2b
date added to LUP
2024-09-20 08:34:46
date last changed
2025-04-04 14:44:09
@misc{d2b8fa34-a413-47a4-af4d-99f1bb8ccb2b,
  abstract     = {{Machine learning models are used to help scientists analyze large amounts of data across all fields of science. These models improve with more data and larger model sizes, mainly through supervised learning. Both supervised learning and model validation benefit from annotated datasets with high-quality annotations. A key challenge is to annotate the amount of data needed to train large machine learning models, because annotation is a costly process and the collected labels can vary in quality. Methods that enable cheap, high-quality annotation are therefore needed. In this thesis we consider ways to reduce the annotation cost and improve the label quality when annotating local structures in data. Examples of local structures are a sound event in an audio recording and a visual object in an image. By automatically detecting the boundaries of these structures, we allow the annotator to focus on assigning a textual description to the local structure within those boundaries. In this setting we analyze the limits of a commonly used annotation method and compare it to an oracle method, which acts as an upper bound on what can be achieved. Further, we propose new ways to perform this kind of annotation that result in higher label quality at a reduced cost for the studied datasets. Finally, we study more general modelling approaches that reduce annotation cost by making the most of each annotation given.}},
  author       = {{Martinsson, John}},
  isbn         = {{978-91-8104-199-6}},
  issn         = {{1404-028X}},
  keywords     = {{Annotation efficiency; Sound event detection; Machine learning}},
  language     = {{eng}},
  month        = {{10}},
  note         = {{Licentiate Thesis}},
  number       = {{3}},
  publisher    = {{Centre for Mathematical Sciences, Lund University}},
  series       = {{Licentiate Theses in Mathematical Sciences}},
  title        = {{Efficient and precise annotation of local structures in data}},
  url          = {{https://lup.lub.lu.se/search/files/195517213/Lic_avhandling_John_Martinsson_LUCRIS.pdf}},
  volume       = {{2024}},
  year         = {{2024}},
}