The Accuracy Cost of Weakness : A Theoretical Analysis of Fixed-Segment Weak Labeling for Events in Time

Martinsson, John; Virtanen, Tuomas; Sandsten, Maria; Mogren, Olof

Martinsson, John (LU); Virtanen, Tuomas; Sandsten, Maria (LU) and Mogren, Olof (2025). In Transactions on Machine Learning Research 2025-September.

Abstract: Accurate labels are critical for deriving robust machine learning models. Labels are used to train supervised learning models and to evaluate most machine learning paradigms. In this paper, we model the accuracy and cost of a common weak labeling process where annotators assign presence or absence labels to fixed-length data segments for a given event class. The annotator labels a segment as "present" if it sufficiently covers an event from that class, e.g., a birdsong sound event in audio data. We analyze how the segment length affects the label accuracy and the required number of annotations, and compare this fixed-length labeling approach with an oracle method that uses the true event activations to construct the segments. Furthermore, we quantify the gap between these methods and verify that in most realistic scenarios the oracle method is better than the fixed-length labeling method in both accuracy and cost. Our findings provide a theoretical justification for adaptive weak labeling strategies that mimic the oracle process, and a foundation for optimizing weak labeling processes in sequence labeling tasks.
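The weak labeling process described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' model, and it rests on several assumptions that are not taken from the paper: events are sorted, non-overlapping intervals inside [0, duration); a segment is labeled "present" when it covers at least a fraction `gamma` of some event (a hypothetical reading of "sufficiently covers"); label accuracy is the fraction of time whose segment label, propagated to every time point in the segment, matches the true activity; and cost is the number of segments to annotate. The hypothetical `oracle_segments` helper simply places boundaries at the true onsets and offsets. All function names and the event times are made up for illustration.

```python
import math
from typing import List, Tuple

# (onset, offset) in seconds; events are assumed sorted, non-overlapping,
# and contained in [0, duration]. All names here are hypothetical.
Interval = Tuple[float, float]


def overlap(a: Interval, b: Interval) -> float:
    """Length of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def fixed_segments(duration: float, seg_len: float) -> List[Interval]:
    """Split [0, duration) into consecutive segments of length seg_len.
    The last segment is shorter if seg_len does not divide duration."""
    n = math.ceil(duration / seg_len)
    return [(i * seg_len, min((i + 1) * seg_len, duration)) for i in range(n)]


def oracle_segments(duration: float, events: List[Interval]) -> List[Interval]:
    """Hypothetical oracle: segment boundaries placed at the true onsets/offsets."""
    bounds = sorted({0.0, duration, *(t for e in events for t in e)})
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def weak_label(seg: Interval, events: List[Interval], gamma: float) -> bool:
    """Assumed presence rule: 'present' if the segment covers at least a
    fraction gamma of some event's duration."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must be in (0, 1]")
    return any(overlap(seg, e) >= gamma * (e[1] - e[0]) for e in events)


def label_accuracy(segments: List[Interval], events: List[Interval],
                   gamma: float, duration: float) -> float:
    """Fraction of time whose segment label (propagated to every time point in
    the segment) agrees with the true event activity."""
    errors = 0.0
    for seg in segments:
        active = sum(overlap(seg, e) for e in events)  # event time inside seg
        if weak_label(seg, events, gamma):
            errors += (seg[1] - seg[0]) - active  # background labeled present
        else:
            errors += active  # event time labeled absent
    return 1.0 - errors / duration


if __name__ == "__main__":
    duration = 10.0
    events = [(1.0, 1.5), (4.2, 6.0), (8.9, 9.3)]  # made-up example events
    gamma = 0.5
    for seg_len in (0.5, 1.0, 2.0, 5.0):
        segs = fixed_segments(duration, seg_len)
        acc = label_accuracy(segs, events, gamma, duration)
        print(f"fixed, length {seg_len}: {len(segs)} annotations, accuracy {acc:.3f}")
    segs = oracle_segments(duration, events)
    acc = label_accuracy(segs, events, gamma, duration)
    print(f"oracle: {len(segs)} annotations, accuracy {acc:.3f}")
```

With these made-up events and `gamma = 0.5`, 1-second fixed segments need 10 annotations and mislabel about 21% of the time, while the oracle needs 7 annotations and, because its boundaries coincide with the events, is exact by construction. The sketch uses exact interval arithmetic rather than a frame grid so that accuracy does not depend on a discretization step; it only shows the kind of accuracy/cost trade-off the paper analyzes and does not reproduce its results.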

Please use this URL to cite or link to this publication: https://lup.lub.lu.se/record/4c6e0aae-5f8b-439c-988a-fe9bfcc213f2

author: Martinsson, John (LU); Virtanen, Tuomas; Sandsten, Maria (LU); Mogren, Olof
publishing date: 2025
type: Contribution to journal
publication status: published
subject: Other Computer and Information Science
in: Transactions on Machine Learning Research
volume: 2025-September
external identifiers: scopus:105017875586
ISSN: 2835-8856
language: English
LU publication?: yes
id: 4c6e0aae-5f8b-439c-988a-fe9bfcc213f2
alternative location: https://openreview.net/pdf?id=tTw8wXBQ18
date added to LUP: 2025-12-05 12:00:48
date last changed: 2026-01-28 09:26:49

@article{4c6e0aae-5f8b-439c-988a-fe9bfcc213f2,
  abstract     = {{<p>Accurate labels are critical for deriving robust machine learning models. Labels are used to train supervised learning models and to evaluate most machine learning paradigms. In this paper, we model the accuracy and cost of a common weak labeling process where annotators assign presence or absence labels to fixed-length data segments for a given event class. The annotator labels a segment as "present" if it sufficiently covers an event from that class, e.g., a birdsong sound event in audio data. We analyze how the segment length affects the label accuracy and the required number of annotations, and compare this fixed-length labeling approach with an oracle method that uses the true event activations to construct the segments. Furthermore, we quantify the gap between these methods and verify that in most realistic scenarios the oracle method is better than the fixed-length labeling method in both accuracy and cost. Our findings provide a theoretical justification for adaptive weak labeling strategies that mimic the oracle process, and a foundation for optimizing weak labeling processes in sequence labeling tasks.</p>}},
  author       = {{Martinsson, John and Virtanen, Tuomas and Sandsten, Maria and Mogren, Olof}},
  issn         = {{2835-8856}},
  language     = {{eng}},
  series       = {{Transactions on Machine Learning Research}},
  title        = {{The Accuracy Cost of Weakness : A Theoretical Analysis of Fixed-Segment Weak Labeling for Events in Time}},
  url          = {{https://openreview.net/pdf?id=tTw8wXBQ18}},
  volume       = {{2025-September}},
  year         = {{2025}},
}

Lund University Publications

