
Lund University Publications

LUND UNIVERSITY LIBRARIES

Few-Shot Bioacoustic Event Detection Using an Event-Length Adapted Ensemble of Prototypical Networks

Martinsson, John LU ; Sandsten, Maria LU ; Willbo, Martin ; Pirinen, Aleksis and Mogren, Olof (2022) 7th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2022)
Abstract
In this paper we study two major challenges in few-shot bioacoustic event detection: variable event lengths and false positives. We use prototypical networks in which the embedding function is learned by training a multi-label sound event detection model, rather than by episodic training, as the proxy task on the provided training dataset. This choice is motivated by the presence of polyphonic sound events in the base training data. We propose a method to choose the embedding function based on the average event length of the few-shot examples and show that this makes the method more robust to variable event lengths. Further, we show that an ensemble of prototypical neural networks trained on different training and validation splits of time-frequency images with different loudness normalizations reduces false positives. In addition, we present an analysis of the effect of the studied loudness normalization techniques on the performance of the prototypical network ensemble. Overall, per-channel energy normalization (PCEN) outperforms the standard log transform for this task. The method uses no data augmentation and no external data. The proposed approach achieves an F-score of 48.0% when evaluated on the hidden test set of the Detection and Classification of Acoustic Scenes and Events (DCASE) task 5.
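The abstract combines three mechanisms: class prototypes scored by distance, an embedding function chosen from the average event length, and an ensemble whose averaged predictions reduce false positives. Below is a minimal Python/NumPy sketch of those ideas under stated assumptions; it is not the authors' implementation. Embeddings are taken as given (the multi-label sound event detection training of the embedding network is not shown), the negative class is assumed to be represented by background frames, `select_model` is a hypothetical nearest-window-length rule, and all function names are illustrative.

```python
import numpy as np


def prototype(support: np.ndarray) -> np.ndarray:
    """Class prototype: mean of the support embeddings, shape (n, D) -> (D,)."""
    return support.mean(axis=0)


def positive_probability(query: np.ndarray, pos_proto: np.ndarray,
                         neg_proto: np.ndarray) -> np.ndarray:
    """Per-frame P(positive) from a two-class prototypical network.

    Softmax over the negative squared Euclidean distances to the two
    prototypes; for two classes this equals sigmoid(d_neg - d_pos).
    """
    d_pos = ((query - pos_proto) ** 2).sum(axis=1)
    d_neg = ((query - neg_proto) ** 2).sum(axis=1)
    # Numerically stable sigmoid: sigmoid(x) = 0.5 * (1 + tanh(x / 2)).
    return 0.5 * (1.0 + np.tanh((d_neg - d_pos) / 2.0))


def select_model(avg_event_len_s: float, window_lengths_s: list) -> int:
    """HYPOTHETICAL rule: pick the embedding model whose input window
    (in seconds) is closest to the average length of the few-shot events."""
    gaps = [abs(w - avg_event_len_s) for w in window_lengths_s]
    return int(np.argmin(gaps))


def ensemble_probability(member_probs: list) -> np.ndarray:
    """Average per-frame positive probabilities over ensemble members."""
    return np.mean(np.stack(member_probs), axis=0)


# Toy usage with random 8-dimensional "embeddings" (not real audio features).
rng = np.random.default_rng(0)
pos_support = rng.normal(1.0, 0.1, size=(5, 8))    # few-shot positive examples
neg_support = rng.normal(-1.0, 0.1, size=(20, 8))  # assumed background frames
query = np.vstack([rng.normal(1.0, 0.1, size=(3, 8)),
                   rng.normal(-1.0, 0.1, size=(3, 8))])
p = positive_probability(query, prototype(pos_support), prototype(neg_support))
detections = p > 0.5  # frame-level decisions with a fixed threshold

# One member fires spuriously on frame 1; averaging suppresses it.
members = [np.array([0.9, 0.9, 0.1]),
           np.array([0.8, 0.1, 0.2]),
           np.array([0.9, 0.1, 0.1])]
print(ensemble_probability(members) > 0.5)  # [ True False False]
```

The last lines illustrate why averaging can cut false positives: a detection produced by only one member is pulled below the threshold by the others. Event-level post-processing (merging frames into onset/offset pairs) and the loudness normalization of the input spectrograms are omitted.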
author
Martinsson, John ; Sandsten, Maria ; Willbo, Martin ; Pirinen, Aleksis and Mogren, Olof
publishing date
2022
type
Chapter in Book/Report/Conference proceeding
publication status
published
keywords
machine listening, bioacoustics, few-shot learning, ensemble
host publication
Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2022)
pages
5 pages
conference name
7th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2022)
conference location
Nancy, France
conference dates
2022-11-03 - 2022-11-04
ISBN
978-952-03-2677-7
language
English
LU publication?
yes
id
f07cfc21-bf15-40b7-8e1b-8d2e23eff20f
alternative location
https://dcase.community/documents/workshop2022/proceedings/DCASE2022Workshop_Martinsson_13.pdf
date added to LUP
2022-12-09 07:16:54
date last changed
2023-01-09 13:50:41
@inproceedings{f07cfc21-bf15-40b7-8e1b-8d2e23eff20f,
  abstract     = {{In this paper we study two major challenges in few-shot bioacoustic event detection: variable event lengths and false positives. We use prototypical networks in which the embedding function is learned by training a multi-label sound event detection model, rather than by episodic training, as the proxy task on the provided training dataset. This choice is motivated by the presence of polyphonic sound events in the base training data. We propose a method to choose the embedding function based on the average event length of the few-shot examples and show that this makes the method more robust to variable event lengths. Further, we show that an ensemble of prototypical neural networks trained on different training and validation splits of time-frequency images with different loudness normalizations reduces false positives. In addition, we present an analysis of the effect of the studied loudness normalization techniques on the performance of the prototypical network ensemble. Overall, per-channel energy normalization (PCEN) outperforms the standard log transform for this task. The method uses no data augmentation and no external data. The proposed approach achieves an F-score of 48.0% when evaluated on the hidden test set of the Detection and Classification of Acoustic Scenes and Events (DCASE) task 5.}},
  author       = {{Martinsson, John and Sandsten, Maria and Willbo, Martin and Pirinen, Aleksis and Mogren, Olof}},
  booktitle    = {{Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2022)}},
  isbn         = {{978-952-03-2677-7}},
  keywords     = {{machine listening; bioacoustics; few-shot learning; ensemble}},
  language     = {{eng}},
  title        = {{Few-Shot Bioacoustic Event Detection Using an Event-Length Adapted Ensemble of Prototypical Networks}},
  url          = {{https://dcase.community/documents/workshop2022/proceedings/DCASE2022Workshop_Martinsson_13.pdf}},
  year         = {{2022}},
}