
Lund University Publications

LUND UNIVERSITY LIBRARIES

Learning Multi-Target TDOA Features for Sound Event Localization and Detection

Berg, Axel; Engman, Johanna; Gulin, Jens; Åström, Kalle and Oskarsson, Magnus (2024) Workshop on Detection and Classification of Acoustic Scenes and Events, DCASE 2024 pp. 16-20
Abstract
Sound event localization and detection (SELD) systems using audio recordings from a microphone array rely on spatial cues for determining the location of sound events. As a consequence, the localization performance of such systems is to a large extent determined by the quality of the audio features that are used as inputs to the system. We propose a new feature, based on neural generalized cross-correlations with phase-transform (NGCC-PHAT), that learns audio representations suitable for localization. Using permutation invariant training for the time-difference of arrival (TDOA) estimation problem enables NGCC-PHAT to learn TDOA features for multiple overlapping sound events. These features can be used as a drop-in replacement for GCC-PHAT inputs to a SELD-network. We test our method on the STARSS23 dataset and demonstrate improved localization performance compared to using standard GCC-PHAT or SALSA-Lite input features.
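The abstract contrasts learned NGCC-PHAT features with the classical GCC-PHAT input and relies on permutation invariant training to handle overlapping sources. The following is not the authors' code. It is a minimal sketch, under our own assumptions, of what the classical GCC-PHAT baseline computes for one microphone pair, and of how a permutation-invariant cost removes the ordering ambiguity among several TDOA targets. The helper names (`dft`, `idft_real`, `gcc_phat`, `estimate_tdoa`, `pit_tdoa_loss`) are hypothetical. A naive DFT stands in for an FFT, and the absolute-error cost is purely illustrative; this record does not state the paper's actual training loss.

```python
import cmath
import itertools
import math
import random


def dft(x):
    """Naive O(N^2) discrete Fourier transform; a stand-in for an FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum):
    """Naive inverse DFT that keeps only the real part of the result."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def gcc_phat(x1, x2, eps=1e-12):
    """Classical GCC-PHAT for one microphone pair (hypothetical helper).

    Returns the whitened cross-correlation indexed by circular lag; a peak at
    lag d > 0 means x2 is x1 delayed by d samples.
    """
    cross = [a.conjugate() * b for a, b in zip(dft(x1), dft(x2))]
    # Phase transform: divide out the magnitude so only the phase, which
    # encodes the delay, is left in each frequency bin.
    whitened = [c / (abs(c) + eps) for c in cross]
    return idft_real(whitened)


def estimate_tdoa(x1, x2):
    """Single-source TDOA in samples: the signed lag of the GCC-PHAT peak."""
    r = gcc_phat(x1, x2)
    n = len(r)
    peak = max(range(n), key=lambda t: r[t])
    # Map circular indices above n // 2 to negative lags.
    return peak if peak <= n // 2 else peak - n


def pit_tdoa_loss(predicted, targets):
    """Permutation invariant absolute-error cost over multiple TDOA targets.

    Tries every assignment of predictions to reference TDOAs and keeps the
    cheapest, so the output order of overlapping sources does not matter.
    The absolute-error cost is an illustrative assumption, not the paper's.
    """
    if len(predicted) != len(targets):
        raise ValueError("predicted and targets must have the same length")
    best = math.inf
    for perm in itertools.permutations(range(len(targets))):
        cost = sum(abs(predicted[i] - targets[j]) for i, j in enumerate(perm))
        best = min(best, cost)
    return best


if __name__ == "__main__":
    random.seed(0)
    n = 64
    x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [x1[(t - 5) % n] for t in range(n)]  # x1 circularly delayed by 5
    print(estimate_tdoa(x1, x2))  # 5
    print(pit_tdoa_loss([3.0, -2.0], [-2.0, 3.0]))  # 0.0
```

With a circular 5-sample delay between the channels, `estimate_tdoa` recovers a lag of 5, and `pit_tdoa_loss([3.0, -2.0], [-2.0, 3.0])` is 0.0 because the swapped assignment matches exactly. A practical implementation would use an FFT (for example `numpy.fft.rfft` and `numpy.fft.irfft`). The NGCC-PHAT feature described in the abstract is a learned counterpart of this fixed computation, and its details are not given in this record.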
author
Berg, Axel; Engman, Johanna; Gulin, Jens; Åström, Kalle and Oskarsson, Magnus
publishing date
2024
type
Chapter in Book/Report/Conference proceeding
publication status
published
keywords
sound event localization and detection, time difference of arrival, generalized cross-correlation
host publication
Proceedings of the Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE2024)
pages
16 - 20
publisher
Zenodo
conference name
Workshop on Detection and Classification of Acoustic Scenes and Events, DCASE 2024
conference location
Tokyo, Japan
conference dates
2024-10-23 - 2024-10-25
ISBN
978-952-03-3171-9
language
English
LU publication?
yes
id
c2719617-5e34-4797-8f0d-61adc5c6108c
alternative location
https://dcase.community/documents/workshop2024/proceedings/DCASE2024Workshop_Berg_46.pdf
date added to LUP
2024-11-04 08:28:34
date last changed
2025-04-04 14:12:51
@inproceedings{c2719617-5e34-4797-8f0d-61adc5c6108c,
  abstract     = {{Sound event localization and detection (SELD) systems using audio recordings from a microphone array rely on spatial cues for determining the location of sound events. As a consequence, the localization performance of such systems is to a large extent determined by the quality of the audio features that are used as inputs to the system. We propose a new feature, based on neural generalized cross-correlations with phase-transform (NGCC-PHAT), that learns audio representations suitable for localization. Using permutation invariant training for the time-difference of arrival (TDOA) estimation problem enables NGCC-PHAT to learn TDOA features for multiple overlapping sound events. These features can be used as a drop-in replacement for GCC-PHAT inputs to a SELD-network. We test our method on the STARSS23 dataset and demonstrate improved localization performance compared to using standard GCC-PHAT or SALSA-Lite input features.}},
  author       = {{Berg, Axel and Engman, Johanna and Gulin, Jens and Åström, Kalle and Oskarsson, Magnus}},
  booktitle    = {{Proceedings of the Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE2024)}},
  isbn         = {{978-952-03-3171-9}},
  keywords     = {{sound event localization and detection; time difference of arrival; generalized cross-correlation}},
  language     = {{eng}},
  pages        = {{16--20}},
  publisher    = {{Zenodo}},
  title        = {{Learning Multi-Target TDOA Features for Sound Event Localization and Detection}},
  url          = {{https://dcase.community/documents/workshop2024/proceedings/DCASE2024Workshop_Berg_46.pdf}},
  year         = {{2024}},
}