Learning Multi-Target TDOA Features for Sound Event Localization and Detection

Berg, Axel; Engman, Johanna; Gulin, Jens; Åström, Kalle, et al. (2024). Learning Multi-Target TDOA Features for Sound Event Localization and Detection. Proceedings of the Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE2024), 16-20. Workshop on Detection and Classification of Acoustic Scenes and Events, DCASE 2024. Tokyo, Japan: Zenodo.

URL:

https://dcase.community/documents/workshop2024/proceedings/DCASE2024Workshop_Berg_46.pdf

Conference Proceeding/Paper | Published | English

Authors:

Berg, Axel; Engman, Johanna; Gulin, Jens; Åström, Kalle, et al.

Department:

Computer Vision and Machine Learning
LU Profile Area: Natural and Artificial Cognition
Integrated Electronic Systems
eSSENCE: The e-Science Collaboration

Research Group:

Computer Vision and Machine Learning

Abstract:
Sound event localization and detection (SELD) systems using audio recordings from a microphone array rely on spatial cues for determining the location of sound events. As a consequence, the localization performance of such systems is to a large extent determined by the quality of the audio features that are used as inputs to the system. We propose a new feature, based on neural generalized cross-correlations with phase-transform (NGCC-PHAT), that learns audio representations suitable for localization. Using permutation invariant training for the time-difference of arrival (TDOA) estimation problem enables NGCC-PHAT to learn TDOA features for multiple overlapping sound events. These features can be used as a drop-in replacement for GCC-PHAT inputs to a SELD-network. We test our method on the STARSS23 dataset and demonstrate improved localization performance compared to using standard GCC-PHAT or SALSA-Lite input features.
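For orientation, the classical GCC-PHAT feature that the abstract names as a baseline can be sketched as follows. This is a minimal NumPy illustration of standard GCC-PHAT for a single microphone pair, not the learned NGCC-PHAT feature proposed in the paper. The function names `gcc_phat` and `estimate_tdoa`, the `max_lag` parameter, and the sign convention (a positive lag means the first signal arrives later) are assumptions made for this example.

```python
import numpy as np

def gcc_phat(x1, x2, max_lag):
    """Standard GCC-PHAT for lags -max_lag..max_lag in samples (max_lag >= 1).

    Hypothetical helper for illustration; not code from the paper.
    """
    n = len(x1) + len(x2)            # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)         # cross-power spectrum
    cross /= np.abs(cross) + 1e-12   # phase transform: discard magnitude
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that index 0 corresponds to lag -max_lag.
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))

def estimate_tdoa(x1, x2, max_lag):
    """Lag (in samples) of the correlation peak; positive if x1 lags x2."""
    cc = gcc_phat(x1, x2, max_lag)
    return int(np.argmax(cc)) - max_lag
```

Taking the single largest peak, as `estimate_tdoa` does, yields only one time difference per microphone pair. The abstract's motivation for learning multi-target TDOA features is the case of several overlapping sound events, which this single-peak sketch does not handle.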

Keywords:

sound event localization and detection; time difference of arrival; generalized cross-correlation

ISBN:

978-952-03-3171-9

LUP-ID:

c2719617-5e34-4797-8f0d-61adc5c6108c | Link: https://lup.lub.lu.se/record/c2719617-5e34-4797-8f0d-61adc5c6108c
