THE LU SYSTEM FOR DCASE 2024 SOUND EVENT LOCALIZATION AND DETECTION CHALLENGE

Berg, Axel; Engman, Johanna; Gulin, Jens; Åström, Karl, et al. (2024-06-30). THE LU SYSTEM FOR DCASE 2024 SOUND EVENT LOCALIZATION AND DETECTION CHALLENGE.
Report | Published | English
Authors:
Berg, Axel; Engman, Johanna; Gulin, Jens; Åström, Karl, et al.
Department:
Computer Vision and Machine Learning
LU Profile Area: Natural and Artificial Cognition
Integrated Electronic Systems
Mathematical Imaging Group
LTH Profile Area: AI and Digitalization
ELLIIT: the Linköping-Lund initiative on IT and mobile communication
eSSENCE: The e-Science Collaboration
Stroke Imaging Research group
LTH Profile Area: Engineering Health
LU Profile Area: Proactive Ageing
LU Profile Area: Light and Materials
LU Profile Area: Nature-based future solutions
Research Group:
Computer Vision and Machine Learning
Integrated Electronic Systems
Mathematical Imaging Group
Stroke Imaging Research group
Abstract:
This technical report gives an overview of our submission to Task 3 of the DCASE 2024 challenge. We present a sound event localization and detection (SELD) system using input features based on trainable neural generalized cross-correlations with phase transform (NGCC-PHAT). Using these features together with spectrograms as input to a Transformer-based network, we achieve significant improvements over the baseline method. We also present an audio-visual version of our system, in which distance predictions are updated using depth maps from the panorama video frames.
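For orientation, the classical (non-neural) GCC-PHAT that NGCC-PHAT takes its name from can be sketched as follows. This is an illustrative NumPy version only, not the trainable feature extractor used in the report; the function name `gcc_phat` and its parameters (`fs`, `max_tau`, `eps`) are assumptions made for this example.

```python
import numpy as np


def gcc_phat(x, y, fs, max_tau=None, eps=1e-12):
    """Estimate the delay of y relative to x (in seconds) with classical GCC-PHAT.

    Hypothetical helper for illustration; returns (delay, correlation curve).
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)

    # Cross-power spectrum; a delay d in y appears as the phase exp(-j*w*d).
    cross = Y * np.conj(X)

    # Phase transform: discard magnitude, keep only phase information.
    cross = cross / (np.abs(cross) + eps)

    # Back to the lag domain.
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if max_tau is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(max(int(fs * max_tau), 1), max_shift)

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs, cc
```

Calling `gcc_phat(x, y, fs=16000)` on two microphone channels returns the estimated time delay and the correlation curve over lags. In a SELD pipeline, curves of this kind for each microphone pair are typically stacked as input features alongside spectrograms; the neural variant described in the abstract replaces the fixed processing with trainable components.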
LUP-ID:
ccb5d1f3-8c87-4398-9b2d-c3260c0f2fd3 | Link: https://lup.lub.lu.se/record/ccb5d1f3-8c87-4398-9b2d-c3260c0f2fd3
