THE LU SYSTEM FOR DCASE 2024 SOUND EVENT LOCALIZATION AND DETECTION CHALLENGE
Berg, Axel; Engman, Johanna; Gulin, Jens; Åström, Karl, et al. (2024-06-30). THE LU SYSTEM FOR DCASE 2024 SOUND EVENT LOCALIZATION AND DETECTION CHALLENGE.
Report | Published | English
Authors:
Berg, Axel; Engman, Johanna; Gulin, Jens; Åström, Karl, et al.
Department:
Computer Vision and Machine Learning
LU Profile Area: Natural and Artificial Cognition
Integrated Electronic Systems
Mathematical Imaging Group
LTH Profile Area: AI and Digitalization
ELLIIT: the Linköping-Lund initiative on IT and mobile communication
eSSENCE: The e-Science Collaboration
Stroke Imaging Research group
LTH Profile Area: Engineering Health
LU Profile Area: Proactive Ageing
LU Profile Area: Light and Materials
LU Profile Area: Nature-based future solutions
Research Group:
Computer Vision and Machine Learning
Integrated Electronic Systems
Mathematical Imaging Group
Stroke Imaging Research group
Abstract:
This technical report gives an overview of our submission to task 3 of the DCASE 2024 challenge. We present a sound event localization and detection (SELD) system using input features based on trainable neural generalized cross-correlations with phase transform (NGCC-PHAT). Using these features together with spectrograms as input to a Transformer-based network, we achieve significant improvements over the baseline method. We also present an audio-visual version of our system, where distance predictions are updated using depth maps from the panorama video frames.
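For readers unfamiliar with the underlying feature, the following is a minimal sketch of the classical (non-trainable) GCC-PHAT between two microphone signals, written with NumPy. It is not the NGCC-PHAT network described in the report, which learns filters applied before the correlation; the function names `gcc_phat` and `estimate_delay`, the zero-padding choice, and the small regularization constant are assumptions made for illustration only.

```python
import numpy as np


def gcc_phat(x, y, max_lag):
    """Classical GCC-PHAT for lags -max_lag..max_lag (in samples).

    Illustrative only: assumes 1-D real signals and max_lag >= 1.
    """
    n = len(x) + len(y)  # zero-pad so the circular correlation does not wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)  # cross-power spectrum
    cross = cross / (np.abs(cross) + 1e-12)  # phase transform: keep phase only
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that index 0 corresponds to lag -max_lag.
    return np.concatenate((cc[-max_lag:], cc[: max_lag + 1]))


def estimate_delay(x, y, max_lag):
    """Return the lag (in samples) by which x is delayed relative to y."""
    cc = gcc_phat(x, y, max_lag)
    return int(np.argmax(cc)) - max_lag


# Usage: a white-noise source reaching the second microphone 5 samples late.
rng = np.random.default_rng(0)
source = rng.standard_normal(1024)
delayed = np.concatenate((np.zeros(5), source[:-5]))
delay = estimate_delay(delayed, source, max_lag=16)
```

In a SELD pipeline, correlation vectors like the one returned by `gcc_phat` would typically be computed per microphone pair and per time frame and stacked alongside spectrogram features; the exact feature layout used in the submission is not specified here.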