Lund University Publications

THE LU SYSTEM FOR DCASE 2024 SOUND EVENT LOCALIZATION AND DETECTION CHALLENGE

Berg, Axel LU; Engman, Johanna LU; Gulin, Jens LU; Åström, Karl LU and Oskarsson, Magnus LU (2024)
Abstract
This technical report gives an overview of our submission to task 3 of the DCASE 2024 challenge. We present a sound event localization and detection (SELD) system using input features based on trainable neural generalized cross-correlations with phase transform (NGCC-PHAT). With these features together with spectrograms as input to a Transformer-based network, we achieve significant improvements over the baseline method. In addition, we also present an audio-visual version of our system, where distance predictions are updated using depth maps from the panorama video frames.
@techreport{ccb5d1f3-8c87-4398-9b2d-c3260c0f2fd3,
  abstract     = {{This technical report gives an overview of our submission to task 3 of the DCASE 2024 challenge. We present a sound event localization and detection (SELD) system using input features based on trainable neural generalized cross-correlations with phase transform (NGCC-PHAT). With these features together with spectrograms as input to a Transformer-based network, we achieve significant improvements over the baseline method. In addition, we also present an audio-visual version of our system, where distance predictions are updated using depth maps from the panorama video frames.}},
  author       = {{Berg, Axel and Engman, Johanna and Gulin, Jens and Åström, Karl and Oskarsson, Magnus}},
  language     = {{eng}},
  month        = {{06}},
  title        = {{THE LU SYSTEM FOR DCASE 2024 SOUND EVENT LOCALIZATION AND DETECTION CHALLENGE}},
  url          = {{https://dcase.community/documents/challenge2024/technical_reports/DCASE2024_Berg_24_t3.pdf}},
  year         = {{2024}},
}