THE LU SYSTEM FOR DCASE 2024 SOUND EVENT LOCALIZATION AND DETECTION CHALLENGE

Berg, Axel; Engman, Johanna; Gulin, Jens; Åström, Karl; Oskarsson, Magnus

(2024)

Abstract: This technical report gives an overview of our submission to Task 3 of the DCASE 2024 Challenge. We present a sound event localization and detection (SELD) system whose input features are based on trainable neural generalized cross-correlations with phase transform (NGCC-PHAT). Feeding these features, together with spectrograms, to a Transformer-based network yields significant improvements over the baseline method. We also present an audio-visual version of the system, in which distance predictions are updated using depth maps from the panoramic video frames.
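As its name suggests, NGCC-PHAT generalizes the classic GCC-PHAT, in which the cross-power spectrum of two microphone signals is divided by its magnitude so that only phase information remains; the peak of the resulting correlation indicates the time difference of arrival between the microphones. The sketch below illustrates only this classic, non-trainable GCC-PHAT, not the trainable NGCC-PHAT front end described in the report. It assumes equal-length frames, uses a naive pure-Python DFT and circular correlation for clarity, and the function names `dft`, `idft`, `gcc_phat` and `estimate_delay` are hypothetical.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for short illustrative frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT, including the 1/n normalization."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def gcc_phat(x1, x2, eps=1e-12):
    """Circular GCC-PHAT of two equal-length frames (hypothetical helper).

    Index tau of the result holds the correlation at lag tau; indices above
    n // 2 correspond to negative lags (tau - n).
    """
    if len(x1) != len(x2):
        raise ValueError("frames must have equal length")
    cross = [a * b.conjugate() for a, b in zip(dft(x1), dft(x2))]
    # Phase transform: divide by the magnitude so only phase information remains.
    whitened = [c / (abs(c) + eps) for c in cross]
    return [v.real for v in idft(whitened)]


def estimate_delay(x1, x2):
    """Delay of x1 relative to x2 in samples, taken from the GCC-PHAT peak."""
    corr = gcc_phat(x1, x2)
    n = len(corr)
    peak = max(range(n), key=lambda tau: corr[tau])
    # Map circular indices above n // 2 to negative lags.
    return peak - n if peak > n // 2 else peak


if __name__ == "__main__":
    rng = random.Random(0)
    ref = [rng.gauss(0.0, 1.0) for _ in range(64)]
    delayed = [ref[(t - 3) % 64] for t in range(64)]  # ref delayed by 3 samples
    print(estimate_delay(delayed, ref))  # 3
```

In a practical front end the frames would typically be zero-padded to avoid circular wrap-around and transformed with an FFT rather than a naive DFT; the per-lag correlation values, rather than only the peak, are what a network would receive as features.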

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/ccb5d1f3-8c87-4398-9b2d-c3260c0f2fd3

author

Berg, Axel (LU); Engman, Johanna (LU); Gulin, Jens (LU); Åström, Karl (LU) and Oskarsson, Magnus (LU)

publishing date

2024-06-30

type

Book/Report

publication status

published

subject

Computer graphics and computer vision

language

English

LU publication?

yes

id

ccb5d1f3-8c87-4398-9b2d-c3260c0f2fd3

alternative location

https://dcase.community/documents/challenge2024/technical_reports/DCASE2024_Berg_24_t3.pdf

date added to LUP

2024-07-22 21:57:31

date last changed

2025-04-04 14:21:11

@techreport{ccb5d1f3-8c87-4398-9b2d-c3260c0f2fd3,
  abstract     = {{This technical report gives an overview of our submission to task 3 of the DCASE 2024 challenge. We present a sound event localization and detection (SELD) system using input features based on trainable neural generalized cross-correlations with phase transform (NGCC-PHAT). With these features together with spectrograms as input to a Transformer-based network, we achieve significant improvements over the baseline method. In addition, we also present an audio-visual version of our system, where distance predictions are updated using depth maps from the panorama video frames.}},
  author       = {{Berg, Axel and Engman, Johanna and Gulin, Jens and Åström, Karl and Oskarsson, Magnus}},
  language     = {{eng}},
  month        = {{06}},
  title        = {{THE LU SYSTEM FOR DCASE 2024 SOUND EVENT LOCALIZATION AND DETECTION CHALLENGE}},
  url          = {{https://dcase.community/documents/challenge2024/technical_reports/DCASE2024_Berg_24_t3.pdf}},
  year         = {{2024}},
}

Lund University Publications

LUND UNIVERSITY LIBRARIES
