Alternative implementations of the Auxiliary Duplicating Permutation Invariant Training
(2024)
- Abstract
- Simultaneous sound event localization and detection (SELD) for multi-source sound events is an open research field. The Multi-ACCDOA format is a popular way to handle activity-coupled sound events where the same class occurs at multiple locations at the same time. An important part is the Auxiliary Duplicating Permutation Invariant Training (ADPIT) paradigm, which calculates the loss for order-agnostic output. The baseline system for the DCASE SELD challenge 2024 has an implementation of ADPIT. In this paper we discuss alternative ways to implement ADPIT, with the goal of reducing multiplications and making the equivalent calculations faster. ADPIT duplicates output when there are fewer events than tracks. A brief discussion of how this differs from permutation invariant training without duplicated output is also included. The loss calculations are likely not the execution bottleneck in the current challenge setup, but ADPIT scales poorly as the number of tracks increases, so improved efficiency is of general interest for audio localization.
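The duplication idea in the abstract can be illustrated with a brute-force sketch. This is not the authors' method and not the DCASE baseline code; it is a naive reference under stated assumptions: a single class and a single time frame, mean squared error as the loss, and hypothetical helper names `assignments` and `adpit_loss`. Each track is assigned one of the active events, every event must be used at least once (so surplus tracks carry duplicates), and the smallest loss over all such assignments is returned.

```python
from itertools import product
from math import inf


def assignments(n_tracks, n_events):
    """Yield every track-to-event assignment that uses each event at least once."""
    for assign in product(range(n_events), repeat=n_tracks):
        if len(set(assign)) == n_events:
            yield assign


def adpit_loss(pred, events):
    """Naive order-agnostic MSE with duplicated targets (illustrative only).

    pred:   list of N track vectors (lists of floats of equal length)
    events: list of M <= N target vectors; an empty list means no active event
    """
    n_tracks, dim = len(pred), len(pred[0])
    if len(events) > n_tracks:
        raise ValueError("more events than tracks")
    # Assumption: with no active event, every track is compared to a zero vector.
    targets = events if events else [[0.0] * dim]
    best = inf
    for assign in assignments(n_tracks, len(targets)):
        squared_error = sum(
            (p - t) ** 2
            for track, ev in zip(pred, assign)
            for p, t in zip(track, targets[ev])
        )
        best = min(best, squared_error / (n_tracks * dim))
    return best
```

With three tracks, the enumeration yields 1, 6 and 6 assignments for one, two and three events. The number of assignments grows quickly with the track count, which is the kind of scaling the abstract refers to.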
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/4e67bf36-e4b3-48b5-b1ff-c59772d0bc55
- author
- Gulin, Jens LU and Åström, Kalle LU
- organization
- Computer Vision and Machine Learning (research group)
- Integrated Electronic Systems (research group)
- LU Profile Area: Natural and Artificial Cognition
- LTH Profile Area: AI and Digitalization
- ELLIIT: the Linköping-Lund initiative on IT and mobile communication
- eSSENCE: The e-Science Collaboration
- Mathematical Imaging Group (research group)
- publishing date
- 2024
- type
- Chapter in Book/Report/Conference proceeding
- publication status
- in press
- subject
- host publication
- Proceedings of the Work-in-Progress Papers at the 14th International Conference on Indoor Positioning and Indoor Navigation (IPIN-WiP 2024)
- language
- English
- LU publication?
- yes
- id
- 4e67bf36-e4b3-48b5-b1ff-c59772d0bc55
- date added to LUP
- 2024-10-05 04:33:14
- date last changed
- 2024-10-07 11:53:25
@inproceedings{4e67bf36-e4b3-48b5-b1ff-c59772d0bc55,
  abstract     = {{Simultaneous sound event localization and detection (SELD) for multi-source sound events is an open research field. The Multi-ACCDOA format is a popular way to handle activity-coupled sound events where the same class occurs at multiple locations at the same time. An important part is the Auxiliary Duplicating Permutation Invariant Training (ADPIT) paradigm that calculates the loss for order-agnostic output. The baseline system for the DCASE SELD challenge 2024 has an implementation of ADPIT. In this paper we discuss alternative ways to implement ADPIT with the goal to reduce multiplications, to make the equivalent calculations faster. ADPIT duplicates output when there are fewer events than tracks. A brief discussion how this differs from permutation invariant training without duplicated output is also included. The loss calculations are likely not the execution bottleneck in the current challenge setup, but ADPIT scales poorly for an increased number of tracks and improved efficiency is thus of general interest for audio localization.}},
  author       = {{Gulin, Jens and Åström, Kalle}},
  booktitle    = {{Proceedings of the Work-in-Progress Papers at the 14th International Conference on Indoor Positioning and Indoor Navigation (IPIN-WiP 2024)}},
  language     = {{eng}},
  title        = {{Alternative implementations of the Auxiliary Duplicating Permutation Invariant Training}},
  url          = {{https://lup.lub.lu.se/search/files/196591300/IPINwip_ARMPIT_final.pdf}},
  year         = {{2024}},
}