wav2pos: Sound Source Localization using Masked Autoencoders

Berg, Axel; Gulin, Jens; O'Connor, Mark; Zhou, Chuteng, et al. (2024). wav2pos: Sound Source Localization using Masked Autoencoders. 2024 14th International Conference on Indoor Positioning and Indoor Navigation (IPIN), 1-8. Hong Kong: IEEE - Institute of Electrical and Electronics Engineers Inc.

URL:

https://arxiv.org/abs/2408.15771

DOI:

10.1109/IPIN62893.2024.10786105

Conference Proceeding/Paper | Published | English

Authors:

Berg, Axel; Gulin, Jens; O'Connor, Mark; Zhou, Chuteng, et al.

Department:

Computer Vision and Machine Learning
Integrated Electronic Systems
LU Profile Area: Natural and Artificial Cognition
ELLIIT: the Linköping-Lund initiative on IT and mobile communication

Project:

Deep Learning for Simultaneous Localization and Mapping

Research Group:

Computer Vision and Machine Learning

Abstract:
We present a novel approach to the 3D sound source localization task for distributed ad-hoc microphone arrays by formulating it as a set-to-set regression problem. By training a multi-modal masked autoencoder model that operates on audio recordings and microphone coordinates, we show that such a formulation allows for accurate localization of the sound source by reconstructing coordinates masked in the input. Our approach is flexible in the sense that a single model can be used with an arbitrary number of microphones, even when a subset of audio recordings and microphone coordinates is missing. We test our method on simulated and real-world recordings of music and speech in indoor environments, and demonstrate competitive performance compared to both classical and other learning-based localization methods.
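
The masking idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the helper names `mask_set` and `masked_error`, the `None` mask placeholder, the toy coordinates, and the hand-written "prediction" are all assumptions made for illustration. It only shows how localization can be posed as filling in masked entries of a set, with the error measured on the masked coordinates.

```python
import math

MASK = None  # stand-in for a learned mask token (assumption, not from the paper)

def mask_set(items, masked_idx):
    """Replace the entries at masked_idx with MASK, leaving the rest intact."""
    return [MASK if i in masked_idx else x for i, x in enumerate(items)]

def masked_error(pred, target, masked_idx):
    """Mean Euclidean distance, evaluated only on the masked coordinates."""
    dists = [math.dist(pred[i], target[i]) for i in sorted(masked_idx)]
    return sum(dists) / len(dists)

# Element 0 is the sound source; elements 1..3 are microphones (toy values).
coords = [(1.0, 2.0, 1.5), (0.0, 0.0, 1.0), (4.0, 0.0, 1.0), (0.0, 3.0, 2.0)]
audio = ["src_wave", "mic1_wave", "mic2_wave", "mic3_wave"]  # placeholders for waveforms

# Localization: hide the source coordinate. Here the source audio is treated
# as unavailable and one microphone recording is also missing (assumptions).
coords_in = mask_set(coords, {0})
audio_in = mask_set(audio, {0, 2})

# A trained model would map (audio_in, coords_in) to complete sets.
# This hand-written output only stands in for such a prediction.
fake_pred = [(1.0, 2.0, 1.0)] + coords[1:]
error = masked_error(fake_pred, coords, {0})
```

Because the inputs are plain lists, the same helpers work for any number of microphones, which mirrors the flexibility the abstract describes.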

Keywords:

sound source localization; masked autoencoders; transformers

ISBN:

979-8-3503-6641-9

ISSN:

2471-917X

LUP-ID:

4b3af846-795b-4ac7-956f-6aded73bc4e1 | Link: https://lup.lub.lu.se/record/4b3af846-795b-4ac7-956f-6aded73bc4e1
