wav2pos: Sound Source Localization using Masked Autoencoders

Berg, Axel; Gulin, Jens; O'Connor, Mark; Zhou, Chuteng, et al. (2024). wav2pos: Sound Source Localization using Masked Autoencoders. 2024 14th International Conference on Indoor Positioning and Indoor Navigation (IPIN), 1-8. Hong Kong: IEEE - Institute of Electrical and Electronics Engineers Inc.
Conference Proceeding/Paper | Published | English
Authors:
Berg, Axel; Gulin, Jens; O'Connor, Mark; Zhou, Chuteng, et al.
Department:
Computer Vision and Machine Learning
Integrated Electronic Systems
LU Profile Area: Natural and Artificial Cognition
ELLIIT: the Linköping-Lund initiative on IT and mobile communication
Project:
Deep Learning for Simultaneous Localization and Mapping
Research Group:
Computer Vision and Machine Learning
Integrated Electronic Systems
Abstract:
We present a novel approach to the 3D sound source localization task for distributed ad-hoc microphone arrays by formulating it as a set-to-set regression problem. By training a multi-modal masked autoencoder model that operates on audio recordings and microphone coordinates, we show that such a formulation allows for accurate localization of the sound source by reconstructing coordinates masked in the input. Our approach is flexible in the sense that a single model can be used with an arbitrary number of microphones, even when a subset of audio recordings and microphone coordinates is missing. We test our method on simulated and real-world recordings of music and speech in indoor environments, and demonstrate competitive performance compared to both classical and other learning-based localization methods.
Keywords:
sound source localization; masked autoencoders; transformers
ISBN:
979-8-3503-6641-9
ISSN:
2471-917X
LUP-ID:
4b3af846-795b-4ac7-956f-6aded73bc4e1 | Link: https://lup.lub.lu.se/record/4b3af846-795b-4ac7-956f-6aded73bc4e1
