wav2pos: Sound Source Localization using Masked Autoencoders
(2024) 2024 14th International Conference on Indoor Positioning and Indoor Navigation (IPIN) In International Conference on Indoor Positioning and Indoor Navigation (IPIN) p.1-8
- Abstract
- We present a novel approach to the 3D sound source localization task for distributed ad-hoc microphone arrays by formulating it as a set-to-set regression problem. By training a multi-modal masked autoencoder model that operates on audio recordings and microphone coordinates, we show that such a formulation allows for accurate localization of the sound source, by reconstructing coordinates masked in the input. Our approach is flexible in the sense that a single model can be used with an arbitrary number of microphones, even when a subset of audio recordings and microphone coordinates are missing. We test our method on simulated and real-world recordings of music and speech in indoor environments, and demonstrate competitive performance compared to both classical and other learning based localization methods.
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/4b3af846-795b-4ac7-956f-6aded73bc4e1
- author
- Berg, Axel LU ; Gulin, Jens LU ; O'Connor, Mark ; Zhou, Chuteng ; Åström, Kalle LU and Oskarsson, Magnus LU
- organization
- publishing date
- 2024
- type
- Chapter in Book/Report/Conference proceeding
- publication status
- published
- subject
- keywords
- sound source localization, masked autoencoders, transformers
- host publication
- 2024 14th International Conference on Indoor Positioning and Indoor Navigation (IPIN)
- series title
- International Conference on Indoor Positioning and Indoor Navigation (IPIN)
- pages
- 8 pages
- publisher
- IEEE - Institute of Electrical and Electronics Engineers Inc.
- conference name
- 2024 14th International Conference on Indoor Positioning and Indoor Navigation (IPIN)
- conference location
- Hong Kong
- conference dates
- 2024-10-14 - 2024-10-17
- external identifiers
- scopus:85216392587
- ISSN
- 2162-7347
- 2471-917X
- ISBN
- 979-8-3503-6641-9
- 979-8-3503-6640-2
- DOI
- 10.1109/IPIN62893.2024.10786105
- project
- Deep Learning for Simultaneous Localization and Mapping
- language
- English
- LU publication?
- yes
- id
- 4b3af846-795b-4ac7-956f-6aded73bc4e1
- alternative location
- https://arxiv.org/abs/2408.15771
- date added to LUP
- 2024-11-27 08:51:13
- date last changed
- 2025-07-03 11:00:34
@inproceedings{4b3af846-795b-4ac7-956f-6aded73bc4e1,
  abstract     = {{We present a novel approach to the 3D sound source localization task for distributed ad-hoc microphone arrays by formulating it as a set-to-set regression problem. By training a multi-modal masked autoencoder model that operates on audio recordings and microphone coordinates, we show that such a formulation allows for accurate localization of the sound source, by reconstructing coordinates masked in the input. Our approach is flexible in the sense that a single model can be used with an arbitrary number of microphones, even when a subset of audio recordings and microphone coordinates are missing. We test our method on simulated and real-world recordings of music and speech in indoor environments, and demonstrate competitive performance compared to both classical and other learning based localization methods.}},
  author       = {{Berg, Axel and Gulin, Jens and O'Connor, Mark and Zhou, Chuteng and Åström, Kalle and Oskarsson, Magnus}},
  booktitle    = {{2024 14th International Conference on Indoor Positioning and Indoor Navigation (IPIN)}},
  isbn         = {{979-8-3503-6641-9}},
  issn         = {{2162-7347}},
  keywords     = {{sound source localization; masked autoencoders; transformers}},
  language     = {{eng}},
  pages        = {{1--8}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  series       = {{International Conference on Indoor Positioning and Indoor Navigation (IPIN)}},
  title        = {{wav2pos: Sound Source Localization using Masked Autoencoders}},
  url          = {{http://dx.doi.org/10.1109/IPIN62893.2024.10786105}},
  doi          = {{10.1109/IPIN62893.2024.10786105}},
  year         = {{2024}},
}