
Lund University Publications

LUND UNIVERSITY LIBRARIES

wav2pos: Sound Source Localization using Masked Autoencoders

Berg, Axel (LU); Gulin, Jens (LU); O'Connor, Mark; Zhou, Chuteng; Åström, Kalle (LU) and Oskarsson, Magnus (LU) (2024). 2024 14th International Conference on Indoor Positioning and Indoor Navigation (IPIN). In International Conference on Indoor Positioning and Indoor Navigation (IPIN), p. 1-8.
Abstract
We present a novel approach to the 3D sound source localization task for distributed ad-hoc microphone arrays by formulating it as a set-to-set regression problem. By training a multi-modal masked autoencoder model that operates on audio recordings and microphone coordinates, we show that such a formulation allows for accurate localization of the sound source, by reconstructing coordinates masked in the input. Our approach is flexible in the sense that a single model can be used with an arbitrary number of microphones, even when a subset of audio recordings and microphone coordinates are missing. We test our method on simulated and real-world recordings of music and speech in indoor environments, and demonstrate competitive performance compared to both classical and other learning-based localization methods.
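The abstract's central idea, treating the microphones and the source as an unordered set and regressing whichever coordinates are masked, can be illustrated with a minimal Python sketch. It is not the wav2pos model and contains no neural network: it only shows how such an input set might be assembled for any number of microphones, assuming one token per microphone plus one source token whose position is always masked, and treating a missing recording or coordinate as a masked entry. The names `Token`, `build_input_set` and `reconstruction_targets` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


@dataclass
class Token:
    """One element of the unordered input set (hypothetical layout)."""
    kind: str                          # "mic" or "source"
    audio: Optional[Sequence[float]]   # raw samples, or None if missing
    coord: Optional[Coord]             # 3D position, or None if masked/unknown

    @property
    def audio_masked(self) -> bool:
        return self.audio is None

    @property
    def coord_masked(self) -> bool:
        return self.coord is None


def build_input_set(
    recordings: Sequence[Optional[Sequence[float]]],
    mic_coords: Sequence[Optional[Coord]],
) -> List[Token]:
    """Return one token per microphone plus one source token.

    The set size follows the number of microphones, so a single
    (hypothetical) set model could consume arrays of any size.
    """
    if len(recordings) != len(mic_coords):
        raise ValueError("need one coordinate entry per recording")
    tokens = [Token("mic", a, c) for a, c in zip(recordings, mic_coords)]
    # Assumption: the source token carries no recording of its own, and its
    # position is unknown, so its coordinate is always a regression target.
    tokens.append(Token("source", None, None))
    return tokens


def reconstruction_targets(tokens: Sequence[Token]) -> List[int]:
    """Indices of tokens whose masked coordinates a model would regress."""
    return [i for i, t in enumerate(tokens) if t.coord_masked]


if __name__ == "__main__":
    recs = [[0.0, 0.1], [0.2, 0.3], None, [0.4, 0.5]]
    coords = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0), (0.0, 3.0, 1.0), None]
    tokens = build_input_set(recs, coords)
    print(len(tokens))                     # 5
    print(reconstruction_targets(tokens))  # [3, 4]
```

In this example the third microphone has no recording but a known position, so only its audio is masked; the fourth has a recording but an unknown position, so its coordinate joins the source coordinate as a reconstruction target.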
author
Berg, Axel; Gulin, Jens; O'Connor, Mark; Zhou, Chuteng; Åström, Kalle and Oskarsson, Magnus
organization
publishing date
2024
type
Chapter in Book/Report/Conference proceeding
publication status
published
subject
keywords
sound source localization, masked autoencoders, transformers
host publication
2024 14th International Conference on Indoor Positioning and Indoor Navigation (IPIN)
series title
International Conference on Indoor Positioning and Indoor Navigation (IPIN)
pages
8 pages
publisher
IEEE - Institute of Electrical and Electronics Engineers Inc.
conference name
2024 14th International Conference on Indoor Positioning and Indoor Navigation (IPIN)
conference location
Hong Kong
conference dates
2024-10-14 - 2024-10-17
external identifiers
  • scopus:85216392587
ISSN
2162-7347
2471-917X
ISBN
979-8-3503-6641-9
979-8-3503-6640-2
DOI
10.1109/IPIN62893.2024.10786105
project
Deep Learning for Simultaneous Localization and Mapping
language
English
LU publication?
yes
id
4b3af846-795b-4ac7-956f-6aded73bc4e1
alternative location
https://arxiv.org/abs/2408.15771
date added to LUP
2024-11-27 08:51:13
date last changed
2025-07-03 11:00:34
@inproceedings{4b3af846-795b-4ac7-956f-6aded73bc4e1,
  abstract     = {{We present a novel approach to the 3D sound source localization task for distributed ad-hoc microphone arrays by formulating it as a set-to-set regression problem. By training a multi-modal masked autoencoder model that operates on audio recordings and microphone coordinates, we show that such a formulation allows for accurate localization of the sound source, by reconstructing coordinates masked in the input. Our approach is flexible in the sense that a single model can be used with an arbitrary number of microphones, even when a subset of audio recordings and microphone coordinates are missing. We test our method on simulated and real-world recordings of music and speech in indoor environments, and demonstrate competitive performance compared to both classical and other learning based localization methods.}},
  author       = {{Berg, Axel and Gulin, Jens and O'Connor, Mark and Zhou, Chuteng and Åström, Kalle and Oskarsson, Magnus}},
  booktitle    = {{2024 14th International Conference on Indoor Positioning and Indoor Navigation (IPIN)}},
  isbn         = {{979-8-3503-6641-9}},
  issn         = {{2162-7347}},
  keywords     = {{sound source localization; masked autoencoders; transformers}},
  language     = {{eng}},
  pages        = {{1--8}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  series       = {{International Conference on Indoor Positioning and Indoor Navigation (IPIN)}},
  title        = {{wav2pos: Sound Source Localization using Masked Autoencoders}},
  url          = {{http://dx.doi.org/10.1109/IPIN62893.2024.10786105}},
  doi          = {{10.1109/IPIN62893.2024.10786105}},
  year         = {{2024}},
}