
Lund University Publications


Extending GCC-PHAT using Shift Equivariant Neural Networks

Berg, Axel; O'Connor, Mark; Åström, Kalle and Oskarsson, Magnus (2022) Interspeech 2022. In Interspeech, p. 1791-1795
Abstract
Speaker localization using microphone arrays depends on accurate time delay estimation techniques. For decades, methods based on the generalized cross correlation with phase transform (GCC-PHAT) have been widely adopted for this purpose. Recently, the GCC-PHAT has also been used to provide input features to neural networks in order to remove the effects of noise and reverberation, but at the cost of losing theoretical guarantees in noise-free conditions. We propose a novel approach to extending the GCC-PHAT, where the received signals are filtered using a shift equivariant neural network that preserves the timing information contained in the signals. By extensive experiments we show that our model consistently reduces the error of the GCC-PHAT in adverse environments, with guarantees of exact time delay recovery in ideal conditions.
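The abstract rests on two properties: GCC-PHAT recovers a time delay from the phase of the cross-spectrum, and a shift-equivariant filter applied to both channels keeps that delay intact. The following NumPy sketch illustrates both under simplifying assumptions that are not from the paper: integer-sample circular (FFT-based) delays, no noise or reverberation, and a fixed circular-convolution kernel `h` standing in for the learned network. The helper names `gcc_phat` and `circular_conv` and the kernel values are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def gcc_phat(x1, x2, eps=1e-12):
    """Estimate the integer delay (in samples) of x1 relative to x2 with GCC-PHAT.

    Uses circular, FFT-based correlation, so delays wrap modulo len(x1).
    The result lies in [-N//2, N//2).
    """
    n = len(x1)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)
    # PHAT weighting: divide out the magnitude so only the phase remains.
    cross /= np.abs(cross) + eps
    cc = np.fft.irfft(cross, n=n)  # generalized cross-correlation
    lag = int(np.argmax(cc))
    # Lags in the upper half of the circular range are negative delays.
    if lag >= n // 2:
        lag -= n
    return lag


def circular_conv(x, h):
    """Circular convolution of x with kernel h.

    Hypothetical stand-in for a learned shift-equivariant network:
    it commutes with circular shifts (np.roll).
    """
    n = len(x)
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(h, n=n), n=n)


rng = np.random.default_rng(0)
n = 1024
true_delay = 37
x2 = rng.standard_normal(n)
x1 = np.roll(x2, true_delay)  # x1 is x2 circularly delayed by 37 samples

h = np.array([1.0, 0.5, 0.25])  # hypothetical kernel, not from the paper
y1, y2 = circular_conv(x1, h), circular_conv(x2, h)

print(gcc_phat(x1, x2))  # 37
print(gcc_phat(y1, y2))  # 37 (delay preserved after filtering)
```

Because the same filter is applied to both channels and commutes with shifts, the filtered signals are still exact shifted copies of each other, so GCC-PHAT returns the same delay. With real recordings, noise, reverberation and non-circular delays mean the peak is no longer exact; the abstract reports that the learned filter reduces GCC-PHAT error in such adverse environments.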
author
Berg, Axel; O'Connor, Mark; Åström, Kalle and Oskarsson, Magnus
publishing date
2022
type
Chapter in Book/Report/Conference proceeding
publication status
published
keywords
speaker localization, TDOA, machine learning
host publication
Proceedings of the Annual Conference of the International Speech Communication Association 2022
series title
Interspeech
pages
5 pages
publisher
ISCA
conference name
Interspeech 2022
conference location
Incheon, Republic of Korea
conference dates
2022-09-18 - 2022-09-22
external identifiers
  • scopus:85140089206
DOI
10.21437/Interspeech.2022-524
project
Deep Learning for Simultaneous Localization and Mapping
language
English
LU publication?
yes
id
a50aa106-10f7-4972-a144-f1570abf1580
alternative location
https://arxiv.org/abs/2208.04654
date added to LUP
2022-09-21 08:34:13
date last changed
2023-11-21 11:32:20
@inproceedings{a50aa106-10f7-4972-a144-f1570abf1580,
  abstract     = {{Speaker localization using microphone arrays depends on accurate time delay estimation techniques. For decades, methods based on the generalized cross correlation with phase transform (GCC-PHAT) have been widely adopted for this purpose. Recently, the GCC-PHAT has also been used to provide input features to neural networks in order to remove the effects of noise and reverberation, but at the cost of losing theoretical guarantees in noise-free conditions. We propose a novel approach to extending the GCC-PHAT, where the received signals are filtered using a shift equivariant neural network that preserves the timing information contained in the signals. By extensive experiments we show that our model consistently reduces the error of the GCC-PHAT in adverse environments, with guarantees of exact time delay recovery in ideal conditions.}},
  author       = {{Berg, Axel and O'Connor, Mark and Åström, Kalle and Oskarsson, Magnus}},
  booktitle    = {{Proceedings of the Annual Conference of the International Speech Communication Association 2022}},
  keywords     = {{speaker localization; TDOA; machine learning}},
  language     = {{eng}},
  pages        = {{1791--1795}},
  publisher    = {{ISCA}},
  series       = {{Interspeech}},
  title        = {{Extending GCC-PHAT using Shift Equivariant Neural Networks}},
  url          = {{http://dx.doi.org/10.21437/Interspeech.2022-524}},
  doi          = {{10.21437/Interspeech.2022-524}},
  year         = {{2022}},
}