SONNET : Enhancing Time Delay Estimation by Leveraging Simulated Audio

Tegler, Erik; Oskarsson, Magnus; Åström, Kalle


(2025) 27th International Conference on Pattern Recognition, ICPR 2024. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 15320 LNCS, p. 289-303

Abstract: Time delay estimation, or Time-Difference-of-Arrival (TDOA) estimation, is a critical component of multiple localization applications such as multilateration, direction of arrival, and self-calibration. The task is to estimate the difference in arrival time of a signal at two different sensors. For the audio sensor modality, most current systems are based on classical methods such as the Generalized Cross-Correlation Phase Transform (GCC-PHAT) method. In this paper we demonstrate that learning-based methods can, even when trained on synthetic data, significantly outperform GCC-PHAT on novel real-world data. To overcome the lack of data with ground truth for the task, we train our model on a simulated dataset that is sufficiently large and varied and that captures the relevant characteristics of the real-world problem. We provide our trained model, SONNET (Simulation Optimized Neural Network Estimator of Timeshifts), which runs in real time and works on novel data out of the box for many real-data applications, i.e., without re-training. We further demonstrate greatly improved performance on the downstream task of self-calibration when using our model compared to classical methods.
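For readers unfamiliar with the classical baseline named in the abstract, the following is a minimal textbook-style sketch of GCC-PHAT for two audio channels. It is not the authors' code and not the SONNET model; the function name `gcc_phat` and the parameters `fs` (sampling rate in Hz) and `max_tau` (optional bound on the delay in seconds) are illustrative assumptions.

```python
import numpy as np


def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the delay of y relative to x, in seconds (hypothetical helper).

    A positive result means the signal reaches sensor y later than sensor x.
    """
    # Zero-pad so the FFT-based correlation is linear, not circular.
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)

    # Cross-power spectrum; the PHAT weighting discards magnitude
    # and keeps only the phase, which sharpens the correlation peak.
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if a bound is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs
```

Multiplying the returned delay by the speed of sound gives the range difference that multilateration or self-calibration methods would consume; on reverberant or noisy recordings the correlation peak of such a baseline can be ambiguous, which is the failure mode the abstract's learned estimator is meant to address.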

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/aa3af396-6cc5-40bf-8e28-18bce4ce6e6e

author

Tegler, Erik (LU) ; Oskarsson, Magnus (LU) and Åström, Kalle (LU)

organization

publishing date

2025

type

Chapter in Book/Report/Conference proceeding

publication status

published

subject

Signal Processing

keywords

Audio, Data Simulation, Generalized Cross-Correlation, Time Delay Estimation, Time-Difference-of-Arrival

host publication

Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings

series title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

editor

Antonacopoulos, Apostolos ; Chaudhuri, Subhasis ; Chellappa, Rama ; Liu, Cheng-Lin ; Bhattacharya, Saumik and Pal, Umapada

volume

15320 LNCS

pages

15 pages

publisher

Springer

conference name

27th International Conference on Pattern Recognition, ICPR 2024

conference location

Kolkata, India

conference dates

2024-12-01 - 2024-12-05

external identifiers

scopus:85212247795

ISSN

0302-9743

1611-3349

ISBN

9783031784972

DOI

10.1007/978-3-031-78498-9_20

language

English

LU publication?

yes

additional info

id

aa3af396-6cc5-40bf-8e28-18bce4ce6e6e

date added to LUP

2025-01-22 11:36:01

date last changed

2025-07-24 02:08:57

@inproceedings{aa3af396-6cc5-40bf-8e28-18bce4ce6e6e,
  abstract     = {{Time delay estimation, or Time-Difference-of-Arrival (TDOA) estimation, is a critical component of multiple localization applications such as multilateration, direction of arrival, and self-calibration. The task is to estimate the difference in arrival time of a signal at two different sensors. For the audio sensor modality, most current systems are based on classical methods such as the Generalized Cross-Correlation Phase Transform (GCC-PHAT) method. In this paper we demonstrate that learning-based methods can, even when trained on synthetic data, significantly outperform GCC-PHAT on novel real-world data. To overcome the lack of data with ground truth for the task, we train our model on a simulated dataset that is sufficiently large and varied and that captures the relevant characteristics of the real-world problem. We provide our trained model, SONNET (Simulation Optimized Neural Network Estimator of Timeshifts), which runs in real time and works on novel data out of the box for many real-data applications, i.e., without re-training. We further demonstrate greatly improved performance on the downstream task of self-calibration when using our model compared to classical methods.}},
  author       = {{Tegler, Erik and Oskarsson, Magnus and Åström, Kalle}},
  booktitle    = {{Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings}},
  editor       = {{Antonacopoulos, Apostolos and Chaudhuri, Subhasis and Chellappa, Rama and Liu, Cheng-Lin and Bhattacharya, Saumik and Pal, Umapada}},
  isbn         = {{9783031784972}},
  issn         = {{0302-9743}},
  keywords     = {{Audio; Data Simulation; Generalized Cross-Correlation; Time Delay Estimation; Time-Difference-of-Arrival}},
  language     = {{eng}},
  pages        = {{289--303}},
  publisher    = {{Springer}},
  series       = {{Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)}},
  title        = {{SONNET : Enhancing Time Delay Estimation by Leveraging Simulated Audio}},
  url          = {{http://dx.doi.org/10.1007/978-3-031-78498-9_20}},
  doi          = {{10.1007/978-3-031-78498-9_20}},
  volume       = {{15320 LNCS}},
  year         = {{2025}},
}
