GCC-PHAT Re-Imagined - A U-Net Filter for Audio TDOA Peak-Selection
(2024) 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing p.8806-8810
- Abstract
- Time-difference-of-arrival (TDOA) estimation from GCC-PHAT is not always as straight forward as finding the maximum peak. This work views the GCC output as an image, with time on the vertical axis and TDOA horizontally, to explore if image-to-image machine learning methods can make a more robust filter. The Structure from Sound Database provides audio recorded with a distributed microphone setup and a moving sound source. The audio was fed to GCC-PHAT without pre-processing, and images were produced for batch processing. The ground truth, the direct-path TDOA, shows a continuous curve through time. The GCC output image has a similar curve, but obscured by noise and not at all times texturally different from the multi-path components. The main approach tested is binary semantic segmentation with a U-Net. A challenge is the extreme class imbalance within the image. Preliminary results indicate that the method is valid to detect curves, yet more work is needed to single out the direct path TDOA with confidence.
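As an illustration of the "GCC output as an image" idea in the abstract, the following is a minimal sketch, not the authors' implementation. It computes GCC-PHAT frame by frame for one microphone pair and stacks the results so that rows correspond to time and columns to candidate TDOA lags. The function names `gcc_phat` and `gcc_image`, the Hann window, and the `frame_len`, `hop`, and `max_lag` parameters are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def gcc_phat(x, y, max_lag):
    """GCC-PHAT of two equal-length frames for lags -max_lag..max_lag (in samples)."""
    n = 2 * len(x)  # zero-pad so the circular correlation does not wrap around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)
    # Negative lags sit at the end of cc; move them in front of the positive lags.
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))


def gcc_image(sig_a, sig_b, frame_len, hop, max_lag):
    """Stack per-frame GCC-PHAT rows: time on the vertical axis, TDOA lag horizontally."""
    window = np.hanning(frame_len)  # assumed window, chosen only for this sketch
    starts = range(0, len(sig_a) - frame_len + 1, hop)
    rows = [
        gcc_phat(window * sig_a[s:s + frame_len],
                 window * sig_b[s:s + frame_len],
                 max_lag)
        for s in starts
    ]
    return np.array(rows)


if __name__ == "__main__":
    # Synthetic check: sig_a is sig_b delayed by 5 samples, so each row peaks at lag +5.
    rng = np.random.default_rng(0)
    sig_b = rng.standard_normal(8192)
    sig_a = np.roll(sig_b, 5)
    image = gcc_image(sig_a, sig_b, frame_len=1024, hop=512, max_lag=20)
    print(image.shape, image.argmax(axis=1) - 20)
```

Taking the per-row maximum, as in the synthetic check, is the plain peak-picking baseline that the abstract describes as insufficient on real recordings, where multi-path components can outweigh the direct path. An image such as `image` is the kind of input an image-to-image filter would operate on.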
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/e85c159b-3969-479d-b475-30b3e3e7ab01
- author
- Gulin, Jens (LU) and Åström, Kalle (LU)
- organization
- Computer Vision and Machine Learning (research group)
- Integrated Electronic Systems (research group)
- LU Profile Area: Nature-based future solutions
- LU Profile Area: Light and Materials
- LU Profile Area: Proactive Ageing
- LU Profile Area: Natural and Artificial Cognition
- LTH Profile Area: AI and Digitalization
- LTH Profile Area: Engineering Health
- Stroke Imaging Research group (research group)
- ELLIIT: the Linköping-Lund initiative on IT and mobile communication
- eSSENCE: The e-Science Collaboration
- Mathematical Imaging Group (research group)
- alternative title
- GCC-PHAT ombildad - Ett U-Net filter för urval av Audio TDOA toppar
- publishing date
- 2024-03-18
- type
- Chapter in Book/Report/Conference proceeding
- publication status
- published
- subject
- keywords
- Time-difference-of-arrival, Semantic segmentation, curve detection, noise reduction, U-Net, Generalized Cross-Correlation
- host publication
- ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- pages
- 5 pages
- publisher
- IEEE - Institute of Electrical and Electronics Engineers Inc.
- conference name
- 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing
- conference location
- Seoul, Republic of Korea
- conference dates
- 2024-04-14 - 2024-04-19
- external identifiers
- scopus:85195385350
- ISBN
- 979-8-3503-4486-8
- 979-8-3503-4485-1
- DOI
- 10.1109/ICASSP48485.2024.10447558
- language
- English
- LU publication?
- yes
- additional info
- “© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.”
- id
- e85c159b-3969-479d-b475-30b3e3e7ab01
- date added to LUP
- 2024-04-29 15:25:59
- date last changed
- 2024-11-08 13:15:29
@inproceedings{e85c159b-3969-479d-b475-30b3e3e7ab01,
  abstract     = {{Time-difference-of-arrival (TDOA) estimation from GCC-PHAT is not always as straight forward as finding the maximum peak. This work views the GCC output as an image, with time on the vertical axis and TDOA horizontally, to explore if image-to-image machine learning methods can make a more robust filter. The Structure from Sound Database provides audio recorded with a distributed microphone setup and a moving sound source. The audio was fed to GCC-PHAT without pre-processing, and images were produced for batch processing. The ground truth, the direct-path TDOA, shows a continuous curve through time. The GCC output image has a similar curve, but obscured by noise and not at all times texturally different from the multi-path components. The main approach tested is binary semantic segmentation with a U-Net. A challenge is the extreme class imbalance within the image. Preliminary results indicate that the method is valid to detect curves, yet more work is needed to single out the direct path TDOA with confidence.}},
  author       = {{Gulin, Jens and Åström, Kalle}},
  booktitle    = {{ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}},
  isbn         = {{979-8-3503-4486-8}},
  keywords     = {{Time-difference-of-arrival; Semantic segmentation; curve detection; noise reduction; U-Net; Generalized Cross-Correlation}},
  language     = {{eng}},
  month        = {{03}},
  pages        = {{8806--8810}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  title        = {{GCC-PHAT Re-Imagined - A U-Net Filter for Audio TDOA Peak-Selection}},
  url          = {{https://lup.lub.lu.se/search/files/181844482/ICASSP24_GCC_Reimagined_Gulin_str_m_.pdf}},
  doi          = {{10.1109/ICASSP48485.2024.10447558}},
  year         = {{2024}},
}