GCC-PHAT Re-Imagined - A U-Net Filter for Audio TDOA Peak-Selection

Gulin, Jens; Åström, Kalle (2024-03-18). GCC-PHAT Re-Imagined - A U-Net Filter for Audio TDOA Peak-Selection. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 8806-8810. 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing. Seoul, Korea, Republic of: IEEE - Institute of Electrical and Electronics Engineers Inc.
Conference Proceeding/Paper | Published | English
Authors:
Gulin, Jens ; Åström, Kalle
Department:
Computer Vision and Machine Learning
Integrated Electronic Systems
LU Profile Area: Nature-based future solutions
LU Profile Area: Light and Materials
LU Profile Area: Proactive Ageing
LU Profile Area: Natural and Artificial Cognition
LTH Profile Area: AI and Digitalization
LTH Profile Area: Engineering Health
Stroke Imaging Research group
ELLIIT: the Linköping-Lund initiative on IT and mobile communication
eSSENCE: The e-Science Collaboration
Mathematical Imaging Group
Research Group:
Computer Vision and Machine Learning
Integrated Electronic Systems
Stroke Imaging Research group
Mathematical Imaging Group
Alternative Title:
GCC-PHAT ombildad - Ett U-Net filter för urval av Audio TDOA toppar
Abstract:
Time-difference-of-arrival (TDOA) estimation from GCC-PHAT is not always as straightforward as finding the maximum peak. This work views the GCC output as an image, with time on the vertical axis and TDOA on the horizontal axis, to explore whether image-to-image machine learning methods can provide a more robust filter. The Structure from Sound Database provides audio recorded with a distributed microphone setup and a moving sound source. The audio was fed to GCC-PHAT without pre-processing, and images were produced for batch processing. The ground truth, the direct-path TDOA, traces a continuous curve through time. The GCC output image shows a similar curve, but it is obscured by noise and is not always texturally distinct from the multi-path components. The main approach tested is binary semantic segmentation with a U-Net. A challenge is the extreme class imbalance within the image. Preliminary results indicate that the method can detect curves, yet more work is needed to single out the direct-path TDOA with confidence.
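As an illustration of the image construction described in the abstract, the following is a minimal NumPy sketch of framewise GCC-PHAT stacked into a time-by-lag array. It is not the authors' code. The function names `gcc_phat` and `gcc_image`, the parameters `frame_len`, `hop` and `max_lag`, the regularisation constant and the sign convention for the lag are all assumptions made for this example.

```python
import numpy as np


def gcc_phat(x, y, max_lag):
    """GCC-PHAT of two equal-length frames for lags -max_lag..max_lag (samples).

    Assumed convention: a positive lag means x is a delayed copy of y.
    max_lag is assumed to be >= 1 and smaller than the frame length.
    """
    n = len(x) + len(y)  # zero-pad so circular correlation does not wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that column index max_lag corresponds to zero lag
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))


def gcc_image(x, y, frame_len, hop, max_lag):
    """Stack framewise GCC-PHAT rows: time on the vertical axis, lag horizontally."""
    n_samples = min(len(x), len(y))
    rows = [
        gcc_phat(x[s:s + frame_len], y[s:s + frame_len], max_lag)
        for s in range(0, n_samples - frame_len + 1, hop)
    ]
    return np.array(rows)


# Synthetic usage: channel x is channel y delayed by 5 samples
rng = np.random.default_rng(0)
source = rng.standard_normal(4096)
y_sig = source
x_sig = np.concatenate((np.zeros(5), source[:-5]))
image = gcc_image(x_sig, y_sig, frame_len=1024, hop=512, max_lag=20)
peak_lags = image.argmax(axis=1) - 20  # per-frame maximum-peak TDOA estimate
```

In this clean synthetic case the per-row maximum recovers the delay. The abstract's point is that with real recordings, noise and multi-path components make that simple argmax unreliable, which is what motivates treating the stacked array as an image to be filtered.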
Keywords:
Time-difference-of-arrival ; Semantic segmentation ; curve detection ; noise reduction ; U-Net ; Generalized Cross-Correlation
ISBN:
979-8-3503-4486-8
LUP-ID:
e85c159b-3969-479d-b475-30b3e3e7ab01 | Link: https://lup.lub.lu.se/record/e85c159b-3969-479d-b475-30b3e3e7ab01