An ASR-based Hybrid Approach for Auditory Attention Decoding
(2024) Department of Automatic Control
- Abstract
- Auditory Attention Decoding (AAD) aims to determine the focus of a listener’s attention in environments with multiple overlapping speakers, a challenging situation for hearing impaired patients known as the Cocktail Party Problem. This thesis investigates AAD using Whisper, a transformer-based Automatic Speech Recognition (ASR) system that performs a graded transformation from speech to text while encoding linguistic and semantic information in its latent encoder layers. Two approaches to AAD are explored: first, a forward pipeline that utilizes Whisper for preprocessing audio stimuli in conjunction with a Temporal Response Function (TRF) model for predicting Electroencephalography (EEG) responses. Second, a hybrid approach aims to enhance the classification performance by applying Canonical Correlation Analysis (CCA) and its neural network variant, Deep Canonical Correlation Analysis (DCCA), to Whisper’s latent encoder layers and EEG signals. The performance of these models is compared across fixed decision window lengths, assessing their attention decoding capabilities when presented with limited information, to highlight Whisper’s enhanced performance when combined with CCA. Additionally, we test Whisper’s AAD performance when only a restricted number of electrodes limited to the temporal regions is available, as a step towards the development of wearable neurosteered hearing aid devices.
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9173483
- author
- Celoria, Alessandro and López, Valentín
- supervisor
- organization
- year
- 2024
- type
- H3 - Professional qualifications (4 Years - )
- subject
- report number
- TFRT-6233
- other publication id
- 0280-5316
- language
- English
- id
- 9173483
- date added to LUP
- 2024-09-09 09:21:25
- date last changed
- 2024-09-09 09:21:25
@misc{9173483,
  abstract = {{Auditory Attention Decoding (AAD) aims to determine the focus of a listener’s attention in environments with multiple overlapping speakers, a challenging situation for hearing impaired patients known as the Cocktail Party Problem. This thesis investigates AAD using Whisper, a transformer-based Automatic Speech Recognition (ASR) system that performs a graded transformation from speech to text while encoding linguistic and semantic information in its latent encoder layers. Two approaches to AAD are explored: first, a forward pipeline that utilizes Whisper for preprocessing audio stimuli in conjunction with a Temporal Response Function (TRF) model for predicting Electroencephalography (EEG) responses. Second, a hybrid approach aims to enhance the classification performance by applying Canonical Correlation Analysis (CCA) and its neural network variant, Deep Canonical Correlation Analysis (DCCA), to Whisper’s latent encoder layers and EEG signals. The performance of these models is compared across fixed decision window lengths, assessing their attention decoding capabilities when presented with limited information, to highlight Whisper’s enhanced performance when combined with CCA. Additionally, we test Whisper’s AAD performance when only a restricted number of electrodes limited to the temporal regions is available, as a step towards the development of wearable neurosteered hearing aid devices.}},
  author = {{Celoria, Alessandro and López, Valentín}},
  language = {{eng}},
  note = {{Student Paper}},
  title = {{An ASR-based Hybrid Approach for Auditory Attention Decoding}},
  year = {{2024}},
}