LUP Student Papers

LUND UNIVERSITY LIBRARIES

An ASR-based Hybrid Approach for Auditory Attention Decoding

Celoria, Alessandro and López, Valentín (2024)
Department of Automatic Control
Abstract
Auditory Attention Decoding (AAD) aims to determine the focus of a listener’s attention in environments with multiple overlapping speakers, a challenging situation for hearing impaired patients known as the Cocktail Party Problem. This thesis investigates AAD using Whisper, a transformer-based Automatic Speech Recognition (ASR) system that performs a graded transformation from speech to text while encoding linguistic and semantic information in its latent encoder layers. Two approaches to AAD are explored: first, a forward pipeline that utilizes Whisper for preprocessing audio stimuli in conjunction with a Temporal Response Function (TRF) model for predicting Electroencephalography (EEG) responses. Second, a hybrid approach aims to enhance the classification performance by applying Canonical Correlation Analysis (CCA) and its neural network variant, Deep Canonical Correlation Analysis (DCCA), to Whisper’s latent encoder layers and EEG signals. The performance of these models is compared across fixed decision window lengths, assessing their attention decoding capabilities when presented with limited information, to highlight Whisper’s enhanced performance when combined with CCA. Additionally, we test Whisper’s AAD performance when only a restricted number of electrodes limited to the temporal regions is available, as a step towards the development of wearable neurosteered hearing aid devices.
author: Celoria, Alessandro and López, Valentín
year: 2024
type: H3 - Professional qualifications (4 Years - )
report number: TFRT-6233
other publication id: 0280-5316
language: English
id: 9173483
date added to LUP: 2024-09-09 09:21:25
date last changed: 2024-09-09 09:21:25
@misc{9173483,
  abstract     = {{Auditory Attention Decoding (AAD) aims to determine the focus of a listener’s attention in environments with multiple overlapping speakers, a challenging situation for hearing impaired patients known as the Cocktail Party Problem. This thesis investigates AAD using Whisper, a transformer-based Automatic Speech Recognition (ASR) system that performs a graded transformation from speech to text while encoding linguistic and semantic information in its latent encoder layers. Two approaches to AAD are explored: first, a forward pipeline that utilizes Whisper for preprocessing audio stimuli in conjunction with a Temporal Response Function (TRF) model for predicting Electroencephalography (EEG) responses. Second, a hybrid approach aims to enhance the classification performance by applying Canonical Correlation Analysis (CCA) and its neural network variant, Deep Canonical Correlation Analysis (DCCA), to Whisper’s latent encoder layers and EEG signals. The performance of these models is compared across fixed decision window lengths, assessing their attention decoding capabilities when presented with limited information, to highlight Whisper’s enhanced performance when combined with CCA. Additionally, we test Whisper’s AAD performance when only a restricted number of electrodes limited to the temporal regions is available, as a step towards the development of wearable neurosteered hearing aid devices.}},
  author       = {{Celoria, Alessandro and López, Valentín}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{An ASR-based Hybrid Approach for Auditory Attention Decoding}},
  year         = {{2024}},
}