LUP Student Papers

LUND UNIVERSITY LIBRARIES

Computation models for audiovisual attention decoding

Enander, Sara and Karsten, Louise (2022)
Department of Automatic Control
Abstract
In a noisy environment, a person with normal hearing can filter out background noise and focus on the attended source. A person with impaired hearing struggles with this, even when wearing a hearing aid. Research on intelligent hearing aids has not yet solved this problem, and more work is needed. This thesis uses data from experiments in which a noisy environment is simulated: test subjects are exposed to a monologue and a dialogue at the same time but are instructed to attend to only one of them. Using EEG and eye gaze data collected in these experiments as input, different machine learning models are implemented to solve a binary classification task: predicting whether a subject is attending to the monologue or the dialogue. The investigated models are a support vector machine, a multilayer perceptron, and a convolutional neural network. The inputs to the models are time series arrays from either the EEG signals or the eye gaze data; for the support vector machine and the multilayer perceptron, more compact representations of the time series are used. The convolutional neural network performs best overall, reaching an average prediction accuracy of 87% across all subjects when using inputs from all electrodes at once. When using one electrode at a time as input and averaging over all electrodes, the support vector machine performs best, with an average accuracy of 78%. There is, however, a clear pattern in which electrode regions perform best on the classification task for all models: the electrodes over the temporal lobe and at the outer front of the frontal lobe. The trial length required for decent accuracy varies between models when EEG data is used: the support vector machine and the multilayer perceptron perform best on longer trials, while the convolutional neural network performs best on shorter trials. For the eye gaze data, the support vector machine reaches the highest average accuracy, 99%, and this accuracy is not markedly affected by shortening the trials.
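As a rough illustration of the classification setup described in the abstract (a support vector machine trained on compact representations of single-electrode time series, predicting monologue versus dialogue), a minimal scikit-learn sketch might look like the following. The synthetic data, feature choices, and array shapes are illustrative assumptions and not the pipeline actually used in the thesis.

# Hedged sketch, not from the thesis: one possible framing of the binary
# attention-decoding task with an SVM on compact per-trial features.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Toy stand-in for one EEG electrode: 200 trials of 2 s at 128 Hz (assumed shapes).
n_trials, n_samples = 200, 256
X_raw = rng.standard_normal((n_trials, n_samples))
y = rng.integers(0, 2, size=n_trials)  # 0 = attending monologue, 1 = attending dialogue

# A "compact representation" of each time series (the abstract mentions compact
# inputs for the SVM and MLP): simple per-trial summary statistics.
def compact_features(trials):
    return np.column_stack([
        trials.mean(axis=1),                            # mean amplitude
        trials.std(axis=1),                             # variability
        np.abs(np.diff(trials, axis=1)).mean(axis=1),   # rough measure of signal roughness
    ])

X = compact_features(X_raw)

# SVM baseline for the attended-source classification, evaluated with cross-validation.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, cv=5)
print("mean cross-validated accuracy:", scores.mean())

Swapping the SVC for a multilayer perceptron, or feeding the raw time series arrays into a one-dimensional convolutional network, would correspond to the other two model families compared in the thesis.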
author: Enander, Sara and Karsten, Louise
supervisor:
organization: Department of Automatic Control
year: 2022
type: H3 - Professional qualifications (4 Years - )
subject:
report number: TFRT-6165
ISSN: 0280-5316
language: English
id: 9093832
date added to LUP: 2022-08-12 10:02:58
date last changed: 2022-09-05 12:34:29
@misc{9093832,
  author       = {{Enander, Sara and Karsten, Louise}},
  issn         = {{0280-5316}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Computation models for audiovisual attention decoding}},
  year         = {{2022}},
}