LUP Student Papers

LUND UNIVERSITY LIBRARIES

Computation models for audiovisual attention decoding

Enander, Sara and Karsten, Louise (2022)
Department of Automatic Control
Abstract
In a noisy environment, a person with normal hearing can filter out background noise and focus on the attended source. A person with impaired hearing struggles with this, even when wearing a hearing aid. Research on intelligent hearing aids has not yet solved this problem, and more work is needed. This thesis uses data from experiments in which a noisy environment is simulated: test subjects are exposed to a monologue and a dialogue at the same time but are instructed to attend to only one of them. Using EEG and eye gaze data collected in these experiments as input, different machine learning models are implemented to solve a binary classification task: predicting whether a subject is attending to the monologue or the dialogue. The investigated models are a support vector machine, a multilayer perceptron, and a convolutional neural network. The inputs to the models are time series arrays from either the EEG signals or the eye gaze data; for the support vector machine and the multilayer perceptron, more compact representations of the time series are used. The convolutional neural network performs best overall, reaching an average prediction accuracy of 87% across all subjects when using inputs from all electrodes at once. When using one electrode at a time as input and averaging over all electrodes, the support vector machine performs best, with an average accuracy of 78%. There is, however, a clear pattern in which electrode regions perform best on the classification task for all models: the electrodes over the temporal lobe and at the outer front of the frontal lobe. The trial length required for decent accuracy varies between models when EEG data is used: the support vector machine and the multilayer perceptron perform best on longer trials, while the convolutional neural network performs best on shorter trials. For the eye gaze data, the support vector machine reaches the highest average accuracy, 99%, and this accuracy is not markedly affected by shortening the trials.
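As a rough illustration of the classification setup described in the abstract (a support vector machine trained on compact representations of single-electrode time series, predicting monologue versus dialogue), a minimal scikit-learn sketch might look like the following. The synthetic data, feature choices, and array shapes are illustrative assumptions and not the pipeline actually used in the thesis.

# Hedged sketch, not from the thesis: one possible framing of the binary
# attention-decoding task with an SVM on compact per-trial features.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Toy stand-in for one EEG electrode: 200 trials of 2 s at 128 Hz (assumed shapes).
n_trials, n_samples = 200, 256
X_raw = rng.standard_normal((n_trials, n_samples))
y = rng.integers(0, 2, size=n_trials)  # 0 = attending monologue, 1 = attending dialogue

# A "compact representation" of each time series (the abstract mentions compact
# inputs for the SVM and MLP): simple per-trial summary statistics.
def compact_features(trials):
    return np.column_stack([
        trials.mean(axis=1),                            # mean amplitude
        trials.std(axis=1),                             # variability
        np.abs(np.diff(trials, axis=1)).mean(axis=1),   # rough measure of signal roughness
    ])

X = compact_features(X_raw)

# SVM baseline for the attended-source classification, evaluated with cross-validation.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, cv=5)
print("mean cross-validated accuracy:", scores.mean())

Swapping the SVC for a multilayer perceptron, or feeding the raw time series arrays into a one-dimensional convolutional network, would correspond to the other two model families compared in the thesis.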
author: Enander, Sara and Karsten, Louise
supervisor:
organization: Department of Automatic Control
year: 2022
type: H3 - Professional qualifications (4 Years - )
subject:
report number: TFRT-6165
ISSN: 0280-5316
language: English
id: 9093832
date added to LUP: 2022-08-12 10:02:58
date last changed: 2022-09-05 12:34:29
@misc{9093832,
  author       = {{Enander, Sara and Karsten, Louise}},
  issn         = {{0280-5316}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Computation models for audiovisual attention decoding}},
  year         = {{2022}},
}