Objective Assessment of Speech Intelligibility from EEG by Integrating Acoustic and Linguistic Information

Johansson, Max; Magnusson Fredlund, Joakim

Johansson, Max (LU) and Magnusson Fredlund, Joakim (LU) (2026) BMEM01 20261
Division for Biomedical Engineering

Abstract: This study investigated the effects of using multiple acoustic and linguistic features for objective Speech Reception Threshold (SRT) estimation from Electroencephalography (EEG) data, using models that estimate the neural response (EEG data) from the corresponding feature stimulus, or vice versa. The SRT is the speech Signal-to-Noise Ratio (SNR) at which the subject understands half of the spoken content. This study experimented with training these models on individual features and combining their outputs, instead of training them on multiple features simultaneously. The features tested were acoustic speech envelopes, linguistic onsets, layer outputs from OpenAI's Automatic Speech Recognition (ASR) Large Language Model (LLM), and word surprisal. Surprisal is an LLM-derived measure of how unlikely a token is to appear next in a sequence. To improve the performance of the EEG prediction models, various EEG channel selection and weighted-average methodologies were tested.
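As a minimal illustration of the surprisal measure described above, the sketch below uses the standard definition, the negative logarithm of a token's probability. The base-2 logarithm, the function name `surprisal_bits`, and the example probabilities are assumptions made for illustration; they are not taken from the thesis.

```python
import math


def surprisal_bits(probability: float) -> float:
    """Surprisal of an event with the given probability, in bits (assumed base 2)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


# Hypothetical next-token probabilities that a language model might assign.
next_token_probs = {"dog": 0.5, "cat": 0.25, "theorem": 0.125}

# The less likely the token, the larger its surprisal.
surprisals = {token: surprisal_bits(p) for token, p in next_token_probs.items()}
```

In practice the probabilities would come from a language model's output distribution rather than a hand-written table.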

As features, the LLM layers generally performed better than the acoustic speech envelope. Word surprisal gave contradictory results. Optimized channel handling improved the performance of the EEG prediction models, but they still underperformed compared to the feature stimulus reconstruction models. The SRT estimates tended toward underestimation, indicating that speech intelligibility was overestimated. The magnitude of this trend varied significantly with the features and model used, and the EEG estimation model instead significantly overestimated the SRT when channel handling was optimized.
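To make the SRT concept concrete, the following sketch finds the SNR at which a score curve crosses the 50% level. It is only an assumed illustration: the linear interpolation, the function name `estimate_srt`, and the example scores are hypothetical, and the fitting procedure actually used in the thesis is not described in this record.

```python
def estimate_srt(snrs_db, scores, target=0.5):
    """Return the SNR (dB) where the score first rises through `target`.

    Assumes `snrs_db` is sorted in ascending order and interpolates
    linearly between the two neighbouring measurement points.
    """
    points = list(zip(snrs_db, scores))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1 and y1 != y0:
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("scores never cross the target level")


# Hypothetical fraction of speech understood at each SNR.
snrs_db = [-12.0, -8.0, -4.0, 0.0, 4.0]
scores = [0.05, 0.20, 0.60, 0.90, 0.98]

srt = estimate_srt(snrs_db, scores)  # roughly -5 dB for these numbers
```

An estimate that lies below the behaviourally measured SRT suggests the listener tolerates more noise than they really do, which is why an underestimated SRT corresponds to overestimated intelligibility.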

Overall, the results of this study show that incorporating the LLM-derived features improves SRT estimation, and that these features on their own may be used as feature stimuli. They also suggest that the changed feature combination methodology is promising, opening the way for further studies on optimization and improvement.
Popular Abstract: Improved estimation of noise-dependent listening comprehension using machine learning models

Hearing loss is a worsening global problem, with 430 million people around the world requiring hearing assistance. One metric we can use to look at hearing problems is how much random noise someone can handle before they struggle to understand what they're listening to. This is usually measured in a lab, where the person listens to speech and is asked to repeat what they heard. But what if they're a young child, or unable to respond due to medical reasons, or what if the doctor just wants to repeatedly check this to monitor the decline in their hearing capabilities?

Our goal was to try to improve methods for calculating that threshold. We used measurements of speech played with different noise levels and the brain activity of the listener, experimenting with combining different ways of processing that speech, including machine learning methods. These machine learning methods included investigating how surprising each word in the speech was using GPT-2, a precursor to OpenAI's ChatGPT, and looking for patterns in how the speech-to-text model Whisper analyzed that speech.

An interesting discovery in this study was the stark difference in performance between these two methods. Cases using only the Whisper data performed very well, even better than the standard processing methods, while cases that used only the surprise factor performed inconsistently, scoring very poorly by some metrics yet unchanged or improved by others.

Another interesting discovery was the success of a new methodology for combining different types of speech information: estimation models were trained on individual speech information types and then combined in the final steps, instead of training a new model for each combination of these types. This also opened the way for using some models for which the common approach of training a model per combination would not be productive.
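The combination idea described above can be sketched as a weighted average over the outputs of models that were each trained on a single feature. This is an assumption-laden illustration: the function `combine_outputs`, the feature names, the weights, and the scores are all hypothetical, and the thesis may combine model outputs differently.

```python
def combine_outputs(per_feature_scores, weights=None):
    """Weighted average of per-feature model outputs, condition by condition.

    `per_feature_scores` maps a feature name to a list with one score per
    listening condition (for example, one per SNR level).
    """
    names = list(per_feature_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal weighting by default
    n_conditions = len(per_feature_scores[names[0]])
    if any(len(per_feature_scores[name]) != n_conditions for name in names):
        raise ValueError("all features need one score per condition")
    total = sum(weights[name] for name in names)
    return [
        sum(weights[name] * per_feature_scores[name][i] for name in names) / total
        for i in range(n_conditions)
    ]


# Hypothetical outputs of two separately trained single-feature models.
scores = {
    "envelope": [0.02, 0.06, 0.10],
    "asr_layer": [0.04, 0.08, 0.14],
}

combined = combine_outputs(scores)
weighted = combine_outputs(scores, weights={"envelope": 1.0, "asr_layer": 3.0})
```

Because each model is trained once per feature, adding a feature means training one more model rather than retraining a model for every feature combination.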

Please use this url to cite or link to this publication: https://lup.lub.lu.se/student-papers/record/9225488

author: Johansson, Max (LU) and Magnusson Fredlund, Joakim (LU)
supervisor:
organization: Division for Biomedical Engineering
course: BMEM01 20261
year: 2026
type: H2 - Master's Degree (Two Years)
subject: Technology and Engineering
keywords: Speech intelligibility, Speech in noise, Neural decoding, Automatic Speech Recognition, Large Language Model, Surprisal
language: English
additional info: 2026-04
id: 9225488
date added to LUP: 2026-04-24 14:12:38
date last changed: 2026-04-24 14:12:38

@misc{9225488,
  abstract     = {{This study investigated the effects of using multiple acoustic and linguistic features for objective Speech Reception Threshold (SRT) estimation from Electroencephalography (EEG) data, using models that estimate the neural response (EEG data) from the corresponding feature stimulus, or vice versa. The SRT is the speech Signal-to-Noise Ratio (SNR) at which the subject understands half of the spoken content. This study experimented with training these models on individual features and combining their outputs, instead of training them on multiple features simultaneously. The features tested were acoustic speech envelopes, linguistic onsets, layer outputs from OpenAI's Automatic Speech Recognition (ASR) Large Language Model (LLM), and word surprisal. Surprisal is an LLM-derived measure of how unlikely a token is to appear next in a sequence. To improve the performance of the EEG prediction models, various EEG channel selection and weighted-average methodologies were tested.

As features, the LLM layers generally performed better than the acoustic speech envelope. Word surprisal gave contradictory results. Optimized channel handling improved the performance of the EEG prediction models, but they still underperformed compared to the feature stimulus reconstruction models. The SRT estimates tended toward underestimation, indicating that speech intelligibility was overestimated. The magnitude of this trend varied significantly with the features and model used, and the EEG estimation model instead significantly overestimated the SRT when channel handling was optimized.

Overall, the results of this study show that incorporating the LLM-derived features improves SRT estimation, and that these features on their own may be used as feature stimuli. They also suggest that the changed feature combination methodology is promising, opening the way for further studies on optimization and improvement.}},
  author       = {{Johansson, Max and Magnusson Fredlund, Joakim}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Objective Assessment of Speech Intelligibility from EEG by Integrating Acoustic and Linguistic Information}},
  year         = {{2026}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
