Predicting Perceptual Centers Located at Vowel Onset in German Speech Using Long Short-Term Memory Networks

Schulz, Felicia; De Sisto, Mirella; Roncaglia-Denissen, M. Paula; Hendrix, Peter

Schulz, Felicia; De Sisto, Mirella; Roncaglia-Denissen, M. Paula and Hendrix, Peter (2023) 24th International Speech Communication Association, Interspeech 2023. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2023-August. p. 1793-1797

Abstract: Perceptual centers (p-centers) can be defined as the perceived centers of a syllable. Previous research regarding the location of p-centers in speech relied on experimental methods, and among the suggested acoustic features contributing to the location of p-centers in Germanic languages is the transition of the consonant to the vowel onset. The current study investigates the prediction of the location of p-centers in German, by means of machine learning. Machine learning is a promising tool to capture possible non-linear relationships that may occur among the acoustic features used in the complexity that is the human perception. Therefore, an LSTM neural network approach was used for the identification of p-centers in a set of spoken German sentences, with input data features being Mel Frequency Cepstral Coefficients (MFCC), amplitude envelope and root mean squared energy. The model was able to achieve a balanced accuracy of 84% with MFCCs being the best predictor of p-center location.
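
Note on the reported metric: balanced accuracy is the mean of per-class recall, so it does not reward a model for simply predicting the majority class. The sketch below illustrates the computation only. The frame-level labelling (1 = p-center frame, 0 = other frame), the label values, and the function name `balanced_accuracy` are assumptions made for illustration and are not taken from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Number of true instances of class c, and how many were predicted as c.
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced frame labels: 1 = p-center frame, 0 = any other frame.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
```

With these made-up labels, plain accuracy is 0.8 while balanced accuracy is lower, because only one of the two positive frames is recovered. A predictor that always outputs 0 would also reach 0.8 plain accuracy but only 0.5 balanced accuracy.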

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/56984f1c-ee14-494f-a9f3-2d2246dc91b7

author

Schulz, Felicia; De Sisto, Mirella; Roncaglia-Denissen, M. Paula and Hendrix, Peter

publishing date

2023

type

Chapter in Book/Report/Conference proceeding

publication status

published

subject

Natural Language Processing

keywords

deep learning, Long Short-Term Memory, Mel Frequency Cepstral Coefficients, perceptual centers

host publication

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

series title

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

volume

2023-August

pages

5 pages

conference name

24th International Speech Communication Association, Interspeech 2023

conference location

Dublin, Ireland

conference dates

2023-08-20 - 2023-08-24

external identifiers

scopus:85171594724

ISSN

2308-457X

DOI

10.21437/Interspeech.2023-2154

language

English

LU publication?

no

additional info

id

56984f1c-ee14-494f-a9f3-2d2246dc91b7

date added to LUP

2023-12-21 12:56:49

date last changed

2025-10-14 11:45:57

@inproceedings{56984f1c-ee14-494f-a9f3-2d2246dc91b7,
  abstract     = {{Perceptual centers (p-centers) can be defined as the perceived centers of a syllable. Previous research regarding the location of p-centers in speech relied on experimental methods, and among the suggested acoustic features contributing to the location of p-centers in Germanic languages is the transition of the consonant to the vowel onset. The current study investigates the prediction of the location of p-centers in German, by means of machine learning. Machine learning is a promising tool to capture possible non-linear relationships that may occur among the acoustic features used in the complexity that is the human perception. Therefore, an LSTM neural network approach was used for the identification of p-centers in a set of spoken German sentences, with input data features being Mel Frequency Cepstral Coefficients (MFCC), amplitude envelope and root mean squared energy. The model was able to achieve a balanced accuracy of 84% with MFCCs being the best predictor of p-center location.}},
  author       = {{Schulz, Felicia and De Sisto, Mirella and Roncaglia-Denissen, M. Paula and Hendrix, Peter}},
  booktitle    = {{Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH}},
  issn         = {{2308-457X}},
  keywords     = {{deep learning; Long Short-Term Memory; Mel Frequency Cepstral Coefficients; perceptual centers}},
  language     = {{eng}},
  pages        = {{1793--1797}},
  series       = {{Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH}},
  title        = {{Predicting Perceptual Centers Located at Vowel Onset in German Speech Using Long Short-Term Memory Networks}},
  url          = {{http://dx.doi.org/10.21437/Interspeech.2023-2154}},
  doi          = {{10.21437/Interspeech.2023-2154}},
  volume       = {{2023-August}},
  year         = {{2023}},
}

Lund University Publications
