Lund University Publications

Predicting Perceptual Centers Located at Vowel Onset in German Speech Using Long Short-Term Memory Networks

Schulz, Felicia ; De Sisto, Mirella ; Roncaglia-Denissen, M. Paula and Hendrix, Peter (2023) 24th International Speech Communication Association, Interspeech 2023 In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2023-August. p.1793-1797
Abstract

Perceptual centers (p-centers) can be defined as the perceived centers of a syllable. Previous research regarding the location of p-centers in speech relied on experimental methods, and among the suggested acoustic features contributing to the location of p-centers in Germanic languages is the transition of the consonant to the vowel onset. The current study investigates the prediction of the location of p-centers in German, by means of machine learning. Machine learning is a promising tool to capture possible non-linear relationships that may occur among the acoustic features used in the complexity that is the human perception. Therefore, an LSTM neural network approach was used for the identification of p-centers in a set of spoken German sentences, with input data features being Mel Frequency Cepstral Coefficients (MFCC), amplitude envelope and root mean squared energy. The model was able to achieve a balanced accuracy of 84% with MFCCs being the best predictor of p-center location.

author
Schulz, Felicia ; De Sisto, Mirella ; Roncaglia-Denissen, M. Paula and Hendrix, Peter
publishing date
2023
type
Chapter in Book/Report/Conference proceeding
publication status
published
keywords
deep learning, Long Short-Term Memory, Mel Frequency Cepstral Coefficients, perceptual centers
host publication
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
series title
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
volume
2023-August
pages
5 pages
conference name
24th International Speech Communication Association, Interspeech 2023
conference location
Dublin, Ireland
conference dates
2023-08-20 - 2023-08-24
external identifiers
  • scopus:85171594724
ISSN
2308-457X
DOI
10.21437/Interspeech.2023-2154
language
English
LU publication?
no
additional info
Publisher Copyright: © 2023 International Speech Communication Association. All rights reserved.
id
56984f1c-ee14-494f-a9f3-2d2246dc91b7
date added to LUP
2023-12-21 12:56:49
date last changed
2024-01-02 08:48:37
@inproceedings{56984f1c-ee14-494f-a9f3-2d2246dc91b7,
  abstract     = {{Perceptual centers (p-centers) can be defined as the perceived centers of a syllable. Previous research regarding the location of p-centers in speech relied on experimental methods, and among the suggested acoustic features contributing to the location of p-centers in Germanic languages is the transition of the consonant to the vowel onset. The current study investigates the prediction of the location of p-centers in German, by means of machine learning. Machine learning is a promising tool to capture possible non-linear relationships that may occur among the acoustic features used in the complexity that is the human perception. Therefore, an LSTM neural network approach was used for the identification of p-centers in a set of spoken German sentences, with input data features being Mel Frequency Cepstral Coefficients (MFCC), amplitude envelope and root mean squared energy. The model was able to achieve a balanced accuracy of 84% with MFCCs being the best predictor of p-center location.}},
  author       = {{Schulz, Felicia and De Sisto, Mirella and Roncaglia-Denissen, M. Paula and Hendrix, Peter}},
  booktitle    = {{Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH}},
  issn         = {{2308-457X}},
  keywords     = {{deep learning; Long Short-Term Memory; Mel Frequency Cepstral Coefficients; perceptual centers}},
  language     = {{eng}},
  pages        = {{1793--1797}},
  series       = {{Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH}},
  title        = {{Predicting Perceptual Centers Located at Vowel Onset in German Speech Using Long Short-Term Memory Networks}},
  url          = {{http://dx.doi.org/10.21437/Interspeech.2023-2154}},
  doi          = {{10.21437/Interspeech.2023-2154}},
  volume       = {{2023-August}},
  year         = {{2023}},
}