Predicting Perceptual Centers Located at Vowel Onset in German Speech Using Long Short-Term Memory Networks
(2023) 24th International Speech Communication Association, Interspeech 2023. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2023-August. p.1793-1797
- abstract
Perceptual centers (p-centers) can be defined as the perceived centers of a syllable. Previous research on the location of p-centers in speech relied on experimental methods, and among the acoustic features suggested to contribute to p-center location in Germanic languages is the transition from the consonant to the vowel onset. The current study investigates the prediction of p-center location in German by means of machine learning. Machine learning is a promising tool for capturing possible non-linear relationships among the acoustic features involved in the complex process of human perception. Therefore, an LSTM neural network approach was used to identify p-centers in a set of spoken German sentences, with Mel Frequency Cepstral Coefficients (MFCCs), amplitude envelope and root-mean-square energy as input features. The model achieved a balanced accuracy of 84%, with MFCCs being the best predictor of p-center location.
- author
- Schulz, Felicia; De Sisto, Mirella; Roncaglia-Denissen, M. Paula and Hendrix, Peter
- publishing date
- 2023
- type
- Chapter in Book/Report/Conference proceeding
- publication status
- published
- keywords
- deep learning, Long Short-Term Memory, Mel Frequency Cepstral Coefficients, perceptual centers
- host publication
- Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
- series title
- Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
- volume
- 2023-August
- pages
- 5 pages
- conference name
- 24th International Speech Communication Association, Interspeech 2023
- conference location
- Dublin, Ireland
- conference dates
- 2023-08-20 - 2023-08-24
- external identifiers
- scopus:85171594724
- ISSN
- 2308-457X
- DOI
- 10.21437/Interspeech.2023-2154
- language
- English
- LU publication?
- no
- additional info
- Publisher Copyright: © 2023 International Speech Communication Association. All rights reserved.
- id
- 56984f1c-ee14-494f-a9f3-2d2246dc91b7
- date added to LUP
- 2023-12-21 12:56:49
- date last changed
- 2024-01-02 08:48:37
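The abstract names amplitude envelope and root-mean-square (RMS) energy as two of the three frame-level input features. This record does not state the frame length, hop size, sampling rate or normalisation used, so the following is only a sketch under assumed values (1024-sample frames, 512-sample hop). It takes the envelope as the per-frame maximum absolute amplitude, which is one common definition. The helper names `frame_signal`, `amplitude_envelope` and `rms_energy` are hypothetical. MFCCs are not reimplemented here and would normally come from an audio library.

```python
import math
from typing import List, Sequence


def frame_signal(x: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split a signal into (possibly overlapping) frames.

    Trailing samples that do not fill a complete frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def amplitude_envelope(x: Sequence[float], frame_len: int = 1024, hop: int = 512) -> List[float]:
    """Per-frame maximum absolute amplitude (assumed definition of the envelope)."""
    return [max(abs(s) for s in frame) for frame in frame_signal(x, frame_len, hop)]


def rms_energy(x: Sequence[float], frame_len: int = 1024, hop: int = 512) -> List[float]:
    """Per-frame root-mean-square energy: sqrt(mean(s^2)) over each frame."""
    return [math.sqrt(sum(s * s for s in frame) / len(frame))
            for frame in frame_signal(x, frame_len, hop)]


if __name__ == "__main__":
    sr = 16000  # assumed sampling rate, for illustration only
    tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s, 440 Hz sine
    print(amplitude_envelope(tone)[:3])  # values near 1.0 for a full-scale sine
    print(rms_energy(tone)[:3])          # values near 1/sqrt(2) for a full-scale sine
```

Each function returns one value per frame. Frame-aligned sequences like these could then be stacked with MFCC frames as the input sequence to a recurrent model. How the paper actually aligned and combined the features is not described in this record.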
@inproceedings{56984f1c-ee14-494f-a9f3-2d2246dc91b7,
  abstract     = {{Perceptual centers (p-centers) can be defined as the perceived centers of a syllable. Previous research regarding the location of p-centers in speech relied on experimental methods, and among the suggested acoustic features contributing to the location of p-centers in Germanic languages is the transition of the consonant to the vowel onset. The current study investigates the prediction of the location of p-centers in German, by means of machine learning. Machine learning is a promising tool to capture possible non-linear relationships that may occur among the acoustic features used in the complexity that is the human perception. Therefore, an LSTM neural network approach was used for the identification of p-centers in a set of spoken German sentences, with input data features being Mel Frequency Cepstral Coefficients (MFCC), amplitude envelope and root mean squared energy. The model was able to achieve a balanced accuracy of 84\% with MFCCs being the best predictor of p-center location.}},
  author       = {{Schulz, Felicia and De Sisto, Mirella and Roncaglia-Denissen, M. Paula and Hendrix, Peter}},
  booktitle    = {{Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH}},
  issn         = {{2308-457X}},
  keywords     = {{deep learning; Long Short-Term Memory; Mel Frequency Cepstral Coefficients; perceptual centers}},
  language     = {{eng}},
  pages        = {{1793--1797}},
  series       = {{Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH}},
  title        = {{Predicting Perceptual Centers Located at Vowel Onset in German Speech Using Long Short-Term Memory Networks}},
  url          = {{http://dx.doi.org/10.21437/Interspeech.2023-2154}},
  doi          = {{10.21437/Interspeech.2023-2154}},
  volume       = {{2023-August}},
  year         = {{2023}},
}
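The reported 84% is a balanced accuracy, meaning the mean of the per-class recalls. This metric is useful when one class is much rarer than the other, because a classifier that always predicts the majority class scores only 0.5 on a binary task. P-center frames are plausibly rare in frame-level labelling, but that is an assumption: the record gives no class proportions. The function name below is hypothetical, and the sketch is a plain restatement of the standard definition, not the authors' evaluation code.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall, averaged over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    total = defaultdict(int)    # number of examples per true class
    correct = defaultdict(int)  # number of correctly predicted examples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# A majority-class predictor: plain accuracy is 0.8, balanced accuracy is 0.5.
print(balanced_accuracy([0] * 8 + [1] * 2, [0] * 10))
```

scikit-learn provides the same metric as `sklearn.metrics.balanced_accuracy_score` (with its default `adjusted=False`).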