Lexical and Acoustic Modelling of Swedish Prosody

Frid, Johan (2003). In Travaux de l'Institut de Linguistique de Lund 45.
Abstract (Swedish)
Popular Abstract in Swedish (English translation)

Prosody and intonation are important parts of human speech. Systems for text-to-speech conversion and automatic speech recognition must include prosodic models in order to achieve acceptable results. This thesis contributes to the modelling of Swedish prosody from a lexical and an acoustic perspective.

Three aspects of lexical modelling of prosody are studied. We outline a model of prosody processing for 'non-standard' words and carry out an analysis of word stress in Swedish. In a larger study, a CART-based system is built for predicting pronunciation from orthography. With letter-by-letter prediction, allophones are predicted correctly in 98.87% of cases and whole words in 72.26%. For prosody, the best results are obtained by predicting from whole-word patterns: the position of primary stress is correct in 88.6% of cases and the word accent in 87.7%.

The acoustic part first presents two surveys: one of previous work on speech-technology-oriented modelling of Swedish intonation, and one of intonation models in general, with emphasis on stylization models. Our own contribution is the development of a system that generates F0 contours from prosodic transcriptions.

The final study concerns the automatic identification and classification of dialect type, word accent category and prominence level in Swedish. Material from almost 100 dialects and 250 speakers is used. By parameterizing F0 contours, among other means through stylization models, models are built for predicting these categories. The best results are 79.3% for word accent, 62.2% for prominence level and 59.1% for dialect type. For individual localities, word accent category can be classified correctly in more than 90% of cases.
Abstract
Prosody and intonation are important ingredients of human speech. In speech technology, text-to-speech (TTS) and automatic speech recognition (ASR) systems must incorporate prosodic models in order to reach acceptable performance. We contribute to the modelling of Swedish prosody in a speech technology context from lexical and acoustic angles.

Three aspects of lexical prosodic modelling are studied. We sketch a model for handling prosody in 'non-standard' words and then perform a study of Swedish word stress. In a larger study, we build a CART-based system for predicting pronunciation from orthography. Using a letter-by-letter prediction method, allophones are correctly predicted in 96.87% of the cases and whole words in 72.26%. For prosody, predictions based on whole-word features perform better: the location of primary stress is correct in 88.6% of the cases and word accent in 87.7%.

In the acoustic modelling section, we first present two surveys. The first covers previous work on TTS-related intonation modelling of Swedish; our own work is concerned with the development of a system for generating F0 contours from phonological intonation labels. The second covers intonation modelling in general, with special emphasis on stylization models, and describes some of the more influential intonation models in the field. This review leads to the selection of two models for the continued work in the thesis: a stylization-based model, in which temporal and frequency information is extracted directly from actual F0 contours, and Taylor's tilt model, which parameterizes the contours using a mathematical function.

The stylization model is then used to build a data-driven method for generating pitch patterns in Swedish content words. The model produces pitch contours that closely approach real ones, but its performance varies with the complexity of the pitch patterns.

Both models are used in the final study, which is concerned with the automatic identification and classification of dialect, word accent category and prominence level in Swedish data. Material from almost 100 dialects and 250 speakers is used to build a model that predicts these features from F0 contours. For material from the whole Swedish-speaking area, the best results for word accent, prominence level and regional dialect type are 79.3%, 62.2% and 59.1% correct, respectively. For individual villages, word accent data can be classified with an accuracy of more than 90%.
author
  • Frid, Johan
opponent
  • Docent House, David, Kungliga Tekniska Högskolan, Stockholm
organization
publishing date
2003
type
Thesis
publication status
published
subject
keywords
phonology, Phonetics, stress, dialects, Swedish, word accents, letter-to-sound, text-to-speech, modelling, intonation, prosody, speech technology, Fonetik, fonologi, Scandinavian languages and literature, Nordiska språk (språk och litteratur)
in
Travaux de l'Institut de Linguistique de Lund
volume
45
pages
176 pages
defense location
Kulturanatomen, sal 201, Biskopsgatan 7
defense date
2003-05-28 10:15
ISSN
0347-2558
ISBN
91-974116-8-X
language
English
LU publication?
yes
id
2da8019a-a513-463f-a752-6942f12a2038 (old id 21139)
date added to LUP
2007-05-28 13:40:12
date last changed
2016-09-19 08:45:00
@phdthesis{2da8019a-a513-463f-a752-6942f12a2038,
  abstract     = {Prosody and intonation are very important ingredients of human speech. In speech technology, text-to-speech (TTS) and automatic speech recognition (ASR) systems must incorporate prosodic models in order to reach acceptable performances. We contribute to the modelling of Swedish prosody in a speech technology context from lexical and acoustic angles.

Three aspects of lexical prosodic modelling are studied. We sketch a model for handling prosody in 'non-standard' words and then we perform a study of Swedish word stress. In a larger study, we build a CART-based system for the prediction of pronunciation from orthography. By using a letter-by-letter prediction method, allophones are correctly predicted in 96.87%, and whole words in 72.26% of the cases. For prosody, predictions based on whole-word features perform better: location of primary stress is correct in 88.6% and word accent in 87.7%.

In the acoustic modelling section, we first present two surveys: one with special reference to previous work on TTS-related intonation modelling of Swedish and one on intonation modelling in general, with special emphasis on stylization models. Our own work is concerned with the development of a system for the generation of F0 contours from phonological intonation labels. In the other survey, we describe some of the more influential intonation models in the field. This review leads to the selection of two of the models for our continued work in the thesis: a stylization-based model, where temporal and frequency information is extracted directly from actual F0 contours, and Taylor's tilt model, which parameterizes the contours using a mathematical function.

The stylization model is then used in building a data-driven method for the generation of pitch patterns in Swedish content words. The model is able to produce pitch contours that closely approach real ones, but the performance varies with the complexity of the pitch patterns.

Both models are used in the final study. This is concerned with the automatic identification and classification of dialect, word accent categories and prominence levels in Swedish data. Material from almost 100 dialects and 250 speakers is used to build a model that predicts these features from F0 contours. For material from the whole Swedish-speaking area, the best results for word accent, prominence level and regional dialect type are 79.3%, 62.2% and 59.1% correct, respectively. For individual villages, word accent data can be classified correctly with an accuracy of more than 90%.},
  author       = {Frid, Johan},
  isbn         = {91-974116-8-X},
  issn         = {0347-2558},
  keyword      = {phonology,Phonetics,stress,dialects,Swedish,word accents,letter-to-sound,text-to-speech,modelling,intonation,prosody,speech technology,Fonetik,fonologi,Scandinavian languages and literature,Nordiska språk (språk och litteratur)},
  language     = {eng},
  pages        = {176},
  school       = {Lund University},
  series       = {Travaux de l'Institut de Linguistique de Lund},
  title        = {Lexical and Acoustic Modelling of Swedish Prosody},
  volume       = {45},
  year         = {2003},
}