Knowledge-light Letter-to-Sound Conversion for Swedish with FST and TBL

Uneson, Marcus (2006). Fonetik 2006. In Proceedings of Fonetik 2006, p. 141-144
Abstract
This paper describes some exploratory attempts to apply a combination of finite state transducers (FST) and transformation-based learning (TBL, Brill 1992) to the problem of letter-to-sound (LTS) conversion for Swedish. Following Bouma (2000) for Dutch, we employ FST for segmentation of the textual input into groups of letters and a first transcription stage; we feed the output of this step into a TBL system. With this setup, we reach 96.2% correctly transcribed segments with rather restricted means (a small set of hand-crafted rules for the FST stage; a set of 12 templates and a training set of 30kw for the TBL stage).

Observing that quantity is the major error source and that compound morpheme boundaries can be useful for inferring quantity, we exploratively add good precision-low recall compound splitting based on graphotactic constraints. With this simple-minded method, targeting only a subset of the compounds, performance improves to 96.9%.
author
Uneson, Marcus
publishing date
2006
type
Chapter in Book/Report/Conference proceeding
publication status
published
keywords
LTS, Swedish, grapheme-to-phoneme conversion for Swedish, letter-to-sound conversion for Swedish
in
Proceedings of Fonetik 2006
editor
Ambrazaitis, Gilbert and Schötz, Susanne
pages
141 - 144
publisher
Lund University
conference name
Fonetik 2006
language
English
LU publication?
yes
id
8a7843d7-9be3-47bd-8d0a-84e299613bf3 (old id 538838)
date added to LUP
2007-09-25 13:29:39
date last changed
2016-07-08 14:52:31
@inproceedings{8a7843d7-9be3-47bd-8d0a-84e299613bf3,
  abstract     = {This paper describes some exploratory attempts to apply a combination of finite state
transducers (FST) and transformation-based learning (TBL, Brill 1992) to the problem of
letter-to-sound (LTS) conversion for Swedish. Following Bouma (2000) for Dutch, we employ
FST for segmentation of the textual input into groups of letters and a first transcription stage;
we feed the output of this step into a TBL system. With this setup, we reach 96.2% correctly
transcribed segments with rather restricted means (a small set of hand-crafted rules for the
FST stage; a set of 12 templates and a training set of 30kw for the TBL stage).

Observing that quantity is the major error source and that compound morpheme
boundaries can be useful for inferring quantity, we exploratively add good precision-low
recall compound splitting based on graphotactic constraints. With this simple-minded
method, targeting only a subset of the compounds, performance improves to 96.9%.},
  author       = {Uneson, Marcus},
  booktitle    = {Proceedings of Fonetik 2006},
  editor       = {Ambrazaitis, Gilbert and Schötz, Susanne},
  keyword      = {LTS,Swedish,grapheme-to-phoneme conversion for Swedish,letter-to-sound conversion for Swedish},
  language     = {eng},
  pages        = {141--144},
  publisher    = {Lund University},
  title        = {Knowledge-light Letter-to-Sound Conversion for Swedish with FST and TBL},
  year         = {2006},
}