Lund University Publications

The use of lexical tone in the segmentation of speech

Söderström, Pelle LU ; Lulaci, Tugba LU and Roll, Mikael LU (2023) Annual Conference of the Australian Linguistic Society p.70-71
Abstract
Introduction
In speech, there are no blank spaces to signal boundaries between words as there are in written language, but listeners can nevertheless recognise individual words rapidly. Without these blank spaces or commas, listeners have to divide up – segment – the continuous speech stream into discrete words using other means. This study aimed to investigate the tonal cues important for speech segmentation in Swedish. We know that different languages use different cues in speech segmentation, such as stress (Norris, McQueen, & Cutler, 1995), syllable weight (Cutler & Norris, 1988) and vowel harmony (Suomi, McQueen, & Cutler, 1997), but we do not yet know the extent to which phonological cues are used in speech segmentation. In English, stressed and metrically strong syllables are heard as more reliable word onsets, leading the parser to initiate a lexical access attempt at these points. Accurate segmentation is crucial since words can always be embedded in larger words, and these spurious embedded words are activated in memory (Luce & Cluff, 1998): the phrase start writing potentially includes star, trite, try, rye and so on (Cutler, 2012). However, no study has yet investigated speech segmentation in languages like Swedish, where prosody systematically combines with morphology. Studying such a language will allow us to understand more fully the universal drivers behind speech segmentation.
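
The embedding problem in start writing can be made concrete with a small sketch. The phonemic transcription below is a rough, non-rhotic approximation and the lexicon is a toy one; both are assumptions made for illustration, not materials from the study. The function simply reports every lexicon entry that occurs anywhere in the phoneme string, which is the set of candidates a listener would in principle have to rule out.

```python
# Illustrative only: a rough, non-rhotic transcription of "start writing"
# and a toy lexicon. Both are assumptions, not materials from the study.
PHRASE = ["s", "t", "ɑː", "t", "r", "aɪ", "t", "ɪ", "ŋ"]

LEXICON = {
    ("s", "t", "ɑː"): "star",
    ("s", "t", "ɑː", "t"): "start",
    ("t", "r", "aɪ"): "try",
    ("t", "r", "aɪ", "t"): "trite",
    ("r", "aɪ"): "rye",
    ("r", "aɪ", "t", "ɪ", "ŋ"): "writing",
}


def embedded_words(phonemes, lexicon):
    """Return (start_index, word) for every lexicon entry found in the string."""
    hits = []
    for start in range(len(phonemes)):
        for end in range(start + 1, len(phonemes) + 1):
            word = lexicon.get(tuple(phonemes[start:end]))
            if word is not None:
                hits.append((start, word))
    return hits


if __name__ == "__main__":
    for start, word in embedded_words(PHRASE, LEXICON):
        print(start, word)
```

Only start and writing are intended by the speaker; the remaining hits are the spurious embedded words that segmentation cues help the listener discard.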

In Swedish, every word or word stem has a lexical tone known as a word accent, in addition to stress. In Central Swedish, this tone is either low (accent 1) or high (accent 2). All monosyllabic words have accent 1, and the majority of polysyllabic words, including compounds and especially trochees, have accent 2 on the word stem. Prosody also interacts with morphology, so that stem word accent is also determined by suffixation: the word stem båt (‘boat’) has accent 1 preceding the singular suffix -en (båt1-en) but accent 2 preceding the plural suffix -ar (båt2-ar). With regard to word embeddings, a frequent accent 2 word with a plural suffix like möten2 (‘meetings’) potentially contains mö (‘maiden’) and tenn (‘tin’), and the accent 2 on the word stem means that it can also be heard as the compound mö-tenn (‘maiden tin’). However, the string möten1 with accent 1 can only be heard as two words, as in the phrase möt en ko (‘meet a cow’). Accent 2 has thus been proposed to be ‘connective’ (Elert, 1970; Malmberg, 1959): it signals that more syllables will follow, belonging to the same lexical item. A string with accent 2 can thus always contain other words, perhaps more so than a string with accent 1, which might make accent 2 strings more difficult to segment than accent 1 strings, especially in the case of monosyllabic targets.
This study used a word spotting paradigm to investigate the segmentation of Swedish words embedded in non-word frames and to determine how prosody and syllable structure interact to affect word spotting performance.

Methods
Native speakers of Swedish listened to auditory stimuli – trisyllabic non-word frames – recorded by a native speaker of Central Swedish. They were asked to press a button when they heard a Swedish word at the beginning of a string and then to type the word using the computer keyboard. Each participant heard 15 monosyllabic target words embedded in accent 1 frames (bal-ädi1 ‘ball’), 15 monosyllabic words in accent 2 frames (bal-ädi2), 15 disyllabic words in accent 2 frames (bagge-pi2 ‘ram’) and 15 disyllabic words in accent 1 frames (bagge-pi1). All target items were matched for word frequency. Word accent pairs were counterbalanced across participants. There were 60 fillers, containing no possible Swedish words. For response times, only trials where participants spotted and typed in the correct word were included, whereas all trials were included in the accuracy analysis.
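
One way such counterbalancing could be set up is sketched below. The abstract does not describe the actual list construction, so the two-list scheme, the item labels and the function names are all assumptions: 30 monosyllabic and 30 disyllabic targets are each assigned to one accent frame in list A and to the other accent frame in list B, which gives every participant 15 items per cell and never the same target twice.

```python
from collections import Counter

# Placeholder item labels (hypothetical; the real targets are not listed in the abstract).
MONO = [f"mono_{i:02d}" for i in range(30)]
DI = [f"di_{i:02d}" for i in range(30)]


def build_lists(mono_targets, di_targets):
    """Assign each target to one accent frame per list; lists A and B mirror each other."""
    lists = {"A": [], "B": []}
    for syllables, targets in ((1, mono_targets), (2, di_targets)):
        for index, target in enumerate(targets):
            accent_in_a = 1 if index % 2 == 0 else 2
            accent_in_b = 3 - accent_in_a  # the other member of the accent pair
            lists["A"].append(
                {"target": target, "syllables": syllables, "accent": accent_in_a}
            )
            lists["B"].append(
                {"target": target, "syllables": syllables, "accent": accent_in_b}
            )
    return lists


def cell_counts(trials):
    """Count trials per (number of target syllables, word accent) cell."""
    return Counter((trial["syllables"], trial["accent"]) for trial in trials)


if __name__ == "__main__":
    for name, trials in build_lists(MONO, DI).items():
        print(name, dict(cell_counts(trials)))
```

Under this scheme, alternating participants between lists A and B means that each target is heard equally often in its accent 1 and accent 2 frame across the sample.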

Data analysis and results
Response times were analysed using a generalised linear mixed-effects model with an inverse Gaussian distribution and identity link using the lme4 package in R (Bates, Mächler, Bolker, & Walker, 2015). Word accent and number of target syllables were included as deviation-coded fixed effects with participant and item as random effects. The fastest response times were found for disyllabic words (e.g. bagge) in accent 2 frames, significantly faster than for monosyllabic words (e.g. bal) in accent 2 frames. Response accuracy was analysed using the same model structure as for response times, but with a binomial distribution and logit link. An interaction between accent and number of target syllables showed that disyllabic words were spotted more successfully than monosyllabic words in accent 2 frames.
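
The deviation coding of the two fixed effects can be illustrated with a short sketch. The models themselves were fitted in R with lme4; the Python below only shows how the predictors might be coded, and the contrast values of -0.5 and +0.5 are an assumption, since the abstract does not state which values were used. The interaction predictor is the product of the two coded factors.

```python
from itertools import product

# Assumed contrast values: deviation coding is taken here to mean -0.5 / +0.5.
ACCENT_CODE = {1: -0.5, 2: 0.5}
SYLLABLE_CODE = {1: -0.5, 2: 0.5}


def design_row(accent, syllables):
    """Fixed-effect predictors for one trial: intercept, accent, syllables, interaction."""
    a = ACCENT_CODE[accent]
    s = SYLLABLE_CODE[syllables]
    return (1.0, a, s, a * s)


def design_matrix():
    """One row of predictors per (accent, syllables) cell of the 2 x 2 design."""
    return {
        (accent, syllables): design_row(accent, syllables)
        for accent, syllables in product((1, 2), (1, 2))
    }


if __name__ == "__main__":
    for cell, row in design_matrix().items():
        print(cell, row)
```

Because each coded column sums to zero over the four cells, each main effect in a balanced design is evaluated at the average of the other factor rather than at one of its levels, which is what makes the accent by syllable-count interaction straightforward to interpret.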

Discussion
Monosyllabic targets were more difficult to spot in accent 2 strings, as shown by both response time and accuracy. This may be explained by the fact that accent 2 strings can always contain other words, slowing down speech segmentation and recognition. It is also possible that the word accent triggers inappropriate syllabification, so that bal in bal-ädi2 is heard as the non-word ba (*ba-lädi), much as strong syllables signal a segmentation point and prompt syllabification in English (Cutler & Norris, 1988).

References
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1). doi:10.18637/jss.v067.i01
Cutler, A. (2012). Native Listening: Language Experience and the Recognition of Spoken Words. The MIT Press.
Cutler, A., & Norris, D. (1988). The Role of Strong Syllables in Segmentation for Lexical Access. Journal of Experimental Psychology-Human Perception and Performance, 14(1), 113-121. doi:10.1037/0096-1523.14.1.113
Elert, C.-C. (1970). Ljud och ord i svenskan. Stockholm: Almqvist & Wiksell.
Luce, P. A., & Cluff, M. S. (1998). Delayed commitment in spoken word recognition: Evidence from cross-modal priming. Perception & Psychophysics, 60(3), 484-490. doi:10.3758/Bf03206868
Malmberg, B. (1959). Bemerkungen zum schwedischen Wortakzent. Zeitschrift für Phonetik, 12, 193–207.
Norris, D., McQueen, J. M., & Cutler, A. (1995). Competition and Segmentation in Spoken-Word Recognition. Journal of Experimental Psychology-Learning Memory and Cognition, 21(5), 1209-1228. doi:10.1037/0278-7393.21.5.1209
Suomi, K., McQueen, J. M., & Cutler, A. (1997). Vowel Harmony and Speech Segmentation in Finnish. Journal of Memory and Language, 36(3), 422-444. doi:10.1006/jmla.1996.2495
type
Contribution to conference
publication status
published
keywords
Swedish, speech segmentation, prosody, phonology
pages
2 pages
conference name
Annual Conference of the Australian Linguistic Society
conference location
Sydney, Australia
conference dates
2023-11-29 - 2023-12-01
project
Neurophysiological correlates of predictive mechanisms in word recognition
language
English
LU publication?
yes
id
863ef738-207e-420f-b355-f3b9bf76cb47
alternative location
https://als.asn.au/Resources/PageContent/Files/a1e8cba3-f03f-47e8-8ca7-3ebab8eaf60f.pdf#page=70
date added to LUP
2023-08-25 03:59:30
date last changed
2023-12-04 07:30:22
@misc{863ef738-207e-420f-b355-f3b9bf76cb47,
  author       = {{Söderström, Pelle and Lulaci, Tugba and Roll, Mikael}},
  keywords     = {{Swedish; speech segmentation; prosody; phonology}},
  language     = {{eng}},
  month        = {{12}},
  pages        = {{70--71}},
  title        = {{The use of lexical tone in the segmentation of speech}},
  url          = {{https://als.asn.au/Resources/PageContent/Files/a1e8cba3-f03f-47e8-8ca7-3ebab8eaf60f.pdf#page=70}},
  year         = {{2023}},
}