Lund University Publications

The new London–Lund Corpus (LLC–2) : Design, compilation, access

Pöldvere, Nele; Johansson, Victoria and Paradis, Carita (2019) International symposium on spoken language across time
Abstract
This talk reports on the compilation of the new London–Lund Corpus (LLC–2), a corpus of contemporary spoken British English collected 2014–2019. The size and design of LLC–2 are the same as those of the world’s first corpus of spoken language, namely the London–Lund Corpus (LLC–1), with spoken data mainly from the 1960s. In addition to providing a corpus of contemporary speech, LLC–2 gives researchers the opportunity to make principled diachronic comparisons of speech over the past 50 years and detect change in communicative behaviour among speakers. The compilation of LLC–2 has included a number of different stages such as data collection, transcription of the recordings, markup and annotation, and finally making the corpus accessible to the research community. The talk describes and critically examines the methodological decisions made in each stage. For example, it was important to strike a balance between LLC–2 as a representative collection of data of contemporary spoken English and its comparability to LLC–1. Therefore, both corpora contain the same speech situations (dialogue, mainly everyday face-to-face conversation, as well as monologue), but the specific recordings added to LLC–2 also reflect the technological advances of the last few decades, particularly with respect to speech situations such as telephone calls (e.g., Skype) and broadcast discussions and interviews (e.g., podcasts). Moreover, the transcriptions in LLC–2 are orthographic and time-aligned with the corresponding sound files, a novel feature of the corpus that makes it possible, among other things, to investigate prosody and dialogue management among speakers with great precision. The corpus, as well as metadata about the transcriptions and the speakers, will be released to the public in late 2019 from the Lund University Humanities Lab’s corpus server.
The release will fill an unfortunate gap in the availability of spoken corpora for linguistic analysis. The benefits of spoken corpora in general and of LLC–2 in particular will be demonstrated in the talk through examples of case studies based on the corpus (e.g., Põldvere & Paradis, 2019a, 2019b). The case studies illustrate how LLC–2 can contribute to our understanding of meaning-making and discursive practices in real communication and provide a window into the cognitive and social processes of dialogic interaction, both from a contemporary and a back-in-time perspective.
author
Pöldvere, Nele; Johansson, Victoria and Paradis, Carita
type
Contribution to conference
publication status
published
conference name
International symposium on spoken language across time
conference location
Lund, Sweden
conference dates
2019-09-20 - 2019-09-20
language
English
LU publication?
yes
id
a513af19-e296-4549-a624-a878b048fa0b
alternative location
https://www.sol.lu.se/engelska/forskning/llc2/international-symposium-spoken-language-across-time/programme-and-speakers/
date added to LUP
2019-10-04 17:33:44
date last changed
2019-10-23 08:36:59
@misc{a513af19-e296-4549-a624-a878b048fa0b,
  abstract     = {{This talk reports on the compilation of the new London–Lund Corpus (LLC–2), a corpus of contemporary spoken British English collected 2014–2019. The size and design of LLC–2 are the same as those of the world’s first corpus of spoken language, namely the London–Lund Corpus (LLC–1), with spoken data mainly from the 1960s. In addition to providing a corpus of contemporary speech, LLC–2 gives researchers the opportunity to make principled diachronic comparisons of speech over the past 50 years and detect change in communicative behaviour among speakers. The compilation of LLC–2 has included a number of different stages such as data collection, transcription of the recordings, markup and annotation, and finally making the corpus accessible to the research community. The talk describes and critically examines the methodological decisions made in each stage. For example, it was important to strike a balance between LLC–2 as a representative collection of data of contemporary spoken English and its comparability to LLC–1. Therefore, both corpora contain the same speech situations (dialogue, mainly everyday face-to-face conversation, as well as monologue), but the specific recordings added to LLC–2 also reflect the technological advances of the last few decades, particularly with respect to speech situations such as telephone calls (e.g., Skype) and broadcast discussions and interviews (e.g., podcasts). Moreover, the transcriptions in LLC–2 are orthographic and time-aligned with the corresponding sound files, a novel feature of the corpus that makes it possible, among other things, to investigate prosody and dialogue management among speakers with great precision. The corpus, as well as metadata about the transcriptions and the speakers, will be released to the public in late 2019 from the Lund University Humanities Lab’s corpus server.
The release will fill an unfortunate gap in the availability of spoken corpora for linguistic analysis. The benefits of spoken corpora in general and of LLC–2 in particular will be demonstrated in the talk through examples of case studies based on the corpus (e.g., Põldvere & Paradis, 2019a, 2019b). The case studies illustrate how LLC–2 can contribute to our understanding of meaning-making and discursive practices in real communication and provide a window into the cognitive and social processes of dialogic interaction, both from a contemporary and a back-in-time perspective.}},
  author       = {{Pöldvere, Nele and Johansson, Victoria and Paradis, Carita}},
  language     = {{eng}},
  month        = {{09}},
  title        = {{The new London–Lund Corpus (LLC–2) : Design, compilation, access}},
  url          = {{https://www.sol.lu.se/engelska/forskning/llc2/international-symposium-spoken-language-across-time/programme-and-speakers/}},
  year         = {{2019}},
}