On the London–Lund Corpus 2 : Design, challenges and innovations

Pöldvere, Nele; Johansson, Victoria; Paradis, Carita

(2021) In English Language and Linguistics 25(3). p.459-483

Abstract: This article describes and critically examines the challenging task of compiling The London–Lund Corpus 2 (LLC–2) from start to end, accounting for the methodological decisions made in each stage and highlighting the innovations. LLC–2 is a half-a-million-word corpus of contemporary spoken British English with recordings from 2014 to 2019. Its size and design are the same as those of the world's first machine-readable spoken corpus, The London–Lund Corpus of Spoken English with data from the 1950s to 1980s. In this way, LLC–2 allows not only for synchronic investigations of contemporary speech but also for principled diachronic research of spoken language across time. Each stage of the compilation of LLC–2 posed its own challenges, ranging from the design of the corpus, the recruitment of the speakers, transcription, markup and annotation procedures, to the release of the corpus to the international research community. The decisions and solutions represent state-of-the-art practices of spoken corpus compilation with important innovations that enhance the value of LLC–2 for spoken corpus research, such as the availability of both the transcriptions and the corresponding time-aligned audio files in a standard compliant format.

Please use this URL to cite or link to this publication: https://lup.lub.lu.se/record/58483455-af73-4fe9-b222-713cfed707c4

author

Pöldvere, Nele (LU); Johansson, Victoria (LU) and Paradis, Carita (LU)

organization

publishing date

2021

type

Contribution to journal

publication status

published

subject

Comparative Language Studies and Linguistics

keywords

Corpus compilation, Spoken language, The London–Lund Corpus of Spoken English, XML transcriptions, Open access

in

English Language and Linguistics

volume

25

issue

3

pages

459 - 483

publisher

Cambridge University Press

external identifiers

scopus:85115023145

ISSN

1360-6743

DOI

10.1017/S1360674321000186

project

The London-Lund Corpus 2 of spoken British English (LLC 2)

language

English

LU publication?

yes

id

58483455-af73-4fe9-b222-713cfed707c4

date added to LUP

2021-06-07 16:21:38

date last changed

2026-01-12 15:54:57

@article{58483455-af73-4fe9-b222-713cfed707c4,
  abstract     = {{This article describes and critically examines the challenging task of compiling \textit{The London–Lund Corpus 2} (LLC–2) from start to end, accounting for the methodological decisions made in each stage and highlighting the innovations. LLC–2 is a half-a-million-word corpus of contemporary spoken British English with recordings from 2014 to 2019. Its size and design are the same as those of the world's first machine-readable spoken corpus, \textit{The London–Lund Corpus of Spoken English} with data from the 1950s to 1980s. In this way, LLC–2 allows not only for synchronic investigations of contemporary speech but also for principled diachronic research of spoken language across time. Each stage of the compilation of LLC–2 posed its own challenges, ranging from the design of the corpus, the recruitment of the speakers, transcription, markup and annotation procedures, to the release of the corpus to the international research community. The decisions and solutions represent state-of-the-art practices of spoken corpus compilation with important innovations that enhance the value of LLC–2 for spoken corpus research, such as the availability of both the transcriptions and the corresponding time-aligned audio files in a standard compliant format.}},
  author       = {{Pöldvere, Nele and Johansson, Victoria and Paradis, Carita}},
  issn         = {{1360-6743}},
  keywords     = {{Corpus compilation; Spoken language; The London–Lund Corpus of Spoken English; XML transcriptions; Open access}},
  language     = {{eng}},
  number       = {{3}},
  pages        = {{459--483}},
  publisher    = {{Cambridge University Press}},
  journal      = {{English Language and Linguistics}},
  title        = {{On the London–Lund Corpus 2 : Design, challenges and innovations}},
  url          = {{http://dx.doi.org/10.1017/S1360674321000186}},
  doi          = {{10.1017/S1360674321000186}},
  volume       = {{25}},
  year         = {{2021}},
}

Lund University Publications
