Journal Digital Corpus : Swedish Newsreel Transcriptions
(2025) In Journal of Open Humanities Data 11. p.1-6
- Abstract
- The Journal Digital Corpus (JDC) is a corpus comprising transcriptions of Swedish historical newsreels, primarily sourced from the SF Veckorevy newsreels produced between the early 1910s and the 1960s. JDC includes transcribed speech from 2,553 newsreels (over two million words) and intertitles from 4,333 videos. Utilizing custom-built Python libraries, SweScribe and stum, the corpus facilitates unprecedented access to historical narratives of Swedish modernity. It offers extensive research opportunities across history, cultural studies, linguistics, and media analysis, enabling detailed examinations of societal shifts, media representation, and linguistic developments throughout twentieth-century Sweden.
Please use this URL to cite or link to this publication:
https://lup.lub.lu.se/record/60d655d5-8e30-4431-a419-ea333b426e2f
- author
- Aspenskog, Robert LU; Johansson, Mathias LU and Snickars, Pelle LU
- organization
- publishing date
- 2025-08-04
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- newsreel, automatic speech recognition, intertitles, Swedish, corpus, multimodal
- in
- Journal of Open Humanities Data
- volume
- 11
- article number
- 44
- pages
- 1 - 6
- publisher
- Ubiquity Press Ltd.
- ISSN
- 2059-481X
- DOI
- 10.5334/johd.344
- project
- Modern Times 1936
- language
- English
- LU publication?
- yes
- additional info
- Data paper
- id
- 60d655d5-8e30-4431-a419-ea333b426e2f
- date added to LUP
- 2025-08-07 21:01:02
- date last changed
- 2025-08-18 12:10:07
@article{60d655d5-8e30-4431-a419-ea333b426e2f,
  abstract = {{The Journal Digital Corpus (JDC) is a corpus comprising transcriptions of Swedish historical newsreels, primarily sourced from the SF Veckorevy newsreels produced between the early 1910s and the 1960s. JDC includes transcribed speech from 2,553 newsreels (over two million words) and intertitles from 4,333 videos. Utilizing custom-built Python libraries, SweScribe and stum, the corpus facilitates unprecedented access to historical narratives of Swedish modernity. It offers extensive research opportunities across history, cultural studies, linguistics, and media analysis, enabling detailed examinations of societal shifts, media representation, and linguistic developments throughout twentieth-century Sweden.}},
  author = {{Aspenskog, Robert and Johansson, Mathias and Snickars, Pelle}},
  issn = {{2059-481X}},
  keywords = {{newsreel; automatic speech recognition; intertitles; Swedish; corpus; multimodal}},
  language = {{eng}},
  month = {{08}},
  pages = {{1--6}},
  publisher = {{Ubiquity Press Ltd.}},
  series = {{Journal of Open Humanities Data}},
  title = {{Journal Digital Corpus : Swedish Newsreel Transcriptions}},
  url = {{http://dx.doi.org/10.5334/johd.344}},
  doi = {{10.5334/johd.344}},
  volume = {{11}},
  year = {{2025}},
}