Lund University Publications

Mapping the Past : Geographically Linking an Early 20th Century Swedish Encyclopedia with Wikidata

Ahlin, Axel ; Myrne, Alfred and Nugues, Pierre LU (2024) Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024 In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings p.11040-11048
Abstract

In this paper, we describe the extraction of all the location entries from a prominent Swedish encyclopedia from the early 20th century, the Nordisk Familjebok 'Nordic Family Book.' We focused on the second edition, called Uggleupplagan, which comprises 38 volumes and over 182,000 articles. This makes it one of the most extensive Swedish encyclopedias. Using a classifier, we first determined the category of the entries. We found that approximately 22 percent of them were locations. We applied named entity recognition to these entries and linked them to Wikidata. Wikidata enabled us to extract their precise geographic locations, resulting in almost 18,000 valid coordinates. We then analyzed the distribution of these locations and the entry selection process. The analysis showed a higher density within Sweden, Germany, and the United Kingdom. The paper sheds light on the selection and representation of geographic information in the Nordisk Familjebok, providing insights into historical and societal perspectives. It also paves the way for future investigations into entry selection in different time periods and comparative analyses among various encyclopedias.
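The abstract mentions reading geographic locations from Wikidata and keeping only valid coordinates. The following is a minimal sketch of what such a step could look like, not the authors' implementation. It assumes an entity has already been linked and is available as a dict in the shape of Wikidata's JSON export, where the property P625 ("coordinate location") holds latitude and longitude. The function name `extract_coordinates`, the validity check on ranges, and the sample entity are illustrative assumptions.

```python
from typing import Optional, Tuple

COORDINATE_PROPERTY = "P625"  # Wikidata property "coordinate location"


def extract_coordinates(entity: dict) -> Optional[Tuple[float, float]]:
    """Return the first valid (latitude, longitude) of an entity, or None.

    Assumption: "valid" here only means numeric values within the usual
    latitude/longitude ranges; the paper may apply other criteria.
    """
    for claim in entity.get("claims", {}).get(COORDINATE_PROPERTY, []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip "novalue" / "somevalue" statements
        value = snak.get("datavalue", {}).get("value", {})
        lat, lon = value.get("latitude"), value.get("longitude")
        if (
            isinstance(lat, (int, float))
            and isinstance(lon, (int, float))
            and -90 <= lat <= 90
            and -180 <= lon <= 180
        ):
            return float(lat), float(lon)
    return None


# Hypothetical, hand-written entity (placeholder id and coordinates),
# shaped like a Wikidata JSON entity.
sample_entity = {
    "id": "Q0",
    "claims": {
        "P625": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "datavalue": {
                        "type": "globecoordinate",
                        "value": {
                            "latitude": 55.7,
                            "longitude": 13.2,
                            "globe": "http://www.wikidata.org/entity/Q2",
                        },
                    },
                }
            }
        ]
    },
}

print(extract_coordinates(sample_entity))
```

Entities without a P625 statement, or with out-of-range values, yield `None` and would simply be left out of a coordinate count under this sketch.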

author
Ahlin, Axel ; Myrne, Alfred and Nugues, Pierre
organization
publishing date
2024
type
Chapter in Book/Report/Conference proceeding
publication status
published
keywords
entity annotation, entity linking, named entity recognition
host publication
2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings
series title
2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings
editor
Calzolari, Nicoletta ; Kan, Min-Yen ; Hoste, Veronique ; Lenci, Alessandro ; Sakti, Sakriani and Xue, Nianwen
pages
9 pages
publisher
European Language Resources Association
conference name
Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024
conference location
Hybrid, Torino, Italy
conference dates
2024-05-20 - 2024-05-25
external identifiers
  • scopus:85195913963
ISBN
9782493814104
language
English
LU publication?
yes
id
d4e5b4d0-72fe-4576-aaa3-c13bdd564211
alternative location
https://aclanthology.org/2024.lrec-main.962
date added to LUP
2024-09-11 15:13:34
date last changed
2024-09-11 15:13:45
@inproceedings{d4e5b4d0-72fe-4576-aaa3-c13bdd564211,
  abstract     = {{In this paper, we describe the extraction of all the location entries from a prominent Swedish encyclopedia from the early 20th century, the Nordisk Familjebok 'Nordic Family Book.' We focused on the second edition called Uggleupplagan, which comprises 38 volumes and over 182,000 articles. This makes it one of the most extensive Swedish encyclopedias. Using a classifier, we first determined the category of the entries. We found that approximately 22 percent of them were locations. We applied a named entity recognition to these entries and we linked them to Wikidata. Wikidata enabled us to extract their precise geographic locations resulting in almost 18,000 valid coordinates. We then analyzed the distribution of these locations and the entry selection process. It showed a higher density within Sweden, Germany, and the United Kingdom. The paper sheds light on the selection and representation of geographic information in the Nordisk Familjebok, providing insights into historical and societal perspectives. It also paves the way for future investigations into entry selection in different time periods and comparative analyses among various encyclopedias.}},
  author       = {{Ahlin, Axel and Myrne, Alfred and Nugues, Pierre}},
  booktitle    = {{2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings}},
  editor       = {{Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen}},
  isbn         = {{9782493814104}},
  keywords     = {{entity annotation; entity linking; named entity recognition}},
  language     = {{eng}},
  pages        = {{11040--11048}},
  publisher    = {{European Language Resources Association}},
  series       = {{2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings}},
  title        = {{Mapping the Past : Geographically Linking an Early 20th Century Swedish Encyclopedia with Wikidata}},
  url          = {{https://aclanthology.org/2024.lrec-main.962}},
  year         = {{2024}},
}