Mapping the Past: Geographically Linking an Early 20th Century Swedish Encyclopedia with Wikidata
(2024) Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings, p. 11040-11048
- Abstract
In this paper, we describe the extraction of all the location entries from a prominent Swedish encyclopedia from the early 20th century, the Nordisk Familjebok 'Nordic Family Book.' We focused on the second edition, called Uggleupplagan, which comprises 38 volumes and over 182,000 articles. This makes it one of the most extensive Swedish encyclopedias. Using a classifier, we first determined the category of the entries. We found that approximately 22 percent of them were locations. We applied named entity recognition to these entries and linked them to Wikidata. Wikidata enabled us to extract their precise geographic locations, resulting in almost 18,000 valid coordinates. We then analyzed the distribution of these locations and the entry selection process. It showed a higher density within Sweden, Germany, and the United Kingdom. The paper sheds light on the selection and representation of geographic information in the Nordisk Familjebok, providing insights into historical and societal perspectives. It also paves the way for future investigations into entry selection in different time periods and comparative analyses among various encyclopedias.
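The abstract mentions extracting geographic coordinates from Wikidata for the linked entries. The following is a minimal sketch of that one step, not the authors' implementation: it assumes an entity record in the standard Wikidata JSON layout, where coordinates are stored under the `P625` (coordinate location) property. The function name `extract_coordinates` and the `sample` record with its values are hypothetical and purely illustrative.

```python
from typing import Optional, Tuple


def extract_coordinates(entity: dict) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) from the first usable P625 claim, or None.

    Assumes `entity` follows the Wikidata entity JSON layout:
    claims -> "P625" -> list of statements -> mainsnak -> datavalue -> value.
    """
    for claim in entity.get("claims", {}).get("P625", []):
        snak = claim.get("mainsnak", {})
        # Skip "novalue" / "somevalue" snaks, which carry no coordinates.
        if snak.get("snaktype") != "value":
            continue
        value = snak.get("datavalue", {}).get("value", {})
        lat, lon = value.get("latitude"), value.get("longitude")
        if lat is None or lon is None:
            continue
        # Keep only coordinates inside the valid geographic range.
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            return (lat, lon)
    return None


# Hypothetical entity record with illustrative values (not taken from the paper).
sample = {
    "claims": {
        "P625": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "datavalue": {"value": {"latitude": 59.33, "longitude": 18.07}},
                }
            }
        ]
    }
}

print(extract_coordinates(sample))
print(extract_coordinates({}))
```

Entries without a `P625` claim, or with out-of-range values, yield `None`, which is one plausible way to separate valid coordinates from unusable ones.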
- author
- Ahlin, Axel ; Myrne, Alfred and Nugues, Pierre LU
- publishing date
- 2024
- type
- Chapter in Book/Report/Conference proceeding
- publication status
- published
- subject
- keywords
- entity annotation, entity linking, named entity recognition
- host publication
- 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings
- series title
- 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings
- editor
- Calzolari, Nicoletta ; Kan, Min-Yen ; Hoste, Veronique ; Lenci, Alessandro ; Sakti, Sakriani and Xue, Nianwen
- pages
- 9 pages
- publisher
- European Language Resources Association
- conference name
- Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024
- conference location
- Hybrid, Torino, Italy
- conference dates
- 2024-05-20 - 2024-05-25
- external identifiers
- scopus:85195913963
- ISBN
- 9782493814104
- language
- English
- LU publication?
- yes
- id
- d4e5b4d0-72fe-4576-aaa3-c13bdd564211
- alternative location
- https://aclanthology.org/2024.lrec-main.962
- date added to LUP
- 2024-09-11 15:13:34
- date last changed
- 2024-09-11 15:13:45
@inproceedings{d4e5b4d0-72fe-4576-aaa3-c13bdd564211,
  abstract  = {{In this paper, we describe the extraction of all the location entries from a prominent Swedish encyclopedia from the early 20th century, the Nordisk Familjebok 'Nordic Family Book.' We focused on the second edition, called Uggleupplagan, which comprises 38 volumes and over 182,000 articles. This makes it one of the most extensive Swedish encyclopedias. Using a classifier, we first determined the category of the entries. We found that approximately 22 percent of them were locations. We applied named entity recognition to these entries and linked them to Wikidata. Wikidata enabled us to extract their precise geographic locations, resulting in almost 18,000 valid coordinates. We then analyzed the distribution of these locations and the entry selection process. It showed a higher density within Sweden, Germany, and the United Kingdom. The paper sheds light on the selection and representation of geographic information in the Nordisk Familjebok, providing insights into historical and societal perspectives. It also paves the way for future investigations into entry selection in different time periods and comparative analyses among various encyclopedias.}},
  author    = {{Ahlin, Axel and Myrne, Alfred and Nugues, Pierre}},
  booktitle = {{2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings}},
  editor    = {{Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen}},
  isbn      = {{9782493814104}},
  keywords  = {{entity annotation; entity linking; named entity recognition}},
  language  = {{eng}},
  pages     = {{11040--11048}},
  publisher = {{European Language Resources Association}},
  series    = {{2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings}},
  title     = {{Mapping the Past: Geographically Linking an Early 20th Century Swedish Encyclopedia with Wikidata}},
  url       = {{https://aclanthology.org/2024.lrec-main.962}},
  year      = {{2024}},
}