Lund University Publications

Pairing Wikipedia Articles Across Languages

Klang, Marcus and Nugues, Pierre (2016) Open Knowledge Base and Question Answering (OKBQA) Workshop, p. 72-76
Abstract
Wikipedia has become a reference knowledge source for scores of NLP applications. One of its invaluable features lies in its multilingual nature, where articles on a same entity or concept can have from one to more than 200 different versions. The interlinking of language versions in Wikipedia has undergone a major renewal with the advent of Wikidata, a unified scheme to identify entities and their properties using unique numbers. However, as the interlinking is still manually carried out by thousands of editors across the globe, errors may creep in the assignment of entities. In this paper, we describe an optimization technique to match automatically language versions of articles, and hence entities, that is only based on bags of words and anchors. We created a dataset of all the articles on persons we extracted from Wikipedia in six languages: English, French, German, Russian, Spanish, and Swedish. We report a correct match of at least 94.3% on each pair.
type
Chapter in Book/Report/Conference proceeding
publication status
published
host publication
Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)
pages
72 - 76
publisher
The COLING 2016 Organizing Committee
conference name
Open Knowledge Base and Question Answering (OKBQA) Workshop
conference location
Osaka, Japan
conference dates
2016-12-11 - 2016-12-11
ISBN
978-4-87974-712-9
language
English
LU publication?
yes
id
10b176f2-f95c-492c-9cf7-f29c82acee70
alternative location
http://www.aclweb.org/anthology/W/W16/W16-4410.pdf
date added to LUP
2016-12-11 11:26:33
date last changed
2021-05-05 10:41:03
@inproceedings{10b176f2-f95c-492c-9cf7-f29c82acee70,
  abstract     = {{Wikipedia has become a reference knowledge source for scores of NLP applications. One of its invaluable features lies in its multilingual nature, where articles on a same entity or concept can have from one to more than 200 different versions. The interlinking of language versions in Wikipedia has undergone a major renewal with the advent of Wikidata, a unified scheme to identify entities and their properties using unique numbers. However, as the interlinking is still manually carried out by thousands of editors across the globe, errors may creep in the assignment of entities. In this paper, we describe an optimization technique to match automatically language versions of articles, and hence entities, that is only based on bags of words and anchors. We created a dataset of all the articles on persons we extracted from Wikipedia in six languages: English, French, German, Russian, Spanish, and Swedish. We report a correct match of at least 94.3% on each pair.}},
  author       = {{Klang, Marcus and Nugues, Pierre}},
  booktitle    = {{Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)}},
  isbn         = {{978-4-87974-712-9}},
  language     = {{eng}},
  pages        = {{72--76}},
  publisher    = {{The COLING 2016 Organizing Committee}},
  title        = {{Pairing Wikipedia Articles Across Languages}},
  url          = {{http://www.aclweb.org/anthology/W/W16/W16-4410.pdf}},
  year         = {{2016}},
}