Pairing Wikipedia Articles Across Languages

Klang, Marcus LU and Nugues, Pierre LU (2016). Open Knowledge Base and Question Answering (OKBQA) Workshop. In Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016), p. 72-76
Abstract
Wikipedia has become a reference knowledge source for scores of NLP applications. One of its invaluable features lies in its multilingual nature, where articles on a same entity or concept can have from one to more than 200 different versions. The interlinking of language versions in Wikipedia has undergone a major renewal with the advent of Wikidata, a unified scheme to identify entities and their properties using unique numbers. However, as the interlinking is still manually carried out by thousands of editors across the globe, errors may creep in the assignment of entities. In this paper, we describe an optimization technique to match automatically language versions of articles, and hence entities, that is only based on bags of words and anchors. We created a dataset of all the articles on persons we extracted from Wikipedia in six languages: English, French, German, Russian, Spanish, and Swedish. We report a correct match of at least 94.3% on each pair.
type
Chapter in Book/Report/Conference proceeding
publication status
published
in
Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)
pages
72 - 76
publisher
The COLING 2016 Organizing Committee
conference name
Open Knowledge Base and Question Answering (OKBQA) Workshop
ISBN
978-4-87974-712-9
language
English
LU publication?
yes
id
10b176f2-f95c-492c-9cf7-f29c82acee70
alternative location
http://www.aclweb.org/anthology/W/W16/W16-4410.pdf
date added to LUP
2016-12-11 11:26:33
date last changed
2016-12-12 11:21:23
@inproceedings{10b176f2-f95c-492c-9cf7-f29c82acee70,
  abstract     = {Wikipedia has become a reference knowledge source for scores of NLP applications. One of its invaluable features lies in its multilingual nature, where articles on a same entity or concept can have from one to more than 200 different versions. The interlinking of language versions in Wikipedia has undergone a major renewal with the advent of Wikidata, a unified scheme to identify entities and their properties using unique numbers. However, as the interlinking is still manually carried out by thousands of editors across the globe, errors may creep in the assignment of entities. In this paper, we describe an optimization technique to match automatically language versions of articles, and hence entities, that is only based on bags of words and anchors. We created a dataset of all the articles on persons we extracted from Wikipedia in six languages: English, French, German, Russian, Spanish, and Swedish. We report a correct match of at least 94.3% on each pair.},
  author       = {Klang, Marcus and Nugues, Pierre},
  booktitle    = {Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)},
  isbn         = {978-4-87974-712-9},
  language     = {eng},
  pages        = {72--76},
  publisher    = {The COLING 2016 Organizing Committee},
  title        = {Pairing Wikipedia Articles Across Languages},
  year         = {2016},
}