Pairing Wikipedia Articles Across Languages
(2016) Open Knowledge Base and Question Answering (OKBQA) Workshop p.72-76
- Abstract
- Wikipedia has become a reference knowledge source for scores of NLP applications. One of its invaluable features lies in its multilingual nature, where articles on a same entity or concept can have from one to more than 200 different versions. The interlinking of language versions in Wikipedia has undergone a major renewal with the advent of Wikidata, a unified scheme to identify entities and their properties using unique numbers. However, as the interlinking is still manually carried out by thousands of editors across the globe, errors may creep in the assignment of entities. In this paper, we describe an optimization technique to match automatically language versions of articles, and hence entities, that is only based on bags of words and anchors. We created a dataset of all the articles on persons we extracted from Wikipedia in six languages: English, French, German, Russian, Spanish, and Swedish. We report a correct match of at least 94.3% on each pair.
Please use this URL to cite or link to this publication:
https://lup.lub.lu.se/record/10b176f2-f95c-492c-9cf7-f29c82acee70
- author
- Klang, Marcus LU and Nugues, Pierre LU
- organization
- publishing date
- 2016-12
- type
- Chapter in Book/Report/Conference proceeding
- publication status
- published
- subject
- host publication
- Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)
- pages
- 72 - 76
- publisher
- The COLING 2016 Organizing Committee
- conference name
- Open Knowledge Base and Question Answering (OKBQA) Workshop
- conference location
- Osaka, Japan
- conference dates
- 2016-12-11 - 2016-12-11
- ISBN
- 978-4-87974-712-9
- language
- English
- LU publication?
- yes
- id
- 10b176f2-f95c-492c-9cf7-f29c82acee70
- alternative location
- http://www.aclweb.org/anthology/W/W16/W16-4410.pdf
- date added to LUP
- 2016-12-11 11:26:33
- date last changed
- 2021-05-05 10:41:03
@inproceedings{10b176f2-f95c-492c-9cf7-f29c82acee70,
  abstract = {{Wikipedia has become a reference knowledge source for scores of NLP applications. One of its invaluable features lies in its multilingual nature, where articles on a same entity or concept can have from one to more than 200 different versions. The interlinking of language versions in Wikipedia has undergone a major renewal with the advent of Wikidata, a unified scheme to identify entities and their properties using unique numbers. However, as the interlinking is still manually carried out by thousands of editors across the globe, errors may creep in the assignment of entities. In this paper, we describe an optimization technique to match automatically language versions of articles, and hence entities, that is only based on bags of words and anchors. We created a dataset of all the articles on persons we extracted from Wikipedia in six languages: English, French, German, Russian, Spanish, and Swedish. We report a correct match of at least 94.3% on each pair.}},
  author = {{Klang, Marcus and Nugues, Pierre}},
  booktitle = {{Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)}},
  isbn = {{978-4-87974-712-9}},
  language = {{eng}},
  pages = {{72--76}},
  publisher = {{The COLING 2016 Organizing Committee}},
  title = {{Pairing Wikipedia Articles Across Languages}},
  url = {{http://www.aclweb.org/anthology/W/W16/W16-4410.pdf}},
  year = {{2016}},
}