Linking, Searching, and Visualizing Entities in Wikipedia

Klang, Marcus; Nugues, Pierre


(2018) Language Resources and Evaluation Conference (LREC) p.3426-3432

Abstract: In this paper, we describe a new system to extract, index, search, and visualize entities in Wikipedia. To carry out the entity extraction, we designed a high-performance, multilingual entity linker and we used a document model to store the resulting linguistic annotations. The entity linker, HEDWIG, extracts the mentions from text using a string-matching engine and links them to entities with a combination of statistical rules and PageRank. The document model, Docforia (Klang and Nugues, 2017), consists of layers, where each layer is a sequence of ranges describing a specific annotation, here the entities. We evaluated HEDWIG with the TAC 2016 data and protocol (Ji and Nothman, 2016) and reached CEAFm scores of 70.0 on English, 64.4 on Chinese, and 66.5 on Spanish. We applied the entity linker to the whole collection of English and Swedish articles of Wikipedia, and we used Lucene to index the layers and a search module to interactively retrieve all the concordances of an entity in Wikipedia. The user can select and visualize the concordances in the articles or paragraphs. Contrary to classic text indexing, this system does not use strings to identify the entities but unique identifiers from Wikidata.
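The abstract's two central ideas, annotation layers made of ranges and retrieval by entity identifier rather than by string, can be illustrated with a minimal sketch. This is not the Docforia API and it does not use Lucene: the names `EntityRange`, `Document`, `build_index`, and `concordances` are hypothetical, and a plain in-memory dictionary stands in for the index. The only external fact assumed is that Q90 is the Wikidata identifier for the city of Paris.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityRange:
    """One annotation: a character span [start, end) tagged with an entity identifier."""
    start: int
    end: int
    entity_id: str  # e.g. a Wikidata Q-identifier


@dataclass
class Document:
    """A text plus named annotation layers; each layer is a sequence of ranges."""
    doc_id: str
    text: str
    layers: dict = field(default_factory=lambda: defaultdict(list))

    def annotate(self, layer, surface, entity_id):
        # Hypothetical helper: tag the first occurrence of `surface` in the text.
        start = self.text.index(surface)
        self.layers[layer].append(EntityRange(start, start + len(surface), entity_id))


def build_index(documents, layer="entities"):
    """Map each entity identifier to the (document, range) pairs that mention it."""
    index = defaultdict(list)
    for doc in documents:
        for rng in doc.layers[layer]:
            index[rng.entity_id].append((doc, rng))
    return index


def concordances(index, entity_id, width=20):
    """Return (doc_id, left context, mention, right context) for every mention."""
    rows = []
    for doc, rng in index.get(entity_id, []):
        left = doc.text[max(0, rng.start - width):rng.start]
        mention = doc.text[rng.start:rng.end]
        right = doc.text[rng.end:rng.end + width]
        rows.append((doc.doc_id, left, mention, right))
    return rows


# Two mentions with different surface strings but the same identifier.
d1 = Document("a", "Paris is the capital of France.")
d1.annotate("entities", "Paris", "Q90")
d2 = Document("b", "She moved to the French capital in 1920.")
d2.annotate("entities", "the French capital", "Q90")

for row in concordances(build_index([d1, d2]), "Q90"):
    print(row)
```

Because the lookup key is the identifier, both mentions are returned for Q90 even though their surface strings share no words, which a purely string-based index would not do.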

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/25b0e9be-4d0d-4877-b8d7-b0a3c5e3af25

author

Klang, Marcus and Nugues, Pierre

publishing date

2018-05

type

Chapter in Book/Report/Conference proceeding

publication status

published

subject

Computer Sciences

host publication

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pages

3426 - 3432

conference name

Language Resources and Evaluation Conference (LREC)

conference location

Miyazaki, Japan

conference dates

2018-05-07 - 2018-05-12

external identifiers

scopus:85059879922

ISBN

979-10-95546-00-9

language

English

LU publication?

yes

id

25b0e9be-4d0d-4877-b8d7-b0a3c5e3af25

alternative location

http://www.lrec-conf.org/proceedings/lrec2018/summaries/93.html

date added to LUP

2018-05-08 11:26:25

date last changed

2025-10-14 13:09:23

@inproceedings{25b0e9be-4d0d-4877-b8d7-b0a3c5e3af25,
  abstract     = {{In this paper, we describe a new system to extract, index, search, and visualize entities in Wikipedia. To carry out the entity extraction, we designed a high-performance, multilingual entity linker and we used a document model to store the resulting linguistic annotations. The entity linker, HEDWIG, extracts the mentions from text using a string-matching engine and links them to entities with a combination of statistical rules and PageRank. The document model, Docforia (Klang and Nugues, 2017), consists of layers, where each layer is a sequence of ranges describing a specific annotation, here the entities. We evaluated HEDWIG with the TAC 2016 data and protocol (Ji and Nothman, 2016) and reached CEAFm scores of 70.0 on English, 64.4 on Chinese, and 66.5 on Spanish. We applied the entity linker to the whole collection of English and Swedish articles of Wikipedia, and we used Lucene to index the layers and a search module to interactively retrieve all the concordances of an entity in Wikipedia. The user can select and visualize the concordances in the articles or paragraphs. Contrary to classic text indexing, this system does not use strings to identify the entities but unique identifiers from Wikidata.}},
  author       = {{Klang, Marcus and Nugues, Pierre}},
  booktitle    = {{Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)}},
  isbn         = {{979-10-95546-00-9}},
  language     = {{eng}},
  pages        = {{3426--3432}},
  title        = {{Linking, Searching, and Visualizing Entities in Wikipedia}},
  url          = {{http://www.lrec-conf.org/proceedings/lrec2018/summaries/93.html}},
  year         = {{2018}},
}

Lund University Publications
