Linking, Searching, and Visualizing Entities for the Swedish Wikipedia
(2016) Sixth Swedish Language Technology Conference (SLTC 2016)
- Abstract
- In this paper, we describe a new system to extract, index, search, and visualize entities on Wikipedia. To carry out the extraction, we designed a high-performance entity linker and we used a document model to store the resulting linguistic annotations. The entity linker, HERD, extracts the mentions from text using a string-matching engine and links them to entities with a combination of rules, PageRank, and feature vectors based on the Wikipedia categories. The document model, Docforia, consists of layers, where each layer is a sequence of ranges describing a specific annotation, here the entities. We evaluated HERD with the ERD’14 protocol (Carmel et al., 2014) and we reached the competitive F1-score of 0.746 on the English development set. We applied HERD to the whole collection of Swedish articles of Wikipedia and we used Lucene to index the layers and a search module to interactively retrieve articles and metadata given a title, a phrase, or a property. The user can then select an entity and visualize concordance in articles or paragraphs. A demonstration of the entity search and visualization is available for Swedish at this address: http://vilde.cs.lth.se:9001/sv-herd/.
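The abstract describes the document model as a set of layers, each a sequence of ranges over the text. A minimal Python sketch of that idea follows. It is not Docforia's actual API: the class names `Range` and `LayeredDocument`, the method names, the half-open character offsets, and the sample sentence and labels are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Range:
    """A half-open character span [start, end) carrying one annotation label."""
    start: int
    end: int
    label: str


@dataclass
class LayeredDocument:
    """Text plus named layers; each layer is a list of ranges kept in text order."""
    text: str
    layers: dict[str, list[Range]] = field(default_factory=dict)

    def add(self, layer: str, start: int, end: int, label: str) -> None:
        # Reject empty spans and spans that fall outside the text.
        if not (0 <= start < end <= len(self.text)):
            raise ValueError("range outside the document text")
        ranges = self.layers.setdefault(layer, [])
        ranges.append(Range(start, end, label))
        # Keep the layer sorted so it reads as a sequence over the text.
        ranges.sort(key=lambda r: (r.start, r.end))

    def covered_text(self, layer: str) -> list[tuple[str, str]]:
        """Return (surface string, label) pairs for one layer, in text order."""
        return [(self.text[r.start:r.end], r.label)
                for r in self.layers.get(layer, [])]


# Hypothetical example: two entity mentions stored in an "entity" layer.
doc = LayeredDocument("Lund University is in Sweden.")
doc.add("entity", 22, 28, "Sweden")
doc.add("entity", 0, 15, "Lund_University")
mentions = doc.covered_text("entity")
print(mentions)
```

Keeping annotations as offsets in separate layers, rather than inline markup, leaves the original text untouched and lets each layer be indexed on its own, which is in the spirit of how the abstract says the layers are handed to Lucene.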
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/6c907110-bb62-47c4-868c-84f0570a3b5b
- author
- Södergren, Anton ; Klang, Marcus LU and Nugues, Pierre LU
- organization
- publishing date
- 2016
- type
- Contribution to conference
- publication status
- published
- subject
- conference name
- Sixth Swedish Language Technology Conference (SLTC 2016)
- conference location
- Umeå, Sweden
- conference dates
- 2016-11-17 - 2016-11-18
- language
- English
- LU publication?
- yes
- id
- 6c907110-bb62-47c4-868c-84f0570a3b5b
- alternative location
- http://www8.cs.umu.se/~johanna/sltc2016/abstracts/SLTC_2016_paper_8.pdf
- date added to LUP
- 2017-01-11 17:03:53
- date last changed
- 2021-05-06 15:57:35
@misc{6c907110-bb62-47c4-868c-84f0570a3b5b,
  abstract = {{In this paper, we describe a new system to extract, index, search, and visualize entities on Wikipedia. To carry out the extraction, we designed a high-performance entity linker and we used a document model to store the resulting linguistic annotations. The entity linker, HERD, extracts the mentions from text using a string-matching engine and links them to entities with a combination of rules, PageRank, and feature vectors based on the Wikipedia categories. The document model, Docforia, consists of layers, where each layer is a sequence of ranges describing a specific annotation, here the entities. We evaluated HERD with the ERD’14 protocol (Carmel et al., 2014) and we reached the competitive F1-score of 0.746 on the English development set. We applied HERD to the whole collection of Swedish articles of Wikipedia and we used Lucene to index the layers and a search module to interactively retrieve articles and metadata given a title, a phrase, or a property. The user can then select an entity and visualize concordance in articles or paragraphs. A demonstration of the entity search and visualization is available for Swedish at this address: http://vilde.cs.lth.se:9001/sv-herd/.}},
  author = {{Södergren, Anton and Klang, Marcus and Nugues, Pierre}},
  language = {{eng}},
  title = {{Linking, Searching, and Visualizing Entities for the Swedish Wikipedia}},
  url = {{http://www8.cs.umu.se/~johanna/sltc2016/abstracts/SLTC_2016_paper_8.pdf}},
  year = {{2016}},
}