A Multilingual Named Entity Recognition System based on Fixed Ordinally-Forgetting Encoding
(2018) In LU-CS-EX 2018-10 EDAM05 20181
Department of Computer Science
- Abstract
- This thesis describes a system whose goal is to find named entities in text. The system uses an encoding method, called the fixed ordinally-forgetting encoding, to efficiently encode variable-length text. We applied this encoding to words and characters and we used the resulting vectors as features. The system is language agnostic, and has been evaluated and tested on multiple languages.
The system uses annotated data, which is supplied by third parties, as the knowledge source. The system parses any given text and outputs a list of entities found in the text with the given entity class and position in the text.
The system achieved an F1 score of 90.31 in the shared CoNLL2003 English task. In the TAC2017 competition, the system achieved an F1 score of 75.4 for English.
- Popular Abstract (Swedish)
- More and more services today offer functionality where free text is entered and relevant results are returned, which requires identifying proper names. This work explores a language-independent method for identifying and returning proper names in text.
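The fixed ordinally-forgetting encoding (FOFE) named in the abstract can be illustrated with a minimal sketch. It assumes the standard FOFE recurrence z_t = alpha * z_(t-1) + e_t, where e_t is the one-hot vector of the t-th token. The function name `fofe_encode`, the toy vocabulary, and the forgetting factor `alpha = 0.5` are hypothetical choices for illustration; this record does not state the thesis's actual settings or implementation.

```python
from typing import Dict, List, Sequence


def fofe_encode(tokens: Sequence[str], vocab: Dict[str, int], alpha: float = 0.5) -> List[float]:
    """Encode a variable-length token sequence as one fixed-size vector.

    Applies z_t = alpha * z_(t-1) + e_t for each token, so earlier tokens
    are scaled down by alpha once for every token that follows them.
    alpha = 0.5 is an illustrative value, not one taken from the thesis.
    """
    z = [0.0] * len(vocab)
    for token in tokens:
        z = [alpha * value for value in z]  # decay everything seen so far
        z[vocab[token]] += 1.0              # add the one-hot vector of the current token
    return z


# Toy vocabulary (hypothetical); works the same for words or characters.
vocab = {"A": 0, "B": 1, "C": 2}
print(fofe_encode(["A", "B", "C"], vocab))  # [0.25, 0.5, 1.0]
```

The output length depends only on the vocabulary size, not on the input length, which is what lets variable-length word or character sequences be used as fixed-size feature vectors.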
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/8959951
- author
- Dib, Firas LU
- supervisor
- organization
- course
- EDAM05 20181
- year
- 2018
- type
- H1 - Master's Degree (One Year)
- subject
- keywords
- MSc, NER, TAC, CoNLL, FOFE, named entities
- publication/series
- LU-CS-EX 2018-10
- report number
- LU-CS-EX 2018-10
- ISSN
- 1650-2884
- language
- English
- id
- 8959951
- date added to LUP
- 2018-12-13 15:34:54
- date last changed
- 2018-12-13 15:34:54
@misc{8959951,
  abstract     = {{This thesis describes a system whose goal is to find named entities in text. The system uses an encoding method, called the fixed ordinally-forgetting encoding, to efficiently encode variable-length text. We applied this encoding to words and characters and we used the resulting vectors as features. The system is language agnostic, and has been evaluated and tested on multiple languages. The system uses annotated data, which is supplied by third parties, as the knowledge source. The system parses any given text and outputs a list of entities found in the text with the given entity class and position in the text. The system achieved an F1 score of 90.31 in the shared CoNLL2003 English task. In the TAC2017 competition, the system achieved an F1 score of 75.4 for English.}},
  author       = {{Dib, Firas}},
  issn         = {{1650-2884}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{LU-CS-EX 2018-10}},
  title        = {{A Multilingual Named Entity Recognition System based on Fixed Ordinally-Forgetting Encoding}},
  year         = {{2018}},
}