
LUP Student Papers

LUND UNIVERSITY LIBRARIES

A Multilingual Named Entity Recognition System based on Fixed Ordinally-Forgetting Encoding

Dib, Firas (2018) In LU-CS-EX 2018-10 EDAM05 20181
Department of Computer Science
Abstract
This thesis describes a system that finds named entities in text. The system uses an encoding method called the fixed ordinally-forgetting encoding (FOFE) to efficiently encode variable-length text as fixed-size vectors. We applied this encoding to both words and characters and used the resulting vectors as features. The system is language agnostic and has been evaluated on multiple languages.
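
As a rough illustration of the idea, the sketch below follows the recursive definition usually given for FOFE: starting from a zero vector, each new token first scales the running code by a forgetting factor alpha (0 < alpha < 1) and then adds the one-hot vector of that token, so z_t = alpha * z_{t-1} + e_t. Because older tokens are discounted by higher powers of alpha, the final vector depends on the order of the tokens as well as on which tokens occur, while its size always equals the vocabulary size regardless of input length. The function names, the default alpha values, and the toy alphabet are assumptions for illustration and are not taken from the thesis.

```python
from typing import List, Sequence


def fofe(indices: Sequence[int], vocab_size: int, alpha: float = 0.5) -> List[float]:
    """Encode a variable-length sequence of token ids as one fixed-size vector.

    Recursion: z_0 = 0 and z_t = alpha * z_{t-1} + e_t, where e_t is the
    one-hot vector of the t-th token. (Illustrative sketch; alpha default is assumed.)
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    z = [0.0] * vocab_size
    for idx in indices:
        z = [alpha * v for v in z]  # forget: decay everything seen so far
        z[idx] += 1.0               # add the one-hot vector of the current token
    return z


def char_fofe(word: str, alphabet: str, alpha: float = 0.8) -> List[float]:
    """Apply the same encoding to the characters of a single word.

    The alphabet is a hypothetical parameter; characters outside it raise KeyError.
    """
    char_to_id = {c: i for i, c in enumerate(alphabet)}
    return fofe([char_to_id[c] for c in word], len(alphabet), alpha)


if __name__ == "__main__":
    # Same three tokens in two different orders give two different codes.
    print(fofe([0, 1, 2], vocab_size=3, alpha=0.5))  # → [0.25, 0.5, 1.0]
    print(fofe([2, 1, 0], vocab_size=3, alpha=0.5))  # → [1.0, 0.5, 0.25]
    print(char_fofe("ab", alphabet="abc", alpha=0.5))  # → [0.5, 1.0, 0.0]
```

In this sketch the vectors are sparse and as wide as the vocabulary; a practical system may additionally project them into a smaller dense space before feeding them to a classifier.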

The system uses annotated data supplied by third parties as its knowledge source. Given any text, it outputs a list of the entities it finds, each with its entity class and its position in the text.
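
The record does not specify the concrete output format. As a hypothetical sketch, each detected entity could be represented by its surface text, its class, and character offsets into the original input. The `Entity` type, its field names, the label strings, and `spans_to_entities` are all illustrative assumptions, not the thesis's actual interface.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    """One detected entity (hypothetical output record)."""
    text: str   # surface form as it appears in the input
    label: str  # entity class, e.g. "PER", "LOC", "ORG" (assumed label set)
    start: int  # character offset of the first character (inclusive)
    end: int    # character offset just past the last character (exclusive)


def spans_to_entities(text: str, spans: List[Tuple[int, int, str]]) -> List[Entity]:
    """Turn (start, end, label) character spans into Entity records sorted by position."""
    entities = []
    for start, end, label in sorted(spans):
        if not 0 <= start < end <= len(text):
            raise ValueError(f"invalid span ({start}, {end})")
        entities.append(Entity(text[start:end], label, start, end))
    return entities


if __name__ == "__main__":
    sentence = "Lund University is in Sweden."
    for e in spans_to_entities(sentence, [(22, 28, "LOC"), (0, 15, "ORG")]):
        print(e.label, e.start, e.end, e.text)
    # ORG 0 15 Lund University
    # LOC 22 28 Sweden
```

Character offsets (rather than token indices) keep positions meaningful independently of whichever tokenizer a given language requires.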

The system achieved an F1 score of 90.31 on the CoNLL-2003 English shared task. In the TAC 2017 competition, it achieved an F1 score of 75.4 for English.
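
F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R); the scores above appear to be this value expressed as a percentage. For CoNLL-2003, a predicted entity is conventionally counted as correct only if both its boundaries and its class exactly match a gold entity. The sketch below computes such an exact-match, micro-averaged score over sets of (start, end, class) spans. It is an illustration under that assumption, not the official CoNLL or TAC scoring tool; TAC 2017 in particular uses its own scorer and metrics.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start, end, entity class)


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over entity spans (illustrative sketch)."""
    correct = len(gold & predicted)  # boundaries and class must all match
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = {(0, 2, "PER"), (5, 6, "LOC"), (8, 10, "ORG")}
    pred = {(0, 2, "PER"), (5, 6, "ORG")}  # second span has the right boundaries but the wrong class
    p, r, f1 = entity_f1(gold, pred)
    print(round(f1, 4))  # → 0.4
```

Here one of two predictions is correct (P = 0.5) and one of three gold entities is found (R = 1/3), so F1 = (2 × 0.5 × 1/3) / (0.5 + 1/3) = 0.4.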
Popular Abstract
More and more services today offer functionality in which free text is entered and relevant results are returned, which requires identifying named entities. This work explores a language-independent method for identifying named entities in text and returning them.
author
Dib, Firas LU
organization
Department of Computer Science
course
EDAM05 20181
year
2018
type
H1 - Master's Degree (One Year)
keywords
MSc, NER, TAC, CoNLL, FOFE, named entities
publication/series
LU-CS-EX 2018-10
report number
LU-CS-EX 2018-10
ISSN
1650-2884
language
English
id
8959951
date added to LUP
2018-12-13 15:34:54
date last changed
2018-12-13 15:34:54
@misc{8959951,
  abstract     = {{This thesis describes a system whose goal is to find named entities in text. The system uses an encoding method, called the fixed ordinally-forgetting encoding, to efficiently encode variable-length text. We applied this encoding to words and characters and we used the resulting vectors as features. The system is language agnostic, and has been evaluated and tested on multiple languages.

The system uses annotated data, which is supplied by third parties, as the knowledge source. The system parses any given text and outputs a list of entities found in the text with the given entity class and position in the text.

The system achieved an F1 score of 90.31 in the shared CoNLL2003 English task. In the TAC2017 competition, the system achieved an F1 score of 75.4 for English.}},
  author       = {{Dib, Firas}},
  issn         = {{1650-2884}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{LU-CS-EX 2018-10}},
  title        = {{A Multilingual Named Entity Recognition System based on Fixed Ordinally-Forgetting Encoding}},
  year         = {{2018}},
}