A Multilingual Named Entity Recognition System based on Fixed Ordinally-Forgetting Encoding

Dib, Firas

Dib, Firas (LU) (2018). In LU-CS-EX 2018-10, EDAM05 20181
Department of Computer Science

Abstract: This thesis describes a system whose goal is to find named entities in text. The system uses an encoding method, called the fixed ordinally-forgetting encoding, to efficiently encode variable-length text. We applied this encoding to words and characters and we used the resulting vectors as features. The system is language agnostic, and has been evaluated and tested on multiple languages.

The system uses annotated data, which is supplied by third parties, as the knowledge source. The system parses any given text and outputs a list of entities found in the text with the given entity class and position in the text.

The system achieved an F1 score of 90.31 in the shared CoNLL2003 English task. In the TAC2017 competition, the system achieved an F1 score of 75.4 for English.

Popular Abstract (translated from Swedish): A growing number of services today offer functionality in which free text is entered and relevant results are returned, which requires identifying proper names. This work explores a language-independent method for identifying and returning proper names in text.
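
The abstract names the fixed ordinally-forgetting encoding (FOFE) but does not spell it out. As a rough illustration only, the sketch below follows the recurrence commonly given for FOFE in the literature, z_t = alpha * z_(t-1) + e_t, where e_t is the one-hot vector of the t-th token and z_0 is the zero vector. The function name `fofe_encode`, the toy vocabulary, and the forgetting factor `alpha = 0.5` are assumptions made for this example and are not taken from the thesis.

```python
from typing import Dict, List, Sequence


def fofe_encode(tokens: Sequence[str], vocab: Dict[str, int], alpha: float = 0.5) -> List[float]:
    """Encode a variable-length token sequence as one fixed-size vector.

    Hypothetical helper: applies z_t = alpha * z_{t-1} + e_t, starting from z_0 = 0.
    """
    z = [0.0] * len(vocab)
    for token in tokens:
        # Decay everything seen so far, then add the one-hot vector of the current token.
        z = [alpha * value for value in z]
        z[vocab[token]] += 1.0
    return z


# Word-level toy example (the vocabulary is an assumption for illustration).
word_vocab = {"A": 0, "B": 1, "C": 2}
print(fofe_encode(["A", "B", "C"], word_vocab))       # [0.25, 0.5, 1.0]
print(fofe_encode(["A", "B", "C", "B"], word_vocab))  # [0.125, 1.25, 0.5]

# The same function works on characters, since it only needs a symbol-to-index map.
char_vocab = {"a": 0, "b": 1}
print(fofe_encode(list("ab"), char_vocab))            # [0.5, 1.0]
```

The output length depends only on the vocabulary size, not on the sequence length, which is what lets vectors of this kind serve as fixed-size features for both word and character sequences.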

Please use this URL to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/8959951

author: Dib, Firas (LU)
supervisor: Pierre Nugues (LU)
organization: Department of Computer Science
course: EDAM05 20181
year: 2018
type: H1 - Master's Degree (One Year)
subject: Technology and Engineering
keywords: MSc, NER, TAC, CoNLL, FOFE, named entities
publication/series: LU-CS-EX 2018-10
report number: LU-CS-EX 2018-10
ISSN: 1650-2884
language: English
id: 8959951
date added to LUP: 2018-12-13 15:34:54
date last changed: 2018-12-13 15:34:54

@misc{8959951,
  abstract     = {{This thesis describes a system whose goal is to find named entities in text. The system uses an encoding method, called the fixed ordinally-forgetting encoding, to efficiently encode variable-length text. We applied this encoding to words and characters and we used the resulting vectors as features. The system is language agnostic, and has been evaluated and tested on multiple languages.

The system uses annotated data, which is supplied by third parties, as the knowledge source. The system parses any given text and outputs a list of entities found in the text with the given entity class and position in the text.

The system achieved an F1 score of 90.31 in the shared CoNLL2003 English task. In the TAC2017 competition, the system achieved an F1 score of 75.4 for English.}},
  author       = {{Dib, Firas}},
  issn         = {{1650-2884}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{LU-CS-EX 2018-10}},
  title        = {{A Multilingual Named Entity Recognition System based on Fixed Ordinally-Forgetting Encoding}},
  year         = {{2018}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
