
Named Entity Recognition for Short Text Messages

Ek, Tobias; Kirkegaard, Camilla; Jonsson, Håkan and Nugues, Pierre (2011) Conference of the Pacific-Association-for-Computational-Linguistics (PACLING) In Computational Linguistics and Related Fields 27. pp. 178-187
Abstract
This paper describes a named entity recognition (NER) system for short text messages (SMS) running on a mobile platform. Most NER systems deal with structured, formal, well-written text that has good grammatical structure and few spelling errors. SMS text messages lack these qualities; instead, they use a shorthand, mixed language studded with emoticons, which makes NER on this kind of material a challenge. We implemented a system that recognizes named entities in SMS messages written in Swedish and that runs on an Android cellular telephone. The extracted entities are locations, names, dates, times, and telephone numbers, with the idea that other applications running on the telephone could use them. We started from a regular expression implementation that we complemented with logistic regression classifiers. We optimized the recognition so that incoming text messages could be processed on the telephone with a fast response time. We reached an F-score of 86 for strict matches and 89 for partial matches. (C) 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of PACLING Organizing Committee.
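To make the difference between strict and partial matches concrete, here is a minimal Python sketch of span-level F-score scoring. It assumes that a strict match requires identical boundaries and entity type, while a partial match only requires the same type and overlapping spans; the paper's exact matching criteria may differ. The `Entity` tuple, the `f_score` function, the entity labels, and the example message are hypothetical illustrations, not the authors' evaluation code.

```python
from typing import Callable, List, Tuple

# Hypothetical annotation format: (start, end, type), character offsets, end exclusive.
Entity = Tuple[int, int, str]


def exact(a: Entity, b: Entity) -> bool:
    """Strict match: same boundaries and same entity type."""
    return a == b


def overlap(a: Entity, b: Entity) -> bool:
    """Partial match (assumed definition): same type and at least one shared character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def f_score(gold: List[Entity], pred: List[Entity], partial: bool = False) -> float:
    """Return the F1 score on a 0-100 scale."""
    match: Callable[[Entity, Entity], bool] = overlap if partial else exact
    # Predicted entities that match at least one gold entity (for precision).
    tp_pred = sum(any(match(p, g) for g in gold) for p in pred)
    # Gold entities matched by at least one prediction (for recall).
    tp_gold = sum(any(match(g, p) for p in pred) for g in gold)
    if tp_pred == 0 or tp_gold == 0:
        return 0.0
    precision = tp_pred / len(pred)
    recall = tp_gold / len(gold)
    return 100 * 2 * precision * recall / (precision + recall)


# Hypothetical Swedish SMS: "See you at Centralen at 18:30".
text = "Ses på Centralen kl 18:30"
gold = [(7, 16, "LOC"), (17, 25, "TIME")]  # "Centralen", "kl 18:30"
pred = [(7, 16, "LOC"), (20, 25, "TIME")]  # "Centralen", "18:30"

print(f_score(gold, pred))                # strict: 50.0
print(f_score(gold, pred, partial=True))  # partial: 100.0
```

Because the predicted time span omits the "kl" prefix, the strict score drops to 50 while the partial score stays at 100. Under these definitions every strict match is also a partial match, so the partial F-score can never be lower than the strict one.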
author
Ek, Tobias; Kirkegaard, Camilla; Jonsson, Håkan; Nugues, Pierre
publishing date
2011
type
Chapter in Book/Report/Conference proceeding
publication status
published
keywords
Named entity recognition, Short text messages, SMS, Information extraction, Ensemble systems
in
Computational Linguistics and Related Fields
volume
27
pages
178 - 187
publisher
Elsevier
conference name
Conference of the Pacific-Association-for-Computational-Linguistics (PACLING)
external identifiers
  • wos:000299624700020
  • scopus:83755171548
ISSN
1877-0428
DOI
10.1016/j.sbspro.2011.10.596
language
English
LU publication?
yes
id
017a3b06-09e4-4dab-818b-9b140d26ab14 (old id 2494191)
date added to LUP
2012-05-11 13:17:30
date last changed
2017-07-23 03:59:15
@inproceedings{017a3b06-09e4-4dab-818b-9b140d26ab14,
  abstract     = {This paper describes a named entity recognition (NER) system for short text messages (SMS) running on a mobile platform. Most NER systems deal with text that is structured, formal, well written, with a good grammatical structure, and few spelling errors. SMS text messages lack these qualities and have instead a short-handed and mixed language studded with emoticons, which makes NER a challenge on this kind of material. We implemented a system that recognizes named entities from SMSes written in Swedish and that runs on an Android cellular telephone. The entities extracted are locations, names, dates, times, and telephone numbers with the idea that extraction of these entities could be utilized by other applications running on the telephone. We started from a regular expression implementation that we complemented with classifiers using logistic regression. We optimized the recognition so that the incoming text messages could be processed on the telephone with a fast response time. We reached an F-score of 86 for strict matches and 89 for partial matches. (C) 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of PACLING Organizing Committee.},
  author       = {Ek, Tobias and Kirkegaard, Camilla and Jonsson, Håkan and Nugues, Pierre},
  booktitle    = {Computational Linguistics and Related Fields},
  issn         = {1877-0428},
  keyword      = {Named entity recognition,Short text messages,SMS,Information extraction,Ensemble systems},
  language     = {eng},
  pages        = {178--187},
  publisher    = {Elsevier},
  title        = {Named Entity Recognition for Short Text Messages},
  url          = {http://dx.doi.org/10.1016/j.sbspro.2011.10.596},
  volume       = {27},
  year         = {2011},
}