Langforia: Language pipelines for annotating large collections of documents.

Klang, Marcus and Nugues, Pierre (2016). 26th International Conference on Computational Linguistics (COLING), 2016. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, p. 74-78
Abstract
In this paper, we describe Langforia, a multilingual processing pipeline to annotate texts with multiple layers: formatting, parts of speech, named entities, dependencies, semantic roles, and entity links. Langforia works as a web service, where the server hosts the language processing components and the client, the input and result visualization. To annotate a text or a Wikipedia page, the user chooses an NLP pipeline and enters the text or the name of the Wikipedia page in the input field of the interface. Once processed, the results are returned to the client, where the user can select the annotation layers s/he wants to visualize. We designed Langforia with a specific focus on Wikipedia, although it can process any type of text. Wikipedia has become an essential encyclopedic corpus used in many NLP projects. However, processing articles and visualizing the annotations are nontrivial tasks that require dealing with multiple markup variants, encoding issues, and tool incompatibilities across the language versions. This motivated the development of a new architecture.
type
Chapter in Book/Report/Conference proceeding
publication status
published
in
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations
pages
74 - 78
conference name
26th International Conference on Computational Linguistics (COLING), 2016
ISBN
978-4-87974-703-7
language
English
LU publication?
yes
id
5f193157-4e61-4741-86f0-59e5ada4d280
alternative location
http://www.aclweb.org/anthology/C/C16/C16-2016.pdf
date added to LUP
2016-12-07 16:57:19
date last changed
2016-12-08 12:54:02
@inproceedings{5f193157-4e61-4741-86f0-59e5ada4d280,
  abstract     = {In this paper, we describe Langforia, a multilingual processing pipeline to annotate texts with multiple layers: formatting, parts of speech, named entities, dependencies, semantic roles, and entity links. Langforia works as a web service, where the server hosts the language processing components and the client, the input and result visualization. To annotate a text or a Wikipedia page, the user chooses an NLP pipeline and enters the text or the name of the Wikipedia page in the input field of the interface. Once processed, the results are returned to the client, where the user can select the annotation layers s/he wants to visualize. We designed Langforia with a specific focus for Wikipedia, although it can process any type of text. Wikipedia has become an essential encyclopedic corpus used in many NLP projects. However, processing articles and visualizing the annotations are nontrivial tasks that require dealing with multiple markup variants, encodings issues, and tool incompatibilities across the language versions. This motivated the development of a new architecture. },
  author       = {Klang, Marcus and Nugues, Pierre},
  booktitle    = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations},
  isbn         = {978-4-87974-703-7},
  language     = {eng},
  pages        = {74--78},
  title        = {Langforia: Language pipelines for annotating large collections of documents.},
  year         = {2016},
}