Towards Robust Linguistic Analysis using OntoNotes

Pradhan, Sameer; Moschitti, Alessandro; Xue, Nianwen; Ng, Hwee Tou; Björkelund, Anders; Uryupina, Olga; Zhang, Yuchen; Zhong, Zhi

Pradhan, Sameer; Moschitti, Alessandro; Xue, Nianwen; Ng, Hwee Tou; Björkelund, Anders; Uryupina, Olga; Zhang, Yuchen and Zhong, Zhi (2013), pp. 143-152

Abstract: Large-scale linguistically annotated corpora have played a crucial role in advancing the state of the art of key natural language technologies such as syntactic, semantic and discourse analyzers, and they serve as training data as well as evaluation benchmarks. Up till now, however, most of the evaluation has been done on monolithic corpora such as the Penn Treebank, the Proposition Bank. As a result, it is still unclear how the state-of-the-art analyzers perform in general on data from a variety of genres or domains. The completion of the OntoNotes corpus, a large-scale, multi-genre, multilingual corpus manually annotated with syntactic, semantic and discourse information, makes it possible to perform such an evaluation. This paper presents an analysis of the performance of publicly available, state-of-the-art tools on all layers and languages in the OntoNotes v5.0 corpus. This should set the benchmark for future development of various NLP components in syntax and semantics, and possibly encourage research towards an integrated system that makes use of the various layers jointly to improve overall performance.

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/61541df2-f316-4249-8f99-d36f4828500b

author

Pradhan, Sameer; Moschitti, Alessandro; Xue, Nianwen; Ng, Hwee Tou; Björkelund, Anders; Uryupina, Olga; Zhang, Yuchen and Zhong, Zhi

publishing date

2013-08-01

type

Chapter in Book/Report/Conference proceeding

publication status

published

subject

Natural Language Processing

host publication

Proceedings of the Seventeenth Conference on Computational Natural Language Learning

pages

10 pages

publisher

Association for Computational Linguistics

external identifiers

scopus:85072757969

ISBN

978-1-937284-70-1

language

English

LU publication?

no

id

61541df2-f316-4249-8f99-d36f4828500b

alternative location

https://www.aclweb.org/anthology/W13-3516

date added to LUP

2019-05-21 14:18:38

date last changed

2025-10-14 10:24:52

@inproceedings{61541df2-f316-4249-8f99-d36f4828500b,
  abstract     = {{Large-scale linguistically annotated corpora have played a crucial role in advancing the state of the art of key natural language technologies such as syntactic, semantic and discourse analyzers, and they serve as training data as well as evaluation benchmarks. Up till now, however, most of the evaluation has been done on monolithic corpora such as the Penn Treebank, the Proposition Bank. As a result, it is still unclear how the state-of-the-art analyzers perform in general on data from a variety of genres or domains. The completion of the OntoNotes corpus, a large-scale, multi-genre, multilingual corpus manually annotated with syntactic, semantic and discourse information, makes it possible to perform such an evaluation. This paper presents an analysis of the performance of publicly available, state-of-the-art tools on all layers and languages in the OntoNotes v5.0 corpus. This should set the benchmark for future development of various NLP components in syntax and semantics, and possibly encourage research towards an integrated system that makes use of the various layers jointly to improve overall performance}},
  author       = {{Pradhan, Sameer and Moschitti, Alessandro and Xue, Nianwen and Ng, Hwee Tou and Björkelund, Anders and Uryupina, Olga and Zhang, Yuchen and Zhong, Zhi}},
  booktitle    = {{Proceedings of the Seventeenth Conference on Computational Natural Language Learning}},
  isbn         = {{978-1-937284-70-1}},
  language     = {{eng}},
  month        = {{08}},
  pages        = {{143--152}},
  publisher    = {{Association for Computational Linguistics}},
  title        = {{Towards Robust Linguistic Analysis using OntoNotes}},
  url          = {{https://www.aclweb.org/anthology/W13-3516}},
  year         = {{2013}},
}

Lund University Publications
