Extended constituent-to-dependency conversion for English

Johansson, Richard LU and Nugues, Pierre LU (2007). 16th Nordic Conference of Computational Linguistics. In NODALIDA 2007 Proceedings, p. 105-112
Abstract
We describe a new method to convert English constituent trees using the Penn Treebank annotation style into dependency trees. The new format was inspired by annotation practices used in other dependency treebanks with the intention to produce a better interface to further semantic processing than existing methods. In particular, we used a richer set of edge labels and introduced links to handle long-distance phenomena such as wh-movement and topicalization.

The resulting trees generally have a more complex dependency structure. For example, 6% of the trees contain at least one nonprojective link, which is difficult for many parsing algorithms. As can be expected, the more complex structure and the enriched set of edge labels make the trees more difficult to predict, and we observed a decrease in parsing accuracy when applying two dependency parsers to the new corpus. However, the richer information contained in the new trees resulted in a 23% error reduction in a baseline FrameNet semantic role labeler that relied on dependency arc labels only.
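
The abstract refers to nonprojective links without defining them. As a rough illustration only, and not the authors' conversion procedure, the sketch below tests whether a dependency tree is projective by looking for crossing arcs, with an artificial root placed at position 0. The function name `is_projective`, the head-list encoding, and the toy analysis of "What did you see" are all assumptions made for this example.

```python
from typing import List


def is_projective(heads: List[int]) -> bool:
    """Return True if no two dependency arcs cross.

    heads[i] is the head of token i + 1; tokens are numbered from 1,
    and 0 stands for an artificial root to the left of the sentence
    (an assumed encoding, chosen for this sketch).
    """
    # Represent each arc by its (left, right) endpoints in sentence order.
    spans = [tuple(sorted((head, dep))) for dep, head in enumerate(heads, start=1)]
    for i, (lo1, hi1) in enumerate(spans):
        for lo2, hi2 in spans[i + 1:]:
            # Two arcs cross when exactly one endpoint of one arc
            # lies strictly inside the span of the other.
            if lo1 < lo2 < hi1 < hi2 or lo2 < lo1 < hi2 < hi1:
                return False
    return True


# Toy analysis of "She saw it": "saw" is the root, both pronouns attach to it.
simple = [2, 0, 2]

# Toy analysis of "What did you see": "did" is the root and the fronted
# wh-word "What" attaches to "see", so its arc crosses the root arc.
wh_question = [4, 0, 2, 2]

print(is_projective(simple))       # no crossing arcs
print(is_projective(wh_question))  # the What <- see arc crosses root -> did
```

Attaching a fronted wh-word directly to the verb it is an argument of, as in the second toy tree, is the kind of long-distance link that makes an arc cross others, which is why parsers restricted to projective trees cannot produce it.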
author
Johansson, Richard and Nugues, Pierre
organization
publishing date
2007
type
Chapter in Book/Report/Conference proceeding
publication status
published
subject
keywords
dependency syntax, treebanks, Natural language processing
in
NODALIDA 2007 Proceedings
editor
Nivre, Joakim; Kalep, Heiki-Jaan; Muischnek, Kadri and Koit, Mare
pages
8 pages
publisher
University of Tartu
conference name
16th Nordic Conference of Computational Linguistics
ISBN
978-9985-4-0514-7
language
English
LU publication?
yes
id
6a392e53-3abd-40df-8f2e-4e460650e1e1 (old id 630232)
alternative location
http://dspace.utlib.ee/dspace/bitstream/10062/2560/1/reg-Johansson-10.pdf
date added to LUP
2007-11-27 14:10:56
date last changed
2016-04-16 09:46:07
@misc{6a392e53-3abd-40df-8f2e-4e460650e1e1,
  abstract     = {We describe a new method to convert English constituent trees using the Penn Treebank annotation style into dependency trees. The new format was inspired by annotation practices used in other dependency treebanks with the intention to produce a better interface to further semantic processing than existing methods. In particular, we used a richer set of edge labels and introduced links to handle long-distance phenomena such as wh-movement and topicalization.

The resulting trees generally have a more complex dependency structure. For example, 6% of the trees contain at least one nonprojective link, which is difficult for many parsing algorithms. As can be expected, the more complex structure and the enriched set of edge labels make the trees more difficult to predict, and we observed a decrease in parsing accuracy when applying two dependency parsers to the new corpus. However, the richer information contained in the new trees resulted in a 23% error reduction in a baseline FrameNet semantic role labeler that relied on dependency arc labels only.},
  author       = {Johansson, Richard and Nugues, Pierre},
  editor       = {Nivre, Joakim and Kalep, Heiki-Jaan and Muischnek, Kadri and Koit, Mare},
  isbn         = {978-9985-4-0514-7},
  keyword      = {dependency syntax,treebanks,Natural language processing},
  language     = {eng},
  pages        = {105--112},
  publisher    = {University of Tartu},
  series       = {NODALIDA 2007 Proceedings},
  title        = {Extended constituent-to-dependency conversion for English},
  year         = {2007},
}