Lund University Publications

Extended constituent-to-dependency conversion for English

Johansson, Richard and Nugues, Pierre (2007) 16th Nordic Conference of Computational Linguistics, p. 105-112
Abstract
We describe a new method to convert English constituent trees using the Penn Treebank annotation style into dependency trees. The new format was inspired by annotation practices used in other dependency treebanks with the intention to produce a better interface to further semantic processing than existing methods. In particular, we used a richer set of edge labels and introduced links to handle long-distance phenomena such as wh-movement and topicalization.

The resulting trees generally have a more complex dependency structure. For example, 6% of the trees contain at least one nonprojective link, which is difficult for many parsing algorithms. As can be expected, the more complex structure and the enriched set of edge labels make the trees more difficult to predict, and we observed a decrease in parsing accuracy when applying two dependency parsers to the new corpus. However, the richer information contained in the new trees resulted in a 23% error reduction in a baseline FrameNet semantic role labeler that relied on dependency arc labels only.
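To illustrate what the abstract means by a nonprojective link, here is a minimal sketch of a projectivity check. It is not the authors' converter or evaluation code. The function names, the list-of-heads encoding (1-based head positions, 0 for the artificial root), and the two toy trees are assumptions made for this example. A tree is treated as nonprojective when two of its arcs cross, with the root placed at position 0.

```python
from typing import List, Sequence


def is_projective(heads: Sequence[int]) -> bool:
    """Return True if no two dependency arcs cross.

    heads[i] is the 1-based position of the head of token i+1;
    0 stands for an artificial root placed before the sentence
    (an assumed encoding, not taken from the paper).
    """
    # Store every arc as a (left, right) span over token positions.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            # Two arcs cross when exactly one endpoint of the second
            # lies strictly inside the first.
            if l1 < l2 < r1 < r2:
                return False
    return True


def nonprojective_share(trees: List[Sequence[int]]) -> float:
    """Fraction of trees that contain at least one crossing arc."""
    if not trees:
        return 0.0
    bad = sum(1 for heads in trees if not is_projective(heads))
    return bad / len(trees)


# Toy trees (hypothetical, not drawn from the converted treebank).
projective_tree = [2, 0, 2]        # tokens 1 and 3 attach to root token 2
crossing_tree = [3, 0, 2, 2]       # arc 3->1 crosses the root arc 0->2

if __name__ == "__main__":
    print(is_projective(projective_tree))
    print(is_projective(crossing_tree))
    print(nonprojective_share([projective_tree, crossing_tree]))
```

The quadratic pairwise comparison keeps the sketch short; it is adequate for sentence-length inputs. A share computed this way over a whole corpus corresponds to the kind of statistic the abstract reports, that is, trees with at least one nonprojective link.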
type
Chapter in Book/Report/Conference proceeding
publication status
published
keywords
dependency syntax, treebanks, Natural language processing
host publication
NODALIDA 2007 Proceedings
editor
Nivre, Joakim ; Kalep, Heiki-Jaan ; Muischnek, Kadri and Koit, Mare
pages
8 pages
publisher
University of Tartu
conference name
16th Nordic Conference of Computational Linguistics
conference location
Tartu, Estonia
conference dates
2007-05-25 - 2007-05-26
external identifiers
  • scopus:85008242043
ISBN
978-9985-4-0514-7
language
English
LU publication?
yes
id
6a392e53-3abd-40df-8f2e-4e460650e1e1 (old id 630232)
alternative location
http://dspace.utlib.ee/dspace/bitstream/10062/2560/1/reg-Johansson-10.pdf
date added to LUP
2016-04-04 11:50:02
date last changed
2022-02-21 05:16:25
@inproceedings{6a392e53-3abd-40df-8f2e-4e460650e1e1,
  abstract     = {{We describe a new method to convert English constituent trees using the Penn Treebank annotation style into dependency trees. The new format was inspired by annotation practices used in other dependency treebanks with the intention to produce a better interface to further semantic processing than existing methods. In particular, we used a richer set of edge labels and introduced links to handle long-distance phenomena such as wh-movement and topicalization.
The resulting trees generally have a more complex dependency structure. For example, 6% of the trees contain at least one nonprojective link, which is difficult for many parsing algorithms. As can be expected, the more complex structure and the enriched set of edge labels make the trees more difficult to predict, and we observed a decrease in parsing accuracy when applying two dependency parsers to the new corpus. However, the richer information contained in the new trees resulted in a 23% error reduction in a baseline FrameNet semantic role labeler that relied on dependency arc labels only.}},
  author       = {{Johansson, Richard and Nugues, Pierre}},
  booktitle    = {{NODALIDA 2007 Proceedings}},
  editor       = {{Nivre, Joakim and Kalep, Heiki-Jaan and Muischnek, Kadri and Koit, Mare}},
  isbn         = {{978-9985-4-0514-7}},
  keywords     = {{dependency syntax; treebanks; Natural language processing}},
  language     = {{eng}},
  pages        = {{105--112}},
  publisher    = {{University of Tartu}},
  title        = {{Extended constituent-to-dependency conversion for English}},
  url          = {{https://lup.lub.lu.se/search/files/5865743/2972024.pdf}},
  year         = {{2007}},
}