Extended constituent-to-dependency conversion for English
(2007) 16th Nordic Conference of Computational Linguistics p.105-112
- Abstract
- We describe a new method to convert English constituent trees using the Penn Treebank annotation style into dependency trees. The new format was inspired by annotation practices used in other dependency treebanks with the intention to produce a better interface to further semantic processing than existing methods. In particular, we used a richer set of edge labels and introduced links to handle long-distance phenomena such as wh-movement and topicalization.
The resulting trees generally have a more complex dependency structure. For example, 6% of the trees contain at least one nonprojective link, which is difficult for many parsing algorithms. As can be expected, the more complex structure and the enriched set of edge labels make the trees more difficult to predict, and we observed a decrease in parsing accuracy when applying two dependency parsers to the new corpus. However, the richer information contained in the new trees resulted in a 23% error reduction in a baseline FrameNet semantic role labeler that relied on dependency arc labels only.
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/630232
- author
- Johansson, Richard LU and Nugues, Pierre LU
- organization
- publishing date
- 2007
- type
- Chapter in Book/Report/Conference proceeding
- publication status
- published
- subject
- keywords
- dependency syntax, treebanks, Natural language processing
- host publication
- NODALIDA 2007 Proceedings
- editor
- Nivre, Joakim ; Kalep, Heiki-Jaan ; Muischnek, Kadri and Koit, Mare
- pages
- 8 pages
- publisher
- University of Tartu
- conference name
- 16th Nordic Conference of Computational Linguistics
- conference location
- Tartu, Estonia
- conference dates
- 2007-05-25 - 2007-05-26
- external identifiers
- scopus:85008242043
- ISBN
- 978-9985-4-0514-7
- language
- English
- LU publication?
- yes
- id
- 6a392e53-3abd-40df-8f2e-4e460650e1e1 (old id 630232)
- alternative location
- http://dspace.utlib.ee/dspace/bitstream/10062/2560/1/reg-Johansson-10.pdf
- date added to LUP
- 2016-04-04 11:50:02
- date last changed
- 2022-02-21 05:16:25
@inproceedings{6a392e53-3abd-40df-8f2e-4e460650e1e1,
  abstract     = {{We describe a new method to convert English constituent trees using the Penn Treebank annotation style into dependency trees. The new format was inspired by annotation practices used in other dependency treebanks with the intention to produce a better interface to further semantic processing than existing methods. In particular, we used a richer set of edge labels and introduced links to handle long-distance phenomena such as wh-movement and topicalization.

The resulting trees generally have a more complex dependency structure. For example, 6% of the trees contain at least one nonprojective link, which is difficult for many parsing algorithms. As can be expected, the more complex structure and the enriched set of edge labels make the trees more difficult to predict, and we observed a decrease in parsing accuracy when applying two dependency parsers to the new corpus. However, the richer information contained in the new trees resulted in a 23% error reduction in a baseline FrameNet semantic role labeler that relied on dependency arc labels only.}},
  author       = {{Johansson, Richard and Nugues, Pierre}},
  booktitle    = {{NODALIDA 2007 Proceedings}},
  editor       = {{Nivre, Joakim and Kalep, Heiki-Jaan and Muischnek, Kadri and Koit, Mare}},
  isbn         = {{978-9985-4-0514-7}},
  keywords     = {{dependency syntax; treebanks; Natural language processing}},
  language     = {{eng}},
  pages        = {{105--112}},
  publisher    = {{University of Tartu}},
  title        = {{Extended constituent-to-dependency conversion for English}},
  url          = {{https://lup.lub.lu.se/search/files/5865743/2972024.pdf}},
  year         = {{2007}},
}