Lexical semantics for software requirements engineering – a corpus-based approach

Lindmark, Kerstin; Natt och Dag, Johan; Willners, Caroline

Lindmark, Kerstin (LU); Natt och Dag, Johan (LU) and Willners, Caroline (LU) (2007) ICAME 25, pp. 365-385

Abstract: In companies that constantly develop new software releases for large markets, new requirements written in natural language arrive continually, and these may affect the development work. Before any decision is made about the requirements, they must be analysed, understood, and related to the current set of implemented and queued requirements. This task is time-consuming owing to the high inflow of requirements, and decision-making would be facilitated by any support that reduces the requirements analyst’s workload. One of the main tasks is finding requirement duplicates and requirements with similar content, and different NLP methods have been tried for this. Simple word matching is one of the methods used for linkage between requirements. If links could be set up not only between words but also between concepts at different semantic levels, the chances of finding content-corresponding requirements would be greater. One goal of this project is to establish a terminology for requirements as well as (WordNet-type) semantic relations between terms, in order to enable multi-level linkage.
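The contrast between plain word matching and linkage at several semantic levels can be illustrated with a minimal Python sketch. It assumes Jaccard overlap of content words as the similarity measure and a tiny hand-made hyponym-to-hyperonym table; the measure, the table (`HYPERONYMS`), the stop-word list and all function names are illustrative assumptions, not the measure or terminology used in the paper.

```python
import re

# Hypothetical hyponym -> hyperonym table. In the project such relations would
# come from the requirements terminology; these entries are invented examples.
HYPERONYMS = {
    "printer": "device",
    "scanner": "device",
    "pdf": "document",
    "spreadsheet": "document",
}

# Hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "to", "shall", "should", "must", "be", "able"}


def tokens(text: str) -> set[str]:
    """Lower-cased content words of a requirement (crude tokenisation)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def expand(words: set[str]) -> set[str]:
    """Add one level of hyperonyms so that linkage can also happen at concept level."""
    return words | {HYPERONYMS[w] for w in words if w in HYPERONYMS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def word_similarity(r1: str, r2: str) -> float:
    """Simple word matching: overlap of the literal content words."""
    return jaccard(tokens(r1), tokens(r2))


def concept_similarity(r1: str, r2: str) -> float:
    """Multi-level linkage: overlap after adding hyperonyms of known terms."""
    return jaccard(expand(tokens(r1)), expand(tokens(r2)))


if __name__ == "__main__":
    r1, r2 = "Connect a printer", "Connect a scanner"
    print(word_similarity(r1, r2))     # 0.333... (only "connect" is shared)
    print(concept_similarity(r1, r2))  # 0.5 ("connect" and "device" are shared)
```

Under these assumptions the two requirements share only one of three content words, but two of four items once the shared hyperonym *device* is added, which is the kind of gain multi-level linkage aims for.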
To establish this terminology and these relations, we use a corpus of 1,932 authentic software requirements, written in English of varying grammatical and stylistic quality. First, term candidates were extracted using the WordSmith Keyword function, with the BNC Sampler as reference corpus. To find out whether there is any terminology specific to the ‘requirements’ sub-domain of the ‘software’ domain, the documentation associated with the software to which the requirements relate was also used (separately) as a reference corpus. Then, lexico-semantic patterns according to Hearst (1992) were used to find hyponymy–hyperonymy relations and to confirm manually established relations. These analyses were performed on the text both ‘as is’ and, to reduce noise somewhat, after POS tagging with the Brill tagger (Brown Corpus tag set). The results so far suggest that corpus-based methods are of importance to the management and analysis of requirements.
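Hearst (1992) patterns detect hyponymy through phrasings such as "X such as A, B and C" or "A, B and other X". The following is a minimal Python sketch of the idea, assuming raw ("as is") text and approximating every noun phrase by a single word, since no tagger or chunker is used. Only the "such as", "including", "especially", "and other" and "or other" patterns are covered; the regular expressions, function names and example sentence are illustrative assumptions, not the authors' implementation or data.

```python
import re

# A noun phrase is approximated by a single word (assumption: untagged text).
WORD = r"\b[A-Za-z][\w-]*"
# A coordinated list such as "A", "A, B", "A, B and C" or "A, B, or C".
LIST = rf"{WORD}(?:\s*,\s*{WORD})*(?:(?:\s*,\s*|\s+)(?:and|or)\s+{WORD})?"

# Hyperonym before the list: "X such as A, B and C", "X, including A", "X, especially A".
FORWARD = re.compile(rf"({WORD})\s*,?\s+(?:such as|including|especially)\s+({LIST})")
# Hyperonym after the list: "A, B and other X", "A or other X".
BACKWARD = re.compile(rf"({LIST})\s*,?\s+(?:and|or) other\s+({WORD})")
# Separators between list items: a comma (optionally followed by and/or), or and/or.
SEPARATOR = re.compile(r"\s*,\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+")


def split_list(items: str) -> list[str]:
    """Split a matched list such as "PDF, HTML and XML" into its items."""
    return [w for w in SEPARATOR.split(items) if w]


def hearst_pairs(text: str) -> list[tuple[str, str]]:
    """Return lower-cased (hyponym, hyperonym) pairs matched by the patterns."""
    pairs: list[tuple[str, str]] = []
    for m in FORWARD.finditer(text):
        hyper, items = m.group(1), m.group(2)
        pairs += [(h, hyper) for h in split_list(items)]
    for m in BACKWARD.finditer(text):
        items, hyper = m.group(1), m.group(2)
        pairs += [(h, hyper) for h in split_list(items)]
    return [(h.lower(), y.lower()) for h, y in pairs]


if __name__ == "__main__":
    sample = ("The client shall support formats such as PDF, HTML and XML. "
              "Printers, scanners and other devices must be detected.")
    for hypo, hyper in hearst_pairs(sample):
        print(f"{hypo} IS-A {hyper}")
```

Because of the single-word approximation, multi-word terms such as *fax machines* would be cut to one word, and words following a comma can be picked up as spurious list items; POS-tagged or chunked input is one way to reduce such noise.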

Please use this URL to cite or link to this publication: https://lup.lub.lu.se/record/538770

author

Lindmark, Kerstin (LU); Natt och Dag, Johan (LU) and Willners, Caroline (LU)

publishing date

2007

type

Chapter in Book/Report/Conference proceeding

publication status

published

keywords

software requirements engineering, corpus-based methods, lexical semantics

host publication

Corpus Linguistics 25 Years on

editor

Facchinetti, Roberta

issue

62

pages

365 - 385

publisher

Rodopi

conference name

ICAME 25

conference location

Verona, Italy

conference dates

2004-05-19 - 2004-05-23

external identifiers

wos:000246928600019

ISSN

0921-5034

ISBN

9042021950

language

English

LU publication?

yes

id

9cfa326e-36c9-468c-9d16-1be27d43489e (old id 538770)

alternative location

http://www.ingentaconnect.com/content/rodopi/lang/2007/00000062/00000001/art00020

date added to LUP

2016-04-01 16:24:18

date last changed

2025-04-04 14:18:08

@inproceedings{9cfa326e-36c9-468c-9d16-1be27d43489e,
  abstract     = {{In companies that constantly develop new software releases for large markets, new requirements written in natural language arrive continually, and these may affect the development work. Before any decision is made about the requirements, they must be analysed, understood, and related to the current set of implemented and queued requirements. This task is time-consuming owing to the high inflow of requirements, and decision-making would be facilitated by any support that reduces the requirements analyst’s workload. One of the main tasks is finding requirement duplicates and requirements with similar content, and different NLP methods have been tried for this. Simple word matching is one of the methods used for linkage between requirements. If links could be set up not only between words but also between concepts at different semantic levels, the chances of finding content-corresponding requirements would be greater. One goal of this project is to establish a terminology for requirements as well as (WordNet-type) semantic relations between terms, in order to enable multi-level linkage. For this purpose, we use a corpus of 1,932 authentic software requirements, written in English of varying grammatical and stylistic quality. First, term candidates were extracted using the WordSmith Keyword function, with the BNC Sampler as reference corpus. To find out whether there is any terminology specific to the ‘requirements’ sub-domain of the ‘software’ domain, the documentation associated with the software to which the requirements relate was also used (separately) as a reference corpus. Then, lexico-semantic patterns according to Hearst (1992) were used to find hyponymy–hyperonymy relations and to confirm manually established relations. These analyses were performed on the text both ‘as is’ and, to reduce noise somewhat, after POS tagging with the Brill tagger (Brown Corpus tag set).
The results so far suggest that corpus-based methods are of importance to the management and analysis of requirements.}},
  author       = {{Lindmark, Kerstin and Natt och Dag, Johan and Willners, Caroline}},
  booktitle    = {{Corpus Linguistics 25 Years on}},
  editor       = {{Facchinetti, Roberta}},
  isbn         = {{9042021950}},
  issn         = {{0921-5034}},
  keywords     = {{software requirements engineering; corpus-based methods; lexical semantics}},
  language     = {{eng}},
  number       = {{62}},
  pages        = {{365--385}},
  publisher    = {{Rodopi}},
  title        = {{Lexical semantics for software requirements engineering – a corpus-based approach}},
  url          = {{http://www.ingentaconnect.com/content/rodopi/lang/2007/00000062/00000001/art00020}},
  year         = {{2007}},
}
