
Detection of duplicate defect reports using natural language processing

Runeson, Per LU; Alexandersson, Magnus and Nyholm, Oskar (2007). 29th International Conference on Software Engineering, ICSE 2007. In Proceedings - International Conference on Software Engineering, p. 499-508
Abstract
Defect reports are generated from various testing and development activities in software engineering. Sometimes two reports are submitted that describe the same problem, leading to duplicate reports. These reports are mostly written in structured natural language, and as such, it is hard to compare two reports for similarity with formal methods. In order to identify duplicates, we investigate using Natural Language Processing (NLP) techniques to support the identification. A prototype tool is developed and evaluated in a case study analyzing defect reports at Sony Ericsson Mobile Communications. The evaluation shows that about 2/3 of the duplicates can possibly be found using the NLP techniques. Different variants of the techniques provide only minor result differences, indicating a robust technology. User testing shows that the overall attitude towards the technique is positive and that it has a growth potential. © 2007 IEEE.
author
Runeson, Per; Alexandersson, Magnus; Nyholm, Oskar
publishing date
2007
type
Chapter in Book/Report/Conference proceeding
publication status
published
keywords
Sony Ericsson (CO), User testing
in
Proceedings - International Conference on Software Engineering
pages
499 - 508
publisher
IEEE--Institute of Electrical and Electronics Engineers Inc.
conference name
29th International Conference on Software Engineering, ICSE 2007
external identifiers
  • wos:000247063000049
  • other:CODEN: PCSEDE
  • scopus:34548795892
ISSN
0270-5257
DOI
10.1109/ICSE.2007.32
language
English
LU publication?
yes
id
790a2092-354d-4d08-ae3e-ba8c62ae9b38 (old id 643449)
date added to LUP
2007-12-04 11:24:41
date last changed
2017-11-12 04:00:25
@inproceedings{790a2092-354d-4d08-ae3e-ba8c62ae9b38,
  abstract     = {Defect reports are generated from various testing and development activities in software engineering. Sometimes two reports are submitted that describe the same problem, leading to duplicate reports. These reports are mostly written in structured natural language, and as such, it is hard to compare two reports for similarity with formal methods. In order to identify duplicates, we investigate using Natural Language Processing (NLP) techniques to support the identification. A prototype tool is developed and evaluated in a case study analyzing defect reports at Sony Ericsson Mobile Communications. The evaluation shows that about 2/3 of the duplicates can possibly be found using the NLP techniques. Different variants of the techniques provide only minor result differences, indicating a robust technology. User testing shows that the overall attitude towards the technique is positive and that it has a growth potential. © 2007 IEEE.},
  author       = {Runeson, Per and Alexandersson, Magnus and Nyholm, Oskar},
  booktitle    = {Proceedings - International Conference on Software Engineering},
  issn         = {0270-5257},
  keywords     = {Sony Ericsson (CO), User testing},
  language     = {eng},
  pages        = {499--508},
  publisher    = {IEEE--Institute of Electrical and Electronics Engineers Inc.},
  title        = {Detection of duplicate defect reports using natural language processing},
  doi          = {10.1109/ICSE.2007.32},
  url          = {https://doi.org/10.1109/ICSE.2007.32},
  year         = {2007},
}