Using Text Clustering to Predict Defect Resolution Time: A Conceptual Replication and an Evaluation of Prediction Accuracy

Assar, Saïd; Borg, Markus; Pfahl, Dietmar


Assar, Saïd; Borg, Markus and Pfahl, Dietmar (2015). In Empirical Software Engineering.

Abstract: Defect management is a central task in software maintenance. When a defect is reported, appropriate resources must be allocated to analyze and resolve the defect. An important issue in resource allocation is the estimation of Defect Resolution Time (DRT). Prior research has considered different approaches for DRT prediction exploiting information retrieval techniques and similarity in textual defect descriptions. In this article, we investigate the potential of text clustering for DRT prediction. We build on a study published by Raja (2013) which demonstrated that clusters of similar defect reports had statistically significant differences in DRT. Raja’s study also suggested that this difference between clusters could be used for DRT prediction. Our aims are twofold: First, to conceptually replicate Raja’s study and to assess the repeatability of its results in different settings; Second, to investigate the potential of textual clustering of issue reports for DRT prediction with focus on accuracy. Using different data sets and a different text mining tool and clustering technique, we first conduct an independent replication of the original study. Then we design a fully automated prediction method based on clustering with a simulated test scenario to check the accuracy of our method. The results of our independent replication are comparable to those of the original study and we confirm the initial findings regarding significant differences in DRT between clusters of defect reports. However, the simulated test scenario used to assess our prediction method yields poor results in terms of DRT prediction accuracy. Although our replication confirms the main finding from the original study, our attempt to use text clustering as the basis for DRT prediction did not achieve practically useful levels of accuracy.
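The general idea of cluster-based DRT prediction described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the record does not name the text mining tool or clustering technique they used, so the greedy "leader" clustering, the cosine similarity on raw term counts, the 0.3 threshold, and the toy defect reports below are all assumptions made for illustration. A new report is assigned to its most similar cluster and the mean DRT of that cluster is used as the prediction.

```python
from collections import Counter
from math import sqrt
from statistics import mean


def tokenize(text):
    """Lowercase a defect summary and split it on whitespace."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_reports(reports, threshold=0.3):
    """Greedy leader clustering (an illustrative stand-in, not the paper's technique).

    Each report joins the most similar existing cluster if the similarity
    reaches the threshold; otherwise it starts a new cluster.
    """
    clusters = []
    for text, drt_days in reports:
        vec = Counter(tokenize(text))
        best = max(clusters, key=lambda c: cosine(vec, c["terms"]), default=None)
        if best is not None and cosine(vec, best["terms"]) >= threshold:
            best["terms"].update(vec)      # accumulate term counts of the cluster
            best["drts"].append(drt_days)  # remember the resolution time
        else:
            clusters.append({"terms": Counter(vec), "drts": [drt_days]})
    return clusters


def predict_drt(clusters, text):
    """Predict DRT as the mean DRT of the most similar cluster."""
    vec = Counter(tokenize(text))
    nearest = max(clusters, key=lambda c: cosine(vec, c["terms"]))
    return mean(nearest["drts"])


# Hypothetical resolved reports: (summary, resolution time in days).
history = [
    ("crash when opening large file", 10),
    ("crash on opening file dialog", 14),
    ("typo in settings dialog label", 2),
    ("typo in help menu label", 4),
]

clusters = cluster_reports(history)
print(len(clusters))
print(predict_drt(clusters, "crash opening file"))
print(predict_drt(clusters, "typo in label"))
```

A cluster mean is a coarse estimate: reports inside one cluster can still differ widely in resolution time, which is one plausible reason for the low accuracy the abstract reports, even though cluster-level differences in DRT are statistically significant.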

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/7765098

author

Assar, Saïd; Borg, Markus and Pfahl, Dietmar

publishing date

2015

type

Contribution to journal

publication status

published

subject

Computer Systems

keywords

Defect resolution time; Prediction; Text mining; Data clustering; Independent replication; Simulation

in

Empirical Software Engineering

publisher

Springer

external identifiers

scopus:84933556607
wos:000379060700001

ISSN

1573-7616

DOI

10.1007/s10664-015-9391-7

project

Embedded Applications Software Engineering

language

English

LU publication?

yes

id

1d4aa93c-21be-4d56-b1f2-73e8b6830887 (old id 7765098)

date added to LUP

2016-04-01 10:41:13

date last changed

2022-03-04 21:50:21

@article{1d4aa93c-21be-4d56-b1f2-73e8b6830887,
  abstract     = {{Defect management is a central task in software maintenance. When a defect is reported, appropriate resources must be allocated to analyze and resolve the defect. An important issue in resource allocation is the estimation of Defect Resolution Time (DRT). Prior research has considered different approaches for DRT prediction exploiting information retrieval techniques and similarity in textual defect descriptions. In this article, we investigate the potential of text clustering for DRT prediction. We build on a study published by Raja (2013) which demonstrated that clusters of similar defect reports had statistically significant differences in DRT. Raja’s study also suggested that this difference between clusters could be used for DRT prediction. Our aims are twofold: First, to conceptually replicate Raja’s study and to assess the repeatability of its results in different settings; Second, to investigate the potential of textual clustering of issue reports for DRT prediction with focus on accuracy. Using different data sets and a different text mining tool and clustering technique, we first conduct an independent replication of the original study. Then we design a fully automated prediction method based on clustering with a simulated test scenario to check the accuracy of our method. The results of our independent replication are comparable to those of the original study and we confirm the initial findings regarding significant differences in DRT between clusters of defect reports. However, the simulated test scenario used to assess our prediction method yields poor results in terms of DRT prediction accuracy. Although our replication confirms the main finding from the original study, our attempt to use text clustering as the basis for DRT prediction did not achieve practically useful levels of accuracy.}},
  author       = {{Assar, Saïd and Borg, Markus and Pfahl, Dietmar}},
  issn         = {{1573-7616}},
  keywords     = {{Defect resolution time, Prediction, Text mining, Data clustering, Independent replication, Simulation}},
  language     = {{eng}},
  publisher    = {{Springer}},
  journal      = {{Empirical Software Engineering}},
  title        = {{Using Text Clustering to Predict Defect Resolution Time: A Conceptual Replication and an Evaluation of Prediction Accuracy}},
  url          = {{https://lup.lub.lu.se/search/files/2053063/7765103.pdf}},
  doi          = {{10.1007/s10664-015-9391-7}},
  year         = {{2015}},
}
