Lund University Publications

A comparative analysis of ML techniques for bug report classification

Laiq, Muhammad; Ali, Nauman bin; Börstler, Jürgen and Engström, Emelie (2025) In Journal of Systems and Software 227.
Abstract
Several studies have evaluated various ML techniques and found promising results in classifying bug reports. However, these studies have used different evaluation designs, making it difficult to compare their results. Furthermore, they have focused primarily on accuracy and did not consider other potentially relevant factors such as generalizability, explainability, and maintenance cost. These two aspects make it difficult for practitioners and researchers to choose an appropriate ML technique for a given context. Therefore, we compare promising ML techniques against practitioners’ concerns using evaluation criteria that go beyond accuracy. Based on an existing framework for adopting ML techniques, we developed an evaluation framework for ML techniques for bug report classification. We used this framework to compare nine ML techniques on three datasets. The results enable a tradeoff analysis between various promising ML techniques. The results show that an ML technique with the highest predictive accuracy might not be the most suitable technique for some contexts. The overall approach presented in the paper supports making informed decisions when choosing ML techniques. It is not locked to the specific techniques, datasets, or factors we have selected here, and others could easily use and adapt it for additional techniques or concerns. Editor's note: Open Science material was validated by the Journal of Systems and Software Open Science Board.
author
Laiq, Muhammad; Ali, Nauman bin; Börstler, Jürgen and Engström, Emelie
organization
publishing date
2025
type
Contribution to journal
publication status
published
subject
keywords
Automated machine learning, AutoML, BERT, Bug report classification, Issue classification, Large language models, Natural language processing, RoBERTa, Software analytics, Software maintenance
in
Journal of Systems and Software
volume
227
article number
112457
pages
17 pages
publisher
Elsevier
external identifiers
  • scopus:105003372247
ISSN
0164-1212
DOI
10.1016/j.jss.2025.112457
language
English
LU publication?
yes
id
7c871a48-2c47-451f-83e5-c5801006cb70
date added to LUP
2025-05-05 13:51:20
date last changed
2025-05-06 12:52:33
@article{7c871a48-2c47-451f-83e5-c5801006cb70,
  abstract     = {{<p>Several studies have evaluated various ML techniques and found promising results in classifying bug reports. However, these studies have used different evaluation designs, making it difficult to compare their results. Furthermore, they have focused primarily on accuracy and did not consider other potentially relevant factors such as generalizability, explainability, and maintenance cost. These two aspects make it difficult for practitioners and researchers to choose an appropriate ML technique for a given context. Therefore, we compare promising ML techniques against practitioners’ concerns using evaluation criteria that go beyond accuracy. Based on an existing framework for adopting ML techniques, we developed an evaluation framework for ML techniques for bug report classification. We used this framework to compare nine ML techniques on three datasets. The results enable a tradeoff analysis between various promising ML techniques. The results show that an ML technique with the highest predictive accuracy might not be the most suitable technique for some contexts. The overall approach presented in the paper supports making informed decisions when choosing ML techniques. It is not locked to the specific techniques, datasets, or factors we have selected here, and others could easily use and adapt it for additional techniques or concerns. Editor's note: Open Science material was validated by the Journal of Systems and Software Open Science Board.</p>}},
  author       = {{Laiq, Muhammad and Ali, Nauman bin and Börstler, Jürgen and Engström, Emelie}},
  issn         = {{0164-1212}},
  keywords     = {{Automated machine learning; AutoML; BERT; Bug report classification; Issue classification; Large language models; Natural language processing; RoBERTa; Software analytics; Software maintenance}},
  language     = {{eng}},
  publisher    = {{Elsevier}},
  series       = {{Journal of Systems and Software}},
  title        = {{A comparative analysis of ML techniques for bug report classification}},
  url          = {{http://dx.doi.org/10.1016/j.jss.2025.112457}},
  doi          = {{10.1016/j.jss.2025.112457}},
  volume       = {{227}},
  year         = {{2025}},
}