A comparative analysis of ML techniques for bug report classification
(2025) In Journal of Systems and Software, 227.
- Abstract
Several studies have evaluated various ML techniques and found promising results in classifying bug reports. However, these studies have used different evaluation designs, making it difficult to compare their results. Furthermore, they have focused primarily on accuracy and did not consider other potentially relevant factors such as generalizability, explainability, and maintenance cost. These two aspects make it difficult for practitioners and researchers to choose an appropriate ML technique for a given context. Therefore, we compare promising ML techniques against practitioners’ concerns using evaluation criteria that go beyond accuracy. Based on an existing framework for adopting ML techniques, we developed an evaluation framework for ML techniques for bug report classification. We used this framework to compare nine ML techniques on three datasets. The results enable a tradeoff analysis between various promising ML techniques. The results show that an ML technique with the highest predictive accuracy might not be the most suitable technique for some contexts. The overall approach presented in the paper supports making informed decisions when choosing ML techniques. It is not locked to the specific techniques, datasets, or factors we have selected here, and others could easily use and adapt it for additional techniques or concerns. Editor's note: Open Science material was validated by the Journal of Systems and Software Open Science Board.
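As an illustration only (not the paper's actual experimental setup, which compares nine techniques, including BERT, RoBERTa, and AutoML approaches, on three datasets), the following minimal Python sketch shows one hypothetical candidate technique for bug report classification: a TF-IDF plus logistic regression baseline on toy data. All data and parameter choices here are assumptions for demonstration; the paper's framework would score such candidates on accuracy alongside factors like generalizability, explainability, and maintenance cost.

```python
# Illustrative sketch, not the paper's method: one simple candidate
# technique for classifying issue reports as bug (1) or non-bug (0).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; real studies use labeled issue-tracker datasets.
reports = [
    "App crashes with NullPointerException when saving a file",
    "Please add dark mode support to the settings page",
    "Login fails with HTTP 500 after the latest update",
    "Documentation for the REST API is outdated",
]
labels = [1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    reports, labels, test_size=0.5, random_state=42, stratify=labels
)

# TF-IDF features feeding a linear classifier: cheap to train and
# relatively explainable, which is one trade-off axis beyond accuracy.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(X_train, y_train)
print("F1:", f1_score(y_test, clf.predict(X_test)))
```

A transformer-based technique such as a fine-tuned BERT or RoBERTa model would typically slot into the same evaluation loop; the point of the framework is that the cheaper baseline above may still win once maintenance cost and explainability are weighed in.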
- author
- Laiq, Muhammad; Ali, Nauman bin; Börstler, Jürgen and Engström, Emelie
- organization
- publishing date
- 2025-09
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- Automated machine learning, AutoML, BERT, Bug report classification, Issue classification, Large language models, Natural language processing, RoBERTa, Software analytics, Software maintenance
- in
- Journal of Systems and Software
- volume
- 227
- article number
- 112457
- pages
- 17 pages
- publisher
- Elsevier
- external identifiers
- scopus:105003372247
- ISSN
- 0164-1212
- DOI
- 10.1016/j.jss.2025.112457
- language
- English
- LU publication?
- yes
- id
- 7c871a48-2c47-451f-83e5-c5801006cb70
- date added to LUP
- 2025-05-05 13:51:20
- date last changed
- 2025-05-06 12:52:33
@article{7c871a48-2c47-451f-83e5-c5801006cb70,
  abstract  = {{Several studies have evaluated various ML techniques and found promising results in classifying bug reports. However, these studies have used different evaluation designs, making it difficult to compare their results. Furthermore, they have focused primarily on accuracy and did not consider other potentially relevant factors such as generalizability, explainability, and maintenance cost. These two aspects make it difficult for practitioners and researchers to choose an appropriate ML technique for a given context. Therefore, we compare promising ML techniques against practitioners’ concerns using evaluation criteria that go beyond accuracy. Based on an existing framework for adopting ML techniques, we developed an evaluation framework for ML techniques for bug report classification. We used this framework to compare nine ML techniques on three datasets. The results enable a tradeoff analysis between various promising ML techniques. The results show that an ML technique with the highest predictive accuracy might not be the most suitable technique for some contexts. The overall approach presented in the paper supports making informed decisions when choosing ML techniques. It is not locked to the specific techniques, datasets, or factors we have selected here, and others could easily use and adapt it for additional techniques or concerns. Editor's note: Open Science material was validated by the Journal of Systems and Software Open Science Board.}},
  author    = {{Laiq, Muhammad and Ali, Nauman bin and Börstler, Jürgen and Engström, Emelie}},
  issn      = {{0164-1212}},
  keywords  = {{Automated machine learning; AutoML; BERT; Bug report classification; Issue classification; Large language models; Natural language processing; RoBERTa; Software analytics; Software maintenance}},
  language  = {{eng}},
  publisher = {{Elsevier}},
  series    = {{Journal of Systems and Software}},
  title     = {{A comparative analysis of ML techniques for bug report classification}},
  url       = {{http://dx.doi.org/10.1016/j.jss.2025.112457}},
  doi       = {{10.1016/j.jss.2025.112457}},
  volume    = {{227}},
  year      = {{2025}},
}