A comparative analysis of ML techniques for bug report classification
(2025) In Journal of Systems and Software, 227.
- Abstract
Several studies have evaluated various ML techniques and found promising results in classifying bug reports. However, these studies have used different evaluation designs, making it difficult to compare their results. Furthermore, they have focused primarily on accuracy and did not consider other potentially relevant factors such as generalizability, explainability, and maintenance cost. These two aspects make it difficult for practitioners and researchers to choose an appropriate ML technique for a given context. Therefore, we compare promising ML techniques against practitioners’ concerns using evaluation criteria that go beyond accuracy. Based on an existing framework for adopting ML techniques, we developed an evaluation framework for ML techniques for bug report classification. We used this framework to compare nine ML techniques on three datasets. The results enable a tradeoff analysis between various promising ML techniques. The results show that an ML technique with the highest predictive accuracy might not be the most suitable technique for some contexts. The overall approach presented in the paper supports making informed decisions when choosing ML techniques. It is not locked to the specific techniques, datasets, or factors we have selected here, and others could easily use and adapt it for additional techniques or concerns. Editor's note: Open Science material was validated by the Journal of Systems and Software Open Science Board.
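As an illustration only (not the paper's actual experimental setup, which compares nine techniques, including BERT, RoBERTa, and AutoML approaches, on three datasets), the following minimal Python sketch shows one hypothetical candidate technique for bug report classification: a TF-IDF plus logistic regression baseline on toy data. All data and parameter choices here are assumptions for demonstration; the paper's framework would score such candidates on accuracy alongside factors like generalizability, explainability, and maintenance cost.

```python
# Illustrative sketch, not the paper's method: one simple candidate
# technique for classifying issue reports as bug (1) or non-bug (0).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; real studies use labeled issue-tracker datasets.
reports = [
    "App crashes with NullPointerException when saving a file",
    "Please add dark mode support to the settings page",
    "Login fails with HTTP 500 after the latest update",
    "Documentation for the REST API is outdated",
]
labels = [1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    reports, labels, test_size=0.5, random_state=42, stratify=labels
)

# TF-IDF features feeding a linear classifier: cheap to train and
# relatively explainable, which is one trade-off axis beyond accuracy.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(X_train, y_train)
print("F1:", f1_score(y_test, clf.predict(X_test)))
```

A transformer-based technique such as a fine-tuned BERT or RoBERTa model would typically slot into the same evaluation loop; the point of the framework is that the cheaper baseline above may still win once maintenance cost and explainability are weighed in.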
- author
- Laiq, Muhammad; Ali, Nauman bin; Börstler, Jürgen and Engström, Emelie
- organization
- publishing date
- 2025-09
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- Automated machine learning, AutoML, BERT, Bug report classification, Issue classification, Large language models, Natural language processing, RoBERTa, Software analytics, Software maintenance
- in
- Journal of Systems and Software
- volume
- 227
- article number
- 112457
- pages
- 17 pages
- publisher
- Elsevier
- external identifiers
- scopus:105003372247
- ISSN
- 0164-1212
- DOI
- 10.1016/j.jss.2025.112457
- language
- English
- LU publication?
- yes
- id
- 7c871a48-2c47-451f-83e5-c5801006cb70
- date added to LUP
- 2025-05-05 13:51:20
- date last changed
- 2025-05-06 12:52:33
@article{7c871a48-2c47-451f-83e5-c5801006cb70,
  abstract  = {{Several studies have evaluated various ML techniques and found promising results in classifying bug reports. However, these studies have used different evaluation designs, making it difficult to compare their results. Furthermore, they have focused primarily on accuracy and did not consider other potentially relevant factors such as generalizability, explainability, and maintenance cost. These two aspects make it difficult for practitioners and researchers to choose an appropriate ML technique for a given context. Therefore, we compare promising ML techniques against practitioners’ concerns using evaluation criteria that go beyond accuracy. Based on an existing framework for adopting ML techniques, we developed an evaluation framework for ML techniques for bug report classification. We used this framework to compare nine ML techniques on three datasets. The results enable a tradeoff analysis between various promising ML techniques. The results show that an ML technique with the highest predictive accuracy might not be the most suitable technique for some contexts. The overall approach presented in the paper supports making informed decisions when choosing ML techniques. It is not locked to the specific techniques, datasets, or factors we have selected here, and others could easily use and adapt it for additional techniques or concerns. Editor's note: Open Science material was validated by the Journal of Systems and Software Open Science Board.}},
  author    = {{Laiq, Muhammad and Ali, Nauman bin and Börstler, Jürgen and Engström, Emelie}},
  issn      = {{0164-1212}},
  keywords  = {{Automated machine learning; AutoML; BERT; Bug report classification; Issue classification; Large language models; Natural language processing; RoBERTa; Software analytics; Software maintenance}},
  language  = {{eng}},
  publisher = {{Elsevier}},
  series    = {{Journal of Systems and Software}},
  title     = {{A comparative analysis of ML techniques for bug report classification}},
  url       = {{http://dx.doi.org/10.1016/j.jss.2025.112457}},
  doi       = {{10.1016/j.jss.2025.112457}},
  volume    = {{227}},
  year      = {{2025}},
}