Automated Bug Assignment: Ensemble-based Machine Learning in Large Scale Industrial Contexts

Jonsson, Leif; Borg, Markus; Broman, David; Sandahl, Kristian; Eldh, Sigrid; Runeson, Per

(2015) In Empirical Software Engineering 21(4).

Abstract: Bug report assignment is an important part of software maintenance. In particular, incorrect assignments of bug reports to development teams can be very expensive in large software development projects. Several studies propose automating bug assignment techniques using machine learning in open source software contexts, but no study exists for large-scale proprietary projects in industry. The goal of this study is to evaluate automated bug assignment techniques that are based on machine learning classification. In particular, we study the state-of-the-art ensemble learner Stacked Generalization (SG) that combines several classifiers. We collect more than 50,000 bug reports from five development projects from two companies in different domains. We implement automated bug assignment and evaluate the performance in a set of controlled experiments. We show that SG scales to large scale industrial application and that it outperforms the use of individual classifiers for bug assignment, reaching prediction accuracies from 50 % to 89 % when large training sets are used. In addition, we show how old training data can decrease the prediction accuracy of bug assignment. We advise industry to use SG for bug assignment in proprietary contexts, using at least 2,000 bug reports for training. Finally, we highlight the importance of not solely relying on results from cross-validation when evaluating automated bug assignment.
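
The idea of Stacked Generalization mentioned in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: this record does not say which classifiers or tooling the study used, so the two level-0 learners, the lookup-table level-1 learner, the class names, and the toy bug reports below are all hypothetical. The sketch only shows the general mechanism, in which a level-1 learner is trained on out-of-fold predictions made by several level-0 classifiers.

```python
from collections import Counter, defaultdict

# Hypothetical toy bug reports: (title text, submitting site, assigned team).
REPORTS = [
    ("crash in radio scheduler", "lab-a", "radio"),
    ("scheduler timeout on radio link", "lab-a", "radio"),
    ("ui button misaligned", "lab-b", "ui"),
    ("ui crash when opening menu", "lab-b", "ui"),
    ("database timeout on login", "lab-c", "db"),
    ("login query slow in database", "lab-c", "db"),
    ("radio link drops packets", "lab-a", "radio"),
    ("menu button missing in ui", "lab-b", "ui"),
    ("database index corrupt", "lab-c", "db"),
]


class TokenVoteClassifier:
    """Level-0 learner: each title token votes for the teams it co-occurred with."""

    def fit(self, reports):
        self.votes = defaultdict(Counter)
        self.default = Counter(t for _, _, t in reports).most_common(1)[0][0]
        for title, _, team in reports:
            for token in title.split():
                self.votes[token][team] += 1
        return self

    def predict(self, report):
        tally = Counter()
        for token in report[0].split():
            tally.update(self.votes.get(token, {}))
        return tally.most_common(1)[0][0] if tally else self.default


class SiteLookupClassifier:
    """Level-0 learner: predicts the team most often assigned for a site."""

    def fit(self, reports):
        self.by_site = defaultdict(Counter)
        self.default = Counter(t for _, _, t in reports).most_common(1)[0][0]
        for _, site, team in reports:
            self.by_site[site][team] += 1
        return self

    def predict(self, report):
        seen = self.by_site.get(report[1])
        return seen.most_common(1)[0][0] if seen else self.default


class StackedClassifier:
    """Level-1 learner trained on out-of-fold predictions of the level-0 learners."""

    def __init__(self, base_factories, folds=3):
        self.base_factories = base_factories
        self.folds = folds

    def fit(self, reports):
        # Map each tuple of level-0 predictions to the true teams observed with it.
        self.meta = defaultdict(Counter)
        for k in range(self.folds):
            held_out = reports[k::self.folds]
            rest = [r for i, r in enumerate(reports) if i % self.folds != k]
            models = [make().fit(rest) for make in self.base_factories]
            for r in held_out:
                key = tuple(m.predict(r) for m in models)
                self.meta[key][r[2]] += 1
        # Refit the level-0 learners on all training reports for prediction time.
        self.models = [make().fit(reports) for make in self.base_factories]
        return self

    def predict(self, report):
        key = tuple(m.predict(report) for m in self.models)
        seen = self.meta.get(key)
        # Unseen combination: fall back to a plain vote among level-0 outputs.
        return (seen or Counter(key)).most_common(1)[0][0]


stack = StackedClassifier([TokenVoteClassifier, SiteLookupClassifier]).fit(REPORTS)
print(stack.predict(("radio scheduler hang", "lab-a", None)))
```

In line with the abstract's caution about relying solely on cross-validation, a sketch like this would be better judged on reports submitted after the training period than on randomly held-out ones.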

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/7865961

author

Jonsson, Leif; Borg, Markus (LU); Broman, David; Sandahl, Kristian; Eldh, Sigrid and Runeson, Per (LU)

organization

publishing date

2015

type

Contribution to journal

publication status

published

subject

Software Engineering

keywords

Large scale, Industrial scale, Bug assignment, Bug reports, Classification, Ensemble learning, Machine learning

in

Empirical Software Engineering

volume

21

issue

4

publisher

Springer

external identifiers

scopus:84941356343
wos:000379060700004

ISSN

1573-7616

DOI

10.1007/s10664-015-9401-9

project

Embedded Applications Software Engineering

language

English

LU publication?

yes

id

0a7a873f-c93e-4846-bf92-3ab882384457 (old id 7865961)

date added to LUP

2016-04-01 10:27:33

date last changed

2025-10-14 11:49:04

@article{0a7a873f-c93e-4846-bf92-3ab882384457,
  abstract     = {{Bug report assignment is an important part of software maintenance. In particular, incorrect assignments of bug reports to development teams can be very expensive in large software development projects. Several studies propose automating bug assignment techniques using machine learning in open source software contexts, but no study exists for large-scale proprietary projects in industry. The goal of this study is to evaluate automated bug assignment techniques that are based on machine learning classification. In particular, we study the state-of-the-art ensemble learner Stacked Generalization (SG) that combines several classifiers. We collect more than 50,000 bug reports from five development projects from two companies in different domains. We implement automated bug assignment and evaluate the performance in a set of controlled experiments. We show that SG scales to large scale industrial application and that it outperforms the use of individual classifiers for bug assignment, reaching prediction accuracies from 50 % to 89 % when large training sets are used. In addition, we show how old training data can decrease the prediction accuracy of bug assignment. We advise industry to use SG for bug assignment in proprietary contexts, using at least 2,000 bug reports for training. Finally, we highlight the importance of not solely relying on results from cross-validation when evaluating automated bug assignment.}},
  author       = {{Jonsson, Leif and Borg, Markus and Broman, David and Sandahl, Kristian and Eldh, Sigrid and Runeson, Per}},
  issn         = {{1573-7616}},
  keywords     = {{Large scale; Industrial scale; Bug assignment; Bug reports; Classification; Ensemble learning; Machine learning}},
  language     = {{eng}},
  number       = {{4}},
  publisher    = {{Springer}},
  journal      = {{Empirical Software Engineering}},
  title        = {{Automated Bug Assignment: Ensemble-based Machine Learning in Large Scale Industrial Contexts}},
  url          = {{https://lup.lub.lu.se/search/files/1859620/7865979.pdf}},
  doi          = {{10.1007/s10664-015-9401-9}},
  volume       = {{21}},
  year         = {{2015}},
}
