Predicting likelihood of requirement implementation within the planned iteration : An empirical study at IBM
(2017) 14th IEEE/ACM International Conference on Mining Software Repositories, MSR 2017 p.124-134
- Abstract
There has been a significant interest in the estimation of time and effort in fixing defects among both software practitioners and researchers over the past two decades. However, most of the focus has been on prediction of time and effort in resolving bugs, without much regard to predicting time needed to complete high-level requirements, a critical step in release planning. In this paper, we describe a mixed-method empirical study on three large IBM projects in which we developed and evaluated a process of training a predictive model constituting a set of 29 features in nine categories in order to predict if a requirement will be completed within its planned iteration. We conducted feature engineering through iterative interviews with IBM practitioners as well as analysis of large development repositories of these three projects. Using machine learning techniques, we were able to make predictions on completion time of requirements at four different stages of their lifetime. Using our industrial partner's interest in high precision over recall, we then adopted a cost sensitive learning method and maximized precision of predictions (ranging from 0.8 to 0.97) while maintaining an acceptable recall. We also ranked the features based on their relative importance to the optimized predictive model. We show that although satisfying predictions can be made at early stages, performance of predictions improves over time by taking advantage of requirements' progress data. Furthermore, feature importance ranking results show that although importance of features are highly dependent on project and prediction stage, there are certain features (e.g. requirement creator, time remained to the end of iteration, time since last requirement summary change and number of times requirement has been replanned for a new iteration) that emerge as important across most projects and stages, implying future worthwhile research directions for both researchers and practitioners.
- author
- Dehghan, Ali ; Neal, Adam ; Blincoe, Kelly ; Linaker, Johan (LU) ; Damian, Daniela
- organization
- publishing date
- 2017-06-29
- type
- Chapter in Book/Report/Conference proceeding
- publication status
- published
- subject
- keywords
- Completion Time Prediction, Machine Learning, Mining Software Repositories, Release Planning
- host publication
- Proceedings - 2017 IEEE/ACM 14th International Conference on Mining Software Repositories, MSR 2017
- article number
- 7962362
- pages
- 11 pages
- publisher
- IEEE Computer Society
- conference name
- 14th IEEE/ACM International Conference on Mining Software Repositories, MSR 2017
- conference location
- Buenos Aires, Argentina
- conference dates
- 2017-05-20 - 2017-05-21
- external identifiers
- scopus:85026548340
- ISBN
- 9781538615447
- DOI
- 10.1109/MSR.2017.53
- project
- Synthesis of a Software Engineering Framework for Open Innovation through Empirical Research
- language
- English
- LU publication?
- yes
- id
- 119d6515-c4e4-45d1-8686-fb17ca3c454d
- date added to LUP
- 2017-09-01 14:21:53
- date last changed
- 2022-07-11 19:15:38
@inproceedings{119d6515-c4e4-45d1-8686-fb17ca3c454d,
  abstract     = {{There has been a significant interest in the estimation of time and effort in fixing defects among both software practitioners and researchers over the past two decades. However, most of the focus has been on prediction of time and effort in resolving bugs, without much regard to predicting time needed to complete high-level requirements, a critical step in release planning. In this paper, we describe a mixed-method empirical study on three large IBM projects in which we developed and evaluated a process of training a predictive model constituting a set of 29 features in nine categories in order to predict if a requirement will be completed within its planned iteration. We conducted feature engineering through iterative interviews with IBM practitioners as well as analysis of large development repositories of these three projects. Using machine learning techniques, we were able to make predictions on completion time of requirements at four different stages of their lifetime. Using our industrial partner's interest in high precision over recall, we then adopted a cost sensitive learning method and maximized precision of predictions (ranging from 0.8 to 0.97) while maintaining an acceptable recall. We also ranked the features based on their relative importance to the optimized predictive model. We show that although satisfying predictions can be made at early stages, performance of predictions improves over time by taking advantage of requirements' progress data. Furthermore, feature importance ranking results show that although importance of features are highly dependent on project and prediction stage, there are certain features (e.g. requirement creator, time remained to the end of iteration, time since last requirement summary change and number of times requirement has been replanned for a new iteration) that emerge as important across most projects and stages, implying future worthwhile research directions for both researchers and practitioners.}},
  author       = {{Dehghan, Ali and Neal, Adam and Blincoe, Kelly and Linaker, Johan and Damian, Daniela}},
  booktitle    = {{Proceedings - 2017 IEEE/ACM 14th International Conference on Mining Software Repositories, MSR 2017}},
  isbn         = {{9781538615447}},
  keywords     = {{Completion Time Prediction; Machine Learning; Mining Software Repositories; Release Planning}},
  language     = {{eng}},
  month        = {{06}},
  pages        = {{124--134}},
  publisher    = {{IEEE Computer Society}},
  title        = {{Predicting likelihood of requirement implementation within the planned iteration : An empirical study at IBM}},
  url          = {{http://dx.doi.org/10.1109/MSR.2017.53}},
  doi          = {{10.1109/MSR.2017.53}},
  year         = {{2017}},
}