Optimizing Data-driven Project Duration Prediction through Machine Learning Approaches
(2025) DABN01 20251
Department of Economics
Department of Statistics
- Abstract
- Accurately predicting project duration is vital for effective management in large-scale engineering and product development. This thesis analyzes a dataset from Ericsson, covering over 8,000 projects from 2016 to 2025, to develop machine learning models for predicting project duration across various telecommunications programs. It also identifies key features that influence project timelines.
After a thorough process of data preprocessing, feature engineering, and model selection, the CatBoost regressor was chosen and improved with residual correction. The final model achieved an R² of 0.61 and a mean absolute error of 10.39 weeks. Results show that program type and project creation year are strong predictors, while long-duration projects remain harder to estimate accurately.
Beyond prediction, the thesis proposes an automated reporting system powered by a large language model and designed within a retrieval-augmented generation framework. It provides clear, interactive summaries of predictions and feature importance, helping project managers make data-driven decisions more effectively.
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9202224
- author
- Yu, Baoyang LU and Nguyen, An LU
- supervisor
- organization
- course
- DABN01 20251
- year
- 2025
- type
- H1 - Master's Degree (One Year)
- subject
- keywords
- Machine Learning, Data Analytics, CatBoost, Project Management, Project Duration Prediction
- language
- English
- id
- 9202224
- date added to LUP
- 2025-09-12 09:05:28
- date last changed
- 2025-09-12 09:05:28
@misc{9202224,
  abstract = {{Accurately predicting project duration is vital for effective management in large-scale engineering and product development. This thesis analyzes a dataset from Ericsson, covering over 8,000 projects from 2016 to 2025, to develop machine learning models for predicting project duration across various telecommunications programs. It also identifies key features that influence project timelines. After a thorough process of data preprocessing, feature engineering, and model selection, the CatBoost regressor was chosen and improved with residual correction. The final model achieved an R² of 0.61 and a mean absolute error of 10.39 weeks. Results show that program type and project creation year are strong predictors, while long-duration projects remain harder to estimate accurately. Beyond prediction, the thesis proposes an automated reporting system powered by a large language model and designed within a retrieval-augmented generation framework. It provides clear, interactive summaries of predictions and feature importance, helping project managers make data-driven decisions more effectively.}},
  author = {{Yu, Baoyang and Nguyen, An}},
  language = {{eng}},
  note = {{Student Paper}},
  title = {{Optimizing Data-driven Project Duration Prediction through Machine Learning Approaches}},
  year = {{2025}},
}