LUP Student Papers

LUND UNIVERSITY LIBRARIES

Optimizing Data-driven Project Duration Prediction through Machine Learning Approaches

Yu, Baoyang LU and Nguyen, An LU (2025) DABN01 20251
Department of Economics
Department of Statistics
Abstract
Accurately predicting project duration is vital for effective management in large-scale engineering and product development. This thesis analyzes a dataset from Ericsson, covering over 8,000 projects from 2016 to 2025, to develop machine learning models for predicting project duration across various telecommunications programs. It also identifies key features that influence project timelines.

After a thorough process of data preprocessing, feature engineering, and model selection, the CatBoost regressor was chosen and improved with residual correction. The final model achieved an R² of 0.61 and a mean absolute error of 10.39 weeks. Results show that program type and project creation year are strong predictors, while long-duration projects remain harder to estimate accurately.

Beyond prediction, the thesis proposes an automated reporting system powered by a large language model and designed within a retrieval-augmented generation framework. It provides clear, interactive summaries of predictions and feature importance, helping project managers make data-driven decisions more effectively.
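
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how R² and mean absolute error are conventionally computed. It is not the thesis's code: the function names and the toy durations are hypothetical, chosen only to make the arithmetic easy to follow.

```python
# Illustration only: these durations are made up and are NOT the Ericsson data.

def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted durations (weeks)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the variance in actual durations explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total sum of squares
    return 1 - ss_res / ss_tot


# Hypothetical project durations in weeks.
actual_weeks = [10, 20, 30, 40]
predicted_weeks = [12, 18, 33, 37]

mae = mean_absolute_error(actual_weeks, predicted_weeks)
r2 = r_squared(actual_weeks, predicted_weeks)
print(f"MAE = {mae:.2f} weeks, R^2 = {r2:.3f}")  # MAE = 2.50 weeks, R^2 = 0.948
```

Read this way, the reported mean absolute error of 10.39 weeks means predictions miss the actual duration by about ten weeks on average, and an R² of 0.61 means the model accounts for roughly 61% of the variance in project duration.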
author: Yu, Baoyang LU and Nguyen, An LU
organization: Department of Economics; Department of Statistics
course: DABN01 20251
year: 2025
type: H1 - Master's Degree (One Year)
keywords: Machine Learning, Data Analytics, CatBoost, Project Management, Project Duration Prediction
language: English
id: 9202224
date added to LUP: 2025-09-12 09:05:28
date last changed: 2025-09-12 09:05:28
@misc{9202224,
  abstract     = {{Accurately predicting project duration is vital for effective management in large-scale engineering and product development. This thesis analyzes a dataset from Ericsson, covering over 8,000 projects from 2016 to 2025, to develop machine learning models for predicting project duration across various telecommunications programs. It also identifies key features that influence project timelines.

After a thorough process of data preprocessing, feature engineering, and model
selection, the CatBoost regressor was chosen and improved with residual correction. The final model achieved an R² of 0.61 and a mean absolute error of 10.39 weeks. Results show that program type and project creation year are strong predictors, while long-duration projects remain harder to estimate accurately.

Beyond prediction, the thesis proposes an automated reporting system powered
by a large language model and designed within a retrieval-augmented generation
framework. It provides clear, interactive summaries of predictions and feature importance, helping project managers make data-driven decisions more effectively.}},
  author       = {{Yu, Baoyang and Nguyen, An}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Optimizing Data-driven Project Duration Prediction through Machine Learning Approaches}},
  year         = {{2025}},
}