Comparative Analysis of Machine Learning Algorithms on Comprehensive and Cluster-Specific Data in the Auto Insurance Industry

Balabanova, Veselina; Bhattarai, Shreeya

Balabanova, Veselina and Bhattarai, Shreeya (2024) DABN01 20241
Department of Economics
Department of Statistics

Abstract: In recent years, businesses have increasingly focused on Customer Lifetime Value (CLV) to build better customer relationships and to identify high-value customers for more tailored marketing strategies. This thesis compares the performance of different machine learning models on cluster-specific data and on the complete dataset from the auto insurance industry. In addition, the study identifies the most valuable customer cluster and devises customer retention strategies based on the features that significantly influence CLV.

For the empirical analysis, we use Principal Component Analysis (PCA) and k-means clustering for customer segmentation, and Random Forest, XGBoost, and Neural Networks to predict CLV on both the comprehensive and the cluster-specific data. Feature importance analysis and hyperparameter tuning were applied for further insights. Overall, the findings suggest that Random Forest performs best among the models: its R^2 improved by 27% and its RMSE dropped by 39% after the models were applied to each cluster separately to predict CLV. Future research could apply the approach of this study to other insurance sectors to see whether clustering techniques improve the performance of machine learning models there as well.
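The comparison described in the abstract (one model on the complete dataset versus one model per customer cluster) can be illustrated with a small sketch. This is not the thesis's code or data: the customers are synthetic, all names and numbers are hypothetical, and to stay dependency-free it swaps in a hand-written k-means and a simple least-squares line in place of PCA, Random Forest, XGBoost, and Neural Networks. Metrics are computed in-sample on the pooled predictions, which is an assumption of this sketch.

```python
import math
import random

random.seed(0)

# Synthetic customers (hypothetical): (monthly_premium, income_k) -> clv.
# Segments alternate so the first two rows come from different segments,
# which gives the naive k-means initialisation below one seed per segment.
features, clv = [], []
for i in range(200):
    if i % 2 == 0:
        premium, income = random.uniform(50, 100), random.uniform(20, 40)
        value = 40 * premium + random.gauss(0, 100)
    else:
        premium, income = random.uniform(150, 200), random.uniform(60, 90)
        value = 6000 - 10 * premium + random.gauss(0, 100)
    features.append((premium, income))
    clv.append(value)


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points serve as initial centroids."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


labels = kmeans(features, k=2)
premiums = [f[0] for f in features]

# Comprehensive model: one line fitted to all customers.
slope, intercept = fit_line(premiums, clv)
global_pred = [slope * x + intercept for x in premiums]

# Cluster-specific models: one line per k-means cluster.
cluster_pred = [0.0] * len(clv)
for c in set(labels):
    idx = [i for i, lab in enumerate(labels) if lab == c]
    s, b = fit_line([premiums[i] for i in idx], [clv[i] for i in idx])
    for i in idx:
        cluster_pred[i] = s * premiums[i] + b

rmse_global, rmse_cluster = rmse(clv, global_pred), rmse(clv, cluster_pred)
r2_global, r2_cluster = r2(clv, global_pred), r2(clv, cluster_pred)
print(f"comprehensive:    R^2={r2_global:.3f}  RMSE={rmse_global:.1f}")
print(f"cluster-specific: R^2={r2_cluster:.3f}  RMSE={rmse_cluster:.1f}")
```

On data like this, where the relationship between the feature and CLV differs by segment, the cluster-specific fits should show a lower RMSE and a higher R^2 than the single comprehensive fit. A real evaluation would use held-out data rather than in-sample metrics.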

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9154968

author

Balabanova, Veselina and Bhattarai, Shreeya

supervisor

Simon Reese

organization

course

DABN01 20241

year

2024

type

H1 - Master's Degree (One Year)

subject

Business and Economics

keywords

Auto Insurance Industry, Machine Learning, Random Forest, XGBoost, Neural Network, k-Means Clustering, Principal Component Analysis, Customer Lifetime Value (CLV)

language

English

id

9154968

date added to LUP

2024-09-24 08:32:17

date last changed

2024-09-24 08:32:17

@misc{9154968,
  abstract     = {{In recent years, businesses have been focusing on Customer Lifetime Value (CLV) to achieve better customer relationships and to identify high-value customers for more customized marketing strategies. This thesis contributes by comparing the performance of different machine learning models on cluster-specific data points and the complete dataset from the auto insurance industry. In addition, the study also discovers the most valuable customer cluster and devises customer retention strategies based on significant features that influence CLV.

For further empirical analysis, we have selected Principal Component Analysis (PCA) and k-means Clustering for customer segmentation. We have also used Random Forest, XGBoost, and Neural Networks, to predict CLV on comprehensive and cluster-specific data. Applied feature importance and hyperparameter tuning have been used for further insights. Overall, the findings suggest the best performance among the models is by Random Forest and its $R^2$ improved by 27\% while RMSE dropped by 39\% after applying the models to every cluster for predicting CLV. For future research, the findings from this study can also be adopted in other insurance industries to see how using clustering techniques helps improve the machine learning models’ performances.}},
  author       = {{Balabanova, Veselina and Bhattarai, Shreeya}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Comparative Analysis of Machine Learning Algorithms on Comprehensive and Cluster-Specific Data in the Auto Insurance Industry}},
  year         = {{2024}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
