LUP Student Papers

LUND UNIVERSITY LIBRARIES

Predicting Customer Churn and Customer Lifetime Value (CLV) using Machine Learning

Gerde, Magnus LU (2023) In Master’s Theses in Mathematical Sciences FMSM01 20222
Mathematical Statistics
Abstract
In an ever more competitive business environment, predictive customer behaviour models can give companies an edge over their competitors. Two important examples are customer churn models and customer lifetime value (CLV) models. Because acquiring new customers is more expensive than retaining existing ones, it is important for businesses to keep their existing customer base. Customer churn models can help retain existing customers by identifying patterns in customer engagement and behaviour that increase the risk of churning. These high-risk customers can then be proactively targeted with personalized retention strategies. CLV models can help companies predict revenue and identify areas where they can improve to meet revenue goals. In this thesis, three popular machine learning algorithms were used to predict customer churn: logistic regression, random forest (RF) and support vector classifier (SVC). In addition, two regression algorithms were used to predict CLV: linear regression and support vector regression (SVR). The SVC model and the logistic regression model gave similar results, with the SVC model having slightly better performance metrics. Moreover, as the features were significantly correlated, the logistic regression model might not generalize as well to new data as the SVC model. The random forest model was unstable across different evaluation sets, was too reluctant to classify customers as churned, and had the worst overall performance of the three models. For the CLV models, the linear regression model was unable to accurately capture the skewed distribution of spending among customers. Compared with a naive predictor, the linear regression model performed better only at predicting which customers would stop generating revenue.
For the customers who did not stop generating revenue, the linear regression model performed significantly worse than the naive predictor. The SVR model modelled CLV more accurately, outperforming the naive predictor across all spending ranges except the highest-spending eighth of customers. The SVR model also significantly outperformed the linear regression model, except at predicting which customers would stop generating revenue, where the linear regression model was slightly better.
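The abstract evaluates the CLV models against a naive predictor in two ways: per spending range, and separately for customers who stop generating revenue. This record does not say how the naive predictor or the ranges are defined, so the following is only a minimal Python sketch under stated assumptions, not the thesis's method: the naive baseline repeats each customer's previous-period spend, the error metric is mean absolute error (MAE), and the ranges are equal-size groups ordered by observed CLV (with eight groups, the last one is the highest-spending eighth). All function names and the toy numbers are hypothetical.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted CLV."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def naive_predictor(past_spend):
    """Hypothetical baseline (assumption): predict that each customer's
    future spend equals their spend in the previous period."""
    return list(past_spend)


def mae_by_spend_bin(actual, predicted, n_bins=8):
    """Sort customers by observed CLV, split them into n_bins groups of
    (near) equal size and return the MAE in each group, lowest first.
    With n_bins=8 the last entry covers the top eighth of spenders."""
    n = len(actual)
    if n < n_bins:
        raise ValueError("need at least one customer per bin")
    order = sorted(range(n), key=lambda i: actual[i])
    result = []
    for b in range(n_bins):
        # Integer bin edges guarantee every bin is non-empty when n >= n_bins.
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        result.append(mean_absolute_error([actual[i] for i in idx],
                                          [predicted[i] for i in idx]))
    return result


def mae_by_churn_status(actual, predicted):
    """MAE for customers who stopped generating revenue (observed CLV == 0)
    and for those who kept spending; empty groups are left out."""
    groups = {"stopped": ([], []), "active": ([], [])}
    for a, p in zip(actual, predicted):
        key = "stopped" if a == 0 else "active"
        groups[key][0].append(a)
        groups[key][1].append(p)
    return {k: mean_absolute_error(a, p) for k, (a, p) in groups.items() if a}


if __name__ == "__main__":
    # Toy, made-up numbers for eight customers (not data from the thesis).
    actual = [0, 0, 10, 20, 40, 60, 150, 400]
    past = [5, 30, 12, 18, 35, 70, 100, 250]
    baseline = naive_predictor(past)
    print(mean_absolute_error(actual, baseline))
    print(mae_by_spend_bin(actual, baseline, n_bins=4))
    print(mae_by_churn_status(actual, baseline))
```

A candidate model's predictions would be passed through the same functions, and its per-group errors compared with the baseline's; this mirrors the kind of range-by-range comparison the abstract reports, without claiming the thesis used these exact definitions.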
author
Gerde, Magnus LU
course
FMSM01 20222
year
2023
type
H2 - Master's Degree (Two Years)
keywords
Churn, CLV, SVM, Logistic Regression, Linear Regression, Random Forest Model
publication/series
Master’s Theses in Mathematical Sciences
report number
LUTFMS-3466-2023
ISSN
1404-6342
other publication id
2023:E8
language
English
id
9113032
date added to LUP
2023-04-12 08:21:59
date last changed
2023-04-12 08:21:59
@misc{9113032,
  abstract     = {{In an ever more competitive business environment, predictive customer behaviour models can give companies an edge over their competitors. Two important examples are customer churn models and customer lifetime value (CLV) models. Because acquiring new customers is more expensive than retaining existing ones, it is important for businesses to keep their existing customer base. Customer churn models can help retain existing customers by identifying patterns in customer engagement and behaviour that increase the risk of churning. These high-risk customers can then be proactively targeted with personalized retention strategies. CLV models can help companies predict revenue and identify areas where they can improve to meet revenue goals. In this thesis, three popular machine learning algorithms were used to predict customer churn: logistic regression, random forest (RF) and support vector classifier (SVC). In addition, two regression algorithms were used to predict CLV: linear regression and support vector regression (SVR). The SVC model and the logistic regression model gave similar results, with the SVC model having slightly better performance metrics. Moreover, as the features were significantly correlated, the logistic regression model might not generalize as well to new data as the SVC model. The random forest model was unstable across different evaluation sets, was too reluctant to classify customers as churned, and had the worst overall performance of the three models. For the CLV models, the linear regression model was unable to accurately capture the skewed distribution of spending among customers. Compared with a naive predictor, the linear regression model performed better only at predicting which customers would stop generating revenue.
For the customers who did not stop generating revenue, the linear regression model performed significantly worse than the naive predictor. The SVR model modelled CLV more accurately, outperforming the naive predictor across all spending ranges except the highest-spending eighth of customers. The SVR model also significantly outperformed the linear regression model, except at predicting which customers would stop generating revenue, where the linear regression model was slightly better.}},
  author       = {{Gerde, Magnus}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master’s Theses in Mathematical Sciences}},
  title        = {{Predicting Customer Churn and Customer Lifetime Value (CLV) using Machine Learning}},
  year         = {{2023}},
}