LUP Student Papers

LUND UNIVERSITY LIBRARIES

Classification of Premium and Non-Premium Products using XGBoost and Logistic Regression

Erazo, Francisco and Rojas Gerena, Stephany (2022) DABN01 20221
Department of Statistics
Department of Economics
Abstract
In the past few years, many industries have become interested in premium product segmentation as a way to achieve higher unit margins. In this paper, we applied machine learning algorithms to predict whether a product is premium or non-premium. The products are manufactured by a food and beverage company whose primary concern is misclassification, especially products incorrectly predicted as premium (False Positives). The focus of this study is therefore to minimize these misclassifications. We selected Logistic Regression (LR) and XGBoost (XGB) and applied class-balancing methods, feature selection, and parameter tuning. The main contribution of this research is the application of a Cost-Sensitive (CS) analysis to address misclassification in a highly imbalanced dataset. According to our results, the best-performing model was CS-XGB-SMOTE, which achieved a False Positive Rate (FPR) of 2.7%. Future research could explore a more robust way to assign the costs in the CS analysis and a direct modification of the XGB loss function, which may further improve the performance of this algorithm.
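The abstract does not say how the CS analysis assigned costs or turned them into decisions. One common approach is to move the classifier's decision threshold to the point that minimizes expected misclassification cost. The sketch below illustrates that idea and how an FPR like the one reported above is computed. It is not the authors' method. It assumes a model (such as LR or XGB) that outputs a probability p = P(premium | x), with premium as the positive class. `COST_FP`, `COST_FN`, the helper functions and the toy data are all hypothetical.

```python
# Minimal sketch of cost-sensitive thresholding and the False Positive Rate (FPR).
# Assumptions (not from the paper): the classifier outputs p = P(premium | x),
# premium (1) is the positive class, and COST_FP / COST_FN are illustrative values.
from typing import List, Sequence, Tuple

COST_FP = 5.0  # hypothetical cost of predicting premium for a non-premium product
COST_FN = 1.0  # hypothetical cost of predicting non-premium for a premium product


def cost_sensitive_threshold(cost_fp: float, cost_fn: float) -> float:
    """Threshold minimizing expected cost for a calibrated probability p.

    Predicting premium costs (1 - p) * cost_fp in expectation; predicting
    non-premium costs p * cost_fn. Premium is the cheaper choice when
    p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def predict(probs: Sequence[float], threshold: float) -> List[int]:
    """Label 1 (premium) when the probability exceeds the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probs]


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, premium (1) being positive."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


def false_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """FPR = FP / (FP + TN): share of non-premium products predicted as premium."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return fp / (fp + tn) if fp + tn else 0.0


def total_cost(y_true: Sequence[int], y_pred: Sequence[int],
               cost_fp: float, cost_fn: float) -> float:
    """Total misclassification cost: cost_fp * FP + cost_fn * FN."""
    _, fp, _, fn = confusion_counts(y_true, y_pred)
    return cost_fp * fp + cost_fn * fn


# Toy data invented for illustration: 1 = premium, 0 = non-premium.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
probs = [0.10, 0.30, 0.60, 0.70, 0.20, 0.55, 0.90, 0.65, 0.05, 0.95]

for name, thr in [("default", 0.5),
                  ("cost-sensitive", cost_sensitive_threshold(COST_FP, COST_FN))]:
    y_pred = predict(probs, thr)
    print(f"{name}: threshold={thr:.3f}, "
          f"FPR={false_positive_rate(y_true, y_pred):.3f}, "
          f"cost={total_cost(y_true, y_pred, COST_FP, COST_FN):.1f}")
```

With these made-up costs, the threshold moves from 0.5 to 5/6 (about 0.833). On the toy data, this removes all three false positives (FPR drops from 3/7 to 0) at the price of one false negative. The rule assumes reasonably calibrated probabilities. Resampling the training data with SMOTE changes the class balance the model sees, so in practice the threshold may need to be tuned on held-out data rather than taken directly from the cost ratio.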
author: Erazo, Francisco and Rojas Gerena, Stephany
organization: Department of Statistics; Department of Economics
course: DABN01 20221
year: 2022
type: H1 - Master's Degree (One Year)
keywords: XGBoost, Logistic Regression, Classification Algorithms, Food and Beverage, Cost-Sensitive Analysis, SMOTE
language: English
id: 9087706
date added to LUP: 2022-10-10 08:42:52
date last changed: 2022-10-10 16:00:06

@misc{9087706,
  abstract     = {{In the past few years, many industries have become interested in premium product segmentation to achieve higher unit margins. In this paper, we applied machine learning algorithms to predict whether a product is premium or non-premium. This product is manufactured by a food and beverage company that considers the incorrect classification of products as their primary concern, especially when incorrectly predicting premium products (False Positives). Therefore, the focus of this study is to minimize the misclassification of premium products. We selected Logistic Regression (LR) and XGBoost (XGB) and applied balancing methods, feature selection, and tuning parameters. The main contribution of this research is the application of a Cost-Sensitive (CS) analysis for addressing misclassification with a highly imbalanced dataset. According to our results, the model with the best performance was CS-XGB-SMOTE achieving a False Positive Rate (FPR) of 2.7%. A more robust way to assign the costs for the CS analysis and a direct modification of the loss function for XGB can be explored for future research and may improve the performance of this algorithm.}},
  author       = {{Erazo, Francisco and Rojas Gerena, Stephany}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Classification of Premium and Non-Premium Products using XGBoost and Logistic Regression}},
  year         = {{2022}},
}