LUP Student Papers

LUND UNIVERSITY LIBRARIES

How do sampling strategies influence the accuracy and interpretability of bankruptcy predictions in U.S. firms?

Lantz, Alexander LU (2025) NEKN01 20251
Department of Economics
Abstract
This thesis investigates how sampling strategies influence the predictive accuracy and interpretability of bankruptcy models. Using financial data from U.S. publicly listed firms (2009 to 2023), it compares logistic regression and random forest classifiers trained under three sampling setups: no resampling, random undersampling (RUS), and SMOTE. Models are trained to predict one-year-ahead bankruptcies based on firm-year financial ratios. Out-of-sample evaluation is conducted for the period 2020 to 2024 using F2-score, AUROC, AUPRC, and Brier score. Random forest with RUS achieves the highest predictive performance, identifying 28.4% of actual bankruptcies in the test set. Logistic regression offers interpretable outputs through average marginal effects. Beyond accuracy, the analysis finds that sampling design affects which financial indicators the models learn to prioritise. Liquidity and cash flow dominate under RUS, while growth and valuation features become more prominent with SMOTE. The results show that resampling not only improves detection under class imbalance but also reshapes the financial interpretation of distress risk. Sampling is thus not a neutral step, but a modelling decision with substantive consequences.
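As a rough illustration of two ideas named in the abstract, the sketch below shows random undersampling (RUS) and the F2-score and Brier score in plain Python. It is not the thesis's implementation: the function names (`random_undersample`, `fbeta`, `brier`), the 1:1 class balance after undersampling, and the toy labels are all assumptions made for this example.

```python
import random

def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    Assumes binary labels (1 = bankrupt, 0 = healthy) and a 1:1 target ratio;
    the ratio used in the thesis is not stated in the abstract.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]

def fbeta(y_true, y_pred, beta=2.0):
    """F-beta score; beta=2 weights recall more heavily than precision."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = (1 + beta ** 2) * tp + beta ** 2 * fn + fp
    return (1 + beta ** 2) * tp / denom if denom else 0.0

def brier(y_true, prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, prob)) / len(y_true)

if __name__ == "__main__":
    # Toy firm-year data: 2 bankruptcies among 10 observations (hypothetical).
    X = [[i] for i in range(10)]
    y = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
    X_bal, y_bal = random_undersample(X, y, seed=1)
    print(len(y_bal), sum(y_bal))  # balanced sample size and positive count
    print(fbeta([1, 1, 1, 0, 0, 0, 0, 0], [1, 0, 0, 1, 0, 0, 0, 0]))
    print(brier([1, 0], [0.5, 0.5]))
```

Because F2 counts each missed bankruptcy four times as heavily as a false alarm in the denominator, it suits settings where failing to flag a distressed firm is the costlier error. The Brier score, by contrast, is computed on predicted probabilities rather than on hard class labels.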
author
Lantz, Alexander LU
supervisor
organization
course
NEKN01 20251
year
2025
type
H1 - Master's Degree (One Year)
subject
keywords
Bankruptcy Prediction, Class Imbalance, Random Forest, Logistic Regression, Sampling Strategies
language
English
id
9202952
date added to LUP
2025-09-12 09:59:57
date last changed
2025-09-12 09:59:57
@misc{9202952,
  abstract     = {{This thesis investigates how sampling strategies influence the predictive accuracy and interpretability of bankruptcy models. Using financial data from U.S. publicly listed firms (2009 to 2023), it compares logistic regression and random forest classifiers trained under three sampling setups: no resampling, random undersampling (RUS), and SMOTE. Models are trained to predict one-year-ahead bankruptcies based on firm-year financial ratios. Out-of-sample evaluation is conducted for the period 2020 to 2024 using F2-score, AUROC, AUPRC, and Brier score. Random forest with RUS achieves the highest predictive performance, identifying 28.4% of actual bankruptcies in the test set. Logistic regression offers interpretable outputs through average marginal effects. Beyond accuracy, the analysis finds that sampling design affects which financial indicators the models learn to prioritise. Liquidity and cash flow dominate under RUS, while growth and valuation features become more prominent with SMOTE. The results show that resampling not only improves detection under class imbalance but also reshapes the financial interpretation of distress risk. Sampling is thus not a neutral step, but a modelling decision with substantive consequences.}},
  author       = {{Lantz, Alexander}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{How do sampling strategies influence the accuracy and interpretability of bankruptcy predictions in U.S. firms?}},
  year         = {{2025}},
}