How do sampling strategies influence the accuracy and interpretability of bankruptcy predictions in U.S. firms?
(2025) NEKN01 20251
Department of Economics
- Abstract
- This thesis investigates how sampling strategies influence the predictive accuracy and interpretability of bankruptcy models. Using financial data from U.S. publicly listed firms (2009 to 2023), it compares logistic regression and random forest classifiers trained under three sampling setups: no resampling, random undersampling (RUS), and SMOTE. Models are trained to predict one-year-ahead bankruptcies based on firm-year financial ratios. Out-of-sample evaluation is conducted for the period 2020 to 2024 using F2-score, AUROC, AUPRC, and Brier score. Random forest with RUS achieves the highest predictive performance, identifying 28.4% of actual bankruptcies in the test set. Logistic regression offers interpretable outputs through average marginal effects. Beyond accuracy, the analysis finds that sampling design affects which financial indicators the models learn to prioritise. Liquidity and cash flow dominate under RUS, while growth and valuation features become more prominent with SMOTE. The results show that resampling not only improves detection under class imbalance but also reshapes the financial interpretation of distress risk. Sampling is thus not a neutral step, but a modelling decision with substantive consequences.
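For readers unfamiliar with two of the terms in the abstract, the following is a minimal sketch of random undersampling (RUS) and the F2-score, using only the Python standard library. It is not the thesis's implementation: the function names `random_undersample` and `fbeta`, the toy data, and the 1:1 target class ratio are all assumptions made for illustration, and SMOTE is omitted.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes are the same size.

    Assumes binary labels (1 = bankrupt, 0 = healthy) and a 1:1 target ratio;
    both are illustrative choices, not taken from the thesis.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row plus an equally sized random majority subset.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def fbeta(y_true, y_pred, beta=2.0):
    """F-beta score; beta=2 weights recall more heavily than precision."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    return (1 + beta**2) * tp / denom if denom else 0.0


# Hypothetical toy data: 2 bankrupt firm-years out of 10.
X_toy = [[float(i)] for i in range(10)]
y_toy = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

X_bal, y_bal = random_undersample(X_toy, y_toy, seed=42)
class_counts = Counter(y_bal)  # two rows of each class remain

# One true positive, one false positive, one false negative.
f2_example = fbeta([1, 1, 0, 0], [1, 0, 1, 0], beta=2.0)  # 0.5
```

Because the F2-score weights recall above precision, a missed bankruptcy costs more than a false alarm, which suits a setting where failures are rare.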
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9202952
- author
- Lantz, Alexander LU
- supervisor
- organization
- course
- NEKN01 20251
- year
- 2025
- type
- H1 - Master's Degree (One Year)
- subject
- keywords
- Bankruptcy Prediction, Class Imbalance, Random Forest, Logistic Regression, Sampling Strategies
- language
- English
- id
- 9202952
- date added to LUP
- 2025-09-12 09:59:57
- date last changed
- 2025-09-12 09:59:57
@misc{9202952,
abstract = {{This thesis investigates how sampling strategies influence the predictive accuracy and interpretability of bankruptcy models. Using financial data from U.S. publicly listed firms (2009 to 2023), it compares logistic regression and random forest classifiers trained under three sampling setups: no resampling, random undersampling (RUS), and SMOTE. Models are trained to predict one-year-ahead bankruptcies based on firm-year financial ratios. Out-of-sample evaluation is conducted for the period 2020 to 2024 using F2-score, AUROC, AUPRC, and Brier score. Random forest with RUS achieves the highest predictive performance, identifying 28.4% of actual bankruptcies in the test set. Logistic regression offers interpretable outputs through average marginal effects. Beyond accuracy, the analysis finds that sampling design affects which financial indicators the models learn to prioritise. Liquidity and cash flow dominate under RUS, while growth and valuation features become more prominent with SMOTE. The results show that resampling not only improves detection under class imbalance but also reshapes the financial interpretation of distress risk. Sampling is thus not a neutral step, but a modelling decision with substantive consequences.}},
author = {{Lantz, Alexander}},
language = {{eng}},
note = {{Student Paper}},
title = {{How do sampling strategies influence the accuracy and interpretability of bankruptcy predictions in U.S. firms?}},
year = {{2025}},
}