Nonstandard Errors in Bank Default Prediction Using Machine Learning
(2024) NEKH01 20241
Department of Economics
- Abstract
- This thesis analyses the risk of nonstandard errors affecting bank default prediction using machine learning. Nonstandard errors are defined as errors that arise during the Evidence Generating Process (EGP), that is, as a consequence of researchers' decisions rather than of sampling. The aim is to analyse how different choices of pre-processing and data engineering methods create variation in the prediction performance of a classifier, which signals the existence of nonstandard errors. This is done by creating 20 pre-processing and data engineering pipelines, each built from different choices of methods. The variation in performance across the pipelines then gives an estimate of the nonstandard errors. Using recall, precision and ROC-AUC as scoring metrics, this thesis finds that the nonstandard errors are smaller than the standard errors for recall and ROC-AUC, and larger for precision. Overall, the errors were of similar magnitude across the metrics. This thesis concludes that nonstandard errors are likely to affect predictions of bank defaults. The implication is that researchers always need to be aware of nonstandard errors and should compare as many parts of the machine learning pipeline as possible.
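The comparison described in the abstract can be illustrated with a minimal sketch. It assumes, as one plausible reading rather than the thesis's confirmed method, that the nonstandard error is the standard deviation of the mean score across pipelines and that the standard error is the average within-pipeline standard error over resamples. The function names, pipeline labels and recall values below are hypothetical and do not come from the thesis.

```python
from math import sqrt
from statistics import mean, stdev


def nonstandard_error(scores_by_pipeline):
    """Sample standard deviation of the mean score across pipelines.

    Assumption: dispersion across pipelines is used as the estimate
    of the nonstandard error.
    """
    pipeline_means = [mean(scores) for scores in scores_by_pipeline.values()]
    return stdev(pipeline_means)


def mean_standard_error(scores_by_pipeline):
    """Average within-pipeline standard error of the mean score.

    Assumption: each pipeline is scored on several resamples (for
    example cross-validation folds), and the standard error is the
    standard deviation over resamples divided by sqrt(n).
    """
    return mean(
        stdev(scores) / sqrt(len(scores))
        for scores in scores_by_pipeline.values()
    )


# Hypothetical recall scores: three pipelines, three resamples each.
recall = {
    "pipeline_a": [0.70, 0.74, 0.72],
    "pipeline_b": [0.80, 0.78, 0.82],
    "pipeline_c": [0.75, 0.77, 0.73],
}

nse = nonstandard_error(recall)
se = mean_standard_error(recall)
print(f"nonstandard error: {nse:.4f}, standard error: {se:.4f}")
```

The same two functions could be applied separately to precision and ROC-AUC scores to compare the two error types metric by metric.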
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9159134
- author
- Svalfors, Emil LU
- supervisor
- organization
- course
- NEKH01 20241
- year
- 2024
- type
- M2 - Bachelor Degree
- subject
- keywords
- Nonstandard Errors, Machine Learning, Outlier Detection, Resampling, Feature Selection
- language
- English
- id
- 9159134
- date added to LUP
- 2024-06-12 14:11:24
- date last changed
- 2024-06-12 14:11:24
@misc{9159134,
  abstract = {{This thesis analyses the risk of nonstandard errors affecting bank default prediction using machine learning. Nonstandard errors are defined as errors that arise during the Evidence Generating Process (EGP), that is, as a consequence of researchers' decisions rather than of sampling. The aim is to analyse how different choices of pre-processing and data engineering methods create variation in the prediction performance of a classifier, which signals the existence of nonstandard errors. This is done by creating 20 pre-processing and data engineering pipelines, each built from different choices of methods. The variation in performance across the pipelines then gives an estimate of the nonstandard errors. Using recall, precision and ROC-AUC as scoring metrics, this thesis finds that the nonstandard errors are smaller than the standard errors for recall and ROC-AUC, and larger for precision. Overall, the errors were of similar magnitude across the metrics. This thesis concludes that nonstandard errors are likely to affect predictions of bank defaults. The implication is that researchers always need to be aware of nonstandard errors and should compare as many parts of the machine learning pipeline as possible.}},
  author = {{Svalfors, Emil}},
  language = {{eng}},
  note = {{Student Paper}},
  title = {{Nonstandard Errors in Bank Default Prediction Using Machine Learning}},
  year = {{2024}},
}