Predicting Corporate Credit Ratings with Machine Learning: A Sector-Specific Evaluation Across U.S. Industries
(2025) DABN01 20251
Department of Economics
Department of Statistics
- Abstract
- Credit ratings play a crucial role in financial risk assessment, influencing borrowing costs, investment decisions, and overall market stability. As corporate financial data becomes more intricate, machine learning (ML) emerges as a promising tool to enhance traditional credit evaluation methods. This thesis investigates the predictive performance of six supervised ML algorithms—Support Vector Machine (SVM), Decision Tree, Random Forest, XGBoost, k-Nearest Neighbors (KNN), and Multinomial Logistic Regression (MLR)—for forecasting corporate credit ratings.
The study draws on financial data from 532 publicly listed U.S. companies across five sectors: Industry, Technology, Finance, Real Estate, and Healthcare. To ensure model robustness and sample balance, S&P’s original 17 credit levels were consolidated into five broader categories. Models were trained on sector-specific feature sets and evaluated using accuracy, weighted precision, recall, and F1-score.
Results reveal that tree-based ensemble methods, particularly Random Forest and XGBoost, achieved the strongest performance in sectors with non-linear and heterogeneous financial patterns. MLR performed best in the Finance sector, reflecting more linear structures. However, all models struggled in the Real Estate sector, likely due to class imbalance and limited sample size. These findings highlight the value of sector-aware modeling and suggest that ML has practical use in credit rating applications by offering scalable, data-driven alternatives to traditional approaches.
- Popular Abstract
- Credit ratings help investors understand how risky it is to lend money to companies and play a key role in financial decision-making. Traditionally, these ratings are assigned by analysts using financial reports and expert judgment. In this study, we explore whether machine learning, a technology that can learn patterns from data, can help predict credit ratings more efficiently and accurately. We analyzed financial data from over 500 U.S. companies in five different sectors: Industry, Technology, Finance, Real Estate, and Healthcare. Using six different machine learning models, we tested how well they could forecast a company’s credit rating. Our results showed that certain models, like Random Forest and XGBoost, worked particularly well in sectors with more complex financial patterns. In contrast, a simpler model performed best in the Finance sector. However, predictions were less accurate in the Real Estate sector, likely because there was less data available. Overall, our research suggests that machine learning can be a valuable tool for supporting credit rating analysis, especially when tailored to specific industries.
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9194172
- author
- Liu, Yuxi LU and Liao, Nai-Xuan LU
- supervisor
- organization
- course
- DABN01 20251
- year
- 2025
- type
- H1 - Master's Degree (One Year)
- subject
- keywords
- Credit Rating, Machine Learning, Classification Models, Financial Analytics, Random Forest, Support Vector Machine, XGBoost, KNN, Decision Tree, Multinomial Logistic Regression, Feature Importance, Rating Prediction
- language
- English
- id
- 9194172
- date added to LUP
- 2025-09-12 09:04:39
- date last changed
- 2025-09-12 09:04:39
@misc{9194172,
abstract = {{Credit ratings play a crucial role in financial risk assessment, influencing borrowing costs, investment decisions, and overall market stability. As corporate financial data becomes more intricate, machine learning (ML) emerges as a promising tool to enhance traditional credit evaluation methods. This thesis investigates the predictive performance of six supervised ML algorithms—Support Vector Machine (SVM), Decision Tree, Random Forest, XGBoost, k-Nearest Neighbors (KNN), and Multinomial Logistic Regression (MLR)—for forecasting corporate credit ratings.
The study draws on financial data from 532 publicly listed U.S. companies across five sectors: Industry, Technology, Finance, Real Estate, and Healthcare. To ensure model robustness and sample balance, S\&P’s original 17 credit levels were consolidated into five broader categories. Models were trained on sector-specific feature sets and evaluated using accuracy, weighted precision, recall, and F1-score.
Results reveal that tree-based ensemble methods, particularly Random Forest and XGBoost, achieved the strongest performance in sectors with non-linear and heterogeneous financial patterns. MLR performed best in the Finance sector, reflecting more linear structures. However, all models struggled in the Real Estate sector, likely due to class imbalance and limited sample size. These findings highlight the value of sector-aware modeling and suggest that ML has practical use in credit rating applications by offering scalable, data-driven alternatives to traditional approaches.}},
author = {{Liu, Yuxi and Liao, Nai-Xuan}},
language = {{eng}},
note = {{Student Paper}},
title = {{Predicting Corporate Credit Ratings with Machine Learning: A Sector-Specific Evaluation Across U.S. Industries}},
year = {{2025}},
}