Comparing Machine Learning Models for Predicting Lung Cancer Mortality Rates
(2024) STAH11 20222
Department of Statistics
- Abstract
- Every year, approximately 2.2 million people are diagnosed with lung cancer worldwide, and 1.7 million die as a result of the disease. This thesis employs machine learning to predict mean per-capita lung cancer mortality rates in US counties, using a dataset consisting of a wide array of demographic, socio-economic, educational, and healthcare-related variables. The primary objective is to compare the predictive performance of several machine learning methods: ordinary least squares, ridge regression, the lasso, and neural networks with hyperparameters optimized through random search and 5-fold cross-validation.
A neural network using mean squared error (MSE) as loss function achieved the lowest mean absolute error (MAE) and root mean squared error (RMSE) on the test set. However, the overall differences between models were small. Regularization through ridge regression and the lasso did not improve predictive performance compared to ordinary least squares. Furthermore, a comparison with previous research revealed substantial differences in model performance, with past studies reporting better predictive results. Consequently, several avenues were suggested as potential paths for future research endeavours.
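The abstract names MAE and RMSE as the evaluation metrics and 5-fold cross-validation as the tuning scheme. As a rough illustration only, the sketch below shows how these could be computed in plain Python. The function names (`mae`, `rmse`, `k_fold_indices`) and the toy numbers are assumptions made for this example; they are not taken from the thesis, and the contiguous, unshuffled fold split is a simplification rather than the author's procedure.

```python
import math
from typing import List, Sequence, Tuple


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: the average of |y - y_hat|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: the square root of the mean of (y - y_hat)^2."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds.

    Returns a list of (train_indices, validation_indices) pairs.
    Hypothetical helper: no shuffling is done here.
    """
    # Spread the remainder over the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        stop = start + size
        val = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        folds.append((train, val))
        start = stop
    return folds


# Toy values invented for illustration (not results from the thesis).
y_true = [50.0, 60.0, 70.0, 80.0]
y_pred = [52.0, 58.0, 73.0, 77.0]
print(mae(y_true, y_pred))   # 2.5
print(rmse(y_true, y_pred))  # square root of 6.5

# Each of the 5 folds serves once as the validation set.
for train_idx, val_idx in k_fold_indices(12, k=5):
    print(len(train_idx), len(val_idx))
```

Because RMSE squares the residuals before averaging, it weights large errors more heavily than MAE does, so the two metrics can rank models differently.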
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9159269
- author
- Welander, Mikael LU
- supervisor
- organization
- course
- STAH11 20222
- year
- 2024
- type
- M2 - Bachelor Degree
- subject
- keywords
- Machine learning, neural networks, ridge regression, lasso, ordinary least squares, cross-validation, gradient descent.
- language
- English
- id
- 9159269
- date added to LUP
- 2024-06-11 11:17:00
- date last changed
- 2024-06-11 11:17:00
@misc{9159269,
  abstract     = {{Every year, approximately 2.2 million people are diagnosed with lung cancer worldwide, and 1.7 million die as a result of the disease. This thesis employs machine learning to predict mean per-capita lung cancer mortality rates in US counties, using a dataset consisting of a wide array of demographic, socio-economic, educational, and healthcare-related variables. The primary objective is to compare the predictive performance of several machine learning methods: ordinary least squares, ridge regression, the lasso, and neural networks with hyperparameters optimized through random search and 5-fold cross-validation. A neural network using mean squared error (MSE) as loss function achieved the lowest mean absolute error (MAE) and root mean squared error (RMSE) on the test set. However, the overall differences between models were small. Regularization through ridge regression and the lasso did not improve predictive performance compared to ordinary least squares. Furthermore, a comparison with previous research revealed substantial differences in model performance, with past studies reporting better predictive results. Consequently, several avenues were suggested as potential paths for future research endeavours.}},
  author       = {{Welander, Mikael}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Comparing Machine Learning Models for Predicting Lung Cancer Mortality Rates}},
  year         = {{2024}},
}