LUP Student Papers

LUND UNIVERSITY LIBRARIES

Uncertainty Quantification in Deep Learning for Breast Cancer Classification in Point-of-Care Ultrasound Imaging

Wodrich, Marisa LU (2024) In Master’s Theses in Mathematical Sciences FMAM02 20232
Mathematics (Faculty of Engineering)
Abstract
Breast cancer is the most common type of cancer worldwide, with an estimated 2.3 million new cases in 2020, and the number one cause of cancer-related deaths in women. While survival rates are high in many high-income countries, with a five-year relative survival rate of 85% or more, they are poor in many middle- and low-income countries, with rates as low as 12% in Kyadondo, Uganda. This immense difference is largely due to differences in access to diagnostic tools and screening, as well as in the number of diagnostic experts.

One solution to bridge this gap and increase the survival rates in low-income countries could be to use point-of-care ultrasound (POCUS) imaging as a cheap and portable diagnostic tool, combined with a deep learning (DL) based algorithm for image classification. Previous work has shown that this approach is feasible and can produce good results. In a field like medical diagnostics, however, the classifier must also be trustworthy, because wrong predictions can have severe consequences.

This work therefore addresses the question of how to quantify uncertainties in a model's prediction and explores different methods from the fields of uncertainty quantification (UQ) and out-of-distribution (OOD) detection, including Bayesian neural networks, deep ensembles and three different post-hoc methods. The results support the hypothesis that there is a correlation between uncertainty scores and the correctness of a prediction. The correlation was strongest using an average ensemble with entropy-based total uncertainty. The results suggest setting the threshold so that the predictions for the 20% of test data with the highest uncertainties are marked as not trustworthy. This improves the accuracy of the breast cancer classification (benign, malignant, normal) from 68.6% to 77.5%, the binary accuracy (cancerous vs. non-cancerous) from 81.8% to 90.2%, and the AUC from 95.6% to 98.4%.
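
The thresholding idea described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the toy probabilities and the use of Shannon entropy of the member-averaged class probabilities as "entropy-based total uncertainty" are assumptions made for illustration only.

```python
import math

def average_ensemble(member_probs):
    """Average the class probabilities of all ensemble members for one input."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]

def total_uncertainty(probs):
    """Shannon entropy (in nats) of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def flag_untrustworthy(uncertainties, reject_fraction=0.2):
    """Mark the given fraction of samples with the highest uncertainty."""
    n_reject = int(round(reject_fraction * len(uncertainties)))
    ranked = sorted(range(len(uncertainties)),
                    key=lambda i: uncertainties[i], reverse=True)
    rejected = set(ranked[:n_reject])
    return [i in rejected for i in range(len(uncertainties))]

# Hypothetical toy data: 5 inputs, 2 ensemble members, 3 classes
# (benign, malignant, normal). The values are made up.
ensemble_outputs = [
    [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]],
    [[0.40, 0.35, 0.25], [0.30, 0.40, 0.30]],
    [[0.10, 0.85, 0.05], [0.05, 0.90, 0.05]],
    [[0.05, 0.05, 0.90], [0.10, 0.10, 0.80]],
    [[0.70, 0.20, 0.10], [0.60, 0.30, 0.10]],
]

mean_probs = [average_ensemble(members) for members in ensemble_outputs]
scores = [total_uncertainty(p) for p in mean_probs]
flags = flag_untrustworthy(scores, reject_fraction=0.2)
print(flags)
```

In this sketch the threshold is expressed as a fraction of the evaluated data rather than as a fixed entropy value; in a deployed system one would presumably fix the entropy cut-off on held-out data and apply it to each new image.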

Additionally, all methods were tested for the purpose of OOD detection using three different OOD data sets. The best results were achieved with the post-hoc energy score method, which performed well on all three data sets, followed by several types of ensembles.
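
For readers unfamiliar with the energy score, the following Python sketch shows the commonly used definition, E(x) = -T · log Σ_k exp(z_k / T), computed from a classifier's logits z. The abstract does not state the temperature, threshold or implementation used in the thesis, so the default `temperature`, the `threshold` value and the toy logits below are hypothetical.

```python
import math

def energy_score(logits, temperature=1.0):
    """Energy E(x) = -T * log(sum_k exp(z_k / T)), computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * log_sum_exp

def is_ood(logits, threshold, temperature=1.0):
    """Flag an input as OOD when its energy exceeds the threshold
    (higher energy means the input looks less like the training data)."""
    return energy_score(logits, temperature) > threshold

# Hypothetical logits for two inputs and a hypothetical threshold.
confident_logits = [8.0, 0.5, -1.0]   # one class clearly dominates
flat_logits = [0.2, 0.1, 0.0]         # no class stands out
threshold = -4.0

print(is_ood(confident_logits, threshold))
print(is_ood(flat_logits, threshold))
```

Because the score only needs the logits of an already trained classifier, it can be applied post hoc without retraining, which is what makes it a post-hoc method.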

Overall, the results show that there is great potential in the different methods for the purpose of building a safer and more trustworthy classifier that can be applied in a real-world setting. Based on our findings, an average ensemble as the classification method with entropy-based total uncertainty is the most promising choice, followed by the energy score method. Further evaluation with more data and comparison to additional UQ methods is needed to confirm the optimal method.
author: Wodrich, Marisa LU
course: FMAM02 20232
year: 2024
type: H2 - Master's Degree (Two Years)
keywords: Uncertainty quantification, Deep learning, Breast cancer classification, Trustworthy AI, Point-of-care ultrasound
publication/series: Master’s Theses in Mathematical Sciences
report number: LUTFMA-3524-2024
ISSN: 1404-6342
other publication id: 2024:E5
language: English
id: 9149768
date added to LUP: 2024-03-18 13:09:17
date last changed: 2024-03-18 13:09:17
@misc{9149768,
  author       = {{Wodrich, Marisa}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master’s Theses in Mathematical Sciences}},
  title        = {{Uncertainty Quantification in Deep Learning for Breast Cancer Classification in Point-of-Care Ultrasound Imaging}},
  year         = {{2024}},
}