Uncertainty Quantification in Deep Learning for Breast Cancer Classification in Point-of-Care Ultrasound Imaging

Wodrich, Marisa

Wodrich, Marisa ^LU (2024) In Master’s Theses in Mathematical Sciences FMAM02 20232
Mathematics (Faculty of Engineering)

Abstract: Breast cancer is the most common type of cancer worldwide, with an estimated 2.3 million new cases in 2020, and the leading cause of cancer-related deaths in women. While survival rates are high in many high-income countries, with a five-year relative survival rate of 85% or more, they are poor in many middle- and low-income countries, with rates as low as 12% in Kyadondo, Uganda. This immense difference is largely due to differences in access to diagnostic tools and screenings, as well as in the number of diagnostic experts.

One way to bridge this gap and increase survival rates in low-income countries could be to use point-of-care ultrasound (POCUS) imaging as a cheap and portable diagnostic tool, combined with a deep learning (DL) based algorithm for image classification. While this has previously been shown to be possible and to produce good results, a classifier in a field like medical diagnostics must also be trustworthy, as wrong predictions can have severe consequences.

This work therefore addresses the question of how to quantify uncertainties in a model's prediction and explores different methods from the fields of uncertainty quantification (UQ) and out-of-distribution (OOD) detection, including Bayesian neural networks, deep ensembles and three different post-hoc methods. The results support the hypothesis that there is a correlation between uncertainty scores and the correctness of a prediction. The correlation was strongest using an average ensemble with entropy-based total uncertainty. The results suggest that a suitable threshold should be set so that the predictions for the 20% of test data with the highest uncertainties are marked as not trustworthy. This improves the accuracy of the breast cancer classification (benign, malignant, normal) from 68.6% to 77.5%, the binary accuracy (cancerous vs. non-cancerous) from 81.8% to 90.2%, and the AUC from 95.6% to 98.4%.
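
The idea of an average ensemble with entropy-based total uncertainty and a 20% rejection rule can be illustrated with a minimal Python sketch. It is not taken from the thesis: it assumes that "total uncertainty" means the Shannon entropy of the averaged class probabilities (a common definition), and the function names, the toy probabilities, and the `reject_fraction` parameter are hypothetical.

```python
import math

def average_ensemble(member_probs):
    """Average the per-class probabilities of all ensemble members for one sample."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]

def entropy(probs):
    """Shannon entropy in nats; terms with zero probability contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def flag_untrustworthy(uncertainties, reject_fraction=0.2):
    """Mark the reject_fraction of samples with the highest uncertainty."""
    n_reject = int(round(reject_fraction * len(uncertainties)))
    ranked = sorted(range(len(uncertainties)),
                    key=lambda i: uncertainties[i], reverse=True)
    rejected = set(ranked[:n_reject])
    return [i in rejected for i in range(len(uncertainties))]

# Toy data (assumed): 5 samples, 2 ensemble members, 3 classes
# (benign, malignant, normal). Each inner list is one member's softmax output.
ensemble_outputs = [
    [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]],
    [[0.40, 0.30, 0.30], [0.30, 0.40, 0.30]],  # members disagree, flat average
    [[0.05, 0.90, 0.05], [0.10, 0.85, 0.05]],
    [[0.10, 0.10, 0.80], [0.05, 0.05, 0.90]],
    [[0.70, 0.20, 0.10], [0.90, 0.05, 0.05]],
]

mean_probs = [average_ensemble(members) for members in ensemble_outputs]
uncertainties = [entropy(p) for p in mean_probs]
flags = flag_untrustworthy(uncertainties, reject_fraction=0.2)

for probs, u, flagged in zip(mean_probs, uncertainties, flags):
    label = "not trustworthy" if flagged else "trustworthy"
    print([round(p, 3) for p in probs], round(u, 3), label)
```

In this toy example the second sample, whose averaged prediction is nearly uniform, is the one marked as not trustworthy. In practice the threshold would be chosen on held-out data rather than by ranking the test set itself.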

Additionally, all methods were tested for the purpose of OOD detection using three different OOD data sets. The best results were achieved using the post-hoc OOD detection method energy score, performing well on all three data sets, followed by several types of ensembles.
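
As a rough illustration of the energy score, the following Python sketch computes the standard energy of a logit vector, E(x) = -T * log(sum_k exp(f_k(x) / T)), where higher (less negative) energy suggests an OOD input. The temperature, the threshold value, and the example logits are hypothetical and are not taken from the thesis.

```python
import math

def energy_score(logits, temperature=1.0):
    """Energy E(x) = -T * log(sum_k exp(logit_k / T)), computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the maximum before exponentiating to avoid overflow
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * log_sum_exp

def is_ood(logits, threshold, temperature=1.0):
    """Flag an input as OOD when its energy exceeds the (assumed) threshold."""
    return energy_score(logits, temperature) > threshold

# Assumed example logits for the three classes (benign, malignant, normal).
confident_logits = [10.0, 0.0, 0.0]  # one dominant class, low energy
flat_logits = [0.1, 0.0, -0.1]       # no dominant class, high energy
threshold = -5.0                     # hypothetical; would be tuned on validation data

print(energy_score(confident_logits), is_ood(confident_logits, threshold))
print(energy_score(flat_logits), is_ood(flat_logits, threshold))
```

Because the score only needs the logits of an already trained classifier, it can be applied post hoc without retraining, which is what makes it a post-hoc method.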

Overall, the results show that there is great potential in the different methods for the purpose of building a safer and more trustworthy classifier that can be applied in a real-world setting. Based on our findings, an average ensemble as the classification method with entropy-based total uncertainty is the most promising choice, followed by the energy score method. Further evaluation with more data and comparison to additional UQ methods is needed to confirm the optimal method.

Open Access

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9149768

author

Wodrich, Marisa ^LU

supervisor

Ida Arvidsson ^LU
Jennie Karlsson ^LU

organization

Mathematics (Faculty of Engineering)

course

FMAM02 20232

year

2024

type

H2 - Master's Degree (Two Years)

subject

Mathematics and Statistics

keywords

Uncertainty quantification, Deep learning, Breast cancer classification, Trustworthy AI, Point-of-care ultrasound

publication/series

Master’s Theses in Mathematical Sciences

report number

LUTFMA-3524-2024

ISSN

1404-6342

other publication id

2024:E5

language

English

id

9149768

date added to LUP

2024-03-18 13:09:17

date last changed

2024-03-18 13:09:17

@misc{9149768,
  author       = {{Wodrich, Marisa}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master’s Theses in Mathematical Sciences}},
  title        = {{Uncertainty Quantification in Deep Learning for Breast Cancer Classification in Point-of-Care Ultrasound Imaging}},
  year         = {{2024}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
