Diagnosis of Bloodstream Infections Using Machine Learning
(2024) In Master's Theses in Mathematical Sciences. FMAM05 20241, Mathematics (Faculty of Engineering)
- Abstract
- Bloodstream infections (BSIs) are among the top causes of death in Europe and, as such, a serious health concern. Blood cultures are the most common method for diagnosing this condition, but this method has certain disadvantages. The main one is that it is time-consuming, as it can take several days to get the test results back. Furthermore, blood cultures carry a high risk of contamination. A successful deployment of machine learning in this field could therefore reduce arbitrary antibiotic usage and expedite correct treatment. This was the motivation for our thesis, in which XGBoost, TabNet and a multilayer perceptron are used to predict blood culture outcomes.
The main question is which model performs best on the provided dataset, which contains vital measurements and laboratory results in tabular format. Furthermore, since the raw data require preprocessing, including the handling of missing values, a second question is which imputation method is most suitable. To answer these questions, we conduct a study in which the models and several imputation methods are evaluated and compared. We find that XGBoost is the superior model and that imputing with median values while including missing indicators gives the best results. This combination achieved an area under the receiver operating characteristic curve (AUROC) of 0.763 and an area under the precision-recall curve (AUPRC) of 0.361. With this model and a threshold of 5%, the number of blood cultures could be reduced by 29%, at the cost of missing 1% of true positives.
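The abstract names two concrete steps: median imputation with missing indicators, and a 5% probability threshold below which a blood culture would be skipped. The following is a minimal sketch of both in plain Python, together with AUROC computed as a rank statistic. It is an illustration under stated assumptions, not the thesis code: the function names (`fit_medians`, `impute_with_indicators`, `auroc`, `threshold_rule`) are hypothetical, the medians are assumed to be fitted on the training split only, and "true positives missed" is interpreted here as the share of culture-positive patients whose predicted probability falls below the threshold. scikit-learn's `SimpleImputer(strategy="median", add_indicator=True)` offers a library route for the imputation step, but this record does not say which library the authors used.

```python
from statistics import median
from typing import List, Optional, Sequence, Tuple

# A row of raw features; None marks a missing measurement.
Row = Sequence[Optional[float]]


def fit_medians(train_rows: Sequence[Row]) -> List[float]:
    """Per-column median of the observed (non-missing) training values."""
    medians: List[float] = []
    for j in range(len(train_rows[0])):
        observed = [r[j] for r in train_rows if r[j] is not None]
        # Assumption: a column with no observed values falls back to 0.0.
        medians.append(float(median(observed)) if observed else 0.0)
    return medians


def impute_with_indicators(rows: Sequence[Row], medians: Sequence[float]) -> List[List[float]]:
    """Fill gaps with the training medians and append one 0/1 missing flag per column."""
    out: List[List[float]] = []
    for r in rows:
        values = [medians[j] if v is None else float(v) for j, v in enumerate(r)]
        flags = [1.0 if v is None else 0.0 for v in r]
        out.append(values + flags)
    return out


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def threshold_rule(
    scores: Sequence[float], labels: Sequence[int], threshold: float = 0.05
) -> Tuple[float, float]:
    """Hypothetical rule: skip the blood culture when the predicted probability < threshold.

    Returns (share of cultures avoided, share of positive cultures missed).
    """
    skipped = [y for s, y in zip(scores, labels) if s < threshold]
    n_pos = sum(labels)
    reduction = len(skipped) / len(labels)
    missed = sum(skipped) / n_pos if n_pos else 0.0
    return reduction, missed


if __name__ == "__main__":
    # Toy numbers for illustration only; these are not data from the thesis.
    train = [[38.5, None], [37.0, 12.0], [None, 8.0], [39.1, 20.0]]
    medians = fit_medians(train)
    print(impute_with_indicators(train, medians))

    scores = [0.02, 0.04, 0.30, 0.60, 0.03, 0.10]  # predicted P(positive culture)
    labels = [0, 0, 1, 1, 1, 0]                    # 1 = culture-positive
    reduction, missed = threshold_rule(scores, labels, threshold=0.05)
    print(f"AUROC={auroc(scores, labels):.3f}, "
          f"cultures avoided={reduction:.0%}, positives missed={missed:.0%}")
```

Fitting the medians on the training split and reusing them for validation and test data avoids leaking test-set statistics into preprocessing, while the indicator columns let a tree model such as XGBoost distinguish an imputed value from a genuinely measured one.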
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9163317
- author: Jakobsson, Hector and Rydengård, Erik
- supervisor: Ida Arvidsson, Johanna Engman, Gustav Torisson, Oskar Ljungquist
- course: FMAM05 20241
- year: 2024
- type: H2 - Master's Degree (Two Years)
- publication/series: Master's Theses in Mathematical Sciences
- report number: LUTFMA-3545-2024
- ISSN: 1404-6342
- other publication id: 2024:E45
- language: English
- id: 9163317
- date added to LUP: 2024-06-14 10:38:28
- date last changed: 2024-06-14 10:38:28
@misc{9163317,
  abstract = {{Bloodstream infections (BSIs) are among the top causes of death in Europe and, as such, a serious health concern. Blood cultures are the most common method for diagnosing this condition, but this method has certain disadvantages. The main one is that it is time-consuming, as it can take several days to get the test results back. Furthermore, blood cultures carry a high risk of contamination. A successful deployment of machine learning in this field could therefore reduce arbitrary antibiotic usage and expedite correct treatment. This was the motivation for our thesis, in which XGBoost, TabNet and a multilayer perceptron are used to predict blood culture outcomes. The main question is which model performs best on the provided dataset, which contains vital measurements and laboratory results in tabular format. Furthermore, since the raw data require preprocessing, including the handling of missing values, a second question is which imputation method is most suitable. To answer these questions, we conduct a study in which the models and several imputation methods are evaluated and compared. We find that XGBoost is the superior model and that imputing with median values while including missing indicators gives the best results. This combination achieved an area under the receiver operating characteristic curve (AUROC) of 0.763 and an area under the precision-recall curve (AUPRC) of 0.361. With this model and a threshold of 5\%, the number of blood cultures could be reduced by 29\%, at the cost of missing 1\% of true positives.}},
  author = {Jakobsson, Hector and Rydengård, Erik},
  issn = {{1404-6342}},
  language = {{eng}},
  note = {{Student Paper}},
  series = {{Master's Theses in Mathematical Sciences}},
  title = {{Diagnosis of Bloodstream Infections Using Machine Learning}},
  year = {{2024}},
}