
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Diagnosis of Bloodstream Infections Using Machine Learning

Jakobsson, Hector and Rydengård, Erik (2024) In Master's Theses in Mathematical Sciences FMAM05 20241
Mathematics (Faculty of Engineering)
Abstract
Bloodstream infections (BSIs) are among the leading causes of death in Europe and, as such, a serious health concern. Blood cultures are the most common method for diagnosing this condition, but they have certain disadvantages. Mainly, they are time-consuming, as it can take several days to get the results back. Furthermore, blood cultures carry a high risk of contamination. For these reasons, a successful deployment of machine learning in the field could reduce unnecessary antibiotic use and expedite correct treatment. This was the motivation for our thesis, in which XGBoost, TabNet, and a multilayer perceptron are used to predict blood culture outcomes.

The main question is: which model performs best on the provided dataset? This dataset contains vital measurements and laboratory results in tabular format. Furthermore, since the raw data must be preprocessed and their missing values handled, a second question is which imputation method is most suitable. To answer these questions, we conduct a study in which the models and several imputation methods are evaluated and compared. We find that XGBoost is the superior model, and that imputing with median values while including missing indicators gives the best results. This combination achieved an area under the receiver operating characteristic curve (AUROC) of 0.763 and an area under the precision-recall curve (AUPRC) of 0.361. With this model and a threshold of 5%, the number of blood cultures could be reduced by 29%, at the cost of missing 1% of the true positives.
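The best-performing preprocessing, median imputation combined with missing indicators, can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the thesis implementation: the function names `fit_medians`, `transform` and `is_missing` are hypothetical, missing values are assumed to be encoded as `None` or NaN, and one indicator column is appended for every feature. Medians are fitted on training data only and reused for new data, which avoids leaking information from the test set.

```python
import math
from statistics import median


def is_missing(value):
    """Treat None and float NaN as missing (an assumed encoding)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fit_medians(rows):
    """Compute the median of the observed values in each column of the training rows."""
    medians = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if not is_missing(row[j])]
        # Hypothetical fallback: use 0.0 if a column has no observed values at all.
        medians.append(median(observed) if observed else 0.0)
    return medians


def transform(rows, medians):
    """Replace missing values with the fitted medians and append one 0/1
    missing indicator per column, so the model can still see which
    values were originally absent."""
    out = []
    for row in rows:
        filled = [medians[j] if is_missing(v) else v for j, v in enumerate(row)]
        indicators = [1.0 if is_missing(v) else 0.0 for v in row]
        out.append(filled + indicators)
    return out


# Toy training data with two hypothetical features per patient.
train = [[38.5, None], [37.0, 12.0], [None, 8.0]]
medians = fit_medians(train)
print(medians)  # [37.75, 10.0]
print(transform(train, medians))
```

In scikit-learn, `SimpleImputer(strategy="median", add_indicator=True)` performs a similar transformation, although it only adds indicator columns for features that had missing values during fitting.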
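The two reported metrics can be computed directly from predicted probabilities. The plain-Python sketch below uses hypothetical function names: AUROC is computed as the probability that a randomly chosen positive culture is scored above a randomly chosen negative one (ties count as one half), and AUPRC is estimated by average precision, the step-wise summary used by scikit-learn's `average_precision_score`. The thesis may have used a different AUPRC estimator, such as trapezoidal interpolation.

```python
def auroc(y_true, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Runs in O(n_pos * n_neg) time: fine for illustration, too slow for large data."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """AUPRC estimated as the sum over thresholds of (R_k - R_{k-1}) * P_k.

    Samples with equal scores are treated as a single threshold."""
    pairs = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample that shares the current score.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auroc(y, p))  # 0.75
print(average_precision(y, p))
```

Because a random classifier's average precision is close to the fraction of positive samples, the reported AUPRC of 0.361 is best interpreted relative to the share of positive blood cultures in the dataset, which the abstract does not state.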
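The final trade-off (29% fewer blood cultures and 1% of positives missed at a 5% threshold) can be read as a simple rule-out policy. The sketch below assumes, without confirmation from the abstract, that a culture is skipped whenever the predicted probability of a positive result is strictly below the threshold, and that "missed" is measured as a fraction of all positive cultures. The function name `rule_out_tradeoff` and the toy data are hypothetical.

```python
def rule_out_tradeoff(y_true, probs, threshold=0.05):
    """Return (fraction of cultures avoided, fraction of positives missed)
    when cultures are skipped for patients whose predicted probability is
    below `threshold` (an assumed rule-out policy)."""
    skipped = [p < threshold for p in probs]
    n_pos = sum(y_true)
    avoided = sum(skipped) / len(probs)
    missed = sum(1 for s, y in zip(skipped, y_true) if s and y == 1) / n_pos
    return avoided, missed


# Toy example: 10 hypothetical patients, 3 with a positive blood culture.
y = [0, 0, 0, 1, 0, 1, 0, 0, 0, 1]
p = [0.01, 0.02, 0.20, 0.03, 0.04, 0.50, 0.60, 0.01, 0.30, 0.90]
avoided, missed = rule_out_tradeoff(y, p)
print(f"cultures avoided: {avoided:.0%}, positives missed: {missed:.0%}")
# cultures avoided: 50%, positives missed: 33%
```

Raising the threshold skips more cultures but misses more positives; sweeping `threshold` over a grid on validation data traces out this trade-off.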
author
Jakobsson, Hector and Rydengård, Erik
supervisor
organization
course
FMAM05 20241
year
2024
type
H2 - Master's Degree (Two Years)
subject
publication/series
Master's Theses in Mathematical Sciences
report number
LUTFMA-3545-2024
ISSN
1404-6342
other publication id
2024:E45
language
English
id
9163317
date added to LUP
2024-06-14 10:38:28
date last changed
2024-06-14 10:38:28
@misc{9163317,
  abstract     = {{Bloodstream infections (BSIs) are among the leading causes of death in Europe and, as such, a serious health concern. Blood cultures are the most common method for diagnosing this condition, but they have certain disadvantages. Mainly, they are time-consuming, as it can take several days to get the results back. Furthermore, blood cultures carry a high risk of contamination. For these reasons, a successful deployment of machine learning in the field could reduce unnecessary antibiotic use and expedite correct treatment. This was the motivation for our thesis, in which XGBoost, TabNet, and a multilayer perceptron are used to predict blood culture outcomes.

The main question is: which model performs best on the provided dataset? This dataset contains vital measurements and laboratory results in tabular format. Furthermore, since the raw data must be preprocessed and their missing values handled, a second question is which imputation method is most suitable. To answer these questions, we conduct a study in which the models and several imputation methods are evaluated and compared. We find that XGBoost is the superior model, and that imputing with median values while including missing indicators gives the best results. This combination achieved an area under the receiver operating characteristic curve (AUROC) of 0.763 and an area under the precision-recall curve (AUPRC) of 0.361. With this model and a threshold of 5%, the number of blood cultures could be reduced by 29%, at the cost of missing 1% of the true positives.}},
  author       = {{Jakobsson, Hector and Rydengård, Erik}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{Diagnosis of Bloodstream Infections Using Machine Learning}},
  year         = {{2024}},
}