
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Predicting True Sepsis and Culture-positive Sepsis in Intensive Care Unit with Machine Learning Techniques

Wu, Zeyuan LU (2024) In Master’s Theses in Mathematical Sciences MASM02 20231
Mathematical Statistics
Abstract
Sepsis is a serious condition that often requires intensive care, and many researchers have applied mathematical methods to support its diagnosis. This thesis uses logistic regression and XGBoost, a gradient-boosted tree method, to predict true sepsis (as opposed to sepsis mimics) and culture-positive sepsis (among true sepsis cases) in critical care from blood test results, physiological measurements and other patient characteristics.
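For reference, the standard logistic regression model is shown below. The notation is generic and not taken from the thesis: $p_i$ is the probability that patient $i$ has the outcome (true sepsis, or culture-positive sepsis in the second task), $x_{ij}$ are the patient's $q$ covariates, and $e^{\beta_j}$ is the odds ratio for covariate $j$, the quantity a forest plot of a logistic model usually displays.

```latex
\log\frac{p_i}{1 - p_i} = \beta_0 + \sum_{j=1}^{q} \beta_j x_{ij},
\qquad
\mathrm{OR}_j = e^{\beta_j}
```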

The dataset used to build the prediction models contains 2,667 patients and 105 variables, many of which have missing values. These gaps are filled using imputation.
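The abstract does not name the imputation technique, so the following is only a minimal sketch assuming simple per-column median imputation, in plain Python with `None` marking a missing value. The helper names (`missing_fraction`, `fit_column_medians`, `impute_with`) are hypothetical. Fitting the medians and applying them are kept separate so that the medians can be computed from training rows alone.

```python
from statistics import median
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing measurement


def missing_fraction(rows: List[Row]) -> List[float]:
    """Share of rows in which each column is missing."""
    return [sum(r[j] is None for r in rows) / len(rows) for j in range(len(rows[0]))]


def fit_column_medians(rows: List[Row]) -> List[float]:
    """Median of the observed (non-None) values in each column."""
    medians = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        medians.append(median(observed))
    return medians


def impute_with(rows: List[Row], medians: List[float]) -> List[List[float]]:
    """Replace each missing entry with its column's fitted median."""
    return [[m if v is None else v for v, m in zip(r, medians)] for r in rows]
```

For example, with rows `[[1.0, None], [3.0, 4.0], [None, 8.0]]` each column is one-third missing, the fitted medians are `[2.0, 6.0]`, and imputation yields `[[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]`.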

The models are evaluated by the area under the receiver operating characteristic curve (AUC), estimated with cross-validation. Because the dataset contains imputed values, a modified cross-validation scheme is used: imputed values are used only in the training phase, and the models are tested only on the original, unaltered data. Variables are selected and analysed with forest plots for the regression models, and with feature-importance plots and SHAP (SHapley Additive exPlanations) value plots for the XGBoost models.
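The modified cross-validation described above can be sketched as follows. This is a stdlib-only Python sketch under stated assumptions, not the thesis's code: folds are assigned round-robin, the imputer and the model are passed in as hypothetical callables (`fit_imputer`, `fit_model`), and "original, unaltered data" is interpreted as the complete cases of the held-out fold. AUC is computed in its rank form, the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Callable, List, Optional, Sequence

Row = List[Optional[float]]  # None marks a missing value
Scorer = Callable[[Sequence[float]], float]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: P(positive scores higher than negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def modified_cv_auc(
    rows: List[Row],
    labels: List[int],
    k: int,
    fit_imputer: Callable[[List[Row]], Callable[[Row], List[float]]],
    fit_model: Callable[[List[List[float]], List[int]], Scorer],
) -> List[float]:
    """Per-fold AUCs where imputed values are used for training only."""
    fold_aucs = []
    for f in range(k):
        # Assumption: round-robin folds, no shuffling or stratification.
        test_idx = [i for i in range(len(rows)) if i % k == f]
        train_idx = [i for i in range(len(rows)) if i % k != f]
        train_rows = [rows[i] for i in train_idx]
        # The imputer is fitted on, and applied to, the training fold only.
        impute = fit_imputer(train_rows)
        model = fit_model([impute(r) for r in train_rows],
                          [labels[i] for i in train_idx])
        # Assumption: evaluate only held-out rows with no missing values,
        # so no imputed value ever reaches the test phase.
        complete = [i for i in test_idx if None not in rows[i]]
        fold_aucs.append(auc([model(rows[i]) for i in complete],
                             [labels[i] for i in complete]))
    return fold_aucs
```

Any imputer and any scorer, such as a fitted logistic regression or XGBoost model, can be plugged in. Restricting each test fold to complete cases is only one way to keep imputed values out of evaluation, and it assumes every fold retains both positive and negative complete cases.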

The results show that XGBoost performs better than the regression models. In predicting true sepsis, the XGBoost model achieves an AUC of 0.74, compared with 0.72 for the regression model. In predicting culture positivity, the XGBoost model attains an AUC of 0.77, compared with 0.74 for the regression model. Both XGBoost and regression models show predictive value for true sepsis and culture-positive sepsis, and their performance may improve with a larger dataset. These results indicate that mathematical models can serve as valuable aids in supporting clinicians' judgement.
author
Wu, Zeyuan LU
course
MASM02 20231
year
2024
type
H2 - Master's Degree (Two Years)
keywords
Machine Learning, Diagnosis of Sepsis, XGBoost, Logistic Regression
publication/series
Master’s Theses in Mathematical Sciences
report number
LUNFMS-3124-2024
ISSN
1404-6342
other publication id
2024:E11
language
English
id
9149839
date added to LUP
2024-03-14 10:07:59
date last changed
2024-04-23 15:38:44
@misc{9149839,
  abstract     = {{Sepsis, a serious medical condition often leading to patients requiring intensive care, has prompted numerous scientists to employ mathematical techniques to aid in its diagnosis. This thesis uses logistic regression and a machine learning technique, XGBoost, to predict true sepsis (as opposed to sepsis mimics) and culture-positive sepsis (among true sepsis) in critical care using blood test results, physiological measurements and other patient characteristics. 

In this study, the dataset employed for constructing the prediction models comprises the information of 2,667 patients across 105 variables. Notably, a considerable portion of these variables exhibits missing values. To address this issue, imputation techniques are systematically applied to rectify the gaps within the dataset.

The predictive models acquired in this study are evaluated with the area under the receiver operating characteristic curve (AUC) and using cross-validation. To address the imputed missing values within the dataset, a modified cross-validation technique is employed. This methodology ensures that imputed values are exclusively utilized during the training phase, while the testing phase exclusively involves the use of the original, unaltered data. Variable selection and analysis have been conducted employing forest plots for regression, while for XGBoost models, significance is determined through the utilization of importance plots and SHAP value plots.

The result of this study shows that XGBoost performs better than the regression models. In predicting true sepsis, the XGBoost model achieves an AUC of 0.74, while the regression model yields an AUC of 0.72. In predicting culture positivity, the XGBoost model attains an AUC of 0.77, whereas the regression model yields an AUC of 0.74. Both the XGBoost algorithm and regression models demonstrated efficacy in predicting true sepsis and culture-positive sepsis. The performance of these prediction models exhibits potential for enhancement with the utilization of a more extensive dataset. Consequently, mathematical models serve as valuable and effective aids in supporting medical professionals' clinical judgement.}},
  author       = {{Wu, Zeyuan}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master’s Theses in Mathematical Sciences}},
  title        = {{Predicting True Sepsis and Culture-positive Sepsis in Intensive Care Unit with Machine Learning Techniques}},
  year         = {{2024}},
}