Machine Learning Prediction of Cardiovascular Complications

Rahimi, Sima (2018) BINP51 20181
Degree Projects in Bioinformatics
Popular Abstract
Can Machine Learning Provide more Insights in Cardiovascular Disease?

Cardiovascular (CV) disease is the leading cause of morbidity and mortality worldwide, and the number of people suffering from CV complications is growing as the population ages. Although several risk factors associated with CV complications are known, such as age, hypertension, body mass index (BMI) and genetics, it remains challenging to identify patients at high risk of developing CV events such as heart attack and stroke.
Today, thanks to technical advances, the volume of data collected from various sources is growing rapidly. This calls for alternative approaches that can efficiently reveal predictive insights into complex polygenic diseases such as CV disease, so that patients at high risk, who might survive by benefiting from preventive treatment, can be identified.

Machine Learning for Disease Prediction
While traditional statistical approaches struggle to analyse big datasets, machine learning (ML) models are able to learn from available data, exploit the complex relationships between attributes and predict the corresponding outcome for new individuals.

In this study, we were given a dataset of ~1000 descriptive variables, including routine clinical data, for 1018 patients. A binary supervised classifier, XGBoost (a greedy and powerful learning model), was employed to classify patients with pre-baseline symptoms, predict the occurrence of future events and identify important conventional risk factors. In addition, we evaluated how prediction performance was affected by incorporating a set of genetic variants, in the form of single nucleotide polymorphisms (SNPs), that were available for a subset of patients.
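To give a feel for what "greedy" means here, the following is a minimal from-scratch sketch of second-order gradient boosting with one-split trees (stumps) on a logistic loss. It is not the XGBoost library and not the thesis code: the function names, the toy data (a hypothetical age column and a noise column), the hyperparameter values, and the choice of total split gain as the importance measure are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def best_stump(X, grad, hess, lam):
    """Greedy search: score every feature/threshold pair and keep the highest gain."""
    G, H = sum(grad), sum(hess)
    best = None  # (gain, feature, threshold, left_weight, right_weight)
    for j in range(len(X[0])):
        # Every distinct value except the largest is a candidate threshold.
        for t in sorted(set(row[j] for row in X))[:-1]:
            GL = sum(g for row, g in zip(X, grad) if row[j] <= t)
            HL = sum(h for row, h in zip(X, hess) if row[j] <= t)
            GR, HR = G - GL, H - HL
            # Second-order gain with L2 regularisation lam on the leaf weights.
            gain = 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam) - G**2 / (H + lam))
            if best is None or gain > best[0]:
                best = (gain, j, t, -GL / (HL + lam), -GR / (HR + lam))
    return best  # None if no feature varies

def fit_boosted_stumps(X, y, rounds=20, eta=0.3, lam=1.0):
    """Add one stump per round, each fitted to the current gradients."""
    scores = [0.0] * len(X)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(rounds):
        p = [sigmoid(s) for s in scores]
        grad = [pi - yi for pi, yi in zip(p, y)]   # first derivative of log loss
        hess = [pi * (1.0 - pi) for pi in p]       # second derivative of log loss
        gain, j, t, wl, wr = best_stump(X, grad, hess, lam)
        stumps.append((j, t, eta * wl, eta * wr))
        importance[j] += gain                      # total gain per feature
        scores = [s + (eta * wl if row[j] <= t else eta * wr)
                  for s, row in zip(scores, X)]
    return stumps, importance

def predict_proba(stumps, row):
    return sigmoid(sum(wl if row[j] <= t else wr for j, t, wl, wr in stumps))

# Hypothetical toy data: column 0 is age, column 1 is an uninformative value.
X = [[45, 3], [50, 1], [55, 4], [58, 2], [62, 3], [66, 1], [70, 4], [75, 2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

stumps, importance = fit_boosted_stumps(X, y)
print("importance per feature:", importance)
print("P(event | age 68):", predict_proba(stumps, [68, 2]))
```

On this toy data every round splits on the age column, so the noise column ends up with zero importance. A real analysis would use a tested library implementation with deeper trees and held-out data for evaluation.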

In big clinical datasets, redundancy, irrelevant information, missing values and a very large number of features are common and can produce misleading results when the learning classifier classifies individuals. To transform the raw data into a format suitable for machine learning, a preprocessing procedure comprising imputation, rescaling, resampling and feature subset selection was applied. The XGBoost learning parameters were then tuned, and the performance of the final model was evaluated.
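The four preprocessing steps named above can be sketched with the Python standard library alone. The abstract does not say which variant of each step was used, so the choices below (median imputation, min-max rescaling, random oversampling of the minority class, and dropping constant columns as a very simple feature filter) are assumptions, as are the function names and the toy values.

```python
import random
from statistics import median

def impute_median(rows):
    """Replace None in each column with the median of that column's observed values."""
    n_cols = len(rows[0])
    medians = [median(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    return [[medians[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def rescale_min_max(rows):
    """Map each column linearly onto [0, 1]; constant columns become 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [[(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(r, lows, spans)]
            for r in rows]

def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes have equal size.

    Assumes binary labels 0/1 and at least one row in each class.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for r, y in zip(rows, labels):
        by_class[y].append(r)
    minority = min(by_class, key=lambda k: len(by_class[k]))
    deficit = len(by_class[1 - minority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(deficit)]
    return rows + extra, labels + [minority] * len(extra)

def drop_constant_features(rows):
    """Keep only columns whose values vary; return the rows and the kept indices."""
    cols = list(zip(*rows))
    keep = [j for j, c in enumerate(cols) if len(set(c)) > 1]
    return [[r[j] for j in keep] for r in rows], keep

# Hypothetical toy data: age, BMI (one value missing), and a constant flag.
raw = [
    [63, 27.5, 1],
    [48, None, 1],
    [71, 31.0, 1],
    [55, 24.0, 1],
]
labels = [1, 0, 0, 0]

filled = impute_median(raw)
scaled = rescale_min_max(filled)
balanced_rows, balanced_labels = oversample_minority(scaled, labels)
selected, kept = drop_constant_features(balanced_rows)
print(kept, balanced_labels)
```

In practice, resampling is usually applied to the training portion only, so that duplicated rows do not leak into the data used for evaluation.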

Our findings show that the XGBoost model could reliably classify new individuals as symptomatic or asymptomatic based on baseline characteristics when all 1018 samples were included in training. However, the predictive power of the trained classifiers dropped with sample size: performance was lower for the prediction of future events, where a smaller training set was available. According to our results, including genetic variants could improve prediction accuracy. Furthermore, the ML model identified age, BMI, blood LDL (low-density lipoprotein), blood cholesterol, waist and blood platelets as the more important and relevant predictors of the outcomes in question, which is consistent with other clinical and diagnostic studies. Consequently, the other top-ranked features and genetic variants are candidates for closer investigation, which could give greater insight into unknown risk factors and genetic effects implicated in CV complications and future events.

Master’s Degree Project in Bioinformatics 45 credits 2018
Department of Clinical Sciences, Lund University

Advisor: Petr Volkov, PhD
Department of Clinical Sciences, Lund University
author: Rahimi, Sima
supervisor:
organization:
course: BINP51 20181
year:
type: H2 - Master's Degree (Two Years)
subject:
language: English
id: 8963876
date added to LUP: 2018-12-05 14:42:43
date last changed: 2018-12-05 14:42:43
@misc{8963876,
  author       = {Rahimi, Sima},
  language     = {eng},
  note         = {Student Paper},
  title        = {Machine Learning Prediction of Cardiovascular Complications},
  year         = {2018},
}