LUP Student Papers

LUND UNIVERSITY LIBRARIES

Variable selection for generalized linear mixed model by L1 penalization for predicting clinical parameters of ovarian cancer

Diep, Lan Hoa LU (2021) In Bachelor's Theses in Mathematical Sciences MASK11 20211
Mathematical Statistics
Abstract
The quantities of biomarkers (proteins, in this case) in ovarian cancer (OC) tumor and immune tissue regions of interest (ROIs) were measured with the new Digital Spatial Profiler (DSP) technology. These measurements were used to construct regression models on the biomarkers to predict two clinical parameters: tumor type ("Type 1" vs "Type 2") and immune infiltration type ("Cavities" vs "Dispersed"). The dataset was divided into tumor and immune ROIs, which were analyzed separately. A total of three models were constructed: immune ROI with immune infiltration type, immune ROI with tumor type, and tumor ROI with tumor type. Because there were repeated measurements on the same patient but on different ROIs, a logistic linear mixed model with a random intercept was used to account for the dependency between ROIs and to allow the intercept to vary between patients. Because there were too many biomarkers to regress on, the Lasso was combined with the mixed model (GLMMLasso) for automatic variable selection. The tuning parameter λ in the Lasso was chosen using BIC with some supervision. The model of immune ROI with immune infiltration type included four variables with coefficients that make biological sense, and it fitted both the training and test data well. The model of immune ROI with tumor type had three variables that also make biological sense; it fitted the training data well, but the test data less well. The model of tumor ROI with tumor type had a total of 12 variables, but some of the variable coefficients do not make sense biologically. It could probably be improved by including fewer variables in the model. For any firm conclusion to be made about the predictive ability of the models, a bigger sample would be needed for refitting as well as testing the models.
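For reference, the model class named in the abstract can be written out as below. This is a minimal sketch, assuming a standard random-intercept logistic model with an L1 penalty on the biomarker coefficients. The notation ($y_{ij}$, $\mathbf{x}_{ij}$, $b_i$, $\sigma_b^2$, $p$, $n$, $\mathrm{df}(\lambda)$) is chosen here for illustration and is not taken from the thesis, and the abstract does not state which likelihood approximation or degrees-of-freedom definition was actually used.

```latex
% Notation assumed for illustration (not from the thesis):
%   y_{ij}  binary clinical parameter attached to ROI j of patient i
%   x_{ij}  vector of p biomarker measurements for that ROI
%   b_i     patient-specific random intercept

% Random-intercept logistic model
\[
\mathrm{logit}\,\Pr(y_{ij} = 1 \mid b_i)
  = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_i,
\qquad b_i \sim \mathcal{N}(0, \sigma_b^2)
\]

% L1-penalized (Lasso) estimation for a fixed tuning parameter lambda;
% ell denotes the (approximate) marginal log-likelihood
\[
(\hat{\beta}_0, \hat{\boldsymbol{\beta}}, \hat{\sigma}_b)_{\lambda}
  = \arg\max_{\beta_0,\, \boldsymbol{\beta},\, \sigma_b}
    \Bigl\{ \ell(\beta_0, \boldsymbol{\beta}, \sigma_b)
            - \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert \Bigr\}
\]

% BIC used to compare candidate values of lambda;
% df(lambda) is an assumed count of effective parameters,
% n the number of observations
\[
\mathrm{BIC}(\lambda)
  = -2\,\ell(\hat{\beta}_0, \hat{\boldsymbol{\beta}}, \hat{\sigma}_b)_{\lambda}
    + \mathrm{df}(\lambda)\,\log n
\]
```

In this sketch the intercept and the random-intercept variance are left unpenalized, which is the usual convention. Larger values of λ shrink more biomarker coefficients exactly to zero, so choosing λ by BIC trades goodness of fit against the number of biomarkers retained.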
author
Diep, Lan Hoa LU
supervisor
organization
course
MASK11 20211
year
2021
type
M2 - Bachelor Degree
subject
publication/series
Bachelor's Theses in Mathematical Sciences
report number
LUNFMS-4056-2021
ISSN
1654-6229
other publication id
2021:K27
language
English
id
9058519
date added to LUP
2021-07-05 15:34:49
date last changed
2021-07-09 08:45:48
@misc{9058519,
  abstract     = {{The quantities of biomarkers (proteins, in this case) in ovarian cancer (OC) tumor and immune tissue regions of interest (ROIs) were measured with the new Digital Spatial Profiler (DSP) technology. These measurements were used to construct regression models on the biomarkers to predict two clinical parameters: tumor type ("Type 1" vs "Type 2") and immune infiltration type ("Cavities" vs "Dispersed"). The dataset was divided into tumor and immune ROIs, which were analyzed separately. A total of three models were constructed: immune ROI with immune infiltration type, immune ROI with tumor type, and tumor ROI with tumor type. Because there were repeated measurements on the same patient but on different ROIs, a logistic linear mixed model with a random intercept was used to account for the dependency between ROIs and to allow the intercept to vary between patients. Because there were too many biomarkers to regress on, the Lasso was combined with the mixed model (GLMMLasso) for automatic variable selection. The tuning parameter λ in the Lasso was chosen using BIC with some supervision. The model of immune ROI with immune infiltration type included four variables with coefficients that make biological sense, and it fitted both the training and test data well. The model of immune ROI with tumor type had three variables that also make biological sense; it fitted the training data well, but the test data less well. The model of tumor ROI with tumor type had a total of 12 variables, but some of the variable coefficients do not make sense biologically. It could probably be improved by including fewer variables in the model. For any firm conclusion to be made about the predictive ability of the models, a bigger sample would be needed for refitting as well as testing the models.}},
  author       = {{Diep, Lan Hoa}},
  issn         = {{1654-6229}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Bachelor's Theses in Mathematical Sciences}},
  title        = {{Variable selection for generalized linear mixed model by L1 penalization for predicting clinical parameters of ovarian cancer}},
  year         = {{2021}},
}