LUP Student Papers

LUND UNIVERSITY LIBRARIES

DLBCL Patient stratification using serum protein profiling

Xu, Xiaoyan (2021) BINP52 20201
Degree Projects in Bioinformatics
Abstract
Diffuse large B-cell lymphoma (DLBCL) is a highly heterogeneous malignancy, which makes effective treatment challenging. To better stratify DLBCL patients and optimize treatment protocols, we analyzed the serum proteome using samples from a population-based cohort of DLBCL patients. Our objective was to find associations between serum protein expression and clinical parameters such as overall survival, and to define patient subgroups for better treatment selection. To this end, we applied several machine learning strategies to a protein dataset, starting with pre-processing steps such as normalization. We aimed to eliminate batch effects, identified from protocol-based annotations, using either a dedicated batch-correction method (ComBat) or global normalization methods (NormalyzerDE). The data processing steps included supervised and unsupervised learning methods, covering clustering algorithms, regression strategies and statistical analysis. With these tools we were able to 1) define two subgroups, using K-means as our main clustering method, and 2) create a short panel of 58 proteins that were the best classifiers, identified by combining backward elimination with support vector machines and cross-validation. Comparing our panel with a panel previously proposed by Pauly et al., we found an overlap of 10 serum proteins and showed that differential expression of this panel can be used to define subgroups that mapped to high-risk International Prognostic Index (IPI), elevated lactate dehydrogenase (LDH) level and Ann Arbor stage, but that it was not relevant for survival prediction. We highlight the heterogeneity of DLBCL and conclude that larger datasets, together with a better understanding of the genetic subtypes, are required for better patient classification.
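
As an illustration of the clustering step, the following is a minimal sketch of K-means with two clusters. It is not the thesis code: the data are synthetic, and the function name `kmeans`, the group sizes, the number of proteins and the random seeds are assumptions made for the example.

```python
import random

def kmeans(rows, k=2, max_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric rows."""
    rng = random.Random(seed)
    # Initialise the centroids from k randomly chosen samples.
    centroids = [list(row) for row in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(row, centroids[c])))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Synthetic "serum protein" matrix (hypothetical): 40 samples x 5 proteins,
# with the second half of the samples shifted upwards.
rng = random.Random(42)
n_proteins = 5
group_a = [[rng.gauss(0.0, 0.5) for _ in range(n_proteins)] for _ in range(20)]
group_b = [[rng.gauss(4.0, 0.5) for _ in range(n_proteins)] for _ in range(20)]
data = group_a + group_b

labels, centroids = kmeans(data, k=2)
print(labels)
```

On real data the cluster labels are arbitrary (cluster 0 versus cluster 1 carries no meaning by itself), so the subgroups would still need to be compared against clinical parameters such as IPI or LDH level.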
Popular Abstract
Tumour subtypes often differ in clinical outcome and therapeutic response because of genetic heterogeneity between patients. Among B-cell lymphomas, one such subtype is DLBCL, a highly heterogeneous malignancy in which patients respond variably to treatment. Currently, some studies classify DLBCL into three major subtypes: germinal centre B-cell-like (GCB) DLBCL, activated B-cell-like (ABC) DLBCL, and an unclassifiable type (about 10%). However, the diversity of genetic profiles makes it difficult to subclassify this lymphoma further.

To develop a better classification model based on serum proteins, we explored the serum proteomes of a population-based DLBCL cohort (n=228) from the VIOLA Biobank, Region Skåne. We used an antibody-based microarray with 379 antibody clones targeting 174 unique serum proteins. The dataset was then explored to find correlations between serum protein expression and clinical parameters such as overall survival, and to define patient subgroups for better treatment selection.

We explored this protein dataset starting with pre-processing steps such as normalization, followed by data processing steps such as feature reduction to find a protein signature that could best classify survival-associated subgroups. The methods used at this stage were K-means clustering followed by backward elimination to shorten the protein panel. In the end, we obtained a short panel of 58 proteins. Comparing our panel with the panel proposed in a previous publication from the DLBCL-IMMray study (Pauly et al.), we found an overlap of 10 serum proteins. We were also able to show that differential expression of this panel associates the defined subgroups with high-risk IPI, elevated LDH level and Ann Arbor stage, but that it was not relevant for survival prediction.
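
The panel-shortening idea can be sketched as greedy backward elimination scored by cross-validation. This is an illustration under stated assumptions, not the thesis implementation: a simple nearest-centroid classifier stands in for the support vector machine, the data are synthetic, and all function names, sizes and seeds are hypothetical.

```python
import random

def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Fit one centroid per class on the training fold, score on the test fold."""
    centroids = {}
    for label in set(y_train):
        members = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    hits = 0
    for x, y in zip(X_test, y_test):
        pred = min(centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))
        hits += pred == y
    return hits / len(y_test)

def cv_accuracy(X, y, features, n_folds=5):
    """Mean k-fold accuracy using only the given feature indices."""
    Xs = [[row[j] for j in features] for row in X]
    scores = []
    for fold in range(n_folds):
        # Interleaved folds: every n_folds-th sample goes to the test set.
        test = list(range(fold, len(X), n_folds))
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        scores.append(nearest_centroid_accuracy(
            [Xs[i] for i in train], [y[i] for i in train],
            [Xs[i] for i in test], [y[i] for i in test]))
    return sum(scores) / n_folds

def backward_elimination(X, y, n_keep):
    """Repeatedly drop the feature whose removal gives the best CV accuracy."""
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        candidates = [
            (cv_accuracy(X, y, [f for f in features if f != drop]), drop)
            for drop in features
        ]
        # Ties in accuracy are broken by the larger feature index.
        _, drop = max(candidates)
        features.remove(drop)
    return features

# Synthetic data (hypothetical): 40 samples x 6 "proteins"; only the
# proteins at indices 1 and 4 differ between the two classes.
rng = random.Random(0)
n_samples, n_features, informative = 40, 6, {1, 4}
y = [i % 2 for i in range(n_samples)]
X = [[rng.gauss(3.0 * label if j in informative else 0.0, 1.0)
      for j in range(n_features)] for label in y]

panel = backward_elimination(X, y, n_keep=2)
print("retained feature indices:", panel)
```

Stopping at a fixed `n_keep` is a simplification; in practice the panel size would more likely be chosen where the cross-validated score stops improving.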

For data exploration, several software tools were used: NormalyzerDE (for normalization and differential expression), R (for custom scripts as needed), SPSS (for statistical analysis), Qlucore (for unsupervised learning and clustering) and IMMray-associated Shiny apps developed in conjunction with the experimental platform. These tools were used in combination, often to validate individual steps against each other. The entire process described in this thesis is summarized in a workflow figure.

Master's Degree Project in Bioinformatics 60 credits
Department of Biology, Lund University

Supervisor: Sara Ek, Department of Immunotechnology, Lund University
author
Xu, Xiaoyan
supervisor
organization
course
BINP52 20201
year
type
H2 - Master's Degree (Two Years)
subject
language
English
id
9055431
date added to LUP
2021-06-16 14:26:01
date last changed
2021-06-16 14:26:01
@misc{9055431,
  abstract     = {{Diffuse large B-cell lymphoma (DLBCL) is a highly heterogeneous malignancy which poses a challenge for treatment efficiency. To better stratify DLBCL patients and optimize treatment protocols, we analyzed the serum proteome using samples from a population-based cohort of DLBCL patients. Our objective was to find association between serum protein expression with clinical parameters such as overall survival, as well as define patient subgroups for better treatment selection. For this, we employed various machine learning strategies to explore a protein dataset starting with pre-processing steps such as normalization. We aimed to eliminate batch effects identified by protocol-based annotations by precise methods (Combat) or global normalization methods (Normalyzer DE). The data processing steps included supervised and unsupervised learning methods such as exploring clustering algorithms, regression strategies and statistical analysis. Through various tools and statistical analysis, we were able to 1) define two subgroups, by selecting Kmeans as our chief clustering method and 2) create a short panel of 58 proteins that were the best classifiers, identified by applying a combination of backward elimination with support vector machines and cross validation. By comparing our panel to a previously proposed panel defined by Pauly et al, we found an overlap of 10 serum proteins and showed that differential expression of this panel can be used to defined subgroups which were mapped to high-risk International Prognostic Index (IPI), elevated Lactate dehydrogenase (LDH) level, Ann-Arbor stage but was not relevant in survival predictions. We highlight the heterogeneity of DLBCL and conclude that larger datasets are required with more understanding of the genetic subtypes for better patient classification.}},
  author       = {{Xu, Xiaoyan}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{DLBCL Patient stratification using serum protein profiling}},
  year         = {{2021}},
}