
Lund University Publications


Predictors of colorectal cancer survival using cox regression and random survival forests models based on gene expression data

Mohammed, Mohanad; Mboya, Innocent B.; Mwambi, Henry; Elbashir, Murtada K. and Omolo, Bernard (2021) In PLoS ONE 16(12 December).
Abstract


Understanding and identifying the markers and clinical information associated with colorectal cancer (CRC) patient survival is needed for early detection and diagnosis. In this work, we aimed to build simple models using Cox proportional hazards (PH) regression and random survival forests (RSF) and to find a robust signature for predicting CRC overall survival. We used stepwise regression to develop a Cox PH model analysing 54 differentially expressed genes common to three mutations. RSF was fitted with 5000 survival trees using the log-rank and log-rank-score splitting rules, and variable importance was then computed to identify the genes most influential for CRC survival. We compared the predictive performance of the Cox PH model and RSF for early CRC detection and diagnosis. The results indicate that the SLC9A8, IER5, ARSJ, ANKRD27, and PIPOX genes were significantly associated with CRC overall survival. Age, sex, and stage were also associated with CRC overall survival. The RSF model using the log-rank splitting rule performed better than the one using log-rank-score, and the log-rank-score model needed more trees to stabilize. Overall, imputing missing values improved predictive performance, and the Cox PH model predicted better than RSF.
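The abstract contrasts two RSF splitting rules, log-rank and log-rank-score. As an illustration only, the following dependency-free Python sketch computes both node-splitting statistics for a candidate split "covariate <= c": the standardized log-rank statistic (observed minus expected deaths in the left daughter over its hypergeometric variance) and the standardized sum of log-rank scores. The scores are computed here as the event indicator minus the Nelson-Aalen cumulative hazard, which matches the rank-based log-rank score formula when there are no tied times. This is not the authors' code: the function names, the toy data, and the exhaustive threshold search in the hypothetical helper `best_split` are assumptions, and RSF software can instead evaluate only randomly chosen cut points.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def logrank_split(times: Sequence[float], events: Sequence[int], left: Sequence[bool]) -> float:
    """Standardized log-rank statistic for splitting one node into left/right daughters."""
    n = len(times)
    num = 0.0  # observed minus expected deaths in the left daughter
    var = 0.0  # hypergeometric variance of that difference
    for tj in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [i for i in range(n) if times[i] >= tj]
        y = len(at_risk)
        y_left = sum(1 for i in at_risk if left[i])
        d = sum(1 for i in at_risk if times[i] == tj and events[i])
        d_left = sum(1 for i in at_risk if times[i] == tj and events[i] and left[i])
        num += d_left - y_left * d / y
        if y > 1:  # a single subject at risk contributes no variance
            var += (y_left / y) * (1 - y_left / y) * ((y - d) / (y - 1)) * d
    return num / math.sqrt(var) if var > 0 else 0.0


def logrank_scores(times: Sequence[float], events: Sequence[int]) -> List[float]:
    """Log-rank scores a_i = delta_i - H(T_i), where H is the Nelson-Aalen cumulative hazard."""
    cumhaz_at = {}
    cumhaz = 0.0
    for t in sorted(set(times)):
        y = sum(1 for s in times if s >= t)
        d = sum(1 for s, e in zip(times, events) if s == t and e)
        cumhaz += d / y
        cumhaz_at[t] = cumhaz
    return [e - cumhaz_at[t] for t, e in zip(times, events)]


def logrank_score_split(times: Sequence[float], events: Sequence[int], left: Sequence[bool]) -> float:
    """Standardized sum of log-rank scores in the left daughter (permutation variance)."""
    a = logrank_scores(times, events)
    n = len(a)
    n1 = sum(1 for flag in left if flag)
    if n < 2 or n1 in (0, n):
        return 0.0
    mean = sum(a) / n
    s2 = sum((x - mean) ** 2 for x in a) / (n - 1)
    if s2 == 0:
        return 0.0
    num = sum(x for x, flag in zip(a, left) if flag) - n1 * mean
    return num / math.sqrt(n1 * (1 - n1 / n) * s2)


SplitStat = Callable[[Sequence[float], Sequence[int], Sequence[bool]], float]


def best_split(x: Sequence[float], times: Sequence[float], events: Sequence[int],
               stat: SplitStat) -> Tuple[Optional[float], float]:
    """Hypothetical helper: try every threshold x <= c on one covariate, keep the largest |stat|."""
    best_c, best_val = None, 0.0
    for c in sorted(set(x))[:-1]:  # the largest value would leave the right daughter empty
        val = abs(stat(times, events, [xi <= c for xi in x]))
        if val > best_val:
            best_c, best_val = c, val
    return best_c, best_val


# Made-up toy data: survival times, event indicators (1 = death), one expression value per patient.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
expr = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
print(best_split(expr, times, events, logrank_split))
print(best_split(expr, times, events, logrank_score_split))
```

Under these assumptions, growing one tree means repeating `best_split` at each node over a random subset of candidate covariates and keeping the covariate and cut point with the largest statistic; the bootstrap sampling and the 5000-tree ensemble described in the abstract are not shown. On the toy data the two rules need not choose the same cut point, because they standardize the same observed-minus-expected quantity with different variances.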

author
Mohammed, Mohanad; Mboya, Innocent B.; Mwambi, Henry; Elbashir, Murtada K. and Omolo, Bernard
publishing date
2021
type
Contribution to journal
publication status
published
in
PLoS ONE
volume
16
issue
12 December
article number
e0261625
publisher
Public Library of Science (PLoS)
external identifiers
  • pmid:34965262
  • scopus:85122002339
ISSN
1932-6203
DOI
10.1371/journal.pone.0261625
language
English
LU publication?
no
additional info
Publisher Copyright: © 2021 Public Library of Science. All rights reserved.
id
3cbc64fd-b804-4e8c-bc35-4bf6612a1cdb
date added to LUP
2022-09-29 10:02:48
date last changed
2024-07-11 22:12:07
@article{3cbc64fd-b804-4e8c-bc35-4bf6612a1cdb,
  abstract     = {{Understanding and identifying the markers and clinical information associated with colorectal cancer (CRC) patient survival is needed for early detection and diagnosis. In this work, we aimed to build simple models using Cox proportional hazards (PH) regression and random survival forests (RSF) and to find a robust signature for predicting CRC overall survival. We used stepwise regression to develop a Cox PH model analysing 54 differentially expressed genes common to three mutations. RSF was fitted with 5000 survival trees using the log-rank and log-rank-score splitting rules, and variable importance was then computed to identify the genes most influential for CRC survival. We compared the predictive performance of the Cox PH model and RSF for early CRC detection and diagnosis. The results indicate that the SLC9A8, IER5, ARSJ, ANKRD27, and PIPOX genes were significantly associated with CRC overall survival. Age, sex, and stage were also associated with CRC overall survival. The RSF model using the log-rank splitting rule performed better than the one using log-rank-score, and the log-rank-score model needed more trees to stabilize. Overall, imputing missing values improved predictive performance, and the Cox PH model predicted better than RSF.}},
  author       = {{Mohammed, Mohanad and Mboya, Innocent B. and Mwambi, Henry and Elbashir, Murtada K. and Omolo, Bernard}},
  issn         = {{1932-6203}},
  language     = {{eng}},
  number       = {{12 December}},
  publisher    = {{Public Library of Science (PLoS)}},
  journal      = {{PLoS ONE}},
  title        = {{Predictors of colorectal cancer survival using cox regression and random survival forests models based on gene expression data}},
  url          = {{http://dx.doi.org/10.1371/journal.pone.0261625}},
  doi          = {{10.1371/journal.pone.0261625}},
  volume       = {{16}},
  year         = {{2021}},
}