Lund University Publications

LUND UNIVERSITY LIBRARIES

Random forest modelling of high-dimensional mixed-type data for breast cancer classification

Quist, Jelmar ; Taylor, Lawson ; Staaf, Johan and Grigoriadis, Anita (2021) In Cancers 13(5).
Abstract

Advances in high-throughput technologies encourage the generation of large amounts of multiomics data to investigate complex diseases, including breast cancer. Given that the aetiologies of such diseases extend beyond a single biological entity, and that essential biological information can be carried by all data regardless of data type, integrative analyses are needed to identify clinically relevant patterns. To facilitate such analyses, we present a permutation-based framework for random forest methods which simultaneously allows the unbiased integration of mixed-type data and assessment of relative feature importance. Through simulation studies and machine learning datasets, the performance of the approach was evaluated. The results showed minimal multicollinearity and limited overfitting. To further assess the performance, the permutation-based framework was applied to high-dimensional mixed-type data from two independent breast cancer cohorts. Reproducibility and robustness of our approach were demonstrated by the concordance in relative feature importance between the cohorts, along with consistencies in clustering profiles. One of the identified clusters was shown to be prognostic for clinical outcome after standard-of-care adjuvant chemotherapy and outperformed current intrinsic molecular breast cancer classifications.

author
Quist, Jelmar ; Taylor, Lawson ; Staaf, Johan and Grigoriadis, Anita
publishing date
2021
type
Contribution to journal
publication status
published
keywords
Breast cancer, DNA damage repair, Integrative analysis, Machine learning, Random forest
in
Cancers
volume
13
issue
5
article number
991
pages
15 pages
publisher
MDPI AG
external identifiers
  • scopus:85101701894
  • pmid:33673506
ISSN
2072-6694
DOI
10.3390/cancers13050991
language
English
LU publication?
yes
id
b7b3d6cb-c51e-425f-bd3b-c035751b189f
date added to LUP
2021-03-15 13:33:39
date last changed
2024-06-13 08:31:58
@article{b7b3d6cb-c51e-425f-bd3b-c035751b189f,
  abstract     = {{Advances in high-throughput technologies encourage the generation of large amounts of multiomics data to investigate complex diseases, including breast cancer. Given that the aetiologies of such diseases extend beyond a single biological entity, and that essential biological information can be carried by all data regardless of data type, integrative analyses are needed to identify clinically relevant patterns. To facilitate such analyses, we present a permutation-based framework for random forest methods which simultaneously allows the unbiased integration of mixed-type data and assessment of relative feature importance. Through simulation studies and machine learning datasets, the performance of the approach was evaluated. The results showed minimal multicollinearity and limited overfitting. To further assess the performance, the permutation-based framework was applied to high-dimensional mixed-type data from two independent breast cancer cohorts. Reproducibility and robustness of our approach were demonstrated by the concordance in relative feature importance between the cohorts, along with consistencies in clustering profiles. One of the identified clusters was shown to be prognostic for clinical outcome after standard-of-care adjuvant chemotherapy and outperformed current intrinsic molecular breast cancer classifications.}},
  author       = {{Quist, Jelmar and Taylor, Lawson and Staaf, Johan and Grigoriadis, Anita}},
  issn         = {{2072-6694}},
  keywords     = {{Breast cancer; DNA damage repair; Integrative analysis; Machine learning; Random forest}},
  language     = {{eng}},
  number       = {{5}},
  publisher    = {{MDPI AG}},
  journal      = {{Cancers}},
  title        = {{Random forest modelling of high-dimensional mixed-type data for breast cancer classification}},
  url          = {{http://dx.doi.org/10.3390/cancers13050991}},
  doi          = {{10.3390/cancers13050991}},
  volume       = {{13}},
  year         = {{2021}},
}