Random forest modelling of high-dimensional mixed-type data for breast cancer classification

Quist, Jelmar; Taylor, Lawson; Staaf, Johan; Grigoriadis, Anita

Quist, Jelmar; Taylor, Lawson; Staaf, Johan (LU) and Grigoriadis, Anita (2021). In Cancers 13(5).

Abstract: Advances in high-throughput technologies encourage the generation of large amounts of multiomics data to investigate complex diseases, including breast cancer. Given that the aetiologies of such diseases extend beyond a single biological entity, and that essential biological information can be carried by all data regardless of data type, integrative analyses are needed to identify clinically relevant patterns. To facilitate such analyses, we present a permutation-based framework for random forest methods which simultaneously allows the unbiased integration of mixed-type data and assessment of relative feature importance. Through simulation studies and machine learning datasets, the performance of the approach was evaluated. The results showed minimal multicollinearity and limited overfitting. To further assess the performance, the permutation-based framework was applied to high-dimensional mixed-type data from two independent breast cancer cohorts. Reproducibility and robustness of our approach were demonstrated by the concordance in relative feature importance between the cohorts, along with consistencies in clustering profiles. One of the identified clusters was shown to be prognostic for clinical outcome after standard-of-care adjuvant chemotherapy and outperformed current intrinsic molecular breast cancer classifications.
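The abstract does not spell out the framework itself, so the following is only a minimal sketch of the general idea behind permutation-based feature importance for a forest trained on mixed-type data. It is not the authors' implementation. The toy forest, the synthetic data, and every name in it (`expr`, `mutated`, `noise`, `build_tree`, `permutation_importance`) are assumptions made for illustration. The categorical feature is binary and coded 0/1 so that the same threshold split can serve both data types.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, rng, depth=0, max_depth=3, mtry=2):
    """Grow a small tree; each split considers `mtry` randomly chosen features."""
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)
    best = None
    for f in rng.sample(range(len(rows[0])), mtry):
        for t in set(r[f] for r in rows):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no usable split among the sampled features
        return majority(labels)
    _, f, t = best
    children = []
    for keep in (lambda r: r[f] <= t, lambda r: r[f] > t):
        part = [(r, y) for r, y in zip(rows, labels) if keep(r)]
        children.append(build_tree([r for r, _ in part], [y for _, y in part],
                                   rng, depth + 1, max_depth, mtry))
    return (f, t, children[0], children[1])


def predict_tree(node, row):
    """Follow splits until a leaf (a plain label) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, labels, n_trees, rng):
    """Fit each tree on a bootstrap sample of the training rows."""
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def accuracy(forest, rows, labels):
    """Fraction of rows whose majority vote matches the true label."""
    hits = sum(majority([predict_tree(t, r) for t in forest]) == y
               for r, y in zip(rows, labels))
    return hits / len(labels)


def permutation_importance(forest, rows, labels, rng, n_repeats=10):
    """Mean drop in accuracy after shuffling one feature column at a time."""
    base = accuracy(forest, rows, labels)
    result = []
    for f in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            col = [r[f] for r in rows]
            rng.shuffle(col)
            shuffled = [r[:f] + (v,) + r[f + 1:] for r, v in zip(rows, col)]
            drops.append(base - accuracy(forest, shuffled, labels))
        result.append(sum(drops) / n_repeats)
    return result


# Synthetic mixed-type data (hypothetical): a continuous and a binary
# categorical feature drive the label; a third continuous feature is noise.
rng = random.Random(0)
rows, labels = [], []
for _ in range(150):
    expr, mutated, noise = rng.gauss(0, 1), rng.randint(0, 1), rng.gauss(0, 1)
    rows.append((expr, mutated, noise))
    labels.append(int(expr + 2 * mutated > 1))

forest = fit_forest(rows[:100], labels[:100], n_trees=15, rng=rng)
importances = permutation_importance(forest, rows[100:], labels[100:], rng)
for name, value in zip(("expr", "mutated", "noise"), importances):
    print(f"{name}: {value:.3f}")
```

Shuffling a column breaks its link to the outcome while preserving its marginal distribution, so the resulting drop in held-out accuracy can be compared across continuous and categorical features on the same scale. In this sketch the two informative features should score clearly above the noise feature.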

Please use this URL to cite or link to this publication: https://lup.lub.lu.se/record/b7b3d6cb-c51e-425f-bd3b-c035751b189f

author: Quist, Jelmar; Taylor, Lawson; Staaf, Johan (LU) and Grigoriadis, Anita
organization:
publishing date: 2021
type: Contribution to journal
publication status: published
subject:
keywords: Breast cancer, DNA damage repair, Integrative analysis, Machine learning, Random forest
in: Cancers
volume: 13
issue: 5
article number: 991
pages: 15 pages
publisher: MDPI AG
external identifiers: scopus:85101701894; pmid:33673506
ISSN: 2072-6694
DOI: 10.3390/cancers13050991
language: English
LU publication?: yes
id: b7b3d6cb-c51e-425f-bd3b-c035751b189f
date added to LUP: 2021-03-15 13:33:39
date last changed: 2026-02-21 18:08:43

@article{b7b3d6cb-c51e-425f-bd3b-c035751b189f,
  abstract     = {{Advances in high-throughput technologies encourage the generation of large amounts of multiomics data to investigate complex diseases, including breast cancer. Given that the aetiologies of such diseases extend beyond a single biological entity, and that essential biological information can be carried by all data regardless of data type, integrative analyses are needed to identify clinically relevant patterns. To facilitate such analyses, we present a permutation-based framework for random forest methods which simultaneously allows the unbiased integration of mixed-type data and assessment of relative feature importance. Through simulation studies and machine learning datasets, the performance of the approach was evaluated. The results showed minimal multicollinearity and limited overfitting. To further assess the performance, the permutation-based framework was applied to high-dimensional mixed-type data from two independent breast cancer cohorts. Reproducibility and robustness of our approach were demonstrated by the concordance in relative feature importance between the cohorts, along with consistencies in clustering profiles. One of the identified clusters was shown to be prognostic for clinical outcome after standard-of-care adjuvant chemotherapy and outperformed current intrinsic molecular breast cancer classifications.}},
  author       = {{Quist, Jelmar and Taylor, Lawson and Staaf, Johan and Grigoriadis, Anita}},
  issn         = {{2072-6694}},
  keywords     = {{Breast cancer; DNA damage repair; Integrative analysis; Machine learning; Random forest}},
  language     = {{eng}},
  number       = {{5}},
  publisher    = {{MDPI AG}},
  journal      = {{Cancers}},
  title        = {{Random forest modelling of high-dimensional mixed-type data for breast cancer classification}},
  url          = {{http://dx.doi.org/10.3390/cancers13050991}},
  doi          = {{10.3390/cancers13050991}},
  volume       = {{13}},
  year         = {{2021}},
}
