Random forest modelling of high-dimensional mixed-type data for breast cancer classification
(2021) In Cancers 13(5).
- Abstract
Advances in high-throughput technologies encourage the generation of large amounts of multiomics data to investigate complex diseases, including breast cancer. Given that the aetiologies of such diseases extend beyond a single biological entity, and that essential biological information can be carried by all data regardless of data type, integrative analyses are needed to identify clinically relevant patterns. To facilitate such analyses, we present a permutation-based framework for random forest methods which simultaneously allows the unbiased integration of mixed-type data and assessment of relative feature importance. Through simulation studies and machine learning datasets, the performance of the approach was evaluated. The results showed minimal multicollinearity and limited overfitting. To further assess the performance, the permutation-based framework was applied to high-dimensional mixed-type data from two independent breast cancer cohorts. Reproducibility and robustness of our approach was demonstrated by the concordance in relative feature importance between the cohorts, along with consistencies in clustering profiles. One of the identified clusters was shown to be prognostic for clinical outcome after standard-of-care adjuvant chemotherapy and outperformed current intrinsic molecular breast cancer classifications.
- author
- Quist, Jelmar ; Taylor, Lawson ; Staaf, Johan (LU) and Grigoriadis, Anita
- organization
- publishing date
- 2021
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- Breast cancer, DNA damage repair, Integrative analysis, Machine learning, Random forest
- in
- Cancers
- volume
- 13
- issue
- 5
- article number
- 991
- pages
- 15 pages
- publisher
- MDPI AG
- external identifiers
- pmid:33673506
- scopus:85101701894
- ISSN
- 2072-6694
- DOI
- 10.3390/cancers13050991
- language
- English
- LU publication?
- yes
- id
- b7b3d6cb-c51e-425f-bd3b-c035751b189f
- date added to LUP
- 2021-03-15 13:33:39
- date last changed
- 2024-06-27 10:01:01
@article{b7b3d6cb-c51e-425f-bd3b-c035751b189f,
  abstract  = {{Advances in high-throughput technologies encourage the generation of large amounts of multiomics data to investigate complex diseases, including breast cancer. Given that the aetiologies of such diseases extend beyond a single biological entity, and that essential biological information can be carried by all data regardless of data type, integrative analyses are needed to identify clinically relevant patterns. To facilitate such analyses, we present a permutation-based framework for random forest methods which simultaneously allows the unbiased integration of mixed-type data and assessment of relative feature importance. Through simulation studies and machine learning datasets, the performance of the approach was evaluated. The results showed minimal multicollinearity and limited overfitting. To further assess the performance, the permutation-based framework was applied to high-dimensional mixed-type data from two independent breast cancer cohorts. Reproducibility and robustness of our approach was demonstrated by the concordance in relative feature importance between the cohorts, along with consistencies in clustering profiles. One of the identified clusters was shown to be prognostic for clinical outcome after standard-of-care adjuvant chemotherapy and outperformed current intrinsic molecular breast cancer classifications.}},
  author    = {{Quist, Jelmar and Taylor, Lawson and Staaf, Johan and Grigoriadis, Anita}},
  issn      = {{2072-6694}},
  keywords  = {{Breast cancer; DNA damage repair; Integrative analysis; Machine learning; Random forest}},
  language  = {{eng}},
  number    = {{5}},
  publisher = {{MDPI AG}},
  series    = {{Cancers}},
  title     = {{Random forest modelling of high-dimensional mixed-type data for breast cancer classification}},
  url       = {{http://dx.doi.org/10.3390/cancers13050991}},
  doi       = {{10.3390/cancers13050991}},
  volume    = {{13}},
  year      = {{2021}},
}