LUP Student Papers

LUND UNIVERSITY LIBRARIES

A mixed clinicopathological and molecular proxy of homologous recombination deficiency in triple negative breast cancer

Walford, Elise (2021) BINP50 20201
Degree Projects in Bioinformatics
Abstract
Clinical models are increasingly employed in medical science as either diagnostic or prognostic aids. Machine-learning methods are able to draw links in large datasets that can be used to predict patient risk and allow more informed decisions regarding treatment and medication intervention. An advanced clinical predictor, HRDetect, can determine loss of homologous recombination-based repair pathways in patients with triple negative breast cancer, other breast cancers, and other cancers with high accuracy through the use of mutational signatures determined through whole genome sequencing. These patients respond well to treatment with targeted therapies, and the predictor is able to identify a far larger number of patients than would be identified using current clinical methods. The aim of this thesis was to predict the results of patients previously classified by HRDetect using only more clinically available data. The predictor was developed in SCAN-B data of triple negative breast cancer patients. The process utilised multiple imputation to handle missing values, modelling of continuous variables using restricted cubic splines, and sparse principal component analysis for dimensionality reduction. The model was internally validated using bootstrapping, adjusted to improve calibration and applied to an external breast cancer dataset for external validation. Interpretation of results was made difficult by large differences between the development and validation datasets, and the final model showed seemingly good discrimination but poor calibration. Further investigation of clinical relevance may be required.
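As an illustration of the restricted cubic spline step named in the abstract, the following is a minimal sketch of the truncated-power basis commonly used for such splines. It is not the thesis code: the function name `rcs_basis`, the knot positions and the example value are hypothetical, and the usual rescaling of the nonlinear terms is omitted for brevity.

```python
def rcs_basis(x, knots):
    """Return [x, nonlinear terms...] for a restricted cubic spline.

    With k knots the basis has k - 1 columns: one linear term and
    k - 2 nonlinear terms that are constrained to be linear in the tails.
    """
    k = len(knots)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least three knots")
    t = sorted(knots)
    t_last, t_prev = t[-1], t[-2]
    span = t_last - t_prev

    def pos_cube(u):
        # Truncated cubic: u^3 when u is positive, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    terms = [float(x)]
    for j in range(k - 2):
        tj = t[j]
        terms.append(
            pos_cube(x - tj)
            - pos_cube(x - t_prev) * (t_last - tj) / span
            + pos_cube(x - t_last) * (t_prev - tj) / span
        )
    return terms


# Hypothetical knots for a continuous variable such as age in years.
knots = [35, 50, 65, 80]
basis = rcs_basis(70, knots)  # one linear column plus two nonlinear columns
```

Each continuous variable modelled this way costs k - 1 coefficients instead of one, which is the kind of model complexity that has to be budgeted against sample size.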
Popular Abstract
Predicting the predictor?

An advanced predictive model (HRDetect) can use mutation patterns in DNA to accurately identify patients with triple negative breast cancer (TNBC) who would benefit from treatment with specific targeted therapies. How well can this predictive output be guessed using only clinically measurable variables?


Advancements in machine learning have become increasingly applicable to day-to-day life. Computer-made models are able to make judgements on par with or better than humans. These models can also be used for prognosis and diagnosis in the clinic, where they can inform decisions about which medical interventions to use, or predict disease progression and survival.

TNBC is a subtype of breast cancer that has a relatively poor prognosis. However, some TNBC tumours have lost their homologous recombination (HR)-based repair systems and, as a result, respond well to treatment with PARP inhibitors.

Logistic regression is an older, though often used, technique for clinical models which have only two outcomes (in this case, a high or a low chance of loss of HR repair, as determined by HRDetect). A predictive model was developed using this technique in a comprehensive dataset of TNBC patients in Skåne that had previously been assessed by HRDetect.
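To make the idea of a two-outcome model concrete, here is a minimal sketch of logistic regression fitted by gradient descent. It is an illustration only, not the thesis model: the data are synthetic, the two predictors are hypothetical, and all function names are assumptions introduced for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=1.0, epochs=500):
    """Fit an intercept and weights by batch gradient descent on the log-loss."""
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return b, w


def predict_proba(b, w, x):
    """Predicted probability of the positive class for one observation."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Synthetic data (an assumption): only the first predictor drives the outcome.
rng = random.Random(0)
X, y = [], []
for _ in range(300):
    x = [rng.gauss(0, 1), rng.gauss(0, 1)]
    X.append(x)
    y.append(1 if rng.random() < sigmoid(-0.5 + 2.0 * x[0]) else 0)

b, w = fit_logistic(X, y)
risk_high = predict_proba(b, w, [2.0, 0.0])
risk_low = predict_proba(b, w, [-2.0, 0.0])
```

The fitted model returns a probability rather than a hard label, so any "high" versus "low" call depends on a threshold chosen afterwards.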

Sample size drove many modelling decisions, with the aim of “spending” model complexity on maximising the inclusion of explanatory variables without overfitting the model to its development data. The result was a trade-off between interpretability and dimensionality reduction. The final model employed several different methods and techniques, including restricted cubic splines, multiple imputation, sparse principal component analysis and bootstrap resampling.

The model was validated both internally and externally. Internal validation showed good discrimination, but poor calibration. The final model was adjusted to correct for this. External validation posed its own challenges, as missing data needed to be accounted for, and it involved applying the model to previously unseen breast cancer subtypes. The model showed a generally good ability to discriminate the outcomes of the previous predictor, but poor calibration – possibly due to differences in the data used for development and for validation.
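The difference between discrimination and calibration can be seen in a small worked sketch. The outcomes and predicted probabilities below are made up for illustration, the function names are assumptions, and the log-odds shift is only one simple form of adjustment, not necessarily the one used in the thesis.

```python
import math


def c_statistic(y_true, y_prob):
    """Chance that a random positive case is ranked above a random negative one.

    Ties count as one half. This equals the area under the ROC curve.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    score = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                score += 1.0
            elif pp == pn:
                score += 0.5
    return score / (len(pos) * len(neg))


def calibration_in_the_large(y_true, y_prob):
    """Mean predicted risk minus observed event rate (0 means agreement on average)."""
    return sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)


def shift_logit(p, delta):
    """Move a probability by delta on the log-odds scale (a monotone change)."""
    logit = math.log(p / (1.0 - p)) + delta
    return 1.0 / (1.0 + math.exp(-logit))


# Made-up example: the ranking is perfect, but every predicted risk is too high.
y = [0, 0, 0, 1, 1]
p = [0.6, 0.7, 0.75, 0.8, 0.9]

auc = c_statistic(y, p)               # discrimination
gap = calibration_in_the_large(y, p)  # average over-prediction

# Shifting the log-odds changes the calibration gap but not the ranking.
p_shifted = [shift_logit(pi, -2.0) for pi in p]
auc_shifted = c_statistic(y, p_shifted)
gap_shifted = calibration_in_the_large(y, p_shifted)
```

Because the c-statistic depends only on the ordering of predictions, a model can rank patients well while still reporting risks that are systematically too high or too low.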


The results partially confirmed known relationships between clinical variables and HR deficiency. However, model predictions were dominated by a single variable which is not routinely measured in the clinic. Although some discrimination was displayed when this variable and the outcome were removed, further analysis is required to determine if this is clinically relevant, and how well the model could predict in the absence of this variable.



Bioinformatics 30 credits 2021
Department of Biology, Lund University

Advisor: Johan Staaf / Nikos Tsardakas Renhuldt
Advisors Unit: Medicon Village / Department of Biology, Lund University
author: Walford, Elise
course: BINP50 20201
year: 2021
type: H2 - Master's Degree (Two Years)
language: English
id: 9041735
date added to LUP: 2021-03-11 16:22:56
date last changed: 2021-03-11 16:22:56
@misc{9041735,
  abstract     = {{Clinical models are increasingly employed in medical science as either diagnostic or prognostic aids. Machine-learning methods are able to draw links in large datasets that can be used to predict patient risk and allow more informed decisions regarding treatment and medication intervention. An advanced clinical predictor, HRDetect, can determine loss of homologous recombination-based repair pathways in patients with triple negative breast cancer, other breast cancers, and other cancers with high accuracy through the use of mutational signatures determined through whole genome sequencing. These patients respond well to treatment with targeted therapies, and the predictor is able to identify a far larger number of patients than would be identified using current clinical methods. The aim of this thesis was to predict the results of patients previously classified by HRDetect using only more clinically available data. The predictor was developed in SCAN-B data of triple negative breast cancer patients. The process utilised multiple imputation to handle missing values, modelling of continuous variables using restricted cubic splines, and sparse principal component analysis for dimensionality reduction. The model was internally validated using bootstrapping, adjusted to improve calibration and applied to an external breast cancer dataset for external validation. Interpretation of results was made difficult by large differences between the development and validation datasets, and the final model showed seemingly good discrimination but poor calibration. Further investigation of clinical relevance may be required.}},
  author       = {{Walford, Elise}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{A mixed clinicopathological and molecular proxy of homologous recombination deficiency in triple negative breast cancer}},
  year         = {{2021}},
}