Gene Expression Guided Distance Metric Learning for Breast Cancer Whole Slide Image Analysis
(2022) In Master’s Theses in Mathematical Sciences. FMAM05 20221, Mathematics (Faculty of Engineering)
- Abstract
- Female breast cancer is a complex and heterogeneous disease and accounts for the largest share of cancer deaths among women worldwide. Stratifying breast cancer patients into treatment groups is challenging, and in recent years analysis of the genes active in the tumour has been used to guide the choice of cancer therapy. However, gene expression analysis is expensive and unavailable for most breast cancer patients, which calls for a more cost-effective and reproducible alternative.
In this thesis, a gene-expression-guided embedding extractor network is trained to map whole slide images of female breast cancer tumours into embeddings in a metric space where relative distances should resemble the distances between the corresponding gene expression profiles. The embedding extractor is the convolutional neural network ResNet-50. Four distance metrics were studied: the L1 distance $d_{L1}$, the cosine distance $d_{CL}$, the L2 distance $d_{L2}$ and an average L1 distance $d_{MAD}$. Each whole slide image was divided into smaller tiles, and the model's performance was examined with the distance computed from either one tile or multiple tiles per slide. The best-performing metric was $d_{MAD}$ with the multi-tile calculation. On the test data, the final model achieved a Pearson correlation coefficient of $\rho = 0.631$ between predicted and ground-truth distances. The statistical significance of this correlation was evaluated with a Mantel test, giving a p-value $< 10^{-15}$.
The thesis suggests that an image-based approach could serve as an alternative to gene expression profiling, pending further research and evaluation.
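The four distance metrics and the Mantel-test evaluation named in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the thesis's implementation: the exact form of $d_{MAD}$ (taken here as the L1 distance averaged over embedding dimensions), the multi-tile rule (taken here as the mean distance over all tile pairs of two slides), the L2 distance used as a stand-in for the gene expression ground truth, and the helper names `multi_tile_distance`, `distance_matrix` and `mantel` are all hypothetical. The Mantel statistic is the Pearson correlation between the upper triangles of the two distance matrices, and its p-value is estimated by permuting the rows and columns of one matrix together.

```python
import numpy as np

# --- Distance metrics between two embedding vectors (1-D arrays) ---

def d_l1(a, b):
    """L1 (Manhattan) distance."""
    return float(np.sum(np.abs(a - b)))


def d_l2(a, b):
    """L2 (Euclidean) distance."""
    return float(np.linalg.norm(a - b))


def d_cos(a, b):
    """Cosine distance: 1 minus the cosine similarity (d_CL in the abstract)."""
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def d_mad(a, b):
    """ASSUMED form of d_MAD: L1 distance divided by the embedding dimension."""
    return float(np.mean(np.abs(a - b)))


def multi_tile_distance(tiles_a, tiles_b, metric=d_mad):
    """HYPOTHETICAL multi-tile rule: mean distance over all pairs of tiles.

    tiles_a, tiles_b: arrays of shape (n_tiles, embedding_dim).
    """
    return float(np.mean([metric(x, y) for x in tiles_a for y in tiles_b]))


def distance_matrix(slides, metric=d_mad):
    """Symmetric slide-by-slide distance matrix with a zero diagonal."""
    n = len(slides)
    dm = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dm[i, j] = dm[j, i] = multi_tile_distance(slides[i], slides[j], metric)
    return dm


def mantel(dm_x, dm_y, n_perm=999, seed=0):
    """One-sided permutation Mantel test.

    Returns (r, p): the Pearson correlation between the upper triangles of the
    two matrices and the permutation p-value for r being this large by chance.
    """
    rng = np.random.default_rng(seed)
    iu = np.triu_indices_from(dm_x, k=1)
    x = dm_x[iu]
    r_obs = np.corrcoef(x, dm_y[iu])[0, 1]
    n = dm_y.shape[0]
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Permute rows and columns together so dm_y stays a valid distance matrix.
        r_perm = np.corrcoef(x, dm_y[np.ix_(perm, perm)][iu])[0, 1]
        if r_perm >= r_obs:
            count += 1
    # The +1 counts the observed arrangement itself, so p >= 1 / (n_perm + 1).
    return float(r_obs), (count + 1) / (n_perm + 1)


# --- Usage on SYNTHETIC data (not the thesis data) ---
rng = np.random.default_rng(1)
genes = rng.normal(size=(20, 50))  # 20 samples x 50 synthetic "genes"
# Four noisy "tile embeddings" per slide, derived from each expression profile.
slides = [g + 0.1 * rng.normal(size=(4, 50)) for g in genes]
dm_img = distance_matrix(slides, metric=d_mad)                # predicted distances
dm_gene = distance_matrix([g[None, :] for g in genes], d_l2)  # ground-truth stand-in
r, p = mantel(dm_img, dm_gene, n_perm=199)
print(f"Pearson r = {r:.3f}, Mantel p = {p:.3f}")
```

One design point the sketch makes visible: a permutation-based Mantel p-value can never fall below $1/(n_{\text{perm}} + 1)$, so a value as small as $10^{-15}$ requires either an impractically large number of permutations or an asymptotic approximation; the abstract does not say which procedure was used.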
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9092524
- author: Ledesma Eriksson, Kajsa
- course: FMAM05 20221
- year: 2022
- type: H2 - Master's Degree (Two Years)
- keywords: Whole Slide Image, Breast Cancer, Histopathology, Deep Metric Learning, Deep Learning, Image Analysis
- publication/series: Master’s Theses in Mathematical Sciences
- report number: LUTFMA-3483-2022
- ISSN: 1404-6342
- other publication id: 2022:E36
- language: English
- id: 9092524
- date added to LUP: 2022-08-10 15:54:03
- date last changed: 2022-08-10 15:55:04
@misc{9092524,
  abstract  = {{Female breast cancer is a complex and heterogeneous disease and accounts for the largest share of cancer deaths among women worldwide. Stratifying breast cancer patients into treatment groups is challenging, and in recent years analysis of the genes active in the tumour has been used to guide the choice of cancer therapy. However, gene expression analysis is expensive and unavailable for most breast cancer patients, which calls for a more cost-effective and reproducible alternative. In this thesis, a gene-expression-guided embedding extractor network is trained to map whole slide images of female breast cancer tumours into embeddings in a metric space where relative distances should resemble the distances between the corresponding gene expression profiles. The embedding extractor is the convolutional neural network ResNet-50. Four distance metrics were studied: the L1 distance $d_{L1}$, the cosine distance $d_{CL}$, the L2 distance $d_{L2}$ and an average L1 distance $d_{MAD}$. Each whole slide image was divided into smaller tiles, and the model's performance was examined with the distance computed from either one tile or multiple tiles per slide. The best-performing metric was $d_{MAD}$ with the multi-tile calculation. On the test data, the final model achieved a Pearson correlation coefficient of $\rho = 0.631$ between predicted and ground-truth distances. The statistical significance of this correlation was evaluated with a Mantel test, giving a p-value $< 10^{-15}$. The thesis suggests that an image-based approach could serve as an alternative to gene expression profiling, pending further research and evaluation.}},
  author    = {{Ledesma Eriksson, Kajsa}},
  issn      = {{1404-6342}},
  language  = {{eng}},
  note      = {{Student Paper}},
  series    = {{Master’s Theses in Mathematical Sciences}},
  title     = {{Gene Expression Guided Distance Metric Learning for Breast Cancer Whole Slide Image Analysis}},
  url       = {{http://lup.lub.lu.se/student-papers/record/9092524}},
  year      = {{2022}},
}