
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Gene Expression Guided Distance Metric Learning for Breast Cancer Whole Slide Image Analysis

Ledesma Eriksson, Kajsa LU (2022) In Master’s Theses in Mathematical Sciences FMAM05 20221
Mathematics (Faculty of Engineering)
Abstract
Female breast cancer is a complex and heterogeneous disease that accounts for most of the deaths caused by cancer in women worldwide. Stratifying breast cancer patients into treatment groups is a challenging task, and in recent years, analysis of the genes active in the tumour has been used to guide the choice of cancer therapy. However, gene expression analysis is expensive and not available for most breast cancer patients, which calls for a more cost-effective and reproducible alternative.

In this thesis, a gene expression guided embedding extractor network is trained to map whole slide images of female breast cancer tumours to embeddings in a metric space in which relative distances should be similar to the distances in the corresponding gene expression data. The embedding extractor network is the convolutional neural network ResNet-50. The distance metrics studied were the L1-distance dL1, the cosine distance dCL, the L2-distance dL2 and an average L1-distance dMAD. Each whole slide image consisted of smaller tiles. The model's performance was examined with the distance measurement based on either one or multiple tiles from each slide, and the best performing metric was dMAD with the multi-tile calculation. The final model gave a Pearson correlation coefficient between predicted and ground truth distances of ρ = 0.631 on the test data. The statistical significance of this correlation was evaluated with a Mantel test, resulting in a p-value < 1e−15.
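
The distance metrics and the evaluation named above can be illustrated with a minimal NumPy sketch. It is not the thesis implementation. The function names are hypothetical, dMAD is assumed here to be the L1-distance divided by the embedding dimension, and the Mantel test is written as a simple one-sided permutation test. The abstract does not specify any of these details, nor how multiple tiles are combined.

```python
import numpy as np

def d_l1(a, b):
    """L1 (Manhattan) distance between two embedding vectors."""
    return float(np.sum(np.abs(a - b)))

def d_l2(a, b):
    """L2 (Euclidean) distance between two embedding vectors."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

def d_cos(a, b):
    """Cosine distance: 1 minus the cosine similarity."""
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def d_mad(a, b):
    """Assumed form of the 'average L1-distance': mean absolute difference."""
    return float(np.mean(np.abs(a - b)))

def pairwise(X, dist):
    """Symmetric matrix of distances between all rows of X."""
    n = len(X)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = dist(X[i], X[j])
    return D

def upper(D):
    """Flatten the strict upper triangle of a distance matrix."""
    return D[np.triu_indices_from(D, k=1)]

def mantel(D_pred, D_true, n_perm=999, seed=0):
    """Pearson correlation of two distance matrices and a permutation p-value.

    Rows and columns of D_pred are permuted together, which keeps the
    matrix a valid distance matrix under the null hypothesis.
    """
    rng = np.random.default_rng(seed)
    r_obs = np.corrcoef(upper(D_pred), upper(D_true))[0, 1]
    n = D_pred.shape[0]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        r = np.corrcoef(upper(D_pred[np.ix_(p, p)]), upper(D_true))[0, 1]
        if r >= r_obs:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return r_obs, (hits + 1) / (n_perm + 1)
```

In this sketch, `pairwise` would be applied once to image embeddings and once to gene expression vectors for the same slides, and `mantel` would then compare the two resulting matrices. A permutation test with `n_perm` permutations cannot report a p-value below 1/(`n_perm` + 1), so a value as small as the one quoted in the abstract would require a different estimate of significance than this sketch provides.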

The thesis suggests that an image-based approach could serve as a potential alternative to gene expression profiling, with the possibility of further research and evaluation.
author
Ledesma Eriksson, Kajsa LU
supervisor
organization
course
FMAM05 20221
year
2022
type
H2 - Master's Degree (Two Years)
subject
keywords
Whole Slide Image, Breast Cancer, Histopathology, Deep Metric Learning, Deep Learning, Image Analysis
publication/series
Master’s Theses in Mathematical Sciences
report number
LUTFMA-3483-2022
ISSN
1404-6342
other publication id
2022:E36
language
English
id
9092524
date added to LUP
2022-08-10 15:54:03
date last changed
2022-08-10 15:55:04
@misc{9092524,
  author       = {{Ledesma Eriksson, Kajsa}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master’s Theses in Mathematical Sciences}},
  title        = {{Gene Expression Guided Distance Metric Learning for Breast Cancer Whole Slide Image Analysis}},
  year         = {{2022}},
}