Generation of Synthetic White Blood Cell Images Using Denoising Diffusion
(2023) In Master's Theses in Mathematical Sciences FMAM05 20231
Mathematics (Faculty of Engineering)
- Abstract
- CellaVision’s digital hematology systems are designed to analyze blood and
pre-classify different types of blood cells. Some abnormal white blood cells are
rare, which can cause imbalanced datasets. This can lead to a decrease in
pre-classification performance and a need to carry out more time-consuming data
gathering. The aim of this thesis is to investigate the possibility of using deep
learning to generate synthetic images of white blood cells with abnormalities,
in order to augment the training dataset of the pre-classifier.
Denoising diffusion is a recent, cutting-edge method for generating synthetic data
and has been shown to generate images of state-of-the-art quality. A diffusion model
works by adding noise to training images and learning to remove the noise. The
diffusion model of this thesis was created by first training a base model on
images with and without abnormalities and then fine-tuning it for three different
types of abnormalities: hypersegmentation, Dohle bodies and hypergranulation.
A Generative Adversarial Network (GAN) was trained and its performance was
compared to the performance of the diffusion model.
To evaluate the generated images, the performance of a classifier trained on a
dataset augmented by generated images was compared to a classifier trained
only on real cell images. It remains uncertain whether adding generated images to
the training dataset improved classifier performance. For two of the
abnormalities, accuracy increased for the abnormal class, but in the other cases
accuracy decreased.
Moreover, a medical expert and an experienced CellaVision employee were each
given a set of 100 cell images, of which 50 were synthetic, and asked to assess
which images were synthetic. The medical expert classified 96% of the real
images as real but correctly identified only 32% of the synthetic images. The
experienced CellaVision employee correctly classified 44% of the real images
and 24% of the synthetic images.
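The per-class figures from the assessment can be combined into one overall accuracy per assessor. The sketch below is illustrative arithmetic only: it assumes the reported percentages are per-class rates over the 50 real and 50 synthetic images described in the abstract, and the function name and variables are hypothetical rather than taken from the thesis.

```python
# Illustrative arithmetic only. The percentages and the 50/50 split come from
# the abstract; the helper name and structure are hypothetical.

def overall_accuracy(real_pct, synthetic_pct, n_real=50, n_synthetic=50):
    """Combine per-class correct-identification percentages into overall accuracy."""
    # Number of correctly judged images in each class, summed.
    correct = (real_pct * n_real + synthetic_pct * n_synthetic) / 100
    return correct / (n_real + n_synthetic)

expert = overall_accuracy(96, 32)      # medical expert
employee = overall_accuracy(44, 24)    # experienced CellaVision employee

print(f"medical expert: {expert:.0%}")
print(f"CellaVision employee: {employee:.0%}")
```

Under these assumptions, the expert judged 64 of the 100 images correctly and the employee 34. Random guessing on a balanced set would be expected to give about 50.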
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9126116
- author
- Zettergren, Louise (LU) and Nilsson, Fanny
- supervisor
- organization
- course
- FMAM05 20231
- year
- 2023
- type
- H2 - Master's Degree (Two Years)
- subject
- publication/series
- Master's Theses in Mathematical Sciences
- report number
- LUTFMA-3499-2023
- ISSN
- 1404-6342
- other publication id
- 2023:E18
- language
- English
- id
- 9126116
- date added to LUP
- 2023-08-22 16:47:09
- date last changed
- 2023-08-22 16:47:09
@misc{9126116,
  abstract = {{CellaVision’s digital hematology systems are designed to analyze blood and pre-classify different types of blood cells. Some abnormal white blood cells are rare, which can cause imbalanced datasets. This can lead to a decrease in pre-classification performance and a need to carry out more time-consuming data gathering. The aim of this thesis is to investigate the possibility of using deep learning to generate synthetic images of white blood cells with abnormalities, in order to augment the training dataset of the pre-classifier. Denoising diffusion is a recent, cutting-edge method for generating synthetic data and has been shown to generate images of state-of-the-art quality. A diffusion model works by adding noise to training images and learning to remove the noise. The diffusion model of this thesis was created by first training a base model on images with and without abnormalities and then fine-tuning it for three different types of abnormalities: hypersegmentation, Dohle bodies and hypergranulation. A Generative Adversarial Network (GAN) was trained and its performance was compared to the performance of the diffusion model. To evaluate the generated images, the performance of a classifier trained on a dataset augmented by generated images was compared to a classifier trained only on real cell images. It remains uncertain whether adding generated images to the training dataset improved classifier performance. For two of the abnormalities, accuracy increased for the abnormal class, but in the other cases accuracy decreased. Moreover, a medical expert and an experienced CellaVision employee were each given a set of 100 cell images, of which 50 were synthetic, and asked to assess which images were synthetic. The medical expert classified 96\% of the real images as real but correctly identified only 32\% of the synthetic images. The experienced CellaVision employee correctly classified 44\% of the real images and 24\% of the synthetic images.}},
  author = {Zettergren, Louise and Nilsson, Fanny},
  issn = {{1404-6342}},
  language = {{eng}},
  note = {{Student Paper}},
  series = {{Master's Theses in Mathematical Sciences}},
  title = {{Generation of Synthetic White Blood Cell Images Using Denoising Diffusion}},
  year = {{2023}},
}