
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Generation of Synthetic White Blood Cell Images Using Denoising Diffusion

Zettergren, Louise and Nilsson, Fanny (2023). In Master's Theses in Mathematical Sciences, FMAM05 20231
Mathematics (Faculty of Engineering)
Abstract
CellaVision’s digital hematology systems are designed to analyze blood and
pre-classify different types of blood cells. Some types of abnormal white blood
cells are rare, which can cause imbalanced datasets. This can lower
pre-classification performance and make more time-consuming data gathering
necessary. The aim of this thesis is to investigate whether deep learning can be
used to generate synthetic images of white blood cells with abnormalities, in
order to augment the training dataset of the pre-classifier.

Denoising diffusion is a recent generative method that has been shown to
produce state-of-the-art images. A diffusion model works by gradually adding
noise to training images and learning to remove that noise. The diffusion model
of this thesis was created by first training a base model on images with and
without abnormalities and then fine-tuning it for three types of abnormalities:
hypersegmentation, Döhle bodies and hypergranulation. A Generative Adversarial
Network (GAN) was also trained, and its performance was compared to that of the
diffusion model.
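The abstract does not state which diffusion formulation or noise schedule the thesis used. As a minimal sketch, assuming the standard DDPM setup of Ho et al. (2020) with a linear schedule from 1e-4 to 0.02 over 1000 steps, the code below shows the "adding noise" half of training and the noise-prediction loss that a denoising network would minimize. An image is treated as a flat list of pixel values in [-1, 1], the network itself is omitted, and all function names are hypothetical.

```python
import math
import random

# Hypothetical DDPM-style forward process (assumed setup, not the thesis code).
# An "image" is a flat list of pixel values scaled to [-1, 1].

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_0..beta_{T-1}, linearly spaced from beta_start to beta_end."""
    if T < 2:
        raise ValueError("need at least two diffusion steps")
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s): the share of clean signal left at step t."""
    alpha_bar, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar

def add_noise(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) in one shot:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, with eps ~ N(0, I).
    Returns (x_t, eps); eps is the target the denoising network learns to predict."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal = math.sqrt(alpha_bar[t])
    noise = math.sqrt(1.0 - alpha_bar[t])
    x_t = [signal * x + noise * e for x, e in zip(x0, eps)]
    return x_t, eps

def noise_prediction_loss(predicted_eps, true_eps):
    """Simplified DDPM training objective: mean squared error on the noise."""
    return sum((p - e) ** 2 for p, e in zip(predicted_eps, true_eps)) / len(true_eps)

# Usage: noise a toy 4-pixel "image" at an early step and at the final step.
rng = random.Random(0)
alpha_bar = cumulative_alphas(linear_beta_schedule(1000))
x0 = [0.5, -0.2, 0.9, 0.0]
x_early, _ = add_noise(x0, 10, alpha_bar, rng)    # still close to x0
x_late, eps = add_noise(x0, 999, alpha_bar, rng)  # almost pure Gaussian noise
```

During training, a network would receive `x_t` and `t` and be penalized with `noise_prediction_loss` against `eps`; generation runs the learned denoising step in reverse, starting from pure noise.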

To evaluate the generated images, a classifier trained on a dataset augmented
with generated images was compared to a classifier trained only on real cell
images. The results do not clearly show whether adding generated images to the
training dataset improved classifier performance: for two of the abnormalities,
accuracy increased for the abnormal class, but in the other cases accuracy
decreased.
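The comparison above is reported per class. A minimal sketch of that bookkeeping, assuming "accuracy for a class" means the fraction of that class's images classified correctly (per-class recall); the function names and the toy labels are hypothetical and are not results from the thesis.

```python
from collections import Counter

def per_class_accuracy(y_true, y_pred):
    """For each true class, the fraction of its samples predicted correctly
    (equivalently, per-class recall)."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / n for label, n in totals.items()}

def accuracy_change(y_true, pred_real_only, pred_augmented):
    """Per-class accuracy of the augmented-data classifier minus that of the real-only one."""
    base = per_class_accuracy(y_true, pred_real_only)
    aug = per_class_accuracy(y_true, pred_augmented)
    return {label: aug[label] - base[label] for label in base}

# Toy labels for illustration only; these are not results from the thesis.
y_true = ["normal", "normal", "hyperseg", "hyperseg"]
pred_real_only = ["normal", "normal", "normal", "hyperseg"]
pred_augmented = ["normal", "hyperseg", "hyperseg", "hyperseg"]
print(accuracy_change(y_true, pred_real_only, pred_augmented))
# {'normal': -0.5, 'hyperseg': 0.5}
```

The toy case shows why a per-class view matters: augmentation can raise accuracy on the rare abnormal class while lowering it on another class, so a single overall number can hide both effects.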

In addition, a medical expert and an experienced CellaVision employee were each
given a set of 100 cell images, 50 of which were synthetic, and asked to
identify the synthetic ones. The medical expert classified 96% of the real
images as real, but only 32% of the synthetic images as synthetic. The
experienced CellaVision employee correctly classified 44% of the real images
and 24% of the synthetic images.
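Because the image set was split evenly (50 real, 50 synthetic), the reported percentages translate directly into image counts and an overall accuracy. The helper below is a hypothetical illustration of that arithmetic, not part of the thesis.

```python
def detection_summary(real_correct_pct, synthetic_correct_pct, n_real=50, n_synthetic=50):
    """Convert per-group hit rates (in %) into counts and an overall accuracy (in %)."""
    real_correct = round(n_real * real_correct_pct / 100)
    synthetic_correct = round(n_synthetic * synthetic_correct_pct / 100)
    overall_pct = 100 * (real_correct + synthetic_correct) / (n_real + n_synthetic)
    return real_correct, synthetic_correct, overall_pct

print(detection_summary(96, 32))  # medical expert -> (48, 16, 64.0)
print(detection_summary(44, 24))  # CellaVision employee -> (22, 12, 34.0)
```

Random guessing would be expected to score about 50% overall. The medical expert's 64% comes mostly from recognizing real images, while the employee's 34% is below that level: 28 of the 50 real images were labelled synthetic and 38 of the 50 synthetic images were labelled real.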
author: Zettergren, Louise and Nilsson, Fanny
course: FMAM05 20231
year: 2023
type: H2 - Master's Degree (Two Years)
publication/series: Master's Theses in Mathematical Sciences
report number: LUTFMA-3499-2023
ISSN: 1404-6342
other publication id: 2023:E18
language: English
id: 9126116
date added to LUP: 2023-08-22 16:47:09
date last changed: 2023-08-22 16:47:09
@misc{9126116,
  author       = {{Zettergren, Louise and Nilsson, Fanny}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{Generation of Synthetic White Blood Cell Images Using Denoising Diffusion}},
  year         = {{2023}},
}