Generation of Synthetic White Blood Cell Images Using Denoising Diffusion

Zettergren, Louise; Nilsson, Fanny

Zettergren, Louise (LU) and Nilsson, Fanny (2023). In: Master's Theses in Mathematical Sciences, FMAM05 20231
Mathematics (Faculty of Engineering)

Abstract: CellaVision’s digital hematology systems are designed to analyze blood and
pre-classify different types of blood cells. Some abnormal white blood cells are
rare, which can cause imbalanced datasets. This can lead to a decrease in pre-
classification performance and a need to carry out more time-consuming data
gathering. The aim of this thesis is to investigate the possibility of using deep
learning to generate synthetic images of white blood cells with abnormalities,
in order to augment the training dataset of the pre-classifier.

Denoising diffusion is a recent, cutting-edge method for generating synthetic data and
has been shown to produce images of state-of-the-art quality. A diffusion model
works by adding noise to training images and learning to remove the noise. The
diffusion model of this thesis was created by first training a base model on images
with and without abnormalities and then fine-tuning it for three different
types of abnormalities: hypersegmentation, Döhle bodies and hypergranulation.
A Generative Adversarial Network (GAN) was also trained, and its performance was
compared to that of the diffusion model.
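
The noising half of that idea can be sketched in a few lines. The abstract does not say which formulation or noise schedule the thesis used, so the block below assumes the common DDPM-style closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with a linear beta schedule. All function names and parameter values are hypothetical, and the four-value "image" is a stand-in for real pixel data.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule; the abstract does not specify one."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t is the running product of (1 - beta_s) for s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, t, alpha_bars, rng):
    """Jump straight to step t: scale the image down and mix in Gaussian noise."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [signal * x + sigma * e for x, e in zip(pixels, noise)]
    return noisy, noise


def denoising_loss(predicted_noise, true_noise):
    """Mean squared error between the predicted and the actual noise."""
    squared = [(p - e) ** 2 for p, e in zip(predicted_noise, true_noise)]
    return sum(squared) / len(true_noise)


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
image = [0.5, -0.25, 0.9, -1.0]  # toy pixel values scaled to [-1, 1]
noisy, noise = add_noise(image, t=500, alpha_bars=alpha_bars, rng=rng)
```

In training, a network would receive `noisy` and the step index and be optimized so that its output minimizes `denoising_loss` against `noise`. Generation then runs the learned denoiser repeatedly, starting from pure noise.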

To evaluate the generated images, the performance of a classifier trained on a
dataset augmented by generated images was compared to a classifier trained
only on real cell images. The results were inconclusive as to whether adding generated
images to the training dataset improved classifier performance: for two of the
abnormalities, accuracy increased for the abnormal class, but in the other cases
accuracy decreased.
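
A comparison of this kind comes down to computing accuracy per class for both classifiers and looking at the difference. The sketch below shows one way to do that. The helper names are hypothetical, and the labels and predictions are made-up toy values chosen only to illustrate the trade-off. They are not results from the thesis.

```python
from collections import defaultdict


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of correctly predicted samples, computed separately per true class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, prediction in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


def accuracy_change(baseline, augmented):
    """Per-class difference: positive means the augmented classifier did better."""
    return {label: augmented[label] - baseline[label] for label in baseline}


# Toy placeholder data, NOT thesis results.
truth = ["normal", "normal", "abnormal", "abnormal"]
baseline_predictions = ["normal", "normal", "normal", "abnormal"]
augmented_predictions = ["normal", "abnormal", "abnormal", "abnormal"]

baseline = per_class_accuracy(truth, baseline_predictions)
augmented = per_class_accuracy(truth, augmented_predictions)
change = accuracy_change(baseline, augmented)
```

Reporting the change per class rather than as a single overall number makes visible the situation the abstract describes, where the abnormal class improves while another class gets worse.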

Moreover, a medical expert and an experienced CellaVision employee were each
given a set of 100 cell images, of which 50 were synthetic, and asked to identify
the synthetic ones. The medical expert classified 96% of the real images as real
but correctly identified only 32% of the synthetic images. The experienced
CellaVision employee correctly classified 44% of the real images and 24% of the
synthetic images.
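
The reported percentages can be converted into image counts and an overall accuracy. The sketch below assumes that the 100-image set contained 50 real and 50 synthetic images and that each percentage is the share of correct answers within its own group. The function name is hypothetical.

```python
def counts_from_rates(n_real, n_synthetic, real_rate, synthetic_rate):
    """Turn per-group correct rates into counts and an overall accuracy."""
    real_correct = round(n_real * real_rate)
    synthetic_correct = round(n_synthetic * synthetic_rate)
    overall = (real_correct + synthetic_correct) / (n_real + n_synthetic)
    return real_correct, synthetic_correct, overall


# Rates taken from the abstract; the 50/50 split is an assumption.
expert = counts_from_rates(50, 50, real_rate=0.96, synthetic_rate=0.32)
employee = counts_from_rates(50, 50, real_rate=0.44, synthetic_rate=0.24)
```

Under these assumptions the expert labelled 48 real and 16 synthetic images correctly (64 of 100), and the employee 22 and 12 (34 of 100). Guessing at random on a balanced set would give about 50 of 100, and both assessors took most synthetic images to be real.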

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9126116

author

Zettergren, Louise (LU) and Nilsson, Fanny

supervisor

Anders Heyden (LU)

organization

Mathematics (Faculty of Engineering)

course

FMAM05 20231

year

2023

type

H2 - Master's Degree (Two Years)

subject

Mathematics and Statistics

publication/series

Master's Theses in Mathematical Sciences

report number

LUTFMA-3499-2023

ISSN

1404-6342

other publication id

2023:E18

language

English

id

9126116

date added to LUP

2023-08-22 16:47:09

date last changed

2023-08-22 16:47:09

@misc{9126116,
  abstract     = {{CellaVision’s digital hematology systems are designed to analyze blood and
pre-classify different types of blood cells. Some abnormal white blood cells are
rare, which can cause imbalanced datasets. This can lead to a decrease in pre-
classification performance and a need to carry out more time-consuming data
gathering. The aim of this thesis is to investigate the possibility of using deep
learning to generate synthetic images of white blood cells with abnormalities,
in order to augment the training dataset of the pre-classifier.

Denoising diffusion is a new cutting edge method to generate synthetic data and
has been shown to be able to generate state-of-the-art images. A diffusion model
works by adding noise to training images and learning to remove the noise. The
diffusion model of this thesis was created by first training a base model on images
with and without abnormalities and then fine-tuning it for three different
types of abnormalities: hypersegmentation, Döhle bodies and hypergranulation.
A Generative Adversarial Network (GAN) was trained and its performance was
compared to the performance of the diffusion model.

To evaluate the generated images, the performance of a classifier trained on a
dataset augmented by generated images was compared to a classifier trained
only on real cell images. It is uncertain whether adding generated images to the
training dataset resulted in an improved classifier performance. For two of the
abnormalities, an increase in accuracy was seen for the abnormal class but in
the other cases there was a decrease in accuracy.

Moreover, a medical expert and an experienced CellaVision employee were both
given a set of 100 cell images, whereof 50 were synthetic. They were then asked
to assess which cell images were synthetic. The medical expert was able to
classify 96% of the real images as real, but only 32% of the synthetic images
were correctly classified. In turn, the experienced CellaVision employee was
able to correctly classify 44% of the real images and 24% of the synthetic.}},
  author       = {{Zettergren, Louise and Nilsson, Fanny}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{Generation of Synthetic White Blood Cell Images Using Denoising Diffusion}},
  year         = {{2023}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
