
Simultaneous Classification of Sets of Images Using Deep Learning and Clustering

Carleke, Nellie and Sellerberg, Hugo (2020) In Master’s Theses in Mathematical Sciences FMAM05 20201
Mathematics (Faculty of Engineering)
Abstract
Classification of cell images is conventionally done manually in hematology laboratories by medical technologists. CellaVision aims to automate this work in order to make the analysis process faster, better and more flexible. The automatic classification is currently done by processing each individual cell image through a Convolutional Neural Network. This methodology does not exploit any correlations that might exist between cells from the same blood sample.

We suggest a method that first compresses the images of a whole sample using a Convolutional Neural Network and a Variational Autoencoder, then clusters these compressed data points using DBSCAN clustering and Bayesian Optimization, and finally assigns a cell class to each cluster using statistical tools such as Earth Mover's Distance. We used data from CellaVision's system DC-1 to train a Convolutional Neural Network with 90.68% accuracy on training data and 82.85% accuracy on test data. This network served both as a benchmark and as the foundation for our method. Applying our method raised the accuracies to 90.90% on training data and 83.13% on test data.
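The clustering and class-assignment steps can be illustrated with a minimal Python sketch. It assumes that each cell has already been compressed to a latent vector (for example by the Variational Autoencoder's encoder) and that the CNN has already produced per-cell class probabilities; both are plain lists here. The function names `dbscan`, `emd_discrete`, `one_hot` and `classify_sample`, the hand-picked `eps` and `min_pts` values, and the cluster-to-class rule are hypothetical illustrations, not the authors' implementation. The abstract pairs DBSCAN with Bayesian Optimization, which this sketch does not reproduce. The assumed rule compares each cluster's mean CNN probability vector with every one-hot class distribution using Earth Mover's Distance under a 0/1 ground distance between classes (where EMD equals the total variation distance), and noise points keep their individual CNN prediction.

```python
from math import dist

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: one label per point, NOISE (-1) or a cluster id 0..k-1."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point, not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also core: keep growing
        cluster += 1
    return labels


def emd_discrete(p, q):
    """EMD between two class distributions under a 0/1 ground distance,
    which equals the total variation distance 0.5 * sum(|p_k - q_k|)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def one_hot(k, n):
    """Hypothetical helper: the distribution that puts all mass on class k."""
    return [1.0 if j == k else 0.0 for j in range(n)]


def classify_sample(latents, cnn_probs, eps, min_pts):
    """Hypothetical sample-level classifier: cluster, then label whole clusters."""
    labels = dbscan(latents, eps, min_pts)
    n = len(cnn_probs[0])
    # Default: every cell keeps its own CNN prediction (used for noise points).
    preds = [max(range(n), key=p.__getitem__) for p in cnn_probs]
    for c in set(labels) - {NOISE}:
        members = [i for i, lab in enumerate(labels) if lab == c]
        mean = [sum(cnn_probs[i][k] for i in members) / len(members) for k in range(n)]
        # Pick the class whose one-hot distribution is closest (in EMD) to the mean.
        best = min(range(n), key=lambda k: emd_discrete(mean, one_hot(k, n)))
        for i in members:
            preds[i] = best
    return labels, preds


# Toy sample: two tight groups of 2-D "latent vectors" and one isolated cell.
latents = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (20.0, 20.0)]
cnn_probs = [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2],
             [0.2, 0.8], [0.3, 0.7], [0.1, 0.9],
             [0.6, 0.4]]
labels, preds = classify_sample(latents, cnn_probs, eps=0.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
print(preds)   # [0, 0, 0, 1, 1, 1, 0]
```

In this toy sample DBSCAN groups the first three cells, so the second cell, which the CNN alone would label as class 1, inherits class 0 from its cluster. The isolated last cell is marked as noise and keeps its own CNN prediction. With a 0/1 ground distance the EMD to a one-hot distribution is 1 minus that class's mean probability, so the rule amounts to picking the class with the highest mean probability in the cluster.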

We explored the feasibility of using our method on mixed cell data from different systems, but the results were not as good as on DC-1 data. On images of handwritten digits from the MNIST dataset, our method could be made advantageous by forming customized subsets of images. This indicates that our method is versatile enough to be used on general image data, provided that correlations exist within the subsets.
author
Carleke, Nellie and Sellerberg, Hugo
course
FMAM05 20201
year
2020
type
H2 - Master's Degree (Two Years)
publication/series
Master’s Theses in Mathematical Sciences
report number
LUTFMA-3410-2020
ISSN
1404-6342
other publication id
2020:E29
language
English
id
9011189
date added to LUP
2020-06-24 13:46:30
date last changed
2020-06-24 13:46:30
@misc{9011189,
  author       = {Carleke, Nellie and Sellerberg, Hugo},
  issn         = {1404-6342},
  language     = {eng},
  note         = {Student Paper},
  series       = {Master’s Theses in Mathematical Sciences},
  title        = {Simultaneous Classification of Sets of Images Using Deep Learning and Clustering},
  year         = {2020},
}