Simultaneous Classification of Sets of Images Using Deep Learning and Clustering

Carleke, Nellie; Sellerberg, Hugo

Carleke, Nellie and Sellerberg, Hugo (2020) In Master’s Theses in Mathematical Sciences FMAM05 20201
Mathematics (Faculty of Engineering)

Abstract: Classification of cell images is conventionally done manually in hematology laboratories by medical technologists. CellaVision aims to automate this work in order to make the analysis process faster, better and more flexible. The automatic classification is currently done by processing each individual cell image through a Convolutional Neural Network. This methodology does not exploit any correlations that might exist between cells from the same blood sample.

We suggest a method to first compress the images of a whole sample using a Convolutional Neural Network and a Variational Autoencoder, then cluster these compressed data points using DBSCAN clustering and Bayesian Optimization, and finally assign a cell class to each cluster using statistical tools such as Earth Mover's Distance. We used data from CellaVision's system DC-1 to train a Convolutional Neural Network with 90.68% accuracy on training data and 82.85% accuracy on test data. This was used both as a benchmark and as the foundation to our method. We managed to enhance the accuracies to 90.90% on training data and 83.13% on test data by applying our method.

We explored the feasibility of using our method on mixed cell data from different systems, but the results were not as good as on DC-1 data. Applying our method on images of handwritten digits from the MNIST dataset could be made advantageous by forming customized subsets of images. This indicates that our method is versatile enough to use on general image data, provided that correlations within the subsets exist.
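As a rough illustration of the cluster-then-label idea described in the abstract, the sketch below clusters a handful of made-up 2-D points with a minimal DBSCAN and then gives every clustered point the most common per-image prediction in its cluster. It is not the thesis implementation: the CNN and Variational Autoencoder compression and the Bayesian Optimization of the clustering parameters are omitted, and a simple majority vote stands in for the Earth Mover's Distance based assignment. The function names, the toy coordinates, the class names, and the values of `eps` and `min_samples` are all assumptions chosen for the example.

```python
from collections import Counter
from math import dist

NOISE = -1  # cluster id used for points that belong to no cluster


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one cluster id per point, NOISE for outliers."""
    n = len(points)
    # A point's neighbourhood includes the point itself.
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point, keep expanding
    return labels


def relabel_by_cluster(cluster_ids, predicted):
    """Majority vote per cluster; noise points keep their own prediction."""
    votes = {}
    for c, p in zip(cluster_ids, predicted):
        if c != NOISE:
            votes.setdefault(c, Counter())[p] += 1
    return [
        p if c == NOISE else votes[c].most_common(1)[0][0]
        for c, p in zip(cluster_ids, predicted)
    ]


# Toy stand-ins for compressed image features and per-image CNN predictions.
latent = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (9.0, 0.0)]
cnn_pred = ["lymphocyte", "lymphocyte", "monocyte", "lymphocyte",
            "neutrophil", "neutrophil", "neutrophil",
            "monocyte"]

clusters = dbscan(latent, eps=0.3, min_samples=3)
final = relabel_by_cluster(clusters, cnn_pred)
print(clusters)
print(final)
```

In this toy run the third point, individually predicted as a different class from its three close neighbours, is overruled by its cluster, while the isolated last point is treated as noise and keeps its own prediction. This mirrors the abstract's premise that correlations within a sample can correct individual predictions.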

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9011189

author: Carleke, Nellie and Sellerberg, Hugo
supervisor: Karl Åström
organization: Mathematics (Faculty of Engineering)
course: FMAM05 20201
year: 2020
type: H2 - Master's Degree (Two Years)
subject: Mathematics and Statistics
publication/series: Master’s Theses in Mathematical Sciences
report number: LUTFMA-3410-2020
ISSN: 1404-6342
other publication id: 2020:E29
language: English
id: 9011189
date added to LUP: 2020-06-24 13:46:30
date last changed: 2020-06-24 13:46:30

@misc{9011189,
  abstract     = {{Classification of cell images is conventionally done manually in hematology laboratories by medical technologists. CellaVision aims to automate this work in order to make the analysis process faster, better and more flexible. The automatic classification is currently done by processing each individual cell image through a Convolutional Neural Network. This methodology does not exploit any correlations that might exist between cells from the same blood sample.

We suggest a method to first compress the images of a whole sample using a Convolutional Neural Network and a Variational Autoencoder, then cluster these compressed data points using DBSCAN clustering and Bayesian Optimization, and finally assign a cell class to each cluster using statistical tools such as Earth Mover's Distance. We used data from CellaVision's system DC-1 to train a Convolutional Neural Network with 90.68% accuracy on training data and 82.85% accuracy on test data. This was used both as a benchmark and as the foundation to our method. We managed to enhance the accuracies to 90.90% on training data and 83.13% on test data by applying our method.

We explored the feasibility of using our method on mixed cell data from different systems, but the results were not as good as on DC-1 data. Applying our method on images of handwritten digits from the MNIST dataset could be made advantageous by forming customized subsets of images. This indicates that our method is versatile enough to use on general image data, provided that correlations within the subsets exist.}},
  author       = {{Carleke, Nellie and Sellerberg, Hugo}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master’s Theses in Mathematical Sciences}},
  title        = {{Simultaneous Classification of Sets of Images Using Deep Learning and Clustering}},
  year         = {{2020}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
