
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Self-Supervised Learning: Land Classification of Satellite Imagery

Skoglund, Robert (2022). In Bachelor's Theses in Mathematical Sciences, MASK11 20221.
Mathematical Statistics
Abstract
The rise of self-supervised learning has enabled more general machine learning models that learn semantic representations without any labelled data. With 700 satellites orbiting Earth and generating terabytes of unlabelled data daily, satellite imagery is a particularly enticing dataset for self-supervised learning: it contains rich information relevant to many application domains, such as agriculture. Could one, for example, train a model to predict harvest yields on farmland based on representations learnt with self-supervised learning?

To explore the capacity of self-supervised learning in the context of agriculture and remote sensing, i.e. studying phenomena from a distance using technology such as drones and satellites, we attempt to build a simple binary classifier that labels satellite images as farmland or not-farmland, using SimCLR, the self-supervised visual representation learning algorithm developed by Google. The dataset used was BigEarthNet-S2, captured by the Copernicus Sentinel-2 satellites. For comparison, a multi-class classifier was trained with an identical procedure on the visual object recognition dataset TinyImageNet. Self-supervised training of the network and supervised training of a linear classifier were performed simultaneously, as the SimCLR authors report that this achieves performance similar to that of sequential self-supervised and supervised training.
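SimCLR learns representations by pulling together the embeddings of two augmented views of the same image and pushing apart views of different images, using the normalized temperature-scaled cross-entropy (NT-Xent) loss described in the SimCLR paper. The following is a minimal pure-Python sketch of that loss, not the thesis' implementation: the input layout (the two views of image k at positions 2k and 2k+1), the function names, and the default temperature of 0.5 are assumptions for illustration, since the abstract does not state the hyperparameters used.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(z: List[Vector], temperature: float = 0.5) -> float:
    """NT-Xent loss averaged over all 2N projections.

    Assumed layout (illustrative): z[2k] and z[2k + 1] are the projections
    of two augmented views of the same image k. The default temperature
    is an assumption, not a value taken from the thesis.
    """
    n_views = len(z)
    if n_views < 2 or n_views % 2 != 0:
        raise ValueError("expected an even number (>= 2) of projections")
    total = 0.0
    for i in range(n_views):
        # Index of the positive partner: the other view of the same image.
        j = i + 1 if i % 2 == 0 else i - 1
        # Scaled similarities to every other view; the anchor itself is excluded.
        logits = [cosine_similarity(z[i], z[k]) / temperature
                  for k in range(n_views) if k != i]
        positive = cosine_similarity(z[i], z[j]) / temperature
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # -log(exp(positive) / sum(exp(logits))) for this anchor.
        total += log_denominator - positive
    return total / n_views
```

In practice the projections would come from the encoder and projection head applied to a batch of augmented image pairs, and a deep learning framework would compute the loss with vectorized operations; the loops here only make the arithmetic explicit. The simultaneous linear classifier mentioned above would contribute a separate supervised cross-entropy term, and how the two losses were combined in the thesis is not stated in the abstract.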

TinyImageNet training metrics indicated successful self-supervised learning; however, it was evident that the model would benefit from longer training and further hyperparameter experimentation. The highest top-1 accuracy achieved was 41.66%. For BigEarthNet-S2, the evaluation metrics of the trained linear classifiers revealed poor predictive accuracy and generalization capacity, with sensitivities of approximately 60%. However, the poor evaluation metrics were attributed not to poor training or choice of hyperparameters, but to a poor pairing of the classification task and dataset. Namely, the multi-labelled BigEarthNet-S2 dataset contained information that was too semantically diverse for the binary classification problem. This problem is intrinsic to multi-labelled data, which contains overlapping classes and an arbitrary level of relevance for each label. Following the analysis, improvements and methodological changes are proposed, such as using a dataset with semantically distinct classes or fine-tuning on another, niche dataset for a specialized downstream task.
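The abstract reports top-1 accuracy for TinyImageNet and sensitivity for BigEarthNet-S2. The sketch below shows how those two metrics are computed, together with one hypothetical way of collapsing a multi-label tile into the binary farmland/not-farmland target. The label set `FARMLAND_LABELS`, the "any label matches" rule, and all function names are assumptions for illustration; the abstract does not say which BigEarthNet-S2 labels were treated as farmland.

```python
from typing import Iterable, Sequence, Set

# Hypothetical: label names modelled on BigEarthNet-style land-cover classes.
# The thesis abstract does not specify which labels count as "farmland".
FARMLAND_LABELS: Set[str] = {"Arable land", "Permanent crops", "Pastures"}


def to_binary(tile_labels: Iterable[str]) -> int:
    """Hypothetical rule: 1 if any label on the tile is a farmland label, else 0."""
    return int(any(label in FARMLAND_LABELS for label in tile_labels))


def top1_accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class (argmax) is correct.

    `predicted` holds the argmax class index for each sample.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def sensitivity(predicted: Sequence[int], actual: Sequence[int],
                positive: int = 1) -> float:
    """True-positive rate TP / (TP + FN) for the given positive class."""
    tp = sum(1 for p, a in zip(predicted, actual)
             if a == positive and p == positive)
    fn = sum(1 for p, a in zip(predicted, actual)
             if a == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive samples in `actual`")
    return tp / (tp + fn)
```

Under this rule, a tile labelled both "Arable land" and "Coniferous forest" becomes a positive example even if the arable portion is small, because the labels carry no information about how much of the tile each class covers. This is one concrete form of the overlapping classes and arbitrary label relevance that the abstract points to when explaining the weak BigEarthNet-S2 results.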
author: Skoglund, Robert
course: MASK11 20221
year: 2022
type: M2 - Bachelor Degree
keywords: Self-supervised learning, remote sensing, land classification, SimCLR
publication/series: Bachelor's Theses in Mathematical Sciences
report number: LUNFMS-4067-2022
ISSN: 1654-6229
other publication id: 2022:K19
language: English
id: 9101633
date added to LUP: 2022-11-28 10:26:55
date last changed: 2022-12-22 12:45:26
@misc{9101633,
  author       = {{Skoglund, Robert}},
  issn         = {{1654-6229}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Bachelor's Theses in Mathematical Sciences}},
  title        = {{Self-Supervised Learning: Land Classification of Satellite Imagery}},
  year         = {{2022}},
}