Self-Supervised Learning: Land Classification of Satellite Imagery

Skoglund, Robert

Skoglund, Robert (LU) (2022) In Bachelor's Theses in Mathematical Sciences MASK11 20221
Mathematical Statistics

Abstract: The rise of self-supervised learning has granted a deeper level of generalized machine learning capable of learning semantic representations without any use of labelling. With 700 satellites orbiting Earth and generating terabytes of unlabelled data daily, satellite imagery serves as a particularly enticing dataset for self-supervised learning, containing rich information with many applicable domains, such as agriculture. Could one, for example, train a model to predict harvest yields on farmland based on representations learnt with self-supervised learning?

To explore the capacity of self-supervised learning in the context of agriculture and remote sensing, i.e. studying phenomena from a distance using technology such as drones and satellites, we attempt to build a simple binary classifier that classifies satellite images as farmland or not-farmland using the self-supervised visual representation learning algorithm SimCLR, developed by Google. The dataset used was the BigEarthNet-S2 dataset captured by the Copernicus Sentinel-2 satellites. For comparison, a multi-class classifier was trained using an identical procedure on the visual object recognition dataset TinyImageNet. Self-supervised training of the network and supervised training of a linear classifier were performed simultaneously, as the SimCLR authors report that this achieves performance similar to that of sequential self-supervised and supervised training.

TinyImageNet training metrics revealed successful self-supervised learning; however, it was evident that the model would benefit from longer training and further experimentation with hyperparameters. The highest top-1 accuracy achieved was 41.66%. As for BigEarthNet-S2, after training the linear classifiers, the evaluation metrics revealed poor predictive accuracy and generalization capacity, obtaining sensitivities of approximately 60%. The poor evaluation metrics were, however, not attributed to poor training or choice of hyperparameters, but rather to the poor pairing of the classification task and dataset. Namely, the multi-labelled BigEarthNet-S2 dataset contained information that was too semantically diverse with regard to the binary classification problem. This problem is intrinsic to the nature of multi-labelled data, which contains overlapping classes and an arbitrary level of relevance for each label. After the analysis, improvements and methodological changes are proposed, such as utilizing a dataset with semantically distinct classes or fine-tuning with another niche dataset for a specialized downstream task.

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9101633

author: Skoglund, Robert (LU)
supervisor: Alexandros Sopasakis (LU)
organization: Mathematical Statistics
course: MASK11 20221
year: 2022
type: M2 - Bachelor Degree
subject: Mathematics and Statistics
keywords: Self-supervised learning, remote sensing, land classification, SimCLR
publication/series: Bachelor's Theses in Mathematical Sciences
report number: LUNFMS-4067-2022
ISSN: 1654-6229
other publication id: 2022:K19
language: English
id: 9101633
date added to LUP: 2022-11-28 10:26:55
date last changed: 2022-12-22 12:45:26

@misc{9101633,
  abstract     = {{The rise of self-supervised learning has granted a deeper level of generalized machine learning capable of learning semantic representations without any use of labelling. With 700 satellites orbiting Earth and generating terabytes of unlabelled data daily, satellite imagery serves as a particularly enticing dataset for self-supervised learning, containing rich information with many applicable domains, such as agriculture. Could one, for example, train a model to predict harvest yields on farmland based on representations learnt with self-supervised learning?

To explore the capacity of self-supervised learning in the context of agriculture and remote sensing, i.e. studying phenomena from a distance using technology such as drones and satellites, we attempt to build a simple binary classifier that classifies satellite images as farmland or not-farmland using the self-supervised visual representation learning algorithm SimCLR, developed by Google. The dataset used was the BigEarthNet-S2 dataset captured by the Copernicus Sentinel-2 satellites. For comparison, a multi-class classifier was trained using an identical procedure on the visual object recognition dataset TinyImageNet. Self-supervised training of the network and supervised training of a linear classifier were performed simultaneously, as the SimCLR authors report that this achieves performance similar to that of sequential self-supervised and supervised training.

TinyImageNet training metrics revealed successful self-supervised learning; however, it was evident that the model would benefit from longer training and further experimentation with hyperparameters. The highest top-1 accuracy achieved was 41.66%. As for BigEarthNet-S2, after training the linear classifiers, the evaluation metrics revealed poor predictive accuracy and generalization capacity, obtaining sensitivities of approximately 60%. The poor evaluation metrics were, however, not attributed to poor training or choice of hyperparameters, but rather to the poor pairing of the classification task and dataset. Namely, the multi-labelled BigEarthNet-S2 dataset contained information that was too semantically diverse with regard to the binary classification problem. This problem is intrinsic to the nature of multi-labelled data, which contains overlapping classes and an arbitrary level of relevance for each label. After the analysis, improvements and methodological changes are proposed, such as utilizing a dataset with semantically distinct classes or fine-tuning with another niche dataset for a specialized downstream task.}},
  author       = {{Skoglund, Robert}},
  issn         = {{1654-6229}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Bachelor's Theses in Mathematical Sciences}},
  title        = {{Self-Supervised Learning: Land Classification of Satellite Imagery}},
  year         = {{2022}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
