Lund University Publications

Data-Efficient Learning of Semantic Segmentation

Nilsson, David (2022). In Doctoral Theses in Mathematical Sciences 2022(5).
Abstract
Semantic segmentation is a fundamental problem in visual perception, with applications ranging from robotics to autonomous vehicles, and recent approaches based on deep learning have achieved excellent performance. However, training such systems generally requires very large datasets of annotated images. In this thesis we investigate and propose methods and setups that make it possible to use unlabelled data to improve performance, or to use limited application-specific data to reduce the need for large datasets, when learning semantic segmentation.

In the first paper we study semantic video segmentation. We present a deep, end-to-end trainable model that uses propagated labelling information in unlabelled frames, in addition to sparsely labelled frames, to predict semantic segmentation. Extensive experiments on the Cityscapes and CamVid datasets show that the model can improve accuracy and temporal consistency by using extra unlabelled video frames in training and testing.
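The label-propagation idea can be sketched in a few lines. This is a toy illustration, not the thesis model: the learned motion estimation is replaced here by a given integer displacement field, and `propagate_labels` is a hypothetical helper name.

```python
import numpy as np

def propagate_labels(labels, flow):
    """Warp a per-pixel label map to the next frame using an integer
    forward flow field: flow[y, x] = (dy, dx) displacement of pixel (y, x).
    Pixels that receive no label are marked -1 (unknown)."""
    h, w = labels.shape
    propagated = np.full((h, w), -1, dtype=labels.dtype)
    ys, xs = np.mgrid[0:h, 0:w]
    ny = ys + flow[..., 0]
    nx = xs + flow[..., 1]
    # Keep only pixels whose destination stays inside the image.
    valid = (ny >= 0) & (ny < h) & (nx >= 0) & (nx < w)
    propagated[ny[valid], nx[valid]] = labels[ys[valid], xs[valid]]
    return propagated

labels = np.array([[1, 2], [3, 4]])
flow = np.zeros((2, 2, 2), dtype=int)
flow[..., 1] = 1  # every pixel moves one step to the right
print(propagate_labels(labels, flow))  # unmapped pixels stay -1
```

In the thesis the propagated labels supplement sparse ground-truth annotations as an extra training signal; here the sketch only shows the mechanical warping step.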

In the second, third and fourth papers we study active learning for semantic segmentation in an embodied context, where navigation is part of the problem. A navigating agent should explore a building and query for labels of informative views that improve its visual perception. In the second paper we introduce the embodied visual active learning problem, and propose and evaluate a range of methods, from heuristic baselines to a fully trainable agent using reinforcement learning (RL), on the Matterport3D dataset. We show that the learned agent outperforms several comparable pre-specified baselines. In the third paper we study the embodied visual active learning problem in a lifelong setup, where the visual learning spans the exploration of multiple buildings, and the learning in one scene should influence the active learning in the next, e.g. by not annotating object classes that are already accurately segmented. We introduce new methodology to encourage global exploration of scenes, via an RL formulation that combines local navigation with global frontier-based exploration. We show that the RL agent can learn adaptable behaviour, such as annotating less frequently once it has already explored a number of buildings. Finally, in the fourth paper we study the embodied visual active learning problem with region-based active learning. Instead of querying for the annotation of a whole image, the agent can query for annotations of parts of images, and we show that annotating regions is significantly more labelling-efficient than annotating full images.
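A common way to pick which region to query, used here purely as an illustrative sketch (the function name and the uncertainty criterion are assumptions, not necessarily the thesis's exact acquisition function), is to annotate the tile with the highest mean predictive entropy:

```python
import numpy as np

def select_region(probs, region_size):
    """Pick the image tile with highest mean predictive entropy.
    probs: (H, W, C) softmax probabilities; the image is split into a
    non-overlapping grid of region_size x region_size tiles.
    Returns the (row, col) index of the chosen tile."""
    eps = 1e-12
    entropy = -np.sum(probs * np.log(probs + eps), axis=-1)  # (H, W)
    h, w = entropy.shape
    th, tw = h // region_size, w // region_size
    tiles = entropy[:th * region_size, :tw * region_size]
    tiles = tiles.reshape(th, region_size, tw, region_size).mean(axis=(1, 3))
    return np.unravel_index(np.argmax(tiles), tiles.shape)

# Confident everywhere except the bottom-right tile.
probs = np.zeros((4, 4, 2))
probs[..., 0] = 1.0   # fully confident pixels: entropy 0
probs[2:, 2:] = 0.5   # uncertain pixels: entropy log 2
print(select_region(probs, 2))
```

The appeal of region-level queries is visible even in this toy: the annotation budget is spent only on the uncertain tile, while the confident tiles cost nothing.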
author: Nilsson, David
opponent: Prof. Maki, Atsuto, KTH Royal Institute of Technology, Sweden
type: Thesis
publication status: published
keywords: semantic segmentation, embodied learning, active learning, semantic video segmentation, computer vision, deep learning
in: Doctoral Theses in Mathematical Sciences
volume: 2022
issue: 5
publisher: Lund University, Faculty of Science, Centre for Mathematical Sciences, Mathematics
defense location: Lecture hall MH:Hörmander, Centre for Mathematical Sciences, Sölvegatan 18, Faculty of Engineering LTH, Lund University, Lund
defense date: 2022-06-13 10:15:00
ISSN: 1404-0034
ISBN: 978-91-8039-284-6, 978-91-8039-283-9
language: English
LU publication?: yes
id: 4c744d52-1902-4dd8-9473-30f522342b85
date added to LUP: 2022-05-13 09:43:14
date last changed: 2024-02-13 11:33:39
@phdthesis{4c744d52-1902-4dd8-9473-30f522342b85,
  abstract     = {{Semantic segmentation is a fundamental problem in visual perception, with applications ranging from robotics to autonomous vehicles, and recent approaches based on deep learning have achieved excellent performance. However, training such systems generally requires very large datasets of annotated images. In this thesis we investigate and propose methods and setups that make it possible to use unlabelled data to improve performance, or to use limited application-specific data to reduce the need for large datasets, when learning semantic segmentation.<br/><br/>In the first paper we study semantic video segmentation. We present a deep, end-to-end trainable model that uses propagated labelling information in unlabelled frames, in addition to sparsely labelled frames, to predict semantic segmentation. Extensive experiments on the Cityscapes and CamVid datasets show that the model can improve accuracy and temporal consistency by using extra unlabelled video frames in training and testing.<br/><br/>In the second, third and fourth papers we study active learning for semantic segmentation in an embodied context, where navigation is part of the problem. A navigating agent should explore a building and query for labels of informative views that improve its visual perception. In the second paper we introduce the embodied visual active learning problem, and propose and evaluate a range of methods, from heuristic baselines to a fully trainable agent using reinforcement learning (RL), on the Matterport3D dataset. We show that the learned agent outperforms several comparable pre-specified baselines. In the third paper we study the embodied visual active learning problem in a lifelong setup, where the visual learning spans the exploration of multiple buildings, and the learning in one scene should influence the active learning in the next, e.g. by not annotating object classes that are already accurately segmented. We introduce new methodology to encourage global exploration of scenes, via an RL formulation that combines local navigation with global frontier-based exploration. We show that the RL agent can learn adaptable behaviour, such as annotating less frequently once it has already explored a number of buildings. Finally, in the fourth paper we study the embodied visual active learning problem with region-based active learning. Instead of querying for the annotation of a whole image, the agent can query for annotations of parts of images, and we show that annotating regions is significantly more labelling-efficient than annotating full images.}},
  author       = {{Nilsson, David}},
  isbn         = {{978-91-8039-284-6}},
  issn         = {{1404-0034}},
  keywords     = {{semantic segmentation; embodied learning; active learning; semantic video segmentation; computer vision; deep learning}},
  language     = {{eng}},
  number       = {{5}},
  publisher    = {{Lund University, Faculty of Science, Centre for Mathematical Sciences, Mathematics}},
  school       = {{Lund University}},
  series       = {{Doctoral Theses in Mathematical Sciences}},
  title        = {{Data-Efficient Learning of Semantic Segmentation}},
  url          = {{https://lup.lub.lu.se/search/files/118061050/Avhandling_DavidNilsson.pdf}},
  volume       = {{2022}},
  year         = {{2022}},
}