Lund University Publications

Data-Efficient Learning of Semantic Segmentation

Nilsson, David (2022). In Doctoral Theses in Mathematical Sciences 2022(5).
Abstract
Semantic segmentation is a fundamental problem in visual perception, with applications ranging from robotics to autonomous vehicles, and recent approaches based on deep learning have achieved excellent performance. However, training such systems generally requires very large datasets of annotated images. In this thesis we investigate and propose methods and setups that make it possible to use unlabelled data to improve performance, or to use limited application-specific data to reduce the need for large datasets, when learning semantic segmentation.

In the first paper we study semantic video segmentation. We present a deep, end-to-end trainable model that uses propagated labelling information in unlabelled frames, in addition to sparsely labelled frames, to predict semantic segmentation. Extensive experiments on the Cityscapes and CamVid datasets show that the model can improve accuracy and temporal consistency by using extra unlabelled video frames in training and testing.
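The label-propagation idea can be sketched in a few lines. This is a toy illustration, not the thesis model: the learned motion estimation is replaced here by a given integer displacement field, and `propagate_labels` is a hypothetical helper name.

```python
import numpy as np

def propagate_labels(labels, flow):
    """Warp a per-pixel label map to the next frame using an integer
    forward flow field: flow[y, x] = (dy, dx) displacement of pixel (y, x).
    Pixels that receive no label are marked -1 (unknown)."""
    h, w = labels.shape
    propagated = np.full((h, w), -1, dtype=labels.dtype)
    ys, xs = np.mgrid[0:h, 0:w]
    ny = ys + flow[..., 0]
    nx = xs + flow[..., 1]
    # Keep only pixels whose destination stays inside the image.
    valid = (ny >= 0) & (ny < h) & (nx >= 0) & (nx < w)
    propagated[ny[valid], nx[valid]] = labels[ys[valid], xs[valid]]
    return propagated

labels = np.array([[1, 2], [3, 4]])
flow = np.zeros((2, 2, 2), dtype=int)
flow[..., 1] = 1  # every pixel moves one step to the right
print(propagate_labels(labels, flow))  # unmapped pixels stay -1
```

In the thesis the propagated labels supplement sparse ground-truth annotations as an extra training signal; here the sketch only shows the mechanical warping step.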

In the second, third and fourth papers we study active learning for semantic segmentation in an embodied context, where navigation is part of the problem. A navigating agent should explore a building and query for labels of informative views that improve its visual perception. In the second paper we introduce the embodied visual active learning problem, and propose and evaluate a range of methods, from heuristic baselines to a fully trainable agent using reinforcement learning (RL), on the Matterport3D dataset. We show that the learned agent outperforms several comparable pre-specified baselines. In the third paper we study the embodied visual active learning problem in a lifelong setup, where the visual learning spans the exploration of multiple buildings, and the learning in one scene should influence the active learning in the next, e.g. by not annotating object classes that are already accurately segmented. We introduce new methodology to encourage global exploration of scenes, via an RL formulation that combines local navigation with global frontier-based exploration. We show that the RL agent can learn adaptable behaviour, such as annotating less frequently once it has already explored a number of buildings. Finally, in the fourth paper we study the embodied visual active learning problem with region-based active learning. Instead of querying for the annotation of a whole image, the agent can query for annotations of parts of images, and we show that annotating regions is significantly more labelling-efficient than annotating full images.
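A common way to pick which region to query, used here purely as an illustrative sketch (the function name and the uncertainty criterion are assumptions, not necessarily the thesis's exact acquisition function), is to annotate the tile with the highest mean predictive entropy:

```python
import numpy as np

def select_region(probs, region_size):
    """Pick the image tile with highest mean predictive entropy.
    probs: (H, W, C) softmax probabilities; the image is split into a
    non-overlapping grid of region_size x region_size tiles.
    Returns the (row, col) index of the chosen tile."""
    eps = 1e-12
    entropy = -np.sum(probs * np.log(probs + eps), axis=-1)  # (H, W)
    h, w = entropy.shape
    th, tw = h // region_size, w // region_size
    tiles = entropy[:th * region_size, :tw * region_size]
    tiles = tiles.reshape(th, region_size, tw, region_size).mean(axis=(1, 3))
    return np.unravel_index(np.argmax(tiles), tiles.shape)

# Confident everywhere except the bottom-right tile.
probs = np.zeros((4, 4, 2))
probs[..., 0] = 1.0   # fully confident pixels: entropy 0
probs[2:, 2:] = 0.5   # uncertain pixels: entropy log 2
print(select_region(probs, 2))
```

The appeal of region-level queries is visible even in this toy: the annotation budget is spent only on the uncertain tile, while the confident tiles cost nothing.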
author: Nilsson, David
opponent: Prof. Maki, Atsuto, KTH Royal Institute of Technology, Sweden
type: Thesis
publication status: published
keywords: semantic segmentation, embodied learning, active learning, semantic video segmentation, computer vision, deep learning
in: Doctoral Theses in Mathematical Sciences
volume: 2022
issue: 5
publisher: Lund University, Faculty of Science, Centre for Mathematical Sciences, Mathematics
defense location: Lecture hall MH:Hörmander, Centre for Mathematical Sciences, Sölvegatan 18, Faculty of Engineering LTH, Lund University, Lund
defense date: 2022-06-13 10:15:00
ISSN: 1404-0034
ISBN: 978-91-8039-284-6, 978-91-8039-283-9
language: English
LU publication?: yes
id: 4c744d52-1902-4dd8-9473-30f522342b85
date added to LUP: 2022-05-13 09:43:14
date last changed: 2024-02-13 11:33:39
@phdthesis{4c744d52-1902-4dd8-9473-30f522342b85,
  abstract     = {{Semantic segmentation is a fundamental problem in visual perception, with applications ranging from robotics to autonomous vehicles, and recent approaches based on deep learning have achieved excellent performance. However, training such systems generally requires very large datasets of annotated images. In this thesis we investigate and propose methods and setups that make it possible to use unlabelled data to improve performance, or to use limited application-specific data to reduce the need for large datasets, when learning semantic segmentation.<br/><br/>In the first paper we study semantic video segmentation. We present a deep, end-to-end trainable model that uses propagated labelling information in unlabelled frames, in addition to sparsely labelled frames, to predict semantic segmentation. Extensive experiments on the Cityscapes and CamVid datasets show that the model can improve accuracy and temporal consistency by using extra unlabelled video frames in training and testing.<br/><br/>In the second, third and fourth papers we study active learning for semantic segmentation in an embodied context, where navigation is part of the problem. A navigating agent should explore a building and query for labels of informative views that improve its visual perception. In the second paper we introduce the embodied visual active learning problem, and propose and evaluate a range of methods, from heuristic baselines to a fully trainable agent using reinforcement learning (RL), on the Matterport3D dataset. We show that the learned agent outperforms several comparable pre-specified baselines. In the third paper we study the embodied visual active learning problem in a lifelong setup, where the visual learning spans the exploration of multiple buildings, and the learning in one scene should influence the active learning in the next, e.g. by not annotating object classes that are already accurately segmented. We introduce new methodology to encourage global exploration of scenes, via an RL formulation that combines local navigation with global frontier-based exploration. We show that the RL agent can learn adaptable behaviour, such as annotating less frequently once it has already explored a number of buildings. Finally, in the fourth paper we study the embodied visual active learning problem with region-based active learning. Instead of querying for the annotation of a whole image, the agent can query for annotations of parts of images, and we show that annotating regions is significantly more labelling-efficient than annotating full images.}},
  author       = {{Nilsson, David}},
  isbn         = {{978-91-8039-284-6}},
  issn         = {{1404-0034}},
  keywords     = {{semantic segmentation; embodied learning; active learning; semantic video segmentation; computer vision; deep learning}},
  language     = {{eng}},
  number       = {{5}},
  publisher    = {{Lund University, Faculty of Science, Centre for Mathematical Sciences, Mathematics}},
  school       = {{Lund University}},
  series       = {{Doctoral Theses in Mathematical Sciences}},
  title        = {{Data-Efficient Learning of Semantic Segmentation}},
  url          = {{https://lup.lub.lu.se/search/files/118061050/Avhandling_DavidNilsson.pdf}},
  volume       = {{2022}},
  year         = {{2022}},
}