
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Memory Efficient Semantic Segmentation for Embedded Systems

Liu, Haochen and Olsson, Erik (2019) In LU-CS-EX 2019-14, EDAM05 20191
Department of Computer Science
Abstract
Convolutional neural networks (CNNs) have made rapid progress in recent years and are considered state-of-the-art in fields such as computer vision. However, CNNs are very computationally intensive, which makes them challenging to deploy on embedded devices such as smartphones, security cameras and cars.

This thesis investigates different neural network compression techniques to determine which yields the lowest memory consumption with the smallest accuracy drop. The techniques tested are pruning (selectively removing parts of the network), quantization (storing network parameters at lower precision) and tiling (splitting the dataflow inside the network).
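
To make pruning and quantization concrete, the following is a minimal, illustrative Python/NumPy sketch, not the thesis's exact procedure; the sparsity target and the 8-bit affine scheme are assumptions:

import numpy as np

def prune(weights, sparsity=0.5):
    # Magnitude pruning: zero out the smallest |w| until the target
    # fraction of weights is zero; zeroed weights compress well or
    # can be stored sparsely.
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def quantize(weights):
    # Affine 8-bit quantization: map float32 values onto uint8,
    # cutting storage per parameter from 4 bytes to 1 byte.
    lo, hi = float(weights.min()), float(weights.max())
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # constant tensor; avoid dividing by zero
    q = np.round((weights - lo) / scale).astype(np.uint8)
    return q, scale, lo  # recover approximately as q * scale + lo

w = np.random.randn(256, 256).astype(np.float32)
q, scale, offset = quantize(prune(w, sparsity=0.5))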

The compression techniques are tested on DeepLab v3+, a neural network for semantic segmentation. Compared with other work on neural network compression, our baseline, DeepLab, has significantly fewer parameters than the baselines used elsewhere. We then selected a compressed version of DeepLab and tested it on an Axis P3227-LV network camera with two implementations: one using TensorFlow Lite, and one using a custom implementation of DeepLab written from scratch that uses a novel memory allocation algorithm.
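
For the TensorFlow Lite deployment path, conversion with post-training quantization typically looks like the sketch below; the SavedModel path is hypothetical, and the thesis's exact converter settings are not given in this record:

import tensorflow as tf

# Convert a trained DeepLab SavedModel to a .tflite flatbuffer with
# post-training quantization enabled.
converter = tf.lite.TFLiteConverter.from_saved_model("deeplab_v3plus_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("deeplab.tflite", "wb") as f:
    f.write(tflite_model)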

We find that memory usage is reduced by one third through pruning, by one half through quantization and by almost two thirds with our custom implementation. In total, combining all tested compression techniques with our custom implementation reduced memory consumption from 170 MB (TensorFlow Lite) down to 20 MB with only a minor reduction in accuracy.
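
The abstract does not detail the custom memory allocation algorithm, but allocators of this kind typically pack intermediate tensors whose lifetimes do not overlap into one shared arena. As an illustration only, here is a generic greedy offset-assignment sketch in Python:

def allocate(tensors):
    # tensors: list of (size_bytes, first_use, last_use) per
    # intermediate tensor. Two tensors may share memory only if
    # their lifetimes do not overlap.
    placed = []                      # (offset, size, first, last)
    offsets = [0] * len(tensors)
    arena_size = 0
    # Place the largest tensors first; they are the hardest to fit.
    for i in sorted(range(len(tensors)), key=lambda i: -tensors[i][0]):
        size, first, last = tensors[i]
        # Regions occupied by already-placed tensors that are live
        # at the same time as this one.
        busy = sorted((o, o + s) for (o, s, f, l) in placed
                      if not (l < first or last < f))
        offset = 0
        for lo, hi in busy:
            if offset + size <= lo:
                break                # fits in the gap before this region
            offset = max(offset, hi)
        placed.append((offset, size, first, last))
        offsets[i] = offset
        arena_size = max(arena_size, offset + size)
    return offsets, arena_size

# Example: outputs A, B, C of three consecutive layers. A and C are
# never live at the same time, so they share the same offset.
offsets, total = allocate([(1000, 0, 1), (1000, 1, 2), (1000, 2, 3)])
assert total == 2000  # instead of 3000 with one buffer per tensor
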
author: Liu, Haochen and Olsson, Erik
supervisor:
organization: Department of Computer Science
course: EDAM05 20191
year: 2019
type: H2 - Master's Degree (Two Years)
subject:
keywords: Neural networks, Semantic segmentation, Pruning, Quantization, Tiling, Memory allocation, DeepLab, MobileNet
publication/series: LU-CS-EX 2019-14
report number: LU-CS-EX 2019-14
ISSN: 1650-2884
language: English
id: 8992374
date added to LUP: 2019-09-16 11:00:10
date last changed: 2019-09-16 11:00:10
@misc{8992374,
  abstract     = {{Convolutional neural networks (CNNs) have made rapid progress in recent years and are considered state-of-the-art in fields such as computer vision. However, CNNs are very computationally intensive, which makes them challenging to deploy on embedded devices such as smartphones, security cameras and cars.

This thesis investigates different neural network compression techniques to determine which yields the lowest memory consumption with the smallest accuracy drop. The techniques tested are pruning (selectively removing parts of the network), quantization (storing network parameters at lower precision) and tiling (splitting the dataflow inside the network).

The compression techniques are tested on DeepLab v3+, a neural network for semantic segmentation. Compared with other work on neural network compression, our baseline, DeepLab, has significantly fewer parameters than the baselines used elsewhere. We then selected a compressed version of DeepLab and tested it on an Axis P3227-LV network camera with two implementations: one using TensorFlow Lite, and one using a custom implementation of DeepLab written from scratch that uses a novel memory allocation algorithm.

We find that memory usage is reduced by one third through pruning, by one half through quantization and by almost two thirds with our custom implementation. In total, combining all tested compression techniques with our custom implementation reduced memory consumption from 170 MB (TensorFlow Lite) down to 20 MB with only a minor reduction in accuracy.}},
  author       = {{Liu, Haochen and Olsson, Erik}},
  issn         = {{1650-2884}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{LU-CS-EX 2019-14}},
  title        = {{Memory Efficient Semantic Segmentation for Embedded Systems}},
  year         = {{2019}},
}