Memory Efficient Semantic Segmentation for Embedded Systems
(2019) In LU-CS-EX 2019-14, EDAM05 20191, Department of Computer Science
- Abstract
- Convolutional neural networks (CNNs) have made rapid progress in recent years, and in fields such as computer vision they are considered state of the art. However, CNNs are computationally intensive, which makes them challenging to deploy on embedded devices such as smartphones, security cameras and cars.
This thesis investigates different neural network compression techniques to determine which yields the lowest memory consumption with the smallest drop in accuracy. The techniques tested are pruning (selectively removing parts of the network), quantization (storing network parameters at lower precision) and tiling (splitting the dataflow inside the network).
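The three techniques can be illustrated in a few lines of NumPy. The sketch below is only illustrative and not the thesis's implementation; the function names, the 50% sparsity default and the uint8 scheme are assumptions made for the example.

import numpy as np

# Illustrative sketches of the three compression techniques
# (assumed details, not the thesis's code).

def prune_by_magnitude(weights, sparsity=0.5):
    """Pruning: zero out the `sparsity` fraction of weights with the
    smallest magnitudes; the zeros can then be stored compactly."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_uint8(weights):
    """Quantization: store float32 weights as uint8 plus a scale and
    offset, shrinking them 4x at the cost of some precision."""
    lo, hi = float(weights.min()), float(weights.max())
    scale = (hi - lo) / 255.0 or 1.0          # guard against hi == lo
    q = np.round((weights - lo) / scale).astype(np.uint8)
    return q, scale, lo                       # dequantize as q*scale + lo

def tiles(height, width, tile=128):
    """Tiling: split a feature map into blocks so a layer can run block
    by block, holding only one block's activations in memory at a time
    (a real convolution would also need overlapping halo pixels)."""
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            yield slice(r, min(r + tile, height)), slice(c, min(c + tile, width))

w = np.random.randn(256, 256).astype(np.float32)
q, scale, lo = quantize_uint8(prune_by_magnitude(w))
restored = q.astype(np.float32) * scale + lo  # approximate pruned weights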
The compression techniques are tested on DeepLab v3+, a neural network for semantic segmentation. Compared with the baselines used in other work on neural network compression, our baseline, DeepLab, has significantly fewer parameters. We then selected a compressed version of DeepLab and tested it on an Axis P3227-LV network camera with two different implementations: one using TensorFlow Lite, and one using a custom implementation of DeepLab, written from scratch, that employs a novel memory allocation algorithm.
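This record does not spell out the allocation algorithm, but the general idea behind such memory planners can be sketched: intermediate tensors whose lifetimes never overlap may share the same region of a single preallocated arena. Everything below (the function name and the greedy largest-first strategy) is an assumed illustration of that general idea, not the thesis's novel algorithm.

def plan_offsets(tensors):
    """tensors: list of (size, first_use, last_use) per intermediate
    tensor. Returns a byte offset for each tensor plus the total arena
    size. Greedy: place the largest tensors first, bumping each past
    any already-placed block that is alive at the same time."""
    order = sorted(range(len(tensors)), key=lambda i: -tensors[i][0])
    placed = []                                # (offset, size, first, last)
    offsets = [0] * len(tensors)
    for i in order:
        size, first, last = tensors[i]
        offset = 0
        for p_off, p_size, p_first, p_last in sorted(placed):
            alive_together = not (last < p_first or p_last < first)
            if alive_together and offset < p_off + p_size and p_off < offset + size:
                offset = p_off + p_size        # bump past the conflicting block
        placed.append((offset, size, first, last))
        offsets[i] = offset
    arena = max((off + sz for off, sz, _, _ in placed), default=0)
    return offsets, arena

# Example: tensors #0 and #2 are never alive at the same time, so the
# planner reuses offset 0 for both and the arena is 96 bytes, not 160.
print(plan_offsets([(64, 0, 1), (32, 1, 2), (64, 2, 3)]))  # ([0, 64, 0], 96)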
We find that memory usage is reduced by one third with pruning, by one half with quantization and by almost two thirds with our custom implementation. In total, by combining all tested compression techniques with our custom implementation, we reduced memory consumption from 170 MB (TensorFlow Lite) to 20 MB with only a minor loss of accuracy.
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/8992374
- author
- Liu, Haochen and Olsson, Erik
- supervisor
- organization
- course
- EDAM05 20191
- year
- 2019
- type
- H2 - Master's Degree (Two Years)
- subject
- keywords
- Neural networks, Semantic segmentation, Pruning, Quantization, Tiling, Memory allocation, DeepLab, MobileNet
- publication/series
- LU-CS-EX 2019-14
- report number
- LU-CS-EX 2019-14
- ISSN
- 1650-2884
- language
- English
- id
- 8992374
- date added to LUP
- 2019-09-16 11:00:10
- date last changed
- 2019-09-16 11:00:10
@misc{8992374,
  abstract = {{Convolutional neural networks (CNNs) have made rapid progress in recent years, and in fields such as computer vision they are considered state of the art. However, CNNs are computationally intensive, which makes them challenging to deploy on embedded devices such as smartphones, security cameras and cars. This thesis investigates different neural network compression techniques to determine which yields the lowest memory consumption with the smallest drop in accuracy. The techniques tested are pruning (selectively removing parts of the network), quantization (storing network parameters at lower precision) and tiling (splitting the dataflow inside the network). The compression techniques are tested on DeepLab v3+, a neural network for semantic segmentation. Compared with the baselines used in other work on neural network compression, our baseline, DeepLab, has significantly fewer parameters. We then selected a compressed version of DeepLab and tested it on an Axis P3227-LV network camera with two different implementations: one using TensorFlow Lite, and one using a custom implementation of DeepLab, written from scratch, that employs a novel memory allocation algorithm. We find that memory usage is reduced by one third with pruning, by one half with quantization and by almost two thirds with our custom implementation. In total, by combining all tested compression techniques with our custom implementation, we reduced memory consumption from 170 MB (TensorFlow Lite) to 20 MB with only a minor loss of accuracy.}},
  author = {{Liu, Haochen and Olsson, Erik}},
  issn = {{1650-2884}},
  language = {{eng}},
  note = {{Student Paper}},
  series = {{LU-CS-EX 2019-14}},
  title = {{Memory Efficient Semantic Segmentation for Embedded Systems}},
  year = {{2019}},
}