Memory Efficient Semantic Segmentation for Embedded Systems
(2019) In LU-CS-EX 2019-14, EDAM05 20191, Department of Computer Science
- Abstract
- Convolutional neural networks (CNNs) have made rapid progress in recent years, and in fields such as computer vision they are considered state of the art. However, CNNs are computationally intensive, which makes them challenging to deploy on embedded devices such as smartphones, security cameras and cars.
This thesis investigates different neural network compression techniques to determine which yields the lowest memory consumption with the smallest drop in accuracy. The techniques tested are pruning (selectively removing parts of the network), quantization (storing network parameters at lower precision) and tiling (splitting the dataflow inside the network).
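The three techniques can be illustrated in a few lines of NumPy. The sketch below is only illustrative and not the thesis's implementation; the function names, the 50% sparsity default and the uint8 scheme are assumptions made for the example.

import numpy as np

# Illustrative sketches of the three compression techniques
# (assumed details, not the thesis's code).

def prune_by_magnitude(weights, sparsity=0.5):
    """Pruning: zero out the `sparsity` fraction of weights with the
    smallest magnitudes; the zeros can then be stored compactly."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_uint8(weights):
    """Quantization: store float32 weights as uint8 plus a scale and
    offset, shrinking them 4x at the cost of some precision."""
    lo, hi = float(weights.min()), float(weights.max())
    scale = (hi - lo) / 255.0 or 1.0          # guard against hi == lo
    q = np.round((weights - lo) / scale).astype(np.uint8)
    return q, scale, lo                       # dequantize as q*scale + lo

def tiles(height, width, tile=128):
    """Tiling: split a feature map into blocks so a layer can run block
    by block, holding only one block's activations in memory at a time
    (a real convolution would also need overlapping halo pixels)."""
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            yield slice(r, min(r + tile, height)), slice(c, min(c + tile, width))

w = np.random.randn(256, 256).astype(np.float32)
q, scale, lo = quantize_uint8(prune_by_magnitude(w))
restored = q.astype(np.float32) * scale + lo  # approximate pruned weights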
The compression techniques are tested on DeepLab v3+, a neural network for semantic segmentation. Compared with the baselines used in other work on neural network compression, our baseline, DeepLab, has significantly fewer parameters. We then selected a compressed version of DeepLab and tested it on an Axis P3227-LV network camera with two different implementations: one using TensorFlow Lite, and one using a custom implementation of DeepLab, written from scratch, that employs a novel memory allocation algorithm.
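This record does not spell out the allocation algorithm, but the general idea behind such memory planners can be sketched: intermediate tensors whose lifetimes never overlap may share the same region of a single preallocated arena. Everything below (the function name and the greedy largest-first strategy) is an assumed illustration of that general idea, not the thesis's novel algorithm.

def plan_offsets(tensors):
    """tensors: list of (size, first_use, last_use) per intermediate
    tensor. Returns a byte offset for each tensor plus the total arena
    size. Greedy: place the largest tensors first, bumping each past
    any already-placed block that is alive at the same time."""
    order = sorted(range(len(tensors)), key=lambda i: -tensors[i][0])
    placed = []                                # (offset, size, first, last)
    offsets = [0] * len(tensors)
    for i in order:
        size, first, last = tensors[i]
        offset = 0
        for p_off, p_size, p_first, p_last in sorted(placed):
            alive_together = not (last < p_first or p_last < first)
            if alive_together and offset < p_off + p_size and p_off < offset + size:
                offset = p_off + p_size        # bump past the conflicting block
        placed.append((offset, size, first, last))
        offsets[i] = offset
    arena = max((off + sz for off, sz, _, _ in placed), default=0)
    return offsets, arena

# Example: tensors #0 and #2 are never alive at the same time, so the
# planner reuses offset 0 for both and the arena is 96 bytes, not 160.
print(plan_offsets([(64, 0, 1), (32, 1, 2), (64, 2, 3)]))  # ([0, 64, 0], 96)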
We find that memory usage is reduced by one third with pruning, by one half with quantization and by almost two thirds with our custom implementation. In total, by combining all tested compression techniques with our custom implementation, we reduced memory consumption from 170 MB (TensorFlow Lite) to 20 MB with only a minor loss of accuracy.
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/8992374
- author
- Liu, Haochen and Olsson, Erik
- supervisor
- organization
- course
- EDAM05 20191
- year
- 2019
- type
- H2 - Master's Degree (Two Years)
- subject
- keywords
- Neural networks, Semantic segmentation, Pruning, Quantization, Tiling, Memory allocation, DeepLab, MobileNet
- publication/series
- LU-CS-EX 2019-14
- report number
- LU-CS-EX 2019-14
- ISSN
- 1650-2884
- language
- English
- id
- 8992374
- date added to LUP
- 2019-09-16 11:00:10
- date last changed
- 2019-09-16 11:00:10
@misc{8992374,
  abstract = {{Convolutional neural networks (CNNs) have made rapid progress in recent years, and in fields such as computer vision they are considered state of the art. However, CNNs are computationally intensive, which makes them challenging to deploy on embedded devices such as smartphones, security cameras and cars. This thesis investigates different neural network compression techniques to determine which yields the lowest memory consumption with the smallest drop in accuracy. The techniques tested are pruning (selectively removing parts of the network), quantization (storing network parameters at lower precision) and tiling (splitting the dataflow inside the network). The compression techniques are tested on DeepLab v3+, a neural network for semantic segmentation. Compared with the baselines used in other work on neural network compression, our baseline, DeepLab, has significantly fewer parameters. We then selected a compressed version of DeepLab and tested it on an Axis P3227-LV network camera with two different implementations: one using TensorFlow Lite, and one using a custom implementation of DeepLab, written from scratch, that employs a novel memory allocation algorithm. We find that memory usage is reduced by one third with pruning, by one half with quantization and by almost two thirds with our custom implementation. In total, by combining all tested compression techniques with our custom implementation, we reduced memory consumption from 170 MB (TensorFlow Lite) to 20 MB with only a minor loss of accuracy.}},
  author = {{Liu, Haochen and Olsson, Erik}},
  issn = {{1650-2884}},
  language = {{eng}},
  note = {{Student Paper}},
  series = {{LU-CS-EX 2019-14}},
  title = {{Memory Efficient Semantic Segmentation for Embedded Systems}},
  year = {{2019}},
}