LUP Student Papers | Lund University Libraries

Accelerated Segmentation with Mixed-Precision Quantization of EfficientViT-SAM

Ryhede Bengtsson, Bernard and Bengs, Joel (2024)
Department of Automatic Control
Abstract
The wide adoption of machine learning (ML) models on resource-constrained devices for complex vision tasks is contingent on improved computational efficiency. Quantization is a tool for model compression in which weights and activations are approximated with low-precision datatypes to achieve a smaller model with reduced latency and memory usage. In this thesis, we seek to apply post-training mixed-precision quantization to the image encoder of EfficientViT-SAM, a state-of-the-art Segment Anything Model that produces segmentation masks for any object in an image.
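To make the compression mechanism concrete, here is a minimal sketch of uniform affine INT8 quantization, the standard scheme that post-training quantization tools typically implement; it is illustrative only and not taken from the thesis.

```python
# Minimal sketch of uniform affine INT8 quantization (illustrative; not the
# exact scheme used in the thesis). Floats are mapped to int8 via a scale and
# zero-point, then mapped back with some rounding error.
import numpy as np

def quantize_int8(x: np.ndarray):
    qmin, qmax = -128, 127
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    return scale * (q.astype(np.float32) - zero_point)

weights = np.random.randn(64, 64).astype(np.float32)
q, s, zp = quantize_int8(weights)
print("max abs rounding error:", np.abs(weights - dequantize(q, s, zp)).max())
```

The int8 tensor occupies a quarter of the memory of the float32 original, which is where the size and bandwidth savings come from; the cost is the rounding error introduced by the round trip.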
We study the architecture and latency profile of EfficientViT-SAM, and conduct experiments to understand how quantizing different architectural components, such as individual layers, affects accuracy. Results show that some convolution layers, in particular depth-wise convolutions, are difficult to quantize without degrading accuracy, while other components, such as the ReLU-based self-attention mechanism used in EfficientViT-SAM, are relatively robust under quantization.
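A sensitivity experiment of this kind can be pictured as the following hypothetical loop (the helper names are placeholders, not the thesis code): fake-quantize one convolution layer at a time and measure the accuracy drop against the full-precision baseline.

```python
# Hypothetical per-layer sensitivity sweep; not the thesis implementation.
# `eval_fn` is assumed to return a scalar accuracy metric (e.g. mIoU).
import copy
import torch
import torch.nn as nn

def fake_quant_weights(module: nn.Module) -> None:
    """Round-trip a layer's weights through a simulated INT8 representation."""
    w = module.weight.data
    scale = (w.max() - w.min()) / 255.0
    zp = torch.round(-128 - w.min() / scale)
    q = torch.clamp(torch.round(w / scale) + zp, -128, 127)
    module.weight.data = scale * (q - zp)

def layer_sensitivity(model: nn.Module, eval_fn) -> dict:
    baseline = eval_fn(model)
    drops = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            trial = copy.deepcopy(model)
            fake_quant_weights(dict(trial.named_modules())[name])
            drops[name] = baseline - eval_fn(trial)  # large drop => sensitive
    return drops
```

A sweep like this is one way to surface the pattern reported above, with sensitive layers such as depth-wise convolutions showing the largest drops.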
From these insights, we design and evaluate several quantization schemes that quantize only non-sensitive layers. More specifically, using the TensorRT optimization engine, we reduce the latency of EfficientViT-SAM-XL1 by 2.4% with a loss of 6.9% accuracy. Using a simulation framework, we reduce the size of EfficientViT-SAM-L0 by 31% with a loss of 9% accuracy. Our extensive experiments show that a desirable accuracy-latency trade-off is hard, if not infeasible, to reach through uniform quantization to INT8. Qualitatively, we found that the poorly performing models matched baseline performance in simpler scenarios but failed to predict complex segmentation masks.
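As a rough illustration of such a selective scheme, the sketch below marks sensitive layers to stay in higher precision when building a TensorRT engine. The skip-list keyword, the FP16 fallback, and the omission of INT8 calibration are all simplifying assumptions; this is not the configuration used in the thesis.

```python
# Hedged sketch of mixed-precision engine building with TensorRT: INT8 by
# default, with layers matching a (hypothetical) skip-list pinned to FP16.
# Calibration / Q-DQ insertion, normally required for INT8, is omitted.
import tensorrt as trt

def build_mixed_precision(builder: trt.Builder,
                          network: trt.INetworkDefinition,
                          sensitive_keywords=("dwconv",)):
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.INT8)
    config.set_flag(trt.BuilderFlag.FP16)
    # Make TensorRT honor the per-layer precision constraints set below.
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if any(k in layer.name for k in sensitive_keywords):
            layer.precision = trt.DataType.HALF  # keep sensitive layers out of INT8
    return builder.build_serialized_network(network, config)
```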
Please use this url to cite or link to this publication:
author: Ryhede Bengtsson, Bernard and Bengs, Joel
supervisor:
organization: Department of Automatic Control
year: 2024
type: H3 - Professional qualifications (4 Years - )
subject:
report number: TFRT-6250
other publication id: 0280-5316
language: English
id: 9174462
date added to LUP: 2024-09-16 08:48:23
date last changed: 2024-09-16 08:48:23
@misc{9174462,
  abstract     = {{The wide adoption of machine learning (ML) models on resource-constrained devices for complex vision tasks is contingent on improved computational efficiency. Quantization is a tool for model compression in which weights and activations are approximated with low-precision datatypes to achieve a smaller model with reduced latency and memory usage. In this thesis, we seek to apply post-training mixed-precision quantization to the image encoder of EfficientViT-SAM, a state-of-the-art Segment Anything Model that produces segmentation masks for any object in an image.
 We study the architecture and latency profile of EfficientViT-SAM, and conduct experiments to understand how quantizing different architectural components, such as individual layers, affects accuracy. Results show that some convolution layers, in particular depth-wise convolutions, are difficult to quantize without degrading accuracy, while other components, such as the ReLU-based self-attention mechanism used in EfficientViT-SAM, are relatively robust under quantization.
 From these insights, we design and evaluate several quantization schemes that quantize only non-sensitive layers. More specifically, using the TensorRT optimization engine, we reduce the latency of EfficientViT-SAM-XL1 by 2.4% with a loss of 6.9% accuracy. Using a simulation framework, we reduce the size of EfficientViT-SAM-L0 by 31% with a loss of 9% accuracy. Our extensive experiments show that a desirable accuracy-latency trade-off is hard, if not infeasible, to reach through uniform quantization to INT8. Qualitatively, we found that the poorly performing models matched baseline performance in simpler scenarios but failed to predict complex segmentation masks.}},
  author       = {{Ryhede Bengtsson, Bernard and Bengs, Joel}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Accelerated Segmentation with Mixed-Precision Quantization of EfficientViT-SAM}},
  year         = {{2024}},
}