
LUP Student Papers

LUND UNIVERSITY LIBRARIES

ASIC implementation exploration for EfficientNet optimizations

Fondelius, Filip LU and Davidsson, Albin LU (2024) EITM01 20241
Department of Electrical and Information Technology
Abstract
Neural networks are effective at solving many complex human-level tasks. One type of neural network often used in imaging tasks is the convolutional neural network (CNN). CNNs can effectively solve problems in areas like image recognition, object detection, and classification. Using a CNN for one of these tasks carries a huge computational cost relative to a typical algorithm. While the processors running the CNNs can handle the computational load, they get extremely hot and draw a lot of power.
This thesis explores different hardware techniques for optimizing a fixed convolutional neural network to decrease its power consumption while keeping an acceptable output result. One technique is precision scaling, where the precision of weights and tensors is reduced to a lower number of bits. This reduces the number of bits used both in calculations and in data transfer. Another technique is approximate computing: approximate circuits are designed with fewer gates and therefore draw less power.
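To make precision scaling concrete, the sketch below shows symmetric uniform quantization of a tensor to a chosen bit width. This is a minimal illustration under our own assumptions (the function name and the use of NumPy are ours), not the implementation used in the thesis.

    import numpy as np

    def quantize(x, bits):
        # Symmetric uniform quantization to 'bits' bits: map values to
        # integers in [-(2^(bits-1) - 1), 2^(bits-1) - 1] and back, so the
        # returned tensor carries the rounding error that a
        # reduced-precision datapath would introduce.
        qmax = 2 ** (bits - 1) - 1
        scale = np.max(np.abs(x)) / qmax
        if scale == 0:
            return x.copy()
        q = np.clip(np.round(x / scale), -qmax, qmax)
        return q * scale

    w = np.array([0.02, -0.51, 0.77, -0.13])
    print(quantize(w, 8))  # close to the original values
    print(quantize(w, 4))  # visibly coarser after dropping 4 bits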
In this thesis, an algorithm and a simulator were created to choose at which locations bits should be removed. By introducing several error metrics, such as mean squared error and F1 score, the results could be compared across metrics and a near-optimal solution could be found.
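One plausible shape for such an algorithm is a greedy search over per-layer bit widths, as sketched below. The layer list, the evaluate_f1 simulator callback, and the 0.8 threshold (taken from the result reported later in this abstract) are illustrative assumptions, not the thesis code.

    def greedy_bit_reduction(layers, evaluate_f1, min_f1=0.8, min_bits=2):
        # Start every layer at 8 bits and repeatedly remove one bit from
        # the layer whose reduction hurts the F1 score the least, stopping
        # when no single further reduction keeps the score acceptable.
        bits = {layer: 8 for layer in layers}
        while True:
            best = None
            for layer in layers:
                if bits[layer] <= min_bits:
                    continue
                trial = dict(bits, **{layer: bits[layer] - 1})
                f1 = evaluate_f1(trial)  # one simulator run per candidate
                if f1 >= min_f1 and (best is None or f1 > best[1]):
                    best = (layer, f1)
            if best is None:
                return bits  # per-layer bit widths of the found solution
            bits[best[0]] -= 1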
With our precision scaling method, we were able to save almost half the energy while maintaining an acceptable F1 score of 0.8. When evaluating approximate multipliers for the same task, the results were more difficult to interpret: the approximate multipliers work well in some cases and significantly worse in others.
Our results show that hardware techniques such as precision scaling and approximate circuits can be used to reduce power consumption.
Popular Abstract
In recent years, AI has become a powerful tool for solving complex tasks, with many different use cases. This thesis uses a convolutional neural network that takes an input image and outputs a class for each pixel in the image. Each pixel is assigned to one of three classes, based on which class it most likely belongs to. This is called semantic segmentation. The problem with using neural networks is that they are computationally heavy, which leads to high power consumption.
There are many ways to reduce the power consumption of neural networks. This thesis focuses on precision scaling, where the normally 8-bit weights and input tensors are reduced to fewer bits. Since fewer bits have to be transported to memory and used in computations, less power is consumed. Lowering the precision of the weights and input tensors affects the accuracy of the result. We have developed a method that non-uniformly reduces the bit precision while maintaining an acceptable result. Our method was evaluated with a simulation program, written in Python and C++, which was also developed during this thesis.
The thesis also discusses the possibility of replacing the multipliers in parts of the neural network with approximate multipliers. Approximate multipliers are circuits that behave like normal multipliers but consume less power at the cost of inexact outputs. There are many different approximate multipliers with different levels of error and power consumption.
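As a toy illustration of this trade-off, the function below approximates an unsigned multiply by zeroing the low-order bits of each operand before multiplying, a crude stand-in for the partial-product truncation used in real approximate multiplier designs. It is our own example, not one of the multipliers evaluated in the thesis.

    def truncated_multiply(a, b, drop_bits=2):
        # Zero the low 'drop_bits' bits of each operand. In hardware this
        # removes the partial-product rows for those bits, saving gates
        # and power at the cost of a bounded output error.
        mask = ~((1 << drop_bits) - 1)
        return (a & mask) * (b & mask)

    print(truncated_multiply(200, 173))  # 34400, approximate
    print(200 * 173)                     # 34600, exact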
Using only precision scaling, we were able to cut the power consumption in half.
author: Fondelius, Filip LU and Davidsson, Albin LU
course: EITM01 20241
year: 2024
type: H2 - Master's Degree (Two Years)
report number: LU/LTH-EIT 2024-1007
language: English
id: 9170114
date added to LUP: 2024-07-04 15:12:36
date last changed: 2024-07-04 15:12:36
@misc{9170114,
  abstract     = {{Neural networks are effective at solving many complex human-level tasks. One type of neural network often used in imaging tasks is the convolutional neural network (CNN). CNNs can effectively solve problems in areas like image recognition, object detection, and classification. Using a CNN for one of these tasks carries a huge computational cost relative to a typical algorithm. While the processors running the CNNs can handle the computational load, they get extremely hot and draw a lot of power.
This thesis explores different hardware techniques for optimizing a fixed convolutional neural network to decrease its power consumption while keeping an acceptable output result. One technique is precision scaling, where the precision of weights and tensors is reduced to a lower number of bits. This reduces the number of bits used both in calculations and in data transfer. Another technique is approximate computing: approximate circuits are designed with fewer gates and therefore draw less power.
In this thesis, an algorithm and a simulator were created to choose at which locations bits should be removed. By introducing several error metrics, such as mean squared error and F1 score, the results could be compared across metrics and a near-optimal solution could be found.
With our precision scaling method, we were able to save almost half the energy while maintaining an acceptable F1 score of 0.8. When evaluating approximate multipliers for the same task, the results were more difficult to interpret: the approximate multipliers work well in some cases and significantly worse in others.
Our results show that hardware techniques such as precision scaling and approximate circuits can be used to reduce power consumption.}},
  author       = {{Fondelius, Filip and Davidsson, Albin}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{ASIC implementation exploration for EfficientNet optimizations}},
  year         = {{2024}},
}