
LUP Student Papers

LUND UNIVERSITY LIBRARIES

A Study on Efficient Memory Utilization in Machine Learning and Memory Intensive Systems

Emmanuel, Jones LU (2021) EITM02 20211
Department of Electrical and Information Technology
Abstract
As neural networks find more and more practical applications on edge devices, the implementation of energy-efficient architectures is becoming crucial. Despite advances in process technology, the power and performance of memories remain a bottleneck for most computing platforms. The aim of this thesis is to study the effect of the breakdown structure of memories on power cost, with a focus on a dedicated hardware accelerator for neural network applications. The evaluation test suite of this study consists of a RISC-V based System-on-Chip (SoC), PULPissimo, integrated with an accelerator designed for a convolutional neural network (CNN) application. The memory organization of the CNN hardware accelerator is implemented as a flexible, configurable wrapper for studying different breakdown structures. Different optimization techniques are also applied to keep area and power costs within the defined design budget. As part of the study, multiple memory breakdown structures suitable for the accelerator's memory sub-system were analyzed for power consumption. The study reveals a non-linear increase in power consumption with the size of the static random-access memory (SRAM) modules, as well as the limits of memory partitioning when using SRAM. It also reveals the power and area limitations of D flip-flop based standard cell memory (SCM) in comparison with SRAM.
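The tradeoff the abstract describes, between few large SRAM macros (expensive per access) and many small ones (growing leakage and periphery overhead), can be illustrated with a back-of-the-envelope model. The Python sketch below is not code from the thesis; every coefficient in it is a made-up placeholder, and it only shows the shape of the exploration a configurable memory wrapper enables: sweep the number of banks for a fixed total capacity and compare a toy power estimate.

```python
# A minimal, purely illustrative sketch (not from the thesis): compares memory
# breakdown structures for a fixed total capacity under a toy power model.
# All coefficients (dyn_base, dyn_slope, leak_per_bit, periphery) are
# hypothetical placeholders; a real study would substitute per-macro figures
# from an SRAM compiler for the target process.

def sram_macro_power(words, width_bits,
                     dyn_base=1.0, dyn_slope=0.002,
                     leak_per_bit=1e-4, periphery=2.0):
    """Toy per-macro model: dynamic cost per access grows with macro depth
    (longer bitlines); leakage scales with stored bits plus a fixed
    periphery overhead per macro (decoders, sense amplifiers)."""
    dynamic = dyn_base + dyn_slope * words                    # per access, arbitrary units
    leakage = leak_per_bit * words * width_bits + periphery   # per macro, arbitrary units
    return dynamic, leakage

def breakdown_power(total_kib, banks, width_bits=32):
    """Power estimate for one breakdown structure: `banks` equal SRAM macros.
    Only the addressed bank toggles per access; every bank leaks."""
    total_words = total_kib * 1024 * 8 // width_bits
    dyn, leak = sram_macro_power(total_words // banks, width_bits)
    return dyn + leak * banks

if __name__ == "__main__":
    # Sweep breakdown structures of a hypothetical 64 KiB memory sub-system.
    for banks in (1, 2, 4, 8, 16):
        est = breakdown_power(total_kib=64, banks=banks)
        print(f"{banks:2d} x {64 // banks:2d} KiB banks -> estimated power {est:7.2f} (a.u.)")
```

With these placeholder numbers the estimate bottoms out at an intermediate bank count, mirroring the study's observation that both very large SRAM modules and excessive partitioning carry power penalties.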
Popular Abstract
The field of artificial intelligence is making rapid progress. It is finding more and more practical applications, such as image recognition, self-driving cars, healthcare, and speech recognition, that are relevant in day-to-day life. New smartphones are a prime example of devices capable of performing machine learning tasks with everyday uses. In 2017, Google released a smartphone app called Lens. Using the phone's camera and machine learning techniques, this app can perform many image recognition tasks, such as identifying plants or animals, reading and translating text, and scanning QR codes or barcodes. Another example is the voice assistants built into phones and smart speakers; their speech recognition systems also use machine learning to classify the words spoken by the user. For all these applications to work, however, a stable internet connection is necessary: the data is sent to powerful data centers in the cloud, which perform the demanding computations. With rising privacy concerns and bandwidth limitations, it is desirable to perform these tasks on the device itself instead of sending the data to a data center. This approach requires devices powerful enough to process complex machine learning algorithms, yet energy-efficient enough to still be powered by batteries.

The rapid battery drain of today's laptops, smartphones, smartwatches, and many other portable devices has several causes. One of them is the internal memory of such battery-powered devices: despite advances in technology and design architecture, memories consume a significant share of the energy budget. Hence, to achieve better battery life on such devices, an energy-efficient memory sub-system is important. This thesis analyzes different memory configurations to improve the energy efficiency of the memory sub-system in hardware designed for image classification using machine learning.
author: Emmanuel, Jones LU
supervisor:
organization: Department of Electrical and Information Technology
course: EITM02 20211
year: 2021
type: H2 - Master's Degree (Two Years)
subject:
keywords: Memory, Machine Learning, Hardware accelerator, Power, PULPissimo
report number: LU/LTH-EIT 2021-850
language: English
id: 9067280
date added to LUP: 2021-10-25 15:57:21
date last changed: 2021-10-25 15:57:21
@misc{9067280,
  author       = {{Emmanuel, Jones}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{A Study on Efficient Memory Utilization in Machine Learning and Memory Intensive Systems}},
  year         = {{2021}},
}