
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Memory Efficient Hardware Accelerator for CNN Inference

Castillo Mohedano, Sergio LU (2023) EITM02 20231
Department of Electrical and Information Technology
Abstract
Convolutional neural networks (CNNs) have gained popularity in recent years due to their ability to solve complex problems in areas such as image recognition, natural language processing, and speech recognition. However, the computational cost and memory requirements of CNNs are significant challenges for their widespread deployment, particularly in edge devices where power and area budgets are limited. To address these challenges, this thesis focuses on the design of a low-energy CNN inference accelerator using near-data processing (NDP), which is an approach to improve energy efficiency by bringing computation closer to data.

This thesis presents a design for a CNN inference accelerator that utilizes NDP to improve energy efficiency. The accelerator is designed to execute convolutional layers of the CNN with high throughput and low power consumption. It uses parallel processing and data reuse techniques to reduce the amount of data transferred between the memory and the accelerator. In addition, clock-gating is applied to reduce power consumption. At 200 MHz, it achieves a performance of 2.42 GOPS and an energy efficiency of 47.54 GOPS/W.
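The two reported figures imply a power budget, which is worth making explicit. A quick sanity check (assuming throughput and efficiency are both measured at the same 200 MHz operating point):

```python
# Implied power draw from the reported figures (illustrative check only).
throughput_gops = 2.42          # reported performance, GOPS
efficiency_gops_per_w = 47.54   # reported energy efficiency, GOPS/W

power_w = throughput_gops / efficiency_gops_per_w
print(f"Implied power: {power_w * 1e3:.1f} mW")  # ~50.9 mW
```

A power draw on the order of 50 mW is consistent with the edge-device budgets motivating the design.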

The accelerator is synthesized and simulated at gate-level to calculate its performance and energy consumption, and it is evaluated using the CIFAR-10 dataset. Overall, this thesis contributes to the field of CNN accelerators by providing a low-energy and high-performance design that could be used in edge devices for real-time CNN inference applications. The design can be further optimized and customized for specific use cases, and it provides a foundation for future research in the field of NDP and CNN accelerators.
Popular Abstract
Deep Neural Networks (DNNs) have achieved remarkable breakthroughs, demonstrating their potential to revolutionize the world we live in. One striking example is AlphaGo, a computer program developed by DeepMind and built on DNNs, which in 2016 achieved a historic victory against the world champion in the ancient and complex board game of Go. This accomplishment, among others, underscores the potential of DNNs to transform various fields and bring innovative solutions to real-world problems.

Convolutional Neural Networks (CNNs), a subset of DNNs, are used in today’s world in many applications such as image classification, speech recognition, and natural language processing. CNNs have achieved significant milestones in the past years, such as surpassing human-level accuracy in complex image classification benchmarks such as ImageNet.

The design of efficient CNN accelerators is crucial to meet the increasing demand for real-time and energy-efficient processing. In this thesis, the design of a CNN accelerator for inference is proposed, which takes advantage of the Row-Stationary dataflow. This dataflow technique improves performance by reusing data and distributing it to Processing Elements across a Network On Chip, bringing data closer to the computation units.
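The reuse pattern behind the Row-Stationary dataflow can be sketched in a few lines. In the minimal model below (an illustration, not the thesis's actual implementation), each pairing of a filter row with an input row stands in for one Processing Element: the filter row stays stationary in the PE, each input row is reused by every PE that needs it, and per-row partial sums accumulate into output rows.

```python
def conv1d_valid(x, w):
    # One PE's job in a Row-Stationary dataflow: its filter row w stays
    # resident while a sliding window of the input row x streams past.
    return [sum(xi * wi for xi, wi in zip(x[i:i + len(w)], w))
            for i in range(len(x) - len(w) + 1)]

def conv2d_row_stationary(ifmap, kernel):
    # Each output row is the sum of per-row 1-D convolutions. Input row
    # r + k is reused by every PE holding filter row k, which is the
    # data-reuse pattern the dataflow exploits to cut memory traffic.
    R = len(kernel)
    out = []
    for r in range(len(ifmap) - R + 1):
        acc = None
        for k in range(R):
            partial = conv1d_valid(ifmap[r + k], kernel[k])
            acc = partial if acc is None else [a + p for a, p in zip(acc, partial)]
        out.append(acc)
    return out

# A 4x4 input with a 3x3 all-ones kernel yields a 2x2 map of window sums.
out = conv2d_row_stationary(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    [[1, 1, 1]] * 3,
)
# out == [[54, 63], [90, 99]]
```

In a hardware realization these per-row operations run in parallel on the PE array, with the Network On Chip multicasting each input row to the PEs that consume it.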

The design is validated using the CIFAR-10 benchmark, which is a popular dataset used to evaluate CNN performance. The proposed design uses fixed-point quantization, which reduces memory usage and computation complexity while maintaining acceptable accuracy levels.
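The fixed-point format itself is simple to sketch. The bit widths below (8-bit signed words with 6 fractional bits) are illustrative assumptions, not the widths used in the thesis:

```python
def quantize(x, frac_bits=6, word_bits=8):
    # Map a real value to a signed fixed-point code: scale by
    # 2**frac_bits, round, and saturate to the word's signed range.
    scale = 1 << frac_bits
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, round(x * scale)))

def dequantize(q, frac_bits=6):
    # Recover the approximate real value from its fixed-point code.
    return q / (1 << frac_bits)

# A weight of 0.4 becomes code 26, i.e. 0.40625 after dequantizing, so
# the rounding error stays under one least-significant bit (1/64).
assert dequantize(quantize(0.4)) == 0.40625
```

Narrow integer words like these cut both storage and multiplier cost relative to floating point, which is the trade-off the abstract refers to.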

Overall, this thesis presents a step toward a reliable, efficient, and flexible accelerator. The proposed design addresses the challenges associated with CNN acceleration, such as high computational cost and the memory bottleneck, while improving energy efficiency.
author: Castillo Mohedano, Sergio LU
supervisor:
organization:
course: EITM02 20231
year: 2023
type: H2 - Master's Degree (Two Years)
subject:
report number: LU/LTH-EIT 2023-917
language: English
id: 9117003
date added to LUP: 2023-05-30 11:26:55
date last changed: 2023-05-30 11:27:43
@misc{9117003,
  abstract     = {{Convolutional neural networks (CNNs) have gained popularity in recent years due to their ability to solve complex problems in areas such as image recognition, natural language processing, and speech recognition. However, the computational cost and memory requirements of CNNs are significant challenges for their widespread deployment, particularly in edge devices where power and area budgets are limited. To address these challenges, this thesis focuses on the design of a low-energy CNN inference accelerator using near-data processing (NDP), which is an approach to improve energy efficiency by bringing computation closer to data.

This thesis presents a design for a CNN inference accelerator that utilizes NDP to improve energy efficiency. The accelerator is designed to execute convolutional layers of the CNN with high throughput and low power consumption. It uses parallel processing and data reuse techniques to reduce the amount of data transferred between the memory and the accelerator. In addition, clock-gating is applied to reduce power consumption. At 200 MHz, it achieves a performance of 2.42 GOPS and an energy efficiency of 47.54 GOPS/W.

The accelerator is synthesized and simulated at gate-level to calculate its performance and energy consumption, and it is evaluated using the CIFAR-10 dataset. Overall, this thesis contributes to the field of CNN accelerators by providing a low-energy and high-performance design that could be used in edge devices for real-time CNN inference applications. The design can be further optimized and customized for specific use cases, and it provides a foundation for future research in the field of NDP and CNN accelerators.}},
  author       = {{Castillo Mohedano, Sergio}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Memory Efficient Hardware Accelerator for CNN Inference}},
  year         = {{2023}},
}