
DSP Design With Hardware Accelerator For Convolutional Neural Networks

Hille, Julian and Santos Ferreira, Lucas (2019) EITM02 20182
Department of Electrical and Information Technology
Abstract
Convolutional Neural Networks impressed the world in 2012 by reaching state-of-the-art accuracy in the ImageNet Large Scale Visual Recognition Challenge. The era of machine learning has arrived, and with it countless applications ranging from autonomous driving to unstructured robotic manipulation. The computational complexity of these networks has grown exponentially in recent years, demanding new, highly efficient, low-power hardware architectures capable of executing them.
In this work, we performed optimization at three levels of hardware design: the algorithmic, system, and accelerator levels. The design of a DSP with Tensilica tools and the integration of Xenergic dual-port SRAMs, providing direct memory access for a convolution hardware accelerator, led to a four-orders-of-magnitude speed-up of the initially identified bottleneck, yielding an estimated threefold overall speed-up for classifying a single handwritten-digit image compared to the pure software implementation. Higher speed-ups are expected for deeper convolutional architectures and larger image dimensions, since the convolution hardware accelerator scales with linear time complexity, in contrast to conventional non-linear software-based approaches.
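The complexity argument above can be illustrated with a minimal sketch of direct (naive) convolution, the kind of nested software loop that typically dominates CNN inference time. This is an illustrative sketch in Python, not the thesis's actual C/C++ or accelerator code; the function name and MAC counting are ours.

```python
def conv2d_naive(image, kernel):
    """Naive direct 2D convolution (valid padding), as a pure software
    implementation might compute it. Returns the output feature map and
    the number of multiply-accumulate (MAC) operations performed."""
    H, W = len(image), len(image[0])
    K = len(kernel)
    out_h, out_w = H - K + 1, W - K + 1
    macs = 0
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):          # slide the kernel over every
        for j in range(out_w):      # output position...
            acc = 0.0
            for ki in range(K):     # ...and accumulate K*K products
                for kj in range(K): # at each position
                    acc += image[i + ki][j + kj] * kernel[ki][kj]
                    macs += 1
            out[i][j] = acc
    return out, macs

# For a 28x28 MNIST image and a 3x3 kernel, one filter already costs
# (28-3+1)^2 * 3^2 = 676 * 9 = 6084 MACs, and the cost grows with both
# image area and kernel size -- the non-linear scaling a dedicated
# streaming accelerator avoids.
```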
Popular Abstract
Artificial Intelligence is used in more and more new technologies, from voice recognition in mobile phones to autonomous cars and modern industry, and the "intelligence" requirements of such applications keep increasing. Today, many of these computations are offloaded to the cloud; in mobile phones, for example, voice assistants only work with an internet connection. The same approach is not possible for autonomously controlled vehicles, whose essential control features have to run inside the vehicle itself.

We therefore need to bring Artificial Intelligence into mobile devices. This thesis implements a benchmark classification problem (MNIST) using a programmable processor, designed with a commercial tool, and a flexible hardware accelerator that speeds up a convolutional neural network recognizing handwritten digits between 0 and 9.
To this end, we designed and trained a reference architecture in the programming language Python, from which the weights were obtained to implement the same architecture on the designed processor (in C/C++). By profiling the most resource-consuming functions, we found that the convolution has the highest computational cost. Hence the accelerator was implemented, connected directly to the processor, and matching instructions were added. The results show a speed-up of four orders of magnitude on the identified bottleneck, yielding an estimated threefold overall speed-up for classifying a single handwritten-digit image compared to a pure software implementation on the same processor. Additionally, an open-source processor alternative is proposed.
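The gap between the bottleneck speed-up (four orders of magnitude) and the overall speed-up (about threefold) is what Amdahl's law predicts when only part of the runtime is accelerated. A back-of-the-envelope sketch, assuming convolution accounted for roughly two thirds of total runtime (our illustrative figure, not one reported in the thesis):

```python
def amdahl_speedup(fraction, local_speedup):
    """Overall speed-up when only `fraction` of the original runtime
    is accelerated by `local_speedup` (Amdahl's law)."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# If convolution took ~2/3 of the runtime and is sped up 10^4 times,
# the whole classification speeds up by roughly 3x: the remaining 1/3
# of unaccelerated software now dominates.
print(round(amdahl_speedup(2 / 3, 1e4), 2))  # ≈ 3.0
```

This is why deeper networks and larger images, where convolution makes up a larger fraction of total work, are expected to benefit more from the accelerator.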
author
Hille, Julian and Santos Ferreira, Lucas
supervisor
organization
course
EITM02 20182
year
type
H2 - Master's Degree (Two Years)
subject
keywords
CNN, Hardware, Accelerator, DSP, Convolutional, Convolution, Neural, Network, Processor, Tensilica, FIR, Memory, SRAM
report number
LU/LTH-EIT 2019-686
language
English
id
8973041
date added to LUP
2019-03-27 11:46:33
date last changed
2019-03-27 11:46:33
@misc{8973041,
  abstract     = {Convolutional Neural Networks impressed the world in 2012 by reaching state-of-the-art accuracy in the ImageNet Large Scale Visual Recognition Challenge. The era of machine learning has arrived, and with it countless applications ranging from autonomous driving to unstructured robotic manipulation. The computational complexity of these networks has grown exponentially in recent years, demanding new, highly efficient, low-power hardware architectures capable of executing them.
In this work, we performed optimization at three levels of hardware design: the algorithmic, system, and accelerator levels. The design of a DSP with Tensilica tools and the integration of Xenergic dual-port SRAMs, providing direct memory access for a convolution hardware accelerator, led to a four-orders-of-magnitude speed-up of the initially identified bottleneck, yielding an estimated threefold overall speed-up for classifying a single handwritten-digit image compared to the pure software implementation. Higher speed-ups are expected for deeper convolutional architectures and larger image dimensions, since the convolution hardware accelerator scales with linear time complexity, in contrast to conventional non-linear software-based approaches.},
  author       = {Hille, Julian and Santos Ferreira, Lucas},
  keyword      = {CNN,Hardware,Accelerator,DSP,Convolutional,Convolution,Neural,Network,Processor,Tensilica,FIR,Memory,SRAM},
  language     = {eng},
  note         = {Student Paper},
  title        = {DSP Design With Hardware Accelerator For Convolutional Neural Networks},
  year         = {2019},
}