Advanced

Implementation of an 8-bit Dynamic Fixed-Point Convolutional Neural Network for Human Sign Language Recognition on a Xilinx FPGA Board

Núñez-Prieto, Ricardo LU (2019) EITM02 20181
Department of Electrical and Information Technology
Abstract
The goal of this thesis work is to implement a convolutional neural network on an FPGA device with the capability of recognising human sign language. The set of gestures that the neural network can identify has been taken from the Swedish sign language, and it consists of the signs used for representing the letters of the Swedish alphabet (a.k.a. fingerspelling).
The motivation driving this project lies in the tremendous interest aroused by neural networks in recent years for its ability for solving complex problems and its capacity to learn by example. More specifically, convolutional neural networks are being extensively used for image classification, and this project aims to design a hardware accelerator to compute the convolutional... (More)
The goal of this thesis work is to implement a convolutional neural network on an FPGA device with the capability of recognising human sign language. The set of gestures that the neural network can identify has been taken from the Swedish sign language, and it consists of the signs used for representing the letters of the Swedish alphabet (a.k.a. fingerspelling).
The motivation driving this project lies in the tremendous interest aroused by neural networks in recent years for its ability for solving complex problems and its capacity to learn by example. More specifically, convolutional neural networks are being extensively used for image classification, and this project aims to design a hardware accelerator to compute the convolutional layers of such type of network topology and test its accuracy and performance when dealing with human sign language.
The network topology of choice is Zynqnet, proposed by Gschwend in 2016, which is a topology that has already been implemented successfully on an FPGA platform and it has been trained with the large picture dataset provided by ImageNet, for its popular image recognition contest. In this regard, the aim of this work is not to propose a new neural network topology but to re-use an existent one by introducing some improvements like the utilisation of an 8-bit dynamic fixed-point scheme and challenge it with a different but related task, like human sign language recognition.
The methodology followed to carry out a successful hardware implementation has consisted, first, of the installation and setup of a reliable framework used for the training of the neural network. Different frameworks were tried out, like MATLAB or Caffe, but finally, DIGITS from NVIDIA was the more convenient due to its graphical environment and because it provides all the compatibility and drivers needed to run together with the GPU used in this project. Then, an image dataset of more than 13,000 pictures of hand gestures has been built up to grant enough input data for the framework to fine-tune ZynqNet for the new task, i.e. to provide the neural network with the ability to classify the different hand-signs into its corresponding alphabet letter. In parallel, the Register-Transfer Level (RTL) abstraction of the hardware architecture has been generated using a High-Level Synthesis tool chain, in which the algorithmic descriptions are written in C/C++. Finally, the validation of the design has been done by means of co-simulation techniques where the golden data obtained with the C test bench is compared with the output data of the RTL implementation, and all of it within the simulation environment provided by the Vivado Design Suite.
As a result, the best-performing obtained solution achieved an accuracy of 80.1\% in the inference test and a frame rate of 6.4 FPS with a clock frequency of 250 MHz. (Less)
Popular Abstract
Neural networks are becoming more and more ubiquitous in our everyday lives, many times in ways that we do not even realise. Artificial Intelligence (AI) is extensively used nowadays to improve the user's experience with digital technologies, for instance, the on-the-fly translation service provided by Skype and Google. In other fields like robotics, improvements in object detection from the hand of machine learning, allow robots and autonomous cars to take better decisions. And in medicine, automatic detection of blood diseases like leukaemia and lymphoma, powered by neural networks algorithms, have accelerated and improved diagnosis. So the list of applications found in many diverse fields goes on and on.
Now, let's picture yourself in... (More)
Neural networks are becoming more and more ubiquitous in our everyday lives, many times in ways that we do not even realise. Artificial Intelligence (AI) is extensively used nowadays to improve the user's experience with digital technologies, for instance, the on-the-fly translation service provided by Skype and Google. In other fields like robotics, improvements in object detection from the hand of machine learning, allow robots and autonomous cars to take better decisions. And in medicine, automatic detection of blood diseases like leukaemia and lymphoma, powered by neural networks algorithms, have accelerated and improved diagnosis. So the list of applications found in many diverse fields goes on and on.
Now, let's picture yourself in the hypothetical situation in which you have a friend or a relative who has been born with a hearing impairment. This person has been taught sign language from an early age on a specialised school, and you would like to learn sign language too so you both can have meaningful and pleasant communication. Furthermore, you want to be able to help this person in day-to-day situations where deaf people can be in clear disadvantage like a routine visit to the doctor or administrative processes.
You learn from one of your classmates at the university about a mobile app which employs a deep neural network to recognise human sign language just by using the phone camera. The application translates the captured sequence of gestures from video to written text in real-time and automatically reproduces the message in the phone speaker. It also includes a sign language tutorial which can help you to rapidly learn to communicate by using your hands. The software records your gestures and improves your learning abilities by telling how accurate are your movements.
Well, the situation just described is something that I believe it is not far to happen. The computational power found in embedded systems such as mobile phones is growing by the day. So far, at least, one can find mostly solutions that work one-way, that is, they convert speech into sign language, by mapping spoken language to signs and using a virtual animated human-like avatar. Fewer solutions can translate sign language into speech. Some of them use special gloves with position sensors not so pleasant to wear by the signer person, and others use 3D cameras that can pick the speaker's body gestures and then compare the obtained frame sequence with a reference frame stored in a dictionary. These solutions though, rely on mapping methods and are limited in terms of the number of gestures they can interpret.
I really believe that neural networks are a game changer and they will bring powerful, elegant solutions to the kind of problems described above. This thesis work aims to provide proof-of-concept of a feasible implementation of a neural network trained for recognition of human sign language. The number of gestures to recognise is limited to the Swedish alphabet, and the network must show an acceptable level of prediction accuracy, throughput and area utilisation.
The content of this thesis is addressed to a variety of public: from people interested in neural networks in general, and the significant development experienced in the field in recent years, to people interested in learning the basic concepts and the methodology for training neural networks, or practitioners who look to implement a deep learning model on an AI accelerator. (Less)
Please use this url to cite or link to this publication:
author
Núñez-Prieto, Ricardo LU
supervisor
organization
course
EITM02 20181
year
type
H2 - Master's Degree (Two Years)
subject
keywords
Artificial Intelligence, Computer Vision, Machine Learning, Convolutional Neural Networks, FPGA, Sign Language Recognition
report number
LU/LTH-EIT 2019-688
language
English
id
8973088
date added to LUP
2019-03-25 15:52:42
date last changed
2019-03-25 15:52:42
@misc{8973088,
  abstract     = {The goal of this thesis work is to implement a convolutional neural network on an FPGA device with the capability of recognising human sign language. The set of gestures that the neural network can identify has been taken from the Swedish sign language, and it consists of the signs used for representing the letters of the Swedish alphabet (a.k.a. fingerspelling).
The motivation driving this project lies in the tremendous interest aroused by neural networks in recent years for its ability for solving complex problems and its capacity to learn by example. More specifically, convolutional neural networks are being extensively used for image classification, and this project aims to design a hardware accelerator to compute the convolutional layers of such type of network topology and test its accuracy and performance when dealing with human sign language.
The network topology of choice is Zynqnet, proposed by Gschwend in 2016, which is a topology that has already been implemented successfully on an FPGA platform and it has been trained with the large picture dataset provided by ImageNet, for its popular image recognition contest. In this regard, the aim of this work is not to propose a new neural network topology but to re-use an existent one by introducing some improvements like the utilisation of an 8-bit dynamic fixed-point scheme and challenge it with a different but related task, like human sign language recognition.
The methodology followed to carry out a successful hardware implementation has consisted, first, of the installation and setup of a reliable framework used for the training of the neural network. Different frameworks were tried out, like MATLAB or Caffe, but finally, DIGITS from NVIDIA was the more convenient due to its graphical environment and because it provides all the compatibility and drivers needed to run together with the GPU used in this project. Then, an image dataset of more than 13,000 pictures of hand gestures has been built up to grant enough input data for the framework to fine-tune ZynqNet for the new task, i.e. to provide the neural network with the ability to classify the different hand-signs into its corresponding alphabet letter. In parallel, the Register-Transfer Level (RTL) abstraction of the hardware architecture has been generated using a High-Level Synthesis tool chain, in which the algorithmic descriptions are written in C/C++. Finally, the validation of the design has been done by means of co-simulation techniques where the golden data obtained with the C test bench is compared with the output data of the RTL implementation, and all of it within the simulation environment provided by the Vivado Design Suite.
As a result, the best-performing obtained solution achieved an accuracy of 80.1\% in the inference test and a frame rate of 6.4 FPS with a clock frequency of 250 MHz.},
  author       = {Núñez-Prieto, Ricardo},
  keyword      = {Artificial Intelligence,Computer Vision,Machine Learning,Convolutional Neural Networks,FPGA,Sign Language Recognition},
  language     = {eng},
  note         = {Student Paper},
  title        = {Implementation of an 8-bit Dynamic Fixed-Point Convolutional Neural Network for Human Sign Language Recognition on a Xilinx FPGA Board},
  year         = {2019},
}