
Gated Recurrent Unit Neural Networks for Hearing Instruments

Sharma, Harshit LU and Rajanna, Pallavi LU (2020) EITM02 20192
Department of Electrical and Information Technology
Abstract
Gated Recurrent Unit (GRU) neural networks have gained popularity for applications such as keyword spotting, speech recognition, and other artificial intelligence applications. Typically, training and inference are performed on cloud servers, and the results are transferred to the power-constrained device, e.g., a hearing instrument (HI). This approach has disadvantages such as latency and connectivity issues, privacy concerns, and a high energy cost per bit for real-time data transfer. Therefore, there is a strong demand to move inference from the cloud to power-constrained devices. However, executing inference on an HI introduces many challenges in terms of throughput, power budget, and memory footprint. This research investigates how efficient it is to execute inference on a dedicated hardware accelerator rather than on an existing audio digital signal processor (the xDSP in Oticon's HI).
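
For context, a GRU layer computes an update gate and a reset gate at every time step and blends the previous hidden state with a candidate state. Below is a minimal floating-point sketch of the standard GRU equations in Python; the weight names (Wz, Uz, and so on) are illustrative conventions, not identifiers from the thesis.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gru_cell(x, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
        """One GRU time step (standard formulation, floating point)."""
        z = sigmoid(Wz @ x + Uz @ h_prev + bz)             # update gate
        r = sigmoid(Wr @ x + Ur @ h_prev + br)             # reset gate
        h_cand = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)  # candidate state
        return (1.0 - z) * h_prev + z * h_cand             # blended new state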

The two approaches are compared in terms of area, power, energy dissipation, and the total clock cycles required to perform an inference. A straightforward hardware implementation of the nonlinear activation functions is expensive, so different approximation methods were evaluated; among them, the fast sigmoid and fast tanh approaches were chosen. A pretrained keyword spotting (KWS) model was used, but it exceeds the memory space available on the xDSP. Instead, three small GRU networks were trained and executed on the xDSP to estimate the energy dissipation and clock cycle count had a larger network been run on it.
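
The abstract does not spell out the exact form of the chosen approximations; a common family of "fast" activations replaces the exponential with a softsign-style rational function, which needs only an absolute value, an addition, and a division. A sketch under that assumption:

    import numpy as np

    def fast_tanh(x):
        # Softsign-style approximation of tanh: no exponential, one divide.
        return x / (1.0 + np.abs(x))

    def fast_sigmoid(x):
        # The same rational form rescaled into the (0, 1) range of a sigmoid.
        return 0.5 * fast_tanh(x) + 0.5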

The precision used to store and compute data was reduced to minimize storage while keeping detection accuracy in mind. By reducing the wordlength of the network parameters from 32 bits to 8 bits, the required memory space was reduced by a factor of four while accuracy decreased from 91% to 88%. The GRU inference runs on a per-layer basis, and the data flow was optimized to achieve a significant reduction in area and power.
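
As an illustration of this wordlength reduction, the sketch below quantizes 32-bit floating-point parameters to 8-bit integers with a symmetric per-tensor scale. This is a common scheme chosen here only for illustration, since the abstract does not specify the thesis's exact fixed-point format; storing 8 bits instead of 32 is what yields the 4x memory saving.

    import numpy as np

    def quantize_int8(w):
        """Symmetric per-tensor quantization of float32 weights to int8.
        Assumes w is not all zeros."""
        scale = float(np.max(np.abs(w))) / 127.0   # map largest |weight| to 127
        q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
        return q, scale                            # keep scale to dequantize

    def dequantize(q, scale):
        # int8 -> float32; 8-bit storage is 4x smaller than 32-bit.
        return q.astype(np.float32) * scale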

The xDSP needs around 2× more clock cycles than the dedicated hardware accelerator to complete a full network inference for a benchmark keyword spotting neural network. Energy dissipation increases by around 10× when using Oticon's xDSP processor instead of the dedicated accelerator. The xDSP is capable of executing GRU networks with up to 40 neurons per layer, but for bigger networks the hardware accelerator is a better solution. All in all, the dedicated accelerator has the best performance among the explored solutions and can be integrated into an HI to compute neural networks.
Popular Abstract
Artificial intelligence is becoming a huge part of our lives, being used in mobile phones, smartwatches, home entertainment systems, and more. However, because even a simple task requires a large amount of computation, most of this processing is done on cloud servers. Despite breakthroughs in artificial intelligence, there are serious limitations in terms of power and energy efficiency that need to be addressed.

In 2019, an artificial intelligence (AI) computer program known as AlphaStar, built by Google's AI firm DeepMind, played the science-fiction video game StarCraft II on European servers. The AI competed against 90,000 players and placed within the top 0.15%. DeepMind had previously built world-leading AIs that play chess and Go. However, the estimated power consumption of these AIs is on the order of megawatts, whereas the human brain consumes only about 20 watts. This means that AI needs to become far more efficient before it can be fully integrated into our daily lives.

AI has gained popularity in speech recognition technology, where it can recognize spoken words that are then converted to text or used to perform tasks. A subset of speech recognition is keyword spotting, where a task is performed after a keyword is identified in the input voice signal. Companies such as Facebook, Amazon, Microsoft, Google, and Apple have already integrated this feature into various devices through services like Google Home, Amazon Echo, and Siri.

With this in mind, the goal of this thesis has been to select a pretrained keyword spotting model and propose an efficient dedicated hardware accelerator to perform this task. The spotting of spoken keywords is performed using a GRU, an advanced type of recurrent neural network. To evaluate the scalable and efficient hardware accelerator design, it was compared with an existing audio digital signal processor used in Oticon's hearing instruments. This research addresses the problem of high power consumption and the large memory footprint that restrict the use of large-scale neural networks on power-constrained devices. It also addresses the issue of privacy, i.e., the sharing of data with cloud servers.

The proposed dedicated hardware accelerator can be integrated into an HI to compute neural networks.
author
Sharma, Harshit LU and Rajanna, Pallavi LU
supervisor
organization
Department of Electrical and Information Technology
alternative title
Återkommande grindenhet neuronnät för hörapparater
course
EITM02 20192
year
2020
type
H2 - Master's Degree (Two Years)
subject
keywords
Keyword Spotting, GRU, Gated Recurrent Unit, RNN, Hearing Instruments
report number
LU/LTH-EIT 2020-774
language
English
id
9022675
date added to LUP
2020-06-26 11:05:30
date last changed
2020-06-27 03:40:22
@misc{9022675,
  author       = {{Sharma, Harshit and Rajanna, Pallavi}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Gated Recurrent Unit Neural Networks for Hearing Instruments}},
  year         = {{2020}},
}