LUP Student Papers

LUND UNIVERSITY LIBRARIES

A Hardware Accelerated Low Power DSP for Recurrent Neural Networks

Andersen, Allan and Yaman, Ilayda (2020) EITM02 20201
Department of Electrical and Information Technology
Abstract
Recurrent neural networks (RNNs) have become a dominant approach for processing sequential data such as speech and audio. The reason is the high accuracy that can be achieved with the more complex variants, such as the gated recurrent unit (GRU). This makes them very attractive for speech recognition in digital-assistant and voice-control applications. However, their high power consumption and large memory requirements make them less suitable for battery-powered devices. In this work, we have designed a system on a chip (SoC) for efficient processing of GRU networks, consisting of an optimized digital signal processor (DSP) integrated with a hardware accelerator. To deal with the large memory requirements and high power consumption, several optimization techniques have been applied. A 75% reduction in required memory is achieved, while the system processes real-time speech data with an energy consumption of 7.79 μJ per classification. In 28 nm CMOS technology the area is 0.686 mm². The design is programmable and scalable, which allows execution of different network sizes.
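For reference, the GRU mentioned in the abstract updates its hidden state through two gates per time step. Below is a minimal sketch of a standard GRU cell in its generic floating-point formulation — not the thesis's optimized fixed-point dataflow, and all weight names here are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One GRU time step (textbook formulation, for illustration only)."""
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)               # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde              # new hidden state
```

Each time step costs three matrix-vector products against the input and three against the previous hidden state, which is the workload a hardware accelerator for GRU inference must handle efficiently.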
Popular Abstract
New advancements in AI techniques and designs come to light every day, but serious limitations must be addressed before they can become part of everyday life. In March 2016, Lee Sedol, a professional top-ranking Go player with 18 world championship titles, lost a match to an AI-based computer program called AlphaGo. This victory (or defeat, depending on how you look at it) was a game changer and a milestone for the AI community in many ways. Yet, according to estimates, AlphaGo consumed about 1 megawatt of power, whereas the human brain consumes only 20 watts [1], meaning the human brain is 50,000 times more efficient than the famous computer program.

Another area where AI is becoming a main player is recognition tasks such as image or speech recognition. Speech recognition systems are already common in daily life, and many people have interacted with them through "Apple Siri", "Google Home", or other kinds of speech recognition systems. Many of these systems use machine learning methods to classify the words spoken by humans. Machine learning focuses on developing computer programs that can learn on their own, and deep neural networks are a subset of machine learning. Deep neural networks can be specialized for tasks such as detecting keywords in speech. Within deep learning, recurrent neural networks are more appealing than other techniques because of the high accuracy they achieve on speech recognition tasks. Yet, the high power consumption and the huge amount of data needed to complete the calculations restrict the usage of this technique. Many platforms used today simply do these calculations in the cloud, meaning the data on the device is sent over the internet to large data centers where it is processed. This has many drawbacks, such as extra power consumption, latency, cloud dependency, and security concerns.

In our thesis, we explore different design techniques to achieve a low-power, high-efficiency, and scalable design for speech recognition. The reference model we use is based on the gated recurrent unit (GRU), an advanced recurrent neural network, and is used to detect a spoken keyword such as "Hey Siri" or "Okay Google". To implement a scalable yet highly efficient design, we optimized a digital signal processor based on an Xtensa processor by Cadence Design Systems, and accelerated the computation and optimized the memory usage with a hardware accelerator.
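The abstract's 75% memory reduction happens to match the ratio obtained when 32-bit floating-point weights are stored as 8-bit values. Whether the thesis uses exactly this scheme is not stated on this page, so the following is only an illustrative sketch of symmetric linear quantization, a common technique for shrinking network weights:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric linear quantization: map the float32 range [-max|w|, max|w|]
    # onto int8 codes in [-127, 127]. Illustrative only; the thesis's
    # actual optimization scheme may differ.
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original weights.
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).standard_normal((64, 64)).astype(np.float32)
q, s = quantize_int8(w)
print(1.0 - q.nbytes / w.nbytes)  # 0.75 -> weights take 75% less memory
```

Storing one byte per weight instead of four gives exactly the 4:1 (75%) storage saving, at the cost of a small, bounded rounding error per weight.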
author
Andersen, Allan and Yaman, Ilayda
supervisor
organization
alternative title
Low Power DSP for RNNs with GRU
course
EITM02 20201
year
type
H2 - Master's Degree (Two Years)
subject
keywords
Digital Signal Processor, Hardware Accelerator, Machine Learning, Deep Learning, Recurrent Neural Networks, Gated Recurrent Unit
report number
LU/LTH-EIT 2020-771
language
English
id
9021697
date added to LUP
2020-06-26 10:33:36
date last changed
2020-06-26 10:33:36
@misc{9021697,
  abstract     = {{Recurrent neural networks (RNNs) have become a dominant approach for processing sequential data such as speech and audio. The reason is the high accuracy that can be achieved with the more complex variants, such as the gated recurrent unit (GRU). This makes them very attractive for speech recognition in digital-assistant and voice-control applications. However, their high power consumption and large memory requirements make them less suitable for battery-powered devices. In this work, we have designed a system on a chip (SoC) for efficient processing of GRU networks, consisting of an optimized digital signal processor (DSP) integrated with a hardware accelerator. To deal with the large memory requirements and high power consumption, several optimization techniques have been applied. A 75% reduction in required memory is achieved, while the system processes real-time speech data with an energy consumption of 7.79 μJ per classification. In 28 nm CMOS technology the area is 0.686 mm². The design is programmable and scalable, which allows execution of different network sizes.}},
  author       = {{Andersen, Allan and Yaman, Ilayda}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{A Hardware Accelerated Low Power DSP for Recurrent Neural Networks}},
  year         = {{2020}},
}