Lund University Publications

FLoPAD-GRU: A Flexible, Low Power, Accelerated DSP for Gated Recurrent Unit Neural Network

Yaman, Ilayda LU; Andersen, Allan; Ferreira, Lucas LU and Rodrigues, Joachim LU (2021) 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, SBCCI 2021. In Proceedings - 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, SBCCI 2021
Abstract

Recurrent neural networks (RNNs) are efficient for classification of sequential data such as speech and audio due to their high precision on tasks. However, power efficiency, the required memory capacity and bandwidth requirements make them less suitable for battery powered devices. In this work, we introduce FLoPAD-GRU: a system on a chip (SoC) for efficient processing of gated recurrent unit (GRU) networks, that consists of a digital signal processor (DSP), supplemented with an optimized hardware accelerator, which reduces memory accesses and cost. The system is programmable and scalable, which allows for execution of different network sizes. Synthesized in 28 nm CMOS technology, real-time classification is achieved at 4 MHz, with an energy dissipation of 4.1 pJ/classification, an improvement of 15× compared to a pure DSP realization. The memory requirements are reduced by 75%, which results in a silicon area of 0.7 mm² for the entire SoC.
author
Yaman, Ilayda; Andersen, Allan; Ferreira, Lucas and Rodrigues, Joachim
publishing date
2021
type
Chapter in Book/Report/Conference proceeding
publication status
published
subject
keywords
Deep Learning, Digital Signal Processor, GRU, Hardware Accelerator, RNN, SoC, Speech Recognition
host publication
Proceedings - 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, SBCCI 2021
series title
Proceedings - 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, SBCCI 2021
publisher
IEEE - Institute of Electrical and Electronics Engineers Inc.
conference name
34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, SBCCI 2021
conference location
Campinas, Brazil
conference dates
2021-08-23 - 2021-08-27
external identifiers
  • scopus:85116259648
ISBN
9781665421706
DOI
10.1109/SBCCI53441.2021.9529981
language
English
LU publication?
yes
additional info
Publisher Copyright: © 2021 IEEE.
id
c06936c3-eedd-43f4-b4d7-3b1210d09bc8
date added to LUP
2021-10-26 13:46:10
date last changed
2022-04-27 05:16:09
@inproceedings{c06936c3-eedd-43f4-b4d7-3b1210d09bc8,
  abstract     = {{Recurrent neural networks (RNNs) are efficient for classification of sequential data such as speech and audio due to their high precision on tasks. However, power efficiency, the required memory capacity and bandwidth requirements make them less suitable for battery powered devices. In this work, we introduce FLoPAD-GRU: a system on a chip (SoC) for efficient processing of gated recurrent unit (GRU) networks, that consists of a digital signal processor (DSP), supplemented with an optimized hardware accelerator, which reduces memory accesses and cost. The system is programmable and scalable, which allows for execution of different network sizes. Synthesized in 28 nm CMOS technology, real-time classification is achieved at 4 MHz, with an energy dissipation of 4.1 pJ/classification, an improvement of 15× compared to a pure DSP realization. The memory requirements are reduced by 75%, which results in a silicon area of 0.7 mm² for the entire SoC.}},
  author       = {{Yaman, Ilayda and Andersen, Allan and Ferreira, Lucas and Rodrigues, Joachim}},
  booktitle    = {{Proceedings - 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, SBCCI 2021}},
  isbn         = {{9781665421706}},
  keywords     = {{Deep Learning; Digital Signal Processor; GRU; Hardware Accelerator; RNN; SoC; Speech Recognition}},
  language     = {{eng}},
  month        = {{08}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  series       = {{Proceedings - 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, SBCCI 2021}},
  title        = {{FLoPAD-GRU: A Flexible, Low Power, Accelerated DSP for Gated Recurrent Unit Neural Network}},
  url          = {{http://dx.doi.org/10.1109/SBCCI53441.2021.9529981}},
  doi          = {{10.1109/SBCCI53441.2021.9529981}},
  year         = {{2021}},
}