FLoPAD-GRU: A Flexible, Low Power, Accelerated DSP for Gated Recurrent Unit Neural Network

Yaman, Ilayda; Andersen, Allan; Ferreira, Lucas; Rodrigues, Joachim

Yaman, Ilayda; Andersen, Allan; Ferreira, Lucas and Rodrigues, Joachim (2021). 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, SBCCI 2021. In: Proceedings - 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, SBCCI 2021

Abstract: Recurrent neural networks (RNNs) are efficient for classification of sequential data such as speech and audio due to their high precision on tasks. However, power efficiency, the required memory capacity and bandwidth requirements make them less suitable for battery powered devices. In this work, we introduce FLoPAD-GRU: a system on a chip (SoC) for efficient processing of gated recurrent unit (GRU) networks, that consists of a digital signal processor (DSP), supplemented with an optimized hardware accelerator, which reduces memory accesses and cost. The system is programmable and scalable, which allows for execution of different network sizes. Synthesized in 28 nm CMOS technology, real-time classification is achieved at 4 MHz, with an energy dissipation of 4.1 pJ/classification, an improvement of 15× compared to a pure DSP realization. The memory requirements are reduced by 75 %, which results in a silicon area of 0.7 mm² for the entire SoC.

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/c06936c3-eedd-43f4-b4d7-3b1210d09bc8

author

Yaman, Ilayda (LU); Andersen, Allan; Ferreira, Lucas (LU) and Rodrigues, Joachim (LU)

organization

Integrated Electronic Systems

publishing date

2021-08-23

type

Chapter in Book/Report/Conference proceeding

publication status

published

subject

Computer Sciences

keywords

Deep Learning, Digital Signal Processor, GRU, Hardware Accelerator, RNN, SoC, Speech Recognition

host publication

Proceedings - 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, SBCCI 2021

series title

Proceedings - 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, SBCCI 2021

publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.

conference name

34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, SBCCI 2021

conference location

Campinas, Brazil

conference dates

2021-08-23 - 2021-08-27

external identifiers

scopus:85116259648

ISBN

9781665421706

DOI

10.1109/SBCCI53441.2021.9529981

language

English

LU publication?

yes

additional info

id

c06936c3-eedd-43f4-b4d7-3b1210d09bc8

date added to LUP

2021-10-26 13:46:10

date last changed

2025-04-21 00:19:50

@inproceedings{c06936c3-eedd-43f4-b4d7-3b1210d09bc8,
  abstract     = {{Recurrent neural networks (RNNs) are efficient for classification of sequential data such as speech and audio due to their high precision on tasks. However, power efficiency, the required memory capacity and bandwidth requirements make them less suitable for battery powered devices. In this work, we introduce FLoPAD-GRU: a system on a chip (SoC) for efficient processing of gated recurrent unit (GRU) networks, that consists of a digital signal processor (DSP), supplemented with an optimized hardware accelerator, which reduces memory accesses and cost. The system is programmable and scalable, which allows for execution of different network sizes. Synthesized in 28 nm CMOS technology, real-time classification is achieved at 4 MHz, with an energy dissipation of 4.1 pJ/classification, an improvement of 15× compared to a pure DSP realization. The memory requirements are reduced by 75\%, which results in a silicon area of 0.7 mm$^2$ for the entire SoC.}},
  author       = {{Yaman, Ilayda and Andersen, Allan and Ferreira, Lucas and Rodrigues, Joachim}},
  booktitle    = {{Proceedings - 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, SBCCI 2021}},
  isbn         = {{9781665421706}},
  keywords     = {{Deep Learning; Digital Signal Processor; GRU; Hardware Accelerator; RNN; SoC; Speech Recognition}},
  language     = {{eng}},
  month        = {{08}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  series       = {{Proceedings - 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, SBCCI 2021}},
  title        = {{FLoPAD-GRU: A Flexible, Low Power, Accelerated DSP for Gated Recurrent Unit Neural Network}},
  url          = {{http://dx.doi.org/10.1109/SBCCI53441.2021.9529981}},
  doi          = {{10.1109/SBCCI53441.2021.9529981}},
  year         = {{2021}},
}
