FLoPAD-GRU : A Flexible, Low Power, Accelerated DSP for Gated Recurrent Unit Neural Network
(2021) In Proceedings - 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, SBCCI 2021
- Abstract
Recurrent neural networks (RNNs) are efficient for classification of sequential data such as speech and audio due to their high accuracy on such tasks. However, their power consumption, memory capacity, and bandwidth requirements make them less suitable for battery-powered devices. In this work, we introduce FLoPAD-GRU: a system on chip (SoC) for efficient processing of gated recurrent unit (GRU) networks, consisting of a digital signal processor (DSP) supplemented with an optimized hardware accelerator that reduces memory accesses and cost. The system is programmable and scalable, allowing execution of different network sizes. Synthesized in 28 nm CMOS technology, the design achieves real-time classification at 4 MHz with an energy dissipation of 4.1 pJ/classification, a 15× improvement over a pure DSP realization. The memory requirements are reduced by 75 %, resulting in a silicon area of 0.7 mm² for the entire SoC.
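For context, a GRU cell in its common textbook form computes the following gates and state update; the exact variant, fixed-point format, and bias handling used in FLoPAD-GRU are not specified in this abstract, so the equations below are only the standard formulation:

\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) && \text{(update gate)} \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) && \text{(reset gate)} \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) && \text{(candidate state)} \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(hidden-state update)}
\end{aligned}

The per-timestep cost is dominated by the matrix-vector products with the weight matrices $W$ and $U$, which is why reducing weight-memory accesses, as the accelerator described above does, translates directly into energy savings.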
- author
- Yaman, Ilayda LU ; Andersen, Allan ; Ferreira, Lucas LU and Rodrigues, Joachim LU
- organization
- publishing date
- 2021-08-23
- type
- Chapter in Book/Report/Conference proceeding
- publication status
- published
- subject
- keywords
- Deep Learning, Digital Signal Processor, GRU, Hardware Accelerator, RNN, SoC, Speech Recognition
- host publication
- Proceedings - 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, SBCCI 2021
- series title
- Proceedings - 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, SBCCI 2021
- publisher
- IEEE - Institute of Electrical and Electronics Engineers Inc.
- conference name
- 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, SBCCI 2021
- conference location
- Campinas, Brazil
- conference dates
- 2021-08-23 - 2021-08-27
- external identifiers
-
- scopus:85116259648
- ISBN
- 9781665421706
- DOI
- 10.1109/SBCCI53441.2021.9529981
- language
- English
- LU publication?
- yes
- additional info
- Publisher Copyright: © 2021 IEEE.
- id
- c06936c3-eedd-43f4-b4d7-3b1210d09bc8
- date added to LUP
- 2021-10-26 13:46:10
- date last changed
- 2022-04-27 05:16:09
@inproceedings{c06936c3-eedd-43f4-b4d7-3b1210d09bc8,
  abstract   = {{Recurrent neural networks (RNNs) are efficient for classification of sequential data such as speech and audio due to their high accuracy on such tasks. However, their power consumption, memory capacity, and bandwidth requirements make them less suitable for battery-powered devices. In this work, we introduce FLoPAD-GRU: a system on chip (SoC) for efficient processing of gated recurrent unit (GRU) networks, consisting of a digital signal processor (DSP) supplemented with an optimized hardware accelerator that reduces memory accesses and cost. The system is programmable and scalable, allowing execution of different network sizes. Synthesized in 28 nm CMOS technology, the design achieves real-time classification at 4 MHz with an energy dissipation of 4.1 pJ/classification, a 15× improvement over a pure DSP realization. The memory requirements are reduced by 75 %, resulting in a silicon area of 0.7 mm² for the entire SoC.}},
  author     = {{Yaman, Ilayda and Andersen, Allan and Ferreira, Lucas and Rodrigues, Joachim}},
  booktitle  = {{Proceedings - 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, SBCCI 2021}},
  isbn       = {{9781665421706}},
  keywords   = {{Deep Learning; Digital Signal Processor; GRU; Hardware Accelerator; RNN; SoC; Speech Recognition}},
  language   = {{eng}},
  month      = {{08}},
  publisher  = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  series     = {{Proceedings - 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, SBCCI 2021}},
  title      = {{FLoPAD-GRU : A Flexible, Low Power, Accelerated DSP for Gated Recurrent Unit Neural Network}},
  url        = {{http://dx.doi.org/10.1109/SBCCI53441.2021.9529981}},
  doi        = {{10.1109/SBCCI53441.2021.9529981}},
  year       = {{2021}},
}