A scalable all-digital near-memory computing architecture for edge AIoT applications
(2025) In IEEE Access 13. p.108609-108625
- Abstract
- With the growing need to process large volumes of data, edge computing near data collection sources has become increasingly important. However, the resource constraints of edge devices require more efficient data processing techniques. Near-memory computing (NMC) presents an efficient solution, especially for data-intensive applications, by enabling processing that is both energy-efficient and hardware optimized. This work introduces a platform-agnostic NMC architecture tailored for convolutional neural network (CNN) workloads, integrated into the shared cache memory subsystem of a microcontroller unit (MCU). An open-source RISC-V MCU is chosen as the target platform due to its flexibility and low-power architecture. The NMC co-processor, operating alongside the general-purpose RISC-V core, forms a multi-core system-on-chip that combines low hardware cost with high energy efficiency, while maintaining a high degree of flexibility. The proposed design offers a configurable architecture capable of processing a wide range of CNN models with a computational efficiency of 94%. For evaluation purposes, widely recognized CNN benchmark models are utilized, showing a performance of 96 GOPS and an energy efficiency of 1828 GOPS/W for 8-bit precision at 200 MHz. These results represent a significant improvement over both highly customized state-of-the-art hardware accelerators and multi-core MCU solutions.
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/7fdf1a9f-50c6-4377-b804-a4e32d123bc8
- author
- Nouripayam, Masoud LU ; Prieto, Arturo LU and Rodrigues, Joachim LU
- publishing date
- 2025-06
- type
- Contribution to journal
- publication status
- published
- in
- IEEE Access
- volume
- 13
- pages
- 17 pages
- publisher
- IEEE - Institute of Electrical and Electronics Engineers Inc.
- external identifiers
- scopus:105009081799
- ISSN
- 2169-3536
- DOI
- 10.1109/ACCESS.2025.3582013
- language
- English
- LU publication?
- yes
- id
- 7fdf1a9f-50c6-4377-b804-a4e32d123bc8
- date added to LUP
- 2025-10-28 19:05:52
- date last changed
- 2025-11-04 15:51:58
@article{7fdf1a9f-50c6-4377-b804-a4e32d123bc8,
abstract = {{With the growing need to process large volumes of data, edge computing near data collection sources has become increasingly important. However, the resource constraints of edge devices require more efficient data processing techniques. Near-memory computing (NMC) presents an efficient solution, especially for data-intensive applications, by enabling processing that is both energy-efficient and hardware optimized. This work introduces a platform-agnostic NMC architecture tailored for convolutional neural network (CNN) workloads, integrated into the shared cache memory subsystem of a microcontroller unit (MCU). An open-source RISC-V MCU is chosen as the target platform due to its flexibility and low-power architecture. The NMC co-processor, operating alongside the general-purpose RISC-V core, forms a multi-core system-on-chip that combines low hardware cost with high energy efficiency, while maintaining a high degree of flexibility. The proposed design offers a configurable architecture capable of processing a wide range of CNN models with a computational efficiency of 94%. For evaluation purposes, widely recognized CNN benchmark models are utilized, showing a performance of 96 GOPS and an energy efficiency of 1828 GOPS/W for 8-bit precision at 200 MHz. These results represent a significant improvement over both highly customized state-of-the-art hardware accelerators and multi-core MCU solutions.}},
author = {{Nouripayam, Masoud and Prieto, Arturo and Rodrigues, Joachim}},
issn = {{2169-3536}},
language = {{eng}},
pages = {{108609--108625}},
publisher = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
journal = {{IEEE Access}},
title = {{A scalable all-digital near-memory computing architecture for edge AIoT applications}},
url = {{http://dx.doi.org/10.1109/ACCESS.2025.3582013}},
doi = {{10.1109/ACCESS.2025.3582013}},
volume = {{13}},
year = {{2025}},
}