A scalable all-digital near-memory computing architecture for edge AIoT applications

Nouripayam, Masoud; Prieto, Arturo; Rodrigues, Joachim

Nouripayam, Masoud; Prieto, Arturo and Rodrigues, Joachim (2025) In IEEE Access, vol. 13, pp. 108609-108625

Abstract: With the growing need to process large volumes of data, edge computing near data collection sources has become increasingly important. However, the resource constraints of edge devices require more efficient data processing techniques. Near-memory computing (NMC) presents an efficient solution, especially for data-intensive applications, by enabling processing that is both energy-efficient and hardware optimized. This work introduces a platform-agnostic NMC architecture tailored for convolutional neural network (CNN) workloads, integrated into the shared cache memory subsystem of a microcontroller unit (MCU). An open-source RISC-V MCU is chosen as the target platform due to its flexibility and low-power architecture. The NMC co-processor, operating alongside the general-purpose RISC-V core, forms a multi-core system-on-chip that combines low hardware cost with high energy efficiency, while maintaining a high degree of flexibility. The proposed design offers a configurable architecture capable of processing a wide range of CNN models with a computational efficiency of 94%.
For evaluation purposes, widely recognized CNN benchmark models are utilized, showing a performance of 96 GOPS and an energy efficiency of 1828 GOPS/W for 8-bit precision at 200 MHz. These results represent a significant improvement over both highly customized state-of-the-art hardware accelerators and multi-core MCU solutions.
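The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below derives the implied average power, energy per operation, and operations per clock cycle from the three reported numbers (96 GOPS, 1828 GOPS/W, 200 MHz). It assumes all three figures describe the same operating point, and it optionally assumes that the 94% computational efficiency is the ratio of achieved to peak throughput for that same workload; neither assumption is stated in the record. The helper `derived_metrics` and the constant names are hypothetical, and the outputs are back-of-the-envelope estimates, not values reported by the authors.

```python
from typing import Dict, Optional

# Figures as quoted in the abstract. Assumption: all three describe the same
# operating point (8-bit precision, 200 MHz) on the same benchmark workload.
REPORTED_GOPS = 96.0          # performance, giga-operations per second
REPORTED_GOPS_PER_W = 1828.0  # energy efficiency, GOPS/W
CLOCK_MHZ = 200.0             # clock frequency, MHz


def derived_metrics(gops: float, gops_per_w: float, clock_mhz: float,
                    utilization: Optional[float] = None) -> Dict[str, float]:
    """Hypothetical helper: back-of-the-envelope figures implied by the inputs."""
    metrics = {
        # GOPS / (GOPS/W) = W, scaled to mW
        "power_mw": gops / gops_per_w * 1e3,
        # 1 / (GOPS/W) = nJ per op, scaled to pJ per op
        "energy_pj_per_op": 1e3 / gops_per_w,
        # (gops * 1e9 ops/s) / (clock_mhz * 1e6 cycles/s)
        "ops_per_cycle": gops * 1e3 / clock_mhz,
    }
    if utilization is not None:
        # Assumption: utilization = achieved throughput / peak throughput
        metrics["peak_gops"] = gops / utilization
        metrics["peak_ops_per_cycle"] = metrics["ops_per_cycle"] / utilization
    return metrics


# Assumption: the 94% computational efficiency applies to the 96 GOPS figure.
metrics = derived_metrics(REPORTED_GOPS, REPORTED_GOPS_PER_W, CLOCK_MHZ,
                          utilization=0.94)

print(f"implied average power: {metrics['power_mw']:.1f} mW")          # 52.5 mW
print(f"implied energy per op: {metrics['energy_pj_per_op']:.3f} pJ")  # 0.547 pJ
print(f"ops per clock cycle:   {metrics['ops_per_cycle']:.0f}")        # 480
print(f"implied peak:          {metrics['peak_gops']:.1f} GOPS")       # 102.1 GOPS
```

These estimates depend on how operations are counted (a common convention counts one multiply-accumulate as two operations), which the record does not specify.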

Please use this URL to cite or link to this publication: https://lup.lub.lu.se/record/7fdf1a9f-50c6-4377-b804-a4e32d123bc8

author

Nouripayam, Masoud; Prieto, Arturo and Rodrigues, Joachim

publishing date

2025-06

type

Contribution to journal

publication status

published

in

IEEE Access

volume

13

pages

17 pages

publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.

external identifiers

scopus:105009081799

ISSN

2169-3536

DOI

10.1109/ACCESS.2025.3582013

language

English

LU publication?

yes

id

7fdf1a9f-50c6-4377-b804-a4e32d123bc8

date added to LUP

2025-10-28 19:05:52

date last changed

2025-11-04 15:51:58

@article{7fdf1a9f-50c6-4377-b804-a4e32d123bc8,
  abstract     = {{With the growing need to process large volumes of data, edge computing near data collection sources has become increasingly important. However, the resource constraints of edge devices require more efficient data processing techniques. Near-memory computing (NMC) presents an efficient solution, especially for data-intensive applications, by enabling processing that is both energy-efficient and hardware optimized. This work introduces a platform-agnostic NMC architecture tailored for convolutional neural network (CNN) workloads, integrated into the shared cache memory subsystem of a microcontroller unit (MCU). An open-source RISC-V MCU is chosen as the target platform due to its flexibility and low-power architecture. The NMC co-processor, operating alongside the general-purpose RISC-V core, forms a multi-core system-on-chip that combines low hardware cost with high energy efficiency, while maintaining a high degree of flexibility. The proposed design offers a configurable architecture capable of processing a wide range of CNN models with a computational efficiency of 94%. For evaluation purposes, widely recognized CNN benchmark models are utilized, showing a performance of 96GOPS and an energy efficiency of 1828GOPS/W for 8-bit precision at 200MHz. These results represent a significant improvement over both highly customized state-of-the-art hardware accelerators and multi-core MCU solutions.}},
  author       = {{Nouripayam, Masoud and Prieto, Arturo and Rodrigues, Joachim}},
  issn         = {{2169-3536}},
  language     = {{eng}},
  pages        = {{108609--108625}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  journal      = {{IEEE Access}},
  title        = {{A scalable all-digital near-memory computing architecture for edge AIoT applications}},
  url          = {{http://dx.doi.org/10.1109/ACCESS.2025.3582013}},
  doi          = {{10.1109/ACCESS.2025.3582013}},
  volume       = {{13}},
  year         = {{2025}},
}

Lund University Publications

LUND UNIVERSITY LIBRARIES

A scalable all-digital near-memory computing architecture for edge AIoT applications