Lund University Publications

A scalable all-digital near-memory computing architecture for edge AIoT applications

Nouripayam, Masoud LU ; Prieto, Arturo LU and Rodrigues, Joachim LU (2025) In IEEE Access 13. p.108609-108625
Abstract
With the growing need to process large volumes of data, edge computing near data collection sources has become increasingly important. However, the resource constraints of edge devices require more efficient data processing techniques. Near-memory computing (NMC) presents an efficient solution, especially for data-intensive applications, by enabling processing that is both energy-efficient and hardware optimized. This work introduces a platform-agnostic NMC architecture tailored for convolutional neural network (CNN) workloads, integrated into the shared cache memory subsystem of a microcontroller unit (MCU). An open-source RISC-V MCU is chosen as the target platform due to its flexibility and low-power architecture. The NMC co-processor, operating alongside the general-purpose RISC-V core, forms a multi-core system-on-chip that combines low hardware cost with high energy efficiency, while maintaining a high degree of flexibility. The proposed design offers a configurable architecture capable of processing a wide range of CNN models with a computational efficiency of 94%. For evaluation purposes, widely recognized CNN benchmark models are utilized, showing a performance of 96 GOPS and an energy efficiency of 1828 GOPS/W for 8-bit precision at 200 MHz. These results represent a significant improvement over both highly customized state-of-the-art hardware accelerators and multi-core MCU solutions.
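The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes that the 96 GOPS throughput, the 1828 GOPS/W energy efficiency, and the 200 MHz clock all describe the same operating point, which the abstract does not state explicitly. The function names are hypothetical. The 94% computational efficiency is left out because the abstract does not define how it is measured.

```python
# Back-of-the-envelope figures derived from the abstract's headline numbers.
# Assumption (not confirmed by the source): all three values refer to the
# same operating point (8-bit precision, 200 MHz).
THROUGHPUT_GOPS = 96.0           # from the abstract
EFFICIENCY_GOPS_PER_W = 1828.0   # from the abstract
CLOCK_MHZ = 200.0                # from the abstract


def implied_power_mw(throughput_gops: float, gops_per_watt: float) -> float:
    """Power in milliwatts implied by throughput divided by energy efficiency."""
    return throughput_gops / gops_per_watt * 1000.0


def ops_per_cycle(throughput_gops: float, clock_mhz: float) -> float:
    """Average number of operations per clock cycle at the given clock rate."""
    return (throughput_gops * 1e9) / (clock_mhz * 1e6)


if __name__ == "__main__":
    power = implied_power_mw(THROUGHPUT_GOPS, EFFICIENCY_GOPS_PER_W)
    per_cycle = ops_per_cycle(THROUGHPUT_GOPS, CLOCK_MHZ)
    print(f"Implied power: {power:.1f} mW")
    print(f"Operations per cycle: {per_cycle:.0f}")
```

Under these assumptions the numbers imply a power draw of roughly 52.5 mW and about 480 operations per clock cycle. The paper itself remains the authoritative source for measured power.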
author
Nouripayam, Masoud ; Prieto, Arturo and Rodrigues, Joachim
publishing date
2025
type
Contribution to journal
publication status
published
in
IEEE Access
volume
13
pages
17 pages
publisher
IEEE - Institute of Electrical and Electronics Engineers Inc.
external identifiers
  • scopus:105009081799
ISSN
2169-3536
DOI
10.1109/ACCESS.2025.3582013
language
English
LU publication?
yes
id
7fdf1a9f-50c6-4377-b804-a4e32d123bc8
date added to LUP
2025-10-28 19:05:52
date last changed
2025-11-04 15:51:58
@article{7fdf1a9f-50c6-4377-b804-a4e32d123bc8,
  abstract     = {{With the growing need to process large volumes of data, edge computing near data collection sources has become increasingly important. However, the resource constraints of edge devices require more efficient data processing techniques. Near-memory computing (NMC) presents an efficient solution, especially for data-intensive applications, by enabling processing that is both energy-efficient and hardware optimized. This work introduces a platform-agnostic NMC architecture tailored for convolutional neural network (CNN) workloads, integrated into the shared cache memory subsystem of a microcontroller unit (MCU). An open-source RISC-V MCU is chosen as the target platform due to its flexibility and low-power architecture. The NMC co-processor, operating alongside the general-purpose RISC-V core, forms a multi-core system-on-chip that combines low hardware cost with high energy efficiency, while maintaining a high degree of flexibility. The proposed design offers a configurable architecture capable of processing a wide range of CNN models with a computational efficiency of 94%. For evaluation purposes, widely recognized CNN benchmark models are utilized, showing a performance of 96GOPS and an energy efficiency of 1828GOPS/W for 8-bit precision at 200MHz. These results represent a significant improvement over both highly customized state-of-the-art hardware accelerators and multi-core MCU solutions.}},
  author       = {{Nouripayam, Masoud and Prieto, Arturo and Rodrigues, Joachim}},
  issn         = {{2169-3536}},
  language     = {{eng}},
  pages        = {{108609--108625}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  journal      = {{IEEE Access}},
  title        = {{A scalable all-digital near-memory computing architecture for edge AIoT applications}},
  url          = {{http://dx.doi.org/10.1109/ACCESS.2025.3582013}},
  doi          = {{10.1109/ACCESS.2025.3582013}},
  volume       = {{13}},
  year         = {{2025}},
}