LUP Student Papers

LUND UNIVERSITY LIBRARIES

40nm CMOS Ultra-Low-Power Keyword Spotting Hardware Design

Fan, Tingyi LU and Sun, Peihao (2026) EITM02 20261
Department of Electrical and Information Technology
Abstract
Keyword spotting (KWS) is an essential function in always-on voice-interface systems, where a low-power detector continuously monitors audio streams and activates subsequent speech-processing modules only when a target keyword is detected.
This thesis presents the design and evaluation of a 40 nm CMOS ultra-low-power KWS accelerator that integrates a Mel-Frequency Cepstral Coefficient (MFCC) front-end with a Depthwise Separable Convolutional Neural Network (DSCNN) back-end. To reduce hardware overhead, we optimize the front-end processing and the back-end neural network individually and also co-design the complete processing chain. The proposed architecture adopts fixed-point arithmetic, shared multiplier resources, finite-state-machine-based scheduling, and on-chip SRAM buffering.
The design is implemented and evaluated in a 40 nm HVT CMOS technology at a 1.1 V supply voltage. Post-synthesis results (TT, 1.1 V, 25 °C) show that the complete KWS accelerator achieves an active-window average power of Pavg = 1220.44 μW during MFCC extraction and DSCNN inference. After incorporating duty-cycled execution over a 100 MHz reference clock, the long-term average power drops to Ptotal = 32.757 μW. The total synthesized cell area is 132,731.94 μm², of which SRAM macros occupy 117,658.50 μm² (88.6%). The proposed design demonstrates the feasibility and practical value of integrating MFCC feature extraction and DSCNN classification into a compact, ultra-low-power ASIC for always-on keyword spotting.
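The area and power figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the thesis. The constant and function names are hypothetical. The duty-cycle bound assumes a simple two-state model, Ptotal = D · Pavg + (1 − D) · Pidle with Pidle ≥ 0; the abstract does not state the idle power or the actual duty cycle, so the result is only an upper bound under that assumption.

```python
# Figures quoted in the abstract (post-synthesis, TT, 1.1 V, 25 °C).
P_ACTIVE_UW = 1220.44        # active-window average power Pavg, in µW
P_LONG_TERM_UW = 32.757      # duty-cycled long-term average power Ptotal, in µW
TOTAL_AREA_UM2 = 132_731.94  # total synthesized cell area, in µm²
SRAM_AREA_UM2 = 117_658.50   # area occupied by SRAM macros, in µm²


def sram_share(sram_area: float, total_area: float) -> float:
    """Fraction of the total cell area occupied by SRAM macros."""
    return sram_area / total_area


def max_duty_cycle(p_long_term: float, p_active: float) -> float:
    """Upper bound on the active fraction D.

    Assumes (hypothetically) P_long = D * P_active + (1 - D) * P_idle
    with P_idle >= 0, which gives D <= P_long / P_active.
    """
    return p_long_term / p_active


share = sram_share(SRAM_AREA_UM2, TOTAL_AREA_UM2)
non_sram_area = TOTAL_AREA_UM2 - SRAM_AREA_UM2
duty_bound = max_duty_cycle(P_LONG_TERM_UW, P_ACTIVE_UW)

print(f"SRAM share of cell area: {share:.1%}")
print(f"Non-SRAM cell area: {non_sram_area:.2f} um^2")
print(f"Long-term / active power ratio: {duty_bound:.2%}")
```

The SRAM share reproduces the 88.6% stated in the abstract, leaving roughly 15,073 μm² for the remaining cells. The long-term power is about 2.7% of the active-window power, which under the stated assumption bounds the fraction of time the accelerator is active.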
author
Fan, Tingyi LU and Sun, Peihao
supervisor
organization
course
EITM02 20261
year
2026
type
H2 - Master's Degree (Two Years)
subject
keywords
Keyword Spotting, Ultra-Low-Power, Edge Devices, ASIC Design, MFCC, DSCNN
report number
LU/LTH-EIT 2026-1148
language
English
id
9235367
date added to LUP
2026-06-15 13:11:32
date last changed
2026-06-15 13:11:32
@misc{9235367,
  abstract     = {{Keyword spotting (KWS) is an essential function in always-on voice-interface systems, where a low-power detector continuously monitors audio streams and activates subsequent speech-processing modules only when a target keyword is detected.
This thesis presents the design and evaluation of a 40 nm CMOS ultra-low-power KWS accelerator that integrates a Mel-Frequency Cepstral Coefficient (MFCC) front-end with a Depthwise Separable Convolutional Neural Network (DSCNN) back-end. To reduce hardware overhead, we optimize the front-end processing and the back-end neural network individually and also co-design the complete processing chain. The proposed architecture adopts fixed-point arithmetic, shared multiplier resources, finite-state-machine-based scheduling, and on-chip SRAM buffering.
The design is implemented and evaluated in a 40 nm HVT CMOS technology at a 1.1 V supply voltage. Post-synthesis results (TT, 1.1 V, 25 °C) show that the complete KWS accelerator achieves an active-window average power of Pavg = 1220.44 μW during MFCC extraction and DSCNN inference. After incorporating duty-cycled execution over a 100 MHz reference clock, the long-term average power drops to Ptotal = 32.757 μW. The total synthesized cell area is 132,731.94 μm², of which SRAM macros occupy 117,658.50 μm² (88.6%). The proposed design demonstrates the feasibility and practical value of integrating MFCC feature extraction and DSCNN classification into a compact, ultra-low-power ASIC for always-on keyword spotting.}},
  author       = {{Fan, Tingyi and Sun, Peihao}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{40nm CMOS Ultra-Low-Power Keyword Spotting Hardware Design}},
  year         = {{2026}},
}