40nm CMOS Ultra-Low-Power Keyword Spotting Hardware Design
(2026) EITM02 20261
Department of Electrical and Information Technology
- Abstract
- Keyword spotting (KWS) is an essential function in always-on voice-interface systems, where a low-power detector continuously monitors audio streams and activates subsequent speech-processing modules only when a target keyword is detected.
This thesis presents the design and evaluation of a 40 nm CMOS ultra-low-power KWS accelerator that integrates a Mel-Frequency Cepstral Coefficient (MFCC) front-end with a Depthwise Separable Convolutional Neural Network (DSCNN) back-end. To reduce hardware overhead, we not only optimize the front-end processing and the back-end neural network individually but also co-design the complete processing chain. The proposed architecture adopts fixed-point arithmetic, shared multiplier resources, finite-state-machine-based scheduling, and on-chip SRAM buffering.
The design is implemented and evaluated in a 40 nm HVT CMOS technology at a 1.1 V supply voltage. Post-synthesis results (TT, 1.1 V, 25 °C) show that the complete KWS accelerator achieves an active-window average power of Pavg = 1220.44 μW during MFCC extraction and DSCNN inference. After incorporating duty-cycled execution over a 100 MHz reference clock, the resulting long-term average power reduces to Ptotal = 32.757 μW, while the total synthesized cell area is 132,731.94 μm². SRAM macros occupy 117,658.50 μm² (88.6% of the total cell area). The proposed design demonstrates the feasibility and practical value of integrating MFCC feature extraction and DSCNN classification into a compact, ultra-low-power ASIC for always-on keyword spotting.
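The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the thesis: the function names are hypothetical, and the duty-cycle estimate assumes a simple two-state power model, P_total = D * P_active + (1 - D) * P_idle with P_idle >= 0, which the abstract does not state. Under that assumption, setting P_idle to zero gives an upper bound on the active duty cycle D.

```python
# Figures quoted in the abstract (post-synthesis, TT, 1.1 V, 25 degC).
P_ACTIVE_UW = 1220.44        # active-window average power, Pavg
P_TOTAL_UW = 32.757          # long-term average power, Ptotal
AREA_TOTAL_UM2 = 132_731.94  # total synthesized cell area
AREA_SRAM_UM2 = 117_658.50   # area occupied by SRAM macros


def sram_area_share(sram_um2: float, total_um2: float) -> float:
    """Fraction of the synthesized cell area taken by SRAM macros."""
    return sram_um2 / total_um2


def max_active_duty_cycle(p_total_uw: float, p_active_uw: float) -> float:
    """Upper bound on the active duty cycle D.

    Assumed model (not stated in the abstract):
        P_total = D * P_active + (1 - D) * P_idle, with P_idle >= 0.
    Any non-zero idle power lowers D, so P_idle = 0 yields the maximum.
    """
    return p_total_uw / p_active_uw


if __name__ == "__main__":
    share = sram_area_share(AREA_SRAM_UM2, AREA_TOTAL_UM2)
    duty = max_active_duty_cycle(P_TOTAL_UW, P_ACTIVE_UW)
    print(f"SRAM share of cell area: {share:.1%}")
    print(f"Active duty cycle (upper bound): {duty:.2%}")
```

The first value reproduces the 88.6% SRAM share stated in the abstract. The second suggests the accelerator is active for at most roughly 2.7% of the time; the true figure depends on the idle power, which the abstract does not report.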
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/student-papers/record/9235367
- author
- Fan, Tingyi (LU) and Sun, Peihao
- supervisor
- organization
- course
- EITM02 20261
- year
- 2026
- type
- H2 - Master's Degree (Two Years)
- subject
- keywords
- Keyword Spotting, Ultra-Low-Power, Edge Devices, ASIC Design, MFCC, DSCNN
- report number
- LU/LTH-EIT 2026-1148
- language
- English
- id
- 9235367
- date added to LUP
- 2026-06-15 13:11:32
- date last changed
- 2026-06-15 13:11:32
@misc{9235367,
abstract = {{Keyword spotting (KWS) is an essential function in always-on voice-interface systems, where a low-power detector continuously monitors audio streams and activates subsequent speech-processing modules only when a target keyword is detected.
This thesis presents the design and evaluation of a 40 nm CMOS ultra-low-power KWS accelerator that integrates a Mel-Frequency Cepstral Coefficient (MFCC) front-end with a Depthwise Separable Convolutional Neural Network (DSCNN) back-end. To reduce hardware overhead, we not only optimize the front-end processing and the back-end neural network individually but also co-design the complete processing chain. The proposed architecture adopts fixed-point arithmetic, shared multiplier resources, finite-state-machine-based scheduling, and on-chip SRAM buffering.
The design is implemented and evaluated in a 40 nm HVT CMOS technology at a 1.1 V supply voltage. Post-synthesis results (TT, 1.1 V, 25 °C) show that the complete KWS accelerator achieves an active-window average power of Pavg = 1220.44 μW during MFCC extraction and DSCNN inference. After incorporating duty-cycled execution over a 100 MHz reference clock, the resulting long-term average power reduces to Ptotal = 32.757 μW, while the total synthesized cell area is 132,731.94 μm². SRAM macros occupy 117,658.50 μm² (88.6% of the total cell area). The proposed design demonstrates the feasibility and practical value of integrating MFCC feature extraction and DSCNN classification into a compact, ultra-low-power ASIC for always-on keyword spotting.}},
author = {{Fan, Tingyi and Sun, Peihao}},
language = {{eng}},
note = {{Student Paper}},
title = {{40nm CMOS Ultra-Low-Power Keyword Spotting Hardware Design}},
year = {{2026}},
}