Evaluation of Design Metrics for DFT Implemented Using In-Memory Computing
(2026) EITM01 20261
Department of Electrical and Information Technology
- Abstract
- The discrete Fourier transform (DFT) is one of the most important algorithms
commonly used in baseband processing. While traditional hardware implementations
like the Fast Fourier Transform (FFT) are highly optimized, they suffer heavily
from the von Neumann bottleneck. Performing the computations directly in memory
using memristive crossbar arrays is a promising way to overcome this.
This thesis evaluates different simulators for analog in-memory computing and
finds NeuroSim to be the most suitable for simulating power, performance, area,
and accuracy. Modifications to the simulator are proposed to support the assessment
of three different crossbar-based designs for computing the DFT. To investigate
whether the DFT can be reliably implemented in the analog domain without
severe accuracy degradation, a ferroelectric tunnel junction (FTJ) memristor was
chosen for its high resistance and inherent robustness against non-idealities such as
IR drop. We demonstrate that a symmetry design, which stacks the twiddle coefficients
into one crossbar and leverages the conjugate symmetry of the DFT, is the
best choice for real-valued inputs. This design reduces the hardware
cost of the peripherals and reduces the crossbar area by 50% compared to a naive
implementation.
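As a rough illustration of the idea behind the symmetry design, the sketch below writes the DFT of a real-valued input as one real matrix-vector product. The cosine and sine twiddle matrices are stacked into a single matrix, only the non-redundant bins 0..N/2 are computed, and the remaining bins are recovered from conjugate symmetry. This is an idealized NumPy model under our own assumptions, not the crossbar mapping used in the thesis; the function names are hypothetical.

```python
import numpy as np

def stacked_half_dft_matrix(n):
    """Stack real and imaginary twiddle rows for bins 0..n/2 into one real matrix."""
    k = np.arange(n // 2 + 1)[:, None]   # only the non-redundant output bins
    m = np.arange(n)[None, :]            # input sample index
    angle = 2.0 * np.pi * k * m / n
    # Real part uses cos, imaginary part uses -sin (from exp(-j*angle)).
    return np.vstack([np.cos(angle), -np.sin(angle)])

def real_dft_via_symmetry(x):
    """DFT of a real vector x using roughly half the rows of the full twiddle matrix."""
    n = len(x)
    half = n // 2 + 1
    y = stacked_half_dft_matrix(n) @ x        # one real matrix-vector product
    upper = y[:half] + 1j * y[half:]          # bins 0..n/2
    # For real x, X[n - k] = conj(X[k]), so the remaining bins need no hardware rows.
    lower = np.conj(upper[1:n - half + 1][::-1])
    return np.concatenate([upper, lower])
```

For a 64-point input, the stacked matrix has 66 rows instead of the 128 rows needed when the real and imaginary parts of all 64 bins are stored, which is where the roughly halved area comes from in this simplified model.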
For a small 64-point DFT, our evaluations show that the system can achieve
a mean square error on the order of 10^−3. However, scaling to a large 1024-point
DFT in a single crossbar introduces significant IR drop, which increases
accuracy degradation. To mitigate this, a tiled architecture is adopted. As tiling
significantly increases energy and area due to the overhead of multiple
analog-to-digital converters, a tradeoff analysis is performed. Square tiles of size T = 1024
are found to be the best choice, reducing the error margin to 2 × 10^−2
while maintaining low energy consumption and latency. Finally, the results show
that a single crossbar implementation consumes approximately 8.65 nJ, about
half the energy of highly optimized CMOS FFTs, whereas the tiled architecture
requires approximately 15.6 nJ.
- Popular Abstract
- In an increasingly connected world, the demand for fast wireless communication,
like 5G and 6G, is rapidly growing. To process these wireless signals, our devices
rely heavily on a crucial mathematical algorithm called the discrete Fourier transform
(DFT). However, traditional computers have an architectural limit. They
suffer from the von Neumann bottleneck, which means they waste massive amounts
of time and energy simply moving data back and forth between the memory unit
and the processing unit.
A solution to this bottleneck is to perform the calculations directly
inside the memory itself. This can be achieved with emerging devices called
memristors. By arranging these analog components in a grid-like structure called a
crossbar array, we can use the currents flowing through it to do math directly
inside the array. This can drastically reduce time complexity and save
power.
In this project, our main purpose was to simulate and evaluate the best way to
build a DFT calculator using these memristor crossbars. We proposed and tested
three different hardware designs:
• Baseline: A standard, straightforward implementation of the DFT using
differential pairs.
• Merged: A design where we stacked the calculation matrices together to
significantly reduce the hardware cost of expensive peripherals, like
analog-to-digital converters (ADCs).
• Symmetry: A highly optimized design that leverages the natural mathematical
symmetry of the DFT to cut the required crossbar area in
half for real-valued inputs.
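The differential pairs mentioned for the Baseline design can be sketched in idealized form. A memristor conductance cannot be negative, so each signed coefficient is commonly split across two devices and the result is read as the difference of two currents. The snippet below is a minimal model under our own assumptions: the conductance range (g_min, g_max) is an illustrative placeholder rather than a value from the thesis, the function names are hypothetical, and IR drop and ADC quantization are ignored.

```python
import numpy as np

def to_differential_pair(w, g_min=1e-9, g_max=1e-7):
    """Map signed weights in [-1, 1] to two non-negative conductance matrices."""
    span = g_max - g_min
    g_pos = g_min + span * np.clip(w, 0.0, None)    # carries the positive part
    g_neg = g_min + span * np.clip(-w, 0.0, None)   # carries the negative part
    return g_pos, g_neg

def differential_mvm(w, v, g_min=1e-9, g_max=1e-7):
    """Ideal matrix-vector product read out as a difference of two currents."""
    g_pos, g_neg = to_differential_pair(w, g_min, g_max)
    i_diff = g_pos @ v - g_neg @ v     # the g_min offsets cancel in the subtraction
    return i_diff / (g_max - g_min)    # rescale currents back to weight units
```

In this ideal model the differential readout reproduces `w @ v` exactly, at the cost of two devices per coefficient.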
Using a modified NeuroSim simulator, we found that our Symmetry design consumes
only 8.65 nJ of energy. This is roughly half the energy required by highly
optimized traditional hardware setups. However, scaling up to larger computations
(like a 1024-point DFT) introduces a major challenge known as IR drop. This phenomenon
is linked to the resistance of longer interconnecting wires, which causes
a voltage loss that heavily degrades calculation accuracy.
To mitigate this IR drop, we introduce tiling to break the massive crossbar
into smaller, interconnected grids. Because adding tiles requires
expensive extra peripherals (like the previously mentioned ADCs) that increase
energy and area costs, we performed a tradeoff analysis. We found that a square
tile size of 1024 offers the best balance between energy efficiency and
accuracy for a 1024-point DFT.
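The tiling idea can be illustrated with a small idealized sketch: the coefficient matrix is cut into tiles of at most T × T entries, each tile computes a partial product on its slice of the input, and the partial results are accumulated. The tile count serves here as a crude stand-in for the extra peripheral cost. This is an assumption-laden NumPy model with a hypothetical function name, not the modified NeuroSim flow, and it does not model IR drop itself.

```python
import numpy as np

def tiled_mvm(w, v, tile):
    """Matrix-vector product computed tile by tile; also returns the tile count."""
    rows, cols = w.shape
    out = np.zeros(rows)
    n_tiles = 0
    for r in range(0, rows, tile):
        for c in range(0, cols, tile):
            # Each tile sees only its own slice of the input vector.
            out[r:r + tile] += w[r:r + tile, c:c + tile] @ v[c:c + tile]
            n_tiles += 1
    return out, n_tiles
```

Smaller tiles shorten the wires inside each array but multiply the number of tiles, and with it the number of readout circuits, which is the tradeoff the analysis weighs.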
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/student-papers/record/9224551
- author
- Nielsen, Philip LU and Tatidis, Sofia LU
- supervisor
- organization
- course
- EITM01 20261
- year
- 2026
- type
- H2 - Master's Degree (Two Years)
- subject
- keywords
- Discrete Fourier Transform (DFT), Fast Fourier Transform (FFT), In-Memory Computing (IMC), Analog In-Memory Computing (AIMC), Memristor, Ferroelectric Tunnel Junction (FTJ), Crossbar Array, Hardware Architecture, Differential Representation, Power Performance and Area (PPA), Mean Square Error (MSE), IR Drop, Quantization Error, NeuroSim
- report number
- LU/LTH-EIT 2026-1114
- language
- English
- id
- 9224551
- date added to LUP
- 2026-03-24 15:08:08
- date last changed
- 2026-03-24 15:08:08
@misc{9224551,
abstract = {{The discrete Fourier transform (DFT) is one of the most important algorithms
commonly used in baseband processing. While traditional hardware implementations
like the Fast Fourier Transform (FFT) are highly optimized, they heavily
suffer from the von Neumann bottleneck. To overcome this, performing the
computations directly in memory using memristive crossbar arrays is a promising
solution.
This thesis evaluates different simulators for analog in-memory-computing and
found NeuroSim to be the most suitable for simulating power, performance, area
and accuracy. Modifications to the simulator are proposed to support the assessment
of three different crossbar-based designs for computing the DFT. To investigate
whether the DFT can be reliably implemented in the analog domain without
severe accuracy degradation, a Ferroelectric tunnel junction memristor was chosen
due to its high resistance and inherent robustness against non-idealities like
IR drop. We demonstrate that a symmetry design, which stacks the twiddle coefficients
into one crossbar and leverages the conjugate symmetry of the DFT, is the
most optimal for real-valued inputs. This design effectively reduces the hardware
cost of the peripherals, reducing the crossbar area by 50% compared to a naive
implementation.
For a small 64-point DFT, our evaluations show that the system can achieve
a mean square error of magnitude 10^−3. However, scaling to a large 1024-point
DFT in a single crossbar introduces significant IR drop, resulting in an increased
accuracy degradation. To mitigate this, a tiled architecture is adopted. As tiling
significantly increases energy and area due to the overhead of multiple
analog-to-digital converters, a tradeoff analysis is performed. Square tiles of size T = 1024
are found to be the most optimal, effectively reducing the error margin to 2 × 10^−2
while maintaining low energy consumption and latency. Finally, the results show
that a single crossbar implementation consumes approximately 8.65 nJ, about
half the energy of highly optimized CMOS FFTs, whereas the tiled architecture
requires approximately 15.6 nJ}},
author = {{Nielsen, Philip and Tatidis, Sofia}},
language = {{eng}},
note = {{Student Paper}},
title = {{Evaluation of Design Metrics for DFT Implemented Using In-Memory Computing}},
year = {{2026}},
}