
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Evaluating pseudo-random SRAM for AI applications in GPU cache

Asare, Kwaku Ofosu LU (2024) EITM02 20241
Department of Electrical and Information Technology
Abstract
General Purpose Graphics Processing Units (GPGPUs) have become the prevalent
processors for AI/ML and other large computational problems because parallel
processing, unlike single-threaded processing, has not yet reached a hardware
limit. The goal of this thesis is to investigate the suitability of a novel
SRAM architecture for implementation in a GPU without extensive changes to the
GPU architecture. The architecture features groups of SRAM cells called zones,
which are leveraged to perform pipelined SRAM read operations that reduce
dynamic energy consumption. This thesis examines the implementation of an
energy-efficient SRAM in GPU caches and analyzes its energy savings and
performance under AI/ML workloads in a GPU. The GPGPU-Sim [1] simulator was
used to run all the benchmarks. The simulator was modified to output all memory
accesses made to the L1 and L2 caches. These memory accesses were then fed to a
software cache model to analyze the performance and energy of all the workloads.
Hybrid SRAM implementations, pairing a small-capacity conventional SRAM with
the Pseudo-random SRAM (PR-SRAM), were investigated to evaluate penalty cycles
and the reduction in dynamic energy consumption. A penalty cycle is a stall in
the pipeline caused by repeated access to a specific zone; the penalty cycle
rate is the number of penalty cycles per 100 accesses. The hybrid
implementation incurred a 33% increase in energy consumption versus the pure
PR-SRAM implementation; this increase was the cost of reducing the penalty
cycle rate by 43%. The effect of the cache replacement policy on the
performance of the hybrid design was also investigated: Least Recently Used
(LRU) achieved the lowest penalty cycle rate. The gains were also weighed
against the complexity required to perform the necessary operations with the
hybrid cache.
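The penalty-cycle accounting described in the abstract can be sketched as a small simulation over a trace of zone accesses. The pipeline depth, the one-access-per-cycle issue rate, and the function itself are illustrative assumptions, not the thesis's actual cache model.

```python
def penalty_cycle_rate(zone_accesses, pipeline_depth=2):
    """Estimate the penalty cycle rate of a zoned, pipelined SRAM.

    Assumed model (hypothetical, for illustration): a read occupies its
    zone for `pipeline_depth` cycles, one access issues per cycle, and an
    access to a still-busy zone stalls the pipeline for one penalty cycle
    per remaining busy cycle.  Returns penalty cycles per 100 accesses.
    """
    busy_until = {}   # zone id -> cycle at which the zone frees up
    cycle = 0
    penalties = 0
    for zone in zone_accesses:
        if busy_until.get(zone, 0) > cycle:
            stall = busy_until[zone] - cycle
            penalties += stall          # stalled cycles are penalty cycles
            cycle += stall
        busy_until[zone] = cycle + pipeline_depth
        cycle += 1                      # issue the next access
    return 100.0 * penalties / len(zone_accesses)
```

Back-to-back accesses to the same zone stall (a repeated zone access is exactly what the abstract defines as the cause of a penalty cycle), while accesses that alternate between zones pipeline cleanly and incur no penalty.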
Popular Abstract
AI has become unavoidable in daily life; it is used to recommend services to us,
in virtual assistants, healthcare, finance, and more. The meteoric rise of AI
and its applications can be attributed to the technological advancements that
have allowed processing power to catch up with the computational requirements
of AI training.
These AI/ML models require large compute resources and are still limited by
hardware, as current models are designed to take advantage of all the available
hardware resources. Because of their highly parallel structure, Graphics
Processing Units (GPUs) are the prevalent choice of processor for repetitive
workloads with abundant parallelism. This has led to a great increase in the
computing power of GPUs, which has eclipsed the growth of computing power in
Central Processing Units (CPUs). Performance increases in GPUs can be
attributed to advancements in manufacturing processes and an increase in
onboard memory.
To achieve these performance gains, improvements have been made across all
aspects, including memory. Larger memory is required to store the growing
amount of data being processed. Data transfer rates at all levels of the memory
hierarchy have been increased to improve latency, while wider buses and newer
protocols have been implemented to improve memory throughput. The pursuit of
performance often results in energy efficiency taking a backseat. This thesis
investigates the benefits of using a new memory design to read data efficiently
with minimal impact on performance.
Scrutinising the benefits of Xenergic's new memory in a GPU required the use of
an accurate and detailed GPU simulator. A performance model was programmed to
compare the energy consumption of the new memory against a conventional memory,
using details obtained from the simulator while running benchmarks. The model
also implemented a dual-memory design to investigate the performance and
energy-saving trade-offs of different memory configurations. This thesis
project found considerable energy reduction with the new memory design.
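The dual-memory comparison described above can be sketched as a toy energy model. The routing rule (hot lines go to the small conventional SRAM, everything else to the pseudo-random SRAM) and the per-access energy figures are illustrative assumptions, not numbers from the thesis.

```python
def hybrid_energy(accesses, hot_lines, e_conv=1.0, e_pr=0.5):
    """Total read energy for a hypothetical hybrid cache.

    A small conventional SRAM serves a set of frequently reused ('hot')
    lines at e_conv per access; the pseudo-random SRAM serves the rest at
    e_pr per access.  Both energies are placeholder values.
    """
    return sum(e_conv if line in hot_lines else e_pr for line in accesses)

# Compare a pure PR-SRAM configuration against a hybrid one on one trace:
trace = [1, 2, 3, 1, 2, 4]
pure_pr = hybrid_energy(trace, hot_lines=set())    # everything in PR-SRAM
hybrid = hybrid_energy(trace, hot_lines={1, 2})    # hot lines in conventional SRAM
```

Note that on this trace the hybrid configuration spends more energy than the pure PR-SRAM one, which mirrors the trade-off reported in the abstract: the hybrid pays extra energy in exchange for fewer zone-conflict stalls.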
author
Asare, Kwaku Ofosu LU
course
EITM02 20241
year
type
H2 - Master's Degree (Two Years)
keywords
GPU, SRAM, cache, AI
report number
LU/LTH-EIT 2024-980
language
English
id
9161586
date added to LUP
2024-06-11 14:00:52
date last changed
2024-06-11 14:00:52
@misc{9161586,
  author       = {{Asare, Kwaku Ofosu}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Evaluating pseudo-random SRAM for AI applications in GPU cache}},
  year         = {{2024}},
}