System level cache prefetching algorithms for complex GPU workloads
(2024) EITM01 20241, Department of Electrical and Information Technology
- Abstract
- Prefetching is a well-known concept for CPUs, but for GPUs it is fairly
unexplored. The memory management of a GPU plays a crucial role
in its performance, and cache prefetching has the potential to lower
the overall latency. This thesis compares different types of prefetching
methods for GPUs, adapting some CPU prefetchers to fit the
GPU architecture. All of these prefetchers were placed inside the
system level cache (SLC), between the GPU and external memory.
Five different methods were tested on a framework based on ARM’s
GPU model. The thesis was mainly based upon prefetching techniques
discussed in the following papers: Adaptive Stream Detection [11],
Best-offset [18], Many-thread aware [17], APOGEE [26],
and Last-level collective cache prefetcher [19]. The prefetchers
produced in this thesis were either heavily inspired by or implemented
as closely as possible to the designs in the papers.
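To make the best-offset technique [18] concrete, the following is a minimal sketch of its learning loop: candidate offsets are scored against a table of recently seen line addresses, and the highest-scoring offset becomes the prefetch distance. This is an illustrative simplification only; the offset list, table size, and thresholds here are arbitrary choices, not the parameters used in the thesis or in the original design.

```python
# Sketch of a best-offset-style prefetcher (after Michaud [18]).
# Simplified for illustration: the real design inserts into the
# recent-requests table on fill completion, not on every access.
from collections import deque

class BestOffsetPrefetcher:
    OFFSETS = [1, 2, 3, 4, 6, 8]   # candidate offsets, in cache lines
    SCORE_MAX = 31                 # end a learning round early at this score
    ROUND_MAX = 20                 # ...or after this many passes over OFFSETS

    def __init__(self, rr_size=64):
        self.rr = deque(maxlen=rr_size)  # recent-requests table (FIFO here)
        self.scores = {d: 0 for d in self.OFFSETS}
        self.test_idx = 0
        self.rounds = 0
        self.best = 1                    # current prefetch offset

    def access(self, line_addr):
        """Called on each cache access; returns a prefetch address."""
        d = self.OFFSETS[self.test_idx]
        # If line_addr - d was seen recently, offset d would have
        # prefetched this access in time: reward it.
        if line_addr - d in self.rr:
            self.scores[d] += 1
        self.test_idx = (self.test_idx + 1) % len(self.OFFSETS)
        if self.test_idx == 0:
            self.rounds += 1
        # End of a learning phase: adopt the best offset and start over.
        if self.scores[d] >= self.SCORE_MAX or self.rounds >= self.ROUND_MAX:
            self.best = max(self.scores, key=self.scores.get)
            self.scores = {k: 0 for k in self.OFFSETS}
            self.rounds = 0
        self.rr.append(line_addr)
        return line_addr + self.best
```

Feeding the sketch a constant-stride stream (e.g. every 8th cache line) drives the scores so that, after a learning round, `best` converges to the stream's stride and subsequent accesses prefetch one stride ahead.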
The thesis concludes that for graphics workloads the best prefetchers
implemented can achieve a 0.51-1.29% decrease in GPU cycles on
average, depending on the chosen GPU configuration, while also
lowering the estimated energy usage.
- Popular Abstract
- Prefetching is the process of guessing what data a system will want before it asks for it. This idea has been used for Central Processing Units (CPUs) for a long time but is
rather unexplored for Graphics Processing Units (GPUs). This thesis looks at five
different prefetchers and evaluates their performance.
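The "guessing" can be made concrete with a toy stride detector: watch the address stream, and once the same step between addresses repeats, predict the next address before it is requested. This generic sketch is for illustration only and is not one of the five designs evaluated in the thesis.

```python
# Toy stride prefetcher: predicts the next address once a constant
# stride has repeated often enough to be trusted.
class StridePrefetcher:
    def __init__(self, confidence_threshold=2):
        self.last_addr = None
        self.last_stride = None
        self.confidence = 0
        self.threshold = confidence_threshold

    def access(self, addr):
        """Return a predicted next address once a stride repeats, else None."""
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride == self.last_stride:
                self.confidence += 1      # same step again: trust grows
            else:
                self.confidence = 0       # pattern broke: start over
            self.last_stride = stride
            if self.confidence >= self.threshold and stride != 0:
                prediction = addr + stride  # prefetch candidate
        self.last_addr = addr
        return prediction
```

For the access sequence 100, 104, 108, 112 the detector stays silent while it builds confidence, then predicts 116 — the data is fetched before the program asks for it.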
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9159122
- author
- Eklund Sigleifs, Fabian and Häggström, Erik
- supervisor
- organization
- course
- EITM01 20241
- year
- 2024
- type
- H2 - Master's Degree (Two Years)
- subject
- keywords
- GPU, Prefetching, Cache prefetching, Memory management, System level cache, SLC, LLC, Last Level Cache, GPU architecture, Graphics workloads, ARM GPU model, Adaptive Stream Detection, Best-offset prefetcher, Many-thread aware prefetcher, APOGEE prefetcher, Last-level collective cache prefetcher, GPU performance, GPU latency reduction, Energy efficiency in GPUs, External memory, Prefetching techniques, GPU cycle reduction
- report number
- LU/LTH-EIT 2024-971
- language
- English
- id
- 9159122
- date added to LUP
- 2024-06-10 10:49:41
- date last changed
- 2024-06-10 10:49:41
@misc{9159122,
  abstract = {{Prefetching is a well-known concept for CPUs, but for GPUs it is fairly unexplored. The memory management of a GPU plays a crucial role in its performance, and cache prefetching has the potential to lower the overall latency. This thesis compares different types of prefetching methods for GPUs, adapting some CPU prefetchers to fit the GPU architecture. All of these prefetchers were placed inside the system level cache (SLC), between the GPU and external memory. Five different methods were tested on a framework based on ARM’s GPU model. The thesis was mainly based upon prefetching techniques discussed in the following papers: Adaptive Stream Detection [11], Best-offset [18], Many-thread aware [17], APOGEE [26], and Last-level collective cache prefetcher [19]. The prefetchers produced in this thesis were either heavily inspired by or implemented as closely as possible to the designs in the papers. The thesis concludes that for graphics workloads the best prefetchers implemented can achieve a 0.51-1.29% decrease in GPU cycles on average, depending on the chosen GPU configuration, while also lowering the estimated energy usage.}},
  author   = {{Eklund Sigleifs, Fabian and Häggström, Erik}},
  language = {{eng}},
  note     = {{Student Paper}},
  title    = {{System level cache prefetching algorithms for complex GPU workloads}},
  year     = {{2024}},
}