Solving Nonlinear PDEs Using CUDA

Johannesson, Anna

Johannesson, Anna ^LU (2025) In LU-CS-EX EDAM05 20251
Department of Computer Science

Abstract: Partial differential equations (PDEs) are central in many scientific research areas,
including modeling physical phenomena and economic forecasting. While the
general solutions of PDEs can be complex, we can determine a particular solution
by introducing boundary conditions and/or initial conditions. This is called an
Initial Boundary Value Problem (IBVP).

Thalassa is a framework that generates solvers for IBVPs in PyTorch, targeting
both CPU and GPU execution. One goal of this thesis was to extend Thalassa
to produce solvers in CUDA C++. We compared the performance of the PyTorch
solvers to their CUDA C++ counterparts, using different PDEs and problem sizes,
focusing on execution time. We also profiled the CUDA C++ solvers to identify
bottlenecks that limit their speed.
Our results show that the CUDA C++ solvers outperformed the equivalent
PyTorch solvers on both the CPU and the GPU. Speedups ranged from
3.5x to 77x compared to PyTorch GPU solvers, and from 52x to 160x compared
to PyTorch CPU solvers.

The generated CUDA C++ solvers were limited by architectural factors such
as shared memory and register availability, as well as runtime factors like memory
latency and warp-level stalls. Additionally, performance depended on manual
tuning of thread block sizes.
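
To make the notion of an IBVP concrete, the following is a minimal sketch that is not taken from the thesis or from Thalassa: an explicit finite-difference solver for the viscous Burgers equation u_t + u u_x = nu u_xx on [0, 1]. The choice of equation, the sine initial condition, the zero Dirichlet boundaries, and all names (burgers_step, solve, nu, n, steps) are assumptions made for illustration only.

```python
import math


def burgers_step(u, dt, dx, nu):
    """Advance u by one explicit time step; boundary values are held fixed."""
    new = u[:]  # the copy keeps the Dirichlet boundary values u[0] and u[-1]
    for i in range(1, len(u) - 1):
        # Central difference for the nonlinear advection term u * u_x.
        advection = u[i] * (u[i + 1] - u[i - 1]) / (2.0 * dx)
        # Central difference for the diffusion term nu * u_xx.
        diffusion = nu * (u[i + 1] - 2.0 * u[i] + u[i - 1]) / (dx * dx)
        new[i] = u[i] + dt * (diffusion - advection)
    return new


def solve(n=101, nu=0.1, steps=500):
    """Solve the illustrative IBVP on [0, 1] with n grid points."""
    dx = 1.0 / (n - 1)
    # Assumed time step, chosen well inside the explicit diffusion
    # stability limit dt <= dx^2 / (2 * nu).
    dt = 0.2 * dx * dx / nu
    # Initial condition: u(x, 0) = sin(pi * x).
    u = [math.sin(math.pi * i * dx) for i in range(n)]
    # Boundary conditions: u(0, t) = u(1, t) = 0.
    u[0] = 0.0
    u[-1] = 0.0
    for _ in range(steps):
        u = burgers_step(u, dt, dx, nu)
    return u


if __name__ == "__main__":
    solution = solve()
    print("grid points:", len(solution))
    print("peak value after time stepping:", max(solution))
```

Each interior grid point is updated independently from the previous time level, which is the property that allows a stencil update of this kind to be mapped to one GPU thread per point.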

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9213966

author

Johannesson, Anna ^LU

supervisor

Michail Boulasikis ^LU
Flavius Gruian ^LU

organization

Department of Computer Science

alternative title

Lösning av Olinjära PDEs med CUDA

course

EDAM05 20251

year

2025

type

H2 - Master's Degree (Two Years)

subject

Technology and Engineering

keywords

partial differential equations, finite difference methods, CUDA, GPU computing, parallel programming, code generation, performance profiling

publication/series

LU-CS-EX

report number

2025-56

ISSN

1650-2884

language

English

id

9213966

date added to LUP

2025-12-16 11:21:28

date last changed

2025-12-16 11:21:28

@misc{9213966,
  abstract     = {{Partial differential equations (PDEs) are central in many scientific research areas,
including modeling physical phenomena and economic forecasting. While the
general solutions of PDEs can be complex, we can determine a particular solution
by introducing boundary conditions and/or initial conditions. This is called an
Initial Boundary Value Problem (IBVP).

Thalassa is a framework that generates solvers for IBVPs in PyTorch, targeting
both CPU and GPU execution. One goal of this thesis was to extend Thalassa
to produce solvers in CUDA C++. We compared the performance of the PyTorch
solvers to their CUDA C++ counterparts, using different PDEs and problem sizes,
focusing on execution time. We also profiled the CUDA C++ solvers to identify
bottlenecks that limit their speed.
Our results show that the CUDA C++ solvers outperformed the equivalent
PyTorch solvers on both the CPU and the GPU. Speedups ranged from
3.5x to 77x compared to PyTorch GPU solvers, and from 52x to 160x compared
to PyTorch CPU solvers.

The generated CUDA C++ solvers were limited by architectural factors such
as shared memory and register availability, as well as runtime factors like memory
latency and warp-level stalls. Additionally, performance depended on manual
tuning of thread block sizes.}},
  author       = {{Johannesson, Anna}},
  issn         = {{1650-2884}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{LU-CS-EX}},
  title        = {{Solving Nonlinear PDEs Using CUDA}},
  year         = {{2025}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
