LUP Student Papers

LUND UNIVERSITY LIBRARIES

Solving Nonlinear PDEs Using CUDA

Johannesson, Anna LU (2025) In LU-CS-EX EDAM05 20251
Department of Computer Science
Abstract
Partial differential equations (PDEs) are central in many scientific research areas,
including modeling physical phenomena and economic forecasting. While the
general solutions of PDEs can be complex, we can determine a particular solution
by introducing boundary conditions and/or initial conditions. This is called an
Initial Boundary Value Problem (IBVP).

Thalassa is a framework that generates solvers for IBVPs in PyTorch, targeting
both CPU and GPU execution. One goal of this thesis was to extend Thalassa
to produce solvers in CUDA C++. We compared the performance of the PyTorch
solvers to their CUDA C++ counterparts, using different PDEs and problem sizes,
focusing on the execution time. We also profiled the CUDA C++ solvers to
identify bottlenecks affecting their speed.
Our results show that the CUDA C++ solvers outperformed their equivalent
PyTorch solvers, targeting both the CPU and the GPU. Speedups ranged from
3.5x to 77x compared to PyTorch GPU solvers, and from 52x to 160x compared
to PyTorch CPU solvers.

The generated CUDA C++ solvers were limited by architectural factors such
as shared memory and register availability, as well as runtime factors like memory
latency and warp-level stalls. Additionally, performance depended on manual
tuning of thread block sizes.
author
Johannesson, Anna LU
alternative title
Lösning av Olinjära PDEs med CUDA
course
EDAM05 20251
year
2025
type
H2 - Master's Degree (Two Years)
keywords
partial differential equations, finite difference methods, CUDA, GPU computing, parallel programming, code generation, performance profiling
publication/series
LU-CS-EX
report number
2025-56
ISSN
1650-2884
language
English
id
9213966
date added to LUP
2025-12-16 11:21:28
date last changed
2025-12-16 11:21:28
@misc{9213966,
  abstract     = {{Partial differential equations (PDEs) are central in many scientific research areas,
including modeling physical phenomena and economic forecasting. While the
general solutions of PDEs can be complex, we can determine a particular solution
by introducing boundary conditions and/or initial conditions. This is called an
Initial Boundary Value Problem (IBVP).

Thalassa is a framework that generates solvers for IBVPs in PyTorch, targeting
both CPU and GPU execution. One goal of this thesis was to extend Thalassa
to produce solvers in CUDA C++. We compared the performance of the PyTorch
solvers to their CUDA C++ counterparts, using different PDEs and problem sizes,
focusing on the execution time. We also profiled the CUDA C++ solvers to
identify bottlenecks affecting their speed.
Our results show that the CUDA C++ solvers outperformed their equivalent
PyTorch solvers, targeting both the CPU and the GPU. Speedups ranged from
3.5x to 77x compared to PyTorch GPU solvers, and from 52x to 160x compared
to PyTorch CPU solvers.

The generated CUDA C++ solvers were limited by architectural factors such
as shared memory and register availability, as well as runtime factors like memory
latency and warp-level stalls. Additionally, performance depended on manual
tuning of thread block sizes.}},
  author       = {{Johannesson, Anna}},
  issn         = {{1650-2884}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{LU-CS-EX}},
  title        = {{Solving Nonlinear PDEs Using CUDA}},
  year         = {{2025}},
}