Solving Nonlinear PDEs Using CUDA
(2025) In LU-CS-EX EDAM05 20251
Department of Computer Science
- Abstract
- Partial differential equations (PDEs) are central in many scientific research areas,
including modeling physical phenomena and economic forecasting. While the
general solutions of PDEs can be complex, we can determine a particular solution
by introducing boundary conditions and/or initial conditions. This is called an
Initial Boundary Value Problem (IBVP).
Thalassa is a framework that generates solvers for IBVPs in PyTorch, targeting
both CPU and GPU execution. One goal of this thesis was to extend Thalassa
to produce solvers in CUDA C++. We compared the performance of the PyTorch
solvers to their CUDA C++ counterparts, using different PDEs and problem sizes,
focusing on the execution time. We also profiled the CUDA C++ solvers to
identify bottlenecks that limit their speed.
Our results show that the CUDA C++ solvers outperformed their equivalent
PyTorch solvers, targeting both the CPU and the GPU. Speedups ranged from
3.5x to 77x compared to PyTorch GPU solvers, and from 52x to 160x compared
to PyTorch CPU solvers.
The generated CUDA C++ solvers were limited by architectural factors such
as shared memory and register availability, as well as runtime factors like memory
latency and warp-level stalls. Additionally, performance depended on manual
tuning of thread block sizes.
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9213966
- author
- Johannesson, Anna LU
- supervisor
- organization
- alternative title
- Lösning av Olinjära PDEs med CUDA
- course
- EDAM05 20251
- year
- 2025
- type
- H2 - Master's Degree (Two Years)
- subject
- keywords
- partial differential equations, finite difference methods, CUDA, GPU computing, parallel programming, code generation, performance profiling
- publication/series
- LU-CS-EX
- report number
- 2025-56
- ISSN
- 1650-2884
- language
- English
- id
- 9213966
- date added to LUP
- 2025-12-16 11:21:28
- date last changed
- 2025-12-16 11:21:28
@misc{9213966,
abstract = {{Partial differential equations (PDEs) are central in many scientific research areas,
including modeling physical phenomena and economic forecasting. While the
general solutions of PDEs can be complex, we can determine a particular solution
by introducing boundary conditions and/or initial conditions. This is called an
Initial Boundary Value Problem (IBVP).
Thalassa is a framework that generates solvers for IBVPs in PyTorch, targeting
both CPU and GPU execution. One goal of this thesis was to extend Thalassa
to produce solvers in CUDA C++. We compared the performance of the PyTorch
solvers to their CUDA C++ counterparts, using different PDEs and problem sizes,
focusing on the execution time. We also profiled the CUDA C++ solvers to
identify bottlenecks that limit their speed.
Our results show that the CUDA C++ solvers outperformed their equivalent
PyTorch solvers, targeting both the CPU and the GPU. Speedups ranged from
3.5x to 77x compared to PyTorch GPU solvers, and from 52x to 160x compared
to PyTorch CPU solvers.
The generated CUDA C++ solvers were limited by architectural factors such
as shared memory and register availability, as well as runtime factors like memory
latency and warp-level stalls. Additionally, performance depended on manual
tuning of thread block sizes.}},
author = {{Johannesson, Anna}},
issn = {{1650-2884}},
language = {{eng}},
note = {{Student Paper}},
series = {{LU-CS-EX}},
title = {{Solving Nonlinear PDEs Using CUDA}},
year = {{2025}},
}