Software acceleration of multi-user MIMO uplink detection on GPU
(2025) In Parallel Computing 125.
- Abstract
This paper presents the exploration of GPU-accelerated block-wise decompositions for zero-forcing (ZF) based QR and Cholesky methods applied to massive multiple-input multiple-output (MIMO) uplink detection algorithms. Three algorithms are evaluated: ZF with block Cholesky decomposition, ZF with block QR decomposition (QRD), and minimum mean square error (MMSE) with block Cholesky decomposition. The latter was the only one previously explored, but it used standard Cholesky decomposition. Our approach achieves an 11% improvement over the previous GPU-accelerated MMSE study. Through performance analysis, we observe a trade-off between precision and execution time. Reducing precision from FP64 to FP32 improves execution time but increases bit error rate (BER), with ZF-based QRD reducing execution time from 2.04μs to 1.24μs for a 128 × 8 MIMO size. The study also highlights that larger MIMO sizes, particularly 2048 × 32, require GPUs to fully utilize their computational and memory capabilities, especially under FP64 precision. In contrast, smaller matrices are compute-bound. Our results recommend GPUs for larger MIMO sizes, as they offer the parallelism and memory resources necessary to efficiently handle the computational demands of next-generation networks. This work paves the way for scalable, GPU-based massive MIMO uplink detection systems.
- author
- Nada, Ali; Ali, Hazem Ismail; Liu, Liang (LU); Alkabani, Yousra
- publishing date
- 2025-09
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- High-performance computing, Massive MIMO, Matrix decomposition, Parallel computing, Uplink detection
- in
- Parallel Computing
- volume
- 125
- article number
- 103150
- publisher
- Elsevier
- external identifiers
- scopus:105014755752
- ISSN
- 0167-8191
- DOI
- 10.1016/j.parco.2025.103150
- language
- English
- LU publication?
- yes
- id
- 67fc8d0b-3d20-4135-afd0-d8d9f721aaf4
- date added to LUP
- 2025-10-16 12:11:29
- date last changed
- 2025-10-17 02:23:27
@article{67fc8d0b-3d20-4135-afd0-d8d9f721aaf4,
abstract = {{This paper presents the exploration of GPU-accelerated block-wise decompositions for zero-forcing (ZF) based QR and Cholesky methods applied to massive multiple-input multiple-output (MIMO) uplink detection algorithms. Three algorithms are evaluated: ZF with block Cholesky decomposition, ZF with block QR decomposition (QRD), and minimum mean square error (MMSE) with block Cholesky decomposition. The latter was the only one previously explored, but it used standard Cholesky decomposition. Our approach achieves an 11% improvement over the previous GPU-accelerated MMSE study. Through performance analysis, we observe a trade-off between precision and execution time. Reducing precision from FP64 to FP32 improves execution time but increases bit error rate (BER), with ZF-based QRD reducing execution time from 2.04μs to 1.24μs for a 128 × 8 MIMO size. The study also highlights that larger MIMO sizes, particularly 2048 × 32, require GPUs to fully utilize their computational and memory capabilities, especially under FP64 precision. In contrast, smaller matrices are compute-bound. Our results recommend GPUs for larger MIMO sizes, as they offer the parallelism and memory resources necessary to efficiently handle the computational demands of next-generation networks. This work paves the way for scalable, GPU-based massive MIMO uplink detection systems.}},
author = {{Nada, Ali and Ali, Hazem Ismail and Liu, Liang and Alkabani, Yousra}},
issn = {{0167-8191}},
keywords = {{High-performance computing; Massive MIMO; Matrix decomposition; Parallel computing; Uplink detection}},
language = {{eng}},
publisher = {{Elsevier}},
journal = {{Parallel Computing}},
title = {{Software acceleration of multi-user MIMO uplink detection on GPU}},
url = {{http://dx.doi.org/10.1016/j.parco.2025.103150}},
doi = {{10.1016/j.parco.2025.103150}},
volume = {{125}},
year = {{2025}},
}