Lund University Publications

Software acceleration of multi-user MIMO uplink detection on GPU

Nada, Ali ; Ali, Hazem Ismail ; Liu, Liang and Alkabani, Yousra (2025) In Parallel Computing 125.
Abstract

This paper presents the exploration of GPU-accelerated block-wise decompositions for zero-forcing (ZF) based QR and Cholesky methods applied to massive multiple-input multiple-output (MIMO) uplink detection algorithms. Three algorithms are evaluated: ZF with block Cholesky decomposition, ZF with block QR decomposition (QRD), and minimum mean square error (MMSE) with block Cholesky decomposition. The latter was the only one previously explored, but it used standard Cholesky decomposition. Our approach achieves an 11% improvement over the previous GPU-accelerated MMSE study. Through performance analysis, we observe a trade-off between precision and execution time. Reducing precision from FP64 to FP32 improves execution time but increases bit error rate (BER), with ZF-based QRD reducing execution time from 2.04μs to 1.24μs for a 128 × 8 MIMO size. The study also highlights that larger MIMO sizes, particularly 2048 × 32, require GPUs to fully utilize their computational and memory capabilities, especially under FP64 precision. In contrast, smaller matrices are compute-bound. Our results recommend GPUs for larger MIMO sizes, as they offer the parallelism and memory resources necessary to efficiently handle the computational demands of next-generation networks. This work paves the way for scalable, GPU-based massive MIMO uplink detection systems.
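
As a rough illustration of the detection problem the abstract refers to, the sketch below solves the ZF/MMSE normal equations with a Cholesky factorization. It is not the paper's implementation: it runs on the CPU in pure Python, uses an unblocked (standard) Cholesky rather than the block-wise variant, and assumes unit symbol energy for the MMSE regularization. All function names (`gram`, `matched_filter`, `cholesky`, `solve_cholesky`, `detect`) and the 16 × 4 example size are hypothetical choices made for this example.

```python
import random

def gram(H, reg=0.0):
    """Return A = H^H H + reg * I for an M x K channel matrix H (list of rows)."""
    M, K = len(H), len(H[0])
    A = [[sum(H[m][i].conjugate() * H[m][j] for m in range(M)) for j in range(K)]
         for i in range(K)]
    for i in range(K):
        A[i][i] += reg
    return A

def matched_filter(H, y):
    """Return b = H^H y."""
    M, K = len(H), len(H[0])
    return [sum(H[m][i].conjugate() * y[m] for m in range(M)) for i in range(K)]

def cholesky(A):
    """Unblocked Cholesky: lower-triangular L with A = L L^H (A Hermitian, positive definite)."""
    n = len(A)
    L = [[0j] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j].real - sum(abs(L[j][k]) ** 2 for k in range(j))
        L[j][j] = complex(d ** 0.5)
        for i in range(j + 1, n):
            s = A[i][j] - sum(L[i][k] * L[j][k].conjugate() for k in range(j))
            L[i][j] = s / L[j][j]
    return L

def solve_cholesky(L, b):
    """Solve (L L^H) x = b by forward then backward substitution."""
    n = len(L)
    z = [0j] * n
    for i in range(n):  # forward: L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0j] * n
    for i in reversed(range(n)):  # backward: L^H x = z
        acc = sum(L[k][i].conjugate() * x[k] for k in range(i + 1, n))
        x[i] = (z[i] - acc) / L[i][i].conjugate()
    return x

def detect(H, y, noise_var=0.0):
    """ZF estimate when noise_var == 0, MMSE estimate otherwise (unit symbol energy assumed)."""
    A = gram(H, reg=noise_var)
    return solve_cholesky(cholesky(A), matched_filter(H, y))

# Noise-free toy example: 16 receive antennas, 4 users, QPSK symbols.
random.seed(0)
M, K = 16, 4
H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(K)]
     for _ in range(M)]
x = [complex(random.choice((-1, 1)), random.choice((-1, 1))) for _ in range(K)]
y = [sum(H[m][k] * x[k] for k in range(K)) for m in range(M)]
x_hat = detect(H, y)  # ZF estimate of the transmitted symbols
```

Passing a positive `noise_var` to `detect` switches from ZF to the regularized (MMSE) system. A QR-based ZF detector would solve the same least-squares problem by factorizing H directly instead of forming H^H H.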

author
Nada, Ali ; Ali, Hazem Ismail ; Liu, Liang and Alkabani, Yousra
publishing date
2025
type
Contribution to journal
publication status
published
subject
keywords
High-performance computing, Massive MIMO, Matrix decomposition, Parallel computing, Uplink detection
in
Parallel Computing
volume
125
article number
103150
publisher
Elsevier
external identifiers
  • scopus:105014755752
ISSN
0167-8191
DOI
10.1016/j.parco.2025.103150
language
English
LU publication?
yes
id
67fc8d0b-3d20-4135-afd0-d8d9f721aaf4
date added to LUP
2025-10-16 12:11:29
date last changed
2025-10-17 02:23:27
@article{67fc8d0b-3d20-4135-afd0-d8d9f721aaf4,
  abstract     = {{This paper presents the exploration of GPU-accelerated block-wise decompositions for zero-forcing (ZF) based QR and Cholesky methods applied to massive multiple-input multiple-output (MIMO) uplink detection algorithms. Three algorithms are evaluated: ZF with block Cholesky decomposition, ZF with block QR decomposition (QRD), and minimum mean square error (MMSE) with block Cholesky decomposition. The latter was the only one previously explored, but it used standard Cholesky decomposition. Our approach achieves an 11% improvement over the previous GPU-accelerated MMSE study. Through performance analysis, we observe a trade-off between precision and execution time. Reducing precision from FP64 to FP32 improves execution time but increases bit error rate (BER), with ZF-based QRD reducing execution time from 2.04μs to 1.24μs for a 128 × 8 MIMO size. The study also highlights that larger MIMO sizes, particularly 2048 × 32, require GPUs to fully utilize their computational and memory capabilities, especially under FP64 precision. In contrast, smaller matrices are compute-bound. Our results recommend GPUs for larger MIMO sizes, as they offer the parallelism and memory resources necessary to efficiently handle the computational demands of next-generation networks. This work paves the way for scalable, GPU-based massive MIMO uplink detection systems.}},
  author       = {{Nada, Ali and Ali, Hazem Ismail and Liu, Liang and Alkabani, Yousra}},
  issn         = {{0167-8191}},
  keywords     = {{High-performance computing; Massive MIMO; Matrix decomposition; Parallel computing; Uplink detection}},
  language     = {{eng}},
  publisher    = {{Elsevier}},
  journal      = {{Parallel Computing}},
  title        = {{Software acceleration of multi-user MIMO uplink detection on GPU}},
  url          = {{http://dx.doi.org/10.1016/j.parco.2025.103150}},
  doi          = {{10.1016/j.parco.2025.103150}},
  volume       = {{125}},
  year         = {{2025}},
}