Software acceleration of multi-user MIMO uplink detection on GPU

Nada, Ali; Ali, Hazem Ismail; Liu, Liang; Alkabani, Yousra

Nada, Ali; Ali, Hazem Ismail; Liu, Liang; Alkabani, Yousra (2025) In Parallel Computing 125.

Abstract: This paper presents the exploration of GPU-accelerated block-wise decompositions for zero-forcing (ZF) based QR and Cholesky methods applied to massive multiple-input multiple-output (MIMO) uplink detection algorithms. Three algorithms are evaluated: ZF with block Cholesky decomposition, ZF with block QR decomposition (QRD), and minimum mean square error (MMSE) with block Cholesky decomposition. The latter was the only one previously explored, but it used standard Cholesky decomposition. Our approach achieves an 11% improvement over the previous GPU-accelerated MMSE study. Through performance analysis, we observe a trade-off between precision and execution time. Reducing precision from FP64 to FP32 improves execution time but increases bit error rate (BER), with ZF-based QRD reducing execution time from 2.04μs to 1.24μs for a 128 × 8 MIMO size. The study also highlights that larger MIMO sizes, particularly 2048 × 32, require GPUs to fully utilize their computational and memory capabilities, especially under FP64 precision. In contrast, smaller matrices are compute-bound. Our results recommend GPUs for larger MIMO sizes, as they offer the parallelism and memory resources necessary to efficiently handle the computational demands of next-generation networks. This work paves the way for scalable, GPU-based massive MIMO uplink detection systems.
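All three detectors compared in the abstract reduce to solving a small K × K linear system for each received vector. The following minimal sketch illustrates that arithmetic under stated assumptions. It is plain Python with FP64 complex numbers, not the authors' GPU kernels. Every function name and the block-size parameter `b` are hypothetical. MMSE assumes unit-energy transmit symbols, so the regularizer is σ²I. QR uses modified Gram-Schmidt, which may differ from the block QRD evaluated in the paper. ZF solves (H^H H) x = H^H y, MMSE adds σ² to the diagonal of the Gram matrix, and ZF-QR avoids forming H^H H by solving R x = Q^H y.

```python
# Minimal pure-Python sketch of linear massive MIMO uplink detection.
# Assumption: illustrative only, NOT the paper's GPU implementation; all names are hypothetical.
# H is N x K (N base-station antennas, K users), y has length N.
# Matrices are lists of rows of Python complex numbers (FP64).

def conj_t(A):
    """Hermitian transpose A^H."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def cholesky(A):
    """Unblocked Cholesky: lower-triangular L with A = L L^H (reads lower triangle only)."""
    n = len(A)
    L = [[0j] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j].real - sum(abs(L[j][k]) ** 2 for k in range(j))
        L[j][j] = complex(d ** 0.5)
        for i in range(j + 1, n):
            s = A[i][j] - sum(L[i][k] * L[j][k].conjugate() for k in range(j))
            L[i][j] = s / L[j][j]
    return L

def block_cholesky(A, b):
    """Right-looking blocked Cholesky with block size b (hypothetical parameter)."""
    n = len(A)
    A = [row[:] for row in A]  # working copy; trailing blocks are updated in place
    L = [[0j] * n for _ in range(n)]
    for s in range(0, n, b):
        e = min(s + b, n)
        # 1) Diagonal block: A11 = L11 L11^H via the unblocked routine.
        L11 = cholesky([row[s:e] for row in A[s:e]])
        for i in range(s, e):
            L[i][s:e] = L11[i - s]
        # 2) Panel: solve L21 L11^H = A21 row by row.
        for i in range(e, n):
            for j in range(s, e):
                t = A[i][j] - sum(L[i][k] * L[j][k].conjugate() for k in range(s, j))
                L[i][j] = t / L[j][j]
        # 3) Trailing update: A22 -= L21 L21^H (lower triangle only).
        for i in range(e, n):
            for j in range(e, i + 1):
                A[i][j] -= sum(L[i][k] * L[j][k].conjugate() for k in range(s, e))
    return L

def forward_sub(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def back_sub(U, c):
    """Solve U x = c for upper-triangular U."""
    n = len(c)
    x = [0j] * n
    for i in reversed(range(n)):
        x[i] = (c[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x

def chol_solve(A, rhs, block=2):
    """Solve A x = rhs with A = L L^H: forward substitution, then backward with L^H."""
    L = block_cholesky(A, block)
    return back_sub(conj_t(L), forward_sub(L, rhs))

def zf_cholesky(H, y, block=2):
    """ZF: solve (H^H H) x = H^H y."""
    Hh = conj_t(H)
    return chol_solve(matmul(Hh, H), matvec(Hh, y), block)

def mmse_cholesky(H, y, sigma2, block=2):
    """MMSE (assuming unit-energy symbols): solve (H^H H + sigma2 I) x = H^H y."""
    Hh = conj_t(H)
    G = matmul(Hh, H)
    for k in range(len(G)):
        G[k][k] += sigma2
    return chol_solve(G, matvec(Hh, y), block)

def zf_qr(H, y):
    """ZF via thin QR (modified Gram-Schmidt): H = Q R, then solve R x = Q^H y."""
    N, K = len(H), len(H[0])
    Q, R = [], [[0j] * K for _ in range(K)]  # Q holds K orthonormal columns
    for j in range(K):
        v = [H[r][j] for r in range(N)]
        for i in range(j):
            R[i][j] = sum(Q[i][r].conjugate() * v[r] for r in range(N))
            v = [v[r] - R[i][j] * Q[i][r] for r in range(N)]
        R[j][j] = complex(sum(abs(t) ** 2 for t in v) ** 0.5)
        Q.append([t / R[j][j] for t in v])
    c = [sum(Q[i][r].conjugate() * y[r] for r in range(N)) for i in range(K)]
    return back_sub(R, c)
```

Take a toy noiseless system `y = matvec(H, x)` with a full-column-rank `H`. For it, `zf_cholesky(H, y)`, `zf_qr(H, y)` and `mmse_cholesky(H, y, 0.0)` should all return `x` up to rounding, while `mmse_cholesky(H, y, sigma2)` with `sigma2 > 0` shrinks the estimate. Blocking changes only the order of operations, so `block_cholesky(A, b)` should match `cholesky(A)` for any `b`. In general, the block form casts most of the work as matrix-matrix updates, which is what typically makes it attractive on GPUs.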

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/67fc8d0b-3d20-4135-afd0-d8d9f721aaf4

author

Nada, Ali; Ali, Hazem Ismail; Liu, Liang; Alkabani, Yousra

publishing date

2025-09

type

Contribution to journal

publication status

published

subject

Telecommunications

keywords

High-performance computing, Massive MIMO, Matrix decomposition, Parallel computing, Uplink detection

in

Parallel Computing

volume

125

article number

103150

publisher

Elsevier

external identifiers

scopus:105014755752

ISSN

0167-8191

DOI

10.1016/j.parco.2025.103150

language

English

LU publication?

yes

id

67fc8d0b-3d20-4135-afd0-d8d9f721aaf4

date added to LUP

2025-10-16 12:11:29

date last changed

2025-10-17 02:23:27

@article{67fc8d0b-3d20-4135-afd0-d8d9f721aaf4,
  abstract     = {{This paper presents the exploration of GPU-accelerated block-wise decompositions for zero-forcing (ZF) based QR and Cholesky methods applied to massive multiple-input multiple-output (MIMO) uplink detection algorithms. Three algorithms are evaluated: ZF with block Cholesky decomposition, ZF with block QR decomposition (QRD), and minimum mean square error (MMSE) with block Cholesky decomposition. The latter was the only one previously explored, but it used standard Cholesky decomposition. Our approach achieves an 11\% improvement over the previous GPU-accelerated MMSE study. Through performance analysis, we observe a trade-off between precision and execution time. Reducing precision from FP64 to FP32 improves execution time but increases bit error rate (BER), with ZF-based QRD reducing execution time from 2.04~$\mu$s to 1.24~$\mu$s for a 128 $\times$ 8 MIMO size. The study also highlights that larger MIMO sizes, particularly 2048 $\times$ 32, require GPUs to fully utilize their computational and memory capabilities, especially under FP64 precision. In contrast, smaller matrices are compute-bound. Our results recommend GPUs for larger MIMO sizes, as they offer the parallelism and memory resources necessary to efficiently handle the computational demands of next-generation networks. This work paves the way for scalable, GPU-based massive MIMO uplink detection systems.}},
  author       = {{Nada, Ali and Ali, Hazem Ismail and Liu, Liang and Alkabani, Yousra}},
  issn         = {{0167-8191}},
  keywords     = {{High-performance computing; Massive MIMO; Matrix decomposition; Parallel computing; Uplink detection}},
  language     = {{eng}},
  publisher    = {{Elsevier}},
  journal      = {{Parallel Computing}},
  title        = {{Software acceleration of multi-user MIMO uplink detection on GPU}},
  url          = {{https://doi.org/10.1016/j.parco.2025.103150}},
  doi          = {{10.1016/j.parco.2025.103150}},
  volume       = {{125}},
  year         = {{2025}},
}

Lund University Publications
