Lund University Publications


ASaP: Automatic Software Prefetching for Sparse Tensor Computations in MLIR

Sotiropoulos, Konstantinos; Skeppstedt, Jonas and Stenström, Per (2025) 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops. In Proceedings of 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops, p. 1017-1027
Abstract

Sparse tensor computations suffer from irregular memory access patterns that degrade cache performance. While software prefetching can mitigate this, existing compiler approaches lack the semantic insight needed for effective optimization. We present ASaP, an automatic software prefetching framework integrated within MLIR's sparse tensor dialect. By leveraging semantic information (tensor formats and loop structure) available during sparsification, ASaP determines accurate buffer bounds and injects prefetches in both innermost and outer loops, achieving broader coverage than prior work. Evaluated on SuiteSparse matrices, ASaP demonstrates significant performance gains for unstructured matrices. For SpMV with innermost-loop prefetching, ASaP achieves a 1.38× speedup over Ainsworth & Jones. For SpMM with outer-loop prefetching, ASaP achieves a 1.28× speedup, while Ainsworth & Jones fails to generate prefetches. Our experiments reveal that disabling inaccurate hardware prefetchers frees critical resources for software prefetching, suggesting future architectures should expose prefetcher control as an optimization interface.
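To make the technique in the abstract concrete: the sketch below shows hand-written innermost-loop software prefetching for CSR SpMV in C. It is only an illustration of the general idea the paper targets, not ASaP's generated MLIR; the function name, the prefetch distance `DIST`, and the choice to bound the lookahead within the current row are all assumptions made for this example.

```c
/* Illustrative CSR SpMV with innermost-loop software prefetching.
 * Hypothetical sketch, not ASaP's actual output. */
#define DIST 16  /* hypothetical prefetch distance, in nonzeros */

void spmv_csr_prefetch(int n, const int *rowptr, const int *col,
                       const double *val, const double *x, double *y) {
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        int end = rowptr[i + 1];
        for (int j = rowptr[i]; j < end; j++) {
            /* Prefetch the x[] element that the iteration DIST steps
             * ahead will gather through the column-index array; the
             * bound check keeps the lookahead inside this row's range
             * (loosely analogous to the "accurate buffer bounds" the
             * abstract mentions). Second arg 0 = read, third arg 1 =
             * low temporal locality. */
            if (j + DIST < end)
                __builtin_prefetch(&x[col[j + DIST]], 0, 1);
            sum += val[j] * x[col[j]];
        }
        y[i] = sum;
    }
}
```

`__builtin_prefetch` is a GCC/Clang builtin that compiles to a non-faulting prefetch instruction where one exists; the profitable value of `DIST` is machine- and matrix-dependent, which is exactly the kind of decision an automatic framework has to make.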
author
Sotiropoulos, Konstantinos; Skeppstedt, Jonas and Stenström, Per
organization
publishing date
2025
type
Chapter in Book/Report/Conference proceeding
publication status
published
subject
keywords
software prefetching, sparse data structures, sparse tensors
host publication
Proceedings of 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops
series title
Proceedings of 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops
pages
11 pages
publisher
Association for Computing Machinery (ACM)
conference name
2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops
conference location
St. Louis, United States
conference dates
2025-11-16 - 2025-11-21
external identifiers
  • scopus:105023391502
ISBN
9798400718717
DOI
10.1145/3731599.3767477
language
English
LU publication?
yes
additional info
Publisher Copyright: © 2025 Copyright held by the owner/author(s).
id
03e045f9-da7e-4f34-b15a-446db55bbd5d
date added to LUP
2026-01-22 10:37:13
date last changed
2026-01-22 10:37:30
@inproceedings{03e045f9-da7e-4f34-b15a-446db55bbd5d,
  abstract     = {{Sparse tensor computations suffer from irregular memory access patterns that degrade cache performance. While software prefetching can mitigate this, existing compiler approaches lack the semantic insight needed for effective optimization. We present ASaP, an automatic software prefetching framework integrated within MLIR's sparse tensor dialect. By leveraging semantic information (tensor formats and loop structure) available during sparsification, ASaP determines accurate buffer bounds and injects prefetches in both innermost and outer loops, achieving broader coverage than prior work. Evaluated on SuiteSparse matrices, ASaP demonstrates significant performance gains for unstructured matrices. For SpMV with innermost-loop prefetching, ASaP achieves a 1.38× speedup over Ainsworth \& Jones. For SpMM with outer-loop prefetching, ASaP achieves a 1.28× speedup, while Ainsworth \& Jones fails to generate prefetches. Our experiments reveal that disabling inaccurate hardware prefetchers frees critical resources for software prefetching, suggesting future architectures should expose prefetcher control as an optimization interface.}},
  author       = {{Sotiropoulos, Konstantinos and Skeppstedt, Jonas and Stenström, Per}},
  booktitle    = {{Proceedings of 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops}},
  isbn         = {{9798400718717}},
  keywords     = {{software prefetching; sparse data structures; sparse tensors}},
  language     = {{eng}},
  month        = {{11}},
  pages        = {{1017--1027}},
  publisher    = {{Association for Computing Machinery (ACM)}},
  series       = {{Proceedings of 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops}},
  title        = {{ASaP: Automatic Software Prefetching for Sparse Tensor Computations in MLIR}},
  url          = {{http://dx.doi.org/10.1145/3731599.3767477}},
  doi          = {{10.1145/3731599.3767477}},
  year         = {{2025}},
}