Lund University Publications

Auto-Partitioning Heterogeneous Task-Parallel Programs with StreamBlocks

Emami, Mahyar; Bezati, Endri; Janneck, Jörn W. (LU) and Larus, James R. (2022). 31st International Conference on Parallel Architectures and Compilation Techniques, PACT 2022, p. 398-411
Abstract

FPGAs play an increasing role in the reconfigurable accelerator landscape. A key challenge in designing FPGA-based systems is partitioning computation between processor cores and FPGAs. An appropriate division of labor is difficult to predict in advance and requires experiments and measurements. When an investigation requires rewriting part of the system in a new language or with a new programming model, its high cost can delay design-space exploration. A single-language system with an appropriate programming model and compiler that targets both platforms transforms this tedious exploration to a simple recompile with new compiler directives. This work introduces StreamBlocks, a unified open-source software/FPGA compiler and runtime that takes dataflow programs written in Cal, and automatically partitions them across heterogeneous CPU/FPGA platforms. The explicit task-parallel semantics of dataflow allows our compiler to simultaneously take advantage of thread parallelism on software and spatial parallelism on hardware. StreamBlocks is augmented with a profile-guided autopartitioning tool that helps identify the best hardware-software partitions. We demonstrate the capability of our compiler in finding the right balance between hardware and software execution on both a high-end datacenter accelerator card and an embedded board. Our experiments exhibit a 4-7× speedup over trivial partitions. This speedup is achieved automatically with zero code modifications.
author
Emami, Mahyar; Bezati, Endri; Janneck, Jörn W. and Larus, James R.
publishing date
2022
type
Chapter in Book/Report/Conference proceeding
publication status
published
keywords
Actors, partitioning, Reconfigurable computing
host publication
Proceedings of the 2022 International Conference on Parallel Architectures and Compilation Techniques
pages
14 pages
publisher
IEEE - Institute of Electrical and Electronics Engineers Inc.
conference name
31st International Conference on Parallel Architectures and Compilation Techniques, PACT 2022
conference location
Chicago, United States
conference dates
2022-10-08 - 2022-10-10
external identifiers
  • scopus:85147330085
ISBN
9781450398688
DOI
10.1145/3559009.3569659
language
English
LU publication?
yes
id
179785e4-6398-4091-aeb3-32eaf1bdf41a
date added to LUP
2023-02-20 13:57:13
date last changed
2023-11-21 06:49:37
@inproceedings{179785e4-6398-4091-aeb3-32eaf1bdf41a,
  abstract     = {{FPGAs play an increasing role in the reconfigurable accelerator landscape. A key challenge in designing FPGA-based systems is partitioning computation between processor cores and FPGAs. An appropriate division of labor is difficult to predict in advance and requires experiments and measurements. When an investigation requires rewriting part of the system in a new language or with a new programming model, its high cost can delay design-space exploration. A single-language system with an appropriate programming model and compiler that targets both platforms transforms this tedious exploration to a simple recompile with new compiler directives. This work introduces StreamBlocks, a unified open-source software/FPGA compiler and runtime that takes dataflow programs written in Cal, and automatically partitions them across heterogeneous CPU/FPGA platforms. The explicit task-parallel semantics of dataflow allows our compiler to simultaneously take advantage of thread parallelism on software and spatial parallelism on hardware. StreamBlocks is augmented with a profile-guided autopartitioning tool that helps identify the best hardware-software partitions. We demonstrate the capability of our compiler in finding the right balance between hardware and software execution on both a high-end datacenter accelerator card and an embedded board. Our experiments exhibit a 4-7× speedup over trivial partitions. This speedup is achieved automatically with zero code modifications.}},
  author       = {{Emami, Mahyar and Bezati, Endri and Janneck, Jörn W. and Larus, James R.}},
  booktitle    = {{Proceedings of the 2022 International Conference on Parallel Architectures and Compilation Techniques}},
  isbn         = {{9781450398688}},
  keywords     = {{Actors; partitioning; Reconfigurable computing}},
  language     = {{eng}},
  pages        = {{398--411}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  title        = {{Auto-Partitioning Heterogeneous Task-Parallel Programs with StreamBlocks}},
  url          = {{http://dx.doi.org/10.1145/3559009.3569659}},
  doi          = {{10.1145/3559009.3569659}},
  year         = {{2022}},
}