Auto-Partitioning Heterogeneous Task-Parallel Programs with StreamBlocks
(2022) 31st International Conference on Parallel Architectures and Compilation Techniques, PACT 2022 p.398-411
- Abstract
FPGAs play an increasing role in the reconfigurable accelerator landscape. A key challenge in designing FPGA-based systems is partitioning computation between processor cores and FPGAs. An appropriate division of labor is difficult to predict in advance and requires experiments and measurements. When an investigation requires rewriting part of the system in a new language or with a new programming model, its high cost can delay design-space exploration. A single-language system with an appropriate programming model and compiler that targets both platforms transforms this tedious exploration to a simple recompile with new compiler directives. This work introduces StreamBlocks, a unified open-source software/FPGA compiler and runtime that takes dataflow programs written in Cal, and automatically partitions them across heterogeneous CPU/FPGA platforms. The explicit task-parallel semantics of dataflow allows our compiler to simultaneously take advantage of thread parallelism on software and spatial parallelism on hardware. StreamBlocks is augmented with a profile-guided autopartitioning tool that helps identify the best hardware-software partitions. We demonstrate the capability of our compiler in finding the right balance between hardware and software execution on both a high-end datacenter accelerator card and an embedded board. Our experiments exhibit a 4-7× speedup over trivial partitions. This speedup is achieved automatically with zero code modifications.
- author
- Emami, Mahyar ; Bezati, Endri ; Janneck, Jörn W. (LU) and Larus, James R.
- organization
- publishing date
- 2022-10
- type
- Chapter in Book/Report/Conference proceeding
- publication status
- published
- subject
- keywords
- Actors, partitioning, Reconfigurable computing
- host publication
- Proceedings of the 2022 International Conference on Parallel Architectures and Compilation Techniques
- pages
- 14 pages
- publisher
- IEEE - Institute of Electrical and Electronics Engineers Inc.
- conference name
- 31st International Conference on Parallel Architectures and Compilation Techniques, PACT 2022
- conference location
- Chicago, United States
- conference dates
- 2022-10-08 - 2022-10-10
- external identifiers
- scopus:85147330085
- ISBN
- 9781450398688
- DOI
- 10.1145/3559009.3569659
- language
- English
- LU publication?
- yes
- id
- 179785e4-6398-4091-aeb3-32eaf1bdf41a
- date added to LUP
- 2023-02-20 13:57:13
- date last changed
- 2025-04-04 14:02:53
@inproceedings{179785e4-6398-4091-aeb3-32eaf1bdf41a,
  abstract  = {{FPGAs play an increasing role in the reconfigurable accelerator landscape. A key challenge in designing FPGA-based systems is partitioning computation between processor cores and FPGAs. An appropriate division of labor is difficult to predict in advance and requires experiments and measurements. When an investigation requires rewriting part of the system in a new language or with a new programming model, its high cost can delay design-space exploration. A single-language system with an appropriate programming model and compiler that targets both platforms transforms this tedious exploration to a simple recompile with new compiler directives. This work introduces StreamBlocks, a unified open-source software/FPGA compiler and runtime that takes dataflow programs written in Cal, and automatically partitions them across heterogeneous CPU/FPGA platforms. The explicit task-parallel semantics of dataflow allows our compiler to simultaneously take advantage of thread parallelism on software and spatial parallelism on hardware. StreamBlocks is augmented with a profile-guided autopartitioning tool that helps identify the best hardware-software partitions. We demonstrate the capability of our compiler in finding the right balance between hardware and software execution on both a high-end datacenter accelerator card and an embedded board. Our experiments exhibit a 4-7× speedup over trivial partitions. This speedup is achieved automatically with zero code modifications.}},
  author    = {{Emami, Mahyar and Bezati, Endri and Janneck, Jörn W. and Larus, James R.}},
  booktitle = {{Proceedings of the 2022 International Conference on Parallel Architectures and Compilation Techniques}},
  isbn      = {{9781450398688}},
  keywords  = {{Actors; partitioning; Reconfigurable computing}},
  language  = {{eng}},
  pages     = {{398--411}},
  publisher = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  title     = {{Auto-Partitioning Heterogeneous Task-Parallel Programs with StreamBlocks}},
  url       = {{http://dx.doi.org/10.1145/3559009.3569659}},
  doi       = {{10.1145/3559009.3569659}},
  year      = {{2022}},
}