Hardware Support for CSP on a Java Chip-Multiprocessor
(2013) In Microprocessors and Microsystems 37(4-5). p.472-481
- Abstract
- Due to memory bandwidth limitations, chip multiprocessors (CMP) adopting the convenient shared memory model for their main memory architecture scale poorly. On-chip core-to-core communication is a solution to this problem, that can lead to further performance increase for a number of multithreaded applications. Programmatically, the Communicating Sequential Processes (CSP) paradigm provides a sound computational model for such an architecture with message based communication. In this paper we explore hardware support for CSP in the context of an embedded Java CMP. The hardware support for CSP are on-chip communication channels, implemented by a ring-based network-on-chip (NoC), to reduce the memory bandwidth pressure on the shared memory. The presented solution is scalable and also specific for our limited resources and real-time predictability requirements. CMP architectures of three to eight processors were implemented and tested on both Altera (EP1C12, EP2C70) and Xilinx (XC3S1200e) FPGAs, showing that the NoC accounts for under 9% of the total device area used by the system. Compared to shared memory-based communication, our NoC-based solution is between 1.7 and 9.3 times faster for raw data transfer, depending on the communication and memory configuration. Application speed-up, on the other hand, is highly dependent on the type of processing, as our measurements show.
Please use this URL to cite or link to this publication:
https://lup.lub.lu.se/record/3047977
- author
- Gruian, Flavius LU and Schoeberl, Martin
- publishing date
- 2013
- type
- Contribution to journal
- publication status
- published
- in
- Microprocessors and Microsystems
- volume
- 37
- issue
- 4-5
- pages
- 472 - 481
- publisher
- Elsevier
- external identifiers
- wos:000324667900009
- scopus:84878560167
- ISSN
- 0141-9331
- DOI
- 10.1016/j.micpro.2012.08.004
- language
- English
- LU publication?
- yes
- id
- 6f3e7621-3f2c-4fde-ba96-a5fc7b432ff4 (old id 3047977)
- date added to LUP
- 2016-04-01 09:55:49
- date last changed
- 2022-01-25 18:04:06
@article{6f3e7621-3f2c-4fde-ba96-a5fc7b432ff4,
  abstract = {{Due to memory bandwidth limitations, chip multiprocessors (CMP) adopting the convenient shared memory model for their main memory architecture scale poorly. On-chip core-to-core communication is a solution to this problem, that can lead to further performance increase for a number of multithreaded applications. Programmatically, the Communicating Sequential Processes (CSP) paradigm provides a sound computational model for such an architecture with message based communication. In this paper we explore hardware support for CSP in the context of an embedded Java CMP. The hardware support for CSP are on-chip communication channels, implemented by a ring-based network-on-chip (NoC), to reduce the memory bandwidth pressure on the shared memory. The presented solution is scalable and also specific for our limited resources and real-time predictability requirements. CMP architectures of three to eight processors were implemented and tested on both Altera (EP1C12, EP2C70) and Xilinx (XC3S1200e) FPGAs, showing that the NoC accounts for under 9% of the total device area used by the system. Compared to shared memory-based communication, our NoC-based solution is between 1.7 and 9.3 times faster for raw data transfer, depending on the communication and memory configuration. Application speed-up, on the other hand, is highly dependent on the type of processing, as our measurements show.}},
  author = {{Gruian, Flavius and Schoeberl, Martin}},
  issn = {{0141-9331}},
  language = {{eng}},
  number = {{4-5}},
  pages = {{472--481}},
  publisher = {{Elsevier}},
  journal = {{Microprocessors and Microsystems}},
  title = {{Hardware Support for CSP on a Java Chip-Multiprocessor}},
  url = {{http://dx.doi.org/10.1016/j.micpro.2012.08.004}},
  doi = {{10.1016/j.micpro.2012.08.004}},
  volume = {{37}},
  year = {{2013}},
}