Hardware Support for CSP on a Java Chip-Multiprocessor

Gruian, Flavius and Schoeberl, Martin (2013). In Microprocessors and Microsystems 37(4-5), p. 472-481
Abstract
Due to memory bandwidth limitations, chip multiprocessors (CMPs) that adopt the convenient shared-memory model for their main memory architecture scale poorly. On-chip core-to-core communication is a solution to this problem that can further increase performance for a number of multithreaded applications. Programmatically, the Communicating Sequential Processes (CSP) paradigm provides a sound computational model for such an architecture with message-based communication. In this paper we explore hardware support for CSP in the context of an embedded Java CMP. The hardware support for CSP consists of on-chip communication channels, implemented by a ring-based network-on-chip (NoC), to reduce the memory bandwidth pressure on the shared memory. The presented solution is scalable and tailored to our limited resources and real-time predictability requirements. CMP architectures of three to eight processors were implemented and tested on both Altera (EP1C12, EP2C70) and Xilinx (XC3S1200e) FPGAs, showing that the NoC accounts for under 9% of the total device area used by the system. Compared to shared memory-based communication, our NoC-based solution is between 1.7 and 9.3 times faster for raw data transfer, depending on the communication and memory configuration. Application speed-up, on the other hand, is highly dependent on the type of processing, as our measurements show.
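The CSP message-passing model named in the abstract can be illustrated in plain Java. The following is a minimal software sketch, not the paper's hardware channel interface: it assumes the standard `java.util.concurrent.SynchronousQueue` as a stand-in for a rendezvous channel between two processes, and the names `CspChannelSketch` and `sumOverChannel` are hypothetical, introduced only for this illustration.

```java
import java.util.concurrent.SynchronousQueue;

public class CspChannelSketch {

    // Sends the values 1..n from a producer thread to the calling thread
    // over a rendezvous channel and returns their sum. The two threads
    // share no mutable state other than the channel itself.
    static int sumOverChannel(int n) {
        // Assumption: SynchronousQueue models an unbuffered CSP channel;
        // put() blocks until a matching take() is ready, and vice versa.
        SynchronousQueue<Integer> channel = new SynchronousQueue<>();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) {
                    channel.put(i); // blocks until the consumer receives
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int sum = 0;
        try {
            for (int i = 0; i < n; i++) {
                sum += channel.take(); // blocks until the producer sends
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while receiving", e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOverChannel(10)); // prints 55
    }
}
```

In this sketch all data moves through the channel rather than through shared variables, which is the programming style the abstract associates with CSP. The software queue still lives in shared memory, so it says nothing about the performance of the NoC-based channels the paper measures.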
author
Gruian, Flavius and Schoeberl, Martin
publishing date
2013
type
Contribution to journal
publication status
published
in
Microprocessors and Microsystems
volume
37
issue
4-5
pages
472 - 481
publisher
Elsevier
external identifiers
  • wos:000324667900009
  • scopus:84878560167
ISSN
0141-9331
DOI
10.1016/j.micpro.2012.08.004
language
English
LU publication?
yes
id
6f3e7621-3f2c-4fde-ba96-a5fc7b432ff4 (old id 3047977)
date added to LUP
2016-04-01 09:55:49
date last changed
2020-04-01 01:11:44
@article{6f3e7621-3f2c-4fde-ba96-a5fc7b432ff4,
  abstract     = {Due to memory bandwidth limitations, chip multiprocessors (CMPs) that adopt the convenient shared-memory model for their main memory architecture scale poorly. On-chip core-to-core communication is a solution to this problem that can further increase performance for a number of multithreaded applications. Programmatically, the Communicating Sequential Processes (CSP) paradigm provides a sound computational model for such an architecture with message-based communication. In this paper we explore hardware support for CSP in the context of an embedded Java CMP. The hardware support for CSP consists of on-chip communication channels, implemented by a ring-based network-on-chip (NoC), to reduce the memory bandwidth pressure on the shared memory. The presented solution is scalable and tailored to our limited resources and real-time predictability requirements. CMP architectures of three to eight processors were implemented and tested on both Altera (EP1C12, EP2C70) and Xilinx (XC3S1200e) FPGAs, showing that the NoC accounts for under 9% of the total device area used by the system. Compared to shared memory-based communication, our NoC-based solution is between 1.7 and 9.3 times faster for raw data transfer, depending on the communication and memory configuration. Application speed-up, on the other hand, is highly dependent on the type of processing, as our measurements show.},
  author       = {Gruian, Flavius and Schoeberl, Martin},
  issn         = {0141-9331},
  language     = {eng},
  number       = {4-5},
  pages        = {472--481},
  publisher    = {Elsevier},
  journal      = {Microprocessors and Microsystems},
  title        = {Hardware Support for CSP on a Java Chip-Multiprocessor},
  url          = {http://dx.doi.org/10.1016/j.micpro.2012.08.004},
  doi          = {10.1016/j.micpro.2012.08.004},
  volume       = {37},
  year         = {2013},
}