Hardware Support for CSP on a Java Chip-Multiprocessor

Gruian, Flavius; Schoeberl, Martin


Gruian, Flavius and Schoeberl, Martin (2013) In Microprocessors and Microsystems 37(4-5), p. 472-481

Abstract: Due to memory bandwidth limitations, chip multiprocessors (CMPs) that adopt the convenient shared memory model for their main memory architecture scale poorly. On-chip core-to-core communication is a solution to this problem that can lead to further performance increases for a number of multithreaded applications. Programmatically, the Communicating Sequential Processes (CSP) paradigm provides a sound computational model for such an architecture with message-based communication. In this paper we explore hardware support for CSP in the context of an embedded Java CMP. The hardware support for CSP consists of on-chip communication channels, implemented by a ring-based network-on-chip (NoC), to reduce the memory bandwidth pressure on the shared memory. The presented solution is scalable and also tailored to our limited resources and real-time predictability requirements. CMP architectures of three to eight processors were implemented and tested on both Altera (EP1C12, EP2C70) and Xilinx (XC3S1200e) FPGAs, showing that the NoC accounts for under 9% of the total device area used by the system. Compared to shared memory-based communication, our NoC-based solution is between 1.7 and 9.3 times faster for raw data transfer, depending on the communication and memory configuration. Application speed-up, on the other hand, is highly dependent on the type of processing, as our measurements show.

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/3047977

author

Gruian, Flavius (LU) and Schoeberl, Martin

organization

publishing date

2013

type

Contribution to journal

publication status

published

subject

Computer Sciences

in

Microprocessors and Microsystems

volume

37

issue

4-5

pages

472 - 481

publisher

Elsevier

external identifiers

wos:000324667900009
scopus:84878560167

ISSN

0141-9331

DOI

10.1016/j.micpro.2012.08.004

language

English

LU publication?

yes

id

6f3e7621-3f2c-4fde-ba96-a5fc7b432ff4 (old id 3047977)

date added to LUP

2016-04-01 09:55:49

date last changed

2025-11-19 13:32:09

@article{6f3e7621-3f2c-4fde-ba96-a5fc7b432ff4,
  abstract     = {{Due to memory bandwidth limitations, chip multiprocessors (CMPs) that adopt the convenient shared memory model for their main memory architecture scale poorly. On-chip core-to-core communication is a solution to this problem that can lead to further performance increases for a number of multithreaded applications. Programmatically, the Communicating Sequential Processes (CSP) paradigm provides a sound computational model for such an architecture with message-based communication. In this paper we explore hardware support for CSP in the context of an embedded Java CMP. The hardware support for CSP consists of on-chip communication channels, implemented by a ring-based network-on-chip (NoC), to reduce the memory bandwidth pressure on the shared memory. The presented solution is scalable and also tailored to our limited resources and real-time predictability requirements. CMP architectures of three to eight processors were implemented and tested on both Altera (EP1C12, EP2C70) and Xilinx (XC3S1200e) FPGAs, showing that the NoC accounts for under 9% of the total device area used by the system. Compared to shared memory-based communication, our NoC-based solution is between 1.7 and 9.3 times faster for raw data transfer, depending on the communication and memory configuration. Application speed-up, on the other hand, is highly dependent on the type of processing, as our measurements show.}},
  author       = {{Gruian, Flavius and Schoeberl, Martin}},
  issn         = {{0141-9331}},
  language     = {{eng}},
  number       = {{4-5}},
  pages        = {{472--481}},
  publisher    = {{Elsevier}},
  journal      = {{Microprocessors and Microsystems}},
  title        = {{Hardware Support for CSP on a Java Chip-Multiprocessor}},
  url          = {{http://dx.doi.org/10.1016/j.micpro.2012.08.004}},
  doi          = {{10.1016/j.micpro.2012.08.004}},
  volume       = {{37}},
  year         = {{2013}},
}
