Lund University Publications

Fault-tolerant average execution time optimization for general-purpose multi-processor system-on-chips

Vayrynen, M.; Singh, V. and Larsson, Erik (2009) Design Automation and Test in Europe (DATE 2009) p. 484-489
Abstract
Fault-tolerance is due to the semiconductor technology development important, not only for safety-critical systems but also for general-purpose (non-safety critical) systems. However, instead of guaranteeing that deadlines always are met, it is for general-purpose systems important to minimize the average execution time (AET) while ensuring fault-tolerance. For a given job and a soft (transient) error probability, we define mathematical formulas for AET that includes bus communication overhead for both voting (active replication) and rollback-recovery with checkpointing (RRC). And, for a given multi-processor system-on-chip (MPSoC), we define integer linear programming (ILP) models that minimize AET including bus communication overhead when: (1) selecting the number of checkpoints when using RRC, (2) finding the number of processors and job-to-processor assignment when using voting, and (3) defining fault-tolerance scheme (voting or RRC) per job and defining its usage for each job. Experiments demonstrate significant savings in AET.
author
Vayrynen, M.; Singh, V. and Larsson, Erik
publishing date
2009
type
Chapter in Book/Report/Conference proceeding
publication status
published
host publication
2009 Design, Automation & Test in Europe Conference & Exhibition
pages
484 - 489
conference name
Design Automation and Test in Europe (DATE 2009)
conference location
Nice, France
conference dates
2009-04-20 - 2009-04-24
external identifiers
  • scopus:70350068433
ISBN
978-1-4244-3781-8
DOI
10.1109/DATE.2009.5090713
language
English
LU publication?
no
id
c5bd3bcc-246e-48f9-a31e-8b8c5a4bfdaa (old id 2340936)
date added to LUP
2016-04-04 13:32:38
date last changed
2022-01-30 02:39:00
@inproceedings{c5bd3bcc-246e-48f9-a31e-8b8c5a4bfdaa,
  abstract     = {{Fault-tolerance is due to the semiconductor technology development important, not only for safety-critical systems but also for general-purpose (non-safety critical) systems. However, instead of guaranteeing that deadlines always are met, it is for general-purpose systems important to minimize the average execution time (AET) while ensuring fault-tolerance. For a given job and a soft (transient) error probability, we define mathematical formulas for AET that includes bus communication overhead for both voting (active replication) and rollback-recovery with checkpointing (RRC). And, for a given multi-processor system-on-chip (MPSoC), we define integer linear programming (ILP) models that minimize AET including bus communication overhead when: (1) selecting the number of checkpoints when using RRC, (2) finding the number of processors and job-to-processor assignment when using voting, and (3) defining fault-tolerance scheme (voting or RRC) per job and defining its usage for each job. Experiments demonstrate significant savings in AET.}},
  author       = {{Vayrynen, M. and Singh, V. and Larsson, Erik}},
  booktitle    = {{2009 Design, Automation \& Test in Europe Conference \& Exhibition}},
  isbn         = {{978-1-4244-3781-8}},
  language     = {{eng}},
  pages        = {{484--489}},
  title        = {{Fault-tolerant average execution time optimization for general-purpose multi-processor system-on-chips}},
  url          = {{http://dx.doi.org/10.1109/DATE.2009.5090713}},
  doi          = {{10.1109/DATE.2009.5090713}},
  year         = {{2009}},
}