Lund University Publications

Level of Confidence Evaluation and Its Usage for Roll-back Recovery with Checkpointing Optimization

Nikolov, Dimitar; Ingelsson, Urban; Singh, Virendra and Larsson, Erik (2011) 5th Workshop on Dependable and Secure Nanocomputing, p. 59-64
Abstract
Increasing soft error rates for semiconductor devices manufactured in later technologies enforce the use of fault-tolerant techniques such as Roll-back Recovery with Checkpointing (RRC). However, RRC introduces time overhead that increases the completion (execution) time. For non-real-time systems, research has focused on optimizing RRC and shown that it is possible to find the optimal number of checkpoints such that the average execution time is minimal. While minimal average execution time is important, for real-time systems it is important to provide a high probability that deadlines are met. Hence, there is a need for probabilistic guarantees that jobs employing RRC complete before a given deadline. First, we present a mathematical framework for the evaluation of level of confidence, the probability that a given deadline is met, when RRC is employed. Second, we present an optimization method for RRC that finds the number of checkpoints that results in the minimal completion time while the minimal completion time satisfies a given level of confidence requirement. Third, we use the proposed framework to evaluate probabilistic guarantees for RRC optimization in non-real-time systems.
author
Nikolov, Dimitar; Ingelsson, Urban; Singh, Virendra and Larsson, Erik
organization
publishing date
2011
type
Chapter in Book/Report/Conference proceeding
publication status
published
subject
host publication
2011 IEEE/IFIP 41st International Conference on Dependable Systems and Networks Workshops (DSN-W)
pages
6 pages
publisher
IEEE - Institute of Electrical and Electronics Engineers Inc.
conference name
5th Workshop on Dependable and Secure Nanocomputing
conference dates
2011-06-27
external identifiers
  • scopus:80052149078
DOI
10.1109/DSNW.2011.5958836
language
English
LU publication?
no
id
99398648-1840-4be2-b40d-2bc9bd3c96b1 (old id 2733979)
date added to LUP
2016-04-04 11:52:14
date last changed
2022-04-08 07:57:04
@inproceedings{99398648-1840-4be2-b40d-2bc9bd3c96b1,
  abstract     = {{Increasing soft error rates for semiconductor devices manufactured in later technologies enforce the use of fault-tolerant techniques such as Roll-back Recovery with Checkpointing (RRC). However, RRC introduces time overhead that increases the completion (execution) time. For non-real-time systems, research has focused on optimizing RRC and shown that it is possible to find the optimal number of checkpoints such that the average execution time is minimal. While minimal average execution time is important, for real-time systems it is important to provide a high probability that deadlines are met. Hence, there is a need for probabilistic guarantees that jobs employing RRC complete before a given deadline. First, we present a mathematical framework for the evaluation of level of confidence, the probability that a given deadline is met, when RRC is employed. Second, we present an optimization method for RRC that finds the number of checkpoints that results in the minimal completion time while the minimal completion time satisfies a given level of confidence requirement. Third, we use the proposed framework to evaluate probabilistic guarantees for RRC optimization in non-real-time systems.}},
  author       = {{Nikolov, Dimitar and Ingelsson, Urban and Singh, Virendra and Larsson, Erik}},
  booktitle    = {{2011 IEEE/IFIP 41st International Conference on Dependable Systems and Networks Workshops (DSN-W)}},
  language     = {{eng}},
  pages        = {{59--64}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  title        = {{Level of Confidence Evaluation and Its Usage for Roll-back Recovery with Checkpointing Optimization}},
  url          = {{https://lup.lub.lu.se/search/files/5874317/2733983.pdf}},
  doi          = {{10.1109/DSNW.2011.5958836}},
  year         = {{2011}},
}