
Maximizing Level of Confidence for Non-Equidistant Checkpointing

Nikolov, Dimitar and Larsson, Erik (2016) 21st Asia and South Pacific Design Automation Conference ASP-DAC
Abstract
To combat the increasing soft error rates in recent semiconductor technologies, it is important to employ fault tolerance techniques. While these techniques enable correct operation, they introduce a time overhead, which may cause a deadline violation in real-time systems (RTS). Since correct operation for RTS is defined as producing correct outputs while satisfying time constraints (deadlines), it is important to optimize the fault tolerance techniques such that the probability of meeting deadlines is maximized. To measure the extent to which a deadline is met, the concept of Level of Confidence (LoC), i.e. the probability of meeting the deadline, can be used. Previous studies have focused on evaluating the LoC for Roll-back Recovery with Checkpointing (RRC) with an equidistant distribution of the checkpoints. However, no studies have addressed the problem of evaluating the LoC for a non-equidistant distribution of the checkpoints. In this work, we provide an expression to evaluate the LoC for a non-equidistant checkpointing scheme, and propose a method, Clustered Checkpointing, to distribute a given number of checkpoints with the goal of maximizing the LoC. The results show that the LoC can be improved when a non-equidistant checkpointing scheme is used.
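The abstract defines LoC as the probability that a job using RRC meets its deadline for a given checkpoint distribution. As a rough illustration only, and not the analytical expression derived in the paper, the Python sketch below estimates LoC by Monte Carlo simulation for an arbitrary (possibly non-equidistant) checkpoint placement. It assumes a simplified toy model: soft errors arrive as a Poisson process with rate `error_rate` during segment execution, an error is detected at the checkpoint that ends the segment, the segment is then rolled back and re-executed, and every execution attempt pays a fixed checkpointing `overhead`. The function names `segment_lengths` and `estimate_loc` and all of their parameters are hypothetical.

```python
import math
import random
from typing import Sequence


def segment_lengths(checkpoints: Sequence[float], total_time: float) -> list[float]:
    """Turn interior checkpoint positions into execution segment lengths.

    Positions are measured in fault-free execution time and must be strictly
    increasing inside (0, total_time); a final checkpoint at total_time is assumed.
    """
    lengths = []
    previous = 0.0
    for pos in list(checkpoints) + [total_time]:
        if pos <= previous or pos > total_time:
            raise ValueError("checkpoints must be strictly increasing within (0, total_time)")
        lengths.append(pos - previous)
        previous = pos
    return lengths


def estimate_loc(checkpoints: Sequence[float], total_time: float, deadline: float,
                 error_rate: float, overhead: float,
                 trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(completion time <= deadline) under the toy RRC model."""
    rng = random.Random(seed)
    lengths = segment_lengths(checkpoints, total_time)
    # Assumed: an attempt of a segment of length t is error-free with prob. exp(-rate * t).
    p_ok = [math.exp(-error_rate * t) for t in lengths]
    met = 0
    for _ in range(trials):
        elapsed = 0.0
        for t, p in zip(lengths, p_ok):
            # Re-execute the segment until no error is detected, or give up
            # as soon as the deadline has already been exceeded.
            while elapsed <= deadline:
                elapsed += t + overhead  # run the segment, then take the checkpoint
                if rng.random() < p:
                    break  # no error detected: continue with the next segment
            if elapsed > deadline:
                break  # deadline missed; stop simulating this run
        if elapsed <= deadline:
            met += 1
    return met / trials
```

For example, `estimate_loc([25.0, 50.0, 75.0], 100.0, 130.0, 0.005, 1.0)` and `estimate_loc([20.0, 45.0, 75.0], 100.0, 130.0, 0.005, 1.0)` compare an equidistant and a non-equidistant placement of three checkpoints under the same assumed error rate and deadline. The Clustered Checkpointing method proposed in the paper is not implemented here; the simulation only makes the LoC definition concrete.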
author
Nikolov, Dimitar and Larsson, Erik
publishing date
2016
type
Contribution to conference
publication status
in press
keywords
soft errors, fault tolerance, checkpointing, real-time systems, reliability analysis
conference name
21st Asia and South Pacific Design Automation Conference ASP-DAC
language
English
LU publication?
yes
id
9c653543-b140-4437-ac30-a06d2eb3b33d (old id 8046905)
date added to LUP
2015-10-05 12:59:52
date last changed
2016-04-16 10:37:47
@inproceedings{9c653543-b140-4437-ac30-a06d2eb3b33d,
  abstract     = {To combat the increasing soft error rates in recent semiconductor technologies, it is important to employ fault tolerance techniques. While these techniques enable correct operation, they introduce a time overhead, which may cause a deadline violation in real-time systems (RTS). Since correct operation for RTS is defined as producing correct outputs while satisfying time constraints (deadlines), it is important to optimize the fault tolerance techniques such that the probability of meeting deadlines is maximized. To measure the extent to which a deadline is met, the concept of Level of Confidence (LoC), i.e. the probability of meeting the deadline, can be used. Previous studies have focused on evaluating the LoC for Roll-back Recovery with Checkpointing (RRC) with an equidistant distribution of the checkpoints. However, no studies have addressed the problem of evaluating the LoC for a non-equidistant distribution of the checkpoints. In this work, we provide an expression to evaluate the LoC for a non-equidistant checkpointing scheme, and propose a method, Clustered Checkpointing, to distribute a given number of checkpoints with the goal of maximizing the LoC. The results show that the LoC can be improved when a non-equidistant checkpointing scheme is used.},
  author       = {Nikolov, Dimitar and Larsson, Erik},
  booktitle    = {21st Asia and South Pacific Design Automation Conference ASP-DAC},
  keywords     = {soft errors,fault tolerance,checkpointing,real-time systems,reliability analysis},
  language     = {eng},
  title        = {Maximizing Level of Confidence for Non-Equidistant Checkpointing},
  year         = {2016},
}