Clustered checkpointing: Maximizing the level of confidence for non-equidistant checkpointing

Nikolov, Dimitar; Larsson, Erik

(2017) In Integration, the VLSI Journal 58. p.549-562

Abstract: Employing fault tolerance often introduces a time overhead, which may cause a deadline violation in real-time systems (RTS). Therefore, for RTS it is important to optimize the fault tolerance techniques such that the probability of meeting the deadlines, i.e. the Level of Confidence (LoC), is maximized. Previous studies have focused on evaluating the LoC for equidistant checkpointing. However, no studies have addressed the problem of evaluating the LoC for non-equidistant checkpointing. In this work, we provide an expression to evaluate the LoC for non-equidistant checkpointing. Further, we detail an exhaustive search approach to find the distribution of a given number of checkpoints that results in the maximal LoC. Since the exhaustive search approach is very time-consuming, we propose the Clustered Checkpointing method, a heuristic that distributes checkpoints in a number of clusters with the goal of maximizing the LoC. The results show that the LoC can be improved when non-equidistant checkpointing is used. Further, the results indicate that the proposed Clustered Checkpointing method is capable of finding the distribution that results in the maximal LoC in much shorter time than the exhaustive search approach, while considering only a few clusters.
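To make the abstract's two central ideas concrete (evaluating the LoC for a non-equidistant checkpoint distribution, and exhaustively searching for the best distribution), here is a minimal Python sketch. It is not the expression or algorithm from the paper, which this record does not reproduce. The fault model is an assumption made only for illustration: time is discrete, each time unit executes error-free with probability p_unit, every segment ends with a checkpoint costing `overhead` time units, and a segment that experienced an error is re-executed from the previous checkpoint. All function and parameter names are hypothetical.

```python
from itertools import combinations


def level_of_confidence(segments, deadline, p_unit, overhead):
    """Probability that all segments finish by `deadline` (assumed toy model)."""
    # dist[t] = probability that the segments handled so far finish at exactly time t
    dist = {0: 1.0}
    for length in segments:
        cost = length + overhead      # one attempt: execute the segment, then checkpoint
        q = p_unit ** length          # probability that one attempt is error-free
        nxt = {}
        for t, prob in dist.items():
            attempt_prob = prob * q   # success on the first attempt
            finish = t + cost
            while finish <= deadline:  # later attempts only matter while they can still fit
                nxt[finish] = nxt.get(finish, 0.0) + attempt_prob
                attempt_prob *= 1.0 - q   # one more failed attempt before the success
                finish += cost
        dist = nxt
    return sum(dist.values())


def exhaustive_search(work, n_checkpoints, deadline, p_unit, overhead):
    """Try every placement of the checkpoints; the last one always sits at the end."""
    best_loc, best_segments = 0.0, None
    for cuts in combinations(range(1, work), n_checkpoints - 1):
        bounds = (0, *cuts, work)
        segments = [b - a for a, b in zip(bounds, bounds[1:])]
        loc = level_of_confidence(segments, deadline, p_unit, overhead)
        if loc > best_loc:
            best_loc, best_segments = loc, segments
    return best_loc, best_segments


if __name__ == "__main__":
    loc, segments = exhaustive_search(work=12, n_checkpoints=3, deadline=20,
                                      p_unit=0.98, overhead=1)
    print(f"best LoC {loc:.4f} with segment lengths {segments}")
```

The number of candidate placements grows combinatorially with the amount of work and the number of checkpoints, which is consistent with the abstract's remark that exhaustive search is very time-consuming and motivates a heuristic such as Clustered Checkpointing.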

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/0161c14b-764c-48f2-8431-c0cd799707af

author

Nikolov, Dimitar (LU) and Larsson, Erik (LU)

organization

publishing date

2017-06

type

Contribution to journal

publication status

published

subject

Other Electrical Engineering, Electronic Engineering, Information Engineering

keywords

Fault tolerance, Reliability analysis, Real-time systems, Checkpointing

in

Integration, the VLSI Journal

volume

58

pages

549 - 562

publisher

Elsevier

external identifiers

scopus:85009754351
wos:000405052700058

ISSN

0167-9260

DOI

10.1016/j.vlsi.2016.10.013

language

English

LU publication?

yes

id

0161c14b-764c-48f2-8431-c0cd799707af

date added to LUP

2017-02-14 13:53:50

date last changed

2025-10-14 10:47:58

@article{0161c14b-764c-48f2-8431-c0cd799707af,
  abstract     = {{Employing fault tolerance often introduces a time overhead, which may cause a deadline violation in real-time systems (RTS). Therefore, for RTS it is important to optimize the fault tolerance techniques such that the probability to meet the deadlines, i.e. the Level of Confidence (LoC), is maximized. Previous studies have focused on evaluating the LoC for equidistant checkpointing. However, no studies have addressed the problem of evaluating the LoC for non-equidistant checkpointing. In this work, we provide an expression to evaluate the LoC for non-equidistant checkpointing. Further, we detail an exhaustive search approach to find the distribution of a given number of checkpoints that results in the maximal LoC. Since the exhaustive search approach is very time-consuming, we propose the Clustered Checkpointing method, a heuristic that distributes checkpoints in a number of clusters with the goal to maximize the LoC. The results show that the LoC can be improved when non-equidistant checkpointing is used. Further, the results indicate that the proposed Clustered Checkpointing method is capable to find the distribution that results in the maximal LoC in much shorter time than the exhaustive search approach, while considering only few clusters.}},
  author       = {{Nikolov, Dimitar and Larsson, Erik}},
  issn         = {{0167-9260}},
  keywords     = {{Fault tolerance; Reliability analysis; Real-time systems; Checkpointing}},
  language     = {{eng}},
  pages        = {{549--562}},
  publisher    = {{Elsevier}},
  journal      = {{Integration, the VLSI Journal}},
  title        = {{Clustered checkpointing: Maximizing the level of confidence for non-equidistant checkpointing}},
  url          = {{http://dx.doi.org/10.1016/j.vlsi.2016.10.013}},
  doi          = {{10.1016/j.vlsi.2016.10.013}},
  volume       = {{58}},
  year         = {{2017}},
}

Lund University Publications
