Optimizing the Level of Confidence for Multiple Jobs
(2016) In IEEE Transactions on Computers
- abstract
- Correct operation of real-time systems (RTS) is defined as producing correct results within given time constraints (deadlines). As RTS are becoming more susceptible to soft errors, employing fault-tolerant techniques is crucial. Rollback Recovery with Checkpointing (RRC) is an efficient fault-tolerant technique. However, RRC introduces a time overhead that depends on the number of checkpoints, and this overhead may cause deadline violations. Therefore, it is important to have a design-time metric that evaluates to what extent a time constraint is met, so that RRC can be optimized. In our previous work, we introduced the Level of Confidence (LoC), i.e., the probability of meeting a given deadline, and showed that for a single job there exists an optimal number of checkpoints that maximizes the LoC. In this paper, we assume that a deadline and a set of jobs employing RRC are given, and the objective is to find the optimal checkpoint assignment that maximizes the LoC. We show that our previous work is not sufficient for multiple jobs. Therefore, we derive an expression to compute the LoC and propose an efficient method to maximize the LoC for multiple jobs.
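The sketch below is only an illustration. It shows how the single-job LoC idea described in the abstract can be computed. It does not reproduce the paper's multi-job expression or the authors' model. All names (loc_single_job, best_checkpoint_count) are hypothetical, and the model itself is an assumption:

- a job of fault-free length T is split into n equal segments, each followed by a checkpoint of overhead tau;
- soft errors arrive as a Poisson process with rate lam;
- a failed segment attempt is rolled back and re-executed in full;
- the LoC is the probability that all n segments succeed before deadline D.

```python
import math


def loc_single_job(T: float, n: int, tau: float, lam: float, D: float) -> float:
    """Hypothetical LoC: P(job with n checkpointed segments meets deadline D).

    Assumed model (not from the paper): each attempt of a segment takes
    T/n + tau time units and succeeds if no Poisson(lam) error hits it.
    """
    s = T / n + tau                 # duration of one segment attempt
    q = math.exp(-lam * s)          # assumed P(no error during one attempt)
    # With F failed attempts the job finishes at (n + F) * s,
    # so the deadline tolerates at most k failed attempts.
    k = math.floor(D / s) - n
    if k < 0:
        return 0.0                  # even an error-free run misses D
    # F follows a negative binomial law: j failures before n successes.
    return sum(
        math.comb(n + j - 1, j) * q**n * (1 - q) ** j
        for j in range(k + 1)
    )


def best_checkpoint_count(T: float, tau: float, lam: float, D: float,
                          n_max: int = 50) -> int:
    """Brute-force search over n = 1..n_max for the n maximizing the LoC."""
    return max(range(1, n_max + 1),
               key=lambda n: loc_single_job(T, n, tau, lam, D))
```

The example uses T = 100, tau = 2, lam = 0.01 and D = 150. Under this assumed model, a single segment gives a LoC of about 0.36 and n = 20 gives about 0.58, while n around 7 reaches roughly 0.86. This reflects the trade-off the abstract refers to: more checkpoints shorten re-execution after an error but add overhead. For multiple jobs sharing one deadline, the per-job results cannot simply be reused, which is the gap the paper addresses.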
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/5465092
- author
- Nikolov, Dimitar and Larsson, Erik
- publishing date
- 2016
- type
- Contribution to journal
- publication status
- published
- keywords
- checkpointing, fault tolerance, real-time systems, reliability analysis, soft errors
- in
- IEEE Transactions on Computers
- publisher
- IEEE - Institute of Electrical and Electronics Engineers Inc.
- external identifiers
- wos:000372752600020
- scopus:84963762168
- ISSN
- 0018-9340
- DOI
- 10.1109/TC.2015.2439254
- language
- English
- LU publication?
- yes
- id
- 59dae7e8-3732-488e-8dd2-615cb079fed5 (old id 5465092)
- date added to LUP
- 2016-04-01 14:17:56
- date last changed
- 2022-01-27 23:51:03
@article{59dae7e8-3732-488e-8dd2-615cb079fed5,
  abstract = {{Correct operation of real-time systems (RTS) is defined as producing correct results within given time constraints (deadlines). As RTS are becoming more susceptible to soft errors, employing fault-tolerant techniques is crucial. Rollback Recovery with Checkpointing (RRC) is an efficient fault-tolerant technique. However, RRC introduces a time overhead which depends on the number of checkpoints. The imposed time overhead may cause deadline violations. Therefore, it is important at design time to have a metric to evaluate to what extent a time constraint is met such that RRC can be optimized. In our previous work we introduced the usage of Level of Confidence (LoC), i.e. the probability to meet a given deadline, and showed for a single job that there exists an optimal number of checkpoints which results in the maximal LoC. In this paper we assume given is a deadline and a set of jobs that employ RRC, and the objective is to find the optimal checkpoint assignment that maximizes the LoC. We show that our previous work is not sufficient for multiple jobs. Therefore, we derive an expression to compute the LoC and propose an efficient method to maximize the LoC for multiple jobs.}},
  author = {Nikolov, Dimitar and Larsson, Erik},
  issn = {{0018-9340}},
  keywords = {{checkpointing, fault tolerance, real-time systems, reliability analysis, soft errors}},
  language = {{eng}},
  publisher = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  journal = {{IEEE Transactions on Computers}},
  title = {{Optimizing the Level of Confidence for Multiple Jobs}},
  url = {{http://dx.doi.org/10.1109/TC.2015.2439254}},
  doi = {{10.1109/TC.2015.2439254}},
  year = {{2016}},
}