On-line Techniques to Adjust and Optimize Checkpointing Frequency
(2010) IEEE International Workshop on Reliability Aware System Design and Test (RASDAT 2010) p.29-33
- Abstract
- Due to the increased susceptibility of recent semiconductor technologies to soft errors, techniques for detecting and recovering from errors are required. Roll-back Recovery with Checkpointing (RRC) is a well-known technique that copes with soft errors by taking and storing checkpoints during the execution of a job. Employing this technique increases the average execution time (AET), i.e. the expected time for a job to complete, and thus impacts performance. To minimize the AET, the checkpointing frequency must be optimized. However, it has been shown that the optimal checkpointing frequency depends highly on the error probability. Since the error probability cannot be known in advance and can change over time, the optimal checkpointing frequency cannot be known at design time. In this paper we present techniques that adjust the checkpointing frequency on-line (during operation) with the goal of reducing the AET of a job. A set of experiments has been performed to demonstrate the benefits of the proposed techniques. The results show that these techniques adjust the checkpointing frequency so well that the resulting AET is close to the theoretical optimum.
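The dependence of the best checkpointing frequency on the error probability can be illustrated with a small sketch. It is not the model or the on-line technique from the paper: it assumes a simple geometric-retry model in which a job is split into equal segments, each segment is followed by a checkpoint of fixed cost, and a segment hit by an error is re-executed from the previous checkpoint. The function names and all parameter values are hypothetical.

```python
def expected_time(job_len, n_checkpoints, overhead, err_prob):
    """Expected completion time under an assumed geometric-retry model.

    job_len       -- error-free execution time of the job (time units)
    n_checkpoints -- number of equal segments, each ending in a checkpoint
    overhead      -- cost of taking one checkpoint (time units)
    err_prob      -- probability of an error in one time unit (assumption)
    """
    seg = job_len / n_checkpoints
    # Probability that a whole segment executes without an error.
    p_ok = (1.0 - err_prob) ** seg
    # Each segment is retried until it succeeds: 1 / p_ok attempts on average.
    return n_checkpoints * (seg + overhead) / p_ok


def best_checkpoint_count(job_len, overhead, err_prob, max_n=1000):
    """Brute-force search for the segment count with the lowest expected time."""
    return min(
        range(1, max_n + 1),
        key=lambda n: expected_time(job_len, n, overhead, err_prob),
    )


if __name__ == "__main__":
    # Hypothetical values: the best count grows as errors become more likely.
    for p in (0.0001, 0.001, 0.01):
        n = best_checkpoint_count(job_len=1000, overhead=5, err_prob=p)
        print(p, n, round(expected_time(1000, n, 5, p), 1))
```

Under this assumed model the best checkpoint count rises with the error probability, which is why a frequency fixed at design time can be far from optimal once the real error probability differs from the expected one.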
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/2340829
- author
- Nikolov, Dimitar LU ; Ingelsson, Urban ; Singh, Virendra and Larsson, Erik LU
- publishing date
- 2010
- type
- Contribution to conference
- publication status
- published
- subject
- pages
- 29 - 33
- conference name
- IEEE International Workshop on Reliability Aware System Design and Test (RASDAT 2010)
- conference location
- Bangalore, India
- conference dates
- 2010-01-07 - 2010-01-08
- language
- English
- LU publication?
- no
- id
- a22f277d-2520-40ce-9653-d6b9e05c0a54 (old id 2340829)
- date added to LUP
- 2016-04-04 14:15:03
- date last changed
- 2018-11-21 21:19:12
@misc{a22f277d-2520-40ce-9653-d6b9e05c0a54,
  abstract     = {{Due to the increased susceptibility of recent semiconductor technologies to soft errors, techniques for detecting and recovering from errors are required. Roll-back Recovery with Checkpointing (RRC) is a well-known technique that copes with soft errors by taking and storing checkpoints during the execution of a job. Employing this technique increases the average execution time (AET), i.e. the expected time for a job to complete, and thus impacts performance. To minimize the AET, the checkpointing frequency must be optimized. However, it has been shown that the optimal checkpointing frequency depends highly on the error probability. Since the error probability cannot be known in advance and can change over time, the optimal checkpointing frequency cannot be known at design time. In this paper we present techniques that adjust the checkpointing frequency on-line (during operation) with the goal of reducing the AET of a job. A set of experiments has been performed to demonstrate the benefits of the proposed techniques. The results show that these techniques adjust the checkpointing frequency so well that the resulting AET is close to the theoretical optimum.}},
  author       = {{Nikolov, Dimitar and Ingelsson, Urban and Singh, Virendra and Larsson, Erik}},
  language     = {{eng}},
  pages        = {{29--33}},
  title        = {{On-line Techniques to Adjust and Optimize Checkpointing Frequency}},
  year         = {{2010}},
}