Multiplexed redundant execution: A technique for efficient fault tolerance in chip multiprocessors

Subramanyan, Pramod; Singh, Virendra; Saluja, Kewal K.; Larsson, Erik

(2010) Design Automation and Test in Europe (DATE) p.1572-1577

Abstract: Continued CMOS scaling is expected to make future microprocessors susceptible to transient faults, hard faults, manufacturing defects and process variations, causing fault tolerance to become important even for general purpose processors targeted at the commodity market. To mitigate the effect of decreased reliability, a number of fault-tolerant architectures have been proposed that exploit the natural coarse-grained redundancy available in chip multiprocessors (CMPs). These architectures execute a single application using two threads, typically as one leading thread and one trailing thread. Errors are detected by comparing the outputs produced by these two threads. These architectures schedule a single application on two cores or two thread contexts of a CMP. As a result, besides the additional energy consumption and performance overhead that is required to provide fault tolerance, such schemes also impose a throughput loss. Consequently, a CMP which is capable of executing 2n threads in non-redundant mode can only execute half as many (n) threads in fault-tolerant mode.
In this paper we propose multiplexed redundant execution (MRE), a low-overhead architectural technique that executes multiple trailing threads on a single processor core. MRE exploits the observation that it is possible to accelerate the execution of the trailing thread by providing execution assistance from the leading thread. Execution assistance combined with coarse-grained multithreading allows MRE to schedule multiple trailing threads concurrently on a single core with only a small performance penalty. Our results show that MRE increases the throughput of a fault-tolerant CMP by 16% over an ideal dual modular redundant (DMR) architecture. © 2010 EDAA.
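
The leading/trailing comparison that the abstract describes can be pictured with a small software analogy. The sketch below is not the paper's hardware mechanism: `run_thread`, `first_mismatch`, `square_plus_one`, and the bit-flip fault model are all hypothetical names and assumptions introduced only for illustration. It runs the same computation twice and reports the first position where the two output streams disagree.

```python
from typing import Callable, List, Optional


def run_thread(program: Callable[[int], int],
               inputs: List[int],
               fault_at: Optional[int] = None) -> List[int]:
    """Run `program` over `inputs` and collect its outputs.

    If `fault_at` is given, the output at that index has its lowest bit
    flipped, a toy stand-in (assumption) for a transient fault.
    """
    outputs = []
    for i, x in enumerate(inputs):
        y = program(x)
        if i == fault_at:
            y ^= 1  # simulated single-bit upset
        outputs.append(y)
    return outputs


def first_mismatch(leading: List[int], trailing: List[int]) -> Optional[int]:
    """Return the index of the first differing output, or None if all match."""
    for i, (a, b) in enumerate(zip(leading, trailing)):
        if a != b:
            return i
    return None


def square_plus_one(x: int) -> int:
    """Hypothetical workload used only for this illustration."""
    return x * x + 1


inputs = [1, 2, 3, 4]
leading = run_thread(square_plus_one, inputs)
trailing_ok = run_thread(square_plus_one, inputs)
trailing_bad = run_thread(square_plus_one, inputs, fault_at=2)

print(first_mismatch(leading, trailing_ok))   # no disagreement expected
print(first_mismatch(leading, trailing_bad))  # disagreement at the faulty index
```

In this toy model each application needs two full executions, which mirrors the throughput cost the abstract attributes to conventional redundant execution; how MRE multiplexes several trailing threads on one core is described in the paper itself and is not modeled here.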

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/2340932

author

Subramanyan, Pramod; Singh, Virendra; Saluja, Kewal K. and Larsson, Erik

publishing date

2010

type

Chapter in Book/Report/Conference proceeding

publication status

published

subject

Electrical Engineering, Electronic Engineering, Information Engineering

host publication

2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010)

pages

1572 - 1577

conference name

Design Automation and Test in Europe (DATE)

conference location

Dresden, Germany

conference dates

2010-03-08 - 2010-03-12

external identifiers

scopus:77953101372

ISBN

978-1-4244-7054-9

DOI

10.1109/DATE.2010.5457061

language

English

LU publication?

no

id

70c069b4-2960-4f3b-8cbb-1f3c2289c32a (old id 2340932)

date added to LUP

2016-04-04 13:34:03

date last changed

2022-03-23 22:22:04

@inproceedings{70c069b4-2960-4f3b-8cbb-1f3c2289c32a,
  abstract     = {{Continued CMOS scaling is expected to make future microprocessors susceptible to transient faults, hard faults, manufacturing defects and process variations, causing fault tolerance to become important even for general purpose processors targeted at the commodity market. To mitigate the effect of decreased reliability, a number of fault-tolerant architectures have been proposed that exploit the natural coarse-grained redundancy available in chip multiprocessors (CMPs). These architectures execute a single application using two threads, typically as one leading thread and one trailing thread. Errors are detected by comparing the outputs produced by these two threads. These architectures schedule a single application on two cores or two thread contexts of a CMP. As a result, besides the additional energy consumption and performance overhead that is required to provide fault tolerance, such schemes also impose a throughput loss. Consequently, a CMP which is capable of executing 2n threads in non-redundant mode can only execute half as many (n) threads in fault-tolerant mode. In this paper we propose multiplexed redundant execution (MRE), a low-overhead architectural technique that executes multiple trailing threads on a single processor core. MRE exploits the observation that it is possible to accelerate the execution of the trailing thread by providing execution assistance from the leading thread. Execution assistance combined with coarse-grained multithreading allows MRE to schedule multiple trailing threads concurrently on a single core with only a small performance penalty. Our results show that MRE increases the throughput of a fault-tolerant CMP by 16% over an ideal dual modular redundant (DMR) architecture. © 2010 EDAA.}},
  author       = {{Subramanyan, Pramod and Singh, Virendra and Saluja, Kewal K. and Larsson, Erik}},
  booktitle    = {{2010 Design, Automation \& Test in Europe Conference \& Exhibition (DATE 2010)}},
  isbn         = {{978-1-4244-7054-9}},
  language     = {{eng}},
  pages        = {{1572--1577}},
  title        = {{Multiplexed redundant execution: A technique for efficient fault tolerance in chip multiprocessors}},
  url          = {{http://dx.doi.org/10.1109/DATE.2010.5457061}},
  doi          = {{10.1109/DATE.2010.5457061}},
  year         = {{2010}},
}

Lund University Publications