Abstract
SUMMARY & CONCLUSIONS
Fault tolerance is an essential architectural attribute for achieving high reliability in many critical applications of digital systems. Automatic recovery and reconfiguration mechanisms (fault detection, location, isolation, and correction) play a crucial role in implementing fault tolerance, because an uncovered fault may propagate and contaminate non-faulty components, which in turn leads to a system or subsystem failure even when adequate redundancy exists. An accurate reliability analysis of these systems is therefore critical for effectively assessing and implementing fault tolerance. The analysis must account for both the system failure logic (i.e., the structure function) and the effectiveness of the recovery and reconfiguration mechanisms, which is captured by the fault and error handling model, also referred to as a coverage model. Incorporating coverage models into system reliability analysis introduces complex stochastic dependencies. Traditional solution methods are computationally inefficient and prone to numerical issues such as stiffness. To overcome these difficulties, a separable method based on the Simple and Efficient Algorithm (SEA) was proposed. An important aspect of SEA is that, whenever a coverage model satisfies the pre-conditions for its use, SEA is the simplest and most efficient algorithm for solving the model accurately. In this paper, we present the background and fundamental concepts of the SEA methodology and its application to various types of coverage models. We demonstrate the simplicity and efficiency of the SEA methodology through large-scale benchmark examples originating from well-known safety-critical applications.