Abstract
A trusted cloud-based software system is a highly reliable, available, and predictable advanced computing system with guaranteed Quality of Service (QoS). To maintain the high reliability of a cloud-based software system, it is critical to find a feasible way to counteract the software aging problem, in which system performance may progressively degrade due to exhaustion of system resources, fragmentation, and accumulation of errors. In this dissertation, we adopt a proactive technique, called software rejuvenation, to enhance the fault tolerance of a cloud-based system equipped with software standby spares. We extend the dynamic fault tree (DFT) formalism with Software SPare (SSP) gates to model the system reliability before and during a software rejuvenation process in an aging cloud-based software system. We present a novel analytical approach to derive the reliability function of a cloud-based SSP gate with either one or two Hot Software Spares (HSSs). We verify our approach using Continuous Time Markov Chains (CTMCs) for the case of a constant failure rate. Then, to extend our approach to non-constant failure rates, we adopt the Weibull distribution to model the increasing failure rates of software components with aging issues. We use case studies of a cloud-based software system with multiple HSSs to illustrate the validity of our approach for both the constant and non-constant failure rate cases. Based on the analytical reliability results, we show how software rejuvenation schedules can be created to keep the system reliability consistently above predefined critical levels.
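To make the link between a reliability function and a rejuvenation schedule concrete, the following is a minimal sketch and not the SSP gate derivation developed in this dissertation. It assumes identical components with Weibull-distributed lifetimes, treats the primary component and its hot spares as a simple parallel group (the system fails only when all units have failed), and assumes that rejuvenation restores every unit to an as-good-as-new state. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def weibull_reliability(t, shape, scale):
    """Reliability R(t) = exp(-(t/scale)^shape) of a single component.

    shape > 1 gives an increasing failure rate (aging);
    shape == 1 reduces to the constant failure rate (exponential) case.
    """
    return math.exp(-((t / scale) ** shape))


def parallel_reliability(t, shape, scale, n_units):
    """Reliability of n_units identical units in simple parallel redundancy.

    ASSUMPTION: this stands in for a primary plus hot spares; it is not
    the SSP gate model, which also covers the rejuvenation process itself.
    """
    r = weibull_reliability(t, shape, scale)
    return 1.0 - (1.0 - r) ** n_units


def rejuvenation_interval(threshold, shape, scale, n_units):
    """Latest time at which the parallel group still meets `threshold`.

    Solves 1 - (1 - R(t))^n = threshold for t in closed form.
    ASSUMPTION: rejuvenating at this interval resets all units to new.
    """
    r_unit = 1.0 - (1.0 - threshold) ** (1.0 / n_units)
    return scale * (-math.log(r_unit)) ** (1.0 / shape)


if __name__ == "__main__":
    # Hypothetical parameters: aging component (shape 2), scale of 1000 hours,
    # one primary plus two hot spares, critical reliability level 0.99.
    interval = rejuvenation_interval(0.99, shape=2.0, scale=1000.0, n_units=3)
    print(f"rejuvenate at least every {interval:.1f} hours")
```

Under these assumptions, rejuvenating no later than the returned interval keeps the modeled reliability at or above the chosen critical level between rejuvenations; a shape parameter of 1 recovers the constant failure rate case.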