Non-equidistant checkpointing and quantitative resilience modeling: a thesis in Computer Engineering
Thesis, open access

Priscila de Paula Silva
Master of Science (MS), University of Massachusetts Dartmouth
2022
DOI: https://doi.org/10.62791/20250

Abstract

Software-intensive systems rely on checkpointing, the periodic backup of program state, to prevent loss of computation. Non-equidistant checkpointing strategies have been proposed for specialized hardware and software applications as well as for specific failure distributions. However, a general method to identify a non-equidistant checkpointing strategy for an arbitrary combination of application and failure distribution would be beneficial. This thesis proposes an approach that uses a genetic algorithm to identify a near-optimal non-equidistant checkpointing strategy and requires knowledge of only the failure distribution. Experiments suggest that the approach consistently outperforms the traditional strategy of equidistant checkpoints under (i) a range of total processing times and (ii) different parameter values of distributions exhibiting increasing, constant, and decreasing failure rates.

Although many systems and processes are amenable to reliability modeling, researchers have also shown interest in restoring a system to its original performance after a deterioration. This is the subject of resilience engineering, which studies the ability of a system to respond to, absorb, adapt to, and recover from a disruptive event. Several metrics to quantify resilience have been proposed in the literature; however, fewer studies have proposed models to predict these metrics. This thesis therefore presents two alternative approaches, based on techniques from reliability engineering, to model and predict performance and resilience metrics: (i) bathtub-shaped hazard functions and (ii) mixture distributions. Historical data on job losses during recessions in the United States are used to assess the predictive accuracy of these approaches.
The results suggest that both approaches can accurately predict several of the data sets. However, data sets that experience a sudden drop in performance, or that deviate from the assumption of a single decrease followed by an increase, cannot be fit by either class of proposed models. Characterizing these more general scenarios will require additional modeling efforts.
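The abstract names the ingredients of the checkpointing approach (a failure distribution, a set of checkpoint times, and a genetic algorithm) but not the objective function or the genetic operators. The following Python sketch is a minimal illustration under stated assumptions, not the thesis's implementation. It assumes a Weibull failure distribution (shape below 1, equal to 1, and above 1 give decreasing, constant, and increasing failure rates), a fixed number `k` of checkpoints, at most one failure before the total processing time `T`, and an objective equal to the expected work lost since the last checkpoint. For an interval $[a, b)$ between checkpoints, that loss is $\int_a^b (x - a) f(x)\,dx = \int_a^b (F(b) - F(x))\,dx$. The function names `expected_lost_work` and `genetic_search`, the truncation-selection, uniform-crossover, and Gaussian-mutation operators, and all parameter defaults are hypothetical choices made for illustration.

```python
import math
import random


def weibull_cdf(t, shape, scale):
    """Weibull CDF F(t) = 1 - exp(-(t/scale)^shape) for t > 0 (assumed failure model)."""
    return 1.0 - math.exp(-((t / scale) ** shape)) if t > 0 else 0.0


def expected_lost_work(checkpoints, total_time, shape, scale, steps=100):
    """Expected work lost to a single failure before total_time (hypothetical objective).

    On each interval [a, b) between consecutive checkpoints, the loss
    int_a^b (x - a) f(x) dx is computed as int_a^b (F(b) - F(x)) dx
    with the midpoint rule.
    """
    bounds = [0.0] + sorted(checkpoints) + [total_time]
    lost = 0.0
    for a, b in zip(bounds, bounds[1:]):
        if b <= a:  # zero-width interval (duplicate or clipped checkpoint)
            continue
        h = (b - a) / steps
        fb = weibull_cdf(b, shape, scale)
        lost += h * sum(fb - weibull_cdf(a + (i + 0.5) * h, shape, scale)
                        for i in range(steps))
    return lost


def genetic_search(k, total_time, shape, scale, pop_size=40,
                   generations=40, mutation_sd=0.05, seed=0):
    """Search for k checkpoint times minimizing expected_lost_work (hypothetical GA)."""
    rng = random.Random(seed)

    def fitness(individual):
        return expected_lost_work(individual, total_time, shape, scale)

    # Seed with the equidistant strategy so the result is never worse than it.
    equidistant = [total_time * (i + 1) / (k + 1) for i in range(k)]
    population = [equidistant] + [
        sorted(rng.uniform(0.0, total_time) for _ in range(k))
        for _ in range(pop_size - 1)
    ]
    n_elite = max(2, pop_size // 4)
    for _ in range(generations):
        population.sort(key=fitness)
        elite = population[:n_elite]  # elitism: best individuals survive unchanged
        children = []
        while n_elite + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            # Uniform crossover: each checkpoint position comes from either parent.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Gaussian mutation, clipped to [0, total_time].
            child = [min(max(g + rng.gauss(0.0, mutation_sd * total_time), 0.0),
                         total_time) for g in child]
            children.append(sorted(child))
        population = elite + children
    return min(population, key=fitness)


if __name__ == "__main__":
    T, k = 10.0, 4
    shape, scale = 3.0, 10.0  # shape > 1: increasing failure rate
    equidistant = [T * (i + 1) / (k + 1) for i in range(k)]
    best = genetic_search(k, T, shape, scale)
    print("equidistant:", equidistant,
          round(expected_lost_work(equidistant, T, shape, scale), 4))
    print("GA:", [round(t, 2) for t in best],
          round(expected_lost_work(best, T, shape, scale), 4))
```

Seeding the population with the equidistant strategy and carrying the best quarter of each generation forward unchanged guarantees that, under this assumed objective, the returned strategy is never worse than equidistant checkpointing. For an increasing failure rate, the optimality condition of this objective favors shorter intervals near the end of the run, where failures are more likely, so the search tends to place checkpoints closer together there.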
License: CC BY-NC-ND 4.0 (open access)