Abstract
• Asynchronous multi-attempt multi-component mission is considered.
• Components are exposed to shocks.
• Components can abort attempts individually or upon a common abort command.
• Mission success probability should be balanced with cost of lost components.
• An algorithm for evaluating the mission metrics is suggested.
• Activation and aborting policy minimizing expected losses is analyzed.
Multi-attempt mission aborting systems have recently received significant attention from the reliability community. Existing models mostly assume either parallel or sequential execution of multiple attempts, which incurs high cost or low mission success probability (MSP), respectively. This paper advances the state of the art by considering a new model in which system components are activated with certain delays, so that the next component can be activated before the previous one leaves operation, balancing the expected cost of lost components (ECC) against the MSP. Each component may abort its attempt either according to an individual aborting policy defined by two parameters (the number of survived shocks and an operation time threshold) or upon receiving a common abort command. Because components may have different shock resistances and performance rates, their activation order can affect both MSP and ECC. We therefore formulate and solve the optimal attempt scheduling and aborting policy (SAP) problem, which determines the vector of component activation times and the individual attempt aborting policy of each component so as to minimize the expected mission losses (EML). The EML, a function of MSP and ECC, is evaluated using a new numerical procedure. A detailed case study of a cloud data processing system is provided to demonstrate the proposed model.
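
To make the two-parameter aborting policy and the EML objective more concrete, the following minimal Python sketch may help. It is illustrative only and rests on assumptions that the abstract does not state: the rule "abort once the number of survived shocks reaches the limit while the operation time is still below the threshold" is one plausible reading of the policy, and the linear form of the EML with a hypothetical `failure_penalty` parameter is an assumed example of "a function of MSP and ECC". All names (`AbortPolicy`, `should_abort`, `expected_mission_losses`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbortPolicy:
    """Individual aborting policy of one component (hypothetical naming)."""
    shock_limit: int        # number of survived shocks that triggers an abort
    time_threshold: float   # operation time after which the attempt continues


def should_abort(policy: AbortPolicy, shocks_survived: int,
                 operation_time: float, common_abort: bool = False) -> bool:
    """Decide whether a component aborts its current attempt.

    ASSUMPTION: the attempt is aborted when the shock limit is reached
    before the operation time threshold, or when a common abort command
    has been issued. The abstract only names the two parameters.
    """
    if common_abort:
        return True
    return (shocks_survived >= policy.shock_limit
            and operation_time < policy.time_threshold)


def expected_mission_losses(msp: float, ecc: float,
                            failure_penalty: float) -> float:
    """ASSUMED linear form: penalty for mission failure plus the ECC."""
    return failure_penalty * (1.0 - msp) + ecc


policy = AbortPolicy(shock_limit=2, time_threshold=10.0)
print(should_abort(policy, shocks_survived=2, operation_time=5.0))
print(expected_mission_losses(msp=0.75, ecc=3.0, failure_penalty=100.0))
```

Under these assumptions, a more cautious policy (a smaller shock limit or a larger time threshold) would tend to lower the ECC at the expense of the MSP, which is the trade-off that the SAP problem optimizes.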