Quantitative evaluation of software rejuvenation of a pool of service replicas / Scommegna, Leonardo; Becattini, Marco; Fontani, Giovanni; Paroli, Leonardo; Vicario, Enrico. - PRINT. - (2024), pp. 402-409. (Paper presented at the INTERNATIONAL WORKSHOP ON SOFTWARE AGING AND REJUVENATION) [10.1109/issrew63542.2024.00110].
Quantitative evaluation of software rejuvenation of a pool of service replicas
Scommegna, Leonardo; Becattini, Marco; Fontani, Giovanni; Paroli, Leonardo; Vicario, Enrico
2024
Abstract
Cloud-based systems require the management of large volumes of requests while maintaining specific levels of availability and performance. Each service is thus replicated into a pool of identical replicas. This allows for load distribution among the pool of replicas and a greater degree of fault tolerance compared to a single instance of the service, which stands as a single point of failure. The high availability and scalability requirements, coupled with the phenomenon of software aging, have made the replica-based approach pervasive in modern online services. In such configurations, the unavailability of a single replica, due to scheduled maintenance or unexpected failures, does not imply the unavailability of the whole system but rather an increase in the load on the remaining replicas. This identifies a performability problem in which the system can tolerate a certain number of offline replicas in the pool. However, once a certain threshold is exceeded, the resulting high workload pending on the online replicas could degrade the performance of the system, potentially leading to a failure in meeting the non-functional requirements. In this work, we study the problem of aging in a pool of service replicas. We characterize two inspection-based rejuvenation strategies that could be implemented in this context, which we identify as uncoordinated and coordinated rejuvenation. We represent them through the formalism of Stochastic Time Petri Nets (STPN) and, through steady-state analysis, conduct a performability evaluation of both models as the frequency of inspections and the pool size vary.
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.