A Framework for Reconfiguration-Based Fault-Tolerance in Distributed Systems / A. Bondavalli; M. Castaldi; P. Inverardi; F. Di Giandomenico; S. Porcarelli. - Print. - (2004), pp. 167-190. [10.1007/978-3-540-25939-8_8]
A FRAMEWORK FOR RECONFIGURATION-BASED FAULT-TOLERANCE IN DISTRIBUTED SYSTEMS
BONDAVALLI, ANDREA
2004
Abstract
Many critical services are now provided by complex distributed systems built by reusing and integrating a large number of components. Because these components are used in many different contexts, they are generally not designed to achieve high dependability on their own, so their behavior in the presence of faults can vary widely. Nevertheless, it is paramount that such systems survive failures of individual components, as well as attacks and intrusions, even if with degraded functionality. To provide control over unanticipated events, we focus on fault handling strategies, particularly system reconfiguration. The paper describes a framework that provides fault tolerance for component-based applications by detecting failures through monitoring and recovering through system reconfiguration. The framework is based on Lira, a distributed agent infrastructure for remote control and reconfiguration, and on a decision maker that selects suitable new configurations. Lira supports monitoring and reconfiguration at both the component and the application level, while decisions are made using feedback from the evaluation of stochastic Petri net models.

File | Size | Format | |
---|---|---|---|
LNCS3069.pdf (closed access; type: final refereed version, postprint/accepted manuscript; license: DRM not defined) | 381.81 kB | Adobe PDF | |
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.