A FRAMEWORK FOR RECONFIGURATION-BASED FAULT-TOLERANCE IN DISTRIBUTED SYSTEMS / A. BONDAVALLI; M. CASTALDI; P. INVERARDI; F. DI GIANDOMENICO; S. PORCARELLI. - PRINT. - (2004), pp. 167-190. [10.1007/978-3-540-25939-8_8]

A FRAMEWORK FOR RECONFIGURATION-BASED FAULT-TOLERANCE IN DISTRIBUTED SYSTEMS

BONDAVALLI, ANDREA;
2004

Abstract

Many critical services are now provided by complex distributed systems that result from the reuse and integration of a large number of components. Because these components are intended for use in many contexts, they are generally not designed to achieve high dependability by themselves, and their behavior in the presence of faults can vary widely. Nevertheless, it is paramount that such systems survive failures of individual components, as well as attacks and intrusions, even if with degraded functionality. To provide control over unanticipated events, we focus on fault handling strategies, and in particular on system reconfiguration. The paper describes a framework that provides fault tolerance for component-based applications by detecting failures through monitoring and recovering through system reconfiguration. The framework is based on Lira, an agent-based distributed infrastructure for remote control and reconfiguration, and on a decision maker that selects suitable new configurations. Lira supports monitoring and reconfiguration at both the component and the application level, while decisions are taken on the basis of feedback from the evaluation of stochastic Petri net models.
Year: 2004
ISBN: 9783540231684
Published in: ARCHITECTING DEPENDABLE SYSTEMS II, LNCS 3069, LECTURE NOTES IN COMPUTER SCIENCE
Pages: 167-190
Authors: A. BONDAVALLI; M. CASTALDI; P. INVERARDI; F. DI GIANDOMENICO; S. PORCARELLI
Files in this item:
LNCS3069.pdf

Closed access

Type: Final refereed version (Postprint, Accepted manuscript)
License: All rights reserved
Size: 381.81 kB
Format: Adobe PDF

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/15701
Citations
  • PMC ND
  • Scopus 8
  • ISI 4