A Fault-Tolerant Distributed Legacy-based System and Its Evaluation / A. Bondavalli; S. Chiaradonna; D. Cotroneo; L. Romano. - Print. - (2003), pp. 303-320. [10.1007/978-3-540-45214-0_22]

A Fault-Tolerant Distributed Legacy-based System and Its Evaluation

BONDAVALLI, ANDREA
2003

Abstract

In this paper, we present a complete architecture for improving the dependability of complex COTS and legacy-based systems. For long-lived applications, such as most of those being built nowadays by integrating legacy subsystems, fault treatment is a very important part of the fault tolerance strategy. The paper argues for careful diagnosis and damage assessment, and for precise and effective recovery actions tailored to the specific fault affecting a component and/or to the extent of the damage it has suffered. In our proposal, threshold-based mechanisms are used to trigger alternative actions. The design and implementation of the resulting solution are illustrated through a case study: a distributed architectural framework that handles replicated legacy-based subsystems, in which replication and voting are used for error detection and masking. An experimental prototype deployed over a COTS-based LAN is described; it enabled a dependability analysis that combines direct measurements with analytical modeling. © Springer-Verlag Berlin Heidelberg 2003.
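The abstract combines two mechanisms: majority voting over replica outputs to detect and mask errors, and a threshold-based mechanism that decides which recovery action to take. The sketch below illustrates how these could fit together. It is not the paper's implementation. As an assumption, it uses the alpha-count heuristic from the dependability literature as the threshold mechanism: a per-replica score that grows by 1 on each detected error and decays by a factor `k` on each error-free round. The class names `AlphaCount` and `ReplicatedService`, the parameter values, and the two recovery actions (`"resync"` for suspected transient faults, `"isolate"` for suspected persistent ones) are all hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlphaCount:
    """Hypothetical threshold-based fault discriminator (alpha-count heuristic).

    The score grows by 1 on each detected error and decays by factor k on
    each error-free round. Reaching `threshold` suggests a persistent
    (permanent or intermittent) fault rather than a transient one.
    """
    k: float = 0.5          # decay factor, 0 <= k < 1 (assumed value)
    threshold: float = 3.0  # alpha_T (assumed value)
    score: float = 0.0

    def update(self, error: bool) -> bool:
        self.score = self.score + 1 if error else self.score * self.k
        return self.score >= self.threshold


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of replicas, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


class ReplicatedService:
    """Hypothetical voter: masks a minority of wrong replica outputs and uses
    one AlphaCount per replica to choose between two recovery actions."""

    def __init__(self, n_replicas: int = 3, k: float = 0.5, threshold: float = 3.0):
        self.counters = [AlphaCount(k, threshold) for _ in range(n_replicas)]
        self.actions: list[tuple[int, str]] = []  # (replica index, action) log

    def round(self, outputs: Sequence[Hashable]) -> Optional[Hashable]:
        voted = majority_vote(outputs)
        if voted is None:
            # No majority: the error cannot be masked, so report a failure.
            self.actions.append((-1, "no majority"))
            return None
        for i, out in enumerate(outputs):
            disagrees = out != voted
            if self.counters[i].update(disagrees):
                # Threshold reached: persistent fault suspected, heavier recovery.
                self.actions.append((i, "isolate"))
                self.counters[i].score = 0.0  # assume a restarted replica starts clean
            elif disagrees:
                # Below threshold: treat as transient, lighter recovery.
                self.actions.append((i, "resync"))
        return voted
```

For example, if replica 2 disagrees with the majority in three consecutive rounds, its score goes 1, 2, 3, so the first two rounds log `"resync"` and the third logs `"isolate"`. The decay factor is what separates the two cases: with `k = 0.5` and an error at most every other round, the score never exceeds 2 and so stays below the threshold of 3, meaning sporadic transient errors only ever trigger the lighter action.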
Year: 2003
ISBN: 9783540202240
Published in: LADC 2003 - 1st Latin-American Dependable Computing Conference - Lecture Notes in Computer Science N. 2487
First page: 303
Last page: 320
Authors: A. Bondavalli; S. Chiaradonna; D. Cotroneo; L. Romano
Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/316326