Automatic Failure Diagnosis in Distributed Large-Scale Software Systems based on Timing Behavior Anomaly Correlation

Marwede, Nina S., Rohr, Matthias, van Hoorn, André and Hasselbring, Wilhelm (2009) Automatic Failure Diagnosis in Distributed Large-Scale Software Systems based on Timing Behavior Anomaly Correlation. [Paper] In: 13th European Conference on Software Maintenance and Reengineering (CSMR'09), March 24-27, 2009, Kaiserslautern, Germany. Proceedings of the 13th European Conference on Software Maintenance and Reengineering (CSMR 2009), pp. 47-57.

MarwedeRohrHoornHasselbring2009AutomaticFailureDiagnosisInDistributedLargeScaleSoftwareSystemsBasedOnTimingBehaviorAnomalyCorrelation-cameraReadysubmission-finalPageNumbers.pdf - Accepted Version

MarwedeRohrHoornHasselbring2008AutomaticFailureDiagnosisSupportInDistributedLargeScaleSoftwareSystemsBasedOnTimingBehaviorAnomalyCorrelation-slides.pdf - Presentation


Abstract

Manual failure diagnosis in large-scale software systems is time-consuming and error-prone. Automatic failure diagnosis support mechanisms can potentially narrow down, or even localize, faults within a very short time, which helps to preserve system availability. A large class of automatic failure diagnosis approaches consists of two steps: 1) computation of component anomaly scores; 2) global correlation of the anomaly scores for fault localization.

In this paper, we present an architecture-centric approach for the second step. In our approach, component anomaly scores are correlated based on architectural dependency graphs of the software system and a rule set to address error propagation. Moreover, the results are graphically visualized in order to support fault localization and to enhance maintainability. The visualization combines architectural diagrams automatically derived from monitoring data with failure diagnosis results. In a case study, the approach is applied to a distributed sample Web application which is subject to fault injection.
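To make the second step more concrete, the following is a minimal Python sketch of correlating per-component anomaly scores over a dependency graph. It is not the rule set from the paper, which the abstract does not spell out. The single rule used here (discount a component whose callee is at least as anomalous, because its anomaly may be propagated), the names `correlate` and `rank`, the threshold of 0.5, and the three-component example system are all assumptions made for illustration.

```python
from typing import Dict, List


def correlate(scores: Dict[str, float],
              calls: Dict[str, List[str]],
              threshold: float = 0.5) -> Dict[str, float]:
    """Adjust anomaly scores using one hypothetical propagation rule.

    Assumed rule (not taken from the paper): if a component calls another
    component that is anomalous (>= threshold) and at least as anomalous as
    itself, its own anomaly may be a propagated error, so its score is
    scaled down by the callee's score.
    """
    adjusted = {}
    for component, score in scores.items():
        callee_scores = [scores.get(c, 0.0) for c in calls.get(component, [])]
        worst_callee = max(callee_scores, default=0.0)
        if worst_callee >= threshold and worst_callee >= score:
            adjusted[component] = score * (1.0 - worst_callee)
        else:
            adjusted[component] = score
    return adjusted


def rank(adjusted: Dict[str, float]) -> List[str]:
    """Return components ordered from most to least suspicious."""
    return sorted(adjusted, key=lambda c: adjusted[c], reverse=True)


# Hypothetical three-tier system: web -> catalog -> db.
calls = {"web": ["catalog"], "catalog": ["db"], "db": []}
# Hypothetical anomaly scores in [0, 1], as produced by step 1.
scores = {"web": 0.7, "catalog": 0.8, "db": 0.9}

adjusted = correlate(scores, calls)
ranking = rank(adjusted)
```

In this made-up example all three components look anomalous, but only `db` has no anomalous callee that could explain its behavior, so it keeps its score and ends up first in `ranking`.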

Document Type: Conference or Workshop Item (Paper)
Research affiliation: Kiel University > Software Engineering
Publisher: IEEE Computer Society
Projects: Kieker
Date Deposited: 18 Feb 2012 06:05
Last Modified: 18 Dec 2012 09:51
URI: https://oceanrep.geomar.de/id/eprint/14469
