Students' Projects

Topology-Aware Failure Diagnosis for Distributed Enterprise Systems

Amar Agrawal


The increasing complexity of modern enterprise systems has made it necessary to automate many management tasks. Such self-managing systems, also known as autonomic systems, can be self-healing, self-configuring, self-optimizing and self-securing. In this project we focus on the self-healing aspect: enabling applications to detect and diagnose failures in order to isolate their root cause. Several approaches have been proposed for root-cause analysis, including expert systems and data mining techniques. However, none of these uses the application topology information to help the failure diagnosis process. We believe that the application dependency information can make the diagnosis process more focused and accurate even in the presence of uncertainty (e.g., lost or spurious symptoms). We model the system's failure dependency information as a Bayesian Belief Network and use it to infer the system state from the failure symptoms observed in the application. The output of the project will be a tool that monitors applications for failures and ranks components by their inferred failure probabilities.
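As a rough illustration of the ranking idea, the sketch below builds a tiny belief network from a dependency topology and ranks components by posterior failure probability. It is not the project's implementation. Everything in it is an assumption made for the example: the three-tier topology (web, app, db), the symptom names, the noisy-OR form of the symptom probabilities, the numeric priors and link strengths, and the function name rank_components. A link strength below 1 stands in for lost symptoms, and the leak term stands in for spurious ones. Inference is by brute-force enumeration, which is only practical for a handful of components.

```python
from itertools import product


def rank_components(priors, links, observed, leak=0.01):
    """Rank components by P(failed | observed symptoms).

    priors:   {component: prior failure probability}
    links:    {symptom: {component: probability that a failure of this
               component triggers the symptom}} (derived from the topology)
    observed: {symptom: True if seen, False if not seen}
    leak:     probability that a symptom fires with no failed parent
    """
    names = sorted(priors)
    posterior = {name: 0.0 for name in names}
    evidence = 0.0

    # Enumerate every joint failure state of the components.
    for states in product([False, True], repeat=len(names)):
        failed = dict(zip(names, states))

        # Prior probability of this joint state (components independent).
        weight = 1.0
        for name in names:
            weight *= priors[name] if failed[name] else 1.0 - priors[name]

        # Noisy-OR likelihood of each symptom observation.
        for symptom, parents in links.items():
            p_absent = 1.0 - leak
            for component, strength in parents.items():
                if failed[component]:
                    p_absent *= 1.0 - strength
            weight *= (1.0 - p_absent) if observed[symptom] else p_absent

        evidence += weight
        for name in names:
            if failed[name]:
                posterior[name] += weight

    ranked = [(name, posterior[name] / evidence) for name in names]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical topology: web depends on app, app depends on db, so a
# failure lower in the chain can surface as a symptom higher up.
priors = {"web": 0.05, "app": 0.05, "db": 0.05}
links = {
    "web_timeout": {"web": 0.9, "app": 0.9, "db": 0.9},
    "app_error": {"app": 0.9, "db": 0.9},
    "db_alert": {"db": 0.9},
}
observed = {"web_timeout": True, "app_error": True, "db_alert": False}

ranking = rank_components(priors, links, observed)
for component, probability in ranking:
    print(f"{component}: {probability:.3f}")
```

With these made-up numbers the app tier ranks first: a database failure would most likely also have raised db_alert, and a web-only failure does not explain app_error. A realistic system would need approximate or structured inference in place of enumeration, since the number of joint states doubles with each component.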

