The Predictive Self-Healing feature in the Solaris™ 10 Operating System (Solaris OS) helps system administrators maximize availability in the face of software and hardware faults and provides a simpler, more effective end-to-end experience, explain Sun engineers Cynthia McGuire and Liane Praza in the current BigAdmin XPerts Session, "Predictive Self-Healing: Solaris Fault Manager and Service Manager."
Responding to a user's question about the benefits of Predictive Self-Healing, McGuire and Praza identify two sets of Solaris component technologies used to implement it.
The first set covers CPU, memory, and I/O bus nexus components and, they say, "...is used to facilitate a simplified administration model wherein traditional error messages intended for humans are replaced by binary telemetry events consumed by software components that automatically diagnose the underlying fault or defect. The results of the automated diagnosis are used to initiate self-healing activities such as administrator messaging, isolation or deactivation of faulty components, and guided repair."
Software services comprise the second set of technologies. "The Service Management Facility (smf(5)) presents a simplified administrative model for software services. Each software service has an advertised state, and failures are diagnosed automatically by the system to point to the root cause of problems," the experts explain. "Automated restart of failing services is performed whenever possible to reduce the time humans must spend repairing faulty software. If a service cannot be restarted automatically, smf(5) describes the cause of the fault so that time-to-repair is significantly shorter."
McGuire and Praza have also addressed questions in this session about smf(5) failure, using Predictive Self-Healing on an x86 system, and handling STDOUT when an application is converted to a service, among others.
BigAdmin XPerts Sessions, hosted on the BigAdmin System Administration Portal, allow users to ask experts specific questions on a particular subject. Past topics have included the Dynamic Tracing (DTrace) Framework, the GNOME 2.0 Desktop, Sun StorEdge™ 6920 features, and many more. Transcripts of closed sessions are available.