Automatic System Recovery
The automatic system recovery (ASR) feature allows an Ultra 450 system to resume
operation after experiencing certain hardware faults or failures. Power-on self-test
(POST) and OpenBoot diagnostics (OBDiag) can automatically detect failed hardware
components, while an auto-configuring capability designed into the OpenBoot PROM
(OBP) firmware allows the system to deconfigure failed components and restore
system operation. As long as the system is capable of operating without the failed
component, the ASR feature enables the system to reboot automatically, without operator
intervention. Such a “degraded boot” allows the system to continue operating while
a service call is generated to replace the faulty part.
If a faulty component is detected during the power-on sequence, the component is
deconfigured and, if the system remains capable of functioning without it, the boot
sequence continues. In a running system, certain types of failures (such as a
processor failure) can cause an automatic system reset. If this happens, the ASR
functionality allows the system to reboot immediately, provided that the system can
function without the failed component. This prevents a faulty hardware component
from keeping the entire system down or causing the system to crash again.
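The decision described above can be illustrated with a minimal sketch. It is a simplified model, not the firmware's actual logic: the Component class, the required flag, and the component names are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str                 # hypothetical component label
    passed_diagnostics: bool  # result reported by POST/OBDiag in this model
    required: bool            # hypothetical flag: system cannot run without it


def plan_boot(components):
    """Return (can_boot, deconfigured_names) for a list of components."""
    failed = [c for c in components if not c.passed_diagnostics]
    # A degraded boot is possible only if no failed component is essential.
    can_boot = all(not c.required for c in failed)
    return can_boot, [c.name for c in failed]


parts = [
    Component("cpu0", passed_diagnostics=True, required=False),
    Component("cpu1", passed_diagnostics=False, required=False),
    Component("system-board", passed_diagnostics=True, required=True),
]
can_boot, deconfigured = plan_boot(parts)
print(can_boot, deconfigured)
```

In this model the failed cpu1 is deconfigured and the boot continues, whereas a failure of a component marked as required would stop the boot.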
“Soft” Deconfiguration via Status Property
To support a degraded boot capability, the OBP uses the IEEE 1275 Client Interface
(via the device tree) to “mark” devices as either failed or disabled, by creating an
appropriate “status” property in the corresponding device tree node. By convention,
UNIX will not activate a driver for any subsystem so marked.
Thus, as long as the failed component is electrically dormant (that is, it does not
cause random bus errors, signal noise, or similar disturbances), the system can be
rebooted automatically and resume operation while a service call is made.
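The convention can be sketched from the operating system's side as follows. This is an illustrative model only: the device tree is represented as a plain dictionary, the device paths are hypothetical, and the status values ("okay", "disabled", "fail") are assumed to follow the IEEE 1275 convention rather than taken from this document.

```python
def driver_may_attach(node):
    """Decide whether a driver should be bound to a device tree node.

    `node` maps property names to values. A missing "status" property is
    treated as operational (an assumption made for this sketch).
    """
    status = node.get("status", "okay")
    return status == "okay"


def usable_devices(tree):
    """Return the paths of all nodes that are not marked failed or disabled."""
    return [path for path, node in tree.items() if driver_may_attach(node)]


# Hypothetical device paths, for illustration only.
tree = {
    "/pci@1f,4000/scsi@3": {"device_type": "scsi"},
    "/pci@1f,4000/network@1,1": {"device_type": "network", "status": "fail"},
    "/pci@6,2000/scsi@1": {"device_type": "scsi", "status": "disabled"},
}
print(usable_devices(tree))
```

Only the first node is returned, because the other two carry a status property that marks them as failed or disabled.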
“Hard” Deconfiguration
In two special cases of deconfiguring a subsystem (CPUs and memory), the OBP
actually takes action beyond just creating an appropriate “status” property in the
device tree. In the first moments after reset, the OBP must initialize and functionally
configure (or bypass) these subsystems in order for the rest of the system to work
correctly. These actions are taken based on the status of two NVRAM configuration
variables, post-status and asr-status, which hold the override information
supplied either from POST or via a manual user override (see “ASR User Override
Capability” on page 11).
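As a rough illustration of how two override sources might be combined, the sketch below treats each variable as a set of component names to bypass. The encoding of post-status and asr-status is not described here, so the set representation, the component names, and the rule that either source alone is enough to bypass a component are all assumptions.

```python
def components_to_bypass(post_status, asr_status):
    """Combine both override sources (assumed here to be sets of names)."""
    # Assumption: a component is bypassed if either POST or the user flagged it.
    return set(post_status) | set(asr_status)


def configure(installed, post_status, asr_status):
    """Return the installed components that remain configured, in order."""
    bypass = components_to_bypass(post_status, asr_status)
    return [c for c in installed if c not in bypass]


# Hypothetical CPU and memory bank names, for illustration only.
installed = ["cpu0", "cpu1", "bank0", "bank1"]
active = configure(installed, post_status={"cpu1"}, asr_status={"bank1"})
print(active)
```

Here cpu1 is bypassed because of a diagnostic result and bank1 because of a user override, leaving cpu0 and bank0 configured.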
10 ♦ September 1998, Revision A