4405ch04 Continuous availability and manageability.fm
Draft Document for Review September 2, 2008 5:05 pm
IBM Power 570 Technical Overview and Introduction
initiates a call home for service. This methodology ensures that all platform errors will be
reported through at least one functional path, ultimately resulting in a single notification for a
single problem.
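The "single notification for a single problem" behavior amounts to de-duplicating reports that arrive over several functional paths. The sketch below illustrates that idea only. It assumes each report carries a shared problem identifier; the `ErrorReport` type, its fields, and the `notifications` helper are hypothetical and are not part of any IBM interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorReport:
    # Hypothetical fields: a key identifying the underlying fault and the
    # functional path (for example "service processor" or "OS") that reported it.
    problem_id: str
    path: str


def notifications(reports):
    """Collapse reports sharing a problem_id into one notification each.

    Returns a dict mapping problem_id to the first reporting path seen,
    so duplicate reports of the same problem do not notify twice.
    """
    first_path = {}
    for report in reports:
        first_path.setdefault(report.problem_id, report.path)
    return first_path
```

With two paths reporting problem `P1` and one reporting `P2`, only two notifications result.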
Extended Error Data (EED)
Extended error data (EED) is additional data that is collected either automatically at the time
of a failure or manually at a later time. The data collected depends on the invocation
method but includes information such as firmware levels, operating system levels, additional fault
isolation register values, recoverable error threshold register values, system status, and any
other pertinent data.
The data is formatted and prepared for transmission back to IBM to assist the service support
organization with preparing a service action plan for the service representative or for
additional analysis.
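To make the categories listed above concrete, the following sketch models an EED record and serializes it for transmission. It is illustrative only: the field names, the `ExtendedErrorData` class, and the choice of JSON as the wire format are assumptions, not the format IBM actually uses.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ExtendedErrorData:
    # Hypothetical field names mirroring the categories of data the text lists.
    firmware_level: str
    os_level: str
    fault_isolation_registers: dict[str, int] = field(default_factory=dict)
    threshold_registers: dict[str, int] = field(default_factory=dict)
    system_status: str = "unknown"
    # True when collected at failure time, False when gathered manually later.
    collected_automatically: bool = True

    def to_payload(self) -> str:
        """Serialize the record to JSON text (sorted keys for stable output)."""
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping the record as a plain data class separates collection (filling in fields) from formatting (`to_payload`), which matches the two steps the text describes.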
System dump handling
In some circumstances, an error may require a dump to be created, either automatically or
manually. In this event, the dump is offloaded to the HMC upon reboot. Specific HMC information is
included as part of the information that can optionally be sent to IBM support for analysis. If
additional information relating to the dump is required, or if it becomes necessary to view the
dump remotely, the HMC dump record tells IBM's support center on which HMC the
dump is located.
4.3.4 Notifying the appropriate contacts
Once a POWER6 processor-based system has detected, diagnosed, and reported an error to
an appropriate aggregation point, it takes steps to notify the customer and, if necessary,
the IBM Support Organization. Depending on the assessed severity of the error and the
support agreement, this notification can range from a simple notification to having field
service personnel automatically dispatched to the customer site with the correct replacement
part.
Customer notify
When an event is important enough to report but does not indicate the need for a repair action
or the need to call home to IBM service and support, it is classified as
customer notify.
Customers are notified because these events might be of interest to an administrator. The
event might be a symptom of an expected systemic change, such as a network
reconfiguration or failover testing of redundant power or cooling systems. Examples of these
events include:
- Network events, such as the loss of contact over a Local Area Network (LAN)
- Environmental events, such as ambient temperature warnings
- Events that need further examination by the customer but do not necessarily
require a part replacement or repair action
Customer notify events are serviceable events by definition because they indicate that
something has happened that requires customer awareness, in case the customer wants to take
further action. These events can always be reported back to IBM at the customer’s discretion.
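The distinction between customer notify and call home, together with the severity and support-agreement dependence described earlier, can be summarized as a small decision rule. This is a hypothetical sketch: the `Action` values, the `classify` function, and its boolean inputs are illustrative assumptions, not the actual POWER6 service processor or HMC logic.

```python
from enum import Enum


class Action(Enum):
    CUSTOMER_NOTIFY = "customer notify"
    CALL_HOME = "call home"
    DISPATCH = "call home with field service dispatch"


def classify(needs_repair: bool, needs_ibm_support: bool,
             dispatch_covered: bool = False) -> Action:
    """Hypothetical rule for routing a serviceable event.

    - Neither a repair action nor IBM service needed: customer notify.
    - Repair needed and the (assumed) support agreement covers dispatch:
      call home and dispatch field service.
    - Otherwise: call home.
    """
    if not needs_repair and not needs_ibm_support:
        return Action.CUSTOMER_NOTIFY
    if needs_repair and dispatch_covered:
        return Action.DISPATCH
    return Action.CALL_HOME
```

For example, an ambient temperature warning would map to `classify(False, False)`, which returns `Action.CUSTOMER_NOTIFY`; the customer may still choose to report it to IBM.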
Call home
A correctly configured POWER6 processor-based system can initiate an automatic or manual
call from a customer location to the IBM service and support organization with error data,
server status, or other service-related information. Call home invokes the service organization