Chapter 4. Reliability, availability, and serviceability
The first occurrence of each failure type is recorded in the Manage Serviceable Events task
on the management console. This task then filters and maintains a history of duplicate
reports from other logical partitions or from the service processor. It then looks at all active
service event requests within a predefined time span, analyzes the failure to ascertain the
root cause and, if enabled, initiates a Call Home for service. This methodology ensures that
all platform errors are reported through at least one functional path, ultimately resulting in a
single notification for a single problem. On non-HMC partitioned servers, the Service Focal
Point application on the Integrated Virtualization Manager (IVM) provides similar service
functions and interfaces.
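The filtering behavior described above can be illustrated with a minimal sketch. It is not the management console implementation. The names ErrorReport, ServiceableEvent, and aggregate, and the 300-second window, are assumptions chosen for illustration only. The sketch records the first report of each failure type as an event and attaches later reports of the same type, from any source, as duplicates when they arrive within the window.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ErrorReport:
    failure_type: str  # hypothetical identifier for the kind of failure
    source: str        # for example a logical partition or the service processor
    timestamp: float   # seconds on any consistent clock


@dataclass
class ServiceableEvent:
    first_report: ErrorReport
    duplicates: list = field(default_factory=list)


def aggregate(reports, window_seconds=300.0):
    """Return one event per failure type per time window (assumed policy)."""
    latest_event = {}  # failure_type -> most recently opened event
    events = []
    for report in sorted(reports, key=lambda r: r.timestamp):
        event = latest_event.get(report.failure_type)
        if (event is not None
                and report.timestamp - event.first_report.timestamp <= window_seconds):
            # Same failure type inside the window: keep it as history only.
            event.duplicates.append(report)
        else:
            # First occurrence (or the window has expired): open a new event.
            event = ServiceableEvent(first_report=report)
            latest_event[report.failure_type] = event
            events.append(event)
    return events
```

Under these assumptions, three reports of the same failure from two partitions and the service processor within the window produce one event with two recorded duplicates, which corresponds to a single notification for a single problem.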
Extended error data
Extended error data (EED) is additional data that is collected either automatically at the time
of a failure or manually later. The data that is collected depends on the invocation method. It
includes information such as firmware levels, operating system levels, additional fault isolation
register values, recoverable error threshold register values, system status, and any other
pertinent data.
The data is formatted and prepared for transmission back to IBM either to assist the service
support organization with preparing a service action plan for the service representative or for
additional analysis.
System dump handling
In certain circumstances, an error might require a memory dump to be automatically or
manually created. In this event, the memory dump can be offloaded to the optional HMC.
Specific management console information is included as part of the information that optionally
can be sent to IBM Support for analysis. If additional information that relates to the memory
dump is required, or if the memory dump must be viewed remotely, the management console
memory dump record tells the IBM Support center which management console holds the
memory dump. If no management console is present, the
memory dump might be either on the FSP or in the operating system. The location depends
on the type of memory dump that was initiated and whether the operating system is
operational.
4.5.6 Notifying
After a Power E850C server detects, diagnoses, and reports an error to an appropriate
aggregation point, it then takes steps to notify the administrator and, if necessary, the IBM
Support organization. Depending on the assessed severity of the error and the support
agreement, this notification might range from a simple message to having field service
personnel automatically dispatched to the client site with the correct replacement part.
Client Notify
When an event is important enough to report, but does not indicate the need for a repair
action or the need to call home to IBM Support, it is classified as Client Notify. Clients are
notified because these events might be of interest to an administrator. The event might be a
symptom of an expected systemic change, such as a network reconfiguration or failover
testing of redundant power or cooling systems. These events include the following examples:
Network events, such as the loss of contact over a local area network (LAN)
Environmental events, such as ambient temperature warnings
Events that need further examination by the client (although these events do not
necessarily require a part replacement or repair action)
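The Client Notify rule can be expressed as a small decision function. The following is a sketch only. The Classification labels other than Client Notify, the function name classify, and its three Boolean inputs are hypothetical and are not taken from the server firmware or the management console.

```python
from enum import Enum


class Classification(Enum):
    NOT_REPORTED = "not reported"      # hypothetical label
    CLIENT_NOTIFY = "Client Notify"    # term used in the text
    SERVICE_ACTION = "service action"  # hypothetical label for repair or call home


def classify(important, needs_repair, needs_call_home):
    """Apply the rule: reportable, but no repair and no call home."""
    if not important:
        return Classification.NOT_REPORTED
    if needs_repair or needs_call_home:
        return Classification.SERVICE_ACTION
    return Classification.CLIENT_NOTIFY
```

For example, an ambient temperature warning that is worth reporting but needs neither a repair action nor a call home would be classified as Client Notify under this sketch.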
IBM Power System E850C Technical Overview and Introduction, REDP-5412-00, ISBN 0738455687, ibm.com/redbooks, Printed in U.S.A.