The PS703 or PS704 blade server produces several types of codes.
Progress codes:
The power-on self-test (POST) generates eight-digit status codes that are known as checkpoints or
progress codes, which are recorded in the management-module event log. The checkpoints indicate
which blade server resource is initializing.
Error codes:
The First Failure Data Capture (FFDC) error checkers capture fault data, which the service
processor then analyzes. For unrecoverable errors (UEs), for recoverable events that meet or exceed their
service thresholds, and for fatal system errors, an unrecoverable checkstop service event triggers the
service processor to analyze the error, log the system reference code (SRC), and turn on the system
attention LED.
The service processor logs the nine-word error code (eight digits per word) in the BladeCenter
management-module event log. Error codes are either system reference codes (SRCs) or service request
numbers (SRNs). A location code might also be included.
Isolation procedures:
If the fault analysis does not determine a definitive cause, the service processor
might indicate a fault isolation procedure that you can use to isolate the failing component.
Viewing the codes
The PS703 or PS704 blade server does not display checkpoints or error codes on the remote console.
If the POST detects a problem, a nine-word error code (eight digits per word) is logged in the BladeCenter
management-module event log. A location code that identifies a component might also be included. See
“Error logs” on page 176 for information about viewing the management-module event log.
Service request numbers can be viewed using the AIX diagnostics CD, or various operating system
utilities such as AIX diagnostics or the Linux service aid “diagela”, if it is installed.
System reference codes (SRCs)
System reference codes indicate a server hardware or software problem that can originate in hardware, in
firmware, or in the operating system.
A blade server component generates an error code when it detects a problem. An SRC identifies the
component that generated the error code and describes the error. Use the SRC information to identify a
list of possibly failing items and to find information about any additional isolation procedures.
The following table shows the syntax of a nine-word B700xxxx SRC as it might be displayed in the event
log of the management module.
The first word of the SRC in this example is the message identifier, B7001111. This example numbers
each word after the first word to show relative word positions. The seventh word is the direct select
address, which is 77777777 in the example.
Table 8. Nine-word system reference code in the management-module event log

Index  Sev  Source    Date/Time             Text
1      E    Blade_05  01/21/2008, 17:15:14  (PS700-BC1BLD5E) SYS F/W: Error. Replace UNKNOWN
                                            (5008FECF B7001111 22222222 33333333 44444444
                                            55555555 66666666 77777777 88888888 99999999)
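
As a rough illustration of the word layout in Table 8, the following Python sketch pulls the nine SRC
words out of the Text field of an event-log entry. The function name parse_src and the dictionary keys
are hypothetical and are not part of any management-module interface. The sketch assumes that the SRC
is the last nine eight-character hexadecimal tokens inside the final parenthesized group, as in the
example entry. The token that precedes the SRC (5008FECF in the example) is not described in this
section, so the sketch ignores it.

```python
import re

# One SRC word: eight hexadecimal characters (assumption based on the Table 8 example).
HEX_WORD = re.compile(r"\b[0-9A-Fa-f]{8}\b")
# A parenthesized group that contains no nested parentheses.
PAREN_GROUP = re.compile(r"\(([^()]*)\)")


def parse_src(text):
    """Extract the nine SRC words from an event-log Text field.

    Assumes the layout of the Table 8 example: the SRC words are the
    last nine eight-character tokens in the final parenthesized group.
    """
    groups = PAREN_GROUP.findall(text)
    if not groups:
        raise ValueError("no parenthesized group found")
    words = HEX_WORD.findall(groups[-1])
    if len(words) < 9:
        raise ValueError("fewer than nine eight-character words found")
    src = [w.upper() for w in words[-9:]]
    return {
        "words": src,                     # word 1 through word 9
        "message_id": src[0],             # first word, for example B7001111
        "direct_select_address": src[6],  # seventh word
    }


# Text field copied from the Table 8 example entry.
example = (
    "(PS700-BC1BLD5E) SYS F/W: Error. Replace UNKNOWN "
    "(5008FECF B7001111 22222222 33333333 44444444 55555555 "
    "66666666 77777777 88888888 99999999)"
)

result = parse_src(example)
print(result["message_id"])             # B7001111
print(result["direct_select_address"])  # 77777777
```

Taking the last nine words, rather than matching on a fixed prefix such as B700, keeps the sketch
independent of the SRC type, but it fails if an entry appends further eight-character tokens after the SRC.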
Depending on your operating system and the utilities you have installed, error messages might also be
stored in an operating system log. See the documentation that comes with the operating system for more
information.