Chapter 4. Continuous availability and manageability
functions, it is limited in size and complexity when compared to a full operating system
implementation, and therefore can be considered better contained from a design and quality
assurance viewpoint. As with any software development project, the IBM firmware
development team writes code to strict guidelines using well-defined software engineering
methods. The overall code architecture is reviewed and approved and each developer
schedules a variety of peer code reviews. In addition, all code is strenuously tested, first by
visual inspections, looking for logic errors, then by simulation and operation in actual test and
production servers. Using this structured approach, most coding errors are caught and fixed
early in the design process.
An inherent feature of the POWER Hypervisor is that the majority of the code runs in the
protection domain of a hidden system partition. Failures in this code are limited to this system
partition. Supporting a very robust tasking model, the code in the system partition is
segmented into critical and noncritical tasks. If a noncritical task fails, the system partition is
designed to continue to operate, albeit without the function provided by the failed task. Only in
a rare instance of a failure to a critical task in the system partition would the entire POWER
Hypervisor fail.
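The critical and noncritical task model described above can be illustrated with a minimal sketch. This is not POWER Hypervisor code; the class names (Task, SystemPartition) and the example task names are hypothetical, and the sketch only shows the general idea of isolating failures per task and stopping only when a critical task fails.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    name: str
    critical: bool
    run: Callable[[], None]


@dataclass
class SystemPartition:
    """Hypothetical model of a partition that hosts segmented tasks."""
    tasks: List[Task]
    failed: List[str] = field(default_factory=list)
    running: bool = True

    def step(self) -> None:
        """Run every live task once, isolating failures per task."""
        for task in self.tasks:
            if not self.running:
                break
            if task.name in self.failed:
                continue  # the function of a failed task stays unavailable
            try:
                task.run()
            except Exception:
                self.failed.append(task.name)
                if task.critical:
                    # Only a critical-task failure stops the whole partition.
                    self.running = False


def ok() -> None:
    pass


def boom() -> None:
    raise RuntimeError("simulated task failure")


# A noncritical task fails; the partition keeps operating without it.
partition = SystemPartition([
    Task("dispatch", critical=True, run=ok),
    Task("trace-logging", critical=False, run=boom),
])
partition.step()
```

After the call to step(), partition.running is still True and partition.failed contains only the noncritical task, which mirrors the "continue to operate, albeit without the function provided by the failed task" behavior.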
The resulting code provides not only advanced features but also superb reliability. It is used in
IBM Power Systems and in the IBM TotalStorage DS8000 series products. It has therefore
been strenuously tested under a wide-ranging set of system environments and configurations.
This process has delivered a quality implementation that includes enhanced error isolation
and recovery support when compared to POWER4 processor-based offerings.
Service processor and clocks
A number of availability improvements have been included in the service processor in the
POWER6 and POWER5 processor-based servers. Separate copies of service processor
microcode and the POWER Hypervisor code are stored in discrete flash memory storage
areas. Code access is CRC protected. The service processor performs low level hardware
initialization and configuration of all processors. The POWER Hypervisor performs higher
level configuration for features like the virtualization support required to run up to 254
partitions concurrently on the POWER6 595, p5-590, p5-595, and i5-595 servers. The
POWER Hypervisor enables many advanced functions, including sharing of processor cores,
virtual I/O, and high speed communications between partitions using Virtual LAN. AIX, Linux,
and IBM i are supported. The servers also support dynamic firmware updates, in which
applications remain operational while IBM system firmware is updated for many operations.
Maintaining two copies ensures that the service processor can run even if a flash memory
copy becomes corrupted, and allows for redundancy in the event of a problem during the
upgrade of the firmware.
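The combination of CRC-protected code access and two stored copies can be sketched as follows. The text does not state which CRC is used or how the copies are laid out in flash, so CRC-32 (via Python's zlib.crc32), the FlashCopy tuple, and the function names here are assumptions chosen for illustration only.

```python
import zlib
from typing import Optional, Sequence, Tuple

# Hypothetical representation: (image bytes, CRC stored alongside the image).
FlashCopy = Tuple[bytes, int]


def make_copy(image: bytes) -> FlashCopy:
    """Store an image together with its CRC-32 (assumed checksum)."""
    return image, zlib.crc32(image)


def select_valid_image(copies: Sequence[FlashCopy]) -> Optional[bytes]:
    """Return the first image whose recomputed CRC matches the stored one."""
    for image, stored_crc in copies:
        if zlib.crc32(image) == stored_crc:
            return image
    return None  # every copy failed its CRC check


good = make_copy(b"service processor microcode v1")
# Simulate corruption: the bytes change but the stored CRC does not.
corrupted = (b"service processor microc0de v1", good[1])

image = select_valid_image([corrupted, good])
```

Here the corrupted copy fails its check and the second copy is selected, which is the redundancy benefit the paragraph describes.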
In addition, if the service processor encounters an error during runtime, it can reboot itself
while the server system stays up and running. Transient service processor errors have no
impact on server applications. If the service processor encounters a code hang condition, the
POWER Hypervisor can detect the error and direct the service processor to reboot, avoiding
an outage.
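One common way to detect a hang is a heartbeat watchdog. The text does not say how the POWER Hypervisor detects a service processor hang, so the heartbeat mechanism, the timeout value, and all names in this sketch are assumptions rather than a description of the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class ServiceProcessor:
    """Hypothetical stand-in for the monitored service processor."""
    last_heartbeat: float = 0.0
    reboots: int = 0

    def heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def reboot(self, now: float) -> None:
        # The reboot affects only the service processor, not the server.
        self.reboots += 1
        self.last_heartbeat = now


def check_for_hang(sp: ServiceProcessor, now: float, timeout: float = 5.0) -> bool:
    """Return True if a hang was detected and a reboot was directed."""
    if now - sp.last_heartbeat > timeout:
        sp.reboot(now)
        return True
    return False


sp = ServiceProcessor()
sp.heartbeat(now=1.0)
healthy = check_for_hang(sp, now=3.0)   # 2 s since the last heartbeat
hung = check_for_hang(sp, now=10.0)     # 9 s of silence exceeds the timeout
```

Timestamps are passed in explicitly so the sketch is deterministic; a real monitor would read a hardware or system clock.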
Each POWER6 processor chip is designed to receive two oscillator signals (clocks) and can
be enabled to switch dynamically from one signal to the other. POWER6 595 servers are
equipped with two clock cards. For the POWER6 595, failure of a clock card will result in an
automatic (runtime) failover to the secondary clock card. No reboot is required. For other
multiclock offerings, an IPL time failover occurs if a system clock fails.
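The difference between runtime failover and IPL-time failover can be sketched as a small model. The class and method names are hypothetical, and the sketch assumes only what the paragraph states: with runtime failover the switch to the secondary clock card happens immediately, and otherwise the switch happens at the next IPL.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClockCard:
    name: str
    healthy: bool = True


@dataclass
class ClockSubsystem:
    """Hypothetical model of a server with redundant clock cards."""
    cards: List[ClockCard]
    runtime_failover: bool
    active: int = 0

    def _first_healthy(self) -> int:
        for index, card in enumerate(self.cards):
            if card.healthy:
                return index
        raise RuntimeError("no healthy clock card available")

    def on_active_clock_failure(self) -> None:
        """Mark the active card failed; switch now only if runtime failover is enabled."""
        self.cards[self.active].healthy = False
        if self.runtime_failover:
            self.active = self._first_healthy()

    def ipl(self) -> None:
        """At IPL, always select a healthy clock card."""
        self.active = self._first_healthy()


def two_cards() -> List[ClockCard]:
    return [ClockCard("primary"), ClockCard("secondary")]


# POWER6 595 style: the switch happens at runtime, with no reboot.
p595 = ClockSubsystem(two_cards(), runtime_failover=True)
p595.on_active_clock_failure()

# Other multiclock offerings: the switch happens at the next IPL.
other = ClockSubsystem(two_cards(), runtime_failover=False)
other.on_active_clock_failure()
active_before_ipl = other.active
other.ipl()
```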