IBM IntelliStation POWER 285 Technical Overview and Introduction
3.1 Reliability, fault tolerance, and data integrity
Excellent quality and reliability are inherent in all aspects of the IBM IntelliStation POWER
285 workstation processor design and manufacturing. The fundamental objective of the
design approach is to minimize outages. The RAS (reliability, availability, and serviceability)
features help to ensure that the system operates when required, performs reliably, and
efficiently handles any failures that might occur. This is achieved through capabilities
provided by both the hardware and the AIX 5L operating system.
When deployed as a server, the IntelliStation POWER 285 workstation enhances the RAS
capabilities that are implemented in POWER4-based systems. RAS enhancements available
on POWER5-based systems include:
Most firmware updates can be applied while the system remains operational.
Error-correcting code (ECC) protection has been extended to the inter-chip connections for
the fabric and processor bus.
Partial L2 cache deallocation is possible.
The number of L3 cache line deletes has increased from two to ten, improving self-healing
capability.
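To illustrate what ECC protection on an inter-chip link accomplishes, the following Python sketch uses a textbook Hamming(7,4) code, which corrects any single flipped bit in a 7-bit codeword. This is an assumption made for illustration only: the actual ECC scheme used on the POWER5 fabric and processor bus is not described in this section and operates on wider words. The function names encode and decode are hypothetical.

```python
# Textbook Hamming(7,4) single-error-correcting code (illustration only;
# NOT the actual ECC used on the POWER5 fabric or processor bus).
# Codeword index 0 holds bit position 1; parity bits sit at positions 1, 2, 4.


def encode(nibble: int) -> list[int]:
    """Encode a 4-bit value into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = [(nibble >> i) & 1 for i in (3, 2, 1, 0)]
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(word: list[int]) -> tuple[int, int]:
    """Return (corrected 4-bit value, position of the flipped bit or 0 if none)."""
    bits = list(word)  # copy so the caller's codeword is not modified
    syndrome = 0
    for parity_pos in (1, 2, 4):
        parity = 0
        for pos in range(1, 8):
            if pos & parity_pos:  # this position is covered by this parity bit
                parity ^= bits[pos - 1]
        if parity:
            syndrome |= parity_pos
    if syndrome:
        bits[syndrome - 1] ^= 1  # the syndrome is the position of the bad bit
    d1, d2, d3, d4 = bits[2], bits[4], bits[5], bits[6]
    return (d1 << 3) | (d2 << 2) | (d3 << 1) | d4, syndrome


# Usage: simulate a single-bit flip in transit and correct it.
word = encode(0b1011)
word[4] ^= 1  # flip bit position 5
value, pos = decode(word)
print(value, pos)  # prints "11 5"
```

The receiver recomputes the parity groups; a nonzero syndrome identifies the flipped bit, so the data is repaired without retransmission.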
The following sections describe in more detail the concepts that form the basis of the
leadership RAS features of the IBM IntelliStation POWER 285 workstation.
3.1.1 Fault avoidance
IBM IntelliStation POWER 285 workstations are built on a quality-based design that is
intended to keep errors from happening. This design includes the following features:
Reduced power consumption and cooler operating temperatures for increased reliability,
which are enabled by the use of copper circuitry, silicon-on-insulator, and dynamic clock
gating
Mainframe-inspired components and technologies
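The power reduction from dynamic clock gating can be seen from the standard first-order model of CMOS dynamic (switching) power. This is a general textbook relation, not a POWER5-specific figure; the gated form on the right is a simplifying assumption that gated units do no switching while idle and that activity is uniform across the chip:

```latex
P_{\mathrm{dyn}} = \alpha \, C \, V_{dd}^{2} \, f
\qquad\Longrightarrow\qquad
P_{\mathrm{dyn}}^{\mathrm{gated}} \approx (1 - g)\,\alpha\,C\,V_{dd}^{2}\,f
```

Here alpha is the activity factor, C the switched capacitance, V_dd the supply voltage, f the clock frequency, and g (a hypothetical parameter) the fraction of switched capacitance whose clock is gated off. Less dynamic power means less heat to dissipate, which is what allows the cooler operating temperatures listed above.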
3.1.2 First-failure data capture
If a problem occurs, the ability to diagnose it correctly is a fundamental requirement on
which improved availability is based. The IntelliStation POWER 285 incorporates advanced
capability in start-up diagnostics and in runtime first-failure data capture (FFDC), based on
strategic error checkers built into the processors.
Any errors detected by the pervasive error checkers are captured into Fault Isolation
Registers (FIRs), which can be interrogated by the service processor. The service processor
can access system components either through special-purpose ports or by reading error
registers. Figure 3-1 on page 29 shows a schematic of a FIR implementation.
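To make the capture-then-interrogate flow concrete, the following Python sketch models a FIR as a bit mask that error checkers set and the service processor reads. It is a toy model under stated assumptions: the bit assignments in CHECKER_BITS, the first_error field, and the names FaultIsolationRegister and service_processor_scan are all hypothetical and do not describe the real POWER5 register layout or service processor interface.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping of FIR bit positions to the error checkers driving them.
CHECKER_BITS = {
    0: "L2 cache correctable error",
    1: "L3 cache directory error",
    2: "fabric bus ECC error",
    3: "memory controller error",
}


@dataclass
class FaultIsolationRegister:
    """Toy FIR: checker bits latch and stay set until explicitly cleared."""
    value: int = 0
    first_error: Optional[int] = None  # hypothetical: first bit set since last clear

    def report(self, bit: int) -> None:
        # Called by an error checker; bits are sticky (ORed in, never overwritten).
        if self.first_error is None:
            self.first_error = bit
        self.value |= 1 << bit

    def clear(self) -> None:
        # Reset after the fault has been analyzed and serviced.
        self.value = 0
        self.first_error = None


def service_processor_scan(fir: FaultIsolationRegister) -> list[str]:
    """Interrogate the FIR and translate each latched bit into a checker name."""
    return [name for bit, name in sorted(CHECKER_BITS.items())
            if fir.value & (1 << bit)]


# Usage: two checkers fire; a repeated error leaves its bit unchanged.
fir = FaultIsolationRegister()
fir.report(2)
fir.report(0)
fir.report(2)
print(service_processor_scan(fir))  # ['L2 cache correctable error', 'fabric bus ECC error']
print(CHECKER_BITS[fir.first_error])  # fabric bus ECC error
```

Because the bits are latched at the moment of detection, the service processor can read the error state afterward without having to re-create the failure, which is the point of first-failure data capture.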