Chapter 4. Continuous availability and manageability
also offers advanced techniques designed to help contain failures in the coherency domain to
a subset of the server. Through careful design, in many cases, failures are contained to a
component or to a partition, despite the shared hardware system design. Many of these
techniques have been described in this document.
System-level availability (in any server, no matter how partitioned) is a function of the
reliability of the underlying hardware and the techniques used to mitigate the faults that do
occur. The availability design of these systems minimizes system failures and localizes
potential hardware faults to single partitions in multi-partition systems. In this design, although
some hardware errors might cause a full system crash (causing loss of all partitions),
because the rate of system crashes is very low, the rate of partition crashes is also very low.
The reliability and availability characteristics described in this document show how this
design-for-availability approach is consistently applied throughout the system design. IBM believes
this is the best approach to achieving partition level availability while supporting a truly flexible
and manageable partitioning environment.
In addition, to achieve the highest levels of system availability, IBM and third-party software
vendors offer clustering solutions (such as HACMP™), which allow failover from one
system to another, even between geographically dispersed systems.
4.2.7 Operating system availability
The focus of this section is the RAS attributes of the POWER6 and POWER5
hardware that provide for availability and serviceability of the hardware itself. Operating
systems, middleware, and applications provide additional key features concerning their own
availability, which are outside the scope of this hardware discussion.
It is worth noting, however, that hardware and firmware RAS features can provide key
enablement for selected software availability features. As shown in section 4.4,
“Operating system support for RAS features” on page 160, many RAS features described in
this document are applicable to all supported operating systems.
The AIX, IBM i, and Linux operating systems include many reliability features inspired by
IBM's mainframe technology and designed for robust operation. In fact, clients in surveys have
selected AIX as the highest-quality UNIX operating system. In addition, IBM i offers a highly
scalable and virus-resistant architecture with a proven reputation for exceptional business
resiliency. IBM i integrates a trusted combination of relational database, security, Web
services, networking and storage management capabilities. It provides a broad and highly
stable database and middleware foundation. All core middleware components are developed,
tested, and preloaded together with the operating system.
AIX 6 introduces continuous availability features that are new to the UNIX market and are
designed to extend its leadership in continuous availability.
POWER6 servers support a variety of enhanced features:
POWER6 storage protection keys
These keys provide hardware-enforced access mechanisms for memory regions. Only
programs that use the correct key are allowed to read or write to protected memory
locations. This new hardware allows programmers to restrict memory access within
well-defined, hardware-enforced boundaries, protecting critical portions of AIX 6 and
application software from inadvertent memory overlay.
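The access rule described above can be pictured with a small toy model. This sketch is purely illustrative and is not the AIX or POWER6 programming interface: the class and method names (KeyedMemory, ProtectionFault, allocate, read, write) are hypothetical, and it assumes a single key per memory region and per access, with the check done in software rather than by the processor.

```python
class ProtectionFault(Exception):
    """Raised when an access presents a key that does not match the region's key."""


class KeyedMemory:
    """Toy model (hypothetical): each region is tagged with one protection key."""

    def __init__(self):
        self._regions = {}  # region name -> (key, bytearray)

    def allocate(self, name, size, key):
        # Tag a new zero-filled region with the key that guards it.
        self._regions[name] = (key, bytearray(size))

    def _check(self, name, key):
        # Real hardware performs this check on every access; here it is explicit.
        region_key, data = self._regions[name]
        if key != region_key:
            raise ProtectionFault(f"key {key} may not access region {name!r}")
        return data

    def write(self, name, offset, value, key):
        self._check(name, key)[offset] = value

    def read(self, name, offset, key):
        return self._check(name, key)[offset]


mem = KeyedMemory()
mem.allocate("kernel_table", size=16, key=1)
mem.write("kernel_table", 0, 0x2A, key=1)  # correct key: write succeeds

try:
    mem.write("kernel_table", 0, 0xFF, key=2)  # wrong key: stray overlay attempt
    overlay_blocked = False
except ProtectionFault:
    overlay_blocked = True
```

In this model the stray write is rejected immediately instead of silently corrupting the region, which is the property that lets an inadvertent overlay be caught at the point of the faulty access.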
Storage protection keys can reduce the number of intermittent outages associated with
undetected memory overlays inside the AIX kernel. Programmers can also use the