IBM Power 595 Technical Overview and Introduction
features that are not implemented in the POWER6 processors within other Power Systems
and System p servers. These features include:
Dual, integrated L3 cache controllers
Dual, integrated memory controllers
Two additional (of the many) enhancements to the POWER6 processor include the ability to
perform processor instruction retry and alternate processor recovery. Together, these
significantly reduce exposure to both hard (logic) and soft (transient) errors in the processor core.
Processor instruction retry
Soft failures in the processor core are transient errors. When an error is encountered in
the core, the POWER6 processor automatically retries the instruction. If the source of the
error was truly transient, the instruction will succeed and the system will continue as
before. On predecessor IBM systems, this error would have caused a checkstop.
Alternate processor recovery
Hard failures are more challenging to recover from, being true logical errors that are
replicated each time the instruction is repeated. Retrying the instruction does not help in
this situation because the instruction will continue to fail. Systems with POWER6
processors introduce the ability to extract the failing instruction from the faulty core and
retry it elsewhere in the system, after which the failing core is dynamically deconfigured
and called out for replacement. The entire process is transparent to the partition owning
the failing instruction. Systems with POWER6 processors are designed to avoid what
would have been a full system outage.
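The two recovery steps described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the real mechanism is implemented in hardware and firmware, and the names used here (`Core`, `CoreError`, `run_with_recovery`, `max_retries`) are hypothetical and do not come from any IBM interface.

```python
from dataclasses import dataclass


class CoreError(Exception):
    """Raised by a simulated core when an instruction fails (hypothetical)."""


@dataclass
class Core:
    name: str
    transient_faults: int = 0   # number of upcoming attempts that fail, then clear
    hard_fault: bool = False    # True means every attempt on this core fails
    configured: bool = True     # False once the core has been deconfigured

    def execute(self, instruction):
        if self.hard_fault:
            raise CoreError(f"{self.name}: hard error")
        if self.transient_faults > 0:
            self.transient_faults -= 1
            raise CoreError(f"{self.name}: soft error")
        return instruction()


def run_with_recovery(instruction, core, spares, max_retries=1):
    """Return (result, name of the core that completed the instruction)."""
    # Step 1: processor instruction retry on the same core.
    for _attempt in range(1 + max_retries):
        try:
            return core.execute(instruction), core.name
        except CoreError:
            continue

    # Step 2: alternate processor recovery. The retry did not help, so the
    # error is treated as hard and the instruction is moved to another core.
    for spare in spares:
        if not spare.configured:
            continue
        try:
            result = spare.execute(instruction)
        except CoreError:
            continue
        core.configured = False  # failing core is deconfigured afterwards
        return result, spare.name

    raise CoreError("no core could complete the instruction")
```

In this model, a core created with `transient_faults=1` completes the instruction on its own second attempt, while a core created with `hard_fault=True` hands the instruction to a spare and is then marked as deconfigured.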
Other enhancements include:
POWER6 single processor checkstopping
Typically, a processor checkstop would result in a system checkstop. A new feature in the
595 server is the ability to contain most processor checkstops to the partition that was
using the processor at the time. This significantly reduces the probability of any one
processor affecting total system availability.
POWER6 cache availability
In the event that an uncorrectable error occurs in L2 or L3 cache, the system is able to
dynamically remove the offending line of cache without requiring a reboot. In addition,
POWER6 utilizes an L1/L2 cache design and a write-through cache policy on all levels,
helping to ensure that data is written to main memory as soon as possible.
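As a rough illustration of why a cache line can be removed from service without a reboot, the following toy model combines a write-through policy with line deletion. It is a sketch only: the class `WriteThroughCache`, its methods, and the simple modulo mapping are assumptions made for this example and do not describe the actual POWER6 cache organization.

```python
class WriteThroughCache:
    """Toy write-through cache whose lines can be taken out of service."""

    def __init__(self, num_lines, memory):
        self.memory = memory              # backing store: dict of address -> value
        self.lines = [None] * num_lines   # each entry is (address, value) or None
        self.deleted = set()              # indexes of lines removed from service

    def _line_for(self, address):
        usable = [i for i in range(len(self.lines)) if i not in self.deleted]
        if not usable:
            raise RuntimeError("no usable cache lines remain")
        return usable[address % len(usable)]

    def write(self, address, value):
        self.memory[address] = value      # write-through: memory is updated at once
        line = self._line_for(address)
        self.lines[line] = (address, value)
        return line

    def read(self, address):
        line = self._line_for(address)
        entry = self.lines[line]
        if entry is not None and entry[0] == address:
            return entry[1]               # cache hit
        value = self.memory[address]      # cache miss: refill from memory
        self.lines[line] = (address, value)
        return value

    def delete_line(self, line):
        """Remove a line after an uncorrectable error is reported on it."""
        self.deleted.add(line)
        # Simplification for this model only: the address-to-line mapping
        # changes, so every line is invalidated. Memory already holds all
        # written data because of the write-through policy.
        self.lines = [None] * len(self.lines)
```

Because every write in this model also reaches `memory`, deleting a line loses no data; later reads simply refill from memory into one of the remaining lines.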
While L2 and L3 cache are physically associated with each processor module or chip, all
cache is coherent. A coherent cache is one in which hardware largely hides, from the
software, the fact that cache exists. This coherency management requires control traffic
both within and between multiple chips. It also often means that data is copied (or moved)
from the contents of cache of one core to the cache of another core. For example, if a core
of chip one incurs a cache miss on some data access and the data happens to still reside
in the cache of a core on chip two, the system finds the needed data and transfers it
across the inter-chip fabric to the core on chip one. This is done without going through
memory to transfer the data.
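The cache miss example above can be modeled in a few lines. This is a simplified sketch, not the POWER6 coherency protocol: the `Chip` class, the `load` function, and the strings that name the data source are hypothetical, and the model copies the data rather than tracking ownership states.

```python
class Chip:
    """Toy chip with a single cache, modeled as a dict of address -> value."""

    def __init__(self, name):
        self.name = name
        self.cache = {}


def load(address, requester, chips, memory):
    """Return (value, source) for a load issued by a core on `requester`."""
    # Hit in the requesting chip's own cache.
    if address in requester.cache:
        return requester.cache[address], "local cache"

    # Miss: look in the caches of the other chips before going to memory.
    for other in chips:
        if other is not requester and address in other.cache:
            value = other.cache[address]
            requester.cache[address] = value   # copied across the fabric
            return value, f"cache of {other.name}"

    # No cache holds the data, so it is fetched from main memory.
    value = memory[address]
    requester.cache[address] = value
    return value, "memory"
```

With two chips where only the second holds address `0x100`, a load from the first chip is satisfied from the second chip's cache, and a repeated load then hits locally.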
Figure 2-31 on page 75 shows a high-level view of the POWER6 processor. L1 Data and L1
Instruction caches are within the POWER6 core.