
Mainframe-class RAS Features

Enhanced error detection of the high-speed interconnect

Intricate error handling through multi-bit error detection and resending of errored data

Because higher-speed interconnects are implemented to increase 
system performance, there is a higher probability that 
interference noise will cause errors along these interconnects. 
One method of handling these interconnect errors is to disable 
the errored interconnect and operate in a degraded mode.

In addition to the above method, the Express5800/1000 series 
servers implement a methodology prevalent in supercomputers, 
whereby intricate multi-bit error detection is carried out and 
errored data is resent upon detection of an error. This allows 
the Express5800/1000 series servers to handle the intermittent 
errors that occur along the high-speed interconnects without 
impacting system performance.
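The detect-and-resend scheme described above can be illustrated with a small software model. This is a sketch under stated assumptions, not the A3 chipset's actual mechanism: the guide does not say which error detection code the interconnect uses, so CRC-32 (Python's `zlib.crc32`) stands in for it, and the helpers `noisy_link`, `make_frame`, `check_frame`, and `send_with_retry` are hypothetical names. CRC-32 detects every single-bit error and every burst of up to 32 bits, so multi-bit corruption within a short frame is caught and the frame is simply sent again.

```python
import random
import zlib
from typing import Optional, Tuple

# Hypothetical link model: each bit may be flipped with a given probability,
# standing in for interference noise on a high-speed interconnect.
def noisy_link(frame: bytes, flip_probability: float, rng: random.Random) -> bytes:
    """Return a copy of frame in which each bit is flipped with the given probability."""
    out = bytearray(frame)
    for i in range(len(out)):
        for bit in range(8):
            if rng.random() < flip_probability:
                out[i] ^= 1 << bit
    return bytes(out)

def make_frame(payload: bytes) -> bytes:
    """Append a CRC-32 check value (4 bytes, big-endian) to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check_frame(frame: bytes) -> Optional[bytes]:
    """Return the payload if the CRC matches, or None if an error was detected."""
    payload, crc = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") == crc:
        return payload
    return None

def send_with_retry(payload: bytes, flip_probability: float,
                    rng: random.Random, max_retries: int = 8) -> Tuple[bytes, int]:
    """Resend the frame until the receiver's CRC check passes.

    Returns (payload, attempts). Raises RuntimeError when every retry fails,
    the point at which a system might fall back to degrading the link.
    """
    frame = make_frame(payload)
    for attempt in range(1, max_retries + 1):
        received = check_frame(noisy_link(frame, flip_probability, rng))
        if received is not None:
            return received, attempt
    raise RuntimeError("persistent link error: retries exhausted")
```

For example, `send_with_retry(b"cache line payload", 0.0005, random.Random(1))` returns the original payload together with the number of attempts it took. Intermittent noise costs only a resend, while a persistent fault exhausts the retries and can be escalated to the degraded-mode handling mentioned above.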

Two independent power sources 

Avoid system shutdown due to failures of the power distribution units

The previous 32-processor and 16-processor models supported 
two independent power supplies, whereas the 8-processor model 
did not. This feature is now available on the new 8-processor 
system (1080Rf), so that the system can continue operating even 
in the event of a failure within the power distribution unit.
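Why two independent sources help can be shown with simple arithmetic. This is an illustrative sketch, assuming the two power paths fail independently of each other (which is what "independent" power sources aim for): the system loses power only when both paths are down at once, so the unavailabilities multiply. The function name `parallel_availability` and the 99.9% per-path figure are hypothetical, not NEC specifications.

```python
def parallel_availability(*availabilities: float) -> float:
    """Availability of a set of redundant, independently failing power paths.

    Power is lost only if every path is down at the same time, so the
    combined unavailability is the product of the individual unavailabilities.
    """
    unavailable = 1.0
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        unavailable *= 1.0 - a
    return 1.0 - unavailable
```

With a single path at 0.999, availability stays at 0.999; with two such paths, `parallel_availability(0.999, 0.999)` gives roughly 0.999999, because the chance of both failing together drops from one in a thousand to about one in a million.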

Autonomic reporting of error logs with pinpoint prognosis of failed components

Realization of mainframe-class platform serviceability

The Express5800/1000 series servers are equipped with a service 
processor that handles server management and platform error 
handling. The service processor can be considered the core 
component supporting the RAS features of the system. One 
feature of the service processor is BID (built-in diagnosis): the 
ability to analyze the detailed logs collected by the chipset in the 
event of an error. BID diagnoses the location of the error and 
pinpoints the FRU (Field Replaceable Unit) that must be replaced, 
minimizing the time required to replace the component and 
recover the system.
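The guide does not describe how BID localizes a fault internally. As a loose illustration of "pinpointing" from logs, the sketch below uses one simple heuristic: the component that lies on every failing transaction's path is treated as the suspect and mapped to its FRU. The log format (`ErrorRecord`), the component names, and the `FRU_OF_COMPONENT` table are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class ErrorRecord:
    """One entry of a hypothetical chipset error log."""
    detector: str            # unit that observed the error
    path: tuple[str, ...]    # components the failing transaction passed through

# Hypothetical mapping from chipset components to Field Replaceable Units.
FRU_OF_COMPONENT = {
    "cell0.memctl": "Cell board 0",
    "cell1.memctl": "Cell board 1",
    "xbar0": "Crossbar card 0",
    "xbar1": "Crossbar card 1",
    "io0.bridge": "I/O unit 0",
}

def pinpoint_fru(log: list[ErrorRecord]) -> list[str]:
    """Return the FRUs whose components appear on every failing path.

    A component shared by all failing transactions is treated as the suspect;
    FRU names are returned sorted so the output is stable.
    """
    if not log:
        return []
    counts = Counter(c for rec in log for c in set(rec.path))
    suspects = [c for c, n in counts.items() if n == len(log)]
    return sorted({FRU_OF_COMPONENT[c] for c in suspects if c in FRU_OF_COMPONENT})
```

If three failing transactions travel through different cell boards and I/O units but all cross `xbar1`, the function reports only `"Crossbar card 1"`, which is the kind of single, specific replacement target that shortens repair time.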

In the event of a failure, the Express5800/1000 series servers 
can also automatically send detailed error logs to maintenance 
personnel, further reducing the time required to resolve a system 
error. Furthermore, to minimize the possibility of a critical error, 
the diagnostics engine can proactively predict errors rather than 
only react to them.
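Proactive prediction of this kind is often based on counting recoverable events. The diagram for this feature mentions diagnosing the "retry tendency" and confirming whether a threshold was exceeded; the sketch below models that idea with a sliding time window per component. The class name, the threshold, and the window length are hypothetical parameters, not values from the guide, and timestamps are assumed to arrive in non-decreasing order for each component.

```python
from collections import defaultdict, deque

class RetryTrendMonitor:
    """Flag components whose retries exceed a threshold within a time window.

    Hypothetical parameters: more than `threshold` retries within
    `window_seconds` triggers a preventive-maintenance alert, before the
    underlying fault becomes a critical error.
    """

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold
        self.window = window_seconds
        self._events = defaultdict(deque)   # component -> retry timestamps
        self.alerted = set()                # components already reported

    def record_retry(self, component: str, timestamp: float) -> bool:
        """Record one retry; return True the first time the threshold is exceeded."""
        events = self._events[component]
        events.append(timestamp)
        # Drop retries that have aged out of the sliding window.
        while events and events[0] <= timestamp - self.window:
            events.popleft()
        if len(events) > self.threshold and component not in self.alerted:
            self.alerted.add(component)
            return True
        return False
```

A link that retries four times within a minute with `threshold=3, window_seconds=60` raises an alert on the fourth retry, whereas the same four retries spread 100 seconds apart never accumulate enough to trip it. Only the first crossing is reported, so maintenance personnel receive one notification rather than a flood.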

Implementing an Uninterruptible Power Supply (UPS) can further 
increase availability. The two independent power sources feature 
is standard on the 1320Xf and optional on the 1160Xf and 1080Rf.

[Figure: Autonomic error log reporting. Customer environment: the chipset collects a detailed hardware error log, including transaction history; the Hardware Diagnostics Agent diagnoses the retry tendency and confirms whether a threshold was exceeded; the Service Processor Manager sends the log over the Internet as an encrypted mail message. Maintenance Group: analyzes the error information summary to determine the cause of the failure (preventive maintenance, failed component replacement) and may contact the development team for assistance. Development Group: receives the error information by email and, if required, analyzes the detailed log further.]
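The reporting step in this flow, packaging an error summary for the maintenance group and attaching the detailed log for the development group, can be sketched with Python's standard `email.message.EmailMessage`. This is only an assumed illustration: the function name, file name, and example.com addresses are hypothetical, and the encryption and delivery shown in the figure are left out (delivery could, for instance, use `smtplib` over an encrypted STARTTLS connection).

```python
from email.message import EmailMessage

def build_error_report(summary: str, detail_log: str,
                       sender: str, recipient: str) -> EmailMessage:
    """Package an error summary and the detailed log as one mail message.

    The summary goes in the body for the maintenance group; the detailed
    log is attached so the development group can analyze it further.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Hardware error report: {summary}"
    msg.set_content(f"Error summary:\n{summary}\n\nThe detailed hardware log is attached.\n")
    # Adding an attachment turns the message into multipart/mixed.
    msg.add_attachment(detail_log, filename="hw_error_log.txt")
    return msg
```

Keeping the summary in the body and the full log as an attachment mirrors the two-stage analysis in the figure: most cases are resolved from the summary, and the detailed log is there when deeper analysis is needed.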

[Figure: Diagnostics of the error detection circuits. Without check features, a failure in the error detection circuits (logic circuits, ECC) goes undetected, and bad data passes through. With the circuit check, a 1-bit error presented to the error detection circuits is detected and reported. Bad data resulting from a simple error, such as a single-bit error, cannot be blocked if a failure exists within the error detection circuits themselves; diagnosing the error detection circuits at every system boot ensures data integrity.]
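A boot-time check of an error detector can be modeled by feeding it known errors and confirming that it reports them. The sketch below assumes a Hamming(7,4) code as a stand-in (the guide does not specify the chipset's ECC) and models the detection circuit as a software `checker` function. `boot_self_test` injects every single-bit error into every codeword and fails if the checker either raises a false alarm on clean data or misses or mislocates an injected error. All function names are hypothetical.

```python
from typing import Callable

def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming codeword (bit i holds position i+1)."""
    d = [(nibble >> i) & 1 for i in range(4)]    # data bits d1..d4
    p1 = d[0] ^ d[1] ^ d[3]                      # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]                      # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]                      # covers positions 4, 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]  # positions 1..7
    return sum(b << i for i, b in enumerate(bits))

def hamming74_syndrome(codeword: int) -> int:
    """Return 0 for a valid codeword, else the 1-based position of a single flipped bit."""
    s = 0
    for pos in range(1, 8):
        if (codeword >> (pos - 1)) & 1:
            s ^= pos
    return s

def boot_self_test(checker: Callable[[int], int]) -> bool:
    """Verify an error checker by injecting every single-bit error into every codeword."""
    for nibble in range(16):
        cw = hamming74_encode(nibble)
        if checker(cw) != 0:
            return False                 # false alarm on clean data
        for pos in range(1, 8):
            if checker(cw ^ (1 << (pos - 1))) != pos:
                return False             # injected error missed or mislocated
    return True
```

`boot_self_test(hamming74_syndrome)` passes, while a checker with a stuck syndrome bit, such as `lambda cw: hamming74_syndrome(cw) & 0b011`, fails as soon as an error at position 4 is injected. This is exactly the kind of latent detector fault that would otherwise let bad data through unnoticed.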
