Memory Mirroring

Continuous operation even in the event of a non-correctable memory error

The Express5800/1000 series server supports high-level memory 
RAS features that let the server rapidly detect memory errors, 
reduce multi-bit errors and continue operating even in the event 
of memory chip or memory controller failures. Memory scan, 
memory chip sparing (SDDC*) and memory scrubbing are examples 
of those features.

A memory scan is run on all loaded memory modules at each OS 
boot. If the system detects a memory failure, the failed component 
is immediately isolated and detached from the system, preventing 
possible downtime during business operations.
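
The boot-time scan can be pictured with a small simulation. This is a minimal sketch under stated assumptions, not the server's firmware: the `MemoryModule` class, the stuck-cell fault model, the two test patterns and the `boot_scan` function are all hypothetical and serve only to show modules being tested and failed ones being detached before the OS uses them.

```python
# Hypothetical test patterns; alternating bits exercise both 0 and 1 in every cell.
PATTERNS = (0x55, 0xAA)


class MemoryModule:
    """Simulated memory module; `stuck` maps an address to the value it is stuck at."""

    def __init__(self, slot, size=8, stuck=None):
        self.slot = slot
        self.words = [0] * size
        self.stuck = stuck or {}

    def write(self, addr, value):
        # A stuck cell ignores the written value.
        self.words[addr] = self.stuck.get(addr, value)

    def read(self, addr):
        return self.words[addr]


def module_passes(module):
    """Write each pattern to every address and check that it reads back unchanged."""
    for pattern in PATTERNS:
        for addr in range(len(module.words)):
            module.write(addr, pattern)
            if module.read(addr) != pattern:
                return False
    return True


def boot_scan(modules):
    """Return (healthy_slots, isolated_slots) after scanning all loaded modules."""
    healthy, isolated = [], []
    for module in modules:
        (healthy if module_passes(module) else isolated).append(module.slot)
    return healthy, isolated


modules = [MemoryModule(0), MemoryModule(1, stuck={3: 0x00}), MemoryModule(2)]
healthy, isolated = boot_scan(modules)
print("in service:", healthy, "isolated:", isolated)
```

In this sketch the module in slot 1 is left out of the usable set, which mirrors the idea of detaching a failed component before business operations start.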

Chip sparing (SDDC*) memory is a memory system built from 
several DRAM chips that can correct errors at the chip level. If 
a memory chip fails, the error is corrected immediately so that 
operation continues.
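
The idea of surviving the loss of a whole chip can be illustrated with a deliberately simplified model. This is not the SDDC code itself, which the source does not describe: the sketch below assumes one extra parity chip holding the XOR of the data chips, and it assumes the failed chip is already known. The function names `store` and `read_with_failed_chip` are hypothetical.

```python
from functools import reduce
from operator import xor


def store(symbols):
    """Return one symbol per data chip plus a parity symbol (XOR of all data symbols)."""
    return list(symbols) + [reduce(xor, symbols, 0)]


def read_with_failed_chip(chips, failed):
    """Rebuild the symbol held by the chip at index `failed` from the surviving chips.

    Assumption: the index of the failed chip is known. A real chip-level ECC
    also has to locate the failing chip, which this sketch does not attempt.
    """
    rebuilt = reduce(xor, (s for i, s in enumerate(chips) if i != failed), 0)
    repaired = list(chips)
    repaired[failed] = rebuilt
    return repaired[:-1]  # drop the parity symbol, return only the data symbols


original = [0x1, 0xA, 0x7, 0xC]
chips = store(original)
chips[2] = 0x0  # chip 2 fails and returns garbage
recovered = read_with_failed_chip(chips, failed=2)
print(recovered == original)
```

Because the XOR of all symbols including parity is zero, XOR-ing the surviving symbols yields exactly the missing one, so the read completes without interrupting operation.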

Memory scrubbing checks memory content regularly (every few 
milliseconds) during operation without affecting performance. 
When an error is detected, it is corrected and then reported. 
Because scrubbing detects and corrects errors promptly, it 
reduces the number of multi-bit errors.
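
One scrub pass can be sketched as a loop that checks every stored word, corrects any single-bit error in place, and reports it. This is an illustration under assumptions, not the hardware's mechanism: a Hamming(7,4) code stands in for whatever ECC the server uses (the source does not specify it), memory is a plain Python list, and the names `encode`, `syndrome`, `scrub` and `decode` are hypothetical.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # 1-indexed codeword positions that carry data bits
PARITY_POSITIONS = (1, 2, 4)    # 1-indexed positions that carry parity bits


def syndrome(word):
    """XOR of the positions of all set bits; 0 means the codeword is consistent."""
    s = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            s ^= pos
    return s


def encode(nibble):
    """Encode 4 data bits into a 7-bit Hamming codeword."""
    word = 0
    for k, pos in enumerate(DATA_POSITIONS):
        word |= ((nibble >> k) & 1) << (pos - 1)
    s = syndrome(word)
    for p in PARITY_POSITIONS:
        if s & p:                # set the parity bits that cancel the syndrome
            word |= 1 << (p - 1)
    return word


def decode(word):
    """Extract the 4 data bits from a (corrected) codeword."""
    return sum(((word >> (pos - 1)) & 1) << k for k, pos in enumerate(DATA_POSITIONS))


def scrub(memory):
    """Correct single-bit errors in place and return the addresses that were repaired."""
    corrected = []
    for addr, word in enumerate(memory):
        s = syndrome(word)
        if s:                    # for a single flipped bit, s is its position
            memory[addr] = word ^ (1 << (s - 1))
            corrected.append(addr)
    return corrected


memory = [encode(n) for n in (0x3, 0xA, 0xF)]
memory[1] ^= 1 << 4              # simulate a single-bit upset at address 1
report = scrub(memory)
print("corrected addresses:", report)
```

Running such a pass frequently means a single flipped bit is repaired before a second flip can land in the same word, which is how scrubbing keeps correctable errors from growing into multi-bit errors.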

Memory mirroring (available only on the 1160Xf and 1320Xf) 
runs continuously: the same data is written to two separate 
memory blocks instead of one. Because the data exists on two 
independent blocks, operations continue without interruption 
in the event of a non-correctable error.
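
The write-twice, read-with-fallback behaviour can be sketched as follows. This is a minimal model under assumptions, not the chipset's logic: the `Block` and `MirroredMemory` classes, the `UncorrectableError` exception and the `bad` set used to inject faults are all hypothetical.

```python
class UncorrectableError(Exception):
    """Raised by a block when a read hits a simulated non-correctable error."""


class Block:
    """One independent memory block; `bad` holds addresses with injected faults."""

    def __init__(self, size):
        self.data = [0] * size
        self.bad = set()

    def write(self, addr, value):
        self.data[addr] = value

    def read(self, addr):
        if addr in self.bad:
            raise UncorrectableError(addr)
        return self.data[addr]


class MirroredMemory:
    """Every write goes to both blocks; reads fall back to the mirror on failure."""

    def __init__(self, size):
        self.primary = Block(size)
        self.mirror = Block(size)

    def write(self, addr, value):
        self.primary.write(addr, value)
        self.mirror.write(addr, value)

    def read(self, addr):
        try:
            return self.primary.read(addr)
        except UncorrectableError:
            # The same data exists on the independent mirror block.
            return self.mirror.read(addr)


mem = MirroredMemory(4)
mem.write(2, 0xBEEF)
mem.primary.bad.add(2)           # inject a non-correctable error in the primary block
value = mem.read(2)
print(hex(value))
```

The caller of `read` never sees the failure in the primary block, which corresponds to operations continuing without interruption.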

Partial Chipset Degradation

Avoid multi-partition shutdowns resulting from chipset failures

In certain instances when multiple server partitions share a 
common crossbar controller, a failure in a single partition 
may result in a multi-partition shutdown. To resolve this issue, the 
Express5800/1000 series servers have been designed to allow for 
the partial degradation of chipsets.

Each of the LSI chips that make up the chipset contains multiple 
LSI sub-units. These sub-units are connected to other sub-units 
located on separate LSI chips. The combined sub-units together 
make up a single partition. If an error occurs on an LSI sub-unit, 
that sub-unit alone can be degraded to isolate the failure to a 
single partition, preventing the failure from spreading to other 
partitions.

Furthermore, the downed partition can automatically reboot 
itself after isolating the failed subsystem and resume operations 
in a degraded mode without the intervention of a system 
administrator. On the Express5800/1000 series servers, this is 
made possible by the redundant paths between the Cells and the 
IO.
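
The isolation and degraded restart described above can be modelled in a few lines. This is a sketch under assumptions rather than the chipset's actual design: it assumes two crossbar controller chips (A and B), each with one sub-unit per partition, and the names `SubUnit`, `build_chipset`, `handle_failure` and `usable_paths` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubUnit:
    chip: str            # crossbar controller chip the sub-unit lives on
    partition: int       # partition the sub-unit belongs to
    degraded: bool = False


def build_chipset():
    """Two chips, each carrying one sub-unit for partition 0 and one for partition 1."""
    return [SubUnit(chip, p) for chip in ("A", "B") for p in (0, 1)]


def handle_failure(chipset, chip, partition):
    """Degrade only the failed sub-unit and return the set of partitions affected."""
    for sub_unit in chipset:
        if sub_unit.chip == chip and sub_unit.partition == partition:
            sub_unit.degraded = True
    return {su.partition for su in chipset if su.degraded}


def usable_paths(chipset, partition):
    """Chips that still offer a working path for the given partition."""
    return [su.chip for su in chipset
            if su.partition == partition and not su.degraded]


chipset = build_chipset()
affected = handle_failure(chipset, chip="A", partition=0)
print("affected partitions:", affected)
print("partition 0 paths after reboot:", usable_paths(chipset, 0))
print("partition 1 paths:", usable_paths(chipset, 1))
```

Only partition 0 is affected, and it still has a path through chip B, so in this model it can restart in a degraded mode while partition 1 keeps both of its paths.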

[Figure: Memory mirroring. Labels: Cell Controller, CPU (x4), Memory I/F (x4), Memory Controller (x4), Data 0 to Data 3 (each shown twice), Mirror. Legend: components covered by the memory mirroring; components covered by the standard chip sparing.]

[Figure: Unit of degradation on the Express5800/1000 Series. Labels: Cell 0, Cell 1, Crossbar Controller A, Crossbar Controller B, Sub Unit, PCIBox 0, PCIBox 1, Partial degradation, Failure, Not affected. Notes: n specifies the partition number. Sub-units within the chipset are shown; additional sub-sets exist in actuality. A failure occurs at the sub-unit of the crossbar controller. Partition 0 is shut down so that the failed component can be isolated. Partition 0 is rebooted.]

Memory mirroring allows for continuous operation through all 
non-correctable memory errors, whether they occur in the memory 
itself, in the memory interfaces or in the memory controllers.

* Single Device Data Correction
