Functional Architecture
Intel® Server Boards SE7320SP2 and SE7525GP2
Revision
4.0
26
3.5.4 Disabling DIMMs
The BIOS provides a mechanism to disable a DIMM that is detected to be faulty. A DIMM is
considered faulty if it has either multiple correctable errors or a single uncorrectable error.
Memory errors are logged during runtime, and correctable memory errors (CMEs) are counted.
CMEs include both single-bit correctable errors and other correctable memory errors.
A DIMM that is marked as disabled stays in use until the next reboot; only then is it actually disabled.
At the next system boot, memory-sizing code reads the recorded state of the DIMMs and skips
sizing DIMMs marked as disabled. Because DIMMs are always used in 2-way interleaving, both
DIMMs of the affected pair are disabled. Disabled DIMMs are indicated by an LED next to the DIMM socket. If
all DIMMs in a system have been disabled, the BIOS generates beep codes to indicate that the
system has no usable memory.
Disabled DIMMs/rows may be re-enabled through a BIOS Setup option (Advanced Menu |
Memory Configuration Sub-menu | Memory Retest | change setting to “enabled” | Exit Menu |
Save changes and Exit). The disabled state of a DIMM slot is also cleared if the system boots
with no memory installed in that slot.
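
The disable and re-enable flow described in this section can be sketched as a small model. This is an illustration only, not the BIOS implementation: the names `DimmSlot`, `log_error`, and `boot` are hypothetical, the value of `CME_THRESHOLD` is assumed (the text says only "multiple" correctable errors), and resetting the CME count when a slot is re-enabled is also an assumption.

```python
from dataclasses import dataclass

# Assumed value: the text says only "multiple" correctable errors.
CME_THRESHOLD = 10


@dataclass
class DimmSlot:
    populated: bool = True
    cme_count: int = 0
    marked_disabled: bool = False  # recorded state, read at the next boot
    active: bool = True            # whether memory sizing used this DIMM


def log_error(slot, correctable):
    """Runtime: count CMEs and mark the DIMM once it qualifies as faulty."""
    if correctable:
        slot.cme_count += 1
        if slot.cme_count >= CME_THRESHOLD:
            slot.marked_disabled = True
    else:
        # A single uncorrectable error is enough to mark the DIMM.
        slot.marked_disabled = True


def boot(slots, memory_retest=False):
    """Boot-time memory sizing. Slots (0, 1), (2, 3), ... form interleave pairs.

    Returns False when no usable memory remains (the beep-code case).
    """
    for slot in slots:
        # Memory Retest, or booting with the slot empty, clears the mark.
        if memory_retest or not slot.populated:
            slot.marked_disabled = False
            slot.cme_count = 0  # assumption: the counter restarts as well
    for i in range(0, len(slots), 2):
        pair = slots[i:i + 2]
        # With 2-way interleaving, one marked DIMM takes out the whole pair.
        pair_ok = all(s.populated and not s.marked_disabled for s in pair)
        for s in pair:
            s.active = pair_ok
    return any(s.active for s in slots)
```

In this model a DIMM marked at runtime keeps `active` set until `boot` runs, which mirrors the statement that the disable takes effect only at the next reboot.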
3.5.4.1 Mechanism for CME/SEC Counter
The expected error rates for DIMMs are stated per gigabyte of memory. This information comes
from three sources:
Intel experimental measurements (one and one-half errors per year)
Data from a memory component vendor (one error per month)
The results from a 10-year study by a major computer manufacturer (four errors per month)
Because the lowest error rate was gathered over a short time and the highest error rate was
gathered over a long time, these two numbers are not considered valid and are discarded. The
middle rate (one error per month per gigabyte) is perceived as a more accurate, conservative
estimate and is used to program the threshold registers for single-bit correctable memory errors (SECs).
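
The selection above amounts to taking the median of the three quoted rates after converting them to a common unit. The following is a minimal sketch of that arithmetic; the dictionary keys and the function name `select_rate` are illustrative and do not come from the specification.

```python
from statistics import median

MONTHS_PER_YEAR = 12

# Quoted rates, converted to errors per gigabyte per year.
rates_per_gb_per_year = {
    "Intel experimental measurements": 1.5,
    "memory component vendor": 1 * MONTHS_PER_YEAR,
    "10-year manufacturer study": 4 * MONTHS_PER_YEAR,
}


def select_rate(rates):
    """Discard the lowest and highest of three rates, keeping the middle one."""
    return median(rates.values())


chosen = select_rate(rates_per_gb_per_year)  # 12 errors per GB per year
```

Twelve errors per gigabyte per year is the vendor figure of one error per month.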
The threshold number must be adjusted for geographical areas with an increased occurrence of
alpha particles, which raises error rates. Geographical effects include high altitudes and
radioactive mineral deposits. Studies have shown that single-bit error rates at altitudes over
10,000 feet are 14 times higher than error rates at sea level. The highest of the three quoted
error rates included various geographical locations.
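
As a rough illustration of the altitude adjustment, the sketch below applies the 14-times figure as a single step at 10,000 feet. That step is a simplification made here: the text gives only this one data point and no scaling for intermediate altitudes or for mineral deposits. The function name `adjusted_rate` is hypothetical.

```python
HIGH_ALTITUDE_FEET = 10_000   # altitude quoted in the text
HIGH_ALTITUDE_FACTOR = 14     # rate multiplier quoted for altitudes above it


def adjusted_rate(base_rate, altitude_feet):
    """Scale a sea-level single-bit error rate for a high-altitude site.

    Assumption: a single step at 10,000 feet; anything at or below that
    altitude is left at the sea-level rate.
    """
    if altitude_feet > HIGH_ALTITUDE_FEET:
        return base_rate * HIGH_ALTITUDE_FACTOR
    return base_rate
```

For example, a base rate of one error per month per gigabyte becomes 14 errors per month per gigabyte for a site at 12,000 feet.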
Table 8 shows the suggested SEC register threshold for various DIMM sizes. The values in the
table include a minimal error residue at one times the expected average error rate. Halving the
time or threshold would result in loss of error count resolution. One register is programmed for
each DIMM slot.
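
Because the expected rates are stated per gigabyte and one register is programmed per slot, the expected SEC count scales with the size of the DIMM in each slot. The sketch below shows only that scaling; it does not reproduce the values in Table 8, and the slot labels, the function names, and the one-month observation period are assumptions made for illustration.

```python
# Middle estimate from the text: one error per month per gigabyte.
RATE_PER_GB_PER_MONTH = 1.0


def expected_secs(dimm_size_mb, months=1.0):
    """Expected single-bit correctable errors for one DIMM over a period."""
    return RATE_PER_GB_PER_MONTH * (dimm_size_mb / 1024) * months


def per_slot_expectations(slot_sizes_mb):
    """One value per DIMM slot, mirroring one threshold register per slot."""
    return {slot: expected_secs(size) for slot, size in slot_sizes_mb.items()}


# Hypothetical slot labels and DIMM sizes (in MB).
example = per_slot_expectations({"DIMM_1A": 512, "DIMM_1B": 2048})
```

A programmed threshold would be derived from values like these; how Table 8 rounds them or adds margin is not shown here.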