Optimizing Cache Usage
The maskmovq/maskmovdqu (non-temporal byte mask store of packed integer in an MMX technology or Streaming SIMD Extensions register) instructions store data from a register to the location specified by the edi register. The most significant bit in each byte of the second mask register is used to selectively write the data of the first register on a per-byte basis. The instruction is implicitly weakly-ordered (that is, successive stores may not write memory in original program-order), does not write-allocate, and thus minimizes cache pollution.
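The per-byte masking described above can be sketched in C with the SSE2 intrinsic _mm_maskmoveu_si128, which compilers typically lower to maskmovdqu. This is a minimal illustration under stated assumptions, not code from the manual: the helper name masked_store16 is hypothetical, and the trailing _mm_sfence is added here only so that the weakly-ordered store is ordered before later stores.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in _mm_sfence */
#include <stddef.h>

/* Hypothetical helper: writes src[i] to dst[i] only for those of the
 * 16 bytes whose mask byte has its most significant bit set.
 * The store is non-temporal, so it does not write-allocate. */
static void masked_store16(unsigned char *dst,
                           const unsigned char *src,
                           const unsigned char *mask)
{
    assert(dst != NULL && src != NULL && mask != NULL);

    __m128i data = _mm_loadu_si128((const __m128i *)src);
    __m128i sel  = _mm_loadu_si128((const __m128i *)mask);

    /* Byte-masked, weakly-ordered store of data to dst. */
    _mm_maskmoveu_si128(data, sel, (char *)dst);

    /* Order the weakly-ordered store before any later stores. */
    _mm_sfence();
}
```

Only bit 7 of each mask byte matters: a mask byte of 0x7F leaves the destination byte untouched, while 0x80 or 0xFF selects it.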
The fence Instructions
The following fence instructions are available: sfence, lfence, and mfence.
The sfence Instruction
The sfence (store fence) instruction makes it possible for every store instruction that precedes the sfence instruction in program order to be globally visible before any store instruction that follows the sfence. The sfence instruction provides an efficient way of ensuring ordering between routines that produce weakly-ordered results.
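As a rough sketch of a routine that produces weakly-ordered results and fences them before returning, the following C function fills a buffer with non-temporal stores and ends with a store fence. The name fill_nt is hypothetical; _mm_stream_si32 and _mm_sfence are the SSE2/SSE intrinsics that normally map to movnti and sfence.

```c
#include <assert.h>
#include <emmintrin.h>  /* _mm_stream_si32 (SSE2), _mm_sfence (SSE) */
#include <stddef.h>

/* Hypothetical routine: fills n ints with value using non-temporal
 * (weakly-ordered) stores, then issues sfence so that these stores
 * become globally visible before any store the caller issues later. */
static void fill_nt(int *dst, size_t n, int value)
{
    assert(dst != NULL || n == 0);

    for (size_t i = 0; i < n; ++i)
        _mm_stream_si32(&dst[i], value);  /* weakly-ordered store */

    _mm_sfence();  /* earlier stores are ordered before later ones */
}
```

Placing the fence once at the end of the routine, rather than after every store, keeps the cost low while still giving callers an ordered result.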
The use of weakly-ordered memory types can be important under certain data sharing relationships, such as a producer-consumer relationship. Using weakly-ordered memory can make assembling the data more efficient, but care must be taken to ensure that the consumer obtains the data that the producer intended it to see. Some common usage models may be affected in this way by weakly-ordered stores. Examples are:
• library functions, which use weakly-ordered memory to write results
• compiler-generated code, which also benefits from writing weakly-ordered results
• hand-crafted code
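The producer-consumer concern above can be illustrated with a small C sketch. It assumes a hypothetical struct mailbox with a payload and a ready flag, and it uses C11 atomics for the flag, which the manual itself does not discuss. The producer writes the payload with weakly-ordered stores, issues sfence, and only then publishes the flag, so a consumer that observes the flag also observes the complete payload.

```c
#include <assert.h>
#include <emmintrin.h>   /* _mm_stream_si32, _mm_sfence */
#include <stdatomic.h>
#include <stddef.h>

#define PAYLOAD_WORDS 16

/* Hypothetical shared structure for this illustration. */
struct mailbox {
    int payload[PAYLOAD_WORDS];
    atomic_int ready;            /* 0 = empty, 1 = payload published */
};

static void mailbox_init(struct mailbox *mb)
{
    assert(mb != NULL);
    atomic_init(&mb->ready, 0);
}

/* Producer: assemble the data with non-temporal stores, fence, publish. */
static void produce(struct mailbox *mb, int seed)
{
    for (int i = 0; i < PAYLOAD_WORDS; ++i)
        _mm_stream_si32(&mb->payload[i], seed + i);

    /* Without this fence the flag store below could become globally
     * visible before the weakly-ordered payload stores. */
    _mm_sfence();

    atomic_store_explicit(&mb->ready, 1, memory_order_release);
}

/* Consumer: returns 1 and copies the payload if it has been published,
 * otherwise returns 0 and leaves out untouched. */
static int try_consume(struct mailbox *mb, int *out)
{
    if (!atomic_load_explicit(&mb->ready, memory_order_acquire))
        return 0;

    for (int i = 0; i < PAYLOAD_WORDS; ++i)
        out[i] = mb->payload[i];
    return 1;
}
```

In a real program produce and try_consume would run on different threads; they are shown here as plain functions to keep the sketch short.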
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966 013US, April 2006