IA-32 Intel® Architecture Optimization
The Prefetch Instructions – Pentium 4 Processor Implementation
Streaming SIMD Extensions include four flavors of prefetch instructions: one non-temporal and three temporal. They correspond to two types of operations, temporal and non-temporal.
The non-temporal instruction is:
prefetchnta    Fetch the data into the second-level cache, minimizing cache pollution.
The temporal instructions are:
prefetcht0    Fetch the data into all cache levels, that is, to the second-level cache for the Pentium 4 processor.
prefetcht1    Identical to prefetcht0.
prefetcht2    Identical to prefetcht0.
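As an illustration only, the four instructions can be reached from C through the `_mm_prefetch` intrinsic in `<xmmintrin.h>`, whose hint constants `_MM_HINT_NTA`, `_MM_HINT_T0`, `_MM_HINT_T1`, and `_MM_HINT_T2` select the instruction. The sketch below is not taken from the manual: the names `prefetch_hint`, `prefetch_line`, `prefetch_range`, and the 64-byte stride `LINE_BYTES` are hypothetical choices made for the example.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>   /* _mm_prefetch and the _MM_HINT_* constants */
#define HAVE_SSE_PREFETCH 1
#else
#define HAVE_SSE_PREFETCH 0
#endif

/* Hypothetical names for the four flavors described above. */
enum prefetch_hint { HINT_NTA, HINT_T0, HINT_T1, HINT_T2 };

#define LINE_BYTES 64    /* assumed stride between prefetches; not from the text */

/* Issue one prefetch for the cache line containing p. */
void prefetch_line(const void *p, enum prefetch_hint h)
{
#if HAVE_SSE_PREFETCH
    /* The hint must be a compile-time constant, hence the switch. */
    switch (h) {
    case HINT_NTA: _mm_prefetch((const char *)p, _MM_HINT_NTA); break;
    case HINT_T0:  _mm_prefetch((const char *)p, _MM_HINT_T0);  break;
    case HINT_T1:  _mm_prefetch((const char *)p, _MM_HINT_T1);  break;
    case HINT_T2:  _mm_prefetch((const char *)p, _MM_HINT_T2);  break;
    }
#else
    (void)p; (void)h;    /* no SSE available: the hint becomes a no-op */
#endif
}

/* Prefetch every LINE_BYTES-th byte of buf[0..len) and
   return how many prefetches were issued. */
size_t prefetch_range(const void *buf, size_t len, enum prefetch_hint h)
{
    const char *p = (const char *)buf;
    size_t issued = 0;
    for (size_t off = 0; off < len; off += LINE_BYTES) {
        prefetch_line(p + off, h);
        issued++;
    }
    return issued;
}
```

A call such as `prefetch_range(buf, 256, HINT_NTA)` issues four hints under the assumed stride. Because the text describes `prefetcht1` and `prefetcht2` as identical to `prefetcht0` on the Pentium 4 processor, the three temporal hints would be expected to behave the same way there.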
Prefetch and Load Instructions
The Pentium 4 processor has a decoupled execution and memory architecture that allows instructions to be executed independently of memory accesses if there are no data or resource dependencies.
Programs or compilers can use dummy load instructions to imitate prefetch functionality, but preloading is not completely equivalent to using prefetch instructions. Prefetch instructions provide greater performance than preloading.
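The difference between the two approaches can be sketched in C. The first function below preloads with a dummy load whose result is discarded; the second issues a prefetch hint through the `_mm_prefetch` intrinsic. This is a minimal illustration, not code from the manual: the function names, the look-ahead distance `AHEAD`, and the 16-element stride `LINE_FLOATS` (which assumes 64-byte lines and 4-byte floats) are hypothetical.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>                 /* _mm_prefetch, _MM_HINT_NTA */
#define HINT_AHEAD(p) _mm_prefetch((const char *)(p), _MM_HINT_NTA)
#else
#define HINT_AHEAD(p) ((void)(p))      /* no SSE: the hint becomes a no-op */
#endif

enum {
    LINE_FLOATS = 16,  /* assumption: 64-byte stride, 4-byte float */
    AHEAD       = 64   /* hypothetical look-ahead distance, in elements */
};

/* Preloading: touch a[i + AHEAD] with a real load and discard the value.
   A load needs a destination register and a valid address. */
float sum_with_preload(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (i % LINE_FLOATS == 0 && i + AHEAD < n)
            (void)*(const volatile float *)&a[i + AHEAD];  /* dummy load */
        s += a[i];
    }
    return s;
}

/* Prefetching: the same access pattern, expressed as a hint
   that produces no value and needs no destination register. */
float sum_with_prefetch(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (i % LINE_FLOATS == 0 && i + AHEAD < n)
            HINT_AHEAD(&a[i + AHEAD]);
        s += a[i];
    }
    return s;
}
```

Both functions return the same sum; only the way the upcoming data is requested differs. Any performance gap between them depends on the processor and the workload and would need to be measured.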
NOTE. At the time of prefetch, if the data is already found in a cache level that is closer to the processor than the cache level specified by the instruction, no data movement occurs.
Source: IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006