General Optimization Guidelines
Prefetching
The Pentium 4 processor has three prefetching mechanisms:
• hardware instruction prefetcher
• software prefetch for data
• hardware prefetch for cache lines of data or instructions.
Hardware Instruction Fetching
The hardware instruction fetcher reads instructions, 32 bytes at a time,
into the 64-byte instruction streaming buffers.
Software and Hardware Cache Line Fetching
The Pentium 4 and Intel Xeon processors provide hardware prefetching,
in addition to software prefetching. The hardware prefetcher operates
transparently to fetch data and instruction streams from memory,
without requiring programmer intervention. The hardware prefetcher
can track 8 independent streams. Software prefetch using the
prefetchnta instruction fetches 128 bytes into one way of the
second-level cache.
The Pentium M processor also provides a hardware prefetcher for data.
It can track 12 separate streams in the forward direction and 4 streams in
the backward direction. This processor’s prefetchnta instruction also
fetches 64 bytes into the first-level data cache without polluting the
second-level cache.
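
As an illustration of software prefetch for data, the following is a minimal
sketch, not code from this manual. It assumes a GCC- or Clang-compatible
compiler, whose __builtin_prefetch builtin with a locality hint of 0 is
typically lowered to prefetchnta on IA-32 targets. The function name
sum_with_prefetch and the constant PREFETCH_DISTANCE are hypothetical, and the
distance of 16 elements is an untuned placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning constant: how many elements ahead to prefetch. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting that upcoming elements will be read once. */
long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;
    size_t i;

    assert(data != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, 0 = read access, 0 = no temporal
             * locality. This is only a hint; the result is the same
             * whether or not the processor acts on it. */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 0);
        }
        total += data[i];
    }
    return total;
}
```

Because a prefetch is only a hint, the loop computes the same sum with or
without it. Whether it helps depends on the processor and on how far ahead the
hint is issued, so the distance should be measured rather than assumed.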
Intel Core Solo and Intel Core Duo processors provide more advanced
hardware prefetchers for data relative to those on the Pentium M
processors. The key differences are summarized in Table 1-2.
Although the hardware prefetcher operates transparently and requires no
intervention from the programmer, it operates most efficiently when
programmers tailor data access patterns to its characteristics, because
the hardware prefetcher favors small-stride cache miss patterns.
Optimizing data
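
The preference for small-stride access can be illustrated with a minimal
sketch, which is not taken from this manual. Both functions below sum the same
row-major matrix; the names sum_small_stride and sum_large_stride are
hypothetical. The first walks consecutive addresses, the kind of pattern the
text describes as favored. The second jumps cols elements between accesses,
which for wide rows becomes a large stride.

```c
#include <assert.h>
#include <stddef.h>

/* Row-by-row traversal: each access is adjacent to the previous one. */
long sum_small_stride(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    size_t r, c;

    assert(m != NULL);
    for (r = 0; r < rows; r++)
        for (c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Column-by-column traversal: each access is cols elements away
 * from the previous one, so the stride grows with the row width. */
long sum_large_stride(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    size_t r, c;

    assert(m != NULL);
    for (c = 0; c < cols; c++)
        for (r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

The two functions return identical results; only the order of memory accesses
differs. Any performance difference depends on the matrix size and the
processor, and should be confirmed by measurement.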
Source: IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006