IA-32 Intel® Architecture Optimization
Example of Latency Hiding with S/W Prefetch Instruction
Achieving the highest level of memory optimization using prefetch
instructions requires an understanding of the microarchitecture and
system architecture of a given machine. This section translates the key
architectural implications into several simple guidelines for
programmers to use.
Figure 6-2 and Figure 6-3 show two scenarios of a simplified 3D
geometry pipeline as an example. A 3D geometry pipeline typically
fetches one vertex record at a time and then performs transformation
and lighting functions on it. Both figures show two separate pipelines:
an execution pipeline and a memory pipeline (front-side bus).
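The vertex-at-a-time loop described above can be sketched in C as follows.
This is a minimal illustration under stated assumptions, not the manual's
own code: the Vertex layout, the transform_vertex placeholder, and the
PREFETCH_AHEAD distance are hypothetical, and __builtin_prefetch is the
GCC/Clang builtin, used here in place of a specific IA-32 prefetch
instruction. The right prefetch distance depends on the machine and has
to be tuned by measurement.

```c
#include <stddef.h>

/* Hypothetical vertex record; the field layout is an assumption. */
typedef struct {
    float x, y, z, w;
} Vertex;

/* Assumed prefetch distance in vertex records; tune per machine. */
enum { PREFETCH_AHEAD = 8 };

/* Placeholder for the transformation and lighting work: a uniform scale. */
static void transform_vertex(Vertex *v, float scale)
{
    v->x *= scale;
    v->y *= scale;
    v->z *= scale;
}

/* Process one vertex per iteration while requesting a later record,
   so that its memory access can overlap with the current computation. */
void process_vertices(Vertex *verts, size_t count, float scale)
{
    for (size_t i = 0; i < count; ++i) {
        if (i + PREFETCH_AHEAD < count) {
            /* GCC/Clang builtin: read access (0), high temporal locality (3). */
            __builtin_prefetch(&verts[i + PREFETCH_AHEAD], 0, 3);
        }
        transform_vertex(&verts[i], scale);
    }
}
```

The prefetch is only a hint: it does not change the results of the loop,
only when the vertex data is likely to arrive in the cache.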
Because the Pentium 4 processor, like the Pentium II and Pentium III
processors, completely decouples execution from memory access, these
two pipelines can operate concurrently. Figure 6-2 shows “bubbles” in
both the execution and memory pipelines. When loads are issued for
accessing vertex data, the
Figure 6-1 Effective Latency Reduction as a Function of Access Stride
[Figure: "Upper bound of Pointer-Chasing Latency Reduction". X-axis: Stride (Bytes), 64 to 240 in steps of 16. Y-axis: Effective Latency Reduction, 0% to 120%. Series: Fam. 15, Model 3, 4; Fam. 15, Model 0, 1, 2; Fam. 6, Model 13; Fam. 6, Model 14; Fam. 15, Model 6.]
IA-32 Intel® Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006