Optimizing Cache Usage
• Balance single-pass versus multi-pass execution
• Resolve memory bank conflict issues
• Resolve cache management issues
The subsequent sections discuss all the above items.
Software Prefetch Scheduling Distance
Determining the ideal prefetch placement in the code depends on many
architectural parameters, including the amount of memory to be
prefetched, cache lookup latency, system memory latency, and an estimate
of the computation cycles. The ideal distance for prefetching data is
processor- and platform-dependent. If the distance is too short, the
prefetch cannot fully hide the latency of the fetch behind
computation. If the prefetch is issued too far ahead, the prefetched data may be
flushed out of the cache by the time it is actually required.
Since prefetch distance is not a well-defined metric, for this discussion
we define a new term, prefetch scheduling distance (PSD), which is
expressed as a number of loop iterations. For large loops, the prefetch
scheduling distance can be set to 1, that is, prefetch instructions are
scheduled one iteration ahead. For small loop bodies, that is, loop
iterations with little computation, the prefetch scheduling distance must
be more than one iteration.
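The idea of issuing a prefetch a fixed number of iterations ahead can be sketched in C. This is a minimal illustration, not code from the manual: the function name sum_with_prefetch, the constants ELEMS_PER_ITER and PSD, and the assumed 64-byte line size are all hypothetical, and the GCC/Clang builtin __builtin_prefetch stands in for the processor's prefetch instruction.

```c
#include <stddef.h>

/* Hypothetical tuning values chosen for illustration, not taken from the manual. */
#define ELEMS_PER_ITER 16  /* floats consumed per iteration (one 64-byte line, assumed) */
#define PSD 3              /* prefetch scheduling distance, in loop iterations */

/* Sums n floats, prefetching the data needed PSD iterations ahead. */
float sum_with_prefetch(const float *data, size_t n)
{
    float total = 0.0f;
    size_t i = 0;

    for (; i + ELEMS_PER_ITER <= n; i += ELEMS_PER_ITER) {
        size_t ahead = i + (size_t)PSD * ELEMS_PER_ITER;
        if (ahead < n) {
            /* GCC/Clang builtin: address, 0 = read access, 0 = low temporal locality. */
            __builtin_prefetch(&data[ahead], 0, 0);
        }
        /* The computation for the current iteration overlaps the fetch issued above. */
        for (size_t j = 0; j < ELEMS_PER_ITER; ++j)
            total += data[i + j];
    }
    for (; i < n; ++i)  /* remaining elements that do not fill a whole iteration */
        total += data[i];
    return total;
}
```

With a small loop body like this one, a PSD of 1 would likely leave too little computation to cover the fetch, which is why the sketch looks several iterations ahead. The right value has to be measured on the target processor and platform.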
A simplified equation to compute PSD is deduced from the
mathematical model. For the simplified equation, the complete mathematical
model, and the methodology of prefetch distance determination, refer to
Appendix E, “Mathematics of Prefetch Scheduling Distance”.
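As a rough intuition only, and explicitly not the Appendix E equation, the distance can be approximated as the number of iterations needed to cover the fetch latency. The helper below is a hypothetical sketch: estimate_psd and both of its parameters are assumed names, and it ignores the cache lookup and transfer terms that the full model accounts for.

```c
/* Rough estimate only; this is NOT the Appendix E model.
 * fetch_latency_cycles: assumed cycles until prefetched data arrives.
 * cycles_per_iteration: assumed computation cycles in one loop iteration. */
unsigned estimate_psd(unsigned fetch_latency_cycles, unsigned cycles_per_iteration)
{
    if (cycles_per_iteration == 0)
        return 1;  /* avoid division by zero; fall back to one iteration */

    /* Ceiling division: iterations needed to cover the latency. */
    unsigned psd = (fetch_latency_cycles + cycles_per_iteration - 1)
                   / cycles_per_iteration;

    return psd == 0 ? 1 : psd;  /* schedule at least one iteration ahead */
}
```

Under this approximation, a loop whose iteration takes longer than the fetch gets a distance of 1, while a loop with little computation per iteration gets a larger distance, which matches the guidance above for large and small loop bodies.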
Example 6-3 illustrates the use of a prefetch within the loop body. The
prefetch scheduling distance is set to 3, esi is effectively the pointer to a
line, edx is the address of the data being referenced and xmm1-xmm4 are
the data used in computation. Example 6-4 uses two independent cache
Source: IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006