January, 2004
Developer’s Manual
Intel XScale® Core
Optimization Guide
A.4.4.13. Prefetch to Reduce Register Pressure
Prefetch can be used to reduce register pressure. When data is needed for an operation, the load
is normally scheduled far enough in advance to hide the load latency. However, the load ties up
the receiving register from the time it is issued until the data is used. For example:
ldr r2, [r0]
; Process code {data not yet cached: load latency > 60 core clocks}
add r1, r1, r2
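In the above case, r2 is unavailable for other processing from the ldr until the add instruction.
A prefetch brings the data into the cache without naming a destination register, so the load
itself can be issued much later and r2 stays free in the meantime. The example code becomes: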
pld [r0] ; prefetch the data, keeping r2 available for use
; Process code
ldr r2, [r0]
; Process code {ldr result latency is 3 core clocks}
add r1, r1, r2
With the added prefetch, register r2 can be used for other operations until just before it is
needed.
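For code written in C rather than assembly, the same pattern can be sketched with the GCC/Clang builtin `__builtin_prefetch`. This is a minimal illustration, not part of the manual: the function name `add_with_prefetch` and its parameters are hypothetical, and whether the builtin is emitted as `pld` depends on the compiler and target options. The compiler also performs its own register allocation, so the benefit is expected rather than guaranteed.

```c
/* Hypothetical helper mirroring the pld / ldr / add sequence above.
 * Assumes a GCC- or Clang-compatible compiler for __builtin_prefetch. */
static int add_with_prefetch(const int *src, int acc, int *work, int work_len)
{
    int i;
    int value;

    /* Start the cache fill early. Unlike a load, a prefetch names no
     * destination, so no register has to hold *src yet.
     * Arguments: address, 0 = read access, 3 = high temporal locality. */
    __builtin_prefetch(src, 0, 3);

    /* Stand-in for the "process code": independent work that may use
     * any register while the cache line is being fetched. */
    for (i = 0; i < work_len; i++)
        work[i] = work[i] * 2 + 1;

    /* Load late. If the prefetch has completed, this load hits the
     * cache and its result is available after a short latency. */
    value = *src;
    return acc + value;
}
```

Calling `add_with_prefetch(&data, 2, work, 4)` with `data` equal to 40 returns 42 and updates `work` in place. The prefetch is only a hint, so the result is the same whether or not the line reaches the cache in time.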