164
January, 2004
Developer’s Manual
Intel XScale® Core
Developer’s Manual
Performance Considerations
10.2
Branch Prediction
The Intel XScale
®
core implements dynamic branch prediction for the ARM* instructions B and
BL and for the Thumb instruction B. Any instruction that specifies the PC as the destination is
predicted as not taken. For example, an LDR or a MOV that loads or moves directly to the PC will
be predicted not taken and incur a branch latency penalty.
These instructions -- ARM B, ARM BL and Thumb B -- enter into the branch target buffer when
they are “taken” for the first time. (A “taken” branch refers to when they are evaluated to be true.)
Once in the branch target buffer, the core dynamically predicts the outcome of these instructions
based on previous outcomes.
Table 10-1
shows the branch latency penalty when these instructions
are correctly predicted and when they are not. A penalty of zero for correct prediction means that
the core can execute the next instruction in the program flow in the cycle following the branch.
10.3
Addressing Modes
All load and store addressing modes implemented in the core do not add to the instruction latencies
numbers.
Table 10-1.
Branch Latency Penalty
Core Clock Cycles
Description
ARM*
Thumb
+0
+ 0
Predicted Correctly
. The instruction is in the branch target cache and is correctly
predicted.
+4
+ 5
Mispredicted
. There are three occurrences of branch misprediction, all of which
incur a 4-cycle branch delay penalty.
1. The instruction is in the branch target buffer and is predicted not-taken, but is
actually taken.
2. The instruction is not in the branch target buffer and is a taken branch.
3. The instruction is in the branch target buffer and is predicted taken, but is
actually not-taken