Intel XScale® Core Developer’s Manual, January, 2004
Optimization Guide
A.3.1.2 Optimizing Branches
Branches decrease application performance by indirectly causing pipeline stalls. Branch prediction
improves performance by lessening the delay inherent in fetching a new instruction stream. The
number of branches that can be predicted accurately is limited by the size of the branch target
buffer. Since the total number of branches executed in a program is relatively large compared to the
size of the branch target buffer, it is often beneficial to minimize the number of branches in a
program. Consider the following C code segment.
    int foo(int a)
    {
        if (a > 10)
            return 0;
        else
            return 1;
    }
The code generated for the if-else portion of this code segment using branches is:
        cmp     r0, #10
        ble     L1
        mov     r0, #0
        b       L2
    L1:
        mov     r0, #1
    L2:
The code generated above takes three cycles to execute the else-part and four cycles for the if-part,
assuming best-case conditions and no branch misprediction penalties. On the Intel XScale® core, a
branch misprediction incurs a penalty of four cycles. If the branch is mispredicted 50% of the time,
and the if-part and the else-part are equally likely to be taken, the code above takes 5.5 cycles to
execute on average.
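The 5.5-cycle figure can be checked with a small calculation. The following is a minimal sketch, not part of the manual: the function name avg_branch_cycles and its parameters are hypothetical, and it assumes, as the text does, that the if-part and the else-part are equally likely.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from the manual): average cycle count of an
 * if-else compiled with branches, assuming both paths are equally likely.
 *
 *   if_cycles       best-case cycles for the if-part
 *   else_cycles     best-case cycles for the else-part
 *   mispredict_rate fraction of executions that are mispredicted (0.0 to 1.0)
 *   penalty_cycles  cycles lost on each misprediction
 */
double avg_branch_cycles(double if_cycles, double else_cycles,
                         double mispredict_rate, double penalty_cycles)
{
    assert(mispredict_rate >= 0.0 && mispredict_rate <= 1.0);

    /* Average of the two equally likely paths. */
    double base = (if_cycles + else_cycles) / 2.0;

    /* Add the expected misprediction cost per execution. */
    return base + mispredict_rate * penalty_cycles;
}
```

With the values from the text (four cycles for the if-part, three for the else-part, a 50% misprediction rate, and a four-cycle penalty), avg_branch_cycles(4.0, 3.0, 0.5, 4.0) evaluates to 3.5 + 2.0 = 5.5 cycles.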
If we instead use the core's ability to execute instructions conditionally, the code generated for
the above if-else statement is:
        cmp     r0, #10
        movgt   r0, #0
        movle   r0, #1
The above code segment would not incur any branch misprediction penalties and would take three
cycles to execute, assuming best-case conditions. Compared with the 5.5-cycle average of the
branching version, conditional instructions speed up execution significantly. However, the use of
conditional instructions should be considered carefully to ensure that it actually improves
performance. To decide when to use conditional instructions over branches, consider the following
hypothetical code segment:
    if (cond)
        if_stmt
    else
        else_stmt
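Average execution time of the branching version of foo (50% misprediction rate, four-cycle
penalty, if-part and else-part equally likely):

$$\left(\frac{50}{100} \times 4\right) + \frac{3 + 4}{2} = 5.5 \text{ cycles}$$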