General Optimization Guidelines
Optimize Branch Predictability
• Improve branch predictability and optimize instruction prefetching by arranging code to be consistent with the static branch prediction assumption: backward taken and forward not taken (see the first sketch following this list).
• Avoid mixing near calls, far calls, and returns.
• Avoid implementing a call by pushing the return address and jumping to the target. The hardware can pair up call and return instructions to enhance predictability.
• Use the pause instruction in spin-wait loops (see the spin-wait sketch following this list).
• Inline functions according to coding recommendations.
• Whenever possible, eliminate branches (a branchless sketch follows this list).
• Avoid indirect calls.
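The static-prediction bullet can be illustrated at the source level. The sketch below assumes a GCC- or Clang-compatible compiler; process_packet(), do_work(), and handle_error() are hypothetical illustration routines. The __builtin_expect hint lets the compiler lay out the common path as straight-line fall-through code and place the rare error handling behind a forward branch that is normally not taken.

Example (sketch). Arranging code for the static prediction assumption

    /* Sketch: common case falls straight through; the rare error path sits
     * behind a forward branch that is normally not taken.
     * Assumes GCC/Clang; all routine names are hypothetical. */
    #include <stddef.h>

    #define UNLIKELY(x) __builtin_expect(!!(x), 0)

    static int handle_error(int code)   { return code; }       /* hypothetical stub */
    static int do_work(const char *buf) { return (int)buf[0]; } /* hypothetical stub */

    int process_packet(const char *buf, size_t len)
    {
        if (UNLIKELY(buf == NULL || len == 0))
            return handle_error(-1);   /* rare path: behind a forward branch */

        /* Common case continues in program order, matching the static
         * "forward not taken" assumption. */
        return do_work(buf);
    }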
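For the spin-wait bullet, the pause instruction is exposed by C compilers as the _mm_pause() intrinsic. A minimal sketch follows, assuming a C11 compiler targeting x86 with <immintrin.h>; the shared ready flag is hypothetical.

Example (sketch). Spin-wait loop using pause

    /* Sketch: a spin-wait loop using the pause instruction via _mm_pause().
     * Assumes C11 atomics and an x86 compiler providing <immintrin.h>. */
    #include <stdatomic.h>
    #include <immintrin.h>

    static atomic_int ready = 0;   /* hypothetical flag set by another thread */

    void spin_until_ready(void)
    {
        /* pause reduces the penalty of exiting the spin loop and lowers
         * power consumption while waiting. */
        while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
            _mm_pause();
    }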
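Branch elimination can often be achieved by converting a data-dependent branch into straight-line arithmetic that the compiler can map to SETcc or CMOV. A small sketch follows; count_above() is a hypothetical example routine, and the data-dependent counting loop is chosen only for illustration.

Example (sketch). Eliminating a data-dependent branch

    /* Sketch: replacing an unpredictable data-dependent branch with
     * branchless arithmetic. Routine names are hypothetical. */
    #include <stddef.h>

    /* Branchy form: the taken/not-taken pattern follows the data and may
     * mispredict frequently if the data is irregular. */
    size_t count_above_branchy(const int *v, size_t n, int threshold)
    {
        size_t count = 0;
        for (size_t i = 0; i < n; i++) {
            if (v[i] > threshold)
                count++;
        }
        return count;
    }

    /* Branchless form: the comparison result (0 or 1) is added directly,
     * which compilers typically lower to SETcc or CMOV instead of a branch. */
    size_t count_above_branchless(const int *v, size_t n, int threshold)
    {
        size_t count = 0;
        for (size_t i = 0; i < n; i++)
            count += (size_t)(v[i] > threshold);
        return count;
    }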
Optimize Memory Access
• Observe store-forwarding constraints (see the store-forwarding sketch following this list).
• Ensure proper data alignment to prevent data from splitting across a cache line boundary. This includes data on the stack and parameter passing (an alignment sketch follows this list).
• Avoid mixing code and data (self-modifying code).
• Choose data types carefully (see the next bullet) and avoid type casting.
• Employ data structure layout optimization to ensure efficient use of the 64-byte cache line size (see the layout sketch following this list).
• Favor parallel data accesses, which mask latency, over dependent data accesses, which expose latency (a short comparison follows this list).
• For cache-miss data traffic, favor smaller cache-miss strides to avoid frequent DTLB misses.
• Use prefetching appropriately (see the prefetch sketch following this list).
• Use the following techniques to enhance locality: blocking, hardware-friendly tiling, loop interchange, and loop skewing (a blocking sketch follows this list).
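Store forwarding fails when a load reads data that is not wholly supplied by a single preceding store, for example two narrow stores followed by one wide load. The sketch below assumes a little-endian x86 target and a hypothetical 32-bit value built from two 16-bit halves; an optimizing compiler may remove the stores in such a tiny example, but the pattern illustrates the constraint.

Example (sketch). Store-forwarding-friendly versus blocked patterns

    /* Sketch: store-to-load forwarding. Assumes little-endian x86; the
     * packed type and routines are hypothetical. */
    #include <stdint.h>

    typedef union {
        struct { uint16_t lo, hi; } halves;
        uint32_t word;
    } packed32_t;

    /* Problematic: two 16-bit stores followed by a 32-bit load. The wide
     * load is not covered by either narrow store and may stall. */
    uint32_t combine_blocked(uint16_t lo, uint16_t hi)
    {
        packed32_t p;
        p.halves.lo = lo;
        p.halves.hi = hi;
        return p.word;        /* 32-bit load overlaps two smaller stores */
    }

    /* Forwarding-friendly: the value is assembled in a register, avoiding
     * the mismatched store/load pair entirely. */
    uint32_t combine_forwarded(uint16_t lo, uint16_t hi)
    {
        uint32_t word = (uint32_t)lo | ((uint32_t)hi << 16);
        return word;
    }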
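For the alignment bullet, C11 provides alignas (via <stdalign.h>) and aligned_alloc to keep hot data from straddling a 64-byte cache line boundary. A minimal sketch follows; the counters_t type is hypothetical.

Example (sketch). Aligning data to the 64-byte cache line

    /* Sketch: cache-line alignment. Assumes C11 (<stdalign.h>, aligned_alloc);
     * counters_t is a hypothetical type. */
    #include <stdalign.h>
    #include <stdlib.h>
    #include <stdint.h>

    #define CACHE_LINE 64

    /* Statically aligned object: starts on a cache line boundary, so its
     * first 64 bytes never split across two lines. */
    typedef struct {
        alignas(CACHE_LINE) uint64_t hits;
        uint64_t misses;
    } counters_t;

    int main(void)
    {
        static counters_t global_counters;          /* 64-byte aligned */

        /* Dynamically allocated, cache-line-aligned buffer; the size passed
         * to aligned_alloc must be a multiple of the alignment. */
        uint64_t *buf = aligned_alloc(CACHE_LINE, 16 * CACHE_LINE);
        if (buf == NULL)
            return 1;
        buf[0] = global_counters.hits;
        free(buf);
        return 0;
    }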
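The data-structure layout bullet is commonly applied as hot/cold splitting: fields touched in the inner loop are grouped so they share one 64-byte line, while rarely used fields are moved out of it. The record type and its fields in the sketch below are hypothetical.

Example (sketch). Hot/cold structure layout for the 64-byte line

    /* Sketch: laying out a structure so the hot fields share one 64-byte
     * cache line. All type and field names are hypothetical. */
    #include <stdint.h>

    /* Cold data kept behind a pointer so it does not dilute the hot line. */
    typedef struct {
        char     name[48];
        uint64_t created_at;
        uint64_t flags;
    } record_cold_t;

    /* Hot fields, used every iteration, packed together at the front so
     * they fit comfortably within a single 64-byte line. */
    typedef struct {
        uint32_t       key;
        uint32_t       count;
        uint64_t       last_seen;
        record_cold_t *cold;      /* rarely dereferenced */
    } record_t;

    /* The hot loop touches only the hot fields, so each cache line brought
     * in carries several useful records instead of mostly cold bytes. */
    uint64_t sum_counts(const record_t *r, unsigned n)
    {
        uint64_t total = 0;
        for (unsigned i = 0; i < n; i++)
            total += r[i].count;
        return total;
    }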
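The parallel-versus-dependent access bullet contrasts accesses whose addresses are independent, so the hardware can overlap their misses, with pointer chasing, where each load address depends on the previous load. A short comparison follows, using a hypothetical linked list and an equivalent array.

Example (sketch). Dependent versus parallel data access

    /* Sketch: dependent (pointer-chasing) versus independent (parallel)
     * data access. node_t and the routines are hypothetical. */
    #include <stddef.h>

    typedef struct node {
        int          value;
        struct node *next;
    } node_t;

    /* Dependent accesses: each load address comes from the previous load,
     * so cache-miss latencies are exposed back to back. */
    long sum_list(const node_t *head)
    {
        long total = 0;
        for (const node_t *p = head; p != NULL; p = p->next)
            total += p->value;
        return total;
    }

    /* Independent accesses: all addresses are computable up front, so the
     * hardware can keep several misses outstanding and hide their latency. */
    long sum_array(const int *v, size_t n)
    {
        long total = 0;
        for (size_t i = 0; i < n; i++)
            total += v[i];
        return total;
    }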
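For the prefetching bullet, the software prefetch instructions are exposed as the _mm_prefetch intrinsic. The sketch below assumes an x86 compiler with <xmmintrin.h>; the prefetch distance of 16 elements is a hypothetical value chosen purely for illustration and would need per-platform tuning.

Example (sketch). Software prefetch in a streaming loop

    /* Sketch: software prefetch ahead of a streaming read. Assumes an x86
     * compiler with <xmmintrin.h>; PREFETCH_DIST is a hypothetical tuning
     * parameter. */
    #include <stddef.h>
    #include <xmmintrin.h>

    #define PREFETCH_DIST 16   /* elements ahead; tune per platform */

    float sum_with_prefetch(const float *v, size_t n)
    {
        float total = 0.0f;
        for (size_t i = 0; i < n; i++) {
            if (i + PREFETCH_DIST < n)
                _mm_prefetch((const char *)&v[i + PREFETCH_DIST], _MM_HINT_T0);
            total += v[i];
        }
        return total;
    }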
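The blocking technique from the locality bullet restructures loops so each block of data is reused while it is still resident in cache. A sketch of a blocked matrix transpose follows; the block size of 8 is hypothetical, and real code would tune it to the cache and line size.

Example (sketch). Loop blocking for locality

    /* Sketch: loop blocking (tiling). BLOCK is a hypothetical block size. */
    #include <stddef.h>

    #define BLOCK 8

    /* Transpose an n-by-n matrix of doubles, visiting it in BLOCK-by-BLOCK
     * tiles so both the source rows and destination columns stay in cache
     * while a tile is being processed. */
    void transpose_blocked(double *dst, const double *src, size_t n)
    {
        for (size_t ii = 0; ii < n; ii += BLOCK) {
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t i_end = (ii + BLOCK < n) ? ii + BLOCK : n;
                size_t j_end = (jj + BLOCK < n) ? jj + BLOCK : n;
                for (size_t i = ii; i < i_end; i++)
                    for (size_t j = jj; j < j_end; j++)
                        dst[j * n + i] = src[i * n + j];
            }
        }
    }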