IA-32 Intel® Architecture Optimization
The following sections describe the practices, tools, coding rules, and
recommendations associated with these factors that help optimize
performance on IA-32 processors.
Tuning to Prevent Known Coding Pitfalls
To produce program code that takes advantage of the Intel NetBurst
microarchitecture and the Pentium M processor microarchitecture, you
must avoid the coding pitfalls that limit the performance of the target
processor family. This section lists several known pitfalls that can limit
performance of Pentium 4 and Intel Xeon processor implementations.
Some of these pitfalls (store-to-load-forwarding restrictions and
cache-line splits) also degrade Pentium M processor performance,
though to a lesser degree.
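As a rough illustration of what a cache-line split is, the sketch below tests whether a memory access touches more than one cache line. The helper name splits_cache_line and the 64-byte line size are assumptions made for this example; this section does not state a line size, so adjust the constant for the processor being tuned.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size for illustration only; not taken from this section. */
#define CACHE_LINE_BYTES 64u

/* Hypothetical helper: returns true when an access of `size` bytes that
   starts at `addr` touches more than one cache line. */
static bool splits_cache_line(uintptr_t addr, size_t size)
{
    if (size == 0) {
        return false;                      /* nothing is accessed */
    }
    uintptr_t first_line = addr / CACHE_LINE_BYTES;
    uintptr_t last_line  = (addr + size - 1) / CACHE_LINE_BYTES;
    return first_line != last_line;        /* different lines: a split */
}
```

Under these assumptions, a 4-byte access at offset 62 within a line splits, while the same access at offset 60 does not.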
Table 2-1 lists coding pitfalls that cause performance degradation in
some Pentium 4 and Intel Xeon processor implementations. For each
issue, Table 2-1 references the section of this document that
describes the causes of the penalty in detail and presents a
recommended solution. In the table, “aligned” means that the address
of the load is aligned with respect to the address of the store.
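To make the load-relative-to-store notion of alignment concrete, the following sketch classifies a store followed by a load of overlapping memory. It is a simplified model, not the processor's actual rule set: it assumes "aligned" means the load starts at the same address as the store, and the type fwd_class and function classify_forwarding are hypothetical names introduced here. The sections referenced by Table 2-1 remain the authority on the real restrictions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical categories for a store followed by a load. */
typedef enum {
    FWD_NO_OVERLAP,       /* load reads none of the stored bytes          */
    FWD_LIKELY_OK,        /* same start address, load fits in the store   */
    FWD_SMALL_UNALIGNED,  /* load fits in the store, different start      */
    FWD_NOT_CONTAINED     /* load needs bytes the store did not write,
                             e.g. a large load after a small store        */
} fwd_class;

/* Simplified model (an assumption of this sketch): forwarding is treated
   as safe only when the load starts where the store starts and is no
   larger than the store. */
static fwd_class classify_forwarding(uintptr_t store_addr, size_t store_size,
                                     uintptr_t load_addr, size_t load_size)
{
    uintptr_t store_end = store_addr + store_size;  /* one past last byte */
    uintptr_t load_end  = load_addr + load_size;

    if (load_end <= store_addr || load_addr >= store_end) {
        return FWD_NO_OVERLAP;
    }
    if (load_addr < store_addr || load_end > store_end) {
        return FWD_NOT_CONTAINED;
    }
    if (load_addr != store_addr) {
        return FWD_SMALL_UNALIGNED;
    }
    return FWD_LIKELY_OK;
}
```

In this model, a 1-byte load at store address + 1 after a 4-byte store is reported as FWD_SMALL_UNALIGNED, and a 4-byte load after a 1-byte store at the same address is reported as FWD_NOT_CONTAINED.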
Table 2-1  Coding Pitfalls Affecting Performance

Factors Affecting            Symptom           Example          Section Reference
Performance                                    (if applicable)
---------------------------  ----------------  ---------------  -----------------------------
Small, unaligned load        Store-forwarding                   Store Forwarding,
after large store            blocked                            Store-to-Load-Forwarding
                                                                Restriction on Size and
                                                                Alignment

Large load after small       Store-forwarding                   Store Forwarding,
store;                       blocked                            Store-to-Load-Forwarding
Load dword after store                                          Restriction on Size and
dword, store byte;                                              Alignment
Load dword, AND with 0xff
after store byte

(continued)
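The last pattern in the second row (store byte, then load dword and AND with 0xff) can be sketched in C as below, next to one plausible rewrite that loads only the byte that was stored. Both function names are hypothetical. The C source only suggests the shape of the memory accesses; the compiler decides which instructions are actually emitted, and the rewrite is offered as an illustration rather than as the solution given in the referenced sections. The first function also assumes little-endian byte order, as on IA-32.

```c
#include <stdint.h>
#include <string.h>

/* Pitfall shape: a byte store followed by a dword load of the same slot.
   The load is larger than the store that precedes it.
   The caller must have initialized *slot beforehand. */
static uint32_t low_byte_via_dword_load(uint32_t *slot, uint8_t value)
{
    memcpy(slot, &value, 1);            /* store byte at the slot's first byte */
    uint32_t wide;
    memcpy(&wide, slot, sizeof wide);   /* load dword from the same address */
    return wide & 0xffu;                /* AND with 0xff (little-endian) */
}

/* Possible rewrite: load exactly the byte that was stored, so the load
   matches the store in both size and start address. */
static uint32_t low_byte_via_byte_load(uint32_t *slot, uint8_t value)
{
    memcpy(slot, &value, 1);            /* store byte */
    uint8_t narrow;
    memcpy(&narrow, slot, 1);           /* load byte from the same address */
    return narrow;                      /* zero-extended on conversion */
}
```

On a little-endian target both functions return the same value, so the rewrite changes only the access pattern and not the result.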
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006