IA-32 Intel® Architecture Optimization
Assembly/Compiler Coding Rule 33. (H impact, L generality) Minimize the
number of changes to the precision mode.
Improving Parallelism and the Use of FXCH
The x87 instruction set relies on the floating-point stack for one of its
operands. If the dependence graph is a tree (that is, each intermediate
result is used only once) and the code is scheduled carefully, it is often
possible to use only operands that are on the top of the stack or in
memory, and to avoid operands that are buried under the top of the stack.
When an operand must be pulled from the middle of the stack, an fxch
instruction can be used to swap it with the operand on the top of the
stack.
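The following Python sketch makes this stack discipline concrete. It is a toy model of the x87 register stack, not an emulator. The class name X87Stack and its method names are hypothetical. Slots hold Python floats rather than 80-bit values, and tags and exceptions are ignored.

The first sequence evaluates the tree-shaped expression (a + b) * (c - d) using only ST(0) and memory operands. The second shows fxch bringing a buried operand to the top.

```python
class X87Stack:
    """Toy model of the x87 register stack (illustration only).

    Assumptions: slots hold Python floats instead of 80-bit values, and
    tag words, exceptions, and stack faults are ignored. ST(0) is the
    last element of the underlying list.
    """

    DEPTH = 8  # the x87 FPU has eight physical data registers

    def __init__(self):
        self._regs = []

    def st(self, i):
        """Return ST(i), the i-th entry below the top of the stack."""
        return self._regs[-1 - i]

    def fld(self, m):
        """Push a memory operand; it becomes the new ST(0)."""
        if len(self._regs) == self.DEPTH:
            raise OverflowError("x87 stack overflow")
        self._regs.append(float(m))

    def fadd(self, m):
        """ST(0) = ST(0) + m, where m is a memory operand."""
        self._regs[-1] += m

    def fsub(self, m):
        """ST(0) = ST(0) - m, where m is a memory operand."""
        self._regs[-1] -= m

    def fmul(self, m):
        """ST(0) = ST(0) * m, where m is a memory operand."""
        self._regs[-1] *= m

    def fmulp(self):
        """ST(1) = ST(1) * ST(0), then pop; the product becomes ST(0)."""
        top = self._regs.pop()
        self._regs[-1] *= top

    def fxch(self, i=1):
        """Swap ST(0) with ST(i) (plain fxch means fxch st(1))."""
        r = self._regs
        r[-1], r[-1 - i] = r[-1 - i], r[-1]


# Tree-shaped (a + b) * (c - d): every operand is ST(0) or memory,
# so no fxch is needed.
s = X87Stack()
s.fld(1.5)
s.fadd(2.5)    # ST(0) = a + b = 4.0
s.fld(7.0)
s.fsub(4.0)    # ST(0) = c - d = 3.0, ST(1) = 4.0
s.fmulp()      # ST(0) = 12.0

# Buried operand: x sits in ST(1), so fxch brings it to the top
# before it is combined with a memory operand.
t = X87Stack()
t.fld(2.0)     # x
t.fld(5.0)     # y: ST(0) = 5.0, ST(1) = 2.0
t.fxch(1)      # ST(0) = 2.0, ST(1) = 5.0
t.fmul(3.0)    # ST(0) = x * 3 = 6.0, ST(1) = 5.0
```

In real code the compiler or programmer chooses the evaluation order. The model only shows which operands each instruction can reach.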
The fxch instruction can also be used to enhance parallelism.
Overlapping dependent chains exposes more independent instructions to
the hardware scheduler. Doing so may require fxch to effectively
enlarge the register name space, so that more operands can be live
simultaneously.
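Overlapping dependent chains can be illustrated at the source level with a dot product that keeps two partial sums instead of one. This Python sketch shows only the transformation. The function names are hypothetical, and Python itself gains nothing from the change.

In compiled x87 code, both partial sums must stay live on the register stack at once, and that is where fxch may be needed. Splitting the sum also reassociates floating-point additions, which can change rounding for general inputs.

```python
def dot_one_chain(xs, ys):
    """Single dependence chain: each add waits for the previous one."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc


def dot_two_chains(xs, ys):
    """Two independent chains (even and odd indices), combined at the end.

    Assumes len(xs) == len(ys). On x87 hardware both partial sums would
    have to be live on the register stack simultaneously.
    """
    acc0 = 0.0
    acc1 = 0.0
    n = len(xs)
    for i in range(0, n - 1, 2):
        acc0 += xs[i] * ys[i]            # chain 0
        acc1 += xs[i + 1] * ys[i + 1]    # chain 1, independent of chain 0
    if n % 2:
        acc0 += xs[-1] * ys[-1]          # leftover element for odd n
    return acc0 + acc1


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0] * 5
print(dot_one_chain(xs, ys), dot_two_chains(xs, ys))  # 30.0 30.0
```

With integer-valued inputs like these, both versions return exactly 30.0. With arbitrary inputs, the two results may differ in the last bits.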
Note, however, that fxch reduces the available issue bandwidth in the
trace cache. This is partly because it consumes an issue slot, and partly
because of the issue-slot restrictions imposed on fxch. If the application
is not bound by issue or retirement bandwidth, fxch has no performance
impact.
The Pentium 4 processor's effective instruction window is large enough
to overlap instructions from as far away as the next loop iteration. This
often makes fxch unnecessary for enhancing parallelism.
Use the fxch instruction only when it is needed to express an algorithm
or to enhance parallelism. If the size of the register name space is a
problem, using XMM registers is recommended (see the section).
Assembly/Compiler Coding Rule 34. (M impact, M generality) Use fxch
only where necessary to increase the effective name space.
IA-32 Intel® Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006