General Optimization Guidelines
Note that transcendental functions are supported only in x87 floating
point, not in Streaming SIMD Extensions or Streaming SIMD
Extensions 2.
Instruction Selection
This section explains how to generate optimal assembly code. The listed
optimizations have been shown to improve overall application-level
performance by roughly 5%. The gain for individual applications may
vary.
The recommendations are prioritized as follows:
• Choose instructions with shorter latencies and fewer µops.
• Use optimized sequences for clearing and comparing registers.
• Enhance register availability.
• Avoid prefixes, especially more than one prefix.
Assembly/Compiler Coding Rule 37. (M impact, H generality) Choose
instructions with shorter latencies and fewer micro-ops. Favor
single-micro-operation instructions.
A compiler may already be doing a good job of instruction selection, in
which case user intervention is usually unnecessary.
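As an illustration of the "optimized sequences for clearing and comparing registers" item, the following is a minimal sketch in C with GCC/Clang inline assembly (AT&T syntax). It uses two commonly cited idioms: XOR of a register with itself in place of MOV reg, 0 (2 bytes instead of 5), and TEST of a register against itself in place of CMP reg, 0 (2 bytes instead of 3). The function names are hypothetical, and the portable fallback branch is an assumption added so the sketch still builds on other compilers and targets.

```c
#include <assert.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#define HAVE_X86_INLINE_ASM 1
#else
#define HAVE_X86_INLINE_ASM 0
#endif

/* Clear a register with XOR reg, reg instead of MOV reg, 0. */
uint32_t clear_with_xor(void)
{
#if HAVE_X86_INLINE_ASM
    uint32_t r;
    __asm__ ("xorl %0, %0" : "=r"(r) : : "cc");
    return r;
#else
    return 0; /* portable fallback (assumption, not from the manual) */
#endif
}

/* Compare a register with zero using TEST reg, reg instead of CMP reg, 0.
   Returns 1 if v is zero, 0 otherwise. */
int is_zero_with_test(uint32_t v)
{
#if HAVE_X86_INLINE_ASM
    uint8_t z;
    __asm__ ("testl %1, %1\n\t"
             "setz %0"
             : "=q"(z) : "r"(v) : "cc");
    return z;
#else
    return v == 0; /* portable fallback (assumption, not from the manual) */
#endif
}

/* Small self-check: both idioms must agree with plain C semantics. */
void check_idioms(void)
{
    assert(clear_with_xor() == 0);
    assert(is_zero_with_test(0) == 1);
    assert(is_zero_with_test(42) == 0);
}
```

In practice an optimizing compiler usually emits these sequences on its own for `x = 0` and `x == 0`, so hand-written assembly of this kind is mainly relevant when writing assembly directly.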
Assembly/Compiler Coding Rule 38. (M impact, L generality) Avoid
prefixes, especially multiple non-0F-prefixed opcodes.
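To make the prefix rule concrete, the sketch below counts the legacy prefix bytes at the start of a single encoded instruction. It assumes 32-bit protected-mode code and that the buffer begins on an instruction boundary; the function and array names are hypothetical. The 0F byte is the two-byte opcode escape and is deliberately not counted as a prefix. The sample encodings show that a 16-bit operand in 32-bit code costs one operand-size prefix (66), and that adding a segment override and 16-bit addressing brings the count to three.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if b is an IA-32 legacy prefix byte, 0 otherwise.
   0x0F (two-byte opcode escape) is not treated as a prefix. */
int is_legacy_prefix(unsigned char b)
{
    switch (b) {
    case 0xF0:                         /* LOCK */
    case 0xF2:                         /* REPNE/REPNZ */
    case 0xF3:                         /* REP/REPE */
    case 0x2E: case 0x36: case 0x3E:   /* CS, SS, DS overrides */
    case 0x26: case 0x64: case 0x65:   /* ES, FS, GS overrides */
    case 0x66:                         /* operand-size override */
    case 0x67:                         /* address-size override */
        return 1;
    default:
        return 0;
    }
}

/* Counts the prefix bytes that precede the opcode of one instruction. */
size_t count_prefixes(const unsigned char *code, size_t len)
{
    size_t n = 0;
    assert(code != NULL);
    while (n < len && is_legacy_prefix(code[n]))
        n++;
    return n;
}

/* Sample encodings for 32-bit mode (illustrative names). */
const unsigned char MOV_EAX_1[]     = { 0xB8, 0x01, 0x00, 0x00, 0x00 }; /* mov eax, 1 */
const unsigned char MOV_AX_1[]      = { 0x66, 0xB8, 0x01, 0x00 };       /* mov ax, 1 */
const unsigned char MOV_AX_FS_BX[]  = { 0x64, 0x66, 0x67, 0x8B, 0x07 }; /* mov ax, fs:[bx] */
```

Calling `count_prefixes(MOV_AX_FS_BX, sizeof MOV_AX_FS_BX)` on the last sample reports three prefix bytes, which is the kind of encoding the rule advises against.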
Assembly/Compiler Coding Rule 39. (M impact, L generality) Do not use
many segment registers.
On the Pentium M processor, there is only one level of renaming of
segment registers.
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006