IA-32 Instruction Latency and Throughput
Table Footnotes
The following footnotes refer to all tables in this appendix.
1. Latency information for many complex instructions (> 4 μops)
   is estimated from conservative, worst-case assumptions. Actual
   performance of these instructions in the out-of-order core
   execution units can range from somewhat faster to significantly
   faster than the nominal latency data shown in these tables.
2. The names of execution units apply only to processor
   implementations of the Intel NetBurst microarchitecture with a
   CPUID signature of family 15, model encoding = 0, 1, 2. They
   include the ALU, FP_EXECUTE, FPMOVE, MEM_LOAD, and MEM_STORE
   execution units and ports in the out-of-order core. Note the
   following:
   • The FP_EXECUTE unit is actually a cluster of execution units,
     roughly consisting of seven separate execution units.
   • The FP_ADD unit handles x87 and SIMD floating-point add and
     subtract operations.
   • The FP_MUL unit handles x87 and SIMD floating-point multiply
     operations.
   • The FP_DIV unit handles x87 and SIMD floating-point divide and
     square-root operations.
   • The MMX_SHFT unit handles shift and rotate operations.
   • The MMX_ALU unit handles SIMD integer ALU operations.
   • The MMX_MISC unit handles reciprocal MMX computations and
     some integer operations.
   • FP_MISC designates other execution units in port 1 that are
     separate from the six units listed above.
3. It may be possible to construct repetitive calls to some IA-32
   instructions in code sequences to achieve latency that is one or
   two clock cycles faster than the more realistic number listed in
   this table.
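
Footnote 2 ties the execution unit names to a specific CPUID signature
(family 15, model 0, 1, or 2). The sketch below shows one way such a
check might be written. It assumes the raw value returned in EAX by
CPUID leaf 1 is already available, and it uses the standard display
family/model decoding (extended fields are folded in when the base
family is 15, or 6 for the model). The names cpu_signature,
decode_signature, and unit_names_apply are hypothetical and do not
come from the manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded fields of the CPUID leaf 1 EAX signature (illustrative type). */
typedef struct {
    unsigned family;   /* display family */
    unsigned model;    /* display model */
    unsigned stepping; /* stepping ID */
} cpu_signature;

/* Decode a raw CPUID.1:EAX value into family, model, and stepping. */
cpu_signature decode_signature(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xF;   /* bits 11:8  */
    unsigned base_model  = (eax >> 4) & 0xF;   /* bits 7:4   */
    unsigned ext_family  = (eax >> 20) & 0xFF; /* bits 27:20 */
    unsigned ext_model   = (eax >> 16) & 0xF;  /* bits 19:16 */

    cpu_signature sig;
    sig.stepping = eax & 0xF;                  /* bits 3:0   */

    /* The extended family is added only when the base family is 15. */
    sig.family = (base_family == 0xF) ? base_family + ext_family
                                      : base_family;

    /* The extended model is used only for base family 6 or 15. */
    sig.model = (base_family == 0x6 || base_family == 0xF)
                    ? (ext_model << 4) + base_model
                    : base_model;
    return sig;
}

/* True when the execution unit names of footnote 2 apply:
   family 15 with model encoding 0, 1, or 2. */
bool unit_names_apply(uint32_t eax)
{
    cpu_signature sig = decode_signature(eax);
    return sig.family == 15 && sig.model <= 2;
}
```

For example, a signature of 0x00000F24 decodes to family 15, model 2,
so the unit names apply, whereas 0x00000F41 (family 15, model 4) falls
outside the range. With GCC or Clang, the raw EAX value could be read
through the __get_cpuid helper in <cpuid.h>; that step is omitted here
so the decoding stays compiler-independent.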
Source: IA-32 Intel Architecture Optimization Reference Manual,
Order Number 248966-013US, April 2006.