IA-32 Intel® Architecture Optimization
Definitions
The IA-32 instruction performance data are listed in several tables. The
tables contain the following information:
Instruction Name: The assembly mnemonic of each instruction.
Latency: The number of clock cycles that are required for the
execution core to complete the execution of all of the μops that
form an IA-32 instruction.
Throughput: The number of clock cycles required to wait before the
issue ports are free to accept the same instruction again. For many
IA-32 instructions, the throughput of an instruction can be
significantly less than its latency.
Execution units: The names of the execution units in the execution core
that are utilized to execute the μops for each instruction. This
information is provided only for IA-32 instructions that are decoded
into no more than 4 μops. μops for instructions that decode into more
than 4 μops are supplied by microcode ROM. Note that several execution
units may share the same port, such as FP_ADD, FP_MUL, or MMX_SHFT in
the FP_EXECUTE cluster (see Figure 1-4; Figure 1-4 applies to Pentium 4
and Intel Xeon processors with CPUID signature of family 15, model
encoding = 0, 1, 2).
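
The family and model values mentioned above can be recovered from the
32-bit signature that CPUID leaf 1 returns in EAX. The following is a
minimal sketch of that decoding, not code from this manual. The function
names decode_cpuid_signature and figure_1_4_applies are hypothetical,
and the sample signature values are illustrative inputs only. Obtaining
the raw EAX value (for example through a compiler intrinsic) is assumed
to happen elsewhere.

```python
def decode_cpuid_signature(eax):
    """Split a CPUID leaf 1 EAX value into (family, model, stepping)."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is added only when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family

    # The extended model is used only for base families 6 and 15.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


def figure_1_4_applies(eax):
    """Hypothetical helper: True for family 15 with model encoding 0, 1, or 2."""
    family, model, _ = decode_cpuid_signature(eax)
    return family == 15 and model in (0, 1, 2)


if __name__ == "__main__":
    # Illustrative signature values, not taken from this manual.
    for signature in (0x00000F29, 0x00000F34, 0x000006D8):
        print(hex(signature), decode_cpuid_signature(signature),
              figure_1_4_applies(signature))
```

A check like this is one way to decide whether the port and execution
unit layout referenced here is the right model for the processor on
which the code is running.
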
Latency and Throughput
This section presents the latency and throughput information for the
IA-32 instruction set including the Streaming SIMD Extensions 2,
Streaming SIMD Extensions, MMX technology, and most of the
frequently used general-purpose integer and x87 floating-point
instructions.
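
To make the distinction between latency and throughput concrete, the
sketch below gives a first-order cycle estimate for two extreme cases: a
chain in which every instruction depends on the previous result, and a
stream of fully independent copies of the same instruction. This is an
idealized model under stated assumptions (a single instruction type, no
other port contention, no cache misses or other stalls), not a method
given in this manual. The function names and the latency and throughput
values used in the example are hypothetical and are not taken from the
tables.

```python
def dependent_chain_cycles(count, latency):
    """Estimate cycles when each instruction needs the previous result.

    Every instruction must wait for the full latency of its predecessor,
    so the cost grows as count * latency.
    """
    return count * latency


def independent_stream_cycles(count, latency, throughput):
    """Estimate cycles for independent copies of the same instruction.

    A new instruction can issue every `throughput` cycles; the last one
    still needs its full latency to complete.
    """
    if count == 0:
        return 0
    return (count - 1) * throughput + latency


if __name__ == "__main__":
    # Hypothetical values chosen for illustration only.
    latency, throughput, count = 4, 1, 10
    print("dependent chain:   ", dependent_chain_cycles(count, latency))
    print("independent stream:", independent_stream_cycles(count, latency, throughput))
```

With these assumed values the dependent chain is bounded by latency and
the independent stream by throughput, which is why the two figures are
listed separately for each instruction. Real code falls somewhere
between the two cases.
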
Due to the complexity of dynamic execution and the out-of-order nature of
the execution core, the instruction latency data may not be sufficient to
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006