IA-32 Intel® Architecture Optimization
movzx to avoid a partial register stall when packing three byte values into a register.
Assembly/Compiler Coding Rule 44. (ML impact, L generality) Use simple
instructions that are less than eight bytes in length.
Assembly/Compiler Coding Rule 45. (M impact, MH generality) Avoid
using prefixes to change the size of an immediate or a displacement.
Long instructions (more than seven bytes) limit the number of decoded
instructions per cycle on the Pentium M processor. Each prefix adds one
byte to the length of an instruction, possibly limiting the decoder's
throughput. In addition, instructions with multiple prefixes can only be
decoded by the first decoder, and these prefixes also incur a delay when
decoded. If multiple prefixes, or a prefix that changes the size of an
immediate or displacement, cannot be avoided, schedule them behind
instructions that stall the pipe for some other reason.
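
As a rough illustration of the length and prefix limits described above, the following C sketch counts the leading legacy prefix bytes of a single encoded instruction and flags encodings that are longer than seven bytes or carry more than one prefix. It is not taken from the manual: the function names (is_legacy_prefix, count_prefixes, exceeds_decode_limits) and the sample byte arrays are hypothetical, and the encodings shown assume 32-bit protected mode. The sketch does not decode opcodes, so it cannot tell whether a single 66h or 67h prefix actually changes the size of an immediate or displacement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sample encodings (assumed 32-bit mode). */
static const uint8_t mov_eax_imm32[] = {0xB8, 0x34, 0x12, 0x00, 0x00}; /* mov eax, 1234h */
static const uint8_t mov_ax_imm16[]  = {0x66, 0xB8, 0x34, 0x12};       /* mov ax, 1234h  */
/* mov dword ptr [eax+12345678h], 9ABCDEF0h: 10 bytes, no prefix */
static const uint8_t long_mov[] = {0xC7, 0x80, 0x78, 0x56, 0x34, 0x12,
                                   0xF0, 0xDE, 0xBC, 0x9A};

/* Returns 1 if b is an IA-32 legacy instruction prefix byte. */
static int is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0x66:                                  /* operand-size override */
    case 0x67:                                  /* address-size override */
    case 0xF0:                                  /* LOCK */
    case 0xF2: case 0xF3:                       /* REPNE, REP/REPE */
    case 0x2E: case 0x36: case 0x3E: case 0x26: /* CS, SS, DS, ES */
    case 0x64: case 0x65:                       /* FS, GS */
        return 1;
    default:
        return 0;
    }
}

/* Counts the prefix bytes at the start of one instruction of len bytes. */
static size_t count_prefixes(const uint8_t *code, size_t len)
{
    size_t n = 0;
    assert(code != NULL);
    while (n < len && is_legacy_prefix(code[n]))
        n++;
    return n;
}

/* Hypothetical check: longer than seven bytes, or more than one prefix. */
static int exceeds_decode_limits(const uint8_t *code, size_t len)
{
    return len > 7 || count_prefixes(code, len) > 1;
}
```

With these samples, mov_ax_imm16 has one prefix and mov_eax_imm32 has none; only long_mov is flagged by exceeds_decode_limits, because of its ten-byte length.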
Assembly/Compiler Coding Rule 46. (M impact, MH generality) Break
dependences on portions of registers between instructions by operating on
32-bit registers instead of partial registers. For moves, this can be
accomplished with 32-bit moves or by using movzx.
On Pentium M processors, the movsx and movzx instructions both take a
single μop, whether they move from a register or memory. On Pentium
4 processors, the movsx takes an additional μop. This is likely to cause
Table 2-3 Avoiding Partial Register Stall When Packing Byte Values

A Sequence with Partial Register Stall:
    mov  al, byte ptr a[2]
    shl  eax, 16
    mov  ax, word ptr a
    movd mm0, eax

Alternate Sequence without Partial Register Stall:
    movzx eax, byte ptr a[2]
    shl   eax, 16
    movzx ecx, word ptr a
    or    eax, ecx
    movd  mm0, eax
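
The two sequences in Table 2-3 can be compared with a small C model. This is only a sketch under stated assumptions: C cannot express the stall itself, so the code models each register write with masks and checks what value ends up in eax. The function names pack_partial and pack_full are hypothetical, a is assumed to be a three-byte little-endian array, and the prior_eax parameter stands for whatever eax held before the first sequence runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models the sequence with partial register writes (al, then ax). */
static uint32_t pack_partial(const uint8_t *a, uint32_t prior_eax)
{
    uint32_t eax = prior_eax;
    assert(a != NULL);
    eax = (eax & 0xFFFFFF00u) | a[2];          /* mov al, byte ptr a[2] */
    eax <<= 16;                                /* shl eax, 16           */
    eax = (eax & 0xFFFF0000u)                  /* mov ax, word ptr a    */
        | a[0] | ((uint32_t)a[1] << 8);
    return eax;                                /* value moved to mm0    */
}

/* Models the alternate sequence, which writes only full 32-bit registers. */
static uint32_t pack_full(const uint8_t *a)
{
    uint32_t eax, ecx;
    assert(a != NULL);
    eax = a[2];                                /* movzx eax, byte ptr a[2] */
    eax <<= 16;                                /* shl eax, 16              */
    ecx = a[0] | ((uint32_t)a[1] << 8);        /* movzx ecx, word ptr a    */
    eax |= ecx;                                /* or eax, ecx              */
    return eax;                                /* value moved to mm0       */
}
```

In this model the low 24 bits of both results always agree. The top byte of the pack_partial result is inherited from bits 8 to 15 of the prior eax value, while pack_full always leaves it zero, which reflects the dependence on the old register contents that the alternate sequence removes.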
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006