Optimizing for SIMD Integer Applications
aligned versions; this can reduce the performance gains when using
the 128-bit SIMD integer extensions. The general guidelines on the
alignment of memory operands are:
— The greatest performance gains can be achieved when all
memory streams are 16-byte aligned.
— Reasonable performance gains are possible if roughly half of all
memory streams are 16-byte aligned, and the other half are not.
— Little or no performance gain may result if none of the memory
streams is 16-byte aligned; in this case, use of the 64-bit SIMD
integer instructions may be preferable.
• Loop counters need to be updated because each 128-bit integer
instruction operates on twice as much data as its 64-bit integer
counterpart.
• Extension of the pshufw instruction (shuffle word across 64-bit
integer operand) across a full 128-bit operand is emulated by a
combination of the following instructions: pshufhw, pshuflw, pshufd.
• The 64-bit shift by bit instructions (psrlq, psllq) are
extended to 128 bits in these ways:
— use of psrlq and psllq, along with masking logic operations
— code sequence is rewritten to use the psrldq and pslldq
instructions (shift double quad-word operand by bytes).
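The alignment and loop-counter guidelines above can be illustrated with a minimal C sketch using SSE2 intrinsics. The function name add_sat_u8 and the helper is_aligned16 are hypothetical and do not come from the manual. The sketch assumes a 64-bit (MMX) loop that handled 8 bytes per iteration is being widened to 16 bytes per iteration, so the loop step and the tail handling both change. Aligned loads and stores are used only when every memory stream is 16-byte aligned.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Saturating add of unsigned bytes: dst[i] = min(255, a[i] + b[i]).
 * Returns the number of 128-bit iterations that were executed. */
size_t add_sat_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0, vec_iters = 0;

    if (is_aligned16(dst) && is_aligned16(a) && is_aligned16(b)) {
        /* All streams aligned: aligned loads and stores (movdqa).
         * The step is 16 bytes; the 64-bit version would step by 8. */
        for (; i + 16 <= n; i += 16, vec_iters++) {
            __m128i va = _mm_load_si128((const __m128i *)(a + i));
            __m128i vb = _mm_load_si128((const __m128i *)(b + i));
            _mm_store_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
        }
    } else {
        /* At least one stream is unaligned: unaligned accesses (movdqu). */
        for (; i + 16 <= n; i += 16, vec_iters++) {
            __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
            __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
            _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
        }
    }

    /* Scalar tail: up to 15 leftover bytes (up to 7 in the 64-bit loop). */
    for (; i < n; i++) {
        unsigned s = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(s > 255u ? 255u : s);
    }
    return vec_iters;
}
```

Checking alignment once per call rather than once per load keeps the branch out of the inner loop. Whether the unaligned path still beats a 64-bit loop depends on the target processor, as the guidelines note.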
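The pshufw emulation mentioned above can be sketched for one concrete case: reversing the order of all eight words in a 128-bit operand. The function name reverse_words8 is hypothetical, and the choice of a full reversal is only an illustration. Other word permutations would need different immediates and possibly a different instruction order.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides _MM_SHUFFLE */
#include <stdint.h>

/* Reverses the order of eight 16-bit words: out[i] = in[7 - i]. */
void reverse_words8(uint16_t out[8], const uint16_t in[8])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);

    /* pshuflw: reverse the four words in the low 64 bits. */
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(0, 1, 2, 3));
    /* pshufhw: reverse the four words in the high 64 bits. */
    v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(0, 1, 2, 3));
    /* pshufd: swap the low and high 64-bit halves. */
    v = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));

    _mm_storeu_si128((__m128i *)out, v);
}
```

pshuflw and pshufhw each permute words within one 64-bit half only, so pshufd is what moves data across the halves.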
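As a rough illustration of extending a bit shift to 128 bits, the sketch below combines the two approaches listed above: per-lane psllq and psrlq shifts, a pslldq byte shift to carry bits across the 64-bit boundary, and a logical OR to merge the results. The function name shl128_u64 is hypothetical. The sketch assumes a left shift by a bit count between 0 and 63; it is one possible sequence, not necessarily the one the manual has in mind.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Shifts a 128-bit value left by n bits (0 <= n <= 63).
 * in[0]/out[0] hold the low 64 bits, in[1]/out[1] the high 64 bits. */
void shl128_u64(uint64_t out[2], const uint64_t in[2], int n)
{
    assert(n >= 0 && n <= 63);

    __m128i v      = _mm_loadu_si128((const __m128i *)in);
    __m128i count  = _mm_cvtsi32_si128(n);
    __m128i rcount = _mm_cvtsi32_si128(64 - n);

    /* psllq: shift each 64-bit lane left by n bits. */
    __m128i shifted = _mm_sll_epi64(v, count);
    /* psrlq: isolate the n bits that fall off the top of each lane. */
    __m128i carry = _mm_srl_epi64(v, rcount);
    /* pslldq: move the low lane's carry into the high lane; the high
     * lane's carry is shifted out of the register and discarded. */
    carry = _mm_slli_si128(carry, 8);

    /* por: merge the carried bits into the high lane. */
    _mm_storeu_si128((__m128i *)out, _mm_or_si128(shifted, carry));
}
```

When the shift count is a multiple of 8 bits, a single pslldq or psrldq would be enough, since those instructions shift the whole 128-bit operand by bytes.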
SIMD Optimizations and Microarchitectures
Pentium M, Intel Core Solo and Intel Core Duo processors have a
different microarchitecture than Intel NetBurst® microarchitecture. The
following sections discuss optimizing SIMD code that targets Intel Core
Solo and Intel Core Duo processors.
On Intel Core Solo and Intel Core Duo processors, lddqu behaves
identically to movdqu by loading 16 bytes of data irrespective of
address alignment.
Source: IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966 013US, April 2006.