IA-32 Intel® Architecture Optimization
3-38
Note that this can be applied to both SIMD integer and SIMD
floating-point code.
If there are multiple consumers of an instance of a register, group the
consumers together as closely as possible. However, the consumers
should not be scheduled near the producer.
SIMD Optimizations and Microarchitectures
Pentium M, Intel Core Solo and Intel Core Duo processors have a
different microarchitecture than Intel NetBurst
®
microarchitecture. The
following sub-section discusses optimizing SIMD code targeting Intel
Core Solo and Intel Core Duo processors.
The register-register variant of the following instructions has improved
performance on Intel Core Solo and Intel Core Duo processor relative to
Pentium M processors. This is because the instructions consist of two
micro-ops instead of three. Relevant instructions are: unpcklps,
unpckhps, packsswb, packuswb, packssdw, pshufd, shuffps and shuffpd.
top_of_loop:
movq
mm0, [A + eax]
pcmpgtw mm0, [B + eax]; Create compare mask
movq
mm1, [D + eax]
pand
mm1, mm0; Drop elements where A<B
pandn
mm0, [E + eax] ; Drop elements where A>B
por
mm0, mm1; Crete single word
movq
[C + eax], mm0
add
eax, 8
cmp eax,
MAX_ELEMENT*2
jle
top_of_loop
Example 3-21 Emulation of Conditional Moves
(continued)
Summary of Contents for ARCHITECTURE IA-32
Page 1: ...IA 32 Intel Architecture Optimization Reference Manual Order Number 248966 013US April 2006...
Page 220: ...IA 32 Intel Architecture Optimization 3 40...
Page 434: ...IA 32 Intel Architecture Optimization 9 20...
Page 514: ...IA 32 Intel Architecture Optimization B 60...
Page 536: ...IA 32 Intel Architecture Optimization C 22...