IA-32 Intel® Architecture Optimization
avoided since there is a penalty associated with writing this register;
typically, through the use of the cvttps2pi and cvttss2si instructions,
the rounding control in MXCSR can always be set to round-nearest.
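As a minimal sketch of this advice, the C fragment below truncates
floating-point values with the cvttss2si instruction (through the
_mm_cvttss_si32 intrinsic) so that the rounding control in MXCSR never
has to be rewritten. The function names and the EXAMPLE_HAVE_SSE macro
are hypothetical, introduced only for illustration, and the plain C
cast is an assumed fallback for compilers without SSE support.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>
#define EXAMPLE_HAVE_SSE 1   /* hypothetical macro, not part of any API */
#endif

/* Truncate toward zero without touching the MXCSR rounding control. */
static int truncate_to_int(float x)
{
#ifdef EXAMPLE_HAVE_SSE
    /* _mm_cvttss_si32 maps to cvttss2si, which always truncates,
       whatever rounding mode MXCSR currently selects. */
    return _mm_cvttss_si32(_mm_set_ss(x));
#else
    /* Assumed fallback: a C float-to-int cast also truncates. */
    return (int)x;
#endif
}

/* Truncate n floats from src into dst, one element at a time. */
static void truncate_array(const float *src, int *dst, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i)
        dst[i] = truncate_to_int(src[i]);
}
```

Because the truncation is encoded in the instruction itself, MXCSR can
stay at its round-nearest default for the surrounding arithmetic.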
Flush-to-Zero and Denormals-are-Zero Modes
The flush-to-zero (FTZ) and denormals-are-zero (DAZ) modes are not
compatible with IEEE Standard 754. They are provided to improve
performance for applications where underflow is common and where
the generation of a denormalized result is not necessary. See
“Floating-point Modes and Exceptions” in Chapter 2.
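The following C sketch shows one way these modes might be switched on
and off by setting the FTZ bit (bit 15) and the DAZ bit (bit 6) of
MXCSR through the _mm_getcsr and _mm_setcsr intrinsics. The helper
names and the EXAMPLE_ macros are hypothetical. The sketch assumes the
processor supports DAZ; code that must run on older SSE processors
would first need to check the MXCSR mask before setting that bit.

```c
#include <float.h>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>
#define EXAMPLE_HAVE_SSE 1          /* hypothetical macro */
#endif

#define EXAMPLE_MXCSR_DAZ 0x0040u   /* MXCSR bit 6: denormals-are-zero */
#define EXAMPLE_MXCSR_FTZ 0x8000u   /* MXCSR bit 15: flush-to-zero */

/* Enable (nonzero) or disable (zero) FTZ and DAZ together.
   Returns 1 if MXCSR was updated, 0 if SSE is unavailable.
   Assumption: DAZ is supported by the processor. */
static int set_ftz_daz(int enable)
{
#ifdef EXAMPLE_HAVE_SSE
    unsigned int csr = _mm_getcsr();
    if (enable)
        csr |= (EXAMPLE_MXCSR_FTZ | EXAMPLE_MXCSR_DAZ);
    else
        csr &= ~(EXAMPLE_MXCSR_FTZ | EXAMPLE_MXCSR_DAZ);
    _mm_setcsr(csr);
    return 1;
#else
    (void)enable;
    return 0;
#endif
}

/* Halve the smallest normalized float. Under IEEE 754 rules the
   result is a denormal; with FTZ enabled it is flushed to zero. */
static float halve_smallest_normal(void)
{
    volatile float tiny = FLT_MIN;  /* volatile blocks constant folding */
    volatile float half = 0.5f;
#ifdef EXAMPLE_HAVE_SSE
    /* Force an SSE multiply so that the MXCSR setting applies. */
    return _mm_cvtss_f32(_mm_mul_ss(_mm_set_ss(tiny), _mm_set_ss(half)));
#else
    return tiny * half;
#endif
}
```

Since both modes change numerical results, a cautious design enables
them only around the code that benefits and restores the previous
setting afterward.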
SIMD Floating-point Programming Using SSE3
SSE3 enhances SSE and SSE2 with 9 instructions targeted for SIMD
floating-point programming. In contrast to many SSE and SSE2
instructions offering homogeneous arithmetic operations on parallel
data elements (see Figure 5-1) and favoring the vertical computation
model, SSE3 offers instructions that perform asymmetric arithmetic
operations and arithmetic operations on horizontal data elements.
ADDSUBPS and ADDSUBPD are two instructions with asymmetric
arithmetic processing capability (see Figure 5-4). HADDPS, HADDPD,
HSUBPS and HSUBPD offer horizontal arithmetic processing
capability (see Figure 5-5). In addition, MOVSLDUP, MOVSHDUP
and MOVDDUP can load data from memory (or an XMM register) and
replicate data elements at once.
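To illustrate these instructions, the C sketch below uses the SSE3
intrinsics from pmmintrin.h: _mm_hadd_ps (HADDPS) for a horizontal
sum, and _mm_moveldup_ps, _mm_movehdup_ps and _mm_addsub_ps (MOVSLDUP,
MOVSHDUP, ADDSUBPS) for a complex multiply. The function names and the
interleaved (real, imaginary) data layout are assumptions made for
this example, and the scalar branch is an assumed fallback for builds
where SSE3 code generation is not enabled.

```c
#if defined(__SSE3__)
#include <pmmintrin.h>
#endif

/* Horizontal sum of four floats. */
static float hsum4(const float v[4])
{
#if defined(__SSE3__)
    __m128 x = _mm_loadu_ps(v);
    x = _mm_hadd_ps(x, x);  /* (v0+v1, v2+v3, v0+v1, v2+v3) */
    x = _mm_hadd_ps(x, x);  /* every lane holds v0+v1+v2+v3 */
    return _mm_cvtss_f32(x);
#else
    return (v[0] + v[1]) + (v[2] + v[3]);
#endif
}

/* Multiply two pairs of complex numbers stored interleaved as
   (re0, im0, re1, im1): out[k] = a[k] * b[k] for k = 0, 1. */
static void cmul2(const float a[4], const float b[4], float out[4])
{
#if defined(__SSE3__)
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 re = _mm_moveldup_ps(va);  /* (a.re0, a.re0, a.re1, a.re1) */
    __m128 im = _mm_movehdup_ps(va);  /* (a.im0, a.im0, a.im1, a.im1) */
    /* Swap real and imaginary parts of b: (b.im0, b.re0, b.im1, b.re1) */
    __m128 sw = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 t0 = _mm_mul_ps(re, vb);   /* (ar*br, ar*bi, ...) */
    __m128 t1 = _mm_mul_ps(im, sw);   /* (ai*bi, ai*br, ...) */
    /* ADDSUBPS subtracts in even lanes and adds in odd lanes:
       (ar*br - ai*bi, ar*bi + ai*br, ...) */
    _mm_storeu_ps(out, _mm_addsub_ps(t0, t1));
#else
    int k;
    for (k = 0; k < 2; ++k) {
        float ar = a[2 * k], ai = a[2 * k + 1];
        float br = b[2 * k], bi = b[2 * k + 1];
        out[2 * k]     = ar * br - ai * bi;
        out[2 * k + 1] = ar * bi + ai * br;
    }
#endif
}
```

For example, with a = (1, 2, 3, 4) and b = (5, 6, 7, 8), cmul2 computes
(1+2i)(5+6i) = -7+16i and (3+4i)(7+8i) = -11+52i. With GCC or Clang the
SSE3 branch is taken only when SSE3 is enabled at compile time, for
instance with -msse3.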
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006