Identification of SSE2 with cpuid ..................................................... 3-5
Identification of SSE2 by the OS ..................................................... 3-6
Identification of SSE3 with cpuid ..................................................... 3-7
Identification of SSE3 by the OS ..................................................... 3-8
Simple Four-Iteration Loop ............................................................ 3-14
Streaming SIMD Extensions Using Inlined Assembly Encoding ... 3-15
Simple Four-Iteration Loop Coded with Intrinsics.......................... 3-16
C++ Code Using the Vector Classes ............................................. 3-18
Automatic Vectorization for a Simple Loop .................................... 3-19
C Algorithm for 64-bit Data Alignment ........................................... 3-23
AoS Data Structure ....................................................................... 3-27
AoS and SoA Code Samples ........................................................ 3-28
SoA Data Structure ....................................................................... 3-28
Hybrid SoA Data Structure ............................................................ 3-30
Pseudo-code Before Strip Mining.................................................. 3-32
Strip Mined Code........................................................................... 3-33
Loop Blocking ................................................................................ 3-35
Emulation of Conditional Moves .................................................... 3-37
Resetting the Register between __m64 and FP Data Types........... 4-5
Unsigned Unpack Instructions......................................................... 4-7
Signed Unpack Code ...................................................................... 4-8
Interleaved Pack with Saturation ................................................... 4-10
Interleaved Pack without Saturation .............................................. 4-11
Unpacking Two Packed-word Sources in a Non-interleaved Way . 4-13
pextrw Instruction Code................................................................. 4-14
pinsrw Instruction Code................................................................. 4-15
Repeated pinsrw Instruction Code ................................................ 4-16
pmovmskb Instruction Code.......................................................... 4-17
Broadcast Using 2 Instructions...................................................... 4-19
pshuf Instruction Code .................................................................. 4-19
Swap Using 3 Instructions............................................................. 4-20
Reverse Using 3 Instructions......................................................... 4-20
Generating Constants ................................................................... 4-21
Absolute Difference of Two Unsigned Numbers ............................ 4-23
Absolute Difference of Signed Numbers ....................................... 4-24
Computing Absolute Value ............................................................ 4-25
Clipping to a Signed Range of Words [high, low] .......................... 4-27
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006