Identifying Hot Spots ...................................................................................................... 3-10
Determine If Code Benefits by Conversion to SIMD Execution...................................... 3-11
Assembly .................................................................................................................. 3-15
Intrinsics.................................................................................................................... 3-15
Classes ..................................................................................................................... 3-17
Automatic Vectorization ............................................................................................ 3-18
Using Padding to Align Data..................................................................................... 3-20
Using Arrays to Make Data Contiguous.................................................................... 3-21
Stack Alignment For 128-bit SIMD Technologies ........................................................... 3-22
Data Alignment for MMX Technology ............................................................................. 3-23
Data Alignment for 128-bit data...................................................................................... 3-24
Data Structure Layout..................................................................................................... 3-27
Strip Mining..................................................................................................................... 3-32
Loop Blocking ................................................................................................................. 3-34
Optimizing for SIMD Integer Applications
General Rules on SIMD Integer Code .................................................................................... 4-2
Using SIMD Integer with x87 Floating-point............................................................................ 4-3
Using the EMMS Instruction ............................................................................................. 4-3
Guidelines for Using EMMS Instruction............................................................................ 4-4
Unsigned Unpack ............................................................................................................. 4-6
Signed Unpack ................................................................................................................. 4-7
Interleaved Pack with Saturation ...................................................................................... 4-8
Interleaved Pack without Saturation ............................................................................... 4-10
Non-Interleaved Unpack................................................................................................. 4-11
Extract Word................................................................................................................... 4-13
Insert Word ..................................................................................................................... 4-14
Move Byte Mask to Integer............................................................................................. 4-16
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006