Packed Shuffle Word for 64-bit Registers ...................................................................... 4-18
Packed Shuffle Word for 128-bit Registers .................................................................... 4-19
Unpacking/Interleaving 64-bit Data in 128-bit Registers ................................................ 4-20
Data Movement .............................................................................................................. 4-21
Conversion Instructions .................................................................................................. 4-21
Generating Constants ........................................................................................................... 4-21
Building Blocks...................................................................................................................... 4-23
Absolute Difference of Unsigned Numbers .................................................................... 4-23
Absolute Difference of Signed Numbers ........................................................................ 4-24
Absolute Value................................................................................................................ 4-25
Clipping to an Arbitrary Range [high, low] ...................................................................... 4-26
Highly Efficient Clipping ............................................................................................ 4-27
Clipping to an Arbitrary Unsigned Range [high, low] ................................................ 4-28
Signed Word ............................................................................................................. 4-29
Unsigned Byte .......................................................................................................... 4-30
Packed Multiply High Unsigned...................................................................................... 4-30
Packed Sum of Absolute Differences ............................................................................. 4-30
Packed Average (Byte/Word) ......................................................................................... 4-31
Complex Multiply by a Constant ..................................................................................... 4-32
Packed 32*32 Multiply .................................................................................................... 4-33
Packed 64-bit Add/Subtract............................................................................................ 4-33
128-bit Shifts................................................................................................................... 4-33
Supplemental Techniques for Avoiding Cache Line Splits ........................................ 4-37
Increasing UC and WC Store Bandwidth by Using Aligned Stores........................... 4-40
Optimizing for SIMD Floating-point Applications
General Rules for SIMD Floating-point Code.......................................................................... 5-1
Planning Considerations ......................................................................................................... 5-2
Using SIMD Floating-point with x87 Floating-point ................................................................. 5-3
Scalar Floating-point Code...................................................................................................... 5-3
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006