Clipping to an Arbitrary Signed Range [high, low]......................... 4-27
Simplified Clipping to an Arbitrary Signed Range ......................... 4-28
Clipping to an Arbitrary Unsigned Range [high, low]..................... 4-29
Complex Multiply by a Constant .................................................... 4-32
A Large Load after a Series of Small Stores (Penalty).................. 4-35
Accessing Data without Delay ....................................................... 4-35
A Series of Small Loads after a Large Store ................................. 4-36
An Example of Video Processing with Cache Line Splits.............. 4-37
Video Processing Using LDDQU to Avoid Cache Line Splits ........ 4-38
Pseudocode for Horizontal (xyz, AoS) Computation ....................... 5-8
Pseudocode for Vertical (xxxx, yyyy, zzzz, SoA) Computation........ 5-9
Swizzling Data............................................................................... 5-10
Swizzling Data Using Intrinsics ..................................................... 5-12
Deswizzling Single-Precision SIMD Data ...................................... 5-14
Deswizzling 64-bit Integer SIMD Data .................................. 5-16
Using MMX Technology Code for Copying or Shuffling................. 5-18
Horizontal Add Using movhlps/movlhps ........................................ 5-19
Horizontal Add Using Intrinsics with movhlps/movlhps ................. 5-21
Multiplication of Two Pairs of Single-precision Complex Numbers.... 5-24
Division of Two Pairs of Single-precision Complex Numbers............ 5-25
Calculating Dot Products from AoS .............................................. 5-26
Pseudocode for Using clflush ....................................................... 6-18
Prefetch Scheduling Distance ....................................................... 6-26
Concatenation and Unrolling the Last Iteration of Inner Loop ....... 6-28
Using Prefetch Concatenation....................................................... 6-28
Spread Prefetch Instructions ......................................................... 6-33
Data Access of a 3D Geometry Engine without Strip-mining ........ 6-37
Data Access of a 3D Geometry Engine with Strip-mining ............. 6-38
Using HW Prefetch to Improve Read-Once Memory Traffic .......... 6-40
Basic Algorithm of a Simple Memory Copy ................................... 6-46
A Memory Copy Routine Using Software Prefetch........................ 6-48
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006