Vertical versus Horizontal Computation...................................................................... 5-5
Data Swizzling ............................................................................................................ 5-9
Data Deswizzling ...................................................................................................... 5-14
Using MMX Technology Code for Copy or Shuffling Functions ................................ 5-17
Horizontal ADD Using SSE....................................................................................... 5-18
Use of cvttps2pi/cvttss2si Instructions .................................................................................. 5-21
Flush-to-Zero and Denormals-are-Zero Modes .................................................................... 5-22
SIMD Floating-point Programming Using SSE3 ................................................................... 5-22
SSE3 and Complex Arithmetics ..................................................................................... 5-23
SSE3 and Horizontal Computation................................................................................. 5-26
SIMD Optimizations and Microarchitectures .................................................................. 5-27
General Prefetch Coding Guidelines....................................................................................... 6-2
Hardware Prefetching of Data................................................................................................. 6-4
Prefetch and Cacheability Instructions.................................................................................... 6-5
Prefetch................................................................................................................................... 6-6
Software Data Prefetch .................................................................................................... 6-6
The Prefetch Instructions – Pentium 4 Processor Implementation................................... 6-8
Prefetch and Load Instructions......................................................................................... 6-8
Fencing ..................................................................................................................... 6-10
Streaming Non-temporal Stores ............................................................................... 6-10
Memory Type and Non-temporal Stores ................................................................... 6-11
Write-Combining ....................................................................................................... 6-12
Streaming Store Instruction Descriptions ....................................................................... 6-14
The fence Instructions .................................................................................................... 6-15
The sfence Instruction .............................................................................................. 6-15
The lfence Instruction ............................................................................................... 6-16
The mfence Instruction ............................................................................................. 6-16
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006