Multi-Core and Hyper-Threading Technology
User/Source Coding Rule 21. (M impact, H generality) Insert the PAUSE
instruction in fast spin loops and keep the number of loop repetitions to a
minimum to improve overall system performance.
On IA-32 processors that use the Intel NetBurst microarchitecture core, the penalty of exiting from a spin-wait loop can be avoided by inserting a PAUSE instruction in the loop. In spite of its name, the PAUSE instruction improves performance. It introduces a slight delay in the loop, which causes memory read requests to be issued at a rate that allows any store to the synchronization variable to be detected immediately. This prevents the long delay caused by a memory order violation when the loop exits.
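The following is a minimal sketch of a PAUSE-based spin-wait loop in C. It is an illustration, not a reproduction of Example 7-4(b). It assumes a C11 compiler with `<stdatomic.h>`, POSIX threads, and the `_mm_pause()` intrinsic from `<immintrin.h>` as provided by GCC and Clang on x86. The names `ready`, `payload`, `producer`, `wait_for_ready`, `spin_wait_demo`, and the `SPIN_PAUSE` macro are hypothetical. One thread stores to the synchronization variable while the other polls it, executing PAUSE between reads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>           /* _mm_pause() emits the PAUSE instruction */
#define SPIN_PAUSE() _mm_pause()
#else
#define SPIN_PAUSE() ((void)0)   /* assumption: plain spin on non-x86 targets */
#endif

static atomic_bool ready;        /* hypothetical synchronization variable */
static int payload;              /* data published before ready is set */

static void *producer(void *arg)
{
    assert(arg != NULL);
    payload = *(const int *)arg;                                /* write the data first */
    atomic_store_explicit(&ready, true, memory_order_release); /* then signal */
    return NULL;
}

/* Spin-wait: read the flag; if it is not set yet, PAUSE before reading again. */
static void wait_for_ready(void)
{
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        SPIN_PAUSE();
}

/* Returns the value the producer published, or -1 if the thread could not start. */
int spin_wait_demo(int value)
{
    pthread_t t;
    atomic_store_explicit(&ready, false, memory_order_relaxed);
    if (pthread_create(&t, NULL, producer, &value) != 0)
        return -1;
    wait_for_ready();
    int seen = payload;          /* ordered after the producer's write by acquire/release */
    pthread_join(t, NULL);
    return seen;
}
```

Calling `spin_wait_demo(42)` should return 42 once the producer's store is observed. The release/acquire pair is what makes reading `payload` safe; PAUSE only changes the timing of the loop and is not a memory barrier.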
One example of inserting the PAUSE instruction in a simplified spin-wait loop is shown in Example 7-4(b). The PAUSE instruction is compatible with all IA-32 processors. On IA-32 processors prior to Intel NetBurst microarchitecture, the PAUSE instruction is essentially a NOP instruction.
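PAUSE is encoded as `F3 90`, a REP prefix on the one-byte NOP, which is why earlier IA-32 processors execute it as a plain NOP. The sketch below assumes GCC- or Clang-style inline assembly on x86 and spells the instruction as `rep; nop` to make that encoding visible. `cpu_relax` and `spin_bounded` are hypothetical names. The bounded loop also reflects the second half of Rule 21 by capping the number of repetitions and reporting how many were used, so a caller could, for example, fall back to a blocking wait.

```c
#include <assert.h>

/* PAUSE (F3 90) written as REP NOP. Processors that predate PAUSE
   ignore the REP prefix on NOP, so for them this is a harmless NOP. */
static inline void cpu_relax(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __asm__ __volatile__("rep; nop" ::: "memory");
#endif
}

/* Hypothetical helper: poll *flag at most max_spins times, executing PAUSE
   after each read that finds the flag clear. Returns the number of PAUSEs
   executed, which equals max_spins if the flag never became nonzero. */
int spin_bounded(const volatile int *flag, int max_spins)
{
    assert(max_spins >= 0);
    int spins = 0;
    while (*flag == 0 && spins < max_spins) {
        cpu_relax();
        spins++;
    }
    return spins;
}
```

With the flag already set, `spin_bounded` returns 0 without pausing; with the flag clear, it gives up after exactly `max_spins` iterations instead of spinning indefinitely.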
Additional examples of optimizing spin-wait loops using the PAUSE instruction are available in Application Note AP-949, “Using Spin-Loops on Intel Pentium 4 Processor and Intel Xeon Processor.”
Inserting the PAUSE instruction has the added benefit of significantly reducing the power consumed during the spin-wait because fewer system resources are used.
Optimization with Spin-Locks
Spin-locks are typically used when several threads need to modify a synchronization variable and the variable must be protected by a lock to prevent unintentional overwrites. When the lock is released, however, several threads may compete to acquire it at once. Such thread contention significantly reduces performance scaling with respect to frequency, the number of discrete processors, and Hyper-Threading Technology.
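One common way to limit that contention, named plainly here as the test-and-test-and-set technique and offered as an assumption rather than something the text above prescribes, is sketched below. Waiting threads spin on ordinary reads of the lock word, with PAUSE between reads, and attempt the atomic exchange only when the lock appears free. Locked read-modify-write operations on the lock's cache line are therefore issued only at those moments rather than on every loop iteration. The sketch assumes C11 atomics and POSIX threads; `ttas_lock`, `ttas_acquire`, `ttas_release`, `ttas_demo`, `NTHREADS`, and `ITERS` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define SPIN_PAUSE() _mm_pause()   /* PAUSE between reads of the lock word */
#else
#define SPIN_PAUSE() ((void)0)     /* assumption: plain spin on non-x86 targets */
#endif

#define NTHREADS 4
#define ITERS 100000

typedef struct { atomic_bool locked; } ttas_lock;   /* false = free, true = held */

static void ttas_acquire(ttas_lock *l)
{
    for (;;) {
        /* Test: wait with plain loads (served from the local cache) plus PAUSE. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            SPIN_PAUSE();
        /* Test-and-set: one atomic exchange; it wins if the old value was false. */
        if (!atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
            return;
    }
}

static void ttas_release(ttas_lock *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed)); /* caller must hold it */
    atomic_store_explicit(&l->locked, false, memory_order_release);
}

static ttas_lock demo_lock;   /* static storage: zero-initialized, i.e. unlocked */
static long demo_counter;     /* shared data protected by demo_lock */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        ttas_acquire(&demo_lock);
        demo_counter++;
        ttas_release(&demo_lock);
    }
    return NULL;
}

/* Returns the final counter value, or -1 if not all threads could be started. */
long ttas_demo(void)
{
    pthread_t t[NTHREADS];
    int n = 0;
    demo_counter = 0;
    for (; n < NTHREADS; n++)
        if (pthread_create(&t[n], NULL, worker, NULL) != 0)
            break;
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    return n == NTHREADS ? demo_counter : -1;
}
```

`ttas_demo()` runs `NTHREADS` workers that each increment a shared counter `ITERS` times under the lock, so it should return `NTHREADS * ITERS` (400000 with the values shown). An exchange is used rather than a compare-and-swap because acquiring the lock only requires knowing whether the previous value was "free".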
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006