Multi-Core and Hyper-Threading Technology
On Hyper-Threading-Technology-enabled processors, excessive loop
unrolling is likely to reduce the Trace Cache’s ability to deliver high
bandwidth μop streams to the execution engine.
Optimization for Code Size
When the Trace Cache is continuously and repeatedly delivering μop
traces that are pre-built, the scheduler in the execution engine can
dispatch μops for execution at a high rate and maximize the utilization
of available execution resources. Optimizing application code size by
organizing code sequences that are repeatedly executed into sections,
each with a footprint that can fit into the Trace Cache, can improve
application performance greatly.
On Hyper-Threading-Technology-enabled processors, multithreaded
applications should improve code locality of frequently executed
sections and target one half of the size of Trace Cache for each
application thread when considering code size optimization. If code size
becomes an issue affecting the efficiency of the front end, this may be
detected by evaluating performance metrics discussed in the previous
sub-section with respect to loop unrolling.
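The half-cache target above can be written down as a small budget check. The following is a minimal sketch, not a measurement tool: the function names are hypothetical, the Trace Cache capacity is passed in as a parameter rather than detected, and the μop count of a hot section is assumed to come from some external estimate (for example, a profiler).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: uop budget available to one application thread.
 * With Hyper-Threading enabled, each thread targets one half of the
 * Trace Cache; otherwise the whole capacity is assumed to be available. */
size_t per_thread_uop_budget(size_t trace_cache_uops, bool ht_enabled)
{
    assert(trace_cache_uops > 0);
    return ht_enabled ? trace_cache_uops / 2 : trace_cache_uops;
}

/* Hypothetical helper: does a frequently executed section, whose size is
 * estimated in uops by the caller, fit within the per-thread budget? */
bool hot_section_fits(size_t section_uops, size_t trace_cache_uops,
                      bool ht_enabled)
{
    return section_uops <= per_thread_uop_budget(trace_cache_uops, ht_enabled);
}
```

For example, assuming a capacity of 12288 μops, `hot_section_fits(7000, 12288, true)` reports that the section exceeds the per-thread target, while the same section fits when Hyper-Threading Technology is not in use.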
User/Source Coding Rule 38. (L impact, L generality) Optimize code size to
improve locality in the Trace Cache and increase the delivered trace length.
Using Thread Affinities to Manage Shared Platform Resources
Each logical processor in an MP system has a unique initial APIC_ID,
which can be queried using CPUID. Resources shared by more than one
logical processor in a multi-threading platform can be mapped into a
three-level hierarchy for a non-clustered MP system. Each of the three
levels can be identified by a label, which can be extracted from the
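The initial APIC_ID query mentioned above can be sketched in C. This is an illustration under stated assumptions, not the manual's own listing: it relies on the `__get_cpuid` helper from the GCC/Clang `<cpuid.h>` header, reads the initial APIC_ID from bits 31:24 of EBX for CPUID leaf 1, and splits it into three labels using bit widths supplied by the caller. The names `topo_labels` and `split_apic_id`, the label names, and the assumed field order (logical processor in the lowest bits, then core, then package) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_CPUID 1
#else
#define HAVE_CPUID 0
#endif

/* CPUID leaf 1 reports the initial APIC_ID in EBX[31:24]. */
uint8_t initial_apic_id_from_ebx(uint32_t ebx)
{
    return (uint8_t)(ebx >> 24);
}

/* Returns 0 and stores the initial APIC_ID of the logical processor that
 * executes the call; returns -1 if CPUID leaf 1 cannot be queried here. */
int read_initial_apic_id(uint8_t *out)
{
#if HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    *out = initial_apic_id_from_ebx(ebx);
    return 0;
#else
    (void)out;
    return -1;
#endif
}

/* Hypothetical three-level labels (assumed names and layout). */
typedef struct {
    unsigned smt_id;      /* logical processor within a core */
    unsigned core_id;     /* core within a physical package  */
    unsigned package_id;  /* physical package                */
} topo_labels;

/* Split an 8-bit initial APIC_ID using caller-supplied field widths. */
topo_labels split_apic_id(uint8_t apic_id, unsigned smt_bits,
                          unsigned core_bits)
{
    topo_labels t;
    assert(smt_bits + core_bits <= 8);
    t.smt_id     = apic_id & ((1u << smt_bits) - 1u);
    t.core_id    = ((unsigned)apic_id >> smt_bits) & ((1u << core_bits) - 1u);
    t.package_id = (unsigned)apic_id >> (smt_bits + core_bits);
    return t;
}
```

Because CPUID reports the value for whichever logical processor executes it, a caller would normally pin the thread to one logical processor (through the operating system's affinity interface) before calling `read_initial_apic_id`. The field widths passed to `split_apic_id` depend on the processor and must be determined separately.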
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006