Multi-Core and Hyper-Threading Technology
latency of scattered memory reads can be improved by issuing multiple
memory reads back-to-back to overlap multiple outstanding memory
read transactions. The average latency of back-to-back bus reads is
likely to be lower than the average latency of scattered reads
interspersed with other bus transactions. This is because only the first
memory read needs to wait for the full delay of a cache miss.
User/Source Coding Rule 29. (M impact, M generality) Consider
overlapping multiple back-to-back memory reads to improve effective cache
miss latency.
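A minimal sketch of this rule in C follows. It is an illustration under
stated assumptions, not code from this manual: the function name
gather_sum_batched and the batch size of four are hypothetical. Each batch
issues its independent reads back-to-back before any of the loaded values is
consumed, so the cache misses have a chance to overlap. A compiler is free to
reorder these loads, so the generated code should be inspected.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical batch size: number of independent reads issued together. */
enum { BATCH = 4 };

/* Returns the sum of table[idx[i]] for i in [0, n).
 * The reads inside one batch do not depend on each other, so their
 * cache misses can be outstanding at the same time. */
uint64_t gather_sum_batched(const uint32_t *table, const size_t *idx, size_t n)
{
    uint64_t sum = 0;
    size_t i = 0;

    for (; i + BATCH <= n; i += BATCH) {
        /* Issue the reads back-to-back, before any dependent work. */
        uint32_t a = table[idx[i]];
        uint32_t b = table[idx[i + 1]];
        uint32_t c = table[idx[i + 2]];
        uint32_t d = table[idx[i + 3]];

        /* Consume the values only after all four reads were issued. */
        sum += (uint64_t)a + b + c + d;
    }

    /* Handle the remaining elements one at a time. */
    for (; i < n; i++)
        sum += table[idx[i]];

    return sum;
}
```

The batch size is a tuning parameter; whether four outstanding reads is a
good choice depends on the processor and is assumed here only for
illustration.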
Another technique to reduce effective memory latency is possible if one
can adjust the data access pattern such that the access strides causing
successive cache misses in the last-level cache are predominantly less
than the trigger threshold distance of the automatic hardware prefetcher.
See “Example of Effective Latency Reduction with H/W Prefetch” in
Chapter 6.
User/Source Coding Rule 30. (M impact, M generality) Consider adjusting
the sequencing of memory references such that the distribution of distances
between successive cache misses of the last-level cache peaks towards 64 bytes.
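One way to approach this rule, sketched below in C, is to reorder a list of
element indices before traversal so that successive references move forward
through memory in small steps. The helper names sort_accesses and
count_near_strides are hypothetical. The distance between successive
references is used here only as a rough proxy for the distance between
successive last-level cache misses, which this sketch does not measure.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { LINE_BYTES = 64 };  /* line size used in the rule above */

/* Ascending comparison of two size_t values for qsort. */
static int cmp_size(const void *a, const void *b)
{
    size_t x = *(const size_t *)a;
    size_t y = *(const size_t *)b;
    return (x > y) - (x < y);
}

/* Reorders idx in ascending order so that a later traversal walks
 * through memory with small forward strides. */
void sort_accesses(size_t *idx, size_t n)
{
    qsort(idx, n, sizeof idx[0], cmp_size);
}

/* Counts successive references whose byte distance is at most
 * max_bytes. elem_size is the size of one array element in bytes. */
size_t count_near_strides(const size_t *idx, size_t n,
                          size_t elem_size, size_t max_bytes)
{
    size_t near = 0;

    for (size_t i = 1; i < n; i++) {
        size_t d = idx[i] > idx[i - 1] ? idx[i] - idx[i - 1]
                                       : idx[i - 1] - idx[i];
        if (d * elem_size <= max_bytes)
            near++;
    }
    return near;
}
```

Calling count_near_strides before and after sort_accesses gives a quick
indication of whether the reordering moved the stride distribution towards
64 bytes. Reordering is only valid when the order of references does not
affect the result of the computation.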
Use Full Write Transactions to Achieve Higher Data Rate
Write transactions across the bus can result in writes to physical memory
that use either the full line size of 64 bytes or less than the full line size.
The latter is referred to as a partial write. Typically, writes to writeback
(WB) memory addresses are full-size and writes to write-combine (WC)
or uncacheable (UC) type memory addresses result in partial writes.
Both cached WB store operations and WC store operations utilize a set
of six WC buffers (64 bytes wide) to manage the traffic of write
transactions. When competing traffic closes a WC buffer before all
writes to the buffer are finished, the result is a series of 8-byte partial
bus transactions rather than a single 64-byte write transaction.
User/Source Coding Rule 31. (M impact, M generality) Use full write
transactions to achieve higher data throughput.
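The following C sketch shows one way to keep the writes to each 64-byte line
together. It is an assumption-laden illustration rather than a method from
this manual: the function name store_full_lines and the placeholder
computation are hypothetical. Each line is assembled in a local buffer first
and then stored to the destination in one contiguous 64-byte copy, so no
other stores are interleaved while a line is being written. Whether this
yields a single full-line bus transaction depends on the memory type, the
code the compiler emits for the copy, and competing traffic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { LINE_BYTES = 64, QWORDS_PER_LINE = LINE_BYTES / 8 };

/* Writes n_lines full 64-byte lines to dst.
 * Assumption: dst is 64-byte aligned, so each copy covers exactly one line.
 * src must hold n_lines * QWORDS_PER_LINE input values. */
void store_full_lines(uint64_t *dst, const uint64_t *src, size_t n_lines)
{
    assert((uintptr_t)dst % LINE_BYTES == 0);

    for (size_t l = 0; l < n_lines; l++) {
        uint64_t line[QWORDS_PER_LINE];  /* staging buffer for one line */

        /* Do the per-element work on the staging buffer, not on dst. */
        for (size_t q = 0; q < QWORDS_PER_LINE; q++)
            line[q] = src[l * QWORDS_PER_LINE + q] * 2 + 1;  /* placeholder */

        /* Store the finished line in one contiguous 64-byte copy. */
        memcpy(dst + l * QWORDS_PER_LINE, line, LINE_BYTES);
    }
}
```

The staging buffer costs an extra copy through the cache. That trade-off is
worthwhile only when the destination would otherwise receive partial writes.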
Source: IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006.