Coding for SIMD Architectures
3
3-35
For the first iteration of the inner loop, each access to array
B
will generate a
cache miss.
If the size of one row of array
A
, that is,
A[2, 0:MAX-1]
, is
large enough, by the time the second iteration starts, each access to array
B
will always generate a cache miss. For instance, on the first iteration,
the cache line containing
B[0, 0:7]
will be brought in when
B[0,0]
is
referenced because the
float
type variable is four bytes and each cache
line is 32 bytes. Due to the limitation of cache capacity, this line will be
evicted due to conflict misses before the inner loop reaches the end. For
the next iteration of the outer loop, another cache miss will be generated
while referencing
B[0,1]
. In this manner, a cache miss occurs when
each element of array
B
is referenced, that is, there is no data reuse in the
cache at all for array
B
.
Example 3-20 Loop Blocking
A. Original Loop
float A[MAX, MAX], B[MAX, MAX]
for (i=0; i< MAX; i++) {
for (j=0; j< MAX; j++) {
A[i,j] = A[i,j] + B[j, i];
}
}
B. Transformed Loop after Blocking
float A[MAX, MAX], B[MAX, MAX];
for (i=0; i< MAX; i+=block_size) {
for (j=0; j< MAX; j+=block_size) {
for (ii=i; ii<i+block_size; ii++) {
for (jj=j; jj<j+block_size; jj++) {
A[ii,jj] = A[ii,jj] + B[jj, ii];
}
}
}
}
Summary of Contents for ARCHITECTURE IA-32
Page 1: ...IA 32 Intel Architecture Optimization Reference Manual Order Number 248966 013US April 2006...
Page 220: ...IA 32 Intel Architecture Optimization 3 40...
Page 434: ...IA 32 Intel Architecture Optimization 9 20...
Page 514: ...IA 32 Intel Architecture Optimization B 60...
Page 536: ...IA 32 Intel Architecture Optimization C 22...