Coding for SIMD Architectures
Note that SoA can have the disadvantage of requiring more independent
memory stream references. A computation that uses arrays x, y, and z in
Example 3-15 would require three separate data streams. This can
require more prefetches and additional address generation
calculations, and can have a greater impact on DRAM page access
efficiency. An alternative, a hybrid SoA approach, blends the AoS and
SoA layouts (see Example 3-17). In this case, only two separate address
streams are generated and referenced: one which contains
xxxx,yyyy,zzzz,xxxx,... and the other which contains
aaaa,bbbb,cccc,aaaa,dddd,... . This also prevents fetching
unnecessary data, assuming the variables x, y, z are always used
together, whereas the variables a, b, c would also be used together,
but not at the same time as x, y, z. This hybrid SoA approach ensures:
• data is organized to enable more efficient vertical SIMD
  computation,
• simpler/less address generation than AoS,
• fewer streams, which reduces DRAM page misses,
• use of fewer prefetches, due to fewer streams,
• efficient cache line packing of data elements that are used
  concurrently.
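The hybrid layout described above can be sketched in C as follows. This
is a minimal illustration, not the manual's Example 3-17: the names
SIMD_WIDTH, CoordGroup, AttrGroup, and squared_lengths are hypothetical,
and a width of four 32-bit elements per SIMD register is assumed. The
inner loop is written as plain C so that it stays portable; each of its
iterations corresponds to one lane of a vertical SIMD operation.

```c
#include <assert.h>
#include <stddef.h>

#define SIMD_WIDTH 4  /* assumption: four 32-bit elements per SIMD register */

/* Stream 1: one group holds xxxx, yyyy, zzzz for SIMD_WIDTH vertices.
   x, y, z are assumed to be used together, so they share cache lines. */
typedef struct {
    float x[SIMD_WIDTH];
    float y[SIMD_WIDTH];
    float z[SIMD_WIDTH];
} CoordGroup;

/* Stream 2: a, b, c are used together, but not at the same time as
   x, y, z, so they live in a separate array of groups. */
typedef struct {
    int a[SIMD_WIDTH];
    int b[SIMD_WIDTH];
    int c[SIMD_WIDTH];
} AttrGroup;

/* Vertical computation over the coordinate stream only:
   out[i] = x*x + y*y + z*z for every vertex i.
   Only one address stream (groups) is read; the attribute
   stream is never touched. */
void squared_lengths(const CoordGroup *groups, size_t num_groups, float *out)
{
    assert(groups != NULL && out != NULL);
    for (size_t g = 0; g < num_groups; ++g) {
        const CoordGroup *cg = &groups[g];
        /* Each lane is independent, which is what makes the loop
           a candidate for vertical SIMD execution. */
        for (size_t lane = 0; lane < SIMD_WIDTH; ++lane) {
            out[g * SIMD_WIDTH + lane] = cg->x[lane] * cg->x[lane]
                                       + cg->y[lane] * cg->y[lane]
                                       + cg->z[lane] * cg->z[lane];
        }
    }
}
```

A caller would allocate one array of CoordGroup and one array of
AttrGroup, each with (number of vertices / SIMD_WIDTH) elements, and
pass only the array that a given computation needs.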
With the advent of SIMD technologies, the choice of data
organization becomes more important and should be based carefully on
the operations to be performed on the data. This will become
increasingly important in the Pentium 4 processor and future processors.
In some applications, traditional data arrangements may not lead to
maximum performance. Application developers are encouraged to
explore different data arrangements and data segmentation policies for
efficient computation. This may mean using a combination of AoS,
SoA, and Hybrid SoA in a given application.
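When an application mixes arrangements, data often has to be moved from
one layout to another. The sketch below shows one way to repack an AoS
array into hybrid SoA groups. The names GROUP_WIDTH, Vec3AoS, Vec3Group,
and aos_to_hybrid are hypothetical, and the vertex count is assumed to
be a multiple of the group width; handling a partial last group is left
out for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_WIDTH 4  /* assumption: elements per SIMD register */

/* Traditional AoS element: x, y, z of one vertex are adjacent. */
typedef struct {
    float x, y, z;
} Vec3AoS;

/* Hybrid SoA group: x, y, z of GROUP_WIDTH vertices, stored as
   xxxx, yyyy, zzzz. */
typedef struct {
    float x[GROUP_WIDTH];
    float y[GROUP_WIDTH];
    float z[GROUP_WIDTH];
} Vec3Group;

/* Repack n AoS vertices into n / GROUP_WIDTH hybrid SoA groups.
   Vertex i lands in group i / GROUP_WIDTH, lane i % GROUP_WIDTH. */
void aos_to_hybrid(const Vec3AoS *in, size_t n, Vec3Group *out)
{
    assert(in != NULL && out != NULL);
    assert(n % GROUP_WIDTH == 0);  /* assumption stated in the text above */
    for (size_t i = 0; i < n; ++i) {
        Vec3Group *g = &out[i / GROUP_WIDTH];
        size_t lane = i % GROUP_WIDTH;
        g->x[lane] = in[i].x;
        g->y[lane] = in[i].y;
        g->z[lane] = in[i].z;
    }
}
```

Whether the cost of such a repacking step pays off depends on how many
times the data is reused in its new layout, so it is worth measuring in
the target application.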
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006