Coding for SIMD Architectures
• Functions that use Streaming SIMD Extensions or Streaming SIMD Extensions 2 data need to provide a 16-byte aligned stack frame.
• The __m128* parameters need to be aligned to 16-byte boundaries, possibly creating “holes” (due to padding) in the argument block.
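The padding “holes” mentioned above can be illustrated with ordinary C structure layout. The sketch below is only an analogy: `vec4f` and `arg_block` are hypothetical names standing in for a `__m128` value and an argument block, and the C11 `_Alignas` specifier is assumed in place of any compiler-specific control. It does not reproduce the actual calling convention.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */

/* Hypothetical stand-in for a 16-byte SIMD value such as __m128. */
typedef struct {
    _Alignas(16) float f[4];
} vec4f;

/* Hypothetical model of an argument block: an int followed by a vector. */
typedef struct {
    int   count;  /* starts at byte offset 0 */
    vec4f v;      /* must start on a 16-byte boundary */
} arg_block;

/* The vector cannot follow the int directly; it is pushed to offset 16. */
static_assert(offsetof(arg_block, v) == 16, "v starts on a 16-byte boundary");

/* Number of padding bytes (the "hole") between count and v. */
size_t hole_size(void)
{
    return offsetof(arg_block, v) - sizeof(int);
}
```

With a 4-byte `int`, `hole_size()` reports 12 unused bytes between the two members, which is the same effect the alignment requirement has on an argument block.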
The conventions presented in this section, as implemented by the Intel C++ Compiler, can also serve as a guideline for assembly language code. In many cases, this section assumes the use of the __m128* data types as defined by the Intel C++ Compiler; the __m128 type, for example, represents an array of four 32-bit floats.
For more details on the stack alignment for Streaming SIMD Extensions
and SSE2, see Appendix D, “Stack Alignment.”
Data Alignment for MMX Technology
Many compilers provide controls for aligning variables. These controls
place each variable on the boundary appropriate for its bit length. If some
variables are still not aligned as specified, you can align them using the
C algorithm shown in Example 3-13.
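As an illustration of such a compiler control, the sketch below uses the standard C11 `alignas` specifier. This is an assumption made for portability: the text does not name a specific control, and compilers also offer vendor-specific forms such as `__declspec(align(n))` or `__attribute__((aligned(n)))`. The names `coeffs` and `is_aligned_8` are hypothetical.

```c
#include <stdalign.h>  /* alignas (C11) */
#include <stdint.h>    /* uintptr_t, int16_t */

/* Four packed 16-bit values placed on an 8-byte (64-bit) boundary,
   so that a single 64-bit access covers them without straddling it. */
alignas(8) int16_t coeffs[4] = { 1, 2, 3, 4 };

/* Returns 1 if ptr is a multiple of 8 bytes, 0 otherwise. */
int is_aligned_8(const void *ptr)
{
    return ((uintptr_t)ptr & 0x7u) == 0;
}
```

Here `is_aligned_8(coeffs)` holds because the compiler honors the requested alignment, while an address one byte into the array fails the check.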
The algorithm in Example 3-13 aligns an array of 64-bit elements on a
64-bit boundary. The constant of 7 is derived from one less than the
number of bytes in a 64-bit element, or 8-1. Aligning data in this manner
avoids the significant performance penalties that can occur when an
access crosses a cache line boundary.
Example 3-13 C Algorithm for 64-bit Data Alignment
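/* Make newp a pointer to a 64-bit aligned array */
/* of NUM_ELEMENTS 64-bit elements. */
/* Needs <stdlib.h> for malloc and <stdint.h> for uintptr_t. */
double *p, *newp;
p = (double*)malloc(sizeof(double)*(NUM_ELEMENTS+1));
/* Round the byte address, not the element index, up to a multiple of 8. */
newp = (double*)(((uintptr_t)p + 7) & ~(uintptr_t)0x7);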
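The rounding step of Example 3-13 can also be packaged as a small helper. The following is a minimal sketch, not the manual's code: `alloc_aligned_doubles` and its `raw` out-parameter are hypothetical names, and it assumes that converting a pointer to `uintptr_t` and back preserves the address, which holds on common flat-memory platforms.

```c
#include <assert.h>  /* assert */
#include <stdint.h>  /* uintptr_t */
#include <stdlib.h>  /* malloc, free, size_t */

/* Allocates room for n doubles plus 7 spare bytes and returns the first
   8-byte-aligned address inside that block. *raw receives the original
   pointer, which is the one that must later be passed to free(). */
double *alloc_aligned_doubles(size_t n, void **raw)
{
    *raw = malloc(n * sizeof(double) + 7);
    if (*raw == NULL)
        return NULL;

    /* Add 7, then clear the low three bits: rounds up to a multiple of 8. */
    uintptr_t addr = ((uintptr_t)*raw + 7) & ~(uintptr_t)0x7;
    assert((addr & 0x7) == 0);
    return (double *)addr;
}
```

The aligned pointer is used for all data accesses, while the saved `raw` pointer is kept only for the eventual call to `free`, since freeing the adjusted address would be invalid whenever the two differ.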
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006