IA-32 Intel® Architecture Optimization
To gather data from 4 different memory locations on the fly, follow these steps:

1. Identify the first half of the 128-bit memory location.
2. Group the different halves together using the movlps and movhps instructions to form an xyxy layout in two registers.
3. From the 4 attached halves, get the xxxx by using one shuffle and the yyyy by using another shuffle.

The zzzz is derived the same way but only requires one shuffle.
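The steps above can be sketched with SSE compiler intrinsics instead of inline assembly. The following is a minimal sketch, not the manual's own listing: the function name swizzle_intrin is hypothetical, the structure layouts follow the AoS and SoA declarations of Example 5-3, and it assumes a compiler that provides xmmintrin.h. Here _mm_loadl_pi and _mm_loadh_pi correspond to movlps and movhps, and _mm_shuffle_ps corresponds to shufps. The sketch also fills the color array, which is an assumption about the intended output.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadl_pi, _mm_shuffle_ps */

/* Layouts assumed to match the AoS and SoA declarations of Example 5-3. */
typedef struct _VERTEX_AOS { float x, y, z, color; } Vertex_aos;
typedef struct _VERTEX_SOA { float x[4], y[4], z[4]; float color[4]; } Vertex_soa;

/* Hypothetical helper: converts 4 AoS vertices into one SoA block. */
void swizzle_intrin(const Vertex_aos *in, Vertex_soa *out)
{
    __m128 xy01 = _mm_setzero_ps();
    __m128 xy23 = _mm_setzero_ps();
    __m128 zw01 = _mm_setzero_ps();
    __m128 zw23 = _mm_setzero_ps();

    /* Steps 1 and 2: load the first half (x, y) of each vertex and pair
       the halves up, giving an xyxy layout in two registers. */
    xy01 = _mm_loadl_pi(xy01, (const __m64 *)&in[0].x);  /* x0 y0 .  .  */
    xy01 = _mm_loadh_pi(xy01, (const __m64 *)&in[1].x);  /* x0 y0 x1 y1 */
    xy23 = _mm_loadl_pi(xy23, (const __m64 *)&in[2].x);  /* x2 y2 .  .  */
    xy23 = _mm_loadh_pi(xy23, (const __m64 *)&in[3].x);  /* x2 y2 x3 y3 */

    /* Step 3: one shuffle per output register. */
    __m128 xxxx = _mm_shuffle_ps(xy01, xy23, _MM_SHUFFLE(2, 0, 2, 0));
    __m128 yyyy = _mm_shuffle_ps(xy01, xy23, _MM_SHUFFLE(3, 1, 3, 1));

    /* Same pattern for the second half (z, color) of each vertex. */
    zw01 = _mm_loadl_pi(zw01, (const __m64 *)&in[0].z);  /* z0 w0 .  .  */
    zw01 = _mm_loadh_pi(zw01, (const __m64 *)&in[1].z);  /* z0 w0 z1 w1 */
    zw23 = _mm_loadl_pi(zw23, (const __m64 *)&in[2].z);  /* z2 w2 .  .  */
    zw23 = _mm_loadh_pi(zw23, (const __m64 *)&in[3].z);  /* z2 w2 z3 w3 */

    __m128 zzzz = _mm_shuffle_ps(zw01, zw23, _MM_SHUFFLE(2, 0, 2, 0));
    __m128 wwww = _mm_shuffle_ps(zw01, zw23, _MM_SHUFFLE(3, 1, 3, 1));

    /* Unaligned stores, since the sketch makes no alignment assumption. */
    _mm_storeu_ps(out->x, xxxx);
    _mm_storeu_ps(out->y, yyyy);
    _mm_storeu_ps(out->z, zzzz);
    _mm_storeu_ps(out->color, wwww);
}
```

Calling swizzle_intrin(in, &out) on an array of 4 Vertex_aos elements leaves out.x holding the four x values in vertex order, and likewise for y, z, and color. If the output structure is known to be 16-byte aligned, _mm_store_ps could replace the unaligned stores.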
Example 5-3 illustrates the swizzle function.
Example 5-3 Swizzling Data
typedef struct _VERTEX_AOS {
    float x, y, z, color;
} Vertex_aos;                  // AoS structure declaration

typedef struct _VERTEX_SOA {
    float x[4], y[4], z[4];
    float color[4];
} Vertex_soa;                  // SoA structure declaration

void swizzle_asm (Vertex_aos *in, Vertex_soa *out)
{
// in mem: x1y1z1w1-x2y2z2w2-x3y3z3w3-x4y4z4w4-
// SWIZZLE XYZW --> XXXX
    asm {
        mov ecx, in            // get structure addresses
        mov edx, out
continued
Source: IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966 013US, April 2006.