IA-32 Intel® Architecture Optimization
For complex arithmetic on Intel Core Solo and Intel Core Duo
processors, single-precision SSE3 instructions can deliver higher
performance than the alternatives. Double-precision complex arithmetic,
on the other hand, may perform better with scalar SSE2 instructions on
these processors, because scalar SSE2 instructions can be dispatched
through two ports and executed by two separate floating-point units.
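The two cases above can be sketched with compiler intrinsics. This is a
minimal illustration, not code from this manual: the function names
(cmul2_ps, cmul_sd), the cplx_d type, and the SIMD_FN macro are
assumptions introduced here, and the sketch assumes a GCC- or
Clang-compatible compiler on an x86 target. The first routine multiplies
two pairs of single-precision complex numbers with SSE3 (movsldup,
movshdup, addsubps); the second multiplies one pair of double-precision
complex numbers using only scalar SSE2 operations.

```c
#include <pmmintrin.h>  /* SSE3 intrinsics; also pulls in SSE and SSE2 */

/* Assumption: GCC/Clang need SSE3 enabled per function unless the file is
   built with -msse3; other compilers get an empty macro. */
#if defined(__GNUC__)
#define SIMD_FN __attribute__((target("sse3")))
#else
#define SIMD_FN
#endif

/* Hypothetical helper type for the double-precision case. */
typedef struct { double re, im; } cplx_d;

/* Multiply two pairs of single-precision complex numbers.
   Layout of a, b, out: [x0.re, x0.im, x1.re, x1.im]. */
SIMD_FN void cmul2_ps(const float *a, const float *b, float *out)
{
    __m128 va   = _mm_loadu_ps(a);
    __m128 vb   = _mm_loadu_ps(b);
    __m128 b_re = _mm_moveldup_ps(vb);   /* [b0.re, b0.re, b1.re, b1.re] */
    __m128 b_im = _mm_movehdup_ps(vb);   /* [b0.im, b0.im, b1.im, b1.im] */
    /* Swap re/im within each complex number: [a0.im, a0.re, a1.im, a1.re] */
    __m128 a_sw = _mm_shuffle_ps(va, va, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 t1   = _mm_mul_ps(va, b_re);   /* [a.re*b.re, a.im*b.re, ...] */
    __m128 t2   = _mm_mul_ps(a_sw, b_im); /* [a.im*b.im, a.re*b.im, ...] */
    /* addsubps subtracts in even lanes and adds in odd lanes:
       re = a.re*b.re - a.im*b.im, im = a.im*b.re + a.re*b.im */
    _mm_storeu_ps(out, _mm_addsub_ps(t1, t2));
}

/* Multiply one pair of double-precision complex numbers using only
   scalar SSE2 operations (mulsd, addsd, subsd). */
SIMD_FN cplx_d cmul_sd(cplx_d a, cplx_d b)
{
    __m128d a_re = _mm_set_sd(a.re), a_im = _mm_set_sd(a.im);
    __m128d b_re = _mm_set_sd(b.re), b_im = _mm_set_sd(b.im);
    __m128d re = _mm_sub_sd(_mm_mul_sd(a_re, b_re), _mm_mul_sd(a_im, b_im));
    __m128d im = _mm_add_sd(_mm_mul_sd(a_re, b_im), _mm_mul_sd(a_im, b_re));
    cplx_d r;
    r.re = _mm_cvtsd_f64(re);
    r.im = _mm_cvtsd_f64(im);
    return r;
}
```

Which variant is faster depends on the surrounding code and the target
processor, so both should be measured before one is chosen.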
Packed horizontal SSE3 instructions (haddps and hsubps) can simplify
the code sequence for some tasks. However, these instructions consist of
more than five micro-ops on Intel Core Solo and Intel Core Duo
processors. Care must be taken to ensure that the latency and decoding
penalty of the horizontal instructions do not offset any algorithmic
benefits.
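As an illustration of this trade-off, the sketch below sums the four
elements of a vector in two ways: once with haddps, and once with a
shuffle-and-add sequence that uses only SSE instructions. The function
names (hsum_hadd, hsum_shuffle) and the SIMD_FN macro are assumptions
introduced for this example, which assumes a GCC- or Clang-compatible
compiler on an x86 target. The haddps version is shorter, but it issues
two multi-micro-op horizontal instructions.

```c
#include <pmmintrin.h>  /* SSE3 intrinsics; also pulls in SSE and SSE2 */

/* Assumption: GCC/Clang need SSE3 enabled per function unless the file is
   built with -msse3; other compilers get an empty macro. */
#if defined(__GNUC__)
#define SIMD_FN __attribute__((target("sse3")))
#else
#define SIMD_FN
#endif

/* Sum v[0..3] with two haddps instructions. */
SIMD_FN float hsum_hadd(const float *v)
{
    __m128 x = _mm_loadu_ps(v);
    x = _mm_hadd_ps(x, x);   /* [v0+v1, v2+v3, v0+v1, v2+v3] */
    x = _mm_hadd_ps(x, x);   /* every lane holds v0+v1+v2+v3 */
    return _mm_cvtss_f32(x);
}

/* Sum v[0..3] with movhlps, shufps, addps and addss instead. */
SIMD_FN float hsum_shuffle(const float *v)
{
    __m128 x  = _mm_loadu_ps(v);
    __m128 hi = _mm_movehl_ps(x, x);   /* [v2, v3, v2, v3] */
    __m128 s  = _mm_add_ps(x, hi);     /* lane 0 = v0+v2, lane 1 = v1+v3 */
    /* Broadcast lane 1 so that v1+v3 lands in lane 0. */
    __m128 s1 = _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1));
    s = _mm_add_ss(s, s1);             /* lane 0 = v0+v1+v2+v3 */
    return _mm_cvtss_f32(s);
}
```

The two routines add the elements in a different order, so results can
differ in the last bit for inputs that are not exactly representable.
Timing both on the target processor shows whether the shorter haddps
sequence pays for its extra micro-ops.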
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006