IA-32 Intel® Architecture Optimization
The code above converts values to unsigned numbers first and then clips
them to an unsigned range. The last instruction converts the data back to
signed data and places the data within the signed range. Conversion to
unsigned data is required for correct results when (high - low) < 0x8000.
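As a rough illustration of the approach just described, the following C
sketch models a single 16-bit lane in scalar code. The listing the text
refers to is not reproduced in this section, so the exact instruction
sequence, the helper names (addus16, subus16, clip_signed_via_unsigned)
and the 0x8000 bias are assumptions made for this sketch. It only mirrors
the described pattern: bias to unsigned, clip with unsigned saturation,
then return to the signed range.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturating add/subtract on one 16-bit word (modelled on paddusw/psubusw). */
static uint32_t addus16(uint32_t a, uint32_t b) {
    uint32_t s = a + b;
    return s > 0xFFFFu ? 0xFFFFu : s;
}

static uint32_t subus16(uint32_t a, uint32_t b) {
    return a > b ? a - b : 0u;
}

/* Hypothetical helper: clip one signed word x to [low, high] via the unsigned domain. */
int32_t clip_signed_via_unsigned(int32_t x, int32_t high, int32_t low) {
    assert(low <= high && low >= INT16_MIN && high <= INT16_MAX);
    uint32_t u       = ((uint32_t)x & 0xFFFFu) ^ 0x8000u; /* signed -> unsigned (bias 0x8000) */
    uint32_t high_us = (uint32_t)(high + 0x8000);
    uint32_t low_us  = (uint32_t)(low + 0x8000);
    u = addus16(u, 0xFFFFu - high_us);           /* values above high saturate at 0xFFFF */
    u = subus16(u, 0xFFFFu - high_us + low_us);  /* values below low saturate at 0 */
    /* u now holds clamp(x) - low, in 0..(high - low); adding low restores the signed value */
    return (int32_t)u + low;
}
```

Under these assumptions the sketch also handles narrow ranges, for example
high = 100 and low = -100, which is the case where (high - low) < 0x8000.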
If (high - low) >= 0x8000, the algorithm can be simplified as shown in
Example 4-21.
This algorithm saves a cycle when it is known that (high - low) >= 0x8000.
The three-instruction algorithm does not work when (high - low) < 0x8000,
because 0xffff minus any number < 0x8000 yields a value of 0x8000 or
greater, which is a negative number when interpreted as a signed word.
When the second instruction of the three-step algorithm (Example 4-21),
psubssw MM0, (0xffff - high + low), is executed, a negative number is
subtracted. This subtraction increases the values in MM0 instead of
decreasing them, and an incorrect answer is generated.
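The sign argument above can be checked with a few lines of C. This is
only a sketch: the helper name psubssw_operand is hypothetical, and it
simply evaluates the word 0xffff - (high - low) and reads it back as a
signed 16-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the word 0xFFFF - (high - low), read as a signed 16-bit value. */
int32_t psubssw_operand(int32_t high, int32_t low) {
    assert(low <= high && high - low <= 0xFFFF);
    uint32_t word = 0xFFFFu - (uint32_t)(high - low);
    /* Words with the top bit set are negative when treated as signed. */
    return word >= 0x8000u ? (int32_t)word - 0x10000 : (int32_t)word;
}
```

For high = 100 and low = -100 the helper returns -201, so a signed
subtraction of that operand would raise the values. For high = 0x4000 and
low = -0x4000 it returns 0x7fff, which is non-negative as required.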
Clipping to an Arbitrary Unsigned Range [high, low]
Example 4-22 clips an unsigned value to the unsigned range [high, low].
If the value is less than low or greater than high, then clip to low or
high, respectively. This technique uses the packed-add and
Example 4-21 Simplified Clipping to an Arbitrary Signed Range
; Input:  MM0  signed source operands
; Output: MM0  signed operands clipped to the signed
;              range [high, low]
paddssw MM0, (packed_max - packed_high)                 ; in effect this clips to high
psubssw MM0, (packed_usmax - packed_high + packed_low)  ; clips to low
paddw   MM0, low                                        ; undo the previous two offsets
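A scalar C model of one word lane may help in following the three
instructions. It is a sketch under stated assumptions, not the manual's
code: the names sat16, wrap16 and clip_wide are hypothetical, packed_max
is taken to be 0x7fff and packed_usmax to be 0xffff. The sketch also
assumes that the final wrapping add uses low + 0x8000, because in this
model the two saturating steps leave a net offset of -(low + 0x8000), and
adding plain low would leave the result offset by 0x8000. The listing
itself does not state this.

```c
#include <assert.h>
#include <stdint.h>

/* Signed saturation to the 16-bit range, as paddssw/psubssw apply per word. */
static int32_t sat16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return v;
}

/* Keep the low 16 bits of v and read them as a signed word (wraparound, as in paddw). */
static int32_t wrap16(int32_t v) {
    uint32_t u = (uint32_t)v & 0xFFFFu;
    return u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u;
}

/* Hypothetical helper: one word lane of the three-instruction sequence. */
int32_t clip_wide(int32_t x, int32_t high, int32_t low) {
    assert(high - low >= 0x8000);        /* precondition stated in the text */
    int32_t k1 = 0x7FFF - high;          /* packed_max - packed_high */
    int32_t k2 = 0xFFFF - high + low;    /* 0xffff - high + low, non-negative here */
    int32_t v  = sat16(x + k1);          /* paddssw: values above high saturate at 0x7FFF */
    v = sat16(v - k2);                   /* psubssw: values below low saturate at -0x8000 */
    return wrap16(v + low + 0x8000);     /* paddw: assumed operand cancels the net offset */
}
```

With high = 0x4000 and low = -0x4000, the model returns 0 for an input of
0, 0x4000 for an input of 0x7000, and -0x4000 for an input of -0x7000.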
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966 013US, April 2006