MOTOROLA
DSP96002 USER’S MANUAL
B-111
        move    d2.s,x:(r4)+                    ;save lower 2, point to next
_bfly
        move    x:(r0)+n0,d0.s y:(r4)+n4,d1.s   ;adjust r0,r4
_grp
        lsr     d6      d6.l,n0                 ;bflys/2, make old value new offset
        lsl     d7      n0,n4                   ;ngroups*2, move new offset
        lea     (r0)+n0,r4                      ;new lower leg pointer
_stage
        move    #3,n0                           ;offset to next pair of bflys, minus 1
        move    n0,n4                           ;same
        move    (r4)+                           ;point r4 to second bfly
        do      #n/4,_laststage                 ;do last stage, 2 bflys at a time
        move    x:(r0)+,d0.s                    ;get upper of bfly 1
        move    x:(r0)-,d1.s                    ;get lower of bfly 1, point to upper
        faddsub.s d0,d1 x:(r4)+,d2.s            ;get upper of bfly 2
        move    x:(r4)-,d3.s                    ;get lower of bfly 2, point to upper
        faddsub.s d2,d3 d1.s,x:(r0)+            ;save upper 1
        move    d0.s,x:(r0)+n0                  ;save lower 1, point to next group
        move    d3.s,x:(r4)+                    ;save upper 2
        move    d2.s,x:(r4)+n4                  ;save lower 2, point to next group
_laststage
        end
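The pointer arithmetic of the last-stage loop can be checked with a small Python model. This is a minimal sketch, not a translation of the DSP96002 code: the list `x` stands in for X memory, `last_stage` and `wht` are hypothetical names, and each butterfly is assumed to produce (upper + lower, upper - lower); the operand order and register usage of FADDSUB are not modeled.

```python
def last_stage(x):
    """Sketch of the _laststage loop: distance-1 butterflies, two per pass.

    x stands in for X memory and is updated in place. Each butterfly is
    assumed to produce (upper + lower, upper - lower); FADDSUB's exact
    operand order and register usage are not modeled.
    """
    n = len(x)
    if n % 4:
        raise ValueError("loop runs n/4 times, so n must be a multiple of 4")
    n0 = 3          # move #3,n0: offset to next pair of bflys, minus 1
    n4 = n0         # move n0,n4
    r0, r4 = 0, 2   # assumed: r0 at bfly 1, r4 already at bfly 2
    for _ in range(n // 4):                      # do #n/4,_laststage
        up1, lo1 = x[r0], x[r0 + 1]              # x:(r0)+ then x:(r0)-
        up2, lo2 = x[r4], x[r4 + 1]              # x:(r4)+ then x:(r4)-
        x[r0], x[r0 + 1] = up1 + lo1, up1 - lo1  # save upper 1, lower 1
        x[r4], x[r4 + 1] = up2 + lo2, up2 - lo2  # save upper 2, lower 2
        r0 += 1 + n0                             # (r0)+ then (r0)+n0: next group
        r4 += 1 + n4                             # (r4)+ then (r4)+n4: next group
    return x


def wht(values):
    """Hypothetical reference WHT: halve the butterfly distance each stage,
    then finish with last_stage for distance 1 (natural Hadamard order)."""
    x = list(values)
    n = len(x)
    h = n // 2
    while h > 1:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x[i], x[i + h] = x[i] + x[i + h], x[i] - x[i + h]
        h //= 2
    return last_stage(x)


print(wht([1, 2, 3, 4]))  # [10, -2, -4, 0]
```

Because r0 advances by 1 and then by n0 = 3, each pass moves both pointers forward by 4 words, which is why the loop count is n/4.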
B.1.45.2 Out-of-place WHT
Since the WHT requires 2 loads and 2 stores per butterfly, and only one access per cycle can be made to a
single memory space, a WHT butterfly can execute no faster than 4 cycles when all of its data is in one
memory. However, if the data is split between two memories, then the 2 loads and 2 stores can be performed
in 2 cycles, so each butterfly can execute in 2 cycles. This implementation takes the input data in a single
memory space and, on the first stage of the transform, splits the data into X and Y memory. The middle
stages then perform 4 WHT butterflies in 8 cycles. The last stage is split out and also performs 4 WHT
butterflies in 8 cycles. Thus, except for the first stage, all WHT butterflies are performed in 2 cycles.
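The cycle argument can be made concrete with a little arithmetic. The sketch below assumes one memory access per cycle per memory, counts butterfly cycles only (no loop or setup overhead), and, since the text gives no first-stage figure, charges the first stage a conservative 4 cycles per butterfly. The function names are hypothetical.

```python
import math


def cycles_per_butterfly(accesses_per_cycle):
    """Memory-bound cycles for one butterfly: 2 loads + 2 stores."""
    return math.ceil(4 / accesses_per_cycle)


def stage_shape(n):
    """Return (stages, butterflies per stage) for an n-point radix-2 WHT."""
    stages = n.bit_length() - 1
    if n < 2 or 1 << stages != n:
        raise ValueError("n must be a power of two >= 2")
    return stages, n // 2


def butterfly_cycles(n, first_stage_cpb, later_cpb):
    """Cycles spent in butterflies only (assumed: no loop or setup overhead)."""
    stages, per_stage = stage_shape(n)
    return per_stage * (first_stage_cpb + (stages - 1) * later_cpb)


# 16-point example: 4 stages of 8 butterflies.
single = butterfly_cycles(16, cycles_per_butterfly(1), cycles_per_butterfly(1))
split = butterfly_cycles(16, cycles_per_butterfly(1), cycles_per_butterfly(2))
print(single, split)  # 128 80
```

With everything in one memory, all 32 butterflies cost 4 cycles each; splitting the data after the first stage lets the remaining 24 run at 2 cycles each.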
In this example, a 16-point transform is performed. The input data are in X:0-F, and the output is split
between X and Y memory: the output values, in bit-reversed order starting at X:0, occupy X:0-7 (the first
8 values) and Y:0-7 (the next 8 values). To increase execution speed, an extra block of memory is used at
Y:0-7. Thus, this algorithm requires an extra block in Y memory equal to one-half of the transform data
size in X memory.
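One reading of this layout is that the 16 locations X:0-7 followed by Y:0-7 hold the coefficients in bit-reversed order, so location p holds coefficient bitrev(p). Under that assumption, the hypothetical helper below reports where a given output coefficient ends up; it is a sketch of the addressing, not code from the manual.

```python
def bit_reverse(i, bits):
    """Reverse the low `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def output_location(k, n=16):
    """(space, address) holding output coefficient k.

    Assumes the n outputs fill X:0..n/2-1 and then Y:0..n/2-1, with
    location p holding coefficient bit_reverse(p).
    """
    bits = n.bit_length() - 1
    p = bit_reverse(k, bits)   # bit reversal is its own inverse
    half = n // 2
    return ("X", p) if p < half else ("Y", p - half)


extra_y_words = 16 // 2        # the extra Y block: half the X data size
for k in range(4):
    print(k, output_location(k))
```

For example, coefficient 1 lands at Y:0 and coefficient 2 at X:4, so consecutive coefficients alternate between the two memory spaces.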
If X and Y memory are both on the same port (A or B), then all X and Y memory references are performed
on that port, and the WHT butterfly executes in 4 cycles. This gives an execution time of 1.64
milliseconds at 13.5 MIPS. However, if X memory is on port A and Y memory is on port B, then the memory
bandwidth is doubled, and an X memory access and a Y memory access can occur in a single cycle. This
gives an execution time of 0.939 milliseconds at 13.5 MIPS.
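Converting the quoted times back to instruction cycles shows that the dual-port gain is roughly 1.75x rather than the full 2x the per-butterfly counts suggest, which is consistent with the first stage (and, presumably, loop overhead) not benefiting from the second port. The sketch assumes one instruction cycle per instruction at 13.5 MIPS; the text does not say which transform size these timings apply to, and `ms_to_cycles` is a hypothetical helper.

```python
IPS = 13.5e6  # 13.5 MIPS, assuming one instruction cycle per instruction


def ms_to_cycles(ms, ips=IPS):
    """Convert an execution time in milliseconds to instruction cycles."""
    return ms * 1e-3 * ips


single_port = ms_to_cycles(1.64)   # X and Y on the same port
dual_port = ms_to_cycles(0.939)    # X on port A, Y on port B
speedup = single_port / dual_port
print(f"{single_port:.0f} cycles vs {dual_port:.1f} cycles, {speedup:.2f}x")
```

That works out to about 22,140 versus 12,677 instruction cycles.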