DSP_fft16x16r
4-17
C64x+ DSPLIB Reference
DSP_fft16x16r(N, &x[0], &w[0], brev,y,N/4,0, N)
DSP_fft16x16r(N/4,&x[0], &w[2*3*N/4],brev,y,rad,0, N)
DSP_fft16x16r(N/4,&x[2*N/4], &w[2*3*N/4],brev,y,rad,N/4, N)
DSP_fft16x16r(N/4,&x[2*N/2], &w[2*3*N/4],brev,y,rad,N/2, N)
DSP_fft16x16r(N/4,&x[2*3*N/4],&w[2*3*N/4],brev,y,rad,3*N/4,N)
As discussed previously, N can be either a power of 4 or 2. If N is a power of
4, then rad = 4, and if N is a power of 2 and not a power of 4, then rad = 2. “rad”
controls how many stages of decomposition are performed. It also determines
whether a radix4 or DSP_radix2 decomposition should be performed at the
last stage. Hence, when “rad” is set to “N/4”, the first stage of the transform
alone is performed and the code exits. To complete the FFT, four other calls
are required to perform N/4 size FFTs. In fact, the ordering of these 4 FFTs
amongst themselves does not matter and, thus, from a cache perspective, it
helps to go through the remaining 4 FFTs in exactly the opposite order to the
first. This is illustrated as follows:
DSP_fft16x16r(N, &x[0], &w[0], brev,y,N/4,0, N)
DSP_fft16x16r(N/4,&x[2*3*N/4],&w[2*3*N/4],brev,y,rad,3*N/4, N)
DSP_fft16x16r(N/4,&x[2*N/2], &w[2*3*N/4],brev,y,rad,N/2, N)
DSP_fft16x16r(N/4,&x[2*N/4], &w[2*3*N/4],brev,y,rad,N/4, N)
DSP_fft16x16r(N/4,&x[0], &w[2*3*N/4],brev,y,rad,0, N)
In addition, this function can be used to minimize call overhead by completing
the FFT with one function call invocation as shown below:
DSP_fft16x16r(N, &x[0], &w[0], y, brev, rad, 0, N)
Algorithm
This is the C equivalent of the assembly code without restrictions. Note that
the assembly code is hand optimized and restrictions may apply.
void fft16x16r
(
int n,
short *ptr_x,
short *ptr_w,
unsigned char *brev,
short *y,
int radix,
int offset,
int nmax
)