CA2086174C

CA2086174C - Computational structures for the frequency-domain analysis of signals and systems

Info

Publication number: CA2086174C
Application number: CA002086174A
Authority: CA
Inventors: Duraisamy Sundararajan; M. Omair Ahmad; M. N. Srikanta Swamy
Original assignee: Individual
Current assignee: Individual
Priority date: 1992-12-23
Filing date: 1992-12-23
Publication date: 1998-08-25
Anticipated expiration: 2012-12-23
Also published as: GB2283592A; GB2283592B; GB9322858D0; CA2086174A1

Abstract

Since the invention of the radix-2 structure for the computation of the discrete Fourier transform (DFT) by Cooley and Tukey in 1965, the DFT has been widely used for the frequency-domain analysis and design of signals and systems in communications, digital signal processing, and in other areas of science and engineering. While the Cooley-Tukey structure is simpler, regular, and efficient, it has some drawbacks such as more complex multiplications than required by higher-radix structures, and the overhead operations of bit-reversal and data-swapping. The present invention provides a large family of radix-2 structures for the computation of the DFT of a discrete signal of N samples. A member of this set of structures is characterized by two parameters, u and v, where u (u = 2r, r =
1, 2, . . ., (log2 N)-1) specifies the size of each data vector applied at the two input nodes of a butterfly and v represents the number of consecutive stages of the structure whose multiplication operations are merged partially or fully. It is shown that the nature of the problem of computing the DFT is such that the sub-family of the structures with u = 2 suits best for achieving its solution. These structures have the features that eliminate or reduce the drawbacks of the Cooley-Tukey structure while retaining its simplicity and regularity.
A comprehensive description of the two most useful structures from this sub-family along with their hardware implementations is presented.

Description

The present invention is related to improved fast-Fourier-transform (FFT) analyzers. It is directed to both the butterfly circuits and the interconnection of the butterfly circuits to form FFT analyzers.
The use of the discrete Fourier transform (DFT) for the frequency-domain analysis and design of signals and systems came into widespread use after the publication of the paper "An Algorithm for the Machine Calculation of Complex Fourier Series, in "Math. Comp~-tation., vol. 19, Apr. 1965, pp. 297-301 by Cooley, J. W. and Tukey, J. W. describing, in general, the decomposition of an N-point DFT into a number of DFTs of a smaller size and, in particular, the radix-2 decimation-in-time (DIT) algorithm. The radix-2 decimation-in-frequency (DIF) algorithm was reported later by Gentleman, W. M. and Sande, G. in "Fast Fourier Transforms--for Fun and Profit,"in Proc. AFIPS, Joint Comp~ter Conf., vol. 29, 1966, pp. 563-578. However, both the DIT and DIF algorithms, in general, are referred to as Cooley-Tukey algorithms. A detailed description and the history of the development of these algorithms can be found in Special Issue on FFT and Applications, IEEE Trans.
Alldio Electroaco~st., vol. AU-15, June 1967. Despite the subsequent development of other algorithms, it is the Cooley-Tukey radix-2 algorithm that has been most widely used as presented originally without any significant changes, due to the simplicity, regularity, and efficiency of the resulting computational structure. A drawback of the computational struc-ture of the Cooley-Tukey radix-2 algorithm is that it requires more complex multiplications compared with that of the higher-radix algorithms. In addition, this structure has the over-heads of bit-reversal and data-swapping operations. This disclosure reports the invention of a large set of computational structures, designated as plus-minus (PM) structures, derived from a new family of radix-2 algorithms, designated as the plus-minus (PM) algorithms.

,!' ~ .

2086~ 74 The Cooley-Tukey computational structure based on the radix-2 algorithm uses a divide-and-conquer strategy to decompose the problem of computing an N-point DFT into a num-ber of sub-problems of computing 2-point DFTs each characterized by a basic computational process commonly known as the butterfly operation. Each of the input and output nodes of a butterfly has only one data value. Therefore, a single butterfly computes only one 2-point transform. However, it is possible to design a large number of radix-2 computational structures by making the input data to each node of a basic computational unit a vector rather than a scalar. The computational structures of the present invention are developed from this observation.
Since a 2-point transform is computed with one data value from each of the two input nodes of the butterfly, the existence of more than one data value (i. e., the elments of a vector) at each node implies the computation of two or more 2-point transforms by each butterfly. The ~-element vector formation from the raw given data is carried out by computing ~b-point DFTs in a preprocessing structure (the first part of the PM structure).
The computational structures derived from the PM family of radix-2 algorithms handle the data values as vectors with 2 or more elements and compute more than one 2-point DFTs in each butterfly of the second part of the structure.
In drawings which illustrate embodiments of the invention, ~igure 1 is the butterfly of the 2 x 1 PM DIT DFT structure. Two of the twiddle factors assume a value of 1, whereas the other two differ by a factor of WN, ~igure 2 is the flowchart of the 2 x 1 PM DIT DFT structure for complex-valued input data with N = 32, Figure 3 is the butterfly of the 2 x 1 PM DIF DFT structure. Two of the twiddle factors assume a value of 1, whereas the other two differ by a factor of W~
Figure 4 is the flowchart of the 2 x 1 PM DIF DFT structure for complex-valued input data with N = 32, ~igure 5 is the hardware realization of the butterfly of the 2 x 1 PM DIT DFT structure, 20861 7~
Figure 6 is the hardware realization of the 2 x 1 PM DIT DFT structure for complex-valued input data with N = 8, Figure 7 is the flowchart of the 2 x 1 PM DIT DFT structure for real-valued input data with N = 32, Figure 8 is the flowchart of the 2 x 1 PM DIF IDFT structure for the transform of real-valued data with N = 32, Figure 9 is the butterfly of the 2 x 2 PM DIT DFT structure with four twiddle factors, Figure 10 is the flowchart of the 2 x 2 PM DIT DFT structure for complex-valued input data with N = 32, Figure 11 is the flowchart of the 2 x 2 PM DIT DFT structure for complex-valued input data with N = 16, Figure 12 is the butterfly of the 2 x 2 PM DIF DFT structure with four twiddle factors, Figure 13 is the flowchart of the 2 x 2 PM DIF DFT structure for complex-valued input data with N = 32, Figure 14 is the flowchart of the 2 x 2 PM DIF DFT structure for complex-valued input data with N = 16, Figure 15 is the hardware realization of the basic computing unit of the 2 x 2 PM DIT
DFT structure, Figure 16 is the flowchart of the 2 x 2 PM DIT DFT structure for real-valued input data with N = 32, Figure 17 is the flowchart of the 2 x 2 PM DIF IDFT structure for the transform of real-valued data with N = 32, Figure 18 is the flowchart of the 2 x 1 PM DIT DFT structure for 8 x 8 complex-valued 2-D input data, a twiddle factor W8 is shown only by its exponent s, ~f - - 2086 1 7~
Figure 19 is the flowchart of the 2 x 1 PM DIF DFT structure for 8 x 8 complex-valued 2-D input data, a twiddle factor W8 is shown only by its exponent s, Figure 20 is the flowchart of the 2 x 2 PM DIT DFT structure for 8 x 8 complex-valued 2-D input data, a twiddle factor W8 is shown only by its exponent s, and Figure 21 is the flowchart of the 2 x 2 PM DIF DFT structure for 8 x 8 complex-valued 2-D input data, a twiddle factor W8 is shown only by its exponent s.

Each member of the set of computational structures of the present invention is designated hereafter as u x v PM structure and the algorithm through which it is derived is designated as u x v PM algorithm. The letter u indicates the length of the input vectors to a butterfly.
It is a power of 2 with a positive integer exponent in the range 1 to M--1 (In this disclosure it is assumed that N, the number of samples in the input data set, is equal to an integral power of 2 i.e., M = log2 N is a positive integer.). It is possible to combine fully or partially the multiplication operations of two or more consecutive stages in the PM structures in order to reduce the total number of multiplication operations. The letter v, in the range 1 to (M--log2 u), specifies the number of stages whose mult;plications are combined.
The decomposition process of the computation of an N-point DFT into 2-point DFTsrequires the incorporation of the vectorization process in the definition of the DFT. Starting from the definition of the DFT, we present an equivalent vectorized format for any value of u.
The DFT of a data sequence {x(n)~ n = 0,1, . . ., N--1} is defined as X(k) = ~, 2(n)WNk, k = 0,1, . . ., N--1, (1) n=O
where WN = exp(--j2N). For a positive integer u as defined earlier, (1) can be rewritten as X(k) = {x(0) + x(1--)WN u + ~ ~ ~ + X((fl--1)--)WN ) u }WNk N)WlkN + ~ ~ + 2(1 + (U--1)--) WN } N

'll ( U 1 ) WN + ~ ~ ~ + X (U----1 ) W( --l )k N } (----1 )k (2) Since WN = WU1, (2) can be expressed as 2 0 8 6 1 7 4 X(k) = {x(O) + x(l--)Wuk + ~ ~ ~ + x((u--1)--)WUu l)k}WNk +{x(l) + x(l + 1--)Wulk + ~ ~ ~ + x(l + (u-- 1)--)WU )k}WN +

U ) + (2 U 1)WU + ~ ~ + X(U----1)W(U--l)k}W(N-l)k Note that the expressions inside each pair of braces is a u-point DFT and when evaluated will give rise to u distinct values for k = O, 1, . . ., ?b--1. Let a(n)(n = O, 1, . . ., N _ 1) denote vectors, each consisting of u-elements (Vectors will be denoted by boldfaced letters). The vector a(n) consists of the nth u-point transform values aq (n) (q = O, 1, . . ., u- 1) as defined below.
a(n) = {aO(n),al(n),...,au-l(n)} =
{valuesof theu--pointDFTof {x(n+O--),x(n+1--),...,x(n+(u--1)--)}}, n=0,1,.. ,----1. (4) u Therefore, (3) can be rewritten as X(k) = aq(O)WN + aq(1)WN + ~ ~ ~ + aq(----1)W( u = ~, aq(n)WNk7 k = O, l, . . ., N--1- (5) n=O
Since the values of the u-point transform aq(n) are periodic with a period of u, for values of k equal to or greater than u in (5), the values aq(n) repeat. Therefore, q = k mod u. (6) Replacing the argument k by k + PN in (5) and (6) yields N) (O)WO(k+PU)+aq(1)WN( U + +aq(U 1)WN
k = 0,1, . . .,----1, p = 0,1, . . ., u--1 (7) q = (Ir + p--) mod u. (8) - 2086~ 74 Let us vectorize the output transform values X (k) as A(k) = {Ao(k) , A1 (k) , . . ., Au-l (k)} = { X (k + p - ),p = O,l,...,u - 1}, k = 0,1,.. .,----1. (9) Therefore, (7) can be expressed as Ap(k) = aq(O)WNk(WNU )~P + aq(l)WNk(WNU )lp + ~ - + aq( - - 1)WNU ) (WNU )( U -1)p Ok( l)op + (l)Wlk(Wl)lP + ~ ~ ~ + aq(----1)WN ( U) With the vectorization of the input quantities x(n) as carried out in (4) and the output quantities X (k) as carried out in (9) and from (10), the DFT as defined by (1) can be equivalently written as N - l Ap(k) = ~ (WUI)Pnaq(n)WNk, k = 0,1,..., - - 1 (11) where q = (k + PNU ) mod u and p = O, 1, . . ., u--1. Thus, the PM structures must have a part in which u-element vectors are formed by implementing u-point transforms.
The inverse DFT (IDFT) is given by x(n) = N ~, X(k)WNnk, n = 0,1, . ..,N--1. (12) A set of expressions similar to those given by (4), (9), and (11) can be obtained for the case of IDFT. The input quantities X (k) are vectorized as B(k) = {Bo(k)~ Bl(k),..., Bu-l(k)} =
{values of the u - point IDFT of { X (k + O - ), X (k + 1 - ),..., X (k + (u - 1) - )}}, k = 0,1,. .. ,----1. (13) The output data values x(n) are vectorized as b(n)={bo(n),bl(n), -,bu-l(n)} = {x(n + p - ),p = O,1,~..,u - 1}, n=O,l,..... ,----1. (14) A~

-- 20861 7~
An expression similar to (11) for computing the IDFT can be obtained from (12) using (13) and (14) as ( ) u ~ ( W -l)PkB (k) W Nnk~ n = 0,1,. ~ u 1 (15) where q = (n+ pUN) mod u and p= 0,1,...,u--1.
In the subset of the computational structures with u = 2, data samples are organized as 2-element vectors. For the specific value of u = 2, (4), (9), and (11) become, respectively, as a(n) = {aO(n),al(n)} = {x(n) +x(n+ 2 )~ x(n)--x(n+ 2 )}~
n = 0, 1, . . ~ 2 -1 (16) A(k) = {Ao(k)7Al(k)} = {X (k)~ X (k + 2 )}' k = 0,1,.. ., 2 --1 (17) N - l Ap(k) = ~ (_l)P aq(n)wNk7 k = 0~ 2 - 1 (18) where p = 0 or 1 represents the first or the second element, respectively, of the output vector A(k), and q = k mod 2, N > 2 Similarly, in the case of IDFT, for u = 2, (13), (14), and (15) become, respectively, as B(k) = {Bo(k)7Bl(k)} = {X (k) + X (k + 2 )~ X (k) - X (k + 2 )}' k = 0,1,..., 2 --1. (19) Note that the process of forming the vectors B (k) differs from the one using (12), in that the values are not divided by 2, the value of the divisor N outside the summation in (12).
This is done to avoid the division by a constant more than once.

b(n) = {bO(n),bl(n)} = {x(n),x(n+ 2 )}' n=O,l,..... , 2 -1 (20) ~- 20861 7~

bp(n) = N ~ )PkBq(k)WNnk~ n = 0,1, . ..,----1 (21) where p = O or 1 represents the first or the second element, respectively, of the output vector b(n), and q = n mod 2, N > 2 Equations (18) and (21) can have direct hardware or software implementation to compute the DFT and IDFT, respectively. The decomposition of these equations can be expected to yield faster and more efficient computational structures. Therefore, the procedure for the decomposition of these equations and the derived structures will be presented in detail in the following sections. The procedure of obtaining the computational structures for the parameter values of ~ ~ 2 follow along the same lines. It should be noted that the number of array-index updatings in the PM structures will be proportional to Nu due to the vectorization of the data samples.
The 2 x 1 PM DIT computational structure to compute the DFT for complex-valued in-put data is derived by decomposing (18) corresponding to a number of computational stages where the multiplication operations of consecutive stages are not combined. Decomposition of (18) corresponding to the even-indexed and the odd-indexed input samples aq(n) yields N_1 N_1 Ap(k) = ~ )P(2n)aq(2n)WNnk + ~ (_ I )P(2n+l)a (2n + l)W(2n+l)k (22) n=O n=O
Since (--1)P(2n) = 1 and WNnk = W_k~ (22) can be rewritten as 4 --1 4N_1 Ap(k) = ~, aq(2n)WN + ( - 1)PWN ~ aq(2n + 1)WN . (23) n=O 2 n=~ 2 The output transform Ap(k+ N) which is N samples ahead of the kth sample Ap(k), can be obtained by replacing k by k + N in (23) . The resulting equation, by noting that WN =--i N

and WN =--1, can be written as Ap(k + 4 ) = ~ (--1)naq(2n)WNk + (--1)P(--j)Wk ~ (--1)naq(2n + 1)WNk, n=O 2 n=O 2 k= 0,1,.. , 4 --1. (24) Using (18), the transform of the N even-indexed input samples aq(n), represented by Ao(k) and A1(k), respectively, is given by Ao(k) = ~ aq(2n)W2N
n=O
and 4N_1 Ae (k) = ~ (--1 ) na (2n) wnk n=O 2 and the transform of the N odd-indexed input samples aq(n), represented by Ao(k) and Al~(k), respectively, is given by Ao(k) = ~ aq(2n + 1)WN
n=O 2 and N_1 Al(k) = ~ (_l)naq(2n+ l)WNk n=O
where k = 0,1,,, , N4 _ 1. Now (23) and (24), using the smaller size transforms given above, can be written as Ap (k) = Ao (k) + (--l ) pwNAo (k) (25) and Ap(k + 4 ) = Al (k) + (--l)P(--j)WNAl~(k) (26) where k = O, 1, , N _ 1. From (25) and (26), we see that in order to compute the transform values Ap(k) and Ap(k+ N), the precomputation of Ao(k) and A1(k), and Ao(k) and Al~(k) is required. Thus, the problem of computing an N--point DFT has been decomposed into a problem of computing two N -point DFTs. This process of decomposition can be continued.
In general, the relations governing the basic computation at the rth stage can be deduced from (25) and (26) as Aor+l) (h) = Aor) (h) + A(r) (I ) wS
Al ) (h) = Aor) (h)--Aor) (I ) wS
Aor+l)(l) = A(r)(h) + A(r)(l)WN+ 4 A(r+l)(l) = A(r)(h)--A(r)(l)WN 4 .

where s is an integer whose value depends on the stage of computation r and the index h.
These equations characterize the input-output relations of a butterfly, the basic computing unit, shown in Fig. 1. These equations represent two 2-point DFTs after the input values indexed with I are multiplied with appropriate twiddle factors. In the general case, each butterfly of the PM structures compute ~b 2-point transforms. In Fig. 1, two input ports of the butterfly are marked 100 and 101. Two output ports of the butterfly are marked 102 and 103.
Repeated use of (25) and (26) yields the flowchart of the computational structure for a specific value of N. The set of butterflies for the last stage is found by allowing k to vary from 0 to N _ 1 in (25) and (26). The output vectors of the preceding stage are split into two sets of N-vector DFTs. The set of butterflies for each of the two independent sets of this stage is found by allowing k to vary from 0 to N _ 1 in equations obtained from (25) and (26) by replacing N by N2 As seen from (25) and (26) ( which correspond to the last stage), h and I of the general butterfly differ by a number that ranges from 1 in the first stage to 2N in the last stage. The process of decomposition stops when the size of the transforms reduces to two vectors. The flowchart of the 2 x 1 PM DIT DFT computational structure for N = 32 is shown in Fig. 2. One of the butterflies that makes this structure is shown in dashed box 106. Reference numeral 104 indicates an input vector with index zero and reference numeral 105 indicates an output vector with index zero.
The 2 x 1 PM DIT structure for computing IDFT can be obtained by decomposing (21) in a similar way. The process will lead to relations similar to (25) and (26) with the sign of the exponents of WN negated and--j replaced by j. Finally, each output data vector component bq(n) is divided by N.
The 2 x 1 PM DIF computational structure to compute the DFT for complex-valued in-put data is derived by decomposing (18) corresponding to a number of computational stages where the multiplication operations of consecutive stages are not combined. Decomposition of (18) corresponding to the first-half and the second-half of the input samples aq(n) yields 4 --I I~T_1 A (k) = ~ (--l)Pnaq(n)wNk + ~ (--l)Pnaq(n)wN (27) ~=0 n= 4 ~- - 2086 1 74 Replacing n by n + N in the second summation of the right side of (27) gives 4N - 1 4N - 1 ( 2 8 ) n=O n=O
The pair of summations in (28) can be combined into a single one, giving Ap(k) = ~ (--l)Pn{aq(n) + (--l)P4 (--j)kaq(n + 4 )}WN (29) n=O
The even-indexed and odd-indexed transform values Ap(k) are readily obtained from (29) by replacing k by 2k and k by 2k + 1, respectively, as Ap(2k) = ~, (--l)P {aO(n) + (--l)P 4 (--1) aO(n+ 4 )}WN (30) n=O
and Ap(2k + 1) = ~, (--1) {al (n) + (--1)P 4 (--j)(--l)kal (n + 4 )}WNWNk (31) n=O
where k = 0~ N _ 1. From (30) and (31)~ we see that the input values aq(n) and nq(n+ N4 ) are combined to reduce the problem size. The problem of computing an N--point DFT, therefore, has been decom~iosed into a problem of computing two N2--point DFTs.
This process of decomposition can be continued. In general, the relations governing the basic computation at the rth stage can be deduced from (30) and (31) as aOr+l)(h) = aOr)(h) + aO )(I) a(r+l ) (h) = aOr) (h)--aOr) (I ) aO (I) = al (h)WN + al (I)WN
(r+l)(l) = a(r)(h)WN--a( )(I)WN 4 where s is an integer whose value depends on the stage of computation r and the index h.
These equations characterize the input-output relations of a butterfly, the basic computing unit, shown in Fig. 3. In Fig. 3, two input ports of the butterfly are marked 200 and 201.
Two output ports of the butterfly are marked 202 and 203.
Repeated use of (30) and (31) yields the flowchart of the computational structure for a specific value of N. The set of butterflies for the first stage is found by allowing k to vary . .

from 0 to N _ 1 in (30) and (31). The input vectors of the succeeding stage are split into two sets of N-vector DFTs. The set of butterflies for each of the two independent sets for this stage is found by allowing ~ to vary from 0 to N _ 1 in equations obtained from (30) and (31) by replacing N by N. The process of decomposition stops when the size of the transforms reduces to two vectors. The flowchart of the 2 x 1 PM DIF DFT computational structure for N = 32 is shown in Fig. 4. One of the butterflies that makes this structure is shown in dashed box 206.
The 2 x 1 PM DIF structure for computing IDFT can be obtained by decomposing (21) in a similar way. The process will lead to relations similar to (30) and (31) with the sign of the exponents of WN negated and--j replaced by j. Finally, each output data vector component bq(n) is divided by N.
Fig. 5 shows a possible hardware realization of the butterfly of the 2 x 1 PM DIT
DFT structure. Several of these units can be used, and in the ultimate realization all the stages can be realized with complete hardware without any software at all. For the sal~e of discussion, assume that only one butterfly is available in a computing system as part of an arithmetic unit. The butterfly unit could be made for different precisions and types of arithmetics. The butterfly essentially consists of four basic arithmetic units: multipliers, adders, subtractors, and plus-minus units. Reference numerals 110, 111, 112, 113, 114, 115, 116, and 117 indicate real multipliers. Reference numerals 121 and 122 indicate adders. Reference numerals 120 and 123 denote subtractors. Dashed boxes 140 and 141 point to complex multipliers. Reference numerals 130, 131, 132, and 133 indicate plus-minus units. The + and - symbols on a subtractor indicates the minuend and subtrahend, respectively. A plus-minus unit has a + input and a +/- input and two outputs. The +
output is simply the addition of the two inputs, whereas the - output is the difference of + and +/- inputs. As the sum and the difference produced by the plus-minus units are stored in consecutive locations in the memory, both the results need only one destination address and can be efficiently moved to the memory as a long word. Each butterfly has two twiddle factors WN and WN+ 9 . Assuming that WN = C + jd, WN+ 4 = d--jc. Note that the inputs to the butterfly are decomposed into their real and imaginary parts before they are applied to multiplier or plus-minus units. Similarly, the outputs from the butterfly are also in decomposed form. Specifically, A(r) = AR(r) + jAl~(r)~ (i = 0,1). The input ports of the butterfly for the real and imaginary parts of the first complex input signal are marked 100rO and 100iO, respectivly. For the second input signal, the ports are maked 100rl and 100il. For the third input signal, the ports are maked 101rO and 101iO. For the fourth input signal, the ports are maked 101rl and 101il. The output ports of the butterfly for the real and imaginary parts of the first complex output signal are marked 102rO and 102iO, respectivly. For the second output signal, the ports are maked 102rl and 102il. For the third output signal, the ports are maked 103rO and 103iO. For the fourth output signal, the ports are maked 103rl and 103il. The multipliers, the adders and subtractors, and the plus-minus units of a butterfly can be used as a three-stage pipeline architecture. For each cycle, one set of four complex-valued outputs are produced from four complex-valued input signals. Each stage of the pipeline is busy with the next set of data.
The operation of the pipeline needs buffer units between the stages and control circuits that are not shown in Fig. 5.
Fig. 6 shows the hardware realization of the 2 x 1 PM DIT DFT structure for complex-valued input data with N = 8. The first part of structure marked 150 implements the 2-point transforms to form the vectors and the swapping of the data. The real part of the complex input signal indexed zero is marked 151. The real part of the complex output signal indexed zero is marked 159. The second part of the structure consists of two stages of butterflies. The first stage has no multiplier units as the twiddle factors for this stage are all either 1 or--j. Dashed boxes 152 and 153 are the two first stage butterflies. Dashed boxes 156 and 158 constitute the first butterfly of the second stage. Dashed boxes 154, 155 and 157 constitute the second butterfly of the second stage. The hardware realization for the 2 x 1 PM DIF structures will contain same number of multipliers, adders, etc., but the interconnections will be different.
The transform of a set of real-valued input data is Hermitian-symmetric, i.e., the real part of the transform is even symmetric and the imaginary part is odd symmetric. For an even N, the transform values X(O) and X(N) are real-valued and distinct and X(k) = X*(N--k), 1 < k < 2 --1.

Therefore, for a specific value of N, only the first-half of the butterflies of each group of the computational structure corresponding to complex-valued data are computed during the computation of any stage. Due to this reason, the memory requirement is reduced by one half and the computation requirement is slightly less than one half of that of the structures for complex-valued input data. Therefore, the computational structure is similar to but only one half in size corresponding to complex-valued input data.
In the subset of the computational structures with ?1 = 2, data samples are organized as 2-element vectors. The storage of the vectors differs from that of the complex-valued data.
For the specific value of ~ = 2, vectors defined in (16) and (17) are modified, respectively, as a'(n) = {a(n), a(n + 4 )} =

{{x(n)+x(n+ 2),x(n)--x(n+ 2)},{x(n+ 4)+x(n+ 4 ),x(n+ 4)--x(n+ 4 )}}, N

n = 0,1,.. , 4 --1 (32) A (O) = {A(O), Ao( 4 )} = {{X(O), X( 2 )}~ X( 4 )}, A'(k) = {A(k)} = {X(k),X(k+ 2 )}' N

k = 1, 2, . . ., 4 --1. (33) Fig. 7 shows the flowchart of the 2 x 1 PM DIT DFT computational structure for real-valued input data with N = 32. Reference numeral 160 indicates an input vector indexed zero.
Reference numeral 170 indicates an output vector indexed zero. Although the structure has the same number of stages as its complex-valued counterpart given in Fig. 2, there are some differences that must be noted. The number of butterflies in each stage is one half. In the first stage, 4-point transforms are computed by merging two 2-point transforms.
As a vector contains two 2-point transforms, the four point transform is computed using the elements of a single vector. This operation in Fig. 7 is marked as f indicating the computation of a 4-point transform in the flowchart. A butterfly marked f is shown in dashed box 161. In the implementation, however, the vector formation, its scrambling, and the computation of the 4-point transform are all carried out at the same time. Since the -first output vector has three transform values, the first butterflies in each group marked g is to be interpreted differently. A butterfly marked g is shown in dashed box 162. While the second vector is computed in a regular manner, the computation of the first vector consists of computing the first transform value and two others that are separated from it by one-quarter and one-half of the samples in the group. The other butterflies are similar to the structures of the complex-valued input data. Butterfly with ports marked 163, 164, 165 and 166 is one such butterfly. As only the first half of each group of butterflies in each stage corresponding to the structure for complex-valued data are computed, negation, swapping and conjugation operations must be carried out in order to get the first half of the output transform vectors. The conjugation operations are shown in Fig. 7 by the symbol * and one of them is marked 168. These operations are shown in Fig. 7 as post-extended operations of each stage, but in the implementation they are merged with the main stage and they do not pose any additional computational complexity. Butterfly with ports marked 163, 164,167 and 169 constitute a butterfly for real data. The hardware implementation of the PM structures for real-valued input data for computing DFT is similar to PM DIT DFT
structures for complex-valued input data with the differences as explained above.
While the redundancies due to real-valued data can be exploited in the development of either the DIT or the DIF structures, it is easier to implement the DFT computation using the DIT structure and implement the TDFT computation using the DIF structure.
In the case of IDFT, for the specific value of ~ = 2, the vector definitions (19) and (20) are modified, respectively, as N

B'(O) = {B(O), B( 4 )} =

{{X(0) + X( 2 )~ X(0)--X( 2 )}~ {X( 4 ) + X*( 4 ), X( 4 )--X*( 4 )}}, B'(k) = {B(k)} = {X(k) + X(k + 2 )~ (X(k)--X(k + 2 ))}' N

k = 1,2,. .. , 4 -1 (34) b~(n) = {b(n),b(n+ 4 )} = {{x(n), 2(n+ 2 )}~ {~(n+ 4 ), ~(n+ 4 )}}, n=0,1,.. , 4 -1. (35) Fig. 8 shows the 2 x 1 PM DIF IDFT computational structure for the transform of real-valued data with N = 32. Reference numeral 260 indicates an input vector indexed zero.
Reference numeral 270 indicates an output vector indexed zero. In computing the IDFT, the input is always the first half of the vectors but the computation requires first quarter and third quarter of the input vectors. Therefore, similar to the forward transform, the conversion of the second quarter data into that of the third quarter data requires the op-erations of negation, data swapping, and conjugation. In the IDFT these operations are carried out before the butterfly operations of each stage, i.e., each stage is pre-extended. The conjugation and data swapping operations are shown in the pre-extended stages but they are combined with the butterfly operations without posing any addit;onal computational complexity. Butterfly with ports marked 263,264,265 and 266 is similar to a complex butterfly. Butterfly with ports marked 263,264,267 and 269 is a butterfly for the trans-form of real data. Since the first vector has three real values and one imaginary value that are independent, the first butterfly marked g of each group must be interpreted differently.
While the second vector is computed in the same manner as for the complex-valued data, the computation of the first vector consists of computing the first inverse transform value and three others that are separated from it by one-quarter, one-half, and three-quarter of the samples of each group. A butterfly marked g is shown in dashed box 262. In the last stage, two 2-point IDFTs are computed in a butterfly as usual. However, all the four input and output values are stored in a single vector and this operation is marked as f in Fig.
8. A butterfly marked f is shown in dashed box 261. This operation of this stage can be combined with the operations of scrambling and dividing the transforrn values by N.
The hardware implementation of the PM structures for real-valued input data for comput-ing IDFT is similar to PM DIF IDFT structures for complex-valued input data with the differences as explained above. -In the 2 x 2 PM structures the data values are 2-element vectors and the multiplication operations of each pair of adjacent stages are combined. To derive the 2 x 2 PM DIT
structure for computing the DFT, consider the input-output relations (25) and (26) for the last stage of the 2 x 1 PM DIT structure, as given by Ap(k) = Ao(k) + ( - l)P(Wk Ao(k)) (36) and Ap(k + 4 ) = Ale(k) + (--l)P(--j)(WkAl~(k)) (37) where k = 0,1, . ., N _1. The output transform values Ap(k+N) and Ap(k+3N) which are N8 samples ahead of those given by (36) and (37), respectively, can be obtained by replacing k by k + N in these equations and the resulting equations are given by Ap(k + 8 ) = Ao(k + 8 ) + (--1)P(WN+ 8 Ao(k + 8 )) (38) and Ap(k + 8 ) = Ael(k+ 8 ) + (--l)P(--j)(WN+8 Al~(k + 8 )) (39) where k = 0,1, . . ., N8 - 1. The variables Aq(i) and Aq~(i) (q = 0,1; i = k, k + N8 ) on the right side of (36) through (39) represent the output values of the previous stage. These values, in turn, can be obtained recursively as Ae(k) = Aoe(k) + (--l)PWNkAo~(k), (40) Ap(k+ 8 ) = Alee(k) + (--l)P(--j)wNkAeo(k) (41) A~ (k) = Aoe (k) + (--l )PWNkAo~ (k), (42) and Ap(k + 8 ) = Al~e(k) + (--l)P(--j)wNkAoo(k) where k = 0,1, . . ., N _ 1. Since the terms Aq~(k) and Aq(k + N) Of (36) through (39) have twiddle factors associated with them, (42) and (43) can be modified accordingly as WNAp(k) = WkAoe(k3 + (--l)PW3kA~~(k) and WN 8 Ap(k + 8 ) = WN 8 Al~e(k) + (--1)P(--j)WN 8 Al (k) (45) where k = 0,1, . .., N8 - 1. From (44) and (45), we see that the butterfly of the 2 x 2 PM
structure has up to four non-trivial twiddle factors. The form of these twiddle factors and their interrelationships, in fact, characterizes this structure. The input-output relations of a butterfly of the 2x 2 PM DIT structure with four twiddle factors shown in Fig. 9 can be deduced from (44) and (45) as A(r+l ) (h) = A(r) (h) WN + A( ) (I ) WN

A(r+l ) (h) = A(r) (h) WN--A( ) (I ) WN
A(r+l) (l) = A(r) (h) WN+ 8 + A(r) (l ) WN 8 A(r+l) (I) = A(r) (h)WN+ 8 _ A(r) (I)WN 8 where s is an integer whose value depends on the stage of computation r and on the index h.
Fig. 10 shows the flowchart of the 2 x 2 PM DIT structure for N = 32. There are four stages of computation after the formation of the input vectors. The multiplication operations of the first and the second stages and those of the third and the fourth stages are separately combined. A 2x2 PM DIT butterfly is shown in dashed box 175. Butterflies marked 173 and 174 require no nontrivial multiplications. Butterfly marked 171 requires, in general, two nontrivial multiplications. Butterfly marked 172 requires, in general, four nontrivial multiplications. If M is even, there is an odd number of stages after the formation of the input vectors. In this case, for the purpose of merging of multiplication operations, stages are paired starting from the last stage. The processing of the left out first stage is done in the same way as in the 2 x 1 PM DIT structure. The reason of not pairing this stage with the second stage is that it has only trivial multiplications. As an example, the flowchart for the 2 x 2 PM DIT structure for N = 16, (i.e., M = 4) is shown in Fig. 11.
The 2 x 2 PM DIT structure, as implemented by (36) through (39), (40), (41), (44), and (45), saves 25% of complex multiplications when compared with the implementation of the 2 x 1 PM DIT structure. The multiplications in two adjacent stages in the 2 x 2 PM DIT
structure have been combined such that the second stage of the pair has multiplications only by -j and the first one has less number of multiplications than those of the two stages together of the 2 x 1 PM DIT structure. In general, multiplication operation of two or more consecutive stages of a PM structure can be combined together in order to reduce such operations.

.0-, ~ ~!

-We will now derive the 2 x 2 PM DIF structure for computing the DFT. Expanding (30) such that the summation is only over N terms and noting that WN =--j, yields 8~ ( l)Pn{(ao(n) + (--l)P4 (--1) aO(n+ 4 )) n=O
(_l)P 8 (--j)k(aO(n + --) + (--1)P 4 (--1) aO(n + 8 ))}WN
k = 0~ 4 --1. (46) Changing k to 2k in (46) gives Ap(4k) = ~ (--l)P {(aO(n) + (--l)P 4 aO(n + 4 )) +
n=O
(--l)P8 (--l)k(an(n+ 8 ) + (--l)P4 aO(n+ 8 ))}WN, k=0,1,.. , 8 -1. (47) Similarly, replacing k by 2k + 1 in (46) yields Ap(4k + 2) = ~ )Pn{(aO(n)--(_l)P 4 aO(n + 4 )) +
n=O
(_l)P8 (--j)(--l)k(aO(n+ 8 )--(--l)P4 aO(n+ 8 ))}WN WN, k = 0, 1, . . . ~ 8 --1. (48) Expanding (31) such that the summation is only over N8 terms yields 8~ ( l)Pn{(al(n) + ( l)P 4 (--j)( 1) a1( 4 n=O
(--1)P 8 WN (--j)k(al (n +--) + (--l)P 4 (--j)(--l)kal (n + 8 ))}WNWN, k=0,1,.. , 4 -1. (49) Changing k to 2k in (49) gives Ap(4k+1) = ~ )Pn{(al(n)+(-l)P 4 (--j)al(n+ 4))+
n=O
(--l)P8 WN (--1) (al (n+ 8 ) + (--l)P4 (--j)al(n+ 8 ))}WNW~k, k = 0,1, . . ~ 8 --1. (50) - A~
.~

Similarly, replacing k by 2k + 1 in (49) yields 2 n 8 6 1 7 4 Ap(4k+3) = ~ )P {(al(n)--(--l)P 4 (--j)al(n+ 4 ))+
n=O
(--1)P8 (--j)WN (--1)k(al(n+ 8 )--(--1)P4 (--j)a1(n+ 8 ))}WN WN ~
N

k=0,1,..., 8 -1. (51) The input-output relations of a butterfly of the 2 x 2 PM DIF structures with four twiddle factors shown in Fig. 12, can be deduced from (50) and (51) as (r+l)(h) = aOr)(h)WN + aO (I)WN

(r+l)(h) = aOr)(h)WN--aO (I)WN

a(r+l) (¦) = a(r) (h)W3s + a(r) (I)W3s+ 8 (r+l)(l) = a(r)(h)WN5--a( )(I)WN

where s is an integer whose value depends on the stage of computation r and on the index h.
Fig. 13 shows the flowchart of the 2 x 2 PM DIF structure for N = 32. A 2 x 2 PM DIF
butterfly is shown in dashed box 275. Butterflies marked 273 and 274 require no nontrivial multiplications. Butterfly marked 271 requires, in general, two nontrivial multiplications.
Butterfly marked 272 requires, in general, four nontrivial multiplications. If M is even, there is an odd number of stages after the formation of the input vectors. In this case, in contrast to the 2 x 2 PM DIT structure, stages are paired starting from the first stage for the purpose of merging of multiplication operations. The processing of the left out last stage is done in a similar way as in the 2 x 1 PM DIF structure. The reason for not pairing the last stage with the preceding one is that it has only trivial multiplications. As an example, the flowchart for the 2 x 2 PM DIF structure for N = 16, (i.e., M = 4) iS shown in Fig.
14. The 2 x 2 PM DIF structure, as implemented by (47), (48), (50), and (51), saves 25~o of complex multiplications when compared with the implementation of the 2 x 1 PM DIF
structure. The multiplications in two adjacent stages in the 2 x 2 PM DIF structure have been combined such that the first stage of the pair has multiplications only by -j and the . ;~ ~;
~ .

second one has less number of multiplications than those of the two stages together of the 2 x 1 PM DIF structure.
Fig. 15 shows a possible hardware realization of the smallest basic computing unit of the 2 x 2 PM DIT DFT structure marked 175 in Fig. 10 that repeats itself. It consists of 2 pairs of butterflies, with butterflies in each pair coming from two consecutive stages. Butterflies marked 180, 181, 182, and 183, respectively, are detailed diagrams of butterflis 172, 171, 174, and 173 of Fig. 10, respectively, As seen from Fig. 5, a single butterfly realization has 8 multipliers, 2 adders, 2 subtractors, and 4 plus-minus units. Therefore, the 2 x 2 basic computing unit realization of Fig. 15 has a savings of 25% each for multipliers, adders, and subtractors. The basic computing unit produces 8 complex outputs in a cycle when operated as a pipeline. As the 2 x 2 PM structures have nontrivial multiplications in one of the two consecutive stages, the pipeline of the basic computing unit has only four stages compared with the 6 stages of an equivalent implementation for the 2 x 1 PM butterfly.
The hardware realization for the 2 x 2 PM DIF structures will contain same number of multipliers, adders, etc., but the interconnections will be different.
The 2 x 2 PM DIT DFT structure for real-valued data is similar to the 2 x 1 PM DIT
DFT structure with the advantage of further saving of 25% of complex multiplications by merging the multiplication operations of consecutive stages. Fig. 16 shows the 2 x 2 PM DIT
DFT structure for real-valued data. A butterfly marked g with ports 190, 191, 192, and 193 is slightly different from that of Fig. 7 due to the merging of multiplication operations.
The 2 x 2 PM DIF IDFT structure for the transform of real-valued data is similar to the 2 x 1 PM DIF IDFT structure with the advantage of further saving of 25% of complex multiplications by merging the multiplication operations of consecutive stages. Fig. 17 shows the 2 x 2 PM DIF IDFT structure for the transform of real-valued data. A butterfly marked g with ports 290, 291, 292, and 293 is slightly different from that of Fig. 8 due to the merging of multiplication operations.
The DFT and the IDFT of 2-D discrete signals is usually obtained by computing 1-D
transforms of each row of data followed by 1-D transforms of each column of the resulting data or vice versa. With this approach, direct application of 1-D DFT or IDFT computa-tional structures yields the required transform. Figs. 18,19, 20, and 21 show, respectively c~

the flowchart of the 2 x 1 DIT, 2 x 1 DIF, 2 x 2 DIT, and 2 x 2 DIF computational structures for computing 2-D DFT of 8 x 8 complex-valued data. In these figures, the twiddle factors are of the form W8 and are represented by their exponents. In Figs. 18, 19, 20, and 21, the 2-D input data is read row by row. The column transform is carried out in the first two stages. The third stage makes the vectors for the row transform carried out in the last two stages. In Fig. 18, reference numerals 197 and 198 show, respectively, an input and output vector with index zero. Reference numeral 199 shows a butterfly for the computation of vectors for the row transforms. It is possible to do the row transform first and column transform next by inputting the data column by column. Also it is possible to use DIF
structure for one transform and DIT structure for the next.
In this disclosure, the invention of a large set of computational structures, referred to as the PM computational structures, along with their hardware implementations for transformation of real-valued or complex-valued one-dimensional or two-dimensional dis-crete signals from the time-domain to the frequency-domain and vice versa has been re-ported. These structures have been derived from the design of a large family of radix-2 decimation-in-time and decimation-in-frequency algorithms, referred to as the PM algo-rithms, to compute the discrete Fourier transform or the inverse discrete Fourier transform.
A member of the set of PM structures is characterized by two parameters, ~ and v where ?l (~ = 2r, r = 1, 2, . . ., (log2 N)--1) specifies the size of each data vector applied at the two input nodes of a butterfly and v represents the number of consecutive stages of the structure whose multiplication operations are merged partially or fully. The formation of the vectors of the input data allows the computation of u 2-point DFTs or IDFTs by a single butterfly.
Each computational structure essentially consists of two parts. In the first part, ~-element vectors are formed from the N samples of the given discrete signal. In the second part, each butterfly of an interconnected network of butterflies operating on 2 ~-element input vectors produce 2 ~-element output vectors.
The nature of the problem of computing the discrete Fourier transform is such that a more efficient solution is achieved by the computational structures with ~ = 2. In this disclosure, a detailed description of the 2 x 1 and 2 x 2 PM computational structures for real and complex-valued one and two-dimensional discrete signals have been presented. The 2086 1 7~
PM structures described in this disclosure provide the advantages of less complex multi-plications, less bit-reversals, less array-index updating, and no independent data swapping compared with the Cooley-Tukey radix-2 structure.
From the general principles of the present invention and the description of the two preferred embodiments, the 2 x 1 and 2 x 2 PM structures, presented in this disclosure, those skilled in the art will readily comprehend the various modifications to which the present invention is susceptible. Examples of some of the modifications are listed below: (i) Structures with butterflies that handle vectors of more than one size can be derived. (ii) Structures with varying vector lengths from stage to stage can be realized. (iii) Structures for any positive integer value of the data length N and for vector length ~ less than N
can be designed. (iv) Structures for m-dimensional signals (m > 2) can be designed. (v) Higher-radix structures can be designed. (vi) Structures in which multiplication operations of more than two consecutive stages merged can be derived. For example, when the mul-tiplication operations of three consecutive stages are merged in the 2 x 3 PM structure an additional savings of 83~o in real multiplication operations over that provided by the 2 x 2 PM structures is obtained.

Claims

1. A radix-2 decimation-in-time fast-Fourier-transform butterfly arithmetic unit operating on four complex input signals designated first, second, third, and fourth complex input signals, comprising:
(a) a first complex multiplier circuit, fed by said third complex input signal and a complex twiddle factor signal, for providing a first multiplier output signal indicative of the product of said third complex input signal and said complex twiddle factor signal;
(b) a second complex multiplier circuit, fed by said fourth complex input signal and said complex twiddle factor signal, for providing a second multiplier output signal indicative of the product of said fourth complex input signal and a 90 degrees phase-shifted version of said complex twiddle factor signal;
(c) a first pair of plus-minus units, fed by said first complex input signal and said first multiplier output signal, for providing a first and a second complex butterfly output signal indicative of the sum and difference of said first complex input signal and said first multiplier output signal; and (d) a second pair of plus-minus units, fed by said second complex input signal and said second multiplier output signal, for providing a third and a fourth complex butterfly output signal indicative of the sum and difference of said second complex input signal and said second multiplier output signal.

2. A radix-2 decimation-in-frequency fast-Fourier-transform butterfly arithmetic unit operating on four complex input signals designated first, second, third, and fourth complex input signals, comprising:
(a) a first complex multiplier circuit, fed by said second complex input signal and a complex twiddle factor signal, for providing a first multiplier output signal indicative of the product of said second complex input signal and said complex twiddle factor signal;
(b) a second complex multiplier circuit, fed by said fourth complex input signal and said complex twiddle factor signal, for providing a second multiplier output signal indicative of the product of said fourth complex input signal and a 90 degrees phase-shifted version of said complex twiddle factor signal;

(c) a first pair of plus-minus units, fed by said first and said third complex input signal, for providing a first and a second complex butterfly output signal indicative of the sum and difference of said first and said third complex input signal; and (d) a second pair of plus-minus units, fed by said first and said second multiplier output signal, for providing a third and a fourth complex butterfly output signal indicative of the sum and difference of said first and said second multiplier output signal.

3. A radix-2 fast-Fourier-transform butterfly arithmetic unit operating on four complex input signals designated first, second, third, and fourth complex input signals, comprising:
(a) a first complex multiplier circuit, fed by said first complex input signal and a first complex twiddle factor signal, for providing a first multiplier output signal indicative of the product of said first complex input signal and said first complex twiddle factor signal;

(b) a second complex multiplier circuit, fed by said second complex input signal and a second complex twiddle factor signal, for providing a second multiplier output signal indicative of the product of said second complex input signal and said second complex twiddle factor signal;
(c) a third complex multiplier circuit, fed by said third complex input signal and a third complex twiddle factor signal, for providing a third multiplier output signal indicative of the product of said third complex input signal and said third complex twiddle factor signal;
(d) a fourth complex multiplier circuit, fed by said fourth complex input signal and a fourth complex twiddle factor signal, for providing a fourth multiplier output signal indicative of the product of said fourth complex input signal and said fourth complex twiddle factor signal;

(e) a first pair of plus-minus units, fed by said first and said third multiplier output signal, for providing a first and a second complex butterfly output signal indicative of the sum and difference of said first and said third multiplier output signal; and (f) a second pair of plus-minus units, fed by said second and said fourth multiplier output signal, for providing a third and a fourth complex butterfly output signal indicative of the sum and difference of said second and said fourth multiplier output signal.

4. A radix-2 decimation-in-time fast-Fourier-transform butterfly arithmetic unit comprising the device of claim 3 in which the magnitude of the phase difference between said first and said second complex twiddle factor signal is 45 degrees.

5. A radix-2 decimation-in-frequency fast-Fourier-transform butterfly arithmetic unit comprising the device of claim 3 in which the magnitude of the phase difference between said first and said third complex twiddle factor signal is 45 degrees.