GB2216693A

GB2216693A - Fourier transformation

Info

Publication number: GB2216693A
Application number: GB8905801A
Authority: GB
Inventors: John E Whelchel; James F Mcarthur
Original assignee: E Systems Inc
Current assignee: Raytheon Co
Priority date: 1988-03-14
Filing date: 1989-03-14
Publication date: 1989-10-11
Anticipated expiration: 2009-03-14
Also published as: DE3908276A1; GB8905801D0; GB2216693B; IL89604A; JPH0214363A; IL89604A0

Description

Systolic Fast Fourier Transform Method and Apparatus

BACKGROUND OF THE INVENTION

This invention relates to an apparatus and method for performing Fourier transformations on a set of digital data. More specifically, it relates to a processor for computing the Fast Fourier Transform (FFT) of discrete signals sampled from a continuously received electronic signal. Techniques for computing FFTs involve the rapid computation of multiple Discrete Fourier Transforms (DFTs). The DFT is the tool used to describe the relationship between the time domain and frequency domain representations of discrete signals.

Devices and methods for performing FFTs derive their efficiency from the relationship of the number of data words to be transformed (N) to the number of operations required to compute the DFT ($N^2$). If a large DFT can be replaced by multiple small DFTs (e.g., with radix R of 2 or 4), the number of operations required can be substantially reduced. Further, the computation of multiple small DFTs is a multistage process, with each stage having similar steps. This allows the processor calculating the FFT to have fewer unique components. The number of stages (B) is related to the sample size and the radix by $B = \log_R N$. A large number of stages, however, can increase computational complexity and reduce the accuracy of the results due to round-off errors. Therefore, to increase processing efficiency through increased sample size N, methods for performing FFTs have been driven to a compromise between the DFT radix R and number of stages B, with the radix generally limited to 4 or 8 by the complexity and cost of the switching in the data paths.
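The relationship between N, R and the number of stages B can be illustrated with a short sketch. The function names are hypothetical, and the staged operation count is only a rough assumption (each small R-point DFT is counted as R×R operations and twiddle multiplications are ignored); it is not a figure taken from the specification.

```python
def stage_count(n, radix):
    """Return B = log_R(N), assuming N is an exact integer power of R."""
    stages = 0
    while n > 1:
        if n % radix:
            raise ValueError("N must be an integer power of the radix")
        n //= radix
        stages += 1
    return stages


def operation_counts(n, radix):
    """Rough counts: direct DFT (N^2) versus B stages of N/R small DFTs.

    Assumption: each R-point DFT is evaluated directly (R*R operations).
    """
    b = stage_count(n, radix)
    direct = n * n
    staged = b * (n // radix) * radix * radix
    return direct, staged


print(stage_count(64, 4))        # number of stages for N=64, R=4
print(operation_counts(64, 4))   # (direct, staged) operation estimates
```

For N=64 and R=4 this gives three stages, matching the example used throughout the description.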

Several distinct methods for performing FFTs have been discussed in the prior art and have resulted in dissimilar architectures when implemented in hardware. The first was devised by Cooley and Tukey (Cooley, J.W. and Tukey, J.W., "An Algorithm for the Machine Calculation of Complex Fourier Series", Math. Comput., Vol. 19, April 1965, pp. 297-301). This type has "variable geometry", meaning that data addressing changes from stage to stage.

The second type is the "constant geometry" type, introduced by Pease (Pease, M.C., "An Adaptation of the Fast Fourier Transform for Parallel Processing", Journal of the Association for Computing Machinery, Vol. 15, April 1968, pp. 252-264). The addressing of data remains the same from stage to stage. The only price for achievement of this simplification of the resulting hardware architecture is a change in ordering of the read-only memory (ROM) "twiddle factors" relative to those in the variable geometry type. In both types, the ROM factors ordering will change from stage to stage, and this is generally handled with address counters.

A more recent development is the introduction of "pipeline processors". This architecture divides the computing load into successive parallel stages, allowing simultaneous processing of R channels. One well known example of a pipeline processor in a variable geometry architecture is credited to McClellan and Purdy (McClellan, J.H. and Purdy, R.J., "Applications of Digital Signal Processing", pp. 268-278, Alan V. Oppenheim, editor, 1978, Prentice Hall). In order to keep up with the rate of parallel data input, the computational elements in each stage are themselves 4-point FFTs (instead of DFTs) with 2 stages, each with 4 arithmetic processors per stage. However, as the radix increases to 8 or higher to provide more parallelism, the increasing number of commutator or cross-bar switches becomes prohibitively expensive.

A pipelined FFT processor using the constant geometry architecture has been developed by Corinthios (Corinthios, M.J., "The Design of a Class of Fast Fourier Transform Computers", IEEE Transactions on Computers, Vol. C-20, June 1971, pp. 617-623). This architecture also requires switching and gating for cross channel communication. It also has complex and large memory requirements which become more unwieldy as the radix increases (i.e., memory length is a function of $N/R^2$, therefore the number of memory units required is $R^2$). Implementations of this processor are disclosed in the U.S. patent to Corinthios No. 3,754,128 dated August 21, 1973, and in "A Parallel Radix-4 Fast Fourier Transform Computer", IEEE Transactions on Computers, Vol. C-24, January 1975, pp. 80-92.

Other developments have focused on particular features of the devices and methods just described. The U.S. patent to Perry, No. 4,159,528, dated June 26, 1979, introduces a correction for the phase shift introduced by the Fourier transform. The technique uses a barrel switch and delay elements to make the proper phase correction to outputs from small DFTs before combining them in a larger Fourier transform. The U.S. patent to McGee No. 4,534,009, dated August 6, 1985, implements the McClellan and Purdy architecture and discloses the use of switches and shift registers to increase the arithmetic efficiency of the computational units.

All of the single and multichannel FFT processor architectures just described require some type of interchannel communication path using switches. These paths can dynamically change with time, and with the stage in the variable geometry case.

It is accordingly an object of the present invention to provide an apparatus for performing the FFT of digital data without switches in the cross channel communication paths.

It is another object of the present invention to provide an apparatus for performing the FFT of digital data by implementing a new systolic geometry method where the stage-to-stage structure is substantially duplicated.

It is still another object of the present invention to provide an apparatus for performing the FFT of digital data where the radix size is not limited by complexity and/or cost of the switching arrangement.

It is yet another object of the present invention to provide an apparatus for performing the FFT of digital data that utilizes the phase shifting property of the Fourier transform for part of the data shuffling.

It is a further object of the present invention to provide a method for performing the FFT of digital data without a step for switching data in cross channel communication paths.

These and many other objects and advantages will be readily apparent to one skilled in the art to which the invention pertains from a perusal of the claims and the following detailed description of the preferred embodiments when read in conjunction with the appended drawings.

THE DRAWINGS

FIG. 1 is a depiction of the unswitched order of data words exiting an FFT processor found in the prior art.

FIG. 2 is a depiction of the phase shifted order of the data words exiting an FFT processor of the present invention.

FIG. 3 is a schematic diagram of a 3 stage FFT processor incorporating the phase shifting scheme of FIG. 2.

FIG. 4 is a schematic diagram of a reduced systolic geometry version of the device in FIG. 3.

FIG. 5 is a schematic of a single stage of the processor of FIG. 4.

FIG. 6 is a depiction of the input and output of the FFT computational element in the stage of FIG. 5.

FIG. 7 shows the computational architecture of the FFT computational element of FIG. 6.

FIG. 8 displays the twiddle coefficients used in a version of the FFT computational element of FIG. 7.

FIG. 9 is a smaller ROM version of the twiddle coefficients shown in FIG. 8.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention is an apparatus and method for performing a Fast Fourier Transform (FFT) implementing a new systolic geometry method derived from the constant geometry method of Pease. Digital data are received in windows of N complex data words and sorted in R independent and parallel channels. The data are then sequenced through $\log_R N$ serially arranged stages. Each stage has a data addressing operator (an N x N matrix known as a random access memory (RAM) shuffle operator), a Fourier transform operator having computational elements that are themselves FFTs or DFTs (known as kernels), a random access memory for each channel, a twiddle factor operator and a novel phase shift operator to modify the data to be transformed without cross channel communication.

Conceptually, this operation is accomplished by factoring the standard global shuffle operator for each stage that is used in FFT processors of this general type (see Pease and Corinthios, supra) into three permutation operators. The second of these permutation operators is constrained to operate on one data element in each channel, providing it to the proper kernel. For example, in a radix 4 embodiment, this means that four data words, one from each of four channels, are presented to a 4-point FFT kernel simultaneously in each stage. The first and third of the permutation operators are transformed into the phase shift operators. These operators perform the equivalent of switching the data words (to get the proper data words together for presentation to the kernels) by making phase rotations on the complex data words in each channel separately. In this manner relatively straightforward addressing of random access memories in each channel can be used to obtain the appropriate data words, rather than using the switches disclosed in the prior art.

Throughout the description of the invention that follows, N=64 and R=4 (yielding 3 stages) except where specifically noted. These values are used for example only and should not be construed to be limiting the invention. The method will apply to any radix R and sample size N, although both N and R are generally integer powers of 2.

The constant geometry FFT calculation method of the prior art can be depicted in matrix form as follows (read right-to-left for the order of matrix operations):

$$F_{64} X = \underbrace{F_4 S_{4/16}}_{\text{Stage 3}} \; \underbrace{D_{16} F_4 S_{4/16}}_{\text{Stage 2}} \; \underbrace{D_{64} F_4 S_{4/16}}_{\text{Stage 1}} \; X$$

where,

$X$ = input column data vector of N=64 complex data words,

$S_{4/16}$ = shuffle operator which selects 4 data words spaced 16 data words apart from the input vector and groups them together for input to the Fourier transform operator,

$F_4$ = Fourier transform operator, an N x N block diagonal matrix containing 4-point DFTs along the block diagonal,

$D_{64}$ and $D_{16}$ are "twiddle factors" to be discussed later,

$F_{64}$ = DFT matrix with rows in digit reversed order.

The $S_{4/16}$ shuffle operator changes the order of data in a column vector from a sequence of 0, 1, 2, ..., N-1 to N/R sequences of 0, N/R, 2N/R, ..., (R-1)N/R; 1, [N/R]+1, ..., [(R-1)N/R]+1; ...; [N/R]-1, ..., N-1. The same effect is achieved by shifting R parallel input channels N/R data words relative to adjacent channels. For N=64 and R=4, an input sequence of 0, 1, 2, ..., 63 is changed to

$$\begin{matrix} 0 & 1 & 2 & \cdots & 15 \\ 16 & 17 & 18 & \cdots & 31 \\ 32 & 33 & 34 & \cdots & 47 \\ 48 & 49 & 50 & \cdots & 63 \end{matrix}$$

where each row is one channel and each column is one group of four data words presented to the Fourier transform operator.

The operator $F_4$ is more accurately described as $(I_{16} \otimes F_4)$ where $\otimes$ represents the Kronecker product of the two matrices. As an operator, this matrix is equivalent to performing a 4-point Fourier transform on four data words presented to its input and then repeating the process on consecutive groups of four data words until a complete pass is made through a window of N complex data words. For N=64, sixteen repeated operations are performed.
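The shuffle and the block-diagonal Fourier transform operator described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `shuffle`, `dft` and `block_dft` are hypothetical, data words are held in a plain Python list rather than in R hardware channels, and the small DFT is evaluated directly.

```python
import cmath


def shuffle(x, radix):
    """Group words spaced N/R apart: 0, N/R, 2N/R, ...; 1, N/R+1, ...; and so on."""
    stride = len(x) // radix
    return [x[i + c * stride] for i in range(stride) for c in range(radix)]


def dft(v):
    """Direct R-point DFT of a short list of complex words."""
    r = len(v)
    return [sum(v[n] * cmath.exp(-2j * cmath.pi * k * n / r) for n in range(r))
            for k in range(r)]


def block_dft(x, radix):
    """Apply an R-point DFT to each consecutive group of R words (I kron F_R)."""
    out = []
    for start in range(0, len(x), radix):
        out.extend(dft(x[start:start + radix]))
    return out


shuffled = shuffle(list(range(64)), 4)
print(shuffled[:8])            # first two groups of four word indices
stage_out = block_dft([complex(v) for v in shuffled], 4)
print(len(stage_out))          # sixteen 4-point transforms give 64 outputs
```

For N=64 the loop in `block_dft` runs sixteen times, corresponding to the sixteen repeated 4-point operations mentioned in the text.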

The "twiddle factors" are described in Pease, supra, and are defined as follows:

$$D_{64} = \mathrm{Diag}\left(I_{16},\; D_{16},\; D_{16}^{2},\; D_{16}^{3}\right)$$

where,

$$D_{16} = \begin{pmatrix} W^{0} & 0 & \cdots & 0 \\ 0 & W^{1} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & W^{15} \end{pmatrix}$$

and,

$$W = e^{-i 2\pi/64}$$

The twiddle factor values are coefficients resulting from the derivation of the constant geometry FFT method by Pease. They are calculable from the equation given above and are constant for a given N and R.
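As a rough sketch, the twiddle diagonal can be generated numerically. The code assumes the definitions read $D_{64} = \mathrm{Diag}(I_{16}, D_{16}, D_{16}^2, D_{16}^3)$ with $D_{16} = \mathrm{diag}(W^0, \ldots, W^{15})$ and $W = e^{-i2\pi/64}$; that reading, the function name `twiddle_diagonal`, and the natural ordering of the entries are assumptions, and the reordering used in the constant geometry method is not modelled.

```python
import cmath


def twiddle_diagonal(n=64, radix=4):
    """Diagonal entries of Diag(I, D, D^2, ..., D^(R-1)).

    Assumption: D = diag(W^0, ..., W^(N/R - 1)) with W = exp(-2*pi*i/N).
    """
    block = n // radix
    w = cmath.exp(-2j * cmath.pi / n)
    return [w ** (p * q) for p in range(radix) for q in range(block)]


diag = twiddle_diagonal()
print(len(diag))      # one coefficient per data word in the window
print(diag[17])       # entry W^1 in the second block
```

Because the entries depend only on N and R, such a table can be computed once and stored, which is consistent with the coefficients being held in ROM.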

The vector X will be deleted from ensuing discussions for convenience. The resulting expressions are thus matrix factorizations of the DFT matrix, $F_{64}$.

Consideration of the prior art just described shows that it has similar shuffle operators and Fourier transform operators in each stage. Only the twiddle factors are unique in each stage. The shuffle operators, moreover, are global in that they address data from all channels throughout the significantly large time interval of the window of 64 data words. This feature has tended to constrain hardware configurations into single channel operations.

Significantly, when the data words enter the second stage, they are no longer in the proper sequence for the next $S_{4/16}$ operator due to their presence in the same data channel. The data must be manipulated to present the proper sequence to the second stage. With reference to FIG. 1, data words from the Fourier transform operator 10 are output in four channels A-D in the order indicated by the numbers shown. Data input to the next stage, however, is required in groups of four data words indicated by the circled numbers. To align these data words in their proper channels, cross channel communication has heretofore been required. In the prior art this has been accomplished using switches. The McClellan, Purdy pipeline architecture, for example, performs the cross channel manipulation using a combination of FIFO memories of differing delays and a commutator switch.

As the first step to reaching the systolic geometry method of the present invention, and in order to provide for processing of R parallel channels without switches, the shuffle operator of the prior art is factored into three operators:

$$S_{4/16} = S_{pf}\, S_R\, S_p$$

where,

$S_{pf}$ = fast cyclic shuffle (every data word),

$S_R$ = random access memory (RAM) shuffle, constrained to operate on one data word from each channel so that separate RAMs can be used in each channel,

$S_p$ = slow cyclic shuffle (one every 4th data word).

The $S_p$ shuffle cycles sets of four data words within each larger group of 16 data words. With N=64, there are four groups of 16. The first group of 16 data words is not changed at all. Each set of four data words in the second group of 16 is cycled by one data word as demonstrated below:

$$\begin{pmatrix} 19 & 16 & 17 & 18 & 23 & 20 & 21 & 22 & 27 & 24 & 25 & 26 & 31 & 28 & 29 & 30 \end{pmatrix}^{T} = S_p \begin{pmatrix} 16 & 17 & 18 & 19 & 20 & 21 & 22 & 23 & 24 & 25 & 26 & 27 & 28 & 29 & 30 & 31 \end{pmatrix}^{T}$$

Each set of four data words in the third group of 16 is cycled by two data words and in the fourth group by three data words.
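The slow cyclic shuffle can be sketched as follows. The function name `slow_cyclic_shuffle` is hypothetical, and two points are assumptions rather than statements from the text: the third and fourth groups are cycled in the same direction as the second group, and for windows longer than 64 words the cycle amount wraps modulo R.

```python
def slow_cyclic_shuffle(x, radix=4):
    """Cycle each set of R words by g positions, where g is the index of the
    enclosing group of R*R words (assumed to wrap modulo R)."""
    group = radix * radix
    out = []
    for start in range(0, len(x), radix):
        g = (start // group) % radix
        s = x[start:start + radix]
        # Rotate right by g; g == 0 leaves the set unchanged.
        out.extend(s[-g:] + s[:-g] if g else s)
    return out


result = slow_cyclic_shuffle(list(range(64)))
print(result[16:32])   # second group of 16, each set of four cycled by one
```

Printing `result[16:32]` reproduces the ordering shown for the second group of 16 data words.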

Using the constant geometry method previously discussed, the second and third stage global shuffles are replaced with the three permutation operators as shown below (the first stage shuffle will remain unchanged):

$$F_{64} = F_4\, S_{pf}\, S_R\, S_p\, D_{16}\, F_4\, S_{pf}\, S_R\, S_p\, D_{64}\, F_4\, S_{4/16}$$

The "D" twiddle factors are reordered by multiplying each by $(S_p^{-1} S_p)$ and passing $S_p$ through the "D" factor, creating $S_p D_{16} S_p^{-1}$ in place of $D_{16}$ and $S_p D_{64} S_p^{-1}$ in place of $D_{64}$ (primes indicative of reordering). Each $S_p^{-1}$ then is merged with the $S_p$ from the adjacent channel, in effect cancelling them out, to form:

$$F_{64} = F_4\, S_{pf}\, S_R\, D'_{16}\, S_p\, F_4\, S_{pf}\, S_R\, D'_{64}\, S_p\, F_4\, S_{4/16}$$

$S_{pf}$ and $S_p$ pass through $F_4$ without modifying $F_4$, creating phase shift operators $D_{PH}$ and $D'_{PH}$ and forming the basic systolic geometry method of the present invention.

$$F_{64} = D'_{PH}\, F_4\, S_R\, D'_{16}\, D_{PH}\, F_4\, D'_{PH}\, S_R\, D'_{64}\, F_4\, D_{PH}\, S_{4/16}$$

Much of the constant geometry architecture is retained with differences appearing in twiddle factors and phase rotation operators. The first stage shuffle operator is also retained.

The phase shift operators $D_{PH}$ and $D'_{PH}$ perform the equivalent of cross channel switching by modifying the coefficients of the complex data words. The phase shift operators shift the phase of R of the complex data words in each channel by multiples of 360/R degrees (e.g., for R=4, the phase shifts are 0 for the first channel, $\pi/2$ for the second channel, $\pi$ for the third channel, and $3\pi/2$ for the fourth channel, or equivalently multiplying by 1, j, -1, -j). The operators use the "shift" property of the DFT kernel wherein a cyclic shift in the input domain is equivalent to multiplication by complex exponentials in the transform domain and vice versa. These shifts put the data words in proper sequence without using data words from another channel, thus eliminating the need for cross-channel switching.
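The shift property can be checked numerically on a single 4-point kernel. In this sketch the input values are arbitrary test data, the function names are hypothetical, and the sign convention (forward DFT with a negative exponent, rotation of input n by $e^{+2\pi i n/R}$) is an assumption chosen so that the multipliers for R=4 come out as 1, j, -1, -j.

```python
import cmath


def dft(v):
    """Direct R-point DFT with kernel exp(-2*pi*i*k*n/R)."""
    r = len(v)
    return [sum(v[n] * cmath.exp(-2j * cmath.pi * k * n / r) for n in range(r))
            for k in range(r)]


def rotate_inputs(v, shift=1):
    """Multiply word n by exp(+2*pi*i*shift*n/R); for R=4, shift=1: 1, j, -1, -j."""
    r = len(v)
    return [v[n] * cmath.exp(2j * cmath.pi * shift * n / r) for n in range(r)]


x = [1 + 2j, -0.5j, 3.0 + 0j, 0.25 - 1j]   # arbitrary test words, one per channel
plain = dft(x)
rotated = dft(rotate_inputs(x, 1))

# Each rotated output equals the plain output one position earlier (cyclically),
# i.e. the phase rotation moved the results without any cross-channel switch.
for k in range(4):
    print(k, abs(rotated[k] - plain[(k - 1) % 4]) < 1e-9)
```

The phase rotation before the kernel has the same effect as cyclically shifting the kernel outputs by one position, which is the behaviour the phase shift operators rely on.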

With reference to FIG. 2, in the present invention data entering the Fourier transform operator 10 is phase shifted as discussed above in a phase shifting unit 5. Data words exiting FFT 10 are output in four channels A-D in the order indicated by the numbers shown. In contrast to the order of data words shown in FIG. 1, the pre-FFT shift causes the data words required for input to the next stage (indicated by the circles) to now appear in separate channels. The next set of inputs to the next stage (shown in the squares) are similarly arranged. The selection of these and succeeding sets of four inputs from their respective channels is accomplished through a random access memory (RAM) addressing unit 20.

Another phase shift process is performed after data exit each stage's FFT for the following reason: With further reference to FIG. 2, observe that the second set of data points (#1, 17, 33, and 49) are in parallel, but are shifted by one from their proper destination RAM. Again, rather than switching data paths we utilize the shift property, this time in reverse, and post multiply (term by term) the $F_4$ computational unit output of the following stage by appropriate multipliers (1, j, -1, -j for R=4) to achieve this same effect. These are simple phase shifts that can be absorbed into the ROM held coefficients $D'_{16}$ and $D'_{64}$ which are normally implemented in a radix 4 transform after each $F_4$.

A simplification to the systolic geometry method may be made by merging the phase shift operators with the twiddle factors. This reduces computational loading by creating a single complex multiplication of the data in each channel, replacing the two complex multiplications required for the twiddle factors and the phase shift operators. Operation of the merged operator is handled in a read-only-memory (ROM) unit.

The merge is accomplished by direct merge or by passing the phase shift operation $D'_{PH}$ through the RAM shuffle operator $S_R$.

$$F_{64} = D'_{PH}\, F_4\, S_R\, \overbrace{D'_{16}\, D_{PH}}^{\text{Merge}}\, F_4\, \overbrace{D'_{PH}\, S_R\, D'_{64}}^{\text{Merge}}\, F_4\, D_{PH}\, S_{4/16}$$

The final result is a reduced systolic geometry method.

$$F_{64} = \underbrace{D'_{PH}\, F_4\, S_R}_{\text{Stage 3}} \; \underbrace{D_{PH2}\, F_4\, S_R}_{\text{Stage 2}} \; \underbrace{D_{PH1}\, F_4\, D_{PH}\, S_{4/16}}_{\text{Stage 1}}$$

where,

$$D_{PH2} = D'_{16}\, D_{PH}$$

and,

$$D_{PH1} = D''_{PH}\, D'_{64}$$

(where $D''_{PH}$ is a reordered $D'_{PH}$).

With reference now to FIG. 3, a schematic is shown of the elements of an N=64, R=4, 3 stage FFT apparatus embodying the systolic method of the present invention. Data flow is from left to right.

Shuffle unit 100 receives data words 90 and arranges them into four channels A-D. While it is envisioned that the shuffle unit would perform the $S_{4/16}$ shuffle previously described, this unit could be removed where data is already available in the order provided by the $S_{4/16}$ shuffle. Phase rotator 110 is a multiplier unit that shifts the phase of the data words so that the order shown in FIG. 2 is available after the subsequent Fourier transform operation. For radix 4, the phase rotator 110 simply multiplies each channel by powers of j: for example channel A by 1 ($j^0$), B by j ($j^1$), C by -1 ($j^2$), and D by -j ($j^3$). At higher radix operations, the multiplier is more complex (e.g., for radix 8 the phase in each of 8 channels must be shifted by multiples of 45°).

Fourier transform operator 120 performs the DFT operations, calculating the Fourier transform of sets of 4 data words, one from each channel. Twiddle elements 130 include a memory for storing predetermined coefficients and a multiplier to multiply the data words by the coefficients. RAM units 140 perform the RAM shuffle ($S_R$) and are the data interfaces between stages.
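The difference between the radix 4 rotator and a higher radix rotator can be illustrated with a small sketch. Both function names are hypothetical, and the observation that multiplication by a power of j needs only swaps and sign changes is offered as an illustration, not as a description of how phase rotator 110 is actually built.

```python
import cmath


def rotate_by_power_of_j(re, im, power):
    """Multiply (re + j*im) by j**power using only swaps and sign changes."""
    power %= 4
    if power == 0:
        return re, im
    if power == 1:
        return -im, re
    if power == 2:
        return -re, -im
    return im, -re


def rotation_factor(channel, radix):
    """General multiplier exp(2*pi*i*channel/R), i.e. a multiple of 360/R degrees."""
    return cmath.exp(2j * cmath.pi * channel / radix)


print(rotate_by_power_of_j(3, 4, 1))   # channel B: (3 + 4j) * j
print(rotation_factor(1, 8))           # radix 8, one step of 45 degrees
```

For radix 8 the factor for channel 1 has equal, non-trivial real and imaginary parts, so a true complex multiplication is needed, which is why the multiplier becomes more complex at higher radix.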

For N greater than 64, stage 2 is repeated except that the second and subsequent stages include a post-FFT phase rotator before each pre-FFT phase rotator 110. As discussed above, the post-FFT phase rotator allows the second set of data to be accepted properly from the RAMs. For N=64, this function was absorbed in the reordered twiddles $D'_{16}$ and $D'_{64}$.

The reduced systolic geometry method is embodied in FIG. 4 wherein the identification numbers correspond to those in FIG. 3. Phase rotators 150 are the merged operators containing the twiddle coefficients merged with the phase shift multipliers.

The stage-to-stage symmetry of the device in FIG. 4 may be further enhanced by merging the phase rotators 110 in Stage 3 with a unity twiddle operator and by addressing the operators in phase rotator 110 in Stage 1 with a RAM element. These changes create a device wherein each stage has a RAM element, a Fourier transform operator and adjusted twiddle coefficients in programmable ROM (PROM).

An FFT processor unit used as a stage in a multistage FFT apparatus and including these three pieces is shown in FIG. 5. For N=64 and R=4, three FFT processor units would be required.

The FFT processor unit 200 in FIG. 5 may be comprised of seven elements: an I/O RAM element 220, RAM addressing element 240, twiddle memory element 260, twiddle addressing element 280, FFT computational element 300, built-in test (BITE) element 320, and control element 340.

The I/O RAM element 220 is the data interface between successive stages. It receives input data 225 from the previous stage (or, for stage 1, from system inputs through an element such as a shuffle operator, not shown). The data may have two 16-bit components, forming a complex data word having a phase. The I/O RAM element 220 may be separated into four identical ports which interface directly with the four ports of the computational element 300. Each port may consist of recursive data buffers, and a double buffer RAM module. Each of the four RAM modules may consist of multiple RAMs (e.g., for N=4096, two 2K x 16 RAMs) with multiplexed inputs and outputs so that new input data can be written into RAM at the same time as data is being read from RAM. In this manner the FFT processor unit can be 100% utilized. RAM element 220 receives write address commands in bus 227 and read address commands in bus 229 from the RAM addressing element 240. Output data in bus 235 is provided to the FFT computational element 300.

The RAM addressing element 240 may consist of multiple PROMs (e.g., for N=4096, four 2K x 8 PROMs). RAM addressing is divided into read and write addressing and into complex and real addressing.

The complex addressing portion uses the respective read or write addresses to store a complex data value. Real addressing adds an LSB to the complex address which controls access to the in-phase and quadrature components of each complex data value. For example, an LSB value of 0 may correspond to the in-phase or real component and a value of 1 to the quadrature or imaginary component. The FFT data can thereby be referenced as complex data using the complex address, with the inherent knowledge that the value of the LSB determines the real or imaginary component.
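The complex and real addressing scheme amounts to appending one bit to the complex address. A minimal sketch, with hypothetical function names and assuming the example convention that an LSB of 0 selects the in-phase component and 1 the quadrature component:

```python
def real_address(complex_address, quadrature):
    """Append an LSB to the complex address: 0 = in-phase, 1 = quadrature."""
    return (complex_address << 1) | (1 if quadrature else 0)


def split_address(address):
    """Recover (complex_address, lsb) from a real address."""
    return address >> 1, address & 1


print(real_address(5, False))   # in-phase component of complex word 5
print(real_address(5, True))    # quadrature component of complex word 5
print(split_address(11))        # back to (complex address, LSB)
```

With this layout a sequential real counter visits the in-phase and then the quadrature component of each complex word in turn.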

All RAM write addressing is sequential, driven by a count corresponding to a particular complex Fourier transform computation, per FFT stage. Each successive complex point is written sequentially, with the real component first and imaginary component second. Thus a real counter can address the I/O RAM element 220. This sequential count is performed simultaneously for each of the four ports, and therefore resides on a single bus 227.

The read addressing is considerably more involved than the write addressing due to the RAM shuffle ($S_R$) of the input data to the FFT computation element 300. Since the data was written to RAM in sequential order in its proper channel, the read addressing must provide the selection of the required data words for each Fourier transform operation. The read addressing is accomplished through instructions in PROM memory addressed by a sequential counter tracking the current complex Fourier transform computation number. Each of the 4 RAMs in I/O RAM element 220 requires independent addressing in parallel in bus 229.

The twiddle memory element 260 may consist of multiple PROMs (e.g., for N=4096, eight 2K x 8 PROMs) which contain the twiddle factor coefficients that are supplied to the FFT computational element 300. The data are supplied via four parallel 16-bit ports in bus 265. As will be discussed later, the actual twiddles stored are those required for stage one of a transform for a predetermined N and R. From this stored data, the twiddle addressing scheme in twiddle addressing element 280 selects the correct coefficient for any Fourier transform operation per particular stage and transform size.

The twiddle addressing element 280 may consist of multiple PROMs (e.g., for N=4096, three 2K x 8 PROMs) controlled by counters tracking the complex Fourier transform operation number as in the RAM addressing element 240. The element uses a masking scheme in which the addressing is incremented by an appropriate amount to compensate for different passes or transform sizes. Instructions are fed to the twiddle memory element 260 via line 275.

The control element 340 generates on-board control signals 345 for other elements and provides a clock-in signal 355 to synchronize all I/O data words and coefficients.

The BITE element 320 may draw on output data 305 as a source for test data using line 315. For example, by drawing on one port of output data 305, the BITE element 320 may use a 16-to-1 data multiplexer and an external control 325 to select one of 16 bits for external monitoring via line 335.

The FFT computational element (FFTCE) 300 is discussed with reference to FIG. 6, wherein the identification numbers are the same as in FIG. 5. While the FFTCE may be a single chip of unique design, its functions may also be performed by a cascade of existing chips such as the IBM SPE chip. The function of the FFTCE is to calculate the FFT matrix operations and to provide array scaling and rounding. It receives input data via bus 235, twiddle coefficients from bus 265, control functions via line 345, and the clock-in signal in line 355. Output data is supplied via bus 305. (The number adjacent each port or bus in FIG. 5 indicates the number of lines per port or bus.)

The FFTCE 300 calculates the radix 4 four point DFT by solving the following:

$$\begin{aligned}
T(k) &= [D(k) + D(k+1) + D(k+2) + D(k+3)] \times C(k) \\
T(k+1) &= [D(k) - jD(k+1) - D(k+2) + jD(k+3)] \times C(k+1) \\
T(k+2) &= [D(k) - D(k+1) + D(k+2) - D(k+3)] \times C(k+2) \\
T(k+3) &= [D(k) + jD(k+1) - D(k+2) - jD(k+3)] \times C(k+3)
\end{aligned}$$

for k = 0, 4, 8, ..., N-4

where

T(i) = output vector,

D(i) = data input vector,

C(i) = twiddle coefficient or phase rotation vector,

j = square root of -1.

Note, because this is a radix 4, the phase rotator operation has been included in the equations by adding the C(i) vector. At higher radix operations, the phase rotation operation will require a separate complex multiplication operation (not shown). Alternatively, the phase rotators may be merged with the twiddle memory element coefficients to save a complex multiplication step.
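The four equations translate directly into a small routine. This is a floating-point sketch with a hypothetical function name; the fixed-point arithmetic, scaling and rounding performed by the FFTCE are not modelled.

```python
def radix4_butterfly(d, c):
    """Compute T(k)..T(k+3) from four data words d and four coefficients c.

    Each bracketed sum is one row of the 4-point DFT; each result is then
    multiplied by its twiddle or phase rotation coefficient.
    """
    d0, d1, d2, d3 = d
    return [
        (d0 + d1 + d2 + d3) * c[0],
        (d0 - 1j * d1 - d2 + 1j * d3) * c[1],
        (d0 - d1 + d2 - d3) * c[2],
        (d0 + 1j * d1 - d2 - 1j * d3) * c[3],
    ]


unity = [1, 1, 1, 1]
print(radix4_butterfly([1, 1, 1, 1], unity))   # constant input: energy in T(k) only
print(radix4_butterfly([0, 1, 0, 0], unity))   # unit impulse in the second word
```

With all coefficients set to 1 the routine reduces to a plain 4-point DFT, which is a convenient way to check the signs of the j terms.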

Data may be received via bus 235 as data words. The data words may be complex, consisting of an in-phase component and a quadrature component. Each component is a 16-bit, fixed point, signed fractional, two's complement number. Four data words are received simultaneously, forming the vector D(i) in the equations shown above. Similarly, four twiddle coefficients are received from bus 265 in sync with the four data words, forming the vector C(i). With N=64 in this example, the FFTCE receives 16 vectors D(i) and C(i) for each pass through a stage, and outputs 16 vectors T(i), each comprised of four data words. The output is sent to the next stage via bus 305.
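The 16-bit signed fractional format can be illustrated with a pair of conversion helpers. The sketch assumes the binary point sits immediately after the sign bit (commonly called Q15) and that out-of-range values saturate; neither detail is stated in the text, and the function names are hypothetical.

```python
def to_q15(value):
    """Quantize a fraction in [-1, 1) to a 16-bit two's complement word.

    Assumptions: 15 fractional bits and saturation at the range limits.
    """
    scaled = int(round(value * 32768))
    scaled = max(-32768, min(32767, scaled))
    return scaled & 0xFFFF


def from_q15(word):
    """Interpret a 16-bit word as a signed fraction."""
    signed = word - 0x10000 if word & 0x8000 else word
    return signed / 32768.0


print(hex(to_q15(0.5)))      # positive half scale
print(hex(to_q15(-0.25)))    # negative value, two's complement
print(from_q15(0x8000))      # most negative representable fraction
```

A complex data word would then be a pair of such 16-bit values, one for the in-phase component and one for the quadrature component.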

With further reference to FIG. 6, the FFTCE 300 may output a control word "scale factor out" 347 which equals the number of right shifts required to prevent overflow during subsequent processing. The output is received in the next stage as a "scale factor in" 349.

An embodiment of the architecture for the FFTCE is shown in FIG. 7, wherein the identification numbers correspond to those in FIG. 5. The data words in a vector D(i) enter in bus 235 and are appropriately scaled. Arithmetic logic units (ALU) 312, delays 314, "j" multiplier 316, and complex multipliers 318 using the C(i) vector from bus 265 provide the vector T(i) which is output in bus 305.

The values for vector C(i) stored in a twiddle memory element (260 in FIG. 4) of an N=64, R=4 systolic geometry FFT processor unit are shown in FIG. 8. The count of the Fourier transform calculations is shown in column A. The four values of C(i) for each stage are shown in columns B-D. Each value represents "m" in the term $\exp(-j2\pi m/N)$, where N=64.

The stage 2 twiddle coefficient values are seen to be a subset of those in stage 1. For larger N, this relation extends to later stages. When the power of j phase rotator is merged with the twiddles, however, the repetitiveness is destroyed. The stage 3 values shown are the result of merging the power of j multiplier with the unity twiddle in stage 3. For the case where the power of j multiplier is not merged with the twiddle in stage 3, all twiddle values in stage 3 are zero.

ROM requirements may be further reduced by combining twiddle coefficients as shown in FIG. 9. A hardwired addressing scheme in the twiddle addressing element (280 in FIG. 4) selects the appropriate C(i) value, using the Fourier transform calculation number and the stage numbers. In the last stage only the circled values are accessed. For higher N, the patterns are similar, except that a variation of the circled group is repeated. For example, with N=256, each column would contain four consecutive repetitions of [0, 64, 128, 192].

Software simulation of the method just described shows that it could be used in a radix 2, 4, 8, or higher radix pipeline FFT. As the radix increases, the throughput rate goes up due to a greater degree of parallelism, provided the FFT computational processors at each stage can be structured with a sufficient amount of parallelism in their internal architecture to keep up with the higher throughput rates. Depending on the processing speed required, high radix architecture may call for a pipeline processor within the FFT computational element (e.g., 8 point FFT, etc.).

While the method just described is a decimation-in-frequency (DIF) type, it should be understood that a decimation-in-time (DIT) version can be generated from the DIF method by performing a matrix transpose of the DFT matrix, $F_{64}$, when it is symmetric.

$F_{64}$ is made symmetric by post multiplying with the shuffle matrix $S_{4/16}$ which reverts the rows of $F_{64}$ to natural (increasing frequency) order. The DIT version (after post-multiplying the DIF version by $S_{4/16}$ and transposing the result) is:

$$F_{64}^{T} = S_{16/4}\, D_{PH}\, F_4\, D_{PH1}\, S_R^{T}\, F_4\, D_{PH2}\, S_R^{T}\, F_4\, D'_{PH}\, S_{16/4}$$

where,

$S_{16/4} = S_{4/16}^{T}$,

$S_R^{T}$ is the transpose of $S_R$, and

$F_{64}^{T}$ is the DIT DFT matrix.

Since $F_4$ is a block diagonal matrix containing 4-point kernel DFTs along the diagonal and each DFT is assumed in natural order and therefore a symmetric 4-by-4 matrix, the matrix $F_4$ is therefore symmetric and unchanged by transpositions. All of the "D" matrices containing phase rotation and twiddle factors are already diagonal and therefore symmetric. Only the shuffle matrices $S_{4/16}$ and $S_R$ are changed by the transpose operation. Note, the twiddle factor matrices now precede the $F_4$ matrices, which is characteristic of decimation-in-time processors and methods.
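The symmetry argument can be checked numerically. The sketch below builds the block diagonal Fourier transform operator and a shuffle permutation matrix as nested lists; the function names are hypothetical, and the permutation is built from the shuffle ordering 0, 16, 32, 48, 1, 17, ... used earlier in the description.

```python
import cmath


def dft_matrix(r):
    """R-point DFT matrix in natural order."""
    return [[cmath.exp(-2j * cmath.pi * k * n / r) for n in range(r)]
            for k in range(r)]


def block_diagonal(block, copies):
    """Place `copies` of `block` along the diagonal of a larger matrix."""
    size = len(block)
    n = size * copies
    out = [[0j] * n for _ in range(n)]
    for c in range(copies):
        for i in range(size):
            for j in range(size):
                out[c * size + i][c * size + j] = block[i][j]
    return out


def shuffle_matrix(n, radix):
    """Permutation matrix whose row i selects source word order[i]."""
    stride = n // radix
    order = [i + c * stride for i in range(stride) for c in range(radix)]
    return [[1 if col == src else 0 for col in range(n)] for src in order]


def is_symmetric(a, tol=1e-12):
    n = len(a)
    return all(abs(a[i][j] - a[j][i]) <= tol for i in range(n) for j in range(n))


print(is_symmetric(block_diagonal(dft_matrix(4), 16)))   # block diagonal F4
print(is_symmetric(shuffle_matrix(64, 4)))               # shuffle permutation
```

The block diagonal operator passes the symmetry test while the shuffle permutation does not, which is why only the shuffle matrices change under transposition.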

In an alternative embodiment, the functions of the phase shift operators could be performed with cyclic commutator switches. While this does introduce cross channel communication, it does so outside of the FFT computational element. Such an embodiment may be attractive for low speed, low radix operations.
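One way to read this equivalence, offered here as an assumption rather than as the specification's own derivation, is the DFT shift theorem: multiplying the R inputs of an R-point DFT by consecutive multiples of 360/R degrees gives the same result as cyclically rotating the R outputs. The sketch below illustrates this for R = 4; the function names `phase_shift` and `commutate` are hypothetical.

```python
import cmath

def dft(x):
    """Direct n-point DFT with kernel exp(-2*pi*j*m*k/n)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def phase_shift(x, s):
    """Rotate the word in channel m by m*s*(360/n) degrees, with no channel crossing."""
    n = len(x)
    return [x[m] * cmath.exp(2j * cmath.pi * m * s / n) for m in range(n)]

def commutate(x, s):
    """Cyclic commutator: output channel k takes the word from channel (k - s) mod n."""
    n = len(x)
    return [x[(k - s) % n] for k in range(n)]

words = [1 + 2j, -3j, 0.5, 2 - 1j]  # arbitrary test data, one word per channel
for s in range(4):
    via_phase = dft(phase_shift(words, s))   # phase shift before the DFT
    via_switch = commutate(dft(words), s)    # switch after the DFT
    print(s, max(abs(a - b) for a, b in zip(via_phase, via_switch)))
```

For R = 4 the multipliers in `phase_shift` are powers of j, so the phase shift form needs no general complex multiplier, whereas the commutator form moves words between channels.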

The techniques described above could also be applied in a recursive FFT processor. Recursive processor applications, wherein the same hardware is used repetitively, may be appropriate when low processing speed is acceptable.
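The idea of reusing one stage repetitively can be sketched in software. The example below is an illustration under stated assumptions: it uses a conventional in-place radix-4 decimation-in-frequency stage, not the shuffle and phase shift arrangement of the embodiments, and the names `radix4_stage`, `digit_reverse4` and `recursive_fft` are hypothetical. A single stage routine is invoked Log_R N times, followed by a base-4 digit reversal of the output order.

```python
import cmath

W4 = [1, -1j, -1, 1j]  # powers of exp(-2*pi*j/4), the 4-point DFT kernel

def radix4_stage(x, span):
    """One DIF stage: 4-point DFTs over words span/4 apart, then twiddles."""
    q = span // 4
    out = list(x)
    for base in range(0, len(x), span):
        for i in range(q):
            a = [x[base + i + m * q] for m in range(4)]
            for k in range(4):
                s = sum(a[m] * W4[(m * k) % 4] for m in range(4))
                out[base + i + k * q] = s * cmath.exp(-2j * cmath.pi * i * k / span)
    return out

def digit_reverse4(i, digits):
    """Reverse the base-4 digits of index i."""
    r = 0
    for _ in range(digits):
        r = r * 4 + i % 4
        i //= 4
    return r

def recursive_fft(x):
    """Apply the same stage routine repeatedly; len(x) must be a power of 4."""
    n, stages, span = len(x), 0, len(x)
    while span >= 4:
        x = radix4_stage(x, span)  # the one "hardware" stage, reused
        span //= 4
        stages += 1
    return [x[digit_reverse4(k, stages)] for k in range(n)]

def direct_dft(x):
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

data = [complex(i, (-i) % 3) for i in range(16)]  # arbitrary 16-word input
error = max(abs(a - b) for a, b in zip(recursive_fft(data), direct_dft(data)))
print(error)
```

Because each pass must finish before the next begins, throughput falls by roughly the number of stages relative to a pipeline, which is consistent with the remark that this form suits applications where low processing speed is acceptable.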

While the preferred embodiments of the present invention have been described, variations and modifications will naturally occur to those skilled in the art from a perusal hereof. It is, therefore, to be understood that the embodiments described are illustrative only and that the scope of the invention is to be defined solely by the appended claims when accorded a full range of equivalence.


Claims

What is claimed is:

1. An apparatus for calculating the Fast Fourier Transform of complex data words, comprising:

input memory means for receiving said data words and arranging them in R channels, having R outputs corresponding to said R channels;
plural serially connected stages for transforming said data words, wherein the first of said stages receives said data words in R channels from said input memory means output; and
output memory means connected to the last of said serially connected stages for outputting said data words;
wherein each of said stages comprises:
R input means and R output means corresponding to said R channels for connecting said stages;
R shuffle means for aligning in time R of said data words in a vector, wherein each said vector comprises one of said data words from each of said channels;
Fourier transform operator for performing a Fourier transform on each said vector;
R first multiplier means for shifting the phase of said data words before said data words are input to said Fourier transform operator;
R second multiplier means for shifting the phase of said data words after said data words are output from said Fourier transform operator, wherein said second multiplier means is not in the first of said serially arranged stages; and
R third multiplier means for providing predetermined coefficients to each of said data words;
wherein between said input memory and said output memory data words in one of said channels are not transferred to another of said channels, except within said Fourier transform operator.

2. The apparatus as defined in claim 1 wherein one of said R third multiplier means is merged with one of said R first multiplier means.

3. The apparatus as defined in claim 1 wherein one of said third multiplier means is merged with one of said second multiplier means.

4. The apparatus as defined in claim 1 wherein said Fourier transform operator comprises a block diagonal matrix having R-point Fourier transforms along the block diagonal.

5. The apparatus as defined in Claim 1, wherein said first multiplier means comprises multiplier means for shifting the phase of said data words by integer multiples of 360/R degrees.

6. The apparatus as defined in Claim 1, wherein each said third multiplier means comprises:

one or more programmable read only memory means containing said predetermined coefficients;
tracking means for counting the number of operations performed by said Fourier transform operator; and
an addressing element for selecting said predetermined coefficient from said programmable read only memory means using said tracking means.

7. Apparatus for performing Fourier transformations on an input stream of N digital data words, each having a phase, comprising:

receiving means to receive said data words in R channels;
processing means for processing said data words in Log_R N serially arranged stages, wherein each of said stages comprises:
shuffle means comprising an N by N matrix for aligning in time R of said data words in a vector, wherein each said vector comprises one of said data words from each of said R channels drawn at intervals of N/R from said stream of N digital data words;
phase shifting means for modifying said data words without transferring said data words among said channels, wherein the phase of R data words in each of said channels is shifted by an integer multiple of 360/R degrees; and
Fourier transform means for performing Fourier transforms on each said vector, comprising plural matrix operators of radix R.

8. A processor for performing the Fast Fourier Transform of digital data words having a phase, comprising:

plural serially arranged Fourier transform means having plural channels for performing Fourier transforms on groups of said data words;
plural phase shifting means for modifying said data words antecedent to said Fourier transform means, wherein each said phase shifting means places said data words in the proper sequence for the next of said serially arranged Fourier transform means without transferring said data words among said channels.

9. An apparatus for performing the Fast Fourier Transform of digital data comprising plural data channels and plural shuffle operators for rearranging said digital data, wherein each said shuffle operator rearranges data from only one of said data channels.

10. An apparatus for performing the Fast Fourier Transform of digital data words, each having a phase, comprising:

plural data channels;
plural serially arranged Fourier transform means for computing the Fourier transform of groups of said digital data words, said groups having one said data word from each of said channels;
plural phase shifting means, each for modifying the phase of said data words in one said channel antecedent to said Fourier transform means; and
plural shuffle means, each for arranging said data words in one of said channels antecedent to said Fourier transform means,
whereby said data words in one said channel are not transferred to another said channel.

11. The apparatus as defined in Claim 10, wherein said Fourier transform means comprises a block diagonal matrix having Fourier transforms along the block diagonal.

12. The apparatus as defined in Claim 12, wherein there are R said data channels and wherein each of said phase shifting means comprises multiplier means for shifting the phase of said data words by an integer multiple of 360/R degrees.

13. An apparatus for performing the Fast Fourier Transform of multiple digital data words comprising plural data channels, and means for calculating the Fourier transform of groups of said digital data words, each of said groups comprising one of said data words from each of said channels, wherein said data words are not transferred among said channels.

14. An apparatus for performing the Fast Fourier Transform of digital data comprising a processor having plural stages and plural channels for said data, wherein the path for said digital data in each said channel is fixed, having no switches.

15. Apparatus for modifying complex data words having plural channels and plural connected stages for calculating the Fourier transform of said data words, comprising phase shifting means for modifying said data words, whereby said data words in one said stage are arranged so that they are in a predetermined order for the next of said stages without transferring data among said channels.

16. The apparatus as defined in claim 15 wherein there are four said channels and wherein said phase shifting means comprises multiplier means for multiplying said data words in each of said channels by a power of j, wherein the power of j multipliers for each said channel are consecutive integer powers of j starting at zero.

17. The apparatus as defined in claim 15 wherein there are R said channels and wherein said phase shifting means comprises multiplier means for shifting the phase of said data words in each of said R channels by a multiple of 360/R degrees, wherein the multiples of 360/R degrees for each said channel are consecutive integers starting at zero.

18. An apparatus for calculating the Fast Fourier Transform of complex data words, having plural serially connected stages for transforming said data words, each of said stages comprising:

R input means and R output means corresponding to R channels for connecting said stages;
R shuffle means for aligning in time R of said data words in a vector, wherein each said vector comprises one of said data words from each of said channels;
Fourier transform operator for performing a Fourier transform on each said vector;
R first multiplier means for shifting the phase of said data words before said data words are input to said Fourier transform operator; and
R second multiplier means for providing predetermined coefficients to each of said data words;
wherein between said input means and said output means data words in one of said channels are not transferred to another of said channels.

19. The apparatus as defined in Claim 18 wherein one of said R second multiplier means is merged with one of said R first multiplier means.

20. The apparatus as defined in Claim 18 wherein said Fourier transform operator comprises a block diagonal matrix having R-point Fourier transforms along the block diagonal.

21. The apparatus as defined in Claim 18, wherein each said second multiplier means comprises:

one or more programmable read only memory means containing said predetermined coefficients;
tracking means for counting the number of operations performed by said Fourier transform operator; and
an addressing element for selecting said predetermined coefficient from said programmable read only memory means using said tracking means.

22. Apparatus for performing Fourier transformations on an input stream of N digital data words, having R channels and Log_R N serially arranged stages, wherein each of said stages comprises:

an N by N matrix for aligning in time R of said data words in a vector, wherein each said vector comprises one of said data words from each of said R channels drawn at intervals of N/R from said stream of N digital data words;
phase shifting means for modifying said data words without transferring said data words among said channels; and
Fourier transform means for performing Fourier transforms on each said vector.

23. The apparatus as defined in Claim 22 wherein said Fourier transform means comprises a block diagonal matrix having R-point Fourier transforms along the block diagonal.

24. A processor for performing the Fast Fourier Transform of digital data words having plural serially arranged Fourier transform means, comprising:

plural phase shifting means for modifying said data words antecedent to said Fourier transform means, wherein each said phase shifting means places said data words in the proper sequence for the next of said serially arranged Fourier transform means.

25. An apparatus for calculating the Fast Fourier Transform of digital data words in plural data channels, comprising:

plural serially arranged Fourier transform means for computing the Fourier transform of groups of said digital data words, said groups having one said data word from each of said channels; and
plural phase shifting means, each for modifying the phase of said data words in one said channel antecedent to said Fourier transform means,
wherein the paths followed by said digital data words are fixed, having no switches.

26. A method for performing the Fast Fourier Transform of an input stream of N digital data words, each having a phase, comprising the steps of:

(a) sorting said data words into R channels;
(b) repeating steps (c) through (f) Log_R N times;
(c) aligning in time R of said data words in N/R vectors, wherein said data words in each of said vectors are drawn from said input stream at intervals of N/R, and wherein each of said vectors comprises one said data word from each of said channels;
(d) shifting the phase of said data words, wherein the phase of said data words in each said channel is shifted by multiples of 360/R degrees for each of said channels;
(e) computing the Fourier transform of each of said vectors; and
(f) multiplying by predetermined coefficients.

27. A method for calculating the Fast Fourier Transform of an input stream of N digital data words in plural serially arranged stages, each of said stages comprising the steps of:

(a) aligning R of said data words in N/R vectors, wherein each of said vectors comprises one said data word from each of said channels;

(b) shifting the phase of said data words, wherein the phase of said data words in each said channel is shifted by multiples of 360/R degrees for each of said channels; and
(c) computing the Fourier transform of each of said vectors.

Published 1989 at The Patent Office, State House, 66/71 High Holborn, London WC1R 4TP. Further copies may be obtained from The Patent Office, Sales Branch, St Mary Cray, Orpington, Kent BR5 3RD. Printed by Multiplex techniques ltd, St Mary Cray, Kent, Con. 1/87