US3702393A

US3702393A - Cascade digital fast fourier analyzer

Info

Publication number: US3702393A
Application number: US82572A
Authority: US
Inventors: Peter Siegfried Fuss
Original assignee: Bell Telephone Laboratories Inc
Current assignee: AT&T Corp
Priority date: 1970-10-21
Filing date: 1970-10-21
Publication date: 1972-11-07
Anticipated expiration: 1989-11-07
Also published as: BE774086A; FR2111632A5; NL7114286A; JPS547178B1; CA945261A; GB1328489A; SE364384B; DE2151974A1

Abstract

A single input channel cascade FFT processor includes a plurality of substantially identical arithmetic units connected by a delay and switching arrangement for selectively delaying subsets of input data samples and intermediate results. By reordering an input sequence and thus selectively delaying, all components are operated at full capacity at all times, while a reduced amount of storage (delay) is required. Alternate embodiments provide for multiplexing a plurality of input data channels and multiplexing arithmetic units among a plurality of stages.

Description

United States Patent
Fuss
[45] Nov. 7, 1972

[54] CASCADE DIGITAL FAST FOURIER ANALYZER
[72] Inventor: Peter Siegfried Fuss, Greensboro,
[73] Assignee: Bell Telephone Laboratories, Incorporated, Murray Hill, N.J.
[22] Filed: Oct. 21, 1970
[21] Appl. No.: 82,572
[52] U.S. Cl.: 235/156, 324/77 H
[51] Int. Cl.: G06f 7/38
[58] Field of Search: 235/156; 324/77 H
[56] References Cited
UNITED STATES PATENTS
6/1971 Smith 235/156
10/1970 Bergland 235/156
OTHER PUBLICATIONS
Cochran, What is the Fast Fourier Transform?, IEEE Trans. on Audio & Electroacoustics, Vol. AU-15, June 1967, pp.
Bergland, FFT Hardware Implementations - An Overview, IEEE Trans. on Audio & Electroacoustics, Vol. AU-17, June 1969, pp. 104-108
Primary Examiner: Eugene G. Botz
Assistant Examiner: David H. Malzahn
Attorney: R. J. Guenther and William L. Keefauver
[57] ABSTRACT
10 Claims, 11 Drawing Figures

GOVERNMENT CONTRACT

The invention herein claimed was made in the course of or under a contract with the Department of the Navy.

This invention relates to signal processing apparatus and methods. More particularly this invention relates to apparatus and methods for generating Fourier series coefficients corresponding to a sequence of input signals. Still more particularly, this invention relates to a digital processor having a plurality of cascaded stages, each capable of generating a sequence of signals corresponding to the Fourier coefficients of selected signals applied at its input.

The well-known fast Fourier transform techniques have been applied to a wide range of signal analysis problems. Particular apparatus and methods for performing the fast Fourier transform have taken many different forms. A recent summary of several of the most popular configurations is given, for example, in Fast Fourier Transform Hardware Implementations by G. D. Bergland, IEEE Trans. on Audio and Electroacoustics, vol. AU-17, June 1969, pp. 104-108. Another useful reference is Cochran et al., What Is the Fast Fourier Transform, IEEE Trans. Audio and Electroacoustics, June 1967, pp. 45-55. One particular form of fast Fourier transform apparatus which has been found to be of commercial importance is the so-called cascade or pipeline processor, described, for example, in Bergland and Hale, Digital Real-Time Spectral Analysis, IEEE Trans. Electronic Computers, vol. EC-16, pp. 180-185, April 1967, and in copending U.S. patent applications by G. D. Bergland et al., Ser. No. 605,791, filed Dec. 29, 1966, now U.S. Pat. No. 3,544,775, issued Dec. 1, 1970, and by R. A. Smith, Ser. No. 741,506, filed July 1, 1968, now U.S. Pat. No. 3,588,460, issued June 28, 1971.

Other useful references dealing with the so-called cascade or pipeline fast Fourier transform processor include Groginsky and Works, A Pipeline Fast Fourier Transform, 1969 IEEE EASCON Rec., pp. 22-29; and O'Leary, Nonrecursive Digital Filtering Using Cascade Fast Fourier Transformers, IEEE Trans. on Audio and Electroacoustics, June 1970, pp. 177-183.

The above-cited Smith patent application has described an improvement on a basic cascaded Fourier processor described, for example, in the Bergland et al. patent application, supra. Smith found it possible to more fully utilize the apparatus of the Bergland configuration to achieve higher efficiencies under certain circumstances. In particular, it was noted by Smith that not all of the apparatus in the Bergland et al. configuration was used at substantially full capacity. By utilizing this spare capacity and by appropriately routing signals, it was shown by Smith that two complete input sequences could be processed with no more hardware than was previously required for a single input sequence.

The present invention represents yet a further improvement over the basic configuration illustrated in Bergland et al. while further improving the efficiency and operating speed of the processor described in either the Bergland et al. or the Smith patent applications, supra.

It is therefore an object of the present invention to provide for the simplified generation of Fourier series coefficients. It is a further object of the present invention to provide for simplified cascade fast Fourier transform processing units which utilize the individual computational and storage elements with improved efficiency. It is a further object of the present invention to provide in a digital cascade fast Fourier transform processor for the generation of Fourier series coefficients based on a single input data sequence.

SUMMARY OF THE INVENTION

Briefly stated, one embodiment of the present invention provides for a plurality of cascaded computational units of the type described generally in the Bergland et al. and Smith patent applications, supra. It proves advantageous, however, to perform a permutation of the input data sequences prior to processing. This permuted data is then applied in alternate subsets to each of the input terminals of a succeeding computational stage. The outputs of the first and subsequent stages are first grouped into subsets by a switching and delay arrangement prior to application to the inputs of succeeding stages. By thus alternating subsets previously forming a single data stream, it is possible to utilize to the fullest extent the storage and computational facilities of each of the computational stages. Further, the particular organization of the data used permits a simplified combining of the individual data items with an attendant reduction in the number of memory cells required.

It is therefore a feature of the present invention to include means for scrambling an input data sequence according to the digit-reversed technique prior to processing by the first stage in a cascade FFT processor.

It is a further feature of the present invention to provide for the alternation of subsets of a single input data sequence at each of a plurality of inputs of the first (input) stage in a cascade FFT processor.

These and other features and objects of the present invention will be described in greater detail in the detailed disclosure of an illustrative embodiment given below, taken together with the drawing, in which:

FIG. 1 shows the general configuration for one embodiment of the present invention;

FIG. 2 shows a modular computer for use in the system of FIG. 1;

FIG. 3 shows a switch for selectively routing data in the FFT processor shown in FIG. 1;

FIG. 4 is a signal flow graph illustrating an FFT computational sequence;

FIG. 5 is a sequence chart showing various of the operations performed by the circuit of FIG. 1;

FIG. 6 is a modified version of the circuit of FIG. 1 having improved efficiency of storage, and requiring only a single input;

FIG. 7 shows a sequence chart for the circuit of FIG. 6;

FIG. 8 shows two typical stages of a generalized cascade processor in accordance with the principles of the circuit of FIG. 6;

FIG. 9 shows a modification to the circuits of FIGS. 6 and 8 comprising a time-shared computational unit;

FIG. 10 shows a generalization of the circuits of FIGS. 6 and 8 for processing signals from a plurality of input channels; and

FIG. 11 shows a time division multiplexing scheme which is useful in operating the circuit of FIG. 10.

DETAILED DESCRIPTION

FIG. 1 shows a cascade FFT processor based generally on the earlier teachings described in the Smith reference, supra. Modifications to the Smith system embodied in FIG. 1 will be discussed below. In particular there is shown a plurality of modular computers 100-1 through 100-3. The number of computers 100-i is, of course, understood to relate to the number of individual samples in an input data stream for which Fourier series coefficients are desired. Thus, the three-stage cascade FFT processor shown in FIG. 1 is, according to the teachings of Smith, supra, and others, suitable for computing the Fourier series coefficients corresponding to a data sequence having $N = 2^3 = 8$ data samples.

In accordance with the teachings of Smith, data are entered on input terminals 101 and 102. The Smith reference has also shown that by using delay units such as 120-1 and 120-2, together with switches such as 110-1, between each stage of previously used cascade processors, it is possible to accommodate data from each of two individual data sources with the same number of processors and at the same speed as was previously used to calculate coefficients for one stream. Thus there is indicated on input lead 101 the designation channel A input. Similarly, lead 102 refers to a channel B input. It is recognized that each of these input channels is independent of the other. The number in parentheses in each delay unit indicates the number of sample periods (of the input data stream) of time delay introduced by that delay unit. Thus unit 120-1 introduces a delay of one sample period into the input data stream. Delay units shown in FIG. 1 and elsewhere in this description may conveniently take the form of serial delay lines, shift registers or other equivalent well-known serial memory elements. Alternatively, delay may be introduced by merely storing inputs and reading them at a subsequent interval bearing a required relation to the time of storage, under program control or otherwise.

FIG. 2 shows a well-known computer which may be used as unit 100-i in FIG. 1. As indicated in FIG. 2, computer 100-i (for the ith stage) includes a multiplier unit 111 and adder units 112 and 113. Adder unit 113 includes a negation of one of its inputs. Thus, while adder unit 112 forms the sum of the two inputs impressed on it on leads 114 and 115, adder unit 113 forms the difference between signals impressed on it on leads 116 and 117. Multiplier 111 forms the product of the signals impressed on leads 118 and 119. Lead 119 is conveniently arranged to receive the trigonometric (complex exponential) signals usually associated with the FFT. Thus, in terms of nomenclature provided in FIG. 2, multiplier 111 forms the product $Q\omega$, where $\omega = \exp(2\pi j/N)$, $j = \sqrt{-1}$, and N is the number of input samples to be transformed. In sum, then, as indicated in FIG. 2, leads 121 and 122 provide, respectively, the outputs $P + Q\omega$ and $P - Q\omega$. Particular configurations for carrying out the individual fundamental operations indicated in FIG. 2 are well-known in the art and may in particular cases take the form shown in U.S. Pat. No. 3,517,173 issued June 23, 1970 to M. J. Gilmartin, Jr., et al. and assigned to the assignee of the present invention. The required trigonometric function signals are conveniently stored in a separate memory and are accessed as required according to well-known techniques. Alternatively, the trigonometric function signals are generated as required in accordance with well-known techniques described, for example, in a copending U.S. patent application by Bergland et al., Ser. No. 873,587, filed November 3, 1969, now U.S. Pat. No. 3,662,161, issued May 9, 1972.
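The arithmetic of the modular computer of FIG. 2 can be sketched in a few lines. This is a minimal illustration, not the patented hardware: the function names `butterfly` and `twiddle` are hypothetical, and the positive sign of the exponent in $\omega$ is an assumption based on the text as it survives here.

```python
import cmath


def twiddle(k: int, n: int) -> complex:
    """Return w**k for w = exp(2*pi*j/n).

    The positive exponent sign is an assumption; flip it if the
    opposite transform convention is wanted.
    """
    return cmath.exp(2j * cmath.pi * k / n)


def butterfly(p: complex, q: complex, w: complex) -> tuple[complex, complex]:
    """Model one pass through the computer of FIG. 2.

    The product q*w plays the role of multiplier 111, the sum that of
    adder 112, and the difference that of adder 113.
    """
    qw = q * w
    return p + qw, p - qw


# First-stage case: w**0 = 1, so the outputs reduce to a sum and a difference.
upper, lower = butterfly(3 + 0j, 1 + 0j, twiddle(0, 8))
print(upper, lower)
```

With the unit exponential the two outputs are simply the sum and difference of the inputs, which is the degenerate first-stage case mentioned later in the description.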

In accordance with the digit-reversed permutation technique, and as shown in Table 1, an input sequence represented in column 1 as X(0), X(1), ..., X(7) and further respectively represented by the direct binary code shown in column 2 in Table 1, is effectively permuted.

Column 3 in Table 1 shows a list of codes arrived at by reversing the order of digits found in the corresponding X code. Finally, column 4 in Table 1 shows the order for the new (permuted) sequence designated A, arrived at by associating a decimal digit with a corresponding A code shown in column 3. The sequence of input signals applied at input terminals 101 and 102 is then arrived at in accordance with the argument of the A variables. Thus, for the particular case of N = 8 inputs shown in Table 1, the input sequence A(0), A(1), ..., A(7) is seen to represent the original X sequence in the order X(0), X(4), X(2), X(6), X(1), X(5), X(3), X(7).

The actual reordering indicated may be effected by assigning consecutive locations in a buffer memory to input samples (Xs) as they arrive or are generated. The output sequence is then generated by reading in sequence the contents of memory locations associated with consecutively increasing values for A. This may be conveniently performed by a data processing machine operating under program control.

An identical permutation, or prescrambling, of each input sequence of 8 (or, more generally, N) samples appearing on channel B is also performed.
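The prescrambling described above can be reproduced with a short sketch. It rebuilds the four columns the text attributes to Table 1 (X index, direct binary code, reversed code, A index) and then reads a buffer out in order of increasing A. The helper names `bit_reverse` and `prescramble` are hypothetical, and the buffer is modeled as an ordinary Python list.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Reverse the `bits`-digit binary code of `index`."""
    code = format(index, f"0{bits}b")
    return int(code[::-1], 2)


def prescramble(samples):
    """Return the A sequence: A(a) = X(bit_reverse(a))."""
    n = len(samples)
    bits = n.bit_length() - 1
    if n < 2 or 1 << bits != n:
        raise ValueError("the number of samples must be a power of two")
    return [samples[bit_reverse(a, bits)] for a in range(n)]


# Rows in the layout described for Table 1, for N = 8.
for x in range(8):
    x_code = format(x, "03b")
    a_code = x_code[::-1]
    print(f"X({x})  {x_code}  {a_code}  A({int(a_code, 2)})")

order = prescramble([f"X({i})" for i in range(8)])
print(order)
```

Because digit reversal is its own inverse, the same function maps an X index to its A index and an A index back to its X index.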

FIG. 3 shows a switch 110-i which alternates between an upper and lower position to connect two inputs to two outputs in a controlled manner. The period during which switch 110-i, at the ith stage in the circuit of FIG. 1, is in each of its two positions will be discussed below.

The version of the FFT algorithm practiced in accordance with one embodiment of the present invention is illustrated in FIG. 4. The graph of FIG. 4 is adapted from FIG. 5 of the Cochran et al. reference, supra, and details the sequence of operations performed on each data sample without regard for the absolute time at which each operation is accomplished. The process is illustrated only for the inputs on channel A, but it should be understood that an identical sequence of operations is performed on data appearing on channel B. Extension of the process illustrated to sets of input data containing more than 8 samples is immediate in light of the above and the well-developed literature on FFT.

The process illustrated in FIG. 4 bears many of the now familiar features of FFT processing. Thus, from an original input sequence $A_0(i)$, $i = 0, 1, \ldots, 7$, a second set $A_1(i)$, $i = 0, 1, \ldots, 7$, is formed, and so on. When, as here (for the purpose of simplicity of discussion), the number of input samples in each input sequence, N, is given by $N = 2^m$, i.e., N is an integer power of 2, the mth sequence $A_m$ is the desired set of Fourier coefficients. Thus, in FIG. 4, the set of signals $A_3(0), A_3(1), \ldots, A_3(7)$ shown at the extreme right represents the output set of Fourier coefficients.
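The succession of node columns can be illustrated with a small software model. This is a sketch of a standard radix-2 computation on digit-reversed input, offered as one reading of the flow graph rather than a transcription of FIG. 4; the names `cascade_fft` and `direct_dft` are hypothetical, and the positive exponent sign is an assumption.

```python
import cmath


def bit_reverse(index, bits):
    """Reverse the `bits`-digit binary code of `index`."""
    return int(format(index, f"0{bits}b")[::-1], 2)


def cascade_fft(x, sign=+1):
    """Return the node columns [A_0, A_1, ..., A_m] for N = 2**m samples.

    `sign` selects the exponent sign of the exponentials (assumed +1).
    """
    n = len(x)
    m = n.bit_length() - 1
    if n < 2 or 1 << m != n:
        raise ValueError("the number of samples must be a power of two")
    # A_0 is the prescrambled (digit-reversed) input sequence.
    columns = [[complex(x[bit_reverse(i, m)]) for i in range(n)]]
    for stage in range(1, m + 1):
        half = 1 << (stage - 1)      # distance between the two paired nodes
        span = 2 * half              # size of each sub-transform at this stage
        prev, cur = columns[-1], [0j] * n
        for start in range(0, n, span):
            for k in range(half):
                w = cmath.exp(sign * 2j * cmath.pi * k / span)
                p, q = prev[start + k], prev[start + k + half]
                cur[start + k] = p + q * w          # sum output
                cur[start + k + half] = p - q * w   # difference output
        columns.append(cur)
    return columns


def direct_dft(x, sign=+1):
    """Reference transform computed straight from the defining sum."""
    n = len(x)
    return [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * r * k / n)
                for k in range(n)) for r in range(n)]


samples = [1, 2, 3, 4, 0, -1, -2, -3]
columns = cascade_fft(samples)
error = max(abs(a - b) for a, b in zip(columns[-1], direct_dft(samples)))
print(len(columns) - 1, "stages, max error", error)
```

For eight samples the model produces three computed columns after the input column, and the last column agrees with the direct sum to within rounding error.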

The intermediate nodes are shown under the column headings $A_1(\cdot)$ and $A_2(\cdot)$. Each horizontal or diagonal line indicates a computation performed by one of the computers 100-i in FIG. 1. The signed exponential signal associated with each such line indicates the particular value of the exponential which enters into the corresponding machine computation. For this purpose $\omega = \exp(2\pi j/N)$, $j = \sqrt{-1}$. Note also that w" =w"

FIG. 5 illustrates the process of forming the desired output sequences on the system shown in FIG. 1 in accordance with the sequence of operations shown in FIG. 4. Time is indicated along the top in FIG. 5, with the time origin occurring substantially at the instant, $t_0$, that the first samples of the input sequences A and B (as reordered) are presented on leads 101 and 102, respectively. The original (nonreordered) sequences are indicated in FIG. 5 as X(i) and Y(i), while the corresponding reordered sequences are indicated respectively as A(i) and B(i). It should be noted that A(i) and $A_0(i)$ are identical, as are B(i) and $B_0(i)$, i.e., the results of the zeroth iteration are merely the input sequences. The time interval between consecutive instants $t_i$ and $t_{i+1}$ is taken as the sampling period, i.e., the time between the arrival of each input sample on both input leads 101 and 102.

The two-level signals $S_1$, $S_2$ and $S_3$ indicate the positions of respective switches 110-1, 110-2 and 110-3 as a function of time. The higher level, as exhibited by $S_1$ in the interval $t_0$-$t_1$, for example, indicates that the indicated switch (switch 110-1 in this instance) is in its upper position. Likewise, the lower level for the signals $S_i$ indicates that the corresponding switch is in the lower position.

Because of the presence of delay units 120-1 and 120-2, the first data samples $A_0(0)$ and $B_0(0)$ entered at respective input leads 101 and 102 are initially each delayed a full sampling interval. Thus $A_0(0)$ is delayed by unit 120-2 and $B_0(0)$ is delayed by unit 120-1. Further, because of the alternating of switch 110-1, the input leads to the computer 100-1, $\alpha_1$ and $\alpha_2$, receive two consecutive samples from the channel A sequence. Thus $A_0(0)$, having been stored during the interval $t_0$-$t_1$ in delay unit 120-2, is available at $\alpha_1$ at the very beginning of the $t_1$-$t_2$ interval. $A_0(1)$, having no delay introduced in its path, is presented immediately to terminal $\alpha_2$ after being made available on lead 101. During the interval $t_1$-$t_2$ computer 100-1 operates on input samples $A_0(0)$ and $A_0(1)$ to form the results $A_1(0) = A_0(0) + A_0(1)$ on lead 121-1 and $A_1(1) = A_0(0) - A_0(1)$ on lead 122-1. The complex operations (including the multiplications) degenerate to real additions for the first stage, i.e., m = 1. These $A_1$ results are the two-point transforms corresponding to the column $A_1(\cdot)$ in FIG. 4.

After leaving computer 100-1, $A_1(0)$ proceeds to switch 110-2 while $A_1(1)$ encounters delay unit 130-1. The magnitude of the delay introduced by this latter delay unit is, as indicated in FIG. 1, a two-unit delay. Because switch 110-2 is in its up position for the interval $t_1$-$t_2$, signal $A_1(0)$ passes to delay unit 130-2, where it encounters a two-unit delay. After this two-unit delay, and beginning at time $t_3$, $A_1(0)$ appears at the $\beta_1$ input terminal to computer 100-2.

Meanwhile, during the interval $t_2$-$t_3$, computer 100-1 is presented with the first two inputs from channel B. Note that $B_0(0)$ has been delayed an additional sample interval by delay unit 120-2 while $B_0(1)$ has encountered only a one-unit delay. The results then generated, $B_1(0)$ and $B_1(1)$, are presented on the outputs 121-1 and 122-1, respectively.
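The pairing action of the first-stage delays and switch can be checked with a small clocked model. The wiring used here is an assumption inferred from the description, not taken from the drawing: channel B passes through delay unit 120-1 ahead of switch 110-1, the upper switch output feeds delay unit 120-2 and then one computer input, the lower switch output feeds the other computer input directly, and the "up" position is taken to be the straight-through one. The function name `first_stage_pairs` is hypothetical.

```python
def first_stage_pairs(channel_a, channel_b):
    """Return the sample pairs presented to the first computer, in time order.

    Assumed topology: B -> delay 120-1 -> switch; A -> switch;
    switch upper output -> delay 120-2 -> first computer input;
    switch lower output -> second computer input.
    """
    n = len(channel_a)
    delay_1 = None   # contents of one-unit delay 120-1
    delay_2 = None   # contents of one-unit delay 120-2
    pairs = []
    for t in range(n + 2):                      # two extra ticks flush the delays
        upper_in = channel_a[t] if t < n else None
        lower_in = delay_1                      # sample leaving delay 120-1
        delay_1 = channel_b[t] if t < n else None
        if t % 2 == 0:                          # switch up: straight through
            to_delay_2, direct = upper_in, lower_in
        else:                                   # switch down: crossed
            to_delay_2, direct = lower_in, upper_in
        delayed = delay_2                       # sample leaving delay 120-2
        delay_2 = to_delay_2
        if delayed is not None and direct is not None:
            pairs.append((delayed, direct))
    return pairs


pairs = first_stage_pairs(["A0", "A1", "A2", "A3"], ["B0", "B1", "B2", "B3"])
print(pairs)
```

Under these assumptions the computer sees a channel A pair, then a channel B pair, then the next channel A pair, which matches the alternation of same-channel pairs the description attributes to FIG. 5.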

The remainder of the FFT process proceeds in this general manner, as more particularly indicated in FIG. 5. With switch 110-1 alternating between its up and down position, as shown in FIG. 5, the inputs at input terminals $\alpha_1$ and $\alpha_2$, respectively, are shown on the corresponding lines in FIG. 5. Similarly, with switch 110-2 alternating at the times indicated on line $S_2$ in FIG. 5, the inputs at terminals $\beta_1$ and $\beta_2$ are likewise shown on the corresponding lines $\beta_1$ and $\beta_2$ in FIG. 5. The inputs to the computer 100-3 shown in FIG. 1 are shown on lines $\gamma_1$ and $\gamma_2$ in FIG. 5.

What should be especially noted at this point is that in the chart of FIG. 5 showing the time of occurrence of the various inputs to the computers 100-i, the inputs are grouped by pairs based on samples from the same input channel. Further, these pairs are separated by one time slot at each occurrence from pairs based on samples from the other channel. Thus, by reordering the input sequence in the manner described and by introducing delays as shown in FIG. 1, the processing of the two input data sequences has been effectively decoupled. This result is an important aspect of the present invention, and has been emphasized by encircling the groups of signals associated with the channel A inputs.

The Smith reference, supra, does not contain this decoupling feature. A review of FIG. 3 of the Smith disclosure will show that the results presented at the input to each computational stage are not partitioned in the manner shown in FIG. 5 of the present disclosure.

For purposes of later comparison, it is well to calculate the amount of delay (memory) required in the system of FIG. 1 above. In particular, it should be noted that at stage 1 (the input stage) there are included two delay units, 120-1 and 120-2, each providing $2^0 = 1$ unit of delay. In general, the delay units at the ith stage provide $2^{i-1}$ units of delay. Since there are two delay elements associated with each stage, the total delay associated with the ith stage is then $2^i$. For the case of an m-stage processor, then, the total number of units of delay, D, required is given by

$$D = \sum_{i=1}^{m} 2^{i}$$

For a 12-stage processor, for example, the number of units of delay which must be included is 8190. It should be recognized that each pair of these delay elements must have sufficient capacity to store the results of the previous iteration (the output from the previous stage).

In general, these delay elements must be able to store the indicated number of complex numbers. Alternatively, a complex number may be considered as two real numbers (the magnitude of the real and imaginary components), each having a separate storage location. Within this latter framework, a 12-stage processor in accordance with the arrangement of FIG. 1 requires 16,380 words of storage, each word corresponding to a real number.
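The storage figures quoted for the arrangement of FIG. 1 can be verified numerically. The sketch below assumes two delay units per stage, each of $2^{i-1}$ units at the ith stage, which is the only reading that reproduces the quoted totals; the function name `fig1_delay_units` is hypothetical.

```python
def fig1_delay_units(stages: int) -> int:
    """Total complex-word delay for an m-stage processor of the FIG. 1 type.

    Assumes two delay units per stage, each of 2**(i-1) units at stage i.
    """
    return sum(2 * 2 ** (i - 1) for i in range(1, stages + 1))


complex_words = fig1_delay_units(12)
real_words = 2 * complex_words          # real and imaginary parts stored separately
print(complex_words, real_words)

# The sum is a geometric series with the closed form 2**(m+1) - 2.
assert all(fig1_delay_units(m) == 2 ** (m + 1) - 2 for m in range(1, 20))
```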

FIG. 6 shows a three-stage FFT processor in accordance with an improved embodiment of the present invention. The processor of FIG. 6 is intended to maintain the high efficiency of the processor of FIG. 1, but to eliminate the need for two separate input sequences. That is, the system shown in FIG. 6 is designed to calculate Fourier series coefficients in accordance with a fast Fourier transform algorithm for input samples derived from but a single source. Further, the circuit of FIG. 6 has the advantage of requiring only approximately one-half as much storage (delay) as the circuit of FIG. 1.

Input signals from a single input source are arranged to appear at input lead 201 in FIG. 6. Switch 215 is an ordinary toggle arranged for alternating between each of its upper and lower positions connected to leads 217 and 218, respectively. Switch 215 is arranged to remain in each of its two positions for alternate periods of duration equal to the sampling interval of the input data stream appearing on lead 201. Switch 215 is arranged to be in its upper position for the initial (0th) and succeeding even-numbered input samples. Conversely, switch 215 is arranged to be in its lower position for sample 1 and all succeeding odd-numbered input samples.

Delay unit 216, having a delay of 1 input sample duration, is interposed between lead 217 and the $\alpha_1$ input terminal to computer 601-1. Although the delay introduced by delay unit 216 is equal to 1 input sample duration, the magnitude of this delay (as indicated parenthetically in the box representing delay unit 216) is one-half. The reason for this notation is to establish a reference to the previous notation used in FIG. 1 and FIG. 5. That is, it was assumed that the operation time for computers 100-i in FIG. 1 was, as shown in FIG. 5, equal to (or less than) an input sample duration, e.g., $t_1$-$t_2$. Thus, for purposes of comparison it is presently assumed that the computers shown in FIG. 6 and identified as 601-1 through 601-3 are identical to corresponding computers 100-1 through 100-3 in FIG. 1.

The total input bit rates for the circuits of FIGS. 1 and 6 are identical. In FIG. 6 only one source is used, while in FIG. 1 two data sources are used. Hence, to establish the equivalence of total input bit rate, it is assumed that the input data on lead 201 in FIG. 6 arrives at twice the rate of data appearing on either of leads 101 or 102 in FIG. 1. If it is desirable to use lower speed computers 601-i, or to use computers at less than their maximum speed, the input rate can, of course, be reduced accordingly.

Delay unit 216 introduces a 1-input sample duration time delay in the sense just described for each even-numbered input sample. Thus, upon the presentation of an odd-numbered input sample on lead 218, there is simultaneously presented at the $\alpha_1$ and $\alpha_2$ input terminals of computer 601-1 input samples corresponding to an even-numbered and an odd-numbered input sample, respectively. Thus, the combination of switch 215 and delay unit 216 cooperates to form pairs of input samples for presentation to computer 601-1 based on a sequence of input samples appearing on lead 201. This process is further illustrated in FIG. 7. FIG. 7 shows two input samples as appearing on lead 201 during the interval $t_0$-$t_1$. These are the original input samples X(0) and X(4) which, upon reordering, appear as $A_0(0)$ and $A_0(1)$. Similarly, during subsequent time intervals other pairs of input signals are formed from the reordered input sequence which appears on lead 201.
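The way switch 215 and delay unit 216 turn a single stream into simultaneous pairs can be modeled directly. This is a minimal sketch in which the one-sample delay is represented by a single held value; the function name `pair_single_stream` is hypothetical.

```python
def pair_single_stream(stream):
    """Pair each even-numbered sample with the odd-numbered sample that follows.

    Even-numbered samples take the upper switch position and are held for
    one sample duration (the role of delay unit 216); odd-numbered samples
    take the lower position and pass straight through.
    """
    held = None
    pairs = []
    for number, sample in enumerate(stream):
        if number % 2 == 0:
            held = sample                   # upper position: into the delay
        else:
            pairs.append((held, sample))    # lower position: pair is complete
    return pairs


reordered = ["X(0)", "X(4)", "X(2)", "X(6)", "X(1)", "X(5)", "X(3)", "X(7)"]
pairs = pair_single_stream(reordered)
print(pairs)
```

Applied to the prescrambled sequence, the first pair is X(0) with X(4), as stated in the text for the first interval of FIG. 7.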

During the interval t t input samples $A_0(0)$ and $A_0(1)$, upon presentation to computer 601-1 at terminals $\alpha_1$ and $\alpha_2$, respectively, are operated upon to generate corresponding pairs of output signals on leads 221 and 222. During subsequent time intervals additional pairs of input samples are presented at input terminals $\alpha_1$ and $\alpha_2$ of computer 601-1 and are similarly processed.

Simultaneously, the outputs from computer 601-1 on leads 221 and 222 are delayed by delay units 220-1 and 220-2 and are switched in the manner indicated by the switch signal S in FIG. 7. Thus, during the interval 13-1 switch 210-1 in FIG. 6 (which is identical to the switch 110-1 in FIG. 1) is in its upper position. The result appearing on lead 221 ($A_1(0) = A_0(0) + A_0(1)$) is presented at lead $\beta_1$ during the interval 1 -2 after being delayed one time interval in delay unit 220-2. During the next interval (t -t) the results generated during the preceding interval at the output of computer 601-2 are applied at the input terminals $\gamma_1$ and $\gamma_2$ of computer 601-3 as shown in FIG. 6. The further progress of the various input signals and the intermediate results generated from them is clear from the sequence chart shown in FIG. 7 and the discussion presented above.

FIG. 8 shows two stages in a generalized version of the system shown in FIG. 6. The pattern of a first delay unit, a switch, a second delay unit and a computer (all of the same type as corresponding units shown in FIG. 6) is, of course, repeated at each stage except the first. Again the first stage is simplified in the manner shown in FIG. 6.

The period for which a given switch S, is in its up (or down) position is equal to 2 time intervals, i.e., the repetition rate for an up-down cycle is once every 2 time intervals. Again, a time interval is equal to the period including two input samples, e. g., the period t t, in FIG. 7.

The amount of delay introduced by a delay unit at the ith stage is equal to $2^{i-2}$. For purposes of comparison with the circuit of FIG. 1, it is noted that a circuit in accordance with FIGS. 6 and 8 for, say, a 12-stage cascade processor includes only 4,095 words of complex memory. This is equal to 8,190 real memory words, or about one-half of the number required by a 12-stage version of the circuit of FIG. 1. Each of these words is typically part of a serial memory, such as a shift register or delay line. The number of bits in each word is dictated by the range of magnitudes of the input samples and the desired accuracy of intermediate and final results.
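The quoted memory total for the single-input arrangement can be reproduced under two stated assumptions: the first stage holds one sample word in its single delay unit, and each of the two delay units at stage i (for i of 2 or more) holds $2^{i-2}$ complex words. These assumptions are chosen because they reproduce the 4,095 figure; the function name `fig6_complex_words` is hypothetical.

```python
def fig6_complex_words(stages: int) -> int:
    """Complex-word storage for an m-stage processor of the FIG. 6 type.

    Assumes one word in the first-stage delay unit and two delay units of
    2**(i-2) words each at every later stage i.
    """
    first_stage = 1
    later_stages = sum(2 * 2 ** (i - 2) for i in range(2, stages + 1))
    return first_stage + later_stages


single_input = fig6_complex_words(12)
two_input = sum(2 * 2 ** (i - 1) for i in range(1, 13))   # FIG. 1 type total
print(single_input, 2 * single_input, single_input / two_input)
```

The ratio printed last is the storage of the single-input arrangement relative to the two-input arrangement, which comes out very close to one-half for twelve stages.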

As is clear from the circuit of FIG. 2 and from the references cited above, the multiplications performed will give rise to rounding-off in most cases. Each user of the present invention will of course adapt or select the number of bits for each of the stages in accordance with his own needs. When it is desired to use a programmed data processor for each computer, such as 601-1 in FIG. 6, suitable straightforward provision may be made in the controlling program to adjust or compensate, as required, to maintain the desired accuracy. Thus, for example, it may be provided that double-precision or floating-point arithmetic operations may be introduced at various points in the computations. These modifications can also be introduced in hard-wired computers on an optional or selectable basis.

While the arrangements described above are all 100 percent efficient in the sense that each memory (delay device) and each computer contains data or performs an operation, respectively, during each time interval, it may occur that the capabilities of the arithmetic unit (such as a computer 601-i in FIG. 6) exceed the requirements imposed by a given input data rate. In such circumstances it is often possible to multiplex or time-share a given arithmetic unit between or among two or more processing stages of the type described above. FIG. 9 shows a typical arrangement of this type.

FIG. 9 shows a plurality of stages 400-i in accordance with the circuit of FIG. 8, except that each stage does not include its own computer. Instead a single computer 450 is arranged to be connected between the output of one stage such as 400-(i-1) and the following stage 400-i during a portion of a time interval. Since the arrangement shown in FIG. 9 includes 3 stages sharing a common computer, it is clear that the required computations may be conveniently performed by connecting computer 450 to the input of each of the stages 400-i for one-third of each time interval. Again the time intervals involved are those indicated in FIG. 7, e.g., $t_0$-$t_1$, etc. When more than three stages share a common computer, it is clear that each time interval is divided into a corresponding plurality of equal length intervals, and computer 450 in FIG. 9 is conveniently arranged to be connected in such manner as to process signals associated with each of the stages in the network. In appropriate cases, of course, these time subintervals may be unequal.

When a single computer such as 450 is unable to process signals for all stages in a cascade arrangement, two or more computers are conveniently assigned to share this task. Thus, if a computer such as 450 in FIG. 9 operates at a speed which permits it to compute the desired products and sums in a time which is one-third (or less) of a time interval such as $t_0$-$t_1$ in FIG. 7, such a computer can be connected to three stages in the manner shown in FIG. 9. Then, if a 12-stage processor is required, four computers such as 450 are arranged in the manner shown in FIG. 9, with one computer associated with each of four subsets of stages, each subset including three separate stages.
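The grouping of stages among shared computers amounts to simple bookkeeping, sketched below. The function name `assign_stages` and the numbering of stages from 1 are illustrative assumptions; the sketch says nothing about how the switching itself is realized.

```python
def assign_stages(num_stages: int, stages_per_computer: int):
    """Group consecutive stages into subsets, one shared computer per subset."""
    return [
        list(range(first, min(first + stages_per_computer, num_stages + 1)))
        for first in range(1, num_stages + 1, stages_per_computer)
    ]


# A computer fast enough to serve three stages per time interval.
groups = assign_stages(12, 3)
for computer, stages in enumerate(groups, start=1):
    # Each stage in a subset gets an equal share of every time interval.
    share = 1 / len(stages)
    print(f"computer {computer}: stages {stages}, share per stage {share:.3f}")
```

For twelve stages and three stages per computer this yields four subsets of three stages each, in line with the example in the text.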

As shown in FIG. 9, computer 450 is connected intermediate the various stages by using a three-pole, four-throw switch. This switch is cycled between each of the three positions during a time interval such as $t_0$-$t_1$ in FIG. 7. As is true of the various switching arrangements discussed above (the various applications of the circuit of FIG. 3), this switching arrangement is most advantageously performed using straightforward connections of transistor or other logic circuitry.

Although the circuits described in connection with FIGS. 6-9 have contemplated the use of but a single input source, it is nevertheless possible to adapt the present invention for use with a plurality of input channels. Thus, in particular it is often desirable to use an arrangement of the type shown in FIG. 10. FIG. 10 shows a plurality of input channels identified as CH1-CHL. These are selected using a scanner indicated as 510 in FIG. 10. The output of scanner 510 is then distributed in accordance with a feature to be described below to each of the inputs to a computational stage such as 500-1.

Each stage 500-i in FIG. 10 is identical to the corresponding stage in the arrangements of FIGS. 6 and 8 with one structural exception. That is, each of the delay units employed at a given stage is arranged to provide a delay which is greater by a factor of L than the corresponding delay unit included in the corresponding stage of the circuit of FIG. 6 or 8. The need for this extra delay will appear from the description below. After the data have been processed by each of the stages 500-i in FIG. 10, the output is obtained from output leads of the computer in the final (mth) stage by using alternating switch 520 and distributor 525. These latter two elements are complementary to corresponding input elements 510 and 515 in FIG. 10.

The process involved in generalizing the circuits of FIGS. 6 and 8 for use with an L channel input is essentially one of multiplexing the inputs on each of the L channels on a time-division basis. Having thus assigned each of the L channels to corresponding subintervals, the processing required to be performed in each of the stages of the processor of FIG. 10 is exactly equivalent to that required in the circuits of FIGS. 6 and 8. However, since the input and output data rates of the system of FIG. 10 as a whole are greater by a factor of L than those of the circuits of FIGS. 6 and 8, there will be included within the system at any given time L times as many data items. From this follows the requirement that the memory (delay) units at each stage be L times as large as those in the corresponding stage in FIGS. 6 and 8. Also, for individual input data channels in FIG. 10 having the same data rates as assumed in the discussion of the circuit of FIG. 6, the computers operate at a rate L times as fast as in the circuit of FIG. 6.
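As a rough illustration of this scaling, the following sketch tabulates the first-delay length at each stage i >= 2. It assumes that the delay expressions recited in claims 3 and 9 are read as 2^(i-2) sample periods for a single channel and L·2^(i-2) sample periods for L channels; the function name `stage_delays` is hypothetical.

```python
def stage_delays(m: int, num_channels: int = 1) -> dict:
    """Return {stage index i: first-delay length in sample periods}.

    m            -- number of stages (m = log2 N for an N-point transform)
    num_channels -- L, the number of time-multiplexed input channels
    Assumes the reading L * 2**(i - 2) for stages i = 2, ..., m.
    """
    if m < 1 or num_channels < 1:
        raise ValueError("m and num_channels must be positive")
    return {i: num_channels * 2 ** (i - 2) for i in range(2, m + 1)}


# Single channel versus three multiplexed channels for a 16-point transform.
print(stage_delays(4))
print(stage_delays(4, num_channels=3))
```

Every entry in the multichannel table is L times the corresponding single-channel entry, which is the memory scaling the paragraph describes.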

As before, the inputs on each of the input channels are assumed to be prescrambled in the manner discussed above. Scanner 510 then successively delivers pairs of inputs from each of the channels in order. This process is illustrated in FIG. 11, where a frame corresponds to a time interval in the sense of FIG. 7 and, additionally, corresponds to one complete scan by scanner 510. It should be noted that scanner 510 remains connected to each channel for a complete time slot in the time-division multiplex scheme shown in FIG. 11. During this time two input samples from a given channel are delivered to switch 515. As in the case of the circuit of FIG. 6, a delay unit 501-1 is introduced in series with one input to the degenerate first stage, which otherwise includes only a computer. In the first stage, this delay, as before, is used to permit the simultaneous presentation to a computer of a pair of input samples from a given input channel. Thus the time delay introduced by delay unit 501-1 is equal to one-half of one time slot, e.g., TS0, in FIG. 11. This is also equivalent to D/2L, where D is the duration of a time interval such as t0-t1 in FIG. 7. Subsequent stages include pairs of delay units with L times as great a delay as the corresponding stages of the circuits of FIGS. 5, 6 and 8. Operation of the circuitry of FIG. 10 is otherwise equivalent to that performed by the circuitry of FIGS. 6 and 8.
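The scanning order described above can be sketched in software. In this minimal sketch each channel is a list of already reordered samples; the generator name `scan` and the tuple layout are hypothetical and are not taken from the patent.

```python
def scan(channels):
    """Yield (channel_index, (first_sample, second_sample)) per time slot.

    Each frame is one complete pass over all channels, and each time slot
    within a frame carries two consecutive samples from a single channel.
    The channels are assumed to hold equal, even numbers of samples.
    """
    length = len(channels[0])
    if length % 2 or any(len(ch) != length for ch in channels):
        raise ValueError("channels must have equal, even lengths")
    for start in range(0, length, 2):           # one frame per sample pair
        for index, channel in enumerate(channels):  # one time slot per channel
            yield index, (channel[start], channel[start + 1])


# Two channels of four samples each give two frames of two time slots.
for slot in scan([[0, 1, 2, 3], [10, 11, 12, 13]]):
    print(slot)
```

Because each time slot carries two samples, delaying the first sample of a slot by half a slot lines it up with the second, which is the role the text assigns to delay unit 501-1.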

It is clear that the computers in FIG. 10 may be multiplexed in the same manner as in the circuit of FIG. 9.

The above description of the present invention was meant to be merely illustrative. Other embodiments within the spirit and scope of the present disclosure and appended claims will occur to those skilled in the art. Particular extensions or modifications to the presently described embodiments include the use of fast Fourier transforms having other than base 2. That is, a straightforward modification of the presently described techniques and apparatus may be based on a base 4 or other base in accordance with the now well-known FFT theory and practice as adopted in accordance with the present teachings. In the case of a base 4 configuration, sets of four consecutive reordered input samples are grouped for presentation to a well-known base 4 version of the computer of FIG. 2. Likewise, the various switches are retained in a given position (up or down) for twice as long a period as in the circuits of FIGS. 6 and 8.
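The digits-reversed reordering assumed for the input sequence generalizes directly from base 2 to base 4 or another base. The following minimal sketch shows one way to compute it; the function names `reverse_digits` and `digit_reversed_order` are hypothetical.

```python
def reverse_digits(index: int, base: int, ndigits: int) -> int:
    """Reverse the base-`base` digits of `index`, padded to `ndigits` digits."""
    result = 0
    for _ in range(ndigits):
        index, digit = divmod(index, base)  # peel off the lowest digit
        result = result * base + digit      # append it at the high end
    return result


def digit_reversed_order(samples, base: int = 2):
    """Return the samples rearranged in digits-reversed order for `base`."""
    n = len(samples)
    ndigits, size = 0, 1
    while size < n:
        size *= base
        ndigits += 1
    if size != n:
        raise ValueError("number of samples must be a power of the base")
    return [samples[reverse_digits(i, base, ndigits)] for i in range(n)]


print(digit_reversed_order(list(range(8)), base=2))
print(digit_reversed_order(list(range(16)), base=4))
```

For base 4, each group of four consecutive reordered samples then holds the inputs that a base 4 computer would combine in the first stage.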

Although it has been assumed that the input samples are complex signals, i.e., having a real and an imaginary part (or magnitude and phase), there may be cases where the input samples are strictly real numbers. In such cases, memory (delay) for the first stage can be halved, because only real number signals will be processed there. Other simplifications based, for example, on the teachings of copending application by G. D. Bergland, Ser. No. 741,507, filed July 1, 1968, now U.S. Pat. No. 3,584,782, issued June 15, 1971, will occur to those skilled in the art.

The arithmetic operations attributed to the various computers mentioned above have, for purposes of clarity, been assumed to be performed in zero time. Thus the results of a particular computation have been assumed to be available immediately after the inputs have been presented to them. This assumption is fully justified for many cases where the speed of the computer is great and the input data rates are moderate.

Where this assumption does not hold, a small but discernible propagation delay will be encountered. This is conveniently indicated by a pause at each stage between successive time intervals such as t0-t1 and t1-t2 in FIG. 5. A convenient technique that may be used to account for this phenomenon is to introduce a further delay in applying results of a previous stage to a given stage. What is important, in general, is to keep the pairs (or other sets) of data supplied to a given stage grouped in time. Thus one of the two inputs (in the binary case) to a given stage will in general have been delayed by the processing of the previous stage in an amount different from that for the other; this is a predictable difference for each stage. Accordingly, an additional (small) delay is conveniently introduced into the output lead of a given stage having the smaller delay. The sum of all nonzero computer processing delays of course adds a small propagation delay to the time required to compute a complete set of Fourier coefficients.

While the present description has proceeded primarily in terms of special purpose hardware configuration, it is clear to those in the art that each of the above-described operations and functions is readily programmed for performance on many well-known general purpose computers. Thus the functions of the circuits of FIGS. 1-3, 6 and 8-10 are straightforward programmable arithmetic operations. The delay is conveniently introduced either by separate delay means or a programmed algorithm for delayed access. Similarly, the switching may be effected by selection under the control of a programmed indexing algorithm. The applicability of a general purpose processor is especially evident in the case of a multiplexing arrangement as shown in FIG. 9.
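A minimal software sketch of such a programmed realization follows. It is a conventional base 2 transform operating on a bit-reversed input sequence, not a model of the delay and switching hardware. The `butterfly` function follows the multiply, sum and difference operations recited for the computing means; the twiddle factor exp(-2πjk/N) is an assumed sign convention, and all function names are hypothetical.

```python
import cmath


def bit_reverse_order(samples):
    """Return the samples reordered in bit-reversed (base 2) order."""
    n = len(samples)
    m = n.bit_length() - 1
    if n < 2 or (1 << m) != n:
        raise ValueError("length must be a power of two, at least 2")
    out = [0] * n
    for i, value in enumerate(samples):
        out[int(format(i, f"0{m}b")[::-1], 2)] = value
    return out


def butterfly(first, second, w):
    """Multiply `first` by the trigonometric factor, then form sum and difference."""
    product = first * w
    return second + product, second - product


def fft_from_reordered(reordered):
    """Compute DFT coefficients from a bit-reversed input sequence."""
    data = list(reordered)
    n = len(data)
    span = 1  # distance between the two inputs of each butterfly
    while span < n:
        for start in range(0, n, 2 * span):
            for k in range(span):
                # Assumed convention: w = exp(-2*pi*j*k / (2*span)).
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))
                lo, hi = start + k, start + k + span
                data[lo], data[hi] = butterfly(data[hi], data[lo], w)
        span *= 2
    return data


x = [1, 2, 3, 4, 0, -1, 2.5, 1j]
X = fft_from_reordered(bit_reverse_order(x))
print([round(abs(c), 6) for c in X])
```

Each pass of the `while` loop plays the part of one stage of the cascade: the growing `span` stands in for the growing delay between the two samples that are combined at that stage.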

What is claimed is:

1. A fast Fourier transform processor for forming Fourier coefficient signals corresponding to one sequence of N input samples, N > 2, said input samples in said sequence having been reordered in digits-reversed order to form a reordered input sequence, comprising A. a plurality of ordered cascaded processing stages,

each stage comprising 1. a pair of input terminals,

2. a pair of output terminals,

3. first delay means for delaying signals applied at one of said input terminals,

4. computing means for forming output signals corresponding to selected pairs of signals applied at said input terminals, as selectively delayed by said first delay means, said computer means comprising a. a source of trigonometric function signals,

b. a multiplier for forming product signals corresponding to the product of a first one of each of said pair of signals applied at said input terminals and selected ones of said trigonometric function signals, thereby to form a first product signal, and

c. means for forming sum and difference output signals corresponding respectively to the sum of said first product signal and the second of each of said pairs of signals applied at said input terminals, and the difference between said second of said pair of signals applied at said input terminals and said first product signals,

5. second delay means for selectively delaying said output signals prior to applying said output signals to said output terminals, and

B. means for applying alternate samples from said reordered input sequence to corresponding alternate ones of said input terminals of the first of said processing stages.

2. The processor of claim 1 wherein said first and second delay means each comprise serial delay units.

3. The processor of claim 2 wherein at the ith of said stages, $i = 2, 3, \ldots, m$, $m = \log_2 N$,

said first delay means comprises means for delaying selected output signals from the (i - 1)th stage by an amount equal to $2^{i-2}$ sample periods.

4. A single-channel cascade FFT processor for generating Fourier series coefficients corresponding to a sequence of N periodic input data signals, N > 2, said input signals in said sequence having been reordered in digits-reversed order to form a reordered input sequence, comprising A. a source of trigonometric function signals,

B. a plurality of ordered stages each comprising 1. a pair of input terminals,

2. a pair of output terminals,

3. a computer for forming at said output terminals Fourier series coefficient output signals based on pairs of data signals presented at said input terminals and on selected trigonometric function signals from said source,

4. first delay means for selectively delaying the presentation of said output signals at said output terminals,

5. second delay means for selectively delaying signals presented at said input terminals prior to their application to said computer,

C. means for selectively connecting the output terminals of each of said stages except the last to the input terminals of a succeeding stage in such manner that each of said delay means delays a different equal length subset of signals during each succeeding period of said input sequence, and

D. means for applying alternate signals in said input sequence to corresponding alternate ones of said pair of input terminals of the first of said ordered stages.

5. A fast Fourier transform processor for concurrently forming a sequence of N Fourier coefficient signals corresponding to each of L input sequences, L > 3, each of said input sequences including N input samples, N > 2, said input samples in each of said input sequences having been reordered in digits-reversed order to form corresponding reordered input sequences, comprising A. a plurality of ordered cascaded processing stages,

each stage comprising 1. a pair of input terminals,

2. a pair of output terminals,

3. first delay means for selectively delaying signals applied at said input terminals,

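4. computing means for forming output signals corresponding to said signals selectively delayed by said delaying means, said computing means comprising a. a source of trigonometric function signals,

b. a multiplier for forming product signals corresponding to the product of a first one of each of said pair of signals applied at said input terminals and selected ones of said trigonometric function signals, thereby to form a first product signal, and

c. means for forming sum and difference output signals corresponding respectively to the sum of said first product signal and the second of each of said pairs of signals applied at said input terminals, and the difference between said second of said pair of signals applied at said input terminals and said first product signals,

5. second delay means for selectively delaying said output signals prior to applying said output signals to said output terminals, and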

B. means for sequentially applying pairs of signals from each of said L reordered input sequences to said input terminals of the first of said processing stages.

6. The processor of claim 5 wherein said means for applying subsets of signals comprises a switch for directing alternate consecutive ones of said reordered input samples from a given input sequence to corresponding alternate ones of said input terminals of said first of said processing stages.

7. The processor of claim 5 wherein said first and second delay means each comprise serial delay units.

8. The processor of claim 5 wherein said subsets of signals from each of said reordered input sequences each comprise two input sample signals, and wherein said first delay means at the first of said stages comprises means for delaying the first of said two input sample signals until the time of arrival of the second of said two input sample signals.

9. The processor of claim 8 wherein at the ith of said stages, $i = 2, 3, \ldots, m$, $m = \log_2 N$,

said first delay means comprises means for delaying selected output signals from the (i - 1)th stage by an amount equal to $L \cdot 2^{i-2}$ sample periods.

10. A single-channel cascade FFT processor for generating Fourier series coefficients corresponding to a sequence of N periodic input data signals, N > 2, comprising A. means for forming a reordered input sequence by reordering said sequence of input data signals in digits-reversed order,

B. a source of trigonometric function signals,

C. a plurality of ordered stages each comprising 1. a pair of input terminals,

2. a pair of output terminals,

D. means for applying said reordered input sequence to the input terminals of the first of said plurality of stages, and

E. means for selectively connecting the output terminals of each of said stages except the last to the input terminals of a succeeding stage in such manner that each of said delay means delays a different equal length subset of signals during each succeeding period of said input sequence.

CERTIFICATE OF CORRECTION

Inventor(s): Peter S. Fuss

It is certified that error appears in the above-identified patent and that said Letters Patent are hereby corrected as shown below:

Column 3, line 30; column 5, lines 16 and 32; column 6, lines 3, 6 and 7: [text of these corrections is illegible in this copy].

Column 9, line 8, before "magnitudes" insert --of--.

Signed and sealed this 29th day of May 1973.

(SEAL) Attest:

EDWARD M. FLETCHER, JR., Attesting Officer

ROBERT GOTTSCHALK, Commissioner of Patents

Claims

1. A fast Fourier transform processor for forming Fourier coefficient signals corresponding to one sequence of N input samples, N > 2, said input samples in said sequence having been reordered in digits-reversed order to form a reordered input sequence, comprising A. a plurality of ordered cascaded processing stages, each stage comprising 1. a pair of input terminals, 2. a pair of output terminals, 3. first delay means for delaying signals applied at one of said input terminals, 4. computing means for forming output signals corresponding to selected pairs of signals applied at said input terminals, as selectively delayed by said first delay means, said computer means comprising a. a source of trigonometric function signals, b. a multiplier for forming product signals corresponding to the product of a first one of each of said pair of signals applied at said input terminals and selected ones of said trigonometric function signals, thereby to form a first product signal, and c. means for forming sum and difference output signals corresponding respectively to the sum of said first product signal and the second of each of said pairs of signals applied at said input terminals, and the difference between said second of said pair of signals applied at said input terminals and said first product signals, 5. second delay means for selectively delaying said output signals prior to applying said output signals to said output terminals, and B. means for applying alternate samples from said reordered input sequence to corresponding alternate ones of said input terminals of the first of said processing stages.

2. The processor of claim 1 wherein said first and second delay means each comprise serial delay units.

3. The processor of claim 2 wherein at the ith of said stages, $i = 2, 3, \ldots, m$, $m = \log_2 N$, said first delay means comprises means for delaying selected output signals from the (i - 1)th stage by an amount equal to $2^{i-2}$ sample periods.

4. A single-channel cascade FFT processor for generating Fourier series coefficients corresponding to a sequence of N periodic input data signals, N > 2, said input signals in said sequence having been reordered in digits-reversed order to form a reordered input sequence, comprising A. a source of trigonometric function signals, B. a plurality of ordered stages each comprising 1. a pair of input terminals, 2. a pair of output terminals, 3. a computer for forming at said output terminals Fourier series coefficient output signals based on pairs of data signals presented at said input terminals and on selected trigonometric function signals from said source, 4. first delay means for selectively delaying the presentation of said output signals at said output terminals, 5. second delay means for selectively delaying signals presented at said input terminals prior to their application to said computer, C. means for selectively connecting the output terminals of each of said stages except the last to the input terminals of a succeeding stage in such manner that each of said delay means delays a different equal length subset of signals during each succeeding period of said input sequence, and D. means for applying alternate signals in said input sequence to corresponding alternate ones of said pair of input terminals of the first of said ordered stages.

5. A fast Fourier transform processor for concurrently forming a sequence of N Fourier coefficient signals corresponding to each of L input sequences, L > 3, each of said input sequences including N input samples, N > 2, said input samples in each of said input sequences having been reordered in digits-reversed order to form corresponding reordered input sequences, comprising A. a plurality of ordered cascaded processing stages, each stage comprising 1. a pair of input terminals, 2. a pair of output terminals, 3. first delay means for selectively delaying signals applied at said input terminals, 4. computing means for forming output signals corresponding to said signals selectively delayed by said delaying means, said computing means comprising a. a source of trigonometric function signals, b. a multiplier for forming product signals corresponding to the product of a first one of each of said pair of signals applied at said input terminals and selected ones of said trigonometric function signals, thereby to form a first product signal, and c. means for forming sum and difference output signals corresponding respectively to the sum of said first product signal and the second of each of said pairs of signals applied at said input terminals, and the difference between said second of said pair of signals applied at said input terminals and said first product signals, 5. second delay means for selectively delaying said output signals prior to applying said output signals to said output terminals, and B. means for sequentially applying pairs of signals from each of said L reordered input sequences to said input terminals of the first of said processing stages.

6. The processor of claim 5 wherein said means for applying subsets of signals comprises a switch for directing alternate consecutive ones of said reordered input samples from a given input sequence to corresponding alternate ones of said input terminals of said first of said processing stages.

7. The processor of claim 5 wherein said first and second delay means each comprise serial delay units.

8. The processor of claim 5 wherein said subsets of signals from each of said reordered input sequences each comprise two input sample signals, and wherein said first delay means at the first of said stages comprises means for delaying the first of said two input sample signals until the time of arrival of the second of said two input sample signals.

9. The processor of claim 8 wherein at the ith of said stages, $i = 2, 3, \ldots, m$, $m = \log_2 N$, said first delay means comprises means for delaying selected output signals from the (i - 1)th stage by an amount equal to $L \cdot 2^{i-2}$ sample periods.

10. A single-channel cascade FFT processor for generating Fourier series coefficients corresponding to a sequence of N periodic input data signals, N > 2, comprising A. means for forming a reordered input sequence by reordering said sequence of input data signals in digits-reversed order, B. a source of trigonometric function signals, C. a plurality of ordered stages each comprising 1. a pair of input terminals, 2. a pair of output terminals, 5. second delay means for selectively delaying signals presented at said input terminals prior to their application to said computer, D. means for applying said reordered input sequence to the input terminals of the first of said plurality of stages, and E. means for selectively connecting the output terminals of each of said stages except the last to the input terminals of a succeeding stage in such manner that each of said delay means delays a different equal length subset of signals during each succeeding period of said input sequence.