GB2109961A

GB2109961A - Digital data processing system

Info

Publication number: GB2109961A
Application number: GB08134091A
Authority: GB
Inventors: Wong Andrew Chi-Chung
Original assignee: Standard Telephone and Cables PLC
Current assignee: STC PLC
Priority date: 1981-11-12
Filing date: 1981-11-12
Publication date: 1983-06-08
Also published as: GB2109961B

Abstract

The technique of breaking up a binary coded array of input data points into a plurality of one-bit precision sub-arrays of shorter address length and of different significance, and mapping the sub-arrays sequentially into output data points in a look-up operation, the output data words then being individually weighted according to the significance of the sub-array before being re-combined, is extended to a multi-stage "butterfly" structure, in which an n-point Fourier transform is split into the sum of m n/m-point transforms. The sequences of sub-arrays are interleaved into m blocks, and the interleaved arrays sequentially address a corresponding number of memory blocks in a first stage of the multi-stage structure, each block having a look-up table for mapping the sub-arrays into respective output points, and the phase rotation between the output points of the second and subsequent blocks with respect to the phase of the output points for the first block being inherent in the look-up tables of the second and subsequent blocks.

Description

SPECIFICATION

Digital data processing system

This invention relates to a digital data processing system and is suitable for the treatment of Fourier transforms and other linear applications.

It is known that the number of vector multiplications in a conventional n-point discrete Fourier transform (DFT) is (n-1)^2, but that this can be reduced to (n/2) log2 n by the use of "butterfly" structures in which the input data points are interleaved and an n-point transform is split into m n/m-point transforms. The resulting outputs from each transform are then re-combined in one or more further stages of the butterfly structure, the final output points corresponding to those which would be obtained from a single n-point transform.
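The two multiplication counts quoted above can be compared with a short sketch. The function names are illustrative only, and the butterfly count assumes n is a power of two.

```python
import math


def direct_dft_multiplications(n: int) -> int:
    """Vector multiplications quoted for a direct n-point DFT: (n - 1)^2."""
    return (n - 1) ** 2


def butterfly_multiplications(n: int) -> int:
    """Vector multiplications quoted for a butterfly FFT: (n / 2) * log2(n).

    Assumes n is a power of two, so that log2(n) is an exact integer.
    """
    return (n // 2) * int(math.log2(n))


for n in (8, 16, 1024):
    print(n, direct_dft_multiplications(n), butterfly_multiplications(n))
```

For an 8-point transform the counts are 49 and 12 respectively, and the gap widens rapidly as n grows.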

Such a system is known as a Fast Fourier transform (FFT).

In mathematical terms, for an input time function

The n-point transform is thus split into the sum of two n/2-point transforms with a phase rotation term between them. For the kth output point, this phase rotation is:

$$\phi_k = \frac{2\pi k}{n} \qquad (2)$$

The mechanism of this process can be represented as in Figure 1 of the accompanying drawings, for the case of an 8-point DFT evaluated as two 4-point DFT's. From the vector diagram of Figure 1(b) it can be seen that the relative positions of alternate vectors are the same for the upper and lower halves of the diagram.
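The split into two n/2-point transforms described above can be written out as the standard decimation-in-time identity. The symbols used here ($f(r)$ for the input samples, $F(k)$ for the output points, and $W_n$ with its negative-exponent sign convention) are assumed notation for illustration and are not taken from the original text.

```latex
\begin{aligned}
F(k) &= \sum_{r=0}^{n-1} f(r)\, W_n^{rk}, \qquad W_n = e^{-j 2\pi / n} \\
     &= \sum_{r=0}^{n/2-1} f(2r)\, W_{n/2}^{rk}
        \;+\; W_n^{k} \sum_{r=0}^{n/2-1} f(2r+1)\, W_{n/2}^{rk}
\end{aligned}
```

The factor $W_n^{k}$ has unit magnitude and a phase angle of magnitude $2\pi k / n$, which is the rotation applied to the kth output of the second transform.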

From Figure 1(a), it can be seen that, if we want to evaluate an 8-point DFT as two 4-point DFT's, a phase rotation must be applied to each of the four outputs from the second transform before the outputs are added to, and subtracted from, the four outputs from the first transform to derive the frequency output points F1 to F8. This phase rotation is given by equation (2) for each output point and is known as the "twiddle factor".
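As a numerical check of the scheme just described, the following sketch evaluates an 8-point DFT as two 4-point DFTs joined by twiddle factors and compares it with a direct evaluation. The sign convention of the exponent, the function names and the sample values are assumptions made for illustration.

```python
import cmath


def dft(x):
    """Direct n-point DFT, F[k] = sum over r of x[r] * exp(-2j*pi*r*k/n)."""
    n = len(x)
    return [
        sum(x[r] * cmath.exp(-2j * cmath.pi * r * k / n) for r in range(n))
        for k in range(n)
    ]


def dft_by_halves(x):
    """n-point DFT built from two n/2-point DFTs and a twiddle factor."""
    n = len(x)
    half = n // 2
    first = dft(x[0::2])   # first transform: interleaved inputs 0, 2, 4, ...
    second = dft(x[1::2])  # second transform: interleaved inputs 1, 3, 5, ...
    out = [0j] * n
    for k in range(half):
        # Twiddle factor: a phase rotation of magnitude 2*pi*k/n.
        twiddle = cmath.exp(-2j * cmath.pi * k / n)
        rotated = twiddle * second[k]
        out[k] = first[k] + rotated         # added for the first half
        out[k + half] = first[k] - rotated  # subtracted for the second half
    return out


x = [3, 1, -2, 5, 0, 4, -1, 2]  # arbitrary illustrative samples
max_error = max(abs(a - b) for a, b in zip(dft(x), dft_by_halves(x)))
print(max_error)
```

The only multiplications left in `dft_by_halves` outside the two smaller transforms are the four twiddle-factor products.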

Accordingly, the operations involved in a butterfly cycle for a FFT include looking up a memory for the correct twiddle factor, multiplying it with the incoming data, accumulating the product in an arithmetical unit, and directing the output data to the appropriate point.

An alternative technique for producing discrete Fourier transforms (DFTs) using linear binary decomposition look-up tables is described in our copending application 8116855.

That technique consists of breaking up a binary coded array of input data points into a plurality of 1-bit precision sub-arrays of shorter address length and of different significance, and mapping the sub-arrays sequentially into output data points in a look-up operation, the output data words then being individually weighted according to the significance of the sub-array before being recombined.
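The decomposition can be illustrated with a minimal sketch. It assumes unsigned real input samples of a fixed word length and a 4-point transform. The names `TABLE` and `dft_by_lookup` are hypothetical, and the details of the copending application are not reproduced here.

```python
import cmath

N = 4     # transform size (assumed for illustration)
BITS = 4  # word length of each unsigned input sample (assumed)


def dft(x):
    """Direct DFT used to compile the table and to check the result."""
    n = len(x)
    return [
        sum(x[r] * cmath.exp(-2j * cmath.pi * r * k / n) for r in range(n))
        for k in range(n)
    ]


# Look-up table: one entry per N-bit address. Each entry holds the N output
# points of the DFT of the 0/1 vector spelled out by the address bits.
TABLE = [dft([(addr >> r) & 1 for r in range(N)]) for addr in range(2 ** N)]


def dft_by_lookup(samples):
    out = [0j] * N
    for b in range(BITS):
        # One-bit precision sub-array: bit b of every input sample.
        addr = sum(((s >> b) & 1) << r for r, s in enumerate(samples))
        entry = TABLE[addr]
        # Weight the looked-up words by the significance of the sub-array.
        out = [acc + (1 << b) * word for acc, word in zip(out, entry)]
    return out


samples = [5, 12, 7, 9]  # arbitrary illustrative samples
max_error = max(
    abs(a - b) for a, b in zip(dft(samples), dft_by_lookup(samples))
)
print(max_error)
```

Because the transform is linear, summing the weighted look-ups over all bit positions gives the same result as transforming the full-precision samples; at run time only look-ups, binary weightings and additions are needed.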

In accordance with the present invention, there is provided a digital data processing system for producing an n-point Fourier transform, the system comprising means for splitting a binary coded array of input data points into a plurality of one-bit precision sub-arrays of shorter address length and of different significance, means for interleaving the sub-arrays into m blocks such that the successive sub-arrays sequentially address a corresponding number of memory blocks in a first stage of a multi-stage structure for splitting an n-point transform into the sum of m n/m-point transforms, each block having a look-up table for mapping the sub-arrays into respective output points and the phase rotation between the output points of the second and subsequent blocks with respect to the phase of the output points of the first block being inherent in the look-up tables of the second and subsequent blocks whereby the output data words from the blocks are individually weighted according to the significance of the sub-array and combined in further blocks in one or more further stages of the structure without requiring further vector-multiplication.

Accordingly, compared to the operations involved in a butterfly cycle for a conventional FFT, the operations in the equivalent linear binary decomposition cycle consist of looking up the memory for an output value and accumulating the values in an arithmetic unit. This is simpler and hence more efficient. The resulting multi-stage memory structure is therefore similar to the butterfly structures of more conventional FFTs, but provides the ability to perform discrete Fourier transforms of large size with lower power dissipation and/or higher processing speeds and/or reduced circuit complexity.

The invention may be best understood by reference to the following detailed examples in association with the accompanying drawings, in which:

Figure 1a illustrates diagrammatically the input array time decomposition for a conventional two-stage FFT butterfly structure for an 8-point discrete Fourier transform,

Figure 1b is a vector diagram for a set of unity input vectors in the various stages of forming 8 frequency outputs in an 8-point Fourier transform,

Figure 2 is a redrawing of Figure 1a in which the transform is produced by linear binary decomposition and showing the summation points as a second row of look-up table memory blocks,

Figure 3 is a diagrammatic representation of a two-stage look-up memory structure for a 16-point transform,

Figures 4(a) and 4(b) are a schematic diagram of an optimised look-up memory structure and a timing diagram therefor respectively,

Figure 5 is a schematic diagram of a non-time-shared basic building block in the multi-stage structure of Figure 2 or Figure 3,

Figure 6 is a schematic diagram of a time-shared building block in the multi-stage structure of Figure 2 or Figure 3, and,

Figure 7 is a schematic diagram of a multi-stage structure for an n-point DFT based on the principle of linear binary decomposition and with total time-sharing within each stage.

The operation at each summation point in the second stage of the Figure 1a configuration is in fact a 2-point DFT.

The circuit can therefore be redrawn as shown in Figure 2, with four 2-point look-up memory blocks in the second stage, and with inherent phase rotation provided in the lower 4-point memory block of the first stage.

For such a simple operation, the second stage need not consist of true look-up tables. If it does, however, it can be shown that there is still a substantial saving in the amount of memory required compared to a single-stage structure.

For a single stage 8-point block using linear binary decomposition, the memory size is 2^8 x 8 = 2048, or 2K, words.

In the 2-stage configuration of Figure 2, because of the inherent phase rotation in the lower 4-point block of the first stage, the symmetry is destroyed. Hence, the memory size of this block is 2^4 x 4 x 2 = 128 words.

The memory size of the upper first stage block is half this, namely 64 words. Each 2-point block in the second stage requires 2^2 x 2 = 8 words. The total memory size is therefore 128 + 64 + (8 x 4) = 224 words.
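The word counts quoted above can be reproduced with a few lines of arithmetic, reading the quoted sizes as powers of two. The helper `block_words` and its `factor` parameter are illustrative names; the factor of two for the lower block follows the statement that its symmetry is destroyed.

```python
def block_words(points: int, factor: int = 1) -> int:
    """Words in one look-up block: 2**points addresses times points outputs.

    `factor` is 2 for a block whose symmetry is destroyed by the inherent
    phase rotation, and 1 otherwise.
    """
    return (2 ** points) * points * factor


single_stage = block_words(8)                 # 2^8 x 8
lower_first_stage = block_words(4, factor=2)  # 2^4 x 4 x 2
upper_first_stage = block_words(4)            # 2^4 x 4
second_stage_block = block_words(2)           # 2^2 x 2

two_stage_total = (
    lower_first_stage + upper_first_stage + 4 * second_stage_block
)
print(single_stage, two_stage_total)
```

On these figures the two-stage structure needs 224 words against 2048 for the single stage, roughly a ninefold reduction.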

The saving in memory is even greater if the n-point transform is split into smaller multiples, giving building blocks of smaller memory size. Figure 3 shows a 2-stage structure for a 16-point transform using 4-point blocks. Structures with more than 2 stages are, of course, also possible.

Because the inherent phase rotation is present in some of the blocks and not in the others, an "in place" processing strategy, whereby the output of one stage is fed to the input of the same stage for a second round of operations, is not feasible. The circuits have to be true pipe-line structures. As such, the throughput rate will be virtually the same irrespective of the number of stages involved.

The basic circuit for a single-stage structure using the linear binary decomposition technique is described in our copending application 8116855. The multi-bit words representing the real and imaginary parts of the input vectors are inserted in parallel into respective shift registers, least significant bit first. The bits are read out from the shift registers in parallel, most significant bit first, under the control of clock pulses and applied via address buses to a number of read-only memories (ROM's). The number of ROM's is equal to the number of output points required. Each ROM responds to the multi-bit address to give either one or two sets of outputs, these being the I and Q outputs respectively.

The I and Q outputs from the ROM's are then processed in an arithmetic unit (ALU) consisting of adders and latches to derive the final real and imaginary output words for each output point in the Fourier transform. The above operations can be performed by the optimised ALU structure shown in Figure 4(a). In this, one ALU is time multiplexed between four add/subtract operations.

The cycle begins with both 'I' and 'Q' tri-state output shift registers having been shifted right by one bit, effecting the 2^k scaling. The 'I' output is first routed to the 'B' data lines of the ALU with the 'Q' output inhibited. Simultaneously, the in-phase input is used to address the 'I' ROM and the quadrature input to address the 'Q' ROM. The 'I' ROM is enabled first, passing an II output to the 'A' data lines of the ALU, which is then added to the previous value and latched. The 'Q' ROM is then enabled, passing a QQ output to the 'A' lines, which is then subtracted (by using the function control on the ALU) from the previous value and latched. At this point, an intermediate in-phase output, incorporating one II - QQ update, is formed on the 'I' shift register. The 'Q' shift register is then enabled instead, passing the existing 'Q' value to the ALU. The addresses to the 'I' and 'Q' ROM's are now interchanged, effecting the accumulation of IQ + QI to the existing 'Q' value. The cycle repeats itself until all the bits are processed. Refer to Figure 4(b) for the relative timing between the various components.
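The four add/subtract operations per bit can be followed in a behavioural sketch for a single output point. This is not a model of the circuit of Figure 4(a): the shift-register scaling is modelled simply as multiplication by the bit weight, the inputs are assumed to be unsigned words, and the names `I_ROM`, `Q_ROM`, `accumulate`, the transform size and the chosen output point are all assumptions made for illustration.

```python
import cmath
import math

N = 4     # points per block (assumed)
BITS = 4  # unsigned word length of each input component (assumed)
K = 1     # output point served by this pair of ROMs (assumed)


def rom(coeffs):
    """For each N-bit address, the sum of the coefficients it selects."""
    return [
        sum(c for r, c in enumerate(coeffs) if (addr >> r) & 1)
        for addr in range(2 ** N)
    ]


# Real and imaginary parts of exp(-2j*pi*r*K/N) for r = 0 .. N-1.
I_ROM = rom([math.cos(2 * math.pi * r * K / N) for r in range(N)])
Q_ROM = rom([-math.sin(2 * math.pi * r * K / N) for r in range(N)])


def bit_slice(words, b):
    """Address formed from bit b of every word (a one-bit sub-array)."""
    return sum(((w >> b) & 1) << r for r, w in enumerate(words))


def accumulate(in_phase, quadrature):
    i_acc = q_acc = 0.0
    for b in range(BITS):
        weight = 1 << b  # stands in for the shift-register scaling
        a_i = bit_slice(in_phase, b)
        a_q = bit_slice(quadrature, b)
        i_acc += weight * I_ROM[a_i]  # add II
        i_acc -= weight * Q_ROM[a_q]  # subtract QQ
        # Addresses to the two ROMs interchanged for the quadrature output.
        q_acc += weight * Q_ROM[a_i]  # add IQ
        q_acc += weight * I_ROM[a_q]  # add QI
    return i_acc, q_acc


in_phase = [3, 10, 7, 1]     # arbitrary illustrative samples
quadrature = [6, 0, 15, 9]
expected = sum(
    complex(i, q) * cmath.exp(-2j * cmath.pi * r * K / N)
    for r, (i, q) in enumerate(zip(in_phase, quadrature))
)
i_out, q_out = accumulate(in_phase, quadrature)
error = abs(complex(i_out, q_out) - expected)
print(error)
```

The real accumulator collects II - QQ and the imaginary accumulator IQ + QI, which together make up the complex product of the input vectors with the transform coefficients.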

The cycle time of this circuit is equal to 2 x (ALU add time + ROM access time).

By suitable choice of components to match the access time to the add time plus the various overheads, the throughput of this circuit can usually be fairly well optimised.

Figure 5 of the accompanying drawings is a simplified schematic diagram of such a system. This can be used as a basic building block in a multi-stage structure. Each output point block in this diagram is in fact a simplified representation of the diagram described above.

Assume that an 8-point transform is being split into the sum of two 4-point transforms. In this case the sequence of sub-arrays producing the first, third, fifth and seventh output points will address all the I and Q ROM's in the block of Figure 5, while the sequence of sub-arrays producing the second, fourth, sixth and eighth output points will address a similar number of ROM's in a second building block identical to that shown in Figure 5. The resulting output words from each pair of I and Q ROM's in each building block are then weighted and summed in respective arithmetic unit (ALU) blocks to produce two groups of four I output words and four Q output words representing four output points of a 4-point transform.

The corresponding words in the two groups of words must then be added and subtracted to produce the final eight output words representing the output points of the required 8-point transform. Provided the look-up tables in the lower block of ROM's have been compiled to take account of the inherent phase rotation term which is generated when the transform is split into two, these final additions and subtractions can be formed directly without the need for a vector multiplication.
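A minimal sketch of this arrangement follows, assuming unsigned real input samples, a negative-exponent sign convention, and hypothetical names (`UPPER`, `LOWER`, `block`, `dft8`). The phase rotation is applied once, when the lower table is compiled, so the run-time path contains only look-ups, binary weightings, additions and subtractions.

```python
import cmath

N, HALF, BITS = 8, 4, 4  # transform size, block size, word length (assumed)


def w(exponent, size):
    """Unit vector exp(-2j*pi*exponent/size); used at table-compile time."""
    return cmath.exp(-2j * cmath.pi * exponent / size)


def compile_table(rotate):
    """4-point look-up table; if rotate, output k is pre-rotated by 2*pi*k/8."""
    rows = []
    for addr in range(2 ** HALF):
        bits = [(addr >> r) & 1 for r in range(HALF)]
        row = []
        for k in range(HALF):
            point = sum(bits[r] * w(r * k, HALF) for r in range(HALF))
            row.append(point * w(k, N) if rotate else point)
        rows.append(row)
    return rows


UPPER = compile_table(rotate=False)  # first block: no phase rotation
LOWER = compile_table(rotate=True)   # second block: rotation built in


def block(samples, table):
    """Look up each one-bit sub-array and accumulate with binary weights."""
    out = [0j] * HALF
    for b in range(BITS):
        addr = sum(((s >> b) & 1) << r for r, s in enumerate(samples))
        out = [acc + (1 << b) * word for acc, word in zip(out, table[addr])]
    return out


def dft8(samples):
    upper = block(samples[0::2], UPPER)  # first, third, fifth, seventh points
    lower = block(samples[1::2], LOWER)  # second, fourth, sixth, eighth points
    sums = [u + l for u, l in zip(upper, lower)]
    differences = [u - l for u, l in zip(upper, lower)]
    return sums + differences


samples = [5, 12, 7, 9, 0, 15, 3, 8]  # arbitrary illustrative samples
direct = [
    sum(samples[r] * w(r * k, N) for r in range(N)) for k in range(N)
]
max_error = max(abs(a - b) for a, b in zip(direct, dft8(samples)))
print(max_error)
```

Moving the rotation into the table contents is what removes the twiddle-factor multiplication from the second stage.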

The building block of Figure 5 can be modified as shown in Figure 6 to reduce power dissipation by time-sharing the ALU's between the various blocks. In this way, the four points from each block will be output sequentially, requiring four times the time but, on the other hand, only one quarter of the ALU dissipation.

Figure 6 also shows the I and Q memories combined into a larger memory of four times the original capacity. This can be done since data from no more than one point is required by the ALU at one time. It is achieved by arranging each shift register in the form of a cyclic store to provide the same address to each of the I and Q ROM's through address lines A3 to A10 eight times per transform cycle, to cater for each of eight output points formed in sequence. There are, therefore, a total of sixteen such shift registers. A timing circuit, which is common to all blocks, selects one out of eight sets of output data for each cycle of the cyclic store through address lines A0 to A2. Thus instead of each sub-array simultaneously addressing a number of ROM's in each block, the arrays are presented sequentially to a single ROM in a number of cycles of the cyclic store.
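The addressing scheme can be pictured with a small sketch in which the three low address lines select the output point and the eight high address lines carry the sub-array from the cyclic store. Treating A0 as the least significant address bit, and the function names used, are assumptions made for illustration.

```python
POINT_BITS = 3     # A0 to A2: selects one of eight sets of output data
SUBARRAY_BITS = 8  # A3 to A10: the sub-array held in the cyclic store


def rom_address(sub_array: int, point: int) -> int:
    """Combine the sub-array and the point select into one ROM address."""
    if not 0 <= sub_array < (1 << SUBARRAY_BITS):
        raise ValueError("sub_array does not fit on lines A3 to A10")
    if not 0 <= point < (1 << POINT_BITS):
        raise ValueError("point does not fit on lines A0 to A2")
    return (sub_array << POINT_BITS) | point


def sequential_addresses(sub_array: int):
    """Addresses seen by the single ROM as the timing circuit steps
    through the eight output points for one recirculated sub-array."""
    return [rom_address(sub_array, p) for p in range(1 << POINT_BITS)]


addresses = sequential_addresses(0b10110010)
print(addresses)
```

The same sub-array is presented eight times, once for each output point, with only the three select lines changing between presentations.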

The time sharing can be taken one step further by combining components not only within one block but all common components within a single stage of a multi-stage structure. The result is shown in the simplified schematic diagram of Figure 7. Each cyclic store has two sets of 2n channels of parallel-in, serial-out shift-registers to allow parallel dumping of data from the ALU block into one set, and simultaneous addressing of the next stage memory through the other set.

Claims

1. A digital data processing system for producing an n-point Fourier transform, the system comprising means for splitting a binary coded array of input data points into a plurality of one-bit precision sub-arrays of shorter address length and of different significance, means for interleaving the sub-arrays into m blocks such that the successive sub-arrays sequentially address a corresponding number of memory blocks in a first stage of a multi-stage structure for splitting an n-point transform into the sum of m n/m-point transforms, each block having a look-up table for mapping the sub-arrays into respective output points and the phase rotation between the output points of the second and subsequent blocks with respect to the phase of the output points of the first block being inherent in the look-up tables of the second and subsequent blocks whereby the output data words from the blocks are individually weighted according to the significance of the sub-array and combined in further blocks in one or more further stages of the structure without requiring further vector-multiplication.

2. A system according to claim 1 in which each block consists of a plurality of pairs of ROM's for the real and imaginary parts of the input vectors, and each pair of ROM's has an associated arithmetic unit.

3. A system according to claim 1 in which each block consists of a single pair of ROM's with a common arithmetic unit, and means for time-sharing the ROM's between the successive sequences of input sub-arrays.

4. A system according to claim 1 in which each stage of the multi-stage structure includes a cyclic store, a memory and an arithmetic unit, the first stage store cycling the interleaved sequences of sub-arrays and the second and subsequent stage stores each cycling the output words from the arithmetic unit of the preceding stage.

5. A system according to claim 1 and substantially as herein described with reference to the accompanying drawings.