GB2109961A - Digital data processing system - Google Patents

Publication number
GB2109961A
Authority
GB
United Kingdom
Prior art keywords
sub
stage
arrays
blocks
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB08134091A
Other versions
GB2109961B (en)
Inventor
Wong Andrew Chi-Chung
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
STC PLC
Original Assignee
Standard Telephone and Cables PLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Standard Telephone and Cables PLC
Priority to GB08134091A
Publication of GB2109961A
Application granted
Publication of GB2109961B
Legal status: Expired

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/141 Discrete Fourier transforms
    • G06F17/142 Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm

Landscapes

  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Discrete Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The technique of breaking up a binary coded array of input data points into a plurality of one-bit precision sub-arrays of shorter address length and of different significance, and mapping the sub-arrays sequentially into output data points in a look-up operation, the output data words then being individually weighted according to the significance of the sub-array before being re-combined, is extended to a multi-stage "butterfly" structure, in which an n-point Fourier transform is split into the sum of m n/m-point transforms. The sequences of sub-arrays are interleaved into m blocks, and the interleaved arrays sequentially address a corresponding number of memory blocks in a first stage of the multi-stage structure, each block having a look-up table for mapping the sub-arrays into respective output points, and the phase rotation between the output points of the second and subsequent blocks with respect to the phase of the output points for the first block being inherent in the look-up tables of the second and subsequent blocks.

Description

SPECIFICATION Digital data processing system This invention relates to a digital data processing system and is suitable for the treatment of Fourier transforms and other linear applications.
It is known that the number of vector multiplications in a conventional n-point discrete Fourier transform (DFT) is (n-1)^2, but that this can be reduced to (n/2) log2 n by the use of "butterfly" structures in which the input data points are interleaved and an n-point transform is split into m n/m-point transforms. The resulting outputs from each transform are then re-combined in one or more further stages of the butterfly structure, the final output points corresponding to those which would be obtained from a single n-point transform.
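The two multiplication counts quoted above can be compared numerically. The following is a minimal sketch that takes the counts as (n-1)^2 and (n/2) log2 n and assumes n is a power of two; the function names are illustrative only.

```python
def direct_dft_multiplies(n: int) -> int:
    """Vector multiplications quoted for a direct n-point DFT: (n-1)^2."""
    return (n - 1) ** 2


def butterfly_multiplies(n: int) -> int:
    """Vector multiplications quoted for a butterfly structure: (n/2) * log2(n).

    Assumes n is a power of two, so log2(n) is n.bit_length() - 1.
    """
    return (n // 2) * (n.bit_length() - 1)


for n in (8, 16, 1024):
    print(n, direct_dft_multiplies(n), butterfly_multiplies(n))
```

For n = 8 the counts are 49 and 12 respectively, and the gap widens rapidly with n.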
Such a system is known as a Fast Fourier transform (FFT).
In mathematical terms, for an input time function
The n-point transform is thus split into the sum of two n/2-point transforms with a phase rotation term between them. For the kth output point, this phase rotation is:

$$\phi_k = \frac{2\pi k}{n} \qquad (2)$$

The mechanism of this process can be represented as in Figure 1 of the accompanying drawings, for the case of an 8-point DFT evaluated as two 4-point DFT's. From the vector diagram of Figure 1(b) it can be seen that the relative positions of alternate vectors are the same for the upper and lower halves of the diagram.
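The split into two n/2-point transforms corresponds to the standard radix-2 decimation-in-time identity. The following is a sketch only: the symbols f(t), F(k) and W_n, and the negative-exponent sign convention, are assumptions and not necessarily the notation of the specification.

```latex
\begin{aligned}
F(k) &= \sum_{t=0}^{n-1} f(t)\, W_n^{kt}, \qquad W_n = e^{-j 2\pi / n} \\
     &= \sum_{r=0}^{n/2-1} f(2r)\, W_{n/2}^{kr}
        \;+\; W_n^{k} \sum_{r=0}^{n/2-1} f(2r+1)\, W_{n/2}^{kr}
\end{aligned}
```

The factor W_n^k multiplying the second sum is a pure rotation whose phase has magnitude 2πk/n, which is the phase rotation term between the two half-length transforms.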
From Figure 1(a), it can be seen that, if we want to evaluate an 8-point DFT as two 4-point DFT's, a phase rotation must be applied to each of the four outputs from the second transform before the outputs are added to, and subtracted from, the four outputs from the first transform to derive the frequency output points F1 to F8. This phase rotation is given by equation (2) for each output point and is known as the "twiddle factor".
Accordingly, the operations involved in a butterfly cycle for an FFT include looking up a memory for the correct twiddle factor, multiplying it with the incoming data, accumulating the product in an arithmetical unit, and directing the output data to the appropriate point.
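The butterfly cycle just described can be sketched in software for the case of an n-point transform formed from two n/2-point transforms. This is an illustrative model rather than the circuit itself; the function names and the negative-exponent sign convention are assumptions.

```python
import cmath


def dft(x):
    """Direct DFT, used here for the half-length transforms and as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def butterfly_stage(x):
    """Form an n-point DFT from two n/2-point DFTs of the interleaved inputs."""
    n = len(x)
    first = dft(x[0::2])    # transform of samples 0, 2, 4, ...
    second = dft(x[1::2])   # transform of samples 1, 3, 5, ...
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n)  # look up the twiddle factor
        product = twiddle * second[k]                # vector multiplication
        out[k] = first[k] + product                  # accumulate: add
        out[k + n // 2] = first[k] - product         # accumulate: subtract
    return out
```

Calling `butterfly_stage` on an 8-element list gives the same eight output points as `dft` applied to the whole list, at the cost of one vector multiplication per butterfly.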
An alternative technique for producing discrete Fourier transforms (DFTs) using linear binary decomposition look-up tables is described in our copending application 8116855.
That technique consists of breaking up a binary coded array of input data points into a plurality of 1-bit precision sub-arrays of shorter address length and of different significance, and mapping the sub-arrays sequentially into output data points in a look-up operation, the output data words then being individually weighted according to the significance of the sub-array before being recombined.
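A minimal software sketch of this look-up technique follows. It assumes real, unsigned integer inputs of `word_bits` bits each, a single table indexed by the one-bit sub-array, and explicit weighting by powers of two; these choices and all names are illustrative assumptions, not details taken from the copending application.

```python
import cmath


def reference_dft(x):
    """Direct DFT, for comparison only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def build_table(n):
    """table[address][k]: output point k for the one-bit array encoded by address."""
    table = []
    for address in range(1 << n):
        bits = [(address >> t) & 1 for t in range(n)]
        table.append([sum(bits[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                          for t in range(n)) for k in range(n)])
    return table


def lookup_dft(x, word_bits, table):
    """DFT by bit-plane look-up: no multiplication by transform coefficients."""
    n = len(x)
    out = [0j] * n
    for b in range(word_bits):
        # One-bit precision sub-array: bit b of every input point forms the address.
        address = sum(((x[t] >> b) & 1) << t for t in range(n))
        weight = 1 << b  # significance of this sub-array
        for k in range(n):
            out[k] += weight * table[address][k]
    return out
```

Because the transform is linear, summing the weighted look-up results over all bit planes reproduces the DFT of the full-precision inputs.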
In accordance with the present invention, there is provided a digital data processing system for producing an n-point Fourier transform, the system comprising means for splitting a binary coded array of input data points into a plurality of one-bit precision sub-arrays of shorter address length and of different significance, means for interleaving the sub-arrays into m blocks such that the successive sub-arrays sequentially address a corresponding number of memory blocks in a first stage of a multi-stage structure for splitting an n-point transform into the sum of m n/m-point transforms, each block having a look-up table for mapping the sub-arrays into respective output points and the phase rotation between the output points of the second and subsequent blocks with respect to the phase of the output points of the first block being inherent in the look-up tables of the second and subsequent blocks whereby the output data words from the blocks are individually weighted according to the significance of the sub-array and combined in further blocks in one or more further stages of the structure without requiring further vector-multiplication.
Accordingly, compared to the operations involved in a butterfly cycle for a conventional FFT, the operations in the equivalent linear binary decomposition cycle consist of looking up the memory for an output value and accumulating the values in an arithmetic unit. This is simpler and hence more efficient. The resulting multi-stage memory structure is therefore similar to the butterfly structures of more conventional FFTs, but provides the ability to perform discrete Fourier transforms of large size with lower power dissipation and/or higher processing speeds and/or reduced circuit complexity.
The invention may be best understood by reference to the following detailed examples in association with the accompanying drawings, in which:
Figure 1a illustrates diagrammatically the input array time decomposition for a conventional two-stage FFT butterfly structure for an 8-point discrete Fourier transform,
Figure 1b is a vector diagram for a set of unity input vectors in the various stages of forming 8 frequency outputs in an 8-point Fourier transform,
Figure 2 is a redrawing of Figure 1a in which the transform is produced by linear binary decomposition and showing the summation points as a second row of look-up table memory blocks,
Figure 3 is a diagrammatic representation of a two-stage look-up memory structure for a 16-point transform,
Figures 4(a) and 4(b) are a schematic diagram of an optimised look-up memory structure and a timing diagram therefor respectively,
Figure 5 is a schematic diagram of a non-time-shared basic building block in the multi-stage structure of Figure 2 or Figure 3,
Figure 6 is a schematic diagram of a time-shared building block in the multi-stage structure of Figure 2 or Figure 3, and
Figure 7 is a schematic diagram of a multi-stage structure for an n-point DFT based on the principle of linear binary decomposition and with total time-sharing within each stage.
The operation at each summation point in the second stage of the Figure 1a configuration is in fact a DFT.
The circuit can therefore be redrawn as shown in Figure 2, with four 2-point look-up memory blocks in the second stage, and with inherent phase rotation provided in the lower 4-point memory block of the first stage.
For such a simple operation, the second stage need not consist of true look-up tables. If it does, however, it can be shown that there is a substantial saving in the amount of memory required compared to a single stage structure.
For a single stage 8-point block using linear binary decomposition, the memory size is 2^8 x 8 = 2048 words, or 2K words.
In the 2-stage configuration of Figure 2, because of the inherent phase rotation in the lower 4-point block of the first stage, the symmetry is destroyed. Hence, the memory size of this block is 2^4 x 4 x 2 = 128 words.
The memory size of the upper first stage block is half of this, equal to 64 words. Each 2-point block in the second stage requires 2^2 x 2 = 8 words. The total memory size is therefore 128 + 64 + (8 x 4) = 224 words.
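The memory arithmetic above can be checked with a few lines of code. This is a sketch in which a p-point block is taken to need 2^p addresses times p output points, with an extra factor of 2 for the block that carries the phase rotation; the function and parameter names are hypothetical.

```python
def block_words(points: int, rotation_factor: int = 1) -> int:
    """Words in one look-up block: 2**points addresses x points outputs.

    rotation_factor is 2 for a block whose table includes the inherent
    phase rotation (symmetry destroyed), otherwise 1.
    """
    return (2 ** points) * points * rotation_factor


single_stage = block_words(8)                 # one 8-point block
upper_first = block_words(4)                  # upper 4-point block
lower_first = block_words(4, rotation_factor=2)  # lower 4-point block with rotation
second_stage = 4 * block_words(2)             # four 2-point blocks
two_stage = upper_first + lower_first + second_stage

print(single_stage, two_stage)
```

With these figures the two-stage arrangement needs 224 words against 2048 for the single stage, roughly a ninefold reduction.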
The saving in memory is even greater if the n-point transform is split into smaller multiples, giving building blocks of smaller memory size. Figure 3 shows a 2-stage structure for a 16-point transform using 4-point blocks. Structures with more than 2 stages are, of course, also possible.
Because of the inherent phase rotation in some of the blocks and not in the others, an "in place" processing strategy, whereby the output of one stage is fed to the input of the same stage for a second round of operations, is not feasible. The circuits have to be true pipeline structures. As such, the throughput rate will be virtually the same irrespective of the number of stages involved.
The basic circuit for a single-stage structure using the linear binary decomposition technique is described in our copending application 8116855. The multi-bit words representing the real and imaginary parts of the input vectors are inserted in parallel into respective shift registers, least significant bit first. The bits are read out from the shift registers in parallel, most significant bit first, under the control of clock pulses and applied via address buses to a number of read-only memories (ROM's). The number of ROM's is equal to the number of output points required. Each ROM responds to the multi-bit address to give either one or two sets of outputs, these being the I and Q outputs respectively.
The I and Q outputs from the ROM's are then processed in an arithmetic unit (ALU) consisting of adders and latches to derive the final real and imaginary output words for each output point in the Fourier transform. The above operations can be performed by the optimised ALU structure shown in Figure 4(a). In this, one ALU is time multiplexed between four add/subtract operations. The cycle begins with both 'I' and 'Q' tri-state output shift registers having been shifted right by one bit, effecting the 2^k scaling. The 'I' output is first routed to the 'B' data lines of the ALU with the 'Q' output inhibited. Simultaneously, the in-phase input is used to address the 'I' ROM and the quadrature input the 'Q' ROM. The 'I' ROM is enabled first, passing an II output to the 'A' data lines of the ALU, which is then added to the previous value and latched. The 'Q' ROM is then enabled, passing a QQ output to the 'A' lines, which is then subtracted (by using the function control on the ALU) from the previous value and latched. At this point, an intermediate in-phase output, incorporating one II - QQ update, is formed on the 'I' shift register. The 'Q' shift register is then enabled instead, passing the existing 'Q' value to the ALU. The addresses to the 'I' and 'Q' ROM's are now interchanged, effecting the accumulation of IQ + QI to the existing 'Q' value. The cycle repeats itself until all the bits are processed. Refer to Figure 4(b) for the relative timing between the various components.
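The four add/subtract operations per bit can be modelled in software for a single output point. The sketch below assumes unsigned integer real and imaginary input parts, an 'I' ROM holding cosine sums and a 'Q' ROM holding sine sums, and applies the power-of-two weighting explicitly rather than by the register shift of the circuit; the names and the sign convention are assumptions.

```python
import cmath


def build_roms(n, k):
    """I and Q ROM contents for output point k, indexed by an n-bit address."""
    i_rom, q_rom = [], []
    for address in range(1 << n):
        w = sum(((address >> t) & 1) * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
        i_rom.append(w.real)  # sum of cosine terms selected by the address
        q_rom.append(w.imag)  # sum of sine terms selected by the address
    return i_rom, q_rom


def output_point(re, im, word_bits, i_rom, q_rom):
    """Accumulate II - QQ into the I word and IQ + QI into the Q word."""
    n = len(re)
    acc_i = acc_q = 0.0
    for b in range(word_bits):
        addr_i = sum(((re[t] >> b) & 1) << t for t in range(n))  # in-phase bits
        addr_q = sum(((im[t] >> b) & 1) << t for t in range(n))  # quadrature bits
        weight = 1 << b
        acc_i += weight * i_rom[addr_i]   # add II
        acc_i -= weight * q_rom[addr_q]   # subtract QQ
        # Addresses interchanged between the two ROMs for the Q word.
        acc_q += weight * q_rom[addr_i]
        acc_q += weight * i_rom[addr_q]
    return complex(acc_i, acc_q)
```

The two accumulations are simply the real and imaginary parts of the complex product (a + jb)(c + js) = (ac - bs) + j(as + bc), summed over the input points selected by each bit plane.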
The cycle time of this circuit is equal to 2 x (ALU add time + ROM access time).
By suitable choice of components to match the access time to the add time plus the various overheads, the throughput of this circuit can usually be fairly well optimised.
Figure 5 of the accompanying drawings is a simplified schematic diagram of such a system. This can be used as a basic building block in a multi-stage structure. Each output point block in this diagram is in fact a simplified representation of the diagram described above.
Assume that an 8-point transform is being split into the sum of two 4-point transforms. In this case the sequence of sub-arrays producing the first, third, fifth and seventh output points will address all the I and Q ROM's in the block of Figure 5, while the sequence of sub-arrays producing the second, fourth, sixth and eighth output points will address a similar number of ROM's in a second building block identical to that shown in Figure 5. The resulting output words from each pair of I and Q ROM's in each building block are then weighted and summed in respective arithmetic unit (ALU) blocks to produce two groups of four I output words and four Q output words representing four output points of a 4-point transform.
The corresponding words in the two groups of words must then be added and subtracted to produce the final eight output words representing the output points of the required 8-point transform. Provided the look-up tables in the lower block of ROM's have been compiled to take account of the inherent phase rotation term which is generated when the transform is split into two, these final additions and subtractions can be performed directly without the need for a vector multiplication.
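The effect of compiling the phase rotation into the lower block's table can be illustrated for the 8-point case. In this sketch the inputs are assumed to be real unsigned integers, the first block takes input samples 0, 2, 4, 6 and the second block samples 1, 3, 5, 7, and all function names are hypothetical.

```python
import cmath


def build_block(points, n_total, rotate):
    """Look-up table for a points-point block inside an n_total-point transform.

    If rotate is True, entry k is pre-multiplied by the phase rotation for
    output point k, so no multiplication is needed when the table is used.
    """
    table = []
    for address in range(1 << points):
        row = []
        for k in range(points):
            v = sum(((address >> r) & 1) * cmath.exp(-2j * cmath.pi * k * r / points)
                    for r in range(points))
            if rotate:
                v *= cmath.exp(-2j * cmath.pi * k / n_total)  # done once, at table build
            row.append(v)
        table.append(row)
    return table


def lookup(block, samples, word_bits):
    """Weighted sum of table rows, one row per one-bit sub-array."""
    points = len(samples)
    out = [0j] * points
    for b in range(word_bits):
        address = sum(((samples[r] >> b) & 1) << r for r in range(points))
        for k in range(points):
            out[k] += (1 << b) * block[address][k]
    return out


def two_stage_dft8(x, word_bits):
    upper = lookup(build_block(4, 8, rotate=False), x[0::2], word_bits)
    lower = lookup(build_block(4, 8, rotate=True), x[1::2], word_bits)
    # Second stage: additions and subtractions only.
    return ([u + v for u, v in zip(upper, lower)]
            + [u - v for u, v in zip(upper, lower)])
```

The only complex multiplication occurs while the lower table is being compiled; at run time the second stage reduces to four additions and four subtractions.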
The building block of Figure 5 can be modified as shown in Figure 6 to reduce power dissipation by time-sharing the ALU's between the various blocks. In this way, the 4 points from each block will be output sequentially, requiring four times the time but, on the other hand, one quarter of the ALU dissipation.
Figure 6 also shows the I and Q memories combined into a larger memory of four times the original capacity. This can be done since data from no more than one point is required by the ALU at one time. It is achieved by arranging each shift register in the form of a cyclic store to provide the same address to each of the I and Q ROM's through address lines A3 to A10 eight times per transform cycle, to cater for each of eight output points formed in sequence. There are, therefore, a total of sixteen such shift registers. A timing circuit, which is common to all blocks, selects one out of eight sets of output data for each cycle of the cyclic store through address lines A0 to A2. Thus instead of each sub-array simultaneously addressing a number of ROM's in each block, the arrays are presented sequentially to a single ROM in a number of cycles of the cyclic store.
The time sharing can be taken one step further by combining components not only within one block but all common components within a single stage of a multi-stage structure. The result is shown in the simplified schematic diagram of Figure 7. Each cyclic store has two sets of 2n channels of parallel-in, serial-out shift-registers to allow parallel dumping of data from the ALU block into one set, and simultaneous addressing of the next stage memory through the other set.

Claims (5)

1. A digital data processing system for producing an n-point Fourier transform, the system comprising means for splitting a binary coded array of input data points into a plurality of one-bit precision sub-arrays of shorter address length and of different significance, means for interleaving the sub-arrays into m blocks such that the successive sub-arrays sequentially address a corresponding number of memory blocks in a first stage of a multi-stage structure for splitting an n-point transform into the sum of m n/m-point transforms, each block having a look-up table for mapping the sub-arrays into respective output points and the phase rotation between the output points of the second and subsequent blocks with respect to the phase of the output points of the first block being inherent in the look-up tables of the second and subsequent blocks whereby the output data words from the blocks are individually weighted according to the significance of the sub-array and combined in further blocks in one or more further stages of the structure without requiring further vector-multiplication.
2. A system according to claim 1 in which each block consists of a plurality of pairs of ROM's for the real and imaginary parts of the input vectors, and each pair of ROM's has an associated arithmetic unit.
3. A system according to claim 1 in which each block consists of a single pair of ROM's with a common arithmetic unit, and means for time-sharing the ROM's between the successive sequences of input sub-arrays.
4. A system according to claim 1 in which each stage of the multi-stage structure includes a cyclic store, a memory and an arithmetic unit, the first stage store cycling the interleaved sequences of sub-arrays and the second and subsequent stage stores each cycling the output words from the arithmetic unit of the preceding stage.
5. A system according to claim 1 and substantially as herein described with reference to the accompanying drawings.
GB08134091A 1981-11-12 1981-11-12 Digital data processing system Expired GB2109961B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB08134091A GB2109961B (en) 1981-11-12 1981-11-12 Digital data processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB08134091A GB2109961B (en) 1981-11-12 1981-11-12 Digital data processing system

Publications (2)

Publication Number Publication Date
GB2109961A (en) 1983-06-08
GB2109961B (en) 1985-06-12

Family

ID=10525815

Family Applications (1)

Application Number Title Priority Date Filing Date
GB08134091A Expired GB2109961B (en) 1981-11-12 1981-11-12 Digital data processing system

Country Status (1)

Country Link
GB (1) GB2109961B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2380829A (en) * 2001-10-12 2003-04-16 Siroyan Ltd Organization of fast fourier transforms

Also Published As

Publication number Publication date
GB2109961B (en) 1985-06-12

Similar Documents

Publication Publication Date Title
CA2318449C (en) Pipelined fast fourier transform processor
US4601006A (en) Architecture for two dimensional fast fourier transform
JPH02501601A (en) 2D discrete cosine transform processor
EP0083967A2 (en) Monolithic fast Fourier transform circuit
EP0902375A2 (en) Apparatus for fast Fourier transform
US5297070A (en) Transform processing circuit
US5038311A (en) Pipelined fast fourier transform processor
US4275452A (en) Simplified fast fourier transform butterfly arithmetic unit
US5034910A (en) Systolic fast Fourier transform method and apparatus
US4769779A (en) Systolic complex multiplier
US5233551A (en) Radix-12 DFT/FFT building block
US5491652A (en) Fast Fourier transform address generator
Boriakoff FFT computation with systolic arrays, a new architecture
US3956619A (en) Pipeline walsh-hadamard transformations
US5270953A (en) Fast convolution multiplier
Gorman et al. Partial column FFT pipelines
US3816729A (en) Real time fourier transformation apparatus
US4831574A (en) Device for computing a digital transform of a signal
US3881100A (en) Real-time fourier transformation apparatus
US7653676B2 (en) Efficient mapping of FFT to a reconfigurable parallel and pipeline data flow machine
US4965761A (en) Fast discrete fourier transform apparatus and method
KR0175733B1 (en) Vlsi for transforming beat serial matrix
US6728742B1 (en) Data storage patterns for fast fourier transforms
Zhou et al. Novel design of multiplier-less FFT processors
GB2109961A (en) Digital data processing system

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee