GB2459339A - Pipelined 2D fast Fourier transform with three permutation stages, two FFT processor units and a twiddle factor unit. - Google Patents


Info

Publication number
GB2459339A
Authority
GB
United Kingdom
Prior art keywords
data
fft processor
words
permuted
permutation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB0807577A
Other versions
GB2459339A8 (en
GB0807577D0 (en
Inventor
Simon John Shepherd
James Mackenzie Noras
Yuan Zhou
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Bradford
Original Assignee
University of Bradford
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Bradford filed Critical University of Bradford
Priority to GB0807577A priority Critical patent/GB2459339A/en
Publication of GB0807577D0 publication Critical patent/GB0807577D0/en
Priority to PCT/GB2009/050396 priority patent/WO2009130498A2/en
Publication of GB2459339A publication Critical patent/GB2459339A/en
Publication of GB2459339A8 publication Critical patent/GB2459339A8/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/14Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/141Discrete Fourier transforms
    • G06F17/142Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/76Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data
    • G06F7/78Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data for changing the order of data flow, e.g. matrix transposition or LIFO buffers; Overflow or underflow handling therefor
    • G06F7/785Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data for changing the order of data flow, e.g. matrix transposition or LIFO buffers; Overflow or underflow handling therefor having a sequence of storage locations each being individually accessible for both enqueue and dequeue operations, e.g. using a RAM

Landscapes

  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Discrete Mathematics (AREA)
  • Complex Calculations (AREA)

Abstract

Disclosed is a 2D FFT processor (100) for data inputs of size N by N (2k or 8k-point side square matrix input data), with an m-point FFT processor unit (10) and an n-point FFT processor unit (20) in combination, where N = m*n and m and n are positive integers. A first permutation unit (31) permutes the input data into first permuted data arranged in n*n data blocks each of size m*m words. The first m-point FFT processor unit (10) performs a Fourier transform on the first permuted data to provide first transformed data arranged in n*n data blocks each of size m*m words. A second permutation unit (32) permutes the first transformed data into second permuted data arranged in m*m data blocks each of size n*n words. A twiddle factor multiplication unit (40) comprises a complex multiplier arranged to multiply each word of the second permuted data by a predetermined twiddle factor to provide twiddle factor multiplied data. The n-point FFT processor unit (20) is arranged to perform a Fourier transform on the twiddle factor multiplied data to provide second transformed data arranged in m*m data blocks each of size n*n words. A third permutation unit (33) permutes the second transformed data into third permuted data and outputs the third permuted data in an N by N matrix as a 2D Fourier transform of the input data.

Description

PIPELINED 2D FFT PROCESSOR
Field of the Invention
The present invention relates in general to the field of fast Fourier transform (FFT) processors. More particularly, the present invention relates to a FFT processor for two-dimensional (2D) transforms.
Background of the Invention
FFT processors are a vital component of almost all modern digital signal processing systems. In particular, FFT processors are vital in most digital communication systems such as wireless computer networks and cellular telephone systems. For example, digital communication systems based on the OFDM technique use FFTs for signal modulation. 2K, 1K, 512 and 256-point FFT processors are often needed in digital audio broadcasting (DAB) systems, whilst 2K and even 8K-point FFT processors are often required in digital video broadcasting (DVB) systems.
Area, speed and power consumption are three main parameters of an FFT processor, and determine whether a particular FFT processor architecture will be successfully integrated into a digital signal processing (DSP) system. To achieve high throughput and low power consumption, especially in various real-time applications, systolic FFT processors are often used. A systolic FFT processor comprises a chain of processing units (also called pipeline elements or PEs) which pass data through the system continuously. Typically, an N-point systolic FFT processor can complete one transform in N cycles and hence systolic FFT processors are economical in terms of clock cycles. However, systolic processors for large FFTs such as 2K and 8K-point FFTs consume a large silicon area. Most architectures of the related art include many complex multipliers for multiplications with twiddle factors and commutators for permuting intermediate results, both of which involve a large component area.
Many approaches have been proposed in the related art to reduce silicon area. First, internal shift registers have been used in each pipeline element for scheduling the data entering into a butterfly and storing intermediate results. Second, delay commutators have been used for switching data among data paths. Further, CORDIC techniques, ROM-based designs and parallel adders have each been used to implement multipliers. Finally, radix-8 and radix-16 algorithms instead of radix-2 and radix-4 algorithms have been used to reduce the number of multipliers.
WO-A-2005/052808 describes several different architectures of FFT processors in the related art and in particular discusses a pipelined FFT processor having memory address interleaving between adjacent butterfly units. Here, an interleaver reorders the output of a first butterfly unit so as to provide reordered data in a required order as an input to a subsequent second butterfly unit. However, even this recently published example of the related art can be further improved in relation to area, speed and/or power consumption.
The difficulties of the related art are particularly acute for FFTs performing 2D transforms, because the area, speed and power consumption problems are magnified for the larger 2D data set.
Summary of the Invention
According to the present invention there is provided a pipelined 2D FFT processor as set forth in the claims appended hereto. Also, according to the present invention there is provided a digital signal processing apparatus incorporating such a 2D FFT processor as set forth in the claims appended hereto. Further, according to the present invention there is provided a digital signal processing method as set forth in the claims appended hereto. Further still, according to the present invention there is provided a testing apparatus for testing a 2D FFT processor as set forth in the claims appended hereto. Other, optional, features of the invention will be appreciated from the dependent claims and the discussion that follows.
According to an aspect of the present invention there is provided a 2D FFT processor which, at least in some example embodiments, is economical in terms of the area consumed.
The exemplary 2D FFT processor also maintains a high throughput. Further, the exemplary embodiments provide a 2D FFT processor which is readily adapted to different specific implementations and is readily fabricated as a dedicated hardware device. Further still, the exemplary embodiments provide a 2D FFT processor which minimises a number of multipliers and uses a compact and simplified permutation scheme. Thus, the exemplary 2D FFT processor minimises area requirements whilst maintaining a high speed and a high output signal-to-noise ratio (SNR). The exemplary embodiments are particularly beneficial for a large-point 2D FFT processor such as a 2K-point 2D FFT.
When operating on a single column of data (i.e. 1D data), the FFT processor proceeds by dividing N words of data (i.e. N points) into factors m and n and performing, in order, a first permutation, then a Kronecker matrix with m instances of f_trans(n), then a second (inverse) permutation, then a multiplication by a diagonal twiddle matrix, then a second Kronecker with n instances of f_trans(m), and then finally the initial permutation again. Advantageously, data is held (e.g. in local memory) only in sections of size m and n for the two Kronecker phases.
Thus, for 1D transforms, the exemplary architecture can be expressed by the following Equation A:
Equation A: F*X = Pmn*kron(In,Fm)*Dmn*Pmn'*kron(Im,Fn)*Pmn*X
where Pmn' is the transpose of Pmn.
In the present invention, this transform is now extended to a two-dimensional transform.
Here, the data are not in the form of a column matrix of length N = m*n, but instead in the form of a square matrix of side N = m*n.
Taking the above Equation A, the exemplary embodiments now provide an algorithm with the (square) data matrix nested in the middle of two sets of operations, with Equation A of the one set on one side and the transpose of this set on the other. The 2D transform is expressed by the following Equation B:
Equation B: F*X = Pmn*kron(In,Fm)*Dmn*Pmn'*kron(Im,Fn)*Pmn*X.'
In Equation B, X.' means the transpose of X, rather than the complex conjugate of X. The exemplary FFT processor runs through the same set of operations as noted above for a 1D transform, with the same number of steps, but now as a two-dimensional transform, giving significant savings.
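As an illustration of the description above, in which the square data matrix is nested between the 1D operator and its transpose, the following Python sketch checks numerically that a 2D DFT computed from its definition agrees with the product F*X*F.' for a small matrix. The function names are hypothetical and chosen only for this example, and the direct DFT matrix stands in for the factorised operator of Equation A; this is a sketch under those assumptions rather than the patented implementation.

```python
import cmath

def dft_matrix(N):
    """N-by-N DFT matrix with entries exp(-2*pi*i*j*k/N)."""
    return [[cmath.exp(-2j * cmath.pi * j * k / N) for k in range(N)]
            for j in range(N)]

def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Plain (non-conjugating) transpose, like the MATLAB operator .' """
    return [list(row) for row in zip(*A)]

def dft2_direct(X):
    """2D DFT straight from the definition (double sum over rows and columns)."""
    N = len(X)
    return [[sum(X[a][b] * cmath.exp(-2j * cmath.pi * (a * k + b * l) / N)
                 for a in range(N) for b in range(N))
             for l in range(N)] for k in range(N)]

def dft2_sandwich(X):
    """2D DFT with the data matrix nested between F and its transpose."""
    F = dft_matrix(len(X))
    return matmul(matmul(F, X), transpose(F))

def max_abs_diff(A, B):
    """Largest element-wise difference between two equally sized matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))
```

Because the 1D operator acts once on the columns and once on the rows, any factorisation of F (such as Equation A) can be substituted on both sides without changing the result.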
Data are now provided in blocks of size m*m and n*n for processing in the 2D FFT processor. If the initial data are treated separately for their real and imaginary parts, then it is noted that there are symmetry savings for the transforms of each block. That is, results occur (mostly) in conjugate pairs, cutting down the required number of operations.
In the exemplary embodiments, diagonal twiddle-factor matrices are merged and applied as a single, full matrix to all the 2D data. Thus, diagonal twiddle-factor matrices do not need to be applied first to rows then to columns.
In the exemplary embodiments, not all the elements need to be applied in multiplication of data. Starting with the initial m by m blocks (n*n of them), the second permutation after Kroneckering maps these into m*m n by n blocks, ready for the second Kronecker after twiddle factors are applied. These blocks can mostly (apart from one in the top left corner) be sorted into pairs where the data contained in one are equal to a conjugate perm of the other, so that only about half the twiddle multiplications need be done, and only about half the n by n sized transforms. Here, the blocks of data are written back to two locations, with a permutation of the computed data to get the second block.
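The saving described above rests on the conjugate symmetry of the Fourier transform of real data. The following Python sketch illustrates only that underlying property at the level of individual output points; it does not reproduce the block pairing or write-back scheme of the exemplary embodiments, and the function names are hypothetical.

```python
import cmath

def dft2(X):
    """2D DFT of a square matrix, computed directly from the definition."""
    N = len(X)
    return [[sum(X[a][b] * cmath.exp(-2j * cmath.pi * (a * k + b * l) / N)
                 for a in range(N) for b in range(N))
             for l in range(N)] for k in range(N)]

def independent_points(N):
    """One representative (k, l) from each conjugate pair
    {(k, l), (-k mod N, -l mod N)} of a real-input 2D DFT."""
    seen, reps = set(), []
    for k in range(N):
        for l in range(N):
            if (k, l) not in seen:
                reps.append((k, l))
                seen.add((k, l))
                seen.add(((-k) % N, (-l) % N))
    return reps

def symmetry_error(X):
    """Largest deviation from Y[k][l] == conj(Y[-k][-l]) for the DFT of X."""
    N = len(X)
    Y = dft2(X)
    return max(abs(Y[k][l] - Y[(-k) % N][(-l) % N].conjugate())
               for k in range(N) for l in range(N))
```

For a real 4 by 4 input, `independent_points(4)` returns 10 of the 16 output points, which is consistent with the statement that only about half of the results need to be computed.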
In one aspect of the present invention there is provided an N by N 2D FFT processor suitable for large data inputs (e.g. 2k or 8k-point side square matrix input data), comprising an m-point FFT processor unit and an n-point FFT processor unit in combination, where N = m*n and m and n are any positive integers. A first permutation unit is arranged to receive the N words of input data and to permute the input data into first permuted data arranged in n*n data blocks each of size m*m words. The first m-point FFT processor unit is arranged to perform a Fourier transform on the first permuted data to provide first transformed data arranged in n*n data blocks each of size m*m words. A second permutation unit is arranged to permute the first transformed data into second permuted data arranged in m*m data blocks each of size n*n words. A twiddle factor multiplication unit comprises a complex multiplier arranged to multiply each word of the second permuted data by a predetermined twiddle factor to provide twiddle factor multiplied data. The n-point FFT processor unit is arranged to perform a Fourier transform on the twiddle factor multiplied data to provide second transformed data arranged in m*m data blocks each of size n*n words. A third permutation unit is arranged to permute the second transformed data into third permuted data and to output the third permuted data in an N by N matrix as a 2D Fourier transform of the input data.
In a further aspect of the present invention there is provided a digital signal processing apparatus, such as a digital audio broadcasting receiver or a digital video broadcasting receiver, comprising a receiver unit arranged to receive input data of length N words, a FFT processor arranged to perform a fast Fourier transform of the N words of input data to produce N words of output data, and an output unit arranged to output the N words of output data, wherein the FFT processor is arranged as set forth herein.
In a further aspect of the present invention there is provided a method of performing a 2D fast Fourier transform on N by N words of input data arranged in a square matrix of side N, wherein N = m x n, wherein m and n are both positive integers, the method comprising: receiving the N by N words of input data (700); permuting the input data (700) into first permuted data (710) arranged in n*n data blocks each of size m by m words; performing a fast Fourier transform on the first permuted data (710) using a first m-point FFT processor unit (10) to provide first transformed data (720) arranged in n*n data blocks each of size m by m words; permuting the first transformed data (720) into second permuted data (730) arranged in m*m data blocks each of size n by n words; multiplying each of the words of the second permuted data (730) by a predetermined twiddle factor to provide twiddle factor multiplied data (740); performing a fast Fourier transform on the twiddle factor multiplied data (740) using a second n-point FFT processor unit (20) to provide second transformed data (750) arranged in m*m data blocks each of size n by n words; permuting the second transformed data (750) into third permuted data (760); and outputting the third permuted data (760) as a 2D fast Fourier transform of the input data (700).
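The method steps above can be sketched in Python as follows. This is a minimal software model under stated assumptions, not the hardware of the exemplary embodiments: the index mapping is the standard Cooley-Tukey decomposition applied to both dimensions, the merged twiddle factor for each word is assumed to be exp(-2*pi*i*(r*k1 + s*l1)/N), each small transform is evaluated directly rather than by a pipelined unit, and all function names are hypothetical.

```python
import cmath

def w(N, e):
    """Root of unity exp(-2*pi*i*e/N)."""
    return cmath.exp(-2j * cmath.pi * e / N)

def dft2(B):
    """Direct 2D DFT of a square block (stand-in for an FFT processor unit)."""
    s = len(B)
    return [[sum(B[a][b] * w(s, a * k + b * l)
                 for a in range(s) for b in range(s))
             for l in range(s)] for k in range(s)]

def fft2_mn(X, m, n):
    """2D DFT of an N-by-N matrix, N = m*n, following the claimed step order."""
    N = m * n
    # First permutation: n*n blocks of m-by-m words; block (r, s) takes
    # every n-th row starting at r and every n-th column starting at s.
    p1 = {(r, s): [[X[r + n * q][s + n * p] for p in range(m)]
                   for q in range(m)]
          for r in range(n) for s in range(n)}
    # First transform: m-point 2D DFT of every block.
    t1 = {rs: dft2(blk) for rs, blk in p1.items()}
    # Second permutation: m*m blocks of n-by-n words; block (k1, l1)
    # gathers output (k1, l1) of every first-stage block.
    p2 = {(k1, l1): [[t1[(r, s)][k1][l1] for s in range(n)]
                     for r in range(n)]
          for k1 in range(m) for l1 in range(m)}
    # Twiddle multiplication: one merged factor per word (assumed form).
    tw = {(k1, l1): [[blk[r][s] * w(N, r * k1 + s * l1) for s in range(n)]
                     for r in range(n)]
          for (k1, l1), blk in p2.items()}
    # Second transform: n-point 2D DFT of every block.
    t2 = {kl: dft2(blk) for kl, blk in tw.items()}
    # Third permutation: back to natural order, output (k, l) with
    # k = k1 + m*k2 and l = l1 + m*l2.
    return [[t2[(k % m, l % m)][k // m][l // m] for l in range(N)]
            for k in range(N)]

def max_abs_diff(A, B):
    """Largest element-wise difference between two equally sized matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))
```

Comparing `fft2_mn(X, 3, 2)` with `dft2(X)` for a 6 by 6 matrix gives agreement to rounding error, which supports the point made earlier that a single merged twiddle multiplication between the two transform stages is sufficient.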
In another aspect of the present invention there is provided a computer-readable storage medium having recorded thereon computer instructions to perform any of the methods recited herein.
In a still further aspect of the present invention there is provided a testing apparatus for testing a 2D FFT processor arranged to perform a 2D fast Fourier transform on N by N words of input data where N = m x n, where m and n are both positive integers, the testing apparatus comprising a first selector unit arranged to select one of a plurality of m-point FFT processor units, and a second selector unit arranged to select one of a plurality of n-point FFT processor units, whereby the selected one of the plurality of m-point FFT processor units and the selected one of the plurality of n-point FFT processor units are arranged in combination to provide the N by N 2D FFT processor.
Brief Description of the Drawings
For a better understanding of the invention and to show how embodiments of the same may be carried into effect, reference will now be made by way of example to the accompanying diagrammatic drawings in which: Figure 1 is a schematic block diagram of a 2D FFT processor according to an exemplary embodiment of the present invention; Figure 2 is a schematic block diagram of dataflow through the exemplary FFT processor; Figure 3 is a schematic block diagram of a 2K-point FFT processor according to an exemplary embodiment of the present invention; Figure 4 is a schematic block diagram of an exemplary twiddle factor multiplication unit; Figure 5 is a schematic block diagram of an exemplary first FFT processor unit; Figure 6 is a schematic block diagram of an exemplary second FFT processor unit; Figure 7 is a schematic block diagram of an exemplary constant multiplier unit; Figure 8 is a schematic block diagram of an exemplary dual-port RAM unit; Figure 9 is a schematic floor plan illustrating area consumption of an FFT processor according to an exemplary embodiment of the present invention; Figure 10 is a schematic flow diagram illustrating an example FFT processing method; Figure 11 is a schematic block diagram of an exemplary digital signal processing apparatus; and Figure 12 is a schematic block diagram of an exemplary testing and evaluation apparatus for an FFT processor according to a further aspect of the present invention.
Detailed Description of the Exemplary Embodiments
The following detailed description of the exemplary embodiments first discusses a 2K-point 1D FFT processor which is suitable for signal demodulation in a digital audio broadcasting (DAB) system. Then, an additional 2D embodiment will be described. However, it will be appreciated that this example embodiment is not intended to limit the more general teachings of the present invention which will be ascertained by those of ordinary skill in the art from the following detailed description.
The exemplary embodiments of the FFT processor discussed herein balance the competing demands that arise when considering particularly speed (throughput and/or latency), area and power consumption of the processor. Here, throughput concerns the volume of data which the processor is able to handle. Latency concerns the delay between an input signal being received and a useful output being produced from the FFT processor. Area concerns the physical size of the FFT processor, particularly the physical size of the processor when constructed as an integrated circuit as either a stand-alone component or as part of a more complex circuit. Power consumption concerns electric current drawn by the processor in operation, and is particularly relevant in modern hand-held battery-powered equipment.
Figure 1 shows a schematic block overview of the architecture of the exemplary 2D FFT processor 100. Here, the 2D FFT processor 100 comprises first, second and third permutation units 31, 32 & 33, a first FFT processor unit 10, a second FFT processor unit 20, a twiddle factor multiplication unit 40, and a permutation controller 50. Other control elements such as clock signals have not been shown for clarity, because these elements in themselves are familiar to persons of ordinary skill in this art.
The first FFT processor unit 10 and the second FFT processor unit 20 are each self-contained low-point FFT processors. The first and second FFT processor units 10, 20 are used in combination and cooperatively form the high-point FFT processor 100.
The permutation units 31, 32, 33 perform global permutations on the data passing through the FFT processor 100. These global permutations assist in simplifying the twiddle factor multiplication performed by the twiddle factor multiplication unit 40. In particular, the permutation units 31, 32, 33 apply global permutations such that the data lies close to the leading diagonal or effective diagonal of the Fourier matrix. Further, the global permutations allow the FFT processor 100 to receive input data in natural order and to output transformed data in natural order.
In general terms for a 2D FFT, the input data lie in a square matrix of side N, where m and n are factors of N such that N = m x n. Here, m and n are both positive integers such that N is any non-prime positive integer. The first FFT processor unit 10 is an m-point FFT processor and the second FFT processor unit 20 is an n-point FFT processor.
The exemplary architecture operates on an N-point data column by dividing into factors m and n and performing, in order, a first permutation, then a Kronecker matrix with m instances of f_trans(n), then a second (inverse) permutation, then a multiplication by a diagonal twiddle matrix, then a second Kronecker with n instances of f_trans(m), and then finally the initial permutation again. Advantageously, data is held (e.g. in local memory) only in sections of size m and n for the two Kronecker phases. Thus, taking the input data as "X" and Pmn' as the transpose of Pmn, the exemplary architecture can be expressed by the following equation:
F*X = Pmn*kron(In,Fm)*Dmn*Pmn'*kron(Im,Fn)*Pmn*X
In the special case where N = 2n^2 (i.e. m = 2n), the N-point FFT processor 100 comprises a first 2n-point FFT processor 10 and a second n-point FFT processor 20. Alternatively, in the special case where N = n^2 (i.e. m = n), two n-point FFT processors 10, 20 are employed. Thus, it has been found that the architecture of Figure 1 is most efficient when N is a power of two.
The architecture of Figure 1 is particularly effective for large-point FFT processors. Here, the exemplary architecture operates efficiently such as where N is greater than 256, more effective still when N is equal to or greater than 1024, and most effective when N is equal to or greater than 2048, because of the increased efficiency of the architecture for larger-point FFTs.
Figure 2 is a simplified overview of dataflow through the FFT processor 100 of Figure 1, when processing a 1D column of data. The 2D FFT processor of the present invention is best understood by first illustrating the same components acting on a 1D data column.
The FFT processor 100 receives a set of N-word input data 700 suitably in natural order.
The first permutation unit 31 performs a first global permutation on the N data words to permute the input data 700 into first permuted data 710 which are arranged in n data blocks each of length m words (i.e. n length-m data sequences).
The first FFT processor unit 10 performs a first m-point fast Fourier transform on the first permuted data 710 to provide first transformed data 720. Here, each of the n blocks of length m words is passed separately in turn through the first m-point FFT processor unit 10 and the resultant n blocks are written into the second permutation unit 32 as the first transformed data 720.
The second permutation unit 32 performs a second global permutation on the N data words to permute the first transformed data 720 into second permuted data 730 arranged in m blocks each of length n words (i.e. m length-n data sequences).
The twiddle factor multiplication unit 40 multiplies each of the N words of the second permuted data 730 by a predetermined twiddle factor to provide twiddle factor multiplied data 740.
The second n-point FFT processor unit 20 performs a second fast Fourier transform on the twiddle factor multiplied data 740 to provide second transformed data 750. Here, each of the m blocks of length n words is passed separately in turn through the second n-point FFT processor unit 20 and the resultant m blocks are written into the third permutation unit 33 as the second transformed data 750.
The third permutation unit 33 performs a third global permutation on the N data words to permute the second transformed data 750 into third permuted data 760. The third permuted data 760 is then output as output data from the FFT processor 100 as the N-point fast Fourier transform of the input data 700. Suitably, the third global permutation performed by the third permutation unit 33 provides the output data 760 in the natural order corresponding to the input data 700.
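The 1D dataflow of Figure 2 can be sketched in Python as follows. This is a minimal model under stated assumptions: the twiddle factor applied to word r of block k1 is taken to be exp(-2*pi*i*r*k1/N), as in the standard Cooley-Tukey decomposition, each small transform is evaluated as a direct DFT rather than by a pipelined unit, and the function names are hypothetical.

```python
import cmath

def dft(x):
    """Direct DFT of a short sequence (stand-in for an FFT processor unit)."""
    L = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / L) for j in range(L))
            for k in range(L)]

def fft_mn(x, m, n):
    """N-point DFT, N = m*n, following the dataflow 700 -> 760."""
    N = m * n
    # First permutation (710): n blocks of length m; block r takes
    # word r and every subsequent n-th word.
    p1 = [[x[r + n * q] for q in range(m)] for r in range(n)]
    # First transform (720): m-point DFT of each block in turn.
    t1 = [dft(block) for block in p1]
    # Second permutation (730): m blocks of length n; block k1 gathers
    # output k1 of every first-stage block.
    p2 = [[t1[r][k1] for r in range(n)] for k1 in range(m)]
    # Twiddle multiplication (740): one factor per word (assumed form).
    tw = [[p2[k1][r] * cmath.exp(-2j * cmath.pi * r * k1 / N)
           for r in range(n)] for k1 in range(m)]
    # Second transform (750): n-point DFT of each block in turn.
    t2 = [dft(block) for block in tw]
    # Third permutation (760): natural order, output k = k1 + m*k2.
    return [t2[k % m][k // m] for k in range(N)]

def max_abs_diff(a, b):
    """Largest element-wise difference between two sequences."""
    return max(abs(u - v) for u, v in zip(a, b))
```

For example, `fft_mn(x, 3, 2)` on a six-word input agrees with `dft(x)` to rounding error, and the same holds for other factor pairs such as m = 4, n = 2.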
It will be appreciated that the architecture of Figures 1 and 2 is readily adapted to perform discrete Fourier transforms (DFT) or inverse fast Fourier transforms (IFFT).
Figure 3 is a schematic block diagram showing the exemplary 1D FFT processor 100 in greater detail. In this specific example, the 2048-point FFT processor 100 is obtained by combining a first 64-point FFT processor 10 and a second 32-point FFT processor 20. That is, N=2048, m=64 and n=32.
As shown in Figure 3, the first, second and third permutation units 31, 32 & 33 each comprise a RAM of size N words. In the exemplary embodiments, the permutation units 31, 32 & 33 each comprise a single-port RAM. Conveniently, single-port RAM is more area-efficient, smaller and cheaper than dual-port RAM. Each single-port RAM operates in read-before-write mode whereby data is written into and read from the RAM according to an address signal supplied by the permutation controller 50. The input data is written into the RAM in sequential address order and then read from the RAM according to the permuted address sequence supplied by the permutation controller 50. In this example, the address signal provided by the controller 50 to each RAM 31, 32, 33 repeats every 11*2K clock cycles. Hence, the permutation controller may be constructed with a small number of commonly available components including counters, shifters, modular arithmetic units (e.g. adders) and multiplexers as will be familiar to persons skilled in the art.
The exemplary 1D FFT processor shown in Figure 3 uses fixed point arithmetic to achieve high speed. A mixed scaling scheme is employed to avoid overflow, which maintains good accuracy whilst keeping the structure of each of the smaller FFT processor units 10, 20 relatively simple. Here, the input word length is 8 bits. The data word length increases to 15 bits after the 64-point first FFT processor unit 10, because the maximum word length increment of a 2^x-point FFT is x+1 bits. Then, the data is shifted and chopped to 12 bits at the output of the 64-point first FFT processor 10. Each block of 64 words has one scaling factor and 32 scaling factors are obtained for each 2K of input data. After the second global permutation, each of the 2K data words is adjusted with one scaling factor. The word length is expanded from 13 bits to 19 bits in the 32-point second FFT processor unit 20 and is shifted and chopped to 12 bits at the output thereof. Then, each block of 32 data words has one scaling factor and sixty-four scaling factors are obtained for each 2K of data. After the third global permutation, each 2K of data is adjusted with one scaling factor. Given an input signal-to-noise ratio (SNR) of 48 dB for the 8-bit data, the output SNR is greater than 42 dB with such a scaling scheme. Thus, the exemplary architecture achieves a high output SNR with a simple structure, especially because the permutation RAMs 31, 32, 33 are also used for adjusting word length instead of using extra RAMs at each stage.
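The word-length arithmetic quoted above can be checked with a short Python sketch. The function name is hypothetical; the rule it encodes, a growth of x+1 bits for a 2^x-point FFT, is the one stated in the text.

```python
def max_fft_output_bits(input_bits, points):
    """Worst-case word length after a 2^x-point FFT: input_bits + x + 1."""
    if points < 1 or points & (points - 1):
        raise ValueError("points must be a power of two")
    x = points.bit_length() - 1  # points == 2**x
    return input_bits + x + 1

# 64-point first unit: 8-bit input grows to 15 bits.
first_stage_bits = max_fft_output_bits(8, 64)
# 32-point second unit: 13-bit input grows to 19 bits.
second_stage_bits = max_fft_output_bits(13, 32)
```

Both figures match the word lengths given for the first and second FFT processor units before each result is shifted and chopped back to 12 bits.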
In the general case where N = m x n, the first and third permutations are found from Equation 1 below, while the second permutation is found from Equation 2 below:
Equation 1 - 1st and 3rd permutations for m * n
for iloop = 1,2,...,m and jloop = 1,2,...,n: ADDR(jloop + (iloop-1)*n) = iloop + (jloop-1)*m
Equation 2 - 2nd permutation for m * n
for iloop = 1,2,...,m and jloop = 1,2,...,n: ADDR(iloop + (jloop-1)*m) = jloop + (iloop-1)*n
In the special case where N = n^2 (i.e. where m = n), then conveniently all three permutation units 31, 32, 33 perform the same permutation as expressed by Equation 3 below:
Equation 3 - 1st, 2nd and 3rd permutations for m = n
For a = 0, log2(N): ADDR = b*2^a mod (N-1) when b ∈ [0, N-2], and ADDR = N-1 when b = N-1, where b = 0,1,2,3,...,N-1.
Note that the value of "b" repeats every N cycles. Also, the value of "a" changes every N cycles and repeats every 2N cycles.
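Equations 1 and 2 can be tabulated with a short Python sketch. One assumption is made and should be read as such: ADDR(p) is interpreted here as the destination address of the p-th incoming word (a scatter), because that reading reproduces the block ordering described elsewhere in this description. The function names are hypothetical and the indices are 1-based, as in the equations.

```python
def eq1_addr(m, n):
    """Equation 1: ADDR(jloop + (iloop-1)*n) = iloop + (jloop-1)*m."""
    return {j + (i - 1) * n: i + (j - 1) * m
            for i in range(1, m + 1) for j in range(1, n + 1)}

def eq2_addr(m, n):
    """Equation 2: ADDR(iloop + (jloop-1)*m) = jloop + (iloop-1)*n."""
    return {i + (j - 1) * m: j + (i - 1) * n
            for i in range(1, m + 1) for j in range(1, n + 1)}

def scatter(words, addr):
    """Place the p-th word at address addr[p] (assumed interpretation)."""
    out = [None] * len(words)
    for pos, word in enumerate(words, start=1):
        out[addr[pos] - 1] = word
    return out

# Worked case m = 3, n = 2 on the word indices 0..5.
first = scatter(list(range(6)), eq1_addr(3, 2))
restored = scatter(first, eq2_addr(3, 2))
```

Under this reading, Equation 1 with m = 3 and n = 2 arranges six words as two blocks of three, and Equation 2 is its inverse, so applying one after the other restores the original order.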
In the special case where N = n*m (i.e. where m = 2n), then the permutation of the second permutation unit 32 is still found in Equation 3 above, whereas the permutations performed by the first and third permutation units 31, 33 are now both expressed by Equation 4 below:
Equation 4 - 1st and 3rd permutations for m = 2n
For a = c*log2(n) mod log2(N), c = 0,1,2,3,...,log2(N)-1: ADDR = b*2^a mod (N-1) when b ∈ [0, N-2], and ADDR = N-1 when b = N-1, where b = 0,1,2,...,N-1.
Here, the value of "a" changes every N cycles and repeats every log2(N) cycles.
For the exemplary embodiment under consideration where N=2048, the input data 700 comprises 2048 eight-bit words which are arranged in an ordered numerical input address sequence, e.g. in a linear sequence from address "1" through to address "2048". The input data 700 is written into the RAM 31 in this natural order and then read out as the first permuted data 710. The permuted address sequence from the controller 50 selects elements of the input data 700 in turn to form a first block of length m words. Where m=64 and n=32 (i.e. m = 2n) as shown in Figure 3, the first data element and then every subsequent 32nd element of the input data 700 is selected in turn to form a first block of length 64 words. Then, the second block is formed by selecting the second element and every subsequent 32nd element. This process continues iteratively until the 32nd block is obtained by selecting the 32nd element and each subsequent 32nd element including the last 2048th element.
As can be seen by the generic equations expressed above, the second global permutation performed by the second permutation unit 32 rearranges the n blocks each of length m words into m blocks each of length n words.
Finally, the third global permutation performed by the third permutation unit 33 rearranges the m blocks of length n words back into natural order as a reversal of the first global permutation.
As a further explanatory example, the RAM addressing to perform the general permutations is illustrated in the following MATLAB code. Again, for N=m*n, the 1st and 3rd permutations are achieved by the addressing:
    a = 0;
    stride = 1;
    while true
        a = a + 1;
        stride = mod(stride * n, N - 1);    % stride = n^a mod (N-1)
        for row = 0:N-2
            Addra = mod(row * stride, N - 1);
        end
        Addra = N - 1;                      % last word always uses address N-1
        if stride == 1, break; end          % stop once n^a mod (N-1) == 1
    end
For the 2nd permutation, the example MATLAB code is:
    a = 0;
    stride = 1;
    while true
        a = a + 1;
        stride = mod(stride * m, N - 1);    % stride = m^a mod (N-1)
        for row = 0:N-2
            Addrb = mod(row * stride, N - 1);
        end
        Addrb = N - 1;                      % last word always uses address N-1
        if stride == 1, break; end          % stop once m^a mod (N-1) == 1
    end
To further illustrate this particular example addressing for the permutation units 31-33, let us consider a simplified case where N=6, m=3 & n=2. Here, a first set of the 6-point input data is received in natural order as the words: x10, x11, x12, x13, x14, x15. Following the first permutation, the order becomes: [x10, x12, x14], [x11, x13, x15] as n=2 blocks of length m=3.
Using a single-port RAM in read-before-write mode, the next N-point set of six words is written into these same RAM addresses 0,2,4,1,3,5 and is read out of these addresses according to the required permutation, i.e. in the address sequence 0,4,3,2,1,5. In this way, this next set of data x20,x21,x22,x23,x24,x25 is correctly permuted to [x20,x22,x24],[x21,x23,x25]. The third set of input data x30,x31,x32,x33,x34,x35 is now again written into these locations as the old second set of data is read out and the next address sequence applied, i.e. 0,3,1,4,2,5, to read out the permuted data [x30,x32,x34],[x31,x33,x35]. A fourth data set x40,x41,x42,x43,x44,x45 now uses these locations and is read out in the address sequence 0,1,2,3,4,5 to provide [x40,x42,x44],[x41,x43,x45]. At this point, in this simple example, the above address sequences now repeat indefinitely for the fifth and subsequent sets of input data, allowing the FFT processor to receive further data sets continuously.
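The read-before-write addressing of this N=6 example can be simulated with the Python sketch below. It is a software model only: the RAM is a list, each cycle reads the old word at an address and then writes the incoming word to the same address, and the address stride for successive data sets is taken to be n^a mod (N-1), which is the reading of the addressing code above that reproduces the sequences listed in the text. The function names are hypothetical, and N is assumed to be greater than 2.

```python
def address_sequence(N, stride):
    """Addresses row*stride mod (N-1) for rows 0..N-2, then N-1 for the last word."""
    return [(row * stride) % (N - 1) for row in range(N - 1)] + [N - 1]

def address_cycle(m, n):
    """All distinct address sequences, starting from natural order."""
    N = m * n
    seqs, stride = [], 1
    while True:
        seqs.append(address_sequence(N, stride))
        stride = (stride * n) % (N - 1)
        if stride == 1:  # n^a mod (N-1) has returned to 1: sequences repeat
            return seqs

def stream_permute(datasets, m, n):
    """Pass successive N-word data sets through one read-before-write RAM."""
    N = m * n
    seqs = address_cycle(m, n)
    ram = [None] * N
    outputs = []
    # One extra dummy set is appended so that the last real set is read out.
    for t, words in enumerate(list(datasets) + [[None] * N]):
        out = []
        for addr, word in zip(seqs[t % len(seqs)], words):
            out.append(ram[addr])  # read the previous set's word first
            ram[addr] = word       # then overwrite with the incoming word
        if t > 0:
            outputs.append(out)
    return outputs

# Five six-word sets, labelled as in the text: x10..x15, x20..x25, and so on.
sets = [["x%d%d" % (t, j) for j in range(6)] for t in range(1, 6)]
permuted = stream_permute(sets, 3, 2)
```

Every output set comes out as two blocks of three, for example x20, x22, x24, x21, x23, x25 for the second set, while only a single N-word RAM is used for the whole stream.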
Figure 4 shows the twiddle factor multiplication unit 40 in more detail. Here, the twiddle factor multiplication unit 40 comprises a ROM 44 that stores the twiddle factors and a complex multiplier 42 to multiply the stored twiddle factors supplied from the ROM 44 in turn with the second permuted data 730. In the exemplary embodiment, the ROM 44 stores 2K words of twiddle factor data or more generally N words of predetermined twiddle factor data. In other exemplary embodiments, a twiddle factor generator is used to dynamically generate the twiddle factors. However, the ROM is a more convenient implementation in many circumstances and requires less area than a dynamic generator.
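As a rough illustration of the ROM-based approach, the Python sketch below precomputes N twiddle factors and applies one complex multiplication per word. The ordering of the table (m blocks of n words, indexed by block number and position within the block) and the forward-transform sign convention are assumptions made for the example; the function names are hypothetical.

```python
import cmath


def twiddle_rom(m, n):
    """Precompute N = m*n twiddle factors, ordered as m blocks of n words."""
    N = m * n
    return [cmath.exp(-2j * cmath.pi * block * pos / N)
            for block in range(m) for pos in range(n)]


def apply_twiddles(words, rom):
    """One complex multiplication per word, with the ROM read in step with the data."""
    return [w * t for w, t in zip(words, rom)]


rom = twiddle_rom(64, 32)      # the 2K-point case: 2048 stored words
print(len(rom))
```

A dynamic generator would compute the same values on the fly instead of reading them from a table.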
Figure 5 is a schematic block diagram of the first FFT processor unit 10 in more detail.
To minimise the number of multipliers, this example 64-point FFT processor is based on the radix-8 algorithm. As shown in Figure 5, the first FFT processor unit 10 comprises six pipeline elements (PE) 101-106, a first constant multiplier 110, a second constant multiplier 120, a twiddle factor multiplier unit 140, and a dual-port RAM 150. Each pipeline element 101-106 comprises a radix-2 butterfly 111-116 and a first-in-first-out (FIFO) buffer 121-126. The FIFO buffers 121-126 are used for scheduling the data entering into the respective butterfly unit 111-116, and storing the intermediate results therefrom, so that a single data stream goes through the first FFT processor unit 10. The twiddle factor multiplier unit 140 comprises a complex multiplier 142 and a ROM 144 which stores sixty-four words of local twiddle factor data. The 128-word (2m word) dual-port RAM 150 is used to reorder the data output from the final pipeline element 106 so that the transform results from the first FFT processor 10 are obtained in a natural order for each block of data.
Figure 6 is a schematic block diagram of the second FFT processor unit 20, comprising first to fifth pipeline elements 201-205, one constant multiplier 220, one twiddle factor multiplier unit 240 and a 64-word (2n word) dual-port RAM 250. Each of the pipeline elements 201-205 comprises a radix-2 butterfly 211-215 and a respective FIFO buffer 221-225. The internal architecture of this second FFT unit 20 is similar in construction to the first FFT 10 already described above.
Notably, in the exemplary embodiment discussed above, only 286 words of RAM are used for local data permutation and buffering in the first and second FFT processor units 10, 20.
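The 286-word figure can be cross-checked with simple arithmetic. The Python sketch below assumes that the FIFO depths halve from stage to stage (32, 16, ..., 1 for the 64-point unit and 16, 8, ..., 1 for the 32-point unit), as is usual for a radix-2 single-path pipeline; these depths are not stated in the text and are an assumption of the example, as is the function name.

```python
def fifo_depths(points):
    """Assumed FIFO depths for a radix-2 pipeline: points/2, points/4, ..., 1."""
    depths = []
    depth = points // 2
    while depth >= 1:
        depths.append(depth)
        depth //= 2
    return depths


m, n = 64, 32
fifo_first = fifo_depths(m)       # six FIFOs, one per pipeline element 101-106
fifo_second = fifo_depths(n)      # five FIFOs, one per pipeline element 201-205
reorder_rams = 2 * m + 2 * n      # 128-word RAM 150 plus 64-word RAM 250
total = sum(fifo_first) + sum(fifo_second) + reorder_rams
print(total)                      # 286
```

Under that assumption the FIFOs contribute 63 + 31 words and the two reordering RAMs contribute 192 words, which agrees with the total quoted above.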
Figure 7 is a schematic diagram showing the construction of the constant multiplier units 110, 120, 220 used in the first and second FFT processor units 10, 20. In Figure 7, the first constant multiplier unit 110 is shown for illustration. The constant multipliers 110, 120, 220 are used for multiplications with i, -i and 0.70711*(±1±i).
To minimize the number of adders and subtracters, canonic signed digit (CSD) and subexpression sharing techniques are used for implementing multiplications with 0.70711. For example, the 9-bit CSD coding of 0.70711 is 1.0(-1)0(-1)0101, where (-1) denotes a digit of minus one, so the multiplication with 0.70711 can be implemented with 3 additions and 3 shifts. As shown in Figure 7, the constant multiplier 110 can be constructed with several adders, subtracters, negators and two multiplexers.
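To make the count of 3 additions and 3 shifts concrete, the Python sketch below shows one possible way of sharing a subexpression. It is an assumption about how the sharing might be arranged, not the circuit of Figure 7; divisions by powers of two stand in for hardware right shifts, and fixed-point truncation is ignored.

```python
# CSD digits of 0.70711 for weights 2^0, 2^-1, ..., 2^-8.
CSD_DIGITS = [1, 0, -1, 0, -1, 0, 1, 0, 1]


def csd_value(digits):
    """Value represented by a list of signed digits."""
    return sum(d * 2.0 ** -i for i, d in enumerate(digits))


def times_0_70711(x):
    """Multiply by approximately 0.70711 using 3 additions and 3 shifts."""
    t = x + x / 4               # shift 1, addition 1: shared term x * (1 + 2^-2)
    return x - t / 4 + t / 64   # shifts 2 and 3, additions 2 and 3


print(csd_value(CSD_DIGITS))    # 0.70703125
print(times_0_70711(1.0))       # 0.70703125
```

The shared term t covers both the pair of negative digits (as -t/4) and the pair of positive digits (as +t/64), which is why three adders suffice for five non-zero digits.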
Figure 8 is a schematic diagram showing an interface of the dual-port RAM used for each of the FIFO buffers 121-126, 221-225. Each of these dual-port RAMs has two independent ports that enable simultaneous access to a single memory space. One port of the dual-port RAM is configured in a write-only mode, whilst the other is configured in a read-only mode. As the dual-port RAM is filled with data, the data are sent to the output port in the same sequence as they enter the RAM.
Figure 9 shows an example floor plan of the above 2K-point FFT processor 100 when implemented using a field programmable gate array (FPGA). Here, it will be appreciated that the complex multiplier unit 40 occupies approximately 1/8th of the total area. By contrast, the single-port 2K RAMs of the first, second and third permutation units 31, 32 and 33 occupy a much smaller proportion of the overall area. Thus, the exemplary FFT processor architecture requires only a minimum number of complex multipliers. Further, as shown in Figure 9, the exemplary architecture employs single-port RAMs 31, 32 and 33 which have a relatively small area and also have relatively low power consumption, compared with other permutation arrangements requiring shift registers or dual-port RAMs which require a much larger area and/or have much larger power consumption.
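The FIFO behaviour described for Figure 8 can be modelled as a fixed-depth delay line with separate write and read addresses. The Python sketch below is a behavioural model only; the class and method names are hypothetical and no timing detail of a real dual-port RAM is represented.

```python
class DualPortFifo:
    """Behavioural model of a FIFO built on a dual-port RAM."""

    def __init__(self, depth):
        self.ram = [None] * depth
        self.depth = depth
        self.write_addr = 0     # used by the write-only port
        self.read_addr = 0      # used by the read-only port
        self.count = 0

    def clock(self, word_in):
        """One cycle: once the RAM is full, the oldest word leaves as a new one enters."""
        word_out = None
        if self.count == self.depth:
            word_out = self.ram[self.read_addr]
            self.read_addr = (self.read_addr + 1) % self.depth
            self.count -= 1
        self.ram[self.write_addr] = word_in
        self.write_addr = (self.write_addr + 1) % self.depth
        self.count += 1
        return word_out


fifo = DualPortFifo(depth=4)
print([fifo.clock(i) for i in range(10)])
# [None, None, None, None, 0, 1, 2, 3, 4, 5]
```

The first `depth` cycles produce no output while the RAM fills; after that, words emerge in the order in which they were written.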
As noted above, the exemplary 2K-point FFT processor 100 is based on the radix-64/32 algorithm and is constructed using a 64-point FFT processor unit 10, a 32-point FFT processor 20, and three 2K-word permutation RAMs 31, 32, 33. This exemplary FFT processor 100 completes one 2K-point DFT in 2K clock cycles with a delay of 6K clock cycles. Thus, the exemplary architecture has a high throughput. However, there is a slight disadvantage in that there is a relatively long latency.
Figure 10 is a schematic overview of a 1D digital signal processing method. Here, consistent with the more detailed discussion already provided herein, the method includes a Step 1001 of receiving the N words of input data (700). Step 1002 includes permuting the input data (700) into first permuted data (710) arranged in n data blocks each of length m words.
Step 1003 includes performing a fast Fourier transform on the first permuted data (710) using a first m-point FFT processor unit (10) to provide first transformed data (720) arranged in n data blocks each of length m words. Step 1004 includes permuting the first transformed data (720) into second permuted data (730) arranged in m data blocks each of length n words. Step 1005 includes multiplying each of the words of the second permuted data (730) by a predetermined twiddle factor to provide twiddle factor multiplied data (740). Step 1006 includes performing a fast Fourier transform on the twiddle factor multiplied data (740) using a second n-point FFT processor unit (20) to provide second transformed data (750) arranged in m data blocks each of length n words. Step 1007 includes permuting the second transformed data (750) into third permuted data (760). Step 1008 includes outputting the third permuted data (760) as an N-point fast Fourier transform of the input data (700).
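Steps 1001 to 1008 can be sketched in software as follows. This is a minimal illustration rather than the hardware design: a naive DFT stands in for the pipelined m-point and n-point FFT processor units, the permutations are modelled as stride reads (stride n for the first and third, stride m for the second), and the twiddle indexing and all function names are assumptions of the example.

```python
import cmath


def dft(block):
    """Naive DFT, standing in for an FFT processor unit."""
    size = len(block)
    return [sum(block[j] * cmath.exp(-2j * cmath.pi * j * k / size)
                for j in range(size)) for k in range(size)]


def stride_permute(data, stride):
    """Read word (i * stride) mod (N-1) for i < N-1; the last word stays in place."""
    last = len(data) - 1
    return [data[(i * stride) % last] for i in range(last)] + [data[last]]


def fft_by_factors(x, m, n):
    N = m * n
    assert len(x) == N and N > 1
    p1 = stride_permute(x, n)                       # step 1002: n blocks of m words
    t1 = []
    for b in range(n):                              # step 1003: m-point transforms
        t1 += dft(p1[b * m:(b + 1) * m])
    p2 = stride_permute(t1, m)                      # step 1004: m blocks of n words
    tw = [p2[k1 * n + j2] * cmath.exp(-2j * cmath.pi * j2 * k1 / N)
          for k1 in range(m) for j2 in range(n)]    # step 1005: twiddle factors
    t2 = []
    for b in range(m):                              # step 1006: n-point transforms
        t2 += dft(tw[b * n:(b + 1) * n])
    return stride_permute(t2, n)                    # steps 1007-1008: natural order


x = [complex(i, 0) for i in range(6)]
result = fft_by_factors(x, m=3, n=2)
reference = dft(x)
print(max(abs(a - b) for a, b in zip(result, reference)) < 1e-9)   # True
```

Only blocks of m or n words are ever transformed, while the three permutations and the twiddle multiplication each pass once over all N words.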
It will be appreciated that, when operating on a single column of data (i.e. 1D data), the FFT processor proceeds by dividing N words of data (i.e. N points) into factors m and n and performing, in order, a first permutation, then a Kronecker matrix with m instances of f_trans(n), then a second (inverse) permutation, then a multiplication by a diagonal twiddle matrix, then a second Kronecker with n instances of f_trans(m), and then finally the initial permutation again. Advantageously, data is held (e.g. in local memory) only in sections of size m and n for the two Kronecker phases. Thus, for 1D transforms, the exemplary architecture can be expressed by the following equation A, in which the factors are applied to X from right to left:
Equation A: F*X = Pmn*kron(In,Fm)*Dmn*Pmn'*kron(Im,Fn)*Pmn*X
where Pmn' is the transpose of Pmn, and In and Im are identity matrices of size n and m respectively.
In the present invention, this transform is now extended to a two-dimensional transform.
Here, the data are not in the form of a column matrix of length N = m*n, but instead in the form of a square matrix of side N = m*n.
Taking the above equation A, the exemplary embodiments now provide an algorithm with the (square) data matrix nested in the middle of two sets of operations, with Equation A of the one set on one side and the transpose of this set on the other. The 2D transform is expressed by the following Equation B:
Equation B: F*X = Pmn*kron(In,Fm)*Dmn*Pmn'*kron(Im,Fn)*Pmn*X.'
In Equation B, X.' means the transpose of X, rather than the complex conjugate transpose of X. The exemplary FFT processor runs through the same set of operations as noted above for a 1D transform, with the same number of steps, but now as a two-dimensional transform, giving significant savings.
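The role of the plain (non-conjugating) transpose in the 2D case can be illustrated with a small software model. The Python sketch below transforms the columns of a square matrix, transposes without conjugation, and transforms again, then compares the result with the 2D DFT computed directly from its definition. A naive DFT is used in place of the factorised pipeline, and all names are hypothetical.

```python
import cmath


def dft(vec):
    """Naive 1D DFT."""
    size = len(vec)
    return [sum(vec[j] * cmath.exp(-2j * cmath.pi * j * k / size)
                for j in range(size)) for k in range(size)]


def transpose(mat):
    """Plain transpose: rows become columns, with no complex conjugation."""
    return [list(col) for col in zip(*mat)]


def dft2(X):
    """2D DFT as a 1D pass over columns followed by a 1D pass over rows."""
    columns_done = transpose([dft(col) for col in transpose(X)])
    return [dft(row) for row in columns_done]


def dft2_direct(X):
    """2D DFT straight from the double-sum definition, for comparison."""
    N = len(X)
    return [[sum(X[j1][j2] * cmath.exp(-2j * cmath.pi * (j1 * k1 + j2 * k2) / N)
                 for j1 in range(N) for j2 in range(N))
             for k2 in range(N)] for k1 in range(N)]


X = [[complex(r + c, r * c) for c in range(4)] for r in range(4)]
A, B = dft2(X), dft2_direct(X)
print(max(abs(A[r][c] - B[r][c]) for r in range(4) for c in range(4)) < 1e-9)   # True
```

If the transpose also conjugated the data, the second pass would operate on conjugated intermediate values and the result would no longer match the direct definition for complex inputs.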
Data are now provided in blocks of size m*m and n*n for processing in the 2D FFT processor. If the initial data are treated separately for their real and imaginary parts, then it is noted that there are symmetry savings for the transforms of each block. That is, results occur (mostly) in conjugate pairs, cutting down the required number of operations.
In the exemplary embodiments, diagonal twiddle-factor matrices are merged and applied as a single, full matrix to all the 2D data. Thus, diagonal twiddle-factor matrices do not need to be applied first to rows then to columns.
In the exemplary embodiments, not all the elements need to be applied in multiplication of data. Starting with the initial m by m blocks (n*n of them), the second permutation after the first Kronecker stage maps these into m*m n by n blocks, ready for the second Kronecker stage after twiddle factors are applied. These blocks can mostly (apart from one in the top left corner) be sorted into pairs where the data contained in one are equal to a conjugate permutation of the other, so that only about half the twiddle multiplications need be done, and only about half the n by n sized transforms. Here, the blocks of data are written back to two locations, with a permutation of the computed data to get the second block.
Figure 11 is a schematic overview of a digital signal processing apparatus 1100 according to an exemplary embodiment of the present invention. The apparatus is, for example, an audio DSP and/or a video DSP. The apparatus comprises a receiver unit 1110 arranged to receive input data of length N words, where N = m*n, m and n are each positive integers, and N is a power of two. Also, the apparatus comprises an FFT processor 100 arranged to perform a fast Fourier transform of the N words of input data to produce N words of output data according to any of the embodiments discussed herein. Further, the apparatus comprises an output unit 1120 arranged to output the N words of output data after processing by the FFT 100.
Figure 12 illustrates an example testing and validation apparatus 1200 according to a further aspect of the present invention. Here, various different designs of m-point and n-point FFT processor units are provided simultaneously. A first selector unit 1201 selects one of the available m-point FFT units 10a-10c. Similarly, a second selector unit 1202 selects one of the available n-point FFT units 20a-20c.
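The conjugate pairing that underlies this saving can be seen at the level of individual output words. The Python sketch below assumes real-valued input data and computes each 2D DFT output only once per conjugate pair, writing the result back to two locations. It illustrates the symmetry element by element rather than block by block, so it is a simplification of the scheme described above, and the function name is hypothetical.

```python
import cmath


def dft2_real(X):
    """2D DFT of a real N x N matrix, computing one output per conjugate pair."""
    N = len(X)
    Y = [[None] * N for _ in range(N)]
    computed = 0
    for k1 in range(N):
        for k2 in range(N):
            if Y[k1][k2] is not None:
                continue                              # already filled by its partner
            value = sum(X[j1][j2] * cmath.exp(-2j * cmath.pi * (j1 * k1 + j2 * k2) / N)
                        for j1 in range(N) for j2 in range(N))
            Y[-k1 % N][-k2 % N] = value.conjugate()   # partner location
            Y[k1][k2] = value                         # computed location
            computed += 1
    return Y, computed


X = [[float(3 * r - c * c) for c in range(4)] for r in range(4)]
Y, computed = dft2_real(X)
print(computed, "of", 16, "outputs computed directly")   # 10 of 16
```

For N=4, four outputs are their own partners and the remaining twelve form six pairs, so ten sums are evaluated instead of sixteen; the fraction approaches one half as N grows.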
As noted above, in general terms the N-point FFT processor is divided by factors into two smaller m-point and n-point FFT processor units. Here, m and n can be any suitable factors of N such that m times n equals N. Thus, alternate embodiments of the FFT processor architecture may be implemented using a readily available FFT processor unit of any suitable design and construction as available in the related art or elsewhere. Thus, the exemplary architecture is readily adapted to incorporate existing tried and tested smaller FFT processor units to form the required high-point FFT processor. Here, the design and verification of the two small FFT processor units requires much less effort and time than the design and verification of the large FFT processor. Thus, it is easy to implement the exemplary architecture in many different specific forms.
Recent advances in semiconductor processing technology have led to the evolution of programmable logic chips such as field-programmable gate arrays (FPGAs) and complex programmable logic devices (CPLDs), which continue to increase in both speed and capacity.
Hence, the architecture discussed herein is particularly suitable for the rapid prototyping and development of DSP devices incorporating one or more large-point FFT processors.
As will be familiar to those skilled in the art, a limiting factor in most FFT architectures is the complex multiplication required when applying the twiddle factors which therefore leads to a bottleneck. Factoring the high-point FFT into two smaller FFT processor units with a high-radix algorithm substantially reduces the number of complex multipliers and improves the output SNR.
Although a few preferred embodiments have been shown and described, it will be appreciated by those skilled in the art that various changes and modifications might be made without departing from the scope of the invention, as defined in the appended claims.
Attention is directed to all papers and documents which are filed concurrently with or previous to this specification in connection with this application and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference.
All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive.
Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
The invention is not restricted to the details of the foregoing embodiment(s). The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed.

Claims (15)

  1. A pipelined 2D FFT processor to perform a fast Fourier transform on N by N words of input data where N = m x n, wherein m and n are both positive integers, the 2D FFT processor comprising: a first permutation unit (31) arranged to receive the N words of input data (700) and to permute the input data (700) into first permuted data (710) arranged in n*n data blocks each of size m*m words; a first m-point FFT processor unit (10) arranged to perform a fast Fourier transform on the first permuted data (710) to provide first transformed data (720) arranged in n*n data blocks each of size m*m words; a second permutation unit (32) arranged to permute the first transformed data (720) into second permuted data (730) arranged in m*m data blocks each of size n*n words; a twiddle factor multiplication unit (40) comprising a complex multiplier (42) arranged to multiply each word of the second permuted data (730) by a predetermined twiddle factor to provide twiddle factor multiplied data (740); a second n-point FFT processor unit (20) arranged to perform a fast Fourier transform on the twiddle factor multiplied data (740) to provide second transformed data (750) arranged in m*m data blocks each of size n*n words; and a third permutation unit (33) arranged to permute the second transformed data (750) into third permuted data (760) and to output the third permuted data (760) in a N by N matrix as a 2D fast Fourier transform of the input data (700).
  2. The FFT processor of claim 1, further comprising: a permutation controller (50) arranged to provide address signals to each of the first, second and third permutation units (31, 32, 33) whereby data are written into and read from the first, second and third permutation units (31, 32, 33) according to address signals.
  3. The FFT processor of claim 2, wherein the first, second and third permutation units (31, 32, 33) are each arranged to write data in a sequential order and to read data from the first, second and third permutation units (31, 32, 33) in a permuted sequence according to the address signals supplied by the permutation controller (50).
  4. The FFT processor of claim 1, wherein the first, second and third permutation units (31, 32, 33) each comprise a single-port RAM.
  5. The FFT processor of claim 4, wherein the first, second and third permutation units (31, 32, 33) are each arranged to operate in a read-before-write mode.
  6. The FFT processor of claim 1, wherein: the first m-point FFT processor (10) is arranged to process each of the n*n data blocks of size m by m words of the first permuted data (710) separately in turn and to write each of the n*n data blocks of the first transformed data (720) into the second permutation unit (32); and the second n-point FFT processor (20) is arranged to process each of the m*m data blocks of size n by n words of the twiddle factor multiplied data (740) separately in turn and to write each of the m*m data blocks of the second transformed data (750) into the third permutation unit (33).
  7. The FFT processor of claim 1, wherein the twiddle factor multiplication unit (40) comprises a ROM (44) arranged to store a plurality of twiddle factors and a complex multiplier (42) arranged to multiply the stored twiddle factors supplied from the ROM (44) in turn with the second permuted data (730).
  8. A digital signal processing apparatus (1100), comprising: a receiver unit (1110) arranged to receive input data of length N words, where N = m*n, where m and n are each positive integers; a FFT processor (100) arranged to perform a fast Fourier transform of the N words of input data to produce N words of output data; and an output unit (1120) arranged to output the N words of output data; wherein the FFT processor (100) is arranged as set forth in any of claims 1 to 7.
  9. The digital signal processing apparatus of claim 8, wherein the apparatus comprises a digital audio broadcasting receiver.
  10. The digital signal processing apparatus of claim 8, wherein the apparatus comprises a digital video broadcasting receiver.
  11. A method of performing a 2D fast Fourier transform on N by N words of input data arranged in a square matrix of side N, wherein N = m x n, wherein m and n are both positive integers, the method comprising: receiving the N by N words of input data (700); permuting the input data (700) into first permuted data (710) arranged in n*n data blocks each of size m by m words; performing a fast Fourier transform on the first permuted data (710) using a first m-point FFT processor unit (10) to provide first transformed data (720) arranged in n*n data blocks each of size m by m words; permuting the first transformed data (720) into second permuted data (730) arranged in m*m data blocks each of size n by n words; multiplying each of the words of the second permuted data (730) by a predetermined twiddle factor to provide twiddle factor multiplied data (740); performing a fast Fourier transform on the twiddle factor multiplied data (740) using a second n-point FFT processor unit (20) to provide second transformed data (750) arranged in m*m data blocks each of size n by n words; permuting the second transformed data (750) into third permuted data (760); and outputting the third permuted data (760) as a 2D fast Fourier transform of the input data (700).
  12. A testing apparatus (1200) for testing a 2D FFT processor arranged to perform a 2D fast Fourier transform on N by N words of input data where N = m x n, where m and n are both positive integers, the apparatus comprising: a first permutation unit (31) arranged to receive the N words of input data (700) and to permute the input data (700) into first permuted data (710) arranged in n*n data blocks each of size m by m words; a plurality of m-point FFT processor units (10a, 10b, 10c) each arranged to perform a fast Fourier transform on the first permuted data (710) to provide first transformed data (720) arranged in n*n data blocks each of size m by m words; a second permutation unit (32) arranged to permute first transformed data (720) into second permuted data (730) arranged in m*m data blocks each of size n by n words; a twiddle factor multiplication unit (40) comprising a complex multiplier (42) arranged to multiply each word of the second permuted data (730) by a predetermined twiddle factor to provide twiddle factor multiplied data (740); a plurality of n-point FFT processor units (20a, 20b, 20c) each arranged to perform a fast Fourier transform on the twiddle factor multiplied data (740) to provide second transformed data (750) arranged in m*m data blocks each of size n by n words; a third permutation unit (33) arranged to permute the second transformed data (750) into third permuted data (760) and to output the third permuted data (760) as a 2D fast Fourier transform of the input data (700); a first selector unit (1201) arranged to select one of the plurality of m-point FFT processor units (10a-10c); and a second selector unit (1202) arranged to select one of the plurality of n-point FFT processor units (20a-20c); whereby the selected one of the plurality of m-point FFT processor units (10a-10c) and the selected one of the plurality of n-point FFT processor units (20a-20c) are arranged in combination to perform the 2D fast Fourier transform on the N by N words of input data.
  13. A pipelined 2D FFT processor, substantially as hereinbefore described with reference to the accompanying drawings.
  14. A digital signal processing apparatus, substantially as hereinbefore described with reference to the accompanying drawings.
  15. A testing apparatus for testing a FFT processor, substantially as hereinbefore described with reference to the accompanying drawings.
GB0807577A 2008-04-25 2008-04-25 Pipelined 2D fast Fourier transform with three permutation stages, two FFT processor units and a twiddle factor unit. Withdrawn GB2459339A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB0807577A GB2459339A (en) 2008-04-25 2008-04-25 Pipelined 2D fast Fourier transform with three permutation stages, two FFT processor units and a twiddle factor unit.
PCT/GB2009/050396 WO2009130498A2 (en) 2008-04-25 2009-04-20 Pipelined 2d fft processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0807577A GB2459339A (en) 2008-04-25 2008-04-25 Pipelined 2D fast Fourier transform with three permutation stages, two FFT processor units and a twiddle factor unit.

Publications (3)

Publication Number Publication Date
GB0807577D0 GB0807577D0 (en) 2008-06-04
GB2459339A true GB2459339A (en) 2009-10-28
GB2459339A8 GB2459339A8 (en) 2009-12-30

Family

ID=39522566

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0807577A Withdrawn GB2459339A (en) 2008-04-25 2008-04-25 Pipelined 2D fast Fourier transform with three permutation stages, two FFT processor units and a twiddle factor unit.

Country Status (2)

Country Link
GB (1) GB2459339A (en)
WO (1) WO2009130498A2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609396A (en) * 2012-01-19 2012-07-25 中国传媒大学 Discrete Fourier transform processing device and method in data rights management (DRM) system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4768159A (en) * 1984-11-26 1988-08-30 Trw Inc. Squared-radix discrete Fourier transform
US5031038A (en) * 1989-04-18 1991-07-09 Etat Francais (Cnet) Process and device for the compression of image data by mathematical transformation effected at low cost, particularly for the transmission at a reduced rate of sequences of images
US20030225805A1 (en) * 2002-05-14 2003-12-04 Nash James G. Digital systolic array architecture and method for computing the discrete fourier transform
US20040039765A1 (en) * 2001-02-28 2004-02-26 Fujitsu Limited Fourier transform apparatus
GB2448755A (en) * 2007-04-27 2008-10-29 Univ Bradford Large N-point fast Fourier transform processor with three permutation units, two FFT units and a twiddle factor multiplication unit.

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4320466A (en) * 1979-10-26 1982-03-16 Texas Instruments Incorporated Address sequence mechanism for reordering data continuously over some interval using a single memory structure
US5528736A (en) * 1992-04-28 1996-06-18 Polytechnic University Apparatus and method for performing a two-dimensional block data transform without transposition

Also Published As

Publication number Publication date
WO2009130498A3 (en) 2010-03-11
GB2459339A8 (en) 2009-12-30
WO2009130498A2 (en) 2009-10-29
GB0807577D0 (en) 2008-06-04

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)