KR101652899B1 - Fast fourier trasform processor using eight-parallel mdc architecture - Google Patents


Info

Publication number
KR101652899B1
KR101652899B1 KR1020150105917A KR20150105917A KR101652899B1 KR 101652899 B1 KR101652899 B1 KR 101652899B1 KR 1020150105917 A KR1020150105917 A KR 1020150105917A KR 20150105917 A KR20150105917 A KR 20150105917A KR 101652899 B1 KR101652899 B1 KR 101652899B1
Authority
KR
South Korea
Prior art keywords
processing module
stages
pass
parallel
multiplexer
Application number
KR1020150105917A
Other languages
Korean (ko)
Inventor
선우명훈
김문기
Original Assignee
아주대학교 산학협력단
Application filed by 아주대학교 산학협력단
Priority to KR1020150105917A
Application granted
Publication of KR101652899B1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/141 Discrete Fourier transforms
    • G06F17/142 Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/57 Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations

Abstract

Disclosed is a fast Fourier transform device having an eight-parallel multi-path delay commutator (MDC) architecture, capable of reducing hardware complexity. According to one embodiment of the present invention, the fast Fourier transform device having the eight-parallel MDC architecture includes: a first processing module having a plurality of stages, in which each of the stages includes at least one from a plurality of first butterflies, a plurality of delay elements, a plurality of constant multipliers, and a plurality of first commutators; a second processing module having a plurality of stages of which the number of stages is smaller than the number of stages included in the first processing module, in which each of the stages includes a plurality of second butterflies; and a data reconfiguring module arranged between the first processing module and the second processing module, and including a plurality of second commutators for switching an output signal of the first processing module to transmit the output signal to the second processing module as an input signal, and a plurality of complex multipliers connected to the remaining output terminals except one output terminal among output terminals of the respective second commutators.

Description

FAST FOURIER TRANSFORM PROCESSOR USING EIGHT-PARALLEL MDC ARCHITECTURE

Embodiments of the present invention relate to a fast Fourier transform apparatus, and more particularly, to a fast Fourier transform apparatus employing an 8-parallel MDC structure.

The fast Fourier transform (FFT) is widely used as an algorithm that reduces the computational complexity of the discrete Fourier transform (DFT). FFT algorithms are used in many fields, such as communication systems, bio applications, sensor signal processing, and satellite signal processing. In addition, the FFT processor is one of the most complex blocks in systems based on the IEEE 802.11n/ac/ad, IEEE 802.15.3c, and IEEE 802.16e standards, which adopt the orthogonal frequency division multiplexing (OFDM) transmission method.

Various FFT processors have been proposed to satisfy the high throughput needed for real-time signal processing. FFT structures are divided into memory-based structures and pipeline structures. The memory-based structure has a small hardware area but has difficulty achieving high throughput. In fields requiring real-time signal processing, a pipeline structure is mainly used to overcome this drawback and obtain a high processing speed.

Pipeline structures can be classified into single-path delay feedback (SDF), multi-path delay feedback (MDF), single-path delay commutator (SDC), and multi-path delay commutator (MDC) structures. The SDF structure has low hardware complexity because it uses only the same number of delay elements as the MDC structure, but it has low throughput because it moves data along a single path. The MDC structure, on the other hand, uses commutators to route data over multiple paths, which increases throughput but also increases hardware complexity. The hardware complexity and data throughput of the whole design therefore depend on which structure is chosen.

In recent real-time applications, the FFT processor must satisfy a high throughput of more than a few GSamples/s. Pipeline structures that use parallel processing techniques are therefore being actively studied.

Related prior art includes Korean Patent Laid-Open Publication No. 10-2011-0068763 (title: Complex constant multiplier, Fast Fourier transform device including complex constant multiplier and method; published June 22, 2011).

One embodiment of the present invention provides a fast Fourier transform apparatus using an 8-parallel MDC structure, which applies an 8-parallel structure together with an MDC structure based on the radix-2^6 algorithm and thereby reduces hardware complexity by reducing the number of complex multipliers while satisfying a high data throughput.

The problems to be solved by the present invention are not limited to the above-mentioned problem (s), and another problem (s) not mentioned can be clearly understood by those skilled in the art from the following description.

A fast Fourier transform apparatus using an 8-parallel MDC structure according to an embodiment of the present invention includes: a first processing module having a plurality of stages, each of the stages including at least one of a plurality of first butterflies, a plurality of delay elements, a plurality of constant multipliers, and a plurality of first commutators; a second processing module having a plurality of stages, fewer in number than the stages of the first processing module, each of the stages including a plurality of second butterflies; and a data reconstruction module disposed between the first processing module and the second processing module and including a plurality of second commutators, which switch the output signals of the first processing module and transfer them to the second processing module as input signals, and a plurality of complex multipliers connected to the output terminals of each second commutator except for one output terminal.

The first processing module may apply a radix-2^6 algorithm consisting of six stages (first through sixth), so that the plurality of first butterflies have the multiplication operations of a radix-64 algorithm and the butterfly structure of a radix-2 algorithm. The second processing module may apply a radix-2^2 algorithm consisting of two stages (seventh and eighth), so that the plurality of second butterflies have the multiplication operations of a radix-4 algorithm and the butterfly structure of a radix-2 algorithm.

The radix-2^6 algorithm may comprise two twiddle factors -j computed in the first and fourth stages, two twiddle factors W_8 computed in the second and fifth stages, one twiddle factor W_64 computed in the third stage, and one twiddle factor W_256 computed in the sixth stage.

The twiddle factor -j may be calculated by swapping the real-part and imaginary-part data and taking the 2's complement of the imaginary part, and the twiddle factor W_64 may be calculated using the plurality of constant multipliers.

The plurality of constant multipliers may apply the CSD (Canonical Signed Digit) and CSS (Common Sub-expression Sharing) methods to the data values, which are separated into real and imaginary values by the operation in the third stage, so that the complex multiplication is performed using only shift and addition operations.

The plurality of first butterflies may be arranged in parallel in each of the six stages provided in the first processing module, each outputting two output signals for two input signals. Two second commutators may be arranged in parallel between the first processing module and the second processing module, and the output signals of each of the four first butterflies arranged in the sixth stage among the six stages may be input to them one by one.

The second commutator may output only data having a twiddle factor of 1 through one of its output terminals.

The second commutator may include a plurality of multiplexers, each selecting and outputting one of a plurality of input signals in accordance with a control signal that changes every clock, and only data having a twiddle factor of 1 may be output through the output terminal of one of the plurality of multiplexers.

The plurality of multiplexers may include: a first multiplexer disposed at the uppermost end of the second commutator and sequentially receiving the input signal from a first path connected to a first input terminal of the second commutator, a second path connected to a second input terminal of the second commutator, a third path connected to a third input terminal of the second commutator, and a fourth path connected to a fourth input terminal of the second commutator; a second multiplexer disposed below the first multiplexer and sequentially receiving the input signal from the third path, the fourth path, the first path, and the second path; a third multiplexer disposed below the second multiplexer and sequentially receiving the input signal from the fourth path, the first path, the second path, and the third path; and a fourth multiplexer disposed at the lowermost end of the second commutator and sequentially receiving the input signal from the second path, the third path, the fourth path, and the first path.

The first multiplexer may output only the data having the twiddle factor of 1, and the plurality of complex multipliers may be connected to the output terminals of the remaining second to fourth multiplexers, excluding the output terminal of the first multiplexer.

The details of other embodiments are included in the detailed description and the accompanying drawings.

According to an embodiment of the present invention, by applying the 8-parallel structure together with the MDC structure based on the radix-2^6 algorithm, hardware complexity can be reduced by reducing the number of complex multipliers while satisfying a high data throughput.

FIG. 1 is a circuit diagram of a fast Fourier transform (FFT) apparatus using an 8-parallel MDC structure according to an embodiment of the present invention.
FIG. 2 is a detailed circuit diagram of the second commutator of FIG. 1.
FIG. 3 is a diagram illustrating an existing MDC structure for a 256-point FFT.
FIG. 4 is a diagram illustrating an MDC structure proposed in an embodiment of the present invention for a 256-point FFT.
FIG. 5 is a detailed circuit diagram of the constant multiplier of FIG. 1.
FIG. 6 is a diagram illustrating the eight regions used for the twiddle factor W_64 in one embodiment of the present invention.
FIG. 7 is a diagram showing, in 10-bit CSD form, the eight real-part coefficients among the 15 coefficients used in stage 3, according to an embodiment of the present invention.
FIG. 8 is a diagram showing, in 10-bit CSD form, the imaginary-part coefficients among the 15 coefficients used in stage 3, according to an embodiment of the present invention.

The advantages and/or features of the present invention, and how to accomplish them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. It should be understood, however, that the invention is not limited to the disclosed embodiments and may be implemented in many different forms. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art; the invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Before describing the embodiments of the present invention, the algorithm applied to the embodiments of the present invention will be described below.

The 256-point DFT equation is shown in Equation (1) below.

[Equation 1]

Figure 112015072789733-pat00001

Here, W denotes a twiddle factor, x[n] denotes the time-domain signal, and X(k) denotes the frequency-domain signal.
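
For reference, the standard 256-point DFT that Equation (1) refers to can be written as below. This is a reconstruction from the symbol definitions in the text, not a copy of the patent figure; the time-domain signal is written x[n] and the usual sign convention for the twiddle factor is assumed.

```latex
X(k) = \sum_{n=0}^{255} x[n]\, W_{256}^{nk},
\qquad W_{256} = e^{-j 2\pi / 256},
\qquad k = 0, 1, \ldots, 255
```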

Equation (1) can be decomposed into a 64-point part and a 4-point part; the radix-2^6 FFT algorithm can be applied to the 64-point part and the radix-2^2 FFT algorithm to the 4-point part.

The decomposition into 64 points and 4 points is as follows.

First, the index can be defined as shown in Equation 2 below.

[Equation 2]

Figure 112015072789733-pat00002

Applying the divided index to Equation (1) yields Equation (3).

[Equation 3]

Figure 112015072789733-pat00003

To further apply the radix-2^6 algorithm to the 64-point part, the indices n_1 and k_1 are likewise decomposed as shown in Equation (4) below.

[Equation 4]

Figure 112015072789733-pat00004

If the divided index is applied to W_64 of Equation (3), it can be expressed as Equation (5).

[Equation 5]

Figure 112015072789733-pat00005

Applying the radix-2^2 algorithm to the remaining 4-point part gives Equations (6) and (7).

[Equation 6]

Figure 112015072789733-pat00006

[Equation 7]

Figure 112015072789733-pat00007

That is, the FFT algorithm in which radix-2^6 and radix-2^2 are applied to the whole transform can be expressed by Equation (8).

[Equation 8]

Figure 112015072789733-pat00008
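
The 64-point by 4-point split described above can be checked numerically. The sketch below is not the patent's hardware data flow; it assumes the index map n = 4·n1 + n2 and k = k1 + 64·k2 (chosen because it yields an inter-module twiddle of the form W_256^(n2·k1), matching the Stage 6 exponent discussed in the text), and the function names `w`, `dft`, and `fft_64x4` are illustrative.

```python
import cmath

N, N1, N2 = 256, 64, 4  # 256 = 64 * 4


def w(n_points, exponent):
    """Twiddle factor W_n^exponent = exp(-j*2*pi*exponent/n)."""
    return cmath.exp(-2j * cmath.pi * exponent / n_points)


def dft(x):
    """Direct O(n^2) DFT, used as the reference and as the small sub-transform."""
    n = len(x)
    return [sum(x[i] * w(n, i * k) for i in range(n)) for k in range(n)]


def fft_64x4(x):
    """256-point DFT as 64-point DFTs, a W_256 twiddle, then 4-point DFTs.

    Assumed index map: n = 4*n1 + n2, k = k1 + 64*k2.
    """
    # Step 1: one 64-point DFT per residue n2 (the role of stages 1-6).
    inner = [dft([x[4 * n1 + n2] for n1 in range(N1)]) for n2 in range(N2)]
    out = [0j] * N
    for k1 in range(N1):
        # Step 2: inter-module twiddle W_256^(n2*k1); it equals 1 when n2 == 0.
        col = [inner[n2][k1] * w(N, n2 * k1) for n2 in range(N2)]
        # Step 3: 4-point DFT over n2 (the role of stages 7-8).
        for k2, value in enumerate(dft(col)):
            out[k1 + N1 * k2] = value
    return out


# Arbitrary deterministic test signal.
x = [complex((3 * n) % 7, (5 * n) % 11) for n in range(N)]
max_err = max(abs(a - b) for a, b in zip(dft(x), fft_64x4(x)))
```

Here `max_err` is the largest difference between the direct DFT and the decomposed version; it should be at the level of floating-point rounding.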

Therefore, in the embodiment of the present invention, each stage uses the twiddle factors shown in Table 1 below. In Table 1, the twiddle factor W_256 of Stage 6 is given by

Figure 112015072789733-pat00009

As can be seen from the divided indices (Equation 4), γ_1, γ_2, β_1, β_2, β_3, β_4, β_5, and β_6 can each take only the values 0 and 1.

Consider the exponent (2γ_1 + γ_2)(β_1 + 2β_2 + 4β_3 + 8β_4 + 16β_5 + 32β_6). Whatever values the β terms take, if 2γ_1 + γ_2 is 0 the exponent is 0, so the twiddle factor is W_256^0 = 1 and no multiplication by W_256 is needed.

The γ terms have four possible cases: (γ_1, γ_2) = (0,0), (1,0), (0,1), and (1,1). Of these, only (γ_1, γ_2) = (0,0) gives 2γ_1 + γ_2 = 0. That is, 25% of the twiddle factors W_256 of Stage 6 have the value 1 (W_256^0).
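
The 25% figure can be reproduced by enumerating the bit combinations. This is a small counting sketch; the variable names are illustrative and the exponent form is taken from the paragraph above.

```python
from itertools import product

cases = []
for g1, g2 in product((0, 1), repeat=2):
    gamma = 2 * g1 + g2  # 2*gamma_1 + gamma_2, in {0, 1, 2, 3}
    for betas in product((0, 1), repeat=6):
        # beta_1 + 2*beta_2 + 4*beta_3 + ... + 32*beta_6, in 0..63
        beta = sum(bit << i for i, bit in enumerate(betas))
        cases.append((gamma, gamma * beta))

# Cases where the gamma term alone forces the exponent to 0 (twiddle factor 1).
gamma_zero = sum(1 for gamma, _ in cases if gamma == 0)
fraction_trivial = gamma_zero / len(cases)
```

Of the 256 combinations, the 64 with 2γ_1 + γ_2 = 0 need no multiplier, which gives `fraction_trivial == 0.25`. A few additional cases with all β bits equal to 0 also have exponent 0, but they are spread across the other γ groups, so the text counts only the γ = 0 group.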

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a circuit diagram of a fast Fourier transform (FFT) apparatus using an 8-parallel MDC structure according to an embodiment of the present invention, and FIG. 2 is a detailed circuit diagram of the second commutator 132 of FIG. 1.

Referring to FIG. 1, a fast Fourier transform apparatus 100 employing an 8-parallel MDC structure according to an embodiment of the present invention includes a first processing module 110, a second processing module 120, and a data reconstruction module 130.

The first processing module 110 includes a plurality of stages, each of which may include at least one of a plurality of first butterflies 112, a plurality of delay elements 114, a plurality of constant multipliers 116, and a plurality of first commutators 118.

The first processing module 110 applies the radix-2^6 algorithm consisting of six stages, first through sixth (Stage-1, 2, 3, 4, 5, 6), so that the plurality of first butterflies 112 have the multiplication operations of the radix-64 algorithm and the butterfly structure of the radix-2 algorithm.

Accordingly, the plurality of first butterflies 112 are arranged in parallel in each of the six stages (Stage-1, 2, 3, 4, 5, 6) provided in the first processing module 110, and each can output two output signals for two input signals.

The radix-2^6 algorithm comprises two twiddle factors -j computed in the first and fourth stages (Stage-1, 4), two twiddle factors W_8 computed in the second and fifth stages (Stage-2, 5), one twiddle factor W_64 computed in the third stage (Stage-3), and one twiddle factor W_256 computed in the sixth stage (Stage-6). That is, each stage uses the twiddle factor shown in Table 1.

[Table 1]

Figure 112015072789733-pat00010

Here, the twiddle factor -j may be calculated by swapping the real-part and imaginary-part data and taking the 2's complement of the imaginary part, and the twiddle factor W_64 may be computed using the constant multipliers 116.
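
The multiplier-free handling of -j follows from (a + jb)(-j) = b - ja. A minimal sketch, with the hypothetical helper name `mul_minus_j` and plain integers standing in for fixed-point words (the sign change corresponds to a 2's complement in hardware):

```python
def mul_minus_j(re, im):
    """Multiply (re + j*im) by -j using only a swap and one negation.

    (re + j*im) * (-j) = im - j*re
    """
    return im, -re


# Cross-check against ordinary complex arithmetic.
sample = complex(3, 4)
expected = sample * (-1j)
result = mul_minus_j(3, 4)
```

No general multiplier is involved, which is why the -j stages are not counted among the complex multipliers.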

The plurality of constant multipliers 116 may apply the CSD (Canonical Signed Digit) and CSS (Common Sub-expression Sharing) methods to the data values, which are separated into real and imaginary values by the operation in the third stage, so that the complex multiplication is performed using only shift and addition operations. The plurality of constant multipliers 116 will be described later with reference to FIG. 5.

The second processing module 120 includes a plurality of stages having fewer stages than the first processing module 110, each of the stages including the plurality of second butterflies 122.

The second processing module 120 applies the radix-2^2 algorithm consisting of two stages, seventh and eighth (Stage-7, 8), so that the plurality of second butterflies 122 have the multiplication operations of the radix-4 algorithm and the butterfly structure of the radix-2 algorithm.

The data reconstruction module 130 is disposed between the first processing module 110 and the second processing module 120. The data reconstruction module 130 includes a plurality of second commutators 132, which switch the output signals of the first processing module 110 and transmit them as input signals to the second processing module 120, and a plurality of complex multipliers 134 connected to the output terminals of each second commutator 132 except for one output terminal.

Two second commutators 132 are arranged in parallel between the first processing module 110 and the second processing module 120, and the output signals of each of the four first butterflies 112 disposed in the sixth stage (Stage-6) among the six stages (Stage-1, 2, 3, 4, 5, 6) may be input to them one by one.

That is, each of the two second commutators 132 arranged in parallel between the first processing module 110 and the second processing module 120 may output only the data having a twiddle factor of 1 through one of its four output terminals.

To this end, the second commutator 132 includes a plurality of multiplexers, each selecting and outputting one of a plurality of input signals according to a control signal that changes every clock, and only the data having the twiddle factor of 1 may be output through the output terminal of one of the plurality of multiplexers.

Specifically, the second commutator 132 may include first to fourth multiplexers 212, 214, 216, and 218. The second commutator 132 may further include a plurality of input delay buffers 222, 224, and 226, a plurality of output delay buffers 232, 234, and 236, and a control signal 240.

The first multiplexer 212 is disposed at the uppermost end of the second commutator 132 and can sequentially receive the input signal from a first path 201 connected to the first input terminal of the second commutator 132, a second path 202 connected to the second input terminal of the second commutator 132, a third path 203 connected to the third input terminal of the second commutator 132, and a fourth path 204 connected to the fourth input terminal of the second commutator 132.

The second multiplexer 214 is disposed below the first multiplexer 212 in the second commutator 132 and can sequentially receive the input signal from the third path 203, the fourth path 204, the first path 201, and the second path 202.

The third multiplexer 216 is disposed below the second multiplexer 214 in the second commutator 132 and can sequentially receive the input signal from the fourth path 204, the first path 201, the second path 202, and the third path 203.

The fourth multiplexer 218 is disposed at the lowermost end of the second commutator 132 and can sequentially receive the input signal from the second path 202, the third path 203, the fourth path 204, and the first path 201.

The second commutator 132 may output only the data having the twiddle factor of 1 through the first multiplexer 212, and the plurality of complex multipliers 134 may be connected to the output terminals of the remaining second to fourth multiplexers 214, 216, and 218, excluding the output terminal of the first multiplexer 212.
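
The multiplexer path orders listed above can be tabulated per clock to confirm that no two multiplexers read the same path at the same time. This sketch models only the selection order; it assumes the control signal behaves like a modulo-4 counter and ignores the input and output delay buffers, and the names `MUX_ORDER` and `selected_paths` are illustrative.

```python
# Path order seen by each multiplexer, copied from the description
# (paths are numbered 1-4; one entry is selected per clock cycle).
MUX_ORDER = {
    "mux1": (1, 2, 3, 4),
    "mux2": (3, 4, 1, 2),
    "mux3": (4, 1, 2, 3),
    "mux4": (2, 3, 4, 1),
}


def selected_paths(clock):
    """Return the path each multiplexer selects at the given clock cycle.

    Assumption: the control signal cycles through four states, one per clock.
    """
    step = clock % 4
    return {name: order[step] for name, order in MUX_ORDER.items()}


schedule = [selected_paths(t) for t in range(4)]

# At every clock the four multiplexers read four different paths.
conflict_free = all(sorted(s.values()) == [1, 2, 3, 4] for s in schedule)
```

Under this assumption every clock cycle is a permutation of the four paths, so each input is consumed exactly once per cycle and the four outputs stay fully occupied.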

Thus, according to an embodiment of the present invention, after the sixth stage (Stage-6) is computed, the data are rearranged by the second commutator 132, which outputs only data having a twiddle factor of 1 at one of its four output terminals, so that the number of complex multipliers 134 can be reduced and hardware complexity greatly reduced. Also, according to an embodiment of the present invention, the data throughput can be further improved by continuously processing data over a total of eight parallel paths.

FIG. 3 is a diagram illustrating a conventional MDC structure for a 256-point FFT, and FIG. 4 is a diagram illustrating an MDC structure proposed in an embodiment of the present invention for a 256-point FFT.

To explain the reduction in the number of complex multipliers, FIG. 3 and FIG. 4 show the data scheduling schemes of the existing structure and the proposed structure. First, FIG. 3 illustrates the data operation of a 256-point MDC FFT structure based on a conventional radix-2^n algorithm. The complex multiplication in stage 7 requires a twiddle factor W_256 whose exponent depends on (2γ_1 + γ_2), and 2γ_1 + γ_2 takes values in {0, 1, 2, 3}. The eight data samples in stage 7 are then successively multiplied by the appropriate twiddle factor value. When 2γ_1 + γ_2 is 0, the twiddle factor becomes W_256^0 and no multiplication is needed. However, the existing 256-point MDC FFT structure has no commutator with an output terminal that outputs only data having a twiddle factor of 1, so a complex multiplier is required on every path as shown in FIG. 3, for a total of 8 complex multipliers.

FIG. 4 illustrates the detailed data operation using the data scheduling technique of the 256-point MDC FFT structure proposed in the embodiment of the present invention. As shown in FIG. 4, the data values multiplied by the twiddle factor 1 (W_256^0), that is, 4k = 0, 4, 8, 12, ..., 140, ..., appear in order on Path-1 and Path-5. Complex multipliers are required on all paths in the existing structure, but with the proposed data scheduling technique they are not needed on Path-1 and Path-5. The structure proposed in the embodiment of the present invention therefore reduces the number of complex multipliers from 8 to 6 using a data scheduling technique, a reduction of 25%.

FIG. 5 is a detailed circuit diagram of the constant multiplier 116 of FIG. 1. In particular, FIG. 5 proposes a constant multiplier for the twiddle factor W_64 required in stage 3, in order to reduce hardware complexity. FIG. 6 is a diagram illustrating the eight regions used for the twiddle factor W_64 in one embodiment of the present invention.

As shown in FIG. 5, the constant multiplier 116 is composed of 6 multiplexers, 12 adders, 38 shifters, and a mapping block. In stage 3, the data values are separated into real and imaginary values, and the multiplication is performed according to the appropriate twiddle factor W_64.

The constant multiplier 116 can efficiently replace a complex Booth multiplier by using the CSD (canonical signed digit) and CSS (common sub-expression sharing) methods. The CSD representation needs only shift and addition operations and has no adjacent non-zero digits, so the amount of computation can be reduced. In addition, the CSS method can greatly reduce redundant operations by computing shared additions once in advance.

Therefore, when the constant multiplier 116 proposed in the embodiment of the present invention is used, the complex multiplication operation can be performed using only the shift and addition operations. In addition, in the embodiment of the present invention, by using the constant multiplier 116, the hardware complexity can be greatly reduced as compared with the conventional complex multiplier through optimization of the CSD and the CSS method. The final real and imaginary data values can be obtained through the mapping block.

The CSS method implements multiplication using additions and shifts, in which common patterns are identified and shared. To use the CSS method, the constant coefficients must first be represented in CSD form. To represent the twiddle factor W_64 operation in CSD form, only the eight twiddle factors corresponding to one eighth of the unit circle are considered, as shown in FIG. 6.

In stage 3, the twiddle factors distributed in region A of FIG. 6 are used. The eight twiddle factors in region A have 8 real and 8 imaginary coefficients, 16 in total. Because the real and imaginary values at the 45° position are equal (0.7071, 0.7071), a total of 15 distinct coefficients are used.
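
The count of 15 coefficients can be reproduced numerically. The sketch assumes that region A contains W_64^k for k = 1 to 8 (angles from 5.625° up to 45°), which is the range consistent with the 45° value being included; only coefficient magnitudes are listed, rounded to four decimals.

```python
import math

# Assumption: region A holds W_64^k for k = 1..8.
real_parts = [round(math.cos(2 * math.pi * k / 64), 4) for k in range(1, 9)]
imag_parts = [round(math.sin(2 * math.pi * k / 64), 4) for k in range(1, 9)]

# Merge both lists; a value shared by the real and imaginary sets counts once.
distinct = sorted(set(real_parts) | set(imag_parts))
```

The two lists hold 16 values in total, and `distinct` has 15 entries because cos 45° and sin 45° coincide at 0.7071.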

Among the fifteen coefficients used in stage 3, the eight coefficients corresponding to the real part are represented in 10-bit CSD form as shown in FIG. 7. In FIG. 7, N denotes -1; when a coefficient is represented in CSD form, the number of non-zero digits is smaller than in 2's complement form. To further reduce the number of addition operations, the CSS method is applied to the real part: the common patterns in FIG. 7 can be computed first to further reduce the computation.
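
A CSD encoding can be generated with the standard non-adjacent-form recoding. The sketch below is a generic illustration, not the coefficient table of FIG. 7: the 10 fractional bits, the rounding step, and the helper names `to_csd`, `csd_string`, and `csd_value` are assumptions made for the example.

```python
def to_csd(value, frac_bits=10):
    """Return CSD digits (most significant first) of a non-negative value.

    The value is first quantized to frac_bits fractional bits.
    Each digit is 1, 0 or -1, and no two adjacent digits are non-zero.
    """
    n = round(value * (1 << frac_bits))
    if n < 0:
        raise ValueError("this sketch handles non-negative coefficients only")
    digits = []
    while n:
        if n & 1:
            d = 2 - (n & 3)  # +1 when n % 4 == 1, -1 when n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits[::-1] or [0]


def csd_string(digits):
    """Render digits with 'N' for -1, following the notation used for FIG. 7."""
    return "".join("N" if d < 0 else str(d) for d in digits)


def csd_value(digits, frac_bits=10):
    """Convert CSD digits back to the quantized real value."""
    total = 0
    for d in digits:
        total = 2 * total + d
    return total / (1 << frac_bits)


digits_0_9808 = to_csd(0.9808)
nonzero_csd = sum(1 for d in digits_0_9808 if d != 0)
nonzero_binary = bin(round(0.9808 * 1024)).count("1")
```

For the coefficient 0.9808, `nonzero_csd` is smaller than `nonzero_binary`, so fewer add or subtract operations are needed than with a plain binary expansion. Each non-zero digit corresponds to one shifted add or subtract of the input.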

Among the fifteen coefficients used in stage 3, the imaginary-part coefficients are represented in 10-bit CSD form as shown in FIG. 8. The coefficient at the 45° position of the twiddle factor region of FIG. 6 is the same for both parts (0.7071, 0.7071); because it is already implemented in the real part of FIG. 7, it is not implemented again in the imaginary part of FIG. 8.

To verify the reduction in hardware complexity, the proposed structure was compared with a conventional multiplier through synthesis with Synopsys Design Compiler. The conventional complex Booth multiplier has a hardware area of 60,095 um^2, whereas the proposed constant multiplier has a hardware area of 28,501 um^2, a hardware reduction of about 53%. This confirms that the proposed constant multiplier, which uses only shift and add operations through the optimized CSD and CSS methods, reduces hardware complexity.
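
The quoted reduction follows directly from the two synthesized areas. A short arithmetic check, using only the figures given in the text:

```python
booth_area_um2 = 60095     # conventional complex Booth multiplier (from the text)
proposed_area_um2 = 28501  # proposed constant multiplier (from the text)

# Relative area saving of the proposed multiplier.
reduction = 1 - proposed_area_um2 / booth_area_um2
reduction_percent = round(reduction * 100, 1)
```

The result rounds to about 53%, in line with the stated figure.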

[Table 2]

Figure 112015072789733-pat00011

Table 2 compares the performance of the proposed structure (the embodiment) with conventional FFT structures (the comparative examples) in order to verify the effect of the proposed structure. The proposed structure achieves a throughput of 2.7 GSample/s at a frequency of 338 MHz and can operate at up to 500 MHz. This throughput is the highest among the embodiment and comparative examples 1, 2, and 3. Comparing only the normalized area, the embodiment shows a larger hardware area reduction than the comparative examples. In particular, the embodiment has a hardware size of about 21% of that of comparative example 2.
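
The throughput figure is consistent with eight parallel paths each accepting one sample per clock cycle. That one-sample-per-path-per-clock model is an assumption used for this check, and the function name is illustrative.

```python
def throughput_samples_per_s(parallel_paths, clock_hz):
    """Throughput assuming each path accepts one sample per clock cycle."""
    return parallel_paths * clock_hz


throughput = throughput_samples_per_s(8, 338e6)
throughput_gsps = throughput / 1e9
```

With 8 paths at 338 MHz this gives roughly 2.7 GSample/s, matching the value reported for the embodiment.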

The embodiment is compared in detail with each comparative example as follows. The embodiment targets the same 256-point transform as comparative example 1, but its throughput is higher and its normalized area is only about 41% of that of comparative example 1. Comparative example 2 uses the same MDC structure but targets a 512-point transform; because both have the same 8-parallel structure, the throughput is similar, but the embodiment requires only about 21% of the hardware area. Relative to comparative example 3, the normalized area is about 45%, and the throughput of the embodiment is higher.

Therefore, the proposed structure (the embodiment) improves data throughput through eight parallel paths while applying the MDC structure, which has higher data throughput than the MDF structure. In addition, the proposed structure reduces the number of complex multipliers required by using an efficient multiplication scheduling technique, and reduces hardware complexity by optimizing the constant multipliers. As Table 2 shows, the proposed structure has the highest data throughput, and its hardware complexity is improved by up to 79% compared with the conventional structures (comparative examples 1, 2, and 3).

While the present invention has been described in connection with what is presently considered to be practical exemplary embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined by the scope of the appended claims and equivalents thereof.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments and that various modifications are possible. Accordingly, the spirit of the present invention should be understood only in accordance with the following claims, and all equivalents or equivalent variations thereof are included in the scope of the present invention.

110: first processing module
112: first butterfly
114: delay element
116: constant multiplier
118: first commutator
120: second processing module
122: second butterfly
130: data reconstruction module
132: second commutator
134: complex multiplier

Claims (10)

1. A fast Fourier transform apparatus employing an 8-parallel MDC structure, comprising:
a first processing module having a plurality of stages, each of the stages including at least one of a plurality of first butterflies, a plurality of delay elements, a plurality of constant multipliers, and a plurality of first commutators;
a second processing module having a plurality of stages, fewer in number than the stages of the first processing module, each of the stages including a plurality of second butterflies; and
a data reconstruction module including a plurality of second commutators, disposed between the first processing module and the second processing module, that switch the output signals of the first processing module and deliver them as input signals to the second processing module, and a plurality of complex multipliers connected to the output terminals of the second commutators except for one of the output terminals,
wherein the first processing module applies the radix-2^6 algorithm, consisting of six stages (the first through sixth stages), so that the plurality of first butterflies have the same multiplication operations as the radix-64 algorithm and the same butterfly structure as the radix-2 algorithm, and
wherein the second processing module applies the radix-2^2 algorithm, consisting of two stages (the seventh and eighth stages), so that the plurality of second butterflies have the same multiplication operations as the radix-4 algorithm and the same butterfly structure as the radix-2 algorithm.
2. (Deleted)
3. The fast Fourier transform apparatus of claim 1,
wherein the radix-2^6 algorithm uses two twiddle factors -j computed in the first and fourth stages, two twiddle factors W_8 computed in the second and fifth stages, one twiddle factor W_64 computed in the third stage, and one twiddle factor W_256 computed in the sixth stage.
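
As an illustration only, the twiddle factors named in this claim can be evaluated numerically. The sketch below assumes the common DFT convention W_N = exp(-j*2*pi/N), which the claim itself does not spell out; the names `twiddle` and `STAGE_TWIDDLE_N` are hypothetical and are not taken from the patent.

```python
import cmath

def twiddle(n: int, k: int = 1) -> complex:
    """Return W_n^k = exp(-j*2*pi*k/n) (assumed convention, not from the patent)."""
    return cmath.exp(-2j * cmath.pi * k / n)

# Stage number -> N of the base twiddle factor W_N used in that stage,
# following the claim text. The factor -j is the same value as W_4.
STAGE_TWIDDLE_N = {1: 4, 2: 8, 3: 64, 4: 4, 5: 8, 6: 256}

if __name__ == "__main__":
    for stage, n in STAGE_TWIDDLE_N.items():
        print(f"stage {stage}: W_{n} = {twiddle(n):.4f}")
```

Under this convention, stages 1 and 4 need no real multiplier at all, stages 2 and 5 need only the single constant 1/sqrt(2), and only stages 3 and 6 involve many distinct coefficients.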
4. The fast Fourier transform apparatus of claim 3,
wherein the twiddle factor -j is computed by swapping the real-part data and the imaginary-part data and taking the 2's complement of the imaginary part, and
wherein the twiddle factor W_64 is computed using the plurality of constant multipliers to reduce hardware complexity.
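
The multiplier-free handling of -j follows from (a + jb)(-j) = b - ja: the new real part is the old imaginary part, and the new imaginary part is the negated old real part. A minimal sketch on fixed-width two's-complement words is shown below; the function name `mul_neg_j` and the 16-bit default word length are assumptions made for illustration.

```python
def mul_neg_j(re: int, im: int, bits: int = 16) -> tuple[int, int]:
    """Multiply (re + j*im) by -j without a multiplier.

    The real and imaginary parts are swapped, and the new imaginary part
    is the two's complement (negation) of the old real part.
    """
    mask = (1 << bits) - 1
    neg = (~re + 1) & mask            # two's-complement negation on `bits` bits
    if neg >= 1 << (bits - 1):        # reinterpret the bit pattern as signed
        neg -= 1 << bits
    # Note: the most negative value wraps to itself, as it would in hardware.
    return im, neg
```

For example, `mul_neg_j(3, 5)` corresponds to (3 + 5j)(-j) = 5 - 3j.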
5. The fast Fourier transform apparatus of claim 4,
wherein the plurality of constant multipliers perform the complex multiplication using only shift and addition operations, by applying the CSD (Canonical Signed Digit) and CSS (Common Sub-expression Sharing) methods to the data values separated into real and imaginary values according to the operation in the third stage.
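
To make the shift-and-add idea concrete, the following sketch recodes a non-negative integer coefficient into CSD digits and multiplies by it using only shifts, additions, and subtractions. It is a generic illustration, not the patented circuit: the function names are hypothetical, the coefficients are assumed to be already quantized to integers, and common sub-expression sharing (CSS) is not modeled.

```python
def to_csd(value: int) -> list[int]:
    """Return the CSD digits (-1, 0, +1) of a non-negative integer, LSB first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = []
    while value:
        if value & 1:
            digit = 2 - (value & 3)   # +1 if value % 4 == 1, -1 if value % 4 == 3
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits

def csd_multiply(x: int, digits: list[int]) -> int:
    """Multiply x by the constant encoded in `digits` using shifts and adds only."""
    acc = 0
    for shift, digit in enumerate(digits):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc

def complex_const_multiply(re: int, im: int, c: list[int], s: list[int]) -> tuple[int, int]:
    """Compute (re + j*im) * (c + j*s), with c and s given as CSD digit lists."""
    return (csd_multiply(re, c) - csd_multiply(im, s),
            csd_multiply(re, s) + csd_multiply(im, c))
```

Because CSD never places two non-zero digits next to each other, a coefficient such as 7 becomes 8 - 1 (one subtraction) instead of 4 + 2 + 1 (two additions), which is where the adder savings come from.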
6. The fast Fourier transform apparatus of claim 1,
wherein the plurality of first butterflies are arranged four in parallel in each of the six stages provided in the first processing module, each first butterfly outputting two output signals for two input signals, and
wherein the plurality of second commutators are arranged in parallel between the first processing module and the second processing module, and each receive, one from each butterfly, the output signals of the four first butterflies arranged in the sixth stage among the six stages.
7. The fast Fourier transform apparatus of claim 1,
wherein the second commutator outputs only data having a twiddle factor of 1 through one of its output terminals.
8. The fast Fourier transform apparatus of claim 1,
wherein the second commutator comprises a plurality of multiplexers, each selecting and outputting one of a plurality of input signals in accordance with a control signal that changes every clock, and outputs only data having a twiddle factor of 1 through the output terminal of one of the plurality of multiplexers.
9. The fast Fourier transform apparatus of claim 8,
wherein the plurality of multiplexers comprise:
a first multiplexer sequentially receiving the input signal from each of a first path connected to a first input of the second commutator, a second path connected to a second input of the second commutator, a third path connected to a third input of the second commutator, and a fourth path connected to a fourth input of the second commutator;
a second multiplexer disposed below the first multiplexer and sequentially receiving the input signal from each of the third path, the fourth path, the first path, and the second path;
a third multiplexer disposed below the second multiplexer and sequentially receiving the input signal from each of the fourth path, the first path, the second path, and the third path; and
a fourth multiplexer disposed at the lowermost end of the second commutator and sequentially receiving the input signal from each of the second path, the third path, the fourth path, and the first path.
10. The fast Fourier transform apparatus of claim 9,
wherein the first multiplexer outputs only the data having a twiddle factor of 1, and
wherein the plurality of complex multipliers are connected to the output terminals of the remaining second to fourth multiplexers, excluding the output terminal of the first multiplexer.
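
The multiplexer selection orders recited for the second commutator can be checked with a small behavioral model. The sketch below is illustrative only: the table `MUX_ORDER` and the function `commutate` are hypothetical names, the control signal is assumed to be a counter that advances once per clock and wraps every four clocks, and the order first, second, third, fourth path for the first multiplexer is an assumption (it is the only order for which the four multiplexers select four different paths on every clock). Delay elements are not modeled.

```python
# Multiplexer number -> order in which it selects the four input paths.
# Multiplexers 2 to 4 follow the claim text; multiplexer 1 is an assumption.
MUX_ORDER = {
    1: (1, 2, 3, 4),
    2: (3, 4, 1, 2),
    3: (4, 1, 2, 3),
    4: (2, 3, 4, 1),
}

def commutate(inputs_per_clock):
    """Return the four multiplexer outputs for each clock.

    `inputs_per_clock` holds one 4-tuple (path 1 .. path 4) per clock.
    """
    outputs = []
    for clock, paths in enumerate(inputs_per_clock):
        select = clock % 4            # assumed control signal: modulo-4 counter
        outputs.append(tuple(paths[MUX_ORDER[m][select] - 1] for m in (1, 2, 3, 4)))
    return outputs

if __name__ == "__main__":
    for row in commutate([("p1", "p2", "p3", "p4")] * 4):
        print(row)
```

In this model only the outputs of multiplexers 2 to 4 would feed complex multipliers, since the first multiplexer is the one said to carry the data whose twiddle factor is 1.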
KR1020150105917A 2015-07-27 2015-07-27 Fast fourier trasform processor using eight-parallel mdc architecture KR101652899B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150105917A KR101652899B1 (en) 2015-07-27 2015-07-27 Fast fourier trasform processor using eight-parallel mdc architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150105917A KR101652899B1 (en) 2015-07-27 2015-07-27 Fast fourier trasform processor using eight-parallel mdc architecture

Publications (1)

Publication Number Publication Date
KR101652899B1 true KR101652899B1 (en) 2016-09-01

Family

ID=56942707

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150105917A KR101652899B1 (en) 2015-07-27 2015-07-27 Fast fourier trasform processor using eight-parallel mdc architecture

Country Status (1)

Country Link
KR (1) KR101652899B1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101952547B1 (en) * 2018-11-23 2019-02-26 인하대학교 산학협력단 Method and Apparatus for Number Theoretic Transform based Polynomial Multiplier For Lattice based Cryptosystem

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120119939A (en) * 2011-04-22 2012-11-01 아주대학교산학협력단 Fast fourier transform processor using mrmdc architecture for ofdm system

Similar Documents

Publication Publication Date Title
Jung et al. New efficient FFT algorithm and pipeline implementation results for OFDM/DMT applications
KR101162649B1 (en) A method of and apparatus for implementing fast orthogonal transforms of variable size
US7856465B2 (en) Combined fast fourier transforms and matrix operations
CN109117188B (en) Multi-path mixed-basis FFT (fast Fourier transform) reconfigurable butterfly operator
US9735996B2 (en) Fully parallel fast fourier transformer
WO2007060879A1 (en) Fast fourier transformation circuit
Kim et al. High speed eight-parallel mixed-radix FFT processor for OFDM systems
Kang et al. Low complexity multi-point 4-channel FFT processor for IEEE 802.11 n MIMO-OFDM WLAN system
KR101652899B1 (en) Fast fourier trasform processor using eight-parallel mdc architecture
CN101937332A (en) Multiplier multiplexing method in base 2<4> algorithm-based multi-path FFT processor
Kim et al. Novel shared multiplier scheduling scheme for area-efficient FFT/IFFT processors
KR100720949B1 (en) Fast fourier transform processor in ofdm system and transform method thereof
Prasanna Kumar et al. Optimized pipelined fast Fourier transform using split and merge parallel processing units for OFDM
Jang et al. Area-efficient scheduling scheme based FFT processor for various OFDM systems
KR20140142927A (en) Mixed-radix pipelined fft processor and method using the same
Lee et al. Modified sdf architecture for mixed dif/dit fft
US8010588B2 (en) Optimized multi-mode DFT implementation
Nguyen et al. High-throughput low-complexity mixed-radix FFT processor using a dual-path shared complex constant multiplier
Chahardahcherik et al. Implementing FFT algorithms on FPGA
Mangaiyarkarasi et al. Performance analysis between Radix2, Radix4, mixed Radix4-2 and mixed Radix8-2 FFT
KR20120109214A (en) Fft processor and fft method for ofdm system
Xu et al. Split-Radix FFT pruning for the reduction of computational complexity in OFDM based Cognitive Radio system
KR102505022B1 (en) Fully parallel fast fourier transform device
Kirubanandasarathy et al. VLSI design of mixed radix FFT Processor for MIMO OFDM in wireless communications
Efnusheva et al. Efficiency comparison of DFT/IDFT algorithms by evaluating diverse hardware implementations, parallelization prospects and possible improvements

Legal Events

Date Code Title Description
E701 Decision to grant or registration of patent right
GRNT Written decision to grant
FPAY Annual fee payment

Payment date: 20190702

Year of fee payment: 4