200821865
IX. Description of the Invention:
[Technical Field of the Invention]
The present invention relates to a fast Fourier transform system, and more particularly to a pipelined Fourier transform system capable of processing dual input signals.
[Prior Art]
As demand for multimedia grows, data volumes have become very large, which has driven the rise of wireless and wireline broadband communication systems based on Orthogonal Frequency Division Multiplexing (OFDM). These systems carry audio, video, and other digital data, most of which must be processed or played back in real time, so real-time signal processing is an important issue.
The Discrete Fourier Transform (DFT) and the Inverse Discrete Fourier Transform (IDFT) are widely used in OFDM communication systems, and a Fast Fourier Transform (FFT) processor with a butterfly architecture greatly reduces the amount of computation and thereby increases computation speed.
The FFT groups the terms that recur in the Fourier transform and evaluates them with a mesh-like flow graph to speed up the computation; this flow graph is generally called a butterfly structure.
The discrete Fourier transform is computed as in equation (1):
$$X[k] = \sum_{n=0}^{N-1} x[n]\,W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \tag{1}$$
where $x[n]$ is the discrete input signal before the transform, $X[k]$ is the discrete output signal obtained after the Fourier transform, and $W_N^{nk} = e^{-j(2\pi/N)kn}$. The inverse Fourier transform is given by equation (2):
$$x[k] = \frac{1}{N}\sum_{n=0}^{N-1} X[n]\,W_N^{-nk}, \qquad k = 0, 1, \ldots, N-1 \tag{2}$$
where $X[n]$ is the discrete input signal before the inverse Fourier transform, $x[k]$ is the discrete output signal obtained after the inverse Fourier transform, and $W_N^{-nk} = e^{j(2\pi/N)kn}$.
The FFT works by decomposing equation (1). Common decompositions use radix 2, 4, 8, or 2/4/8, and a good decomposition reduces the number of complex multiplications.
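Equations (1) and (2) can be checked numerically with a direct, unoptimized evaluation. The sketch below is only an illustration: the function names `dft` and `idft` are hypothetical, and the 1/N normalization of the inverse transform is assumed to be the usual one.

```python
import cmath

def dft(x):
    """Direct evaluation of equation (1): X[k] = sum over n of x[n] * W_N^(n*k)."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)  # W_N = e^{-j(2*pi/N)}
    return [sum(x[n] * W ** (n * k) for n in range(N)) for k in range(N)]

def idft(X):
    """Direct evaluation of equation (2), assuming the usual 1/N normalization."""
    N = len(X)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(X[n] * W ** (-n * k) for n in range(N)) / N for k in range(N)]

x = [1, 2, 3, 4, 0, 0, 0, 0]
X = dft(x)
x_back = idft(X)

# Round-trip error between the original samples and idft(dft(x)).
max_err = max(abs(p - q) for p, q in zip(x, x_back))
```

Both functions cost on the order of N² complex multiplications, which is the cost the decompositions discussed here are meant to reduce.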
The radix-2 decomposition sets $k = 2r$ in equation (1), with $r = 0, 1, \ldots, N/2-1$, which gives the even-indexed outputs of equation (1) as equation (3); setting $k = 2r+1$ likewise gives the odd-indexed outputs as equation (4). Equations (3) and (4) together are called the radix-2 decomposition.
$$X[2r] = \sum_{n=0}^{N/2-1} \bigl(x[n] + x[n + N/2]\bigr)\,W_{N/2}^{nr} \tag{3}$$
$$X[2r+1] = \sum_{n=0}^{N/2-1} \bigl(x[n] - x[n + N/2]\bigr)\,W_N^{n}\,W_{N/2}^{nr} \tag{4}$$
Fig. 1 shows the result of applying the FFT, with the radix-2 decomposition, to a signal with eight data points {x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7]}. Note that, to keep the drawing clear, Fig. 1 does not label the reference numerals of the components. In this butterfly structure there are three stages ($\log_2 N = 3$), and each stage includes four butterfly units 11 and four multipliers 12.
As shown in Fig. 2, each butterfly unit 11 has a first input, a second input, a first output, and a second output. Two input data A and B enter through the first and second inputs respectively; the first output of the butterfly unit 11 then outputs A + B, and the second output outputs A + (−1) × B, that is, A − B.
Fig. 3 shows the result of passing the signal output by the butterfly unit 11 through a multiplier 12. The multiplier 12 has a coefficient $W_N^0$, so (A − B), after leaving the second output of the butterfly unit 11, is processed by the multiplier 12 to produce the output $(A - B) \times W_N^0$.
For convenience, Fig. 1 represents each multiplier 12 only by its coefficient, rather than drawing the complete multiplier symbol as in Fig. 3.
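The radix-2 decomposition of equations (3) and (4), built from the butterfly (A + B, A − B) followed by a coefficient multiplication, can be sketched in software as a recursive decimation-in-frequency FFT. This is a minimal sketch rather than the patent's circuit; the names `butterfly`, `fft_radix2_dif`, and `dft` are hypothetical, and the input length is assumed to be a power of 2.

```python
import cmath

def dft(x):
    """Reference: direct evaluation of equation (1)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def butterfly(a, b):
    """Butterfly unit: first output A + B, second output A - B."""
    return a + b, a - b

def fft_radix2_dif(x):
    """Radix-2 decimation-in-frequency FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return list(x)
    half = N // 2
    sums, diffs = [], []
    for n in range(half):
        s, d = butterfly(x[n], x[n + half])
        sums.append(s)
        # Multiplier: scale the difference by the coefficient W_N^n.
        diffs.append(d * cmath.exp(-2j * cmath.pi * n / N))
    out = [0] * N
    out[0::2] = fft_radix2_dif(sums)   # even outputs X[2r], equation (3)
    out[1::2] = fft_radix2_dif(diffs)  # odd outputs X[2r+1], equation (4)
    return out

x = [1, 2, 3, 4, 5, 6, 7, 8]
err = max(abs(p - q) for p, q in zip(fft_radix2_dif(x), dft(x)))
```

For eight points the recursion has three levels, matching the three stages of four butterflies each described for the eight-point flow graph.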
The radix-2/4/8 decomposition combines the advantages of radix 2, 4, and 8, and therefore requires fewer complex multiplications. Fig. 4 shows the result of applying the radix-2/4/8 FFT to a signal with eight data points. Substituting $r = 2s$ and $r = 2s+1$, with $s = 0, 1, \ldots, N/4-1$, into equation (4) gives equations (5) and (6):
$$X[4s+1] = \sum_{n=0}^{N/4-1} \bigl\{(x[n] - x[n+N/2]) - j\,(x[n+N/4] - x[n+3N/4])\bigr\}\,W_N^{n}\,W_{N/4}^{ns} \tag{5}$$
$$X[4s+3] = \sum_{n=0}^{N/4-1} \bigl\{(x[n] - x[n+N/2]) + j\,(x[n+N/4] - x[n+3N/4])\bigr\}\,W_N^{3n}\,W_{N/4}^{ns} \tag{6}$$
Equations (5) and (6) are called the radix-4 decomposition. Substituting $s = 2k$ and $s = 2k+1$, with $k = 0, 1, \ldots, N/8-1$, into equations (5) and (6) gives equations (7) to (10):
$$X[8k+1] = \sum_{n=0}^{N/8-1} \bigl\{(x[n] - x[n+N/2]) - j\,(x[n+N/4] - x[n+3N/4]) + W_N^{N/8}\bigl[(x[n+N/8] - x[n+5N/8]) - j\,(x[n+3N/8] - x[n+7N/8])\bigr]\bigr\}\,W_N^{n}\,W_{N/8}^{nk} \tag{7}$$
$$X[8k+3] = \sum_{n=0}^{N/8-1} \bigl\{(x[n] - x[n+N/2]) + j\,(x[n+N/4] - x[n+3N/4]) + W_N^{3N/8}\bigl[(x[n+N/8] - x[n+5N/8]) + j\,(x[n+3N/8] - x[n+7N/8])\bigr]\bigr\}\,W_N^{3n}\,W_{N/8}^{nk} \tag{8}$$
$$X[8k+5] = \sum_{n=0}^{N/8-1} \bigl\{(x[n] - x[n+N/2]) - j\,(x[n+N/4] - x[n+3N/4]) - W_N^{N/8}\bigl[(x[n+N/8] - x[n+5N/8]) - j\,(x[n+3N/8] - x[n+7N/8])\bigr]\bigr\}\,W_N^{5n}\,W_{N/8}^{nk} \tag{9}$$
$$X[8k+7] = \sum_{n=0}^{N/8-1} \bigl\{(x[n] - x[n+N/2]) + j\,(x[n+N/4] - x[n+3N/4]) - W_N^{3N/8}\bigl[(x[n+N/8] - x[n+5N/8]) + j\,(x[n+3N/8] - x[n+7N/8])\bigr]\bigr\}\,W_N^{7n}\,W_{N/8}^{nk} \tag{10}$$
Equations (7) to (10) are called the radix-8 decomposition, and equations (3) to (10) together are called the radix-2/4/8 decomposition.
The FFT is an algorithm widely used in digital signal processing, and continued advances in semiconductor process technology have greatly improved the performance of FFT processors.
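As a sanity check on the radix-4 step, equations (5) and (6) can be evaluated directly and compared with a reference DFT. In the sketch below the twiddle factors $W_N^{n} W_{N/4}^{ns}$ and $W_N^{3n} W_{N/4}^{ns}$ are the ones obtained by substituting r = 2s and r = 2s+1 into equation (4); the helper names `W`, `dft`, and `radix4_odd_outputs` are hypothetical, and N is assumed to be a multiple of 4.

```python
import cmath

def W(N, e):
    """Twiddle factor W_N^e = e^{-j(2*pi/N)*e}."""
    return cmath.exp(-2j * cmath.pi * e / N)

def dft(x):
    """Reference: direct evaluation of equation (1)."""
    N = len(x)
    return [sum(x[n] * W(N, n * k) for n in range(N)) for k in range(N)]

def radix4_odd_outputs(x):
    """Return ([X[4s+1]], [X[4s+3]]) for s = 0..N/4-1 via equations (5) and (6)."""
    N = len(x)
    q = N // 4
    X1, X3 = [], []
    for s in range(q):
        acc1 = acc3 = 0
        for n in range(q):
            d0 = x[n] - x[n + 2 * q]      # x[n] - x[n + N/2]
            d1 = x[n + q] - x[n + 3 * q]  # x[n + N/4] - x[n + 3N/4]
            acc1 += (d0 - 1j * d1) * W(N, n) * W(q, n * s)      # equation (5)
            acc3 += (d0 + 1j * d1) * W(N, 3 * n) * W(q, n * s)  # equation (6)
        X1.append(acc1)
        X3.append(acc3)
    return X1, X3

x = [complex(n, (n * n) % 5) for n in range(16)]
ref = dft(x)
X1, X3 = radix4_odd_outputs(x)
err1 = max(abs(X1[s] - ref[4 * s + 1]) for s in range(4))
err3 = max(abs(X3[s] - ref[4 * s + 3]) for s in range(4))
```

The multiplications by ±j inside the braces are trivial (a swap of real and imaginary parts with a sign change), which is where the saving in complex multiplications comes from.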
As with most digital signal processing algorithms, the FFT frequently has to access data in memory. An FFT requires $O(\log_r N)$ stages, where N is the length of the FFT and r is its radix, and each stage must read and write all N data.
In general, circuits that perform the FFT in real time on very-large-scale integrated circuits (VLSI) fall into two broad classes: the single processing element (PE) architecture and the pipelined architecture.
A single-PE FFT circuit uses one processing element to perform the FFT and consists mainly of four parts: a butterfly unit, a buffer memory, a control unit, and a read-only memory (ROM) that stores the $W_N$ coefficients. Because data access is a bottleneck, its overall speed cannot match that of a pipelined architecture, but it consumes little area.
A pipelined circuit can compute in real time with the least memory, but the number of processing elements (PEs) it requires is proportional to $\log_r N$; the larger N is, the more processing elements are needed.
An FFT processor built on a pipelined architecture has the main advantages of a highly regular hardware structure, easily modified modules, high processing speed, and direct connections. Pipelined FFT processors in turn fall into two classes of circuit design: the Multi-Path Delay Commutator (MDC) and the Single-Path Delay Feedback (SDF).
In terms of data scheduling, conventional SDF and MDC architectures feed one datum per cycle to a processing element.
In other words, these pipelined architectures are single-input: each datum is fed individually into a register and delayed for some time before computation begins, so overall efficiency is low.
[Summary of the Invention]
Accordingly, the main object of the invention is to provide a Fourier transform system whose data scheduling differs from that of conventional architectures: two corresponding data are input to the processing element at the same time, so that the processing element completes one butterfly operation in each working cycle.
The invention is a fast Fourier transform system, applied to a Fourier transform having a plurality of stages, for computing the signals of N data points, where N is a positive integer greater than or equal to 8 and is a power of 8. The system includes: a plurality of multipliers, each of which has a coefficient, receives a signal, and multiplies that signal by the coefficient to produce an output signal; and $\log_2 N$ stages of operation circuits, in which the operation module of one stage includes:
a butterfly unit having a first input, a second input, a first output, and a second output, in which two different signals are applied to the two inputs respectively, the sum of the two signals is output from the first output, and their difference is output from the second output;
two multiplexers, namely a first multiplexer and a second multiplexer, each of which receives two input signals and outputs one of them, producing the two output signals of that stage;
a control unit, electrically connected to the multiplexers, which controls the switching of the multiplexers so that they output the correct signals; and
three registers, namely a first register, a second register, and a third register, all of which are first-in first-out (FIFO) registers.
The signal from the first output of the butterfly unit passes through the first register to the first multiplexer and is also sent directly to the second multiplexer. The signal from the second output of the butterfly unit passes through the second register to the second multiplexer, and the signal leaving the second register passes through the third register to the first multiplexer.
The coefficients of the multipliers are those of the radix-2/4/8 fast Fourier transform.
When data are input, the invention feeds two corresponding data to the processing element at the same time, so that the processing element completes one butterfly operation in each working cycle. The output data are buffered in FIFO registers and, following the signal flow graph, two data are output together at the appropriate time to the processing element of the next stage. This greatly improves the working efficiency of the processing elements and brings overall utilization to 100%.
Because the required number of data points varies, the architecture can implement FFTs whose size is a power of 8 using one or more radix-2/4/8 groups, and so easily accommodates the different sizes required by high-speed communication systems. The fast Fourier transform system of the invention can therefore meet the increasingly high transmission rates demanded by systems now under development and by future systems.
[Embodiments]
The details and technical description of the invention are further explained below by way of embodiments; it should be understood that these embodiments are illustrative only and should not be construed as limiting the practice of the invention.
Referring to Figs. 5 to 7, these show the architecture of the main operation module of an embodiment of the invention, a schematic of the first three stages of operation modules, and the corresponding circuit diagram. The invention is applied to a Fourier transform having a plurality of stages, for computing the signals of N data points, where N is a positive integer greater than or equal to 8 and is a power of 8. The system includes:
A plurality of multipliers 200. Each multiplier 200 has a coefficient, receives a signal, and multiplies that signal by the coefficient to produce an output signal; the coefficients of the multipliers are those of the radix-2/4/8 fast Fourier transform (as shown in Figs. 6 and 7). The system also includes $\log_2 N$ stages of operation circuits, in which the operation module 100 of one stage includes:
A butterfly unit 110, which has a first input 111, a second input 112, a first output 113, and a second output 114. Two different signals are applied to the inputs 111 and 112 respectively; the sum of the two signals is output from the first output 113 and their difference from the second output 114.
Two multiplexers 150 and 160, namely a first multiplexer 150 and a second multiplexer 160. Each multiplexer 150, 160 receives two input signals and outputs one of them, producing the two output signals of that stage.
A control unit 170, which is electrically connected to the multiplexers 150 and 160 and controls their switching so that they output the correct signals; and three registers 120, 130, and 140, namely a first register 120, a second register 130, and a third register 140, all of which are first-in first-out (FIFO) registers.
The signal from the first output 113 of the butterfly unit 110 passes through the first register 120 to the first multiplexer 150, and is also sent directly to the second multiplexer 160. The signal from the second output 114 of the butterfly unit 110 passes through the second register 130 to the second multiplexer 160, and the signal leaving the second register 130 passes through the third register 140 to the first multiplexer 150. The multiplexers 150 and 160 thus each receive two input signals and, under the control of the control unit 170, each output one of them, producing the two output signals of that stage.
The overall operation of the fast Fourier transform system of the invention is described below using a 64-point FFT processing circuit as an example. Fig. 8 is the overall block diagram of the 64-point FFT circuit, and Figs. 9a to 9c are its timing diagrams. The circuit operates as follows.
Step 1: The first-stage operation module 100 receives two data at a time. (For convenience, the operation of the invention is explained with reference to the FFT signal flow graph of Fig. 10.) To process the signal, x[0] and x[32] must be input together to the first-stage butterfly unit 110 for addition and subtraction (as shown in Fig. 10). The inputs IN1 and IN2 of the operation module 100 (as shown in Fig. 9a) receive data in the following order: IN1 receives x[0], x[1], x[2], ..., x[N/2−1], while IN2 receives x[N/2], x[N/2+1], x[N/2+2], ..., x[N−1]. After the butterfly unit 110, the data are a[k] = x[k] + x[N/2+k] and b[k] = x[k] − x[N/2+k], where k = 0, 1, 2, ..., N/2−1.
Step 2: After the first-stage operation module 100 has processed the data, they are passed to the second-stage operation module 100. Because of the data ordering required at the input of the second stage, however, the results output by the first stage are not directly the data the second stage needs (as shown in Fig. 10). The second stage needs a[k] and a[N/4+k], whereas at this point the first stage outputs only a[k] and b[k]. The outputs of the first stage are therefore buffered in its registers 120 and 130 (FIFO_16): register 120 stores a[k] and register 130 stores b[k]. Registers 120 and 130 delay a[k] and b[k] by 16 cycles. When a[k] and b[k] leave the registers after this delay, the first stage has finished computing a[N/4+k], so the data from the registers and the current result of the first stage can be sent to the second-stage operation module 100 together, satisfying the required data ordering.
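The behaviour of one stage module (butterfly unit 110, FIFO registers 120, 130, and 140, multiplexers 150 and 160) can be sketched as a small cycle-by-cycle simulation for a single frame. This is a behavioural sketch under assumptions, not the patent's circuit: the function name `simulate_first_stage` is hypothetical, the multiplexer schedule is derived from the cycle count as a stand-in for control unit 170, and register 120 is simply allowed to shift in values that are never selected.

```python
from collections import deque

def simulate_first_stage(x):
    """Simulate one stage for a single frame of N = len(x) samples.

    Returns the (out_1, out_2) pairs presented to the next stage,
    in the order they are produced.
    """
    N = len(x)
    half, depth = N // 2, N // 4      # input cycles, FIFO length (16 for N = 64)
    fifo_120 = deque([None] * depth)  # delays a[k] from output 113
    fifo_130 = deque([None] * depth)  # delays b[k] from output 114
    fifo_140 = deque([None] * depth)  # delays the output of register 130 again
    pairs = []
    for cycle in range(half + depth):
        if cycle < half:
            # Dual input: IN1 = x[cycle], IN2 = x[cycle + N/2].
            a = x[cycle] + x[cycle + half]
            b = x[cycle] - x[cycle + half]
        else:
            a = b = None              # frame finished, registers drain
        # Each FIFO releases the value written `depth` cycles earlier.
        from_120 = fifo_120.popleft(); fifo_120.append(a)
        from_130 = fifo_130.popleft(); fifo_130.append(b)
        from_140 = fifo_140.popleft(); fifo_140.append(from_130)
        if depth <= cycle < half:
            # Mux 150 selects register 120, mux 160 the direct path.
            pairs.append((from_120, a))
        elif cycle >= half:
            # Mux 150 selects register 140, mux 160 selects register 130.
            pairs.append((from_140, from_130))
    return pairs

x = [n * n for n in range(64)]
pairs = simulate_first_stage(x)
a_ref = [x[k] + x[k + 32] for k in range(32)]
b_ref = [x[k] - x[k + 32] for k in range(32)]
```

For N = 64 the first 16 pairs are (a[k], a[k+16]) and the next 16 are (b[k], b[k+16]), so every pair handed to the next stage is already separated by N/4, which is the ordering the second stage requires.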
Step 3: The first-stage operation module 100 produces both a[k] and b[k]. To satisfy the data ordering required by the second-stage operation module 100, a[k] and a[N/4+k] must be sent to the second stage first. Since b[k] is completed at the same time as a[k], b[k] has to wait longer in the second register 130; only after the a[k] data have been handled can b[k] be passed from the first stage to the second stage. As shown in Fig. 8, each b[k] is sent directly to the second register 130 (FIFO_16) as soon as it is completed. The second register 130 of the first stage has a length of 16, so when b[16] is completed the second register 130 is full and must start releasing data. At that time, however, the first-stage operation module 100 is still sending a[k] data to the second-stage operation module 100, and no channel is available for outputting b[k]. The data released by the second register 130 must therefore be passed on to the third register 140 (FIFO_16) behind it, which also has a length of 16. At this point b[16] enters the second register 130 while the second register 130 sends b[0] to the third register 140. Once the second stage has finished with the a[k] data, it begins to accept the b[k] data. By then b[0] has been delayed a further 16 cycles by the third register 140 and is sent through the first multiplexer 150 to the first output out_1 and on to the second-stage operation module 100, while b[16], leaving the second register 130, is sent through the second multiplexer 160 to the second output out_2. The data b[0] on out_1 and b[16] on out_2 satisfy the data ordering required by the second-stage operation module 100.
Step 4: The second-stage and third-stage operation modules 100 work on the same principle as steps 1 to 3; the main difference is the length of the registers 120, 130, and 140. For a 64-point FFT, the three registers 120, 130, and 140 of the first-stage operation module 100 all have length 16 (FIFO_16), those of the second stage have length 8 (FIFO_8), those of the third stage have length 4 (FIFO_4), those of the fourth stage have length 2 (FIFO_2), and those of the fifth stage have length 1 (FIFO_1). The last, sixth-stage operation module 100 outputs the final result directly and therefore needs no buffering.
Step 5: Between the third-stage and fourth-stage operation modules 100, the data must pass through two complex multipliers to be multiplied by the twiddle factors, because the third-stage operation module outputs a[k] and a[16+k], or b[k] and b[16+k], at the same time. Both data must therefore be multiplied by their twiddle factors simultaneously before being input to the fourth-stage operation module 100.
Step 6: The fourth-stage to sixth-stage operation modules 100 work on the same principle as steps 1 to 3; the only difference is the length of the registers.
In the new structure proposed by the invention, which uses a pair of multiplexers together with first-in first-out (FIFO) registers, the input is a dual input: every stage receives its two corresponding data at the same time. Combined with the parallel FIFO register structure in each stage, this brings the utilization of the butterfly units and complex multipliers to 100%, and the amount of computation performed is twice that of an FFT circuit with a conventional pipelined architecture.
The structure of the invention is built from the stage structure described above to form a radix-2/4/8 circuit (using the radix-2/4/8 algorithm). The whole radix-2/4/8 circuit is then treated as one group, and the required FFT is assembled from several such groups. With different FIFO sizes, an FFT of any required power-of-8 size can be assembled, giving a variable-length Fourier transform circuit suitable for different OFDM communication systems.
The above is merely a preferred embodiment of the invention and is not intended to limit the scope of its practice. All equivalent changes and modifications made according to the claims of the invention are covered by the scope of the patent.
[Brief Description of the Drawings]
Fig. 1 is a signal flow graph of the radix-2 algorithm for a signal with eight data points.
Fig. 2 is a schematic of the architecture of a butterfly unit.
Fig. 3 is a schematic of the result of passing the signal output by the butterfly unit through a multiplier.
Fig. 4 is a signal flow graph of the radix-2/4/8 algorithm for a signal with eight data points.
Fig. 5 is a schematic of the architecture of the main operation module of an embodiment of the invention.
Fig. 6 is a schematic of the first three stages of operation modules of the FFT circuit of an embodiment of the invention.
Fig. 7 is a circuit diagram of the first three stages of operation modules of the FFT circuit of an embodiment of the invention.
Fig. 8 is an overall block diagram of the FFT processing circuit of an embodiment of the invention (N = 64 points as an example).
Figs. 9a to 9c are timing diagrams of the FFT processing circuit of an embodiment of the invention (N = 64 points as an example).
Fig. 10 is a signal flow graph of the prior-art radix-2/4/8 algorithm (N = 64 points as an example).
[Description of Main Reference Numerals]
11: butterfly unit
12: multiplier
100: operation module
110: butterfly unit
111: first input
112: second input
113: first output
114: second output
120: first register
130: second register
140: third register
150: first multiplexer
160: second multiplexer
170: control unit
200: multiplier