TWI313825B

Info

Publication number: TWI313825B
Application number: TW95141594A
Authority: TW
Inventors: Ming-Qian Gao; Ying-Ji Chen; Yin Tsung Hwang; Ming Hwa Sheu
Original assignee: Univ Nat Yunlin Sci & Tech
Priority date: 2006-11-10
Filing date: 2006-11-10
Publication date: 2009-08-21
Also published as: TW200821865A

Description

IX. Description of the Invention

[Technical Field of the Invention]

The present invention relates to a fast Fourier transform system, and more particularly to a pipelined Fourier transform system that can process dual input signals.

[Prior Art]

As demand for multimedia grows, data volumes have become very large, which has driven the rise of wireless and wired broadband communication systems based on orthogonal frequency division multiplexing (OFDM). These systems carry audio, video, and other digital information. Most of this multimedia data must be processed or played back in real time, so real-time signal processing is an important issue.

The discrete Fourier transform (DFT) and the inverse discrete Fourier transform (IDFT) are widely used in OFDM communication systems, and a fast Fourier transform (FFT) processor with a butterfly architecture greatly reduces the amount of computation and so raises computation speed. The FFT groups the terms that recur in the Fourier transform and evaluates them with a mesh-like flow graph; this flow graph is generally called a butterfly structure.

The discrete Fourier transform is computed as in Eq. (1):

\[ X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{kn}, \qquad k = 0, 1, \ldots, N-1 \tag{1} \]

where x[n] is the discrete input signal before the transform, X[k] is the discrete output signal obtained after the Fourier transform, and $W_N^{kn} = e^{-j(2\pi/N)kn}$.

The inverse Fourier transform is given by Eq. (2):

\[ x[k] = \frac{1}{N} \sum_{n=0}^{N-1} X[n]\, W_N^{-kn}, \qquad k = 0, 1, \ldots, N-1 \tag{2} \]

where X[n] is the discrete input signal before the inverse Fourier transform, x[k] is the discrete output signal obtained after the inverse Fourier transform, and $W_N^{-kn} = e^{j(2\pi/N)kn}$.

The fast Fourier transform (FFT) decomposes Eq. (1). Common decompositions use radix 2, 4, 8, or 2/4/8, and a good decomposition reduces the number of complex multiplications. The radix-2 decomposition sets k = 2r in Eq. (1), with r = 0, 1, ..., N/2-1, which gives the even-indexed components in Eq. (3); setting k = 2r+1 gives the odd-indexed components in Eq. (4). Eqs. (3) and (4) together are called the radix-2 decomposition.

\[ X[2r] = \sum_{n=0}^{N/2-1} \bigl( x[n] + x[n + N/2] \bigr)\, W_{N/2}^{rn} \tag{3} \]

\[ X[2r+1] = \sum_{n=0}^{N/2-1} \bigl( x[n] - x[n + N/2] \bigr)\, W_N^{n}\, W_{N/2}^{rn} \tag{4} \]

Fig. 1 shows the result of applying the FFT, with the radix-2 decomposition, to a signal with eight data points {x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7]}. Note that, to keep the drawing clear, the reference numerals of the components are not marked in Fig. 1.

The butterfly architecture of Fig. 1 has three stages of computation (log2(N) = 3), and each stage includes four butterfly units 11 and four multipliers 12. A butterfly unit 11 is shown in Fig. 2. Each butterfly unit 11 has a first input, a second input, a first output, and a second output. When two input data A and B enter at the first and second inputs respectively, the first output of the butterfly unit 11 outputs A+B and the second output outputs A-B.

Fig. 3 shows the signal output by the butterfly unit 11 after it passes through a multiplier 12. The multiplier 12 has a coefficient; (A-B), output from the second output of the butterfly unit 11, is processed by the multiplier 12 to produce the product of (A-B) and that coefficient. For convenience, Fig. 1 represents each multiplier 12 only by its coefficient, rather than drawing the multiplier 12 in full as in Fig. 3.

The radix-2/4/8 decomposition combines the advantages of radix 2, 4, and 8 and therefore needs fewer complex multiplications. Fig. 4 shows the result of applying a radix-2/4/8 FFT to a signal with eight data points. Substituting r = 2s and r = 2s+1, with s = 0, 1, ..., N/4-1, into Eq. (4) gives Eqs. (5) and (6):

\[ X[4s+1] = \sum_{n=0}^{N/4-1} \Bigl\{ \bigl( x[n] - x[n + N/2] \bigr) - j \bigl( x[n + N/4] - x[n + 3N/4] \bigr) \Bigr\}\, W_N^{n}\, W_{N/4}^{sn} \tag{5} \]

\[ X[4s+3] = \sum_{n=0}^{N/4-1} \Bigl\{ \bigl( x[n] - x[n + N/2] \bigr) + j \bigl( x[n + N/4] - x[n + 3N/4] \bigr) \Bigr\}\, W_N^{3n}\, W_{N/4}^{sn} \tag{6} \]

Eqs. (5) and (6) are called the radix-4 decomposition. Substituting s = 2k and s = 2k+1, with k = 0, 1, ..., N/8-1, into Eqs. (5) and (6) gives Eqs. (7) to (10):

\[ X[8k+1] = \sum_{n=0}^{N/8-1} \Bigl\{ \bigl( x[n] - x[n + N/2] \bigr) - j \bigl( x[n + N/4] - x[n + 3N/4] \bigr) + W_N^{N/8} \bigl[ \bigl( x[n + N/8] - x[n + 5N/8] \bigr) - j \bigl( x[n + 3N/8] - x[n + 7N/8] \bigr) \bigr] \Bigr\}\, W_N^{n}\, W_{N/8}^{kn} \tag{7} \]

\[ X[8k+3] = \sum_{n=0}^{N/8-1} \Bigl\{ \bigl( x[n] - x[n + N/2] \bigr) + j \bigl( x[n + N/4] - x[n + 3N/4] \bigr) + W_N^{3N/8} \bigl[ \bigl( x[n + N/8] - x[n + 5N/8] \bigr) + j \bigl( x[n + 3N/8] - x[n + 7N/8] \bigr) \bigr] \Bigr\}\, W_N^{3n}\, W_{N/8}^{kn} \tag{8} \]

\[ X[8k+5] = \sum_{n=0}^{N/8-1} \Bigl\{ \bigl( x[n] - x[n + N/2] \bigr) - j \bigl( x[n + N/4] - x[n + 3N/4] \bigr) - W_N^{N/8} \bigl[ \bigl( x[n + N/8] - x[n + 5N/8] \bigr) - j \bigl( x[n + 3N/8] - x[n + 7N/8] \bigr) \bigr] \Bigr\}\, W_N^{5n}\, W_{N/8}^{kn} \tag{9} \]

\[ X[8k+7] = \sum_{n=0}^{N/8-1} \Bigl\{ \bigl( x[n] - x[n + N/2] \bigr) + j \bigl( x[n + N/4] - x[n + 3N/4] \bigr) - W_N^{3N/8} \bigl[ \bigl( x[n + N/8] - x[n + 5N/8] \bigr) + j \bigl( x[n + 3N/8] - x[n + 7N/8] \bigr) \bigr] \Bigr\}\, W_N^{7n}\, W_{N/8}^{kn} \tag{10} \]

Eqs. (7) to (10) are called the radix-8 decomposition, and Eqs. (3) to (10) together are called the radix-2/4/8 decomposition.

The fast Fourier transform is an algorithm widely used in digital signal processing, and steady progress in semiconductor process technology has greatly improved the performance of FFT processors. As with most digital signal processing algorithms, the FFT frequently has to access data in memory. An FFT requires $O(\log_r N)$ stages, where N is the length of the transform and r is its radix, and every stage must read and write all N data.

In general, circuits that implement a real-time FFT in very-large-scale integration (VLSI) fall into two broad classes: the single processing element (PE) architecture and the pipelined architecture.

The single-PE FFT architecture uses one processing element to carry out the whole FFT. It consists mainly of four parts: a butterfly unit, a buffer memory, a control unit, and a read-only memory (ROM) that stores the $W_N$ coefficients.
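As a numerical check on the decomposition, the following Python sketch evaluates the four radix-8 outputs of Eqs. (7) to (10) and compares them with a direct evaluation of Eq. (1). It is an illustration only and not part of the patent. The function names `twiddle`, `dft`, `radix8_odd`, and `max_error` are hypothetical, and the sketch assumes the usual convention $W_N = e^{-j2\pi/N}$ with the inner coefficients $W_N^{N/8}$ and $W_N^{3N/8}$.

```python
import cmath

def twiddle(N, e):
    """W_N^e, assuming the convention W_N = exp(-j*2*pi/N)."""
    return cmath.exp(-2j * cmath.pi * e / N)

def dft(x):
    """Direct evaluation of Eq. (1)."""
    N = len(x)
    return [sum(x[n] * twiddle(N, k * n) for n in range(N)) for k in range(N)]

def radix8_odd(x, k):
    """Return {m: X[8k+m]} for m = 1, 3, 5, 7 using Eqs. (7) to (10)."""
    N = len(x)
    q = N // 8                                   # N/8
    w1, w3 = twiddle(N, q), twiddle(N, 3 * q)    # W_N^(N/8), W_N^(3N/8)
    acc = {1: 0j, 3: 0j, 5: 0j, 7: 0j}
    for n in range(q):
        p = x[n] - x[n + 4 * q]            # x[n]      - x[n+N/2]
        r = x[n + 2 * q] - x[n + 6 * q]    # x[n+N/4]  - x[n+3N/4]
        u = x[n + q] - x[n + 5 * q]        # x[n+N/8]  - x[n+5N/8]
        v = x[n + 3 * q] - x[n + 7 * q]    # x[n+3N/8] - x[n+7N/8]
        braces = {
            1: (p - 1j * r) + w1 * (u - 1j * v),   # Eq. (7)
            3: (p + 1j * r) + w3 * (u + 1j * v),   # Eq. (8)
            5: (p - 1j * r) - w1 * (u - 1j * v),   # Eq. (9)
            7: (p + 1j * r) - w3 * (u + 1j * v),   # Eq. (10)
        }
        for m, value in braces.items():
            # Outer factors W_N^(m*n) and W_(N/8)^(k*n)
            acc[m] += value * twiddle(N, m * n) * twiddle(q, k * n)
    return acc

def max_error(x):
    """Largest deviation between Eqs. (7)-(10) and the direct DFT."""
    X = dft(x)
    return max(abs(radix8_odd(x, k)[m] - X[8 * k + m])
               for k in range(len(x) // 8) for m in (1, 3, 5, 7))

x = [complex(n % 5, (3 * n) % 7) for n in range(64)]   # arbitrary test signal
print(max_error(x))   # should be at rounding-error level
```

The four expressions in `braces` share the same four differences `p`, `r`, `u`, and `v`, which is where the saving in complex multiplications comes from: only `w1` and `w3` multiply inside the braces, and the full twiddle factor is applied once per output.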
Because data access is a bottleneck, the overall speed of the single-PE architecture cannot match that of a pipelined architecture, but it has the merit of low area consumption.

A pipelined circuit architecture achieves real-time operation with the smallest amount of memory, but the number of processing elements (PEs) it needs is proportional to $\log_r N$; that is, the larger N is, the more processing elements are required. An FFT processor built on a pipelined architecture has several main advantages in hardware: high regularity, easily modified modules, fast processing, and direct connections. Pipelined FFT processors fall into two classes of circuit design: the multi-path delay commutator (MDC) and the single-path delay feedback (SDF).

In terms of data scheduling, the conventional SDF and MDC architectures feed one datum per cycle to a processing element. That is, these pipelined architectures are single-input: each datum is fed singly into a register and delayed for some time before computation begins, so overall efficiency is low.

[Summary of the Invention]

Accordingly, the main object of the present invention is to provide a Fourier transform system whose data scheduling differs from that of the conventional architectures: two corresponding data are input to the processing element simultaneously, so that the processing element completes one butterfly operation in every working cycle.
The present invention is a fast Fourier transform system applied to a multi-stage Fourier transform that processes a signal of N data points, where N is a positive integer greater than or equal to 8 and is a power of 8. The system includes a plurality of multipliers, each of which has a coefficient, receives a signal, and multiplies the signal by the coefficient to produce an output signal; and log2(N) stages of computation circuits. The computation module of one stage includes:

a butterfly unit having a first input, a second input, a first output, and a second output, wherein two different signals are input to the two inputs respectively, the sum of the two signals is output from the first output, and the difference of the two signals is output from the second output;

two multiplexers, namely a first multiplexer and a second multiplexer, each of which receives two input signals and outputs one of them, producing the two output signals of that stage;

a control unit electrically connected to the multiplexers, which controls the switching of the multiplexers so that the correct signals are output; and

three registers, namely a first register, a second register, and a third register, all of which are first-in first-out (FIFO) registers.

The signal from the first output of the butterfly unit passes through the first register to the first multiplexer and also passes directly to the second multiplexer. The signal from the second output of the butterfly unit passes through the second register to the second multiplexer, and the signal leaving the second register passes through the third register to the first multiplexer.

The coefficients of the multipliers are those of a radix-2/4/8 fast Fourier transform.
When data are input, the present invention feeds two corresponding data to a processing element simultaneously, so that the processing element completes one butterfly operation per working cycle. The outputs are buffered in the first-in first-out (FIFO) registers and then, following the signal flow graph, two data are output together to the next-stage processing element at the corresponding time. This greatly improves the working efficiency of the processing elements and brings overall utilization to 100%. For whatever number of FFT points is required, the architecture can implement an FFT whose length is a power of 8 from multiple radix-2/4/8 groups, so it readily meets the differing point counts of various communication systems. The fast Fourier transform system of the present invention can therefore satisfy the ever higher transmission rates required by systems now being standardized and by future systems.

[Embodiments]

The details and technical content of the present invention are further described below by way of embodiments. It should be understood that these embodiments are illustrative only and should not be construed as limiting the practice of the invention.

Figs. 5 to 7 show, for an embodiment of the invention, the architecture of the main computation module, a schematic of the first three stages of computation modules, and their circuit diagram. The fast Fourier transform system of the present invention is applied to a multi-stage Fourier transform that processes a signal of N data points, where N is a positive integer greater than or equal to 8 and is a power of 8. The system includes:

a plurality of multipliers 200, each of which has a coefficient, receives a signal, and multiplies the signal by the coefficient to produce an output signal, the coefficients of the multipliers 200 being those of a radix-2/4/8 fast Fourier transform (as shown in Figs. 6 and 7); and

log2(N) stages of computation circuits, in which the computation module 100 of one stage includes:

a butterfly unit 110 having a first input 111, a second input 112, a first output 113, and a second output 114, wherein two different signals are input to the inputs 111 and 112 respectively, the sum of the two signals is output from the first output 113, and the difference of the two signals is output from the second output 114;

two multiplexers 150 and 160, namely a first multiplexer 150 and a second multiplexer 160, each of which receives two input signals and outputs one of them, producing the two output signals of that stage;

a control unit 170 electrically connected to the multiplexers 150 and 160, which controls the switching of the multiplexers 150 and 160 so that they output the correct signals; and

three registers 120, 130, and 140, namely a first register 120, a second register 130, and a third register 140, all of which are first-in first-out registers.

The signal from the first output 113 of the butterfly unit 110 passes through the first register 120 to the first multiplexer 150, and the same signal also passes directly to the second multiplexer 160. The signal from the second output 114 of the butterfly unit 110 passes through the second register 130 to the second multiplexer 160, and the signal leaving the second register 130 passes through the third register 140 to the first multiplexer 150. Each of the two multiplexers 150 and 160 thus receives two input signals and, under the control of the control unit 170, outputs one of them, producing the two output signals of that stage.
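The routing just described (two butterfly outputs, three FIFO registers, two multiplexers) can be illustrated with a small behavioral model. The Python sketch below is an assumption-laden illustration, not the patented circuit: the class `StageModule` and the function `run_first_stage` are hypothetical names, the control unit 170 is reduced to a cycle counter that toggles both multiplexers every `depth` cycles, and the butterfly outputs are represented by text labels rather than computed values so that the reordering is easy to read.

```python
from collections import deque

class StageModule:
    """Reordering part of one stage: three FIFOs and two multiplexers."""

    def __init__(self, depth):
        self.depth = depth
        self.fifo1 = deque([None] * depth)   # first register 120
        self.fifo2 = deque([None] * depth)   # second register 130
        self.fifo3 = deque([None] * depth)   # third register 140
        self.cycle = 0                       # stands in for control unit 170 (assumed)

    def clock(self, sum_out, diff_out):
        """Advance one cycle; return (first mux output, second mux output)."""
        d1 = self.fifo1.popleft(); self.fifo1.append(sum_out)    # sum, delayed by depth
        d2 = self.fifo2.popleft(); self.fifo2.append(diff_out)   # diff, delayed by depth
        d3 = self.fifo3.popleft(); self.fifo3.append(d2)         # diff, delayed by 2*depth
        pass_sums = (self.cycle // self.depth) % 2 == 1
        self.cycle += 1
        # Odd phase: (delayed sum, current sum). Even phase: the two delayed diffs.
        return (d1, sum_out) if pass_sums else (d3, d2)

def run_first_stage(n_points=64):
    """Feed labels a[k], b[k] through a first-stage module; return its valid outputs."""
    depth, half = n_points // 4, n_points // 2
    stage = StageModule(depth)
    outputs = []
    for t in range(half + depth):            # 'depth' extra cycles flush the FIFOs
        pair = (f"a[{t}]", f"b[{t}]") if t < half else (None, None)
        outputs.append(stage.clock(*pair))
    return outputs[depth:]                   # the first 'depth' cycles are latency

pairs = run_first_stage(64)
print(pairs[0], pairs[16])   # ('a[0]', 'a[16]') ('b[0]', 'b[16]')
```

In this model the multiplexers emit one valid pair in every cycle after the initial latency, first the pairs of sums and then the pairs of differences, which is the behavior the 100% utilization claim relies on.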

The overall operation of the fast Fourier transform system of the present invention is now described using a 64-point FFT processing circuit as an example. Fig. 8 is the overall block diagram of the 64-point FFT processing circuit of the invention, and Figs. 9a to 9c are its timing diagrams. The circuit operates as follows.

Step 1: The first-stage computation module 100 receives two data simultaneously. (For convenience, the operation is explained with the radix-2/4/8 signal flow graph of Fig. 10.) For signal processing, x[0] and x[32] are input to the first-stage butterfly unit 110 to be added and subtracted (as shown in Fig. 10). The two inputs IN1 and IN2 of the computation module 100 (as shown in Fig. 9a) receive data in the following order: IN1 receives x[0], x[1], x[2], ..., x[N/2-1], while IN2 receives x[N/2], x[N/2+1], x[N/2+2], ..., x[N-1]. After the butterfly unit 110, the data obtained are a[k] = x[k] + x[N/2+k] and b[k] = x[k] - x[N/2+k], where k = 0, 1, 2, ..., N/2-1.

Step 2: After the first-stage computation module 100 finishes, its results go to the second-stage computation module 100. Because of the input data ordering of the second stage, however, the results as output by the first stage are not the data the second stage needs (as shown in Fig. 10). The second stage needs a[k] and a[N/4+k], whereas the first stage outputs only a[k] and b[k] at that moment, which does not meet the need of the second stage. The outputs of the first stage are therefore buffered in its registers 120 and 130 (FIFO_16): register 120 stores a[k] and register 130 stores b[k]. Registers 120 and 130 (FIFO_16) delay the a[k] and b[k] data by 16 cycles. By the time the data leave registers 120 and 130 after the 16-cycle delay, the first-stage computation module 100 has finished computing a[N/4+k], and the data from the registers and the result of the first-stage computation module 100 are sent together to the second-stage computation module 100, satisfying the data ordering.

Step 3: The first-stage computation module 100 produces both a[k] and b[k]. To satisfy the data ordering required by the second-stage computation module 100, a[k] and a[N/4+k] must be sent to the second stage first. But b[k] is completed at the same time as a[k], so b[k] must wait longer in the second register 130; only after the a[k] data have been processed can b[k] be sent from the first stage to the second stage. As shown in Fig. 8, each b[k] is sent straight into the second register 130 (FIFO_16) as it is completed. The second register 130 of the first stage has a length of 16, so when b[16] is completed the second register 130 is already full and must start releasing data. At that time, however, the first-stage computation module 100 is still sending a[k] data to the second-stage computation module 100, and no channel is free to output b[k]. The data leaving the second register 130 must therefore continue into the third register 140 (FIFO_16) behind it, which also has a length of 16. Thus b[16] enters the second register 130 while the second register 130 passes b[0] to the third register 140. Once the second stage has finished processing the a[k] data, it begins to accept b[k] data. By then b[0] has been delayed a further 16 cycles by the third register 140, and the first multiplexer 150 routes b[0] to the first output out_1 and on to the second-stage computation module 100, while the second multiplexer 160 routes b[16], leaving the second register 130, to the second output out_2. The data b[0] at out_1 and b[16] at out_2 satisfy the data ordering required by the second-stage computation module 100.

Step 4: The second-stage and third-stage computation modules 100 operate on the same principle as Steps 1 to 3; the main difference lies in the length of the registers 120, 130, and 140. For a 64-point FFT, the three registers 120, 130, and 140 of the first-stage computation module 100 all have a length of 16 (FIFO_16), those of the second stage a length of 8 (FIFO_8), those of the third stage a length of 4 (FIFO_4), those of the fourth stage a length of 2 (FIFO_2), and those of the fifth stage a length of 1 (FIFO_1). The last, sixth-stage computation module 100 outputs the final results directly and therefore needs no buffering.

Step 5: Between the third-stage and fourth-stage computation modules 100, the data must pass through two complex multipliers to be multiplied by twiddle factors. Because the third-stage computation module 100 outputs a[k] and a[N/16+k], or b[k] and b[N/16+k], simultaneously, the required twiddle factors must be applied to both data at once before they are input to the fourth-stage computation module 100.
Step 6: The fourth-stage to sixth-stage computation modules 100 at the back end operate on the same principle as Steps 1 to 3; they differ only in register length.

With a pair of multiplexers and first-in first-out (FIFO) registers, the new structure proposed by the invention takes dual inputs: every stage receives the two corresponding data it needs simultaneously. Together with the parallel FIFO register structure in each stage, overall hardware utilization reaches 100%, so the butterfly units and complex multipliers work at 100% efficiency, and the computational throughput is twice that of an FFT circuit with the conventional SDF pipelined architecture.

Furthermore, the structure of the invention builds a radix-2/4/8 circuit (using the radix-2/4/8 algorithm) from single-stage structures and then treats the whole radix-2/4/8 circuit as one group; the required FFT is assembled from multiple radix-2/4/8 groups. With appropriate FIFO sizes, an FFT of any required power-of-8 length can be built, so variable-length Fourier transform circuits can be designed to suit different OFDM communication systems.

The foregoing is only a preferred embodiment of the present invention and is not intended to limit the scope of its practice. All equivalent changes and modifications made according to the claims of the present invention are covered by the scope of the patent.

[Brief Description of the Drawings]

Fig. 1 is a signal flow graph of the radix-2 algorithm for a signal with eight data points.
Fig. 2 is a schematic of the architecture of a butterfly unit.
Fig. 3 is a schematic of the signal output by the butterfly unit after it passes through a multiplier.
Fig. 4 is a signal flow graph of the radix-2/4/8 algorithm for a signal with eight data points.
Fig. 5 is a schematic of the architecture of the main computation module of an embodiment of the invention.
Fig. 6 is a schematic of the first three stages of computation modules of the FFT circuit of an embodiment of the invention.
Fig. 7 is a circuit diagram of the first three stages of computation modules of the FFT circuit of an embodiment of the invention.
Fig. 8 is an overall block diagram of the FFT processing circuit of an embodiment of the invention (N = 64 as an example).
Figs. 9a to 9c are timing diagrams of the FFT processing circuit of an embodiment of the invention (N = 64 as an example).
Fig. 10 is a signal flow graph of the prior-art radix-2/4/8 algorithm (N = 64 as an example).

[Description of Main Reference Numerals]

11: butterfly unit
12: multiplier
100: computation module
110: butterfly unit
111: first input
112: second input
113: first output
114: second output
120: first register
130: second register
140: third register
150: first multiplexer
160: second multiplexer
170: control unit
200: multiplier
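The 64-point data flow of Steps 1 to 6 in the Description can be tried out in software. The Python sketch below chains six stages with FIFO depths 16, 8, 4, 2, 1, and 0 and compares the result with a direct DFT. It rests on several assumptions that are not taken from the patent: the names `dft`, `run_stage`, and `pipeline_fft` are hypothetical; a plain radix-2 twiddle factor is applied to every difference output instead of the radix-2/4/8 coefficient placement of multipliers 200; the multiplexers are assumed to toggle every `depth` cycles; and each stage is run to completion in turn rather than concurrently, so only the data ordering is modeled, not the cycle-level timing of Figs. 9a to 9c.

```python
import cmath
from collections import deque

def dft(x):
    """Direct DFT, used only as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def run_stage(pairs, M):
    """One stage working on blocks of M points; returns the next stage's input pairs."""
    depth = M // 4                       # FIFO length of this stage
    if depth == 0:                       # last stage: results go straight out
        return [(a + b, a - b) for a, b in pairs]
    fifo1, fifo2, fifo3 = (deque([0j] * depth) for _ in range(3))
    out = []
    for t, (a, b) in enumerate(list(pairs) + [(0j, 0j)] * depth):   # flush cycles
        w = cmath.exp(-2j * cmath.pi * (t % (M // 2)) / M)   # radix-2 twiddle (assumed)
        s, d = a + b, (a - b) * w        # butterfly, then multiplier on the difference
        d1 = fifo1.popleft(); fifo1.append(s)
        d2 = fifo2.popleft(); fifo2.append(d)
        d3 = fifo3.popleft(); fifo3.append(d2)
        if t >= depth:                   # skip the start-up latency
            out.append((d1, s) if (t // depth) % 2 == 1 else (d3, d2))
    return out

def pipeline_fft(x):
    """Return (X, depths): the spectrum in natural order and the FIFO depth per stage."""
    N = len(x)
    pairs = [(x[k], x[k + N // 2]) for k in range(N // 2)]   # IN1 and IN2 streams
    depths, M = [], N
    while M >= 2:
        depths.append(M // 4)
        pairs = run_stage(pairs, M)
        M //= 2
    flat = [v for pair in pairs for v in pair]               # bit-reversed order
    bits = N.bit_length() - 1
    X = [0j] * N
    for i, v in enumerate(flat):
        X[int(format(i, f"0{bits}b")[::-1], 2)] = v          # undo the bit reversal
    return X, depths

x = [complex((7 * n) % 11, (5 * n) % 3) for n in range(64)]  # arbitrary test signal
X, depths = pipeline_fft(x)
print(depths)                                                # [16, 8, 4, 2, 1, 0]
print(max(abs(p - q) for p, q in zip(X, dft(x))))            # rounding-error level
```

The depth list matches the register lengths given in Step 4, and the agreement with the direct DFT suggests that the three-FIFO, two-multiplexer reordering delivers exactly the operand pairs each following butterfly needs.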

Claims

X. Claims:

1. A fast Fourier transform system, applied to a multi-stage Fourier transform to process a signal of N data points, where N is a positive integer greater than or equal to 8 and is a power of 8, the system comprising:

a plurality of multipliers, each having a coefficient and capable of receiving a signal and multiplying the signal by the coefficient to produce an output signal; and

log2(N) stages of computation circuits, wherein the computation module of one stage comprises:

a butterfly unit having a first input, a second input, a first output, and a second output, wherein two different signals are input to the inputs respectively, the sum of the two signals is output from the first output, and the difference of the two signals is output from the second output;

two multiplexers, comprising a first multiplexer and a second multiplexer, each of which receives two input signals and outputs one of the two signals, producing the two output signals of that stage;

a control unit, electrically connected to the multiplexers, which controls the switching of the multiplexers so that the correct signals are output; and

three registers, comprising a first register, a second register, and a third register;

wherein the signal output from the first output of the butterfly unit passes through the first register to the first multiplexer and also passes directly to the second multiplexer, the signal output from the second output of the butterfly unit passes through the second register to the second multiplexer, and the signal leaving the second register passes through the third register to the first multiplexer.

2. The fast Fourier transform system of claim 1, wherein the coefficients of the multipliers are coefficients of a radix-2/4/8 fast Fourier transform.

3. The fast Fourier transform system of claim 1, wherein the registers are first-in first-out registers.
