1276975
IX. Description of the Invention:

[Technical Field of the Invention]

The present invention relates to the fast Fourier transform (FFT), and more particularly to a fast Fourier transform processor.

[Prior Art]

Mobile communication is becoming increasingly widespread, and the bandwidth requirements of wireless local area networks keep rising. The IEEE 802.11a specification was proposed to meet these requirements, and in it the FFT unit is an important modulation unit. It converts data in the frequency domain into the corresponding data in the time domain, and this property helps mitigate the signal attenuation and multi-path interference that wireless communication faces. As a result, current and future communication standards make extensive use of this computation. However, hardware implementations of wireless communication must bear its heavy computational load.

Conventional fast Fourier transform circuit designs fall into three main architectures: single-memory, dual-memory, and pipeline. The single-memory architecture needs only one arithmetic unit and fully exploits the in-place computation property of the fast Fourier transform. It therefore has the smallest area, but its computation latency is also relatively long. The dual-memory architecture uses one memory to store the input and another to store the output. It provides higher throughput than the single-memory architecture but requires $\log_r N$ arithmetic units (where r and N are positive integers), and it therefore requires the largest area.

The discrete Fourier transform (DFT) is defined as follows:

$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1$$

where the operator $W_N = e^{-j2\pi/N}$ is the twiddle factor.

Fig. 1 is an architectural block diagram of a conventional FFT processor. In 2002, Lenart et al. ("A Pipelined FFT Processor Using Data Scaling with Reduced Memory Requirements," in Proceedings of Norchip, November 11-12, 2002, Copenhagen, Denmark) proposed a radix-4 pipelined FFT processor architecture. A 64-point FFT on this architecture takes three stages, three multipliers in total, and three accesses to the twiddle factors. In addition, every clock cycle requires reading and writing as many as six first-in first-out (FIFO) buffers.
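The following Python sketch evaluates the DFT definition above directly. It is only an illustration and is not taken from any of the cited processors. The helper `twiddle(n, N)` is a hypothetical name for $W_N^n$, written in the $W(n, N)$ notation that the later figures use for $W_N^n$.

```python
import cmath


def twiddle(n: int, N: int) -> complex:
    """W(n, N) = W_N^n = exp(-j*2*pi*n/N)."""
    return cmath.exp(-2j * cmath.pi * n / N)


def dft(x: list[complex]) -> list[complex]:
    """Direct evaluation of X[k] = sum_{n=0}^{N-1} x[n] * W_N^(n*k)."""
    N = len(x)
    # W_N^(n*k) depends only on (n*k) mod N, so only N distinct twiddle factors occur.
    return [sum(x[n] * twiddle((n * k) % N, N) for n in range(N)) for k in range(N)]


if __name__ == "__main__":
    # A unit impulse at n = 0 transforms to a constant spectrum of ones.
    print([round(abs(v), 6) for v in dft([1, 0, 0, 0, 0, 0, 0, 0])])
```

The direct sum costs on the order of N^2 complex multiplications (4096 for N = 64). The FFT architectures discussed in this document reduce that cost by factoring $W_N^{nk}$.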
Fig. 2 is an architectural block diagram of another conventional FFT processor. In 2003, Maharatna et al. ("A Novel 64-Point FFT/IFFT Processor for IEEE 802.11a Standard," ICASSP 2003) proposed a radix-8 FFT processor architecture, shown in Fig. 2 (this type of architecture was probably proposed earlier than this). The architecture mainly targets 64-point FFT computation. It is implemented with two unrolled radix-8 butterfly units and eight output units joined by special hardware interconnections. This reduces latency, but it requires more hardware arithmetic units.

Fig. 3 is an architectural block diagram of yet another conventional FFT processor. In 2002, Guo ("A New Hardware-Efficient Design Approach for the 1D Discrete Fourier Transform," Pattern
Recognition and Image Analysis, Vol. 12, No. 3, 2002, pp. 299-307) proposed a one-dimensional DFT processor architecture, shown in Fig. 3. This architecture computes the real and imaginary parts separately and therefore occupies a large hardware area.

Conventional FFT processor architectures are too numerous to list. The most practical architecture, however, is the one with the smallest hardware area and cost, the shortest latency, and the lowest power consumption. The architectures above have drawbacks in memory access and multipliers, and they also use a large number of arithmetic units.

[Summary of the Invention]

The present invention overcomes the above drawbacks of conventional fast Fourier transform processors, and its main object is to provide a fast Fourier transform processor. Accordingly, the fast Fourier transform processor of the present invention mainly comprises a multiplexer, a first angle rotator, a second angle rotator-multiplexer, an adder, a twiddle factor memory, a multiplier, and a data memory.
This processor reduces hardware area and cost, and it also lowers design complexity.

According to the present invention, the multiplexer selects one of several groups of N data items as its input and outputs one group of N data items, where $N = 2^M$ and M is a positive integer greater than or equal to 3. The first angle rotator receives N/2 of these N data items, rotates each of them by a first angle, and outputs the N/2 rotated data items. The second angle rotator-multiplexer receives the group of N data items and the N/2 data items already rotated by the first angle. At a preset first time, it selects the group of N data items as its input. At a preset second time, it instead selects the other N/2 data items of the group together with the N/2 data items rotated by the first angle. It rotates each selected item by a second angle and then sequentially outputs the N data items rotated by the second angle.

The adder sequentially sums the N data items rotated by the second angle and outputs a summed frequency-domain datum. The twiddle factor memory stores all twiddle factors of the N-point fast Fourier transform. The multiplier sequentially multiplies each summed frequency-domain datum by its corresponding twiddle factor and outputs an intermediate datum. The data memory sequentially receives and stores these intermediate data, and then outputs N of them to the multiplexer for the next stage of computation.

According to the present invention, the fast Fourier transform processor further comprises a first register array and a second register array. The first register array is located between the multiplexer and the second angle rotator-multiplexer. The second register array is located between the first angle rotator and the second angle rotator-multiplexer.
The present invention mainly analyzes the input/output ordering of the overall fast Fourier transform computation. It separates out the parts that require more complex computation in order to simplify the hardware, and it adjusts the output order. This saves hardware area and cost and lowers design complexity. It also reduces the number of computations and memory accesses, and therefore saves power.

The above and other objects and advantages of the present invention are described in detail below, with reference to the following drawings, the detailed description of the embodiments, and the claims.

[Embodiments]

Fig. 4 is an architectural block diagram of the fast Fourier transform processor of the present invention. Referring to Fig. 4, the fast Fourier transform processor 400 mainly comprises a multiplexer 41, a first angle rotator 42, a second angle rotator [...] compensated addition, which greatly reduces the power consumption caused by carry terms.

Figs. 8B and 8C below illustrate the scheduling and temporary storage of the data flow of the present invention. Fig. 8B shows the data flow at the preset first time.

The multiplexer 41 selects one of two groups of eight data items as its input and outputs one group of eight data items. Referring to Fig. 8B, the data flow is scheduled so that, at the preset first time, the second angle rotator-multiplexer 43 receives the group of eight data items output by the first register array 88 (solid lines). The second angle rotator-multiplexer 43 and the adder 44 then compute the frequency-domain data X(0), X(2), X(4), and X(6). Meanwhile, the first angle rotator 42 receives four of the eight data items over the dashed lines: the inputs at time slots 1, 3, 5, and 7, i.e., x(1), x(3), x(5), and x(7). It rotates each of them by an angle of 2π/8 and stores the four rotated data items in the second register array 89.

Fig. 8C shows the data flow at the preset second time. Referring to Fig. 8C, at the preset second time the second angle rotator-multiplexer 43 selects the other four data items of the group, i.e., the inputs at time slots 0, 2, 4, and 6: x(0), x(2), x(4), and x(6). It combines them with the four data items already rotated by 2π/8 (solid lines), performs the 2π/4 angle rotations on them, and sequentially outputs the eight rotated data items. The adder 44 sequentially sums these eight data items and then sequentially outputs the summed frequency-domain data X(1), X(3), X(5), and X(7).

With N = 8 as the example, the present invention uses a radix-8 fast Fourier transform. Yeo et al. ("Low Power Implementation of FFT/IFFT Processor for IEEE 802.11a Wireless LAN Transceiver") report that a radix-8 design best meets low-power requirements. The IEEE 802.11a wireless LAN specification requires a 64-point fast Fourier transform. The present invention therefore needs only two passes (cycles) of computation, that is, two stages, to implement the 64-point fast Fourier transform.
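The minimal Python sketch below checks numerically the two-phase 8-point schedule of Figs. 8B and 8C and the two-pass 64-point computation. It is not the patented circuit, and several parts of it are assumptions:

- The first angle rotator is modeled as multiplication by $W_8 = e^{-j\pi/4}$, on the assumption that the 2π/8 rotation follows the sign of $W_N = e^{-j2\pi/N}$.
- The second angle rotations are modeled as powers of $W_4 = -j$, which need only swaps and negations.
- The adder is modeled as a plain sum.
- The function names are hypothetical.
- The first-pass result with index m = 8·n2 + k1 is assumed to be scaled by W(n2·k1, 64). This indexing was chosen because it reproduces the Fig. 9 entries quoted below: X(0) to X(8) use W(0,64) = 1, and X(9) uses W(1,64).

```python
import cmath
import math

# Hypothetical model of the datapath, not the patented circuit.
SQRT_HALF = math.sqrt(0.5)  # 1/sqrt(2), the only non-trivial constant for N = 8


def rotate_eighth(z: complex) -> complex:
    """First angle rotator (assumed): multiply by W_8 = exp(-j*pi/4) = (1 - j)/sqrt(2)."""
    return complex((z.real + z.imag) * SQRT_HALF, (z.imag - z.real) * SQRT_HALF)


def rotate_quarter(z: complex, m: int) -> complex:
    """Second angle rotation (assumed): multiply by W_4^m = (-j)^m with swaps/negations only."""
    m %= 4
    if m == 0:
        return z
    if m == 1:  # (a + jb)(-j) = b - ja
        return complex(z.imag, -z.real)
    if m == 2:  # (a + jb)(-1)
        return -z
    return complex(-z.imag, z.real)  # m == 3: (a + jb)(j) = -b + ja


def dft8_two_phase(x: list[complex]) -> list[complex]:
    """8-point DFT scheduled as in Figs. 8B/8C: even outputs first, then odd outputs."""
    x = [complex(v) for v in x]
    X = [0j] * 8
    # Preset first time: for even k, W_8^(n*k) = W_4^(n*k/2), so quarter turns suffice.
    for k in (0, 2, 4, 6):
        X[k] = sum(rotate_quarter(x[n], n * k // 2) for n in range(8))
    # x(1), x(3), x(5), x(7) rotated by W_8 and held (the role of register array 89).
    # In the described hardware this overlaps the first phase; here it runs after it.
    held = {n: rotate_eighth(x[n]) for n in (1, 3, 5, 7)}
    # Preset second time, odd k: even n uses W_4^(n*k/2);
    # odd n uses W_8^(n*k) = W_8 * W_4^((n*k - 1)/2) on the held values.
    for k in (1, 3, 5, 7):
        even = sum(rotate_quarter(x[n], n * k // 2) for n in (0, 2, 4, 6))
        odd = sum(rotate_quarter(held[n], (n * k - 1) // 2) for n in (1, 3, 5, 7))
        X[k] = even + odd
    return X


def twiddle_exponent(m: int) -> int:
    """Exponent e with intermediate value m scaled by W(e, 64), for m = 8*n2 + k1."""
    return (m // 8) * (m % 8)


def fft64_two_pass(x: list[complex]) -> list[complex]:
    """64-point DFT as two radix-8 passes: n = 8*n1 + n2 and k = k1 + 8*k2."""
    if len(x) != 64:
        raise ValueError("expected 64 samples")
    # Pass 1: an 8-point DFT over n1 for each n2, then the inter-stage twiddle multiply.
    inter = [0j] * 64
    for n2 in range(8):
        col = dft8_two_phase([x[8 * n1 + n2] for n1 in range(8)])
        for k1 in range(8):
            m = 8 * n2 + k1
            inter[m] = col[k1] * cmath.exp(-2j * cmath.pi * twiddle_exponent(m) / 64)
    # Pass 2: an 8-point DFT over n2 for each k1; the output index is k1 + 8*k2.
    X = [0j] * 64
    for k1 in range(8):
        row = dft8_two_phase([inter[8 * n2 + k1] for n2 in range(8)])
        for k2 in range(8):
            X[k1 + 8 * k2] = row[k2]
    return X


if __name__ == "__main__":
    import random

    data = [complex(random.random(), random.random()) for _ in range(64)]
    direct = [sum(data[n] * cmath.exp(-2j * cmath.pi * n * k / 64) for n in range(64))
              for k in range(64)]
    err = max(abs(a - b) for a, b in zip(fft64_two_pass(data), direct))
    print(f"max |error| vs direct DFT: {err:.2e}")
```

In this model the only general multiplications are the 1/√2 scaling in `rotate_eighth` and the inter-pass twiddle multiply. Every other rotation is a swap or a sign change. This is consistent with the text's motivation for isolating the 2π/8 rotation.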
Besides reducing the number of computations, this also reduces the number of accesses to memory (the twiddle factor memory 45 and the data memory 47), and therefore saves power.

The output of the adder 44 completes the first stage. To realize the 64-point fast Fourier transform, the corresponding twiddle factors must then be read from the twiddle factor memory 45. Fig. 9 is the twiddle factor table of the 64-point fast Fourier transform. In the figure, frequency-domain data X(0) to X(8) are multiplied by the twiddle factor $W(0,64) = e^{0} = 1$, X(9) is multiplied by $W(1,64) = e^{-j2\pi/64}$, and so on for the remaining outputs [...] part is read in advance into the operand registers. In this way the processor keeps both advantages: memory can store a large number of values, and registers provide higher bandwidth.

The foregoing is merely a preferred embodiment of the present invention and shall not limit the scope of its implementation. All equivalent changes and modifications made according to the claims of the present invention remain within the scope covered by this patent.

[Brief Description of the Drawings]

Fig. 1 is an architectural block diagram of a conventional FFT processor.
Fig. 2 is an architectural block diagram of another conventional FFT processor.
Fig. 3 is an architectural block diagram of yet another conventional FFT processor.
Fig. 4 is an architectural block diagram of the fast Fourier transform processor of the present invention.
Fig. 5 is a table of the input-output relationship of the discrete Fourier transform.
Fig. 6 shows the computation order of the discrete Fourier transform after reordering.
Fig. 7 is a table of the input-output relationship of the discrete Fourier transform after W(1,8) is factored out and the terms are reordered.
Fig. 8A shows the architecture of Fig. 4 with register arrays added to provide temporary storage for output and input data.
Fig. 8B illustrates the scheduling and temporary storage of the data flow at the preset first time according to the present invention.
Fig. 8C illustrates the scheduling and temporary storage of the data flow at the preset second time according to the present invention.
Fig. 9 is the twiddle factor table of the 64-point fast Fourier transform.
Fig. 10 is a hardware architecture diagram of the first angle rotator of the present invention, taking N = 8 as an example.
Fig. 11 is a table comparing the hardware architecture and performance of the fast Fourier transform processor of the present invention with several prior-art designs.

[Description of Main Reference Numerals]

41 multiplexer
42 first angle rotator
43 second angle rotator-multiplexer
44 adder
45 twiddle factor memory
46 multiplier
47 data memory
88 first register array
89 second register array
400 fast Fourier transform processor
800 fast Fourier transform processor