JPH03100863A

JPH03100863A - System and device for FFT operation

Info

Publication number: JPH03100863A
Application number: JP1237029A
Authority: JP
Inventors: Hideyuki Ban; 秀行伴; Ryuichi Suzuki; 隆一鈴木
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1989-09-14
Filing date: 1989-09-14
Publication date: 1991-04-25

Abstract

PURPOSE: To reduce overhead and achieve high-speed FFT operation by sequentially transferring the data used for small FFT operations that require equal normalization amounts to the same processor, thereby carrying out an N-point FFT as a set of small FFT operations. CONSTITUTION: A plurality of processors are each capable of small FFT operations in which normalization is carried out one or more times. Data transfer means transfer to each processor the data it needs for its small FFT operations, supplying the data used for small FFT operations with equal normalization amounts sequentially to the same processor, which then executes those small FFT operations. For example, the input data of the second-stage small FFTs 113 to 116 all have the same normalization amounts, because these input data are all obtained from the same small FFTs 101 to 104. Small FFTs with equal normalization amounts are therefore computed on the same processor, which reduces overhead and allows the FFT operation to be performed at higher speed.

Description

[Detailed description of the invention]

(Industrial Field of Application) The present invention relates to a means for implementing the fast Fourier transform (FFT).

[Conventional technology]

FFT computation can be parallelized by dividing it into mutually independent processing units called small FFTs, as described in Japanese Patent Laid-Open No. 59-30168. FIG. 1 is a data-flow diagram of a 64-point FFT divided into small FFTs, each built from the 2 × 2 stages of butterfly operations shown in FIG. 2. To keep the drawing readable, some of the small FFTs and some of the lines connecting them are omitted. FIG. 1 shows that the 64-point FFT divides into small FFTs 101 to 136, i.e., 16 small FFTs × 3 stages. The values of the twiddle factors 141 to 144 used in the butterfly operations of each small FFT are determined by the position of that small FFT in FIG. 1.

In FIG. 1, the sixteen small FFTs of a given stage (labeled 101 to 112, 113 to 124, or 125 to 136) can be processed independently of one another. Processing these same-stage small FFTs simultaneously on multiple processors therefore parallelizes the FFT.

Implementing an FFT on a fixed-point arithmetic unit, however, raises two problems: overflow, and maintaining internal precision. One way to address both is to multiply the entire set of intermediate results by some value so as to increase the number of effective significant digits of the data. Here this process is called normalization, and the cumulative product of the values multiplied at each step is called the normalization amount. Examples are the methods known as autoscaling or block floating point, discussed in Transactions of the Institute of Electronics and Communication Engineers of Japan, Vol. 58-D, No. 9 (1975), pp. 578-585.

This normalization can also be applied within each individual small FFT, but a process called digit alignment then becomes necessary. Suppose normalization is performed during the computation of each small FFT in FIG. 1. In the second-stage and later small FFTs, such as 113 to 136, whose input data are the results of previous-stage small FFTs, the normalization amounts of the input data differ from one another; that is, the input data have different radix-point positions. Before such a small FFT is processed, each input datum must therefore be multiplied by a value determined by its normalization amount so that the radix points line up. This process is called digit alignment. Digit alignment thus requires the normalization amounts in addition to the input data, so a processor computing a second-stage or later small FFT must access those normalization amounts as well as the input data.
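A minimal sketch of normalization and digit alignment as described above. It assumes a 16-bit signed word and power-of-two normalization multipliers, a common block-floating-point choice that the text does not specify. `normalize` and `align` are hypothetical helper names. The exponent e stands for a normalization amount E = 2**e.

```python
def normalize(block, word_bits=16):
    """Block-floating-point normalization (sketch).

    Shifts every value left by the same amount e, the largest shift that keeps
    the peak magnitude within a signed word (assumes the block already fits).
    Returns the scaled block and e; the multiplier applied is 2**e.
    """
    limit = (1 << (word_bits - 1)) - 1
    peak = max((abs(v) for v in block), default=0)
    if peak == 0:
        return list(block), 0
    e = 0
    while (peak << (e + 1)) <= limit:
        e += 1
    return [v << e for v in block], e


def align(blocks):
    """Digit alignment of blocks given as (values, e) pairs.

    Each block is multiplied by min(E) / E = 2**(e_min - e), which for
    integers is an arithmetic right shift by e - e_min. Returns the aligned
    blocks and their common exponent e_min.
    """
    e_min = min(e for _, e in blocks)
    return [[v >> (e - e_min) for v in vals] for vals, e in blocks], e_min


a = normalize([3, -5, 7])     # peak 7 allows a shift of 12
b = normalize([100, 200])     # peak 200 allows a shift of 7
aligned, e_common = align([a, b])
print(a[1], b[1], e_common)   # 12 7 7
print(aligned)                # [[384, -640, 896], [12800, 25600]]
```

Aligning to the smallest normalization amount only ever scales down, so it cannot overflow. The price is the low-order bits discarded by the right shift.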

[Problem to be solved by the invention]

In the prior art described above, the FFT is divided into multiple small FFTs that are computed stage by stage, and normalization is performed one or more times within each small FFT to maintain precision. Such schemes do not adequately account for one fact: a processor computing a small FFT that requires digit alignment needs the normalization amounts in addition to the input data. This increases the amount of data a processor needs for each small FFT, which calls for faster memory and data-transfer circuits. Alternatively, if the capacity of the memory or data-transfer circuits is limited and the data a processor needs cannot be delivered in time, overhead arises and the full benefit of parallelization is lost.

An object of the present invention is to provide an FFT computation method capable of higher speed. Another object of the present invention is to provide an FFT computation device capable of higher speed.

[Means to solve the problem]

To achieve the first object, the invention exploits the following observation. In the second-stage small FFTs 113 to 116 of FIG. 1, for example, the input data of every small FFT have equal normalization amounts, because all of these input data come from the same small FFTs 101 to 104. Small FFTs whose computations require equal normalization amounts are therefore computed on the same processor.

To achieve the other object, the device has a plurality of processors and data transfer means. Each processor can compute small FFTs that perform normalization one or more times during computation, and the data transfer means deliver to each processor the data it needs for its small FFTs. The data transfer means sequentially supply the data used by small FFTs requiring equal normalization amounts to the same processor, which then computes those small FFTs.

[Operation]

A processor can compute several small FFTs with the same normalization amounts, so the number of normalization amounts it needs is reduced. Equivalently, each normalization amount is needed by only one processor, so normalization amounts are transferred between processors less often. As a result, faster memory and data-transfer circuits become unnecessary, or the overhead caused by limited memory and data-transfer capacity is reduced.

[Embodiments]

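A first embodiment of the present invention is described below. FIG. 3 outlines how the butterfly operations that make up an N-point FFT relate to the small FFTs into which it is divided. Reference numeral 1 denotes the (N/2) × M stages of butterfly operations required for the N-point FFT, where M = log_2 N. Let the small FFTs 2 to 6 each consist of (Q/2) × P stages of butterfly operations, where P = log_2 Q. The N-point FFT can then be divided into R stages × S small FFTs, where R = M/P and S = N/Q. Taking the upper left of FIG. 3 as the origin, the small FFT located r stages to the right and s positions down is called the s-th small FFT of the r-th stage and is written H(r, s) (r = 1, 2, ..., R; s = 1, 2, ..., S).

FIG. 4 shows the procedure for processing the R × S small FFTs of FIG. 3 with U processors. First, process 7 rearranges the N source data of the N-point FFT into bit-reversed order. Next, the R × S small FFTs are computed: operation 8 covers the first stage and operation 9 the second and later stages. Finally, process 10 performs digit alignment on all of the results, completing the N-point FFT.

When each small FFT performs its own independent normalization during computation, the small FFTs of the second and later stages require digit alignment. To reduce the overhead of this digit alignment, operation 9, i.e., the computation of the r-th stage (r ≥ 2) small FFTs, is performed according to the procedure shown in FIG. 5.

FIG. 5 shows the processing procedure for the r-th stage (r ≥ 2) small FFTs in the u-th processor (hereafter processor u; u = 1, 2, ..., U). First, process 21 derives the numbers (the s of H(r, s)) of those r-th stage small FFTs that require equal normalization amounts. There are Q such r-th stage small FFTs, and their numbers are written (s_1, s_2, ..., s_Q)_i (i = 1, 2, ..., S/Q). The subscript i is a value unique to these Q r-th stage small FFTs and corresponds one-to-one to the set of numbers s_1, s_2, ..., s_Q. Hence, if the subscript i differs, the numbers s_1, s_2, ..., s_Q are all different.

Next, process 22 is performed on the r-th stage small FFT H(r, s_q), where s_q (q = 1, 2, ..., Q) is one of the numbers in the set (s_1, s_2, ..., s_Q)_i derived by process 21. Process 22 consists of process 23, which performs digit alignment of the input data of H(r, s_q), and process 24, which performs the (Q/2) × P stages of butterfly operations; normalization is carried out during process 24. Conditional branch 25 and process 26, which updates the number of the small FFT to be processed, repeat process 22 until all of the r-th stage small FFTs H(r, s_1), H(r, s_2), ..., H(r, s_Q) with the Q numbers derived by process 21 have been processed.

After processes 21 to 26 are complete, conditional branch 27 determines whether processor u still has work left: a set of Q r-th stage small FFTs H(r, s_1), H(r, s_2), ..., H(r, s_Q) that require equal normalization amounts, have not yet been processed, and are assigned to processor u. This can be determined, for example, by fixing in advance the subscripts i of the sets (s_1, s_2, ..., s_Q)_i that processor u is to process and comparing them with the subscripts already processed. Another option is to check whether any set of Q small FFTs H(r, s_1), H(r, s_2), ..., H(r, s_Q) has not yet been processed by any processor. If such a set remains for processor u, processes 21 to 26 are repeated; process 21 then derives the numbers (s_1, s_2, ..., s_Q)_i of the Q r-th stage small FFTs with the new subscript i.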
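Processes 21 and 22 of FIG. 5 are now described in detail. The numbers s_q (q = 1, 2, ..., Q) of the Q r-th stage small FFTs derived in process 21 are obtained from

$$ s_q = Q^{r-1}\,\mathrm{div}\!\left(\frac{i-1}{Q^{r-2}}\right) + \mathrm{mod}\!\left(\frac{i-1}{Q^{r-2}}\right) + (q-1)\,Q^{r-2} + 1 \tag{1} $$

where i = 1, 2, ..., S/Q is the subscript of (s_1, s_2, ..., s_Q)_i, div(u/v) is the quotient of u/v, and mod(u/v) is the remainder of u/v.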
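Eq. (1) can be exercised directly. The sketch reads it as s_q = Q^(r-1)·div((i-1)/Q^(r-2)) + mod((i-1)/Q^(r-2)) + (q-1)·Q^(r-2) + 1 for r ≥ 2. The exponents were recovered from a damaged original, so treat them as an assumption. The check confirms that, for the 64-point example of FIG. 1, the S/Q groups of each stage partition the numbers 1 to S. `group` is a hypothetical name.

```python
def group(r, i, q):
    """Numbers (s_1, ..., s_Q)_i of the r-th stage small FFTs (r >= 2) that
    need equal normalization amounts, per Eq. (1). All indices are 1-based."""
    hi, lo = q ** (r - 1), q ** (r - 2)
    return [hi * ((i - 1) // lo) + (i - 1) % lo + (k - 1) * lo + 1
            for k in range(1, q + 1)]


N, Q = 64, 4
S = N // Q
for r in (2, 3):
    groups = [group(r, i, Q) for i in range(1, S // Q + 1)]
    # Every small FFT number 1..S appears in exactly one group.
    assert sorted(s for g in groups for s in g) == list(range(1, S + 1))
    print(r, groups)
# 2 [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
# 3 [[1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15], [4, 8, 12, 16]]
```

For r = 2 the groups are contiguous. This is consistent with the observation that the second-stage small FFTs 113 to 116 all take their inputs from 101 to 104.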
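FIG. 6 is a data-flow diagram of process 22, i.e., the processing of H(r, s_q). Processing H(r, s_q) consists of digit alignment 31 applied to h(0) to h(Q-1), followed by (Q/2) × P stages of butterfly operations 32. These correspond to processes 23 and 24 of FIG. 5, respectively. Assume the source data of the N-point FFT have already been rearranged into bit-reversed order. The input data h(0) to h(Q-1) of H(r, s_q) are then related to the source data x(0) to x(N-1) by

$$ h(k) = x\!\left(Q^{r}\,\mathrm{div}\!\left(\frac{s_q-1}{Q^{r-1}}\right) + \mathrm{mod}\!\left(\frac{s_q-1}{Q^{r-1}}\right) + k\,Q^{r-1}\right) \tag{2} $$

where h(k) is the k-th input datum of small FFT H(r, s_q) (k = 0, 1, ..., Q-1), x(n) is the n-th source datum of the N-point FFT (n = 0, 1, ..., N-1), div(u/v) is the quotient of u/v, and mod(u/v) is the remainder of u/v.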
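Digit alignment 31 multiplies each input datum h(k) of H(r, s_q) by a constant D_w (w = 1, 2, ..., Q) chosen according to its normalization amount. Let E(r, s_q) denote the normalization amount of H(r, s_q). The constants D_w for the Q r-th stage small FFTs derived by Eq. (1) are then

$$ D_w = \frac{\min\big(E(r-1,s_1),\,E(r-1,s_2),\,\ldots,\,E(r-1,s_Q)\big)}{E(r-1,s_w)} \tag{3} $$

where min(u_1, u_2, ..., u_V) is the smallest of u_1, u_2, ..., u_V. In other words, every one of the Q small FFTs needs the same set of normalization amounts, E(r-1, s_1), E(r-1, s_2), ..., E(r-1, s_Q).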
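A sketch of Eq. (3). It assumes the normalization amounts are available as plain numbers and uses exact fractions to avoid rounding. Fixed-point hardware would use shifts when the amounts are powers of two. All Q small FFTs of a group draw on the same amounts, so D_1, ..., D_Q are computed once and reused for every member. Under our reading of Eqs. (1) and (2), h(k) comes from the small FFT numbered s_(k+1), so it takes D_(k+1). `alignment_constants` and `align_inputs` are hypothetical names, and the amounts below are made up.

```python
from fractions import Fraction


def alignment_constants(prev_amounts):
    """Eq. (3): D_w = min(E_1, ..., E_Q) / E_w for the Q normalization amounts
    E(r-1, s_1), ..., E(r-1, s_Q) shared by one group."""
    e_min = min(prev_amounts)
    return [Fraction(e_min, e) for e in prev_amounts]


def align_inputs(h, d):
    """Digit alignment 31: multiply h(k) by D_(k+1) (d is 0-based here)."""
    return [x * dk for x, dk in zip(h, d)]


E_prev = [8, 2, 4, 2]                       # hypothetical amounts E(r-1, s_w)
D = alignment_constants(E_prev)
print(D)   # [Fraction(1, 4), Fraction(1, 1), Fraction(1, 2), Fraction(1, 1)]
# Inputs stored as E_w * true_value all end up scaled by min(E) = 2.
true_vals = [3, -1, 5, 7]
stored = [e * t for e, t in zip(E_prev, true_vals)]
print([int(v) for v in align_inputs(stored, D)])   # [6, -2, 10, 14]
```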
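The (Q/2) × P stages of butterfly operations 32 are algorithmically identical to a Q-point FFT, so they can be implemented with a conventional FFT algorithm. The twiddle factors multiplied in each butterfly, however, differ from those of a Q-point FFT. In FIG. 6, let W_{uv} denote the twiddle factor used in the v-th butterfly from the top (v = 1, 2, ..., Q/2) among the Q/2 butterflies of the u-th stage (u = 1, 2, ..., P). W_{uv} is given by

$$ W_{uv} = \exp\!\left(-j\,\frac{2\pi}{N}\,c\right),\qquad c = \mathrm{mod}\!\left(\frac{v-1}{2^{u-1}}\right) Q^{r-1} + \mathrm{mod}\!\left(\frac{s_q-1}{Q^{r-1}}\right) \tag{4} $$

Process 32 can thus be realized with an FFT algorithm that uses the twiddle factors derived from Eq. (4).

In summary, the following procedure is used for the small FFTs of the r-th stage (r ≥ 2):

1. Eq. (1) gives the numbers s_q (q = 1, 2, ..., Q) of the Q small FFTs that require equal normalization amounts.
2. The small FFTs H(r, s_1), H(r, s_2), ..., H(r, s_Q) with these numbers are processed on the same processor.

All small FFTs processed on one processor then require the same normalization amounts, so fewer normalization amounts need to be transferred to that processor. Moreover, these normalization amounts correspond one-to-one to the small FFTs processed on that processor, so each normalization amount is needed by only one processor, and normalization amounts are transferred between processors less often. As a result, faster memory and data-transfer circuits become unnecessary, or the overhead caused by limited memory and data-transfer capacity is reduced, which makes faster FFT computation possible.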
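The decomposition can be checked end to end in floating point, where normalization and digit alignment drop out. The sketch below does four things:

1. Bit-reverses the input (process 7).
2. Computes every H(r, s) in place, gathering its Q inputs with Eq. (2).
3. Runs P radix-2 butterfly stages on each small FFT.
4. Compares the result with a direct DFT.

The twiddle factors are our own derivation for a radix-2 decimation-in-time FFT on bit-reversed input. At global butterfly stage m = (r-1)P + u the factor is exp(-j2πp/2^m), where p is the butterfly's offset within its group. This is an assumption and may differ in form from Eq. (4) as printed. All function names are hypothetical.

```python
import cmath


def bit_reverse(x):
    """Process 7: reorder x into bit-reversed index order (len(x) a power of two)."""
    bits = len(x).bit_length() - 1
    return [x[int(format(i, f"0{bits}b")[::-1], 2)] for i in range(len(x))]


def small_fft(data, r, s, q):
    """Compute H(r, s) in place on the bit-reversed array `data`."""
    span = q ** (r - 1)
    m0 = (s - 1) % span
    idx = [q ** r * ((s - 1) // span) + m0 + k * span for k in range(q)]  # Eq. (2)
    h = [data[j] for j in idx]            # gather into "local memory"
    half = 1
    while half < q:                       # local stages u = 1..P, half = 2**(u-1)
        group = span * 2 * half           # global group size 2**m, m = (r-1)P + u
        for t in range(q):
            if t % (2 * half) < half:     # t is the upper input of a butterfly
                w = cmath.exp(-2j * cmath.pi * (m0 + (t % half) * span) / group)
                a, b = h[t], h[t + half] * w
                h[t], h[t + half] = a + b, a - b
        half *= 2
    for j, v in zip(idx, h):              # scatter results back in place
        data[j] = v


def fft_by_small_ffts(x, q):
    n = len(x)
    stages = (n.bit_length() - 1) // (q.bit_length() - 1)   # R = M / P
    data = bit_reverse(x)
    for r in range(1, stages + 1):
        for s in range(1, n // q + 1):    # same-stage small FFTs are independent
            small_fft(data, r, s, q)
    return data


def dft(x):
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


x = [complex(j % 7, (3 * j) % 5) for j in range(64)]
for q in (2, 4, 8):
    err = max(abs(a - b) for a, b in zip(fft_by_small_ffts(x, q), dft(x)))
    print(q, err < 1e-9)                  # True for each Q
```

Within a stage, the loop order over s does not change the floating-point result. Ordering by equal normalization amounts matters only for how many normalization amounts a fixed-point processor has to fetch.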
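A second embodiment of the present invention is described below. FIG. 7 shows the configuration of the second embodiment. In FIG. 7, 60 is a data transfer unit and 41 to 43 are processors. The data transfer unit consists of a shared memory 40 and a bus 44. The shared memory has at least two areas:

- area 45, which stores the source data, intermediate results and final results of the FFT;
- area 46, which stores the normalization amounts of these data.

The data a processor needs are transferred from the shared memory over the bus. Each processor consists of a local memory 47, a data processing section 48, an address generation section 49 and a control section 50. Under instructions from the control section, the address generation section and the data processing section work together to process data in the shared memory or in the local memory.

Data transfer between the shared memory and the processors uses a single bus, so simultaneous accesses to the shared memory by several processors cause data collisions on the bus. This problem is assumed to be solved in one of two ways. The control section of each processor can stagger the timing of its shared-memory accesses slightly relative to the other processors, or a bus arbiter can be used. Such an FFT computation device can be built, for example, from commercially available multipliers, memories and similar components, or from DSPs (digital signal processors).

The N-point FFT in this embodiment is computed according to the FFT computation method of the first embodiment. The data flow and the operation of each section during an N-point FFT are described below, following FIGS. 4 and 5. At the start of processing, the N source data of the FFT are assumed to be already stored in the shared memory.

First, process 7 of FIG. 4 is performed. For example, each processor copies two ranges of the N FFT source data from shared memory to its local memory: a range A, and the range B located at the bit-reversed positions of range A. It then writes the range A data from local memory to range B in shared memory (their bit-reversed positions), and the range B data from local memory to range A in shared memory.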
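Second, process 8 of FIG. 4 is performed. Each processor copies from shared memory to its local memory the Q input data of one small FFT, a different small FFT for each processor, taken from the N FFT source data. The data processing section then computes a small FFT on these input data, performing normalization one or more times during the computation. The resulting Q results and one normalization amount are written back to shared memory using the address generation section. The results are stored at the same addresses that held the small FFT's input data. The normalization amount is stored at an address corresponding to the number of the small FFT processed.

Third, process 9 of FIG. 4, i.e., the procedure of FIG. 5, is performed. For the second-stage and later small FFTs, the subscripts i of the sets (s_1, s_2, ..., s_Q)_i that each processor is to process are fixed in advance. Processor u processes the sets with subscripts i = u, u + U, u + 2U, ..., u + zU, where z is chosen so that u + zU is the last such value not exceeding S/Q, the number of sets per stage. These values are stored in processor u.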
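The advance assignment of set subscripts is a cyclic distribution. In this sketch, U processors are numbered u = 1 to U and each stage has S/Q sets. The closed form for z is not reproduced; the range simply stops at the last subscript not exceeding S/Q. `subscripts_for` is a hypothetical name.

```python
def subscripts_for(u, n_proc, n_groups):
    """Subscripts i = u, u + U, u + 2U, ... (<= S/Q) handled by processor u."""
    return list(range(u, n_groups + 1, n_proc))


n_groups, n_proc = 16 // 4, 3               # S/Q for N = 64, Q = 4; U = 3
plan = {u: subscripts_for(u, n_proc, n_groups) for u in range(1, n_proc + 1)}
print(plan)                                 # {1: [1, 4], 2: [2], 3: [3]}
# Every set is assigned to exactly one processor.
assert sorted(i for lst in plan.values() for i in lst) == list(range(1, n_groups + 1))
```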
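Process 21 of FIG. 5 is carried out in the data processing section. Starting from the subscripts i given in advance for processor u, it derives (s_1, s_2, ..., s_Q)_i using Eq. (1).

Process 23 of FIG. 5 can be carried out by the following procedure.

Step 1: From the small FFT numbers (s_1, s_2, ..., s_Q)_i obtained in process 21, transfer the Q normalization amounts E(r-1, s_1), E(r-1, s_2), ..., E(r-1, s_Q) from shared memory to local memory. These are the amounts needed for digit alignment, i.e., for evaluating Eq. (3).

Step 2: Compute the digit-alignment constants D_w (w = 1, 2, ..., Q) from Eq. (3) in the data processing section. Write each result into the local-memory location that held E(r-1, s_w).

Step 3: Transfer the Q input data of H(r, s_q) from shared memory to local memory. The shared-memory addresses of these input data are derived using Eq. (2).

Step 4: Using the data processing section, multiply the input data of H(r, s_q) in local memory by the constants D_w obtained in Step 2. Write the results back to the same local-memory addresses.

Process 24 of FIG. 5 then continues as follows.

Step 5: Using the data processing section, perform (Q/2) × P stages of butterfly operations on the digit-aligned input data of H(r, s_q) from Step 4. Write the results back to the same local-memory addresses. For example, an in-place FFT algorithm may be used, with the twiddle factors given by Eq. (4).

Step 6: Transfer the Q results and the one normalization amount of H(r, s_q) obtained in Step 5 to shared memory. The results are stored at the same addresses that held the input data of H(r, s_q). The normalization amount is stored at an address corresponding to s_q, the number of the small FFT processed.

In Step 5, normalization may be applied to the input data of the first-stage butterflies of H(r, s_q). In that case, the multiplication that normalization requires for each input datum can be combined with the multiplication by the constant D_w in Step 4.

Conditional branch 25 and process 26 of FIG. 5 repeat processes 23 and 24 Q times and can be implemented with a hardware or software counter in the control section. Of the Q repetitions of process 23, Steps 1 and 2 need to be executed only the first time. Conditional branch 27 of FIG. 5 is implemented by comparing the subscripts i of the sets (s_1, s_2, ..., s_Q)_i given in advance for processor u with the subscripts already processed.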
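Fourth, process 10 of FIG. 4 is performed. It is needed because the S small FFTs of the final (R-th) stage each perform their own normalization, so the normalization amounts of their results differ from one small FFT to another. First, the smallest of the S normalization amounts is found, for example by having one processor search all of them. Next, the results of each small FFT are multiplied by a constant: the minimum normalization amount divided by the normalization amount of those results. This step can be shared by having each processor apply the constant to the results of different small FFTs.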
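In summary, the N-point FFT is performed on the configuration of FIG. 7. The data used for small FFTs requiring equal normalization amounts are transferred sequentially to the same processor, which computes those small FFTs. This reduces the number of normalization amounts each processor needs. Normalization amounts in shared memory are therefore accessed less often, and the overhead caused by limited memory and data-transfer capacity is reduced. A faster FFT computation device can thus be realized.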
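A third embodiment of the present invention is described below. FIG. 8 shows the configuration of the third embodiment. In FIG. 8, 70 is a data transfer unit and 71 to 73 are processors. The data transfer unit consists of a bus 74, over which the data a processor needs are transferred from other processors. Each processor consists of a local memory 47, a data processing section 48, an address generation section 49 and a control section 50. Under instructions from the control section, the address generation section and the data processing section work together to process data in local memory. In this embodiment, the local memory of each processor has at least two areas:

- area 81, which stores the source data, intermediate results and final results of the FFT;
- area 82, which stores the normalization amounts of these data.

These data are thus stored distributed across the processors.

The N-point FFT of this embodiment can be computed by the same procedure as in the second embodiment. The difference is that each processor exchanges the data it needs by directly accessing the other processors' local memories over the bus of the data transfer unit. A small FFT is therefore computed on a processor by one of the following methods.

Method 1:
- The predetermined normalization amounts, held in the local memories of the processors, are transferred to the local memory of the processor concerned.
- The input data of the small FFTs having those normalization amounts, also held in the processors' local memories, are then transferred sequentially to that processor.
- The processor waits until it detects that the results and normalization amount of the small FFT it computed previously have been read. It then computes the small FFT and stores the results and their normalization amount in its own local memory.

In Method 1, each processor takes the initiative in reading the input data and their normalization amounts.

Method 2:
- The processor detects that the predetermined normalization amounts, and the input data of the small FFTs having those normalization amounts, have been stored in its local memory.
- It computes the small FFT by sequentially reading those normalization amounts and input data from its local memory.
- It transfers the results and their normalization amount to the local memory of the processor that will use those results in its own computation.

In Method 2, each processor takes the initiative in storing the results and their normalization amounts.

At the start of processing, the N source data of the FFT are assumed to be already stored, distributed across the processors' local memories. As in the second embodiment, each normalization amount needs to be transferred only once per set of small FFTs that require the same normalization amounts.

In this embodiment, two kinds of area are distributed across the processors' local memories: the area storing at least the source data, intermediate results and final results of the FFT, and the area storing their normalization amounts. It is also possible to distribute only the area storing the FFT data, or only the area storing the normalization amounts.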
格納する領域のみを９分散配置することも可能である。以上のように、第８図に示す構成を用い、且つ必要な正
規化量が等しい小ＦＦＴの演算に用いるデータを、同一
のプロセッサに逐次転送して、当該小ＦＦＴの演算を行
うことにより、Ｎ点ＦＦＴを行う。すると、１つの正規
化量は、ある１つのプロセッサしか必要としなくなるの
で、正規化量をプロセッサ間で転送する回数を低減でき
る。よって、メモリやデータ転送回路の能力の制限から
くるオーバヘッドの発生を低減でき、より高速なＦＦＴ
演算装置を実現できる。A first embodiment of the present invention will be described below. FIG. 3 schematically shows the relationship between the butterfly calculations that constitute the N-point FFT and the divided small FFTs. Here, 1 represents (N/2)×M stages of butterfly operations required for N-point FFT. however. M = log, N. Now, it is assumed that the size of the small FFTs 2 to 6 is (Q/2) times P stages of butterfly calculation. However, p=log, Q. At this time, N point FF
T can be divided into R stages×S small FFTs. Here, R
=M/P and S=N/Q. In addition. In Figure 3, take the upper left as the origin and move r steps to the right from there. Small FFTs at S positions below, r-th stage S-th small FF
Called T, H(r, 5) (r=1,2.-R; 5==1
．． 2. ..., S). FIG. 4 shows a processing procedure when the R stages×S small FFT shown in FIG. 3 is processed by U processors. First, process 7 of rearranging the N-point FFT source data (N pieces) in bit-reverse order is performed. 9th stage: R stages x S small FFTs
Perform calculations 8 and 9. Finally, digit alignment processing for all calculation results 10
By performing this, the N-point FFT is completed. When performing independent normalization in the calculation process of each vehicle FFT, 9-digit matching processing is required in the calculation of the small FFT from the second stage onwards. Therefore, in order to reduce the overhead in this digit alignment process, we implemented R stages x S small FFTs.
Of operations 8 and 9. The calculation 9 of the small FFT in the first stage (r≧2) is performed according to the processing procedure shown in FIG. FIG. 5 shows the r-th stage (r≧2.
) shows the processing procedure of the small FFT. First, the rth
The number (S of H(r, s)) of the small FFTs that require the same normalization amount among the step small FFTs is derived 21. There are Q such r-th stage small FFTs, and what is their number? (Sit
82. ”'j 5Q) 1 (i=1.2..., S/
Let us express it as Q). Here, the subscript i is a unique value corresponding to the Q r-th stage small FFTs, and each number "118!I...fiQ
There is a one-to-one correspondence. Therefore, if the subscript i is different, the numbers $11821..., se are all different. Next, in process 21, (S engineering, s2..., 5
Q) Some Sq in 1 (q=1, 2..., Q)
Processing 22 of the r-th stage small FFTH (r, S't) is performed. The process 22 consists of a process 23 that performs digit alignment of the input data of H(r, sq), and a process 24 that performs (Q/2)×P stages of butterfly calculations. Note that in the process 24, normalization is performed during the calculation process. This processing in step 22 is performed by the conditional branching in step 25 and the updating of the number of the small FFT to be processed in step 26.
FTH (r, sl) t H (re 82) g...
- It is repeated until all the processing of g H (re sQ ) is completed. After completing the processing in steps 21 to 26 above, the conditional branch in step 27 allows Q r-th stage small FTH(
r, s, ) , H(r, s2) , −, H(r
, sQ), which has not yet been processed and which should be processed by the processor U. This consists of Q small FFTs to be processed by processor U in advance.
The number (sl, s2.'', sQ) of the subscript i is determined, and from the comparison with the processed subscript, or the Q small FFTHs (
r, Sl), H(r s 9z) + ”'+ H(r
This can be done by checking the presence or absence of g sQ). As a result, if there is something to be processed by processor U. Repeat steps 21 to 26 again. At this time, the process 21 derives the numbers (51, 82, . . . ', 5Q) t of the Q r-th stage small FFTs having this new subscript i. Next, regarding the processing of 21 and 22 in FIG. Explain in detail. 21. The numbers Sq (q=1, 2, . . . , Q) of the Q r-th stage small FFTs are derived from the ninth-order equation. 5q=Q"-"xdiv((i 1)/Q"-")+
mod((i −1)/Q””) +((1-1)XQ”−”+1 (1) However, e
l ”(S1tsl+”・tsQ)l subscript ex=L
2,..., (S/Q). div (u/v): quotient of u/v. sod (u/v): remainder of u/v. FIG. 6 is a data flow diagram showing the contents of the processing of 22, that is, the processing of H(r, Sq). H(r
, sq ) means h (0) to h(Q-1
), a 9-digit matching process 31 and a butterfly operation 32 of (Q/2) xp stages are performed. 5th each
6H (resq) input data h (0) to h (Q-
1) and the original data of N-point FFT x(0) to x(N-1)
The relationship is as follows. However, it is assumed that the original data of the N-point FFT has already been rearranged in bit-reverse order. h (k) = x (Q'Xdiv((sq -1)/Q''-1
))+mod((s q −1)/Q”−1))+k
XQ"-") (2) However, h (k
): The input data of the small FFTH (r, sq). k=0°1,... x(n): n-th original data of N-point FFT. n wealth 0,
1. ... div (u/v): quotient of u/v. l1od (u/v): remainder of u/v. The digit alignment process in No. 31 is performed by adding a constant Dw to the input data h (k) of H (r, sq) according to the respective normalization amounts.
This is done by multiplying by (W = 1 e 2 t −+ Q). Q r-th stage small F derived from equation (1)
Dl in FT can be obtained from the 9th equation, where E (r, sq) is the normalized amount in H (r, Sq). Dw”: 5in(E (r-1, s zL E (r-
1, S zL-...E (r-1, s e))/E
(r-1, s-) (3) However, g I@In (u
lIuzg"'g uv): uLHL12g°°
. Minimum value of uv. In other words, the amount of normalization required to calculate Q small FFTs is:
Both are E (r-1* 51) I E (r-1s
sz) e −E (r−1) SQ) . The processing of the 32 (2/Q) xp stages of butterfly computation is algorithmically the same as the Q-point FFT computation, and therefore can be realized by a generally used FFT processing algorithm. however. The value of the twiddle factor multiplied in each butterfly operation is the Q-point FFT
This is different from the case of Now, the 0th stage (u = 1, 2, -P
) among (Q/2) butterfly operations, Vf from above
The twiddle factor used in the butterfly calculation of hIll(v=1.2...Q) is expressed as @Wuv. This W u v is derived from the 9th order equation. Wuv =exp(-j(2π/N)Xc)c
−=Ilod((v −1)/2 (u”)) X
Qcr-x>+mod((sq 1)/Q"-")
(4) In other words, the processing of 32 uses the FFT algorithm as the processing algorithm, and the twiddle factor to be squared is (5
) can be realized by deriving from the equation. As described above, among the small FFTs of the r-th stage (r≧2), the numbers SM (
q = 1 t 21 ... * Q) is derived from equation (1), and the small FFTH (r, s) of these numbers is calculated. H(r+sz) e...* H(r, sQ
) are processed by the same processor. Then, the normalization amounts required for these small FFT operations processed by the same processor become equal, so the number of normalization amounts to be transferred to the processor can be reduced. Or these normalized quantities. Since there is a one-to-one correspondence with a small FFT processed by the same processor, one normalized amount requires only one processor, and the number of times the normalized amount is transferred between processors can be reduced. Therefore. High-speed memory and data transfer circuits are no longer required. Alternatively, faster FF
It becomes possible to calculate T. A second embodiment of the present invention will be described below. FIG. 7 is a configuration diagram of the second embodiment. In FIG. 7, 60 is a data transfer unit, and 41 to 43 are processors. data transfer unit. It is composed of a shared memory 40 and a bus 44. At least the FFT original data is on the shared memory. It has an area 45 for storing intermediate calculation results and final calculation results, and an area 46 for storing normalized amounts of these data. Data needed by the processor is then transferred from the shared memory via the bus. Each processor has nine local memories 47 and a data processing section 48.
Address generation section 49. It is composed of a control section 50. Then, the address generation section and data processing section operate organically according to instructions from the control section. Performs processing on data in shared memory or local memory. By the way, since data transfer between the shared memory and each processor is performed by a single bus, if a plurality of processors access the shared memory at the same time, a data collision occurs on the bus. This problem is assumed to be solved by using a control unit of each processor to slightly shift the timing at which each processor accesses the shared memory, or by using a bus arbiter. Such an FFT calculation device can be constructed by combining two commercially available multipliers, memories, etc., or by using a DSP (digital processor).
Signal processor: digitalSignal
This can be realized by using, for example, The N-point FFT calculation according to this embodiment is performed based on the FFT calculation method described in the first embodiment. The flow of data and the operation of each part when processing the N-point FFT according to the two embodiments will be described below with reference to FIGS. 4 and 5. It is assumed that the FFT original data (N pieces) are already stored on the shared memory at the time of starting the process. First, the process 7 in FIG. 4 is performed. 2. For example, each processor transfers N FFT originals A on the shared memory and the original data of range B, which is the bit reverse position thereof, to the local memory. Then, data in range A on the home memory is transferred to range B on the shared memory, which is its bit reversed position.9 Data in range B on the local memory is transferred to range MA on the shared memory, which is its bit reversed position. This is achieved by doing. Second, the process 8 in FIG. 4 is performed. This is after each processor transfers small FFT input data (Q pieces), which are different for each processor, to the local memory among the N pieces of FFT original data on the shared memory. The data processing unit performs a small FFT calculation that performs normalization at least once on the input data on the local memory during the calculation process, and the resulting calculation results (Q pieces) and the normalization amount (1
This is achieved by transferring the data (individuals) to the shared memory again using the address generator. The calculation result is stored at the same address where the input data of the small FFT was stored, and the normalized amount is stored at the address corresponding to the number of the processed small FFT. Third, the process 9 in FIG. 4, that is, the process in FIG. 5 is performed. In addition, in the processing of small FFTs from the second stage onwards, the numbers of Q small FFTs to be processed by each processor (sxe
The subscript i of sz+"'t 1iQ)1 shall be determined in advance as follows. That is, the number of Q small FFTs to be processed by processor U (51
The subscript i of 1521'''psQ) is: u, u+U, u+2U, -, u+zU However, z:=div(S/ (QXU))+div(m
Let ad(S/(QXU))/u)-1. It is assumed that these values are stored in the processor U. The process 21 in FIG. 5 is performed in the data processing section. From the subscript i of the number of Q small FFTs to be processed by the processor U given in advance (Sats2*"'+SQ) +, derive this (Szs Sat"'t SQI using equation (1)) The process at 23 in Figure 5 can be performed by the following steps: Step 1: The small FFT number obtained in the process at 21 (S□
, s2. ..., l! Q) Q normalization amounts (E (
r-1,sz) s E (r-1゜st) l ・
E(r-1,sQ)) is transferred from the shared memory to the local memory. Step 2: From formula (3), the data processing unit calculates the constant (w=1.2.-, Q) necessary for two-digit matching, and stores the result in the local memory as E(r-1, s, ) is written to the storage location. Step 3: Transfer Q pieces of input data of H(re sq ) from the shared memory to the local memory. At this time, the address of the input data on the shared memory is derived using equation (2). Step 4: The input data of H(r, Sq) on the local memory is given the constant calculated in step 2 using the data processing unit.
Multiply by , and write the result to the same address in local memory. Further, the process 24 in FIG. 5 is subsequently performed in accordance with the following procedure. Step 5: Using the data processing unit, perform (2/Q) × P butterfly operations on the input data of H (re Sq) that has been digit-aligned in the local memory obtained in step 4, and Write the result to the same address in local memory. For example, In-Place is used as a processing algorithm.
The type FFT algorithm is used. however. It is assumed that the twiddle factor determined from equation (5) is used. Step 6: As a result of step 5, the operation results (Q pieces) of H(r, Sq) and the normalized amount (1 piece) are transferred onto the shared memory. The calculation result is stored at the same address where the input data of H (r. Sq) was stored.
The normalized amount is stored at the address corresponding to the processed small FFT number sq. Here, when normalizing the input data of the first stage butterfly operation of H (r, sq) in step 5, the multiplication of each input data necessary for normalization and the multiplication by the constant Dw in step 4 are performed. Can be done at the same time. The conditional branch at 25 and the processing at 26 in FIG.
This is for repeating the process in step 4 Q times, and can be realized by a hardware or software counter in the control unit. Further, among the 23 processes that are repeated Q times, it is only necessary to execute the steps 1 and 2 for the first time. Conditional branch 27 in FIG. 5 indicates the number of Q small FFTs ("1982
9..., 5Q) This is realized by comparing the subscript i of 1 with the processed subscript. Fourth, the process 10 in FIG. 4 is performed. this is. This is done because different normalizations are performed in the eight small FFTs in the final stage (nth stage), and the normalization amount of the calculation result is different for each small FFT.6 First, the normalization amount of the eight small FFTs is From among them, derive the smallest one. this is. This is possible by a certain processor searching all normalized quantities. Next, the calculation result of each car's FFT is multiplied by a certain value. This constant value is the number obtained by dividing the obtained minimum normalization amount by the normalization amount of the calculation result. This can be done by multiplying the calculation results of the small FFTs, which are different from each other, by the above-mentioned constant value. As described above, by using the configuration shown in FIG. 7 and sequentially transferring data used for small FFT calculations with the same required normalization amount to the same processor to perform the small FFT calculations, Perform N-point FFT. This reduces the number of normalization quantities required by the processor. Therefore, the frequency of accessing the normalized amount on the shared memory can be reduced, and the occurrence of overhead due to limitations on the capacity of the memory and data transfer circuit can be reduced. That is, a faster FFT calculation device can be realized. A third embodiment of the present invention will be described below. FIG. 8 is a configuration diagram of the third embodiment. In FIG. 8, 70 is a data transfer unit, and 71 to 73 are processors. data transfer unit. It consists of a bus 74. Data required by the processors is transferred from multiple processors via a bus. Each processor has nine local memories 47 and a data processing section 48.
It also has an address generation section 49 and a control section 50. The address generation section and the data processing section operate in concert according to instructions from the control section to process the data in the local memory. In this embodiment, the local memory of each processor has an area 81 for storing at least the FFT original data, intermediate calculation results, and final calculation results, and an area 82 for storing the normalization amounts of these data; these data are stored in a distributed manner across the plurality of processors. The N-point FFT calculation of this embodiment can be realized by the same procedure as in the second embodiment. However, in this embodiment, each processor obtains the data it needs by directly accessing the local memories of the plurality of processors via the bus of the data transfer unit. That is, the small FFT calculations in each processor are performed by one of the following methods.

Method 1: A predetermined normalization amount in the local memory of one of the processors is transferred to the local memory of the appropriate processor, and the input data of the small FFTs having that normalization amount are then transferred sequentially to that processor. After detecting that the calculation result and normalization amount of the small FFT it performed previously have been referenced, the processor performs the small FFT calculation and stores the obtained calculation result and its normalization amount in its own local memory. Here, each processor takes the initiative in reading the input data and their normalization amounts.

Method 2: After detecting that a predetermined normalization amount and the input data of the small FFTs having that normalization amount have been stored in its local memory, the processor sequentially reads the normalization amount and the input data from its local memory, performs the small FFT calculations, and transfers the obtained calculation result and its normalization amount to the local memory of the processor that performs the next calculation using that result. Here, each processor takes the initiative in storing the calculation results and their normalization amounts.

Note that at the start of processing, the original FFT data (N points) have already been distributed and stored in the shared memory. As in the second embodiment, it suffices to transfer the normalization amount required for small FFT calculations only once for each set of small FFT calculations that require the same normalization amount. In this embodiment, both the area for storing at least the FFT original data, intermediate calculation results, and final calculation results and the area for storing the normalization amounts of these data are distributed over the local memories of the processors. It is also possible to distribute only the area for storing at least the FFT original data, intermediate calculation results, and final calculation results, or only the area for storing the normalization amounts.

As described above, with the configuration shown in FIG. 8, the N-point FFT is performed by sequentially transferring the data used for small FFT calculations that require the same normalization amount to the same processor and performing those small FFT calculations there. Each normalization amount is then needed by only one processor, so the number of times normalization amounts are transferred between processors can be reduced. Therefore, the overhead caused by the limited capacity of the memory and data transfer circuits can be reduced, and a faster FFT calculation device can be realized.
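The final alignment step (process 10 in FIG. 4) can be sketched in Python as follows. This is a minimal illustration under assumptions the text does not state: the normalization amount is modeled as a power-of-two scale factor applied to a whole small-FFT output block (block floating point), so that stored value = true value × scale; the recursive radix-2 routine merely stands in for one small FFT; and the names small_fft, normalize and align_to_minimum are hypothetical. As the text describes, each block is multiplied by the minimum normalization amount divided by its own amount.

```python
import cmath


def small_fft(x):
    """Recursive radix-2 FFT (length must be a power of 2).

    Hypothetical stand-in for one 'small FFT'; not the patent's butterfly layout.
    """
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = small_fft(x[0::2])
    odd = small_fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def normalize(block):
    """Block floating point (assumed model): halve the block until every |value| < 1.

    Returns (stored, scale) with stored[i] == true[i] * scale, so `scale`
    plays the role of the block's normalization amount.
    """
    peak = max(abs(v) for v in block)
    scale = 1.0
    while peak * scale >= 1.0:
        scale /= 2.0
    return [v * scale for v in block], scale


def align_to_minimum(results):
    """Sketch of process 10: bring every block to the smallest normalization amount.

    Each block is multiplied by (minimum amount / its own amount), after which
    all blocks share the common scale s_min.
    """
    s_min = min(scale for _, scale in results)
    aligned = [[v * (s_min / scale) for v in block] for block, scale in results]
    return aligned, s_min


# Usage: three final-stage small FFTs that ended up with different normalizations.
blocks = [[1.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.0, 0.0], [8.0, -8.0, 8.0, -8.0]]
results = [normalize(small_fft(b)) for b in blocks]
aligned, s_min = align_to_minimum(results)
print([s for _, s in results], s_min)  # [0.0625, 1.0, 0.015625] 0.015625
```

Because every constant s_min / scale is at most 1 under this model, alignment only shrinks values, so no block leaves the normalized range; dividing the aligned blocks by s_min recovers the unscaled FFT values.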

[Effect of the invention]

According to the present invention, when the number of input data of a small FFT is Q, a processor can perform Q small FFT calculations using the same normalization amounts at each stage, so the number of normalization amounts the processor requires can be reduced to 1/Q. Alternatively, since each normalization amount is needed by only one processor, the number of times normalization amounts are transferred between processors can be reduced to 1/Q. Therefore, high-speed memory and data transfer circuits are not required; alternatively, the problem of overhead arising from the limited capacity of the memory and data transfer circuits can be reduced. In other words, faster FFT calculation becomes possible.
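The 1/Q reduction can be checked with a small counting experiment. The Python sketch below assumes a radix-Q data flow in which Q next-stage small FFTs each read one output from the same Q previous-stage small FFTs (as small FFTs 113 to 116 draw on 101 to 104 in FIG. 1), and it counts each distinct (processor, normalization amount) pair as one transfer. The round-robin baseline, the sizes Q = G = 4, and all function names are hypothetical; the baseline is a worst case chosen for comparison, not the conventional method cited in the patent.

```python
from collections import defaultdict


def transfers_needed(assignment, sources):
    """Count distinct (processor, normalization amount) pairs; each pair is one transfer."""
    needed = defaultdict(set)
    for job, proc in assignment.items():
        needed[proc] |= sources[job]
    return sum(len(s) for s in needed.values())


def round_robin(sources, n_procs):
    """Hypothetical baseline: deal small FFTs out in order, ignoring normalization amounts."""
    return {job: i % n_procs for i, job in enumerate(sorted(sources))}


def group_by_sources(sources, n_procs):
    """Give all small FFTs that need the same normalization amounts to one processor."""
    groups = defaultdict(list)
    for job, src in sources.items():
        groups[frozenset(src)].append(job)
    assignment = {}
    for i, key in enumerate(sorted(groups, key=sorted)):
        for job in groups[key]:
            assignment[job] = i % n_procs
    return assignment


Q, G = 4, 4  # Q-input small FFTs, G groups per stage (hypothetical sizes)
# Next-stage small FFT (g, q) reads one output from each of the previous-stage
# small FFTs g*Q .. g*Q + Q - 1, so it needs those Q normalization amounts.
sources = {(g, q): set(range(g * Q, g * Q + Q)) for g in range(G) for q in range(Q)}

naive = transfers_needed(round_robin(sources, Q), sources)
grouped = transfers_needed(group_by_sources(sources, Q), sources)
print(naive, grouped)  # 64 16
```

With Q = 4, grouping cuts the count from 64 to 16, i.e. to 1/Q, and in the grouped plan each normalization amount is held by exactly one processor.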

[Brief explanation of drawings]

FIG. 1 is a data flow diagram of a 64-point FFT processed by dividing it into small FFTs; FIG. 2 is a data flow diagram of a small FFT; FIG. 3 is a diagram schematically showing the relationship between an N-point FFT and the small FFTs into which it is divided; FIG. 4 is a flowchart showing the processing procedure of an N-point FFT according to an embodiment of the present invention; FIG. 5 is a flowchart showing the processing procedure of the r-th stage small FFTs by the U-th processor according to an embodiment of the present invention; FIG. 6 is a data flow diagram of a small FFT; and FIGS. 7 and 8 are configuration diagrams each showing an embodiment of the present invention.

Explanation of symbols:
101 to 136: small FFTs
8: calculation of the first-stage small FFTs
9: calculation of the r-th stage small FFTs
21: derivation of the numbers (s_1, s_2, ..., s_Q) of the r-th stage small FFTs that require the same normalization amount
22: processing of H(r, s_q)
23: digit alignment of the input data of H(r, s_q)
24: processing of the Q/2 × P stages of butterfly operations constituting H(r, s_q)
60, 70: data transfer units
41, 42, 43, 71, 72, 73: processors
40: shared memory
44, 74: buses
47: local memory

Claims

[Scope of Claims]
1. An FFT calculation method in which an FFT calculation is divided into a plurality of small FFTs that are calculated in stages, normalization being performed at least once during the small FFT calculation process at each stage to maintain calculation accuracy, characterized in that small FFT calculations whose input data have equal normalization amounts are performed by the same processor.
2. An FFT arithmetic device comprising: a plurality of processors, each capable of calculating a small FFT that is part of an FFT and of performing normalization at least once during the calculation process to maintain calculation accuracy; and data transfer means connected to each processor and capable of transmitting and receiving the data necessary for the FFT operation.
3. The FFT calculation device according to claim 2, characterized in that each processor obtains, via the data transfer means, the input data of small FFTs requiring the same normalization amount together with that normalization amount, sequentially performs the calculations of those small FFTs, and transfers the obtained calculation results and their normalization amounts via the data transfer means.
4. The FFT arithmetic device according to claim 3, characterized in that each processor has at least a local memory capable of exchanging data with the data transfer means, and data processing means capable of performing a small FFT operation that normalizes the data in the local memory one or more times during the operation process.
5. The FFT arithmetic device according to claim 4, characterized in that the data transfer means includes a shared memory capable of storing at least the FFT original data, intermediate calculation results, final calculation results, and normalization amounts, and means capable of sending and receiving data between the shared memory and each processor.
6. The FFT calculation device according to claim 5, characterized in that, after a predetermined normalization amount is transferred from the shared memory to the local memory of an appropriate processor, the input data of the small FFTs having that normalization amount are sequentially transferred from the shared memory to that processor, whereby the small FFT calculations are performed, and the obtained calculation results and their normalization amounts are transferred from the processor to the shared memory.
7. The FFT arithmetic device according to claim 4, characterized in that the data transfer means includes means capable of exchanging data at least between the processors.
8. The FFT arithmetic device according to claim 7, characterized in that at least the FFT original data, intermediate calculation results, final calculation results, and normalization amounts are stored in a distributed manner on the local memories of the processors, and data on other processors are accessed via the data transfer means.
9. The FFT calculation device according to claim 8, characterized in that, after a predetermined normalization amount on the local memory of one of the plurality of processors is transferred to the local memory of an appropriate processor, the input data of the small FFTs having that normalization amount on the local memories of the plurality of processors are sequentially transferred to that processor, and, after it is detected that the calculation result and normalization amount of the small FFT previously performed by that processor have been referenced, the small FFT calculation is performed and the obtained calculation result and its normalization amount are stored in the local memory of that processor.
10. The FFT calculation device according to claim 8, characterized in that it is detected that a predetermined normalization amount and the input data of the small FFTs having that normalization amount are stored in the local memory of each processor, each processor performs the small FFT calculations by sequentially reading the normalization amount and the input data from its local memory, and the obtained calculation result and its normalization amount are transferred to the local memory of the processor that performs a calculation using that calculation result.