JPS59189474A - High speed Fourier transformation operating device - Google Patents

High speed Fourier transformation operating device

Info

Publication number
JPS59189474A
Authority
JP
Japan
Prior art keywords
memory
data
fft
written
speed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP58065130A
Other languages
Japanese (ja)
Other versions
JPH0230062B2 (en)
Inventor
Hideo Nagai
秀夫 長井
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GE Healthcare Japan Corp
Original Assignee
Yokogawa Medical Systems Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yokogawa Medical Systems Ltd
Priority to JP58065130A
Publication of JPS59189474A
Publication of JPH0230062B2
Granted

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

PURPOSE: To shorten the time required for the operation by providing two high-speed access memories of minimum capacity and skillfully controlling the input/output memory access of a large-capacity memory. CONSTITUTION: One data point is taken from each of the small blocks made by dividing each block into four, and these points are written into a high-speed access memory HSM1. Next, four complex data are written from the large-capacity memory MS into a memory HSM2 while, at the same time, the fast Fourier transform (FFT) operation is performed on the memory HSM1 and the result is written back into HSM1. Then the FFT operation is performed on the memory HSM2 and the result is written back into HSM2; at the same time, the four complex data of the memory HSM1 are written into the memory MS, and the next data are then written from MS into HSM1. Next, the FFT operation is performed on the memory HSM1 and the result is written back into HSM1; at the same time, the data of the memory HSM2 are written into MS, and the next data are then read from MS.

Description

[Detailed description of the invention]

The present invention relates to an FFT arithmetic device that performs the fast Fourier transform (hereinafter abbreviated as FFT).

Conventionally, in a general-purpose pipelined array processor (hereinafter abbreviated as AP), data memory access normally involves a time delay, which lengthens the time until additions, subtractions, and multiplications complete.

Reading data from (or writing data to) the memory that stores the input (or output) data of an operation incurs a delay. In an FFT operation, which also requires large amounts of data to be read or written continuously, the operation (processing) capability therefore ends up limited to the speed of data exchange with the memory.

For example, reading and writing two points of complex data takes 8 cycles in total, 4 cycles for reading and 4 cycles for writing, because each point is represented by two data, a real part and an imaginary part. In this case, even if the 4 multiplications and 6 additions could be started within 6 cycles by processing multiplication and addition in parallel, the 8 cycles of memory access determine the speed of the FFT operation. In other words, the FFT operation speed is limited by the number of memory accesses.
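
The cycle count in this example can be checked with a few lines of arithmetic. The sketch below assumes one memory word is transferred per cycle, which is what the figures in the text imply; the function and constant names are illustrative and do not come from the patent.

```python
WORDS_PER_POINT = 2  # one real part + one imaginary part


def memory_cycles(points):
    """Cycles to read `points` complex values and write them back, one word per cycle."""
    read_cycles = points * WORDS_PER_POINT
    write_cycles = points * WORDS_PER_POINT
    return read_cycles + write_cycles


butterfly_memory = memory_cycles(2)   # two complex points per base-2 butterfly
butterfly_compute = 6                 # cycles quoted in the text for 4 multiplies + 6 adds
bound = max(butterfly_memory, butterfly_compute)  # the slower side sets the pace

print(butterfly_memory, butterfly_compute, bound)
```

With these numbers the memory side (8 cycles) exceeds the arithmetic side (6 cycles), so memory access is the limiting factor.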

In view of this, an object of the present invention is to provide an FFT device that can shorten the time required for the operation by providing, in addition to a low-speed, large-capacity memory for storing FFT input/output data, two high-speed access memories of minimum capacity, and by skillfully controlling the input and output memory addresses of the large-capacity memory.

The present invention is described in detail below with reference to the drawings. First, the algorithm of the method of the present invention is explained. Fig. 1 shows the signal flow of the method: the computation method and data flow for an example with 16 data, the DIT method, the in-place method, and input in normal order. Fig. 2 shows the details of the signal flow for an example with 2·ND data in a block, the DIT method, the in-place method, and input in normal order.

The basics of the FFT algorithm in the method of the present invention are as follows. This FFT is a base-2 FFT for N complex data, using the in-place method, input data in normal order, and the DIT method. The calculation procedure is shown separately for the cases $N = 2^{2\gamma}$ and $N = 2^{2\gamma+1}$.

(a) Case of $N = 2^{2\gamma}$

(1) Set $N = 2^{2\gamma}$, $\mathrm{ND} = N/2$, $\ell = 1$.

(2) Set $K = 0$, $M = 0$.

(3) For $k = K, K+1, K+2, \ldots, K + \tfrac{1}{2}\mathrm{ND} - 1$, perform the following operations.

$A = C_{\ell-1}(k) + C_{\ell-1}(k+\mathrm{ND}) \cdot W(M)$  (1-1)
$C = C_{\ell-1}(k) - C_{\ell-1}(k+\mathrm{ND}) \cdot W(M)$  (1-2)
$B = C_{\ell-1}(k+\tfrac{1}{2}\mathrm{ND}) + C_{\ell-1}(k+\tfrac{3}{2}\mathrm{ND}) \cdot W(M)$  (2-1)
$D = C_{\ell-1}(k+\tfrac{1}{2}\mathrm{ND}) - C_{\ell-1}(k+\tfrac{3}{2}\mathrm{ND}) \cdot W(M)$  (2-2)
$C_{\ell}(k) = A + B \cdot W(2M)$  (3-1)
$C_{\ell}(k+\tfrac{1}{2}\mathrm{ND}) = A - B \cdot W(2M)$  (3-2)
$C_{\ell}(k+\mathrm{ND}) = C + D \cdot W(2M+2)$  (4-1)
$C_{\ell}(k+\tfrac{3}{2}\mathrm{ND}) = C - D \cdot W(2M+2)$  (4-2)

Here $W(i)$ is the rotation factor selected through $p = \mathrm{BitReverse}(i)$, and $j^2 = -1$.

(4) Set $M = M + 2$, $K = K + 2 \cdot \mathrm{ND}$; if $K < N$, return to (3).

(5) Set $\mathrm{ND} = \mathrm{ND}/4$, $\ell = \ell + 1$; if $\ell \le \gamma$, return to (2).

(6) For $k = 0, 1, \ldots, N-1$ (rearranging the output data array into normal order): let $q = \mathrm{BitReverse}(k)$; if $q > k$, exchange $C_{\ell}(q)$ and $C_{\ell}(k)$.

(b) Case of $N = 2^{2\gamma+1}$

(1) Set $N = 2^{2\gamma+1}$, $\mathrm{ND} = N/2$, $\ell = 1$.

(2) Set $K = 0$, $M = 0$.

(3) For $k = K, K+1, K+2, \ldots, K + \tfrac{1}{2}\mathrm{ND} - 1$, perform the following operations.

$A = C_{\ell-1}(k) + C_{\ell-1}(k+\mathrm{ND}) \cdot W(M)$
$C = C_{\ell-1}(k) - C_{\ell-1}(k+\mathrm{ND}) \cdot W(M)$
$B = C_{\ell-1}(k+\tfrac{1}{2}\mathrm{ND}) + C_{\ell-1}(k+\tfrac{3}{2}\mathrm{ND}) \cdot W(M)$
$D = C_{\ell-1}(k+\tfrac{1}{2}\mathrm{ND}) - C_{\ell-1}(k+\tfrac{3}{2}\mathrm{ND}) \cdot W(M)$
$C_{\ell}(k) = A + B \cdot W(2M)$
$C_{\ell}(k+\tfrac{1}{2}\mathrm{ND}) = A - B \cdot W(2M)$
$C_{\ell}(k+\mathrm{ND}) = C + D \cdot W(2M+2)$
$C_{\ell}(k+\tfrac{3}{2}\mathrm{ND}) = C - D \cdot W(2M+2)$

Here $p = \mathrm{BitReverse}(i)$ and $j^2 = -1$, as in case (a).

(4) Set $M = M + 2$, $K = K + 2 \cdot \mathrm{ND}$; if $K < N$, return to (3).

(5) Set $\mathrm{ND} = \mathrm{ND}/4$, $\ell = \ell + 1$; if $\ell \le \gamma$, return to (2).

(6) For $k = 0, 1, 2, \ldots, \tfrac{N}{2} - 1$, perform the following operations.

$C_{\ell}(2k) = C_{\ell-1}(2k) + C_{\ell-1}(2k+1) \cdot W(2k)$  (5-1)
$C_{\ell}(2k+1) = C_{\ell-1}(2k) - C_{\ell-1}(2k+1) \cdot W(2k)$  (5-2)

(7) For $k = 0, 1, 2, \ldots, N-1$ (rearranging the data array into normal order): let $q = \mathrm{BitReverse}(k)$; if $q > k$, exchange $C_{\ell}(q)$ and $C_{\ell}(k)$.

In addition, the method of the present invention takes one data point from each of the small blocks made by dividing each block into four, and performs on these 4 points the processing of two loops of an ordinary FFT at one time, so that the number of accesses to the large-capacity memory is halved.
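
The procedure for cases (a) and (b) can be sketched in software as follows. This is a minimal illustration, not the patent's microprogram. It assumes that the rotation factor is W(i) = exp(-j·2π·p/N) with p the bit reversal of i over log2(N) bits (the exact definition is not legible in the text, and the sign of the exponent is a convention chosen here), and that k runs over one small block of ND/2 points. The names `bit_reverse`, `fft_two_loops_per_pass`, and `naive_dft` are hypothetical.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_two_loops_per_pass(x):
    """Base-2 DIT FFT, natural-order input, two loops fused per pass over the data."""
    n = len(x)
    bits = n.bit_length() - 1
    if n < 1 or n != 1 << bits:
        raise ValueError("length must be a power of two")
    c = [complex(v) for v in x]

    def w(i):
        # ASSUMPTION: W(i) = exp(-j*2*pi*p/N), p = BitReverse(i) over log2(N) bits.
        return cmath.exp(-2j * cmath.pi * bit_reverse(i, bits) / n)

    nd = n // 2
    while nd >= 2:                                   # one pass = two fused loops, step (5)
        half = nd // 2                               # size of one small block
        big_k, m = 0, 0                              # step (2)
        while big_k < n:                             # step (4): next block
            for k in range(big_k, big_k + half):     # step (3): one point per small block
                a = c[k] + c[k + nd] * w(m)
                cc = c[k] - c[k + nd] * w(m)
                b = c[k + half] + c[k + nd + half] * w(m)
                d = c[k + half] - c[k + nd + half] * w(m)
                c[k] = a + b * w(2 * m)
                c[k + half] = a - b * w(2 * m)
                c[k + nd] = cc + d * w(2 * m + 2)
                c[k + nd + half] = cc - d * w(2 * m + 2)
            m += 2
            big_k += 2 * nd
        nd //= 4
    if bits % 2 == 1:                                # case (b): one remaining base-2 loop
        for k in range(n // 2):
            u, v = c[2 * k], c[2 * k + 1] * w(2 * k)
            c[2 * k], c[2 * k + 1] = u + v, u - v
    for k in range(n):                               # restore normal output order
        q = bit_reverse(k, bits)
        if q > k:
            c[k], c[q] = c[q], c[k]
    return c


def naive_dft(x):
    """Direct O(N^2) DFT with the same sign convention, used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]
```

Each pass touches every data point once while doing the work of two ordinary loops, which is the source of the halved large-capacity memory traffic described above.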

Figs. 3 and 4 show the relationship among loops, blocks, and small blocks. Fig. 3 shows the case where data consisting only of real parts or only of imaginary parts are contiguous, and Fig. 4 shows the case where data each consisting of a real part and an imaginary part are contiguous.

Fig. 5 shows an embodiment of an FFT device for executing the algorithm described above. In Fig. 5, RF1 and RF2 are register files each made up of a plurality of registers, and they store the read addresses and write addresses of the large-capacity memory MS. RI1 and RI2 are both registers that supply the increment of the read address and the increment of the store address. SEL1 and SEL2 are selectors that select one of their inputs a1 and a2, and AD1 and AD2 are adders that add addresses. AD1 adds the outputs of RF1 and SEL1 and supplies the result to the address register MA of MS via the selector SEL3.

The output of AD1 is fed back to the corresponding register of RF1 and becomes the next address data. AD2 adds the outputs of RF2 and SEL2 and supplies the result to MA via SEL3. The output of AD2 is stored in the corresponding register of RF2 and becomes the next address data.

H8MI、 I(8H2は高速小容量のメモリで、)、
ISがもの読出しデータの格納、 FFT演算の中間結
果の格納。
H8MI, I (8H2 is a high-speed small capacity memory),
Storage of IS read data and storage of intermediate results of FFT calculations.

FFT演算結果の格納メモリとして使用される。It is used as a storage memory for FFT calculation results.

H8M1.2  は最小サイクル・タイムでのデータの
読出し/書込みの同時動作ができる(読出しアドレスと
書込みアドレスは必ずしも一致しない)。
H8M1.2 is capable of simultaneous data read/write operations with minimum cycle time (read address and write address do not necessarily match).

TBMはFFT演算の定数(W(ロ)など)類を格納し
ておくデータ・メモリ(RAMが使用される)である。
The TBM is a data memory (RAM is used) that stores constants (such as W) for FFT calculations.

ADD is an adder. It adds its two inputs A1 and A2 and can supply the result to HSM1, HSM2, and the A2 input of ADD. The A1 input of ADD is selected from the outputs of HSM1, HSM2, and MUL. The A2 input of ADD is selected from the outputs of HSM1, HSM2, and ADD itself. MUL is a multiplier. It multiplies its two inputs M1 and M2 and can supply the result to HSM1, HSM2, and the A1 input of ADD. The output of TBM is supplied to the M1 input of MUL.

The M2 input of MUL is selected from the outputs of HSM1 and HSM2.

IF is the interface unit to external devices (CPU, data acquisition device, etc.).

Data, address information, control signals, and so on are exchanged through it. Data transfer with the outside takes place between MS and the external device.

CS is a memory that stores the microprogram for the FFT operation, and CTL performs control according to the microprogram. ADD, MUL, the read and write operations of HSM1/HSM2, the read operation of TBM, and so on can operate in parallel.

The operation of the present invention in this configuration is described next.

(1) Preparation for each loop

The FFT input data are stored in MS in advance.

RF1 and RF2 are register files of 8 registers each. The initial values RF1(j), RF2(j) (j = 0, 1, ..., 7) of the registers are set using the following quantities (RF1 is for reading, RF2 is for writing).

RBA: start address of the real data
IBA: start address of the imaginary data
ND: half the number of data in a block

The initial values of RI1 and RI2 are set as follows (an example for the case where the real data and the imaginary data are stored separately).

$\mathrm{RI1} = 1$  (9-1)
$\mathrm{RI2} = \tfrac{3}{2}\mathrm{ND} + 1$  (9-2)
$W(M) = W(0) = 1$  (10)

(2) Write into HSM1 the 4 complex data corresponding to (RBA, IBA); (RBA + ND/2, IBA + ND/2); (RBA + ND, IBA + ND); (RBA + 3ND/2, IBA + 3ND/2) (MS → HSM1).

(3) Write the next 4 complex data from MS into HSM2 (MS → HSM2).

At the same time, perform the FFT operation on HSM1 (HSM1 → HSM1; item (3) of algorithm (a)).

(4) Perform the following operations in parallel.

(i) Store the 4 complex data of HSM1 at the addresses RF2(j) + RI2 for the first data of a block, or RF2(j) + RI1 for data other than the first data of a block (HSM1 → MS). Then read into HSM1 the 4 complex data at the addresses RF1(j) + RI2 for the first data of a block, or RF1(j) + RI1 for data other than the first data of a block (MS → HSM1).

(ii) Perform the FFT operation on the 4 complex data of HSM2 (HSM2 → HSM2; item (3) of algorithm (a)).

(5) Perform the following operations in parallel.

(i) Store the 4 complex data of HSM2 at the addresses RF2(j) + RI2 for the first data of a block, or RF2(j) + RI1 for data other than the first data of a block (HSM2 → MS). Then read into HSM2 the 4 complex data at the addresses RF1(j) + RI2 for the first data of a block, or RF1(j) + RI1 otherwise (MS → HSM2).

(ii) Perform the FFT operation on the 4 complex data of HSM1 (HSM1 → HSM1; item (3) of algorithm (a)).

(6) Repeat (4) and (5) above until all FFT operations for one block are finished. After this processing ends, set M = M + 2, and if the processing of all blocks in this loop has not finished, return again to operation (5).

(7) Store the 4 complex data of HSM2 in MS.

After the processing of one loop ends, set ND = ND/4, and if the processing of all loops has not finished, return to (1).

Steps (1) to (7) above are an example of an algorithm that uses the device of Fig. 5 to achieve high-speed FFT operation beyond the memory access limit.
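
The alternation between the two high-speed memories in steps (2) to (7) can be visualized with a small schedule generator. This is only a sketch of the overlap pattern: it assumes each 4-point group occupies one time slot and ignores block boundaries and address generation. The function name `pingpong_schedule` and the group labels `g0`, `g1`, ... are hypothetical.

```python
def pingpong_schedule(num_groups):
    """Return one (MS transfer, FFT activity) pair per time slot for `num_groups` 4-point groups."""
    bufs = ("HSM1", "HSM2")
    slots = [("load g0 -> HSM1", None)]            # step (2): nothing to compute yet
    for g in range(num_groups):
        cur = bufs[g % 2]                          # buffer being transformed in this slot
        other = bufs[(g + 1) % 2]                  # buffer exchanging data with MS
        transfers = []
        if g >= 1:
            transfers.append(f"store g{g - 1} <- {other}")   # write back previous results
        if g + 1 < num_groups:
            transfers.append(f"load g{g + 1} -> {other}")    # prefetch the next group
        slots.append(("; ".join(transfers) or None, f"FFT g{g} in {cur}"))
    last = bufs[(num_groups - 1) % 2]
    slots.append((f"store g{num_groups - 1} <- {last}", None))  # step (7): final write-back
    return slots


for transfer, compute in pingpong_schedule(4):
    print(f"{transfer or '-':40s} | {compute or '-'}")
```

In every slot except the first and last, an MS transfer on one buffer overlaps with the FFT operation on the other, which is how the memory latency is hidden behind the arithmetic.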

Control of the number of data processing operations (n times per block), control of the number of loops, and so on are performed by CTL according to the microprogram in CS.

The register files RF1 and RF2 store the read addresses or write addresses of the 8 data of 4 points, and after each reference the contents of the register are updated to the next reference address. RI2 is used when calculating the address of the first data of a block (read address / write address), and RI1 is used when calculating the addresses of the other data.
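
The register update rule can be illustrated with a short address generator for one of the four small-block positions. The sketch rests on several assumptions that are not stated legibly in the text: real and imaginary data are stored separately, RI1 = 1, RI2 = (3/2)·ND + 1, and each register starts one RI2 step before its first address so that the first update lands on it. The name `address_stream` and its parameters are hypothetical.

```python
def address_stream(base, nd, n, small_block):
    """Yield the MS addresses visited by one RF register during one loop.

    base        -- start address of the data (e.g. RBA or IBA)
    nd          -- half the number of data in a block (even, >= 2)
    n           -- total number of complex data
    small_block -- which quarter of the block this register follows (0..3)
    """
    ri1 = 1                       # ASSUMPTION: step within a small block
    ri2 = 3 * nd // 2 + 1         # ASSUMPTION: step from the end of one block into the next
    half = nd // 2                # points per small block
    rf = base + small_block * half - ri2   # ASSUMPTION: initial register value
    for i in range(n // 4):       # one address per 4-point group
        first_in_block = (i % half == 0)
        rf += ri2 if first_in_block else ri1   # update the register after each reference
        yield rf


# First loop of a 16-point transform (ND = 8): the four registers walk the four quarters.
print([list(address_stream(0, 8, 16, s)) for s in range(4)])
```

Under these assumptions a single adder and two increment registers are enough to step through every small block, which matches the roles the text gives to RI1 and RI2.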

The present invention is not limited to the embodiment described above; the extensions and variations listed below are also possible.

(1) Extensions and variations of the algorithm, such as an FFT arithmetic device using an algorithm that divides one block into 8 (16, 32, ...) and processes 3 loops (4, 5, ... loops) at once.

(2) Unifying or separating the lines and buses for control signals, data, addresses, and so on, and other connections, couplings, and wiring between the units.

(3) Speeding up processing by using multiple adders (adder/subtractors), for example a unit with a butterfly operator, or multiple multipliers.

(4) Sharing or substituting units
(i) Sharing AD1 and AD2 (SEL3 becomes unnecessary)
(ii) Sharing TBM with HSM1 or HSM2
(iii) Substituting CTL, without CS

(5) Combining or merging units
(i) Including (SEL3,) MA in MS (the case where (SEL3,) MA apparently does not exist)
(ii) Including SEL1 in AD1
(iii) Including SEL2 in AD2
(iv) Merging RI1 and RI2 (SEL1 and SEL2 unnecessary)
(v) Merging RF1 and RF2

(6) Others
• Having two each of RI1 and RI2
• Using ROM for RI1 and RI2 (multiple)
• Having multiple RI1 and RI2
• Using ROM for TBM
• The number of registers in RF1 and RF2 is not limited to 8

As explained above, the present invention provides the following effects.

(1) High-speed FFT operation beyond the memory access limit can be achieved.

(2) A simple and widely applicable base-2 FFT can be realized.

(3) An inexpensive and economical device that does not use a butterfly operator or the like can be built.

(4) The device can be made without losing its characteristics as a general-purpose array processor.

[Brief description of the drawings]

Figs. 1 and 2 show the signal flow of the present invention, Figs. 3 and 4 show the relationship among loops, blocks, and small blocks, and Fig. 5 is a block diagram of the main part of an embodiment of the FFT arithmetic device of the present invention.

RF1, RF2: register files; RI1, RI2: registers; SEL1, SEL2, SEL3: selectors; AD1, AD2: adders; MA: address register; MS: large-capacity memory; HSM1, HSM2: high-speed small-capacity memories; TBM: data memory; ADD: adder; MUL: multiplier; IF: interface.

Claims (2)

[Claims]

(1) An FFT arithmetic device using an FFT operation method characterized by dividing the input data into blocks, further dividing each block into a group of small blocks, and performing FFT operations for a plurality of loops among the data groups taken from the group of small blocks, thereby reducing the number of memory accesses, reducing the delay of memory access, and enabling high-speed operation.

(2) The FFT arithmetic device according to claim 1, characterized in that a plurality of high-speed access memories are used selectively as buffers and, while the input data address group and output data address group of the small blocks in the low-speed, large-capacity memory are controlled, the FFT operation for a plurality of loops on the current data group using one high-speed memory is performed in parallel with the reading of the next data group from the low-speed memory into a high-speed memory after the previous FFT operation result has been stored in the low-speed memory.
JP58065130A 1983-04-13 1983-04-13 High speed Fourier transformation operating device Granted JPS59189474A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58065130A JPS59189474A (en) 1983-04-13 1983-04-13 High speed Fourier transformation operating device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58065130A JPS59189474A (en) 1983-04-13 1983-04-13 High speed Fourier transformation operating device

Publications (2)

Publication Number Publication Date
JPS59189474A true JPS59189474A (en) 1984-10-27
JPH0230062B2 JPH0230062B2 (en) 1990-07-04

Family

ID=13277978

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58065130A Granted JPS59189474A (en) 1983-04-13 1983-04-13 High speed Fourier transformation operating device

Country Status (1)

Country Link
JP (1) JPS59189474A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7702712B2 (en) 2003-12-05 2010-04-20 Qualcomm Incorporated FFT architecture and method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5491151A (en) * 1977-12-28 1979-07-19 Fujitsu Ltd Internal memory control system on array processor
JPS5674747A (en) * 1979-11-26 1981-06-20 Toshiba Corp Parallel operation system

Also Published As

Publication number Publication date
JPH0230062B2 (en) 1990-07-04
