JPS60201472A

JPS60201472A - Matrix product computing device

Info

Publication number: JPS60201472A
Application number: JP5790184A
Authority: JP
Inventors: Akira Sawada; 明澤田
Original assignee: NEC Corp; Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1984-03-26
Filing date: 1984-03-26
Publication date: 1985-10-11

Abstract

PURPOSE: To obtain a matrix product computing device that computes a matrix product in a short computing time, by arranging in a matrix product-sum units that each obtain the product of two operands and accumulate it, and by connecting the operand registers of the units as shift registers. CONSTITUTION: Product-sum units are arranged in a matrix; the operand registers RA of each row are connected as a shift register, and the operand registers RB of each column are connected as a shift register. When an operand arrives from the data bus DB, a clock signal (CA or CB) is given to the corresponding row or column, the content of the data bus is stored in the operand register of the first column or first row, the operand stored there so far moves to the operand register of the next stage, and the remaining operands move along in the same way. At the same time, the flip-flop FFA or FFB in the product-sum unit is set. When the clock Cop is issued after operands have been fed to the required rows and columns, the operation is performed only in those product-sum units whose flip-flops FFA and FFB are both set.

Description

DETAILED DESCRIPTION OF THE INVENTION (1) Field of the invention. The present invention relates to a device that assists a central processing unit by calculating matrix products at high speed.

(2) Description of the prior art. To obtain the product of an (m, n) matrix and an (n, t) matrix, a conventional device performs the calculation $\sum_{j=1}^{n} a_{ij} b_{jk}$ m·t times. With this scheme, computing capability can be raised by using many product-sum units, each of which calculates $a_{ij} \cdot b_{jk}$ and accumulates the result. However, once the number of product-sum units becomes fairly large, the data bus can no longer keep up with the supply of operands, and the calculation speed is in effect limited by the capacity of the data bus.
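The conventional scheme described above can be sketched in Python as follows. This is only an illustration: the function name `matmul_conventional` and the `operand_fetches` counter are assumptions introduced here to make the bus traffic visible, not part of the patent.

```python
def matmul_conventional(a, b):
    """Conventional scheme: one multiply-accumulate sum per output element.

    a is an (m, n) matrix and b an (n, t) matrix, given as lists of rows.
    Returns the product and the number of operands fetched over the bus.
    """
    m, n, t = len(a), len(b), len(b[0])
    c = [[0] * t for _ in range(m)]
    operand_fetches = 0                # hypothetical counter of bus transfers
    for i in range(m):
        for k in range(t):             # m * t independent sums
            acc = 0
            for j in range(n):         # n products per sum
                acc += a[i][j] * b[j][k]
                operand_fetches += 2   # both a_ij and b_jk cross the bus
            c[i][k] = acc
    return c, operand_fetches


c, fetches = matmul_conventional([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(c, fetches)
```

Every product needs two fresh operands, so the total traffic is 2·n·m·t words no matter how many product-sum units share the work.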

In this case, the time needed to supply all the operands becomes the calculation time, so the following relationship holds:

$$T_0 = \frac{2\,n\,m\,t}{M} \qquad (1)$$

where $T_0$ is the calculation time (seconds) and $M$ is the number of operands the data bus can supply per unit time.

For example, when the product of two (100, 100) matrices is calculated and M is 100M words/second, $T_0$ is 20 ms, and the calculation cannot be made any faster no matter how many product-sum units are used. Multiplexing the bus raises the operand supply capacity, but it causes main-memory contention and complicates control between the buses, so the degree of multiplexing cannot be raised very far.
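As a quick check of the figure quoted above, relation (1) can be evaluated directly. The function name `bus_bound_time` is an assumption made for this sketch.

```python
def bus_bound_time(m, n, t, words_per_second):
    """Relation (1): time to push all 2*n*m*t operands over the data bus."""
    return 2 * n * m * t / words_per_second


# (100, 100) matrices with a bus that delivers 100M words per second
t0 = bus_bound_time(100, 100, 100, 100e6)
print(f"{t0 * 1e3:.1f} ms")
```

With these inputs, 2 × 100 × 100 × 100 = 2,000,000 operands over a bus that delivers 10^8 words per second gives 0.02 s, in agreement with the 20 ms stated in the text.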

(3) Object of the invention. The object of the present invention is to obtain a matrix product calculation device that calculates matrix products in a short calculation time, is easy to control, and requires only a small number of product-sum units.

(4) Structure of the invention. According to the present invention, a matrix product calculation device is obtained in which arithmetic units, each of which obtains the product of two operands and accumulates the result, are arranged in a grid, and the corresponding operand registers of the arithmetic units are connected to one another as shift registers.

(5) Embodiment of the invention. Next, the present invention will be described in more detail with reference to the drawings.

FIG. 1 shows an example of the internal configuration of a product-sum unit, the building block of the matrix product calculation device of the present invention. The unit consists of an operand register RA, which on clock CA stores the contents of the internal data bus DAI that supplies information from the preceding stage or from a data source; an operand register RB, which likewise on clock CB stores the contents of the internal data bus DBI that supplies information from the preceding stage or from another data source; and an arithmetic section OP, which is activated by the clocks CA, CB, and Cop and performs multiplication and accumulation.

Within the arithmetic section OP, clock CA drives flip-flop FFA and clock CB drives flip-flop FFB. The outputs of flip-flops FFA and FFB are ANDed with the clock Cop, and the control section generates a control signal from the result. In response to this control signal, the information from operand register RA, taken from the internal data bus DAO, and the information from operand register RB, taken from the internal data bus DBO, are multiplied by the multiplier. The adder then adds the multiplier output to the output of the result register, which holds the result accumulated so far, and the sum is newly stored in the result register.
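A behavioural sketch of one product-sum unit, as described for FIG. 1, might look like the following. The class and method names are hypothetical. Two details are assumptions made here rather than statements from the patent: all registers start at zero, and the flip-flops are cleared on every Cop pulse (the source says only that they are reset after the operation).

```python
class ProductSumUnit:
    """Behavioural model of the unit in FIG. 1 (names are illustrative)."""

    def __init__(self):
        self.ra = 0          # operand register RA
        self.rb = 0          # operand register RB
        self.ffa = False     # flip-flop FFA, set by clock CA
        self.ffb = False     # flip-flop FFB, set by clock CB
        self.result = 0      # result (accumulator) register

    def clock_ca(self, dai):
        """Latch the value on DAI into RA; return the old RA (DAO, to next stage)."""
        dao = self.ra
        self.ra = dai
        self.ffa = True
        return dao

    def clock_cb(self, dbi):
        """Latch the value on DBI into RB; return the old RB (DBO, to next stage)."""
        dbo = self.rb
        self.rb = dbi
        self.ffb = True
        return dbo

    def clock_cop(self):
        """Multiply-accumulate only if both flip-flops are set, then clear them."""
        if self.ffa and self.ffb:
            self.result += self.ra * self.rb
        # Assumption: cleared every cycle, whether or not an operation ran.
        self.ffa = False
        self.ffb = False


u = ProductSumUnit()
u.clock_ca(3); u.clock_cb(4); u.clock_cop()
first = u.result             # 3 * 4 accumulated
u.clock_ca(5); u.clock_cop()
second = u.result            # only FFA was set, so nothing is accumulated
u.clock_ca(2); u.clock_cb(6); u.clock_cop()
third = u.result             # 2 * 6 added to the running total
print(first, second, third)
```

Returning the old register contents from `clock_ca` and `clock_cb` models the DAO and DBO outputs that feed the next unit in the row or column.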

FIG. 2 shows an embodiment of the present invention, in which the product-sum units of FIG. 1 are arranged in a grid, the operand registers RA of each row are wired together as a shift register, and the operand registers RB of each column are likewise wired together as a shift register. Accordingly, the clocks CA and CB are supplied separately to each row and each column, whereas the clock Cop is supplied in common to all product-sum units.

The device operates as follows. When an operand arrives from the data bus DB, a clock signal (CA or CB) is issued to the corresponding row or column. The contents of the data bus are stored in the operand register in the first column or first row, the operand stored there until then is stored in the operand register of the next stage, and the remaining operands move along in the same way. At the same time, flip-flop FFA or FFB in each clocked product-sum unit is set. When the clock Cop is issued after operands have been sent to the required rows and columns, only those product-sum units in which flip-flops FFA and FFB are both set perform the operation. After the operation, flip-flops FFA and FFB are reset. This series of operations constitutes one cycle, and the matrix product is obtained by executing the required number of cycles.

The operands are sent in the following order. To operand register RA11, the n elements of the first row of matrix A are sent in order, followed by t−1 zeros. To operand register RA21, the n elements of the second row of A, followed by t−1 zeros, are sent in order starting from the second cycle. The remaining rows are handled in the same way, each sending n elements and t−1 zeros with a delay of one further cycle. Similarly, the n elements of each column of B, followed by m−1 zeros, are sent with each column offset by one cycle from the previous one. When everything has been sent in this way, the product has been obtained. The number of cycles required to obtain the product is therefore n + (m−1) + (t−1).
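The feed order and the per-cycle operation can be checked with a small simulation. This is a sketch under stated assumptions, not the patent's circuit: the function name `systolic_matmul` is hypothetical, all registers are assumed to start at zero, a unit is assumed to accumulate in a cycle exactly when both its row and its column were clocked in that cycle, and every row of A is assumed to be padded with t−1 zeros and every column of B with m−1 zeros.

```python
def systolic_matmul(a, b):
    """Simulate the m x t grid of product-sum units cycle by cycle."""
    m, n, t = len(a), len(b), len(b[0])
    ra = [[0] * t for _ in range(m)]    # RA registers, shifted along each row
    rb = [[0] * t for _ in range(m)]    # RB registers, shifted down each column
    acc = [[0] * t for _ in range(m)]   # result registers
    cycles = n + (m - 1) + (t - 1)
    for c in range(cycles):             # c = 0 is the first cycle
        row_clocked = [False] * m
        for i in range(m):
            s = c - i                   # row i starts one cycle after row i-1
            if 0 <= s < n + t - 1:
                value = a[i][s] if s < n else 0   # n elements, then zeros
                ra[i] = [value] + ra[i][:-1]      # shift right, new value in column 0
                row_clocked[i] = True             # FFA set along row i
        col_clocked = [False] * t
        for k in range(t):
            s = c - k                   # column k starts one cycle after column k-1
            if 0 <= s < n + m - 1:
                value = b[s][k] if s < n else 0
                for i in range(m - 1, 0, -1):     # shift down, new value in row 0
                    rb[i][k] = rb[i - 1][k]
                rb[0][k] = value
                col_clocked[k] = True             # FFB set down column k
        for i in range(m):              # clock Cop: operate where both are set
            for k in range(t):
                if row_clocked[i] and col_clocked[k]:
                    acc[i][k] += ra[i][k] * rb[i][k]
    return acc, cycles


A = [[1, 2], [3, 4], [5, 6], [7, 8]]    # (4, 2) matrix
B = [[1, 2, 3], [4, 5, 6]]              # (2, 3) matrix
product, cycles = systolic_matmul(A, B)
print(product, cycles)
```

In this model, element j of row i of A and element j of column k of B reach unit (i, k) in the same cycle, so each unit accumulates exactly the n products of its output element. Every other pairing involves a padding zero or a still-empty register and contributes nothing.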

FIG. 3 shows a specific example of operation: the calculation of the product of a (4, 2) matrix A and a (2, 3) matrix B. Each cell of the inner 4×3 grid corresponds to one product-sum unit, and each cell shows the operation that unit performs in each elapsed cycle. The outer cells show the order in which the operands are sent.
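The drawing itself is not reproduced in this text, so the following sketch only derives a schedule from the feed order described earlier; it is not a transcription of FIG. 3. It assumes that unit (i, k) forms the product a_ij·b_jk in cycle i + k + j − 1 (all indices counted from 1), which follows from row i and column k each starting one cycle later than the previous one. The function name `mac_schedule` is hypothetical.

```python
from collections import defaultdict


def mac_schedule(m, n, t):
    """Map each cycle to the (i, k, j) products formed in it (1-indexed)."""
    sched = defaultdict(list)
    for i in range(1, m + 1):
        for k in range(1, t + 1):
            for j in range(1, n + 1):
                # a_ij has shifted k-1 stages, b_jk has shifted i-1 stages
                sched[i + k + j - 2].append((i, k, j))
    return dict(sched)


sched = mac_schedule(4, 2, 3)           # (4, 2) matrix times (2, 3) matrix
for cycle in sorted(sched):
    terms = ", ".join(f"a{i}{j}*b{j}{k}->c{i}{k}" for i, k, j in sched[cycle])
    print(f"cycle {cycle}: {terms}")
```

Under these assumptions the activity starts at unit (1, 1) in cycle 1 and sweeps diagonally across the grid until unit (4, 3) finishes in cycle 7, which matches the cycle count n + (m−1) + (t−1) = 2 + 3 + 2.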

Next, the device of the embodiment is compared with a conventional device for the case where m·t product-sum units are used. The number of operands required per cycle is 2·m·t in the conventional device, but only m + t in the device of this embodiment. On the other hand, the number of cycles required to obtain the matrix product is n in the conventional device, against n + (m−1) + (t−1) in the device of this embodiment. The total number of operands that must be supplied is therefore only a small fraction of that of the conventional device: 0.28 times for two (10, 10) matrices, and about 0.03 times for two (100, 100) matrices. Conversely, for the same bus capacity, the number of arithmetic units can be increased.
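The source text is incomplete where the ratio is stated, so the expression below is a reconstruction rather than a quotation. It takes the operands per cycle times the number of cycles for each device, (m + t)(n + m + t − 2) for the embodiment against 2·m·t·n for the conventional device, and it reproduces both figures quoted in the text. The function name is hypothetical.

```python
def operand_traffic_ratio(m, n, t):
    """Total operands supplied: embodiment divided by conventional device."""
    conventional = 2 * m * t * n                   # 2*m*t per cycle, n cycles
    embodiment = (m + t) * (n + (m - 1) + (t - 1))  # m+t per cycle, padded cycles
    return embodiment / conventional


r10 = operand_traffic_ratio(10, 10, 10)
r100 = operand_traffic_ratio(100, 100, 100)
print(f"{r10:.2f} {r100:.4f}")
```

For (10, 10) matrices this gives 20 × 28 / 2000 = 0.28, and for (100, 100) matrices 200 × 298 / 2,000,000 = 0.0298, or about 0.03. The count for the embodiment includes the padding zeros, so it is an upper bound on the useful traffic.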

As described above, the device is far less constrained by the bus, so the number of arithmetic units can be increased substantially and the calculation speed improved.

Moreover, for the same number of arithmetic units, bus utilization is lower, so the bus can be used for other data transfers and the processing capacity of the central processing unit improves.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an example of the product-sum unit used in an embodiment of the present invention, FIG. 2 is a block diagram showing an embodiment of the present invention, and FIG. 3 is a diagram showing the operation when the matrix product of a (4, 2) matrix and a (2, 3) matrix is calculated.

DAI, DBI: internal data buses from the preceding stage; DAO, DBO: internal data buses to the next stage; CA, CB: clock pulses; Cop: operation timing clock pulse; DB: connection to the data bus.

Claims

A matrix product calculation device characterized in that arithmetic units, each of which obtains the product of two operands and accumulates the result, are arranged in a grid, and the two operand registers of each arithmetic unit are elements constituting shift registers for each row and each column of the grid, respectively.