JPH05324700A - Matrix multiplication device

Info

Publication number: JPH05324700A
Application number: JP4125767A
Authority: JP
Inventors: Noriaki Otake; 紀明大竹; Takahiro Sakurai; 隆博桜井
Original assignee: N T T DATA TSUSHIN KK; NTT Data Communications Systems Corp
Current assignee: N T T DATA TSUSHIN KK; NTT Data Corp
Priority date: 1992-05-19
Filing date: 1992-05-19
Publication date: 1993-12-07

Abstract

PURPOSE: To reduce the number of times the elements of a first matrix are read from a storage device, by multiplying the matrix elements one first-matrix element at a time and cumulatively adding the products obtained for the elements in the same row of the first matrix. CONSTITUTION: A multiplication circuit 4 successively multiplies one element of matrix B by the elements of matrix A, and each product is stored in a multiplication result storage circuit 5. An addition circuit 7 adds that product to the contents read from a full addition result storage circuit 9 into an addition result storage circuit 6. The sum is held temporarily in a temporary addition result storage circuit 8 and is then written to the full addition result storage circuit 9.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a matrix multiplication device used in a computer to perform matrix multiplication, and in particular to a matrix multiplication device suited to performing matrix multiplication at high speed.

[0002]

[Prior Art] When a computer processes scientific and engineering calculations, a great deal of processing time is spent on matrix calculations, and matrix multiplication appears frequently among them. A computer that performs such matrix multiplication is described below with reference to FIGS. 10 to 13.

[0003] FIG. 10 is an explanatory diagram showing the notation for matrix multiplication, and FIG. 11 is an explanatory diagram showing the matrix multiplication procedure of a conventional computer. Let A be an l × m matrix, B an m × n matrix, and C the l × n product of A and B, and let aij, bij, and cij denote the elements in row i, column j of A, B, and C, respectively. Matrix multiplication is then expressed in the notation shown in FIG. 10. When a computer multiplies matrix A by matrix B to calculate matrix C, the element data of A, B, and C are stored in a storage device (memory or the like). To obtain one element cij of matrix C, as shown in the following expression (1), attention is paid to the framed portions of the matrices in FIG. 11: the elements aik (k = 1 to m) of matrix A are multiplied by the corresponding elements bkj (k = 1 to m) of matrix B, and the products are cumulatively added.
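Expression (1) itself does not appear in the extracted text. Going by the description above (multiply aik by bkj for k = 1 to m and accumulate), it presumably has the following form. This is a reconstruction, not the original typesetting:

```latex
c_{ij} = \sum_{k=1}^{m} a_{ik}\, b_{kj} \qquad (1)
```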

[0004] Writing the intermediate result of the cumulative addition back to the storage device after every calculation would not improve performance, so the intermediate result is held in a register inside the processing device (processor or the like). However, because the registers available for holding intermediate results of cumulative addition are limited, the elements of matrix C are normally obtained one at a time in sequence, as described below with reference to FIGS. 12 and 13.

[0005] FIG. 12 is a block diagram showing the configuration of a conventional processing device that performs matrix multiplication. The processing device 111 comprises a register group 116 and an arithmetic unit 118. The register group 116 includes a matrix A storage area 114 and a matrix B storage area 115, into which an element 112 of matrix A and an element 113 of matrix B stored in the storage device 110 are respectively read and stored. The arithmetic unit 118 uses the register group 116 to multiply matrix A by matrix B and writes the result, namely an element 117 of matrix C, to the storage device 110. The register group 116 also provides a multiplication result storage area 119, which stores the product of each element 112 of matrix A and element 113 of matrix B computed by the arithmetic unit 118, and an addition result storage area 120, which stores the result of cumulatively adding those products in the arithmetic unit 118. The operation by which a processing device of this configuration obtains one element cij of matrix C is described next with reference to FIG. 13.

[0006] FIG. 13 is a flowchart showing the matrix multiplication operation of the processing device in FIG. 12. First, the element aik of matrix A stored in the storage device is read into the matrix A storage area 114 of the register group 116 in FIG. 12 (step 1201). Next, the element bkj of matrix B stored in the storage device is read into the matrix B storage area 115 of the register group 116 (step 1202). The contents of the matrix A storage area and of the matrix B storage area are then multiplied by the arithmetic unit 118 of FIG. 12, and the product is stored in the multiplication result storage area 119 (step 1203). The arithmetic unit 118 adds the contents of the multiplication result storage area to the contents of the addition result storage area 120, which holds the cumulative sum up to the (k-1)th iteration, and writes the sum back to the addition result storage area 120 (step 1204). After this processing has been repeated m times (step 1205), the contents of the addition result storage area 120 are written to the storage device 110 of FIG. 12 as the element cij of matrix C (step 1206).
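The flow of steps 1201 to 1206 can be sketched in software as follows. This is only an illustrative model, not the patent's hardware: the function name `conventional_element`, the use of nested Python lists for the matrices, and the 0-based indices are assumptions made for the example.

```python
def conventional_element(A, B, i, j):
    """Compute c_ij the conventional way (steps 1201-1206), 0-based indices."""
    m = len(B)              # columns of A == rows of B
    acc = 0                 # models addition result storage area 120
    for k in range(m):      # step 1205: repeat m times
        a = A[i][k]         # step 1201: read a_ik (storage area 114)
        b = B[k][j]         # step 1202: read b_kj (storage area 115)
        prod = a * b        # step 1203: product (storage area 119)
        acc = acc + prod    # step 1204: accumulate (storage area 120)
    return acc              # step 1206: value written back as c_ij


# Small example: A is 2 x 3 (l = 2, m = 3), B is 3 x 2 (n = 2).
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
c_00 = conventional_element(A, B, 0, 0)   # 1*7 + 2*9 + 3*11 = 58
```

Note that every pass through the loop reads one element of each matrix, which is the source of the read count discussed in the text.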

[0007] This processing is repeated as many times as there are elements in matrix C (l × n times) to obtain the whole of matrix C. Consequently, obtaining the whole of matrix C requires 2m × l × n reads of matrix elements from the storage device, m × l × n multiplications, m × l × n additions, and l × n writes of matrix elements to the storage device. In addition, on a computer the program (software) is expanded into instruction words stored in the storage device, so the above example also requires (4m + 1) × l × n instruction-word reads from the storage device. This count does not include the instruction words for address updates and loop-count control.
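The operation counts stated above can be checked with a small counting model. The sketch below is illustrative only: the function name and the counter dictionary are invented for the example, and the breakdown of the (4m + 1) instruction words into two loads, one multiply, and one add per inner iteration plus one store per element is an assumption, since the text gives only the total.

```python
def conventional_matmul_with_counts(A, B):
    """Element-by-element product C = AB, tallying memory and ALU operations."""
    l, m, n = len(A), len(B), len(B[0])
    counts = {"reads": 0, "muls": 0, "adds": 0, "writes": 0, "instr_fetches": 0}
    C = [[0] * n for _ in range(l)]
    for i in range(l):
        for j in range(n):
            acc = 0
            for k in range(m):
                a = A[i][k]                    # read a_ik from storage
                b = B[k][j]                    # read b_kj from storage
                counts["reads"] += 2
                acc += a * b
                counts["muls"] += 1
                counts["adds"] += 1
                counts["instr_fetches"] += 4   # assumed: 2 loads, multiply, add
            C[i][j] = acc                      # write c_ij to storage
            counts["writes"] += 1
            counts["instr_fetches"] += 1       # assumed: 1 store
    return C, counts


A = [[1, 2, 3],
     [4, 5, 6]]            # l = 2, m = 3
B = [[7, 8],
     [9, 10],
     [11, 12]]             # m = 3, n = 2
C, counts = conventional_matmul_with_counts(A, B)
l, m, n = 2, 3, 2
```

For this 2 × 3 by 3 × 2 example the tallies agree with the formulas in the text: 2m × l × n = 24 reads, m × l × n = 12 multiplications and 12 additions, l × n = 4 writes, and (4m + 1) × l × n = 52 instruction fetches under the assumed breakdown.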

[0008] Thus there are more accesses to the storage device than there are operations inside the processing device. At present the processing capability of the processing device exceeds the data transfer capability between the storage device and the processing device, so the overall processing time is determined by that data transfer capability. This is what is known as the von Neumann bottleneck.

[0009] To relieve this bottleneck, supercomputers and similar machines, for example, adopt the concept of vector instructions. A vector instruction specifies, with a single instruction, the transfer and computation of data that is processed by repeating a fixed procedure. Current supercomputer architectures are all characterized by having such vector instructions, which they use to speed up matrix operations and the like. In contrast to a vector instruction, a single instruction of an ordinary computer is called a scalar instruction.

[0010] Supercomputers and similar machines also adopt high-speed computation techniques, such as pipeline processing and parallel computation, so that the data transfer capability to and from the storage device can be exploited to the fullest. Furthermore, recent advances in device technology have made it possible to place large-capacity storage (memory or the like) inside the processing device (processor or the like), which enables high-speed data transfer to and from the storage and shortens processing time.

[0011] Techniques for speeding up processing devices, such as pipeline processing and vector instructions, are described in, for example, the "Electronic Information and Communication Handbook" edited by the Institute of Electronics, Information and Communication Engineers (Ohmsha, 1988), pages 1627 to 1632.

[0012]

[Problem to Be Solved by the Invention] The problem to be solved is that, in the prior art, matrix multiplication involves a very large number of data transfers between the processing device and the storage device. Data transfer takes up much of the processing time, and high-speed processing cannot be achieved. An object of the present invention is to solve these problems of the prior art and to provide a matrix multiplication device that reduces the number of data transfers between the processing device and the storage device, makes it easy to combine pipeline processing and parallel computation techniques, and enables high-speed matrix multiplication.

[0013]

[Means for Solving the Problem] To achieve the above object, the matrix multiplication device of the present invention is characterized as follows.

(1) A matrix multiplication device that multiplies a first matrix and a second matrix stored in a storage device and writes the multiplication result to the storage device. It is provided with a matrix operation circuit that, for each element of the first matrix read from the storage device, sequentially reads from the storage device all the multiplication elements of the second matrix corresponding to that element, performs each multiplication, and cumulatively adds the products corresponding to the elements of the first matrix. The addition results of the matrix operation circuit are written to the storage device.

(2) In the matrix multiplication device according to (1), the matrix operation circuit comprises: a first matrix read circuit that reads one element of the first matrix from the storage device; a second matrix read circuit that sequentially reads from the storage device all the multiplication elements of the second matrix corresponding to the element read by the first matrix read circuit; a multiplication circuit that sequentially multiplies each multiplication element read by the second matrix read circuit by the one element read by the first matrix read circuit; and an addition circuit that sequentially and cumulatively adds the products calculated by the multiplication circuit for the elements sequentially read by the first and second matrix read circuits. The addition results of the addition circuit are written to the storage device.

(3) In the matrix multiplication device according to (2), the first matrix read circuit sequentially reads one element at a time from the group of elements in the same column of the first matrix.

(4) In the matrix multiplication device according to (2), the first matrix read circuit sequentially reads one element at a time from the group of elements in the same row of the first matrix.

(5) In the matrix multiplication device according to any one of (2) to (4), the second matrix read circuit sequentially reads the multiplication elements from the group of elements in the same column of the second matrix.

(6) In the matrix multiplication device according to any one of (2) to (4), the second matrix read circuit sequentially reads the multiplication elements from the group of elements in the same row of the second matrix.

(7) In the matrix multiplication device according to any one of (1) to (6), the matrix operation circuit comprises a partial addition result storage circuit that stores addition results. Each addition result stored in the partial addition result storage circuit is written to the storage device, in sequence, once all the multiplications between its corresponding elements have been completed.

(8) In the matrix multiplication device according to any one of (1) to (6), the matrix operation circuit comprises a full addition result storage circuit that stores the addition results until the multiplications between all elements have been completed. Once they have been completed, all the addition results stored in the full addition result storage circuit are written to the storage device together.

(9) The matrix multiplication device according to any one of (1) to (8) comprises a control circuit that pipelines operations including at least the reading of each element of the second matrix from the storage device by the matrix operation circuit, the multiplication, and the addition.

(10) The matrix multiplication device according to any one of (1) to (9) is provided with a plurality of matrix operation circuits, each of which performs, in parallel, the multiplication of the first and second matrices stored in the storage device in arbitrarily divided units.

[0014]

[Operation] In the present invention, the matrix multiplication device reads, for each element of the first matrix, all the corresponding multiplication elements of the second matrix and multiplies the matrix elements. It then cumulatively adds the products obtained for the elements in the same column of the first matrix. This reduces the number of times elements of the first matrix are read from the storage device. In addition, performing pipeline processing within the matrix multiplication device allows matrix multiplication to be processed at high speed, and providing a plurality of matrix operation circuits and using them in parallel speeds up matrix multiplication further.

[0015]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram showing a first embodiment of the configuration, as it relates to the present invention, of a matrix multiplication device embodying the invention. The matrix multiplication device 11 of this embodiment comprises: a matrix A storage circuit 3, serving as the second matrix read circuit of the present invention, which reads an element 1a of matrix A (the second matrix of the present invention) from the storage device 1 and stores it; a matrix B storage circuit 2, serving as the first matrix read circuit of the present invention, which reads an element 1b of matrix B (the first matrix of the present invention) from the storage device 1 and stores it; a multiplication circuit 4, which multiplies the contents of the matrix A storage circuit 3 and the matrix B storage circuit 2; a multiplication result storage circuit 5, which stores the product computed by the multiplication circuit 4; a full addition result storage circuit 9, which stores the intermediate results of cumulative addition for all elements of the matrix being computed; an addition result storage circuit 6, which reads and stores the contents of the full addition result storage circuit 9; an addition circuit 7, which adds the contents of the addition result storage circuit 6 to the contents of the multiplication result storage circuit 5; a temporary addition result storage circuit 8, which temporarily holds the sum produced by the addition circuit 7 before it is written to the full addition result storage circuit 9; and a control circuit 10, which performs overall control of the matrix multiplication device 11, access control for the storage device 1, pipeline processing control according to the present invention, and so on. The circuits other than the control circuit 10 constitute the matrix operation circuit of the present invention, and the elements 1c of matrix C, the multiplication result, are written to the storage device 1 under the control of the control circuit 10.

[0016] With this configuration, the matrix multiplication device 11 of this embodiment can reduce the number of times matrix elements are read from the storage device 1 when multiplying matrix A by matrix B, and can thereby speed up the multiplication. The operation of the matrix multiplication device 11 is described below using the matrix multiplication expressions shown in FIGS. 2 to 4. Of the calculations of the elements of matrix C in the matrix multiplication C = AB, the description takes the case of calculating the elements in column j of matrix C.

[0017] FIGS. 2 to 4 are explanatory diagrams each showing one embodiment of the matrix multiplication procedure, according to the present invention, of the matrix multiplication device in FIG. 1. First, as shown in FIG. 2, attention is paid to the framed portions of matrices A, B, and C. That is, one element b1j of matrix B is multiplied in turn by the elements ak1 (k = 1 to l) of matrix A in the multiplication circuit 4 of FIG. 1, and each product is stored in the multiplication result storage circuit 5 of FIG. 1. The addition circuit 7 of FIG. 1 then adds that product to the contents read from the full addition result storage circuit 9 into the addition result storage circuit 6 (the initial value is 0), and the sum is held temporarily in the temporary addition result storage circuit 8 before being written to the full addition result storage circuit 9.

[0018] Next, as shown in FIG. 3, attention is paid to the framed portions of matrices A, B, and C. That is, one element b2j of matrix B is multiplied in turn by the elements ak2 (k = 1 to l) of matrix A in the multiplication circuit 4 of FIG. 1, and each product is stored in the multiplication result storage circuit 5 of FIG. 1. The addition circuit 7 of FIG. 1 then adds that product to the contents read from the full addition result storage circuit 9 into the addition result storage circuit 6, and the sum is held temporarily in the temporary addition result storage circuit 8 before being written to the full addition result storage circuit 9.

[0019] In the same way, in the i-th pass, as shown in FIG. 4, attention is paid to the framed portions of matrices A, B, and C. One element bij of matrix B is multiplied in turn by the elements aki (= j) (k = 1 to l) of matrix A in the multiplication circuit 4 of FIG. 1, and each product is stored in the multiplication result storage circuit 5 of FIG. 1. The addition circuit 7 of FIG. 1 adds that product to the contents read from the full addition result storage circuit 9 into the addition result storage circuit 6 (the cumulative sum up to the (i-1)th pass), and the sum is held temporarily in the temporary addition result storage circuit 8 before being written to the full addition result storage circuit 9. Repeating this processing once for each column of matrix C (n times) yields the whole of matrix C.
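The passes of FIGS. 2 to 4 for one column j of matrix C can be modelled in software as below. This is a hedged sketch rather than the patent's circuit: the function name `column_of_c`, the list-based matrices, and the 0-based indices are assumptions, and the comments only indicate which numbered circuit each line is meant to mimic.

```python
def column_of_c(A, B, j):
    """Compute column j of C = AB by sweeping one element of B at a time."""
    l, m = len(A), len(B)
    full_sum = [0] * l              # models full addition result storage circuit 9
    for i in range(m):              # i-th pass (FIG. 2: first, FIG. 3: second, ...)
        b = B[i][j]                 # one element b_ij, read once per pass
        for k in range(l):          # sweep the matching column of A
            prod = A[k][i] * b      # multiplication circuit 4 -> storage circuit 5
            prev = full_sum[k]      # circuit 9 -> addition result storage circuit 6
            tmp = prev + prod       # addition circuit 7 -> temporary circuit 8
            full_sum[k] = tmp       # circuit 8 -> circuit 9
    return full_sum


A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
col0 = column_of_c(A, B, 0)   # first column of C
```

The design point the sketch illustrates is that each element of B is read once and reused for a whole column of A, at the cost of keeping l partial sums instead of one.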

[0020] Next, the operation of the matrix multiplication device 11 of FIG. 1 is described in more detail with reference to FIGS. 5 and 6, which are flowcharts showing one embodiment of the operation, according to the present invention, of the matrix multiplication device in FIG. 1. First, as shown in FIG. 5, the element bkj of matrix B stored in the storage device is read into the matrix B storage circuit 2 of FIG. 1 (step 501). Next, the element aij of matrix A stored in the storage device is read into the matrix A storage circuit 3 of FIG. 1 (step 502). The contents of the matrix A storage circuit and of the matrix B storage circuit are then multiplied by the multiplication circuit 4 of FIG. 1 (step 503), and the product is stored in the multiplication result storage circuit 5 of FIG. 1 (step 504).

[0021] Next, the addition result up to the (k-1)th iteration is read from the full addition result storage circuit 9 of FIG. 1, which holds the cumulative sums, into the addition result storage circuit 6 of FIG. 1 (step 505). The addition circuit 7 of FIG. 1 adds the contents of the addition result storage circuit to the contents of the multiplication result storage circuit stored in step 503 (step 506). The sum is written to the temporary addition result storage circuit 8 of FIG. 1 (step 507), and the contents of the temporary addition result storage circuit are then written to the full addition result storage circuit 9 of FIG. 1 (step 508).

[0022] Then, as shown in FIG. 6, the loop of steps 502 to 508 is repeated l times (step 509). One element of matrix B, for example the element b1j shown in FIG. 2, is thereby multiplied by all the multiplication elements (a11 to al1) of matrix A corresponding to it, giving the elements (c1j to clj) of matrix C for the element b1j. Once the elements of matrix C have been obtained for one element of matrix B in this way, the process returns to step 501 and reads the next element of matrix B, for example the element b2j shown in FIG. 3, and the loop of steps 501 to 508 is again repeated l times. This yields the accumulated result of the elements (c1j to clj) of matrix C for the element b1j and the elements (c1j to clj) of matrix C for the element b2j.

[0023] The same processing is repeated for the subsequent elements of matrix B, so that the loop of steps 501 to 509 is repeated m times (step 510). In this way, all the elements of one column of matrix B, for example the elements (b1j to bmj) in FIGS. 2 to 4, are multiplied by the corresponding multiplication elements (a11 to alm) of matrix A and the products are cumulatively added, giving all the elements (c1j to clj) of one column of matrix C. Once all the elements of one column of matrix C have been obtained for all the elements of one column of matrix B, the process returns to step 501 and reads an element of the next column of matrix B, for example the element b1n shown in FIGS. 2 to 4.

[0024] Thereafter, the l repetitions of the loop of steps 502 to 508 and the m repetitions of the loop of steps 501 to 509 are carried out, giving all the elements (c1n to cln) of one column of matrix C for all the elements of the next column of matrix B. By carrying out the l repetitions of steps 502 to 508 and the m repetitions of steps 501 to 509 a total of n times over the elements of matrices A and B (step 511), all the elements (c11 to cln) of matrix C are obtained.

[0025] All the elements (c11 to cln) of the matrix C thus stored in the full addition result storage circuit 9 of FIG. 1 are written to the storage device (step 512), and the process ends. As a result, the device performs m × (l + 1) × n reads of matrix elements from the storage device, m × l × n multiplications and additions, and l × n writes of matrix elements to the storage device, so the number of reads of matrix elements from the storage device is reduced compared with the conventional technique. Specifically, the conventional technique requires 2m × l × n reads, and the difference is 2m × l × n − m × (l + 1) × n = mn(l − 1). For l > 1, 2m × l × n − m × (l + 1) × n > 0, and therefore 2m × l × n > m × (l + 1) × n.
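The read counts above can be checked with a small software model. The following Python sketch is only an illustration under stated assumptions: the function names `multiply_conventional` and `multiply_per_b_element` are hypothetical, matrices are plain lists of rows, and each indexed access to A or B stands in for one read from the storage device. It is not the circuit of FIG. 1.

```python
def multiply_conventional(A, B):
    """Row-by-column product: every c_ij re-reads a_ik and b_kj for each k."""
    l, m, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(l)]
    reads = 0
    for i in range(l):
        for j in range(n):
            for k in range(m):
                reads += 2  # one element of A and one element of B
                C[i][j] += A[i][k] * B[k][j]
    return C, reads


def multiply_per_b_element(A, B):
    """Read one element of B once, then multiply it by a whole column of A."""
    l, m, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(l)]  # models the full addition result storage
    reads = 0
    for j in range(n):              # n repetitions (step 511)
        for k in range(m):          # m repetitions of the loop of steps 501-509
            b = B[k][j]             # step 501: one read of an element of B
            reads += 1
            for i in range(l):      # l repetitions of the loop of steps 502-508
                a = A[i][k]         # step 502: one read of an element of A
                reads += 1
                C[i][j] += a * b    # steps 503-508: multiply and accumulate
    return C, reads


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]           # l = 2, m = 3
    B = [[7, 8], [9, 10], [11, 12]]      # m = 3, n = 2
    C_old, reads_old = multiply_conventional(A, B)
    C_new, reads_new = multiply_per_b_element(A, B)
    print(C_new, reads_old, reads_new, reads_old - reads_new)
```

For this example (l = 2, m = 3, n = 2) the model counts 2m × l × n = 24 reads for the conventional order and m × (l + 1) × n = 18 reads for the per-element order, a difference of mn(l − 1) = 6.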

[0026] Next, the application of pipeline processing to the processing of steps 502 to 508 described above will be explained with reference to FIG. 7. FIG. 7 is an explanatory diagram showing an embodiment of pipeline processing according to the present invention in the matrix multiplication device of FIG. 1. In the figure, 61a to 61c denote processes for reading the matrix A from the storage device into the matrix A storage circuit; for example, AMR(k) denotes the k-th read of the matrix A from the storage device into the matrix A storage circuit and corresponds to step 502 in FIG. 5. Further, 62a to 62c denote processes for multiplying the matrix A by the matrix B and writing the multiplication result to the multiplication result storage circuit 5 of FIG. 1; for example, MUX(k) denotes the k-th multiplication of the matrix A by the matrix B and the write of its result, corresponding to steps 503 and 504 in FIG. 5, respectively.

[0027] Further, 63a to 63c denote processes for reading the cumulative addition result from the full addition result storage circuit 9 of FIG. 1 into the addition result storage circuit 6; for example, ARR(k) denotes the read of the cumulative addition result up to the (k−1)-th iteration and corresponds to step 505 in FIG. 5. 64a to 64c denote processes for adding the contents of the multiplication result storage circuit 5 of FIG. 1 to the contents of the addition result storage circuit 6 and writing the sum to the temporary addition result storage circuit 8 of FIG. 1; for example, ADD(k) denotes the addition of the k-th contents of the multiplication result storage circuit to the contents of the addition result storage circuit up to the (k−1)-th iteration, and the write of the sum, corresponding to steps 506 and 507 in FIG. 5, respectively. 65a to 65c denote processes for writing the contents of the temporary addition result storage circuit 8 of FIG. 1 to the full addition result storage circuit 9; for example, ARW(k) denotes the write of the k-th contents of the temporary addition result storage circuit to the full addition result storage circuit and corresponds to step 508 in FIG. 5.
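To illustrate how the operations AMR(k), MUX(k), ARR(k), ADD(k), and ARW(k) might overlap in time, the Python sketch below builds a simple schedule. It rests on assumptions that do not come from the text: each operation is treated as its own stage taking one time slot, and a new iteration k is issued every slot. The actual timing and stage grouping of FIG. 7 are not reproduced here, and the function name `pipeline_schedule` is hypothetical.

```python
# Operation names taken from the description of FIG. 7; one slot each is an assumption.
STAGES = ["AMR", "MUX", "ARR", "ADD", "ARW"]


def pipeline_schedule(iterations):
    """Return a list where entry t holds the operations active in time slot t."""
    n_slots = iterations + len(STAGES) - 1
    schedule = [[] for _ in range(n_slots)]
    for k in range(1, iterations + 1):
        for s, name in enumerate(STAGES):
            # Iteration k enters the first stage in slot k-1 and advances one stage per slot.
            schedule[k - 1 + s].append(f"{name}({k})")
    return schedule


if __name__ == "__main__":
    for t, ops in enumerate(pipeline_schedule(3)):
        print(t, " ".join(ops))
```

Under these assumptions, three iterations finish in 7 slots instead of the 15 slots a strictly sequential execution would need. The overlap is plausible because consecutive iterations of the loop of steps 502 to 508 update different elements of one column of the matrix C.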

[0028] As described above, in this embodiment the processing of steps 502 to 508 in FIG. 5 is divided into three independent processes. This increases the operation speed of the matrix multiplication device 11 in FIG. 1. The processing can be sped up further by dividing a matrix and multiplying the parts; the configuration and operation of such a matrix multiplication device are described below with reference to FIGS. 8 and 9.

[0029] FIG. 8 is an explanatory diagram showing an embodiment of a matrix division scheme according to the present invention. In this embodiment, the matrix A 71 is divided into L parts in the row direction. When, for example, the element group of the I-th submatrix aI 72 of the divided matrix A 71 (hatched in the figure) is processed together with the whole of the matrix B 73 (hatched), the element group of the I-th submatrix cI 74 of the matrix C (hatched) is obtained independently of the elements of the other submatrices of the matrix A 71. Accordingly, as shown in FIG. 9, a plurality of the matrix operation circuits described with reference to FIG. 1 can be provided, one for each submatrix of the matrix A 71, and each submatrix can be processed independently.

[0030] FIG. 9 is a block diagram showing a second embodiment of the configuration according to the present invention of the matrix multiplication device of FIG. 1. In this embodiment, a plurality of matrix operation circuits 81 to 84 multiply, in parallel, the respective submatrices a1 to aL of the matrix A 71 shown in FIG. 8 by the matrix B 73. By thus providing as many matrix operation circuits 81 to 84 as the number of divisions (L) of the matrix A, and having them process in parallel the element groups of the submatrices a1 to aL obtained by dividing the matrix A 71 of FIG. 8 together with the whole of the matrix B, matrix multiplication can be performed even faster.
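The independence of the row blocks can be sketched in software. In the Python example below, the helper names `matmul`, `split_rows`, and `blocked_matmul` are hypothetical, the blocks are processed one after another rather than by parallel circuits, and the rule for choosing block sizes is an assumption, since the text only says that the matrix A is divided into L parts in the row direction.

```python
def matmul(A, B):
    """Plain product of an l x m matrix A and an m x n matrix B (lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def split_rows(A, parts):
    """Split A into `parts` consecutive row blocks whose sizes differ by at most one."""
    base, extra = divmod(len(A), parts)
    blocks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        blocks.append(A[start:start + size])
        start += size
    return blocks


def blocked_matmul(A, B, parts):
    """Multiply each row block a_I by the whole of B and stack the blocks c_I."""
    C = []
    for block in split_rows(A, parts):
        # Each block needs only its own rows of A plus all of B, so in hardware
        # every iteration of this loop could run on a separate operation circuit.
        C.extend(matmul(block, B))
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6]]
    B = [[1, 0, 2], [0, 1, 3]]
    print(blocked_matmul(A, B, 2) == matmul(A, B))
```

Because each block c_I depends only on a_I and the whole of B, stacking the block results reproduces the undivided product regardless of how many parts are used.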

[0031] As described above with reference to FIGS. 1 to 9, the matrix multiplication device of this embodiment multiplies, for each single element of one matrix, all the corresponding multiplication elements of the other matrix. This reduces the number of times the matrix elements stored in the storage device are read. In addition, applying pipeline processing to the operations inside the matrix multiplication device shortens the processing time. Furthermore, because the matrix can be divided and parallel processing becomes easy, the processing speed can be increased further.

[0032] The present invention is not limited to the embodiments described with reference to FIGS. 1 to 9. For example, in the embodiment described with reference to FIGS. 1 to 6, the matrix multiplication and cumulative addition are performed column by column, but they may instead be performed row by row. Also, the description above covers an operation in which all the results are held in the full addition result storage circuit until the entire matrix multiplication is complete and are then written to the storage device together; however, by using, for example, the partial addition result storage circuit of the present invention, each element may instead be written to the storage device in turn as soon as its cumulative addition is complete.

【００３３】[0033]

[Effects of the Invention] According to the present invention, the number of data transfers between the processing device and the storage device can be reduced, and, by adopting pipeline processing and parallel operation techniques, matrix multiplication can be processed at high speed.

【００３４】[0034]

[Brief description of drawings]

FIG. 1 is a block diagram showing a first embodiment of the configuration of a matrix multiplication device according to the present invention.

FIG. 2 is an explanatory diagram showing an embodiment of the matrix multiplication procedure according to the present invention in the matrix multiplication device of FIG. 1.

FIG. 3 is an explanatory diagram showing an embodiment of the matrix multiplication procedure according to the present invention in the matrix multiplication device of FIG. 1.

FIG. 4 is an explanatory diagram showing an embodiment of the matrix multiplication procedure according to the present invention in the matrix multiplication device of FIG. 1.

FIG. 5 is a flowchart showing an embodiment of the operation according to the present invention of the matrix multiplication device of FIG. 1.

FIG. 6 is a flowchart showing an embodiment of the operation according to the present invention of the matrix multiplication device of FIG. 1.

FIG. 7 is an explanatory diagram showing an embodiment of pipeline processing according to the present invention in the matrix multiplication device of FIG. 1.

FIG. 8 is an explanatory diagram showing an embodiment of a matrix division scheme according to the present invention.

FIG. 9 is a block diagram showing a second embodiment of the configuration of a matrix multiplication device according to the present invention.

FIG. 10 is an explanatory diagram showing the notation used for matrix multiplication.

FIG. 11 is an explanatory diagram showing a matrix multiplication procedure performed by a conventional computer.

FIG. 12 is a block diagram showing the configuration of a conventional processing device that performs matrix multiplication.

FIG. 13 is a flowchart showing the matrix multiplication operation of the processing device of FIG. 12.

[Explanation of symbols]

1: storage device
1a: elements of matrix A
1b: elements of matrix B
1c: elements of matrix C
2: matrix B storage circuit
3: matrix A storage circuit
4: multiplication circuit
5: multiplication result storage circuit
6: addition result storage circuit
7: addition circuit
8: temporary addition result storage circuit
9: full addition result storage circuit
10: control circuit
11: matrix multiplication device
61a to 61c: read of matrix A
62a to 62c: multiplication of matrix A by matrix B and write of the result
63a to 63c: read of the cumulative addition result
64a to 64c: addition of the contents of the multiplication result storage circuit and the contents of the addition result storage circuit, and write of the sum
65a to 65c: write to the full addition result storage circuit
71: matrix A
72: I-th submatrix aI of matrix A
73: matrix B
74: I-th submatrix cI of matrix C
81 to 84: matrix operation circuits
110: storage device
111: processing device
112: elements of matrix A
113: elements of matrix B
114: matrix A storage area
115: matrix B storage area
116: register group
117: elements of matrix C
118: arithmetic unit
119: multiplication result storage area
120: addition result storage area

Claims

[Claims]

1. A matrix multiplication device that multiplies first and second matrices stored in a storage device and writes the multiplication result to the storage device, characterized by comprising matrix operation means that, for each single element of the first matrix read from the storage device, sequentially reads from the storage device all the multiplication elements of the second matrix corresponding to that element, performs each multiplication, and cumulatively adds the multiplication results corresponding to each element of the first matrix, wherein the addition result of the matrix operation means is written to the storage device.

2. The matrix multiplication device according to claim 1, wherein the matrix operation means comprises: first matrix reading means for reading one element of the first matrix from the storage device; second matrix reading means for sequentially reading from the storage device all the multiplication elements of the second matrix corresponding to the element read by the first matrix reading means; multiplication means for sequentially multiplying each of the multiplication elements sequentially read by the second matrix reading means by the one element read by the first matrix reading means; and addition means for sequentially and cumulatively adding the multiplication results calculated by the multiplication means for the elements sequentially read by the first matrix reading means and the second matrix reading means, the addition result of the addition means being written to the storage device.

3. The matrix multiplication device according to claim 2, wherein the first matrix reading means sequentially reads the one element from the element group in the same column of the first matrix.

4. The matrix multiplication device according to claim 2, wherein the first matrix reading means sequentially reads the one element from the element group in the same row of the first matrix.

5. The matrix multiplication device according to any one of claims 2 to 4, wherein the second matrix reading means sequentially reads the multiplication elements from the element group in the same column of the second matrix.

6. The matrix multiplication device according to any one of claims 2 to 4, wherein the second matrix reading means sequentially reads the multiplication elements from the element group in the same row of the second matrix.

7. The matrix multiplication device according to claim 1, wherein the matrix operation means comprises partial addition result storage means for storing the addition result, and each addition result stored in the partial addition result storage means is written to the storage device in turn once all the matrix multiplication between its corresponding elements has been completed.

8. The matrix multiplication device according to claim 1, wherein the matrix operation means comprises full addition result storage means for storing the addition results until the matrix multiplication between all the elements is completed, and, when the matrix multiplication between all the elements is completed, all the addition results stored in the full addition result storage means are written to the storage device together.

9. The matrix multiplication device according to claim 1, wherein the matrix operation means comprises control means for pipeline processing of operations including at least the reading of each element of the second matrix from the storage device, the multiplication, and the addition.

10. The matrix multiplication device according to claim 1, wherein a plurality of the matrix operation means are provided in parallel, each performing the multiplication of the first and second matrices stored in the storage device for an arbitrarily divided unit.