JP3333779B2

JP3333779B2 - Matrix arithmetic unit

Info

Publication number: JP3333779B2
Application number: JP2001393487A
Authority: JP
Inventors: 学志高橋; 貴則古園
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2001-12-26
Filing date: 2001-12-26
Publication date: 2002-10-15
Anticipated expiration: 2016-08-23
Also published as: JP2002269067A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a matrix operation device that performs, for example, multiplication of matrices. More specifically, it relates to a matrix operation device for matrix multiplication that has, as its arithmetic unit, either a single multiply-accumulate unit or a plurality of multiply-accumulate units operating simultaneously in parallel, and that has registers or memories from which the data required for the matrix operation can be supplied efficiently to the one or more multiply-accumulate units.

[0002]

[Prior Art] In matrix multiplication, the number of multiplications and additions between matrix terms is large compared with the total number of terms in the matrices. The number of multiplications equals the number of terms in one row of the left-hand matrix multiplied by the number of terms to be computed, and the number of additions equals one less than the number of terms in one row of the left-hand matrix, multiplied by the number of terms to be computed. Conventionally, therefore, matrix multiplication has been accelerated by parallel operation, in which a plurality of multiplication and addition units perform several operations simultaneously, as disclosed, for example, in Japanese Patent Application Laid-Open No. 63-86079.
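The operation counts described in this paragraph can be written out as a short calculation. The following Python sketch is only an illustration; the function name `matmul_op_counts` and its parameters are hypothetical and do not appear in the patent.

```python
def matmul_op_counts(rows_left: int, inner: int, cols_right: int) -> tuple[int, int]:
    """Count scalar operations for a (rows_left x inner) times (inner x cols_right) product."""
    result_terms = rows_left * cols_right      # number of terms to be computed
    multiplications = inner * result_terms     # one product per term of a left-hand row
    additions = (inner - 1) * result_terms     # one fewer addition than products
    return multiplications, additions


mults, adds = matmul_op_counts(4, 4, 4)
print(mults, adds)  # 64 48
```

For two 4-row, 4-column matrices this gives 64 multiplications and 48 additions against only 32 input terms, which is the imbalance the paragraph refers to.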

[0003] In matrix multiplication, moreover, the terms of a column of the right-hand matrix are needed for the terms of a row of the left-hand matrix. When matrix terms are fetched consecutively as the operation proceeds, the data therefore had to be rearranged. A technique existed that made this rearrangement unnecessary by using, for example, a memory for row and column conversion; see, for example, Japanese Patent Application Laid-Open No. 1-82175.

[0004]

[Problems to Be Solved by the Invention] However, when an arithmetic unit performs matrix multiplication, matrix terms must be supplied one after another to the unit that performs the multiplications and additions. It was possible to supply data successively for a row or a column, which is only part of a matrix, but no method had been adopted in which the data of the entire matrix is held internally and supplied successively.

[0005] To hold the data of all the terms of a matrix, which is large compared with a single row or column, a storage device built from dedicated storage circuits such as memory requires a smaller circuit scale and is more efficient than one built from logic circuits such as flip-flops. Conventionally, however, such memory was used only as something like a pipeline buffer, for example a FIFO in which data is used in the order in which it was written.

[0006] Accordingly, an object of the present invention is to perform arithmetic processing at high speed by supplying all the data required for a matrix operation simultaneously to one arithmetic unit or to a plurality of arithmetic units operating in parallel, and, by loading the data of all the terms of the matrices into storage means in advance, to carry out the entire matrix operation continuously without additional reads of matrix data from an external storage device.

[0007]

[Means for Solving the Problems] The matrix operation device according to claim 1 comprises: a decoder that decodes an input matrix operation instruction and outputs a write control signal, a read control signal, a selection control signal, and an operation control signal; a plurality of storage means, each storing a plurality of items of matrix term data read from an external storage device; a write unit that receives the write control signal and supplies a write signal to the plurality of storage means, thereby causing the matrix term data to be written to them; a read unit that receives the read control signal and supplies a read signal to the plurality of storage means, thereby causing a plurality of data items to be read simultaneously from the matrix term data stored in each of them; an arithmetic unit that receives the operation control signal and a plurality of operand data items, performs the matrix operation, and outputs the operation result; and a selector unit that is provided between the plurality of storage means and the arithmetic unit, receives the plurality of data items and the selection control signal, and selectively supplies the plurality of operand data items to the plurality of input terminals of the arithmetic unit. When the matrix term data is stored in the plurality of storage means, it is divided into as many parts as are needed for the terms required by an operation in the arithmetic unit to be output simultaneously, and all the term data required for the operation is output simultaneously from the plurality of storage means to the arithmetic unit in accordance with the read control signal.

[0008] Here, the matrices have four rows and four columns, there are four storage means, and the matrix term data required for the matrix operation is divided into four parts and stored in the storage means.

[0009] When the term data of two matrices is stored in the four storage means, first to fourth, the terms are distributed as follows, where (r, c) denotes the term in row r, column c. The first storage means holds terms (1,1), (1,3), (2,2), (2,4), (3,1), (3,3), (4,2), and (4,4) of the first matrix. The second storage means holds terms (1,2), (1,4), (2,1), (2,3), (3,2), (3,4), (4,1), and (4,3) of the first matrix. The third storage means holds terms (1,1), (1,3), (2,2), (2,4), (3,1), (3,3), (4,2), and (4,4) of the second matrix. The fourth storage means holds terms (1,2), (1,4), (2,1), (2,3), (3,2), (3,4), (4,1), and (4,3) of the second matrix.
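The four-way distribution listed in this paragraph follows a checkerboard pattern: a term goes to one storage means or the other according to whether its row number plus its column number is even or odd. The Python sketch below makes that observation explicit. The closed-form rule and the names `storage_means_4` and `layout_4` are an editorial reading of the listed positions, not wording from the patent.

```python
def storage_means_4(matrix: int, row: int, col: int) -> int:
    """Storage means (1-4) for a term; matrix is 1 or 2, row and col are 1-based."""
    parity = (row + col) % 2               # 0 for (1,1), (1,3), (2,2), ...
    return 2 * (matrix - 1) + parity + 1


def layout_4() -> dict[int, list[tuple[int, int]]]:
    """List the (row, col) positions held by each of the four storage means."""
    layout = {n: [] for n in range(1, 5)}
    for matrix in (1, 2):
        for row in range(1, 5):
            for col in range(1, 5):
                layout[storage_means_4(matrix, row, col)].append((row, col))
    return layout


for number, positions in layout_4().items():
    print(number, positions)
```

Each storage means ends up with eight terms, and two horizontally or vertically adjacent terms of the same matrix never share a storage means.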

[0010] With this configuration, the write unit writes the data of all the matrix terms to the plurality of storage means in accordance with the write control signal, the read unit simultaneously reads the plurality of data items required for the matrix operation from the plurality of storage means in accordance with the read control signal, and the selector unit selects among those data items in accordance with the selection control signal and supplies the plurality of operand data items to the arithmetic unit. Because the matrix term data is divided into as many parts as are needed for the several terms required by an operation in the arithmetic unit to be output simultaneously, all the term data required for the operation can be output from the storage means to the arithmetic unit simultaneously in accordance with the read control signal. Moreover, even when the internal operations that make up a matrix multiplication are performed back to back, matrix term data can be supplied to the arithmetic unit without interruption.

[0011] Furthermore, when two 4-row, 4-column matrices are multiplied, distributing the matrix term data appropriately among the storage means allows the required data to be fetched simultaneously, cycle after cycle.

[0012] The matrix operation device according to claim 2 is the matrix operation device according to claim 1 in which a plurality of arithmetic units are provided and can operate in parallel.

[0013] With this configuration, the plurality of arithmetic units operate in parallel, so several operations can be performed simultaneously and the matrix operation can be performed at high speed.

[0014] The matrix operation device according to claim 3 comprises: a decoder that decodes an input matrix operation instruction and outputs a write control signal, a read control signal, a selection control signal, and an operation control signal; a plurality of storage means, each storing a plurality of items of matrix term data read from an external storage device; a write unit that receives the write control signal and supplies a write signal to the plurality of storage means, thereby causing the matrix term data to be written to them; a read unit that receives the read control signal and supplies a read signal to the plurality of storage means, thereby causing a plurality of data items to be read simultaneously from the matrix term data stored in each of them; an arithmetic unit that receives the operation control signal and a plurality of operand data items, performs the matrix operation, and outputs the operation result; and a selector unit that is provided between the plurality of storage means and the arithmetic unit, receives the plurality of data items and the selection control signal, and selectively supplies the plurality of operand data items to the plurality of input terminals of the arithmetic unit. When the matrix term data is stored in the plurality of storage means, it is divided into as many parts as are needed for the terms required by an operation in the arithmetic unit to be output simultaneously, and all the term data required for the operation is output simultaneously from the plurality of storage means to the arithmetic unit in accordance with the read control signal.

[0015] Here, the matrices have four rows and four columns, there are eight storage means, and the matrix term data required for the matrix operation is divided into eight parts and stored in the storage means.

[0016] When the term data of two matrices is stored in the eight storage means, first to eighth, the terms are distributed as follows, where (r, c) denotes the term in row r, column c. The first storage means holds terms (1,1), (2,4), (3,3), and (4,2) of the first matrix. The second storage means holds terms (1,2), (2,1), (3,4), and (4,3) of the first matrix. The third storage means holds terms (1,3), (2,2), (3,1), and (4,4) of the first matrix. The fourth storage means holds terms (1,4), (2,3), (3,2), and (4,1) of the first matrix. The fifth storage means holds terms (1,1), (2,4), (3,3), and (4,2) of the second matrix. The sixth storage means holds terms (1,2), (2,1), (3,4), and (4,3) of the second matrix. The seventh storage means holds terms (1,3), (2,2), (3,1), and (4,4) of the second matrix. The eighth storage means holds terms (1,4), (2,3), (3,2), and (4,1) of the second matrix.
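The eight-way distribution listed in this paragraph is a skewed layout: within each matrix, a term's storage means is determined by its row number plus its column number, taken modulo 4. The Python sketch below reproduces the listed positions from that rule. The rule itself and the names `storage_means_8` and `layout_8` are an editorial reading of the list, not wording from the patent.

```python
def storage_means_8(matrix: int, row: int, col: int) -> int:
    """Storage means (1-8) for a term; matrix is 1 or 2, row and col are 1-based."""
    skew = (row + col - 2) % 4             # 0 for (1,1), (2,4), (3,3), (4,2)
    return 4 * (matrix - 1) + skew + 1


def layout_8() -> dict[int, list[tuple[int, int]]]:
    """List the (row, col) positions held by each of the eight storage means."""
    layout = {n: [] for n in range(1, 9)}
    for matrix in (1, 2):
        for row in range(1, 5):
            for col in range(1, 5):
                layout[storage_means_8(matrix, row, col)].append((row, col))
    return layout


for number, positions in layout_8().items():
    print(number, positions)
```

Each storage means holds four terms, one from every row and one from every column of its matrix.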

[0017] With this configuration, the write unit writes the data of all the matrix terms to the plurality of storage means in accordance with the write control signal, the read unit simultaneously reads the plurality of data items required for the matrix operation from the plurality of storage means in accordance with the read control signal, and the selector unit selects among those data items in accordance with the selection control signal and supplies the plurality of operand data items to the arithmetic unit. Because the matrix term data is divided into as many parts as are needed for the several terms required by an operation in the arithmetic unit to be output simultaneously, all the term data required for the operation can be output from the storage means to the arithmetic unit simultaneously in accordance with the read control signal. Moreover, even when the internal operations that make up a matrix multiplication are performed back to back, matrix term data can be supplied to the arithmetic unit without interruption.

[0018] Furthermore, when two 4-row, 4-column matrices are multiplied, distributing the matrix term data appropriately among the storage means allows the required data to be fetched simultaneously, cycle after cycle.

[0019] The matrix operation device according to claim 4 is the matrix operation device according to claim 3 in which a plurality of arithmetic units are provided and can operate in parallel.

[0020] With this configuration, the plurality of arithmetic units operate in parallel, so several operations can be performed simultaneously and the matrix operation can be performed at high speed.

[0021]

[Embodiments of the Invention] An embodiment of the present invention is described below with reference to FIG. 1. The description covers a configuration example in which the arithmetic unit is a multiply-accumulate unit consisting of two multipliers and an adder that adds the two multiplication results.

[0022] In FIG. 1, reference numeral 1 denotes an instruction. Reference numeral 2 denotes a decoder that decodes the input instruction 1 and outputs a control signal 4. The control signal 4 is divided into a write control signal, a read control signal, a selection control signal, and an operation control signal.

[0023] Reference numeral 19 denotes an arithmetic unit that functions as a multiply-accumulate unit. Reference numeral 21 denotes a write unit that receives the control signal 4 (write control signal) and outputs a write signal 22, and reference numeral 23 denotes a read unit that receives the control signal 4 (read control signal) and outputs a read signal 24.

[0024] Reference numeral 3 denotes an external storage device, such as main memory, that stores all the matrix term data used in the operation. The data bus of the external storage device 3 is one term wide.

[0025] Reference numerals 6, 7, 8, and 9 denote matrix data division storage memories, which serve as the storage means: they receive the data 5 output from the external storage device 3 one term at a time and store it in accordance with the write signal 22. The matrix data division storage memories 6 to 9 output the output data 10 to 13, respectively, in accordance with the read signal 24 supplied from the read unit 23.

[0026] Reference numeral 14 denotes a selector unit that receives the output data 10 to 13 from the matrix data division storage memories 6 to 9. The selector unit 14 outputs operand data 15, 16, 17, and 18, the data required by the arithmetic unit 19, and supplies it to the arithmetic unit 19. The arithmetic unit 19 uses the data 15 to 18 to perform an operation (multiply-accumulate) in accordance with the control signal 4 (operation control signal) and outputs an operation result 20. The operation result is also input to the write unit 21 and is stored in one of the matrix data division storage memories 6, 7, 8, and 9 in accordance with the control signal 4 (write control signal).

[0027] As a concrete example of a matrix operation, consider the multiplication of two matrices α and β. The matrix data division storage memories 6, 7, 8, and 9 each store several items of the term data of the matrices α and β used in the multiplication. The read unit 23 receives the control signal 4 (read control signal) and simultaneously reads, from the matrix term data stored in the memories 6, 7, 8, and 9, the data items 10, 11, 12, and 13 required for one multiply-accumulate operation. The selector unit 14 receives the data items 10, 11, 12, and 13 and the control signal 4 (selection control signal) and supplies the operand data 15, 16, 17, and 18 to the arithmetic unit 19. The arithmetic unit 19 receives the operand data 15, 16, 17, and 18, outputs the operation result 20, and also supplies it to the write unit 21.

[0028] As an example of an operation using the matrix operation device described above, consider multiplying the matrix β from the left by the matrix α. The data in the matrix α is given by (Equation 1) and the data in the matrix β by (Equation 2).

[0029]

(Equation 1)

$$
\alpha = \begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{pmatrix}
$$

[0030]

(Equation 2)

$$
\beta = \begin{pmatrix} A & B & C & D \\ E & F & G & H \\ I & J & K & L \\ M & N & O & P \end{pmatrix}
$$

[0031] First, obtaining the term in row 1, column 1 requires the products a×A, b×E, c×I, and d×M and their sum. Because the arithmetic unit (multiply-accumulate unit) 19 described above can perform two multiplications and add the two products at the same time, a×A, b×E, and the addition of these two products are completed in a single cycle. In the next cycle, c×I and d×M are computed and added, and in the final cycle the two partial results are added to obtain the term in row 1, column 1.

[0032] For this, the four terms a, A, b, and E are needed simultaneously first, and then the four terms c, I, d, and M are needed simultaneously. In the final cycle, only the two previously computed results need to be available at the same time. The remaining terms of the matrix product can be computed in the same way.
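The three-cycle schedule for one result term can be sketched in Python as follows. This is a behavioural illustration only: the function names `mac2` and `term_in_three_cycles` are hypothetical, and how the hardware routes the two partial results back for the final addition is not modelled.

```python
def mac2(x0, y0, x1, y1):
    """One cycle of the multiply-accumulate unit: two products and their sum."""
    return x0 * y0 + x1 * y1


def term_in_three_cycles(row, col):
    """Compute one result term from a row (a, b, c, d) and a column (A, E, I, M)."""
    first = mac2(row[0], col[0], row[1], col[1])    # cycle 1: a*A + b*E
    second = mac2(row[2], col[2], row[3], col[3])   # cycle 2: c*I + d*M
    return first + second                           # cycle 3: add the two partial results


print(term_in_three_cycles((1, 2, 3, 4), (5, 6, 7, 8)))  # 70
```

The first two cycles each consume four operands, and the third consumes only the two partial results.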

[0033] To carry out the per-term operations described above continuously, an average of 10/3 (≈ 3.3) terms per cycle must be available simultaneously. However, data from the external storage device 3, such as main memory, arrives over a bus only one term wide, so four terms cannot be input at the same time. Therefore, the data of all the terms of the matrices α and β is distributed and stored in advance in the matrix data division storage memories 6, 7, 8, and 9 so that the terms required for an operation can be output simultaneously.
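The figure of 10/3 terms per cycle follows from the three-cycle schedule: four operands in each of the first two cycles and two in the last. The short Python calculation below restates that arithmetic; the variable names are illustrative only.

```python
from fractions import Fraction

operands_per_cycle = [4, 4, 2]      # cycle 1, cycle 2, final addition
bus_width_terms = 1                 # external bus delivers one term at a time

average = Fraction(sum(operands_per_cycle), len(operands_per_cycle))
peak = max(operands_per_cycle)

print(average, float(average))      # 10/3, about 3.33
print(peak > bus_width_terms)       # True: the bus alone cannot feed the unit
```

Both the average and the peak demand exceed what a one-term bus can deliver per cycle, which is why the terms are loaded into the internal memories beforehand.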

[0034] When the 4-row, 4-column matrices α and β described above are multiplied, the data is distributed among the four matrix data division storage memories 6, 7, 8, and 9 as follows, for example, where (r, c) denotes the term in row r, column c. The terms (1,1), (1,3), (2,2), (2,4), (3,1), (3,3), (4,2), and (4,4) of the first matrix α, that is, a, c, f, h, i, k, n, and p, are stored in the matrix data division storage memory 6. The terms (1,2), (1,4), (2,1), (2,3), (3,2), (3,4), (4,1), and (4,3) of the first matrix α, that is, b, d, e, g, j, l, m, and o, are stored in the matrix data division storage memory 7. The terms (1,1), (1,3), (2,2), (2,4), (3,1), (3,3), (4,2), and (4,4) of the second matrix β, that is, A, C, F, H, I, K, N, and P, are stored in the matrix data division storage memory 8. The terms (1,2), (1,4), (2,1), (2,3), (3,2), (3,4), (4,1), and (4,3) of the second matrix β, that is, B, D, E, G, J, L, M, and O, are stored in the matrix data division storage memory 9.
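One way to check that this distribution works is to confirm that, for every result term, each group of four operands needed in a cycle comes from four different memories. The Python sketch below does that using the letters from the text. The memories are simply numbered 1 to 4 here, and the helper names are hypothetical.

```python
ALPHA = ["abcd", "efgh", "ijkl", "mnop"]
BETA = ["ABCD", "EFGH", "IJKL", "MNOP"]

# First to fourth memory, with the terms listed in the paragraph above.
MEMORIES = {
    1: set("acfhiknp"),
    2: set("bdegjlmo"),
    3: set("ACFHIKNP"),
    4: set("BDEGJLMO"),
}


def memory_of(term: str) -> int:
    """Return the number of the memory that holds the given term."""
    for number, terms in MEMORIES.items():
        if term in terms:
            return number
    raise KeyError(term)


def operand_groups(i: int, j: int) -> list[list[str]]:
    """The two four-operand groups needed for result term (i, j), 0-based."""
    return [
        [ALPHA[i][k], ALPHA[i][k + 1], BETA[k][j], BETA[k + 1][j]]
        for k in (0, 2)
    ]


def is_conflict_free() -> bool:
    """True if every operand group touches four different memories."""
    return all(
        len({memory_of(term) for term in group}) == 4
        for i in range(4)
        for j in range(4)
        for group in operand_groups(i, j)
    )


print(operand_groups(0, 0))   # [['a', 'b', 'A', 'E'], ['c', 'd', 'I', 'M']]
print(is_conflict_free())     # True
```

Because no two operands of a group share a memory, one read from each memory per cycle is enough to feed the multiply-accumulate unit.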

[0035] All the terms of the matrices α and β are stored in the matrix data division storage memories 6 to 9 as described above because the same term is needed several times when computing the terms in row 1, column 2, in row 1, column 3, and so on, and re-reading a previously used term from the external storage device 3 each time it is needed again would be inefficient. In addition, because all the terms of α and β are stored, multiplying from the right or from the left can be handled merely by changing the instruction, without reordering the terms already held internally.
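The claim that either multiplication order works without reordering can be checked against the checkerboard distribution. In the Python sketch below, `checkerboard` models the distribution described in the text (memories numbered 0 to 3 for brevity), while `by_row_half` is a contrasting layout invented here purely for comparison; neither name comes from the patent.

```python
ALPHA, BETA = 0, 1


def checkerboard(matrix: int, row: int, col: int) -> int:
    """Memories 0-1 hold alpha, 2-3 hold beta, split by (row + col) parity (0-based)."""
    return 2 * matrix + (row + col) % 2


def by_row_half(matrix: int, row: int, col: int) -> int:
    """Contrast layout (not from the source): rows 0-1 in one memory, rows 2-3 in the other."""
    return 2 * matrix + row // 2


def conflict_free(layout, left: int, right: int) -> bool:
    """True if each cycle's four operands for left x right come from four different memories."""
    for i in range(4):
        for j in range(4):
            for k in (0, 2):
                touched = {
                    layout(left, i, k), layout(left, i, k + 1),
                    layout(right, k, j), layout(right, k + 1, j),
                }
                if len(touched) != 4:
                    return False
    return True


print(conflict_free(checkerboard, ALPHA, BETA))  # True
print(conflict_free(checkerboard, BETA, ALPHA))  # True
print(conflict_free(by_row_half, ALPHA, BETA))   # False
```

With the checkerboard distribution, adjacent terms along a row and along a column both fall in different memories, so swapping which matrix is read by rows and which by columns changes only the addresses issued.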

[0036] According to the matrix operation device of this embodiment, the write unit 21 writes the matrix term data to the matrix data division storage memories 6 to 9 in accordance with the control signal 4; the read unit 23 simultaneously reads the plurality of data items required for the matrix operation from the matrix data division storage memories 6 to 9 in accordance with the control signal 4; and the selector unit 14 selects among those data items in accordance with the control signal 4 and supplies the operand data 15 to 18 to the arithmetic unit 19. Because the matrix term data is divided among the matrix data division storage memories 6 to 9 into as many parts as are needed for the term data 10 to 13 required by an operation in the arithmetic unit 19 to be output simultaneously, all the term data required for the operation can be output from the matrix data division storage memories 6 to 9 to the arithmetic unit 19 simultaneously in accordance with the control signal 4. Moreover, even when the internal operations that make up a matrix multiplication are performed back to back, matrix term data can be supplied to the arithmetic unit without interruption. Matrix operations can therefore be performed at high speed.
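The overall flow of this embodiment (write both matrices into four memories, read four operands per cycle, multiply-accumulate, combine) can be simulated in a few lines of Python. This is a rough behavioural sketch, not a model of the actual circuit: the function names are hypothetical, the memories are plain dictionaries, and the cycle count simply assumes three non-overlapped cycles per result term as in the worked example.

```python
def build_memories(alpha, beta):
    """Distribute both 4x4 matrices over four memories, checkerboard style."""
    memories = [dict() for _ in range(4)]
    for m, matrix in enumerate((alpha, beta)):
        for r in range(4):
            for c in range(4):
                memories[2 * m + (r + c) % 2][(r, c)] = matrix[r][c]
    return memories


def read(memories, m, r, c):
    """Fetch one term from the memory that holds it."""
    return memories[2 * m + (r + c) % 2][(r, c)]


def multiply(alpha, beta):
    """Return (alpha x beta, cycle count) using only the internal memories."""
    memories = build_memories(alpha, beta)
    result = [[0] * 4 for _ in range(4)]
    cycles = 0
    for i in range(4):
        for j in range(4):
            partial = []
            for k in (0, 2):
                # Four operands per cycle, one from each memory.
                x0, x1 = read(memories, 0, i, k), read(memories, 0, i, k + 1)
                y0, y1 = read(memories, 1, k, j), read(memories, 1, k + 1, j)
                partial.append(x0 * y0 + x1 * y1)   # two products and their sum
                cycles += 1
            result[i][j] = partial[0] + partial[1]  # final addition
            cycles += 1
    return result, cycles


alpha = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
beta = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
product, cycles = multiply(alpha, beta)
print(product == alpha, cycles)  # True 48
```

Once the 32 terms are loaded, the external storage device is not touched again during the multiplication.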

[0037] In the embodiment described above, only one arithmetic unit performing the multiply-accumulate operation is provided, but two or four multiply-accumulate units operating in parallel may be provided instead; the number of arithmetic units is not limited to these values and may be any number. Providing many arithmetic units in parallel allows several operations to be performed simultaneously, so that many operations are completed in few cycles and the matrix operation is accelerated. As the number of arithmetic units increases, the number of data items that must be output simultaneously also increases, so the number of matrix data division storage memories must be increased accordingly; eight such memories may be used, and the number may be increased further.

【００３８】ここで、マトリクスデータ分割格納メモリ
を８個にした場合において４行４列の行列α，βのかけ
算を行う場合を例にとると、８個のマトリクスデータ分
割格納メモリには、例えば以下のように、データを分配
する。すなわち、第１から第８までの８個のマトリクス
データ分割格納メモリに２つの行列の項のデータを格納
するときに、１つ目の行列αの１行１列、２行４列、３
行３列、４行２列の項のデータａ，ｈ，ｋ，ｎを第１の
マトリクスデータ分割格納メモリに、１つ目の行列αの
１行２列、２行１列、３行４列、４行３列の項のデータ
ｂ，ｅ，ｌ，ｏを第２のマトリクスデータ分割格納メモ
リに、１つ目の行列αの１行３列、２行２列、３行１
列、４行４列の項のデータｃ，ｆ，ｉ，ｐを第３のマト
リクスデータ分割格納メモリに、１つ目の行列αの１行
４列、２行３列、３行２列、４行１列の項のデータｄ，
ｇ，ｊ，ｍを第４のマトリクスデータ分割格納メモリ
に、２つ目の行列βの１行１列、２行４列、３行３列、
４行２列の項のデータＡ，Ｈ，Ｋ，Ｎを第５のマトリク
スデータ分割格納メモリに、２つ目の行列βの１行２
列、２行１列、３行４列、４行３列の項のデータＢ，
Ｅ，Ｌ，Ｏを第６のマトリクスデータ分割格納メモリ
に、２つ目の行列βの１行３列、２行２列、３行１列、
４行４列の項のデータＣ，Ｆ，Ｉ，Ｐを第７のマトリク
スデータ分割格納メモリに、２つ目の行列βの１行４
列、２行３列、３行２列のデータＤ，Ｇ，Ｊ，Ｍを第８
のマトリクスデータ分割格納メモリに格納する。Here, taking as an example the multiplication of two 4-row, 4-column matrices α and β when the number of matrix data division storage memories is eight, the data are distributed among the eight matrix data division storage memories as follows, for example. That is, when the data of the terms of the two matrices are stored in the first to eighth matrix data division storage memories, the data a, h, k, and n of the terms at row 1 column 1, row 2 column 4, row 3 column 3, and row 4 column 2 of the first matrix α are stored in the first matrix data division storage memory; the data b, e, l, and o of the terms at row 1 column 2, row 2 column 1, row 3 column 4, and row 4 column 3 of the first matrix α are stored in the second matrix data division storage memory; the data c, f, i, and p of the terms at row 1 column 3, row 2 column 2, row 3 column 1, and row 4 column 4 of the first matrix α are stored in the third matrix data division storage memory; and the data d, g, j, and m of the terms at row 1 column 4, row 2 column 3, row 3 column 2, and row 4 column 1 of the first matrix α are stored in the fourth matrix data division storage memory. Likewise, the data A, H, K, and N of the terms at row 1 column 1, row 2 column 4, row 3 column 3, and row 4 column 2 of the second matrix β are stored in the fifth matrix data division storage memory; the data B, E, L, and O of the terms at row 1 column 2, row 2 column 1, row 3 column 4, and row 4 column 3 of the second matrix β are stored in the sixth matrix data division storage memory; the data C, F, I, and P of the terms at row 1 column 3, row 2 column 2, row 3 column 1, and row 4 column 4 of the second matrix β are stored in the seventh matrix data division storage memory; and the data D, G, J, and M of the terms at row 1 column 4, row 2 column 3, row 3 column 2, and row 4 column 1 of the second matrix β are stored in the eighth matrix data division storage memory.
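The eight-memory distribution listed above can be reproduced with a short sketch. The text only enumerates the positions; the diagonal rule `(row + col - 2) mod 4` used here is inferred from that enumeration, and the names `bank_index`, `distribute`, and `memories` are illustrative assumptions rather than terms from the specification.

```python
N = 4  # the example uses 4-row, 4-column matrices


def bank_index(row, col, matrix):
    """Return the 1-based memory number for a 1-based (row, col) term.

    matrix is 0 for the first matrix (alpha) and 1 for the second (beta).
    The diagonal rule is an inference from the positions listed in the
    text, not a formula stated there.
    """
    return matrix * N + (row + col - 2) % N + 1


def distribute(alpha, beta):
    """Split two 4x4 matrices (lists of rows) across eight memories."""
    memories = {k: [] for k in range(1, 2 * N + 1)}
    for m, mat in enumerate((alpha, beta)):
        for r in range(1, N + 1):
            for c in range(1, N + 1):
                memories[bank_index(r, c, m)].append(mat[r - 1][c - 1])
    return memories


# Terms labelled row by row, as in the text: a..p for alpha, A..P for beta.
alpha = [list("abcd"), list("efgh"), list("ijkl"), list("mnop")]
beta = [list("ABCD"), list("EFGH"), list("IJKL"), list("MNOP")]
memories = distribute(alpha, beta)
```

Under this rule the four terms of any one row of α, and the four terms of any one column of β, each fall in four different memories, which is what allows them to be read at the same time.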

【００３９】そして、この８個のマトリクスデータ分割
格納メモリから読み出されるデータをもとに２個または
４個の演算器を用いて積和演算を行って、行列のかけ算
を行う。Then, based on the data read from the eight matrix data division storage memories, a product-sum operation is performed using two or four arithmetic units to perform matrix multiplication.
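As a rough illustration of the product-sum step, the sketch below multiplies two 4×4 matrices while reading operands only from the divided memories. The schedule is an assumption: in each cycle, `units` arithmetic units (1, 2, or 4) each take one term of a row of the first matrix and one term of a column of the second. The function names and the 0-based rule `(row + col) mod 4` are likewise illustrative. The code asserts that no memory is read twice within a cycle.

```python
N = 4  # matrix size used in the example


def bank(row, col):
    """0-based memory index within one matrix's group of four memories."""
    return (row + col) % N


def store(matrix):
    """Distribute a 4x4 matrix over four memories, keyed by (row, col)."""
    memories = [{} for _ in range(N)]
    for r in range(N):
        for c in range(N):
            memories[bank(r, c)][(r, c)] = matrix[r][c]
    return memories


def multiply(alpha, beta, units=4):
    """Multiply two 4x4 matrices, reading `units` operand pairs per cycle.

    Returns the product and the number of read cycles used.
    """
    if N % units != 0:
        raise ValueError("units must divide 4")
    a_mem, b_mem = store(alpha), store(beta)
    product = [[0] * N for _ in range(N)]
    cycles = 0
    for i in range(N):
        for j in range(N):
            for k0 in range(0, N, units):
                ks = range(k0, k0 + units)
                # Each memory may be read only once per cycle.
                assert len({bank(i, k) for k in ks}) == units
                assert len({bank(k, j) for k in ks}) == units
                product[i][j] += sum(
                    a_mem[bank(i, k)][(i, k)] * b_mem[bank(k, j)][(k, j)]
                    for k in ks
                )
                cycles += 1
    return product, cycles


alpha = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
beta = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
product, cycles = multiply(alpha, beta, units=4)
```

With this assumed schedule, four units finish the 16 result terms in 16 read cycles and two units need 32; in both cases every operand of a cycle comes from a different memory.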

【００４０】[0040]

【発明の効果】この発明の行列演算装置によると、行列
全体のデータを格納する記憶手段を有し、演算器の演算
に同時に必要なデータを同時に演算器に与えることがで
きるように、記憶手段を複数に分割して、さらに行列の
演算に同時に必要になるデータを別々の記憶手段に格納
しておくことにより、行列演算全体が終了するまで演算
器に連続して必要なデータを与えることが可能になり、
行列演算を高速に行うことができるという効果を奏す
る。[Effects of the Invention] According to the matrix operation device of the present invention, storage means for storing the data of the entire matrix is provided. The storage means is divided into a plurality of parts so that the data an arithmetic unit needs at the same time can be supplied to it simultaneously, and the data required simultaneously for the matrix operation are stored in separate storage means. As a result, the necessary data can be supplied to the arithmetic unit continuously until the entire matrix operation is completed, and the matrix operation can be performed at high speed.

【００４１】また、演算器を複数個設けることにより、
複数の演算を同時に行うことが可能になり、演算の回数
を減少させることができ、演算をいっそう高速に行うこ
とができるという効果を奏する。In addition, by providing a plurality of arithmetic units, a plurality of operations can be performed at the same time, the number of operations can be reduced, and the operation can be performed even faster.

【００４２】また、特定の記憶手段に特定の行列の項を
格納しておくことにより、行列演算をする際に必要なデ
ータを連続して同時に引き出せ、また特定の記憶手段に
格納してあるので、同じ行列を利用して複数の演算をさ
せることができるという効果を奏する。Furthermore, by storing specific matrix terms in specific storage means, the data required for a matrix operation can be read out continuously and simultaneously. Because the terms remain stored in those specific storage means, a plurality of operations can be performed using the same matrix.

[Brief description of the drawings]

【図１】本発明の実施の形態における行列演算装置の構
成を示すブロック図である。FIG. 1 is a block diagram illustrating a configuration of a matrix operation device according to an embodiment of the present invention.

[Explanation of symbols]

１命令２デコーダ３外部記憶装置４制御信号５データ６マトリクスデータ分割格納メモリ７マトリクスデータ分割格納メモリ８マトリクスデータ分割格納メモリ９マトリクスデータ分割格納メモリ１０マトリクスデータ分割格納メモリ６の出力データ１１マトリクスデータ分割格納メモリ７の出力データ１２マトリクスデータ分割格納メモリ８の出力データ１３マトリクスデータ分割格納メモリ９の出力データ１４セレクタ部１５演算用データ１６演算用データ１７演算用データ１８演算用データ１９演算器２０演算結果２１書き込み部２２書き込み信号２３読み出し部２４読み出し信号

REFERENCE SIGNS LIST
1 instruction
2 decoder
3 external storage device
4 control signal
5 data
6 matrix data division storage memory
7 matrix data division storage memory
8 matrix data division storage memory
9 matrix data division storage memory
10 output data of matrix data division storage memory 6
11 output data of matrix data division storage memory 7
12 output data of matrix data division storage memory 8
13 output data of matrix data division storage memory 9
14 selector unit
15 operation data
16 operation data
17 operation data
18 operation data
19 arithmetic unit
20 operation result
21 writing unit
22 write signal
23 reading unit
24 read signal

フロントページの続き (56)参考文献特開昭54−120546（ＪＰ，Ａ) 特開昭62−97060（ＪＰ，Ａ) 特開平８−255151（ＪＰ，Ａ) 特開平５−346935（ＪＰ，Ａ) 特開平５−324700（ＪＰ，Ａ) 特開平４−43461（ＪＰ，Ａ) 特開平２−77967（ＪＰ，Ａ) 特開昭60−101671（ＪＰ，Ａ) 特開昭55−49763（ＪＰ，Ａ) 清木泰，ＡＳｐｅｃｉａｌ−ＰｕｒｐｏｓｅＣｏｍｐｕｔｅｒｆｏｒＳｏｌｖｉｎｇＤｅｎｓｅＭａｔｒｉｘＢａｓｅｄｏｎＧａｕｓｓｉａｎＥｌｉｍｉｎａｔｉｏｎＡｌｇｏｒｉｔｈｍ：ＧＥＮＥＲＡＬ，修士学位論文、日本，東京大学大学院総合文化研究科，1996年３月清木泰，他５名，密行列専用計算機ＧＥＮＥＲＡＬ−１の開発，情報処理学会研究報告，日本，社団法人情報処理学会，1995年３月10日，第95巻，第29 号，（95−ＡＲＣ−111），ｐ．65−72 中西恒夫，他３名，ＤＰＧ：データ分割グラフ，情報処理学会研究報告，日本，社団法人情報処理学会，1994年１月28日，第94巻，第13号，（94−ＡＲＣ −104，94−ＯＳ−62），ｐ．121−128 尾林善正，他４名，物理ＰＥのデータ構造を併用した並列記述言語ＡＤＥＴＲＡＮのプログラミングとその実効性能, 情報処理学会研究報告，日本，社団法人情報処理学会，1993年８月19日，第93 巻，第72号，（93−ＨＰＣ−48），ｐ. １−７ (58)調査した分野(Int.Cl.⁷，ＤＢ名) G06F 17/10 - 17/18

Continuation of front page

(56) References
JP-A-54-120546 (JP, A)
JP-A-62-97060 (JP, A)
JP-A-8-255151 (JP, A)
JP-A-5-346935 (JP, A)
JP-A-5-324700 (JP, A)
JP-A-4-43461 (JP, A)
JP-A-2-77967 (JP, A)
JP-A-60-101671 (JP, A)
JP-A-55-49763 (JP, A)
清木泰, "A Special-Purpose Computer for Solving Dense Matrix Based on Gaussian Elimination Algorithm: GENERAL", master's thesis, Graduate School of Arts and Sciences, The University of Tokyo, Japan, March 1996
清木泰 and 5 others, "Development of GENERAL-1, a special-purpose computer for dense matrices", IPSJ research report, Information Processing Society of Japan, Japan, March 10, 1995, Vol. 95, No. 29 (95-ARC-111), pp. 65-72
中西恒夫 and 3 others, "DPG: Data Partitioning Graph", IPSJ research report, Information Processing Society of Japan, Japan, January 28, 1994, Vol. 94, No. 13 (94-ARC-104, 94-OS-62), pp. 121-128
尾林善正 and 4 others, "Programming of the parallel description language ADETRAN combined with the data structure of physical PEs, and its effective performance", IPSJ research report, Information Processing Society of Japan, Japan, August 19, 1993, Vol. 93, No. 72 (93-HPC-48), pp. 1-7

(58) Fields searched (Int. Cl.⁷, DB name): G06F 17/10 - 17/18

Claims

(57) [Claims]

1. A matrix operation device comprising: a decoder that decodes an input matrix operation instruction and outputs a write control signal, a read control signal, a selection control signal, and an operation control signal; a plurality of storage means for storing data of the terms of a plurality of matrices read from an external storage device; a writing unit that receives the write control signal as an input and supplies a write signal to the plurality of storage means, thereby writing the data of the terms of the matrices into the plurality of storage means; a reading unit that receives the read control signal as an input and supplies a read signal to the plurality of storage means, thereby simultaneously reading a plurality of data from among the data of the terms of the matrices stored in the plurality of storage means; an arithmetic unit that receives the operation control signal and a plurality of operation data as inputs, performs a matrix operation, and outputs an operation result; and a selector unit that is provided between the plurality of storage means and the arithmetic unit, receives the plurality of data and the selection control signal as inputs, and selectively supplies the plurality of operation data to a plurality of input terminals of the arithmetic unit; wherein, when the data of the terms of the matrices are stored in the plurality of storage means, the data of the terms required for an operation in the arithmetic unit are stored divided into a number of parts that can be output simultaneously, and all the data of the terms required for the operation are supplied simultaneously from the plurality of storage means to the arithmetic unit in accordance with the read control signal; and wherein the data of the terms of the matrices required for the matrix operation are divided into four and stored in the storage means, such that, when the data of the terms of two matrices are stored in first to fourth storage means, the terms at row 1 column 1, row 1 column 3, row 2 column 2, row 2 column 4, row 3 column 1, row 3 column 3, row 4 column 2, and row 4 column 4 of the first matrix are stored in the first storage means; the terms at row 1 column 2, row 1 column 4, row 2 column 1, row 2 column 3, row 3 column 2, row 3 column 4, row 4 column 1, and row 4 column 3 of the first matrix are stored in the second storage means; the terms at row 1 column 1, row 1 column 3, row 2 column 2, row 2 column 4, row 3 column 1, row 3 column 3, row 4 column 2, and row 4 column 4 of the second matrix are stored in the third storage means; and the terms at row 1 column 2, row 1 column 4, row 2 column 1, row 2 column 3, row 3 column 2, row 3 column 4, row 4 column 1, and row 4 column 3 of the second matrix are stored in the fourth storage means.

2. The matrix operation device according to claim 1, wherein a plurality of the arithmetic units are provided, and the plurality of arithmetic units can operate in parallel.

3. A matrix operation device comprising: a decoder that decodes an input matrix operation instruction and outputs a write control signal, a read control signal, a selection control signal, and an operation control signal; a plurality of storage means for storing data of the terms of a plurality of matrices read from an external storage device; a writing unit that receives the write control signal as an input and supplies a write signal to the plurality of storage means, thereby writing the data of the terms of the matrices into the plurality of storage means; a reading unit that receives the read control signal as an input and supplies a read signal to the plurality of storage means, thereby simultaneously reading a plurality of data from among the data of the terms of the matrices stored in the plurality of storage means; an arithmetic unit that receives the operation control signal and a plurality of operation data as inputs, performs a matrix operation, and outputs an operation result; and a selector unit that is provided between the plurality of storage means and the arithmetic unit, receives the plurality of data and the selection control signal as inputs, and selectively supplies the plurality of operation data to a plurality of input terminals of the arithmetic unit; wherein, when the data of the terms of the matrices are stored in the plurality of storage means, the data of the terms required for an operation in the arithmetic unit are stored divided into a number of parts that can be output simultaneously, and all the data of the terms required for the operation are supplied simultaneously from the plurality of storage means to the arithmetic unit in accordance with the read control signal; and wherein the data of the terms of the matrices required for the matrix operation are divided into eight and stored in the storage means, such that, when the data of the terms of two matrices are stored in first to eighth storage means, the terms at row 1 column 1, row 2 column 4, row 3 column 3, and row 4 column 2 of the first matrix are stored in the first storage means; the terms at row 1 column 2, row 2 column 1, row 3 column 4, and row 4 column 3 of the first matrix are stored in the second storage means; the terms at row 1 column 3, row 2 column 2, row 3 column 1, and row 4 column 4 of the first matrix are stored in the third storage means; the terms at row 1 column 4, row 2 column 3, row 3 column 2, and row 4 column 1 of the first matrix are stored in the fourth storage means; the terms at row 1 column 1, row 2 column 4, row 3 column 3, and row 4 column 2 of the second matrix are stored in the fifth storage means; the terms at row 1 column 2, row 2 column 1, row 3 column 4, and row 4 column 3 of the second matrix are stored in the sixth storage means; the terms at row 1 column 3, row 2 column 2, row 3 column 1, and row 4 column 4 of the second matrix are stored in the seventh storage means; and the terms at row 1 column 4, row 2 column 3, row 3 column 2, and row 4 column 1 of the second matrix are stored in the eighth storage means.

4. The matrix operation device according to claim 3, wherein a plurality of the arithmetic units are provided, and the plurality of arithmetic units can operate in parallel.