JPH05120321A

JPH05120321A - Method for performing matrix calculation

Info

Publication number: JPH05120321A
Application number: JP28514691A
Authority: JP
Inventors: Mitsuharu Oki; 光晴大木
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1991-10-30
Filing date: 1991-10-30
Publication date: 1993-05-18

Abstract

PURPOSE: To perform matrix calculations using a parallel processor of the SIMD type. CONSTITUTION: Circuits, each composed of at least one memory 100 and one processor element (arithmetic processing circuit) 200, are arranged in parallel. In addition to exchanging data with its corresponding memory 100, each processor element 200 can receive data from the memories on both sides, so data stored in the adjacent memories can also be processed. Matrix calculations are performed using these circuits.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to a method of performing matrix calculations that is particularly effective in video signal processing.

[0002]

2. Description of the Related Art: A parallel processor of the SIMD (Single Instruction, Multiple Data) type, in which a plurality of unit arithmetic circuits are controlled by a single control signal, is a circuit in which a plurality of processor elements are arranged in parallel and a common instruction (control) signal is given to all of them, so that every processor element performs the same arithmetic processing.

[0003] FIGS. 1 and 2 show the circuit configuration of a typical SIMD parallel processor. In FIG. 1, circuits each consisting of one memory 100 and one processor element (arithmetic processing circuit) 200 are arranged in parallel. Besides input and output to and from its corresponding memory, each processor element can receive data from the memories on both sides, so it can also process data stored in the adjacent memories.

[0004] FIG. 2 shows the processor element (arithmetic processing circuit) of FIG. 1 in detail. In FIG. 2, data from output M of the corresponding memory (the m-th memory for the m-th processor element), data from output L of the memory on the left (the (m−1)-th memory) and data from output R of the memory on the right (the (m+1)-th memory) are supplied via the data bus 10 to the multiplier 20 and to the 2-input selector 30 (whose other input is fixed at 0). The multiplier 20 multiplies the data by a coefficient read from the coefficient memory 40.

[0005] Depending on the selection signal (0 or 1) read from the addition input selector memory 50, the 2-input selector 30 supplies the adder 60 with the value from the data bus 10 when the signal is 1, and with 0 when the signal is 0.

[0006] Therefore, when the signal read from the addition input selector memory 50 is 1, the processor element computes (data read from the m-th, (m−1)-th or (m+1)-th memory) × (coefficient read from the coefficient memory) + (data read from the m-th, (m−1)-th or (m+1)-th memory). By feeding this result via the data bus 10 to input M of the m-th memory, the multiply-add result can be stored in the m-th memory.

[0007] When the signal read from the addition input selector memory 50 is 0, the processor element computes (data read from the m-th, (m−1)-th or (m+1)-th memory) × (coefficient read from the coefficient memory). By feeding this result via the data bus 10 to input M of the m-th memory, the product can be stored in the m-th memory.
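The datapath of paragraphs [0004] to [0007] can be modelled, as a minimal Python sketch, by a single function. The name `pe_multiply_add` and its parameter names are hypothetical, and the bus routing is reduced to passing in values that have already been selected from M, L or R.

```python
def pe_multiply_add(bus_mul: int, coef: int, bus_add: int, select: int) -> int:
    """One processor-element operation: coef * bus_mul + (bus_add if select else 0).

    bus_mul: value routed over data bus 10 (from M, L or R) to multiplier 20
    coef:    word read from coefficient memory 40
    bus_add: value routed over data bus 10 (from M, L or R) to 2-input selector 30
    select:  bit read from addition input selector memory 50
    """
    addend = bus_add if select == 1 else 0  # selector 30: the other input is fixed at 0
    return coef * bus_mul + addend          # multiplier 20 feeding adder 60


# select = 1: multiply-add as in [0006]; select = 0: multiply only as in [0007]
print(pe_multiply_add(3, 5, 7, 1))  # 22
print(pe_multiply_add(3, 5, 7, 0))  # 15
```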

[0008] In FIG. 2, the address control of the coefficient memory 40, the address control of the addition input selector memory 50, the selection of which memory's data is supplied (via the data bus) to the multiplier 20, and the selection of which memory's data is supplied (via the data bus) to the 2-input selector 30 are all performed by a control signal common to all the processor elements (arithmetic processing circuits) 200 (the arithmetic processing control signal of FIG. 1).

[0009] In FIG. 1, the address control of the memory 100 (that is, designation of the address holding the data to be output at M, of the address at which data arriving at input M is stored, of the address holding the data to be output at L, and of the address holding the data to be output at R) is likewise performed by a control signal common to all the processor elements (arithmetic processing circuits) 200 (the address control signal).
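Paragraphs [0008] and [0009] mean that in each cycle every processor element receives the same addresses and source selections, while the words stored at those addresses in its own coefficient and selector memories may differ. The sketch below models one such broadcast cycle; the function `simd_step`, its argument names and the choice to read 0 beyond the ends of the array are assumptions made for illustration.

```python
from typing import List, Sequence


def simd_step(mem: List[List[int]], coef: Sequence[int], sel: Sequence[int],
              mul_src: str, mul_addr: int, add_src: str, add_addr: int,
              dst_addr: int) -> None:
    """Apply one broadcast instruction to every processor element at once.

    mem[h][a]: word a of the memory 100 paired with processor element h
    coef[h]:   word PE h reads from its own coefficient memory (common address)
    sel[h]:    bit PE h reads from its own addition input selector memory
    mul_src, add_src: 'M' (own memory), 'L' (left neighbour) or 'R' (right neighbour)
    mul_addr, add_addr, dst_addr: memory addresses shared by all PEs this cycle
    """
    n = len(mem)

    def read(h: int, src: str, addr: int) -> int:
        j = {"M": h, "L": h - 1, "R": h + 1}[src]
        return mem[j][addr] if 0 <= j < n else 0  # assumed: outside the array reads 0

    # Lock-step SIMD: every element reads the old contents before any element writes.
    results = [coef[h] * read(h, mul_src, mul_addr)
               + (read(h, add_src, add_addr) if sel[h] else 0)
               for h in range(n)]
    for h, value in enumerate(results):
        mem[h][dst_addr] = value


mem = [[1], [2], [3]]
simd_step(mem, coef=[1, 1, 1], sel=[1, 1, 1], mul_src="M", mul_addr=0,
          add_src="L", add_addr=0, dst_addr=0)
print(mem)  # [[1], [3], [5]]
```

Because all reads happen before any write, the result is [[1], [3], [5]] rather than a running sum, which is the behaviour a single common control signal implies.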

[0010] In some configurations, the memory 100 of FIG. 1 and the coefficient memory 40 and addition input selector memory 50 of FIG. 2 are combined into a single memory. Some hardware, moreover, has only the adder 60 in the processor element (arithmetic processing circuit) 200. In that case a multiplication is carried out by accumulating partial products (each formed by gating the multiplicand with one bit of the multiplier) in the adder 60, which has the drawback of requiring more operation cycles; in either case, however, the same operations as the circuit of FIG. 2 can be performed.
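For processor elements that contain only the adder 60, the following minimal sketch shows why a multiplication costs several cycles: one partial product is added per multiplier bit. The function name and the fixed bit width are illustrative assumptions, and only unsigned operands are handled.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, bits: int = 8):
    """Multiply unsigned integers using additions only; return (product, cycles).

    In each cycle one partial product (the multiplicand shifted by b places and
    gated by bit b of the multiplier) is accumulated, so an n-bit multiplier
    costs n addition cycles instead of a single multiply.
    """
    if multiplicand < 0 or not 0 <= multiplier < (1 << bits):
        raise ValueError("operands must be unsigned and the multiplier must fit in `bits`")
    acc, cycles = 0, 0
    for b in range(bits):
        partial = (multiplicand << b) if (multiplier >> b) & 1 else 0
        acc += partial  # the only arithmetic operation: one addition per cycle
        cycles += 1
    return acc, cycles


print(shift_add_multiply(13, 11))  # (143, 8)
```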

[0011] In video signal processing, the same arithmetic processing is often applied to every pixel, so the SIMD method is well suited and causes no inconvenience. Moreover, because the instructions of a SIMD system (the arithmetic processing control signal and the address control signal of FIG. 1) are common, a single control circuit suffices to issue them, which keeps the circuit scale small.

[0012] In video signal processing, however, the pixels do not always all undergo the same arithmetic processing. For example, there is processing in which a series of input pixel data is divided into groups of four [Equation 1], each group is regarded as a 4-dimensional vector, and each vector is multiplied by a 4×4 constant matrix [Equation 2], as expressed by [Equation 3].
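As a point of reference, the direct (non-parallel) form of this block processing can be written as follows. This is a plain Python sketch with a hypothetical function name, not the parallel method of the invention; it is mainly useful as a check for any parallel implementation.

```python
from typing import List, Sequence


def blockwise_matvec(samples: Sequence[float],
                     L: Sequence[Sequence[float]]) -> List[float]:
    """Split `samples` into consecutive groups of k = len(L) and replace each
    group x by the matrix-vector product L x, computed directly."""
    k = len(L)
    if len(samples) % k:
        raise ValueError("number of samples must be a multiple of k")
    out: List[float] = []
    for start in range(0, len(samples), k):
        x = samples[start:start + k]
        out.extend(sum(L[r][c] * x[c] for c in range(k)) for r in range(k))
    return out


# With a lower-triangular matrix of ones, each group becomes its running sums.
tri = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]
print(blockwise_matvec([1, 2, 3, 4, 10, 20, 30, 40], tri))  # [1, 3, 6, 10, 10, 30, 60, 100]
```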

[0013] A representative example of image processing in which a series of pixels is divided into groups, and the pixel data in each group is regarded as one vector and multiplied by a fixed coefficient matrix, is the discrete cosine transform (DCT).
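The patent does not list DCT coefficients. Purely as an illustration, the orthonormal k-point DCT-II is one common choice of such a fixed matrix; the sketch below builds it so that it could serve as the constant matrix in the method described later. The function name and the normalisation are assumptions.

```python
import math
from typing import List


def dct_matrix(k: int) -> List[List[float]]:
    """Orthonormal k-point DCT-II matrix C with
    C[r][c] = a_r * cos(pi * (2c + 1) * r / (2k)), a_0 = sqrt(1/k), a_r = sqrt(2/k)."""
    rows = []
    for r in range(k):
        a = math.sqrt((1.0 if r == 0 else 2.0) / k)
        rows.append([a * math.cos(math.pi * (2 * c + 1) * r / (2 * k)) for c in range(k)])
    return rows


C = dct_matrix(4)
# Row 0 is the scaled block sum (DC term): a block of four ones maps to 2.0.
print(round(sum(C[0]), 12))  # 2.0
```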

[0014] However, no method of performing such matrix calculations easily has been known.

[0015]

PROBLEM TO BE SOLVED BY THE INVENTION

The problem to be solved is that no method has been known for easily performing matrix calculations on a parallel processor of the type in which a plurality of unit arithmetic circuits are controlled by a single control signal.

[0016]

SUMMARY OF THE INVENTION

The present invention is a matrix calculation method for a parallel processor in which a plurality of unit arithmetic circuits, each comprising at least one memory and one processor element capable of multiplication and addition, are arranged in parallel, in which each processor element can receive data from the memories on both sides in addition to exchanging data with its corresponding memory, and in which the plurality of unit arithmetic circuits are controlled by a single control signal. A series of input data is stored sequentially in the corresponding memories and divided into groups of k, and the input data in each group is regarded as one k-dimensional vector and multiplied by a k×k constant matrix $[L_{i,j}]$. In this method: first, instructions mpy(i) (i = 0 to k−1), which in the (nk+m)-th processor element (n ≥ 0, 0 ≤ m ≤ k−1) multiply $L_{(i+m) \bmod k,\,m}$ by the input data $A_{nk+m}$ stored in the corresponding memory, are given to the parallel processor in sequence; second, an instruction add1, which computes (data stored in the corresponding, h-th, memory) + $V_h$ × (data stored in an adjacent, (h−1)-th or (h+1)-th, memory), where $V_h = 0$ when h mod k = 0 and $V_h = 1$ otherwise, is given to the parallel processor a plurality of times; third, an instruction add2, which computes $W_h$ × (data stored in the corresponding, h-th, memory) + $Y_h$ × (data stored in the adjacent memory on the side opposite to that used by add1, the (h+1)-th or (h−1)-th), where $W_h = 1$ and $Y_h = 0$ when h mod k = k−1, and $W_h = 0$ and $Y_h = 1$ otherwise, is given to the parallel processor a plurality of times; and fourth, the instruction add1 is given to the parallel processor again.
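The four steps can be checked with a small simulation. The following Python sketch is a model under stated assumptions, not the patented circuit: it assumes k+1 memory words per element (address 0 holding the input), treats reads beyond the ends of the array as 0, and picks one assignment of addresses for each add1 and add2 that makes the steps produce L x (the summary itself does not fix these addresses). The function name is hypothetical, and one cycle is counted per instruction.

```python
from typing import List, Sequence, Tuple


def simd_matrix_multiply(x: Sequence[int],
                         L: Sequence[Sequence[int]]) -> Tuple[List[int], int]:
    """Simulate mpy(i), add1 (k-1 times), add2 (k-1 times) and a final add1.

    Returns the words at address 0 of every memory afterwards, plus the cycle count.
    """
    k, n = len(L), len(x)
    if n % k:
        raise ValueError("input length must be a multiple of k")
    mem = [[v] + [0] * k for v in x]      # k+1 words per memory; address 0 = input
    pos = [h % k for h in range(n)]        # m = position of element h in its group
    cycles = 0

    def step(coef, own_addr, src, nb_addr, sel, dst):
        """coef*own + sel*neighbour for all elements at once (read, then write)."""
        nonlocal cycles

        def nb(h):
            j = h - 1 if src == "L" else h + 1
            return mem[j][nb_addr] if 0 <= j < n else 0

        res = [coef[h] * mem[h][own_addr] + (nb(h) if sel[h] else 0) for h in range(n)]
        for h in range(n):
            mem[h][dst] = res[h]
        cycles += 1

    V = [0 if m == 0 else 1 for m in pos]      # add1: nothing enters a group's first element
    W = [1 if m == k - 1 else 0 for m in pos]  # add2: last element keeps its own word
    Y = [0 if m == k - 1 else 1 for m in pos]  # add2: others take the right neighbour's word
    ones, zeros = [1] * n, [0] * n

    def product_addr(i):                       # assumed address of the product of mpy(i)
        return 1 if i == 0 else k + 1 - i

    # Step 1: mpy(i); element nk+m multiplies its input by L[(i+m) mod k][m].
    for i in range(k):
        step([L[(i + m) % k][m] for m in pos], 0, "L", 0, zeros, product_addr(i))
    # Step 2: add1 (k-1 times): product of mpy(k-t) + V * left neighbour's previous sum.
    for t in range(1, k):
        step(ones, 1 + t, "L", t, V, 1 + t)
    # Step 3: add2 (k-1 times): partial sums move one element to the left per cycle.
    for u in range(1, k):
        step(W, k - u, "R", k - u + 1, Y, k - u)
    # Step 4: final add1, written to address 0.
    step(ones, 1, "L", k, V, 0)
    return [mem[h][0] for h in range(n)], cycles


L4 = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(simd_matrix_multiply([1, 2, 3, 4, 5, 6, 7, 8], L4))
# ([30, 70, 110, 150, 70, 174, 278, 382], 11)
```

The masks V, W and Y are what keep the groups independent: no add1 carries a value into the first element of a group, and no add2 pulls a value out of the next group.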

[0017]

OPERATION

With this method, matrix calculations can be performed easily on a parallel processor in which a plurality of unit arithmetic circuits are controlled by a single control signal.

[0018]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Consider, for example, a SIMD parallel processor in which a plurality of circuits, each consisting of one memory and one processor element (arithmetic processing circuit) capable of multiplication and addition, are arranged in parallel, and in which each processor element can receive data from the memories on both sides in addition to exchanging data with its corresponding memory.

[0019] In this parallel processor, the series of input data $\ldots, A_{n-1}, B_{n-1}, C_{n-1}, D_{n-1}, A_n, B_n, C_n, D_n, A_{n+1}, B_{n+1}, C_{n+1}, D_{n+1}, \ldots$ is stored in the memories $\ldots, 4(n-1), 4(n-1)+1, 4(n-1)+2, 4(n-1)+3, 4n, 4n+1, 4n+2, 4n+3, 4(n+1), 4(n+1)+1, 4(n+1)+2, 4(n+1)+3, \ldots$, respectively.

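[0020] This stored series of input data is divided into groups of four (k = 4), and the input data in each group is regarded as one 4-dimensional vector (Equation 4) and multiplied by a 4×4 constant matrix (Equation 5), as in Equation 6:

$$\begin{pmatrix} A_n \\ B_n \\ C_n \\ D_n \end{pmatrix} \quad (4) \qquad L = [L_{i,j}] = \begin{pmatrix} E_0 & E_1 & E_2 & E_3 \\ F_0 & F_1 & F_2 & F_3 \\ G_0 & G_1 & G_2 & G_3 \\ H_0 & H_1 & H_2 & H_3 \end{pmatrix} \quad (5)$$

$$L \begin{pmatrix} A_n \\ B_n \\ C_n \\ D_n \end{pmatrix} = \begin{pmatrix} E_0 A_n + E_1 B_n + E_2 C_n + E_3 D_n \\ F_0 A_n + F_1 B_n + F_2 C_n + F_3 D_n \\ G_0 A_n + G_1 B_n + G_2 C_n + G_3 D_n \\ H_0 A_n + H_1 B_n + H_2 C_n + H_3 D_n \end{pmatrix} \quad (6)$$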

[0021] That is, in the 4th, 8th, 12th, 16th, ... processor elements (arithmetic processing circuits), $L_{i,0}$ (E0 when i = 0, F0 when i = 1, G0 when i = 2, H0 when i = 3) is multiplied by the input data $A_4, A_8, A_{12}, A_{16}, \ldots$ stored in the corresponding memories, giving $L_{i,0} \times A_4$, $L_{i,0} \times A_8$, $L_{i,0} \times A_{12}$, $L_{i,0} \times A_{16}$, ....

[0022] At the same time, in the 5th, 9th, 13th, 17th, ... processor elements, $L_{(i+1) \bmod 4,\,1}$ (F1 when i = 0, G1 when i = 1, H1 when i = 2, E1 when i = 3) is multiplied by the input data $A_5, A_9, A_{13}, A_{17}, \ldots$ stored in the corresponding memories, giving $L_{(i+1) \bmod 4,\,1} \times A_5$, $L_{(i+1) \bmod 4,\,1} \times A_9$, $L_{(i+1) \bmod 4,\,1} \times A_{13}$, $L_{(i+1) \bmod 4,\,1} \times A_{17}$, ....

[0023] At the same time, in the 6th, 10th, 14th, 18th, ... processor elements, $L_{(i+2) \bmod 4,\,2}$ (G2 when i = 0, H2 when i = 1, E2 when i = 2, F2 when i = 3) is multiplied by the input data $A_6, A_{10}, A_{14}, A_{18}, \ldots$ stored in the corresponding memories, giving $L_{(i+2) \bmod 4,\,2} \times A_6$, $L_{(i+2) \bmod 4,\,2} \times A_{10}$, $L_{(i+2) \bmod 4,\,2} \times A_{14}$, $L_{(i+2) \bmod 4,\,2} \times A_{18}$, ....

[0024] At the same time, in the 7th, 11th, 15th, 19th, ... processor elements, $L_{(i+3) \bmod 4,\,3}$ (H3 when i = 0, E3 when i = 1, F3 when i = 2, G3 when i = 3) is multiplied by the input data $A_7, A_{11}, A_{15}, A_{19}, \ldots$ stored in the corresponding memories, giving $L_{(i+3) \bmod 4,\,3} \times A_7$, $L_{(i+3) \bmod 4,\,3} \times A_{11}$, $L_{(i+3) \bmod 4,\,3} \times A_{15}$, $L_{(i+3) \bmod 4,\,3} \times A_{19}$, .... The instructions mpy(i) (i = 0 to 3) that carry out these multiplications are given to the parallel processor in sequence.
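The coefficient each element uses in mpy(i) follows directly from the subscript $(i+m) \bmod 4$. The short sketch below (function name hypothetical) tabulates it for k = 4, with the rows of L named E, F, G and H as above. Each mpy(i) uses exactly one coefficient from every row, that is, one wrapped diagonal of L, so every cycle contributes one term to each of the four output rows.

```python
from typing import List


def coefficient_schedule(k: int, row_names: str = "EFGH") -> List[List[str]]:
    """sched[m][i] names L[(i + m) mod k][m], the coefficient used during mpy(i)
    by the element at position m of its group (memories 4n+m for k = 4)."""
    return [[f"{row_names[(i + m) % k]}{m}" for i in range(k)] for m in range(k)]


for m, row in enumerate(coefficient_schedule(4)):
    print(f"position {m}:", " ".join(row))
# position 0: E0 F0 G0 H0
# position 1: F1 G1 H1 E1
# position 2: G2 H2 E2 F2
# position 3: H3 E3 F3 G3
```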

[0025] Next, an instruction add1 that computes (data stored in the h-th memory) + $V_h$ × (data stored in the (h−1)-th memory), where $V_h = 0$ when h mod 4 = 0 and $V_h = 1$ otherwise, is given to the parallel processor three times.
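Viewed as an operation on the whole vector of memories, add1 is a shifted addition masked at the first element of every group, so sums never cross from one group of four into the next. A minimal sketch with a hypothetical function name follows. In the method the `own` operand is a different product on each pass; a vector of ones is used here only to make the accumulation pattern visible.

```python
from typing import List, Sequence


def add1(own: Sequence[int], left_src: Sequence[int], k: int) -> List[int]:
    """result[h] = own[h] + V_h * left_src[h - 1], with V_h = 0 when h mod k == 0."""
    return [own[h] + (left_src[h - 1] if h % k != 0 else 0) for h in range(len(own))]


ones = [1] * 8
s = ones
for _ in range(3):          # three applications, as in [0025]
    s = add1(ones, s, 4)
print(s)  # [1, 2, 3, 4, 1, 2, 3, 4]
```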

[0026] Then an instruction add2 that computes $W_h$ × (data stored in the h-th memory) + $Y_h$ × (data stored in the (h+1)-th memory), where $W_h = 1$ and $Y_h = 0$ when h mod 4 = 3, and $W_h = 0$ and $Y_h = 1$ otherwise, is given to the parallel processor three times. Finally, the instruction add1 is given to the parallel processor once.
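Similarly, add2 shifts words one memory to the left within each group, while the last element of each group (h mod 4 = 3) keeps its own word and so feeds a new value in from the right. A minimal sketch with a hypothetical function name:

```python
from typing import List, Sequence


def add2(own: Sequence[int], right_src: Sequence[int], k: int) -> List[int]:
    """result[h] = W_h * own[h] + Y_h * right_src[h + 1], where W_h = 1, Y_h = 0
    if h mod k == k - 1, and W_h = 0, Y_h = 1 otherwise."""
    if len(own) % k:
        raise ValueError("length must be a multiple of k")
    return [own[h] if h % k == k - 1 else right_src[h + 1] for h in range(len(own))]


v = [10, 11, 12, 13]
for fed_in in (100, 200, 300):      # three applications, as in [0026]
    v = add2([0, 0, 0, fed_in], v, 4)
print(v)  # [13, 100, 200, 300]
```

After three applications the first element holds the word that started in the last element, and element m holds the word fed in on pass m; this is how completed and partial row sums are brought to the element that will finally output that row.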

[0027] In this way the multiplication by the 4×4 constant matrix is carried out, and with this method fewer operation cycles are required than with conventional methods.
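Counting one cycle per instruction, as in the worked example that follows, the number of cycles follows directly from the steps. This is a simple tally; the source states the count only for k = 4:

```latex
\underbrace{k}_{\mathrm{mpy}(i)} + \underbrace{(k-1)}_{\mathrm{add1}}
+ \underbrace{(k-1)}_{\mathrm{add2}} + \underbrace{1}_{\mathrm{add1}} = 3k - 1,
\qquad k = 4:\; 4 + 3 + 3 + 1 = 11 .
```

For k = 4 this agrees with the 11 cycles walked through in [0034] to [0042].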

[0028] The case k = 4 is described concretely below. The series of input data $\ldots, A_{n-1}, B_{n-1}, C_{n-1}, D_{n-1}, A_n, B_n, C_n, D_n, A_{n+1}, B_{n+1}, C_{n+1}, D_{n+1}, \ldots$ is stored in the memories $\ldots, 4(n-1), 4(n-1)+1, 4(n-1)+2, 4(n-1)+3, 4n, 4n+1, 4n+2, 4n+3, 4(n+1), 4(n+1)+1, 4(n+1)+2, 4(n+1)+3, \ldots$, respectively.

[0029] The case is described in which this stored series of input data is divided into groups of four (k = 4), and the input data in each group is regarded as one 4-dimensional vector [Equation 7] and multiplied by a 4×4 constant matrix [Equation 8], as in [Equation 9]; these have the same form as Equations 4 to 6.

[0030] In this case, the memory 100 of FIG. 1 needs only 5 words (addresses 0 to 4), the coefficient memory 40 of FIG. 2 only 6 words (addresses 0 to 5), and the addition input selector memory 50 only 3 words (addresses 0 to 2). The description uses FIG. 3, a circuit diagram that embodies FIG. 2; that is, it describes operation of the parallel processor of FIG. 1 with the circuit of FIG. 3 as each processor element (arithmetic processing circuit) 200. The L/R selector control signal, the coefficient memory address signal and the addition input selector memory address signal of FIG. 3 together form what FIG. 1 calls the arithmetic processing control signal.

[0031] The coefficient memory 40 and the addition input selector memory 50 of the 4th, 8th, 12th, ... processor elements (arithmetic processing circuits) are set in advance as shown in FIG. 4. Those of the 5th, 9th, 13th, ..., the 6th, 10th, 14th, ..., and the 7th, 11th, 15th, ... processor elements 200 are likewise set as shown in FIGS. 5, 6 and 7, respectively.
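FIGS. 4 to 7 are not reproduced here. The sketch below is a hypothetical reconstruction of their contents, inferred from the addresses cited in [0034] to [0042]: coefficient address 0 holds E0 in element 4n for cycle 1, address 4 is used by add1 and address 5 by add2. The order of coefficient addresses 1 to 3 and the entire selector-memory layout are assumptions, as is the function name.

```python
from typing import List, Sequence, Tuple


def preset_memories(L: Sequence[Sequence[object]], m: int) -> Tuple[List[object], List[int]]:
    """Assumed coefficient memory (k + 2 words) and addition input selector memory
    (3 words) for the element at position m of its group."""
    k = len(L)
    coef: List[object] = [L[(i + m) % k][m] for i in range(k)]  # addresses 0..k-1: mpy(i)
    coef.append(1)                        # address k (4 for k = 4): add1 passes own word
    coef.append(1 if m == k - 1 else 0)   # address k+1 (5): W_h for add2
    sel = [0,                             # selector address 0: mpy adds nothing
           0 if m == 0 else 1,            # selector address 1: V_h for add1
           0 if m == k - 1 else 1]        # selector address 2: Y_h for add2
    return coef, sel


L_names = [[f"{r}{c}" for c in range(4)] for r in "EFGH"]
print(preset_memories(L_names, 0))  # (['E0', 'F0', 'G0', 'H0', 1, 0], [0, 0, 1])
print(preset_memories(L_names, 3))  # (['H3', 'E3', 'F3', 'G3', 1, 1], [0, 1, 0])
```

With k = 4 the sizes come out as 6 and 3 words, matching [0030].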

[0032] The series of input data $\ldots, A_{n-1}, B_{n-1}, C_{n-1}, D_{n-1}, A_n, B_n, C_n, D_n, A_{n+1}, B_{n+1}, C_{n+1}, D_{n+1}, \ldots$ is stored at address 0 of the memories $\ldots, 4(n-1), 4(n-1)+1, 4(n-1)+2, 4(n-1)+3, 4n, 4n+1, 4n+2, 4n+3, 4(n+1), 4(n+1)+1, 4(n+1)+2, 4(n+1)+3, \ldots$, respectively.

[0033] Accordingly, the contents of the 4n-th, (4n+1)-th, (4n+2)-th and (4n+3)-th memories of FIG. 1 are as shown in Table 2 of FIG. 9. In Table 2 and in the following description, $A_n$, $B_n$, $C_n$ and $D_n$ are written simply as A, B, C and D. The memories other than 4n to 4n+3 have contents of the same form as in Table 2. The description below covers memories 4n to 4n+3; the other groups behave in the same way.

[0034] In the first cycle, the control circuit (not shown) supplies the signals listed for cycle 1 in Table 1 of FIG. 8 as the control signals (the address control signal and the arithmetic processing control signal of FIG. 1). Accordingly, the 4n-th processor element (arithmetic processing circuit) computes (value at address 0 of the 4n-th coefficient memory) × (value at address 0 of the 4n-th memory) + 0 = E0 × A and stores it at address 1 of the 4n-th memory. The same applies to the (4n+1)-th to (4n+3)-th elements, and at the end of cycle 1 the memory contents are as shown in Table 3 of FIG. 10.

[0035] In cycles 2 to 4, the control circuit likewise supplies the signals listed for cycles 2 to 4 in Table 1, and at the end of these cycles the memory contents are as shown in Tables 4 to 6 of FIGS. 11 to 13. In cycle 5, the control circuit supplies the signals listed for cycle 5 in Table 1. Accordingly, the 4n-th processor element computes (value at address 4 of the 4n-th coefficient memory) × (value at address 2 of the 4n-th memory) + 0 = H0 × A and stores it at address 2 of the 4n-th memory.

[0036] The (4n+1)-th processor element computes (value at address 4 of the (4n+1)-th coefficient memory) × (value at address 2 of the (4n+1)-th memory) + (value at address 1 of the 4n-th memory) = E0 × A + E1 × B and stores it at address 2 of the (4n+1)-th memory. The same applies to the (4n+2)-th and (4n+3)-th elements, and at the end of cycle 5 the memory contents are as shown in Table 7 of FIG. 14.

[0037] In cycles 6 and 7, the control circuit supplies the signals listed for cycles 6 and 7 in Table 1, and at the end of these cycles the memory contents are as shown in Tables 8 and 9 of FIGS. 15 and 16. Table 10 of FIG. 17 shows the memory contents with the data that are not used from cycle 8 onward replaced by X.

[0038] In cycle 8, the control circuit supplies the signals listed for cycle 8 in Table 1. Accordingly, the (4n+3)-th processor element computes (value at address 5 of the (4n+3)-th coefficient memory) × (value at address 3 of the (4n+3)-th memory) + 0 = F1 × B + F2 × C + F3 × D and stores it at address 3 of the (4n+3)-th memory.

[0039] The (4n+2)-th processor element computes (value at address 5 of the (4n+2)-th coefficient memory) × (value at address 3 of the (4n+2)-th memory) + (value at address 4 of the (4n+3)-th memory) = 0 × (value at address 3 of the (4n+2)-th memory) + (value at address 4 of the (4n+3)-th memory) = E0 × A + E1 × B + E2 × C + E3 × D and stores it at address 3 of the (4n+2)-th memory. The same applies to the 4n-th and (4n+1)-th elements, and at the end of cycle 8 the memory contents are as shown in Table 11 of FIG. 18.

[0040] In cycles 9 and 10, the control circuit supplies the signals listed for cycles 9 and 10 in Table 1, and at the end of these cycles the memory contents are as shown in Tables 12 and 13 of FIGS. 19 and 20. Table 14 of FIG. 21 shows the memory contents with the data not used in cycle 11 replaced by X.

[0041] In cycle 11, the control circuit supplies the signals listed for cycle 11 in Table 1. Accordingly, the 4n-th processor element computes (value at address 4 of the 4n-th coefficient memory) × (value at address 1 of the 4n-th memory) + 0 = E0 × A + E1 × B + E2 × C + E3 × D and stores it at address 0 of the 4n-th memory.

[0042] The (4n+1)-th processor element computes (value at address 4 of the (4n+1)-th coefficient memory) × (value at address 1 of the (4n+1)-th memory) + (value at address 4 of the 4n-th memory) = F0 × A + F1 × B + F2 × C + F3 × D and stores it at address 0 of the (4n+1)-th memory. The same applies to the (4n+2)-th and (4n+3)-th elements, and at the end of cycle 11 the memory contents are as shown in Table 15 of FIG. 22.

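[0043] Therefore, after cycle 11 the data at address 0 of each memory can be read out through an output terminal (not shown in FIG. 1), giving Equation 10, that is, the following values in memories 4n, 4n+1, 4n+2 and 4n+3:

$$E_0 A + E_1 B + E_2 C + E_3 D,\quad F_0 A + F_1 B + F_2 C + F_3 D,\quad G_0 A + G_1 B + G_2 C + G_3 D,\quad H_0 A + H_1 B + H_2 C + H_3 D \qquad (10)$$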
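The intermediate values quoted in [0034] to [0043] can be reproduced symbolically. The sketch below replays the 11 cycles for one group (memories 4n to 4n+3), representing each word as a set of product terms such as 'E1B'. The address schedule (the products of mpy(0) to mpy(3) kept at addresses 1, 4, 3 and 2) is an assumption chosen to agree with the addresses cited in the text; Tables 1 to 15 themselves are not reproduced, and the function name is hypothetical.

```python
from typing import Dict, FrozenSet, List, Union

Word = Union[str, FrozenSet[str]]  # address 0: input symbol; other addresses: sets of terms


def trace_k4() -> Dict[int, List[List[Word]]]:
    """Replay the k = 4 example symbolically; return memory snapshots per cycle."""
    k, rows, inputs = 4, "EFGH", "ABCD"
    empty: FrozenSet[str] = frozenset()
    mem: List[List[Word]] = [[inputs[m]] + [empty] * k for m in range(k)]
    snapshots: Dict[int, List[List[Word]]] = {}

    def times(coef, word):
        if coef == 0:
            return empty
        if coef == 1:
            return word
        return frozenset({coef + word})  # e.g. 'E1' times 'B' gives {'E1B'}

    def step(coef, own_addr, src, nb_addr, sel, dst):
        def nb(m):
            j = m - 1 if src == "L" else m + 1
            return mem[j][nb_addr] if 0 <= j < k else empty  # other groups are masked anyway

        res = [times(coef[m], mem[m][own_addr]) | (nb(m) if sel[m] else empty)
               for m in range(k)]
        for m in range(k):
            mem[m][dst] = res[m]
        snapshots[len(snapshots) + 1] = [list(words) for words in mem]

    V, W, Y = [0, 1, 1, 1], [0, 0, 0, 1], [1, 1, 1, 0]
    ones, zeros = [1] * k, [0] * k
    for i, addr in enumerate([1, 4, 3, 2]):   # cycles 1-4: mpy(i)
        step([f"{rows[(i + m) % k]}{m}" for m in range(k)], 0, "L", 0, zeros, addr)
    for t in range(1, k):                      # cycles 5-7: add1
        step(ones, 1 + t, "L", t, V, 1 + t)
    for u in range(1, k):                      # cycles 8-10: add2
        step(W, k - u, "R", k - u + 1, Y, k - u)
    step(ones, 1, "L", k, V, 0)                # cycle 11: final add1
    return snapshots


s = trace_k4()
print(sorted(s[5][1][2]))   # ['E0A', 'E1B']  (cycle 5, memory 4n+1, address 2)
print(sorted(s[11][0][0]))  # ['E0A', 'E1B', 'E2C', 'E3D']
```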

[0044] Thus, with the method described above, matrix calculations can be performed easily on a parallel processor in which a plurality of unit arithmetic circuits are controlled by a single control signal.

[0045] With the circuit configuration of FIG. 3 described above, an instruction mpy(i) (i = 0 to k−1) and the instruction add1 can also be executed in a single operation. For example, in the above example (k = 4), the coefficient memories and addition input selector memories are set in advance as shown in FIGS. 4 to 7, as before. By supplying the signals shown in Table 16 of FIG. 23 in each cycle, the memory contents at the end of each cycle become as shown in Tables 17 to 27 of FIGS. 24 to 34, and the calculation is completed in 8 cycles. That is, in this example the instructions mpy(i) of Tables 4 to 6 and the instructions add1 of Tables 7 to 9 are executed together, as shown in Tables 19 to 21.
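One way to arrive at the 8-cycle count is sketched below. After a plain mpy(0), each of the next k−1 cycles multiplies the element's own input by its mpy coefficient and, in the same cycle, adds the left neighbour's previous partial sum; this is possible because the multiplier and the selector of FIG. 2 can take different sources from the data bus. This is an illustrative reconstruction under assumed address assignments (the j-th partial sum kept at address j+1), not a transcription of Table 16, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def fused_simd_matrix_multiply(x: Sequence[int],
                               L: Sequence[Sequence[int]]) -> Tuple[List[int], int]:
    """mpy(0); k-1 fused mpy+add1 cycles; k-1 add2 cycles; final add1 (2k cycles)."""
    k, n = len(L), len(x)
    if n % k:
        raise ValueError("input length must be a multiple of k")
    mem = [[v] + [0] * k for v in x]  # address 0 = input, addresses 1..k = partial sums
    pos = [h % k for h in range(n)]
    cycles = 0

    def step(coef, own_addr, src, nb_addr, sel, dst):
        nonlocal cycles

        def nb(h):
            j = h - 1 if src == "L" else h + 1
            return mem[j][nb_addr] if 0 <= j < n else 0

        res = [coef[h] * mem[h][own_addr] + (nb(h) if sel[h] else 0) for h in range(n)]
        for h in range(n):
            mem[h][dst] = res[h]
        cycles += 1

    def mpy_coef(i):                  # L[(i+m) mod k][m] for every element
        return [L[(i + m) % k][m] for m in pos]

    V = [0 if m == 0 else 1 for m in pos]
    W = [1 if m == k - 1 else 0 for m in pos]
    Y = [0 if m == k - 1 else 1 for m in pos]
    ones, zeros = [1] * n, [0] * n

    step(mpy_coef(0), 0, "L", 0, zeros, 1)         # cycle 1: mpy(0) alone
    for t in range(1, k):                          # cycles 2..k: mpy(k-t) fused with add1
        step(mpy_coef(k - t), 0, "L", t, V, 1 + t)
    for u in range(1, k):                          # add2, k-1 times
        step(W, k - u, "R", k - u + 1, Y, k - u)
    step(ones, 1, "L", k, V, 0)                    # final add1 into address 0
    return [mem[h][0] for h in range(n)], cycles


L4 = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(fused_simd_matrix_multiply([1, 2, 3, 4, 5, 6, 7, 8], L4))
# ([30, 70, 110, 150, 70, 174, 278, 382], 8)
```

The tally becomes 1 + (k−1) + (k−1) + 1 = 2k, which gives the 8 cycles stated for k = 4.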

[0046] Thus this method, too, allows matrix calculations to be performed easily on a parallel processor in which a plurality of unit arithmetic circuits are controlled by a single control signal.

[0047] Although image data has been used as the example above, other kinds of data may also be processed.

[0048]

EFFECT OF THE INVENTION

According to this invention, matrix calculations can be performed easily on a parallel processor in which a plurality of unit arithmetic circuits are controlled by a single control signal.

[Brief description of drawings]

FIG. 1 is a configuration diagram of an example of a parallel processor that implements the matrix calculation method according to the present invention.

FIG. 2 is a configuration diagram of an example of its processor element.

FIG. 3 is an explanatory diagram of an example of the memory setting.

FIG. 4 is an explanatory diagram of an example of the memory setting.

FIG. 5 is an explanatory diagram of an example of the memory setting.

FIG. 6 is an explanatory diagram of an example of the memory setting.

FIG. 7 is an explanatory diagram of an example of the memory setting.

FIG. 8 shows Table 1, which explains the operation.

FIG. 9 shows Table 2, an intermediate state.

FIG. 10 shows Table 3, an intermediate state.

FIG. 11 shows Table 4, an intermediate state.

FIG. 12 shows Table 5, an intermediate state.

FIG. 13 shows Table 6, an intermediate state.

FIG. 14 shows Table 7, an intermediate state.

FIG. 15 shows Table 8, an intermediate state.

FIG. 16 shows Table 9, an intermediate state.

FIG. 17 shows Table 10, an intermediate state.

FIG. 18 shows Table 11, an intermediate state.

FIG. 19 shows Table 12, an intermediate state.

FIG. 20 shows Table 13, an intermediate state.

FIG. 21 shows Table 14, an intermediate state.

FIG. 22 shows Table 15, an intermediate state.

FIG. 23 shows Table 16, which explains the operation.

FIG. 24 shows Table 17, an intermediate state.

FIG. 25 shows Table 18, an intermediate state.

FIG. 26 shows Table 19, an intermediate state.

FIG. 27 shows Table 20, an intermediate state.

FIG. 28 shows Table 21, an intermediate state.

FIG. 29 shows Table 22, an intermediate state.

FIG. 30 shows Table 23, an intermediate state.

FIG. 31 shows Table 24, an intermediate state.

FIG. 32 shows Table 25, an intermediate state.

FIG. 33 shows Table 26, an intermediate state.

FIG. 34 shows Table 27, an intermediate state.

[Explanation of symbols]

100 Memory
200 Processor element (arithmetic processing circuit)
10 Data bus
20 Multiplier
30 2-input selector
40 Coefficient memory
50 Addition input selector memory
60 Adder

Claims

[Claims]

1. A matrix calculation method for a parallel processor in which a plurality of unit arithmetic circuits, each including at least one memory and one processor element capable of multiplication and addition, are arranged in parallel, each processor element being able to receive data from the memories on both sides in addition to exchanging data with its corresponding memory, and the plurality of unit arithmetic circuits being controlled by a single control signal, wherein a series of input data is stored sequentially in the corresponding memories, the series of input data is divided into groups of k, and the input data in each group is regarded as one k-dimensional vector and multiplied by a k×k constant matrix $[L_{i,j}]$, the method comprising: first, giving the parallel processor in sequence instructions mpy(i) (i = 0 to k−1) which, in the (nk+m)-th processor element (n ≥ 0, 0 ≤ m ≤ k−1), multiply $L_{(i+m) \bmod k,\,m}$ by the input data $A_{nk+m}$ stored in the corresponding memory; second, giving the parallel processor, a plurality of times, an instruction add1 that computes (data stored in the corresponding, h-th, memory) + $V_h$ × (data stored in an adjacent, (h−1)-th or (h+1)-th, memory), where $V_h = 0$ when h mod k = 0 and $V_h = 1$ otherwise; third, giving the parallel processor, a plurality of times, an instruction add2 that computes $W_h$ × (data stored in the corresponding, h-th, memory) + $Y_h$ × (data stored in the adjacent memory on the side opposite to that used by add1, the (h+1)-th or (h−1)-th), where $W_h = 1$ and $Y_h = 0$ when h mod k = k−1, and $W_h = 0$ and $Y_h = 1$ otherwise; and fourth, giving the instruction add1 to the parallel processor.