
JPH0616287B2 - Vector arithmetic processor with mask

Info

Publication number: JPH0616287B2
Application number: JP57168358A
Authority: JP
Inventors: Takayuki Nakagawa (中川貴之); Shigeo Nagashima (長島重夫); Hitoshi Abe (阿部仁)
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1982-09-29
Filing date: 1982-09-29
Publication date: 1994-03-02
Anticipated expiration: 2009-03-02
Also published as: JPS5958580A

Description

[Detailed Description of the Invention]

[Field of Application of the Invention]

The present invention relates to an apparatus that performs arithmetic operations on vector data at high speed.

[Prior art]

To make vector processors faster, the challenge is to let the vector processor execute more kinds of processing at high speed. In particular, processing a DO loop that contains an IF statement in a FORTRAN program at high speed requires more advanced processing hardware. The conventional procedure for processing the DO loop shown in FIG. 1 is described below with reference to FIGS. 2 and 3.

In the DO loop shown in FIG. 1, the addition and assignment of expression (3) are performed between the elements with a given index value I only when the logical operation of expression (1) evaluates to "true" and the logical operation of expression (2) also evaluates to "true". For elements whose index value fails either expression (1) or expression (2), the addition and assignment of expression (3) are not performed. This processing is repeated for all N element positions.
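As a reference point, the loop can be written as scalar code. FIG. 1 itself is not reproduced in this text, so the sketch below assumes that expressions (1) and (2) are the equality tests A(I) = B(I) and C(I) = D(I) (the text states equality for (1) and describes (2) only as a comparison) and that expression (3) is E(I) = F(I) + G(I). The function name `fig1_loop` is hypothetical.

```python
def fig1_loop(a, b, c, d, e, f, g):
    """Scalar reference for the FIG. 1 DO loop (relations assumed, see above)."""
    for i in range(len(a)):
        # Expression (1) AND expression (2); both assumed to be equality tests.
        if a[i] == b[i] and c[i] == d[i]:
            e[i] = f[i] + g[i]  # expression (3): E(I) is updated only when both hold
    return e
```

With operand values chosen to reproduce the masks of FIG. 2, only the second and fourth elements of E change.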

FIG. 2 shows the data flow when the computation of FIG. 1 is performed with N = 6 elements, assuming suitable numerical values for the operands A(1 to 6), B(1 to 6), C(1 to 6), D(1 to 6), F(1 to 6) and G(1 to 6).

In FIG. 2, the value "X" marks an element that is neither changed by nor involved in this processing.

The processing procedure is as follows.

Step 1: The corresponding elements of operands A(1 to 6) and B(1 to 6) are compared pairwise to create a vector mask VM(1 to 6). When the two corresponding elements are equal, so that the logical result is "true", the value "1" is written to VM; when they are not equal, the value "0" is written. In this example, therefore, VM(1 to 6) takes the values 0, 1, 0, 1, 1, 0.
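The mask generation of step 1 amounts to an element-wise comparison that emits 1 for "true" and 0 for "false". A minimal sketch; the function name and the operand values used in the usage below are hypothetical, chosen only so that the result matches the VM values stated above.

```python
def make_mask(x, y):
    """Element-wise equality test producing a 0/1 vector mask (1 = "true")."""
    return [1 if xi == yi else 0 for xi, yi in zip(x, y)]

vm_step1 = make_mask([1, 2, 3, 4, 5, 6], [0, 2, 0, 4, 5, 0])  # A vs. B
```

Step 3 below applies the same operation to C and D.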

Step 2: Because the vector mask register that holds VM will be written again in step 3, the contents of VM(1 to 6) are saved to another register, SR1. SR1 is not a register dedicated to vector masks; it can only process the data it holds for all elements at once. This step is therefore performed only after step 1 has completed for every element.

Step 3: In the same manner as step 1, the operation of expression (2) in FIG. 1 is performed and the result is written to VM(1 to 6). Again associating the value "1" with a logical result of "true" and the value "0" with "false", VM(1 to 6) here takes the values 0, 1, 1, 1, 0, 1.

Step 4: In step 5, a logical operation between general-purpose registers is performed with a general-purpose arithmetic unit that can only process all elements at once. The value of VM(1 to 6) is therefore saved to the general-purpose register SR2. For the same reason as in step 2, this step is performed only after step 3 has completed for every element.

Step 5: The bitwise logical AND of SR1 and SR2 is computed into SR3. This step is performed after step 4 has completed.

Step 6: The value of SR3 obtained in step 5 is transferred to VM(1 to 6).
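Steps 2 and 4 to 6 can be sketched by modeling SR1 to SR3 as integer words in which each mask element occupies one bit. The bit packing and the function names are illustrative assumptions, not the register format of any particular machine; the point is that every operation consumes a whole word, so none of them can begin until the mask it reads is complete.

```python
def pack(mask):
    """Pack a 0/1 mask into one integer word; element 1 goes to the highest used bit."""
    word = 0
    for bit in mask:
        word = (word << 1) | bit
    return word


def unpack(word, n):
    """Recover n mask bits from a word produced by pack()."""
    return [(word >> (n - 1 - i)) & 1 for i in range(n)]


def conventional_mask_and(vm_step1, vm_step3):
    sr1 = pack(vm_step1)  # step 2: save VM to SR1, only after step 1 has finished
    sr2 = pack(vm_step3)  # step 4: save VM to SR2, only after step 3 has finished
    sr3 = sr1 & sr2       # step 5: whole-word bitwise AND
    return unpack(sr3, len(vm_step1))  # step 6: transfer SR3 back to VM
```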

Step 7: Based on the value of VM(1 to 6) obtained in step 6, the addition and assignment of expression (3) in FIG. 1 are performed. For each element whose vector mask VM(I) is "0", the result of the operation is invalidated; that is, the value of E(I) is left unchanged. In this example, the result updates E(I) in main storage only for the second and fourth elements.
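The masked store of step 7 writes a computed element only where the mask bit is 1. In this sketch main storage is modeled as a Python list holding E(1 to N) at indices 0 to N-1; the function name and values are hypothetical.

```python
def masked_store(memory, vm, result):
    """Store result[i] into memory[i] only where vm[i] == 1; other E(I) stay unchanged."""
    for i, (m, r) in enumerate(zip(vm, result)):
        if m == 1:
            memory[i] = r
    return memory
```

With the mask from step 6, `masked_store([9] * 6, [0, 1, 0, 1, 0, 0], [11, 22, 33, 44, 55, 66])` changes only the second and fourth entries.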

FIG. 3 is a time chart of the above processing. The vertical axis represents the steps and the horizontal axis represents time. Each number marks the first cycle of processing for that element number, and the horizontal offset between steps represents the difference in their start times.

As described above, a prior-art vector processor has only one vector mask register. When it processes a DO loop containing a complex conditional statement, the mask must therefore be moved to a batch-processing register, combined there by batch processing, and the resulting final vector mask transferred back to the vector mask register before the conditional operation can be executed. Consequently, as FIG. 3 shows, the prior-art vector processing is split into separate time periods for steps 1, 3 and 7, and the processing cannot be overlapped or accelerated across these steps.

In general, a vector processor achieves high speed by processing different elements of multiple vector operations concurrently. However, because the processing in steps 2, 4, 5 and 6 is not suited to a vector processor, it prevents steps 1, 3 and 7 from being processed in parallel as in FIG. 4.

[Object of the Invention]

An object of the present invention is to provide a vector arithmetic processing apparatus that can perform vector mask generation, operations between vector masks, and conditional vector operations all concurrently.

[Outline of Invention]

To this end, the apparatus according to the present invention newly provides:

(1) a plurality of registers dedicated to vector masks that can be written and read simultaneously;

(2) one or more arithmetic units dedicated to vector masks, which take data in the vector-mask registers as input and compute new vector mask values;

(3) means for synchronizing the writing of a vector mask with its reading into a vector-mask arithmetic unit; and

(4) means for synchronizing the writing of a vector mask with its reading into an arithmetic unit for masked vector operations.

With these, vector mask generation, operations between vector masks, and conditional vector operations can all be processed concurrently.

In this specification, the term "arithmetic processing" covers main storage references, operations such as addition and subtraction between vectors, and storage of results to main storage. The embodiment takes a masked store to main storage as its example.

[Embodiment of the Invention]

The present invention is described in detail below with reference to an embodiment. FIG. 5 shows an embodiment of the invention.

Parts of the apparatus not directly related to the invention are assumed to have the same configuration as an ordinary vector processor, and their description is kept brief.

In FIG. 5, 101 is a main storage unit, 102 is a main storage control unit, 103 is a scalar processing unit, 104 is a scalar instruction control unit forming part of unit 103, and 105 is a vector instruction control unit.

The scalar instruction control unit 104 decodes instructions read sequentially from main storage 101 through the main storage control unit 102 and signal line 201. If an instruction is an ordinary scalar instruction, the scalar processing unit 103 performs ordinary scalar processing and writes the result to main storage 101 through signal line 202. If the decoded instruction directs the start of a vector instruction sequence, unit 104 passes the starting address of that sequence in main storage 101, the number of vector elements to be processed, and a start signal to the vector instruction control unit 105 through signal line 203. Unit 105 then reads the vector instruction sequence sequentially from main storage 101 through signal lines 201 and 204, starting at the given address, and decodes each vector instruction. For each instruction whose specified registers, arithmetic units, memory requesters and other resources it judges to be available, it starts those resources through signal lines 205, 206 and 207, respectively, and at the same time transfers control information including the number of vector elements to be processed. Processing of the individual elements of a vector instruction proceeds according to the per-element permission signals issued by the vector register control unit 110 and the vector mask register control unit 120.

The execution of vector instructions according to the invention is described below, taking the processing of the DO loop of FIG. 1 as an example. This example uses buffer storage called vector registers. The data series A(1 to N), B(1 to N), C(1 to N), D(1 to N), F(1 to N) and G(1 to N), each consisting of N elements, must be stored in vector registers 111 to 116 through signal lines 211 to 216, respectively; this is done as in an ordinary vector processor.

Executing the processing of FIG. 1 requires, in addition to the six instructions that perform the above loads: a seventh instruction that compares A(I) and B(I) element by element and writes the result to a vector mask register; an eighth instruction that likewise writes the element-wise comparison of C(I) and D(I) to a mask register other than the one written by the seventh instruction; a ninth instruction that performs a logical operation between the two vector masks obtained by the seventh and eighth instructions and writes the result to a third vector mask register; a tenth instruction that adds F(I) and G(I) element by element and stores the result in a vector register; and an eleventh instruction that writes the sum obtained by the tenth instruction to the memory address corresponding to E(I) only where the vector mask obtained by the ninth instruction is "1" (or only where it is "0").

The processing of FIG. 1 is described below with reference to FIG. 5. In response to the seventh instruction, the vector instruction control unit sends the values in vector registers 111 and 112 to arithmetic unit 131, which performs the comparison of expression (1). The value "1" is written to vector mask register 122 through signal line 231 where the equality holds, and "0" where it does not. Because a pipelined arithmetic unit is used for 131, this comparison proceeds at a pitch of one element per cycle. The eighth instruction likewise performs the comparison of expression (2) and writes "1" where it holds and "0" where it does not from pipelined arithmetic unit 132 to vector mask register 123 through signal line 232.

The ninth instruction then computes the logical AND of the values written to 122 and 123. Because this uses the dedicated vector-mask registers 121 to 123 and the dedicated pipelined arithmetic unit 141, rather than the batch-only registers and arithmetic unit described in the prior art, it is synchronized so that, as soon as the required data for an element are available, that element moves on to the AND.

Likewise, when the eleventh instruction refers to the vector mask created by the ninth instruction, processing is synchronized so that each element is processed as soon as both the data to be stored to memory and the vector mask for that element are available.
Also when referred to by the eleventh instruction, the data to be stored in the memory and the elements having the same vector mask are synchronized so that the processing is immediately performed.

The former synchronization mechanism, between vector mask registers, is described with reference to FIG. 6; the latter, between vector mask registers and vector registers, is described with reference to FIG. 7, taking a conditional vector operation as the example.

Once a vector mask register written by the seventh or eighth instruction has been written for all elements, it may be read at any position. While it is still partway through being written, however, reading an element that has not yet been written returns the old value, and the processing of FIG. 1 is not executed correctly. To track the range already written, the vector mask register 122 (hereinafter VMR2), which receives the comparison result and is then read for the logical AND, is provided with an up-down counter 322 that holds how far the number of written elements exceeds the number of read elements, and with a register 321 that records whether VMR2 is being written or being read. The permission signal 404 for reading one element from VMR2 is issued when VMR2 is not being written, or when it is being written but the element about to be read has already been written. Since counter 322 holds the number of written elements minus the number of read (or scheduled-to-be-read) elements, signal 404 is obtained by combining, through AND circuit 323 and OR circuit 324, signal 401, which indicates that the counter value is positive; signal 402, which indicates that VMR2 is being both written and read; and signal 403, which indicates that VMR2 is not being written and is in a simple read state.
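The read permission for VMR2 reduces to a small Boolean function of up-down counter 322 and the state held in register 321. A minimal sketch, assuming that 401 and 402 feed AND circuit 323 and that its output and 403 feed OR circuit 324 (the text names both gates, but this wiring is inferred); the function name is hypothetical.

```python
def vmr_read_permit(counter: int, writing: bool, reading: bool) -> bool:
    """One-element read permission (cf. signal 404) for a vector mask register."""
    s401 = counter > 0              # up-down counter 322: written minus read is positive
    s402 = writing and reading      # register is being written and read at the same time
    s403 = reading and not writing  # plain read state, no write in progress
    return (s401 and s402) or s403  # AND circuit 323 followed by OR circuit 324
```

While a write is in progress, a read is allowed only when an already-written element is waiting; once writing has finished, any element may be read.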

An operation between VMR2 and vector mask register 123 (hereinafter VMR3) may be performed only when the one-element read permission signals of both VMR2 and VMR3 are present. The one-element operation permission signal 406 is obtained by ANDing VMR2's one-element read permission signal 404 with VMR3's one-element read permission signal 405 in AND circuit 301. Signal 406 increments the read address 302 of VMR2 and VMR3 by 1; if the value of counter 303, which holds the number of unprocessed elements, is not 1, it is decremented by 1 and processing proceeds to the next element. The logical AND passes through stages 1411 to 1413 of pipelined arithmetic unit 141 and is written to vector mask register 121 (hereinafter VMR1). The increments of the write-side address 304 and of up-down counter 312 are kept in step with the data by inserting suitable synchronizing flip-flops 306 to 307.

For element data written to VMR1 to be readable in the same cycle in which it is written, when required, the delay from the +1 operation on up-down counter 312 until the data is actually written to VMR1 (121) must be designed to equal the delay from the -1 operation on counter 312 until the read takes place. For that purpose, in this example signal 407 bypasses flip-flop group 307 and is sent directly to up-down counter 312. By making the propagation time from the first of these flip-flops, 3071, to VMR1 equal to the propagation time from 312 to the read address of VR1 (503 in FIG. 7, omitted from FIG. 6), data written to VMR1 can be read immediately.
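The chaining that these counters make possible can be illustrated with a cycle-level sketch. Everything here is a modeling assumption: the class and function names are hypothetical, each comparator is taken to deliver one element per cycle, VMR3's producer may start `start3` cycles late, the AND unit is given three stages after 1411 to 1413, and the flip-flop timing alignment (306 to 307) is abstracted away by retiring results in order. Signal numbers in comments refer to FIG. 6.

```python
class MaskRegister:
    """Hypothetical model of a vector mask register with its up-down counter and state."""

    def __init__(self, n):
        self.bits = [0] * n
        self.n = n
        self.written = 0     # elements written so far (next write position)
        self.counter = 0     # up-down counter: elements written minus elements read
        self.writing = True  # write state, set when the producing instruction starts
        self.reading = True  # read state, set when the consuming instruction starts

    def write_next(self, bit):
        self.bits[self.written] = bit
        self.written += 1
        self.counter += 1
        if self.written == self.n:
            self.writing = False  # all elements written: clear the write state

    def read_permit(self):
        # (401 AND 402) OR 403, as in the read-permission logic of FIG. 6
        return (self.counter > 0 and self.writing and self.reading) or (
            self.reading and not self.writing)

    def read(self, i):
        self.counter -= 1
        return self.bits[i]


def chained_mask_and(src2, src3, start3=0, depth=3):
    """Write VMR2/VMR3 one element per cycle while ANDing into VMR1.
    Returns (vmr1, number of cycles until VMR1 is complete)."""
    n = len(src2)
    vmr2, vmr3 = MaskRegister(n), MaskRegister(n)
    vmr1 = [0] * n
    pipe = [None] * depth  # stages of the mask AND unit; None is a bubble
    read_addr = 0          # shared read address of VMR2 and VMR3 (cf. 302)
    done = 0               # elements already written to VMR1
    cycle = 0
    while done < n:
        # Comparators deliver one element per cycle once they have started.
        if cycle < n:
            vmr2.write_next(src2[cycle])
        if start3 <= cycle < start3 + n:
            vmr3.write_next(src3[cycle - start3])
        # The last pipeline stage retires into VMR1.
        out = pipe.pop()
        if out is not None:
            idx, bit = out
            vmr1[idx] = bit
            done += 1
        # Operation permission (cf. 406) = VMR2 read permit AND VMR3 read permit.
        if read_addr < n and vmr2.read_permit() and vmr3.read_permit():
            pipe.insert(0, (read_addr, vmr2.read(read_addr) & vmr3.read(read_addr)))
            read_addr += 1
        else:
            pipe.insert(0, None)
        cycle += 1
    return vmr1, cycle
```

Under these assumptions, `chained_mask_and([0, 1, 0, 1, 1, 0], [0, 1, 1, 1, 0, 1])` completes VMR1 after N + 3 cycles, instead of waiting for both masks to finish before making another full pass; delaying VMR3 only stalls the reads and never lets them overtake the writes.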

In FIG. 6, the signals labeled WT are issued by the instruction that writes each vector mask register. In this example, when the processing of expression (1) in FIG. 1 starts, signal 421 sets state 321 of VMR2 to the write state and resets up-down counter 322 to 0. The per-element write signal 422 then counts up the number of writes, and signal 423 writes the result to VMR1. Completion of processing for all elements is detected by signal 424, which indicates that the value of counter 303 is 1, and the record of the write state is then cleared from 311 and the records of the read state from 321 to 331.

This operation is the same as the write control of VMR1 by the inter-mask-register operation explained with FIG. 6, so the write-side control circuits of VMR2 and VMR3 are omitted from FIG. 6.

When the instruction for the operation between the vector masks of expressions (1) and (2) is started without waiting for all elements of expressions (1) and (2) in FIG. 1 to be processed, the signal 207 labeled iNiT first sets state 311 of VMR1 to the write state and resets up-down counter 312 to 0, as in the processing of expression (1) described above; in addition, VMR2 and VMR3 are set to the read state. Therefore, when the program of FIG. 1 is processed in this example, VMR2 is being both read and written, and since signal 401 is on and signal 403 is off, read permission is granted only for elements that have already been written.

Each time the instruction for expression (1) writes one element, counter 322 is incremented by 1. The vector-mask operation that uses this result therefore issues read permission signal 404 as long as the value of counter 322 is positive, and decrements counter 322 by 1. In the example the decrement is driven by signal 408 for simplicity, but it may instead be driven by signal 406.

Decrementing with 408 may degrade performance compared with decrementing with 406, but this degradation can be avoided with circuits 391 to 394. When the decrement is driven by 406, circuits 391 to 394 are unnecessary. For simplicity, the circuits corresponding to 391 to 394 that would be used in generating the read permission signals of the vector registers and vector mask registers, and the operation permission signals, are omitted from the subsequent drawings.

Like VMR2, VMR3 generates a read permission signal, and signal 406, the AND of the two, serves as the final operation permission signal. It advances read address 302, sends the data through read selectors 326 and 336 to the vector-mask arithmetic unit 305, and, acting as a write permission signal, increments counter 312 by 1 and advances write address 304. Dedicating arithmetic unit 141 to operations between vector mask registers makes faster processing possible at only a small additional cost. FIG. 5 shows a configuration with three arithmetic units for vector registers and one arithmetic unit dedicated to vector masks. The input data width of the vector-mask unit is 1 bit per element, against 32 to 64 bits per element for the vector-register units, so its cost is correspondingly lower.

The control of FIG. 6 synchronizes the seventh and eighth instructions with the ninth instruction; FIG. 7 shows the method of synchronizing the eleventh and twelfth instructions. The read control circuits of VR2 and VR3 are the same as those of vector mask registers VMR2 and VMR3 in FIG. 6, so their description is omitted here. The read control circuit of VMR1 differs from that of FIG. 6 in that a flip-flop 502 is newly provided to record that the instruction is not a conditional one; in that case, the vector mask value 603 sent to the arithmetic unit and elsewhere is always held at "1" through signal 602. This flip-flop allows the same arithmetic unit 505 to be used by instructions other than conditional vector operation instructions as well.

Corresponding to signal 406 in FIG. 6, the AND of the read permission signals of VMR2 and VMR3, the arithmetic processing permission signal 606 in FIG. 7 (in this example, the permission to store one element to main storage) is the AND of read permission signal 644 of vector mask register VMR1, read permission signal 634 of vector register VR1, and acceptance permission signal 601 from main storage write control circuit 1021. When a main storage write instruction other than a conditional instruction is processed, signal 207 stores 1 in register 502, and signal 602 and OR circuit 543 hold signal 644 at 1 permanently.
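The per-element store permission and the unconditional override can be sketched as two Boolean helpers. The function and parameter names are hypothetical; the signal numbers in the comments follow FIG. 7 as described above.

```python
def store_permit(vmr1_read_ok: bool, vr1_read_ok: bool, mem_accept: bool,
                 unconditional: bool) -> bool:
    """Permission to store one element (cf. signal 606)."""
    s644 = vmr1_read_ok or unconditional        # OR circuit 543; flip-flop 502 drives 602
    return s644 and vr1_read_ok and mem_accept  # AND of 644, 634 and 601


def mask_value(vmr1_bit: int, unconditional: bool) -> int:
    """Mask bit delivered to the store unit (cf. 603): forced to 1 when not conditional."""
    return 1 if unconditional else vmr1_bit
```

Because the override only forces the mask path to "1", an unconditional store still waits for its data in VR1 and for acceptance by main storage.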

[Effect of the Invention]

As described above, and as shown in the time chart of FIG. 4, processing that conventionally took 3N + 3 cycles can be performed in N + 3 cycles according to the present invention. The 2N cycles saved break down as follows: N cycles were lost in the conventional method because the comparison instructions that generate the vector masks could not be overlapped with the AND instruction between the vector masks, and another N cycles were lost because the AND instruction could not be overlapped with the conditional vector operation (in this example, the conditional store to main storage). The two synchronization mechanisms shown in FIGS. 6 and 7 each eliminate one of these losses. Both improvements could also be achieved by extending ordinary vector registers instead of using dedicated vector mask registers and arithmetic units, but dedicating them is clearly 32 to 64 times more advantageous in cost for the same performance.
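Taking the cycle counts stated above at face value, the saving and the resulting speedup work out as follows; N = 64 is only an illustrative choice.

```latex
\[
\underbrace{(3N+3)}_{\text{prior art}} - \underbrace{(N+3)}_{\text{invention}}
  = 2N
  = \underbrace{N}_{\text{mask generation vs.\ mask AND}}
  + \underbrace{N}_{\text{mask AND vs.\ masked store}}
\]
\[
\text{speedup} = \frac{3N+3}{N+3} \;\longrightarrow\; 3 \quad (N \to \infty),
\qquad N = 64:\; \frac{195}{67} \approx 2.91
\]
```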

[Brief description of drawings]

FIG. 1 is an example FORTRAN program that includes masked vector processing. FIG. 2 shows the processing procedure for the program of FIG. 1 according to the prior art. FIG. 3 is a time chart of the processing of FIG. 1 according to the prior art. FIG. 4 is a time chart of the processing of FIG. 1 according to the present invention. FIG. 5 shows an example configuration of the processing apparatus according to the invention. FIG. 6 shows an embodiment of execution control for operations between vector mask registers, and FIG. 7 shows an embodiment of execution control for conditional vector stores to main storage. 121 to 123 are registers dedicated to vector masks; 141 is the arithmetic unit dedicated to vector masks.

Continuation of the front page: (72) Inventor: Hitoshi Abe, c/o Kanagawa Works, Hitachi, Ltd., 1 Horiyamashita, Hadano-shi, Kanagawa Prefecture. (56) References cited: JP-A-57-23174 (JP, A); JP-A-56-22168 (JP, A); JP-A-56-88562 (JP, A).

Claims


1. A masked vector arithmetic processing apparatus comprising: a plurality of vector registers; a plurality of arithmetic processing means for performing arithmetic processing on vector data held in the vector registers; a plurality of mask registers each holding a vector mask used by any one of the plurality of arithmetic processing means; a mask arithmetic unit that performs an operation on vector masks held in the mask registers and supplies the result to any one of the mask registers; and synchronization means which, while first and second vector masks held in first and second mask registers are being read out and supplied to the mask arithmetic unit and the resulting third vector mask is being written to a third mask register, sequentially reads, in parallel with that writing, each pair consisting of an already-written element of the third vector mask and the corresponding element of vector data held in one of the vector registers, so that one of the arithmetic processing means can perform a masked operation on that vector data.

2. A masked vector arithmetic processing apparatus comprising: a plurality of vector registers; a plurality of arithmetic processing means for performing arithmetic processing on vector data held in the vector registers, including first and second arithmetic processing means capable of executing first and second operations for creating vector masks; a plurality of mask registers each holding a vector mask used by any of the plurality of arithmetic processing means; a mask arithmetic unit that performs an operation on vector masks held in the mask registers and supplies the result to one of the mask registers; first synchronization means which, when the first and second operations are executed in parallel by the first and second arithmetic processing means and the resulting first and second vector masks are being written to first and second mask registers, respectively, sequentially reads, in parallel with that writing and in order to generate a third vector mask, each pair of mutually corresponding already-written elements of the first and second vector masks and supplies them to the mask arithmetic unit; and second synchronization means which, while the third vector mask is being written to a third mask register, sequentially reads each pair consisting of an already-written element of the third vector mask and the corresponding element of the vector data to be subjected to a masked operation, so that one of the arithmetic processing means can perform the masked operation on vector data held in one of the vector registers.