JPS626374A

JPS626374A - Vector processor

Info

Publication number: JPS626374A
Application number: JP14479985A
Authority: JP
Inventors: Tomoo Aoyama; 青山　智夫; Takayuki Nakagawa; 貴之中川; Tadaaki Isobe; 磯部　忠章
Original assignee: Hitachi Ltd; Hitachi Computer Engineering Co Ltd
Current assignee: Hitachi Ltd; Hitachi Computer Engineering Co Ltd
Priority date: 1985-07-03
Filing date: 1985-07-03
Publication date: 1987-01-13

Abstract

PURPOSE: To prevent the performance degradation of a vector processor caused by serializing control, by adding to the vector processor a means for controlling the execution order of the elements of plural vector instructions. CONSTITUTION: When a preceding instruction is executed, a logic circuit produces a valid signal that causes one vector element of the preceding instruction to be processed. The subsequent instruction is decoded while that valid signal is being processed, so a write counter 201 has been cleared to zero, and the valid signal that would execute the second vector element of the preceding instruction is suppressed by a comparator 206. When processing of the first element of the subsequent instruction is complete, a valid signal is received through a bus 200 and the counter 201 is counted up. The suppression is thereby released, a read counter 203 is counted up, and an execution valid signal is obtained on a bus 209.

Description

[Detailed description of the invention]

[Field of application of the invention]

The present invention relates to a vector processing device, and in particular to a scheme that introduces control guaranteeing the order of main memory references at the level of individual vector elements, thereby enabling parallel execution of vector instructions that make arbitrary main memory references and improving the performance of the vector processing device.

[Background of the invention]

Various vector processing devices have been proposed for high-speed processing of the matrix calculations and similar operations that appear frequently in scientific and engineering computation. One such device comprises a plurality of vector arithmetic units, vector registers, and requesters that transfer data between the vector registers and main memory (Richard M. Russell, "The CRAY-1 Computer System", in General-Purpose Large Computers, pp. 285-295, Nikkei McGraw-Hill (1982)). In that device, vector data are first loaded into a vector register by a requester, later sent to a vector arithmetic unit as specified by a vector instruction, processed, and stored in a vector register. If a subsequently decoded vector instruction specifies further processing of that data, the vector data in the register are sent to a vector arithmetic unit and processed without passing through main memory. Because the data processing specified by a series of vector instructions takes place through vector registers, whose access time is shorter than that of main memory, the data transfer capacity needed for computation can be secured even when the arithmetic units are designed to be faster than main memory.

To improve performance further, there is a technique (Japanese Unexamined Patent Publication No. Sho 57-187828) in which the vector registers are built from high-speed RAM elements so that a register can be written and read at machine-cycle pitch. With this capability, vector elements whose processing is complete are sent to a different vector arithmetic unit as directed by the next decoded vector instruction, without waiting for the preceding vector instruction to finish all of its vector elements, so that multiple vector operations proceed in parallel. The logical operation of writing and reading data in a vector register simultaneously under this technique is hereinafter called chaining.

When chaining is performed in a vector processing device, ordering among the elements processed by several vector instructions is ensured by having those instructions share a register number in their operands. Sharing a vector register number in this way guarantees the order of vector element processing between main memory reference instructions and arithmetic instructions, and between one arithmetic instruction and another, so that the "logic" described by the program is carried out by the vector processing device. However, control of the chaining operation alone is not sufficient to preserve the "logic" described by the program.

For example, consider the following FORTRAN code:

      DO 100 I=1,N
      A(I)=B(I)*C(I)
      D(I)=A(I-1)+E(I)
  100 CONTINUE

If the compiler generates object code that executes the store to array A and the load from array A simultaneously, correct operation is not guaranteed.

Suppose, for example, that the compiler output is

  (1) Vector Load      VR0 ← B
  (2) Vector Load      VR1 ← C
  (3) Vector Multiply  VR2 ← VR0 * VR1
  (4) Vector Store     VR2 → A
  (5) Vector Load      VR5 ← A
  (6) Vector Load      VR4 ← E
  (7) Vector Add       VR5 ← VR5 + VR4
  (8) Vector Store     VR5 → D

Ordering between elements is then not guaranteed between the vector store (4) and the vector load (5), because the vector load (5) may overtake the vector store (4).

FIG. 1 illustrates this overtaking case. It divides the processing of a vector instruction into a decode stage and an execution stage for each element and plots, with time on the horizontal axis, the stages of the vector store instruction (4) and the vector load instruction (5). Positions in FIG. 1 where no vector element number is shown are stages in which vector instructions (4) and (5) cannot execute because of other references to the multi-bank main memory.

Unlike arithmetic processing, main memory reference requests are characterized by the irregular appearance of stages in which the instruction cannot execute, as shown in FIG. 1. Because of these stalled stages, the third element of the vector load instruction (5) is executed even though processing of the second element of the vector store instruction (4) has not finished. In other words, for I=3 in the second statement of the FORTRAN code, the intended calculation is not performed.
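The overtaking condition can be checked mechanically once the cycle at which each element reaches main memory is known. The following Python sketch is illustrative only: the function name `stale_loads` and the cycle numbers are assumptions, not taken from FIG. 1. It reports the loop indices I for which the load of A(I-1) reaches memory no later than the store of A(I-1).

```python
def stale_loads(store_cycle, load_cycle):
    """Return the loop indices I whose load of A(I-1) precedes the store of A(I-1).

    store_cycle[k] / load_cycle[k] are hypothetical cycles at which element k+1
    of the vector store / vector load reaches main memory.
    """
    stale = []
    # I = 1 reads A(0), which this loop never stores, so start at I = 2.
    for i in range(2, len(load_cycle) + 1):
        stored_at = store_cycle[i - 2]   # store element I-1 writes A(I-1)
        loaded_at = load_cycle[i - 1]    # load element I reads A(I-1)
        if loaded_at <= stored_at:
            stale.append(i)
    return stale


# Store element 2 is delayed (for example by a bank conflict); the load is not.
delayed_store = [1, 5, 6, 7]
undelayed_load = [2, 3, 4, 8]
overtaken = stale_loads(delayed_store, undelayed_load)
```

With these assumed timings the only affected index is I=3, the same situation the text describes for the second statement of the loop.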

To prevent this problem, the conventional practice has usually been to insert a serializing instruction between vector instructions (4) and (5), so that execution of the vector load instruction (5) starts only after the vector store instruction (4) has completed. Such serializing control inhibits parallel execution of vector instructions and therefore degrades the performance of the processing device significantly.

[Purpose of the invention]

An object of the present invention is to provide a vector processing device to which a means for controlling the execution order among the elements of a plurality of vector instructions is newly added, so that no serializing control is needed between vector instructions that issue main memory reference requests and the performance degradation caused by serializing control is avoided.

[Summary of the invention]

In current vector processing devices, the chaining operation is defined by sharing a vector register number among the operands of several vector instructions. That is, a vector register named as an operand of a vector instruction designates not only a storage means that holds the result of an operation but also a control means that defines the order of operations. It follows that if chaining control is applied between vector instructions that reference main memory, the order of data accesses is guaranteed.

Any set of main memory reference requests can be logically decomposed into the following four basic pairs of vector instructions.

(1) Vector Load   VR0, address 0
    Vector Load   VR1, address 1

(2) Vector Load   VR0, address 0
    Vector Store  VR1, address 1

(3) Vector Store  VR0, address 0
    Vector Load   VR1, address 1

(4) Vector Store  VR0, address 0
    Vector Store  VR1, address 1

In case (1), no ordering needs to be guaranteed between the elements processed by the two vector load instructions, whether or not addresses 0 and 1 overlap. Enforcing an order would not change the result.

In cases (2) and (3), if the addresses of the vector load instruction and the vector store instruction overlap, control that guarantees the ordering of elements between the two instructions is required.

In case (4), if the addresses of the two vector store instructions overlap for every element, the earlier vector store instruction is rendered ineffective.

In that situation the compiler's optimizer can delete the ineffective instruction, which resolves the element-ordering problem. If, however, the addresses of the two vector store instructions overlap only in part, the compiler cannot resolve the ordering problem, and hardware chaining control must be applied.
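The case analysis above can be summarized as a small decision function. This is a sketch under stated assumptions: the function name, the overlap labels 'none', 'partial' and 'total', and the returned strings are hypothetical, and treating non-overlapping addresses as needing no ordering in every case is an inference from the text rather than something it states for case (4).

```python
def ordering_requirement(first, second, overlap="none"):
    """Classify a pair of main memory reference instructions.

    first, second: 'load' or 'store' (program order).
    overlap: 'none', 'partial' or 'total' overlap of the two address ranges
             (hypothetical labels for illustration).
    """
    if first not in ("load", "store") or second not in ("load", "store"):
        raise ValueError("instructions must be 'load' or 'store'")
    if overlap not in ("none", "partial", "total"):
        raise ValueError("overlap must be 'none', 'partial' or 'total'")

    # Case (1): two loads never need ordering; assumed also when nothing overlaps.
    if overlap == "none" or (first, second) == ("load", "load"):
        return "no ordering needed"
    # Case (4) with every element overlapping: the earlier store has no effect.
    if (first, second) == ("store", "store") and overlap == "total":
        return "compiler removes the earlier store"
    # Cases (2), (3), and case (4) with partial overlap.
    return "hardware element ordering"
```

For example, `ordering_requirement("store", "load", "partial")` yields the hardware case, which is the situation of instructions (4) and (5) in the earlier FORTRAN example.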

In current vector processors, chaining control is usually applied to the "write, then read" case for a vector register. Chaining control is possible for any combination of accesses to a vector register, but "write, then read" is the case adopted, as a trade-off between how often each case appears in programs and the amount of hardware the control logic requires.

Consider introducing chaining control to guarantee element ordering in cases (2) to (4) above. In case (2), ordering is guaranteed if the write-enable signal with which the vector load instruction writes vector register 0 is used as the execution valid signal of the following vector store instruction. In case (3), the signal with which the preceding vector store instruction writes an element to main memory is used as the execution valid signal of the following vector load instruction.

In case (4), likewise, the signal with which the first vector store instruction writes an element to main memory is used as the execution valid signal of the following vector store instruction.

Operating a memory requester under such an execution valid signal is a standard capability of current vector processing devices, where it is used to execute indexed load/store instructions. For an indexed instruction, however, the address is data sent from a vector register in synchronization with the execution valid signal, whereas in the scheme of the present invention the address is generated by the address adder inside the requester. The execution valid signal therefore needs no accompanying data, and the vector register that holds the valid signal needs no data portion either. A vector register without a data portion is hereinafter referred to by the abbreviation VVR. The instruction mnemonics for guaranteeing element ordering in cases (2) to (4) with a VVR are as follows.

(2) Vector Load   VR0, address 0, VVR0
    Vector Store  VR1, address 1, VVR0

(3) Vector Store  VR0, address 0, VVR0
    Vector Load   VR1, address 1, VVR0

(4) Vector Store  VR0, address 0, VVR0
    Vector Store  VR1, address 1, VVR0

Next, consider the case where the following instruction is a vector store instruction, and how the store data sent from the vector register are synchronized with the execution valid signal sent from the VVR. Chaining control of the vector register and the VVR cannot itself be used for this synchronization. Instead, the memory requester is provided with a stack for store data and execution valid signals, and control of this stack is used to suspend the chaining of the vector register and the VVR.

The discussion so far has approached the element-ordering problem of main memory references by two vector instructions from the standpoint of vector register chaining control. Viewed as a means of propagating the processing status of vector instructions, control using the VVR as described above is a case of an earlier instruction controlling the execution of a later one. Such VVR control therefore does not extend the range of data processing beyond what a vector processing device using the conventional vector register control method can already do.

Going beyond the data processing of conventional vector processing devices requires a control means by which the processing of a later instruction governs the processing of an earlier one. An example that needs control in this direction, opposite to the conventional one, is the following FORTRAN code:

      DO 100 I=1,N
      A(LIST1(I))=A(LIST2(I))+1.0
  100 CONTINUE

In this example, loads from and stores to array A must alternate: load → store → load → store → .... The process that loads array A into a vector register with an indexed load instruction must therefore be synchronized with the process that writes the results to main memory with an indexed store instruction. A "control means by which a later instruction governs the processing of an earlier instruction" can serve as this synchronization means.

Extending the VVR control method in this way brings into the scope of vector processing operations that were impossible under vector register chaining control.

To distinguish this recursive use of the VVR from the VVR chaining control described earlier, the instruction mnemonic is written 'PATH'. With this notation, the FORTRAN code above yields the following object code.

  (1) Vector Load             VR0 ← LIST1
  (2) Vector Load             VR1 ← LIST2
  (3) Vector Load (indexed)   VR2 ← A, using LIST2, PATH0
  (4) Vector Add              VR3 ← VR2 + '1.0'
  (5) Vector Store (indexed)  VR3 → A, using LIST1, PATH0

That is, a causal relationship is established under VVR control between the indexed load instruction (3) and the indexed store instruction (5), and processing proceeds as follows.

First, the indexed load instruction (3) is executed and the first vector element is stored in VR2.

At the next timing, that element is processed by the vector add instruction (4) and the result is stored in VR3. The result in VR3 is then stored into array A in main memory by the indexed store instruction. When this store operation completes, the fact that one element of vector instruction (5) has been processed is reported through VVR control to the memory requester executing the indexed load instruction (3). In response, instruction (3) loads the second element of array A into VR2. Vector instructions (3), (4), and (5) are synchronized in this manner, and the vector processing device guarantees correct processing of the FORTRAN code above.
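The need for this element-by-element synchronization can be seen by comparing two orderings of the same loop in software. The Python sketch below is an illustration, not the hardware mechanism: the function names are hypothetical and indices are 0-based, unlike the FORTRAN source. `gather_then_scatter` models a vector load that runs ahead of the store, while `path_synchronized` releases load element k+1 only after store element k has completed.

```python
def gather_then_scatter(a, list1, list2):
    """All indexed loads complete before any indexed store (no synchronization)."""
    a = list(a)
    loaded = [a[j] for j in list2]        # whole vector loaded first
    for k, j in enumerate(list1):
        a[j] = loaded[k] + 1.0            # stores happen afterwards
    return a


def path_synchronized(a, list1, list2):
    """Load element k+1 is released only after store element k has completed."""
    a = list(a)
    stores_done = 0                       # plays the role of a completion count
    for k in range(len(list1)):
        if stores_done != k:              # load k must wait for k completed stores
            raise RuntimeError("load released too early")
        vr2 = a[list2[k]]                 # indexed load of one element
        vr3 = vr2 + 1.0                   # vector add on that element
        a[list1[k]] = vr3                 # indexed store of one element
        stores_done += 1
    return a


initial = [0.0, 0.0, 0.0, 0.0]
list1 = [1, 2, 3]                         # store targets
list2 = [0, 1, 2]                         # load sources; each was just stored
unsynchronized = gather_then_scatter(initial, list1, list2)
synchronized = path_synchronized(initial, list1, list2)
```

With these assumed index lists each load reads the element written by the previous iteration, so only the synchronized ordering reproduces the result of the sequential loop.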

[Embodiments of the invention]

An embodiment of the present invention is described below with reference to FIGS. 2, 3, and 4.

FIG. 2 is a schematic block diagram of the vector processing device of the present invention. Reference numeral 1 denotes a vector register, 2 an arithmetic unit, 3 a VVR, 4 an initial-value setting logic unit, 5 and 6 memory requesters, 7 and 9 switching circuits, 8 an interleaved main memory, and 10 a switching circuit that connects paths according to the vector instruction.

Consider the case where the vector processing device has been started, data processing is in progress, a load instruction and a store instruction are executing in parallel, and the accesses of the two instructions overlap in main memory. The compiler can detect this overlap. When the preceding vector load instruction is executed, paths are connected according to the information defined in the operand fields of the instruction. Suppose here that the load instruction starts memory requester 5, that switching circuit 11 connects paths 20 and 21, and that switching circuit 9 associates the data bus 22 from main memory 8 with the vector register write path 23. "Associating" here means sorting the data on the data bus from main memory according to which memory requester issued the corresponding main memory reference request, and sending only the data for a particular memory requester to the intended vector register write path. Once execution of the vector load instruction begins, main memory reference addresses are generated inside memory requester 5.

FIG. 3 is a schematic block diagram of memory requester 5.

When the memory requester is started, the base address of the array is set in a latch inside counter 101 through path 100. At the same time, the increment used to step through the array is set in register 103 through path 102. Then, each time an execution valid signal arrives on path 104, counter 101 adds the increment register value to the base value and the result is set in register 105, generating an address. The address in register 105 is sent through path 106 to switching circuit 7 (FIG. 2).
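The address generation just described amounts to a base-plus-stride sequence that advances once per execution valid signal. A minimal Python sketch follows, with the class name `AddressGenerator` and its attribute names chosen for illustration. One assumption is labeled in the code: the first address emitted is the base itself, which the text does not state explicitly.

```python
class AddressGenerator:
    """Software model of the address path in the memory requester (illustrative)."""

    def __init__(self, base, increment):
        self.next_address = base          # latch inside counter 101 (set via path 100)
        self.increment = increment        # register 103 (set via path 102)
        self.address_register = None      # register 105, empty until the first valid

    def on_execution_valid(self):
        """Called once per execution valid signal (path 104); returns the address."""
        # Assumption: the first emitted address is the base, later ones add the stride.
        self.address_register = self.next_address
        self.next_address += self.increment
        return self.address_register      # would be sent on path 106


generator = AddressGenerator(base=1000, increment=8)
addresses = [generator.on_execution_valid() for _ in range(3)]
```

Because the generator only advances when it is called, withholding the valid signal stalls address generation, which is what lets a VVR pace a memory requester.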

Switching circuit 7 in FIG. 2 examines the address it receives, determines which bank of main memory 8 holds that address, selects the corresponding one of the paths 24, and sends a request signal to main memory. A tag is attached to the request signal to distinguish it from the request signals of other memory requesters, together with a signal that distinguishes stores from loads. Main memory 8 responds to the request signal by sending the data and the tag onto path 22. Switching circuit 9 uses the tag to send the data to the designated vector register write path 23.

When the operands of an instruction specify control using a VVR, the memory requester sends a valid signal over path 20 each time it generates an address. The signal passes along path 21 and through selector 13, and increments by one the write counter of the VVR named in the instruction operand.

FIG. 4 is a schematic block diagram of the VVR.

In FIG. 4, when a valid signal arrives on path 200, write counter 201 is counted up. Register 202 holds the counted-up value. Write counter 201 is cleared to zero when the instruction is decoded by the vector processing device.

Chaining control through the VVR works as follows. When an instruction that reads the VVR is decoded, read counter 203 is cleared to zero.

A clock signal from the timing generator is then received on path 204 and counts up read counter 203 through AND circuit 205. Comparator 206 continuously compares the contents of the read and write counters. When the value of read counter 203 becomes equal to the value of write counter 201, the comparator places on bus 207 a signal that inhibits further counting of read counter 203. This signal passes through inverter 208 and is ANDed with the clock signal in AND circuit 205, halting read counter 203. Path 210 is used when reading of the VVR is to be suspended for an external reason. While the value of read counter 203 is smaller than the value of write counter 201, comparator 206 sends a valid signal onto bus 209 in synchronization with the signal that counts up read counter 203.
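The counter-and-comparator behavior of the VVR can be modeled in a few lines. The following Python sketch is a behavioral illustration only: the class and method names are hypothetical, one call to `clock` stands for one clock pulse on path 204, and the `external_hold` argument stands for the external suspend input on path 210.

```python
class VVR:
    """Behavioral model of a data-less vector register (illustrative names)."""

    def __init__(self):
        self.write_count = 0              # write counter 201
        self.read_count = 0               # read counter 203

    def clear_write(self):
        self.write_count = 0              # on decode of the writing instruction

    def clear_read(self):
        self.read_count = 0               # on decode of the reading instruction

    def on_write_valid(self):
        self.write_count += 1             # valid signal received on path 200

    def clock(self, external_hold=False):
        """One clock pulse; returns True when an execution valid is emitted (bus 209)."""
        # Comparator 206: reading is inhibited unless read_count < write_count.
        if external_hold or self.read_count >= self.write_count:
            return False
        self.read_count += 1
        return True


vvr = VVR()
first = vvr.clock()        # nothing written yet, so the read is suppressed
vvr.on_write_valid()       # one element reported by the writing instruction
second = vvr.clock()       # suppression released, one execution valid emitted
third = vvr.clock()        # counters equal again, suppressed
```

The reading instruction can therefore never get ahead of the writing one by even a single element, which is the ordering property the scheme relies on.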

Execution control exerted by a later instruction on a preceding instruction through the VVR works as follows. When the preceding instruction subject to this control is executed, logic circuit 4 in FIG. 2 generates a valid signal that causes one vector element of the instruction to be processed. The following instruction is decoded while that valid signal is being processed, so write counter 201 has been cleared to zero and the valid signal that would execute the second vector element of the preceding instruction is suppressed by comparator 206 in FIG. 4. When processing of the first element of the following instruction completes, a valid signal is received on path 200 and write counter 201 is counted up. The suppression by comparator 206 is thereby released, read counter 203 is counted up, and an execution valid signal is obtained on path 209.

Returning to FIG. 2, chaining control through the VVR is now described. Consider a preceding vector load instruction and a vector store instruction chained through the VVR, with memory requester 5 executing the vector load instruction and memory requester 6 starting execution of the vector store instruction. Before the vector store instruction starts, when it is decoded, vector register 1 is connected to path 25, which transfers the store data, and the VVR is connected to data path 26 through selector 14. Path 26 and path 27 are connected by switching circuit 10. Memory requester 6 is started after these connections are complete.

The method of controlling main memory references when two or more memory requesters are operating is described next.

When two or more memory requesters operate at the same time, their main memory reference requests may happen to address the same bank. A priority order among the memory requesters is fixed in advance. According to it, the request of one memory requester is made to wait and is accepted only after the access to the bank has finished. This ranking of main memory reference requests, together with the delay control logic, is collectively called priority logic and is a well-known control scheme.
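A fixed-priority arbiter of this kind can be sketched as follows. The Python code is illustrative and rests on assumptions the text does not make: the function name `arbitrate` is hypothetical, the bank of an address is taken to be the address modulo the number of banks, and a bank is treated as busy for exactly one arbitration round.

```python
def arbitrate(requests, num_banks, priority):
    """Resolve bank conflicts for one round of main memory reference requests.

    requests: dict mapping requester id -> requested address.
    num_banks: number of interleaved banks (bank = address % num_banks is assumed).
    priority: requester ids from highest to lowest priority.
    Returns (granted, waiting) as lists of requester ids.
    """
    busy_banks = set()
    granted, waiting = [], []
    for requester in priority:
        if requester not in requests:
            continue                      # this requester has nothing to issue
        bank = requests[requester] % num_banks
        if bank in busy_banks:
            waiting.append(requester)     # must retry after the bank is free
        else:
            busy_banks.add(bank)
            granted.append(requester)
    return granted, waiting


# Requesters 5 and 6 both hit bank 0 of a 4-bank memory; 5 has priority.
conflict = arbitrate({5: 8, 6: 12}, num_banks=4, priority=[5, 6])
# Different banks: both requests proceed in the same round.
no_conflict = arbitrate({5: 8, 6: 13}, num_banks=4, priority=[5, 6])
```

A requester placed in the waiting list is exactly the one that would receive a suspend instruction, so its address generation has to pause until the request is accepted.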

This logic is omitted from FIG. 2 to keep the drawing simple. Referencing main memory through priority logic, however, is equivalent to saying that address generation in a memory requester and the transfer of store data from a vector register must not be performed unconditionally at machine-cycle pitch. FIG. 2 therefore shows buses 28, 29, 30, and 31, which carry suspend instructions to the memory requesters and the vector register. When a main memory reference by a vector load instruction is suspended, address generation in requester 5 is suspended through path 28. For a vector store instruction, address generation is suspended through path 28 and reading of vector register 1 is suspended through path 30.

Assuming that multiple memory requesters are operating as described above, the case where a store instruction is processed by memory requester 6 under VVR control is described with reference to FIG. 3.

In FIG. 3, 108 is an up/down counter, 109 is a store data stack, path 25a carries a command signal indicating that store data has been sent, and 25b is the data bus associated with path 25a. (In FIG. 2, buses 25a and 25b are shown together as path 25.)

When an execution valid signal arrives from the VVR via path 27, up/down counter 108 is decremented by 1.

When a store data command arrives via path 25a, up/down counter 108 is incremented by 1.

Let n be the number of stages in the store data stack. While its value is in the range from 1 to n, up/down counter 108 sends a valid signal onto path 104 at machine-cycle pitch.

When the counter value falls to 0 or below, a signal that inhibits VVR reading is sent onto path 31a. When the counter value reaches n+1 or more, a signal that inhibits vector register reading is sent onto path 31b.
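The behaviour of up/down counter 108 can be summarised in a short sketch, writing n for the number of stages of the store data stack. The function names and the dictionary keys are hypothetical, and treating the counter as a plain integer updated once per step is a simplifying assumption rather than a description of the actual circuit.

```python
def step_counter(value, store_data_command, execution_valid):
    """Advance the counter by one step.

    +1 when a store data command arrives from the vector register side,
    -1 when an execution valid signal arrives from the VVR.
    """
    return value + int(store_data_command) - int(execution_valid)


def counter_outputs(value, n):
    """Signals derived from the counter value for a stack of n stages."""
    return {
        # Valid signal issued each machine cycle while 1 <= value <= n.
        "valid": 1 <= value <= n,
        # Value of 0 or below: hold off further VVR reading.
        "inhibit_vvr_read": value <= 0,
        # Value of n + 1 or above: hold off vector register reading.
        "inhibit_vreg_read": value >= n + 1,
    }


if __name__ == "__main__":
    n = 4
    value = 0
    for cmd, valid in [(True, False), (True, False), (False, True)]:
        value = step_counter(value, cmd, valid)
        print(value, counter_outputs(value, n))
```

In this model exactly one of the three outputs is true for any counter value, which matches the three ranges given in the text.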

When a main memory reference request is made to wait in the main storage device by the priority logic, an operation interruption instruction signal for the memory requester is sent via path 29. This signal suspends counting by up/down counter 108 and, via path 104, suspends the operation of read counter 101. At the same time, it suspends the operation of the vector register and the VVR via paths 31a and 31b. In this way, reading of the VVR and reading of the vector register are kept synchronized. Under this synchronization control, the store data is sent onto path 107 and the address onto path 106. In FIG. 2, these paths correspond to paths 32 and 33.

Next, the case in which a later vector instruction controls the operation of a preceding vector instruction is described with reference to FIG. 2. Assume that the preceding vector instruction is a load and is processed by memory requester 5. When the vector instruction decoder of the vector processing device detects a PATH specification and starts a vector instruction carrying that specification, it activates switching circuit 10 via path 34 and connects paths 35 and 36.

The connection between paths 35 and 36 is made only during the processing timing of the first vector element, while initial value set logic unit 4 is operating. From the second vector element onward, paths 37 and 36 are connected.

Assume that the subsequent vector instruction is a store instruction processed by memory requester 6. In this case, paths 38 and 39 are connected by switching circuit 12.

For the preceding vector load instruction, the valid signal generated by initial value set logic unit 4 causes the address for one vector element to be generated, and a main memory reference request is sent onto path 106. At the next timing, no valid signal arrives from the VVR via paths 37 and 36, so processing of the second vector element does not start. The main memory reference request for the first vector element, however, passes through main memory 8 and switching circuit 9 to become load data, which is written into vector register 1 via path 23.

If an arithmetic instruction lies between the preceding vector load instruction and the subsequent vector store instruction, this data passes through path 40, arithmetic unit 2, and path 41 to become an operation result, which is written into vector register 1. The operation result is sent to memory requester 6 via path 25. That memory requester processes the store instruction for one vector element only and sends the store address and data onto paths 32 and 33. At the same time, a valid signal indicating that processing of one vector element has completed is sent onto path 38. Via path 39, this valid signal increments write counter 201 of the VVR by 1. As a result, read counter 203 of the VVR is incremented by 1 and a valid signal is sent onto path 37. By repeating this sequence cyclically, the processing of the preceding vector load instruction can be synchronized with that of the later vector store instruction.
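The cyclic hand-off described above can be traced with a small sketch. It is a sequential model written under stated assumptions: the comparison `read_counter < write_counter` stands in for the VVR's release condition, the optional arithmetic instruction is left out, and the function and variable names are hypothetical.

```python
def run_load_store_pair(vector_length):
    """Trace the element order for a preceding load and a subsequent store."""
    write_counter = 0   # models VVR write counter 201: store elements completed
    read_counter = 0    # models VVR read counter 203: valids released to the load
    loads_done = 0
    stores_done = 0
    trace = []

    # For the first element the valid comes from the initial value set
    # logic, not from the VVR.
    load_valid = True

    while stores_done < vector_length:
        if load_valid and loads_done < vector_length:
            trace.append(("load", loads_done))
            loads_done += 1
        load_valid = False  # no further load until the VVR releases a valid

        # The store processes the element that was just loaded, then reports
        # completion, which counts up the write counter.
        trace.append(("store", stores_done))
        stores_done += 1
        write_counter += 1

        # Assumed release condition: a valid is issued to the load while the
        # read counter is behind the write counter.
        if read_counter < write_counter:
            read_counter += 1
            load_valid = True
    return trace


if __name__ == "__main__":
    for event in run_load_store_pair(3):
        print(event)
```

The trace alternates load and store for each element index, so the load of element i+1 never starts before the store of element i has completed.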

In this example, a combination of a vector load instruction and a vector store instruction was described for simplicity, but an indexed load instruction and an indexed store instruction can be synchronized by the same process. In that case, the index and the store data are synchronized by the chaining operation of the vector registers. Combinations of a vector store followed by a load instruction, and of two vector store instructions, are also possible by changing the path connections with switching circuits 10, 11, and 12.

In FIG. 2, the main memory is four-way interleaved and there are two memory requesters, but these resource counts were chosen only to simplify the explanation and have no particular significance in themselves.

[Effect of the Invention]

According to the present invention, in a vector processing device, any pair of vector instructions that do not share a vector register number can be synchronized element by element through the VVR. This synchronization control guarantees the processing order of elements for any pair of vector instructions that reference main memory. With this guarantee, processing that in a conventional vector processing device required a serializing instruction to be inserted between vector instructions whose processing order had to be preserved can be executed in parallel. In addition, pairs of main memory reference instructions involving indirect addressing, which conventional vector processing devices could not process as vectors, can be vector-processed by synchronizing the execution of the subsequent vector instruction with the execution of the preceding vector instruction. The parallelism of the vector processing device is thereby improved, and its performance with it.
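To see why element order matters for indirectly addressed instruction pairs, consider the following sketch. The loop body, the index arrays, and the function names are invented for illustration and do not come from the specification; the point is only that completing the whole indexed load before the indexed store can differ from the scalar result, while per-element ordering reproduces it.

```python
def scalar_loop(a, src, dst):
    """Reference semantics: one element at a time, in program order."""
    a = list(a)
    for s, d in zip(src, dst):
        a[d] = a[s] + 1
    return a


def unsynchronized_vector(a, src, dst):
    """The whole indexed load completes before the indexed store begins."""
    a = list(a)
    loaded = [a[s] for s in src]
    for d, v in zip(dst, loaded):
        a[d] = v + 1
    return a


def element_synchronized(a, src, dst):
    """The load of element i+1 is released only after the store of element i."""
    a = list(a)
    for i in range(len(src)):
        v = a[src[i]]        # load element i
        a[dst[i]] = v + 1    # store element i before the next load proceeds
    return a


if __name__ == "__main__":
    a, src, dst = [10, 0, 0, 0], [0, 1, 2], [1, 2, 3]
    print(scalar_loop(a, src, dst))
    print(unsynchronized_vector(a, src, dst))
    print(element_synchronized(a, src, dst))
```

With these index arrays each store feeds the next load, so only the element-synchronized ordering agrees with the scalar loop.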

[Brief Description of the Drawings]

FIG. 1 is a processing diagram of main memory reference vector instructions, FIG. 2 is a schematic block diagram of the synchronization control section of a vector processing device according to the present invention, FIG. 3 is a schematic block diagram of a memory requester, and FIG. 4 is a schematic block diagram of the VVR.
In FIG. 2: 1 ... vector register, 2 ... arithmetic unit, 3 ... VVR, 5, 6 ... memory requesters, 7, 9, 11, 12 ... switching circuits, 8 ... main storage device.
In FIG. 3: 101 ... counter, 108 ... up/down counter, 109 ... store data stack.
In FIG. 4: 201 ... write counter, 203 ... read counter, 205 ... AND circuit.
Agent: Patent Attorney Katsuo Ogawa

Claims

[Claims]

A vector processing device for processing vector instructions, comprising a main storage device, a plurality of vector registers, a plurality of data transfer circuits that transfer data between the main storage device and the vector registers, and a plurality of vector arithmetic units that perform arithmetic processing on vector data received from the vector registers and send the results to the vector registers, characterized in that the device is provided with: chaining control means for a virtual register that holds no actual data, separate from the chaining control performed via the vector registers; means for coupling said control means with a signal indicating the processing state of a vector instruction that references the main storage device; and means for specifying said coupling in a field of the vector instruction; whereby, through the combination of these means, the processing order of vector elements can be guaranteed for any pair of main memory reference vector instructions.