JPS6259829B2

Info

Publication number: JPS6259829B2
Application number: JP56142976A
Authority: JP
Inventors: Shigeaki Okuya; Juji Oinaga
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1981-09-10
Filing date: 1981-09-10
Publication date: 1987-12-12
Also published as: JPS5844569A

Description

[Detailed description of the invention]

The present invention relates to an instruction processing synchronization control method for use when processing is interrupted in vector processing, for example because memory is busy.

Some processing devices take a second operand with multiple elements, $A = a_0, a_1, \ldots, a_{n-1}$, and a third operand with multiple elements, $B = b_0, b_1, \ldots, b_{n-1}$, apply an operation to each pair of corresponding elements, and obtain the result as a first operand, $C = c_0, c_1, \ldots, c_{n-1}$ (if the operation is addition, $c_i = a_i + b_i$, where $i$ is any of $0, 1, 2, \ldots, n-1$). Such a device is called a vector processing device. By contrast, a conventional general-purpose processing device whose operands are limited to a single element ($n = 1$) is called a scalar processing device. FIG. 1 shows the configuration of a vector processing device. As shown in the figure, the vector processing device comprises an instruction control section, a memory access processing section that performs load and store processing against the controller MCU of the main memory MSU, vector registers, and an arithmetic processing section that includes an adder and a multiplier. The vector registers hold multiple items of vector data, each consisting of multiple elements that are loaded or stored. A vector instruction consists of an instruction code, a first operand specification, a second operand specification, and a third operand specification. For example, VM 1, 2, 3 is a multiplication instruction (VM: vector multiply) that multiplies the contents of vector register 2 and vector register 3 and places the result in vector register 1. Likewise, VA 4, 5, 1 is a vector addition instruction (VA: vector add) that adds the contents of vector register 5 and vector register 1 and places the result in vector register 4.
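
The three-operand instruction format described above can be illustrated with a small software model. This is only a sketch: the register count, the vector length, and the function names `va` and `vm` are assumptions made for illustration and do not come from the patent.

```python
# Illustrative model only; NUM_REGS and VLEN are assumed values.
NUM_REGS = 8   # hypothetical number of vector registers
VLEN = 4       # hypothetical number of elements per vector register


def make_registers():
    """Create the vector register file, every element initialised to 0.0."""
    return [[0.0] * VLEN for _ in range(NUM_REGS)]


def va(vr, r1, r2, r3):
    """VA r1, r2, r3: element-wise add of registers r2 and r3 into r1."""
    vr[r1] = [a + b for a, b in zip(vr[r2], vr[r3])]


def vm(vr, r1, r2, r3):
    """VM r1, r2, r3: element-wise multiply of registers r2 and r3 into r1."""
    vr[r1] = [a * b for a, b in zip(vr[r2], vr[r3])]


vr = make_registers()
vr[2] = [1.0, 2.0, 3.0, 4.0]
vr[3] = [10.0, 20.0, 30.0, 40.0]
vr[5] = [0.5, 0.5, 0.5, 0.5]

vm(vr, 1, 2, 3)   # VM 1, 2, 3: register 1 <- register 2 * register 3
va(vr, 4, 5, 1)   # VA 4, 5, 1: register 4 <- register 5 + register 1
```

The first operand is always the destination, so the result of `vm` in register 1 can feed the following `va` directly.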

When vector instructions are processed, the arithmetic processing section is given a pipeline structure for speed, so that a subsequent element is fed in before the processing of the preceding element has completed. For example, a vector add instruction is processed in six stages: data read (READ), comparison of the exponents of the two operands (COMPARE), shift to align the exponents (PRE SHIFT), addition (ADD), shift for post-operation normalization (POST SHIFT), and data write (WRITE). As shown in FIG. 2, these stages are executed sequentially and concurrently. That is, for element 0 the read of $a_0$ and $b_0$, the exponent comparison, the pre-shift, and so on are performed in sequence; the same holds for element 1, but its timing is delayed by one processing timing $\tau_0$; the same holds for element 2, but its timing is delayed by two processing timings; and so on. The processing of all the elements is therefore represented by a parallelogram, as shown in the figure. E0T, E1T, ... denote the processing of element 0, the processing of element 1, and so on.
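
The parallelogram-shaped schedule can be reproduced with a few lines of code. The sketch below assumes an ideal pipeline in which every stage takes exactly one processing timing $\tau_0$ and no stall occurs; the function names are hypothetical.

```python
# The six stages named in the text for a vector add instruction.
STAGES = ["READ", "COMPARE", "PRE SHIFT", "ADD", "POST SHIFT", "WRITE"]


def stage_slot(element, stage_index):
    """Time slot (in units of tau_0) in which `element` occupies a stage.

    Element i enters the pipeline i slots after element 0.
    """
    return element + stage_index


def total_slots(n_elements):
    """Slots needed to push n_elements through the whole pipeline."""
    if n_elements == 0:
        return 0
    return n_elements + len(STAGES) - 1


def occupancy(n_elements):
    """Map each time slot to the (element, stage) pairs active in it."""
    table = {}
    for e in range(n_elements):
        for s, name in enumerate(STAGES):
            table.setdefault(stage_slot(e, s), []).append((e, name))
    return table
```

For four elements, slot 3 holds element 0 in ADD, element 1 in PRE SHIFT, element 2 in COMPARE, and element 3 in READ, which is one diagonal slice of the parallelogram.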

In arithmetic processing, the result of one process $TR_1$ is sometimes used by the next process $TR_2$. If the execution of $TR_1$ is delayed, for example because memory is busy, $TR_2$ naturally cannot be executed and must wait for $TR_1$. Execution of $TR_2$ starts once $TR_1$ has finished. Consider, for example, the following instruction sequence: VL 2, A (load the contents of memory address A into vector register 2); VA 3, 0, 2 (add the contents of vector registers 2 and 0 and place the result in register 3); VA 4, 1, 2; VL 5, B (load the contents of memory address B into vector register 5); VM 6, 3, 5 (multiply the contents of vector registers 3 and 5 and place the result in vector register 6); VM 7, 4, 5. In this instruction sequence, some instructions cannot be executed until the instructions whose results they use have been executed. When the result of a preceding instruction is used by a subsequent instruction in this way, a chain is said to exist between those instructions. Moreover, memory is used not only by the access processing section but also by channels and other devices, so it is not always accessible.
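
The chains in the example sequence can be found mechanically by tracking which instruction last wrote each vector register. The sketch below is an illustration under that assumption; the `Instr` type and the `chains` function are hypothetical names, and memory addresses A and B are omitted because they play no part in register dependencies.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str                  # instruction code: VL, VA or VM
    dest: int                # first operand (destination vector register)
    srcs: Tuple[int, ...]    # source vector registers (empty for loads)


# The example sequence from the text, in program order.
PROGRAM = [
    Instr("VL", 2, ()),       # VL 2, A
    Instr("VA", 3, (0, 2)),   # VA 3, 0, 2
    Instr("VA", 4, (1, 2)),   # VA 4, 1, 2
    Instr("VL", 5, ()),       # VL 5, B
    Instr("VM", 6, (3, 5)),   # VM 6, 3, 5
    Instr("VM", 7, (4, 5)),   # VM 7, 4, 5
]


def chains(program):
    """Return {consumer_index: [producer_index, ...]} for register chains."""
    last_writer = {}   # register number -> index of the instruction that wrote it
    result = {}
    for i, ins in enumerate(program):
        producers = [last_writer[r] for r in ins.srcs if r in last_writer]
        if producers:
            result[i] = producers
        last_writer[ins.dest] = i
    return result
```

Indices are zero-based, so `chains(PROGRAM)[1] == [0]` says that VA 3, 0, 2 is chained to VL 2, A.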

If there is no delay in memory access, the above instruction sequence executes as shown in FIG. 3. The instruction VL 2, A is executed sequentially for the vector elements $E_0$, $E_1$, ... and the elements are written into vector register 2 one after another. The next instruction, VA 3, 0, 2, can proceed from element 0, element 1, ... as elements $E_0$, $E_1$, ... are written into register 2, so its execution begins from element 0 immediately, offset by only one timing or more from the moment the first element $E_0$ is written into register 2. This is called chaining of vector processing. The next instruction, VA 4, 1, 2, uses only the result of VL 2, A and could therefore be started at the same time as VA 3, 0, 2; but because it is the same vector add (VA) and so uses the same hardware, it must wait for VA 3, 0, 2 to be processed. The result of the next instruction, VL 5, B, is used only by a later instruction, so it is sufficient for it (the storing of element 0 into register 5) to finish by then, and it is executed at a suitable timing as shown in the figure. The multiply instruction VM 6, 3, 5 could be executed concurrently with the add instructions if the multiplier were separate from the adder, but in this example the same hardware is used, so it is executed after them. The instruction VM 7, 4, 5 uses the results of VA 4, 1, 2 and VL 5, B and may therefore be executed at any time after them, but for the same reason it is executed after VM 6, 3, 5.

In vector processing, main memory is divided into multiple banks and interleaved. Even with interleaving, however, when a subsequent access arrives at the same bank within the main memory cycle time, that access is made to wait. Likewise, when a memory access bank conflicts with another device such as the channel processor (CHP), the memory access of the vector instruction is made to wait. When memory access is made to wait, chaining cannot be performed in a simple manner. For example, suppose that in the processing of the load instruction VL 2, A the storing into register 2 has finished up to element $E_i$ and memory access is made to wait at the next element $E_{i+1}$. The chained instruction VA 3, 0, 2 would then, for elements $E_{i+1}$ onward, perform addition with the meaningless data in the corresponding locations of register 2. Conventionally, therefore, chaining was performed only in a state where no delay could occur in memory access, namely by (i) confirming, when issuing a load instruction, that there is no bank conflict within the cycle time among the elements of that instruction, (ii) confirming that there is no access from other devices such as the CHP, and (iii) prohibiting other memory accesses while the load instruction is accessing memory. Consequently, if some element of the load instruction has a bank conflict within the cycle time, chaining cannot be performed, and the instruction processing time chart becomes as shown in FIG. 4.
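
The bank conflict described above can be illustrated with a toy model of an interleaved memory. The sketch assumes low-order interleaving (`bank = address % n_banks`), in-order requests, and at most one request accepted per slot; none of these details are stated in the patent, and the function name is hypothetical.

```python
def access_times(addresses, n_banks, cycle_time):
    """Return the slot at which each access is accepted by its bank.

    A bank stays busy for `cycle_time` slots after accepting an access,
    so a later access to the same bank within that window has to wait.
    """
    busy_until = [0] * n_banks   # first slot at which each bank is free again
    t = 0                        # earliest slot for the next request
    times = []
    for addr in addresses:
        bank = addr % n_banks            # assumed low-order interleaving
        start = max(t, busy_until[bank])
        times.append(start)
        busy_until[bank] = start + cycle_time
        t = start + 1
    return times


# Consecutive addresses rotate through the banks and never wait.
sequential = access_times(range(8), n_banks=4, cycle_time=4)

# A stride equal to the bank count hits the same bank every time.
strided = access_times([0, 4, 8, 12], n_banks=4, cycle_time=4)
```

With four banks and a cycle time of four slots, `sequential` is accepted one element per slot, while `strided` is accepted only every fourth slot; a load that behaves like the second case is the kind that interrupts the writing of elements into a vector register.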

As is clear from FIG. 4, in this scheme the subsequent instruction is made to wait until the preceding instruction has completed, even though some of its elements could already be executed, so system resources are not used effectively. In addition, other devices such as the CHP are made to wait completely, which diminishes the benefit of interleaving.

The present invention seeks to remedy this, reducing processing waits as far as possible in order to shorten the required time and make effective use of resources. In the present invention, chaining control is performed so that, even if memory access is interrupted, the elements loaded up to that point are processed without delay.

That is, as shown in FIG. 5, when a load instruction writes element data into vector register 2, chaining is performed and the subsequent instruction is executed; but when the preceding load instruction can no longer write element data partway through because memory cannot be accessed, the chained subsequent instruction is stopped. If, at this point, the arithmetic processing then running were stopped simply because the load instruction could not write element data into the vector register, without checking whether chaining exists, synchronization control of chaining would be simple. However, as with the instructions shown in FIG. 5, the interruption of a load instruction would then also interrupt an instruction that is not chained to it, and performance would suffer.

The present invention checks whether a subsequent instruction is chained to a preceding load instruction. Only when it is chained, and the writing of element data into the vector register by the preceding load instruction is interrupted, is the processing of the chained subsequent instruction interrupted for the duration of that interruption, thereby synchronizing the two. When there is no chaining, the load instruction and the other instructions are processed asynchronously and in parallel. This makes effective use of system resources and improves performance.

FIG. 6 shows a time chart for the present invention. Suppose that the load instruction VL 2, A can access memory up to the fourth element $E_3$ and must wait thereafter. Because there is a chained instruction, VA 3, 0, 2, the clock is stopped at the timing at which element $E_3$ is written into register 2. The chained instruction is therefore interrupted with the processing of the first, second, and third elements $E_0$, $E_1$, $E_2$ only partly executed. In this example access becomes possible four clock cycles later, and execution of the chained instruction resumes when the subsequent elements $E_4$, $E_5$, ... begin to be written into register 2. An interruption then also occurs in the next load instruction, but the arithmetic instruction executing at the same time is not chained to it, so the clock is not stopped: execution of the load instruction is interrupted while the arithmetic instruction continues processing. With the scheme of FIG. 4, although the interruptions likewise total 4 + 4 = 8 clock cycles, 42 clock cycles were required up to the start of execution of the instruction concerned, whereas with the scheme of FIG. 6, 30 clock cycles suffice.
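
The benefit of stalling only during the interruption can be seen in a simplified timing model. The sketch below does not reproduce the 42 and 30 clock cycle figures of FIGS. 4 and 6, which depend on details of the figures; it assumes one element per slot, a chaining offset of one slot, and hypothetical function names.

```python
def load_write_times(n, stall_after=None, stall_len=0):
    """Slot in which the load writes each element into the vector register.

    A stall of `stall_len` slots follows the element at index `stall_after`.
    """
    times, t = [], 0
    for k in range(n):
        times.append(t)
        t += 1
        if stall_after is not None and k == stall_after:
            t += stall_len
    return times


def chained_read_times(write_times, offset=1):
    """Chained consumer: reads element k `offset` slots after it is written,
    at most one element per slot, pausing while the load is interrupted."""
    reads, prev = [], -1
    for w in write_times:
        r = max(w + offset, prev + 1)
        reads.append(r)
        prev = r
    return reads


def unchained_read_times(write_times, offset=1):
    """Consumer without chaining: starts only after the last element is written."""
    start = write_times[-1] + offset
    return [start + k for k in range(len(write_times))]


# Eight elements; memory becomes inaccessible for 4 slots after element E3.
writes = load_write_times(8, stall_after=3, stall_len=4)
chained = chained_read_times(writes)
unchained = unchained_read_times(writes)
```

In this model the chained consumer reads its last element at slot 12, while the consumer that waits for the whole load reads it at slot 19.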

This scheme also makes the controls (i), (ii), and (iii) above unnecessary. When a load instruction is issued, there is no longer any need to check whether a bank conflict occurs within the cycle time among the elements of that instruction, and accesses to memory from other devices such as the CHP can be executed in parallel.

Even if the vector processing device is provided with multiple memory access processing sections, that is, multiple load processing sections and store processing sections, this makes it possible to perform chaining while they execute in parallel. It is therefore especially effective when, for example, two load processing sections are provided and a subsequent arithmetic instruction is executed with chaining while its third and second operands are being loaded by the respective load processing sections. Furthermore, if the program order of the instructions is rearranged so that a VL instruction comes earlier, execution of that VL instruction can start sooner; so even if processing of the VL instruction is delayed, the VA instruction can be processed asynchronously in parallel, and the data can be in place in the vector register by the time the VM instruction executes. In other words, the delay caused by memory access can be hidden.

FIG. 7 shows an embodiment of the present invention. Reference numerals 10, 12, 14, 16, and 18 denote registers; 20, an instruction decoder; 22, an instruction issue control circuit; 24 and 26, match circuits; 28, an OR gate; 30 and 32, AND gates; and 34, an inverter. Rectangular frame 36 denotes the management section for access instructions in execution, and 38 the management section for arithmetic instructions in execution. FIG. 7 corresponds to the vector processing section of FIG. 1; its instruction processing section, load processing section, vector registers, and arithmetic processing section correspond to those of FIG. 1.

In the instruction control section, a fetched (F) instruction is stored in register 10, then moved to register 12 for instruction decoding (D) and decoded by decoder 20, and then stored in the instruction issue wait (Q) register 14. The instruction in register 14 proceeds to execution after it is confirmed that there is no operand register conflict with a preceding instruction, that the arithmetic processing section is free, that management sections 36 and 38 of the instruction control section are free, and so on. For the instruction sequence above, operation is as follows. The fetched load instruction VL 2, A passes through registers 10 and 12, decoder 20, register 14, and circuit 22 and enters register 16 of management section 36. The next instruction, VA 3, 0, 2, is then fetched, shifted along in the same way, and enters register 14, where circuit 22 checks its issue conditions. Because the third operand of this instruction matches the first operand of the load instruction, it is found to be chained and is held in register 14. When execution of the load instruction progresses and writing of load data elements into vector register 2 begins, a flag WF indicating this is sent from the load processing section. The flag is inverted by inverter 34, so that the H (high) level signal changes to an L (low) level signal, closing AND gate 30 and removing the instruction issue wait signal (setting it to the L level). The wait is thereby released: the arithmetic instruction passes through circuit 22, becomes the start information Sb for the arithmetic processing section / access processing section of the vector processing device, starts execution, and is stored in register 18 of management section 38. At this time AND gate 32 also outputs a synchronous operation start signal Sc, which is input to the synchronization control circuit of the load processing section. As a result, when an access interruption occurs during execution of the load instruction and subsequent data elements can no longer be written into the register, the load processing section stops the clock and also interrupts execution of instruction (2), the chained arithmetic instruction. Se is a synchronization control signal input to the arithmetic processing section; it directs the stopping and resuming of arithmetic operation.

One input of each of match circuits 24 and 26 receives the first operand of the instruction in register 16 (register number 2 in this example). The other input of circuit 24 receives the second operand of the instruction in register 14, and the other input of circuit 26 receives the third operand of the instruction in register 14 (these may be reversed), and each pair is compared. Accordingly, if either the second or the third operand of the instruction in register 14 is the same as the first operand of the instruction in register 16, that is, if the former is chained to the latter, a match output appears from one of match circuits 24 and 26 and becomes one input of AND gates 30 and 32. Thus, if there is no chaining, AND gate 30 produces no output Sa, and circuit 22 allows the arithmetic instruction following the load instruction to be issued immediately. If there is chaining, issue waits for the arrival of flag WF as described above. AND gate 32 is enabled by the output of OR gate 28 when there is chaining, and it outputs the aforementioned signal Sc when circuit 22 outputs the arithmetic processing section start signal Sd.
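
The chain detection and gating described for FIG. 7 can be written as a small combinational function. This is a sketch of one reading of the text: the exact wiring (both match outputs feeding OR gate 28, whose output feeds AND gates 30 and 32) is an assumption, and the function and parameter names are hypothetical.

```python
def issue_control(load_dest, op2, op3, wf, sd):
    """Combinational sketch of the chain detection around registers 14 and 16.

    load_dest -- first operand of the load instruction held in register 16
    op2, op3  -- second and third operands of the instruction in register 14
    wf        -- flag WF: the load has begun writing elements to the register
    sd        -- circuit 22 is outputting the arithmetic start signal Sd
    """
    match24 = load_dest == op2        # match circuit 24
    match26 = load_dest == op3        # match circuit 26
    chained = match24 or match26      # OR gate 28 (assumed wiring)
    sa = chained and not wf           # AND gate 30 with inverter 34: issue wait
    sc = chained and sd               # AND gate 32: synchronous operation start
    return {"chained": chained, "Sa": sa, "Sc": sc}


# VA 3, 0, 2 behind VL 2, A, before the first element has been written.
waiting = issue_control(load_dest=2, op2=0, op3=2, wf=False, sd=False)

# Same pair once WF has arrived and circuit 22 issues the instruction.
issued = issue_control(load_dest=2, op2=0, op3=2, wf=True, sd=True)

# VA 4, 1, 2 behind VL 5, B: no operand matches, so no wait and no Sc.
independent = issue_control(load_dest=5, op2=1, op3=2, wf=False, sd=True)
```

In the `waiting` case Sa holds the instruction in register 14; in the `issued` case Sa is released and Sc tells the load processing section to keep the two instructions synchronized.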

F, D, Q, L, E, and WF in the time chart in the lower part of FIG. 6 correspond to those described for FIG. 7, and explain the instruction execution shown in the upper part of FIG. 6, including instruction fetch, decode, and so on.

As described above, the present invention makes it possible to shorten vector processing delay as far as possible even when an access interruption occurs, and it does so without causing any inconvenience or hindrance to vector processing. Although the above description covered synchronization control for the cases where chaining does and does not exist between a load instruction and an arithmetic instruction, the present invention is not limited to this and can be applied wherever chaining may or may not exist between an instruction of a first type and an instruction of a second type.

[Brief explanation of the drawing]

FIG. 1 is a block diagram illustrating a vector processing device; FIG. 2 is a diagram illustrating the manner of vector processing; FIGS. 3 to 5 are diagrams illustrating the execution of an instruction sequence; FIG. 6 is an explanatory diagram of the control method of the present invention; and FIG. 7 is a block diagram showing an embodiment of the present invention. In the drawings, the block "load processing section" is the first processing section, the block "store processing section" is the second processing section, and 14, 16, 24, and 26 are the chain detection means.

Claims

[Claims]

1. An instruction processing synchronization control method in a vector processing device that comprises first and second processing units capable of executing in parallel and that processes multiple element data with one vector instruction, characterized in that: means is provided for detecting whether or not there is a chain; when it is detected that there is a chain and execution of the preceding instruction is interrupted partway through the number of element data to be processed by that instruction, execution of the subsequently issued instruction is interrupted accordingly, so that it processes no more element data than the preceding instruction has processed; and when it is detected that there is no chain, the subsequently issued instruction is executed asynchronously with the preceding instruction.

2. The instruction processing synchronization control method according to claim 1, wherein the first processing unit is a load processing unit, the second processing unit is an arithmetic processing unit, the first instruction is a load instruction, and the second instruction is an arithmetic instruction.