JPS6116112B2

Info

Publication number: JPS6116112B2
Application number: JP8611080A
Authority: JP
Inventors: Tetsuo Okamoto; Shigeaki Okuya
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1980-06-25
Filing date: 1980-06-25
Publication date: 1986-04-28
Also published as: JPS5710872A

Description

[Detailed description of the invention]

The present invention relates to an instruction control device in a vector arithmetic processing device that, for example, operates on corresponding pairs of operands taken from a plurality of first operands and a plurality of second operands. In particular, it relates to an instruction control device in which the arithmetic control management portion of the instruction pipeline, which manages the arithmetic processing unit and the storage control unit, is divided into multiple stages, and in which subsequent instructions are controlled by referring to the operation of the instruction pipeline that manages at least the first stage of the arithmetic processing unit, thereby making data processing easier to control.

In a general-purpose computer, one element of data is loaded from memory into a register in the central processing unit, or an operation is performed between a second input operand consisting of one element held in a register and a third input operand consisting of one element, yielding a result operand consisting of one element. Instructions that perform this kind of control are called scalar instructions: each instruction processes a single element.

A vector arithmetic unit, by contrast, is controlled by vector instructions, each of which processes a plurality of elements. For example, as shown in FIG. 1, a load instruction causes a plurality of elements a₁, a₂, a₃, …, aₙ and b₁, b₂, b₃, …, bₙ in main storage device 1 to be loaded, under the direction of instruction control device 4, into vector register 5 via main storage control device 2 and storage control unit 3. A following add instruction, for example, then causes arithmetic processing unit 6 to add these data, A + B = C, that is, a₁ + b₁ = c₁, a₂ + b₂ = c₂, …, aₙ + bₙ = cₙ. The resulting sums c₁, c₂, c₃, …, cₙ are set in vector register 5 and then stored in main storage device 1. In this case a single add instruction causes the multiple operations a₁ + b₁ = c₁, a₂ + b₂ = c₂, …, aₙ + bₙ = cₙ to be performed in sequence.
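The element-wise behavior of such an add instruction can be sketched in Python as follows. This is only an illustration: the function name `vector_add` and the sample values are assumptions and do not appear in the patent.

```python
def vector_add(a, b):
    """Model of one vector add instruction: c[i] = a[i] + b[i] for every element."""
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same length")
    # A single call stands in for a single vector instruction that
    # processes all n element pairs in sequence.
    return [x + y for x, y in zip(a, b)]


# Hypothetical operand vectors A and B, as loaded into the vector register.
A = [1.0, 2.0, 3.0]
B = [10.0, 20.0, 30.0]
C = vector_add(A, B)
```

A scalar add, by comparison, would need one instruction per element pair.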

When such operations are performed in high-speed computers such as vector arithmetic units, instructions are generally pipelined. For example, as shown in FIG. 2, the processing of one instruction can be divided into three stages: fetching the instruction word (Fetch), decoding it (Decode), and executing it (Execute). Rather than handling instructions one at a time, while the preceding instruction is being executed the next instruction is being decoded and the instruction after that is being fetched. That is, as shown in FIG. 3, while the preceding instruction V₁ is in execution (E), the next instruction V₂ is in decode (D) and the following instruction V₃ is in fetch (F), so that the stages are processed simultaneously in a pipelined manner. Instruction execution requires multiple cycles, but in general the arithmetic processing unit itself is not pipelined: to execute one instruction, the adder and shifter are used several times to process one scalar instruction, that is, one element at a time. Consequently, execution of the next instruction could not begin until execution (E) of the preceding instruction had finished.
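The overlap of fetch, decode, and execute described for FIG. 3 can be illustrated with a small timing table. The sketch assumes, purely for illustration, that every stage takes exactly one cycle and that a new instruction enters every cycle; the function name and the cycle numbering are not taken from the patent.

```python
def instruction_pipeline(instructions, stages=("F", "D", "E")):
    """Return {instruction: {stage: cycle}} for an ideal in-order pipeline.

    Assumption: one cycle per stage, one new instruction per cycle.
    """
    return {
        name: {stage: issue + depth for depth, stage in enumerate(stages)}
        for issue, name in enumerate(instructions)
    }


timeline = instruction_pipeline(["V1", "V2", "V3"])

# Which stage is each instruction in during cycle 2?
in_cycle_2 = {
    name: stage
    for name, cycles in timeline.items()
    for stage, cycle in cycles.items()
    if cycle == 2
}
```

In cycle 2 of this model V1 is executing, V2 is being decoded, and V3 is being fetched, which is the situation the text describes.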

When high computation speed is not strongly required, the above scheme poses little problem. To process vector instructions at very high speed, however, the arithmetic processing unit must be pipelined, so that a subsequent element can be fed in and its processing started before processing of the preceding element has completed.

For example, when addition is performed, instruction execution in the arithmetic processing unit is divided into six stages: reading the data (Read), comparing the exponents of the two operands (Compare), shifting to align the exponents (Alignment), adding (Add), shifting to normalize the result (Post Shift), and writing the data (Write). The Alignment and Post Shift stages each require a shifter. A general-purpose computer uses the same shifter for both, but this slows computation. To process vector instructions at very high speed, the arithmetic processing unit is naturally pipelined, and separate shifters are therefore provided for the Alignment and Post Shift stages. Accordingly, when a vector instruction, which processes multiple elements with one instruction, is executed on a pipelined arithmetic unit, then as shown in FIG. 4A, while the leading element l₁ is in the final Write stage, the next element l₂ is in Post Shift, element l₃ is in Add, element l₄ is in Alignment, element l₅ is in exponent Compare, and element l₆ is in Read. These stages are applied in turn to every element up to lₙ. As a result, when addition is performed with a vector instruction, the processing for one instruction takes the form of a parallelogram, as shown in FIG. 4B.
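The parallelogram-shaped schedule can be reproduced with a short sketch. It assumes one cycle per stage and one new element per cycle; the names `ADD_STAGES` and `pipeline_schedule` and the 0-based cycle numbers are illustrative and not part of the patent.

```python
ADD_STAGES = ["Read", "Compare", "Alignment", "Add", "Post Shift", "Write"]


def pipeline_schedule(num_elements, stages=ADD_STAGES):
    """Return {cycle: {stage: element_number}} for one vector instruction.

    Assumption: element k (1-based) enters the first stage in cycle k - 1
    and advances one stage per cycle.
    """
    schedule = {}
    for element in range(1, num_elements + 1):
        for depth, stage in enumerate(stages):
            cycle = (element - 1) + depth
            schedule.setdefault(cycle, {})[stage] = element
    return schedule


sched = pipeline_schedule(8)
snapshot = sched[5]  # the cycle in which element 1 reaches Write
```

In this model `snapshot` maps Write to element 1, Post Shift to element 2, Add to 3, Alignment to 4, Compare to 5, and Read to 6, matching the description of FIG. 4A.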

Likewise, for a load instruction that loads data from main storage device 1 into vector register 5 shown in FIG. 1, storage control unit 3 performs pipeline processing similar to that for the add instruction.

However, when an arithmetic processing unit with such a pipeline structure processes vector instructions V₁ and V₂ (say, both add instructions) in succession, the instruction control pipeline in the instruction control device handles them in the manner shown in FIG. 5.

In the instruction control device, the instruction word of V₁ is first fetched (F₁); while it is being decoded (D₁), the instruction word of V₂ is fetched (F₂). Then, while the arithmetic processing unit carries out execution E₁ of V₁, the instruction control device decodes V₂ (D₂). As FIG. 5 shows, however, execution E₁ ends only at time t₂, when the write for the last element l₈ has been performed, and only then does execution E₂ of V₂ begin. Consequently, during the period t₂ − t₁ = t₀, from time t₁, when the data read for the last element l₈ of V₁ finishes, to time t₂, when the data read for the first element l₁′ of V₂ begins, the data read circuit is able to do work but is given no job by the instruction control device. This is a so-called idle period. Likewise, the exponent compare, alignment, add, post-shift, and write circuits each have an idle period of length t₀, producing the idle periods shown by hatched area L in FIG. 5. Thus, because of a limitation in the pipeline structure of the instruction control device, the read for V₂ cannot be started even though it could be performed as soon as the read stage for V₁ becomes free. The resulting idle periods are a drawback for high-speed processing.
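The length of the idle period t₀ can be estimated with a simple model. The sketch below assumes one cycle per stage, one element per cycle, and that V₂ starts its first read in the cycle immediately after the last write of V₁; the function name and the cycle numbering are illustrative assumptions, not values from the patent.

```python
def idle_cycles_single_execute_stage(num_elements, depth=6):
    """Idle cycles of the read circuit between V1 and V2 (the period t0).

    Assumed model: element k (0-based) of V1 is read in cycle k and
    written in cycle k + depth - 1, and V2 may not begin reading until
    the whole execute phase of V1 has ended.
    """
    last_read_v1 = num_elements - 1
    last_write_v1 = last_read_v1 + depth - 1
    first_read_v2 = last_write_v1 + 1
    # Cycles in which the read circuit is free but receives no job.
    return first_read_v2 - (last_read_v1 + 1)


t0 = idle_cycles_single_execute_stage(num_elements=8)
```

Under these assumptions the idle period is always `depth - 1` cycles, regardless of vector length: every stage circuit sits unused for as long as the pipeline takes to drain.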

It is therefore conceivable, as shown in FIG. 6, to divide the instruction execution stage of the instruction pipeline into, for example, two parts, a data read stage and the subsequent stages, and to manage them separately. In this case the instruction control device is provided with a first instruction execution register and a second instruction execution register. Instruction V₁ is initially controlled by the instruction set in the first instruction execution register. At time T₁, when the reads for all elements of V₁ have finished, the instruction is set in the second instruction execution register, and processing up to the end of the write at time T₂ is managed on the basis of the instruction set in that second register.

However, the following problem arises when, for example, instruction V₁ is an add instruction and instruction V₂ is a compare instruction.

An add instruction, as already described, completes in six cycles: reading the data (Read), comparing the exponents of the two operands (Exponent Compare), shifting to align the exponents (Alignment), adding (Add), shifting to normalize the result (Post Shift), and writing the data (Write). A compare instruction completes in five cycles: reading the data (Read), comparing the exponents of the two operands (Exponent Compare), shifting to align the exponents (Alignment), comparing (Compare), and writing the comparison result (Write).

Therefore, as shown in FIG. 7, when an add instruction is processed as V₁ and a compare instruction as V₂, their write cycles coincide. Because the arithmetic processing unit has only one write register, the results of V₁ and V₂ are written over each other at the same time and cannot be separated. The results of the two instructions are mixed, and a correct result cannot be obtained for either.
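The coinciding write cycles can be checked with a small model. It assumes one cycle per stage, one element per cycle, eight elements per vector, and that V₂ begins reading in the cycle right after the last read of V₁ (the two-part execution stage of FIG. 6). The stage lists follow the text; the function name and the cycle numbers are illustrative assumptions.

```python
ADD = ["Read", "Exponent Compare", "Alignment", "Add", "Post Shift", "Write"]
CMP = ["Read", "Exponent Compare", "Alignment", "Compare", "Write"]


def write_cycles(start_cycle, num_elements, stages):
    """Cycles in which an instruction uses the single write register.

    Assumption: element k (0-based) is read in cycle start_cycle + k and
    then advances one stage per cycle.
    """
    write_offset = stages.index("Write")
    return {start_cycle + k + write_offset for k in range(num_elements)}


n = 8
v1_writes = write_cycles(0, n, ADD)   # add instruction V1 starts at cycle 0
v2_writes = write_cycles(n, n, CMP)   # compare instruction V2 follows V1's reads
collisions = sorted(v1_writes & v2_writes)
```

In this model the last write of V₁ and the first write of V₂ both fall in cycle 12, so `collisions` is non-empty: the shorter compare pipeline catches up with the add pipeline at the write register.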

To prevent this, either the subsequent instruction V₂ must be issued only after the preceding instruction V₁ has completed, or the number of processing cycles required by each instruction must be detected and, whenever overtaking or catching up is anticipated, issue of the subsequent instruction must be delayed until that can no longer occur. The former, however, defeats the purpose of pipelining and lowers data processing efficiency. The latter requires instructions to be compared constantly and issue of the subsequent instruction to be controlled according to the instructions involved, which makes control very complex.

Accordingly, an object of the present invention is to provide an instruction control device that can easily prevent a subsequent instruction from catching up with or overtaking a preceding instruction, as described above. To this end, the instruction control device of the present invention is used in a data processing device that processes vector instructions and that comprises arithmetic processing means, such as an arithmetic processing unit, or storage data processing means, having a pipeline structure, together with an instruction control device that controls data processing in the arithmetic processing means or in the storage data processing means, such as a storage control unit. It is characterized in that the arithmetic control execution stage of the instruction control device, which comprises a stage register holding arithmetic execution instruction information, a stage setting circuit, an instruction decoder, and the like, is provided with a plurality of arithmetic control execution instruction holding means for holding arithmetic control execution instruction information, so that this arithmetic control execution stage is divided into a plurality of stages; in that, when instructions are executed in the arithmetic processing means or storage data processing means, the number of pipeline stages within one arithmetic processing means or within one storage data processing means is made the same for every instruction; and in that, at least within one arithmetic processing means or within one storage data processing means, a preceding instruction finishes execution before a subsequent instruction.

Before the invention is described in detail, an outline of its operation is given with reference to FIG. 8.

When a compare instruction is executed as instruction V₂, a dummy cycle is inserted after the compare stage, and the write stage follows it. As shown in FIG. 8, executing the compare instruction therefore takes six cycle times, the same as the add instruction, so that even when an add instruction is issued as V₁ and a compare instruction as V₂, their write stages no longer overlap.
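The effect of the dummy cycle can be sketched as follows. The model assumes one cycle per stage, one element per cycle, eight elements per vector, and that V₂ begins reading right after the last read of V₁. The helper names `pad_with_dummy` and `write_cycles` and the stage label "Dummy" are illustrative assumptions.

```python
ADD = ["Read", "Exponent Compare", "Alignment", "Add", "Post Shift", "Write"]
CMP = ["Read", "Exponent Compare", "Alignment", "Compare", "Write"]


def pad_with_dummy(stages, depth):
    """Insert 'Dummy' stages just before 'Write' until there are `depth` stages."""
    if len(stages) > depth:
        raise ValueError("instruction already has more stages than the target depth")
    padded = list(stages)
    while len(padded) < depth:
        padded.insert(padded.index("Write"), "Dummy")
    return padded


def write_cycles(start_cycle, num_elements, stages):
    """Cycles in which an instruction writes, one element entering per cycle."""
    write_offset = stages.index("Write")
    return {start_cycle + k + write_offset for k in range(num_elements)}


n = 8
cmp_padded = pad_with_dummy(CMP, len(ADD))
v1_writes = write_cycles(0, n, ADD)          # add instruction V1
v2_writes = write_cycles(n, n, cmp_padded)   # padded compare instruction V2
overlap = v1_writes & v2_writes
```

With the padded compare pipeline, V₂ writes in cycles 13 to 20 of this model while V₁ writes in cycles 5 to 12, so `overlap` is empty and no issue-time comparison of instruction lengths is needed.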

An embodiment of the present invention is described below with reference to FIGS. 9 to 11.

FIG. 9 is a block diagram of an embodiment of the present invention, FIG. 10 is a diagram explaining its operation, and FIG. 11 is a detailed diagram of part of the present invention.

In the figures, 7 is an ER stage setting circuit, 8 an ER stage register, 9 an EW stage setting circuit, 10 an EW stage register, 11 and 12 decoders, 13 a vector register, 14 an arithmetic processing unit, 15 a first operand register, 16 a second operand register, 17 a comparison circuit, 18 a first data register, 19 a second data register, 20 a comparison holding register, 21 a first shifter, 22 a first operation register, 23 a second operation register, 24 an arithmetic circuit, 25 an operation identification register, 26 an operation output register, 27 a second shifter, 28 an output register, and 29 a dummy register.

ER stage register 8 and EW stage register 10 are stage registers that each hold an arithmetic execution instruction, and they serve as the arithmetic execution instruction holding means.

ER stage setting circuit 7 determines whether read processing could be carried out if an instruction were passed from decoder 11 to arithmetic processing unit 14. When read processing is possible, it receives the instruction from decoder D, passes it to ER stage register 8, and has it further decoded by decoder 11. It also performs control such as clearing the instruction set in ER stage register 8 once it no longer needs to be held, and accepting the next new instruction.

EW stage setting circuit 9 sets an instruction from ER stage register 8 into EW stage register 10. It also performs control such as clearing the instruction set in EW stage register 10 once it no longer needs to be held, and accepting the next new instruction.

Decoder 11 decodes the instruction set in ER stage register 8, specifically the portion other than the write processing.

Decoder 12 decodes the instruction set in EW stage register 10, specifically the write processing portion.

Vector register 13 temporarily holds the plurality of elements needed for a vector operation. It also temporarily holds the results produced by arithmetic processing unit 14, which are set in it by the write instruction decoded by decoder 12.

Arithmetic processing unit 14 uses the elements set in vector register 13 to carry out the operation decoded by decoder 11, and includes comparison circuit 17, first shifter 21, arithmetic circuit 24, second shifter 27, and so on.

First operand register 15 and second operand register 16 receive from vector register 13 the first and second operands, respectively, on which the operation is to be performed. Comparison circuit 17 compares the exponent parts of the first and second operands.

First data register 18 and second data register 19 receive the first and second operands from first operand register 15 and second operand register 16, respectively. Comparison holding register 20 is the register in which the result of the exponent comparison performed by comparison circuit 17 on the first and second operands is set. On the basis of this result, first shifter 21 shifts one of the two operands to align the exponents.

First operation register 22 and second operation register 23 receive the output of first shifter 21, that is, the two operands after exponent alignment. Arithmetic circuit 24 performs operations such as addition and comparison. Identification results obtained from the operation, such as which of the two operands is larger or how many leading zeros the resulting value has, are set in operation identification register 25, and the numerical result is set in operation output register 26.

When the value output to operation output register 26 has leading zeros, second shifter 27 shifts it, under a control signal from operation identification register 25, so as to eliminate those zeros, and the shifted result is output to output register 28.

Dummy register 29 temporarily holds the comparison result data from arithmetic circuit 24 instead of passing it directly to the output register. For example, when the instruction to be processed by arithmetic processing unit 14 is a compare instruction, dummy register 29 is operated under a control signal from decoder 11. It is used for the dummy cycle that makes the number of processing cycles equal to that of an add instruction and similar instructions. FIG. 11B shows the output path of arithmetic circuit 24 in this state.

Now, in FIG. 9, when instruction V₁, an add instruction, is to be executed, V₁ is passed to the instruction control device. The instruction control device performs instruction fetch F₁ and then decode D₁. At time t₀′, when ER stage setting circuit 7 determines that arithmetic processing unit 14 can execute the instruction resulting from decode D₁, it sets the instruction passed from the decoder into ER stage register 8.

The instruction set in ER stage register 8 is further decoded by decoder 11. On that basis decoder 11 sends an element read request and an element processing request to vector register 13 and arithmetic processing unit 14. The elements l₁, l₂, … set in vector register 13 are then read out in sequence (R₁). (Element l₁ consists of first operand a₁ and second operand b₁, element l₂ of first operand a₂ and second operand b₂, and so on.) The first operands a₁, a₂, … and second operands b₁, b₂, … are set in turn in first operand register 15 and second operand register 16, and exponent comparison (C₁), the shift for exponent alignment (A₁), addition (Ad), and the shift for normalization after addition (PS) are performed. When stages (R₁) through (PS) have been performed for the first element l₁ and its result is ready to be set in vector register 13, EW stage setting circuit 9 sets the instruction held in ER stage register 8 into EW stage register 10. This can be done, for example, on the basis of a write-stage-reached signal issued by arithmetic processing unit 14 to EW stage setting circuit 9, or by predicting the cycle at which the write stage is reached by means such as a counter. When the instruction is set in EW stage register 10, decoder 12 decodes its write processing portion and sends a write request to vector register 13. From time t₁′, the results for elements l₁, l₂, … are thus set from arithmetic processing unit 14 into vector register 13; that is, the write stage (W₁) is carried out.

Meanwhile, when the read stage for elements l₁, l₂, … of instruction V₁ ends at time t₂′, vector register 13 reports this to ER stage setting circuit 7. ER stage setting circuit 7 then sets instruction V₂, a compare instruction that has been passed via decoder D, into ER stage register 8, and decoder 11 decodes it. In this case, at the direction of decoder 11, the first operands a₁′, a₂′, … and second operands b₁′, b₂′, … to be compared are read in sequence from vector register 13 (R₂) and set in first operand register 15 and second operand register 16. As with instruction V₁, exponent comparison (C₂) and the shift for exponent alignment (A₂) are performed, followed by comparison (CMP) of first operand a₁′ with second operand b₁′, of first operand a₂′ with second operand b₂′, and so on, and then by dummy cycle processing (DUMY).

The write stage of instruction V₁, begun at time t₁′, ends at time t₃′. At that point, by suitable means such as an indication from arithmetic processing unit 14 or prediction of the write-stage end cycle with a counter or the like, EW stage setting circuit 9 terminates instruction V₁ and, on the basis of the V₂ write-start information, sets the instruction held in ER stage register 8 into EW stage register 10. Decoder 12 then decodes its write processing portion and sends a write request to vector register 13. The comparison results are thus set in vector register 13 from time t₃′ (E₂). The read stage of instruction V₂ ends at time t₄′, and the write stage of instruction V₂ ends at time t₅′.
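The hand-off between the ER and EW stage registers can be sketched as a timeline. This is a simplified model, not the circuit of FIG. 9: it assumes one cycle per stage, one element per cycle, the same pipeline depth for every instruction (six, with the dummy cycle), and back-to-back issue. The function name, the dictionary keys, and the cycle numbers are illustrative assumptions.

```python
def stage_register_timeline(num_instructions, num_elements, depth=6):
    """Cycles at which each instruction occupies and releases the ER and EW registers."""
    events = []
    for i in range(num_instructions):
        er_set = i * num_elements          # ER loaded when the previous reads have ended
        ew_set = er_set + depth - 1        # first element reaches the write stage
        er_free = er_set + num_elements    # all elements read: ER can take the next instruction
        ew_free = ew_set + num_elements    # all elements written: EW can take the next instruction
        events.append({"instr": i + 1, "er_set": er_set, "ew_set": ew_set,
                       "er_free": er_free, "ew_free": ew_free})
    return events


v1, v2 = stage_register_timeline(num_instructions=2, num_elements=8)
```

With eight elements, the model gives V₁ the values er_set = 0, ew_set = 5, er_free = 8, ew_free = 13 and V₂ the values 8, 13, 16, 21, which follow the same order as t₀′ through t₅′ in the text. Because both instructions have the same depth, V₂ needs the EW register exactly when V₁ releases it.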

Thus, according to the present invention, a dummy cycle can be inserted when executing an instruction that would otherwise finish in fewer cycles, such as a compare instruction, so that the cycle counts are equalized. A subsequent instruction therefore never finishes before a preceding one.

Moreover, as with instruction V₂ above, execution of a subsequent instruction can start immediately after the first-stage portion (the read stage) of the preceding instruction V₁ has finished, so data processing efficiency can be increased.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of a vector arithmetic unit; FIGS. 2 to 4 are diagrams explaining its operation; FIG. 5 illustrates the problem with the conventional vector arithmetic unit; FIG. 6 is a diagram explaining operation when that problem is remedied; FIG. 7 illustrates the problem with the scheme of FIG. 6; FIG. 8 is a schematic diagram of the operation of the present invention; FIG. 9 is a block diagram of an embodiment of the present invention; FIG. 10 is a diagram explaining its operation; and FIG. 11 is a detailed diagram of part of the present invention.

In the figures, 1 denotes a main storage device, 2 a main storage control device, 3 a storage control unit, 4 an instruction control device, 5 a vector register, 6 an arithmetic processing unit, 7 an ER stage setting circuit, 8 an ER stage register, 9 an EW stage setting circuit, 10 an EW stage register, 11 and 12 decoders, 13 a vector register, 14 an arithmetic processing unit, 15 a first operand register, 16 a second operand register, 17 a comparison circuit, 18 a first data register, 19 a second data register, 20 a comparison holding register, 21 a first shifter, 22 a first operation register, 23 a second operation register, 24 an arithmetic circuit, 25 an operation identification register, 26 an operation output register, 27 a second shifter, 28 an output register, and 29 a dummy register.

Claims

[Scope of Claims]

1. An instruction control device for a data processing device that processes vector instructions and that comprises arithmetic processing means or storage data processing means having a pipeline structure, and an instruction control device that controls data processing in the arithmetic processing means or storage data processing means, characterized in that: the arithmetic control execution stage of the instruction control device, comprising a stage register that holds arithmetic execution instruction information, a stage setting circuit, an instruction decoder, and the like, is provided with a plurality of arithmetic control execution instruction holding means for holding arithmetic control execution instruction information, thereby dividing the arithmetic control execution stage into a plurality of stages; when instructions are executed in the arithmetic processing means or storage data processing means, the number of pipeline stages within one arithmetic processing means or within one storage data processing means is made constant for every instruction; and, at least within one arithmetic processing means or within one storage data processing means, execution of a preceding instruction finishes before that of a subsequent instruction.