JPS628815B2

Info

Publication number: JPS628815B2
Application number: JP55181227A
Authority: JP
Inventors: Masanori Mogi
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1980-12-23
Filing date: 1980-12-23
Publication date: 1987-02-25
Also published as: JPS57105033A

Description

[Detailed description of the invention]

The present invention relates to a method for controlling a so-called pipeline arithmetic unit.

Various techniques have been proposed for speeding up computer processing. The present invention concerns one of them, the pipeline arithmetic unit. A pipeline arithmetic unit processes data by feeding the operands continuously, one after another, through a plurality of arithmetic stages. The pipeline arithmetic unit itself can therefore process data at considerably high speed.

However, although the pipeline arithmetic unit itself operates at high speed, its exchanges with the units linked to it, for example the instruction unit and the memory unit, do not proceed smoothly. As a result, the processing speed of the computer system as a whole does not reach the theoretical speed that could be expected. No adequate solution to this problem has yet been discussed, because the pipeline arithmetic unit itself has not yet fully reached the stage of practical use.

Accordingly, an object of the present invention is to propose a method for controlling a pipeline arithmetic unit that allows the processing speed to approach, in practice, the theoretical speed that can be expected.

In accordance with the above object, the present invention is characterized in that, in a pipeline arithmetic unit that consists of a plurality of serially connected arithmetic stages and sequentially processes, for each instruction, the data contained in a plurality of elements, a stage that counts the number of those elements is provided in parallel with the arithmetic stages, and when the count reaches a predetermined value, a predetermined warning signal is supplied to a peripheral device.

The present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing an example of the configuration of a general pipeline arithmetic unit. In this figure, D2 and D3 denote the data contained in the first element group and the second element group, respectively, and these are the input data. Each input is first stored in the A-stage registers R2A and R3A and then passes through the corresponding preshifter, PRS2 or PRS3, to the B-stage registers R2B and R3B. In floating-point addition, for example, the preshifters (PRS) compare the exponent parts and align both operands to the larger exponent.

The data in registers R2B and R3B are added by the adder ADD, and the sum reaches the C-stage register R1C. It then passes through the postshifter PTS, is stored in the D-stage register R1D, and is finally output as output data D1.
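The data path of FIG. 1 can be illustrated with a minimal sketch. It assumes a toy number format, a non-negative integer mantissa paired with a binary exponent, and an 8-bit mantissa width; the function names and the format are illustrative assumptions, not details given in the specification.

```python
# Toy model of the FIG. 1 data path: preshift, add, postshift.
# A number is a (mantissa, exponent) pair with value mantissa * 2**exponent.
# The format, the 8-bit width, and all names are assumptions for illustration.

def preshift(a, b):
    """PRS2/PRS3: align both mantissas to the larger exponent."""
    (ma, ea), (mb, eb) = a, b
    e = max(ea, eb)
    return ma >> (e - ea), mb >> (e - eb), e

def add(ma, mb, e):
    """ADD: add the aligned mantissas."""
    return ma + mb, e

def postshift(m, e, width=8):
    """PTS: renormalize so the mantissa fits in `width` bits again."""
    while m >= (1 << width):
        m >>= 1
        e += 1
    return m, e

def fp_add(a, b, width=8):
    """Run one element pair through the three operations in order."""
    m, e = add(*preshift(a, b))
    return postshift(m, e, width=width)
```

For example, `fp_add((192, 0), (128, 1))` aligns 192 to exponent 1 (giving 96), adds 128, and returns `(224, 1)`, that is 448 = 192 + 256. In the hardware each of the three operations sits between two stage registers, so a different element pair can occupy each one in the same cycle.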

The pipeline process described above is clear from FIG. 2. FIG. 2 is a time chart that schematically shows the flow of elements through the stages of FIG. 1. The time chart is stacked in four rows (1 to 4 in the figure), which correspond to stages A, B, C, and D described above. The numbers 1 to 9 in the figure are the numbers of the elements processed in succession (their contents are the data described above). As shown in FIG. 1, registers R2A and R3A belong to the same stage, and the data of elements 1, 2, 3, ..., 9 are stored in them one after another, as shown in row 1 of FIG. 2. When the data of an element move on to the B stage (row 2 of FIG. 2), the next new element enters as shown in row 1. Viewed vertically along the time axis, the elements held in the registers (R2A, R3A, R2B, R3B, R1C, R1D) are therefore shifted by one element from row to row. Each shift takes one cycle clock. As the figure makes clear, the pipeline arithmetic unit processes a series of data densely, without gaps, so extremely high-speed processing can be expected.
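The time chart of FIG. 2 can be reproduced with a short sketch. It assumes that element 1 enters stage A at cycle 0 and that every element advances one stage per cycle clock; the function name and the dictionary layout are illustrative choices, not part of the specification.

```python
# Sketch of the FIG. 2 time chart: which element each stage holds per cycle.
# Assumption: element 1 is in stage A at cycle 0; one shift per cycle clock.

STAGES = ("A", "B", "C", "D")

def occupancy(num_elements, stages=STAGES):
    """Return {stage: row}; row[t] is the element number at cycle t, or None."""
    total_cycles = num_elements + len(stages) - 1
    chart = {}
    for depth, name in enumerate(stages):
        row = []
        for t in range(total_cycles):
            element = t - depth + 1  # each deeper stage lags by one cycle
            row.append(element if 1 <= element <= num_elements else None)
        chart[name] = row
    return chart
```

With three elements, `occupancy(3)["A"]` is `[1, 2, 3, None, None, None]` and `occupancy(3)["D"]` is `[None, None, None, 1, 2, 3]`: each row is the row above it delayed by one cycle, which is the one-element shift described for FIG. 2.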

In practice, however, this speed is not fully exploited. Because the exchanges with the peripheral devices linked to the pipeline arithmetic unit are not carried out efficiently, the effective processing speed of the computer system as a whole is not very high. The peripheral device referred to here is mainly the instruction unit. The instruction unit gives predetermined instructions to the various devices in the computer system. For example, the instruction unit instructs a memory (not shown) to output the input data D2 and D3 of FIG. 1 to the pipeline arithmetic unit (read). Alternatively, it instructs that the output data D1 from the pipeline arithmetic unit of FIG. 1 be input to the memory (write). The present invention concerns the former, the read instruction. In this case, if the read operation is requested of the memory only after the input data D2 and D3 have all been input, dead time arises. This dead time is what prevents the increase in processing speed described above.

The present invention therefore proposes the configuration shown in FIG. 3. FIG. 3 is a block diagram showing an embodiment for implementing the control method of the present invention. The system on the right side of the figure is as already described for FIG. 1; the system 30 on the left side is the one introduced by the present invention. The input to system 30 is the number of elements EN to be processed by one instruction (elements 1, 2, 3, ..., 9 in FIG. 2; also called the vector length). The number of elements EN is known when the instruction unit issues the instruction to the pipeline arithmetic unit. EN is preset in a counting means, for example a counter (CNTA) 31. At this time, the first item of the corresponding data is in the A-stage registers (R2A, R3A). Thereafter, as the data shift downward in the figure one stage at a time in synchronization with the cycle clock CC, the content of counter 31 is simultaneously decremented (counted down) by 1 by the decrement means (-1) 32.

The gist of the present invention is to form a warning signal W for the instruction unit several cycle clocks before the timing at which reading of the input data D2 and D3 from the memory should start. To this end, the appearance of a predetermined element within one element group is detected first. That is, when the count value of counter 31 reaches, for example, 4, the warning operation is started. The first step is to detect the count value 4 of counter 31, which is done by circuit 33, for example a decoder. Decoder 33 outputs logic "1" on detection, and this logic "1" is the warning signal described above. The stage at which the warning signal is issued is arbitrary: the value the decoder is set to detect may be 3 or 5, for example, instead of 4.
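The behavior of system 30 can be sketched in software as follows. The sketch assumes that counter 31 is preset with EN in the cycle in which element 1 is in stage A (cycle 0) and is decremented once per cycle clock after that; the function name and the trace format are hypothetical, and what happens when EN is smaller than the decoder setting is not covered by the specification (in this sketch W is then never raised).

```python
# Sketch of system 30 in FIG. 3: counter 31, decrement means 32, decoder 33.
# Assumptions: the counter is preset with EN at cycle 0 and decremented once
# per cycle clock; names and the trace format are illustrative only.

def warning_trace(element_count, threshold=4):
    """Return a list of (cycle, counter_value, W) for one instruction."""
    counter = element_count                   # preset counter 31 with EN
    trace = []
    cycle = 0
    while counter > 0:
        w = 1 if counter == threshold else 0  # decoder 33 detects the value
        trace.append((cycle, counter, w))
        counter -= 1                          # decrement means 32
        cycle += 1
    return trace
```

For the nine elements of FIG. 2, `warning_trace(9)` raises W only in the cycle where the counter holds 4, which is cycle 5 under the assumptions above. Passing `threshold=3` or `threshold=5` moves the warning later or earlier, as the description allows.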

FIG. 4 is a time chart that explains the operation of the configuration shown in FIG. 3 and clarifies the method of the present invention. To aid understanding, it is drawn so that it lines up in time with the time chart of FIG. 2. The count value 4 of the counter (CNTA) in row 5 of the figure is the count value 4 mentioned above; on this value decoder 33 outputs logic "1", and this logic "1" is the warning signal W. T is the margin period to be allowed when giving the warning.

The warning signal obtained in this way means that the supply of data from the memory for the current instruction (elements 1, 2, 3, ...) is coming to an end, so the supply of data from the memory for the next instruction should be prepared. The memory can thus wait in a state in which it can supply data immediately, as soon as the instruction unit gives the instruction. This is the principle behind the high-speed processing achieved by the present invention.

FIG. 5 is a time chart showing an example of one mode of carrying out the method of the present invention. The figure shows, for example, instruction 1 with 6 elements and instruction 2 with 10 elements. During instruction 1, when the pipeline arithmetic unit reaches the point at which, for example, 4 elements remain, it issues the warning signal W, and W triggers the start of the next instruction (START(2)). In this way, a series of instructions is executed densely, without gaps.
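The effect shown in FIG. 5 can be estimated with a simple timing model. The model is an assumption made for illustration: the memory is taken to need a fixed number of cycles (`latency`, a hypothetical figure not given in the specification) between receiving a read request and delivering the first element, and without a warning the request is taken to be issued only in the cycle after the last element of the current instruction has entered stage A.

```python
# Hypothetical timing model for FIG. 5 (not taken from the specification).
# Cycle 0 is the cycle in which element 1 of the current instruction is in
# stage A; `latency` is an assumed memory start-up delay in cycle clocks.

def next_start_cycle(element_count, latency, threshold=None):
    """Cycle at which the next instruction's first element can enter stage A."""
    earliest = element_count  # slot right after the current last element
    if threshold is None:
        request_cycle = element_count              # request after all inputs
    else:
        request_cycle = element_count - threshold  # cycle in which W is raised
    return max(earliest, request_cycle + latency)
```

With instruction 1 of FIG. 5 (6 elements) and an assumed latency of 4 cycles, `next_start_cycle(6, 4)` gives 10, leaving four idle cycles, whereas `next_start_cycle(6, 4, threshold=4)` gives 6, so instruction 2 follows without a gap. Under this model the gap disappears whenever the decoder setting is at least as large as the memory latency.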

If the number of elements were the same for every instruction, there would be neither room nor need for a control method such as that of the present invention. In fact, however, the number of elements differs from instruction to instruction. By initially setting the number of elements in the counter for each instruction and decrementing it by 1 in synchronization with the cycle clock, the warning can always be issued at a fixed timing before all the data have been input, whatever the vector length of the instruction.
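That the lead time does not depend on the vector length can be checked with a small simulation. It assumes the same timing as the description of FIG. 3 (counter set to the number of elements while element 1 is in stage A, one down-count per cycle clock); the function name is hypothetical.

```python
# Sketch: the warning leads the last element by a fixed number of cycles.
# Assumption: the counter is set at cycle 0 and counted down once per cycle.

def lead_cycles(element_count, threshold):
    """Cycles from the warning to the last element entering stage A."""
    counter = element_count        # initial set at the start of the instruction
    cycle = 0
    notice_cycle = None
    while counter > 0:
        if counter == threshold:
            notice_cycle = cycle   # decoder output goes to logic "1"
        counter -= 1               # one down-count per cycle clock
        cycle += 1
    last_entry_cycle = element_count - 1
    if notice_cycle is None:       # fewer elements than the decoder setting
        return None
    return last_entry_cycle - notice_cycle
```

Both `lead_cycles(6, 4)` and `lead_cycles(10, 4)` return 3, matching the two instruction lengths of FIG. 5. The result depends only on the decoder setting, not on the number of elements.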

As described above, according to the present invention the pipeline arithmetic unit and the peripheral devices that exchange data with it work together continuously, without intervening dead time, so the processing speed of the computer system as a whole is greatly improved.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an example of the configuration of a general pipeline arithmetic unit; FIG. 2 is a time chart schematically showing the flow of elements through the stages of FIG. 1; FIG. 3 is a block diagram showing an embodiment for implementing the control method of the present invention; FIG. 4 is a time chart that explains the operation of the configuration shown in FIG. 3 and clarifies the method of the present invention; and FIG. 5 is a time chart showing an example of one mode of carrying out the method of the present invention.

In the figures, 30 is the system introduced by the present invention, 31 is the counter, 32 is the decrement means, 33 is the decoder, and W is the warning signal.

Claims

[Claims]

1. A method for controlling a pipeline arithmetic unit in a computer system comprising a pipeline arithmetic unit, which consists of a plurality of serially connected arithmetic stages and processes the data of each element of an element group while shifting the data sequentially to the next arithmetic stage in synchronization with a cycle clock, and a peripheral device linked to the pipeline arithmetic unit, the method being characterized in that: a system that operates in step with the pipeline arithmetic unit is provided in parallel with it; counter means in which the number of elements of the element group is preset is provided at the first stage of that system; the number of elements preset in the counter means is counted down in synchronization with the cycle clock; by means of the down count, the appearance of the stage that precedes the arrival of the last element of the element group by a predetermined number of cycle clocks is detected; and a warning signal is formed at the time of the detection and sent to the peripheral device, thereby prompting the peripheral device to prepare for execution of a predetermined process.