JPH04105126A

JPH04105126A - Pipeline processing system

Info

Publication number: JPH04105126A
Application number: JP22277590A
Authority: JP
Inventors: Makoto Nakahara; 誠中原
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1990-08-24
Filing date: 1990-08-24
Publication date: 1992-04-07
Anticipated expiration: 2014-03-31
Also published as: JP2875864B2

Abstract

PURPOSE: To avoid interlocks and prevent a drop in processing speed by making the position of a predetermined stage that includes operand read processing movable forward and backward. CONSTITUTION: A dummy stage 2, which only passes data through, is added to pipeline stages 1 consisting of plural stages. It can be swapped with a predetermined stage 3 that carries out operand read processing, so that after the swap stage 3 is positioned later in the stage execution order. The position of stage 3, which includes operand read processing, can therefore be moved forward and backward: stage 3 is normally placed at the front and is moved to the rear when an interlock is predicted, so that the interlock is avoided. In this way a pipeline processing system can avoid interlocks and prevent its processing speed from dropping.

Description

[Detailed Description of the Invention]
[Summary] This invention relates to a pipeline processing method used at the instruction processing level of computers and similar equipment. Its object is to make the position of a predetermined stage that includes operand read processing movable forward and backward, thereby avoiding interlocks and preventing a drop in processing speed. A dummy stage that only passes data through is added to the multiple pipeline stages that make up one instruction, and at least a predetermined stage that executes operand read processing can be swapped with the dummy stage, such that after the swap the predetermined stage lies later in the stage execution order. Preferably, the predetermined stage and the dummy stage are swapped when executing a subsequent instruction that depends on the execution result of a preceding instruction or on the preceding instruction itself. Preferably also, when executing a subsequent instruction that depends on neither the execution result of the preceding instruction nor the preceding instruction itself, the swapped predetermined stage is returned to its position before the swap.

[Industrial application field]

The present invention relates to a pipeline processing method used at the instruction processing level of computers and similar equipment.

In general, a representative technique for speeding up computers is pipeline processing, in which one instruction is divided into several processing units (pipeline stages) and multiple consecutive instructions are executed in parallel.

[Conventional technology]

FIG. 15 is a conceptual diagram of a conventional pipeline processing method. In this example, one instruction is divided into four stages, one per processing unit.

After stage 1 of instruction n is executed in the i-th processing cycle, the following (i+1)-th cycle executes stage 2 of instruction n and stage 1 of instruction n+1. The (i+2)-th cycle then executes stage 3 of instruction n, stage 2 of instruction n+1 and stage 1 of instruction n+2, and the following (i+3)-th cycle executes stage 4 of instruction n, stage 3 of instruction n+1, stage 2 of instruction n+2 and stage 1 of instruction n+3.

In this way, multiple instructions are executed in parallel, allowing the computer to operate at high speed.
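The overlap just described can be reproduced with a small Python model, a sketch assuming one stage per cycle and one new instruction per cycle with no interlocks. The function name `ideal_schedule` and the convention that cycle offset 0 stands for cycle i are illustrative choices, not part of the patent.

```python
def ideal_schedule(num_instructions, num_stages=4):
    """Map each cycle offset (0 = cycle i) to the (instruction offset, stage) pairs
    active in that cycle, for an ideal pipeline with no interlocks."""
    schedule = {}
    for k in range(num_instructions):          # instruction n+k
        for s in range(1, num_stages + 1):     # stage 1 .. num_stages
            cycle = k + s - 1                  # n+k enters stage 1 in cycle i+k
            schedule.setdefault(cycle, []).append((k, s))
    return schedule


if __name__ == "__main__":
    sched = ideal_schedule(4)
    for cycle in sorted(sched):
        active = ", ".join(f"n+{k} stage {s}" for k, s in sched[cycle])
        print(f"cycle i+{cycle}: {active}")
```

In cycle i+3 the model shows all four stages busy (n in stage 4 down to n+3 in stage 1), which is the steady state of the conventional pipeline in FIG. 15.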

[Problem to be solved by the invention]

However, in such conventional pipeline processing, each pipeline stage is fixed according to a unit of instruction processing, for example "instruction fetch", "decode", "execute" and so on. As a result, a so-called pipeline interlock (hereinafter simply "interlock") occurs when certain instructions are executed, consuming extra processing cycles and lowering processing speed.

An interlock is a phenomenon specific to pipeline processing and can be explained as follows.

Specifically, when an instruction refers to the execution result of a preceding instruction before that result has been finalized, the instruction would read old (that is, incorrect) data and produce a wrong result. To avoid this, execution of the instruction is made to wait until the execution result of the preceding instruction is finalized.

For example, when an instruction depends on the execution result (load data) of a preceding instruction, execution of that instruction is made to wait until the load data is finalized. This is generally called a "load-data dependent interlock."

Similarly, for a special instruction that occupies an arithmetic unit for several processing cycles (a so-called multi-cycle instruction), execution of the next instruction is made to wait until that instruction releases the arithmetic unit. This is called an "interlock due to a multi-cycle instruction."
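A minimal sketch of the two interlock conditions described above, in Python. The `Instr` record, the register names and the rule that any instruction following a multi-cycle instruction waits are assumptions made for illustration; actual interlock logic is implementation-specific, as the next paragraph notes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    op: str                        # mnemonic, e.g. "add", "load"
    dest: Optional[str] = None     # destination register, if any
    srcs: Tuple[str, ...] = ()     # source registers
    cycles: int = 1                # > 1 for a multi-cycle instruction


def interlock_reasons(prev: Instr, cur: Instr) -> list:
    """Reasons why `cur` would have to wait for `prev` in a fixed pipeline."""
    reasons = []
    if prev.op == "load" and prev.dest is not None and prev.dest in cur.srcs:
        reasons.append("load-data dependent")
    if prev.cycles > 1:
        # assumption: the next instruction waits until the arithmetic unit is released
        reasons.append("multi-cycle")
    return reasons


if __name__ == "__main__":
    load = Instr("load", "r1", ("r2",))
    print(interlock_reasons(load, Instr("add", "r3", ("r1", "r4"))))
```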

Interlocks differ depending on the implementation (the way the circuit is realized) and are not limited to the two types above.

A concrete example of such an interlock is described with reference to the drawings. In FIG. 16, I-F is the instruction fetch stage, D is the decode stage, E is the execution stage including operand read processing, D-F is the data fetch stage, and W is the data write stage.

The n-th instruction is an addition instruction (add), the (n+1)-th is a load instruction (Load), the (n+2)-th is an addition instruction (add), and the (n+3)-th is a subtraction instruction (sub).

Now suppose that the (n+2)-th addition instruction uses the execution result (data) of the preceding instruction n+1. The data of instruction n+1 is not finalized until near the end of its D-F stage, so instruction n+2 must be delayed to match that timing. For this reason, the instructions after instruction n+1 are delayed (interlocked) in hardware by a few cycles (one cycle in the figure). This gives instruction n+2 the correct operand data, so its execution result is correct.

On the other hand, these delay cycles accumulate in proportion to the number of interlocks, and the extra pipeline cycles spent lower the processing speed.
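The cost of such interlocks in a fixed pipeline can be sketched as follows. This Python model assumes the five stages of FIG. 16 with a bypass, so that load data becomes usable in the cycle after the load's D-F stage and an ALU result in the cycle after its E stage; the register names and the helper `conventional_timing` are hypothetical.

```python
STAGES = ("I-F", "D", "E", "D-F", "W")
E = STAGES.index("E")      # operand read / execute
DF = STAGES.index("D-F")   # data fetch: load data is finalized at the end of this stage


def conventional_timing(program):
    """Fixed five-stage pipeline that interlocks on load-data dependences.

    `program` is a list of (op, dest, srcs) tuples. Returns the cycle in which each
    instruction enters I-F and the total number of cycles for the sequence.
    """
    issue = []
    ready = {}  # register -> first cycle in which an E stage can read it (via bypass)
    for op, dest, srcs in program:
        t = issue[-1] + 1 if issue else 0
        need = max((ready.get(r, 0) for r in srcs), default=0)
        if t + E < need:
            t = need - E  # interlock: this and all later instructions slip
        issue.append(t)
        if dest is not None:
            produced_in = DF if op == "load" else E
            ready[dest] = t + produced_in + 1
    total_cycles = issue[-1] + len(STAGES)
    return issue, total_cycles


if __name__ == "__main__":
    # FIG. 16-style sequence: add, load, dependent add, sub (register names are made up)
    prog = [
        ("add", "r1", ("r2", "r3")),
        ("load", "r4", ("r5",)),
        ("add", "r6", ("r4", "r1")),
        ("sub", "r7", ("r6", "r2")),
    ]
    issue, total = conventional_timing(prog)
    ideal = len(prog) + len(STAGES) - 1
    print(issue, total, "stall cycles:", total - ideal)
```

For this sequence the model gives issue cycles [0, 1, 3, 4] and 9 cycles in total instead of 8, the one-cycle delay of FIG. 16; each further load-use pair would add another cycle.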

Interlocks can also be prevented in software at the compiler development stage, but this places a heavy burden on software development and is undesirable in terms of bugs and cost.

The present invention was made in view of these problems. Its object is to make the position of a predetermined stage that includes operand read processing movable forward and backward, thereby avoiding interlocks and preventing a drop in processing speed.

[Means for Solving the Problems] To achieve the above object, the present invention, whose principle is shown in FIG. 1, adds a dummy stage 2 that only passes data through to the multiple pipeline stages 1 making up one instruction, makes at least a predetermined stage 3 that executes operand read processing swappable with the dummy stage 2, and places the predetermined stage after the swap later in the stage execution order. Preferably, the predetermined stage and the dummy stage are swapped when executing a subsequent instruction that depends on the execution result of a preceding instruction or on the preceding instruction itself. Preferably also, when executing a subsequent instruction that depends on neither the execution result of the preceding instruction nor the preceding instruction itself, the swapped predetermined stage is returned to its position before the swap.

[Operation] In the present invention, a predetermined stage that executes operand read processing is swapped with a dummy stage, located behind it, that only passes data through.

This swap is performed, for example, when the execution result of a preceding instruction is needed in the predetermined stage, or when the instruction containing the predetermined stage depends on a preceding instruction.

The predetermined stage is returned to its original position when the execution result of the preceding instruction is no longer needed in the predetermined stage, or when the instruction containing the predetermined stage no longer depends on the preceding instruction.

The arrangement of the pipeline stages therefore changes freely according to the relationship between a preceding instruction and the instructions that follow it, so interlocks are avoided and processing speed does not drop.

[Embodiment] The present invention is described below with reference to the drawings.

FIGS. 2 to 14 show an embodiment of the pipeline processing method according to the present invention.

[Principle] First, the principle is explained with reference to FIGS. 2 to 10. FIG. 2 shows a five-stage dynamic pipeline as a simple example.

The basic format of this pipeline places two execution candidate stages (A, B) after the IF (instruction fetch) stage and the D (decode) stage, followed finally by the W (write) stage.

As shown by format 1 and format 2 in the figure, the two execution candidate stages (A, B) can take several combinations.

[Format 1] This is the instruction processing format for the normal case and has three variants. Format 1-1 places the E (execute) stage in front and a dummy stage (a stage that data merely passes through) behind it. Format 1-2 places the EL/S (load/store address calculation) stage in front and the D-F (data fetch) stage behind it. Format 1-3 places the E1 and E2 stages (execution of a two-cycle instruction) in the two execution candidate stages. In every variant, the execution stage that includes operand read processing occupies the front of the two execution candidate stages.

[Format 2] This is the instruction processing format used when mutually dependent instructions that would cause an interlock appear, or when executing the instruction that follows a two-cycle instruction. A dummy stage is placed at the front of the two execution candidate stages, and the execution stage that includes operand read processing is placed behind it.

In other words, with multiple execution candidate stages, two instruction processing formats are defined: format 1, in which at least the predetermined stage that executes operand read processing (the E, EL/S or E1 stage) is placed at the front, and format 2, in which a dummy stage is placed at the front of the execution candidate stages and the predetermined stage behind it.

Put simply, these two formats have two distinguishing features: a dummy stage has been added, and the dummy stage can be swapped with the predetermined stage as needed.
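The swap between the two candidate slots can be pictured as a tuple operation. This Python sketch assumes the stage labels used in the text and only handles the case where the slot behind the operand-reading stage is a dummy, as in format 1-1; the function names are hypothetical.

```python
DUMMY = "dummy"


def swap_with_dummy(candidates):
    """Format 1 -> format 2 for the two candidate slots (A, B): the operand-reading
    stage in front trades places with the dummy stage behind it."""
    front, back = candidates
    if back != DUMMY:
        raise ValueError("this sketch only swaps with a dummy in the rear slot")
    return (DUMMY, front)


def full_pipeline(candidates):
    """Five-stage sequence: IF, D, the two execution candidate stages, W."""
    return ("IF", "D", *candidates, "W")


format_1_1 = ("E", DUMMY)
format_2 = swap_with_dummy(format_1_1)
print(full_pipeline(format_1_1))  # ('IF', 'D', 'E', 'dummy', 'W')
print(full_pipeline(format_2))    # ('IF', 'D', 'dummy', 'E', 'W')
```

The E stage moves from the third to the fourth position, one cycle later, while the overall pipeline length stays at five stages.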

The instruction processing sequence of a pipeline with these formats works as follows. In FIG. 3, for a sequence that starts with the n-th add instruction and ends with the (n+11)-th add instruction, the optimal format for each instruction is selected as shown in the following table. In FIG. 3, add (dep) denotes an addition instruction that uses the execution result or load data of the preceding instruction, 2cycInst denotes a two-cycle instruction, and the arrows between instructions represent data bypasses.

In FIG. 3, format 1, the normal instruction processing format, is used at first.

The next marked case needs the load data of the preceding instruction (n+2), so the pipeline switches to format 2.

The E stage, which includes operand read processing, therefore moves to the rear, and the load data of the preceding instruction can be taken in without trouble. The following instruction uses the same resource (the add/logic unit) as the previous one, so format 2 is kept. After that, format 2 is restored to format 1: this happens when the (n+5)-th instruction does not depend on the preceding instruction and, in addition, the resource used in the E stage of instruction n+4 does not conflict with the resource used in the EL/S stage of instruction n+5. In this way the predetermined stage, which the adoption of format 2 had shifted back by one processing cycle, returns to its original position. Next, a two-cycle instruction is processed, so format 2 is adopted again for the instruction that follows it.

As a result, the pipeline stages of successive instructions are offset by one cycle each, which avoids the interlocks that a conventional pipeline would incur, for example in the processing cycles marked in the figure. Moreover, in the marked cycles where the E and EL/S stages of adjacent instructions run in parallel, the execution stages skipped in the earlier marked cycles are made up.

Consequently, the instruction sequence as a whole executes as if one instruction completed per cycle, so pipeline processing proceeds with zero penalty.
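A worked timing check helps show why format 2 removes the load-data interlock. This Python sketch assumes the five-stage pipeline IF, D, A, B, W with one instruction entering IF per cycle, and that load data reaches the next instruction through a bypass in the cycle after the load's D-F stage; `stage_cycle` is a hypothetical helper.

```python
DUMMY = "dummy"


def stage_cycle(issue, candidates, stage):
    """Cycle in which `stage` runs for an instruction that enters IF at `issue`,
    in the five-stage pipeline IF, D, A, B, W (A and B are the candidate slots)."""
    return issue + 2 + candidates.index(stage)


LOAD = ("EL/S", "D-F")        # format 1-2, entering IF at cycle 0
ADD_F1 = ("E", DUMMY)         # dependent add kept in format 1
ADD_F2 = (DUMMY, "E")         # dependent add switched to format 2

data_ready = stage_cycle(0, LOAD, "D-F") + 1   # usable the cycle after D-F (bypass)
stall_f1 = max(0, data_ready - stage_cycle(1, ADD_F1, "E"))
stall_f2 = max(0, data_ready - stage_cycle(1, ADD_F2, "E"))
print(stall_f1, stall_f2)  # 1 0

# An independent load/store returning to format 1 at cycle 2 runs EL/S in the same
# cycle as the add's E stage; this only works because they use different units.
parallel = stage_cycle(2, LOAD, "EL/S") == stage_cycle(1, ADD_F2, "E")
print(parallel)  # True
```

Kept in format 1, the dependent add would reach E one cycle too early and stall; in format 2 its E stage lines up with the load data. The last check reflects the text's condition for returning to format 1: the E and EL/S stages of neighboring instructions then fall in the same cycle, which is acceptable only because they use different units.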

[Specific Embodiment] Next, as a specific embodiment, a dynamic pipeline with a basic configuration of six stages and two pipeline formats is disclosed and described.

FIG. 4 is a block diagram of the main parts of a six-stage dynamic pipeline circuit. In this figure, 10 is an address queue and 11 is an instruction queue. The instruction queue 11 has five registers, one fewer than the number of pipeline stages, including a decode register (DR), an A register (A-R), a B register (B-R) and a write register (W-R). These registers are connected in series, and the output of each register is connected to an interlock detection & multiplexer control circuit 12 (hereinafter, control circuit).

Based on the instruction fetched from memory and the contents of each register, control circuit 12 determines whether there is an instruction with a data dependency on a preceding instruction, or an instruction that follows a multi-cycle instruction (that is, it predicts whether an interlock will occur), and outputs switching signals corresponding to the result to the multiplexers described below.

In addition, Align, WD1, WD2, W1, W2, W3, EE1, EE2, EO1 and EO2 are registers, MUX1 to MUX4 are the first to fourth multiplexers, ALU is the arithmetic logic unit, LD/ST Adder is the load/store address calculation unit, and L1 to L7 are bypass paths.

FIG. 5 shows the configuration of the address queue 10. It comprises an address generator 13, the F-stage address counter (FPC), the D-stage address counter (DPC), the A-stage address counter (APC), the B-stage address counter (BPC), the C-stage address counter (CPC), the W-stage address counter (WPC), a branch destination address calculation unit 14, a return address counter for untaken branches (REPC), and a multiplexer 15.

FIGS. 6(a) to 6(d) show the control rules for MUX1 to MUX4, with the stage arrangement within one processing cycle (see FIG. 7) divided into four cases, cases 1 to 4. In the figures, an asterisk (*) means that control is decided case by case, depending on whether registers are interdependent (for the D stage) or whether an operation result or load data is written (for the W stage), and a dash (-) means that no switching is needed.

[Case 1] Two dummy stages and an E stage are lined up in one processing cycle. MUX2 selects the outputs of EE1 and EO1 (b).

[Case 2] Two dummy stages and an EL/S stage are lined up in one processing cycle. MUX3 likewise selects the outputs of EE1 and EO1 (b).

In cases 1 and 2, therefore, data delayed by passing through one register stage (EE1 or EO1), that is, by one clock, is delivered to the ALU or the LD/ST Adder.

[Case 3] One dummy stage, an E stage and an EL/S stage are lined up in one processing cycle. MUX2 selects the outputs of EE1 and EO1 (b), while MUX3 selects the outputs of EE2 and EO2 (c).

[Case 4] The E stage and the EL/S stage are lined up in the reverse order of case 3. MUX2 selects the outputs of EE2 and EO2 (c), while MUX3 selects the outputs of EE1 and EO1 (b).

In case 3, therefore, data delayed by two register stages (EE1 + EE2 or EO1 + EO2) is delivered to the LD/ST Adder, whereas in case 4 data with the same delay is delivered to the ALU.
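The four cases can be summarized as a lookup table. This Python sketch encodes only the MUX2 and MUX3 selections given in the text; the mapping of MUX2 to the ALU and MUX3 to the LD/ST Adder is an inference from the case 3 and case 4 descriptions, not an explicit statement of the patent.

```python
# Delay, in register stages, of each selectable multiplexer input.
DELAY = {"b": 1,   # EE1 / EO1 outputs
         "c": 2}   # EE2 / EO2 outputs

# Which input MUX2 and MUX3 select in each case (only the selections stated in the text).
MUX_RULES = {
    1: {"MUX2": "b"},               # two dummy stages and an E stage
    2: {"MUX3": "b"},               # two dummy stages and an EL/S stage
    3: {"MUX2": "b", "MUX3": "c"},  # one dummy stage, an E stage and an EL/S stage
    4: {"MUX2": "c", "MUX3": "b"},  # as case 3, with E and EL/S in reverse order
}

# Inferred from cases 3 and 4: MUX2 feeds the ALU, MUX3 the LD/ST Adder.
FEEDS = {"MUX2": "ALU", "MUX3": "LD/ST Adder"}


def operand_delays(case):
    """Register-stage delay of the operand data delivered to each unit in `case`."""
    return {FEEDS[mux]: DELAY[sel] for mux, sel in MUX_RULES[case].items()}


for case in MUX_RULES:
    print(case, operand_delays(case))
```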

FIG. 8 shows the pipeline formats used in this embodiment. The basic form is a six-stage pipeline of I-F (instruction fetch), D (decode), A, B, C and W (write) stages, of which the three stages A to C are the execution candidate stages.

Depending on their contents, the execution candidate stages follow either format 1 or format 2.

Format 1 has two variants: format 1-1, which places the E (execute) stage in the first slot and dummy stages in the remaining slots, and format 1-2, which places the EL/S (load/store address calculation) stage in the first slot, the D-F (data fetch) stage in the next, and a dummy stage in the last.

Format 2 also has two variants: format 2-1, which places a dummy stage in the first slot, the E (execute) stage in the next, and a dummy stage in the last, and format 2-2, which places a dummy stage in the first slot, the EL/S (load/store address calculation) stage in the next, and the D-F (data fetch) stage in the last.

A format in which dummy stages occupy the first two execution candidate slots and the E (execute) stage occupies the last is defined as prohibited.
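The six-stage formats can be written down as data, which also makes a pattern visible: each format 2 variant is the corresponding format 1 variant shifted one slot later, with the trailing dummy moved to the front. The Python sketch below assumes that reading of FIG. 8; the names `shift_later` and `is_allowed` are hypothetical.

```python
DUMMY = "dummy"

# Execution candidate slots A, B, C for each allowed format.
FORMATS = {
    "1-1": ("E", DUMMY, DUMMY),
    "1-2": ("EL/S", "D-F", DUMMY),
    "2-1": (DUMMY, "E", DUMMY),
    "2-2": (DUMMY, "EL/S", "D-F"),
}
PROHIBITED = (DUMMY, DUMMY, "E")


def shift_later(slots):
    """Move every stage one slot later, consuming the trailing dummy slot."""
    if slots[-1] != DUMMY:
        raise ValueError("no trailing dummy slot to absorb the shift")
    return (DUMMY,) + slots[:-1]


def is_allowed(slots):
    """True if `slots` is one of the defined formats and not the prohibited one."""
    return slots in FORMATS.values() and slots != PROHIBITED


print(shift_later(FORMATS["1-2"]))               # ('dummy', 'EL/S', 'D-F')
print(is_allowed(shift_later(FORMATS["2-1"])))   # False
```

Shifting format 2-1 once more would produce the prohibited arrangement, so in this model the operand-reading stage moves back at most one slot.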

Switching between format 1 and format 2 follows the state transition diagram of FIG. 9. (I) When the instruction following the current load instruction uses that instruction's load data (see the marked point in FIG. 10), the state changes from format 1 to format 2. (II) When the previous instruction is in format 2, the current instruction does not depend on it and uses a different execution resource (see FIG. 10), or when the current instruction is a branch instruction (see FIG. 10), the state changes from format 2 to format 1.

(III) When the instruction following the current load instruction does not use its load data, that is, in any case other than (I) (see, for example, FIG. 10), format 1 is kept. (IV) When consecutive instructions use the same resource, that is, when the previous instruction is in format 2 and the current instruction uses the same execution resource as it (see FIG. 10), format 2 is kept.
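Rules (I) to (IV) can be sketched as a small state machine. This Python version is an illustration under assumptions: the `unit` labels are hypothetical, a branch is identified by the mnemonic `branch`, and any format 2 case not covered by (II) is treated as staying in format 2, which the text does not state.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    op: str                         # mnemonic, e.g. "add", "load", "branch"
    dest: Optional[str] = None      # destination register, if any
    srcs: Tuple[str, ...] = ()      # source registers
    unit: str = "ALU"               # execution resource used (hypothetical labels)


def next_format(prev_format: int, prev: Instr, cur: Instr) -> int:
    """Format (1 or 2) for `cur`, given the preceding instruction and its format."""
    uses_prev = prev.dest is not None and prev.dest in cur.srcs
    if prev_format == 1:
        if prev.op == "load" and uses_prev:
            return 2                # (I) load followed by a user of its load data
        return 1                    # (III) otherwise stay in format 1
    if cur.op == "branch":
        return 1                    # (II) branch instruction
    if not uses_prev and cur.unit != prev.unit:
        return 1                    # (II) independent and a different resource
    return 2                        # (IV) same resource; also the fallback (assumption)


def assign_formats(program):
    """Format for each instruction in order; the first starts in format 1."""
    if not program:
        return []
    formats = [1]
    for prev, cur in zip(program, program[1:]):
        formats.append(next_format(formats[-1], prev, cur))
    return formats
```

For example, an add, a load, an add that uses the loaded value, an independent sub on the same ALU, and then a load on the load/store unit would be assigned formats [1, 1, 2, 2, 1] by this sketch.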

Regarding branches: as shown in FIG. 11, branching in this embodiment assumes the delayed-branch method. A penalty is therefore incurred when a branch is not taken.

Specifically, in this six-stage dynamic pipeline, an untaken branch costs a one-cycle penalty in the format 1 state (see FIG. 12) and a two-cycle penalty in the format 2 state (see FIG. 13).

Many conventional pipelines hold the untaken-branch penalty to one cycle, so the penalty in this embodiment may look like a disadvantage. In practice, however, the one cycle gained by avoiding the load-data dependent interlock absorbs the extra penalty, so no disadvantage arises.

This is because an untaken branch in the format 2 state is, in other words, an untaken branch that occurs after one cycle has already been gained by the transition from format 1 to format 2.

Consider the worst case in FIG. 14: at one marked point the state changes from format 1 to format 2 to avoid a load-data dependent interlock, gaining one cycle, and later, at another marked point, a branch is not taken while the pipeline is still in format 2. This costs a two-cycle penalty, but counting the cycle gained earlier, the net penalty is only one cycle, the same as in a conventional pipeline. Moreover, this is only the worst case: whenever the untaken branch occurs in the format 1 state, the cycle gained by avoiding the interlock remains, so the penalty never exceeds one cycle. Processing speed can therefore be improved over the conventional pipeline method.
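The penalty accounting above amounts to simple arithmetic. The Python sketch below encodes the per-format untaken-branch penalties from FIGS. 12 and 13 and the one cycle gained per avoided interlock; the function name `net_penalty` is hypothetical.

```python
# Untaken-branch penalty per format, as stated for FIGS. 12 and 13.
UNTAKEN_PENALTY = {1: 1, 2: 2}
# Cycles gained each time a load-data interlock is avoided by switching to format 2.
INTERLOCK_GAIN = 1


def net_penalty(untaken_formats, interlocks_avoided):
    """Untaken-branch penalty minus the cycles gained by avoided interlocks."""
    penalty = sum(UNTAKEN_PENALTY[f] for f in untaken_formats)
    return penalty - INTERLOCK_GAIN * interlocks_avoided


# Worst case of FIG. 14: one avoided interlock, then an untaken branch in format 2.
print(net_penalty([2], 1))  # 1
# Untaken branch in format 1 after an avoided interlock.
print(net_penalty([1], 1))  # 0
```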

In the embodiment above, a dummy stage is added to a general pipeline configuration to avoid the "load-data dependent interlock", but the invention is not limited to this. Depending on the implementation, for example, interlocks caused by two-cycle instructions among multi-cycle instructions can also be avoided. Furthermore, by increasing the number of dummy stages, interlocks of other multi-cycle instructions such as three-cycle and four-cycle instructions can easily be handled.

[Effects of the Invention] As described above, the present invention makes the position of the predetermined stage that includes operand read processing movable forward and backward. The predetermined stage is normally placed at the front, and when an interlock is predicted it is placed at the rear, which avoids the interlock and prevents a drop in processing speed.

[Brief explanation of drawings]

FIG. 1 is a diagram of the principle of the present invention. FIGS. 2 to 14 show an embodiment of the pipeline processing method according to the present invention: FIG. 2 is a conceptual diagram of the formats of the five-stage pipeline; FIG. 3 is a basic operation diagram of the five-stage pipeline; FIG. 4 is a block diagram of the six-stage dynamic pipeline circuit; FIG. 5 is a block diagram of its address queue; FIG. 6 shows the multiplexer control rules for each case; FIG. 7 shows the basic rules of multiplexer control; FIG. 8 is a conceptual diagram of the formats of the six-stage pipeline; FIG. 9 is a state transition diagram of the processing formats; FIG. 10 is a basic operation diagram of the six-stage pipeline; FIG. 11 shows branching under the delayed-branch method; FIG. 12 shows an untaken branch (one-cycle penalty); FIG. 13 shows an untaken branch (two-cycle penalty); and FIG. 14 shows an untaken branch (two-cycle penalty). FIGS. 15 and 16 show a conventional example: FIG. 15 is a conceptual diagram of its basic pipeline processing, and FIG. 16 illustrates its pipeline interlock.

1: pipeline stage; 2: dummy stage; 3: predetermined stage.

Claims

[Claims]

(1) A pipeline processing method characterized in that a dummy stage that only passes data through is added to a plurality of pipeline stages constituting one instruction, at least a predetermined stage that executes operand read processing can be replaced with the dummy stage, and the position of the predetermined stage after replacement is later in the stage execution order.

(2) The pipeline processing method according to claim 1, wherein the predetermined stage and a dummy stage are replaced when executing a subsequent instruction that depends on the execution result of the preceding instruction or the preceding instruction itself.

(3) The pipeline processing method according to claim 1 or 2, characterized in that, upon execution of a subsequent instruction that does not depend on the execution result of the preceding instruction or on the preceding instruction itself, the predetermined stage after replacement is returned to its position before replacement.