JPH05313893A

JPH05313893A - Arithmetic bypassing circuit

Info

Publication number: JPH05313893A
Application number: JP11505992A
Authority: JP
Inventors: Tatsuki Nakada; 達己中田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1992-05-08
Filing date: 1992-05-08
Publication date: 1993-11-26

Abstract

PURPOSE: To bypass operation results at high speed when the processing time in the operation stage is short, for example when executing an operation instruction with '0', in an arithmetic bypassing circuit of a pipeline processing computer provided with a function of guaranteeing data dependency relations. CONSTITUTION: The circuit is provided with means for suppressing the D-stage interlock and enabling the E-stage interlock when a data dependency exists between a preceding instruction and a succeeding instruction and the succeeding instruction is detected to be an operation instruction with '0', means for detecting the timing for bypassing the data of the preceding instruction, and a circuit for detecting '0' in the load data of the preceding instruction. The operation instruction with '0' is admitted to the D stage and interlocked at the following E stage; the load data of the preceding instruction is bypassed just before the W stage under the bypass control signal of the bypass timing detection means; and the condition code of the operation instruction with '0' is then set in the condition code register (CC) on the basis of the decision signal of the '0' detection circuit.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an operation bypass circuit in a pipeline processing computer having a function of guaranteeing data dependency relations.

[0002] Cache memory is used to speed up computer processing, and in particular to raise the average speed of memory access. FIG. 9 shows the basic configuration of a cache memory. One of the choices made when configuring a cache memory (CACHE) is the number of ways (WAY). This is the number of cache memory entries that may hold (cache) the data for any given address. Two-way set-associative, four-way set-associative, fully associative, and direct-mapped organizations are commonly adopted.

[0003] In general, for cache memories of the same capacity, increasing the number of ways (WAY) is known to improve the hit rate and therefore performance. However, as the number of ways grows, a read of the cache memory must select the target data from a larger number of elements that may hold it, which increases both the hardware and the delay time.

[0004] One configuration of a cache memory, and its processing, is now described with reference to FIG. 9. The cache memory 15 consists of a data memory 150 that holds data, a tag memory 151 used for hit determination, a determination circuit 152 that performs the hit determination (decides which way (WAY) to select), and a way selection circuit 153 that selects the data (WAY select) according to the result of the hit determination. If the target data is present in any of the ways (WAY), the cached data is read out together with a hit determination signal.
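The read path described above can be sketched in software. The following is a minimal illustration, not the circuit of FIG. 9: the class and method names, the word-granular addressing (no block offset), and the index/tag split by modulo are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Line:
    valid: bool = False
    tag: int = 0
    data: int = 0


class SetAssociativeCache:
    """Toy model: one Line per (way, set); names are illustrative only."""

    def __init__(self, num_sets: int, num_ways: int) -> None:
        self.num_sets = num_sets
        self.ways: List[List[Line]] = [
            [Line() for _ in range(num_sets)] for _ in range(num_ways)
        ]

    def _split(self, address: int) -> Tuple[int, int]:
        # Assumed address split: low part selects the set, the rest is the tag.
        return address % self.num_sets, address // self.num_sets

    def fill(self, way: int, address: int, data: int) -> None:
        index, tag = self._split(address)
        self.ways[way][index] = Line(True, tag, data)

    def read(self, address: int) -> Tuple[bool, Optional[int]]:
        index, tag = self._split(address)
        # Data memory 150 / tag memory 151: read the indexed entry of every way.
        candidates = [way[index] for way in self.ways]
        # Determination circuit 152: compare each stored tag with the address tag.
        hits = [line.valid and line.tag == tag for line in candidates]
        # Way selection circuit 153: pass through the data of the hitting way.
        for hit, line in zip(hits, candidates):
            if hit:
                return True, line.data
        return False, None


cache = SetAssociativeCache(num_sets=4, num_ways=2)
cache.fill(way=1, address=0x13, data=99)
```

Every way is read before the hit is known, which is why the tag lookup, hit determination, and way selection sit on the critical path and why more ways mean more candidates to choose from.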

[0005] If the configuration in this figure were made fully associative as it stands, the number of outputs from the data memory 150 and the tag memory 151 would be expected to become very large. For this reason, when the number of ways (WAY) is large, as in the fully associative case, an associative memory is generally used.

[0006] As this figure also shows, the path in the cache memory 15 that generally requires the longest propagation time runs from the address holding circuit 155, through the lookup of the tag memory 151, the hit (WAY) determination, and the way (WAY) selection, to the data output.

[0007] FIGS. 10 and 11 illustrate the concept of a pipeline computer that includes a cache memory. The general-purpose register (GR) 11 is drawn twice in these figures, but there is in fact only one. The general-purpose register (GR) 11 is used to read source operands in the decode stage (D) and to write operation results in the write stage (W).

[0008] In this figure, the processing of the cache memory 15 is performed in a single stage (C). The times required by the D, E, C, and W stages shown in the figure are not necessarily equal.

[0009] The cycle time of the pipeline is determined by the stage that takes the longest. Conventionally, the C stage has required a long time and has been the factor that determines the pipeline cycle time.

[0010] Furthermore, because logic circuits have become faster in recent years, the C stage is sometimes divided in two to shorten the pipeline cycle time. FIG. 11 shows the processing of the cache memory 15 of FIG. 10 divided into two stages (C1, C2). In the C1 stage, the data memory (DATA) 150 and the tag memory (TAG) 151 are accessed; in the C2 stage, the hit (HIT) determination and the way (WAY) selection (way-sel) are performed.
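The effect of splitting the C stage can be shown with a small calculation. The stage delays below are hypothetical values chosen only for illustration; the source gives no figures.

```python
def cycle_time(stage_delays: dict) -> float:
    """The pipeline cycle time is set by the slowest stage."""
    return max(stage_delays.values())


# Hypothetical delays in arbitrary units (not from the source).
single_c = {"D": 1.0, "E": 1.0, "C": 1.8, "W": 1.0}
split_c = {"D": 1.0, "E": 1.0, "C1": 0.9, "C2": 0.9, "W": 1.0}

before = cycle_time(single_c)  # limited by the C stage
after = cycle_time(split_c)    # C1 and C2 no longer dominate
```

With these assumed numbers the cycle time falls from 1.8 to 1.0 units, at the cost of one extra stage between a load and any instruction that consumes its data.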

[0011] FIG. 12 shows operation bypass in a conventional pipeline computer. Specifically, it shows how operation bypass works in a pipeline computer with the stages D, E, C1, C2, and W of FIG. 11 when data read from the cache memory 15 is used by a subsequent operation instruction.

[0012] Even with such operation bypass technology, when there is a data dependency, that is, register interference, between a preceding instruction and a subsequent instruction, the interlock means 20, 21 shown in the figure interlock the subsequent instruction at the D stage, and it must wait 2τ until the data becomes available.

[0013] In such a pipeline computer, the data dependency stalls the pipeline and lowers performance. In recent years, techniques for executing instructions in parallel, typified by superscalar systems and very long instruction word (VLIW) systems, have been put into practical use to speed up processing.

[0014] In a superscalar system, the instructions in the main storage unit (MSU) are arranged one at a time, as in an ordinary computer, but when they are fetched and executed they are scheduled two at a time and executed in parallel. Inside the computer system, the instruction registers, decoders, arithmetic units (ALU), and so on are implemented in parallel to make this parallel execution possible, but only one cache memory 15 is provided, so a load instruction, for example, is scheduled as a single instruction.

[0015] In a very long instruction word (VLIW) system, instructions (operations) are scheduled, for example, two at a time when a source program written in a high-level language is compiled, and are then executed in parallel. In this case too, load instructions and the like are scheduled as single instructions.

[0016] Executing instructions in parallel in this way brings the execution timings of a data-dependent preceding instruction and subsequent instruction closer together, so the share of performance lost to pipeline stalls caused by data dependencies is larger than ever. Operation bypass processing in the presence of a data dependency therefore calls for a circuit that can bypass effectively.

[0017]

[Prior Art] FIG. 12, mentioned above, shows operation bypass in a conventional pipeline computer, and FIG. 13 illustrates the problems of a conventional operation bypass circuit. FIG. 13(a) shows, in a pseudo-assembly language, the subroutine of the string copy function that is commonly used in source programs written in C, and FIG. 13(b) shows how that string copy function executes.

[0018] In the pseudo-assembly language of FIG. 13(a), operations written side by side are executed in parallel, at the same time. FIG. 13(b) shows the execution of the string copy function. The portion marked by the first broken line is where the pipeline is interlocked by the conventional interlock mechanism because a data dependency, that is, so-called register interference, exists between a preceding instruction and a subsequent instruction, as explained for FIG. 12. In this example, the compare instruction (CMP) waits for the data that the preceding load (LD) instruction reads from the cache memory 15, which produces a wait of 2τ.

[0019] The portion marked by the second broken line is an interlock that arises when the branch instruction (BNE) is executed and the branch target is fetched. The branch instruction (BNE) must refer to the condition code that the preceding compare instruction (CMP) sets in the condition register (CC) at its W stage before the instruction at the branch target address, generated in the E stage of the branch instruction (BNE), can be fetched.

[0020]

[Problems to Be Solved by the Invention] This wait can be eliminated, for example, by prefetching with a known branch prediction mechanism. In that case, the loop portion of the string copy function executes in 5τ instead of the 6τ shown in FIG. 13. If, in addition, there were no wait for reading the cache memory 15, the loop portion of the string copy function could execute in 3τ.

[0021] In other words, in the loop portion of the string copy function, waiting for the cache memory read causes a performance loss of up to 40% (2τ of a 5τ loop). The conventional operation bypass circuit therefore has the problem that, when data read from the cache memory is used as an operand, a wait occurs and the hardware cannot deliver its full performance.
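The 40% figure can be reproduced from the cycle counts quoted above. This is one reading of the arithmetic (time spent waiting as a fraction of the 5τ loop); the function name is illustrative.

```python
from fractions import Fraction


def wait_fraction(actual_cycles: int, ideal_cycles: int) -> Fraction:
    """Fraction of each loop iteration spent waiting rather than working."""
    return Fraction(actual_cycles - ideal_cycles, actual_cycles)


loop_with_branch_prediction = 5  # tau per iteration, from the text
loop_without_cache_wait = 3      # tau per iteration, from the text

loss = wait_fraction(loop_with_branch_prediction, loop_without_cache_wait)
```

Here `loss` is 2/5, that is, 40% of each iteration is spent waiting for the cache read.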

[0022] In view of the above drawbacks of the prior art, the present invention concerns a pipeline processing computer having a function of guaranteeing data dependency relations, in which the cache memory read stage (C1) and the cache hit determination stage (C2) are separate. One aim is to improve the performance of such a computer by aggressive bypassing: when a subsequent instruction uses data read from the cache memory immediately, the read data is sent to that instruction at an earlier timing. The invention also notes that the processing performed in the operation stage includes both ordinary operations, which take time, and operations with "0", such as adding "0" or comparing with "0", whose operation time can be shortened by providing a dedicated operation circuit. When the operation time can be shortened in this way, a result can still be produced within the same operation cycle even if the input data arrives late. The object of the invention is therefore to provide an operation bypass circuit that can bypass the result of a preceding instruction at high speed when executing an operation instruction with "0", whose operation time is short.

[0023]

[Means for Solving the Problems] FIG. 1 is a diagram of the principle of the present invention. The above problems are solved by an operation bypass circuit configured as follows.

[0024] The invention applies to a pipeline processing computer that comprises at least a decode stage (D), an operation stage (E), operand fetch stages (C1, C2), and a write stage (W), and that is provided with means 20 to 22 for detecting a data dependency between a preceding instruction and a subsequent instruction and interlocking the D stage or the E stage, operation bypass means for bypassing operation data to the subsequent instruction at the timing of the preceding instruction's register write or immediately before that register write, and a condition code generation circuit 32 that generates a condition code from an operation result. In such a computer the following are provided: means 40 for decoding an instruction and detecting that it is an operation-with-"0" instruction; interlock control means 20a, 22a that, when an operation-with-"0" instruction is detected, suppress the D-stage interlock in the D-stage interlock means 20 and enable the E-stage interlock in the E-stage interlock means 22; means 23 for detecting the timing at which to bypass the data of the preceding instruction; a "0" detection circuit 50 for the load data of the preceding instruction; a "condition code generating circuit" 31 that generates a condition code from the "0" determination signal of the "0" detection circuit 50 and the sign bit of the load data; and a "condition code selection circuit" 30 that receives the condition code generated by the condition code generation circuit 32 in an ordinary operation and the condition code generated by the "condition code generating circuit" 31 and selects one of them. When the instruction being executed is an operation-with-"0" instruction and a data dependency with the preceding instruction is detected, the interlock control means 20a, 22a suppress the D-stage interlock for that instruction and interlock it at the following E stage. The load data of the preceding instruction is bypassed, immediately before the W stage, to the operation stage (E) of the operation-with-"0" instruction on the basis of the bypass control signal of the bypass timing detection means 23. Then, without performing the operation with "0", the determination signal of the "0" detection circuit (50) is selected by the "condition code selection circuit" 31, and the condition code register (CC) is set with the condition code of the result of the operation-with-"0" instruction.

[0025]

[Operation] That is, the present invention notes that a comparison with "0", for example, can be performed at high speed by providing a dedicated "0" detection circuit. After the way (WAY) has been selected in the cache memory read, the "0" comparison is performed by this dedicated "0" detection circuit, without using the arithmetic unit (ALU) used by ordinary compare instructions, and the condition code of the compare instruction is generated. This reduces the subsequent wait.

[0026] The present invention also notes that interlocking at the D stage delays the entry of subsequent instructions into the pipeline and therefore delays the start of their processing. Interlocks caused by data dependencies are accordingly applied at as late a stage as possible, for example the E stage. For a subsequent "0" compare instruction, the D-stage interlock that a conventional pipeline computer would apply is suppressed, and the interlock is applied at the E stage instead. This starts the processing of the "0" compare instruction earlier and also lets the instructions that follow it enter the pipeline earlier.

[0027] An outline of the processing in each stage follows. First, the instruction decode stage (D): the instruction is decoded, it is detected to be a compare-with-"0" instruction, and this result is placed in a pipeline tag that is not shown in the figure (see FIG. 11).

[0028] In many computers, general-purpose register 0 is a zero register (a register whose read data is always "0" regardless of the value written to it), and a comparison with "0" is performed by comparing with register 0.

[0029] Compare instructions, too, are often implemented as a subtract instruction whose result destination is register 0. The instruction decoder can therefore detect a compare-with-"0" instruction merely by decoding the instruction and finding, for example, that operand 2 is all "0" as described above, without inspecting the operand value read from the register. The detection result is held in the pipeline tag and used in the appropriate stage.

[0030] Operation stage (E): From the tag that was set in the D stage and is flowing through the pipeline tags, it is detected that the data being processed in the hit determination / way (WAY) selection stage (C1) of the preceding load (LD) instruction is the data to be compared with "0". In that case, the way-selected data is input to the dedicated "0" detection circuit of the present invention and tested for "0". If it is "0", a predetermined fixed pattern (condition code) is set as the flag in the condition code (CC) flag (Z=0); if it is not "0", a predetermined fixed pattern (condition code) is set as the flag in the condition code (CC) flag (Z=1).

[0031] Write stage (W): The flag obtained in the operation stage (E) is written to the condition code register (CC). As noted above, comparison with "0" is used very frequently in programs written in C to detect the end of a character string, so speeding up the "0" detection instruction as described above contributes greatly to speeding up such string processing on the pipeline computer.

[0032]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1, mentioned above, is a diagram of the principle of the present invention. FIGS. 2 to 5 show one embodiment of the present invention: FIGS. 2(a1) and 2(a2) show an example format of the "0" detection instruction; FIGS. 3 and 4 show an example configuration of the "data dependency detection means", which detects a data dependency between a preceding instruction and a subsequent instruction and generates an interlock signal; FIG. 5(a) shows an example format of the condition flags; and FIG. 5(b) shows an example configuration for generating the condition flags of the "0" detection instruction. FIG. 6 illustrates the effect of the present invention: FIG. 6(a) shows conventional pipeline execution, and FIG. 6(b) shows pipeline execution with the operation pipeline circuit of the present invention. FIGS. 7 and 8 show another embodiment of the present invention: FIG. 7 shows an example configuration of a scoreboard, and FIG. 8 shows the concept of operation bypass when a scoreboard is used.

[0033] The present invention is an operation bypass circuit in a pipeline processing computer having a function of guaranteeing data dependency relations. The pipeline comprises at least a decode stage (D), an operation stage (E), operand fetch stages (C1, C2), and a write stage (W), and includes means by which the data dependency detection means 20 to 22 detect a "data dependency" (register interference) between a preceding instruction and a subsequent instruction and interlock the D stage or the E stage. The means needed to implement the present invention are: means 20a to 22a that, when a compare-with-"0" instruction is detected, suppress the D-stage interlock and enable the E-stage interlock 22; a "0" detection circuit 50 for the load data of the preceding instruction; and means that admit the operation instruction with "0" to the D stage, interlock it at the following E stage, bypass the load data of the preceding instruction to the E stage of the subsequent instruction immediately before the W stage, and then, without performing the operation with "0", set the condition code of the result of the operation with "0" on the basis of the determination signal of the "0" detection circuit 50. The same reference numerals denote the same objects throughout the drawings.

[0034] The configuration and operation of the operation bypass circuit of the present invention are described below with FIGS. 2 to 8, referring also to FIG. 1. First, the instruction format of the pipeline computer relevant to the present invention is, for example, the format shown in FIG. 2(a1), and the "0" compare instruction relevant to the present invention has the bit layout shown in FIG. 2(a2). As noted above, the "0" compare instruction is a subtract instruction (SUB, SUBi) whose read operand 2 is "0" and whose write register number is "0". As is clear from the bit format, the D-stage decoder (DEC) therefore only needs to decode that bits 0 to 16 are "0100 0000 0000 0000 0" or "0100 0000 0000 0000 1" and that bits 22 to 31 are "00 0000 0000".
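The decode condition above amounts to a bit-pattern match. The sketch below assumes a 32-bit instruction word with bit 0 as the most significant bit, which is how the bit strings in the text read left to right; the function names are hypothetical, and the meaning of individual fields is not assumed beyond what the text states.

```python
def field(word: int, first_bit: int, last_bit: int) -> int:
    """Extract bits first_bit..last_bit, with bit 0 taken as the MSB of a 32-bit word."""
    width = last_bit - first_bit + 1
    shift = 31 - last_bit
    return (word >> shift) & ((1 << width) - 1)


def is_compare_with_zero(word: int) -> bool:
    # Bits 0-16 must be "0100 0000 0000 0000 0" or "... 1": bit 16 may take
    # either value, so only bits 0-15 are checked.
    prefix_ok = field(word, 0, 15) == 0b0100_0000_0000_0000
    # Bits 22-31 must be all zero.
    operand2_ok = field(word, 22, 31) == 0
    return prefix_ok and operand2_ok


# Bits 17-21 are not constrained by the text; 5 is an arbitrary value there.
example = (0x4000 << 16) | (5 << 10)
```

Because the match uses only instruction bits, the decoder can flag the instruction in the D stage without waiting for any register value.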

[0035] Next, FIG. 3 shows an example configuration of the "data dependency detection" circuit of the present invention. As FIG. 3 shows, the basic configuration of this circuit uses the register number information in the pipeline tags of the E stage or C1 stage of the preceding instruction. When the write register number of the preceding instruction (E-WR-REG-ID, C1-WR-REG-ID) equals a read register number of the subsequent instruction in the D stage (D-RD-REG-ID1,2) and that read register is in use (D-RD-REG-1-USED, D-RD-REG-2-USED), the subsequent instruction is interlocked at the D stage.

[0036] Specifically, in addition to the above conditions, the E stage or C1 stage in the pipeline tag must be valid (E-VALID, C1-VALID) and the preceding instruction must be a load-type instruction (E-LD-OP, C1-LD-OP).

[0037] Furthermore, in the present invention, when the subsequent instruction in the D stage is the aforementioned "0" compare instruction (D-CMP0-OP), this D-stage interlock is suppressed, the "0" compare instruction (D-CMP0-OP) is admitted to the pipeline, and the interlock is applied instead at, for example, the E stage.
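The D-stage interlock condition and its suppression can be written as a Boolean function. This is a sketch of the logic as described in the text, not of the circuit in FIG. 3; the Python names mirror the signal names (E-VALID, E-LD-OP, E-WR-REG-ID, D-RD-REG-ID1,2, D-RD-REG-n-USED, D-CMP0-OP), and the `StageTag` container is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StageTag:
    """Pipeline tag fields of a preceding instruction in the E or C1 stage."""
    valid: bool
    ld_op: bool
    wr_reg_id: int


def d_stage_interlock(
    e: StageTag,
    c1: StageTag,
    d_rd_reg_ids: Tuple[int, int],
    d_rd_reg_used: Tuple[bool, bool],
    d_cmp0_op: bool,
) -> bool:
    def hazard(stage: StageTag) -> bool:
        # A valid load in this stage writes a register that the D-stage
        # instruction actually reads.
        return stage.valid and stage.ld_op and any(
            used and reg == stage.wr_reg_id
            for reg, used in zip(d_rd_reg_ids, d_rd_reg_used)
        )

    # A "0" compare instruction is not held in D; it is held later, in E.
    return (hazard(e) or hazard(c1)) and not d_cmp0_op


load_in_e = StageTag(valid=True, ld_op=True, wr_reg_id=3)
idle_c1 = StageTag(valid=False, ld_op=False, wr_reg_id=0)
```

With a load to register 3 in the E stage, an ordinary instruction reading register 3 is held in D, while a "0" compare reading the same register passes through.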

[0038] For this purpose, in addition to the above interlock mechanism, the present invention provides a similar interlock circuit between the E stage of the subsequent instruction and the C1 stage of the preceding instruction, as shown in FIG. 4(a).

[0039] As shown in the figure, the interlock condition of this interlock circuit requires that the write register number of the preceding instruction in the C1 stage (C1-WR-REG-ID) equal the read register number of the subsequent instruction in the E stage (E-RD-REG-ID1), that the preceding instruction in the C1 stage be valid (C1-VALID), and that it be a load-type instruction (C1-LD-OP). In addition, in the present invention, the interlock is applied when the subsequent instruction is the "0" compare instruction (E-CMP0-OP).

[0040] With such an interlock mechanism, as shown in FIG. 1, the "0" compare instruction is admitted to the pipeline at the D stage and interlocked at the E stage. When the preceding instruction, for example a load instruction, enters the C2 stage, the interlock is released and the operation bypass of the present invention is performed.

[0041] Next, the method of generating the condition code (CC) for the result of executing a "0" compare instruction is described. As shown in FIG. 1, the present invention handles the case in which the preceding instruction in the C2 stage is, for example, a load (LD) instruction, the subsequent instruction executing in the E stage is the "0" compare instruction, and the write register number of the load (LD) instruction in the C2 stage equals the read register number of the "0" compare instruction in the E stage. In that case, the data that the load (LD) instruction reads from the cache memory 15 (in FIG. 1, the output of the way (WAY) selection circuit (way-sel)) is bypassed into the dedicated "0" detection circuit 50. A flag is created from the determination result and the sign bit of the read data and is set in the flag register (CC).

【００４２】The bypass condition is therefore that the subsequent instruction executing in the E stage is the "compare with 0" instruction, that the preceding instruction in the C2 stage is a load (LD) instruction, and that the write register number of the load (LD) instruction in the C2 stage equals the read register number of the "compare with 0" instruction in the E stage.

【００４３】FIG. 4(b) shows an embodiment of this bypass control circuit (bypass timing detection means) 23. As shown in the figure, the bypass condition of the bypass control circuit 23 requires that the write register number of the preceding instruction in the C2 stage (C2-WR-REG-ID) equal the read register number of the subsequent instruction in the E stage (E-RD-REG-ID), that the preceding instruction in the C2 stage be valid (C2-VALID), and that the preceding instruction be a load-type instruction (C2-LD-OP). In the present invention, the circuit is further configured to assert the bypass signal (E-CMP0-BYPASS) when the subsequent instruction is the "compare with 0" instruction (E-CMP0-OP).
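To see how the C1-stage interlock and the C2-stage bypass hand over from one cycle to the next, the two conditions can be traced while the load moves from C1 to C2 and the "compare with 0" waits in the E stage. This is a minimal sketch under assumptions: the function names and the two-cycle trace are hypothetical, and only the signal names come from the text.

```python
def interlock(c1_valid, c1_ld_op, c1_wr_reg_id, e_cmp0_op, e_rd_reg_id):
    """E-stage interlock condition of FIG. 4(a) (hypothetical model)."""
    return c1_valid and c1_ld_op and e_cmp0_op and c1_wr_reg_id == e_rd_reg_id


def bypass(c2_valid, c2_ld_op, c2_wr_reg_id, e_cmp0_op, e_rd_reg_id):
    """E-CMP0-BYPASS condition of FIG. 4(b) (hypothetical model)."""
    return c2_valid and c2_ld_op and e_cmp0_op and c2_wr_reg_id == e_rd_reg_id


def trace(load_wr_reg, cmp_rd_reg):
    """Follow one load through C1 then C2 while a "compare with 0" sits in E.

    Returns one (load_stage, interlock, bypass) tuple per cycle.
    """
    events = []
    for stage in ("C1", "C2"):
        il = interlock(stage == "C1", True, load_wr_reg, True, cmp_rd_reg)
        bp = bypass(stage == "C2", True, load_wr_reg, True, cmp_rd_reg)
        events.append((stage, il, bp))
    return events
```

When both instructions name the same register, the trace shows the interlock active while the load is in C1 and the bypass active in the following cycle; with different registers neither signal is raised.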

【００４４】FIG. 5(a) shows an example of the format of the condition flags. The condition flags consist of 3 bits and, for a compare instruction, are defined as follows. In FIG. 5(a):
Z flag: "1" when the two operands (OP1, OP2) are equal, and "0" otherwise.

【００４５】N flag: set to the sign (most significant) bit of the subtraction result.
C flag: "1" when operand 1 (OP1) is smaller than operand 2 (OP2) in a comparison as unsigned integers, and "0" otherwise.
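The three flag definitions above can be written out as a small model. This is a sketch only: the 32-bit word width and the function name `compare_flags` are assumptions made for illustration and are not stated in the text.

```python
WIDTH = 32                    # assumed word width, not given in the text
MASK = (1 << WIDTH) - 1


def compare_flags(op1: int, op2: int):
    """Return (Z, N, C) for a compare of op1 against op2."""
    op1 &= MASK
    op2 &= MASK
    diff = (op1 - op2) & MASK          # result of the subtraction, wrapped to WIDTH bits
    z = int(op1 == op2)                # Z: operands are equal
    n = (diff >> (WIDTH - 1)) & 1      # N: sign (most significant) bit of the result
    c = int(op1 < op2)                 # C: unsigned op1 is smaller than op2
    return z, n, c
```

Under these assumptions, comparing 5 with 5 gives (1, 0, 0), comparing 3 with 5 gives (0, 1, 1), and comparing 5 with 3 gives (0, 0, 0).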

【００４６】Therefore, when operand 2 (OP2) is "0", as in the "compare with 0" instruction relevant to the present invention, the generation of the condition flags simplifies as follows.
Z flag: "1" when operand 1 (OP1) is "0", and "0" otherwise.
N flag: set to the sign (most significant, MSB) bit of operand 1 (OP1).

【００４７】C flag: always "0".
The "0" detection circuit 50 for the "compare with 0" instruction is therefore as shown in FIG. 5(b). However, since the flag register (CC) also receives the condition code of ordinary operation results, the selector (SEL) 30 must select between the value from the "0" detection circuit 50 of the present invention and the flag value from the arithmetic circuit, as shown in FIG. 5(b), using the bypass control signal (E-CMP0-BYPASS), which is the output of the bypass control circuit 23 shown in FIG. 4(b).
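The simplified flag generation and the selector in front of the flag register can be modeled in a few lines. This is an illustrative sketch: the 32-bit width and the names `zero_detect_flags` and `select_cc` are hypothetical, while the behavior follows the Z, N, and C rules for OP2 = "0" and the selection by E-CMP0-BYPASS described in the text.

```python
WIDTH = 32                    # assumed word width, not given in the text
MASK = (1 << WIDTH) - 1


def zero_detect_flags(load_data: int):
    """Model of the "0" detection circuit 50: flags taken directly from the load data."""
    data = load_data & MASK
    z = int(data == 0)                 # Z: the load data is zero
    n = (data >> (WIDTH - 1)) & 1      # N: sign bit of the load data itself
    c = 0                              # C: no unsigned value is smaller than 0
    return z, n, c


def select_cc(alu_flags, zero_flags, e_cmp0_bypass: bool):
    """Model of selector (SEL) 30: choose the value written to the flag register (CC)."""
    return zero_flags if e_cmp0_bypass else alu_flags
```

No subtraction is needed on this path, which is why the flags can be produced in the same cycle in which the load data leaves the cache.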

【００４８】FIG. 6 illustrates the effect of the present invention. FIG. 6(a) shows conventional pipeline execution, and FIG. 6(b) shows pipeline execution with the operation pipeline circuit of the present invention.

【００４９】As noted earlier, FIG. 6 shows operation time charts for the case in which the parallel processing commonly performed in pipeline computers is already employed. The program shown in the figure is the string copy function mentioned earlier, which is frequently used in programs written in the C language.

【００５０】For comparison, FIG. 6(a) shows the case of the conventional operation bypass circuit. The subsequent "compare with 0" instruction (SUBi instruction) waits for the load data of the preceding load (LD) instruction, interlocking at the D stage until the load (LD) instruction reaches the C2 stage, so each loop iteration requires at least 5 cycles.

【００５１】With the present invention, by contrast, the flags of the subsequent "compare with 0" instruction (SUBi instruction) can be generated while the preceding load (LD) instruction is in the C2 stage, so each loop iteration takes 4 cycles. When the string handled by the string copy function is long, this amounts to a speedup of about 20%.
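The 20% figure follows from the per-iteration cycle counts of 5 and 4. The short calculation below is a sketch with hypothetical helper names; it assumes that loop iterations dominate the run time for a long string and ignores any setup cost.

```python
def cycle_reduction(old_cycles: int, new_cycles: int) -> float:
    """Fraction of cycles saved per loop iteration."""
    return (old_cycles - new_cycles) / old_cycles


def throughput_gain(old_cycles: int, new_cycles: int) -> float:
    """Relative increase in iterations completed per cycle."""
    return old_cycles / new_cycles - 1.0


saved = cycle_reduction(5, 4)     # 0.2: one cycle in five is removed
gain = throughput_gain(5, 4)      # 0.25: the same change seen as iterations per cycle
```

Read as a reduction in execution time, going from 5 to 4 cycles is the 20% stated in the text; expressed as throughput, the same change is a 25% increase.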

【００５２】The embodiment above describes speeding up the subsequent "compare with 0" instruction only with respect to a preceding load (LD) instruction, but the invention is not limited to this. Clearly, the subsequent "compare with 0" instruction can be sped up in the same way for any instruction (call it instruction X) that, like the load (LD) instruction, outputs its data in, for example, the fourth stage and whose operation in that final stage has a small delay relative to the machine cycle.

【００５３】In particular, when instruction X uses the same register write port (Y) as the load (LD) instruction, it suffices to compare the C2-stage write register number for register port (Y) with the E-stage read register number of the subsequent instruction. By decoding instruction X, selecting the result of instruction X with the dedicated detection circuit, and selecting the sign bit (most significant bit) of that result, the condition code for instruction X can be set in the flag register (CC), and the subsequent instruction can be sped up.

【００５４】In general, instruction X does not necessarily execute in the same number of cycles as the load (LD) instruction. In the simplest extension of the discussion above, for the subsequent instruction to bypass the operation result at the Cn stage, the subsequent instruction must remain interlocked in the E stage until the preceding instruction reaches the Cn-1 stage. That is, the E-stage read register number of the subsequent instruction is compared, using the pipeline tag information, with the write register numbers of all stages of the preceding instructions from the C1 stage through the Cn-1 stage (n ≥ 2).

【００５５】The condition "E-CMP0-BYPASS" that controls the bypass of the operation result can be obtained by comparing the E-stage read register number of the "compare with 0" instruction with the write register number tag of the preceding instruction in the Cn-1 stage.
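The generalized interlock compares the E-stage read register number against every pipeline tag from C1 through Cn-1, so it needs one comparator per stage. A minimal sketch, assuming each tag is a hypothetical `(valid, wr_reg_id)` pair and using made-up function names:

```python
def e_stage_interlock(e_cmp0_op: bool, e_rd_reg_id: int, stage_tags) -> bool:
    """Hold the E stage while any of C1..C(n-1) will write the register being read.

    stage_tags is a list of (valid, wr_reg_id) pairs, one per stage C1..C(n-1).
    """
    if not e_cmp0_op:
        return False
    return any(valid and wr_reg_id == e_rd_reg_id for valid, wr_reg_id in stage_tags)


def comparator_count(n: int) -> int:
    """Register-number comparators needed for stages C1..C(n-1)."""
    return n - 1
```

With n = 2 this reduces to the single C1 comparison of the load case; the comparator count grows linearly with n.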

【００５６】With this method, however, the number of comparison circuits grows as n increases, which becomes impractical. The present invention therefore considers solving this problem with the well-known scoreboard technique.

【００５７】FIG. 7 illustrates the concept of the scoreboard. A scoreboard is a technique for efficiently managing resources such as registers and arithmetic units. An example in which the scoreboard technique is used to make a subsequent instruction wait for the completion of a preceding instruction's register write (that is, a register write interlock) is described below with reference to FIGS. 7 and 8.

【００５８】The example of FIG. 7 shows a scoreboard configuration for 16 registers. As shown, a memory circuit 60 of at least 1 bit is provided for each of the 16 registers.

【００５９】When an instruction that writes to a register, such as the load (LD) instruction described above, is decoded, the memory circuit 60i corresponding to the number of the register to be written (call it i) is set.

【００６０】When a subsequent instruction reads the j-th register, it reads the scoreboard memory circuit 60j corresponding to j. If the memory circuit 60j is set, the register is known to be awaiting a write by a preceding instruction, so the subsequent instruction interlocks, for example at the E stage, and waits for that write to complete.

【００６１】If the memory circuit 60j is reset when read, the register is known not to be awaiting a write by a preceding instruction, and the subsequent instruction proceeds to the next stage without interlocking.

【００６２】The preceding instruction, for example a load (LD) instruction, resets the i-th memory circuit 60i of the scoreboard when it writes to the i-th register. Therefore, if a subsequent instruction is interlocked waiting to read the i-th register, the interlock is released when the memory circuit 60i is reset, and the subsequent instruction resumes processing using the data written by the preceding instruction.
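The set, test, and reset protocol of the conventional scoreboard can be summarized in a small model. This is a sketch only: the class and method names are hypothetical, and one bit per register stands in for the memory circuits 60.

```python
class Scoreboard:
    """One pending-write bit per register (models memory circuits 60)."""

    def __init__(self, num_regs: int = 16):
        self.bits = [False] * num_regs

    def on_decode_write(self, i: int) -> None:
        """A register-writing instruction (e.g. LD) targeting register i is decoded."""
        self.bits[i] = True

    def must_interlock(self, j: int) -> bool:
        """A subsequent instruction wants to read register j."""
        return self.bits[j]

    def on_register_write(self, i: int) -> None:
        """The preceding instruction writes register i; the waiting reader is released."""
        self.bits[i] = False


sb = Scoreboard()
sb.on_decode_write(3)                 # LD targeting register 3 is decoded
waiting = sb.must_interlock(3)        # reader of register 3 must wait
free = sb.must_interlock(4)           # reader of register 4 proceeds
sb.on_register_write(3)               # LD completes its write
released = not sb.must_interlock(3)   # interlock on register 3 is released
```

The design trades the per-stage register-number comparators for a single bit lookup indexed by the register number, so its cost no longer depends on the pipeline depth n.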

【００６３】The description above covers interlock control with a conventional scoreboard. In this conventional method, however, the interlock is released only after the preceding instruction has actually written to the register and the corresponding scoreboard memory circuit 60i has been reset. Compared with using the dedicated "0" detection circuit 50 described earlier, the release of the interlock is delayed by 1τ (see FIGS. 1 and 8). In the present invention, therefore, as shown schematically in FIG. 8 for scoreboard-based interlock control, the operation result of the preceding instruction is bypassed to the E stage of the subsequent instruction when the "compare with 0" instruction is in the E stage, the scoreboard memory circuit 60i provided for the register is "1", and at least the write enable (WE) for the register is "1", that is, in the cycle before the scoreboard memory circuit 60i is reset to "0". This configuration speeds up the "compare with 0" instruction to the same degree as the dedicated "0" detection circuit 50.
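The 1τ difference between waiting for the scoreboard bit to be reset and bypassing on write enable can be illustrated with a toy cycle count. The timing model here is an assumption made for illustration: the bit stays set up to and including the cycle in which WE is "1" and reads as reset from the next cycle. The function names are hypothetical.

```python
def scoreboard_bypass(e_cmp0_op: bool, scoreboard_bit: bool, write_enable: bool) -> bool:
    """Bypass condition: "compare with 0" in E, bit still "1", and WE is "1"."""
    return e_cmp0_op and scoreboard_bit and write_enable


def release_cycle(we_cycle: int, use_bypass: bool) -> int:
    """Cycle in which a waiting "compare with 0" can obtain its operand.

    Assumed timing: the scoreboard bit is "1" through we_cycle and "0" afterwards.
    """
    cycle = 0
    while True:
        bit = cycle <= we_cycle
        we = cycle == we_cycle
        if use_bypass and scoreboard_bypass(True, bit, we):
            return cycle          # operand bypassed in the write-enable cycle
        if not bit:
            return cycle          # conventional release after the bit is reset
        cycle += 1
```

Under this model, with WE asserted in cycle 4, the conventional scheme releases in cycle 5 and the bypassed scheme in cycle 4, which is the one-cycle gain the paragraph describes.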

【００６４】The embodiment above uses the "compare with 0" instruction as an example. Needless to say, operations such as "addition with 0" and "logical operation with 0" on bypass data that is the result of a preceding instruction, for example a load (LD) instruction, can likewise have their condition code generation sped up. In these cases, however, the bypass data must also be written to the register in the following W stage.

【００６５】Thus, the present invention is characterized as follows. In a pipeline computer that comprises at least a decode stage (D), an operation stage (E), operand fetch stages (C1, C2), and a write stage (W), and that has means for detecting data dependencies and interlocking the D stage or the E stage, the invention provides: means for suppressing the D-stage interlock and enabling the E-stage interlock when there is an interlock between a preceding instruction and a subsequent instruction and the subsequent instruction is detected to be an operation instruction with "0"; means for detecting the timing at which the data of the preceding instruction is bypassed; and a "0" detection circuit for the load data of the preceding instruction. The operation instruction with "0" is issued into the D stage and interlocked at the following E stage; the load data of the preceding instruction is bypassed to the E stage of the subsequent instruction immediately before the W stage, based on the bypass control signal of the bypass timing detection means; and the condition code (Z, N, C) of the result of the operation with "0" is then set in the condition code register (CC) based on the determination signal of the "0" detection circuit.

【００６６】[0066]

[Effects of the Invention] As described in detail above, the operation bypass circuit of the present invention is used in a pipeline computer that comprises at least a decode stage (D), an operation stage (E), operand fetch stages (C1, C2), and a write stage (W), and that has means for detecting data dependencies and interlocking the D stage or the E stage. The circuit provides means for suppressing the D-stage interlock and enabling the E-stage interlock when there is an interlock between a preceding instruction and a subsequent instruction and the subsequent instruction is detected to be an operation instruction with "0", means for detecting the timing at which the data of the preceding instruction is bypassed, and a "0" detection circuit for the load data of the preceding instruction. The operation instruction with "0" is issued into the D stage and interlocked at the following E stage; the load data of the preceding instruction is bypassed to the E stage of the subsequent instruction immediately before the W stage, based on the bypass control signal of the bypass timing detection means; and the condition code of the result of the operation with "0" is then set in the condition code register (CC) based on the determination signal of the "0" detection circuit and the bypass control signal of the bypass timing detection means. In programs written in the C language, for example, the "compare with 0" instruction is used very frequently to detect the end of a character string, so speeding up this instruction contributes greatly to faster processing on the pipeline computer.

[Brief description of drawings]

FIG. 1 is a diagram showing the principle of the present invention.

FIG. 2 is a diagram showing an embodiment of the present invention (Part 1).

FIG. 3 is a diagram showing an embodiment of the present invention (Part 2).

FIG. 4 is a diagram showing an embodiment of the present invention (Part 3).

FIG. 5 is a diagram showing an embodiment of the present invention (Part 4).

FIG. 6 is a diagram explaining the effect of the present invention.

FIG. 7 is a diagram showing another embodiment of the present invention (Part 1).

FIG. 8 is a diagram showing another embodiment of the present invention (Part 2).

FIG. 9 is a diagram showing the basic configuration of a cache memory.

FIG. 10 is a diagram showing the concept of a pipeline computer including a cache memory (Part 1).

FIG. 11 is a diagram showing the concept of a pipeline computer including a cache memory (Part 1).

FIG. 12 is a diagram showing operation bypass in a conventional pipeline computer.

FIG. 13 is a diagram explaining the problem with the conventional operation bypass circuit.

[Explanation of symbols]

11 register or general-purpose register (GR)
12 arithmetic unit (ALU)
15 cache memory (CACHE)
150 data memory
151 tag memory
152 hit determination circuit
153 data (WAY) selection circuit
20, 21, 22 interlock means
23 bypass control circuit (bypass timing detection means)
50 "0" detection circuit
60i scoreboard memory circuit
D decode stage
E operation stage
C, C1, C2 cache memory access stages
W write stage
preceding instruction
subsequent instruction
operation bypass path, or operation bypass determination signal
write enable signal

Claims

[Claims]

1. An operation bypass circuit in a pipeline processing computer that comprises at least a decode stage (D), an operation stage (E), operand fetch stages (C1, C2), and a write stage (W), the computer having: means (20 to 22) for detecting a data dependency between an instruction () executed earlier and an instruction () executed subsequently and interlocking the D stage or the E stage; operation bypass means () for bypassing operation data to the subsequent instruction () at the register write of the earlier instruction (), or at a timing immediately before the register write; and a condition code generation circuit (32) that generates a condition code from an operation result, the operation bypass circuit comprising: means (40) for detecting that an instruction is an operation instruction with "0"; interlock control means (20a, 22a) for, when the operation instruction with "0" is detected, suppressing the D-stage interlock in the D-stage interlock means (20) and enabling the E-stage interlock in the E-stage interlock means (22); means (23) for detecting the timing at which the data of the preceding instruction is bypassed; a "0" detection circuit (50) for the load data of the preceding instruction; a condition code generation circuit (31) that generates a condition code from the "0" determination signal () of the "0" detection circuit (50) and the sign bit of the load data; and a condition code selection circuit (30) that receives the condition code generated by the condition code generation circuit (31) and the condition code generated by the condition code generation circuit (32) and selects one of them, wherein, when the executing instruction () is an operation instruction with "0" and a data dependency with the preceding instruction () is detected, the interlock control means (20a, 22a) is notified of the operation instruction with "0" so that the D-stage interlock is suppressed and the interlock is applied at the following E stage; the load data of the preceding instruction () is bypassed, immediately before the W stage and based on the bypass control signal of the bypass timing detection means (23), to the operation stage (E) of the operation instruction () with "0"; and, without performing the operation with "0", the condition code based on the determination signal () obtained by the "0" detection circuit (50) is selected by the condition code selection circuit (30) and set in the condition code register (CC).

2. The operation bypass circuit according to claim 1, further comprising a scoreboard memory circuit (60i) as means by which the subsequent instruction detects a data dependency between the preceding instruction and the subsequent instruction, wherein, based on the write enable signal () for writing the operation result data of the preceding instruction to the register (11), the operation result data of the preceding instruction is bypassed () to the E stage of the subsequent instruction and the corresponding scoreboard memory circuit (60i) is reset.