JPH05346852A

JPH05346852A - Pipeline processing data processor

Info

Publication number: JPH05346852A
Application number: JP4153941A
Authority: JP
Inventors: Tatsuki Nakada; 達己中田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1992-06-15
Filing date: 1992-06-15
Publication date: 1993-12-27
Anticipated expiration: 2013-06-11
Also published as: JP2763450B2

Abstract

PURPOSE: To improve computer performance, in pipeline processing with a direct-mapped cache memory, by bypassing data at an earlier timing when a following instruction uses cache read data immediately. CONSTITUTION: The figure shows the bypass processing when a cache hit occurs. The device has an instruction register 10, a general-purpose register 11, an arithmetic circuit 12, and a cache memory 13. The stages are denoted D, E, C1, C2, and W; an arrow marks the point at which data is bypassed, and an arrow with a white circle indicates a cache hit. The data appears at the end of stage C1 and is bypassed at that point to the following arithmetic instruction; that is, the data is bypassed as soon as the cache data can be read. Processing therefore completes sooner than with a scheme that bypasses the data at stage C2.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a pipelined data processor that has a direct-mapped operand cache memory (hereinafter, "cache memory") so that pipeline processing is executed efficiently.

[0002]

[Prior Art] Cache memory is used to speed up processing in a data processor, and in particular to reduce the average memory access time. Below, "cache memory" is sometimes shortened further to "cache".

[0003] One of the design choices for a cache is the number of ways. This is the number of cache memory entries that could hold (cache) the data at any given address. Common choices are 2-way set-associative, 4-way set-associative, fully associative, and direct-mapped.

[0004] It is generally known that, for a cache memory of the same capacity, increasing the number of ways raises the hit rate and improves performance. However, as the number of ways grows, a cache read must select the target data from among more candidate elements, which increases both hardware and delay.

[0005] One configuration of a cache and its operation is described with reference to FIG. 8. The cache consists of a data memory 1 that holds data, a tag memory 2 used for hit determination, a hit determination circuit 3 that determines hits (decides which way to select), and a circuit 4 that selects data according to the hit determination result (way select). If any way holds the target data, the cached data is read out together with a hit signal.
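As a minimal behavioural sketch of this lookup (not the circuit of FIG. 8 itself), the Python model below treats each set as a list of ways, each holding a tag and a data word. The address split, the sizes, and all names (`NUM_SETS`, `split`, `lookup`) are assumptions made for illustration.

```python
# Behavioural model of a set-associative cache read (illustrative only).
# NUM_SETS and the [set][way] layout are assumptions, not taken from FIG. 8.
NUM_SETS = 4

def split(addr):
    """Split an address into (tag, set index)."""
    return addr // NUM_SETS, addr % NUM_SETS

def lookup(tag_mem, data_mem, addr):
    """Return (hit, data). tag_mem and data_mem are indexed [set][way]."""
    tag, index = split(addr)
    # Hit determination: compare the address tag with the tag of every way.
    matches = [way_tag == tag for way_tag in tag_mem[index]]
    if not any(matches):
        return False, None
    # Way select: only now is it known which way's data to output.
    way = matches.index(True)
    return True, data_mem[index][way]

# Two-way example: set 1 holds tags 7 and 1.
tag_mem = [[None, None], [7, 1], [None, None], [None, None]]
data_mem = [[0, 0], [0xAAAA, 0xBBBB], [0, 0], [0, 0]]
```

In this model the data output depends on the result of the tag comparison, which is the dependency that makes way selection part of the read path.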

[0006] If the configuration of FIG. 8 were used unchanged for a fully associative cache, the number of outputs from data memory 1 and tag memory 2 would become very large. For a large number of ways, as in a fully associative cache, an associative memory is therefore generally used.

[0007] As FIG. 8 shows, the path with the longest propagation time in a cache memory is generally the one that runs from the address holding circuit through the tag memory lookup, the hit (way) determination, and the way selection to the data output.

[0008] FIG. 9 shows a configuration example for direct mapping (1 way). Because there is only one cache memory entry in which a given piece of data can reside, no circuit is needed to way-select the output of the data memory. The reference numerals in FIG. 9 correspond to those in FIG. 8.
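The consequence of having a single candidate entry can be sketched in Python as follows. This is an illustrative model under assumed sizes and names (`NUM_ENTRIES`, `read_data`, `is_hit`), not the circuit of FIG. 9: the data read needs only the index, and the tag comparison merely confirms afterwards whether that data was valid.

```python
# Behavioural model of a direct-mapped cache read (illustrative only).
NUM_ENTRIES = 4  # assumed size

def read_data(data_mem, addr):
    """Data memory access: uses only the index, no tag comparison."""
    return data_mem[addr % NUM_ENTRIES]

def is_hit(tag_mem, addr):
    """Hit determination: validates the data that was already read."""
    return tag_mem[addr % NUM_ENTRIES] == addr // NUM_ENTRIES

# Entry 1 holds the line with tag 1, i.e. address 5.
tag_mem = [None, 1, None, None]
data_mem = [0, 0xBBBB, 0, 0]
```

Addresses 5 and 9 both index entry 1, so `read_data` returns the same word for both, and only `is_hit` distinguishes the valid read from the invalid one.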

[0009] FIG. 10 is a conceptual diagram of a pipeline that includes a cache memory. In the figure, 10 is an instruction register, 11 is a general-purpose register (GR), 12 is an arithmetic circuit (ALU), 13 is a cache memory, and 11' is a general-purpose register. The D stage is the operand read stage, the E stage is the address calculation stage, the C stage is the cache read stage, and the W stage is the write stage.

[0010] Two GRs (general-purpose registers) are drawn in FIG. 10, but physically there is only one. GR 11, 11' is used to read source operands in the D stage and to write results in the W stage. In FIG. 10, the cache memory processing shown in FIGS. 8 and 9 is performed in a single stage.

[0011] The D, E, C, and W stages in FIG. 10 do not necessarily take the same time. The cycle time of the pipeline is determined by the slowest stage. The C stage has traditionally required a long time and has determined the pipeline cycle time. Moreover, because logic circuits have become faster in recent years, the C stage is sometimes split in two to shorten the pipeline cycle time.
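A short numerical illustration of this point, in Python. The stage delays are made-up values in arbitrary time units, chosen only to show the effect; the source gives no figures.

```python
# Stage delays in arbitrary time units; the numbers are hypothetical.
def cycle_time(stage_delays):
    """The pipeline clock period is set by the slowest stage."""
    return max(stage_delays.values())

single_c = {"D": 4, "E": 5, "C": 9, "W": 4}           # one cache stage
split_c = {"D": 4, "E": 5, "C1": 5, "C2": 4, "W": 4}  # cache stage split in two
```

With these assumed delays the single C stage sets the cycle time, and splitting it leaves the E stage as the limit.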

[0012] FIG. 11 shows the cache memory processing of FIG. 10 divided into two stages. The reference numerals in the figure correspond to those in FIG. 1 and FIG. 10. The memory is referenced in the C1 stage, and hit determination and way selection are performed in the C2 stage.

[0013] FIG. 12 shows the processing of the direct-mapped cache memory shown in FIG. 2 divided into two stages. Note that in FIG. 12 the C2 stage performs only hit determination; the data read from the data memory undergoes no processing there. The reference numerals in FIG. 12 correspond to those in FIG. 11.

[0014]

[Problems to Be Solved by the Invention] FIG. 13 shows how data read from the cache memory is bypassed to a subsequent arithmetic instruction. When way selection is performed in the C2 stage, as in FIG. 11, the C2 stage is put to good use. In the configuration of FIG. 12, however, the C2 stage merely passes the data along and is not used effectively. If the cache hits, the data is ready at the end of the C1 stage but is not used until the end of the C2 stage, so a subsequent instruction that uses the data is interlocked until the end of the C2 stage and performance drops.

[0015] Thus, in the conventional configuration, the data is already available but is not used to advance processing, so the pipeline interlocks longer than necessary and performance suffers.

[0016] The object of the present invention is to raise performance, in a pipeline that has a direct-mapped cache memory whose cache read stage and cache hit determination stage are separate, by bypassing the data at an earlier timing when a subsequent instruction uses cache read data immediately.

[0017]

[Means for Solving the Problems] FIG. 1 illustrates the principle of the present invention and shows the bypass processing when a cache hit occurs. In the figure, 10 is an instruction register, 11 is a general-purpose register, 12 is an arithmetic circuit, and 13 is a cache memory. Items marked with primes (', '', ''') are the same as the corresponding unprimed items. D, E, C1, C2, and W denote the stages. An arrow marks the point at which data is bypassed, and an arrow with a white circle indicates that the cache hit.

[0018] In the present invention, the data appears at the end of the C1 stage, and at that point it is bypassed to the subsequent arithmetic instruction.

[0019]

[Operation] The object is achieved by bypassing the cache data as soon as it can be read, as in FIG. 1. Compared with bypassing at the C2 stage, processing finishes 1τ earlier. However, two instructions are then executed in the W stage at the same time. The most obvious way to support this is to add GR write ports, but that entails a large increase in hardware. A register write buffer can also be used to solve the problem without adding GR write ports. The present invention does not prescribe a particular remedy, so long as the problem is resolved.
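The 1τ gain and the resulting collision in the W stage can be reproduced with a small timing model in Python. The model rests on assumptions that are not stated in this form in the text: one instruction enters the pipeline per cycle, the dependent arithmetic instruction immediately follows the load, and an arithmetic instruction passes through D, E, and W only (skipping the cache stages), which is consistent with the remark that two instructions reach the W stage together. All names are hypothetical.

```python
# Cycle-level timing sketch for a load followed by a dependent ALU instruction.
LOAD_STAGES = ["D", "E", "C1", "C2", "W"]

def load_schedule(start=1):
    """Cycle in which the load occupies each stage."""
    return {stage: start + i for i, stage in enumerate(LOAD_STAGES)}

def dependent_schedule(bypass_stage, start=1):
    """Schedule of the dependent instruction.

    Its D stage cannot end before the cycle in which the load finishes
    bypass_stage, because the operand is bypassed at the end of that stage.
    """
    load = load_schedule(start)
    d_end = max(start + 1, load[bypass_stage])
    # Assumption: ALU instructions go D -> E -> W without cache stages.
    return {"D": d_end, "E": d_end + 1, "W": d_end + 2}

conventional = dependent_schedule("C2")  # bypass at the end of C2
early = dependent_schedule("C1")         # bypass at the end of C1
```

Under these assumptions the dependent instruction writes back one cycle sooner with the C1 bypass, in the same cycle as the load's own W stage.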

[0020]

[Embodiment] FIG. 2 is a block diagram of one embodiment of the present invention. Reference numerals 1, 2, 3, 11, 12, and 13 correspond to those in FIGS. 8 and 10. Numerals 14, 15, and 16 are selectors; 17 is a load data bypass detection circuit, which decides whether to perform the bypass shown in FIG. 1; 18 is a C1-stage write register ID (C1-WR-REG-ID) holding unit; 19 is a general-purpose register read interlock detection circuit, which detects whether an interlock is required, for example because a register is busy during the bypass processing; 20 is a C1-stage load instruction (C1-LD-OP) holding unit; and 21 is a pipeline stage control circuit, which generates the control signals for each stage of the pipeline.

[0021] The signals in the figure have the following meanings.
D-RD-REG-ID-0: Signal indicating that register ID #0 is read in the D stage.

[0022] D-RD-REG-ID-1: Signal indicating that register ID #1 is read in the D stage.
C1-VALID: Signal indicating that the C1 stage is busy.

[0023] E-WR-REG-ID: Signal designating the register to be written, as held in the E stage.
D-RD-REG-0-USED: Signal obtained by decoding the instruction in the D stage, indicating whether register #0 is used.

[0024] D-RD-REG-1-USED: Signal obtained by decoding the instruction in the D stage, indicating whether register #1 is used.
E-VALID: Signal indicating that the E stage is busy.

[0025] E-LD-OP: Signal indicating that a load instruction is being processed in the E stage.
BRANCH-INTERLOCK: Signal indicating an interlock caused by a branch.

[0026] GR-READ-INTERLOCK: Signal indicating an interlock on reading a general-purpose register.
D-STAGE-RELEASE: Signal indicating the end of the D stage.

[0027] E-STAGE-RELEASE: Signal indicating the end of the E stage.

In FIG. 2, as outlined with reference to FIG. 1, operands are read from general-purpose register 11 in the D stage and, in the E stage, are supplied through selector 15 to arithmetic circuit 12, which computes the address. In the C1 stage, data memory 1 and tag memory 2 are accessed, and in the C2 stage hit determination circuit 3 decides whether there is a hit.

[0028] The load data bypass detection circuit 17 decides whether to perform the bypass shown in FIG. 1. When the bypass is performed, selectors 15 and 16 supply the data read from data memory 1 to arithmetic circuit 12.

[0029] The general-purpose register read interlock detection circuit 19 detects that an interlock is required, for example when the register used in the D stage is the same as a register to be used in the W stage, as described later.

[0030] The pipeline stage control circuit generates the control signals for pipeline processing, that is, the signals indicating that a stage is busy or released, and controls the pipeline. FIG. 3 shows the bypass processing in the present invention when the cache misses. The reference numerals correspond to those in FIG. 1. An arrow marks the point at which data is bypassed, an arrow with a white circle indicates a cache hit, and an arrow with a cross indicates a cache miss.

[0031] In FIG. 3, the missed data is read from main memory and written into the cache memory without software intervention, and the written contents are then read by executing the C1 and C2 stages again. With this scheme, a miss can be handled with control close to that used for a hit.

[0032] Besides writing the data into the cache memory on a miss and then reading it again to write the GR or to bypass, the write to GR 11 or the bypass can be performed while the data is being written into the cache memory. Another method raises an interrupt and lets software update the cache memory. These cases are not all considered in detail here, because the cache hit rate is generally 90% or more, and a configuration that improves performance for the remaining 10% may bring little benefit. It suffices that a miss is at least handled in the conventional manner. In the worst case the bypass may be skipped: the data is first written to the GR, and the subsequent instruction then reads the GR.

[0033] In FIG. 3, the data for the bypass is passed in the C1 stage and the hit determination result is reported in the C2 stage. The C1 and C2 stages are repeated several times. For the data bypassed in the rightmost C1 stage in the figure, a hit is reported in the C2 stage, and the result computed in that stage is used as correct in the next operation.

[0034] Below, expressions in the style of VHDL, the hardware description language standardized by the IEEE, are used to describe, for example, that the signal next-e-alu-op1 is latched into the E-stage signal e-alu-op1 and the signal next-e-alu-op2 into the E-stage signal e-alu-op2. Such a description can be regarded as representing a hardware configuration like that of FIG. 4. FIG. 4 shows in hardware form that, at the rising edge of the clock (clock = '1' and clock'event), the signals next-e-alu-op1 and next-e-alu-op2 at the inputs of the respective flip-flops are set into those flip-flops and become the signals e-alu-op1 and e-alu-op2. In these descriptions, "<=" means "is latched into" (likewise below).

(A) Consider first the D-stage end signal.

[0035] The signal D-STAGE-RELEASE shown in FIG. 2 is set to '1' in the following case.

    D-STAGE-RELEASE <= '1' when D-VALID = '1'
        and not (E-VALID = '1' and E-STAGE-RELEASE = '0')
        and BRANCH-INTERLOCK = '0'
        and GR-READ-INTERLOCK = '0'
      else '0' ;

This expression gives the condition for ending the D stage. The first term indicates that the D stage is busy. The second term indicates that, when the following E stage is busy and does not end, the D stage does not end either and interlocks. The third term indicates that, when a branch instruction has been executed, the D stage waits until the branch target instruction has been fetched. The fourth term indicates that, when a GR whose operand is read in the D stage is to be modified by a preceding instruction that has not yet finished, the D stage interlocks until the write completes (or the bypass becomes possible). The GR-READ-INTERLOCK signal is hereinafter called the GR read interlock signal.
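The four-term release condition can be restated as a Python predicate for checking individual cases. This is a behavioural sketch only; the function name and the use of booleans in place of '0'/'1' signal levels are choices made here for illustration.

```python
def d_stage_release(d_valid, e_valid, e_stage_release,
                    branch_interlock, gr_read_interlock):
    """Return True when the D stage may end in this cycle."""
    # Second term: the E stage is occupied and is not being released.
    e_blocked = e_valid and not e_stage_release
    return bool(d_valid
                and not e_blocked
                and not branch_interlock
                and not gr_read_interlock)
```

For example, `d_stage_release(True, True, False, False, False)` is `False` because the E stage is still occupied, while the same call with `e_stage_release=True` returns `True`.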

[0036] The present invention changes the GR read interlock signal and related signals so that data can be bypassed at an earlier timing.

(B) Next, consider the GR read interlock.

[0037] In the conventional case shown in FIG. 13, as the figure shows, the subsequent instruction should be interlocked in the D stage when the register number into which the preceding LD (load) instruction writes its result, held in the E or C1 stage (E-WR-REG-ID and C1-WR-REG-ID respectively), equals a read register number of the D stage of the subsequent instruction (there are two in the figure, D-RD-REG-ID0 and D-RD-REG-ID1).

[0038] The handling of a miss is described here. When missed data is handled by interrupt processing, execution of the load itself and of the subsequent instructions is cancelled, the interrupt program replaces the cache memory contents, and the interrupted LD instruction is then executed again.

[0039]

    GR-READ-INTERLOCK <= '1' when
        (E-VALID = '1' and E-LD-OP = '1' and
          ((E-WR-REG-ID = D-RD-REG-ID0 and D-RD-REG-0-USED) or
           (E-WR-REG-ID = D-RD-REG-ID1 and D-RD-REG-1-USED)))
        or
        (C1-VALID = '1' and C1-LD-OP = '1' and
          ((C1-WR-REG-ID = D-RD-REG-ID0 and D-RD-REG-0-USED) or
           (C1-WR-REG-ID = D-RD-REG-ID1 and D-RD-REG-1-USED)))
      else '0' ;

Here E-VALID and C1-VALID are signals indicating that the E and C1 stages, respectively, are valid. E-LD-OP and C1-LD-OP are signals indicating that an LD instruction is being processed in the E and C1 stages, respectively. D-RD-REG-0-USED and D-RD-REG-1-USED are signals obtained by decoding the instruction in the D stage and indicate whether each of the two source register operands is used.

[0040] In the case of FIG. 13 the GR read interlock signal is given in the form above. In FIG. 1 for the present invention, however, the data is bypassed at the end of the C1 stage, so a load in the C1 stage no longer needs to hold the subsequent instruction and only the E-stage term remains. The GR read interlock signal therefore follows the logic below.
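The conventional condition of paragraph [0039] and the reduced condition given in paragraph [0041] can be put side by side in a behavioural Python sketch. The `StageState` record and the function names are hypothetical; the two source register IDs and their "used" flags are passed as pairs.

```python
from dataclasses import dataclass

@dataclass
class StageState:
    valid: bool      # the stage holds an instruction (x-VALID)
    ld_op: bool      # that instruction is a load (x-LD-OP)
    wr_reg_id: int   # register the load will write (x-WR-REG-ID)

def _conflict(stage, rd_ids, rd_used):
    """True if a load in this stage writes a register the D stage reads."""
    if not (stage.valid and stage.ld_op):
        return False
    return any(used and stage.wr_reg_id == rid
               for rid, used in zip(rd_ids, rd_used))

def interlock_conventional(e, c1, rd_ids, rd_used):
    """Interlock while the load is in the E stage or the C1 stage."""
    return _conflict(e, rd_ids, rd_used) or _conflict(c1, rd_ids, rd_used)

def interlock_early_bypass(e, c1, rd_ids, rd_used):
    """Interlock only while the load is in the E stage."""
    return _conflict(e, rd_ids, rd_used)

idle = StageState(valid=False, ld_op=False, wr_reg_id=0)
load_r3 = StageState(valid=True, ld_op=True, wr_reg_id=3)
```

With the load in the C1 stage and the D stage reading register 3, the conventional condition still interlocks while the reduced condition does not, which corresponds to the one-cycle gain.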

[0041]

    GR-READ-INTERLOCK <= '1' when
        (E-VALID = '1' and E-LD-OP = '1' and
          ((E-WR-REG-ID = D-RD-REG-ID0 and D-RD-REG-0-USED) or
           (E-WR-REG-ID = D-RD-REG-ID1 and D-RD-REG-1-USED)))
      else '0' ;

The operation when a miss occurs can be considered in the same way as for FIG. 13.

(C) Next, consider operand selection control.

[0042] Using the control signals described above, the pipeline stages can be made to operate as intended. In the present invention, however, the control of the operand data flow must also be changed.

[0043] The conventional case is described first, followed by the case of the present invention. The ALU is shown as an ordinary arithmetic unit such as an adder, but the same reasoning applies to a shifter and the like.

[0044]

    process(clock, reset)                         -- register process
    begin
      if reset = 1 then                           -- processing on reset
        e-alu-op1 <= x"00000000" ;
        e-alu-op2 <= x"00000000" ;
      elsif (clock = '1' and clock'event) then    -- processing on rising clock edge
        e-alu-op1 <= next-e-alu-op1 ;
        e-alu-op2 <= next-e-alu-op2 ;
      end if ;
    end process ;

This process represents edge-triggered flip-flops.

[0045] process(clock, reset) indicates that the flip-flop outputs can change only when the clock or reset input changes. The part from "if reset = 1 then" to "elsif" describes the behaviour when reset = 1, that is, when reset is asserted: the two signals e-alu-op1 and e-alu-op2, which represent the flip-flop outputs, are initialized to 0.

[0046] The part from "elsif (clock = '1' and clock'event) then" to "end if" describes the processing at the rising edge of the clock. clock'event is true when the clock signal has changed, and clock = '1' is true when the clock signal is '1'.

[0047] Thus (clock = '1' and clock'event) detects the rising edge of the clock. At the rising edge, each flip-flop latches the signal connected to its input, next-e-alu-op1 or next-e-alu-op2, and the two output signals e-alu-op1 and e-alu-op2 change accordingly. The prefix "next" denotes an input.
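The register process described above can be imitated in Python to make the edge and reset behaviour concrete. This is a software analogy under simplifying assumptions: `step` is called once per change of the inputs, the clock takes only the values 0 and 1, and the class and method names are hypothetical.

```python
class OperandRegisters:
    """Software analogy of the two edge-triggered operand flip-flops."""

    def __init__(self):
        self.clock = 0       # previous clock level, used to detect a change
        self.e_alu_op1 = 0
        self.e_alu_op2 = 0

    def step(self, clock, reset, next_op1, next_op2):
        # Equivalent of (clock = '1' and clock'event): the clock is 1 now
        # and was 0 on the previous call.
        rising = (clock == 1 and self.clock == 0)
        self.clock = clock
        if reset == 1:
            # Reset is tested first, so it overrides the clock edge.
            self.e_alu_op1 = 0
            self.e_alu_op2 = 0
        elif rising:
            self.e_alu_op1 = next_op1
            self.e_alu_op2 = next_op2
        return self.e_alu_op1, self.e_alu_op2
```

Calling `step` with the clock held at 1 leaves the outputs unchanged even if the inputs change, because no rising edge has occurred.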

[0048] (a) The operands are updated on the transition from the D stage to the E stage. The signals next-e-alu-op1 and next-e-alu-op2 above are given as follows.

    NEXT-E-ALU-OP1 <= D-ALU-OP1 when D-STAGE-REL = '1' else E-ALU-OP1 ;
    NEXT-E-ALU-OP2 <= D-ALU-OP2 when D-STAGE-REL = '1' else E-ALU-OP2 ;

(b) In D-stage operand generation, the first operand can only be a GR.

    D-ALU-OP1 <= D-GR-DATA-1 ;

The second operand can be a GR or an immediate value. It is selected by d-op2-sel, which is obtained by decoding the instruction.

[0049]

    D-ALU-OP2 <= D-GR-DATA-2 when D-OP2-SEL = REG-SEL
      else x"0000" & D-OPCODE(16 to 31) ;

(c) Here d-gr-data-1 and d-gr-data-2 are both data that, by the conventional method, has been bypassed from the E-stage result of the preceding instruction whenever such a bypass is required.

[0050]

    d-gr-data-1 <= e-result when d-op1-bypass-from-e = '1'
      else gr(d-gr-rd-adr-1) ;
    d-gr-data-2 <= e-result when d-op2-bypass-from-e = '1'
      else gr(d-gr-rd-adr-2) ;
    d-op1-bypass-from-e <= '1' when e-valid = '1' and e-write = '1'
        and e-wr-gr-adr = d-gr-rd-adr-1
      else '0' ;
    d-op2-bypass-from-e <= '1' when e-valid = '1' and e-write = '1'
        and e-wr-gr-adr = d-gr-rd-adr-2
      else '0' ;

Here gr(i) is the value of general-purpose register number i, and d-gr-rd-adr-1 and d-gr-rd-adr-2 are the register numbers of the first and second source register operands, obtained by decoding the instruction.

[0051] e-write is a signal indicating that a register write will take place in the following W stage, and e-wr-gr-adr indicates which register is written in that W stage.

[0052] e-result is the result computed by the ALU. In the present invention, the following function is added to the conventional case above (see FIG. 1): when the preceding LD instruction is in the C1 stage and its write register number equals a read register number, the cache read data of the preceding LD instruction is bypassed so that it is used as if it were GR read data. This logic closely resembles that for the bypass from the arithmetic unit and follows by analogy.

[0053]

    D-OP1-BYPASS-FROM-C1 <= '1' when C1-VALID = '1' and C1-LD-OP = '1'
        and C1-WR-GR-ADR = D-GR-RD-ADR-1
      else '0' ;
    D-OP2-BYPASS-FROM-C1 <= '1' when C1-VALID = '1' and C1-LD-OP = '1'
        and C1-WR-GR-ADR = D-GR-RD-ADR-2
      else '0' ;
    D-GR-DATA-1 <= E-RESULT when D-OP1-BYPASS-FROM-E = '1'
      else C1-CACHE-DATA when D-OP1-BYPASS-FROM-C1 = '1'
      else GR(D-GR-RD-ADR-1) ;
    D-GR-DATA-2 <= E-RESULT when D-OP2-BYPASS-FROM-E = '1'
      else C1-CACHE-DATA when D-OP2-BYPASS-FROM-C1 = '1'
      else GR(D-GR-RD-ADR-2) ;

With the above changes, the speed-up of a cache memory of the type that handles a miss by interrupt processing and instruction re-execution is complete.
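The operand selection with the added C1 bypass can be sketched behaviourally in Python. The dictionaries standing in for the E-stage and C1-stage state, and the function names, are hypothetical; the order of the tests (E-stage result first, then C1 cache data, then the register file) follows the expressions above.

```python
def bypass_from_e(e, rd_adr):
    """E-stage instruction will write the register being read."""
    return e["valid"] and e["write"] and e["wr_gr_adr"] == rd_adr

def bypass_from_c1(c1, rd_adr):
    """A load in the C1 stage will write the register being read."""
    return c1["valid"] and c1["ld_op"] and c1["wr_gr_adr"] == rd_adr

def d_gr_data(rd_adr, gr, e, c1):
    """Select the D-stage source operand for register number rd_adr."""
    if bypass_from_e(e, rd_adr):
        return e["result"]        # ALU result of the preceding instruction
    if bypass_from_c1(c1, rd_adr):
        return c1["cache_data"]   # cache read data, used as if read from the GR
    return gr[rd_adr]             # no bypass: read the register file

gr = [10, 11, 12, 13]
e = {"valid": True, "write": True, "wr_gr_adr": 2, "result": 99}
c1 = {"valid": True, "ld_op": True, "wr_gr_adr": 3, "cache_data": 77}
```

With this state, reading register 2 returns the E-stage result, reading register 3 returns the cache data, and reading register 1 falls through to the register file.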

[0054] Next, a circuit configuration is described in which, as shown in FIG. 3, the hardware reads in the cache data after a miss and bypasses it to the subsequent instruction. In the operand register update control described so far, the operand registers are updated only when the D-stage end signal D-STAGE-REL is asserted, and otherwise hold their previous value. In the case of FIG. 3, data bypassed from the preceding instruction must be captured into the operand register even though D-STAGE-REL is not asserted. The control for this is shown below.

[0055] The operands are updated on the transition from the D stage to the E stage.

    NEXT-E-ALU-OP1 <= D-ALU-OP1     when D-STAGE-REL = '1'
                 else C1-CACHE-DATA when E-OP1-BYPASS-FROM-C1 = '1'
                 else E-ALU-OP1 ;
    NEXT-E-ALU-OP2 <= D-ALU-OP2     when D-STAGE-REL = '1'
                 else C1-CACHE-DATA when E-OP2-BYPASS-FROM-C1 = '1'
                 else E-ALU-OP2 ;
    E-OP1-BYPASS-FROM-C1 <= '1' when C1-VALID = '1' and C1-LD-OP = '1'
                                 and C1-WR-GR-ADR = E-GR-RD-ADR-1
                            else '0' ;
    E-OP2-BYPASS-FROM-C1 <= '1' when C1-VALID = '1' and C1-LD-OP = '1'
                                 and C1-WR-GR-ADR = E-GR-RD-ADR-2
                            else '0' ;

Besides the method shown in FIG. 3, many other methods are conceivable, such as performing the bypass at the same moment the data to be written into the cache becomes available. Basically, as the logic above shows, it suffices to detect the match of the register numbers and the timing at which the loaded data arrives, and to control the operand register so that it captures the data.

(D) Next, variation (I) of the pipeline control is considered.
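Looking back at the operand register update of paragraph [0055], the logic can be sketched as a small Python model. This is only an illustration under stated assumptions: the function names `e_bypass_from_c1` and `next_e_alu_op` are hypothetical, each call models one operand (OP1 or OP2) for one clock cycle, and the arguments correspond to the signals D-STAGE-REL, C1-VALID, C1-LD-OP, C1-WR-GR-ADR, E-GR-RD-ADR-n and C1-CACHE-DATA.

```python
def e_bypass_from_c1(c1_valid, c1_ld_op, c1_wr_gr_adr, e_gr_rd_adr):
    """Models E-OPn-BYPASS-FROM-C1 for one operand."""
    return c1_valid and c1_ld_op and c1_wr_gr_adr == e_gr_rd_adr


def next_e_alu_op(d_alu_op, e_alu_op, d_stage_rel, bypass, c1_cache_data):
    """Models NEXT-E-ALU-OPn: the value the E operand register takes at the next clock."""
    if d_stage_rel:
        return d_alu_op       # normal D-to-E transition
    if bypass:
        return c1_cache_data  # stalled in E, but the loaded data has arrived
    return e_alu_op           # otherwise hold the previous value
```

The middle branch is the addition relative to the earlier control: while the instruction waits in the E stage, its operand register still captures the load data in the cycle where the register numbers match.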

[0056] The description so far assumed that the addressing mode of the LOAD instruction can use either the sum of two register operands or the sum of one register operand and an immediate value. Now consider the case where only a single register operand can be specified (that is, only register-indirect addressing is available, so no addition is needed to compute the address).

[0057] FIGS. 6 and 7 show the pipeline operation in this case for the conventional example and for the present invention, respectively. The reference symbols in these figures correspond to those in FIG. 13 and FIG. 1. In this case, as FIG. 7 shows, interlock control due to the LOAD instruction is unnecessary. The operand control circuit can remain as in the conventional control.

(E) Next, variation (II) of the pipeline control is considered.

[0058] In the description above, data dependencies between instructions are resolved by interlocking the pipeline. In general, however, there is an alternative to interlocking: an invalid instruction is injected into the pipeline from the E stage onward, and once the wait condition has cleared, the original instruction is sent into the pipeline from the E stage onward. This is shown in FIG. 5. Most instructions are equivalent to an invalid instruction as long as they do not write to a GR. In FIG. 5, a "NOP" is made to flow in place of the "ADD" instruction. An example in which an instruction is invalidated by invalidating the GR write control signal is shown below.

[0059] (a) Common

    process(clock, reset) begin
      if (clock = '1' and clock'event) or reset = '1' then
        if reset = '1' then
          -- processing at reset
          E-WRITE <= '0' ;
        else
          -- processing at the rising clock edge
          E-WRITE <= NEXT-E-WRITE ;
        end if ;
      end if ;
    end process ;

(b) Interlock method

    NEXT-E-WRITE <= D-WRITE when D-STAGE-RELEASE = '1' else E-WRITE ;

(c) Invalidation method

    NEXT-E-WRITE <= D-WRITE when D-STAGE-RELEASE = '1'
               else E-WRITE when E-STAGE-RELEASE = '0' and E-VAL = '1'
               else '0' ;

When an instruction cannot be invalidated by the GR write control signal alone, it suffices to also invalidate at least the appropriate E-stage signals, for example the FR (floating-point register) write signal, the exception detection signal, and the system register update signal.

(F) Next, an example of pipeline processing according to the present invention is shown.
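The difference between the interlock method (b) and the invalidation method (c) of paragraph [0059] lies only in what the E stage receives when the D stage does not release. The following Python sketch models the two NEXT-E-WRITE equations for one clock cycle. The function names are hypothetical, and the booleans stand for the signals D-WRITE, E-WRITE, D-STAGE-RELEASE, E-STAGE-RELEASE and E-VAL.

```python
def next_e_write_interlock(d_write, e_write, d_stage_release):
    """(b) Interlock method: hold E-WRITE whenever the D stage does not release."""
    return d_write if d_stage_release else e_write


def next_e_write_invalidate(d_write, e_write, d_stage_release,
                            e_stage_release, e_val):
    """(c) Invalidation method: inject a no-write (NOP-like) slot when D is stalled."""
    if d_stage_release:
        return d_write   # the D-stage instruction advances into E
    if not e_stage_release and e_val:
        return e_write   # a valid instruction is still held in E
    return False         # E is vacated: the following slot must not write a GR
```

With the D stage stalled and the E stage releasing, the interlock model keeps the old E-WRITE value, while the invalidation model forces it to false, which corresponds to the "NOP" flowing in place of the "ADD" instruction in FIG. 5.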

[0060] Consider an example that is used very often for transferring fixed-size data, assigning records or character strings, and the like. Here one element is 4 bytes. Such a loop is expanded into the following processing.

[0061] Within this instruction sequence, the part that most affects processing time is the body of the loop. The pipeline during execution of this loop is shown below. FIG. 14 shows the conventional case, and FIG. 15 shows the case of the present invention. As FIG. 15 shows, in this example the present invention improves performance by up to 11%.

[0062]

[Effects of the Invention] As described above, according to the present invention, in pipeline processing with a direct-mapped cache memory in which the cache memory read stage and the cache hit determination stage are independent, computer performance can be improved by bypassing the cache read data at an earlier timing when a subsequent instruction uses that data immediately.

[Brief description of drawings]

FIG. 1 is a diagram explaining the principle of the present invention.

FIG. 2 is a block diagram of an embodiment of the present invention.

FIG. 3 shows the bypass processing when a cache miss occurs.

FIG. 4 shows a circuit equivalent to the VHDL description.

FIG. 5 shows pipeline control using an invalid instruction.

FIG. 6 shows the conventional processing when no address calculation is required.

FIG. 7 shows the processing of the present invention when no address calculation is required.

FIG. 8 shows the basic configuration of a cache memory.

FIG. 9 shows the basic configuration in the case of direct mapping.

FIG. 10 shows a pipeline (one stage spent on the cache).

FIG. 11 shows a pipeline (two stages spent on the cache).

FIG. 12 shows a pipeline (two stages, direct mapping).

FIG. 13 shows the bypass processing in the conventional case.

FIG. 14 shows a processing example for the conventional case.

FIG. 15 shows a processing example for the present invention.

[Explanation of symbols]

1: data memory
2: tag memory
3: hit determination circuit
4: data selection circuit
10: instruction register
11: general-purpose register
12: arithmetic circuit
13: cache memory
14: selector
15: selector
16: selector
17: load data bypass detection circuit
18: C1-stage write register ID holding unit
19: general-purpose register read interlock detection circuit
20: C1-stage load instruction holding unit
21: pipeline stage control circuit

Claims

[Claims]

1. A pipeline processing data processing device comprising a direct-mapped operand cache memory (13) and having, independently of each other, a read stage (C1) for reading the operand cache memory (13) and a hit determination stage (C2) for performing hit determination for the operand cache memory (13), the device comprising a load data bypass detection circuit (17) which receives a "write register identification signal" (WR-REG-ID) indicating the write register for the data being read from the operand cache memory and a "read register identification signal" (RD-REG-ID) indicating the register to be read as an operand, and which detects whether the "write register identification signal" (WR-REG-ID) and the "read register identification signal" (RD-REG-ID) designate writing to and reading from the same register, wherein, when the load data bypass detection circuit (17) detects writing to and reading from the same register, the data read from the operand cache memory (13) is bypassed for the operation of the subsequent instruction without waiting for the hit determination in the operand cache memory (13), and the operation stage (E) of the subsequent instruction is started.

2. The pipeline processing data processing device according to claim 1, wherein, if the preceding load instruction has not reached the read stage (C1) when a register is read in the register read stage (D) of the subsequent instruction, an interlock is applied in the register read stage (D) or the operation stage (E) of the subsequent instruction, and, on a miss in the operand cache memory (13), the preceding load instruction is interrupted, the necessary data is moved in from the main memory device to the operand cache memory (13), and the load instruction is then re-executed.

3. The pipeline processing data processing device according to claim 1, wherein, if the preceding load instruction has not reached the read stage (C1) when a register is read in the register read stage (D) of the subsequent instruction, an interlock is applied in the register read stage (D) or the operation stage (E) of the subsequent instruction, and, on a miss in the operand cache memory (13), the pipeline enters an interlocked state, the necessary data is moved in from the main memory device to the operand cache memory (13), and the interlock is then released and execution continues.

4. The pipeline processing data processing device according to claim 1, wherein, if the preceding load instruction has not reached the read stage (C1) when a register is read in the register read stage (D) of the subsequent instruction, an invalid instruction is sent through the stages following the register read stage (D) of the subsequent instruction.