JPH0588891A

JPH0588891A - Cache memory controller

Info

Publication number: JPH0588891A
Application number: JP3252310A
Authority: JP
Inventors: Yukihiro Ide; 進博井出
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1991-09-30
Filing date: 1991-09-30
Publication date: 1993-04-09

Abstract

PURPOSE: To improve the hit rate of a cache memory by prefetching accurately even when a branch occurs. CONSTITUTION: The device has a cache memory 2 and a branch prediction circuit 3 that predicts a branch destination address from the address at which a branch instruction is stored. When refill processing is executed because of a cache miss, the branch prediction circuit 3 is looked up with the missed address.

If the missed address is registered in the branch prediction circuit 3:
- If the predicted branch destination address is not in the cache memory 2, the block containing that address is prefetched immediately after the refill.
- If the predicted branch destination address is already in the cache memory 2, the bus 113 is released immediately after the refill.

If the missed address is not registered in the branch prediction circuit 3:
- If the block following the block that contains the missed address is not in the cache memory 2, that next block is prefetched immediately after the refill.
- If that next block is already in the cache memory 2, the bus 113 is released immediately after the refill.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a cache memory controller that controls prefetching into a cache memory.

[0002]

2. Description of the Related Art
As processors have become faster, it has become common to place a faster cache memory between the processor and main memory. This narrows the speed gap between the high-speed processor and the comparatively slow main memory.

Two cache operations, however, take a very long time and carry a large overhead. Refill loads main-memory contents into the cache memory, and write-back writes cache contents back to main memory. To run the processor efficiently, it is therefore important to raise the cache hit rate and avoid refill and write-back operations as far as possible.

[0003] Prefetching into the instruction cache memory and the data cache memory is one technique for raising the hit rate. Cache prefetching predicts the block that will be fetched next and fetches it in advance. This reduces cache misses and therefore the overhead of refill and write-back processing.

[0004] Conventional prefetch algorithms exploit the locality of data and instruction sequences. The simplest algorithm unconditionally fetches the block that follows the block currently being refilled. Because the prefetch is unconditional, this method has three problems:
(1) it tends to perform unnecessary prefetches;
(2) it can inadvertently evict a block that is still needed;
(3) it occupies the external bus longer than necessary.
Instruction sequences in particular often change direction sharply at branches, so simply fetching the next block is an inadequate prediction.
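The three problems of unconditional next-block prefetching can be made concrete with a toy model. The following is a minimal sketch, assuming a direct-mapped cache indexed by block number with 128 sets. `DirectMappedCache` and `refill_with_next_line_prefetch` are hypothetical names, not from the patent. The sketch always spends a second bus transfer, even when the next block is already cached, and the prefetched block can displace a block that is still in use.

```python
NUM_SETS = 128  # assumed cache size in blocks, for illustration only


class DirectMappedCache:
    """Toy direct-mapped cache that records which block number occupies each set."""

    def __init__(self, num_sets=NUM_SETS):
        self.num_sets = num_sets
        self.lines = [None] * num_sets

    def fill(self, block):
        """Place a block in its set; return the different block it displaced, if any."""
        s = block % self.num_sets
        victim = self.lines[s]
        self.lines[s] = block
        return victim if victim not in (None, block) else None


def refill_with_next_line_prefetch(cache, miss_block):
    """Refill the missed block, then unconditionally prefetch miss_block + 1.

    Returns (bus_transfers, evicted_blocks). Two transfers are always made,
    even if the next block is already cached (problems (1) and (3)), and the
    prefetch may displace a block that is still needed (problem (2)).
    """
    transfers, evicted = 0, []
    for block in (miss_block, miss_block + 1):  # no prediction, no presence check
        transfers += 1
        victim = cache.fill(block)
        if victim is not None:
            evicted.append(victim)
    return transfers, evicted


cache = DirectMappedCache()
cache.fill(129)                                  # block 129 (set 1) is still in use
print(refill_with_next_line_prefetch(cache, 0))  # (2, [129]): prefetching block 1 evicts it
```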

[0005] For instruction caches in particular, the following method has also been proposed:
(1) when a cache miss causes the processor to refill from some address, the next block is fetched immediately;
(2) whenever a branch leads to executing an instruction that is in the cache memory, the block following it is fetched immediately.
This method, however, cannot predict a branch ahead of time and prefetch its destination into the cache memory.

[0006] Meanwhile, advances in microfabrication have made it common to build branch prediction circuits into the MPU, which was not feasible before. A branch prediction circuit uses past branch history to predict a branch instruction's destination address from the instruction's own address, before the branch executes.

Normally a branch instruction computes its destination address in the execute stage and then updates the program counter, which disrupts the instruction pipeline. With a branch prediction circuit, the processor branches to the predicted destination before the actual destination is computed. If the prediction is correct, execution continues without disturbing the instruction pipeline.

[0007] Even when the branch prediction circuit lets a branch complete without disturbing the instruction pipeline, a cache miss at the branch destination address still stalls the pipeline. This halves the benefit of the branch prediction circuit. There has therefore long been a demand for a cache memory controller that reduces instruction-cache misses by predicting branches accurately in advance and prefetching the instructions at the branch destination.

[0008]

Problem to Be Solved by the Invention
As described above, conventional cache prefetch methods cannot accurately predict and prefetch the branch destination. They therefore tend to perform unnecessary prefetches, inadvertently evict needed blocks, and occupy the external bus more than necessary.

[0009] The present invention addresses these drawbacks of the prior art. Its object is a cache memory controller that consults the branch prediction circuit to predict the branch destination when prefetching into the instruction cache memory. The controller thereby prefetches more accurately even when a branch occurs and avoids unnecessary prefetches.

[0010]

Means for Solving the Problem
To achieve the above object, the present invention comprises three parts:
- a cache memory that stores instructions and data;
- a branch prediction circuit that holds the addresses at which branch instructions are stored, together with their past branch history, and predicts the next branch destination address from the address of a branch instruction;
- a cache control unit.

When the address of the next instruction to execute causes a cache miss, refill processing is performed. In parallel with the refill, the cache control unit looks up the branch prediction circuit with the missed address. It then acts as follows:
(1) if the missed address is registered in the branch prediction circuit and the predicted branch destination address is not in the cache memory, the unit keeps the bus after the refill and immediately prefetches the block containing the predicted address;
(2) if the missed address is registered in the branch prediction circuit and the predicted branch destination address is in the cache memory, the unit releases the bus immediately after the refill;
(3) if the missed address is not registered in the branch prediction circuit and the block following the block containing the missed address is not in the cache memory, the unit keeps the bus after the refill and immediately prefetches that next block;
(4) if the missed address is not registered in the branch prediction circuit and that next block is in the cache memory, the unit releases the bus immediately after the refill.
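The four cases handled by the cache control unit reduce to a small decision function. The following is a minimal sketch, not the patent's circuit. `Action`, `after_refill`, and the `in_cache` helper are hypothetical names. `in_cache` stands in for the tag check in the instruction cache, and `predicted_target` is `None` when the missed address is not registered in the branch prediction circuit.

```python
from enum import Enum, auto
from typing import Callable, Optional, Tuple


class Action(Enum):
    PREFETCH_PREDICTED_TARGET = auto()  # case (1): keep bus, fetch predicted target block
    RELEASE_BUS = auto()                # cases (2) and (4): nothing needs fetching
    PREFETCH_NEXT_BLOCK = auto()        # case (3): keep bus, fetch sequential next block


def after_refill(
    predicted_target: Optional[int],
    next_block_addr: int,
    in_cache: Callable[[int], bool],
) -> Tuple[Action, Optional[int]]:
    """Decide what happens once a refill finishes.

    predicted_target: branch prediction output for the missed address,
        or None if the missed address is not registered.
    next_block_addr: head address of the block after the missed block.
    in_cache: hypothetical tag-check helper; True if the address hits.
    Returns the action and the address to prefetch (None when the bus is released).
    """
    if predicted_target is not None:
        if not in_cache(predicted_target):
            return Action.PREFETCH_PREDICTED_TARGET, predicted_target
        return Action.RELEASE_BUS, None
    if not in_cache(next_block_addr):
        return Action.PREFETCH_NEXT_BLOCK, next_block_addr
    return Action.RELEASE_BUS, None
```

Keeping the bus in the two prefetch cases is what lets the prefetch skip a second bus acquisition. The two release cases free the bus as early as possible.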

[0011]

Operation
With the above configuration, a miss in the cache memory immediately starts a refill from main memory. The refilled instructions are transferred to the instruction queue at the same time. As soon as the refill finishes, execution starts from the contents of the instruction queue, but the external bus is not released.

When the refill starts, the missed address is also sent to the branch prediction circuit for a branch prediction. On a hit, execution is predicted to branch to the predicted address again this time.

[0012] After the refill finishes, the controller checks whether the block containing the predicted address is already in the cache memory. If it is not, the controller keeps the external bus and immediately prefetches the block containing the branch destination address from main memory. Instructions in the instruction queue execute during the prefetch, so execution rarely stalls while it is in progress.

Several branch destination addresses may hit in the branch prediction circuit. In that case, the controller selects the destination of the branch instruction stored at the lowest address. If the block containing the predicted branch destination address is already in the cache memory, the controller releases the external bus immediately and does not prefetch.

[0013] If the branch prediction circuit does not hit, the block being refilled is assumed to contain no branch instruction, so no branch is expected. If the block following the one being refilled is not yet in the cache memory, the controller keeps the external bus and immediately prefetches that next block. If that next block is already in the cache memory, the controller releases the external bus immediately and does not prefetch.

[0014]

Embodiment
An embodiment of the present invention is described below with reference to the drawings. In this embodiment, the processor executes instructions in the five-stage instruction pipeline shown in FIG. 5:
(1) instruction fetch (F);
(2) decode/operand fetch (D);
(3) execute (E);
(4) memory access (M);
(5) register write (W).

The processor basically completes one stage per clock, except that floating-point operations need 3 clocks in the execute stage. The pipeline also provides an operand bypass. When the next operation uses the current operation's result as an operand, it can start executing right after the current execute stage ends. The address bus is 32 bits wide, and instructions have a fixed length of 32 bits.

[0015] FIG. 1 is a block diagram of one embodiment of the cache memory controller of the present invention. The controller uses a Harvard architecture and comprises the following parts:
- an instruction cache memory 2;
- a cache control unit 6;
- a branch prediction circuit 3;
- an instruction queue 4;
- a main storage device 14;
- an external bus 113.

[0016] In the figure, the program counter 1 holds the address in the main storage device 14 of the next instruction to fetch.

[0017] The instruction cache memory 2 is a direct-mapped instruction cache based on the Harvard architecture. It has 128 entries, a block size of 128 bits, and a capacity of 16 Kbits. The program counter 1 supplies the main storage address 111, which divides as follows:
- the lower 4 bits are the in-block address 103;
- the upper 28 bits are the block address;
- within the block address, the lower 7 bits are the index 101 and the upper 21 bits are the tag 100.
The tag memory 7 stores 21-bit tags.
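The field widths in [0017] and the 11-bit data-memory address in [0018] can be checked with a few lines of bit arithmetic. The following is a minimal sketch. The field widths come from the text. The function names `split_address` and `data_memory_address` are hypothetical.

```python
OFFSET_BITS = 4   # in-block address 103
INDEX_BITS = 7    # index 101 -> 2**7 = 128 entries
TAG_BITS = 21     # tag 100
assert OFFSET_BITS + INDEX_BITS + TAG_BITS == 32  # fills the 32-bit address

# 128 entries x 128-bit blocks = 16384 bits = 16 Kbit
CAPACITY_BITS = (1 << INDEX_BITS) * 128


def split_address(addr: int):
    """Split a 32-bit main storage address 111 into (tag, index, offset)."""
    assert 0 <= addr < 1 << 32
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def data_memory_address(addr: int) -> int:
    """Lower 11 bits (index + in-block address) used to access data memory 9."""
    return addr & ((1 << (OFFSET_BITS + INDEX_BITS)) - 1)
```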

[0018] The instruction cache memory 2 operates as follows. First, the index 101 is applied to the tag memory 7, and the entry selected by the index 101 is read out as the tag 105. In parallel, the data memory 9 is accessed using the lower 11 bits of the main storage address 111 as its address. The hit/miss detector 8 then compares the tag 105 with the tag 100.

[0019] If the tag 105 matches the tag 100, the block containing the main storage address 111 is present in the data memory 9. A copy of the contents at that address is read from the accessed block into the latch 10.

If the tags do not match, the block containing the main storage address 111 is not in the data memory 9. That block is read from the main storage device 14 and replaces the block in the instruction cache memory 2 selected by the index 101. At the same time, the tag in the tag memory 7 is updated to the tag 100.
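The hit/miss check and refill in [0018] and [0019] can be modelled in software. The following is a minimal sketch, assuming these details:
- main storage is a dict from byte address to 32-bit word, and unmapped addresses read as 0;
- a 128-bit block holds four 32-bit words;
- the 4-bit in-block address is a byte offset.

The class and method names are hypothetical. The sketch ignores timing and models only which block ends up in which entry.

```python
class DirectMappedICache:
    """Toy model of instruction cache memory 2: direct-mapped, 128 entries of 128 bits."""

    OFFSET_BITS = 4          # in-block address 103
    INDEX_BITS = 7           # index 101
    ENTRIES = 1 << INDEX_BITS

    def __init__(self, main_storage):
        self.main_storage = main_storage
        self.tag_memory = [None] * self.ENTRIES   # tag memory 7
        self.data_memory = [None] * self.ENTRIES  # data memory 9, one 4-word block per entry

    def fetch(self, addr):
        """Return (word, hit); on a miss, refill the block selected by the index first."""
        offset = addr & ((1 << self.OFFSET_BITS) - 1)
        index = (addr >> self.OFFSET_BITS) & (self.ENTRIES - 1)
        tag = addr >> (self.OFFSET_BITS + self.INDEX_BITS)
        hit = self.tag_memory[index] == tag       # comparison done by hit/miss detector 8
        if not hit:
            base = addr - offset                  # head address of the missing block
            self.data_memory[index] = [
                self.main_storage.get(base + 4 * w, 0) for w in range(4)
            ]
            self.tag_memory[index] = tag          # tag memory updated to tag 100
        return self.data_memory[index][offset >> 2], hit
```

For example, addresses 0x1000 and 0x1800 share index 0 but have different tags. Fetching one evicts the other, which is the direct-mapped replacement described in [0019].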

[0020] The branch prediction circuit 3 is fully associative. It consists of an address memory 11, which stores the addresses of branch instructions, and a target memory 12, which stores branch destination addresses.

The address memory 11 is 30 bits wide, has 128 entries, and is divided into four banks. A branch instruction's address in the main storage device 14 goes into the bank selected by the lower 2 bits of the main storage address 111. Only the upper 30 bits, excluding those lower 2 bits, are stored.

The target memory 12 is 32 bits wide, has 128 entries, and corresponds one-to-one with the address memory 11. Each target entry holds the destination the corresponding branch instruction went to the last time it branched.

[0021] The branch prediction circuit 3 normally performs branch destination prediction as follows:
1. In the instruction fetch stage, the main storage address 111 is supplied to the branch prediction circuit 3. Its lower 2 bits are decoded to select one bank of the address memory 11.
2. The upper 30 bits are applied to the address memory 11.
3. If the selected bank contains a matching address, the corresponding entry of the target memory 12 is output as the predicted branch destination address 109.
4. The processor assumes execution will branch there and sets the program counter 1 to the predicted branch destination address 109.

[0022] History is registered as follows. When a branch instruction executes, the upper 30 bits of its address are stored in the bank of the address memory 11 selected by the lower 2 bits of that address. The branch destination address computed by the processor's internal execution unit is stored in the target memory 12.

The instruction queue 4 holds the block containing the instruction currently executing. It issues instructions to the execution unit according to the value of the program counter 1.
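The bank-selected lookup in [0021] and the history registration in [0022] can be sketched as a small class. This is a minimal sketch that follows the bit widths in [0020]-[0022] literally: the lower 2 bits select the bank and the upper 30 bits are compared. The following parts are assumptions, not stated in the text:
- the 128 entries split evenly as 32 per bank;
- a full bank is replaced first-in first-out;
- each bank is modelled as a Python dict.

The class and method names are hypothetical.

```python
class BranchPredictionCircuit:
    """Toy model of branch prediction circuit 3 (address memory 11 + target memory 12)."""

    BANKS = 4
    ENTRIES_PER_BANK = 128 // 4  # assumption: 128 entries split evenly over the banks

    def __init__(self):
        # One dict per bank: {upper 30 address bits: last branch destination}.
        # The dict lookup stands in for the associative compare within a bank;
        # the stored value plays the role of the paired target memory 12 entry.
        self.banks = [{} for _ in range(self.BANKS)]

    def predict(self, fetch_addr):
        """Fetch-stage lookup: predicted branch destination address 109, or None."""
        bank = self.banks[fetch_addr & 0b11]  # lower 2 bits select the bank
        return bank.get(fetch_addr >> 2)      # upper 30 bits are compared

    def register(self, branch_addr, target):
        """History registration when a branch instruction executes."""
        bank = self.banks[branch_addr & 0b11]
        key = branch_addr >> 2
        if key not in bank and len(bank) >= self.ENTRIES_PER_BANK:
            # Replacement policy is not given in the text; FIFO is assumed here.
            bank.pop(next(iter(bank)))
        bank[key] = target                    # keep only the most recent destination
```

Storing only the latest destination matches "the branch destination of the past one time" in [0020]. A real BTB might add confidence bits, but nothing in the text suggests that here.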

[0023] The incrementer 5 is a 30-bit incrementer that computes the head address of the block after the one being refilled. The upper 30 bits of the missed address m are incremented in parallel with the refill, giving the head address m+4 of the next block. The address m+4 is sent to the instruction cache memory 2 as the prefetch start address. The cache control unit 6 is a logic circuit that controls refill, prefetch, and other processing of the instruction cache memory 2.
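The incrementer's job can be sketched in a few lines. This sketch reads the example addresses m, m+1, ... as word addresses, with four instructions per block, which is how [0025] uses them. On that reading, incrementing the upper 30 bits by one moves to the next block head. Wrapping modulo 2**30 on overflow is an assumption, and `next_block_head` is a hypothetical name.

```python
def next_block_head(miss_addr: int) -> int:
    """Head address (m+4) of the block following the block that contains miss_addr.

    Mirrors incrementer 5: the upper 30 bits are incremented (wrapping modulo
    2**30 is assumed) and the lower 2 bits of the result are zero.
    """
    upper = (miss_addr >> 2) + 1
    return (upper & ((1 << 30) - 1)) << 2
```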

[0024] Next, the operation of the present invention is described. The invention acts only when the instruction cache memory 2 misses. FIGS. 2(a) to 3(c) are time charts for a processor using the cache memory controller of the present invention when it executes the instruction sequence below. FIGS. 4(a) to 4(c) are time charts for a processor without the present invention executing a similar sequence.

[0025] The instructions below use the following notation:
- ADD, SUB, and MUL are integer add, subtract, and multiply instructions.
- JMPCZ is a conditional branch that branches if the result of the preceding operation is "0". It is a squash branch: if the branch is taken, the instruction at m+4 is not executed.
- R0, R1, R2, R3, and R4 are registers.
- m-1, ..., m+4, n, n+1 are addresses in the main storage device 14. The addresses m, m+4, and n lie on block boundaries.

[0026] Suppose the processor is executing the above sequence in order and misses in the cache when fetching the instruction at address m. The processor immediately refills the instruction cache memory 2 over the external bus 113. The refill consists of two steps:
(1) acquiring ownership of the external bus 113;
(2) accessing the main storage device 14 consecutively.
The time for step (1) depends on the state of the external bus 113; in FIGS. 2 to 4 it takes one clock.

[0027] The external bus 113 is 32 bits wide. If the main storage device 14 delivers 32 bits every 2 clocks, one block takes 8 clocks to transfer, and the refill takes 9 clocks in total.

The instruction cache memory 2 invalidates a block in the data memory 9 and refills it with the instructions at addresses m, m+1, m+2, and m+3. The same instructions are transferred to the instruction queue 4 during the refill. When the refill finishes, the processor immediately starts executing from the instruction queue 4, but it still does not release the external bus 113.
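The clock counts in [0026] and [0027] come from simple arithmetic: 1 clock to acquire the bus, plus four 32-bit transfers at 2 clocks each. The same formula with no arbitration step gives the 8-clock prefetch cited when the bus is already held. The following is a minimal sketch with the figures' values as defaults. The one-clock arbitration is the value used in the time charts and would vary with bus state in practice. `block_transfer_clocks` is a hypothetical name.

```python
def block_transfer_clocks(block_bits=128, bus_bits=32, clocks_per_transfer=2,
                          arbitration_clocks=1):
    """Clocks to move one cache block over external bus 113."""
    transfers = (block_bits + bus_bits - 1) // bus_bits  # ceiling division: 4 transfers
    return arbitration_clocks + transfers * clocks_per_transfer


refill = block_transfer_clocks()                                 # bus must be acquired first
prefetch_with_bus_held = block_transfer_clocks(arbitration_clocks=0)
print(refill, prefetch_with_bus_held)                            # 9 8
```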

[0028] When it starts the refill, the processor also sends the missed address m to the branch prediction circuit 3. The circuit makes a branch prediction from the upper 30 bits of address m (BTB access). If the address memory 11 contains a matching address, the corresponding target memory 12 outputs the predicted branch destination address 109.

Because only the upper 30 bits of address m are used, one entry per bank can match, so up to four entries may match. When several entries match, the target memory 12 output with the lowest bank number takes priority.
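The BTB access made during a refill differs from the fetch-stage lookup. All four banks are compared against the upper 30 bits of the missed address m, and the lowest-numbered matching bank wins. If the addresses are read as word addresses with four instructions per block, bank k holds the branch at m+k. The lowest bank number is then also the lowest address, which agrees with the rule stated in [0012].

The following is a minimal sketch, assuming each bank is modelled as a dict from the upper 30 address bits to the stored branch destination. `predict_for_missed_block` is a hypothetical name.

```python
def predict_for_missed_block(banks, miss_addr):
    """BTB access made in parallel with a refill.

    banks: list of four dicts {upper 30 address bits: branch destination},
        an assumed software model of address memory 11 / target memory 12.
    Returns (bank_number, predicted_destination) for the highest-priority
    match, or None if the missed address is not registered.
    """
    key = miss_addr >> 2
    for bank_no in range(len(banks)):  # ascending bank order is the priority order
        target = banks[bank_no].get(key)
        if target is not None:
            return bank_no, target
    return None  # not registered: the sequential next block is the prefetch candidate
```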

[0029] FIGS. 2(a) and 2(b) show the case where the address memory 11 contains a matching address. The predicted branch destination address 109 is sent to the instruction cache memory 2. After the refill finishes, the cache checks whether that address is already present.

[0030] FIG. 2(a) shows the case where the predicted branch destination address 109 misses in the cache. The instruction cache memory 2 immediately prefetches the block containing that address. Ownership of the external bus 113 has been retained, so the prefetch takes only 8 clocks.

FIG. 4(a) shows the same situation for a processor without the present invention. Comparing FIGS. 2(a) and 4(a), the processor with the present invention runs 4 clocks faster. In practice, regaining ownership of a released external bus 113 takes considerable time, so the benefit is even larger.

[0031] FIG. 2(b) shows the case where the predicted branch destination address 109 hits in the instruction cache memory 2. The block is already in the cache, so no prefetch is needed, and the cache control unit 6 releases the external bus 113 immediately.

[0032] FIGS. 2(c) and 3(a) show the case where the address memory 11 has no address matching the upper 30 bits of address m. The block being refilled then contains no branch instruction, so the instruction at m+4 is expected to run after the instruction at m+3 completes. Here too, the processor starts executing from the instruction queue 4 immediately after the refill but keeps the external bus 113.

In parallel with the refill, the incrementer 5 computes address m+4 from the missed address m. Address m+4 is sent to the instruction cache memory 2, which checks whether that address is already present.

[0033] FIG. 2(c) shows the case where address m+4 misses in the instruction cache memory 2. The instruction cache memory 2 prefetches starting from address m+4. The external bus 113 is still owned, so the prefetch takes only 8 clocks.

[0034] FIG. 3(a) shows the case where address m+4 hits in the instruction cache memory 2. No prefetch is needed, so the external bus 113 is released immediately.

[0035] FIG. 3(b) shows a case similar to FIG. 2(a), but for the instruction sequence below. Here FADD, FSUB, and FMUL are floating-point add, subtract, and multiply instructions, each needing 3 clocks in the execute stage. The sequence has write-then-read register conflicts. As a result, the instructions at m+1, m+2, and m+3 cannot start until the previous instruction's execute stage ends.

[0036] In this case, FIG. 3(b) shows that the branch destination instructions are already in the instruction cache memory 2 when the branch executes. If the branch prediction is correct, execution continues without disturbing the instruction pipeline.

[0037] FIG. 4(c) shows the same case for a processor without the present invention, where fetching the branch destination instruction causes another cache miss. Comparing FIGS. 3(b) and 4(c), the processor with the present invention runs 10 clocks faster. In practice, regaining ownership of a released external bus 113 takes considerable time, so the benefit is even larger.

[0038] FIG. 3(c) shows the operation when the branch prediction fails. A misprediction becomes known at the clock of the execute stage. The cache control unit 6 then aborts the prefetch immediately. The block of the instruction cache memory 2 being prefetched is invalidated by setting its invalid flag.
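The misprediction handling in [0038] amounts to a small state update once the branch resolves in the execute stage. The following is a minimal sketch, assuming the instruction cache is modelled as a list of per-entry valid flags. The text's "invalid flag" is represented here as clearing a valid bit, the same state with the opposite polarity. `on_branch_resolved` and its parameters are hypothetical.

```python
def on_branch_resolved(valid, prefetch_index, predicted, actual):
    """Handle the execute-stage outcome of a predicted branch.

    valid: per-entry valid flags of the instruction cache (assumed model).
    prefetch_index: entry being filled by an in-flight target prefetch, or None.
    Returns True if the prefetch should continue.
    """
    if prefetch_index is None:
        return False                  # no prefetch in flight
    if predicted == actual:
        return True                   # prediction correct: let the prefetch finish
    valid[prefetch_index] = False     # mispredicted: abort and invalidate the block
    return False
```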

[0039] In this embodiment, for simplicity, the instruction cache memory 2 is direct-mapped and the branch prediction circuit 3 is fully associative. However, the effect of the present invention does not depend on how the instruction cache memory 2 and the branch prediction circuit 3 are organized. Nor is the invention limited to a Harvard architecture. In an architecture where instructions and data share one cache memory, exactly the same effect is obtained by performing the operation shown in this embodiment for instruction fetches only.
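The sketch below contrasts the two organizations named in [0039]. A direct-mapped lookup splits the address into a tag, an index, and an in-block address (compare reference numerals 100 to 102). A fully associative lookup compares every stored branch address. The block size, number of blocks, and function names are hypothetical.

```python
BLOCK_BYTES = 16  # hypothetical block size in bytes
NUM_BLOCKS = 64   # hypothetical number of cache blocks
OFFSET_BITS = BLOCK_BYTES.bit_length() - 1  # 4
INDEX_BITS = NUM_BLOCKS.bit_length() - 1    # 6


def split_address(addr: int) -> tuple:
    """Split an address into (tag, index, in-block address)."""
    offset = addr & (BLOCK_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def direct_mapped_hit(tags: list, addr: int) -> bool:
    """Direct-mapped: the index selects exactly one entry to compare."""
    tag, index, _ = split_address(addr)
    return tags[index] == tag


def fully_associative_lookup(entries: list, branch_addr: int):
    """Fully associative: any entry may hold any branch address, so all are
    compared (hardware compares in parallel; this loop is sequential)."""
    for stored_addr, predicted_target in entries:
        if stored_addr == branch_addr:
            return predicted_target
    return None


print(split_address(0x12345678))  # (298261, 39, 8)
```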

[0040]

[Effect of the Invention] As described above, the cache memory control device of the present invention predicts branches by referring to the branch prediction circuit. It can therefore prefetch into the cache memory accurately even when a branch occurs. This improves the hit rate of the cache memory, reduces disruption of the instruction pipeline, and eliminates unnecessary refills. As a result, the processing speed of the processor can be improved.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the cache memory control device of the present invention.

FIG. 2 is a time chart showing the operation of the cache memory control device of the present invention.

FIG. 3 is a time chart showing the operation of the cache memory control device of the present invention.

FIG. 4 is a time chart showing the operation of a conventional cache memory control device.

FIG. 5 is an operation diagram explaining the pipeline operation of a processor that uses the cache memory control device of the present invention.

[Explanation of symbols]

1 Program counter
2 Instruction cache memory
3 Branch prediction circuit
4 Instruction queue
5 Incrementer
6 Cache control unit
7 Tag memory
8 Hit/miss detector
9 Data memory
10 Latch
11 Address memory
12 Target memory
13 Decoder
14 Main storage device
100 Tag
101 Index
102 In-block address
103 In-block address (lower 2 bits)
104 Main storage address (upper 30 bits)
105 Tag
106 Stored data
107 Fetch data
108 Instruction
109 Predicted branch address
110 Bank selection signal
111 Main storage address
112 Start address of next fetch block

─────────────────────────────────────────────────────

[Procedure Amendment]

[Submission Date] October 4, 1991

[Procedure Amendment 1]

[Document to Be Amended] Drawings

[Item to Be Amended] FIG. 1

[Amendment Method] Change

[Amendment Content]

[FIG. 1]

Claims

[Claims]

1. A cache memory control device comprising: a cache memory that stores instructions and data; a branch prediction circuit that holds the addresses at which branch instructions are stored together with their past branch history, and predicts the next branch destination address from the address at which a branch instruction is stored; and a cache control unit that, when a refill process is performed because the address of the instruction to be executed next caused a cache miss, refers to the branch prediction circuit in parallel with the refill process on the basis of the address at which the cache miss occurred, and, if the address that caused the cache miss is registered in the branch prediction circuit and the branch destination address predicted by the branch prediction circuit is not present in the cache memory, immediately prefetches the block containing that predicted branch destination address without releasing the bus after the refill process is completed.
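The decision made by the cache control unit can be modeled as a small function. Claim 1 covers the first case. The other three cases follow the Abstract: release the bus if the predicted target is already cached, and, when the miss address is not registered, prefetch the next sequential block or release the bus. This is a behavioral sketch only. The names `after_refill`, `predictions`, `in_cache`, and `Action` are hypothetical, and a dictionary stands in for the branch prediction circuit 3.

```python
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class Action(Enum):
    PREFETCH_TARGET = "prefetch the predicted branch-target block"
    PREFETCH_NEXT = "prefetch the next sequential block"
    RELEASE_BUS = "release the bus"


def after_refill(miss_addr: int,
                 predictions: Dict[int, int],
                 in_cache: Callable[[int], bool],
                 block_bytes: int) -> Tuple[Action, Optional[int]]:
    """Choose what the cache control unit does once a refill completes.

    predictions maps a registered branch-instruction address to its predicted
    branch destination; it is consulted in parallel with the refill.
    """
    predicted = predictions.get(miss_addr)
    if predicted is not None:
        if not in_cache(predicted):
            # Claim 1: keep the bus and prefetch the predicted target block.
            return Action.PREFETCH_TARGET, predicted
        return Action.RELEASE_BUS, None
    # Not registered: fall back to the block following the missed one.
    next_block = (miss_addr // block_bytes + 1) * block_bytes
    if not in_cache(next_block):
        return Action.PREFETCH_NEXT, next_block
    return Action.RELEASE_BUS, None


action, addr = after_refill(0x104, {}, lambda a: False, block_bytes=16)
print(action.name, hex(addr))  # PREFETCH_NEXT 0x110
```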