JPH07200406A

JPH07200406A - Cache system

Info

Publication number: JPH07200406A
Application number: JP5348760A
Authority: JP
Inventors: Tsukasa Matoba; 司的場
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1993-12-27
Filing date: 1993-12-27
Publication date: 1995-08-04

Abstract

PURPOSE: To improve the performance of branch instruction execution by a CPU. CONSTITUTION: When a branch instruction in a cache line of an instruction memory 302 is executed, a cache controller 303 registers the branch destination address in the entry of a tag memory 301 corresponding to that cache line. When an instruction in the cache line is fetched again, the cache controller 303 checks the branch destination address registered in the corresponding entry of the tag memory 301 and, before the branch instruction is executed, reads the instruction group including the instruction at the branch destination address from main storage and stores it in the instruction memory 302. Determination of the branch destination address and the cache refill can therefore start before the branch instruction is executed, which improves CPU performance.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a cache system, and more particularly to a cache system having an instruction cache.

[0002]

[Prior Art] In recent years, computer performance has improved dramatically with advances in computer architecture. In particular, thanks to developments in semiconductor technology, the performance of microprocessors used as computer CPUs has improved remarkably and continues to rise year by year.

[0003] Recent CPUs usually adopt an instruction pipeline scheme to raise the efficiency of instruction execution. The instruction pipeline scheme divides the execution of an instruction into stages such as an instruction fetch cycle, a decode cycle, an execution cycle, and a data write cycle, and executes multiple instructions so that their stages overlap. In this scheme, instruction prefetching is performed so that a subsequent instruction can be fetched without waiting for the preceding instruction to complete. Instruction prefetching fetches an instruction that is expected to be executed in the future, in parallel with the decoding and execution of the preceding instruction.

[0004] For the CPU to process multiple instructions in parallel in a pipeline in this way, instruction reads from main memory and data reads and writes to main memory must be performed at high speed.

[0005] For this reason, recent microprocessors are provided with an instruction cache that can be accessed independently of the data cache. The instruction cache is a cache memory dedicated to instruction words. Separating the instruction cache from the data cache makes it possible to access both instructions and data simultaneously at high speed.

[0006] When such a microprocessor is used as a CPU, the instruction cache is accessed as follows.

[0007] That is, the CPU sequentially updates the instruction fetch address and serially reads, one at a time, the instructions stored contiguously in address order in the instruction cache. Normally the order in which the CPU executes instructions matches the order in which they are stored in the instruction cache, so instructions can be prefetched from the instruction cache efficiently.

[0008] However, the following problem arises when the CPU executes a branch instruction.

[0009] That is, for a branch instruction, and a conditional branch instruction in particular, the branch destination address is not determined until the branch instruction is executed. The branch destination instruction therefore cannot be fetched before the branch instruction is executed, and its fetch is delayed. This delay in fetching the branch destination instruction is a major cause of degraded CPU performance.

[0010] Conventionally, a speculative technique called branch prediction has been used to mitigate the CPU performance degradation caused by such branch instructions.

[0011] In branch prediction, the hardware predicts whether the branch will be taken without waiting for the condition code of the conditional branch instruction to be determined and checked. When it predicts that the branch will be taken, it reads the branch destination instruction from the instruction cache instead of the next instruction in address order. With this method, if the prediction is correct, the branch destination instruction can be read from the instruction cache at high speed, and the penalty caused by the delayed fetch of the branch destination instruction can be eliminated.

[0012] However, even if the branch prediction is correct, instruction execution by the CPU is kept waiting for a long time when a cache miss occurs.

[0013] That is, the branch destination address is not determined until the CPU reaches the stage of executing the branch instruction, so the instruction cache is not searched until the branch instruction is executed. If the branch destination address misses in the cache, the cache refill therefore starts only at that point. Here, a cache refill is the operation of replacing the contents of the instruction cache with data transferred from main memory.

[0014] Main memory access generally takes a relatively long time. When a cache miss occurs, fetching the branch destination instruction therefore takes a long time, during which the CPU is held waiting in the fetch stage.

[0015] In addition, conventional branch prediction requires large and complex hardware, including an associative memory called a branch destination buffer, and controlling that hardware is very complicated.

[0016]

[Problems to Be Solved by the Invention] Conventionally, the branch destination address can be derived only once the CPU reaches the stage of fetching the branch instruction, and if that branch destination address misses in the cache, the cache refill starts only at that point. As a result, the CPU is held waiting in the fetch stage for the long time needed to fetch the branch destination instruction, which significantly degrades CPU performance. Branch prediction also requires large and complex hardware, including a branch buffer, and its hardware control is complicated.

[0017] The present invention has been made in view of these points. Its object is to provide a cache system that uses information in the tag memory of the instruction cache so that determination of the branch destination address and the cache refill can be started before the branch instruction is executed, thereby improving CPU performance.

[0018]

[Means for Solving the Problems and Operation] The present invention is a cache system comprising an instruction memory having a plurality of cache lines that each store the instruction group of a different block in main memory, and a tag memory having a plurality of entries that each store the block address of the instruction group stored in a cache line of the instruction memory. The cache system is characterized by comprising: means for registering, in response to execution of a branch instruction by a CPU, the branch destination address specified by that branch instruction in the entry of the tag memory corresponding to the cache line of the instruction memory in which the branch instruction is stored; and branch destination instruction prefetch means for determining, in response to a fetch of an instruction from the instruction memory by the CPU, a cache hit or cache miss for the instruction at the branch destination address registered in the entry of the tag memory corresponding to the cache line in which the fetched instruction is stored, and, on a cache miss, reading the instruction group that includes the instruction at the registered branch destination address from the main memory and storing it in the instruction memory.

[0019] In this cache system, when a branch instruction in a cache line is executed, the branch destination address is registered in the tag entry of that cache line. When an instruction in that cache line is fetched again, the branch destination address registered in the tag entry of that cache line is checked. If it is a cache miss, the instruction group including the instruction at the branch destination address is stored in the instruction memory before the branch instruction is executed. Determination of the branch destination address and the cache refill can therefore be started before the branch instruction is executed, which improves CPU performance.
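
The two-step behavior summarized in this paragraph can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the class `ToyICache`, its method names, and the sizes `LINE_BYTES` and `NUM_LINES` are hypothetical choices made for illustration, and the model tracks only which block each line holds, not the instructions themselves.

```python
LINE_BYTES = 32    # assumed line size for this sketch
NUM_LINES = 128    # assumed number of cache lines for this sketch


class ToyICache:
    """Direct-mapped instruction cache whose tag entries also hold a branch target."""

    def __init__(self):
        self.block = [None] * NUM_LINES    # block address held by each line (None = invalid)
        self.target = [None] * NUM_LINES   # branch destination registered for each line

    def _index(self, addr):
        return (addr // LINE_BYTES) % NUM_LINES

    def _block(self, addr):
        return addr // (LINE_BYTES * NUM_LINES)

    def hit(self, addr):
        return self.block[self._index(addr)] == self._block(addr)

    def refill(self, addr):
        i = self._index(addr)
        self.block[i] = self._block(addr)
        self.target[i] = None              # a newly loaded line has no known branch yet

    def on_branch_executed(self, branch_addr, target_addr):
        # Step 1: record the destination in the tag entry of the branch's own line.
        if self.hit(branch_addr):
            self.target[self._index(branch_addr)] = target_addr

    def on_fetch(self, addr):
        """Fetch addr; return True if a refill of the registered branch target was started."""
        if not self.hit(addr):
            self.refill(addr)
        t = self.target[self._index(addr)]
        # Step 2: if the line has a registered destination that is not cached, load it now.
        if t is not None and not self.hit(t):
            self.refill(t)
            return True
        return False
```

In this model, the first pass through a line only records the destination. On a later fetch from the same line, `on_fetch` loads the destination block ahead of the branch, so the fetch of the branch target that follows finds the block already present.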

[0020]

[Embodiment] An embodiment of the present invention will be described below with reference to the drawings. First, the overall configuration of a microprocessor incorporating the cache memory system of the present invention will be described with reference to FIG. 1.

[0021] The microprocessor 100 is a RISC processor and includes a CPU core unit 200, an instruction cache 300, a data cache 400, a register file 500, and so on.

[0022] The instruction cache 300 stores part of the instruction group executed by the CPU core unit 200 and has n+1 cache lines. These cache lines are looked up using the instruction address from the CPU core unit 200. As illustrated, the instruction cache 300 includes a tag memory 301 and an instruction memory 302.

[0023] The tag memory 301 is used as a directory store that indicates which address in the main memory 30 the instructions held in the instruction memory 302 correspond to. The tag memory 301 has n+1 tag entries 0 to n, the same number as the cache lines 0 to n of the instruction memory 302. One cache line and one tag entry are accessed simultaneously by the CPU core unit 200.

[0024] In the tag memory 301, each tag entry holds a valid bit V, an upper bit address CA, a next address valid bit NV, a next instruction address field NCA, and a next address prediction instruction address field NAF.

[0025] The valid bit V and the upper bit address CA read from a tag entry are used as directory information for the cache line currently being accessed by the CPU core unit 200. The valid bit V indicates whether the eight instructions (instruction 1 to instruction 8) stored in the currently accessed cache line are valid. The upper bit address CA is a block address indicating which block of the main memory 30 the eight instructions (instruction 1 to instruction 8) stored in the currently accessed cache line belong to. The values of the valid bit V and the upper bit address CA are used to determine a cache hit or cache miss.

[0026] The next address valid bit NV, the next instruction address field NCA, and the next address prediction instruction address field NAF are information newly added to the tag memory 301 in this cache memory system in order to shorten the stall time required to fetch a branch destination instruction.
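
As a reading aid, the five fields of a tag entry can be pictured as one record. The sketch below is an assumption-laden illustration: the class name `TagEntry`, the lowercase field names, and the helper `has_prefetch_hint` are hypothetical, and field widths are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class TagEntry:
    v: bool = False    # valid bit V: the line's instructions are valid
    ca: int = 0        # upper bit address CA: block address of the cached line
    nv: bool = False   # next address valid bit NV: NCA holds a usable destination
    nca: int = 0       # next instruction address field NCA: branch destination address
    naf: int = 0       # NAF: position of the branch instruction within the line


def has_prefetch_hint(entry: TagEntry) -> bool:
    """True when the line is valid and carries a registered branch destination."""
    return entry.v and entry.nv
```

V and CA are the conventional directory fields; NV, NCA, and NAF are the additions this paragraph describes.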

[0027] The next address valid bit NV indicates whether the branch destination address to be accessed next, specified by the next instruction address field NCA, is valid. The next instruction address field NCA indicates the branch destination address to be accessed next, that is, the branch destination address specified by the branch instruction present in the currently accessed cache line. The values of the next address valid bit NV and the next instruction address field NCA are used to perform a preceding cache refill. Here, a preceding cache refill is a branch destination prefetch operation in which, before the branch instruction present in the currently accessed cache line is executed, the instruction block including the instruction at its branch destination address is stored in the instruction memory 302.

[0028] The next address prediction instruction address field NAF is the information needed to identify the offset address of the branch instruction present in the currently accessed cache line, and it is used to permit or prohibit the preceding cache refill that uses the next instruction address field NCA. If the instruction address currently being accessed is smaller than the address of the branch instruction specified by the next address prediction instruction address field NAF, the preceding cache refill is permitted; if it is larger, the refill is prohibited.

[0029] Normally, the instructions present in the currently accessed cache line are executed sequentially in address order. The permit/prohibit control of the preceding cache refill using the next address prediction instruction address field NAF therefore allows the refill only when the branch instruction present in the currently accessed cache line will be executed later, which prevents useless refill operations.
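
The permit/prohibit rule can be written as a single comparison. The sketch below is hypothetical: the function name `refill_permitted` and both constants are illustrative, it assumes 4-byte instructions with eight instructions per line so that the in-line position is address bits (4:2), and it treats the "equal" case as permitted, which the text does not specify.

```python
WORD_SHIFT = 2     # assumed 4-byte instructions
WORD_MASK = 0x7    # assumed 8 instructions per line, so a 3-bit in-line position


def refill_permitted(fetch_addr: int, naf: int) -> bool:
    """Return True when the fetch position is at or before the recorded branch."""
    fetch_offset = (fetch_addr >> WORD_SHIFT) & WORD_MASK
    # The text covers only "smaller" (permit) and "larger" (prohibit);
    # treating "equal" as permitted is an assumption of this sketch.
    return fetch_offset <= naf
```

For example, with a branch recorded at position 5, a fetch entering the line at position 0 is allowed to trigger the refill, while a fetch entering at position 6 has already passed the branch and is not.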

[0030] The preceding cache refill operation using the next address valid bit NV and the next instruction address field NCA, and the permit/prohibit control of that operation using the next address prediction instruction address field NAF, are the characteristic features of the present invention. The procedure is described in detail with reference to FIG. 4 and subsequent figures.

[0031] The CPU core unit 200 collectively denotes almost all units in the microprocessor 100 other than the instruction cache 300, the data cache 400, and the register file 500. It includes an instruction fetch unit 201, an instruction decode unit 202, an instruction execution unit 203, and a data write unit 204, each of which can operate independently. These units form a four-stage pipeline consisting of an instruction fetch stage (F), a decode stage (D), an execution stage (E), and a data write stage (W).
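
The overlap among the four stages can be visualized with a short script. It is an idealized sketch that assumes one instruction enters the pipeline per cycle with no stalls; the function name `pipeline_chart` is hypothetical.

```python
STAGES = ["F", "D", "E", "W"]   # fetch, decode, execute, data write


def pipeline_chart(num_instructions: int) -> list[str]:
    """One text row per instruction; column k is clock cycle k."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        cells = ["."] * total_cycles
        for s, name in enumerate(STAGES):
            cells[i + s] = name     # instruction i occupies stage s in cycle i + s
        rows.append(" ".join(cells))
    return rows


for row in pipeline_chart(3):
    print(row)
```

Each row is shifted one cycle to the right of the previous one, so while one instruction is in E the next is in D and the one after that is in F. A stall in the fetch stage delays every later row, which is why fetch latency matters for the rest of this description.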

[0032] In the instruction fetch stage (F) performed by the instruction fetch unit 201, the instruction cache 300 is normally searched in order to fetch, one after another, instructions that are contiguous in address order. That is, the instruction fetch unit outputs an instruction fetch address, which is supplied to the instruction cache 300. On a hit in the instruction cache 300, the instruction specified by the instruction fetch address is read from the instruction cache 300 and supplied to the instruction fetch unit 201. When the instruction is not in the instruction cache 300 (a miss), a cache refill, which is the update sequence of the instruction cache 300, is performed, and the instruction fetch stage continues until the update is complete. In a cache refill, a new instruction group is burst-transferred from the main memory 30 to the instruction cache 300 via the memory bus 31.
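
The cost of holding the fetch stage during a refill can be estimated with a rough cycle count. The sketch below is hypothetical throughout: `MISS_PENALTY` is an invented latency (the source gives no figure), the line and cache sizes are assumed values, and the function name `fetch_cycles` is illustrative.

```python
LINE_BYTES = 32     # assumed line size
NUM_LINES = 128     # assumed number of lines (direct-mapped)
MISS_PENALTY = 10   # hypothetical refill latency in cycles; not given in the source


def fetch_cycles(addresses) -> int:
    """Count fetch-stage cycles: 1 per fetch, plus the penalty on each miss."""
    lines = {}      # line index -> block address currently held
    cycles = 0
    for addr in addresses:
        index = (addr // LINE_BYTES) % NUM_LINES
        block = addr // (LINE_BYTES * NUM_LINES)
        if lines.get(index) != block:
            cycles += MISS_PENALTY   # fetch stage is held while the line is refilled
            lines[index] = block
        cycles += 1
    return cycles
```

Under these assumed numbers, fetching eight sequential instructions from one line costs 18 cycles (one miss), whereas three fetches that keep evicting each other from the same line cost 33.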

[0033] When the instruction fetch unit 201 fetches an instruction, it latches and holds the address of that instruction. When the decode stage detects that the instruction is a branch instruction, the instruction fetch unit 201 outputs the latched address. Therefore, when a branch instruction is executed, the instruction fetch unit 201 outputs the latched address as the address of the branch instruction being executed and outputs the branch destination address as the instruction fetch address.

[0034] In the instruction decode stage (D) performed by the instruction decode unit 202, the fetched instruction is decoded, and operations such as calculating the branch destination address of a branch instruction and the operand address of a Load/Store instruction are performed.

[0035] In the instruction execution stage (E), the various operations specified by the instruction are performed. For a Load/Store instruction, the data cache 400 is searched. In the data write stage (W) performed by the data write unit 204, operation results and the operands of Load instructions are stored in the register file 500.

[0036] Next, the specific configuration of the instruction cache 300 will be described with reference to FIGS. 2 and 3.

[0037] FIG. 2 shows the relationship between the tag memory 301 and the instruction memory 302. Here the instruction memory 302 is assumed to be a 1-way memory of 4 Kbytes with a cache line size of 32 bytes (8 words). Assuming that one instruction is 4 bytes, one cache line stores, as described above, eight instructions contiguous in address order (instruction 0 to instruction 7). The instruction memory 302 contains a total of 128 cache lines.
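
The figures in this paragraph follow from simple arithmetic, which can be checked as below. The 32-bit address width is taken from the (31:0) notation used elsewhere in the description; the variable names are illustrative only.

```python
CACHE_BYTES = 4 * 1024   # 4 Kbyte instruction memory
LINE_BYTES = 32          # 32-byte cache line
INSTR_BYTES = 4          # 4-byte instruction
ADDR_BITS = 32           # addresses are written (31:0) in the description

num_lines = CACHE_BYTES // LINE_BYTES            # number of cache lines
instrs_per_line = LINE_BYTES // INSTR_BYTES      # instructions per line

# log2 of each power-of-two size gives the width of the matching address field
index_bits = num_lines.bit_length() - 1          # selects a line
word_bits = instrs_per_line.bit_length() - 1     # selects an instruction in the line
byte_bits = INSTR_BYTES.bit_length() - 1         # byte within an instruction
tag_bits = ADDR_BITS - index_bits - word_bits - byte_bits

print(num_lines, instrs_per_line, index_bits, word_bits, byte_bits, tag_bits)
```

The result is 128 lines of 8 instructions each, a 7-bit line index, a 3-bit instruction position, 2 byte-offset bits, and 20 remaining upper bits for the block address.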

[0038] Information on the eight instructions (instruction 0 to instruction 7) of cache line 0 is managed by tag entry 0. Likewise, information on cache lines 1, 2, ..., 127 is managed by tag entries 1, 2, ..., 127.

[0039] As described above, each of the tag entries 0, 1, 2, ..., 127 holds a valid bit V, an upper bit address CA, a next address valid bit NV, a next instruction address field NCA, and a next address prediction instruction address field NAF.

[0040] FIG. 3 shows the specific circuit configuration of the instruction cache 300.

[0041] In addition to the tag memory 301 and the instruction memory 302 described above, the instruction cache 300 includes a cache control unit 303, a hit detection circuit 304, selectors 305 to 307, and a subtractor 308.

[0042] In the tag memory 301, the valid bit V, the upper bit address CA, the next address valid bit NV, the next instruction address field NCA, and the next address prediction instruction address field NAF have sizes of 1 bit, 20 bits, 32 bits, and 2 bits, respectively. The next address valid bit NV, the next instruction address field NCA, and the next address prediction instruction address field NAF are set by the cache control unit 303 when a branch instruction is executed.

[0043] The data input port of the tag memory 301 is connected to the instruction fetch unit 201 and the cache control unit 303. The upper 20 bits (31:12) of the instruction fetch address (31:0) supplied from the instruction fetch unit 201 to the data input port are registered in the tag memory 301 as the upper bit address CA. The instruction fetch address (31:0) supplied from the instruction fetch unit 201 to the data input port when a branch instruction is executed is the branch destination address of that branch instruction, and this branch destination address (31:0) is registered in the tag memory 301 as the next instruction address field NCA. Further, the lower 2 bits (4:2) of the address (31:0) of the branch instruction being executed, supplied from the instruction fetch unit 201 to the data input port, are registered in the tag memory 301 as the next address prediction instruction address field NAF. As described above, the latch output of the instruction fetch unit 201 is used as the address (31:0) of the branch instruction being executed that the instruction fetch unit 201 outputs.

[0044] The address input port of the tag memory 301 is supplied, as a tag address, with the middle 7 bits (11:5) of the instruction fetch address supplied from the instruction fetch unit 201, the middle 7 bits (11:5) of the address (31:0) of the branch instruction being executed, or the middle 7 bits (11:5) of the next instruction address field NCA. The selector 307 selects among these tag addresses. The 7-bit tag address supplied to the address input port addresses the tag memory 301 and selects one of tag entries 0 to 127. The information stored in the selected tag entry is read from the data output port of the tag memory 301.
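
The bit ranges named in the description, (31:12) for the upper bit address, (11:5) for the tag address, and (4:2) for the position within a line, correspond to plain shift-and-mask operations. The function name `split_address` below is hypothetical, and the width of the (4:2) field is taken literally from the bit range as 3 bits.

```python
def split_address(addr: int) -> tuple[int, int, int]:
    """Split a 32-bit instruction address into (upper bits, line index, word position)."""
    upper = (addr >> 12) & 0xFFFFF   # bits (31:12): compared with CA
    index = (addr >> 5) & 0x7F       # bits (11:5): selects tag entry / cache line 0..127
    word = (addr >> 2) & 0x7         # bits (4:2): instruction position within the line
    return upper, index, word


print(split_address(0x12345678))
```

The same split applies to all three address sources that feed the selector 307: a fetch address, the address of an executing branch, and a stored NCA value are each reduced to a line index in the same way.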

[0045] The valid bit V and the next address valid bit NV read from the data output port are sent directly to the cache control unit 303. The upper bit address CA read from the data output port is sent to the hit detection circuit 304, and the next instruction address field NCA is supplied to one input of each of the selectors 305 to 307. The next address prediction instruction address field NAF read from the data output port is supplied to the first input of the subtractor 308.

[0046] The instruction input port of the instruction memory 302 is connected to the cache control unit 303. During a cache refill, the eight instructions read from the main memory 30 by the cache control unit 303 are stored sequentially in the instruction memory 302.

[0047] The address input port of the instruction memory 302 is supplied with a tag address consisting of the middle 10 bits (11:2) of the instruction address from the instruction fetch unit 201. The upper 7 bits (11:5) of this tag address are used to select one of cache entries 0 to 127, and the low-order 3 bits (4:2) are used to select one of the eight instructions stored in the selected cache entry.

[0048] The instruction output port of the instruction memory 302 is connected to the instruction fetch unit 201. The instruction selected by the middle 10 bits (11:2) of the instruction address is read from this instruction output port.

[0049] The cache control unit 303 controls access to the tag memory 301 and the instruction memory 302 using the hit detection circuit 304, the selectors 305 to 307, and the subtractor 308.

[0050] The cache control unit 303 registers the valid bit V, the upper bit address CA, the next address valid bit NV, the next instruction address field NCA, and the next address prediction instruction address field NAF in the tag memory 301. The valid bit V and the upper bit address CA are registered at cache fill and cache refill. The next address valid bit NV, the next instruction address field NCA, and the next address prediction instruction address field NAF, on the other hand, are registered in response to the branch instruction execution signal output from the instruction execution unit 203.
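
The registration that happens on the branch instruction execution signal can be sketched as an update of one tag entry. Everything named here is hypothetical: the dictionary keys and the function `register_branch` are illustrative, the sketch assumes the branch's line is already present in the cache, and it derives NAF from address bits (4:2), reading that bit range as 3 bits wide.

```python
def register_branch(entry: dict, branch_addr: int, target_addr: int) -> dict:
    """Record a just-executed branch in the tag entry of the line that holds it."""
    entry["NV"] = 1                            # NCA now holds a usable destination
    entry["NCA"] = target_addr & 0xFFFFFFFF    # full 32-bit branch destination address
    entry["NAF"] = (branch_addr >> 2) & 0x7    # branch position in the line, bits (4:2) assumed
    return entry                               # V and CA are left as set at fill/refill time


entry = {"V": 1, "CA": 0x00001, "NV": 0, "NCA": 0, "NAF": 0}
register_branch(entry, branch_addr=0x1014, target_addr=0x2040)
print(entry)
```

Note that V and CA are untouched: they are written only when the line is filled or refilled, while the three prediction fields are written only when a branch in the line executes.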

[0051] The cache control unit 303 also performs the cache fill/refill operation on a cache miss, and the preceding refill operation for the instruction group that includes the branch destination instruction specified by the next instruction address field NCA.

[0052] Next, the operation of registering the next address valid bit NV, the next instruction address field NCA, and the next address prediction instruction address field NAF in the tag memory 301 will be described with reference to the flowchart of FIG. 4.

[0053] The cache control unit 303 monitors the branch instruction execution signal and determines, from whether that signal is generated, whether the instruction being executed is a branch instruction (step S11).

[0054] If the instruction being executed is a branch instruction, the cache control unit 303 causes the selector 307 to select the tag address consisting of the middle bits (11:5) of the address (31:0) of that branch instruction, which the instruction fetch unit 201 of the CPU core unit 200 is outputting at that time, and addresses the tag memory 301 with it. The cache control unit 303 then causes the selector 306 to select the upper bit address (31:12) of the branch instruction address (31:0) and has the hit detection circuit 304 compare it with the upper bit address CA output from the tag memory 301 (step S12).

[0055] Next, the cache control unit 303 examines the hit signal, which indicates the comparison result from the hit detection circuit 304, together with the valid bit V output from the tag memory 301, and determines whether the branch instruction being executed is a cache hit or a cache miss (step S13). A cache hit is determined if the comparison matches and the valid bit V = "1" (valid).
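The lookup in steps S12 and S13 amounts to indexing the tag memory with address bits (11:5) and comparing address bits (31:12) with the stored CA. The sketch below illustrates this under stated assumptions: the tag memory is modelled as a Python dictionary, and the names `Tag`, `tag_index`, `upper_bits`, and `is_hit` are hypothetical.

```python
from typing import Dict, NamedTuple


class Tag(NamedTuple):
    v: bool  # valid bit V
    ca: int  # upper bit address CA, address bits (31:12)


def tag_index(addr: int) -> int:
    """Middle bits (11:5): the tag address selected by selector 307."""
    return (addr >> 5) & 0x7F


def upper_bits(addr: int) -> int:
    """Upper bit address (31:12): the value selected by selector 306."""
    return (addr >> 12) & 0xFFFFF


def is_hit(tag_memory: Dict[int, Tag], addr: int) -> bool:
    """Hit only if the comparison matches and V is 1."""
    entry = tag_memory.get(tag_index(addr))
    if entry is None:  # assumption: an absent entry behaves like V = 0
        return False
    return entry.v and entry.ca == upper_bits(addr)
```

Two addresses that share bits (11:5) but differ in bits (31:12) map to the same entry, so only one of them can hit at a time.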

[0056] On a cache hit, the cache control unit 303 sets the next address valid bit NV, the next instruction address field NCA, and the next address prediction instruction address field NAF in the tag entry selected by the address of the branch instruction being executed (step S14).

[0057] Here, the next address valid bit NV is set to "1", indicating valid. The next instruction address field NCA is set to the instruction fetch address that the instruction fetch unit is outputting at that time, that is, the branch destination address (31:0) specified by the branch instruction being executed. The next address prediction instruction address field NAF is set to the in-entry offset address consisting of the low-order bits (4:2) of the address (31:0) of the branch instruction being executed.
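Steps S11 to S14 can be summarised in a few lines of code. This is a sketch under assumptions, not the circuit itself: the tag memory is modelled as a list of 128 entries (one per value of bits (11:5)), the names `TagEntry` and `register_branch` are hypothetical, and doing nothing on a miss in step S13 is assumed because the text describes only the hit case.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TagEntry:
    v: bool = False   # valid bit V
    ca: int = 0       # upper bit address CA, bits (31:12)
    nv: bool = False  # next address valid bit NV
    nca: int = 0      # next instruction address field NCA
    naf: int = 0      # in-entry offset of the branch, bits (4:2)


def register_branch(tags: List[TagEntry], branch_executed: bool,
                    branch_addr: int, target_addr: int) -> bool:
    """Return True when NV, NCA, and NAF were written (step S14)."""
    if not branch_executed:                                   # S11
        return False
    entry = tags[(branch_addr >> 5) & 0x7F]                   # S12: bits (11:5)
    hit = entry.v and entry.ca == ((branch_addr >> 12) & 0xFFFFF)  # S13
    if not hit:
        return False  # assumption: nothing is registered on a miss
    entry.nv = True                                           # S14
    entry.nca = target_addr
    entry.naf = (branch_addr >> 2) & 0x7
    return True
```

For a branch at address 0x00001010 with destination 0x00001080, the sketch stores NV = 1, NCA = 0x00001080, and NAF = 4.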

[0058] Thus, when a branch instruction is executed, the next instruction address field NCA indicating its branch destination address and the in-entry offset address of the branch instruction are set in the tag entry corresponding to the cache line in which the branch instruction is stored. Next, the preceding cache refill operation using the next address valid bit NV, the next instruction address field NCA, and the next address prediction instruction address field NAF will be described with reference to the flowchart of FIG. 5.

[0059] In the instruction fetch stage, the cache control unit 303 causes the selector 307 to select the tag address consisting of the middle bits (11:5) of the instruction fetch address (31:0) that the instruction fetch unit 201 of the CPU core unit 200 is outputting at that time, and addresses the tag memory 301 with it. The cache control unit 303 then causes the selector 306 to select the upper bit address (31:12) of the instruction fetch address (31:0) and has the hit detection circuit 304 compare it with the upper bit address CA output from the tag memory 301 (step S21).

[0060] Next, the cache control unit 303 examines the hit signal, which indicates the comparison result from the hit detection circuit 304, together with the valid bit V output from the tag memory 301, and determines whether the instruction being fetched is a cache hit or a cache miss (step S22). A cache hit is determined if the comparison matches and the valid bit V = "1" (valid); a cache miss is determined if the comparison does not match or the valid bit V = "0".

[0061] On a cache hit, the cache control unit 303 first checks whether the value of the next address prediction instruction address field NAF read from the tag memory 301 at that time is greater than or equal to the value of the low-order bits (4:2) of the instruction fetch address (31:0), and whether the next address valid bit NV = "1" (step S23).

[0062] Whether the value of the next address prediction instruction address field NAF is greater than or equal to the value of the low-order bits (4:2) of the instruction fetch address (31:0) is determined from the subtraction result signal output by the subtractor 308.
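One common way to derive a "greater than or equal" result from a subtractor is to look at the sign of the difference. The sketch below illustrates that idea only; the 4-bit two's-complement width and the function name `naf_at_or_after` are assumptions, since the text does not describe the internals of the subtractor 308.

```python
def naf_at_or_after(naf: int, fetch_addr: int) -> bool:
    """True when NAF >= the in-line offset (bits 4:2) of the fetch address."""
    offset = (fetch_addr >> 2) & 0x7
    # Assumed 4-bit two's-complement subtraction: both operands are 0..7,
    # so the difference lies in -7..7 and bit 3 acts as the sign bit.
    diff = (naf - offset) & 0xF
    negative = bool(diff & 0x8)
    return not negative
```

With NAF = 4, a fetch at 0x00001010 (offset 4) still satisfies the condition, while a fetch at 0x00001014 (offset 5) does not.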

[0063] If the next address valid bit NV = "1" and the value of the next address prediction instruction address field NAF is greater than or equal to the value of the low-order bits (4:2) of the instruction fetch address (31:0), the cache control unit 303 causes the selector 307 to select the tag address consisting of the middle bits (11:5) of the next instruction address field NCA being read from the tag memory 301 at that time, and addresses the tag memory 301 with it. The cache control unit 303 then causes the selector 306 to select the upper bit address (31:12) of the next instruction address field NCA (31:0) and has the hit detection circuit 304 compare the selected upper bit address with the upper bit address CA output from the newly addressed entry of the tag memory 301 (step S24).

[0064] Next, the cache control unit 303 examines the hit signal, which indicates the comparison result from the hit detection circuit 304, together with the valid bit V output from the tag memory 301, and determines whether the branch destination instruction specified by the next instruction address field NCA is a cache hit or a cache miss (step S25). A cache hit is determined if the comparison matches and the valid bit V = "1" (valid); a cache miss is determined if the comparison does not match or the valid bit V = "0".

[0065] On a cache miss, the cache control unit 303 performs the preceding cache refill operation, which transfers the block containing the branch destination instruction specified by the next instruction address field NCA from the main memory 30 to the instruction memory 302 (step S26).

[0066] Thus, the preceding cache refill operation is executed on the condition that the cache line accessed in the instruction fetch stage holds a branch instruction whose address is greater than or equal to the instruction fetch address, and that its branch destination instruction is not present in the instruction cache.
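The decision made in steps S21 to S26 can be condensed into one function that returns the block to refill ahead of time, or nothing. This is a minimal sketch under assumptions: the tag memory is a list of 128 entries, a block is 32 bytes (eight 4-byte instructions), the ordinary refill on a fetch miss is not modelled, and all names (`TagEntry`, `is_hit`, `preceding_refill_block`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

LINE_BYTES = 32  # assumed block size: 8 instructions of 4 bytes each


@dataclass
class TagEntry:
    v: bool = False   # valid bit V
    ca: int = 0       # upper bit address CA, bits (31:12)
    nv: bool = False  # next address valid bit NV
    nca: int = 0      # next instruction address field NCA
    naf: int = 0      # in-line offset of the branch, bits (4:2)


def index_of(addr: int) -> int:
    return (addr >> 5) & 0x7F


def is_hit(tags: List[TagEntry], addr: int) -> bool:
    entry = tags[index_of(addr)]
    return entry.v and entry.ca == ((addr >> 12) & 0xFFFFF)


def preceding_refill_block(tags: List[TagEntry],
                           fetch_addr: int) -> Optional[int]:
    """Return the block address to refill in advance, or None."""
    if not is_hit(tags, fetch_addr):                 # S21-S22
        return None  # ordinary miss handling is outside this sketch
    entry = tags[index_of(fetch_addr)]
    offset = (fetch_addr >> 2) & 0x7
    if not (entry.nv and entry.naf >= offset):       # S23
        return None
    if is_hit(tags, entry.nca):                      # S24-S25
        return None  # destination already cached
    return entry.nca & ~(LINE_BYTES - 1)             # S26
```

As a usage illustration with made-up tag contents: if the entry for the fetched line holds NV = 1, NCA = 0x00001080, NAF = 4 and the line for 0x00001080 is invalid, a fetch at 0x00001004 yields block 0x00001080, whereas a fetch at 0x00001014 (past the branch) yields nothing.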

[0067] Next, the operation of the instruction cache 300 will be described with reference to FIG. 6, taking a specific instruction execution sequence as an example.

[0068] The example considered here repeats the sequence from address 00001004 to address 00001024 a total of 00001000 (hexadecimal) times and calls the subroutine at address 00001080 from within that loop.

[0069] In the following, consider the case where the program of FIG. 6 is executed starting with the instruction cache 300 empty.

[0070] First, the instruction fetch unit 201 outputs an instruction fetch address designating address 00001000. Because the instruction cache 300 is empty at this point, a cache miss occurs. In response to this cache miss, the cache controller 303 refills cache line 0 of the instruction memory 302 with the eight instructions in the block of the main memory 30 from address 00001000 to address 0000101c. Instructions are then executed in sequence, and the branch instruction at address 00001010 (call 0x1080) branches to address 00001080. When this branch instruction is executed, the cache controller 303 writes the following information to tag entry 0 of the tag memory 301.

[0071] NV = 1 (valid), NCA = 00001080, NAF = 4
The value 4 written to NAF indicates that the branch instruction (call) is the fourth instruction of cache line 0.

[0072] When the call instruction branches to address 00001080, a cache miss occurs. In response to this cache miss, the cache controller 303 fills cache line 3 of the instruction memory 302 with the eight instructions in the block of the main memory 30 from address 00001080 to address 0000109c. The instructions of the subroutine are then executed in sequence, and the return instruction at address 00001094 returns to address 00001014. At this time, the following information is written to tag entry 3 of the tag memory 301, which corresponds to cache line 3 of the instruction memory 302 containing the return instruction.

[0073] NV = 1 (valid), NCA = 00001014, NAF = 5
When the return instruction returns to address 00001014, the instruction fetch at that point is a cache hit, but as execution proceeds a cache miss occurs when the instruction at address 00001020 is fetched. In response to this cache miss, the cache controller 303 fills cache line 1 of the instruction memory 302 with the eight instructions in the block of the main memory 30 from address 00001020 to address 0000103c.

[0074] After that, execution of the branch instruction (bl: branch if less) at address 00001024 branches to address 0x1004. At that time, the contents of tag entry 1 of the tag memory 301, which corresponds to cache line 1 of the instruction memory 302 containing the branch instruction (bl), are set as follows.

[0075] NV = 1 (valid), NCA = 00001004, NAF = 1
When execution of the branch instruction (bl) returns to address 00001004, the instruction fetch at that point is a cache hit. At this time, the cache controller 303 inspects NV, NCA, and NAF of tag entry 0 of the tag memory and finds that NV = 1, NCA = 00001080, and NAF = 4.

[0076] Here, the offset address (entry number) of the fetched instruction's address 00001004 is 2. The fetched instruction is therefore located at or before the offset address of the branch instruction (call) contained in cache line 0. Since NV is also 1, these conditions establish that a preceding cache refill of the next address is valid.

[0077] The cache controller 303 then checks whether the instruction at the branch destination address 00001080 specified by NCA is a cache hit. In this example it hits, so no preceding refill is executed. In a more complex program, however, or when the state of the cache changes because interrupt processing occurs partway through owing to a task switch or the like, a cache miss occurs (for example, because V = 0) and the preceding refill is started. In that case, the refill of the instruction group containing the branch destination instruction of the branch instruction (call) begins before the branch instruction (call) is fetched. The contents of the instruction cache 300 after the sequence of FIG. 6 has been executed are as shown in FIG. 7.

[0078] FIG. 8 compares the operation timing of branch instruction processing in a conventional processor with that in a processor equipped with the instruction cache 300 of the present invention.

[0079] Consider a sequence that fetches instructions consecutively from address 101c, executes the branch instruction at address 1030, and branches to address 2000. The branch instruction at address 1030 is assumed to be the fifth instruction of cache line 0 of the instruction memory.

[0080] It is also assumed that no branch prediction mechanism is used, that the branch destination address is calculated in the decode (D) cycle of the branch instruction, and that the fetch of the branch destination begins in the next cycle. It is further assumed that the branch destination, address 2000, causes a cache miss and that the cache refill takes 7 cycles.

[0081] In the conventional processor, the branch instruction at address 1030 is fetched in cycle 5 and decoded in cycle 6, during which the branch destination address is also calculated, and the instruction fetch of the branch destination address (2000) starts in cycle 7. The branch destination instruction causes a cache miss, and the refill completes in cycle 13. Decoding of the branch destination instruction therefore starts in cycle 14.

[0082] Thus, when the cache refill is performed only after the branch destination address has been calculated, instruction execution by the processor is suspended for the duration of the refill.

[0083] When the instruction cache 300 of the present invention is used, on the other hand, NV, NCA, and NAF in the cache tag are inspected at the point when the instruction at address 1020, which is a cache line boundary, is fetched. If NV is "1" and the in-line offset address of the instruction fetch address is less than or equal to NAF, the preceding refill of the instruction group containing NCA (= address 2000) starts in cycle 2. The refill ends in cycle 8, and the branch destination instruction at address 2000 becomes a hit.

[0084] Instruction execution by the processor continues during the preceding refill: the branch instruction is fetched in cycle 5, the branch destination address is calculated in cycle 6, and the instruction fetch of address 2000 starts in cycle 7. At this point the refill of address 2000 that was started in advance is still in progress, so the fetch is held until cycle 8, and the instruction is fetched at the end of cycle 8. As a result, decoding of the branch destination instruction can start in cycle 9.
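The cycle counts in this comparison follow from simple arithmetic, which the sketch below reproduces. It is a simplified timing model built on assumptions: a refill occupies 7 consecutive cycles including its start cycle, decode begins in the cycle after the fetch completes, and the function names are hypothetical.

```python
REFILL_CYCLES = 7  # refill duration assumed in the example


def decode_start_conventional(target_fetch_cycle: int) -> int:
    """Refill starts only when the branch destination fetch misses."""
    refill_done = target_fetch_cycle + REFILL_CYCLES - 1
    return refill_done + 1


def decode_start_with_preceding_refill(refill_start_cycle: int,
                                       target_fetch_cycle: int) -> int:
    """Refill starts early; the fetch waits only if it is still running."""
    refill_done = refill_start_cycle + REFILL_CYCLES - 1
    fetch_done = max(target_fetch_cycle, refill_done)
    return fetch_done + 1


conventional = decode_start_conventional(target_fetch_cycle=7)
improved = decode_start_with_preceding_refill(refill_start_cycle=2,
                                              target_fetch_cycle=7)
print(conventional, improved, conventional - improved)  # 14 9 5
```

Under this model the branch destination decode moves from cycle 14 to cycle 9, a difference of 5 cycles for the sequence considered.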

[0085] In the refill operation, the instruction group is read from the main memory 30 into the instruction memory 302 by a burst transfer of eight instructions. If the branch destination instruction has already been read into the instruction memory 302 by cycle 7, it is therefore also possible to let the fetch proceed without waiting until cycle 8.

[0086] As described above, in this embodiment, if the next address valid bit NV of the tag memory 301 is 1 when a cache line is accessed, the branch destination address of the branch instruction present in that cache line can be derived immediately, and if the branch destination instruction is not in the cache, a refill can be started immediately. The branch destination can thus be refilled before the branch instruction is executed, which shortens the stall time on a cache miss. In addition, when the CPU actually fetches the branch instruction and the branch destination instruction is a cache hit, that instruction can be sent to the CPU immediately after the branch instruction, so the branch destination address calculation cycle of the CPU can be omitted.

【００８７】[0087]

[Effects of the Invention] As described above, according to the present invention, if the next address valid bit of the tag memory is 1 when a cache line is accessed by an instruction access from the CPU, the branch destination address of the branch instruction present in that cache line can be derived immediately, before the branch instruction is executed, and if the branch destination instruction is not in the cache, a refill can be started immediately. The branch destination can thus be refilled before the branch instruction is executed, which shortens the stall time on a cache miss. In addition, when the CPU actually fetches the branch instruction and the branch destination instruction is a cache hit, that instruction can be sent to the CPU immediately after the branch instruction, so the branch destination address calculation cycle of the CPU can be omitted. The present invention is therefore highly effective in speeding up the execution of branch instructions by the CPU.

[Brief description of drawings]

FIG. 1 is a block diagram showing the overall configuration of a microprocessor incorporating a cache system according to an embodiment of the present invention.

FIG. 2 is a diagram showing the relationship between the tag memory and the instruction memory that constitute the instruction cache provided in the cache system of FIG. 1.

FIG. 3 is a diagram showing a specific circuit configuration of the instruction cache provided in the cache system of FIG. 1.

FIG. 4 is a flowchart for explaining the tag information registration operation in the cache system of FIG. 1.

FIG. 5 is a flowchart for explaining the preceding cache refill operation in the cache system of FIG. 1.

FIG. 6 is a diagram showing an example of an instruction execution sequence for specifically explaining the operation of the cache system of FIG. 1.

FIG. 7 is a diagram showing the contents of the tag memory after the instruction execution sequence of FIG. 6 has been executed.

FIG. 8 is a diagram showing the operation timing of branch instruction processing by the processor of FIG. 1.

[Explanation of symbols]

30 ... main memory, 100 ... microprocessor, 200 ... CPU core unit, 201 ... instruction fetch unit, 202 ... instruction decode unit, 203 ... instruction execution unit, 204 ... data write unit, 300 ... instruction cache, 301 ... tag memory, 302 ... instruction memory, 303 ... cache control unit, 400 ... data cache, V ... valid bit, CA ... upper bit address (block address), NV ... next address valid bit, NCA ... next instruction address field, NAF ... next address prediction instruction address field.

Claims

[Claims]

1. A cache system having an instruction memory with a plurality of cache lines, each storing an instruction group of a different block in a main memory, and a tag memory with a plurality of entries, each storing the block address of the instruction group stored in the corresponding cache line of the instruction memory, the cache system comprising: means for registering, in response to execution of a branch instruction by a CPU, the branch destination address designated by the branch instruction in the entry of the tag memory corresponding to the cache line of the instruction memory in which the branch instruction is stored; and branch destination instruction prefetching means for determining, in response to an instruction fetch from the instruction memory by the CPU, a cache hit or cache miss for the instruction at the branch destination address registered in the entry of the tag memory corresponding to the cache line in which the fetched instruction is stored, and, on a cache miss, reading the instruction group including the instruction at the branch destination address registered in that entry of the tag memory from the main memory and storing it in the instruction memory.

2. The cache system according to claim 1, further comprising: means for registering, in response to execution of a branch instruction by the CPU, an offset address indicating the entry number, within the cache line of the instruction memory, at which the branch instruction is stored, in the entry of the tag memory corresponding to that cache line; means for determining, in response to an instruction fetch from the instruction memory by the CPU, whether the fetched instruction is an instruction executed before the branch instruction, by referring to the offset address registered in the entry of the tag memory corresponding to the cache line in which the fetched instruction is stored; and means for permitting execution of the prefetch processing by the branch destination instruction prefetching means only when the fetched instruction is determined to be an instruction executed before the branch instruction.

3. The cache system according to claim 1, further comprising: means for registering, in response to execution of a branch instruction by the CPU, an offset address indicating the entry number, within the cache line of the instruction memory, at which the branch instruction is stored, in the entry of the tag memory corresponding to that cache line; means for determining, in response to an instruction fetch from the instruction memory by the CPU, whether the fetched instruction is the branch instruction, by referring to the offset address registered in the entry of the tag memory corresponding to the cache line in which the fetched instruction is stored; and means for transferring to the CPU, following the branch instruction, the branch destination instruction designated by the branch destination address registered in the entry of the tag memory when the fetched instruction is determined to be the branch instruction.

4. A cache system comprising: an instruction memory having a plurality of cache lines, each storing an instruction group of a different block in a main memory; a tag memory having a plurality of entries respectively corresponding to the plurality of cache lines of the instruction memory, each entry of the tag memory having a first field holding the block address of the instruction group stored in the corresponding cache line of the instruction memory, a second field holding a valid bit indicating that the instruction group stored in the corresponding cache line of the instruction memory is valid, a third field holding a branch destination address designated by a branch instruction included in the instruction group stored in the corresponding cache line of the instruction memory, a fourth field holding a branch destination address valid bit indicating that the branch destination address is valid, and a fifth field holding an offset address indicating the entry number, within the cache line, at which the branch instruction is stored; means for registering, in response to execution of a branch instruction by a CPU, the branch destination address designated by the branch instruction in the third field of the entry of the tag memory corresponding to the cache line of the instruction memory in which the branch instruction is stored; and branch destination instruction prefetching means for determining, in response to an instruction fetch from the instruction memory by the CPU, a cache hit or cache miss for the instruction at the branch destination address registered in the third field of the entry of the tag memory corresponding to the cache line in which the fetched instruction is stored, and, on a cache miss, reading the instruction group including the instruction at the branch destination address registered in the third field from the main memory and storing it in the instruction memory.

5. The cache system according to claim 4, further comprising: means for registering, in response to execution of the branch instruction by the CPU, the branch destination address valid bit and the offset address of the branch instruction in the fourth and fifth fields, respectively, of the entry of the tag memory corresponding to the cache line of the instruction memory in which the branch instruction is stored; means for determining, in response to an instruction fetch from the instruction memory by the CPU, whether the branch destination address in the third field is valid and whether the fetched instruction is an instruction executed before the branch instruction, by referring to the fourth and fifth fields of the entry of the tag memory corresponding to the cache line in which the fetched instruction is stored; and means for permitting execution of the prefetch processing by the branch destination instruction prefetching means when the branch destination address is determined to be valid and the fetched instruction is determined to be an instruction executed before the branch instruction.

6. The cache system according to claim 4, wherein the branch destination instruction prefetching means comprises: means for addressing the tag memory by the upper bit part of the branch destination address registered in the third field and reading the block address registered in the first field of the entry of the tag memory designated by the branch destination address; means for comparing the read block address with the upper bit part of the branch destination address and determining a cache hit or cache miss for the instruction at the branch destination address; and means for reading, when a cache miss is determined, the instruction group including the instruction at the branch destination address registered in the third field from the main memory and storing it in the instruction memory.