JPWO2004077299A1 - Cache memory

Info

Publication number: JPWO2004077299A1
Application number: JP2004568749A
Authority: JP
Inventors: 誠司後藤
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2003-02-27
Filing date: 2003-02-27
Publication date: 2006-06-08
Also published as: WO2004077299A1

Abstract

The cache memory comprises a CAM unit, which has a CAM configuration and stores head pointers each indicating the head address of a stored data block; a pointer map memory, which stores the chain of connections between the pointers that, starting from a head pointer stored in the CAM unit, indicate the addresses of the data making up the data block; and a pointer data memory, which stores the data at the address indicated by each pointer. Because the connections between pointers can be set freely, the size of the data blocks stored in the cache memory can be set freely, and the utilization of the cache memory can be improved.

Description

The present invention relates to the configuration of a cache memory.

Instruction cache memories used by processors (temporary storage that holds instruction data from main memory and mitigates memory access latency) mainly use the direct-mapped scheme or the N-way set-associative scheme. In these schemes, the cache is looked up using the index of the access address (the low-order address bits corresponding to the cache entry number), and a match is determined using the tag (the memory address bits above those that cover the number of cache entries, together with a valid bit). Because two or more programs that share a particular index (N + 1 or more in the N-way set-associative scheme) cannot reside in the cache at the same time, cache utilization suffers.
FIG. 1 is a diagram showing the conceptual configuration of a conventional cache memory that uses the direct-mapped scheme.
In the direct-mapped cache memory, the index (the address identifying a storage area of the cache memory) is a two-digit hexadecimal number (the prefix 0x denotes a hexadecimal number; in the figure, indices from 00 to ff are provided). The entry identified by one index is 0x40 bytes, that is, 64 bytes, long. In the figure, the lower two digits of the hexadecimal main memory address determine which cache entry stores the data at that address. For example, the data at main memory address 0x0000 has 00 as its lower two digits and is stored in the entry with index 0x00. Data whose main memory address ends in 80 is stored in the entry with index 0x02. Because the storage location in the cache memory is decided by looking only at the lower two digits of the main memory address, the data at main memory addresses 0x1040 and 0x0040 cannot both be stored, as the figure shows: the cache memory has only one entry with index 0x01. Only one of them can be stored, and if the processor requests the one that was not stored, a cache miss occurs and main memory must be accessed again.
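The index conflict described above can be reproduced in a few lines. The following is a minimal sketch that takes the text literally: only the lower two hexadecimal digits of the address are examined, and each entry is 0x40 bytes long. The names `ENTRY_BYTES`, `index_of`, and `DirectMappedCache` are illustrative assumptions and do not appear in the patent.

```python
ENTRY_BYTES = 0x40  # entry length given in the text (64 bytes)


def index_of(addr):
    """Cache index derived from the lower two hex digits, as the text describes."""
    return (addr & 0xFF) // ENTRY_BYTES


class DirectMappedCache:
    """Toy direct-mapped cache: one entry per index (illustrative only)."""

    def __init__(self):
        self.lines = {}  # index -> base address of the line currently held

    def access(self, addr):
        """Return True on a hit; on a miss, replace whatever holds the index."""
        base = addr - addr % ENTRY_BYTES
        idx = index_of(addr)
        hit = self.lines.get(idx) == base
        self.lines[idx] = base
        return hit


cache = DirectMappedCache()
# 0x0040 and 0x1040 share index 0x01, so each access evicts the other.
results = [cache.access(a) for a in (0x0040, 0x1040, 0x0040)]
print(index_of(0x0000), index_of(0x0080), index_of(0x1040), results)
```

In this sketch every access in the sequence misses, because the two addresses keep displacing each other from the single entry with index 0x01.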
FIG. 2 is a conceptual configuration diagram of a conventional 2-way set-associative cache memory.
In this case too, only the lower two digits of the main memory address are examined to decide which cache entry stores the data, but two entries are provided for each index (called way 1 and way 2). A cache miss is therefore less likely than with a direct-mapped cache memory. Even so, three or more data items whose main memory addresses share the same lower two digits cannot be stored, so cache misses still occur.
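A small sketch makes the limit of two ways concrete. It assumes the same literal indexing as the text (lower two hexadecimal digits, 0x40-byte entries) and a least-recently-used replacement policy; the policy and all names are assumptions, since the text does not specify how a way is chosen for replacement.

```python
from collections import OrderedDict

ENTRY_BYTES = 0x40  # entry length given in the text
WAYS = 2            # way 1 and way 2


def index_of(addr):
    """Index from the lower two hex digits, following the text."""
    return (addr & 0xFF) // ENTRY_BYTES


class SetAssociativeCache:
    """Toy N-way cache; LRU replacement is an assumption of this sketch."""

    def __init__(self, ways=WAYS):
        self.ways = ways
        self.sets = {}  # index -> OrderedDict of line base addresses, oldest first

    def access(self, addr):
        base = addr - addr % ENTRY_BYTES
        lines = self.sets.setdefault(index_of(addr), OrderedDict())
        if base in lines:
            lines.move_to_end(base)      # mark as most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)    # evict the least recently used line
        lines[base] = True
        return False


cache = SetAssociativeCache()
# Two lines with the same index now coexist ...
two = [cache.access(a) for a in (0x0040, 0x1040, 0x0040, 0x1040)]
# ... but a third line with that index forces an eviction.
three = [cache.access(a) for a in (0x2040, 0x0040)]
print(two, three)
```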
FIG. 3 is a conceptual configuration diagram of a conventional content-addressable memory.
If a content-addressable memory (CAM) is used, the number of ways N can be made equal to the number of entries, which solves the utilization problem, but the larger circuit increases cost.
The case shown in the figure is equivalent to a 256-way set-associative cache memory. That is, if there are 256 main memory addresses whose lower two digits are the same, all of that main memory data can be stored in the cache memory. Data from main memory can therefore always be stored in the cache memory, and consequently no cache miss occurs. However, providing enough cache memory to hold all of the main memory data requires correspondingly more hardware, and the many ways must also be controlled, so the cache memory itself becomes expensive.
For the cache memory configurations described above, refer to the following reference.
"Computer Architecture", Chapter 8, "Design of the Memory Hierarchy", Nikkei BP, ISBN 4-8222-7152-8
FIG. 4 is a configuration diagram of the data access mechanism of a conventional 4-way set-associative cache memory.
The instruction access request/address (1) from the program counter is sent to the instruction access MMU 10, converted into a physical address (8), and then sent as an address to the cache tags 12-1 to 12-4 and the cache data 13-1 to 13-4. If, among the tag outputs retrieved with the same lower address (index), there is one whose upper address bits (tag) match the request address from the instruction access MMU 10, valid data exists in the cache data 13-1 to 13-4 (a hit). The comparator 15 performs this match detection and at the same time drives the selector 16 with the hit information (4). On a hit, the data is sent to the instruction buffer as instruction data (5). On a miss, a cache miss request (3) is output to the secondary cache. The cache miss request (3) consists of the request itself (3)-1 and the miss address (3)-2. The data returned from the secondary cache then updates the cache tags 12-1 to 12-4 and the cache data 13-1 to 13-4, and is likewise returned to the instruction buffer. When the cache tags 12-1 to 12-4 and the cache data 13-1 to 13-4 are updated, the write address (7) is output from the instruction access MMU 10. The updates are performed by the tag update control unit 11 and the data update control unit 14. In an N-way configuration, the comparator 15 and the selector 16 each have N inputs. In a direct-mapped configuration, the selector is unnecessary.
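The tag comparison and way selection described for FIG. 4 can be sketched as follows. This is an illustrative software model, not the circuit itself: the number of sets, the line size, and the names `split`, `fill`, and `lookup` are assumptions, and the four comparisons that hardware performs in parallel are written here as a list comprehension.

```python
NUM_WAYS = 4      # 4-way configuration of FIG. 4
NUM_SETS = 256    # assumed number of indices
LINE_BYTES = 64   # assumed line size


def split(addr):
    """Split a physical address into (tag, index)."""
    line = addr // LINE_BYTES
    return line // NUM_SETS, line % NUM_SETS


# cache_tags[way][index] holds (valid, tag); cache_data[way][index] holds the line.
cache_tags = [[(False, 0)] * NUM_SETS for _ in range(NUM_WAYS)]
cache_data = [[None] * NUM_SETS for _ in range(NUM_WAYS)]


def fill(way, addr, line):
    """Update tag and data for one way, as the update control units would."""
    tag, index = split(addr)
    cache_tags[way][index] = (True, tag)
    cache_data[way][index] = line


def lookup(addr):
    """Return the cached line on a hit, or None on a miss."""
    tag, index = split(addr)
    # Comparator: one tag comparison per way, all at the same index.
    hits = [valid and stored == tag
            for valid, stored in (cache_tags[w][index] for w in range(NUM_WAYS))]
    # Selector: pass through the data of the way that hit, if any.
    for way, hit in enumerate(hits):
        if hit:
            return cache_data[way][index]
    return None


fill(2, 0x1040, "line@0x1040")
print(lookup(0x1044), lookup(0x5040))
```

Here 0x5040 maps to the same index as 0x1040 but carries a different tag, so the lookup reports a miss even though the index is occupied.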
Japanese Patent Laid-Open No. 11-328014 discloses a technique that, in order to raise the utilization of the cache memory, sets an appropriate block size for each address space so as to accommodate differences in the extent of spatial locality across the address space.
Japanese Patent Laid-Open No. 2001-297036 describes a technique for providing a RAM set cache that can be used together with the direct-mapped or set-associative scheme. The RAM set cache is provided so as to form one way of the set-associative scheme and is read and written in units of lines.

An object of the present invention is to provide a low-cost cache memory with high utilization.
The cache memory of the present invention is characterized by comprising: head pointer storage means for storing a head pointer corresponding to the head address of a stored data block; pointer map storage means for storing pointers corresponding to the addresses at which the data constituting the data block is stored, together with the connection relationships between those pointers starting from the head pointer; and pointer data storage means for storing the data stored at the addresses corresponding to the pointers.
According to the present invention, data is stored as blocks by storing the connection relationships between pointers. Changing those connection relationships therefore makes it possible to store variable-length data blocks.
In other words, compared with conventional schemes in which the unit of the stored data block is fixed, the storage capacity of the cache memory can be used as fully as possible, and the cache can adapt flexibly both to cases that call for large blocks and to cases in which small blocks suffice. Cache utilization therefore rises and, as a result, the likelihood of a cache miss can be reduced.

FIG. 1 is a diagram showing the conceptual configuration of a conventional cache memory that uses the direct-mapped scheme.
FIG. 2 is a conceptual configuration diagram of a conventional 2-way set associative cache memory.
FIG. 3 is a conceptual configuration diagram of a conventional associative memory.
FIG. 4 is a configuration diagram of a data access mechanism of a conventional 4-way set associative cache memory.
FIGS. 5 and 6 are diagrams for explaining the concept of the present invention.
FIG. 7 is an overall configuration diagram including the present invention.
FIG. 8 is a configuration diagram of an embodiment of the present invention.
FIG. 9 shows a configuration in which the page management mechanism of the processor's instruction access MMU and the CAM are shared.
FIGS. 10 to 13 are diagrams for explaining the operation of the embodiment of the present invention.

FIGS. 5 and 6 are diagrams for explaining the concept of the present invention.
The present invention focuses on the fact that a processor typically executes instructions not in units of a single cache entry but in runs of several blocks to several tens of blocks or more. If a CAM could be applied to every entry the problem would be solved, but, as noted above, that is costly. The CAM is therefore applied per instruction block rather than per cache entry. Specifically, only the information about an instruction block (its head address, its size, and the number of its head pointer) is held in the CAM (see FIG. 5). The instruction data itself is stored in a FIFO-structured pointer memory indicated by the head pointer (see FIG. 6). The pointer memory consists of two memories, a pointer map memory and a pointer data memory: the pointer map memory holds the connection information between pointers, and the pointer data memory holds the data for each pointer, so that multiple FIFOs can be built virtually on a single memory. In other words, the storage area is a contiguous area such as a RAM, but the continuity of the data is maintained by holding the pointer connection information. The data indicated by a chain of connected pointers thus forms one block, and data is stored block by block in the cache memory of the embodiment. In particular, the cache memory of the embodiment can freely change the block size of the stored data by manipulating the pointer connection information; multiple physical FIFOs are not provided.
Reading the instruction cache in the present invention proceeds as follows: (1) the CAM is looked up by address to obtain the pointer at which the head address of the block containing the data to be accessed is stored; (2) the pointer to the block containing the data to be accessed is obtained from the pointer map memory; (3) the instruction data to be accessed is read from the instruction data block at the address indicated by that pointer in the pointer data memory; and (4) the instruction is executed. This yields the same cache utilization as implementing a data storage area of a different length for each instruction block. Moreover, because there is less index information than when a CAM is used for every entry, the circuit is comparatively smaller.
On a cache miss, a tag is set in the CAM and, at the same time, a free pointer is supplied by a free pointer supply unit (not shown), and the data from memory is written to the pointer memory entry indicated by the free pointer. When the processor requests continued access, another free pointer is supplied, the data is likewise written to the cache, and the second pointer is added to the pointer queue. When free pointers run out, an instruction block is discarded, for example by discarding the oldest data first, to secure free pointers.
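The three memories and read steps (1) to (3) can be modeled in software as follows. This is a minimal sketch under stated assumptions: each pointer is taken to hold 64 bytes, the CAM is modeled as a list that is searched by address range, and the contents are made-up placeholders. The names `cam`, `pointer_map`, `pointer_data`, and `read` are illustrative. Note how two blocks of different sizes (three entries and two entries) share one pointer memory with interleaved pointers.

```python
ENTRY_BYTES = 64  # assumed amount of data held per pointer

# CAM: one record per instruction block (head address, size in entries, head pointer).
cam = [
    {"head_addr": 0x1000, "size": 3, "head_ptr": 0},
    {"head_addr": 0x8000, "size": 2, "head_ptr": 1},
]
# Pointer map memory: pointer -> next pointer in the same block (None ends the chain).
pointer_map = {0: 2, 2: 3, 3: None, 1: 4, 4: None}
# Pointer data memory: pointer -> data stored for that pointer (placeholder strings).
pointer_data = {0: "A0", 2: "A1", 3: "A2", 1: "B0", 4: "B1"}


def read(addr):
    """Return the data covering addr, or None on a miss."""
    # (1) Search the CAM for the block whose address range contains addr.
    for block in cam:
        offset = addr - block["head_addr"]
        if 0 <= offset < block["size"] * ENTRY_BYTES:
            # (2) Follow the pointer map from the head pointer to the wanted entry.
            ptr = block["head_ptr"]
            for _ in range(offset // ENTRY_BYTES):
                ptr = pointer_map[ptr]
            # (3) Read the data for that pointer from the pointer data memory.
            return pointer_data[ptr]
    return None  # miss: no registered block covers this address


print(read(0x1000), read(0x1084), read(0x8040), read(0x10C0))
```

Changing only `pointer_map` and the `size` field is enough to lengthen or shorten a block, which is the property the text relies on for variable-length storage.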
FIG. 7 is an overall configuration diagram including the present invention.
The figure shows an outline of a microprocessor, which operates as follows.
1) Instruction fetch
An instruction to be executed is obtained from the external bus via the external bus interface 20. First, the instruction buffer 22 is checked for the instruction pointed to by the program counter 21; if it is absent, the instruction buffer 22 sends an instruction fetch request to the instruction access MMU 23. The instruction access MMU 23 converts the logical address used by the program into a physical address, which depends on the hardware mapping. The instruction access primary cache tag 24 is searched with that address; on a match, the data is present in the instruction access primary cache data 25, so the read address is sent and the instruction data is returned to the instruction buffer 22. If it is not present, the secondary cache tag 26 is searched next, and if it is still not present a request is issued to the external bus; the returned data is then filled in turn into the secondary cache data 27 and the instruction access primary cache data 25. The fill is recorded by updating the secondary cache tag 26 and the instruction access primary cache tag 24. The filled data is stored in the instruction buffer 22 in the same way as data that was already present in the instruction access primary cache data 25.
2) Instruction execution
The instruction sequence stored in the instruction buffer 22 is sent to the execution unit 28 and forwarded to the arithmetic unit 29 or the load/store unit 30 according to the type of each instruction. For arithmetic and branch instructions, the output of the arithmetic unit 29 is recorded in the general-purpose register file 31, or the program counter (not shown) is updated. A load/store instruction accesses, in the load/store unit 30, the data access MMU 32, the data access primary cache tag 33, and the data access primary cache data 34 in sequence, as with instruction access; a load instruction copies the data to the general-purpose register file 31, and a store instruction copies data from the general-purpose register file 31 to that entry. If the data is not in the primary cache, it is obtained from the secondary cache shared with the instruction execution mechanism, or from the external bus, and the instruction is executed in the same way. When execution completes, the program counter is either incremented or updated to the branch target address, and the instruction fetch of 1) is performed again.
3) Overall
The microprocessor operates by repeating instruction fetch and instruction execution in this way. The present invention provides a new configuration for the instruction access MMU 23, the instruction access primary cache tag 24, and the instruction access primary cache data 25, which are enclosed by the dotted line.
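The fetch path of step 1), in which a miss falls through from the primary cache to the secondary cache and then to the external bus, can be sketched as below. Plain dictionaries stand in for each tag and data pair, address translation is omitted, and `external_memory`, `l1_cache`, `l2_cache`, and `fetch` are hypothetical names used only for this illustration.

```python
# Placeholder contents of external memory: one string per 4-byte instruction slot.
external_memory = {addr: f"insn@{addr:#06x}" for addr in range(0, 0x100, 4)}
l1_cache = {}  # stands in for primary cache tag 24 and data 25
l2_cache = {}  # stands in for secondary cache tag 26 and data 27
trace = []     # records where each request was satisfied


def fetch(addr):
    """Return the instruction at addr, filling the caches on the way back."""
    if addr in l1_cache:
        trace.append("L1 hit")
        return l1_cache[addr]
    if addr in l2_cache:
        trace.append("L2 hit")
        data = l2_cache[addr]
    else:
        trace.append("external bus")
        data = external_memory[addr]
        l2_cache[addr] = data   # fill the secondary cache first
    l1_cache[addr] = data       # then the primary cache
    return data


first = fetch(0x10)
second = fetch(0x10)
print(first, second, trace)
```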
FIG. 8 is a configuration diagram of an embodiment of the present invention.
The instruction access request/address from the program counter is sent to the instruction access MMU 23, converted into a physical address, and then sent to the CAM 41 as an address. The CAM 41 outputs the tag, size, and head pointer data. The address/size determination and hit determination block 42 searches for the finally desired pointer; if that pointer exists, the pointer data is read and sent to the instruction buffer (not shown) as instruction data (1). If it does not exist, a cache miss request (2) is output to the secondary cache. The data returned from the secondary cache then passes through the block head determination block 43. If the returned data is a head instruction, the CAM 41 is updated; otherwise, the pointer map memory 44 and the CAM size information 42 are updated. The pointer data memory 45 is updated as well, and the data is likewise returned to the instruction buffer. When writing, the block head determination block 43 is supplied with a free pointer from the free pointer FIFO 46; if the free pointers are exhausted, the free pointer FIFO 46 sends an instruction to the discard pointer selection control block 47, which issues a discard instruction for an arbitrary CAM entry. The output is invalidated by the address/size determination and hit determination block 42 and returned to the free pointer FIFO 46.
FIG. 9 shows a configuration in which the page management mechanism of the processor's instruction access MMU and the CAM are shared.
In the figure, the same components as in FIG. 8 are denoted by the same reference numerals, and their description is omitted.
In this configuration, the MMU's address translation unit (the page) and the cache's management unit are made the same size, so that the CAM inside the MMU can serve both functions and the amount of CAM is reduced (50 in the figure). That is, the instruction access MMU has a table for translating virtual addresses into physical addresses; this table and the CAM table are merged into a single table, and the mechanism of the instruction access MMU performs operations such as the CAM search. Because a single piece of hardware then serves as the table search mechanism for both the instruction access MMU and the CAM, hardware is reduced.
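One way to picture the merged table is a single lookup that returns both the translation and the cache block information. The sketch below is an interpretation rather than the patent's circuit: the 4096-byte page size, the field names, and the function `translate_and_lookup` are all assumptions made for illustration.

```python
PAGE_BYTES = 4096  # assumed page size, equal to the cache management unit

# Merged table: virtual page number -> translation plus cache block information.
merged_table = {
    0x400: {"phys_page": 0x12345, "head_ptr": 7, "size": 3},
}


def translate_and_lookup(vaddr):
    """Return (physical address, head pointer, size) from one table search."""
    entry = merged_table.get(vaddr // PAGE_BYTES)
    if entry is None:
        return None  # neither a translation nor a cached block is registered
    paddr = entry["phys_page"] * PAGE_BYTES + vaddr % PAGE_BYTES
    return paddr, entry["head_ptr"], entry["size"]


print(translate_and_lookup(0x00400123))
```

The point of the design is visible in the single `get` call: one search replaces the separate MMU and CAM searches.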
In the embodiment of the present invention, instruction data is stored in the cache memory one block at a time, so the program must be divided into blocks as it is read. When the processor reads an instruction and determines that it is a subroutine call or its return instruction, a conditional branch instruction, or exception handling or its return instruction, it treats that instruction as the beginning or end of a program and stores the block of instructions between such boundaries in the cache memory as a unit. When instructions are grouped into blocks according to the content of the program in this way, the block size differs on each read; because the embodiment can build variable-length blocks in memory using pointers, this method can be adopted. Alternatively, the block size can be fixed by force: as program instructions are decoded in sequence, an arbitrary instruction is taken as the head of a block, and once a block of the predetermined size has been accumulated, the last instruction read is taken as the final instruction of the block. In this case, any blocking method can be adopted simply by changing the instruction decoding in the block head determination of FIGS. 8 and 9. For example, when blocking follows the program description, a CALL instruction, a register write instruction, or the like is detected and judged to be the head of a block.
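The two blocking methods can be contrasted with a short sketch. The mnemonics in `BOUNDARY_OPS` are hypothetical, and the choice to close a block immediately after a boundary instruction is an assumption of this sketch; the text only says that such instructions mark the beginning or end of a block.

```python
# Hypothetical mnemonics for call/return, conditional branch, and exception entry/return.
BOUNDARY_OPS = {"CALL", "RET", "BRANCH_COND", "TRAP", "RETI"}


def split_by_boundary(instructions):
    """Program-driven blocking: close a block after every boundary instruction."""
    blocks, current = [], []
    for op in instructions:
        current.append(op)
        if op in BOUNDARY_OPS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)  # trailing instructions form a final block
    return blocks


def split_fixed(instructions, size):
    """Forced blocking: cut the instruction stream every `size` instructions."""
    return [instructions[i:i + size] for i in range(0, len(instructions), size)]


program = ["ADD", "LOAD", "CALL", "SUB", "RET", "MOV"]
print(split_by_boundary(program))
print(split_fixed(program, 4))
```

The first method yields blocks of differing lengths, which is why it depends on the variable-length storage that the pointer chains provide.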
In the embodiment of the present invention, the processor detects the head and tail of an instruction block and sends a control signal to the instruction block CAM. On receiving the head signal, the control mechanism records the cache tag, obtains data from main memory, and writes the instructions to the cache address indicated by the pointer. Each time the processor's requests reach one cache entry, a free entry is taken from the free pointer queue, its entry number is added to the cache tag queue, and the instruction block size is incremented. When the same block is entered several times, or a branch lands partway into a block, the entry number is derived from the cache tag plus the size, and the access is made. In the above, the head and tail of an instruction block may also be signaled by access to a specific register. In that case the start and end of a block must be declared explicitly by instructions. This is the case described above in which blocks are formed by force, regardless of the instructions in the program.
FIGS. 10 to 13 are diagrams for explaining the operation of the embodiment of the present invention.
FIG. 10 shows the operation when the instruction is present in the cache memory according to the embodiment of the present invention, that is, an instruction hit.
When the processor 60 outputs the address of the instruction data to be accessed, the CAM unit 61 is searched for the head pointer of the block containing that instruction data. An instruction hit is the case in which such a head pointer exists. Next, the pointer map memory 62 is searched starting from the head pointer obtained, and all the pointers to the instruction data making up the block are acquired. The acquired pointers are then used to fetch the instruction data from the pointer data memory 63, and the instruction data is returned to the processor 60.
FIG. 11 shows the case in which the instruction is not in the cache memory according to the embodiment of the present invention, that is, an instruction miss, and the instruction to be accessed should be at the head of a block.
In this case, the processor 60 specifies an address and attempts to access instruction data. The CAM unit 61 searches for a pointer according to this address, but here no block contains the corresponding instruction and, moreover, the corresponding instruction is judged to be one that should be the head of a block. A free pointer is therefore obtained from the free pointer queue 64, the block containing the instruction data is read from main memory, and the head address indicated by the head pointer in the CAM is updated. The pointer map memory 62 likewise associates the obtained free pointers as a block, the pointer data memory 63 associates the instruction data read from main memory with each pointer, and the instruction data is returned to the processor 60. The free pointer queue 64 is an ordinary FIFO pointer data buffer whose initial contents are the pointers from 0 to the maximum value.
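The head-of-block miss can be sketched as a small allocation routine. It assumes eight pointers in total, models the CAM as a dictionary keyed by head address, and uses placeholder strings for instruction data; `allocate_head` and the other names are illustrative. The free pointer queue starts out holding every pointer from 0 to the maximum, as the text states.

```python
from collections import deque

NUM_POINTERS = 8                             # assumed capacity of the pointer memory
free_pointers = deque(range(NUM_POINTERS))   # initial contents: pointers 0 .. max
cam = {}           # head address -> {"size": entries, "head_ptr": pointer}
pointer_map = {}   # pointer -> next pointer (None at the tail)
pointer_data = {}  # pointer -> instruction data


def allocate_head(head_addr, data):
    """Register a new block whose first entry holds `data`."""
    ptr = free_pointers.popleft()             # take a free pointer from the FIFO
    cam[head_addr] = {"size": 1, "head_ptr": ptr}
    pointer_map[ptr] = None                   # a one-entry chain so far
    pointer_data[ptr] = data
    return ptr


a = allocate_head(0x1000, "A0")
b = allocate_head(0x8000, "B0")
print(a, b, list(free_pointers))
```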
FIG. 12 is a diagram showing an operation when the instruction data to be accessed does not exist in the cache memory according to the embodiment of the present invention, and the instruction data should exist at a place other than the head of the block.
The processor 60 outputs an address and the CAM unit 61 searches for the instruction data, but it is determined that the data does not exist in the cache memory. An empty pointer is then acquired from the empty pointer queue 64, and the block containing the instruction data is read from the main memory. The read block is then connected to the adjacent block already registered in the CAM unit 61: the block size in the CAM unit 61 is updated, the pointer map memory 62 is updated, the pointer data memory 63 stores the instruction data of the read block, and the instruction data is returned to the processor 60.
FIG. 13 is a diagram showing an operation when it is necessary to read a block of instruction data but there is no empty pointer.
The processor 60 accesses the CAM unit 61 for instruction data, but it is determined that the instruction data is not in the cache memory. An attempt is then made to acquire an empty pointer from the empty pointer queue 64 in order to read a block of instruction data from the main memory, but because no empty pointer is available, an instruction to discard one arbitrary block is issued. The pointer map memory 62 discards one block from the pointer map and notifies the empty pointer queue 64 of the discarded pointers. The empty pointer queue 64, having thus acquired empty pointers, notifies the CAM unit 61, and a new instruction data block is read from the main memory.
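The four cases of FIGS. 10 to 13 (hit, miss at a block head, miss that extends an adjacent block, and miss with no empty pointer) can be pictured with a small software model. The following is only a minimal sketch under stated assumptions, not the circuit of the embodiment: the class name `PointerCache`, the entry size of 0x40 bytes, the use of address adjacency to decide between the FIG. 11 and FIG. 12 cases, and the choice of the oldest registered block as the one to discard are all assumptions made for illustration.

```python
from collections import deque

ENTRY = 0x40  # assumed number of bytes held by one pointer entry


class PointerCache:
    def __init__(self, num_pointers):
        self.cam = {}                           # block head address -> [size in entries, head pointer]
        self.next_ptr = [None] * num_pointers   # pointer map memory: link to the next pointer
        self.data = [None] * num_pointers       # pointer data memory
        self.free = deque(range(num_pointers))  # empty pointer queue, initially 0..max

    def _find(self, addr):
        """Return the pointer whose entry holds addr, or None on a miss."""
        for start, (size, head) in self.cam.items():
            if start <= addr < start + size * ENTRY:
                ptr = head
                for _ in range((addr - start) // ENTRY):
                    ptr = self.next_ptr[ptr]    # follow the chain inside the block
                return ptr
        return None

    def _alloc(self, protect=None):
        """Take an empty pointer; discard one whole block first if none is left (FIG. 13)."""
        if not self.free:
            # Assumption: discard the oldest registered block, but never the one being extended.
            victim = next(s for s in self.cam if s != protect)
            _, ptr = self.cam.pop(victim)
            while ptr is not None:
                nxt = self.next_ptr[ptr]
                self.next_ptr[ptr] = None
                self.free.append(ptr)           # return the discarded pointer to the queue
                ptr = nxt
        return self.free.popleft()

    def access(self, addr, main_memory):
        addr -= addr % ENTRY
        ptr = self._find(addr)
        if ptr is not None:                     # FIG. 10: instruction hit
            return self.data[ptr]
        # Miss: look for a registered block that ends exactly where addr begins.
        prev = None
        for start, (size, _) in self.cam.items():
            if start + size * ENTRY == addr:
                prev = start
        ptr = self._alloc(protect=prev)
        self.data[ptr] = main_memory(addr)
        self.next_ptr[ptr] = None
        if prev is None:                        # FIG. 11: the entry becomes a new block head
            self.cam[addr] = [1, ptr]
        else:                                   # FIG. 12: connect to the adjacent block
            tail = self.cam[prev][1]
            while self.next_ptr[tail] is not None:
                tail = self.next_ptr[tail]
            self.next_ptr[tail] = ptr
            self.cam[prev][0] += 1              # update the block size held in the CAM
        return self.data[ptr]
```

Here `main_memory` is any callable that returns the contents of one entry for a given address. With three pointers, accessing 0x100 and 0x140 builds one two-entry block, and a later miss with the queue empty discards that whole block and returns both of its pointers to the queue.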

According to the present invention, it is possible to provide a cache memory mechanism that greatly improves cache use efficiency with a smaller amount of circuitry than when a CAM is adopted as the configuration of the cache memory.

For the instruction cache memory used by a processor (a temporary memory that holds instruction data from the main memory and mitigates memory access delay), the direct map method or the N-way set associative method is mainly used. In these methods, the cache is looked up using the index of the access address (the lower address bits corresponding to the entry number of the cache memory), and a match of the cache data is determined using the tag (the memory address bits above those that select the entry, together with a valid bit). Because two or more programs having a particular index (N + 1 or more in the N-way set associative method) cannot exist in the cache at the same time, there is a problem that the use efficiency of the cache is reduced.

FIG. 1 is a diagram showing the conceptual configuration of a cache memory that employs the conventional direct map method.
In the direct mapped cache memory, a two-digit hexadecimal number is used as the index (the address indicating a storage area of the cache memory); the prefix 0x denotes a hexadecimal number, and in the figure indexes from 0x00 to 0xff are provided. The length of the entry represented by one index of the cache memory is 0x40 bytes, that is, 64 bytes. In the figure, the cache entry in which the data at a given address is stored is determined by the value of the lower two digits of the hexadecimal main memory address. For example, the data at address 0x0000 in the main memory has 00 as its lower two digits and is stored in the entry with index 0x00 of the cache memory. Likewise, data whose main memory address has 80 as its lower two digits is stored in the entry with index 0x02. Because the storage area in the cache memory is determined by looking only at the lower two digits of the main memory address, when the data at both main memory addresses 0x1040 and 0x0040 are to be stored, as shown in the figure, they cannot both be stored because the cache memory has only one entry with index 0x01. Only one of them is therefore stored, and if the one that was not stored is requested by the processor, a cache miss occurs and the main memory must be accessed again.
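The index conflict described for FIG. 1 can be reproduced in a few lines. The following is a minimal sketch, not part of the original disclosure. It assumes the simplified mapping used in the text, in which the index is taken from the lower two hexadecimal digits in units of one 0x40-byte entry; the names `index_of` and `DirectMapped` are hypothetical.

```python
ENTRY_SIZE = 0x40  # bytes per cache entry, as stated in the text


def index_of(addr):
    """Index per the text's simplified example: lower two hex digits, one index per 0x40 bytes."""
    return (addr & 0xFF) // ENTRY_SIZE


class DirectMapped:
    def __init__(self):
        self.entries = {}  # index -> tag (the upper address digits) currently stored

    def access(self, addr):
        """Return True on a hit; on a miss, replace whatever occupied the entry."""
        idx, tag = index_of(addr), addr >> 8
        hit = self.entries.get(idx) == tag
        self.entries[idx] = tag
        return hit
```

Under this mapping 0x0040 and 0x1040 both resolve to index 0x01, so alternating between them misses every time, which is the inefficiency the text points out.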

FIG. 2 is a conceptual configuration diagram of a conventional 2-way set associative cache memory.
In this case as well, only the lower two digits of the main memory address are examined to determine the entry of the cache memory in which the data is stored, but two entries are provided for each index (called way 1 and way 2). The possibility of a cache miss is therefore smaller than with the direct mapped cache memory. Even so, three or more pieces of data whose main memory addresses share the same lower two digits cannot be stored, so cache misses still occur.
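The limit of N ways per index can be illustrated with the following minimal sketch. It is an illustration only: the class name `SetAssociative` is hypothetical, the index mapping is the simplified one from the FIG. 1 discussion, and least-recently-used replacement is an assumption, since the text does not say which way is replaced.

```python
from collections import OrderedDict


class SetAssociative:
    def __init__(self, ways, entry_size=0x40):
        self.ways = ways
        self.entry_size = entry_size
        self.sets = {}  # index -> OrderedDict of tags, least recently used first

    def access(self, addr):
        """Return True on a hit; on a miss, store the tag, evicting the LRU way if the set is full."""
        idx = (addr & 0xFF) // self.entry_size
        tag = addr >> 8
        tags = self.sets.setdefault(idx, OrderedDict())
        if tag in tags:
            tags.move_to_end(tag)      # mark as most recently used
            return True
        if len(tags) == self.ways:
            tags.popitem(last=False)   # assumed policy: drop the least recently used way
        tags[tag] = True
        return False
```

With `ways=2`, the addresses 0x0040 and 0x1040 can coexist, but a third address with the same lower two digits, such as 0x2040, forces one of them out.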

FIG. 3 is a conceptual configuration diagram of a conventional content addressable memory.
If a content addressable memory (CAM) is used, the number of ways can be made equal to the number of entries and the problem of use efficiency can be solved, but there is a problem of higher cost due to the increase in circuitry.

The case shown in the figure is equivalent to a 256-way set associative cache memory. That is, if there are 256 main memory addresses whose lower two digits are the same, all of that data can be stored in the cache memory. It therefore never happens that data from the main memory cannot be stored in the cache memory, and consequently no cache miss occurs. However, providing a cache memory large enough to store all of the main memory data requires correspondingly more hardware, and many ways must be controlled, so the cache memory itself becomes expensive.

For the cache memory configurations described above, please refer to the following reference.
"Computer Architecture", Chapter 8, "Design of the Memory Hierarchy", Nikkei Business Publications, Inc., ISBN 4-8222-7152-8
FIG. 4 is a configuration diagram of the data access mechanism of a conventional 4-way set associative cache memory.

An instruction access request/address (1) from the program counter is sent to the instruction access MMU 10, converted into a physical address (8), and then sent as an address to the cache tags 12-1 to 12-4 and the cache data 13-1 to 13-4. If, among the tag outputs read out with the same lower address (index), there is one whose upper address bits (tag) match the request address from the instruction access MMU 10, valid data exists in the cache data 13-1 to 13-4 (a hit). The comparator 15 performs this match detection and at the same time drives the selector 16 with the hit information (4). On a hit, the data is sent to the instruction buffer as instruction data (5). On a miss, a cache miss request (3) is output to the secondary cache. The cache miss request (3) consists of the request itself (3)-1 and the miss address (3)-2. The data returned from the secondary cache then updates the cache tags 12-1 to 12-4 and the cache data 13-1 to 13-4 and is likewise returned to the instruction buffer. When the cache tags 12-1 to 12-4 and the cache data 13-1 to 13-4 are updated, a write address (7) is output from the instruction access MMU 10. The updates are performed by the tag update control unit 11 and the data update control unit 14. In an N-way configuration, the comparator 15 and the selector 16 each have N inputs. In a direct map configuration, the selector is unnecessary.
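The roles of the comparator 15 and the selector 16 can be sketched in software as follows. This is a hedged illustration rather than the circuit of FIG. 4: the function name `lookup` and the representation of each tag entry as a `(valid, tag)` pair are assumptions.

```python
def lookup(cache_tags, cache_data, index, request_tag):
    """Return (hit, data) for one index across all ways.

    cache_tags[w][index] is a (valid, tag) pair for way w;
    cache_data[w][index] is the data held by that way.
    """
    # Comparator: one tag comparison per way, all at the same index.
    hits = []
    for way in cache_tags:
        valid, tag = way[index]
        hits.append(valid and tag == request_tag)
    # Selector: pass through the data of the way that hit.
    for w, hit in enumerate(hits):
        if hit:
            return True, cache_data[w][index]
    # No way matched: this is where a cache miss request would go to the secondary cache.
    return False, None
```

With N ways, the `hits` list has N elements, which corresponds to the comparator and selector having N inputs; with a single way the selection loop is trivial, matching the remark that a direct map configuration needs no selector.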

Japanese Patent Laid-Open No. 11-328014 discloses a technique for setting an appropriate block size for each address space, in order to cope with differences in the extent of spatial locality within the address space and thereby increase the use efficiency of the cache memory.

Japanese Patent Laid-Open No. 2001-297036 describes a technique for providing a RAM set cache that can be used together with the direct map method or the set associative method. The RAM set cache is provided so as to constitute one way of the set associative method and performs reading and writing in units of lines.

An object of the present invention is to provide a cache memory with low cost and high use efficiency.
The cache memory of the present invention comprises: head pointer storage means for storing a head pointer corresponding to the head address of a stored data block; pointer map storage means for storing pointers corresponding to the addresses at which the data constituting the data block are stored, together with the connection relationship between those pointers starting from the head pointer; and pointer data storage means for storing the data stored at the address corresponding to each pointer.

According to the present invention, data is stored as a block by storing the connection relationship between pointers. Variable-length data blocks can therefore be stored by changing the connection relationship between the pointers.

That is, compared with conventional methods in which the unit of the data block to be stored is fixed, the storage capacity of the cache memory can be used up as effectively as possible, and the cache can respond flexibly both to cases where data should be stored in large blocks and to cases where small blocks suffice. The use efficiency of the cache memory therefore increases, and as a result the possibility of a cache miss can be reduced.

FIGS. 5 and 6 are diagrams for explaining the concept of the present invention.
The present invention focuses on the fact that a processor often executes instructions not in units of one cache entry but in runs of several to several tens of blocks or more. If a CAM could be applied to every entry the problem would be solved, but as described above the cost would be high. The CAM is therefore applied not to each cache entry but to each instruction block. Specifically, only the information about each instruction block (head address, instruction block size, and the number of the instruction block's head pointer) is held in the CAM (see FIG. 5). The instruction data itself is stored in a pointer memory with a FIFO structure, indicated by the head pointer (see FIG. 6). The pointer memory consists of two memories, a pointer map memory and a pointer data memory. The pointer map memory holds the connection information between pointers, and the pointer data memory holds the data itself for each pointer, so that a plurality of FIFOs can be constructed virtually on one memory. In other words, the storage area is a contiguous area such as a RAM, but the continuity of the data is maintained by holding the connection information of the pointers. The data indicated by a chain of connected pointers therefore constitutes one block, and data is stored in the cache memory of the embodiment block by block. In particular, the cache memory of the embodiment can freely change the block size of the stored data by manipulating the pointer connection information. That is, a plurality of physical FIFOs is not actually provided.
Reading from the instruction cache in the present invention proceeds as follows: (1) the CAM is looked up with the address to obtain the pointer at which the head address of the block containing the data to be accessed is stored; (2) the pointer to the block containing the data to be accessed is obtained from the pointer map memory; (3) the desired instruction data is read from the instruction data block at the address indicated by the obtained pointer in the pointer data memory; and (4) the instruction is executed. This gives the same cache use efficiency as if a data storage area of a different length were implemented for each instruction block. In addition, because less lookup information is needed than when a CAM is used for every entry, the circuitry can be reduced correspondingly. On a cache miss, a tag is set in the CAM and at the same time an empty pointer is supplied from an empty pointer supply unit (not shown), and the data from the memory is written to the pointer memory entry indicated by the empty pointer. If the processor requests a continuing access, an empty pointer is supplied again, the data is written to the cache in the same way, and the second pointer is added to the pointer queue. When the empty pointers are exhausted, an instruction block is discarded, for example starting from the oldest data, to secure empty pointers.
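The read steps (1) to (3) can be sketched with plain data structures. This is a minimal sketch under stated assumptions, not the disclosed hardware: the names `CamEntry` and `read` are hypothetical, the block size is assumed to be counted in entries of 0x40 bytes, and the CAM search is modeled as a simple loop.

```python
from dataclasses import dataclass

ENTRY_SIZE = 0x40  # assumed number of bytes covered by one pointer


@dataclass
class CamEntry:
    start: int  # head address of the instruction block
    size: int   # block size, assumed here to be counted in entries
    head: int   # number of the block's head pointer


def read(cam, pointer_map, pointer_data, addr):
    """Return the data for addr, or None if no registered block contains it."""
    for entry in cam:                                  # (1) look up the CAM by address
        if entry.start <= addr < entry.start + entry.size * ENTRY_SIZE:
            ptr = entry.head
            for _ in range((addr - entry.start) // ENTRY_SIZE):
                ptr = pointer_map[ptr]                 # (2) follow the pointer connections
            return pointer_data[ptr]                   # (3) read the data at that pointer
    return None
```

The pointers of one block need not be consecutive. A block at 0x1000 can be chained as pointers 5, 0, 7 while another block uses pointer 2; only the links in `pointer_map` make the entries of a block contiguous, which is what allows blocks of different lengths to share one memory.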

FIG. 7 is an overall configuration diagram including the present invention.
The figure shows an outline of a microprocessor, which operates as follows.
1) Instruction fetch
An instruction to be executed is obtained from the external bus via the external bus interface 20. First, it is checked whether the instruction pointed to by the program counter 21 exists in the instruction buffer 22; if not, the instruction buffer 22 sends a request to fetch the instruction to the instruction access MMU 23. The instruction access MMU 23 converts the logical address used by the program into a physical address, which depends on the hardware mapping. The instruction access primary cache tag 24 is searched using that address; if there is a match, the data exists in the instruction access primary cache data 25, so a read address is sent and the instruction data is returned to the instruction buffer 22. If there is no match, the secondary cache tag 26 is searched, and if the data is still not found, a request is issued to the external bus. The returned data is filled in turn into the secondary cache data 27 and the instruction access primary cache data 25, and the fill is recorded by updating the secondary cache tag 26 and the instruction access primary cache tag 24. The filled data is stored in the instruction buffer 22 in the same way as when it is already present in the instruction access primary cache data 25.
2) Instruction execution
The instruction sequence stored in the instruction buffer 22 is sent to the execution unit 28 and transferred to the arithmetic unit 29 or the load/store unit 30 according to the type of each instruction. For an arithmetic or branch instruction, the output of the arithmetic unit 29 is recorded in the general-purpose register file 31, or the program counter (not shown) is updated. For a load/store instruction, the load/store unit 30 accesses, in the same way as for instruction access, the data access MMU 32, the data access primary cache tag 33, and the data access primary cache data 34 in turn, and either copies the data into the general-purpose register file 31 (a load instruction) or copies data from the general-purpose register file 31 into that entry (a store instruction). If the data is not in the primary cache, it is obtained from the secondary cache, which is shared with the instruction execution mechanism, or from the external bus, and the instruction is executed in the same way. After execution completes, the program counter is incremented sequentially or updated to the branch target address, and the instruction fetch of 1) is performed again.
3) Overall
The microprocessor operates by repeating instruction fetch and instruction execution in this way. The present invention provides a new configuration for the instruction access MMU 23, the instruction access primary cache tag 24, and the instruction access primary cache data 25, which are enclosed by the dotted line.

FIG. 8 is a configuration diagram of an embodiment of the present invention.
An instruction access request/address from the program counter is sent to the instruction access MMU 23, converted into a physical address, and then sent to the CAM 41 as an address. The CAM 41 outputs the tag, size, and head pointer data. The address/size determination and hit determination block 42 searches for the pointer finally wanted; if that pointer exists, the pointer data is read and sent to the instruction buffer (not shown) as instruction data (1). If it does not exist, a cache miss request (2) is output to the secondary cache. The data returned from the secondary cache then passes through the block head determination block 43. If the returned data is a head instruction, the CAM 41 is updated; if it is not, the pointer map memory 44 and the CAM size information 42 are updated. Together with these updates, the pointer data memory 45 is updated and the data is likewise returned to the instruction buffer. When writing, the block head determination block 43 is supplied with an empty pointer from the empty pointer FIFO 46; if the empty pointers have been exhausted, the empty pointer FIFO 46 sends an instruction to the discard pointer selection control block 47, which issues a discard instruction to an arbitrary CAM entry. The output of that entry is invalidated by the address/size determination and hit determination block 42 and returned to the empty pointer FIFO 46.

FIG. 9 shows a configuration in which the page management mechanism of the processor's instruction access MMU and the CAM are shared.
In the figure, the same components as those in FIG. 8 are denoted by the same reference numerals, and their description is omitted.

In this configuration, the address translation unit (page) of the MMU and the management unit of the cache are made the same size, so that the CAM inside the MMU also serves the function of the cache CAM and the amount of CAM is reduced (50 in the figure). That is, the instruction access MMU has a table for converting virtual addresses into physical addresses; this table and the CAM table are merged into one table, and operations such as the CAM search are performed by the mechanism of the instruction access MMU. The table search of the instruction access MMU and that of the CAM can thus be handled by a single piece of hardware, which reduces the hardware.

In the embodiment of the present invention, the instruction data read into the cache memory is stored one chunk at a time, so the program must be divided into blocks when it is read. When the processor reads an instruction and determines that it is a subroutine call or its return instruction, a conditional branch instruction, or an exception handling instruction or its return instruction, it judges that instruction to be the beginning or end of the program, and the instructions between such boundaries are stored in the cache memory as one block. When instructions are divided into blocks according to the content of the program in this way, the block size differs each time a block is read; because the embodiment can build variable-length blocks in memory using pointers, this method can be adopted. Alternatively, the block size may be fixed by force: while the program's instructions are decoded in sequence, an arbitrary instruction is taken as the head of a block, and the instruction at which a block of the predetermined size has been obtained is taken as its last instruction. In either case, any blocking method can be adopted simply by changing the instruction decoding in the block head determination of FIGS. 8 and 9. For example, when blocks are formed according to the program description, a CALL instruction, a register write instruction, or the like is detected and judged to be the head of a block.
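Both blocking methods described above, cutting at control-transfer instructions and cutting at a fixed size, can be sketched as follows. This is only an illustration: the function name `split_blocks`, the mnemonics in `BOUNDARY`, and the choice to close a block at the boundary instruction itself are hypothetical and not taken from the source.

```python
# Hypothetical mnemonics standing in for call/return, conditional branch,
# and exception entry/return instructions.
BOUNDARY = {"call", "ret", "bcc", "trap", "rfe"}


def split_blocks(instructions, max_len=None):
    """Group a sequential instruction stream into blocks.

    A block is closed when a boundary instruction is decoded or, if max_len
    is given, when the block has reached that many instructions.
    """
    blocks, current = [], []
    for insn in instructions:
        current.append(insn)
        mnemonic = insn.split()[0]
        if mnemonic in BOUNDARY or (max_len is not None and len(current) == max_len):
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)  # trailing instructions form a final, shorter block
    return blocks
```

Because the resulting blocks differ in length, a fixed-entry cache would waste space on them, whereas the pointer chain of the embodiment can hold each block at its own size.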

In the embodiment of the present invention, the processor detects the head and tail of an instruction block and sends a control signal to the instruction block CAM. On receiving the head signal, the control mechanism records the cache tag, obtains data from the main memory, and writes the instructions at the cache address indicated by the pointer. Each time the processor's requests reach one cache entry, an empty entry is supplied from the empty pointer queue, its entry number is added to the cache tag queue, and at the same time the instruction block size is incremented. When the same block is executed multiple times, or when a branch lands in the middle of a block, the entry number is derived from the cache tag and the size, and the access is performed. In the above, the head and tail of an instruction block may also be notified through an access to a specific register. In this case, the start and end of a block must be declared explicitly by an instruction. This corresponds to the case, mentioned above, in which blocks are formed by force regardless of the instructions in the program.

図１０〜図１３は、本発明の実施形態の動作を説明する図である。
図１０は、本発明の実施形態に従ったキャッシュメモリに命令が存在した場合、すなわち、命令ヒットの場合の動作である。 10-13 is a figure explaining operation | movement of embodiment of this invention.
FIG. 10 shows the operation in the case where an instruction exists in the cache memory according to the embodiment of the present invention, that is, in the case of an instruction hit.

プロセッサ６０から、アクセスすべき命令データのアドレスが出力されると、ＣＡＭ部６１を検索して、アクセスすべき命令データを含むブロックの先頭ポインタを検索する。命令ヒットの場合には、アクセスすべき命令データを含むブロックの先頭ポインタが存在した場合である。次には、得られた先頭ポインタから、ポインタマップメモリ６２を検索し、ブロックを構成する命令データのポインタを全て取得する。そして、取得したポインタを用いて、ポインタデータメモリ６３から命令データを取得し、プロセッサ６０に返す。 When the address of the instruction data to be accessed is output from the processor 60, the CAM unit 61 is searched to search for the head pointer of the block including the instruction data to be accessed. In the case of an instruction hit, there is a head pointer of a block including instruction data to be accessed. Next, the pointer map memory 62 is searched from the obtained head pointer, and all instruction data pointers constituting the block are acquired. Then, using the acquired pointer, the instruction data is acquired from the pointer data memory 63 and returned to the processor 60.

図１１は、本発明の実施形態に従ったキャッシュメモリに命令が存在しない、すなわち、命令ミスの場合であって、アクセスすべき命令がブロックの先頭に有るべき場合を示している。 FIG. 11 shows a case where there is no instruction in the cache memory according to the embodiment of the present invention, that is, an instruction miss, and the instruction to be accessed should be at the head of the block.

この場合、プロセッサ６０からアドレス指定が行われ、命令データへのアクセスが試みられる。ＣＡＭ部６１では、このアドレスに従ってポインタを検索するが、今の場合、対応する命令が含まれているブロックが無く、しかも、対応する命令がブロックの先頭となるべきものであると判断される。この場合、空きポインタキュー６４から、空きのポインタを獲得し、主記憶から当該命令データを含むブロックを読み込み、ＣＡＭの先頭ポインタで示される先頭アドレスを更新する。また、ポインタマップメモリ６２も、獲得された空きポインタをブロックとして関連付けて、ポインタデータメモリ６３は、各ポインタに主記憶から読み込まれた命令データを対応付けて、命令データをプロセッサ６０に返す。空きポインタキュー６４は、通常のＦＩＦＯによるポインタデータバッファであり、初期値は、ポインタが０から最大値まで記録されるものである。 In this case, addressing is performed from the processor 60, and access to instruction data is attempted. The CAM unit 61 searches for a pointer according to this address. In this case, it is determined that there is no block including the corresponding instruction, and that the corresponding instruction should be the head of the block. In this case, an empty pointer is obtained from the empty pointer queue 64, a block including the instruction data is read from the main memory, and the head address indicated by the head pointer of the CAM is updated. The pointer map memory 62 also associates the acquired empty pointer as a block, and the pointer data memory 63 associates the instruction data read from the main memory with each pointer and returns the instruction data to the processor 60. The empty pointer queue 64 is a pointer data buffer by a normal FIFO, and the initial value is recorded from 0 to the maximum value of the pointer.

図１２は、アクセスすべき命令データが本発明の実施形態に従ったキャッシュメモリに存在せず、命令データがブロックの先頭以外の場所に存在すべきものである場合の動作を示す図である。 FIG. 12 is a diagram showing an operation in a case where the instruction data to be accessed does not exist in the cache memory according to the embodiment of the present invention and the instruction data should exist in a place other than the head of the block.

プロセッサ６０からアドレスが出力され、ＣＡＭ部６１で命令データの検索をするが、キャッシュメモリに存在しないと判断される。すると、空きポインタキュー６４から空きポインタを取得し、主記憶から当該命令データを含むブロックを読み込む。そして、当該ブロックに隣接する、既にＣＡＭ部６１に登録されているブロックに、読み込んだブロックをつなげる要領で、ＣＡＭ部６１のブロックサイズを更新し、ポインタマップメモリ６２を更新し、ポインタデータメモリ６３が、読み込んだブロックの命令データを格納し、命令データをプロセッサ６０に返す。 An address is output from the processor 60, and the instruction data is searched by the CAM unit 61, but it is determined that it does not exist in the cache memory. Then, an empty pointer is acquired from the empty pointer queue 64, and a block including the instruction data is read from the main memory. Then, the block size of the CAM unit 61 is updated, the pointer map memory 62 is updated, and the pointer data memory 63 is updated in a manner to connect the read block to the block that is adjacent to the block and has already been registered in the CAM unit 61. However, the instruction data of the read block is stored, and the instruction data is returned to the processor 60.

FIG. 13 is a diagram showing the operation when a block of instruction data must be read but no empty pointer is available.
The processor 60 accesses the CAM unit 61 for instruction data, but the instruction data is determined not to be in the cache memory. An attempt is then made to acquire an empty pointer from the empty pointer queue 64 in order to read the block of instruction data from the main memory; because there is no empty pointer, an instruction to discard an arbitrary block is issued. The pointer map memory 62 discards one block from the pointer map and notifies the empty pointer queue 64 of the discarded pointers. Having thus obtained empty pointers, the empty pointer queue 64 notifies the CAM unit 61, and the new instruction data block is read from the main memory.
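
The discard-and-reuse step can be sketched as below. The description allows an arbitrary block to be discarded; this example picks the oldest registered block, in line with the claim that discarding may proceed from the oldest data block. The function name `acquire_pointer` and the dictionary-based structures are hypothetical, and the sketch relies on Python dictionaries preserving insertion order.

```python
from collections import deque


def acquire_pointer(free, cam, next_ptr, data):
    """Return an unused pointer, discarding one whole block if none is free.

    free:     deque of unused pointers                          (empty pointer queue 64)
    cam:      head address -> [head pointer, size in pointers]  (CAM unit 61)
    next_ptr: pointer -> next pointer or None                   (pointer map memory 62)
    data:     pointer -> stored data                            (pointer data memory 63)
    """
    if not free:
        victim = next(iter(cam))          # oldest registered block (assumed policy)
        ptr, _size = cam.pop(victim)      # drop the block's CAM entry
        while ptr is not None:            # release every pointer in the chain
            following = next_ptr.pop(ptr)
            data.pop(ptr, None)
            free.append(ptr)              # notify the empty pointer queue
            ptr = following
    return free.popleft()
```

Because a whole block is released at once, a single discard may return several pointers to the queue, so subsequent misses need not discard again immediately.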

FIG. 1 is a diagram showing the conceptual configuration of a cache memory employing the prior-art direct-map method.
FIG. 2 is a conceptual configuration diagram of a conventional 2-way set associative cache memory.
FIG. 3 is a conceptual configuration diagram of a conventional associative memory.
FIG. 4 is a configuration diagram of the data access mechanism of a conventional 4-way set associative cache memory.
FIG. 5 is a diagram explaining the concept of the present invention.
FIG. 6 is a diagram explaining the concept of the present invention.
FIG. 7 is an overall configuration diagram including the present invention.
FIG. 8 is a configuration diagram of an embodiment of the present invention.
FIG. 9 is a diagram showing the configuration in which the CAM is shared with the page management mechanism of the processor's instruction access MMU.
FIG. 10 is a diagram explaining the operation of the embodiment of the present invention.
FIG. 11 is a diagram explaining the operation of the embodiment of the present invention.
FIG. 12 is a diagram explaining the operation of the embodiment of the present invention.
FIG. 13 is a diagram explaining the operation of the embodiment of the present invention.

Claims

1. A cache memory comprising:
head pointer storage means for storing a head pointer corresponding to the head address of a stored data block;
pointer map storage means for storing the connection relationship, starting from the head pointer, among pointers corresponding to the addresses at which the data constituting the data block is stored; and
pointer data storage means for storing the data held at the address corresponding to each pointer.

2. The cache memory according to claim 1, wherein the data block is a data string whose head and tail are determined by an instruction from a processor.

3. The cache memory according to claim 1, wherein the data block is a data string whose head and tail are determined based on a result of decoding an instruction in a program.

4. The cache memory according to claim 3, wherein the instruction is a subroutine call and return instruction, a conditional branch instruction, or an exception handling and return instruction.

5. The cache memory according to claim 1, wherein the head pointer storage means stores the head address of the data block, the size of the data block, and the head pointer of the data block in association with each other.

6. The cache memory according to claim 1, wherein the head pointer storage means is storage means adopting an associative memory system.

7. The cache memory according to claim 1, further comprising empty pointer queue means for holding pointers that are not in use,
wherein, when a new data block needs to be stored, an empty pointer indicated by the empty pointer queue means is used.

8. The cache memory according to claim 7, wherein, when a new data block needs to be stored but the empty pointer queue means holds no empty pointer, an empty pointer is generated by discarding one of the currently stored data blocks.

9. The cache memory according to claim 8, wherein the discarding is performed in order starting from the oldest data block.

10. The cache memory according to claim 1, wherein, when data to be accessed by the processor is not stored and the data is to be the head of a data block, a data block starting from the data to be accessed by the processor is newly stored.

11. The cache memory according to claim 1, wherein, when data to be accessed by the processor is not stored and the data is not to be the head of a data block, the data block including the data to be accessed by the processor is newly stored so as to be connected to an already stored data block.

12. The cache memory according to claim 1, wherein the data held by the head pointer storage means is managed together with data held by a mechanism for converting a virtual address issued by a processor into a physical address.

13. The cache memory according to claim 1, wherein the data is instruction data.

14. A cache memory control method comprising:
a head pointer storing step of storing a head pointer corresponding to the head address of a stored data block;
a pointer map storing step of storing the connection relationship, starting from the head pointer, among pointers corresponding to the addresses at which the data constituting the data block is stored; and
a pointer data storing step of storing the data held at the address corresponding to each pointer,
whereby a variable-length data block can be stored.