JP2011164975A - Information processor

Info

Publication number: JP2011164975A
Application number: JP2010027626A
Authority: JP
Inventors: 毅 ▲葛▼; Takeshi Katsura
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2010-02-10
Filing date: 2010-02-10
Publication date: 2011-08-25
Anticipated expiration: 2030-02-10
Also published as: JP5434646B2

Abstract

PROBLEM TO BE SOLVED: To automatically adjust the number of ways to suit the application, thereby improving the utilization efficiency and hit rate of a cache.

SOLUTION: The information processor includes a cache unit placed between a processing unit that processes data and a memory that stores data. The cache unit includes a plurality of data arrays into which data in the memory is copied, and an information array that holds information on the age order of the ways in the data arrays (the order of time elapsed since each way was last used). Each data array has one or more ways, and at least one data array differs from the others in its number of entries. Each divided area of the memory is assigned one entry in each of the data arrays. The number of ways is thereby changed dynamically, entry by entry, to suit the application being executed.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to an information processing apparatus having a cache unit.

In a processor system, a cache is used to speed up the processor's access to memory (main memory) when reading data stored there. Direct mapping, set-associative, and fully associative schemes are well known as methods of mapping cache entries to memory addresses, that is, of relating data in the cache to data in memory.

The cache hit rate improves in the order of direct mapping, set-associative, and fully associative. Ease of implementation, on the other hand, decreases in the same order, and a fully associative cache is difficult to implement in practice. For this reason, a set-associative scheme of about four ways is generally used.

FIG. 11 is a diagram showing the configuration of a 4-way set-associative cache.
In FIG. 11, 110 is a tag array, 111 is a data array, 112 (112-0 to 112-3) are comparators, 113 is a selection circuit, and 114 is an LRU (Least Recently Used) array. The data array 111 stores copies of data in the memory (main memory) in blocks, in units of the cache line size. The tag array 110 stores a tag (part of the memory address) for the data stored in each block of the data array 111. As shown in the figure, of all the bits of an address, the low-order bits corresponding to the cache line size form the offset, and a predetermined number of bits immediately above the offset, corresponding to the number of entries (the number of blocks in the vertical direction of the data array), form the cache index. The remaining bits above the index form the cache tag. The LRU array 114 holds LRU information indicating the age order of the ways in the tag array 110 and the data array 111 (the order of elapsed time since each way was last used).

When an access to an address in the memory space is executed and the address indicating the access destination is input, the tag array 110 is referenced using the index portion of the address, and a tag is read from the block corresponding to the index in each way. For each way, the comparator 112 compares the tag read out with the bit pattern of the tag portion of the address, and the comparison result is output to the selection circuit 113.

If a tag read out matches the bit pattern of the tag portion of the address, a cache hit occurs. In this case, data is read from the blocks of the data array 111 corresponding to the index, and the selection circuit 113 selects, according to the comparison results from the comparators 112, the data read from the block of the way whose tag matched, and outputs it as output data.

If, on the other hand, the tag read out does not match the bit pattern of the tag portion of the address in any way, a cache miss occurs. On a cache miss, the memory is accessed, and data is read from the storage area of the memory corresponding to the address indicating the access destination and output as output data. The data read from the memory is also supplied to the cache as input data and registered there. The tag of the address is registered in the block of the tag array 110 corresponding to the index of the address, and the data read from the memory is registered in the block of the data array 111 corresponding to the index. At this time, based on the LRU information held in the LRU array 114, the tag and data replace (overwrite) the contents of the block that, among the blocks of the tag array 110 and the data array 111 corresponding to the index, has gone the longest since it was last used.
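The hit and miss handling described above for a conventional set-associative cache can be sketched in a few lines. This is only an illustrative software model, not the circuit of FIG. 11: the class name `SetAssociativeCache`, its parameters, and the use of an ordered dictionary in place of a separate tag array and LRU array are all assumptions made for the example.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy N-way set-associative cache with per-entry LRU replacement."""

    def __init__(self, num_entries=8, num_ways=4, line_size=32):
        self.num_entries = num_entries
        self.num_ways = num_ways
        self.line_size = line_size
        # One OrderedDict per entry: tag -> data, ordered oldest first.
        self.entries = [OrderedDict() for _ in range(num_entries)]

    def split(self, address):
        """Return (tag, index) for an address; the offset is dropped."""
        line_number = address // self.line_size
        return line_number // self.num_entries, line_number % self.num_entries

    def load(self, address, memory):
        """Return (data, hit). `memory` maps line numbers to line data."""
        tag, index = self.split(address)
        ways = self.entries[index]
        if tag in ways:                      # a way holds a matching tag: hit
            ways.move_to_end(tag)            # mark it as most recently used
            return ways[tag], True
        data = memory[address // self.line_size]   # miss: read from memory
        if len(ways) >= self.num_ways:
            ways.popitem(last=False)         # evict the least recently used way
        ways[tag] = data                     # register the new tag and data
        return data, False


memory = {n: f"line{n}" for n in range(64)}
cache = SetAssociativeCache(num_entries=8, num_ways=4, line_size=32)
print(cache.load(0x0000, memory))   # first access to the line: miss
print(cache.load(0x0004, memory))   # same cache line: hit
```

With 8 entries and 32-byte lines, addresses 0x0000, 0x0100, 0x0200, and so on all fall on index 0, so a fifth distinct line on that index evicts the least recently used one.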

A cache scheme has also been proposed in which the combination of the number of ways and the block size can be changed by specifying it in a register (see Patent Document 1). By changing the cache configuration to the number of ways and block size best suited to the application, this scheme aims to improve cache utilization efficiency and hit rate. A cache has also been proposed that includes a circuit for observing the cache hit rate and, based on an appropriate threshold, turns the power supply to the data array on and off way by way so that the number of operating ways can be controlled (see Patent Document 2).

Patent Document 1: JP 2000-20396 A
Patent Document 2: Japanese Patent Laid-Open No. 9-50401

From the viewpoint of hit rate and hardware cost, a set-associative cache of about four ways is generally used. Depending on the application being executed, however, about two ways may suffice, or cache data may be replaced frequently unless about eight ways are available. Moreover, beyond the way count of the data array as a whole, usage frequency may also be skewed from one entry number to another.

The techniques described in Patent Documents 1 and 2 make it possible to change the number of ways of a cache. However, a method such as that of Patent Document 1, in which the user determines the number of ways in advance for each application and specifies it manually as a fixed setting, is problematic in terms of ease of use. In addition, because the number of ways is changed uniformly for all entry numbers, no improvement in cache utilization efficiency can be expected when usage frequency varies between entry numbers. Patent Document 2 likewise changes the number of ways uniformly for all entry numbers, so no improvement in cache utilization efficiency can be expected when usage frequency is skewed between entry numbers.

According to one aspect of the present invention, an information processing apparatus is provided that includes a processing unit that processes data and a cache unit provided between the processing unit and a storage unit in which data is stored. The cache unit has a plurality of data arrays to which data in the storage unit is copied, and an information array that holds information indicating the age order of the ways in the plurality of data arrays. Each of the data arrays has one or more ways, at least one of the data arrays differs from the others in its number of entries, and each divided storage area of the storage unit is assigned one entry in each of the data arrays.

According to the present invention, the number of ways can be increased or decreased dynamically, entry by entry, according to the application being executed and changed automatically to the optimum number, so that cache utilization efficiency and hit rate can be improved.

FIG. 1 is a diagram showing a configuration example of an information processing apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing a configuration example of the cache unit in the embodiment.
FIG. 3 is a diagram for explaining the correspondence between memory addresses and cache entries in the embodiment.
FIG. 4 is a flowchart showing an example of the access operation in the embodiment.
FIG. 5 is a flowchart showing the LRU reference operation process.
FIG. 6 is a diagram for explaining changes in the number of cache ways in the embodiment.
FIG. 7 is a diagram showing an example of the LRU information in the embodiment.
FIG. 8 is a diagram showing an example of an encoding scheme for the LRU information in the embodiment.
FIG. 9 is a diagram for explaining a specific operation example in the embodiment.
FIG. 10 is a diagram for explaining a specific operation example in the embodiment.
FIG. 11 is a diagram showing an example of a set-associative cache.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram showing a configuration example of an information processing apparatus according to an embodiment of the present invention.
In FIG. 1, 1 is an information processing apparatus and 4 is a memory (main memory). The information processing apparatus 1 reads (loads) data from the memory 4 and processes it, and writes (stores) data obtained as a processing result to the memory 4. The memory 4 stores data and the like related to processing in the information processing apparatus 1.

The information processing apparatus 1 includes a processing unit 2 that performs processing according to instructions, and a cache unit 3 to which data in the memory 4 is copied. FIG. 1 shows a CPU core as an example of the processing unit 2, but the processing unit is not limited to this and may be, for example, an accelerator.

When the processing unit 2 of the information processing apparatus 1 accesses an address in the memory space to load data, the cache unit 3 is referenced first to determine whether a copy of the desired data (the data in the storage area specified by the address) exists in the cache unit 3. If a copy of the desired data exists in the cache unit 3 (cache hit), the copy is read from the cache unit 3 and supplied to the processing unit 2.

If, on the other hand, no copy of the desired data exists in the cache unit 3 (cache miss), the memory 4 is accessed and the desired data is read. The desired data read from the memory 4 is supplied to the processing unit 2 and is also registered in the cache unit 3, replacing data already held there (for example, the data that has gone the longest since it was last used).

FIG. 2 is a diagram showing a configuration example of the cache unit 3 shown in FIG. 1.
In FIG. 2, 10 is a first array (main array), 20 is a second array (subarray), and 30 is an LRU (Least Recently Used) array. The first array (main array) 10 corresponds to, for example, a 2-way set-associative cache, and the second array (subarray) 20 corresponds to, for example, a 4-way set-associative cache.

The first array (main array) 10 and the second array (subarray) 20 differ in their number of entries (the number of blocks in the vertical direction of the array): the main array 10 has more entries than the subarray 20. For example, the number of entries in the main array 10 is preferably about four times that of the subarray 20.

The main array 10 includes a main tag array 12, a main data array 14, comparators 16 (16-0, 16-1), and a selection circuit 18. The main data array 14 stores copies of data in the memory 4 in blocks, in units of the cache line size. The main tag array 12 stores a tag (part of the memory address) for the data stored in each block of the main data array 14.

For each way, the comparator 16 compares the tag read from the main tag array 12 on the basis of the input address indicating the access destination with the bit pattern of the tag portion of that address, and outputs the comparison result to the selection circuit 18. The selection circuit 18 selects, according to the comparison results from the comparators 16, from the data read from the main data array 14 on the basis of the input address, and outputs the selected data as output data. Specifically, of the data read from the main data array 14, the selection circuit 18 selects and outputs the data of the way whose tag read from the main tag array 12 matched the bit pattern of the tag portion of the address. If the tag read out does not match the bit pattern of the tag portion of the address in any way, the selection circuit 18 outputs no data.

The subarray 20 includes a sub tag array 22, a sub data array 24, comparators 26 (26-0 to 26-3), and a selection circuit 28. The sub data array 24 stores copies of data in the memory 4 in blocks, in units of the cache line size. The sub tag array 22 stores a tag (part of the memory address) for the data stored in each block of the sub data array 24.

For each way, the comparator 26 compares the tag read from the sub tag array 22 on the basis of the input address with the bit pattern of the tag portion of that address, and outputs the comparison result to the selection circuit 28. Like the selection circuit 18, the selection circuit 28 selects, according to the comparison results from the comparators 26, from the data read from the sub data array 24 on the basis of the input address, and outputs the selected data as output data.

The LRU array 30 holds, for each entry, LRU information indicating the age order of the ways in the main array 10 and the subarray 20 (the order of elapsed time since each way was last used). The cache unit 3 also has a cache control unit (not shown) that controls the functional units within the cache unit 3. The cache control unit, for example, updates the LRU information and, on a cache miss, controls access to the memory 4 and replacement in the data arrays.

In the cache unit 3 of this embodiment, of all the bits of an address, the low-order bits corresponding to the cache line size form the offset, and a predetermined number of bits immediately above the offset, corresponding to the number of entries, form the cache index. The remaining bits above the index form the cache tag. Because the main array 10 and the subarray 20 differ in their number of entries, the cache tag and index for any given address in the main array 10 differ from those in the subarray 20.

For example, suppose the address is 32 bits wide, with the least significant bit numbered bit 1 and the most significant bit numbered bit 32, the cache line size is 32 bytes, the main array 10 has 128 entries, and the subarray 20 has 32 entries. For the main array 10, the 5 bits from bit 1 to bit 5 of the address form the offset, the 7 bits from bit 6 to bit 12 form the index, and the 20 bits from bit 13 to bit 32 form the tag. For the subarray 20, the 5 bits from bit 1 to bit 5 form the offset, the 5 bits from bit 6 to bit 10 form the index, and the 22 bits from bit 11 to bit 32 form the tag.
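The field widths in this example can be checked with a short calculation. The following is a minimal sketch; the function name `split_address` and the sample address are chosen for illustration, and the code assumes the line size and entry counts are powers of two.

```python
def split_address(address, num_entries, line_size=32):
    """Split an address into (tag, index, offset) fields.

    Assumes line_size and num_entries are powers of two.
    """
    offset_bits = line_size.bit_length() - 1     # 32-byte line -> 5 bits
    index_bits = num_entries.bit_length() - 1    # 128 entries -> 7 bits
    offset = address & (line_size - 1)
    index = (address >> offset_bits) & (num_entries - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset


address = 0x12345678  # arbitrary 32-bit sample address
main_tag, main_index, offset = split_address(address, num_entries=128)
sub_tag, sub_index, _ = split_address(address, num_entries=32)

print(f"main: tag=0x{main_tag:05x} index={main_index} offset={offset}")
print(f"sub:  tag=0x{sub_tag:06x} index={sub_index} offset={offset}")
```

Because both arrays share the same offset width, the subarray index is simply the low 5 bits of the 7-bit main array index, and the two extra bits move into the subarray tag.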

FIG. 3 is a diagram for explaining the correspondence between memory addresses and cache entries in the information processing apparatus of this embodiment. In FIG. 3, 100 is a memory (main memory), 101 is the main array (main data array) of the cache unit, and 102 is the subarray (sub data array) of the cache unit. In the example shown in FIG. 3, the main array 101 has 8 entries and 2 ways, and the subarray 102 has 2 entries and 4 ways. The cache line size is 32 bytes.

Data is copied from the memory 100 to the main array 101 and the subarray 102 of the cache unit in units of the cache line size. The memory space is divided by the cache line size, and the divided storage areas (spaces) are assigned to the entries in order.

Denoting the space from address A to address (A + 0x20) in hexadecimal notation as "A", in the example shown in FIG. 3 the areas "0x0000", "0x0100", "0x0200", and so on of the memory 100 are assigned to one entry (the 0th entry) of the main array 101. Similarly, the areas "0x0000", "0x0040", "0x0080", "0x00c0", "0x0100", and so on of the memory 100 are assigned to one entry (the 0th entry) of the subarray 102.

In this way, in this embodiment each storage area obtained by dividing the memory space by the cache line size is assigned one entry in the main array 101 and one entry in the subarray 102. For example, in the example shown in FIG. 3, the area "0x0000" of the memory 100 is assigned to the 0th entry of the main array 101 and also to the 0th entry of the subarray 102.
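The assignment in the FIG. 3 example (32-byte lines, 8 main array entries, 2 subarray entries) amounts to taking the line number modulo each entry count. The sketch below illustrates this; the function name `entries_for` and the constant names are made up for the example.

```python
LINE_SIZE = 32      # cache line size in bytes
MAIN_ENTRIES = 8    # entries in the main array 101
SUB_ENTRIES = 2     # entries in the subarray 102


def entries_for(address):
    """Return (main entry number, sub entry number) assigned to an address."""
    line_number = address // LINE_SIZE
    return line_number % MAIN_ENTRIES, line_number % SUB_ENTRIES


for address in (0x0000, 0x0020, 0x0040, 0x00C0, 0x0100, 0x0200):
    main_entry, sub_entry = entries_for(address)
    print(f"0x{address:04x}: main entry {main_entry}, sub entry {sub_entry}")
```

Areas 0x0000, 0x0100, and 0x0200 all land on main entry 0, while 0x0000, 0x0040, 0x0080, 0x00c0, and 0x0100 all land on sub entry 0, matching the assignment described for FIG. 3.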

In this embodiment, however, when data in the memory 100 is copied, it is registered in a single block belonging to either the assigned entry of the main array 101 or the assigned entry of the subarray 102. A copy of the data at a given address of the memory 100 therefore never exists in both the main array 101 and the subarray 102 at the same time. Data is copied from the memory 100 to the main array 101 or the subarray 102 on the basis of the LRU information held in the LRU array. The copy is made by replacing the block that, among all the blocks of the assigned entries of the main array 101 and the subarray 102, has gone the longest since it was last used (hereinafter also called the oldest block).

FIG. 4 is a flowchart showing an example of the operation in this embodiment. FIG. 4 shows the operation of the cache unit 3 when the processing unit (CPU core) 2 loads data at an address in the memory space; the operation is executed under the control of the cache control unit (not shown) of the cache unit 3.

In step S11, the processing unit (CPU core) 2 sends an address indicating the access destination in order to access the memory 4. In step S12, the main array 10 and the subarray 20 in the cache unit 3 are referenced simultaneously using that address.

In step S12, the main tag array 12 is referenced using the index portion of the address determined by the number of entries in the main array 10, and the sub tag array 22 is referenced using the index portion of the address determined by the number of entries in the subarray 20. A tag is read from the block corresponding to the index in each way of the main tag array 12 and the sub tag array 22, and for each way the comparators 16 and 26 compare the tag read out with the bit pattern of the tag portion of the address.

If the comparison shows that the tag read out matches the bit pattern of the tag portion of the address in any way of the main tag array 12 or the sub tag array 22, a cache hit is determined (Y in step S13) and the process proceeds to step S14. If the tag read out does not match the bit pattern of the tag portion of the address in any way of the main tag array 12 or the sub tag array 22, a cache miss is determined (N in step S13) and the process proceeds to step S15.

In step S14, the selection circuit 18 or 28 outputs, as output data, the data read from the block of the main data array 14 or the sub data array 24 that corresponds to the index and to the way whose tag matched the bit pattern of the tag portion of the address. The LRU information held in the LRU array 30 is then updated so that the block from which the output data was read becomes the block for which the least time has elapsed since it was last used (hereinafter also called the newest block), and the operation ends.

In step S15, on the other hand, the LRU reference operation process shown in FIG. 5 is performed. In this process, the LRU information held in the LRU array 30 is referenced to determine, from among the blocks of the main array 10 and the subarray 20, the block (the oldest block) whose registered data copy and corresponding tag are to be discarded.

FIG. 5 is a flowchart showing the flow of the LRU reference operation process in step S15 shown in FIG. 4.
In step S21, the LRU information corresponding to the address indicating the access destination is read from the LRU array 30. That is, from the LRU information held in the LRU array 30, the LRU information that contains the age order of the ways in the entries of the main array 10 and the subarray 20 assigned to the access destination address is read.

In step S22, the LRU information read in step S21 is referenced to determine the oldest block across the main array and the subarray, and that block is set as the replacement target block in which the copy of the data read from the memory 4 will be placed.

Returning to FIG. 4, in step S16 the memory 4 is accessed, and data is read from the storage area of the memory 4 corresponding to the address indicating the access destination and supplied to the processing unit 2. A copy of the data read from the memory 4 and the corresponding tag are registered in (replace the contents of) the oldest block determined by the LRU reference operation process in step S15. The LRU information held in the LRU array 30 is then updated so that the block whose data copy and tag were replaced becomes the newest block, and the operation ends.
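The load flow of steps S11 to S16 can be modeled in software roughly as follows. This is a sketch under stated assumptions, not the embodiment's hardware: the names `Block` and `TwoArrayCache` are hypothetical, and a per-block last-use timestamp stands in for the LRU information of the LRU array 30, whose actual format is not reproduced here.

```python
import itertools


class Block:
    """One way of one entry: a tag, a data copy, and a last-use timestamp."""

    def __init__(self):
        self.tag = None        # None means the block is empty
        self.data = None
        self.last_used = -1    # empty blocks count as the oldest


class TwoArrayCache:
    def __init__(self, main_entries=8, main_ways=2,
                 sub_entries=2, sub_ways=4, line_size=32):
        self.line_size = line_size
        self.main = [[Block() for _ in range(main_ways)]
                     for _ in range(main_entries)]
        self.sub = [[Block() for _ in range(sub_ways)]
                    for _ in range(sub_entries)]
        self.clock = itertools.count()   # stand-in for LRU age ordering

    def _candidates(self, address):
        """Return (block, tag) pairs for the main and sub entries of address."""
        line = address // self.line_size
        main_entry = self.main[line % len(self.main)]
        sub_entry = self.sub[line % len(self.sub)]
        main_tag = line // len(self.main)
        sub_tag = line // len(self.sub)
        return ([(b, main_tag) for b in main_entry]
                + [(b, sub_tag) for b in sub_entry])

    def load(self, address, memory):
        """Return (data, hit). `memory` maps line numbers to line data."""
        candidates = self._candidates(address)       # S12: reference both arrays
        for block, tag in candidates:
            if block.tag == tag:                     # S13 (Y): cache hit
                block.last_used = next(self.clock)   # S14: becomes newest block
                return block.data, True
        # S15: pick the oldest block across the main array and the subarray
        victim, tag = min(candidates, key=lambda c: c[0].last_used)
        victim.data = memory[address // self.line_size]   # S16: read and replace
        victim.tag = tag
        victim.last_used = next(self.clock)          # replaced block is newest
        return victim.data, False


memory = {n: f"line{n}" for n in range(64)}
cache = TwoArrayCache()
addresses = [0x000, 0x100, 0x200, 0x300, 0x400, 0x500]   # all on main entry 0
first_pass = [cache.load(a, memory)[1] for a in addresses]
second_pass = [cache.load(a, memory)[1] for a in addresses]
print(first_pass, second_pass)
```

Because a line is registered only after both arrays miss, the model preserves the property that a copy never exists in the main array and the subarray at the same time.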

According to this embodiment, a main array 10 and a subarray 20 with different numbers of entries are provided, and each area obtained by dividing the memory space by the cache line size is assigned one entry in the main array 10 and one entry in the subarray 20.

For example, as shown in FIG. 6(A), addresses assigned to the 0th entry of the main array (2 ways), consisting of blocks mw00 and mw10, are assigned to the 0th entry of the subarray (4 ways), consisting of blocks sw00, sw10, sw20, and sw30. Addresses assigned to the second entry of the main array, consisting of blocks mw02 and mw12, are also assigned to the 0th entry of the subarray. Likewise, addresses assigned to the fourth entry of the main array, consisting of blocks mw04 and mw14, and addresses assigned to the sixth entry of the main array, consisting of blocks mw06 and mw16, are assigned to the 0th entry of the subarray.

In other words, the 0th entry of the subarray is shared by four entries of the main array: the 0th, second, fourth, and sixth entries. This configuration realizes a mechanism in which four entries of the main array compete for one entry of the subarray, so that the blocks (ways) of a subarray entry can be allocated dynamically to main array entries. As a result, rather than increasing or decreasing the number of ways uniformly for all entries, the number of ways can be increased or decreased dynamically for each entry individually.

For example, if the number of ways in the main array is m and the number of ways in the subarray is n, the number of ways of each main array entry can be changed dynamically so that it is logically at least m and at most (m + n). In the example of FIG. 6, as shown in FIG. 6(B), the number of ways of each entry changes dynamically between a minimum of 2 ways (main array only) and a maximum of 6 ways (main array plus subarray). Therefore, while an application is running, the hardware can automatically decide to increase or decrease the number of ways for each entry to suit that application and change it to the optimal number of ways, which improves cache utilization efficiency and hit rate.

According to the present embodiment, the increase in the number of ways obtained by providing the subarray means that more accesses hit in the cache, even in cases that would result in a cache miss under the conventional scheme with only a main array, so the hit rate can be improved. In addition, for the same data array size (that is, the same total number of blocks in the data arrays), the present embodiment needs fewer comparators than a conventional configuration whose number of entries is matched to that of the subarray in the present embodiment.

Furthermore, when the present embodiment is compared with the conventional scheme at the same data array size (the same total number of blocks in the data arrays), subarray blocks can be dynamically allocated to the entries that need them, so cache utilization efficiency and hit rate can be improved. For example, compare the cache unit of the present embodiment, with a main array of 8 entries and 2 ways and a subarray of 2 entries and 4 ways as illustrated, against a conventional cache unit of 8 entries and 3 ways. In the conventional cache unit, accessing the entry for the same index three or more times, a count corresponding to the number of ways, results in a cache miss. In the present embodiment, by contrast, up to six such accesses may not result in a cache miss, so the hit rate can be improved.
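The size comparison above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the comparator counts assume one tag comparator per way in each array, which the text does not state explicitly.

```python
def total_blocks(entries: int, ways: int) -> int:
    """Number of blocks (cache lines) in an array with the given geometry."""
    return entries * ways

MAIN = (8, 2)          # main array: 8 entries, 2 ways
SUB = (2, 4)           # subarray: 2 entries, 4 ways
CONVENTIONAL = (8, 3)  # conventional cache: 8 entries, 3 ways

embodiment_blocks = total_blocks(*MAIN) + total_blocks(*SUB)
conventional_blocks = total_blocks(*CONVENTIONAL)

# Logical way count of one main-array entry: m at minimum, m + n at maximum.
min_ways, max_ways = MAIN[1], MAIN[1] + SUB[1]

# Assumption: one tag comparator per way in each array.
embodiment_comparators = MAIN[1] + SUB[1]
# Conventional cache with the subarray's entry count (2) and the same block total.
matched_comparators = embodiment_blocks // SUB[0]

print(embodiment_blocks, conventional_blocks)       # 24 24
print(min_ways, max_ways)                           # 2 6
print(embodiment_comparators, matched_comparators)  # 6 12
```

Both configurations hold 24 blocks, yet an entry in the embodiment can logically grow to 6 ways, while the conventional 8-entry cache is fixed at 3.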

In the embodiment described above, all blocks (ways) of a subarray entry are dynamically allocated to main array entries. Alternatively, a given block (way) may be allocated to a particular main array entry in a fixed manner.

Next, the LRU information held in the LRU array 30 will be described.
When a cache miss occurs, the cache unit 3 copies data and evicts and replaces tags based on the LRU information. In the present embodiment, a plurality of main array entries compete for one subarray entry. The main array is therefore divided into sections, each containing as many entries as the subarray, the sections are regarded as being placed side by side, and the LRU information is kept over the resulting total number of ways.

For example, as shown in FIG. 7(A), suppose the main array has 8 entries and 2 ways and the subarray has 2 entries and 4 ways. The LRU information then indicates the age order of the ways when the entries are regarded as being arranged as shown in FIG. 7(B). That is, the LRU array holds LRU information for one group consisting of the 0th, second, fourth, and sixth entries of the main array and the 0th entry of the subarray. Similarly, the LRU array holds LRU information for one group consisting of the first, third, fifth, and seventh entries of the main array and the first entry of the subarray.

If the age order of the ways is simply encoded as the LRU information, the number of bits becomes very large. This can be avoided, for example, by dividing the LRU information into a part kept within each array (within the main array and within the subarray) and a part kept between the arrays (between the main array and the subarray), which reduces the amount of LRU data. In the example of FIG. 7, as shown in FIG. 7(C), the LRU information within each array uses a binary-relation encoding and the LRU information between arrays uses an order encoding, so that the LRU information for one group consists of 46 bits. In this case, the inter-array LRU information only needs to identify the age order in units of entries, not in units of blocks.

Binary-relation encoding is a method of holding the age relationship between two blocks (ways) in one bit, as shown in FIG. 8(A). In the example of FIG. 8(A), the bits indicate, in order from the most significant bit, the age relationship of block 0 to block 1, block 0 to block 2, block 0 to block 3, block 1 to block 2, block 1 to block 3, and block 2 to block 3. The bit indicating the age relationship of block i to block j (i and j are integers) is "0" when block i is newer than block j and "1" when block i is older than block j. Thus, when the age order is block 2 (oldest), block 1, block 0, block 3 (newest), the LRU information is "001011".
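The binary-relation encoding can be sketched as follows. The function name `encode_pairwise` is hypothetical; the bit order and the meaning of "0" and "1" follow the description of FIG. 8(A) above.

```python
from itertools import combinations

def encode_pairwise(order_oldest_first):
    """Return the binary-relation LRU bit string for blocks 0..n-1."""
    # Position in the age order: 0 is the oldest block.
    age_rank = {blk: pos for pos, blk in enumerate(order_oldest_first)}
    n = len(order_oldest_first)
    # For each pair (i, j) with i < j: "0" if block i is newer than block j,
    # "1" if block i is older than block j.
    return "".join("1" if age_rank[i] < age_rank[j] else "0"
                   for i, j in combinations(range(n), 2))

print(encode_pairwise([2, 1, 0, 3]))  # 001011
```

For n blocks this encoding uses n(n-1)/2 bits, so it takes 1 bit for a 2-way entry and 6 bits for a 4-way entry.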

Order encoding is a method of holding the block numbers as LRU information in order from the oldest, as shown in FIG. 8(B). When the age order is block 2 (oldest), block 1, block 0, block 3 (newest), the LRU information is "10010011", as shown in FIG. 8(B).
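A corresponding sketch of the order encoding is shown below. The function name `encode_order` is hypothetical, and the choice of the smallest fixed bit width that can represent every block number is an assumption that agrees with the 2-bit fields of the FIG. 8(B) example.

```python
def encode_order(order_oldest_first):
    """Concatenate the block numbers, oldest first, as fixed-width binary fields."""
    n = len(order_oldest_first)
    width = max(1, (n - 1).bit_length())  # bits needed for block numbers 0..n-1
    return "".join(format(blk, f"0{width}b") for blk in order_oldest_first)

print(encode_order([2, 1, 0, 3]))  # 10010011
```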

Accordingly, the LRU information in the example of FIG. 7(C) indicates that the blocks are ordered from oldest to newest as mw01, mw13, mw11, mw15, sw21, sw11, mw03, mw07, mw17, sw01, mw05, sw31. Here mwXY denotes the block of way X, entry Y in the main array, and swXY denotes the block of way X, entry Y in the subarray. Although the description above takes as an example the case where the complete age relationship is held as the LRU information, the age relationship may instead be held in an abbreviated form.
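The 46-bit figure quoted for one LRU group can be reproduced by the following accounting. This is a reconstruction under stated assumptions, not a layout taken from FIG. 7(C), which is not reproduced here. It assumes that the intra-array part stores one binary-relation field per entry, and that the inter-array part stores, for each of the 12 age positions, the identifier of the entry (one of 4 main array entries or 1 subarray entry) that owns the block at that position.

```python
from math import comb

MAIN_WAYS, SUB_WAYS = 2, 4
MAIN_ENTRIES_PER_GROUP = 4  # e.g. main entries 1, 3, 5, 7
SUB_ENTRIES_PER_GROUP = 1   # e.g. subarray entry 1

# Intra-array part: binary-relation bits, one field per entry.
intra_main_bits = MAIN_ENTRIES_PER_GROUP * comb(MAIN_WAYS, 2)  # 4 * 1
intra_sub_bits = SUB_ENTRIES_PER_GROUP * comb(SUB_WAYS, 2)     # 1 * 6

# Inter-array part (assumed layout): one entry identifier per age position.
blocks = MAIN_ENTRIES_PER_GROUP * MAIN_WAYS + SUB_ENTRIES_PER_GROUP * SUB_WAYS
entries = MAIN_ENTRIES_PER_GROUP + SUB_ENTRIES_PER_GROUP
entry_id_bits = (entries - 1).bit_length()  # 3 bits distinguish 5 entries
inter_bits = blocks * entry_id_bits         # 12 * 3

total_bits = intra_main_bits + intra_sub_bits + inter_bits
flat_pairwise_bits = comb(blocks, 2)  # binary relation over all 12 blocks at once

print(total_bits, flat_pairwise_bits)  # 46 66
```

Under these assumptions the split encoding needs 46 bits per group, compared with 66 bits if a single binary-relation encoding covered all 12 blocks.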

Next, with reference to FIGS. 9 and 10, a specific example of how the ways are dynamically allocated in the main array and the subarray of the cache unit 3 in the present embodiment will be described. The following takes as an example the execution of the program sequence shown in FIG. 9(A). For simplicity, the cache line size is assumed to be 16 bytes and, as shown in FIG. 9(B), the main array has 8 entries and 2 ways and the subarray has 2 entries and 4 ways.

In the program sequence shown in FIG. 9(A), "LD @(add), R1" denotes reading the data at address add and writing it to register R1. In the following, the block of way wayX, entry 0xY in the main array is written mwXY, and the block of way wayX, entry 0xY in the subarray is written swXY. For example, the block of entry 0x3 of way1 in the main array is written mw13, and the block of entry 0x1 of way2 in the subarray is written sw21. The initial value of the LRU information (listed from oldest to newest, here and below) for the group consisting of main array entries 0x1, 0x3, 0x5, and 0x7 and subarray entry 0x1 is assumed to be mw01, mw11, mw03, mw13, mw05, mw15, mw07, mw17, sw01, sw11, sw21, sw31.

FIG. 10(A) shows the state after processing up to the instruction "LD @(0x51018), R1". The data at address 0x1101_ has been copied to block mw01. (The underscore '_' denotes the cache-line-sized unit of data that contains the word data at the address accessed by the instruction.) The data at addresses 0x2101_ and 0x1207_ have been copied to blocks mw11 and mw07, respectively, and the data at addresses 0x1a01_, 0x1301_, and 0x5101_ have been copied to blocks sw01, sw11, and sw21, respectively. At this point, the LRU information is mw03, mw13, mw05, mw15, mw17, sw31, mw01, mw11, mw07, sw01, sw11, sw21.
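The placement in FIG. 10(A) can be cross-checked by decoding the addresses. The sketch below is illustrative: the helper names are hypothetical, and it assumes that each array's index is taken from the low-order bits of the line address (the address with the 4-bit byte offset of a 16-byte line removed).

```python
LINE_BYTES = 16
MAIN_ENTRIES, SUB_ENTRIES = 8, 2

def decode(addr: int):
    """Split a byte address into (line address, main index, subarray index)."""
    line = addr // LINE_BYTES
    return line, line % MAIN_ENTRIES, line % SUB_ENTRIES

def entry_of(block: str) -> int:
    """Entry number Y from a block name of the form mwXY or swXY."""
    return int(block[3], 16)

# Block -> line address (the 0x...._ values quoted in the text).
placement = {"mw01": 0x1101, "mw11": 0x2101, "mw07": 0x1207,
             "sw01": 0x1A01, "sw11": 0x1301, "sw21": 0x5101}

consistent = all(
    entry_of(blk) == line % (MAIN_ENTRIES if blk.startswith("mw") else SUB_ENTRIES)
    for blk, line in placement.items())

print(decode(0x51018))  # (20737, 1, 1), i.e. line 0x5101, main entry 1, subarray entry 1
print(consistent)       # True
```

Every block named in the text holds a line whose index matches that block's entry number, which is consistent with the assumed decoding.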

Next, the instruction "LD @(0x11018), R1" is executed. Because a copy of the data at address 0x1101_ exists in main array block mw01, this is a cache hit. The data in the main array and the subarray therefore does not change, and only the LRU information is updated. The LRU information becomes mw03, mw13, mw05, mw15, mw17, sw31, mw11, mw07, sw01, sw11, sw21, mw01.

Subsequently, the instructions "LD @(0xab078), R1" and "LD @(0x22070), R1" are executed. FIG. 10(B) shows the state after processing of the instruction "LD @(0x22070), R1" has completed. The differences from the state shown in FIG. 10(A) are that the data at address 0xab07_ has been copied to block mw17 and the data at address 0x2207_ has been copied to block sw31. The LRU information is updated as the instructions execute, and at this point it is mw03, mw13, mw05, mw15, mw11, mw07, sw01, sw11, sw21, mw01, mw17, sw31.

Next, the instruction "LD @(0xab070), R1" is executed. Because a copy of the data at address 0xab07_ exists in main array block mw17, this is a cache hit. The data in the main array and the subarray therefore does not change, and only the LRU information is updated. The LRU information becomes mw03, mw13, mw05, mw15, mw11, mw07, sw01, sw11, sw21, mw01, sw31, mw17.

Next, the instruction "LD @(0xcb070), R1" is executed. A copy of the data at address 0xcb07_ exists in neither the main array nor the subarray, so a cache miss occurs. The data at address 0xcb07_ is therefore read from memory and placed in the oldest block. The replaceable blocks, however, are limited to those corresponding to the accessed address (0xcb07_): main array blocks mw07 and mw17 and subarray blocks sw01, sw11, sw21, and sw31. Based on the LRU information, the data at address 0xcb07_ is therefore copied to mw07, the oldest of these blocks. The LRU information is also updated and becomes mw03, mw13, mw05, mw15, mw11, sw01, sw11, sw21, mw01, sw31, mw17, mw07.

Next, the instruction "LD @(0x2b070), R1" is executed. This is also a cache miss, and the data at address 0x2b07_ is read from memory and placed in the oldest block. The replaceable blocks are those corresponding to the accessed address (0x2b07_): main array blocks mw07 and mw17 and subarray blocks sw01, sw11, sw21, and sw31. Based on the LRU information, the data at address 0x2b07_ is therefore copied to sw01, the oldest of these blocks, giving the state shown in FIG. 10(C). The LRU information is also updated and becomes mw03, mw13, mw05, mw15, mw11, sw11, sw21, mw01, sw31, mw17, mw07, sw01.
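The walkthrough from FIG. 10(A) to FIG. 10(C) can be replayed with a small software model. This is a sketch, not the hardware described in the patent. The function names are hypothetical, each block stores the full line address in place of a tag, and the LRU information is kept as a plain Python list ordered from oldest to newest rather than in the encoded form. Only the LRU group for subarray entry 0x1 is modelled, which is sufficient for this trace.

```python
LINE_BITS = 4                   # 16-byte cache line, as in the worked example
MAIN_ENTRIES, MAIN_WAYS = 8, 2
SUB_ENTRIES, SUB_WAYS = 2, 4

def candidates(line):
    """Blocks that may hold `line`: both main ways plus all four subarray ways."""
    m, s = line % MAIN_ENTRIES, line % SUB_ENTRIES
    return ([f"mw{w}{m}" for w in range(MAIN_WAYS)] +
            [f"sw{w}{s}" for w in range(SUB_WAYS)])

def access(addr, contents, lru):
    """Look up `addr`; on a miss, replace the oldest candidate. Updates the LRU list."""
    line = addr >> LINE_BITS
    cands = candidates(line)
    hit = next((b for b in cands if contents.get(b) == line), None)
    # On a miss, the victim is the candidate that appears earliest (oldest) in `lru`.
    block = hit if hit is not None else min(cands, key=lru.index)
    contents[block] = line
    lru.remove(block)
    lru.append(block)           # the accessed block becomes the newest
    return ("hit" if hit is not None else "miss", block)

# State of FIG. 10(A) as given in the text.
contents = {"mw01": 0x1101, "mw11": 0x2101, "mw07": 0x1207,
            "sw01": 0x1A01, "sw11": 0x1301, "sw21": 0x5101}
lru = "mw03 mw13 mw05 mw15 mw17 sw31 mw01 mw11 mw07 sw01 sw11 sw21".split()

trace = [0x11018, 0xAB078, 0x22070, 0xAB070, 0xCB070, 0x2B070]
results = [access(addr, contents, lru) for addr in trace]
for addr, (outcome, block) in zip(trace, results):
    print(hex(addr), outcome, block)
print(" ".join(lru))
```

Run against this trace, the model hits in mw01 and mw17 and fills mw17, sw31, mw07, and sw01 on the misses, ending with the LRU order mw03, mw13, mw05, mw15, mw11, sw11, sw21, mw01, sw31, mw17, mw07, sw01 given in the text.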

Although the case where the cache unit has two arrays, a main array and a subarray, has been described as an example, the invention is not limited to this, and the cache unit may have three or more arrays. It suffices that there be a plurality of data arrays of which at least one has a different number of entries (that is, the data arrays have two or more distinct entry counts). The arrays are referenced simultaneously, each based on an index portion of the access address whose bit width corresponds to that array's number of entries, and a copy of the data at a given address in the memory space must not exist in two or more data arrays at the same time. The number of entries of the data array with the most entries is preferably an integer multiple of the number of entries of each other data array. Arrays with 2 and 4 ways have been shown as an example, but the number of ways is not limited to these and is arbitrary. The number of ways may also be 1, which corresponds to the direct mapping scheme.

The embodiments described above are merely specific examples of carrying out the present invention, and the technical scope of the present invention should not be construed as being limited by them. That is, the present invention can be carried out in various forms without departing from its technical idea or its main features.

DESCRIPTION OF SYMBOLS
1 Information processing apparatus
2 Processing unit
3 Cache unit
4 Memory
10 First array (main array)
20 Second array (subarray)
30 LRU array
12, 22 Tag array
14, 24 Data array
16, 26 Comparator
18, 28 Selection circuit

Claims

1. An information processing apparatus comprising:
a processing unit that processes data; and
a cache unit provided between a storage unit that stores the data and the processing unit,
wherein the cache unit includes:
a plurality of data arrays to which data of the storage unit is copied; and
an information array that holds first information indicating the age order of the ways in the plurality of data arrays,
each of the plurality of data arrays has one or more ways, and at least one of the plurality of data arrays has a different number of entries, and
one entry in each of the plurality of data arrays is allocated to each storage area obtained by dividing the storage unit into areas of a predetermined size.

2. The information processing apparatus according to claim 1, wherein the plurality of data arrays are referenced simultaneously, each based on the index portion, corresponding to that data array, of an address indicating an access destination from the processing unit.

3. The information processing apparatus according to claim 1, wherein, when data in the storage unit is copied to the data arrays, the data is copied to any one of the plurality of data arrays.

4. The information processing apparatus according to claim 1, wherein the first information is held in the information array divided into information indicating the age order of the ways within each data array and information indicating the age order of the ways between the data arrays.

5. The information processing apparatus according to claim 1, wherein the number of entries of the data array having the largest number of entries is an integer multiple of the number of entries of each of the other data arrays.