JP2024011696A

JP2024011696A - Arithmetic processing apparatus and arithmetic processing method

Info

Publication number: JP2024011696A
Application number: JP2022113918A
Authority: JP
Inventors: 汐中原; Shio Nakahara; 隆英吉川; Takahide Yoshikawa
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2022-07-15
Filing date: 2022-07-15
Publication date: 2024-01-25

Abstract

To provide an arithmetic processing apparatus and an arithmetic processing method for improving efficiency of data transfer. SOLUTION: A storage unit 134 stores data. On receipt of a data access request from an arithmetic unit or a superior cache, a control unit 131 accesses target data when the target data of the data access request exists in the storage unit 134, or, when the target data does not exist in the storage unit 134, acquires the target data from a subordinate cache or a main memory and stores the target data in the storage unit 134. A cache miss information update unit 132 calculates the number of occurrences of cache miss indicating that the target data does not exist in the storage unit 134. A speculative prefetch unit 133 acquires speculative data based on the number of occurrences from the main memory or the subordinate cache, and stores the acquired speculative data in the storage unit 134. SELECTED DRAWING: Figure 2

Description

The present invention relates to an arithmetic processing device and an arithmetic processing method.

In recent years, processor operating frequencies have improved dramatically. By contrast, the operating speed of DRAM (Dynamic Random Access Memory), which is commonly used as main memory, has improved only slowly, and there is active research into architectures that make data transfer more efficient so that processor performance can be fully exploited. In an information processing device, a cache memory with faster data access than the main memory is generally placed in the CPU (Central Processing Unit). Placing recently referenced data in this cache memory reduces the latency caused by main memory references.

However, a cache memory placed to reduce latency works effectively only if the data to be referenced is stored in it. If the referenced data is not in the cache memory, an access to the main memory occurs, so even with a cache memory in place the data transfer speed becomes the bottleneck for computation speed. For this reason, data transfer speed is improved by storing data in the cache memory efficiently, for example by prefetching.

As a cache management technique, a technique has been proposed in which the cache includes a direct-mapped portion and a multi-way portion managed by a coordinated replacement policy, and the multi-way portion functions as a victim cache for the direct-mapped portion. A victim cache is a cache to which data evicted from a cache is written. A technique has also been proposed in which an entry evicted from the cache is temporarily held in a write buffer before being written back to the main memory, and the data is returned to the cache if requested data is found in the write buffer. Another proposed technique predicts which branch a conditional branch will take and searches memory based on the prediction, so that the next instruction has already been fetched if the prediction is correct, and fetches speculatively until the conditional branch is actually resolved if the prediction is wrong. Yet another proposed technique repeats an operation of storing data transferred to a work area in memory into a specific cache area of the cache, processing the data, and then evicting the data from that specific cache area, thereby leaving highly local data in the cache memory.

Japanese National Publication of International Patent Application No. 2018-512650
Japanese Laid-open Patent Publication No. 08-314802
U.S. Patent Application Publication No. 2018/0322059
Japanese Laid-open Patent Publication No. 2011-86131

However, in computations with many irregular accesses, such as sparse matrix operations, future data accesses are difficult to predict. For example, for the vector data in sparse matrix-vector multiplication (SpMV), it is very difficult to predict which data will be used in the future. When the prediction of future data accesses fails, prefetching cannot store data in the cache memory efficiently, and it is difficult to make data transfer more efficient. With any of the techniques described above, it is difficult to achieve efficient prefetching in computations with many irregular accesses, and therefore difficult to make data transfer more efficient.

The disclosed technology has been made in view of the above, and aims to provide an arithmetic processing device and an arithmetic processing method that make data transfer more efficient.

In one aspect of the arithmetic processing device and the arithmetic processing method disclosed in the present application, the arithmetic processing device includes a calculation unit, a main memory, and one cache or a plurality of hierarchically arranged caches. At least one of the caches includes the following units. A storage unit stores data. A control unit receives a data access request from the calculation unit or an upper cache, and accesses the target data of the data access request when the target data exists in the storage unit. When the target data does not exist in the storage unit, the control unit acquires the target data from a lower cache or the main memory and stores it in the storage unit. An information management unit calculates the number of occurrences of cache misses, a cache miss indicating that the target data does not exist in the storage unit. A speculative prefetch unit acquires speculative data from the main memory or the lower cache based on the number of occurrences, and stores the acquired speculative data in the storage unit.

In one aspect, the present invention can make data transfer more efficient.

FIG. 1 is a schematic diagram showing the overall configuration of an information processing device.
FIG. 2 is a block diagram of the L1 and L2 caches according to the first embodiment.
FIG. 3 is a diagram showing an example of a memory address.
FIG. 4 is a diagram showing an example of cache miss information.
FIG. 5 is a flowchart of data cache processing by the control unit according to the first embodiment.
FIG. 6 is a flowchart of cache miss information update processing by the cache miss information update unit.
FIG. 7 is a flowchart of speculative prefetch processing by the speculative prefetch unit.
FIG. 8 is a block diagram of the L2 cache according to the second embodiment.
FIG. 9 is a diagram showing the configuration of cache information according to the second embodiment.
FIG. 10 is a flowchart of data cache processing by the control unit according to the second embodiment.
FIG. 11 is a flowchart of data storage processing by the data monitor unit.
FIG. 12 is a flowchart of data cache processing when MRAM is used as a victim cache.

Embodiments of the arithmetic processing device and the arithmetic processing method disclosed in the present application are described in detail below with reference to the drawings. The following embodiments do not limit the arithmetic processing device and the arithmetic processing method disclosed in the present application.

FIG. 1 is a schematic diagram showing the overall configuration of an information processing device. As shown in FIG. 1, the information processing device 1 includes a calculation unit 11, an L1 cache 12, an L2 cache 13, an L3 cache 14, a main memory 15, an auxiliary storage device 16, a display device 17, and an input device 18. The calculation unit 11 is connected by a bus to each of the L1 cache 12, L2 cache 13, L3 cache 14, main memory 15, auxiliary storage device 16, display device 17, and input device 18. The calculation unit 11, the L1 cache 12, the L2 cache 13, and the L3 cache 14 are mounted, for example, on a CPU 10, which is an arithmetic processing device.

The calculation unit 11 is, for example, a CPU (Central Processing Unit) core. The calculation unit 11 reads various programs stored in the auxiliary storage device 16, loads them into the main memory 15, and executes computations using the data stored in the L1 cache 12, L2 cache 13, L3 cache 14, and main memory 15.

The L1 cache 12 is a cache memory that operates at high speed and has a smaller capacity than the L2 cache 13 and the L3 cache 14, and is the cache memory that is read first when the calculation unit 11 accesses data. The L1 cache 12 is, for example, SRAM (Static Random Access Memory).

The L2 cache 13 is a cache memory that operates at high speed and generally has a larger capacity than the L1 cache 12, and is the cache memory that is read next when a cache miss occurs in the L1 cache 12 during data access by the calculation unit 11. The L2 cache 13 is also, for example, SRAM.

The L3 cache 14 is a cache memory that operates at high speed and generally has a larger capacity than the L2 cache 13, and is the cache memory that is read next when a cache miss occurs in the L2 cache 13 during data access by the calculation unit 11. The L3 cache 14 is also, for example, SRAM.

This embodiment describes a case in which the information processing device 1 has three cache memories, the L1 cache 12, the L2 cache 13, and the L3 cache 14, but the number of cache levels is not limited to this. For example, the information processing device 1 may omit the L2 cache 13 or the L3 cache 14, or may have four or more levels.

The main memory 15 is a main storage device that operates more slowly and has a larger capacity than the L1 cache 12, L2 cache 13, and L3 cache 14. The main memory 15 stores the data that the calculation unit 11 uses in computations. The main memory 15 is accessed by the calculation unit 11 when the data to be accessed exists in none of the L1 cache 12, L2 cache 13, and L3 cache 14. The main memory 15 is, for example, DRAM (Dynamic Random Access Memory).

The auxiliary storage device 16 is, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive). The auxiliary storage device 16 stores an OS (Operating System) and various programs for performing computations.

The display device 17 is, for example, a monitor or a display. The display device 17 presents the results of computations by the calculation unit 11 to the user, among other things. The input device 18 is, for example, a keyboard or a mouse. The user enters data and commands into the information processing device 1 with the input device 18 while referring to the screen shown on the display device 17. The display device 17 and the input device 18 may be configured as a single piece of hardware.

FIG. 2 is a block diagram of the L1 and L2 caches according to the first embodiment. In FIG. 2, the L1 cache 12, L2 cache 13, and L3 cache 14 are drawn as connected in cascade to make their hierarchy easier to see. In the actual connection, each may be connected to the bus extending from the calculation unit 11, as shown in FIG. 1.

The L1 cache 12 includes a control unit 121 and a storage unit 122. The storage unit 122 holds cache information 123, which is the group of cached data.

The control unit 121 receives a data request from the calculation unit 11. The control unit 121 then determines whether the data specified in the data request exists in the storage unit 122. If the specified data exists in the cache information 123 held by the storage unit 122, the control unit 121 extracts the specified data from the cache information 123 and sends it to the calculation unit 11 as a response.

If, on the other hand, the specified data does not exist in the cache information 123 held by the storage unit 122, the control unit 121 determines that a cache miss has occurred and outputs the data request to the L2 cache 13. When it subsequently receives the requested data from the L2 cache 13, the control unit 121 stores the received data as cache information 123 held by the storage unit 122. Depending on the cache scheme, the L1 cache 12 may output the data received from the L2 cache 13 to the calculation unit 11.

Next, the L2 cache 13 is described. In this embodiment, the L2 cache 13 operates using a set-associative scheme as its cache mapping scheme. FIG. 3 is a diagram showing an example of a memory address. In this embodiment, the memory address 101 has, for example, a tag, a set, a block offset, and a byte offset, as shown in FIG. 3. The tag and the set together are called the block address. The block address is information indicating which block in the main memory 15 is meant. The set is an index indicating which set in the cache the data is stored in. The block offset is information indicating which part of the block contains the requested data. The byte offset is information indicating which part of a word, the unit of a single read or write, contains the requested data. Hereinafter, the memory address indicating the location where a piece of data is stored is called the memory address of that data.
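The field layout described for memory address 101 can be illustrated with a minimal sketch. The bit widths below (4-byte words, 64 words per block, 2048 sets) are assumptions chosen for illustration only; the source does not state them, and the names `split_address` and `block_address` are hypothetical.

```python
from typing import NamedTuple

# Assumed field widths (not given in the source).
BYTE_OFFSET_BITS = 2    # selects a byte within a word
BLOCK_OFFSET_BITS = 6   # selects a word within a block
SET_BITS = 11           # selects one of 2048 sets


class AddressFields(NamedTuple):
    tag: int
    set_index: int
    block_offset: int
    byte_offset: int


def split_address(addr: int) -> AddressFields:
    """Split a memory address into tag, set, block offset and byte offset."""
    byte_offset = addr & ((1 << BYTE_OFFSET_BITS) - 1)
    addr >>= BYTE_OFFSET_BITS
    block_offset = addr & ((1 << BLOCK_OFFSET_BITS) - 1)
    addr >>= BLOCK_OFFSET_BITS
    set_index = addr & ((1 << SET_BITS) - 1)
    tag = addr >> SET_BITS
    return AddressFields(tag, set_index, block_offset, byte_offset)


def block_address(addr: int) -> int:
    """Tag and set taken together: identifies the block in main memory."""
    return addr >> (BYTE_OFFSET_BITS + BLOCK_OFFSET_BITS)
```

With these assumed widths, the tag occupies the bits above bit 18, which is why a per-tag counter covers a contiguous address range.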

An example of the number of sets in the L2 cache 13 is now described. The capacity of the area that stores the cache information 135 in the storage unit 134 of the L2 cache 13 is calculated as the product of the number of sets, the number of ways, and the block size. For example, one configuration of the L2 cache 13 has 16 ways, 2048 sets, and a block size of 256 bytes. Another configuration has 16 ways, 1024 sets, and a block size of 64 bytes.
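As a worked check of the capacity formula, the following sketch multiplies sets, ways, and block size for the two configurations mentioned. The function name is hypothetical.

```python
def cache_capacity_bytes(num_sets: int, num_ways: int, block_size_bytes: int) -> int:
    """Capacity of the data area: sets x ways x block size."""
    return num_sets * num_ways * block_size_bytes


# 16 ways, 2048 sets, 256-byte blocks
large = cache_capacity_bytes(2048, 16, 256)
# 16 ways, 1024 sets, 64-byte blocks
small = cache_capacity_bytes(1024, 16, 64)

print(large // (1024 * 1024), "MiB")  # 8 MiB
print(small // (1024 * 1024), "MiB")  # 1 MiB
```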

Returning to FIG. 2, the description continues. The L2 cache 13 according to this embodiment performs speculative prefetching of data subject to many irregular accesses when there is spare processing capacity. That is, the L2 cache 13 stores data that it has determined to be subject to many irregular accesses in the storage unit 134 as cache information 135 in advance, before the data is accessed. Although this embodiment describes the case in which the L2 cache 13 performs the speculative prefetching based on irregular accesses, this is not a limitation, and another cache memory such as the L1 cache 12 or the L3 cache 14 may perform it. However, given its small capacity and the frequency of accesses from the calculation unit 11, the L1 cache 12 is not well suited to the speculative prefetching based on irregular accesses according to the embodiment.

The L2 cache 13 is described in detail below. As shown in FIG. 2, the L2 cache 13 includes a control unit 131, a cache miss information update unit 132, a speculative prefetch unit 133, and a storage unit 134.

The control unit 131 receives a data request from the L1 cache 12. The control unit 131 then determines whether the data specified in the data request exists in the storage unit 134. If the specified data exists in the cache information 135 held by the storage unit 134, the control unit 131 extracts the specified data from the cache information 135 and sends it to the L1 cache 12 and the calculation unit 11 as a response.

If, on the other hand, the specified data does not exist in the cache information 135 held by the storage unit 134, the control unit 131 determines that a cache miss has occurred and outputs the data request to the L3 cache 14. The control unit 131 also notifies the cache miss information update unit 132 of the cache miss for the specified data. When it subsequently receives the requested data from the L3 cache 14, the control unit 131 stores the received data as cache information 135 held by the storage unit 134. Depending on the cache scheme, the L2 cache 13 may output the data received from the L3 cache 14 to the L1 cache 12.
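The hit and miss handling described for the control units can be sketched as a chain of cache levels. This is a simplified illustration under stated assumptions, not the patented implementation: capacity limits and eviction are omitted, caches are keyed by block address, and the `on_miss` hook is a hypothetical stand-in for the notification sent to the cache miss information update unit 132.

```python
from typing import Callable, Dict, Optional


class MainMemory:
    """Backing store; always holds the requested block in this sketch."""

    def __init__(self, contents: Dict[int, bytes]):
        self.contents = contents

    def read(self, block_addr: int) -> bytes:
        return self.contents[block_addr]


class CacheLevel:
    """One cache level that forwards misses to the next lower level."""

    def __init__(self, lower, on_miss: Optional[Callable[[int], None]] = None):
        self.lines: Dict[int, bytes] = {}   # block address -> cached data
        self.lower = lower                  # lower cache or main memory
        self.on_miss = on_miss              # hypothetical miss-notification hook

    def read(self, block_addr: int) -> bytes:
        if block_addr in self.lines:        # hit: answer from this level
            return self.lines[block_addr]
        if self.on_miss is not None:        # miss: notify the listener
            self.on_miss(block_addr)
        data = self.lower.read(block_addr)  # request the data from the lower level
        self.lines[block_addr] = data       # store it as cache information
        return data


# Usage: L1 -> L2 -> L3 -> main memory, with L2 misses recorded.
l2_misses = []
l3 = CacheLevel(MainMemory({0x10: b"x"}))
l2 = CacheLevel(l3, on_miss=l2_misses.append)
l1 = CacheLevel(l2)
l1.read(0x10)   # misses in L1, L2 and L3, then fills all three
```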

The cache miss information 136 is a table for managing the frequency of cache misses for each tag contained in memory addresses. FIG. 4 is a diagram showing an example of the cache miss information. The cache miss information 136 registers tags and the counter value corresponding to each tag. The tag is the upper part of the memory address 101 shown in FIG. 3. The counter represents the number of cache misses that have occurred in the address range corresponding to the tag.

The cache miss information update unit 132 updates the cache miss information 136 in response to changes in the cache information 135. More specifically, when a cache miss occurs, the cache miss information update unit 132 receives a notification of the cache miss for the specified data from the control unit 131. Next, the cache miss information update unit 132 refers to the cache miss information 136 stored in the storage unit 134. The cache miss information update unit 132 then determines whether an entry corresponding to the specified data exists in the cache miss information 136. That is, it determines whether the cache miss information 136 contains an entry having the tag included in the memory address of the specified data.

If no corresponding entry exists, the cache miss information update unit 132 adds to the cache miss information 136 an entry having the tag included in the memory address of the specified data, and sets its counter to the initial counter value. If a corresponding entry exists, the cache miss information update unit 132 increments the counter in the cache miss information 136 that corresponds to the tag included in the memory address of the specified data. In this way, the cache miss information update unit 132 accumulates the number of cache misses for each tag. When it has updated the cache miss information 136, the cache miss information update unit 132 notifies the speculative prefetch unit 133 of the update.

The L2 cache 13 may also perform hardware prefetching such as stride prefetching. However, the cache miss information update unit 132 does not treat a hardware prefetch as a cache miss, and does not increment the counters in the cache miss information 136 when a hardware prefetch occurs.

The cache miss information update unit 132 may also clear the entries of the cache miss information 136 when a predetermined condition is satisfied. For example, it may clear the entries when a group of computations in a particular process has finished. The cache miss information update unit 132 is an example of the "information management unit".
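The per-tag bookkeeping described above can be sketched as a small table. The class and method names are hypothetical, and the initial counter value of 1 is an assumption, since the source only refers to "the initial value". Hardware prefetches are simply never passed to `record_demand_miss` in this sketch.

```python
from typing import Dict


class CacheMissInfo:
    """Per-tag cache miss counters (illustrative stand-in for cache miss information 136)."""

    INITIAL_COUNT = 1  # assumed; the source does not give the initial value

    def __init__(self) -> None:
        self.counters: Dict[int, int] = {}

    def record_demand_miss(self, tag: int) -> int:
        """Add an entry for a new tag, or increment the existing counter."""
        if tag not in self.counters:
            self.counters[tag] = self.INITIAL_COUNT
        else:
            self.counters[tag] += 1
        return self.counters[tag]

    def clear(self) -> None:
        """Drop all entries, e.g. when a group of computations has finished."""
        self.counters.clear()


info = CacheMissInfo()
for tag in (0xA, 0xB, 0xA, 0xA):
    info.record_demand_miss(tag)
```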

When the bus between the L2 cache 13 and the L3 cache 14 has spare capacity, the speculative prefetch unit 133 prefetches data that is not yet cached from an address range with many irregular accesses. That is, the speculative prefetch unit 133 acquires the uncached data from the L3 cache 14 or the main memory 15, which are lower-level memories, and stores it in the storage unit 134 as cache information 135.

In ordinary prefetching such as hardware prefetching, prefetches are issued by inferring regularity in data accesses. In other words, if data accesses within an address range follow a regularity that can be inferred, ordinary prefetching is expected to suppress cache misses in that address range. It follows that when cache misses occur frequently, the data accesses are assumed to deviate from any inferable regularity. An address range with a large number of cache misses can therefore be regarded as an address range with many irregular accesses. The speculative prefetch unit 133 accordingly determines that data subject to irregular access lies in an address range whose counter value in the cache miss information 136 is large, and stores that data in the storage unit 134 of the L2 cache 13.

However, if the counter values in the cache miss information 136 are small overall, it can be said that irregular accesses are not occurring in any address range. The speculative prefetch unit 133 therefore holds a predetermined determination threshold for deciding whether irregular accesses are occurring. A larger threshold causes more irregular accesses to go undetected, while a smaller threshold causes more false detections of irregular access. The determination threshold is therefore preferably chosen according to the operating state of the information processing device 1.

Because data subject to irregular access is inferred to lie in a specific address range, and data whose future access is uncertain is selected from that address range and prefetched, the prefetching performed by the speculative prefetch unit 133 is hereinafter called speculative prefetching. The data that the speculative prefetch unit 133 targets for speculative prefetching is an example of "speculative data". The operation of the speculative prefetch unit 133 is described in detail below.

The speculative prefetch unit 133 receives a notification of an update to the cache miss information 136 from the cache miss information update unit 132. Next, the speculative prefetch unit 133 determines whether the storage unit 134 has free space for storing data. If there is free space, the speculative prefetch unit 133 selects one set that has a free way.

Next, the speculative prefetch unit 133 refers to the cache miss information 136. The speculative prefetch unit 133 then selects entries from the cache miss information 136 in descending order of counter value.

Next, the speculative prefetch unit 133 determines whether the counter value of the selected entry is greater than the determination threshold. If the counter value is greater than the determination threshold, the speculative prefetch unit 133 determines whether data corresponding to the selected set and the tag of the selected entry exists in the storage unit 134. If data corresponding to the selected set and tag already exists in the storage unit 134, the speculative prefetch unit 133 selects the entry with the next largest counter value from the cache miss information 136 and repeats the same processing.
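The candidate selection described above can be sketched as follows. All names are hypothetical. The sketch assumes that the search stops once a counter is not above the threshold (every remaining entry is no larger), which this passage does not state explicitly, and it represents cached blocks as a set of `(tag, set_index)` pairs.

```python
from typing import Dict, Optional, Set, Tuple


def pick_prefetch_target(
    miss_counters: Dict[int, int],
    cached_blocks: Set[Tuple[int, int]],
    free_set: int,
    threshold: int,
) -> Optional[Tuple[int, int]]:
    """Return the (tag, set) to prefetch into free_set, or None if there is none."""
    # Visit entries from the largest counter value downward.
    ordered = sorted(miss_counters.items(), key=lambda item: item[1], reverse=True)
    for tag, count in ordered:
        if count <= threshold:
            return None                      # assumed stop condition
        if (tag, free_set) not in cached_blocks:
            return (tag, free_set)           # uncached block in a miss-heavy range
        # Already cached: fall through to the next largest counter.
    return None


counters = {0xA: 9, 0xB: 5, 0xC: 2}
target = pick_prefetch_target(counters, {(0xA, 3)}, free_set=3, threshold=4)
```

Here tag `0xA` has the largest counter but its block for set 3 is already cached, so the next entry, tag `0xB`, is chosen.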

If, on the other hand, data corresponding to the selected set and tag does not exist in the storage unit 134, the speculative prefetch unit 133 waits until the bus between the L2 cache 13 and the L3 cache 14 has spare capacity. Specifically, the speculative prefetch unit 133 determines that there is spare capacity when the load on the bus between the L2 cache 13 and the L3 cache 14 is smaller than a preset load threshold. As the measure of bus load, the speculative prefetch unit 133 can use, for example, the bus busy rate (utilization), the load/store ratio, the cache miss rate of the L2 cache 13, or the latency of data acquisition.

例えば、ロードストア比率を用いる場合、投機的プリフェッチ部１３３は、実行命令に対するロード命令数及びストア命令数の比率をＬ１キャッシュ１２から取得する。ロード命令及びストア命令の場合データがバスで運ばれるため、その比率が大きい場合に、投機的プリフェッチ部１３３は、バスの処理量が大きいと判定できる。また、Ｌ２キャッシュ１３のキャッシュミスが多い場合、Ｌ２キャッシュ１３がＬ３キャッシュ１４から取得するデータ量が増加する。そこで、Ｌ２キャッシュ１３のキャッシュミスが多い場合に、投機的プリフェッチ部１３３は、バスの処理量が大きいと判定できる。また、データ取得に係るレイテンシは、演算部１１によるＬ１キャッシュ１２からのデータ取得にかかる時間である。データ取得に係るレイテンシはＬ３キャッシュ１４やメインメモリ１５へのアクセスが増えると増加するため、この値が大きい場合に、投機的プリフェッチ部１３３は、バスの処理量が大きいと判定できる。 For example, when using the load/store ratio, the speculative prefetch unit 133 obtains the ratio of the number of load instructions and the number of store instructions to the executed instructions from the L1 cache 12. In the case of load instructions and store instructions, data is carried on the bus, so if the ratio is large, the speculative prefetch unit 133 can determine that the amount of processing on the bus is large. Furthermore, when there are many cache misses in the L2 cache 13, the amount of data that the L2 cache 13 acquires from the L3 cache 14 increases. Therefore, when there are many cache misses in the L2 cache 13, the speculative prefetch unit 133 can determine that the amount of bus processing is large. Furthermore, the latency related to data acquisition is the time required for the calculation unit 11 to acquire data from the L1 cache 12. The latency associated with data acquisition increases as the number of accesses to the L3 cache 14 and main memory 15 increases, so if this value is large, the speculative prefetch unit 133 can determine that the amount of bus processing is large.
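The spare-capacity check described above can be sketched as a comparison of one chosen load indicator against a preset threshold. This is only an illustrative sketch: the names `BusMetrics`, `load_store_ratio`, and `has_spare_capacity`, the field names, and the sample values are assumptions, not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class BusMetrics:
    """Hypothetical snapshot of the indicators named in the description."""
    busy_rate: float         # fraction of time the L2-L3 bus is busy
    load_store_ratio: float  # (loads + stores) / executed instructions
    l2_miss_rate: float      # L2 misses / L2 accesses
    latency_cycles: float    # data acquisition latency seen by the core


def load_store_ratio(loads: int, stores: int, executed: int) -> float:
    """Ratio of load and store instructions to executed instructions."""
    return (loads + stores) / executed if executed else 0.0


def has_spare_capacity(metrics: BusMetrics, indicator: str, threshold: float) -> bool:
    """True when the chosen indicator is smaller than the preset threshold."""
    return getattr(metrics, indicator) < threshold


# Example with assumed values: 30 loads and 10 stores out of 100 instructions.
sample = BusMetrics(busy_rate=0.3,
                    load_store_ratio=load_store_ratio(30, 10, 100),
                    l2_miss_rate=0.05,
                    latency_cycles=12.0)
```

With these assumed values, `has_spare_capacity(sample, "busy_rate", 0.5)` is true, so a prefetch waiting on the bus would be allowed to proceed.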

Once the bus between the L2 cache 13 and the L3 cache 14 has spare processing capacity, the speculative prefetch unit 133 checks whether the data corresponding to the selected set and tag was stored in the storage unit 134 during the wait. If it was not, the speculative prefetch unit 133 acquires the corresponding data from the L3 cache 14 and stores it in the storage unit 134 of the L2 cache 13. In this way, the speculative prefetch unit 133 executes the speculative prefetch processing.

FIG. 5 is a flowchart of data cache processing by the control unit according to the first embodiment. The flow of data cache processing by the control unit 131 is described next with reference to FIG. 5, taking as an example an access to data A by the arithmetic unit 11.

The control unit 131 receives a transmission request for data A from the L1 cache 12 (step S101).

Next, the control unit 131 determines whether data A exists in the cache information 135 stored in the storage unit 134 (step S102).

If data A does not exist in the cache information 135 stored in the storage unit 134 (step S102: No), the control unit 131 notifies the cache miss information update unit 132 of the cache miss for data A (step S103).

Next, the control unit 131 requests the L3 cache 14 to transmit data A (step S104). Step S104 may be performed at the same time as, or before, step S103.

The control unit 131 then acquires data A from the L3 cache 14 and stores it as cache information 135 in the storage unit 134, which is an SRAM (step S105).

On the other hand, if data A exists in the cache information 135 stored in the storage unit 134 (step S102: Yes), the control unit 131 acquires data A from the cache information 135 held in the storage unit 134 and transmits it to the L1 cache 12 and the arithmetic unit 11 (step S106).
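Steps S101 to S106 can be sketched as follows. This is a minimal model, not the actual hardware logic: plain dictionaries stand in for the storage unit 134 and the L3 cache 14, `notify_miss` stands in for the notification to the cache miss information update unit 132, and returning the data after step S105 is an assumption, since the flow as described ends at the store.

```python
def handle_request(addr, cache, lower_level, notify_miss):
    """Serve one request for `addr` (step S101)."""
    if addr in cache:               # S102: Yes
        return cache[addr]          # S106: send the data to the requester
    notify_miss(addr)               # S103: report the cache miss
    data = lower_level[addr]        # S104: request, then acquire from L3
    cache[addr] = data              # S105: store as cache information
    return data                     # assumption: also returned to the requester


# Example: the first access misses and fills the cache, the second one hits.
misses = []
l2 = {}
l3 = {0x40: "A"}
first = handle_request(0x40, l2, l3, misses.append)
second = handle_request(0x40, l2, l3, misses.append)
```

After both calls, `misses` holds a single entry, because only the first access reached step S103.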

FIG. 6 is a flowchart of the cache miss information update processing by the cache miss information update unit. The flow of this processing by the cache miss information update unit 132 is described next with reference to FIG. 6.

The cache miss information update unit 132 receives a cache miss notification from the control unit 131 (step S201).

Next, the cache miss information update unit 132 determines whether the cache miss information 136 contains an entry corresponding to the missed data, that is, an entry corresponding to the tag included in the memory address of the data specified in the notification (step S202).

If an entry corresponding to the missed data exists (step S202: Yes), the cache miss information update unit 132 increments the counter value of that entry (step S203).

On the other hand, if no entry corresponding to the missed data exists (step S202: No), the cache miss information update unit 132 adds an entry corresponding to the missed data to the cache miss information 136 (step S204) and sets the counter value of the added entry to the initial value.
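The update in steps S201 to S204 amounts to a per-tag counter table. The sketch below assumes a 6-bit block offset and a 10-bit set index when extracting the tag, and an initial counter value of 1; these widths and the initial value are illustrative assumptions, as are the names `tag_of` and `record_miss`.

```python
OFFSET_BITS = 6     # assumed block-offset width
INDEX_BITS = 10     # assumed set-index width
INITIAL_COUNT = 1   # assumed initial counter value


def tag_of(address: int) -> int:
    """Extract the tag field from a memory address (assumed layout)."""
    return address >> (OFFSET_BITS + INDEX_BITS)


def record_miss(miss_info: dict, address: int) -> int:
    """Update the miss table for one notified miss and return the new count."""
    tag = tag_of(address)
    if tag in miss_info:                 # S202: Yes
        miss_info[tag] += 1              # S203: increment the counter
    else:                                # S202: No
        miss_info[tag] = INITIAL_COUNT   # S204: add entry with initial value
    return miss_info[tag]


# Example: two misses that share a tag, then one with a different tag.
table = {}
record_miss(table, 0x12345678)
record_miss(table, 0x12340000)
record_miss(table, 0xABCD0000)
```

Because the first two addresses share the tag `0x1234` under the assumed layout, they accumulate in one entry.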

FIG. 7 is a flowchart of the speculative prefetch processing by the speculative prefetch unit. The flow of this processing by the speculative prefetch unit 133 is described next with reference to FIG. 7.

The speculative prefetch unit 133 receives an update notification for the cache miss information 136 from the cache miss information update unit 132 (step S301).

Next, the speculative prefetch unit 133 determines whether the storage unit 134 has free space to store data (step S302). If there is no free space (step S302: No), the speculative prefetch unit 133 ends the speculative prefetch processing.

On the other hand, if there is free space (step S302: Yes), the speculative prefetch unit 133 selects a set that has a free way (step S303).

Next, the speculative prefetch unit 133 refers to the cache miss information 136 and selects, from among the not-yet-selected entries registered in it, the entry with the largest counter value (step S304).

Next, the speculative prefetch unit 133 determines whether the counter value of the selected entry is greater than the determination threshold (step S305). If the counter value is less than or equal to the determination threshold (step S305: No), the speculative prefetch unit 133 ends the speculative prefetch processing.

On the other hand, if the counter value of the selected entry is greater than the determination threshold (step S305: Yes), the speculative prefetch unit 133 determines whether data corresponding to the selected set and tag exists in the storage unit 134 (step S306). If that data already exists in the storage unit 134 (step S306: Yes), the speculative prefetch unit 133 returns to step S304.

On the other hand, if the data corresponding to the selected set and tag does not exist in the storage unit 134 (step S306: No), the speculative prefetch unit 133 waits until the bus between the L2 cache 13 and the L3 cache 14 has spare processing capacity (step S307).

The speculative prefetch unit 133 then checks again whether the data corresponding to the selected set and tag has been stored in the storage unit 134 (step S308). If it has (step S308: Yes), the speculative prefetch unit 133 returns to step S304.

On the other hand, if the data corresponding to the selected set and tag has not been stored in the storage unit 134 (step S308: No), the speculative prefetch unit 133 acquires the corresponding data from the L3 cache 14 and stores it in the storage unit 134 as cache information 135 (step S309).
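The loop formed by steps S302 to S309 can be sketched as below. Dictionaries keyed by `(set_index, tag)` stand in for the storage unit 134 and the L3 cache 14, `free_sets` lists the sets that still have a free way, and `wait_for_bus` is a hypothetical callback that blocks until the bus has spare capacity. Ending the pass when every entry has been visited is an assumption; the description does not state that case.

```python
def speculative_prefetch(miss_info, cache, lower_level, free_sets,
                         threshold, wait_for_bus):
    """Run one pass of the flow; return the (set, tag) fetched, or None."""
    if not free_sets:                        # S302: No free space
        return None
    set_index = free_sets[0]                 # S303: pick a set with a free way
    # S304: visit entries in descending order of counter value.
    ordered = sorted(miss_info.items(), key=lambda item: item[1], reverse=True)
    for tag, count in ordered:
        if count <= threshold:               # S305: No -> end processing
            return None
        key = (set_index, tag)
        if key in cache:                     # S306: Yes -> next entry
            continue
        wait_for_bus()                       # S307: wait for spare capacity
        if key in cache:                     # S308: filled during the wait
            continue
        cache[key] = lower_level[key]        # S309: fetch from L3 and store
        return key
    return None                              # assumption: all entries visited


# Example: tag 0xB misses most often but is already cached, so 0xA is fetched.
miss_table = {0xA: 5, 0xB: 9, 0xC: 2}
l2 = {(3, 0xB): "dataB"}
l3 = {(3, 0xA): "dataA", (3, 0xB): "dataB", (3, 0xC): "dataC"}
fetched = speculative_prefetch(miss_table, l2, l3, [3], 4, lambda: None)
```

The second check after the wait matters because an ordinary demand miss can bring the same block into the cache while the prefetch is stalled.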

As described above, the cache of the information processing device according to this embodiment speculatively prefetches data in address ranges where cache misses occur frequently. This raises the likelihood that data has already been prefetched when irregular data accesses occur. The limited memory bandwidth can therefore be used without waste, and data transfer can be made more efficient.

FIG. 8 is a block diagram of the L2 cache according to the second embodiment. Details of the L1 cache 12 are omitted from FIG. 8. The L2 cache 13 according to this embodiment has a storage unit 134 that is a hybrid memory combining an SRAM 137 with an MRAM 138, a memory of higher storage density than the SRAM 137. In the storage unit 134, the SRAM 137 is the main area and the MRAM 138 is the spare area. The L2 cache 13 stores speculatively prefetched data in the MRAM 138 and moves frequently accessed data from the MRAM 138 to the SRAM 137. The L2 cache 13 according to this embodiment is described in detail below; operations of units that are the same as in the first embodiment are not described again.

As shown in FIG. 8, the L2 cache 13 according to this embodiment has a data monitor unit 140 in addition to the units of the first embodiment. Furthermore, the storage unit 134 of the L2 cache 13 according to this embodiment is a hybrid memory combining the SRAM 137 and the MRAM (Magnetoresistive Random Access Memory) 138. The MRAM 138 has a higher storage density than the SRAM 137.

The SRAM 137 stores the cache information 135. The SRAM 137 has the same fields and functions as a general cache memory.

The MRAM 138 stores auxiliary cache information 139 and the cache miss information 136. The auxiliary cache information 139 is the group of data stored in the storage unit 134 by the speculative prefetching performed by the speculative prefetch unit 133.

FIG. 9 is a diagram showing the configuration of cache information according to the second embodiment. The storage unit 134 holds the cache information 135 stored in the SRAM 137 and the auxiliary cache information 139 stored in the MRAM 138 together as a single memory array 200. In the example shown in FIG. 9, the storage unit 134 holds them as a 4-way set-associative memory array 200. In the auxiliary cache information 139 held in the MRAM 138, a reference counter representing the number of times each piece of data has been referenced, shown as Reference Count in FIG. 9, is registered for each block. The reference counter is a counter of about 2 bits.

The control unit 131 receives a data transmission request from the L1 cache 12 and determines whether the data specified in the request is stored in the storage unit 134. In this case, the control unit 131 searches the memory array 200 shown in FIG. 9, which combines the cache information 135 and the auxiliary cache information 139. That is, the control unit 131 determines whether the data exists in either the cache information 135 or the auxiliary cache information 139. If the specified data is stored in the storage unit 134, the control unit 131 acquires it from the cache information 135 or the auxiliary cache information 139 held in the storage unit 134 and transmits it to the L1 cache 12 and the arithmetic unit 11.

Furthermore, the control unit 131 determines whether the cache-hit data is stored in the MRAM 138. If it is not stored in the MRAM 138, the control unit 131 leaves the cache-hit data in the SRAM 137 and ends the data cache processing. If it is stored in the MRAM 138, the control unit 131 notifies the data monitor unit 140 of the access to the cache-hit data.

The speculative prefetch unit 133 receives a cache miss notification from the cache miss information update unit 132. It then uses the counter values in the cache miss information 136 to identify the address range in which irregularly accessed data is stored, and acquires the data in the identified address range from the L3 cache 14.

Here, data targeted by speculative prefetching is less likely to be accessed than the other data stored in the storage unit 134, including data prefetched by other mechanisms such as hardware prefetching. Storing speculatively prefetched data in a relatively slow memory therefore has little effect on data transfer speed. In addition, holding more speculatively prefetched data raises the likelihood that it will be accessed, which increases the benefit of speculative prefetching. However, when data obtained by speculative prefetching is accessed frequently, it is preferable to store it in as fast a memory as possible.

The speculative prefetch unit 133 therefore stores the acquired data as auxiliary cache information 139 held in the MRAM 138 of the storage unit 134. That is, the speculative prefetch unit 133 performs speculative prefetching by storing the targeted data in the MRAM 138.

The data monitor unit 140 moves data stored in the MRAM 138 to the SRAM 137 according to how frequently the data is accessed. The data monitor unit 140 is described in detail below.

The data monitor unit 140 has a movement threshold used to decide whether to move data from the MRAM 138 to the SRAM 137. The data monitor unit 140 receives from the control unit 131 a notification of an access to cache-hit data and increments the reference counter of that data in the auxiliary cache information 139. When the write speed is particularly slow, the data monitor unit 140 may vary the increment so that a write access increments the counter by more than a read access does.
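The reference counter update, including the optional heavier weighting of writes, can be sketched as a small saturating counter. The 2-bit width follows the "about 2 bits" stated for the reference counter; saturating at the maximum value and the specific increments of 1 and 2 are assumptions made for illustration.

```python
COUNTER_BITS = 2
COUNTER_MAX = (1 << COUNTER_BITS) - 1   # 3 for a 2-bit counter
READ_INCREMENT = 1                      # assumed step for a read access
WRITE_INCREMENT = 2                     # assumed larger step for a write access


def bump_reference_count(count: int, is_write: bool) -> int:
    """Return the updated counter, clamped to the counter's maximum value."""
    step = WRITE_INCREMENT if is_write else READ_INCREMENT
    return min(count + step, COUNTER_MAX)
```

Weighting writes more heavily makes write-heavy blocks cross the movement threshold sooner, so they leave the slower-to-write memory earlier.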

Next, the data monitor unit 140 determines whether the reference counter of the cache-hit data has exceeded the movement threshold. If it has, the data monitor unit 140 determines whether the SRAM 137 has free space to store the cache-hit data.

If the SRAM 137 has free space for the cache-hit data, the data monitor unit 140 moves the cache-hit data from the MRAM 138 to the SRAM 137.

On the other hand, if the SRAM 137 has no free space for the cache-hit data, the data monitor unit 140 selects, from the data stored in the SRAM 137, replacement data to be evicted to make room. The data monitor unit 140 selects the replacement data using a commonly used method such as pseudo-LRU (Least Recently Used). The data monitor unit 140 then moves the cache-hit data from the MRAM 138 to the SRAM 137 and moves the selected replacement data from the SRAM 137 to the MRAM 138.

FIG. 10 is a flowchart of data cache processing by the control unit according to the second embodiment. The flow of data cache processing by the control unit 131 according to this embodiment is described next with reference to FIG. 10, taking as an example an access to data A by the arithmetic unit 11.

The control unit 131 receives a transmission request for data A from the L1 cache 12 (step S401).

Next, the control unit 131 determines whether data A exists in either the cache information 135 or the auxiliary cache information 139 stored in the storage unit 134 (step S402).

If data A exists in neither the cache information 135 nor the auxiliary cache information 139 stored in the storage unit 134 (step S402: No), the control unit 131 notifies the cache miss information update unit 132 of the cache miss for data A (step S403).

Next, the control unit 131 requests the L3 cache 14 to transmit data A (step S404).

The control unit 131 then acquires data A from the L3 cache 14 and stores it in the SRAM 137 as cache information 135 (step S405).

On the other hand, if data A exists in either the cache information 135 or the auxiliary cache information 139 stored in the storage unit 134 (step S402: Yes), the control unit 131 acquires data A from the cache information 135 or the auxiliary cache information 139 held in the storage unit 134 and transmits it to the L1 cache 12 and the arithmetic unit 11 (step S406).

The control unit 131 then determines whether data A is stored in the MRAM 138 (step S407). If data A is not stored in the MRAM 138 (step S407: No), the control unit 131 leaves data A where it is and ends the data cache processing.

On the other hand, if data A is stored in the MRAM 138 (step S407: Yes), the control unit 131 notifies the data monitor unit 140 of the access to data A (step S408) and then ends the data cache processing.
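Steps S401 to S408 extend the first embodiment's lookup to two storage areas. In the sketch below, two dictionaries stand in for the SRAM 137 and the MRAM 138, and `notify_miss` and `notify_monitor` are hypothetical callbacks for the notifications to the cache miss information update unit 132 and the data monitor unit 140. Returning the data after step S405 is an assumption.

```python
def handle_request_hybrid(addr, sram, mram, lower_level,
                          notify_miss, notify_monitor):
    """Serve one request for `addr` (step S401) from the SRAM/MRAM pair."""
    if addr in sram:                 # S402: Yes, S407: No (data is in SRAM)
        return sram[addr]            # S406: send; data stays where it is
    if addr in mram:                 # S402: Yes, S407: Yes (data is in MRAM)
        data = mram[addr]            # S406: send the data
        notify_monitor(addr)         # S408: tell the data monitor unit
        return data
    notify_miss(addr)                # S403: report the cache miss
    data = lower_level[addr]         # S404: request, then acquire from L3
    sram[addr] = data                # S405: demand fills go to SRAM
    return data                      # assumption: also returned to requester


# Example: "P" was speculatively prefetched into MRAM, "A" is fetched on demand.
miss_log, monitor_log = [], []
sram_area, mram_area = {}, {"P": "prefetched"}
l3 = {"A": "a"}
hit = handle_request_hybrid("P", sram_area, mram_area, l3,
                            miss_log.append, monitor_log.append)
filled = handle_request_hybrid("A", sram_area, mram_area, l3,
                               miss_log.append, monitor_log.append)
```

Only hits in the MRAM reach the data monitor, which is what lets it count references to speculatively prefetched blocks separately from ordinary SRAM hits.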

The flow of speculative prefetching by the speculative prefetch unit 133 according to this embodiment is the same as the flow shown in FIG. 7, except that in step S309 the speculative prefetch unit 133 according to this embodiment stores the data in the MRAM 138.

FIG. 11 is a flowchart of data storage processing by the data monitor unit. The flow of data storage processing by the data monitor unit 140 is described next with reference to FIG. 11, taking as an example an access to data A by the control unit 131.

The data monitor unit 140 receives from the control unit 131 a notification of an access to cache-hit data A (step S501).

Next, the data monitor unit 140 increments the reference counter of data A in the auxiliary cache information 139 (step S502).

Next, the data monitor unit 140 determines whether the reference counter of data A is greater than the movement threshold (step S503). If the reference counter of data A is less than or equal to the movement threshold (step S503: No), the data monitor unit 140 ends the data storage processing.

On the other hand, if the reference counter of data A is greater than the movement threshold (step S503: Yes), the data monitor unit 140 determines whether the SRAM 137 has free space to store data A (step S504).

If the SRAM 137 has free space to store data A (step S504: Yes), the data monitor unit 140 moves data A from the MRAM 138 to the SRAM 137 (step S505) and then ends the data storage processing.

On the other hand, if the SRAM 137 has no free space to store data A (step S504: No), the data monitor unit 140 selects replacement data from the data stored in the SRAM 137 (step S506). The following assumes that the data monitor unit 140 selects data B as the replacement data.

Next, the data monitor unit 140 moves data A from the MRAM 138 to the SRAM 137 and moves data B from the SRAM 137 to the MRAM 138 (step S507). The data monitor unit 140 then ends the data storage processing.
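Steps S501 to S507 can be sketched as a promotion routine. Several simplifications are assumptions: an `OrderedDict` holds the SRAM contents oldest first and the oldest entry is evicted as a simple stand-in for pseudo-LRU, the increment is a plain +1 with no saturation, and the counters of data that changes area are reset or dropped, which the description does not specify.

```python
from collections import OrderedDict


def on_mram_hit(addr, sram, mram, ref_counts, move_threshold, sram_capacity):
    """Handle one MRAM-hit notification (S501); return True if `addr` moved."""
    ref_counts[addr] = ref_counts.get(addr, 0) + 1       # S502: increment
    if ref_counts[addr] <= move_threshold:               # S503: No -> end
        return False
    if len(sram) >= sram_capacity:                       # S504: No free space
        victim, victim_data = sram.popitem(last=False)   # S506: pick oldest
        mram[victim] = victim_data                       # S507: SRAM -> MRAM
        ref_counts[victim] = 0                           # assumption: restart
    sram[addr] = mram.pop(addr)                          # S505/S507: MRAM -> SRAM
    ref_counts.pop(addr, None)                           # assumption: drop count
    return True


# Example: SRAM is full with B and C; the second hit on A swaps A with B.
sram_area = OrderedDict([("B", "b"), ("C", "c")])
mram_area = {"A": "a"}
counts = {}
moved_first = on_mram_hit("A", sram_area, mram_area, counts, 1, 2)
moved_second = on_mram_hit("A", sram_area, mram_area, counts, 1, 2)
```

With a movement threshold of 1, the first hit only raises the counter; the second hit exceeds the threshold and triggers the swap.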

As described above, the L2 cache according to this embodiment is a hybrid memory combining an SRAM with a memory of higher storage density than the SRAM, and it stores data targeted by speculative prefetching in the higher-density memory. In addition, when data stored in the MRAM is accessed frequently, the L2 cache according to this embodiment moves that data to the SRAM. Storing speculatively prefetched data, which is the data in the L2 cache least likely to be accessed, in the high-density MRAM makes it possible to hold more speculatively prefetched data while limiting any drop in data transfer speed. This improves the cache hit rate achieved by speculative prefetching and increases its benefit.

(Modification)
The MRAM 138 in the second embodiment can also be used as a victim cache within the L2 cache 13. The operation when the MRAM 138 is used as a victim cache is described below.

FIG. 12 is a flowchart of data cache processing when the MRAM is used as a victim cache. Data cache processing with the MRAM 138 used as a victim cache is described with reference to FIG. 12, taking as an example the case where the arithmetic unit 11 requests access to data A and data A does not exist in the L2 cache 13.

The control unit 131 receives an acquisition request for data A from the L1 cache 12 and determines whether data A is stored in the storage unit 134 as either cache information 135 or auxiliary cache information 139. In this case, data A is not stored in the storage unit 134 and the control unit 131 cannot find it, so a cache miss occurs for data A (step S601).

Next, the control unit 131 requests the L3 cache 14 to transmit data A (step S602).

Next, the control unit 131 determines whether the SRAM 137 has free space to store data A (step S603). If there is free space to store data A (step S603: Yes), the control unit 131 proceeds to step S608.

On the other hand, if there is no free space to store data A (step S603: No), the control unit 131 selects data B to be removed from the cache information 135 in the SRAM 137 in order to make room for data A (step S604).

Next, the control unit 131 determines whether the MRAM 138 has free space to store data B (step S605). If there is free space to store data B (step S605: Yes), the control unit 131 proceeds to step S607.

On the other hand, if there is no free space to store data B (step S605: No), the control unit 131 selects data to be removed from the auxiliary cache information 139 in the MRAM 138 in order to make room for data B, and deletes the selected data from the auxiliary cache information 139 in the MRAM 138 (step S606). The control unit 131 then proceeds to step S607.

Next, the control unit 131 stores data B in the MRAM 138 (step S607).

The control unit 131 then acquires data A from the L3 cache 14 and stores it in the SRAM 137 (step S608).
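The victim-cache flow in steps S601 to S608 can be sketched as follows. The description does not say how data B or the MRAM data to delete is chosen, so this sketch assumes the oldest entry is selected in both areas, using `OrderedDict` instances ordered oldest first; the function name and capacities are likewise illustrative.

```python
from collections import OrderedDict


def fill_on_miss(addr, sram, mram, lower_level, sram_capacity, mram_capacity):
    """Handle a miss on `addr` (S601) with the MRAM acting as a victim cache."""
    # S602: the request to L3 is issued here; the lookup below models its reply.
    if len(sram) >= sram_capacity:                       # S603: No free space
        victim, victim_data = sram.popitem(last=False)   # S604: choose data B
        if len(mram) >= mram_capacity:                   # S605: No free space
            mram.popitem(last=False)                     # S606: delete from MRAM
        mram[victim] = victim_data                       # S607: store B in MRAM
    sram[addr] = lower_level[addr]                       # S608: store A in SRAM
    return sram[addr]


# Example: both areas are full, so D is dropped, B moves to MRAM, A enters SRAM.
sram_area = OrderedDict([("B", "b")])
mram_area = OrderedDict([("D", "d")])
l3 = {"A": "a"}
result = fill_on_miss("A", sram_area, mram_area, l3, 1, 1)
```

Data evicted from the SRAM stays in the L2 cache one step longer, so a quick re-reference to B can still hit in the MRAM rather than going back to the L3 cache.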

As described above, the MRAM in the L2 cache can also be used as a victim cache within the L2 cache. This allows the MRAM in the L2 cache to be used more efficiently.

1 Information processing device
11 Arithmetic unit
12 L1 cache
13 L2 cache
14 L3 cache
15 Main memory
16 Auxiliary storage device
17 Display device
18 Input device
121 Control unit
122 Storage unit
123 Cache information
131 Control unit
132 Cache miss information update unit
133 Speculative prefetch unit
134 Storage unit
135 Cache information
136 Cache miss information
140 Data monitor unit

Claims

An arithmetic processing device having an arithmetic unit and one or a plurality of hierarchical caches, wherein
at least one of the caches includes:
a storage unit that stores data;
a control unit that, upon receiving a data access request from the arithmetic unit or an upper cache, accesses target data of the data access request if the target data exists in the storage unit, and, if the target data does not exist in the storage unit, acquires the target data from a lower cache or a main memory and stores it in the storage unit;
an information management unit that calculates the number of occurrences of cache misses, each indicating that the target data does not exist in the storage unit; and
a speculative prefetch unit that acquires speculative data from the main memory or the lower cache based on the number of occurrences and stores the acquired speculative data in the storage unit.

The arithmetic processing device, wherein the information management unit calculates the number of occurrences for each address range, and
the speculative prefetch unit selects one of the address ranges based on the number of occurrences and acquires the speculative data included in the selected address range.

The arithmetic processing device according to claim 1, wherein the storage unit has a main area and a spare area, and
the speculative prefetch unit stores the speculative data in the spare area.

The arithmetic processing device according to claim 3, further comprising a data monitor unit that calculates an access frequency for each of a plurality of data items stored in the spare area, selects data based on the access frequency, and stores the selected data in the main area.

The arithmetic processing device according to claim 3, wherein, when the target data does not exist in the storage unit, the control unit acquires the target data from the lower cache or the main memory, moves specific data from the main area to the spare area, and stores the target data in the main area.

The arithmetic processing device according to claim 1, wherein the speculative prefetch unit determines whether there is spare processing capacity on the bus between the cache in which it is mounted and the main memory or the lower cache, and, if there is spare processing capacity, acquires the speculative data from the main memory or the lower cache and stores the acquired speculative data in the storage unit.

The arithmetic processing device according to claim 6, wherein the speculative prefetch unit calculates a value of an index representing the processing amount of the bus, and determines whether there is spare processing capacity based on a comparison between the calculated value and a threshold value.

An arithmetic processing method using an arithmetic unit, a main memory, and one or a plurality of hierarchical caches, the method comprising, by at least one of the caches:
receiving a data access request from the arithmetic unit or an upper cache;
accessing target data of the data access request if the target data exists in a storage area of the cache;
if the target data does not exist in the storage area, acquiring the target data from a lower cache or the main memory and storing it in the storage area;
calculating the number of occurrences of cache misses, each indicating that the target data does not exist in the storage area; and
acquiring speculative data from the main memory or the lower cache based on the number of occurrences, and storing the acquired speculative data in the storage area.
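
For illustration only, the steps recited in the method above might be modelled as follows in Python. This is a sketch under assumptions and does not define or limit the claimed subject matter. The class name `SpeculativePrefetchSketch`, the constants `RANGE_SIZE`, `LINE_SIZE`, and `BUS_THRESHOLD`, the use of bus utilization as the index, and the rule of prefetching the address range with the most misses are all hypothetical choices. The claims state only that speculative data is acquired "based on the number of occurrences" and that the index is compared with a threshold value.

```python
from collections import Counter

RANGE_SIZE = 0x1000   # assumed width of one address range
LINE_SIZE = 0x40      # assumed cache line size
BUS_THRESHOLD = 0.5   # assumed limit on the bus-load index


class SpeculativePrefetchSketch:
    """Toy cache that counts misses per address range and prefetches from them."""

    def __init__(self, lower_level):
        self.storage = {}             # storage area: address -> data
        self.miss_counts = Counter()  # number of cache misses per address range
        self.lower = lower_level      # stand-in for the lower cache or main memory

    def access(self, addr):
        """Serve a data access request, counting a miss if the data is absent."""
        if addr in self.storage:
            return self.storage[addr]
        self.miss_counts[addr // RANGE_SIZE] += 1
        data = self.lower[addr]
        self.storage[addr] = data
        return data

    def speculative_prefetch(self, bus_utilization):
        """Prefetch the range with the most misses if the bus has spare capacity."""
        if bus_utilization >= BUS_THRESHOLD or not self.miss_counts:
            return []
        hot_range, _ = self.miss_counts.most_common(1)[0]
        base = hot_range * RANGE_SIZE
        fetched = []
        for addr in range(base, base + RANGE_SIZE, LINE_SIZE):
            if addr not in self.storage and addr in self.lower:
                self.storage[addr] = self.lower[addr]
                fetched.append(addr)
        return fetched


memory = {addr: addr for addr in range(0, 0x3000, LINE_SIZE)}
cache = SpeculativePrefetchSketch(memory)
for addr in (0x1000, 0x1040, 0x2000):
    cache.access(addr)
print(len(cache.speculative_prefetch(bus_utilization=0.1)))
```

Passing a `bus_utilization` at or above `BUS_THRESHOLD` makes `speculative_prefetch` return an empty list, which models the case where the bus has no spare processing capacity.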