JP5045334B2

JP5045334B2 - Cache system

Info

Publication number: JP5045334B2
Application number: JP2007246192A
Authority: JP
Inventors: 毅 ▲葛▼; 真一郎多湖
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2007-09-21
Filing date: 2007-09-21
Publication date: 2012-10-10
Anticipated expiration: 2027-09-21
Also published as: JP2009075978A

Description

The present invention relates to a cache system.

In general, a computer system is provided with a small, fast cache memory in addition to the main memory. Part of the information stored in the main memory is copied into the cache, so that later accesses to that information can be served quickly from the cache instead of from the main memory.

A cache consists of a plurality of cache lines, and information is copied from the main memory to the cache one cache line at a time. The memory space of the main memory is divided into cache-line-sized areas, which are assigned to the cache lines in sequence. Because the cache is smaller than the main memory, many memory areas are assigned, in rotation, to the same cache line.

When an address in the memory space is accessed for the first time, the information (data or a program) at that address is copied into the corresponding cache line. Later accesses to the same address read the information directly from the cache. Generally, a predetermined number of the low-order address bits serve as the cache index, and the remaining higher-order bits serve as the cache tag.
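As a minimal sketch of the index/tag split described above (an illustration, not part of the patent), the function below splits an address into tag, index, and byte offset. The offset field is an assumption. The text treats the lowest bits as the index, which corresponds to offset_bits=0 when addresses are already cache-line addresses.

```python
def split_address(addr: int, offset_bits: int, index_bits: int) -> tuple[int, int, int]:
    """Return (tag, index, offset) for an address.

    offset_bits: width of the byte offset within a cache line (assumed; use 0
    when addr is already a cache-line address, as in the text above).
    index_bits: width of the cache index field.
    """
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)  # all remaining high-order bits
    return tag, index, offset


print([hex(f) for f in split_address(0x12345, offset_bits=4, index_bits=8)])
# ['0x12', '0x34', '0x5']
```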

On a data access, the index portion of the target address is used to read the tag stored at that index in the cache. The stored tag is then compared with the bit pattern of the tag portion of the address. If they differ, a cache miss occurs. If they match, a cache hit occurs, and the cache data at that index (one cache line, i.e., a predetermined number of bits) is accessed.

A cache that provides only one tag for each cache line (index) is called direct-mapped. A cache that provides N tags for each cache line is called N-way set associative. A direct-mapped cache can therefore be regarded as a one-way set-associative cache.
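The tag lookup described in the preceding paragraphs can be sketched as follows. This is a hypothetical tags-only model with no data array and no replacement policy. Setting ways=1 gives the direct-mapped case. The modulo and integer division stand in for the low-order-index / high-order-tag split and match it when num_sets is a power of two.

```python
class SetAssociativeTags:
    """Tag array of an N-way set-associative cache (tags only, for illustration)."""

    def __init__(self, num_sets: int, ways: int) -> None:
        self.num_sets = num_sets
        # tags[index][way]; None marks an invalid (empty) way
        self.tags = [[None] * ways for _ in range(num_sets)]

    def _split(self, line_addr: int) -> tuple[int, int]:
        # low-order part selects the set (index), the rest is the tag
        return line_addr % self.num_sets, line_addr // self.num_sets

    def lookup(self, line_addr: int):
        """Return the hitting way, or None on a cache miss."""
        index, tag = self._split(line_addr)
        for way, stored in enumerate(self.tags[index]):
            if stored == tag:
                return way
        return None

    def fill(self, line_addr: int, way: int) -> None:
        index, tag = self._split(line_addr)
        self.tags[index][way] = tag


cache = SetAssociativeTags(num_sets=4, ways=2)
cache.fill(13, way=1)    # line 13 -> set 1, tag 3
print(cache.lookup(13))  # 1 (hit in way 1)
print(cache.lookup(5))   # None (set 1, tag 1: miss)
```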

To reduce the penalty of accessing the main memory after a cache miss, systems with a multi-level cache hierarchy are used. For example, a secondary cache that is faster to access than the main memory can be placed between the primary cache and the main memory. This reduces how often a primary-cache miss has to go all the way to the main memory, and so reduces the cache miss penalty.
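The benefit of the extra level can be illustrated with the textbook average-memory-access-time formula, which does not come from the patent. The latencies and miss rates below are hypothetical and only show the effect.

```python
def amat(levels: list[tuple[float, float]], memory_latency: float) -> float:
    """Average memory access time of a cache hierarchy.

    levels: (hit_time, local_miss_rate) for each cache level, fastest first.
    memory_latency: access time of the main memory.
    """
    total = 0.0
    reach = 1.0  # fraction of accesses that reach the current level
    for hit_time, miss_rate in levels:
        total += reach * hit_time
        reach *= miss_rate
    return total + reach * memory_latency


# Hypothetical: 1-cycle L1 with 10% misses, 100-cycle main memory
print(amat([(1, 0.10)], 100))              # about 11 cycles
# Adding a 10-cycle L2 that misses on 20% of the L1 misses
print(amat([(1, 0.10), (10, 0.20)], 100))  # about 4 cycles
```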

Processor performance has traditionally been improved by raising the operating frequency and refining the architecture. In recent years, however, further frequency increases have begun to hit technical limits, and there is a strong trend toward improving performance with multiprocessor configurations.

A system with multiple processors can be built by taking several existing single-processor cores, each with its own cache, and simply connecting them. This keeps design costs low, but it causes problems with cache utilization efficiency and cache coherency.

Patent Document 1 below describes a method for managing the replacement of data stored in the buffers of a plurality of caches. Each cache resides on a node in a set of one or more nodes, each node includes one or more of the caches, and the caches include a first cache. The method comprises two steps. First, a first buffer in the first cache is selected for replacement based on one or more factors, where the first buffer currently stores a first data item and the factors include either the state or the configuration of at least one other cache among the plurality of caches. Second, the first data item stored in the first buffer is replaced with a second data item.

Patent Document 2 below describes an LRU control scheme for a buffer storage device. The device copies and stores part of the data in a main storage device in units of blocks, and it uses the LRU method to decide which block to replace when a new block is stored. The blocks of the buffer storage device are divided into a plurality of groups, and a means is provided for choosing a replacement block within each group by the LRU method. A flag bit indicates which group is to be the replacement target. This flag bit is stored in triplicate, and a majority vote is taken when it is read.
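A minimal sketch of the two ideas in Patent Document 2 follows. It rests on assumptions that are not in the text: the flag is a single bit naming one of two groups, and each group's LRU order is a list, oldest first. The 2-of-3 majority vote masks a single corrupted copy of the flag.

```python
def majority3(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three copies of a flag bit."""
    return (a & b) | (b & c) | (a & c)


def choose_replacement(flag_copies: tuple[int, int, int],
                       group_lru: list[list[str]]) -> tuple[int, str]:
    """Vote on the triplicated flag to select a group, then take the least
    recently used block of that group.
    group_lru[g] lists group g's blocks, oldest first (hypothetical layout)."""
    group = majority3(*flag_copies)
    return group, group_lru[group][0]


groups = [["b0", "b1"], ["b2", "b3"]]
print(choose_replacement((1, 0, 1), groups))  # (1, 'b2'): the lone 0 is outvoted
```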

Patent Document 3 below describes a cache configuration method for a multiprocessor system in which a main memory and a plurality of processor modules are coupled through a system connection (SC). Each processor module comprises a CPU with a built-in CPU cache and a CPU-SC interface control unit with a built-in CPU cache duplicate tag memory. In this method, the CPU cache is set associative with k sets and w ways, and the CPU cache duplicate tag memory in the CPU-SC interface control unit is set associative with k sets and w + α ways.

Patent Document 1: Japanese translation of PCT publication No. 2004-511840
Patent Document 2: JP 59-003773 A
Patent Document 3: JP 2002-55880 A

An object of the present invention is to provide a cache system that transfers data efficiently among a plurality of caches when those caches are connected one-to-one to a plurality of processing devices.

A cache system according to the present invention comprises:
- a plurality of processing devices;
- a plurality of caches connected one-to-one to the processing devices;
- a controller that is connected to the caches and controls data transfer among them; and
- an information memory that stores, for each processing device, first information indicating the order of least recent use of the entries in that device's own cache and of the other caches.

Using the first information, data can be transferred efficiently among the caches without an increase in traffic.

FIG. 14 shows a configuration example of a shared distributed cache system. The system includes:
- a plurality of cores (processors: processing devices) 101 to 103;
- a plurality of caches (cache memories) 111 to 113 corresponding one-to-one to the cores 101 to 103;
- an inter-cache connection controller (ICC) 120 connected to the caches 111 to 113; and
- a main memory 130.

Each of the cores 101 to 103 can access the cache directly connected to it (its own-core cache 111 to 113, respectively) as a primary cache. In addition, each core can access the caches of the other cores (other-core caches) as secondary caches. For example, from the viewpoint of the core 101, the cache 111 is accessible as a primary cache and the caches 112 and 113 are accessible as secondary caches. The path to these secondary caches is provided by the inter-cache connection controller 120, which holds LRU information 1400.

LRU stands for "Least Recently Used", and LRU information identifies the entry that has gone unused the longest. Specifically, the LRU information for a set of entries gives their age order of use, from least recently used to most recently used.
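To make the idea of LRU information concrete, here is a small hypothetical model that keeps a set of entries in usage order, oldest first. It is not the controller's actual encoding. It follows the same convention as the figures, where rank 0 is the oldest.

```python
class LruOrder:
    """Usage order of a fixed set of entries, least recently used first."""

    def __init__(self, entries) -> None:
        self.order = list(entries)

    def touch(self, entry) -> None:
        """Record a use: the entry becomes the most recently used."""
        self.order.remove(entry)
        self.order.append(entry)

    def oldest(self):
        return self.order[0]

    def rank(self, entry) -> int:
        return self.order.index(entry)  # 0 = oldest


lru = LruOrder(["a", "b", "c"])
lru.touch("a")
print(lru.oldest(), lru.rank("a"))  # b 2
```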

For example, the core 101 issues a load request to its own-core cache 111 and, on a cache hit, reads the data from that cache. On a cache miss, it accesses the secondary cache 112 or 113.
- Hit in a secondary cache (for example, the cache 112): the cache lines in the primary cache 111 and the secondary cache 112 are exchanged (process 1411).
- Miss in both secondary caches 112 and 113: one cache line of data is copied from the main memory 130 into the primary cache 111 and returned to the core 101 (process 1412). At the same time, the cache line evicted from the primary cache 111 is moved to the secondary cache 112 (process 1413).
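The FIG. 14 load flow can be sketched for a single cache index as shown below. Several assumptions are not stated in the text:
- every cache is full;
- each cache is a list of line addresses, oldest first;
- a hit in another core's cache always triggers the exchange;
- on a miss, the line evicted from the requesting core's cache goes to the first other cache, which then drops its own oldest line.

```python
def load_fig14(caches: list[list[str]], core: int, line: str) -> str:
    """Simulate one load by `core`; mutates `caches` and returns what happened."""
    own = caches[core]
    if line in own:                      # hit in the primary (own-core) cache
        own.remove(line)
        own.append(line)                 # mark as most recently used
        return "own hit"
    for k, other in enumerate(caches):
        if k != core and line in other:  # hit in a secondary (other-core) cache
            pos = other.index(line)
            victim = own.pop(0)          # oldest line of the own-core cache
            other[pos] = victim          # exchange (process 1411)
            own.append(line)
            return f"hit in cache {k}"
    victim = own.pop(0)                  # miss everywhere: fill from main memory (1412)
    own.append(line)
    target = next(k for k in range(len(caches)) if k != core)  # assumed policy
    caches[target].pop(0)                # target drops its oldest line (assumption)
    caches[target].append(victim)        # evicted line moves over (process 1413)
    return "miss"


caches = [["a", "b"], ["c", "d"], ["e", "f"]]
print(load_fig14(caches, 0, "e"), caches)
# hit in cache 2 [['b', 'e'], ['c', 'd'], ['a', 'f']]
```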

In this scheme, when an access hits in the other-core cache 112 or 113 and the LRU information 1400 of the hit entry is not the most recent, the exchange process 1411 is always performed. The exchange therefore happens even for rarely used entries, which increases traffic and degrades system performance. Traffic therefore needs to be reduced.

Moreover, as the number of cores 101 to 103 grows, more other-core caches are used as lower-level caches, and the number of bits in the LRU information 1400 grows with them. A measure that prevents this growth is therefore desirable.

FIG. 15 shows another configuration example of a shared distributed cache system. Only the differences from FIG. 14 are described below. The LRU information 1500 corresponds to the LRU information 1400 in FIG. 14.
- The first LRU information 1501 gives the age order of the entries a, b, and c in the cache 111, where "0" denotes the oldest and "2" the newest. The first LRU information 1502 and 1503 do the same for the entries a, b, and c in the caches 112 and 113, respectively. Encoded by the binary relation method, each of them requires 3 bits.
- The second LRU information 1509 gives the age order of all entries across the three caches 111 to 113. Encoded by the order relation method, it requires 18 bits.

The LRU information 1500 therefore totals 3 + 3 + 3 + 18 = 27 bits. With this scheme, the cache lines of all cores' caches 111 to 113 are treated as one large cache, and a complete age ordering among them is maintained.
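The binary relation method is commonly implemented with one bit per pair of entries, so n entries need n(n-1)/2 bits, which gives 3 bits for three ways. The sketch below is a generic pairwise LRU built on that assumption. It is not necessarily the patent's exact bit layout, and its bit convention (bit (i, j) = 1 means entry i was used more recently than entry j) is our own.

```python
from itertools import combinations


def pairwise_bits(n: int) -> int:
    """Bits needed by a binary-relation (pairwise) LRU over n entries."""
    return n * (n - 1) // 2


class PairwiseLru:
    """Binary-relation LRU: one bit per pair (i, j) with i < j.
    bit == 1 means entry i was used more recently than entry j."""

    def __init__(self, n: int) -> None:
        self.n = n
        # all zeros encodes the order 0 (oldest), 1, ..., n-1 (newest)
        self.bits = {pair: 0 for pair in combinations(range(n), 2)}

    def newer(self, i: int, j: int) -> bool:
        """True if entry i was used more recently than entry j."""
        return self.bits[(i, j)] == 1 if i < j else self.bits[(j, i)] == 0

    def touch(self, k: int) -> None:
        """Make entry k the most recently used by updating its n-1 bits."""
        for i, j in self.bits:
            if i == k:
                self.bits[(i, j)] = 1
            elif j == k:
                self.bits[(i, j)] = 0

    def oldest(self) -> int:
        """The oldest entry is the one newer than no other entry."""
        return next(k for k in range(self.n)
                    if not any(self.newer(k, m) for m in range(self.n) if m != k))


lru = PairwiseLru(3)      # entries a, b, c -> 0, 1, 2
lru.touch(0)              # use a
print(pairwise_bits(3), lru.oldest())  # 3 1
```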

FIG. 16 is a flowchart of the cache entry eviction process in the shared distributed cache system of FIG. 15. The flowchart runs when one of the cores 101 to 103 issues an access request to its own-core cache, misses there, misses again in the lower-level caches, and must therefore read the data from the main memory 130 and allocate it in its own-core cache.

The data read in this flowchart does not have to come from the main memory 130. The inter-cache connection controller 120 forms a cache hierarchy by accessing the caches 111 to 113 in priority order. The read source is whatever memory lies below that hierarchy and is managed as a unit separate from the caches. If the main memory 130 sits directly below the cache hierarchy, the data is read from the main memory 130. The following description uses the main memory 130 as the example.

In step S11, one of the cores 101 to 103 sends the address to be accessed (for example, a read address) to the inter-cache connection controller 120 in order to access the main memory 130. In step S12, the second LRU information 1509 is consulted. Among the entries at the index that corresponds to the accessed address, the entry that is oldest across all of the caches 111 to 113 is identified.

In step S13, it is determined whether the overall oldest entry is in the own-core cache, that is, the cache of the core that issued the access request. If it is, the first LRU information 1501 to 1503 and the second LRU information 1509 are updated in step S14:
- the first LRU information is updated so that the oldest entry of the own-core cache becomes the newest entry; and
- the second LRU information 1509 is updated so that the overall oldest entry becomes the newest entry.

Then, in step S17, the data read from the main memory 130 overwrites the oldest entry in the own-core cache, which is the entry the LRU update has just made the newest.

If step S13 determines that the overall oldest entry is not in the own-core cache, the first LRU information 1501 to 1503 and the second LRU information 1509 are updated in step S15:
- The first LRU information of the own-core cache is updated so that its oldest entry becomes the newest entry.
- The destination cache is the cache that will receive the entry evicted from the own-core cache (the evicted entry). Its first LRU information is updated so that its oldest entry, which is also the overall oldest entry, takes on the age rank of the evicted entry.
- The second LRU information 1509 is updated so that the overall oldest entry takes on the age rank of the evicted entry and the oldest entry of the own-core cache becomes the newest entry.

Next, in step S16, the oldest entry of the own-core cache is transferred (moved) to the position of the overall oldest entry. Finally, in step S17, the data read from the main memory 130 overwrites the oldest entry in the own-core cache, which is the entry the LRU update has just made the newest.
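Because the second LRU information of FIG. 15 orders every entry of one index across all caches, steps S12 to S17 can be modeled with two structures:
- one global list of lines, oldest first; and
- a map from each line to the cache that holds it.

Each cache's local order is then just the global order filtered by owner. This is a hypothetical model of the flowchart for a single index. It assumes all caches are full and does not reproduce the controller's bit-level encoding.

```python
def evict_and_fill(global_order: list[str], owner: dict[str, int],
                   core: int, new_line: str) -> str:
    """Allocate new_line (read from main memory) in `core`'s cache.

    global_order: every cached line of one index, oldest first.
    owner: cache number holding each line.  Returns the line discarded.
    """
    victim = global_order.pop(0)          # S12: overall oldest entry
    dest = owner.pop(victim)
    if dest != core:                      # S13: not in the own-core cache
        own_oldest = next(line for line in global_order if owner[line] == core)
        # S15/S16: the own-core cache's oldest line moves into the victim's
        # cache and keeps its age rank, so its global position is unchanged
        owner[own_oldest] = dest
    global_order.append(new_line)         # S14/S15 + S17: new data is newest
    owner[new_line] = core
    return victim


order = ["c1", "a1", "b1", "a2", "c2", "b2"]
owner = {"a1": 0, "a2": 0, "b1": 1, "b2": 1, "c1": 2, "c2": 2}
print(evict_and_fill(order, owner, core=0, new_line="n"))  # c1
print(owner["a1"], order[-1])                              # 2 n
```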

However, the exchange decision scheme of the system in FIG. 14 can generate heavy traffic and degrade system performance. The LRU encoding scheme of the system in FIG. 15, while general, does not fully exploit the characteristics specific to a shared distributed cache. Furthermore, to use more other-core caches as lower-level caches as the number of cores grows, a scheme is needed that limits how fast the LRU information grows as other-core caches are added. The embodiments of the present invention described below provide means for solving these problems.

FIG. 1 shows a configuration example of a shared distributed cache system according to an embodiment of the present invention. The system includes:
- three cores (processors: processing devices) 101 to 103;
- three 3-way primary caches 111 to 113 corresponding one-to-one to the cores 101 to 103;
- an inter-cache connection controller (ICC) 120 connected to the caches 111 to 113; and
- a main memory (or secondary cache) 130.

Each of the cores 101 to 103 can access the cache directly connected to it (its own-core cache) as an upper-level cache. In addition, each core can access the caches of the other cores (other-core caches) as lower-level caches. For example, from the viewpoint of the core 101, the cache 111 is accessible as an upper-level cache, and the other-core caches 112 and 113 are accessible as lower-level caches. The path to these lower-level caches is provided by the inter-cache connection controller 120.

The inter-cache connection controller 120 has an information memory that stores first LRU information 121 to 123 and second LRU information 129. The information memory may also be placed outside the inter-cache connection controller 120. The first LRU information 121 gives the age order of the entries a, b, and c of the cache 111 together with a group x made up of the other-core caches 112 and 113, where "0" denotes the oldest and "2" the newest. In other words, the first LRU information 121 is the LRU information of the core's own cache 111 (entries a, b, and c) with n (= 1) other-core cache group, x, added to it.

FIG. 2 shows the first LRU information 121 to 123 and the second LRU information 129 of FIG. 1 in detail. Like the first LRU information 121, the first LRU information 122 gives the age order of the entries a, b, and c of the cache 112 together with the group x made up of the other-core caches 111 and 113. The first LRU information 123 likewise gives the age order of the entries a, b, and c of the cache 113 together with the group x made up of the other-core caches 111 and 112. Encoded by the binary relation method, each of the first LRU information 121 to 123 requires 6 bits.

The core 101 is core A, and the cache 111 is cache A. The core 102 is core B, and the cache 112 is cache B. The core 103 is core C, and the cache 113 is cache C.

The second LRU information 129 gives the age order of use of the own caches 111 to 113 across the three cores 101 to 103. In it, A stands for the LRU information of cache A (111), B for cache B (112), and C for cache C (113). Encoded by the binary relation method, the second LRU information 129 requires 3 bits. The LRU information 121 to 123 and 129 therefore totals 6 + 6 + 6 + 3 = 21 bits, fewer than the 27 bits of the system in FIG. 15. The information memory holds first LRU information 121 to 123 for each of the caches 111 to 113. Each of these treats the other-core cache group as one extra way added to that cache's local LRU information.

FIG. 3 shows another configuration example of the shared distributed cache system according to the embodiment. Only the differences from FIG. 1 are described below. This system includes:
- six cores 101 to 106;
- six primary caches 111 to 116 corresponding one-to-one to the cores 101 to 106;
- an inter-cache connection controller (ICC) 120 connected to the caches 111 to 116; and
- a main memory (or secondary cache) 130.

The caches 111 to 114 and 116 are 3-way, and the cache 115 is 2-way. Each of the cores 101 to 106 can access its own-core cache 111 to 116 as an upper-level cache, and it can also access the other-core caches as lower-level caches. For example, from the viewpoint of the core 101, the cache 111 is accessible as an upper-level cache, and the following groups are accessible as lower-level caches:
- group x: the other-core caches 112 and 113;
- group y: the other-core caches 114 and 115;
- group z: the other-core cache 116.

The inter-cache connection controller 120 has an information memory that stores first LRU information 121 to 126 and second LRU information 129. The first LRU information 121 gives the age order of the entries a, b, and c of the cache 111, the group x of the other-core caches 112 and 113, the group y of the other-core caches 114 and 115, and the group z of the other-core cache 116. Here "0" denotes the oldest and "5" the newest. In other words, the first LRU information 121 is the LRU information of the core's own cache 111 (entries a, b, and c) with n (= 3) other-core cache groups, x, y, and z, added to it.

FIG. 4 shows the first LRU information 121 to 126 and the second LRU information 129 of FIG. 3 in detail. Each first LRU information gives the age order of the own cache's entries together with its other-core cache groups:

| First LRU information | Own cache entries | Groups (n) | Group x | Group y | Group z |
|---|---|---|---|---|---|
| 121 | 111: a, b, c | 3 | 112, 113 | 114, 115 | 116 |
| 122 | 112: a, b, c | 3 | 111, 113 | 114, 115 | 116 |
| 123 | 113: a, b, c | 3 | 111, 112 | 114, 115 | 116 |
| 124 | 114: a, b, c | 2 | 111, 112 | 113, 115, 116 | (none) |
| 125 | 115: a, b | 3 | 111, 112 | 113, 114 | 116 |
| 126 | 116: a, b, c | 3 | 111, 112 | 113, 114 | 115 |

The grouping of caches can be chosen arbitrarily.

Encoded by the binary relation method, each of the first LRU information 121 to 123 and 126 requires 15 bits. The first LRU information 124 covers only two other-core cache groups, x and y, so it requires 10 bits. The first LRU information 125 covers only the two entries a and b of the cache 115, so it also requires 10 bits.

The core 101 is core A, and the cache 111 is cache A. The core 102 is core B, and the cache 112 is cache B. The core 103 is core C, and the cache 113 is cache C. The core 104 is core D, and the cache 114 is cache D. The core 105 is core E, and the cache 115 is cache E. The core 106 is core F, and the cache 116 is cache F.

The second LRU information 129 gives the age order of use of the own caches 111 to 116 across the six cores 101 to 106. In it, A to F stand for the LRU information of caches A (111), B (112), C (113), D (114), E (115), and F (116), respectively. Encoded by the binary relation method, the second LRU information 129 requires 15 bits. The LRU information 121 to 126 and 129 therefore totals 15 + 15 + 15 + 10 + 10 + 15 + 15 = 95 bits. The bit savings are especially large when there are many cores.
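The bit totals for FIGS. 2 and 4 follow from applying the pairwise formula n(n-1)/2 in two places. For each cache's first LRU information, n is the number of own ways plus other-core groups. For the second LRU information, n is the number of caches. The short check below uses per-cache way and group counts transcribed from the text.

```python
def pairwise_bits(n: int) -> int:
    """Bits for a binary-relation LRU over n items: n(n-1)/2."""
    return n * (n - 1) // 2


def proposed_lru_bits(ways: list[int], groups: list[int]) -> int:
    """Total bits of the first LRU information of every cache (its own ways
    plus its other-core groups) and the second LRU information (one item
    per cache)."""
    first = sum(pairwise_bits(w + g) for w, g in zip(ways, groups))
    second = pairwise_bits(len(ways))
    return first + second


# FIG. 2: three 3-way caches, one other-core group (x) each
print(proposed_lru_bits([3, 3, 3], [1, 1, 1]))  # 21
# FIG. 4: caches A-F; E is 2-way, D sees only two groups (x, y)
print(proposed_lru_bits([3, 3, 3, 3, 2, 3], [3, 3, 3, 2, 3, 3]))  # 95
```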

FIG. 5 is a flowchart of the data load access operation in the shared distributed cache system of this embodiment. The system of FIG. 1 is used as the example. In step S1, one of the cores 101 to 103 issues a load request to the cache directly connected to it (its own-core cache).

In step S2, the cache that received the load request determines whether the requested data is present in it, that is, whether the access is a cache hit. On a hit, in step S3 the requested data is read from the own-core cache and returned to the core that issued the load request.

On a cache miss in step S2, the process proceeds to step S4, which determines whether a lower-level cache exists below the cache in which the miss was detected. For example, when the process reaches step S4 from a miss in step S2, the cache that missed is one of caches 111 to 113, so the other two caches exist as lower-level caches. If a lower-level cache exists, the process proceeds to step S5.

In step S5, the other-core cache one level lower in the cache hierarchy is accessed. In step S6, the cache that received the access request determines whether the requested data is present in it, that is, whether the access is a cache hit. On a miss, the process returns to step S4 and repeats the above processing.

On a hit in step S6, the process proceeds to step S7, where cache data exchange processing is performed: the cache line to be accessed (the requested data) is moved from the cache that hit to the own-core cache, and the data is returned to the core. The cache line evicted from the own-core cache to make room for it is moved to the other core's cache. Step S7 is described in detail later with reference to FIG. 6.

If step S4 determines that no lower-level cache exists, the process proceeds to step S8. For example, if the process returns to step S4 after a miss in step S6 and the cache that missed is already the lowest-level cache, only the main memory 130 lies below it. In that case, in step S8 the requested data is read from the main memory 130, allocated to the own-core cache (one cache line containing the requested data is copied into the own-core cache), and returned to the core that issued the load request. The cache line evicted from the own-core cache by this operation is moved, for example, to a lower-level cache. Step S8 is described in detail later with reference to FIG. 10.

In the above flow, step S7 concerns data transfer between caches, and step S8 concerns data transfer between the main memory 130 and a cache.
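Read as control flow, steps S1 to S8 amount to: own-core cache first, then the other-core caches one level at a time, then main memory. The Python sketch below is a simplified model under stated assumptions: caches are plain dictionaries, the probe order `lower_levels` is a hypothetical parameter, and neither the exchange decision of FIG. 6 nor the victim handling of FIG. 10 is modeled; the function only reports which step supplied the data.

```python
def load(core, addr, caches, lower_levels, memory):
    """Simplified S1-S8 load flow; returns (data, step that supplied it).

    caches:       core id -> {address: data} for that core's own cache
    lower_levels: core id -> other-core caches in the order they are probed (S5)
    memory:       address -> data standing in for main memory 130
    """
    # S2: look up the own-core cache first.
    if addr in caches[core]:
        return caches[core][addr], "S3"  # S3: hit, data returned to the core
    # S4-S6: while a lower-level cache exists, probe it one level at a time.
    for other in lower_levels[core]:
        if addr in caches[other]:
            # S7: hit in another core's cache; the exchange processing of
            # FIG. 6 decides whether the line actually moves (not modeled here).
            return caches[other][addr], "S7"
    # S8: no lower-level cache left, so read main memory and allocate the
    # line into the own-core cache (eviction handling of FIG. 10 not modeled).
    data = memory[addr]
    caches[core][addr] = data
    return data, "S8"
```

With core A's cache holding address 0x10 and core B's holding 0x20, loads of 0x10, 0x20, and 0x30 by core A are served by steps S3, S7, and S8 respectively, and 0x30 ends up allocated in core A's cache.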

FIG. 6 is a flowchart of the exchange processing of step S7 in FIG. 5. First, in step A1, the determination in step S6 of FIG. 5 has found a hit in another core's cache usable as a lower-level cache.

Next, in step A2, the inter-cache connection controller 120 performs a determination: it checks whether (1) in the own core's local first LRU information, the group to which the hit other-core cache belongs is the most recent, and (2) in the hit other-core cache's first LRU information, the hit entry is not the most recent. If both conditions hold, the process proceeds to step A4; otherwise it proceeds to step A7.

In step A4, the inter-cache connection controller 120 performs exchange processing between the own-core cache and the other-core cache: the oldest entry of the own-core cache and the hit entry of the other-core cache are swapped.

Next, in step A5, the inter-cache connection controller 120 updates the LRU information: in the own core's first LRU information, the exchanged entry of the own-core cache becomes the most recent; in the other core's first LRU information, the exchanged entry of the other-core cache becomes the oldest; and the second LRU information is updated so that the own-core cache is the most recent. The process then proceeds to step A6 and ends.

In step A7, the own core performs reference processing via the inter-cache connection controller 120: it reads the hit entry of the other-core cache directly.

Next, in step A3, the inter-cache connection controller 120 updates the LRU information: in the own core's first LRU information, the group to which the hit other-core cache belongs becomes the most recent; in the other core's first LRU information, if the hit entry is the most recent, it is swapped with the second most recent entry; and the second LRU information is updated so that the hit other-core cache is the most recent. The process then proceeds to step A6 and ends.
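The A2 decision and the A3 and A5 LRU updates can be modeled by holding each first LRU information as a list ordered from oldest to newest, as in the FIG. 3 and 4 listings later in this section. This is a minimal Python sketch, not the logic of controller 120: the function and parameter names are hypothetical, the labels a, b, c for own-cache ways and x, y, z for group entries follow the example, and skipping group entries when picking the A4 victim is an assumption carried over from step B4.

```python
OWN_ENTRIES = set("abc")  # assumed labels for own-cache ways; x, y, z are group entries


def _to_newest(order, e):
    order.remove(e)
    order.append(e)


def _to_oldest(order, e):
    order.remove(e)
    order.insert(0, e)


def on_remote_hit(first, second, me, owner, hit, group_of):
    """Steps A2-A7 for a hit by core `me` on entry `hit` of core `owner`'s cache.

    first:    core id -> list of first-LRU entries, oldest first (mutated in place)
    second:   list of core ids, oldest first (mutated in place)
    group_of: (me, owner) -> group entry under which `me` tracks `owner`
    Returns True if the lines were exchanged (A4), False if read directly (A7).
    """
    mine, theirs = first[me], first[owner]
    group = group_of[(me, owner)]
    # A2: the group is my newest entry AND the hit entry is not the owner's newest.
    if mine[-1] == group and theirs[-1] != hit:
        # A4: swap my oldest own entry with the hit entry.
        victim = next(e for e in mine if e in OWN_ENTRIES)
        # A5: my swapped slot becomes newest, the owner's slot becomes oldest.
        _to_newest(mine, victim)
        _to_oldest(theirs, hit)
        _to_newest(second, me)
        return True
    # A7: read the hit entry directly; no data moves.
    # A3: my group entry becomes newest; if the hit entry is the owner's newest,
    # swap it with the owner's second-newest entry.
    _to_newest(mine, group)
    if theirs[-1] == hit:
        theirs[-1], theirs[-2] = theirs[-2], theirs[-1]
    _to_newest(second, owner)
    return False


# Starting state of the FIG. 3 and 4 exchange example.
first = {"A": list("bxaczy"), "B": list("abxzyc")}
second = list("CBFADE")
groups = {("A", "B"): "x"}  # in the example, core A tracks core B under group x
```

Replaying the two hits of core A on entry c of cache B gives a direct reference first (core A: baczyx, core B: abxzcy, second LRU: CFADEB) and an exchange second (aczyxb, cabxzy, CFDEBA), the same orders listed in the text.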

FIGS. 7 to 9 show an example of the exchange processing of FIG. 6 in the system of FIGS. 1 and 2; FIG. 7 shows the initial state. In the first LRU information 121 to 123 and the second LRU information 129, 0 denotes the oldest, and larger values denote more recent use.

First, consider the case where, in step A1, own core A (101) hits entry c of cache B (112) of other core B (102).

In step A2, the value of group x, to which the hit other-core cache B (112) belongs, in own core A's local first LRU information 121 is 1, which is not the most recent, so the process proceeds to step A7. In step A7, the hit entry c of other-core cache B (112) is read directly.

Next, in step A3, as shown in FIG. 8, group x, to which the hit other-core cache B (112) belongs, is updated to the most recent (value “3”) in the first LRU information 121 of own core A (101). In the first LRU information 122 of other core B (102), the hit entry c of other-core cache B (112) is the most recent (value “3” in FIG. 7), so it is swapped with the second most recent entry, x. The second LRU information 129 is updated so that the hit other-core cache B (112) is the most recent. The process then proceeds to step A6 and ends.

Next, consider the case where, in step A1, own core A (101) again hits entry c of cache B (112) of other core B (102).

In step A2, group x, to which the hit other-core cache B (112) belongs, is now the most recent (value “3”) in own core A's local first LRU information 121, and entry c in the first LRU information 122 of other-core cache B (112) is not the most recent (value “2”), so the process proceeds to step A4. In step A4, the oldest entry b of own-core cache A (111) is swapped with the hit entry c of other-core cache B (112).

Next, in step A5, as shown in FIG. 9, the exchanged entry b of own-core cache A (111) is updated to the most recent in the first LRU information 121 of own core A (101); the exchanged entry c of other-core cache B (112) is updated to the oldest in the first LRU information 122 of other core B (102); and the second LRU information 129 is updated so that own-core cache A (111) is the most recent. The process then proceeds to step A6 and ends.

As described above, no exchange is performed on the first hit to entry c of other-core cache B; the exchange is performed only on the second hit. This eliminates unnecessary exchanges and reduces traffic. In other words, exchange processing is made efficient by exchanging only entry c, which own core A accesses repeatedly and which the hit other-core cache B is not currently using much.

Next, the exchange processing is described using the system of FIGS. 3 and 4 as an example. As above, the example shows no exchange on the first hit and an exchange on the second.

The initial first and second LRU information is shown below, with entries listed from oldest to newest.
Core A first LRU information 121: bxaczy
Core B first LRU information 122: abxzyc
Core C first LRU information 123: bxczya
Core D first LRU information 124: abcxy
Core E first LRU information 125: abxyz
Core F first LRU information 126: abcxyz
Second LRU information 129: CBFADE

First, in step A1, core A hits entry c of core B's cache B.
In step A2, entry x is not the most recent in core A's first LRU information, so no exchange is performed.
In step A7, entry c is read directly.
In step A3, the LRU update makes entry x the most recent in core A's first LRU information and entry c the second most recent in core B's, and updates the second LRU information 129 so that cache B is the most recent.
The process ends at step A6.

The updated first and second LRU information is shown below, with entries listed from oldest to newest.
Core A first LRU information 121: baczyx
Core B first LRU information 122: abxzcy
Core C first LRU information 123: bxczya
Core D first LRU information 124: abcxy
Core E first LRU information 125: abxyz
Core F first LRU information 126: abcxyz
Second LRU information 129: CFADEB

Next, in step A1, core A again hits entry c of core B's cache B.
In step A2, entry x is the most recent for core A and entry c is not the most recent for core B, so an exchange is performed.
In step A4, entry b of cache A and entry c of cache B are exchanged.
In step A5, the LRU update makes entry b the most recent in core A's first LRU information and entry c the oldest in core B's, and updates the second LRU information 129 so that cache A is the most recent.
The process ends at step A6.

The updated first and second LRU information is shown below, with entries listed from oldest to newest.
Core A first LRU information 121: aczyxb
Core B first LRU information 122: cabxzy
Core C first LRU information 123: bxczya
Core D first LRU information 124: abcxy
Core E first LRU information 125: abxyz
Core F first LRU information 126: abcxyz
Second LRU information 129: CFDEBA

As described above, on a hit in another core's cache, exchange processing is performed only when the hit entry is used frequently by the own core and the other core holding the entry is not accessing it frequently during the same period.

FIG. 10 is a flowchart of the eviction processing of step S8 in FIG. 5. First, in step B1, the determination in step S4 of FIG. 5 has found that the access missed in all other-core caches usable as lower-level caches.

Next, in step B2, the inter-cache connection controller 120 performs a determination: referring to the second LRU information, it checks whether the own core is not the core that has least recently used the entries at that index. If so, the process proceeds to step B4; otherwise it proceeds to step B7.

In step B4, the inter-cache connection controller 120 evicts from the own-core cache to another core's cache: the oldest entry of the own-core cache (the oldest among its entries other than the group entries for other cores) is moved to the oldest entry of the least recently used other-core cache.

Next, in step B5, the inter-cache connection controller 120 updates the LRU information: in the own core's first LRU information, the oldest entry of the own-core cache is updated to the most recent; in the other core's first LRU information, the oldest entry of the destination other-core cache is updated to the most recent; and the second LRU information is updated so that the own-core cache is the most recent and the destination other-core cache is the second most recent. The process then proceeds to step B6.

In step B7, the inter-cache connection controller 120 performs discard processing: the oldest entry of the own-core cache is discarded. In practice nothing is done in this step; the oldest entry is discarded when it is overwritten in the subsequent step B6.

Next, in step B3, the inter-cache connection controller 120 updates the LRU information: in the own core's first LRU information, the oldest entry of the own-core cache is updated to the most recent, and the second LRU information is updated so that the own-core cache is the most recent. The process then proceeds to step B6.

In step B6, the inter-cache connection controller 120 overwrites the oldest entry of the own-core cache (which is the most recent entry after the LRU update) with the data read from the main memory 130, and the own core obtains the data. The process then ends.
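Steps B2 to B7 can be sketched the same way, again holding each LRU order as a list from oldest to newest. The function names, the `OWN_ENTRIES` labels, and the return convention are hypothetical; the sketch only tracks which slots are chosen and how the LRU orders change, not the data movement itself. Moving the destination core to the newest position just before the own core leaves it second newest, which is what step B5 asks for.

```python
OWN_ENTRIES = set("abc")  # assumed own-cache way labels; x, y, z are group entries


def _to_newest(order, e):
    order.remove(e)
    order.append(e)


def on_global_miss(first, second, me):
    """Steps B2-B7 when core `me` misses in every cache.

    first:  core id -> list of first-LRU entries, oldest first (mutated in place)
    second: list of core ids, oldest first (mutated in place)
    Returns (slot, target): the own-cache slot overwritten in B6, and the core
    that received the evicted line, or None if the line was discarded (B7).
    """
    mine = first[me]
    # Oldest own entry, skipping group entries (as in step B4).
    slot = next(e for e in mine if e in OWN_ENTRIES)
    lru_core = second[0]
    target = None
    # B2: evict only if some other core is the least recently used.
    if lru_core != me:
        target = lru_core
        theirs = first[target]
        dest = next(e for e in theirs if e in OWN_ENTRIES)  # B4: target's oldest own slot
        _to_newest(theirs, dest)    # B5: destination slot becomes newest
        _to_newest(second, target)  # target is pushed up first ...
    # B7 (discard) needs no action: B6 simply overwrites the slot.
    _to_newest(mine, slot)          # B3/B5: own slot becomes newest
    _to_newest(second, me)          # ... so own core is newest, target second newest
    return slot, target             # B6: caller writes memory data into `slot`
```

Starting from the FIG. 3 and 4 state (core A: bxaczy, core C: bxczya, second LRU: ACBFDE), the first call returns ("b", None) because core A is itself least recently used, and the second returns ("a", "C"), leaving core C at xczyab and the second LRU at BFDECA.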

FIGS. 11 to 13 show an example of the eviction processing of FIG. 10 in the system of FIGS. 1 and 2; FIG. 11 shows the initial state. In the first LRU information 121 to 123 and the second LRU information 129, 0 denotes the oldest, and larger values denote more recent use.

First, consider the case where, in step B1, an access by own core A (101) misses in all other-core caches.

In step B2, the second LRU information 129 shows that own-core cache A (111) is the least recently used, so the process proceeds through step B7 to step B3.

In step B3, as shown in FIG. 12, the oldest entry b of own-core cache A (111) is updated to the most recent in the first LRU information 121 of own core A (101), and the second LRU information 129 is updated so that own-core cache A is the most recent.

Next, in step B6, the oldest entry b of own-core cache A (111) (the most recent entry after the LRU update) is overwritten with the data read from the main memory 130. The process then ends.

Next, consider the case where, in step B1, an access by own core A (101) again misses in all other-core caches.

In step B2, the second LRU information 129 shows that other-core cache C (113) is the least recently used, so the process proceeds to step B4.

In step B4, the oldest entry a of own-core cache A (111) (the oldest among its entries other than the group entries for other cores) is moved to the oldest entry b of the least recently used other-core cache C (113).

Next, in step B5, as shown in FIG. 13, the oldest entry a of own-core cache A (111) is updated to the most recent in the first LRU information 121 of own core A (101); the oldest entry b of the destination other-core cache C (113) is updated to the most recent in the first LRU information 123 of other core C (103); and the second LRU information 129 is updated so that own-core cache A (111) is the most recent and the destination other-core cache C (113) is the second most recent.

Next, in step B6, the oldest entry a of own-core cache A (111) (the most recent entry after the LRU update) is overwritten with the data read from the main memory 130. The process then ends.

As described above, no eviction is performed when the first access misses in all other-core caches; the eviction is performed only when the second access also misses in all of them. This eliminates unnecessary evictions and reduces traffic. In other words, eviction is made efficient by performing it only when own core A misses repeatedly and another core's cache is the least recently used.

Next, the eviction processing is described using the system of FIGS. 3 and 4 as an example. As above, the example shows no eviction on the first miss and an eviction on the second.

The initial first and second LRU information is shown below, with entries listed from oldest to newest.
Core A first LRU information 121: bxaczy
Core B first LRU information 122: abxzyc
Core C first LRU information 123: bxczya
Core D first LRU information 124: abcxy
Core E first LRU information 125: abxyz
Core F first LRU information 126: abcxyz
Second LRU information 129: ACBFDE

First, in step B1, an access by core A misses in all other-core caches.
In step B2, the second LRU information 129 shows that own core A is the least recently used, so no eviction is performed.
In step B3, the LRU update makes entry b the most recent in core A's first LRU information and updates the second LRU information 129 so that cache A is the most recent.
In step B6, entry b of cache A is overwritten.

The updated first and second LRU information is shown below, with entries listed from oldest to newest.
Core A first LRU information 121: xaczyb
Core B first LRU information 122: abxzyc
Core C first LRU information 123: bxczya
Core D first LRU information 124: abcxy
Core E first LRU information 125: abxyz
Core F first LRU information 126: abcxyz
Second LRU information 129: CBFDEA

Next, in step B1, an access by core A again misses in all other-core caches.
In step B2, the second LRU information 129 shows that another core (core C) is the least recently used, so an eviction occurs.
In step B4, entry a of cache A (its oldest entry other than the group entries) is moved to entry b, the oldest entry of cache C.
In step B5, the LRU update makes entry a the most recent in core A's first LRU information and entry b the most recent in core C's, and updates the second LRU information 129 so that cache A is the most recent and cache C the second most recent.
In step B6, entry a of cache A is overwritten.

The updated first and second LRU information is shown below, with entries listed from oldest to newest.
Core A first LRU information 121: xczyba
Core B first LRU information 122: abxzyc
Core C first LRU information 123: xczyab
Core D first LRU information 124: abcxy
Core E first LRU information 125: abxyz
Core F first LRU information 126: abcxyz
Second LRU information 129: BFDECA

As described above, the eviction decision evicts to the core that is the oldest in the second LRU information 129. In the present embodiment, the second LRU information 129 holds only the most recent access information, but it may instead hold the complete least-recently-used order; only the most recent accesses are kept in order to reduce the number of bits. In addition, updating the eviction-destination core to the second most recent whenever the second LRU information 129 is updated rotates the eviction destination among the cores, which limits the performance loss compared with holding the complete least-recently-used order.

In the system of FIG. 14, the probability that cache-line exchange traffic occurs is (total number of ways in all caches − 1) / (total number of ways in all caches); for example, 11/12 for a 3-way, 4-core configuration. Because an entry is exchanged whenever it is not the most recent, even if the own core rarely accesses it, the exchange traffic is large.

In contrast, the probability that cache-line exchange traffic occurs in the system of the present embodiment is (1 / (number of ways per cache + n)) × ((number of ways per cache) / (number of ways per cache + n)); for example, (1/4) × (3/4) = 3/16 for a 3-way, 4-core configuration with n = 1. Because only entries that the own core accesses repeatedly and that the hit other-core cache is not currently using much are exchanged, the exchange traffic is smaller.
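Evaluating the two expressions exactly with Python's `fractions` module confirms the quoted examples. The function names are hypothetical and the sketch simply evaluates the formulas as stated, with W the number of ways per cache; reading n as the number of group entries in each first LRU information (one, entry x, in the FIG. 1 and 2 system) is our interpretation.

```python
from fractions import Fraction


def exchange_prob_fig14(ways_per_cache, cores):
    """(total ways - 1) / (total ways) for the FIG. 14 system."""
    total = ways_per_cache * cores
    return Fraction(total - 1, total)


def exchange_prob_embodiment(ways_per_cache, n):
    """(1 / (W + n)) * (W / (W + n)), with W = ways per cache."""
    w = ways_per_cache
    return Fraction(1, w + n) * Fraction(w, w + n)
```

For 3 ways and 4 cores, `exchange_prob_fig14(3, 4)` is 11/12, while `exchange_prob_embodiment(3, 1)` is 3/16, roughly a fivefold reduction in exchange probability.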

In the system of FIG. 15, the LRU information requires the following numbers of bits, computed as (first LRU information) + (second LRU information):
3-way, 4-core: 3 × 4 + 2 × 12 = 36 bits
4-way, 4-core: 6 × 4 + 2 × 16 = 56 bits
3-way, 8-core: 3 × 8 + 3 × 24 = 96 bits
4-way, 8-core: 6 × 8 + 3 × 32 = 144 bits

In contrast, the LRU information of the present embodiment requires the following numbers of bits, computed as (first LRU information) + (second LRU information):
3-way, 4-core: 6 × 4 + 6 = 30 bits
4-way, 4-core: 10 × 4 + 6 = 46 bits
3-way, 8-core: 6 × 8 + 28 = 76 bits
4-way, 8-core: 10 × 8 + 28 = 108 bits

The present embodiment thus reduces the number of bits of LRU information, and the reduction grows as the number of cores or ways increases.

As described above, the cache system of the present embodiment includes, as shown for example in FIG. 1, a plurality of processing devices (cores) 101 to 103; a plurality of caches 111 to 113 connected one-to-one to the processing devices 101 to 103; a controller 120 that is connected to the caches 111 to 113 and controls data transfer among them; and an information memory that stores, for each processing device, first information 121 to 123 indicating the least-recently-used order of the entries in its own cache and of the other caches.

As shown for example in FIGS. 6 to 9, when data requested by the processing device 101 misses in its own cache 111 and hits in another cache 112, the controller 120 exchanges an entry of its own cache 111 with the hit entry of the other cache 112 based on the first information 121.

As shown in FIG. 2, the first information 121 to 123 indicates, for each of the processing devices 101 to 103, the least-recently-used order of the entries a, b, and c in its own cache and of a single entry x into which the other caches are grouped. As shown in FIG. 4, the first information 121 to 126 indicates, for each of the processing devices 101 to 106, the least-recently-used order of the entries a, b, and c in its own cache and of multiple entries x, y, and z into which the other caches are grouped.

As shown for example in FIGS. 6 to 9, when data requested by the processing device 101 misses in its own cache 111 and hits in another cache 112, the controller 120 exchanges the entry of its own cache 111 with the hit entry of the other cache 112 if the entry x, into which the other caches are grouped, is the most recent.

More specifically, when data requested by the processing device 101 misses in its own cache 111 and hits in another cache 112, the controller 120 exchanges entry b of the own cache 111 with entry c of the other cache 112 if the grouped entry x is the most recently used and the hit entry c of the other cache 112 is not the most recently used.
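
The exchange condition above can be sketched as a predicate followed by a swap. This is a minimal illustration under stated assumptions: "x is the most recently used" is checked in the requester's first information, "entry c is not the most recently used" is checked in the first information of the core that owns the hit cache, and the requester gives up its least recently used own entry. The text does not say explicitly which own entry is chosen, nor whether the first information is updated after the swap; neither is modelled. `handle_remote_hit` is a hypothetical name.

```python
from typing import Optional


def handle_remote_hit(caches, first_info, requester, holder, hit_entry) -> Optional[str]:
    """Own-cache miss, hit in `holder`'s cache at `hit_entry`.

    caches[i]: entry name -> block address for core i's cache.
    first_info[i]: core i's usage order (most recent first) over its own
    entries "a", "b", "c" and the grouped entry "x".
    Returns the requester entry that received the block, or None if no
    exchange was made.
    """
    req_order = first_info[requester]
    hold_order = first_info[holder]
    # Exchange only if "x" is most recent for the requester and the hit
    # entry is not most recent for the core that owns it.
    if req_order[0] != "x" or hold_order[0] == hit_entry:
        return None
    # Assumption: the victim is the requester's least recently used own entry.
    victim = next(e for e in reversed(req_order) if e != "x")
    own, other = caches[requester], caches[holder]
    own[victim], other[hit_entry] = other[hit_entry], own[victim]
    return victim


caches = [{"a": 0x10, "b": 0x11, "c": 0x12},
          {"a": 0x20, "b": 0x21, "c": 0x22},
          {"a": 0x30, "b": 0x31, "c": 0x32}]
first_info = [["x", "a", "c", "b"], ["a", "c", "b", "x"], ["a", "b", "c", "x"]]
print(handle_remote_hit(caches, first_info, 0, 1, "c"))  # b
```

With this hypothetical state the call swaps entry b of cache 111 with entry c of cache 112, matching the example in the text.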

When data requested by the processing device 101 misses in its own cache 111 and hits in another cache 112, but the grouped entry x is not the most recently used or the hit entry c of the other cache 112 is the most recently used (so that no exchange takes place), the first information 121 to 123 is updated as follows: if entry c is the most recently used in the first information of the processing device 102 corresponding to the hit cache 112, entry c is moved to the second most recent position, and entry x, previously the second most recent, is moved to the most recent position.
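
The update of the holder's first information in the no-exchange case can be sketched as a conditional move-to-front of the grouped entry. When x is second, as in the text, this is exactly "c becomes second, x becomes most recent"; if x were lower in the list, this sketch generalizes by still moving x to the front, which is an assumption. `demote_hit_entry` is a hypothetical name.

```python
def demote_hit_entry(holder_order, hit_entry, group="x"):
    """Update the holder core's first information when no exchange is made.

    holder_order: the holder core's usage order, most recent first.
    If the hit entry is the most recent, the grouped entry becomes most
    recent and the hit entry drops to second place.
    """
    if holder_order[0] == hit_entry:
        holder_order.remove(group)
        holder_order.insert(0, group)


order_102 = ["c", "x", "a", "b"]   # hypothetical state of core 102
demote_hit_entry(order_102, "c")
print(order_102)                   # ['x', 'c', 'a', 'b']
```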

As shown in FIG. 2, the information memory also stores second information 129 indicating the order in which the processing devices 101 to 103 have most recently used their own caches.
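
The second information can be modelled as an ordering of core IDs by how recently each core used its own cache. The sketch below is an assumption-laden illustration: it treats "use of its own cache" as any event the caller chooses to record, and the class and method names are hypothetical.

```python
class OwnCacheUsage:
    """Hypothetical model of the second information 129: core IDs ordered
    by how recently each core used its own cache, most recent first."""

    def __init__(self, cores):
        self.order = list(cores)

    def record_own_use(self, core):
        """Move `core` to the most recent position."""
        self.order.remove(core)
        self.order.insert(0, core)

    def least_recent_user(self):
        """Core that has gone longest without using its own cache."""
        return self.order[-1]


usage = OwnCacheUsage([101, 102, 103])
usage.record_own_use(103)
usage.record_own_use(101)
print(usage.order, usage.least_recent_user())   # [101, 103, 102] 102
```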

As shown in FIGS. 10 to 13, for example, when data requested by the processing device 101 misses in all of the caches 111 to 113, the controller 120 moves entry a of the own cache 111 to entry b of another cache 113 on the basis of the second information 129, and overwrites the vacated entry a of the own cache 111 with data read from the main storage device (main memory) or from a cache (secondary cache) 130 at a lower level of the hierarchy than the caches 111 to 113.

More specifically, when data requested by the processing device 101 misses in all of the caches 111 to 113 and the requesting processing device 101 is not the processing device that has least recently used its own cache, the controller 120 moves entry a of the own cache 111 to entry b of another cache 113 and overwrites the vacated entry a of the own cache 111 with data read from the main storage device or from the lower-level cache 130.
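
The all-miss path can be sketched as "displace, then fill". Several points are assumptions rather than statements from the text: the displaced entry is sent to the cache of the core that has least recently used its own cache, whatever that target entry held is simply dropped, and when the requester is itself the least recent user its old entry is just overwritten. The caller picks the victim and target entry names; `fill_on_all_miss` is a hypothetical name.

```python
def fill_on_all_miss(caches, second_info, requester, block, victim, target_entry):
    """Handle a request that missed in every private cache.

    caches: dict core ID -> {entry name: cached block}
    second_info: core IDs, most recent own-cache user first
    block: data just read from main memory or the lower-level cache
    Returns the core whose cache received the displaced entry, or None.
    """
    receiver = None
    lru_core = second_info[-1]
    if requester != lru_core:
        # Assumption: the displaced entry goes to the least recent user's
        # cache; whatever target_entry held there is simply dropped here.
        caches[lru_core][target_entry] = caches[requester][victim]
        receiver = lru_core
    # Assumption: if the requester is itself the least recent user, its
    # old entry is just overwritten (write-back is not modelled).
    caches[requester][victim] = block
    return receiver


caches = {101: {"a": "A1", "b": "B1", "c": "C1"},
          102: {"a": "A2", "b": "B2", "c": "C2"},
          103: {"a": "A3", "b": "B3", "c": "C3"}}
print(fill_on_all_miss(caches, [101, 102, 103], 101, "NEW", "a", "b"))  # 103
print(caches[101]["a"], caches[103]["b"])                              # NEW A1
```

The example mirrors the text: entry a of cache 111 moves to entry b of cache 113, and entry a of cache 111 then holds the newly read data.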

The embodiments described above are merely examples of how the present invention may be implemented, and the technical scope of the present invention should not be construed as limited by them. The present invention can be implemented in various forms without departing from its technical idea or its main features.

The embodiments of the present invention can be applied in various ways, for example as follows.

(Appendix 1)
A cache system comprising:
a plurality of processing devices;
a plurality of caches connected one-to-one to the plurality of processing devices;
a controller connected to the plurality of caches for controlling data transfer among the plurality of caches; and
an information memory that stores, for each processing device, first information indicating the order of recency of use of the entries in its own cache and of the other caches.
(Appendix 2)
The cache system according to appendix 1, wherein, when data requested by a processing device misses in its own cache and hits in another cache, the controller exchanges an entry of the own cache with the hit entry of the other cache on the basis of the first information.
(Appendix 3)
The cache system according to appendix 1, wherein the first information indicates, for each processing device, the order of recency of use of the entries in its own cache and of one or more entries into which the other caches are grouped.
(Appendix 4)
The cache system according to appendix 3, wherein, when data requested by a processing device misses in its own cache and hits in another cache, the controller exchanges the entry of the own cache with the hit entry of the other cache if the entry into which the other caches are grouped is the most recently used.
(Appendix 5)
The cache system according to appendix 3, wherein, when data requested by a processing device misses in its own cache and hits in another cache, the controller exchanges the entry of the own cache with the hit entry of the other cache if the entry into which the other caches are grouped is the most recently used and the hit entry of the other cache is not the most recently used.
(Appendix 6)
The cache system according to appendix 5, wherein, when data requested by a processing device misses in its own cache and hits in another cache, and the entry into which the other caches are grouped is not the most recently used or the hit entry of the other cache is the most recently used, then, if the hit entry is the most recently used in the first information of the processing device corresponding to the other cache, the first information is updated so that the hit entry becomes the second most recent and the entry that was second most recent becomes the most recent.
(Appendix 7)
The cache system according to appendix 1, wherein the information memory stores second information indicating the order in which the plurality of processing devices have most recently used their own caches.
(Appendix 8)
The cache system according to appendix 7, wherein, when data requested by a processing device misses in all of the plurality of caches, the controller moves an entry of the own cache to an entry of another cache on the basis of the second information and overwrites the moved entry of the own cache with data read from a main storage device or from a cache at a lower level of the hierarchy than the plurality of caches.
(Appendix 9)
The cache system according to appendix 8, wherein, when data requested by a processing device misses in all of the plurality of caches and the requesting processing device is not the processing device that has least recently used its own cache, the controller moves the entry of the own cache to an entry of another cache and overwrites the moved entry of the own cache with data read from the main storage device or from a cache at a lower level of the hierarchy than the plurality of caches.

FIG. 1 is a diagram showing a configuration example of a shared distributed cache system according to an embodiment of the present invention.
FIG. 2 is a diagram showing details of the first LRU information and the second LRU information of FIG. 1.
FIG. 3 is a diagram showing another configuration example of the shared distributed cache system according to the embodiment of the present invention.
FIG. 4 is a diagram showing details of the first LRU information and the second LRU information of FIG. 3.
FIG. 5 is a flowchart showing the data load access operation in the shared distributed cache system according to the present embodiment.
FIG. 6 is a flowchart showing the exchange process of step S7 in FIG. 5.
FIG. 7 is a diagram showing an example of the exchange process of FIG. 6 in the system of FIGS. 1 and 2.
FIG. 8 is a diagram showing an example of the exchange process of FIG. 6 in the system of FIGS. 1 and 2.
FIG. 9 is a diagram showing an example of the exchange process of FIG. 6 in the system of FIGS. 1 and 2.
FIG. 10 is a flowchart showing the eviction process of step S8 in FIG. 5.
FIG. 11 is a diagram showing an example of the eviction process of FIG. 10 in the system of FIGS. 1 and 2.
FIG. 12 is a diagram showing an example of the eviction process of FIG. 10 in the system of FIGS. 1 and 2.
FIG. 13 is a diagram showing an example of the eviction process of FIG. 10 in the system of FIGS. 1 and 2.
FIG. 14 is a diagram showing a configuration example of a shared distributed cache system.
FIG. 15 is a diagram showing a configuration example of another shared distributed cache system.
FIG. 16 is a flowchart showing the cache entry eviction process in the shared distributed cache system of FIG. 15.

Explanation of symbols

101 to 103: Cores (processing devices)
111 to 113: Caches
120: Inter-cache connection controller
121 to 123: First LRU information
129: Second LRU information
130: Main memory

Claims

A cache system comprising:
a plurality of processing devices;
a plurality of caches connected one-to-one to the plurality of processing devices;
a controller connected to the plurality of caches for controlling data transfer among the plurality of caches; and
an information memory that stores, for each processing device, first information indicating the order of recency of use of the entries in its own cache and of the other caches.

The cache system according to claim 1, wherein, when data requested by a processing device misses in its own cache and hits in another cache, the controller exchanges an entry of the own cache with the hit entry of the other cache on the basis of the first information.

The cache system according to claim 1 or 2, wherein the first information indicates, for each processing device, the order of recency of use of the entries in its own cache and of one or more entries into which the other caches are grouped.

The cache system according to claim 1, wherein the information memory stores second information indicating the order in which the plurality of processing devices have most recently used their own caches.

The cache system according to claim 4, wherein, when data requested by a processing device misses in all of the plurality of caches, the controller moves an entry of the own cache to an entry of another cache on the basis of the second information and overwrites the moved entry of the own cache with data read from a main storage device or from a cache at a lower level of the hierarchy than the plurality of caches.