JP2009245323A - System and method for reducing latency

Info

Publication number: JP2009245323A
Application number: JP2008093106A
Authority: JP
Inventors: Koji Kobayashi; 宏次小林
Original assignee: NEC Computertechno Ltd
Current assignee: NEC Computertechno Ltd
Priority date: 2008-03-31
Filing date: 2008-03-31
Publication date: 2009-10-22

Abstract

PROBLEM TO BE SOLVED: To provide a plurality of directories for managing a cache state in one node and to guarantee coherency.

SOLUTION: A system for reducing latency includes a first processor, a memory, a first directory, a second directory, a first directory control unit, and a second directory control unit. The memory is shared by the first processor belonging to one node and a second processor belonging to the other node. The first directory manages the cache state of the one node. The second directory manages the cache state of the other node. The first directory control unit processes a request issued in the one node, and searches the first directory. The second directory control unit processes a request issued from the other node, searches the second directory, and performs predetermined control for guaranteeing coherency.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a system and a method for reducing latency.

In one related technique, cache management for memory addresses is performed with a single directory, and requests are processed in the order received, in order to simplify the control that guarantees coherency. Managing the cache with a single directory, however, raises the following problem. If the directory is placed at the memory controller, it is far from the other nodes; if it is placed at the global switch, it is far from the processors in the own node. Depending on where the directory is placed, therefore, the latency of request processing deteriorates either for requests from the own node or for requests from other nodes.

Request processing paths are described with reference to FIGS. 1A and 1B, which illustrate Related Art 1, and FIGS. 2A and 2B, which illustrate Related Art 2. In Related Art 1, the directory is connected only at the memory controller. In Related Art 2, the directory is connected only at the global switch.

FIGS. 1A and 2A illustrate the case in which the processor 202 issues a request to the memory 140 and valid data exists in the cache of the processor 302. FIG. 1A shows the path from request issuance to data reception in Related Art 1, and FIG. 2A shows the corresponding path in Related Art 2.

The path in Related Art 1 shown in FIG. 1A is as follows. A request issued by the processor 202 passes from the local switch 210 in the node 200a through the global switch 270, then through the global switch 170 of the node 100a, where the memory 140 resides, and through the local switch 110 into the memory/directory control unit 120. The memory/directory control unit 120 issues a request to the memory 140 and, at the same time, looks up the directory 130 and issues a snoop (SNOOP: a request for cache control) to the processor 302, which holds valid data in its cache. The snoop passes through the local switch 110 and the global switch 170, then through the global switch 370 and the local switch 310 of the node 300a, where the processor 302 resides, and enters the processor 302.

If the processor 302 receiving the snoop actually holds valid cache data, it transfers the cache data to the requesting processor 202 and returns a response to the snoop to the memory/directory control unit 120. The cache data is transferred to the processor 202 through the local switch 310, the global switch 370, the global switch 270 of the requesting node, and the local switch 210. Latency is measured from request issuance to data reception; the request to the memory 140 and the response from the processor 302 do not contribute to this latency and are therefore not shown on the path.

The path in Related Art 2 shown in FIG. 2A is as follows. A request issued by the processor 202 passes from the local switch 210 in the node 200b through the global switch/directory control unit 290 and enters the global switch/directory control unit 190 of the node 100a, where the memory 140 resides. The global switch/directory control unit 190 issues a request to the memory 140 and, at the same time, looks up the directory 160 and issues a snoop to the processor 302, which holds valid data in its cache. The snoop passes through the global switch/directory control unit 390 and the local switch 310 of the node 300b, to which the processor 302 belongs, and enters the processor 302.

If the processor 302 receiving the snoop actually holds valid cache data, it transfers the cache data to the requesting processor 202 and returns a response to the snoop to the global switch/directory control unit 190. The cache data is transferred to the processor 202 through the local switch 310, the global switch/directory control unit 390, the global switch/directory control unit 290 of the requesting node, and the local switch 210. Latency is measured from request issuance to data reception; the request to the memory 140 and the response from the processor 302 do not contribute to this latency and are therefore not shown on the path.

FIGS. 1B and 2B illustrate the case in which the processor 102 issues a request to the memory 140 and valid data exists in the cache of the processor 101. FIG. 1B shows the path from request issuance to data reception in Related Art 1, and FIG. 2B shows the corresponding path in Related Art 2.

The path in Related Art 1 shown in FIG. 1B is as follows. A request issued by the processor 102 enters the memory/directory control unit 120 from the local switch 110 in the node 100a. The memory/directory control unit 120 issues a request to the memory 140 and, at the same time, looks up the directory 130 and issues a snoop to the processor 101, which holds valid data in its cache. The snoop passes through the local switch 110 and enters the processor 101. If the processor 101 receiving the snoop actually holds valid cache data, it transfers the cache data to the requesting processor 102 and returns a response to the snoop to the memory/directory control unit 120. The cache data is transferred to the processor 102 through the local switch 110. Latency is measured from request issuance to data reception; the request to the memory 140 and the response from the processor 101 do not contribute to this latency and are therefore not shown on the path.

The path in Related Art 2 shown in FIG. 2B is as follows. A request issued by the processor 102 enters the memory control unit 180 from the local switch 110 in the node 100a. The memory control unit 180 issues a request to the memory 140 and, at the same time, issues a snoop to the global switch/directory control unit 190, which controls the directory 160. The snoop passes through the local switch 110 and enters the global switch/directory control unit 190. On receiving the snoop, the global switch/directory control unit 190 looks up the directory 160 and issues a snoop to the processor 101, which holds valid data in its cache. This snoop passes through the local switch 110 and enters the processor 101. If the processor 101 receiving the snoop actually holds valid cache data, it transfers the cache data to the requesting processor 102 and returns a response to the snoop to the global switch/directory control unit 190. The cache data is transferred to the processor 102 through the local switch 110. Latency is measured from request issuance to data reception; the request to the memory 140 and the response from the processor 101 do not contribute to this latency and are therefore not shown on the path.

As a comparison of FIGS. 1A and 2A makes clear, for requests from other nodes Related Art 2 has a shorter latency than Related Art 1 because the directory is closer. Conversely, as a comparison of FIGS. 1B and 2B makes clear, for requests within the own node Related Art 2 has a worse latency than Related Art 1 because the directory is farther away.
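The comparison can be made concrete by counting hops along the paths described for FIGS. 1A, 2A, 1B, and 2B. The following is a minimal sketch that assumes every hop costs the same, which the source does not state. The labels (`P` for processor, `LS` for local switch, `GS` for global switch, `GSD` for global switch/directory control unit, `MDC` for memory/directory control unit, `MC` for memory control unit) are shorthand introduced here for illustration.

```python
# Hop-count sketch. Equal cost per hop is an assumption made for illustration;
# the labels are shorthand for the reference numerals used in the text.
PATHS = {
    "1A": ["P202", "LS210", "GS270", "GS170", "LS110", "MDC120",  # request
           "LS110", "GS170", "GS370", "LS310", "P302",            # snoop
           "LS310", "GS370", "GS270", "LS210", "P202"],           # cache data
    "2A": ["P202", "LS210", "GSD290", "GSD190",                   # request
           "GSD390", "LS310", "P302",                             # snoop
           "LS310", "GSD390", "GSD290", "LS210", "P202"],         # cache data
    "1B": ["P102", "LS110", "MDC120",                             # request
           "LS110", "P101",                                       # snoop
           "LS110", "P102"],                                      # cache data
    "2B": ["P102", "LS110", "MC180",                              # request
           "LS110", "GSD190",                                     # snoop to directory
           "LS110", "P101",                                       # snoop to processor
           "LS110", "P102"],                                      # cache data
}


def hops(figure: str) -> int:
    """Number of links traversed from request issuance to data reception."""
    return len(PATHS[figure]) - 1


for figure in ("1A", "2A", "1B", "2B"):
    print(f"FIG. {figure}: {hops(figure)} hops")
```

Under the equal-cost assumption, the request from another node takes 15 hops in Related Art 1 against 11 in Related Art 2, while the request within the own node takes 6 hops in Related Art 1 against 8 in Related Art 2.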

It is therefore conceivable to provide directories at two locations so as to prevent the latency deterioration that would otherwise occur in either request processing within the own node or request processing from other nodes.

FIG. 3 illustrates request processing and snoop processing in Related Art 3, in which directories are placed at two locations. FIG. 3 shows only one node 100c, but this node 100c is connected, via the global switch/directory control unit 190, to other nodes having the same configuration. In FIG. 3, the node 100c includes a plurality of processors 101 and 102; a directory 130 that manages the cache state within the own node; a memory 140; a memory/directory control unit 120 that controls the directory 130 and the memory 140; a directory 160 that manages the cache state of other nodes; a global switch/directory control unit 190 that performs routing between nodes and controls the directory 160; and a local switch 110 that connects these within the node.

Consider the case in which the processor 102 issues a request to the memory 140 and valid data exists in the cache of a processor in another node. The request issued by the processor 102 in the own node (solid arrow) reaches the memory/directory control unit 120 via the local switch 110. The memory/directory control unit 120 issues a request to the memory 140 and, at the same time, looks up the directory 130 and issues a snoop (solid circle) to the processor of the other node whose cache state is valid.

Now consider the case in which a processor belonging to another node issues a request to the memory 140 and valid data exists in the cache of a processor belonging to yet another node. The request from the other node (dotted arrow) is received by the global switch/directory control unit 190, which controls the caches outside the own node. The global switch/directory control unit 190 issues a request to the memory 140 and, at the same time, looks up the directory 160 and issues a snoop (dotted circle) to the processor, belonging to that third node, whose cache state is valid.

The problem with Related Art 3 is explained with reference to FIG. 3. Consider the case in which valid data exists in the cache of a processor in a third node. Suppose that, for this data, the issuance of a request by the processor 102 in the own node 100c (solid arrow) and the reception of a request from another node (dotted arrow) occur at about the same time.

In this situation, the memory/directory control unit 120 receives and processes the requests in the order of the request from the processor 102 (solid arrow) and then the request from the other node (dotted arrow). The memory/directory control unit 120 first starts processing the request from the processor 102 (solid arrow) and waits for the response to the snoop (solid circle) issued for that request. In the meantime, processing of the request from the other node (dotted arrow) is kept waiting.

The global switch/directory control unit 190, on the other hand, receives and processes the snoops in the order of the snoop for the request from the other node (dotted circle) and then the snoop for the request within the own node (solid circle). It first starts processing the snoop for the request from the other node (dotted circle) and issues the snoop (dotted circle) to the third node. It subsequently receives a response to the snoop (dotted circle) from that node, but it completes processing of the snoop (dotted circle) only after receiving the reply to the request (dotted arrow) that it issued to the memory 140. In the meantime, processing of the snoop for the request within the own node (solid circle) is kept waiting.

As described above, the memory/directory control unit 120 cannot receive the response to the snoop (solid circle), which is stalled at the global switch/directory control unit 190, and the global switch/directory control unit 190 cannot receive the reply to the request (dotted arrow), which is stalled at the memory/directory control unit 120. Each unit thus waits on the other, and neither request can complete.
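The mutual wait described for Related Art 3 can be pictured as a cycle in a wait-for graph. The sketch below is illustrative only: the node labels are shorthand introduced here (for example, `solid request @120` is the own-node request held at the memory/directory control unit 120), and the single-successor graph is a simplification of the scenario in the text.

```python
# Wait-for graph for the Related Art 3 scenario. Each key waits for its value.
# Labels are illustrative shorthand, not names from the source.
WAITS_FOR = {
    # 120 is processing the own-node request and waits for its snoop response.
    "solid request @120": "solid snoop @190",
    # 190 handles snoops in arrival order, so the solid snoop waits behind the dotted one.
    "solid snoop @190": "dotted snoop @190",
    # 190 finishes the dotted snoop only after the reply to the dotted request.
    "dotted snoop @190": "dotted request @120",
    # 120 handles requests in arrival order, so the dotted request waits behind the solid one.
    "dotted request @120": "solid request @120",
}


def find_cycle(graph, start):
    """Follow wait-for edges from start; return the cycle found, or [] if the chain ends."""
    seen = []
    node = start
    while node in graph:
        if node in seen:
            return seen[seen.index(node):]
        seen.append(node)
        node = graph[node]
    return []


cycle = find_cycle(WAITS_FOR, "solid request @120")
print(" -> ".join(cycle + cycle[:1]))
```

A non-empty result means that no entry in the chain can make progress, which is the situation the invention sets out to avoid.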

Techniques related to the coherency of hierarchical memory are described in the patent literature. Japanese Unexamined Patent Application Publication No. H8-16474 (Patent Document 1) describes a multiprocessor system. The multiprocessor system has a plurality of clusters, each including a plurality of processors each having a cache memory, a local memory connected to each of the processors, and a communication control device connected to the processors and the local memory. The clusters are connected to one another via the communication control devices. The communication control device has storage means and cache memory content matching range determination means. The storage means stores information on those data in the local memory of the own cluster that are registered in the cache memories of other clusters. The cache memory content matching range determination means determines, based on the storage means, the range over which cache memory contents must be made consistent.

Japanese Unexamined Patent Application Publication No. H11-134312 (Patent Document 2) describes a distributed shared memory multiprocessor system. The distributed shared memory multiprocessor system includes a plurality of processor nodes and a ring bus. The processor nodes are arranged in a ring; any one of them generates a request signal for a data block, the remaining processor nodes snoop their own internal elements, and one of the remaining processor nodes supplies the data block. The ring bus connects the processor nodes in a ring and provides a path that broadcasts the request signal to each of the other processor nodes and unicasts the data block to the processor node that generated the request signal.

Japanese Unexamined Patent Application Publication No. 2006-277762 (Patent Document 3) describes a data processing system. The data processing system includes a plurality of processors, a plurality of caches, a directory, and a control device. The caches are provided in correspondence with the respective processors, and each is constructed to store a plurality of cache line entries. The directory holds state information for tracking the state of the cache line entries in the caches. The control device reads and writes the state information contained in the directory. A first portion of the directory holds temporary state information on a first subset of the cache line entries. A second portion of the directory holds non-temporary state information on a second subset of the cache line entries.

Patent Document 1: JP-A-8-16474
Patent Document 2: JP-A-11-134312
Patent Document 3: JP 2006-277762 A

An object of the present invention is to provide a latency reduction system and a latency reduction method in which a plurality of directories for managing cache state are provided in one node and coherency can be guaranteed even when the processing order of requests differs from the processing order of snoops.

A latency reduction system according to a first aspect of the present invention includes a first processor, a memory, a first directory, a second directory, a first directory control unit, and a second directory control unit. The first processor belongs to an own node. The memory is shared by the first processor and a second processor belonging to another node. The first directory manages the cache state of the own node. The second directory manages the cache state of the other node. The first directory control unit processes requests issued within the own node and, on receiving a first request that specifies an address of the memory, looks up the first directory. The second directory control unit processes requests issued from the other node and, on receiving a second request that specifies an address of the memory, looks up the second directory; in addition, if it receives another request or a snoop while processing the second request, it checks whether the addresses conflict and thereby performs predetermined control for guaranteeing coherency.

A latency reduction method according to a second aspect of the present invention includes providing a first directory, providing a second directory, providing a first directory control unit, providing a second directory control unit, processing a first request, and processing a third request. Providing the first directory provides a first directory that manages the cache state of an own node. Providing the second directory provides a second directory that manages the cache state of another node. Providing the first directory control unit provides a first directory control unit that looks up the first directory. Providing the second directory control unit provides a second directory control unit that looks up the second directory. Processing the first request takes place when a first request, specifying an address of a memory shared by a first processor belonging to the own node and a second processor belonging to the other node, is issued within the own node: the first directory control unit issues a second request to the memory, looks up the first directory, and processes the first request. Processing the third request takes place when a third request specifying an address of the memory is issued from the other node: the second directory control unit issues a fourth request to the memory and looks up the second directory; in addition, if it receives another request or a snoop, it checks whether the addresses conflict and thereby performs predetermined control for guaranteeing coherency, and it processes the third request.

According to the present invention, it is possible to provide a latency reduction system and a latency reduction method in which a plurality of directories for managing cache state are provided in one node and coherency can be guaranteed even when the processing order of requests differs from the processing order of snoops.

One of the best modes for carrying out the present invention is described in detail with reference to the drawings. Referring to the configuration diagram of FIG. 4, in one embodiment of the present invention a number of multiprocessor nodes 100, 200, and 300, each including shared memory, are connected via global switches 150, 250, and 350. The nodes 100, 200, and 300 respectively include processors 101, 102, 201, 202, 301, and 302; local switches 110, 210, and 310; memory/directory control units 120, 220, and 320; memories 140, 240, and 340; directories 130, 160, 230, 260, 330, and 360; and global switch/directory control units 150, 250, and 350.

In FIG. 4, the three nodes 100, 200, and 300 have the same internal configuration, so the function of each block is described for the single node 100. In the node 100, the local switch 110 is connected to the processors 101 and 102, the memory/directory control unit 120, and the global switch/directory control unit 150. The local switch 110 routes requests, snoops, responses to snoops (referred to in this application as responses), and responses to requests (referred to in this application as replies) according to the destination data they contain.
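Destination-based routing at the local switch 110 can be sketched as a lookup on the destination carried by each message. This is a minimal illustration, not the patented circuit: the `Message` fields, the port labels, and the rule that anything not addressed to a local unit leaves through the global switch/directory control unit 150 are all assumptions made here.

```python
from dataclasses import dataclass


@dataclass
class Message:
    kind: str         # "request", "snoop", "response", or "reply"
    destination: str  # label of the unit the message is addressed to (hypothetical field)
    address: int      # address in the shared memory


# Units attached to local switch 110 in node 100, as listed in the text.
LOCAL_PORTS = {"P101", "P102", "MDC120", "GSD150"}


def route_local(msg: Message) -> str:
    """Pick the output port of local switch 110 for msg.

    Assumption: a destination outside the node is reached through GSD150.
    """
    if msg.destination in LOCAL_PORTS:
        return msg.destination
    return "GSD150"
```

For example, `route_local(Message("snoop", "P101", 0x100))` selects the port of the processor 101, while a reply addressed to a processor in node 200 would be handed to the global switch/directory control unit 150 under the stated assumption.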

The memory/directory control unit 120 receives all requests directed to the memory 140. On receiving a request, it issues a request to the memory 140 and looks up the directory 130; if a valid cache exists, it issues a snoop to the relevant processor 101 or 102 in the own node or to the global switch/directory control unit 150 in the own node. Then, once both the response to the snoop and the reply to the request issued to the memory 140 have arrived, the memory/directory control unit 120 returns a reply to the processor that issued the request. Returning the reply completes processing of the request.
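The completion rule described above, where the reply goes back only after both the snoop response and the memory reply have arrived, can be sketched as a small state record. The class and field names are hypothetical, and the treatment of the case where no snoop was issued (wait for the memory reply alone) is an assumption, since the text only describes the case with a valid cache.

```python
from dataclasses import dataclass


@dataclass
class PendingRequest:
    requester: str                # label of the processor that issued the request
    snoop_issued: bool            # False when the directory lookup found no valid cache
    snoop_response: bool = False  # set when the response to the snoop arrives
    memory_reply: bool = False    # set when memory 140 replies

    def ready(self) -> bool:
        """True once every awaited message has arrived and the reply can be returned."""
        # Assumption: with no snoop outstanding, only the memory reply is awaited.
        snoop_done = self.snoop_response or not self.snoop_issued
        return self.memory_reply and snoop_done


entry = PendingRequest(requester="P102", snoop_issued=True)
entry.memory_reply = True       # memory 140 has answered, snoop response still missing
still_waiting = not entry.ready()
entry.snoop_response = True     # the snooped cache has now answered as well
can_reply = entry.ready()
```

Keeping both flags in one entry mirrors the description that a request is finished by a single reply once both inputs are present.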

As part of the control for guaranteeing coherency, the memory/directory control unit 120 also receives cancel notifications from the global switch/directory control unit 150. On receiving a cancel notification, the memory/directory control unit 120 compares the identifier (ID) of the request to be canceled, which is sent with the notification, against the requests stored in its buffer. If a request matches, that request is canceled.
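The cancel handling can be sketched as an ID lookup against the buffered requests. This is an illustrative model only: `RequestBuffer`, its methods, and the use of a dictionary keyed by request ID are assumptions introduced here, and what happens to a request after it is canceled is outside the scope of the sketch.

```python
class RequestBuffer:
    """Hypothetical model of the request buffer in memory/directory control unit 120."""

    def __init__(self):
        self._entries = {}  # request ID -> memory address of the buffered request

    def store(self, request_id: int, address: int) -> None:
        """Buffer a request that is waiting to be processed."""
        self._entries[request_id] = address

    def cancel(self, request_id: int) -> bool:
        """Handle a cancel notification: drop the request whose ID matches, if any."""
        if request_id in self._entries:
            del self._entries[request_id]
            return True
        return False

    def pending_ids(self):
        """IDs of the requests still buffered, in ascending order."""
        return sorted(self._entries)


buf = RequestBuffer()
buf.store(7, 0x1000)
buf.store(9, 0x2000)
matched = buf.cancel(7)     # ID 7 is buffered, so it is canceled
unmatched = buf.cancel(42)  # no buffered request has ID 42, so nothing changes
```

The boolean return value is a convenience of the sketch for distinguishing a match from a miss.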

The directories 130 and 160 manage cache state. The directory 130, connected to the memory/directory control unit 120, manages the cache state within the node 100. The directory 160, connected to the global switch/directory control unit 150, manages the cache state of the other nodes 200 and 300, that is, the cache state of the processors 201, 202, 301, and 302 belonging to those nodes. The cache state within the node 100 comprises the cache state of the processors 101 and 102 in the node 100 and the cache state of the global switch/directory control unit 150 in the node 100, which stands for the cache state of the processors 201, 202, 301, and 302 of the other nodes 200 and 300.

The global switch/directory control unit 150 is connected to the global switch/directory control units 250 and 350 of the other nodes 200 and 300, and routes requests, snoops, responses, and replies according to the destination data they contain. On receiving a snoop from the memory/directory control unit 120, the global switch/directory control unit 150 looks up the directory 160. If the lookup shows that a valid cache exists, it issues a snoop to the relevant processor. Then, when the cache data has been transferred to the target processor, it returns a response to the snoop to the memory/directory control unit 120.

Similarly, on receiving a request from the global switch/directory control unit 250 or 350 of another node 200 or 300, the global switch/directory control unit 150 looks up the directory 160. If the lookup shows that a valid cache exists, it issues a snoop to the relevant processor. Then, once both the response to the snoop and the reply to the request issued to the memory 140 have arrived, it returns a reply to the processor of the other node that issued the request. The global switch/directory control unit 150 also guarantees coherency in the case, described in the background section, where the processing order of requests differs from the processing order of snoops.

Next, the detailed configuration of the part of the global switch/directory control unit 150 that performs the control for guaranteeing coherency is described. FIG. 5 is a configuration diagram of the global switch/directory control unit 150 in this embodiment. In FIG. 5, the global switch/directory control unit 150 includes arbitration circuits 151 and 158, an address matching circuit 152, a cancel determination circuit 153, a control register 154, a message storage buffer 155, a retransmission buffer 156, and a message issuing/generating circuit 157.

The arbitration circuit 151 arbitrates between messages from the local switch 110 and messages from the other nodes 200 and 300, and selects one of them. Here, "message" is a generic term covering requests, replies, snoops, and responses. A reply to a request is stored in the same entry as the corresponding request, and a response to a snoop is stored in the same entry as the corresponding snoop.

The address matching circuit 152 compares the address of the message selected by the arbitration circuit 151 with the addresses of the requests and snoops that are already being processed and are stored in the message storage buffer 155. The address matching circuit 152 reports the comparison result to the cancel determination circuit 153. The address is an address of the shared memory 140; when a request or snoop with a matching address is found, control for guaranteeing coherency is performed.

The cancel determination circuit 153 receives the address comparison result from the address matching circuit 152. If there is no match, it updates the control register 154 and stores the message in the message storage buffer 155. If there is a match and the message is a response or a reply, it likewise updates the control register 154 and stores the message in the message storage buffer 155.

If there is a match and the message is a request, the request is stored in the retransmission buffer 156. The reason is that even if a subsequent request with a matching address were registered in the message storage buffer 155, it could not be processed in parallel with the preceding request without breaking coherency. It is therefore better for performance not to register the address-matching request in the message storage buffer 155, but to give priority to registering the non-matching requests that follow it, so that their processing can start without waiting for the preceding request to complete. When the retransmission buffer 156 stores an address-matching request, it notifies the request issuer to reissue the request.

If there is a match and the message is a snoop, the control register 154 is updated and the snoop is stored in the message storage buffer 155. For the entry of the preceding request whose address matches the snoop, the conflict flag in the control register 154 is turned on and the entry number of the subsequent snoop is registered in the address match entry. As explained in the background section, this is the case where the request processing order in the memory/directory control unit 120 differs from the snoop processing order in the global switch/directory control unit 190. After the memory/directory control unit 120 issues a snoop for the request it is currently processing, it waits for the response to that snoop; if the response is never returned, it deadlocks. For this reason, the global switch/directory control unit 150 of this embodiment registers a subsequent snoop in the message storage buffer 155 even when its address matches a preceding request. Processing of the subsequent snoop then starts after the response to the snoop issued for the preceding request has arrived.
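The decision made by the cancel determination circuit 153 can be summarized as a small table. The following Python sketch is illustrative only and is not part of the disclosed circuit; the names `Kind`, `Action`, and `decide` are hypothetical and were chosen for this example.

```python
from enum import Enum, auto


class Kind(Enum):
    """Message kinds handled by the control unit (names are illustrative)."""
    REQUEST = auto()
    SNOOP = auto()
    RESPONSE = auto()
    REPLY = auto()


class Action(Enum):
    """Hypothetical labels for the outcomes described in the text."""
    STORE = auto()           # update control register, store in message storage buffer
    RETRANSMIT = auto()      # park in retransmission buffer, ask issuer to reissue
    STORE_AND_FLAG = auto()  # store the snoop, set conflict flag on the preceding request


def decide(kind: Kind, address_match: bool) -> Action:
    """Return the action for an incoming message, given the address comparison result."""
    if not address_match:
        return Action.STORE
    if kind in (Kind.RESPONSE, Kind.REPLY):
        return Action.STORE
    if kind is Kind.REQUEST:
        return Action.RETRANSMIT
    # Remaining case: a snoop whose address matches a preceding request.
    return Action.STORE_AND_FLAG
```

Only an address-matching request is turned away; an address-matching snoop is always accepted, which is what keeps the memory/directory control unit 120 from waiting indefinitely for its response.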

The control register 154 holds, for each entry, the flag information and related data needed to process the request or snoop stored in the corresponding entry of the message storage buffer 155. This information consists of an entry valid flag, an issue request flag, a message issued flag, a response reception flag, a reply reception flag, a data transfer flag, a conflict flag, and an address match entry.
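As an illustration, one entry of the control register 154 could be modeled as follows. This is a minimal sketch under assumptions: the field names, the Python types, and the helper `register_conflicting_snoop` are hypothetical, and the actual register layout is not given in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ControlEntry:
    """One entry of the control register (field names are illustrative)."""
    entry_valid: bool = False
    issue_request: bool = False
    message_issued: bool = False
    response_received: bool = False
    reply_received: bool = False
    data_transfer: bool = False
    conflict: bool = False
    address_match_entry: Optional[int] = None  # entry number of a conflicting snoop


def register_conflicting_snoop(entries: List[ControlEntry],
                               request_idx: int, snoop_idx: int) -> None:
    """Record that the snoop in snoop_idx conflicts with the request in request_idx."""
    entries[snoop_idx].entry_valid = True
    entries[request_idx].conflict = True
    entries[request_idx].address_match_entry = snoop_idx
```

The helper mirrors the bookkeeping described for an address-matching snoop: the snoop's own entry becomes valid, while the conflict flag and the address match entry are written to the entry of the preceding request.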

The message storage buffer 155 is a buffer divided into a number of entries. Each entry stores a request or a snoop. The response to a snoop is stored in the entry that holds that snoop, and the reply to a request is stored in the entry that holds that request.

The retransmission buffer 156 stores a message when there is an address match and the message is a request. The retransmission buffer 156 also has the function of instructing the request issuer to retransmit the request.

The message issuing/generating circuit 157 refers to the issue request flags in the control register 154 and searches for entries whose issue request flag is on. For each such entry, it reads the flag information from the control register 154 and the entry data from the message storage buffer 155. It also looks up and updates the directory 160 and, based on the flag information in the control register 154, determines whether the next message to issue is a request, a snoop, a response, or a reply, and generates that message.

The arbitration circuit 158 arbitrates between the ordinary messages bound for the other nodes 200 and 300 that are sent from the message issuing/generating circuit 157 and the retransmission instruction messages sent from the retransmission buffer 156, and selects one of them.

Next, the operation of the global switch/directory control unit of FIG. 5 is described with reference to the flowcharts of FIGS. 6 and 7. FIG. 6 illustrates the operation when a request is received from the other nodes 200 and 300. In FIG. 6, when the global switch/directory control unit 150 receives a request from the other nodes 200 and 300, the address matching circuit 152 performs an address comparison (S1 in FIG. 6). If there is no address match, the request is registered in the message storage buffer 155, and the entry valid flag and the issue request flag in the control register 154 are turned on (S2 in FIG. 6).

The message issuing/generating circuit 157 refers to the control register 154 and searches for entries whose issue request flag is on. For each such entry, it reads the flag information from the control register 154 and the entry data from the message storage buffer 155, generates and issues a request, and looks up the directory 160 (S3 in FIG. 6). When generating the request to the memory 140, it sets the request issuer to the global switch/directory control unit 150. In other words, the message issuing/generating circuit 157 makes every request issued from the other nodes 200 and 300 appear to the memory/directory control unit 120 as a request issued by the global switch/directory control unit 150. This prevents snoops for requests issued from the other nodes 200 and 300 from coming back to the global switch/directory control unit 150.
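The issuer substitution described above can be pictured with a short sketch. It is hypothetical: the `Request` type, its `source` field, and the identifier strings are assumptions made for illustration and do not reflect the actual message format.

```python
from dataclasses import dataclass, replace

# Illustrative identifier for the global switch/directory control unit 150.
GLOBAL_SWITCH_150 = "global_switch_directory_control_150"


@dataclass(frozen=True)
class Request:
    address: int  # address in the shared memory 140
    source: str   # issuer as seen by the receiver of the request


def forward_to_memory(req: Request) -> Request:
    """Build the request sent toward the memory 140, with the issuer replaced."""
    return replace(req, source=GLOBAL_SWITCH_150)
```

For example, `forward_to_memory(Request(0x40, "processor_202"))` keeps the address and reports the control unit 150 as the issuer, so the memory/directory control unit 120 has no reason to snoop the remote processor itself.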

The message issuing/generating circuit 157 refers to the lookup result of the directory 160 (S4 in FIG. 6). If a valid cache exists, it issues a snoop to the processor of the other node that holds the cache data (S5 in FIG. 6). When the snoop is issued, the issue request flag in the control register 154 is turned off and the message issued flag is turned on. The reply to the request issued to the memory 140 and the response to the snoop issued to the other nodes 200 and 300 are returned asynchronously. When the global switch/directory control unit 150 receives the reply to the request issued to the memory 140, it instructs an update of the directory 160 and turns on the reply reception flag in the control register 154. Similarly, when it receives the response to the snoop issued to the other node, it instructs an update of the directory 160 and turns on the response reception flag in the control register 154.

After issuing the snoop, the message issuing/generating circuit 157 refers to the response reception flag in the control register 154 and waits until it turns on (S6 in FIG. 6). After a while, the processor of the other node 200 or 300 sends the response to the snoop to the global switch/directory control unit 150, and the response reception flag turns on. Once the response reception flag is on, the conflict flag in the control register 154 is referenced (S7 in FIG. 6), and the reply reception flag is referenced (S8 in FIG. 6). The conflict flag is on if, in parallel with the request processing described in FIG. 6, a snoop has arrived at the global switch/directory control unit 150 from the memory/directory control unit 120 in the own node 100, as indicated by the solid-line circle in FIG. 3.

If the conflict flag is off (S7 in FIG. 6: No) and the reply reception flag is on, the issue request flag is turned on. For the entry whose issue request flag is on, the message issuing/generating circuit 157 reads the flag information from the control register 154 and the entry data from the message storage buffer 155, generates and issues the reply to the request, and updates the directory 160. When the series of request processing is complete, the entry valid flag in the control register 154 is turned off (S9 in FIG. 6).

If there is a conflict (S7 in FIG. 6: Yes), the data transfer flag in the control register 154 is referenced (S10 in FIG. 6). The response received earlier (S6 in FIG. 6: Yes) includes information indicating whether valid cache data actually existed (data transfer) or not (no data transfer). If there was a data transfer (S10 in FIG. 6: Yes), the request processing shown in the flowchart of FIG. 6 must be completed before the conflicting snoop is processed. The global switch/directory control unit 150 therefore cancels the request it issued to the memory 140 and ends the request processing of FIG. 6 without waiting for both the response and the reply to arrive. This cancellation cancels the pending request registered in the memory/directory control unit 120.

To do so, the global switch/directory control unit 150 turns on the issue request flag in the control register 154. For the entry whose issue request flag is on, the message issuing/generating circuit 157 reads the flag information from the control register 154 and the entry data from the message storage buffer 155, issues a request cancellation notice to the memory/directory control unit 120, and also updates the directory 160 (S11 in FIG. 6). After sending the cancellation notice, it gives a start instruction to the other entry that holds the conflicting snoop. The processing of the conflicting snoop is described with reference to the flowchart of FIG. 7. When the series of request processing is complete, the entry valid flag in the control register 154 is turned off (S12 in FIG. 6).

If there was no data transfer (S10 in FIG. 6: No), a start instruction is given to the other entry that holds the conflicting snoop (S13 in FIG. 6). The global switch/directory control unit 150 gives priority to the snoop processing indicated by the solid-line circle in FIG. 3, completes it, returns a response to the memory/directory control unit 120, and thereby resolves the conflict. During this time, the request to the memory 140 issued by the global switch/directory control unit 150 waits in the memory/directory control unit 120.

If the lookup of the directory 160 finds no valid cache (S4 in FIG. 6: No), the conflict flag in the control register 154 is referenced (S14 in FIG. 6), and the reply reception flag is referenced (S15 in FIG. 6). If the conflict flag is off and the reply reception flag is on, the issue request flag is turned on. For the entry whose issue request flag is on, the message issuing/generating circuit 157 reads the flag information from the control register 154 and the entry data from the message storage buffer 155, returns the reply, and updates the directory 160. When the series of request processing is complete, the entry valid flag in the control register 154 is turned off (S16 in FIG. 6).

If the conflict flag is on (S14 in FIG. 6: Yes), a start instruction is given to the other entry that holds the conflicting snoop (S17 in FIG. 6). The global switch/directory control unit 150 gives priority to the snoop processing indicated by the solid-line circle in FIG. 3, completes it, returns a response to the memory/directory control unit 120, and thereby resolves the conflict. During this time, the request to the memory 140 issued by the global switch/directory control unit 150 waits in the memory/directory control unit 120.

If there is an address match (S1 in FIG. 6: Yes), the request is registered in the retransmission buffer 156 (S18 in FIG. 6), and a retransmission instruction is sent to the request issuer (S19 in FIG. 6).
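The branches of FIG. 6 described above can be traced with a small function. The sketch below is only an illustration: the function name `request_steps` and its boolean parameters are assumptions, and the function merely lists the step labels visited for one combination of conditions. It does not model timing, waiting, or the processing that follows a start instruction at S13 or S17.

```python
from typing import List


def request_steps(address_match: bool, valid_cache: bool = False,
                  conflict: bool = False, data_transfer: bool = False) -> List[str]:
    """List the FIG. 6 step labels visited for one request from another node."""
    if address_match:
        # Park the request in the retransmission buffer and ask for a resend.
        return ["S1", "S18", "S19"]
    steps = ["S1", "S2", "S3", "S4"]
    if valid_cache:
        steps += ["S5", "S6", "S7"]      # snoop issued, response awaited
        if not conflict:
            return steps + ["S8", "S9"]  # reply returned, entry released
        steps.append("S10")
        if data_transfer:
            return steps + ["S11", "S12"]  # cancel the memory request, then start the snoop
        return steps + ["S13"]             # start the conflicting snoop first
    steps.append("S14")
    if not conflict:
        return steps + ["S15", "S16"]
    return steps + ["S17"]
```

For instance, `request_steps(False, valid_cache=True, conflict=True, data_transfer=True)` ends in S11 and S12, the branch in which the request to the memory 140 is canceled.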

FIG. 7 illustrates the operation when a snoop is received from the memory/directory control unit 120 of the own node 100. When the global switch/directory control unit 150 receives a snoop from the memory/directory control unit 120 of the own node 100, the address matching circuit 152 performs an address comparison (S31 in FIG. 7).

If there is no address match (S31 in FIG. 7: No), the snoop is registered in the message storage buffer 155, and the entry valid flag and the issue request flag in the control register 154 are turned on (S32 in FIG. 7). For the entry whose issue request flag is on, the message issuing/generating circuit 157 reads the flag information from the control register 154 and the entry data from the message storage buffer 155, and looks up the directory 160 (S33 in FIG. 7).

If the lookup of the directory 160 finds valid cache data (S34 in FIG. 7: Yes), a snoop is issued to the processor of the other node 200 or 300 that holds the cache data (S35 in FIG. 7). At that time, the issue request flag in the control register 154 is turned off and the message issued flag is turned on. When the response is received from the processor of the other node 200 or 300, the issue request flag and the response reception flag in the control register 154 are turned on (S36 in FIG. 7). For the entry whose issue request flag is on, the message issuing/generating circuit 157 reads the flag information from the control register 154 and the entry data from the message storage buffer 155, generates and issues a response to the memory/directory control unit 120 of the own node 100, and updates the directory 160 (S37 in FIG. 7). When the series of snoop processing is complete, the entry valid flag in the control register 154 is turned off.

If there is no valid cache (S34 in FIG. 7: No), a response to the memory/directory control unit 120 of the own node 100 is generated and issued (S38 in FIG. 7). When the series of snoop processing is complete, the entry valid flag in the control register 154 is turned off.

Even if there is an address match (S31 in FIG. 7: Yes), the global switch/directory control unit 150 registers the snoop received from the memory/directory control unit 120 in the message storage buffer 155. It then turns on the entry valid flag in the control register 154 for the snoop and, for the other entry that holds the address-matching request, turns on the conflict flag in the control register 154 and registers in the address match entry the number of the entry in which the address-matching snoop was stored (S39 in FIG. 7).

Meanwhile, the global switch/directory control unit 150 is processing the preceding request as described in the flowchart of FIG. 6. When the start instruction arrives from the preceding request (S12, S13, and S17 in FIG. 6), the issue request flag of the address-matching snoop is turned on (S40 in FIG. 7). The subsequent operation (S33 to S38 in FIG. 7) is the same as in the case of no address match (S31 in FIG. 7: No).
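The FIG. 7 flow can be traced in the same way. This is again a hedged sketch: `snoop_steps` and its parameters are illustrative names, and the function only lists the step labels visited, without modeling the wait for the start instruction between S39 and S40.

```python
from typing import List


def snoop_steps(address_match: bool, valid_cache: bool) -> List[str]:
    """List the FIG. 7 step labels visited for one snoop from the own node."""
    steps = ["S31"]
    if address_match:
        # Register the snoop, flag the preceding request, then wait to be started.
        steps += ["S39", "S40"]
    else:
        steps.append("S32")
    steps += ["S33", "S34"]
    if valid_cache:
        steps += ["S35", "S36", "S37"]  # snoop a remote processor, then respond
    else:
        steps.append("S38")             # respond immediately
    return steps
```

Both branches at S31 converge on S33, which matches the statement that an address-matching snoop behaves like an ordinary one once its issue request flag has been turned on.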

In this embodiment, in a large-scale system (SMP) in which many multiprocessor nodes, each including shared memory, are connected to one another through global switches, cache coherency is guaranteed and latency is reduced by using two directories: a directory placed near the memory controller that manages the cache state within the own node, and a directory placed near the global switch that manages the cache state of the other nodes.

In this embodiment, the global switch/directory control unit 150 is provided with the address matching circuit 152, which checks whether a preceding request or snoop with the same address exists; the cancel determination circuit 153, which determines whether to cancel when the addresses match; the control register 154, which stores the flag information used for message issue requests, start instructions for conflicting snoops, and the like; and the message issuing/generating circuit 157, which reads data from the control register 154 and the message storage buffer 155 and generates and issues messages. Together, these maintain coherency between the two directories.

As described above, this embodiment provides the following effects. The first effect is that, compared with managing the directory at only one location near the memory/directory control unit, latency is reduced when the processor that issues the request and the processor that holds the valid cache data are in different nodes. This is because a directory is also provided at the global switch, so that a snoop for a request from another node can be issued from the global switch/directory control unit.

As an example of the first effect, FIG. 8A shows the path from request issue to data reception in the case where the processor 202 issues a request for the memory 140 and valid data is in the cache of the processor 302. In this embodiment, a request issued by the processor 202 passes from the local switch 210 in the node 200 through the global switch/directory control unit 250 and enters the global switch/directory control unit 150 of the node 100, where the memory 140 resides. The global switch/directory control unit 150 issues a request to the memory 140 (this request is canceled under certain conditions (S11 in FIG. 6)) and at the same time looks up the directory 160 and issues a snoop to the processor 302, which holds valid data in its cache. The snoop passes through the global switch/directory control unit 350 and the local switch 310 of the node 300, to which the processor 302 belongs, and enters the processor 302.

If the processor 302 that receives the snoop actually holds valid cache data, the cache data is transferred to the issuing processor 202, and a response is returned to the global switch/directory control unit 150. The cache data is returned to the processor 202 through the local switch 310, the global switch/directory control unit 350, the global switch/directory control unit 250 of the requesting node, and the local switch 210. Latency is measured from request issue to data reception; the request to the memory 140 and the response from the processor 302 do not contribute to it and are therefore not shown on the path.

Compared with the path in related technique 1 shown in FIG. 1A, this embodiment eliminates the latency of the round trip between the global switch/directory control unit 150 and the memory/directory control unit 120.
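The saving can be illustrated by counting hops along the two paths. The sketch below rests on assumptions: the component labels are shorthand invented for this example, every hop is treated as costing the same, and the related-technique path is reconstructed only from the statement that it contains an extra round trip between the global switch/directory control unit 150 and the memory/directory control unit 120.

```python
from typing import List

# Path of FIG. 8A as described in the text: request out, snoop, cache data back.
EMBODIMENT_PATH: List[str] = [
    "proc_202", "ls_210", "gs_250", "gs_150",  # request reaches node 100
    "gs_350", "ls_310", "proc_302",            # snoop reaches the data holder
    "ls_310", "gs_350", "gs_250", "ls_210", "proc_202",  # cache data returns
]

# Assumed related-technique path: the directory sits only at unit 120,
# so the request detours 150 -> 120 -> 150 before the snoop can be issued.
RELATED_PATH: List[str] = (
    EMBODIMENT_PATH[:4] + ["md_120", "gs_150"] + EMBODIMENT_PATH[4:]
)


def hops(path: List[str]) -> int:
    """Number of component-to-component transfers along a path."""
    return len(path) - 1


saved_hops = hops(RELATED_PATH) - hops(EMBODIMENT_PATH)
```

Under these assumptions `saved_hops` equals 2, one hop in each direction of the eliminated round trip; real latencies would depend on the cost of each link, which the text does not state.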

The second effect is that, compared with managing the directory at only one location in the global switch/directory control unit, latency is reduced when the processor that issues the request is in the own node and the processor holding valid cache data either exists in no node or is in the own node. This is because a directory is also provided at the memory, so that snoop processing for a request issued by a processor in the own node can be performed without going through the global switch/directory control unit.

As an example of the second effect, FIG. 8B shows the path from request issue to data reception in the case where the processor 102 issues a request for the memory 140 and valid data is in the cache of the processor 101. A request issued by the processor 102 passes from the local switch 110 in the node 100 into the memory/directory control unit 120. The memory/directory control unit 120 issues a request to the memory 140 and at the same time looks up the directory 130 and issues a snoop to the processor 101, which holds valid data in its cache. The snoop passes through the local switch 110 and enters the processor 101.

If the processor 101 that receives the snoop actually holds valid cache data, the cache data is transferred to the requesting processor 102, and a response is returned to the memory/directory control unit 120. The cache data is transferred to the processor 102 through the local switch 110. Latency is measured from request issue to data reception; the request to the memory 140 and the response from the processor 101 do not contribute to it and are therefore not shown on the path.

Compared with the path in related technique 2 shown in FIG. 2B, this embodiment eliminates the latency of the round trip between the local switch 110 and the global switch/directory control unit 150.

FIG. 1A is a diagram explaining the path of request processing from another node in related technique 1.
FIG. 1B is a diagram explaining the path of request processing within the own node in related technique 1.
FIG. 2A is a diagram explaining the path of request processing from another node in related technique 2.
FIG. 2B is a diagram explaining the path of request processing within the own node in related technique 2.
FIG. 3 is an explanatory diagram of the problem in related technique 3.
FIG. 4 is a configuration diagram of a multiprocessor system according to one embodiment of the present invention.
FIG. 5 is a configuration diagram of the global switch/directory control unit in this embodiment.
FIG. 6 is a first flowchart explaining the operation of the global switch/directory control unit in this embodiment.
FIG. 7 is a second flowchart explaining the operation of the global switch/directory control unit in this embodiment.
FIG. 8A is a diagram explaining the path of request processing from another node in this embodiment.
FIG. 8B is a diagram explaining the path of request processing within the own node in this embodiment.

Explanation of symbols

100, 100a, 100b, 100c, 200, 200a, 200b, 300, 300a, 300b: Node
101, 102, 201, 202, 301, 302: Processor
110, 210, 310: Local switch
120, 220, 320: Memory/directory control unit
130, 160, 230, 260, 330, 360: Directory
140, 240, 340: Memory
150, 190, 250, 290, 350, 390: Global switch/directory control unit
151, 158: Arbitration circuit
152: Address match circuit
153: Cancel determination circuit
154: Control register
155: Message storage buffer
156: Retransmission buffer
157: Message issuing/generating circuit
170, 270, 370: Global switch
180, 280, 380: Memory control unit

Claims

A latency shortening system comprising:
a first processor belonging to an own node;
a memory shared by the first processor and a second processor belonging to another node;
a first directory that manages the cache state of the own node;
a second directory that manages the cache state of the other node;
a first directory control unit that processes requests issued within the own node and, on receiving a first request specifying an address of the memory, searches the first directory; and
a second directory control unit that processes requests issued from the other node and, on receiving a second request specifying an address of the memory, searches the second directory and, when another request or a snoop is received while the second request is being processed, performs predetermined control to guarantee coherency by checking whether the addresses conflict.

The latency shortening system according to claim 1, wherein
the second directory control unit belongs to the own node, and
the first directory manages the cache state of the second directory control unit as one of the cache states of the own node.

The latency shortening system according to claim 2, wherein
the first directory control unit, when processing the first request, searches the first directory and, if the cache of the second directory control unit is valid, issues a first snoop to the second directory control unit, and
the second directory control unit, when processing the second request, issues a third request to the first directory control unit and, if it receives the first snoop after issuing the third request and the address of the third request conflicts with the address of the first snoop, starts and completes the processing of the first snoop before receiving a reply to the third request.

The latency shortening system according to claim 3, wherein
the second directory control unit, when starting the processing of the first snoop, issues to the first directory control unit a cancel notification for canceling the third request if the search of the second directory found the cache valid and a second snoop has been issued to the other node.

The latency shortening system according to claim 4, wherein
the second directory control unit, if the second snoop has been issued, waits to receive a response to the second snoop before starting the processing of the first snoop, and issues the cancel notification only when the response to the second snoop indicates that the cache data has actually been transferred.

The latency shortening system according to claim 5, wherein
the second directory control unit issues the third request with itself as the issuer.

The latency shortening system according to claim 6, wherein
the second directory control unit, if it receives another request with a conflicting address while processing the second request, does not process the other request and instructs the issuer of the other request to resend it.

The latency shortening system according to claim 7, wherein
the first directory control unit comprises a memory control unit that controls access to the memory, and
the second directory control unit comprises a global switch that transmits and receives messages, including requests, replies, snoops, and responses, to and from the other node.

A latency shortening method comprising:
providing a first directory that manages the cache state of an own node;
providing a second directory that manages the cache state of another node;
providing a first directory control unit that searches the first directory;
providing a second directory control unit that searches the second directory;
when a first request specifying an address of a memory shared by a first processor belonging to the own node and a second processor belonging to the other node is issued within the own node, processing the first request by having the first directory control unit issue a second request to the memory and search the first directory; and
when a third request specifying an address of the memory is issued from the other node, processing the third request by having the second directory control unit issue a fourth request to the memory, search the second directory, and, when another request or a snoop is received, perform predetermined control to guarantee coherency by checking whether the addresses conflict.

The latency shortening method according to claim 9, wherein
providing the first directory includes providing a first directory that manages the cache state of the second directory control unit as one of the cache states of the own node.

The latency shortening method according to claim 10, wherein
processing the first request includes searching the first directory and, if the cache of the second directory control unit is valid, issuing a first snoop to the second directory control unit, and
processing the third request includes, if the second directory control unit receives the first snoop after issuing the fourth request and the address of the fourth request conflicts with the address of the first snoop, starting and completing the processing of the first snoop before receiving a reply to the fourth request.

The latency shortening method according to claim 11, wherein
processing the third request includes, when starting the processing of the first snoop, issuing a cancel notification for canceling the fourth request if the search of the second directory for the third request found the cache valid and a second snoop has been issued to the other node.

The latency shortening method according to claim 12, wherein
processing the third request includes, if the second snoop has been issued, waiting to receive a response to the second snoop before starting the processing of the first snoop, and issuing the cancel notification only when the response to the second snoop indicates that the cache data has actually been transferred.

The latency shortening method according to claim 13, wherein
processing the third request includes rewriting the issuer from the other node to the own node and issuing the fourth request.

The latency shortening method according to claim 14, wherein
processing the third request includes, when another request with a conflicting address is received, instructing the issuer of the other request to resend it without processing the other request.
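
By way of illustration only, the conflict handling recited above for the second directory control unit may be sketched in software as follows. This is a minimal sketch and not the disclosed circuit. It follows the naming of the system claims (third request, first snoop, second snoop). The names InFlight, on_other_request, and on_first_snoop, the action strings, and the use of exact address equality as the conflict test are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InFlight:
    """Hypothetical record of the request from the other node now being processed."""
    address: int
    third_request_issued: bool = False   # request already sent to the first directory control unit
    second_snoop_issued: bool = False    # snoop already sent to the other node
    # None means the response to the second snoop has not arrived yet.
    data_transferred: Optional[bool] = None


def on_other_request(in_flight: Optional[InFlight], address: int) -> str:
    """Another request arrives while one is in flight.

    A request whose address conflicts is not processed; its issuer is told to resend.
    Conflict is modeled as exact address equality (an assumption).
    """
    if in_flight is not None and in_flight.address == address:
        return "RETRY"
    return "ACCEPT"


def on_first_snoop(in_flight: Optional[InFlight], address: int) -> List[str]:
    """A first snoop arrives from the first directory control unit."""
    conflict = (
        in_flight is not None
        and in_flight.third_request_issued
        and in_flight.address == address
    )
    if not conflict:
        return ["PROCESS_FIRST_SNOOP"]

    actions: List[str] = []
    if in_flight.second_snoop_issued:
        if in_flight.data_transferred is None:
            # Hold the first snoop until the second snoop's response is known.
            return ["WAIT_FOR_SECOND_SNOOP_RESPONSE"]
        if in_flight.data_transferred:
            # Cancel only when cache data was actually transferred.
            actions.append("CANCEL_THIRD_REQUEST")
    # The snoop is handled without waiting for the reply to the third request.
    actions.append("PROCESS_FIRST_SNOOP")
    return actions
```

In this sketch, a conflicting first snoop never waits for the reply to the third request; it is delayed only while the response to an outstanding second snoop is unknown.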