JP5571327B2

JP5571327B2 - Latency reduction system, latency reduction method, and latency reduction program

Info

Publication number: JP5571327B2
Application number: JP2009145022A
Authority: JP
Inventors: 宏次小林
Original assignee: NEC Computertechno Ltd
Current assignee: NEC Computertechno Ltd
Priority date: 2009-06-18
Filing date: 2009-06-18
Publication date: 2014-08-13
Anticipated expiration: 2029-06-18
Also published as: JP2011002986A

Description

The present invention relates to a latency reduction system, a latency reduction method, and a latency reduction program, and more particularly to a latency reduction system, method, and program capable of reducing traffic.

Two cache control methods are generally known. In the first, an Invalidate request invalidates the caches of the other processors so that only the requester's own cache holds valid data. In the second, a Shared request shares the data with the caches of other processors so that several caches hold valid data.

Because an Invalidate request invalidates the caches of every processor other than the requester so that only the requester's cache holds valid data, an Invalidate or Shared request for the same address that arrives from another processor while the Invalidate request is being processed must not be processed together with it. Processing them together would leave different data marked valid in several processors, which is a coherency violation.

To avoid this, while an Invalidate request is being processed, any Invalidate or Shared request for the same address sent from another processor receives a retry instruction that prompts its issuer to reissue the request. The requests are thereby processed one at a time, in order.

Invalidate request processing is described in more detail below with reference to FIG. 5. (Patent Document 1 also discloses a system in which many multiprocessor nodes that include shared memory are interconnected via a crossbar switch.)

The system shown in FIG. 5 consists of multiple multiprocessor nodes (hereinafter, nodes) 100, 200, and 300 that include shared memory. In the following description, shared memory is simply called memory.

Nodes 100, 200, and 300 are connected via global switch/directory control units 150, 250, and 350. Each node consists of processors (101, 102, 201, 202, 301, 302), a local switch (110, 210, 310), a memory/directory control unit (120, 220, 320), memory (140, 240, 340), directories (130, 160, 230, 260, 330, 360), and a global switch/directory control unit (150, 250, 350).

When processors 201, 202, and 301 of nodes 200 and 300 issue requests for the same address in memory 140 of node 100, the requests are sent to global switch/directory control unit 150 via global switch/directory control units 250 and 350. Of the three requests, unit 150 accepts and processes the first one it receives, from CPU 201. The remaining requests, from CPUs 202 and 301, keep being retried between unit 150 and units 250 and 350 until processing of the first request completes.

A Shared request, by contrast, allows the caches of multiple processors to hold valid data. If a Shared request for the same address arrives from another processor while a Shared request is being processed, processing the two together does not violate coherency, so they can be processed simultaneously. An Invalidate request from another processor, however, would cause a coherency violation, so in that case the request must again be reissued.
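The compatibility rule in the two preceding paragraphs can be summarized as a small table. The following is a minimal Python sketch, assuming only two request types; the names `Req` and `can_process_together` are hypothetical and do not come from the patent.

```python
from enum import Enum


class Req(Enum):
    SHARED = "Shared"
    INVALIDATE = "Invalidate"


def can_process_together(in_flight: Req, incoming: Req) -> bool:
    """Return True if a same-address request may be serviced together with
    the one already in progress.

    Only Shared + Shared is safe: any pairing that involves an Invalidate
    could leave different data marked valid in several caches, so the
    newcomer has to be retried instead.
    """
    return in_flight is Req.SHARED and incoming is Req.SHARED


# Print the full compatibility matrix.
for a in Req:
    for b in Req:
        verdict = "process together" if can_process_together(a, b) else "retry"
        print(f"{a.value} in progress, {b.value} arrives: {verdict}")
```

Three of the four combinations end in a retry, which is why the simplified control described next retries everything.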

Thus, while a Shared request is in progress, a Shared request for the same address from another processor could in principle be processed concurrently. In practice, however, multiple processors issue Invalidate and Shared requests one after another at irregular timing, which makes such control very complex.

Therefore, to simplify control, the common approach has been not to distinguish between Invalidate and Shared requests: while any request is in progress, every other request for the same address receives a retry instruction and its issuer reissues it, so requests are processed one at a time, in order.
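The conventional policy above can be modeled as a busy-address set at the home node. This is a hypothetical Python sketch (class and method names are illustrative, not from the patent) showing that a second Shared request is retried even though it would be coherency-safe to merge it.

```python
class ConventionalHome:
    """Hypothetical model of the conventional home-node policy: while any
    request for an address is in progress, every other request for that
    address is told to retry, regardless of its type."""

    def __init__(self) -> None:
        self.busy = set()             # addresses with a request in progress

    def receive(self, addr: int, req_type: str) -> str:
        # req_type is deliberately ignored: the simplified control does not
        # distinguish Shared from Invalidate.
        if addr in self.busy:
            return "retry"            # issuer must reissue later
        self.busy.add(addr)
        return "accepted"

    def complete(self, addr: int) -> None:
        self.busy.discard(addr)       # next request for addr can be accepted


home = ConventionalHome()
print(home.receive(0x40, "Shared"))   # accepted
print(home.receive(0x40, "Shared"))   # retry, even though both are Shared
home.complete(0x40)
print(home.receive(0x40, "Shared"))   # accepted
```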

Patent Document 1: JP 2003-150573 A

In recent years, however, multi-core designs have increased the number of issuers, which raises the frequency of address conflicts between requests, and the request processing method described above has come to cause significant performance degradation.

An object of the present invention is to provide a latency reduction system, a latency reduction method, and a latency reduction program that solve the problem described above.

The latency reduction system of the present invention includes a plurality of nodes, each of which comprises: a shared memory; a storage means; a registration means that, when a plurality of issuers simultaneously issue requests to the same address of the shared memory, registers in the storage means control information identifying the issuers of those requests; and a transmission means that, when returning data to the issuer of the first-issued request among the simultaneously issued requests, determines the issuers to which the data is sent based on the control information.

The latency reduction method of the present invention is used in a latency reduction system that includes a plurality of nodes, each having a shared memory and a storage means. The method comprises: a registration step of, when a plurality of issuers simultaneously issue requests to the same address of the shared memory, registering in the storage means control information identifying the issuers of those requests; and a first transmission step of, when returning data to the issuer of the first-issued request among the simultaneously issued requests, determining the issuers to which the data is sent based on the control information.

The latency reduction program of the present invention causes a first computer to execute a determination process that determines the type of a request. If the determination process identifies the request as a Shared request, meaning that valid data is to be held in common with the cache of another node, a registration process registers in the storage means control information identifying that other node as an issuer of the request. If the determination process identifies the request as something other than a Shared request, the registration process does not register such control information, and the remaining requests are retried with their respective issuers until processing of the first-issued request completes.

The node of the present invention comprises: a shared memory; a storage means; a registration means that, when a plurality of issuers simultaneously issue requests to the same address of the shared memory, registers in the storage means control information identifying the issuers of those requests; and a transmission means that, when returning data to the issuer of the first-issued request among the simultaneously issued requests, determines the issuers to which the data is sent based on the control information.

By processing Shared requests from multiple processors simultaneously, the present invention reduces latency and the number of retries, and thereby reduces traffic.

FIG. 1 is an overall view of a latency reduction system according to an embodiment of the present invention.
FIG. 2 is a block diagram of a global switch/directory control unit.
FIG. 3 is a flowchart showing the operation when the global switch/directory control unit of the node in which the memory for the target address resides receives a request.
FIG. 4 is a flowchart showing the operation when the global switch/directory control unit of a node in which the memory for the target address does not reside receives a snoop.
FIG. 5 is an overall view of a latency reduction system in the related art.

Next, a first embodiment of the present invention is described with reference to the drawings.

FIG. 1 shows an overall view of latency reduction system 10 according to the first embodiment of the present invention.

Latency reduction system 10 of this embodiment differs from the system of FIG. 5 described above only in the configuration of global switch/directory control units 150, 250, and 350.

Like the system of FIG. 5, latency reduction system 10 consists of multiple multiprocessor nodes (hereinafter, nodes) 100, 200, and 300 that include shared memory. In the following description, shared memory is simply called memory.

Nodes 100, 200, and 300 are interconnected via global switch/directory control units 150, 250, and 350, respectively.

Node 100 consists of processors 101 and 102, local switch 110, memory/directory control unit 120, memory 140, directories 130 and 160, and global switch/directory control unit 150. Node 200 consists of processors 201 and 202, local switch 210, memory/directory control unit 220, memory 240, directories 230 and 260, and global switch/directory control unit 250. Node 300 consists of processors 301 and 302, local switch 310, memory/directory control unit 320, memory 340, directories 330 and 360, and global switch/directory control unit 350.

Before each component of latency reduction system 10 is described, the basic concept of the present invention is explained by comparison with the commonly used technique.

First, request processing in a system that uses the commonly used technique is described in detail.

When processors 201, 202, and 301 issue Shared requests for the same address, the requests are sent from local switches 210 and 310, via global switch/directory control units 250 and 350, to global switch/directory control unit 150 of node 100, where the memory resides. Assume that the requests reach unit 150 in the order processor 201, processor 202, processor 301.

On receiving the request from processor 201, unit 150 issues the request to memory 140 and looks up directory 160. If any processor's cache holds the data in the M (Modified) or E (Exclusive) state of the MESI protocol, unit 150 issues a snoop to it. (If the cache state is S (Shared) or I (Invalid), a snoop would not make the cache transfer data, so no snoop is issued and the data is read from memory.) In this case no cache holds the line in such a state, so no snoop is issued.
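The directory decision just described can be sketched as follows, assuming the directory entry for a line is represented as a map from processor number to MESI state letter. The function name and return shape are hypothetical illustrations, not the patent's interface.

```python
def plan_request(directory):
    """Decide how the home node services a request for one cache line.

    directory maps a processor id to its MESI state for the line:
    'M', 'E', 'S' or 'I'. Memory is always read; a snoop is issued only
    to holders in M or E, because S or I caches would not supply data.
    """
    snoop_targets = sorted(p for p, state in directory.items() if state in ("M", "E"))
    return {"read_memory": True, "snoop_targets": snoop_targets}


print(plan_request({301: "M"}))            # snoop processor 301
print(plan_request({202: "S", 302: "I"}))  # no snoop; memory data is used
```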

Until processing of the request from processor 201 completes, unit 150 does not accept the subsequent requests from processors 202 and 301 and keeps retrying them. When processing of the request from processor 201 finishes and unit 150 has returned the reply data to processor 201, unit 150 accepts and processes whichever of the subsequent requests from processors 202 and 301 arrives first; when that completes, it accepts and processes the last request.

As this shows, the system reads exactly the same data from memory three times.

In latency reduction system 10 of this embodiment, by contrast, requests are processed as follows.

When processors 201, 202, and 301 issue Shared requests for the same address, the requests are sent from local switches 210 and 310, via global switch/directory control units 250 and 350, to global switch/directory control unit 150 of node 100, where the memory resides. Assume again that the requests reach unit 150 in the order processor 201, processor 202, processor 301.

If unit 150 accepts the subsequent requests from processors 202 and 301 before processing of the request from processor 201 completes, it does not retry them. Instead, it records the issuer information (processors 201, 202, and 301), and when it receives the reply data for the request from memory, it broadcasts that reply data to every issuer processor recorded in the issuer information.

Whereas the system of FIG. 5 described above reads memory three times for three requests, this embodiment reads it only once, which reduces latency.
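The read-count comparison above can be reproduced with a small Python model. It is a simplified sketch under an explicit assumption: every request in the burst arrives while the first accepted request is still in progress, so all later Shared issuers fall inside the registration window. The function name and tuple format are hypothetical.

```python
def service_burst(requests, merge_shared):
    """Count memory reads for a burst of same-address requests that all
    arrive while the first accepted one is still in progress.

    requests     : list of (issuer, req_type) tuples in arrival order
    merge_shared : False -> conventional retry-and-serialize policy
                   True  -> later Shared issuers are registered on the first
                            Shared request and receive its broadcast reply
    Returns (memory_reads, reply_groups); each group lists the issuers that
    received one reply.
    """
    pending = list(requests)
    reads, groups = 0, []
    while pending:
        issuer, req_type = pending.pop(0)
        reads += 1                       # one memory access per accepted request
        group = [issuer]
        if merge_shared and req_type == "Shared":
            # Shared requests that arrived meanwhile ride on this read.
            group += [i for i, t in pending if t == "Shared"]
            pending = [(i, t) for i, t in pending if t != "Shared"]
        groups.append(group)
    return reads, groups


burst = [(201, "Shared"), (202, "Shared"), (301, "Shared")]
print(service_burst(burst, merge_shared=False))  # (3, [[201], [202], [301]])
print(service_burst(burst, merge_shared=True))   # (1, [[201, 202, 301]])
```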

In this embodiment, request processing when a snoop is issued proceeds as follows.

Suppose the cache of processor 301 holds the data in the E or M (Exclusive or Modified) state, and processors 201 and 202 issue Shared requests for data at the same address. The requests are sent from local switch 210, via global switch/directory control unit 250, to global switch/directory control unit 150 of node 100, where the memory resides. Assume that the requests reach unit 150 in the order processor 201, processor 202.

On receiving the request from processor 201, unit 150 issues the request to memory 140, looks up directory 160, and issues a snoop.

If unit 150 accepts the subsequent request from processor 202 between receiving the request from processor 201 and reading that request out of the message storage buffer, it registers processor 202 in the issuer information (processors 201 and 202), and the issuer information is carried along with the snoop to CPU 301. When the cache of CPU 301 then transfers the data, global switch/directory control unit 350 broadcasts the data to processors 201 and 202, which are registered in the issuer information.
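The snoop path can be sketched as two steps: the home node attaches the registered issuers to the snoop, and the remote node holding the line forwards the cached data to each of them. This is a hypothetical Python model; the `Snoop` message layout and the function names are assumptions for illustration, and requests registered after the snoop leaves the home node are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Snoop:
    """Hypothetical snoop message; `issuers` models the issuer information
    carried along with the snoop."""
    addr: int
    issuers: list = field(default_factory=list)


def home_build_snoop(addr, first_issuer, joined_before_readout):
    # Issuers whose Shared requests were accepted before the entry was read
    # out of the message storage buffer travel with the snoop.
    return Snoop(addr, [first_issuer, *joined_before_readout])


def remote_forward(snoop, owned_lines):
    """At the remote node holding the line in E/M: send the cached data to
    every issuer listed in the snoop (one (destination, data) per issuer)."""
    data = owned_lines[snoop.addr]
    return [(dest, data) for dest in snoop.issuers]


snoop = home_build_snoop(0x80, 201, [202])
print(remote_forward(snoop, {0x80: b"line"}))  # [(201, b'line'), (202, b'line')]
```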

Thus, whereas in the system of FIG. 5 the request from processor 202 is not processed until processing for processor 201 completes, in this embodiment both are processed simultaneously, which reduces latency.

Next, each component of latency reduction system 10 of this embodiment is described in detail.

Because nodes 100, 200, and 300 all have the same configuration, each part is described for node 100 only.

Local switch 110 is connected to processors 101 and 102, memory/directory control unit 120, and global switch/directory control unit 150. It routes issued requests, snoops, responses to snoops (hereinafter, responses), and responses to requests (hereinafter, replies) according to their destinations.

Memory/directory control unit 120 receives every request that accesses memory 140, accesses memory 140, and looks up directory 130; if a valid cached copy exists, it snoops the corresponding processor. Once both the response and the data from memory are available, unit 120 returns them as a reply to the request issuer.

Directories 130 and 160 manage the cache states of the processors. Directory 130, which is connected to memory/directory control unit 120, tracks processors 101 and 102 in node 100 together with global switch/directory control unit 150, which represents the processors of other nodes. Directory 160, which is connected to global switch/directory control unit 150, tracks processors 201, 202, 301, and 302 of the other nodes 200 and 300.

Global switch/directory control unit 150 is connected to global switch/directory control units 250 and 350 of the other nodes 200 and 300 and routes requests, snoops, responses, and replies according to their destinations. In addition, when it receives a request from node 200 or 300 or a snoop from local switch 110, unit 150 looks up directory 160, and if a cache holds valid data, it snoops that processor.

Next, the control by which global switch/directory control units 150, 250, and 350 register and process Shared requests that have address conflicts is described in detail.

As shown in FIG. 2, each of global switch/directory control units 150, 250, and 350 consists of area 1, which operates when the memory for the target address resides in its node, and area 2, which operates when it does not.

Area 1 of global switch/directory control unit 150 comprises arbitration circuit 150-1, address match circuit 150-2, registration determination circuit 150-3, control register 150-4, message storage buffer 150-5, acceptability determination register 150-6, issuer registration registers 150-7 and 150-8, retry buffer 150-9, message issue/generation circuit 150-10, arbitration circuit 150-11, and message broadcast circuit 150-12.

Area 2 of global switch/directory control unit 150 comprises arbitration circuit 350-1, control register 350-2, message storage buffer 350-3, issuer information registration register 350-4, message issue/generation circuit 350-5, and message broadcast circuit 350-5.

First, the operation of area 1, which operates when the memory for the target address resides in the node, is described.

Arbitration circuit 150-1 arbitrates between messages from local switch 110 and messages from the other nodes 200 and 300, which arrive in no fixed order, and decides which message to process.

Address match circuit 150-2 compares the address of the message selected by arbitration circuit 150-1 with the addresses of the valid requests and snoops registered in message storage buffer 150-5. On a match, it reads the acceptability determination register 150-6 value of the matching entry and notifies registration determination circuit 150-3 of the match result and that value.
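A software analogue of the address match step is a scan over the valid buffer entries. The sketch below assumes each message storage buffer slot is represented as a Python dict with hypothetical keys `valid`, `addr`, and `accept_info`; real hardware would compare all entries in parallel rather than loop.

```python
def address_match(addr, entries):
    """Hypothetical model of address match circuit 150-2.

    entries: list of dicts with keys 'valid', 'addr', 'accept_info'
    (one per message storage buffer slot). Returns (matched, accept_info)
    for the first valid entry with the same address, else (False, None).
    """
    for entry in entries:
        # Invalid slots are ignored even if they still hold an old address.
        if entry["valid"] and entry["addr"] == addr:
            return True, entry["accept_info"]
    return False, None


buffer = [
    {"valid": False, "addr": 0x40, "accept_info": "stale"},
    {"valid": True, "addr": 0x40, "accept_info": "live"},
]
print(address_match(0x40, buffer))  # (True, 'live')
print(address_match(0x80, buffer))  # (False, None)
```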

For requests, registration determination circuit 150-3 decides whether registration is possible based on the address match result and the acceptability determination information notified by address match circuit 150-2. The acceptability determination information is stored in acceptability determination register 150-6 and is described in detail below.

If there is no address match, registration determination circuit 150-3 registers the request in message storage buffer 150-5 and registers the issuer information in issuer registration registers 150-7 and 150-8. If there is an address match, the acceptable flag in the acceptability determination information is 1 (acceptable), and the request is a Shared request, circuit 150-3 additionally registers the issuer information in issuer registration register 150-7 or issuer registration register 150-8.

Whether the issuer information is registered in register 150-7 or register 150-8 is determined by the issuer registration register indicator flag in the acceptability determination information. In all other cases, the request is stored in retry buffer 150-9 and retried.
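The decision made by registration determination circuit 150-3 reduces to three outcomes. The following Python sketch is a hypothetical restatement of the two preceding paragraphs; the function name, parameter names, and return strings are illustrative only.

```python
def registration_decision(addr_match, acceptable, is_shared, indicator_flag):
    """Hypothetical sketch of registration determination circuit 150-3.

    Returns where the incoming request (or its issuer) goes:
      'new entry'       -> request into message storage buffer 150-5
      '150-7' / '150-8' -> issuer added to that issuer registration register
      'retry'           -> request into retry buffer 150-9, issuer retried
    """
    if not addr_match:
        return "new entry"
    if acceptable and is_shared:
        # Indicator flag 1 (a snoop was issued) selects register 150-8.
        return "150-8" if indicator_flag == 1 else "150-7"
    return "retry"


print(registration_decision(True, True, True, 0))   # 150-7
print(registration_decision(True, True, False, 0))  # retry
```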

Control register 150-4 holds one set of control flags per entry, namely the flags needed to process each request or snoop stored in message storage buffer 150-5. Specifically, control information such as an entry valid flag, an issue request flag, a response received flag, a reply data received flag, a reply issued flag, a writeback received flag, and a data transfer flag is stored as these control flags.

All of these flags are cleared when the entry's Valid flag becomes 0.

Message storage buffer 150-5 stores requests, snoops, response data, and reply data.

Acceptability determination register 150-6 stores the acceptability determination information, which consists of four flags: the acceptable flag, the Shared request flag, the registration period flag, and the issuer registration register indicator flag.

The acceptable flag indicates whether a subsequent Shared request can be accepted. It is 1 when both the Shared request flag and the registration period flag are 1.

The Shared request flag indicates whether the request is a Shared request. It is set to 1 if the type check performed when the request is registered identifies it as a Shared request.

The registration period flag indicates the period during which subsequent requests can be accepted. It is 1 from receipt of the request until the first receipt of any of the following: a response carrying transfer data, writeback data, or reply data.

The issuer registration register indicator flag specifies whether issuer information is registered in issuer registration register 150-7 or issuer registration register 150-8. It is set to 1 when a snoop is issued for the request, which directs registration into register 150-8; when it is 0, registration goes into register 150-7.
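The flag lifecycle described above can be sketched as a small state object. This is a hypothetical Python model of one entry of acceptability determination register 150-6; the event-method names are assumptions chosen to mirror the events named in the text (request registered, snoop issued, data received).

```python
from dataclasses import dataclass


@dataclass
class AcceptInfo:
    """Hypothetical model of one entry of acceptability determination register 150-6."""
    shared_request: bool = False       # request type check found a Shared request
    registration_period: bool = False  # later requests may still join
    register_indicator: int = 0        # 0 -> register 150-7, 1 -> register 150-8

    @property
    def acceptable(self) -> bool:
        # Acceptable flag = Shared request flag AND registration period flag.
        return self.shared_request and self.registration_period

    def on_request_registered(self, is_shared: bool) -> None:
        self.shared_request = is_shared
        self.registration_period = True

    def on_snoop_issued(self) -> None:
        self.register_indicator = 1

    def on_data_received(self) -> None:
        # Transfer-data response, writeback data, or reply data closes the window.
        self.registration_period = False


info = AcceptInfo()
info.on_request_registered(is_shared=True)
print(info.acceptable, info.register_indicator)  # True 0
info.on_snoop_issued()
print(info.acceptable, info.register_indicator)  # True 1
info.on_data_received()
print(info.acceptable)                           # False
```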

Issuer registration registers 150-7 and 150-8 record the issuers of a request. Each has multiple 1-bit storage cells in one-to-one correspondence with the processors in the system; when a request is received, the cell corresponding to the processor indicated by the issuer information in the message is set to 1.
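The one-bit-per-processor layout amounts to a bitmask. The Python sketch below is hypothetical: it assumes bit positions are assigned in the order the processor numbers are listed, which the patent does not specify.

```python
class IssuerRegister:
    """Hypothetical model of an issuer registration register: one bit per
    processor in the system, set when that processor's request is registered."""

    def __init__(self, processor_ids):
        # Fixed one-to-one mapping from processor id to bit position.
        self._bit = {pid: pos for pos, pid in enumerate(processor_ids)}
        self.value = 0

    def register(self, issuer):
        self.value |= 1 << self._bit[issuer]

    def issuers(self):
        return [pid for pid, pos in self._bit.items() if (self.value >> pos) & 1]

    def clear(self):
        # Corresponds to clearing when the entry's Valid flag drops to 0.
        self.value = 0


reg = IssuerRegister([101, 102, 201, 202, 301, 302])
reg.register(201)
reg.register(202)
print(bin(reg.value), reg.issuers())  # 0b1100 [201, 202]
```

Registering the same processor twice leaves the mask unchanged, which matches the one-bit-per-processor description.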

The register to be written is selected by the value of the issuer registration register instruction flag in acceptance determination register 150-6; the registration method itself is the same for both registers.

Issuer registration registers 150-7 and 150-8 are both cleared entirely when the entry's Valid flag becomes 0.

Retry buffer 150-9 stores a request when registration determination circuit 150-3 decides on a retry, and at that time issues a retry instruction to the request's issuer.

Message issuing/generating circuit 150-10 selects the entry to issue according to the issue request flags in control register 150-4. It then reads the selected entry's information from control register 150-4, message storage buffer 150-5, and issuer registration registers 150-7 and 150-8, and, based on the lookup and update of directory 160 and the contents of control register 150-4, determines whether the message to issue is a request, a snoop, a response, or a reply, and generates it.

Arbitration circuit 150-11 arbitrates between normal messages bound for the other nodes 200 and 300 from the message issuing/generating circuit and retry messages from retry buffer 150-9, which arrive in no fixed order, and decides which message to process.

Message broadcast circuit 150-12 receives the message generated by message issuing/generating circuit 150-10 together with issuer information (selected, depending on conditions, from the contents of issuer registration registers 150-7 and 150-8). When the message is a reply or reply data, it refers to the issuer information and broadcasts the reply or reply data to every registered processor.
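The broadcast step amounts to walking the set bits of the issuer information and sending one copy per registered processor. This is a sketch only: the `send` callback is a hypothetical stand-in for the interconnect, and processor indices are assumed to match bit positions.

```python
from typing import Callable


def broadcast(message: str, issuer_bits: int, send: Callable[[int, str], None]) -> int:
    """Send `message` to every processor whose bit is set in `issuer_bits`.

    `send(processor_id, message)` is a hypothetical hook for the interconnect.
    Returns the number of copies sent.
    """
    sent = 0
    processor = 0
    bits = issuer_bits
    while bits:
        if bits & 1:
            send(processor, message)
            sent += 1
        bits >>= 1       # move on to the next processor's bit
        processor += 1
    return sent
```

For example, issuer bits `0b100101` deliver the message to processors 0, 2, and 5, while an empty register sends nothing.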

Next, the operation of area 2, which functions when the memory for the target address is not in the node, is described.

Arbitration circuit 350-1 arbitrates between messages from local switch 310 and messages from the other nodes 100 and 200, which arrive in no fixed order, and decides which message to process.

Control register 350-2 holds, for each entry, the control flags needed to process the snoop stored in message storage buffer 350-3. These include an entry valid flag, an issue request flag, a response reception flag, and a data transfer flag.

Message storage buffer 350-3 stores snoops and response data.

Issuer registration register 350-4 records request issuers; it holds the issuer information carried along with the snoop.

Message issuing/generating circuit 350-5 selects the entry to issue based on the issue request flags in control register 350-2. It then reads the selected entry's information from control register 350-2, message storage buffer 350-3, and issuer registration register 350-4, determines whether the message to issue is a snoop or a response, and generates it.

Message broadcast circuit 350-6 takes the message generated by message issuing/generating circuit 350-5 and the issuer information (the value of issuer registration register 350-4). When data is transferred, it refers to the issuer information and broadcasts the data to every registered processor.

The processors 101, 102, 201, 202, 301, and 302, local switches 110, 210, and 310, memory/directory control units 120, 220, and 320, memories 140, 240, and 340, and directories 130, 160, 230, 260, 330, and 360 in FIG. 1 are well known, so detailed descriptions of them are omitted.

Next, the operation of this embodiment is described with reference to the drawings.

The operation of global switch/directory control units 150, 250, and 350 in FIG. 1 is described with reference to the flowcharts of FIGS. 3A, 3B, and 4.

FIGS. 3A and 3B show the operation when a request is received by global switch/directory control unit 150 of the node where the memory for the target address resides.

On receiving a request from the other nodes 200 and 300, arbitration circuit 150-1 checks its type to determine whether it is an Invalidate request or a Shared request (S0 in FIG. 3A).

For an Invalidate request, address matching circuit 150-2 compares addresses, as in the system of FIG. 5. If there is no match, registration determination circuit 150-3 registers the request in message storage buffer 150-5 and processes it; if there is a match, it registers the request in retry buffer 150-9 and the request is retried (S26 in FIG. 3A).

For a Shared request (S1 in FIG. 3A), address matching circuit 150-2 compares addresses (S2 in FIG. 3A). If there is no match, registration determination circuit 150-3 registers the request in message storage buffer 150-5, sets the 1-bit storage unit of issuer registration register 150-7 corresponding to the issuer to 1, and sets the entry valid flag and issue request flag of control register 150-4 to 1 (S3 in FIG. 3A).
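The intake decisions above (S0 to S3, plus S26 for Invalidate) can be sketched in Python as follows. This is a simplified model under stated assumptions: in-flight entries and register 150-7 are plain dictionaries keyed by address, processor IDs double as bit positions, and the class and return strings are hypothetical. A Shared request that matches an in-flight address is only flagged here, because its handling depends on the acceptance check.

```python
from dataclasses import dataclass, field


@dataclass
class HomeNode:
    """Hypothetical model of request intake at the home node (S0-S3, S26)."""
    in_flight: dict = field(default_factory=dict)   # address -> type of request being processed
    reg_150_7: dict = field(default_factory=dict)   # address -> issuer bit vector
    retry_buffer: list = field(default_factory=list)

    def receive(self, kind: str, address: int, issuer: int) -> str:
        if kind not in ("Invalidate", "Shared"):     # S0: only these two types are modeled
            raise ValueError(f"unsupported request type: {kind}")
        if address in self.in_flight:                 # address comparison (S2 for Shared)
            if kind == "Invalidate":
                # S26: an Invalidate to a busy address is always retried.
                self.retry_buffer.append((kind, address, issuer))
                return "retry"
            # Matching Shared requests go to the acceptance check (S21-S25), not modeled here.
            return "shared-match"
        # No match: register the request; a Shared request also sets its issuer's bit (S3).
        self.in_flight[address] = kind
        if kind == "Shared":
            self.reg_150_7[address] = 1 << issuer
        return "registered"
```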

Message issuing/generating circuit 150-10 chooses an entry to issue from among the entries whose issue request flag is 1, reads the values stored for that entry in control register 150-4, message storage buffer 150-5, and issuer registration registers 150-7 and 150-8, and clears the issue request flag in control register 150-4 to 0 once the read completes.
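The select-read-clear pattern used by the message issuing/generating circuit can be sketched as below. The text does not say how the circuit chooses among several pending entries, so the lowest-numbered-entry priority here is an assumption, and `read_entry` is a hypothetical callback standing in for reading the control register, message storage buffer, and issuer registers.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def select_entry(issue_request_flags: List[bool]) -> Optional[int]:
    """Return the lowest-numbered entry whose issue request flag is 1 (assumed priority)."""
    for index, flag in enumerate(issue_request_flags):
        if flag:
            return index
    return None


def issue_one(issue_request_flags: List[bool], read_entry: Callable[[int], T]) -> Optional[T]:
    """Select an entry, read its state, then clear its issue request flag."""
    index = select_entry(issue_request_flags)
    if index is None:
        return None
    snapshot = read_entry(index)          # read the entry's registers and buffer
    issue_request_flags[index] = False    # cleared once the read has completed
    return snapshot
```

Clearing the flag only after the read mirrors the order given in the text, so an entry is not re-selected while its state is still being read.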

Message issuing/generating circuit 150-10 generates a request from the information read, issues it to memory, and looks up directory 160 (S4 in FIG. 3A). From the lookup result it determines whether a valid cache exists (S5 in FIG. 3A).

If there is no valid cache, message issuing/generating circuit 150-10 waits for reply data from memory (S6 in FIG. 3A). When the reply data arrives, it is registered in message storage buffer 150-5, and the reply data reception flag and issue request flag of control register 150-4 are set to 1.

Message issuing/generating circuit 150-10 generates reply data from the information read, passes the value of issuer registration register 150-7 to message broadcast circuit 150-12 as the issuer information, and clears the entry valid flag of control register 150-4 to 0.

Having received the reply data and issuer information from message issuing/generating circuit 150-10, message broadcast circuit 150-12 broadcasts the reply data to every processor registered in the issuer information (S7 in FIG. 3A).

If the lookup of directory 160 shows a valid cache, message issuing/generating circuit 150-10 generates a snoop and issues it to that cache, carrying the value of issuer registration register 150-7 along with the snoop (S8 in FIG. 3A).

After the snoop is issued, the circuit waits for the response from the cache (S9 in FIG. 3A). When the response arrives, the response reception flag of control register 150-4 is set to 1, and the response type is examined to determine whether the cache transferred data (S10 in FIG. 3B).

If the response indicates no data transfer, the circuit waits for reply data from memory (S11 in FIG. 3B). When the reply data arrives, it is registered in message storage buffer 150-5, and the reply data reception flag and issue request flag of control register 150-4 are set to 1.

Message issuing/generating circuit 150-10 chooses an entry to issue from among the entries whose issue request flag is 1 and reads the values stored for that entry in control register 150-4, message storage buffer 150-5, and issuer registration registers 150-7 and 150-8. Once the read completes, it clears the issue request flag of control register 150-4 to 0.

Message issuing/generating circuit 150-10 generates reply data from the information read, passes to message broadcast circuit 150-12, as the issuer information, the bitwise OR of the corresponding 1-bit storage units of issuer registration registers 150-7 and 150-8, and clears the entry valid flag of control register 150-4 to 0.
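When reply data comes from memory, the destination set depends on whether a snoop was ever issued. A minimal sketch, assuming register contents are integer bit vectors. The helper name is hypothetical. In the no-valid-cache case, register 150-8 should be empty anyway, since the instruction flag only switches to 150-8 after a snoop is issued.

```python
def reply_data_destinations(valid_cache_found: bool, reg_150_7: int, reg_150_8: int) -> int:
    """Issuer bits for reply data that comes from memory (hypothetical helper).

    - No valid cache (S6-S7): no snoop was issued, so every issuer is in 150-7.
    - Snoop issued but no data transferred (S11-S12): issuers that arrived after
      the snoop are in 150-8, so the two registers are ORed bit by bit.
    """
    if not valid_cache_found:
        return reg_150_7
    return reg_150_7 | reg_150_8
```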

Having received the reply data and issuer information from message issuing/generating circuit 150-10, message broadcast circuit 150-12 broadcasts the reply data to every processor registered in the issuer information (S12 in FIG. 3B).

If the response indicates that data was transferred (S10 in FIG. 3B), the issue request flag and data transfer flag of control register 150-4 are set to 1; if write-back data was also received, the write-back reception flag is set to 1 as well.

If reply data arrives from memory before the response to the snoop, the reply data reception flag of control register 150-4 is set and the data is registered in message storage buffer 150-5.

Message issuing/generating circuit 150-10 chooses an entry to issue from among the entries whose issue request flag is 1, reads that entry's control register 150-4, message storage buffer 150-5, and issuer registration registers 150-7 and 150-8, and clears the issue request flag to 0 once the read completes.

From the information read, message issuing/generating circuit 150-10 checks the write-back reception flag (S13 in FIG. 3B) and the reply data reception flag (S14 in FIG. 3B). If both flags are 0, it generates a reply and passes the value of issuer registration register 150-7 to message broadcast circuit 150-12 as the issuer information. It then sets the reply issued flag of control register 150-4 to 1.

Having received the reply and issuer information from message issuing/generating circuit 150-10, message broadcast circuit 150-12 broadcasts the reply to every processor registered in the issuer information (S15 in FIG. 3B).

The circuit then waits for reply data from memory (S16 in FIG. 3B) and, when it arrives, checks the reply issued flag of control register 150-4. If the reply issued flag is 1, issuer registration register 150-8 is checked (S17 in FIG. 3B); if no processor is registered in it, the entry valid flag of control register 150-4 is cleared to 0.

If a processor is registered in issuer registration register 150-8, the reply data is registered in message storage buffer 150-5, and the reply data reception flag and issue request flag of control register 150-4 are set to 1. Message issuing/generating circuit 150-10 chooses an entry to issue from among the entries whose issue request flag is 1, reads the values stored for that entry in control register 150-4, message storage buffer 150-5, and issuer registration registers 150-7 and 150-8, and clears the issue request flag of control register 150-4 to 0 once the read completes.

Message issuing/generating circuit 150-10 generates reply data from the information read, passes the value of issuer registration register 150-8 to message broadcast circuit 150-12 as the issuer information, and clears the entry valid flag of control register 150-4 to 0. Having received the reply data and issuer information, message broadcast circuit 150-12 broadcasts the reply data to every processor registered in the issuer information (S18 in FIG. 3B).
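The handling of memory reply data that arrives after the reply has already gone out (S16 to S18) reduces to a small decision. This is a sketch under assumptions: the helper and action names are hypothetical, and the "record" branch covers the earlier case, described above, in which reply data arrives before the snoop response.

```python
from typing import Tuple


def on_memory_reply_data(reply_issued: bool, reg_150_8: int) -> Tuple[str, int]:
    """Action taken when memory reply data arrives (hypothetical helper).

    Returns (action, destination issuer bits).
    """
    if not reply_issued:
        # Reply not sent yet: keep the data in the buffer and set the reply data reception flag.
        return ("record", 0)
    if reg_150_8 == 0:
        # Nobody registered after the snoop (S17): just clear the entry valid flag.
        return ("free_entry", 0)
    # Late registrants exist: broadcast reply data to them, then clear the entry (S18).
    return ("broadcast_reply_data", reg_150_8)
```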

If the write-back reception flag is 0 and the reply data reception flag is 1, message issuing/generating circuit 150-10 generates both a reply and reply data. It passes to message broadcast circuit 150-12 the value of issuer registration register 150-7 for the reply and the value of issuer registration register 150-8 for the reply data, and clears the entry valid flag of control register 150-4 to 0.

However, if nothing is registered in issuer registration register 150-8, no reply data is generated.

Having received the reply, the reply data, and the corresponding issuer information from message issuing/generating circuit 150-10, message broadcast circuit 150-12 broadcasts the reply and the reply data to every processor registered in the respective issuer information (S19 in FIG. 3B).

If the write-back reception flag is 1 (S13 in FIG. 3B), the circuit generates both a reply and reply data built from the write-back data, and passes to message broadcast circuit 150-12 the value of issuer registration register 150-7 for the reply and the value of issuer registration register 150-8 for the reply data. Here too, no reply data is generated if nothing is registered in issuer registration register 150-8.

Having received the reply, the reply data, and the corresponding issuer information from message issuing/generating circuit 150-10, message broadcast circuit 150-12 broadcasts the reply and the reply data to every processor registered in the respective issuer information.
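The data-transfer branch (S13 to S19) can be summarized as a decision table. The snoop carries register 150-7, and the snooped node broadcasts its data to those processors. So processors in 150-7 appear to need only a reply from the home node. Processors in 150-8 registered after the snoop and need reply data. The helper name and message labels below are hypothetical, and destinations are integer bit vectors.

```python
from typing import List, Tuple


def messages_after_data_transfer(writeback: bool, reply_data_received: bool,
                                 reg_150_7: int, reg_150_8: int) -> List[Tuple[str, int]]:
    """Messages the home node broadcasts after the snooped cache forwarded data (hypothetical)."""
    msgs = [("reply", reg_150_7)]          # early issuers already have the data from the cache
    if reg_150_8 == 0:
        return msgs                        # no late registrants: reply data is not generated
    if writeback:
        msgs.append(("reply_data(write-back data)", reg_150_8))
    elif reply_data_received:
        msgs.append(("reply_data(memory data)", reg_150_8))
    # Both flags 0: reply data for 150-8 is sent later, when memory replies (S16-S18).
    return msgs
```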

A write-back request to memory is issued after the reply data for the Shared request has been received, and the entry valid flag of control register 150-4 is cleared to 0 when the reply to the write-back is received (S20 in FIG. 3B).

For a Shared request, if the address comparison by address matching circuit 150-2 (S2 in FIG. 3A) finds a match, the value of acceptance determination register 150-6 for the matching entry is read. The acceptance flag in that register determines whether the request can be accepted (S21 in FIG. 3A); if it cannot, the message is registered in retry buffer 150-9 and retried (S22 in FIG. 3A).

If the request can be accepted, the issuer registration register instruction flag is checked (S23 in FIG. 3A). If the flag is 0, the bit corresponding to the issuer in issuer registration register 150-7 is set to 1 (S24 in FIG. 3A); if the flag is 1, the bit corresponding to the issuer in issuer registration register 150-8 is set to 1 (S25 in FIG. 3A).
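Steps S21 to S25 for a Shared request that hits an in-flight address can be sketched in Python as below. The model assumes the acceptance flag and the instruction flag are plain booleans on the matched entry, processor IDs double as bit positions, and the class, function, and return strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MatchedEntry:
    """Hypothetical view of the matched entry's acceptance determination register 150-6."""
    acceptable: bool       # acceptance flag
    snoop_issued: bool     # issuer registration register instruction flag
    reg_150_7: int = 0
    reg_150_8: int = 0


def on_shared_match(entry: MatchedEntry, issuer: int, retry_buffer: list) -> str:
    if not entry.acceptable:                # S21: cannot be accepted
        retry_buffer.append(issuer)         # S22: register in the retry buffer and retry
        return "retry"
    if entry.snoop_issued:                  # S23: instruction flag is 1
        entry.reg_150_8 |= 1 << issuer      # S25: issuer arrived after the snoop
    else:
        entry.reg_150_7 |= 1 << issuer      # S24: issuer will travel with the snoop
    return "merged"
```

The merged request never reaches memory or the cache on its own; it is served by the reply or reply data already owed to the first request.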

FIG. 4 shows the operation when a snoop is received by global switch/directory control unit 250 of a node where the memory for the target address does not reside.

When a snoop is received, arbitration circuit 350-1 registers the snoop message in message storage buffer 350-3 and the issuer information carried with it in issuer registration register 350-4, and sets the entry valid flag and issue request flag of control register 350-2 to 1 (S1 in FIG. 4).

Message issuing/generating circuit 350-5 refers to control register 350-2 and chooses an entry to issue from among the entries whose issue request flag is 1. It reads that entry's contents in control register 350-2, message storage buffer 350-3, and issuer registration register 350-4, and clears the issue request flag of control register 350-2 to 0 once the read completes.

Message issuing/generating circuit 350-5 generates a new snoop from the information read and issues it, via local switch 110, to the cache holding valid data (S2 in FIG. 4).

After the snoop is issued, the circuit waits for the response from the cache (S3 in FIG. 4). When the response arrives, arbitration circuit 350-1 sets the response reception flag of control register 350-2 to 1, sets the data transfer flag to 1 if data was transferred, and sets the issue request flag to 1.

Message issuing/generating circuit 350-5 chooses an entry to issue from among the entries whose issue request flag is 1, reads that entry's contents in control register 350-2, message storage buffer 350-3, and issuer registration register 350-4, and clears the issue request flag to 0 once the read completes.

From the values read, message issuing/generating circuit 350-5 checks the data transfer flag of control register 350-2 (S4 in FIG. 4). If there was no data transfer, it generates a response, returns it to the snoop issuer, and clears the entry valid flag of control register 350-2 to 0 (S5 in FIG. 4).

If the data transfer flag is 1, message issuing/generating circuit 350-5 generates both a response and data. It sends the response to the snoop issuer, passes the data together with the value of issuer registration register 350-4 to broadcast circuit 350-6, and clears the entry valid flag of control register 350-2 to 0.

Having received the response, the data, and the corresponding issuer information from message issuing/generating circuit 350-5, message broadcast circuit 350-6 broadcasts the data to every processor registered in the issuer information (S6 in FIG. 4).
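A condensed Python sketch of the snooped node's decision (S4 to S6) follows. It assumes the carried issuer information is an integer bit vector over six processors numbered 0 to 5. `None` stands for "no data transferred", and the function name and response strings are hypothetical.

```python
from typing import List, Optional, Tuple


def handle_snoop(carried_issuers: int, data_from_cache: Optional[bytes],
                 num_processors: int = 6) -> Tuple[str, List[Tuple[int, bytes]]]:
    """Return (response to the snoop issuer, list of (processor, data) deliveries)."""
    if data_from_cache is None:                       # S4: no data transfer
        return ("response(no data)", [])              # S5: response only
    # S6: the response still goes to the snoop issuer, and the data is broadcast
    # directly to every processor named in the issuer information carried by the snoop.
    destinations = [p for p in range(num_processors) if carried_issuers >> p & 1]
    return ("response(data transferred)", [(p, data_from_cache) for p in destinations])
```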

As described above, in this embodiment the issuers of later requests to the same address are registered instead of being told to retry, so a single reply or reply data can serve multiple requests together. This shortens latency.

100, 200, 300 Node
101, 102, 201, 202, 301, 302 Processor
110, 210, 310 Local switch
120, 220, 320 Memory/directory control unit
130, 230, 330 Directory
140, 240, 340 Memory
150, 250, 350 Global switch/directory control unit
160, 260, 360 Directory
150-1 Arbitration circuit
150-2 Address matching circuit
150-3 Registration determination circuit
150-4 Control register
150-5 Message storage buffer
150-6 Acceptance determination register
150-7 Issuer registration register
150-8 Issuer registration register
150-9 Retry buffer
150-10 Message issuing/generating circuit
150-11 Arbitration circuit
150-12 Message broadcast circuit
350-1 Arbitration circuit
350-2 Control register
350-3 Message storage buffer
350-4 Issuer registration register
350-5 Message issuing/generating circuit
350-6 Message broadcast circuit

Claims

1. A latency reduction system including a plurality of nodes, wherein each node comprises:
a plurality of processors;
a shared memory;
creation means for, while a first request to process a specific storage area in the shared memory is being processed, accepting without a retry a second request, different from the first request, to process the same specific storage area, and creating control information identifying a first processor, among the plurality of processors, that issued the second request; and
transmission means for, in response to transmitting reply data for the first request, transmitting the reply data to the first processor as a reply to the second request, based on the control information.

2. The latency reduction system according to claim 1, wherein the node comprises a cache, and
the transmission means determines the first processor based on the control information when data is received from the cache or the shared memory in response to the first request.

3. The latency reduction system according to claim 2, wherein the node includes determination means for determining the type of a request;
when the determination means determines that the first request and the second request are Shared requests, which share valid data with the cache of another node, the creation means creates control information identifying the other node as a request issuer; and
when the determination means determines that the first request or the second request is other than a Shared request, the creation means does not create control information identifying the other node as a request issuer.

4. The latency reduction system according to claim 2, wherein each of the nodes further comprises:
a directory that manages the shared memory of the own node and of the other nodes;
a directory controller that searches the directory to determine whether there is a valid cache; and
a message issuing/generating circuit that, if another node has a valid cache, generates a snoop for that other node together with control information identifying the own node, and issues the snoop.

5. The latency reduction system, wherein the other node that has received the snoop reads data from the valid cache and transmits the data to the own node identified by the control information received together with the snoop.

6. The latency reduction system according to claim 5, wherein, when the other node reads the data from the valid cache, the other node returns a response to the own node, which is the issuer of the snoop.

7. A latency reduction method comprising:
a creation step of, while a first request to process a specific storage area in a shared memory is being processed, accepting without a retry a second request, different from the first request, to process the same specific storage area, and creating control information identifying a first processor, among a plurality of processors, that issued the second request; and
a transmission step of, in response to transmitting reply data for the first request, transmitting the reply data to the first processor as a reply to the second request, based on the control information.

8. The latency reduction method according to claim 7, wherein, in the transmission step, the first processor is determined based on the control information when data is received from a cache or the shared memory in response to the first request.

9. The latency reduction method according to claim 7 or 8, further comprising a determination step of determining the type of a request, wherein:
when it is determined in the determination step that the first request and the second request are Shared requests, which share valid data with the caches of other nodes so that those caches also hold valid data, control information identifying the other node as a request issuer is created in the creation step; and
when it is determined in the determination step that the first request or the second request is other than a Shared request, control information identifying the other node as a request issuer is not created in the creation step.

10. The latency reduction method according to claim 8 or 9, further comprising:
a determination step of determining whether there is a valid cache by searching a directory that manages the shared memory of the own node and of other nodes;
a snoop generation step of generating a snoop when another node has a valid cache; and
an issuing step of issuing the snoop, together with the control information, to the other node having the valid cache.

11. The latency reduction method according to claim 10, further comprising a second transmission step in which the other node that has received the snoop reads data from the valid cache and transmits the data to the own node identified by the control information received together with the snoop.

12. The latency reduction method according to claim 11, further comprising a return step in which, when the other node reads the data from the valid cache, the other node returns a response to the own node, which is the issuer of the snoop.

A latency shortening program causing a computer to execute:
a creation function of, while a first request to process a specific storage area in a shared memory is being processed, accepting a second request that is different from the first request and targets the same storage area, without having it retried, and creating control information that identifies, among a plurality of processors, a first processor that issued the second request; and
a transmission function of, in response to transmission of reply data for the first request, transmitting the reply data to the first processor as a reply to the second request based on the control information.
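The core idea is to accept a second request to an in-flight address instead of retrying it, record its issuer, and reuse the first request's reply data for it. A minimal Python sketch follows. It assumes the two requests have already been judged mergeable (for example, both Shared) and uses integer processor IDs. `RequestMerger` and its methods are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class RequestMerger:
    """Tracks in-flight requests per address (hypothetical helper)."""

    def __init__(self) -> None:
        # address -> issuer of the first, in-flight request
        self.in_flight: Dict[int, int] = {}
        # address -> processors recorded as control information
        self.control_info: Dict[int, List[int]] = defaultdict(list)

    def accept(self, address: int, issuer: int) -> bool:
        """Accept a request; return True only if it starts a new fetch."""
        if address not in self.in_flight:
            self.in_flight[address] = issuer
            return True
        # Same address already being processed: accept without a retry
        # and remember who else is waiting for the reply.
        self.control_info[address].append(issuer)
        return False

    def reply(self, address: int, data: bytes) -> List[Tuple[int, bytes]]:
        """Send reply data to the first issuer and every recorded one."""
        first = self.in_flight.pop(address)
        waiters = self.control_info.pop(address, [])
        return [(first, data)] + [(p, data) for p in waiters]


m = RequestMerger()
m.accept(0x40, 1)            # starts the fetch
m.accept(0x40, 2)            # merged, no retry
print(m.reply(0x40, b"d"))   # [(1, b'd'), (2, b'd')]
```

The second requester gets its data when the first fetch completes. It never waits for a retry round trip followed by a fresh fetch, which is where the latency saving comes from.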

The latency shortening program according to claim 13, wherein, in the transmission function, when data is received from a cache or the shared memory in response to the first request, the first processor is determined based on the control information.

The latency shortening program according to claim 13 or 14, further causing the computer to execute a determination process of determining the type of each request, wherein:
when it is determined in the determination process that the first request and the second request are both Shared requests, which share valid data with the caches of other nodes so that multiple caches hold valid data, the creation function creates control information identifying the other node as a request issuer; and
when it is determined in the determination process that the first request or the second request is other than a Shared request, the creation function does not create control information identifying the other node as a request issuer.

The latency shortening program according to claim 14 or 15, further causing the computer to execute:
a determination process of determining whether a valid cache exists by searching a directory that manages the shared memory of a plurality of nodes;
a snoop generation process of generating a snoop when another node of the plurality of nodes has a valid cache; and
an issuing process of issuing the snoop, together with control information identifying the local node, to the other node having the valid cache.

The latency shortening program according to claim 16, further causing a second computer in the other node to execute:
a second transmission process of, when the snoop is received, reading data from the valid cache and transmitting the data to the local node specified by the control information received together with the snoop.

The latency shortening program according to claim 17, further causing the second computer in the other node to execute:
a return process of returning a response to the local node that issued the snoop when the data is read from the valid cache.

A node comprising:
creation means for, while a first request to process a specific storage area in a shared memory is being processed, accepting a second request that is different from the first request and targets the same storage area, without having it retried, and creating control information that identifies, among a plurality of processors, a first processor that issued the second request; and
transmission means for, in response to transmission of reply data for the first request, transmitting the reply data to the first processor as a reply to the second request based on the control information.

The node according to claim 19, further comprising a cache,
wherein the transmission means determines the first processor based on the control information when data is received from the cache or the shared memory in response to the first request.

The node according to claim 20, further comprising determination means for determining the type of each request, wherein:
when the determination means determines that the first request and the second request are both Shared requests, which indicate that valid data is shared with the cache of another node, the creation means creates control information identifying the other node as a request issuer; and
when the determination means determines that the first request or the second request is other than a Shared request, the creation means does not create control information identifying the other node as a request issuer.

The node according to claim 20 or 21, further comprising:
a directory that manages the shared memory of the local node and other nodes;
a directory controller that searches the directory to determine whether a valid cache exists; and
a message issuing/generating circuit that, when another node has a valid cache, generates a snoop and issues it, together with the control information, to the other node having the valid cache.

The node according to claim 22, which, upon receiving the snoop, reads data from the valid cache and transmits the data to the node specified by the control information received together with the snoop.

The node according to claim 23, wherein when the data is read from the valid cache, a response is returned to the node that issued the snoop.