JP3714235B2

JP3714235B2 - Multiprocessor system

Info

Publication number: JP3714235B2
Application number: JP2001345530A
Authority: JP
Inventors: 真一郎森田; 至誠藤原; 勝小柳; 祥基村上
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2001-11-12
Filing date: 2001-11-12
Publication date: 2005-11-09
Anticipated expiration: 2021-11-12
Also published as: JP2003150573A

Description

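[0001]
[Technical field of the invention]
The present invention relates to a multiprocessor system having a plurality of processors.
[0002]
[Prior art]
It is well known that cache memory is effective for raising the processing speed of a computer system. A cache memory is a fast, small-capacity memory located between a processor and main memory. It holds part of the data in main memory and exchanges data with the processor in place of main memory, providing faster data access than main memory.
[0003]
In a computer system using such a cache memory, when the processor issues a read request and the data is stored in the cache memory (a hit), the data in the cache memory is sent to the processor immediately, so high-speed operation is achieved.
[0004]
When the data for a read request from the processor to main memory is not in the cache memory (a miss), the cache memory reads the requested data from main memory and supplies it to the processor.
[0005]
In a computer system with multiple cache memories, cache coherency control is required when data is read from main memory into a cache memory. Cache coherency control guarantees that, when two or more cache memories hold copies of the data at the same main memory address, the copies have the same value (cache coherency).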
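[0006]
An example of cache coherency control is described using the system shown in FIG. 12. The system of FIG. 12 has a plurality of nodes (100) (xND0 to xNDn) connected to one another by interconnection network 1 (111) and interconnection network 2 (112).
[0007]
Each node (100) has a processor (101), a cache memory (103), a cache tag (104), a main memory (105), and a cache/main memory control device (102). The cache/main memory control device (102) reads and writes data in the cache memory (103), the cache tag (104), and the main memory (105) in response to requests from the processor (101) or from other nodes via interconnection network 1 (111).
[0008]
The cache/main memory control device (102) is assumed to perform cache coherency control according to the well-known MESI cache coherency protocol. That is, the following cache states are defined: (a) Invalid (the data is invalid); (b) Shared-Unmodified (the data also exists in the cache memory of another processor and is identical to the data in main memory); (c) Exclusive-Modified (the data exists only in this cache memory and is not identical to the data in main memory); (d) Exclusive-Unmodified (the data exists only in this cache memory and is identical to the data in main memory). When a read request arises at any node and the data is not in that node's cache memory (a read miss), the node broadcasts a read transaction on interconnection network 1, and all nodes receive it.
[0009]
If the request hits in the cache memory of some node, that node transfers the data to the requesting node. If it hits in no node's cache memory, the data is returned from the main memory that holds the requested data. In addition, when a data line selected for replacement in a cache memory (the operation of evicting existing data to free space in the cache memory) is in the Exclusive-Modified state, a write transaction is sent to interconnection network 1 so that the line is reflected in main memory.
[0010]
Interconnection network 1 delivers the transactions exchanged between nodes. Interconnection network 2 delivers between nodes the response messages produced when a node that has received a read transaction performs cache coherency control.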
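[0011]
FIG. 13 is a timing chart illustrating, for such a system, the operation of reading data from main memory into a cache memory and supplying that data to the processor. In FIG. 13, the vertical direction shows the nodes, or components within nodes, involved in the operation, and the horizontal direction shows the time axis of each node's operation. The system in FIG. 13 consists of three nodes, xND0 to xND2.
[0012]
In FIG. 13, the processor of node xND0 first issues a memory read request. Suppose the requested data is in the Invalid cache state in that node's cache memory, that is, a cache miss occurs. The cache/main memory control device of xND0 then issues to interconnection network 1 a read transaction requesting the data at this address. Interconnection network 1 delivers this read transaction to all nodes.
[0013]
On receiving the read transaction, xND1 and xND2 each check in what state the data at the requested address is stored in their own cache. If the requested data is in the Invalid state in the caches of both xND1 and xND2, each sends the message 'Inv' to interconnection network 2 as its cache coherency control result. Here 'Inv' is a message indicating that the requested data is in the Invalid state in the sender's cache. Interconnection network 2 delivers these 'Inv' messages to xND0.
[0014]
Having received 'Inv', xND0 determines that the cache state in every other node is Invalid and decides to use the data from main memory. If the requested data belongs to the main memory of xND0, the cache/main memory control device of xND0 reads the requested data from its own main memory, stores it in the cache memory in the Exclusive-Unmodified state, supplies it to the processor, and completes the read.
[0015]
A known example of such cache coherency control is the system disclosed in JP-A-10-161930. For a computer system having multiple cache memories, JP-A-10-161930 describes a method that, when a write request from one cache memory to main memory and a read request from another cache for the same main memory address occur close together in time, resolves the conflict while guaranteeing cache coherency.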
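[0016]
[Problems to be solved by the invention]
JP-A-10-161930 shows how to resolve a conflict between a write and a read, but says nothing about how to resolve a conflict between a read and another read to the same address. With the prior art described above, cache coherency may be compromised when a read and another read to the same address are issued close together in time. FIG. 14 illustrates this.
[0017]
FIG. 14 shows the operation when the processor of xND0 issues a data read and, before that read completes, the processor of xND1 also issues a read for data at the same address. In FIG. 14, the vertical direction shows the nodes, or components within nodes, involved in the operation, and the horizontal direction shows the time axis of each node's operation divided into four phases.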
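[0018]
(Phase 1)
The processor of xND0 issues a read request. The cache state of the requested data is Invalid, so a cache miss occurs. The cache/main memory control device issues a read transaction to interconnection network 1, which delivers it to all nodes. On receiving the read transaction, xND1 and xND2 check their own cache states and each send the message 'Inv', indicating the Invalid state, to interconnection network 2. Interconnection network 2 sends the 'Inv' messages from xND1 and xND2 to xND0.
[0019]
(Phase 2)
The processor of xND1 issues a read request. The address of this read request is the same as that of the read request issued by the processor of xND0 in (Phase 1). xND1 takes a cache miss. The cache/main memory control device issues a read transaction to interconnection network 1, which delivers it to all nodes. On receiving the read transaction, xND0 and xND2 check their own cache states and each send the message 'Inv', indicating the Invalid state, to interconnection network 2. Interconnection network 2 delivers these 'Inv' messages to xND1.
[0020]
(Phase 3)
xND0 continues processing the read request of (Phase 1) and receives the data from main memory. Because xND0 received the cache coherency control result 'Inv' from both xND1 and xND2 in (Phase 1), it registers the data from main memory in its cache memory in the Exclusive-Unmodified state, supplies the data to the processor, and completes the read.
[0021]
(Phase 4)
xND1 continues processing the read request of (Phase 2) and receives the data from main memory. Because xND1 likewise received the cache coherency control result 'Inv' from both xND0 and xND2 in (Phase 2), it registers the data from main memory in its cache memory in the Exclusive-Unmodified state, supplies the data to the processor, and completes the read.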
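[0022]
Thus, although only one cache in the system may hold a given piece of data in the Exclusive-Unmodified state, the operation above leaves two cache memories holding the data in the Exclusive-Unmodified state. Depending on subsequent processor writes, cache coherency may therefore no longer be guaranteed.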
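The four phases of FIG. 14 can be replayed in a few lines. The following is only an illustrative sketch, not code from the patent: the names `State`, `snoop`, and `complete_read` are hypothetical, and the model assumes that a node answers a snoop with its current cache state and keeps no record of reads it has in flight, which is exactly the prior-art behavior described above.

```python
from enum import Enum


class State(Enum):
    INVALID = "Invalid"
    SHARED_UNMODIFIED = "Shared-Unmodified"
    EXCLUSIVE_MODIFIED = "Exclusive-Modified"
    EXCLUSIVE_UNMODIFIED = "Exclusive-Unmodified"


def snoop(caches, requester):
    """Collect the current state of every node except the requester.

    Assumption of this sketch: in-flight reads are not tracked, so a node
    that is still waiting for its own data reports Invalid.
    """
    return [state for node, state in caches.items() if node != requester]


def complete_read(caches, requester, replies):
    """Prior-art rule: if every reply was Invalid, install the line exclusively."""
    if all(reply is State.INVALID for reply in replies):
        caches[requester] = State.EXCLUSIVE_UNMODIFIED


caches = {name: State.INVALID for name in ("xND0", "xND1", "xND2")}

replies_for_xnd0 = snoop(caches, "xND0")          # Phase 1
replies_for_xnd1 = snoop(caches, "xND1")          # Phase 2: xND0 still says Invalid
complete_read(caches, "xND0", replies_for_xnd0)   # Phase 3
complete_read(caches, "xND1", replies_for_xnd1)   # Phase 4

exclusive_holders = [
    node for node, state in caches.items()
    if state is State.EXCLUSIVE_UNMODIFIED
]
print(exclusive_holders)
```

Both xND0 and xND1 end up in `exclusive_holders`, which is the coherency violation the paragraph describes.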
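[0023]
The problem arises because a cache memory that is still reading data from main memory, and whose cache state is therefore not yet settled, nevertheless responds to a cache coherency control request from another cache memory and outputs a cache coherency control result.
[0024]
An object of the present invention is to provide a multiprocessor system that achieves correct cache coherency control even when multiple cache memories read data from main memory close together in time.
[0025]
Another object of the present invention is to provide a multiprocessor system that prevents starvation of cache coherency control requests even when such requests are aborted and must be re-executed in order to maintain cache coherency. Other objects will become apparent from the description below.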
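[0026]
[Means for solving the problems]
In the present invention, each node holds information about the read access requests issued in the system. When a cache miss occurs at a node, the node refers to this information to determine whether a read access request is already outstanding for the data it needs, and thus whether it may issue its own read request.
[0027]
In the present invention, this information about read access requests also includes status information on the access requests issued by the node itself. When a node receives a read access request issued by another node, it refers to this information to determine whether it has itself already issued a read access request for the same data, and outputs the result.
[0028]
In the present invention, this information about read access requests further includes information on the priority order in which the read access requests are to be performed. This information is used to decide which read access request is performed next.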
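[0029]
[Embodiments of the invention]
Embodiments of the present invention are described in detail below with reference to the drawings.
[0030]
FIG. 1 is a conceptual diagram showing an example of the overall configuration of a multiprocessor system according to a first embodiment of the present invention.
[0031]
As illustrated in FIG. 1, the multiprocessor system of this embodiment consists of a plurality of nodes ND0 to NDn (10) coupled through interconnection network A (20) and interconnection network B (30). A node is a processor module that performs data processing independently. In this example the nodes ND0 to NDn are coupled by crossbar switches, but a bus-based configuration may of course be used instead.
[0032]
FIG. 2 is a conceptual diagram showing the configuration of a node (10). As illustrated in FIG. 2, each node 10 comprises a processor 11, a cache/main memory control unit 12, a cache memory 13a, a cache tag 13b, a main memory 13c, a request management table 14, a CCC receiving unit 15, a transaction receiving unit 16, a CCC transmitting unit 17, a transaction transmitting unit 18, and a starvation management unit 19.
[0033]
The processor 11 operates on data and is a general-purpose microprocessor. The main memory 13c is loaded with the data and programs processed by the processor 11. The cache memory 13a temporarily holds data exchanged between the processor 11 and the main memory 13c or the outside of the node 10. The cache/main memory control unit 12 controls data transfers and related operations of the main memory 13c and the cache memory 13a. The cache tag 13b stores the control information that the cache/main memory control unit 12 uses to control the cache memory 13a. The transaction receiving unit 16 and the transaction transmitting unit 18 exchange information with other nodes via interconnection network A (20). The CCC receiving unit 15 and the CCC transmitting unit 17 receive and transmit cache coherency control results (CCC) via interconnection network B (30).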
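[0034]
Each node (10) further comprises a request management table unit (14) for managing the memory accesses that the node itself has issued to other nodes via interconnection network A, and a starvation management unit (19) for recording the address and requester of each memory access request whose cache coherency control result was a retry.
[0035]
In this embodiment, the cache/main memory control unit 12 is assumed to perform cache coherency control according to the well-known MESI cache coherency protocol. That is, the following cache states are defined: (a) Invalid (the data is invalid); (b) Shared-Unmodified (the data also exists in the cache memory 13a of another processor 11 and is identical to the data in main memory); (c) Exclusive-Modified (the data exists only in this cache memory 13a and is not identical to the data in main memory); (d) Exclusive-Unmodified (the data exists only in this cache memory 13a and is identical to the data in main memory).
[0036]
When a read request arises at any node and the data is not in that node's cache memory 13a (a read miss), the node broadcasts a read transaction on interconnection network A (20), and the other nodes (ND) receive it. If the request hits in a node's cache memory 13a, that node transfers the data to the requesting node; if it hits in no node's cache memory 13a, the data is returned from main memory. In addition, when a data line selected for replacement in the cache memory 13a (the operation of evicting existing data to free space in the cache memory 13a) is in the Exclusive-Modified state, a write transaction is sent to interconnection network A so that the line is reflected in main memory.
[0037]
FIG. 3 is a configuration diagram of interconnection network A (20). As illustrated in FIG. 3, interconnection network A (20) has switch coupling logic 23 that performs 1:1 or 1:many (broadcast) connection control between nodes. The switch coupling logic 23 is connected to each node through ports 22a and 22b. Switching the connections between the ports 22a and 22b to which the nodes are attached provides 1:1 or 1:many (broadcast) connection control. Each of the ports 22a and 22b is provided with a transaction queue 21a or 21b.
[0038]
FIG. 4 is a configuration diagram of interconnection network B (30). As illustrated in FIG. 4, it consists of a CCC receive queue 31a that receives the CCC signals output by the nodes, an aggregation logic unit (33) that combines the CCC signals received from the nodes into the coherency control result for the nodes as a whole, and a CCC transmit queue 31b that sends the aggregated CCC signal to each node.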
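[0039]
The following four CCC signals are defined as node outputs.
'Inv': the requested data (cache line) is invalid (INVALID) in this node.
'Sup': the requested cache line exists in this node's cache memory 13a as "EXCLUSIVE-MODIFIED"; the node transfers (SUPPLY) the data to the requester and sets the transferred cache line to "INVALID". The requesting node treats the cache line as "EXCLUSIVE-MODIFIED" after receiving it.
'Shr': the requested data (cache line) is Shared-Unmodified in this node.
'Rty': cache coherency control for the requested data (cache line) is in progress for another transaction and the cache state is not yet settled, so the node requests that the data request be aborted and re-executed.
[0040]
The aggregation logic unit (33) receives these four kinds of CCC signals from the nodes and, once the signals from all nodes other than the one that requested cache coherency control have arrived, determines the aggregate result as follows.
All CCC signals are 'Inv': the aggregate result is 'Inv'.
At least one 'Shr' is present and no 'Rty' is included: the aggregate result is 'Shr'.
'Sup' is included: the aggregate result is 'Sup'.
'Rty' is included: the aggregate result is 'Rty'. However, if 'Sup' is included at the same time, 'Sup' takes priority and the aggregate result is 'Sup'.
[0041]
The caches are controlled so that no combination other than the above occurs. The aggregate result output by the aggregation logic unit (33) is broadcast to all nodes via the CCC transmit queue 31b. CCC signals are processed in order.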
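The aggregation rules above amount to a fixed priority among the four signals. The function below is a minimal sketch of that reading, not the patent's hardware logic: the name `aggregate_ccc` is hypothetical, and the ordering 'Sup' over 'Rty' over 'Shr' over 'Inv' is inferred from the listed cases. The text states that other combinations (for example 'Sup' together with 'Shr') do not occur, so how they resolve here is an assumption.

```python
VALID_SIGNALS = {"Inv", "Sup", "Shr", "Rty"}


def aggregate_ccc(signals):
    """Combine the CCC signals of all nodes except the requester.

    Priority inferred from the listed rules: Sup > Rty > Shr > Inv.
    """
    signals = list(signals)
    if not signals:
        raise ValueError("at least one CCC signal is required")
    unknown = set(signals) - VALID_SIGNALS
    if unknown:
        raise ValueError(f"unknown CCC signal(s): {sorted(unknown)}")
    if "Sup" in signals:      # a modified copy will be supplied
        return "Sup"
    if "Rty" in signals:      # some node's state is not settled yet
        return "Rty"
    if "Shr" in signals:      # at least one shared copy exists
        return "Shr"
    return "Inv"              # every node reported Invalid


print(aggregate_ccc(["Inv", "Rty"]))
```

In the three-node walkthrough later in the description, the call shown corresponds to ND2 answering 'Inv' and ND0 answering 'Rty'.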
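[0042]
FIG. 5 is a configuration diagram of the request management table unit 14. As shown in FIG. 5, the request management table unit 14 consists of a plurality of registers 14b that hold the addresses of the access requests made by the node's own cache/main memory control unit 12 to memory outside the node. The cache/main memory control unit 12 registers addresses and deletes address registrations through path 14a.
[0043]
FIG. 6 is a configuration diagram for explaining the transaction receiving unit 16. As shown in FIG. 6, the transaction receiving unit 16 receives, through path 16a, the transactions issued by other nodes from interconnection network A, and forwards the received transactions to the cache/main memory control unit 12. The transaction receiving unit 16 has a queue 16d that stores transactions requesting cache coherency control, together with a write pointer 16f and a read pointer 16e for the queue 16d.
[0044]
The write pointer 16f is used to control the sequential storing of transactions requesting cache coherency control into the queue 16d. The read pointer 16e is used, on receipt of the signal 15c from the CCC receiving unit 15 indicating that a CCC signal has arrived, to control sending the transaction corresponding to that CCC signal to the starvation management unit 19 through path 16c. This CCC signal is the aggregate of the CCC signals from the individual nodes, as explained with reference to FIG. 4.
[0045]
In this embodiment, by referring to the request management table, a node can provide other nodes with status information about its own read transactions, such as whether one has been issued, has not been issued, or is being processed.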
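[0046]
FIG. 7 is a configuration diagram of the starvation management unit 19. As shown in FIG. 7, the starvation management unit 19 has a starvation register 19c, a starvation register read control unit 19d, and a starvation register write control unit 19e. For each node, the starvation register 19c has a register 19cb that holds the address of the data requested by a read transaction and a valid bit 19ca that indicates whether the held value is valid.
[0047]
The starvation register write control unit 19e of this embodiment mainly manages the transactions issued by other nodes. It receives a CCC signal from path 15b and the transaction corresponding to that CCC signal from path 16c, and then operates as follows.
When the CCC signal is Rty: it determines the issuing node from the information embedded in the transaction received from path 16c and reads the valid bit 19ca corresponding to that node. If the valid bit is not set, it sets the bit and writes the address embedded in the transaction into register 19cb. A set valid bit indicates that this node is waiting to retry for the address in question. A node waiting to retry is the node that will retry the next read transaction for that data after the other node's read transaction currently in progress has finished.
When the CCC signal is not Rty: it determines the issuing node from the information embedded in the transaction received from path 16c and reads the valid bit 19ca and register 19cb corresponding to that node. If the valid bit is set, it compares the value read from register 19cb with the address embedded in the transaction and, if they match, clears the valid bit 19ca. A CCC signal other than Rty means that the issuing node's retry of the transaction has succeeded. Clearing the valid bit 19ca therefore releases the designation of the node that should retry next.
[0048]
The starvation register read control unit 19d of this embodiment mainly performs the processing needed when the node itself issues a read transaction after a cache miss. That is, when a cache miss occurs for data requested by the processor, it checks the status of each node with respect to that data. Its operation is as follows.
[0049]
It receives from the cache/main memory control unit 12, through path 19b, the address targeted by the read transaction about to be issued. It first reads the valid bit 19ca and register 19cb corresponding to its own node. If the valid bit is clear, or if the address in its own register 19cb does not match the address received from path 19b, the node has either not yet issued a read transaction for the target data or is not waiting to retry. It then searches the registers 19cb of the other nodes for one that matches the address received from path 19b. If a matching register exists and its valid bit 19ca is set, the node owning that register is waiting to retry for that address. In that case a signal indicating that the processor 11 should retry the access to the address received through path 19b is sent to the cache/main memory control unit 12 through path 19a, and the cache/main memory control unit 12 notifies the processor 11 accordingly.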
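The write and read control described for the starvation register can be summarized as a small data structure. This is a sketch under stated assumptions rather than the patented circuit: the class name `StarvationRegister` and the methods `on_ccc` and `must_retry` are hypothetical, node identifiers are plain strings, and one entry (valid bit plus address) is kept per node as in FIG. 7.

```python
class StarvationRegister:
    """One (valid bit, address) entry per node, mirroring 19ca and 19cb."""

    def __init__(self, node_ids):
        self.valid = {node: False for node in node_ids}
        self.address = {node: None for node in node_ids}

    def on_ccc(self, ccc, issuer, address):
        """Write control (19e): update the issuer's entry from an aggregated CCC."""
        if ccc == "Rty":
            # The issuer was told to retry: remember it as waiting for this address.
            if not self.valid[issuer]:
                self.valid[issuer] = True
                self.address[issuer] = address
        elif self.valid[issuer] and self.address[issuer] == address:
            # The issuer's retry succeeded: release its reservation.
            self.valid[issuer] = False

    def must_retry(self, own_id, address):
        """Read control (19d): True if another node is waiting to retry this address."""
        return any(
            self.valid[node] and self.address[node] == address
            for node in self.valid
            if node != own_id
        )


reg = StarvationRegister(["ND0", "ND1", "ND2"])
reg.on_ccc("Rty", "ND1", 0x1000)          # ND1's read was answered with Rty
print(reg.must_retry("ND2", 0x1000))      # ND2 must hold back
print(reg.must_retry("ND1", 0x1000))      # ND1 itself may go ahead
```

Because every node receives the same aggregated CCC signals in order, each node's copy of this structure would stay identical, which is what lets a node decide locally whether to issue a transaction.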
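[0050]
Next, the transaction types used in this embodiment are described with reference to FIG. 8. Information is exchanged between nodes basically by sending the 64-bit-wide data units (transactions) illustrated in FIG. 8 to interconnection network A (20), or receiving them from interconnection network A (20), in time sequence, one per operating cycle of interconnection network A (20). Each transaction consists of a field indicating the transaction type (TYPE), a field storing identification information of the nodes involved in the transaction, such as the requester or the transfer destination (PORT), a field storing information used when the destination processes the transaction (MISC), a field storing the address targeted by a read access (ADDRESS), and, where needed, a field storing the data being transferred (DATA). The data layout of each transaction is as follows.
[0051]
FIG. 8(a) shows the read transaction, which is broadcast to the memory module and all processor modules. The 8-bit TYPE field is set to a bit pattern indicating that this is a read transaction. The following 8-bit PORT field is set, as destination information, to a specific bit pattern that gives the port number of the node holding the requested main memory and at the same time instructs interconnection network A (20) to broadcast to all other nodes. The following 16-bit MISC field is set to, for example, identification information such as the port number of the requesting node. The remaining 32-bit ADDRESS field is set to the address to be read.
[0052]
FIG. 8(b) shows the return transaction, which returns data read from main memory to the requester. The TYPE field is set to a specific bit pattern indicating a return transaction, the PORT field is set to the port number of the node that requested the read, the MISC field is set to a parameter such as the data length (number of cycles), and the remaining field is unused.
[0053]
FIG. 8(c) shows the transfer transaction, which transfers data to the read requester when the cache memory 13a of another node holds the latest, updated data for a read request. The TYPE field is set to a specific bit pattern indicating a transfer transaction, the PORT field is set to the port number of the destination node, the MISC field is set to a parameter such as the data length (number of cycles), and the remaining field is unused.
[0054]
FIG. 8(d) shows the write transaction, which requests a memory write. The TYPE field is set to a specific bit pattern indicating a write transaction, the PORT field is set to the port number of the node holding the requested main memory, the MISC field is set to a parameter such as the data length (number of cycles), and the remaining field is unused.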
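The 64-bit layout described above (8-bit TYPE, 8-bit PORT, 16-bit MISC, 32-bit ADDRESS) can be packed and unpacked with Python's standard `struct` module. This is a sketch only: the patent gives the field widths and order but not the bit patterns or byte order, so the big-endian layout, the constant `TYPE_READ`, the broadcast port value, and the helper names are all assumptions made for illustration.

```python
import struct

# Field order and widths follow FIG. 8(a): TYPE(8) PORT(8) MISC(16) ADDRESS(32).
# Big-endian byte order is an assumption of this sketch.
HEADER_FORMAT = ">BBHI"

TYPE_READ = 0x01          # hypothetical bit pattern for a read transaction
PORT_BROADCAST = 0xFF     # hypothetical "home port plus broadcast" pattern


def pack_transaction(type_code, port, misc, address):
    """Pack the four fields into one 64-bit (8-byte) transaction word."""
    return struct.pack(HEADER_FORMAT, type_code, port, misc, address)


def unpack_transaction(raw):
    """Return (type_code, port, misc, address) from an 8-byte transaction word."""
    return struct.unpack(HEADER_FORMAT, raw)


# Read transaction from the node on port 1 (carried in MISC) for address 0x1000.
word = pack_transaction(TYPE_READ, PORT_BROADCAST, 0x0001, 0x0000_1000)
print(word.hex())
print(unpack_transaction(word))
```

For the return, transfer, and write transactions of FIG. 8(b) to (d), the text says the remaining field is unused, so a caller of this sketch would simply pass 0 as the address.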
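[0055]
Next, an example of the operation of the multiprocessor system and of the transaction control of this embodiment is described with reference to the timing charts of FIGS. 9, 10, and 11. In FIGS. 9, 10, and 11, the nodes, or components within nodes, involved in the operation are arranged vertically, and the horizontal direction shows the time axis of each node's operation. The time axis is divided into Phases 1 to 6, one for each group of processing steps. The multiprocessor system in this embodiment has three nodes, ND0 to ND2. In the initial state, nothing is registered in the request management table 14 of any node, and all valid bits 19ca of the starvation management unit 19 are clear.
[0056]
(Phase 1)
In FIG. 9, the processor 11 of ND0 issues a read request. The cache state of the requested data is Invalid, so a cache miss occurs. The cache/main memory control unit registers the address in the request management table 14 and issues a read transaction to interconnection network A. Interconnection network A delivers the read transaction issued by ND0 to all nodes. On receiving the read transaction, ND1 and ND2 check their own cache states and each send the message 'Inv', indicating the Invalid state, to interconnection network B. Interconnection network B receives the 'Inv' messages sent by ND1 and ND2, aggregates them, and delivers the aggregate result, an 'Inv' message, to all nodes.
[0057]
(Phase 2)
The processor 11 of ND1 issues a read request. The address of this read request is the same as that of the read request issued by the processor 11 of ND0 in (Phase 1). Phase 2 therefore shows the concrete processing performed when a read request for the same address is issued while a read transaction is still in progress.
[0058]
ND1 takes a cache miss. The cache/main memory control unit registers the address in the request management table 14 and issues a read transaction to interconnection network A. Interconnection network A delivers the read transaction issued by ND1 to all nodes. ND2 receives the read transaction at its transaction receiving unit 16. Its cache/main memory control unit 12 then checks the node's cache state and sends the message 'Inv', indicating the Invalid state, from the CCC transmitting unit 17 to interconnection network B.
[0059]
Meanwhile, in ND0, which has received this read transaction at its transaction receiving unit 16, the cache/main memory control unit 12 examines the request management table 14 to check whether the address of the requested data is registered. Because ND0 issued a read transaction for this address in Phase 1, the address is registered, which shows that the node's own read for the same address is in progress. Since ND0's read is still under way and its cache state is not settled, ND0 cannot report a cache state in response to ND1's read request.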
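[0060]
One conceivable way to handle this is the following: when a cache memory 13a is reading data from main memory and its cache state is not settled, it responds to a cache coherency control request from another cache memory 13a by requesting that the cache coherency control request be aborted and re-executed (retried).
[0061]
This raises the following problem. Suppose a first cache memory 13a is reading data from main memory and requests a retry in response to a cache coherency control request from a second cache memory 13a. Suppose further that the first cache memory 13a completes its processing and that, just before the second cache memory 13a re-executes its cache coherency control request, a third cache memory 13a issues a cache coherency control request. The second cache memory 13a, whose re-executed request arrives later, is then asked to retry again by the third cache memory 13a. If this repeats, the cache coherency control request of the second cache memory 13a may never be processed; that is, starvation may occur. The solution to this problem according to the present invention is shown in (Phase 4).
[0062]
As its cache coherency control result, ND0 sends the message 'Rty', requesting that ND1's read request be aborted and re-executed (retried), from its CCC transmitting unit 17 to interconnection network B. In this way each node informs the other nodes whether a read transaction that it issued is still in progress.
[0063]
Interconnection network B then aggregates the 'Inv' message issued by ND2 and the 'Rty' message issued by ND0, and delivers the aggregate result, an 'Rty' message, to all nodes.
[0064]
The CCC receiving unit 15 of each node receives this CCC signal and outputs it to the starvation management unit 19. As described earlier, the read transaction issued by ND1 has also been sent to the starvation management unit 19. This processing is carried out by the starvation register write control unit 19e described above, which sets the valid bit 19ca for ND1 in the starvation register 19c and registers the address of the read transaction in register 19cb. As a result, by referring to its starvation management unit, every node, including ND0, which issued the read transaction in progress, and ND1, which issued the read transaction that was asked to retry, can tell that once ND0's current read transaction for the stored address finishes, ND1's read transaction is due to be processed next.
[0065]
On receiving the 'Rty' message, ND1 requests its processor 11 to re-execute (retry) the read request, deletes the address registration from the request management table 14, and completes the read processing. When the processor 11 later retries, the read transaction is executed again.
[0066]
Thus, while ND0 is executing a read, it answers 'Rty' to read requests from other nodes for an address registered in its request management table 14. This prevents it from returning an incorrect cache coherency control result for an address whose cache memory 13a state is not settled, and so keeps the caches coherent.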
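The role of the request management table in Phase 2 can be illustrated with a small node model. The sketch below is an assumption-laden simplification, not the patent's implementation: the class `Node`, its methods, and the use of a Python set in place of the registers 14b are hypothetical. The responses for settled lines follow the CCC definitions and the Phase 5 behavior in the description, where an Exclusive-Unmodified holder invalidates its copy and answers 'Inv'.

```python
class Node:
    """A node that answers snoops using a request management table (sketch)."""

    def __init__(self, name):
        self.name = name
        self.cache = {}        # address -> state string; missing means Invalid
        self.pending = set()   # stands in for request management table 14

    def issue_read(self, address):
        # Register the address before the read transaction is broadcast.
        self.pending.add(address)

    def complete_read(self, address, state="Exclusive-Unmodified"):
        # Install the line, then delete the registration.
        self.cache[address] = state
        self.pending.discard(address)

    def snoop_read(self, address):
        """Return this node's CCC signal for another node's read transaction."""
        if address in self.pending:
            return "Rty"                      # own read in flight: state not settled
        state = self.cache.get(address, "Invalid")
        if state == "Exclusive-Modified":
            self.cache[address] = "Invalid"   # supply the data, then invalidate
            return "Sup"
        if state == "Shared-Unmodified":
            return "Shr"
        if state == "Exclusive-Unmodified":
            self.cache[address] = "Invalid"   # as in Phase 5 of the description
        return "Inv"


nd0 = Node("ND0")
nd0.issue_read(0x1000)            # Phase 1: ND0's read is in flight
print(nd0.snoop_read(0x1000))     # Phase 2: ND1's read is answered with Rty
nd0.complete_read(0x1000)         # Phase 3: ND0 installs the line
print(nd0.snoop_read(0x1000))     # Phase 5: ND1's retry is answered with Inv
```

Compared with the prior-art behavior, the only change is the check of `pending` before the cache state is consulted.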
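[0067]
(Phase 3)
ND0 continues processing the read request of (Phase 1), registers the data from main memory in the cache memory 13a in the Exclusive-Unmodified state, and supplies the data to the processor 11. It also deletes the address registration from the request management table 14 and completes the read processing.
[0068]
(Phase 4)
FIG. 10 shows the continuation of the processing in FIG. 9. At the start of the timing chart of FIG. 10, the valid bit 19ca corresponding to ND1 is set in the starvation register 19c of every node, and register 19cb holds the read address requested by the processor 11 of ND1 in (Phase 2).
[0069]
Here the processor 11 of ND2 issues a read request. The address of this read request is the same as that of the read request issued by the processor 11 of ND0 in (Phase 1) and by the processor 11 of ND1 in (Phase 2). This phase therefore describes the case where, while one node is waiting to retry, the processor 11 of a different node makes a request to the same address.
[0070]
The cache of ND2 takes a cache miss. The cache/main memory control unit 12 of ND2 sends the address of the read request to the starvation management unit 19 through path 19b. As explained with reference to FIG. 7, the starvation register read control unit 19d of the starvation management unit 19 searches the starvation register 19c for the address received from path 19b. The valid bit corresponding to ND2 is clear, the valid bit corresponding to ND1 is set, and the address registered in register 19cb matches the address received from path 19b.
[0071]
Based on this determination, the starvation management unit 19 of ND2 sends to the cache/main memory control unit 12, through path 19a, a signal requesting that the access by the processor 11 to this address be aborted and re-executed (retried). On receiving the retry request from path 19a, the cache/main memory control unit 12 of ND2 does not issue a read transaction to interconnection network A; instead it requests the processor 11 to re-execute the read request and completes the read processing.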
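[0072]
(Phase 5)
Next, the processor 11 of ND1, which was asked to retry its read request, again issues a read request for the same address. This phase shows the retry by the node that is due to issue the read transaction scheduled to be processed next.
[0073]
ND1 takes a cache miss. The cache/main memory control unit registers the address in the request management table 14 and issues a read transaction to interconnection network A.
[0074]
Interconnection network A delivers the read transaction to all nodes. On receiving it, ND2 checks its own cache state and sends the message 'Inv', indicating the Invalid state, to interconnection network B.
[0075]
ND0, having received the read transaction, checks its own cache state and recognizes that the requested data is registered in the Exclusive-Unmodified state. The cache/main memory control unit 12 of ND0 changes the cache state to Invalid and then sends an 'Inv' message to interconnection network B as its cache coherency control result. ND0 also determines that the address requested by the read transaction refers to data in its own main memory, reads the data from main memory, and sends it to interconnection network A.
[0076]
Interconnection network B aggregates the 'Inv' messages from ND0 and ND2 and delivers the aggregate result, an 'Inv' message, to all nodes.
[0077]
At this point, this 'Inv' message and the read transaction issued by ND1 have been sent to the starvation management unit 19 of each node. The starvation management unit 19 of each node determines that the valid bit 19ca for ND1 in the starvation register 19c is set and that the address registered in register 19cb matches the address of the read transaction, and clears the valid bit 19ca for ND1.
[0078]
Having received the 'Inv' message from interconnection network B and the data from ND0, ND1 registers the data in the cache memory 13a in the Exclusive-Unmodified state and supplies the data to the processor 11. It also deletes the address registration from the request management table 14 and completes the read processing.
[0079]
Thus, by registering in the starvation management unit 19 the address that received an 'Rty' cache coherency control result together with the requester of the memory access, and by suppressing memory accesses to that address from any other requester, the priority of that requester is raised and starvation of its request is prevented.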
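[0080]
(Phase 6)
FIG. 11 shows the continuation of the processing in FIG. 10. At the start of the timing chart of FIG. 11, all valid bits 19ca in the starvation registers of all nodes are clear, and no address is registered in the request management table 14 of any node.
[0081]
The processor 11 of ND2, which was asked to retry its read request, again issues a read request for the same address. This phase shows the retry by a node that refrained from issuing its read transaction because another read transaction was due to be processed next.
[0082]
ND2 takes a cache miss. In (Phase 4), following the retry instruction from the starvation management unit 19, the cache/main memory control unit 12 of ND2 did not issue a read transaction to interconnection network A but requested the processor 11 to re-execute the read request and completed the read processing. This time no valid bit in the starvation register is set, so the cache/main memory control unit 12 registers the address in the request management table 14 and issues a read transaction to interconnection network A.
[0083]
Interconnection network A delivers the read transaction to all nodes.
[0084]
On receiving the read transaction, ND0 checks its own cache state and sends the message 'Inv', indicating the Invalid state, to interconnection network B. ND1, having received the read transaction, checks its own cache state and determines that the requested data is registered in the Exclusive-Unmodified state. The cache/main memory control unit 12 of ND1 changes the cache state to Invalid and then sends an 'Inv' message to interconnection network B as its cache coherency control result. ND0 also determines that the address requested by the read transaction refers to data in its own main memory, reads the data from main memory, and sends it to interconnection network A.
[0085]
Interconnection network B aggregates the 'Inv' messages from ND0 and ND1 and delivers the aggregate result, an 'Inv' message, to all nodes.
[0086]
Having received the 'Inv' message from interconnection network B and the data from ND0, ND2 registers the data in the cache memory 13a in the Exclusive-Unmodified state and supplies the data to the processor 11. It also deletes the address registration from the request management table 14 and completes the read processing.
[0087]
Thus, in this embodiment, each node manages the read transactions it has issued by means of the request management table 14. Using this table, a node that receives a read transaction from another node can report the issue status of its own transactions. The interconnection network, for a read transaction issued by any node, aggregates the information on each node's cache state, determines whether a transaction issued by some node is in progress, and notifies every node of the result. This prevents the situation in which, while a read transaction of one node is still in progress, another node issues a read transaction and a cache memory responds to it.
[0088]
In this embodiment, moreover, the notification from the interconnection network is registered together with the address in the starvation management unit 19 of each node, which determines the node that may issue the next read transaction. The starvation management unit 19 can also be used by a node to decide whether it should issue a read transaction itself. In this way, when a read transaction is in progress, the node to be served next can be fixed, and the other nodes refrain from issuing read transactions to the same address until that node's processing completes, which prevents starvation.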
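[0089]
In this embodiment, when a node refers to the request management table 14 and outputs 'Rty' as its own CCC signal, it may register the entry in the starvation register 19c at the same time. That is, because every node needs to know whenever even one 'Rty' is output, the 'Rty' signal may be output directly to the other nodes without producing an aggregate result. Alternatively, the node may write to its own starvation register 19c immediately.
[0090]
Next, a second embodiment is described. In the first embodiment, the request management table 14 was consulted for transactions issued by the node itself, but the starvation management unit 19 may be used for this purpose as well. In this case, the request management table 14 may be eliminated entirely, with all of its processing consolidated in the starvation management unit 19, or the starvation management unit 19 may be used only to decide which CCC signal to output when a transaction is received from another node.
[0091]
In the former configuration, all the registration and deletion of addresses in the request management table in FIGS. 9 to 11 is performed by the starvation management unit 19. In that case, in addition to the valid bit 19ca, a pending bit indicating that processing is in progress is stored for each node. When the received CCC signal is 'Inv', and another register holds that address but its valid bit 19ca is not set, the address is stored in the register of the own node and the corresponding pending bit is set. In this way, the fact that a transaction is currently in progress is registered at each node. When a transaction issued by the node itself finishes, the pending bit is cleared and the register value is erased.
[0092]
In the latter case, a node only needs to know that it is itself in the middle of processing. So instead of the former configuration, whenever an address is stored in the request management table 14 as in the first embodiment, the address is also stored in the node's own register of the starvation register and the pending bit is set.
[0093]
Cache coherency control at each node is now described for the case where the starvation register records both the node whose transaction is in progress and the node waiting to retry. This embodiment differs in that, in Phase 2 of FIG. 9, a node that receives a transaction from another node uses the starvation register to determine whether its own transaction is in progress. If the pending bit is set in the node's own entry for the same address as the other node's transaction, the node outputs an 'Rty' signal as its CCC signal. This configuration prevents a cache memory from responding to a cache coherency request from another cache memory while its own node is still processing.
[0094]
Furthermore, in the former configuration of this embodiment, that is, the configuration in which read transactions in progress can be registered in the starvation registers of all nodes, a node about to issue a transaction in Phase 2 of FIG. 9 can refer to the starvation register and recognize that a read transaction is already in progress. By consulting the register before issuing its transaction, the node avoids issuing a second transaction. In that case starvation of a node waiting to retry cannot be prevented, but unnecessary transactions on the interconnection network are reduced.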
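[0095]
Next, a third embodiment is described. In this embodiment, more than one node waiting to retry can be registered in the starvation register. That is, the starvation management unit 19 has, for each node's entry, a field for assigning a priority. In this way multiple read transactions awaiting processing are registered. The processing in this embodiment is as follows.
[0096]
First, to allow multiple nodes to be registered, in this embodiment a node whose processor 11 makes a request issues a read transaction to the other nodes even if the starvation register shows a node waiting to retry, rather than asking the processor 11 to retry. Also, in this embodiment, when a transaction is received from another node, the starvation management unit 19 refers to the starvation register in addition to the request management table 14.
[0097]
Taking Phase 4 of FIG. 10 as an example: when ND2 takes a cache miss and refers to the starvation register, it finds that ND1 is waiting to retry. If ND2 then issues a read transaction, ND0 checks the request management table 14 and its own entry in the starvation register to determine whether it is processing or waiting to retry. ND0 is neither, so it outputs an 'Inv' signal as its CCC signal. ND1 finds from the request management table 14 that it is not processing, but finds from the starvation register that it is waiting to retry. In this case ND1 outputs an 'Rty' signal as its CCC signal. The aggregate result from the interconnection network is then 'Rty', from which every node can determine that ND2 is now also waiting to retry. When ND2 is registered in the starvation register as waiting to retry, the nodes currently registered as waiting to retry are counted, and ND2 is given the lowest priority among them in the bits that store the priority. This configuration allows multiple nodes waiting to retry to be registered.
[0098]
Next, the processing when a node waiting to retry performs its retry is described. When multiple waiting nodes are registered with priorities, starvation cannot be prevented unless the waiting nodes are served in priority order. When the processor 11 of a node requests a retry, the node first refers to the starvation register. In this embodiment the starvation register holds the nodes waiting to retry, including the node itself, together with their priorities. Therefore, even if other waiting nodes are registered, the node outputs its read transaction if its own priority is the highest. Conversely, if another node has a higher priority, the node instructs its processor to retry, as in Phase 4 of the first embodiment. With this configuration, this embodiment prevents starvation among multiple nodes waiting to retry.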
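The priority ordering of the third embodiment behaves like a first-come queue of waiting nodes per address. The sketch below is one possible reading under stated assumptions, not the patent's register layout: the class `PriorityStarvationRegister` and its methods are hypothetical, a `deque` replaces the per-entry priority bits, and the way waiting nodes answer each other's snoops is deliberately left out because the text does not spell it out.

```python
from collections import defaultdict, deque


class PriorityStarvationRegister:
    """Waiting nodes per address, oldest first (sketch of the third embodiment)."""

    def __init__(self):
        self.waiting = defaultdict(deque)   # address -> nodes in retry order

    def on_ccc(self, ccc, issuer, address):
        """Update the queue from an aggregated CCC result for the issuer's read."""
        queue = self.waiting[address]
        if ccc == "Rty":
            if issuer not in queue:
                queue.append(issuer)        # newcomer gets the lowest priority
        elif issuer in queue:
            queue.remove(issuer)            # retry succeeded: leave the queue

    def may_issue(self, own_id, address):
        """Decide whether the node should put its read on the network now."""
        queue = self.waiting.get(address)
        if not queue or own_id not in queue:
            return True                     # first attempt: issue so it gets queued
        return queue[0] == own_id           # retries go out in priority order


reg = PriorityStarvationRegister()
reg.on_ccc("Rty", "ND1", 0x1000)    # ND1 was told to retry first
reg.on_ccc("Rty", "ND2", 0x1000)    # ND2 queues behind ND1
print(reg.may_issue("ND2", 0x1000)) # ND2 must wait for ND1
reg.on_ccc("Inv", "ND1", 0x1000)    # ND1's retry succeeds
print(reg.may_issue("ND2", 0x1000)) # now ND2 is at the head
```

A queue makes the "last priority among those already waiting" rule implicit in the append order; a hardware version would store an explicit priority value per entry as the text describes.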
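[0099]
[Effects of the invention]
The present invention provides a multiprocessor system that achieves correct cache coherency control even when multiple cache memories read data from main memory close together in time.
[0100]
The present invention also provides a multiprocessor system that prevents starvation of cache coherency control requests even when such requests are aborted and must be re-executed in order to maintain cache coherency.
[0101]
Furthermore, according to the present invention, while a processor that has received a retry request is re-executing its memory access, memory access requests from other processors to the same address are kept off the interconnection network, which also reduces the number of transactions the interconnection network has to process.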
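[Brief description of the drawings]
[FIG. 1] A configuration diagram showing an example of the overall configuration of a multiprocessor system according to an embodiment of the present invention.
[FIG. 2] A configuration diagram showing in detail an example of the configuration of a node in the multiprocessor system according to an embodiment of the present invention.
[FIG. 3] A configuration diagram showing in detail an example of the configuration of interconnection network A in the multiprocessor system according to an embodiment of the present invention.
[FIG. 4] A configuration diagram showing in detail an example of the configuration of interconnection network B in the multiprocessor system according to an embodiment of the present invention.
[FIG. 5] A configuration diagram showing an example of the request management table unit in a node of the multiprocessor system according to an embodiment of the present invention.
[FIG. 6] A configuration diagram showing an example of the transaction receiving unit in a node of the multiprocessor system according to an embodiment of the present invention.
[FIG. 7] A configuration diagram showing an example of the starvation management unit in a node of the multiprocessor system according to an embodiment of the present invention.
[FIG. 8] A configuration diagram showing an example of the structure of the transactions used in the multiprocessor system according to an embodiment of the present invention.
[FIG. 9] A timing chart showing an example of the memory read processing in an embodiment of the present invention (Phases 1 to 3).
[FIG. 10] A timing chart showing an example of the memory read processing in an embodiment of the present invention (Phases 4 and 5).
[FIG. 11] A timing chart showing an example of the memory read processing in an embodiment of the present invention (Phase 6).
[FIG. 12] A configuration diagram showing an example of the overall configuration of a multiprocessor system, for explaining the prior art.
[FIG. 13] A timing chart showing the memory read processing in a multiprocessor system, for explaining the prior art.
[FIG. 14] A timing chart showing the operation when two processors read the same address close together in time in a multiprocessor system, for explaining the prior art.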
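[Description of reference numerals]
10 … Node (ND0 to NDn)
11 … Processor
12 … Cache/main memory control unit
13a … Cache memory
13b … Cache tag
13c … Main memory
14 … Request management table unit
14a … Path between the cache/main memory control unit and the request management table
14b … Address registration register
15 … CCC receiving unit
15a … Path from interconnection network B to the CCC receiving unit
15b … Path from the CCC receiving unit to the cache/main memory control unit and the starvation management unit
15c … Path from the CCC receiving unit to the transaction receiving unit
16 … Transaction receiving unit
16a … Path from interconnection network A to the transaction receiving unit
16b … Path from the transaction receiving unit to the cache/main memory control unit
16c … Path from the transaction receiving unit to the starvation management unit
16d … Queue holding transactions that request cache coherency control
16e … Read pointer of the queue (16d)
16f … Write pointer of the queue (16d)
17 … CCC transmitting unit
17a … Path from the CCC transmitting unit to interconnection network B
17b … Path from the cache/main memory control unit to the CCC transmitting unit
18 … Transaction transmitting unit
18a … Path from the transaction transmitting unit to interconnection network A
18b … Path from the cache/main memory control unit to the transaction transmitting unit
19 … Starvation management unit
19a … Path from the starvation management unit to the cache/main memory control unit
19b … Path from the cache/main memory control unit to the starvation management unit
19c … Starvation register
19ca … Valid register of the starvation register
19cb … Address register of the starvation register
19d … Starvation register read control unit
19e … Starvation register write control unit
20 … Interconnection network A (XB1)
20a … Read transaction
20b … Return transaction
20c … Transfer transaction
20e … Write transaction
21a … Transaction receive queue
21b … Transaction transmit queue
22a … Transaction receive port
22b … Transaction transmit port
30 … Interconnection network B (XB2)
31a … CCC signal receive queue
31b … CCC signal transmit queue
32a … Path from the CCC signal receive queue to the aggregation logic unit
32b … Path from the aggregation logic unit to the CCC signal transmit queue
33 … Aggregation logic unit
100 … Node
101 … Processor
102 … Cache/main memory control device
103 … Cache memory
104 … Cache tag
105 … Main memory
111 … Interconnection network 1
112 … Interconnection network 2
[0001]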
BACKGROUND OF THE INVENTION
The present invention relates to a multiprocessor system having a plurality of processors.
[0002]
[Prior art]
It is well known that the use of a cache memory is effective for improving the processing speed of a computer system. The cache memory is a high-speed, small-capacity memory located between the processor and the main memory. The cache memory holds a part of the data in the main memory, and transmits / receives data to / from the processor instead of the main memory, thereby realizing data access faster than the main memory.
[0003]
In a computer system using such a cache memory, if the data is stored in the cache memory when the processor issues a read request (that is, on a hit), the data in the cache memory is sent to the processor immediately, and high-speed operation is achieved.
[0004]
In response to a read request from the processor to the main memory, if the cache memory does not have the data (that is, misses), the cache memory reads the data requested by the processor from the main memory and supplies it to the processor.
[0005]
Here, in a computer system having a plurality of cache memories, cache coherency control is required when data is read from the main memory into a cache memory. Cache coherency control is control that guarantees that, when two or more cache memories hold copies of the data at the same main memory address, the copies have the same value (cache coherency).
[0006]
An example of cache coherency control will be described using the system shown in FIG. 12. The system of FIG. 12 has a configuration in which a plurality of nodes (100) (xND0 to xNDn) are connected to each other by an interconnection network 1 (111) and an interconnection network 2 (112).
[0007]
The node (100) includes a processor (101), a cache memory (103), a cache tag (104), a main memory (105), and a cache main memory control device (102). In response to a request from the processor (101) or from another node via the interconnection network 1 (111), the cache main memory control device (102) reads and writes data in the cache memory (103), the cache tag (104), and the main memory (105).
[0008]
The cache main memory controller (102) performs cache coherency control according to the well-known MESI cache coherency protocol. That is, the cache states are defined as (a) Invalid (the data is invalid), (b) Shared-Unmodified (the data is also present in the cache memory of another processor and is the same as the data in the main memory), (c) Exclusive-Modified (the data exists only in this cache memory and is not the same as the data in the main memory), and (d) Exclusive-Unmodified (the data exists only in this cache memory and is the same as the data in the main memory). When a read request occurs in any node and the data is not in the cache memory of that node (a read miss), a read transaction is broadcast to the interconnection network 1, and all nodes receive it.
[0009]
At this time, if the request hits in the cache memory of any node, the data is transferred from that node to the requesting node. On the other hand, if there is no hit in the cache memory of any node, the data is returned from the main memory holding the requested data. Further, when the data line targeted for replacement in the cache memory (an operation that evicts existing data in order to create free space in the cache memory) is Exclusive-Modified, a write transaction is sent to the interconnection network 1 so that the line is reflected in the main memory.
[0010]
The interconnection network 1 distributes transactions exchanged between nodes. The interconnection network 2 distributes a response message between nodes when the node that has received the read transaction performs cache coherency consistency control.
[0011]
FIG. 13 is a timing chart illustrating an operation of reading data from the main memory into the cache memory and supplying the data to the processor in such a system. In FIG. 13, the vertical direction indicates each node or a component in the node related to the operation, and the horizontal direction indicates the time axis of the operation of each node. Further, the system in FIG. 13 has a configuration of three nodes xND0 to xND2.
[0012]
In FIG. 13, first, the processor of the node xND0 issues a memory read request. It is assumed that the requested data is in the cache state Invalid in the cache memory of the own node, that is, a cache miss has occurred. Then, the cache main memory control device of xND0 issues a read transaction requesting data of this address to the interconnection network 1. The interconnection network 1 distributes this read transaction to all nodes.
[0013]
When xND1 and xND2 receive this read transaction, each checks in what state the data at the address of the read request is stored in the cache of its own node. Assuming that the requested data is in the Invalid state in the caches of both xND1 and xND2, xND1 and xND2 each transmit a message 'Inv' to the interconnection network 2 as a cache coherency control result. Here, 'Inv' is a message indicating that the requested data is in the Invalid state in the cache of the sending node. The interconnection network 2 delivers these 'Inv' messages to xND0.
[0014]
Having received 'Inv', xND0 determines that the cache states of all other nodes are Invalid and decides to use the data from the main memory. Assuming that the requested data belongs to the main memory of xND0, the cache main memory controller of xND0 reads the requested data from the main memory of its own node, stores it in the cache memory in the Exclusive-Unmodified state, supplies the data to the processor, and completes the read process.
[0015]
As an example of such cache coherency control, a system as disclosed in JP-A-10-161930 is known. For a computer system having a plurality of cache memories, JP-A-10-161930 shows a method of resolving the conflict, while ensuring cache coherency, when a write request from one cache memory to the main memory and a read request from another cache for the same main memory address occur close together in time.
[0016]
[Problems to be solved by the invention]
Japanese Patent Application Laid-Open No. 10-161930 discloses a solution for the case where a write and a read conflict, but says nothing about a solution for the case where a read conflicts with another read to the same address. With the above-described prior art, a problem may arise in cache coherency when a read and another read to the same address are issued close together in time. This is shown in FIG. 14.
[0017]
FIG. 14 shows the operation when the xND0 processor issues a data read and the xND1 processor issues the same address data read before the processing is completed. In FIG. 14, the vertical direction indicates each node or a component in the node related to the operation, and the horizontal direction indicates the time axis of the operation of each node divided into four phases.
[0018]
(Phase 1)
The xND0 processor issues a read request. The cache state of the requested data is Invalid, which causes a cache miss. The cache main memory controller issues a read transaction to the interconnection network 1. The interconnection network 1 distributes the read transaction to all nodes. On receiving the read transaction, xND1 and xND2 each check the cache state of their own node and, the state being Invalid, transmit an 'Inv' message to the interconnection network 2. The interconnection network 2 delivers the 'Inv' messages transmitted from xND1 and xND2 to xND0.
[0019]
(Phase 2)
The xND1 processor issues a read request. It is assumed that the address of the read request is the same as the address of the read request issued by the xND0 processor in (Phase 1). xND1 causes a cache miss. The cache main memory controller issues a read transaction to the interconnection network 1. The interconnection network 1 distributes the read transaction to all nodes. When xND0 and xND2 receive the read transaction, they check the cache state of their own node, and transmit a message 'Inv' indicating the invalid state to the interconnection network 2. The interconnection network 2 delivers this “Inv” message to xND1.
[0020]
(Phase 3)
xND0 continues processing the read request in (Phase 1) and receives data from the main memory. Since xND0 has received the message 'Inv' as a result of cache coherency consistency control from both xND1 and xND2 in (Phase 1), the data from this main memory is registered in the cache memory in an Exclusive-Unmodified state. The data is supplied to the processor to complete the read process.
[0021]
(Phase 4)
xND1 continues processing the read request of (Phase 2) and receives data from the main memory. Since xND1 has also received the message 'Inv' as the result of cache coherency consistency control from both xND0 and xND2 in (Phase 2), the data from the main memory is registered in the cache memory in the Exclusive-Unmodified state. The data is supplied to the processor to complete the read process.
[0022]
As described above, although only one cache in the system may hold a given piece of data in the Exclusive-Unmodified state, the above operation leaves two cache memories holding the data in the Exclusive-Unmodified state. Accordingly, cache coherency may not be guaranteed, depending on the subsequent data write operations of the processors.
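The four phases above can be replayed in a few lines of code. The following is only a minimal sketch under simplifying assumptions: one address, three nodes, and a snoop reply that looks only at the committed cache state. The class `PriorArtNode`, the function `run_race`, and the reply value `'Hit'` are hypothetical names introduced for illustration and do not appear in the prior-art document.

```python
# Sketch of the FIG. 14 race. Assumption: a snoop reply reports only the
# committed cache state, so a node with a read still in flight answers 'Inv'.

INVALID = "Invalid"
EXCLUSIVE_UNMODIFIED = "Exclusive-Unmodified"
SHARED_UNMODIFIED = "Shared-Unmodified"


class PriorArtNode:
    def __init__(self, name):
        self.name = name
        self.state = INVALID   # committed cache state for the single address
        self.replies = []      # snoop replies collected for this node's read

    def snoop(self):
        # The in-flight read of this node is not taken into account here.
        # 'Hit' is a hypothetical placeholder for any non-Invalid reply.
        return "Inv" if self.state == INVALID else "Hit"

    def issue_read(self, others):
        # Phases 1 and 2: broadcast the read and collect the replies.
        self.replies = [node.snoop() for node in others]

    def complete_read(self):
        # Phases 3 and 4: the data arrives from the main memory.
        if all(reply == "Inv" for reply in self.replies):
            self.state = EXCLUSIVE_UNMODIFIED
        else:
            self.state = SHARED_UNMODIFIED


def run_race():
    nd0, nd1, nd2 = (PriorArtNode(n) for n in ("xND0", "xND1", "xND2"))
    nd0.issue_read([nd1, nd2])   # Phase 1
    nd1.issue_read([nd0, nd2])   # Phase 2: xND0 has not committed yet
    nd0.complete_read()          # Phase 3
    nd1.complete_read()          # Phase 4
    return [n.name for n in (nd0, nd1, nd2)
            if n.state == EXCLUSIVE_UNMODIFIED]


print(run_race())  # ['xND0', 'xND1']
```

Both xND0 and xND1 end up as exclusive owners because xND0 answered the snoop of Phase 2 before its own read had been committed.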
[0023]
The above problem arises because a cache memory that is in the middle of reading data from the main memory, and whose cache state is therefore not yet fixed, nevertheless outputs a cache coherency consistency control result in response to a cache coherency consistency control request from another cache memory.
[0024]
An object of the present invention is to provide a multiprocessor system capable of realizing correct cache coherency consistency control even when a plurality of cache memories execute processing for reading data from the main memory close together in time.
[0025]
Another object of the present invention is to provide a multiprocessor system capable of preventing starvation of a cache coherency consistency control request even when cancellation and re-execution of cache coherency consistency control requests are requested in order to maintain cache coherency. Other objects will become apparent from the description below.
[0026]
[Means for Solving the Problems]
In the present invention, each node holds information relating to the read access requests sent in the system. When a cache miss occurs in a node, the node refers to this information to determine whether a read access request has already been made for the required data, and thereby determines whether it can make a read request itself.
[0027]
In the present invention, the information related to the read access request includes status information of the access request transmitted by the own node. When a read access request transmitted from another node is received, it is determined by referring to this information whether the read access request has already been made in the own node, and the result is output.
[0028]
In the present invention, the information related to the read access request includes information about the priority order in which each read access request is made. Using these pieces of information, the next read access request to be made is determined.
[0029]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0030]
FIG. 1 is a conceptual diagram showing an example of the overall configuration of a multiprocessor system according to the first embodiment of the present invention.
[0031]
As illustrated in FIG. 1, the multiprocessor system according to the present embodiment is configured such that a plurality of nodes ND0 to NDn (10) are coupled to each other via an interconnection network A (20) and an interconnection network B (30). A node refers to a processor module that performs data processing independently. In this embodiment, the coupling of the nodes ND0 to NDn is realized by a crossbar switch. However, the present invention is not limited to this, and it goes without saying that a configuration using a bus may be used.
[0032]
FIG. 2 is a conceptual diagram showing the configuration of the node (10). As illustrated in FIG. 2, each node 10 includes a processor 11, a cache main memory control unit 12, a cache memory 13a, a cache tag 13b, a main memory 13c, a request management table 14, a CCC reception unit 15, a transaction reception unit 16, a CCC transmission unit 17, a transaction transmission unit 18, and a starvation management unit 19.
[0033]
The processor 11 performs operations on various data and is constituted by a general-purpose microprocessor. The main memory 13c is loaded with data and programs to be processed by the processor 11. The cache memory 13a temporarily holds data exchanged between the processor 11 and the main memory 13c or the outside of the node 10. The cache main memory control unit 12 controls data transfer and the like for the main memory 13c and the cache memory 13a. The cache tag 13b stores control information used when the cache main memory control unit 12 controls the cache memory 13a. The transaction reception unit 16 and the transaction transmission unit 18 exchange information with other nodes via the interconnection network A (20). The CCC reception unit 15 and the CCC transmission unit 17 receive or transmit the cache coherency consistency control result (CCC) via the interconnection network B (30).
[0034]
Further, each node (10) includes the request management table unit (14), which manages the memory accesses that the node has issued to other nodes via the interconnection network A, and the starvation management section (19), which records the address and the request source of a memory access request that was retried as a result of cache coherency consistency control.
[0035]
In the present embodiment, the cache main memory control unit 12 performs cache coherency consistency control according to the well-known MESI cache coherency protocol. That is, the following cache states are defined: (a) Invalid (the data is invalid), (b) Shared-Unmodified (the data also exists in the cache memory 13a of another processor 11 and is the same as the data in the main memory), (c) Exclusive-Modified (the data exists only in the cache memory 13a and is not the same as the data in the main memory), and (d) Exclusive-Unmodified (the data exists only in the cache memory 13a and is the same as the data in the main memory).
[0036]
When a read request is generated in an arbitrary node and the data is not in the cache memory 13a of the own node (read miss), a read transaction is broadcast to the interconnection network A (20), and the other nodes (ND) receive it. At this time, if the request hits in the cache memory 13a of any node, the data is transferred from that node to the requesting node. On the other hand, if the request does not hit in the cache memory 13a of any node, the data is returned from the main memory that holds the requested data. Further, when a data line targeted for replacement in the cache memory 13a (the operation of evicting existing data to create a free area in the cache memory 13a) is Exclusive-Modified, a write transaction is sent to the interconnection network A so that the data is reflected in the main memory.
[0037]
FIG. 3 is a block diagram of the interconnection network A (20). As illustrated in FIG. 3, the interconnection network A (20) includes switch coupling logic 23 that performs 1:1 or 1:many (broadcast) connection control between the nodes. The switch coupling logic 23 and each node are connected by ports 22a and 22b. By switching the connections between the ports 22a and 22b to which the nodes are connected, 1:1 or 1:many (broadcast) connection control is performed. The ports 22a and 22b include transaction queues 21a and 21b, respectively.
[0038]
FIG. 4 is a block diagram of the interconnection network B (30). As illustrated in FIG. 4, the interconnection network B (30) is composed of a CCC reception queue 31a that receives the CCC signals output from the nodes, an aggregation logic unit (33) that aggregates the CCC signals received from the nodes into the coherency consistency control result for the nodes as a whole, and a CCC transmission queue 31b that transmits the aggregated CCC signal to each node.
[0039]
Here, the following four types are defined as CCC signals output from the node.
'Inv': The requested data (cache line) is invalid (INVALID) in the node.
'Sup': The requested cache line exists in the own cache memory 13a as EXCLUSIVE-MODIFIED; the data is transferred to the request source (SUPPLY), and the transferred cache line is set to INVALID. After receiving the cache line, the requesting node treats it as EXCLUSIVE-MODIFIED.
'Shr': The requested data (cache line) is Shared-Unmodified in the node.
'Rty': Cache coherency consistency control for the requested data (cache line) is being executed by another transaction and the cache state is not fixed, so cancellation and re-execution of the data request are requested.
[0040]
The aggregation logic unit (33) receives the above four types of CCC signals from each node and, once the CCC signals of all nodes except the node that requested the cache coherency consistency control have been collected, determines the aggregation result as follows.
When all the CCC signals are 'Inv': the aggregation result is 'Inv'.
When one or more 'Shr' are included and 'Rty' is not included: the aggregation result is 'Shr'.
When 'Sup' is included: the aggregation result is 'Sup'.
When 'Rty' is included: the aggregation result is 'Rty'. However, when 'Sup' is included at the same time, 'Sup' takes priority and the aggregation result is 'Sup'.
[0041]
The cache is controlled so that combinations other than the above do not appear. The aggregation result output by the aggregation logic unit (33) is broadcast to all nodes via the CCC transmission queue 31b. The CCC signal is processed in order.
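The aggregation rules of the aggregation logic unit (33) can be sketched as a small function. This is an illustrative sketch only: the function name `aggregate_ccc` is hypothetical, and the handling of a reply set containing both 'Sup' and 'Shr' is an assumption, since the description states that such combinations do not appear.

```python
VALID_SIGNALS = {"Inv", "Sup", "Shr", "Rty"}


def aggregate_ccc(signals):
    """Aggregate the CCC signals of all nodes except the requesting node."""
    signals = list(signals)
    if not signals:
        raise ValueError("at least one CCC signal is required")
    unknown = set(signals) - VALID_SIGNALS
    if unknown:
        raise ValueError(f"unknown CCC signal(s): {sorted(unknown)}")

    if "Sup" in signals:   # 'Sup' takes priority, even over 'Rty'
        return "Sup"
    if "Rty" in signals:   # any 'Rty' forces a retry
        return "Rty"
    if "Shr" in signals:   # one or more 'Shr' and no 'Rty'
        return "Shr"
    return "Inv"           # every signal was 'Inv'


print(aggregate_ccc(["Inv", "Rty"]))  # Rty
```

Checking 'Sup' first and 'Rty' second reproduces the priority stated in the rules; the order of the remaining two checks follows from them directly.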
[0042]
FIG. 5 is a configuration diagram of the request management table unit 14. As shown in FIG. 5, the request management table unit 14 is composed of a plurality of registers 14b that hold addresses of access requests made by the cache main memory control unit 12 of the own node with respect to the memory outside the own node. The cache main memory control unit 12 performs address registration, address registration cancellation, and the like via the path 14a.
[0043]
FIG. 6 is a configuration diagram for explaining the transaction reception unit 16. As shown in FIG. 6, the transaction reception unit 16 receives a transaction transmitted from another node from the interconnection network A via the bus 16a. The received transaction is transmitted to the cache main memory control unit 12. The transaction reception unit 16 includes a queue 16d that stores transactions requesting cache coherency consistency control, a write pointer 16f of the queue 16d, and a read pointer 16e.
[0044]
The write pointer 16f is used for control that sequentially stores transactions requiring cache coherency consistency control in the queue 16d. The read pointer 16e is used for control that, on receiving from the CCC reception unit 15 a signal 15c indicating that a CCC signal has arrived, transmits the transaction corresponding to that CCC signal to the starvation management unit 19 via the path 16c. As described with reference to FIG. 4, this CCC signal is the aggregated result of the CCC signals from the nodes.
[0045]
In this embodiment, by referring to this request management table, a node can provide other nodes with status information on a read transaction of the own node, such as whether it has been issued, has not been issued, or is being processed.
[0046]
FIG. 7 is a configuration diagram of the starvation management unit 19. As shown in FIG. 7, the starvation management unit 19 includes a starvation register 19c, a starvation register read control unit 19d, and a starvation register write control unit 19e. The starvation register 19c has, for each node, a register 19cb that holds an address to which data requested by a read transaction is accessed, and a valid bit 19ca that indicates the validity of the held value.
[0047]
The starvation register write control unit 19e in the present embodiment mainly manages transactions transmitted from other nodes. A CCC signal is received from the path 15b, and a transaction corresponding to the CCC signal is received from the path 16c. At this time, the starvation register write control unit 19e operates as follows.
When the CCC signal is Rty: The transaction source node is determined from the information embedded in the transaction received from the path 16c, and the valid bit 19ca corresponding to that node is read. If the valid bit is not lit, it is lit and the address embedded in the transaction is written to the register 19cb. A lit valid bit indicates that this node is waiting for a retry for the target address. A node waiting for a retry is a node that is to retry its read transaction for the data next, after the read transaction of another node currently being processed is completed.
When the CCC signal is other than Rty: The transaction source node is determined from the information embedded in the transaction received from the path 16c, and the valid bit 19ca and the register 19cb corresponding to that node are read. If the valid bit is lit, the value read from the register 19cb is compared with the address embedded in the transaction. If they match, the valid bit 19ca is turned off. A CCC signal other than Rty means that the retry of the transaction by the issuing node has succeeded. Turning off the valid bit 19ca therefore cancels the designation of that node as the node to be retried next.
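The two cases handled by the starvation register write control unit 19e can be sketched as follows. This is a minimal sketch, assuming one entry per node indexed by node number; the class name `StarvationRegister` and the method name `record` are hypothetical and are not taken from the description.

```python
class StarvationRegister:
    """One (valid bit 19ca, address register 19cb) pair per node."""

    def __init__(self, num_nodes):
        self.valid = [False] * num_nodes    # valid bits 19ca
        self.address = [None] * num_nodes   # address registers 19cb

    def record(self, ccc, source_node, address):
        """Update the entry of the transaction source for an aggregated CCC signal."""
        if ccc == "Rty":
            # Remember only the first address a node is waiting on.
            if not self.valid[source_node]:
                self.valid[source_node] = True
                self.address[source_node] = address
        elif self.valid[source_node] and self.address[source_node] == address:
            # The retried transaction succeeded: the node no longer waits.
            self.valid[source_node] = False


reg = StarvationRegister(3)
reg.record("Rty", 1, 0x1000)   # node 1 was asked to retry address 0x1000
print(reg.valid)               # [False, True, False]
reg.record("Inv", 1, 0x1000)   # the retry of node 1 succeeded
print(reg.valid)               # [False, False, False]
```

Because every node receives the same aggregated CCC signals in order, each node applying this update independently keeps an identical copy of the register.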
[0048]
The starvation register read control unit 19d of the present embodiment mainly performs processing necessary when issuing a read transaction after the own node makes a cache miss. That is, when a cache miss occurs with respect to data requested from the processor, the status of each node regarding the data is checked. The operation of the starvation register read control unit 19d at that time is as follows.
[0049]
The starvation register read control unit 19d receives, from the cache main memory control unit 12 via the path 19b, the address that is the target of the read transaction to be transmitted. First, the valid bit 19ca and the register 19cb corresponding to the own node are read. If the valid bit is off, or if the address in the register 19cb of the own node does not match the address received via the path 19b, this indicates that no read transaction has been issued from the own node for the target data, or that the own node is not waiting for a retry. Next, the registers 19cb of the nodes other than the own node are searched for an entry that matches the address received via the path 19b. If there is a matching register and its valid bit 19ca is lit, the node of that register is waiting for a retry for the address. Therefore, a signal indicating that the processor 11 should retry the access to the address received via the path 19b is transmitted to the cache main memory control unit 12 via the path 19a. The cache main memory control unit 12 notifies the processor 11 to that effect.
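The check performed by the starvation register read control unit 19d before a read transaction is issued can be sketched as a single predicate. The function name `must_retry` and the list-based representation of the valid bits and address registers are assumptions made for illustration. Letting the own node proceed when its own entry matches is inferred from the behavior of ND1 in (Phase 5); the paragraph above does not state it explicitly.

```python
def must_retry(valid_bits, addresses, own_node, address):
    """Return True if the processor should retry instead of issuing a read.

    valid_bits[i] and addresses[i] model the valid bit 19ca and the
    address register 19cb of node i.
    """
    # Step 1: is the own node itself the one waiting on this address?
    if valid_bits[own_node] and addresses[own_node] == address:
        return False   # assumption: the waiting node may issue its retry

    # Step 2: is any other node waiting for a retry on this address?
    return any(
        valid and waiting_on == address
        for node, (valid, waiting_on) in enumerate(zip(valid_bits, addresses))
        if node != own_node
    )


# State after ND1 was asked to retry address 0x1000:
valid = [False, True, False]
addrs = [None, 0x1000, None]
print(must_retry(valid, addrs, own_node=2, address=0x1000))  # True
print(must_retry(valid, addrs, own_node=1, address=0x1000))  # False
```

A request for a different address is unaffected, so the suppression applies only to the contended cache line.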
[0050]
Next, the types of transactions used in this embodiment will be described with reference to FIG. 8. Information exchange between nodes is basically performed by sending to, or receiving from, the interconnection network A (20) a sequence of time-series data units (transactions) having the 64-bit width illustrated in FIG. 8, in units of the operation cycle of the interconnection network A (20). The data structure of each transaction includes an area (TYPE) indicating the type of the transaction, an area (PORT) storing identification information of a node related to the transaction (request source, transfer destination, etc.), an area (MISC) storing information used when the transaction is processed at its destination, an area (ADDRESS) storing a read access target address, and an area (DATA) storing data to be transmitted as needed. Next, the data structure of each transaction will be described.
[0051]
FIG. 8A shows a read transaction that is broadcast to the memory module and all the processor modules. A bit pattern indicating that this transaction is a read transaction is set in the 8-bit TYPE field. In the next 8-bit PORT field, a specific bit pattern is set that gives the port number of the node holding the requested main memory and, at the same time, instructs the interconnection network A (20) to broadcast the transaction to all other nodes. In the next 16-bit MISC field, identification information such as the port number of the requesting node is set. In the remaining 32-bit ADDRESS field, the read target address is set.
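The 64-bit layout of the read transaction (8-bit TYPE, 8-bit PORT, 16-bit MISC, 32-bit ADDRESS) can be illustrated by packing the fields into one integer. This is only a sketch: the type code `TYPE_READ = 0x01`, the PORT pattern `0xFF` used in the example, and the placement of TYPE in the most significant bits are all assumptions, since the description does not give the actual bit patterns or bit order.

```python
TYPE_READ = 0x01   # hypothetical type code; the real pattern is not specified


def pack_read_transaction(port_pattern, requester_port, address,
                          type_code=TYPE_READ):
    """Pack TYPE(8) | PORT(8) | MISC(16) | ADDRESS(32) into a 64-bit word."""
    for value, bits, name in ((type_code, 8, "TYPE"), (port_pattern, 8, "PORT"),
                              (requester_port, 16, "MISC"), (address, 32, "ADDRESS")):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return (type_code << 56) | (port_pattern << 48) | (requester_port << 32) | address


def unpack_transaction(word):
    """Split a 64-bit transaction word back into its four fields."""
    return {
        "type": (word >> 56) & 0xFF,
        "port": (word >> 48) & 0xFF,
        "misc": (word >> 32) & 0xFFFF,
        "address": word & 0xFFFFFFFF,
    }


word = pack_read_transaction(port_pattern=0xFF, requester_port=1, address=0x1000)
print(hex(word))  # 0x1ff000100001000
```

Unpacking the word returns the same four field values, which makes the field boundaries easy to check.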
[0052]
FIG. 8B shows a return transaction for returning the data read from the main memory to the request source. A specific bit pattern indicating a return transaction is set in the TYPE field, the port number of the read request source node is set in the PORT field, and a parameter such as the data length (number of cycles) is set in the MISC field. The remaining fields are unused.
[0053]
FIG. 8C shows a transfer transaction for transferring data to the read request source when, in response to a read request, the cache memory 13a of another node holds the latest updated data. A specific bit pattern indicating a transfer transaction is set in the TYPE field, the port number of the transfer destination node is set in the PORT field, and a parameter such as the data length (number of cycles) is set in the MISC field. The remaining fields are unused.
[0054]
FIG. 8D shows a write transaction requesting a memory write. A specific bit pattern indicating a write transaction is set in the TYPE field, the port number of the node holding the requested main memory is set in the PORT field, and a parameter such as the data length (number of cycles) is set in the MISC field. The remaining fields are unused.
[0055]
Next, an example of the operation and transaction control of the multiprocessor system according to the present embodiment will be described with reference to the timing charts of FIGS. 9, 10, and 11. In FIGS. 9, 10, and 11, the nodes related to the operation, or the components in those nodes, are arranged in the vertical direction, and the horizontal direction shows the time axis of the operation of each node. The time axis is divided into phases 1 to 6, one for each group of processes performed. The multiprocessor system in this embodiment is assumed to have three nodes, ND0 to ND2. Also, as an initial state, it is assumed that nothing is registered in the request management table 14 of any node, and that all the valid bits 19ca of the starvation management unit 19 are turned off.
[0056]
(Phase 1)
In FIG. 9, the processor 11 of ND0 issues a read request. The cache state of the requested data is Invalid, which causes a cache miss. The cache main memory control unit 12 registers the address in the request management table 14 and issues a read transaction to the interconnection network A. The interconnection network A distributes the read transaction issued by ND0 to all nodes. Upon receiving the read transaction, ND1 and ND2 check the cache state of their own node and send a message 'Inv', indicating the Invalid state, to the interconnection network B. The interconnection network B receives the 'Inv' messages transmitted from ND1 and ND2, aggregates them, and distributes the aggregation result, an 'Inv' message, to all the nodes.
[0057]
(Phase 2)
The processor 11 of ND1 issues a read request. It is assumed that the address of this read request is the same as the address of the read request issued by the processor 11 of ND0 in (Phase 1). Phase 2 therefore shows the specific processing performed when a read request for the same address is issued while a read transaction is being processed.
[0058]
ND1 causes a cache miss. The cache main memory control unit 12 registers the address in the request management table 14 and issues a read transaction to the interconnection network A. The interconnection network A distributes the read transaction issued by ND1 to all nodes. ND2 receives the read transaction at the transaction reception unit 16. Its cache main memory control unit 12 then checks the cache state of its own node and transmits a message 'Inv', indicating the Invalid state, from the CCC transmission unit 17 to the interconnection network B.
[0059]
On the other hand, when ND0 receives this read transaction at the transaction reception unit 16, the cache main memory control unit 12 checks the request management table 14 to see whether the address of the requested data is registered. Since ND0 sent a read transaction for this address in phase 1, the address is registered, and ND0 finds that a read process of its own node for the same address is being executed. Since the read process of ND0 is in the middle of execution and the cache state is not fixed, ND0 cannot report a cache state in response to the read request of ND1.
[0060]
To solve this problem, one conceivable method is that, when a cache memory 13a is executing a process of reading data from the main memory and its cache state is not fixed, it requests cancellation and re-execution (retry) of a cache coherency consistency control request received from another cache memory 13a.
[0061]
With this method, however, the following problem occurs. For example, assume that a first cache memory 13a is executing a process of reading data from the main memory and requests a retry in response to a cache coherency consistency control request from a second cache memory 13a. Suppose that, immediately after the processing of the first cache memory 13a is completed and before the second cache memory 13a re-executes its cache coherency consistency control request, a third cache memory 13a issues a cache coherency consistency control request. The second cache memory 13a, which re-executes its cache coherency consistency control request later than this, is then requested to retry by the third cache memory 13a. If this is repeated, the cache coherency consistency control request of the second cache memory 13a may never be processed, that is, starvation may occur. The solution that the present invention provides for this problem is presented in (Phase 4).
[0062]
As a result of the cache coherency consistency control, ND0 transmits a message “Rty” requesting cancellation and re-execution (retry) of ND1 from the CCC transmission unit 17 to the interconnection network B. In this way, each node notifies other nodes of the information regarding whether or not the read transaction transmitted by the own node is in a processing state.
[0063]
Next, the interconnection network B aggregates the “Inv” message issued by ND2 and the “Rty” message issued by ND0, and distributes the aggregation result “Rty” message to all nodes.
[0064]
The CCC reception unit 15 of each node receives this CCC signal and outputs it to the starvation management unit 19. The read transaction issued by ND1 is also transmitted to the starvation management unit 19 as described above, and the starvation register write control unit 19e processes them. The starvation register write control unit 19e lights the valid bit 19ca of ND1 in the starvation register 19c and registers the address of the read transaction in the register 19cb. In this way, by referring to the starvation management unit 19, every node, including ND0, which issued the read transaction being processed, and ND1, which issued the read transaction that was requested to retry, can recognize that the read transaction of ND1 is to be processed next for the stored address once the read transaction of ND0 currently being processed is completed.
[0065]
On the other hand, when the ND1 receives the “Rty” message, it requests the processor 11 to re-execute (retry) the read request, deletes the registration of the address in the request management table 14, and completes the read process. Thereafter, when the processor 11 retries, the read transaction is executed again.
[0066]
As described above, while ND0 is executing the read process, it returns 'Rty' in response to a read request from another node for an address registered in the request management table 14. This prevents an erroneous cache coherency consistency control result from being returned for an address for which the state of the cache memory 13a has not been fixed, and maintains cache coherency correctly.
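How a node chooses its CCC signal when it receives a read transaction from another node can be sketched as follows. This is a minimal sketch, assuming the request management table 14 is modeled as a set of addresses and the cache state as a dictionary; the function name `snoop_response` is hypothetical. The handling of the Exclusive-Unmodified state (invalidate, then answer 'Inv') follows what ND0 does in (Phase 5).

```python
def snoop_response(pending_reads, cache, address):
    """Choose the CCC signal for a read transaction received from another node.

    pending_reads models the request management table 14 (addresses with a
    read of the own node in flight); cache maps address -> MESI state.
    """
    if address in pending_reads:
        return "Rty"                    # own cache state is not fixed yet

    state = cache.get(address, "Invalid")
    if state == "Exclusive-Modified":
        cache[address] = "Invalid"      # the line is supplied to the requester
        return "Sup"
    if state == "Exclusive-Unmodified":
        cache[address] = "Invalid"      # give up the exclusive copy
        return "Inv"
    if state == "Shared-Unmodified":
        return "Shr"
    return "Inv"


# ND0 in Phase 2: its own read of 0x1000 is still in flight.
print(snoop_response({0x1000}, {}, 0x1000))  # Rty
```

Consulting the request management table before the cache tag is what distinguishes this response from the prior-art behavior, where an in-flight read would have been reported as 'Inv'.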
[0067]
(Phase 3)
ND0 continues the processing of the read request in (Phase 1), registers the data in the main memory in the cache memory 13a in the Exclusive-Unmodified state, and supplies the data to the processor 11. Further, the registration of the address in the request management table 14 is deleted, and the read process is completed.
[0068]
(Phase 4)
FIG. 10 illustrates the continuation of the processing of FIG. 9. At the start of the timing chart of FIG. 10, the valid bit 19ca corresponding to ND1 in the starvation register 19c of every node is lit, and the address of the read requested by the processor 11 of ND1 in (Phase 2) is registered in the register 19cb.
[0069]
Here, the processor 11 of ND2 issues a read request. It is assumed that the read request address is the same as the read request address issued by the processor 11 of ND0 in (Phase 1) and the read request address issued by the processor 11 of ND1 in (Phase 2). This phase therefore shows the case where, while one node is waiting for a retry, the processor 11 of another node makes a request for the same address.
[0070]
ND2 causes a cache miss. The cache main memory control unit 12 of ND2 transmits the read request address to the starvation management unit 19 via the path 19b. Here, as described with reference to FIG. 7, the starvation register read control unit 19d of the starvation management unit 19 searches the starvation register 19c for the address received from the path 19b. The valid bit corresponding to ND2 is not lit, the valid bit corresponding to ND1 is lit, and the address registered in the register 19cb of ND1 matches the address received from the path 19b.
[0071]
Based on the above determination, the starvation management unit 19 of ND2 sends, via the path 19a and the cache main memory control unit 12, a signal requesting the processor 11 to cancel and re-execute (retry) the access to the address. The cache main memory control unit 12 of ND2, having received the retry request from the path 19a, requests the processor 11 to re-execute the read request without issuing a read transaction to the interconnection network A, and completes the read process.
[0072]
(Phase 5)
Next, it is assumed that the processor 11 of ND1, which was requested to retry its read request, issues a read request for the same address again. This phase shows the processing in which the node whose read transaction is to be processed next performs its retry.
[0073]
ND1 causes a cache miss. The cache main memory controller registers the address in the request management table 14 and issues a read transaction toward the interconnection network A.
[0074]
The interconnection network A distributes the read transaction to all nodes. When the ND 2 receives the read transaction, the ND 2 checks the cache state of its own node and transmits a message “Inv” indicating that it is in the Invalid state to the interconnection network B.
[0075]
The ND0 that has received the read transaction checks the cache state of its own node, and recognizes that the requested data is registered in the Exclusive-Unmodified state. The cache main memory control unit 12 of the ND0 changes the cache state to Invalid, and then transmits an “Inv” message to the interconnection network B as a cache coherency consistency control result. Further, ND0 determines that the address requested by the read transaction requests data in the main memory of the own node, reads the data from the main memory, and transmits it to the interconnection network A.
[0076]
The interconnection network B aggregates the “Inv” messages of ND0 and ND2, and distributes the aggregation result “Inv” message to all nodes.
[0077]
At this time, the 'Inv' message and the read transaction issued by ND1 are transmitted to the starvation management unit 19 of each node. The starvation management unit 19 of each node determines that the valid bit 19ca of the ND1 of the starvation register 19c is lit and that the address registered in the register 19cb matches the address of the read transaction. The valid bit 19ca of ND1 is turned off.
[0078]
ND1, which receives the “Inv” message from the interconnection network B and the data from ND0, registers the data in the cache memory 13a in the Exclusive-Unmodified state and supplies the data to the processor 11. Further, the registration of the address in the request management table 14 is deleted, and the read process is completed.
[0079]
In this way, the address for which a cache coherency consistency control result of 'Rty' was received and the source of that memory access request are registered in the starvation management unit 19, and memory accesses to the address from nodes other than the request source are suppressed. As a result, the priority of the request source is raised and starvation of its request is prevented.
[0080]
(Phase 6)
FIG. 11 illustrates the continuation of the process of FIG. 10. At the start of the timing chart of FIG. 11, all valid bits 19ca of the starvation registers of all nodes are turned off, and no address is registered in the request management table 14 of any node.
[0081]
The processor 11 of ND2, which was requested to retry its read request, again issues a read request for the same address. This phase shows the retry of a node that refrained from issuing a read transaction because another read transaction was to be processed next.
[0082]
ND2 causes a cache miss. In (Phase 4), in response to the retry instruction from the starvation management unit 19, the cache main memory control unit 12 of ND2 requested the processor 11 to re-execute the read request without issuing a read transaction to the interconnection network A, and completed the read process. This time, since no valid bit of the starvation register is lit, the cache main memory control unit 12 registers the address in the request management table 14 and issues a read transaction to the interconnection network A.
[0083]
The interconnection network A distributes the read transaction to all nodes.
[0084]
When ND0 receives the read transaction, it checks the cache state of its own node and sends an “Inv” message indicating the invalid state to the interconnection network B. ND1, on receiving the read transaction, checks the cache state of its own node and determines that the requested data is registered in the Exclusive-Unmodified state. The cache main memory control unit 12 of ND1 changes the cache state to Invalid and then transmits an “Inv” message to the interconnection network B as the cache coherency control result. Further, ND0 determines that the address requested by the read transaction refers to data in the main memory of its own node, reads the data from the main memory, and transmits it to the interconnection network A.
[0085]
The interconnection network B aggregates the “Inv” messages of ND0 and ND1, and distributes the aggregation result “Inv” message to all nodes.
[0086]
ND2, on receiving the “Inv” message from the interconnection network B and the data from ND0, registers the data in the cache memory 13a in the Exclusive-Unmodified state and supplies the data to the processor 11. Further, the registration of the address in the request management table 14 is deleted, and the read process is completed.
[0087]
As described above, in this embodiment, each node manages the read transactions it has issued by using the request management table 14. By using this request management table 14, a node that receives a read transaction from another node can report whether it has a transaction of its own outstanding. In the interconnection network, for each read transaction issued from one of the nodes, the information on the cache state of each node is aggregated, it is determined whether a transaction issued by any of the nodes is being processed, and the determination result is notified to each node. This makes it possible to prevent a situation in which, while a read transaction of one node is being processed, a read transaction is issued from another node and the cache memory responds to that other node.
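The per-node response and the aggregation in the interconnection network B can be illustrated with a short sketch. It assumes that only the two CCC values appearing in this part of the description ('Inv' and 'Rty') exist, that a single 'Rty' makes the aggregated result 'Rty', and that only nodes other than the requester respond. The function names are hypothetical.

```python
from typing import Iterable, Set


def node_ccc_response(request_table: Set[int], address: int) -> str:
    """CCC signal of a node that receives another node's read transaction.

    request_table stands in for request management table 14: the set of
    addresses for which this node has its own read transaction outstanding.
    """
    return "Rty" if address in request_table else "Inv"


def aggregate_ccc(responses: Iterable[str]) -> str:
    """Aggregation as assumed here: any 'Rty' wins, otherwise 'Inv'."""
    responses = list(responses)
    if not responses:
        raise ValueError("at least one CCC response is required")
    return "Rty" if "Rty" in responses else "Inv"


# ND0 has a read outstanding for 0x100; ND1 has nothing outstanding.
tables = {0: {0x100}, 1: set()}
result = aggregate_ccc(node_ccc_response(t, 0x100) for t in tables.values())
print(result)
```

The aggregated value is what each node would receive from the interconnection network B and use to update its starvation register.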
[0088]
In the present embodiment, the notification from the interconnection network is registered together with the address in the starvation management unit 19 of each node, so that the node that is to perform the next read transaction can be determined. Furthermore, each node can use the starvation management unit 19 to determine whether or not it should issue a read transaction itself. In this way, when there is a read transaction being processed, the node to be processed next is determined, and the other nodes do not issue read transactions for the same address until the processing of that node is completed, so that starvation of processing can be prevented.
[0089]
In the present embodiment, when the request management table 14 is referred to and 'Rty' is output as the CCC signal of the own node, registration in the starvation register 19c may be performed at the same time. That is, when even one “Rty” is output, every node needs to recognize that fact, so the “Rty” signal may be output to the other nodes as it is, without outputting the aggregation result. Alternatively, a configuration may be adopted in which the data is immediately written to the starvation register 19c of the own node.
[0090]
Next, a second embodiment will be described. In the first embodiment, the request management table 14 is referred to for the transactions issued by the own node; however, the starvation management unit 19 may be used for this purpose as well. In this case, the request management table 14 may be eliminated entirely and all processing performed with the request management table 14 may be controlled by the starvation management unit 19, or the starvation management unit 19 may be used only when deciding which CCC signal to output when a transaction from another node is received.
[0091]
In the former configuration, all the registration and deletion of addresses in the request management table in FIGS. 9 to 11 are performed by the starvation management unit 19. In this case, in addition to the valid bit 19ca, a pending bit indicating that processing is in progress is stored for each node. When the received CCC signal is “Inv”, there is another address, and the valid bit 19ca of the register is not set, the address is stored in the register of the own node and the corresponding pending bit is set. In this way, each node registers that a transaction is currently being processed. When the transaction issued by the node ends, the pending bit is cleared and the register value is deleted.
[0092]
In the latter case, each node only needs to know whether the own node is processing. Therefore, instead of the former configuration, when the address is stored in the request management table 14 as in the first embodiment, the address is also stored in the own node's register of the starvation register and the pending bit is set.
[0093]
Next, cache coherency control in each node will be described for a configuration in which both the node being processed and the node waiting for a retry are registered in the starvation register. The present embodiment differs in that, when a transaction from another node is received in phase 2 of FIG. 9, the starvation register is used to identify whether or not a transaction of the own node is being processed. If the pending bit is set for the own node for the same address as the transaction from the other node, the 'Rty' signal is output as the CCC signal. With such a configuration, it is possible to prevent the cache memory from responding to a cache coherency control request from another cache memory even though the own node's transaction is still being processed.
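As a rough illustration of the second embodiment, the sketch below adds a pending bit to each node's entry and derives the CCC response from it. The names `NodeEntry`, `begin_transaction`, `end_transaction`, and `ccc_response` are hypothetical, and sharing one address field between the valid and pending bits is an assumption made for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeEntry:
    valid: bool = False             # node is waiting for a retry (valid bit 19ca)
    pending: bool = False           # node's own transaction is being processed
    address: Optional[int] = None   # address the entry refers to


def begin_transaction(entries: List[NodeEntry], node: int, address: int) -> None:
    # Register that the node's read transaction is now in progress.
    entries[node].pending = True
    entries[node].address = address


def end_transaction(entries: List[NodeEntry], node: int) -> None:
    # The transaction has finished: clear the pending bit and the address.
    entries[node].pending = False
    entries[node].address = None


def ccc_response(entries: List[NodeEntry], own_node: int, address: int) -> str:
    # Answer 'Rty' when our own transaction for this address is in flight.
    own = entries[own_node]
    if own.pending and own.address == address:
        return "Rty"
    return "Inv"


entries = [NodeEntry() for _ in range(3)]
begin_transaction(entries, node=0, address=0x100)
print(ccc_response(entries, own_node=0, address=0x100))
end_transaction(entries, node=0)
print(ccc_response(entries, own_node=0, address=0x100))
```

Here the pending bit takes over the role that the request management table 14 plays in the first embodiment when a transaction from another node arrives.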
[0094]
Further, in the present embodiment, in the former configuration, that is, a configuration in which the read transaction being processed can be registered in the starvation registers of all nodes, the own node can, when issuing a transaction in phase 2 of FIG. 9, refer to the starvation register and recognize that there is a read transaction already being processed. Therefore, by having the own node refer to the starvation register before issuing a transaction, the second transaction can be prevented from being issued. In this case, although starvation of the node waiting for the retry cannot be prevented, unnecessary transactions in the interconnection network can be reduced.
[0095]
Next, a third embodiment will be described. In the present embodiment, it is assumed that a plurality of nodes waiting for retry can be registered in the starvation register. That is, the starvation management unit 19 is configured to have an area for assigning priorities to the entries of each node. In this way, a plurality of read transactions to be processed are registered. Next, processing in the present embodiment will be described.
[0096]
First, in order to enable registration of a plurality of nodes, in this embodiment, when a request is issued from the processor 11, a read transaction is issued to the other nodes instead of requesting a retry from the processor 11, even if a node waiting for a retry is registered in the starvation register. In addition, in the present embodiment, when a transaction from another node is received, not only the request management table 14 but also the starvation register of the starvation management unit 19 is referred to.
[0097]
For example, in the case of phase 4 in FIG. 10, when ND2 causes a cache miss, referring to the starvation register reveals that ND1 is waiting for a retry. In this case, when the read transaction is issued, ND0 checks the request management table 14 and its own node's register in the starvation register, and determines whether the own node is processing or waiting for a retry. In the case of ND0, since it is neither processing nor waiting for a retry, an “Inv” signal is output as the CCC signal. ND1 finds, by referring to the request management table 14, that its own node is not processing, but finds, by referring to the starvation register, that its own node is waiting for a retry. In this case, ND1 outputs the “Rty” signal as the CCC signal. The “Rty” signal is then notified as the aggregation result of the interconnection network, and each node can determine that ND2 is also waiting for a retry. When ND2 is registered in the starvation register as waiting for a retry, the number of nodes waiting for a retry currently registered in the starvation register is counted, and the last priority is registered in the bits that store the priority for ND2. With such a configuration, a plurality of nodes waiting for a retry can be registered.
[0098]
Next, processing when a node that is waiting for a retry performs the retry will be described. When a plurality of nodes waiting for a retry can be registered with priorities, starvation cannot be prevented unless processing is performed in accordance with the priority order of the waiting nodes. First, when a retry is requested from the processor 11 of the own node, the starvation register is referred to. In the starvation register according to the present embodiment, the nodes waiting for a retry, including the own node, are registered together with their priorities. Therefore, even if other nodes waiting for a retry are registered, a read transaction is output if the own node has the highest priority. On the other hand, when another node has a higher priority than the own node, a retry instruction is issued to the processor as in phase 4 of the first embodiment. With this configuration, starvation among a plurality of nodes waiting for a retry can be prevented in the present embodiment.
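The priority handling of the third embodiment can be sketched with an ordered list per address, where the position in the list stands in for the stored priority (0 is the highest). The class and method names are hypothetical. Promoting the remaining nodes when the highest-priority node completes is an assumption, since the description only states how a new node receives the last priority.

```python
from collections import defaultdict
from typing import DefaultDict, List


class PriorityStarvationRegister:
    """Hypothetical model: waiting nodes per address, in priority order."""

    def __init__(self) -> None:
        self._waiting: DefaultDict[int, List[int]] = defaultdict(list)

    def register_waiting(self, node: int, address: int) -> int:
        # A newly registered node gets the last priority, which equals the
        # number of nodes already waiting for this address.
        queue = self._waiting[address]
        if node not in queue:
            queue.append(node)
        return queue.index(node)

    def may_issue_retry(self, node: int, address: int) -> bool:
        queue = self._waiting.get(address, [])
        # An unregistered node issues its transaction so it can be registered;
        # a registered node issues only when it has the highest priority.
        return node not in queue or queue[0] == node

    def complete(self, node: int, address: int) -> None:
        # Assumed behavior: removing a node promotes the nodes behind it.
        queue = self._waiting.get(address, [])
        if node in queue:
            queue.remove(node)


reg = PriorityStarvationRegister()
reg.register_waiting(node=1, address=0x100)   # ND1 waits first
reg.register_waiting(node=2, address=0x100)   # ND2 gets the last priority
print(reg.may_issue_retry(2, 0x100))  # ND2 must keep waiting
reg.complete(node=1, address=0x100)
print(reg.may_issue_retry(2, 0x100))  # ND2 is now first in line
```

Keeping the order in a list avoids renumbering stored priorities, which a hardware register with priority bits would need to handle explicitly.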
[0099]
[Effect of the invention]
According to the present invention, it is possible to obtain a multiprocessor system capable of realizing correct cache coherency control even when a plurality of cache memories execute processing of reading data from the main memory at close points in time.
[0100]
In addition, according to the present invention, it is possible to obtain a multiprocessor system that can prevent starvation of a cache coherency control request even if cancellation and re-execution of the request are required in order to maintain cache coherency.
[0101]
Further, according to the present invention, while a processor that has received a retry request re-executes its memory access, memory access requests from other processors to the same address can be prevented from being issued to the interconnection network, which also has the effect of reducing the number of transactions processed by the interconnection network.
[Brief description of the drawings]
FIG. 1 is a configuration diagram showing an example of the overall configuration of a multiprocessor system according to an embodiment of the present invention.
FIG. 2 is a configuration diagram illustrating in detail an example of the configuration of a node in a multiprocessor system according to an embodiment of the present invention.
FIG. 3 is a configuration diagram illustrating in detail an example of the configuration of an interconnection network A in a multiprocessor system according to an embodiment of the present invention.
FIG. 4 is a configuration diagram illustrating in detail an example of the configuration of an interconnection network B in a multiprocessor system according to an embodiment of the present invention.
FIG. 5 is a configuration diagram illustrating an example of a request management table unit in a node in a multiprocessor system according to an embodiment of the invention.
FIG. 6 is a configuration diagram illustrating an example of a transaction receiving unit in a node in the multiprocessor system according to the embodiment of the invention.
FIG. 7 is a configuration diagram illustrating an example of a starvation management unit in a node in a multiprocessor system according to an embodiment of the invention.
FIG. 8 is a configuration diagram showing an example of a configuration of a transaction used in a multiprocessor system according to an embodiment of the present invention.
FIG. 9 is a timing chart showing an example of the operation of the memory read process in one embodiment of the present invention (phases 1 to 3).
FIG. 10 is a timing chart showing an example of the operation of the memory read process in one embodiment of the present invention (phases 4 to 5).
FIG. 11 is a timing chart showing an example of the operation of the memory read process in the embodiment of the present invention (phase 6).
FIG. 12 is a configuration diagram showing an example of the configuration of the entire multiprocessor system for explaining the prior art.
FIG. 13 is a timing chart showing the operation of the memory read process in the multiprocessor system for explaining the prior art.
FIG. 14 is a timing chart showing an operation when two processors perform a read process on the same address close to each other in time in a multiprocessor system for explaining the related art.
[Explanation of symbols]
10 ... Node (ND0 to NDn)
11 ... Processor
12: Cache main memory control unit
13a ... Cache memory
13b ... Cache tag
13c ... Main memory
14 ... Request management table section
14a: Path between the cache main memory control unit and the request management table
14b ... Address registration register
15 ... CCC receiver
15a: Path from the interconnection network B to the CCC receiver
15b: Path from the CCC receiver to the cache main memory controller and the starvation manager
15c: Path from CCC receiver to transaction receiver
16 ... Transaction receiver
16a: Path from the interconnection network A to the transaction receiver
16b: Path from the transaction receiver to the cache main memory controller
16c: Path from the transaction receiver to the starvation manager
16d: Queue holding transactions requiring cache coherency consistency control
16e: Queue (16d) read pointer
16f: Queue (16d) write pointer
17 ... CCC transmitter
17a: Path from the CCC transmitter to the interconnection network B
17b: Path from the cache main memory control unit to the CCC transmission unit
18 ... Transaction transmitter
18a: Path from the transaction transmitter to the interconnection network A
18b: Path from the cache main memory control unit to the transaction transmission unit
19 ... Starvation management unit
19a: Path from the starvation management unit to the cache main memory control unit
19b: Path from the cache main memory control unit to the starvation management unit
19c ... Starvation register
19ca ... Starvation register valid register
19cb ... Starvation register address register
19d: Starvation register read control unit
19e: Starvation register write controller
20: Interconnection network A (XB1)
20a: Read transaction
20b ... Return transaction
20c ... Transfer transaction
20e ... Write transaction
21a ... Transaction reception queue
21b ... Transaction transmission queue
22a ... Transaction receiving port
22b ... Transaction transmission port
30 ... Interconnection network B (XB2)
31a: CCC signal reception queue
31b ... CCC signal transmission queue
32a: Path from the CCC signal reception queue to the aggregation logic unit
32b: Path from the aggregation logic unit to the CCC signal transmission queue
33 ... Aggregation logic unit
100 ... node
101 ... Processor
102: Cache main memory control unit
103: Cache memory
104 ... Cache tag
105 ... main memory
111 ... Interconnection network 1
112 ... Interconnection network 2

Claims

In a multiprocessor system having a plurality of nodes, an interconnection network connecting the plurality of nodes, and a main memory accessed by the plurality of nodes,
Each of the plurality of nodes is
A cache memory for holding data exchanged with the main memory;
A first storage unit that stores a request-destination main memory address of a first read request made by the node to the main memory;
A second storage unit that stores, when a node makes a second read request to the main memory, status information indicating that the second read request is waiting for a retry because the first read request to the same main memory address, executed before the second read request, has not been completed, together with request source information of the second read request and the request-destination main memory address of the second read request;
A first communication unit that transmits and receives a transaction based on a read request to the main memory, and cache state information indicating the validity of the node's cache data for data requested to be read by another node;
The interconnection network is
A second communication unit that receives the transaction and the cache state information transmitted from each of the plurality of nodes;
A logic unit that aggregates the cache status information received by the second communication unit;
The second communication unit transmits the aggregation result aggregated by the logic unit and the transaction to all nodes,
Each of the plurality of nodes further has a control unit that, when the data of a third read request of the node is not in the cache memory of the node, refers to the information stored in the second storage unit, determines whether the second read request having the same main memory address as that of the third read request is waiting for a retry, and determines whether or not to perform the third read request. A multiprocessor system characterized by the above.

The multiprocessor system of claim 1, wherein
The first communication unit of each node receives the transaction and the aggregation result from the interconnection network;
A multiprocessor system, wherein information stored in the second storage unit is updated based on a read access target address which is data of the transaction and the aggregation result.

The multiprocessor system of claim 1, wherein
A multiprocessor system, wherein when a plurality of read requests are made for the same address, the priority order of the plurality of read requests is further stored in the second storage unit.

A plurality of processor modules,
Each of the plurality of processor modules includes:
A processor for data processing;
Main memory for storing data to be processed by the processor of its own module and other modules;
A cache memory for temporarily storing data processed by the processor;
A first storage unit that stores a request main memory address of a first read request made by a processor of its own module with respect to the main memory;
A second storage unit that stores, when a processor of a module makes a second read request to the main memory, status information indicating that the second read request is waiting for a retry because the first read request to the same main memory address, executed before the second read request, has not been completed, together with request source information of the second read request and the request-destination main memory address of the second read request;
A communication unit that transmits and receives the cache status information of the own module in response to a transaction by a read request to the main memory and a read request made by a processor of another module;
A cache memory control unit that, when the processor of its own module makes a third read request for data not stored in the cache memory, refers to the information stored in the second storage unit, determines whether the second read request having the same main memory address as the third read request is waiting for a retry, and determines whether to make the third read request;
And an interconnection network that connects the plurality of processor modules and notifies all processor modules of a transaction and of an aggregation result obtained by aggregating the cache state information output from the plurality of processor modules. A multiprocessor system comprising the above.

In a cache coherency control method for a plurality of nodes that share a main memory with other nodes connected via an interconnection network, each node having a cache memory that temporarily stores data of the main memory and a storage unit that holds a processing state of a read request performed by the node,
The storage unit stores a request main memory address of a first read request made by the node to the main memory;
When a node makes a second read request to the main memory, status information indicating that the second read request is waiting for a retry because the first read request to the same main memory address, executed before the second read request, has not been completed, together with request source information of the second read request and the request-destination main memory address of the second read request, is stored in a second storage unit,
If the requested data is not in the cache memory,
the information stored in the second storage unit is referred to,
if the second read request of another node for the data is waiting for a retry, the third read request from the own node is canceled, and
if the second read request of another node for the data is not waiting for a retry, the third read request is issued. A cache coherency control method characterized by the above.