JP3904251B2

JP3904251B2 - Exclusive control method

Info

Publication number: JP3904251B2
Application number: JP21607195A
Authority: JP
Inventors: 達雄樋口; 俊明垂井; 克佳北井; 茂雄武内; 達鳥羽; 真知子朝家; 泰弘稲上
Original assignee: Hitachi Ltd; Hitachi ULSI Systems Co Ltd
Current assignee: Hitachi Ltd; Hitachi Solutions Technology Ltd
Priority date: 1995-08-24
Filing date: 1995-08-24
Publication date: 2007-04-11
Anticipated expiration: 2015-08-24
Also published as: JPH0962634A

Description

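[0001]
[Field of Industrial Application]
The present invention relates to a method of exclusive control over resources shared by the processors of a computer system having multiple processors. In particular, it relates to an exclusive control method for a shared resource that belongs to one node of a parallel computer in which multiple nodes, each having a processor, are coupled via a network.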
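[0002]
[Prior Art]
Known database systems for searching large databases at high speed include distributed database systems. These run either on a parallel computer made up of multiple processors or on a client-server distributed processing system made up of multiple computers. This specification needs to refer to the processors of a parallel computer and the computers of a client-server system without distinguishing between them. It therefore calls any computer element that carries out the distributed processing of a distributed processing system a "node".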
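[0003]
In a distributed database system, the database is spread across multiple disk devices, and multiple nodes cooperate to process a single search request from a user. The nodes access, in parallel, the disk devices that hold the different portions of the database specified by the request, and each node processes its own portion. Several nodes may therefore access the same database portion at the same time. To guarantee the results of these accesses, they must be arbitrated: while a series of accesses from one node is in progress, accesses from all other nodes are prohibited.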
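[0004]
Several ways of configuring a distributed database system are known (see, for example, Reference 1: Nikkei Electronics, No. 630, February 27, 1995, pp. 67-75). Two of them are notable for how easily accesses can be arbitrated and how simply the system can be built.

The first is the shared-everything method. Main memory and the disk devices that store the data are connected to a shared bus, the nodes are connected to the same bus, and each node can reach any disk device over it. With a shared bus, however, the bus's data-transfer performance generally becomes the bottleneck. This sharply limits the number of nodes one bus can support and makes it hard to raise system performance by adding nodes.

The second is the shared-nothing method, in which the nodes share neither main memory nor disk devices. The disk devices are distributed among the nodes, and each disk device can be accessed directly only by the node it belongs to. Any other node that needs that disk device asks the owning node to perform the access. When a file on such a disk is accessed, the only node that can reach it directly is the single node physically connected to the disk (hereinafter the resource management node). Any other node (hereinafter an access request node) sends an access request to the resource management node through a message exchange means such as a network and accesses the file indirectly through that node. When several access request nodes access the file at once, several access request messages therefore reach the resource management node. That node guarantees the results of the accesses by arbitrating among those messages.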
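[0005]
In the shared-nothing method, other nodes cannot physically reach a node's disk device, so the files on it are not physically shared either. As described above, however, other nodes can still access those files through the file's resource management node. For this reason, the rest of this specification uses the term shared resource, even for the shared-nothing method, for any file or other resource of a node that other nodes are allowed to access. A file that other nodes may access is called a shared file.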
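[0006]
With the shared-nothing method it is relatively easy to raise system performance by adding nodes. Access to shared resources is also fast, provided that the resource management node handles access request messages quickly and the inter-node message exchange is fast.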
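[0007]
[Problems to Be Solved by the Invention]
Applying the conventional shared-nothing method to a distributed database system, however, causes the following problem. Suppose accesses concentrate on one resource, for example a file stored on a particular disk device. The resource management node that manages that shared file is then swamped with access requests from other access request nodes. In a distributed database system this node, like every other node, carries part of the database processing and is expected to run search processing as well. When access requests pile up on it, the time left for this primary work shrinks sharply, its processing is delayed, and the performance of the whole system drops. The resource management node can therefore become the bottleneck of the entire system.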
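[0008]
When several nodes simultaneously access a shared file stored on a disk device of some resource management node, their access requests contend for the same file. Exclusive control of the shared file is needed to resolve this contention. Under the conventional way of handling lock requests, an accessing node in a shared-nothing distributed database system sends a lock request to the resource management node before it sends its access request. When many access requests concentrate on one resource management node, many lock requests therefore concentrate on it first. This makes the problem above worse, because the node's primary work is delayed even further. The problem is explained in more detail below.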
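[0009]
Under conventional lock handling, each lock-requesting node sends a lock request for the shared file to the resource management node before accessing the file, and this interrupts the resource management node. That node suspends the search processing it was running and interprets the lock requests. Once it recognizes them as lock requests for a shared file, it arbitrates which of them to grant.

Client-server processing based on remote procedure calls normally uses the concurrent-server method. When a client sends a service request, the server spawns a child process to provide the service and then waits for the next request. If two service requests arrive at nearly the same time, the server spawns two child processes, which serve their clients in parallel. In the shared-nothing distributed database system described above, the two child processes may then issue lock requests for the same shared file, and they must be arbitrated against each other. Because both child processes run on the same node, ordinary semaphore-based exclusive control can do this arbitration, for example with a test-and-set instruction.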
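[0010]
Once the lock requests have been arbitrated, the child processes reply "lock granted" to the access request node that won the arbitration and "lock denied" to the other access request nodes. When the child processes finish, the interrupted search processing resumes.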
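[0011]
The access request node that received the grant accesses the shared file and continues its search processing. When it has finished all of its accesses to the shared file, it sends an unlock request for the file to the resource management node, which interrupts that node again. The resource management node suspends its search processing once more, interprets the unlock request, unlocks the shared file, and then resumes the interrupted search processing. Meanwhile, an access request node that was denied waits until the granted node has finished with the shared file and then issues its lock request again. This waiting time is usually either random or fixed.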
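[0012]
When conventional lock handling is applied to a conventional shared-nothing parallel database system, the resource management node has to interpret and execute both lock requests and unlock requests, and its search processing is held up while it does so. This interference happens every time any node issues a lock request. Lock-request processing has two main parts. The first is arbitrating among several lock requests, each asking for exclusive use of a resource, so that exactly one of them gets the lock. The second, once a request has been chosen, is locking the resource and refusing access to it from every node other than the one that issued the chosen request.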
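[0013]
An object of the present invention is to provide an exclusive control method and a parallel computer that speed up the arbitration that selects one of several exclusive-use requests issued by multiple nodes for the same resource.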
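[0014]
A more specific object of the present invention is to provide an exclusive control method and a parallel computer in which a circuit other than the node's own processor arbitrates exclusive-use requests that come from multiple nodes for a resource managed by one node. This shortens the arbitration time and reduces the load on that processor.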
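[0015]
[Means for Solving the Problems]
To this end, in the exclusive control method of the present invention, each node stores usage-state information that indicates the exclusive-use state of a resource usable by multiple nodes;
when a node is about to issue an exclusive-use request for the resource, it determines from its own stored usage-state information whether the resource is already in exclusive use;
if the resource is determined to be in exclusive use, the node refrains from issuing the exclusive-use request;
if the resource is determined not to be in exclusive use, the node issues the exclusive-use request;
the exclusive-use requests issued by the nodes are transferred via the network to an arbitration circuit for exclusive use that all of the nodes can reach;
the arbitration circuit selects, from among the transferred requests, the one request that is allowed exclusive use of the resource; and
once the arbitration circuit has selected that request, the usage-state information for the resource stored in each node is updated to indicate that the resource is in exclusive use.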
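The claimed steps can be simulated in a few lines. The following is a minimal Python sketch, not the patent's implementation. It assumes the names `Node`, `arbitrate` and `run_round`, uses a dictionary as each node's usage-state copy, and models the arbitration circuit as "first request per resource wins". Releasing a resource is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    node_id: int
    # This node's own copy of the usage-state information: resource -> holder id.
    usage_state: Dict[str, int] = field(default_factory=dict)

    def should_request(self, resource: str) -> bool:
        """Issue a request only if the local copy says the resource is free."""
        return resource not in self.usage_state


def arbitrate(requests: List[Tuple[int, str]]) -> Dict[str, int]:
    """Stand-in for the arbitration circuit: the first request per resource wins."""
    winners: Dict[str, int] = {}
    for requester, resource in requests:
        winners.setdefault(resource, requester)
    return winners


def run_round(nodes: List[Node], wanted: List[Tuple[int, str]]):
    """wanted: (node_id, resource) pairs in the order they reach the arbiter."""
    by_id = {n.node_id: n for n in nodes}
    # Each node checks its own copy first and suppresses useless requests.
    issued = [(nid, res) for nid, res in wanted if by_id[nid].should_request(res)]
    winners = arbitrate(issued)
    # Every node records the selected request in its own usage-state copy.
    for node in nodes:
        node.usage_state.update(winners)
    return issued, winners
```

In a second round, a node whose copy already shows the resource as held issues nothing at all. This is the saving described in the next paragraph.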
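[0016]
According to the present invention, a node that wants to access the resource sends its request only after confirming that the resource is not in exclusive use. It therefore issues no useless exclusive-use requests, which cuts down the arbitration work spent on such requests.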
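[0017]
To carry out the method above, a computer system according to the present invention has
an arbitration circuit for exclusive-use requests, included in the network, that selects one of the exclusive-use requests issued by the nodes for the resource;
each node has
means for storing usage-state information indicating the exclusive-use state of the resource,
means for determining from the stored usage-state information, when the node is about to send an exclusive-use request for the resource, whether the resource is already in exclusive use, and
means for sending, when the resource is not in exclusive use, a message containing the exclusive-use request to the arbitration circuit via the network;
and each node further has means for updating the usage-state information stored in the node, according to the one exclusive-use request selected by the arbitration circuit, to information indicating that the resource is in exclusive use.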
[0018]
In a more specific aspect of the invention, the arbitration circuit is a single circuit in the network shared by all of the nodes. This circuit decides whether to grant exclusive use to a request issued by any node.
[0019]
In another more specific aspect, the arbitration circuit consists of several circuits distributed across the nodes, and each node decides whether to grant exclusive use to a request issued by any node.
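[0020]
In yet another more specific aspect, the arbitration circuit is located in the node to which the resource belongs. It decides from that node's stored usage-state information for the resource whether to grant exclusive use to a request issued by any node, and it notifies every node of the result.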
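[0021]
[Embodiments]
The computer system according to the present invention is described in more detail below with reference to several embodiments shown in the drawings. Throughout, the same reference numerals denote the same or similar elements.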
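[0022]
<Embodiment 1>
Fig. 1 shows the parallel computer of this embodiment. Multiple nodes 2 are coupled by a network 1. Each node 2 includes at least one processor 24 and a disk device 25, which holds one or more shared files used as shared resources by the nodes. Only the node that holds a shared file (the resource management node) can access it directly. Another node that wants the file (an access request node) sends the resource management node a file read request or a file write request and has that node carry it out.

Before a shared resource is used, the user must acquire an exclusive right to use it, and it must give up that right afterward, so that no other node uses the resource in the meantime. In a conventional shared-nothing system, an access request node that wants exclusive use of a resource sends a lock request to the resource's management node. That node arbitrates among the lock requests it receives, selects one, and then locks the resource. Conventional lock-request processing therefore involves both arbitration among exclusive requests and locking of the resource.

As explained later, this embodiment still arbitrates which access request node receives the exclusive right to use a resource, but it does not lock the resource itself when the resource is used. Although exclusive use is handled differently here, the embodiment keeps the conventional term "lock", as in "lock request" or "locked state". In this embodiment "lock" means exclusive use; for example, "locked state" means the state of exclusive use. The other embodiments and variants use the term in the same way.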
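[0023]
A distinctive feature of this embodiment is that every node, including the node itself, manages the lock state of each node's shared resources. To this end, each node 2 has a lock state register group 52 and a lock control circuit 500, both specific to this embodiment. The lock state register group 52 holds the lock states of all shared resources in the parallel computer, one register per unit of locking.

In this embodiment the unit of locking is the node: if a node has several shared resources, they are locked together. The lock state register group 52 therefore contains one register per node. While the corresponding node is unlocked, a register holds a value indicating that it is unlocked. While the corresponding node is locked, the register holds the identifier of the node that locked it.

When a program on a node wants to lock another node, it first consults the lock state register group 52 in its own node. If the target node is already locked, the program does not send a lock request. This removes the useless lock requests of the prior art, which were sent for files that were already locked, and also removes the work the resource management node wasted handling them.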
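[0024]
When several nodes request a lock on the same shared resource, their lock requests must be arbitrated and one of them selected. In this embodiment, the lock control circuit 500 of each node and the broadcast message relay circuit 12 perform this arbitration together.

A node 2 that wants to use a shared file sends a broadcast request message to the broadcast message relay circuit 12 via network 1. The information to be broadcast consists of the lock request, the identifier of the resource management node that holds the file, and the identifier of the requesting node (the access request node). When circuit 12 receives a broadcast request message, it generates a broadcast message carrying that information and broadcasts it to all nodes via network 1. If several broadcast request messages arrive from different nodes, circuit 12 handles them one at a time, so it acts as a circuit that serializes broadcast request messages.

Each node 2 receives the broadcast message containing the lock request via network 1 and decides whether to grant the lock request in that message. Basically, when several lock requests ask for the same resource, the lock goes to the request that arrives first. When a request is granted, the node's lock control circuit 500 rewrites the lock state register group 52. Because circuit 12 delivers the lock requests from different nodes to every node in the same order, every node's lock control circuit reaches the same decision. The processor 24 of each node can therefore tell which resource management node has just been locked and which node locked it. The node 2 that sent the lock request can thus tell whether its own lock succeeded. If it did, it sends a message containing a file access request to the resource management node.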
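The argument that all nodes reach the same decision rests on a single property: every node sees the same sequence of requests. The following minimal Python sketch models that property. `BroadcastRelay` and `LockNode` are hypothetical names, and a queue stands in for relay circuit 12. As in the text, register value 0 means "unlocked", so requester numbers are assumed to be nonzero.

```python
from collections import deque

UNLOCKED = 0  # register value meaning "not locked", as in the text


class LockNode:
    def __init__(self, node_id, n_nodes):
        self.node_id = node_id
        self.lock_regs = [UNLOCKED] * n_nodes  # one lock state register per node

    def on_broadcast(self, request):
        target, requester = request
        # The first request to arrive for a target wins; later ones are ignored.
        if self.lock_regs[target] == UNLOCKED:
            self.lock_regs[target] = requester


class BroadcastRelay:
    """Sketch of relay circuit 12: accepts requests and emits them one at a time."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.pending = deque()

    def submit(self, request):
        self.pending.append(request)

    def drain(self):
        # Serialization: every node receives the same sequence of broadcasts.
        while self.pending:
            request = self.pending.popleft()
            for node in self.nodes:
                node.on_broadcast(request)
```

If nodes 1 and 2 both request node 0 and node 1's request is serialized first, every node's register for node 0 ends up holding 1.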
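[0025]
In the prior art, the access request node sent a lock request to the resource management node, which suspended its running program and handled the request in software. In this embodiment, a circuit (the broadcast message relay circuit 12) serializes and arbitrates the lock requests, and each node then manages the lock state itself. The resource management node therefore has no lock requests to process, and its processing load is reduced accordingly.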
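[0026]
The circuits of this embodiment and their operation are described in detail below.
(Node 2)
As shown in Fig. 1, each node 2 consists of the following: a processor 24 that runs programs such as the search process 31; a local memory 23; a disk device 25 that stores the node's share of the database; and a transmission control circuit 21 and a reception control circuit 22 that perform the fast lock processing specific to this embodiment. The parallel computer is a distributed-memory machine. The local memory 23 is not shared with other nodes, only its own node can access it, and it holds the programs run on that node together with the data they use or produce. Every circuit in the node is connected to a system bus 26. Through memory-mapped I/O, the processor 24 accesses these circuits with ordinary memory access instructions such as load and store, exactly as it accesses the local memory 23.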
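[0027]
The transmission control circuit 21 consists of a message generation circuit 41, a transmission parameter storage register 42 and a transmission status register ST 43. The reception control circuit 22 consists of an input buffer 51, the lock state register group 52 and the lock control circuit 500. The lock control circuit 500 consists of the following:
- a match circuit 53 that checks whether its two inputs are equal;
- a magnitude comparison circuit 54 that compares its two inputs;
- an adder circuit 56 that adds its two inputs;
- selectors 55 and 57;
- a gate circuit 58;
- an AND circuit 59.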
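[0028]
(Network 1)
Network 1 is the same as the network described in Japanese Patent Application No. 6-53405. As shown in Fig. 2, it is basically a so-called hyper-crossbar network built from multiple crossbar switches and multiple relay switches 3. The crossbar switches are X-direction crossbar switches 7 and 8 and Y-direction crossbar switches 5 and 6. Each node 2 is connected through its own relay switch 3 to one X-direction crossbar switch (7 or 8) and one Y-direction crossbar switch (5 or 6). Each relay switch 3 relays messages among the node, the X-direction crossbar switch and the Y-direction crossbar switch connected to it.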
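[0029]
Each node 2 is assigned the X and Y coordinates of a lattice point in two-dimensional space. An X-direction crossbar switch connects the nodes 2 that share one Y coordinate and differ in X coordinate. A Y-direction crossbar switch connects the nodes that share one X coordinate and differ in Y coordinate.

The X-direction crossbar switch 7, to which the broadcast message relay circuit 12 is connected, has one more input/output port than the other X-direction crossbar switches 8. The same holds for the Y-direction crossbar switch 6 to which circuit 12 is connected. Hereinafter the X-direction crossbar switch 7 is sometimes called the extended crossbar switch or extended XB-X0, and the Y-direction crossbar switch 6 the extended crossbar switch or extended XB-Y3. The other X-direction crossbar switches 8 are called XB-X1, XB-X2 and XB-X3, and the other Y-direction crossbar switches 5 are called XB-Y0, XB-Y1 and XB-Y2. Each relay switch is called EXij, where ij are the coordinates of its node.

Each X-direction crossbar switch 8 or Y-direction crossbar switch 5 has one routing circuit 13 per input/output port. A routing circuit forwards a message received from a relay switch 3 according to the destination address in the message. The extended crossbar switches 7 and 6 also have a routing circuit 14 for their extension port.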
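[0030]
(Broadcast message relay circuit 12)
The broadcast message relay circuit 12 also has the structure described in Japanese Patent Application No. 6-53405. It selects the broadcast request messages sent to it one at a time and converts each selected message into a broadcast message that carries the information to be broadcast. It then broadcasts that message to every node via network 1. In the earlier application, circuit 12 is generally used to keep broadcast messages from deadlocking network 1. In this embodiment it has a second role: when several access request nodes send broadcast request messages containing lock requests, it acts as a serialization circuit that selects them one at a time.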
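[0031]
The broadcast message relay circuit 12 is separate from the relay switches 3 to which the nodes 2 are connected. It is attached to the extension input/output port (address 04 here) of the extended crossbar switch 7, which is one of the X-direction crossbar switches. It is also attached to the extension input/output port (address 43 here) of the extended crossbar switch 6, which is one of the Y-direction crossbar switches.

Consider an access request node that is directly connected to a relay switch on an X-direction crossbar switch 8 other than the extended switch 7 (for example XB-X1), such as the node on EX12. To send a broadcast request message containing a lock request to circuit 12, it uses the address 43 of the extension port of the extended Y-direction crossbar switch 6 as the destination. The message travels through XB-X1, a relay switch such as EX13, the extended Y-direction crossbar switch 6 and extension port 43 to circuit 12.

By contrast, consider an access request node that is directly connected to a relay switch on the extended X-direction crossbar switch 7, for example the node on relay switch EX01. It uses the address 04 of the extension port of crossbar switch 7 as the destination, and the message reaches circuit 12 through that extension port.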
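The port selection described above can be written as a small routing function. The topology details are assumptions inferred only from the EX12 and EX01 examples: a 4 x 4 grid, relay switch EXij sitting on XB-Xi and XB-Yj, and the relay circuit hanging off port 04 of XB-X0 and port 43 of XB-Y3. The direct route for a node already on XB-Y3 is also an assumption, because the text does not cover that case. This is a Python sketch, not the routing logic of Application No. 6-53405.

```python
from typing import List


def relay_port_address(i: int, j: int) -> str:
    """Destination address used by the node on relay switch EXij (assumed topology)."""
    # Nodes on the extended X crossbar (XB-X0) use its extension port 04;
    # every other node goes through the extended Y crossbar's port 43.
    return "04" if i == 0 else "43"


def relay_route(i: int, j: int) -> List[str]:
    """Hops a broadcast request from EXij takes to reach relay circuit 12."""
    if i == 0:
        return [f"EX{i}{j}", "XB-X0", "port 04"]
    if j == 3:
        # Assumed: EXi3 is already on XB-Y3, so it can go straight to port 43.
        return [f"EX{i}{j}", "XB-Y3", "port 43"]
    # First hop along the X crossbar to the relay switch on XB-Y3, then up XB-Y3.
    return [f"EX{i}{j}", f"XB-X{i}", f"EX{i}3", "XB-Y3", "port 43"]
```

For the example in the text, `relay_route(1, 2)` yields EX12, XB-X1, EX13, XB-Y3, port 43.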
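[0032]
As described in Japanese Patent Application No. 6-53405, the broadcast message relay circuit 12 consists of the following:
- two input buffers, connected to the input ports at addresses 04 and 43;
- a selector that picks one of the two buffers;
- a priority circuit that decides which buffer to pick and instructs the selector;
- a control-bit changing circuit that changes the control (CTL) bits of the selected broadcast request message into those of a broadcast message;
- an output buffer that sends the broadcast message, with the changed control bits and the information to be broadcast, to output port 04 of network 1.

Circuit 12 thus turns each broadcast request message into a broadcast message and sends it out through the extension port at address 04 of the extended X-direction crossbar switch 7. A broadcast request takes a route that depends on the requesting node's position. The purpose is to keep that route from overlapping the route of the resulting broadcast message, because an overlap could cause deadlock. If two broadcast request messages arrive at once, the priority circuit selects them one after the other.

Broadcast request messages therefore enter network 1 one at a time and reach every node over the same transfer route. In a network where messages never overtake one another en route, broadcast messages arrive in the same order at every node. Lock requests issued by different nodes are therefore announced to all nodes in the order chosen by circuit 12.

As described later, when several broadcast messages contain lock requests for the same resource management node, each node treats the one that arrives first as the valid request, and that request may lock the shared file in the resource management node. Lock requests in broadcast messages that arrive later cannot lock that shared file. Because circuit 12 makes broadcast messages arrive at every node in the same order, every node grants the lock to the request from the same access request node.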
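[0033]
(Binary semaphore)
Before describing lock-request processing, this section briefly explains how a semaphore works. In this embodiment, shared files are locked by using the lock state register group 52 in the reception control circuit 22 of every node as a semaphore. The semaphore is a so-called binary semaphore, used when at most one requester at a time may hold a shared resource. A binary semaphore starts at the initial value 0 and is manipulated by a P operation, which takes the lock, and a V operation, which releases it.
P(X): "if X = 0 then X := 1" ... P operation
V(X): "if X = 1 then X := 0" ... V operation
Each operation must be indivisible, for example implemented with a test-and-set (T&S) instruction. The lock is taken when a P operation succeeds in setting the binary semaphore X to 1, and it is released when a V operation sets X back to 0.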
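The two operations above translate almost literally into code. The Python sketch below is a software analogue, not the hardware mechanism. A `threading.Lock` stands in for the indivisible test-and-set, and `p()` is the non-blocking "try" form exactly as defined above: it reports failure instead of waiting.

```python
import threading


class BinarySemaphore:
    """P/V on a 0/1 variable; threading.Lock emulates an atomic test-and-set."""

    def __init__(self):
        self.x = 0
        self._atomic = threading.Lock()

    def p(self) -> bool:
        """P(X): if X = 0 then X := 1. Returns True if the lock was taken."""
        with self._atomic:
            if self.x == 0:
                self.x = 1
                return True
            return False

    def v(self) -> bool:
        """V(X): if X = 1 then X := 0. Returns True if the lock was released."""
        with self._atomic:
            if self.x == 1:
                self.x = 0
                return True
            return False
```

If several threads call `p()` at once, exactly one of them gets `True`. This is the exclusivity that the embodiment reproduces across nodes without a shared memory word.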
[0034]
This embodiment uses the lock state register group 52 of every node as binary semaphores, and in this way locks shared files in a shared-nothing system.
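[0035]
(Initialization of lock state register group 52)
Every node initializes its lock state register group 52 before the system starts operating. During initialization, the processor 24 of each node 2 uses store instructions over the system bus 26 to write 0 into every register of group 52. Every register then holds 0, which means that no node is locked.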
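[0036]
(Lock request processing)
The locking method is described below with reference to Fig. 3. In the example, access request nodes #1 and #2 access, at nearly the same time, a shared file 35 stored on the disk device 25 connected to resource management node #0. In Fig. 3, steps enclosed in double lines, such as step 261, are performed mainly by hardware, and steps enclosed in single lines are performed mainly by programs running on processor 24.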
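[0037]
First, resource management node #0 and access request nodes #1 and #2 each run the search program 31 in parallel on their processor 24, using their own local memory 23 and disk device 25 (steps 201, 221, 241). At nearly the same time, access request nodes #1 and #2 need exclusive access to shared file 35 on the disk device 25 of resource management node #0 (steps 222, 242). Each of them then processes a lock request for the shared file (steps 223, 243).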
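[0038]
(1) Broadcasting the lock request message
As already stated, the lock state register group 52 holds the lock states of all shared resources in the parallel computer, one register per unit of locking. In this embodiment the unit of locking is the node, so group 52 contains one register per node. Each lock state register is therefore identified below by the number of its node. For example, the register for resource management node #0 is called lock state register #0.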
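[0039]
Access request nodes #1 and #2 first check whether resource management node #0 is already locked. Processor 24 reads lock state register #0 over the system bus 26 and checks whether its value is 0. If the value is not 0, resource management node #0 is already locked and no lock request is issued. In this way, this embodiment keeps nodes from issuing useless lock requests.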
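[0040]
Once access request nodes #1 and #2 have confirmed that resource management node #0 is not locked, they perform the P operation on lock state register #0 as follows. Each node sends the broadcast message relay circuit 12 a broadcast request message 11 containing the parameters listed below. The message asks for a lock request to be broadcast to all nodes, telling each node to put its lock state register #0 into the locked state.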
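[0041]
Nadr (address) := address of broadcast message relay circuit 12
CTL (control) := broadcast request message enable
R# (register number) := #0
Int (interrupt) := disable
D0 (data 0) := 0
D1 (data 1) := number of access request node #1 (or #2)
Ctyp0 (operation type 0) := match check enable
Ctyp1 (operation type 1) := set enable
Nadr is the network address of the message's destination and is used by network 1. Here it is address 04 or 43 of circuit 12. As described above, the choice between 04 and 43 depends on whether access request node #1 is connected to a relay switch on the extended X-direction crossbar switch 7.

CTL is the control bits that give the message type, here a broadcast request message. R# is the number of the lock state register for the resource management node targeted by the lock request, here #0.

Int specifies whether access request node #1 is notified by an interrupt when the requested lock is acquired. It is set to disable here, so no interrupt occurs. In that case, as described later, the processor 24 of access request node #1 finds out whether the lock succeeded by watching lock state register #0 in its own node. If Int were set to enable, the node would instead wait for the interrupt to learn whether the lock succeeded.

D0 and D1 are data used by the lock control circuit 500 of each node. Here, D0 is the value 0, which represents the unlocked state of lock state register #0, and is used to check whether the register currently holds 0. D1 is the value written into the register if the lock succeeds, here the number of access request node #1 (or #2).

Ctyp0 is the first parameter specifying the operation at each node. Here it selects a match check between the current value of lock state register #0 and D0, which determines whether the register is currently unlocked. Ctyp1 indicates whether the result of the lock control circuit 500 is to be written into lock state register #0; here it enables the write.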
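The parameter list maps naturally onto a record type. The Python sketch below assumes the names `BroadcastRequest`, `lock_request` and `unlock_request` and uses plain strings and booleans for the enable fields, since the patent gives no bit-level encoding. It also shows that the unlock message described later in [0054] is the same record with D0 and D1 swapped.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BroadcastRequest:
    nadr: str                       # Nadr: relay circuit 12 port, "04" or "43"
    reg: int                        # R#: lock state register number (= target node)
    d0: int                         # D0: value the register must currently hold
    d1: int                         # D1: value written if the match check passes
    ctl: str = "broadcast request"  # CTL: message type
    interrupt: bool = False         # Int: disable
    match_enable: bool = True       # Ctyp0: match check enable
    set_enable: bool = True         # Ctyp1: set enable


def lock_request(nadr: str, target: int, requester: int) -> BroadcastRequest:
    """P operation: the register changes from 0 to the requester only if it is still 0."""
    return BroadcastRequest(nadr=nadr, reg=target, d0=0, d1=requester)


def unlock_request(nadr: str, target: int, holder: int) -> BroadcastRequest:
    """V operation: identical fields except that D0 and D1 are swapped."""
    locked = lock_request(nadr, target, holder)
    return replace(locked, d0=locked.d1, d1=locked.d0)
```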
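[0042]
To send this broadcast request message, processor 24 of access request node #1 uses store instructions over the system bus 26 to write the parameters into the transmission parameter storage register 42. In the transmission control circuit 21, the message generation circuit 41 detects that the parameters have been written into register 42. It then builds broadcast request message 11 from them and sends it into network 1. Once the message has been sent, the status ST in the transmission status register 43 is set to "transmission complete". Processor 24 polls register 43 over the system bus 26 with load instructions, and in this way learns that its lock request for the shared file has been sent. After seeing "transmission complete", processor 24 clears register 43 with a store instruction so that the next message can be sent.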
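The store-poll-clear protocol can be mimicked with a toy register model. In the Python sketch below, `TransmitControl` and `send_lock_request` are hypothetical names, a plain list stands in for network 1, and circuit 41 is assumed to finish sending synchronously. Real memory-mapped registers would be accessed through the bus rather than through Python attributes.

```python
SEND_COMPLETE = "transmission complete"
IDLE = None


class TransmitControl:
    """Toy model of transmission control circuit 21 (not real hardware)."""

    def __init__(self, network):
        self.param_reg = None     # transmission parameter storage register 42
        self.status_reg = IDLE    # transmission status register ST 43
        self.network = network    # list standing in for network 1

    def store_params(self, params):
        # Store to register 42; circuit 41 reacts by building and sending message 11.
        self.param_reg = params
        self.network.append(("broadcast request", params))
        self.status_reg = SEND_COMPLETE


def send_lock_request(tx, params, max_polls=1000):
    """Write the parameters, poll ST 43 until sent, then clear it for the next message."""
    tx.store_params(params)
    for _ in range(max_polls):          # load instructions polling ST 43
        if tx.status_reg == SEND_COMPLETE:
            tx.status_reg = IDLE        # store instruction that clears ST 43
            return True
    return False
```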
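[0043]
As already explained, message 11 travels through network 1 to the broadcast message relay circuit 12. Access request nodes #1 and #2 issue their lock requests in exactly the same way. Circuit 12 selects their messages one at a time. For each selected message, it changes the control bits CTL from those of a broadcast request message to those of a broadcast message and broadcasts the result to all nodes through network 1 (step 261). Access request nodes #1 and #2 request the lock on shared file 35 at nearly the same time. Assume that circuit 12 selects the message from access request node #1 first and broadcasts it to all nodes before the message from node #2. As explained above, the lock request from node #1 then arrives before the lock request from node #2 at every node.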
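[0044]
(2a) Arbitration of lock requests (lock success case)
When the reception control circuit 22 of each node receives the broadcast message containing the lock request from access request node #1, it operates as follows.
1. The message from network 1 is stored temporarily in input buffer 51.
2. The message's lock state register number R# (here #0) selects lock state register #0. The register's value, here the initial value 0, is fed to the match circuit 53, the magnitude comparison circuit 54 and the adder circuit 56.
3. The message's D0, here 0, is fed to the match circuit 53 and the magnitude comparison circuit 54. Both inputs of match circuit 53 are 0, so its output is 1.
4. Because the message's Ctyp0 is match check enable, selector 55 passes the output of match circuit 53 and ignores that of the magnitude comparison circuit 54. The output of selector 55 goes active, and gate 58 opens in response.
5. Because the message's Ctyp1 field is set enable, selector 57 passes the message's D1, here the node number of access request node #1, to gate 58. Since gate 58 is open, node number #1 is written into lock state register #0.
6. Because the message's Int field is disable, the output of AND gate 59 stays inactive and no interrupt is raised to processor 24.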
[0045]
Lock requests are arbitrated in this way, but the resource management node never locks the resource that the selected lock request asks for.
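[0046]
(2b) Arbitration of lock requests (lock failure case)
This case arises when lock state register #0 already shows that resource management node #0 is locked. The broadcast message containing the lock request from access request node #2 reaches each node after the broadcast message from access request node #1. It is processed as in (2a), but lock state register #0 now holds the node number of access request node #1. Match circuit 53 therefore detects no match and its output is inactive, so the output of selector 55 is also inactive and gate 58 stays closed. Gate 58 does not write to lock state register #0, and access request node #1 keeps its lock on node #0.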
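Cases (2a) and (2b) follow one datapath, differing only in the register contents. The following Python sketch models that datapath. `Broadcast` and `lock_control` are assumed names. Only the match-and-set path used in the text is modeled; the magnitude comparator 54, adder 56 and interrupt gate 59 are left out because their use is not described.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Broadcast:
    reg: int                    # R#: which lock state register to operate on
    d0: int                     # D0: compared with the register by match circuit 53
    d1: int                     # D1: value passed by selector 57 to gate 58
    match_enable: bool = True   # Ctyp0 = match check enable
    set_enable: bool = True     # Ctyp1 = set enable


def lock_control(regs: List[int], msg: Broadcast) -> bool:
    """Apply one received broadcast and return True if the register was written."""
    current = regs[msg.reg]
    selected = msg.match_enable and current == msg.d0  # match circuit 53 via selector 55
    if selected and msg.set_enable:                    # gate 58 opens
        regs[msg.reg] = msg.d1
        return True
    return False
```

The same function also performs the V operation of [0055] when D0 holds the lock holder's number and D1 is 0.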
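[0047]
Every node executes the P operations on lock state register #0 in the same order, and this guarantees that the lock is exclusive. Resource management node #0 also never interrupts its running work during this lock processing.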
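[0048]
(3a) Lock confirmation (success case)
Access request node #1 polls lock state register #0 with load instructions. When it sees that the register now holds its own node number, it knows that its lock succeeded (step 224). From then on, access request node #1 holds the right to access shared file 35 in resource management node #0 until it releases the lock.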
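[0049]
(3b) Lock confirmation (failure case)
Access request node #2 also polls lock state register #0 with load instructions. When it sees that the register now holds the node number of access request node #1, it knows that its lock failed (step 244). From then on, access request node #2 cannot access shared file 35 until access request node #1 releases the lock.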
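[0050]
(4) File access
Access request node #1 then accesses shared file 35. It sends a message containing an access request through network 1 to the I/O processing program 32 running on processor 24 of resource management node #0 (step 225). When the message enters input buffer 51 of the resource management node, an interrupt circuit (not shown) sends an interrupt signal to processor 24. The interrupt handler starts the I/O processing program 33 named as the request's destination and tells it to perform the requested I/O operation on the specified file (step 202). For a read request, processor 24 reads the data from disk device 25 and sends it to the access request node in a message. A write request carries the data to be written. Processor 24 writes that data to disk device 25 and sends the access request node a message reporting that the write is complete.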
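[0051]
In the prior art, a resource management node that receives a file access request from another node creates a coprocess to carry out the requested access and controls that coprocess. If it receives lock requests in parallel from different nodes for the same file on disk storage device 25, it creates one coprocess per lock request. Because these coprocesses all access the same file, the resource management node must arbitrate their lock requests, that is, pick one coprocess. Conventional lock handling then also locks the file so that the selected coprocess can use it exclusively, and it keeps other coprocesses away from the file until the selected coprocess has finished.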
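[0052]
In this embodiment, no access request node sends a lock request to the resource management node, so the resource management node never locks its own resources. Instead, a circuit outside that node's processor arbitrates exclusive-use requests, and it allows only one access request node at a time to use disk storage device 25 of a given resource management node. Exclusive use of the resource is therefore decided outside that node's processor. The processor never receives competing exclusive requests for the same disk storage device, so competing coprocesses are never created in parallel. Even without any lock control inside the resource management node, accesses from different nodes to the same disk storage device never overlap. Exclusive use of a resource is instead guaranteed as follows: each node has a lock state register group, and before accessing another node's resource it uses that group to check whether the resource is already locked.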
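[0053]
(5) Unlock request for the shared file
After finishing its access to shared file 35, access request node #1 performs a V operation on lock state register #0 to release the lock. It sends a broadcast request message containing the following parameters to the broadcast relay circuit 12, which broadcasts a message with these parameters to all nodes.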
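[0054]
Nadr := address of broadcast relay circuit 12
CTL := broadcast request message enable
R# := #0
Int := disable
D0 := node number of access request node #1
D1 := 0
Ctyp0 := match check enable
Ctyp1 := set enable
Compared with the broadcast request message for the lock request, these parameters differ only in that D0 and D1 are swapped. The message goes to circuit 12, which broadcasts it to all nodes through network 1 (step 262). The circuits operate exactly as they do for the lock request message.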
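[0055]
(6a) Lock release
When a node receives the broadcast message containing the unlock request from access request node #1, its reception control circuit 22 acts as it did for the lock request. Match circuit 53 confirms that lock state register #0 holds the node number given in D0 (access request node #1), and the value 0 given in D1 is then written into the register. Lock state register #0 returns to its initial value 0, and no node is now using shared file 35.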
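[0056]
Meanwhile, access request node #2, whose lock failed at step 244, waits to retry its lock request for the shared file (step 245). This is usually done with a timer inside processor 24: after some time, the node repeats the lock request processing (the same as step 243). If node #2 retries before access request node #1 has released lock state register #0, the register still holds node #1's number, so the P operation and the lock fail. If node #2 retries after the lock in register group 52 has been released, then, as in processing 211-2, lock state register #0 receives node #2's number and node #2 acquires the lock.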
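The requester's side of steps 223 to 245 combines four actions: a local check, a request, polling for the result, and a timer-based retry. The Python sketch below shows that loop. The callables `read_reg` and `send_lock_broadcast` are hypothetical hooks standing for a load from this node's lock state register and for issuing the P-operation broadcast. The random backoff interval is an assumption, since the text only says the wait is usually random or fixed.

```python
import random
import time

UNLOCKED = 0


def acquire(node_id, target, read_reg, send_lock_broadcast,
            max_attempts=10, poll_limit=100, backoff=(0.001, 0.01)):
    """Check locally, request, confirm by polling, and back off on failure."""
    for _ in range(max_attempts):
        if read_reg(target) == UNLOCKED:          # skip useless lock requests
            send_lock_broadcast(target, node_id)
            for _ in range(poll_limit):           # wait for the broadcast to land
                holder = read_reg(target)
                if holder != UNLOCKED:
                    if holder == node_id:
                        return True               # success (as in step 224)
                    break                         # another node won (as in step 244)
        time.sleep(random.uniform(*backoff))      # timer-based retry (step 245)
    return False
```

In use, `read_reg` would wrap a load of lock state register #0, and a successful return would be followed by the file access of step 225.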
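[0057]
In this embodiment, both acquiring and releasing a lock therefore happen without interrupting the search processing on the processor of any particular node, such as the resource management node. This improves the performance of the whole system.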
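[0058]
<Variant 1 of Embodiment 1>
Embodiment 1 uses the broadcast message relay circuit 12 described in Japanese Patent Application No. 6-53405. That circuit is not part of the hyper-crossbar network itself but is added to it. This variant achieves the same operation as Embodiment 1 without such an additional circuit.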
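[0059]
Japanese Patent Application No. 6-169995 describes another way to prevent deadlock in the network, with the same effect as the broadcast message relay circuit of Application No. 6-53405. It uses the serialization circuit normally present in each X-direction and Y-direction crossbar network, which broadcasts, one after another, the broadcast messages sent to that crossbar network.

In this technique, one particular crossbar switch of the network is designated in advance as the serialization point for broadcast messages. A node that wants to broadcast sends its broadcast request message to a relay switch connected to that particular crossbar switch. The relay switch used is the one connected both to the particular crossbar switch and to the other-axis crossbar switch of the sending node. For example, if the particular crossbar switch is an X-direction crossbar switch, the other-axis switch is a Y-direction crossbar switch. When broadcast request messages reach the particular crossbar switch from the relay switches connected to it, the switch selects one of them and converts it into a broadcast message. It sends that message to each relay switch connected to it, and through them the message reaches all nodes connected to the network.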
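[0060]
This variant uses the particular crossbar switch in the same way that Embodiment 1 uses the broadcast message relay circuit 12. In Fig. 1, broadcast relay circuit 12 is removed. The extended X-direction crossbar switch 1XE and the extended Y-direction crossbar switch 1YE are accordingly replaced with switches that have the same number of input/output ports as the other X-direction and Y-direction crossbar switches. Each node sends its broadcast request message containing a lock request to one of the relay switches connected to the particular crossbar switch, namely the particular relay switch described above. The corresponding broadcast message is then delivered to every node by the method of Application No. 6-169995. Each node is built from the same circuits as in Embodiment 1 and operates the same way.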
[0061]
<Variant 2 of Embodiment 1>
In Embodiment 1, the broadcast message relay circuit 12 selects one of the broadcast request messages containing lock requests sent to it, converts that message into a broadcast message, and sends it to the nodes through network 1. It thus acts as a serialization circuit that handles these messages one at a time. However, circuit 12 broadcasts not only the first lock request it selects for a given node but also every later lock request for the same node. Each node receives all of these broadcasts in turn, yet it treats only the first one as a valid lock request. In Embodiment 1, therefore, every broadcast for a given node after the first is sent even though no node uses it, and network 1 carries this useless traffic. This variant eliminates such useless broadcasts.
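[0062]
To do so, the broadcast message relay circuit 12 is given its own copy of the lock state register group 52 and lock control circuit 500 found in each node. Sometimes the priority circuit of circuit 12 selects, for the first time, a broadcast request message that contains a lock request for some node. In that case the lock control circuit stores the requesting node's number in the lock state register for the target node, which marks the target as locked. As in Embodiment 1, the control-bit changing circuit then converts this first message into a broadcast message and sends it into the network. When the priority circuit later selects another broadcast request message containing a lock request, the lock control circuit in circuit 12 examines the lock state register for that request's target node. If the register already holds the number of some requesting node, the later message is invalidated and is not broadcast. As a result, only the first broadcast request for a given node is turned into a broadcast message. The lock control circuit 500 of each node can stay the same as in Embodiment 1.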
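The filter described above can be added to the relay as a small amount of extra state. In the Python sketch below, `FilteringRelay` is a hypothetical name. The text does not say how the relay's register copy is cleared, so the sketch assumes that the relay also watches unlock requests and clears its copy when the holder releases the lock. Without that assumption the relay would block every later lock request for the same node.

```python
from collections import deque

UNLOCKED = 0


class FilteringRelay:
    """Relay circuit 12 holding its own copy of the lock state registers."""

    def __init__(self, n_nodes):
        self.regs = [UNLOCKED] * n_nodes
        self.queue = deque()

    def submit(self, kind, target, node_id):
        self.queue.append((kind, target, node_id))

    def drain(self):
        """Return the broadcasts actually sent, in serialized order."""
        sent = []
        while self.queue:
            kind, target, node_id = self.queue.popleft()
            if kind == "lock":
                if self.regs[target] != UNLOCKED:
                    continue                  # invalidated: never broadcast
                self.regs[target] = node_id
            elif kind == "unlock" and self.regs[target] == node_id:
                self.regs[target] = UNLOCKED  # assumed: relay tracks releases too
            sent.append((kind, target, node_id))
        return sent
```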
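[0063]
<Variant 3 of Embodiment 1>
The behavior of Variant 2 can also be obtained by placing the circuits that Variant 2 adds to broadcast relay circuit 12 into the serialization circuit of the particular crossbar switch that Variant 1 uses to serialize broadcast messages.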
[0064]
<Variant 4 of Embodiment 1>
In Embodiment 1, the same broadcast message relay circuit serializes both ordinary broadcast request messages (with no lock request) and broadcast request messages that contain lock requests. Network 1 carries both kinds of request and the broadcast messages generated from them. A separate network with its own broadcast message relay circuit for lock-request broadcasts, however, is an effective way to raise message transfer speed.
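[0065]
<Embodiment 2>
In Embodiment 1, every node arbitrated the lock requests for a resource, whichever node the resource belonged to, and kept the result in its own lock state registers for its own use. To make this possible, each lock-requesting node sent a broadcast request message containing its lock request to the broadcast message relay circuit, which converted it into a broadcast message for all nodes. Every lock request therefore passed through the relay circuit. Many messages can converge on that circuit and delay its processing. In addition, each node's lock state register group used one register per unit of locking (for example, per node), and each register held the number of the node that won the lock, so the number of registers grows with the number of nodes. This embodiment removes both problems.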
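[0066]
Fig. 4 outlines the parallel computer of this embodiment; the description below focuses on differences from Embodiment 1. When several access request nodes want a resource that belongs to some resource management node, each sends its lock request to that node by one-to-one communication. The resource management node arbitrates the requests and notifies every node, by a broadcast message, which request received the lock. This avoids the concentration of lock-request messages at the broadcast relay circuit seen in Embodiment 1.

Network 1 still has a broadcast message relay circuit 12, as in Embodiment 1. To broadcast an arbitration result, the resource management node sends a broadcast request message containing the result to circuit 12, which broadcasts it to all nodes. Circuit 12 therefore no longer serializes broadcast request messages that contain lock requests. It serializes only broadcast request messages that carry other information, such as arbitration results.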
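[0067]
In the lock control circuit 500 of each node, a single lock state register 153 covers all nodes. Each bit position of the register corresponds to one unit of locking, here one node of the system. The bit holds 1 or 0 depending on whether that node is locked. This avoids the growth in the number of lock state registers seen in Embodiment 1.

Each node also has a lock acquisition register 152 that records which resources its own node has locked. Each bit corresponds to one unit of locking, here a node. When the node that owns the register locks some resource management node, 1 is stored in the bit for that resource management node.

Unlike Embodiment 1, the lock control circuit 500 here does not arbitrate lock requests. It updates these registers from the arbitration results sent by the resource management nodes, and each access request node checks the updated registers to see whether its own lock request was granted.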
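[0068]
The operation of the apparatus of Fig. 4 is described in more detail below.
(Register initialization)
In every node, processor 24 resets lock state register 153 and lock acquisition register 152 to 0 and writes the node's own number into node number register 154.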
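[0069]
(Lock request processing)
This processing is described with reference to the flowchart of Fig. 5.
(1) Sending the lock request message
The access request node looks at the bit for the target resource's management node in lock state register 153 to see whether that node is locked. If it is not locked, the access request node sends a message containing a lock request to that resource management node (steps 523, 543). To send it, the following parameters are stored in the transmission parameter storage register 42.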
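[0070]
Nadr (network address) := node number of the resource management node
CTL (control) := one-to-one transfer message enable
CMD (command) := lock request
Int (interrupt) := disable
B# (bit number) := number of the resource management node
T# (target node number) := number of the resource management node
R# (access request node number) := own node number
CTL gives the message type, here a one-to-one message. CMD is a code for the command type, here a lock request. Other lock-related commands in this embodiment include lock report and unlock. B# identifies the resource to be locked; here the number of the resource's management node is used. T# is the number of the resource management node that owns the resource, and R# is the number of the access request node. Nadr, CTL and Int have the same meaning as in Embodiment 1. The message generation circuit 41 sends a message with these parameters through network 1 to the resource management node given by Nadr.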
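[0071]
(2a) Arbitration of lock requests (lock success case)
When several access request nodes send lock-request messages to the same resource management node, network 1 carries them to it in parallel. The resource management node, however, takes them from the network one at a time and feeds them one at a time into input buffer 51.

When the first of these messages enters the input buffer, its bit number B# is applied to lock state register 153 and lock acquisition register 152. A decoder (not shown) in input buffer 51 decodes the command bits CMD. If the command is a lock request, a "lock request decoded" signal goes to AND gate 155. The other input of AND gate 155 is inverted and receives bit B# of lock state register 153. If that bit is 0 (the resource management node is not yet locked), the AND gate outputs 1. This output means the lock request is granted. It passes through OR gate 159 to the set terminal of lock state register 153, which writes 1 at bit position B# to show that the resource management node is locked. The same output also tells message generation circuit 41 to send a broadcast request message whose command is lock report. AND gate 155 therefore performs the lock state check (step 502).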
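The decision made by AND gate 155 at the resource management node fits in a few lines if register 153 is treated as an integer bitmap. `ResourceManagerArbiter` is a hypothetical name, and the dictionary returned below only approximates the lock-report broadcast request that circuit 41 builds. The Nadr value is a placeholder, since the relay's actual address depends on the node's position as in Embodiment 1. This is a Python sketch, not the circuit.

```python
class ResourceManagerArbiter:
    """Embodiment 2 arbitration at the resource management node (sketch)."""

    def __init__(self):
        self.lock_state = 0   # register 153 as a bitmap: bit k = node k locked

    def on_lock_request(self, b, t, r):
        """b = B#, t = T#, r = R# from a one-to-one lock request message."""
        bit = 1 << b
        if self.lock_state & bit:      # inverted input of AND gate 155 is 0
            return None                # already locked: request not granted
        self.lock_state |= bit         # set terminal via OR gate 159
        # Circuit 41 reuses B#, T#, R# and supplies the new fields.
        return {"Nadr": "relay circuit 12 (placeholder)", "CTL": "broadcast request",
                "CMD": "lock report", "B#": b, "T#": t, "R#": r}
```

Requests arriving after the first simply return `None`, which matches case (2b) below.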
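[0072]
In response to the signal from AND gate 155, message generation circuit 41 builds a broadcast request message from the message in input buffer 51. The new message contains the new parameters below together with the remaining parameters of the buffered message, and it is sent into network 1. The new parameters are held in advance in message generation circuit 41.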
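[0073]
Nadr := address of broadcast message relay circuit 12
CTL := broadcast request message
CMD := lock report
The broadcast message relay circuit 12 changes the control bits CTL so that the message becomes a broadcast message and broadcasts it to every node (step 503).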
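[0074]
(2b) Arbitration of lock requests (lock failure case)
Once the first lock-request message has been processed as above, the remaining messages enter input buffer 51 one after another. For each of them, the bit of lock state register 153 fed to AND gate 155 is already 1. The gate's output therefore stays 0, and the lock requests in these messages are not granted.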
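[0075]
(3) Lock confirmation
When a node receives the lock report broadcast by relay circuit 12, the target node number T# in the message goes to match circuit 160. There it is compared with the node's own number in node number register 154. At any node other than the resource management node, no match is found. AND gate 158 receives the inverted output of match circuit 160 and a signal showing that the command CMD was decoded as lock report. At nodes other than the resource management node both inputs are 1, so AND gate 158 outputs 1. This output passes through OR gate 159 to the set terminal of lock state register 153, which sets bit B#. The lock state registers 153 of these nodes now show that the resource management node is locked (steps 524, 544). The resource management node itself had already set the same bit when it granted the lock, as described above.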
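[0076]
At each access request node that receives the lock report, the access request node number R# in the message goes to match circuit 161. There it is compared with the number in node number register 154. Of the access request nodes that sent lock requests to the resource management node, only the one that was granted the lock sees a match. AND gate 157 receives the output of match circuit 161 and the decoded lock-report signal, so it outputs 1 only at the node that won the lock. That output drives the set terminal of lock acquisition register 152, which sets bit B# (step 524). At access request nodes that were not granted the lock, lock acquisition register 152 does not change.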
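The effect of a lock report on the two registers, and what processor 24 can conclude from them, can be sketched with integer bitmaps. `LockRegisters` and `lock_outcome` are hypothetical names. The sketch covers nodes other than the resource management node, which set register 153 during arbitration instead. It is a Python software model of gates 157, 158 and 159, not the hardware.

```python
class LockRegisters:
    """Per-node registers of Embodiment 2, held as Python int bitmaps."""

    def __init__(self, my_node):
        self.my_node = my_node   # node number register 154
        self.lock_state = 0      # register 153: which nodes are locked
        self.acquired = 0        # register 152: which locks this node holds

    def on_lock_report(self, b, t, r):
        if t != self.my_node:            # inverted match circuit 160 -> AND gate 158
            self.lock_state |= 1 << b    # through OR gate 159
        if r == self.my_node:            # match circuit 161 -> AND gate 157
            self.acquired |= 1 << b

    def lock_outcome(self, b):
        """What processor 24 concludes from registers 152 and 153."""
        if not (self.lock_state >> b) & 1:
            return "unlocked"
        return "mine" if (self.acquired >> b) & 1 else "held by another node"
```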
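[0077]
By reading registers 152 and 153, processor 24 of each access request node can therefore tell whether the resource management node has been locked and, if so, whether it was itself the node that locked it (steps 525, 545).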
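[0078]
At the access request node that won the lock, AND gate 156 raises an interrupt to processor 24 to report the success. It does so when match circuit 161 finds a match and the interrupt field Int of the received message is 1.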
[0079]
(4) File access
The access request node that won the lock sends an access request to the resource management node, which accesses its disk device in response (steps 225, 202), as in Embodiment 1.
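[0080]
(5) Unlock request for the shared file
When its access is complete, the access request node sends a broadcast request message containing an unlock request to the broadcast message relay circuit 12, with the parameters listed below. Circuit 12 then broadcasts a message with these parameters to every node (step 527).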
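[0081]
Nadr := address of broadcast relay circuit 12
CTL := broadcast request message enable
CMD := unlock
Int := disable
B# := number of the resource management node
T# := number of the resource management node
R# := own node number
When a node receives this message, input buffer 51 produces an "unlock decoded" signal. The signal drives the reset terminals of lock state register 153 and lock acquisition register 152, and both registers clear the bit given by B# in the message. At every node other than the one that held the lock, that bit of lock acquisition register 152 is already 0, so the unlock leaves it unchanged. The registers at every node now show that the resource management node is unlocked (steps 504, 528, 547).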
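An unlock is simply a bit clear applied to both registers at every node. The Python sketch below uses the hypothetical class name `Regs` and integer bitmaps for registers 153 and 152. It shows that the lock holder and a bystander end in the same state even though only the holder had its acquisition bit set.

```python
class Regs:
    """Registers 153 (lock_state) and 152 (acquired) of one node, as int bitmaps."""

    def __init__(self, lock_state=0, acquired=0):
        self.lock_state = lock_state   # register 153
        self.acquired = acquired       # register 152

    def on_unlock(self, b):
        """A decoded 'unlock' drives the reset terminals of both registers."""
        mask = ~(1 << b)
        self.lock_state &= mask
        self.acquired &= mask          # no effect where the bit was already 0
```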
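[0082]
This embodiment therefore keeps the advantages of Embodiment 1 and adds two more. First, the node that manages a resource arbitrates the lock requests for it, so lock-request messages no longer have to go to the broadcast message relay circuit as in Embodiment 1, and that circuit receives fewer broadcast request messages. Second, one lock state register and one lock acquisition register are enough to show each resource's lock state and which node holds the lock, so fewer registers are needed than in Embodiment 1.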
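[0083]
<Variant of Embodiment 2>
Embodiment 2 assumes one shared file per node. If a node has several shared files, several lock state registers and lock acquisition registers can be provided, one pair per shared file. The lock on each shared file can then be managed independently of the others.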
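[0084]
[Effect of the Invention]
According to the present invention, an access request node sends an access request for a resource only after confirming that the resource is not locked. It therefore issues no useless lock requests, which reduces the exclusive control work spent on such requests.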
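[Brief Description of the Drawings]
[Fig. 1] Block diagram of a parallel computer according to the present invention.
[Fig. 2] Schematic diagram of the network used in the apparatus of Fig. 1.
[Fig. 3] Flowchart of shared-file access processing in the apparatus of Fig. 1.
[Fig. 4] Block diagram of another parallel computer according to the present invention.
[Fig. 5] Flowchart of shared-file access processing in the apparatus of Fig. 4.
[Description of Reference Numerals]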
２６…システムバス、５８…論理ゲート[0001]
[Industrial application fields]
The present invention relates to an exclusive control method for a resource shared by these processors in a computer system having a plurality of processors, and in particular, in a parallel computer in which a plurality of nodes each having a processor are coupled via a network. The present invention relates to an exclusive control method for a shared resource belonging to any node.
[0002]
[Prior art]
As a database system for searching a large-scale database at high speed, a distributed database is executed by a parallel computer composed of a plurality of processors or a client-server type distributed processing system composed of a plurality of computers. The system is known. In this specification, a plurality of processors constituting a parallel computer and a plurality of computers constituting a client-server type distributed processing system are referred to without distinction. Do Therefore, a computer element that executes distributed processing of the distributed processing system is referred to as a “node”.
[0003]
In a distributed database system, a database is distributed and held in a plurality of disk devices, and a plurality of nodes cooperate with each other to process one search request from a user. That is, a plurality of nodes access the disk devices holding different portions of the database designated by the search request in parallel with each other, and process each database portion. At this time, access from a plurality of nodes may occur simultaneously for the same database portion. At this time, in order to guarantee the execution result of these accesses, while a series of accesses from the same node is completed, these accesses from different nodes are prohibited so as to prohibit the access from other nodes. It is necessary to mediate.
[0004]
Several methods are known for configuring a distributed database system (see, for example, Reference 1: "Nikkei Electronics", No. 630, Feb. 27, 1995, pp. 67-75). Of these, the following two excel in convenience of access arbitration and ease of system construction. The first is the shared-everything method, in which main memory and a plurality of disk devices storing data are connected to a shared bus, as are a plurality of nodes, so that each node can access any of these disk devices via the shared bus. In general, however, the data transfer capacity of the shared bus becomes a bottleneck in a shared-bus shared-everything system, which severely limits the number of nodes that can be connected to one shared bus. It is therefore difficult to improve system performance by adding nodes. The second is the shared-nothing method, in which neither the main memory nor the disk devices are shared by the nodes. In this method, the disk devices are distributed among different nodes, and each disk device can be accessed directly only by the particular node to which it belongs; any other node that needs to access that disk device asks that particular node to perform the access. That is, when a file stored on a disk device shared by several nodes is accessed, the only node that can access it directly is the one node physically connected to that disk device (hereinafter called the resource management node). When another node that is not physically connected to it (hereinafter called an access request node) accesses the file, it issues an access request to the resource management node via a message exchange means such as a network, and accesses the file indirectly through the resource management node.
Therefore, when several access request nodes access the file at the same time, several access request messages are sent to the resource management node, which guarantees the execution results of these accesses by arbitrating among the request messages.
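The shared-nothing access path described above can be sketched as follows. This is a minimal Python model, not the patent's implementation: the class name `ResourceManagementNode`, the dict standing in for the shared file, the deque standing in for the message exchange means, and arbitration by simple arrival order are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ResourceManagementNode:
    """Hypothetical model of a shared-nothing resource management node.

    Only this node touches its disk: `disk` is a dict standing in for the
    shared file, and `inbox` stands in for the message exchange means.
    """
    node_id: int
    disk: dict = field(default_factory=dict)
    inbox: deque = field(default_factory=deque)

    def receive(self, sender: int, op: str, key: str, value=None) -> None:
        """An access request node asks this node to access the file for it."""
        self.inbox.append((sender, op, key, value))

    def serve(self) -> list:
        """Arbitrate by handling queued requests strictly one at a time."""
        replies = []
        while self.inbox:
            sender, op, key, value = self.inbox.popleft()
            if op == "write":
                self.disk[key] = value
                replies.append((sender, "ok"))
            else:  # "read"
                replies.append((sender, self.disk.get(key)))
        return replies


owner = ResourceManagementNode(node_id=0)
owner.receive(1, "write", "rec", "from node 1")
owner.receive(2, "read", "rec")
print(owner.serve())  # [(1, 'ok'), (2, 'from node 1')]
```

Because every request funnels through `serve`, the owning node does all the work; this is exactly the concentration of load that the problems section below describes.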
[0005]
Thus, in the shared-nothing method, the disk device of each node cannot be physically accessed by other nodes, so the files on it are not physically shared; as described above, however, they can be accessed from other nodes via the file's resource management node. In the following, therefore, even for the shared-nothing method, those files and other resources of each node that other nodes are permitted to access are called shared resources, and in particular a file that other nodes are permitted to access is called a shared file.
[0006]
With the shared-nothing method, it is relatively easy to raise system performance by adding nodes. In addition, if the resource management node processes access request messages efficiently and the inter-node message exchange means performs well, access to shared resources is also fast.
[0007]
[Problems to be solved by the invention]
However, applying the conventional shared-nothing method to a distributed database system raises the following problems. When accesses concentrate on one resource, for example a file stored on a particular disk device, the resource management node that manages that shared file is kept busy processing access requests from the other access request nodes. In a distributed database system, this resource management node shares the database processing like the other nodes and is originally meant to perform search processing and the like as well. When access requests concentrate on it, however, the time available for its own work drops sharply, its processing is delayed, and the performance of the whole system degrades. The resource management node can therefore become a bottleneck for the entire system.
[0008]
When several nodes simultaneously access a shared file stored on a disk device of some resource management node, the access requests from these nodes compete for the same file, and exclusive control of the shared file is required to resolve this access conflict. When the conventional lock request processing method is used in a shared-nothing distributed database system, each access request node issues a lock request to the resource management node before issuing its access request. Consequently, when many access requests concentrate on a particular resource management node, many lock requests concentrate on it before them. This aggravates the problem described above, namely that the resource management node's own work is delayed. The problem is described further below.
[0009]
That is, when the conventional lock request processing method is applied to this system, each access request node, before accessing the shared file, transfers a lock request for the shared file to the resource management node, interrupting it. The resource management node suspends the search processing it has been executing, interprets these lock requests, and, on finding that they are lock requests for the shared file, arbitrates which of them to grant. In client-server processing using remote procedure calls, a parallel server method (concurrent server method) is normally used: when a client makes a service request, the server starts a child process that provides the service, and itself gets ready for the next service request. If two service requests reach the server at about the same time, the server starts two child processes, which serve their clients in parallel. When this method is applied to a shared-nothing distributed database system, two child processes may handle lock requests for the same shared file, so the child processes handling these lock requests must be arbitrated against each other. Because both child processes run on the same node, this arbitration can use ordinary semaphore-based exclusive control, for example a test-and-set instruction.
[0010]
When the lock requests have been arbitrated, the child processes reply "lock granted" to the one access request node that won the lock right and "lock rejected" to the other access request nodes. After the child processes finish, the interrupted search processing is resumed.
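The concurrent-server arbitration described in [0009] and [0010] might look roughly like this in Python. Threads stand in for the child processes, and `threading.Lock.acquire(blocking=False)` stands in for the test-and-set instruction, since it atomically succeeds for exactly one caller; the function name `serve_lock_requests` is hypothetical.

```python
import threading


def serve_lock_requests(requesters):
    """Concurrent-server sketch: one thread per incoming lock request.

    Each thread stands in for a child process. A non-blocking acquire on a
    shared threading.Lock plays the role of test-and-set: it succeeds for
    exactly one caller and fails for all the others.
    """
    test_and_set = threading.Lock()   # semaphore guarding the shared file
    replies = {}
    replies_guard = threading.Lock()  # protects the reply table

    def child(node):
        granted = test_and_set.acquire(blocking=False)
        with replies_guard:
            replies[node] = "lock granted" if granted else "lock rejected"

    children = [threading.Thread(target=child, args=(n,)) for n in requesters]
    for c in children:
        c.start()
    for c in children:
        c.join()
    return replies


replies = serve_lock_requests([1, 2, 3])
print(sorted(replies.values()))  # ['lock granted', 'lock rejected', 'lock rejected']
```

Which requester wins depends on thread scheduling; the guarantee is only that exactly one is granted. The lock is deliberately left held, as it would be until the winner's release request arrives.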
[0011]
The access request node that has received lock permission accesses the shared file and continues its search processing. When all its accesses to the shared file are complete, it transfers a lock release request for the shared file to the resource management node, interrupting it. That node again suspends the search processing it is executing, interprets the release request, and unlocks the shared file; when the unlocking is complete, it resumes the interrupted search processing. Meanwhile, an access request node that received a lock rejection waits until the node that obtained lock permission has finished accessing the shared file, and issues its lock request for the shared file again after a waiting time has elapsed. This waiting time is usually either random or fixed.
[0012]
As described above, when the conventional lock request processing method is applied to a conventional shared-nothing parallel database processing system, the resource management node must interpret and execute each lock request and interpret and execute each lock release request, and its search processing cannot progress while it does so. This interference occurs every time any node issues a lock request for an access. The core of lock request processing consists of two parts: arbitrating among several lock requests that demand exclusive use of a resource, so as to grant the lock right to one of them; and, once a lock request has been selected by arbitration, locking the resource so that access to it from any node other than the one that issued the selected request is prohibited.
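To make the cost described in [0011] and [0012] concrete, the following toy Python simulation counts how often the resource management node is interrupted under the conventional scheme. The tick-based clock, the class and function names, and the uniform random wait are assumptions; the text says only that the wait is random or fixed.

```python
import random


class ConventionalResourceManager:
    """Conventional scheme: the resource management node handles every lock
    and unlock request in software, suspending its search processing."""

    def __init__(self):
        self.owner = None
        self.interrupts = 0  # times the search process was suspended

    def lock(self, node):
        self.interrupts += 1
        if self.owner is None:
            self.owner = node
            return True
        return False  # "lock rejected"

    def unlock(self, node):
        self.interrupts += 1
        if self.owner == node:
            self.owner = None


def simulate(hold_ticks=5, max_wait=3, seed=0):
    """Node 1 holds the lock until `hold_ticks`; node 2 retries after random waits.

    Returns (tick at which node 2 obtains the lock, its rejected attempts,
    total interrupts suffered by the resource management node).
    """
    rng = random.Random(seed)
    rm = ConventionalResourceManager()
    rm.lock(1)  # node 1 wins the first arbitration
    tick, rejected = 0, 0
    while True:
        if tick >= hold_ticks and rm.owner == 1:
            rm.unlock(1)  # node 1 finishes its accesses
        if rm.lock(2):
            return tick, rejected, rm.interrupts
        rejected += 1
        tick += rng.randint(1, max_wait)  # random wait before retrying
```

Every rejected retry costs the resource management node one more interrupt, on top of the lock and release requests of the winning node; in this model the total is always `rejected + 3`.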
[0013]
An object of the present invention is to provide an exclusive control method and a parallel computer for speeding up arbitration for selecting one of a plurality of exclusive use requests issued from a plurality of nodes for the same resource.
[0014]
A more specific object of the present invention is to provide an exclusive control method and a parallel computer in which exclusive use requests from a plurality of nodes for a resource managed by one node are arbitrated by a circuit other than that node's processor, thereby reducing both the arbitration time and the load on the processor.
[0015]
[Means for Solving the Problems]
To achieve these objects, in the exclusive control method of the present invention, usage state information indicating the exclusive use state of a resource usable by a plurality of nodes is stored in each node;
when a node is to issue an exclusive use request for the resource, whether the resource is in an exclusive use state is determined from the usage state information of the resource stored in that node;
if the resource is determined to be in an exclusive use state, the node refrains from issuing the exclusive use request;
if the resource is determined not to be in an exclusive use state, the node issues the exclusive use request;
exclusive use requests issued from any of the plurality of nodes are transferred via a network to an arbitration circuit accessible by the plurality of nodes;
the arbitration circuit selects, from the transferred exclusive use requests, one exclusive use request that is to use the resource exclusively; and
in response to the arbitration circuit selecting the one exclusive use request, the usage state information for the resource stored in each node is updated to usage state information indicating that the resource is in exclusive use.
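The steps above can be sketched as a small Python simulation. It is a hedged illustration under stated assumptions: `Node`, `Arbiter`, the per-node dict of usage states, and the use of 0 for "not in exclusive use" (which requires nonzero node ids) are hypothetical, and the arbiter simply lets the first pending request for a free resource win.

```python
from dataclasses import dataclass, field

UNLOCKED = 0  # assumption: node ids are nonzero, so 0 can mean "not in use"


@dataclass
class Node:
    node_id: int
    usage: dict = field(default_factory=dict)  # resource -> UNLOCKED or user id
    requests_sent: int = 0

    def request_exclusive_use(self, resource, arbiter) -> bool:
        """Check the local usage state first; issue a request only if free."""
        if self.usage.get(resource, UNLOCKED) != UNLOCKED:
            return False  # suppressed: resource already in exclusive use
        self.requests_sent += 1
        arbiter.submit(self.node_id, resource)
        return True


class Arbiter:
    """Hypothetical stand-in for the arbitration circuit reachable over the network."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.pending = []

    def submit(self, requester, resource):
        self.pending.append((requester, resource))

    def arbitrate(self):
        """Select one request per free resource, then update every node's copy."""
        winners = {}
        view = self.nodes[0].usage  # all copies are kept identical
        for requester, resource in self.pending:
            if resource not in winners and view.get(resource, UNLOCKED) == UNLOCKED:
                winners[resource] = requester
        self.pending.clear()
        for resource, winner in winners.items():
            for node in self.nodes:
                node.usage[resource] = winner
        return winners


nodes = [Node(1), Node(2), Node(3)]
arb = Arbiter(nodes)
nodes[0].request_exclusive_use("file35", arb)
nodes[1].request_exclusive_use("file35", arb)
print(arb.arbitrate())                                # {'file35': 1}
print(nodes[2].request_exclusive_use("file35", arb))  # False
```

Node 3's request is never sent, because its own copy of the usage state already shows the resource in exclusive use; this is the saving in useless exclusive use requests that the method aims for.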
[0016]
According to the present invention, a node that wants to access the resource first confirms that the resource is not in an exclusive use state and only then issues an exclusive use request for it; it does not issue a request for a resource already in exclusive use. This reduces the arbitration processing spent on useless exclusive use requests.
[0017]
A computer system according to the present invention, which carries out the above method, has:
a circuit, included in the network, for arbitrating exclusive use requests, which selects one of a plurality of exclusive use requests issued from the plurality of nodes for the resource;
and, in each node,
means for storing usage state information indicating the exclusive use state of the one resource;
means for determining, when the node is to send an exclusive use request for the one resource, whether the resource is in exclusive use, based on the stored usage state information; and
means for transferring a message containing the exclusive use request to the arbitration circuit via the network when the one resource is not in an exclusive use state.
Each node further has means for updating, in response to the one exclusive use request selected by the arbitration circuit, the usage state information stored in the node to information indicating that the resource is in exclusive use.
[0018]
In a more specific aspect of the present invention, the arbitration circuit is a single arbitration circuit provided in the network in common to the plurality of nodes, and this circuit determines whether exclusive use is permitted for an exclusive use request issued from any of the nodes.
[0019]
In another more specific aspect of the present invention, the arbitration circuit consists of a plurality of arbitration circuits, one in each node, and each node determines whether exclusive use is permitted for an exclusive use request issued from any of the nodes.
[0020]
In still another more specific aspect of the present invention, the arbitration circuit is located in the node to which the resource belongs. Based on the usage state information of the resource stored in that node, it determines whether exclusive use is permitted for an exclusive use request issued from any node and notifies every node of the result.
[0021]
[Embodiments]
Hereinafter, the computer system according to the present invention will be described in more detail with reference to some embodiments shown in the drawings. In the following, the same reference numerals represent the same or similar items.
[0022]
<Example 1>
FIG. 1 shows a parallel computer according to this embodiment. In this parallel computer, a plurality of nodes 2 are connected by a network 1. Each node 2 includes at least one processor 24 and a disk device 25, and the disk device 25 holds one or more shared files used as shared resources of the nodes. Each shared file can be accessed directly only from the node that holds it (the resource management node); another node that wants to access the file (an access request node) sends a file read request or file write request to the resource management node and asks it to execute the request. When a node uses a shared resource, it must first acquire the right to use that resource exclusively, so that no other node can use it while it is in use, and must relinquish that right afterward. In a conventional shared-nothing system, an access request node that requires exclusive use of a resource managed by a resource management node sends a lock request to that resource management node, which arbitrates among the lock requests it receives, selects one of them, and then locks the resource. Processing a lock request thus involves both arbitrating among exclusive use requests and locking the resource. As described later, this embodiment arbitrates which access request node is granted the exclusive right to use a resource, but does not lock the resource itself when it is used. Although this embodiment therefore realizes exclusive use of shared resources differently, it keeps the conventional term "lock", as in "lock request" or "lock state"; here, "lock" simply means exclusive use, and a lock state means an exclusive use state. The term "lock" is used in the same sense in the other embodiments and modifications.
[0023]
This embodiment is characterized in that the lock state of each node's shared resources is managed at every node, including that node itself. For this purpose, each node 2 is provided with a lock state register group 52 and a lock control circuit 500, which are characteristic of this embodiment. The lock state register group 52 manages the lock state of all shared resources in the parallel computer and contains one register per unit of locking. In this embodiment, the unit of locking is the node: even when a node has several shared resources, they are locked together. Accordingly, the lock state register group 52 consists of one register per node. Each lock state register holds information indicating the unlocked state when its corresponding node is not locked, and holds the identification information of the locking node when its corresponding node is locked. When a program running on a node wants to lock another node, it examines the lock state register group 52 in its own node to determine whether the target node is already locked, and does not send a lock request if it is. This eliminates the useless lock requests for already-locked files that occurred in the prior art, and with them the waste of the resource management node processing such requests.
[0024]
When several nodes request a lock on the same shared resource, those lock requests must be arbitrated and one of them selected. In this embodiment, this arbitration is performed by the broadcast message relay circuit 12 together with the lock control circuit 500 of each node. When a node 2 wants to use a shared file, it transfers to the broadcast message relay circuit 12, via the network 1, a broadcast request message whose information to be broadcast consists of the lock request, the identification information of the resource management node holding the file, and the identification information of the requesting node (the access request node). On receiving a broadcast request message, the broadcast message relay circuit 12 generates a broadcast message containing the information to be broadcast and broadcasts it to all nodes via the network 1. If broadcast request messages arrive at the circuit 12 from several nodes 2, it processes them one at a time; the broadcast message relay circuit 12 can therefore be regarded as a circuit that serializes broadcast request messages. The broadcast message containing the lock request is delivered from the broadcast message relay circuit 12 to every node 2 via the network 1, and each node 2 determines whether to grant the lock requested in the broadcast message. Basically, when several lock requests demand a lock on the same resource, the lock request that arrived first is granted. This determination is made by the lock control circuit 500 of each node, which rewrites the lock state register 52 when it grants a lock request. The broadcast message relay circuit 12 serves to deliver lock requests issued from different nodes to every node in the same order.
Therefore, the lock determination made by the lock control circuit of each node is the same at every node. In this way, the processor 24 of each node can know which resource management nodes are locked and which node has locked each of them. The node 2 that transmitted the broadcast request message containing a lock request can therefore tell whether its lock succeeded, and if it has locked the resource management node, it sends a message containing a file access request addressed to that resource management node.
[0025]
In the prior art, the access request node sends a lock request to the resource management node, which suspends the program it is executing and processes the lock request in software. In this embodiment, lock requests are serialized by the broadcast message relay circuit 12 and then arbitrated, and the lock state is managed at every node, so the resource management node does not need to process lock requests at all. This reduces the processing load on the resource management node.
[0026]
Hereinafter, the circuit of this embodiment and details of its operation will be described.
(Node 2)
As shown in FIG. 1, each node 2 includes a processor 24 that executes programs such as the search process 31, a local memory 23, a disk device 25 that stores the database held by the node, and a transmission control circuit 21 and a reception control circuit 22 that perform the high-speed lock processing characteristic of this embodiment. This parallel computer is a so-called distributed-memory parallel computer: the local memory 23 is not shared with other nodes and can be accessed only by the node to which it belongs, and it stores the programs executed on that node and the data they use or generate. Each circuit in the node is connected to the system bus 26 and, by the memory-mapped I/O method, can be accessed from the processor 24 with memory access instructions such as load and store instructions, without being distinguished from the local memory 23.
[0027]
The transmission control circuit 21 includes a message generation circuit 41, a transmission parameter storage register 42, and a transmission status register ST43. The reception control circuit 22 includes an input buffer 51, a lock state register group 52, and a lock control circuit 500. The lock control circuit 500 includes a coincidence circuit 53 that determines whether two inputs are equal, a magnitude determination circuit 54 that compares the magnitudes of two inputs, an adder circuit 56 that adds two inputs, selectors 55 and 57, a gate circuit 58, and an AND circuit 59.
[0028]
(Network 1)
The network 1 is the same as the one described in Japanese Patent Application No. 6-53405. As shown in FIG. 2, it is basically a so-called hyper-crossbar network made up of a plurality of crossbar switches 7, 8 and a plurality of relay switches 3. These crossbar switches comprise a plurality of X-direction crossbar switches 7 or 8 and a plurality of Y-direction crossbar switches 5 or 6. Each node 2 is connected, through its corresponding relay switch 3, to one X-direction crossbar switch 7 or 8 and one Y-direction crossbar switch 5 or 6. Each relay switch 3 relays messages among the node, the X-direction crossbar switch, and the Y-direction crossbar switch connected to it.
[0029]
Each node 2 is assigned a pair XY of X and Y coordinates of a lattice point in a two-dimensional space. Each X-direction crossbar switch interconnects the group of nodes 2 whose Y coordinates equal a particular value and whose X coordinates differ, and each Y-direction crossbar switch interconnects the group of nodes whose X coordinates equal a particular value and whose Y coordinates differ. The X-direction crossbar switch 7 to which the broadcast message relay circuit 12 is connected has one more input/output port than the other X-direction crossbar switches 8, and the same applies to the Y-direction crossbar switch 5 to which the broadcast message relay circuit 12 is connected. Hereinafter, the X-direction crossbar switch 7 may therefore be called the extended crossbar switch or extended XB-X0, and the Y-direction crossbar switch 5 the extended crossbar switch or extended XB-Y3. The other X-direction crossbar switches 6 are called XB-X1, XB-X2, and XB-X3, the other Y-direction crossbar switches 6 are called XB-Y0, XB-Y1, and XB-Y2, and the corresponding relay switches are sometimes called EXij. Each X-direction crossbar switch 8 or Y-direction crossbar switch 5 has as many route determination circuits 13 as it has input/output ports; each transfers a message input from one of the relay switches 3 according to the destination address in the message. Similarly, the extended crossbar switch 7 or 6 also has a route determination circuit 14 for its extended port.
[0030]
(Broadcast message relay circuit 12)
The broadcast message relay circuit 12 has the same structure as that described in the specification of Japanese Patent Application No. 6-53405. It selects, one at a time, the broadcast request messages sent to it, converts each selected broadcast request message into a broadcast message containing the information to be broadcast, and broadcasts that message to every node via the network 1. The broadcast message relay circuit 12 is generally used to prevent broadcast messages from driving the network 1 into a deadlock state. In this embodiment it is additionally used as a serialization circuit that, when broadcast request messages containing lock requests are transferred from several access request nodes, selects them one at a time.
[0031]
That is, the broadcast message relay circuit 12 is provided separately from the relay switches 3 to which the nodes 2 are connected, and is connected to the extended input/output port (here, at address 04) of the extended crossbar switch 7, one of the X-direction crossbar switches, and to the extended input/output port (here, at address 43) of the extended crossbar switch 6, one of the Y-direction crossbar switches. When an access request node directly connected to a relay switch on an X-direction crossbar switch 8 other than the extended X-direction crossbar switch 7 (for example, XB-X1), such as the node connected to EX12, transfers a broadcast request message containing a lock request to the broadcast message relay circuit 12, it transmits the message with the address 43 of the extended port of the extended Y-direction crossbar switch 6 as its destination address. The message reaches the broadcast message relay circuit 12 via the crossbar switch XB-X1, a relay switch such as EX13, the extended Y-direction crossbar switch 6, and the extended input/output port 43. In contrast, an access request node directly connected to a relay switch on the extended X-direction crossbar switch 7, for example the node connected to the relay switch EX01, transmits its broadcast request message containing a lock request with the address 04 of the extended port of the extended X-direction crossbar switch 7 as its destination address, and the message reaches the broadcast message relay circuit 12 through that extended input/output port of the crossbar switch 7.
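The choice of destination address, and the resulting path, can be expressed as a small Python routing helper. The mapping of relay switch EXij onto crossbars XB-Xi and XB-Yj is inferred from the EX12 and EX01 examples above rather than stated in the text, and the 4x4 index range (0 to 3) and the function names are assumptions.

```python
def relay_destination(i: int) -> str:
    """Destination address for a broadcast request issued via relay switch EXij."""
    return "04" if i == 0 else "43"


def route_to_relay(i: int, j: int) -> list:
    """Hops from the node on relay switch EXij to broadcast message relay circuit 12.

    Assumes EXij sits on X-direction crossbar XB-Xi and Y-direction crossbar
    XB-Yj, as the EX12 -> XB-X1 -> EX13 -> XB-Y3 example suggests.
    """
    if i == 0:
        # already on the extended X-direction crossbar: use its port 04
        return [f"EX{i}{j}", "XB-X0", "port 04"]
    hops = [f"EX{i}{j}"]
    if j != 3:
        # cross XB-Xi to the relay switch that also sits on XB-Y3
        hops += [f"XB-X{i}", f"EX{i}3"]
    return hops + ["XB-Y3", "port 43"]


print(route_to_relay(1, 2))  # ['EX12', 'XB-X1', 'EX13', 'XB-Y3', 'port 43']
print(route_to_relay(0, 1))  # ['EX01', 'XB-X0', 'port 04']
```

The case `j == 3` (a node already on XB-Y3) is not discussed in the text; going straight to XB-Y3 there is a further assumption.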
[0032]
As described in Japanese Patent Application No. 6-53405, the broadcast message relay circuit 12 consists of two input buffers connected to the two input ports at addresses 04 and 43, a selector that selects one of these input buffers, a priority circuit that instructs the selector, a control bit changing circuit that changes the control (CTL) bits of the broadcast request message selected by the selector so that it becomes a broadcast message, and an output buffer that sends the resulting broadcast message, containing the information to be broadcast from the broadcast request message together with the changed control bits, to the output port 04 of the network 1. The broadcast message relay circuit 12 thus uses the control bit changing circuit to turn each broadcast request message it receives into a broadcast message and sends it to the extended input/output port at address 04 of the extended X-direction crossbar switch 7. As described above, the route taken by a broadcast request message depends on the position of the access request node, so the route of the corresponding broadcast message does not overlap with that of the broadcast request message; this prevents the deadlock that could occur if the routes overlapped. Even when two broadcast request messages are received at the same time, the priority circuit selects them one at a time. In this way, the broadcast message relay circuit 12 sends the broadcast messages into the network 1 one after another, and they are delivered to all nodes over the same transfer path. Therefore, in a network in which messages do not overtake one another along a route, broadcast messages arrive at every node in the same order. Lock requests issued from different nodes are thus notified to all nodes in the order selected by the broadcast message relay circuit 12.
As described later, in this embodiment, among several broadcast messages containing lock requests for the same resource management node, the lock request in the broadcast message that arrives first at each node is processed as the valid lock request and is allowed to lock the shared files in that resource management node. Lock requests in broadcast messages that arrive later for the same resource management node therefore cannot lock those shared files. Because the broadcast message relay circuit 12 makes broadcast messages arrive at every node in the same order, every node grants the lock to the lock request issued by the same access request node.
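A hypothetical Python model of the relay circuit's two input buffers, priority selector, CTL-bit change, and output buffer is sketched below. The text does not say which rule the priority circuit applies; alternating (round-robin) priority is assumed here, and the `Message` fields and string values are illustrative.

```python
from collections import deque
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Message:
    ctl: str        # "broadcast_request" or "broadcast"
    payload: tuple  # information to be broadcast, e.g. (register no., node no.)


class BroadcastRelay:
    """Hypothetical model of circuit 12: two input buffers, a priority
    selector, a CTL-bit changer, and one output buffer."""

    def __init__(self):
        self.inputs = {"04": deque(), "43": deque()}
        self.output = deque()
        self._first = "04"  # assumed round-robin; the text only says "priority circuit"

    def receive(self, port: str, msg: Message) -> None:
        self.inputs[port].append(msg)

    def step(self) -> bool:
        """Select one request, change its CTL bits, and queue it for broadcast."""
        other = "43" if self._first == "04" else "04"
        for port in (self._first, other):
            if self.inputs[port]:
                msg = self.inputs[port].popleft()
                self.output.append(replace(msg, ctl="broadcast"))
                # give the other input port priority next time
                self._first = "43" if port == "04" else "04"
                return True
        return False


relay = BroadcastRelay()
relay.receive("43", Message("broadcast_request", (0, 1)))  # lock request from node #1
relay.receive("04", Message("broadcast_request", (0, 2)))  # lock request from node #2
while relay.step():
    pass
delivered = [m.payload for m in relay.output]
print(delivered)  # [(0, 2), (0, 1)]
```

Since every node receives this single output stream in the same order, every node would treat node #2's request as the first, valid lock request for register #0.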
[0033]
(Binary semaphore)
Before the lock request processing is described, the operation of a semaphore is briefly explained. In this embodiment, locking of shared files is realized by using the lock state register groups 52 in the reception control circuits 22 of all nodes as a semaphore. The semaphore here is a so-called binary semaphore, which is used when at most one requester may hold a shared resource at a time. A binary semaphore has the initial value 0 and is manipulated by a P operation, which acquires the lock, and a V operation, which releases it.
P(X): if X = 0 then X := 1   (P operation)
V(X): if X = 1 then X := 0   (V operation)
Each of these operations is performed indivisibly, for example with a test-and-set (T&S) instruction. The lock is acquired when the P operation sets the binary semaphore X to 1, and released when the V operation sets X back to 0.
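The P and V operations above map naturally onto a small Python class. In this sketch a `threading.Lock` provides the indivisibility that a test-and-set instruction would provide in hardware; the class and method names are assumptions. As defined in the text, P does not block: it simply fails when X is already 1.

```python
import threading


class BinarySemaphore:
    """Binary semaphore with the P and V operations defined above.

    A threading.Lock makes each read-modify-write indivisible, standing in
    for a hardware test-and-set. This P does not block; it reports whether
    the lock was taken.
    """

    def __init__(self):
        self._x = 0  # initial value 0: not locked
        self._atomic = threading.Lock()

    def p(self) -> bool:
        """P(X): if X = 0 then X := 1. Returns True if the lock was acquired."""
        with self._atomic:
            if self._x == 0:
                self._x = 1
                return True
            return False

    def v(self) -> bool:
        """V(X): if X = 1 then X := 0. Returns True if a lock was released."""
        with self._atomic:
            if self._x == 1:
                self._x = 0
                return True
            return False


x = BinarySemaphore()
print(x.p(), x.p(), x.v(), x.p())  # True False True True
```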
[0034]
In this embodiment, shared-file locking is realized in a shared-nothing system by using the lock state register groups 52 of all nodes as a binary semaphore.
[0035]
(Initialization of the lock state register group 52)
Before the apparatus starts operating, every node initializes its lock state register group 52: in each node 2, the processor 24 writes 0 to every register in the lock state register group 52 via the system bus 26 using store instructions. Each register in the lock state register group 52 then holds 0, indicating that no node is locked.
[0036]
(Lock request processing)
The following describes, with reference to FIG. 3, how the shared file is locked when the access request nodes #1 and #2 access the shared file 35 stored on the disk device 25 connected to the resource management node #0 at almost the same time. In FIG. 3, processing enclosed in a double line, such as processing 261, mainly represents processing executed by hardware, and processing enclosed in a single line mainly represents processing by a program executed by the processor 24.
[0037]
First, in the resource management node #0 and the access request nodes #1 and #2, the processor 24 of each node executes the search processing program 31 in parallel, using the local memory 23 and the disk device 25 (steps 201, 221, and 241). When the access request nodes #1 and #2 need exclusive access to the shared file 35 on the disk device 25 of the resource management node #0 at almost the same time (steps 222 and 242), each node performs lock request processing for the shared file (steps 223 and 243).
[0038]
(1) Broadcast lock request message
As described above, the lock state register group 52 manages the lock state of all shared resources in the parallel computer and has one register per unit of locking. In this embodiment the unit of locking is the node, so the lock state register group 52 consists of one register per node. Hereinafter, each lock state register is therefore numbered with the number of the node to which it corresponds; for example, the lock state register for the resource management node #0 is called the lock state register #0.
[0039]
Access request nodes # 1 and # 2 first determine whether resource management node # 0 is already locked. To do so, the processor 24 reads the contents of lock state register # 0 via the system bus 26 and checks whether the value is zero. If it is not 0, resource management node # 0 is already locked, and the node does not issue a lock request. In this way, this embodiment prevents each node from issuing useless lock requests.
[0040]
After confirming that resource management node # 0 is not locked, access request nodes # 1 and # 2 perform the P operation described above on lock state register # 0 as follows: each sends a broadcast request message 11 containing the parameters below to the broadcast message relay circuit 12, for broadcasting to all nodes. This message requests that a lock request message be broadcast asking every node to set its lock state register # 0 to the locked state.
[0041]
Nadr (address): = address of broadcast message relay circuit 12
CTL (control): = broadcast request message enable
R # (register number): = # 0
Int (interrupt): = disabled
D0 (data 0): = 0
D1 (data 1): = number of access request node # 1 (or # 2)
Ctyp0 (operation type 0): = match determination enable
Ctyp1 (operation type 1): = set enable
Here, Nadr is the network address of the destination of the broadcast request message and is used by network 1. In this case it is address 04 or 43 of the broadcast message relay circuit 12; which of the two is used depends on whether access request node # 1 is connected to a relay switch that is connected to the extended X-direction crossbar 7, as described above. CTL is a control field indicating the message type; here it indicates a broadcast request message. R # is the number of the lock state register corresponding to the resource management node targeted by the lock request, here # 0. Int specifies whether access request node # 1 is to be notified by an interrupt when the lock requested by the broadcast request message is acquired. Here it is disabled, so no interrupt occurs; instead, as described later, the processor 24 in access request node # 1 detects whether the lock request succeeded by monitoring the contents of lock state register # 0 in its own node. When Int is enabled, access request node # 1 instead waits for an interrupt to determine whether the lock succeeded. D0 and D1 are data operands used by the lock control circuit 500 of each node. Here, D0 is 0, the value indicating that lock state register # 0 is unlocked, and is used to determine whether the register's current value is zero; D1 is the number of access request node # 1 (or # 2), the value to be written into the register when the lock succeeds. Ctyp0 is the first parameter specifying the operation performed at each node; here it specifies a match determination between the current value of lock state register # 0 and D0, which determines whether the register is currently unlocked. Ctyp1 indicates whether the result computed by the lock control circuit 500 should be written into lock state register # 0; here, writing is enabled.
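The P operation performed by each node's lock control circuit 500 amounts to a conditional compare-and-set on one lock state register. The Python sketch below models it under simplifying assumptions: the names `LockMessage` and `lock_control` are hypothetical, the register group is a list indexed by R #, and only the match-determination path (Ctyp0) and the set path (Ctyp1) are represented; the magnitude-determination and addition paths are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LockMessage:
    """Hypothetical software view of the broadcast message fields listed above."""
    reg: int                    # R#: lock state register number
    d0: int                     # D0: value compared with the register's current contents
    d1: int                     # D1: value written when the comparison succeeds
    match_enable: bool = True   # Ctyp0: use the match determination
    set_enable: bool = True     # Ctyp1: write D1 into the register
    interrupt: bool = False     # Int: notify the processor by interrupt (not modeled)


def lock_control(registers: list[int], msg: LockMessage) -> bool:
    """Model of lock control circuit 500 acting on one lock state register.

    Returns True when the match condition holds (gate 58 active).
    The magnitude-determination and addition paths are not modeled and are
    treated as "no match".
    """
    condition = msg.match_enable and registers[msg.reg] == msg.d0
    if condition and msg.set_enable:
        registers[msg.reg] = msg.d1   # gate 58 passes D1 into the register
    return condition
```

A second lock request carrying D0 = 0 fails once the register holds the first requester's number, which is exactly the behavior the arbitration below relies on.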
[0042]
To transmit this broadcast request message, access request node # 1 uses store instructions to write the parameters above into the transmission parameter storage register 42 via the system bus 26. In the transmission control circuit 21, when the message generation circuit 41 detects that these parameters have been written into register 42, it generates the broadcast request message 11 from them and sends it to network 1. When transmission to network 1 completes, the state ST in the transmission status register 43 is set to "transmission complete". By polling the transmission status register 43 over the system bus 26 with load instructions, the processor 24 learns that the shared file lock request has been sent. After confirming that register 43 reads "transmission complete", the processor 24 clears it with a store instruction so that the next message can be transmitted.
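The store / poll / clear handshake between the processor 24 and the transmission control circuit 21 can be sketched in Python as follows. This is an assumption-laden software model: the class and function names are hypothetical, network 1 is a plain list, and transmission completes synchronously inside `store_params`, so the polling loop never actually spins here.

```python
from enum import Enum


class TxState(Enum):
    IDLE = 0
    COMPLETE = 1   # "transmission complete"


class TransmitControl:
    """Hypothetical model of transmission control circuit 21."""

    def __init__(self, network: list):
        self.network = network       # stands in for network 1
        self.params = None           # transmission parameter storage register 42
        self.status = TxState.IDLE   # transmission status register 43

    def store_params(self, params: dict) -> None:
        # A processor store into register 42 triggers message generation circuit 41,
        # which builds the message and hands it to the network.
        self.params = dict(params)
        self.network.append(dict(params))
        self.status = TxState.COMPLETE


def send_message(tx: TransmitControl, params: dict) -> None:
    """Processor-side sequence: store, poll register 43, then clear it."""
    tx.store_params(params)
    while tx.status is not TxState.COMPLETE:
        pass                          # in hardware, a loop of load instructions
    tx.status = TxState.IDLE          # clear register 43 so the next message can be sent
```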
[0043]
Message 11 sent to network 1 is delivered to the broadcast message relay circuit 12, as already described. Access request nodes # 1 and # 2 issue their lock requests in the same way. The broadcast message relay circuit 12 selects the messages from these nodes one at a time, changes the control bits CTL of each selected message from those indicating a broadcast request message to those indicating a broadcast message, and broadcasts it to all nodes (step 261). As stated above, access request nodes # 1 and # 2 request a lock on shared file 35 almost simultaneously. Assume that the broadcast message relay circuit 12 selects the broadcast request message from access request node # 1 before the one from access request node # 2 and broadcasts it to all nodes first. This means that at every node, the lock request message from access request node # 1 arrives before the lock request message from access request node # 2.
[0044]
(2a) Lock request arbitration (lock success case)
When the reception control circuit 22 of each node receives the broadcast message carrying the lock request from access request node # 1, it operates as follows. The message from network 1 is first stored in the input buffer 51. According to the lock state register number R # of the stored message (# 0 in this case), the value held in lock state register # 0, here the initial value 0, is output to the match determination circuit 53, the magnitude determination circuit 54, and the addition circuit 56. The data D0 of the received message, here 0, is output to the match determination circuit 53 and the magnitude determination circuit 54. Because both inputs of the match determination circuit 53 are 0, its output is 1. Since Ctyp0 of this message specifies match determination, the selector 55 selects the output of the match determination circuit 53 rather than that of the magnitude determination circuit 54. The output of the selector 55 therefore becomes active, which in turn activates the gate 58. Meanwhile, because the Ctyp1 field of the received message specifies set, the selector 57 selects D1 of the received message, here the node number # 1 of access request node # 1, and outputs it to the gate 58. Since the gate 58 is active, its output, node number # 1, is written into lock state register # 0. Finally, because the Int field of the received message is disabled, the output of the AND gate 59 is negated and no interrupt is raised to the processor 24.
[0045]
Although lock requests are arbitrated as described above, in this embodiment the resource management node itself does not lock the requested resource on behalf of the lock request selected by the arbitration.
[0046]
(2b) Lock request arbitration (lock failure case)
In this case, lock state register # 0 for resource management node # 0 already holds a nonzero value. Each node receives the broadcast message carrying the lock request from access request node # 2 after the broadcast message carrying the lock request from access request node # 1. The same processing as in (2a) is performed, but because the node number of access request node # 1 has already been written into lock state register # 0, the match determination circuit 53 detects no match and its output is negated; consequently, the outputs of the selector 55 and the gate 58 are also negated, and the gate 58 does not write to lock state register # 0. The state in which access request node # 1 holds the lock on node # 0 is therefore maintained.
[0047]
As described above, the P operations on lock state register # 0 are executed in the same order at every node, which guarantees lock exclusivity. Moreover, resource management node # 0 does not interrupt the processing it is executing during this lock processing.
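Why a single serialization point is enough can be seen in a small simulation: if every node applies the same sequence of compare-and-set operations, every node ends with the same lock state. The sketch below is a hypothetical model, not the patented circuit; `selected` is the order in which the broadcast message relay circuit picked the requests, and each node's register group is a list.

```python
def broadcast_in_order(selected, node_count):
    """Apply lock requests to every node's register group in the relay circuit's order.

    selected: (register number, requester number) pairs, in the order in which the
    broadcast message relay circuit selected them. Returns each node's registers.
    """
    groups = [[0] * node_count for _ in range(node_count)]  # 0 = unlocked
    for reg, requester in selected:
        for registers in groups:            # the same message reaches every node
            if registers[reg] == 0:         # match determination against D0 = 0
                registers[reg] = requester  # set D1 = requester number
    return groups
```

With the ordering assumed above, `broadcast_in_order([(0, 1), (0, 2)], 3)` leaves register # 0 holding 1 at all three nodes, so node # 1 holds the lock everywhere and node # 2's request fails everywhere.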
[0048]
(3a) Confirmation of lock (success case)
Access request node # 1, which issued the shared file lock request, monitors lock state register # 0 with load instructions and confirms that its lock succeeded when it sees that the contents of the register have changed to its own node number (step 224). From then until it releases the lock, access request node # 1 holds the right to access shared file 35 in resource management node # 0.
[0049]
(3b) Lock confirmation (failure case)
Access request node # 2, which also issued a shared file lock request, monitors lock state register # 0 with load instructions and confirms that its lock failed when it sees that the contents of the register have changed to the node number of access request node # 1 (step 244). Thereafter, access request node # 2 cannot access shared file 35 until access request node # 1 releases the lock.
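Steps 224 and 244 differ only in which node number appears in the register, so both can be expressed by one polling routine. The following Python sketch is hypothetical: `read_register` stands in for a load instruction on lock state register # 0, `max_polls` is an assumed safety bound not found in the text, and the model assumes the register was 0 when polling began (as checked before issuing the request).

```python
from typing import Callable


def confirm_lock(read_register: Callable[[], int], own_number: int,
                 max_polls: int = 1_000_000) -> bool:
    """Poll a lock state register until a lock request has been applied.

    Returns True if the register holds this node's number (lock acquired)
    and False if it holds another node's number (lock failed).
    """
    for _ in range(max_polls):
        value = read_register()
        if value != 0:                 # some node's lock request has been applied
            return value == own_number
    raise TimeoutError("lock state register did not change")
```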
[0050]
(4) File access
Access request node # 1 then accesses shared file 35 by sending, via network 1, a message containing an access request to the I / O processing program 32 executed by the processor 24 of resource management node # 0 (step 225). When the resource management node takes this message into its input buffer 51, an interrupt circuit (not shown) supplies an interrupt signal to the processor 24. In its interrupt processing program, the processor 24 activates the I / O processing program 33 that the access request names as its communication destination and instructs it to perform the I / O operation specified by the request on the file specified by the request (step 202). If the access request is a read request, the processor 24 reads the data from the disk device 25 and sends a message containing the data to the access request node. If it is a write request, the access request includes the write data; the processor 24 writes the data to the disk device 25 and sends a message reporting write completion to the access request node.
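The read and write branches of the resource management node's I / O handling can be summarized in a short Python sketch. Everything here is a hypothetical stand-in: the disk device 25 is a dictionary of file contents, and the request keys `op`, `file`, `from`, and `data` are illustrative, not message fields defined in the text.

```python
def handle_access_request(disk: dict, request: dict) -> dict:
    """Serve one access request at the resource management node.

    disk maps file names to contents (stand-in for disk device 25); request
    carries hypothetical keys "op", "file", "from" and, for writes, "data".
    Returns the reply message sent back to the access request node.
    """
    if request["op"] == "read":
        return {"to": request["from"], "data": disk[request["file"]]}
    if request["op"] == "write":
        disk[request["file"]] = request["data"]
        return {"to": request["from"], "status": "write complete"}
    raise ValueError(f"unsupported operation: {request['op']!r}")
```

No locking appears in the handler: exclusivity was already established by the lock state registers before the request was sent.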
[0051]
In the prior art, when a resource management node receives a file access request from another node, it creates a coprocess to execute the access specified by the request and controls the execution of that coprocess. When the resource management node receives lock requests in parallel from several nodes for the same file on the disk storage device 25, it creates one coprocess for each lock request. Because these coprocesses access the same file, the resource management node must arbitrate their lock requests for the file, that is, select one coprocess. Conventional lock request processing then locks the file so that the selected coprocess can use it exclusively, and prohibits the other coprocesses from using the file until the selected coprocess has finished with it.
[0052]
In this embodiment, no lock request is sent from an access request node to the resource management node, so the resource management node does not lock resources within itself. Instead, a circuit outside the processor of the resource management node arbitrates exclusive-use requests, so that only one access request node at a time is permitted to access the disk storage device 25 of a given resource management node. Because exclusive use is arbitrated outside the processor, multiple exclusive-use requests for the same disk storage device never reach the processor of the resource management node, and multiple coprocesses requiring access are never created in parallel. Consequently, accesses from different nodes to the same disk storage device never occur at the same time, even though the resource management node performs no lock control. Exclusive use of a resource is instead guaranteed by the lock state register group in each node: before accessing a resource of another node, a node uses this register group to determine whether the resource is locked.
[0053]
(5) Shared file unlock request
After completing its access to shared file 35, access request node # 1 performs a V operation on lock state register # 0 to unlock the shared file, as follows. First, a broadcast request message containing the following parameters is sent to the broadcast relay circuit 12, which broadcasts a message containing these parameters to all nodes.
[0054]
Nadr: = address of broadcast relay circuit 12
CTL: = broadcast request message enable
R #: = # 0
Int: = Disable
D0: = node number of access request node # 1
D1: = 0
Ctyp0: = Enable coincidence determination
Ctyp1: = Set enable
These parameters differ from those of the lock request broadcast message described above only in that the contents of D0 and D1 are swapped. The message is sent to the broadcast message relay circuit 12 and broadcast from there to all nodes via network 1 (step 262). The circuit operation is the same as for the lock request broadcast message.
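Because the unlock message is the lock message with D0 and D1 swapped, the same compare-and-set hardware performs both the P and V operations. The Python sketch below illustrates this under stated assumptions: the function names are hypothetical, messages are dictionaries holding only the R #, D0, D1, and Int fields, and Ctyp0/Ctyp1 are fixed at match determination and set.

```python
def lock_params(target: int, requester: int) -> dict:
    """Parameters of a lock (P operation) request: compare with 0, set requester."""
    return {"R#": target, "D0": 0, "D1": requester, "Int": False}


def unlock_params(target: int, holder: int) -> dict:
    """Unlock (V operation): identical to lock_params except D0 and D1 are swapped."""
    params = lock_params(target, holder)
    params["D0"], params["D1"] = params["D1"], params["D0"]
    return params


def apply_at_node(registers: list, params: dict) -> None:
    """Match determination against D0, then set D1 (as lock control circuit 500 does)."""
    if registers[params["R#"]] == params["D0"]:
        registers[params["R#"]] = params["D1"]
```

A side effect of the match determination is visible in this model: an unlock message naming a node other than the current holder leaves the register unchanged.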
[0055]
(6a) Unlocking
When each node receives the broadcast message carrying the unlock request from access request node # 1, the reception control circuit 22 proceeds as for the lock request message: after the match determination circuit 53 determines that lock state register # 0 holds the node number of access request node # 1 given by D0, the value 0 given by D1 is written into the register. Lock state register # 0 thus returns to its initial value 0, indicating that shared file 35 is not in use by any node.
[0056]
Meanwhile, access request node # 2, whose lock attempt failed in processing 244, waits to re-request the shared file lock (step 245). Normally, this is done by repeating the lock request processing (as in step 243) after a fixed interval, using a timer function internal to the processor 24. If access request node # 2 re-requests the lock before access request node # 1 has unlocked lock state register # 0, the register still holds the node number of access request node # 1, so the P operation fails and the lock is not acquired. However, if access request node # 2 makes its request after lock state register # 0 has been released, the node number of access request node # 2 is stored in lock state register # 0 in the same manner as described above, and access request node # 2 succeeds in acquiring the lock.
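The timer-driven retry of step 245 is an ordinary bounded retry loop. In the Python sketch below, `try_lock` is a hypothetical callable standing in for one round of lock request processing plus lock confirmation, and the retry interval and attempt limit are illustrative values that the text does not specify.

```python
import time
from typing import Callable, Optional


def acquire_with_retry(try_lock: Callable[[], bool],
                       retry_interval_s: float = 0.001,
                       max_attempts: int = 100) -> Optional[int]:
    """Repeat the lock request processing until it succeeds.

    Returns the attempt number that succeeded, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if try_lock():
            return attempt
        time.sleep(retry_interval_s)   # stands in for the processor's internal timer
    return None
```

A fixed interval keeps the sketch close to the text; a real system might add backoff to reduce broadcast traffic while the lock is held, but that is a design choice outside the source.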
[0057]
As described above, in this embodiment the search processing executed by the processor of a particular node, such as the resource management node, continues without interruption both when a lock is requested and when it is released. This improves the performance of the system as a whole.
[0058]
<Modification 1 of Example 1>
Example 1 used the broadcast message relay circuit 12 described in Japanese Patent Application No. 6-53405. This circuit is not part of a standard hyper-crossbar network and must be provided in addition to it. This modification achieves the same operation as Example 1 without such an additional circuit.
[0059]
Japanese Patent Application No. 6-169995 discloses a technique for preventing deadlock in the network by using a serialization circuit, normally provided in each X-direction and Y-direction crossbar switch, that broadcasts the broadcast messages transferred to the crossbar network one at a time, in the same way as the broadcast message relay circuit of Japanese Patent Application No. 6-53405. In this technique, a node requesting a broadcast sends a broadcast request message addressed to one relay switch connected to a specific crossbar switch, chosen in advance from among the network's crossbar switches to act as the broadcast message serialization circuit. This specific relay switch is the one that is connected both to the specific crossbar switch and to a crossbar switch of a different coordinate axis (for example, a Y-direction crossbar switch when the specific crossbar switch is an X-direction crossbar switch) to which the sending node is connected. When broadcast request messages arrive at the specific crossbar switch from several of the relay switches connected to it, the switch selects one of them, converts it into a broadcast message, and sends the broadcast message to each relay switch connected to it; these relay switches then deliver it to the nodes connected to the network.
[0060]
This modification uses that specific crossbar switch in the same way as the broadcast message relay circuit 12 of Example 1. That is, the broadcast relay circuit 12 in FIG. 1 is removed, and the extended X-direction crossbar switch 1XE and the extended Y-direction crossbar switch 1YE are accordingly given the same number of input and output ports as the other X-direction and Y-direction crossbar switches. Each node need only send its broadcast request message containing a lock request to the relay switch connected to the specific crossbar switch (that is, the specific relay switch described above). The corresponding broadcast message is then broadcast to every node by the method of Japanese Patent Application No. 6-169995. Each node uses the same circuits as in Example 1 and operates in the same way.
[0061]
<Modification 2 of Example 1>
In Example 1, the broadcast message relay circuit 12 selects one of the broadcast request messages containing lock requests that are transferred over network 1, changes it into a broadcast message, and forwards it to the nodes; it thus acts as a serialization circuit that selects lock-request broadcast request messages one at a time. However, it broadcasts not only the first selected broadcast request message containing a lock request for a given node but also every subsequent one for the same node. Each node receives all of these broadcast messages in turn, yet only the lock request in the first one received is processed as a valid lock request. Consequently, in Example 1 every broadcast message for the same node other than the first is broadcast even though no node uses it, wasting network 1 bandwidth on useless broadcast messages. This modification eliminates such useless broadcasts.
[0062]
To this end, the broadcast message relay circuit 12 is also provided with the lock state register group 52 and the lock control circuit 500 that each node has. When the priority circuit in the broadcast message relay circuit 12 first selects a broadcast request message containing a lock request for some node, the lock control circuit stores the number of the requesting node in the lock state register corresponding to the lock destination node specified by the request, indicating that this node is locked. As in Example 1, the control bit changing circuit in the broadcast message relay circuit 12 then changes this first selected broadcast request message into a broadcast message and forwards it to the network. When the priority circuit later receives another broadcast request message containing a lock request, the lock control circuit in the broadcast message relay circuit 12 examines the lock state register for the node whose lock is requested; if that register already holds the number of some lock request source node, the subsequent message is treated as invalid and is not broadcast. In this way, only the broadcast message corresponding to the first selected lock-request broadcast request message for a given node is broadcast. The lock control circuit 500 in each node may be configured as in Example 1.
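The filtering performed by the relay circuit in Modification 2 can be modeled as a first-request-wins table in front of the broadcast. The Python sketch below is hypothetical: requests are (target node, requester node) pairs in the order chosen by the priority circuit, and unlock messages are deliberately not modeled, so a target stays locked once granted.

```python
def relay_filter(requests):
    """Forward only the first lock request per target node.

    requests: (target node, requester node) pairs in the order the priority
    circuit selects them. Returns the pairs that are actually broadcast.
    """
    lock_state = {}      # relay circuit's own lock state registers (absent = 0 = unlocked)
    broadcast = []
    for target, requester in requests:
        if lock_state.get(target, 0) == 0:
            lock_state[target] = requester         # record the lock holder
            broadcast.append((target, requester))  # change to broadcast message and forward
        # otherwise the request is treated as an invalid message and not broadcast
    return broadcast
```

For example, `relay_filter([(0, 1), (0, 2), (3, 2), (3, 1)])` forwards only `(0, 1)` and `(3, 2)`, halving the broadcast traffic in this case.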
[0063]
<Modification 3 of Example 1>
Giving the serialization circuit in the specific crossbar switch used for broadcast message serialization in Modification 1 the same circuit that Modification 2 adds to the broadcast message relay circuit 12 achieves the same operation as Modification 2.
[0064]
<Modification 4 of Example 1>
In Example 1, the broadcast message relay circuit also serializes ordinary broadcast request messages that contain no lock request, and network 1 carries both those request messages and the ordinary broadcast messages the circuit generates from them. The same relay circuit also serializes broadcast request messages containing lock requests, and network 1 likewise carries those messages and the broadcast messages generated from them. Using a separate network, with its own broadcast message relay circuit, for broadcast request messages containing lock requests and for the broadcast messages generated from them is effective in improving message transfer speed.
[0065]
<Example 2>
In Example 1, multiple lock requests for a resource are arbitrated at every node, regardless of which node the resource belongs to, and the result is held in each node's lock state register for use within that node. To achieve this, the lock request source sends a broadcast request message containing the lock request to the broadcast message relay circuit, which changes it into a broadcast message and broadcasts it to each node. Every lock request therefore passes through the broadcast relay circuit as a broadcast request message, and the resulting concentration of such messages may delay processing in that circuit. Furthermore, each register of a node's lock state register group corresponds to one lock unit, for example a node, and holds the number of the node that succeeded in locking that unit's resource; the number of registers required therefore grows with the number of nodes. This embodiment eliminates these problems.
[0066]
FIG. 4 shows the schematic configuration of the parallel computer in this embodiment; the description below focuses on differences from Example 1. In this embodiment, when several access request nodes request access to a resource belonging to a given resource management node, each access request node sends a message containing its lock request to that resource management node by one-to-one communication. The resource management node arbitrates these lock requests and notifies every node, by a broadcast message, of the arbitration result indicating which node has been granted the lock. This avoids the concentration of lock-request messages at the broadcast relay circuit seen in Example 1. As in Example 1, network 1 in this embodiment has a broadcast message relay circuit 12; to broadcast the arbitration result, the resource management node sends a broadcast request message containing the result to the broadcast message relay circuit 12, which broadcasts it to all nodes. The broadcast message relay circuit 12 is thus used not to serialize broadcast request messages containing lock requests, but to serialize broadcast request messages carrying other information, such as arbitration results.
[0067]
In the lock control circuit 500 of each node, a single lock state register 153 covers all nodes, with each bit position corresponding to one lock unit; here a lock unit is any one node in the system. Each bit position holds 1 or 0 according to whether the corresponding node is locked. This avoids the growth in the number of lock state registers seen in Example 1. Each node additionally has a lock acquisition register 152 that identifies the resources locked by the node itself. Each bit of this register also corresponds to one lock unit, here a node; when the node to which the register belongs locks some resource management node, 1 is stored in the bit corresponding to that resource management node. Unlike Example 1, the lock control circuit 500 here does not arbitrate lock requests; it updates these registers according to the arbitration result sent by the resource management node. From the updated register values, the access request node determines whether its own lock request was granted.
[0068]
Details of the operation of the apparatus of FIG. 4 will be further described below.
(Register initialization)
In every node, the processor 24 resets the lock state register 153 and the lock acquisition register 152 to zero and sets the node's own number in its node number register 154.
[0069]
(Lock request processing)
A method of executing this process will be described with reference to the flowchart of FIG.
(1) Sending the lock request message
The access request node checks, from the bit corresponding to the resource management node in the lock state register 153, whether the resource management node to which the target resource belongs is locked. If it is not locked, the access request node sends a message containing a lock request to that resource management node (steps 523 and 543). For this transmission, the following parameters are stored in the transmission parameter storage register 42.
[0070]
Nadr (network address): = node number of the resource management node
CTL (control): = one-to-one transfer message enable
CMD (command): = lock request
Int (interrupt): = disabled
B # (bit number): = number of resource management node
T # (target node number): = number of the resource management node
R # (access request node number): = number of own node
Here, CTL is the field indicating the message type, here a one-to-one message. CMD is a code indicating the command type, here a lock request; in this embodiment, lock report and lock release commands are also used for locking. B # identifies the resource to be locked; here the number of the resource's resource management node is used. T # is the number of the resource management node to which the resource to be locked belongs. R # is the number of the access request node. Nadr, CTL, and Int are as in Example 1. The message generation circuit 41 sends a message containing these parameters via network 1 to the resource management node indicated by Nadr.
[0071]
(2a) Lock request arbitration (lock success case)
When several access request nodes each send a message containing a lock request to the same resource management node, network 1 transfers the messages to it in parallel, and the resource management node takes them from the network one at a time into the input buffer 51. When the first of these messages enters the input buffer, its bit number B # is supplied to the lock state register 153 and the lock acquisition register 152. In addition, a decoder (not shown) in the input buffer 51 decodes the command field CMD; if the command is a lock request, a signal indicating a decoded lock request is supplied to the AND gate 155. The other, inverted, input of the AND gate 155 receives the bit at position B # of the lock state register 153. If this bit is 0 (that is, the resource management node is not yet locked), the output of the AND gate is 1. This output signal indicates that the lock request is granted and is sent through the OR gate 159 to the set terminal of the lock state register 153, which writes 1 into bit position B #, already supplied to it, indicating that this resource management node is now locked. The output of the AND gate 155 also instructs the message generation circuit 41 to send a broadcast request message whose command is a lock report. In this way, the AND gate 155 checks the lock state (step 502).
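The arbitration at the resource management node reduces to a test-and-set on one bit of the lock state register 153. The Python sketch below models the register as an integer bit mask; the function name is hypothetical, and the returned `grant` flag corresponds to the output of the AND gate 155 that also triggers the lock report.

```python
def arbitrate(lock_state: int, bit: int, is_lock_request: bool) -> tuple[int, bool]:
    """One pass of the resource management node's arbitration logic.

    lock_state models lock state register 153 as an integer bit mask.
    grant models the output of AND gate 155: (lock request) AND NOT(bit B#).
    """
    bit_set = ((lock_state >> bit) & 1) == 1
    grant = is_lock_request and not bit_set
    if grant:
        lock_state |= 1 << bit   # OR gate 159 drives the register's set terminal
    return lock_state, grant
```

Feeding the lock requests in the order they reach the input buffer 51 grants only the first one per bit: the second call on the same bit returns `grant == False`, which is the failure case described next.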
[0072]
In response to the instruction signal from the AND gate 155, the message generation circuit 41 generates a broadcast request message from the message in the input buffer 51: the new message carries the new parameters listed below together with the remaining parameters of the message in the input buffer 51, and is sent to network 1. The following new parameters are preset in the message generation circuit 41.
[0073]
Nadr: address of broadcast message relay circuit 12
CTL: Broadcast request message
CMD: Lock report
This message is broadcast to each node after the control bit CTL is changed so that it becomes a broadcast message by the broadcast message relay circuit 12 (step 503).
[0074]
(2b) Lock request arbitration (lock failure case)
When the first of the messages containing lock requests has been processed as described above, the subsequent messages are fetched into the input buffer 51 one after another. The bit of the lock state register 153 supplied to the AND gate 155 is already 1, so the output of the AND gate 155 stays 0 and the lock requests in these messages are refused.
[0075]
(3) Checking the lock
When the message containing the lock report, broadcast by the broadcast message relay circuit 12, reaches each node, the target node number T# in the message is sent to the matching circuit 160. There it is compared with the own node number held in the register 154. No match is detected at any node other than the resource management node. The AND gate 158 receives two inputs: the inverted output of the matching circuit 160, and a signal indicating that the command CMD of the message has been decoded as a lock report. At every node other than the resource management node both inputs are therefore 1, so the output of the AND gate 158 becomes 1. This output is applied to the set terminal of the lock state register 153 via the OR gate 159, which sets 1 at the bit position given by the bit number B# of the message. The lock state register 153 in these nodes thus shows that the resource management node is locked (steps 524 and 544). In the resource management node itself, 1 was already written at the same bit position of the lock state register when one of the lock requests was granted, as described above.
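Below is a minimal model of the lock-state update at each receiving node. It uses the same integer-as-register encoding as an illustrative assumption, and `on_lock_report_at_node` is a hypothetical name. It shows why the inverted output of the matching circuit 160 excludes the resource management node: that node's bit was already set by the AND gate 155.

```python
def on_lock_report_at_node(lock_state: int, own_node: int, t: int, b: int,
                           is_lock_report: bool) -> int:
    """AND gate 158 at one node.

    Inputs: the inverted output of matching circuit 160 (T# vs. the own
    node number in register 154) and the lock-report decode from the
    input buffer. When both are 1, bit B# of lock state register 153 is
    set through OR gate 159.
    """
    match_160 = (t == own_node)
    if is_lock_report and not match_160:
        lock_state |= 1 << b
    return lock_state


# Four nodes; node 2 is the resource management node and already set its
# own bit when AND gate 155 granted the lock.
registers = {n: 0 for n in range(4)}
registers[2] = 1 << 2
for n in registers:
    registers[n] = on_lock_report_at_node(registers[n], own_node=n, t=2, b=2,
                                          is_lock_report=True)
print(registers)  # {0: 4, 1: 4, 2: 4, 3: 4}
```

After the broadcast every node holds the same view of the lock. This shared view is what later allows a node to suppress a request locally.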
[0076]
When a node that receives the lock report message is an access request node, the access request node number R# in the message is sent to the matching circuit 161 in that node. There it is compared with the own node number in the register 154. Among the access request nodes that sent lock requests to the resource management node, only the one whose request was granted detects a match. The output of the matching circuit 161 and the lock-report decode signal for the message are both inputs to the AND gate 157. The output of that gate therefore becomes 1 only at the access request node that acquired the lock, and there the lock acquisition register 152 sets 1 at the bit position given by the bit number B# of the message (step 524). At the access request nodes whose requests were refused, the contents of the lock acquisition register 152 are left unchanged.
[0077]
Therefore, by reading the registers 152 and 153, the processor 24 of each access request node can determine whether the resource management node is locked. If it is locked, the processor can also determine whether its own node holds the lock (steps 525, 545).
[0078]
In the access request node that acquired the lock, the matching circuit 161 detects a match. If the interrupt bit Int in the received message is also 1, the AND gate 156 sends the processor 24 an interrupt signal indicating that the lock has been acquired.
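The access-request-node side can be modeled together: the AND gates 157 and 156, and the processor's reading of registers 152 and 153. The sketch rests on three assumptions:

- the integer register encoding used for illustration;
- the function is invoked only for messages decoded as lock reports;
- `on_lock_report`, `lock_view` and `LockView` are hypothetical names.

```python
from enum import Enum


class LockView(Enum):
    UNLOCKED = "unlocked"
    LOCKED_BY_SELF = "locked by this node"
    LOCKED_BY_OTHER = "locked by another node"


def on_lock_report(acquisition: int, own_node: int, r: int, b: int,
                   int_bit: bool) -> tuple[int, bool]:
    """Access-request-node handling of a received lock report.

    Matching circuit 161 compares R# with the own node number. AND gate 157
    (match AND lock-report decode) sets bit B# of lock acquisition register
    152; AND gate 156 (match AND Int) raises the interrupt to processor 24.
    The caller is assumed to invoke this only for lock-report messages.
    """
    match_161 = (r == own_node)
    if match_161:
        acquisition |= 1 << b
    interrupt = match_161 and int_bit
    return acquisition, interrupt


def lock_view(lock_state: int, acquisition: int, b: int) -> LockView:
    """What processor 24 concludes from registers 153 and 152 for node B#."""
    if not (lock_state >> b) & 1:
        return LockView.UNLOCKED
    if (acquisition >> b) & 1:
        return LockView.LOCKED_BY_SELF
    return LockView.LOCKED_BY_OTHER


# Nodes 0, 1 and 3 requested node 2's lock; node 1 won (R# = 1, Int = 1).
lock_state = 1 << 2  # bit 2 of register 153 already set by the lock report
for node in (0, 1, 3):
    acq, irq = on_lock_report(0, own_node=node, r=1, b=2, int_bit=True)
    print(node, lock_view(lock_state, acq, b=2).name, irq)
```

Running this prints `LOCKED_BY_SELF True` only for node 1, and `LOCKED_BY_OTHER False` for nodes 0 and 3.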
[0079]
(4) File access
The access request node that has succeeded in locking issues an access request to the resource management node, and the resource management node accesses the disk device in the resource management node in response to the request (steps 225 and 202). This method is the same as in Example 1.
[0080]
(5) Shared file unlock request
When the access request node has finished its access, it sends a broadcast request message containing an unlock request to the broadcast message relay circuit 12. The message carries the parameters listed below. On receiving it, the broadcast message relay circuit 12 broadcasts a message with these parameters to each node (step 527).
[0081]
Nadr: address of broadcast message relay circuit 12
CTL: broadcast request message (enabled)
CMD: Unlock
Int: Disabled
B#: number of the resource management node
T#: number of the resource management node
R#: number of own node
When each node receives this message, the input buffer 51 supplies a signal indicating that an unlock request has been decoded. This signal goes to the reset terminals of the lock state register 153 and the lock acquisition register 152, and the bit at bit number B# in the message is reset to 0. In every node except the access request node that acquired the lock, the lock acquisition register 152 already holds 0 at that bit position, so the unlock leaves it unchanged. In every node, these registers thus show that the resource management node is unlocked (steps 504, 528, 547).
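The unlock path uses the reset terminals of the same two registers. In this sketch the field dictionary mirrors the parameter list in [0081]. The following are assumptions for illustration:

- the dictionary keys and symbolic values;
- the relay address 99;
- the function names `make_unlock_message` and `on_unlock`.

```python
def make_unlock_message(own_node: int, rm_node: int, relay_addr: int) -> dict:
    """Field values of the unlock broadcast request (hypothetical encoding)."""
    return {
        "Nadr": relay_addr,  # broadcast message relay circuit 12
        "CTL": "broadcast_request",
        "CMD": "unlock",
        "Int": False,        # interrupt disabled
        "B#": rm_node,
        "T#": rm_node,
        "R#": own_node,
    }


def on_unlock(lock_state: int, acquisition: int, b: int) -> tuple[int, int]:
    """Unlock decode drives the reset inputs of lock state register 153 and
    lock acquisition register 152: clear bit B# in both, keep other bits."""
    mask = ~(1 << b)
    return lock_state & mask, acquisition & mask


msg = make_unlock_message(own_node=1, rm_node=2, relay_addr=99)
# State after node 1 locked node 2: every node has bit 2 set in register
# 153; only node 1 also has it set in register 152.
nodes = {0: (0b0100, 0), 1: (0b0100, 0b0100), 2: (0b0100, 0), 3: (0b0100, 0)}
nodes = {n: on_unlock(ls, acq, msg["B#"]) for n, (ls, acq) in nodes.items()}
print(nodes)  # {0: (0, 0), 1: (0, 0), 2: (0, 0), 3: (0, 0)}
```

Clearing with a mask rather than overwriting the register preserves the bits that record locks on other resource management nodes.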
[0082]
As can be seen from the above, this embodiment has the same advantages as the first embodiment. It differs in that the node that manages the resource to be accessed arbitrates the lock requests for that resource. Unlike in the first embodiment, the access request nodes therefore need not send broadcast request messages containing lock requests to the broadcast message relay circuit. This relieves the concentration of broadcast request messages at that circuit. In addition, a single lock state register and a single lock acquisition register suffice to determine the lock state of each resource and which node has acquired the lock. Fewer registers are therefore needed than in the first embodiment.
[0083]
<Modification of Example 2>
In the second embodiment, each node holds only one shared file. When a node holds a plurality of shared files, a plurality of lock state registers and lock acquisition registers can be provided, with a separate pair assigned to each shared file in a node. The locks on the individual shared files can then be managed independently of one another.
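One way to picture this modification is to key each register pair by a shared-file identifier. The text does not say how a message would identify the file, so the `file_id` argument and the `PerFileLockRegisters` class are hypothetical.

```python
from collections import defaultdict


class PerFileLockRegisters:
    """Hypothetical model: one lock state register and one lock acquisition
    register per shared file, so locks on different files are independent.
    Bits are still indexed by the resource management node number B#."""

    def __init__(self) -> None:
        self.lock_state = defaultdict(int)   # file id -> register 153 copy
        self.acquisition = defaultdict(int)  # file id -> register 152 copy

    def try_lock(self, file_id: int, b: int) -> bool:
        """Test-and-set on the lock state register of one file."""
        reg = self.lock_state[file_id]
        if (reg >> b) & 1:
            return False
        self.lock_state[file_id] = reg | (1 << b)
        return True

    def unlock(self, file_id: int, b: int) -> None:
        """Clear bit B# in both registers of one file only."""
        mask = ~(1 << b)
        self.lock_state[file_id] &= mask
        self.acquisition[file_id] &= mask


regs = PerFileLockRegisters()
print(regs.try_lock(0, 2), regs.try_lock(1, 2), regs.try_lock(0, 2))  # True True False
```

A lock on file 0 of node 2 does not block file 1 on the same node. A second request for file 0 is still refused.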
[0084]
【The invention's effect】
According to the present invention, an access request node sends a lock request for a resource only after confirming locally that the resource is not locked. Useless lock requests are therefore not issued, and the exclusive-control processing that such requests would otherwise require is reduced.
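The effect rests on a purely local test made before any message is sent. The comparison below counts the lock requests that reach the network while node 2 is locked, with and without that pre-check. The function names are hypothetical, and the counts are illustrative, not measurements.

```python
def should_send_lock_request(lock_state: int, b: int) -> bool:
    """Check the local lock state register before issuing a lock request
    for resource management node B#; suppress it if the bit is set."""
    return not (lock_state >> b) & 1


def requests_sent(local_registers: list[int], b: int, precheck: bool) -> int:
    """Count the lock requests that would reach the network."""
    return sum(1 for reg in local_registers
               if not precheck or should_send_lock_request(reg, b))


# Node 2 is already locked, and every waiting node's register shows bit 2 set.
waiting = [0b0100, 0b0100, 0b0100]
print(requests_sent(waiting, 2, precheck=False))  # 3 requests, all refused
print(requests_sent(waiting, 2, precheck=True))   # 0 requests
```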
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a parallel computer according to the present invention.
FIG. 2 is a schematic configuration diagram of a network used in the apparatus of FIG. 1.
FIG. 3 is a flowchart of processing for accessing a shared file in the apparatus of FIG. 1.
FIG. 4 is a configuration diagram of another parallel computer according to the present invention.
FIG. 5 is a flowchart of processing for accessing a shared file in the apparatus of FIG. 4.
[Explanation of symbols]
26 ... System bus, 58 ... Logic gate

Claims

An exclusive control method for a computer system in which a plurality of nodes, each having at least one processor, are connected via a network to at least one resource management node having a resource usable by the plurality of nodes. The method limits use of the resource, at any given time, to the single node that has successfully locked the resource, wherein:
each of the plurality of nodes stores use state information indicating the exclusive use state of the resource in a register of a lock control circuit provided in that node;
when a node is to issue an exclusive use request for the resource, it determines, based on the use state information stored in the register of its own lock control circuit, whether the resource is in an exclusive use state; if the resource is in an exclusive use state, the node withholds the exclusive use request, and if it is not, the node transfers the exclusive use request to a broadcast message relay circuit provided in the computer system;
the broadcast message relay circuit selects the transferred exclusive use requests one at a time and broadcasts each selected exclusive use request to all of the plurality of nodes; and
in response to the selected exclusive use request being broadcast, each node updates the use state information for the resource stored in the register of its lock control circuit to indicate that the resource is in exclusive use.

The exclusive control method according to claim 1, further comprising the steps of:
issuing an access request for the resource from the node that issued the selected exclusive use request;
after the access to the resource by the access request is completed, sending to the broadcast message relay circuit, from the node that issued the selected exclusive use request, a request to release the exclusive use of the resource held by the resource management node;
transferring the exclusive use release request from the broadcast message relay circuit to all of the plurality of nodes; and
in each node, in response to the broadcast exclusive use release request, changing the use state information to indicate that the resource is not in exclusive use.