JP4245304B2

JP4245304B2 - Computer cluster system, file access method

Info

Publication number: JP4245304B2
Application number: JP2002117059A
Authority: JP
Inventors: 誠司前田; 記代子佐藤; 伸夫崎山; 浩邦矢野; 拓也林
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2002-04-19
Filing date: 2002-04-19
Publication date: 2009-03-25
Anticipated expiration: 2022-04-19
Also published as: JP2003316637A

Description

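[0001]
[Technical Field of the Invention]
The present invention relates to a computer cluster system composed of a plurality of computers commonly connected via a network.
[0002]
[Prior Art]
Computer cluster systems, in which a plurality of computers are connected via communication devices, are used to supplement the capabilities of a single computer and to achieve higher performance and reliability. Connecting computers in this way distributes processing across the individual computers, which improves performance. When some of the connected computers fail, other computers can take over the processing that was running on the failed computers, which improves reliability.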
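[0003]
One OS (operating system) structure suited to realizing such a computer cluster system is the microkernel architecture.
[0004]
As described in Japanese Patent Application Laid-Open No. 11-345134, a microkernel OS is structured as a kernel part, which performs process management, memory management, and the like, and a subsystem part, in which services such as file system management and user management are separated from the kernel.
[0005]
When a microkernel OS is used in a computer cluster system, the kernel runs on every one of the computers connected by the communication devices, while each subsystem runs on only some of the computers in the cluster, or on a single computer. When an ordinary program running on the cluster (a user program) uses an OS service such as file access, it communicates with the subsystem that provides that service, and the subsystem provides the service in cooperation with the kernel.
[0006]
The microkernel architecture has two advantages. First, new services and OS upgrades can be delivered easily by adding or changing subsystems instead of replacing the entire OS, which makes the system easy to maintain. Second, the kernel that runs on each computer can be kept compact, so more of each computer's processing capacity can be allocated to running user programs.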
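[0007]
[Problems to Be Solved by the Invention]
FIG. 12 illustrates a computer cluster system implemented with a conventional microkernel OS. In this system, a computer 900 on which a user program 901 runs and a computer 910 on which a subsystem 911 that manages the file system runs are commonly connected via a network 920. To simplify the explanation, computers on which other subsystems run are omitted.
[0008]
The computer 900 includes the user program 901, a kernel 902 that performs various kinds of hardware control, a storage device 903 that stores data and the like, and a communication device 904 that communicates with other computers via the network. The computer 910 has an equivalent configuration.
[0009]
The subsystem 911 running on the computer 910 is, as noted above, the subsystem that manages the file system. It has a path management unit 915 that manages paths such as the directory structure, a resource management unit 916 that manages which computer's storage device holds each file, and a file management unit 917 that manages the physical location of each file within a storage device. The subsystem 911 uses these units to manage the file system.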
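[0010]
FIG. 13 illustrates the flow of a read request and of data when the user program 901 reads a file in the conventional computer cluster system.
[0011]
(1) First, the user program 901 accesses the subsystem 911, which manages the file system, and notifies it of the name of the file to be read. (2) The subsystem 911 searches for which area of which storage device of which computer holds the file with the notified name, and notifies the kernel of the computer that has the storage device holding the file.
[0012]
When the kernel 902 is notified: (3) The kernel 902 sends a read request to the storage device 903. (4) The storage device 903 reads the contents of the file and passes them to the kernel 902. (5) The kernel 902 transfers the file contents to the subsystem 911. (6) The subsystem 911 transfers the file contents to the user program 901.
[0013]
When the kernel 912 is notified instead: (3') The kernel 912 sends a read request to the storage device 913. (4') The storage device 913 reads the contents of the file and passes them to the kernel 912. (5') The kernel 912 transfers the file contents to the subsystem 911. (6') The subsystem 911 transfers the file contents to the user program 901.
[0014]
The flow of a write request and of data when writing a file is almost the same.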
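[0015]
Thus, in the conventional OS, the subsystem 911 manages both the logical structure and the physical structure of the file system, so every file read or write passes through the computer on which the subsystem runs. This is inefficient, particularly when the computer running the user program is also the computer that manages the storage device holding the file, because the data is sent and received over a network that is slower than data transfer inside a computer.
[0016]
In particular, steps (5) and (6) in FIG. 13 are data communication over the network, which is far less efficient than communication inside a computer such as step (4).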
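[0017]
Magnetic disks such as hard disks, which are used as storage devices, contain moving parts and are therefore far more likely to fail than the memory, processors, and other components that make up a computer. For this reason, data is commonly made redundant across hard disks to prevent file loss due to failure.
[0018]
In a conventional computer cluster system as well, data can easily be made redundant by keeping copies of files across the storage devices of the computers, and when a storage device fails, processing can continue using the replicated data. However, because a conventional computer cluster system reads and writes files through the subsystem, making data redundant greatly increases the overhead of creating file copies and is inefficient.
[0019]
The explanation here used only two computers, but as the number of computers grows, accesses to storage devices connected to computers other than the computer 910 on which the subsystem 911 runs increase sharply. In particular, when file copies are kept as described above, cases arise in which both the source file and its copy reside on storage devices connected to computers other than the computer 910. Because the frequency of such cases grows in proportion to the number of computers, efficiency likewise degrades in proportion to the number of computers.
[0020]
An object of the present invention is therefore to provide a computer cluster system that can read and write files efficiently.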
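[0021]
[Means for Solving the Problems]
To solve the above problems, a computer cluster system of the present invention is a computer cluster system in which a plurality of computers, each having a storage device and a communication device, are commonly connected via a network, wherein: the file system of the entire computer cluster system is distributed across the storage devices of the computers; some of the computers have a subsystem having a path management unit that manages the relationship between paths, which represent the logical structure of the file system, and global identifiers, which are assigned to files uniquely across the entire computer cluster system; each computer has a kernel that controls the storage device of that computer; the kernel has a resource management unit that stores and manages the correspondence between the global identifiers and local identifiers, which identify the storage device in which a file is stored, and a file management unit that manages the relationship between the local identifiers and the addresses at which files are stored within the storage device; and, when the path management unit of the subsystem has no global identifier corresponding to the path of a file notified by a user program running on the computer cluster system, the resource management unit of each computer newly stores a pair of a global identifier and a local identifier corresponding to the path of the file, the subsystem newly stores a pair of the global identifier corresponding to the path of the file and the path, and the user program acquires the global identifier corresponding to the path of the file.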
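[0022]
Further, a computer cluster system of the present invention is a computer cluster system composed of a plurality of computers that each have a storage device and a communication device and are commonly connected via a network, wherein: the file system used by the entire computer cluster system is distributed across the storage devices of the computers; the operating system used in the computer cluster system has a subsystem having a path management unit that manages the relationship between paths, which represent the logical structure of the file system, and global identifiers, which are assigned to files uniquely across the entire computer cluster system, and a kernel that controls the storage device of each computer; the kernel has a resource management unit that stores and manages the correspondence between the global identifiers and local identifiers, which identify the storage device in which a file is stored, and a file management unit that manages the relationship between the local identifiers and the addresses at which files are stored within the storage device, and the kernel is run on each computer; the subsystem is run on some of the computers; and, when the path management unit of the subsystem has no global identifier corresponding to the path of a file notified by a user program running on the computer cluster system, the resource management unit of each computer newly stores a pair of a global identifier and a local identifier corresponding to the path of the file, the subsystem newly stores a pair of the global identifier corresponding to the path of the file and the path, and the user program acquires the global identifier corresponding to the path of the file.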
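[0023]
Further, a computer cluster system of the present invention is a computer cluster system composed of a plurality of computers that each have a storage device and a communication device and are commonly connected via a network, wherein: the file system used by the entire computer cluster system is distributed across the storage devices of the computers; some of the computers run a subsystem having a path management unit that manages the relationship between paths, which represent the logical structure of the file system, and global identifiers, which are assigned to files uniquely across the entire computer cluster system; each computer runs a kernel that controls the storage device of that computer; the kernel has a resource management unit that stores and manages the correspondence between the global identifiers and local identifiers, which identify the storage device in which a file is stored, and a file management unit that manages the relationship between the local identifiers and the addresses at which files are stored within the storage device; and, when the path management unit of the subsystem has no global identifier corresponding to the path of a file notified by a user program running on the computer cluster system, the resource management unit of each computer newly stores a pair of a global identifier and a local identifier corresponding to the path of the file, the subsystem newly stores a pair of the global identifier corresponding to the path of the file and the path, and the user program acquires the global identifier corresponding to the path of the file.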
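[0024]
Further, a file access method of the present invention is a file access method in a computer cluster system composed of a plurality of computers that each have a storage device and a communication device and are commonly connected via a network, the method comprising: an open step in which a user program notifies a subsystem that manages the logical structure of a file system of the path of a file and acquires the global identifier of the file; a request step in which the user program notifies a kernel running on the same computer as the user program of the global identifier and requests reading or writing of the file; a file location search step in which the kernel converts the global identifier into a local identifier and determines the location of the storage device holding the file to be read or written; a request transfer step in which, when the file location search step shows that the computer on which the kernel runs differs from the computer to which the storage device holding the file belongs, the kernel transfers the read/write request and the local identifier to the kernel running on the computer to which that storage device belongs; an access step in which the kernel running on the computer to which the storage device holding the file belongs accesses the storage device indicated by the local identifier and executes the read/write request; and a result reporting step in which the kernel that executed the access step notifies the user program of the result of the access step; wherein the open step includes, when the path management unit of the subsystem has no global identifier corresponding to the path of the file and after the user program has notified the subsystem of the path of the file: a step in which the kernel newly assigns and stores a pair of a global identifier and a local identifier corresponding to the path of the file; a step of keeping the correspondence information between global identifiers and local identifiers identical between the kernel and the kernels of the other computers; and a step in which the subsystem newly assigns and stores the pairing of the global identifier corresponding to the path of the file and the path.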
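[0025]
Further, a file access method of the present invention is a file access method in a computer cluster system composed of a plurality of computers that each have a storage device and a communication device and are commonly connected via a network, the method comprising: an open step in which a user program notifies a subsystem that manages the logical structure of a file system of the path of a file and acquires the global identifier of the file; a request step in which the user program notifies a kernel running on the same computer as the user program of the global identifier and requests reading or writing of the file; a file location search step in which the kernel converts the global identifier into a local identifier and determines the location of the storage device holding the file to be read or written; a request transfer step in which, when the file location search step shows that the computer on which the kernel runs differs from the computer to which the storage device holding the file belongs, the kernel transfers the read/write request and the local identifier to the kernel running on the computer to which that storage device belongs; an access step in which the kernel running on the computer to which the storage device holding the file belongs accesses the storage device indicated by the local identifier and executes the read/write request; and a result reporting step in which the kernel that executed the access step notifies the user program of the result of the access step; wherein the open step includes, when the path management unit of the subsystem has no global identifier corresponding to the path of the file and after the user program has notified the subsystem of the path of the file: a step in which the subsystem assigns a global identifier corresponding to the path of the file and stores it in association with the path; a step in which the kernel newly assigns and stores a local identifier corresponding to that global identifier; and a step of keeping the correspondence information between global identifiers and local identifiers identical between the kernel and the kernels of the other computers.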
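[0037]
[Embodiments of the Invention]
(System overview and configuration) A computer cluster system according to one embodiment of the present invention is described below with reference to the drawings. FIG. 1 illustrates the computer cluster system of this embodiment.
[0038]
In the computer cluster system of this embodiment, two computers, a computer 100 that runs a user program 101 and a computer 110 that runs a subsystem 111, are connected via a network 120.
[0039]
The computer 100 has a storage device 105, which is a magnetic disk for storing data, and a communication device 106 for communicating with other computers via the network. The computer 100 runs a kernel 102, a program responsible for controlling hardware such as the storage device 105, the communication device 106, and memory (not shown), and the user program 101, a program such as an application that uses the services provided by the computer cluster system and the subsystem 111.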
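[0040]
In addition to interfaces for controlling hardware (not shown), the kernel 102 has a resource management unit 103 that manages resources used across the entire computer system, such as files and processes, and a file management unit 104 that manages the physical location of data stored in the storage device 105 (that is, the physical structure of the file system). These units can be accessed from outside the kernel 102 (for example, from the user program 101) through system calls.
[0041]
The computer 110 is configured in the same way as the computer 100 and has a storage device 115 and a communication device 116. The computer 110 runs a kernel 112, a program responsible for hardware control and the like with functions equivalent to those of the kernel 102, and the subsystem 111, a program that provides services to all user programs, including the user program 101.
[0042]
The kernel 112 is configured in the same way as the kernel 102. In addition to interfaces for controlling hardware (not shown), it has a resource management unit 113 that manages resources used across the entire computer system, such as files and processes, and a file management unit 114 that manages the physical location of data stored in the storage device 115. These units can be accessed from outside the kernel 112 (for example, from the subsystem 111) through system calls.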
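[0043]
The subsystem 111 is a program that manages the logical structure of the file system, such as the directory structure. It has a path management unit 117 that converts a path, a character string that users use to identify a file (for example, "/home/user1/movie/gogo.mpg"), into a file identifier. Of the global identifiers (hereinafter, GIDs) assigned to every resource used across the entire computer cluster system, the subsystem 111 stores and manages the relationship between paths and the GIDs of files used across the entire cluster.
[0044]
The resource management units 103 and 113 assign, to each resource used across the entire computer cluster system, the GID described above and a local identifier (hereinafter, LID), which is an identifier unique within the computer on which the resource actually resides. They also store and manage the correspondence between GIDs and LIDs. An LID combines the identifier of the computer on which the resource resides with the resource's identifier within that computer. For example, the LID of a file stored in the storage device 105 of the computer 100 has a structure such as "(identifier of computer 100):(identifier of storage device 105):(file identifier unique within computer 100)".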
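As an illustration only, the correspondence between GIDs and LIDs described above could be modeled as follows. This is a minimal Python sketch under stated assumptions, not the implementation of the embodiment: the names `Lid` and `ResourceTable` and the identifier strings such as "computer100" are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Lid:
    """Local identifier: computer, storage device, and per-computer file id."""
    computer: str   # hypothetical identifier of the computer
    device: str     # hypothetical identifier of the storage device
    local_id: str   # identifier unique within that computer, e.g. "F0003"

    def __str__(self) -> str:
        # Mirrors the "(computer):(storage device):(file)" structure of an LID.
        return f"{self.computer}:{self.device}:{self.local_id}"


class ResourceTable:
    """Maps each GID to its LIDs (more than one if the file is replicated)."""

    def __init__(self) -> None:
        self._lids: Dict[str, List[Lid]] = {}

    def register(self, gid: str, lid: Lid) -> None:
        self._lids.setdefault(gid, []).append(lid)

    def lookup(self, gid: str) -> List[Lid]:
        # Return a copy so callers cannot modify the table by accident.
        return list(self._lids.get(gid, []))

    def remove(self, gid: str) -> List[Lid]:
        return self._lids.pop(gid, [])
```

Keeping a list of LIDs per GID is a design choice of this sketch; it lets the same table represent both ordinary files and files that have copies on several storage devices.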
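[0045]
In this embodiment, when the GID-LID correspondence held by one of the resource management units 103 and 113 changes, that unit notifies the other of the change so that both hold identical correspondence information. Management of the correspondence information is not limited to this scheme, however. For example, the resource management unit 103 may manage it centrally, with the resource management unit 113 querying it when necessary and reporting any changes to it. Alternatively, management may be distributed, with each unit querying the other when necessary, for example on receiving a GID whose corresponding LID is unknown. The method of synchronizing or exchanging GID-LID correspondence information between resource management units is preferably chosen to suit the scale, design, and purpose of the computer cluster system.
[0046]
The file management unit 104 manages the physical location of files stored in the storage device 105 connected to the kernel 102, and the file management unit 114 manages the physical location of files stored in the storage device 115 connected to the kernel 112. The physical location means the numbers or addresses of the blocks, the recording units of the storage devices 105 and 115, that are allocated to a file in a quantity that depends on its data size. The file management units 104 and 114 manage each LID together with the addresses of all blocks allocated to the file that the LID points to.
[0047]
The storage devices 105 and 115 are magnetic disks that store files. They are not limited to magnetic disks and may be flash memory or the like. In response to a read request from the file management unit 104 of the kernel 102, the storage device 105 passes file data to the file management unit 104, and in response to a write request from the file management unit 104, it stores the data received from the file management unit 104. The storage device 115 handles read and write requests from the file management unit 114 of the kernel 112 in the same way.
[0048]
The communication devices 106 and 116 communicate with each other via the network 120. The communication device 106 sends data to the communication device 116 at the request of the kernel 102 and passes data received from the communication device 116 to the kernel 102. The communication device 116 sends data to the communication device 106 at the request of the kernel 112 and passes data received from the communication device 106 to the kernel 112. For the communication devices 106 and 116, Ethernet (registered trademark), Asynchronous Transfer Mode (ATM), Token Ring, or the like can be used, for example.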
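[0049]
(File open) The operation by which the user program 101 opens a file is described below with reference to the flowchart of FIG. 2.
[0050]
The user program 101 sends an open request containing the path of the file to be opened to the resource management unit 103 (step 201). The resource management unit 103 forwards the open request to the subsystem 111 (step 202). The subsystem 111 searches the table it holds for the GID corresponding to the path contained in the open request (step 203). Processing then branches depending on whether the GID was found (step 204).
[0051]
If the GID was found, the subsystem 111 includes it in the request result and sends the result to the resource management unit 103 (step 205). The resource management unit 103 passes the request result to the user program 101 (step 206), and the user program 101 obtains the GID (step 207).
[0052]
If the GID was not found, the subsystem 111 requests the resource management unit 103 to assign a GID, as described later (step 208). The resource management unit 103 performs the new GID assignment described later (step 209) and notifies the user program 101 of the assignment result (step 210). The user program 101 then obtains the GID (step 207).
[0053]
From then on, the user program 101 uses this GID to specify the target file when reading or writing it.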
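[0054]
FIG. 3 is a flowchart of steps 208 and 209 above, that is, the GID assignment request by the subsystem 111 and the new GID assignment by the resource management unit 103.
[0055]
When the subsystem 111 finds that no GID corresponds to the path, it analyzes the open request from the user program 101 to check whether creating a new file is permitted (step 221), and subsequent processing branches on the result (step 222).
[0056]
If creation is not permitted, the open fails, so the subsystem 111 notifies the resource management unit 103 of the open failure (step 227). Because the open failed, the resource management unit 103 does not assign a GID (step 228) and notifies the user program 101 of the failure in step 210 above.
[0057]
If creation is permitted, the subsystem 111 requests the resource management unit 103 to assign a new GID (step 223). The resource management unit 103 generates and stores a new GID-LID pair (step 224). It then sends the generated GID and the path to the subsystem 111, which stores their correspondence, and, as described earlier, sends the GID and LID to the resource management unit 113 to update its resource information (step 225). The assignment then succeeds (step 226), and in step 210 above the success and the GID are reported to the user program 101.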
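The open flow of FIG. 2 and FIG. 3 can be sketched roughly as follows. This Python fragment is illustrative only and rests on several assumptions: the class names `ResourceManager` and `PathSubsystem`, the flat path dictionary, the GID numbering, and the direct method calls standing in for inter-computer messages are all hypothetical. It also assumes a single allocating resource management unit, so GID uniqueness across the cluster is not addressed.

```python
import itertools
from typing import Dict, List, Optional


class ResourceManager:
    """Kernel-side GID -> LID table, mirrored to peer kernels."""

    def __init__(self, computer: str, device: str) -> None:
        self.computer, self.device = computer, device
        self.table: Dict[str, str] = {}
        self.peers: List["ResourceManager"] = []
        self._seq = itertools.count(1)

    def allocate(self) -> str:
        n = next(self._seq)
        gid = f"VF{n:04d}"                                # hypothetical numbering
        lid = f"{self.computer}:{self.device}:F{n:04d}"
        self.table[gid] = lid                             # step 224: store the pair
        for peer in self.peers:                           # step 225: keep tables identical
            peer.table[gid] = lid
        return gid


class PathSubsystem:
    """Subsystem-side path -> GID table."""

    def __init__(self) -> None:
        self.paths: Dict[str, str] = {}

    def open(self, path: str, create: bool, rm: ResourceManager) -> Optional[str]:
        gid = self.paths.get(path)                        # step 203: search the table
        if gid is not None:                               # step 204: found
            return gid
        if not create:                                    # steps 221-222, 227: open fails
            return None
        gid = rm.allocate()                               # steps 223-224
        self.paths[path] = gid                            # step 225: store path <-> GID
        return gid
```

Returning `None` for a failed open is a simplification of this sketch; the specification only states that the failure is reported to the user program.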
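[0058]
The above describes opening a file. A file deletion request is processed as follows.
[0059]
(1) The user program 101 sends a deletion request to the resource management unit 103, which forwards it to the subsystem 111. (2) The subsystem 111 deletes the GID and its corresponding path stored in the path management unit 117, and notifies the resource management unit 103 that the GID has been deleted. (3) On receiving the notification from the subsystem 111, the resource management unit 103 deletes the GID and its corresponding LID from its records and notifies the file management unit 104 that the LID has been deleted. (4) On receiving the notification from the resource management unit 103, the file management unit 104 deletes from its records the relationship between the LID and the addresses of all blocks allocated to the file that the LID points to.
[0060]
The resource management unit 103 may notify the user program 101 of the deletion result in parallel with step (3), after step (3) completes, or after step (4) completes. If the result is reported after step (4) completes, the file management unit 104 is made to report the result of its deletion processing to the resource management unit 103.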
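The cascade in steps (2) through (4) can be pictured as three table deletions in sequence. The following Python sketch is a simplification under stated assumptions: the function name `delete_file` and the use of plain dictionaries for the path management, resource management, and file management tables are hypothetical, and the notifications between units are reduced to ordinary function flow.

```python
from typing import Dict, List


def delete_file(
    path: str,
    path_table: Dict[str, str],          # path -> GID (path management unit)
    resource_table: Dict[str, str],      # GID -> LID (resource management unit)
    file_table: Dict[str, List[int]],    # LID -> block addresses (file management unit)
) -> bool:
    """Remove a file's entries from all three tables; False if the path is unknown."""
    gid = path_table.pop(path, None)          # (2) delete the path <-> GID entry
    if gid is None:
        return False
    lid = resource_table.pop(gid, None)       # (3) delete the GID <-> LID entry
    if lid is not None:
        file_table.pop(lid, None)             # (4) delete the LID <-> blocks entry
    return True
```

In this sketch the result is returned only after step (4); the specification also allows reporting it earlier.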
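[0061]
(File write) The operation by which the user program 101 writes data to a file is described below with reference to the flowchart of FIG. 4.
[0062]
The user program 101 sends a write request, the GID, and the data to the resource management unit 103 (step 301). The resource management unit 103 looks up the LID corresponding to the GID. If the computer indicated by the LID is not the computer 100 (that is, if it is the computer 110), the resource management unit 103 forwards the write request, the LID, and the data to the resource management unit of the computer indicated by the LID (step 302).
[0063]
If the LID indicates the computer 110, the write request, the LID, and the data are forwarded to the resource management unit 113 of the computer 110 in step 302. The resource management unit 113 performs the write processing described later (step 303) and reports the result to the resource management unit 103 (step 304). The resource management unit 103 then reports the result of the write processing to the user program 101 (step 306).
[0064]
If the LID indicates the computer 100, no forwarding is needed. The resource management unit 103 performs the write processing described later (step 305) and then executes step 306.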
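The routing decision of FIG. 4 (write locally or forward to the kernel that owns the storage device) might look like the sketch below. It is an illustration under assumptions, not the embodiment itself: the `Kernel` class, its in-memory `disk` dictionary standing in for the file management unit, and the `forwarded` counter used to make network use visible are all hypothetical.

```python
from typing import Dict, Tuple


class Kernel:
    """One kernel per computer; forwards writes whose LID names another computer."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.gid_to_lid: Dict[str, Tuple[str, str]] = {}   # GID -> (computer, file id)
        self.peers: Dict[str, "Kernel"] = {}
        self.disk: Dict[str, bytes] = {}                    # stand-in for local storage
        self.forwarded = 0                                  # requests sent to a peer

    def write(self, gid: str, data: bytes) -> bool:
        computer, file_id = self.gid_to_lid[gid]            # step 302: GID -> LID
        if computer != self.name:                           # LID names another computer
            self.forwarded += 1
            return self.peers[computer].write_local(file_id, data)  # steps 303-304
        return self.write_local(file_id, data)              # step 305: no forwarding

    def write_local(self, file_id: str, data: bytes) -> bool:
        self.disk[file_id] = data
        return True
```

A read request would follow the same branch, with the data travelling back along the reverse route.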
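[0065]
FIG. 5 is a flowchart of file write processing in the resource management unit 103. The resource management unit 113 performs the same processing.
[0066]
The resource management unit 103 sends the write request, the LID, and the data to the file management unit 104 (step 311). The file management unit 104 looks up the physical location, within the storage device 105, of the file indicated by the LID (step 312) and checks whether the storage area allocated to that file is large enough for the data to be written (step 313). If it is not, the file management unit 104 allocates additional storage area (step 314).
[0067]
The file management unit 104 then writes the data to the storage device 105 (step 315) and reports the write result to the resource management unit 103 (step 316).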
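Steps 312 through 315 amount to "find the file's blocks, allocate more if the data does not fit, then write". A minimal Python sketch of that sequence follows. The `FileManager` class, the free-block list, the in-memory `storage` dictionary, and the 64 KB block size are assumptions made for the example; the sketch always writes from the beginning of the file.

```python
from typing import Dict, List

BLOCK_SIZE = 64 * 1024  # assumed block size for this sketch


class FileManager:
    """Tracks which blocks belong to each LID and writes data block by block."""

    def __init__(self, total_blocks: int) -> None:
        self.free: List[int] = list(range(total_blocks))    # unallocated block numbers
        self.blocks_of: Dict[str, List[int]] = {}           # LID -> allocated blocks
        self.storage: Dict[int, bytes] = {}                 # block number -> contents

    def write(self, lid: str, data: bytes) -> bool:
        blocks = self.blocks_of.setdefault(lid, [])         # step 312: locate the file
        needed = -(-len(data) // BLOCK_SIZE)                # step 313: ceiling division
        shortfall = needed - len(blocks)
        if shortfall > len(self.free):                      # not enough space left
            return False
        for _ in range(max(shortfall, 0)):                  # step 314: allocate blocks
            blocks.append(self.free.pop(0))
        for i in range(needed):                             # step 315: write the data
            self.storage[blocks[i]] = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        return True
```

Checking the shortfall before allocating keeps a failed write from leaving a file with a partial allocation, which is a choice of this sketch rather than something the specification requires.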
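[0068]
Thus, whereas conventionally the subsystem 111 always received the data from the user program 101 and wrote it whenever the user program 101 wrote data to a file, in this embodiment, when the target file resides in the storage device 105 connected to the computer 100 on which the user program 101 runs, the file can be written without sending data over the network, which is efficient.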
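[0069]
(File read) The operation by which the user program 101 reads data from a file is described below with reference to the flowchart of FIG. 6.
[0070]
The user program 101 sends a read request and the GID to the resource management unit 103 (step 401). The resource management unit 103 looks up the LID from the GID. If the computer indicated by the LID is not the computer 100 (that is, if it is the computer 110), the resource management unit 103 forwards the read request and the LID to the resource management unit of the computer indicated by the LID (step 402).
[0071]
If the LID indicates the computer 110, the read request and the LID are forwarded to the resource management unit 113 of the computer 110 in step 402. The resource management unit 113 performs the read processing described later (step 403) and sends the read result and the data to the resource management unit 103 (step 404). The resource management unit 103 then passes the read result and the data to the user program 101 (step 406).
[0072]
If the LID indicates the computer 100, no forwarding is needed. The resource management unit 103 performs the read processing described later (step 405) and then executes step 406.
[0073]
FIG. 7 is a flowchart of file read processing in the resource management unit 103. Read processing in the resource management unit 113 is performed in the same way.
[0074]
The resource management unit 103 sends the read request and the LID to the file management unit 104 (step 411). The file management unit 104 looks up the physical location, within the storage device 105, of the file indicated by the LID (step 412), reads the data from the storage device 105 (step 413), and sends the data and the read result to the resource management unit 103 (step 414).
[0075]
Thus, whereas conventionally the subsystem 111 always read the data from the storage device 105 and transferred it to the user program 101 whenever the user program 101 read data from a file, in this embodiment, when the file to be read resides in the storage device 105 connected to the computer 100 on which the user program 101 runs, the file can be read without receiving data over the network, which is efficient.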
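[0076]
(Configuration of the resource management unit) FIG. 8 illustrates the configuration of a resource management unit. The resource management unit 601 has an ID conversion unit 602 that converts between GIDs and LIDs, and a resource management table 603 that stores the correspondence between GIDs and LIDs.
[0077]
The ID conversion unit 602 receives and processes requests sent from user programs and subsystems and requests forwarded from resource management units on other computers. If the ID contained in a request is a GID, it converts the GID to an LID by referring to the resource management table 603. If the resulting LID belongs to another computer, it forwards the request to that computer. In addition, when a new file is opened it generates a new GID and LID and stores them in the resource management table 603, and when a file is deleted it deletes the GID and LID.
[0078]
As noted earlier, in this embodiment, when the resource management table 603 changes, the change is reported to the resource management units of the other computers so that the contents of the resource management tables 603 stay identical.
[0079]
The resource management table 603 manages the LID corresponding to the resource represented by each GID. GIDs are numbered differently depending on the type of resource (process, file, and so on). In FIG. 8, for example, GIDs beginning with VP represent processes and GIDs beginning with VF represent files. The resource whose GID is VP0001, for instance, can be seen to be the process P0001 running on the computer 100.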
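[0080]
(Configuration of the file management unit) FIG. 9 illustrates the configuration of a file management unit. The file management unit 701 has a block position conversion unit 702 that converts an LID into a position on the storage device, and a file management table 703 that stores the correspondence between LIDs and positions on the storage device.
[0081]
The block position conversion unit 702 uses the file management table 703 to convert the LID contained in a write or read request received from the resource management unit, together with the position of the data within the file (the byte offset from the beginning of the file), into a position on the storage device. For example, if the block size of the storage device is 64 KB, the file management table 703 shows that the data at the 130 KB position from the beginning of the file whose LID is F0003 is stored in block 8.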
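The arithmetic behind this example is a division of the byte offset by the block size: 130 KB divided by 64 KB gives index 2 (the third block of the file) with 2 KB left over. The Python sketch below shows that conversion. The function name `locate` and the table contents are assumptions; only the fact that the third block of F0003 is block 8 is taken from the example above, and block numbers 5 and 6 are placeholders.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 64 * 1024  # 64 KB, as in the example above


def locate(file_table: Dict[str, List[int]], lid: str, offset: int) -> Tuple[int, int]:
    """Return (block number on the device, byte offset inside that block)."""
    index, within = divmod(offset, BLOCK_SIZE)   # which block of the file, and where in it
    return file_table[lid][index], within


# Hypothetical table: blocks 5 and 6 are placeholders, block 8 follows the example.
table = {"F0003": [5, 6, 8]}
block, within = locate(table, "F0003", 130 * 1024)
print(block, within)
```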
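[0082]
The file management unit need not have the configuration of FIG. 9, as long as it can determine a position on the storage device from an LID and the position of the data within the file.
[0083]
(Configuration of the path management unit) FIG. 10 illustrates the configuration of the path management unit of the subsystem. The path management unit 801 has a path conversion unit 802, which provides the file system function of converting a path into a GID, and a path management table 803, which stores the correspondence between paths and GIDs.
[0084]
The path conversion unit 802 receives a file open request from a resource management unit, converts the path contained in the request into a GID using the path management table 803, and returns the GID to the resource management unit as part of the result. It also maintains the path management table: it receives the GID and path of a file generated by a resource management unit and stores them in the table, and when a resource management unit notifies it that a GID has been deleted, it deletes that GID and its associated path from the table.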
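[0085]
To give a concrete example of the conversion, if the path is "/home/doc/document.txt", the path management table 803 is searched for "home", "doc", and "document.txt" in that order to obtain the GID.
[0086]
First, the data stored in the path management table 803 is searched for an entry that has no parent GID and whose name is "home", which yields data 804.
[0087]
Next, using the GID "VF0001" contained in data 804, the table is searched for an entry whose parent GID is "VF0001" and whose name is "doc", which yields data 805.
[0088]
Similarly, using data 805, the table is searched for an entry whose parent GID is "VF0002" and whose name is "document.txt", which yields data 806. The GID "VF0003" of "document.txt" is then included in the result and returned to the resource management unit.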
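The component-by-component lookup walked through above can be expressed compactly. In the Python sketch below, the function name `resolve` and the representation of the path management table as a dictionary keyed by (parent GID, name) are assumptions; the three entries correspond to data 804, 805, and 806 in the example.

```python
from typing import Dict, Optional, Tuple

# (parent GID, name) -> GID; None as parent means "no parent GID".
PathTable = Dict[Tuple[Optional[str], str], str]


def resolve(path_table: PathTable, path: str) -> Optional[str]:
    """Walk the path one component at a time; None if any component is missing."""
    gid: Optional[str] = None
    for name in path.strip("/").split("/"):
        gid = path_table.get((gid, name))   # search by parent GID and name
        if gid is None:
            return None
    return gid


table: PathTable = {
    (None, "home"): "VF0001",                 # data 804
    ("VF0001", "doc"): "VF0002",              # data 805
    ("VF0002", "document.txt"): "VF0003",     # data 806
}
print(resolve(table, "/home/doc/document.txt"))
```

A `None` result corresponds to the case in which the path cannot be converted, for example because the file does not exist.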
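[0089]
If the path cannot be converted, for example because the file does not exist, the open request is analyzed. If creating a new file is permitted, a GID assignment is requested from the resource management unit; if not, an error is returned to the resource management unit.
[0090]
The path management unit need not have the configuration of FIG. 10, as long as it has the function of converting a path into a GID.
(Data flow) FIG. 11 illustrates the flow of data when a user program 1001 reads data from a storage device. The communication devices are omitted from the figure, but communication between the computers 1000 and 1010 takes place over the network using the communication devices.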
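[0091]
When the user program 1001 reads a file from the storage device 1003, the read request travels (1) from the user program 1001 to the kernel 1002 and (2) from the kernel 1002 to the storage device 1003. The data read then travels (3) from the storage device 1003 to the kernel 1002 and (4) from the kernel 1002 to the user program 1001.
[0092]
All of the communication in (1) through (4) takes place inside the computer. Because communication inside a computer is much faster than communication over a network, the read completes in a short time and processing is efficient. The same applies to writing.
[0093]
When instead the user program 1001 reads a file from the storage device 1013, the read request travels (1) from the user program 1001 to the kernel 1002, (2') is forwarded from the kernel 1002 to the kernel 1012, and (3') travels from the kernel 1012 to the storage device 1013. The data read then travels (4') from the storage device 1013 to the kernel 1012, (5') is forwarded from the kernel 1012 to the kernel 1002, and (6') travels from the kernel 1002 to the user program 1001. The same applies to writing.
[0094]
When the computer running the user program differs from the computer connected to the storage device holding the file, read and write efficiency is the same as before. When reading from or writing to a storage device connected to the same computer, however, file contents are not sent over the network as they were conventionally, so overhead drops sharply, and the read and write efficiency of the system as a whole improves.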
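[0095]
This embodiment has been described with only two computers, but a computer cluster system can also be built from three or more computers. A plurality of user programs may run on different computers in the cluster, and some of those user programs may run on the computer on which the subsystem runs. A plurality of subsystems may run on different computers in the cluster. Furthermore, the functions of a subsystem may be divided and run in a distributed manner on different computers in the cluster.
[0096]
In this embodiment the GID for a new file is assigned by the resource management unit 103 or 113, but it may instead be assigned by the subsystem 111. Taking as an example the case where the user program 101 sends a file open request to the resource management unit 103: if the file does not exist, the subsystem 111 assigns a GID, stores it in association with the path, and notifies the resource management unit 103. The resource management unit 103 then newly assigns and stores an LID corresponding to the GID.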
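[0097]
(With redundancy) The description so far has covered basic file reads and writes. The following describes operation when data is made redundant by creating copies of a file on the storage devices of different computers.
[0098]
For a file with redundancy, a single GID has a plurality of LIDs, as with the GID "VF0004" in FIG. 8, which has the LIDs "computer 100:storage device 105:F0004" and "computer 110:storage device 115:F0004".
[0099]
In a write operation, all LIDs corresponding to the GID are extracted in step 302. The write request, the data, and the LID "computer 110:storage device 115:F0004" are then forwarded to the resource management unit 113 of the computer 110, which performs the write processing. In parallel, the resource management unit 103 also performs write processing. After both writes have completed, the resource management unit 103 executes step 306 and reports the write result to the user program 101. Depending on how the computer cluster system is operated and on its policies, step 306 may instead be executed as soon as one of the writes has finished.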
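One way to picture the parallel write to every replica is with a thread pool, as in the Python sketch below. This is an assumption-laden illustration: the function name `replicated_write` and the representation of each replica as a callable that returns `True` on success are hypothetical, and only the policy of reporting after all writes have completed is shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

Writer = Callable[[bytes], bool]   # hypothetical: writes data to one replica


def replicated_write(writers: Sequence[Writer], data: bytes) -> bool:
    """Write data to every replica in parallel; succeed only if all writes succeed."""
    if not writers:
        raise ValueError("at least one replica is required")
    with ThreadPoolExecutor(max_workers=len(writers)) as pool:
        # Submit all writes first so that they run concurrently.
        futures = [pool.submit(writer, data) for writer in writers]
        # Report to the caller only after every replica has answered.
        return all([future.result() for future in futures])
```

The alternative policy mentioned above, reporting as soon as one write has finished, would need a different completion rule and is not covered by this sketch.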
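[0100]
A read operation selects one of the plurality of computers and then proceeds as in the case without redundancy. Selecting the computer running the user program that issued the request is the most efficient choice.
[0101]
When a storage device fails, the resource management unit selects the LID of another storage device in place of the failed one.
[0102]
For example, if the storage device 105 fails, the resource management unit 103, on receiving from the user program 101 a read request for the file with GID "VF0004", does not select "computer 100:storage device 105:F0004", which points to the storage device 105, when converting the GID to an LID, and selects "computer 110:storage device 115:F0004" instead.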
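The selection rule described here (prefer a copy on the requesting computer, and skip any storage device known to have failed) can be sketched as follows. The function name `choose_replica`, the tuple form of an LID, and the way failed devices are tracked are assumptions made for this Python example.

```python
from typing import FrozenSet, Optional, Sequence, Tuple

LidTuple = Tuple[str, str, str]    # (computer, storage device, file id)


def choose_replica(
    lids: Sequence[LidTuple],
    local_computer: str,
    failed: FrozenSet[Tuple[str, str]] = frozenset(),
) -> Optional[LidTuple]:
    """Pick the LID to read from, or None if every copy is on a failed device."""
    healthy = [lid for lid in lids if (lid[0], lid[1]) not in failed]
    for lid in healthy:
        if lid[0] == local_computer:       # a local copy avoids the network
            return lid
    return healthy[0] if healthy else None


replicas = [
    ("computer100", "storage105", "F0004"),
    ("computer110", "storage115", "F0004"),
]
print(choose_replica(replicas, "computer100",
                     failed=frozenset({("computer100", "storage105")})))
```

Because the choice is made inside the GID-to-LID conversion, the caller issues the same request whether or not a device has failed.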
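[0103]
As a result, the read operation executes steps 401, 402, 403, 404, and 406 of FIG. 6, and the write operation executes steps 301, 302, 303, 304, and 306 of FIG. 4.
[0104]
In either case, the user program 101 simply issues read and write requests exactly as it does when there is no failure.
[0105]
With the computer cluster system of this embodiment, files can be read and written efficiently when the computer running the user program or subsystem that requested the read or write is the same computer that is connected to the storage device holding the target file, so the average file access efficiency of the system as a whole improves. In addition, because file reads and writes are executed on the kernel side, communication is less likely to concentrate on the computer running the subsystem, which improves the capacity, robustness, and reliability of the system as a whole. Files can also be read and written just as efficiently with redundancy as without it.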
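[0106]
[Effects of the Invention]
As described above, the computer cluster system of the present invention divides the functions of the file system, implementing path management in the subsystem and resource management and file management in the kernel, and thereby improves read and write efficiency.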
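[Brief Description of the Drawings]
[FIG. 1] A diagram illustrating the configuration of a computer cluster system according to one embodiment of the present invention.
[FIG. 2] A flowchart illustrating the file open operation in the computer cluster system according to one embodiment of the present invention.
[FIG. 3] A flowchart illustrating the GID assignment operation in steps 208 and 209 of FIG. 2.
[FIG. 4] A flowchart illustrating the file write operation in the computer cluster system according to one embodiment of the present invention.
[FIG. 5] A flowchart illustrating the file write operation in the resource management unit.
[FIG. 6] A flowchart illustrating the file read operation in the computer cluster system according to one embodiment of the present invention.
[FIG. 7] A flowchart illustrating the file read operation in the resource management unit.
[FIG. 8] A diagram illustrating the configuration of the resource management unit.
[FIG. 9] A diagram illustrating the configuration of the file management unit.
[FIG. 10] A diagram illustrating the configuration of the subsystem.
[FIG. 11] A diagram illustrating an example of the data flow during a file read in the computer cluster system according to one embodiment of the present invention.
[FIG. 12] A diagram illustrating the configuration of a conventional computer cluster system.
[FIG. 13] A diagram illustrating an example of the data flow during a file read in a conventional computer cluster system.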
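[Description of Reference Numerals]
100, 110 Computer
101 User program
102, 112 Kernel
103, 113 Resource management unit
104, 114 File management unit
105, 115 Storage device
106, 116 Communication device
111 Subsystem
117 Path management unit
120 Network
601 Resource management unit
602 ID conversion unit
603 Resource management table
701 File management unit
702 Block position conversion unit
703 File management table
801 Subsystem (path management unit)
802 Path conversion unit
803 Path management table
900, 910 Computer
901 User program
902, 912 Kernel
903, 913 Storage device
904, 914 Communication device
915 Path management unit
916 Resource management unit
917 File management unit
920 Network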
BACKGROUND OF THE INVENTION
The present invention relates to a computer cluster system including a plurality of computers commonly connected via a network.
[0002]
[Prior art]
A computer cluster system in which a plurality of computers are connected via a communication device is used to supplement the functions of a single computer and achieve high performance and high reliability. By connecting computers in this way, processing can be distributed to individual computers to improve performance, and when a failure occurs in a part of the connected computers, on the computer that has failed by another computer High reliability can be realized by taking over the processing.
[0003]
One of the OS (Operating System) structures suitable for realizing such a computer cluster system is a microkernel system.
[0004]
As described in Japanese Patent Application Laid-Open No. 11-345134, the microkernel OS is separated from the kernel as a kernel part for performing process management and memory management, and a service for file system management and user management as a subsystem part. It has a structure.
[0005]
When a micro-kernel OS is used in a computer cluster system, the kernel is operated on all of a plurality of computers connected by a communication device, and the subsystem is only a part of the computer cluster system or a single computer. Make it work. When a general program (user program) that runs on a computer cluster system uses an OS service such as file access, it communicates with a subsystem that provides the service, and the subsystem cooperates with the kernel. And provide services.
[0006]
The advantage of the microkernel method is that it is possible to provide new services and upgrade the OS in the form of adding and changing subsystems instead of replacing the entire OS. Since the kernel to be operated can be made compact, it is possible to allocate a large amount of processing capacity of the computer to the operation of the user program.
[0007]
[Problems to be solved by the invention]
FIG. 12 is a diagram for explaining a computer cluster system realized by a conventional microkernel OS. In this system, a computer 900 that operates a user program 901 and a computer 910 that operates a subsystem 911 that manages a file system are commonly connected via a network 920. In order to simplify the description, a computer in which another subsystem operates is omitted.
[0008]
The computer 900 includes a user program 901, a kernel 902 that performs various hardware controls, a storage device 903 that stores data and the like, and a communication device 904 that communicates with other computers via a network. ing. The computer 910 has an equivalent configuration.
[0009]
The subsystem 911 operating on the computer 910 is a subsystem for managing the file system as described above, and a path management unit 915 for managing a path such as a directory structure and a storage device of which computer the file is in. A resource management unit 916 that manages the file, and a file management unit 917 that manages the physical location of the file in the storage device, and the subsystem 911 uses these to manage the file system.
[0010]
FIG. 13 is a diagram for explaining a read request and a data flow when the user program 901 reads a file in a conventional computer cluster system.
[0011]
(1) First, the user program 901 accesses the subsystem 911 that manages the file system and notifies the name of the file to be read. (2) The subsystem 911 searches in which space of which storage device of which computer the file with the name notified from the user program 901 is stored, and the kernel of the computer having the storage device in which the file exists Notify
[0012]
When notified to the kernel 902 (3) the kernel 902 notifies the storage device 903 of a read request. (4) The storage device 903 reads the contents of the file and notifies the kernel 902 of it. (5) The kernel 902 transfers the file contents to the subsystem 911. (6) The subsystem 911 transfers the contents of the file to the user program 901.
[0013]
On the other hand, when notified to the kernel 912 (3 ′), the kernel 912 notifies the storage device 913 of a read request. (4 ′) The storage device 913 reads the contents of the file and notifies the kernel 912 of it. (5 ′) The kernel 912 transfers the file contents to the subsystem 911. (6 ′) The subsystem 911 transfers the contents of the file to the user program 901.
[0014]
The write request and data flow at the time of writing the file are almost the same.
[0015]
As described above, in the conventional OS, both the logical structure and the physical structure of the file system are managed by the subsystem 911. Therefore, the reading and writing of the file is always performed once via the computer in which the subsystem is running. Therefore, especially when the computer that runs the user program and the computer that manages the storage device where the file exists are the same, it is efficient in order to send and receive data via a network that is slower than the data flow inside the computer. There was a problem.
[0016]
In particular, since (5) and (6) in FIG. 13 are data communications via a network, they are very inefficient compared to (4) or the like, which is communications within a computer.
[0017]
In addition, magnetic disks such as hard disks used as storage devices contain drive components inside, so they are very likely to fail compared to memory and processors that make up computers. Many methods are used to prevent file loss due to failure.
[0018]
Even in a conventional computer cluster system, data can be easily made redundant by having a file copy between storage devices of each computer. When the storage device fails, the process can be continued by using the replicated data. However, since the conventional computer cluster system reads and writes files via the subsystem, if data is made redundant, the overhead in making a copy of the file becomes very large and inefficient.
[0019]
Although only two computers have been described here, the opportunity to access a storage device connected to a computer different from the computer 910 on which the subsystem 911 operates increases as the number increases. In particular, in the case of having a copy of a file as described above, a case where both the copy source file and the copied file exist in a storage device connected to a computer different from the computer 910 on which the subsystem 911 operates. Occurs. Since the frequency increases in proportion to the number of computers, the efficiency also decreases in proportion to the number of computers.
[0020]
Therefore, an object of the present invention is to provide a computer cluster system that can efficiently read and write files.
[0021]
[Means for Solving the Problems]
In order to solve the above problems, a computer cluster system according to the present invention is a computer cluster system in which a plurality of computers each having a storage device and a communication device are commonly connected via a network. Exist in a distributed manner on the storage device of each computer, and some computers have a relationship between a path representing the logical structure of the file system and a global identifier uniquely assigned to the file in the entire computer cluster system. Each computer has a subsystem that has a path management unit for managing, and each computer has a kernel that controls the storage device that each computer has, and the kernel identifies which storage device the file is stored in A resource management unit for storing and managing a correspondence relationship between a local identifier and the global identifier; And a file management unit for managing the relationship between the address of the location where the file is stored as the local identifier within, the path management unit of the subsystems, operating on the computer cluster system User program If there is no global identifier corresponding to the file path notified by For each calculator The resource management unit newly stores a global identifier and local identifier pair corresponding to the file path, and the subsystem newly stores a global identifier and path pair corresponding to the file path, The user program acquires a global identifier corresponding to a path of the file.
[0022]
The computer cluster system of the present invention includes a storage device and a communication device. In a computer cluster system composed of a plurality of computers commonly connected via a network, the file system used in the entire computer cluster system is the same as that of each computer. An operating system that is distributed on a storage device and used in a computer cluster system has a relationship between a path that represents the logical structure of the file system and a global identifier that is uniquely assigned to the file throughout the computer cluster system. A local identifier that identifies a storage device in which the file is stored, and a global identifier that includes a subsystem having a path management unit for management and a kernel that controls a storage device included in each computer; For storing and managing the relationship A management unit, and a file management unit that manages the relationship between the address of the location where the file is stored in the storage device and the local identifier, and is operated in each computer. And operate on the computer cluster system in the path management unit of the subsystem. User program If there is no global identifier corresponding to the file path notified by For each calculator The resource management unit newly stores a global identifier and local identifier pair corresponding to the file path, and the subsystem newly stores a global identifier and path pair corresponding to the file path, The user program acquires a global identifier corresponding to a path of the file.
[0023]
The computer cluster system of the present invention includes a storage device and a communication device, and in a computer cluster system composed of a plurality of computers commonly connected via a network, the file system used in the entire computer cluster system is the same for each computer. Path management that exists in a distributed manner on a storage device and manages the relationship between the path that represents the logical structure of the file system and the global identifier that is uniquely assigned to the file in the entire computer cluster system. A local identifier that identifies which storage device the file is stored in, and a kernel that controls a storage device included in each computer is operated on each computer. A resource management unit for storing and managing a correspondence relationship with the global identifier; And a file management unit for managing the relationship between the address of the location where the file is stored as the local identifier within, the path management unit of the subsystems, operating on the computer cluster system User program If there is no global identifier corresponding to the file path notified by For each calculator The resource management unit newly stores a global identifier and local identifier pair corresponding to the file path, and the subsystem newly stores a global identifier and path pair corresponding to the file path, The user program acquires a global identifier corresponding to a path of the file.
[0024]
The file access method of the present invention is a file access method in a computer cluster system composed of a plurality of computers, each including a storage device and a communication device, commonly connected via a network. The method comprises: an open step in which a user program notifies the path of a file to a subsystem that manages the logical structure of the file system and obtains the global identifier of the file; a request step in which the user program notifies the global identifier to the kernel operating on the same computer as the user program and requests reading or writing of the file; a file location search step in which the kernel converts the global identifier into a local identifier to determine the location of the storage device in which the file to be read or written exists; a request transfer step in which, when the result of the file location search step shows that the computer on which the kernel is running differs from the computer to which the storage device holding the file belongs, the read/write request and the local identifier are transferred to the kernel operating on the computer to which that storage device belongs; an access step in which the kernel operating on the computer to which the storage device holding the file belongs accesses the storage device indicated by the local identifier and executes the read/write request; and a result reporting step in which the kernel that executed the access step notifies the user program of the result of the access step. When the path management unit of the subsystem has no global identifier corresponding to the path of the file, the open step includes, after the user program notifies the subsystem of the path of the file: a step in which the kernel newly allocates and stores a set of a global identifier and a local identifier corresponding to the path of the file; a step of keeping the correspondence information between global identifiers and local identifiers identical between the kernel and the kernels of the other computers; and a step in which the subsystem newly allocates and stores a set of the global identifier corresponding to the path of the file and the path.
[0025]
The file access method of the present invention is a file access method in a computer cluster system composed of a plurality of computers, each including a storage device and a communication device, commonly connected via a network. The method comprises: an open step in which a user program notifies the path of a file to a subsystem that manages the logical structure of the file system and obtains the global identifier of the file; a request step in which the user program notifies the global identifier to the kernel operating on the same computer as the user program and requests reading or writing of the file; a file location search step in which the kernel converts the global identifier into a local identifier to determine the location of the storage device in which the file to be read or written exists; a request transfer step in which, when the result of the file location search step shows that the computer on which the kernel is running differs from the computer to which the storage device holding the file belongs, the read/write request and the local identifier are transferred to the kernel operating on the computer to which that storage device belongs; an access step in which the kernel operating on the computer to which the storage device holding the file belongs accesses the storage device indicated by the local identifier and executes the read/write request; and a result reporting step in which the kernel that executed the access step notifies the user program of the result of the access step. When the path management unit of the subsystem has no global identifier corresponding to the path of the file, the open step includes, after the user program notifies the subsystem of the path of the file: a step in which the subsystem allocates a global identifier corresponding to the path of the file and stores it in association with the path; a step in which the kernel newly allocates and stores a local identifier corresponding to the global identifier that corresponds to the path of the file; and a step of keeping the correspondence information between global identifiers and local identifiers identical between the kernel and the kernels of the other computers.
[0037]
DETAILED DESCRIPTION OF THE INVENTION
(System Overview / Configuration) A computer cluster system according to an embodiment of the present invention will be described below with reference to the drawings. FIG. 1 is a diagram for explaining a computer cluster system according to an embodiment of this invention.
[0038]
In the computer cluster system of this embodiment, two computers, a computer 100 that executes a user program 101 and a computer 110 that executes a subsystem 111, are connected via a network 120.
[0039]
The computer 100 includes a storage device 105 that is a magnetic disk for storing data, and a communication device 106 that communicates with other computers via a network. On the computer 100, two programs are executed: the kernel 102, which is a program responsible for hardware control of the storage device 105, the communication device 106, a memory (not shown), and the like; and the user program 101, which is a program such as an application that runs using the services provided by the computer cluster system and the subsystem 108.
[0040]
The kernel 102 has an interface (not shown) for controlling hardware, a resource management unit 103 that manages resources such as files and processes used in the entire computer system, and a file management unit 104 that manages the physical location of the data stored in the storage device 105 (that is, the physical structure of the file system). These can be accessed from outside the kernel 102 (for example, from the user program 101) using system calls.
[0041]
The configuration of the computer 110 is the same as that of the computer 100 and includes a storage device 115 and a communication device 116. On the computer 110, two programs are running: the kernel 112, which is a program responsible for hardware control and the like and has the same functions as the kernel 102; and the subsystem 111, which is a program that provides services to all user programs including the user program 101.
[0042]
The configuration of the kernel 112 is the same as that of the kernel 102. It has an interface (not shown) for controlling hardware, a resource management unit 113 that manages resources such as files and processes used in the entire computer system, and a file management unit 114 that manages the physical location of the data stored in the storage device 115. These can be accessed from outside the kernel 112 (for example, from the subsystem 111) using system calls.
[0043]
The subsystem 111 is a program that manages the logical structure of the file system, such as the directory structure. It includes a path management unit 117 that provides a function of converting a path, which is a character string used to identify a file (for example, “/home/user1/movie/gogo.mpg”), into a file identifier. Among the global identifiers (hereinafter referred to as GIDs) assigned to all resources used in the entire computer cluster system, the subsystem 111 stores and manages the relationship between the GID and the path for each file used in the entire computer cluster system.
[0044]
The resource management units 103 and 113 assign, to each resource used in the entire computer cluster system, the above-mentioned GID and a local identifier (hereinafter referred to as LID), which is an identifier unique within the computer in which the resource actually exists, and store and manage the correspondence between GID and LID. The LID combines the identifier of the computer in which the resource exists with the identifier of the resource within that computer. For example, the LID of a file stored in the storage device 105 of the computer 100 has a structure such as “(identifier representing the computer 100): (identifier representing the storage device 105): (identifier of the file unique within the computer 100)”.
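As an illustration of the LID structure described above, the following Python sketch models an LID as three colon-separated fields. The class name `LID`, its field names, and the concrete string form such as "100:105:F0001" are assumptions made for illustration only; the embodiment states only that the computer identifier, the storage device identifier, and a file identifier unique within the computer are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LID:
    """Hypothetical local identifier: computer, storage device, file."""
    computer: str
    storage: str
    file_id: str

    def __str__(self) -> str:
        # Join the three parts in the order given in the description.
        return f"{self.computer}:{self.storage}:{self.file_id}"

    @classmethod
    def parse(cls, text: str) -> "LID":
        # Split the assumed "computer:storage:file" text form.
        computer, storage, file_id = text.split(":")
        return cls(computer, storage, file_id)

    def is_local_to(self, computer: str) -> bool:
        # True when the resource lives on the given computer.
        return self.computer == computer


lid = LID.parse("100:105:F0001")
```

Because the computer identifier is embedded in the LID, a resource management unit can decide whether a request must be transferred simply by inspecting the LID.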
[0045]
In the present embodiment, when a change occurs in the correspondence between a GID and an LID in either of the resource management units 103 and 113, that unit notifies the other of the change, so that the correspondence information held by the two remains identical. However, the management of the correspondence information between GID and LID is not limited to this. For example, the resource management unit 103 may manage the information centrally, with the resource management unit 113 querying it when necessary and notifying it of the contents of any change. Alternatively, the units may each manage their own information and query one another only when needed, for example when a GID whose corresponding LID is unknown is received. The method of synchronizing or exchanging the GID-LID correspondence information between resource management units is desirably chosen according to the scale, design, and purpose of the computer cluster system.
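The notify-on-change scheme of the present embodiment can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the class `ResourceManager`, its methods, and the in-process method calls standing in for network messages are all assumptions.

```python
class ResourceManager:
    """Hypothetical stand-in for resource management units 103 and 113."""

    def __init__(self, name):
        self.name = name
        self.table = {}   # GID -> LID (assumed string form)
        self.peers = []   # resource management units on other computers

    def set_mapping(self, gid, lid):
        """Record a local change and notify every peer of it."""
        self.table[gid] = lid
        for peer in self.peers:
            peer.on_peer_change(gid, lid)

    def on_peer_change(self, gid, lid):
        """Apply a change received from a peer without re-broadcasting."""
        self.table[gid] = lid


rm103 = ResourceManager("103")
rm113 = ResourceManager("113")
rm103.peers.append(rm113)
rm113.peers.append(rm103)

# A change made on either side is propagated to the other.
rm103.set_mapping("VF0001", "100:105:F0001")
rm113.set_mapping("VF0002", "110:115:F0001")
```

Keeping a full copy on every computer lets each kernel convert a GID to an LID locally, at the cost of one notification per change; the centralized and on-demand alternatives mentioned above trade that cost differently.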
[0046]
The file management unit 104 manages the physical location of the files stored in the storage device 105 connected to the kernel 102, and the file management unit 114 manages the physical location of the files stored in the storage device 115 connected to the kernel 112. The physical location is the number or address of a block, which is the unit of recording on the storage devices 105 and 115; blocks are allocated to a file in a number corresponding to its data size. The file management units 104 and 114 manage, for each LID, the addresses of all blocks allocated to the file that the LID indicates.
[0047]
The storage devices 105 and 115 are magnetic disks that store files. Note that the storage devices are not limited to magnetic disks; flash memory or the like may also be used. The storage device 105 passes file data to the file management unit 104 in response to a read request from the file management unit 104 of the kernel 102, and stores data received from the file management unit 104 in response to a write request from the file management unit 104. The storage device 115 passes file data to the file management unit 114 in response to a read request from the file management unit 114 of the kernel 112, and stores data received from the file management unit 114 in response to a write request from the file management unit 114.
[0048]
The communication devices 106 and 116 communicate with each other via the network 120. The communication device 106 transmits data to the communication device 116 in response to a request from the kernel 102, and passes the data received from the communication device 116 to the kernel 102. The communication device 116 transmits data to the communication device 106 in response to a request from the kernel 112, and passes the data received from the communication device 106 to the kernel 112. As the communication devices 106 and 116, for example, Ethernet (registered trademark), Asynchronous Transfer Mode (ATM), token ring, or the like can be used.
[0049]
(File Opening) The operation of the user program 101 opening a file will be described below using the flowchart of FIG. 2.
[0050]
The user program 101 notifies the resource management unit 103 of an open request including the path of the file to be opened (step 201). The resource management unit 103 transfers the open request to the subsystem 111 (step 202). The subsystem 111 searches the table stored in the subsystem 111 for the GID corresponding to the path included in the open request (step 203). The process then branches depending on whether the GID is found (step 204).
[0051]
If found, the subsystem 111 includes the GID in the request result and notifies the resource management unit 103 (step 205). The resource management unit 103 notifies the request result to the user program 101 (step 206). Then, the user program 101 obtains a GID (Step 207).
[0052]
On the other hand, if not found, the subsystem 111 requests the resource management unit 103 to assign a GID, as described later (step 208). The resource management unit 103 newly assigns a GID, as described later (step 209), and notifies the user program 101 of the assignment result (step 210). Then, the user program 101 obtains a GID (step 207).
[0053]
Thereafter, the user program 101 designates a target file using this GID when reading and writing the file.
[0054]
FIG. 3 is a flowchart for explaining the above-described steps 208 and 209, that is, the GID assignment request by the subsystem 111 and the new assignment of GID by the resource management unit 103.
[0055]
If it is determined that the GID corresponding to the path does not exist, the subsystem 111 analyzes the open request from the user program 101, checks whether creation of a new file is permitted (step 221), and branches the subsequent processing accordingly (step 222).
[0056]
If there is no permission for new creation, the open result is a failure, so the resource management unit 103 is notified of the open failure (step 227). Since the open result is unsuccessful, the resource management unit 103 does not assign a GID (step 228), and notifies the user program 101 to that effect in step 210 described above.
[0057]
If there is permission for new creation, the subsystem 111 requests the resource management unit 103 to assign a new GID (step 223). The resource management unit 103 generates and stores a new set of GID and LID (step 224). Then, the generated GID and path are notified to the subsystem 111 to store the correspondence, and the GID and LID are notified to the resource management unit 113 as described above to update the resource information (step 225). Then, the assignment result is successful (step 226), and in step 210 described above, the success and the GID are notified to the user program 101.
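The open flow of FIG. 2 and FIG. 3 can be sketched as below. This is a simplified illustration under stated assumptions: the classes `ResourceManager` and `Subsystem`, the flat path-to-GID dictionary, the `create` flag standing in for "permission for new creation", and the GID/LID numbering scheme are all hypothetical. Message passing between the user program, the resource management unit, and the subsystem is collapsed into direct calls, and notification of the resource management unit 113 is omitted.

```python
import itertools


class ResourceManager:
    """Hypothetical resource management unit holding the GID -> LID table."""

    def __init__(self, computer, storage):
        self.computer = computer
        self.storage = storage
        self.table = {}
        self._seq = itertools.count(1)

    def assign(self):
        """Step 224: generate and store a new GID and LID pair."""
        n = next(self._seq)
        gid = f"VF{n:04d}"
        self.table[gid] = f"{self.computer}:{self.storage}:F{n:04d}"
        return gid


class Subsystem:
    """Hypothetical subsystem holding the path -> GID table."""

    def __init__(self):
        self.paths = {}

    def open(self, path, create, rm):
        gid = self.paths.get(path)      # step 203: search by path
        if gid is not None:             # step 204: found
            return gid
        if not create:                  # steps 221, 222, 227: open fails
            return None
        gid = rm.assign()               # steps 223, 224: new GID and LID
        self.paths[path] = gid          # step 225: store path and GID
        return gid


rm = ResourceManager("100", "105")
fs = Subsystem()
first = fs.open("/home/doc/a.txt", create=True, rm=rm)
again = fs.open("/home/doc/a.txt", create=False, rm=rm)
missing = fs.open("/home/doc/b.txt", create=False, rm=rm)
```

Here `first` and `again` refer to the same GID, while `missing` is `None` because the file does not exist and creation was not permitted.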
[0058]
Although the file open has been described here, in the case of a file deletion request, processing is performed as follows.
[0059]
(1) The user program 101 notifies the resource management unit 103 of a deletion request, and the resource management unit 103 transfers the deletion request to the subsystem 111. (2) The subsystem 111 deletes the GID stored in the path management unit 117 and the corresponding path, and notifies the resource management unit 103 that the GID has been deleted. (3) Upon receiving the notification from the subsystem 111, the resource management unit 103 deletes the corresponding GID and the corresponding LID from the storage, and notifies the file management unit 104 that the LID has been deleted. (4) Upon receiving the notification from the resource management unit 103, the file management unit 104 deletes the relationship between the corresponding LID and the addresses of all blocks assigned to the file indicated by the corresponding LID from the storage.
[0060]
The notification of the deletion result from the resource management unit 103 to the user program 101 may be performed in parallel with the processing of (3), after the processing of (3) is completed, or after the processing of (4) is completed. When the deletion result is notified after completion of the processing of (4), the file management unit 104 notifies the resource management unit 103 of the result of its deletion processing.
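The deletion sequence (2) to (4) above can be sketched with three dictionaries standing in for the path management table, the resource management table, and the file management table. The function name `delete_file` and the table layouts are assumptions for illustration; the sketch reports the result only after step (4), which is one of the permitted timings.

```python
def delete_file(path, path_table, resource_table, file_table):
    """Delete a file's entries in the order (2), (3), (4) described above."""
    gid = path_table.pop(path, None)    # (2) path management unit
    if gid is None:
        return False                    # nothing registered for this path
    lid = resource_table.pop(gid)       # (3) resource management unit
    file_table.pop(lid, None)           # (4) file management unit
    return True


# Hypothetical table contents for one file.
path_table = {"/home/doc/a.txt": "VF0001"}
resource_table = {"VF0001": "100:105:F0001"}
file_table = {"100:105:F0001": [3, 4]}  # LID -> allocated block numbers

deleted = delete_file("/home/doc/a.txt", path_table, resource_table, file_table)
```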
[0061]
(File Writing) The operation of the user program 101 writing data to a file will be described below using the flowchart of FIG. 4.
[0062]
The user program 101 notifies the resource management unit 103 of a file write request, GID, and data (step 301). The resource management unit 103 searches for an LID corresponding to the GID. If the computer indicated by the found LID is different from the computer 100 (ie, indicating the computer 110), the write request, LID, and data are transferred to the resource management unit of the computer indicated by the LID (step 302).
[0063]
If the LID indicates the computer 110, the write request, LID, and data are transferred to the resource management unit 113 of the computer 110 in step 302. The resource management unit 113 performs a data writing process as described later (step 303), and the resource management unit 113 notifies the resource management unit 103 of the result (step 304). Then, the resource management unit 103 notifies the user program 101 of the result of the writing process (step 306).
[0064]
When the LID indicates the computer 100, there is no need for transfer. The resource management unit 103 performs a writing process to be described later (step 305) and executes step 306.
[0065]
FIG. 5 is a flowchart of file write processing in the resource management unit 103. The resource management unit 113 performs similar processing.
[0066]
The resource management unit 103 notifies the file management unit 104 of the write request, LID, and data (step 311). The file management unit 104 searches for the physical location of the file indicated by the LID in the storage device 105 (step 312), and checks whether the storage area allocated to the file indicated by the LID is large enough for the data to be written (step 313). If the storage area is not large enough, additional area is secured (step 314).
[0067]
Then, the file management unit 104 writes the data into the storage device 105 (step 315). The file management unit 104 notifies the write result to the resource management unit 103 (step 316).
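The write routing of FIG. 4 and the write processing of FIG. 5 can be sketched as follows. The class `Kernel`, the `forwarded` counter, and the in-memory `files` dictionary are assumptions for illustration; a direct method call stands in for transfer over the network, and block allocation (steps 312 to 314) is reduced to storing the bytes.

```python
class Kernel:
    """Hypothetical kernel combining resource and file management."""

    def __init__(self, computer, table, cluster):
        self.computer = computer
        self.table = table        # GID -> LID, identical on every kernel
        self.cluster = cluster    # computer identifier -> Kernel
        self.files = {}           # LID -> bytes held on the local storage
        self.forwarded = 0        # requests transferred to another computer
        cluster[computer] = self

    def write(self, gid, data):
        lid = self.table[gid]                 # step 302: GID -> LID
        owner = lid.split(":")[0]
        if owner != self.computer:
            self.forwarded += 1               # step 302: transfer request
            return self.cluster[owner].write_local(lid, data)  # steps 303, 304
        return self.write_local(lid, data)    # step 305: no transfer needed

    def write_local(self, lid, data):
        self.files[lid] = bytes(data)         # steps 311 to 315, simplified
        return len(data)                      # step 316: report the result


table = {"VF0001": "100:105:F0001", "VF0002": "110:115:F0001"}
cluster = {}
k100 = Kernel("100", table, cluster)
k110 = Kernel("110", table, cluster)

k100.write("VF0001", b"local")    # stays on computer 100
k100.write("VF0002", b"remote")   # transferred to computer 110
```

Only the second write crosses the network in this sketch, which corresponds to the case where the LID indicates the computer 110.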
[0068]
As described above, when the user program 101 writes data to a file, conventionally the subsystem 111 always received the data from the user program 101 and wrote it. In the present embodiment, when the file to be written exists in the storage device 105 connected to the computer 100 on which the user program 101 operates, the file can be written without transmitting data via the network, which is efficient.
[0069]
(File Read) The operation of the user program 101 reading data from a file will be described below using the flowchart of FIG. 6.
[0070]
The user program 101 notifies the resource management unit 103 of the read request and GID (step 401). The resource management unit 103 searches the LID from the GID. If the computer indicated by the found LID is different from the computer 100 (ie, indicating the computer 110), the read request and the LID are transferred to the resource management unit of the computer indicated by the LID (step 402).
[0071]
If the LID indicates the computer 110, the read request and the LID are transferred to the resource management unit 113 of the computer 110 in step 402. The resource management unit 113 performs a read process as described later (step 403), and the resource management unit 113 notifies the resource management unit 103 of the read result and data (step 404). Then, the resource management unit 103 notifies the read result and data to the user program 101 (step 406).
[0072]
When the LID indicates the computer 100, there is no need for transfer. The resource management unit 103 performs a read process described later (step 405) and executes step 406.
[0073]
FIG. 7 is a flowchart of file read processing in the resource management unit 103. The reading process in the resource management unit 113 is performed in the same manner.
[0074]
The resource management unit 103 notifies the file management unit 104 of the read request and LID (step 411). The file management unit 104 searches the physical location of the file indicated by the LID in the storage device 105 (step 412). The file management unit 104 reads data from the storage device 105 (step 413). The file management unit 104 notifies the resource management unit 103 of the data and the read result (step 414).
[0075]
As described above, when the user program 101 reads data from a file, conventionally the subsystem 111 always read the data from the storage device 105 and transferred it to the user program 101. In the present embodiment, when the file to be read exists in the storage device 105 connected to the computer 100 on which the user program 101 operates, the file can be read without receiving data via the network, which is efficient.
[0076]
(Configuration of Resource Management Unit) FIG. 8 is a diagram for explaining the configuration of the resource management unit. The resource management unit 601 includes an ID conversion unit 602 that converts GID and LID, and a resource management table 603 that stores a correspondence relationship between GID and LID.
[0077]
The ID conversion unit 602 receives and processes a request notified from a user program or a subsystem and a request transferred from a resource management unit on another computer. If the ID included in the request is a GID, the GID is converted to an LID with reference to the resource management table 603. If the LID obtained by the conversion is an LID related to another computer, the request is transferred to the corresponding computer. Further, when a file is newly opened, a new GID and LID are generated and stored in the resource management table 603. When a file is deleted, the GID and LID are deleted.
[0078]
As described above, in the case of the present embodiment, when a change occurs in the resource management table 603, the change contents are notified to the resource management unit of another computer, and the same contents of the resource management table 603 are maintained.
[0079]
The resource management table 603 manages the LID corresponding to the resource represented by GID. The GID is numbered differently depending on the resource type (process, file, etc.). For example, in FIG. 8, GID starting with VP indicates a process, and GID starting with VF indicates a file. For example, it can be seen that the resource whose GID is VP0001 is P0001 of a process operating on the computer 100.
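The GID numbering convention described above can be illustrated with a short sketch. The prefixes VP and VF and the entry for VP0001 come from the description; the tuple layout of the table values, the entry for VF0001, and the function names are assumptions for illustration.

```python
# Prefix -> resource type, as described for FIG. 8.
RESOURCE_TYPES = {"VP": "process", "VF": "file"}

# Hypothetical table layout: GID -> (computer identifier, local resource identifier).
resource_management_table = {
    "VP0001": ("100", "P0001"),
    "VF0001": ("100", "F0001"),
}


def resource_type(gid):
    """Classify a GID by its two-character prefix."""
    return RESOURCE_TYPES.get(gid[:2], "unknown")


def describe(gid):
    """Return a readable description of the resource behind a GID."""
    computer, local_id = resource_management_table[gid]
    return f"{resource_type(gid)} {local_id} on computer {computer}"


summary = describe("VP0001")
```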
[0080]
(Configuration of File Management Unit) FIG. 9 is a diagram for explaining the configuration of the file management unit. The file management unit 701 includes a block position conversion unit 702 that converts the LID into a position on the storage device, and a file management table 703 that stores the correspondence between the LID and the position on the storage device.
[0081]
The block position conversion unit 702 uses the file management table 703 to convert the LID included in a write or read request received from the resource management unit, together with the position information of the data within the file (the number of bytes from the beginning of the file), into a position on the storage device. For example, if the block size of the storage device is 64 KB, the file management table 703 shows that the data located 130 KB from the beginning of the file with LID F0003 is stored in block 8.
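The arithmetic behind this example can be sketched as follows. With 64 KB blocks, an offset of 130 KB falls in the third block of the file (index 2), 2 KB past the start of that block. The table contents below are hypothetical except that the third block of F0003 is block 8, as stated in the description; the function name `locate` is also an assumption.

```python
BLOCK_SIZE = 64 * 1024  # 64 KB, as in the example above

# Hypothetical file management table: LID -> block numbers in file order.
# Only the third entry (block 8) is taken from the description.
file_management_table = {"F0003": [5, 6, 8]}


def locate(lid, offset):
    """Convert an LID and a byte offset into (block number, offset in block)."""
    index, within = divmod(offset, BLOCK_SIZE)
    return file_management_table[lid][index], within


block, within = locate("F0003", 130 * 1024)
```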
[0082]
Note that the file management unit need not have the configuration shown in FIG. 9 as long as the location on the storage device can be determined from the LID and the position information of the data in the file.
[0083]
(Configuration of Path Management Unit) FIG. 10 is a diagram for explaining the configuration of the path management unit of the subsystem. The path management unit 801 includes a path conversion unit 802 that provides a function of converting a path, which is a function of the file system, to a GID, and a path management table 803 that stores the correspondence between the path and the GID.
[0084]
The path conversion unit 802 receives a file open request from the resource management unit, converts the path included in the request into a GID using the path management table 803, and notifies the resource management unit of the result. It also maintains the path management table: when it receives the GID and path of a file generated by the resource management unit, it stores them in the path management table, and when it receives a notification from the resource management unit that a GID has been deleted, it deletes that GID and the path associated with it.
[0085]
For example, when the path is “/home/doc/document.txt”, the path management table 803 is searched in the order “home”, “doc”, “document.txt” to obtain the GID.
[0086]
First, data having no parent GID and having a name “home” is retrieved from data stored in the path management table 803 to obtain data 804.
[0087]
Next, using the GID “VF0001” included in the data 804, this time, the data having the parent GID “VF0001” and the name “doc” is retrieved to obtain the data 805.
[0088]
Similarly, using the GID “VF0002” included in the data 805, the data whose parent GID is “VF0002” and whose name is “document.txt” is retrieved to obtain data 806, and the GID “VF0003” of “document.txt” is included in the result and notified to the resource management unit.
[0089]
If conversion is not possible, for example because the file does not exist, the open request is analyzed. If permission to create a new file is granted, the resource management unit is requested to assign a GID; if permission is not granted, an error is returned to the resource management unit.
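The component-by-component search described above can be sketched as follows. The three table rows correspond to data 804, 805, and 806 in the description; representing each row as a dictionary entry keyed by (parent GID, name), and the function name `resolve`, are assumptions for illustration.

```python
# (parent GID, name) -> GID; None means the entry has no parent GID.
path_management_table = {
    (None, "home"): "VF0001",              # data 804
    ("VF0001", "doc"): "VF0002",           # data 805
    ("VF0002", "document.txt"): "VF0003",  # data 806
}


def resolve(path):
    """Walk the path one name at a time; return the GID or None if absent."""
    parent = None
    for name in path.strip("/").split("/"):
        parent = path_management_table.get((parent, name))
        if parent is None:
            return None  # conversion not possible, e.g. the file does not exist
    return parent


gid = resolve("/home/doc/document.txt")
```

A `None` result corresponds to the case where the open request must be analyzed to decide between requesting a new GID and returning an error.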
[0090]
Note that the path management unit need not have the configuration of FIG. 10 as long as it has a function of converting a path to a GID.
(Data Flow) FIG. 11 is a diagram for explaining the data flow when the user program 1001 reads data from the storage device. Although a communication device is omitted in the figure, communication between the computers 1000 and 1010 is performed via a network using the communication device.
[0091]
When the user program 1001 reads a file from the storage device 1003, the read request is (1) transmitted from the user program 1001 to the kernel 1002 and (2) transmitted from the kernel 1002 to the storage device 1003. The read data is (3) transmitted from the storage device 1003 to the kernel 1002 and (4) transmitted from the kernel 1002 to the user program 1001.
[0092]
All communications from (1) to (4) are performed inside the computer. Since the communication inside the computer is much faster than the communication via the network, it can be understood that the time until the reading is completed is short and the processing efficiency is good. The same applies to writing.
[0093]
On the other hand, when the user program 1001 reads a file from the storage device 1013, the read request is (1') transmitted from the user program 1001 to the kernel 1002, (2') transferred from the kernel 1002 to the kernel 1012, and (3') transmitted from the kernel 1012 to the storage device 1013. The read data is (4') transmitted from the storage device 1013 to the kernel 1012, (5') transferred from the kernel 1012 to the kernel 1002, and (6') transmitted from the kernel 1002 to the user program 1001. The same applies to writing.
[0094]
When the computer on which the user program operates differs from the computer to which the storage device storing the file belongs, the read/write efficiency is the same as in the conventional system. However, when reading from or writing to a storage device connected to the same computer, file contents are not sent over the network as they are in the conventional system, so overhead is greatly reduced and efficiency is improved.
[0095]
Although this embodiment has been described with only two computers, it is also possible to configure a computer cluster system using three or more computers. A plurality of user programs may be executed on different computers in the computer cluster system, and a part of the user programs may be executed on a computer on which the subsystem is executed. A plurality of subsystems may be executed on different computers in the computer cluster system. Furthermore, the functions of the subsystem may be divided and executed by being distributed over different computers in the computer cluster system.
[0096]
In this embodiment, the GID for a new file is assigned by the resource management unit 103 or 113, but it may instead be assigned by the subsystem 111. Taking as an example the case where the user program 101 notifies the resource management unit 103 of a file open request: if the file does not exist, the subsystem 111 allocates a GID, stores it in association with the path, and notifies the resource management unit 103 of the GID. The resource management unit 103 then newly allocates and stores an LID corresponding to the GID.
[0097]
Basic file reading and writing have been described so far. The following describes the operation when data redundancy is provided by creating a copy of a file on a storage device of a different computer.
[0098]
In the case of a file having redundancy, a plurality of LIDs, “computer 100: storage device 105: F0004” and “computer 110: storage device 115: F0004”, are associated with one GID, as with the GID “VF0004” in FIG. 8.
[0099]
In the file writing operation, all LIDs corresponding to GID are extracted in step 302. Then, the write request, the data, and the LID “computer 110: F0004” are transferred to the resource management unit 113 of the computer 110 and the resource management unit 113 performs the write process. In parallel with this, the resource management unit 103 also performs write processing. The resource management unit 103 executes step 306 after both writings are completed, and notifies the user program 101 of the writing result. It should be noted that step 306 may be executed at the stage where one writing is completed depending on the operation mode and policy of the computer cluster system.
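A redundant write that reports its result only after both copies are written can be sketched as below. The thread pool stands in for the parallel write processing by the resource management units 103 and 113; the `stores` dictionary, the function names, and the use of threads are assumptions for illustration, not the embodiment's mechanism.

```python
from concurrent.futures import ALL_COMPLETED, ThreadPoolExecutor, wait

# Hypothetical storage contents keyed by LID, one entry per replica.
stores = {"100:105:F0004": None, "110:115:F0004": None}


def write_replica(lid, data):
    """Write one copy of the data and return the LID that was written."""
    stores[lid] = data
    return lid


def redundant_write(lids, data):
    """Write to every replica in parallel; return once all writes finish."""
    with ThreadPoolExecutor(max_workers=len(lids)) as pool:
        futures = [pool.submit(write_replica, lid, data) for lid in lids]
        done, _ = wait(futures, return_when=ALL_COMPLETED)
    return sorted(f.result() for f in done)


written = redundant_write(list(stores), b"payload")
```

Replacing `ALL_COMPLETED` with `FIRST_COMPLETED` would correspond to the policy of reporting as soon as one write has completed.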
[0100]
In the file read operation, one of the plurality of computers holding the file is selected, and the read is then performed in the same manner as when no redundancy is provided. When selecting a computer, it is most efficient to select the computer executing the user program that issued the request.
[0101]
When a failure occurs in the storage device, the resource management unit selects an LID of another storage device instead of the failed storage device.
[0102]
For example, when a failure occurs in the storage device 105, the resource management unit 103, on receiving a read request for the file with the GID “VF0004” from the user program 101, does not select “computer 100: storage device 105: F0004”, which is on the failed storage device 105, when converting from GID to LID, but selects “computer 110: storage device 115: F0004” instead.
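The replica selection described above, preferring the local computer and skipping a failed storage device, can be sketched as follows. The function name `select_lid`, the colon-separated LID strings, and the way failures are reported (a set of failed storage device identifiers) are assumptions for illustration.

```python
def select_lid(lids, local_computer, failed_storage=frozenset()):
    """Pick a replica: skip failed storage devices, prefer the local computer."""
    usable = [lid for lid in lids if lid.split(":")[1] not in failed_storage]
    if not usable:
        raise LookupError("no usable replica")
    for lid in usable:
        if lid.split(":")[0] == local_computer:
            return lid      # most efficient: no network transfer
    return usable[0]        # otherwise fall back to a remote replica


replicas = ["100:105:F0004", "110:115:F0004"]

normal = select_lid(replicas, "100")
after_failure = select_lid(replicas, "100", failed_storage={"105"})
```

In normal operation the local copy on the computer 100 is chosen; once the storage device 105 is marked as failed, the copy on the computer 110 is chosen instead, without any change to the request issued by the user program.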
[0103]
As a result, the read operation executes steps 401, 402, 403, 404, and 406 in FIG. 6, and the write operation executes steps 301, 302, 303, 304, and 306 in FIG. 4.
[0104]
In any case, the user program 101 may request reading and writing as in the case where there is no failure.
[0105]
In the computer cluster system according to the present embodiment, a file can be read and written efficiently when the computer executing the user program or subsystem that requested the read or write is the same as the computer to which the storage device storing the target file is connected, so the average file access efficiency of the entire system is improved. In addition, because reading and writing of files are executed on the kernel side, communication is less likely to concentrate on the computer on which the subsystem operates, so the capacity, robustness, and reliability of the entire system can be improved. Furthermore, even when a file is given redundancy, it can be read and written as efficiently as when it is not.
[0106]
[Effect of the Invention]
As described above, the computer cluster system of the present invention can improve the read / write efficiency by dividing the file system functions and implementing path management in the subsystem and resource management and file management in the kernel.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a configuration of a computer cluster system according to an embodiment of this invention.
FIG. 2 is a flowchart for explaining a file open operation in the computer cluster system according to the embodiment of this invention.
FIG. 3 is a flowchart for explaining a GID assignment operation in steps 208 and 209 in FIG. 2;
FIG. 4 is a flowchart for explaining a file writing operation in the computer cluster system according to the embodiment of this invention.
FIG. 5 is a flowchart for explaining a file writing operation in a resource management unit.
FIG. 6 is a flowchart for explaining a file read operation in the computer cluster system according to the embodiment of this invention.
FIG. 7 is a flowchart for explaining a file read operation in the resource management unit.
FIG. 8 is a diagram illustrating a configuration of a resource management unit.
FIG. 9 is a diagram illustrating a configuration of a file management unit.
FIG. 10 is a diagram illustrating the configuration of a subsystem.
FIG. 11 is a diagram illustrating an example of a data flow when reading a file in the computer cluster system according to the embodiment of this invention.
FIG. 12 is a diagram for explaining the configuration of a conventional computer cluster system.
FIG. 13 is a diagram for explaining an example of a data flow when reading a file in a conventional computer cluster system.
[Explanation of symbols]
100, 110 computer
101 User program
102, 112 kernel
103, 113 Resource management unit
104, 114 File management unit
105, 115 Storage device
106, 116 Communication device
111 subsystem
117 Path management unit
120 network
601 Resource management unit
602 ID conversion unit
603 Resource management table
701 File management unit
702 Block position conversion unit
703 File management table
801 Subsystem (path management unit)
802 Path conversion unit
803 Path management table
900, 910 calculator
901 User program
902, 912 kernel
903, 913 storage device
904, 914 communication device
915 Path Management Department
916 Resource Management Department
917 File Management Department
920 network

Claims

1. A computer cluster system in which a plurality of computers, each having a storage device and a communication device, are commonly connected via a network, wherein
the file system of the entire computer cluster system is distributed over the storage devices of the computers,
some of the computers have a subsystem having a path management unit that manages the relationship between a path representing the logical structure of the file system and a global identifier uniquely assigned to a file within the entire computer cluster system,
each computer has a kernel that controls the storage device of that computer,
the kernel has a resource management unit that stores and manages the correspondence between the global identifier and a local identifier identifying the storage device in which the file is stored, and a file management unit that manages the relationship between the local identifier and the address of the location in the storage device where the file is stored, and
when the path management unit of the subsystem has no global identifier corresponding to the path of a file notified by a user program operating on the computer cluster system, the resource management unit of each computer newly stores a pair of a global identifier and a local identifier corresponding to the path of the file, the subsystem newly stores a pair of that global identifier and the path, and the user program obtains the global identifier corresponding to the path of the file.

2. A computer cluster system comprising a plurality of computers, each including a storage device and a communication device, commonly connected via a network, wherein
the file system used in the entire computer cluster system is distributed over the storage devices of the computers,
the operating system used in the computer cluster system comprises
a subsystem having a path management unit that manages the relationship between a path representing the logical structure of the file system and a global identifier uniquely assigned to a file within the entire computer cluster system, and
a kernel that controls the storage device of each computer,
the kernel has a resource management unit that stores and manages the correspondence between the global identifier and a local identifier identifying the storage device in which the file is stored, and a file management unit that manages the relationship between the local identifier and the address of the location in the storage device where the file is stored, and operates on each computer,
the subsystem operates on some of the computers, and
when the path management unit of the subsystem has no global identifier corresponding to the path of a file notified by a user program operating on the computer cluster system, the resource management unit of each computer newly stores a pair of a global identifier and a local identifier corresponding to the path of the file, the subsystem newly stores a pair of that global identifier and the path, and the user program obtains the global identifier corresponding to the path of the file.

3. A computer cluster system comprising a plurality of computers, each including a storage device and a communication device, commonly connected via a network, wherein
the file system used in the entire computer cluster system is distributed over the storage devices of the computers,
a subsystem having a path management unit that manages the relationship between a path representing the logical structure of the file system and a global identifier uniquely assigned to a file within the entire computer cluster system is operated on some of the computers,
a kernel that controls the storage device of each computer is operated on each computer,
the kernel has a resource management unit that stores and manages the correspondence between the global identifier and a local identifier identifying the storage device in which the file is stored, and a file management unit that manages the relationship between the local identifier and the address of the location in the storage device where the file is stored, and
when the path management unit of the subsystem has no global identifier corresponding to the path of a file notified by a user program operating on the computer cluster system, the resource management unit of each computer newly stores a pair of a global identifier and a local identifier corresponding to the path of the file, the subsystem newly stores a pair of that global identifier and the path, and the user program obtains the global identifier corresponding to the path of the file.

4. The computer cluster system according to claim 1, wherein, when the path management unit of the subsystem has no global identifier corresponding to the path of a file notified by a user program operating on the computer cluster system, the resource management unit newly assigns a pair of a global identifier and a local identifier corresponding to the path of the file.

5. The computer cluster system according to claim 1, wherein, when the path management unit of the subsystem has no global identifier corresponding to the path of a file notified by a user program operating on the computer cluster system, the subsystem assigns a global identifier corresponding to the path of the file and stores it in association with the path, and the resource management unit newly assigns and stores a local identifier corresponding to that global identifier.

6. The computer cluster system according to claim 1, wherein, when the correspondence between global identifiers and local identifiers held by the resource management unit changes, the correspondence information between global identifiers and local identifiers is kept identical between that resource management unit and the resource management units of the other computers.

7. The computer cluster system according to claim 1, wherein the resource management unit receives a global identifier from the kernel and outputs the local identifier corresponding to that global identifier.

8. The computer cluster system according to claim 1, wherein the resource management unit acquires information on the correspondence between the local identifier and the global identifier from the resource management unit of another computer constituting the computer cluster system.

9. The computer cluster system according to claim 1, wherein the resource management unit notifies the resource management unit of another computer constituting the computer cluster system of information on the correspondence between the local identifier and the global identifier.

10. The computer cluster system according to claim 1, wherein the kernel
notifies the subsystem of a file path and a file open request received from a user program operating on the computer cluster system, and obtains the global identifier corresponding to the path, and
returns the obtained global identifier to the user program.

11. The computer cluster system according to claim 1, wherein the kernel
notifies the resource management unit of a global identifier and a file read request received from the user program, so that the global identifier is converted into the local identifier,
has the file management unit convert the resulting local identifier into an address in the storage device,
accesses the storage device using the resulting address and reads the data, and
returns the read data to the user program.

12. The computer cluster system according to claim 1, wherein the kernel
receives from the user program a global identifier, data to be written to the file, and a file write request,
has the resource management unit convert the global identifier into the local identifier,
has the file management unit convert the local identifier into an address in the storage device, and
accesses the storage device and writes the data.

13. The computer cluster system according to claim 1, wherein the kernel,
on receiving a file write request from a program running on the computer cluster system,
writes the file to a plurality of storage devices connected to different computers in the computer cluster system, and,
when any of the plurality of storage devices fails, performs file access using the file on another of the storage devices.

14. The computer cluster system according to claim 1, wherein the subsystem
receives from the kernel a path representing the logical structure of the file system, and
outputs the global identifier corresponding to the path.

15. A file access method in a computer cluster system comprising a plurality of computers, each including a storage device and a communication device, commonly connected via a network, the method comprising:
an open step in which a user program notifies a subsystem that manages the logical structure of the file system of the path of a file and obtains the global identifier of the file;
a request step in which the user program requests reading or writing of the file by notifying the kernel operating on the same computer as the user program of the global identifier;
a file location search step in which the kernel converts the global identifier into a local identifier and determines the location of the storage device holding the file to be read or written;
a request transfer step in which, when the file location search step shows that the computer on which the kernel is operating differs from the computer to which the storage device holding the file belongs, the kernel transfers the read/write request and the local identifier to the kernel operating on the computer to which that storage device belongs;
an access step in which the kernel operating on the computer to which the storage device holding the file belongs accesses the storage device indicated by the local identifier and executes the read/write request; and
a result reporting step in which the kernel that executed the access step notifies the user program of the result of the access step,
wherein, when the user program notifies the subsystem of the path of the file in the open step and the path management unit of the subsystem has no global identifier corresponding to that path, the method further comprises:
a step in which the kernel newly assigns and stores a pair of a global identifier and a local identifier corresponding to the path of the file;
a step of keeping the correspondence information between global identifiers and local identifiers identical between the kernel and the kernels of the other computers; and
a step in which the subsystem newly stores a pair of the global identifier and the path corresponding to the file.

16. A file access method in a computer cluster system comprising a plurality of computers, each including a storage device and a communication device, commonly connected via a network, the method comprising:
an open step in which a user program notifies a subsystem that manages the logical structure of the file system of the path of a file and obtains the global identifier of the file;
a request step in which the user program requests reading or writing of the file by notifying the kernel operating on the same computer as the user program of the global identifier;
a file location search step in which the kernel converts the global identifier into a local identifier and determines the location of the storage device holding the file to be read or written;
a request transfer step in which, when the file location search step shows that the computer on which the kernel is operating differs from the computer to which the storage device holding the file belongs, the kernel transfers the read/write request and the local identifier to the kernel operating on the computer to which that storage device belongs;
an access step in which the kernel operating on the computer to which the storage device holding the file belongs accesses the storage device indicated by the local identifier and executes the read/write request; and
a result reporting step in which the kernel that executed the access step notifies the user program of the result of the access step,
wherein, when the user program notifies the subsystem of the path of the file in the open step and the path management unit of the subsystem has no global identifier corresponding to that path, the method further comprises:
a step in which the subsystem assigns a global identifier corresponding to the path of the file and stores it in association with the path;
a step in which the kernel newly assigns and stores a local identifier corresponding to that global identifier; and
a step of keeping the correspondence information between global identifiers and local identifiers identical between the kernel and the kernels of the other computers.
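
The request, file location search, request transfer, access, and result reporting steps of the file access method claimed above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the claimed implementation: the `Kernel` class, the `cluster` dictionary standing in for the network, and the representation of a local identifier as a `(computer id, local file number)` tuple are all hypothetical.

```python
class Kernel:
    """One kernel per computer; `cluster` stands in for the network."""

    def __init__(self, node_id, cluster):
        self.node_id = node_id
        self.cluster = cluster      # node_id -> Kernel (hypothetical transport)
        self.resource_table = {}    # GID -> LID, kept identical on all kernels
        self.files = {}             # local file number -> stored bytes
        cluster[node_id] = self

    def allocate(self, gid):
        """Assign a LID on this computer and propagate it to every kernel."""
        local_no = len(self.files)
        self.files[local_no] = b""
        lid = (self.node_id, local_no)  # assumed LID layout: (computer, file)
        for kernel in self.cluster.values():
            kernel.resource_table[gid] = lid
        return lid

    def request(self, op, gid, data=None):
        """Entry point for a user program on this computer."""
        lid = self.resource_table[gid]          # file location search step
        owner = lid[0]
        if owner != self.node_id:
            # Request transfer step: forward the request and LID to the owner.
            return self.cluster[owner].access(op, lid, data)
        return self.access(op, lid, data)

    def access(self, op, lid, data=None):
        """Access step, executed on the computer that holds the file."""
        _, local_no = lid
        if op == "write":
            self.files[local_no] = data
            return len(data)                    # result reported to the caller
        return self.files[local_no]


cluster = {}
kernel0 = Kernel(0, cluster)
kernel1 = Kernel(1, cluster)
kernel1.allocate("gid-1")                       # file lives on computer 1
kernel0.request("write", "gid-1", b"hello")     # forwarded from computer 0
print(kernel0.request("read", "gid-1"))
```

In this sketch the user program on computer 0 never contacts a subsystem during the read or write; its local kernel resolves the location and forwards the request directly to the kernel that owns the storage device.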