JP3777872B2

JP3777872B2 - Query parallel processing system and machine-readable recording medium recording program

Info

Publication number: JP3777872B2
Application number: JP13674299A
Authority: JP
Inventors: 雄一相場
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1999-05-18
Filing date: 1999-05-18
Publication date: 2006-05-24
Anticipated expiration: 2019-05-18
Also published as: JP2000330959A

Description

【０００１】
【発明の属する技術分野】
本発明は、データベースに対するデータ検索，グループ化，集計演算などを記述した問合せ文を、複数の計算機ノードを用いて並列処理する技術に関し、特に、複数の計算機ノードの処理負荷を均衡化させることによってシステム全体の性能を向上させる問合せ並列処理システムに関する。
【０００２】
【従来の技術】
データベースに対する問合せを並列に処理する方式は、過去に多くの研究者によって提案されてきた("Query Evaluation Techiniques for Large Databases", Goetz Graefe, ACM Computing Surveys, Vol. 25, No.2 June 1993 など) 。巨大なデータベースに対する問合せの処理は、二次記憶装置からのデータ読み込みがネックになる。このため、複数の二次記憶装置にデータベースを分割して格納し、並列に読み込んで処理する方式が基本的となっている。
【０００３】
１つのデータベースに対し条件を指定して検索を行い該当レコードを取り出す処理を選択処理と呼ぶこととする。例えば、銀行の口座データベースに対し口座番号を指定して残高を照会したり、指紋データベースに対し指紋画像を与えて照合する処理が選択処理となる。データベースの選択処理は、データベースを複数の二次記憶装置に分割して格納することによって容易に並列化できる。この場合、各二次記憶装置に分割格納された各部分データベースを各計算機ノードから並列に読み込んで検索処理を行うことにより並列化する。選択処理では、インデックスなどのアクセスパスを用いることがあるが、部分データベース毎に独立にインデックスを作ることにより、インデックスを用いた選択処理を容易に並列化できる。これらの並列化の手法は商用のリレーショナルデータベースシステムにも取り入れられている。例えば、Oracleなどが挙げられる（「パーティショニング技術のVLDBへの適用」、電子情報通信学会信学技報AI97−40,DE97−73、1997−12) 。
【０００４】
データベース中のレコードを、ある属性（カラム）の値で分類することを分類処理と呼ぶこととする。例えば、銀行の口座データベース中のレコードを支店別に分類する処理である。また、レコード数を数えたり、あるカラムの平均値や最大値を算出するなどの処理を集計処理と呼び、レコード群を、あるカラムの値でソートする処理をソート処理と呼ぶ。
【０００５】
これらの処理を並列化する技術が、特開平10−97544 号公報に記載されている。この技術では、データベースを複数の入出力サーバ配下の複数の二次記憶装置に格納しておき、複数の入出力サーバから並列にデータベース中のレコードを読み出す。読み出されたレコードは入出力サーバによって分類され、複数の集計処理サーバに分配される。集計処理サーバでは各入出力サーバからのレコードに集計処理を施して部分集計結果とし、最後に、部分集計結果をマージする。ここまでの処理は複数の入出力サーバ、複数の集計処理サーバによって並列に行われる。
【０００６】
このように、データベースに対する問合せの並列処理は、データベースを複数の部分データベースに分割して複数の二次記憶装置に分散格納し、複数の計算機ノードが各二次記憶装置から並列にデータを読み込んで処理するということを基本にしている。しかし、データベースを二次記憶装置に分散格納して各部分データベースに対し並列に検索処理を行う場合、部分データベースを各二次記憶装置に均等に格納したとしても、それらを読み込んで処理を行う各計算機ノードの負荷は均等にならず、それによって、システム全体の処理効率を悪化させてしまう場合がある。
【０００７】
このように負荷の不均衡がある場合、共有メモリ型の並列計算機ならば、CPU 数よりも多くのプロセスを生成しておけばシステム全体の処理効率悪化を防げる。なぜなら、プロセス毎の負荷が不均衡になってもOSのスケジューリングによってプロセスが切り替わり、CPU が遊ぶことなく問合せの処理を進められるからである。しかし、複数の計算機ノードを用いた並列計算機では、プロセスを計算機ノード間で移動できない。システム全体の処理は、最終的には負荷の最も重い計算機ノードの処理終了を待ち合わせるため、負荷の軽かった計算機ノードが遊んでしまい、システム全体の処理効率が悪化する。
【０００８】
一方、特開平10−301822号公報には、次のような技術が記載されている。データベースに対する問合せを解析してアクセスパスを生成し、更に、アクセスパスを複数の部分アクセスパスに分解する。その後、同時に実行可能な部分アクセスパス群毎に、その群に属する部分アクセスパスの内のいくつかを処理コストに従って１つのグループにまとめる。ここで、処理コストとは、表の行数，列の分布状況，インデックスのヒット率等の統計情報から算出したものであり、部分アクセスパスの実行に要する負荷の度合を数値化したものである。また、部分アクセスパス群に属するいくつかの部分アクセスパスを処理コストに従って１つのグループにまとめる際には、そのグループの属する部分アクセスパスの処理コストの合計値が、上記部分アクセスパス群の最大処理コストを越えないようにする。
【０００９】
【発明が解決しようとする課題】
上記した特開平10−301822号公報に記載されている技術によれば、同時に実行可能な部分アクセスパス群の属する各グループの処理コストをほぼ等しくすることができる。従って、複数の計算機ノードに、グループ単位で処理を割り当てるようにすることにより、各計算機ノードの負荷をある程度は均衡化することができる。しかしながら、特開平10−301822号公報に記載されている技術は、統計情報という静的な情報を用いているため、各計算機ノードの動的な負荷の変動には対処できず、その結果、システム全体の処理効率を十分に高いものにすることができないという問題があった。
【００１０】
そこで、本発明の目的は、複数の計算機ノードから構成されるクラスタシステム上で問合せを並列処理する際、各計算機ノードにおける処理負荷を動的に均衡化させることにより、システム全体の処理効率を高めることにある。
【００１１】
【課題を解決するための手段】
本発明の問合せ並列処理システムは、問合せ処理用の計算機ノードで、問合せを解析して部分問合せに分解し、複数の部分問合せ処理用の計算機ノードに部分問合せを配布して並列処理を行わせ、返却された部分問合せの結果をマージして問合せの結果とする。この構成は、問合せを並列処理するための基本構成である。
【００１２】
本発明の問合せ並列処理システムでは、部分問合せを部分問合せ処理用の計算機ノードに配布する際、各計算機ノードへの配布数を随時決めて配布する。この時、各計算機ノードの負荷を等しく保つために、例えば、各計算機ノードで処理中になっている部分問合せの数が常に等しくなるように配布数を決める。配布数を越えて余った未配布の部分問合せは、後に、何れかの計算機ノードが結果を返してきた時などに、その計算機ノードに処理させることになる。つまり、結果を返してきた計算機ノードに対して、残っている部分問合せを処理させ、計算機ノードが遊ばないようにする。以上が本発明における負荷均衡化の基本的な考え方である。
【００１３】
また、本発明の問合せ並列処理システムでは、問合せ処理用の計算機ノードを複数化し、その計算機ノードがボトルネックになることを防ぐ。この場合、複数の計算機ノードで別々の問合せを受け、独立して各問合せの部分問合せ配布処理を行う。この時の配布ストラテジは、単独の計算機ノードで問合せを受け付ける場合と全く同様である。
【００１４】
また、部分問合せ処理用の計算機ノードの障害時には、その計算機ノードを検出し、その計算機ノードに配布すべき部分問合せを別の部分問合せ処理用の計算機ノードに配布し処理させる。これにより、部分問合せ処理用の計算機ノードに障害が発生しても、全ての部分問合せを処理し、問合せ結果を完全なものにすることができる。
【００１５】
無共有型クラスタシステムでは、各部分問合せ処理用の計算機ノードは自分に接続された二次記憶装置にしかアクセスできない。従って、その二次記憶装置に格納された部分データベースしかアクセスできない。そこで、各計算機ノードと、それがアクセスできる部分データベースの対応関係を管理しておく。これにより、各計算機ノードに対し、そのノードが処理可能な部分問合せを配布することができる。
【００１６】
無共有型クラスタシステムでは、部分データベースの原本と複製を別々の計算機ノード配下の二次記憶装置に格納する。これによって、同じ部分データベースに対して複数の計算機ノードからアクセス可能となる。つまり、同じ部分問合せを複数の計算機ノードで処理可能となる。このことを利用すると、負荷の軽い側の計算機ノードに部分問合せを処理させることが可能となり、これによって計算機ノードの負荷均衡化を行う。また、これと同時に、計算機ノードの障害時にも、そのノードに配布するべき問合せを別の計算機ノードに処理させることが可能となる。
【００１７】
【発明の実施の形態】
次に本発明の実施の形態について図面を参照して詳細に説明する。
【００１８】
本発明の第１の実施の形態の例について説明する。第１の実施の形態は、複数の計算機ノードが二次記憶装置を共有する共有型クラスタシステムの上で、二次記憶装置に記録されたデータベースに対する問合せを並列に処理する問合せ並列処理システムに関するもので、その構成例を図１に示す。
【００１９】
図１を参照すると、問合せ処理用の複数の計算機ノード１−１〜１−ｍ及び部分問合せ処理用の複数の計算機ノード２−１〜２−ｎが通信網４で接続され、計算機ノード間でデータの受け渡しができる。通信網４としては、Ether やATM のHUB やスイッチ、その他、専用結合ネットワークを利用することができる。
【００２０】
問合せ処理用の計算機ノード１−１〜１−ｍは、外部通信網５に接続されており、外部通信網５を介して問合せを受け取ることができる。外部通信網５としては、LAN やWAN などを利用する。問合せとしては、データの集合であるデータベースから、ある条件を与えて一部のデータを引き出すような要求や、データの値の平均値や最大値などの集計を行う要求、あるいは、あるデータの値に着目してグループ化したりソートを行う要求がある。これら要求は、問合せ言語SQL などを使って記述されて渡されたり、あるいは専用のキーワードのみを組み合わせて渡されたりする。
【００２１】
また、部分問合せ処理用の計算機ノード２−１〜２−ｎは、二次記憶装置３−１〜３−ｊを共有する記憶装置共有網６に接続されており、これらのいずれの計算機ノード２−１〜２−ｎも二次記憶装置３−１〜３−ｊにアクセスすることができる。記憶装置共有網６としては、SCSIやFibreChanel のHUB やスイッチ、その他、周辺装置用の専用ネットワークなどが用いられる。問合せの対象となるデータベースは、複数の部分データベース31に分割され、複数の二次記憶装置３−１〜３−ｊに分散して格納される。データベースの分割方法は、ラウンドロビン式分散配置、ハッシュ分割、キー値分割等の方法がとられる。
【００２２】
問合せ処理用の計算機ノード１−１〜１−ｍは、それぞれ問合せ解析手段11と、障害検出手段12と、配布数決定手段13と、問合せ統合処理手段14と、記録媒体Ｋ１とを備えている。
【００２３】
問合せ解析手段11は、外部通信網５を介して送られてくる自計算機ノード宛の問合せを解析し、この問合せを処理するための処理手順であって、問合せ対象となるデータベースを構成する各部分データベースに対する部分問合せ（部分データベースに対する検索要求）を含む処理手順を導出する機能を有する。
【００２４】
障害検出手段12は、部分問合せ処理用の計算機ノード２−１〜２−ｎに障害が発生したことを検出する機能を有する。
【００２５】
配布数決定手段13は、計算機ノード２−１〜２−ｎに配布する部分問合せの数を各計算機ノード２−１〜２−ｎの負荷に応じて動的に決定する機能を有する。
【００２６】
問合せ統合処理手段14は、問合せ解析手段11で導出した処理手順に従って問合せの処理を進め、各計算機ノード２−１〜２−ｎに対し部分問合せを配布する際には、配布数決定手段13を用いて決定した配布数だけの未配布の部分問合せを配布し、部分問合せの結果が返却された際には、関連する結果をマージしながら問合せの結果を導出する機能を有する。
【００２７】
計算機ノード１−１〜１−ｍが備えている記録媒体Ｋ１は、ディスク，半導体メモリ，その他の記録媒体であり、計算機ノード１−１〜１−ｍを問合せ処理用の計算機ノードとして機能させるためのプログラムが記録されている。このプログラムは、計算機ノード１−１〜１−ｍによって読み取られ、計算機ノード１−１〜１−ｍの動作を制御することで、計算機ノード１−１〜１−ｍ上に、問合せ解析手段11，障害検出手段12，配布数決定手段13, 問合せ統合処理手段14を実現する。
【００２８】
部分問合せ処理用の計算機ノード２−１〜２−ｎは、それぞれ部分問合せ処理手段21と、記録媒体Ｋ２とを備えている。
【００２９】
部分問合せ処理手段21は、問合せ処理用の計算機ノード１−１〜１−ｍから配布された部分問合せを受け、その部分問合せの処理を行い、配布元の計算機ノードに結果を返す機能を有する。
【００３０】
計算機ノード２−１〜２−ｎが備えている記録媒体Ｋ２は、ディスク，半導体メモリ，その他の記録媒体であり、計算機ノード２−１〜２−ｎを部分問合せ処理用の計算機ノードとして機能させるためのプログラムが記録されている。このプログラムは、計算機ノード２−１〜２−ｎによって読み取られ、計算機ノード２−１〜２−ｎの動作を制御することで、計算機ノード２−１〜２−ｎ上に部分問合せ処理手段21を実現する。
【００３１】
次に、本実施の形態の動作について説明する。
【００３２】
外部通信網５に接続される計算機ノード１−ｋ（１≦ｋ≦ｍ）で動作する問合せ解析手段11は、外部通信網５を介して投入された自計算機ノード宛の問合せを受け付けると、図２の流れ図に示すように、その意味を解析し、処理対象となるデータベースに対する処理ステップを含む、問合せの結果を求めるための処理手順を導出する（Ａ１）。例えば、図３（ａ）に示すような問合せに対し同図（ｂ）に示すような複数のステップから構成されるツリー状の処理手順が求められる。この例の場合、処理対象となるデータベースが２個であるので、データベースに対する処理ステップが２個含まれる。同図（ｂ）はあくまで一例であり、Ａ１で導出される処理手順は、問合せに応じたものになる。例えば、データベース１，２から取り出したデータを結合し、グループ化する場合には、同図（ｂ）に示したソート処理の代わりにグループ化処理を行う処理手順が導出される。
【００３３】
次に、問合せ解析手段11は、各データベースに対する処理ステップを、そのデータベースを構成する各部分データベースに対する処理ステップ（部分問合せ）に分割すると共に、同一データベースに対する部分問合せの集合毎に、部分問合せの結果をマージする処理ステップを追加する（Ａ２）。ここで、１つのデータベースに対する部分問合せの集合を部分問合せのセットと呼ぶこととする。処理対象となるデータベースが複数あった場合、各データベースについて部分問合せのセットが１つずつ作成される。例えば、図３（ｂ）に示すような処理手順から同図（ｃ）のような処理手順が作成される。この例では、１つの問合せで２つのデータベースを対象とするため、部分問合せの組が２セット作られている。また、１つの部分問合せのセットに含まれる部分問合せの数は、部分問合せがデータベースを構成する各データベース毎に作成されることから、対応するデータベースを構成する部分データベースの数と同じになる。
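[0034]
When the query analysis means 11 has derived the processing procedure for a query, the query integration processing means 14 obtains the processing procedure from the query analysis means 11 and saves it, as shown in the flowchart of FIG. 4, and determines whether a partial query set exists in the processing procedure (B1, B2). At the time the processing procedure is obtained from the query analysis means 11, a partial query set exists and the determination at B2 is yes, so the query integration processing means 14 performs the processing of B3.
[0035]
In B3, the query integration processing means 14 analyzes the dependencies between the partial query sets included in the processing procedure and identifies the distributable partial queries. For example, when two partial query sets exist and they are independent, with no processing dependency between them, all partial queries in both sets are identified as distributable. When, on the other hand, two partial query sets S1 and S2 exist and processing set S2 requires the result of set S1, only the partial queries in set S1 are identified as distributable.
[0036]
The query integration processing means 14 then selects, from among the computer nodes 2-1 to 2-n for partial query processing, the computer node assigned node number "0" as the distribution target for partial queries (B4). In this embodiment, node numbers "0" to "n-1" are assigned to the computer nodes 2-1 to 2-n, respectively.
[0037]
Next, the query integration processing means 14 uses the distribution number determination means 13 to determine the number of partial queries to distribute to the computer node 2-1 with node number "0" (B6). The number is determined so that the number of partial queries being processed is equal across the computer nodes 2-1 to 2-n.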
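[0038]
The processing of B6 is described in detail. When determining the number of partial queries to distribute to the computer node 2-1, the query integration processing means 14 passes node number "0" of the target computer node 2-1 to the distribution number determination means 13.
[0039]
When node number "0" is passed from the query integration processing means 14 (FIG. 5, C1), the distribution number determination means 13 refers to the progress management table and obtains the number of partial queries whose results have not yet been returned from the computer node 2-1 (the unreturned-result count) (C2).
[0040]
The progress management table manages, for each of the computer nodes 2-1 to 2-n, the unreturned-result count and the distributed partial queries whose results have not been returned, and has, for example, the structure shown in FIG. 6. In the progress management table of FIG. 6, the unreturned-result count of each computer node and the IDs of the partial queries whose results have not been returned from that node are registered in association with the node numbers "0" to "n-1" of the computer nodes 2-1 to 2-n. The unreturned-result count and the partial query IDs are updated by the query integration processing means 14.
[0041]
Having obtained the unreturned-result count of the computer node 2-1 in C2, the distribution number determination means 13 determines whether the condition "unreturned-result count < target distribution number" is satisfied (C3). The target distribution number is a value set in the computer node 1-k in order to keep the number of partial queries from the computer node 1-k that are being processed on each of the computer nodes 2-1 to 2-n at a predetermined number; to keep that number at two, for example, the target distribution number is "2". When the condition is satisfied in C3 (yes), the distribution number is calculated as "target distribution number - unreturned-result count" and reported to the query integration processing means 14 (C4). When the condition is not satisfied (C3 is no), a distribution number of "0" is reported to the query integration processing means 14 (C5). On being notified of the distribution number by the distribution number determination means 13, the query integration processing means 14 adopts that value as the number of partial queries to distribute to the computer node 2-1. This is the processing performed in B6.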
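The rule in steps C3 to C5 can be written as a single function. The following is a minimal sketch, not the patent's implementation; the function and parameter names are hypothetical.

```python
def distribution_count(unreturned: int, target: int) -> int:
    """Number of partial queries to send to one node (steps C3 to C5).

    unreturned -- partial queries already sent to the node with no result yet
    target     -- target distribution number set on the query processing node
    """
    if unreturned < target:          # C3: the node is below its target
        return target - unreturned   # C4: top the node up to the target
    return 0                         # C5: the node is at or above its target
```

With a target distribution number of 2, a node with one outstanding partial query would be sent one more, and a node with two outstanding would be sent none.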
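[0042]
When the distribution number for the computer node 2-1 determined in B6 is greater than "0" and an undistributed, distributable partial query exists (B7 and B8 are both yes), the query integration processing means 14 distributes one of the undistributed, distributable partial queries to the computer node 2-1 using a general-purpose communication protocol such as TCP or UDP/IP or a dedicated communication protocol, removes the distributed partial query from the partial queries identified as distributable (B10), decrements the distribution number by 1, and updates the management information for the computer node 2-1 in the progress management table (B11). For example, when the progress management table has the structure shown in FIG. 6, the unreturned-result count registered for node number "0" is incremented by 1, and the ID of the partial query just distributed to the computer node 2-1 is registered in association with node number "0". The query integration processing means 14 repeats B7 to B11 until it has distributed the number of partial queries determined in B6 to the computer node 2-1 or no undistributed partial query remains (until B7 or B8 becomes no).
[0043]
When B7 or B8 becomes no, the query integration processing means 14 increments the node number by 1 and makes the computer node 2-2 the distribution target (B9). The query integration processing means 14 then repeats B5 to B11 until the condition "node number < number of nodes" no longer holds (until B5 becomes no). When the determination at B5 becomes no, the query integration processing means 14 enters a waiting state. Through this processing, the number of partial queries indicated by the target distribution number is distributed from the computer node 1-k to each of the computer nodes 2-1 to 2-n. However, when the number of distributable partial queries is smaller than "target distribution number × number of computer nodes 2-1 to 2-n", only some of the computer nodes 2-1 to 2-n receive the number of partial queries indicated by the target distribution number.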
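The initial distribution loop (B4 to B11) might be sketched as follows. This is an illustration under stated assumptions: the progress management table is modeled as a dictionary from node number to the list of unreturned partial query IDs (so the unreturned-result count is the list length), sending over the network is replaced by recording the pair, and all names are hypothetical.

```python
from collections import deque


def initial_distribution(pending, progress, target):
    """Hand out partial queries node by node (steps B4 to B11).

    pending  -- deque of distributable, undistributed partial query IDs
    progress -- dict: node number -> list of IDs whose results are unreturned
    target   -- target distribution number
    Returns the (node number, partial query ID) pairs in the order sent.
    """
    sent = []
    for node in sorted(progress):                       # B4, B9, B5: visit nodes 0..n-1
        count = max(target - len(progress[node]), 0)    # B6: same rule as C3 to C5
        while count > 0 and pending:                    # B7, B8
            query_id = pending.popleft()                # B10: no longer distributable
            progress[node].append(query_id)             # B11: update the table
            count -= 1                                  # B11: one fewer to send
            sent.append((node, query_id))
    return sent
```

With five partial queries, three nodes, and a target of 2, the first two nodes would receive two partial queries each and the third only one, which matches the case noted at the end of paragraph [0043].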
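[0044]
Meanwhile, when a partial query is sent from the computer node 1-k, the partial query processing means 21 in each of the computer nodes 2-1 to 2-n retrieves, from the partial database 31 indicated by the partial query, the data satisfying the condition specified by the partial query, and returns the retrieved data to the requesting computer node 1-k. The ID of the processed partial query and the node number of its own computer node are returned together with the data.
[0045]
When the result of a partial query, the partial query ID, and the node number are returned from a computer node 2-i (1 ≤ i ≤ n) (FIG. 4, B13), the query integration processing means 14 in the computer node 1-k checks whether the result just returned completes any partial query set, that is, whether there is a set for which all results have now been obtained (B14).
[0046]
When it determines that no such partial query set exists (B14 is no), it obtains the node number of the computer node 2-i that returned the result and determines, using the distribution number determination means 13, the number of partial queries to distribute to that computer node 2-i (B16, B17). The processing of B17 is the same as that of B6.
[0047]
Having determined the number of partial queries to distribute to the computer node 2-i, the query integration processing means 14 repeats two steps until it has distributed the number determined in B17 to the computer node 2-i or no undistributed, distributable partial query remains (until B18 or B19 becomes no): distributing a partial query to the computer node 2-i and removing it from the partial queries identified as distributable (B20), and decrementing the distribution number by 1 and updating the progress management table (B21). When the progress management table has the structure shown in FIG. 6, for example, the unreturned-result count registered for node number "i-1" of the computer node 2-i is decremented by 1, and the ID returned from the computer node 2-i is deleted from the partial query IDs registered for node number "i-1". When the determination at B18 or B19 becomes no, the query integration processing means 14 enters a waiting state.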
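The refill that follows a returned result (B16 to B21) could look like the sketch below. It makes the same modeling assumptions as a dictionary-based progress management table (node number mapped to the list of unreturned partial query IDs), clears the returned ID before computing the new distribution number (an assumed ordering), and leaves out the set-completion check of B14. All names are hypothetical.

```python
from collections import deque


def on_result(node, query_id, pending, progress, target):
    """Handle one returned result and refill the node that returned it.

    Returns the list of partial query IDs newly sent to the node.
    """
    progress[node].remove(query_id)                 # the result is no longer outstanding
    count = max(target - len(progress[node]), 0)    # B17: same rule as B6
    sent = []
    while count > 0 and pending:                    # B18, B19
        next_id = pending.popleft()                 # B20: no longer distributable
        progress[node].append(next_id)              # B21: update the table
        count -= 1                                  # B21: one fewer to send
        sent.append(next_id)
    return sent
```

A node that answers quickly calls back more often and so drains more of the pending queue, which is the load-balancing effect the description relies on.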
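[0048]
When it is determined in B14 that the result just returned completes a partial query set (yes), the partial query set for which all results have been obtained is removed from the saved processing procedure (B15), and the processing of B2 is then performed.
[0049]
When it is determined in B2 that a partial query set exists in the saved processing procedure (yes), the processing of B3 is performed. When it is determined that no partial query set exists (no), the processing steps on the root side of the partial queries are performed in order toward the root, and the result of the query is returned to the requester via the external communication network 5 (B12). For example, when the processing procedure obtained in B1 is the one shown in FIG. 3(c), then, as shown in FIG. 7, once all results have been obtained for the two partial query sets, the results of each set are merged, the merged results are joined, and the joined result is sorted, in that order. In this embodiment the merge is performed after all partial query results have been obtained, but the results can also be merged one at a time as each is returned. For example, when an average or a total is required as the result of the query, each returned partial result is added into the running result for its set. The steps of the processing procedure are then processed in a pipelined manner.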
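The incremental merge mentioned for averages and totals can be illustrated as follows. The sketch assumes that each partial query returns a (sum, count) pair for its partial database, which is one common way to make an average mergeable; the text does not specify the representation, and the class name is hypothetical.

```python
class RunningAverage:
    """Accumulates partial results for one partial query set as they arrive."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add_partial(self, partial_sum, partial_count):
        # Fold in the result of one partial query without waiting for the rest.
        self.total += partial_sum
        self.count += partial_count

    def value(self):
        # Average over every record seen so far; None if nothing has arrived.
        return self.total / self.count if self.count else None
```

Carrying the sum and the count separately avoids averaging per-partition averages, which would weight small and large partial databases equally.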
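[0050]
As described above, when the query integration processing means 14 obtains the processing procedure from the query analysis means 11 (FIG. 4, B1), it distributes the predetermined target distribution number of partial queries to each of the computer nodes 2-1 to 2-n (B5 to B11) and leaves the remaining partial queries undistributed. The undistributed partial queries are later given to the computer nodes that return partial query results early (B16 to B21). In other words, undistributed partial queries go to the computer nodes that finish early and whose load has therefore become light.
[0051]
As a result, the number of partial queries being processed on each of the computer nodes 2-1 to 2-n is kept at or below a fixed number. Accordingly, in a situation where many partial queries are generated, the number of partial queries being processed on each computer node is always kept at that fixed number. The load on the computer nodes is thereby balanced.
[0052]
Next, examples of the partial query processing means 21, which operates independently on each of the computer nodes 2-1 to 2-n, are shown in FIG. 8. The example in FIG. 8(a) has a queue that holds the distributed partial queries; partial queries are taken from it one at a time by the partial query processing unit and processed in order. The result of each processed partial query is returned from the partial query processing unit to the distributing computer node.
[0053]
FIG. 8(b), in contrast, shows an example that processes a plurality of partial queries concurrently. Using multiple threads or multiple processes, a plurality of partial query processing units operate simultaneously, and each distributed partial query is assigned to one of them. The partial query processing units run concurrently under the scheduling of the computer node's OS. For example, when a partial query processing unit consists of a process that reads a partial database from the secondary storage devices 3-1 to 3-j in blocks of a fixed size and a process that searches and computes on the data in the blocks read, the input/output processing on the secondary storage devices 3-1 to 3-j and the CPU processing on the computer node overlap in time across the partial query processing units.
[0054]
The partial query processing means 21 may also combine the queue of FIG. 8(a) with the arrangement of FIG. 8(b).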
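[0055]
Next, the operation when a failure occurs in one of the computer nodes 2-1 to 2-n for partial query processing is described.
[0056]
To detect that a failure has occurred in the computer nodes 2-1 to 2-n for partial query processing, the failure detection means 12 in the query processing computer node 1-k monitors the state of each of the computer nodes 2-1 to 2-n, as shown in the flowchart of FIG. 9 (D1). The monitoring is performed, for example, by periodically exchanging small pieces of data with each of the computer nodes 2-1 to 2-n via the communication network 4. When the data exchange succeeds, it is determined that no failure has occurred; when the data exchange becomes impossible, it is determined that a failure has occurred. Alternatively, a dedicated hardware failure diagnosis mechanism may be provided in the computer nodes 2-1 to 2-n so that, when a failure occurs, the failure diagnosis mechanism of the affected node notifies the computer nodes 1-1 to 1-m of the failure via the communication network 4.
[0057]
When the failure detection means 12 in the computer node 1-k detects, for example, that a failure has occurred in the computer node 2-1, it first obtains node number "0" of the computer node 2-1 (D2). The failure detection means 12 then obtains from the progress management table the IDs of the partial queries that have been distributed to the computer node 2-1 with node number "0" and whose results have not been returned (D3), deletes the row for node number "0" of the failed computer node from the progress management table, and excludes the computer node 2-1 from the distribution targets (D4). For example, when the contents of the progress management table are as shown in FIG. 6, the failure detection means 12 obtains "ID1, ID2" as the partial query IDs (D3) and deletes the first row of the progress management table (D4). The failure detection means 12 then notifies the query integration processing means 14 of the partial query IDs obtained in D3 (D5). In the flowchart of FIG. 9, the failed computer node is excluded from the distribution targets by deleting its row from the progress management table, but a method that marks the row as excluded may be used instead.
[0058]
On being notified of the partial query IDs by the failure detection means 12, the query integration processing means 14 treats those partial queries as undistributed. Because partial queries that were distributed to the failed computer node and whose results have not been returned are treated as undistributed, they are distributed in B20 of FIG. 4 to one of the partial-query-processing computer nodes that has not failed. In the shared cluster system, in which the secondary storage devices 3-1 to 3-j are connected by the storage device sharing network 6, each of the computer nodes 2-1 to 2-n can access any partial database 31 via the storage device sharing network 6, so a partial query that was to be distributed to one computer node can also be processed by another. Therefore, all partial queries can be processed by distributing the partial queries held by the failed computer node to other, normally operating computer nodes.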
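Steps D2 to D5 and the requeueing described in paragraph [0058] can be sketched as below. The progress management table is again assumed to be a dictionary from node number to the list of unreturned partial query IDs, and the function name is hypothetical. Failure detection itself (the periodic data exchange of D1) is outside this sketch.

```python
from collections import deque


def handle_node_failure(failed_node, progress, pending):
    """Requeue the unreturned partial queries of a failed node.

    progress -- dict: node number -> list of unreturned partial query IDs
    pending  -- deque (or list) of undistributed partial query IDs
    Returns the IDs that were moved back to the pending queue.
    """
    orphaned = progress.pop(failed_node, [])   # D3 and D4: read the IDs, delete the row
    pending.extend(orphaned)                   # D5: treat them as undistributed again
    return orphaned
```

Because the failed node's row is gone from the table, later distribution rounds skip it, and the requeued IDs are picked up by whichever surviving node next returns a result.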
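[0059]
Next, an example of the second embodiment of the present invention is described. The second embodiment relates to a query parallel processing system that processes, in parallel, queries against a database recorded in secondary storage devices on a non-sharing (shared-nothing) cluster system, in which each of a plurality of computer nodes is connected to a secondary storage device that only that computer node can access. A configuration example is shown in FIG. 10.
[0060]
This embodiment differs from the first embodiment shown in FIG. 1 in that computer nodes 10-1 to 10-m are provided in place of the computer nodes 1-1 to 1-m, in that there is no storage device sharing network and secondary storage devices 30-1 to 30-n are directly connected to the computer nodes 2-1 to 2-n, respectively, and in that a recording medium K3 is provided in place of the recording medium K1.
[0061]
The database to be queried is divided into a plurality of partial databases 31, which are distributed and stored across the secondary storage devices 30-1 to 30-n. In addition, a replica of each partial database is stored on a secondary storage device different from the one holding its original.
[0062]
The computer nodes 10-1 to 10-m differ from the computer nodes 1-1 to 1-m shown in FIG. 1 in that they include query integration processing means 14a in place of the query integration processing means 14 and in that they include storage status management means 15.
[0063]
The storage status management means 15 manages the IDs of the partial databases accessible to the computer nodes 2-1 to 2-n for partial query processing and, when notified of the node number of one of the computer nodes 2-1 to 2-n by the query integration processing means 14a, returns a list of the IDs of the partial databases accessible to the computer node with that node number. The IDs of the partial databases 31 accessible to each of the computer nodes 2-1 to 2-n are managed using, for example, a storage status management table such as that shown in FIG. 11. In the storage status management table, the IDs of the partial databases accessible to each computer node are registered in association with the node numbers "0" to "n-1" of the computer nodes 2-1 to 2-n. The example of FIG. 11 shows that the computer node 2-1 with node number "0" can access the originals of the partial databases indicated by the IDs "A-0, A-4, A-8, B-1, B-5, …" and the replicas of the partial databases indicated by the IDs "A-2, A-6, A-10, B-0, B-4, …". These replicas are copies of the originals of the partial databases accessible to the computer node 2-2 with node number "1". Each partial database ID combines the ID of a database, such as A or B, with the ordinal position of the partial database within it.
[0064]
In addition to the functions of the query integration processing means 14 shown in FIG. 1, the query integration processing means 14a has a function of notifying the storage status management means 15 of the node number of a partial-query-processing computer node and a function of deciding which partial query to distribute on the basis of the list of partial database IDs returned by the storage status management means 15 in response.
[0065]
The recording medium K3 is a disk, a semiconductor memory, or another recording medium, and records a program for causing the computer nodes 10-1 to 10-m to function as computer nodes for query processing. The program is read by the computer nodes 10-1 to 10-m and controls their operation, thereby realizing the query analysis means 11, the failure detection means 12, the distribution number determination means 13, the query integration processing means 14a, and the storage status management means 15 on the computer nodes 10-1 to 10-m.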
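[0066]
FIG. 12 is a flowchart showing only the part of the processing performed by the query integration processing means 14a that differs from the processing performed by the query integration processing means 14 of FIG. 1. It differs from the processing shown in FIG. 4 in that steps E1 and E2 are performed between steps B7 and B8 and between steps B18 and B19, respectively, and in that steps E3 and E4 are performed in place of steps B10 and B20. FIG. 13 is a flowchart showing a processing example of the storage status management means 15. The operation of this embodiment is described below with reference to these figures.
[0067]
When the query analysis means 11 in the query processing computer node 10-k accepts a query addressed to its own computer node via the external communication network 5, it performs the processing shown in A1 and A2 of FIG. 2 and derives a processing procedure such as that shown in FIG. 3(c).
[0068]
When the query analysis means 11 has derived the processing procedure, the query integration processing means 14a performs the same processing as B1 to B7 of FIG. 4 and determines the number of partial queries to distribute to the computer node 2-1 with node number "0". Then, as shown in the flowchart of FIG. 12, the query integration processing means 14a uses the storage status management means 15 to decide on one partial query to distribute to the computer node 2-1 (E1). The processing of E1 is described in detail below.
[0069]
The query integration processing means 14a notifies the storage status management means 15 of node number "0" of the computer node 2-1 that is the distribution target.
[0070]
When notified of node number "0", the storage status management means 15 refers to the storage status management table as shown in FIG. 13, obtains the list of IDs of the partial databases accessible to the computer node 2-1 with node number "0" (F1), and returns the list to the query integration processing means 14a (F2).
[0071]
When the list is returned, the query integration processing means 14a selects one partial query that is identified as distributable and that targets a partial database whose ID is in the list, and makes it the partial query to distribute. If no such partial query exists, it determines that there is no distributable partial query. This is the detail of the processing performed in E1.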
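Step E1, together with the lookup in F1 and F2, might be sketched as follows. The storage status management table is assumed to be a dictionary from node number to the set of accessible partial database IDs (originals and replicas together), each distributable partial query is assumed to be a (query ID, partial database ID) pair, and the names are hypothetical.

```python
def choose_partial_query(node, distributable, storage_status):
    """Pick one distributable partial query that the node can serve (step E1).

    distributable  -- list of (query ID, partial database ID) pairs
    storage_status -- dict: node number -> set of accessible partial database IDs
    Returns the chosen pair, or None if the node can serve none of them.
    """
    accessible = storage_status.get(node, set())   # F1, F2: list for this node
    for query in distributable:
        _, partial_db = query
        if partial_db in accessible:               # targets a partial database in the list
            return query
    return None                                    # no distributable partial query
```

Since a replica makes the same partial database appear in the sets of two nodes, whichever of those nodes asks first gets the partial query, which is how replication doubles as load balancing in this embodiment.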
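[0072]
When a partial query to distribute has been decided in E1 and the determination at B8 is yes, the query integration processing means 14a distributes the partial query decided in E1 to the computer node 2-1 and removes it from the partial queries identified as distributable (E3). The same processing as B11 shown in FIG. 4 is then performed.
[0073]
When the result of a partial query is sent from a partial-query-processing computer node 2-i, the same processing as B13 to B18 of FIG. 4 is performed to determine the number of partial queries to distribute to the computer node 2-i. The query integration processing means 14a then uses the storage status management means 15 to decide on one partial query to distribute to the computer node 2-i (E2), distributes it to the computer node 2-i, and removes it from the partial queries identified as distributable (E4). The same processing as B21 shown in FIG. 4 is then performed.
[0074]
Next, the operation when the failure detection means 12 detects that a failure has occurred in a computer node 2-i is described.
[0075]
As shown in the flowchart of FIG. 9, when the failure detection means 12 detects that a failure has occurred in the computer node 2-i, it obtains from the progress management table the IDs of the partial queries that have been distributed to the computer node 2-i and whose results have not been returned, and notifies the query integration processing means 14a of them. The query integration processing means 14a treats the partial queries whose IDs were notified as undistributed. Because partial queries that were distributed to the failed computer node and whose results have not been returned are treated as undistributed, they are distributed in E4 of FIG. 12 to a partial-query-processing computer node that is operating normally and can access the partial database corresponding to the partial query.
[0076]
[Effects of the invention]
As described above, when processing a query in parallel on a cluster system composed of a plurality of computer nodes for partial query processing, the query parallel processing system of the present invention dynamically determines the number of partial queries to distribute according to the load on each computer node, so the load on the computer nodes can be balanced dynamically. As a result, the processing efficiency of the query parallel processing system as a whole is increased and its performance improved.
[0077]
Further, when a failure occurs in a computer node for partial query processing, the query parallel processing system of the present invention dynamically determines, for the computer nodes other than the failed one, the number of partial queries to distribute according to their load, so the result of the query can be made complete even when a partial-query-processing computer node has failed.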
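[Brief description of the drawings]
[FIG. 1] A block diagram showing a configuration example of the first embodiment of the present invention.
[FIG. 2] A flowchart showing a processing example of the query analysis means 11.
[FIG. 3] A diagram for explaining the operation of the query analysis means 11.
[FIG. 4] A flowchart showing a processing example of the query integration processing means 14.
[FIG. 5] A flowchart showing a processing example of the distribution number determination means 13.
[FIG. 6] A diagram showing a configuration example of the progress management table.
[FIG. 7] A flowchart showing a processing example of the query integration processing means 14.
[FIG. 8] A block diagram showing examples of the partial query processing means 21.
[FIG. 9] A flowchart showing a processing example of the failure detection means 12.
[FIG. 10] A block diagram showing a configuration example of the second embodiment of the present invention.
[FIG. 11] A diagram showing a configuration example of the storage status management table.
[FIG. 12] A flowchart showing part of a processing example of the query integration processing means 14a.
[FIG. 13] A flowchart showing a processing example of the storage status management means 15.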
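[Explanation of reference numerals]
1-1 to 1-m, 10-1 to 10-m: computer nodes for query processing
11: query analysis means
12: failure detection means
13: distribution number determination means
14, 14a: query integration processing means
15: storage status management means
2-1 to 2-n: computer nodes for partial query processing
21: partial query processing means
3-1 to 3-j, 30-1 to 30-n: secondary storage devices
31: partial database
4: communication network
5: external communication network
6: storage device sharing network
K1 to K3: recording media
[0001]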
BACKGROUND OF THE INVENTION
The present invention relates to a technique for processing, in parallel on a plurality of computer nodes, a query statement that describes operations such as data retrieval, grouping, and aggregation on a database, and in particular to a query parallel processing system that improves the performance of the entire system by balancing the processing load across the computer nodes.
[0002]
[Prior art]
Many researchers have proposed methods for processing database queries in parallel (for example, "Query Evaluation Techniques for Large Databases", Goetz Graefe, ACM Computing Surveys, Vol. 25, No. 2, June 1993). When a query is processed against a very large database, reading data from secondary storage is the bottleneck. The basic approach is therefore to divide the database, store it across a plurality of secondary storage devices, and read and process it in parallel.
[0003]
Searching a single database under specified conditions and retrieving the matching records is called selection processing. Examples are looking up a balance in a bank account database by account number, or matching a fingerprint image against a fingerprint database. Selection processing is easily parallelized by dividing the database and storing it across a plurality of secondary storage devices: the computer nodes read the partial databases stored on the secondary storage devices in parallel and search them. Selection processing may use an access path such as an index; creating an index independently for each partial database makes index-based selection just as easy to parallelize. These parallelization techniques have also been adopted in commercial relational database systems such as Oracle ("Application of partitioning technology to VLDB", IEICE Technical Report AI97-40, DE97-73, 1997-12).
[0004]
Classifying the records in a database by the value of a certain attribute (column) is called classification processing; an example is classifying the records in a bank account database by branch. Processing such as counting records or calculating the average or maximum value of a column is called aggregation processing, and sorting a group of records by the value of a column is called sort processing.
[0005]
A technique for parallelizing these processes is described in JP-A-10-97544. In this technique, a database is stored in a plurality of secondary storage devices under a plurality of input / output servers, and records in the database are read in parallel from the plurality of input / output servers. The read records are classified by the input / output server and distributed to a plurality of aggregation processing servers. The aggregation processing server performs aggregation processing on the records from each input / output server to obtain partial aggregation results, and finally merges the partial aggregation results. The processing so far is performed in parallel by a plurality of input / output servers and a plurality of aggregation processing servers.
[0006]
As described above, parallel query processing is based on dividing the database into a plurality of partial databases, storing them across a plurality of secondary storage devices, and having a plurality of computer nodes read from the secondary storage devices and process the data in parallel. However, even if the partial databases are stored evenly across the secondary storage devices, the load on the computer nodes that read and process them is not necessarily even, and this can degrade the processing efficiency of the entire system.
[0007]
When the load is unbalanced in this way, a shared-memory parallel computer can avoid a loss of overall processing efficiency by creating more processes than it has CPUs. Even if the load per process is uneven, OS scheduling switches between processes, so query processing proceeds without any CPU sitting idle. In a parallel computer built from a plurality of computer nodes, however, processes cannot be moved between nodes. Because the system as a whole must wait for the most heavily loaded computer node to finish, the lightly loaded computer nodes sit idle and the processing efficiency of the entire system deteriorates.
[0008]
JP-A-10-301822, on the other hand, describes the following technique. A query to the database is analyzed to generate an access path, and the access path is decomposed into a plurality of partial access paths. Then, for each group of partial access paths that can be executed simultaneously, several of the partial access paths in the group are combined into one group according to their processing cost. The processing cost is calculated from statistical information such as the number of rows in a table, the distribution of column values, and the index hit rate, and expresses numerically the load required to execute a partial access path. When partial access paths are combined into one group according to processing cost, the total processing cost of the partial access paths in the group is kept from exceeding the maximum processing cost in the partial access path group.
[0009]
[Problems to be solved by the invention]
According to the technique described in JP-A-10-301822, the processing costs of the groups formed from a set of simultaneously executable partial access paths can be made almost equal. Assigning processing to the computer nodes group by group therefore balances their load to some extent. However, because that technique relies on statistical information, which is static, it cannot cope with dynamic fluctuations in the load on each computer node, and as a result the processing efficiency of the entire system cannot be made sufficiently high.
[0010]
Accordingly, an object of the present invention is to increase the processing efficiency of the entire system by dynamically balancing the processing load on each computer node when a query is processed in parallel on a cluster system composed of a plurality of computer nodes.
[0011]
[Means for Solving the Problems]
In the query parallel processing system of the present invention, a computer node for query processing analyzes a query and decomposes it into partial queries, distributes the partial queries to a plurality of computer nodes for partial query processing so that they are processed in parallel, and merges the returned partial query results into the result of the query. This is the basic configuration for processing a query in parallel.
[0012]
In the query parallel processing system of the present invention, when partial queries are distributed to the computer nodes for partial query processing, the number to distribute to each computer node is decided each time. To keep the load on the computer nodes equal, the number is decided so that, for example, the number of partial queries being processed on each computer node is always the same. Partial queries left undistributed because they exceed that number are processed later, for example by whichever computer node returns a result. In other words, a computer node that has returned a result is given a remaining partial query so that it does not sit idle. This is the basic idea of load balancing in the present invention.
[0013]
In the query parallel processing system of the present invention, a plurality of computer nodes for query processing are also provided so that the query processing node does not become a bottleneck. In this case, the computer nodes receive separate queries, and each independently distributes the partial queries of the query it received. The distribution strategy is exactly the same as when a single computer node receives queries.
[0014]
When a computer node for partial query processing fails, the failure is detected, and the partial queries that were to be distributed to that computer node are distributed to and processed by other computer nodes for partial query processing. As a result, even if a computer node for partial query processing fails, all partial queries are processed and the query result is complete.
[0015]
In a non-sharing (shared-nothing) cluster system, each computer node for partial query processing can access only the secondary storage device connected to it, and therefore only the partial databases stored there. The correspondence between each computer node and the partial databases it can access is therefore managed, so that each computer node is sent only partial queries that it can process.
[0016]
In the non-sharing cluster system, the original and the replica of each partial database are stored on secondary storage devices under different computer nodes. The same partial database can then be accessed from more than one computer node, so the same partial query can be processed by more than one computer node. This makes it possible to have the more lightly loaded computer node process a partial query, which balances the load on the computer nodes. It also makes it possible, when a computer node fails, for another computer node to process the partial queries that were to be distributed to the failed node.
[0017]
DETAILED DESCRIPTION OF THE INVENTION
Next, embodiments of the present invention will be described in detail with reference to the drawings.
[0018]
An example of the first embodiment of the present invention will be described. The first embodiment relates to a query parallel processing system that processes, in parallel, queries against a database recorded in secondary storage devices on a shared cluster system in which a plurality of computer nodes share the secondary storage devices. A configuration example is shown in FIG. 1.
[0019]
Referring to FIG. 1, a plurality of computer nodes 1-1 to 1-m for query processing and a plurality of computer nodes 2-1 to 2-n for partial query processing are connected by a communication network 4, and data can be exchanged between the computer nodes. An Ethernet or ATM hub or switch, or another dedicated interconnection network, can be used as the communication network 4.
[0020]
The query processing computer nodes 1-1 to 1-m are connected to an external communication network 5, such as a LAN or WAN, and can receive queries through it. A query may be a request to extract part of the data from a database (a collection of data) under given conditions, a request to compute an aggregate such as the average or maximum of data values, or a request to group or sort by the values of certain data. These requests are written in a query language such as SQL, or are passed as a combination of dedicated keywords only.
[0021]
The computer nodes 2-1 to 2-n for partial query processing are connected to a storage device sharing network 6 through which the secondary storage devices 3-1 to 3-j are shared, so any of the computer nodes 2-1 to 2-n can access the secondary storage devices 3-1 to 3-j. A SCSI or Fibre Channel hub or switch, or another dedicated network for peripheral devices, is used as the storage device sharing network 6. The database to be queried is divided into a plurality of partial databases 31, which are distributed and stored across the secondary storage devices 3-1 to 3-j. The database is divided by a method such as round-robin distribution, hash partitioning, or key-value partitioning.
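The three division methods named above can be illustrated with small assignment functions. This is only a sketch: the text names the methods without specifying them, so the choice of CRC32 as the hash and the reading of key-value partitioning as range partitioning on sorted boundary keys are assumptions, as are all function names.

```python
import bisect
import zlib


def round_robin_partition(record_index, num_partitions):
    # The i-th record goes to partition i mod N.
    return record_index % num_partitions


def hash_partition(key, num_partitions):
    # CRC32 is used here so the assignment is stable across runs.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def key_range_partition(key, boundaries):
    # boundaries: sorted exclusive upper bounds of every partition but the last.
    # With [100, 200]: key < 100 -> 0, 100 <= key < 200 -> 1, key >= 200 -> 2.
    return bisect.bisect_right(boundaries, key)
```

Round-robin tends to give evenly sized partial databases, while hash and key-value partitioning let a query on the partitioning key be sent only to the partial databases that can contain matching records.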
[0022]
Each of the query processing computer nodes 1-1 to 1-m includes a query analysis means 11, a failure detection means 12, a distribution number determination means 13, a query integration processing means 14, and a recording medium K1.
[0023]
The query analysis means 11 analyzes a query addressed to its own computer node that arrives via the external communication network 5 and derives a processing procedure for that query. The procedure includes a partial query (a search request for a partial database) for each partial database constituting the database being queried.
[0024]
The failure detection means 12 has a function of detecting that a failure has occurred in the computer nodes 2-1 to 2-n for partial query processing.
[0025]
The distribution number determination means 13 has a function of dynamically determining the number of partial queries distributed to the computer nodes 2-1 to 2-n according to the load of each computer node 2-1 to 2-n.
[0026]
The query integration processing means 14 carries out the query according to the processing procedure derived by the query analysis means 11. When distributing partial queries to the computer nodes 2-1 to 2-n, it distributes as many undistributed partial queries as the number determined by the distribution number determination means 13; when partial query results are returned, it derives the result of the query while merging the related results.
[0027]
The recording medium K1 included in the computer nodes 1-1 to 1-m is a disk, a semiconductor memory, or another recording medium, and records a program for causing the computer nodes 1-1 to 1-m to function as computer nodes for query processing. The program is read by the computer nodes 1-1 to 1-m and controls their operation, thereby realizing the query analysis means 11, the failure detection means 12, the distribution number determination means 13, and the query integration processing means 14 on the computer nodes 1-1 to 1-m.
[0028]
The computer nodes 2-1 to 2-n for partial inquiry processing are each provided with partial inquiry processing means 21 and a recording medium K2.
[0029]
The partial query processing means 21 has a function of receiving a partial query distributed from the query processing computer nodes 1-1 to 1-m, processing the partial query, and returning the result to the distribution source computer node.
[0030]
The recording medium K2 included in the computer nodes 2-1 to 2-n is a disk, a semiconductor memory, or another recording medium, on which a program for causing the computer nodes 2-1 to 2-n to function as computer nodes for partial query processing is recorded. This program is read by the computer nodes 2-1 to 2-n and controls their operation, thereby realizing the partial query processing means 21 on the computer nodes 2-1 to 2-n.
[0031]
Next, the operation of the present embodiment will be described.
[0032]
When the query analysis means 11 operating in the computer node 1-k (1 ≦ k ≦ m) connected to the external communication network 5 accepts a query addressed to its own computer node and input via the external communication network 5, it analyzes the meaning of the query as shown in the flowchart of FIG. 2 and derives a processing procedure for obtaining the query result, including processing steps for the databases to be processed (A1). For example, for a query as shown in FIG. 3A, a tree-like processing procedure composed of a plurality of steps is derived. In this example, since there are two databases to be processed, two processing steps for databases are included. FIG. 5B is merely an example, and the processing procedure derived in A1 depends on the query. For example, when the data extracted from the databases 1 and 2 are to be combined and grouped, a processing procedure that performs a grouping process instead of the sort process is derived.
[0033]
Next, the query analysis means 11 divides the processing step for each database into processing steps (partial queries) for the individual partial databases constituting that database, and adds, for each set of partial queries for the same database, a processing step for merging the partial query results (A2). Here, a set of partial queries for one database is referred to as a partial query set. If there are multiple databases to be processed, one partial query set is created for each database. For example, a processing procedure as shown in FIG. 3C is created from the processing procedure derived in A1. In this example, since two databases are targeted by one query, two partial query sets are created. The number of partial queries included in one partial query set equals the number of partial databases constituting the corresponding database, because one partial query is created for each partial database constituting the database.
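Step A2 can be illustrated with a minimal sketch. It assumes a hypothetical catalog that maps each database name to the IDs of its partial databases; the names `PartialQuery`, `split_into_partial_query_sets`, the catalog contents, and the condition strings are illustrative assumptions and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialQuery:
    query_id: str       # hypothetical ID; here it reuses the partial database ID
    partial_db_id: str  # partial database this partial query searches
    condition: str      # condition copied from the database-level processing step


def split_into_partial_query_sets(db_steps, catalog):
    """Create one partial query set per database, one partial query per partial database."""
    sets = {}
    for db_name, condition in db_steps.items():
        sets[db_name] = [
            PartialQuery(query_id=pid, partial_db_id=pid, condition=condition)
            for pid in catalog[db_name]
        ]
    return sets


# Assumed example data: database A has three partial databases, B has two.
catalog = {"A": ["A-0", "A-1", "A-2"], "B": ["B-0", "B-1"]}
db_steps = {"A": "balance > 1000", "B": "branch = 'Tokyo'"}
partial_query_sets = split_into_partial_query_sets(db_steps, catalog)
```

With two target databases the sketch yields two partial query sets, and the size of each set equals the number of partial databases of the corresponding database.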
[0034]
When the query analysis means 11 derives a processing procedure for the query, the query integration processing means 14 obtains the processing procedure from the query analysis means 11 and stores it, as shown in the flowchart of FIG. 4, and determines whether a partial query set exists in the stored processing procedure (B1, B2). Immediately after the processing procedure is obtained from the query analysis means 11, a partial query set exists and the determination result of B2 is yes, so the query integration processing means 14 performs the process of B3.
[0035]
In B3, the query integration processing means 14 analyzes the dependency relationships between the partial query sets included in the processing procedure and recognizes the partial queries that can be distributed. For example, if there are two partial query sets and they are independent, with no processing dependency between them, all partial queries included in the two sets are recognized as distributable partial queries. On the other hand, when there are two partial query sets S1 and S2 and the processing result of the set S1 is necessary for the processing of the set S2, only the partial queries included in the set S1 are recognized as distributable partial queries.
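The recognition of distributable partial queries in B3 can be sketched as follows. The representation of dependencies as a mapping from a set name to the names of the sets it needs, and the function name `distributable_queries`, are assumptions made for illustration only.

```python
def distributable_queries(query_sets, depends_on, completed):
    """Return the partial queries of every set whose prerequisite sets are all completed."""
    ready = []
    for name, queries in query_sets.items():
        if name in completed:
            continue  # this set was already removed from the stored procedure
        prerequisites = depends_on.get(name, set())
        if prerequisites <= completed:  # every prerequisite set has all its results
            ready.extend(queries)
    return ready


query_sets = {"S1": ["S1-0", "S1-1"], "S2": ["S2-0", "S2-1"]}

# Independent sets: every partial query is distributable.
independent = distributable_queries(query_sets, {}, set())

# S2 needs the result of S1: only the partial queries of S1 are distributable.
dependent = distributable_queries(query_sets, {"S2": {"S1"}}, set())

# Once all results of S1 are in, the partial queries of S2 become distributable.
after_s1 = distributable_queries(query_sets, {"S2": {"S1"}}, {"S1"})
```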
[0036]
Thereafter, the query integration processing means 14 sets the computer node assigned with the node number “0” among the computer nodes 2-1 to 2-n for partial query processing as the computer node to which the partial query is distributed. (B4). In this embodiment, it is assumed that node numbers “0” to “n−1” are assigned to the computer nodes 2-1 to 2-n, respectively.
[0037]
Next, the query integration processing means 14 uses the distribution number determination means 13 to determine the number of partial queries distributed to the computer node 2-1 having the node number “0” (B6). At that time, the number of distributions is determined so that the number of partial queries being processed in each of the computer nodes 2-1 to 2-n is equal among the computer nodes.
[0038]
The process of B6 will be described in detail. When determining the number of partial queries distributed to the computer node 2-1, the query integration processing means 14 passes the node number “0” of the computer node 2-1 to be distributed to the distribution number determination means 13.
[0039]
When the node number “0” is passed from the query integration processing means 14 (FIG. 5, C1), the distribution number determination means 13 refers to the progress management table and acquires the number of partial queries whose results have not been returned from the computer node 2-1 (the number of unreturned results) (C2).
[0040]
The progress management table manages, for each of the computer nodes 2-1 to 2-n, the number of unreturned results and the distributed partial queries whose results have not been returned. It has, for example, the configuration shown in FIG. 6. In the progress management table shown in FIG. 6, the number of unreturned results of each of the computer nodes 2-1 to 2-n and the IDs of the partial queries whose results have not been returned from that computer node are registered in association with the node numbers “0” to “n−1” of the computer nodes 2-1 to 2-n for partial query processing. The number of unreturned results and the partial query IDs are updated by the query integration processing means 14.
[0041]
When the number of unreturned results of the computer node 2-1 is acquired in C2, the distribution number determination means 13 determines whether the condition “number of unreturned results < target distribution number” is satisfied (C3). Here, the target distribution number is a value set in the computer node 1-k so that the number of partial queries from the computer node 1-k being processed in each of the computer nodes 2-1 to 2-n becomes a predetermined number. For example, when that number of partial queries is to be two, the target distribution number is “2”. When the condition is satisfied in C3 (the determination result is yes), the distribution number is obtained by calculating “target distribution number − number of unreturned results”, and the obtained distribution number is notified to the query integration processing means 14 (C4). On the other hand, if the condition is not satisfied (C3 is no), a distribution number of “0” is notified to the query integration processing means 14 (C5). When the distribution number is notified from the distribution number determination means 13, the query integration processing means 14 adopts that value as the number of partial queries to distribute to the computer node 2-1. The above is the process performed in B6.
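The calculation in C3 to C5 reduces to a few lines. The following is a minimal sketch; the function and parameter names are illustrative assumptions, and only the comparison and the subtraction come from the description above.

```python
def determine_distribution_number(unreturned_count, target_distribution_number):
    """Return how many partial queries may be distributed to one node (C3 to C5)."""
    # C3: is the node below its target number of in-flight partial queries?
    if unreturned_count < target_distribution_number:
        # C4: top the node up to the target
        return target_distribution_number - unreturned_count
    # C5: the node already holds the target number (or more), distribute nothing
    return 0
```

With a target distribution number of 2, a node with no unreturned results receives two partial queries, a node with one unreturned result receives one, and a node with two or more receives none.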
[0042]
If the distribution number of partial queries for the computer node 2-1 determined in B6 is larger than “0” and there is an undistributed partial query that can be distributed (B7 and B8 are both yes), the query integration processing means 14 distributes one of the undistributed, distributable partial queries to the computer node 2-1 using a general-purpose communication protocol such as TCP or UDP/IP, or any of various dedicated communication protocols, and excludes the distributed partial query from the partial queries recognized as distributable (B10). It then decrements the distribution number by 1 and updates the management information for the computer node 2-1 in the progress management table (B11). For example, when the progress management table has the configuration shown in FIG. 6, the number of unreturned results of the computer node 2-1 registered in association with the node number “0” is incremented by 1, and the ID of the partial query just distributed to the computer node 2-1 is registered in association with the node number “0”. The query integration processing means 14 repeats the above-described processes B7 to B11 until it has distributed to the computer node 2-1 the number of partial queries determined in B6, or until no undistributed partial query remains (until B7 or B8 becomes no).
[0043]
When B7 or B8 becomes no, the query integration processing means 14 increments the node number by 1 and makes the computer node 2-2 the computer node to which partial queries are distributed (B9). Thereafter, the query integration processing means 14 repeats the above-described processes B5 to B11 while “node number < number of nodes” holds (until B5 becomes no). When the determination result of B5 is no, the query integration processing means 14 enters a waiting state. Through the above processing, the number of partial queries indicated by the target distribution number is distributed from the computer node 1-k to each of the computer nodes 2-1 to 2-n. However, when the number of partial queries that can be distributed is smaller than “target distribution number × number of computer nodes 2-1 to 2-n”, the number of partial queries indicated by the target distribution number is distributed only to some of the computer nodes 2-1 to 2-n.
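The initial distribution loop (B4 to B11) can be sketched as follows. The progress management table is modeled here as a dictionary keyed by node number, and `send` stands in for whatever communication protocol is used; both are assumptions made for illustration, as are all names in the code.

```python
from collections import deque


def initial_distribution(undistributed, progress_table, target, send):
    """Walk the nodes in node-number order and top each one up to the target."""
    for node_number in sorted(progress_table):          # B4, B5, B9
        row = progress_table[node_number]
        count = max(0, target - row["unreturned"])       # B6 (C3 to C5)
        while count > 0 and undistributed:                # B7, B8
            query_id = undistributed.popleft()            # B10: no longer distributable
            send(node_number, query_id)
            count -= 1                                    # B11: update the table
            row["unreturned"] += 1
            row["query_ids"].append(query_id)


# Assumed example: three nodes, eight partial queries, target distribution number 2.
progress_table = {n: {"unreturned": 0, "query_ids": []} for n in range(3)}
undistributed = deque(f"Q{i}" for i in range(8))
sent = []
initial_distribution(undistributed, progress_table, 2,
                     lambda node, query: sent.append((node, query)))
```

In this example each node ends up with two partial queries in flight and two partial queries remain undistributed, waiting for a node to return a result.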
[0044]
On the other hand, when a partial query is sent from the computer node 1-k, the partial query processing means 21 in each of the computer nodes 2-1 to 2-n extracts, from the partial database 31 indicated by the partial query, the data that satisfies the condition specified by the partial query, and returns the extracted data to the computer node 1-k that issued the partial query. At that time, the ID of the processed partial query and the node number of its own computer node are also returned.
[0045]
When the partial query result, the partial query ID, and the node number are returned from the computer node 2-i (1 ≦ i ≦ n) (FIG. 4, B13), the query integration processing means 14 in the computer node 1-k checks, based on the result returned this time, whether there is a partial query set for which all the results have been obtained (B14).
[0046]
If it is determined that such a partial query set does not exist (B14 is no), the node number of the computer node 2-i that has returned the result is acquired, and the distribution number determining means 13 uses the computer node 2- The number of partial queries distributed to i is determined (B16, B17). The process of B17 is the same process as B6.
[0047]
When the number of partial queries to distribute to the computer node 2-i is determined, the query integration processing means 14 repeats the following until it has distributed to the computer node 2-i the number of partial queries determined in B17, or until no undistributed partial query that can be distributed remains (until B18 or B19 becomes no): the process (B20) of distributing a partial query to the computer node 2-i and excluding it from the partial queries recognized as distributable, and the process (B21) of decrementing the distribution number by 1 and updating the progress management table. Here, for example, when the progress management table has the configuration shown in FIG. 6, a process of decrementing by 1 the number of unreturned results registered in association with the node number “i−1” of the computer node 2-i is performed, together with a process of deleting the ID returned from the computer node 2-i from the partial query IDs registered in association with the node number “i−1”. When the determination result of B18 or B19 is no, the query integration processing means 14 enters a waiting state.
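The handling of a returned result can be sketched as follows. The sketch covers only the table update and the refill of the returning node (B16 to B21); the check in B14 for a completed partial query set is omitted. The dictionary-based progress table, the `send` callback, and all names are assumptions for illustration.

```python
def on_result_returned(node_number, query_id, progress_table,
                       undistributed, target, send):
    """Record a returned result, then refill the returning node up to the target."""
    row = progress_table[node_number]
    row["unreturned"] -= 1                        # one fewer result outstanding
    row["query_ids"].remove(query_id)             # delete the returned partial query ID
    count = max(0, target - row["unreturned"])    # B17 (same calculation as B6)
    while count > 0 and undistributed:            # B18, B19
        next_id = undistributed.pop(0)            # B20: no longer distributable
        send(node_number, next_id)
        count -= 1                                # B21: update the table
        row["unreturned"] += 1
        row["query_ids"].append(next_id)


# Assumed state: both nodes hold two partial queries, two more are waiting.
progress_table = {
    0: {"unreturned": 2, "query_ids": ["Q0", "Q1"]},
    1: {"unreturned": 2, "query_ids": ["Q2", "Q3"]},
}
undistributed = ["Q4", "Q5"]
sent = []
on_result_returned(0, "Q0", progress_table, undistributed, 2,
                   lambda node, query: sent.append((node, query)))
```

Because only the node that has just returned a result is refilled, a node that finishes early receives more partial queries than a slow one.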
[0048]
If it is determined in B14 that, as a result of the result returned this time, there is a partial query set for which all the results have been obtained (the determination result is yes), that partial query set is removed from the stored processing procedure (B15), and then the process of B2 is performed.
[0049]
If it is determined in B2 that a partial query set exists in the stored processing procedure (the determination result is yes), the process of B3 is performed. If it is determined that no partial query set exists (the determination result is no), the processing steps on the root side of the partial queries are performed sequentially toward the root, and the query result is returned to the query source via the external communication network 5 (B12). For example, if the processing procedure obtained in B1 is as shown in FIG. 3(c), then, as shown in FIG. 7, once all the results have been obtained for the two partial query sets, the results are merged for each partial query set, and then the process of combining the merged results and the process of sorting the combined results are performed in sequence. In this embodiment, the merge is performed after all the partial query results have been obtained, but the results may instead be merged one at a time as each result is returned. For example, when an average value, a total value, or the like is to be obtained as the query result, the result is added to the running result for its set each time a partial query result is returned. In this way, the steps of the processing procedure are processed in a pipelined manner.
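The one-at-a-time merging mentioned for averages and totals can be sketched as a running aggregate. The sketch assumes that each partial query returns a (sum, count) pair for its partial database; the specification does not state the format of a partial result, so that format and the class name `RunningAverage` are assumptions.

```python
class RunningAverage:
    """Merge partial (sum, count) results one at a time as they are returned."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add_partial_result(self, partial_sum, partial_count):
        # Called each time a partial query result for this set comes back.
        self.total += partial_sum
        self.count += partial_count

    def value(self):
        # The average is only meaningful once at least one record was counted.
        return self.total / self.count if self.count else None


avg = RunningAverage()
# Assumed partial results from three partial databases, in arrival order.
for partial_sum, partial_count in [(300.0, 3), (50.0, 1), (150.0, 2)]:
    avg.add_partial_result(partial_sum, partial_count)
```

Carrying the count alongside the sum is what allows the average to be merged incrementally; averaging the per-partition averages would give a wrong answer when the partitions differ in size.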
[0050]
As described above, when the query integration processing means 14 obtains the processing procedure from the query analysis means 11 (FIG. 4, B1), it distributes the predetermined target distribution number of partial queries to each of the computer nodes 2-1 to 2-n (B5 to B11) and leaves the remaining partial queries undistributed. An undistributed partial query is later sent to a computer node that has returned a partial query result early (B16 to B21). In other words, undistributed partial queries are distributed to computer nodes that have finished processing early and whose load has decreased.
[0051]
As described above, the number of partial queries being processed in each of the computer nodes 2-1 to 2-n is kept below a certain number. Therefore, in a situation where a large number of partial queries are generated, the number of partial queries being processed in each computer node is always kept at a certain fixed number. This balances the load on each computer node.
[0052]
Next, FIG. 8 shows an embodiment of the partial inquiry processing means 21 that operates independently at each of the computer nodes 2-1 to 2-n. In the embodiment shown in FIG. 8A, there is a queue for storing distributed partial queries, which are taken out one by one to the partial query processing unit and processed in order. The processed partial query result is returned from the partial query processing unit to the distribution source computer node.
[0053]
In contrast, FIG. 8B shows an embodiment in which a plurality of partial queries are processed in a multiplexed manner. By using multiple threads or multiple processes, a plurality of partial query processing units are operated simultaneously, and each distributed partial query is assigned to one of the partial query processing units. The partial query processing units run concurrently according to the scheduling of the OS of the computer node. For example, suppose a partial query processing unit consists of a process of reading a partial database from the secondary storage devices 3-1 to 3-j in blocks of a certain size and a process of performing search and calculation on the read block data. In this case, the plurality of partial query processing units overlap in time the input/output processing of the secondary storage devices 3-1 to 3-j and the CPU processing in the computer node.
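The multiplexed arrangement of FIG. 8B can be approximated with a thread pool. This is a minimal sketch, assuming in-memory lists in place of partial databases on secondary storage; the function name, the data, and the choice of two worker threads are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def process_partial_query(partial_db, predicate):
    """One partial query processing unit: scan a partial database and filter it."""
    return [record for record in partial_db if predicate(record)]


# Assumed stand-ins for partial databases stored on secondary storage.
partial_dbs = {"A-0": [1, 5, 9], "A-1": [2, 6, 10], "A-2": [3, 7, 11]}

# Two worker threads play the role of two partial query processing units.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {
        pid: pool.submit(process_partial_query, db, lambda r: r > 5)
        for pid, db in partial_dbs.items()
    }
    results = {pid: future.result() for pid, future in futures.items()}
```

With real storage, a thread blocked on a block read leaves the CPU free for another thread's search and calculation, which is the overlap the paragraph above describes.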
[0054]
The partial query processing means 21 may also combine the queue of FIG. 8A with the multiplexed processing of FIG. 8B.
[0055]
Next, an operation when a failure occurs in the computer nodes 2-1 to 2-n for partial inquiry processing will be described.
[0056]
As shown in the flowchart of FIG. 9, the failure detection means 12 in the query processing computer node 1-k monitors the state of each of the computer nodes 2-1 to 2-n in order to detect that a failure has occurred in the partial query processing computer nodes 2-1 to 2-n (D1). This monitoring is performed, for example, by periodically exchanging small data with each of the computer nodes 2-1 to 2-n via the communication network 4. That is, while the data exchange succeeds, it is determined that no failure has occurred, and when the data exchange becomes impossible, it is determined that a failure has occurred. Alternatively, a dedicated failure diagnosis mechanism implemented in hardware may be provided in the computer nodes 2-1 to 2-n so that, when a failure occurs, the failure diagnosis mechanism of the computer node concerned notifies the computer nodes 1-1 to 1-m of the occurrence of the failure via the communication network 4.
[0057]
For example, when detecting that a failure has occurred in the computer node 2-1, the failure detection means 12 in the computer node 1-k first obtains the node number “0” of the computer node 2-1 (D2). Thereafter, the failure detection means 12 acquires from the progress management table the IDs of the partial queries that have already been distributed to the computer node 2-1 with the node number “0” and whose results have not been returned (D3). Further, it deletes the row for the node number “0” of the failed computer node from the progress management table, thereby excluding the computer node 2-1 from the distribution targets of partial queries (D4). For example, when the content of the progress management table is as shown in FIG. 6, the failure detection means 12 acquires “ID1, ID2” as the partial query IDs (D3) and deletes the first row of the progress management table (D4). Thereafter, the failure detection means 12 notifies the query integration processing means 14 of the partial query IDs acquired in D3 (D5). In the flowchart of FIG. 9, the failed computer node is excluded from the distribution targets of partial queries by deleting the row for its node number from the progress management table, but it is also possible to attach to that row a mark indicating that the node has failed.
[0058]
When the partial query IDs are notified from the failure detection means 12, the query integration processing means 14 recognizes those partial queries as undistributed partial queries. By recognizing in this way the partial queries that were distributed to the failed computer node and whose results have not been returned as undistributed partial queries, those partial queries are distributed, in B20 of FIG. 4, to one of the partial query processing computer nodes in which no failure has occurred. In the shared cluster system in which the secondary storage devices 3-1 to 3-j are connected by the storage device shared network 6, each of the computer nodes 2-1 to 2-n can access any partial database 31 via the storage device shared network 6, so a partial query that was distributed to one computer node can be processed by another computer node. Therefore, all partial queries can be processed by redistributing the partial queries that were distributed to the failed computer node to other, normal computer nodes.
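Steps D2 to D5 and the subsequent recognition of the affected partial queries as undistributed can be sketched as follows. The dictionary-based progress table and all names are assumptions for illustration; the row for node number 0 follows the “ID1, ID2” example above, while the row for node number 1 and the waiting partial query are made up.

```python
def handle_node_failure(failed_node, progress_table, undistributed):
    """Return a failed node's unreturned partial queries to the undistributed pool."""
    # D3 and D4: read the unreturned IDs and delete the node's row in one step,
    # which also removes the node from the set of distribution targets.
    row = progress_table.pop(failed_node)
    orphaned = list(row["query_ids"])
    # D5 and the follow-up: the notified partial queries count as undistributed again.
    undistributed.extend(orphaned)
    return orphaned


progress_table = {
    0: {"unreturned": 2, "query_ids": ["ID1", "ID2"]},
    1: {"unreturned": 1, "query_ids": ["ID3"]},   # illustrative row
}
undistributed = ["ID4"]                            # illustrative waiting partial query
orphaned = handle_node_failure(0, progress_table, undistributed)
```

After this call, the ordinary refill step on the next returned result will hand ID1 and ID2 to a surviving node, so no separate recovery path is needed.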
[0059]
Next, a second embodiment of the present invention will be described. FIG. 10 shows a configuration example of a query parallel processing system according to the second embodiment, which processes in parallel a query to a database recorded in secondary storage devices on a non-shared cluster system in which each of a plurality of computer nodes is connected to a secondary storage device that only that computer node can access.
[0060]
The present embodiment differs from the first embodiment shown in FIG. 1 in that computer nodes 10-1 to 10-m are provided instead of the computer nodes 1-1 to 1-m, that there is no storage device shared network and secondary storage devices 30-1 to 30-n are directly connected to the respective computer nodes 2-1 to 2-n, and that a recording medium K3 is provided instead of the recording medium K1.
[0061]
A database to be queried is divided into a plurality of partial databases 31 and distributed and stored in a plurality of secondary storage devices 30-1 to 30-n. Further, a copy of each partial database is distributed and stored in a secondary storage device different from the original.
[0062]
The computer nodes 10-1 to 10-m differ from the computer nodes 1-1 to 1-m shown in FIG. 1 in that they include query integration processing means 14a instead of the query integration processing means 14 and additionally include storage status management means 15.
[0063]
The storage status management means 15 manages the IDs of the partial databases accessible to each of the computer nodes 2-1 to 2-n for partial query processing. When a node number of one of the computer nodes 2-1 to 2-n is notified from the query integration processing means 14a, it returns a list of the IDs of the partial databases accessible to the computer node with the notified node number. The IDs of the partial databases 31 accessible to each of the computer nodes 2-1 to 2-n are managed using, for example, a storage status management table as shown in FIG. 11. In the storage status management table, the IDs of the partial databases accessible to each computer node are registered in association with the node numbers “0” to “n−1” of the computer nodes 2-1 to 2-n. In the example of FIG. 11, the computer node 2-1 with the node number “0” can access the original partial databases indicated by the IDs “A-0, A-4, A-8, B-1, B-5, ...” and the copies of the partial databases indicated by the IDs “A-2, A-6, A-10, B-0, B-4, ...”. These copies are copies of the original partial databases accessible to the computer node 2-2 with the node number “1”. The ID of a partial database is a combination of the ID of a database, such as A or B, and the sequence number of the partial database within that database.
[0064]
In addition to the functions provided in the query integration processing means 14 shown in FIG. 1, the query integration processing means 14a has a function of notifying the storage status management means 15 of the node number of a computer node for partial query processing and a function of determining the partial query to distribute based on the list of partial database IDs returned from the storage status management means 15 in response.
[0065]
The recording medium K3 is a disk, a semiconductor memory, or another recording medium, and stores a program for causing the computer nodes 10-1 to 10-m to function as computer nodes for query processing. This program is read by the computer nodes 10-1 to 10-m and controls their operation, thereby realizing the query analysis means 11, the failure detection means 12, the distribution number determination means 13, the query integration processing means 14a, and the storage status management means 15 on the computer nodes 10-1 to 10-m.
[0066]
FIG. 12 is a flowchart showing only the part of the processing performed by the query integration processing means 14a that differs from the processing performed by the query integration processing means 14 of FIG. 1. It differs from the process shown in FIG. 4 in that steps E1 and E2 are performed between steps B7 and B8 and between steps B18 and B19, respectively, and in that steps E3 and E4 are performed instead of steps B10 and B20. FIG. 13 is a flowchart showing a processing example of the storage status management means 15. Hereinafter, the operation of the present embodiment will be described with reference to the drawings.
[0067]
When the query analysis means 11 in the computer node 10-k for query processing receives a query addressed to its own computer node via the external communication network 5, it performs the processing shown in A1 and A2 of FIG. 2 and derives a processing procedure.
[0068]
When the query analysis means 11 derives the processing procedure, the query integration processing means 14a performs the same processing as B1 to B7 in FIG. 4 and determines the number of partial queries to distribute to the computer node 2-1 with the node number “0”. Thereafter, the query integration processing means 14a uses the storage status management means 15 to determine one partial query to distribute to the computer node 2-1, as shown in the flowchart of FIG. 12 (E1). The process of E1 is described in detail below.
[0069]
The query integration processing unit 14a notifies the storage status management unit 15 of the node number “0” of the computer node 2-1 that is the distribution target of the partial query.
[0070]
When the node number “0” is notified, the storage status management means 15 refers to the storage status management table as shown in FIG. 13 and stores the partial database accessible to the computer node 2-1 with the node number “0”. A list of IDs is acquired (F1), and the acquired list is returned to the query integration processing means 14a (F2).
[0071]
When the list is returned, the query integration processing means 14a selects, from among the partial queries recognized as distributable, one whose target partial database has its ID registered in the list, and makes it the partial query to distribute. If no such partial query exists, it determines that there is no partial query that can be distributed. The above is the details of the processing performed in E1.
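The selection in E1, together with the lookup in F1 and F2, can be sketched as follows. The storage status management table is modeled as a dictionary from node number to accessible partial database IDs. The row for node number 0 uses the database A entries from the FIG. 11 example above (originals and copies); the row for node number 1, the query records, and all names are assumptions for illustration.

```python
def select_partial_query(node_number, storage_table, distributable):
    """Pick one distributable partial query whose partial database the node can access."""
    accessible = set(storage_table[node_number])   # F1, F2: list returned for the node
    for query in distributable:                    # E1: first match wins in this sketch
        if query["partial_db_id"] in accessible:
            return query
    return None                                    # no distributable partial query


storage_table = {
    0: ["A-0", "A-4", "A-8", "A-2", "A-6", "A-10"],  # originals plus copies
    1: ["A-2", "A-6", "A-10"],                       # illustrative, abbreviated row
}
distributable = [
    {"id": "q1", "partial_db_id": "A-6"},
    {"id": "q2", "partial_db_id": "A-4"},
]

for_node_1 = select_partial_query(1, storage_table, distributable)
for_node_0 = select_partial_query(0, storage_table, distributable)
none_left = select_partial_query(1, storage_table, distributable[1:])
```

Because node number 0 holds a copy of A-6, the partial query q1 can go to either node; this is what lets a partial query be rerouted when the node holding the original is busy or has failed.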
[0072]
When the partial query to distribute is determined in E1 and the determination result in B8 is yes, the query integration processing means 14a distributes the partial query determined in E1 to the computer node 2-1 and excludes the distributed partial query from the partial queries recognized as distributable (E3). Thereafter, the same processing as B11 shown in FIG. 4 is performed.
[0073]
When a partial query result is sent from the computer node 2-i for partial query processing, the query integration processing means 14a performs the same processing as B13 to B18 in FIG. 4 and determines the number of partial queries to distribute to the computer node 2-i. Thereafter, the query integration processing means 14a uses the storage status management means 15 to determine one partial query to distribute to the computer node 2-i (E2), distributes that partial query to the computer node 2-i, and excludes the distributed partial query from the partial queries recognized as distributable (E4). Thereafter, processing similar to B21 shown in FIG. 4 is performed.
[0074]
Next, the operation when the failure detection means 12 detects that a failure has occurred in the computer node 2-i will be described.
[0075]
As shown in the flowchart of FIG. 9, when the failure detection means 12 detects that a failure has occurred in the computer node 2-i, it acquires from the progress management table the IDs of the partial queries that have already been distributed to the computer node 2-i and whose results have not been returned, and notifies the query integration processing means 14a of them. The query integration processing means 14a recognizes the partial queries whose IDs were notified as undistributed partial queries. By recognizing in this way the partial queries that were distributed to the failed computer node and whose results have not been returned as undistributed partial queries, those partial queries are distributed, in E4 of FIG. 12, to a partial query processing computer node that is operating normally and that can access the partial database corresponding to the partial query.
[0076]
[Effect of the invention]
As described above, when a query is executed in parallel on a cluster system composed of a plurality of computer nodes for partial query processing, the query parallel processing system according to the present invention dynamically determines the number of partial queries to distribute to each computer node according to the load on that node, so the load on the computer nodes can be dynamically balanced. As a result, the processing efficiency of the entire query parallel processing system is improved, and so is its performance.
[0077]
Further, when a failure occurs in a computer node for partial query processing, the query parallel processing system of the present invention dynamically determines, for each computer node other than the failed one, the number of partial queries to distribute according to its load, so the query result can still be completed even if a failure occurs in a computer node for partial query processing.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration example of a first exemplary embodiment of the present invention.
FIG. 2 is a flowchart showing a processing example of the query analysis means 11.
FIG. 3 is a diagram for explaining the operation of the query analysis means 11.
FIG. 4 is a flowchart showing a processing example of the query integration processing means 14.
FIG. 5 is a flowchart showing a processing example of the distribution number determination means 13.
FIG. 6 is a diagram illustrating a configuration example of a progress management table.
FIG. 7 is a flowchart showing a processing example of the query integration processing means 14.
FIG. 8 is a block diagram showing an embodiment of the partial query processing means 21.
FIG. 9 is a flowchart showing a processing example of the failure detection means 12.
FIG. 10 is a block diagram illustrating a configuration example of a second exemplary embodiment of the present invention.
FIG. 11 is a diagram showing a configuration example of a storage status management table.
FIG. 12 is a flowchart showing a part of a processing example of the query integration processing means 14a.
FIG. 13 is a flowchart showing a processing example of the storage status management means 15.
[Explanation of symbols]
1-1 to 1-m, 10-1 to 10-m ... Computer nodes for query processing
11 ... Query analysis means
12 ... Failure detection means
13 ... Distribution number determination means
14, 14a ... Query integration processing means
15 ... Storage status management means
2-1 to 2-n ... Computer nodes for partial query processing
21 ... Partial query processing means
3-1 to 3-j, 30-1 to 30-n ... Secondary storage devices
31 ... Partial database
4 ... Communication network
5 ... External communication network
6 ... Storage device shared network
K1 to K3 ... Recording medium

Claims

A query parallel processing system for processing a query to a database in parallel on a shared cluster system in which a plurality of secondary storage devices, storing a plurality of partial databases created by dividing the database into multiple parts, are shared by a plurality of partial query processing computer nodes,
A computer node for query processing that accepts queries to the database;
The computer node for the query processing is
Query analysis means for analyzing a query to the database and deriving a processing procedure for processing the query, the processing procedure including a partial query for each partial database constituting the database to be queried and a merge process of the partial query results;
Distribution number determination means for dynamically determining, for each of the partial query processing computer nodes, the number of partial queries to distribute according to its load;
Query integration processing means for processing the query according to the processing procedure derived by the query analysis means, distributing, when distributing partial queries to the respective partial query processing computer nodes, only as many undistributed partial queries as the distribution number determined by the distribution number determination means, and deriving, when partial query results are returned, the result of the query while merging related results,
The plurality of computer nodes for partial query processing are respectively
A parallel query processing system comprising: partial query processing means for receiving a distributed partial query, processing the partial query, and returning a result to a distribution source computer node.

The query parallel processing system according to claim 1,
wherein the distribution number determination means is configured to determine the distribution number of partial queries so that the number of partial queries being processed in each of the computer nodes for partial query processing equals a predetermined number.
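
The rule stated in this claim, keeping the number of partial queries in process on each node at a predetermined number, can be illustrated with a minimal Python sketch. The names `determine_distribution_counts`, `distribute`, `in_flight`, and `target` are hypothetical and do not appear in the specification; the sketch also assumes that the load of a node is measured simply as its count of partial queries currently in process.

```python
def determine_distribution_counts(in_flight, target):
    """Return, per node, how many more partial queries may be sent.

    in_flight: dict mapping node name -> partial queries currently in process
    target:    the predetermined number of in-process partial queries per node
    """
    return {node: max(0, target - count) for node, count in in_flight.items()}


def distribute(pending, in_flight, target):
    """Hand undistributed partial queries to nodes until each reaches target.

    pending is a list of undistributed partial queries; it is consumed in place.
    Returns a list of (node, partial_query) assignments.
    """
    assignments = []
    quotas = determine_distribution_counts(in_flight, target)
    for node, quota in quotas.items():
        for _ in range(quota):
            if not pending:
                return assignments
            assignments.append((node, pending.pop(0)))
            in_flight[node] += 1
    return assignments
```

Calling `distribute` again each time a result is returned (after decrementing that node's entry in `in_flight`) lets a faster node receive further partial queries sooner, which is one way the load could be balanced dynamically.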

The query parallel processing system according to claim 1,
wherein the system comprises a plurality of the computer nodes for query processing.

A query parallel processing system for processing a query to a database in parallel on a shared cluster system in which a plurality of secondary storage devices, storing in a distributed manner a plurality of partial databases created by dividing the database into a plurality of parts, are shared by a plurality of computer nodes for partial query processing, the system comprising:
a computer node for query processing that accepts queries to the database,
wherein the computer node for query processing comprises:
query analysis means for analyzing a query to the database and deriving a processing procedure for the query, the processing procedure including a partial query for each partial database constituting the database to be queried and a merge process for the partial query results;
failure detection means for detecting that a failure has occurred in a computer node for partial query processing;
distribution number determination means for dynamically determining the number of partial queries to be distributed to each of the computer nodes for partial query processing, other than any computer node in which a failure has been detected by the failure detection means, according to the load on that computer node; and
query integration processing means for processing the query according to the processing procedure derived by the query analysis means, distributing to each of the computer nodes for partial query processing as many undistributed partial queries as the distribution number determined by the distribution number determination means, and, when results of partial queries are returned, deriving the result of the query while merging related results, and
wherein each of the plurality of computer nodes for partial query processing comprises
partial query processing means for receiving a distributed partial query, processing the partial query, and returning the result to the computer node that distributed it.

The query parallel processing system according to claim 4,
wherein the distribution number determination means is configured to determine the distribution number of partial queries so that the number of partial queries being processed in each of the computer nodes for partial query processing, other than any computer node in which a failure has been detected by the failure detection means, equals a predetermined number.
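
As a minimal sketch of the failure-aware variant described in this claim, the following Python function computes distribution numbers only for nodes in which no failure has been detected. The function name and the `failed` set are hypothetical; how the failure detection means populates that set is not modeled here.

```python
def determine_counts_excluding_failed(in_flight, failed, target):
    """Return distribution numbers for healthy nodes only.

    in_flight: dict mapping node name -> partial queries currently in process
    failed:    set of node names reported by failure detection (assumed input)
    target:    the predetermined number of in-process partial queries per node
    """
    return {
        node: max(0, target - count)
        for node, count in in_flight.items()
        if node not in failed  # a failed node receives no partial queries
    }
```

Because the failed node is absent from the result, a distributor that iterates over the returned mapping never sends it a partial query.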

The query parallel processing system according to claim 4,
wherein the system comprises a plurality of the computer nodes for query processing.

A query parallel processing system for processing a query to a database in parallel on a non-shared cluster system in which each of a plurality of computer nodes for partial query processing is connected to a secondary storage device accessible only from that computer node, and a plurality of partial databases created by dividing the database into a plurality of parts are stored in a distributed manner in those secondary storage devices, the system comprising:
a computer node for query processing that accepts queries to the database,
wherein the computer node for query processing comprises:
query analysis means for analyzing a query to the database and deriving a processing procedure for the query, the processing procedure including a partial query for each partial database constituting the database to be queried and a merge process for the partial query results;
storage status management means for managing the correspondence between each of the computer nodes for partial query processing and the partial databases accessible from that computer node;
distribution number determination means for dynamically determining the number of partial queries to be distributed to each of the computer nodes for partial query processing according to the load on that computer node; and
query integration processing means for processing the query according to the processing procedure derived by the query analysis means, recognizing, using the storage status management means, all the partial databases accessible from each of the computer nodes for partial query processing, distributing to that computer node, from among the partial queries to those partial databases, as many undistributed partial queries as the distribution number determined by the distribution number determination means, and, when results of partial queries are returned, deriving the result of the query while merging related results, and
wherein each of the computer nodes for partial query processing comprises
partial query processing means for receiving a distributed partial query, processing the partial query, and returning the result to the computer node that distributed it.

The query parallel processing system according to claim 7,
wherein the distribution number determination means is configured to determine the distribution number of partial queries so that the number of partial queries being processed in each of the computer nodes for partial query processing equals a predetermined number.
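
In the non-shared configuration, a node can only be given partial queries for partial databases it can access. The following Python sketch illustrates that restriction under simplifying assumptions: each partial query is identified by the name of its partial database, and `storage_table` is a hypothetical stand-in for the storage status management table, mapping each node to the set of partial databases accessible from it.

```python
def distribute_by_storage(pending, storage_table, in_flight, target):
    """Distribute partial queries only to nodes that can access their data.

    pending:       list of partial database names with undistributed queries
    storage_table: dict mapping node name -> set of accessible partial databases
    in_flight:     dict mapping node name -> partial queries currently in process
    target:        the predetermined number of in-process partial queries per node
    """
    assignments = []
    for node, partitions in storage_table.items():
        quota = max(0, target - in_flight.get(node, 0))
        # Only queries whose partial database this node can read are eligible.
        eligible = [db for db in pending if db in partitions][:quota]
        for db in eligible:
            pending.remove(db)
            assignments.append((node, db))
            in_flight[node] = in_flight.get(node, 0) + 1
    return assignments
```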

The query parallel processing system according to claim 7,
wherein the system comprises a plurality of the computer nodes for query processing.

A query parallel processing system for processing a query to a database in parallel on a non-shared cluster system in which each of a plurality of computer nodes for partial query processing is connected to a secondary storage device accessible only from that computer node, a plurality of partial databases created by dividing the database into a plurality of parts are stored in a distributed manner in those secondary storage devices, and a copy of each partial database is stored in a secondary storage device different from the one storing the original, the system comprising:
a computer node for query processing that accepts queries to the database,
wherein the computer node for query processing comprises:
query analysis means for analyzing a query to the database and deriving a processing procedure for the query, the processing procedure including a partial query for each partial database constituting the database to be queried and a merge process for the partial query results;
failure detection means for detecting that a failure has occurred in a computer node for partial query processing;
distribution number determination means for dynamically determining the number of partial queries to be distributed to each of the computer nodes for partial query processing, other than any computer node in which a failure has been detected by the failure detection means, according to the load on that computer node;
storage status management means for managing the correspondence between each of the computer nodes for partial query processing and the original and copied partial databases accessible from that computer node; and
query integration processing means for processing the query according to the processing procedure derived by the query analysis means, recognizing, using the storage status management means, all the partial databases accessible from each of the computer nodes for partial query processing, distributing to that computer node, from among the partial queries to those partial databases, as many undistributed partial queries as the distribution number determined by the distribution number determination means, and, when results of partial queries are returned, deriving the result of the query while merging related results, and
wherein each of the computer nodes for partial query processing comprises
partial query processing means for receiving a distributed partial query, processing the partial query, and returning the result to the computer node that distributed it.

The query parallel processing system according to claim 10,
wherein the distribution number determination means is configured to determine the distribution number of partial queries so that the number of partial queries being processed in each of the computer nodes for partial query processing, other than any computer node in which a failure has been detected by the failure detection means, equals a predetermined number.
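
When copies of the partial databases are kept on other nodes, a partial query whose original is on a failed node can still be served from a copy. The Python sketch below shows one way to look up the surviving holders of each partial database. The table layout (a dict with `"original"` and `"copy"` sets per node) is an assumption made for illustration and is not the layout of the storage status management table in the specification.

```python
def live_holders(table, failed):
    """Map each partial database to the live nodes holding its original or copy.

    table:  dict mapping node name -> {"original": set, "copy": set}
            (hypothetical layout, for illustration only)
    failed: set of node names in which a failure has been detected
    """
    holders = {}
    for node, entry in table.items():
        if node in failed:
            continue  # data on a failed node is treated as unreachable
        for db in entry["original"] | entry["copy"]:
            holders.setdefault(db, []).append(node)
    return holders
```

With three nodes that each hold one original and one copy, the failure of a single node still leaves every partial database reachable from at least one live node.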

The query parallel processing system according to claim 10,
wherein the system comprises a plurality of the computer nodes for query processing.

A machine-readable recording medium recording a program for realizing, by a plurality of computer nodes for partial query processing and one computer node for query processing, a query parallel processing system that processes a query to a database in parallel on a shared cluster system in which a plurality of secondary storage devices, storing in a distributed manner a plurality of partial databases created by dividing the database into a plurality of parts, are shared by the plurality of computer nodes for partial query processing,
the program causing the computer node for query processing to function as:
query analysis means for analyzing a query to the database and deriving a processing procedure for the query, the processing procedure including a partial query for each partial database constituting the database to be queried and a merge process for the partial query results;
distribution number determination means for dynamically determining the number of partial queries to be distributed to each of the computer nodes for partial query processing according to the load on that computer node; and
query integration processing means for processing the query according to the processing procedure derived by the query analysis means, distributing to each of the computer nodes for partial query processing as many undistributed partial queries as the distribution number determined by the distribution number determination means, and, when results of partial queries are returned, deriving the result of the query while merging related results,
and causing each of the computer nodes for partial query processing to function as
partial query processing means for receiving a distributed partial query, processing the partial query, and returning the result to the computer node that distributed it.

A machine-readable recording medium recording a program for realizing, by a plurality of computer nodes for partial query processing and one computer node for query processing, a query parallel processing system that processes a query to a database in parallel on a non-shared cluster system in which each of the plurality of computer nodes for partial query processing is connected to a secondary storage device accessible only from that computer node, and a plurality of partial databases created by dividing the database into a plurality of parts are stored in a distributed manner in those secondary storage devices,
the program causing the computer node for query processing to function as:
query analysis means for analyzing a query to the database and deriving a processing procedure for the query, the processing procedure including a partial query for each partial database constituting the database to be queried and a merge process for the partial query results;
storage status management means for managing the correspondence between each of the computer nodes for partial query processing and the partial databases accessible from that computer node;
distribution number determination means for determining the number of partial queries to be distributed to each of the computer nodes for partial query processing according to the load on that computer node; and
query integration processing means for processing the query according to the processing procedure derived by the query analysis means, recognizing, using the storage status management means, all the partial databases accessible from each of the computer nodes for partial query processing, distributing to that computer node, from among the partial queries to those partial databases, as many undistributed partial queries as the distribution number determined by the distribution number determination means, and, when results of partial queries are returned, deriving the result of the query while merging related results,
and causing each of the computer nodes for partial query processing to function as
partial query processing means for receiving a distributed partial query, processing the partial query, and returning the result to the computer node that distributed it.