JP2004054940A

JP2004054940A - Storage management method and system

Info

Publication number: JP2004054940A
Application number: JP2003196180A
Authority: JP
Inventors: Masashi Tsuchida; 土田　正士; Kazuo Masai; 正井　一夫; Shunichi Torii; 鳥居　俊一
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2003-07-14
Filing date: 2003-07-14
Publication date: 2004-02-19
Anticipated expiration: 2019-12-08
Also published as: JP3599055B2

Abstract

PROBLEM TO BE SOLVED: To reduce load imbalance by manipulating the allocation of the storage regions that store a database in divided form.
SOLUTION: A plurality of set key ranges are associated with a plurality of data storage regions provided in a storage. When data is stored in the database, it is stored in the data storage region corresponding to the key range that contains it. When an allocation request for a data storage region and a key range is received, the data storage region is associated with that key range. Because a data storage region can be associated with a key range to which either a small or a large amount of data is allocated, the load imbalance among the data storage regions can be reduced.
COPYRIGHT: (C)2004,JPO

Description

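[0001]
[Technical Field of the Invention]
The present invention relates to a database division management method and a parallel database system, and more particularly to a database division management method and a parallel database system that optimize the number of processors or disks performing database processing according to the load.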
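[0002]
[Prior Art]
A parallel database system has been proposed, for example, in David DeWitt and Jim Gray, "Parallel Database Systems: The Future of High Performance Database Systems," CACM, Vol. 35, No. 6, 1992.
In this parallel database system, a plurality of processors are connected in a tightly coupled or loosely coupled manner, and database processing is distributed among those processors.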
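[0003]
[Problems to Be Solved by the Invention]
In conventional parallel database systems, the system configuration is left to the user and is fixed.
As a result, the configuration may be unsuited to the load from the outset or may become unsuited partway through operation, so that the expected degree of parallelism cannot be obtained and high-speed queries cannot be achieved.
An object of the present invention is therefore to provide a database division management method and a parallel database system that achieve the expected degree of parallelism and realize high-speed queries.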
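[0004]
[Means for Solving the Problems]
In a first aspect, the present invention provides a database division management method for a parallel database system in which an FES node, which analyzes and optimizes queries from users and creates processing procedures for them; a BES node, which accesses the database on the basis of the processing procedures created by the FES node; and an IOS node, which has disks and stores and manages the database on those disks, are connected by a network. The method is characterized in that, according to the load pattern of the database processing, it determines the number of processors allocated to the FES node, the number of processors allocated to the BES node, the number of processors allocated to the IOS node, the number of disks of the IOS node, and the number of disk divisions.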
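[0005]
The invention also provides a database division management method for a parallel database system in which an FES node, which analyzes and optimizes queries from users and creates processing procedures for them, and a BES node, which accesses the database on the basis of the processing procedures created by the FES node and also has disks and stores and manages the database on them, are connected by a network. The method is characterized in that, according to the load pattern of the database processing, it determines the number of processors allocated to the FES node, the number of processors allocated to the BES node, the number of disks of the BES node, and the number of disk divisions.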
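[0006]
In the database division management method according to the first aspect, the number of processors and disks allocated to each node and the number of disk divisions are determined according to the load pattern of the database processing (single-record search, single-record update, data retrieval, and so on). The system configuration therefore matches the load, so that the expected degree of parallelism is obtained and high-speed queries are realized.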
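[0007]
In a second aspect, the present invention provides a database division management method for a parallel database system in which an FES node, a BES node, and an IOS node, each having the functions described for the first aspect, are connected by a network. The method is characterized in that it determines an upper limit on the number of pages accessible in parallel that keeps the time required to scan the database constant and, according to that upper limit, determines the number of processors allocated to the FES node, the number of processors allocated to the BES node, the number of processors allocated to the IOS node, the number of disks of the IOS node, and the number of disk divisions.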
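[0008]
The invention also provides a database division management method for a parallel database system in which an FES node and a BES node that also has disks and stores and manages the database on them, as described in [0005], are connected by a network. The method is characterized in that it determines an upper limit on the number of pages accessible in parallel that keeps the time required to scan the database constant and, according to that upper limit, determines the number of processors allocated to the FES node, the number of processors allocated to the BES node, the number of disks of the BES node, and the number of disk divisions.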
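[0009]
In the database division management method according to the second aspect, the number of processors and disks allocated to each node and the number of disk divisions are determined according to the upper limit on the number of pages accessible in parallel that keeps the database scan time constant. High-speed queries can therefore be realized.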
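[0010]
In a third aspect, the present invention provides a database division management method for a parallel database system in which an FES node, a BES node, and an IOS node, each having the functions described for the first aspect, are connected by a network. The method is characterized in that it calculates an expected degree of parallelism p from the load pattern and, according to p, determines the number of processors allocated to the FES node, the number of processors allocated to the BES node, the number of processors allocated to the IOS node, the number of disks of the IOS node, and the number of disk divisions.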
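[0011]
The invention also provides a database division management method for a parallel database system in which an FES node and a BES node that also has disks and stores and manages the database on them, as described in [0005], are connected by a network. The method is characterized in that it calculates an expected degree of parallelism p from the load pattern and, according to p, determines the number of processors allocated to the FES node, the number of processors allocated to the BES node, the number of disks of the BES node, and the number of disk divisions.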
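[0012]
In the database division management method according to the third aspect, the number of processors and disks allocated to each node and the number of disk divisions are determined according to the expected degree of parallelism p calculated from the load pattern. The expected degree of parallelism can therefore be obtained.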
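[0013]
In a fourth aspect, the present invention provides a database division management method as configured above in which an optimal number of page accesses m is calculated and, when key range division is used, the number of pages stored per sub-key range, s (= m/p), is calculated, the key ranges are divided into sub-key ranges in units of s pages, and the data is inserted into the disks.
In the database division management method according to the fourth aspect, the data is divided into sub-key ranges of s (= m/p) pages, calculated from the expected degree of parallelism p and the optimal number of page accesses m, and inserted into the disks. The data can therefore be divided and managed almost evenly.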
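[0014]
In a fifth aspect, the present invention provides a database division management method for a parallel database system in which an FES node, a BES node, and an IOS node, each having the functions described for the first aspect, are connected by a network. The method is characterized in that it detects load imbalance on the basis of load information acquired during query execution, such as the number of pages accessed, the number of hit rows, and the number of communications, and changes the number of processors allocated to the FES node, the number of processors allocated to the BES node, the number of processors allocated to the IOS node, and the number of disks of the IOS node in the direction that eliminates the imbalance.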
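[0015]
The invention also provides a database division management method for a parallel database system in which an FES node and a BES node that also has disks and stores and manages the database on them, as described in [0005], are connected by a network. The method is characterized in that it detects load imbalance on the basis of load information acquired during query execution, such as the number of pages accessed, the number of hit rows, and the number of communications, and changes the number of processors allocated to the FES node, the number of processors allocated to the BES node, and the number of disks of the BES node in the direction that eliminates the imbalance.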
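[0016]
In the database division management method according to the fifth aspect, load imbalance is detected during query execution, and the number of processors or disks of each node is changed in the direction that eliminates it. Even when the load fluctuates, the system configuration therefore always matches the load, so that the expected degree of parallelism is obtained and high-speed queries are realized.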
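[0017]
In a sixth aspect, the present invention provides a database division management method as configured above in which, when the number of processors allocated to the BES node or the number of processors or disks allocated to the IOS node is increased, the following is performed: if the system is online, the key ranges of the database tables managed by the processors or disks affected by the addition are blocked; processors or disks are newly allocated; lock information and directory information are taken over; the dictionary information required for node distribution control is rewritten; and then, if the system is online, the blocking is released.
The invention also provides a database division management method as configured above in which, when the number of processors or disks allocated to the BES node is increased, the following is performed: if the system is online, the key ranges of the database tables managed by the affected processors or disks are blocked; processors or disks are newly allocated; lock information and directory information are taken over; the dictionary information required for node distribution control is rewritten; data is moved from the affected disk group to the new disk group; and then, if the system is online, the blocking is released.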
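[0018]
In the database division management method according to the sixth aspect, when processors or disks are added or removed while the system is online, the key ranges of the tables are blocked, the takeover processing is performed, and the blocking is then released. Overhead can therefore be kept to a minimum. In a system configuration with IOS nodes, moreover, the takeover can be performed without moving data.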
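[0019]
In a seventh aspect, the present invention provides a database division management method as configured above in which, when the number of processors allocated to the BES node or the number of processors or disks allocated to the IOS node is reduced, the following is performed: if the system is online, the key ranges of the database tables managed by the processors or disks targeted for removal are blocked; the processors or disks to be removed are determined; lock information and directory information are taken over; the dictionary information required for node distribution control is rewritten; and then, if the system is online, the blocking is released.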
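[0020]
The invention also provides a database division management method as configured above in which, when the number of processors or disks allocated to the BES node is reduced, the following is performed: if the system is online, the key ranges of the database tables managed by the processors or disks targeted for removal are blocked; the processors or disks to be removed are determined; lock information and directory information are taken over; the dictionary information required for node distribution control is rewritten; data is moved from the disk group being removed to the disk group taking it over; and then, if the system is online, the blocking is released.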
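[0021]
In the database division management method according to the seventh aspect, when processors or disks are added or removed while the system is online, the key ranges of the tables are blocked, the takeover processing is performed, and the blocking is then released. Overhead can therefore be kept to a minimum. In a system configuration with IOS nodes, moreover, the takeover can be performed without moving data.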
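[0022]
In an eighth aspect, the present invention provides a parallel database system that dynamically changes the number of processors or disks performing database processing by means of the database division management method configured above.
Through the effects of the fifth to seventh aspects, the parallel database system according to the eighth aspect is scalable.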
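[0023]
[Embodiments of the Invention]
Embodiments of the present invention are described in detail below with reference to the drawings. The present invention is not limited by these embodiments.
Fig. 1 is a configuration diagram of a parallel database system 1 according to an embodiment of the present invention.
The parallel database system 1 consists of FES (front-end server) nodes, BES (back-end server) nodes, IOS (input/output server) nodes, a DS (dictionary server) node, and a JS (journal server) node, connected by a network 90. Each node is also connected to other systems.
An FES node consists of an FES 75, made up of at least one processor without disks, and acts as a front-end server that analyzes and optimizes queries from users and creates processing procedures for them.
A BES node consists of a BES 73, made up of at least one processor without disks, and accesses the database on the basis of the processing procedures created by the FES 75.
An IOS node consists of an IOS 70, made up of at least one processor, and at least one disk 80, and stores and manages the database on the disks 80. The IOS node can be omitted if its functions are given to the BES node. In that case, disks are connected to the BES 73, and the BES 73 stores and manages the database on the disks 80.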
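[0024]
The database consists of a plurality of tables. A table is two-dimensional and consists of a plurality of rows, each of which consists of one or more columns (attributes). A table is physically divided into fixed-length pages, each holding a predetermined number of rows, and stored on the disks 80. The storage location of each page on the disks 80 can be found from directory information.
A DS node consists of a DS 71, made up of at least one processor, and at least one disk 81, and centrally manages the database definition information.
A JS node consists of a JS 72, made up of at least one processor, and at least one disk 82, and stores and manages the history of the database updates executed on each node.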
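[0025]
Fig. 2 is a conceptual diagram of the database division management method of the present invention, which determines the numbers of processors of the FES 75, BES 73, and IOS 70, the number of disks, and the number of disk divisions in the parallel database system 1.
First, the load pattern of the database processing is determined from the workload specified by the user. Load patterns include single-record search, single-record update, and data retrieval. According to the load pattern, the number of divisions into which the IOS 70 divides and manages the disks 80 is determined (when the BES has the IOS node functions, the number of divisions into which the BES 73 divides and manages the disks is determined).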
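[0026]
That is, at schema definition time, the number of disks required for storage follows from the table division method (key value ranges, number of rows per range converted to pages, and so on), and once the unit of blocking and reorganization is decided, the combination of BES 73 and IOS 70 is determined (the unit of blocking or reorganization depends on the disk, IOS, or BES).
This determines the numbers of BES 73, IOS 70, and disks 80, as follows:
Number of predefined divisions: the number of database division units, accessible in parallel, managed by all BESs (BESs are dynamically added and removed in these division units)
Disks per division: the number of disks allocated to each division
Fig. 2 shows the case of 8 disks, 4 predefined divisions, and 2 disks per division.
If processor performance increases n-fold, the number of volumes used by each division is multiplied by n without changing the number of divisions (the number of disks is nevertheless limited, because the total data transfer rate between the IOS 70 and the disks 80 is limited).
Here, a "disk" corresponds to one disk device. In the present invention, however, a "disk" need not correspond one-to-one to a disk device. For example, when one disk device contains several disk devices (a disk array), the number of input/output units accessible in parallel may be treated as the number of "disk devices".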
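[0027]
As shown in Fig. 2, FES:BES:IOS:disk = 1:4:1:8, but at initial data load only one FES and one BES are needed, so FES:BES:IOS:disk = 1:1:1:8. BES 731 therefore holds the directory information for the database stored on disks 811 to 842 of predefined divisions #1 to #4.
If the BES load is light and BES 731 alone can handle the IOS 70 and the eight disks 811 to 842, BES 731 alone accesses the database stored on those eight disks, and the ratio remains FES:BES:IOS:disk = 1:1:1:8.
When the load on BES 731 increases, its utilization stays at 100%, and a load imbalance is detected, BES 732 is added. Because there are four predefined divisions, two divisions are assigned to each of BES 731 and 732. BES 731 therefore holds the directory information for the database stored on disks 811 to 822 of predefined divisions #1 and #2, and BES 732 holds the directory information for the database stored on disks 831 to 842 of predefined divisions #3 and #4. The ratio becomes FES:BES:IOS:disk = 1:2:1:8.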
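[0028]
When the load on BES 731 and 732 increases further, their utilization stays at 100%, and a load imbalance is detected, BES 733 and 734 are added for BES 731 and 732, respectively. Because there are four predefined divisions, one division is assigned to each of the four BESs 731, 732, 733, and 734: BES 731 holds the directory information for the database stored on disks 811 and 812 of predefined division #1, BES 732 for disks 821 and 822 of predefined division #2, BES 733 for disks 831 and 832 of predefined division #3, and BES 734 for disks 841 and 842 of predefined division #4. The ratio becomes FES:BES:IOS:disk = 1:4:1:8.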
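[0029]
When the load lightens and the utilization of BES 733 and 734 stays below, for example, 50%, it is more effective to use the nodes allocated to BES 733 and 734 for other processing. BES 733 and 734, whose utilization is below 50%, are therefore consolidated, and the system degenerates to FES:BES:IOS:disk = 1:2:1:8.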
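[0030]
As described above, increasing and decreasing the number of BESs according to the load yields a scalable system ranging from FES:BES:IOS:disk = 1:1:1:8 to 1:4:1:8.
The IOS 70 needs only as many parallel tasks as there are disks accessible in parallel, regardless of the correspondence between the BESs 73 and the disks 80. The correspondence between BESs 73 and disks 80 can therefore be changed by moving directory information between BESs without moving any data, which makes it easy to split and merge access.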
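The reassignment of predefined divisions to BESs in [0027] to [0030] can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes divisions are handed to BESs in contiguous blocks, that the BES count divides the number of predefined divisions, and it reuses the Fig. 2 disk numbering (division #d holds disks 8d1 and 8d2). The helper names are hypothetical. Only directory ownership changes; the disks stay behind the IOS.

```python
from typing import Dict, List, Tuple

DISKS_PER_DIVISION = 2  # Fig. 2: 8 disks, 4 predefined divisions

def disks_of(division: int) -> List[int]:
    """Fig. 2 numbering: division #d owns disks 8d1 and 8d2 (e.g. #1 -> 811, 812)."""
    return [800 + 10 * division + k for k in range(1, DISKS_PER_DIVISION + 1)]

def assign_divisions(num_divisions: int, num_bes: int,
                     first_bes: int = 731) -> Dict[int, List[int]]:
    """Give each BES a contiguous block of predefined divisions (1-based)."""
    if num_bes < 1 or num_divisions % num_bes:
        raise ValueError("the BES count must divide the number of predefined divisions")
    per_bes = num_divisions // num_bes
    return {first_bes + b: list(range(b * per_bes + 1, (b + 1) * per_bes + 1))
            for b in range(num_bes)}

def directory_moves(old: Dict[int, List[int]],
                    new: Dict[int, List[int]]) -> List[Tuple[int, int, int]]:
    """(division, from_bes, to_bes) for each division whose directory info changes owner.
    No data is moved: the IOS keeps serving the same disks."""
    old_owner = {d: bes for bes, ds in old.items() for d in ds}
    new_owner = {d: bes for bes, ds in new.items() for d in ds}
    return sorted((d, old_owner[d], new_owner[d])
                  for d in new_owner if old_owner[d] != new_owner[d])

if __name__ == "__main__":
    for n in (1, 2, 4):
        layout = assign_divisions(4, n)
        print(n, {bes: [disks_of(d) for d in ds] for bes, ds in layout.items()})
    print(directory_moves(assign_divisions(4, 1), assign_divisions(4, 2)))
```

Going from one BES to two moves only the directory entries for divisions #3 and #4 from BES 731 to BES 732, which matches the layout described in [0027].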
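Next, the numbers of processors, disks, and disk divisions are illustrated numerically for two load patterns: single-record update and data retrieval.
The following premises are assumed:
FES processing (reception): 30 [K steps]
BES processing (single-record update): 60 [K steps]
BES processing (data retrieval): 220 [K steps]
Send processing: 6 [K steps]
Receive processing: 6 + 4 × (number of pages) [K steps]
I/O issue processing: 4 + 4 × (number of pages) [K steps]
Processor performance: 10 [M steps/s]
I/O performance (1-page access): 20 [ms]
I/O performance (10-page access): 30 [ms]
A. Single-record update (1-page access)
(1) System configuration with IOS nodes
Dividing the processor performance of 10 [M steps/s] by the FES processing of 30 [K steps] gives up to 333 receptions per second.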
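[0031]
A single-record update on the BES requires 88 [K steps]: receiving the execution request from the FES (6 [K steps]) + sending the data retrieval request from the BES (6 [K steps]) + receiving the data retrieval result from the IOS (10 [K steps]) + the single-record update itself (60 [K steps]) + sending the execution result to the FES (6 [K steps]). Dividing the processor performance of 10 [M steps/s] by this gives up to 114 single-record updates per second.
A disk access on the IOS requires 20 [K steps]: receiving the I/O request from the BES (6 [K steps]) + I/O issue processing (8 [K steps]) + sending the I/O result to the BES (6 [K steps]). Dividing 10 [M steps/s] by this gives up to 500 disk accesses per second.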
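[0032]
Because one random 1-page I/O takes 20 [ms], one disk can be accessed up to 50 times per second. Dividing the IOS's 500 disk accesses per second by this shows that up to 10 disks can be attached to one IOS.
Dividing the IOS's 500 disk accesses per second by the BES's 114 single-record updates per second shows that one IOS can serve about 4.3 BESs.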
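[0033]
Dividing the FES's 333 receptions per second by the BES's 114 single-record updates per second shows that one FES can serve about 3 BESs.
In summary, FES:BES = 1:3, BES:IOS = 4.3:1, and IOS:disk = 1:10. Overall, as shown in Fig. 3, setting FES:BES:IOS:disk = 1:4:1:8 gives a nearly balanced implementation (with some imbalance at the FES and the disks).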
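[0034]
(2) System configuration in which the BES nodes have the IOS node functions
Dividing the processor performance of 10 [M steps/s] by the FES processing of 30 [K steps] gives up to 333 receptions per second.
A single-record update on the BES requires 80 [K steps]: receiving the execution request from the FES (6 [K steps]) + I/O issue processing (8 [K steps]) + the single-record update itself (60 [K steps]) + sending the execution result to the FES (6 [K steps]). Dividing 10 [M steps/s] by this gives up to 125 single-record updates per second.
Because one random 1-page I/O takes 20 [ms], one disk can be accessed up to 50 times per second. Dividing the BES's 125 single-record updates per second by this shows that up to 2.5 disks can be attached to one BES.
Dividing the FES's 333 receptions per second by the BES's 125 single-record updates per second shows that one FES can serve about 2.6 BESs.
[0035]
In summary, FES:BES = 1:2.6 and BES:disk = 1:2.5. Overall, as shown in Fig. 4, setting FES:BES:disk = 1:4:8 gives a nearly balanced implementation (with some imbalance at the FES).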
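B. Data retrieval (10-page access)
(1) System configuration with IOS nodes
Dividing the processor performance of 10 [M steps/s] by the FES processing of 30 [K steps] gives up to 333 receptions per second.
A data retrieval on the BES requires 284 [K steps]: receiving the execution request from the FES (6 [K steps]) + sending the data retrieval request from the BES (6 [K steps]) + receiving the data retrieval result from the IOS (46 [K steps]) + the data retrieval processing itself (220 [K steps]) + sending the execution result to the FES (6 [K steps]). Dividing 10 [M steps/s] by this gives up to 35 data retrievals per second.
A disk access on the IOS requires 56 [K steps]: receiving the I/O request from the BES (6 [K steps]) + I/O issue processing (44 [K steps]) + sending the I/O result to the BES (6 [K steps]). Dividing 10 [M steps/s] by this gives up to 179 disk accesses per second.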
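[0036]
Because one 10-page batch I/O takes 30 [ms], one disk can be accessed up to 33 times per second. Dividing the IOS's 179 disk accesses per second by this shows that up to 5.4 disks can be attached to one IOS.
Dividing the IOS's 179 disk accesses per second by the BES's 35 data retrievals per second shows that one IOS can serve about 5.1 BESs.
Dividing the FES's 333 receptions per second by the BES's 35 data retrievals per second shows that one FES can serve about 9.5 BESs.
In summary, FES:BES = 1:9.5, BES:IOS = 5.1:1, and IOS:disk = 1:5.4. Overall, setting FES:BES:IOS:disk = 1:10:2:10 gives a nearly balanced implementation (with some imbalance at the disks).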
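[0037]
(2) System configuration in which the BES nodes have the IOS node functions
Dividing the processor performance of 10 [M steps/s] by the FES processing of 30 [K steps] gives up to 333 receptions per second.
A data retrieval on the BES requires 276 [K steps]: receiving the execution request from the FES (6 [K steps]) + I/O issue processing (44 [K steps]) + the data retrieval processing itself (220 [K steps]) + sending the execution result to the FES (6 [K steps]). Dividing 10 [M steps/s] by this gives up to 36 data retrievals per second.
Because one 10-page batch I/O takes 30 [ms], one disk can be accessed up to 33 times per second. Dividing the BES's 36 data retrievals per second by this shows that only one disk can be attached to one BES.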
[0038]
Dividing the FES's 333 receptions per second by the BES's 36 data retrievals per second shows that one FES can serve about 9.2 BESs.
In summary, FES:BES = 1:9.2 and BES:disk = 1:1. Overall, setting FES:BES:disk = 1:10:10 gives a nearly balanced implementation.
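Cases A and B above follow one pattern: sum the per-request step counts on each node, divide the processor performance by that sum, and divide the resulting rates by one another. The Python sketch below reproduces that arithmetic from the stated premises; the function and field names are hypothetical. Because it does not round intermediate rates (the text works with 114/s, 33/s, and so on), a few ratios differ slightly in the first decimal place, for example about 4.4 rather than 4.3 BESs per IOS in case A(1).

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Premises of the worked example; step counts are in K steps.
CPU_K_STEPS_PER_S = 10_000   # processor performance: 10 M steps/s
FES_RECEPTION = 30
SEND = 6
REQUEST_RECEIVE = 6          # receiving a request message (no data pages)

def receive_data(pages: int) -> int:
    return 6 + 4 * pages     # receiving a result that carries `pages` pages

def io_issue(pages: int) -> int:
    return 4 + 4 * pages

@dataclass
class Capacity:
    fes_per_s: float
    bes_per_s: float
    ios_per_s: Optional[float]   # None when the BES also plays the IOS role
    disk_per_s: float

def capacity(db_steps: int, pages: int, io_ms: float, with_ios: bool) -> Capacity:
    if with_ios:
        # FES request in, request to IOS out, IOS result in, DB work, result to FES out
        bes_steps = REQUEST_RECEIVE + SEND + receive_data(pages) + db_steps + SEND
        ios_steps = REQUEST_RECEIVE + io_issue(pages) + SEND
        ios_per_s: Optional[float] = CPU_K_STEPS_PER_S / ios_steps
    else:
        # the BES issues the I/O itself
        bes_steps = REQUEST_RECEIVE + io_issue(pages) + db_steps + SEND
        ios_per_s = None
    return Capacity(CPU_K_STEPS_PER_S / FES_RECEPTION,
                    CPU_K_STEPS_PER_S / bes_steps,
                    ios_per_s,
                    1000.0 / io_ms)

def ratios(c: Capacity) -> Dict[str, float]:
    r = {"BES per FES": c.fes_per_s / c.bes_per_s}
    if c.ios_per_s is None:
        r["disks per BES"] = c.bes_per_s / c.disk_per_s
    else:
        r["BES per IOS"] = c.ios_per_s / c.bes_per_s
        r["disks per IOS"] = c.ios_per_s / c.disk_per_s
    return r

if __name__ == "__main__":
    cases = {
        "A(1) update, IOS": (60, 1, 20, True),
        "A(2) update, no IOS": (60, 1, 20, False),
        "B(1) retrieval, IOS": (220, 10, 30, True),
        "B(2) retrieval, no IOS": (220, 10, 30, False),
    }
    for name, args in cases.items():
        print(name, {k: round(v, 2) for k, v in ratios(capacity(*args)).items()})
```

The final node ratios in the text (for example 1:4:1:8) are then chosen by rounding these figures to a practical configuration, which the sketch does not attempt.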
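[0039]
Fig. 5 is a configuration diagram of the FES 75.
The FES 75 comprises application programs 10 and 11 created by users; a parallel database management system 20 that manages the whole database system, including query processing and resource management; and an operating system 30 that manages the whole computer system, including reading and writing data.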
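[0040]
The parallel database management system 20 comprises a system control unit 21, a logical processing unit 22, a physical processing unit 23, and a database/dictionary buffer 24 that temporarily stores the data to be processed. The parallel database management system 20 is connected to the network 90 and to other systems.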
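[0041]
The system control unit 21 manages input/output and related functions, and includes a data load process 210 and a dynamic load control process 211.
The logical processing unit 22 includes query parsing 220, which performs syntax and semantic analysis of queries; static optimization 221, which generates suitable processing procedure candidates; and code generation 222, which generates the code corresponding to the candidates. It also includes dynamic optimization 223, which selects the optimal candidate, and code interpretation/execution 224, which interprets and executes the code of the selected candidate.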
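[0042]
The physical processing unit 23 includes data access processing 230, which performs condition evaluation, editing, record addition, and so on for the accessed data; database/dictionary buffer control 231, which controls reading and writing of database records; and exclusive control 233, which provides exclusive control of resources shared within the system.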
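[0043]
Fig. 6 is a configuration diagram of the BES 73.
The BES 73 comprises a parallel database management system 20 that manages the whole database system and an operating system 30 that manages the whole computer system. When the BES has the IOS node functions, it also has disks and stores and manages the database 40 on them.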
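[0044]
The parallel database management system 20 comprises a system control unit 21, a logical processing unit 22, a physical processing unit 23, and a database buffer 24 that temporarily stores the data to be processed. The parallel database management system 20 is connected to the network 90 and to other systems.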
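[0045]
The system control unit 21 manages input/output and related functions, and includes a data load process 210 that loads data with load distribution taken into account.
The logical processing unit 22 includes code interpretation/execution 224, which interprets and executes code.
The physical processing unit 23 includes data access processing 230, which performs condition evaluation, editing, record addition, and so on for the accessed data; database buffer control 231, which controls reading and writing of database records; mapping processing 232, which manages the storage locations of the data to be input or output; and exclusive control 233, which provides exclusive control of resources shared within the system.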
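[0046]
Fig. 7 is a configuration diagram of the IOS 70 and the disks 80.
The IOS 70 comprises a parallel database management system 20 that manages the whole database system and an operating system 30 that manages the whole computer system.
The disks 80 store the database 40.
The parallel database management system 20 comprises a system control unit 21, a physical processing unit 23, and an I/O buffer 24 that temporarily stores the data to be processed. The parallel database management system 20 is connected to the network 90 and to other systems.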
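[0047]
The system control unit 21 manages input/output and related functions, and includes a data load process 210 that loads data with load distribution taken into account.
The physical processing unit 23 includes data access processing 230, which performs condition evaluation, editing, record addition, and so on for the accessed data, and I/O buffer control 231, which controls reading and writing of database records.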
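[0048]
Fig. 8 is a configuration diagram of the DS 71 and the disk 81.
The DS 71 comprises a parallel database management system 20 that manages the whole database system and an operating system 30 that manages the whole computer system.
The disk 81 stores a dictionary 50.
The parallel database management system 20 comprises a system control unit 21, a logical processing unit 22, a physical processing unit 23, and a dictionary buffer 24. The parallel database management system 20 is connected to the network 90 and to other systems.
The logical processing unit 22 includes code interpretation/execution 224, which interprets and executes code.
The physical processing unit 23 includes data access processing 230, which performs condition evaluation, editing, record addition, and so on for the accessed data; dictionary buffer control 231, which controls reading and writing of dictionary records; and exclusive control 233, which provides exclusive control of resources shared within the system.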
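[0049]
Fig. 9 is a configuration diagram of the JS 72 and the disk 82.
The JS 72 comprises a parallel database management system 20 that manages the whole database system and an operating system 30 that manages the whole computer system.
The disk 82 stores a journal 60.
The parallel database management system 20 comprises a system control unit 21, a physical processing unit 23, and a journal buffer 24. The parallel database management system 20 is connected to the network 90 and to other systems.
The physical processing unit 23 includes data access processing 230, which performs condition evaluation, editing, record addition, and so on for the accessed data, and journal buffer control 231, which controls reading and writing of journal records.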
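[0050]
Fig. 10 is a flowchart of the processing of the database management system 20 in the FES 75.
The system control unit 21 checks whether the request is for query analysis processing (212). If so, it calls and executes query analysis processing 400 and then ends.
If not, it checks whether the request is for query execution processing (213).
If so, it calls and executes query execution processing 410 and then ends.
If not, it checks whether the request is for data load processing (214). If so, it calls and executes data load processing 210 and then ends.
If not, it checks whether the request is for dynamic load control processing (215). If so, it calls and executes dynamic load control processing 211 and then ends.
Otherwise, it ends.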
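[0051]
The flowchart of the processing of the database management system 20 in the BES 73 is Fig. 10 with steps 212, 215, 400, and 211 omitted. The flowchart for the database management system 20 in the IOS 70 is Fig. 10 with steps 212, 213, 215, 400, 410, and 211 omitted.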
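[0052]
Fig. 11 is a flowchart of query analysis processing 400.
First, query parsing 220 performs syntax and semantic analysis of the input query statement.
Next, static optimization 221 estimates, from the conditional expressions appearing in the query, the proportion of data that satisfies the conditions; creates effective access path candidates (in particular, by selecting indexes) on the basis of predefined rules; and creates processing procedure candidates.
Next, code generation 222 expands the processing procedure candidates into executable code. Processing then ends.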
[0053]
Fig. 12 is a flowchart of query parsing 220.
In step 2200, syntax and semantic analysis of the input query statement is performed. Processing then ends.
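[0054]
Fig. 13 is a flowchart of static optimization 221.
First, predicate selectivity estimation 2210 estimates the selectivity of the predicates in the conditional expressions appearing in the query.
Next, access path pruning 2212 prunes the access paths, such as indexes.
Next, processing procedure candidate generation 2213 generates processing procedure candidates that combine access paths.
Processing then ends.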
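[0055]
Fig. 14 is a flowchart of predicate selectivity estimation 2210.
In step 22101, it is checked whether a variable appears in the query's conditional expression. If no variable appears, the process proceeds to step 22102; if one does, it proceeds to step 22104.
In step 22102, it is checked whether column value distribution information exists for the conditional expression. If it does, the process proceeds to step 22103; if not, to step 22105.
In step 22103, the selectivity is calculated from the column value distribution information, and processing ends.
In step 22104, it is checked whether column value distribution information exists for the conditional expression. If it does, processing ends; if not, the process proceeds to step 22105.
In step 22105, a default value is set according to the type of conditional expression, and processing ends.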
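The branches of Fig. 14 can be sketched in Python as follows. This is an illustrative sketch only: the default selectivities are hypothetical values (the text gives none), and "column value distribution information" is modeled as a plain sample of column values. Returning None in the step 22104 branch reflects that the estimate is left until a constant is substituted at execution time, when dynamic optimization recalculates the selectivity (step 22302).

```python
from typing import Optional, Sequence

# Hypothetical per-operator defaults; the source does not give values.
DEFAULT_SELECTIVITY = {"=": 0.01, "<": 1 / 3, ">": 1 / 3}

def distribution_selectivity(op: str, value, column_values: Sequence) -> float:
    """Stand-in for 'column value distribution information': a sample of column values."""
    if op == "=":
        hits = sum(1 for v in column_values if v == value)
    elif op == "<":
        hits = sum(1 for v in column_values if v < value)
    elif op == ">":
        hits = sum(1 for v in column_values if v > value)
    else:
        raise ValueError(f"unsupported operator {op!r}")
    return hits / len(column_values)

def estimate_selectivity(op: str, value, has_variable: bool,
                         distribution: Optional[Sequence]) -> Optional[float]:
    """Fig. 14. Returns None when estimation is deferred to execution time."""
    if not has_variable:                      # step 22101 -> 22102
        if distribution:                      # 22102 -> 22103
            return distribution_selectivity(op, value, distribution)
        return DEFAULT_SELECTIVITY[op]        # 22105
    if distribution:                          # 22104: wait for the substituted constant
        return None
    return DEFAULT_SELECTIVITY[op]            # 22105
```

For example, `estimate_selectivity("=", 3, False, [1, 2, 3, 3])` uses the distribution and yields 0.5, while a predicate with a host variable and a known distribution is left unestimated until execution.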
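[0056]
Fig. 15 is a flowchart of access path pruning 2212.
In step 22120, the indexes of the columns appearing in the query's conditional expressions are registered as access path candidates.
In step 22121, it is checked whether the table accessed by the query is divided and stored across multiple nodes. If not, the process proceeds to step 22122; if so, to step 22123.
In step 22122, a sequential scan is registered as an access path candidate.
In step 22123, a parallel scan is registered as an access path candidate.
In step 22124, it is checked whether the selectivity of every conditional expression has already been set. If so, the process proceeds to step 22125; if not, to step 22126.
In step 22125, for each table, the index of the conditional expression with the smallest selectivity is given top priority among the access paths.
In step 22126, the maximum and minimum selectivities of each conditional expression are obtained.
In step 22127, a selection criterion for each access path is calculated from system characteristics such as processor performance and I/O performance.
In step 22128, only access paths using a single index or a combination of indexes whose selectivity falls below the selection criterion are registered as access path candidates.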
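A rough Python sketch of Fig. 15 follows, under several assumptions that are not in the source: each index's selectivity is given as a (min, max) pair, which is treated as "already set" when the two are equal; the selection criterion of step 22127 is passed in as a plain number; the selectivity of an index combination is taken as the product of the upper bounds (independent predicates); and both outcomes of step 22124 continue to steps 22127 and 22128. All names are hypothetical.

```python
from itertools import combinations
from math import prod
from typing import Dict, List, Optional, Tuple

def prune_access_paths(
    selectivity: Dict[str, Tuple[float, float]],  # index name -> (min, max) selectivity
    partitioned: bool,                            # table divided across nodes?
    criterion: float,                             # hypothetical step-22127 threshold
) -> Tuple[List[Tuple[str, ...]], Optional[str]]:
    # Steps 22121-22123: the table-wide scan depends on whether the table is divided
    paths: List[Tuple[str, ...]] = [("parallel_scan",) if partitioned else ("sequential_scan",)]
    top: Optional[str] = None
    # Steps 22124-22125: if every selectivity is fixed, the most selective index gets top priority
    if selectivity and all(lo == hi for lo, hi in selectivity.values()):
        top = min(selectivity, key=lambda name: selectivity[name][0])
    # Steps 22120 and 22128 combined: keep single indexes and index combinations
    # whose estimated selectivity falls below the criterion
    names = sorted(selectivity)
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            if prod(selectivity[n][1] for n in combo) < criterion:
                paths.append(combo)
    return paths, top
```

With a selective index `idx_a` (0.01) and an unselective `idx_b` (0.5) on a divided table and a criterion of 0.1, the sketch keeps the parallel scan, `idx_a` alone, and the `idx_a` + `idx_b` combination, and gives `idx_a` top priority.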
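[0057]
Fig. 16 is a flowchart of processing procedure candidate generation 2213.
In step 22130, it is checked whether the table accessed by the query is divided and stored across multiple nodes. If not, the process proceeds to step 22131; if so, to step 22135.
In step 22131, it is checked whether the processing procedure candidates include sort processing. If not, the process proceeds to step 22132; if so, to step 22135.
In step 22132, it is checked whether the table accessed by the query has exactly one access path. If so, the process proceeds to step 22133; if not, to step 22134.
In step 22133, a single processing procedure is created, and processing ends.
In step 22134, multiple processing procedures are created, and processing ends.
In step 22135, the query is decomposed into joinable two-way joins.
In step 22136, data read/data distribution procedures and slot sort procedures are registered as candidates for the group of nodes storing the divided table.
In step 22137, slot sort procedures, N-way merge procedures, and matching procedures are registered as candidates for the group of join processing nodes. Slot sort processing is an in-page sort applied to pages in which each row is managed through a slot positioned by its offset from the beginning of the page; reading the slots in order then accesses the rows in ascending order. N-way merge processing uses N-way buffers and, at each merge stage, takes N sorted runs as input and applies a tournament method, finally producing a single sorted run.
In step 22138, a requested-data output procedure is registered for the requested-data output node.
In step 22139, it is checked whether evaluation of all the decomposition results is complete. If not, the process returns to step 22136; if so, processing ends.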
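The slot sort and N-way merge defined in step 22137 can be sketched in Python as follows. The sketch simplifies a page to a list of (key, payload) rows and a slot to an index into that list rather than a byte offset. For the merge it substitutes `heapq.merge`, which selects the next row with a binary heap instead of the tournament tree named in the text; the merge structure (N runs per stage, repeated until one run remains) is the same. Names are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple

Row = Tuple[int, str]  # (key, payload)

def slot_sort(page: Sequence[Row]) -> List[int]:
    """Reorder only the slot directory so that reading slots in order yields ascending keys.
    The rows themselves stay where they are in the page."""
    return sorted(range(len(page)), key=lambda slot: page[slot][0])

def n_way_merge(runs: List[List[Row]], n: int) -> List[Row]:
    """Merge sorted runs N at a time, stage by stage, until a single sorted run remains.
    heapq.merge (a binary heap) stands in for the tournament method."""
    if n < 2:
        raise ValueError("fan-in must be at least 2")
    if not runs:
        return []
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + n], key=lambda row: row[0]))
                for i in range(0, len(runs), n)]
    return runs[0]

if __name__ == "__main__":
    pages = [[(30, "c"), (10, "a")], [(20, "b")], [(50, "e"), (40, "d")]]
    # Each slot-sorted page becomes one sorted run for the merge stages
    runs = [[page[s] for s in slot_sort(page)] for page in pages]
    print(n_way_merge(runs, 2))
```

With fan-in 2, three runs need two merge stages; a larger N reduces the number of stages at the cost of more buffers.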
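[0058]
Fig. 17 is a flowchart of code generation 222.
In step 2220, it is checked whether there is only one processing procedure candidate. If not, the process proceeds to step 2221; if so, to step 2223.
In step 2221, optimization information, such as column value distribution information, is embedded in the processing procedures.
In step 2222, a data structure is created for selecting a processing procedure on the basis of the constants substituted at query execution time.
In step 2223, the processing procedures are expanded into executable form. Processing then ends.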
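[0059]
Fig. 18 is a flowchart of query execution processing 410.
First, dynamic optimization 223 determines the processing procedure to be executed by each node group on the basis of the substituted constants.
Next, code interpretation/execution 224 interprets and executes that processing procedure. Processing then ends.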
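[0060]
Fig. 19 is a flowchart of dynamic optimization 223.
In step 22300, dynamic load control processing is executed.
In step 22301, it is checked whether only a single processing procedure has been created. If so, processing ends; if not, the process proceeds to step 22302.
In step 22302, the selectivity is calculated from the substituted constants.
In step 22303, it is checked whether the processing procedure candidates include parallel processing procedures. If so, the process proceeds to step 22304; if not, to step 22308.
In step 22304, optimization information (column value distribution information of the join columns, the numbers of rows and pages of the tables accessed, and so on) is read from the dictionary.
In step 22305, the processing time for data retrieval/data distribution is calculated, taking the characteristics of each part of the system into account.
In step 22306, the number of nodes p to be allocated to join processing is determined from that processing time.
In step 22307, data distribution information is created from the optimization information.
In step 22308, a processing procedure is selected according to the access path selection criteria, and processing ends.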
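[0061]
Fig. 20 is a flowchart of code interpretation/execution 224.
In step 22400, it is checked whether the processing is data retrieval/data distribution. If so, the process proceeds to step 22401; if not, to step 22405.
In step 22401, the database is accessed and the conditional expressions are evaluated.
In step 22402, the data is placed in the buffer for each node according to the data distribution information.
In step 22403, it is checked whether the buffer for that node is full. If so, the process proceeds to step 22404; if not, to step 22420.
In step 22404, the data is transferred in page format to the corresponding node, and the process proceeds to step 22420.
In step 22405, it is checked whether the processing is slot sort processing. If so, the process proceeds to step 22406; if not, to step 22409.
In step 22406, page-format data is received from other nodes.
In step 22407, slot sort processing is executed.
In step 22408, the slot sort result is saved temporarily, and the process proceeds to step 22420.
In step 22409, it is checked whether the processing is N-way merge processing. If so, the process proceeds to step 22410; if not, to step 22412.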
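[0062]
In step 22410, N-way merge processing is executed.
In step 22411, the N-way merge result is saved temporarily, and the process proceeds to step 22420.
In step 22412, it is checked whether the processing is matching processing. If so, the process proceeds to step 22413; if not, to step 22416.
In step 22413, the two sorted lists are matched against each other, and the data is placed in the output buffer.
In step 22414, it is checked whether the output buffer is full. If so, the process proceeds to step 22415; if not, to step 22420.
In step 22415, the data is transferred in page format to the requested-data output node, and the process proceeds to step 22420.
In step 22416, it is checked whether the processing is requested-data output processing. If so, the process proceeds to step 22417; if not, to step 22420.
In step 22417, it is checked whether page-format data is being transferred from other nodes. If so, the process proceeds to step 22418; if not, to step 22419.
In step 22418, page-format data is received from other nodes.
In step 22419, the query processing result is output to the application program.
In step 22420, it is checked whether execution is taking place on a BES. If so, the process proceeds to step 22421; if not, processing ends.
In step 22421, information for estimating the processing load, such as the number of pages accessed, the number of hit rows, and the number of communications, is reported to the FES, and processing ends.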
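Steps 22401 to 22404 (route each qualifying row into a per-node buffer and ship a page when the buffer fills) can be sketched in Python as below. The form of the data distribution information is an assumption: a sorted list of inclusive upper keys, one per join node. Flushing partially filled buffers at the end of the scan is also an assumption, since Fig. 20 does not show it. Of the load information reported in step 22421, the sketch counts only communications. Names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, str]  # (key, payload)

def distribute(rows: Iterable[Row], upper_bounds: List[int],
               rows_per_page: int) -> Tuple[List[Tuple[int, List[Row]]], Dict[str, int]]:
    """upper_bounds[i] is the inclusive upper key of node i; larger keys go to the last node."""
    buffers: Dict[int, List[Row]] = defaultdict(list)
    sent: List[Tuple[int, List[Row]]] = []          # (destination node, page) in send order
    for key, payload in rows:                       # rows that passed step 22401
        node = min(bisect_left(upper_bounds, key), len(upper_bounds) - 1)
        buffers[node].append((key, payload))        # step 22402
        if len(buffers[node]) == rows_per_page:     # step 22403: buffer full?
            sent.append((node, buffers.pop(node)))  # step 22404: ship one page
    for node in sorted(buffers):                    # assumed end-of-scan flush
        sent.append((node, buffers[node]))
    return sent, {"communications": len(sent)}      # part of the step-22421 report
```

For example, with bounds `[10, 20]` and two rows per page, the keys 5, 15, 7, 12, 1 produce three page transfers: two full pages (one per node) during the scan and a final partial page for node 0.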
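[0063]
Fig. 21 is a flowchart of data load processing 210.
Before the individual steps are described, the underlying concepts are explained.
There are three data load methods: target-response-time-oriented data placement, which keeps the time required to scan the whole table within a fixed time; expected-parallelism-oriented data placement, which is optimized for m-page access; and user-specified data placement, in which the user fully specifies the volume division.
In target-response-time-oriented data placement, the number of pages required to store all the rows of the table is first determined. Next, the upper limit on the number of pages stored on the disks of each division accessible in parallel is determined. Access assumes batch input (for example, 10 pages) where needed. The load distribution is determined according to the combination of the numbers of disks, IOSs, and BESs. When key range division is used, each key range division interval is subdivided by the upper page limit, and the pieces are stored on the disks of the respective divisions. This key range division is described in detail later with reference to Fig. 23.
In expected-parallelism-oriented data placement, the result depends on the size of m, which should preferably be fairly large. When key range division is used, the number of pages stored per sub-key range within each key range division unit, s (= m/p), is determined from m and the expected degree of parallelism p, and the data is stored on the disks of the respective divisions in units of s pages.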
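[0064]
The expected degree of parallelism p is calculated as the square root of the ratio of the response time to the per-node overhead. A ratio of 10 gives an expected degree of parallelism of 3, 100 gives 10, 1000 gives 32, and 10000 gives 100. If the calculated p exceeds the number of predefined divisions, the number of predefined divisions is adopted (because the number of predefined divisions determines the maximum number of disks that can be processed). Otherwise, p is adopted as the number of divisions, with the number of predefined divisions as the upper limit.
As a concrete example, consider data placement optimized for 100-page access, assuming batch input of 10 pages. The I/O time is 300 ms (ten 10-page accesses at 30 ms each), and the overhead per I/O is 5.6 ms (56 K steps at 10 MIPS), so the expected degree of parallelism p is about 7 (= √(300/5.6)). Sub-key range division is therefore performed every s = 14 (= 100/7) pages.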
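The rule for p and s in [0064] can be written as a short Python sketch. Rounding p to the nearest integer and rounding s down to whole pages are assumptions that happen to reproduce every figure in the text; the cap of 8 predefined divisions in the usage lines is hypothetical (with the 4 predefined divisions of Fig. 2, p would instead be capped at 4).

```python
import math

def expected_parallelism(response_ms: float, overhead_ms: float,
                         predefined_divisions: int) -> int:
    """p = sqrt(response time / per-node overhead), capped at the number of predefined divisions."""
    p = round(math.sqrt(response_ms / overhead_ms))
    return max(1, min(p, predefined_divisions))

def subkey_range_pages(m: int, p: int) -> int:
    """s = m / p, rounded down to whole pages (rounding rule assumed)."""
    return max(1, m // p)

if __name__ == "__main__":
    for ratio in (10, 100, 1000, 10000):
        print(ratio, expected_parallelism(ratio, 1.0, 10_000))  # 3, 10, 32, 100
    p = expected_parallelism(300.0, 5.6, 8)   # 8 predefined divisions: hypothetical
    print(p, subkey_range_pages(100, p))      # 7 14
```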
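User-specified data placement is the same as in conventional database management systems: data is placed and managed exactly as specified by the setting parameters.
[0065]
In step 21000, it is checked whether the placement is target-response-time-oriented. If not, the process proceeds to step 21001; if so, to step 21003.
In step 21001, it is checked whether the placement is expected-parallelism-oriented. If not, the process proceeds to step 21002; if so, to step 21010.
In step 21002, it is checked whether there is a user specification. If so, the process proceeds to step 21016; if not, processing ends.
In step 21003, the number of pages required to store all the rows of the table is determined.
In step 21004, the upper limit on the number of pages stored on each disk accessible in parallel is determined so that the time required to scan the table is constant.
In step 21005, the BESs, IOSs, and disk groups that satisfy these requirements are determined.
In step 21006, it is checked whether key range division is used. If so, the process proceeds to step 21007; if not, to step 21009.
In step 21007, the key range division intervals are subdivided by the upper page limit.
In step 21008, data is inserted according to the key range division intervals, and processing ends.
In step 21009, data is inserted in segments delimited by the upper page limit, and processing ends.
In step 21010, the optimal number of page accesses m is calculated from the estimated workload.
In step 21011, the expected degree of parallelism p is calculated, and the BESs, IOSs, and disk groups are determined according to p.
In step 21012, it is checked whether key range division is used. If so, the process proceeds to step 21013; if not, to step 21015.
In step 21013, the number of pages stored per sub-key range, s (= m/p), is calculated.
In step 21014, the key ranges are divided into sub-key ranges in units of s pages, data is inserted into each disk, and processing ends.
In step 21015, data is inserted in segments of s pages, and processing ends.
In step 21016, data is inserted into the disks managed by the user-specified IOS, and processing ends.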
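Steps 21013 and 21014 can be sketched in Python as follows. The sketch assumes that the size of each key range is already known in pages and that the resulting sub-key ranges are dealt to the divisions round-robin; the text says only that data is stored in s-page units on the disks of each division, so the dealing order is an assumption, as are the names.

```python
from typing import Dict, List, Tuple

def subdivide_key_ranges(range_pages: Dict[str, int], s: int,
                         num_divisions: int) -> List[Tuple[str, int, int, int]]:
    """Cut each key range into sub-key ranges of at most s pages (step 21014).
    Returns (key range, first page, last page, division) tuples."""
    if s < 1 or num_divisions < 1:
        raise ValueError("s and the number of divisions must be positive")
    placement: List[Tuple[str, int, int, int]] = []
    chunk = 0
    for key_range, pages in range_pages.items():   # key ranges in key order
        for first in range(0, pages, s):
            last = min(first + s, pages) - 1
            placement.append((key_range, first, last, chunk % num_divisions))
            chunk += 1                              # assumed round-robin over divisions
    return placement
```

With s = 14 (the [0064] example), a 30-page key range becomes sub-key ranges of 14, 14, and 2 pages on three different divisions.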
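[0066]
Fig. 22 is a flowchart of dynamic load control processing 211.
In step 21100, the presence or absence of load imbalance (access concentration or dispersion) is detected. That is, the bottleneck resources (processors (BES, IOS) and disks) are detected from the distribution of the DB processing load executed per unit time on each node (the number of processing steps for DB processing, I/O processing, and communication processing; processor performance, converted to processing time; and the number of I/Os, converted to I/O time); the DB processing is broken down into SQL statements; and the access status of each resource is classified by table. If a load imbalance is detected, the process proceeds to step 21101; otherwise, processing ends.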
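[0067]
In step 21101, it is determined from the access distribution information whether to add or remove BESs, or to add or remove IOS/disk pairs. If an addition or removal is needed, the process proceeds to step 21102; if not, processing ends.
In step 21102, it is checked whether the change is an addition. If so, the process proceeds to step 21103; if not, to step 21112.
In step 21103, it is checked whether the system is online. If so, the process proceeds to step 21104; if not, to step 21105.
In step 21104, the key ranges of the tables managed by the target BES group are blocked.
In step 21105, a new BES is allocated.
In step 21106, lock information and directory information are taken over.
In step 21107, the DS 71 is instructed to rewrite the dictionary information required for node distribution control.
In step 21108, it is checked whether an IOS exists. If not, the process proceeds to step 21109; if so, to step 21110. This step is included so that the same software can support both system configurations, with and without IOSs.
In step 21109, data is moved from the target BES group to the new BES group.
In step 21110, it is checked whether the system is online. If so, the process proceeds to step 21111; if not, processing ends.
In step 21111, the blocking of the key ranges of the tables managed by the target BES group is released, and processing ends.
In step 21112, it is checked whether the system is online. If so, the process proceeds to step 21113; if not, to step 21114.
In step 21113, the key ranges of the tables managed by the target BES group are blocked.
In step 21114, the BES to be removed is determined.
In step 21115, lock information and directory information are taken over.
In step 21116, the DS 71 is instructed to rewrite the dictionary information required for node distribution control.
In step 21117, it is checked whether an IOS exists. If not, the process proceeds to step 21118; if so, to step 21119.
In step 21118, data is evicted from the BES group being removed.
In step 21119, it is checked whether the system is online. If so, the process proceeds to step 21120; if not, processing ends.
In step 21120, the blocking of the key ranges of the tables managed by the target BES group is released, and processing ends.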
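The add and remove branches of Fig. 22 can be sketched in Python against a toy cluster model. Everything here is hypothetical: a BES is represented only by the predefined divisions whose directory information it holds, blocking is a set of division numbers, the dictionary rewrite and data movement are recorded as log entries, and a newly added BES is assumed to take over the upper half of the donor's divisions. The sketch shows the ordering the flowchart prescribes (block, take over, rewrite the dictionary, move data only when there is no IOS, unblock), not real node management.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Cluster:
    directory: Dict[str, List[int]]   # BES name -> predefined divisions it holds directory info for
    has_ios: bool
    online: bool
    blocked: Set[int] = field(default_factory=set)
    log: List[str] = field(default_factory=list)

def _block(c: Cluster, divisions: Set[int]) -> None:
    if c.online:                                    # steps 21103 / 21112
        c.blocked |= divisions
        c.log.append(f"block {sorted(divisions)}")

def _unblock(c: Cluster, divisions: Set[int]) -> None:
    if c.online:                                    # steps 21110 / 21119
        c.blocked -= divisions
        c.log.append(f"unblock {sorted(divisions)}")

def add_bes(c: Cluster, donor: str, new: str) -> None:
    owned = c.directory[donor]
    _block(c, set(owned))                           # 21104
    half = len(owned) // 2
    c.directory[donor] = owned[:half]               # 21105-21106: new BES takes over
    c.directory[new] = owned[half:]                 # lock and directory information
    c.log.append(f"dictionary {owned[half:]} -> {new}")   # 21107 (instruction to the DS)
    if not c.has_ios:                               # 21108-21109
        c.log.append(f"move data {owned[half:]} {donor} -> {new}")
    _unblock(c, set(owned))                         # 21111

def remove_bes(c: Cluster, victim: str, heir: str) -> None:
    moved = c.directory[victim]
    affected = set(moved) | set(c.directory[heir])
    _block(c, affected)                             # 21113
    c.directory[heir] = sorted(c.directory[heir] + moved)  # 21114-21115
    del c.directory[victim]
    c.log.append(f"dictionary {moved} -> {heir}")   # 21116
    if not c.has_ios:                               # 21117-21118
        c.log.append(f"evict data {moved} {victim} -> {heir}")
    _unblock(c, affected)                           # 21120
```

With an IOS present, adding BES 732 next to BES 731 only rewrites directory ownership (divisions 3 and 4 move), which is the no-data-movement case described in [0018] and [0030].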
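[0068]
Fig. 23 is a conceptual diagram of data load processing using key range division.
The number of predefined divisions is assumed to be 4, and the column values v1 to v6 of the database are assumed to occur with the frequencies shown in Fig. 23.
At initial data load, only one BES, 731, is required. When the pages to be stored are assigned to the disks of divisions 810 to 840, each up to the upper page limit, column values v1 to v2 are stored on the disks of division 810, v2 to v3 on the disks of divisions 820 and 830, v3 to v5 on the disks of division 840, and v5 to v6 on other disk groups. At initial data load, directory information is created for each disk to manage the pages stored on it.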
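The fill-up-to-the-limit placement of [0068] can be sketched in Python as below. The page counts in the usage note are illustrative only and are not taken from Fig. 23; the function name and the "overflow" group (standing in for "other disk groups") are hypothetical. The sketch shows how a single column value can span two divisions, as v2 to v3 does across divisions 820 and 830.

```python
from typing import Dict, List, Tuple

def fill_divisions(value_pages: List[Tuple[str, int]], divisions: List[str],
                   limit: int) -> Dict[str, List[Tuple[str, int]]]:
    """Walk column values in key order and fill each division's disks up to `limit` pages.
    Pages that do not fit in the listed divisions go to an 'overflow' group."""
    placement: Dict[str, List[Tuple[str, int]]] = {d: [] for d in divisions}
    placement["overflow"] = []
    idx, used = 0, 0
    for value, pages in value_pages:
        while pages > 0:
            if idx == len(divisions):
                placement["overflow"].append((value, pages))
                break
            take = min(pages, limit - used)
            placement[divisions[idx]].append((value, take))
            pages -= take
            used += take
            if used == limit:          # division full: move on to the next one
                idx, used = idx + 1, 0
    return placement
```

For example, with illustrative counts v1: 6, v2: 14, v3: 7, v4: 5, v5: 3 pages and a limit of 8 pages per division, v2 spreads over divisions 810, 820, and 830, and v5 lands in the overflow group.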
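During database access, when BESs 732 to 734 are used according to the load, each BES accesses the database using the per-disk directory information corresponding to it.
The processes described above may be implemented in combination with the following embodiments.
[0069]
To make it easy to move rows between nodes, row identifiers do not include location information such as the BES. A BES identifies the physical location of a row by combining the row identifier with directory information that identifies the division position of the table. Row movement is handled by rewriting directory information. With a structure that supports reorganization and row movement, processing can be divided by taking over directory information and lock information even when BESs are added dynamically.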
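The location-free row identifier of [0069] can be illustrated with a small Python sketch. The identifier layout (table, logical page, slot) and the directory form (logical page mapped to disk and physical page) are assumptions for illustration; the point is that moving a page rewrites one directory entry while every row identifier stays valid.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class RowId:
    table: str
    page: int   # logical page within the table; no BES, IOS, or disk location
    slot: int

# Hypothetical directory: (table, logical page) -> (disk, physical page)
Directory = Dict[Tuple[str, int], Tuple[int, int]]

def locate(rid: RowId, directory: Directory) -> Tuple[int, int, int]:
    """Combine the row identifier with directory information to get the physical position."""
    disk, phys_page = directory[(rid.table, rid.page)]
    return disk, phys_page, rid.slot

def move_page(directory: Directory, table: str, page: int,
              new_disk: int, new_phys_page: int) -> None:
    """Row movement is a directory rewrite; row identifiers are untouched."""
    directory[(table, page)] = (new_disk, new_phys_page)
```

Because the BES resolves every access through its directory, handing that directory to another BES (together with the lock information) is enough to transfer responsibility for the rows.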
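When the database is managed with replicas, twice the storage area is required. Whether or not the primary copy and the backup copy are managed by the same IOS and BES, the access load on the disks roughly doubles, so the number of volumes per division managed under the predefined divisions should be halved.
[0070]
In addition, when a disk, IOS, BES, or similar component fails, it is disconnected from online processing and reconnected after recovery. The blocking management method differs by node type. When a disk fails, the key ranges stored on that disk are blocked; if a backup copy exists (the backup copy must be obtained under the management of the same IOS (mirror disk) or another IOS (data replica)), processing is redirected to it. When an IOS fails, the key ranges stored under that IOS are blocked; if a backup copy exists (the backup copy must be obtained under the management of another IOS (data replica)), processing is redirected to it. When a BES fails, the key ranges managed by that BES are blocked; if an IOS exists, a new BES is allocated, the lock information is taken over, the dictionary information required for node distribution control is rewritten, and processing then continues.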
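[0071]
The present invention is not limited to the combined use of statistics-based rules and cost evaluation. It can also be applied to database management systems that perform other kinds of optimization, such as cost evaluation only, rules only, or combined cost evaluation and rules, provided that processing procedures yielding suitable database reference characteristic information can be obtained.
The present invention can be implemented through the software system of a large computer with a tightly or loosely coupled multiprocessor system, or through a tightly or loosely coupled composite processor system in which a dedicated processor is provided for each processing unit. It is also applicable to a single-processor system, provided that parallel processes are assigned to the individual processing procedures.
It is also applicable to configurations in which multiple processors share multiple disks with one another.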
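[0072]
[Effects of the Invention]
With the database division management method of the present invention, the system configuration matches the load, the expected degree of parallelism is obtained, and high-speed queries are realized.
With the parallel database system of the present invention, a scalable parallel database system is obtained that changes its configuration to match the load at all times, even when the load fluctuates.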
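[Brief Description of the Drawings]
[Fig. 1] A configuration diagram of a parallel database system according to an embodiment of the present invention.
[Fig. 2] A conceptual diagram of the database division management method of the present invention.
[Fig. 3] A conceptual diagram of optimal node allocation by the database division management method of the present invention (with IOS).
[Fig. 4] A conceptual diagram of optimal node allocation by the database division management method of the present invention (without IOS).
[Fig. 5] A configuration diagram of the FES.
[Fig. 6] A configuration diagram of the BES.
[Fig. 7] A configuration diagram of the IOS.
[Fig. 8] A configuration diagram of the DS.
[Fig. 9] A configuration diagram of the JS.
[Fig. 10] A flowchart of the processing of the system control unit.
[Fig. 11] A flowchart of query analysis processing.
[Fig. 12] A flowchart of query parsing.
[Fig. 13] A flowchart of static optimization processing.
[Fig. 14] A flowchart of predicate selectivity estimation.
[Fig. 15] A flowchart of access path pruning.
[Fig. 16] A flowchart of processing procedure candidate generation.
[Fig. 17] A flowchart of code generation.
[Fig. 18] A flowchart of query execution processing.
[Fig. 19] A flowchart of dynamic optimization.
[Fig. 20] A flowchart of code interpretation/execution.
[Fig. 21] A flowchart of data load processing.
[Fig. 22] A flowchart of dynamic load control processing.
[Fig. 23] A conceptual diagram of dynamic load control.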
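[Description of Reference Numerals]
1 ... parallel database system
10, 11 ... application programs
20 ... database management system
21 ... system control unit
210 ... data load processing, 211 ... dynamic load control processing
22 ... logical processing unit
220 ... query parsing, 221 ... static optimization processing, 222 ... code generation
223 ... dynamic optimization processing, 224 ... code interpretation/execution
30 ... operating system, 40 ... database
70 ... IOS, 71 ... DS
72 ... JS, 73 ... BES
75 ... FES, 80, 81, 82 ... disks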
９０．．．相互結合ネットワーク[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a database division management method and a parallel database system, and more particularly, to a database division management method and a parallel database system for optimizing the number of processors or disks performing database processing according to the load.
[0002]
[Prior art]
For example, "David DeWitt and Jim Gray: Parallel Database Systems: The Future of High Performance Database Systems, CACM, Vol. 35, No. 6, 1992, which are parallel databases, are proposed in Systems, CACM, Vol. 35, No. 6, 1992.
In this parallel database system, a plurality of processors are connected in a tightly coupled or loosely coupled manner, and database processing is distributed to the plurality of processors.
[0003]
[Problems to be solved by the invention]
The system configuration of the conventional parallel database system is left to the user and is fixed.
For this reason, the system configuration may become incompatible with the load from the beginning or may become incompatible in the middle, so that the desired degree of parallelism cannot be obtained, and a high-speed inquiry cannot be realized.
Accordingly, an object of the present invention is to provide a database division management method and a parallel database system that can obtain an expected degree of parallelism and can realize a high-speed inquiry.
[0004]
[Means for Solving the Problems]
In a first aspect, the present invention provides an FES node having a function of analyzing, optimizing, and creating a processing procedure from a query from a user, and accessing a database based on the processing procedure created by the FES node. In a parallel database system in which a BES node having a function and an IOS node having a disk and having a function of storing and managing a database on the disk are connected by a network, the FES node is connected to the FES node according to the load pattern of the database processing. There is provided a database division management method characterized by determining the number of processors to be allocated, the number of processors to be allocated to a BES node, the number of processors to be allocated to an IOS node, the number of disks of an IOS node, and the number of divided disks.
[0005]
Further, an FES node having a function of analyzing, optimizing, and creating a processing procedure from a query from a user, a function of accessing a database based on the processing procedure created by the FES node, and a disk having the disk In a parallel database system in which a BES node having a function of storing and managing a database is connected by a network, the number of processors to be assigned to the FES node, the number of processors to be assigned to the BES node, A database division management method is provided, wherein the number of disks of a BES node and the number of divisions of a disk are determined.
[0006]
In the database division management method according to the first aspect, the number of processors, the number of disks, and the number of disks to be allocated to each node are divided according to the load pattern of the database processing (such as a single search process, a single update process, and a data retrieval process). Determine the number and. Therefore, the system configuration is adapted to the load, the expected degree of parallelism can be obtained, and a high-speed inquiry can be realized.
[0007]
In a second aspect, the present invention provides an FES node having a function of analyzing, optimizing, and creating a processing procedure from a query from a user, and accessing a database based on the processing procedure created by the FES node. In a parallel database system in which a BES node having a function and an IOS node having a disk and having a function of storing and managing a database in the disk are connected via a network, a parallel system that keeps the time required for scanning the database constant. The upper limit of the number of accessible pages is determined, and according to the upper limit of the number of pages, the number of processors allocated to the FES node, the number of processors allocated to the BES node, the number of processors allocated to the IOS node, the number of disks of the IOS node, Database for determining the number of disk divisions To provide the split management method.
[0008]
Further, an FES node having a function of analyzing, optimizing, and creating a processing procedure from a query from a user, a function of accessing a database based on the processing procedure created by the FES node, and a disk having the disk In a parallel database system in which a BES node having a function of storing and managing a database is connected via a network, the upper limit of the number of pages that can be accessed in parallel is determined so that the time required for scanning the database is constant. The number of processors to be allocated to the FES node, the number of processors to be allocated to the BES node, the number of disks of the BES node, and the number of disk partitions are determined according to the upper limit of the database partition management method.
[0009]
In the database division management method according to the second aspect, the number of processors allocated to each node, the number of disks, and the number of disk divisions are determined according to the upper limit of the number of pages that can be accessed in parallel while keeping the time required to scan the database constant. High-speed queries can therefore be realized.
[0010]
In a third aspect, the present invention provides a database division management method for a parallel database system in which an FES node having a function of analyzing and optimizing a query from a user and creating a processing procedure for it, a BES node having a function of accessing a database based on the processing procedure created by the FES node, and an IOS node having a disk and a function of storing and managing the database on the disk are connected via a network. In this method, an expected parallelism p is calculated based on the load pattern, and the number of processors assigned to the FES node, the number of processors assigned to the BES node, the number of processors assigned to the IOS node, the number of disks of the IOS node, and the number of disk divisions are determined according to p.
[0011]
Also provided is a database division management method for a parallel database system in which an FES node having a function of analyzing and optimizing a query from a user and creating a processing procedure for it, and a BES node having a function of accessing a database based on the processing procedure created by the FES node, a disk, and a function of storing and managing the database on the disk, are connected via a network. In this method, an expected parallelism p is calculated based on the load pattern, and the number of processors assigned to the FES node, the number of processors assigned to the BES node, the number of disks of the BES node, and the number of disk divisions are determined according to p.
[0012]
In the database division management method according to the third aspect, the number of processors allocated to each node, the number of disks, and the number of disk divisions are determined according to the expected parallelism p calculated from the load pattern. The expected degree of parallelism can therefore be obtained.
[0013]
In a fourth aspect, the present invention provides the database division management method having the above configuration, in which the optimum number of page accesses m is calculated and, when the key range is divided, the number of pages s (= m / p) to be stored per sub-key range is calculated; the key range is then divided into sub-key ranges of s pages each, and the data is inserted into the disks.
In the database division management method according to the fourth aspect, the key range is divided into sub-key ranges of s (= m / p) pages, calculated from the expected parallelism p and the optimum number of page accesses m, and the data is inserted into the disks accordingly. Data can thus be divided and managed almost evenly.
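The sub-key range division of the fourth aspect can be sketched as follows. This is a minimal illustration, assuming the pages of one key range are already ordered by key, that s is obtained by integer division of m by p (and is at least 1), and that successive sub-key ranges are placed on disks round-robin; the function names and the round-robin placement are hypothetical and are not taken from the source.

```python
def sub_key_ranges(pages, m, p):
    """Split the key-ordered pages of one key range into chunks of s = m / p pages."""
    s = max(1, m // p)  # stored pages per sub-key range (integer division is an assumption)
    return [pages[i:i + s] for i in range(0, len(pages), s)]


def place_on_disks(pages, m, p, n_disks):
    """Insert each sub-key range into a disk; the round-robin placement is hypothetical."""
    placement = {disk: [] for disk in range(n_disks)}
    for i, chunk in enumerate(sub_key_ranges(pages, m, p)):
        placement[i % n_disks].extend(chunk)
    return placement


# 16 pages, optimum page accesses m = 8, expected parallelism p = 4, so s = 2
chunks = sub_key_ranges(list(range(16)), m=8, p=4)
layout = place_on_disks(list(range(16)), m=8, p=4, n_disks=4)
print(chunks[:3], layout[0])
```

With these inputs each sub-key range holds two pages, and disk 0 receives the first and fifth sub-key ranges (pages 0, 1, 8, and 9).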
[0014]
In a fifth aspect, the present invention provides a database division management method for a parallel database system in which an FES node having a function of analyzing and optimizing a query from a user and creating a processing procedure for it, a BES node having a function of accessing a database based on the processing procedure created by the FES node, and an IOS node having a disk and a function of storing and managing the database on the disk are connected via a network. In this method, a load imbalance is detected based on load information acquired during query execution, such as the number of pages accessed, the number of hit rows, and the number of communications, and the number of processors allocated to the FES node, the number of processors allocated to the BES node, the number of processors allocated to the IOS node, and the number of disks of the IOS node are changed so as to eliminate the imbalance.
[0015]
Also provided is a database division management method for a parallel database system in which an FES node having a function of analyzing and optimizing a query from a user and creating a processing procedure for it, and a BES node having a function of accessing a database based on the processing procedure created by the FES node, a disk, and a function of storing and managing the database on the disk, are connected via a network. In this method, a load imbalance is detected based on load information acquired during query execution, such as the number of pages accessed, the number of hit rows, and the number of communications, and the number of processors allocated to the FES node, the number of processors allocated to the BES node, and the number of disks of the BES node are changed so as to eliminate the imbalance.
[0016]
In the database division management method according to the fifth aspect, a load imbalance is detected during query execution, and the number of processors or disks of each node is changed so as to eliminate it. Even when the load changes, the system configuration therefore stays adapted to the load, the expected degree of parallelism can be obtained, and high-speed queries can be realized.
[0017]
In a sixth aspect, the present invention provides the database division management method having the above configuration, in which, when the number of processors allocated to the BES node, or the number of processors or disks allocated to the IOS node, is increased, the key range of the database table managed by the processor or disk to be added is blocked if the system is online, a new processor or disk is allocated, lock information and directory information are taken over, the dictionary information required for node distribution control is rewritten, and the blockage is then released if the system is online.
Also provided is the database division management method having the above configuration, in which, when the number of processors or disks allocated to the BES node is increased, the key range of the database table managed by the processor or disk to be added is blocked if the system is online, a new processor or disk is allocated, lock information and directory information are taken over, the dictionary information required for node distribution control is rewritten, the data is moved from the disk group targeted by the addition to the new disk group, and the blockage is then released if the system is online.
[0018]
In the database division management method according to the sixth aspect, when processors or disks are added or deleted while the system is online, the key range of the table is blocked, the takeover processing is performed, and the blockage is then released. This minimizes overhead. In a system configuration with an IOS node, the processing can be taken over without moving the data.
[0019]
In a seventh aspect, the present invention provides the database division management method having the above configuration, in which, when the number of processors allocated to the BES node, or the number of processors or disks allocated to the IOS node, is reduced, the key range of the database table managed by the processor or disk to be deleted is blocked if the system is online, the processor or disk to be deleted is determined, lock information and directory information are taken over, the dictionary information required for node distribution control is rewritten, and the blockage is then released if the system is online.
[0020]
Also provided is the database division management method having the above configuration, in which, when the number of processors or disks allocated to the BES node is reduced, the key range of the database table managed by the processor or disk to be deleted is blocked if the system is online, the processor or disk to be deleted is determined, lock information and directory information are taken over, the dictionary information required for node distribution control is rewritten, the data is moved from the disk group being deleted to the takeover disk group, and the blockage is then released if the system is online.
[0021]
In the database division management method according to the seventh aspect, when processors or disks are added or deleted while the system is online, the key range of the table is blocked, the takeover processing is performed, and the blockage is then released. This minimizes overhead. In a system configuration with an IOS node, the processing can be taken over without moving the data.
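The online addition and deletion steps of the sixth and seventh aspects can be sketched as one shared takeover routine. This is a minimal, single-threaded illustration under stated assumptions: divisions stand in for key ranges, the hypothetical `owner` mapping stands in for both the lock/directory information held by each BES and the dictionary information used for node distribution control, and handing half of a BES's divisions to a new BES is an assumption chosen to match the FIG. 2 example. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    owner: dict                                  # hypothetical: division -> BES name (directory + dictionary info)
    has_ios: bool = True                         # True: disks sit behind a shared IOS node
    blocked: set = field(default_factory=set)    # key ranges currently blocked
    moves: list = field(default_factory=list)    # data moves required (only without an IOS node)

    def divisions_of(self, bes):
        return {d for d, b in self.owner.items() if b == bes}

    def _take_over(self, divisions, dst, online):
        if online:
            self.blocked |= divisions            # block the affected key ranges
        for d in sorted(divisions):
            src = self.owner[d]
            self.owner[d] = dst                  # hand over lock/directory info, rewrite dictionary
            if not self.has_ios:
                self.moves.append((d, src, dst)) # without an IOS node, the data itself must move
        if online:
            self.blocked -= divisions            # release the blockage

    def add_bes(self, src, new, online=True):
        mine = sorted(self.divisions_of(src))
        self._take_over(set(mine[len(mine) // 2:]), new, online)  # assumption: split in half

    def delete_bes(self, victim, takeover, online=True):
        self._take_over(self.divisions_of(victim), takeover, online)


c = Cluster(owner={1: "BES731", 2: "BES731", 3: "BES731", 4: "BES731"})
c.add_bes("BES731", "BES732")       # BES732 now holds divisions 3 and 4
c.delete_bes("BES732", "BES731")    # BES731 takes everything back
```

In the IOS configuration only the mapping changes and `moves` stays empty, which mirrors the statement that data can be taken over without being moved; with `has_ios=False` each reassigned division is recorded as a data move before the blockage is released.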
[0022]
According to an eighth aspect, the present invention provides a parallel database system characterized by dynamically changing the number of processors or disks for performing database processing by the database division management method having the above configuration.
In the parallel database system according to the eighth aspect, a scalable parallel database system is provided by the operation of the fifth to seventh aspects.
[0023]
BEST MODE FOR CARRYING OUT THE INVENTION
Embodiments of the present invention are described in detail below with reference to the drawings. The present invention is not limited to these embodiments.
FIG. 1 is a configuration diagram illustrating a parallel database system 1 according to an embodiment of the present invention.
The parallel database system 1 has a configuration in which an FES (front end server) node, a BES (back end server) node, an IOS (input output server) node, a DS (dictionary server) node, and a JS (journal server) node are connected via a network 90. Each node is also connected to other systems.
The FES node consists of an FES 75 having at least one processor and no disk, and functions as a front-end server that analyzes and optimizes queries from users and creates processing procedures for them.
The BES node consists of at least one BES 73 having a processor and no disk, and has a function of accessing the database based on the processing procedure created by the FES 75.
The IOS node is a node including an IOS 70 composed of at least one processor and at least one disk 80, and has a function of storing and managing a database on the disk 80. If the BES node has the function of the IOS node, the IOS node can be omitted. In this case, a disk is connected to the BES 73, and the BES 73 has a function of storing and managing a database on the disk 80.
[0024]
The database is composed of a plurality of tables. The table has a two-dimensional table format, and includes a plurality of rows. One row is composed of one or more columns (attributes). This table is physically divided into fixed-length pages each consisting of a predetermined number of rows and stored on the disk 80. The storage position of each page on the disk 80 can be known using directory information.
The DS node is a node including a DS 71 composed of at least one processor and at least one disk 81, and has a function of collectively managing definition information of a database.
The JS node is a node including a JS 72 configured with at least one processor and at least one disk 82, and has a function of storing and managing the history of database updates executed by each node.
[0025]
FIG. 2 is a conceptual diagram of the database division management method of the present invention which determines the number of processors, the number of disks, and the number of divisions of the disks of the FES 75, BES 73, and IOS 70 in the parallel database system 1.
First, the load pattern of the database processing is determined from the workload specified by the user. Load patterns include single-item search, single-item update, data retrieval, and so on. According to the load pattern, it is determined into how many divisions the disks 80 managed by the IOS 70 are split (if the BES has the function of the IOS node, into how many divisions the disks managed by the BES 73 are split).
[0026]
That is, at schema definition time, the number of disks required for storage is determined from the table partitioning method (key value ranges, the number of rows per range converted to a number of pages, and so on), and once the unit of blockage and reorganization has been determined, the number of BESs 73 is determined as well (the unit of blockage or reorganization depends on the disk, the IOS, and the BES).
As a result, the numbers of BESs 73, IOSs 70, and disks 80 are determined, as follows:
Number of divisions: the database division unit, managed by all BESs, that can be accessed in parallel (BESs are dynamically added and deleted in units of this division)
Number of disks per division: the number of disks allocated to each division
FIG. 2 shows a case where the number of disks is 8, the number of divisions is 4, and the number of disks per division is 2.
If processor performance increases n-fold, the number of volumes used in each division is increased n-fold without changing the number of divisions (however, because the total data transfer rate between the IOS 70 and the disks 80 is limited, the number of disks that can be added is also limited).
Here, a disk corresponds to one disk device. In the present invention, however, a "disk" need not correspond one-to-one to a single disk device. For example, when one disk device contains a plurality of disk drives (a disk array device), the number of input/output units that can be accessed in parallel may be treated as the number of disks.
[0027]
Although FIG. 2 shows FES:BES:IOS:disk = 1:4:1:8, only one FES and one BES are required at initial data loading, giving FES:BES:IOS:disk = 1:1:1:8. At this stage, the BES 731 holds the directory information of the database stored on the disks 811 to 842 of divisions #1 to #4.
While the load on the BES is light and the single BES 731 can handle the IOS 70 and the eight disks 811 to 842, only the BES 731 accesses the database stored on those eight disks, and the ratio remains FES:BES:IOS:disk = 1:1:1:8.
When the load on the BES 731 increases, its utilization stays at 100%, and a load imbalance is detected, a BES 732 is added. Since the number of divisions is 4, two divisions are assigned to each of the two BESs 731 and 732: the BES 731 holds the directory information of the database stored on the disks 811 to 822 of divisions #1 and #2, and the BES 732 holds the directory information of the database stored on the disks 831 to 842 of divisions #3 and #4. The ratio becomes FES:BES:IOS:disk = 1:2:1:8.
[0028]
When the load on the BESs 731 and 732 increases further, their utilization stays at 100%, and a load imbalance is detected, BESs 733 and 734 are added alongside the BESs 731 and 732. Since the number of divisions is 4, one division is assigned to each of the four BESs: the BES 731 holds the directory information of the database stored on the disks 811 and 812 of division #1, the BES 732 that for the disks 821 and 822 of division #2, the BES 733 that for the disks 831 and 832 of division #3, and the BES 734 that for the disks 841 and 842 of division #4. The ratio becomes FES:BES:IOS:disk = 1:4:1:8.
[0029]
When the load decreases and the utilization of the BESs 733 and 734 stays below, for example, 50%, it is more effective to use the nodes allocated to them for other processing. The BESs 733 and 734 are therefore merged back, and the configuration is reduced to FES:BES:IOS:disk = 1:2:1:8.
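The scaling sequence of FIG. 2 (1, 2, then 4 BESs over 4 divisions, and back) can be sketched as below. This is a minimal illustration under assumptions not stated in the source: per-BES utilizations are already averaged over the observation period, the BES count doubles only when every BES is saturated, and it halves when the mean utilization falls below the example 50% threshold. Function names are hypothetical, and the BES names simply reuse the reference numerals 731 to 734.

```python
from statistics import mean


def assign_divisions(divisions, bes_count, first_id=731):
    """Give each BES an equal, contiguous block of divisions (only directory info moves).

    Assumes bes_count divides the number of divisions, as with 1, 2, or 4 BESs over 4 divisions.
    """
    per = len(divisions) // bes_count
    return {f"BES{first_id + i}": divisions[i * per:(i + 1) * per] for i in range(bes_count)}


def next_bes_count(current, utilizations, n_divisions, high=1.0, low=0.5):
    """Double the BES count when every BES is saturated; halve it when load is light."""
    if all(u >= high for u in utilizations) and current * 2 <= n_divisions:
        return current * 2
    if current > 1 and mean(utilizations) < low:
        return current // 2
    return current


count = next_bes_count(1, [1.0], n_divisions=4)   # BES 731 saturated, so 2 BESs
print(assign_divisions([1, 2, 3, 4], count))      # BES731 -> #1, #2; BES732 -> #3, #4
```

Because the number of divisions caps the BES count, a fully saturated 4-BES configuration stays at 4 here; beyond that point the text's other levers (more disks per division, more IOS capacity) would apply.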
[0030]
By increasing or decreasing the number of BESs according to the load in this way, a system that scales between FES:BES:IOS:disk = 1:1:1:8 and 1:4:1:8 can be realized.
The IOS 70 only needs to have parallel tasks for the number of disks that can be accessed in parallel, regardless of the correspondence between the BES 73 and the disks 80. Therefore, by moving directory information between BESs without moving data, the correspondence between the BES 73 and the disk 80 can be changed, and access separation and integration can be easily performed.
Next, the number of processors, the number of disks, and the number of disk divisions are illustrated with numerical examples for a single-item update load and for a data retrieval load.
The following assumptions are made:
FES processing (reception): 30 K steps
BES processing, single-item update: 60 K steps
BES processing, data retrieval: 220 K steps
Transmission processing: 6 K steps
Reception processing: 6 + 4 × (number of pages) K steps
Input/output issue processing: 4 + 4 × (number of pages) K steps
Processor performance: 10 M steps/s
Input/output performance: 20 ms for a 1-page access, 30 ms for a 10-page access
A. Single-item update (1-page access)
(1) System configuration with an IOS node
Dividing the processor performance of 10 M steps/s by the 30 K steps of FES processing shows that the FES can receive up to 333 requests per second.
[0031]
In the BES, a single-item update requires reception of the execution request from the FES (6 K steps) + transmission of the data retrieval request to the IOS (6 K steps) + reception of the retrieval result from the IOS (10 K steps) + the single-item update itself (60 K steps) + transmission of the result to the FES (6 K steps) = 88 K steps. Dividing the processor performance of 10 M steps/s by this gives up to 114 single-item updates per second.
In the IOS, each disk access requires reception of the input/output request from the BES (6 K steps) + input/output issue processing (8 K steps) + transmission of the input/output result (6 K steps) = 20 K steps. Dividing the processor performance of 10 M steps/s by this gives up to 500 disk accesses per second.
[0032]
Since a random input/output of one page takes 20 ms, one disk can be accessed up to 50 times per second. Dividing the IOS's 500 disk accesses per second by this figure shows that up to 10 disks can be mounted on one IOS.
Dividing the IOS's 500 disk accesses per second by the BES's 114 single-item updates per second shows that one IOS can support 4.3 BESs.
[0033]
Dividing the FES's 333 receptions per second by the BES's 114 single-item updates per second shows that one FES can support three BESs.
From the above, FES:BES = 1:3, BES:IOS = 4.3:1, and IOS:disk = 1:10. Overall, as shown in FIG. 3, the configuration is almost balanced at FES:BES:IOS:disk = 1:4:1:8 (with some imbalance at the FES and the disks).
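The sizing arithmetic for the configuration with an IOS node can be reproduced with a short calculation built only from the assumptions listed above. This is a sketch, not the patent's procedure: the function and key names are hypothetical, and the per-request step breakdowns are the ones stated in the text for the BES and the IOS.

```python
PROCESSOR_KSTEPS = 10_000  # 10 M steps/s, expressed in K steps/s
FES_KSTEPS = 30            # FES reception processing per request
SEND_KSTEPS = 6            # one transmission
RECV_REQUEST_KSTEPS = 6    # reception of a request (no pages attached)


def recv_result_ksteps(pages):
    return 6 + 4 * pages   # reception processing for a result carrying `pages` pages


def io_issue_ksteps(pages):
    return 4 + 4 * pages   # input/output issue processing


def size_with_ios(bes_work_ksteps, pages, io_ms):
    """Throughputs (per second) and balancing ratios for an FES/BES/IOS/disk configuration."""
    fes_rate = PROCESSOR_KSTEPS / FES_KSTEPS
    bes_ksteps = (RECV_REQUEST_KSTEPS + SEND_KSTEPS + recv_result_ksteps(pages)
                  + bes_work_ksteps + SEND_KSTEPS)
    ios_ksteps = RECV_REQUEST_KSTEPS + io_issue_ksteps(pages) + SEND_KSTEPS
    bes_rate = PROCESSOR_KSTEPS / bes_ksteps
    ios_rate = PROCESSOR_KSTEPS / ios_ksteps
    disk_rate = 1000 / io_ms  # accesses per second for one disk
    return {
        "bes_ksteps": bes_ksteps, "ios_ksteps": ios_ksteps,
        "bes_rate": bes_rate, "ios_rate": ios_rate,
        "bes_per_fes": fes_rate / bes_rate,
        "bes_per_ios": ios_rate / bes_rate,
        "disks_per_ios": ios_rate / disk_rate,
    }


update = size_with_ios(bes_work_ksteps=60, pages=1, io_ms=20)       # case A (1)
retrieval = size_with_ios(bes_work_ksteps=220, pages=10, io_ms=30)  # case B (1)
```

For case A this yields 88 and 20 K steps, about 114 updates/s and 500 disk accesses/s, and ratios of about 2.9 BESs per FES, 4.4 BESs per IOS, and 10 disks per IOS. The small differences from the text's 3 and 4.3 come from dividing unrounded rates rather than the rounded 114. Calling it with 220 K steps, 10 pages, and 30 ms reproduces case B: about 35 retrievals/s, 179 accesses/s, and ratios of 9.5, 5.1, and 5.4.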
[0034]
(2) System configuration in which the BES node has the function of the IOS node
Dividing the processor performance of 10 M steps/s by the 30 K steps of FES processing shows that the FES can receive up to 333 requests per second.
In the BES, a single-item update requires reception of the execution request from the FES (6 K steps) + input/output issue processing (8 K steps) + the single-item update itself (60 K steps) + transmission of the result to the FES (6 K steps) = 80 K steps. Dividing the processor performance of 10 M steps/s by this gives up to 125 single-item updates per second.
Since a random input/output of one page takes 20 ms, one disk can be accessed up to 50 times per second. Dividing the BES's 125 single-item updates per second by this figure shows that up to 2.5 disks can be mounted on one BES.
Dividing the FES's 333 receptions per second by the BES's 125 single-item updates per second shows that one FES can support 2.6 BESs.
[0035]
From the above, FES:BES = 1:2.6 and BES:disk = 1:2.5. Overall, as shown in FIG. 4, the configuration is almost balanced at FES:BES:disk = 1:4:8 (with some imbalance at the FES).
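When the BES drives its own disks, the IOS hop disappears from the step count. The sketch below, again with hypothetical names and only the listed assumptions, recomputes this case and the corresponding data retrieval case.

```python
PROCESSOR_KSTEPS = 10_000  # 10 M steps/s, expressed in K steps/s
FES_KSTEPS = 30            # FES reception processing per request
SEND_KSTEPS = 6            # one transmission
RECV_REQUEST_KSTEPS = 6    # reception of a request (no pages attached)


def io_issue_ksteps(pages):
    return 4 + 4 * pages   # input/output issue processing


def size_without_ios(bes_work_ksteps, pages, io_ms):
    """Throughputs and ratios when each BES stores and manages the database on its own disks."""
    fes_rate = PROCESSOR_KSTEPS / FES_KSTEPS
    bes_ksteps = RECV_REQUEST_KSTEPS + io_issue_ksteps(pages) + bes_work_ksteps + SEND_KSTEPS
    bes_rate = PROCESSOR_KSTEPS / bes_ksteps
    disk_rate = 1000 / io_ms  # accesses per second for one disk
    return {
        "bes_ksteps": bes_ksteps,
        "bes_rate": bes_rate,
        "bes_per_fes": fes_rate / bes_rate,
        "disks_per_bes": bes_rate / disk_rate,
    }


update = size_without_ios(bes_work_ksteps=60, pages=1, io_ms=20)       # case A (2)
retrieval = size_without_ios(bes_work_ksteps=220, pages=10, io_ms=30)  # case B (2)
```

Case A gives 80 K steps, 125 updates/s, 2.5 disks per BES, and about 2.7 BESs per FES (the text truncates 333 / 125 to 2.6). Case B gives 276 K steps, about 36 retrievals/s, 9.2 BESs per FES, and about 1.1 disks per BES, which the text rounds down to one disk.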
B. Data retrieval (10-page access)
(1) System configuration with an IOS node
Dividing the processor performance of 10 M steps/s by the 30 K steps of FES processing shows that the FES can receive up to 333 requests per second.
In the BES, data retrieval requires reception of the execution request from the FES (6 K steps) + transmission of the data retrieval request to the IOS (6 K steps) + reception of the retrieval result from the IOS (46 K steps) + the data retrieval processing itself (220 K steps) + transmission of the result to the FES (6 K steps) = 284 K steps. Dividing the processor performance of 10 M steps/s by this gives up to 35 data retrievals per second.
In the IOS, each disk access requires reception of the input/output request from the BES (6 K steps) + input/output issue processing (44 K steps) + transmission of the input/output result (6 K steps) = 56 K steps. Dividing the processor performance of 10 M steps/s by this gives up to 179 disk accesses per second.
[0036]
Since a batch input/output of 10 pages takes 30 ms, one disk can be accessed up to 33 times per second. Dividing the IOS's 179 disk accesses per second by this figure shows that up to 5.4 disks can be mounted on one IOS.
Dividing the IOS's 179 disk accesses per second by the BES's 35 data retrievals per second shows that one IOS can support 5.1 BESs.
Dividing the FES's 333 receptions per second by the BES's 35 data retrievals per second shows that one FES can support 9.5 BESs.
From the above, FES:BES = 1:9.5, BES:IOS = 5.1:1, and IOS:disk = 1:5.4. Overall, the configuration is almost balanced at FES:BES:IOS:disk = 1:10:2:10 (with some imbalance at the disks).
[0037]
(2) System configuration in which the BES node has the function of the IOS node
Dividing the processor performance of 10 M steps/s by the 30 K steps of FES processing shows that the FES can receive up to 333 requests per second.
In the BES, data retrieval requires reception of the execution request from the FES (6 K steps) + input/output issue processing (44 K steps) + the data retrieval processing itself (220 K steps) + transmission of the result to the FES (6 K steps) = 276 K steps. Dividing the processor performance of 10 M steps/s by this gives up to 36 data retrievals per second.
Since a batch input/output of 10 pages takes 30 ms, one disk can be accessed up to 33 times per second. Dividing the BES's 36 data retrievals per second by this figure gives about 1.1, so only one disk can be mounted on each BES.
[0038]
Dividing the FES's 333 receptions per second by the BES's 36 data retrievals per second shows that one FES can support 9.2 BESs.
From the above, FES:BES = 1:9.2 and BES:disk = 1:1. Overall, the configuration is almost balanced at FES:BES:disk = 1:10:10.
[0039]
FIG. 5 is a configuration diagram of the FES 75.
The FES 75 includes application programs 10 to 11 created by users, a parallel database management system 20 that manages the entire database system, including query processing and resource management, and an operating system 30 that manages the entire computer system, including reading and writing data.
[0040]
The parallel database management system 20 includes a system control unit 21, a logical processing unit 22, a physical processing unit 23, and a database/dictionary buffer 24 that temporarily stores data to be processed. The parallel database management system 20 is connected to the network 90 and other systems.
[0041]
The system control unit 21 performs input / output management and the like. Further, it includes a data load process 210 and a dynamic load control process 211.
The logical processing unit 22 includes a query analysis 220 that performs syntax and semantic analysis of queries, a static optimization process 221 that generates suitable processing procedure candidates, and a code generation 222 that generates code for each processing procedure candidate. It also includes a dynamic optimization process 223 that selects the most suitable processing procedure candidate, and a code interpretation and execution 224 that interprets and executes the code of the selected candidate.
[0042]
The physical processing unit 23 includes a data access process 230 that performs condition evaluation, editing, and record addition on accessed data, a database/dictionary buffer control 231 that controls reading and writing of database records and the like, and an exclusive control 233 that implements exclusive control.
[0043]
FIG. 6 is a configuration diagram of the BES 73.
The BES 73 includes a parallel database management system 20 that manages the entire database system, and an operating system 30 that manages the entire computer system. When the BES 73 has the function of the IOS node, it has a disk and stores and manages the database 40 on that disk.
[0044]
The parallel database management system 20 includes a system control unit 21, a logical processing unit 22, a physical processing unit 23, and a database buffer 24 for temporarily storing data to be processed. The parallel database management system 20 is connected to the network 90 and other systems.
[0045]
The system control unit 21 performs input / output management and the like. Further, a data load process 210 for performing data load in consideration of load distribution is provided.
The logical processing unit 22 includes a code interpretation and execution 224 that interprets and executes code.
The physical processing unit 23 includes a data access process 230 that performs condition evaluation, editing, and record addition on accessed data, a database buffer control 231 that controls reading and writing of database records, a mapping process 232 that manages the storage positions of data to be input and output, and an exclusive control 233 that implements exclusive control of resources shared by the system.
[0046]
FIG. 7 is a configuration diagram of the IOS 70 and the disk 80.
The IOS 70 includes a parallel database management system 20 that manages the entire database system, and an operating system 30 that manages the entire computer system.
The database 40 is stored on the disk 80.
The parallel database management system 20 includes a system control unit 21, a physical processing unit 23, and an input / output buffer 24 for temporarily storing data to be processed. The parallel database management system 20 is connected to the network 90 and other systems.
[0047]
The system control unit 21 performs input / output management and the like. Further, a data load process 210 for performing data load in consideration of load distribution is provided.
The physical processing unit 23 includes a data access process 230 that implements condition determination, editing, and record addition of accessed data, and an input / output buffer control 231 that controls reading and writing of database records.
[0048]
FIG. 8 is a configuration diagram of the DS 71 and the disk 81.
The DS 71 includes a parallel database management system 20 that manages the entire database system, and an operating system 30 that manages the entire computer system.
The disk 81 stores the dictionary 50.
The parallel database management system 20 includes a system control unit 21, a logical processing unit 22, a physical processing unit 23, and a dictionary buffer 24. The parallel database management system 20 is connected to the network 90 and other systems.
The logical processing unit 22 includes a code interpretation and execution 224 that interprets and executes code.
The physical processing unit 23 includes a data access process 230 that performs condition evaluation, editing, and record addition on accessed data, a dictionary buffer control 231 that controls reading and writing of dictionary records, and an exclusive control 233 that implements exclusive control of resources shared by the system.
[0049]
FIG. 9 is a configuration diagram of the JS 72 and the disk 82.
The JS 72 includes a parallel database management system 20 that manages the entire database system, and an operating system 30 that manages the entire computer system.
The journal is stored on the disk 82.
The parallel database management system 20 includes a system control unit 21, a physical processing unit 23, and a journal buffer 24. The parallel database management system 20 is connected to the network 90 and other systems.
The physical processing unit 23 includes a data access process 230 that performs condition determination, editing, and record addition of accessed data, and a journal buffer control 231 that controls reading and writing of journal records.
[0050]
FIG. 10 is a flowchart showing the processing of the database management system 20 in the FES 75.
The system control unit 21 checks whether the request is for query analysis processing (212). If so, it calls and executes the query analysis processing 400 and then ends.
If not, it checks whether the request is for query execution processing (213).
If so, it calls and executes the query execution processing 410 and then ends.
If not, it checks whether the request is for data load processing (214). If so, it calls and executes the data load process 210 and then ends.
If not, it checks whether the request is for dynamic load control processing (215). If so, it calls and executes the dynamic load control process 211 and then ends.
If it is not a dynamic load control process, the process ends.
[0051]
The flowchart of the processing of the database management system 20 in the BES 73 is that of FIG. 10 with steps 212, 215, 400, and 211 omitted. The flowchart for the IOS 70 is that of FIG. 10 with steps 212, 213, 215, 400, 410, and 211 omitted.
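The dispatch in FIG. 10, together with the reduced versions used by the BES 73 and the IOS 70, can be sketched as a table-driven loop. The request-kind strings and the function name are hypothetical; the step and process numbers are the ones given above, and the sketch only returns the number of the process that would be called.

```python
# (check step, request kind, process called), in the order FIG. 10 tests them
CHECKS = [
    (212, "query_analysis", 400),
    (213, "query_execution", 410),
    (214, "data_load", 210),
    (215, "dynamic_load_control", 211),
]

# Check steps left out of the flowchart on each node type
OMITTED_CHECKS = {"FES": set(), "BES": {212, 215}, "IOS": {212, 213, 215}}


def dispatch(node, request_kind):
    """Return the process number FIG. 10 would call for this request, or None (end)."""
    for check_step, kind, process in CHECKS:
        if check_step in OMITTED_CHECKS[node]:
            continue  # this node's flowchart has no such check
        if request_kind == kind:
            return process
    return None  # none of the checks matched: the processing ends


print(dispatch("FES", "query_analysis"), dispatch("BES", "query_analysis"))
```

Dropping a check step also makes its process unreachable, which is why the BES and IOS variants omit the processes (400, 410, 211) along with the steps.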
[0052]
FIG. 11 is a flowchart of the query analysis processing 400.
First, the query analysis 220 performs syntax analysis and semantic analysis of the input query statement.
Next, the static optimization process 221 estimates the proportion of data satisfying the conditions from the conditional expressions appearing in the query, selects valid access path candidates (particularly indexes) based on preset rules, and creates processing procedure candidates.
Next, the code generation 222 expands the processing procedure candidates into executable code. The process then ends.
[0053]
FIG. 12 is a flowchart of the query analysis 220.
In step 2200, syntax analysis and semantic analysis of the input query statement are performed, and the processing ends.
[0054]
FIG. 13 is a flowchart of the static optimization processing 221.
First, the predicate selectivity estimation 2210 estimates the selectivity of the predicate of the conditional expression appearing in the query.
Next, the access path pruning 2212 prunes the access paths, such as indexes.
Next, the processing procedure candidate generation 2213 generates processing procedure candidates that combine the access paths.
Then, the processing ends.
[0055]
FIG. 14 is a flowchart of the predicate selectivity estimation 2210.
In step 22101, it is checked whether a variable appears in the conditional expression of the query. If no variable appears, the process proceeds to step 22102; if one does, the process proceeds to step 22104.
In step 22102, it is checked whether column value distribution information exists for the conditional expression. If it exists, the process proceeds to step 22103; if not, to step 22105.
In step 22103, the selectivity is calculated using the column value distribution information, and the processing ends.
In step 22104, it is checked whether column value distribution information exists for the conditional expression. If it exists, the processing ends (the selectivity is then calculated at execution time, once the constant is substituted); if not, the process proceeds to step 22105.
In step 22105, a default value is set according to the type of the conditional expression, and the processing ends.
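The decision flow of FIG. 14 can be sketched as follows. The sketch assumes that the column value distribution information is a simple bucket histogram of (low, high, fraction of rows) with values spread uniformly inside each bucket, and that a condition containing a variable is represented by `value=None`. The `DEFAULT_SELECTIVITY` values are hypothetical placeholders, since the text does not give any.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical defaults per condition type (step 22105); the text gives no values.
DEFAULT_SELECTIVITY = {"=": 0.01, "<": 1 / 3, ">": 1 / 3, "between": 0.25}

# Assumed form of column value distribution information: (low, high, fraction).
Histogram = Sequence[Tuple[float, float, float]]


def histogram_selectivity(hist: Histogram, op: str, value: float) -> float:
    """Fraction of rows satisfying `column op value`, assuming values are
    spread uniformly within each histogram bucket."""
    total = 0.0
    for low, high, frac in hist:
        width = high - low
        if op == "<":
            covered = min(max(value - low, 0.0), width)
        elif op == ">":
            covered = min(max(high - value, 0.0), width)
        else:
            raise ValueError("this sketch handles only < and >")
        total += frac * (covered / width if width else 0.0)
    return total


def estimate_selectivity(op: str, value: Optional[float],
                         hist: Optional[Histogram]) -> Optional[float]:
    """Follow FIG. 14. `value is None` means the condition contains a variable
    whose constant is supplied only at execution time; None is returned when
    estimation is deferred to run time (step 22104)."""
    if value is None:                                  # step 22101: variable appears
        if hist is not None:                           # step 22104: defer
            return None
        return DEFAULT_SELECTIVITY[op]                 # step 22105
    if hist is not None and op in ("<", ">"):          # step 22102
        return histogram_selectivity(hist, op, value)  # step 22103
    return DEFAULT_SELECTIVITY[op]                     # step 22105
```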
[0056]
FIG. 15 is a flowchart of the access path pruning 2212.
In step 22120, the index of the column appearing in the query conditional expression is registered as an access path candidate.
In step 22121, it is checked whether the table to be accessed by the query is divided and stored in a plurality of nodes. If it is not divided and stored, the process proceeds to step 22122, and if it is divided and stored, the process proceeds to step 22123.
In step 22122, the sequential scan is registered as an access path candidate.
In step 22123, the parallel scan is registered as an access path candidate.
In step 22124, it is checked whether the selectivity of each conditional expression has already been set. If the setting has been completed, the process proceeds to step 22125, and if not, the process proceeds to step 22126.
In step 22125, for each table, the index of the conditional expression with the smallest selectivity is given the highest access path priority.
In step 22126, the maximum and minimum selectivities of each conditional expression are obtained.
In step 22127, a selection criterion for each access path is calculated from system characteristics such as processor performance and I/O performance.
In step 22128, only those access paths, whether a single index or a combination of indexes, whose selectivity is below the selection criterion are registered as access path candidates.
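Steps 22120 to 22128 can be sketched as below. The text says only that the selection criterion is derived from processor and I/O performance; the break-even formula in `selection_criterion` (an index access costs one random I/O plus CPU work per hit row, compared with reading every page of the table) and the assumption that AND-ed predicates are independent are both hypothetical choices made for illustration.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def selection_criterion(table_rows: int, table_pages: int, seq_page_ms: float,
                        random_io_ms: float, cpu_per_row_ms: float) -> float:
    """Hypothetical step 22127: the selectivity at which index access (one
    random I/O plus CPU per hit row) costs as much as a full scan."""
    scan_cost = table_pages * seq_page_ms
    per_row_index_cost = random_io_ms + cpu_per_row_ms
    return scan_cost / (table_rows * per_row_index_cost)


def prune_access_paths(index_sel: Dict[str, Tuple[float, float]],
                       divided: bool, criterion: float) -> List[str]:
    """index_sel maps an index to the (min, max) selectivity of its condition;
    min == max when the selectivity is already fixed. Only the minimum is used
    here: a path is kept if it can beat the criterion for some constant."""
    scan = "parallel scan" if divided else "sequential scan"  # steps 22121-22123
    paths = []
    names = sorted(index_sel)
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            # Assume independent AND-ed predicates: multiply selectivities.
            low = math.prod(index_sel[n][0] for n in combo)
            if low < criterion:  # step 22128
                paths.append((low, "+".join(combo)))
    paths.sort()  # lowest selectivity first = highest priority (step 22125)
    return [name for _, name in paths] + [scan]
```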
[0057]
FIG. 16 is a flowchart of the processing procedure candidate generation 2213.
In step 22130, it is checked whether the table to be accessed by the query is divided and stored across a plurality of nodes. If not, the process proceeds to step 22131; if so, to step 22135.
In step 22131, it is checked whether or not the sorting procedure is included in the processing procedure candidate. If it is not included, the process proceeds to step 22132, and if it is included, the process proceeds to step 22135.
In step 22132, it is checked whether the access path of the table to be accessed in the inquiry is unique. If it is unique, the process proceeds to step 22133, and if not, the process proceeds to step 22134.
In step 22133, a single processing procedure is created, and the processing ends.
In step 22134, a plurality of processing procedures are created, and the process ends.
In step 22135, the query is decomposed into joinable two-way joins.
In step 22136, the data reading/data distribution procedure and the slot sort procedure are registered as candidates for the node groups that store the divided table.
In step 22137, the slot sort procedure, the N-way merge procedure, and the matching procedure are registered as candidates for the join processing node group. In slot sort processing, each row in a page is managed by a slot located at an offset from the top of the page, and the rows within each page that holds data are sorted so that, when the page is read in slot order, its rows are accessed in ascending order. N-way merge processing uses N buffers, takes N sort runs as input at each merge stage, and finally produces a single sort run by the tournament method.
In step 22138, the request data output procedure is registered in the request data output node.
In step 22139, it is checked whether or not the evaluation has been completed for the decomposition result. If all the evaluations have not been completed, the process returns to step 22136, and if all the evaluations have been completed, the process is terminated.
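The slot sort and N-way merge procedures registered in step 22137 can be illustrated with the following sketch. It assumes a simplified page in which slot sort reorders only the slot array and leaves rows where they were stored. Where the text specifies the tournament method, this sketch substitutes Python's `heapq.merge`, a heap-based merge that yields the same sorted output.

```python
import heapq
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]  # (sort key, payload)


@dataclass
class Page:
    """Simplified page: rows in arrival order plus a slot array of row
    offsets (list indices here) that fixes the logical reading order."""
    rows: List[Row]
    slots: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.slots:
            self.slots = list(range(len(self.rows)))


def slot_sort(page: Page) -> None:
    """Sort within the page by rearranging the slots only (assumption)."""
    page.slots.sort(key=lambda i: page.rows[i][0])


def read_in_slot_order(page: Page) -> Iterator[Row]:
    for i in page.slots:
        yield page.rows[i]


def n_way_merge(runs: List[Iterable[Row]], n: int) -> List[Row]:
    """Merge sort runs N at a time per stage until a single run remains."""
    if n < 2:
        raise ValueError("an N-way merge needs N >= 2")
    current = [list(r) for r in runs]
    if not current:
        return []
    while len(current) > 1:
        current = [list(heapq.merge(*current[i:i + n]))
                   for i in range(0, len(current), n)]
    return current[0]
```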
[0058]
FIG. 17 is a flowchart of the code generation 222.
In step 2220, it is checked whether or not the processing procedure candidate is unique. If it is not unique, the process proceeds to step 2221. If it is unique, the process proceeds to step 2223.
In step 2221, optimization information including column value distribution information and the like is embedded in the processing procedure.
In step 2222, a data structure for selecting a processing procedure based on the constant assigned at the time of query execution is created.
In step 2223, the processing procedure is developed into an executable form. Then, the process ends.
[0059]
FIG. 18 is a flowchart of the query execution process 410.
First, the dynamic optimization 223 determines the processing procedure to be executed in each node group, based on the substituted constants.
Next, the code interpretation execution 224 interprets and executes this processing procedure, and the processing ends.
[0060]
FIG. 19 is a flowchart of the dynamic optimization process 223.
In step 22300, the dynamic load control process is executed.
In step 22301, it is checked whether only a single processing procedure has been created. If so, the processing ends; if not, the process proceeds to step 22302.
In step 22302, the selectivity is calculated based on the substituted constant.
In step 22303, it is checked whether or not the processing procedure candidate includes a parallel processing procedure. If it is included, the process proceeds to step 22304; otherwise, the process proceeds to step 22308.
In step 22304, optimization information (column value distribution information of a join column, the number of rows and the number of pages of a table to be accessed, etc.) is input from the dictionary.
In step 22305, a processing time for data extraction / data distribution is calculated in consideration of each system characteristic.
In step 22306, the number p of nodes to be assigned to the join process is determined from the processing time.
In step 22307, data distribution information is created based on the optimization information.
In step 22308, a processing procedure is selected according to the access path selection criteria, and the process ends.
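Steps 2222, 22302, and 22308 together amount to choosing a prebuilt procedure once the constant is known. A minimal sketch follows. The `ProcedureTable` layout (ascending selectivity thresholds with one procedure per interval) and the use of a sorted column sample as the distribution information for a `column < constant` predicate are assumptions, not details given in the text.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass
class ProcedureTable:
    """Hypothetical structure built at code generation time (step 2222)."""
    thresholds: List[float]  # ascending; len(thresholds) == len(procedures) - 1
    procedures: List[str]

    def select(self, selectivity: float) -> str:
        # Step 22308: pick the procedure whose selectivity interval covers the value.
        return self.procedures[bisect.bisect_right(self.thresholds, selectivity)]


def runtime_selectivity(sorted_column_sample: List[int], constant: int) -> float:
    """Step 22302 for `column < constant`, using a sorted sample of column
    values as the embedded distribution information (assumption)."""
    hits = bisect.bisect_left(sorted_column_sample, constant)
    return hits / len(sorted_column_sample)
```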
[0061]
FIG. 20 is a flowchart of the code interpretation execution 224.
In step 22400, it is checked whether the process is data extraction/data distribution. If it is, the process proceeds to step 22401; otherwise, the process proceeds to step 22405.
In step 22401, the database is accessed and the conditional expression is evaluated.
In step 22402, data is set in a buffer for each node based on the data distribution information.
In step 22403, it is checked whether the buffer of the node is full. If it is full, go to step 22404; if not, go to step 22420.
In step 22404, the data is transferred to the corresponding node in a page format, and the flow advances to step 22420.
In step 22405, it is checked whether or not slot sort processing is to be performed. If it is a slot sort process, the process proceeds to step 22406; if not, the process proceeds to step 22409.
In step 22406, page format data is received from another node.
In step 22407, a slot sort process is executed.
In step 22408, the slot sort processing result is temporarily stored, and the flow advances to step 22420.
In step 22409, it is checked whether the process is an N-way merge process. If it is an N-way merge process, the process proceeds to step 22410; if not, the process proceeds to step 22412.
[0062]
In step 22410, an N-way merge process is performed.
In step 22411, the result of the N-way merge process is temporarily stored, and the flow advances to step 22420.
In step 22412, it is checked whether or not the matching process is performed. If it is the matching process, the process proceeds to step 22413; if not, the process proceeds to step 22416.
In step 22413, the two sorted lists are matched, and the matching data is set in the output buffer.
In step 22414, it is checked whether the output buffer is full. If it is full, go to step 22415. If it is not full, go to step 22420.
In step 22415, the data is transferred to the request data output node in a page format, and the flow advances to step 22420.
In step 22416, it is checked whether or not the requested data output processing is performed. If it is a request data output process, the process proceeds to step 22417. If it is not a request data output process, the process proceeds to step 22420.
In step 22417, it is checked whether page format data is transferred from another node. If there is a transfer, the process proceeds to step 22418; otherwise, the process proceeds to step 22419.
In step 22418, page format data is received from another node.
In step 22419, the result of the query processing is output to the application program.
In step 22420, it is checked whether this processing is being executed on a BES. If it is, the process proceeds to step 22421; otherwise, the processing ends.
In step 22421, information for estimating the processing load such as the number of access pages, the number of hit rows, and the number of communications is notified to the FES, and the processing ends.
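The data extraction/distribution branch (steps 22401 to 22404) and the matching step (22413) can be sketched as below. The key range boundary list used as the data distribution information, the `page_rows` buffer capacity, and the `send_page` callback are hypothetical; the final flush of partly filled buffers is implied rather than shown in FIG. 20. The matching step is written as a standard merge join of two key-sorted lists.

```python
import bisect
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, str]  # (join key, payload)


def distribute(rows: List[Row], boundaries: List[int], page_rows: int,
               send_page: Callable[[int, List[Row]], None]) -> None:
    """Route each row to a join node by key range and ship a node's buffer as
    one page when it fills. Node 0 gets keys below boundaries[0]; node i gets
    boundaries[i-1] <= key < boundaries[i]; the last node gets the rest."""
    buffers: Dict[int, List[Row]] = {}
    for row in rows:
        node = bisect.bisect_right(boundaries, row[0])  # step 22402
        buf = buffers.setdefault(node, [])
        buf.append(row)
        if len(buf) == page_rows:                       # step 22403: full?
            send_page(node, buf[:])                     # step 22404
            buf.clear()
    for node, buf in buffers.items():                   # flush partial pages
        if buf:
            send_page(node, buf[:])


def match(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Step 22413: merge-join two key-sorted lists on equal keys."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            j_end = j  # find the run of equal keys on the right side
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            while i < len(left) and left[i][0] == lk:  # pair every duplicate
                out.extend((lk, left[i][1], right[jj][1]) for jj in range(j, j_end))
                i += 1
            j = j_end
    return out
```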
[0063]
FIG. 21 is a flowchart of the data loading process 210.
Before describing each step, the concept will be described.
Three data placement methods are available for data loading: target-response-time-oriented placement, which keeps the time required to scan the entire table within a fixed limit; expected-parallelism-oriented placement, which is optimized for m-page access; and user-specified placement, in which the user fully specifies the volume division.
In target-response-time-oriented placement, the number of pages required to store the rows of the entire table is obtained first. Next, the upper limit on the number of pages stored on the disk of each division that can be accessed in parallel is determined. Where necessary, access is assumed to use batch input (for example, 10 pages at a time). The load distribution is determined by the combination of the number of disks, the number of IOSs, and the number of BESs. If key range division is used, each key range division section is re-divided according to the upper page limit, and the resulting sections are stored on the respective divided disks. This key range division will be described later in detail with reference to FIG.
In expected-parallelism-oriented placement, the arrangement depends on the value of m. If key range division is used, the number of pages s (= m / p) stored per sub-key range is determined for each key range division unit from m and the expected parallelism p, and pages are stored on the disk of each division in units of s pages.
[0064]
The expected parallelism p is calculated as the square root of the ratio of the response time to the overhead of each node. A ratio of 10 gives an expected parallelism of 3; a ratio of 100 gives 10; 1,000 gives 32; and 10,000 gives 100. If the calculated p exceeds the number of divisions, the number of divisions is adopted instead, because the number of divisions determines the maximum number of disks that can be processed in parallel. Otherwise, p itself is adopted, with the number of divisions as the upper limit.
As a concrete example, consider a data arrangement optimized for accessing 100 pages, assuming batch input of 10 pages. One I/O (a 10-page access) takes 300 ms, and the overhead of one I/O is 5.6 ms (56 k steps at 10 MIPS), so the expected parallelism is p = √(300 / 5.6) ≈ 7. Sub-key range division is therefore performed every s = 100 / 7 ≈ 14 pages.
The user-specified data arrangement is the same data arrangement as in the conventional database management system, and is managed according to the setting parameters.
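The expected parallelism rule described above and its worked example can be checked with a short Python sketch. Rounding p to the nearest integer is inferred from the listed values (√1000 ≈ 31.6 is given as 32), and rounding s down is an assumption; both function names are hypothetical.

```python
import math


def expected_parallelism(response_time: float, overhead: float, divisions: int) -> int:
    """p = sqrt(response time / per-node overhead), rounded to the nearest
    integer and capped at the number of divisions."""
    p = round(math.sqrt(response_time / overhead))
    return max(1, min(p, divisions))


def sub_key_range_pages(m: int, p: int) -> int:
    """Pages stored per sub-key range, s = m / p (rounded down here)."""
    return m // p
```

With the example figures, `expected_parallelism(300, 5.6, divisions)` gives 7 for any division count of at least 7, and `sub_key_range_pages(100, 7)` gives 14.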
[0065]
In step 21000, it is checked whether target-response-time-oriented placement is used. If not, the process proceeds to step 21001; if so, the process proceeds to step 21003.
In step 21001, it is checked whether expected-parallelism-oriented placement is used. If not, the process proceeds to step 21002; if so, the process proceeds to step 21010.
In step 21002, it is checked whether or not there is a user designation. If there is a user designation, the process proceeds to step 21016, and if there is no user designation, the process ends.
In step 21003, the number of pages required to store the rows of the entire table is obtained.
In step 21004, the upper limit on the number of pages stored on each disk that can be accessed in parallel is determined so that the time required to scan the table stays fixed.
In step 21005, a BES, an IOS, and a disk group that satisfy the above requirements are determined.
In step 21006, it is checked whether there is a key range division. If there is a key range division, the process proceeds to step 21007, and if there is no key range division, the process proceeds to step 21009.
In step 21007, each key range division section is re-divided according to the upper page limit.
In step 21008, data is inserted according to the key range division sections, and the processing ends.
In step 21009, the data is divided in units of the upper page limit and inserted, and the processing ends.
In step 21010, the optimum page access number m is calculated based on the estimated workload.
In step 21011, the expected parallelism p is calculated, and the BES, the IOS, and the disk group are determined according to the expected parallelism p.
In step 21012, it is checked whether there is a key range division. If there is a key range division, the process proceeds to step 21013. If there is no key range division, the process proceeds to step 21015.
In step 21013, the number of stored pages s (= m / p) in sub-key range units is calculated.
In step 21014, the sub-key range is divided in units of s pages, data is inserted into each disk, and the process ends.
In step 21015, the data is divided in units of s pages and inserted, and the processing ends.
In step 21016, data is inserted into the disk managed by the IOS specified by the user, and the process ends.
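The two automatic placement branches of FIG. 21 can be sketched as follows. Several details are assumptions made for illustration: the page limit is derived by dividing the target scan time by a per-page read time (assuming all divisions are scanned in parallel), key ranges are packed onto disks in key order, and sub-key ranges of s pages are dealt round-robin over p disks. Function names and the page counts in the example are hypothetical.

```python
from typing import List, Tuple


def page_upper_limit(target_scan_ms: float, page_read_ms: float) -> int:
    """Step 21004 (assumed model): each disk may hold only as many pages as
    it can read within the target scan time."""
    return max(1, int(target_scan_ms // page_read_ms))


def redivide(ranges: List[Tuple[str, int]], limit: int) -> List[List[Tuple[str, int]]]:
    """Steps 21007-21008: pack (key range, pages) sections onto disks in key
    order, splitting a section whenever a disk reaches `limit` pages.
    Returns the (key range, pages) pieces held by each disk."""
    disks: List[List[Tuple[str, int]]] = [[]]
    free = limit
    for name, pages in ranges:
        while pages > 0:
            if free == 0:  # current disk is full: open the next one
                disks.append([])
                free = limit
            take = min(pages, free)
            disks[-1].append((name, take))
            pages -= take
            free -= take
    return disks


def spread_sub_ranges(pages: int, s: int, p: int) -> List[int]:
    """Steps 21013-21014: cut a key range into sub-key ranges of s pages and
    deal them round-robin over p disks (assumption). Returns pages per disk."""
    per_disk = [0] * p
    chunk = 0
    while pages > 0:
        take = min(s, pages)
        per_disk[chunk % p] += take
        pages -= take
        chunk += 1
    return per_disk
```

With a 300-page limit and sections of 300, 600, and 300 pages, `redivide` fills four disks in the same pattern as FIG. 23, with the middle section spanning two disks.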
[0066]
FIG. 22 is a flowchart of the dynamic load control process 211.
In step 21100, the presence or absence of a load imbalance (access concentration or dispersion) is detected. Specifically, the bottleneck resource (a BES or IOS processor, or a disk) is identified from the distribution of the DB processing load (the number of processing steps for DB processing, I/O processing, and communication processing), processor performance (converted to processing time), and the I/O count (converted to input/output time). For this purpose, the DB processing is broken down into SQL statements, and the access status of each resource is classified by table. If a load imbalance is detected, the process proceeds to step 21101; otherwise, the processing ends.
[0067]
In step 21101, it is determined from the access distribution information whether to add or delete a BES or to add or delete an IOS-disk pair. If addition or deletion is necessary, the process proceeds to step 21102; otherwise, the process ends.
In step 21102, it is checked whether a resource is to be added. If so, the process proceeds to step 21103; if not, the process proceeds to step 21112.
In step 21103, it is checked whether the system is online. If online, the process proceeds to step 21104; if not, to step 21105.
In step 21104, the key range of the table managed by the target BES group is blocked.
In step 21105, a new BES is allocated.
In step 21106, the lock information and the directory information are taken over.
In step 21107, the DS 71 is instructed to rewrite dictionary information necessary for node distribution control.
In step 21108, it is checked whether an IOS exists. If it does not exist, the process proceeds to step 21109. If it exists, the process proceeds to step 21110. This step is inserted in order to cope with both the system configuration in which the IOS exists and the system configuration in which the IOS does not exist with the same software.
In step 21109, data is moved from the target BES group to a new BES group.
In step 21110, it is checked whether it is online. If online, the process proceeds to step 21111. If not online, the process ends.
In step 21111, the blocking of the key range of the table managed by the target BES group is released, and the processing ends.
In step 21112, it is checked whether the system is online. If online, the process proceeds to step 21113; if not, to step 21114.
In step 21113, the key range of the table managed by the target BES group is blocked.
In step 21114, the BES to be removed (degenerated) is determined.
In step 21115, the lock information and the directory information are taken over.
In step 21116, the DS 71 is instructed to rewrite the dictionary information necessary for node distribution control.
In step 21117, it is checked whether an IOS exists. If not, the process proceeds to step 21118; if so, to step 21119.
In step 21118, data is moved out of the BES group being removed.
In step 21119, it is checked whether the system is online. If online, the process proceeds to step 21120; if not, the processing ends.
In step 21120, the blocking of the key range of the table managed by the target BES group is released, and the processing ends.
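The detection and reconfiguration flow of FIG. 22 can be sketched as below. The imbalance test (busiest resource at least `ratio` times the average converted load) and its threshold of 1.5 are hypothetical, as the text does not say how an imbalance is judged. `reconfigure_steps` simply returns the ordered actions of the add and remove branches so that the online and IOS checks are visible.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    name: str
    kind: str        # "BES", "IOS" or "disk"
    busy_ms: float   # load converted to time (steps / performance, I/O count x I/O time)


def find_bottleneck(resources: List[Resource], ratio: float = 1.5) -> Optional[Resource]:
    """Step 21100 (sketch): return the busiest resource if its converted load
    is at least `ratio` times the average; otherwise None (no imbalance)."""
    mean = sum(r.busy_ms for r in resources) / len(resources)
    worst = max(resources, key=lambda r: r.busy_ms)
    return worst if mean > 0 and worst.busy_ms >= ratio * mean else None


def reconfigure_steps(add: bool, online: bool, ios_exists: bool) -> List[str]:
    """Steps 21102-21120: ordered actions for adding or removing a BES."""
    steps = []
    if online:
        steps.append("block key range")                                  # 21104 / 21113
    steps.append("allocate new BES" if add else "choose BES to remove")  # 21105 / 21114
    steps += ["take over lock and directory info",                       # 21106 / 21115
              "DS rewrites dictionary info"]                             # 21107 / 21116
    if not ios_exists:                                                   # 21108 / 21117
        steps.append("move data to new BES" if add else "move data off removed BES")
    if online:
        steps.append("release key range block")                          # 21111 / 21120
    return steps
```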
[0068]
FIG. 23 is a conceptual diagram of a data load process using key range division.
The number of divisions is already 4, and the column values v1 to v6 of the database have the frequencies of occurrence shown in FIG.
At initial data loading, only one BES 731 is required. When pages are assigned to the disks of divisions 810 to 840, each up to the upper page limit, the column values v1 to v2 are stored on the disk of division 810, v2 to v3 on the disks of divisions 820 and 830, v3 to v5 on the disk of division 840, and v5 to v6 on another disk group. At initial data loading, directory information is created for each disk to manage the pages stored on it.
When BESs 732 to 734 are also used according to the load at database access time, each BES accesses the database using the directory information of the disks corresponding to it.
Each of the above processes may be implemented in combination with the following embodiments.
[0069]
In order to facilitate movement of a row between nodes, the row identifier does not include position information such as BES. In the BES, a physical position of a row is specified by combining directory information for specifying a division position of a table and a row identifier. Row movement is dealt with by rewriting directory information. A structure corresponding to reorganization or row movement is provided, and even if a BES is dynamically added, processing can be divided by taking over directory information and lock information.
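The directory-based row location described above can be sketched with a small data structure. The `Directory` class, its key range layout (division i holds keys below `upper_keys[i]`, plus one open-ended last division), and the BES and disk names are hypothetical. The point illustrated is that moving a division rewrites one directory entry while row identifiers stay unchanged.

```python
import bisect
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Directory:
    """Hypothetical directory mapping a table's key range divisions to the
    BES and disk that currently hold them. Row identifiers carry no node."""
    upper_keys: List[int]             # exclusive upper bound of each division
    location: List[Tuple[str, str]]   # (BES, disk); len(upper_keys) + 1 entries

    def locate(self, key: int, row_id: int) -> Tuple[str, str, int]:
        # Physical position = directory entry of the division + row identifier.
        i = bisect.bisect_right(self.upper_keys, key)
        bes, disk = self.location[i]
        return bes, disk, row_id

    def move_division(self, i: int, bes: str, disk: str) -> None:
        # Row movement or BES addition rewrites only the directory entry;
        # row identifiers held elsewhere (e.g. in indexes) remain valid.
        self.location[i] = (bes, disk)
```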
When replica management of the database is required, twice the storage area is needed. Whether or not the primary copy and the backup copy are managed by the same IOS and BES, it is sufficient to estimate that the disk access load roughly doubles.
[0070]
When a disk, IOS, BES, or other component fails, it is disconnected from online processing and reconnected after recovery. The blocking management method differs by node type. When a disk fails, the key range stored on that disk is blocked; if a backup copy exists (a mirror disk under the same IOS, or a data replica acquired under the management of another IOS), processing is distributed to it. When an IOS fails, the key range stored under that IOS is blocked; if a backup copy exists (which requires that it was acquired under the management of another IOS as a data replica), processing is distributed to it. When a BES fails, the key range managed by that BES is blocked. If an IOS exists, a new BES is allocated, the lock information is taken over, the dictionary information necessary for node distribution control is rewritten, and processing then continues.
[0071]
The present invention is not limited to optimization that uses rules together with cost evaluation based on statistical information. As long as a processing procedure that provides appropriate database reference characteristic information can be obtained, the invention can also be applied to database management systems that optimize using only cost evaluation, only rules, or cost evaluation and rules together.
The present invention can be implemented as software on a large-scale computer built as a tightly or loosely coupled multiprocessor system, or on a tightly or loosely coupled multiprocessor system in which a dedicated processor is provided for each processing unit. It is also applicable to a single-processor system, as long as a parallel process is allocated to each processing procedure.
Further, the present invention is applicable to a configuration in which a plurality of processors share a plurality of disks with each other.
[0072]
[Effects of the Invention]
According to the database division management method of the present invention, the system configuration is adapted to the load, so the expected degree of parallelism can be obtained and high-speed queries can be realized.
According to the parallel database system of the present invention, a scalable parallel database system can be obtained that, even when the load changes, always reconfigures itself to suit the load.
[Brief description of the drawings]
FIG. 1 is a configuration diagram illustrating a parallel database system according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram showing a database division management method of the present invention.
FIG. 3 is a conceptual diagram of an optimal node distribution (when there is an IOS) according to the database division management method of the present invention.
FIG. 4 is a conceptual diagram of optimal node distribution (when there is no IOS) according to the database division management method of the present invention.
FIG. 5 is a configuration diagram of an FES.
FIG. 6 is a configuration diagram of a BES.
FIG. 7 is a configuration diagram of an IOS.
FIG. 8 is a configuration diagram of a DS.
FIG. 9 is a configuration diagram of a JS.
FIG. 10 is a flowchart of a process of a system control unit.
FIG. 11 is a flowchart of the query analysis process 400.
FIG. 12 is a flowchart of the query analysis 220.
FIG. 13 is a flowchart of a static optimization process.
FIG. 14 is a flowchart of a predicate selectivity estimation process.
FIG. 15 is a flowchart of an access path pruning process.
FIG. 16 is a flowchart of processing for generating a processing procedure candidate.
FIG. 17 is a flowchart of a code generation process.
FIG. 18 is a flowchart of a query execution process.
FIG. 19 is a flowchart of a dynamic optimization process.
FIG. 20 is a flowchart of a code interpretation execution process.
FIG. 21 is a flowchart of a data load process.
FIG. 22 is a flowchart of a dynamic load control process.
FIG. 23 is a conceptual diagram of dynamic load control.
[Explanation of symbols]
1: Parallel database system
10, 11: Application programs
20: Database management system
21: System control unit
210: Data load processing
211: Dynamic load control processing
22: Logic processing unit
220: Query analysis
221: Static optimization processing
222: Code generation
223: Dynamic optimization processing
224: Code interpretation execution
30: Operating system
40: Database
70: IOS
71: JS
72: DS
73: BES
75: FES
80, 81, 82: Disks
90: Interconnection network

Claims

In a parallel database system in which an FES node having a function of analyzing and optimizing a query from a user and creating a processing procedure for it, a BES node having a function of accessing a database based on the processing procedure created by the FES node, and an IOS node having a disk and a function of storing and managing the database on the disk are connected via a network,
a database division management method characterized by determining, according to the load pattern of database processing, the number of processors allocated to the FES node, the number of processors allocated to the BES node, the number of processors allocated to the IOS node, the number of disks of the IOS node, and the number of disk divisions.

In a parallel database system in which an FES node having a function of analyzing and optimizing a query from a user and creating a processing procedure for it, and BES nodes each having a function of accessing a database based on the processing procedure created by the FES node, a disk, and a function of storing and managing the database on the disk are connected via a network,
a database division management method characterized by determining, according to the load pattern of database processing, the number of processors allocated to the FES node, the number of processors allocated to the BES node, the number of disks of the BES node, and the number of disk divisions.

In a parallel database system in which an FES node having a function of analyzing and optimizing a query from a user and creating a processing procedure for it, a BES node having a function of accessing a database based on the processing procedure created by the FES node, and an IOS node having a disk and a function of storing and managing the database on the disk are connected via a network,
a database division management method characterized by determining an upper limit on the number of pages per disk that can be accessed in parallel so as to keep the time required to scan the database constant, and determining, according to this upper limit, the number of processors allocated to the FES node, the number of processors allocated to the BES node, the number of processors allocated to the IOS node, the number of disks of the IOS node, and the number of disk divisions.

In a parallel database system in which an FES node having a function of analyzing and optimizing a query from a user and creating a processing procedure for it, and BES nodes each having a function of accessing a database based on the processing procedure created by the FES node, a disk, and a function of storing and managing the database on the disk are connected via a network,
a database division management method characterized by determining an upper limit on the number of pages per disk that can be accessed in parallel so as to keep the time required to scan the database constant, and determining, according to this upper limit, the number of processors allocated to the FES node, the number of processors allocated to the BES node, the number of disks of the BES node, and the number of disk divisions.

In a parallel database system in which an FES node having a function of analyzing and optimizing a query from a user and creating a processing procedure for it, a BES node having a function of accessing a database based on the processing procedure created by the FES node, and an IOS node having a disk and a function of storing and managing the database on the disk are connected via a network,
a database division management method characterized by calculating an expected parallelism p based on the load pattern and determining, according to p, the number of processors allocated to the FES node, the number of processors allocated to the BES node, the number of processors allocated to the IOS node, the number of disks of the IOS node, and the number of disk divisions.

In a parallel database system in which an FES node having a function of analyzing and optimizing a query from a user and creating a processing procedure for it, and BES nodes each having a function of accessing a database based on the processing procedure created by the FES node, a disk, and a function of storing and managing the database on the disk are connected via a network,
a database division management method characterized by calculating an expected parallelism p from the load pattern and determining, according to p, the number of processors allocated to the FES node, the number of processors allocated to the BES node, the number of disks of the BES node, and the number of disk divisions.

The database division management method according to any one of claims 1 to 6, wherein an optimum page access number m is calculated and, when key range division is used, the number of pages s (= m / p) stored per sub-key range is calculated, the sub-key range is divided in units of s pages, and data is inserted into the disks.

In a parallel database system in which an FES node having a function of analyzing and optimizing a query from a user and creating a processing procedure for it, a BES node having a function of accessing a database based on the processing procedure created by the FES node, and an IOS node having a disk and a function of storing and managing the database on the disk are connected via a network,
a database division management method characterized by detecting a load imbalance based on load information, such as the number of accessed pages, the number of hit rows, and the number of communications, acquired during query execution, and changing the number of processors allocated to the FES node, the number of processors allocated to the BES node, the number of processors allocated to the IOS node, and the number of disks of the IOS node so as to eliminate the load imbalance.

The method applies to a parallel database system in which an FES node and BES nodes are connected via a network. The FES node analyzes and optimizes queries from a user and creates their processing procedures. Each BES node has a disk, accesses the database based on the processing procedure created by the FES node, and stores and manages the database on that disk.
In this database division management method, a load imbalance is detected from load information acquired during query execution, such as the number of accessed pages, the number of hit rows, and the number of communications. To eliminate the imbalance, the method changes the number of processors assigned to the FES node, the number of processors assigned to the BES node, and the number of disks of the BES node.

In the database division management method according to claim 8, the following steps are performed when increasing the number of processors assigned to the BES node, or the number of processors or disks assigned to the IOS node. If the system is online, the key range of the database table managed by the processor or disk to be added is blocked. The new processor or disk is allocated. Lock information and directory information are taken over. The dictionary information required for node distribution control is rewritten. Finally, if the system is online, the blockage is released.

In the database division management method according to claim 9, the following steps are performed when increasing the number of processors or disks allocated to the BES node. If the system is online, the key range of the database table managed by the processor or disk to be added is blocked. The new processor or disk is allocated. Lock information and directory information are taken over. The dictionary information required for node distribution control is rewritten. Data is moved from the source disk group to the newly added disk group. Finally, if the system is online, the blockage is released.

In the database division management method according to claim 8 or 10, the following steps are performed when reducing the number of processors assigned to the BES node, or the number of processors or disks assigned to the IOS node. If the system is online, the key range of the database table managed by the processor or disk to be deleted is blocked. The processor or disk to be deleted is determined. Lock information and directory information are taken over. The dictionary information required for node distribution control is rewritten. Finally, if the system is online, the blockage is released.

In the database division management method according to claim 9 or 11, the following steps are performed when reducing the number of processors or disks allocated to a BES node. If the system is online, the key range of the database table managed by the processor or disk to be deleted is blocked. The processor or disk to be deleted is determined. Lock information and directory information are taken over. The dictionary information required for node distribution control is rewritten. The data on the disk group to be deleted is moved to the disk group that takes it over. Finally, if the system is online, the blockage is released.
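The add and delete procedures share one skeleton. Block the key range if the system is online, take over lock and directory information, rewrite the node-distribution dictionary, optionally move data, and release the block. The Python sketch below models that sequence on hypothetical in-memory tables. `Cluster`, `reassign`, and `delete_node` are illustrative names, not structures from the source. Setting `move_data=True` corresponds to the BES-node variants, in which data is physically moved between disk groups.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str]  # (node, key_range)


@dataclass
class Cluster:
    """Hypothetical in-memory stand-ins for the control structures the claims name."""
    dictionary: Dict[str, str] = field(default_factory=dict)       # key range -> owning node
    locks: Dict[Key, List[str]] = field(default_factory=dict)      # lock information
    directory: Dict[Key, List[str]] = field(default_factory=dict)  # directory information
    data: Dict[Key, List[str]] = field(default_factory=dict)       # pages per disk group
    nodes: Set[str] = field(default_factory=set)
    blocked: Set[str] = field(default_factory=set)  # queries would wait on these (not modelled)


def _take_over(table: Dict[Key, List[str]], old: str, new: str, key_range: str) -> None:
    """Move every entry for key_range from the old node to the new node."""
    entries = table.pop((old, key_range), [])
    table.setdefault((new, key_range), []).extend(entries)


def reassign(c: Cluster, key_range: str, target: str, online: bool, move_data: bool) -> None:
    """Hand key_range to target: block, take over, rewrite dictionary, move, release."""
    source = c.dictionary[key_range]
    if online:
        c.blocked.add(key_range)                     # block the key range
    try:
        c.nodes.add(target)                          # allocate the processor or disk
        _take_over(c.locks, source, target, key_range)
        _take_over(c.directory, source, target, key_range)
        c.dictionary[key_range] = target             # rewrite node-distribution dictionary
        if move_data:                                # BES-node variants only
            _take_over(c.data, source, target, key_range)
    finally:
        if online:
            c.blocked.discard(key_range)             # release the blockage


def delete_node(c: Cluster, node: str, target: str, online: bool, move_data: bool) -> None:
    """Hand every key range owned by node to target, then drop node."""
    for key_range in [k for k, owner in c.dictionary.items() if owner == node]:
        reassign(c, key_range, target, online, move_data)
    c.nodes.discard(node)
```

Releasing the block in a `finally` clause is a design assumption. It guarantees that a key range is never left blocked. A real system would more likely roll back a half-finished takeover before reopening the range.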

14. A parallel database system in which the number of processors or disks performing database processing is changed dynamically by the database division management method according to any one of claims 8 to 13.