JP3367140B2

JP3367140B2 - Database management method

Info

Publication number: JP3367140B2
Application number: JP10217893A
Authority: JP
Inventors: 信男河村; 俊一鳥居
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1993-04-28
Filing date: 1993-04-28
Publication date: 2003-01-14
Anticipated expiration: 2018-01-14
Also published as: JPH06314299A

Description

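Detailed Description of the Invention

Field of Industrial Application

[0001] The present invention relates to a database management method that uses a plurality of processors and a plurality of storage devices.

Prior Art

[0002] A database management system is a computer system that records and maintains data.

[0003] Among database management systems, in a relational database management system in particular, the database consists of tables (or relations) that the user sees in a two-dimensional tabular form, and each table consists of a plurality of rows (records, or tuples). Each row consists of a plurality of columns (attributes, or fields), and for each column a data type, data length, and similar properties that characterize the column are defined.

[0004] When such a relational database system is used, a user or application program processes data (select, update, insert, or delete) by issuing requests or commands to the database (called queries). In the data query and manipulation languages of relational database management systems, such as SQL, queries are non-procedural. That is, the user or application program only specifies what is needed and does not have to specify the procedure for carrying it out. Nor does the user or application program need to be aware of where the tables accessed by a query are stored.

[0005] However, while the user or application program need not be aware of the processing procedure, the burden on the relational database management system (the processing that optimizes the database access plan for each query) tends to grow. In particular, input/output processing against the external storage devices, such as magnetic disks, on which the database is stored becomes a burden.

[0006] In recent years, therefore, an increasing number of systems divide a single table across a plurality of external storage devices and parallelize the input/output processing, thereby reducing its load.

[0007] Furthermore, to parallelize database computation as well, some systems divide the data of a table among the external storage devices attached to a plurality of processors connected by a network. In such a system, one table can be stored on different external storage devices, and different processors can each read and process their portions in parallel. This data distribution technique plays an increasingly important role in relational database management systems.

[0008] Forms of distributed data storage include round robin, hash partitioning, user-specified key-range partitioning, and uniform partitioning.

[0009] The first form, round robin, stores data across a plurality of storage devices so that the amount of data on each device is uniform. The next form, hash partitioning, determines the target storage device by applying a hash function to a particular column of the table.

[0010] Key-range partitioning specifies, for a particular column of the table, the range of data to be stored on each storage device as a condition; for a given data value, the storage device whose condition the value satisfies is selected and the data is stored there.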
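The three placement rules above can be made concrete with a minimal Python sketch. The function names, the use of `zlib.crc32` as the hash function, and the convention that key-range partition i holds values in (boundaries[i-1], boundaries[i]] are illustrative assumptions, not details taken from the patent.

```python
from bisect import bisect_left
from itertools import count
from typing import Callable
from zlib import crc32


def round_robin_placer(num_devices: int) -> Callable[[object], int]:
    """Round robin: rows go to devices 0, 1, ..., n-1, 0, 1, ... in arrival order."""
    counter = count()
    return lambda _row: next(counter) % num_devices


def hash_place(value: object, num_devices: int) -> int:
    """Hash partitioning: hash the partitioning-column value to pick a device.

    zlib.crc32 over repr() is used because Python's built-in hash() of str
    is salted per process and so would not give a stable placement.
    """
    return crc32(repr(value).encode("utf-8")) % num_devices


def key_range_place(value: object, boundaries: list) -> int:
    """Key-range partitioning over len(boundaries) + 1 devices.

    Device i holds values in (boundaries[i-1], boundaries[i]]; the last
    device holds everything above boundaries[-1]. boundaries must be sorted.
    """
    return bisect_left(boundaries, value)


rr = round_robin_placer(3)
print([rr(row) for row in ["r1", "r2", "r3", "r4"]])   # [0, 1, 2, 0]
print(key_range_place(15, [10, 20, 30]))               # 1, since 10 < 15 <= 20
```

Round robin balances data volume but gives a query no way to skip devices; hash and key-range placement let a condition on the partitioning column narrow the devices to visit, which is the trade-off discussed in [0013] and [0017].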
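Some known systems that perform key-range partitioning according to user-specified conditions do not accept the storage device itself as the storage location of each partition; instead, they accept a logical database area consisting of at least one storage device. Storage locations are specified by logical database area so that the user (the person who defines the database) needs to be aware of the physical system configuration as little as possible.

[0011] The last form, uniform partitioning, first distributes a table's data across a plurality of storage devices in round-robin fashion at initial load. The entire table is then sorted on a particular column (item), and the data is redistributed across the storage devices in sorted order so that each holds an equal amount of data. Finally, the minimum and maximum values of the sort column are obtained for each storage device, and the range between them becomes that device's key range.

[0012] These techniques are summarized in "GAMMA - A High Performance Dataflow Database Machine" by David J. DeWitt et al., in the proceedings of the 1986 VLDB international conference, which also describes parallel database processing using multiple processors.

[0013] Dividing data across a plurality of different storage devices in this way makes parallel database access possible. In particular, when data is divided equally across the storage devices, as with round robin, the same processing can be requested of every device for a query, so parallel processing improves response time. With key-range partitioning, if the search conditions of a query include a condition on the key-range partitioning column, database processing need only be performed on the processing devices that store data satisfying that condition. This reduces the load on the other processing devices and improves the throughput of the system as a whole.

[0014] Another important technique for speeding up database processing is to optimize the database access method using statistical information.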
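The four-step uniform partitioning procedure of [0011] can be sketched as follows. This is a hypothetical helper, not code from the cited GAMMA paper; it assumes the whole table fits in memory and that the sort-column values are mutually comparable.

```python
from typing import Callable, List, Optional, Tuple


def uniform_partition(rows: list, key: Callable, num_devices: int
                      ) -> Tuple[List[list], List[Optional[tuple]]]:
    """Sketch of uniform partitioning as summarized in [0011]."""
    # Step 1: initial load, dealing rows onto devices round-robin.
    devices = [rows[i::num_devices] for i in range(num_devices)]
    # Step 2: sort the whole table on the partitioning column.
    ordered = sorted((r for dev in devices for r in dev), key=key)
    # Step 3: redistribute in sorted order so each device holds an equal share
    # (sizes differ by at most one row when the count does not divide evenly).
    base, extra = divmod(len(ordered), num_devices)
    partitions, start = [], 0
    for d in range(num_devices):
        size = base + (1 if d < extra else 0)
        partitions.append(ordered[start:start + size])
        start += size
    # Step 4: each device's key range is the (min, max) of its slice.
    key_ranges = [(key(p[0]), key(p[-1])) if p else None for p in partitions]
    return partitions, key_ranges


parts, ranges = uniform_partition([5, 3, 9, 1, 7, 2, 8], key=lambda v: v, num_devices=3)
print(parts)    # [[1, 2, 3], [5, 7], [8, 9]]
print(ranges)   # [(1, 3), (5, 7), (8, 9)]
```

Because the key ranges are derived after the data is placed, later insertions and deletions unbalance the slices and force the full re-sort and range recalculation criticized in [0021]; duplicate key values can also make adjacent ranges share an endpoint.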
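Statistical information is used by the system to determine the optimal database access procedure on the user's behalf. A representative kind of statistical information is interval frequency distribution information, which records, for each interval, the number of rows whose value in a particular column falls within that interval.

[0015] When a query sets a condition on a column that has interval frequency distribution information, the number of rows satisfying the condition (the selectivity) can be calculated from that column's distribution, so the optimal access procedure for the query (for example, whether or not to use an index) can be selected. A method of obtaining interval frequency distribution information is described in "Accurate Estimation of the Number of Tuples Satisfying a Condition" by Gregory Piatetsky-Shapiro et al., in the proceedings of the 1984 ACM SIGMOD international conference.

[0016] In general, a database management system provides means that allow database users to obtain the best performance when designing a database. In particular, if a table of the database is stored on a single external storage device, queries against the table concentrate accesses on that device, input/output processing easily becomes a bottleneck, and the throughput of the whole system drops. A well-known countermeasure is to provide a plurality of external storage devices for storing the table, which distributes input/output accesses and prevents the drop in throughput. Well-known ways of using the multiple devices include simply moving on to the next external storage device when one can hold no more data, and dividing the data among the devices according to the storage conditions described above (key-range, hash, or round-robin partitioning).

[0017] With the latter approach, however, in which a table is partitioned according to a specified storage condition, only queries that match that condition benefit from it; the performance of queries that do not match the storage condition is not guaranteed.

Problems to Be Solved by the Invention

[0018] As described above, queries from users of database management systems such as relational databases are ad hoc, so merely offering users storage conditions that are effective only in specific cases cannot deliver satisfactory system performance. Moreover, changing a partitioning condition, once applied, to a different one requires reorganizing all the data in the table and redefining the table's partitioning conditions, which places a heavy burden on the user (the database administrator).

[0019] It is also necessary to consider not only balancing accesses to the external storage devices that hold the table data but also the load on the processors that process the data. Another problem in table partitioning concerns the partitioning conditions the user specifies for key-range partitioning. Advances in parallel computing now allow the number of processors to range from a few to several thousand, and the number of storage devices (magnetic disks and the like) attached to the processors grows accordingly. When a database system is built on such a machine, having the user specify more than a few dozen partitioning conditions when defining a key-range-partitioned table is a heavy burden, and errors in the definition become likely. For example, if the user specifies only a few partitioning conditions, the amount of data stored on each storage device grows in proportion to the table's data volume, so the load on a single storage device increases and it easily becomes an I/O bottleneck. Furthermore, depending on the domain of the column named in the partitioning condition, imbalances can arise, such as data concentrating on a particular storage device.

[0020] In this case the user is forced to provide new storage devices and redefine the table. To redefine it, the user must first back up the data already stored, drop the table, redefine the partitioning conditions, and then reload the backed-up data, which prolongs the time during which system operation is disrupted.

[0021] Uniform partitioning reduces the user's burden: the user does not specify key-range partitioning conditions but only the item (column) to partition on, the system stores the data so that the storage devices hold equal amounts, and the key range of each storage device is determined from the minimum and maximum values of the specified column on that device. Even so, adding and deleting data creates imbalances in the amount of data on each storage device. Although the table need not be redefined, reorganization requires backing up all the data, equalizing the data volume of the storage devices again, and recalculating each device's key range, which can interfere with system operation. Furthermore, even when the data volume of every storage device is equal, some queries may concentrate accesses on a particular key range; correcting such access imbalance requires collecting access statistics for each storage device and having the user readjust the table's partitioning conditions.

[0022] An object of the present invention is to provide a database management system that, in an information processing apparatus consisting of a plurality of processing devices and storage devices connected to them, can optimally partition the data of the tables that make up a relational database.

Means for Solving the Problems

[0023] To achieve this object, the present invention targets an information processing apparatus in which a processing device, consisting of N (N ≥ 1) storage means and M (M ≥ 1) processors that perform input/output with the storage means, is the minimum unit of configuration; a set of O (O > 1) such processing devices forms a cluster; a set of P (P > 1) clusters forms a cluster group; and all the processing devices are connected by a network. A table constituting a relational database is partitioned and stored at an arbitrary level of this hierarchy according to the partitioning method specified by the user, and the partitioning definition information specified for that level is stored as information associated with that level, so that data is processed in response to user queries on the basis of the partitioning definition information.

Operation

[0024] In the relational database management system according to the present invention, a table of the relational database is partitioned and stored at an arbitrary level of the hierarchy according to the user-specified partitioning method, and the partitioning definition information specified for that level is stored in association with the level. For a user query, the processing devices that process the data and the storage devices that hold it can therefore be narrowed down on the basis of the partitioning definition information, so the processing load does not increase and the throughput of the whole system is not reduced.

Embodiments

[0025] In the database management system according to the present invention, a plurality of processors connected to a network partition table data across the external storage devices attached to each processor.

[0026] FIGS. 2 to 4 show the hardware of the database management system according to this embodiment.

[0027] In FIG. 2, the minimum unit of the database management system according to the present invention consists of a processor 12 and an external storage device 14 connected to it; this unit is called a node 15. A node may contain not just one processor but several tightly coupled processors, and more than one external storage device 14 may be connected to processor 12. Nodes are connected to a network 10. A set of nodes, each a processing device of the minimum unit, is treated as a cluster 16. Further, in FIGS. 3 and 4, a collection of clusters 16 is called a cluster group 18. Clusters 16 and cluster groups 18 are logical units of system configuration and do not denote special hardware. The database management system according to the present invention allows the system configuration to be changed in units of nodes 15, clusters 16, and cluster groups 18.

[0028] Next, FIG. 5 shows the system configuration of the database management system according to the present invention built on the hardware configuration shown in FIGS. 2 to 4.

[0029] FIG. 5 shows the processing units of the database management system according to the present invention (hereinafter called servers), the correspondence between servers and resources, the node configuration, and the position of the communication path (network). The front-end server 103 (hereinafter FES) receives queries from the application program 104 (hereinafter AP) and generates processing procedures. The back-end server 101 (hereinafter BES) receives processing procedures from the FES, accesses the database 102 (hereinafter DB) to obtain data, and passes the data to the FES. The journal server 107 (hereinafter JS) records the database change history and transaction status information generated by the FES and BES. The data dictionary server 105 (hereinafter DDS) holds the metadata used by the FES and BES (or JS), such as table definitions and the column information of each table in the relational database, in a data dictionary (hereinafter DD). Item 15 in FIG. 5 is the node shown in FIG. 2, and item 10 is the network shown in FIG. 2, the communication backbone between nodes. Each node is assigned a unique node address. The node address is a physical node identifier, and communication over the network is addressed by node address.

[0030] The network hides differences in communication protocols and network interfaces, allowing every server on the network to send and receive data by specifying an identifier that identifies the peer server.

[0031] Next, FIG. 1 shows an embodiment of partitioning a database table according to the present invention.

[0032] In FIG. 1, table T1 (20) is defined by a CREATE TABLE statement, the SQL data definition language of a relational database. Table T1 (20) is stored in cluster group 1, one of the cluster groups 18 shown in FIG. 2. Table T1 (20) is further divided among the clusters 16 (Cluster 1 to Cluster O) contained in cluster group 1; within each cluster it is divided among the nodes 15 of that cluster; and finally it is divided among the external storage devices 14 of each node.

[0033] FIG. 6 shows an example definition of table T1 in this case.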
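A hypothetical model of the FIG. 6 definition for table T1, as described in [0032] and the next paragraph: key-range partitioning on C1 across Cluster1 to Cluster8, hash partitioning on C2 across each cluster's nodes, and even partitioning across each node's disks. The class names, the cluster boundaries, the two-nodes-per-cluster and two-disks-per-node layout, and the round-robin choice of disk under even partitioning are all assumptions; `route_insert` only mirrors the level-by-level lookup of the INSERT flow (steps 410 to 428).

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional, Tuple
from zlib import crc32


@dataclass
class Level:
    """Partitioning rule for one level of the hierarchy (hypothetical model)."""
    method: str                  # "key_range", "hash" or "even"
    column: Optional[str]        # partitioning column; None for even partitioning
    targets: List[str]           # children at this level (clusters, nodes or disks)
    boundaries: list = field(default_factory=list)  # key_range: len(targets) - 1 sorted bounds
    _cursor: count = field(default_factory=count, repr=False)  # even: round-robin position

    def choose(self, row: dict) -> str:
        if self.method == "key_range":
            # target i holds values in (boundaries[i-1], boundaries[i]]
            return self.targets[bisect_left(self.boundaries, row[self.column])]
        if self.method == "hash":
            # deterministic hash (built-in hash() of str is salted per process)
            return self.targets[crc32(repr(row[self.column]).encode()) % len(self.targets)]
        # even partitioning: rotate through the targets (assumed policy)
        return self.targets[next(self._cursor) % len(self.targets)]


@dataclass
class TablePartitioning:
    """Per-level definitions, loosely mirroring management tables 32, 34 and 36."""
    cluster_group: str
    cluster_level: Level            # cluster group -> clusters
    node_levels: Dict[str, Level]   # cluster -> nodes
    disk_levels: Dict[str, Level]   # node -> disks

    def route_insert(self, row: dict) -> Tuple[str, str, str, str]:
        cluster = self.cluster_level.choose(row)
        node = self.node_levels[cluster].choose(row)
        disk = self.disk_levels[node].choose(row)
        return self.cluster_group, cluster, node, disk


clusters = [f"Cluster{i}" for i in range(1, 9)]
nodes = {c: [f"{c}-Node{j}" for j in (1, 2)] for c in clusters}
t1 = TablePartitioning(
    cluster_group="ClusterGroup1",
    cluster_level=Level("key_range", "C1", clusters, boundaries=[100 * i for i in range(1, 8)]),
    node_levels={c: Level("hash", "C2", nodes[c]) for c in clusters},
    disk_levels={n: Level("even", None, [f"{n}-Disk{k}" for k in (1, 2)])
                 for c in clusters for n in nodes[c]},
)
print(t1.route_insert({"C1": 650, "C2": "xyz"}))  # lands in Cluster7
```

Keeping one rule per level is what lets a single level be redefined or reorganized without touching the others, which is the localization benefit claimed in [0048].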
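The definition specifies, hierarchically, the cluster group, clusters, nodes, and external storage devices that store table T1, and at each level specifies how the table is partitioned at that level. Here, table T1 is first stored in cluster group 1; across the clusters of cluster group 1, table T1 is key-range partitioned on column C1 into clusters 1 to 8. Within each cluster, table T1 is hash-partitioned on column C2 across the nodes of the cluster, and within each node it is divided evenly across the node's external storage devices.

[0034] As shown in FIG. 7, the configuration information for these cluster groups, clusters, nodes, and external storage devices is managed as system configuration information 30 and stored in the DDS of FIG. 5. The partitioning definition information of a defined table is managed in the three tables shown in FIG. 8, which are also stored in the DDS of FIG. 5. The cluster group management information table 32 stores information about the cluster groups across which a table is partitioned, the cluster management information table 34 stores information about the clusters across which it is partitioned, and the node management information table 36 stores information about the nodes across which it is partitioned and the external storage devices (disks) in those nodes.

[0035] FIG. 9 shows how data is inserted into a table on the basis of the table partitioning information according to the present invention. In FIG. 9, insertion analysis processing 40 starts when, in the database management system configuration of FIG. 5, a request is passed from AP 104 to FES 103 by an SQL INSERT statement. To determine the partition in which the inserted data is to be stored, the three tables shown in FIG. 8 are consulted. To consult them, FES 103 sends a request for table information to DDS 105, which searches for the partitioning information of the requested table and returns the result to the requesting FES 103. First, the cluster group management table 32 is consulted to obtain the table's partitioning information (step 410). If there is partitioning information at the cluster group level (step 412), the cluster group and cluster into which the data is to be inserted are determined (step 414). Next, either because there is no partitioning information at the cluster group level or on the basis of the cluster just determined, the cluster management table 34 is consulted to obtain the table's partitioning information at the cluster level (step 416).

[0036] If there is partitioning information at the cluster level (step 418), the cluster and node into which the data is to be inserted are determined (step 420). Next, either because there is no partitioning information at the cluster level or on the basis of the node just determined, the node management table 36 is consulted to obtain the table's partitioning information at the node level (step 422).

[0037] If there is partitioning information at the node level (step 424), the node into which the data is to be inserted is determined (step 426). Once the node is determined, the disk partitioning information within the node is consulted and the disk into which the data is to be inserted is determined (step 428). On the basis of the insertion location thus determined (which disk of which node of which cluster of which cluster group), a data insertion request is sent to the corresponding server (BES 101 in FIG. 5) (step 430).

[0038] Next, FIGS. 10 and 11 show how data is retrieved from a table on the basis of the table partitioning information according to the present invention.

[0039] In FIG. 10, optimization processing 50 starts after, in the database management system configuration of FIG. 5, a request has been passed from AP 104 to FES 103 by an SQL SELECT statement and parsed. To determine the partitions of the table to be searched, the three tables shown in FIG. 8 are consulted: FES 103 sends a request for table information to DDS 105, which searches for the partitioning information of the requested table and returns the result to the requesting FES 103.

[0040] Optimization processing 50 consists of three phases. First, the table's partitioned-storage conditions are looked up in the three tables and, from the search conditions specified in the query, the range to be searched is evaluated (step 510). Next, an access procedure is determined from the resulting search range (step 520). Finally, from that access procedure, the processing procedure to be executed by the servers (BES) that perform the processing is generated (step 530). The generated processing procedure is sent from FES 103 to the target BES 101, which performs database access according to it and returns the result to FES 103.

[0041] FIG. 11 shows the flow of the table partitioned-storage condition evaluation in step 510 of FIG. 10.

[0042] First, the cluster group management table 32 is consulted to obtain the partitioning information of the table (step 5110). If there is partitioning information at the cluster group level (step 5112), the cluster group and clusters to be searched are determined (step 5114). For table T1 (20) in FIG. 1, cluster group 1 is the search target. Because the condition for partitioning into clusters within the cluster group is key-range partitioning, if the query specifies a search condition on column C1, the key-range partitioning column, the clusters to be searched are determined from the comparison value given in that condition. Next, either because there is no partitioning information at the cluster group level or on the basis of the clusters just determined, the cluster management table 34 is consulted to obtain the table's partitioning information at the cluster level (step 5116).

[0043] If there is partitioning information at the cluster level (step 5118), the clusters and nodes to be searched are determined (step 5120). For table T1 in FIG. 1, suppose cluster 11 is the search target. Because the condition for partitioning into nodes within that cluster is hash partitioning, if the query specifies a search condition on column C2, the hash partitioning column, the nodes to be searched are determined from the comparison value given in that condition. Next, either because there is no partitioning information at the cluster level or on the basis of the nodes just determined, the node management table 36 is consulted to obtain the table's partitioning information at the node level (step 5122).

[0044] If there is partitioning information at the node level (step 5124), the nodes to be searched are determined (step 5126). For table T1 in FIG. 1, suppose node Node 111 is the search target. Once the node is determined, the disk partitioning information within the node is consulted and the disks to be searched are determined (step 5128).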
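The search-range evaluation of FIG. 11 can be sketched in the same spirit. The predicate encoding (`("=", v)` or `("between", lo, hi)`), the 0-based partition indices, and the `crc32` hash (which must match whatever hash was used when the rows were inserted) are assumptions for illustration. Key-range levels are pruned by equality or range conditions, hash levels only by equality, and an evenly partitioned level is never pruned.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple
from zlib import crc32

Pred = Optional[tuple]  # None, ("=", value) or ("between", lo, hi)


def prune_key_range(boundaries: list, pred: Pred) -> List[int]:
    """Key-range level: partition i holds values in (boundaries[i-1], boundaries[i]]."""
    n = len(boundaries) + 1
    if pred is None:
        return list(range(n))                     # no condition on the column: scan all
    if pred[0] == "=":
        return [bisect_left(boundaries, pred[1])]
    _, lo, hi = pred                              # ("between", lo, hi), inclusive
    return list(range(bisect_left(boundaries, lo), bisect_left(boundaries, hi) + 1))


def prune_hash(n: int, pred: Pred) -> List[int]:
    """Hash level: only an equality condition identifies a single partition."""
    if pred is not None and pred[0] == "=":
        return [crc32(repr(pred[1]).encode()) % n]  # must match the hash used on insert
    return list(range(n))


def search_targets(cluster_bounds: list, nodes_per_cluster: int, disks_per_node: int,
                   preds: Dict[str, Pred]) -> List[Tuple[int, int, int]]:
    """(cluster, node, disk) indices to scan, 0-based, walking the levels as in FIG. 11.

    Assumes T1's layout: key range on C1, hash on C2, even split across disks.
    """
    targets = []
    for c in prune_key_range(cluster_bounds, preds.get("C1")):      # steps 5110-5114
        for n in prune_hash(nodes_per_cluster, preds.get("C2")):    # steps 5116-5120
            for d in range(disks_per_node):  # even split: all disks (steps 5122-5128)
                targets.append((c, n, d))
    return targets
```

With eight clusters, four nodes per cluster and two disks per node, a query with no condition on C1 or C2 yields all 64 (cluster, node, disk) triples, while `C1 BETWEEN 150 AND 250` together with `C2 = 'abc'` narrows the scan to two clusters, one node each, and both disks of each node.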
Because the condition for partitioning across disks within the node is even partitioning, all disks in that node become search targets. In this way, the range of search processing required by the query is determined (step 5130).

[0045] Although a specific embodiment of the database management method of the present invention has been described, other embodiments are possible without departing from the object and spirit of the invention. One such embodiment is a standalone system consisting of a processor and a plurality of external storage devices connected to it, in which the external storage devices are managed as logical hierarchical levels and an arbitrary partitioning method is realized at each level of that hierarchy.

Effects of the Invention

[0046] According to the present invention, the data of a table in a database is partitioned with the hierarchy of the system's hardware taken into account, so the data can easily be distributed uniformly.

[0047] Moreover, with this partitioning scheme, the parallel data search method best suited to each partitioning method can be selected.

[0048] Furthermore, because tables are partitioned hierarchically, data reorganization can be performed at any level of the hierarchy, which localizes the scope of the reorganization.
In the case of Table T1, the cluster 11 is assumed to be a search target. In the cluster, since the partitioning condition for partitioning into nodes is hash partitioning, if a search condition for the column C2 subject to hash partitioning is specified, the comparison value specified in the search condition is used. Determine the node to search based on. Next, based on the previously determined node information, whether there is any division information in the cluster or not, the node management table 36 is referred to, and the division information in the node of the table is acquired (step 5122). Therefore, if there is division information at the node (step 5124), a node from which data is to be retrieved is determined (step 5126). In the case of the table T1 in FIG. 1, it is assumed that the cluster Node 111 is a search target.
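The cluster-level and node-level decisions above amount to partition pruning. The Python sketch below illustrates them under stated assumptions; it is not the patent's implementation. The function names, the sorted list of C1 upper bounds, and the modulo mapping used in place of the unspecified hash function are hypothetical. It also shows why the choice of division method matters for search: key range division can prune for a range of C1 values, while hash partitioning can prune only for an equality condition on C2.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple


def prune_clusters(upper_bounds: Sequence[int], clusters: Sequence[str],
                   c1_range: Optional[Tuple[int, int]]) -> List[str]:
    """Key-range level: choose the clusters a condition on C1 can touch.

    Hypothetical layout: clusters[i] holds rows with
    upper_bounds[i-1] < C1 <= upper_bounds[i]; the last cluster has no upper
    bound, so len(clusters) == len(upper_bounds) + 1.
    """
    if c1_range is None:                 # no condition on C1: search every cluster
        return list(clusters)
    lo, hi = c1_range                    # inclusive bounds; (v, v) means C1 = v
    first = bisect_left(upper_bounds, lo)
    last = bisect_left(upper_bounds, hi)
    return list(clusters)[first:last + 1]


def prune_nodes(nodes: Sequence[str], c2_value: Optional[int]) -> List[str]:
    """Hash level: an equality condition on C2 selects exactly one node.

    The modulo mapping is an assumed stand-in for the unspecified hash
    function. Callers pass None when there is no equality condition on C2
    (for example, only a range), and then every node is searched.
    """
    if c2_value is None:
        return list(nodes)
    return [nodes[c2_value % len(nodes)]]
```

For instance, with a single hypothetical boundary of 1000 between cluster 11 and cluster 12, the condition C1 = 500 selects only cluster 11, a range of 500 to 1500 selects both, and C2 = 8 over nodes 111 and 112 selects node 111.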
Once the node is determined, the disks to be searched are determined by referring to the disk division information in the node (step 5128).
Within the node the table is divided across disks by uniform division, and with uniform division every disk in the node must be searched. In this way, the range of search processing requests to be handled for the query is determined (step 5130).

Although a specific embodiment of the database management method of the present invention has been described, other embodiments are possible without departing from the purpose and spirit of the invention. For example, even in a single system consisting of a processor and a plurality of external storage devices connected to it, the external storage devices can be managed in a logical hierarchy, and any division method can be applied at each level of that hierarchy.

According to the present invention, the data of a table in the database is divided in a way that reflects the hierarchy of the hardware that makes up the system, which makes it easier to distribute the data uniformly. This approach to data division also makes it possible to choose, for each division method, the parallel search processing method best suited to it. Furthermore, because the table is divided hierarchically, data can be reorganized at any level of the hierarchy, which keeps the scope of a reorganization local.
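The insertion flow of FIG. 9 (steps 410 to 430) and the search-range evaluation of FIG. 11 (steps 5110 to 5130) walk the same three management tables from the top down. The following Python sketch models that walk for the table T1 example: key range division on C1 into clusters, hash partitioning on C2 into nodes, and uniform division across each node's disks. It is an illustration under stated assumptions, not the patent's implementation: the `Level` class, the unit names other than cluster 11 and node 111, the C1 boundary of 1000, the modulo hash, and the round-robin choice of disk on insert are all hypothetical, and search conditions are limited to equality.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Level:
    """Hypothetical model of one entry in a management table (32, 34 or 36):
    how a table is divided across the units one level down."""
    method: str                                  # "key_range", "hash" or "uniform"
    children: List[str]                          # units at the next lower level
    column: Optional[str] = None                 # partitioning column; None for uniform
    upper_bounds: List[int] = field(default_factory=list)  # key_range only
    _cursor: Iterator[int] = field(default_factory=count, init=False, repr=False)

    def route(self, row: Dict[str, int]) -> str:
        """Pick the one child that stores an inserted row."""
        if self.method == "key_range":
            return self.children[bisect_left(self.upper_bounds, row[self.column])]
        if self.method == "hash":
            # Assumed hash: modulo over the children (not specified in the source).
            return self.children[row[self.column] % len(self.children)]
        # Uniform division: the insert rule is not given; assume round robin.
        return self.children[next(self._cursor) % len(self.children)]

    def targets(self, cond: Dict[str, int]) -> List[str]:
        """Children a search must visit, given equality conditions in cond."""
        if self.method in ("key_range", "hash") and self.column in cond:
            return [self.route(cond)]
        return list(self.children)               # no usable condition, or uniform


# Hypothetical layout for table T1 in cluster group 1.
CLUSTER_GROUP_1 = Level("key_range", ["cluster 11", "cluster 12"], "C1", [1000])
CLUSTERS = {
    "cluster 11": Level("hash", ["node 111", "node 112"], "C2"),
    "cluster 12": Level("hash", ["node 121", "node 122"], "C2"),
}
NODES = {n: Level("uniform", [f"{n} disk {i}" for i in (1, 2)])
         for n in ("node 111", "node 112", "node 121", "node 122")}


def insert_location(row: Dict[str, int]) -> Tuple[str, str, str]:
    """FIG. 9 sketch: fix the cluster, then the node, then the disk."""
    cluster = CLUSTER_GROUP_1.route(row)
    node = CLUSTERS[cluster].route(row)
    disk = NODES[node].route(row)
    return cluster, node, disk


def search_range(cond: Dict[str, int]) -> List[Tuple[str, str, str]]:
    """FIG. 11 sketch: every (cluster, node, disk) the query must touch."""
    return [(c, n, d)
            for c in CLUSTER_GROUP_1.targets(cond)
            for n in CLUSTERS[c].targets(cond)
            for d in NODES[n].targets(cond)]
```

With this layout, inserting a row with C1 = 500 and C2 = 8 goes to cluster 11 and node 111, and successive inserts alternate between that node's two disks. A search on the same two conditions is confined to node 111 but, because of uniform division, still visits both of its disks.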

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is an explanatory diagram showing an example of how a table is divided in the present embodiment.
FIG. 2 is a block diagram showing the hardware configuration of the embodiment from the cluster level down to the node level.
FIG. 3 is a block diagram showing the hardware configuration of the embodiment from the cluster group level down to the node level.
FIG. 4 is a block diagram showing the maximum hardware configuration of the embodiment.
FIG. 5 is a block diagram showing the configuration of the database management system of the embodiment.
FIG. 6 is an explanatory diagram showing an example of a table division definition according to the embodiment.
FIG. 7 is an explanatory diagram showing the system configuration information for the hardware configuration of the embodiment.
FIG. 8 is an explanatory diagram showing the table division information of the embodiment.
FIG. 9 is a flowchart of the analysis processing for inserting data into a table according to the embodiment.
FIG. 10 is a flowchart of the optimization processing in table search processing according to the embodiment.
FIG. 11 is a flowchart of the table division storage condition evaluation processing in table search processing according to the embodiment.

DESCRIPTION OF REFERENCE NUMERALS
10: network, 12: processor, 14: storage device, 15: node, 16: cluster, 18: cluster group.

Continuation of front page
(56) References cited:
DeWitt, D.; Gray, J., "Parallel database systems: the future of high performance database systems," Communications of the ACM, Vol. 35, No. 6, pp. 85-98, 1992.
Ghandeharizadeh, S.; DeWitt, D. J., "Hybrid-Range Partitioning Strategy: A New Declustering Strategy for Multiprocessor Database Machines," 16th International Conference on Very Large Data Bases, pp. 481-492, 1990.
(58) Fields searched (Int. Cl.⁷, DB name): G06F 17/30, G06F 12/00, JICST file (JOIS)

Claims

(57) [Claim 1] A database management method for a system in which each processing device comprises N (N ≥ 1) storage means and M (M ≥ 1) processors that perform input/output to and from the storage means, O (O > 1) processing devices form a cluster, P (P > 1) clusters form a cluster group so as to constitute a hierarchy, and all of the processing devices are connected via a network, the method being characterized by: dividing and storing a table constituting a relational database at any level of the hierarchy according to a division method specified by a user; storing the division definition information specified for each level as information associated with that level; and processing data on the basis of the division definition information in response to a query from the user.