JP3584630B2

JP3584630B2 - Classification and aggregation processing method in database processing system

Info

Publication number: JP3584630B2
Application number: JP24958996A
Authority: JP
Inventors: 一智牛嶋; 真二藤原; 一夫正井; 耕作山平; 章前田; 仁史芦田
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1996-09-20
Filing date: 1996-09-20
Publication date: 2004-11-04
Anticipated expiration: 2016-09-20
Also published as: JPH1097544A

Description

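[0001]
[Technical Field of the Invention]
The present invention relates to a parallel database processing system in which one or more nodes cooperate to perform database processing, and more particularly to a parallel database processing system that speeds up classification and aggregation (grouped aggregation) of a database.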
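[0002]
[Prior Art]
As shown in FIG. 2, the logical structure of a database is a table (20). A horizontal row of this table is called a record (21), and a vertical column is called a column (22). The same column of every record stores data of the same format.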
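[0003]
Classification and aggregation processing classifies the records in a table into groups based on the values of one or more columns specified in advance and, for each set of records belonging to a group, computes statistics such as the sum or the average of the values of one or more aggregation target columns, which are also specified in advance.
[0004]
FIG. 3 shows an example of classification and aggregation processing. In this example, a relational database holds sales records in the form of a table. Each record in the table has column values for product code, place of sale, date of sale, and price (30). Records that have the same values for product code, place of sale, and date of sale form one group (31), and the total price of the records in each group is calculated (32). The aggregation result is again obtained in the form of a table (33).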
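[0005]
Conventional classification and aggregation methods include, first, the merge sort method disclosed in Chapter 3, pages 48 to 49, of "Parallel Sorting Algorithms" (Selim G. Akl, Academic Press, Inc.), and second, the simple hash method disclosed in Chapter 9, pages 262 to 264, of "Relational Database Management" (M. Papazoglou and W. Valder, Prentice Hall).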
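[0006]
FIG. 4 shows a flowchart of the merge sort method, the first conventional method. In the merge sort method, the records constituting a database table are read from a secondary storage device in blocks (40), sorted in main memory (41), and written back to the secondary storage device (42). After this has been repeated for all record blocks (43), the per-block sorted record lists are merge-sorted using the secondary storage device as a work area (44). The sorted record sequence is then read sequentially from the secondary storage device (45) while the records are classified and aggregated (46). However, the external sort of the records generates a large amount of I/O, and the sort (which corresponds to classification) cannot run concurrently with aggregation, so this method cannot perform classification and aggregation efficiently.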
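To make the contrast concrete, here is a minimal in-memory Python sketch of the sort-then-aggregate pattern behind the merge sort method. It assumes records are tuples and that the whole input fits in memory, so Python's `sorted` stands in for the external merge sort through secondary storage; the function and parameter names are illustrative.

```python
from itertools import groupby


def sort_then_aggregate(records, key_cols, value_col):
    """Sort-based grouping in the style of the merge sort method (in memory only).

    Phase 1 must finish before phase 2 can start: aggregation cannot
    overlap with the sort that performs the classification.
    """
    def key(r):
        return tuple(r[c] for c in key_cols)

    # Phase 1: classify by sorting (the conventional method performs an
    # external merge sort through secondary storage at this point).
    ordered = sorted(records, key=key)
    # Phase 2: one sequential pass; records of the same group are now adjacent.
    return [(k, sum(r[value_col] for r in grp)) for k, grp in groupby(ordered, key=key)]
```

Because `groupby` only merges adjacent equal keys, the second phase is correct only after the first has completed, which is exactly the serialization the paragraph criticizes.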
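[0007]
FIG. 5 shows a flowchart of the simple hash method, the second conventional method. In the simple hash method, records are grouped with a hash function (51) as they are read from the secondary storage device (50), and at the same time the aggregation results in the result storage area are updated (52). Classification and aggregation thus run concurrently, and no temporary file is created for grouping. However, because the simple hash method accesses the result storage area randomly, random I/O to the secondary storage device occurs (54) when the aggregation results do not fit in the reserved area of main memory (53), which reduces the execution efficiency of classification and aggregation.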
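For comparison, a minimal Python sketch of the simple hash method, with a `dict` standing in for the hash table that holds the per-group results. Each record is classified and aggregated in the same step. The sketch does not model the reserved-area limit, so it shows the concurrency benefit but not the random I/O that appears once the table outgrows main memory; the names are illustrative.

```python
from collections import defaultdict


def hash_aggregate(records, key_cols, value_col):
    """One pass: each record is hashed to its group and the running sum updated at once."""
    totals = defaultdict(int)  # hash table: group key -> running total
    for r in records:
        key = tuple(r[c] for c in key_cols)
        totals[key] += r[value_col]  # classification and aggregation in one step
    return dict(totals)
```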
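[0008]
[Problems to Be Solved by the Invention]
When grouping database records for classification and aggregation, conventional relational database systems performed a merge sort through the secondary storage device. This approach requires a large amount of I/O for the merge sort and cannot run classification and aggregation concurrently, which greatly reduced the efficiency of classification and aggregation.
[0009]
Alternatively, conventional relational database systems grouped records with a hash function when classifying them for grouped aggregation. This approach accesses the result storage area randomly during aggregation, and when the results did not fit in the reserved area of main memory, a large amount of random I/O was issued to the secondary storage device, reducing the efficiency of classification and aggregation.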
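[0010]
An object of the present invention is to allow classification and aggregation to run concurrently in database classification and aggregation processing.
[0011]
Another object of the present invention is to avoid random I/O to the secondary storage device even when the aggregation results do not fit in the reserved area of main memory.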
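[0012]
[Means for Solving the Problems]
The classification and aggregation method according to the present invention is a method for performing classification and aggregation in a parallel database processing system that has a network 3 connecting one or more input/output (I/O) servers 2, one or more aggregation servers 4, and one or more query processing servers 5, and in which partial tables 7, obtained by partitioning one table consisting of many records, are distributed among and held by the I/O servers.
In this method, a hash mechanism determines the storage location of each aggregation result and performs the aggregation record by record, so that classification and aggregation of records can run concurrently. When the aggregation results do not fit in the reserved area of main memory, partial aggregation results are saved to an auxiliary secondary storage device 17 and integrated after aggregation completes. This prevents random I/O to the auxiliary secondary storage device, so classification and aggregation are performed efficiently.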
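[0013]
Specifically, this is realized by a first database processing system that comprises a query processing server that handles query requests to the database, I/O servers that read database records, and aggregation servers that aggregate records, and in which one or more columns of the records are designated as grouping columns, a group identifier is associated with the grouping column values, records with the same group identifier value are classified into one group, one or more columns of the records are designated as aggregation target columns, and aggregation is performed for the records classified into each group. In this system, each I/O server has means for transferring each record, according to the values of its grouping columns, to the aggregation server that will aggregate it. Each aggregation server has means for allocating a result storage area in its main memory; means for generating a group identifier from the grouping column values of a record received from an I/O server; classification means for uniquely determining, from the group identifier value, the storage location of the aggregation result corresponding to that group identifier; aggregation means for updating the aggregation result stored at that location based on the values of the record's aggregation target columns; and means for transferring the resulting intermediate aggregation result to the query processing server. The query processing server has means for integrating the intermediate aggregation results received from the aggregation servers.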
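[0014]
In the first database processing system, when the aggregation results no longer fit in the result storage area allocated in the aggregation server's main memory, the in-progress results are sorted in main memory by the value of their corresponding group identifiers and saved to the aggregation server's auxiliary secondary storage device as an intermediate aggregation result. When aggregation finishes, the results in main memory are merged with the zero or more sorted intermediate results saved in the auxiliary secondary storage device, completing the aggregation of the records distributed to each aggregation server.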
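[0015]
In the first database processing system, when the aggregation results no longer fit in the result storage area allocated in the aggregation server's main memory, the most recently referenced portion of the in-progress results is kept in memory, and the remaining results are sorted in main memory by the value of their corresponding group identifiers and saved to the aggregation server's auxiliary secondary storage device as an intermediate aggregation result. When aggregation finishes, the results in main memory are merged with the zero or more sorted intermediate results saved in the auxiliary secondary storage device, completing the aggregation of the records distributed to each aggregation server.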
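[0016]
In the first database processing system, when the aggregation results no longer fit in the result storage area allocated in the aggregation server's main memory, the records that overflowed and could not be aggregated are saved to the aggregation server's auxiliary secondary storage device. When aggregation finishes, the results in main memory are merged with the aggregation results of the zero or more records saved in the auxiliary secondary storage device, completing the aggregation of the records distributed to each aggregation server.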
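[0017]
In the first database processing system, when the aggregation results no longer fit in the result storage area allocated in the aggregation server's main memory, the records that overflowed and could not be aggregated are saved to the aggregation server's auxiliary secondary storage device; in addition, when the proportion of records that could be aggregated in main memory falls below a given value, the in-progress results are sorted in main memory by the value of their corresponding group identifiers and saved to the auxiliary secondary storage device as an intermediate aggregation result. When aggregation finishes, the results in main memory, the aggregation results of the zero or more records saved in the auxiliary secondary storage device, and the zero or more sorted intermediate results saved there are merged, completing the aggregation of the records distributed to each aggregation server.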
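[0018]
Next, this is realized by a second database processing system that comprises a query processing server that handles query requests to the database, I/O-aggregation servers that read and aggregate database records, and intermediate aggregation servers that aggregate intermediate results, and in which one or more columns of the records are designated as grouping columns, a group identifier is associated with the grouping column values, records with the same group identifier value are classified into one group, one or more columns of the records are designated as aggregation target columns, and aggregation is performed for the records classified into each group. Each I/O-aggregation server has means for allocating a result storage area in its main memory; means for generating a group identifier from the grouping column values of each record; classification means for uniquely determining, from the group identifier value, the storage location of the aggregation result corresponding to that group identifier; aggregation means for updating the aggregation result stored at that location based on the values of the record's aggregation target columns; and means for transferring each partial aggregation result, according to the value of its group identifier, to the intermediate aggregation server that integrates it. Each intermediate aggregation server has means for integrating the partial aggregation results received from the I/O-aggregation servers into an intermediate result and means for transferring that intermediate result to the query processing server. The query processing server has means for integrating the intermediate aggregation results received from the intermediate aggregation servers.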
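[0019]
In the second database processing system, when the aggregation results no longer fit in the result storage area allocated in the I/O-aggregation server's main memory, the in-progress results in main memory are transferred as partial aggregation results to the intermediate aggregation servers according to the correspondence rule held by the I/O-aggregation server, and the intermediate aggregation servers classify and aggregate the partial results, completing the aggregation of the records stored on each I/O-aggregation server.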
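[0020]
In the second database processing system, when the aggregation results no longer fit in the result storage area allocated in the I/O-aggregation server's main memory, the in-progress results are sorted in main memory by the value of their corresponding group identifiers and saved to the I/O-aggregation server's auxiliary secondary storage device as an intermediate aggregation result. When aggregation finishes, the results in main memory are merged with the zero or more sorted intermediate results saved in the auxiliary secondary storage device, completing the aggregation of the records distributed to each I/O-aggregation server.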
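[0021]
In the second database processing system, when the aggregation results no longer fit in the result storage area allocated in the I/O-aggregation server's main memory, the most recently referenced portion of the in-progress results is kept in memory, and the remaining results are sorted in main memory by the value of their corresponding group identifiers and saved to the I/O-aggregation server's auxiliary secondary storage device as an intermediate aggregation result. When aggregation finishes, the results in main memory are merged with the zero or more sorted intermediate results saved in the auxiliary secondary storage device, completing the aggregation of the records distributed to each I/O-aggregation server.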
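[0022]
In the second database processing system, when the aggregation results no longer fit in the result storage area allocated in the I/O-aggregation server's main memory, the records that overflowed and could not be aggregated are saved to the I/O-aggregation server's auxiliary secondary storage device. When aggregation finishes, the results in main memory are merged with the aggregation results of the zero or more records saved in the auxiliary secondary storage device, completing the aggregation of the records distributed to each I/O-aggregation server.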
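[0023]
In the second database processing system, when the aggregation results no longer fit in the result storage area allocated in the I/O-aggregation server's main memory, the records that overflowed and could not be aggregated are saved to the I/O-aggregation server's auxiliary secondary storage device; in addition, when the proportion of records that could be aggregated in main memory falls below a given value, the in-progress results are sorted in main memory by the value of their corresponding group identifiers and saved to the auxiliary secondary storage device as an intermediate aggregation result. When aggregation finishes, the results in main memory, the aggregation results of the zero or more records saved in the auxiliary secondary storage device, and the zero or more sorted intermediate results saved there are merged, completing the aggregation of the records distributed to each I/O-aggregation server.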
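[0024]
Also, when an intermediate aggregation server classifies and aggregates partial aggregation results and the results no longer fit in the result storage area allocated in its main memory, the in-progress results are sorted in main memory by the value of their corresponding group identifiers and saved to the intermediate aggregation server's auxiliary secondary storage device as an intermediate aggregation result. When aggregation finishes, the results in main memory are merged with the zero or more sorted intermediate results saved in the auxiliary secondary storage device, completing the aggregation of the partial results distributed to each intermediate aggregation server.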
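[0025]
Also, when an intermediate aggregation server classifies and aggregates partial aggregation results and the results no longer fit in the result storage area allocated in its main memory, the most recently referenced portion of the in-progress results is kept in memory, and the remaining results are sorted in main memory by the value of their corresponding group identifiers and saved to the intermediate aggregation server's auxiliary secondary storage device as an intermediate aggregation result. When aggregation finishes, the results in main memory are merged with the zero or more sorted intermediate results saved in the auxiliary secondary storage device, completing the aggregation of the partial results distributed to each intermediate aggregation server.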
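[0026]
Also, when an intermediate aggregation server classifies and aggregates partial aggregation results and the results no longer fit in the result storage area allocated in its main memory, the records that overflowed and could not be aggregated are saved to the intermediate aggregation server's auxiliary secondary storage device. When aggregation finishes, the results in main memory are merged with the aggregation results of the zero or more records saved in the auxiliary secondary storage device, completing the aggregation of the partial results distributed to each intermediate aggregation server.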
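[0027]
Also, when an intermediate aggregation server classifies and aggregates partial aggregation results and the results no longer fit in the result storage area allocated in its main memory, the records that overflowed and could not be aggregated are saved to the intermediate aggregation server's auxiliary secondary storage device; in addition, when the proportion of records that could be aggregated in main memory falls below a given value, the in-progress results are sorted in main memory by the value of their corresponding group identifiers and saved to the auxiliary secondary storage device as an intermediate aggregation result. When aggregation finishes, the results in main memory, the aggregation results of the zero or more records saved in the auxiliary secondary storage device, and the zero or more sorted intermediate results saved there are merged, completing the aggregation of the partial results distributed to each intermediate aggregation server.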
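[0028]
Further, consider a database processing system that designates one or more columns of the records constituting a stored table as grouping columns, associates group identifiers with the grouping column values one-to-one or many-to-one, classifies records with the same group identifier value into one group, designates one or more columns of the records in each group as aggregation target columns, and aggregates the values of those columns. In such a system, this is realized by providing, in the grouping column specification of classification and aggregation, a syntax for specifying the range of group identifier values.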
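[0029]
Furthermore, in a database processing system that designates one or more columns of the records constituting a stored table as grouping columns, associates group identifiers with the grouping column values one-to-one or many-to-one, classifies records with the same group identifier value into one group, designates one or more columns of the records in each group as aggregation target columns, and aggregates the values of those columns, this is realized by providing, in the grouping column specification of classification and aggregation, a syntax for specifying an upper limit on the number of groups produced by the grouping.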
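[0030]
Furthermore, in a database processing system that designates one or more columns of the records constituting a stored table as grouping columns, associates group identifiers with the grouping column values one-to-one or many-to-one, classifies records with the same group identifier value into one group, designates one or more columns of the records in each group as aggregation target columns, and aggregates the values of those columns, this is realized by providing, in the grouping column specification of classification and aggregation, a syntax for specifying a user-defined function that defines the correspondence between grouping column values and group identifiers.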
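[0031]
Further, in the above database processing system, with a syntax for specifying the range of group identifier values in the grouping column specification, this is realized by choosing, separately for each grouping column and depending on whether a range is specified for its group identifier values, between an array method, which determines the result storage location directly by computation from the group identifier value, and a hash method, which determines it by comparing group identifiers.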
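[0032]
Further, in the above database processing system, with a syntax for specifying an upper limit on the number of groups produced by the grouping in the grouping column specification, this is realized by referring to the range specification of group identifier values for each grouping column, preparing groups for group identifiers whose values fall outside the upper or lower bound of the specified range, and also aggregating the records classified into those groups.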
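[0033]
Further, in the above database processing system, with a syntax for specifying, in the grouping column specification, a user-defined function that defines the correspondence between grouping column values and group identifiers, this is realized by using that user-defined function to narrow the range of group identifier values before aggregation.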
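[0034]
[Embodiments of the Invention]
(Embodiment 1)
FIG. 1 shows the configuration of one embodiment of the parallel database processing system according to the present invention. In FIG. 1, 1 is a secondary storage device that stores tables; 2 is an I/O server that reads partial tables 7 from the secondary storage device; 4 is an aggregation server that classifies and aggregates records; 5 is a query processing server that integrates the results of the aggregation servers and produces the final aggregation result; 3 is a network for exchanging records and aggregation results between nodes; and 6 is a terminal that issues classification and aggregation requests to the database and retrieves the results from the query processing server 5. Any interprocessor network, such as a LAN, a WAN, or dedicated hardware, can be used as the network 3.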
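[0035]
The secondary storage devices 1 store the records subject to grouped aggregation. The records are partitioned across multiple secondary storage devices by a partitioning technique such as hash partitioning or key-range partitioning. Each I/O server 2 holds a data buffer 8 that temporarily stores records read from the secondary storage device 1, record distribution means 9 that determines the aggregation server to which each record is sent, and multiple send buffers 10. The data buffer 8 is allocated in the I/O server's main memory.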
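[0036]
The partial table 7 held by the secondary storage device 1 is transferred to the data buffer 8 in blocks. The record distribution means 9 takes records from the data buffer 8 one after another and, by referring to the values of each record's grouping columns, determines the aggregation server 4 that should aggregate the record. The I/O server 2 extracts only the grouping columns and aggregation target columns needed to aggregate the record and then distributes it to the aggregation servers 4 from the send buffers 10 over the network 3.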
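[0037]
Each aggregation server 4 holds multiple receive buffers 11, aggregation means 12 that aggregates the records transferred from the I/O servers 2, a result storage area 13 that stores aggregation results, and an auxiliary secondary storage device 17 for saving in-progress aggregation results. The result storage area is allocated in the aggregation server 4's main memory. The aggregation server aggregates the records transferred into the receive buffers 11 using the aggregation means 12 and stores the intermediate aggregation result in the result storage area 13. If the intermediate result does not fit in the result storage area 13, the in-progress results are saved to the auxiliary secondary storage device 17 as partial aggregation results. When the aggregation server 4 has aggregated all records distributed to it, it transfers its intermediate result to the query processing server 5.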
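[0038]
The query processing server 5 holds multiple receive buffers 14, result integration means 15, and a final result storage area 16. The query processing server 5 accepts aggregation requests from the terminal 6, determines the execution procedure, and issues execution instructions to the I/O servers 2 and aggregation servers 4. It also collects the intermediate results produced by the aggregation servers 4 over the network 3, integrates them into the final aggregation result, and stores it in the final result storage area 16, which is held in the query processing server 5's main memory or secondary storage. The terminal 6 requests aggregation from the query processing server 5 and retrieves the final result piece by piece as soon as it is produced on the query processing server 5.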
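[0039]
FIG. 6 shows the overall processing flow of classification and aggregation in this embodiment.
First, the terminal 6 sends an aggregation request to the query processing server 5 (request issue 700) and waits for a response. The query processing server 5 that accepted the request creates an execution plan for the requested aggregation (request acceptance 701) and, according to the plan, starts the aggregation programs on the I/O servers and aggregation servers (program start 702). After initializing the receive buffers 14, the final result storage area 16, and so on (area initialization 703), the query processing server 5 waits for responses from the aggregation servers 4.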
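[0040]
The I/O servers 2 and aggregation servers 4 whose aggregation programs were started by the query processing server 5 initialize the data buffers 8, send buffers 10, receive buffers 11, result storage areas 13, and so on, as instructed by the query processing server 5 (704, 705). After initialization, each I/O server 2 reads records from the secondary storage device 1 into the data buffer 8, determines the destination aggregation server for each record using the distribution means 9, and transfers the records through the send buffers 10 (706). When an I/O server 2 has distributed all its records, it sends an end notification to every aggregation server (707) and terminates (708).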
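[0041]
Each aggregation server 4 aggregates records as soon as it receives them from the I/O servers 2 and begins building its intermediate result. If the intermediate result does not fit in the preallocated result storage area, it is saved to the auxiliary secondary storage device 17 as a partial aggregation result (709). When an aggregation server has received end notifications from all I/O servers (710), and if intermediate results were saved to secondary storage (711), it integrates the partial results (712).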
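[0042]
The intermediate results produced in this way are transferred in blocks from each aggregation server 4 to the query processing server 5 (713). When it has transferred all its intermediate results, the aggregation server sends an end notification to the query processing server (714) and terminates (715).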
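[0043]
The query processing server 5 stores the received intermediate results in turn in the final result storage area 16 and, on request from the terminal 6, transfers the final result to the terminal 6 (716). When the query processing server has received end notifications from all aggregation servers (717), it sends an end notification to the terminal (718) and terminates (719). The terminal 6 requests final results from the query processing server 5 as needed and receives them (720). When the terminal 6 receives the end notification from the query processing server 5 (721) and thereby confirms that aggregation is complete, it terminates its own processing (722).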
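[0044]
In this embodiment, adjacent processing steps can run concurrently record by record, except between aggregation on the aggregation server (709) and intermediate result transfer (713).
[0045]
The aggregation program on each node is described below in more detail using flowcharts.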
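[0046]
FIG. 7 shows a flowchart of classification and aggregation on the I/O server 2. As instructed by the query processing server 5, the I/O server 2 first initializes the data buffer 8 and send buffers 10 (80). It then transfers the records stored in the secondary storage device 1 to the data buffer 8 in blocks (81). The I/O server 2 takes records from the data buffer 8 one after another (82) and determines the destination aggregation server 4 for each record using the record distribution means 9 (described later) (83). Records are accumulated in pages per destination (84) and, when a page fills (85), transferred from the send buffer 10 to the corresponding aggregation server 4 (86). When the I/O server 2 has transferred all records, it sends an end notification to each aggregation server (87) and terminates (88).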
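[0047]
FIG. 8 shows a flowchart of classification and aggregation on the aggregation server 4. As instructed by the query processing server 5, the aggregation server 4 first initializes the result storage area 13 and receive buffers 11 (90). It takes records one at a time (92) from the record pages transferred into the receive buffers 11 from the I/O servers 2 (91), applies the group classification means (described later) to each record, and determines the group identifier of the group into which it is classified (93). It then aggregates the record (95) and writes the result to the result storage area 13 (96). When the hash method is used as the result storage format, the results may not fit in the preallocated result storage area (97). In that case, the in-progress results are sorted in main memory by the value of their group identifiers and saved to the auxiliary secondary storage device 17 as a partial aggregation result (98). Each aggregation server obtains its intermediate result by repeating aggregation for every record it receives (99).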
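[0048]
When the aggregation server has received end notifications from all I/O servers (910), and if partial results were saved to the auxiliary secondary storage device 17 during aggregation (911), it merge-sorts the saved partial results at this point to produce the intermediate result (912). Because the partial results saved in secondary storage are already sorted, the merge between them can be executed efficiently. If no partial results were saved, nothing needs to be done. After aggregating all records, each aggregation server transfers its intermediate result to the query processing server (913), sends an end notification when the transfer is complete (914), and terminates (915).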
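[0049]
FIG. 9 shows a flowchart of classification and aggregation on the query processing server 5.
[0050]
The query processing server 5 first accepts an aggregation request from the terminal 6 (100) and, according to its content, starts the aggregation programs on the I/O servers 2 and aggregation servers 4 (101). It then initializes the receive buffers 14 and the final result storage area 16 (102) and waits for intermediate results from the aggregation servers 4. As soon as it receives an intermediate result from an aggregation server 4 (103), it appends it to the final result (104). On request from the terminal 6, it transfers the final result in order from the beginning (105). When it has transferred the entire final result (106), it sends an end notification to the terminal 6 (107) and terminates (108).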
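[0051]
FIG. 10 shows a flowchart of processing on the terminal 6. After issuing an aggregation request to the query processing server 5 (110), the terminal 6 waits to receive final results from it (111), repeatedly requesting the transfer of final results. When it receives an end notification from the query processing server 5 (112), it terminates (113).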
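[0052]
The record distribution means, group classification means, aggregation means, result storage methods, and integration means introduced above are each described in more detail below.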
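[0053]
The record distribution means in this embodiment redistributes the records, already partitioned across multiple secondary storage devices 1, to the aggregation servers 4 via the I/O servers 2. It must ensure that records classified into the same group are always sent to the same aggregation server 4. That is, if $S_i$ is the set of records distributed to aggregation server $N_i$ and $T_j$ is the set of records classified into group $G_j$, then for every $j$ there exists an $i$ such that $S_i \supseteq T_j$ and $S_k \cap T_j = \emptyset$ for all $k \neq i$. Because records in the same group are always aggregated on the same aggregation server, the query processing server 5 can obtain the final result simply by concatenating the intermediate results it receives.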
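The partitioning condition can be illustrated with a small single-process Python simulation, assuming hash partitioning on the full group key (one of the methods listed in [0054]). `route`, `parallel_group_sum`, and the use of `zlib.crc32` are illustrative choices, not taken from the patent. Because every group lands on exactly one server, the query-server step is a plain concatenation of disjoint dictionaries.

```python
import zlib
from collections import defaultdict


def route(group_key, n_servers):
    """Hypothetical distribution rule: hash of the group key modulo the server count."""
    # crc32 is deterministic across runs, unlike Python's salted hash() for str.
    return zlib.crc32(repr(group_key).encode()) % n_servers


def parallel_group_sum(records, key_cols, value_col, n_servers):
    # I/O servers: send each record to the server chosen by its group key.
    partitions = [[] for _ in range(n_servers)]
    for r in records:
        key = tuple(r[c] for c in key_cols)
        partitions[route(key, n_servers)].append((key, r[value_col]))
    # Aggregation servers: each aggregates only its own partition.
    intermediates = []
    for part in partitions:
        totals = defaultdict(int)
        for key, v in part:
            totals[key] += v
        intermediates.append(dict(totals))
    # Query processing server: groups never span servers, so concatenation suffices.
    final = {}
    for inter in intermediates:
        assert final.keys().isdisjoint(inter.keys())  # sanity check of the condition
        final.update(inter)
    return final, intermediates
```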
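[0054]
Hash partitioning, key-range partitioning, round-robin partitioning, and similar methods can be applied as the record distribution means. However, if the records partitioned in secondary storage already satisfy the above condition, the records need not be redistributed; the I/O servers may directly act as aggregation servers and perform the aggregation.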
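[0055]
The group classification means in this embodiment assigns group identifiers to the records distributed to each aggregation server 4 according to the grouping specified by the classification and aggregation request issued from the terminal 6, and classifies the records based on those identifiers. The correspondence between records and group identifiers is described in detail below.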
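[0056]
A group identifier in this embodiment is an identifier that specifies, in the group classification means, the group into which each record should be classified. Group identifiers are mutually distinguishable discrete values such as integers or character strings. In multidimensional classification and aggregation, which classifies records based on the values of multiple columns, the final group identifier is formed by combining the group identifiers for the individual grouping columns. For example, when records with columns $\{A_i\}$ are grouped on grouping columns $\{G_i\}$ ($\subseteq \{A_i\}$), a group identifier mapping function $g_i$ is applied to each grouping column $G_i$, and the resulting tuple $\langle g_1(G_1), \ldots, g_n(G_n) \rangle$ is used as the record's group identifier.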
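A minimal Python sketch of building the composite identifier $\langle g_1(G_1), \ldots, g_n(G_n) \rangle$, assuming records are dictionaries keyed by column name and each grouping column carries its own mapping function $g_i$. The function name and the age-bracket mapping used below are hypothetical.

```python
from typing import Any, Callable, Mapping, Sequence, Tuple


def composite_group_id(
    record: Mapping[str, Any],
    grouping: Sequence[Tuple[str, Callable[[Any], Any]]],
) -> tuple:
    """Apply each grouping column's mapping function g_i and combine the results."""
    return tuple(g(record[col]) for col, g in grouping)
```

For example, `composite_group_id({"product": "HT44966", "age": 37}, [("product", lambda v: v), ("age", lambda a: a // 10)])` yields `("HT44966", 3)`: the first column is used as is, the second through a bracketing function.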
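[0057]
In this embodiment, a user-defined grouping function can be used to associate grouping columns with group identifiers.
[0058]
The group classification means associates an appropriate group identifier with each grouping column value and thereby assigns each record to one of the groups. The group identifier value can be determined by whichever of the following three methods is appropriate.
[0059]
The first method uses the value of the column used for grouping (the grouping column) as is; the second uses the value obtained by applying a built-in function to the grouping column value; and the third uses the value obtained by applying a user-defined grouping function to the grouping column value.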
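[0060]
A user-defined grouping function must output mutually distinguishable discrete values. With a user-defined grouping function, the domain of the grouping column values can be divided into intervals and the records grouped by interval. This is useful, for example, for using age as the grouping column value, grouping records by age bracket, and computing the average income of each bracket. In addition, for grouping column values whose discrete values are not confined to a bounded interval, if the possible values are known in advance, a user-defined grouping function can convert the distribution of group identifiers into a contiguous distribution over a bounded interval, so that the array method, which has good access efficiency, can be used as the result storage method. For example, when product names are used as the grouping column, if the product names that occur are known, each name can be assigned a number, and an array whose size is the largest assigned number can be used in place of a hash.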
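The two uses of a user-defined grouping function described above, interval bucketing and dense numbering of known values, can be sketched in Python as follows. `decade`, `make_dense_numbering`, and `array_sum` are hypothetical names; the sketch assumes the set of product names is known before aggregation starts, as the paragraph requires.

```python
def decade(age: int) -> int:
    """Hypothetical user-defined grouping function: 0-9 -> 0, 10-19 -> 1, ..."""
    return age // 10


def make_dense_numbering(known_names):
    """Map each known name to 0..n-1 so an array can replace a hash table."""
    index = {name: i for i, name in enumerate(sorted(set(known_names)))}

    def to_id(name):
        return index[name]  # raises KeyError for names outside the known set

    return to_id


def array_sum(records, group_fn, n_groups):
    totals = [0] * n_groups  # array method: the group identifier is the array index
    for key, value in records:
        totals[group_fn(key)] += value
    return totals
```

Once the identifiers are dense integers, the per-group totals live in a plain list, so locating a result is a single index operation rather than a hash lookup.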
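[0061]
In this embodiment, when the range or variety of a grouping column's values is known in advance, a specification of the distribution of group identifiers (a grouping specification) can be given for each grouping column. There are three kinds: first, the group identifier range specification; second, the group identifier upper-limit specification; and third, the group identifier variety upper-limit specification (an upper limit on the number of distinct group identifiers).
[0062]
The group identifier range specification allows the upper and lower bounds of the values that the group identifiers used for classification can take to be specified. It can be used when an order can be defined on the group identifiers.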
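[0063]
If a record has a grouping column value outside the range, special group identifiers indicating "below the lower bound" and "above the upper bound" are prepared, and the exceptional values are aggregated separately. This makes it possible to estimate an upper limit on the space needed for the aggregation results for that grouping column, and to select the array method for locating its results, which determines storage locations efficiently.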
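A minimal Python sketch of how a range-specified column can be mapped to array indices with the two exception groups. The slot layout is an assumption: slot 0 for values below the lower bound and the last slot for values above the upper bound. With `RANGE [1, .., 31]` this gives 33 slots.

```python
def range_slot(value: int, lower: int, upper: int) -> int:
    """Array index for a column specified as RANGE [lower, .., upper].

    Assumed layout: slot 0 collects values below `lower`, slots 1..n hold
    lower..upper in order, and slot n + 1 collects values above `upper`.
    """
    if value < lower:
        return 0
    if value > upper:
        return upper - lower + 2
    return value - lower + 1


def slots_needed(lower: int, upper: int) -> int:
    """In-range values plus the two exception groups."""
    return upper - lower + 3
```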
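[0064]
The group identifier upper-limit specification applies when the group identifiers used for classification are integers: it fixes the lower bound of the range specification at 0 and allows only the upper bound to be specified. This provides a convenient shorthand for the range specification.
[0065]
The group identifier variety upper-limit specification specifies an upper limit on the number of distinct values that the group identifiers used for classification can take. This makes it possible to estimate an upper limit on the space needed for the aggregation results for that grouping column.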
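[0066]
The aggregation means in this embodiment computes statistics incrementally while taking records one at a time from a given set of records. Concrete aggregations include sums and averages. Besides the record contents, auxiliary information such as the number of records aggregated so far may be used.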
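As a sketch of incremental aggregation with auxiliary information, the per-group state for an average can keep a running sum and a record count. The `AvgState` class is hypothetical. Keeping the count, rather than only the current average, also lets partial results for the same group be combined correctly later, which the merge-based overflow handling in this embodiment relies on.

```python
from dataclasses import dataclass


@dataclass
class AvgState:
    """Per-group state: running sum plus the auxiliary record count."""
    total: float = 0.0
    count: int = 0

    def add(self, value: float) -> None:
        self.total += value
        self.count += 1

    def merge(self, other: "AvgState") -> None:
        # Combine two partial results for the same group.
        self.total += other.total
        self.count += other.count

    @property
    def average(self) -> float:
        return self.total / self.count  # ZeroDivisionError if no records were added
```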
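[0067]
The result storage method in this embodiment determines how the per-group aggregation results are stored. In this embodiment, one of three methods (the array method, the hash method, or the combined hash-array method) is chosen for each grouping column using the result storage method selection means described later.
[0068]
FIG. 11 shows the result storage format of the array method. This method is used when the aggregation results are guaranteed to fit in the reserved area of main memory. The format consists of a single array 120 whose elements are aggregation results.
[0069]
In the array method, the group identifier is used as the array index, and each aggregation result is stored as an array element. The array is allocated in main memory. Because determining the storage location requires no comparisons, results can be stored quickly. When the grouping column value falls outside the upper or lower bound, groups corresponding to the group identifiers above the upper bound and below the lower bound are provided and aggregated separately.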
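[0070]
FIG. 12 shows the result storage format of the hash method. This method is used when the aggregation results may not fit in the reserved area of main memory. The format consists of one hash table 130, multiple hash lists 131, multiple hash entries 132, and one result storage area 133. The hash table 130 holds pointers to the hash lists 131. Each hash list 131 is a list of hash entries 132. Each hash entry 132 holds a pointer to the record that stores an aggregation result and a pointer to another hash entry with the same hash value. The result storage area 133 is allocated from the reserved area of main memory; when the reserved area runs out, the situation is handled by the method described later.
[0071]
In the hash method, the group identifier is passed to a hash function to select the appropriate hash list from the hash table, and the target storage location is found from the hash entries. Although locating the storage requires traversing a hash list, this method consumes only as much storage as is needed, so main memory is used efficiently.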
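[0072]
FIG. 13 shows the result storage format of the combined hash-array method. This method is used in multidimensional classification and aggregation when the array method and the hash method are used for different grouping columns. Below, the group identifiers $\{g_i\}$ used for multidimensional grouping of records are partitioned, based on the result storage format selection method described later, into hash group identifiers $\{h_i\}$ ($\subseteq \{g_i\}$), which serve as hash keys, and array group identifiers $\{c_i\}$ ($\subseteq \{g_i\}$), which serve as array indices, such that $\{h_i\} \cup \{c_i\} = \{g_i\}$ and $\{h_i\} \cap \{c_i\} = \emptyset$.
[0073]
The format consists of one hash table 140, multiple hash lists 141, and multiple hash entries 142. The hash table 140 consists of pointers to the hash lists 141. Each hash list 141 is a list of hash entries 142. Each hash entry 142 consists of a pointer to another hash entry with the same hash value, the hash group identifier corresponding to the stored results, and an array that stores the aggregation results sharing that hash group identifier.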
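[0074]
In the combined hash-array method, the hash group identifier is first used to find, through the hash entries, the array in which the result should be stored. If the target array is not yet registered in a hash entry, a new array is allocated in the reserved area of main memory and added to a hash entry. How to proceed when no more arrays can be allocated in the reserved area is described later. Once the target array has been found, the array group identifier is used, in the same way as in the array method, to identify the final storage location of the result. By choosing between the array method and the hash method for each group identifier according to its grouping, this method locates storage quickly and uses storage efficiently.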
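A simplified Python sketch of the combined hash-array method, assuming a single array dimension of fixed size: a `dict` plays the role of the hash table over the hash group identifiers, and each entry owns an array indexed by the array group identifier. `HashArrayStore` and `max_arrays` (a stand-in for the reserved-area limit) are hypothetical; returning `False` signals the caller that the overflow handling described in [0075] onward must take over.

```python
class HashArrayStore:
    """Hash on the hash group identifier, then index an array with the array group identifier."""

    def __init__(self, array_size: int, max_arrays: int) -> None:
        self.array_size = array_size
        self.max_arrays = max_arrays  # hypothetical stand-in for the reserved area
        self.entries: dict = {}       # hash group identifier -> result array

    def add(self, hash_id, array_id: int, value: int) -> bool:
        arr = self.entries.get(hash_id)
        if arr is None:
            if len(self.entries) >= self.max_arrays:
                return False  # no room for a new array: the caller must spill
            arr = self.entries[hash_id] = [0] * self.array_size
        arr[array_id] += value  # direct index, no key comparison inside the array
        return True
```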
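[0075]
The following describes how this embodiment handles the case where the intermediate aggregation results no longer fit in the reserved area of main memory. On an aggregation server 4, the results overflow the reserved area when the hash method or the combined hash-array method is used as the result storage method and an attempt is made to add more array storage to the hash entries than the reserved area allows. Below, the in-progress aggregation results accumulated on the aggregation server at the moment the results no longer fit in the reserved area are called partial aggregation results.
[0076]
FIGS. 14 and 16 show the method and the flowchart for handling the case where intermediate results do not fit in the reserved area of main memory in this embodiment.
[0077]
In this embodiment, when the aggregation results on an aggregation server 4 do not fit in the reserved area of main memory (170), the partial results are sorted in the aggregation server's main memory (171) and then saved to the auxiliary secondary storage device 17 (150, 172). After the partial results have been saved, the result storage area in main memory can be reused.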
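[0078]
When all records have been aggregated (173), the aggregation server 4 performs an N-way merge over the multiple partial results saved in the auxiliary secondary storage device 17 (151, 174) and transfers the result to the query processing server 5. Because this method generates no random I/O to the auxiliary secondary storage device, the aggregation results can be produced efficiently.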
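The sort-spill-and-merge strategy can be sketched in Python as follows, assuming `(key, value)` pairs summed per group and a capacity measured in groups. In-memory lists stand in for the sorted runs written sequentially to the auxiliary secondary storage device 17, and `heapq.merge` performs the N-way merge; because the runs are sorted, equal keys arrive adjacent and can be summed in a single pass. Function and parameter names are illustrative.

```python
import heapq
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def spill_aggregate(pairs, capacity):
    """Sum (key, value) pairs per key with at most `capacity` groups in memory."""
    table, runs = defaultdict(int), []
    for key, value in pairs:
        if key not in table and len(table) >= capacity:
            # Reserved area full: sort by group key and write one sequential run.
            runs.append(sorted(table.items()))
            table.clear()  # the in-memory area is reused from here on
        table[key] += value
    runs.append(sorted(table.items()))  # results still in memory at the end
    # N-way merge of the sorted runs; equal keys arrive adjacent and are summed.
    merged = heapq.merge(*runs, key=itemgetter(0))
    return [(k, sum(v for _, v in grp)) for k, grp in groupby(merged, key=itemgetter(0))]
```

Every run is written once in key order and read once in key order, so the secondary storage only ever sees sequential access.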
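[0079]
Alternatively, when the results no longer fit in the reserved area of main memory, instead of saving all partial results in main memory to the auxiliary secondary storage 17, the most recently referenced portion of the partial results can be kept in main memory according to the LRU (Least Recently Used) policy and the rest saved to auxiliary secondary storage. When all records have been aggregated, an N-way merge is performed between the partial results in main memory and the multiple partial results saved in the auxiliary secondary storage device 17 to produce the aggregation result. This method increases the proportion of records aggregated in main memory and makes aggregation more efficient.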
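An LRU variant of the spill-and-merge idea, assuming `collections.OrderedDict` tracks recency: on overflow, the `keep` most recently used groups stay in memory and only the older ones are sorted and spilled. The function name and the `keep` parameter are hypothetical, and the runs are in-memory stand-ins for secondary storage.

```python
import heapq
from collections import OrderedDict
from itertools import groupby
from operator import itemgetter


def lru_spill_aggregate(pairs, capacity, keep):
    """Spill only the least recently used groups, keeping `keep` hot groups in memory."""
    if not 0 <= keep < capacity:
        raise ValueError("need 0 <= keep < capacity")
    table, runs = OrderedDict(), []
    for key, value in pairs:
        if key in table:
            table.move_to_end(key)  # mark as most recently used
        elif len(table) >= capacity:
            # Evict the oldest entries (front of the OrderedDict) as one sorted run.
            evicted = [table.popitem(last=False) for _ in range(len(table) - keep)]
            runs.append(sorted(evicted))
        table[key] = table.get(key, 0) + value
    runs.append(sorted(table.items()))
    merged = heapq.merge(*runs, key=itemgetter(0))
    return [(k, sum(v for _, v in grp)) for k, grp in groupby(merged, key=itemgetter(0))]
```

A frequently recurring group such as `"a"` in a skewed input keeps being moved to the back of the order, so it tends to stay in memory and is never written to a run.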
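[0080]
Further, when the results no longer fit in the reserved area of main memory, instead of saving the partial results in main memory to the auxiliary secondary storage 17, the records that could not be aggregated in main memory can be saved to the auxiliary secondary storage device. When all records have been aggregated, an N-way merge is performed between the partial results in main memory and the records saved in the auxiliary secondary storage device 17 to produce the aggregation result. This method is efficient when few records cannot be aggregated in main memory.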
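A sketch of the overflow-record variant, under the same illustrative assumptions: groups already in memory keep absorbing their records, while records of new groups that do not fit are set aside unaggregated. Here the spilled records are sorted in memory before the merge; with a real spill file that step would be an external sort. The function name is hypothetical, and it also returns the overflow count, the quantity this variant needs to keep small.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def overflow_record_aggregate(pairs, capacity):
    """Aggregate in memory; records of groups that do not fit are set aside unaggregated."""
    table, overflow = {}, []
    for key, value in pairs:
        if key in table:
            table[key] += value
        elif len(table) < capacity:
            table[key] = value
        else:
            overflow.append((key, value))  # stand-in for appending to a spill file
    # Sort the spilled raw records and merge them with the sorted in-memory results.
    merged = heapq.merge(sorted(table.items()), sorted(overflow), key=itemgetter(0))
    result = [(k, sum(v for _, v in grp)) for k, grp in groupby(merged, key=itemgetter(0))]
    return result, len(overflow)
```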
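[0081]
In addition, in the preceding method, the proportion of records actually aggregated in main memory among the records processed can be monitored during aggregation, and when this proportion falls below a given value, all partial results in main memory are saved to the auxiliary secondary storage device. When all records have been aggregated, an N-way merge is performed among the partial results in main memory, the multiple partial results saved in the auxiliary secondary storage device 17, and the records saved in the auxiliary secondary storage device 17 to produce the aggregation result. This method is efficient when the distribution of the records to be aggregated is skewed.
[0082]
In this embodiment, the storage method of the result storage area is selected based on the grouping specification given for each group identifier. The selection method is described below.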
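[0083]
If a "group identifier range specification (including the group identifier upper-limit specification)" is given for every group identifier used in multidimensional classification and aggregation, the upper limit on the size of the result storage area is computed from those specifications, and it is determined whether the area fits in the reserved area of main memory on each aggregation server. If it fits, the result storage area is allocated in main memory as an array. If it does not, the hash method is used for some or all of the identifiers and the array method for the remaining ones.
[0084]
If a "group identifier range specification (including the group identifier upper-limit specification) or group identifier variety upper-limit specification" is given for every group identifier used in multidimensional classification and aggregation, the upper limit on the number of distinct values of each group identifier is estimated from the specifications, the upper limit on the size of the result storage area is computed, and it is determined whether the area fits in the reserved area of main memory on each aggregation server. If it fits, the required area is allocated in main memory in advance, and needed space is taken from it. If it does not, the hash method is used for the identifiers with a "group identifier variety upper-limit specification", the array method is used for the remaining identifiers, and the result storage area for the array part is allocated in main memory.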
[0085]
In all other cases, the hash method is selected for all group identifiers.
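The selection rules in [0083] to [0085] can be sketched in Python as follows. Specifications are modeled as hypothetical tuples (`("range", lo, hi)`, `("max", hi)`, `("groupmax", n)`, or `None` when no specification is given), and the slot counts add two exception groups to each range, as in [0063]. Following the example in [0095], GROUPMAX columns always use the hash method. The source leaves open which columns to switch to hashing when an all-range layout is too large; hashing only the widest column is an assumed policy, not the patent's.

```python
from math import prod


def slots(spec) -> int:
    """Upper bound on distinct identifiers implied by one column's specification."""
    kind = spec[0]
    if kind == "range":       # RANGE col [lo, .., hi] plus two exception groups
        _, lo, hi = spec
        return hi - lo + 3
    if kind == "max":         # MAX col hi is RANGE col [0, .., hi]
        return spec[1] + 3
    if kind == "groupmax":    # GROUPMAX col n: at most n distinct identifiers
        return spec[1]
    raise ValueError(f"unknown specification {kind!r}")


def choose_storage(specs, result_bytes, budget_bytes):
    """Return ({column: 'array' | 'hash'}, preallocate_whole_area)."""
    if any(s is None for s in specs.values()):
        return {c: "hash" for c in specs}, False  # [0085]: no usable bound
    methods = {c: "hash" if s[0] == "groupmax" else "array" for c, s in specs.items()}
    total = result_bytes * prod(slots(s) for s in specs.values())
    if total <= budget_bytes:
        return methods, True  # allocate the whole area before the query runs
    if all(m == "array" for m in methods.values()):
        # Assumed policy for [0083]: hash the widest column so that arrays are
        # allocated per hash entry on demand instead of all at once.
        widest = max(specs, key=lambda c: slots(specs[c]))
        methods[widest] = "hash"
    return methods, False
```

With the [0094] specifications, this returns the hash method for product code and place of sale and the array method for date of sale; whether the area is preallocated depends on `budget_bytes` relative to the roughly 145.2 MB estimate.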
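[0086]
As described above, by choosing the result storage method according to the characteristics of each group identifier given in the grouping specification of the group classification means, a storage method that locates storage quickly and uses storage efficiently can be realized.
[0087]
The integration means in this embodiment integrates the multiple intermediate results transferred from the aggregation servers 4 to produce the final aggregation result on the query processing server 5. In this embodiment, because the intermediate results on the aggregation servers hold results for mutually disjoint groups, no recomputation among them is needed; the final result is obtained simply by concatenating them.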
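[0088]
The features of this embodiment are as follows.
[0089]
In classification and aggregation, this embodiment uses result storage location determination means that can uniquely determine the location in the result storage area from the record values, which allows classification and aggregation to run concurrently record by record;
when the aggregation results do not fit in the reserved area of main memory, it sorts the partial results in main memory, saves them to the auxiliary secondary storage device, and finally integrates them by an N-way merge, which suppresses random I/O to the auxiliary secondary storage device; and
by specifying, for each group identifier, information on the range of its values or the upper limit on its variety, it realizes a storage method that locates storage quickly and uses storage efficiently.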
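[0090]
The following shows an example of the grouping specification in this embodiment, assuming SQL as the database query language.
[0091]
First, the syntax of the group identifier range specification, the group identifier upper-limit specification, and the group identifier variety upper-limit specification is given in BNF.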
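[0092]
<group identifier range specification> ::= RANGE <column name> [lower bound, .., upper bound] [{, <column name> [lower bound, .., upper bound]}...];
<group identifier upper-limit specification> ::= MAX <column name> upper bound [{, <column name> upper bound}...];
<group identifier variety upper-limit specification> ::= GROUPMAX <column name> upper bound [{, <column name> upper bound}...];
This notation is only one possible realization. The grouping specification could also be given when the table format is defined, when a query is issued, or in the form of a comment.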
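[0093]
An example of specifying the group classification means through SQL statements is shown below, using the earlier example.
[0094]
CREATE TABLE 販売実績 (商品コード INT, 販売地 STRING(32), 販売日 INT, 値段 INT);
GROUPMAX 販売実績.商品コード 1000, 販売実績.販売地 100;
RANGE 販売実績.販売日 [1, .., 31];
SELECT 商品コード, 販売地, 販売日, SUM(値段) FROM 販売実績 GROUP BY 商品コード, 販売地, 販売日;
The first SQL statement defines the sales records table 販売実績 with the columns 商品コード (product code), 販売地 (place of sale), 販売日 (date of sale), and 値段 (price). The second specifies 1000 and 100 as the upper limits on the number of distinct product codes and places of sale, respectively. The third specifies that the date of sale ranges from 1 to 31 inclusive. The fourth is the aggregation request: it groups the records of the sales table by product code, place of sale, and date of sale and computes the total price of each group.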
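[0095]
For the fourth statement (SELECT), the hash method is selected for product code and place of sale, and the array method for date of sale. If the aggregation result record for each group is 44 bytes, the result storage area requires about 44 bytes × 1000 × 100 × 33 = 145.2 MB, where the factor 33 corresponds to the 31 in-range dates plus the two out-of-range groups. If this much space can be allocated in main memory, the required area is allocated before the query is executed, which allows efficient aggregation with no dynamic memory allocation. If the necessary area cannot be allocated up front, array areas of 44 × 33 = 1452 bytes each are allocated dynamically from main memory as needed. When allocation becomes impossible, partial aggregation results are saved to secondary storage by the methods described above.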
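The arithmetic behind the estimate, written out (MB here means $10^6$ bytes):

$$
\begin{aligned}
44\ \text{B} \times \underbrace{1000}_{\text{product codes}} \times \underbrace{100}_{\text{places}} \times \underbrace{(31 + 2)}_{\text{dates}} &= 145{,}200{,}000\ \text{B} = 145.2\ \text{MB} \\
44\ \text{B} \times 33 &= 1452\ \text{B per dynamically allocated array}
\end{aligned}
$$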
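[0096]
FIG. 18 shows an example of classification and aggregation in this embodiment, in which the classification and aggregation described above is executed with two I/O servers, two aggregation servers, and one query processing server. Of the records partitioned between the two I/O servers (190), those with product code HT44966 are distributed to aggregation server 1 and those with product code PC15550 to aggregation server 2 (191). Each aggregation server then groups its records by place and date of sale and aggregates each group (192). When each aggregation server has finished aggregating the records distributed to it, it transfers its intermediate result to the query processing server (193). The query processing server concatenates the transferred intermediate results to produce the final result (194).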
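[0097]
In the above embodiment, the servers may run physically on the same computer or on separate computers connected by a network.
[0098]
In the classification means of the I/O servers of the above embodiment, instead of choosing the destination according to the group identifier value, records may be distributed so that each aggregation server receives an equal number of records, balancing the load; the query processing node then classifies and aggregates the intermediate results once more to complete the overall classification and aggregation.
[0099]
Furthermore, in the above embodiment, aggregation servers may be connected in two or more tiers through the network so that classification and aggregation of very large numbers of records is performed hierarchically.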
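[0100]
(Embodiment 2)
FIG. 20 shows the configuration of another embodiment of the parallel database processing system according to the present invention.
[0101]
In FIG. 20, 1 is a secondary storage device that stores a table partitioned into partial tables 7; 18 is an I/O-aggregation server that reads partial tables 7 from the secondary storage device and produces partial aggregation results; 19 is an intermediate aggregation server that classifies and aggregates the partial results once more and produces intermediate results; 5 is a query processing server that integrates the results of the intermediate aggregation servers and produces the final aggregation result; 3 is a network for exchanging records and aggregation results between nodes; and 6 is a terminal that issues classification and aggregation requests to the database and retrieves the results from the query processing server. Any interprocessor network, such as a LAN, a WAN, or dedicated hardware, can be used as the network 3.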
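[0102]
The configuration and operation of the I/O-aggregation servers 18 and intermediate aggregation servers 19 of Embodiment 2 are described below. Parts of Embodiment 2 not described below are the same as in Embodiment 1.
[0103]
Each I/O-aggregation server 18 holds a data buffer 8 that temporarily stores records read from the secondary storage device 1, aggregation means 12 for aggregating records taken from the data buffer, a partial result storage area 25 that stores aggregation results, record distribution means 9 that determines where aggregation result records are sent, and multiple send buffers 10. The data buffer 8, the partial result storage area 25, and the send buffers 10 are allocated in the I/O-aggregation server 18's main memory. The I/O-aggregation server 18 transfers the partial table 7 held by the secondary storage device 1 to the data buffer 8 in blocks. It then takes records from the data buffer 8 one after another and builds aggregation results in the partial result storage area 25 using the aggregation means 12. As soon as the aggregation results reach the capacity of the storage area, the record distribution means 9 determines the destination intermediate aggregation server 19 for them, and they are transferred through the send buffers 10 to that intermediate aggregation server 19 as partial aggregation results.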
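[0104]
Each intermediate aggregation server 19 holds multiple receive buffers 11, an auxiliary secondary storage device 17 for saving in-progress aggregation of the partial results transferred from the I/O-aggregation servers, and a result storage area 13 that provides the work area for classifying and aggregating partial results. The partial results transferred from the I/O-aggregation servers 18 are classified and aggregated once more on the intermediate aggregation server 19. If the results no longer fit in the result storage area 13 during aggregation, the in-progress results are saved to the auxiliary secondary storage device. After receiving all partial results from the I/O-aggregation servers 18, the intermediate aggregation server 19 performs an N-way merge over the partial results saved in the auxiliary secondary storage device 17. The merge result is transferred to the query processing server 5 as the intermediate aggregation result.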
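[0105]
FIG. 21 shows the overall processing flow of classification and aggregation in this embodiment. First, the terminal 6 sends an aggregation request to the query processing server 5 (2600) and waits for a response. The query processing server 5 that accepted the request creates an execution plan for the requested aggregation (2601) and, according to the plan, starts the aggregation programs on the I/O-aggregation servers 18 and intermediate aggregation servers 19 (2602).
[0106]
After initializing the receive buffers 14, the final result storage area 16, and so on (2603), the query processing server 5 waits for responses from the intermediate aggregation servers 19. The I/O-aggregation servers 18 and intermediate aggregation servers 19 whose aggregation programs were started by the query processing server 5 initialize the data buffers 8, send buffers 10, receive buffers 11, result storage areas 13, partial result storage areas 25, and so on, as instructed by the query processing server 5 (2604, 2605).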
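[0107]
After initialization, each I/O-aggregation server 18 aggregates each record read from the secondary storage device 1 into the data buffer 8 using the aggregation means 12 (2606). As soon as the results reach the capacity of the partial result storage area 25, it determines, record by record using the distribution means 9, the intermediate aggregation server to which each partial result is sent, and transfers the record pages through the send buffers 10 (2623). When it has distributed all record pages, it sends an end notification to each intermediate aggregation server (2607) and terminates (2608).
[0108]
Each intermediate aggregation server 19 classifies and aggregates the transferred partial results as soon as it receives record pages from the I/O-aggregation servers 18. If the results no longer fit in the storage area, the in-progress results are sorted by group identifier value and saved to the auxiliary secondary storage device (2609). When an intermediate aggregation server has received end notifications from all I/O-aggregation servers (2610), and if partial results were saved to the auxiliary secondary storage device 17 (2611), it merge-sorts them (2612).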
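[0109]
The intermediate results produced in this way are transferred in blocks from each intermediate aggregation server to the query processing server (2613). When it has transferred all its intermediate results, the intermediate aggregation server sends an end notification to the query processing server (2614) and terminates (2615). The query processing server 5 stores the received intermediate results in turn in the final result storage area 16 and, on request from the terminal 6, transfers the final result to the terminal 6 (2616). When the query processing server has received end notifications from all intermediate aggregation servers (2617), it sends an end notification to the terminal (2618) and terminates (2619). The terminal 6 requests final results from the query processing server 5 as needed and receives them (2620). When the terminal 6 receives the end notification from the query processing server 5 (2621) and thereby confirms that aggregation is complete, it terminates its own processing (2622).
[0110]
Because multiple intermediate aggregation servers share the integration of partial results and perform it independently, this embodiment processes efficiently when the aggregation result has many groups.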
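[0111]
The flowcharts of the I/O-aggregation server and the intermediate aggregation server are described below in more detail.
[0112]
FIG. 22 shows a flowchart of classification and aggregation on the I/O-aggregation server. As instructed by the query processing server 5, the I/O-aggregation server 18 first initializes the data buffer 8, the partial result storage area 25, and the send buffers 10 (2700). It then transfers the records stored in the secondary storage device 1 to the data buffer 8 in blocks (2701). The I/O-aggregation server takes records from the data buffer 8 one after another (2702) and determines the group identifier of the group into which each record should be classified (2703). It aggregates by referring to the record's values (2704), determines the location within the result storage area from the group identifier, and writes the result there (2705). If the results do not fit in the storage area (2706), it transfers the in-progress partial results to the intermediate aggregation servers (2707). When it has aggregated all blocks and records read (2708, 2709), it transfers the last aggregation results (2710), sends an end notification to the intermediate aggregation servers (2711), and terminates (2712).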
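[0113]
FIG. 23 shows a flowchart of classification and aggregation on the intermediate aggregation server. As instructed by the query processing server 5, the intermediate aggregation server 19 first initializes the result storage area 13 and the receive buffers 11 (2800). It takes aggregation results one at a time (2803) from the partial results transferred into the receive buffers 11 from the I/O-aggregation servers (2801) and classifies them by their group identifier values (2804).
[0114]
It then re-aggregates the aggregated values for each classified result (2805) and writes the result to the result storage area 13 (2806). If the results do not fit in the result storage area (2807), the in-progress results are sorted by their group identifier values and stored in the auxiliary secondary storage device (2808). Each intermediate aggregation server repeats this aggregation for every result it receives (2809). When it has received end notifications from all I/O-aggregation servers (2810), and if results were saved to the auxiliary secondary storage device (2811), it merge-sorts the saved results to produce the intermediate result (2812). The intermediate aggregation server transfers the intermediate result to the query processing server (2813), sends an end notification (2814), and terminates (2815).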
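[0115]
The following describes how this embodiment handles the case where the aggregation results on an I/O-aggregation server 18 no longer fit in the reserved area of main memory. This happens when the hash method or the combined hash-array method is used as the result storage method and an attempt is made to add more array storage to the hash entries than the reserved area allows. Below, the in-progress aggregation results accumulated on the I/O-aggregation server at the moment the results no longer fit in the reserved area are called partial aggregation results.
[0116]
FIGS. 15 and 17 show the method and the flowchart for handling the case where intermediate results do not fit in the reserved area of main memory in this embodiment.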
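[0117]
In this embodiment, when the aggregation results on an I/O-aggregation server 18 do not fit in the reserved area of main memory, the aggregation results in the partial results are classified so that results for the same group are sent to the same intermediate aggregation server 19, and then transferred (160). After the partial results have been sent off, the result storage area in main memory can be reused. The intermediate aggregation server sorts the partial results in main memory and then saves them to the auxiliary secondary storage device 17 (161). When all aggregation results have been transferred (180), the intermediate aggregation server 19 performs an N-way merge over the multiple partial results saved in the auxiliary secondary storage device (162, 181) and transfers the result to the query processing server 5. Because this method generates no random I/O to the auxiliary secondary storage device, the aggregation results can be produced efficiently.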
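A single-process Python simulation of Embodiment 2's two-phase aggregation, under illustrative assumptions: each I/O-aggregation server pre-aggregates locally with a capacity limit measured in groups, flushes its partial results to intermediate servers chosen by a hypothetical `route` rule so that a group always reaches the same intermediate server, and the intermediate servers re-aggregate. Spilling on the intermediate servers is not modeled.

```python
import zlib
from collections import defaultdict


def route(key, n):
    """Hypothetical correspondence rule: the same group always maps to the same server."""
    return zlib.crc32(repr(key).encode()) % n


def two_phase_sum(partitions, n_intermediate, capacity):
    """partitions: one list of (key, value) records per I/O-aggregation server."""
    inboxes = [[] for _ in range(n_intermediate)]

    def flush(table):
        # Send each partial result to the intermediate server owning its group.
        for key, total in table.items():
            inboxes[route(key, n_intermediate)].append((key, total))
        table.clear()  # the local result area can now be reused

    # Phase 1: I/O-aggregation servers pre-aggregate their own records.
    for records in partitions:
        local = defaultdict(int)
        for key, value in records:
            if key not in local and len(local) >= capacity:
                flush(local)
            local[key] += value
        flush(local)  # final partial results

    # Phase 2: intermediate servers re-aggregate; groups never span inboxes,
    # so the query processing server can simply concatenate their outputs.
    final = {}
    for inbox in inboxes:
        merged = defaultdict(int)
        for key, total in inbox:
            merged[key] += total
        final.update(merged)
    return final
```

Compared with the Embodiment 1 sketch, only pre-aggregated `(group, total)` pairs cross the network, which is the traffic reduction [0123] attributes to this embodiment when results are much smaller than the input.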
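[0118]
In this embodiment, when the aggregation results on an I/O-aggregation server do not fit in the reserved area of main memory, aggregation can also be made more efficient in the same ways as for the aggregation servers of Embodiment 1: by saving only part of the partial results to the auxiliary secondary storage device, by saving only the overflowing records to it, or by saving the partial results to it depending on the success rate of in-memory aggregation.
[0119]
FIG. 19 shows an example of classification and aggregation in this embodiment, in which the classification and aggregation described above is executed with two I/O-aggregation servers, two intermediate aggregation servers, and one query processing server. The records partitioned between the two I/O-aggregation servers (200) are first aggregated on each I/O-aggregation server; the results for records with product code HT44966 are then distributed to intermediate aggregation server 1, and those for product code PC15550 to intermediate aggregation server 2 (201). Each intermediate aggregation server then re-classifies and re-aggregates the partial results for each group (202). When each intermediate aggregation server has finished classifying and aggregating all partial results distributed to it, it transfers its merged intermediate result to the query processing server (203). The query processing server concatenates the transferred intermediate results to produce the final result (204).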
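[0120]
In the above embodiment, the servers may run physically on the same computer or on separate computers connected by a network.
[0121]
In the classification means of the I/O servers of the above embodiment, instead of choosing the destination according to the group identifier value, records may be distributed so that each aggregation server receives an equal number of records, balancing the load; the query processing node then classifies and aggregates the intermediate results once more to complete the overall classification and aggregation.
[0122]
Furthermore, in the above embodiment, aggregation servers may be connected in two or more tiers through the network so that classification and aggregation of very large numbers of records is performed hierarchically.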
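[0123]
In the present invention, the more efficient of the two embodiments can be adopted according to the characteristics of the data being classified and aggregated. For example, when enough aggregation servers are available, Embodiment 1 can be adopted, distributing the records stored in secondary storage across as many aggregation servers as possible so that classification and aggregation finish quickly. When the volume of the aggregation results is sufficiently small compared with the volume of the records being aggregated, adopting Embodiment 2 reduces the amount of data transferred over the network.
[0124]
[Effects of the Invention]
The parallel database processing system according to the present invention runs classification and aggregation concurrently record by record and prevents random I/O when the aggregation results do not fit in the reserved area of main memory, enabling efficient grouped aggregation.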
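[Brief Description of the Drawings]
[FIG. 1] A block diagram of Embodiment 1 of the parallel database processing system according to the present invention.
[FIG. 2] A diagram showing the logical structure of a database.
[FIG. 3] A diagram outlining classification and aggregation processing.
[FIG. 4] A flowchart of the merge sort method.
[FIG. 5] A flowchart of the simple hash method.
[FIG. 6] A diagram outlining the overall processing flow of classification and aggregation in Embodiment 1.
[FIG. 7] A flowchart of the processing of the I/O server in Embodiment 1.
[FIG. 8] A flowchart of the processing of the aggregation server in Embodiment 1.
[FIG. 9] A flowchart of the processing of the query processing server.
[FIG. 10] A flowchart of the processing of the terminal.
[FIG. 11] A diagram showing the result storage format based on the array method.
[FIG. 12] A diagram showing the result storage format based on the hash method.
[FIG. 13] A diagram showing the result storage format based on the combined hash-array method.
[FIG. 14] A diagram showing how Embodiment 1 handles aggregation results that do not fit in the reserved area.
[FIG. 15] A diagram showing how Embodiment 2 handles aggregation results that do not fit in the reserved area.
[FIG. 16] A flowchart of the processing in Embodiment 1 when aggregation results do not fit in the reserved area.
[FIG. 17] A flowchart of the processing in Embodiment 2 when aggregation results do not fit in the reserved area.
[FIG. 18] A diagram showing an example of classification and aggregation performed in Embodiment 1.
[FIG. 19] A diagram showing an example of classification and aggregation performed in Embodiment 2.
[FIG. 20] A block diagram of Embodiment 2 of the parallel database processing system according to the present invention.
[FIG. 21] A diagram outlining the overall processing flow of classification and aggregation in Embodiment 2.
[FIG. 22] A flowchart of the processing of the I/O-aggregation server in Embodiment 2.
[FIG. 23] A flowchart of the processing of the intermediate aggregation server in Embodiment 2.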
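[Explanation of Reference Numerals]
1: secondary storage device, 2: I/O server, 3: network, 4: aggregation server, 5: query processing server, 6: terminal, 7: partial table, 8: data buffer, 9: record distribution means, 10: send buffer, 11: receive buffer, 12: aggregation means, 13: result storage area, 14: receive buffer, 15: result integration means, 16: final result storage area, 17: auxiliary secondary storage device, 18: I/O-aggregation server, 19: intermediate aggregation node.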
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a parallel database processing system that performs database processing in cooperation with one or more nodes, and more particularly to a parallel database processing system that speeds up database classification and aggregation processing.
[0002]
[Prior art]
The logical structure of the database is in a table format (20) as shown in FIG. The horizontal direction of this table is called a record (21), and the vertical direction is called a column (22). The same column of each record stores data in the same format.
[0003]
Classification and aggregation processing means that records in a table are classified into several groups based on the values of one or more columns specified in advance, and one or more predetermined records are set for each set of records belonging to each group. This is the process of calculating the statistical value such as the total value and the average value of the values of the columns to be totaled.
[0004]
FIG. 3 shows an example of the classification and aggregation process. In this example, the relational database holds information on sales performance in the form of a table. Records in the table have column values for the product code, sales location, date of sale, and price (30). In this example, records having the same value in the column values of the product code, the sales place, and the sale date are set as one group (31), and the total value of the record prices is calculated for each group (31). 32). The result of the tallying process is obtained again in the form of a table (33).
[0005]
Two conventional classification and aggregation methods can be cited. The first is the merge sort method described in Chapter 3, pages 48-49, of "Parallel Sorting Algorithms" (Selim G. Akl, Academic Press, Inc.). The second is the simple hash method described in Chapter 9, pages 262-264, of "Relational Database Management" (M. Papazoglou and W. Valder, Prentice Hall).
[0006]
FIG. 4 shows a flowchart of the merge sort method, the first conventional method. In this method, the records making up a database table are read from the secondary storage device block by block (40), sorted in main memory (41), and written back to the secondary storage device (42). This is repeated for every record block (43). The per-block sorted record lists are then merge-sorted, using the secondary storage device as a work area (44). Finally, the sorted record sequence is read back sequentially from the secondary storage device (45) while the records are classified and aggregated (46). However, the external sort generates a large amount of I/O. In addition, the sort (which corresponds to classification) cannot run in parallel with aggregation. As a result, this method cannot perform classification and aggregation efficiently.
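The merge sort method can be sketched as follows. This is a simplified illustration that assumes (key, value) records, with in-memory lists standing in for the sorted runs on secondary storage. The names `merge_sort_aggregate` and `block_size` are hypothetical. Note that the aggregating `groupby` pass can begin only after every block has been sorted and written back as a run.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_sort_aggregate(records, block_size):
    """Sort blocks into runs, merge the runs, then aggregate the sorted stream."""
    runs = []
    # Steps 40-43: sort each block in memory and keep it as a sorted run
    for start in range(0, len(records), block_size):
        block = records[start:start + block_size]
        runs.append(sorted(block, key=itemgetter(0)))
    # Step 44: k-way merge of the sorted runs
    merged = heapq.merge(*runs, key=itemgetter(0))
    # Steps 45-46: equal keys are now adjacent, so one sequential pass aggregates them
    return [(key, sum(v for _, v in grp)) for key, grp in groupby(merged, key=itemgetter(0))]
```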
[0007]
FIG. 5 shows a flowchart of the simple hash method, the second conventional method. In this method, records are grouped with a hash function (51) as they are read from the secondary storage device (50). At the same time, the aggregation results in the aggregation result storage area are updated (52). Classification and aggregation therefore run in parallel, and no temporary file is created for grouping. However, the aggregation result storage area is accessed randomly. When the aggregation results do not fit in the reserved area of main memory (53), random I/O to the secondary storage device occurs (54), which lowers the efficiency of classification and aggregation.
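The simple hash method can be sketched as follows. The sketch assumes a main-memory area that holds at most `capacity` groups, with a dictionary standing in for results that spilled to secondary storage. The name `simple_hash_aggregate` is hypothetical. The `random_io` counter marks each access that, in a real system, would be a random read-modify-write on disk.

```python
def simple_hash_aggregate(records, capacity):
    """Hash aggregation whose overflow groups are updated in place on 'disk'."""
    memory = {}    # aggregation results held in main memory
    disk = {}      # stand-in for results kept on secondary storage
    random_io = 0  # accesses that would be random I/O (step 54)
    for key, value in records:
        if key in memory:
            memory[key] += value
        elif len(memory) < capacity:
            memory[key] = value
        else:
            # Step 53: no room in main memory, so update the result on secondary storage
            random_io += 1
            disk[key] = disk.get(key, 0) + value
    return {**memory, **disk}, random_io
```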
[0008]
[Problems to be solved by the invention]
In conventional relational database systems, records are grouped for classification and aggregation by a merge sort that uses the secondary storage device. This approach requires a large amount of I/O for the merge sort, and it cannot run classification and aggregation in parallel. Both factors greatly reduce the efficiency of classification and aggregation.
[0009]
Alternatively, conventional relational database systems group records with a hash function during the classification step. In this approach, the aggregation result storage area is accessed randomly during aggregation. If the aggregation results do not fit in the reserved area of main memory, a large amount of random I/O is issued to the secondary storage device, which reduces the efficiency of classification and aggregation.
[0010]
An object of the present invention is to allow the classification step and the aggregation step of database aggregation processing to be executed in parallel.
[0011]
Another object of the present invention is to avoid random I/O to the secondary storage device even when the aggregation results do not fit in the reserved area of main memory.
[0012]
[Means for Solving the Problems]
The present invention provides a classification and aggregation processing method for a parallel database processing system. In that system, a network 3 connects one or more input/output servers 2, one or more aggregation processing servers 4, and one or more query processing servers 5. A single table consisting of many records is divided into partial tables 7, which are held separately by the plurality of input/output servers.
The storage location of each aggregation result is determined by a hash mechanism, and aggregation is performed record by record, so that record classification and aggregation can run in parallel. When the aggregation results do not fit in the reserved area of main memory, partial aggregation results are saved to the auxiliary secondary storage device 17. After aggregation completes, the saved partial results are integrated. This avoids random I/O to the auxiliary secondary storage device and allows classification and aggregation to be executed efficiently.
[0013]
Specifically, the first database processing system comprises three kinds of server: a query processing server that processes query requests to the database, input/output servers that read records from the database, and aggregation processing servers that aggregate the records. One or more columns of a record are designated as grouping columns, and a group identifier is associated with each grouping-column value. Records having the same group identifier value are classified into one group. One or more columns of the record are designated as aggregation target columns, which are aggregated for the records of each group.
Each input/output server has means for transferring each record, according to the value of its grouping columns, to the aggregation processing server responsible for that record.
Each aggregation processing server has the following means:
- means for allocating an aggregation result storage area in its own main memory;
- means for generating a group identifier from the grouping-column values of a record received from an input/output server;
- classification means for uniquely determining, from the value of the group identifier, the storage location of the aggregation result for that group;
- aggregation means for updating the aggregation result at that location based on the value of the record's aggregation target columns; and
- means for transferring the intermediate aggregation results to the query processing server.
The query processing server has means for integrating the intermediate aggregation results received from the aggregation processing servers.
[0014]
Further, in the first database processing system, the aggregation results may not fit in the aggregation result storage area allocated in the main memory of an aggregation processing server. In that case, the in-progress aggregation results are sorted in main memory by their group identifier values and saved as an intermediate aggregation result to the auxiliary secondary storage device of that server. When aggregation completes, the aggregation results in main memory are merged with the zero or more sorted intermediate aggregation results saved in the auxiliary secondary storage device. This completes the aggregation of the records distributed to each aggregation processing server.
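The sort-and-spill scheme can be sketched as follows. The sketch assumes (key, value) records and a table that holds at most `capacity` groups, with lists standing in for the sorted intermediate results on the auxiliary secondary storage device. The function name is hypothetical. Because each spilled run is sorted by group identifier, the final step is a sequential k-way merge rather than random I/O.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def spill_and_merge_aggregate(records, capacity):
    """Hash-aggregate in memory; when full, spill a sorted run; finally merge all runs."""
    table = {}
    runs = []  # sorted intermediate results on auxiliary storage
    for key, value in records:
        if key not in table and len(table) >= capacity:
            runs.append(sorted(table.items()))  # sort by group identifier, then save
            table = {}
        table[key] = table.get(key, 0) + value
    runs.append(sorted(table.items()))  # results still in main memory
    merged = heapq.merge(*runs, key=itemgetter(0))
    return [(k, sum(v for _, v in g)) for k, g in groupby(merged, key=itemgetter(0))]
```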
[0015]
Further, in the first database processing system, the aggregation results may not fit in the aggregation result storage area allocated in the main memory of an aggregation processing server. In that case, the most recently referenced portion of the in-progress aggregation results is kept in main memory. The remaining results are sorted in main memory by their group identifier values and saved as an intermediate aggregation result to the auxiliary secondary storage device of that server. When aggregation completes, the results in main memory are merged with the zero or more sorted intermediate aggregation results saved in the auxiliary secondary storage device. This completes the aggregation of the records distributed to each aggregation processing server.
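The variant that keeps the most recently referenced results can be sketched with an `OrderedDict` used as a recency list. The sketch assumes that `keep` entries (a hypothetical parameter) stay in memory after each spill. Frequently referenced groups therefore tend never to leave main memory. The function name is also hypothetical.

```python
import heapq
from collections import OrderedDict
from itertools import groupby
from operator import itemgetter


def lru_spill_aggregate(records, capacity, keep):
    """Spill all but the `keep` most recently referenced groups when memory is full."""
    assert 0 <= keep < capacity
    table = OrderedDict()  # iteration order: least to most recently referenced
    runs, spilled_keys = [], []
    for key, value in records:
        if key in table:
            table[key] += value
            table.move_to_end(key)  # mark as most recently referenced
            continue
        if len(table) >= capacity:
            # evict the least recently referenced results as one sorted run
            victims = [table.popitem(last=False) for _ in range(len(table) - keep)]
            spilled_keys.extend(k for k, _ in victims)
            runs.append(sorted(victims))
        table[key] = value
    runs.append(sorted(table.items()))
    merged = heapq.merge(*runs, key=itemgetter(0))
    result = [(k, sum(v for _, v in g)) for k, g in groupby(merged, key=itemgetter(0))]
    return result, spilled_keys
```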
[0016]
Further, in the first database processing system, the aggregation results may not fit in the aggregation result storage area allocated in the main memory of an aggregation processing server. In that case, the records that could not be aggregated are saved to the auxiliary secondary storage device of that server. When aggregation completes, the aggregation results in main memory are merged with the aggregation results of the zero or more records saved in the auxiliary secondary storage device. This completes the aggregation of the records distributed to each aggregation processing server.
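The overflow-record variant can be sketched as follows, under one stated assumption. The text says only that the saved records are merged with the in-memory results, so here they are aggregated by a further pass of the same procedure. Groups already in memory when the table fills never receive an overflowed record, so their results are final after each pass. The function name is hypothetical.

```python
def overflow_records_aggregate(records, capacity):
    """Aggregate what fits; save the rest as raw records and process them next pass."""
    assert capacity >= 1
    results, pending, passes = {}, list(records), 0
    while pending:
        passes += 1
        table, overflow = {}, []
        for key, value in pending:
            if key in table:
                table[key] += value
            elif len(table) < capacity:
                table[key] = value
            else:
                overflow.append((key, value))  # saved to auxiliary storage
        # a group in `table` was admitted before the table filled, so none of
        # its records were saved: its total is complete
        results.update(table)
        pending = overflow  # assumption: saved records get another pass
    return results, passes
```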
[0017]
Further, in the first database processing system, the aggregation results may not fit in the aggregation result storage area allocated in the main memory of an aggregation processing server. In that case, the records that could not be aggregated are saved to the auxiliary secondary storage device of that server. In addition, when the proportion of records that could be aggregated in main memory falls below a certain value, the in-progress aggregation results are sorted in main memory by their group identifier values and saved as an intermediate aggregation result to the same auxiliary secondary storage device. When aggregation completes, the aggregation results in main memory are merged with two sets of saved data: the aggregation results of the zero or more saved records, and the zero or more sorted intermediate aggregation results saved in the auxiliary secondary storage device. This completes the aggregation of the records distributed to each aggregation processing server.
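The two techniques can be combined in a single sketch. It assumes that the proportion is measured over a fixed window of `window` records and compared with `min_ratio`; both are hypothetical parameters. As before, it assumes that saved records are aggregated by a recursive pass. Every call absorbs at least its first record in memory, so the recursion always terminates.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def _merge_runs(runs):
    """Merge sorted (key, value) runs and sum the values of equal keys."""
    merged = heapq.merge(*runs, key=itemgetter(0))
    return [(k, sum(v for _, v in g)) for k, g in groupby(merged, key=itemgetter(0))]


def hybrid_aggregate(records, capacity, min_ratio=0.5, window=4):
    """Save overflow records; spill the table when the in-memory hit ratio drops too low."""
    if not records:
        return []
    table, runs, overflow = {}, [], []
    seen = hits = 0
    for key, value in records:
        seen += 1
        if key in table or len(table) < capacity:
            table[key] = table.get(key, 0) + value
            hits += 1
        else:
            overflow.append((key, value))  # saved record on auxiliary storage
        if seen == window:
            if hits / seen < min_ratio:
                # too few records are absorbed in memory: spill a sorted run
                runs.append(sorted(table.items()))
                table = {}
            seen = hits = 0
    runs.append(sorted(table.items()))
    if overflow:
        # assumption: saved records are aggregated by another pass of this procedure
        runs.append(hybrid_aggregate(overflow, capacity, min_ratio, window))
    return _merge_runs(runs)
```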
[0018]
Next, the second database processing system comprises four components: a query processing server that processes query requests to the database, input/output and aggregation servers that read and aggregate records of the database, intermediate aggregation servers that aggregate intermediate aggregation results, and a network connecting these servers. One or more columns of a record are designated as grouping columns, and a group identifier is associated with each grouping-column value. Records having the same group identifier value are classified into one group. One or more columns of the record are designated as aggregation target columns, which are aggregated for the records of each group.
Each input/output and aggregation server has the following means:
- means for allocating an aggregation result storage area in its own main memory;
- means for generating a group identifier from the grouping-column values of each record;
- classification means for uniquely determining, from the value of the group identifier, the storage location of the aggregation result for that group;
- aggregation means for updating the aggregation result at that location based on the value of the record's aggregation target columns; and
- means for transferring each partial aggregation result, according to its group identifier, to the intermediate aggregation server responsible for integrating it.
Each intermediate aggregation server has means for integrating the partial aggregation results received from the input/output and aggregation servers into an intermediate aggregation result, and means for transferring that intermediate aggregation result to the query processing server. The query processing server has means for integrating the intermediate aggregation results received from the intermediate aggregation servers.
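The two-level flow of the second system can be sketched as follows. The sketch assumes (key, value) records, uses Python's built-in `hash` as the rule that maps a group identifier to an intermediate aggregation server, and uses dictionaries as stand-ins for servers. All function names are hypothetical. Because every partial result for a group reaches the same intermediate server, the query processing server only has to concatenate disjoint results.

```python
from collections import defaultdict


def local_aggregate(records):
    """Aggregation on one input/output and aggregation server."""
    partial = defaultdict(int)
    for key, value in records:
        partial[key] += value
    return partial


def two_level_aggregate(partitions, n_intermediate):
    """Route partial results by group identifier, integrate them, then combine."""
    intermediate = [defaultdict(int) for _ in range(n_intermediate)]
    for records in partitions:  # one entry per input/output and aggregation server
        for key, value in local_aggregate(records).items():
            # assumed routing rule: every partial result for a group goes to one server
            intermediate[hash(key) % n_intermediate][key] += value
    final = {}
    for server in intermediate:  # query processing server: groups are disjoint
        final.update(server)
    return final
```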
[0019]
Further, in the second database processing system, the aggregation results may not fit in the aggregation result storage area allocated in the main memory of an input/output and aggregation server. In that case, the in-progress aggregation results in main memory are transferred as partial aggregation results to the intermediate aggregation servers, according to the correspondence rule of that input/output and aggregation server. The intermediate aggregation servers then classify and aggregate these partial aggregation results. This completes the aggregation of the records stored in each input/output and aggregation server.
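The overflow behavior of an input/output and aggregation server can be sketched as follows. The sketch assumes that the in-progress results are flushed in full whenever a new group does not fit, and that `hash(key) % len(inter)` is the correspondence rule. The name `io_agg_server` is hypothetical. Flushing early is safe because the intermediate servers add together the partial results for the same group.

```python
from collections import defaultdict


def io_agg_server(records, capacity, inter):
    """Aggregate locally; on overflow, flush partial results to intermediate servers."""
    table = {}

    def flush():
        for key, value in table.items():
            inter[hash(key) % len(inter)][key] += value  # assumed routing rule
        table.clear()

    for key, value in records:
        if key not in table and len(table) >= capacity:
            flush()  # main-memory area is full: hand off partial results
        table[key] = table.get(key, 0) + value
    flush()  # end of input: send whatever remains


inter = [defaultdict(int) for _ in range(2)]  # two intermediate aggregation servers
io_agg_server([("a", 1), ("b", 2), ("c", 3), ("a", 4)], capacity=2, inter=inter)
```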
[0020]
Further, in the second database processing system, the aggregation results may not fit in the aggregation result storage area allocated in the main memory of an input/output and aggregation server. In that case, the in-progress aggregation results are sorted in main memory by their group identifier values and saved as an intermediate aggregation result to the auxiliary secondary storage device of that server. When aggregation completes, the aggregation results in main memory are merged with the zero or more sorted intermediate aggregation results saved in the auxiliary secondary storage device. This completes the aggregation of the records distributed to each input/output and aggregation server.
[0021]
Further, in the second database processing system, the aggregation results may not fit in the aggregation result storage area allocated in the main memory of an input/output and aggregation server. In that case, the most recently referenced portion of the in-progress aggregation results is kept in main memory. The remaining results are sorted in main memory by their group identifier values and saved as an intermediate aggregation result to the auxiliary secondary storage device of that server. When aggregation completes, the results in main memory are merged with the zero or more sorted intermediate aggregation results saved in the auxiliary secondary storage device. This completes the aggregation of the records distributed to each input/output and aggregation server.
[0022]
Further, in the second database processing system, the aggregation results may not fit in the aggregation result storage area allocated in the main memory of an input/output and aggregation server. In that case, the records that could not be aggregated are saved to the auxiliary secondary storage device of that server. When aggregation completes, the aggregation results in main memory are merged with the aggregation results of the zero or more records saved in the auxiliary secondary storage device. This completes the aggregation of the records distributed to each input/output and aggregation server.
[0023]
Further, in the second database processing system, the aggregation results may not fit in the aggregation result storage area allocated in the main memory of an input/output and aggregation server. In that case, the records that could not be aggregated are saved to the auxiliary secondary storage device of that server. In addition, when the proportion of records that could be aggregated in main memory falls below a certain value, the in-progress aggregation results are sorted in main memory by their group identifier values and saved as an intermediate aggregation result to the same auxiliary secondary storage device. When aggregation completes, the aggregation results in main memory are merged with two sets of saved data: the aggregation results of the zero or more saved records, and the zero or more sorted intermediate aggregation results saved in the auxiliary secondary storage device. This completes the aggregation of the records distributed to each input/output and aggregation server.
[0024]
In addition, the aggregation results may not fit in the aggregation result storage area allocated in the main memory of an intermediate aggregation server while that server is classifying and aggregating partial aggregation results. In that case, the in-progress aggregation results are sorted in main memory by their group identifier values and saved as an intermediate aggregation result to the auxiliary secondary storage device of that server. When aggregation completes, the aggregation results in main memory are merged with the zero or more sorted intermediate aggregation results saved in the auxiliary secondary storage device. This completes the aggregation of the partial aggregation results distributed to each intermediate aggregation server.
[0025]
In addition, the aggregation results may not fit in the aggregation result storage area allocated in the main memory of an intermediate aggregation server while that server is classifying and aggregating partial aggregation results. In that case, the most recently referenced portion of the in-progress aggregation results is kept in main memory. The remaining results are sorted in main memory by their group identifier values and saved as an intermediate aggregation result to the auxiliary secondary storage device of that server. When aggregation completes, the results in main memory are merged with the zero or more sorted intermediate aggregation results saved in the auxiliary secondary storage device. This completes the aggregation of the partial aggregation results distributed to each intermediate aggregation server.
[0026]
In addition, the aggregation results may not fit in the aggregation result storage area allocated in the main memory of an intermediate aggregation server while that server is classifying and aggregating partial aggregation results. In that case, the partial aggregation results that could not be aggregated are saved to the auxiliary secondary storage device of that server. When aggregation completes, the aggregation results in main memory are merged with the aggregation results of the zero or more records saved in the auxiliary secondary storage device. This completes the aggregation of the partial aggregation results distributed to each intermediate aggregation server.
[0027]
In addition, the aggregation results may not fit in the aggregation result storage area allocated in the main memory of an intermediate aggregation server while that server is classifying and aggregating partial aggregation results. In that case, the partial aggregation results that could not be aggregated are saved to the auxiliary secondary storage device of that server. In addition, when the proportion of records that could be aggregated in main memory falls below a certain value, the in-progress aggregation results are sorted in main memory by their group identifier values and saved as an intermediate aggregation result to the same auxiliary secondary storage device. When aggregation completes, the aggregation results in main memory are merged with two sets of saved data: the aggregation results of the zero or more saved records, and the zero or more sorted intermediate aggregation results saved in the auxiliary secondary storage device. This completes the aggregation of the partial aggregation results distributed to each intermediate aggregation server.
[0028]
Further, consider a database processing system that processes a stored table as follows:
- one or more columns of the records making up the table are designated as grouping columns;
- a group identifier is associated, one-to-one or many-to-one, with the values of the designated grouping columns;
- records having the same group identifier value are classified into one group;
- one or more columns of the records belonging to each group are designated as aggregation target columns, and the values of those columns are aggregated.
In such a system, the specification of grouping columns for classification and aggregation has a format for specifying the range of values of the group identifier.
[0029]
Further, in a database processing system of the same kind, which groups records by group identifier and aggregates designated columns per group, the specification of grouping columns for classification and aggregation has a format for specifying an upper limit on the number of groups produced by the grouping.
[0030]
Further, in a database processing system of the same kind, which groups records by group identifier and aggregates designated columns per group, the specification of grouping columns for classification and aggregation has a format for specifying a user-defined function. This function defines the correspondence between grouping-column values and group identifiers.
[0031]
Further, in the above database processing system, the specification of grouping columns for classification and aggregation has a format for specifying the range of values of the group identifier. Depending on whether such a range is specified for a grouping column, one of two methods is used for that column. The array method computes the aggregation result storage location directly from the value of the group identifier. The hash method determines the storage location by comparing group identifiers.
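The choice between the array method and the hash method can be sketched as a small storage class. The sketch assumes integer group identifiers. The class name `GroupTable` and its methods are hypothetical. With a declared range, the slot index is simply `gid - lo`, so no hashing or key comparison is needed.

```python
class GroupTable:
    """Aggregation-result store chosen per grouping column (hypothetical helper)."""

    def __init__(self, lo=None, hi=None):
        self.lo = lo
        if lo is not None and hi is not None:
            # array method: slot index computed directly from the identifier
            self.slots = [None] * (hi - lo + 1)
            self.table = None
        else:
            # hash method: location found by hashing and comparing identifiers
            self.slots = None
            self.table = {}

    def add(self, gid, value):
        if self.slots is not None:
            i = gid - self.lo
            if not 0 <= i < len(self.slots):
                raise ValueError(f"group identifier {gid} outside declared range")
            self.slots[i] = value if self.slots[i] is None else self.slots[i] + value
        else:
            self.table[gid] = self.table.get(gid, 0) + value

    def items(self):
        """Return (group identifier, result) pairs in identifier order."""
        if self.slots is not None:
            return [(self.lo + i, v) for i, v in enumerate(self.slots) if v is not None]
        return sorted(self.table.items())
```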
[0032]
Further, in the above database processing system, the specification of grouping columns for classification and aggregation has a format for specifying an upper limit on the number of groups produced by the grouping. For the range of group identifier values specified for each grouping column, additional groups are prepared to receive group identifiers whose values lie above the upper limit or below the lower limit of that range. The records classified into these additional groups are aggregated as well.
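Out-of-range identifiers can be handled as sketched below. The sketch assumes integer group identifiers and a declared range `lo..hi`. Identifiers outside the range are folded into a below-range group and an above-range group, so the array never needs to grow. The function name is hypothetical.

```python
def bounded_aggregate(records, lo, hi):
    """Array aggregation over identifiers lo..hi plus two catch-all groups."""
    slots = [0] * (hi - lo + 1)  # one slot per identifier in the declared range
    below = above = 0            # groups for identifiers outside the range
    for gid, value in records:
        if gid < lo:
            below += value
        elif gid > hi:
            above += value
        else:
            slots[gid - lo] += value
    return below, slots, above
```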
[0033]
Further, the specification of grouping columns for classification and aggregation in this database processing system has a format for specifying a user-defined function. This function defines the correspondence between grouping-column values and group identifiers. Using such a function narrows the range of group identifier values, and aggregation is then performed over that narrower range.
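A user-defined function can narrow the identifier range, as sketched below. The sketch assumes dates stored as `YYYY-MM-DD` strings. The function `month_of` maps them to month numbers 1 to 12, so a small array is enough to hold the results. Both `month_of` and `udf_aggregate` are hypothetical names, not an interface from the source.

```python
def month_of(date):
    """Hypothetical user-defined function: 'YYYY-MM-DD' -> month number 1..12."""
    return int(date[5:7])


def udf_aggregate(records, udf, lo, hi):
    """Sum amounts per group identifier udf(column value), with identifiers in lo..hi."""
    totals = [0] * (hi - lo + 1)
    for column_value, amount in records:
        totals[udf(column_value) - lo] += amount
    return totals


sales = [("1996-09-20", 100), ("1996-09-01", 50), ("1996-12-31", 10)]
totals = udf_aggregate(sales, month_of, 1, 12)
```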
[0034]
BEST MODE FOR CARRYING OUT THE INVENTION
(Example 1)
FIG. 1 shows the configuration of an embodiment of the parallel database processing system according to the present invention. The components in FIG. 1 are as follows:
- 1 is a secondary storage device that stores a table;
- 2 is an input/output server that reads a partial table 7 from the secondary storage device;
- 3 is a network over which records and aggregation results are exchanged between the nodes;
- 4 is an aggregation processing server that classifies and aggregates records;
- 5 is a query processing server that integrates the aggregation results and creates the final aggregation result;
- 6 is a terminal device that issues classification and aggregation requests to the database and retrieves the aggregation results from the query processing server 5.
Any inter-processor interconnection, such as a LAN, a WAN, or dedicated hardware, can be used as the network 3.
[0035]
The secondary storage devices 1 store the records to be grouped. The records are divided across a plurality of secondary storage devices according to a partitioning method such as hash partitioning or key-value partitioning. The input/output server 2 holds three components: a data buffer 8 that temporarily stores records read from the secondary storage device 1, record distribution means 9 that determines the aggregation processing server to which each record is distributed, and a plurality of transmission buffers 10. The data buffer 8 is allocated in the main memory of the input/output server.
[0036]
The partial table 7 held in the secondary storage device 1 is transferred to the data buffer 8 block by block. The record distribution means 9 retrieves the records in the data buffer 8 one at a time. For each record, it refers to the values of the grouping columns to determine the aggregation processing server 4 that should aggregate that record. The input/output server 2 extracts only the grouping columns and aggregation target columns needed to aggregate the record. It then distributes them from the transmission buffers 10 through the network 3 to the aggregation processing servers 4.
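The distribution step can be sketched as follows. The sketch assumes dictionary records and uses a CRC-32 hash of the grouping-column values to choose the destination server. `zlib.crc32` is chosen because, unlike Python's string `hash`, it gives the same result on every node. The functions `choose_server` and `distribute` are hypothetical names. Each record is projected down to its grouping and aggregation target columns before it is routed.

```python
import zlib


def choose_server(group_values, n_servers):
    """Pick an aggregation server from the grouping-column values (assumed CRC-32 hash)."""
    key = "\x1f".join(map(str, group_values)).encode("utf-8")
    return zlib.crc32(key) % n_servers


def distribute(records, group_cols, agg_cols, n_servers):
    """Project each record to its grouping and aggregation columns and route it."""
    outboxes = [[] for _ in range(n_servers)]  # stand-ins for transmission buffers 10
    for rec in records:
        group_values = tuple(rec[c] for c in group_cols)
        projected = group_values + tuple(rec[c] for c in agg_cols)
        outboxes[choose_server(group_values, n_servers)].append(projected)
    return outboxes
```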
[0037]
The aggregation processing server 4 holds four components: a plurality of receiving buffers 11, aggregation processing means 12 that aggregates records transferred from the input/output servers 2, an aggregation result storage area 13 that stores aggregation results, and an auxiliary secondary storage device 17 for saving in-progress aggregation results. The aggregation result storage area is allocated in the main memory of the aggregation processing server 4. The aggregation processing server uses the aggregation processing means 12 to aggregate the records transferred to the receiving buffers 11, and it stores the intermediate aggregation result in the aggregation result storage area 13. If the intermediate aggregation result does not fit in the aggregation result storage area 13, the in-progress results are saved to the auxiliary secondary storage device 17 as partial aggregation results. When all distributed records have been aggregated, the aggregation processing server 4 transfers the intermediate aggregation result to the query processing server 5.
[0038]
The query processing server 5 holds a plurality of receiving buffers 14, aggregation result integration means 15, and a final aggregation result storage area 16. The query processing server 5 accepts an aggregation request from the terminal device 6, determines an execution procedure, and issues execution instructions to the input/output servers 2 and the aggregation processing servers 4. It then collects, through the network 3, the intermediate aggregation results created by the aggregation processing servers 4. It integrates them into the final aggregation result and stores that result in the final aggregation result storage area 16. This area is held in the main memory or secondary storage of the query processing server 5. The terminal device 6 requests aggregation from the query processing server 5. It then retrieves the final aggregation result piece by piece, as soon as each piece is created on the query processing server 5.
[0039]
FIG. 6 shows a processing flow of the entire classification and aggregation processing in this embodiment.
First, the terminal device 6 sends an aggregation execution request to the query processing server 5 (processing request issuance 700) and waits for a response. On receiving the request, the query processing server 5 creates an execution plan for the requested aggregation (processing request reception 701). According to that plan, it starts the aggregation processing programs on the input/output servers and the aggregation processing servers (program start 702). The query processing server 5 then initializes the receiving buffers 14, the final aggregation result storage area 16, and so on (area initialization 703), and waits for responses from the aggregation processing servers 4.
[0040]
The query processing server 5 starts the aggregation programs on the input/output servers 2 and aggregation processing servers 4. As instructed by the query processing server 5, these servers initialize the data buffer 8, the transmission buffers 10, the receiving buffers 11, the aggregation result storage area 13, and so on (704, 705). After initialization, the input/output server 2 handles each record read from the secondary storage device 1 into the data buffer 8. It uses the record distribution means 9 to determine the destination aggregation processing server and transfers the record through the network 3 (706). When all records have been distributed, the input/output server 2 sends an end notification to every aggregation processing server (707) and terminates (708).
[0041]
Meanwhile, each aggregation processing server 4 starts aggregating records as soon as it receives them from the input/output servers 2, building an intermediate aggregation result. If the intermediate aggregation result does not fit in the preallocated aggregation result storage area, it is saved to the auxiliary secondary storage device 17 as a partial aggregation result (709). Each aggregation processing server eventually receives end notifications from all input/output servers (710). If it has saved intermediate results to the auxiliary secondary storage device (711), it then integrates the saved partial aggregation results (712).
[0042]
Each aggregation server 4 transfers the intermediate aggregation result created by this procedure to the query processing server 5 in block units (713). When all of its intermediate aggregation results have been transferred, the aggregation server sends an end notification to the query processing server (714) and terminates (715).
[0043]
The query processing server 5 stores the intermediate aggregation results it receives, in order, in the final aggregation result storage area 16 and transfers the final aggregation result to the terminal device 6 on request (716). When it has received end notifications from all aggregation servers (717), it sends an end notification to the terminal device (718) and terminates (719). The terminal device 6 requests the final aggregation result from the query processing server 5 as needed and receives it (720). On receiving the end notification from the query processing server 5 (721), which confirms that aggregation is complete, the terminal device 6 terminates (722).
[0044]
In this embodiment, adjacent processing steps can run in parallel on a record-by-record basis, except between aggregation (709) and transfer of the intermediate aggregation result (713) within the aggregation server.
[0045]
Hereinafter, the flowchart of the tallying processing program on each node will be described in more detail.
[0046]
FIG. 7 shows a flowchart of the classification and aggregation processing in the input/output server 2. The input/output server 2 first initializes the data buffer 8 and the transmit buffers 10 as instructed by the query processing server 5 (80). It then transfers the records stored in the secondary storage device 1 to the data buffer 8 in block units (81). Next, it takes records from the data buffer 8 one at a time (82) and uses the record distribution means 9 (described later) to determine the destination aggregation server 4 for each record (83). Records are accumulated into pages, one per destination (84), and each page is transferred from the transmit buffer 10 to the corresponding aggregation server 4 when it is full (85, 86). When all records have been transferred, the input/output server 2 sends an end notification to each aggregation server (87) and terminates (88).
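The distribution step (83) to (86) can be sketched as follows, as a minimal illustration assuming hash partitioning on the grouping columns (one of the methods listed in [0054]). The `Record` type, the `page_size` parameter, and the `send` callback are hypothetical stand-ins for the record pages, transmit buffer 10, and network 3. `zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomized per process, and every input/output server must route a given group the same way.

```python
import zlib
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Optional, Sequence

Record = Dict[str, Any]  # assumption: a record maps column names to values
Send = Callable[[int, Optional[List[Record]]], None]  # page=None means "end notification"

def destination(record: Record, group_cols: Sequence[str], n_servers: int) -> int:
    """Hash the grouping-column values; equal groups always get the same server."""
    key = repr(tuple(record[c] for c in group_cols)).encode()
    return zlib.crc32(key) % n_servers

def distribute(records: Iterable[Record], group_cols: Sequence[str],
               n_servers: int, page_size: int, send: Send) -> None:
    pages: Dict[int, List[Record]] = defaultdict(list)  # one page per destination
    for rec in records:
        dest = destination(rec, group_cols, n_servers)
        pages[dest].append(rec)
        if len(pages[dest]) == page_size:  # page full: transfer it
            send(dest, pages.pop(dest))
    for dest, page in pages.items():       # flush partially filled pages
        if page:
            send(dest, page)
    for dest in range(n_servers):          # end notification to every server
        send(dest, None)
```

Routing on the grouping-column values is what makes every record of a group land on exactly one aggregation server, which is the property the record distribution means must guarantee.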
[0047]
FIG. 8 shows a flowchart of the classification and aggregation processing in the aggregation server 4. The aggregation server 4 first initializes the aggregation result storage area 13 and the receive buffers 11 as instructed by the query processing server 5 (90). It takes records one at a time (92) from the record pages transferred from the input/output servers into the receive buffer 11 (91) and applies the group classification means (described later) to each record to determine the group identifier of the group it belongs to (93). It then aggregates the record (95) and writes the result into the aggregation result storage area 13 (96). When the hash method is used as the aggregation result storage format, the aggregation results may not fit in the pre-allocated aggregation result storage area (97). In that case, the intermediate aggregation result is sorted in main memory by group identifier and saved as a partial aggregation result to the auxiliary secondary storage device 17 (98). Each aggregation server obtains its intermediate aggregation result by repeating this process for all records it receives (99).
[0048]
When the aggregation server has received end notifications from all input/output servers (910), and partial aggregation results were saved to the auxiliary secondary storage device 17 during aggregation (911), it merge-sorts the saved partial aggregation results to create the intermediate aggregation result (912). Because each saved partial aggregation result is already sorted, this merge runs efficiently. If no partial aggregation result was saved, no further processing is needed. After aggregating all records, each aggregation server transfers its intermediate aggregation result to the query processing server (913), sends an end notification when the transfer is complete (914), and terminates (915).
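A minimal sketch of the aggregation server loop (92) to (99) and the final merge (912), assuming a SUM aggregate and a hash-style in-memory table. `capacity` is a hypothetical limit counted in groups rather than bytes, a Python list stands in for each sorted run saved to the auxiliary secondary storage device 17, and group identifiers are assumed to be mutually comparable so they can be sorted. Treating whatever remains in memory at the end as one more sorted run is an assumption about how (912) handles the last partial result.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Dict, Hashable, Iterator, List, Tuple

Item = Tuple[Hashable, int]  # (group identifier, running SUM)

class SpillingAggregator:
    def __init__(self, capacity: int):
        self.capacity = capacity              # groups that fit in the reserved area
        self.table: Dict[Hashable, int] = {}
        self.runs: List[List[Item]] = []      # sorted partial aggregation results

    def add(self, gid: Hashable, value: int) -> None:
        if gid not in self.table and len(self.table) >= self.capacity:
            # Area full: sort by group identifier and save as a partial result.
            self.runs.append(sorted(self.table.items(), key=itemgetter(0)))
            self.table = {}                   # the area can now be reused
        self.table[gid] = self.table.get(gid, 0) + value

    def result(self) -> Iterator[Item]:
        """N-way merge of all sorted runs plus the in-memory table."""
        runs = self.runs + [sorted(self.table.items(), key=itemgetter(0))]
        merged = heapq.merge(*runs, key=itemgetter(0))
        for gid, items in groupby(merged, key=itemgetter(0)):
            yield gid, sum(v for _, v in items)  # combine the same group across runs
```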
[0049]
FIG. 9 shows a flowchart of the classification and aggregation processing in the query processing server 5.
[0050]
First, the query processing server 5 receives an aggregation execution request from the terminal device 6 (100) and, depending on the request, starts the aggregation program on the input/output servers 2 and the aggregation servers 4 (101). It then initializes the receive buffer 14 and the final aggregation result storage area 16 (102) and waits for intermediate aggregation results from the aggregation servers 4. As it receives the intermediate aggregation result of each aggregation server 4 (103), it concatenates them in order to form the final aggregation result (104). On request from the terminal device 6, it transfers the final aggregation result sequentially from the beginning (105). When all of the final aggregation result has been transferred (106), it sends an end notification to the terminal device 6 (107) and terminates (108).
[0051]
FIG. 10 shows a flowchart of the processing in the terminal device 6. The terminal device 6 issues an aggregation execution request to the query processing server 5 (110) and then waits for the final aggregation result (111), requesting it from the query processing server 5 piece by piece. When the terminal device 6 receives the end notification from the query processing server 5 (112), it terminates (113).
[0052]
The record distribution means, group classification means, aggregation processing means, aggregation result storage method, and integration processing means mentioned above are described in more detail below.
[0053]
The record distribution means in this embodiment redistributes the records, which are stored in advance divided across multiple secondary storage devices 1, to the aggregation servers 4 via the input/output servers 2. It must guarantee that all records classified into the same group are sent to the same aggregation server 4. Formally, let $S_i$ be the set of records distributed to aggregation server $N_i$ and $T_j$ the set of records classified into group $G_j$. Then for every $j$ there exists an $i$ such that $T_j \subseteq S_i$ and $S_k \cap T_j = \emptyset$ for every $k \neq i$. Because records in the same group are always aggregated by the same aggregation server, the query processing server 5 obtains the final aggregation result simply by concatenating the intermediate aggregation results it receives.
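The condition above, and the concatenation-only integration it allows in the query processing server 5, can be illustrated with a short sketch. Representing each intermediate aggregation result as a Python dict from group identifier to aggregate value is an assumption made for illustration, and `groups_are_disjoint` and `integrate` are hypothetical helper names.

```python
from typing import Dict, Hashable, List, Tuple

Result = Dict[Hashable, float]  # group identifier -> aggregate value

def groups_are_disjoint(results: List[Result]) -> bool:
    """Check the distribution condition: no group appears on two servers."""
    seen: set = set()
    for res in results:
        if not seen.isdisjoint(res):  # iterating a dict yields its keys
            return False
        seen.update(res)
    return True

def integrate(results: List[Result]) -> List[Tuple[Hashable, float]]:
    """Final result = concatenation of the intermediate results, no recomputation."""
    if not groups_are_disjoint(results):
        raise ValueError("a group appears on more than one aggregation server")
    final: List[Tuple[Hashable, float]] = []
    for res in results:  # in the order the servers' results arrive
        final.extend(res.items())
    return final
```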
[0054]
Specific record distribution methods include hash partitioning, key-range partitioning, and round-robin partitioning. However, if the records stored in secondary storage are already partitioned in a way that satisfies the above condition for the requested aggregation, the input/output servers may perform the aggregation directly, without redistributing the data.
[0055]
The group classification means in this embodiment assigns a group identifier to each record distributed to an aggregation server 4, according to the grouping specified in the classification and aggregation request issued by the terminal device 6, and classifies the records by that identifier. The correspondence between records and group identifiers is described in detail below.
[0056]
A group identifier in this embodiment identifies the group into which the group classification means classifies a record. It is a discrete value, such as an integer or a character string, that can be distinguished from other identifiers. In multidimensional classification and aggregation, where records are grouped by the values of several columns, the final group identifier is formed by combining the group identifiers of the individual grouping columns. For example, when records with columns $\{A_i\}$ are grouped by grouping columns $\{G_i\} \subseteq \{A_i\}$, a group identifier function $g_i$ is applied to each grouping column $G_i$, and the resulting tuple $\langle g_1(G_1), \ldots, g_n(G_n) \rangle$ is used as the record's group identifier.
[0057]
In this embodiment, a user-defined grouping function can be used for association between the grouping column and the group identifier.
[0058]
The group classification means assigns an appropriate group identifier according to the value of each grouping column, thereby associating every record with exactly one group. The value of the group identifier can be determined by any of the following three methods.
[0059]
The first method uses the value of the grouping column as it is; the second uses the value obtained by applying a built-in function to the grouping column value; and the third uses the value obtained by applying a user-defined grouping function to the grouping column value.
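The three methods, and the combination of per-column identifiers into $\langle g_1(G_1), \ldots, g_n(G_n) \rangle$, can be sketched as below. The column names, the YYYYMMDD integer date encoding, and the functions `year_of` and `age_band` are hypothetical examples; `year_of` merely stands in for whatever built-in function a particular DBMS provides.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

Grouping = Sequence[Tuple[str, Callable[[Any], Any]]]  # (grouping column G_i, function g_i)

def group_identifier(grouping: Grouping) -> Callable[[Dict[str, Any]], Tuple[Any, ...]]:
    """Return a function mapping a record to <g_1(G_1), ..., g_n(G_n)>."""
    def gid(record: Dict[str, Any]) -> Tuple[Any, ...]:
        return tuple(g(record[col]) for col, g in grouping)
    return gid

def as_is(value: Any) -> Any:        # method 1: the column value itself
    return value

def year_of(yyyymmdd: int) -> int:   # method 2: stand-in for a built-in function
    return yyyymmdd // 10000

def age_band(age: int) -> int:       # method 3: a user-defined grouping function
    return age // 10 * 10            # 0-9 -> 0, 10-19 -> 10, ...

record_gid = group_identifier([("product_code", as_is), ("sale_day", year_of), ("age", age_band)])
example = record_gid({"product_code": 1203, "sale_day": 19960920, "age": 37})  # (1203, 1996, 30)
```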
[0060]
A user-defined grouping function must output discrete values that can be distinguished from one another. With a user-defined grouping function, the domain of a grouping column can be divided into intervals and records grouped by interval; this is useful, for example, for grouping records by age band and computing the average income of each band. In addition, when a grouping column has a discrete distribution that is not confined to a bounded interval but its possible values are known in advance, a user-defined grouping function can map those values onto a contiguous, bounded range of group identifiers, so that the array method, which offers high access efficiency, can be used as the aggregation result storage method. For example, when product names are the grouping column values and the product names that can appear are known, each product name can be assigned a number, and an array sized to the largest assigned number can be used instead of a hash table to store the aggregation results.
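The product-name example can be sketched as a user-defined grouping function that numbers the values known in advance, so that the array method applies. The helper names are hypothetical and the sample product codes are borrowed from the FIG. 18 example. In this sketch a value not declared in advance raises `KeyError`; how such values should be handled is not specified here.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

def dense_grouping(known_values: Iterable[Hashable]) -> Tuple[Callable[[Hashable], int], int]:
    """User-defined grouping function mapping each known value to 0..n-1."""
    # dict.fromkeys drops duplicates while keeping first-seen order.
    number: Dict[Hashable, int] = {v: i for i, v in enumerate(dict.fromkeys(known_values))}

    def g(value: Hashable) -> int:
        return number[value]  # KeyError for a value not declared in advance
    return g, len(number)

def sum_by_array(rows: Iterable[Tuple[Hashable, int]],
                 g: Callable[[Hashable], int], size: int) -> List[int]:
    totals = [0] * size  # array method: subscript = group identifier
    for value, amount in rows:
        totals[g(value)] += amount
    return totals
```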
[0061]
In this embodiment, when the range and kinds of grouping column values are known in advance, the distribution of group identifiers can be specified for each grouping column (grouping specification). There are three kinds of specification: group identifier range specification, group identifier upper limit specification, and group identifier type upper limit specification, the last giving an upper limit on the number of distinct identifier values.
[0062]
A group identifier range specification gives the lower and upper limits of the values that a group identifier used for classification can take. It can be used when an order is defined on the group identifiers.
[0063]
If a record has a grouping column value outside this range, special group identifiers meaning "below the lower limit" and "above the upper limit" are provided, and such exceptional values are aggregated separately under them. With this specification, the aggregation result storage method can estimate an upper bound on the area the aggregation result needs for that grouping column, select the array method to determine storage positions for it, and so determine storage locations efficiently.
[0064]
A group identifier upper limit specification applies when the group identifiers used for classification are integers: it is a group identifier range specification with the lower limit fixed at 0, so only the upper limit is given. This provides a simpler way to specify a group identifier range.
[0065]
A group identifier type upper limit specification gives an upper limit on the number of distinct values that the group identifier used for classification can take. This allows an upper bound on the area the aggregation result needs for that grouping column to be estimated.
[0066]
The aggregation processing means in this embodiment computes statistics incrementally while taking records one at a time from a given set of records. Typical aggregates are the sum and the average. Besides the record itself, aggregation may use auxiliary information such as the number of records aggregated so far.
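As an illustration of aggregation that uses auxiliary information: an average cannot be updated from the previous average alone, but carrying the record count alongside the running sum makes it incremental. A minimal sketch, in which the class name `AvgState` is hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AvgState:
    """Running state for SUM and AVG of one group."""
    total: float = 0.0
    count: int = 0  # auxiliary information: records aggregated so far

    def add(self, value: float) -> None:
        self.total += value
        self.count += 1

    def average(self) -> Optional[float]:
        return self.total / self.count if self.count else None
```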
[0067]
The aggregation result storage method in this embodiment determines how group aggregation results are stored. One of three methods, the array method, the hash method, or the hash/array combined method, is chosen for each grouping column by the aggregation result storage method selection means described later.
[0068]
FIG. 11 shows the aggregation result storage format of the array method. This method is used when the aggregation results are guaranteed to fit in the reserved area in main memory. The format consists of a single array 120 whose elements are aggregation results.
[0069]
In the array method, the group identifier is used as the array subscript and the aggregation result is stored as the corresponding array element. The array is allocated in main memory. Because no key comparison is needed to determine the storage location, aggregation results can be stored very quickly. When a grouping column value falls outside the lower and upper limits, it is aggregated separately in the groups corresponding to the "above the upper limit" and "below the lower limit" group identifiers.
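A minimal sketch of the array method for one grouping column with a group identifier range specification, including the two exception groups. The class name and slot layout (slot 0 for values below the lower limit, the last slot for values above the upper limit) are assumptions for illustration. With the range 1 to 31 used for the sales date in the later SQL example, the array has 33 slots.

```python
from typing import List

class ArraySum:
    """Array method for one grouping column with range [lo, hi]."""

    def __init__(self, lo: int, hi: int):
        self.lo, self.hi = lo, hi
        # slot 0: below lo; slots 1..hi-lo+1: lo..hi; last slot: above hi
        self.slots: List[float] = [0] * (hi - lo + 3)

    def slot(self, gid: int) -> int:
        if gid < self.lo:
            return 0
        if gid > self.hi:
            return len(self.slots) - 1
        return gid - self.lo + 1  # direct subscript: no key comparison or list walk

    def add(self, gid: int, value: float) -> None:
        self.slots[self.slot(gid)] += value
```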
[0070]
FIG. 12 shows the aggregation result storage format of the hash method. This method is used when the aggregation results might not fit in the reserved area in main memory. The format consists of one hash table 130, a number of hash lists 131, a number of hash entries 132, and one aggregation result storage area 133. The hash table 130 holds pointers to the hash lists 131; each hash list 131 is a list of hash entries 132; and each hash entry 132 holds a pointer to the record storing the aggregation result and a pointer to the next hash entry with the same hash value. The aggregation result storage area 133 is allocated from the reserved area in main memory; when the reserved area runs out, the situation is handled by the method described later.
[0071]
In the hash method, the group identifier is passed to a hash function, the corresponding hash list is selected from the hash table, and the target storage location is found among its hash entries. Determining a storage location requires walking a hash list, but because only the storage actually needed is consumed, main memory is used efficiently.
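The chained structure of FIG. 12 can be sketched with an explicit bucket array and linked entries, rather than Python's built-in `dict`, to make the list walk visible. `ChainedHashSum`, `HashEntry`, and `n_buckets` are hypothetical names, and each entry holds the aggregate value directly instead of a pointer into a separate aggregation result storage area 133.

```python
from dataclasses import dataclass
from typing import Hashable, Iterator, List, Optional, Tuple

@dataclass
class HashEntry:
    gid: Hashable                       # group identifier stored in this entry
    result: float                       # aggregate (stands in for a pointer to it)
    next: Optional["HashEntry"] = None  # next entry with the same hash value

class ChainedHashSum:
    """Hash method: bucket array of hash lists; storage grows only with new groups."""

    def __init__(self, n_buckets: int = 1024):
        self.table: List[Optional[HashEntry]] = [None] * n_buckets
        self.used = 0  # result records allocated so far

    def add(self, gid: Hashable, value: float) -> None:
        bucket = hash(gid) % len(self.table)
        entry = self.table[bucket]
        while entry is not None:  # walk the hash list, comparing identifiers
            if entry.gid == gid:
                entry.result += value
                return
            entry = entry.next
        # New group: allocate an entry and push it onto the front of the list.
        self.table[bucket] = HashEntry(gid, value, self.table[bucket])
        self.used += 1

    def items(self) -> Iterator[Tuple[Hashable, float]]:
        for head in self.table:
            entry = head
            while entry is not None:
                yield entry.gid, entry.result
                entry = entry.next
```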
[0072]
FIG. 13 shows the aggregation result storage format of the hash/array combined method. In multidimensional classification and aggregation, this method is used when the array method and the hash method are applied selectively to different grouping columns. Below, the group identifier $\{g_i\}$ used for multidimensional grouping of records is split, according to the aggregation result storage method selection described later, into a hash group identifier $\{h_i\} \subseteq \{g_i\}$, which serves as the key for the hash method, and an array group identifier $\{c_i\} \subseteq \{g_i\}$, which serves as the subscript for the array method. The split is exclusive: $\{h_i\} \cup \{c_i\} = \{g_i\}$ and $\{h_i\} \cap \{c_i\} = \emptyset$.
[0073]
This storage format consists of one hash table 140, a number of hash lists 141, and a number of hash entries 142. The hash table 140 holds pointers to the hash lists 141, and each hash list 141 is a list of hash entries 142. Each hash entry 142 holds a pointer to the next hash entry with the same hash value, the hash group identifier of the aggregation results it stores, and an array holding the aggregation results that share that hash group identifier.
[0074]
In the hash/array combined method, the hash group identifier is first used to find, among the hash entries, the array in which the aggregation result is to be stored. If no array is registered for that hash group identifier, a new array is allocated in the reserved area in main memory and added as a hash entry. How to handle the case where the array cannot be allocated in the reserved area is described later. Once the target array is found, the location where the aggregation result is finally stored is determined from the array group identifier, as in the array method. Because this method uses the array method or the hash method according to the characteristics of each grouping column, it determines storage locations quickly while keeping storage efficiency high.
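A sketch of the hash/array combined method, under these assumptions: a Python `dict` keyed by the hash group identifier plays the role of hash table 140 and its entries 142, each array-group column has a range plus two exception slots, and multi-column array subscripts are flattened in row-major order. `HashArraySum` and its parameters are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

class HashArraySum:
    """Hash/array combined method for SUM."""

    def __init__(self, array_ranges: Sequence[Tuple[int, int]]):
        self.ranges = list(array_ranges)                    # (lo, hi) per array column
        self.widths = [hi - lo + 3 for lo, hi in self.ranges]  # +2 exception slots
        self.size = 1
        for w in self.widths:
            self.size *= w                                  # elements per array
        self.entries: Dict[Hashable, List[float]] = {}     # hash group id -> array

    def _offset(self, array_gid: Sequence[int]) -> int:
        offset = 0
        for (lo, hi), w, v in zip(self.ranges, self.widths, array_gid):
            slot = 0 if v < lo else (w - 1 if v > hi else v - lo + 1)
            offset = offset * w + slot                      # row-major flattening
        return offset

    def add(self, hash_gid: Hashable, array_gid: Sequence[int], value: float) -> None:
        arr = self.entries.get(hash_gid)
        if arr is None:  # first time this hash group id is seen: allocate an array
            arr = self.entries[hash_gid] = [0] * self.size
        arr[self._offset(array_gid)] += value
```

With a single array column of range 1 to 31, each new hash group identifier allocates one 33-element array, in line with the per-entry arrays of the sales example later in this description.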
[0075]
The following describes how this embodiment handles the case where the intermediate aggregation result does not fit in the reserved area in main memory. In the aggregation server 4, this happens when the hash method or the hash/array combined method is used and adding a new hash entry, together with its array storage area, would exceed the reserved area. Below, the aggregation results accumulated in the aggregation server at the moment they no longer fit in the reserved area are called a partial aggregation result.
[0076]
FIGS. 14 and 16 show a processing method and a flowchart in the case where the intermediate tabulation result in the present embodiment does not fit in the reserved area on the main memory.
[0077]
In this embodiment, when the aggregation result no longer fits in the reserved area in the main memory of the aggregation server 4 (170), the partial aggregation result is sorted in that server's main memory (171) and saved to the auxiliary secondary storage device 17 (150, 172). After the partial aggregation result has been saved, the aggregation result storage area in main memory can be reused.
[0078]
When all records have been aggregated (173), the aggregation server 4 performs an N-way merge of the partial aggregation results saved in the auxiliary secondary storage device 17 (151, 174) and transfers the result to the query processing server 5. Because the partial aggregation results are written and read sequentially, no random I/O to the auxiliary secondary storage device occurs, and the aggregation result is created efficiently.
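The sequential spill-and-merge pattern can be sketched with ordinary files: each sorted partial aggregation result is written front to back, and the N-way merge reads every run front to back with `heapq.merge`, so no random access is needed. Storing runs as CSV with string group identifiers and integer sums is an assumption, and `write_run`, `read_run`, and `n_way_merge` are hypothetical names.

```python
import csv
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple

Item = Tuple[str, int]  # (group identifier, partial SUM)

def write_run(sorted_items: Iterable[Item], path: str) -> None:
    """Save one sorted partial aggregation result with a single sequential write."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(sorted_items)

def read_run(path: str) -> Iterator[Item]:
    """Read a saved run back sequentially."""
    with open(path, newline="") as f:
        for gid, total in csv.reader(f):
            yield gid, int(total)

def n_way_merge(paths: List[str]) -> Iterator[Item]:
    """Merge all runs at once and combine entries that share a group identifier."""
    merged = heapq.merge(*(read_run(p) for p in paths), key=itemgetter(0))
    for gid, items in groupby(merged, key=itemgetter(0)):
        yield gid, sum(total for _, total in items)
```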
[0079]
Alternatively, when the aggregation result no longer fits in the reserved area in main memory, instead of saving the whole partial aggregation result to the auxiliary secondary storage device 17, the part of it that was referenced most recently can be kept in main memory according to the LRU (Least Recently Used) policy, and only the rest saved to the auxiliary secondary storage device. When all records have been aggregated, the partial aggregation result in main memory is combined by N-way merge with the partial aggregation results saved in the auxiliary secondary storage device 17 to create the aggregation result. This method increases the proportion of records aggregated in main memory, which makes aggregation more efficient.
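The LRU variant can be sketched with `collections.OrderedDict`, whose `move_to_end` and `popitem(last=False)` provide least-recently-used ordering. When the table is full, the least recently referenced groups are spilled as a sorted run and the `keep` most recent ones stay in memory. `keep` and the group-count `capacity` are hypothetical tuning parameters, and Python lists stand in for the auxiliary secondary storage device 17.

```python
import heapq
from collections import OrderedDict
from itertools import groupby
from operator import itemgetter
from typing import Hashable, Iterator, List, Tuple

Item = Tuple[Hashable, int]

class LRUSpillingSum:
    """SUM aggregation that keeps recently used groups in memory when full."""

    def __init__(self, capacity: int, keep: int):
        if not 0 <= keep < capacity:
            raise ValueError("keep must be smaller than capacity")
        self.capacity, self.keep = capacity, keep
        self.table: "OrderedDict[Hashable, int]" = OrderedDict()  # oldest first
        self.runs: List[List[Item]] = []  # stand-in for device 17

    def add(self, gid: Hashable, value: int) -> None:
        if gid in self.table:
            self.table[gid] += value
            self.table.move_to_end(gid)  # now the most recently used
            return
        if len(self.table) >= self.capacity:  # reserved area full
            n_cold = len(self.table) - self.keep
            cold = [self.table.popitem(last=False) for _ in range(n_cold)]
            self.runs.append(sorted(cold, key=itemgetter(0)))  # spill LRU groups
        self.table[gid] = value

    def result(self) -> Iterator[Item]:
        runs = self.runs + [sorted(self.table.items(), key=itemgetter(0))]
        merged = heapq.merge(*runs, key=itemgetter(0))
        for gid, items in groupby(merged, key=itemgetter(0)):
            yield gid, sum(v for _, v in items)
```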
[0080]
Alternatively, when the aggregation result no longer fits in the reserved area in main memory, instead of saving the partial aggregation result from main memory to the auxiliary secondary storage device 17, the records that cannot be aggregated in main memory can themselves be saved to the auxiliary secondary storage device. When all records have been processed, the partial aggregation result in main memory and the records saved in the auxiliary secondary storage device 17 are combined by N-way merge to create the aggregation result. This method is efficient when only a small number of records cannot be aggregated in main memory.
[0081]
In addition, in the method above, the proportion of records actually aggregated in main memory, out of all records submitted for aggregation, can be monitored; when it falls below a given value, the partial aggregation result in main memory is saved to the auxiliary secondary storage device. When all records have been aggregated, the partial aggregation result in main memory, the partial aggregation results saved in the auxiliary secondary storage device 17, and the records saved in the auxiliary secondary storage device 17 are combined by N-way merge to create the aggregation result. This method keeps aggregation efficient when the distribution of the records being aggregated is skewed.
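The record-saving variant of [0080], combined with the ratio monitoring of [0081], can be sketched as follows. The monitoring window, the threshold, and the group-count `capacity` are hypothetical parameters, and Python lists stand in for the saved runs and saved records on the auxiliary secondary storage device 17. At the end, one N-way merge combines the saved partial aggregation results, the in-memory table, and the saved individual records.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Dict, Hashable, Iterator, List, Tuple

Item = Tuple[Hashable, int]

class AdaptiveSum:
    """SUM aggregation that saves overflow records and watches the in-memory ratio."""

    def __init__(self, capacity: int, threshold: float = 0.5, window: int = 1000):
        self.capacity, self.threshold, self.window = capacity, threshold, window
        self.table: Dict[Hashable, int] = {}
        self.runs: List[List[Item]] = []  # saved partial aggregation results
        self.records: List[Item] = []     # saved records that could not be aggregated
        self.hits = self.seen = 0

    def add(self, gid: Hashable, value: int) -> None:
        self.seen += 1
        if gid in self.table or len(self.table) < self.capacity:
            self.table[gid] = self.table.get(gid, 0) + value
            self.hits += 1
        else:
            self.records.append((gid, value))  # [0080]: save the record itself
        if self.seen == self.window:           # [0081]: check the ratio per window
            if self.hits / self.seen < self.threshold:
                self.runs.append(sorted(self.table.items(), key=itemgetter(0)))
                self.table = {}                # free the area for new groups
            self.hits = self.seen = 0

    def result(self) -> Iterator[Item]:
        streams = self.runs + [sorted(self.table.items(), key=itemgetter(0)),
                               sorted(self.records, key=itemgetter(0))]
        merged = heapq.merge(*streams, key=itemgetter(0))
        for gid, items in groupby(merged, key=itemgetter(0)):
            yield gid, sum(v for _, v in items)
```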
[0082]
In this embodiment, the storage method for the aggregation result storage area is selected for each group identifier on the basis of its grouping specification. The selection method is described below.
[0083]
When a group identifier range specification (including a group identifier upper limit specification) is given for every group identifier used in a multidimensional classification and aggregation, an upper bound on the size of the aggregation result storage area is computed from these specifications, and it is determined whether that area fits in the reserved area in the main memory of each aggregation server. If it fits, the aggregation result storage area is allocated in main memory as an array. If it does not, some or all of the identifiers are switched to the hash method and the array method is used for the rest.
[0084]
When every group identifier used in a multidimensional classification and aggregation has either a group identifier range specification (including an upper limit specification) or a group identifier type upper limit specification, an upper bound on the number of distinct group identifiers, and hence on the size of the aggregation result storage area, is computed from the specifications, and it is determined whether that area fits in the reserved area in the main memory of each aggregation server. If it fits, the required area is reserved in main memory in advance and storage is taken from it. If it does not, the identifiers with a group identifier type upper limit specification are switched to the hash method, the array method is used for the remaining identifiers, and the aggregation result storage area for the array part is allocated in main memory.
[0085]
In all other cases, the hash method is selected for every group identifier.
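The three selection rules can be sketched as a function over per-column grouping specifications. Several details are assumptions: the tuple encoding of specifications, the byte-based `reserved` limit, the two extra exception slots for range-type columns, the choice (following the SQL example that comes later) to use the hash method for type-limited columns in rule 2 whether or not the bound fits, and in particular the rule 1 heuristic for the non-fitting case (move the widest columns to the hash method until a single array of the remaining columns fits), since the text only says that some or all identifiers are switched.

```python
from math import prod
from typing import Dict, Optional, Tuple

# Hypothetical encoding of a grouping specification per column:
#   ("range", lo, hi)  group identifier range specification
#   ("max", hi)        group identifier upper limit specification (lower limit 0)
#   ("types", n)       group identifier type upper limit specification
#   None               no specification
Spec = Optional[Tuple]

def width(spec: Tuple) -> int:
    """Upper bound on the number of group identifiers (slots) for one column."""
    if spec[0] == "range":
        return spec[2] - spec[1] + 3  # values lo..hi plus two exception groups
    if spec[0] == "max":
        return spec[1] + 3            # values 0..hi plus two exception groups
    return spec[1]                    # "types": at most n distinct values

def select_storage(specs: Dict[str, Spec], record_size: int, reserved: int):
    """Return (method per column, preallocate?, upper bound in bytes or None)."""
    if any(s is None for s in specs.values()):                 # rule 3
        return {c: "hash" for c in specs}, False, None
    bound = record_size * prod(width(s) for s in specs.values())
    fits = bound <= reserved
    if all(s[0] in ("range", "max") for s in specs.values()):  # rule 1
        methods = {c: "array" for c in specs}
        # Assumed heuristic: hash the widest columns until one array fits.
        for c in sorted(specs, key=lambda k: width(specs[k]), reverse=True):
            array_part = prod(width(specs[k]) for k in specs if methods[k] == "array")
            if record_size * array_part <= reserved:
                break
            methods[c] = "hash"
        return methods, fits, bound
    # Rule 2: type-limited columns use the hash method, ranges use arrays.
    methods = {c: ("hash" if s[0] == "types" else "array") for c, s in specs.items()}
    return methods, fits, bound
```

For the sales example that follows, with 44-byte result records and a hypothetical 64 MiB reserved area, this returns the hash method for product code and sales location, the array method for sales date, no preallocation, and an upper bound of 145,200,000 bytes.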
[0086]
As described above, by choosing the aggregation result storage method according to the characteristics of each group identifier given by the grouping specification in the group classification means, a storage scheme can be realized that both determines storage locations quickly and uses storage efficiently.
[0087]
The integration processing means in this embodiment integrates the intermediate aggregation results transferred from the aggregation servers 4 into the final aggregation result on the query processing server 5. In this embodiment, the intermediate aggregation results on the different aggregation servers cover mutually disjoint sets of groups, so no recomputation across intermediate results is needed: the final aggregation result is obtained simply by concatenating them.
[0088]
The features of this embodiment are as follows.
[0089]
In classification and aggregation, an aggregation result storage area determination means that uniquely determines the storage position of a record's aggregation result from the record's values allows classification and aggregation to be executed in parallel, record by record;
When the aggregation result does not fit in the reserved area in main memory, the partial aggregation result is sorted in main memory, saved to the auxiliary secondary storage device, and finally integrated by N-way merge, which avoids issuing random I/O to the auxiliary secondary storage device; and
Specifying, for each group identifier, information about the range of its values or the upper limit on its number of distinct values makes it possible to determine storage locations quickly and to realize a storage scheme with high storage efficiency.
[0090]
In the following, an example of grouping designation in the present embodiment will be described with SQL in mind as a query language for the database.
[0091]
First, the syntax of the group identifier range specification, the group identifier upper limit specification, and the group identifier type upper limit specification is given in BNF.
[0092]
<group identifier range specification> ::= RANGE <column name> [<lower limit>,..,<upper limit>] [{, <column name> [<lower limit>,..,<upper limit>]}...];
<group identifier upper limit specification> ::= MAX <column name> <upper limit> [{, <column name> <upper limit>}...];
<group identifier type upper limit specification> ::= GROUP MAX <column name> <upper limit> [{, <column name> <upper limit>}...];
This notation is only one possible implementation. The grouping specification may be given when the table format is defined, when a query is issued, or through a comment specification.
[0093]
An example of specifying the group classification means in SQL statements, using the syntax above, is shown below.
[0094]
CREATE TABLE sales_results (product_code INT, sales_location STRING(32), sales_date INT, price INT);
GROUP MAX sales_results.product_code 1000, sales_results.sales_location 100;
RANGE sales_results.sales_date [1,..,31];
SELECT product_code, sales_location, sales_date, SUM(price) FROM sales_results GROUP BY product_code, sales_location, sales_date;
The first SQL statement defines a sales results table with product code, sales location, sales date, and price columns. The second specifies 1000 and 100 as the upper limits on the number of distinct product codes and sales locations, respectively. The third states that sales date values range from 1 to 31 inclusive. The fourth is the aggregation request: it groups the records of the sales results table by product code, sales location, and sales date and computes the total price for each group.
[0095]
For the fourth (SELECT) statement, the hash method is selected for product code and sales location, and the array method for sales date. Assuming that the aggregation result record for each group is 44 bytes, the aggregation result storage area needs about 44 bytes × 1000 × 100 × 33 = 145,200,000 bytes (about 145.2 MB), where 33 corresponds to the 31 sales date values plus the two groups for out-of-range values. If this area can be allocated in main memory, it is allocated before the search processing starts, and aggregation proceeds efficiently without dynamic memory allocation. If an area of the required size cannot be allocated up front, a 44 × 33 = 1452-byte array area is allocated dynamically from main memory as needed. When no further allocation is possible, the partial aggregation result is saved to the secondary storage device by the method described above.
[0096]
FIG. 18 shows an example of classification and aggregation in this embodiment, in which the classification and aggregation above is executed with two input/output servers, two aggregation servers, and one query processing server. First, of the records stored separately on the two input/output servers (190), those with product code HT44966 are distributed to aggregation server 1 and those with product code PC15550 to aggregation server 2 (191). Next, each aggregation server groups its records by sales location and sales date and aggregates each group (192). When it has finished aggregating the records distributed to it, each aggregation server transfers its intermediate aggregation result to the query processing server (193), which concatenates the transferred intermediate aggregation results to create the final aggregation result (194).
[0097]
In the embodiment above, the servers may run on the same physical computer or on separate computers connected via a network.
[0098]
Also, in the record distribution means of the input/output server in the embodiment above, instead of determining the destination from the value of the group identifier, records may be distributed so that each aggregation server receives the same number of records, balancing the load. In that case, the query processing node classifies and aggregates the intermediate aggregation results once more to complete the overall classification and aggregation.
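The load-balancing variant can be sketched as follows: round-robin distribution evens out record counts, but a group may now appear on several aggregation servers, so the query processing node re-aggregates instead of concatenating. The helper names are hypothetical, and the sketch uses SUM, which can be re-aggregated directly; an average would need (sum, count) pairs as intermediate results.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Row = Tuple[Hashable, int]  # (group identifier, value to SUM)

def round_robin(rows: Sequence[Row], n_servers: int) -> List[List[Row]]:
    """Equal record counts per aggregation server, ignoring group identifiers."""
    parts: List[List[Row]] = [[] for _ in range(n_servers)]
    for i, row in enumerate(rows):
        parts[i % n_servers].append(row)
    return parts

def local_sum(rows: Iterable[Row]) -> Dict[Hashable, int]:
    """Intermediate aggregation result on one aggregation server."""
    out: Dict[Hashable, int] = defaultdict(int)
    for gid, value in rows:
        out[gid] += value
    return dict(out)

def reaggregate(intermediates: Iterable[Dict[Hashable, int]]) -> Dict[Hashable, int]:
    """Query processing node: a group may appear on several servers, so sum again."""
    final: Dict[Hashable, int] = defaultdict(int)
    for partial in intermediates:
        for gid, value in partial.items():
            final[gid] += value
    return dict(final)
```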
[0099]
In the embodiment above, two or more stages of aggregation servers may also be connected via the network so that classification and aggregation of a large number of records is performed hierarchically.
[0100]
(Example 2)
FIG. 20 shows the configuration of another embodiment of the parallel database processing system according to the present invention.
[0101]
In FIG. 20, reference numeral 1 denotes a secondary storage device that stores a table divided into partial tables 7; 18 denotes an input/output/aggregation server that reads a partial table 7 from the secondary storage device and creates partial aggregation results; 19 denotes an intermediate aggregation server that performs a further classification and aggregation on the partial aggregation results and creates intermediate aggregation results; 5 denotes a query processing server that integrates the aggregation results of the intermediate aggregation servers and creates the final aggregation result; 3 denotes a network over which the nodes exchange records and aggregation results; and 6 denotes a terminal device that issues classification and aggregation requests to the database and retrieves the aggregation results from the query processing server. Any inter-processor network, such as a LAN, a WAN, or dedicated hardware, can be used as the network 3.
[0102]
The configuration and operation of the input/output/aggregation server 18 and the intermediate aggregation server 19 in the second embodiment are described below. Parts of the second embodiment not described below are the same as in the first embodiment.
[0103]
The input/output/aggregation server 18 holds a data buffer 8 that temporarily stores records read from the secondary storage device 1, aggregation processing means 12 that aggregates the records retrieved from the data buffer, a partial aggregation result storage area 25 that stores the aggregation results, record distribution means 9 that determines the destination of each aggregation result record, and a plurality of transmission buffers 10. The data buffer 8, the partial aggregation result storage area 25, and the transmission buffers 10 are allocated in the main memory of the input/output/aggregation server 18. The input/output/aggregation server 18 transfers the partial table 7 held in the secondary storage device 1 to the data buffer 8 in block units. It then retrieves the records from the data buffer 8 one by one and uses the aggregation processing means 12 to build the aggregation result in the partial aggregation result storage area 25. As soon as the aggregation results fill the capacity of that area, the server uses the record distribution means 9 to determine the destination intermediate aggregation server 19 and transfers the contents to it as a partial aggregation result via the transmission buffers 10.
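The behaviour of the input/output/aggregation server can be sketched in Python as follows, under several assumptions not stated in the text: the aggregate is SUM, "capacity" is counted in groups, and the record distribution means is modelled by a hypothetical `route` function that hashes the grouping identifier with `zlib.crc32`, so that every partial result for a given group reaches the same intermediate aggregation server.

```python
from zlib import crc32


def route(group_key, n_intermediate):
    """Hypothetical distribution rule: hash the grouping identifier so that
    every partial result for one group goes to the same intermediate server."""
    return crc32(repr(group_key).encode()) % n_intermediate


def io_aggregation_server(records, capacity, n_intermediate):
    """Aggregate (group_key, value) records with SUM in an area of at most
    `capacity` groups. When a new group arrives and the area is full, the
    whole partial result is split into per-server outboxes and the area is
    reused. Returns the (group_key, subtotal) pairs sent to each server."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    outboxes = [[] for _ in range(n_intermediate)]
    area = {}

    def flush():
        for key, subtotal in area.items():
            outboxes[route(key, n_intermediate)].append((key, subtotal))
        area.clear()

    for key, value in records:
        if key not in area and len(area) >= capacity:
            flush()  # the partial result no longer fits: transfer it
        area[key] = area.get(key, 0) + value
    flush()  # transfer the last partial result at the end of the input
    return outboxes


boxes = io_aggregation_server(
    [("A", 1), ("B", 2), ("A", 3), ("C", 4), ("A", 5)], capacity=2, n_intermediate=2
)
```

In this run group A is transferred twice (subtotals 4 and 5) because the area overflowed between its records; both subtotals land in the same outbox, which is what allows each intermediate aggregation server to finish its groups independently.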
[0104]
The intermediate aggregation server 19 holds a plurality of receive buffers 11, an auxiliary secondary storage device 17 for saving in-progress results of the partial aggregation results transferred from the input/output/aggregation servers, and an aggregation result storage area 13 that provides a work area for classifying and aggregating across partial aggregation results. The partial aggregation results transferred from the input/output/aggregation servers 18 undergo a second classification and aggregation process in the intermediate aggregation server 19. If the aggregation result does not fit in the aggregation result storage area 13 during this process, the in-progress aggregation result is saved to the auxiliary secondary storage device 17. After receiving all partial aggregation results from the input/output/aggregation servers 18, the intermediate aggregation server 19 performs an N-way merge of the partial aggregation results saved in the auxiliary secondary storage device 17 and transfers the merged result to the query processing server 5 as the intermediate aggregation result.
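A minimal Python sketch of the intermediate aggregation server, assuming SUM as the aggregate and a capacity counted in groups. Python lists stand in for sorted files on the auxiliary secondary storage device, and `heapq.merge` plays the role of the N-way merge; neither detail comes from the text.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def intermediate_aggregation_server(partial_results, capacity):
    """Re-aggregate (group_key, subtotal) pairs with SUM.

    When the in-memory area already holds `capacity` groups and a new group
    arrives, the area is sorted by group key and saved as one run (a Python
    list stands in for the auxiliary secondary storage device). At the end,
    the remaining in-memory contents and all runs are combined by an N-way
    merge, which reads every run sequentially.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    area, runs = {}, []
    for key, subtotal in partial_results:
        if key not in area and len(area) >= capacity:
            runs.append(sorted(area.items()))  # save one sorted run
            area.clear()
        area[key] = area.get(key, 0) + subtotal
    runs.append(sorted(area.items()))
    merged = heapq.merge(*runs, key=itemgetter(0))
    # After the merge, entries for one group are adjacent; add them up.
    return [(key, sum(v for _, v in group)) for key, group in groupby(merged, key=itemgetter(0))]


received = [("B", 2), ("A", 4), ("C", 4), ("A", 5), ("B", 1)]
print(intermediate_aggregation_server(received, capacity=2))
# [('A', 9), ('B', 3), ('C', 4)]
```

Because each saved run is already sorted, the merge only ever reads runs front to back, which is the property the text relies on to avoid random I/O.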
[0105]
FIG. 21 outlines the processing flow of the entire classification and aggregation process in this embodiment. First, the terminal device 6 sends a request to execute the aggregation process to the query processing server 5 (2600) and waits for a response. On receiving the request, the query processing server 5 creates an execution plan for the requested aggregation process (2601) and, according to that plan, starts the aggregation processing programs on the input/output/aggregation servers 18 and the intermediate aggregation servers 19 (2602).
[0106]
The query processing server 5 initializes the receive buffer 14, the final aggregation result storage area 16, and so on (2603), and then waits for a response from the intermediate aggregation servers 19. The input/output/aggregation servers 18 and the intermediate aggregation servers 19 whose aggregation processing programs were started by the query processing server 5 follow its instructions and initialize the data buffer 8, the transmission buffers 10, the receive buffers 11, the aggregation result storage area 13, the partial aggregation result storage area 25, and so on (2604, 2605).
[0107]
After initialization, the input/output/aggregation server 18 aggregates each record read from the secondary storage device 1 into the data buffer 8 using the aggregation processing means 12 (2606). As soon as the aggregation results fill the partial aggregation result storage area 25, it uses the distribution means 9 to determine, for each aggregation result record, the intermediate aggregation server to which it is to be transferred, and transfers the record pages accordingly (2623). When all record pages have been distributed, the input/output/aggregation server 18 sends an end notification to each intermediate aggregation server (2607) and terminates (2608).
[0108]
Meanwhile, each intermediate aggregation server 19 classifies and aggregates the transferred partial aggregation results as soon as it receives record pages from the input/output/aggregation servers 18. If the aggregation result does not fit in the storage area, the in-progress aggregation result is sorted by grouping identifier value and then saved to the auxiliary secondary storage device (2609). When an intermediate aggregation server has received end notifications from all input/output/aggregation servers (2610), it checks whether any partial aggregation results were saved to the auxiliary secondary storage device 17 (2611) and, if so, performs a merge sort over them (2612).
[0109]
Each intermediate aggregation server transfers the intermediate aggregation result created by the above procedure to the query processing server in block units (2613). Once all of its intermediate aggregation results have been transferred, the intermediate aggregation server sends an end notification to the query processing server (2614) and terminates (2615). The query processing server 5 stores the received intermediate aggregation results in the final aggregation result storage area 16 in sequence and transfers the final aggregation result to the terminal device 6 on request (2616). When the query processing server has received end notifications from all intermediate aggregation servers (2617), it sends an end notification to the terminal device (2618) and terminates (2619). The terminal device 6 requests the final aggregation result from the query processing server 5 as needed and receives it (2620). The terminal device 6 ends its processing when it receives the end notification from the query processing server 5 (2621) and confirms that the aggregation process has finished (2622).
[0110]
In this embodiment, the integration of the partial aggregation results is divided among several intermediate aggregation servers, each working independently, so processing remains efficient even when the aggregation result contains a large number of groups.
[0111]
The processing of the input/output/aggregation server and the intermediate aggregation server is described below in more detail with reference to their flowcharts.
[0112]
FIG. 22 shows a flowchart of the classification and aggregation process in the input/output/aggregation server. The input/output/aggregation server 18 first initializes the data buffer 8, the partial aggregation result storage area 25, and the transmission buffers 10 as instructed by the query processing server 5 (2700). Next, it transfers the records stored in the secondary storage device 1 to the data buffer 8 in block units (2701). It then retrieves the records from the data buffer 8 one by one (2702) and determines the group identifier under which each record is classified (2703). It performs the aggregation with reference to the record's values (2704), uses the group identifier to determine the storage location in the partial aggregation result storage area, and writes the aggregation result there (2705). If the aggregation result does not fit in the storage area (2706), the in-progress partial aggregation result is transferred to the intermediate aggregation servers (2707). When aggregation is complete for all records in all blocks read (2708, 2709), the last partial aggregation result is transferred (2710), an end notification is sent to the intermediate aggregation servers (2711), and processing ends (2712).
[0113]
FIG. 23 shows a flowchart of the classification and aggregation process in the intermediate aggregation server. The intermediate aggregation server 19 first initializes the aggregation result storage area 13 and the receive buffers 11 as instructed by the query processing server 5 (2800). It then takes the aggregation results one at a time (2803) from the partial aggregation results transferred into the receive buffer 11 by the input/output/aggregation servers (2801), and classifies each by the value of its grouping identifier (2804).
[0114]
Next, the aggregate value of each classified aggregation result is recomputed (2805), and the result is written to the aggregation result storage area 13 (2806). If the aggregation result does not fit in that area (2807), it is sorted by grouping identifier value and saved to the auxiliary secondary storage device (2808). Each intermediate aggregation server repeats this processing for every aggregation result it receives (2809). When it has received end notifications from all input/output/aggregation servers (2810), and if aggregation results were saved to the auxiliary secondary storage device (2811), the intermediate aggregation server merges the saved aggregation results to create the intermediate aggregation result (2812). It then transfers the intermediate aggregation result to the query processing server (2813), sends an end notification (2814), and terminates (2815).
[0115]
The following describes how this embodiment handles the case where the aggregation result does not fit in the reserved area of main memory in the input/output/aggregation server 18. This situation arises when the hash method, or the combined hash/array method, is used as the aggregation result storage format and adding a new hash entry or array storage area would exceed the reserved area. In what follows, the in-progress aggregation result accumulated on the input/output/aggregation server at the moment it no longer fits in the reserved area is called a partial aggregation result.
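The claims recite one way of handling this overflow: keep the most recently referenced aggregation results in memory, and save the rest, sorted by grouping identifier, to auxiliary secondary storage. The Python sketch below models that policy under stated assumptions: SUM as the aggregate, capacity counted in groups, a hypothetical `keep` parameter for how many recent groups to retain, and a list of sorted runs standing in for the auxiliary secondary storage device. An `OrderedDict` tracks recency.

```python
from collections import OrderedDict


class BoundedAggregationArea:
    """SUM aggregation area holding at most `capacity` groups (an assumed
    stand-in for the reserved main-memory area).

    A new group that would exceed the capacity triggers a spill: the `keep`
    most recently referenced groups stay in memory, and the rest are sorted
    by group key and appended to `runs` as one sorted run.
    """

    def __init__(self, capacity, keep):
        if not 0 <= keep < capacity:
            raise ValueError("need 0 <= keep < capacity")
        self.capacity, self.keep = capacity, keep
        self.area = OrderedDict()  # ordered from least to most recently referenced
        self.runs = []             # stand-in for auxiliary secondary storage

    def add(self, key, value):
        if key not in self.area and len(self.area) >= self.capacity:
            self._spill()
        self.area[key] = self.area.get(key, 0) + value
        self.area.move_to_end(key)  # mark as most recently referenced

    def _spill(self):
        n_evict = len(self.area) - self.keep
        evicted = [self.area.popitem(last=False) for _ in range(n_evict)]
        self.runs.append(sorted(evicted))


area = BoundedAggregationArea(capacity=3, keep=1)
for key, value in [("A", 1), ("B", 2), ("C", 3), ("A", 4), ("D", 5)]:
    area.add(key, value)
print(area.runs)        # [[('B', 2), ('C', 3)]]
print(dict(area.area))  # {'A': 5, 'D': 5}
```

Retaining recently referenced groups is intended to keep frequently updated groups in memory; at the end of aggregation, the in-memory contents and the saved runs would still be combined by an N-way merge.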
[0116]
FIGS. 15 and 17 show, respectively, the processing method and its flowchart for the case in this embodiment where the aggregation result does not fit in the reserved area of main memory.
[0117]
In this embodiment, when the aggregation result does not fit in the reserved area of main memory in the input/output/aggregation server 18, the aggregation results in the partial aggregation result are classified so that all results for the same group go to the same intermediate aggregation server 19, and are transferred (160). Once the partial aggregation result has been sent out, the aggregation result storage area in main memory can be reused. The intermediate aggregation server sorts the partial aggregation results in main memory and saves them to the auxiliary secondary storage device 17 (161). When all aggregation results have been transferred (180), the intermediate aggregation server 19 performs an N-way merge of the partial aggregation results saved in the auxiliary secondary storage device (162, 181) and transfers the result to the query processing server 5. Because this method causes no random I/O to the auxiliary secondary storage device, the aggregation result can be created efficiently.
[0118]
Alternatively, in this embodiment, when the aggregation result does not fit in the reserved area of main memory in the input/output/aggregation server, efficient aggregation is also possible with the techniques used by the aggregation processing server in the first embodiment: saving part of the partial aggregation result to the auxiliary secondary storage device, saving only the overflowing records there, or saving the partial aggregation result there depending on the rate of successful aggregation in main memory.
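The last of these options, switching from saving overflow records to saving the whole in-progress result once the in-memory success rate drops, can be sketched in Python as below. The threshold value, the use of a running ratio that restarts after each spill, and SUM as the aggregate are all assumptions made for illustration; the text only says the switch happens when the ratio falls below a certain value.

```python
class SuccessRateAggregator:
    """SUM aggregation with a success-rate-based overflow policy (a sketch;
    `threshold` and the restarting running ratio are assumptions).

    While the area is full, records for groups not already in memory are
    saved unaggregated to `overflow`. If the fraction of records aggregated
    in memory drops below `threshold`, the in-progress results are sorted by
    group key, saved as a run, and the area is cleared for reuse.
    """

    def __init__(self, capacity, threshold):
        if capacity < 1 or not 0 < threshold <= 1:
            raise ValueError("need capacity >= 1 and 0 < threshold <= 1")
        self.capacity, self.threshold = capacity, threshold
        self.area = {}
        self.overflow = []  # raw records saved to auxiliary storage
        self.runs = []      # sorted intermediate results saved to storage
        self.seen = self.hits = 0

    def add(self, key, value):
        self.seen += 1
        if key in self.area or len(self.area) < self.capacity:
            self.area[key] = self.area.get(key, 0) + value
            self.hits += 1
        else:
            self.overflow.append((key, value))
        if self.hits / self.seen < self.threshold:
            self.runs.append(sorted(self.area.items()))
            self.area.clear()
            self.seen = self.hits = 0  # start measuring afresh after the spill


agg = SuccessRateAggregator(capacity=2, threshold=0.6)
for key, value in [("A", 1), ("B", 2), ("C", 3), ("D", 4), ("A", 5)]:
    agg.add(key, value)
print(agg.runs)      # [[('A', 1), ('B', 2)]]
print(agg.overflow)  # [('C', 3), ('D', 4)]
print(agg.area)      # {'A': 5}
```

At the end of aggregation, the in-memory area, the saved overflow records, and the sorted runs would all have to be merged to obtain the final per-group totals.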
[0119]
FIG. 19 shows an example of the classification and aggregation process in this embodiment. Here, the classification and aggregation described above is executed with two input/output/aggregation servers, two intermediate aggregation servers, and one query processing server. First, each input/output/aggregation server aggregates the records stored on it (200); the aggregation results for records with product code HT44966 are distributed to intermediate aggregation server 1, and those for records with product code PC15550 to intermediate aggregation server 2 (201). Next, each intermediate aggregation server reclassifies and re-aggregates the partial aggregation results, combining results that belong to the same group (202). When each intermediate aggregation server has finished classifying and aggregating all partial aggregation results distributed to it, it transfers the merged intermediate aggregation result to the query processing server (203). The query processing server combines the transferred intermediate aggregation results to create the final aggregation result (204).
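The flow of this example can be traced with a small Python sketch. Only the two product codes and the routing of each code to one intermediate aggregation server come from the example; the grouping is simplified to product code alone and the prices are hypothetical values chosen for illustration, not data from FIG. 19.

```python
from collections import Counter

# Routing rule from the example: each product code goes to one intermediate server.
ROUTE = {"HT44966": 1, "PC15550": 2}

# Hypothetical (product_code, price) records held by two I/O/aggregation servers.
IO_SERVERS = {
    1: [("HT44966", 100), ("PC15550", 250), ("HT44966", 120)],
    2: [("PC15550", 300), ("HT44966", 80)],
}


def run_example(io_servers, route):
    # (200) Each input/output/aggregation server totals its own records.
    partials = {server: Counter() for server in io_servers}
    for server, records in io_servers.items():
        for code, price in records:
            partials[server][code] += price
    # (201) Partial results are distributed by product code, and (202) each
    # intermediate aggregation server re-aggregates what it receives.
    intermediate = {s: Counter() for s in sorted(set(route.values()))}
    for partial in partials.values():
        for code, subtotal in partial.items():
            intermediate[route[code]][code] += subtotal
    # (203)-(204) Groups are disjoint across intermediate servers, so the
    # query processing server only has to combine their results.
    final = {}
    for result in intermediate.values():
        final.update(result)
    return final


print(run_example(IO_SERVERS, ROUTE))  # {'HT44966': 300, 'PC15550': 550}
```

Since each product code is routed to exactly one intermediate aggregation server, the query processing server never needs to add two results for the same group; it only concatenates them.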
[0120]
Note that in the above embodiment, the servers may run on the same physical computer or on separate computers connected via a network.
[0121]
Also, instead of choosing the distribution destination in the classifying means of the input/output/aggregation server according to the value of the grouping identifier, results may be distributed so that each intermediate aggregation server receives the same number of records, thereby balancing the load. In that case, the query processing node performs the classification and aggregation processing again on the intermediate aggregation results to complete the overall classification and aggregation processing.
[0122]
Further, in the above embodiment, the aggregation servers may be connected via the network in two or more stages, so that the classification and aggregation of a large number of records is performed hierarchically.
[0123]
According to the present invention, whichever of the two embodiments is more efficient can be adopted according to the characteristics of the data to be classified and aggregated. For example, when enough aggregation processing servers are available, adopting the first embodiment and distributing the records stored in the secondary storage devices across as many aggregation processing servers as possible allows the classification and aggregation process to finish in a short time. When the data volume of the aggregation result is much smaller than that of the records being aggregated, adopting the second embodiment reduces the amount of data transferred over the network.
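The network-volume argument can be made concrete with a rough Python count of items sent from the input/output side. It assumes, for simplicity, that the partial aggregation result area in the second embodiment never overflows, and it counts items rather than bytes; the function name is hypothetical.

```python
def transfer_counts(partitions):
    """Items sent over the network from the I/O side, per embodiment.

    partitions: one list of group keys per input/output(/aggregation) server.
    First embodiment: every record is shipped to an aggregation server.
    Second embodiment: one partial result per distinct group per server,
    assuming the partial aggregation result area never overflows.
    """
    records_sent = sum(len(p) for p in partitions)
    partials_sent = sum(len(set(p)) for p in partitions)
    return records_sent, partials_sent


print(transfer_counts([["A", "A", "B", "A"], ["B", "B", "C"]]))  # (7, 4)
```

The fewer distinct groups each server holds relative to its record count, the larger the saving of the second embodiment; when almost every record forms its own group, the two counts converge.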
[0124]
【Effects of the Invention】
In the parallel database processing system according to the present invention, classification and aggregation are executed in parallel on a per-record basis, and random I/O is avoided when the aggregation result does not fit in the reserved area of main memory, so classification and aggregation processing can be performed efficiently.
[Brief description of the drawings]
FIG. 1 is a block diagram of a first embodiment of a parallel database processing system according to the present invention.
FIG. 2 is a diagram showing a logical structure of a database.
FIG. 3 is a diagram illustrating an outline of a classification and aggregation process.
FIG. 4 is a diagram showing a flowchart of a merge sort method.
FIG. 5 is a diagram showing a flowchart of a simple hash method.
FIG. 6 is a diagram illustrating an outline of a processing flow of the entire classification and aggregation processing in the first embodiment.
FIG. 7 is a diagram illustrating a flowchart of processing of the input/output server according to the first embodiment.
FIG. 8 is a diagram illustrating a flowchart of processing of an aggregation processing server according to the first embodiment.
FIG. 9 is a diagram illustrating a flowchart of processing of a query processing server.
FIG. 10 is a diagram illustrating a flowchart of processing of a terminal device.
FIG. 11 is a diagram showing an aggregation result storage format based on an array method.
FIG. 12 is a diagram showing an aggregation result storage format based on a hash method.
FIG. 13 is a diagram showing an aggregation result storage format based on a combined hash/array method.
FIG. 14 is a diagram illustrating a countermeasure in a case where the aggregation result does not fit in the reserved area in the first embodiment.
FIG. 15 is a diagram illustrating a countermeasure in a case where the aggregation result does not fit in the reserved area in the second embodiment.
FIG. 16 is a diagram illustrating a flowchart of a process when an aggregation result does not fit in a reserved area in the first embodiment.
FIG. 17 is a diagram illustrating a flowchart of a process when an aggregation result does not fit in a reserved area in the second embodiment.
FIG. 18 is a diagram illustrating an example of the classification and aggregation process in the first embodiment.
FIG. 19 is a diagram illustrating an example of the classification and aggregation process in the second embodiment.
FIG. 20 is a block diagram of a parallel database processing system according to a second embodiment of the present invention.
FIG. 21 is a diagram illustrating an outline of a processing flow of the entire classification and aggregation processing according to the second embodiment.
FIG. 22 is a diagram illustrating a flowchart of processing of the input/output/aggregation server in the second embodiment.
FIG. 23 is a diagram illustrating a flowchart of processing of an intermediate aggregation server in the second embodiment.
[Explanation of symbols]
1: secondary storage device, 2: input/output server, 3: network, 4: aggregation processing server, 5: query processing server, 6: terminal device, 7: partial table, 8: data buffer, 9: record distribution means, 10: transmission buffer, 11: receive buffer, 12: aggregation processing means, 13: aggregation result storage area, 14: receive buffer, 15: aggregation result integration means, 16: final aggregation result storage area, 17: auxiliary secondary storage device, 18: input/output/aggregation server, 19: intermediate aggregation node.

Claims

A classification and aggregation processing method for a database processing system that comprises a query processing server for processing query requests to a database, an input/output server for reading records of the database, and an aggregation processing server for aggregating records, and that designates one or more columns of a record as grouping columns, associates a group identifier with each grouping column value, classifies records having the same group identifier value into one group, designates one or more columns of the record as aggregation target columns, and performs aggregation on the records classified into each group, wherein:
the input/output server has means for transferring each record, according to the value of its grouping columns, to the aggregation processing server that aggregates that record;
the aggregation processing server has means for allocating an aggregation result storage area in its main memory, means for generating a group identifier from the grouping column values of a record received from the input/output server, classifying means for uniquely determining from the group identifier value the storage location of the corresponding aggregation result, aggregating means for updating the aggregation result stored at that location based on the values of the record's aggregation target columns, and means for transferring the intermediate aggregation result to the query processing server;
the query processing server has means for integrating the intermediate aggregation results received from the aggregation processing servers;
when the aggregation results do not fit in the aggregation result storage area allocated in the main memory of an aggregation processing server, the most recently referenced portion of the in-progress aggregation results is retained, and the remaining aggregation results are sorted in main memory by the values of their corresponding grouping identifiers and saved as an intermediate aggregation result to the auxiliary secondary storage device provided in that aggregation processing server; and
at the end of the aggregation process, the aggregation results in main memory are merged with the zero or more sorted intermediate aggregation results saved in the auxiliary secondary storage device, thereby completing the aggregation of the records distributed to each aggregation processing server.

A classification and aggregation processing method for a database processing system that comprises a query processing server for processing query requests to a database, an input/output server for reading records of the database, and an aggregation processing server for aggregating records, and that designates one or more columns of a record as grouping columns, associates a group identifier with each grouping column value, classifies records having the same group identifier value into one group, designates one or more columns of the record as aggregation target columns, and performs aggregation on the records classified into each group, wherein:
the input/output server has means for transferring each record, according to the value of its grouping columns, to the aggregation processing server that aggregates that record;
the aggregation processing server has means for allocating an aggregation result storage area in its main memory, means for generating a group identifier from the grouping column values of a record received from the input/output server, classifying means for uniquely determining from the group identifier value the storage location of the corresponding aggregation result, aggregating means for updating the aggregation result stored at that location based on the values of the record's aggregation target columns, and means for transferring the intermediate aggregation result to the query processing server;
the query processing server has means for integrating the intermediate aggregation results received from the aggregation processing servers;
when the aggregation results do not fit in the aggregation result storage area allocated in main memory, the aggregation processing server saves the records that could not be aggregated to its auxiliary secondary storage device, and when the proportion of records that can be aggregated in main memory falls below a certain value, it sorts the in-progress aggregation results in main memory by the values of their corresponding grouping identifiers and saves them as an intermediate aggregation result to its auxiliary secondary storage device; and
at the end of the aggregation process, the aggregation results in main memory, the aggregation results of the zero or more records saved in the auxiliary secondary storage device, and the zero or more sorted intermediate aggregation results saved in the auxiliary secondary storage device are merged, thereby completing the aggregation of the records distributed to each aggregation processing server.

A classification and aggregation processing method for a database processing system that comprises a query processing server for processing query requests to a database, an input/output/aggregation server for reading and aggregating records of the database, and an intermediate aggregation server for aggregating intermediate aggregation results, and that designates one or more columns of a record as grouping columns, associates a group identifier with each grouping column value, classifies records having the same group identifier value into one group, designates one or more columns of the record as aggregation target columns, and performs aggregation on the records classified into each group, wherein:
the input/output/aggregation server has means for allocating an aggregation result storage area in its main memory, means for generating a group identifier from the grouping column values of each record, classifying means for uniquely determining from the group identifier value the storage location of the corresponding aggregation result, aggregating means for updating the aggregation result stored in that storage area based on the values of the record's aggregation target columns, and means for transferring the partial aggregation result, according to the value of the grouping identifier corresponding to each aggregation result, to the intermediate aggregation server that integrates the partial aggregation results;
the intermediate aggregation server has means for integrating the partial aggregation results received from the input/output/aggregation servers to generate an intermediate aggregation result and for transferring it to the query processing server;
the query processing server has means for integrating the intermediate aggregation results received from the intermediate aggregation servers;
when the aggregation results do not fit in the aggregation result storage area allocated in main memory, the input/output/aggregation server retains the most recently referenced portion of the in-progress aggregation results, sorts the remaining aggregation results in main memory by the values of their corresponding grouping identifiers, and saves them as an intermediate aggregation result to its auxiliary secondary storage device; and
at the end of the aggregation process, the aggregation results in main memory are merged with the zero or more sorted intermediate aggregation results saved in the auxiliary secondary storage device, thereby completing the aggregation of the records distributed to each input/output/aggregation server.

A classification and aggregation processing method for a database processing system that comprises a query processing server for processing query requests to a database, an input/output/aggregation server for reading and aggregating records of the database, and an intermediate aggregation server for aggregating intermediate aggregation results, and that designates one or more columns of a record as grouping columns, associates a group identifier with each grouping column value, classifies records having the same group identifier value into one group, designates one or more columns of the record as aggregation target columns, and performs aggregation on the records classified into each group, wherein:
the input/output/aggregation server has means for allocating an aggregation result storage area in its main memory, means for generating a group identifier from the grouping column values of each record, classifying means for uniquely determining from the group identifier value the storage location of the corresponding aggregation result, aggregating means for updating the aggregation result stored in that storage area based on the values of the record's aggregation target columns, and means for transferring the partial aggregation result, according to the value of the grouping identifier corresponding to each aggregation result, to the intermediate aggregation server that integrates the partial aggregation results;
the intermediate aggregation server has means for integrating the partial aggregation results received from the input/output/aggregation servers to generate an intermediate aggregation result and for transferring it to the query processing server;
the query processing server has means for integrating the intermediate aggregation results received from the intermediate aggregation servers;
when the aggregation results do not fit in the aggregation result storage area allocated in main memory, the input/output/aggregation server saves the records that could not be aggregated to the auxiliary secondary storage device provided in it, and when the proportion of records that can be aggregated in main memory falls below a certain value, it sorts the in-progress aggregation results in main memory by the values of their corresponding grouping identifiers and saves them as an intermediate aggregation result to its auxiliary secondary storage device; and
at the end of the aggregation process, the aggregation results in main memory, the aggregation results of the zero or more records saved in the auxiliary secondary storage device, and the zero or more sorted intermediate aggregation results saved in the auxiliary secondary storage device are merged, thereby completing the aggregation of the records distributed to each input/output/aggregation server.

A query processing server for processing a query processing request for the database, an input / output / total server for reading and counting records of the database, and an intermediate counting server for counting the results of the intermediate counting; The above columns are designated as grouping columns, a group identifier is associated with the grouping column value, a plurality of records having the same group identifier value are classified into one group, and one of the records A database processing system that specifies the above columns as columns to be totaled and performs a totaling process for each record classified into each group,
The input / output / aggregation server has means for allocating an aggregation result storage area on a main memory provided in the input / output / aggregation server; means for generating a group identifier from a value of a grouping column of each record; A classifying means for uniquely determining the storage location of the aggregation result corresponding to the group identifier, a counting means for updating the aggregation result stored in the storage area based on the value of the aggregation target column of the record, Means for transferring to the intermediate aggregation server that performs the integration processing of the partial aggregation result according to the value of the grouping identifier corresponding to the aggregation result,
The intermediate aggregation server has means for integrating the partial aggregation results received from the input/output/aggregation servers to generate an intermediate aggregation result and for transferring the intermediate aggregation result to the query processing server,
The query processing server has means for integrating the intermediate aggregation results received from the intermediate aggregation servers; in a classification and aggregation processing method for such a database processing system,
The input/output/aggregation server, when the aggregation results no longer fit in the aggregation result storage area allocated in main memory, transfers the in-progress aggregation results in main memory to the intermediate aggregation servers as partial aggregation results, in accordance with a correspondence rule provided in the input/output/aggregation server,
The intermediate aggregation server, in classifying and aggregating the transferred partial aggregation results, when the aggregation results no longer fit in the aggregation result storage area allocated in the main memory of the intermediate aggregation server, retains the portion of the in-progress aggregation results that was most recently referenced, sorts the remaining aggregation results in main memory by the value of the grouping identifier corresponding to each result, and saves them to an auxiliary secondary storage device as a sorted intermediate aggregation result,
At the end of the aggregation process, the intermediate aggregation server merges the aggregation results in main memory with the zero or more sorted intermediate aggregation results saved in the auxiliary secondary storage device, thereby performing aggregation processing on the partial aggregation results distributed to each intermediate aggregation server; the classification and aggregation processing method being characterized by the foregoing.
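The three-tier flow of the claim above, with the intermediate aggregation server keeping its most recently referenced results in memory, might look roughly like the following sketch. All of these are assumptions for illustration, not details from the patent: integer group identifiers, SUM as the aggregate, `gid % n_servers` as the correspondence rule (the hypothetical `route` function), Python lists in place of network transfer and secondary storage, and a hypothetical `keep` parameter for how many recently referenced groups stay in memory when the area fills. Recency is tracked with `collections.OrderedDict`, so evicting the least recently referenced entries is a sequence of `popitem(last=False)` calls.

```python
import heapq
from collections import OrderedDict
from itertools import groupby
from operator import itemgetter


def route(gid, n_servers):
    # Hypothetical correspondence rule: group id modulo server count.
    return gid % n_servers


def io_server(records, capacity, outboxes):
    """Aggregate (gid, value) records; ship partial sums when the area fills."""
    table = {}

    def flush():
        for gid, partial in table.items():
            outboxes[route(gid, len(outboxes))].append((gid, partial))
        table.clear()

    for gid, value in records:
        if gid not in table and len(table) >= capacity:
            flush()  # in-progress results become partial aggregation results
        table[gid] = table.get(gid, 0) + value
    flush()


def intermediate_server(partials, capacity, keep):
    """Aggregate partial sums, keeping the `keep` most recently referenced
    groups when full and spilling the rest as a sorted run."""
    assert 0 <= keep < capacity
    table = OrderedDict()  # order = recency of reference (oldest first)
    runs = []              # stands in for sorted runs on secondary storage
    for gid, partial in partials:
        if gid in table:
            table[gid] += partial
            table.move_to_end(gid)  # mark as most recently referenced
            continue
        if len(table) >= capacity:
            victims = [table.popitem(last=False)
                       for _ in range(len(table) - keep)]
            runs.append(sorted(victims))  # sort by group id, then spill
        table[gid] = partial
    merged = heapq.merge(*runs, sorted(table.items()), key=itemgetter(0))
    return [(gid, sum(p for _, p in grp))
            for gid, grp in groupby(merged, key=itemgetter(0))]


def query_server(results_per_intermediate):
    # Each group id was routed to exactly one intermediate server,
    # so integrating their results is a simple union.
    return sorted(r for res in results_per_intermediate for r in res)
```

With two I/O servers fed `[(1, 10), (2, 5), (3, 1)]` and `[(2, 4), (4, 2), (1, 1)]` (each with `capacity=2`) and two intermediate servers with `capacity=2, keep=1`, `query_server` returns `[(1, 11), (2, 9), (3, 1), (4, 2)]`. Because the routing rule sends each group identifier to a single intermediate server, no group is split across servers at the final stage.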

A database processing system comprising a query processing server that processes query requests against the database, input/output/aggregation servers that read and aggregate records of the database, and intermediate aggregation servers that aggregate intermediate results, in which one or more columns of the records are designated as grouping columns, a group identifier is associated with each grouping column value, records having the same group identifier value are classified into one group, one or more columns of the records are designated as columns to be aggregated, and aggregation processing is performed for the records classified into each group,
The input/output/aggregation server has means for allocating an aggregation result storage area in the main memory provided in the input/output/aggregation server; means for generating a group identifier from the values of the grouping columns of each record; classification means for uniquely determining, from the group identifier, the storage location of the corresponding aggregation result; aggregation means for updating the aggregation result stored in the storage area based on the value of the record's aggregation target column; and means for transferring partial aggregation results, according to the value of the grouping identifier corresponding to each result, to the intermediate aggregation server that integrates them,
The intermediate aggregation server has means for integrating the partial aggregation results received from the input/output/aggregation servers to generate an intermediate aggregation result and for transferring the intermediate aggregation result to the query processing server,
The query processing server has means for integrating the intermediate aggregation results received from the intermediate aggregation servers; in a classification and aggregation processing method for such a database processing system,
The input/output/aggregation server, when the aggregation results no longer fit in the aggregation result storage area allocated in main memory, transfers the in-progress aggregation results in main memory to the intermediate aggregation servers as partial aggregation results, in accordance with a correspondence rule provided in the input/output/aggregation server,
The intermediate aggregation server, in classifying and aggregating the transferred partial aggregation results, when the aggregation results no longer fit in the aggregation result storage area allocated in the main memory of the intermediate aggregation server, saves the overflowing records that could not be aggregated to an auxiliary secondary storage device provided in the intermediate aggregation server, and, when the proportion of records that can be aggregated in main memory falls below a certain value, sorts the in-progress aggregation results in main memory by the value of the grouping identifier corresponding to each result and saves them to the auxiliary secondary storage device of the intermediate aggregation server as a sorted intermediate aggregation result,
At the end of the aggregation process, the intermediate aggregation server merges the aggregation results in main memory, the aggregation results of the zero or more overflow records saved in the auxiliary secondary storage device, and the zero or more sorted intermediate aggregation results saved in the auxiliary secondary storage device, thereby performing aggregation processing on the partial aggregation results distributed to each intermediate aggregation server; the classification and aggregation processing method being characterized by the foregoing.
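This claim applies the overflow-and-ratio policy at the intermediate aggregation server, where the inputs are partial aggregation results rather than raw records. A sketch under assumptions, with all names hypothetical: each partial result is a `(sum, count)` pair, which lets the query processing server derive an average as well as a total; the ratio is measured over a fixed window of incoming partial results; and Python lists stand in for the auxiliary secondary storage device. Compared with record-level aggregation, the only real change is that values are combined with a pairwise `combine` function instead of `+`, which is what makes partial results safe to merge in any order.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def combine(a, b):
    # Partial aggregates are (sum, count) pairs; both fields simply add.
    return (a[0] + b[0], a[1] + b[1])


def intermediate_server(partials, capacity, min_ratio, window):
    """Aggregate (gid, (sum, count)) partials with overflow and sorted spills."""
    table, overflow, runs = {}, [], []  # lists stand in for secondary storage
    hits = seen = 0
    for gid, agg in partials:
        seen += 1
        if gid in table:
            table[gid] = combine(table[gid], agg)
            hits += 1
        elif len(table) < capacity:
            table[gid] = agg
            hits += 1
        else:
            overflow.append((gid, agg))  # save the partial result as-is
        if seen == window:
            if hits / seen < min_ratio:
                runs.append(sorted(table.items()))  # sorted spill by group id
                table.clear()
            hits = seen = 0
    streams = runs + [sorted(table.items()), sorted(overflow)]
    merged = heapq.merge(*streams, key=itemgetter(0))
    out = []
    for gid, grp in groupby(merged, key=itemgetter(0)):
        total = (0, 0)
        for _, agg in grp:
            total = combine(total, agg)
        out.append((gid, total))
    return out


def query_server(results_per_intermediate):
    # Union of per-server results, then derive SUM and AVG for each group.
    rows = sorted(r for res in results_per_intermediate for r in res)
    return [(gid, s, s / c) for gid, (s, c) in rows]
```

For instance, `intermediate_server([(1, (10, 2)), (2, (6, 3)), (1, (2, 2))], capacity=1, min_ratio=0.9, window=2)` spills group 1 once, overflows the partial for group 2, and returns `[(1, (12, 4)), (2, (6, 3))]`; passing that to `query_server` gives `[(1, 12, 3.0), (2, 6, 2.0)]`.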