JP2015106219A

JP2015106219A - Distributed data virtualization system, query processing method and query processing program

Info

Publication number: JP2015106219A
Application number: JP2013246938A
Authority: JP
Inventors: 和広斉藤; Kazuhiro Saito
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2013-11-29
Filing date: 2013-11-29
Publication date: 2015-06-08
Anticipated expiration: 2033-11-29
Also published as: JP6262505B2

Abstract

PROBLEM TO BE SOLVED: To provide a distributed data virtualization system for performing parallel processing while distributing large-scale data over a plurality of data virtualization devices. SOLUTION: Each of a plurality of data virtualization devices 3 includes: a distribution processing scheduling part 34 for creating a distribution processing schedule, which includes the selection of a plurality of data virtualization devices to implement query processing in parallel and the query processing contents in those data virtualization devices, while considering a query plan defining procedures of query processing and a resource quantity; a processing schedule arrangement part 35 for distributing the distribution processing schedule to the data virtualization device group that performs the distribution processing; a data distribution part 37 for distributing processing object data acquired from each database system to the other data virtualization devices, or receiving the data in its own data virtualization device, in units of records in accordance with the distribution processing schedule; a query processing part 38 for performing processing according to the distribution processing schedule; and a data compilation part 39 for receiving a result of the processing from each query processing part and transmitting a result of compilation to a client.

Description

The present invention relates to a data virtualization system that provides a logical model for one or more database systems, and in particular to a distributed data virtualization system, a query processing method, and a query processing program that distribute query processing across a plurality of data virtualization apparatuses for parallel processing.

A data virtualization system (or multi-database system) makes a plurality of database systems appear as a single virtual database system. To do so, it logically aggregates and manages the tables held by each database system in one system, and submits queries to the database systems that correspond to the user's query.
A typical data virtualization system, as shown for example in Patent Document 1, integrates a plurality of hierarchical database systems into a virtual schema by data mapping (a virtual schema defines the processing that converts actual physical tables into the logical tables provided to the user), and is configured to distribute queries, at query execution time, to the database systems that hold the data to be processed. The results of the queries executed in each database system are collected centrally, integrated into one according to the virtual schema, and output as the result.

Japanese Patent Application Laid-Open No. 07-141399

As in Patent Document 1, the final integration step of query processing that spans a plurality of database systems must be executed on the data virtualization system. If the data obtained from one or more database systems is large enough to exceed the physical memory size of the data virtualization system, the processing may fail to execute and terminate with an error.
An OS swap mechanism, or a disk spill function implemented in the data virtualization system, can be expected to handle data up to a certain size, but the resulting delay is very large, and the processing still terminates in the same way once that allowance is exceeded.

For users, the greatest merit of a data virtualization system is that it can be used without awareness of the individual database systems. Forcing users to consider whether the query results from the database systems accessed by the data virtualization system are large would undermine the benefit of the data virtualization system.

The present invention has been proposed in view of the above circumstances. Its object is to provide a distributed data virtualization system, a query processing method, and a query processing program that, in a data virtualization system, distribute large-scale data that cannot be processed by a single data virtualization apparatus to a plurality of data virtualization apparatuses and process it in parallel.

To achieve the above object, claim 1 of the present invention is a distributed data virtualization system comprising: a client that issues query processing requests and receives results; a data virtualization apparatus that performs query processing; one or more database systems used by the data virtualization apparatus; and a distributed data virtualization system management apparatus that manages information on the resource amount of the data virtualization apparatus and table information held by the database systems, characterized in that
the data virtualization apparatus is composed of a plurality of data virtualization apparatuses, and
each of the data virtualization apparatuses comprises:
a distributed processing planning unit that, taking into account a query plan defining the query processing procedure, the resource amount, and the table information, creates a distributed processing plan including the selection of a plurality of data virtualization apparatuses that execute query processing in parallel and the query processing contents on those data virtualization apparatuses;
a processing plan placement unit that distributes the distributed processing plan to the group of data virtualization apparatuses that perform the distributed processing;
a data distribution unit that, in accordance with the distributed processing plan, distributes the processing target data acquired from each database system to other data virtualization apparatuses in units of records, or receives such data at its own data virtualization apparatus;
a query processing unit that performs processing according to the distributed processing plan; and
a data aggregation unit that receives the results of the processing according to the distributed processing plan from each query processing unit and transmits the aggregated result to the client.

Claim 2 is a query processing method in a distributed data virtualization system comprising: a client that issues query processing requests and receives results; a plurality of data virtualization apparatuses that perform query processing; a distributed data virtualization system management apparatus that manages information on the resource amount of the data virtualization apparatuses; and one or more database systems used by the data virtualization apparatuses, the method being characterized by including, when a query processing request is received from the client:
a query plan creation procedure of creating a query plan that defines the query processing procedure;
a distributed processing planning procedure of creating, in consideration of the query plan, the resource amount, and the table information, a distributed processing plan including the selection of a plurality of data virtualization apparatuses that execute query processing in parallel and the distributed query processing contents on those data virtualization apparatuses;
a processing distribution procedure of distributing the distributed processing plan to the group of data virtualization apparatuses that perform the distributed processing;
a data distribution procedure of distributing or receiving, in units of records, the processing target data acquired from each database system in accordance with the distributed processing plan;
a query processing procedure of performing processing according to the distributed processing plan; and
a data aggregation procedure of receiving the results of the processing according to the distributed processing plan and transmitting the aggregated result to the client.

Claim 3 is the query processing method of claim 2, characterized in that the distributed processing planning procedure includes:
a procedure of calculating, from the distributed query processing contents, the capacity of the distributed query results obtained from each database system;
a procedure of calculating, from the calculated capacity, the output capacity of the tables and intermediate tables obtained in the virtual schema and of the final result, and the maximum processing capacity in the processing of the virtual schema; and
a data virtualization apparatus group selection procedure of determining, using the calculated capacities, the number of data virtualization apparatuses required and the data virtualization apparatuses to which the processing is assigned.

Claim 4 is the query processing method of claim 3, characterized in that the data virtualization apparatus group selection procedure includes:
a procedure of searching the plan tree created from the query plan, in descending order from the root (the query result), for the tables for which the node that executes the query processing is to be determined; and
a procedure of selecting, when the table is an output table of a virtual schema, the data virtualization apparatus group from the maximum output capacity of that virtual schema and from the usage status and available memory size of the data virtualization apparatus group.

Claim 5 is the query processing method of claim 2, characterized in that the data distribution procedure includes:
a procedure of acquiring the processing target data from each of the database systems; and
a procedure of determining, one record at a time for each table, the data virtualization apparatus to be designated.

Claim 6 is the query processing method of claim 2, characterized in that the query processing procedure includes, when the next processing is to be performed upon completion of a processing step:
a procedure of determining whether the operation is a unary operation or a binary operation; and
a procedure of, in the case of a binary operation, waiting for the completion notification of the binary operation and then performing the data movement and query processing for the next processing.

Claim 7 is characterized by being a query processing program that causes a computer to execute each procedure of claim 2.

According to the present invention (claims 1 and 2), performing distributed processing planning and processing plan placement makes it possible to distribute processing over a plurality of data virtualization apparatuses and execute it in parallel, so that large-scale data processing can be handled.
In addition, parallelism can be improved through distribution that makes use of the virtual schema.
Furthermore, by making the processing of the virtual schema independent, query re-execution can be made more efficient and the load on the database systems being virtualized can be reduced.

According to claims 3 and 4, an appropriate group of data virtualization apparatuses can be selected by taking into account the capacity of the distributed query results and the maximum processing capacity in the processing of the virtual schema.

According to claim 5, the data virtualization apparatus is designated one record at a time for each table, so that a data virtualization apparatus suited to the processing can be selected.

According to claim 6, the next processing is performed only after the completion notification of a binary operation has been received, so that query processing can be carried out reliably in accordance with the plan.

According to claim 7, a query processing method that distributes processing over a plurality of data virtualization apparatuses and executes it in parallel can be built on a computer.

FIG. 1 is a model diagram showing the overall configuration of the distributed data virtualization system of the present invention.
FIG. 2 shows an example of a virtual schema in a general data virtualization system.
FIG. 3 is a block diagram for explaining the configuration of a data virtualization apparatus in the distributed data virtualization system.
FIG. 4 is a block diagram showing the flow of data, in the data virtualization processing and distributed query processing of the distributed data virtualization system, up to the point where a data virtualization apparatus receives the processing results of each database system.
FIG. 5 is a block diagram showing the flow of data, in the data virtualization processing and distributed query processing of the distributed data virtualization system, when the data virtualization apparatuses process the results of each database system in a distributed manner.
FIG. 6 is a flowchart showing the processing procedure in the distributed processing planning unit of a data virtualization apparatus.
FIG. 7 is a model diagram showing an example of converting the query plan of the SQL query "SELECT aid FROM a" posted by a user into a plan tree, using the virtual schema a of FIG. 2 as an example.
FIG. 8 is a flowchart showing the processing procedure for "selection of the processing data virtualization apparatus group for each table" in FIG. 6.
FIG. 9 is a flowchart showing the data distribution processing performed immediately after the created distributed processing plan has been distributed to the data virtualization apparatus group that uses it and query processing in the database systems has completed.
FIG. 10 is a flowchart showing the flow of processing in the query processing unit of a data virtualization apparatus after each processing step on the query tree has completed.

An embodiment of the distributed data virtualization system of the present invention will be described with reference to FIG. 1.
The distributed data virtualization system of the present invention is a data virtualization system that distributes large-scale data, which cannot be processed by a single data virtualization apparatus, to a plurality of data virtualization apparatuses for parallel processing. As shown in FIG. 1, it is composed of a client 1 that uses the system, a distributed data virtualization system management apparatus 2, one or more data virtualization apparatuses 3, and one or more database systems 4 used by the distributed data virtualization system.
Each data virtualization apparatus 3 is connected to the client 1 and the distributed data virtualization system management apparatus 2 via a network X. Each data virtualization apparatus 3 is also connected to each database system 4 via a network Y.

The client 1, the distributed data virtualization system management apparatus 2, each data virtualization apparatus 3, and each database system 4 are each built on a computer of a general configuration whose main parts are: a ROM that stores basic programs, including an operating system (OS), and various basic devices; a hard disk drive (HDD) that stores various programs and data; a media drive that reads programs and data from storage media such as CD-ROM and DVD; a CPU that executes programs; a RAM that provides a work area for the CPU; and a parallel/serial I/F that communicates with external devices.
For example, the distributed data virtualization system is built by installing the query processing program, read by the media drive, onto the HDD of each computer having the above configuration.

The client 1 includes a distributed connector 11 that connects to one appropriate data virtualization apparatus 3, chosen according to the usage status and operating status of the data virtualization apparatuses 3, and that posts queries and receives results.

The distributed data virtualization system management apparatus 2 includes: resource management data 21, which holds the resource capacity and current usage status of each data virtualization apparatus 3 (resource information); data source management data 22, which holds table information for the tables on each database system 4 (physical model, size, and size change differences caused by queries); and model data 23, which holds the configuration of the virtual schema for those physical models, its conversion procedure (logical model information), and information on the amount of resources consumed (consumption information).

The virtual schema held in the model data 23 manages virtual schema information describing, for the logical table that the user specifies in a query, which tables on which database systems 4 are processed and how. As an example, FIG. 2 shows typical processing with a virtual schema when a query is posted to the data virtualization apparatus 3.
Here, assume that an SQL query on table a is posted from the client 1. On the data virtualization apparatus 3, virtual schema a defines table a as the result of a UNION of table s and table t. Table s is actually a table on database system S, and table t is a table on database system T.

In this case, table a in the SQL query posted by the client 1 is converted into the processing of virtual schema a. The query is then decomposed into a query executed on the data virtualization apparatus 3 (the UNION), a query executed on database system S (an SQL query on table s), and a query executed on database system T (an SQL query on table t), and the queries are posted to the respective database systems 4.
The processing results of each database system 4 are collected at the data virtualization apparatus 3 that posted the queries, which executes the previously prepared query (UNION) processing and sends the result to the client 1.
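The decomposition of the query on table a can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the `VirtualSchema` class, the `decompose` function, and the layout of the returned dictionary are hypothetical names introduced for this example and are not part of the described system. The table and database system names (a, s, t, S, T) follow the example of FIG. 2, and the column `aid` is taken from the query "SELECT aid FROM a" used in the description of FIG. 7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualSchema:
    """Hypothetical model: a logical table defined as an operation over physical tables."""
    logical_table: str
    operation: str   # operation executed on the data virtualization apparatus, e.g. "UNION"
    sources: tuple   # (physical table, database system) pairs


def decompose(columns, schema):
    """Split a query on the logical table into one query per database system,
    plus the operation that remains on the data virtualization apparatus."""
    column_list = ", ".join(columns)
    per_database = {
        database: f"SELECT {column_list} FROM {table}"
        for table, database in schema.sources
    }
    return {"on_apparatus": schema.operation, "per_database": per_database}


# Virtual schema a: table a is the UNION of table s (on S) and table t (on T).
schema_a = VirtualSchema("a", "UNION", (("s", "S"), ("t", "T")))
plan = decompose(["aid"], schema_a)
```

In this sketch, `plan["per_database"]` holds the queries posted to database systems S and T, and `plan["on_apparatus"]` names the operation applied to the collected results.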

As shown in FIGS. 3 to 5, the data virtualization apparatus 3 includes a query evaluation unit 31, a distributed query generation unit 32, a distributed query posting unit 33, a distributed processing planning unit 34, a processing plan placement unit 35, a query result receiving unit 36, a data distribution unit 37, a query processing unit 38, and a data aggregation unit 39.

The query evaluation unit 31 receives a query processing request from the client 1, acquires virtual schema information from the model data 23 (A), and, according to this information, generates a query plan that defines the query processing procedure (FIG. 4).
The distributed query generation unit 32 acquires table information from the data source management data 22 (B) and, based on this information, generates (reconstructs) queries for the appropriate database systems 4 in accordance with the query plan (FIG. 4).
The distributed query posting unit 33 posts the queries generated by the distributed query generation unit 32 to each database system 4 (C) (FIG. 3).

The distributed processing planning unit 34 acquires resource information from the resource management data 21 (D), logical model information and consumption information from the model data 23 (E), and table information from the data source management data 22 (F). Applying this information to the query plan generated by the query evaluation unit 31, it generates a distributed processing plan that selects the processing contents and the data virtualization apparatuses 3 that carry out the processing.
The processing plan placement unit 35 distributes the distributed processing plan generated by the distributed processing planning unit 34 to the group of data virtualization apparatuses that perform the distributed processing (G) (FIG. 3).

The query result receiving unit 36 receives the query execution results from each database system 4 (H) and outputs them to the data distribution unit 37 (I) (FIG. 4).
The data distribution unit 37 receives resource information from the resource management data 21 (J) and receives the distributed processing plan from the processing plan placement unit 35. Based on this plan, it distributes the query execution results received from each database system 4, in units of records, to the data distribution units 37 of the data virtualization apparatuses 3 (K), or receives such records (FIG. 5).
The query processing unit 38 performs, together with the data distribution unit 37, processing L based on the distributed processing plan (FIG. 5).
The data aggregation unit 39 receives the results of processing L from each query processing unit 38 (M), aggregates them, and transfers the result to the distributed connector 11 of the client 1 (O) (FIG. 5).
Note that the data virtualization apparatuses 3 are all configured with the same functions.
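The record-by-record distribution performed by the data distribution unit 37 can be illustrated with a short Python sketch. The text here does not state how the destination of each record is chosen, so the round-robin assignment below is purely an assumption made for illustration; the function name `distribute_records` and the node names are likewise hypothetical.

```python
from collections import defaultdict
from itertools import cycle


def distribute_records(records, target_nodes):
    """Assign each record to one of the nodes named in the distributed processing plan.

    Round-robin is an assumed policy for this sketch only; the actual rule is
    determined by the distributed processing plan.
    """
    if not target_nodes:
        raise ValueError("the plan must name at least one target node")
    assignment = defaultdict(list)
    # zip stops when the records run out, so cycle() never loops forever here.
    for record, node in zip(records, cycle(target_nodes)):
        assignment[node].append(record)
    return dict(assignment)
```

For example, five records sent to two nodes are split three and two, and each node then processes only the records it holds.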

The database system 4 is composed of a query engine 41 that processes queries and a data source 42 that stores the actual data.

Next, the overall flow of data virtualization processing and distributed query processing in the distributed data virtualization system will be described.
The user posts a query via the distributed connector 11 of the client 1.
The query evaluation unit 31 receives the query, retrieves from the model data 23 the virtual schema corresponding to the table targeted by the query, and generates a query plan in which the processing of the virtual schema is applied to the query posted by the user.
This query plan includes the actual tables of the group of database systems 4 connected to the group of data virtualization apparatuses 3 (hereinafter referred to as nodes).

The distributed query generation unit 32 matches these tables against the data source management data 22 to identify the target database systems 4. It then generates, based on the query plan, queries that can be executed on the actual database systems 4.
The distributed query posting unit 33 posts each of these executable queries to its target database system 4.

Once the query evaluation unit 31 has generated the query plan, the distributed processing planning unit 34 uses this query plan to create a distributed processing plan for the group of nodes 3.
The processing procedure in the distributed processing planning unit 34 will be described with reference to FIGS. 6 and 7.

Based on the query plan (character data), a plan tree is generated that represents the tables to be processed and the flow of processing, including the intermediate table produced by each processing step (step 61). As shown in FIG. 7, the plan tree is structured as a chain of processing steps applied to tables.
Next, the model data 23 and the data source management data 22 are acquired (step 62), and the capacity of the distributed query results obtained from each database system 4 is calculated from the processing contents on the plan tree (step 63).

From this result, the output capacity of the tables and intermediate tables obtained in the virtual schema and of the final result, and the maximum processing capacity in the processing of the virtual schema, are calculated (step 64). The maximum processing capacity is useful for determining the number of nodes required.
Using the calculated capacities, the number of nodes required and the nodes to which the processing is assigned (the planning node itself or another node) are determined (step 65). If the node 3 that finally collects the data differs from the node 3 currently creating the distributed processing plan (step 66), the distributed connector 11 of the client 1 is asked to change its connection destination (step 67), and the distributed processing planning is completed (step 68).
Because this processing takes into account the capacity of the distributed query results and the maximum processing capacity in the processing of the virtual schema, appropriate nodes can be selected.
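Steps 65 to 67 can be sketched in Python as follows. The text does not give a formula for the number of nodes, so dividing the maximum processing capacity by the memory available on each node is an assumption made for this sketch, as are the function names `required_node_count` and `needs_connection_change` and the numbers in the usage note.

```python
import math


def required_node_count(max_processing_gb, available_memory_gb_per_node):
    """Assumed rule for step 65: enough nodes that their combined available
    memory covers the maximum processing capacity of the virtual schema."""
    if available_memory_gb_per_node <= 0:
        raise ValueError("each node must have some available memory")
    return max(1, math.ceil(max_processing_gb / available_memory_gb_per_node))


def needs_connection_change(collecting_node, planning_node):
    """Steps 66 and 67: the client's distributed connector is asked to reconnect
    when the node collecting the final data is not the node that made the plan."""
    return collecting_node != planning_node
```

Under these assumptions, a maximum processing capacity of 8 GB with 3 GB available on each node would call for three nodes.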

FIG. 7 shows an example in which the query plan of the SQL query "SELECT aid FROM a" posted by the user is converted into a plan tree, using the virtual schema a of FIG. 2 as an example.
The SQL query posted by the user is decomposed into three SQL queries, as in the example of FIG. 7. Of these, the results of the queries on table s and table t, which are physical models, are collected on the nodes 3 as intermediate tables and processed as the inputs of the UNION defined in virtual schema a.

The plan tree of FIG. 7 makes the tables before and after each processing step in this flow explicit. As the table capacities obtained from each database system, it shows the size of table s (30 GB) and the size of table t (10 GB) acquired from the data source management data 22. As the output capacities of the intermediate tables, it shows the size of table s (6 GB) and of table t (2 GB) after the Projection processing, and the size after processing (8 GB) obtained by applying the size calculation method for the virtual schema processing held in the model data 23.
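The capacities shown in FIG. 7 can be reproduced with a small worked calculation. The sketch below is illustrative only: the description does not say how the Projection output size is estimated, so the assumption that the selected column `aid` accounts for one fifth of each row is chosen here simply because it reproduces the stated figures, and taking the UNION size as the sum of its inputs is likewise an assumed stand-in for the size calculation method held in the model data 23. The function names are hypothetical.

```python
def projection_size_gb(table_size_gb, kept_width, row_width):
    """Estimated size after Projection, assuming size scales with the kept row width."""
    return table_size_gb * kept_width // row_width


def union_size_gb(input_sizes_gb):
    """Assumed size calculation for UNION: the sum of the input sizes
    (an upper bound when duplicate rows are removed)."""
    return sum(input_sizes_gb)


# Table sizes from the data source management data (FIG. 7).
size_s_gb, size_t_gb = 30, 10

# Assumption: the aid column is 1/5 of each row's width.
projected_s_gb = projection_size_gb(size_s_gb, 1, 5)
projected_t_gb = projection_size_gb(size_t_gb, 1, 5)
virtual_schema_output_gb = union_size_gb([projected_s_gb, projected_t_gb])
```

With these assumptions the intermediate tables come to 6 GB and 2 GB and the output of virtual schema a comes to 8 GB, matching the values in the plan tree.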

Next, FIG. 8 shows the detailed processing flow of "selection of the processing node group for each table" (step 65) in FIG. 6.
When selection of the processing node group starts (step 81), the created plan tree is searched from the root (the query result) downward for tables for which the nodes that execute the query processing are to be determined (step 82).

When a target table is found by the search (step 83), its table type is determined (step 84): a logical table (an output table of a virtual schema), a table inside a virtual schema (a physical table or intermediate table of the virtual schema), or a table outside a virtual schema (an intermediate table or query result in processing on a logical table).
If the table is determined in step 84 to be a logical table, a node group is selected based on the maximum output capacity in that virtual schema and on the usage status and available memory size of the node group (step 85).
If the table is inside a virtual schema, it is skipped here and the search continues, because the processing capacity has already been calculated for the virtual schema as a whole.
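The search and table type determination of steps 82 to 84 can be sketched as follows. The `TableKind` enum, the `Table` class and the breadth-first visiting order are assumptions made for this illustration; the description only states that the tree is searched from the root downward.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Iterator, List


class TableKind(Enum):
    LOGICAL = auto()         # output table of a virtual schema
    IN_SCHEMA = auto()       # physical or intermediate table inside a virtual schema
    OUTSIDE_SCHEMA = auto()  # intermediate table or query result outside any schema


@dataclass
class Table:
    name: str
    kind: TableKind
    children: List["Table"] = field(default_factory=list)


def tables_needing_nodes(root: Table) -> Iterator[Table]:
    """Walk the plan tree from the root toward the leaves (step 82) and yield
    only the tables for which a node group must be chosen."""
    pending = deque([root])
    while pending:
        table = pending.popleft()
        if table.kind is not TableKind.IN_SCHEMA:
            yield table  # LOGICAL -> step 85, OUTSIDE_SCHEMA -> step 86
        # IN_SCHEMA tables are skipped because their capacity is already
        # covered by the per-schema calculation; the search still continues.
        pending.extend(table.children)


s = Table("s", TableKind.IN_SCHEMA)
t = Table("t", TableKind.IN_SCHEMA)
a = Table("a", TableKind.LOGICAL, [s, t])
result = Table("result", TableKind.OUTSIDE_SCHEMA, [a])
print([tbl.name for tbl in tables_needing_nodes(result)])
```

In this toy tree only `result` and `a` are reported; the tables `s` and `t` inside the virtual schema are visited but not treated as selection targets.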

If the table is determined in step 84 to be outside a virtual schema (an intermediate table or a query result), the node group that processes it is selected, based on the output capacity of the table and the available memory size of the node group, from among the nodes that processed the lower (leaf-side) tables in the tree (step 86). Nodes that already hold a large share of the table data are given priority so that data communication is kept as small as possible. The resource management data of the nodes to be used is updated at the same time.
If no node is free, the node whose currently running processing is expected to finish earliest is selected, and the processing is recorded in the wait queue in the resource management data of that node.
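A minimal sketch of the selection in step 86, including the fallback when no node is free, is given below. The `Node` fields, the greedy memory allocation and the use of an estimated finish time are assumptions made for illustration; the description does not specify how the node group is sized or how finish times are estimated. If the free nodes cannot cover the whole output, the sketch simply returns the nodes it found.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    free_mem_gb: float
    held_gb: float            # input table data already held on this node
    busy_until: float = 0.0   # estimated finish time of the running job; 0 = idle
    wait_queue: List[str] = field(default_factory=list)


def select_node_group(table: str, output_gb: float, candidates: List[Node]) -> List[Node]:
    """Pick nodes for a table outside any virtual schema (step 86)."""
    free = [n for n in candidates if n.busy_until == 0.0 and n.free_mem_gb > 0]
    # Prefer nodes that already hold the most data, to limit data transfer.
    free.sort(key=lambda n: n.held_gb, reverse=True)
    chosen, remaining = [], output_gb
    for node in free:
        if remaining <= 0:
            break
        used = min(node.free_mem_gb, remaining)
        node.free_mem_gb -= used  # update the resource bookkeeping
        remaining -= used
        chosen.append(node)
    if chosen:
        return chosen
    # No free node: queue the work on the node expected to finish first.
    earliest = min(candidates, key=lambda n: n.busy_until)
    earliest.wait_queue.append(table)
    return [earliest]


nodes = [Node("n1", 4, held_gb=2), Node("n2", 8, held_gb=6), Node("n3", 8, held_gb=0)]
group = select_node_group("result", 8, nodes)
print([n.name for n in group])
```

Here `n2` alone is chosen because it holds the most input data and has enough free memory for the 8 GB output.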

After the node group is selected, a hashing target attribute is selected for transferring data to that node group (step 87). This attribute depends on the content of the immediately preceding processing: it is selected when processing per specific attribute is required, as with JOIN, but is not selected when hashing is unnecessary, as with UNION processing.

After the above processing has been performed for the entire plan tree (step 88), the processing node group selection is complete (step 89).
Selecting node groups on the basis of virtual schemas in this way allows processing between mutually dependent tables to be gathered on specific nodes, which makes a high degree of parallelism possible for query processing.

Furthermore, even if a particular node terminates abnormally during execution and re-execution on a per-node basis becomes impossible, only the virtual schema processing that was executed on the failed node needs to be re-executed. This keeps the query processing in the node group and the query processing requests to the target database systems to a minimum.
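As a small illustration of this point, the virtual schemas that need re-execution can be derived from a record of which nodes processed which schema. The `assignment` mapping and the function name are hypothetical; the description does not state how this information is stored.

```python
from typing import Dict, List


def schemas_to_rerun(assignment: Dict[str, List[str]], failed_node: str) -> List[str]:
    """Return the virtual schemas whose processing ran on the failed node."""
    return sorted(schema for schema, nodes in assignment.items()
                  if failed_node in nodes)


# Hypothetical assignment of virtual schemas to the nodes that processed them.
assignment = {"a": ["n1", "n2"], "b": ["n3"], "c": ["n2", "n4"]}
print(schemas_to_rerun(assignment, "n3"))
```

Only schema `b` is reported for a failure of `n3`, so the queries behind schemas `a` and `c` would not have to be sent to their database systems again.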

Next, FIG. 9 shows the detailed flow of the data distribution processing performed after the created distributed processing plan has been distributed to the node group to be used, immediately after query processing in the database systems has completed.
When a node 3 that submitted a query to a database system 4 receives the execution result of the database system 4 (step 91), the query result receiving unit 36 passes the result data to the data distribution unit 37 as a stream, one record at a time (step 92).

The data distribution unit 37 then transfers the result data to the data distribution unit 37 of the target node 3 based on the distributed processing plan obtained from the processing plan placement unit 35. If there are multiple transfer destinations and a hashing target attribute exists, the attribute is used to select one of the target nodes. If there is no hashing target attribute, a free node 3 is selected from among the target nodes 3.
Because the destination node 3 is thus determined one record at a time for each table, a node suited to the processing can be selected.
If the memory of a node 3 runs out of free space, for example because of an error in the maximum processing capacity calculated in advance, an alternative node 3 is determined from the resource management data, the distributed processing plan is corrected and redistributed to all the nodes 3 in use, and the record is transferred to the target node 3 (step 93).
Whether transfer is possible is then determined (step 94). If transfer is not possible because the data source management data 22 also shows no free node 3, the node 3 whose currently running processing is expected to finish earliest is selected as the alternative node, in the same way as in the distributed processing plan unit 34 (step 95).
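The per-record choice of a destination node can be sketched as follows. The function name, the dictionary record format and the use of `zlib.crc32` modulo the number of target nodes are assumptions for illustration; the description does not name a particular hash function. The error raised at the end stands in for the alternative node selection of steps 93 to 95.

```python
import zlib
from typing import Dict, List, Optional


def pick_target(record: Dict[str, object], targets: List[str],
                hash_attr: Optional[str], free: List[str]) -> str:
    """Choose the destination node for one record."""
    if len(targets) == 1:
        return targets[0]
    if hash_attr is not None:
        # Same attribute value -> same node, so a JOIN sees matching rows together.
        # crc32 is used because Python's built-in hash() of str varies per process.
        key = str(record[hash_attr]).encode("utf-8")
        return targets[zlib.crc32(key) % len(targets)]
    # No hashing attribute (for example UNION): any free target node will do.
    for node in targets:
        if node in free:
            return node
    raise RuntimeError("no free target node; an alternative node must be chosen")


targets = ["n1", "n2", "n3"]
first = pick_target({"aid": 42, "v": "x"}, targets, "aid", free=[])
second = pick_target({"aid": 42, "v": "y"}, targets, "aid", free=[])
print(first == second)
print(pick_target({"aid": 7}, targets, None, free=["n2"]))
```

Records with the same `aid` always land on the same node, while records without a hashing attribute go to whichever target node is currently free.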

If, at the time of data transfer, the query processing is registered in the wait queue of the target node 3, the data is stored in a location that does not interfere with the currently running processing, such as on the disk of the target node 3. Once the running processing finishes, execution of the distributed query processing on the target table starts. This allows the query to be executed without interfering with running queries and without being forcibly terminated because query processing exceeds the available memory.

If transfer is possible in step 94, then when the data transfer has completed for each database system (step 96), a table reception completion notification is distributed to the target nodes (step 97).
Next, whether processing can proceed is determined (step 98). For example, if the operation is a binary operation (one that requires two input results) and reception of the other table has not completed, or if the processing is registered in a wait queue, execution does not start until the respective completion notifications have been distributed. Only when all completion notifications have been distributed does query processing start in the query processing unit 38 of the target node (step 99).
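The start condition of steps 98 and 99 can be sketched as a small gate object. The class and method names are hypothetical, and modelling the wait queue as a single boolean flag is a simplification made for this example.

```python
from typing import Set


class StartGate:
    """Tracks completion notifications for the input tables of one operation."""

    def __init__(self, required_inputs: Set[str], queued: bool = False):
        self.required = set(required_inputs)
        self.received: Set[str] = set()
        self.queued = queued  # True while the node's running job has not finished

    def notify_complete(self, table: str) -> None:
        # Record a reception completion notification for one input table.
        if table in self.required:
            self.received.add(table)

    def release_queue(self) -> None:
        # Called when the processing ahead in the wait queue has finished.
        self.queued = False

    def can_start(self) -> bool:
        return not self.queued and self.received == self.required


join_gate = StartGate({"s", "t"})   # binary operation: needs both inputs
join_gate.notify_complete("s")
print(join_gate.can_start())        # only one of two inputs has arrived
join_gate.notify_complete("t")
print(join_gate.can_start())
```

A unary operation is the special case with a single required input, so the same check covers both kinds of operation.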

Next, FIG. 10 shows the detailed processing flow after each process on the query tree completes in the query processing unit 38.
When a process finishes (step 101), the query processing unit 38 determines whether any processing remains (step 102). If other processing remains, it continues with the next processing according to the distributed processing plan (step 103).
If no other processing remains, the result is transferred to the data totaling unit 39 according to the distributed processing plan (step 104).

When continuing with the next processing in step 103, if the next processing that uses the result table of the completed processing is a binary operation, the data is moved across the whole set of target processing nodes (step 105), and it is necessary to wait for the other input to complete. The next processing therefore starts only after completion of both tables to be processed has been confirmed, by checking whether the completion notification has arrived (step 106) and waiting for it if it has not (step 107).
Because the next processing of a binary operation is thus performed only after the completion notifications have arrived, query processing can be carried out reliably in accordance with the plan.

In the case of a unary operation, the next processing is not affected by other processing and therefore starts immediately after the preceding processing finishes. However, if the data has become larger than it was before processing, so that transfer to another node is required, and that node is registered in the wait queue, it is necessary to wait for its completion.

If the memory size is exceeded (step 109) during the data movement that accompanies the next processing (step 108), the result is transferred to a node with free memory among the nodes used so far on the query tree (step 110). If none of them has free memory, the transfer destination node is determined by the alternative node selection logic described above (step 95).
If the completed processing is the final processing, that is, the one closest to the root of the query tree, the query processing as a whole is complete, so the result of the final processing is transferred to the data totaling unit 39.

The data totaling unit 39 then performs any final processing that is required and transfers the result to the client 1, which completes the query processing.

According to the distributed data virtualization system described above, providing the distributed processing plan unit 34 and the processing plan placement unit 35 allows processing to be distributed across a plurality of data virtualization apparatuses (nodes) 3 and executed in parallel, which makes it possible to handle large-scale data processing.
In addition, processing tables in units of virtual schemas allows multiple virtual schemas to be distributed to the data virtualization apparatuses (nodes) 3 schema by schema, which improves parallelism.
In addition, making the processing of each virtual schema independent improves the efficiency of query re-execution and reduces the load on the database systems being virtualized.

DESCRIPTION OF SYMBOLS
1 ... Client, 2 ... Distributed data virtualization system management apparatus, 3 ... Data virtualization apparatus (node), 4 ... Database system, 11 ... Distributed connector, 21 ... Resource management data, 22 ... Data source management data, 23 ... Model data, 31 ... Query evaluation unit, 32 ... Distributed query generation unit, 33 ... Distributed query posting unit, 34 ... Distributed processing plan unit, 35 ... Processing plan placement unit, 36 ... Query result receiving unit, 37 ... Data distribution unit, 38 ... Query processing unit, 39 ... Data totaling unit.

Claims

A distributed data virtualization system comprising: a client that issues query processing requests and receives results; a data virtualization apparatus that performs query processing; one or more database systems used by the data virtualization apparatus; and a distributed data virtualization system management apparatus that manages information on the resource amount of the data virtualization apparatus and table information of the database systems, wherein
the data virtualization apparatus comprises a plurality of data virtualization apparatuses, and
each of the data virtualization apparatuses comprises:
a distributed processing plan unit that creates, in consideration of a query plan defining the query processing procedure, the resource amount, and the table information, a distributed processing plan including selection of a plurality of data virtualization apparatuses that execute the query processing in parallel and the contents of the query processing in the plurality of data virtualization apparatuses;
a processing plan placement unit that distributes the distributed processing plan to the group of data virtualization apparatuses that perform the distributed processing;
a data distribution unit that, in accordance with the distributed processing plan, distributes the processing target data acquired from each database system to other data virtualization apparatuses, or receives it in its own data virtualization apparatus, in units of records;
a query processing unit that performs processing in accordance with the distributed processing plan; and
a data totaling unit that receives the results of the processing in accordance with the distributed processing plan from each query processing unit and transmits the totaled result to the client.

A query processing method in a distributed data virtualization system comprising a client that issues query processing requests and receives results, a plurality of data virtualization apparatuses that perform query processing, a distributed data virtualization system management apparatus that manages information on the resource amount of the data virtualization apparatuses, and one or more database systems used by the data virtualization apparatuses, the method comprising,
when there is a query processing request from the client:
a query plan creation procedure for creating a query plan that defines the query processing procedure;
a distributed processing plan procedure for creating, in consideration of the query plan, the resource amount, and the table information, a distributed processing plan including selection of a plurality of data virtualization apparatuses that execute query processing in parallel and the contents of the distributed query processing in the plurality of data virtualization apparatuses;
a processing distribution procedure for distributing the distributed processing plan to the group of data virtualization apparatuses that perform the distributed processing;
a data distribution procedure for distributing or receiving, in units of records, the processing target data acquired from each database system in accordance with the distributed processing plan;
a query processing procedure for performing processing in accordance with the distributed processing plan; and a data totaling procedure for receiving the results of the processing in accordance with the distributed processing plan and transmitting the totaled result to the client.

The query processing method according to claim 2, wherein the distributed processing plan procedure comprises:
a procedure for calculating, from the contents of the distributed query processing, the capacity of the distributed query results obtained from each database system;
a procedure for calculating, from the calculated capacity, the capacities of the tables or intermediate tables obtained in the virtual schema and of the final output, and the maximum processing capacity in the virtual schema processing; and
a data virtualization apparatus group selection procedure for determining, using the calculated capacities, the number of data virtualization apparatuses required and the data virtualization apparatuses to which processing is assigned.

The query processing method according to claim 3, wherein the data virtualization apparatus group selection procedure comprises:
a procedure for searching the plan tree created from the query plan, in order from the root (the query result) downward, for tables for which the nodes that execute the query processing are to be determined; and
a procedure for selecting, when the table is an output table of a virtual schema, a data virtualization apparatus group based on the maximum output capacity in the virtual schema and on the usage status and available memory size of the data virtualization apparatus group.

The query processing method according to claim 2, wherein the data distribution procedure comprises:
a procedure for acquiring the processing target data from each of the database systems; and
a procedure for determining the destination data virtualization apparatus one record at a time for each table.

The query processing method according to claim 2, wherein the query processing procedure comprises,
when the next processing is to be performed upon completion of a process:
a procedure for determining whether the next processing is a unary operation or a binary operation; and
a procedure for, in the case of a binary operation, moving the data and performing the next query processing after waiting for the completion notification of the binary operation.

A query processing program for causing a computer to execute each procedure of claim 2.