JP2015045996A

JP2015045996A - Distributed query processing apparatus, processing method, and processing program

Info

Publication number: JP2015045996A
Application number: JP2013176710A
Authority: JP
Inventors: 和広斉藤; Kazuhiro Saito
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2013-08-28
Filing date: 2013-08-28
Publication date: 2015-03-12
Anticipated expiration: 2033-08-28
Also published as: JP6204753B2

Abstract

PROBLEM TO BE SOLVED: To provide a distributed query processing apparatus with which, in a shared database system, a query processing engine is selected in accordance with the characteristics of the query processing engines and the content of a query.
SOLUTION: A system in which one database is shared among a plurality of query processing engines and which processes a query posted from a client includes: multi-database management data 10 that manages database information 11 about the database (shared database 40) and engine information 12 about the characteristics of the query processing engines 41; a distributed query generation unit 15 that selects an appropriate query processing engine on the basis of the analysis result of a query posted from a client 2 and decomposes and reconstructs the query; and a query execution control unit 16 that has the appropriate query processing engine execute the query, receives the final execution result, and transfers it to the client 2.

Description

The present invention relates to a distributed query processing apparatus, processing method, and processing program that, in a data-sharing database system in which a plurality of query processing engines share one database, split a query according to the characteristics of the query processing engines and the content of the query and automatically distribute the resulting queries for processing.

Much of the information handled by organizations such as companies, including sales data and customer data, is stored in database systems and used by various applications. Ideally, all data would be stored without duplication in a single database system, and every application would use that one system.
In practice, however, multiple database systems are built for different purposes, so that exactly the same data ends up stored in several database systems, or several database systems hold similar data in different formats and at different granularities.
As a result, building a new application requires finding the database systems that hold the necessary data and then deciding which of them to connect to, which makes the work cumbersome in various respects.

To solve this problem, multi-database systems (also called data virtualization systems) have been developed that make a plurality of database systems appear virtually as a single database system.
For example, Patent Document 1 proposes a multi-database system that integrates a plurality of hierarchical database systems into a virtual schema by data mapping and, at query execution time, distributes the query to the database systems that hold the data to be processed.
In this system, the results of the queries executed in the individual database systems are collected centrally, merged into one, and output. However, the system does not consider the case where multiple database systems hold the same data; to avoid duplicated data when the virtual schema is generated, only one of those database systems can be selected.

For the multi-database system described above, Patent Document 2 proposes a query processing technique for the case where a plurality of database systems hold the same data.
With this technique, when the same table exists in multiple database systems, a single virtual table name is assigned to it and an appropriate table is selected at query execution time, so that the copies of the table are used selectively in a manner transparent to the user.

Meanwhile, as distributed processing technology has advanced with growing data volumes, a variety of database systems have appeared to suit different data scales and types of processing. Accordingly, to avoid building multiple database systems, a new form of database system has emerged that separates the database storing the actual data from the query processing engine (DBMS) that processes queries such as SQL (Structured Query Language) queries (hereinafter referred to as a data-sharing database system).
In such a system, data is collected in a single database to avoid duplication. A plurality of query processing engines share this one database, and using different engines according to the type of processing, the scale of the target data, and so on enables more efficient data processing.

Patent Document 1: Japanese Patent Application Laid-Open No. 07-141399
Patent Document 2: Japanese Patent Application Laid-Open No. 2008-112289 (JP 2008-112289 A)

When using a data-sharing database system in which a plurality of query processing engines share one database, the user may not know the characteristics of each query processing engine. In that case, the user may select a query processing engine that cannot execute a particular query, or one that handles a particular kind of processing poorly and slowly.

Existing multi-database systems such as that of Patent Document 1 do not consider a configuration like the data-sharing database system described above. Consequently, from the viewpoint of the multi-database system, even a shared database is recognized as a plurality of database systems that hold the same data.

Patent Document 2 does consider application to a data-sharing database system. However, it regards distribution across a plurality of query processing engines that share one database as a high-cost operation involving data movement between database systems, and therefore does not split a query across multiple query processing engines.
Even if the cost were estimated to be low and the query were split per query processing engine, an existing multi-database system would have to collect the result of each sub-query from each query processing engine and perform joins and similar processing on the management server of the multi-database system, even though the data resides in the same database, so the shared database cannot be used effectively. Moreover, if the distributed queries depend on one another, the processing results on the management server must be written back to the same original database, which causes wasteful data movement.

The present invention has been proposed in view of the above circumstances. Its object is to provide a distributed query processing apparatus, processing method, and processing program with which a query processing engine is selected according to the characteristics of the query processing engines and the content of the query in a data-sharing database system in which a plurality of query processing engines share one database, a configuration not envisaged by conventional multi-database technology.

To achieve the above object, the distributed query processing apparatus of the present invention (claim 1) is used in a data-sharing database system in which a plurality of query processing engines share one database and queries posted from a client are processed, and is characterized by including the following components.
Multi-database management data that manages database information about the database and engine information about the characteristics of the query processing engines.
A distributed query generation unit that selects an appropriate query processing engine based on the result of analyzing a query posted from the client, and decomposes and reconstructs the query.
A query execution control unit that has the appropriate query processing engine execute the query, receives the final execution result, and transfers it to the client.

According to claim 2, in the distributed query processing apparatus of claim 1, the query execution control unit creates an intermediate table for the decomposed query on the database when executing the query.

Claim 3 is a distributed query processing method for a query posted from a client in a data-sharing database system in which a plurality of query processing engines share one database, characterized by executing:
a query plan generation/acquisition/analysis procedure that generates or acquires a query plan indicating the execution plan of the query based on the query posted from the client, and analyzes the query plan to create the processing flow to be executed in the query processing engines and the data flow of the tables to be read and the intermediate data;
a calculation procedure that uses the processing flow and data flow created in the query plan generation/acquisition/analysis procedure and the database information of the multi-database management data to calculate the table sizes, the size of the intermediate data at each process in the processing flow, and the resources required for each process;
an engine selection procedure that selects a query processing engine for each process based on the calculated data sizes and required resources and on the engine information of the multi-database management data; and
a query generation procedure that generates queries based on the information of the query processing engine selected for each process.

According to claim 4, in the distributed query processing method of claim 3, the query generation procedure includes, when the processing spans a plurality of query processing engines:
a temporary table creation procedure that creates a temporary table at the beginning of the series of query processes;
a write procedure to the temporary table, which is the last process in each query processing engine; and
a temporary table deletion procedure at the end of the series of query processes.

Claim 5 is a distributed query processing program that makes each procedure of the distributed query processing method according to claim 3 or claim 4 executable by a computer.

According to the present invention, a query to a data-sharing database system in which a plurality of query processing engines share one database is split into queries that use a temporary table created on the shared database, taking into account the characteristics and processing status of each query processing engine, the size of the data processed by the query, and the content of the query processing, and these queries are distributed to the query processing engines.

As a result, in a data-sharing database system in which a plurality of query processing engines share one database, a configuration not envisaged by conventional multi-database technology, a more suitable query processing engine can be selected transparently, without the user having to consider the characteristics of the query processing engines.

In addition, by splitting the query in consideration of the characteristics of each query processing engine and the processing content of the query, and by storing the intermediate data in a temporary table created on the shared database, unnecessary data movement is eliminated and fast query processing becomes possible.

FIG. 1 is a block diagram showing the overall configuration of a distributed query processing system including the distributed query processing apparatus of the present invention.
FIG. 2 is a block diagram for explaining the processing in the distributed query generation unit of the distributed query processing apparatus.
FIG. 3 is a block diagram showing the overall configuration of a distributed query processing system.
FIG. 4 is a table showing the memory sizes of the query engines.
FIG. 5 is a table showing the structure of table a.
FIG. 6 is a table showing the structure of table b.
FIG. 7 is a sample SQL query.
FIG. 8 is a processing flow diagram of the SQL query.
FIG. 9 is a data flow diagram of the SQL query.
FIG. 10 is a table showing the data sizes in the data flow of FIG. 9.
FIG. 11 is a processing flow diagram for query processing engine X.
FIG. 12 is a processing flow diagram for query processing engine Y.
FIG. 13 is an example of an SQL query created by splitting the SQL query of FIG. 7.
FIG. 14 is an example of an SQL query created by splitting the SQL query of FIG. 7.

An example embodiment of a distributed query processing system including the distributed query processing apparatus 1 of the present invention will be described with reference to FIG. 1.
The distributed query processing apparatus 1, which is the main part of the distributed query processing system, handles query processing requests from the client 2 that are to be processed by the database system (data-sharing database system) 4 via the network 3. It splits each query according to the characteristics of the query processing engines and the content of the query, and automatically distributes the resulting queries for processing. It is built on a computer by installing the distributed query processing program from a recording medium on which it is stored, or by downloading the software via the Internet.

The computer on which the distributed query processing apparatus 1 is built has a general configuration whose main parts are: a ROM storing basic programs, including the operating system (OS), and various basic devices; a hard disk drive (HDD) storing various programs and data; a media drive that reads programs and data from storage media such as CD-ROMs and DVDs; a CPU that executes programs; a RAM that provides a work area for the CPU; a display, a keyboard, and a pointing device such as a mouse connected via an input/output interface (I/F); and a parallel/serial I/F for communicating with external devices.
In the distributed query processing apparatus 1 of this embodiment, the distributed query processing program is input through the parallel/serial I/F or read by the media drive and stored in the HDD in advance. The distributed query processing program is stored on a storage medium, read by the media drive, and installed on the HDD.

The distributed query processing system consists of the distributed query processing apparatus 1, the client 2 that requests query processing from the distributed query processing apparatus 1, and the database system 4 connected via the network 3. The database system 4 includes one shared database 40 and a plurality of query processing engines 41, and the shared database 40 is shared by the plurality of query processing engines 41.

The client 2 refers to software that requests query processing from the database system 4, such as a Web application server or a BI tool, and to the hardware on which that software runs.

The distributed query processing apparatus 1 consists of multi-database management data 10, a data update unit 13, an access management unit 14, a distributed query generation unit 15, and a query execution control unit 16.
The multi-database management data 10 holds and manages two kinds of information: database information 11, which stores information about the shared database 40, and engine information 12, which stores information about the plurality of query processing engines 41 that serve as database management systems managing the shared database 40 of the database system 4.

The database information 11 manages information about the shared database 40, including the structure of the schemas in the shared database 40; the size and access frequency of each schema; the list of attributes in each schema; the name, type, size, and additional information of each attribute; the history of the structure and size of intermediate tables in the query processing described later; and the history of the structure and size of query output results.

The engine information 12 manages information about the characteristics of each query processing engine 41, including the number of nodes, CPU clock frequency, number of CPU cores, memory size, network bandwidth, disk size, disk access bandwidth, query processing engine name, whether metadata and the database are shared, the number of queries that can be executed concurrently, queries that cannot be executed, query algorithm information, and resource usage (CPU utilization, memory utilization, network traffic, disk usage, disk swap usage, and disk I/O volume).
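As an illustration only, the multi-database management data 10 could be held in simple records such as the following Python sketch. The class names, field names, and the subset of fields are assumptions chosen for this example; the specification lists the kinds of information managed but does not prescribe a data layout.

```python
from dataclasses import dataclass, field


@dataclass
class TableInfo:
    """Hypothetical record for one table in the database information 11."""
    name: str
    size_bytes: int
    attributes: dict[str, str] = field(default_factory=dict)  # attribute name -> type


@dataclass
class EngineInfo:
    """Hypothetical record for one engine in the engine information 12."""
    name: str
    node_count: int
    memory_bytes: int
    max_concurrent_queries: int
    unsupported_queries: set[str] = field(default_factory=set)
    memory_usage_ratio: float = 0.0  # current memory utilization, 0.0 to 1.0

    def free_memory_bytes(self) -> int:
        # Memory not currently in use, derived from the utilization figure.
        return int(self.memory_bytes * (1.0 - self.memory_usage_ratio))


@dataclass
class MultiDatabaseManagementData:
    """Container pairing database information with engine information."""
    tables: dict[str, TableInfo] = field(default_factory=dict)
    engines: dict[str, EngineInfo] = field(default_factory=dict)


GB = 1024 ** 3
management_data = MultiDatabaseManagementData()
management_data.engines["Y"] = EngineInfo(
    name="Y", node_count=1, memory_bytes=32 * GB,
    max_concurrent_queries=4, memory_usage_ratio=0.25,
)
print(management_data.engines["Y"].free_memory_bytes() // GB)
```

A data update unit in the sense of the text would overwrite these records periodically or on request; that part is not sketched here.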

The data update unit 13 updates the multi-database management data 10 by collecting, periodically or at the user's request, the engine information 12 of the query processing engines 41 and the database information 11 of the shared database 40 in the database system 4. Alternatively, the data update unit 13 updates the multi-database management data 10 by accepting data entered by the user via the client 2.

The access management unit 14 manages access from the client 2 and, when a query is posted, passes the query to the distributed query generation unit 15.

The distributed query generation unit 15 analyzes the query posted from the client 2 and, based on the database information 11 and engine information 12 of the multi-database management data 10, selects appropriate query processing engines 41 and decomposes and reconstructs the query. The detailed procedures for query decomposition and reconstruction (creation of distributed queries) and for engine selection are described later.

The query execution control unit 16 has the appropriate query processing engines 41 execute the queries, receives the final execution result, and transfers it to the client 2. When executing the queries, it creates intermediate tables for the decomposed queries on the shared database 40 as necessary.

The plurality of query processing engines 41 of the database system 4 process the queries posted against the shared database 40.
The shared database 40 provides the data required for query processing to the query processing engines 41.

The distributed query processing apparatus 1 and each query processing engine 41 are connected via the network 3. The distributed queries generated by the distributed query processing apparatus 1 are transferred over the network 3 to the designated query processing engines 41, and the result of each query is likewise transferred over the network 3 from the query processing engine 41 to the distributed query processing apparatus 1.

The shared database 40 is not limited to a physically single database; it also covers the case where a plurality of databases are made to appear virtually as one by storing data fully synchronized or distributed across them. For example, if a write to any one of several physically separate databases is instantly applied to all the other databases as well, this arrangement qualifies as one shared database.
The database here includes both the physical data of the database system stored on information storage media such as hard disk drives (HDDs) or memory, and the logical information (metadata) about the data organization of the database system, such as the locations of the physical data and the table schemas.

Next, the query distribution processing in the distributed query processing apparatus 1 will be described in detail.
When the client 2 posts a query to the data-sharing database, in which one shared database 40 is shared by a plurality of query processing engines 41, the access management unit 14 receives it and passes queries sequentially to the distributed query generation unit 15.

The distributed query generation unit 15 then converts the query posted by the client 2 into distributed queries for the appropriate query processing engines 41, following the procedure shown in FIG. 2.
In query plan generation/acquisition/analysis, a query plan indicating the execution plan of the query is generated or acquired based on the query posted from the client 2, and the query plan is analyzed to create the flow of processes to be executed in the query processing engines and the data flow of the tables to be read and the intermediate data (step 21).

In data size and resource calculation, the processing flow and data flow created in query plan generation/acquisition/analysis, together with the database information 11 of the multi-database management data 10, are used to calculate the table sizes, the size of the intermediate data at each process in the processing flow, and the other resources required for each process (step 22).

In engine selection, the target query processing engine is selected for each process. The engine is chosen by checking the data sizes and required resources calculated above and the engine information 12 of the multi-database management data 10 against predetermined query processing engine selection criteria (step 23).

In query generation, a query is generated for each process based on the information of the selected query processing engine (step 24).
The queries are generated along the processing flow. If the processing flow spans a plurality of query processing engines, a statement that creates a temporary table is inserted at the beginning of the series of query processes, the last process in each query processing engine is changed to an insert into this temporary table, and a statement that drops the temporary table is inserted at the end of the series.
If, after engine selection, the original query cannot be split as it stands, per-engine queries are reconstructed (re-created) from the processing flow and data flow.
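The temporary-table handling in query generation can be sketched in Python as follows. This is a minimal sketch for a single hand-off between two engines, not the apparatus's actual implementation: the function name, the table name `temp_join`, the column definitions, and the choice of which engine issues the CREATE and DROP statements are all assumptions made for illustration.

```python
def wrap_with_temp_table(producer, consumer, temp_table, columns_ddl):
    """Build the statement sequence for a hand-off between two engines.

    producer and consumer are (engine_name, select_sql) pairs. The consumer's
    SELECT is expected to read from temp_table already.
    """
    producer_engine, producer_select = producer
    consumer_engine, consumer_select = consumer
    return [
        # Start of the series: create the temporary table on the shared database.
        (producer_engine, f"CREATE TABLE {temp_table} ({columns_ddl})"),
        # The producer's last process becomes an insert into the temporary table.
        (producer_engine, f"INSERT INTO {temp_table} {producer_select}"),
        # The consumer continues from the temporary table.
        (consumer_engine, consumer_select),
        # End of the series: drop the temporary table.
        (consumer_engine, f"DROP TABLE {temp_table}"),
    ]


plan = wrap_with_temp_table(
    ("X", "SELECT a.date, b.sales FROM a JOIN b ON a.id = b.aid"),
    ("Y", "SELECT date, SUM(sales) FROM temp_join GROUP BY date ORDER BY date DESC"),
    temp_table="temp_join",
    columns_ddl="date TEXT, sales INTEGER",
)
for engine, sql in plan:
    print(engine, "|", sql)
```

A flow with more than two engine segments would need one hand-off per boundary; the sketch leaves that generalization out.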

Next, the query processing method will be described with reference to a concrete example.
FIG. 3 shows an example system configuration for actual query processing, in which query processing engine X and query processing engine Y use a shared database A. With this configuration, the client 2 sees two databases holding the same data.

Assume, for example, that query processing engine X can always execute stably in proportion to the data size, whereas query processing engine Y processes tables up to its physical memory size (32 GB) faster than query processing engine X but cannot execute queries on data exceeding that size. The physical memory of query processing engines X and Y is as shown in FIG. 4.
Under this assumption, the distribution condition is as follows: processing whose target data exceeds the physical memory size of query processing engine Y is distributed to query processing engine X, and processing whose target data is no larger than the physical memory size of query processing engine Y is distributed to query processing engine Y.

Query processing will be described using, as an example, an SQL query (FIG. 7) on table a (FIG. 5) and table b (FIG. 6).
This SQL query joins tables a and b (JOIN processing) on the condition that the attribute id of table a equals the attribute aid of table b, groups the result by records having the same value of the attribute date of table a and computes the sum of the associated sales values of table b (AGGREGATE processing (GROUP BY + SUM)), and sorts the resulting records in descending order of date (ORDER BY processing).
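FIG. 7 is not reproduced in this text, so the following Python sketch reconstructs a query of the described shape from the prose alone and runs it with the standard `sqlite3` module. The exact SQL text, the column types, and the sample rows are assumptions for illustration; only the attribute names id, date, aid, and sales come from the description.

```python
import sqlite3

# JOIN on a.id = b.aid, GROUP BY a.date with SUM(b.sales), ORDER BY date descending.
QUERY = """
SELECT a.date, SUM(b.sales) AS total_sales
FROM a JOIN b ON a.id = b.aid
GROUP BY a.date
ORDER BY a.date DESC
"""


def run_example():
    conn = sqlite3.connect(":memory:")
    # Toy schema and rows, invented for this example.
    conn.executescript("""
        CREATE TABLE a (id INTEGER, date TEXT);
        CREATE TABLE b (aid INTEGER, sales INTEGER);
        INSERT INTO a VALUES (1, '2013-08-01'), (2, '2013-08-02'), (3, '2013-08-01');
        INSERT INTO b VALUES (1, 100), (2, 50), (3, 25), (2, 10);
    """)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(run_example())  # [('2013-08-02', 60), ('2013-08-01', 125)]
```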

When the SQL query of FIG. 7 is posted from the client 2, the distributed query generation unit 15 generates a query plan and, following the processing order described above, creates the processing flow of FIG. 8 and, from the input and output data of each process, the data flow of FIG. 9.
Here, “reading table a” (step 81a) outputs “a” (step 91a), “reading table b” (step 81b) outputs “b” (step 91b), “JOIN processing” (step 82) outputs “temp_join” (step 92), “AGGREGATE processing (GROUP BY + SUM)” (step 85) outputs “temp_aggr” (step 95), and “ORDER BY processing” (step 86) outputs “output” (step 96).

From this data flow and the corresponding processing and database information, the size of each piece of data in the flow is estimated as shown in FIG. 10. The query processing engine that performs each process is determined by comparing these data sizes with the physical memory of FIG. 4 and judging whether the input data or the output data of the process exceeds the physical memory size.
That is, because everything up to and including “JOIN processing” exceeds the 32 GB physical memory size of query processing engine Y, “reading table a”, “reading table b”, and “JOIN processing” are performed by query processing engine X, and the remaining “AGGREGATE processing (GROUP BY + SUM)” and “ORDER BY processing” are performed by query processing engine Y.
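The per-process engine decision can be sketched as follows. The step names and data names follow the text; the size estimates are hypothetical stand-ins for FIG. 10 (which is not reproduced here), chosen so that the stated rule yields the assignment described above. The `Step` class and `assign_engines` function are illustrative names, not part of the patent.

```python
from dataclasses import dataclass

GB = 1024 ** 3
ENGINE_Y_MEMORY = 32 * GB  # physical memory of engine Y, from the text


@dataclass(frozen=True)
class Step:
    name: str
    inputs: tuple   # names of the data items the step reads
    output: str     # name of the data item the step produces


# Processing flow and data flow as described in the text (FIGS. 8 and 9).
FLOW = [
    Step("read table a", (), "a"),
    Step("read table b", (), "b"),
    Step("JOIN", ("a", "b"), "temp_join"),
    Step("AGGREGATE", ("temp_join",), "temp_aggr"),
    Step("ORDER BY", ("temp_aggr",), "output"),
]

# Hypothetical size estimates; the real values are those of FIG. 10.
SIZES = {"a": 40 * GB, "b": 60 * GB, "temp_join": 20 * GB,
         "temp_aggr": 1 * GB, "output": 1 * GB}


def assign_engines(flow, sizes, limit=ENGINE_Y_MEMORY):
    """Assign engine X to a step if any input or output exceeds the limit, else Y."""
    plan = {}
    for step in flow:
        touched = [sizes[name] for name in step.inputs + (step.output,)]
        plan[step.name] = "X" if any(size > limit for size in touched) else "Y"
    return plan
```

Under these assumed sizes, the two table reads and the JOIN go to engine X and the AGGREGATE and ORDER BY steps go to engine Y, which matches the split described in the text.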

In the query processing described above, the processing spans multiple query processing engines. To exchange data between query processing engine X and query processing engine Y, intermediate data (a temporary table temp) is therefore created on the shared database A and used.
That is, a temporary table creation process is added at the beginning of the series of query processes (a query statement that generates the temporary table is inserted), the last process in each query processing engine is changed to a write process (insert process) to the temporary table, and a temporary table deletion process is added at the end of the series of query processes.

FIG. 11 shows the processing flow in query processing engine X. In addition to the distributed processing, a process that first creates the temporary table temp (step 80) and a process that writes the processing result to table temp after “JOIN processing” (step 83) are added.

FIG. 12 shows the processing flow in query processing engine Y. In addition to the distributed processing, a process that first reads the temp data created by query processing engine X (step 84) and a process that deletes temp after processing ends (step 87) are added.

Based on the processing flows of FIGS. 11 and 12, the SQL queries created by dividing the SQL query of FIG. 7 are as shown in FIGS. 13 and 14. That is, the SQL query of FIG. 13 corresponds to the processing in query processing engine X of FIG. 11, and the SQL query of FIG. 14 corresponds to the processing in query processing engine Y of FIG. 12.
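FIGS. 13 and 14 are not reproduced in this text, so the split below is an assumed illustration of the idea rather than the actual figures. Two SQLite connections to one database file stand in for engines X and Y sharing database A. The table name `temp_tbl` (the text calls it temp), the column types, and the sample rows are hypothetical.

```python
import os
import sqlite3
import tempfile

# Engine X side (cf. FIG. 11): create the intermediate table (step 80),
# then write the JOIN result into it (step 83).
ENGINE_X_SQL = """
CREATE TABLE temp_tbl (date TEXT, sales INTEGER);
INSERT INTO temp_tbl
    SELECT a.date, b.sales FROM a JOIN b ON a.id = b.aid;
"""

# Engine Y side (cf. FIG. 12): read the intermediate table (step 84),
# aggregate and sort, then delete the table (step 87).
ENGINE_Y_SELECT = """
SELECT date, SUM(sales) FROM temp_tbl GROUP BY date ORDER BY date DESC
"""
ENGINE_Y_CLEANUP = "DROP TABLE temp_tbl"


def run_split_query(db_path):
    """Run the two query halves through separate connections to one shared file."""
    engine_x = sqlite3.connect(db_path)
    engine_x.executescript(ENGINE_X_SQL)
    engine_x.close()

    engine_y = sqlite3.connect(db_path)
    rows = engine_y.execute(ENGINE_Y_SELECT).fetchall()
    engine_y.execute(ENGINE_Y_CLEANUP)
    engine_y.commit()
    engine_y.close()
    return rows


def demo():
    """Populate a shared database with hypothetical rows and run the split query."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "shared.db")
        setup = sqlite3.connect(path)
        setup.executescript("""
            CREATE TABLE a (id INTEGER, date TEXT);
            CREATE TABLE b (aid INTEGER, sales INTEGER);
            INSERT INTO a VALUES (1, '2013-08-01'), (2, '2013-08-01'), (3, '2013-08-02');
            INSERT INTO b VALUES (1, 100), (2, 50), (3, 70), (3, 30);
        """)
        setup.close()
        return run_split_query(path)
```

Because both connections operate on the same database file, the intermediate result never has to be copied between engines; it is written once by the X side and read in place by the Y side.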

According to the query processing described above, by dividing the query and exploiting the fact that the query processing engines share the same shared database 40, the query can be executed faster without incurring extra overhead for data movement.

DESCRIPTION OF SYMBOLS: 1 ... distributed query processing apparatus, 2 ... client, 3 ... network, 4 ... database system, 10 ... multi-database management data, 11 ... database information, 12 ... engine information, 13 ... data update unit, 14 ... access management unit, 15 ... distributed query generation unit, 16 ... query execution control unit, 40 ... shared database, 41 ... query processing engine.

Claims

A distributed query processing apparatus in a data sharing database system in which a plurality of query processing engines share one database and process queries posted from a client, the apparatus comprising:
multi-database management data for managing database information about the database and engine information about characteristics of the query processing engines;
a distributed query generation unit that selects an appropriate query processing engine based on an analysis result of the query posted from the client, and decomposes and reconstructs the query; and
a query execution control unit that causes the appropriate query processing engine to execute processing of the query, receives a final execution result, and transfers the result to the client.

The distributed query processing device according to claim 1, wherein the query execution control unit creates an intermediate table of decomposed queries on the database when executing query processing.

A distributed query processing method for processing a query posted from a client in a data sharing database system in which a plurality of query processing engines share one database, the method comprising:
a query plan generation/acquisition/analysis procedure of generating or acquiring, based on the query posted from the client, a query plan indicating an execution plan of the query, and analyzing the query plan to create a processing flow to be executed in the query processing engines and a data flow of the tables to be read and the intermediate data;
a calculation procedure of calculating, using the processing flow and data flow created in the query plan generation/acquisition/analysis procedure and the database information of multi-database management data, the table sizes, the data size of the intermediate data in each process on the processing flow, and the resources required for each process;
an engine selection procedure of selecting a query processing engine for each process based on the calculated data sizes and required resources and on engine information of the multi-database management data; and
a query generation procedure of generating a query based on information on the query processing engine selected for each process.

The distributed query processing method according to claim 3, wherein, when the processing spans a plurality of query processing engines, the query generation procedure includes:
a temporary table creation procedure of generating a temporary table at the beginning of a series of query processes;
a write procedure of writing to the temporary table as the last process in each query processing engine; and
a temporary table deletion procedure at the end of the series of query processes.

A distributed query processing program that causes a computer to execute each procedure of the distributed query processing method according to claim 3 or 4.