JP2012108635A

JP2012108635A - Distributed memory database system, front database server, data processing method and program

Info

Publication number: JP2012108635A
Application number: JP2010255654A
Authority: JP
Inventors: Yuta Namiki; 悠太並木
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2010-11-16
Filing date: 2010-11-16
Publication date: 2012-06-07
Anticipated expiration: 2030-11-16
Also published as: JP5598279B2

Abstract

PROBLEM TO BE SOLVED: To provide a distributed memory database system and the like in which aggregate functions such as a total can be processed at high speed by reducing both the communication volume and the processing load. SOLUTION: A front database server 10 has a data structure conversion section 113 and an inquiry processing section 111. The data structure conversion section 113 divides table data input from the outside to produce a plurality of value ID tables and stores the value ID tables by distributing them to data nodes. The inquiry processing section 111, on the basis of a query that includes an aggregate function and is issued from an external client machine, asks the data nodes for the number of occurrences of a specific value ID in the table data, calculates the value of the aggregate function corresponding to the query from the numbers of occurrences of the specific value ID returned by each of the data nodes, and returns the calculated value to the client machine. The data structure conversion section produces the plurality of value ID tables individually for each of a plurality of data items designated beforehand as columns that may become an aggregation axis in the table data.

Description

The present invention relates to a distributed memory database system, a front database server, a data processing method, and a program, and more particularly to a distributed memory database system and the like capable of quickly executing processing for aggregate functions.

Systems built on computers above a certain scale, such as web services and business systems, cannot handle large volumes of data without a database management system (DBMS). In recent years, so-called cloud computing technology has also become established, in which a large number of computers connected via a network cooperate to perform processing as if they were one huge computer.

Among DBMSs, one that shares database processing among a large number of computers connected by a network is called a distributed memory database system. A distributed memory database system is particularly effective at speeding up processing in batch applications that process a large amount of data at once, and in the creation of data marts that extract what a specific department needs from the large amount of data a company handles.

This is described below. To keep the explanation simple, this specification illustrates only a very small number of data records and items. In practice, the illustrated processing is performed on data with enormous numbers of records and items.

FIG. 14 is an explanatory diagram showing the configuration of a general distributed memory database system 901. The distributed memory database system 901 is configured by connecting a front memory database server 910 (hereinafter referred to as the front DB server 910) and a plurality of data nodes 921 to 923 to each other via an internal network 930. FIG. 14 shows three data nodes 921 to 923, but any number of two or more may of course be used. The front DB server 910 is connected to a client machine 950 via an external network 940.

The client machine 950 is a computer that issues queries (processing requests) to the front DB server 910. The front DB server 910 and the data nodes 921 to 923 cooperate to perform the data processing based on a query, and the front DB server 910 returns the search result to the client machine 950. In doing so, the front DB server 910 divides the query issued by the client machine 950 among the data nodes 921 to 923 and aggregates the results from the data nodes 921 to 923.

In the front DB server 910, an inquiry processing unit 911, a data arrangement information management unit 912, and a data structure conversion unit 913 each run as computer programs that perform the respective functions described below.

The inquiry processing unit 911 receives a query issued by the client machine 950 and asks the data arrangement information management unit 912 where the data items to be processed by the query are located. Based on the answer obtained from the data arrangement information management unit 912, it divides the query from the client machine 950 into a query for each of the data nodes 921 to 923 and transmits the divided queries to the data nodes 921 to 923. It then collects the answers that the data nodes 921 to 923 give to the transmitted queries and returns them to the client machine 950.

The data arrangement information management unit 912 tells the inquiry processing unit 911 which of the data nodes 921 to 923 holds the data of the data item that the inquiry processing unit 911 asked about. The data structure conversion unit 913 divides the input table-structured data and stores it in the data nodes 921 to 923.

In each of the data nodes 921 to 923, an inquiry processing unit 914 runs as a computer program that performs processing such as searches based on the query divided by the inquiry processing unit 911 and returns the result to the front DB server 910.

FIG. 15 is an explanatory diagram showing an example of table data 960 input to the distributed memory database system 901 shown in FIG. 14. The table data 960 shown in FIG. 15 has data for three items: date 960a, store ID 960b, and sales 960c. The date 960a takes two distinct values, “August 10” and “August 11”.

The data structure conversion unit 913 divides the table data 960 by the values “A1”, “D3”, and “E1” of the store ID 960b, generates table data 961 to 963 for each store ID, and sends them to the data nodes 921 to 923, respectively, for storage. FIG. 16 is an explanatory diagram showing the table data 961 to 963 for each store ID that the data structure conversion unit 913 generates by dividing the table data 960 shown in FIG. 15.

The inquiry processing unit 911 accepts a query issued by the client machine 950 while the table data 961 to 963 for each store ID shown in FIG. 16 are stored in the data nodes 921 to 923, respectively. For example, when it receives the query shown in Equation 1 below, the inquiry processing unit 911 has each of the data nodes 921 to 923 calculate the total sales for its store ID from the table data 961 to 963 that it stores, collects the returned totals for each store ID, and returns them to the client machine 950.
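The store-ID case described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the sales figures are invented (only the two dates and the three store IDs appear in the text), the nodes are modeled as in-process lists, and the function names `partition_by_store`, `node_total`, and `totals_by_store` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows (date, store_id, sales). Only the dates and store IDs
# come from the text; the sales amounts are made up for illustration.
ROWS = [
    ("08-10", "A1", 100),
    ("08-10", "D3", 200),
    ("08-10", "E1", 300),
    ("08-11", "A1", 150),
    ("08-11", "D3", 200),
    ("08-11", "E1", 100),
]


def partition_by_store(rows):
    """Split the table so that each data node holds the rows of one store ID."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[row[1]].append(row)
    return dict(nodes)


def node_total(node_rows):
    """Runs on a data node: total sales over the rows that node holds."""
    return sum(sales for _, _, sales in node_rows)


def totals_by_store(nodes):
    """Runs on the front DB server: one number per node, no re-aggregation."""
    return {store_id: node_total(rows) for store_id, rows in nodes.items()}


if __name__ == "__main__":
    print(totals_by_store(partition_by_store(ROWS)))
```

Because the partition key and the aggregation key coincide, each node sends back a single number and the front DB server only has to collect them.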

Related technologies include the following. Patent Document 1 describes an information processing system that assigns common global dimension value numbers to data processed by a plurality of processing modules of a parallel computer, thereby sorting and aggregating data with less inter-processor communication. Patent Document 2 describes a grouping method that uses hash values to speed up the grouping of rows that share the values of one or more columns.

Patent Document 3 describes a data analysis system that speeds up data analysis on a plurality of computers by dividing data containing a plurality of analysis problems into layers. Patent Document 4 describes a database processing method that shortens the shutdown processing time of a system when data is distributed to and processed by a plurality of devices.

Patent Document 5 describes a data processing system that assigns label codes to data to speed up data manipulation by a plurality of devices. Non-Patent Document 1 describes the “bitmap index”, which can speed up exact-match searches in the Oracle (registered trademark) database, a widely used database.

Patent Document 1: Republished Patent WO 2005/041067
Patent Document 2: JP 2000-187668 A
Patent Document 3: JP 2006-107129 A
Patent Document 4: JP 2010-134583 A
Patent Document 5: JP H07-182368 A

Non-Patent Document 1: Paul Lane, “Using Bitmap Indexes in Data Warehouses”, in Oracle Database Data Warehousing Guide 11g Release 1, 2007, Oracle Corporation Japan [retrieved November 8, 2010], Internet <URL: http://otndnld.oracle.co.jp/document/products/oracle11g/111/doc_dvd/server.111/E05763-01/indexes.htm>

In a distributed memory database system in particular, it is important to minimize the communication between nodes in order to reduce processing cost and processing time. The example of the distributed memory database system 901 shown in FIGS. 14 to 16 uses a very small number of data records to keep the explanation simple, but in practice an enormous amount of data must be processed.

In the distributed memory database system 901 shown in FIGS. 14 to 16, the data is divided by “store ID” and stored in the data nodes 921 to 923. For an aggregation by store ID such as the one shown in Equation 1, each of the data nodes 921 to 923 can therefore calculate an aggregate function such as a total on its own. The inquiry processing unit 911 then only needs to put together the totals and other values received from the data nodes 921 to 923, so the aggregation can be performed at high speed.

However, aggregating by any column other than the one used as the basis for dividing the data (here, “store ID”) requires data to be exchanged among the data nodes 921 to 923 and between them and the front DB server 910. The aggregation therefore takes time, and the data exchange incurs communication costs.

In the example of the distributed memory database system 901 shown in FIGS. 14 to 16, when the query shown in Equation 2 below is received, the data nodes 921 to 923 cannot each calculate an aggregate function such as a total on their own. The result must be calculated in one of two ways. Either all the data is re-divided by “date” (first method), or the data nodes 921 to 923, which hold the data by “store ID”, each aggregate by “date” and send the results to the front DB server 910, which then calculates the total for each “date” (second method).

In the first method, all of the table data 961 to 963 must first be sent to the front DB server 910 and restored to the original table data 960, after which the table data re-divided by “date” must be sent to the data nodes 921 to 923 again. For example, suppose there are m data nodes, each row is l bytes, and the data is evenly distributed with n rows per day on each node for s stores (m, n, s, and l are natural numbers, and s = m for simplicity). Re-dividing this table data then generates communication of the data volume shown in Equation 3 below.

Beyond the communication volume, each individual data record must also be read and a decision made as to which data node it should be moved to, so the amount of processing required on the front DB server 910 and on each data node also increases.

In the second method, the totals obtained for each combination of “store ID and date” are transmitted from the data nodes 921 to 923 to the front DB server 910, which increases the communication volume to the front DB server 910. In addition, unlike the case of the query shown in Equation 1, a simple substitution is not enough on the front DB server 910. It must calculate the total for each “date” anew, so the amount of processing there also increases.
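The second method can be sketched as follows to show where the extra traffic and front-side work come from. The per-node tables and sales figures are invented for illustration, and `node_partial_sums`, `front_merge`, and `values_sent` are hypothetical names; the sketch assumes the data is already partitioned by store ID as in FIG. 16.

```python
from collections import Counter

# Hypothetical per-node tables, already partitioned by store ID.
# Each row is (date, sales); the amounts are made up for illustration.
NODES = {
    "A1": [("08-10", 100), ("08-11", 150)],
    "D3": [("08-10", 200), ("08-11", 200)],
    "E1": [("08-10", 300), ("08-11", 100)],
}


def node_partial_sums(node_rows):
    """Runs on a data node: subtotal per date for the one store it holds."""
    partial = Counter()
    for date, sales in node_rows:
        partial[date] += sales
    return dict(partial)


def front_merge(partials):
    """Runs on the front DB server: re-aggregate the subtotals by date."""
    merged = Counter()
    for partial in partials:
        for date, subtotal in partial.items():
            merged[date] += subtotal
    return dict(merged)


def values_sent(partials):
    """Number of (store ID, date) subtotals shipped to the front DB server."""
    return sum(len(partial) for partial in partials)


if __name__ == "__main__":
    partials = [node_partial_sums(rows) for rows in NODES.values()]
    print(front_merge(partials), values_sent(partials))
```

With three stores and two dates, six subtotals cross the network and the front DB server must add them up again, whereas the store-ID query needed only three values and no further arithmetic.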

A known approach to this problem is to prepare indexes so that aggregation along different aggregation axes is supported. Even in this case, however, the data to be calculated must still be transmitted to other computers for the aggregation operation, so the reduction in communication volume is small.

Patent Documents 1 and 4 to 5 describe techniques that assign an ID (or label) to each identical value within the same column and store, in each data node, data grouped on that basis (so-called range partitioning). Using these reduces the communication volume somewhat compared with transmitting the raw data itself. However, far from reducing the processing load on the front DB server 910, they increase it, because the IDs must be replaced with the actual values. The remaining Patent Documents 2 to 4 and Non-Patent Document 1 do not describe a technique that can solve this problem either.

An object of the present invention is to provide a distributed memory database system, a front database server, a data processing method, and a program that reduce the communication volume and the processing load and thereby enable aggregate functions such as a total to be processed at high speed.

To achieve the above object, a distributed memory database system according to the present invention is a distributed memory database system in which a front database server and a plurality of data nodes are connected to each other. The front database server has a data structure conversion unit and an inquiry processing unit. The data structure conversion unit divides table data input from the outside, generates a plurality of value ID tables in which each actual data value is replaced with a value ID, and stores them by distributing them to the data nodes. The inquiry processing unit, on the basis of a query that includes an aggregate function and is issued from an external client machine, asks each data node for the number of occurrences of a specific value ID in the table data, calculates the value of the aggregate function corresponding to the query from the numbers of occurrences of the specific value ID returned by the data nodes in response, and returns it to the client machine. When generating the plurality of value ID tables, the data structure conversion unit generates a plurality of value ID tables individually for each of a plurality of data items designated in advance as columns that can serve as an aggregation axis in the table data.

To achieve the above object, a front database server according to the present invention is a front database server that is connected to a plurality of data nodes to constitute a distributed memory database system. It has a data structure conversion unit and an inquiry processing unit. The data structure conversion unit divides table data input from the outside, generates a plurality of value ID tables in which each actual data value is replaced with a value ID, and stores them by distributing them to the data nodes. The inquiry processing unit, on the basis of a query that includes an aggregate function and is issued from an external client machine, asks each data node for the number of occurrences of a specific value ID in the table data, calculates the value of the aggregate function corresponding to the query from the numbers of occurrences of the specific value ID returned by the data nodes in response, and returns it to the client machine. When generating the plurality of value ID tables, the data structure conversion unit generates a plurality of value ID tables individually for each of a plurality of data items designated in advance as columns that can serve as an aggregation axis in the table data.

To achieve the above object, a data processing method according to the present invention is used in a distributed memory database system in which a front database server and a plurality of data nodes are connected to each other. The data structure conversion unit of the front database server accepts table data input from the outside, divides the input table data individually for each of the data items designated in advance as columns that can serve as an aggregation axis to generate a plurality of value ID tables in which the actual data is replaced with value IDs, and stores the generated value ID tables by distributing them to the data nodes. The inquiry processing unit of the front database server accepts a query that includes an aggregate function and is issued from an external client machine, asks each data node, on the basis of the accepted query, for the number of occurrences of a specific value ID in the table data, calculates the value of the aggregate function corresponding to the query from the numbers of occurrences of the specific value ID returned by the data nodes, and returns it to the client machine.

To achieve the above object, a data processing program according to the present invention is used in a distributed memory database system in which a front database server and a plurality of data nodes are connected to each other, and causes a computer provided in the front database server to execute the following procedures: a procedure of accepting table data input from the outside; a procedure of dividing the input table data individually for each of the data items designated in advance as columns that can serve as an aggregation axis, and generating a plurality of value ID tables in which the actual data is replaced with value IDs; a procedure of storing the generated value ID tables by distributing them to the data nodes; a procedure of accepting a query that includes an aggregate function and is issued from an external client machine; a procedure of asking each data node, on the basis of the accepted query, for the number of occurrences of a specific value ID in the table data; and a procedure of calculating the value of the aggregate function corresponding to the query from the numbers of occurrences of the specific value ID returned by the data nodes and returning it to the client machine.

As described above, the present invention is configured to divide the table data individually for each of the data items designated in advance as columns that can serve as an aggregation axis. For any column that can serve as an aggregation axis, processing for the aggregate function can therefore be performed inside each data node alone, with communication with other devices suppressed. This makes it possible to provide a distributed memory database system, a front database server, a data processing method, and a program with the excellent feature that aggregate functions such as a total can be processed at high speed with reduced communication volume and processing load.

FIG. 1 is an explanatory diagram showing the configuration of a distributed memory database system according to a first embodiment of the present invention.
FIG. 2 is an explanatory diagram showing an example of table data input to the distributed memory database system described in FIG. 1.
FIG. 3 is an explanatory diagram showing the value lists and value ID tables that the data structure conversion unit creates from the table data shown in FIG. 2.
FIG. 4 is a flowchart showing the process in which the data structure conversion unit shown in FIG. 1 divides the data shown in FIG. 2 as shown in FIG. 3 and distributes it to the data nodes for storage.
FIG. 5 is an explanatory diagram showing an example of the data arrangement information shown in FIG. 1 that corresponds to the data shown in FIGS. 2 and 3.
FIG. 6 is a flowchart showing the processing performed in the distributed memory database system described in FIG. 1 for the query shown in Equation 4.
FIG. 7 is an explanatory diagram showing a table of the number of occurrences for each “value ID” returned from a data node to the front DB server in the process shown in step S403 (Equation 4) of FIG. 6.
FIG. 8 is an explanatory diagram showing result data indicating the total sales for each date, returned to the client machine as a result of the process shown in step S406 (Equation 5) of FIG. 6.
FIG. 9 is an explanatory diagram showing a table of the number of occurrences for each “value ID” returned from a data node to the front DB server in the process shown in step S403 (Equation 6) of FIG. 6.
FIG. 10 is an explanatory diagram showing result data indicating the total sales for each store ID, returned to the client machine as a result of the process shown in step S406 (Equation 7) of FIG. 6.
FIG. 11 is an explanatory diagram showing the configuration of a distributed memory database system according to a second embodiment of the present invention.
FIG. 12 is an explanatory diagram showing an example of table data input to the distributed memory database system described in FIG. 11.
FIG. 13 is an explanatory diagram showing an example of the value ID tables that the data structure conversion unit creates from the table data shown in FIG. 12.
FIG. 14 is an explanatory diagram showing the configuration of a general distributed memory database system.
FIG. 15 is an explanatory diagram showing an example of table data input to the distributed memory database system shown in FIG. 14.
FIG. 16 is an explanatory diagram showing the table data for each store ID that the data structure conversion unit generates by dividing the table data shown in FIG. 15.

(First embodiment)
The configuration of the first embodiment of the present invention is described below with reference to the attached FIGS. 1 to 3.
The basic content of the present embodiment is described first, followed by more specific content.
The distributed memory database system 1 according to the present embodiment is configured by connecting a front database server (front DB server 10) and a plurality of data nodes 21 to 23 to each other. The front database server 10 has a data structure conversion unit 113 and an inquiry processing unit 111. The data structure conversion unit 113 divides table data input from the outside, generates a plurality of value ID tables 221 to 222 and 231 to 233 in which each actual data value is replaced with a value ID, and stores them by distributing them to the data nodes. The inquiry processing unit 111, on the basis of a query that includes an aggregate function and is issued from an external client machine 50, asks each data node for the number of occurrences of a specific value ID in the table data, calculates the value of the aggregate function corresponding to the query from the numbers of occurrences of the specific value ID returned by the data nodes in response, and returns it to the client machine. When generating these value ID tables, the data structure conversion unit 113 generates a plurality of value ID tables individually for each of the data items designated in advance as columns that can serve as an aggregation axis in the table data.
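The interplay of value ID tables and occurrence counts can be sketched as follows. This is a minimal illustration under several assumptions that the text does not confirm: the sales figures are invented, each value ID table is assumed to keep all columns in encoded form, a value ID is taken to be the index of the value in its sorted value list, and every function name is hypothetical. The data nodes are modeled as in-process tables.

```python
from collections import Counter

COLUMNS = ("date", "store_id", "sales")
AXES = ("date", "store_id")  # columns designated in advance as aggregation axes

# Hypothetical table data; only the dates and store IDs come from the text.
ROWS = [
    ("08-10", "A1", 100), ("08-10", "D3", 200), ("08-10", "E1", 300),
    ("08-11", "A1", 150), ("08-11", "D3", 200), ("08-11", "E1", 100),
]


def build_value_lists(rows):
    """One value list per column; the list index serves as the value ID."""
    return {col: sorted({row[i] for row in rows}) for i, col in enumerate(COLUMNS)}


def build_value_id_tables(rows, value_lists):
    """For every aggregation axis, split the ID-encoded rows by that axis."""
    encoded = [
        tuple(value_lists[col].index(v) for col, v in zip(COLUMNS, row))
        for row in rows
    ]
    tables = {}
    for axis in AXES:
        a = COLUMNS.index(axis)
        for enc in encoded:
            tables.setdefault((axis, enc[a]), []).append(enc)
    return tables


def node_count(table, column):
    """Runs on a data node: occurrences of each value ID of `column`."""
    c = COLUMNS.index(column)
    return Counter(enc[c] for enc in table)


def sum_by_axis(tables, value_lists, axis, target="sales"):
    """Runs on the front DB server: total = sum of (count x actual value)."""
    result = {}
    for (table_axis, axis_id), table in tables.items():
        if table_axis != axis:
            continue
        counts = node_count(table, target)
        result[value_lists[axis][axis_id]] = sum(
            value_lists[target][vid] * n for vid, n in counts.items()
        )
    return result


if __name__ == "__main__":
    lists = build_value_lists(ROWS)
    tables = build_value_id_tables(ROWS, lists)
    print(sum_by_axis(tables, lists, "date"))
    print(sum_by_axis(tables, lists, "store_id"))
```

In this sketch the same input yields two tables split by date and three split by store ID, so a total along either axis needs only per-table occurrence counts and no exchange of rows between nodes.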

When generating the value ID tables 221 to 222 and 231 to 233, the data structure conversion unit 113 of the front database server 10 also replaces the actual value with a value ID for each of the data items designated in advance as columns that can serve as an aggregation axis or an aggregation target in the table data, and generates value lists 211 to 213 that show the correspondence between value IDs and actual values. The front database server 10 further has a data arrangement information management unit 112 that records, in storage means provided in advance, which of the data nodes each value list and value ID table was distributed to.
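The bookkeeping done by the data arrangement information management unit can be pictured with a small placement map. The sketch below is an assumption-heavy illustration: the round-robin policy, the node names, the table keys, and the functions `assign_round_robin` and `locate` are all hypothetical, since this passage does not state how tables are assigned to nodes.

```python
from itertools import cycle

# Hypothetical node names and table keys; (axis, axis value) identifies
# one value ID table in this sketch.
NODES = ["node21", "node22", "node23"]
TABLE_KEYS = [
    ("date", "08-10"), ("date", "08-11"),
    ("store_id", "A1"), ("store_id", "D3"), ("store_id", "E1"),
]


def assign_round_robin(table_keys, nodes):
    """Record which node receives each table (round-robin is an assumption)."""
    return dict(zip(table_keys, cycle(nodes)))


def locate(placement, axis):
    """Return {axis value: node} for every value ID table of the given axis."""
    return {value: node for (a, value), node in placement.items() if a == axis}


if __name__ == "__main__":
    placement = assign_round_robin(TABLE_KEYS, NODES)
    print(locate(placement, "date"))
```

Given such a map, the inquiry processing unit would only need to contact the nodes that `locate` returns for the axis named in the query.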

Furthermore, when generating the value ID tables 221 to 222 and 231 to 233, the data structure conversion unit 113 of the front database server 10 sorts the actual values in order of magnitude before replacing them with value IDs.
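Sorting before encoding can be sketched as follows. This is a minimal illustration that assumes a value ID is simply the position of a value in the sorted list of distinct values; the function names are hypothetical. Under that assumption, comparing two value IDs gives the same order as comparing the actual values.

```python
import bisect


def build_value_list(values):
    """Sort the distinct actual values; the list index is the value ID."""
    return sorted(set(values))


def to_value_id(value_list, value):
    """Look up the value ID by binary search over the sorted value list."""
    i = bisect.bisect_left(value_list, value)
    if i == len(value_list) or value_list[i] != value:
        raise KeyError(value)
    return i


def to_value(value_list, value_id):
    """Map a value ID back to the actual value."""
    return value_list[value_id]


if __name__ == "__main__":
    sales = [300, 100, 200, 100]  # hypothetical sales column
    value_list = build_value_list(sales)
    print([to_value_id(value_list, v) for v in sales])
```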

With the above configuration, the distributed memory database system 1 can reduce the amount of communication and the amount of processing, and can process aggregate functions such as sums at high speed.
This is described in more detail below.

FIG. 1 is an explanatory diagram showing the configuration of the distributed memory database system 1 according to the first embodiment of the present invention. The distributed memory database system 1 is configured by connecting a front database server 10 (hereinafter referred to as the front DB server 10) and a plurality of data nodes 21 to 23 to each other via an internal network 30. FIG. 1 shows three data nodes 21 to 23, but any number of two or more may be used.

The internal network 30 is connected via an external network 40 to a client machine 50, which is an external computer device. Any network type and protocol may be used for the internal network 30 and the external network 40. The front DB server 10 can accept operations from the client machine 50 through the internal network 30 and the external network 40.

The client machine 50 issues a query (processing request) to the front DB server 10. The front DB server 10 and the data nodes 21 to 23 cooperate to perform the data processing based on this query, and the front DB server 10 returns the result to the client machine 50. In doing so, the front DB server 10 splits the query issued from the client machine 50 into queries for the individual data nodes 21 to 23 and aggregates the results returned from them.

The front DB server 10 is a computer device that includes main arithmetic control means 101, storage means 102, and communication means 103. The main arithmetic control means 101 is a CPU (Central Processing Unit) that executes computer programs, and the storage means 102 is a main storage device, such as a RAM (Random Access Memory), that stores the data the main arithmetic control means 101 is working on. The communication means 103 performs data communication with other computers via the internal network 30 and the external network 40.

In the main arithmetic control means 101, the inquiry processing unit 111, the data arrangement information management unit 112, and the data structure conversion unit 113 are each configured to execute, as computer programs, the functions described later. The storage means 102 stores the data arrangement information 121 described later.

The inquiry processing unit 111 accepts a query issued by the client machine 50 and asks the data arrangement information management unit 112 where the data items to be processed by this query are located. Based on the answer from the data arrangement information management unit 112, it splits the query from the client machine 50 into one query per data node 21 to 23 and transmits each split query to the corresponding data node. It then collects the answers from the data nodes 21 to 23 to the transmitted queries and returns the combined result to the client machine 50.

The data arrangement information management unit 112 refers to the data arrangement information 121 and tells the inquiry processing unit 111 which of the data nodes 21 to 23 holds the data of the data item the inquiry processing unit 111 asked about.

As described below, when table-structured data is input to the system, the data structure conversion unit 113 converts it into the data structure specific to the present embodiment, divides it, and stores the pieces in the data nodes 21 to 23.

Each of the data nodes 21 to 23 is, like the front DB server 10, configured as a general computer device. The data nodes all have the same hardware and software configuration and differ only in the contents they store. FIG. 1 therefore shows the detailed configuration only for the data node 21. Like the front DB server 10, the data node 21 is a computer device that includes main arithmetic control means 201, storage means 202, and communication means 203.

In the main arithmetic control means 201, an inquiry processing unit 204 is configured to execute, as a computer program, the functions described later. The storage means 202 stores the value lists 211 to 213 and the value ID tables 221 to 222 and 231 to 233 described later. Of the value lists 211 to 213 and value ID tables 221 to 222 and 231 to 233 generated and divided by the front DB server 10, the inquiry processing unit 204 stores those assigned to its own node in the storage means 202, performs processing such as searches on them, and returns the results to the front DB server 10.

FIG. 2 is an explanatory diagram showing an example of the table data 210 input to the distributed memory database system 1 described with reference to FIG. 1. This data may be input from the client machine 50 or directly from the front DB server 10. The data structure conversion unit 113 converts the input table data 210 into the format described below and distributes it to the data nodes 21 to 23 for storage.

The table data 210 shown in FIG. 2 has three data items: date 210a, store ID 210b, and sales 210c. Of these, date 210a and store ID 210b are designated in advance as "columns that can be aggregation axes" (hereinafter referred to as reference columns), and sales 210c is designated in advance as a "column that can be an aggregation target" (hereinafter referred to as a target column). The contents of the table data 210 are the same as those of the table data 910 in FIG. 15.

FIG. 3 is an explanatory diagram showing the value lists 211 to 213 and the value ID tables 221 to 222 and 231 to 233 that the data structure conversion unit 113 creates from the table data 210 shown in FIG. 2. Among the items designated as reference columns in the table data 210, date 210a takes two distinct values, "August 10" and "August 11", and store ID 210b takes three distinct values, "A1", "D3", and "E1".

The data structure conversion unit 113 therefore sorts the unique values in each column in ascending order, assigns each value an identifying number (value ID) in order from the top, and creates three value lists: the "date" value list 211, the "store ID" value list 212, and the "sales" value list 213.

The "date" value list 211 indicates the correspondence between "date" value IDs and dates. "Date" value ID "0" corresponds to the date "August 10", and "date" value ID "1" corresponds to the date "August 11".

The "store ID" value list 212 indicates the correspondence between "store ID" value IDs and store IDs. "Store ID" value ID "0" corresponds to store ID "A1", value ID "1" to store ID "D3", and value ID "2" to store ID "E1".

The "sales" value list 213 indicates the correspondence between "sales" value IDs and sales values. "Sales" value IDs "0" to "4" correspond to sales of "800", "1000", "1200", "4800", and "12000", respectively.
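The value list construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `build_value_list` and `encode` are hypothetical, and the row order of the sample column is made up (only the distinct values come from the text).

```python
def build_value_list(column):
    """Return the sorted unique values; the index of each value is its value ID."""
    return sorted(set(column))


def encode(column, value_list):
    """Replace each actual value in the column with its value ID."""
    value_id = {value: i for i, value in enumerate(value_list)}
    return [value_id[value] for value in column]


# Hypothetical row order; the distinct sales values are those of value list 213.
sales = [1200, 800, 12000, 1000, 4800, 1000]
sales_list = build_value_list(sales)
sales_ids = encode(sales, sales_list)

print(sales_list)  # [800, 1000, 1200, 4800, 12000]
print(sales_ids)   # [2, 0, 4, 1, 3, 1]
print(build_value_list(["E1", "A1", "D3", "A1"]))  # ['A1', 'D3', 'E1']
```

Because the list is sorted before IDs are assigned, comparing two value IDs of the same column gives the same result as comparing the actual values.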

The data structure conversion unit 113 then replaces the values in the table data 210 with their value IDs, divides the table data 210 by each value of date 210a and by each value of store ID 210b, and creates the value ID tables 221 to 222 and 231 to 233. The value ID table 221 shows the correspondence between "store ID" value IDs and "sales" value IDs for "date" value ID "0". The value ID table 222 shows the same correspondence for "date" value ID "1".

The value ID table 231 shows the correspondence between "date" value IDs and "sales" value IDs for "store ID" value ID "0". The value ID table 232 shows the same correspondence for "store ID" value ID "1", and the value ID table 233 shows it for "store ID" value ID "2".
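The division into per-value tables can be sketched as below. The rows are hypothetical value-ID rows chosen for illustration (they are not the rows of FIG. 2), and `split_by_reference_column` is an assumed helper name. The same rows are split once per reference column, which is what lets a query grouped by either column be answered from pre-divided tables.

```python
from collections import defaultdict


def split_by_reference_column(rows, reference_column):
    """Group value-ID rows by one reference column.

    The key column is dropped from each stored row because its value ID
    is implied by the table the row ends up in.
    """
    tables = defaultdict(list)
    for row in rows:
        rest = {k: v for k, v in row.items() if k != reference_column}
        tables[row[reference_column]].append(rest)
    return dict(tables)


# Hypothetical value-ID rows (illustrative only).
rows = [
    {"date": 0, "store": 0, "sales": 1},
    {"date": 0, "store": 2, "sales": 4},
    {"date": 1, "store": 0, "sales": 0},
    {"date": 1, "store": 1, "sales": 3},
]

by_date = split_by_reference_column(rows, "date")    # one table per "date" value ID
by_store = split_by_reference_column(rows, "store")  # one table per "store" value ID
print(by_date[0])  # [{'store': 0, 'sales': 1}, {'store': 2, 'sales': 4}]
```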

The data structure conversion unit 113 distributes the value lists 211 to 213 and the value ID tables 221 to 222 and 231 to 233 created as described above to the data nodes 21 to 23 for storage. In the example shown in FIG. 3, the data structure conversion unit 113 stores the value ID table 222, the value ID table 231, and the value list 211 in the storage means 202 of the data node 21. It stores the value ID table 221, the value ID table 232, and the value list 212 in the storage means 202 of the data node 22, and the value ID table 233 and the value list 213 in the storage means 202 of the data node 23.

For this processing, the data structure conversion unit 113 may use any division method, provided that the data volume and the processing load are not heavily skewed toward one particular data node among the data nodes 21 to 23. It is also assumed that, when the data structure of the table data 210 shown in FIG. 2 is designed, the operator has entered in advance the attribute of each column, such as reference column or target column. The data structure conversion unit 113 creates a value list for every data column set as a "reference column" or a "target column", and divides the table data 210 as shown above for every data column set as a "reference column".
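The text leaves the placement method open. As one possible illustration only, a greedy "largest item to the least-loaded node" heuristic keeps the per-node volume reasonably even. This heuristic is not taken from the patent, and the item names, sizes, and node names below are hypothetical.

```python
import heapq


def place(items, node_names):
    """Assign items to nodes, largest first, always to the least-loaded node.

    items: mapping of item name to size (e.g. row count or bytes).
    Returns a mapping of node name to the list of items placed on it.
    """
    heap = [(0, name) for name in node_names]  # (current load, node name)
    heapq.heapify(heap)
    placement = {name: [] for name in node_names}
    for item, size in sorted(items.items(), key=lambda kv: -kv[1]):
        load, node = heapq.heappop(heap)
        placement[node].append(item)
        heapq.heappush(heap, (load + size, node))
    return placement


# Hypothetical sizes for the five value ID tables.
sizes = {"t221": 3, "t222": 3, "t231": 2, "t232": 2, "t233": 2}
placement = place(sizes, ["node21", "node22", "node23"])
loads = {n: sum(sizes[i] for i in its) for n, its in placement.items()}
print(loads)
```

Any other scheme, including the hand-chosen placement of FIG. 3, is equally valid as long as no node ends up with a disproportionate share.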

FIG. 4 is a flowchart showing the processing in which the data structure conversion unit 113 shown in FIG. 1 divides the data shown in FIG. 2 as shown in FIG. 3 and distributes it to the data nodes 21 to 23 for storage. First, user input defining the data structure of the table data 210 is accepted (step S301). At this time, the user also specifies which data items are "reference columns" and which are "target columns".

Next, input of the table data 210 is accepted from the user (step S302). When this is complete, one of the reference columns is selected (steps S303 to S304), the values of the data columns are replaced with value IDs (step S305), and the table is divided by each value of that reference column (step S306). The data structure conversion unit 113 repeats steps S305 to S306 for all reference columns and, at the same time, creates the value lists 211 to 213 indicating the correspondence between the replaced values and their value IDs.

When this processing has been completed for all reference columns, the resulting value lists and value ID tables are transmitted to the data nodes 21 to 23 and stored there (step S307).

FIG. 5 is an explanatory diagram showing an example of the data arrangement information 121 of FIG. 1 that corresponds to the data shown in FIGS. 2 and 3. The data arrangement information 121 associates a data node name 121a, which is the computer name of each of the data nodes 21 to 23, with stored contents 121b, which identify the value lists 211 to 213 and value ID tables 221 to 222 and 231 to 233 stored on that computer.

The data arrangement information 121 is not limited to the form shown in FIG. 5. Any data format may be used as long as it identifies which of the data nodes 21 to 23 stores the data used in a calculation and the actual values corresponding to the value IDs.
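As a rough sketch of one such format, the placement of the FIG. 3 example can be held in a plain mapping from node name to stored contents. The node names and item labels below are hypothetical identifiers; only the placement itself follows the example in the text.

```python
# Placement taken from the FIG. 3 example; the string labels are hypothetical.
DATA_ARRANGEMENT = {
    "node21": {"value_id_table_222", "value_id_table_231", "value_list_211"},
    "node22": {"value_id_table_221", "value_id_table_232", "value_list_212"},
    "node23": {"value_id_table_233", "value_list_213"},
}


def nodes_holding(items):
    """Return a mapping {item: node name} for the requested items."""
    location = {}
    for node, contents in DATA_ARRANGEMENT.items():
        for item in contents:
            if item in items:
                location[item] = node
    return location


# Which nodes must be asked for a query grouped by date?
date_tables = {"value_id_table_221", "value_id_table_222"}
print(nodes_holding(date_tables))
```

With this placement, a lookup for the date-specific tables returns only the first two nodes, so the third node does not need to be contacted for a query grouped by date.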

(Sales aggregation by date)
The following describes the processing performed by the front DB server 10 and the data nodes 21 to 23 when the client machine 50 issues a query (SQL statement) to the front DB server 10 against the data distributed and stored in the data nodes 21 to 23 as described above.

Equation 4 shown below is an example of a query that the client machine 50 issues to the front DB server 10 for the example data shown in FIGS. 2 and 3. It is a query that obtains the total sales for each date from the table data 210.

FIG. 6 is a flowchart showing the processing performed in the distributed memory database system 1 of FIG. 1 in response to the query shown in Equation 4. On receiving the query of Equation 4, the inquiry processing unit 111 of the front DB server 10 has the data arrangement information management unit 112 refer to the data arrangement information 121, confirms that value ID tables divided by "date" exist for the "sales table" on the data nodes 21 to 23, and identifies the node on which each divided table resides (step S401).

The inquiry processing unit 111 of the front DB server 10 then issues an inquiry to the inquiry processing units 204 of the data nodes 21 to 23, asking them to count, in the date-specific value ID tables 221 to 222, the number of occurrences of each value ID in the "sales" column (step S402). The data node 23 does not store any date-specific value ID table, so it is excluded from this inquiry.

On receiving the inquiry, the inquiry processing units 204 of the data nodes 21 to 22 calculate the number of occurrences of each value ID in the "sales" column of the date-specific value ID tables 221 to 222 and return the counts to the front DB server 10 (step S403). FIG. 7 is an explanatory diagram showing the tables 241 to 242 of occurrence counts per value ID that the data nodes 21 to 22 return to the front DB server 10 in step S403 of FIG. 6 for the query of Equation 4.

On receiving these counts, the inquiry processing unit 111 of the front DB server 10 acquires from the data nodes 21 to 23 the value lists 211 to 213 corresponding to the value IDs of "date" and "sales" (steps S404 to S405). Instead of acquiring the value lists 211 to 213 themselves, steps S404 to S405 may acquire from the data nodes 21 to 23 only the values corresponding to the value IDs.

The inquiry processing unit 111 of the front DB server 10 then applies the actual "sales" values to the occurrence counts per value ID of the "sales" column returned in step S403; that is, it multiplies each occurrence count by the sales value that the value ID stands for and adds up the products. It thereby calculates the actual sales amounts as result data 243 and returns them to the client machine 50 (step S406).
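Steps S403 and S406 can be sketched as follows, under stated assumptions: the function names are hypothetical, the per-date value ID table is made-up sample data, and only the sales value list (800, 1000, 1200, 4800, 12000) comes from the text. The data node returns nothing but a small count table, and the front server turns it into a sum.

```python
from collections import Counter


def count_value_ids(value_id_table, column):
    """Data node side (step S403): count occurrences of each value ID in a column."""
    return Counter(row[column] for row in value_id_table)


def sum_from_counts(counts, value_list):
    """Front server side (step S406): SUM = sum of (occurrences x actual value)."""
    return sum(n * value_list[value_id] for value_id, n in counts.items())


sales_value_list = [800, 1000, 1200, 4800, 12000]  # value list 213 from the text

# Hypothetical value ID table for one date (illustrative rows only).
table_for_one_date = [
    {"store": 0, "sales": 1},
    {"store": 1, "sales": 1},
    {"store": 2, "sales": 4},
]

counts = count_value_ids(table_for_one_date, "sales")  # value ID 1 twice, value ID 4 once
total = sum_from_counts(counts, sales_value_list)
print(total)  # 14000  (2 * 1000 + 1 * 12000)
```

The count table has at most one entry per distinct value, so the amount of data sent back does not grow with the number of rows when values repeat.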

Equation 5 shown below is the calculation actually performed in step S406 of FIG. 6 for the query of Equation 4 on the example data shown in FIGS. 2 and 3. FIG. 8 is an explanatory diagram showing the result data 243, the total sales for each date, that is returned to the client machine 50 as a result of step S406 of FIG. 6 (Equation 5).

(Sales aggregation by store ID)
The processing described above applies equally when the client machine 50 issues, for example, the query shown in Equation 6 below; it is handled by the same operation shown in FIG. 6. This is a query that obtains the total sales for each store ID from the table data 210.

In this case, the inquiry processing unit 111 has the data arrangement information management unit 112 refer to the data arrangement information 121, confirms that value ID tables divided by "store ID" exist for the "sales table" on the data nodes 21 to 23, and identifies the node on which each divided table resides (step S401).

The inquiry processing unit 111 of the front DB server 10 then issues an inquiry to the inquiry processing units 204 of the data nodes 21 to 23, asking them to count, in the store-ID-specific value ID tables 231 to 233, the number of occurrences of each value ID in the "sales" column (step S402).

On receiving the inquiry, the inquiry processing units 204 of the data nodes 21 to 23 calculate the number of occurrences of each value ID in the "sales" column of the store-ID-specific value ID tables 231 to 233 and return the counts to the front DB server 10 (step S403). FIG. 9 is an explanatory diagram showing the tables 251 to 253 of occurrence counts per value ID that the data nodes 21 to 23 return to the front DB server 10 in step S403 of FIG. 6 for the query of Equation 6.

On receiving these counts, the inquiry processing unit 111 of the front DB server 10 acquires from the data nodes 21 to 23 the value lists 211 to 213 corresponding to the value IDs of "store ID" and "sales" (steps S404 to S405). Instead of acquiring the value lists 211 to 213 themselves, steps S404 to S405 may acquire from the data nodes 21 to 23 only the values corresponding to the value IDs.

The inquiry processing unit 111 of the front DB server 10 then applies the actual "sales" values to the occurrence counts per value ID of the "sales" column returned in step S403, calculates the actual sales amounts as result data 254, and returns them to the client machine 50 (step S406).

Equation 7 shown below is the calculation actually performed in step S406 of FIG. 6 for the query of Equation 6 on the example data shown in FIGS. 2 and 3. FIG. 10 is an explanatory diagram showing the result data 254, the total sales for each store ID, that is returned to the client machine 50 as a result of step S406 of FIG. 6 (Equation 7).

(Aggregation other than sums)
The above describes the processing performed when the client machine 50 issues to the front DB server 10 a query for the sum (SUM) of a target column over the data distributed and stored in the data nodes 21 to 23. The same system can also obtain other aggregates, for example the minimum (MIN), the maximum (MAX), the number of occurrences (COUNT), and the average (AVG).

When a query for the minimum (MIN) or maximum (MAX) is received, the inquiry processing unit 204 of each data node selects, from the value ID tables 221 to 222 and 231 to 233 stored on the data nodes 21 to 23, the entry whose value ID is smallest or largest and returns it to the inquiry processing unit 111 of the front DB server 10. As in step S406 of FIG. 6, the inquiry processing unit 111 then uses the value lists 211 to 213 to look up the value corresponding to the returned value ID and returns that value to the client machine 50.

As described above, the value lists 211 to 213 are sorted in ascending order of value and the value IDs are assigned in that order. The inquiry processing unit 204 of each data node 21 to 23 can therefore determine that the entry with the largest or smallest value ID is the maximum or minimum without referring to the actual values in the value lists 211 to 213.
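A minimal sketch of this MIN/MAX shortcut follows. The helper name `extreme_value_ids` and the sample rows are hypothetical; the sales value list is the one given in the text. The data node compares value IDs only, and the front server performs a single lookup per result.

```python
def extreme_value_ids(value_id_table, column):
    """Data node side: smallest and largest value ID in a column.

    No value list lookup is needed because value IDs were assigned
    in ascending order of the actual values.
    """
    ids = [row[column] for row in value_id_table]
    return min(ids), max(ids)


sales_value_list = [800, 1000, 1200, 4800, 12000]  # value list 213 from the text

# Hypothetical value ID table (illustrative rows only).
table = [
    {"store": 0, "sales": 1},
    {"store": 1, "sales": 3},
    {"store": 2, "sales": 0},
]

min_id, max_id = extreme_value_ids(table, "sales")
# Front server side: convert the returned value IDs into actual values.
print(sales_value_list[min_id], sales_value_list[max_id])  # 800 4800
```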

When a query for the number of occurrences (COUNT) is received, the value ID tables 221 to 222 and 231 to 233 on the data nodes 21 to 23 hold the occurrences of each value ID, so the inquiry processing unit 111 of the front DB server 10 has each data node 21 to 23 calculate the counts. The inquiry processing unit 111 then receives the counts returned from the data nodes 21 to 23, converts the value IDs into actual values as in step S406 of FIG. 6, and returns the result to the client machine 50.

When a query for the average (AVG) is received, the processing is the same as steps S401 to S406 of FIG. 6. The only difference is that, in step S406, the inquiry processing unit 111 of the front DB server 10 converts each value ID into its actual value and then calculates the average before returning it to the client machine 50.
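COUNT and AVG can be derived from the same per-value-ID count table that a data node returns for SUM. The sketch below assumes a non-empty count table and uses hypothetical function names and counts; the sales value list is the one given in the text.

```python
def count_from_counts(counts):
    """COUNT: total number of rows, i.e. the sum of all occurrence counts."""
    return sum(counts.values())


def avg_from_counts(counts, value_list):
    """AVG: (sum of occurrences x actual value) / (total occurrences).

    Assumes counts is non-empty; an empty table would need separate handling.
    """
    total = sum(n * value_list[value_id] for value_id, n in counts.items())
    return total / count_from_counts(counts)


sales_value_list = [800, 1000, 1200, 4800, 12000]  # value list 213 from the text
counts = {1: 2, 4: 1}  # hypothetical reply: value ID 1 twice, value ID 4 once

print(count_from_counts(counts))                  # 3
print(avg_from_counts(counts, sales_value_list))  # 14000 / 3
```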

(Overall operation of the first embodiment)
Next, the overall operation of the above embodiment is described. The data processing method according to the present embodiment is used in the distributed memory database system 1, in which a front database server (front DB server 10) and a plurality of data nodes 21 to 23 are connected to each other. The data structure conversion unit of the front database server accepts table data input from the outside (FIG. 4, steps S301 to S302). For each data item designated in advance as a column that can be an aggregation axis, the data structure conversion unit individually divides the input table data and generates a plurality of value ID tables in which the actual data is replaced with value IDs (FIG. 4, steps S304 to S306). The data structure conversion unit then stores the generated value ID tables distributed across the data nodes (FIG. 4, step S307). The inquiry processing unit of the front database server accepts a query that includes an aggregate function and is issued from an external client machine and, based on the accepted query, asks each data node for the number of occurrences of specific value IDs in the table data (FIG. 6, steps S401 to S402). From the occurrence counts of the specific value IDs returned by the data nodes, the inquiry processing unit calculates the value of the aggregate function corresponding to the query and returns it to the client machine (FIG. 6, steps S404 to S406).

Each of the above operation steps may be implemented as a computer-executable program and executed by the front DB server 10, which is the computer that directly executes those steps. The program may be recorded on a non-transitory recording medium such as a DVD, a CD, or a flash memory. In that case, the program is read from the recording medium and executed by a computer.
Through this operation, the present embodiment provides the following effects.

In the present embodiment, the table data 210 is divided for every reference column, that is, every column that can be an aggregation axis, and distributed to the data nodes 21 to 23 for storage. As a result, whichever reference column a query is issued for, a single one of the data nodes 21 to 23 can carry out the aggregation for the aggregate function without exchanging data with other devices, and only the aggregation result needs to be transmitted to the front DB server 10. Furthermore, the processing on the front DB server 10 only has to replace value IDs with actual values, so it can be performed at high speed.

The data held by each data node 21 to 23 is stored after being converted into a representation based on value lists and value IDs, which reduces the volume of data stored on each data node. The reduction is especially pronounced when many values are duplicated. It also keeps to a minimum the increase in data volume caused by dividing the table data 210 for a plurality of reference columns.

(Second Embodiment)
In the second embodiment of the present invention, the data structure conversion unit of the front database server (front DB server 510) stores each data item designated in advance as a column that can be neither an aggregation axis nor an aggregation target in the table data together with exactly one kind of the value ID tables for the columns that can serve as aggregation axes.

With this configuration, the same effects as those of the first embodiment can be obtained even for data that includes data items that can be neither an aggregation axis nor an aggregation target.
This is described in more detail below.

FIG. 11 is an explanatory diagram showing the configuration of a distributed memory database system 501 according to the second embodiment of the present invention. The distributed memory database system 501 is configured by connecting a front DB server 510 and a plurality of data nodes 521 to 523 to each other via the same internal network 30 as in the first embodiment. The external network 40 and the client machine 50 are also the same as in the first embodiment.

The front DB server 510 and the data nodes 521 to 523 have the same hardware configuration as the front DB server 10 and the data nodes 21 to 23 of the first embodiment. In the front DB server 510, the query processing unit 111 and the data structure conversion unit 113 that run on the main arithmetic control means 101 are replaced with a query processing unit 511 and a data structure conversion unit 513, respectively. The data nodes 521 to 523 differ from the first embodiment in the value ID tables 621 to 622 and 631 to 632 that they store.

FIG. 12 is an explanatory diagram showing an example of table data 610 input to the distributed memory database system 501 described with reference to FIG. 11. The table data 610 has data items such as item A 610a, item B 610b, item C 610c, and item D 610d. Of these, item A 610a and item B 610b are designated as reference columns and item D 610d is designated as a target column, whereas item C 610c is neither a reference column nor a target column. Item A 610a is also designated as the primary key.

FIG. 13 is an explanatory diagram showing an example of the value ID tables 621 to 622 and 631 to 632 that the data structure conversion unit 513 creates from the table data 610 shown in FIG. 12. When the table data 610 contains an item C 610c that is neither a reference column nor a target column, as described above, the data structure conversion unit 513 keeps the value IDs of item C 610c only in the value ID tables 621 to 622, which are organized by value ID of item A 610a, and does not include them in the value ID tables 631 to 632, which are organized by value ID of item B 610b. Value lists are created for all of items A to D in the same manner as in the first embodiment and are distributed to and stored in the data nodes 521 to 523.

For example, to find the value or value ID of item C 610c that corresponds to a specific value or value ID of item B 610b, the value ID of item A 610a corresponding to that value ID of item B 610b is first identified from the value ID tables 631 to 632, which are organized by value ID of item B 610b. The value ID tables 621 to 622, which are organized by value ID of item A 610a, are then consulted to identify the corresponding value ID of item C 610c.
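The two-step lookup described above can be sketched as follows. The table contents are hypothetical, and the dictionaries merely stand in for the value ID tables organized by item A (which carry item C) and by item B (which do not); the function name is an assumption made for illustration.

```python
# Hypothetical value ID tables, keyed by the value ID of the partitioning column.
# Tables organized by item A also carry item C; tables organized by item B do not.
tables_by_a = {
    0: [{"A": 0, "B": 1, "C": 2, "D": 0}],
    1: [{"A": 1, "B": 0, "C": 0, "D": 1}],
    2: [{"A": 2, "B": 1, "C": 1, "D": 0}],
}
tables_by_b = {
    0: [{"A": 1, "B": 0, "D": 1}],
    1: [{"A": 0, "B": 1, "D": 0}, {"A": 2, "B": 1, "D": 0}],
}

def c_ids_for_b(b_id):
    """Resolve the item C value IDs for one item B value ID by going through item A."""
    # Step 1: find the distinct item A value IDs that occur with this item B value ID.
    a_ids = dict.fromkeys(row["A"] for row in tables_by_b.get(b_id, []))
    # Step 2: read item C from the tables organized by item A, keeping only matching rows.
    return [
        row["C"]
        for a_id in a_ids
        for row in tables_by_a.get(a_id, [])
        if row["B"] == b_id
    ]

result = c_ids_for_b(1)
```

The filter on item B in the second step only matters when item A is not a primary key; in the example of FIG. 12, where item A is the primary key, each item A value ID identifies a single row.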

Item C 610c is neither a reference column nor a target column and is therefore never processed by an aggregate function, so its correspondence does not need to be kept in every value ID table; it is enough that its correspondence with the values of the other data items can be determined. Accordingly, as long as the correspondence with item C 610c is available for any one reference column (which need not be the primary key), the correspondence with the other values can be traced from there.

(Extensions of the Embodiments)
The first and second embodiments described above can be extended in various ways without departing from their spirit.
For example, each created value ID table or value list does not have to be stored on exactly one data node. A single value ID table or value list may be split as appropriate across a plurality of data nodes, depending on constraints such as the storage capacity of each data node. In that case, the data arrangement information management unit 112 keeps track of which value ID table or value list is split across which data nodes and records this in the data arrangement information 121.
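One way to picture this extension is the sketch below, which splits one table into consecutive row ranges according to per-node capacities and records where each range went. The capacity figures, node names, table name, and record layout are all hypothetical; the source does not specify how the data arrangement information 121 is structured.

```python
def split_table(table_name, rows, capacities):
    """Assign consecutive row ranges to nodes without exceeding each node's capacity."""
    placement = []  # stands in for the data arrangement information
    start = 0
    for node, capacity in capacities.items():
        if start >= len(rows):
            break
        if capacity <= 0:
            continue  # skip nodes that cannot hold any rows
        end = min(start + capacity, len(rows))
        placement.append({"table": table_name, "node": node, "rows": (start, end)})
        start = end
    if start < len(rows):
        raise ValueError("not enough capacity for " + table_name)
    return placement

# Ten rows spread over three nodes that can each hold four rows.
placement = split_table(
    "value_id_table",
    list(range(10)),
    {"node21": 4, "node22": 4, "node23": 4},
)
```

With such a record, the front DB server can tell which data nodes to contact for a given table before it issues a query.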

The present invention has been described above with reference to the specific embodiments shown in the drawings. However, the present invention is not limited to those embodiments, and any known configuration may be adopted as long as the effects of the present invention are achieved.

The main points of the novel technical content of the embodiments described above are summarized below. Part or all of the above embodiments can be summarized as novel technology in the following manner, but the present invention is not necessarily limited to this.

(Supplementary Note 1) A distributed memory database system configured by connecting a front database server and a plurality of data nodes to each other, wherein
the front database server includes:
a data structure conversion unit that divides table data input from the outside, generates a plurality of value ID tables in which each piece of actual data is replaced with a value ID, and distributes and stores these tables in the data nodes; and
a query processing unit that, based on a query that includes an aggregate function and is issued from an external client machine, queries each of the data nodes for the number of occurrences of a specific value ID in the table data, calculates the value of the aggregate function corresponding to the query from the numbers of occurrences of the specific value ID returned by the data nodes in response, and returns the calculated value to the client machine, and
when generating the plurality of value ID tables, the data structure conversion unit generates the plurality of value ID tables individually for each of a plurality of data items designated in advance as columns that can serve as aggregation axes in the table data.

(Supplementary Note 2) The distributed memory database system according to Supplementary Note 1, wherein
when generating the plurality of value ID tables, the data structure conversion unit of the front database server replaces the actual value with a value ID for each data item designated in advance as a column that can be an aggregation axis or an aggregation target in the table data, and generates a value list indicating the correspondence between the value IDs and the actual values, and
the front database server includes a data arrangement information management unit that stores which of the data nodes the value lists and the value ID tables have been distributed to.

(Supplementary Note 3) The distributed memory database system according to Supplementary Note 2, wherein, when generating the plurality of value ID tables, the data structure conversion unit of the front database server sorts the actual values in order of magnitude before replacing them with the value IDs.
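The source does not state why the values are sorted before IDs are assigned. One commonly cited benefit of such order-preserving IDs, offered here only as an assumption, is that comparisons can be made on the IDs without looking up the actual values. The sketch below uses hypothetical data and the standard `bisect` module to show this.

```python
from bisect import bisect_left, bisect_right

# Sorted distinct values; the list index is the value ID (hypothetical data).
value_list = sorted({30, 10, 20, 10, 40})  # [10, 20, 30, 40]

def id_range(low, high):
    """Return the value IDs whose actual value lies in the closed interval [low, high]."""
    return range(bisect_left(value_list, low), bisect_right(value_list, high))

# A column stored as value IDs.
ids = [2, 0, 1, 0, 3]

# Because ID order matches value order, the largest ID identifies the largest value.
max_value = value_list[max(ids)]

# A range condition on values becomes a range condition on IDs.
in_range = list(id_range(15, 30))
matching_rows = [i for i, vid in enumerate(ids) if vid in id_range(15, 30)]
```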

(Supplementary Note 4) The distributed memory database system according to Supplementary Note 2, wherein the data structure conversion unit of the front database server stores each data item designated in advance as a column that can be neither an aggregation axis nor an aggregation target in the table data together with any one kind of the value ID tables for the columns that can serve as aggregation axes.

(Supplementary Note 5) A front database server that is connected to a plurality of data nodes to constitute a distributed memory database system, the front database server including:
a data structure conversion unit that divides table data input from the outside, generates a plurality of value ID tables in which each piece of actual data is replaced with a value ID, and distributes and stores these tables in the data nodes; and
a query processing unit that, based on a query that includes an aggregate function and is issued from an external client machine, queries each of the data nodes for the number of occurrences of a specific value ID in the table data, calculates the value of the aggregate function corresponding to the query from the numbers of occurrences of the specific value ID returned by the data nodes in response, and returns the calculated value to the client machine, wherein
when generating the plurality of value ID tables, the data structure conversion unit generates the plurality of value ID tables individually for each of a plurality of data items designated in advance as columns that can serve as aggregation axes in the table data.

(Supplementary Note 6) A data processing method in a distributed memory database system configured by connecting a front database server and a plurality of data nodes to each other, wherein
a data structure conversion unit of the front database server accepts input of table data from the outside,
the data structure conversion unit divides the input table data individually for each data item designated in advance as a column that can serve as an aggregation axis, and generates a plurality of value ID tables in which the actual data is replaced with value IDs,
the data structure conversion unit distributes and stores the generated plurality of value ID tables in the data nodes,
a query processing unit of the front database server accepts a query that includes an aggregate function and is issued from an external client machine,
the query processing unit queries each of the data nodes, based on the accepted query, for the number of occurrences of a specific value ID in the table data, and
the query processing unit calculates the value of the aggregate function corresponding to the query from the numbers of occurrences of the specific value ID returned by the data nodes, and returns the calculated value to the client machine.
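The last two steps of the method can be sketched as follows, under one plausible reading: each data node reports how often each value ID of the target column occurs, and the front database server merges those counts and derives the aggregate with the help of the value list. The function names, the choice of COUNT, SUM, and AVG, and the sample data are assumptions made for illustration.

```python
from collections import Counter

def node_count(value_id_table, target_column):
    """Data node side: count how often each value ID of the target column appears."""
    return Counter(row[target_column] for row in value_id_table)

def front_aggregate(counts_from_nodes, value_list):
    """Front server side: merge the counts and derive COUNT, SUM and AVG from them."""
    total = Counter()
    for counts in counts_from_nodes:
        total.update(counts)
    n = sum(total.values())
    # Each value ID contributes (actual value) x (number of occurrences) to the sum.
    s = sum(value_list[vid] * c for vid, c in total.items())
    return {"count": n, "sum": s, "avg": s / n if n else None}

# Hypothetical value list for target column D and value ID tables on two nodes.
value_list_d = [10, 20, 50]
node1_table = [{"D": 0}, {"D": 2}]
node2_table = [{"D": 0}, {"D": 1}]

counts = [node_count(node1_table, "D"), node_count(node2_table, "D")]
result = front_aggregate(counts, value_list_d)
```

Only the per-ID counts cross the network in this sketch, so the traffic scales with the number of distinct values rather than with the number of rows.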

(Supplementary Note 7) A data processing program for a distributed memory database system configured by connecting a front database server and a plurality of data nodes to each other, the program causing a computer provided in the front database server to execute:
a procedure of accepting input of table data from the outside;
a procedure of dividing the input table data individually for each data item designated in advance as a column that can serve as an aggregation axis, and generating a plurality of value ID tables in which the actual data is replaced with value IDs;
a procedure of distributing and storing the generated plurality of value ID tables in the data nodes;
a procedure of accepting a query that includes an aggregate function and is issued from an external client machine;
a procedure of querying each of the data nodes, based on the accepted query, for the number of occurrences of a specific value ID in the table data; and
a procedure of calculating the value of the aggregate function corresponding to the query from the numbers of occurrences of the specific value ID returned by the data nodes, and returning the calculated value to the client machine.

The present invention is widely applicable to computer systems that use databases, and in particular to database systems that use distributed memory.

DESCRIPTION OF SYMBOLS
1, 501 Distributed memory database system
10, 510 Front DB server
21-23, 521-523 Data node
30 Internal network
40 External network
50 Client machine
101, 201 Main arithmetic control means
102, 202 Storage means
103, 203 Communication means
111, 204, 511 Query processing unit
112 Data arrangement information management unit
113, 513 Data structure conversion unit
121 Data arrangement information
210, 610 Table data
211-213 Value list
221-222, 231-33, 621-622, 631-632 Value ID table

Claims

1. A distributed memory database system configured by connecting a front database server and a plurality of data nodes to each other, wherein
the front database server includes:
a data structure conversion unit that divides table data input from the outside, generates a plurality of value ID tables in which each piece of actual data is replaced with a value ID, and distributes and stores these tables in the data nodes; and
a query processing unit that, based on a query that includes an aggregate function and is issued from an external client machine, queries each of the data nodes for the number of occurrences of a specific value ID in the table data, calculates the value of the aggregate function corresponding to the query from the numbers of occurrences of the specific value ID returned by the data nodes in response, and returns the calculated value to the client machine, and
when generating the plurality of value ID tables, the data structure conversion unit generates the plurality of value ID tables individually for each of a plurality of data items designated in advance as columns that can serve as aggregation axes in the table data.

2. The distributed memory database system according to claim 1, wherein
when generating the plurality of value ID tables, the data structure conversion unit of the front database server replaces the actual value with a value ID for each data item designated in advance as a column that can be an aggregation axis or an aggregation target in the table data, and generates a value list indicating the correspondence between the value IDs and the actual values, and
the front database server includes a data arrangement information management unit that stores, in a storage means provided in advance, which of the data nodes the value lists and the value ID tables have been distributed to.

3. The distributed memory database system according to claim 2, wherein, when generating the plurality of value lists, the data structure conversion unit of the front database server sorts the actual values in order of magnitude before replacing them with the value IDs.

4. The distributed memory database system according to claim 2, wherein the data structure conversion unit of the front database server stores each data item designated in advance as a column that can be neither an aggregation axis nor an aggregation target in the table data together with any one kind of the value ID tables for the columns that can serve as aggregation axes.

5. A front database server that is connected to a plurality of data nodes to constitute a distributed memory database system, the front database server including:
a data structure conversion unit that divides table data input from the outside, generates a plurality of value ID tables in which each piece of actual data is replaced with a value ID, and distributes and stores these tables in the data nodes; and
a query processing unit that, based on a query that includes an aggregate function and is issued from an external client machine, queries each of the data nodes for the number of occurrences of a specific value ID in the table data, calculates the value of the aggregate function corresponding to the query from the numbers of occurrences of the specific value ID returned by the data nodes in response, and returns the calculated value to the client machine, wherein
when generating the plurality of value ID tables, the data structure conversion unit generates the plurality of value ID tables individually for each of a plurality of data items designated in advance as columns that can serve as aggregation axes in the table data.

6. A data processing method in a distributed memory database system configured by connecting a front database server and a plurality of data nodes to each other, wherein
a data structure conversion unit of the front database server accepts input of table data from the outside,
the data structure conversion unit divides the input table data individually for each data item designated in advance as a column that can serve as an aggregation axis, and generates a plurality of value ID tables in which the actual data is replaced with value IDs,
the data structure conversion unit distributes and stores the generated plurality of value ID tables in the data nodes,
a query processing unit of the front database server accepts a query that includes an aggregate function and is issued from an external client machine,
the query processing unit queries each of the data nodes, based on the accepted query, for the number of occurrences of a specific value ID in the table data, and
the query processing unit calculates the value of the aggregate function corresponding to the query from the numbers of occurrences of the specific value ID returned by the data nodes, and returns the calculated value to the client machine.

7. A data processing program for a distributed memory database system configured by connecting a front database server and a plurality of data nodes to each other, the program causing a computer provided in the front database server to execute:
a procedure of accepting input of table data from the outside;
a procedure of dividing the input table data individually for each data item designated in advance as a column that can serve as an aggregation axis, and generating a plurality of value ID tables in which the actual data is replaced with value IDs;
a procedure of distributing and storing the generated plurality of value ID tables in the data nodes;
a procedure of accepting a query that includes an aggregate function and is issued from an external client machine;
a procedure of querying each of the data nodes, based on the accepted query, for the number of occurrences of a specific value ID in the table data; and
a procedure of calculating the value of the aggregate function corresponding to the query from the numbers of occurrences of the specific value ID returned by the data nodes, and returning the calculated value to the client machine.