JP2015011484A - Data processing method and computer

Info

Publication number: JP2015011484A
Application number: JP2013135734A
Authority: JP
Inventors: Tsuneyuki Imaki; Itaru Nishizawa
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2013-06-28
Filing date: 2013-06-28
Publication date: 2015-01-19

Abstract

PROBLEM TO BE SOLVED: To improve the processing efficiency of stream data processing that handles array type data. SOLUTION: Provided is a data processing method in a computer that executes stream data processing. The computer includes: a query analysis unit that generates query graph information indicating the configuration of the operators in the stream data processing; a query construction unit that generates operator execution units on the basis of the query graph information; and a query execution unit that processes input stream data. The method includes: a step in which the query construction unit selects an operator to be processed; a step in which the query construction unit analyzes the processing content of the operator and, on the basis of the analysis result, determines the data structure of the data storage area allocated to the operator execution unit corresponding to the operator; a step in which the query construction unit generates the operator execution unit on the basis of the analysis result and the data structure information; and a step in which the query construction unit registers the operator execution unit in the query execution unit.

Description

本発明は、配列型データを扱うストリームデータ処理の処理効率を向上させるデータ処理方法に関する。 The present invention relates to a data processing method for improving the processing efficiency of stream data processing that handles array type data.

近年、処理するデータ量が増大しており、リアルタイムにデータを集計し、また、分析することを可能とするストリームデータ処理システムが注目を集めている。ストリームデータ処理システムは、ストリームデータを処理対象とする。ここで、ストリームデータとは、途切れることなく到来する時系列順のデータ列である。 In recent years, the amount of data to be processed has increased, and a stream data processing system that can aggregate and analyze data in real time has attracted attention. The stream data processing system targets stream data. Here, the stream data is a data sequence in chronological order that arrives without interruption.

ストリームデータ処理システムでは、予め定義されたクエリにしたがって所定のストリームデータ処理が実行される。クエリとは、処理の対象とするデータ及び処理内容を示すシナリオである。クエリを定義する計算機言語の例は、非特許文献１に開示されている。 In the stream data processing system, predetermined stream data processing is executed according to a predefined query. A query is a scenario indicating the data to be processed and the processing content. An example of a computer language that defines a query is disclosed in Non-Patent Document 1.

ストリームデータ処理のアプリケーションにおいては、回帰分析による時系列予測計算などの数値計算が必要となるケースがある。このような計算は、ベクトル計算や行列計算となるため、配列型データを用いた処理として定義される。 In a stream data processing application, there are cases where numerical calculation such as time series prediction calculation by regression analysis is required. Since such calculation is vector calculation or matrix calculation, it is defined as a process using array type data.

Although it targets stored data rather than stream data, SQL99, a standard query language, defines an array type column that represents array type data as a single column value. This specification can also be applied to query languages for stream processing. Likewise, the technique described in Non-Patent Document 2 is known as a way to process stored multidimensional array data efficiently. Non-Patent Document 2 describes SciDB, a data management system for array data of very large scale.

Non-Patent Document 1: A. Arasu, S. Babu, and J. Widom, “The CQL Continuous Query Language: Semantic Foundations and Query Execution”, 2003.
Non-Patent Document 2: Paul G. Brown, “Overview of SciDB: Large Scale Array Storage, Processing and Analysis”, in Proc. of SIGMOD 2010, pp. 963-968, 2010.

Conventionally, speeding up stream data processing that handles array type data runs into two problems: the cost of searching the array and the memory utilization efficiency. These problems are described below using the example of computing the product of two matrices.

FIG. 11 is a diagram showing an example of a conventional query for computing the product of two matrices. The query shown in FIG. 11 computes the product of the matrix A and the matrix B.

In the query shown in FIG. 11, the matrix data is managed as a data structure containing a plurality of tuples, each consisting of a row-index column, a column-index column, and a column holding the value of the matrix component. Compared with an ordinary matrix representation, the amount of data therefore increases, and memory utilization efficiency is low.

In addition, the arithmetic processing based on the query must search for the combinations in which the value of the column-index column of the matrix A matches the value of the row-index column of the matrix B. As the numbers of rows and columns grow, the cost of this search increases, and the performance of the arithmetic processing degrades accordingly.
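To make the two costs concrete, a minimal Python sketch follows. It is illustrative only and is not the query of FIG. 11: the helper names `to_tuples` and `multiply_tuples` are hypothetical, and the join is deliberately written as a naive nested scan so that the search described above is visible.

```python
from collections import defaultdict

def to_tuples(matrix):
    """Flatten a dense matrix into (row, column, value) tuples, 1-based."""
    return [(i, j, v)
            for i, row in enumerate(matrix, start=1)
            for j, v in enumerate(row, start=1)]

def multiply_tuples(a_tuples, b_tuples):
    """Multiply two tuple-encoded matrices with a naive nested scan.

    Returns the product as a dict keyed by (row, column), together with
    the number of tuple pairs examined while looking for A.j == B.i.
    """
    sums = defaultdict(float)
    comparisons = 0
    for ai, aj, av in a_tuples:
        for bi, bj, bv in b_tuples:
            comparisons += 1
            if aj == bi:                      # the match the query searches for
                sums[(ai, bj)] += av * bv
    return dict(sums), comparisons

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
product, comparisons = multiply_tuples(to_tuples(A), to_tuples(B))
```

Each matrix component is stored together with two index values, so three values are held per component instead of one, and the number of pairs examined by this naive scan is the product of the two tuple counts.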

前述した二つの問題を解決する方法として、以下のような従来技術が知られている。 The following conventional techniques are known as methods for solving the above two problems.

One method uses an index that specifies, in advance, the combinations of the value of the column-index column of the matrix A and the value of the row-index column of the matrix B. Using this index suppresses the increase in search cost described above. However, a memory area must be secured to store the index, so memory utilization efficiency becomes even lower.

The other method uses an array type column similar to that of SQL99 mentioned above. FIG. 12 is a diagram showing an example of a query using an array type column. In the query shown in FIG. 12, the matrix components themselves are managed as an array. Simply using an array type column in a query, however, has the following problems.

One problem is that the arithmetic processing that accompanies an update of a single matrix component cannot be performed efficiently. For example, when one matrix component is updated after the matrix product has been calculated, the only components of the product that change are those whose row or column matches that of the updated component. To obtain the updated product, it is therefore sufficient to recompute only the changed components. With an array type column, however, the entire matrix corresponds to a single column, so updating one matrix component updates the array type column (the matrix) itself, and the whole matrix product must be calculated again. Arithmetic processing for an update of only some values therefore cannot be performed efficiently.
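The observation that only part of the product changes can be illustrated with a short Python sketch. This is an assumption-laden illustration rather than the method of the invention: the function names are hypothetical, matrices are plain lists of lists, and indices are 0-based.

```python
def matmul(a, b):
    """Full product of two dense matrices given as lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def update_a_component(a, b, c, i, k, new_value):
    """Set A[i][k] = new_value and patch only row i of C = A x B in place."""
    delta = new_value - a[i][k]
    a[i][k] = new_value
    for j in range(len(b[0])):
        # Only C[i][*] depends on A[i][k]; every other row is untouched.
        c[i][j] += delta * b[k][j]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul(A, B)
update_a_component(A, B, C, 0, 1, 10)
```

Patching one row touches as many components as B has columns, whereas recomputing the whole product touches every component of C. An update to a component of B would analogously affect a single column of C.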

Another problem is that a built-in function for calculating the matrix product must be defined. When an array type column is used, the functions required for the operation must be newly defined, and a function that is not provided in advance as a library may be needed. Development efficiency is therefore poor.

本願発明は前述した課題に鑑みてなされた発明である。すなわち、配列型データを扱うストリームデータ処理の高速化が実現可能なデータ処理方法及び計算機を提供することを目的とする。 The present invention has been made in view of the above-described problems. That is, an object of the present invention is to provide a data processing method and computer capable of realizing high-speed stream data processing for handling array type data.

A representative example of the invention disclosed in the present application is as follows. It is a data processing method in a computer that executes stream data processing. The computer includes a processor, a memory connected to the processor, and a network interface connected to the processor. The stream data processing includes a plurality of operators, each being a minimum unit of processing, and the stream data to be processed includes a plurality of tuples, each composed of a plurality of columns. The computer includes: a query analysis unit that receives an analysis scenario including one or more queries for realizing the stream data processing and, by analyzing the analysis scenario, generates query graph information indicating the configuration of the operators in the stream data processing; a query construction unit that generates, on the basis of the query graph information, operator execution units that execute the processing corresponding to the operators; and a query execution unit that processes input stream data on the basis of the operator execution units generated by the query construction unit. The method includes: a first step in which the query construction unit selects an operator to be processed from the plurality of operators included in the stream data processing; a second step in which the query construction unit analyzes the processing content of the selected operator on the basis of the query graph information and, on the basis of the analysis result, determines the data structure of the data storage area allocated to the operator execution unit corresponding to the selected operator; a third step in which the query construction unit generates the operator execution unit corresponding to the selected operator on the basis of the analysis result and the information on the determined data structure; and a fourth step in which the query construction unit registers the generated operator execution unit in the query execution unit.
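The four steps can be pictured as a simple construction loop. The Python sketch below is hypothetical throughout: the class names, the `storage_layout` strings, and in particular the rule inside `choose_layout` are assumptions made for illustration, since the text at this point says only that the data structure is determined from the analysis result.

```python
from dataclasses import dataclass, field

@dataclass
class Operator:
    op_id: int
    op_type: str                                   # e.g. "Partition", "Join", "GroupBy"
    index_columns: list = field(default_factory=list)

@dataclass
class OperatorExecutionUnit:
    op_id: int
    storage_layout: str

class QueryExecutionUnit:
    """Holds the registered operator execution units, keyed by operator ID."""
    def __init__(self):
        self.units = {}

    def register(self, unit):
        self.units[unit.op_id] = unit

def choose_layout(op):
    """Assumed rule: use a dense array when the operator has index columns."""
    return "dense_array" if op.index_columns else "tuple_list"

def build_query(operators, executor):
    for op in operators:                                # step 1: select an operator
        layout = choose_layout(op)                      # step 2: analyze, decide structure
        unit = OperatorExecutionUnit(op.op_id, layout)  # step 3: generate the unit
        executor.register(unit)                         # step 4: register it
    return executor

executor = build_query(
    [Operator(1, "Partition", ["i", "j"]), Operator(2, "Join")],
    QueryExecutionUnit(),
)
```

The point of the sketch is the ordering: the storage decision is made per operator, before its execution unit exists, so each unit can be generated with a layout suited to what that operator does.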

本発明によれば、ストリームデータ処理に含まれるオペレータ毎に、配列データのデータ構造を最適化することができるため、計算処理のコストの低減及びメモリ利用効率を向上させることができる。したがって、ストリームデータ処理の高速化が可能となる。 According to the present invention, since the data structure of the array data can be optimized for each operator included in the stream data processing, the calculation processing cost can be reduced and the memory utilization efficiency can be improved. Therefore, it is possible to speed up the stream data processing.

上記した以外の課題、構成及び効果は、以下の実施形態の説明により明らかにされる。 Problems, configurations, and effects other than those described above will be clarified by the following description of embodiments.

FIG. 1 is a block diagram showing a configuration example of the stream data processing system in Embodiment 1 of the present invention.
FIG. 2 is a block diagram showing a configuration example of the stream data processing unit in Embodiment 1 of the present invention.
FIG. 3 is an explanatory diagram showing an example of the analysis scenario of Embodiment 1 of the present invention.
FIG. 4 is an explanatory diagram showing an example of the query graph configuration information of Embodiment 1 of the present invention.
FIG. 5 is a flowchart explaining the processing that the stream data processing unit of an embodiment of the present invention executes when an analysis scenario is registered.
FIG. 6 is a flowchart explaining the processing executed by the Partition window optimization unit of Embodiment 1 of the present invention.
FIG. 7 is a flowchart explaining the processing executed by the Join operator optimization unit of Embodiment 1 of the present invention.
FIG. 8 is a flowchart explaining the processing executed by the GroupBy operator optimization unit of Embodiment 1 of the present invention.
FIG. 9 is an explanatory diagram showing the flow of data between operators in Embodiment 1 of the present invention.
FIG. 10 is an explanatory diagram showing the flow of data between operators in Embodiment 1 of the present invention.
FIG. 11 is a diagram showing an example of a conventional query for computing the product of two matrices.
FIG. 12 is a diagram showing an example of a conventional query using an array type column.

以下、本発明の実施例を添付図面に基づいて説明する。 Embodiments of the present invention will be described below with reference to the accompanying drawings.

図１は、本発明の実施例１におけるストリームデータ処理システムの構成例を示すブロック図である。 FIG. 1 is a block diagram showing a configuration example of a stream data processing system in Embodiment 1 of the present invention.

ストリームデータ処理システムは、ストリームデータ処理サーバ１００、及び、複数のホスト計算機１２０、１３０、１４０から構成される。 The stream data processing system includes a stream data processing server 100 and a plurality of host computers 120, 130, and 140.

ストリームデータ処理サーバ１００は、ネットワーク１５０を介して複数のホスト計算機１２０、１３０、１４０と接続される。なお、ネットワーク１５０は、ＷＡＮ（ＷｉｄｅＡｒｅａＮｅｔｗｏｒｋ）及びＬＡＮ（ＬｏｃａｌＡｒｅａＮｅｔｗｏｒｋ）などを用いることが考えられるが、本発明はネットワーク１５０の接続形式に限定されない。 The stream data processing server 100 is connected to a plurality of host computers 120, 130, and 140 via a network 150. Note that the network 150 may be a wide area network (WAN), a local area network (LAN), or the like, but the present invention is not limited to the connection type of the network 150.

ストリームデータ処理サーバ１００は、ホスト計算機１２０から送信されるストリームデータを受信し、また、指示されたクエリに従ってストリームデータを処理し、処理結果を出力タプル１４１として出力する。 The stream data processing server 100 receives the stream data transmitted from the host computer 120, processes the stream data according to the instructed query, and outputs the processing result as an output tuple 141.

ストリームデータは、複数の入力タプル１２１を含む。本実施例では、入力タプル１２１は、値のカラム、及び当該値を特定するためのインデクスのカラムを含む。なお、入力タプル１２１には、インデクスのカラムが一つ以上含まれるものとする。また、入力タプル１２１は、タイムスタンプのカラムを含んでいてもよい。 The stream data includes a plurality of input tuples 121. In this embodiment, the input tuple 121 includes a value column and an index column for specifying the value. It is assumed that the input tuple 121 includes one or more index columns. The input tuple 121 may include a time stamp column.

ストリームデータ処理サーバ１００は、プロセッサ１０１、メモリ１０２、ネットワークインタフェース１０４、及びストレージ装置１０５を備え、各構成はバス１０３を介して接続される。 The stream data processing server 100 includes a processor 101, a memory 102, a network interface 104, and a storage device 105, and each component is connected via a bus 103.

The processor 101 executes various processes by executing programs stored in the memory 102. In the following description, when a program is described as the subject of a process, this means that the program is being executed by the processor 101.

メモリ１０２は、プロセッサ１０１が実行するプログラム及び当該プログラムを実行するために必要な情報を格納する。具体的には、メモリ１０２は、ストリームデータ処理部１１０を実現するプログラムを格納する。 The memory 102 stores a program executed by the processor 101 and information necessary for executing the program. Specifically, the memory 102 stores a program for realizing the stream data processing unit 110.

The stream data processing unit 110 processes stream data. Details of the stream data processing unit 110 will be described later with reference to FIG. 2.

ストレージ装置１０５は、ストリームデータ、クエリ、及びその他の情報を格納する。ストレージ装置１０５は、例えば、ＨＤＤ（ＨａｒｄＤｉｓｋＤｒｉｖｅ）又はＳＳＤ（ＳｏｌｉｄＳｔａｔｅＤｒｉｖｅ）等の記憶媒体が考えられる。なお、本発明は記憶媒体の種類に限定されない。 The storage device 105 stores stream data, queries, and other information. The storage device 105 may be, for example, a storage medium such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive). The present invention is not limited to the type of storage medium.

なお、メモリ１０２に格納されるプログラム及び情報は、ストレージ装置１０５に格納されてもよい。この場合、プロセッサ１０１がストレージ装置１０５からプログラム及び情報を読み出し、読み出されたプログラム及び情報をメモリ１０２にロードする。 Note that the program and information stored in the memory 102 may be stored in the storage device 105. In this case, the processor 101 reads a program and information from the storage device 105 and loads the read program and information into the memory 102.

ホスト計算機１２０、１３０、１４０は、ストリームデータ処理サーバ１００を利用するユーザが利用する計算機であり、プロセッサ（図示省略）、メモリ（図示省略）、及びネットワークインタフェース（図示省略）を備える。 The host computers 120, 130, and 140 are computers used by users who use the stream data processing server 100, and include a processor (not shown), a memory (not shown), and a network interface (not shown).

ホスト計算機１２０は、ストリームデータ処理サーバ１００が処理するストリームデータ（入力タプル１２１）を送信する。なお、入力タプル１２１は、複数のホスト計算機１２０を含むシステムから送信されてもよい。 The host computer 120 transmits stream data (input tuple 121) to be processed by the stream data processing server 100. The input tuple 121 may be transmitted from a system including a plurality of host computers 120.

ホスト計算機１４０は、ストリームデータ処理サーバ１００によって処理された出力タプル１４１を受信する。なお、出力タプル１４１は、複数のホスト計算機１４０を含むシステムが受信してもよい。 The host computer 140 receives the output tuple 141 processed by the stream data processing server 100. The output tuple 141 may be received by a system including a plurality of host computers 140.

ホスト計算機１３０は、クエリ登録インタフェース１３１を実現するためのプログラムを実行する。クエリ登録インタフェース１３１は、分析シナリオ１３２を登録し、また、当該分析シナリオ１３２の実行を命令するためのインタフェースである。 The host computer 130 executes a program for realizing the query registration interface 131. The query registration interface 131 is an interface for registering the analysis scenario 132 and instructing execution of the analysis scenario 132.

Here, the analysis scenario 132 includes one or more queries, and each query includes one or more operators. An operator represents the minimum unit of processing in stream data processing. Examples of operators include data extraction processing and data aggregation processing.

本実施例の分析シナリオ１３２には、一つ以上の入力ストリームを定義するストリーム定義情報、及び一つ以上のクエリを定義するクエリ定義情報が含まれる。なお、分析シナリオ１３２の詳細については、図３を用いて後述する。 The analysis scenario 132 of this embodiment includes stream definition information that defines one or more input streams, and query definition information that defines one or more queries. Details of the analysis scenario 132 will be described later with reference to FIG.

ストリームデータ処理サーバ１００は、分析シナリオ１３２が入力されると、当該分析シナリオ１３２を分析し、所定のストリームデータ処理を実行するためのクエリグラフを構成する。ストリームデータ処理サーバ１００は、クエリグラフにしたがって、ストリームデータ（入力タプル１２１）を処理する。 When the analysis scenario 132 is input, the stream data processing server 100 analyzes the analysis scenario 132 and configures a query graph for executing predetermined stream data processing. The stream data processing server 100 processes the stream data (input tuple 121) according to the query graph.

図２は、本発明の実施例１におけるストリームデータ処理部１１０の構成例を示すブロック図である。 FIG. 2 is a block diagram illustrating a configuration example of the stream data processing unit 110 according to the first embodiment of this invention.

ストリームデータ処理部１１０は、クエリパーサ２０１、クエリ構築部２０３、クエリ実行部２１０、タプル入力部２２１、及びタプル出力部２２２を備える。また、ストリームデータ処理部１１０は、クエリグラフ構成情報２０２を保持する。 The stream data processing unit 110 includes a query parser 201, a query construction unit 203, a query execution unit 210, a tuple input unit 221, and a tuple output unit 222. In addition, the stream data processing unit 110 holds query graph configuration information 202.

クエリパーサ２０１は、分析シナリオ１３２の入力を受け付け、受け付けた分析シナリオ１３２を解析し、解析結果に基づいてクエリグラフを生成する。生成されたクエリグラフに関する情報は、クエリグラフ構成情報２０２として保持される。クエリグラフ構成情報２０２の詳細については、図４を用いて後述する。 The query parser 201 receives an input of the analysis scenario 132, analyzes the received analysis scenario 132, and generates a query graph based on the analysis result. Information about the generated query graph is held as query graph configuration information 202. Details of the query graph configuration information 202 will be described later with reference to FIG.

前述したように、分析シナリオには一つ以上のオペレータが含まれる。したがって、クエリグラフは、一つ以上のオペレータから構成される。 As described above, the analysis scenario includes one or more operators. Therefore, the query graph is composed of one or more operators.

図２に示す例では、オペレータ１、オペレータ２、オペレータ３、オペレータ４、オペレータ５、及びオペレータ６から構成されるクエリグラフを示す。なお、本発明は、オペレータの処理内容に限定されない。 In the example illustrated in FIG. 2, a query graph including an operator 1, an operator 2, an operator 3, an operator 4, an operator 5, and an operator 6 is illustrated. The present invention is not limited to the processing content of the operator.

クエリ構築部２０３は、クエリグラフ構成情報２０２に基づいて、オペレータ実行部２１１を生成し、生成されたオペレータ実行部２１１をクエリ実行部２１０に出力する。 The query construction unit 203 generates an operator execution unit 211 based on the query graph configuration information 202, and outputs the generated operator execution unit 211 to the query execution unit 210.

オペレータ実行部２１１は、クエリグラフに含まれるオペレータの具体的な処理を実行するモジュールである。オペレータ実行部２１１には、処理対象のタプル、タプルに含まれるカラム、及び処理に使う関数等の、処理の実行に必要な設定情報が含まれる。本実施例では、さらに、オペレータのデータ格納領域の設定情報が含まれる。 The operator execution unit 211 is a module that executes specific processing of the operator included in the query graph. The operator execution unit 211 includes setting information necessary for execution of processing, such as processing target tuples, columns included in the tuples, and functions used for processing. In the present embodiment, setting information of the operator's data storage area is further included.

ここで、データ格納領域は、オペレータに対応するオペレータ実行部２１１が一時的にタプルを保持する記憶領域である。また、データ格納領域の設定情報は、データ格納領域に保持されるデータの構造を設定するための情報であり、例えば、多次元配列のデータ構造を設定するための情報が考えられる。データ格納領域の設定情報の例は、図６を用いて後述する。 Here, the data storage area is a storage area in which the operator execution unit 211 corresponding to the operator temporarily holds tuples. The data storage area setting information is information for setting the structure of data held in the data storage area. For example, information for setting the data structure of a multidimensional array can be considered. An example of the setting information of the data storage area will be described later with reference to FIG.

クエリ構築部２０３は、クエリ最適化部２０４を含む。クエリ最適化部２０４は、クエリグラフ構成情報２０２に基づいて、分析シナリオ１３２を解析し、各オペレータのデータ格納領域の管理方法を決定する。 The query construction unit 203 includes a query optimization unit 204. The query optimization unit 204 analyzes the analysis scenario 132 based on the query graph configuration information 202 and determines a management method for the data storage area of each operator.

本実施例では、クエリグラフに含まれるオペレータとして、Ｐａｒｔｉｔｉｏｎウィンドウ演算に対応するオペレータ、Ｊｏｉｎオペレータ、及びＧｒｏｕｐＢｙオペレータを想定する。したがって、クエリ最適化部２０４は、Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５、Ｊｏｉｎオペレータ最適化部２０６、及びＧｒｏｕｐＢｙオペレータ最適化部２０７を含む。 In the present embodiment, as operators included in the query graph, an operator corresponding to a Partition window operation, a Join operator, and a GroupBy operator are assumed. Therefore, the query optimization unit 204 includes a Partition window optimization unit 205, a Join operator optimization unit 206, and a GroupBy operator optimization unit 207.

Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５は、Ｐａｒｔｉｔｉｏｎウィンドウ演算に対応するオペレータのデータ格納領域の管理方法を決定する。Ｊｏｉｎオペレータ最適化部２０６は、Ｊｏｉｎオペレータのデータ格納領域の管理方法を決定する。ＧｒｏｕｐＢｙオペレータ最適化部２０７は、ＧｒｏｕｐＢｙオペレータのデータ格納領域の管理方法を決定する。 The partition window optimization unit 205 determines a management method of the data storage area of the operator corresponding to the partition window calculation. The Join operator optimization unit 206 determines a management method for the data storage area of the Join operator. The GroupBy operator optimization unit 207 determines a method for managing the data storage area of the GroupBy operator.

本実施例では、クエリ最適化部２０４が、各オペレータのデータ格納領域の管理方法を決定し、当該処理結果を出力する。クエリ構築部２０３は、当該処理結果に基づいて、各オペレータに対応するオペレータ実行部２１１を生成する。 In this embodiment, the query optimization unit 204 determines a method for managing the data storage area of each operator and outputs the processing result. The query construction unit 203 generates an operator execution unit 211 corresponding to each operator based on the processing result.

Details of the processing executed by the query optimization unit 204 will be described later with reference to FIGS. 5 to 8.

タプル入力部２２１は、ストリームデータ（入力タプル１２１）の入力を受け付ける。また、タプル入力部２２１は、受け付けたストリームデータ（入力タプル１２１）をクエリ実行部２１０に出力する。 The tuple input unit 221 receives input of stream data (input tuple 121). Further, the tuple input unit 221 outputs the received stream data (input tuple 121) to the query execution unit 210.

クエリ実行部２１０は、入力タプル１２１を用いてストリームデータ処理を実行し、また、処理結果をタプル出力部２２２に出力する。 The query execution unit 210 executes stream data processing using the input tuple 121 and outputs the processing result to the tuple output unit 222.

クエリ実行部２１０は、クエリグラフを構成するオペレータ毎に、オペレータ実行部を含む。図２に示す例では、クエリ実行部２１０には、オペレータ１実行部２１１−１、オペレータ２実行部２１１−２、オペレータ３実行部２１１−３、オペレータ４実行部２１１−４、オペレータ５実行部２１１−５、及びオペレータ６実行部２１１−６が含まれる。 The query execution unit 210 includes an operator execution unit for each operator constituting the query graph. In the example illustrated in FIG. 2, the query execution unit 210 includes an operator 1 execution unit 211-1, an operator 2 execution unit 211-2, an operator 3 execution unit 211-3, an operator 4 execution unit 211-4, and an operator 5 execution unit. 211-5 and an operator 6 execution unit 211-6.

以下の説明では、各オペレータ実行部２１１−１、２１１−２、２１１−３、２１１−４、２１１−５、２１１−６を区別しない場合、オペレータ実行部２１１とも記載する。 In the following description, the operator execution units 211-1, 211-2, 211-3, 211-4, 211-5, and 211-6 are also referred to as operator execution units 211 when not distinguished.

タプル出力部２２２は、クエリ実行部２１０から出力された処理結果を、出力タプル１４１としてホスト計算機１４０に送信する。 The tuple output unit 222 transmits the processing result output from the query execution unit 210 to the host computer 140 as an output tuple 141.

図３は、本発明の実施例１の分析シナリオ１３２の一例を示す説明図である。 FIG. 3 is an explanatory diagram illustrating an example of the analysis scenario 132 according to the first embodiment of this invention.

The analysis scenario 132 shown in FIG. 3 consists of stream definition information expressing the components of the matrix A and the matrix B, and one piece of query definition information for calculating the product of the two matrices.

なお、カラム「ｉ」は行列の行に対応するインデクス、カラム「ｊ」は行列の列に対応するインデクス、カラム「ｖ」は行列成分の値を表す。また、カラム「Ａ．ｉ」は行列Ａのカラム「ｉ」、カラム「Ａ．ｊ」は行列Ａのカラム「ｊ」、カラム「Ａ．ｖ」は行列Ａのカラム「ｖ」を表す。行列Ｂについても同様である。 The column “i” represents an index corresponding to a matrix row, the column “j” represents an index corresponding to a matrix column, and the column “v” represents a matrix component value. The column “A.i” represents the column “i” of the matrix A, the column “A.j” represents the column “j” of the matrix A, and the column “A.v” represents the column “v” of the matrix A. The same applies to the matrix B.

図３に示すクエリＣでは、以下のような行列の積の演算が定義される。 In the query C shown in FIG. 3, the following matrix product operations are defined.

First, in accordance with the PARTITION window specification, exactly one value of the column “v” is extracted for each combination of the column “i” and the column “j” of the stream data A. Similarly, exactly one value of the column “v” is extracted for each combination of the column “i” and the column “j” of the stream data B.

Next, in accordance with the WHERE clause, the tuples whose column “A.j” and column “B.i” match are joined.

Next, in accordance with the GROUP BY clause, the tuples are grouped by each combination of the columns “A.i” and “B.j”, and, in accordance with the SELECT clause, the sum of the products of the column “A.v” and the column “B.v” over the tuples in each group is calculated.
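The three stages of query C can be traced in a small Python sketch. It is a batch approximation written for illustration, not the query text of FIG. 3 and not how a stream engine would evaluate it continuously; the function names are hypothetical, and the dictionary that groups B by row is an implementation convenience chosen here.

```python
from collections import defaultdict

def latest_per_cell(stream):
    """Window stage: keep only the newest v for each (i, j) combination."""
    window = {}
    for i, j, v in stream:
        window[(i, j)] = v
    return window

def query_c(stream_a, stream_b):
    """Join on A.j == B.i, group by (A.i, B.j), and sum A.v * B.v."""
    a, b = latest_per_cell(stream_a), latest_per_cell(stream_b)
    b_by_row = defaultdict(list)              # B tuples grouped by their row index
    for (bi, bj), bv in b.items():
        b_by_row[bi].append((bj, bv))
    result = defaultdict(float)
    for (ai, aj), av in a.items():
        for bj, bv in b_by_row[aj]:           # join condition A.j == B.i
            result[(ai, bj)] += av * bv       # aggregate per (A.i, B.j) group
    return dict(result)

# The last tuple of stream_a replaces the earlier value of component (1, 2).
stream_a = [(1, 1, 1), (1, 2, 2), (2, 1, 3), (2, 2, 4), (1, 2, 10)]
stream_b = [(1, 1, 5), (1, 2, 6), (2, 1, 7), (2, 2, 8)]
c = query_c(stream_a, stream_b)
```

Each result key is one (A.i, B.j) group, and its value is the corresponding component of the matrix product computed from the most recent components of A and B.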

本実施例では、入力ストリームを定義するＲＥＧＩＳＴＥＲＳＴＲＥＡＭ句に、インデクスに対応するカラムが配列インデクス型カラムであることを指定する文字列「ＡＲＲＡＹ＿ＩＮＤＥＸ」を用いる。 In this embodiment, a character string “ARRAY_INDEX” that specifies that the column corresponding to the index is an array index type column is used in the REGISTER STREAM clause that defines the input stream.

Here, an array index type column is a column whose value range consists of consecutive integers starting from 1. In the following description, columns other than array index type columns are also referred to as normal type columns.
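Because the values of an array index type column are consecutive integers starting from 1, a pair of such columns can address a slot in a dense array directly, without a search. The Python sketch below illustrates that idea under assumptions not stated at this point in the text: the class name `DenseStore` is hypothetical, and the numbers of rows and columns are assumed to be known when the store is created.

```python
class DenseStore:
    """Store values addressed by two 1-based array index columns (i, j)."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.values = [None] * (rows * cols)    # one flat slot per (i, j)

    def _offset(self, i, j):
        if not (1 <= i <= self.rows and 1 <= j <= self.cols):
            raise IndexError((i, j))
        return (i - 1) * self.cols + (j - 1)    # row-major position

    def put(self, i, j, v):
        self.values[self._offset(i, j)] = v

    def get(self, i, j):
        return self.values[self._offset(i, j)]

store = DenseStore(2, 3)
store.put(2, 3, 7.5)
```

Only the values are stored; the indices are implied by position, which is the contrast with a tuple that carries its index columns along with every value.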

図３に示す例では、入力ストリームＡのカラム「ｉ」及びカラム「ｊ」、並びに、入力ストリームＢのカラム「ｉ」及びカラム「ｊ」が、配列インデクス型カラムとして指定されている。 In the example illustrated in FIG. 3, the column “i” and the column “j” of the input stream A and the column “i” and the column “j” of the input stream B are designated as the array index type columns.

図４は、本発明の実施例１のクエリグラフ構成情報２０２の一例を示す説明図である。 FIG. 4 is an explanatory diagram illustrating an example of the query graph configuration information 202 according to the first embodiment of this invention.

クエリグラフ構成情報２０２は、複数の情報（テーブル）を含む。具体的には、クエリグラフ構成情報２０２は、オペレータ管理情報４１０、Ｐａｒｔｉｔｉｏｎウィンドウ管理情報４２０、Ｊｏｉｎオペレータ管理情報４３０、及びＧｒｏｕｐＢｙオペレータ管理情報４４０を含む。 The query graph configuration information 202 includes a plurality of information (tables). Specifically, the query graph configuration information 202 includes operator management information 410, Partition window management information 420, Join operator management information 430, and GroupBy operator management information 440.

オペレータ管理情報４１０は、クエリグラフに含まれる全てのオペレータの概要を管理するための情報である。オペレータ管理情報４１０は、オペレータＩＤ４１１、タイプ４１２、及び入力４１３を含む。 The operator management information 410 is information for managing an overview of all operators included in the query graph. The operator management information 410 includes an operator ID 411, a type 412, and an input 413.

オペレータＩＤ４１１は、クエリグラフに含まれるオペレータを一意に識別するための識別情報である。 The operator ID 411 is identification information for uniquely identifying an operator included in the query graph.

タイプ４１２は、オペレータの種別を示す情報である。本実施例では、「Ｐａｒｔｉｔｉｏｎ」、「Ｊｏｉｎ」又は「ＧｒｏｕｐＢｙ」のいずれかの情報がタイプ４１２に格納される。 Type 412 is information indicating the type of operator. In this embodiment, information of “Partition”, “Join”, or “GroupBy” is stored in the type 412.

入力４１３は、オペレータに対して入力されるデータの識別情報である。 The input 413 is data identification information input to the operator.

Ｐａｒｔｉｔｉｏｎウィンドウ管理情報４２０は、Ｐａｒｔｉｔｉｏｎウィンドウ演算に対応するオペレータの詳細情報である。Ｐａｒｔｉｔｉｏｎウィンドウ管理情報４２０は、オペレータＩＤ４２１、入力カラム４２２、Ｇｒｏｕｐｉｎｇカラム４２３、ＡｒｒａｙＩｎｄｅｘカラム４２４、及びウィンドウサイズ４２５を含む。 The partition window management information 420 is detailed information of an operator corresponding to the partition window calculation. The Partition window management information 420 includes an operator ID 421, an input column 422, a Grouping column 423, an Array Index column 424, and a window size 425.

オペレータＩＤ４２１は、オペレータＩＤ４１１と同一のものである。 The operator ID 421 is the same as the operator ID 411.

入力カラム４２２は、Ｐａｒｔｉｔｉｏｎウィンドウ演算に対応するオペレータに入力されるタプルに含まれるカラムの識別情報である。 The input column 422 is column identification information included in the tuple input to the operator corresponding to the Partition window calculation.

Ｇｒｏｕｐｉｎｇカラム４２３は、Ｐａｒｔｉｔｉｏｎウィンドウ演算におけるグループ分け対象のカラムの識別情報である。なお、Ｐａｒｔｉｔｉｏｎウィンドウ演算では、複数のタプルの集合から複数の部分集合（グループ）が生成される。 A Grouping column 423 is identification information of a column to be grouped in the Partition window calculation. In the Partition window operation, a plurality of subsets (groups) are generated from a set of a plurality of tuples.

ＡｒｒａｙＩｎｄｅｘカラム４２４は、配列インデクス型カラムに対応するカラムの識別情報である。 The Array Index column 424 is identification information of the columns that are array index type columns.

ウィンドウサイズ４２５は、部分集合に含まれる要素の数を示す。図４に示す例では、カラム「ｉ」とカラム「ｊ」との組合せに対して、同時に生存できるタプルが高々一つに限られることを示す。したがって、カラム「ｉ」が「１」、カラム「ｊ」が「２」のタプルが入力された後、新たに、カラム「ｉ」が「１」、カラム「ｊ」が「２」のタプルが入力されると、新たなタプルのみが存在することができる。 The window size 425 indicates the number of elements included in each subset. The example shown in FIG. 4 indicates that, for each combination of the column “i” and the column “j”, at most one tuple can be alive at a time. Therefore, when a tuple whose column “i” is “1” and whose column “j” is “2” has been input and another tuple whose column “i” is “1” and whose column “j” is “2” is then input, only the newer tuple remains alive.
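The window behaviour described above can be illustrated with a minimal sketch. It assumes a row-count window held separately per grouping key; the class name `PartitionWindow`, its methods, and the column `val` are hypothetical and do not appear in the embodiment.

```python
from collections import defaultdict, deque


class PartitionWindow:
    """Keeps at most `window_size` live tuples per grouping key (hypothetical helper)."""

    def __init__(self, grouping_columns, window_size):
        self.grouping_columns = tuple(grouping_columns)
        # deque(maxlen=n) discards its oldest element once n elements are held.
        self.groups = defaultdict(lambda: deque(maxlen=window_size))

    def insert(self, row):
        key = tuple(row[c] for c in self.grouping_columns)
        self.groups[key].append(row)

    def live_tuples(self, *key):
        # .get() avoids creating an empty group for a key that was never seen.
        return list(self.groups.get(key, ()))


window = PartitionWindow(grouping_columns=("i", "j"), window_size=1)
window.insert({"i": 1, "j": 2, "val": 10.0})
window.insert({"i": 1, "j": 2, "val": 20.0})  # replaces the earlier tuple for (1, 2)
window.insert({"i": 1, "j": 3, "val": 30.0})  # different group, unaffected
```

With a window size of 1, `window.live_tuples(1, 2)` holds only the tuple carrying `val` 20.0, which corresponds to the statement that only the new tuple can exist.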

Ｊｏｉｎオペレータ管理情報４３０は、Ｊｏｉｎオペレータの詳細情報である。Ｊｏｉｎオペレータ管理情報４３０は、オペレータＩＤ４３１、入力カラム４３２、Ｊｏｉｎカラム４３３、及びＡｒｒａｙＩｎｄｅｘカラム４３４を含む。 Join operator management information 430 is detailed information of the Join operator. The join operator management information 430 includes an operator ID 431, an input column 432, a join column 433, and an array index column 434.

オペレータＩＤ４３１は、オペレータＩＤ４１１と同一のものである。 The operator ID 431 is the same as the operator ID 411.

入力カラム４３２は、Ｊｏｉｎオペレータに入力されるタプルに含まれるカラムの識別情報である。 The input column 432 is column identification information included in the tuple input to the Join operator.

Ｊｏｉｎカラム４３３は、結合対象のカラムの識別情報である。 The Join column 433 is identification information of a column to be combined.

ＡｒｒａｙＩｎｄｅｘカラム４３４は、配列インデクス型カラムに対応するカラムの識別情報である。 The Array Index column 434 is identification information of the columns that are array index type columns.

ＧｒｏｕｐＢｙオペレータ管理情報４４０は、ＧｒｏｕｐＢｙオペレータの詳細情報である。ＧｒｏｕｐＢｙオペレータ管理情報４４０は、オペレータＩＤ４４１、入力カラム４４２、Ｇｒｏｕｐｉｎｇカラム４４３、及びＡｒｒａｙＩｎｄｅｘカラム４４４を含む。 The GroupBy operator management information 440 is detailed information of the GroupBy operator. The GroupBy operator management information 440 includes an operator ID 441, an input column 442, a Grouping column 443, and an Array Index column 444.

オペレータＩＤ４４１は、オペレータＩＤ４１１と同一のものである。 The operator ID 441 is the same as the operator ID 411.

入力カラム４４２は、ＧｒｏｕｐＢｙオペレータに入力されるタプルに含まれるカラムの識別情報である。 The input column 442 is column identification information included in the tuple input to the GroupBy operator.

Ｇｒｏｕｐｉｎｇカラム４４３は、ＧｒｏｕｐＢｙオペレータにおけるグループ分け対象のカラムの識別情報である。 The Grouping column 443 is identification information of a column to be grouped by the GroupBy operator.

ＡｒｒａｙＩｎｄｅｘカラム４４４は、配列インデクス型カラムに対応するカラムの識別情報である。 The Array Index column 444 is identification information of the columns that are array index type columns.
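For orientation, the four tables of the query graph configuration information 202 could be modelled as plain records, as in the following sketch. The class and field names are hypothetical, and the sample values only follow what the description states about FIG. 4 (operator “1” is a Partition operator fed by StreamA, grouped on “i” and “j”, window size 1). The column `val` is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OperatorEntry:                 # one row of operator management information 410
    operator_id: int                 # operator ID 411
    type: str                        # type 412: "Partition", "Join" or "GroupBy"
    inputs: List[str]                # input 413


@dataclass
class PartitionWindowEntry:          # one row of Partition window management information 420
    operator_id: int                 # operator ID 421 (same value as 411)
    input_columns: List[str]         # input column 422
    grouping_columns: List[str]      # Grouping column 423
    array_index_columns: List[str]   # Array Index column 424
    window_size: int                 # window size 425


@dataclass
class JoinEntry:                     # one row of Join operator management information 430
    operator_id: int                 # operator ID 431
    input_columns: List[str]         # input column 432
    join_columns: List[str]          # Join column 433
    array_index_columns: List[str]   # Array Index column 434


@dataclass
class GroupByEntry:                  # one row of GroupBy operator management information 440
    operator_id: int                 # operator ID 441
    input_columns: List[str]         # input column 442
    grouping_columns: List[str]      # Grouping column 443
    array_index_columns: List[str]   # Array Index column 444


def find_entry(entries, operator_id):
    """Return the entry whose operator_id matches, or None if there is none."""
    return next((e for e in entries if e.operator_id == operator_id), None)


# Illustrative content only; "val" is an assumed non-index column.
operators = [OperatorEntry(1, "Partition", ["StreamA"])]
partition_windows = [
    PartitionWindowEntry(1, ["i", "j", "val"], ["i", "j"], ["i", "j"], 1)
]
```

Because every detail table repeats the operator ID, a lookup such as `find_entry(partition_windows, 1)` is enough to move from the overview table to the details of a given operator.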

図５は、本発明の実施例１のストリームデータ処理部１１０が分析シナリオ１３２の登録時に実行する処理を説明するフローチャートである。 FIG. 5 is a flowchart for describing processing executed by the stream data processing unit 110 according to the first embodiment of this invention when the analysis scenario 132 is registered.

ストリームデータ処理部１１０は、分析シナリオ１３２を受け付けると（ステップＳ５０１）、当該分析シナリオ１３２に基づいて、クエリグラフ構成情報２０２を生成する（ステップＳ５０２）。具体的には、以下のような処理が実行される。なお、ここでは、図３に示す分析シナリオ１３２を受け付けたものとする。 Upon receiving the analysis scenario 132 (step S501), the stream data processing unit 110 generates the query graph configuration information 202 based on the analysis scenario 132 (step S502). Specifically, the following processing is executed. Here, it is assumed that the analysis scenario 132 shown in FIG. 3 has been received.

まず、クエリパーサ２０１は、分析シナリオ１３２に基づいて、クエリＣに含まれるオペレータを特定する。本実施例では、クエリパーサ２０１は、クエリＣの定義情報から、二つのＰａｒｔｉｔｉｏｎウィンドウ演算に対応するオペレータ、一つのＪｏｉｎオペレータ、及び一つのＧｒｏｕｐＢｙオペレータを特定する。 First, the query parser 201 identifies the operators included in the query C based on the analysis scenario 132. In this embodiment, the query parser 201 identifies, from the definition information of the query C, two operators corresponding to Partition window operations, one Join operator, and one GroupBy operator.

クエリパーサ２０１は、オペレータ管理情報４１０、Ｐａｒｔｉｔｉｏｎウィンドウ管理情報４２０、Ｊｏｉｎオペレータ管理情報４３０、及びＧｒｏｕｐＢｙオペレータ管理情報４４０を初期化する。 The query parser 201 initializes the operator management information 410, the partition window management information 420, the join operator management information 430, and the GroupBy operator management information 440.

クエリパーサ２０１は、特定されたオペレータに「１」から「４」の識別番号を付与する。例えば、クエリパーサ２０１は、オペレータの実行順序に識別番号を付与する。 The query parser 201 assigns identification numbers “1” to “4” to the identified operators. For example, the query parser 201 assigns the identification numbers in the order in which the operators are executed.

クエリパーサ２０１は、オペレータ管理情報４１０にエントリを生成し、生成されたエントリのオペレータＩＤ４１１に付与された識別番号を格納し、タイプ４１２に各オペレータの識別情報を格納する。 The query parser 201 generates an entry in the operator management information 410, stores the assigned identification number in the operator ID 411 of the generated entry, and stores information identifying the kind of each operator in the type 412.

また、クエリパーサ２０１は、クエリＣの定義情報から各オペレータの入出力関係を解析し、解析結果に基づいて入力４１３に入力されるデータの識別情報を格納する。例えば、オペレータＩＤ４１１が「１」のエントリは、入力ストリームＡが入力されるＰａｒｔｉｔｉｏｎウィンドウ演算に対応するオペレータであるため、タイプ４１２には「Ｐａｒｔｉｔｉｏｎ」が格納され、入力４１３には「ＳｔｒｅａｍＡ」が格納される。 Further, the query parser 201 analyzes the input/output relationships of the operators from the definition information of the query C and, based on the analysis result, stores the identification information of the input data in the input 413. For example, the entry whose operator ID 411 is “1” is an operator corresponding to the Partition window operation to which the input stream A is input. Therefore, “Partition” is stored in the type 412 and “StreamA” is stored in the input 413.

次に、クエリパーサ２０１は、各オペレータの管理情報を生成する。本実施例では、Ｐａｒｔｉｔｉｏｎウィンドウ管理情報４２０、Ｊｏｉｎオペレータ管理情報４３０、及びＧｒｏｕｐＢｙオペレータ管理情報４４０が生成される。 Next, the query parser 201 generates management information for each operator. In the present embodiment, Partition window management information 420, Join operator management information 430, and GroupBy operator management information 440 are generated.

Ｐａｒｔｉｔｉｏｎウィンドウ管理情報４２０は以下のようにして生成される。 The partition window management information 420 is generated as follows.

クエリパーサ２０１は、入力ストリームＡの定義情報に基づいて、タプルに含まれるカラムを特定する。また、クエリパーサ２０１は、入力ストリームＡの定義情報に基づいて、配列インデクス型カラムを特定する。本実施例では、クエリパーサ２０１は、カラムｉ及びカラムｊが配列インデクス型カラムであることが分かる。 The query parser 201 identifies the columns included in the tuple based on the definition information of the input stream A. In addition, the query parser 201 identifies the array index type columns based on the definition information of the input stream A. In this embodiment, the query parser 201 determines that the column i and the column j are array index type columns.

また、クエリパーサ２０１は、クエリＣの定義情報のＦＲＯＭ句に基づいて、グループ分けの条件、及びウィンドウサイズを特定する。 Further, the query parser 201 specifies grouping conditions and a window size based on the FROM clause of the definition information of the query C.

クエリパーサ２０１は、Ｐａｒｔｉｔｉｏｎウィンドウ管理情報４２０にエントリを生成し、前述した解析結果に基づいて、生成されたエントリに必要な情報を格納する。入力ストリームＢが入力されるＰａｒｔｉｔｉｏｎウィンドウ演算に対応するオペレータについても同様の処理が実行される。 The query parser 201 generates an entry in the Partition window management information 420 and stores information necessary for the generated entry based on the analysis result described above. The same processing is executed for the operator corresponding to the Partition window calculation to which the input stream B is input.

Ｊｏｉｎオペレータ管理情報４３０は以下のようにして生成される。 Join operator management information 430 is generated as follows.

クエリパーサ２０１は、入力ストリームＡの定義情報、入力ストリームＢの定義情報、及びクエリＣの定義情報のＷＨＥＲＥ句に基づいて、タプルに含まれるカラム、及び結合条件を特定する。 The query parser 201 specifies the columns included in the tuple and the join condition based on the definition information of the input stream A, the definition information of the input stream B, and the WHERE clause of the definition information of the query C.

また、クエリパーサ２０１は、入力ストリームＡの定義情報、及び入力ストリームＢの定義情報に基づいて、タプルに含まれるカラムの中から配列インデクス型カラムを特定する。 In addition, the query parser 201 identifies an array index type column from among the columns included in the tuple based on the definition information of the input stream A and the definition information of the input stream B.

本実施例では、カラム「ｉ」及びカラム「ｊ」が配列インデクス型カラムとして指定されているため、クエリパーサ２０１は、当該カラムに対応するカラム「Ａｉ」、カラム「Ａｊ」、カラム「Ｂｉ」、及びカラム「Ｂｊ」も配列インデクス型カラムであることが分かる。 In this embodiment, since the column “i” and the column “j” are designated as array index type columns, the query parser 201 determines that the corresponding columns “Ai”, “Aj”, “Bi”, and “Bj” are also array index type columns.

クエリパーサ２０１は、Ｊｏｉｎオペレータ管理情報４３０にエントリを生成し、前述した解析結果に基づいて、生成されたエントリに必要な情報を格納する。 The query parser 201 generates an entry in the join operator management information 430, and stores information necessary for the generated entry based on the analysis result described above.

ＧｒｏｕｐＢｙオペレータ管理情報４４０は以下のようにして生成される。 The GroupBy operator management information 440 is generated as follows.

クエリパーサ２０１は、入力ストリームＡの定義情報、入力ストリームＢの定義情報、及びクエリＣの定義情報のＧＲＯＵＰＢＹ句に基づいて、タプルに含まれるカラム、及びグループ分けの条件を特定する。また、クエリパーサ２０１は、入力ストリームＡの定義情報、及び入力ストリームＢの定義情報に基づいて、タプルに含まれるカラムの中から配列インデクス型カラムを特定する。 The query parser 201 identifies the columns included in the tuple and the grouping conditions based on the definition information of the input stream A, the definition information of the input stream B, and the GROUP BY clause of the definition information of the query C. In addition, the query parser 201 identifies the array index type columns from among the columns included in the tuple based on the definition information of the input stream A and the definition information of the input stream B.

クエリパーサ２０１は、ＧｒｏｕｐＢｙオペレータ管理情報４４０にエントリを生成し、解析結果に基づいて、生成されたエントリに必要な情報を格納する。 The query parser 201 generates an entry in the GroupBy operator management information 440, and stores information necessary for the generated entry based on the analysis result.

以上が、ステップＳ５０２の処理の説明である。なお、前述した処理は一例であって本発明はこれに限定されない。 The above is the description of the process in step S502. The above-described processing is an example, and the present invention is not limited to this.

次に、ストリームデータ処理部１１０は、オペレータのループ処理を開始する（ステップＳ５０３）。具体的には、以下のような処理が実行される。 Next, the stream data processing unit 110 starts an operator loop process (step S503). Specifically, the following processing is executed.

ストリームデータ処理部１１０は、クエリ構築部２０３を呼び出し、処理の開始を指示する。クエリ構築部２０３は、クエリ最適化部２０４を呼び出し、処理の開始を指示する。 The stream data processing unit 110 calls the query construction unit 203 and instructs the start of processing. The query construction unit 203 calls the query optimization unit 204 and instructs the start of processing.

クエリ最適化部２０４は、クエリグラフ構成情報２０２のオペレータ管理情報４１０を参照し、分析シナリオ１３２に含まれるオペレータの中から処理対象のオペレータを一つ選択する。ここでは、オペレータ管理情報４１０の上のエントリから順に選択するものとする。なお、本発明は、オペレータの選択方法に限定されない。 The query optimization unit 204 refers to the operator management information 410 of the query graph configuration information 202 and selects one operator to be processed from the operators included in the analysis scenario 132. Here, it is assumed that the entries of the operator management information 410 are selected in order from the top. Note that the present invention is not limited to this operator selection method.

クエリ最適化部２０４は、選択されたオペレータがＰａｒｔｉｔｉｏｎウィンドウ演算に対応するオペレータであるか否かを判定する（ステップＳ５０４）。 The query optimization unit 204 determines whether or not the selected operator is an operator corresponding to the Partition window calculation (step S504).

具体的には、クエリ最適化部２０４は、選択されたオペレータに対応するエントリのタイプ４１２が「Ｐａｒｔｉｔｉｏｎ」であるか否かを判定する。タイプ４１２が「Ｐａｒｔｉｔｉｏｎ」である場合、クエリ最適化部２０４は、選択されたオペレータがＰａｒｔｉｔｉｏｎウィンドウ演算に対応するオペレータであると判定する。 Specifically, the query optimization unit 204 determines whether or not the entry type 412 corresponding to the selected operator is “Partition”. When the type 412 is “Partition”, the query optimization unit 204 determines that the selected operator is an operator corresponding to the Partition window calculation.

選択されたオペレータがＰａｒｔｉｔｉｏｎウィンドウ演算に対応するオペレータでないと判定された場合、クエリ最適化部２０４は、ステップＳ５０６に進む。 If it is determined that the selected operator is not an operator corresponding to the Partition window calculation, the query optimization unit 204 proceeds to step S506.

選択されたオペレータがＰａｒｔｉｔｉｏｎウィンドウ演算に対応するオペレータであると判定された場合、クエリ最適化部２０４は、Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５を呼び出し、処理の開始を指示する（ステップＳ５０５）。なお、Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５が実行する処理については、図６を用いて後述する。 When it is determined that the selected operator is an operator corresponding to the Partition window calculation, the query optimization unit 204 calls the Partition window optimization unit 205 and instructs the start of processing (step S505). The processing executed by the partition window optimization unit 205 will be described later with reference to FIG.

クエリ最適化部２０４は、Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５から処理結果が出力されるまで待ち状態となる。 The query optimization unit 204 waits until the processing result is output from the partition window optimization unit 205.

クエリ最適化部２０４は、選択されたオペレータがＪｏｉｎオペレータであるか否かを判定する（ステップＳ５０６）。 The query optimization unit 204 determines whether or not the selected operator is a Join operator (step S506).

具体的には、クエリ最適化部２０４は、選択されたオペレータに対応するエントリのタイプ４１２が「Ｊｏｉｎ」であるか否かを判定する。タイプ４１２が「Ｊｏｉｎ」である場合、クエリ最適化部２０４は、選択されたオペレータがＪｏｉｎオペレータであると判定する。 Specifically, the query optimization unit 204 determines whether or not the entry type 412 corresponding to the selected operator is “Join”. When the type 412 is “Join”, the query optimization unit 204 determines that the selected operator is a Join operator.

選択されたオペレータがＪｏｉｎオペレータでないと判定された場合、クエリ最適化部２０４は、ステップＳ５０８に進む。 If it is determined that the selected operator is not a Join operator, the query optimization unit 204 proceeds to step S508.

選択されたオペレータがＪｏｉｎオペレータであると判定された場合、クエリ最適化部２０４は、Ｊｏｉｎオペレータ最適化部２０６を呼び出し、処理の開始を指示する（ステップＳ５０７）。なお、Ｊｏｉｎオペレータ最適化部２０６が実行する処理については、図７を用いて後述する。 When it is determined that the selected operator is a Join operator, the query optimization unit 204 calls the Join operator optimization unit 206 and instructs the start of processing (step S507). The process executed by the Join operator optimization unit 206 will be described later with reference to FIG.

クエリ最適化部２０４は、Ｊｏｉｎオペレータ最適化部２０６から処理結果が出力されるまで待ち状態となる。 The query optimization unit 204 waits until a processing result is output from the Join operator optimization unit 206.

クエリ最適化部２０４は、選択されたオペレータがＧｒｏｕｐＢｙオペレータであるか否かを判定する（ステップＳ５０８）。 The query optimization unit 204 determines whether or not the selected operator is a GroupBy operator (step S508).

具体的には、クエリ最適化部２０４は、選択されたオペレータに対応するエントリのタイプ４１２が「ＧｒｏｕｐＢｙ」であるか否かを判定する。タイプ４１２が「ＧｒｏｕｐＢｙ」である場合、クエリ最適化部２０４は、選択されたオペレータがＧｒｏｕｐＢｙオペレータであると判定する。 Specifically, the query optimization unit 204 determines whether or not the type 412 of the entry corresponding to the selected operator is “GroupBy”. When the type 412 is “GroupBy”, the query optimization unit 204 determines that the selected operator is a GroupBy operator.

選択されたオペレータがＧｒｏｕｐＢｙオペレータでないと判定された場合、クエリ最適化部２０４は、ステップＳ５１０に進む。 If it is determined that the selected operator is not a GroupBy operator, the query optimization unit 204 proceeds to step S510.

選択されたオペレータがＧｒｏｕｐＢｙオペレータであると判定された場合、クエリ最適化部２０４は、ＧｒｏｕｐＢｙオペレータ最適化部２０７を呼び出し、処理の開始を指示する（ステップＳ５０９）。 If it is determined that the selected operator is a GroupBy operator, the query optimization unit 204 calls the GroupBy operator optimization unit 207 and instructs the start of processing (step S509).

クエリ最適化部２０４は、ＧｒｏｕｐＢｙオペレータ最適化部２０７から処理結果が出力されるまで待ち状態となる。クエリ最適化部２０４は、ＧｒｏｕｐＢｙオペレータ最適化部２０７から処理結果が出力された後、Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５、Ｊｏｉｎオペレータ最適化部２０６、又はＧｒｏｕｐＢｙオペレータ最適化部２０７から出力された処理結果をクエリ構築部２０３に出力し、処理を終了する。 The query optimization unit 204 waits until the processing result is output from the GroupBy operator optimization unit 207. After the processing result is output from the GroupBy operator optimization unit 207, the query optimization unit 204 displays the processing result output from the Partition window optimization unit 205, the Join operator optimization unit 206, or the GroupBy operator optimization unit 207. The data is output to the query construction unit 203, and the process ends.

クエリ構築部２０３は、クエリグラフ構成情報２０２、及びクエリ最適化部２０４から出力された処理結果に基づいて、オペレータ実行部２１１を生成する（ステップＳ５１０）。具体的には、以下のような処理が実行される。 The query construction unit 203 generates the operator execution unit 211 based on the query graph configuration information 202 and the processing result output from the query optimization unit 204 (step S510). Specifically, the following processing is executed.

クエリ構築部２０３は、クエリグラフ構成情報２０２に基づいて、処理の実行に必要な設定情報を生成し、また、クエリ最適化部２０４から出力された処理結果に基づいて、オペレータのデータ格納領域の設定情報を生成する。 The query construction unit 203 generates the setting information necessary for executing the processing based on the query graph configuration information 202, and generates the setting information of the operator's data storage area based on the processing result output from the query optimization unit 204.

クエリ構築部２０３は、生成された各設定情報に基づいて、オペレータ実行部２１１を生成する。また、クエリ構築部２０３は、生成されたオペレータ実行部２１１をクエリ実行部２１０に登録する。 The query construction unit 203 generates an operator execution unit 211 based on the generated setting information. In addition, the query construction unit 203 registers the generated operator execution unit 211 in the query execution unit 210.

以上がステップＳ５１０の処理の説明である。 The above is the description of step S510.

次に、クエリ最適化部２０４は、分析シナリオ１３２に含まれる全てのオペレータについて処理を完了したか否かを判定する（ステップＳ５１１）。 Next, the query optimization unit 204 determines whether or not the processing has been completed for all operators included in the analysis scenario 132 (step S511).

分析シナリオ１３２に含まれる全てのオペレータについて処理が完了していないと判定された場合、クエリ最適化部２０４は、ステップＳ５０３に戻り、同様の処理を実行する。 When it is determined that the processing has not been completed for all operators included in the analysis scenario 132, the query optimization unit 204 returns to step S503 and executes the same processing.

分析シナリオ１３２に含まれる全てのオペレータについて処理が完了したと判定された場合、クエリ最適化部２０４は、処理を終了する。 When it is determined that the processing has been completed for all operators included in the analysis scenario 132, the query optimization unit 204 ends the processing.

なお、ストリームデータ処理部１１０は、分析シナリオ１３２の受付を契機に処理を開始しているが、処理開始の契機はこれに限定されない。例えば、ストリームデータ処理部１１０は、受け付けた分析シナリオ１３２をストレージ装置１０５等に格納し、ユーザ等から処理開始の指示が入力された場合に、処理を開始してもよい。 Note that the stream data processing unit 110 starts processing upon reception of the analysis scenario 132, but the processing start trigger is not limited to this. For example, the stream data processing unit 110 may store the received analysis scenario 132 in the storage apparatus 105 or the like, and may start the process when a process start instruction is input from a user or the like.
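The loop of steps S503 to S511 can be summarized by the following sketch. It is not the embodiment's implementation: the function `register_scenario`, the dictionary-based operator table, and the stub optimizers are hypothetical, and the inputs shown for operators “3” and “4” are assumptions.

```python
def register_scenario(operator_table, optimizers, build_executor):
    """Walk the operator table once and return the executors to be registered."""
    executors = []
    for entry in operator_table:                   # S503: select the next operator, top entry first
        optimizer = optimizers.get(entry["type"])  # S504 / S506 / S508: branch on the type 412
        # S505 / S507 / S509: call the matching optimization unit and wait for its result.
        layout = optimizer(entry["id"]) if optimizer else None
        executors.append(build_executor(entry, layout))  # S510: generate the executor
    return executors                               # S511: every operator has been processed


operator_table = [
    {"id": 1, "type": "Partition", "input": ["StreamA"]},
    {"id": 2, "type": "Partition", "input": ["StreamB"]},
    {"id": 3, "type": "Join", "input": [1, 2]},     # assumed inputs
    {"id": 4, "type": "GroupBy", "input": [3]},     # assumed input
]

# Stub optimizers standing in for units 205, 206 and 207.
optimizers = {
    "Partition": lambda op_id: {"form": "multidimensional_array"},
    "Join": lambda op_id: None,
    "GroupBy": lambda op_id: None,
}

executors = register_scenario(
    operator_table,
    optimizers,
    build_executor=lambda entry, layout: (entry["id"], entry["type"], layout),
)
```

Looking the optimizer up by type replaces the chain of three type checks in the flowchart, and the result is the same because each operator has exactly one type.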

図６は、本発明の実施例１のＰａｒｔｉｔｉｏｎウィンドウ最適化部２０５が実行する処理を説明するフローチャートである。 FIG. 6 is a flowchart illustrating processing executed by the partition window optimization unit 205 according to the first embodiment of this invention.

Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５は、クエリ最適化部２０４から呼び出されると以下で説明する処理を開始する。このとき、クエリ最適化部２０４は、処理対象のオペレータの識別情報（オペレータＩＤ）を出力する。 When the partition window optimization unit 205 is called from the query optimization unit 204, the partition window optimization unit 205 starts processing described below. At this time, the query optimization unit 204 outputs the identification information (operator ID) of the operator to be processed.

Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５は、クエリグラフ構成情報２０２に含まれるＰａｒｔｉｔｉｏｎウィンドウ管理情報４２０を参照し、グループ分け対象のカラムに配列インデクス型カラムが含まれるか否かを判定する（ステップＳ６０１）。具体的には、以下のような処理が実行される。 The partition window optimization unit 205 refers to the partition window management information 420 included in the query graph configuration information 202, and determines whether or not an array index column is included in the grouping target column (step S601). Specifically, the following processing is executed.

Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５は、Ｐａｒｔｉｔｉｏｎウィンドウ管理情報４２０を参照し、オペレータＩＤ４２１が入力されたオペレータの識別情報と一致するエントリを検索する。 The Partition window optimization unit 205 refers to the Partition window management information 420 and searches for the entry whose operator ID 421 matches the input operator identification information.

Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５は、検索されたエントリのＧｒｏｕｐｉｎｇカラム４２３とＡｒｒａｙＩｎｄｅｘカラム４２４とを比較し、ＡｒｒａｙＩｎｄｅｘカラム４２４に、Ｇｒｏｕｐｉｎｇカラム４２３に格納されるカラムと一致するカラムが一つ以上存在するか否かを判定する。 The partition window optimization unit 205 compares the grouping column 423 and the array index column 424 of the searched entry, and the array index column 424 has one or more columns that match the columns stored in the grouping column 423. It is determined whether or not.

なお、Ｇｒｏｕｐｉｎｇカラム４２３がグループ分け対象のカラムに対応し、ＡｒｒａｙＩｎｄｅｘカラム４２４が配列インデクス型カラムに対応する。 The Grouping column 423 corresponds to the grouping target column, and the Array Index column 424 corresponds to the array index type column.

ＡｒｒａｙＩｎｄｅｘカラム４２４に、Ｇｒｏｕｐｉｎｇカラム４２３に格納されるカラムと一致するカラムが一つ以上存在する場合、Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５は、グループ分け対象のカラムに配列インデクス型カラムが含まれると判定する。 If there is at least one column in the Array Index column 424 that matches the column stored in the Grouping column 423, the Partition window optimization unit 205 determines that the column to be grouped includes an array index type column. .

以上がステップＳ６０１の処理の説明である。 The above is the description of the process in step S601.

グループ分け対象のカラムに配列インデクス型カラムが含まれないと判定された場合、Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５は、データ格納領域の設定情報が必要ない旨を処理結果として、クエリ最適化部２０４に出力し（ステップＳ６０６）、処理を終了する。 When it is determined that the column to be grouped does not include an array index type column, the Partition window optimization unit 205 outputs to the query optimization unit 204 as a processing result that the setting information of the data storage area is not necessary (Step S606), and the process ends.

グループ分け対象のカラムに配列インデクス型カラムが含まれると判定された場合、Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５は、Ｐａｒｔｉｔｉｏｎウィンドウ管理情報４２０に基づいて、グループ分け対象のカラムを配列インデクス型カラムと通常型カラムとに分類する（ステップＳ６０２）。 If it is determined that the grouping target columns include an array index type column, the Partition window optimization unit 205 classifies the grouping target columns into array index type columns and normal type columns based on the Partition window management information 420 (step S602).

Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５は、分類結果に基づいて、グループ分け対象のカラムの中に通常型カラムが含まれるか否かを判定する（ステップＳ６０３）。 The partition window optimization unit 205 determines whether or not the normal type column is included in the grouping target column based on the classification result (step S603).

例えば、Ｇｒｏｕｐｉｎｇカラム４２３に格納される全てのカラムの識別情報が、ＡｒｒａｙＩｎｄｅｘカラム４２４に格納される場合、Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５は、グループ分け対象のカラムの中に、通常型カラムが含まれないと判定する。 For example, when the identification information of every column stored in the Grouping column 423 is also stored in the Array Index column 424, the Partition window optimization unit 205 determines that no normal type column is included in the grouping target columns.

図４に示す例では、Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５は、グループ分け対象のカラムの中に通常型カラムが含まれないと判定する。 In the example illustrated in FIG. 4, the partition window optimization unit 205 determines that the normal column is not included in the grouping target columns.

グループ分け対象のカラムの中に、通常型カラムが含まれないと判定された場合、Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５は、データ格納領域におけるデータ構造を決定する（ステップＳ６０４）。その後、Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５は、決定されたデータ格納領域におけるデータ構造の情報を処理結果として出力し（ステップＳ６０６）、処理を終了する。 If it is determined that the normal column is not included in the grouping target column, the partition window optimization unit 205 determines the data structure in the data storage area (step S604). Thereafter, the partition window optimization unit 205 outputs information on the data structure in the determined data storage area as a processing result (step S606), and ends the processing.

具体的には、Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５は、配列インデクス型カラムの数と同数の次元の配列としてタプルを保持するように決定する。 Specifically, the partition window optimization unit 205 determines to hold the tuple as an array having the same number of dimensions as the number of array index type columns.

図４に示す例では、ＡｒｒａｙＩｎｄｅｘカラム４２４には、カラム「ｉ」、「ｊ」の二つのカラムの識別情報が格納される。したがって、Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５は、２次元配列としてタプルを保持するように決定する。 In the example illustrated in FIG. 4, the Array Index column 424 stores the identification information of two columns, “i” and “j”. Therefore, the Partition window optimization unit 205 determines to hold the tuples as a two-dimensional array.

グループ分け対象のカラムの中に通常型カラムが含まれると判定された場合、Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５は、データ格納領域におけるデータ構造を決定する（ステップＳ６０５）。その後、Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５は、決定されたデータ格納領域におけるデータ構造の情報を処理結果として出力し（ステップＳ６０６）、処理を終了する。 If it is determined that the normal type column is included in the grouping target column, the partition window optimization unit 205 determines the data structure in the data storage area (step S605). Thereafter, the partition window optimization unit 205 outputs information on the data structure in the determined data storage area as a processing result (step S606), and ends the processing.

具体的には、Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５は、通常型カラムをキーとし、配列インデクス型カラムの数と同数の次元の配列を値とするハッシュマップとしてタプルを保持するように決定する。 Specifically, the Partition window optimization unit 205 determines to hold the tuples as a hash map whose key is the normal type column and whose value is an array having as many dimensions as there are array index type columns.

例えば、配列インデクス型カラム「ｉ」、「ｊ」、通常型カラム「ｋ」を含むタプルの場合、Ｐａｒｔｉｔｉｏｎウィンドウ最適化部２０５は、通常型カラム「ｋ」をキーとし、２次元配列を値とするハッシュマップとしてタプルを保持するように決定する。 For example, in the case of a tuple including an array index type column “i”, “j”, and a normal type column “k”, the partition window optimization unit 205 uses the normal type column “k” as a key and a two-dimensional array as a value. Decide to keep the tuple as a hash map.
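The decision made in steps S601 to S605 can be sketched as follows. The function name `decide_partition_layout` and the dictionary keys are hypothetical. The sketch also assumes that the dimensions of the array are the grouping target columns that are array index type columns.

```python
def decide_partition_layout(grouping_columns, array_index_columns):
    """Return the data storage layout for a Partition window, or None if none is needed."""
    # S601 / S602: split the grouping target columns into the two kinds.
    index_cols = [c for c in grouping_columns if c in array_index_columns]
    normal_cols = [c for c in grouping_columns if c not in array_index_columns]

    if not index_cols:
        # S601 "no": setting information of the data storage area is not needed.
        return None
    if not normal_cols:
        # S603 "no" -> S604: a pure multidimensional array, one dimension per index column.
        return {"form": "multidimensional_array", "array_index_columns": index_cols}
    # S603 "yes" -> S605: a hash map keyed by the normal columns, with arrays as values.
    return {
        "form": "hash_map",
        "hash_key_columns": normal_cols,
        "array_index_columns": index_cols,
    }


layout_ij = decide_partition_layout(["i", "j"], ["i", "j"])
layout_ijk = decide_partition_layout(["i", "j", "k"], ["i", "j"])
layout_k = decide_partition_layout(["k"], ["i", "j"])
```

Here `layout_ij` corresponds to the two-dimensional array of the FIG. 4 example, `layout_ijk` to the hash map keyed by “k”, and `layout_k` to the case in which no setting information is required.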

以上のように決定したデータ構造は、データ格納領域の設定情報として、オペレータ実行部２１１に保持する。オペレータ実行部２１１は、クエリ実行時に、当該設定情報に従って各タプルのデータを一時保存する。 The data structure determined as described above is held in the operator execution unit 211 as setting information of the data storage area. The operator execution unit 211 temporarily stores data of each tuple according to the setting information when executing a query.

当該設定情報は、データ格納領域の形態（すなわち、多次元配列又はハッシュマップの何れであるか）と、多次元配列又はハッシュマップに関する情報から構成される。多次元配列又はハッシュマップに関する情報は、多次元配列の場合、各次元の配列インデクス型カラムの名前が含まれ、ハッシュマップの場合、ハッシュマップのキーとなる通常型カラムの名前、及びハッシュマップの値となる多次元配列の、各次元の配列インデクス型カラムの名前が含まれる。 The setting information includes a data storage area form (that is, a multi-dimensional array or a hash map) and information related to the multi-dimensional array or the hash map. In the case of a multidimensional array, the information about the multidimensional array or hash map includes the name of the array index type column of each dimension. In the case of a hash map, the name of the normal type column that is the key of the hash map, and the hash map Contains the name of the array index type column for each dimension of the multidimensional array that is the value.

例えば、上記の二つの例における設定情報は、（形態＝多次元配列，配列インデクスカラム＝「ｉ」、「ｊ」）、及び（形態＝ハッシュマップ，ハッシュキーカラム＝「ｋ」，配列インデクスカラム＝「ｉ」、「ｊ」）のように表現される。 For example, the setting information in the above two examples includes (form = multidimensional array, array index column = “i”, “j”), and (form = hash map, hash key column = “k”, array index column = “I”, “j”).
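As a rough illustration of how an operator execution unit might store tuples according to such setting information, consider the following sketch. The classes `ArrayStore` and `TupleStore`, the dictionary keys, and the `sizes` parameter (an upper bound for each array index column) are all assumptions; this section does not state how the array extents are obtained. For brevity the pure-array form is held as a single array under the empty key `()`.

```python
class ArrayStore:
    """Dense multidimensional array addressed by 1-based array index values."""

    def __init__(self, sizes):
        self.sizes = tuple(sizes)
        total = 1
        for size in self.sizes:
            total *= size
        self.slots = [None] * total

    def _offset(self, indices):
        offset = 0
        for idx, size in zip(indices, self.sizes):
            if not 1 <= idx <= size:
                raise IndexError(f"index {idx} outside 1..{size}")
            offset = offset * size + (idx - 1)  # row-major, shifting 1-based to 0-based
        return offset

    def put(self, indices, row):
        self.slots[self._offset(indices)] = row  # window size 1: the newest tuple overwrites

    def get(self, indices):
        return self.slots[self._offset(indices)]


class TupleStore:
    """Stores tuples as an array, or as a hash map of arrays, per the setting information."""

    def __init__(self, setting, sizes):
        self.index_cols = setting["array_index_columns"]
        self.key_cols = setting.get("hash_key_columns", [])  # empty for the array form
        self.sizes = sizes
        self.arrays = {}

    def _array_for(self, row):
        key = tuple(row[c] for c in self.key_cols)
        if key not in self.arrays:
            self.arrays[key] = ArrayStore(self.sizes)
        return self.arrays[key]

    def put(self, row):
        self._array_for(row).put([row[c] for c in self.index_cols], row)

    def get(self, row_key):
        return self._array_for(row_key).get([row_key[c] for c in self.index_cols])


setting = {"form": "hash_map", "hash_key_columns": ["k"], "array_index_columns": ["i", "j"]}
store = TupleStore(setting, sizes=(2, 3))
store.put({"k": "x", "i": 1, "j": 2, "val": 10.0})
store.put({"k": "x", "i": 1, "j": 2, "val": 20.0})  # overwrites the slot for k="x", (1, 2)
store.put({"k": "y", "i": 2, "j": 3, "val": 30.0})
```

Because the index values are consecutive integers starting from 1, a slot can be located by arithmetic alone, without hashing the index columns.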

図７は、本発明の実施例１のＪｏｉｎオペレータ最適化部２０６が実行する処理を説明するフローチャートである。 FIG. 7 is a flowchart illustrating processing executed by the Join operator optimizing unit 206 according to the first embodiment of this invention.

Ｊｏｉｎオペレータ最適化部２０６は、クエリ最適化部２０４から呼び出されると以下で説明する処理を開始する。このとき、クエリ最適化部２０４は、処理対象のオペレータの識別情報（オペレータＩＤ）を出力する。 When called from the query optimization unit 204, the Join operator optimization unit 206 starts processing described below. At this time, the query optimization unit 204 outputs the identification information (operator ID) of the operator to be processed.

Ｊｏｉｎオペレータ最適化部２０６は、クエリグラフ構成情報２０２に含まれるＪｏｉｎオペレータ管理情報４３０を参照し、入力スキーマに配列インデクス型カラムが含まれるか否かを判定する（ステップＳ７０１）。具体的には、以下のような処理が実行される。 The Join operator optimizing unit 206 refers to the Join operator management information 430 included in the query graph configuration information 202, and determines whether or not an array index type column is included in the input schema (Step S701). Specifically, the following processing is executed.

Ｊｏｉｎオペレータ最適化部２０６は、オペレータ管理情報４１０を参照し、オペレータＩＤ４１１が入力されたオペレータの識別情報と一致するエントリを検索する。 The Join operator optimizing unit 206 refers to the operator management information 410 and searches for the entry whose operator ID 411 matches the input operator identification information.

Ｊｏｉｎオペレータ最適化部２０６は、検索されたエントリの入力４１３を参照し、入力先のオペレータの識別情報を取得する。また、Ｊｏｉｎオペレータ最適化部２０６は、検索されたエントリのタイプ４１２に基づいて、参照する管理情報を特定する。 The Join operator optimizing unit 206 refers to the input 413 of the searched entry and acquires the identification information of the input destination operator. Also, the Join operator optimizing unit 206 specifies management information to be referenced based on the type 412 of the retrieved entry.

図４に示す例では、オペレータＩＤ４１１が「１」、「２」のタイプ４１２が「Ｐａｒｔｉｔｉｏｎ」であるため、Ｊｏｉｎオペレータ最適化部２０６は、Ｐａｒｔｉｔｉｏｎウィンドウ管理情報４２０を参照する。 In the example shown in FIG. 4, the type 412 of the entries whose operator ID 411 is “1” and “2” is “Partition”, so the Join operator optimizing unit 206 refers to the Partition window management information 420.

Ｊｏｉｎオペレータ最適化部２０６は、Ｐａｒｔｉｔｉｏｎウィンドウ管理情報４２０を参照し、取得されたオペレータの識別情報と一致するエントリを検索する。Ｊｏｉｎオペレータ最適化部２０６は、検索されたエントリのＡｒｒａｙＩｎｄｅｘカラム４２４に格納されるカラムの識別情報を取得する。 The Join operator optimization unit 206 refers to the Partition window management information 420 and searches for an entry that matches the acquired operator identification information. The Join operator optimizing unit 206 obtains column identification information stored in the Array Index column 424 of the searched entry.

Ｊｏｉｎオペレータ最適化部２０６は、Ｊｏｉｎオペレータ管理情報４３０を参照し、オペレータＩＤ４３１が入力されたオペレータの識別情報と一致するエントリを検索する。 The Join operator optimizing unit 206 refers to the Join operator management information 430 and searches for the entry whose operator ID 431 matches the input operator identification information.

Ｊｏｉｎオペレータ最適化部２０６は、検索されたエントリの入力カラム４３２から、入力スキーマのカラムの識別情報を取得する。 The Join operator optimization unit 206 acquires the identification information of the input schema column from the input column 432 of the searched entry.

Ｊｏｉｎオペレータ最適化部２０６は、ＡｒｒａｙＩｎｄｅｘカラム４２４から取得されたカラムの識別情報と、入力カラム４３２から取得されたカラムの識別情報とを比較し、ＡｒｒａｙＩｎｄｅｘカラム４２４に格納されるカラムと一致するカラムが入力カラム４３２に一つ以上存在するか否かを判定する。 The Join operator optimization unit 206 compares the column identification information acquired from the Array Index column 424 with the column identification information acquired from the input column 432, and matches the column stored in the Array Index column 424. It is determined whether one or more columns exist in the input column 432.

ＡｒｒａｙＩｎｄｅｘカラム４２４に格納されるカラムと一致するカラムが入力カラム４３２に一つ以上存在する場合、Ｊｏｉｎオペレータ最適化部２０６は、入力スキーマに配列インデクス型カラムが含まれると判定する。 If there is at least one column in the input column 432 that matches the column stored in the Array Index column 424, the Join operator optimizing unit 206 determines that the input schema includes an array index type column.

以上がステップＳ７０１の処理の説明である。 The above is the description of the processing in step S701.

入力スキーマに配列インデクス型カラムが含まれないと判定された場合、Ｊｏｉｎオペレータ最適化部２０６は、データ格納領域の設定情報が必要ない旨を処理結果として、クエリ最適化部２０４に出力し（ステップＳ７０４）、処理を終了する。 When it is determined that the input schema does not include an array index type column, the Join operator optimizing unit 206 outputs, as the processing result, a notice that setting information of the data storage area is not necessary to the query optimization unit 204 (step S704), and ends the processing.

When it is determined that the input schema includes an array index type column, the Join operator optimization unit 206 determines whether the join condition of the Join operator is a join condition on array index type columns (step S702). Specifically, the following processing is executed.

The Join operator optimization unit 206 refers to the ArrayIndex column 424 of the entry retrieved from the Partition window management information 420 and to the Join column 433 of the entry retrieved from the Join operator management information 430.

The Join operator optimization unit 206 determines whether the join condition stored in the Join column 433 is a join condition on array index type columns.

In the example shown in FIG. 4, the Join column 433 stores a join condition on the column “A.j” and the column “B.i”, both of which are array index type columns. Therefore, the Join operator optimization unit 206 determines that the join condition of the Join operator is a join condition on array index type columns.

The above is the description of the processing in step S702.

When it is determined that the join condition of the Join operator is not a join condition on array index type columns, the Join operator optimization unit 206 outputs, to the query optimization unit 204, a processing result indicating that setting information for the data storage area is not needed (step S704), and ends the processing.

When it is determined that the join condition of the Join operator is a join condition on array index type columns, the Join operator optimization unit 206 determines the data structure in the data storage area (step S703). Thereafter, the Join operator optimization unit 206 outputs information on the determined data structure as the processing result (step S704), and ends the processing.

Specifically, the Join operator optimization unit 206 decides to hold the tuples as an array whose number of dimensions equals the number of array index type columns.

In the example shown in FIG. 4, the ArrayIndex column 424 for stream data A stores the identification information of two columns, “i” and “j”. Therefore, the Join operator optimization unit 206 decides to hold the tuples included in input stream A as a two-dimensional array. The same applies to stream data B.
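As a non-limiting illustration, the decision flow of steps S701 to S704 might be sketched in Python as follows. The function name, the argument names, the use of qualified column names, and the dictionary returned as "setting information" are all assumptions made for this sketch. The sketch also assumes that a join condition counts as "on array index type columns" only when every column it references for this input is an array index type column, which the description does not state explicitly.

```python
from typing import Dict, List, Optional, Set


def decide_join_storage(array_index_cols: Set[str],
                        input_cols: List[str],
                        join_cols: List[str]) -> Optional[Dict[str, object]]:
    """Return hypothetical storage settings for one Join input, or None if none are needed."""
    # S701: does the input schema contain at least one array index type column?
    indexed = [c for c in input_cols if c in array_index_cols]
    if not indexed:
        return None  # S704: report that no setting information is needed

    # S702: is the join condition a condition on array index type columns?
    # (Assumption: all referenced columns must be array index type columns.)
    if not all(c in array_index_cols for c in join_cols):
        return None  # S704: report that no setting information is needed

    # S703: one array dimension per array index type column of the input.
    return {"structure": "array", "dimensions": len(indexed)}
```

For stream data A of the FIG. 4 example, `decide_join_storage({"A.i", "A.j"}, ["A.i", "A.j", "A.v"], ["A.j"])` selects a two-dimensional array.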

FIG. 8 is a flowchart illustrating the processing executed by the GroupBy operator optimization unit 207 according to the first embodiment of the present invention.

When called by the query optimization unit 204, the GroupBy operator optimization unit 207 starts the processing described below. At this time, the query optimization unit 204 outputs the identification information (operator ID) of the operator to be processed.

The GroupBy operator optimization unit 207 refers to the GroupBy operator management information 440 included in the query graph configuration information 202, and determines whether the grouping target columns include an array index type column (step S801). Specifically, the following processing is executed.

The GroupBy operator optimization unit 207 refers to the GroupBy operator management information 440 and searches for the entry whose operator ID 441 matches the input operator identification information.

The GroupBy operator optimization unit 207 compares the Grouping column 443 and the ArrayIndex column 444 of the retrieved entry, and determines whether the ArrayIndex column 444 contains one or more columns that match a column stored in the Grouping column 443.

If one or more columns match a column stored in the Grouping column 443, the GroupBy operator optimization unit 207 determines that the grouping target columns include an array index type column.

The above is the description of the processing in step S801.

When it is determined that the grouping target columns do not include an array index type column, the GroupBy operator optimization unit 207 outputs, to the query optimization unit 204, a processing result indicating that setting information for the data storage area is not needed (step S806), and ends the processing.

When it is determined that the grouping target columns include an array index type column, the GroupBy operator optimization unit 207 classifies the grouping target columns into array index type columns and normal type columns based on the GroupBy operator management information 440 (step S802).

Based on the classification result, the GroupBy operator optimization unit 207 determines whether the grouping target columns include a normal type column (step S803).

For example, when the identification information of every column stored in the Grouping column 443 is also stored in the ArrayIndex column 444, the GroupBy operator optimization unit 207 determines that the grouping target columns include no normal type column.

In the example shown in FIG. 4, the ArrayIndex column 444 stores both of the columns stored in the Grouping column 443, namely the column “A.i” and the column “B.j”. Therefore, the GroupBy operator optimization unit 207 determines that the grouping target columns include no normal type column.

When it is determined that the grouping target columns include no normal type column, the GroupBy operator optimization unit 207 determines the data structure in the data storage area (step S804). Thereafter, the GroupBy operator optimization unit 207 outputs information on the determined data structure as the processing result (step S806), and ends the processing.

Specifically, the GroupBy operator optimization unit 207 decides to hold the aggregate values as an array whose number of dimensions equals the number of columns that are both array index type columns and grouping target columns.

In the example shown in FIG. 4, the ArrayIndex column 444 stores the columns “A.i”, “A.j”, “B.i”, and “B.j”, and the Grouping column 443 stores the columns “A.i” and “B.j”. The number of columns that are both array index type columns and Grouping columns is therefore 2. Accordingly, the GroupBy operator optimization unit 207 decides to hold the aggregate values as a two-dimensional array.

When it is determined that the grouping target columns include a normal type column, the GroupBy operator optimization unit 207 determines the data structure in the data storage area (step S805). Thereafter, the GroupBy operator optimization unit 207 outputs information on the determined data structure as the processing result (step S806), and ends the processing.

Specifically, the GroupBy operator optimization unit 207 decides to hold the tuples as a hash map whose keys are the normal type columns and whose values are arrays with as many dimensions as there are columns that are both array index type columns and grouping targets.

For example, for tuples that include the array index type columns “A.i” and “B.j” and the normal type column “A.k”, the GroupBy operator optimization unit 207 decides to hold the aggregate values as a hash map whose key is the normal type column “A.k” and whose values are two-dimensional arrays.
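The decision flow of steps S801 to S806 could be sketched, purely for illustration, as the following Python function. The function name, argument names, and the shape of the returned dictionary are assumptions of this sketch and do not appear in the embodiment.

```python
from typing import Dict, List, Optional, Set


def decide_groupby_storage(grouping_cols: List[str],
                           array_index_cols: Set[str]) -> Optional[Dict[str, object]]:
    """Return hypothetical storage settings for a GroupBy operator, or None if none are needed."""
    # S801: do the grouping target columns include an array index type column?
    if not any(c in array_index_cols for c in grouping_cols):
        return None  # S806: report that no setting information is needed

    # S802: classify the grouping target columns.
    index_cols = [c for c in grouping_cols if c in array_index_cols]
    normal_cols = [c for c in grouping_cols if c not in array_index_cols]

    # S803: do the grouping target columns include a normal type column?
    if not normal_cols:
        # S804: a plain array, one dimension per array index grouping column.
        return {"structure": "array", "dimensions": len(index_cols)}

    # S805: a hash map keyed by the normal type columns, with arrays as values.
    return {"structure": "hash_map_of_arrays",
            "key": normal_cols,
            "dimensions": len(index_cols)}
```

With the FIG. 4 values, `decide_groupby_storage(["A.i", "B.j"], {"A.i", "A.j", "B.i", "B.j"})` selects a two-dimensional array, while adding the normal type column “A.k” to the grouping columns selects a hash map keyed by “A.k”.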

The above is the description of the processing executed by each optimization unit. In this embodiment, each optimization unit determines the data structure best suited to the processing based on information such as the array index type columns. The query construction unit 203 generates the setting information for the data storage area of each operator based on the processing results described with reference to FIGS. 6 to 8. As a result, when a query is executed, the operator execution unit 211 can manage data in the data structure best suited to the processing, based on the setting information for the data storage area.
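The way the query construction unit combines these results might look roughly like the sketch below: each operator is selected in turn, the optimization unit for its type (if any) decides the storage settings, and the resulting operator execution unit is registered. The class names, the `optimizers` mapping, and the dictionary-based operator records are hypothetical and serve only to illustrate the flow.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class OperatorExecutionUnit:
    operator_id: str
    storage: Optional[dict]  # None means "hold ordinary tuples"


@dataclass
class QueryExecutionUnit:
    registered: List[OperatorExecutionUnit] = field(default_factory=list)


def build_query(operators: List[dict],
                optimizers: Dict[str, Callable[[dict], Optional[dict]]],
                executor: QueryExecutionUnit) -> None:
    for op in operators:                        # select the operator to process
        optimize = optimizers.get(op["type"])   # optimization unit for this operator type
        storage = optimize(op) if optimize else None      # decide the data structure
        unit = OperatorExecutionUnit(op["id"], storage)   # generate the execution unit
        executor.registered.append(unit)                  # register it for execution
```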

Managing data as multidimensional arrays makes it possible to improve the processing efficiency of each operator. When data does not need to be managed as a multidimensional array, it is handled as ordinary tuples. This reduces unnecessary memory consumption; that is, it improves memory utilization efficiency.

This is because holding all tuples as multidimensional arrays could generate unnecessary indexes and thereby increase memory consumption. For this reason, in this embodiment, the query optimization unit 204 determines, based on the analysis result of the query graph, a data management method that both makes search processing more efficient and reduces memory consumption.

An example of how data is handled during query execution is described below.

In the following description, it is assumed that the operator execution units 211 of all operators included in the query graph are registered in the query execution unit 210.

FIG. 9 is an explanatory diagram illustrating the flow of data between operators according to the first embodiment of the present invention.

The query graph shown in FIG. 9 corresponds to the analysis scenario 132 shown in FIG. 3. It includes four operators: Partition windows 901-1 and 901-2, a Join operator 902, and a GroupBy operator 903. Therefore, four operator execution units 211 are registered in the query execution unit 210.

In the following description, the operator execution unit 211 corresponding to the Partition window 901-1 is referred to as the first operator execution unit, the operator execution unit 211 corresponding to the Partition window 901-2 as the second operator execution unit, the operator execution unit 211 corresponding to the Join operator 902 as the third operator execution unit, and the operator execution unit 211 corresponding to the GroupBy operator 903 as the fourth operator execution unit.

Input tuples 921, 922, and 923, each including a column “i”, a column “j”, and a column “v”, are input to the Partition window 901-1.

Because the first operator execution unit is set to hold input tuples as a two-dimensional array, it holds the input tuples 921, 922, and 923 as a two-dimensional array in the data storage area 911-1.

The first operator execution unit also extracts one value for each combination of the column “i” and the column “j”, and outputs tuples 931, 932, and 933, each composed of the column “i”, the column “j”, and a column holding the extracted value, to the Join operator 902.

Similarly, the second operator execution unit holds the input tuples 924, 925, and 926 as a two-dimensional array in the data storage area 911-2.

The third operator execution unit is set to hold the tuples output from each of the Partition window 901-1 and the Partition window 901-2 as a two-dimensional array. Therefore, the third operator execution unit holds the tuples 931, 932, and 933 as a two-dimensional array in the data storage area 912, and likewise holds the tuples 934, 935, and 936 as a two-dimensional array in the data storage area 912.

To join the tuples whose column “A.j” and column “B.i” match, the third operator execution unit joins the elements for which the column position in input stream A is the same as the row position in input stream B.

The third operator execution unit outputs tuples 941, 942, and 943, in which the position of the element of input stream A is given by the columns “A.i” and “A.j”, the position of the element of input stream B by the columns “B.i” and “B.j”, the joined element of input stream A by the column “A.v”, and the joined element of input stream B by the column “B.v”.

In this embodiment, 18 tuples are output.
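For illustration only, a position-based join over two-dimensional arrays might be sketched as follows, assuming the join condition A.j = B.i stored in the Join column 433 in the FIG. 4 example. The function name, the list-of-rows representation, the use of `None` for an empty cell, and the output tuple layout `(A.i, A.j, A.v, B.i, B.j, B.v)` are assumptions of this sketch. Indexes are 1-based, matching array index type columns whose values are consecutive integers starting from 1. The 2x2 inputs in the usage note are illustrative and are not the data of FIG. 9.

```python
from typing import List, Optional, Tuple

Cell = Optional[float]
Joined = Tuple[int, int, float, int, int, float]


def join_arrays(a: List[List[Cell]], b: List[List[Cell]]) -> List[Joined]:
    """Join A and B on A.j == B.i by array position instead of scanning for matching keys."""
    out: List[Joined] = []
    for a_i, a_row in enumerate(a, start=1):
        for a_j, a_v in enumerate(a_row, start=1):
            if a_v is None or a_j > len(b):
                continue  # empty cell, or B has no row with this index
            # A.j == B.i: column a_j of A pairs with row a_j of B.
            for b_j, b_v in enumerate(b[a_j - 1], start=1):
                if b_v is not None:
                    out.append((a_i, a_j, a_v, a_j, b_j, b_v))
    return out
```

Calling `join_arrays([[1, 2], [3, 4]], [[5, 6], [7, 8]])` yields eight joined tuples, one for each pairing of an element of A with an element in the matching row of B.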

The fourth operator execution unit divides the tuples output from the Join operator 902 into a plurality of groups based on the column “A.i” and the column “B.j”. For each group, the fourth operator execution unit calculates the product of the elements of each tuple, and then calculates the sum of these products.

Because the fourth operator execution unit is set to hold the aggregation results as a two-dimensional array, it holds the sum for each group as a two-dimensional array in the data storage area 913.

The fourth operator execution unit also outputs, as the processing result, tuples 951, 952, and 953, in which the position where each sum is stored is given by the columns “A.i” and “B.j” and the sum itself by the column “v”.
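The grouping and aggregation described above amounts to a sum of products per (A.i, B.j) cell. A minimal Python sketch follows, assuming joined tuples laid out as `(A.i, A.j, A.v, B.i, B.j, B.v)` and a result array whose size is known in advance; both the layout and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

Joined = Tuple[int, int, float, int, int, float]


def group_sum_of_products(joined: List[Joined], rows: int, cols: int) -> List[List[float]]:
    """Group by (A.i, B.j) and hold each group's sum of A.v * B.v in a 2D array."""
    sums = [[0] * cols for _ in range(rows)]
    for a_i, _a_j, a_v, _b_i, b_j, b_v in joined:
        # The group is located directly by its 1-based array position.
        sums[a_i - 1][b_j - 1] += a_v * b_v
    return sums


def output_tuples(sums: List[List[float]]) -> List[Tuple[int, int, float]]:
    """Emit one (A.i, B.j, v) tuple per stored aggregate value."""
    return [(i, j, v)
            for i, row in enumerate(sums, start=1)
            for j, v in enumerate(row, start=1)]
```

When the joined tuples come from a join on A.j = B.i, the resulting array equals the matrix product of A and B.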

As described above, managing data as multidimensional arrays improves the processing efficiency of the operators. In addition, because the data format best suited to each operator's data storage area is determined, memory consumption can be reduced.

FIG. 10 is an explanatory diagram illustrating the flow of data between operators according to the first embodiment of the present invention.

FIG. 10 shows the flow of data that accompanies an update of the value of a single tuple.

When a new tuple 1001 is input to the Partition window 901-2, the second operator execution unit searches, by position in the two-dimensional array of stream data B, for the data that matches the indexes included in the input tuple.

In the Partition window 901-2, only one value can exist for each combination of the columns “i” and “j”, so the retrieved data is deleted. Therefore, the second operator execution unit outputs a tuple 1002 for the deleted data and a tuple 1003 for the updated data to the Join operator 902.
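As an illustrative sketch only, a Partition window that keeps one value per (i, j) combination in a two-dimensional array might behave as follows. The class name, the fixed array size, and the `"-"` / `"+"` markers used to distinguish deletion tuples from update tuples are assumptions of this sketch.

```python
from typing import List, Optional, Tuple

Delta = Tuple[str, int, int, float]  # (sign, i, j, v); "-" deletes, "+" adds


class PartitionWindow2D:
    """Holds at most one value per (i, j); indexes are 1-based."""

    def __init__(self, rows: int, cols: int) -> None:
        self.cells: List[List[Optional[float]]] = [[None] * cols for _ in range(rows)]

    def put(self, i: int, j: int, v: float) -> List[Delta]:
        # The existing value is found by position, without searching a tuple list.
        old = self.cells[i - 1][j - 1]
        self.cells[i - 1][j - 1] = v
        out: List[Delta] = []
        if old is not None:
            out.append(("-", i, j, old))  # tuple for the deleted data
        out.append(("+", i, j, v))        # tuple for the updated data
        return out
```

The first `put` for a given position emits only an addition; a later `put` for the same position emits a deletion of the old value followed by the new value.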

The third operator execution unit joins the tuples related to the deleted data and outputs the join results as tuples 1004, 1006, and 1008. It also joins the tuples related to the updated data and outputs the join results as tuples 1005, 1007, and 1009.

The fourth operator execution unit refers to the column “A.i” and the column “B.j” included in each tuple input from the Join operator, and identifies the position of the aggregate value to be updated. The fourth operator execution unit reads the value at the identified position and executes the aggregation processing for the identified aggregate value.

Furthermore, the fourth operator execution unit outputs tuples 1010, 1012, and 1014 for the deleted aggregate values, and tuples 1011, 1013, and 1015 for the updated aggregate values.

In this embodiment, managing data in multidimensional arrays makes it possible to execute, at high speed, the processing that accompanies an update of some of the elements.
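One way to picture the update of a single aggregate value is the sketch below. It assumes joined tuples laid out as `(A.i, A.j, A.v, B.i, B.j, B.v)` and assumes that the aggregate is adjusted by subtracting the old product and adding the new one; the embodiment states only that the value at the identified position is read and the aggregation is executed for it, so this arithmetic is an assumption of the sketch, as are the function name and the `"-"` / `"+"` markers.

```python
from typing import List, Tuple

Joined = Tuple[int, int, float, int, int, float]
Delta = Tuple[str, int, int, float]


def update_cell(sums: List[List[float]],
                old_joined: Joined,
                new_joined: Joined) -> Tuple[Delta, Delta]:
    """Apply one deleted/updated pair of join results to the aggregate array."""
    a_i, _, a_old, _, b_j, b_old = old_joined
    _, _, a_new, _, _, b_new = new_joined
    # The affected aggregate is located directly from A.i and B.j.
    before = sums[a_i - 1][b_j - 1]
    after = before - a_old * b_old + a_new * b_new
    sums[a_i - 1][b_j - 1] = after
    # Emit the deleted aggregate value and the updated aggregate value.
    return ("-", a_i, b_j, before), ("+", a_i, b_j, after)
```

Only the cells named by the incoming join results are touched; the rest of the aggregate array is left as it is.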

In this embodiment, a single stream data processing server 100 executes the stream data processing, but the present invention is not limited to this.

For example, a computer system composed of a plurality of servers may be used. In this case, each of the plurality of servers may include the stream data processing unit 110. Alternatively, the functional units of the stream data processing unit 110 may be distributed across the plurality of servers.

A plurality of virtual servers may also be created on a single server, and the same configuration may be realized using those virtual servers.

As described above, according to the first embodiment, the optimal data structure for the data storage area of each operator can be determined based on the analysis of the query graph. This makes it possible to speed up search processing and reduce memory usage. In addition, because no special built-in functions need to be defined for matrix operations, the user's development efficiency can be improved.

The various software described in this embodiment can be stored on various types of recording media, such as electromagnetic, electronic, and optical media (for example, non-transitory storage media), and can be downloaded to a computer through a communication network such as the Internet.

Furthermore, although this embodiment describes an example using software control, part of it may also be realized by hardware.

Although the present invention has been described in detail with reference to the accompanying drawings, the present invention is not limited to these specific configurations and includes various modifications and equivalent configurations within the spirit of the appended claims.

100 Stream data processing server
101 Processor
102 Memory
103 Bus
104 Network interface
105 Storage device
110 Stream data processing unit
120 Host computer
121 Input tuple
130 Host computer
131 Query registration interface
132 Analysis scenario
140 Host computer
141 Output tuple
150 Network
201 Query parser
202 Query graph configuration information
203 Query construction unit
204 Query optimization unit
205 Partition window optimization unit
206 Join operator optimization unit
207 GroupBy operator optimization unit
210 Query execution unit
211 Operator execution unit
221 Tuple input unit
222 Tuple output unit
410 Operator management information
420 Partition window management information
430 Join operator management information
440 GroupBy operator management information
901 Partition window
902 Join operator
903 GroupBy operator
911, 912, 913 Data storage area

Claims

A data processing method in a computer that executes stream data processing, wherein
the computer has a processor, a memory connected to the processor, and a network interface connected to the processor,
the stream data processing includes a plurality of operators as minimum processing units,
the stream data that is the target of the stream data processing includes a plurality of tuples each composed of a plurality of columns,
the computer includes:
a query analysis unit that receives an analysis scenario including one or more queries for realizing the stream data processing and, by analyzing the analysis scenario, generates information of a query graph indicating the configuration of the operators in the stream data processing;
a query construction unit that generates, based on the information of the query graph, an operator execution unit that executes processing corresponding to the operator; and
a query execution unit that processes the input stream data based on the operator execution unit generated by the query construction unit,
and the method comprises:
a first step in which the query construction unit selects an operator to be processed from the plurality of operators included in the stream data processing;
a second step in which the query construction unit analyzes the processing content of the selected operator based on the information of the query graph and, based on the result of the analysis, determines a data structure in the data storage area allocated to the operator execution unit corresponding to the selected operator;
a third step in which the query construction unit generates the operator execution unit corresponding to the selected operator based on the result of the analysis and information on the determined data structure; and
a fourth step in which the query construction unit registers the generated operator execution unit in the query execution unit.

The data processing method according to claim 1, wherein
the method further includes:
a step in which the query analysis unit, by analyzing the analysis scenario, identifies an array index type column, which is a column whose values are consecutive integers starting from 1, among the columns constituting the tuples that are the target of the stream data processing; and
a step in which the query analysis unit generates the information of the query graph including information on the identified array index type column,
and the second step includes:
a fifth step of referring to the information of the query graph and determining whether the tuples processed by the selected operator include the array index type column; and
a sixth step of, when it is determined that the tuples processed by the selected operator include the array index type column, deciding to manage the tuples in the data storage area as a multidimensional array defined based on the array index type column, and generating setting information for managing the tuples as a multidimensional array.

The data processing method according to claim 2, wherein
the stream data processing includes a first operator that extracts a predetermined number of the tuples from the stream data, and
the sixth step includes:
a step of classifying the columns constituting the tuples processed by the selected operator into array index type columns and normal type columns, which are columns other than the array index type columns;
a step of determining, based on the classification result, whether the tuples processed by the selected operator include a normal type column;
a step of generating, when it is determined that the tuples processed by the selected operator do not include a normal type column, the setting information for holding the tuples as a multidimensional array having the same number of dimensions as the number of array index type columns; and
a step of generating, when it is determined that the tuples processed by the selected operator include a normal type column, the setting information for holding the tuples as a hash map whose key is the normal type column and whose value is a multidimensional array having the same number of dimensions as the number of array index type columns.

The data processing method according to claim 2, wherein
the stream data processing includes a second operator that joins the tuples input from a plurality of the operators based on a condition on predetermined columns, and
the sixth step includes:
a step of referring to the information of the query graph and determining whether the array index type column included in the tuples processed by the selected operator relates to the condition on the predetermined columns in the second operator; and
a step of generating, when it is determined that the array index type column included in the tuples processed by the selected operator relates to the condition on the predetermined columns in the second operator, the setting information for holding the tuples as a multidimensional array having the same number of dimensions as the number of array index type columns.

The data processing method according to claim 2, wherein
the stream data processing includes a third operator that divides the plurality of tuples into a plurality of groups based on a predetermined condition and aggregates the values of the tuples for each of the plurality of groups based on predetermined arithmetic processing, and
the sixth step includes:
a step of classifying the columns constituting the tuples processed by the selected operator into array index type columns and normal type columns, which are columns other than the array index type columns;
a step of determining, based on the classification result, whether the tuples processed by the selected operator include a normal type column;
a step of generating, when it is determined that the tuples processed by the selected operator do not include a normal type column, the setting information for holding the tuples as a multidimensional array having the same number of dimensions as the number of array index type columns; and
a step of generating, when it is determined that the tuples processed by the selected operator include a normal type column, the setting information for holding the tuples as a hash map whose key is the normal type column and whose value is a multidimensional array having the same number of dimensions as the number of array index type columns.

A computer that has a processor, a memory connected to the processor, and a network interface connected to the processor, and that executes stream data processing, wherein
the stream data processing includes a plurality of operators as minimum processing units,
the stream data that is the target of the stream data processing includes a plurality of tuples each composed of a plurality of columns,
the computer includes:
a query analysis unit that receives an analysis scenario including one or more queries for realizing the stream data processing and, by analyzing the analysis scenario, generates information of a query graph indicating the configuration of the operators in the stream data processing;
a query construction unit that generates, based on the information of the query graph, an operator execution unit that executes processing corresponding to the operator; and
a query execution unit that processes the input stream data based on the operator execution unit generated by the query construction unit,
and the query construction unit:
selects an operator to be processed from the plurality of operators included in the stream data processing;
analyzes the processing content of the selected operator based on the information of the query graph and, based on the result of the analysis, determines a data structure in the data storage area allocated to the operator execution unit corresponding to the selected operator;
generates the operator execution unit corresponding to the selected operator based on the result of the analysis and information on the determined data structure; and
registers the generated operator execution unit in the query execution unit.

The computer according to claim 6, wherein
The query analysis unit
Identifies, by analyzing the analysis scenario, an array index type column, which is a column whose values are consecutive integers starting from 1, among the columns constituting the tuple that is the target of the stream data processing, and
Generates the information of the query graph including information on the identified array index type column, and
The query construction unit,
When determining the data structure in the data storage area, refers to the information of the query graph and determines whether the tuple processed by the selected operator includes the array index type column,
When it is determined that the tuple processed by the selected operator includes the array index type column, decides to manage the tuple in the data storage area as a multidimensional array defined based on the array index type column, and
Generates setting information for managing the tuple as the multidimensional array.
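As a rough illustration of managing tuples as a multidimensional array addressed by array index type columns, the following sketch stores and reads a tuple by its 1-based index values. The helper names `make_array`, `store`, and `load` are hypothetical, and the sketch assumes the extent of each dimension is known in advance, which the claim does not specify.

```python
from typing import Any, List, Sequence


def make_array(shape: Sequence[int]) -> List[Any]:
    """Allocate a nested list with the given extents, filled with None."""
    if len(shape) == 1:
        return [None] * shape[0]
    return [make_array(shape[1:]) for _ in range(shape[0])]


def store(array: List[Any], indices: Sequence[int], value: Any) -> None:
    """Place value in the cell addressed by 1-based array index column values."""
    cell = array
    for i in indices[:-1]:
        cell = cell[i - 1]
    cell[indices[-1] - 1] = value


def load(array: List[Any], indices: Sequence[int]) -> Any:
    """Read the cell addressed by 1-based array index column values."""
    cell = array
    for i in indices:
        cell = cell[i - 1]
    return cell
```

Because the index values are consecutive integers starting from 1, a cell can be reached by direct offset without hashing or searching.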

The computer according to claim 7, wherein
The stream data processing includes a first operator that extracts a predetermined number of the tuples from the stream data, and
The query construction unit,
When determining the data structure in the data storage area, classifies the columns constituting the tuple processed by the selected operator into the array index type columns and normal type columns, which are columns other than the array index type columns,
Determines, based on the result of the classification, whether the tuple processed by the selected operator includes the normal type column,
When it is determined that the tuple processed by the selected operator does not include the normal type column, generates the setting information for holding the tuple as a multidimensional array having the same number of dimensions as the number of the array index type columns, and
When it is determined that the tuple processed by the selected operator includes the normal type column, generates the setting information for holding the tuple as a hash map whose key is the normal type column and whose value is a multidimensional array having the same number of dimensions as the number of the array index type columns.
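The classification and the resulting choice between a plain multidimensional array and a hash map of multidimensional arrays could look like the sketch below. The function `plan_storage` and the dictionary layout it returns are hypothetical stand-ins for the "setting information"; the claim does not define its format.

```python
from typing import Any, Dict, List


def plan_storage(columns: List[str], array_index_columns: List[str]) -> Dict[str, Any]:
    """Classify columns and return illustrative setting information."""
    index_cols = [c for c in columns if c in array_index_columns]
    normal_cols = [c for c in columns if c not in array_index_columns]
    array_layout = {
        "layout": "multidimensional_array",
        "dimensions": len(index_cols),  # one dimension per array index type column
    }
    if not normal_cols:
        # Only array index type columns: a single multidimensional array suffices.
        return array_layout
    # Normal type columns present: key a hash map by them, one array per key.
    return {"layout": "hash_map", "key_columns": normal_cols, "value": array_layout}
```

For example, under these assumptions a tuple with columns `id`, `x`, `y`, where `x` and `y` are array index type columns, yields a hash map keyed by `id` whose values are two-dimensional arrays.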

The computer according to claim 7, wherein
The stream data processing includes a second operator that combines the tuples input from the plurality of operators based on a predetermined column condition, and
The query construction unit,
When determining the data structure in the data storage area, refers to the information of the query graph and determines whether the array index type column included in the tuple processed by the selected operator is related to the predetermined column condition of the second operator, and
When it is determined that the array index type column included in the tuple processed by the selected operator is related to the predetermined column condition of the second operator, generates the setting information for holding the tuple as a multidimensional array having the same number of dimensions as the number of the array index type columns.
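One way to see why an array layout helps when the join condition involves an array index type column is the sketch below, which assumes an equality condition on a single array index column and two inputs already held as one-dimensional arrays. The function `join_by_index` and the input layout are hypothetical and are not taken from the source.

```python
from typing import Any, List, Optional, Tuple


def join_by_index(
    left: List[Optional[Any]], right: List[Optional[Any]]
) -> List[Tuple[int, Any, Any]]:
    """Join two inputs whose position i-1 holds the tuple with index value i.

    A cell containing None means no tuple has arrived for that index.
    """
    joined = []
    for index, (l, r) in enumerate(zip(left, right), start=1):
        if l is not None and r is not None:
            # Matching tuples sit at the same offset, so no search is needed.
            joined.append((index, l, r))
    return joined
```

In this sketch the matching partner of a tuple is found by position alone, which is the kind of saving a multidimensional array layout is meant to enable.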

The computer according to claim 7, wherein
The stream data processing includes a third operator that divides the plurality of tuples into a plurality of groups based on a predetermined condition and aggregates the values of the plurality of tuples for each of the plurality of groups based on predetermined arithmetic processing, and
The query construction unit,
When determining the data structure in the data storage area, classifies the columns constituting the tuple processed by the selected operator into the array index type columns and normal type columns, which are columns other than the array index type columns,
Determines, based on the result of the classification, whether the tuple processed by the selected operator includes the normal type column,
When it is determined that the tuple processed by the selected operator does not include the normal type column, generates the setting information for holding the tuple as a multidimensional array having the same number of dimensions as the number of the array index type columns, and
When it is determined that the tuple processed by the selected operator includes the normal type column, generates the setting information for holding the tuple as a hash map whose key is the normal type column and whose value is a multidimensional array having the same number of dimensions as the number of the array index type columns.
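For the aggregating operator, the hash map of arrays can hold one running aggregate per group and per array cell. The sketch below assumes summation as the arithmetic processing, a single normal type column used as the group key, one array index type column, and a separate numeric value carried by each tuple; all of these, along with the name `aggregate_sum`, are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate_sum(
    tuples: Iterable[Tuple[str, int, float]], size: int
) -> Dict[str, List[float]]:
    """Sum values per group key and per 1-based array index.

    Each input tuple is (key, index, value); size is the array extent.
    """
    # Hash map keyed by the normal type column, with one array per key.
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0] * size)
    for key, index, value in tuples:
        sums[key][index - 1] += value
    return dict(sums)
```

For example, the tuples `("s1", 1, 2.0)` and `("s1", 1, 3.0)` accumulate into the same cell of the array stored under key `"s1"`.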