JP2013232165A

JP2013232165A - Query integration method, query integration program, and integrated component creation device

Info

Publication number: JP2013232165A
Application number: JP2012105061A
Authority: JP
Inventors: Kosaku Kimura; 功作木村; Yoshihide Nomura; 佳秀野村
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2012-05-02
Filing date: 2012-05-02
Publication date: 2013-11-14
Anticipated expiration: 2032-05-02
Also published as: JP5835084B2

Abstract

PROBLEM TO BE SOLVED: To reduce the number of components for processing execution that are created from a flow definition, by integrating queries. SOLUTION: A first integrated query creation unit 11 reads a flow definition 2 that defines processing and order of the processing, and when target processing and subsequent processing are of the same processing type and the results are unchanged by a change in the order of the pieces of processing, couples a clause of a query in the target processing with the corresponding clause of a query of the subsequent processing to create an integrated query. A second integrated query creation unit 13 incorporates processing that is not integrated by the first integrated query creation unit 11 into a query of the subsequent processing as a sub-query to create an integrated query. An integrated component creation unit 20 creates components from the integrated query.

Description

The present invention relates to processing that generates queries based on a flow definition, and more particularly to query integration processing that generates an integrated query in which a plurality of queries are combined.

Development and execution environments that construct a data processing system by combining components are known.

There are flow definition editing systems that provide a GUI. When a user enters information indicating which data is to be processed, how, and in what order, such a system creates from that input a flow definition that defines the processing content and the processing order.

There are also component generation systems that generate, from a flow definition, components that execute the query corresponding to each process.

FIG. 36 is a diagram for explaining the relationship among the flow definition, queries, and components in the conventional method. In the flow definition shown in FIG. 36, the schema of the target data (information such as the fields to be processed and their attributes) and the processing content (procedure) are defined for each process.

In the component generation system, a processing module (processing engine) is selected according to the processing type of each process defined in the flow definition, a query corresponding one-to-one to the content of each process is generated, and a component containing the module that executes the query is generated. Data passing between components is then defined based on the processing order defined in the flow definition.

US Patent Application Publication No. 2009/0299986
JP 2006-107474 A
JP 2011-118492 A

In a conventional component generation system, a component is generated for each query, and each query corresponds one-to-one to a process defined in the flow definition. Every transfer of data between components involves file writes and reads, network communication, and the like, so the number of data transfers grows with the number of components and more time is spent on communication. In particular, for the flow definition of a large-scale system, the number of generated components becomes enormous, and the resulting communication overhead can no longer be ignored.

An object of the present invention is to provide a query integration method, program, and device that integrate the queries corresponding to the processes in a flow definition as far as possible, thereby reducing the number of generated queries and components and reducing the communication overhead between components.

In a query integration method disclosed as one aspect of the present invention, a computer reads a flow definition in which a plurality of processes, each with defined processing content and attributes, and the processing order of those processes are defined; takes the processes defined in the flow definition from the beginning and sets each as a target process; generates an integrated query that integrates a first query corresponding to the target process with a second query corresponding to a subsequent process that is executed immediately after the target process and has the same processing type as the target process; and repeats this processing with the integrated query treated as the first query corresponding to the target process.

According to the disclosed query integration method, fewer queries are generated from the flow definition, the number of components corresponding to the queries is greatly reduced, and communication overhead is reduced.

FIG. 1 is a diagram showing an example of the relationship among processes defined in a flow definition, integrated queries, and components in the disclosed query integration method.
FIG. 2 is a diagram showing an example of an integrated query generated in the first integrated query generation process.
FIG. 3 is a diagram showing examples of integrated queries, conforming to a query language, generated in the second integrated query generation process.
FIG. 4 is a diagram showing an example block configuration in one embodiment of the disclosed query integration device.
FIG. 5 is a diagram showing a configuration example of the flow definition acquired by the first integrated query generation unit.
FIG. 6 is a diagram showing example conditions for determining whether integration is possible in the integrated query generation process by clause division.
FIG. 7 is a diagram showing examples of operation types of processes in a flow definition.
FIG. 8 is a diagram showing a data definition example of a flow definition.
FIG. 9 is a diagram showing a process definition example of a flow definition.
FIG. 10 is a diagram showing clause template examples.
FIG. 11 is a diagram showing clause template examples.
FIG. 12 is a diagram showing clause template examples.
FIG. 13 is a diagram showing clause template examples.
FIG. 14 is a diagram showing an example overview processing flow of the integrated component generation device.
FIG. 15 is a diagram showing a more detailed processing flow example of the integrated query generation process by clause division (step S2).
FIG. 16 is a diagram showing an example of determining whether the target process has a subsequent process.
FIG. 17 is a diagram showing examples of generated clauses.
FIG. 18 is a diagram showing examples of clauses and operation types stored in the generated query storage unit.
FIG. 19 is a diagram showing examples of generated clauses.
FIG. 20 is a diagram showing examples of clauses and operation types stored in the generated query storage unit.
FIG. 21 is a diagram showing examples of clauses and operation types stored in the generated query storage unit.
FIG. 22 is a diagram showing an example of a generated query.
FIG. 23 is a diagram showing a more detailed processing flow example of the integration feasibility determination process (step S24).
FIG. 24 is a diagram showing a more detailed processing flow example of the operation type calculation process (step S243).
FIG. 25 is a diagram showing an example of operation type determination.
FIG. 26 is a diagram showing an example of operation type determination.
FIG. 27 is a diagram showing an example of operation type determination.
FIG. 28 is a diagram showing a more detailed processing flow example of the integrated query generation process by nesting (step S3).
FIG. 29 is a diagram showing an example of a generated integrated query.
FIG. 30 is a diagram showing an example of queries stored in the generated query storage unit.
FIG. 31 is a diagram showing a more detailed processing flow example of the component generation process (step S4).
FIG. 32 is a diagram showing a setting example of component templates corresponding to query languages.
FIG. 33 is a diagram showing an example of a component template.
FIG. 34 is a diagram showing an example of a generated integrated component.
FIG. 35 is a diagram showing a hardware configuration example of the integrated component generation device.
FIG. 36 is a diagram for explaining the relationship among the flow definition, queries, and components in the conventional method.

A query integration method disclosed as one aspect of the present invention is described below.

In the disclosed query integration method, when queries are generated from the processes defined in a flow definition, the query corresponding to a given process (the target process), called the first query, is integrated wherever possible with the query corresponding to the process immediately after it (the subsequent process), called the second query. This reduces the number of generated queries and therefore the number of components corresponding to them.

More specifically, the method checks whether the computation result stays the same when the target process and the subsequent process are executed in the reverse order. For example, it checks for an order dependency between the target process and the subsequent process from the correspondence between their operation types, which are based on relational algebra. If both processes have operation types for which the result does not change when the order is swapped, the first query and the second query are each divided into clauses, and an integrated query is generated by concatenating the elements of corresponding clauses. The generated integrated query then becomes the first query, the integrated processes become the target process, and integrated query generation is repeated against the next subsequent process, that is, the process following the current subsequent process (first integrated query generation process).

When processes in the flow definition cannot be integrated by the first integrated query generation process, for example because swapping the order of the target process and the subsequent process would change the computation result or make the subsequent process impossible to execute, an integrated query is generated by embedding the first query in the second query as a subquery (second integrated query generation process).

FIG. 1 is a diagram showing an example of the relationship among the processes defined in a flow definition, integrated queries, and components.

The flow definition shown in FIG. 1 is information that defines the data definition and process definition of the data handled by each process, together with the processing order. Here, the following processes are assumed to be defined for the POS data of a POS system: process P1 (conversion from date and time to time), process P2 (value range specification on the time), process P3 (field selection), process P4 (join with the product category), and so on.

[First integrated query generation process (integrated query generation by clause division)]
In the disclosed query integration method, the first integrated query generation process comes first: starting from the first process defined in the flow definition, each process is set as the target process in turn, and integrated queries are generated by clause division.

To generate integrated queries from the flow definition, a template is set in advance for each clause that makes up a query, and the integrated query is generated by referring to these clause templates. The clause templates are described later.

In the flow definition example of FIG. 1, when the target process is process P1, the subsequent process P2 has the same processing type (real time), but process P2 could not be executed if the order were swapped, because it refers to the time field that process P1 adds. The first integrated query generation process is therefore not performed; the query corresponding to process P1, "SELECT ... UDF.getTime(date and time) AS time FROM POS data", is generated, and processing continues with the next process P2 as the target process.

When the target process P2 and the subsequent process P3 have operation types whose execution order can be swapped, the integrated query generation process by clause division is performed on processes P2 and P3.

Specifically, as shown in FIG. 2, suppose the first query, corresponding to process P2, is "SELECT ... FROM data A WHERE 18:00 <= time AND time <= 20:00" and the second query, corresponding to process P3, is "SELECT ID, product ID, time FROM data B". The queries for processes P2 and P3 are each divided into their constituent clauses (SELECT clause, FROM clause, WHERE clause). For each clause, the elements of the corresponding clauses of P2 and P3 are combined, and the resulting clauses are concatenated to generate the integrated query "SELECT ID, product ID, time FROM data A WHERE 18:00 <= time AND time <= 20:00".
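The clause-by-clause merge described above can be sketched as follows. This is a minimal illustration, not the implementation shown in the patent figures: the `ClauseQuery` structure, the rule that an empty SELECT list is rendered as `*`, and the ASCII table and field names (`data_A`, `product_ID`) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ClauseQuery:
    """A query kept as separate clauses (hypothetical structure)."""
    select: list[str] = field(default_factory=list)  # empty means "all fields" (assumption)
    source: str = ""                                  # FROM clause target
    where: list[str] = field(default_factory=list)   # conditions, joined with AND

    def render(self) -> str:
        cols = ", ".join(self.select) if self.select else "*"
        sql = f"SELECT {cols} FROM {self.source}"
        if self.where:
            sql += " WHERE " + " AND ".join(self.where)
        return sql


def merge_by_clause(first: ClauseQuery, second: ClauseQuery) -> ClauseQuery:
    """Merge the target process's query with its successor's, clause by clause.

    The merged query reads from the first query's input, keeps the successor's
    field list when it has one, and applies the conditions of both queries.
    """
    return ClauseQuery(
        select=second.select or first.select,
        source=first.source,
        where=first.where + second.where,
    )


# P2 (value range on time) followed by P3 (field selection)
p2 = ClauseQuery(source="data_A", where=["'18:00' <= time", "time <= '20:00'"])
p3 = ClauseQuery(select=["ID", "product_ID", "time"], source="data_B")
merged = merge_by_clause(p2, p3)
print(merged.render())
```

The merged query keeps the FROM target of the earlier process because the intermediate data (data B here) no longer needs to exist once the two processes run as one query.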

The next subsequent process, P4, has an operation type for which the result would differ if its order were swapped with the integrated process (processes P2 and P3). The integrated query generation process is therefore not performed, and the query corresponding to process P4, "SELECT ... FROM data C JOIN product category ON ...", is generated.

In this way, the integrated query generation process by clause division is repeated for as long as processes of the same processing type continue.

[Second integrated query generation process (integrated query generation by nesting)]
Next, starting from process P1 of the flow definition, integrated queries are generated by nesting as the second integrated query generation process.

Specifically, the query corresponding to process P1 (the first query) is embedded as a subquery in the integrated query (the second query) of the already integrated process (P2 + P3), which is the subsequent process of P1. This generates the integrated query "SELECT ID, product ID, time FROM (SELECT ... UDF.getTime(date and time) AS time FROM POS data) WHERE 18:00 <= time AND time <= 20:00".

Then, with the integrated process (P1 + P2 + P3) as the target process, the generated integrated query is embedded as a subquery in the query corresponding to its subsequent process P4, producing the integrated query "SELECT ... FROM (SELECT ID, product ID, time FROM (SELECT ... UDF.getTime(date and time) AS time FROM POS data) WHERE 18:00 <= time AND time <= 20:00) JOIN product category ON ...".
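The nesting step can be illustrated with a short sketch. It assumes that each later query is held as a template with a `{source}` placeholder where its input data is named; that placeholder, the function names, the column lists, and the join condition are all hypothetical and only stand in for the parts the text elides with "...".

```python
def nest(inner_sql: str, outer_template: str) -> str:
    """Embed inner_sql as a parenthesized subquery in the outer query's FROM clause."""
    return outer_template.format(source=f"({inner_sql})")


def nest_chain(first_sql: str, later_templates: list[str]) -> str:
    """Fold a chain of queries into one by nesting each result into the next query."""
    sql = first_sql
    for template in later_templates:
        sql = nest(sql, template)
    return sql


# Hypothetical query texts; the select list and join condition are placeholders.
P1 = "SELECT *, UDF.getTime(datetime) AS time FROM POS_data"
P2_P3 = "SELECT ID, product_ID, time FROM {source} WHERE '18:00' <= time AND time <= '20:00'"
P4 = "SELECT * FROM {source} s JOIN product_category c ON s.product_ID = c.product_ID"

integrated = nest_chain(P1, [P2_P3, P4])
print(integrated)
```

This sketch is dialect-neutral. A real query language may impose extra rules on embedded subqueries, such as requiring an alias after the closing parenthesis, which is why the method applies language-specific insertion rules.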

The rules for writing an embedded subquery differ between query languages, so the second integrated query generation process generates the integrated query using the subquery insertion rules of the relevant query language. For example, when batch-type processes are executed by a query processing engine for one query language (HiveQL), an integrated query conforming to that language is generated, as shown in FIG. 3(A). When real-time processes are executed by a query processing engine for another query language (Esper EPL), an integrated query conforming to that other language is generated, as shown in FIG. 3(B).

As described above, the conventional method generates four components from four queries, whereas the disclosed query integration method generates a single integrated component for a single integrated query that combines the four queries. Because far fewer components are generated than when one component is generated per query, the number of communications between components falls, and processing time and processing load can be greatly reduced.

A query integration device disclosed as another aspect of the present invention is described below.

FIG. 4 is a diagram showing an example block configuration in one embodiment of the disclosed query integration device. In this embodiment, the query integration device 10 according to the present invention is built into the integrated component generation device 1.

The integrated component generation device 1 includes the query integration device 10, an integrated component generation unit 20, and a component template storage unit (component template repository) 21.

The query integration device 10 includes a first integrated query generation unit 11, a second integrated query generation unit 13, a clause division template storage unit (clause division template repository) 15, and a generated query storage unit 17.

The first integrated query generation unit 11 of the query integration device 10 refers to the clause division template storage unit 15 and generates an integrated query by clause division from the first query, corresponding to a process (the target process) defined in the acquired flow definition 2, and the second query, corresponding to its subsequent process.

The first integrated query generation unit 11 generates an integrated query when swapping the order of the target process and the subsequent process does not change the processing result (output data), and does not generate one when it does.

In this embodiment, the first integrated query generation unit 11 does not, for example, integrate a process that joins a plurality of inputs, or a process that summarizes the input data (average calculation, record count, and so on), with the processes before or after it. Likewise, it does not integrate a process that adds a field, or a process that changes the value of a field, with a subsequent process that refers to the value of that field for calculation or comparison.

FIG. 5 is a diagram showing a configuration example of the flow definition input to the first integrated query generation unit 11.

As shown in FIG. 5(A), flow definition 2 defines the processing order of a plurality of processes P1, P2, P3, ... applied to the POS data. As its data definition, flow definition 2 also defines the schema (field names and data types) of each piece of data to be processed, as shown in FIG. 5(B). As its process definition, flow definition 2 further defines the processing type, the kind and name of each process, its input data and output data, and process properties (attribute names of variables, values, and so on), as shown in FIG. 5(C).

FIG. 6 is a diagram showing example conditions for determining whether integration is possible in the integrated query generation process by clause division.

The data table shown in FIG. 6 is held in advance in the first integrated query generation unit 11 and specifies, based on the operation types of the target process and the subsequent process, whether an integrated query can be generated by clause division. In the example table of FIG. 6, a circle (○) marks combinations of target process and subsequent process that can be integrated, a triangle (△) marks combinations that may not be integrable, and a cross (×) marks combinations that cannot be integrated.

FIG. 7 is a diagram showing examples of operation types of processes in a flow definition.

"Extension" is a process that adds a new field to the input data and outputs the result; an example is process P1 (date-and-time to time conversion), which adds to the output data a field holding the time converted from the date and time. "Selection" is a process that filters the input data by some condition; an example is process P2 (value range specification), which narrows the input data by the condition of a WHERE clause. "Projection" is a process that selects only some fields of the input data; an example is process P3 (field selection), which generates output data containing only the specified fields of the input data.

Although not shown in FIG. 7, "summary" is a process that aggregates over the entire input data, such as counting records, summing, or averaging. "Multiple input" is a process that joins a plurality of input data into one output. "Multiple output" is a process whose single output data is referenced by a plurality of subsequent processes.
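The operation types and the textual integration rules can be sketched as a small predicate. This is only an illustration: the actual table of FIG. 6 is not reproduced in the text, so the sketch encodes just the two rules stated for this embodiment (joins and summaries are never merged with neighbors; a process that writes a field is not merged with a successor that reads it). The `Step` structure and its `writes`/`reads` sets are assumptions, and the "multiple output" type is declared but given no rule here.

```python
from dataclasses import dataclass
from enum import Enum


class Op(Enum):
    EXTEND = "extension"
    SELECT = "selection"
    PROJECT = "projection"
    SUMMARIZE = "summary"
    MULTI_INPUT = "multiple input"
    MULTI_OUTPUT = "multiple output"  # no rule modeled in this sketch


@dataclass(frozen=True)
class Step:
    name: str
    op: Op
    writes: frozenset[str] = frozenset()  # fields added or changed (hypothetical)
    reads: frozenset[str] = frozenset()   # fields used in conditions or calculations


# Operation types that the text says are not merged with the step before or after.
NEVER_MERGED = {Op.SUMMARIZE, Op.MULTI_INPUT}


def can_merge_by_clause(target: Step, successor: Step) -> bool:
    """Return True if the two steps may be merged clause by clause."""
    if target.op in NEVER_MERGED or successor.op in NEVER_MERGED:
        return False
    # The successor depends on a field the target produces: order matters.
    if target.writes & successor.reads:
        return False
    return True


p1 = Step("P1", Op.EXTEND, writes=frozenset({"time"}))
p2 = Step("P2", Op.SELECT, reads=frozenset({"time"}))
p3 = Step("P3", Op.PROJECT)
p4 = Step("P4", Op.MULTI_INPUT)

print(can_merge_by_clause(p1, p2))  # P2 reads the field P1 adds
print(can_merge_by_clause(p2, p3))
print(can_merge_by_clause(p3, p4))
```

With these inputs the predicate agrees with the walkthrough of FIG. 1: P1 and P2 are kept apart, P2 and P3 are merged, and the join P4 is left for the nesting step.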

The second integrated query generation unit 13 generates an integrated query by embedding the first query, corresponding to the target process, as a subquery in the second query, corresponding to its subsequent process.

The clause division template storage unit 15 stores a clause template for each clause that makes up a query.

The generated query storage unit 17 stores the queries generated by the first integrated query generation unit 11 and the second integrated query generation unit 13.

FIG. 8 is a diagram showing a data definition example of the flow definition. From the top of the blocks shown in FIG. 8, the data definitions of the POS data, data A, data B, and data C are shown. Each data definition specifies the data ID, data name, field names, and types (data types) of the data to be processed.

FIG. 9 is a diagram showing a process definition example of the flow definition. From the top of the blocks shown in FIG. 9, the process definitions of processes P1, P2, and P3 are shown. Each process definition specifies the process name, the kind of process, the processing type, the data ID of the input data (input data ID), the data ID of the output data (output data ID), attribute names, and the values of those attributes.
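One way to picture the data definitions and process definitions is as plain records. The sketch below is an assumption about shape only: the class names, attribute names, IDs, and sample values are hypothetical and do not reproduce the contents of FIG. 8 or FIG. 9.

```python
from dataclasses import dataclass, field


@dataclass
class DataDef:
    data_id: str
    name: str
    fields: dict[str, str]  # field name -> data type


@dataclass
class ProcessDef:
    name: str
    kind: str            # kind of process, e.g. "value range"
    process_type: str    # "realtime" or "batch"
    input_id: str        # data ID of the input data
    output_id: str       # data ID of the output data
    properties: dict[str, str] = field(default_factory=dict)  # attribute name -> value


@dataclass
class FlowDefinition:
    data: list[DataDef]
    processes: list[ProcessDef]  # kept in processing order


flow = FlowDefinition(
    data=[
        DataDef("D0", "POS data", {"ID": "int", "product_ID": "int", "datetime": "string"}),
        DataDef("D1", "data A", {"ID": "int", "product_ID": "int", "datetime": "string", "time": "string"}),
        DataDef("D2", "data B", {"ID": "int", "product_ID": "int", "datetime": "string", "time": "string"}),
    ],
    processes=[
        ProcessDef("P1", "datetime to time", "realtime", "D0", "D1", {"field": "datetime"}),
        ProcessDef("P2", "value range", "realtime", "D1", "D2",
                   {"field": "time", "min": "18:00", "max": "20:00"}),
    ],
)

# Consecutive processes are linked through matching output and input data IDs.
print(flow.processes[0].output_id == flow.processes[1].input_id)
```

Keeping input and output data IDs on each process is what later allows a subsequent process to be found by ID matching rather than by list position.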

FIGS. 10 to 13 are diagrams showing clause template examples.

Clause templates are prepared for each query language. For example, the clause division template storage unit 15 stores clause templates for two query languages, EPL (Event Processing Language) and HiveQL. The EPL clause templates are applied to processes whose processing type is real time, and the HiveQL clause templates to processes whose processing type is batch.

FIG. 10 shows examples of clause templates for the query corresponding to the process that converts a date and time into a time. FIG. 10(A) shows the SELECT and FROM clause templates for EPL, and FIG. 10(B) shows the SELECT and FROM clause templates for HiveQL.

FIG. 11 shows examples of clause templates for the query corresponding to the process that specifies a value range. FIG. 11(A) shows the SELECT, FROM, and WHERE clause templates for EPL, and FIG. 11(B) shows the SELECT, FROM, and WHERE clause templates for HiveQL.

FIG. 12 shows examples of clause templates for the query corresponding to the process that selects fields. FIG. 12(A) shows the SELECT and FROM clause templates for EPL, and FIG. 12(B) shows the SELECT and FROM clause templates for HiveQL.

FIG. 13 is an example of a table showing the correspondence between the processing type and process name and the clause templates that are applied. When processing types or clauses are added, templates can be registered by adding columns to the table shown in FIG. 13.
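A lookup from processing type and process name to clause templates might look like the sketch below. The template texts, placeholder names (`$input`, `$field`, and so on), and process names are invented for illustration and are not the templates of FIGS. 10 to 13; only Python's standard `string.Template` is used.

```python
from string import Template

# (processing type, process name) -> clause name -> template text (all hypothetical)
TEMPLATES = {
    ("batch", "value range"): {
        "SELECT": "*",
        "FROM": "$input",
        "WHERE": "$min <= $field AND $field <= $max",
    },
    ("batch", "field selection"): {
        "SELECT": "$fields",
        "FROM": "$input",
    },
}


def fill_clauses(process_type: str, process_name: str, **props: str) -> dict[str, str]:
    """Look up the clause templates for a process and fill in its properties."""
    clauses = TEMPLATES[(process_type, process_name)]
    return {name: Template(text).substitute(props) for name, text in clauses.items()}


clauses = fill_clauses(
    "batch", "value range",
    input="data_A", field="time", min="'18:00'", max="'20:00'",
)
print(clauses["WHERE"])
```

Supporting another processing type or clause then amounts to adding entries to the table, which mirrors the registration by table extension described for FIG. 13.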

以下、統合コンポーネント生成装置１の動作を説明する。 Hereinafter, the operation of the integrated component generation apparatus 1 will be described.

図１４は、統合コンポーネント生成装置１の概要処理フロー例を示す図である。 FIG. 14 is a diagram illustrating an example of an overview processing flow of the integrated component generation apparatus 1.

統合コンポーネント生成装置１に備えられたクエリ統合装置１０の第１統合クエリ生成部１１は、フロー定義２を取得し、フロー定義２の各処理にクエリ未生成を示すフラグを付加する（ステップＳ１）。そして、第１統合クエリ生成部１１は、句分割テンプレート記憶部１５を参照して句分割による統合クエリ生成処理を行い、生成した統合クエリを含む全てのクエリを生成クエリ記憶部１７に保存する（ステップＳ２）。 The first integrated query generation unit 11 of the query integration device 10 provided in the integrated component generation device 1 acquires the flow definition 2, and adds a flag indicating that no query is generated to each process of the flow definition 2 (step S1). . Then, the first integrated query generation unit 11 performs an integrated query generation process by phrase division with reference to the phrase division template storage unit 15 and stores all the queries including the generated integrated query in the generation query storage unit 17 ( Step S2).

次に、第２統合クエリ生成部１３は、生成クエリ記憶部１７に保存されたクエリに対し、入れ子による統合クエリ生成処理を行い、生成した統合クエリを含む全てのクエリを生成クエリ記憶部１７に保存する（ステップＳ３）。 Next, the second integrated query generation unit 13 performs an integrated query generation process by nesting on the queries stored in the generated query storage unit 17, and stores all queries, including the generated integrated query, in the generated query storage unit 17 (step S3).

統合コンポーネント生成部２０は、コンポーネントテンプレート記憶部２１を参照して、生成クエリ記憶部１７に保存されたクエリ毎にコンポーネントを生成し、生成したコンポーネント３を出力する（ステップＳ４）。 The integrated component generation unit 20 refers to the component template storage unit 21, generates a component for each query stored in the generation query storage unit 17, and outputs the generated component 3 (step S4).

図１５は、句分割による統合クエリ生成処理（ステップＳ２）のより詳細な処理フロー例を示す図である。 FIG. 15 is a diagram illustrating a more detailed processing flow example of the integrated query generation processing (step S2) by phrase division.

第１統合クエリ生成部１１は、フロー定義２の先頭から、フラグをもとに、クエリ未生成の処理が存在するかを判定し（ステップＳ２１）、クエリ未生成の処理があれば（ステップＳ２１のＹ）、フロー定義２からクエリ未生成の処理を選択して対象処理とし（ステップＳ２２）、クエリ未生成の処理がなければ（ステップＳ２１のＮ）、処理を終了する。 Starting from the head of the flow definition 2, the first integrated query generation unit 11 determines, based on the flags, whether there is a process for which no query has been generated (step S21). If there is such a process (Y in step S21), it selects that process from the flow definition 2 as the target process (step S22). If there is no such process (N in step S21), the processing ends.

第１統合クエリ生成部１１は、対象処理の後続処理があるかを判定する（ステップＳ２３）。第１統合クエリ生成部１１は、具体的には、図１６に示すように、対象処理の出力データと同じＩＤの入力データを持つ処理をフロー定義２から検索し、検索できた場合に、後続処理があると判定する。 The first integrated query generation unit 11 determines whether the target process has a subsequent process (step S23). Specifically, as illustrated in FIG. 16, the first integrated query generation unit 11 searches the flow definition 2 for a process whose input data has the same ID as the output data of the target process, and determines that a subsequent process exists if such a process is found.
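
The successor lookup of step S23 can be sketched as follows. This is only an illustrative sketch: the `Process` class, its field names, and the data IDs `D0` to `D3` are hypothetical and do not appear in the specification, which states only that a subsequent process is one whose input data ID equals the output data ID of the target process.

```python
from dataclasses import dataclass


@dataclass
class Process:
    """Hypothetical stand-in for one process entry in flow definition 2."""
    name: str
    input_id: str   # ID of the input data
    output_id: str  # ID of the output data


def find_successors(flow, target):
    """Return the processes whose input data ID equals the target's output data ID."""
    return [p for p in flow if p is not target and p.input_id == target.output_id]


# Assumed linear flow P1 -> P2 -> P3 with made-up data IDs.
flow = [
    Process("P1", "D0", "D1"),
    Process("P2", "D1", "D2"),
    Process("P3", "D2", "D3"),
]
successors = find_successors(flow, flow[0])
```

An empty result corresponds to N in step S23, and a result with more than one entry corresponds to the "multiple subsequent processes" case checked later in step S242.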

対象処理に後続処理があれば（ステップＳ２３のＹ）、第１統合クエリ生成部１１は、対象処理と後続処理との統合可否を判定する（ステップＳ２４）。ステップＳ２４の処理の詳細は後述する。 If there is a subsequent process in the target process (Y in step S23), the first integrated query generation unit 11 determines whether or not the target process and the subsequent process can be integrated (step S24). Details of the processing in step S24 will be described later.

ステップＳ２４の処理の結果、対象処理と後続処理との統合が可能であれば（ステップＳ２５のＹ）、第１統合クエリ生成部１１は、対象処理と後続処理との句毎の要素を結合して統合クエリの各句を生成し、生成クエリ記憶部１７に対象処理の各句として保存する（ステップＳ２６）。具体的には、第１統合クエリ生成部１１は、フロー定義２の対象処理および後続処理のデータ定義を参照し、句分割テンプレート記憶部１５から処理名に対応する句テンプレートを用いて各句を生成する。図１３のデータテーブルが参照され、対象処理が処理Ｐ２で後続処理が処理Ｐ３である場合に、ＳＥＬＥＣＴ句の句テンプレートＴ１１（処理Ｐ３）、ＦＲＯＭ句の句テンプレートＴ６（処理Ｐ２）、ＷＨＥＲＥ句の句テンプレートＴ７（処理Ｐ２）が指定される。そして、図１７に示すように、句毎に対象処理と後続処理との要素を結合した各句（ＳＥＬＥＣＴ句ｑ４、ＦＲＯＭ句ｑ５、ＷＨＥＲＥ句ｑ６）が生成される。 If the result of step S24 is that the target process and the subsequent process can be integrated (Y in step S25), the first integrated query generation unit 11 combines the elements of the target process and the subsequent process phrase by phrase to generate each phrase of the integrated query, and stores them in the generated query storage unit 17 as the phrases of the target process (step S26). Specifically, the first integrated query generation unit 11 refers to the data definitions of the target process and the subsequent process in the flow definition 2, and generates each phrase using the phrase template that corresponds to the process name in the phrase division template storage unit 15. The data table of FIG. 13 is referenced, and when the target process is process P2 and the subsequent process is process P3, the SELECT phrase template T11 (process P3), the FROM phrase template T6 (process P2), and the WHERE phrase template T7 (process P2) are specified. Then, as shown in FIG. 17, the phrases (SELECT phrase q4, FROM phrase q5, WHERE phrase q6) are generated by combining, for each phrase, the elements of the target process and the subsequent process.

第１統合クエリ生成部１１は、対象処理と後続処理の演算型の和を、対象処理の演算型として生成クエリ記憶部１７に保存する（ステップＳ２７）。さらに、第１統合クエリ生成部１１は、後続処理をフロー定義２から削除する（ステップＳ２８）。 The first integrated query generation unit 11 stores the union of the operation types of the target process and the subsequent process in the generated query storage unit 17 as the operation type of the target process (step S27). Further, the first integrated query generation unit 11 deletes the subsequent process from the flow definition 2 (step S28).

図１８に示すように、対象処理が処理Ｐ２で後続処理が処理Ｐ３である場合に、生成クエリ記憶部１７には、対象処理の句として、生成された各句ｑ４、ｑ５、ｑ６、および演算型「選択＋射影」が保存される。また、図１８の処理名の斜線は、その処理が削除されていることを表す。 As illustrated in FIG. 18, when the target process is the process P2 and the subsequent process is the process P3, the generated query storage unit 17 stores the generated phrases q4, q5, q6, and an operation as the target process phrase. The type “selection + projection” is stored. Also, the diagonal line of the process name in FIG. 18 indicates that the process has been deleted.
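
The phrase-by-phrase combination of steps S26 and S27 can be sketched as follows. The phrase contents (field names, stream names, the value range) are hypothetical, because the actual templates T6, T7, and T11 appear only in the figures. The sketch assumes that the SELECT phrase comes from the subsequent process, the FROM phrase from the target process, and that WHERE conditions are joined with AND, which matches the P2 and P3 example above but is not stated as a general rule in the text.

```python
def merge_phrases(target, successor):
    """Combine the phrases of a target process and its successor, phrase by phrase."""
    merged = {}
    # The successor's SELECT list describes the fields of the final output.
    merged["SELECT"] = successor.get("SELECT", target.get("SELECT"))
    # The target reads the original input, so its FROM phrase is kept.
    merged["FROM"] = target["FROM"]
    # Any WHERE conditions from either process are joined with AND.
    conditions = [p["WHERE"] for p in (target, successor) if "WHERE" in p]
    if conditions:
        merged["WHERE"] = " AND ".join(f"({c})" for c in conditions)
    return merged


def render(phrases):
    """Join the stored phrases into one query string (as in step S210)."""
    order = ("SELECT", "FROM", "WHERE")
    return " ".join(f"{k} {phrases[k]}" for k in order if k in phrases)


# Hypothetical phrases for P2 (value range specification) and P3 (field selection).
p2 = {"SELECT": "*", "FROM": "input_stream", "WHERE": "time >= 9 AND time < 17"}
p3 = {"SELECT": "id, time", "FROM": "p2_output"}

q11 = render(merge_phrases(p2, p3))
merged_ops = {"selection"} | {"projection"}  # step S27: union of operation types
```

Here `q11` is the string `SELECT id, time FROM input_stream WHERE (time >= 9 AND time < 17)`, a single query that replaces the two separate queries of P2 and P3.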

ステップＳ２４の処理の結果、対象処理と後続処理との統合が可能でなければ（ステップＳ２５のＮ）、第１統合クエリ生成部１１は、対象処理のクエリの各句を生成し、各句と演算型を生成クエリ記憶部１７に保存する（ステップＳ２９）。 If the target process and the subsequent process cannot be integrated as a result of the process in step S24 (N in step S25), the first integrated query generation unit 11 generates each phrase of the target process query, The operation type is stored in the generated query storage unit 17 (step S29).

対象処理が処理Ｐ１で後続処理と統合不可である場合に、ＳＥＬＥＣＴ句の句テンプレートＴ１（処理Ｐ１）、ＦＲＯＭ句の句テンプレートＴ２（処理Ｐ１）が指定され、図１９に示すように、句毎に対象処理の要素をそのまま用いた各句（ＳＥＬＥＣＴ句ｑ１、ＦＲＯＭ句ｑ２）が生成される。図２０に示すように、生成クエリ記憶部１７には、対象処理の句として、生成された各句ｑ１、ｑ２および演算型「拡張」が保存される。 When the target process is the process P1 and cannot be integrated with the subsequent process, the phrase template T1 (process P1) of the SELECT phrase and the phrase template T2 (process P1) of the FROM phrase are designated, and as shown in FIG. Each phrase (SELECT phrase q1, FROM phrase q2) is generated using the target processing elements as they are. As illustrated in FIG. 20, the generated query storage unit 17 stores the generated phrases q1 and q2 and the operation type “expansion” as target processing phrases.

そして、ステップＳ２３の処理の結果、対象処理に後続処理がなければ（ステップＳ２３のＮ）、第１統合クエリ生成部１１は、対象処理のクエリの各句を生成して、生成クエリ記憶部１７に保存する（ステップＳ２９）。第１統合クエリ生成部１１は、生成クエリ記憶部１７に保存した各処理の各句を結合し、その処理のクエリとして生成クエリ記憶部１７に保存する（ステップＳ２１０）。図２１に示すように、処理Ｐ１に対応するクエリＱ１０が、処理Ｐ２（Ｐ２＋Ｐ３）に対応するクエリＱ１１が生成される。 If there is no subsequent process in the target process as a result of the process in step S23 (N in step S23), the first integrated query generation unit 11 generates each phrase of the query of the target process, and the generated query storage unit 17 (Step S29). The first integrated query generation unit 11 combines each phrase of each process stored in the generation query storage unit 17 and stores it in the generation query storage unit 17 as a query of the process (step S210). As shown in FIG. 21, a query Q10 corresponding to the process P1 is generated, and a query Q11 corresponding to the process P2 (P2 + P3) is generated.
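
The overall loop of FIG. 15 can be sketched as follows, under the simplifying assumption of a linear flow in which each process has at most one successor. The function name `integrate_by_phrase` and the callback parameters `can_integrate` and `merge` are hypothetical; they stand in for the integration decision of step S24 and the phrase combination of step S26.

```python
def integrate_by_phrase(flow, can_integrate, merge):
    """Walk the flow from the head and absorb each successor that can be integrated."""
    flow = list(flow)  # work on a copy; merged successors are removed (step S28)
    result = []
    i = 0
    while i < len(flow):
        target = flow[i]
        # Keep merging while the next process can be integrated into the target.
        while i + 1 < len(flow) and can_integrate(target, flow[i + 1]):
            target = merge(target, flow.pop(i + 1))
        result.append(target)  # steps S29 and S210: store the query for this process
        i += 1
    return result


# Toy run: P1 cannot be integrated with its successor, while P2 and P3 can.
queries = integrate_by_phrase(
    ["P1", "P2", "P3"],
    can_integrate=lambda target, successor: target != "P1",
    merge=lambda target, successor: f"{target}+{successor}",
)
```

With these toy callbacks the result is `["P1", "P2+P3"]`, which mirrors the two queries Q10 and Q11 of FIG. 21.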

図２２（Ａ）は、処理Ｐ１についてクエリＱ１０の例、図２２（Ｂ）は、処理Ｐ２についてクエリＱ１１の例を示す図である。 FIG. 22A shows an example of the query Q10 for the process P1, and FIG. 22B shows an example of the query Q11 for the process P2.

図２３は、統合可否判定処理（ステップＳ２４）のより詳細な処理フロー例を示す図である。 FIG. 23 is a diagram showing a more detailed processing flow example of the integration possibility determination processing (step S24).

第１統合クエリ生成部１１は、フロー定義２中の対象処理と後続処理とを参照し（ステップＳ２４１）、対象処理と後続処理の処理型が異なるか、または、対象処理の後続処理が複数存在するかを判定する（ステップＳ２４２）。 The first integrated query generation unit 11 refers to the target process and the subsequent process in the flow definition 2 (step S241), and the process types of the target process and the subsequent process are different, or there are a plurality of subsequent processes of the target process. It is determined whether to do so (step S242).

対象処理と後続処理の処理型が同じ、かつ、対象処理の後続処理が複数存在しない場合に（ステップＳ２４２のＮ）、対象処理と後続処理の演算型を計算する（ステップＳ２４３）。 When the processing types of the target process and the subsequent process are the same and the target process does not have a plurality of subsequent processes (N in step S242), the operation types of the target process and the subsequent process are calculated (step S243).

ステップＳ２４３の処理の結果、対象処理または後続処理の演算型が、拡張、選択、射影以外の演算型を含まない場合に（ステップＳ２４４のＮ）、さらに、対象処理の演算型が拡張を含まない場合には（ステップＳ２４５のＮ）、第１統合クエリ生成部１１は、統合可と判定する（ステップＳ２４６）。一方、対象処理の演算型が拡張を含む場合は（ステップＳ２４５のＹ）、第１統合クエリ生成部１１は、さらに後続処理の演算型が拡張または選択を含むかを判定する（ステップＳ２４７）。後続処理の演算型が拡張または選択を含まない場合は（ステップＳ２４７のＮ）、第１統合クエリ生成部１１は、統合可と判定し（ステップＳ２４６）、後続処理の演算型が拡張または選択を含む場合は（ステップＳ２４７のＹ）、さらに、対象処理で追加したフィールドを後続処理で参照しているかを判定する（ステップＳ２４８）。対象処理で追加したフィールドを後続処理で参照していなければ（ステップＳ２４８のＮ）、第１統合クエリ生成部１１は、統合可と判定する（ステップＳ２４６）。対象処理で追加したフィールドを後続処理で参照していれば（ステップＳ２４８のＹ）、第１統合クエリ生成部１１は、統合不可と判定する（ステップＳ２４９）。 As a result of the process in step S243, when neither the target process nor the subsequent process has an operation type other than expansion, selection, and projection (N in step S244), and the operation type of the target process does not include expansion (N in step S245), the first integrated query generation unit 11 determines that integration is possible (step S246). On the other hand, when the operation type of the target process includes expansion (Y in step S245), the first integrated query generation unit 11 further determines whether the operation type of the subsequent process includes expansion or selection (step S247). When the operation type of the subsequent process includes neither expansion nor selection (N in step S247), the first integrated query generation unit 11 determines that integration is possible (step S246). When the operation type of the subsequent process includes expansion or selection (Y in step S247), it further determines whether a field added by the target process is referenced by the subsequent process (step S248). If no field added by the target process is referenced by the subsequent process (N in step S248), the first integrated query generation unit 11 determines that integration is possible (step S246). If a field added by the target process is referenced by the subsequent process (Y in step S248), the first integrated query generation unit 11 determines that integration is not possible (step S249).

ステップＳ２４２の処理で、対象処理と後続処理の処理型が異なるか、または、対象処理の後続処理が複数存在する場合（ステップＳ２４２のＹ）、または、ステップＳ２４４の処理で、対象処理または後続処理の演算型が、拡張、選択、射影以外の演算型を含む場合には（ステップＳ２４４のＹ）、第１統合クエリ生成部１１は、統合不可と判定する（ステップＳ２４９）。 When, in step S242, the processing types of the target process and the subsequent process differ or the target process has a plurality of subsequent processes (Y in step S242), or when, in step S244, the operation type of the target process or the subsequent process includes an operation type other than expansion, selection, and projection (Y in step S244), the first integrated query generation unit 11 determines that integration is not possible (step S249).
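
The decision of steps S242 to S249 can be sketched as one function. The dictionary keys (`processing_type`, `ops`, `added_fields`, `referenced_fields`) and the `successor_count` parameter are hypothetical representations chosen for this sketch; the specification describes the decision only as a flowchart.

```python
MERGEABLE = {"expansion", "selection", "projection"}


def can_integrate(target, successor, successor_count=1):
    """Return True when target and successor may be merged phrase by phrase."""
    # S242: different processing types or several successors -> not integrable.
    if target["processing_type"] != successor["processing_type"] or successor_count > 1:
        return False
    # S244: any operation type other than expansion/selection/projection -> not integrable.
    if not (target["ops"] <= MERGEABLE and successor["ops"] <= MERGEABLE):
        return False
    # S245: a target without expansion adds no fields, so reordering is safe.
    if "expansion" not in target["ops"]:
        return True
    # S247: a successor with neither expansion nor selection is safe as well.
    if not (successor["ops"] & {"expansion", "selection"}):
        return True
    # S248: integrable only if the successor ignores the fields the target added.
    added = target.get("added_fields", set())
    referenced = successor.get("referenced_fields", set())
    return not (added & referenced)


# Hypothetical encodings of P1 (adds "time"), P2 (filters on "time"), P3 (field selection).
p1 = {"processing_type": "realtime", "ops": {"expansion"}, "added_fields": {"time"}}
p2 = {"processing_type": "realtime", "ops": {"selection"}, "referenced_fields": {"time"}}
p3 = {"processing_type": "realtime", "ops": {"projection"}, "referenced_fields": {"id", "time"}}
```

With these inputs, `can_integrate(p1, p2)` is false because P2 filters on the field that P1 adds, while `can_integrate(p2, p3)` is true, consistent with the P1 and P2+P3 outcome described in the text.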

図２４は、演算型計算処理（ステップＳ２４３）のより詳細な処理フロー例を示す図である。 FIG. 24 is a diagram illustrating a more detailed processing flow example of the operation type calculation processing (step S243).

第１統合クエリ生成部１１は、生成クエリ記憶部１７に、対象処理の演算型が登録済みであるかを判定し（ステップＳ４３１）、演算型が登録済みでなければ（ステップＳ４３１のＮ）、演算型を空（Φ）にして（ステップＳ４３２）、フロー定義２から、対象処理の入力データと出力データのスキーマを取得する（ステップＳ４３３）。 The first integrated query generation unit 11 determines whether the operation type of the target process has already been registered in the generated query storage unit 17 (step S431). If the operation type has not been registered (N in step S431), it sets the operation type to empty (Φ) (step S432) and acquires the schemas of the input data and output data of the target process from the flow definition 2 (step S433).

第１統合クエリ生成部１１は、出力データに、入力データにはないフィールドが存在するかを判定する（ステップＳ４３４）。出力データに、入力データにはないフィールドが存在すれば（ステップＳ４３４のＹ）、第１統合クエリ生成部１１は、生成クエリ記憶部の演算型に拡張を追加し（ステップＳ４３５）、入力データにはないフィールドが存在しなければ（ステップＳ４３４のＮ）、ステップＳ４３６の処理に進む。 The first integrated query generation unit 11 determines whether there is a field not included in the input data in the output data (step S434). If the output data includes a field that is not in the input data (Y in step S434), the first integrated query generation unit 11 adds an extension to the operation type of the generation query storage unit (step S435), and adds the input data to the input data. If no field exists (N in Step S434), the process proceeds to Step S436.

例えば、図２５に示す処理Ｐ１（日時→時刻変換）では、処理定義に定義された新しいフィールド「時刻」が出力データに追加されているので、処理Ｐ１の演算型は拡張と判定される。 For example, in the process P1 (date-and-time to time conversion) shown in FIG. 25, the new field "time" defined in the process definition is added to the output data, so the operation type of the process P1 is determined to be expansion.

第１統合クエリ生成部１１は、出力データに、入力データの全フィールドが揃っているかを判定する（ステップＳ４３６）。入力データの全フィールドが揃っていなければ（ステップＳ４３６のＮ）、第１統合クエリ生成部１１は、生成クエリ記憶部の演算型に射影を追加し（ステップＳ４３７）、入力データの全フィールドが揃っていれば（ステップＳ４３６のＹ）、ステップＳ４３８の処理に進む。 The first integrated query generation unit 11 determines whether all fields of the input data are included in the output data (step S436). If all the fields of the input data are not complete (N in step S436), the first integrated query generation unit 11 adds a projection to the operation type of the generation query storage unit (step S437), and all the fields of the input data are complete. If so (Y in step S436), the process proceeds to step S438.

例えば、図２６に示す処理Ｐ３（フィールド選択）では、入力データと出力データのフィールドが揃っておらず、出力データに入力データの全フィールドが含まれていないので（不一致（減少））、処理Ｐ３の演算型は射影と判定される。 For example, in the process P3 (field selection) shown in FIG. 26, the fields of the input data and the output data are not aligned, and the output data does not include all the fields of the input data (mismatch (decrease)). The operation type of is determined to be projection.

第１統合クエリ生成部１１は、対象処理にＷＨＥＲＥ句の句テンプレートが存在するかを判定する（ステップＳ４３８）。対象処理にＷＨＥＲＥ句の句テンプレートが存在すれば（ステップＳ４３８のＹ）、第１統合クエリ生成部１１は、生成クエリ記憶部の演算型に選択を追加し（ステップＳ４３９）、対象処理にＷＨＥＲＥ句の句テンプレートが存在しなければ（ステップＳ４３８のＮ）、ステップＳ４３１０の処理に進む。 The first integrated query generation unit 11 determines whether a phrase template for the WHERE phrase exists in the target process (step S438). If the phrase template of the WHERE phrase exists in the target process (Y in step S438), the first integrated query generation unit 11 adds a selection to the operation type of the generation query storage unit (step S439), and the WHERE phrase in the target process If the phrase template does not exist (N in step S438), the process proceeds to step S4310.

例えば、図２７に示す処理Ｐ２（値範囲指定）では、入力データと出力データのフィールドが一致し、処理に適用されたＷＨＥＲＥ句の句テンプレートが存在するので、処理Ｐ２の演算型は選択と判定される。 For example, in the process P2 (value range specification) shown in FIG. 27, the input data field and the output data field match, and the phrase template of the WHERE clause applied to the process exists, so the operation type of the process P2 is determined to be selection. Is done.

第１統合クエリ生成部１１は、ＳＥＬＥＣＴ、ＦＲＯＭ、ＷＨＥＲＥ以外の句テンプレートが存在するかを判定する（ステップＳ４３１０）。対象処理にＳＥＬＥＣＴ、ＦＲＯＭ、ＷＨＥＲＥ以外の句テンプレートが存在すれば（ステップＳ４３１０のＹ）、第１統合クエリ生成部１１は、生成クエリ記憶部の演算型にその他を追加し（ステップＳ４３１１）、対象処理にＳＥＬＥＣＴ、ＦＲＯＭ、ＷＨＥＲＥ以外の句テンプレートが存在しなければ（ステップＳ４３１０のＮ）、処理を終了する。 The first integrated query generation unit 11 determines whether a phrase template other than SELECT, FROM, and WHERE exists (step S4310). If a phrase template other than SELECT, FROM, and WHERE exists in the target process (Y in step S4310), the first integrated query generation unit 11 adds others to the operation type of the generated query storage unit (step S4311), and the target If there is no phrase template other than SELECT, FROM, and WHERE in the process (N in step S4310), the process ends.

処理の演算型が登録済みであれば（ステップＳ４３１のＹ）、対象処理の演算型を生成クエリ記憶部１７から取得し（ステップＳ４３１２）、処理を終了する。 If the operation type of the process has been registered (Y in step S431), the operation type of the target process is acquired from the generated query storage unit 17 (step S4312), and the process ends.
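
The operation type calculation of steps S433 to S4311 can be sketched as a comparison of the input and output schemas plus a check of which phrase templates apply. The function name, its parameters, and the field names in the examples are hypothetical; the caching of already registered operation types (steps S431 and S4312) is omitted.

```python
def operation_types(input_fields, output_fields, phrase_names):
    """Derive the set of operation types of one process from its schemas and phrases."""
    ops = set()  # S432: start from the empty set
    inp, out = set(input_fields), set(output_fields)
    if out - inp:
        ops.add("expansion")   # S434/S435: output has a field the input lacks
    if inp - out:
        ops.add("projection")  # S436/S437: output is missing some input field
    if "WHERE" in phrase_names:
        ops.add("selection")   # S438/S439: a WHERE phrase template exists
    if set(phrase_names) - {"SELECT", "FROM", "WHERE"}:
        ops.add("other")       # S4310/S4311: any other phrase template exists
    return ops


# Hypothetical schemas for P1 (date-time to time conversion), P2 (value range), P3 (field selection).
p1_ops = operation_types(["id", "datetime", "value"],
                         ["id", "datetime", "value", "time"],
                         ["SELECT", "FROM"])
p2_ops = operation_types(["id", "datetime", "value", "time"],
                         ["id", "datetime", "value", "time"],
                         ["SELECT", "FROM", "WHERE"])
p3_ops = operation_types(["id", "datetime", "value", "time"],
                         ["id", "time"],
                         ["SELECT", "FROM"])
```

The three results are `{"expansion"}`, `{"selection"}`, and `{"projection"}`, matching the determinations described for P1, P2, and P3.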

図２８は、入れ子による統合クエリ生成処理（ステップＳ３）のより詳細な処理フロー例を示す図である。 FIG. 28 is a diagram illustrating a more detailed process flow example of the nested integrated query generation process (step S3).

第２統合クエリ生成部１３は、生成クエリ記憶部１７から、第１統合クエリ生成部１１により生成された各クエリに入れ子による統合の未実施を示すフラグを付加する。そして、第２統合クエリ生成部１３は、入れ子による統合が未実施の処理があるかを判定する（ステップＳ３１）。 The second integrated query generation unit 13 adds, to each query generated by the first integrated query generation unit 11 and stored in the generated query storage unit 17, a flag indicating that integration by nesting has not yet been performed. Then, the second integrated query generation unit 13 determines whether there is a process for which integration by nesting has not yet been performed (step S31).

入れ子による統合が未実施の処理があれば（ステップＳ３１のＹ）、第２統合クエリ生成部１３は、生成クエリ記憶部１７に保存されたクエリの先頭から入れ子による統合未実施の処理を選択し、対象処理とする（ステップＳ３２）。第２統合クエリ生成部１３は、対象処理の後続処理があるかを判定する（ステップＳ３３）。対象処理の後続処理があれば（ステップＳ３３のＹ）、第２統合クエリ生成部１３は、生成クエリ記憶部１７から対象処理の後続処理を選択する（ステップＳ３４）。 If there is a process that is not yet integrated by nesting (Y in step S31), the second integrated query generation unit 13 selects a process that is not yet integrated by nesting from the top of the query stored in the generated query storage unit 17. The target process (step S32). The second integrated query generation unit 13 determines whether there is a subsequent process of the target process (step S33). If there is a subsequent process of the target process (Y in step S33), the second integrated query generation unit 13 selects a subsequent process of the target process from the generated query storage unit 17 (step S34).

第２統合クエリ生成部１３は、対象処理と後続処理の処理型が異なるか、または、対象処理の後続処理が複数存在するかを判定する（ステップＳ３５）。 The second integrated query generation unit 13 determines whether the processing types of the target process and the subsequent process are different, or whether there are a plurality of subsequent processes of the target process (step S35).

対象処理と後続処理の処理型が同じ、かつ、対象処理の後続処理が複数存在しない場合に（ステップＳ３５のＮ）、第２統合クエリ生成部１３は、対象処理と後続処理の生成クエリを入れ子により統合し、統合クエリを生成する（ステップＳ３６）。第２統合クエリ生成部１３は、生成した統合クエリを対象処理のクエリとして生成クエリ記憶部へ保存し（ステップＳ３７）、後続処理を生成クエリ記憶部１７から削除する（ステップＳ３８）。 When the processing types of the target process and the subsequent process are the same and there are not a plurality of subsequent processes of the target process (N in step S35), the second integrated query generation unit 13 nests the generation query of the target process and the subsequent process Are integrated to generate an integrated query (step S36). The second integrated query generation unit 13 saves the generated integrated query as a target process query in the generated query storage unit (step S37), and deletes subsequent processing from the generated query storage unit 17 (step S38).

図２９に示すように、処理Ｐ１の生成クエリＱ１０と処理Ｐ２（Ｐ２＋Ｐ３）の生成クエリＱ１１とが入れ子により統合され、統合クエリＱ２０が生成される。そして、図３０に示すように、生成クエリ記憶部１７の処理Ｐ１（日時→時刻変換）の生成クエリが統合クエリＱ２０に置き換えられ、処理Ｐ２（値範囲指定）が削除される。 As shown in FIG. 29, the generation query Q10 of the process P1 and the generation query Q11 of the process P2 (P2 + P3) are integrated by nesting to generate an integrated query Q20. Then, as shown in FIG. 30, the generated query of the process P1 (date / time conversion) in the generated query storage unit 17 is replaced with the integrated query Q20, and the process P2 (value range specification) is deleted.
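
Integration by nesting (step S36) can be sketched as placing the target process's query inside the FROM phrase of the subsequent process's query. The query strings, the `hour(datetime)` expression, and the subquery alias `t` are illustrative assumptions; the actual queries Q10, Q11, and Q20 appear only in the figures, and whether a given query language accepts this exact subquery form is not confirmed by the text.

```python
def nest(target_query, successor_phrases, alias="t"):
    """Embed target_query as a subquery in the FROM phrase of the successor's query."""
    parts = [
        f"SELECT {successor_phrases['SELECT']}",
        f"FROM ({target_query}) {alias}",
    ]
    if "WHERE" in successor_phrases:
        parts.append(f"WHERE {successor_phrases['WHERE']}")
    return " ".join(parts)


# Hypothetical Q10 (process P1) and the phrases of Q11 (process P2+P3).
q10 = "SELECT *, hour(datetime) AS time FROM input_stream"
q11_phrases = {"SELECT": "id, time", "WHERE": "time >= 9 AND time < 17"}

q20 = nest(q10, q11_phrases)
```

Because the inner query is evaluated first, the outer WHERE phrase can safely reference the `time` field that P1 adds, which is exactly the case that phrase-by-phrase merging had to reject.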

ステップＳ３３の処理で、対象処理の後続処理がない場合（ステップＳ３３のＮ）、または、ステップＳ３５の処理で、対象処理と後続処理の処理型が異なるか、または、対象処理の後続処理が複数存在する場合（ステップＳ３５のＹ）、対象処理について統合済みをフロー定義にマークする（ステップＳ３９）。 If there is no subsequent process of the target process in the process of step S33 (N of step S33), or the process type of the target process and the subsequent process is different in the process of step S35, or there are a plurality of subsequent processes of the target process If it exists (Y in step S35), the flow definition is marked as integrated for the target process (step S39).

図３１は、コンポーネント生成処理（ステップＳ４）のより詳細な処理フロー例を示す図である。 FIG. 31 is a diagram illustrating a more detailed processing flow example of the component generation processing (step S4).

統合コンポーネント生成部２０は、フロー定義２にコンポーネント未生成の処理があるかを判定する（ステップＳ４１）。コンポーネント未生成の処理があれば（ステップＳ４１のＹ）、統合コンポーネント生成部２０は、フロー定義２からコンポーネント未生成の処理を選択し（ステップＳ４２）、その処理の処理型をもとに、コンポーネントテンプレート記憶部２１から、対応するコンポーネントテンプレートを取得する（ステップＳ４３）。 The integrated component generation unit 20 determines whether there is a component non-generated process in the flow definition 2 (step S41). If there is a process in which no component is generated (Y in step S41), the integrated component generation unit 20 selects a process in which no component has been generated from the flow definition 2 (step S42), and based on the processing type of the process, the component A corresponding component template is acquired from the template storage unit 21 (step S43).

コンポーネントテンプレート記憶部２１には、図３２に示すように、クエリ言語に対応するコンポーネントテンプレートが設定されている。図３３は、コンポーネントテンプレートの例を示す図である。図３３（Ａ）は、ＥＰＬ用のコンポーネントテンプレートＣ１の例、図３３（Ｂ）は、ＨｉｖｅＱＬ用のコンポーネントテンプレートＣ２の例である。ここで、処理型に応じてクエリ言語が選択されるので、処理型がリアルタイムであればコンポーネントテンプレートＣ１が選択され、処理型がバッチであればコンポーネントテンプレートＣ２が選択される。 As shown in FIG. 32, component templates corresponding to the query languages are set in the component template storage unit 21. FIG. 33 is a diagram illustrating examples of component templates. FIG. 33A shows an example of a component template C1 for EPL, and FIG. 33B shows an example of a component template C2 for HiveQL. Because the query language is selected according to the processing type, the component template C1 is selected if the processing type is real time, and the component template C2 is selected if the processing type is batch.

統合コンポーネント生成部２０は、生成クエリ記憶部１７から、選択した処理に対応するクエリを取得し（ステップＳ４４）、選択したコンポーネントテンプレートをもとに、コンポーネントを生成する（ステップＳ４５）。フロー定義２から処理Ｐ１の処理型がリアルタイムであれば、コンポーネントテンプレートＣ１が選択されるため、図３４に示すように、コンポーネントテンプレートＣ１に、生成クエリ記憶部１７に保存されていた処理Ｐ１の統合クエリＱ２０が適用されて、コンポーネントが生成される。 The integrated component generation unit 20 acquires a query corresponding to the selected process from the generation query storage unit 17 (step S44), and generates a component based on the selected component template (step S45). If the processing type of the process P1 from the flow definition 2 is real-time, the component template C1 is selected. Therefore, as shown in FIG. 34, the process P1 stored in the generated query storage unit 17 is integrated into the component template C1. Query Q20 is applied to generate the component.
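
Steps S43 to S45 can be sketched with Python's standard `string.Template`. The template bodies below are placeholders invented for this sketch; the real component templates C1 and C2 are shown only in FIG. 33. The dictionary keys `realtime` and `batch` are likewise assumed names for the two processing types.

```python
from string import Template

# Placeholder templates keyed by processing type (C1 for EPL, C2 for HiveQL).
COMPONENT_TEMPLATES = {
    "realtime": Template("EPL_COMPONENT[$query]"),
    "batch": Template("HIVEQL_COMPONENT[$query]"),
}


def generate_component(processing_type, query):
    """Select the template for the processing type and apply the stored query to it."""
    template = COMPONENT_TEMPLATES[processing_type]  # S43
    return template.substitute(query=query)          # S45


component = generate_component("realtime", "SELECT id, time FROM input_stream")
```

Since one component is generated per stored query, every pair of queries merged in the earlier steps removes one component from the output.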

統合コンポーネント生成部２０によって生成されたコンポーネント３は出力され、記憶装置、記憶媒体等に保存される。 The component 3 generated by the integrated component generation unit 20 is output and stored in a storage device, a storage medium, or the like.

以上説明した統合コンポーネント生成装置１は、図４に示す処理部を備える専用ハードウェアとして実施することができる。 The integrated component generation apparatus 1 described above can be implemented as dedicated hardware including the processing units illustrated in FIG. 4.

また、統合コンポーネント生成装置１を、図３５に示すような、ＣＰＵ１０１、メモリ１０２、記憶装置（ハードディスク）１０３、入力装置（キーボード）１０４、出力装置（ディスプレイ）１０５、ネットワーク接続装置１０６等が内部のネットワーク等で接続されたコンピュータ１００で実施することができる。 The integrated component generation apparatus 1 can also be implemented by a computer 100, as shown in FIG. 35, in which a CPU 101, a memory 102, a storage device (hard disk) 103, an input device (keyboard) 104, an output device (display) 105, a network connection device 106, and the like are connected via an internal network or the like.

さらに、統合コンポーネント生成装置１を、コンピュータ１００で実行可能なプログラムとして実施することができる。この場合に、図４に示す統合コンポーネント生成装置１の処理部の機能を実現するプログラムを実装し、コンピュータ１００上で実行することにより、実施する。すなわち、図４に示したクエリ統合装置１０の第１統合クエリ生成部１１，第２統合クエリ生成部１３、さらに、統合コンポーネント生成部２０の機能をコンピュータに実行させる実行プログラムをコンピュータ１００に読み込ませ、実行させることによって，統合コンポーネント生成装置１を実現することができる。 Furthermore, the integrated component generation apparatus 1 can be implemented as a program executable by the computer 100. In this case, a program that realizes the functions of the processing units of the integrated component generation apparatus 1 shown in FIG. 4 is implemented and executed on the computer 100. That is, the integrated component generation apparatus 1 can be realized by causing the computer 100 to read and execute an execution program that causes a computer to perform the functions of the first integrated query generation unit 11 and the second integrated query generation unit 13 of the query integration device 10 illustrated in FIG. 4, as well as the functions of the integrated component generation unit 20.

なお、実行プログラムは、ＣＤ−ＲＯＭ、ＣＤ−ＲＷ、ＤＶＤ−Ｒ、ＤＶＤ−ＲＡＭ、ＤＶＤ−ＲＷ等やフレキシブルディスク等の記録媒体だけでなく、通信回線の先に備えられた他の記憶装置やコンピュータのハードディスク等に記憶されるものであってもよい。 The execution program is not only a recording medium such as a CD-ROM, CD-RW, DVD-R, DVD-RAM, DVD-RW, or flexible disk, but also other storage devices provided at the end of the communication line. It may be stored in a computer hard disk or the like.

なお、統合コンポーネント生成装置１のクエリ統合装置１０を構成する要素は、任意の組合せで実現されてもよい。複数の構成要素が１つの部材として実現されてもよく、１つの構成要素が複数の部材から構成されてもよい。また、クエリ統合装置１０は、上述した実施形態に限定されず、本発明の要旨を逸脱しない範囲において各種の改良および変更を行ってもよいことは当然である。 Note that the elements constituting the query integration device 10 of the integrated component generation device 1 may be realized in any combination. A plurality of components may be realized as one member, and one component may be configured from a plurality of members. In addition, the query integration device 10 is not limited to the above-described embodiment, and various improvements and changes may naturally be made without departing from the gist of the present invention.

以上説明したように、開示したクエリ統合装置１０によれば、フロー定義２に含まれる各処理のクエリを統合するため、生成するクエリ数を大幅に減らすことができ、クエリに対応するコンポーネント数も大幅に減らすことができる。よって、コンポーネント数の削減に応じて通信のオーバーヘッド等を小さくすることできる。 As described above, according to the disclosed query integration device 10, since the queries of each process included in the flow definition 2 are integrated, the number of queries to be generated can be greatly reduced, and the number of components corresponding to the query is also increased. It can be greatly reduced. Therefore, communication overhead and the like can be reduced according to the reduction in the number of components.

DESCRIPTION OF SYMBOLS
１統合コンポーネント生成装置 1 Integrated component generation apparatus
１０クエリ統合装置 10 Query integration device
１１第１統合クエリ生成部 11 First integrated query generation unit
１３第２統合クエリ生成部 13 Second integrated query generation unit
１５句分割テンプレート記憶部 15 Phrase division template storage unit
１７生成クエリ記憶部 17 Generated query storage unit
２０統合コンポーネント生成部 20 Integrated component generation unit
２１コンポーネントテンプレート記憶部 21 Component template storage unit
２フロー定義 2 Flow definition
３生成コンポーネント 3 Generated component

Claims

A query integration method in which a computer:
reads a flow definition that defines a plurality of processes, each having defined processing content and attributes, and the processing order of the plurality of processes; and
takes the processes defined in the flow definition in order from the head as a target process, and repeats a process of generating an integrated query that integrates a first query corresponding to the target process with a second query corresponding to a subsequent process that is processed immediately after the target process and has the same processing type as the target process, and of setting the integrated query as the first query corresponding to the target process.

The query integration method according to claim 1, wherein, in the process of generating the integrated query, the computer:
generates the integrated query by combining the corresponding phrases of the first query and the second query when the second query and the first query are of operation types for which the operation result does not change when the processing order is changed; and generates the integrated query by incorporating the first query into the second query as a subquery when the second query and the first query are of operation types for which the operation result changes when the processing order is changed.

A query integration program that causes a computer to execute a process comprising:
reading a flow definition that defines a plurality of processes, each having defined processing content and attributes, and the processing order of the plurality of processes; and
taking the processes defined in the flow definition in order from the head as a target process, and repeating a process of generating an integrated query that integrates a first query corresponding to the target process with a second query corresponding to a subsequent process that is processed immediately after the target process and has the same processing type as the target process, and of setting the integrated query as the first query corresponding to the target process.

A query integration apparatus comprising:
a flow definition acquisition unit that reads a flow definition that defines a plurality of processes, each having defined processing content and attributes, and the processing order of the plurality of processes; and
an integrated query generation unit that takes the processes defined in the flow definition in order from the head as a target process, generates an integrated query that integrates a first query corresponding to the target process with a second query corresponding to a subsequent process that is processed immediately after the target process and has the same processing type as the target process, and sets the integrated query as the first query corresponding to the target process.