WO2013145310A1 - Data stream parallel processing program, method, and system - Google Patents

Data stream parallel processing program, method, and system

Info

Publication number
WO2013145310A1
Authority
WO
WIPO (PCT)
Prior art keywords
distributed
query
queries
segment
node
Prior art date
Application number
PCT/JP2012/058731
Other languages
French (fr)
Japanese (ja)
Inventor
Emeric Viel
Haruyasu Ueda
Original Assignee
Fujitsu Limited
Priority date
Filing date
Publication date
Application filed by Fujitsu Limited
Priority to PCT/JP2012/058731
Publication of WO2013145310A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24532 Query optimisation of parallel queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists

Definitions

  • the present invention relates to a data stream parallel processing program, method, and system.
  • the data (event sequence) obtained from the data stream can be temporarily stored in the database, and then the data can be extracted and processed.
  • However, from the viewpoint of easily obtaining accurate information in real time, such measures often fail to meet these needs. Therefore, a technique for processing and analyzing large data streams in real time (or near real time) is needed, and satisfying this need in turn requires a technique for sequentially processing data streams in parallel.
  • FIG. 1 shows an example of data stream processing.
  • the stream processing system 140 sequentially processes the data streams for the three input streams (110, 120, 130).
  • the stream processing system 140 provides two output streams (150, 160).
  • a plurality of events 111, 112, 113 are input to the stream processing system 140 in time series.
  • The stream processing system 140 includes a plurality of queries (142, 144, 146, 148, 149). These queries are similar to those used in static database processing. However, queries for stream processing systems differ from database queries in that they operate continuously on the input information and provide the required output. The fact that the output of one query becomes the input of another query also differs from a database query.
  • The term "query" as used herein therefore also has functions that differ from those of a database query.
  • Each query is connected as indicated by a plurality of arrows. These arrows indicate the flow of data.
  • the stream 150 output from the stream processing system 140 includes, for example, a plurality of processing results (151 and 152).
  • a graph indicating a connection relationship between a plurality of queries included in the stream processing system 140 is referred to as a query graph.
  • a processing program including a set of queries represented by a query graph and the relationship between each query is referred to as a data stream program.
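To make the query graph and data stream program notions above concrete, the following minimal Python sketch (not part of the patent; all names are hypothetical) shows one way such a graph could be represented, with each query carrying its distribution key set and its downstream connections.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Query:
    name: str
    key_set: frozenset                                       # distribution key set, e.g. {"K1", "K2"}
    downstream: List["Query"] = field(default_factory=list)  # queries fed by this query's output

# The two-query graph of FIG. 2(A): Q1 (keys K1, K2) feeds Q2 (keys K2, K3).
q2 = Query("Q2", frozenset({"K2", "K3"}))
q1 = Query("Q1", frozenset({"K1", "K2"}), downstream=[q2])
query_graph = [q1, q2]   # listed from the input side toward the output side
```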
  • two queries Q1 and Q2 are connected by an intermediate stream 240. Then, the input stream 210 is processed by the query Q1 and the query Q2, and the query Q2 outputs the output stream 270.
  • the query Q1 has K1 and K2 as distribution keys
  • the query Q2 has K2 and K3 as distribution keys.
  • the distribution key corresponds to a distribution key that can be applied to a hash function used in parallel hash join processing in a static database.
  • a set of distributed keys input to the hash function is called a distributed key set.
  • a distributed key means a key used for joining in a join operator constituting a query.
  • the distributed key used in this specification has the same meaning as the above definition.
  • The hash join method using the distribution key described in this specification is used to appropriately distribute the data stream processed by a query to the subsequent nodes, as described below with reference to FIG. 2(B). It should be noted that it therefore differs from the hash join method in database technology that handles tables.
  • FIG. 2B shows an example in which the data stream program shown in FIG. 2A is executed in parallel in order to process the input stream 210 according to the parallel hash join method.
  • the input stream 210 is expressed using a notation according to the table handled in the database.
  • the input stream 210 has a plurality of keys (K1, K2, K3). Further, a plurality of specific events (212, 214, 216, 218) are arranged in time series to form the input stream 210.
  • the query Q1 is deployed to the node 232 and the node 234.
  • the node may be, for example, a physical machine or a virtual machine.
  • the query Q1 is distributedly processed by the two nodes (232, 234).
  • At point 220, in order to distribute the input stream 210 to the node 232 and the node 234, the distribution key set (K1, K2) is applied to an appropriate hash function, and the input stream 210 is divided into the stream 221 and the stream 222.
  • Via the stream 221, the event 212a and the event 214a arrive at the node 232 sequentially and are processed.
  • Via the stream 222, the event 216a and the event 218a arrive at the node 234 sequentially and are processed.
  • a technique used in a static database can be used as the hash function. Specifically, a hash table or the like may be used. In this case, various hash functions for separating the data stream 210 into two streams (221, 222) using the distributed key set (K1, K2) are applicable.
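As an illustration of how such a hash function can split a stream, the following rough sketch (hypothetical names; Python's built-in hash stands in for whichever hash function an implementation would choose, and the key values are invented for illustration) routes each event so that events agreeing on the distribution key set always land on the same node.

```python
def route(event: dict, key_set, num_nodes: int) -> int:
    """Return the index of the node that should process this event.

    Events that agree on every key in key_set always hash to the same node,
    which is what the parallel hash join style distribution relies on.
    """
    key_values = tuple(event[k] for k in sorted(key_set))
    return hash(key_values) % num_nodes

# Splitting a stream on the key set (K1, K2) across two nodes, as at point 220 in FIG. 2(B).
events = [
    {"K1": 1, "K2": 10, "K3": 100},
    {"K1": 1, "K2": 10, "K3": 200},
    {"K1": 2, "K2": 20, "K3": 100},
]
for e in events:
    print(route(e, {"K1", "K2"}, num_nodes=2))
```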
  • the query Q2 is further deployed in the node 252 and the node 254.
  • Q2 has a distributed key set (K2, K3) and is different from Q1. Therefore, for example, the events to be processed by Q2 in the node 252 are the event 212b from the node 232 and the event 216b from the node 234. Similarly, events to be processed by Q2 in the node 254 are an event 214b from the node 232 and an event 218b from the node 234.
  • In the node 232, the output is distributed to the stream 242 and the stream 244 according to the distributed key set (K2, K3) of Q2 and an appropriate hash function, and the appropriate events of the output must be sent to the node 252 and the node 254, respectively.
  • Similarly, in the node 234, the output is distributed to the stream 246 and the stream 248 according to the distributed key set (K2, K3) of Q2 and an appropriate hash function, and the appropriate events of the output must be sent to the node 252 and the node 254, respectively.
  • the data flow between the nodes is described using each event of the input stream.
  • In the actual processing, the processing result of the query belonging to a preceding node is provided to the succeeding node located after that preceding node.
  • the above-mentioned network resource consumption increases as the number of nodes for parallel distribution increases, which is a problem in appropriately realizing parallel distributed processing.
  • the input inquiry request is analyzed, and a plurality of database calculation requests corresponding to the number of key ranges of the database calculation keys and a plurality of data retrieval requests corresponding to the database calculation requests are generated.
  • In a distributed database consisting of three or more devices, when a join is performed across three or more tables, N keys at a time are sent from node 1 to node 2 and matching data is received; the keys of the matched data are then sent to node 3 and matching data is received; and the two results are combined to output the overall match.
  • In one aspect, an object of the present invention is to reduce the amount of data communication between queries.
  • An embodiment provides a program for deploying a plurality of queries that process a data stream to a plurality of nodes, wherein each of the plurality of nodes has the capability of providing the result of executing the queries deployed to it to a subsequent node or to the output.
  • The program causes a computer to execute processing that acquires the plurality of queries and the connection relationship between each of the plurality of queries; extracts, in association with each of the plurality of queries, a distributed key set that can be used when hashing the data stream and executing each of the plurality of queries in parallel; generates a distributed segment to which a query subset of the plurality of queries belongs and which has a common distributed key set shared by the query subset; distributes nodes in parallel corresponding to the distributed segment; and deploys the query subset belonging to the segment to each of the one or more nodes distributed in parallel.
  • FIG. 11 illustrates node redistribution in one embodiment.
  • FIG. 12 illustrates the extraction of distributed segments in one embodiment.
  • FIG. 13 illustrates the deployment of queries to nodes distributed in parallel according to one embodiment.
  • FIG. 14 is a functional block diagram of one embodiment.
  • FIG. 15 shows a hardware configuration example of one embodiment.
  • FIG. 3 shows an outline of an embodiment according to the present invention.
  • FIG. 3A shows a query graph of a data stream program including a query Q1 and a query Q2 similar to those in FIG.
  • a different point from FIG. 2 is that a distributed segment 310 in which the query Q1 and the query Q2 are integrated is introduced.
  • the distributed segment 310 has a distributed key set K2, which is a common distributed key set for the queries Q1 and Q2.
  • queries having a common distributed key set belong to the same distributed segment, and the distributed segment has this common distributed key set (common distributed key set).
  • FIG. 3B shows an example of a measure for reducing the above-mentioned complicated communication between nodes (between queries) that occurred in FIG. 2B.
  • a query Q1 and a query Q2 belonging to the distributed segment 310 are arranged in each of the four nodes (312, 314, 316, 318).
  • the data stream 210 is then distributed at point 320 by applying the distributed key set K2 to the appropriate hash function. That is, the event (212c, 214c, 216c, 218c) is given to the node (312, 314, 316, 318) via the stream (321, 322, 323, 324), respectively.
  • In this case, the distributed key set K2, which is the common distributed key set of the query Q1 and the query Q2, is applied to the hash function, and the stream 210 is distributed and processed in a distributed manner.
  • For this reason, for example, within the node 312, the output of the query Q1 can simply be given to the query Q2 inside the node.
  • The same applies to the other nodes (314, 316, 318). Therefore, in this case, the complicated inter-node communication (242, 244, 246, 248) seen in FIG. 2(B) does not occur. The outputs of the nodes are finally combined to obtain the output 270.
  • When a common distributed key set exists among consecutive queries, these queries are combined into one distributed segment in the query graph. Then, the one or more queries included in the same distributed segment are deployed to one node. As a result, even when the nodes are distributed in parallel, complicated communication between the nodes is prevented.
  • the number of distributed segments and the number of nodes that distribute queries in parallel may be determined depending on the weight of query processing, the number of physical machines that can be adopted as nodes, the amount of streams, and the like.
  • An appropriate hash function to which the distributed key set of the distributed segment should be applied may be defined according to the determined number of distributed processing nodes. As for the hash function, an appropriate hash function may be used so that the events of the data stream are as uniform as possible.
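The co-location idea of FIG. 3(B) could be sketched roughly as follows, assuming each query object exposes a process(event) method and reusing the hypothetical route() helper from the earlier sketch; this is an illustrative sketch, not the patent's implementation. The stream is hashed once on the segment's common distributed key set, and all queries of the segment then run in sequence inside the selected node, so the intermediate result never crosses the network.

```python
class Node:
    """One parallel node holding every query of one distributed segment."""

    def __init__(self, queries):
        self.queries = queries              # e.g. [Q1, Q2] for segment 310

    def process(self, event):
        # The event passes through the segment's queries in order; the
        # intermediate result (stream 240) never leaves the node.
        result = event
        for query in self.queries:
            result = query.process(result)
        return result


def run_segment(stream, common_key_set, nodes):
    """Hash the stream once on the segment's common distributed key set."""
    outputs = []
    for event in stream:
        node = nodes[route(event, common_key_set, len(nodes))]   # route() as sketched earlier
        outputs.append(node.process(event))
    return outputs                                                # merged output stream
```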
  • FIG. 4A shows the processing flow of one embodiment.
  • In step 402, a data stream program including queries is read. This data stream program is converted into a query graph, which is used in the subsequent processing.
  • In step 404, the distribution key set of each query is extracted.
  • a key connected by a join can be an element of a set of distributed keys.
  • In step 406, distributed segments are generated. For example, when a common distributed key set exists between the distributed key sets of two adjacent queries, a distributed segment having that common distributed key set may be generated, and the two queries can belong to the generated segment. Details of this processing are described later (see FIGS. 5 to 7).
  • In step 408, priorities are assigned to the extracted distributed segments according to a predetermined rule. Details of the priority assignment processing are described later (see FIG. 8). The distributed segments to be used may then be selected from the distributed segments on the basis of the assigned priorities.
  • In step 410, queries are assigned to each distributed segment.
  • each distributed segment is distributed in parallel to each node.
  • an appropriate hash function is specified in consideration of the number of available nodes, the weight of query processing, and the like. Then, a distributed segment is assigned to each node, and a query belonging to the distributed segment is deployed.
  • The steps from step 430 onward show processing for further tuning when the data stream is actually being processed.
  • Processing is started in step 430.
  • This processing may be executed periodically using a timer interrupt, or may be performed as appropriate according to an operator's instruction.
  • In step 432, execution profiling of each query and/or each node is performed.
  • the processing status of each query and / or each node is acquired as a profile.
  • Various information including the number of events per unit time in each query and / or each node, CPU load, memory usage, processing capability, and the like may be acquired in the profile.
  • This profile is then evaluated. For example, a heavily loaded query and/or node is identified using a predetermined rule, which may be based on the number of events per fixed time, memory usage, and the like; this rule corresponds to the second rule. The identified query and/or node is determined to be a target recommended for load distribution, and if it is such a target, processing for load distribution is performed in a later step.
  • In step 434, it is determined whether any of the as-yet-unused distributed segments is a candidate that can be used for the load distribution described above. For example, it is checked whether there is an unused distributed segment whose queries overlap the query and/or the query set of the node identified in step 432. If such an unused distributed segment exists, the queries belonging to it may be deployable to a new node.
  • In step 436, if such an unused distributed segment exists, one is selected in consideration of the priorities of the distributed segments, and the queries are associated with the selected distributed segment. Thereafter, the process returns to step 412.
  • In step 412, as described above, the queries belonging to the selected distributed segment are deployed to a new node so that the selected distributed segment can be used. If necessary, a plurality of new nodes may be prepared and distributed in parallel. A specific example is described later (see FIG. 11).
  • the distribution of data streams by hash functions may be checked by profiling.
  • The range of values of each distribution key is larger than the number of nodes distributed in parallel. For this reason, as long as there is no skew in the distribution by the hash function, events are delivered to all of the nodes distributed in parallel.
  • If skew is found, a hash function chosen so as to eliminate the skew may be selected again. The deployment plan concerning the number of parallel node distributions may also be changed.
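The tuning loop of steps 430 to 436 might be summarized by the following hedged sketch; the threshold, the profile fields, and the deploy callable are assumptions for illustration and do not come from the patent, and nodes and segments are assumed to expose comparable query collections.

```python
def tune_once(profiles, unused_segments, deploy, events_per_sec_limit=10_000):
    """One pass of the tuning loop of steps 432-436 (simplified sketch).

    profiles:         {node: {"events_per_sec": ...}} gathered by execution profiling.
    unused_segments:  segment objects assumed to have .queries and .priority attributes.
    deploy:           callable that deploys a segment's queries to new nodes (step 412).
    The fixed events-per-second limit stands in for the patent's "second rule".
    """
    overloaded = [n for n, p in profiles.items()
                  if p["events_per_sec"] > events_per_sec_limit]
    for node in overloaded:
        # Step 434: unused segments whose queries overlap the overloaded node's queries.
        candidates = [s for s in unused_segments
                      if set(s.queries) & set(node.queries)]
        if candidates:
            # Step 436: honour the segment priorities when choosing.
            best = min(candidates, key=lambda s: s.priority)
            deploy(best)
            unused_segments.remove(best)
```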
  • FIG. 5 shows a detailed flow of the process of step 406 in FIG.
  • each query existing in the query graph acquired in step 402 is comprehensively processed in order from the input side. Therefore, processing is looped by the number of queries.
  • In step 502, the process is started.
  • The queries are processed one at a time, in order starting from the query closest to the input.
  • The query currently being examined is referred to as the "query under consideration".
  • In step 504, a list of the distributed segments extracted up to the preceding query (that is, a query connected to the input side of the query under consideration) is acquired. If a plurality of queries are connected to the input side of the query under consideration, the distributed segment extraction process must have been completed for all of those queries. A specific example of this process is described later with reference to FIG. 6.
  • In step 506, the distributed key set of the query under consideration is obtained.
  • In step 508, a new distributed segment is created from the acquired distributed key set. Only the query under consideration belongs to this new distributed segment, and the distributed key set of the new distributed segment is the same as that of the query under consideration.
  • In step 510, the distributed key set of the new distributed segment is compared with the distributed key sets of the distributed segments extracted up to the preceding query.
  • The new distributed segment is then registered as a formal distributed segment in one of the following two ways: (1) if there is a distributed segment whose distributed key set partially matches, a new distributed segment having the partially matching distributed key set is created and registered, extended toward the input side; (2) if there is no partially matching distributed key set, the new distributed segment is formally registered as it is. A specific example of this processing is described later with reference to FIG. 7.
  • In step 512, it is determined whether an unprocessed query remains; if so, the process returns to the start of the loop, and if all queries have been processed, this processing ends.
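One simplified reading of the extraction flow of FIG. 5 for a linear chain of queries is sketched below (illustrative only; the branching and merging cases of FIGS. 6 and 7 are not fully handled). For the chain of FIG. 9(A) it yields the four segments listed in FIGS. 9(B) and 9(C).

```python
def extract_segments(queries):
    """queries: (name, key_set) pairs ordered from the input side to the output side.

    Returns segments as {"keys": common distributed key set, "queries": [query names]}.
    """
    segments = []
    for name, key_set in queries:
        new_segs = []
        for seg in segments:
            common = seg["keys"] & set(key_set)
            if common == seg["keys"]:
                # Full match with an earlier segment: simply extend it (cf. FIG. 7(A)).
                seg["queries"].append(name)
            elif common:
                # Partial match: open a new segment with the common key set,
                # extended back over the earlier segment's queries (step 510, case (1)).
                new_segs.append({"keys": common, "queries": seg["queries"] + [name]})
        if not any(seg["keys"] == set(key_set) for seg in segments + new_segs):
            # Otherwise the query under consideration opens its own segment (step 508).
            new_segs.append({"keys": set(key_set), "queries": [name]})
        segments.extend(new_segs)
    return segments

# The chain of FIG. 9(A): Q1(K1,K2) -> Q2(K1,K2,K3) -> Q3(K1,K3)
for seg in extract_segments([("Q1", {"K1", "K2"}),
                             ("Q2", {"K1", "K2", "K3"}),
                             ("Q3", {"K1", "K3"})]):
    print(sorted(seg["keys"]), seg["queries"])
```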
  • FIG. 6 shows a specific example of the processing in step 504 in FIG. Assume that the query Q3 is a query under consideration. In this case, in order to check the query Q3, the generation (extraction) of the distributed segments in the queries Q1 and Q2, which are the queries before the query Q3, must be completed.
  • FIG. 6A shows the integration processing of the same distributed segment.
  • Q1 belongs to the distributed segment S1
  • the distributed key set of the distributed segment S1 is K1.
  • Q2 belongs to the distributed segment S2 and indicates that the common distributed key set is K1.
  • the distributed segment S1 and the distributed segment S2 have the same distributed key set.
  • the distributed segment S2 is discarded, and the query Q2 also belongs to the distributed segment S1. Therefore, both the query Q1 and the query Q2 belong to the distributed segment S1 and have the distributed key set K1.
  • This state is shown after processing.
  • the query Q1 and the query Q2 that are not adjacent to each other may belong to the same distributed segment S1.
  • the query Q2 also belongs to the distributed segment S1.
  • FIG. 6B illustrates a case where the distributed segment S1 to which the query Q1 belongs and the distributed segment S2 to which the query Q2 belongs have a common distributed key set.
  • the distributed segment S1 and the distributed segment S2 have a common distributed key set K1.
  • a new distributed segment S3 having a distributed key set K1 is created.
  • the distributed segment S3 (K1) (620) and the distributed segment S3 (K1) (621) are generated. Therefore, in this case, the query Q1 belongs to the distributed segment S1 and also belongs to the distributed segment S3.
  • the query Q2 belongs to the distributed segment S2 and also belongs to the distributed segment S3.
  • FIG. 6C shows an example in which the distributed segment S1 to which the query Q1 belongs is extended to the query Q2. That is, as shown in the distributed segment S1 (K1) (630), it can be seen that the query Q2 also belongs to the distributed segment S1 after the processing.
  • FIG. 7 shows a specific example of the processing of step 510 in FIG. Assume that the query Q3 is a query under consideration.
  • the query Q3 has a distributed key set K1.
  • the query Q2 before the query Q3 belongs to the distributed segment S1 having a common distributed key set K1.
  • the query Q3 also belongs to the distributed segment S1 having the common distributed key set K1.
  • the distributed segment S1 is extended to the query Q3 (see S1 (K1) (710)).
  • FIG. 7B illustrates a case where the query Q2 before the query Q3 under consideration does not belong to a distributed segment having the same distributed key set as the distributed key set of Q3.
  • the query Q1, the query Q2, and the query Q3 have a common distributed key set K1. Therefore, a new distributed segment S2 is generated, and the query Q1, the query Q2, and the query Q3 are assigned to this.
  • This point is shown as distributed segment S2 (K1) (720), distributed segment S2 (K1) (721), and distributed segment S2 (K1) (722).
  • FIG. 8 shows a detailed flow of step 408 in FIG. It should be noted that the flow shown in FIG. 8 performs recursive processing.
  • In step 802, one or more distributed segments to be processed are examined.
  • A predetermined evaluation function is applied to the distributed segments to be processed, and they are sorted in descending order of evaluation (priority order).
  • As the predetermined evaluation function, a distributed segment may be evaluated more highly the longer it is (the more queries belong to it). Alternatively, a higher evaluation may be given the more common distributed keys a distributed segment has, or the heavier the processing of the queries belonging to the distributed segment is.
  • The processing weight may be acquired by the execution profiling described with reference to FIG. 4.
  • the present invention is not limited to the above evaluation function. This evaluation function corresponds to the first rule.
  • In step 804, it is determined whether only one distributed segment remains to be processed. If this determination is "No", the process proceeds to step 806; if it is "Yes", the process proceeds to step 820.
  • In step 820, an appropriate order is assigned to the last remaining distributed segment.
  • For example, the last priority among the one or more distributed segments may be assigned to this distributed segment to be processed.
  • The present invention is not limited to this way of assigning the order. Since the last distributed segment to be processed has now been handled, the processing ends.
  • In step 806, the distributed segment with the highest priority is acquired from among the distributed segments remaining as processing candidates.
  • a distributed segment having the highest priority among the distributed segments belonging to the subgraph is acquired. It should be noted that this processing flow makes a recursive call, so that subgraphs may be nested. A method for creating a subgraph will be described in step 812.
  • In step 808, the acquired distributed segment is given a priority.
  • If one or more distributed segments within the range of the acquired distributed segment have already been given priorities, the priority following the last of those priorities may be given to the acquired distributed segment.
  • The present invention is not limited to this way of assigning the priority.
  • In step 810, the distributed segment to which the priority was given in the immediately preceding step 808 may be excluded from the processing targets. If a priority has been given to only a part of a distributed segment and not yet to all of its parts (all of its queries), that distributed segment may be left without being deleted. This is because the distributed segments may partially overlap one another.
  • In step 812, a subgraph is created within the range of the distributed segment deleted in the immediately preceding step 810.
  • Here, the subgraph refers to the distributed segments that remain as processing targets and exist within the range of the deleted distributed segment. If only a part of a distributed segment is included in this range, that part of the distributed segment is included in the subgraph.
  • If the subgraph to be created already exists, there is no need to create a new one. If an already created subgraph becomes an empty set, it is deleted. As already mentioned, this processing is called recursively, so multiple subgraphs may be nested. When a subgraph is deleted, another subgraph may still exist at a shallower nesting level; in that case, processing simply continues for the distributed segments in that subgraph. If no subgraph exists, the process moves to step 814.
  • In step 814, in order to perform the distributed segment prioritization for other ranges, the priority to be assigned is reset, and step 802 is called recursively for the remaining distributed segments.
  • Priority is given to distributed segments by the above processing.
  • A specific example of the resulting prioritization of the distributed segments is described later with reference to FIG. 10 and FIG. 13.
  • the present invention is not limited to the above prioritization.
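As a rough sketch of one possible prioritization consistent with the description above, segments can be ranked greedily by how much of the still-uncovered query graph they span (one reading of the "longer segment" evaluation function); this greedy pass is only a stand-in for the recursive subgraph processing of FIG. 8, and every data structure here is hypothetical.

```python
def prioritize(segments, queries):
    """Assign priorities 1, 2, ... to segments (a smaller number means higher priority).

    segments: {segment name: set of query names}; queries: set of all query names.
    A segment covering more of the still-uncovered queries scores higher.
    """
    priority = {}
    remaining = set(queries)
    rank = 1
    candidates = dict(segments)
    while candidates and remaining:
        best = max(candidates, key=lambda s: len(candidates[s] & remaining))
        priority[best] = rank
        rank += 1
        remaining -= candidates[best]
        del candidates[best]
    # Leftover (overlapping) segments receive the lowest priorities, longest first.
    for s in sorted(candidates, key=lambda s: -len(candidates[s])):
        priority[s] = rank
        rank += 1
    return priority

# Segments of FIG. 9: S4 comes out highest and S2 lowest, consistent with FIG. 10(A).
segs = {"S1": {"Q1", "Q2"}, "S2": {"Q2"},
        "S3": {"Q2", "Q3"}, "S4": {"Q1", "Q2", "Q3"}}
print(prioritize(segs, {"Q1", "Q2", "Q3"}))
```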
  • FIG. 9 shows an embodiment relating to extraction of distributed segments.
  • FIG. 9A shows a query graph having a query Q1, a query Q2, and a query Q3.
  • the query Q1 has K1 and K2 as distribution keys.
  • the query Q2 has K1, K2, and K3 as distribution keys.
  • Query Q3 has K1 and K3 as distribution keys.
  • FIG. 9B is a table in which distributed segments extracted by applying the already described distributed segment extraction method are arranged in correspondence with queries.
  • In the column of segments to which each query belongs, the segments are arranged from left to right in descending order of priority.
  • FIG. 9C is a table in which queries are arranged in correspondence with the extracted distributed segments.
  • the information in the table of FIG. 9B and the table of FIG. 9C is the same.
  • the distributed segment S1 constitutes a query subset including the query Q1 and the query Q2.
  • the distributed segment S2 includes a query Q2.
  • the distributed segment S3 includes a query Q2 and a query Q3.
  • the distributed segment S4 includes a query Q1, a query Q2, and a query Q3.
  • FIG. 10 illustrates the deployment of queries to nodes that are distributed in parallel using distributed segments, according to one embodiment.
  • FIG. 10A shows the distributed segments (S1, S2, S3, S4) arranged in order of priority.
  • the distributed segment S4 is the highest and the distributed segment S2 is the lowest.
  • an evaluation function is used in which the priority becomes higher as the segment length (number of queries) becomes longer.
  • FIG. 10B shows an example in which the distributed segment S4 (1000) having the highest priority is adopted and the queries (Q1, Q2, Q3) belonging to the distributed segment S4 are deployed.
  • FIG. 10C shows an example in which a query (Q1, Q2, Q3) is arranged in each node in units of the distributed segment S4, and the nodes 1000-1, 1000-2 to 1000-N are distributed in parallel.
  • the number of parallel distributions is N.
  • the input stream 1010 is distributed to N nodes.
  • The input stream may be distributed by applying K1, the common distributed key set of the distributed segment S4, to a hash function that outputs N hash values.
  • the distributed stream is processed in the order of the query Q1, the query Q2, and the query Q3 in each node.
  • the processing results of the nodes are combined at point 1030 to obtain a processing result 1040.
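In terms of the Node and run_segment sketches given earlier (hypothetical helpers), the deployment of FIG. 10(C) would look roughly as follows; the PassThrough query objects and the invented events merely stand in for real queries and the input stream 1010.

```python
class PassThrough:
    """Stand-in query object; a real query would transform the event."""
    def process(self, event):
        return event

N = 4                                                   # number of parallel nodes (illustrative)
q1, q2, q3 = PassThrough(), PassThrough(), PassThrough()
nodes = [Node([q1, q2, q3]) for _ in range(N)]          # every node holds segment S4's queries
input_stream_1010 = [{"K1": k, "K2": k % 3} for k in range(8)]    # invented events
result_1040 = run_segment(input_stream_1010, {"K1"}, nodes)       # merged as at point 1030
```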
  • FIG. 11 shows an example in which, as a result of actually profiling the input stream according to the embodiment, the processing load of the query Q1 (1110) of node 1 and of the query Q2 (1130) of node 3 is found to be large, and redistribution is performed. As shown in the figure, node 1.2 is added and the load of node 1.1 is redistributed; likewise, node 3.2 is added and the load of node 3.1 is redistributed. Details are described below.
  • the data flow between each query is described using each event of the input stream.
  • In the actual processing, the processing result of a preceding query can be provided to the succeeding query located after that preceding query.
  • Because the redistribution key is different, re-hashing occurs and communication increases by a certain amount. However, this does not occur at all nodes but only at a specific node (or for a specific hash value), so the impact of this communication is local. For this reason, when considering redistribution, it is desirable to confirm that the network to which the redistributed nodes are directly connected has sufficient bandwidth.
  • When performing redistribution, it is desirable that the node management mechanism establish a virtual node called node 1 so as not to affect the hash function H1 that distributes the stream into N streams at the point 1190. The virtual node may then be distributed over the physical nodes node 1.1 and node 1.2. In the node 1.2, a query Q1 (1115) and a query Q2 (1125) belonging to the segment S1 are provided. As a result, the processing of the query Q1 (1110) and the query Q2 (1120) of the node 1.1 is distributed. Since the number of nodes has increased, it is necessary to perform distribution at the point 1195 by applying the distributed key set (K1, K2) to the hash function H2 that outputs two hash values. The output of the query Q2 (1125) needs to be given to the query Q3 (1121). It can therefore be seen that adding the node 1.2 increases the amount of communication to some extent.
  • (K1, K2): distributed key set
  • a segment including the query Q2 and having the next highest priority is searched. Since it can be seen that the segment S3 meets this condition, the queries Q2 and Q3 belonging to the segment S3 are deployed to the new node 3.2 and redistribution is performed.
  • In the node 3.2, a query Q2 (1135) and a query Q3 (1145) belonging to the segment S3 are provided.
  • As a result, the processing of the query Q2 (1130) and the query Q3 (1140) of the node 3.1 is distributed. Since this increases the number of nodes, it is necessary to perform distribution by applying the distributed key set (K1, K3) to the hash function H2 that outputs two hash values.
  • the outputs of the query Q3 (1140) and the query Q3 (1145) need to be given to the point 1196. Therefore, it can be seen that the communication amount increases to some extent by adding the node 3.2.
  • FIG. 11B shows an example of a hash function. Note that K1% N means that the distributed key set K1 is applied to a hash function that outputs N hash values.
  • FIG. 12 illustrates the extraction of distributed segments in one embodiment.
  • FIG. 12A shows a query graph, which indicates that the queries Q1 to Q6 each have the illustrated distributed key set.
  • FIG. 12B is a table showing the distributed segments to which each query belongs.
  • FIG. 12C is a table showing queries belonging to each segment.
  • FIG. 13 illustrates the deployment of queries to parallel distributed nodes according to one embodiment.
  • FIG. 13 (A) shows the distributed segments (S1, S2, S3, S4, S5) arranged in order of priority.
  • the distributed segments S1 and S5 are the highest and the distributed segment S4 is the lowest.
  • an evaluation function is used in which the priority increases as the segment length (number of queries) increases.
  • After S1 is chosen, the longest segment in the remaining subgraph (Q5, Q6), which excludes the subgraph (Q1, Q2, Q3, Q4) covered by S1, is S5 rather than S4. It is therefore desirable to assign S5 to Q5 and Q6 and give it priority, so the priority of S5 is high.
  • FIG. 13(B) shows an example in which the distributed segments S1 and S5 having the highest priorities are adopted, and the queries (Q1, Q2, Q3, Q4) belonging to the distributed segment S1 and the queries (Q5, Q6) belonging to the distributed segment S5 are deployed.
  • An example is shown in which the queries (Q1, Q2, Q3, Q4) are deployed to node 1 through node N/2, the queries (Q5, Q6) are deployed to node N/2+1 through node N, and the nodes are distributed in parallel.
  • The output of Q4 (1301) of node 1 needs to be distributed to one of node N/2+1 through node N by applying the distributed key set K3 to a hash function that outputs N/2 hash values.
  • the distributed key set K1 of the segment S1 and the distributed key set K3 of the segment S5 are different. In this way, communication occurs between nodes at the segment boundary, but the amount of communication can be drastically reduced as compared with the case where a node is assigned for each query.
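The handoff at a segment boundary such as the one between S1 and S5 could be sketched as follows, reusing the hypothetical route() helper and node objects from the earlier sketches; only this boundary hop crosses the network.

```python
def forward_across_boundary(upstream_outputs, next_key_set, downstream_nodes):
    """Hand results over a segment boundary, as between S1 and S5 in FIG. 13.

    The outputs of the upstream segment's last query (Q4 in the figure) are
    re-hashed on the downstream segment's common key set (K3) and sent to the
    node group that runs the downstream segment (nodes N/2+1 .. N).  Everything
    inside a segment stays local; only this boundary hop crosses the network.
    """
    merged = []
    for event in upstream_outputs:
        target = downstream_nodes[route(event, next_key_set, len(downstream_nodes))]
        merged.append(target.process(event))
    return merged
```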
  • FIG. 14 shows a functional block diagram of an embodiment, which includes elements constituting the system.
  • This embodiment includes a query information acquisition unit 1420, a distributed key set extraction unit 1430, a distributed segment generation unit 1440, a parallel distribution unit 1450, a profile acquisition unit 1460, a profile evaluation unit 1465, and a query / node specification unit 1470.
  • the query information acquisition unit 1420 accepts the data stream program 1410.
  • the query information acquisition unit 1420 recognizes a plurality of queries and a connection relationship between each of the plurality of queries. Then, the result is passed to the distributed key set extraction unit 1430.
  • the distributed key set extraction unit 1430 extracts a distributed key set that can be used when hashing the data stream and executing each of the plurality of queries in parallel with each of the plurality of queries.
  • the distributed segment generation unit 1440 generates a distributed segment to which a series of queries having a common distributed key set belong.
  • In principle, the one or more queries belonging to a distributed segment are contiguous. Exceptionally, it may be desirable for multiple queries that send their output to the same query to belong to the same segment, as in the case shown in FIG. 6.
  • the distributed segment generation unit 1440 may include a distributed segment priority assigning unit 1445 and a distributed segment selection unit 1446.
  • the distributed segment priority assigning unit 1445 assigns a priority to each distributed segment using a predetermined evaluation function.
  • As the predetermined evaluation function, a distributed segment may be evaluated more highly the longer it is (the more queries belong to it). Alternatively, a higher evaluation may be given the more common distributed keys a distributed segment has, or the heavier the processing of the queries belonging to the distributed segment is.
  • the distributed segment selection unit 1446 selects a distributed segment to be used from one or more distributed segments based on the priority assigned to the distributed segment.
  • the parallel distribution unit 1450 distributes a plurality of nodes in parallel based on a hash function applied to a common distribution key set corresponding to the distributed segment, in order to execute a query belonging to the distributed segment to be used in parallel.
  • Each of the plurality of nodes distributed in parallel is provided with a query belonging to the distributed segment. Note that an appropriate number of nodes may be determined based on the number of existing physical machines.
  • a query is deployed to a plurality of nodes distributed in parallel.
  • the profile acquisition unit 1460 performs profiling when the data stream is processed. Through this profiling, an execution profile of each query and / or each node is obtained.
  • the profile evaluation unit 1465 can check whether a load is concentrated on a specific query and / or a specific node by evaluating the profile.
  • As a measure of load concentration, various information may be acquired, including the number of events per unit time in each query and/or each node, CPU load, memory usage, processing capability, and the like. Note that the present invention is not limited to the above examples.
  • the query / node specifying unit 1470 uses a predetermined evaluation function to specify a query or node where the load is concentrated. This result is given to the re-parallel distribution unit 1455.
  • the re-parallel distribution unit 1455 may exist in the parallel distribution unit 1450.
  • The re-parallel distribution unit 1455 again distributes in parallel a new node to which a new segment is deployed (a segment to which all or part of the specified query, or of the queries deployed in the specified node, belongs), the node in which the specified query exists, or the specified node.
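As a loose, hypothetical summary of the functional blocks of FIG. 14, the units could be modelled as pluggable callables around a small Segment record; none of these names are taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

@dataclass
class Segment:
    keys: Set[str]                 # common distributed key set of the segment
    queries: List[str]             # query subset belonging to the segment
    priority: int = 0              # assigned by the prioritisation unit 1445

@dataclass
class StreamDeployer:
    """Loose grouping of the units of FIG. 14 as pluggable callables."""
    acquire_queries: Callable      # query information acquisition unit 1420
    extract_key_sets: Callable     # distributed key set extraction unit 1430
    generate_segments: Callable    # distributed segment generation unit 1440
    distribute: Callable           # parallel distribution unit 1450
    profile: Callable              # profile acquisition unit 1460
    redistribute: Callable         # re-parallel distribution unit 1455
```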
  • FIG. 15 shows a configuration example of hardware (computer) according to the embodiment of the present invention.
  • the hardware includes a CPU 1510, a memory 1515, an input device 1520, an output device 1525, an external storage device 1530, a portable recording medium drive device 1535, and a network connection device 1545. Each device is connected by a bus 1550. Further, the portable recording medium driving device 1535 can read and write the portable recording medium 1540.
  • a network 1560 is connected to the network connection device 1545.
  • the portable recording medium 1540 refers to one or more non-transitory, tangible storage media having a structure.
  • Illustrative examples of the portable recording medium 1540 include a magnetic recording medium, an optical disk, a magneto-optical recording medium, and a nonvolatile memory.
  • Magnetic recording media include HDDs, flexible disks (FD), magnetic tapes (MT) and the like.
  • Optical disks include DVD (Digital Versatile Disc), DVD-RAM, CD-ROM (Compact Disc-Read Only Memory), CD-R (Recordable) / RW (ReWritable), and the like.
  • Magneto-optical recording media include MO (Magneto-Optical disk). All or part of the embodiments of the present invention can be implemented by reading a program stored in a portable recording medium and executing it by a CPU.

Abstract

Provided is a system for allocating a plurality of queries used to process data streams to a plurality of nodes. Each of the plurality of nodes has a capability of outputting a result obtained by executing an allocated query to a subsequent stage. The system includes: a query information acquisition section that acquires the plurality of queries and the connection relationships among the plurality of queries; a distributed-key-set extraction section that extracts a distributed key set which can be used to execute each of the plurality of queries in parallel, in association with each of the plurality of queries; a distributed-segment generation section that generates a distributed segment to which a query subset of the plurality of queries belongs and which has a common distributed key set that the query subset has in common; and a parallel distribution section that distributes nodes in parallel corresponding to the distributed segment and allocates the query subset that belongs to the distributed segment to each of the one or more nodes distributed in parallel.

Description

Data stream parallel processing program, method, and system
 The present invention relates to a data stream parallel processing program, method, and system.
 In recent years, there has been an increasing demand for services that collect and utilize big data, which is a large amount of data provided from various information sources, devices, sensors, and the like connected to a network. If a large amount of data generated in the real world can be processed sequentially, information can be obtained in a state close to real time. For example, there is a need for a technique that can sequentially process the large data streams constantly provided from various sensors.
 As a technology that meets such requirements, complex event processing that handles big data is known. However, the spread of smartphones and tablet terminals in recent years has dramatically increased the amount of communication. Furthermore, as not only people but also a large number of devices become connected to the network, the amount of communication is expected to increase even further. Therefore, further development of such technology is required.
 In this case, for example, the data (event sequences) obtained from a data stream can be temporarily stored in a database, and the data can then be extracted and processed. However, from the viewpoint of easily obtaining accurate information in real time, such measures often fail to meet these needs. Therefore, a technique for processing and analyzing large data streams in real time (or near real time) is needed, and satisfying this need requires a technique for sequentially processing data streams in parallel.
 FIG. 1 shows an example of data stream processing. As shown, the stream processing system 140 sequentially processes the data streams of the three input streams (110, 120, 130) and provides two output streams (150, 160). For example, in the input stream 110, a plurality of events (111, 112, 113) are input to the stream processing system 140 in time series. The stream processing system 140 includes a plurality of queries (142, 144, 146, 148, 149). These queries are similar to those used in static database processing. However, queries for stream processing systems differ from database queries in that they operate continuously on the input information and provide the required output. The fact that the output of one query becomes the input of another query is also different from a database query. It should therefore be noted that the term "query" as used herein has functions different from those of a database query. The queries are connected as indicated by the arrows, which show the flow of data. The stream 150 output from the stream processing system 140 includes, for example, a plurality of processing results (151, 152). In this specification, a graph showing the connection relationship between the plurality of queries included in the stream processing system 140 is called a query graph, and a processing program including the set of queries represented by a query graph and the relationships between the queries is called a data stream program.
 In FIG. 2(A), two queries Q1 and Q2 are connected by an intermediate stream 240. The input stream 210 is processed by the query Q1 and the query Q2, and the query Q2 outputs the output stream 270. The query Q1 has K1 and K2 as distribution keys, and the query Q2 has K2 and K3 as distribution keys. The distribution key here corresponds to the distribution key that can be applied to the hash function used in parallel hash join processing in a static database. A set of distribution keys input to the hash function is called a distributed key set. In other words, a distribution key means a key used for joining in a join operator constituting a query. The distributed key used in this specification has the same meaning as the above definition. Note that the hash join method using the distribution key in this specification is used to appropriately distribute the data stream processed by a query to the subsequent nodes, as described below with reference to FIG. 2(B), and in this respect differs from the hash join method in database technology that handles tables.
 FIG. 2(B) shows an example in which the queries of the data stream program shown in FIG. 2(A) are executed in parallel according to the parallel hash join method in order to process the input stream 210.
 The input stream 210 is expressed using a notation similar to a table handled in a database. The input stream 210 has a plurality of keys (K1, K2, K3), and a plurality of specific events (212, 214, 216, 218) arranged in time series form the input stream 210. The query Q1 is deployed to the node 232 and the node 234. Here, a node may specifically be, for example, a physical machine or a virtual machine. The query Q1 is processed in a distributed manner by the two nodes (232, 234). At point 220, in order to distribute the input stream 210 to the node 232 and the node 234, the distribution key set (K1, K2) is applied to an appropriate hash function, and the input stream 210 is divided into the stream 221 and the stream 222. Via the stream 221, the event 212a and the event 214a arrive at the node 232 sequentially and are processed. Via the stream 222, the event 216a and the event 218a arrive at the node 234 sequentially and are processed. As the hash function, techniques used in static databases can be used; specifically, a hash table or the like may be used. In this case, various hash functions that separate the data stream 210 into the two streams (221, 222) using the distributed key set (K1, K2) are applicable.
 In FIG. 2(B), the query Q2 is further deployed to the node 252 and the node 254. Q2 has the distributed key set (K2, K3), which is different from that of Q1. For this reason, for example, the events to be processed by Q2 in the node 252 are the event 212b from the node 232 and the event 216b from the node 234. Similarly, the events to be processed by Q2 in the node 254 are the event 214b from the node 232 and the event 218b from the node 234.
 Therefore, the node 232 must distribute its output to the stream 242 and the stream 244 according to the distributed key set (K2, K3) of Q2 and an appropriate hash function, and must send the appropriate events of the output to the node 252 and the node 254, respectively. Similarly, the node 234 must distribute its output to the stream 246 and the stream 248 according to the distributed key set (K2, K3) of Q2 and an appropriate hash function, and must send the appropriate events of the output to the node 252 and the node 254, respectively.
 Thus, in the example shown in FIG. 2(B), although four nodes (232, 234, 252, 254) are provided and the queries Q1 and Q2 are executed in parallel by the parallel hash join method, mutual communication over the four streams (242, 244, 246, 248) occurs among the four nodes (232, 234, 252, 254). This communication consumes network resources between the nodes.
 In this specification, for ease of explanation, the data flow between the nodes is described using the individual events of the input stream. In the actual processing, the processing result of the query belonging to a preceding node is provided to the succeeding node located after that preceding node.
 The network resource consumption described above increases as the number of nodes for parallel distribution increases, which is a problem in appropriately realizing parallel distributed processing.
 Conventionally, there is a distributed database system technology that includes a plurality of nodes for processing a database, in which data is distributed and stored in the respective nodes and, when the data of a relation source and of a relation destination are stored in the same node, a query including a relational operation is executed. When the relation source and the relation destination are stored in the same node, the relational operation does not span nodes, so the relational operation becomes an asynchronous process for each node and can be executed in parallel. The results processed at each node are collected by sorting and merging the processing results (see Patent Document 1).
 There is also a technique for stream data processing in which, when the data necessary for processing is stored in the computer that received the data, a query is executed, and when the necessary data is not stored there, the data to be processed is received from another computer that stores it and the query is then executed (see Patent Document 2).
 There is also a technique for a distributed database in which an input inquiry request is analyzed, and a plurality of database operation requests corresponding to the number of key ranges of the database operation keys, together with retrieval requests for the plurality of data items targeted by those operation requests, are generated. The generated database operation requests are distributed to the plurality of nodes provided for the respective key ranges, the database operation results are received from the plurality of nodes, and the processing result of the inquiry request is output (see Patent Document 3).
 There is also a technique for a distributed database consisting of three or more devices in which, when a join is performed across three or more tables, N keys at a time are sent from node 1 to node 2 and matching data is received, the keys of the matched data are sent to node 3 and matching data is received, and the two results are combined to output the overall match (see Patent Document 4).
 Patent Document 1: Japanese Laid-Open Patent Publication No. H10-240753; Patent Document 2: Japanese Laid-Open Patent Publication No. 2011-34255; Patent Document 3: Japanese Patent No. 3538322; Patent Document 4: Japanese Laid-Open Patent Publication No. 2010-272030
 In one aspect, an object of the present invention is to reduce the amount of data communication between queries.
 An embodiment provides a program for deploying a plurality of queries that process a data stream to a plurality of nodes, wherein each of the plurality of nodes has the capability of providing the result of executing the queries deployed to it to a subsequent node or to the output. The program causes a computer to execute processing that acquires the plurality of queries and the connection relationship between each of the plurality of queries; extracts, in association with each of the plurality of queries, a distributed key set that can be used when hashing the data stream and executing each of the plurality of queries in parallel; generates a distributed segment to which a query subset of the plurality of queries belongs and which has a common distributed key set shared by the query subset; distributes nodes in parallel corresponding to the distributed segment; and deploys the query subset belonging to the segment to each of the one or more nodes distributed in parallel.
 According to the embodiment, the amount of data communication between queries can be reduced.
FIG. 1 is a diagram explaining the concept of data stream processing using queries.
FIG. 2 is a diagram showing an example in which queries are executed in parallel by the parallel hash join method.
FIG. 3 is a diagram showing the outline of one embodiment of the present invention.
FIG. 4 is a diagram showing the processing flow of one embodiment.
FIG. 5 is a diagram showing the details of the distributed segment extraction flow in one embodiment.
FIG. 6 is a diagram showing a specific example of distributed segment extraction in one embodiment.
FIG. 7 is a diagram showing a specific example of distributed segment extraction in one embodiment.
FIG. 8 is a diagram showing the details of the distributed segment extraction flow in one embodiment.
FIG. 9 is a diagram showing the result of distributed segment extraction in one embodiment.
FIG. 10 is a diagram showing an example of prioritization of distributed segments in one embodiment.
FIG. 11 is a diagram showing node redistribution in one embodiment.
FIG. 12 is a diagram showing the extraction of distributed segments in one embodiment.
FIG. 13 is a diagram showing the deployment of queries to nodes distributed in parallel according to one embodiment.
FIG. 14 is a functional block diagram of one embodiment.
FIG. 15 is a diagram showing a hardware configuration example of one embodiment.
Embodiments of the present invention will now be described in detail with reference to the drawings. It should be noted that the following embodiments are provided for understanding the invention and are not intended to limit the scope of the present invention. The embodiments below are not mutually exclusive; therefore, elements of the embodiments may also be combined as long as no contradiction arises. In the inventions relating to the methods and programs recited in the claims, the order of the processes may be changed, or a plurality of processes may be performed simultaneously, as long as no contradiction arises, and it goes without saying that such embodiments are also included in the technical scope of the claimed invention. It should also be noted that similar components may be denoted by the same reference numerals in a plurality of drawings.
FIG. 3 shows an outline of an embodiment according to the present invention. FIG. 3(A) shows the query graph of a data stream program including a query Q1 and a query Q2 similar to those in FIG. 2. The difference from FIG. 2 is that a distributed segment 310 integrating the queries Q1 and Q2 is introduced. The distributed segment 310 has a distributed key set K2, which is the common distributed key set of the queries Q1 and Q2. Among a series of queries, queries that have a common distributed key set belong to the same distributed segment, and the distributed segment holds this common distributed key set (the common distributed key set).
Returning to FIG. 2, only one query was considered per node. In processing the data stream, when the distributed key sets of the queries (that is, queries Q1 and Q2) did not match completely, tangled communication occurred between the nodes on which the queries were deployed.
FIG. 3(B) shows an example of a measure for reducing the tangled communication between nodes (between queries) that occurred in FIG. 2(B). As shown in the figure, the queries Q1 and Q2 belonging to the distributed segment 310 are deployed on each of the four nodes (312, 314, 316, 318). The data stream 210 is then distributed at point 320 by applying the distributed key set K2 to an appropriate hash function. That is, the events (212c, 214c, 216c, 218c) are given to the nodes (312, 314, 316, 318) via the streams (321, 322, 323, 324), respectively. In this case, the distributed key set K2, the common distributed key set of the queries Q1 and Q2, is applied to the hash function, and the stream 210 is distributed for parallel processing. Therefore, within the node 312, for example, the output of the query Q1 only needs to be passed to the query Q2 inside the node. The same applies to the other nodes (314, 316, 318). Consequently, the tangled inter-node communication (242, 244, 246, 248) seen in FIG. 2(B) does not occur. The outputs of the nodes are finally combined to obtain the output 270.
Reference is again made to FIG. 3(A). As described above, when a common distributed key set exists across consecutive queries, these queries are grouped in the query graph into one distributed segment, and the one or more queries included in the same distributed segment are deployed on one node. As a result, even if the nodes are distributed in parallel, tangled communication between the nodes is prevented. The number of distributed segments and the number of nodes over which the queries are distributed in parallel may be determined depending on the processing weight of the queries, the number of physical machines that can be used as nodes, the stream volume, and so on. An appropriate hash function to which the distributed key set of the distributed segment is applied may be defined according to the determined number of nodes for distributed processing. A hash function may also be chosen so that the events of the data stream are distributed as evenly as possible.
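By way of illustration only, the following minimal Python sketch shows the kind of key-based routing described above: events are assigned to parallel nodes by hashing the values of a common distributed key set, so that all events sharing those key values reach the same node. The event fields, node count, and function names are assumptions introduced for this sketch, not part of the embodiment.

```python
from collections import defaultdict

def route(event, key_set, num_nodes):
    """Pick a node index by hashing the values of the distributed key set.
    A production system would use a stable hash; hash() suffices for a sketch."""
    key_values = tuple(event[k] for k in sorted(key_set))
    return hash(key_values) % num_nodes

events = [
    {"K1": "user-a", "K2": 10},
    {"K1": "user-b", "K2": 20},
    {"K1": "user-a", "K2": 10},
]

per_node = defaultdict(list)
for ev in events:
    per_node[route(ev, {"K1", "K2"}, num_nodes=4)].append(ev)

# Events with identical (K1, K2) values land on the same node, so the queries
# Q1 and Q2 deployed on that node can be chained without inter-node traffic.
print({node: len(evs) for node, evs in per_node.items()})
```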
FIG. 4(A) shows the processing flow of an embodiment.
In step 402, a data stream program including queries is read. The data stream program is converted into a query graph, which is used in the subsequent processing.
In step 404, the distributed key set of each query is extracted. For example, in the SQL language, keys connected by a join can be elements of the set of distribution keys.
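As a hedged illustration of this step, the sketch below derives a candidate distributed key set from the equality keys of a join predicate; the query representation (a plain dictionary with a `join_on` list) is a hypothetical format introduced only for this example.

```python
# Hypothetical query representation: keys equated in the join predicate,
# e.g. "... FROM A JOIN B ON A.K1 = B.K1 AND A.K3 = B.K3".
query_q3 = {
    "name": "Q3",
    "join_on": [("A.K1", "B.K1"), ("A.K3", "B.K3")],
}

def distributed_key_set(query):
    """Column names equated by the join can serve as distribution keys."""
    return {left.split(".", 1)[1] for left, _ in query["join_on"]}

print(distributed_key_set(query_q3))  # -> {'K1', 'K3'} (set order may vary)
```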
In step 406, distributed segments are generated. For example, when a common distributed key set exists between the distributed key sets of two adjacent queries, a distributed segment having that common distributed key set may be generated, and the two queries can be made to belong to the generated segment. Details of this processing are described later (see FIGS. 5 to 7).
In step 408, priorities are assigned to the extracted distributed segments according to a predetermined rule. Details of the priority assignment are described later (see FIG. 8). Distributed segments are then selected in descending order of priority. The selection ends when all queries are covered by the selected distributed segments.
In step 410, queries are assigned to each distributed segment.
In step 412, each distributed segment is distributed in parallel across nodes. In distributing in parallel, an appropriate hash function is specified in consideration of the number of available nodes, the processing weight of the queries, and so on. A distributed segment is then assigned to each node, and the queries belonging to that distributed segment are deployed.
Through the above processing, appropriate queries are deployed to the plurality of nodes that perform parallel and distributed query processing, and the connection relationships between the nodes are established.
Reference is now made to FIG. 4(B). The processing starting from step 430 performs further tuning once the data stream is actually being processed.
Processing starts at step 430. It may be triggered periodically, for example by a timer interrupt, or executed as appropriate in response to an operator's instruction or the like.
In step 432, execution profiling of each query and/or each node is performed. Through this processing, the processing status of each query and/or each node is obtained as a profile. The profile may include various information such as the number of events per unit time, the CPU load, the memory usage, and the processing capacity of each query and/or each node.
In step 433, the profile is evaluated. Using a predetermined rule, for example, heavily loaded queries and/or nodes are identified. The rule may use the number of events per unit time, the memory usage, and so on; it corresponds to the second rule. The identified query and/or node is judged in later processing to be a recommended target for load distribution, and if it is such a target, processing for load distribution is performed.
In step 434, it is determined whether any unused distributed segment, that is, one not yet in use, is available as a candidate for the above load distribution. For example, it is checked whether there is an unused distributed segment whose queries overlap with the query identified in step 432 and/or the query set contained in the identified node. If such an unused distributed segment exists, the queries belonging to it may be deployable to a new node.
In step 436, if another unused distributed segment exists, an unused distributed segment is selected in consideration of its priority and other factors, and the queries are assigned to the selected distributed segment. The processing then returns to step 412, where, as described above, the queries belonging to the selected distributed segment are deployed to a new node so that the selected distributed segment can be used. If necessary, a plurality of new nodes may be prepared and distributed in parallel. A specific example is described later (see FIG. 11).
Profiling may also be used to check for skew in how the hash function distributes the data stream. In general, the range of values of each distribution key is larger than the number of parallel-distributed nodes, so unless the hash-based distribution is skewed, events are forwarded to all of the parallel-distributed nodes. If profiling reveals skew in the distribution of the data stream by the hash function, a hash function that mitigates the skew may be selected instead, or the deployment plan regarding the degree of node parallelism may be changed.
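A minimal sketch of such a skew check, assuming the profile simply records event counts per node (the threshold and node names are illustrative):

```python
def detect_skew(events_per_node, tolerance=0.5):
    """Return nodes receiving far more than their uniform share of events."""
    total = sum(events_per_node.values())
    expected = total / len(events_per_node)
    return {node: count for node, count in events_per_node.items()
            if count > expected * (1 + tolerance)}

profile = {"node1": 52000, "node2": 9000, "node3": 10500, "node4": 9800}
print(detect_skew(profile))
# {'node1': 52000} -> a candidate for choosing a different hash function
# or revising the deployment plan for the degree of parallelism.
```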
FIG. 5 shows the detailed flow of the processing of step 406 in FIG. 4. This processing exhaustively handles each query in the query graph obtained in step 402, in order from the input side; the processing therefore loops as many times as there are queries.
Processing starts at step 502. As described above, the queries are processed one at a time, in order from the query closest to the input. The query currently being processed is called the "query under consideration".
In step 504, a list of the distributed segments extracted up to the preceding query (that is, the query connected to the input side of the query under consideration) is obtained. When a plurality of queries is connected to the input side of the query under consideration, the distributed segment extraction must have been completed for all of them. A specific example of this processing is described later with reference to FIG. 6.
In step 506, the distributed key set of the query under consideration is obtained.
In step 508, a new distributed segment is created from the obtained distributed key set. Only the query under consideration belongs to this new distributed segment, and the distributed key set of the new distributed segment is identical to the distributed key set of the query under consideration that belongs to it.
In step 510, the distributed key set of the new distributed segment is matched against the distributed key sets of the distributed segments extracted up to the preceding query. If this matching finds no common distributed key set (exact match), the new distributed segment is registered as a formal distributed segment. In this case the processing divides into two: (1) if there is a distributed segment containing a partially matching distributed key set, a new distributed segment having the partially matching distributed key set is created, extended toward the input side, and registered; (2) if there is no even partially matching distribution key, the new distributed segment is formally registered as it is. A specific example of this processing is described later with reference to FIG. 7. If a distributed segment having a common distributed key set (exact match) exists, the new distributed segment is discarded, and the query under consideration is added to the distributed segment extracted up to the preceding query. A specific example of this processing is also described with reference to FIG. 7.
In step 512, if any unprocessed query remains, the processing returns to step 504 and the next query becomes the query under consideration. When all queries have been processed, this processing ends.
FIG. 6 shows a specific example of the processing of step 504 in FIG. 5. Assume that the query Q3 is the query under consideration. In this case, before Q3 can be examined, the generation (extraction) of distributed segments for the queries Q1 and Q2, which precede Q3, must have been completed.
FIG. 6(A) shows the merging of identical distributed segments. In the pre-processing state of FIG. 6(A), the notation S1(K1) attached to the query Q1 indicates that Q1 belongs to the distributed segment S1 and that the distributed key set of S1 is K1. Likewise, Q2 belongs to the distributed segment S2, whose distributed key set is K1. In this case, the distributed segments S1 and S2 have the same distributed key set. Therefore, the distributed segment S2 is discarded and the query Q2 is also made to belong to the distributed segment S1. As a result, the queries Q1 and Q2 both belong to the distributed segment S1 and have the distributed key set K1, as shown in the post-processing state. This illustrates that non-adjacent queries Q1 and Q2 may belong to the same distributed segment S1. As indicated by the distributed segment S1(K1) (610), the query Q2 also belongs to the distributed segment S1.
FIG. 6(B) illustrates a case where the distributed segment S1 to which the query Q1 belongs and the distributed segment S2 to which the query Q2 belongs have a common distributed key set. The distributed segments S1 and S2 share the distributed key set K1, but no distributed segment having the distributed key set K1 exists yet. In this case, a new distributed segment S3 having the distributed key set K1 is created. After the processing of FIG. 6(B), the distributed segments S3(K1) (620) and S3(K1) (621) have been generated. In this case, therefore, the query Q1 belongs to the distributed segment S1 and also to the distributed segment S3, and likewise the query Q2 belongs to the distributed segment S2 and also to the distributed segment S3.
FIG. 6(C) shows an example in which the distributed segment S1 to which the query Q1 belongs is extended to the query Q2. That is, as indicated by the distributed segment S1(K1) (630), after the processing the query Q2 also belongs to the distributed segment S1.
FIG. 7 shows a specific example of the processing of step 510 in FIG. 5. Assume that the query Q3 is the query under consideration.
In FIG. 7(A), the query Q3 has the distributed key set K1, and the query Q2 preceding Q3 belongs to the distributed segment S1, which has the common distributed key set K1. In this case, as shown in the post-processing state, the query Q3 is also made to belong to the distributed segment S1 having the common distributed key set K1, so that the distributed segment S1 is extended to the query Q3 (see S1(K1) (710)).
FIG. 7(B) illustrates a case where the query Q2 preceding the query Q3 under consideration does not belong to a distributed segment having the same distributed key set as that of Q3. In this case the queries Q1, Q2, and Q3 have the common distributed key set K1, so a new distributed segment S2 is generated and the queries Q1, Q2, and Q3 are made to belong to it. This is shown as the distributed segments S2(K1) (720), S2(K1) (721), and S2(K1) (722).
FIG. 8 shows the detailed flow of step 408 in FIG. 4. It should be noted that the flow shown in FIG. 8 is recursive.
In step 802, the one or more distributed segments to be processed are examined. That is, a predetermined evaluation function is applied to the distributed segments to be processed, and they are sorted in descending order of evaluation (priority order). As the predetermined evaluation function, a longer distributed segment (one to which more queries belong) may be evaluated more highly. Alternatively, a higher evaluation may be given to a distributed segment whose common distributed key set contains more keys, or to a distributed segment whose member queries have a heavier processing load. The processing load may be obtained by execution profiling, as described with reference to FIG. 4(B). The present invention is not limited to these evaluation functions. This evaluation function corresponds to the first rule.
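One possible first-rule evaluation function combining the criteria just mentioned is sketched below; the tuple ordering (segment length first, then key-set size, then profiled weight) and the sample data are assumptions for illustration only.

```python
def segment_score(segment):
    return (
        len(segment["queries"]),       # number of queries in the segment
        len(segment["common_keys"]),   # size of the common distributed key set
        segment.get("weight", 0.0),    # profiled processing cost, if available
    )

segments = [
    {"name": "S1", "queries": ["Q1", "Q2"], "common_keys": {"K1", "K2"}},
    {"name": "S4", "queries": ["Q1", "Q2", "Q3"], "common_keys": {"K1"}},
]
ranked = sorted(segments, key=segment_score, reverse=True)
print([s["name"] for s in ranked])  # ['S4', 'S1'], as in FIG. 10(A)
```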
In step 804, it is determined whether only one distributed segment remains to be processed. If the determination is "No", the processing proceeds to step 806; if "Yes", it proceeds to step 820.
In step 820, an appropriate order is assigned to the last remaining distributed segment. As an example, if one or more distributed segments within the range of this last distributed segment have already been given an order, the order following the last of those orders may be assigned to the distributed segment being processed. The present invention is not limited to this way of assigning the order. Since the last distributed segment to be processed has now been handled, the processing ends.
In step 806, the distributed segment with the highest priority is obtained from among the distributed segments remaining as processing candidates. As described later, if a subgraph has been created in the subsequent step 812, the distributed segment with the highest priority among the distributed segments belonging to the subgraph is obtained. Because this processing flow is called recursively, subgraphs may be nested. How subgraphs are created is explained in step 812.
In step 808, the obtained distributed segment is given a priority following the order of the distributed segments. As an example, if one or more distributed segments within the range of the obtained distributed segment have already been given priorities, the priority following the last of those priorities may be assigned to the obtained distributed segment. The present invention is not limited to this way of assigning priority.
In step 810, the distributed segment to which a priority was assigned in the immediately preceding step 808 may be removed from the processing targets. If a priority has been given to only part of a distributed segment, that distributed segment may be kept rather than deleted unless priorities have been given to all of its parts (all of its queries), including the other parts, because multiple distributed segments may partially overlap.
In step 812, a subgraph is created over the range of the distributed segment removed in the immediately preceding step 810. A subgraph is the set of distributed segments, among those remaining to be processed, that lie within the range of the removed distributed segment. If only part of a distributed segment falls within this range, that part is included in the subgraph. If the subgraph to be created already exists, there is no need to create the same subgraph again. If an already created subgraph has become an empty set, it is deleted. As already mentioned, because this processing is called recursively, multiple subgraphs may be nested. When a subgraph is deleted, a subgraph may still exist at a shallower nesting level; in that case, processing simply continues with the distributed segments in that subgraph. If no subgraph exists, the processing moves to step 814.
In step 814, to prioritize the distributed segments in the other ranges, the priority to be assigned to distributed segments is reset, and step 802 is called recursively for the remaining distributed segments.
Through the above processing, the distributed segments are prioritized. The result of a concrete prioritization of distributed segments is described with reference to FIG. 10. The present invention is not limited to the above prioritization.
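As a simplified sketch of the overall selection behaviour (step 408 keeps taking high-priority segments until every query is covered), the greedy loop below repeatedly picks the segment covering the most still-uncovered queries. It omits the recursive subgraph handling of FIG. 8, and the segment contents are hypothetical, so it is an approximation rather than the disclosed algorithm.

```python
def select_segments(segments, all_queries):
    uncovered, chosen = set(all_queries), []
    while uncovered:
        # Pick the segment covering the most queries that are still uncovered.
        best = max(segments, key=lambda s: len(set(s["queries"]) & uncovered))
        if not set(best["queries"]) & uncovered:
            break  # remaining queries are covered by no segment
        chosen.append(best["name"])
        uncovered -= set(best["queries"])
    return chosen

segments = [
    {"name": "S1", "queries": ["Q1", "Q2", "Q3", "Q4"]},
    {"name": "S4", "queries": ["Q4", "Q5"]},
    {"name": "S5", "queries": ["Q5", "Q6"]},
]
print(select_segments(segments, ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"]))
# ['S1', 'S5']: after S1 covers Q1..Q4, S5 covers the remaining Q5 and Q6.
```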
FIG. 9 shows an embodiment relating to the extraction of distributed segments.
FIG. 9(A) shows a query graph having queries Q1, Q2, and Q3. The query Q1 has K1 and K2 as distribution keys, the query Q2 has K1, K2, and K3, and the query Q3 has K1 and K3.
FIG. 9(B) is a table that organizes, per query, the distributed segments extracted by applying the extraction method described above. In the column listing the segments to which each query belongs, the segments are arranged from left to right in descending order of priority.
FIG. 9(C) is a table that organizes the queries per extracted distributed segment; the information in the tables of FIGS. 9(B) and 9(C) is the same. The distributed segment S1 forms a query subset containing the queries Q1 and Q2. The distributed segment S2 contains the query Q2. The distributed segment S3 contains the queries Q2 and Q3. The distributed segment S4 contains the queries Q1, Q2, and Q3.
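For the linear chain of FIG. 9, the same extraction result can be reproduced with a simplified sketch: every maximal run of consecutive queries sharing a non-empty common key set becomes a distributed segment. This is only an approximation for a straight chain; the flow of FIG. 5 handles general query graphs.

```python
from itertools import combinations

chain = [("Q1", {"K1", "K2"}), ("Q2", {"K1", "K2", "K3"}), ("Q3", {"K1", "K3"})]

# Common key set of every contiguous run of queries.
runs = {}
for i, j in combinations(range(len(chain) + 1), 2):
    common = set.intersection(*(keys for _, keys in chain[i:j]))
    if common:
        runs[(i, j)] = frozenset(common)

# Keep only runs that are maximal for their common key set.
segments = [
    ([name for name, _ in chain[i:j]], keys)
    for (i, j), keys in runs.items()
    if not any(i2 <= i and j <= j2 and (i2, j2) != (i, j) and keys == keys2
               for (i2, j2), keys2 in runs.items())
]
for queries, keys in segments:
    print(queries, sorted(keys))
# ['Q1', 'Q2']       ['K1', 'K2']        -> S1
# ['Q1', 'Q2', 'Q3'] ['K1']              -> S4
# ['Q2']             ['K1', 'K2', 'K3']  -> S2
# ['Q2', 'Q3']       ['K1', 'K3']        -> S3
```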
FIG. 10 shows the deployment of queries to parallel-distributed nodes using distributed segments according to an embodiment.
FIG. 10(A) shows the distributed segments (S1, S2, S3, S4) arranged in order of priority. The distributed segment S4 has the highest priority and the distributed segment S2 the lowest. This prioritization uses an evaluation function in which a longer segment (one containing more queries) receives a higher priority.
FIG. 10(B) shows an example in which the distributed segment S4 (1000) with the highest priority is adopted and the queries (Q1, Q2, Q3) belonging to it are deployed.
FIG. 10(C) shows an example in which the queries (Q1, Q2, Q3) are deployed to each node in units of the distributed segment S4 and the nodes 1000-1, 1000-2 to 1000-N are distributed in parallel; the degree of parallelism is N. At point 1020, the input stream 1010 is distributed to the N nodes. For this distribution, K1, the common distributed key set of the distributed segment S4, is applied to a hash function that outputs N hash values. Each distributed stream is then processed within its node in the order query Q1, query Q2, query Q3. The processing results of the nodes are combined at point 1030 to obtain the processing result 1040.
FIG. 11 shows an example in which, as a result of actually profiling the input stream, the processing load of the query Q1 (1110) on node 1 and the query 1130 on node 3 was found to be large, and redistribution was performed. As shown in the figure, node 1.2 is added and node 1.1 is redistributed, and node 3.2 is added and node 3.1 is redistributed. The details are described below.
In this specification, for ease of explanation, the flow of data between queries is described in terms of the individual events of the input stream. In actual processing, a downstream query positioned after a given upstream query may be provided with the processing results of the queries belonging to the upstream stage.
When redistribution is performed, the distribution keys differ, so re-hashing occurs and the amount of communication increases to some extent. However, this does not happen at all nodes but only at a specific node (or only for a specific hash value), so the impact of this communication is local.
For this reason, when considering redistribution, it is desirable to first confirm that the network to which the redistributed nodes are directly connected has sufficient spare bandwidth.
Consider distributing the processing of the query Q1 (1110). A segment that contains the query Q1 and has the next-highest priority is sought. Since the segment S1 meets this condition, the queries Q1 and Q2 belonging to the segment S1 are deployed to a new node 1.2 and redistribution is performed.
When performing this redistribution, it is desirable for the node management mechanism to set up a virtual node, node 1, so as not to affect the hash function H1 that distributes the stream N ways at point 1190. The virtual node may then be spread over the physical nodes 1.1 and 1.2. The queries Q1 (1115) and Q2 (1125) belonging to the segment S1 are deployed on node 1.2, which distributes the processing of the queries Q1 (1110) and Q2 (1120) of node 1.1. Because the number of nodes has increased, the distributed key set (K1, K2) must be applied to a hash function H2 that outputs two hash values in order to perform the distribution at point 1195. The output of the query Q2 (1125) then has to be passed to the query Q3 (1121). Adding node 1.2 therefore increases the amount of communication to some extent.
Also, to reduce the processing load of the query Q3 (1130), a segment that contains the query Q2 and has the next-highest priority is sought. Since the segment S3 meets this condition, the queries Q2 and Q3 belonging to the segment S3 are deployed to a new node 3.2 and redistribution is performed. The queries Q2 (1135) and Q3 (1145) belonging to the segment S3 are deployed on node 3.2, which distributes the processing of the queries Q2 (1130) and Q3 (1140) of node 3.1. Because the number of nodes has thereby increased, the distributed key set (K1, K3) must be applied to a hash function H2 that outputs two hash values in order to perform the distribution. The outputs of the queries Q3 (1140) and Q3 (1145) then have to be given to point 1196. Adding node 3.2 therefore increases the amount of communication to some extent.
FIG. 11(B) shows examples of the hash functions. K1 % N means that the distributed key set K1 is applied to a hash function that outputs N hash values.
As described above, redistribution increases the amount of communication to some extent, but it enables a more appropriate load distribution.
FIG. 12 shows the extraction of distributed segments in an embodiment.
FIG. 12(A) shows a query graph in which the queries Q1 to Q6 each have the distributed key set shown in the figure.
FIG. 12(B) is a table showing the distributed segments to which each query belongs.
FIG. 12(C) is a table showing the queries belonging to each segment.
FIG. 13 shows the deployment of queries to parallel-distributed nodes according to an embodiment.
FIG. 13(A) shows the distributed segments (S1, S2, S3, S4, S5) arranged in order of priority. The distributed segments S1 and S5 have the highest priority and the distributed segment S4 the lowest. This prioritization uses an evaluation function in which a longer segment (one containing more queries) receives a higher priority. After S1 is assigned, the longest segment within the subgraph (Q5, Q6) that remains after excluding S1's subgraph (Q1, Q2, Q3, Q4) is S5 rather than S4, so it is desirable to assign S5 preferentially to Q5 and Q6. The priority of S5 is therefore high.
FIG. 13(B) shows an example in which the distributed segments S1 and S5 with the highest priority are adopted, and the queries (Q1, Q2, Q3, Q4) belonging to the distributed segment S1 and the queries (Q5, Q6) belonging to the distributed segment S5 are deployed.
FIG. 13(C) shows an example in which the queries (Q1, Q2, Q3, Q4) are deployed to nodes 1 to N/2 in units of the distributed segment S1, the queries (Q5, Q6) are deployed to nodes N/2+1 to N, and the nodes are distributed in parallel.
In this case, for example, the output of Q4 (1301) on node 1 must be distributed to nodes N/2+1 to N by applying the distributed key set K3 to a hash function that outputs N/2 hash values, because the distributed key set K1 of the segment S1 differs from the distributed key set K3 of the segment S5. Communication between nodes thus occurs at the segment boundary, but the amount of communication can be reduced dramatically compared with assigning a node to each query.
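A minimal sketch of this boundary hand-off, assuming N = 8 nodes, a result record that carries both K1 and K3, and illustrative hash functions (none of these names appear in the embodiment):

```python
N = 8  # first N/2 nodes run Q1..Q4 (segment S1), the rest run Q5..Q6 (segment S5)

def s1_node(record):
    """Node inside the S1 group, chosen by that segment's key set K1."""
    return hash(record["K1"]) % (N // 2)

def s5_node(record):
    """Node inside the S5 group, chosen by re-hashing on K3 at the boundary."""
    return (N // 2) + hash(record["K3"]) % (N // 2)

q4_output = {"K1": "user-a", "K3": "region-7", "value": 42}
print(s1_node(q4_output), "->", s5_node(q4_output))
# Only this segment boundary requires inter-node communication; within each
# segment the queries are chained locally on the same node.
```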
FIG. 14 shows a functional block diagram of an embodiment, including the elements that constitute the system.
This embodiment includes a query information acquisition unit 1420, a distributed key set extraction unit 1430, a distributed segment generation unit 1440, a parallel distribution unit 1450, a profile acquisition unit 1460, a profile evaluation unit 1465, and a query/node identification unit 1470.
First, the query information acquisition unit 1420 receives the data stream program 1410, recognizes the plurality of queries and the connection relationships between them, and passes the result to the distributed key set extraction unit 1430.
The distributed key set extraction unit 1430 extracts, in association with each of the plurality of queries, a distributed key set that can be used when the data stream is hashed so that each of the plurality of queries is executed in parallel.
The distributed segment generation unit 1440 generates distributed segments to which series of queries having a common distributed key set belong. As a rule, the one or more queries belonging to a distributed segment must be consecutive. Exceptionally, as in the case shown in FIG. 6, it may be desirable to have multiple queries that send their output to the same query belong to the same segment.
The distributed segment generation unit 1440 may include a distributed segment priority assignment unit 1445 and a distributed segment selection unit 1446.
The distributed segment priority assignment unit 1445 assigns a priority to each distributed segment using a predetermined evaluation function. As the predetermined evaluation function, a longer distributed segment (one to which more queries belong) may be evaluated more highly. Alternatively, a higher evaluation may be given to a distributed segment whose common distributed key set contains more keys, or to a distributed segment whose member queries have a heavier processing load.
The distributed segment selection unit 1446 selects, from the one or more distributed segments, the distributed segments to be used, based on the priorities assigned to them.
For parallel execution of the queries belonging to the distributed segments to be used, the parallel distribution unit 1450 distributes a plurality of nodes in parallel based on a hash function applied to the common distributed key set corresponding to each distributed segment. The queries belonging to the distributed segment are deployed to each of the parallel-distributed nodes. An appropriate number of nodes may be determined based on the number of available physical machines.
Through the above processing, the queries are deployed to the plurality of parallel-distributed nodes.
The profile acquisition unit 1460 further performs profiling while the data stream is processed, obtaining an execution profile of each query and/or each node.
By evaluating this profile, the profile evaluation unit 1465 can check whether the load is concentrated on a specific query and/or a specific node. As measures of load concentration, various information may be used, including the number of events per unit time, the CPU load, the memory usage, and the processing capacity of each query and/or each node. The present invention is not limited to these examples.
The query/node identification unit 1470 identifies, using a predetermined evaluation function, the query or node on which the load is concentrated, and gives the result to the re-parallel distribution unit 1455.
The re-parallel distribution unit 1455 may reside within the parallel distribution unit 1450. The re-parallel distribution unit 1455 again distributes in parallel a new node, on which a new segment to which all or part of the specific query or of the queries deployed on the specific node belongs is deployed, together with the node on which the specific query exists or the specific node.
FIG. 15 shows an example hardware (computer) configuration of an embodiment of the present invention. The hardware includes a CPU 1510, a memory 1515, an input device 1520, an output device 1525, an external storage device 1530, a portable recording medium drive device 1535, and a network connection device 1545, all connected by a bus 1550. The portable recording medium drive device 1535 can read from and write to a portable recording medium 1540, and a network 1560 is connected to the network connection device 1545.
All or part of the present embodiment can be implemented by a program, which can be stored in the portable recording medium 1540. The portable recording medium 1540 refers to one or more non-transitory, tangible storage media having a structure. Examples of the portable recording medium 1540 include magnetic recording media, optical discs, magneto-optical recording media, and nonvolatile memories. Magnetic recording media include HDDs, flexible disks (FD), and magnetic tapes (MT). Optical discs include DVDs (Digital Versatile Disc), DVD-RAM, CD-ROM (Compact Disc-Read Only Memory), and CD-R (Recordable)/RW (ReWritable). Magneto-optical recording media include MO (Magneto-Optical disk). All or part of the embodiments of the present invention can be carried out by reading the program stored in the portable recording medium and executing it with the CPU.
1410 Data stream program
1420 Query information acquisition unit
1430 Distributed key set extraction unit
1440 Distributed segment generation unit
1445 Distributed segment priority assignment unit
1446 Distributed segment selection unit
1450 Parallel distribution unit
1455 Re-parallel distribution unit
1460 Profile acquisition unit
1465 Profile evaluation unit
1470 Query/node identification unit

Claims (12)

  1.  A program for deploying a plurality of queries that process a data stream to a plurality of nodes, each of the plurality of nodes having the capability to provide a result of executing a deployed query to a subsequent node or to an output, the program causing a computer to execute a process comprising:
     obtaining the plurality of queries and a connection relationship between each of the plurality of queries;
     extracting, in association with each of the plurality of queries, a distributed key set that can be used when the data stream is hashed so that each of the plurality of queries is executed in parallel;
     generating a distributed segment to which a query subset of the plurality of queries belongs, the distributed segment having a common distributed key set shared by the query subset; and
     distributing nodes in parallel in correspondence with the distributed segment, the query subset belonging to the segment being deployed to each of one or more of the parallel-distributed nodes.
  2.  The program according to claim 1, wherein the process of generating the distributed segment includes:
     assigning a priority to each of one or more of the distributed segments based on a predetermined first rule; and
     selecting, based on the priority, a distributed segment to be used from the one or more distributed segments.
  3.  The program according to claim 1 or 2, causing the computer to further execute a process comprising:
     obtaining a profile of each of the plurality of queries and/or each of the plurality of nodes when the plurality of queries deployed on the plurality of nodes process the data stream;
     evaluating the profile based on a predetermined second rule; and
     identifying a specific query or a specific node based on the evaluation,
     wherein the process of distributing in parallel includes distributing in parallel again a new node, on which a new segment to which all or part of the specific query or of the queries deployed on the specific node belongs is deployed, and the node on which the specific query exists or the specific node. (FIG. 4)
  4.  The program according to any one of claims 1 to 3, wherein the query subset includes a series of queries whose processing is consecutive.
  5.  A method for deploying a plurality of queries that process a data stream to a plurality of nodes, each of the plurality of nodes having the capability to provide a result of executing a deployed query to a subsequent node or to an output, the method comprising:
     obtaining the plurality of queries and a connection relationship between each of the plurality of queries;
     extracting, in association with each of the plurality of queries, a distributed key set that can be used when the data stream is hashed so that each of the plurality of queries is executed in parallel;
     generating a distributed segment to which a query subset of the plurality of queries belongs, the distributed segment having a common distributed key set shared by the query subset; and
     distributing nodes in parallel in correspondence with the distributed segment, the query subset belonging to the segment being deployed to each of one or more of the parallel-distributed nodes.
  6.  The method according to claim 5, wherein the process of generating the distributed segment includes:
     assigning a priority to each of one or more of the distributed segments based on a predetermined first rule; and
     selecting, based on the priority, a distributed segment to be used from the one or more distributed segments.
  7.  The method according to claim 5 or 6, further comprising:
     obtaining a profile of each of the plurality of queries and/or each of the plurality of nodes when the plurality of queries deployed on the plurality of nodes process the data stream;
     evaluating the profile based on a predetermined second rule; and
     identifying a specific query or a specific node based on the evaluation,
     wherein the process of distributing in parallel includes distributing in parallel again a new node, on which a new segment to which all or part of the specific query or of the queries deployed on the specific node belongs is deployed, and the node on which the specific query exists or the specific node.
  8.  The method according to any one of claims 5 to 7, wherein the query subset includes a series of queries whose processing is consecutive.
  9.  A system for deploying a plurality of queries that process a data stream to a plurality of nodes, each of the plurality of nodes having the capability to provide a result of executing a deployed query to a subsequent node or to an output, the system comprising:
     a query information acquisition unit that obtains the plurality of queries and a connection relationship between each of the plurality of queries;
     a distributed key set extraction unit that extracts, in association with each of the plurality of queries, a distributed key set that can be used when the data stream is hashed so that each of the plurality of queries is executed in parallel;
     a distributed segment generation unit that generates a distributed segment to which a query subset of the plurality of queries belongs, the distributed segment having a common distributed key set shared by the query subset; and
     a parallel distribution unit that distributes nodes in parallel in correspondence with the distributed segment, the query subset belonging to the segment being deployed to each of one or more of the parallel-distributed nodes.
  10.  The system according to claim 9, wherein the distributed segment generation unit includes:
     a distributed segment priority assignment unit that assigns a priority to each of one or more of the distributed segments based on a predetermined first rule; and
     a distributed segment selection unit that selects, based on the priority, a distributed segment to be used from the one or more distributed segments.
  11.  前記複数のノードに配備された前記複数のクエリが前記データストリームを処理する際に、前記複数のクエリの各々、及び/又は前記複数のノードの各々のプロファイルを取得する、プロファイル取得部と、
     所定の第2の規則に基づいて、前記プロファイルを評価する、プロファイル評価部と、
     前記評価に基づいて、特定のクエリ又は特定のノードを特定する、クエリ/ノード特定部と、
     を有し、
     前記並列分散部は、
     前記特定のクエリ又は前記特定のノードに配備されたクエリの全部又は一部が所属する新たなセグメントを配備した新たなノードと、前記特定のクエリが存在するノード又は前記特定のノードとを再度並列分散させる、再並列分散部、を含む、
     請求項9又は10記載のシステム。
    A profile acquisition unit that acquires a profile of each of the plurality of queries and/or each of the plurality of nodes while the plurality of queries deployed on the plurality of nodes process the data stream;
    a profile evaluation unit that evaluates the profile based on a predetermined second rule; and
    a query/node identification unit that identifies a specific query or a specific node based on the evaluation;
    the system having the above units,
    wherein the parallel distribution unit includes:
    a re-parallel-distribution unit that re-distributes in parallel a new node, on which a new segment is deployed to which all or part of the specific query, or of the queries deployed on the specific node, belongs, together with the node on which the specific query resides or the specific node.
    The system according to claim 9 or 10.
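
    The re-parallel-distribution unit of claim 11 (and the corresponding step of claim 7) might be pictured as follows; `deployments` and the half-split heuristic are invented for illustration. Part of the overloaded node's query subset becomes a new segment deployed on a newly added node, after which both nodes again receive hashed traffic in parallel.

```python
def split_off(deployments: dict, hot_node: str, new_node: str) -> dict:
    """Move the tail half of the overloaded node's query subset into a
    new segment deployed on a newly added node."""
    queries = deployments[hot_node]
    cut = max(1, len(queries) // 2)
    deployments[hot_node] = queries[:cut]
    deployments[new_node] = queries[cut:]   # the new segment on the new node
    return deployments

deployments = {"node_2": ["enrich_user", "aggregate_user"]}
print(split_off(deployments, "node_2", "node_3"))
# -> {'node_2': ['enrich_user'], 'node_3': ['aggregate_user']}
```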
  12.  前記クエリ部分集合は、処理が連続する一連のクエリを含む、請求項9ないし11のうちいずれか1項に記載のシステム。 The system according to any one of claims 9 to 11, wherein the query subset includes a series of queries that are consecutive in the processing flow.
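
    Finally, the hashing referred to throughout the claims can be illustrated as routing each event to one of a segment's parallel node instances by hashing the value of the common distributed key; the event fields and the choice of `zlib.crc32` are assumptions made for this sketch only.

```python
import zlib

def route(event: dict, common_key: str, parallel_nodes: list) -> str:
    """Events with the same key value always reach the same node instance,
    so each parallel copy of the segment sees a consistent partition."""
    digest = zlib.crc32(str(event[common_key]).encode("utf-8"))
    return parallel_nodes[digest % len(parallel_nodes)]

nodes = ["node_2a", "node_2b", "node_2c"]
print(route({"user_id": 42, "region": "JP"}, "user_id", nodes))
```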
PCT/JP2012/058731 2012-03-30 2012-03-30 Data stream parallel processing program, method, and system WO2013145310A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2012/058731 WO2013145310A1 (en) 2012-03-30 2012-03-30 Data stream parallel processing program, method, and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2012/058731 WO2013145310A1 (en) 2012-03-30 2012-03-30 Data stream parallel processing program, method, and system

Publications (1)

Publication Number Publication Date
WO2013145310A1 true WO2013145310A1 (en) 2013-10-03

Family

ID=49258672

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/058731 WO2013145310A1 (en) 2012-03-30 2012-03-30 Data stream parallel processing program, method, and system

Country Status (1)

Country Link
WO (1) WO2013145310A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010204880A (en) * 2009-03-03 2010-09-16 Hitachi Ltd Stream data processing method, stream data processing program, and stream data processing apparatus
US20110314019A1 (en) * 2010-06-18 2011-12-22 Universidad Politecnica De Madrid Parallel processing of continuous queries on data streams

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MITCH CHERNIACK ET AL.: "Scalable Distributed Stream Processing", PROCEEDINGS OF THE FIRST BIENNIAL CONFERENCE ON INNOVATIVE DATA SYSTEMS RESEARCH (CIDR 2003), 8 January 2003 (2003-01-08) *
SATOSHI KATSUNUMA ET AL.: "Distributed and Parallel Stream Data Processing for Reducing Merge Operation Overhead", DAI 2 KAI FORUM ON DATA ENGINEERING AND INFORMATION MANAGEMENT -DEIM 2010- RONBUNSHU, 25 May 2010 (2010-05-25) *
YING XING ET AL.: "Providing resiliency to load variations in distributed stream processing", VLDB '06 PROCEEDINGS OF THE 32ND INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, 12 September 2006 (2006-09-12), pages 775 - 786 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015087829A (en) * 2013-10-28 2015-05-07 富士通株式会社 Data processing program, data processing method, and data processing apparatus
JP2015114937A (en) * 2013-12-13 2015-06-22 インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation Extraction device, data processing system, extraction method, and extraction program
US9984134B2 (en) 2013-12-13 2018-05-29 International Business Machines Corporation Extraction device, data processing system, and extraction method
US10089370B2 (en) 2013-12-13 2018-10-02 International Business Machines Corporation Extraction device, data processing system, and extraction method
JP2017514216A (en) * 2014-03-31 2017-06-01 華為技術有限公司Huawei Technologies Co.,Ltd. Event processing system
US11138177B2 (en) 2014-03-31 2021-10-05 Huawei Technologies Co., Ltd. Event processing system
WO2015196940A1 (en) * 2014-06-23 2015-12-30 华为技术有限公司 Stream processing method, apparatus and system
CN105335376A (en) * 2014-06-23 2016-02-17 华为技术有限公司 Stream processing method, device and system
US9692667B2 (en) 2014-06-23 2017-06-27 Huawei Technologies Co., Ltd. Stream processing method, apparatus, and system
CN105335376B (en) * 2014-06-23 2018-12-07 华为技术有限公司 A kind of method for stream processing, apparatus and system
JP2016066291A (en) * 2014-09-25 2016-04-28 富士通株式会社 Data processing method, data processing program, and data processor
WO2017104072A1 (en) * 2015-12-18 2017-06-22 株式会社日立製作所 Stream data distribution processing method, stream data distribution processing system and storage medium

Similar Documents

Publication Publication Date Title
CN108595157B (en) Block chain data processing method, device, equipment and storage medium
WO2013145310A1 (en) Data stream parallel processing program, method, and system
US8726290B2 (en) System and/or method for balancing allocation of data among reduce processes by reallocation
US10402427B2 (en) System and method for analyzing result of clustering massive data
US8423605B2 (en) Parallel distributed processing method and computer system
US9910821B2 (en) Data processing method, distributed processing system, and program
US9892187B2 (en) Data analysis method, data analysis device, and storage medium storing processing program for same
JP4571609B2 (en) Resource allocation method, resource allocation program, and management computer
CN107193813B (en) Data table connection mode processing method and device
US20130297788A1 (en) Computer system and data management method
CN109033109B (en) Data processing method and system
US10904316B2 (en) Data processing method and apparatus in service-oriented architecture system, and the service-oriented architecture system
US20120089734A1 (en) Allocation of resources between web services in a composite service
CN103858103A (en) Resource allocation tree
US10606867B2 (en) Data mining method and apparatus
CN109791492B (en) Pipeline dependency tree query optimizer and scheduler
US20140059000A1 (en) Computer system and parallel distributed processing method
US20130138686A1 (en) Device and method for arranging query
US10387395B2 (en) Parallelized execution of window operator
JP2018067302A (en) Software service execution device, system, and method
WO2014046885A2 (en) Concurrency identification for processing of multistage workflows
JP6069503B2 (en) Parallel analysis platform for serial data and parallel distributed processing method
US8667008B2 (en) Search request control apparatus and search request control method
JP5108011B2 (en) System, method, and computer program for reducing message flow between bus-connected consumers and producers
US9852184B2 (en) Partition-aware distributed execution of window operator

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12873458

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12873458

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP