WO2018002976A1 - Management device, execution environment setting method, stream data processing system - Google Patents

Management device, execution environment setting method, stream data processing system

Info

Publication number
WO2018002976A1
Authority
WO
WIPO (PCT)
Prior art keywords
query
transmission interval
processing
data
processing device
Prior art date
Application number
PCT/JP2016/068942
Other languages
English (en)
Japanese (ja)
Inventor
隼之 土田
常之 今木
Original Assignee
株式会社日立製作所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立製作所
Priority to PCT/JP2016/068942
Priority to JP2018524589A (patent JP6626198B2)
Publication of WO2018002976A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures

Definitions

  • The present invention generally relates to data analysis.
  • Non-Patent Document 1 discloses a stream data processing system.
  • Patent Document 1 proposes a method for reducing the number of communications between devices in stream data processing.
  • Wireless sensors often used in IoT systems are battery driven, so it is desirable to extend their operating time by increasing the sensor data transmission interval and reducing the number of transmissions, which lowers communication power consumption. A longer sensor data transmission interval also increases the number of slave units that one wireless master unit can handle, which makes it possible to reduce the price of the entire system. This is because, in the widely used CSMA/CD protocol, a transmission is interrupted when a plurality of transmissions overlap and is retried after a certain period of time, and such overlaps become less likely as the communication interval becomes longer.
  • Patent Document 1 reduces the number of communications between devices by processing, on the same device, a set of queries that communicate heavily with one another. However, because it does not address the communication interval, it cannot be expected to lengthen the stream data transmission interval and reduce the number of transmissions, and therefore cannot be expected to reduce network load or save power.
  • A representative embodiment of the present invention includes a first processing device that transmits stream data, a second processing device that executes processing on the stream data received from the first processing device based on a query, and a management device connected to these devices via a network. The management device comprises a processor and a storage unit. The storage unit stores information on a query graph composed of a plurality of the queries, and at least one query included in the query graph includes information on processing the stream data at a predetermined interval. The processor assigns one or more of the queries to the second processing device based on the query graph, calculates, based on the predetermined interval and the query graph, a data transmission interval, which is the interval at which the first processing device transmits the stream data to the second processing device, and transmits information on the data transmission interval to the first processing device.
  • FIG. 1 shows a configuration of an entire system according to a first embodiment.
  • The flow of stream data processing according to the first embodiment is shown.
  • An example of a query graph is shown.
  • An example of log data is shown.
  • An example of the query graph which set the allowable transmission interval is shown.
  • An example of the linear query graph which set the allowable transmission interval is shown.
  • An example of the expanded query graph is shown.
  • An example of the calculation method of the allowable transmission interval will be shown.
  • The flow of a data transmission interval calculation process according to the first embodiment is shown.
  • An example of the calculation method of the allowable transmission interval of the entire query graph will be shown.
  • An example of a query graph in which an allowable transmission interval of a query is calculated and expanded into a straight line is shown.
  • An example of the calculation method of the allowable transmission interval of the entire query graph will be shown.
  • An example of a query graph in which an allowable transmission interval of a query is calculated and expanded into a straight line is shown.
  • the flow of query group assignment processing is shown.
  • An example of the relationship between a server and an assigned query group is shown.
  • An example of configuration information is shown.
  • the flow of a query group extraction process is shown.
  • An example of the relationship between a server and an assigned query group is shown.
  • An example of the relationship between a server and an assigned query group is shown.
  • An example of an output device is shown.
  • An example of a query graph is shown.
  • The flow of a data transmission interval calculation process according to the second embodiment is shown.
  • FIG. 1 shows an embodiment of the present invention and shows an example of a stream data processing system.
  • When executing the query graph, the management server 100 transmits to the computer 300 a data transmission interval, which is the interval at which the computer 300 transmits the log data 341 to the server 200, and assigns to the server 200 a query group that executes data processing on the log data 341. Based on the execution result of the query group in the server 200, the execution result of the query graph is output to the outside.
  • the management server 100, one or more servers 200, and one or more computers 300 are connected via a communication network 400.
  • the management server 100 and the server 200 are not limited to servers and may be virtual machines as long as they perform data processing.
  • FC (Fibre Channel), SCSI (Small Computer System Interface), TCP/IP (Transmission Control Protocol/Internet Protocol), IEEE802.11, IEEE802.15.1, or IEEE802.15.4 may be adopted as a protocol for communication via the communication network 400.
  • the management server 100 includes a memory 110, a storage device 120, a processor 130, a network interface 140, an input device 150, and an output device 160 as hardware configurations.
  • the processor 130 executes a program stored in the memory 110.
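  • The function of the management server 100 is realized by the processor 130 executing the program.
  • Where processing is described below with a functional unit as its subject, this means that the processor 130 is executing the program that realizes that functional unit.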
  • the network interface 140 is an interface for connecting to other devices via a network.
  • the storage device 120 stores one or a plurality of query graphs 121 describing the contents of processing for stream data, and configuration information 122.
  • the query graph is a tree of a plurality of queries, shows the processing contents of each query and the connection relationship between the queries, and includes information such as the execution order of the processing set in the query.
  • An example of the query graph is shown in FIG.
  • the query 3003 receives the execution result of the query 3002, and indicates that the process of the query 3003 is performed.
  • the execution result of the query 3003 is passed to the query 3004.
  • the storage device 120 may be a storage system having a controller and a plurality of storage media.
  • the storage device 120 may be a general computer having a storage medium, or may be a storage medium itself.
  • the storage medium may be HDD (Hard Disk Drive), SSD (Solid State Drive), or the like.
  • the input device 150 inputs data. Examples of the input device 150 include a keyboard, a mouse, a touch panel, a numeric keypad, and a scanner.
  • the output device 160 outputs data. Examples of the output device 160 include a display and a printer.
  • the memory 110 stores a program for realizing the transmission interval calculation unit 111, the transmission interval transmission unit 112, the query group allocation unit 113, and the query group execution result processing unit 114. The function of each part will be described later.
  • the server 200 executes the query group assigned from the management server 100 to the log data 341 transmitted from the computer 300.
  • the server 200 includes a network interface 210, a processor 220, and a memory 230 as hardware configurations.
  • the server 200 is not limited to a server, and may be a virtual machine.
  • the network interface 210 is an interface for connecting to other devices via a network.
  • the processor 220 executes a program stored in the memory 230.
  • the function of the server 200 is realized by the processor 220 executing the program.
  • the memory 230 stores a program for realizing the query group execution unit 231.
  • the query group execution unit 231 executes the processing of the query group assigned from the management server 100 to the log data 341 received from the computer 300.
  • the processing result is output to the management server 100 or another server 200.
  • the computer 300 transmits the log data 341 to the server 200 at the data transmission interval received from the management server 100.
  • the computer 300 includes a network interface 310, a processor 320, a memory 330, and a storage device 340 as hardware configurations.
  • the computer 300 only needs to store stream data, and may be a factory sensor device.
  • the network interface 310 is an interface for connecting to other devices via a network.
  • the processor 320 executes a program stored in the memory 330.
  • the function of the computer 300 is realized by the processor 320 executing the program.
  • the memory 330 stores a program for realizing the data transmission unit 331.
  • the data transmission unit 331 transmits the log data 341 to the server 200 based on the data transmission interval received from the management server 100.
  • the storage device 340 stores log data 341 that is an access log to the network of the computer.
  • the storage device 340 may be a storage system having a controller and a plurality of storage media.
  • the storage device 340 may be a general computer having a storage medium or the storage medium itself.
  • the storage medium may be HDD (Hard Disk Drive), SSD (Solid State Drive), or the like.
  • FIG. 2 is an example of a flow for executing stream data processing on the log data 341 stored in the computer 300 based on the query graph 121.
  • the transmission interval calculation unit 111 calculates the data transmission interval from the computer 300 to the server 200 based on the query graph 121. A method of calculating the data transmission interval from the computer 300 to the server 200 based on the query graph 121 will be described later.
  • the transmission interval transmission unit 112 transmits the calculated data transmission interval to the computer 300.
  • the computer 300 transmits log data 341 to the server 200 based on the received data transmission interval.
  • the query group assignment unit 113 assigns the query group in the query graph 121 to the server 200 based on the configuration information 122, the query graph 121, and the calculated data transmission interval. A method for assigning the query group to the server 200 will be described later.
  • the query group execution unit 231 executes the assigned query group on the log data 341 transmitted by the computer 300.
  • the query group execution result processing unit 114 processes the execution result of the query group of each server 200 as the execution result of the query graph 121.
  • the query group execution result processing unit 114 outputs the execution result of the query graph 121 to the output device 160, and ends the process.
  • In this example, the execution result of the query group in the server 200 is processed and output by the management server 100. Alternatively, the server 200 may output the processing result to another server 200, or the server 200 may have an output device and output its processing result to that output device.
  • a method in which the transmission interval calculation unit 111 calculates the data transmission interval of the log data 341 from the computer 300 to the server 200 based on the query graph 121 in S201 will be specifically described.
  • FIG. 3 shows an example of query graph processing for performing network analysis.
  • basic statistical information such as the average value and maximum value of network communication response time is calculated at regular intervals, and advanced analysis such as load prediction is performed using the calculated value.
  • An example in which basic statistical information is calculated by stream data processing will be described.
  • The query graph 121 in FIG. 3 is a stream data processing query graph that reads the network log to be analyzed and calculates, for each device, the maximum response time of normal data and the number of communication errors. It is composed of nine queries: 3001, 3002, 3003, 3004, 3005, 3006, 3007, 3008, and 3009 represent queries, and arrows represent connections between queries. The processing performed by each query is described below.
  • The log data 341 to be analyzed is received at 3001, only normal data is extracted from the received data at 3002, and the maximum response time for each device is calculated every 4 seconds at 3003. At 3004, only data whose maximum response time is 100 milliseconds or more is extracted from the calculation result, and at 3005 the processing result for the maximum response time is output to the output device 160.
  • 3001 is a query that receives the log data 341 from the computer 300, and 3005 and 3009 are queries that output the results of the processing executed in the query graph.
  • FIG. 4 is an example of log data 341 that is input data of the processing example.
  • the log data 341 has a plurality of records.
  • This record has TIMESTAMP 4001, ID 4002, RESPONSE_TIME 4003, and STATUS_CODE 4004 as data attributes.
  • TIMESTAMP 4001 represents the date and time when the communication log was generated.
  • ID 4002 represents information for identifying a device that has performed communication of the communication log.
  • RESPONSE_TIME 4003 represents a communication response time.
  • STATUS_CODE 4004 represents the status code of the communication.
  • For example, in one record, the communication log generation time (TIMESTAMP 4001) is “08:03:01:01”, the device identification information (ID 4002) is “1”, the communication response time (RESPONSE_TIME 4003) is “70”, and the status code (STATUS_CODE 4004) is “200”.
  • The status code “200” indicates that data has been communicated normally, and “50” indicates that a communication error has occurred.
  • The records 4005 and 4007 have the status code “200”, so their data was communicated normally and they are processed on the branch 3002 side of the query graph 121. Because the response time of the record 4005 is 70 milliseconds, it is not extracted by the query 3004, whereas the record 4007, whose response time is 250 milliseconds, is extracted and passed on to the query 3005. The record 4006 has the status code “50”, which indicates that a communication error has occurred, so it is processed on the branch 3006 side of the query graph 121.
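  • The branch described above (queries 3002 to 3004) can be illustrated with a minimal Python sketch. The record values, the tuple layout, the function name `max_response_per_window`, and the use of fixed (tumbling) 4-second windows are assumptions made for illustration; the source does not specify how the stream data processing system implements these queries.

```python
from collections import defaultdict

# Hypothetical records shaped like the log data 341:
# (timestamp in seconds, device ID, response time in ms, status code).
RECORDS = [
    (0.5, 1, 70, 200),
    (1.0, 2, 0, 50),     # status 50: communication error, skipped here
    (2.5, 2, 250, 200),
    (5.0, 1, 120, 200),
]

def max_response_per_window(records, window_s=4, threshold_ms=100):
    """Mimic queries 3002-3004 under the assumptions stated above."""
    maxima = defaultdict(int)
    for ts, device, response_ms, status in records:
        if status != 200:
            continue  # like query 3002: keep normal data only
        key = (int(ts // window_s), device)  # (window index, device ID)
        # like query 3003: per-device maximum within each window
        maxima[key] = max(maxima[key], response_ms)
    # like query 3004: keep only maxima of threshold_ms or more
    return {k: v for k, v in maxima.items() if v >= threshold_ms}

RESULT = max_response_per_window(RECORDS)
```

  • In this sketch the 70-millisecond maximum of device 1 in the first window is dropped, mirroring how the record 4005 is not extracted by the query 3004.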
  • The queries 3003 and 3007 include RSTREAM processing, which accumulates data and outputs it every few seconds. Therefore, even if the data is transmitted collectively at the RSTREAM time interval instead of being transmitted and processed sequentially, and processing such as the calculation of statistical information is then performed, the interval at which results are output is not affected.
  • This transmission interval that can be lengthened is called an allowable transmission interval.
  • FIG. 5 is an example of the query graph of FIG. 3 after setting an allowable transmission interval for each query.
  • Reference numerals 5001, 5002, 5003, 5004, 5005, 5006, 5007, 5008, and 5009 denote queries.
  • An allowable transmission interval of 4 seconds is set to 5003, and an allowable transmission interval of 30 seconds is set to 5007 by reading the query graph of FIG.
  • the allowable transmission interval of the entire query graph is an allowable transmission interval set for the entire query graph calculated based on the allowable transmission interval of each query.
  • This allowable transmission interval of the entire query graph is referred to as a total allowable transmission interval.
  • a method for calculating the total allowable transmission interval will be described.
  • FIG. 6 is used to explain an example of a method for setting the overall allowable transmission interval of a query graph composed of straight lines.
  • the query graph 601 is composed of a linear query graph made up of queries 6001, 6002, 6003, and 6004, and the allowable transmission intervals of queries are set to 6002 and 6003.
  • Reference numeral 6001 denotes a query related to data reception
  • reference numeral 6004 denotes a query related to output of data processing results.
  • 2 seconds is the entire allowable transmission interval of the query graph 601.
  • FIG. 7 is an example of the query graph of FIG. 3 after expanding each path from the query related to the output of the processing result of the query graph to the query related to data reception.
  • a straight line A701 and a straight line B702 each represent a route after development.
  • The query 5003 has an allowable transmission interval of 4 seconds, and the query 5007 has an allowable transmission interval of 30 seconds.
  • The greatest common divisor of these allowable transmission intervals, 2 seconds, is the allowable transmission interval of the entire query graph.
  • the number of communications can be reduced by receiving data for 2 seconds at a time, and the network load and power consumption can be suppressed.
  • the straight line allowable transmission interval can be set as the total allowable transmission interval of the query graph.
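  • As a small worked example of the greatest-common-divisor calculation, the following Python sketch reproduces the 2-second result from the 4-second and 30-second intervals. The function name `overall_interval` and the restriction to whole seconds are assumptions made for illustration.

```python
from functools import reduce
from math import gcd

def overall_interval(intervals_s):
    """Greatest common divisor of allowable transmission intervals.

    Assumes every interval is a positive whole number of seconds.
    """
    return reduce(gcd, intervals_s)

# Intervals of the queries 5003 and 5007 given in the text.
FIG7_INTERVAL = overall_interval([4, 30])
```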
  • FIG. 8A shows an expansion when there are a plurality of queries on the input side among the queries constituting the branch, the allowable transmission interval is set for the input side query, but the allowable transmission time is not set for the output side query.
  • Reference numerals 801, 802, and 803 denote queries. In 801, an allowable transmission interval of 2 seconds is set, and in 802, an allowable transmission interval of 6 seconds is set.
  • When there are a plurality of input-side queries, and an allowable transmission interval is set for the input-side queries but not for the output-side query, the greatest common divisor of the allowable transmission intervals of the input-side queries is set as the allowable transmission interval of the output-side query. Therefore, 2 seconds, which is the greatest common divisor of the 2-second allowable transmission interval of 801 and the 6-second allowable transmission interval of 802, is set as the allowable transmission interval of 803.
  • FIG. 8B shows an expansion in the case where there are a plurality of output side queries among the queries constituting the branch, the allowable transmission interval is not set for the input side query, but the allowable transmission interval is set for the output side query.
  • Reference numerals 804, 805, and 806 denote queries. In 805, an allowable transmission interval of 2 seconds is set, and in 806, an allowable transmission interval of 3 seconds is set.
  • When there are a plurality of output-side queries, and an allowable transmission interval is set for the output-side queries but not for the input-side query, the greatest common divisor of the allowable transmission intervals of the output-side queries is set as the allowable transmission interval of the input-side query. Accordingly, 1 second, which is the greatest common divisor of the 2-second allowable transmission interval of 805 and the 3-second allowable transmission interval of 806, is set as the allowable transmission interval of 804.
  • FIG. 8C shows an expansion when there are a plurality of queries on the input side among the queries constituting the branch, the allowable transmission interval is not set for the input side query, but the allowable transmission time is set for the output side query.
  • Reference numerals 807, 808, and 809 denote queries. In 809, an allowable transmission interval of 2 seconds is set.
  • As shown in FIG. 8C, when there are a plurality of input-side queries, and the allowable transmission interval is set for the output-side query but not for the input-side queries, the allowable transmission interval of the output-side query is used as is as the allowable transmission interval of the input-side queries. Accordingly, the 2-second allowable transmission interval of 809 is set as the allowable transmission interval of 807 and 808.
  • FIG. 8D shows an expansion in the case where there are a plurality of output side queries among the queries constituting the branch, the allowable transmission interval is set for the input side query, but the allowable transmission interval is not set for the output side query.
  • Reference numerals 810, 811, and 812 denote queries. In 810, an allowable transmission interval of 1 second is set.
  • As shown in FIG. 8D, when there are a plurality of output-side queries, and an allowable transmission interval is set for the input-side query but not for the output-side queries, the allowable transmission interval of the input-side query is used as is as the allowable transmission interval of the output-side queries. Therefore, the 1-second allowable transmission interval of 810 is set as the allowable transmission interval of 811 and 812.
  • the above-described calculation process may be further executed for a query for which the allowable transmission interval is not set.
  • If the allowable transmission interval is not set for the query immediately before or after a query for which it is set, and there is only one such preceding or following query, the allowable transmission interval of that preceding or following query may be set equal to that of the query for which it is already set.
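  • The four branch cases of FIGS. 8A to 8D reduce to two rules: when the several queries on one side of a branch all have intervals and the single query on the other side does not, the single query receives their greatest common divisor; when only the single query has an interval, the several queries inherit it unchanged. The Python sketch below illustrates this reading. The function name `propagate_across_branch`, the use of `None` for an unset interval, and the whole-second intervals are assumptions made for illustration.

```python
from functools import reduce
from math import gcd

def propagate_across_branch(many_side, single_side):
    """Apply the branch rules to one branch of a query graph.

    many_side: intervals (or None if unset) of the several queries
    on one side of the branch; single_side: interval (or None) of
    the single query on the other side.
    Returns the updated (many_side, single_side) pair.
    """
    if single_side is None and all(v is not None for v in many_side):
        # FIG. 8A / 8B: the single query gets the GCD of the others.
        return many_side, reduce(gcd, many_side)
    if single_side is not None and all(v is None for v in many_side):
        # FIG. 8C / 8D: the several queries inherit the value as is.
        return [single_side] * len(many_side), single_side
    return many_side, single_side  # nothing new can be set

FIG_8A = propagate_across_branch([2, 6], None)     # queries 801, 802 -> 803
FIG_8B = propagate_across_branch([2, 3], None)     # queries 805, 806 -> 804
FIG_8C = propagate_across_branch([None, None], 2)  # query 809 -> 807, 808
FIG_8D = propagate_across_branch([None, None], 1)  # query 810 -> 811, 812
```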
  • the transmission interval calculation unit 111 starts a calculation process of a data transmission interval from the computer 300 to the server 200.
  • The analysis of the log data may be started by a query graph execution request from an administrator of the management system of the management server 100, or may be started at the timing when a query graph is newly stored in the storage device 120.
  • the transmission interval calculation unit 111 sets the conversion number of seconds as an allowable transmission interval for the query that performs the RSTREAM conversion of the query graph 121.
  • S902 In the query graph in which the allowable transmission interval is set, it is determined whether there is a query for which a new allowable transmission interval can be set from the context of the query.
  • the allowable transmission interval of the query is set with reference to the context of the query. Once set, the process returns to S902.
  • S904 It is determined whether there is a branch in the query graph. When there is a branch, the process of S905 is performed, and when there is no branch, the process of S906 is performed.
  • S905 As described with reference to FIG. 7, the query graph is expanded into a straight line based on the branch. When the straight line is developed, the process returns to S904.
  • S906 As described with reference to FIG. 6, an allowable transmission interval of the straight line is calculated for each straight line after the expansion.
  • the minimum allowable transmission interval for queries in a straight line is defined as the allowable transmission interval for a straight line.
  • S907 It is determined whether an allowable transmission interval is set for all the straight lines after expansion. If it is set, the process of S908 is performed. If there is a straight line that is not set, the process of S910 is performed.
  • S908 The minimum value of the allowable transmission intervals of all the straight lines after expansion is set as the allowable transmission interval of the entire query graph. Note that the minimum allowable transmission interval for all queries may be selected as the allowable transmission interval for the entire query graph.
  • The transmission interval calculation unit 111 notifies the transmission interval transmission unit 112 of the allowable transmission interval of the entire query graph as the data transmission interval of log data from the computer 300 to the server 200. This completes the process.
  • (S910) The transmission interval calculation unit 111 notifies the transmission interval transmission unit 112 that the data transmission interval from the computer 300 to the server 200 cannot be set. This completes the process. When the allowable transmission interval for the entire query graph cannot be set, data is transmitted sequentially from the computer 300 to the server 200.
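  • The later steps of this flow (expanding the query graph into straight lines, taking the minimum per straight line, and then taking the minimum over all straight lines) can be sketched in Python as follows. The graph encoding as a successor dictionary, the query names, the interval values, and the function names are hypothetical; `None` stands for the case in which the data transmission interval cannot be set.

```python
def expand_paths(edges, source):
    """Enumerate every path from source to an output query.

    edges maps a query to the list of queries that consume its
    result; the graph is assumed to be acyclic.
    """
    successors = edges.get(source, [])
    if not successors:
        return [[source]]
    return [[source] + rest
            for nxt in successors
            for rest in expand_paths(edges, nxt)]

def data_transmission_interval(edges, source, allowable):
    """Return the interval for the whole graph, or None if unset."""
    line_intervals = []
    for path in expand_paths(edges, source):
        values = [allowable[q] for q in path
                  if allowable.get(q) is not None]
        if not values:
            return None  # a straight line has no interval (S910)
        line_intervals.append(min(values))  # per straight line (S906)
    return min(line_intervals)  # entire query graph (S908)

# Hypothetical graph: one receiving query feeding two branches.
EDGES = {"in": ["a", "b"], "a": ["out1"], "b": ["out2"]}
INTERVAL = data_transmission_interval(EDGES, "in", {"a": 2, "b": 3})
UNSET = data_transmission_interval(EDGES, "in", {"a": 2})
```

  • Here `INTERVAL` is 2 because both straight lines carry an interval, while `UNSET` is `None` because the straight line through `b` has none, which corresponds to falling back to sequential transmission.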
  • The flow until the transmission interval calculation unit 111 notifies the transmission interval transmission unit 112 of the data transmission interval from the computer 300 to the server 200 based on the query graph 121 will be specifically described with reference to FIGS. 10 and 11.
  • Reference numerals 1001 to 1028 in FIG. 10 all denote queries.
  • a query 1001 is a query that receives data
  • queries 1007 and 1008 are queries that output processing results.
  • the allowable transmission interval is set for each query as shown in FIG. 10B.
  • 1 second, which is the greatest common divisor of the 2-second allowable transmission interval of the query 1013 and the 3-second allowable transmission interval of the query 1014, is set as the allowable transmission interval.
  • 1 second, which is the greatest common divisor of the 6-second allowable transmission interval of the query 1015, the 4-second allowable transmission interval of the query 1016, and the 3-second allowable transmission interval of the query 1014, is set as the allowable transmission interval.
  • The 3-second allowable transmission interval of the query 1014 is set as the allowable transmission interval.
  • 1 second, which is the allowable transmission interval of the query 1022, is set as the allowable transmission interval.
  • FIG. 11 is a diagram in which FIG. 10C is expanded into a straight line based on the branch of the query graph and arranged as 1101, 1102, 1103, and 1104 from the top.
  • the allowable transmission interval of the straight line 1101 is calculated as 1 second which is the minimum value of the allowable transmission interval of the queries existing in the straight line.
  • the allowable transmission interval of the straight line 1102 is calculated as 1 second which is the minimum value of the allowable transmission interval of the queries existing in the straight line.
  • The allowable transmission interval of the straight line 1103 is calculated as 1 second, which is the minimum value of the allowable transmission intervals of the queries existing in the straight line.
  • the permissible transmission interval of the straight line 1104 is calculated as 1 second, which is the minimum value of the permissible transmission intervals of queries existing in the straight line.
  • the allowable transmission interval for each straight line is 1 second, 1 second, 1 second, and 1 second, and the minimum value of 1 second is calculated as the allowable transmission interval for the entire query graph.
  • The transmission interval calculation unit 111 notifies the transmission interval transmission unit 112 of 1 second as the data transmission interval from the computer 300 to the server 200. Even if the computer 300 transmits one second's worth of data to the server 200 in a single batch, the output of results is not delayed. In this way, compared with transmitting data sequentially, the number of communications can be reduced within the constraints of the query graph, and network load and power consumption can be suppressed.
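  • The effect of batching can be illustrated with a short Python sketch that groups records into one transmission per interval. The record layout, the sample values, and the function name `batch_by_interval` are hypothetical, and the sketch does not model how the data transmission unit 331 actually sends data.

```python
def batch_by_interval(records, interval_s):
    """Group (timestamp, payload) records into one batch per interval.

    Each returned batch stands for a single transmission from the
    computer to the server.
    """
    batches = {}
    for ts, payload in records:
        batches.setdefault(int(ts // interval_s), []).append(payload)
    return [batches[k] for k in sorted(batches)]

# Five hypothetical records; timestamps are in seconds.
RECORDS = [(0.1, "a"), (0.4, "b"), (0.9, "c"), (1.2, "d"), (2.7, "e")]
BATCHES = batch_by_interval(RECORDS, 1)
```

  • With a 1-second interval, the five records are sent in three transmissions instead of five.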
  • a query 1201 is a query that receives data
  • queries 1207 and 1208 are queries that output processing results.
  • FIG. 13 is a diagram in which 1301, 1302, 1303, and 1304 are arranged from the top of the query graph by expanding FIG. 12B into straight lines based on branches.
  • For some of the straight lines, the allowable transmission interval is calculated as 2 seconds, but for 1303 and 1304 it cannot be calculated. Therefore, the allowable transmission interval of the entire query graph cannot be set, and the transmission interval calculation unit 111 notifies the transmission interval transmission unit 112 that data is to be transmitted sequentially from the computer 300 to the server 200.
  • the query group assigning unit 113 can analyze and output the acquired log data without delay by assigning a query group so that each server 200 can process data within the data transmission interval.
  • FIG. 14 shows an example of the flow of query group assignment processing to the server 200.
  • the start of the query group assignment process may be the timing at which the transmission interval calculation unit 111 calculates the data transmission interval, or may be started by an instruction from an administrator of the management system of the management server 100.
  • S1401 In the query graph, the transmission amount of data between each query is calculated.
  • S1402 The query from the start of data output to the query with the smallest data transmission amount between queries is set as a query group candidate to be assigned to the server.
  • S1403 It is determined based on the server configuration information 122 whether there exists a server that can process query group candidates to be allocated to the server within the calculated data transmission interval.
  • S1404 If there is a server that can process the query group candidates to be assigned to the server within the calculated data transmission interval, the processing of S1406 is performed. Otherwise, the processing of S1405 is performed.
  • S1405 The query having the next smallest data transmission amount between queries is set as a query group candidate to be allocated to the server.
  • S1406 The query group candidate to be allocated is allocated to the server determined in S1404.
  • S1407 It is determined whether all queries in the query graph have been allocated to servers. If they have, the query group assignment process is terminated. If not, the process of S1402 is performed on the queries in the query graph that have not yet been assigned to a query group.
  • the data output can be processed without delay.
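  • The flow of S1401 to S1407 can be sketched as a greedy loop. The following is a minimal sketch, not the patented implementation, under several assumptions that are not in the source: the queries form a linear chain, each server hosts exactly one query group, all servers are equally fast, and the function name, parameter names, and sample values are hypothetical.

```python
def assign_query_groups(est_time_ms, edge_amount, servers, interval_ms):
    """Greedy sketch of S1401-S1407 for a linear chain of queries.

    est_time_ms[i]  estimated processing time of query i per transmission interval
    edge_amount[i]  data transmission amount between query i and query i + 1 (S1401)
    servers         names of available servers (assumption: one query group each)
    interval_ms     calculated data transmission interval
    """
    assignment = {}
    free_servers = list(servers)
    start = 0
    last = len(est_time_ms) - 1
    while start <= last:  # S1407: repeat until every query is assigned
        if not free_servers:
            raise RuntimeError("no server left for the remaining queries")
        # S1402/S1405: candidate end positions, smallest cut traffic first.
        # Ending at `last` cuts no edge, so nothing is sent between groups.
        cuts = sorted(range(start, last), key=lambda i: edge_amount[i])
        for end in [last] + cuts:
            # S1403/S1404: does the candidate fit within the interval?
            if sum(est_time_ms[start:end + 1]) <= interval_ms:
                break
        else:
            raise RuntimeError("a single query exceeds the interval")
        # S1406: allocate the candidate to the next free server.
        assignment[free_servers.pop(0)] = list(range(start, end + 1))
        start = end + 1
    return assignment


# Hypothetical values: five queries, 1-second interval.
groups = assign_query_groups(
    est_time_ms=[250, 250, 250, 250, 1000],
    edge_amount=[40, 30, 20, 10],
    servers=["server1", "server2"],
    interval_ms=1000,
)
print(groups)
```

  • With these sample values, the whole chain needs 2 seconds per interval and is rejected, so the chain is cut at the edge with the smallest transmission amount and the last query goes to the second server.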
  • FIG. 15 shows a query to be assigned.
  • Reference numerals 1501 to 1505 denote queries.
  • the data transmission interval can be calculated as 1 second.
  • FIG. 16 is a specific example of the configuration information 122 in FIG. 1, and stores, for example, the number of assumed processing tuples of a query, the processing time for each tuple, and the estimated processing time.
  • the number of assumed processing tuples can be calculated from, for example, input tuple and query information.
  • the product of the number of column value types in the Group by specified column of the tuple that is input to the query within the specific time is passed to the subsequent query.
  • the number of processing tuples in the preceding query becomes the number of processing tuples in the subsequent query as it is.
  • the number of processing tuples passed to subsequent queries also decreases in SUM processing and AVG processing for one column.
  • the estimated processing time can be calculated from, for example, the actual measurement processing time of the query.
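  • The tuple-count rules above can be illustrated with a short sketch. The function names and the column example are hypothetical, and the sketch assumes that the product rule applies to a query that aggregates with Group by, which the surrounding text suggests but does not state in full.

```python
from math import prod


def group_by_output_tuples(distinct_values_per_column):
    """Tuples passed downstream per window by a Group by aggregation:
    the product of the distinct-value counts of the Group by columns."""
    return prod(distinct_values_per_column)


def pass_through_output_tuples(input_tuples):
    """A query that forwards every tuple passes its input count on unchanged."""
    return input_tuples


# Hypothetical example: Group by (host, status) with 10 hosts and 3 statuses.
print(group_by_output_tuples([10, 3]))   # 30
print(pass_through_output_tuples(500))   # 500
```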
  • the data transmission amount between each query is calculated, and a query group candidate that minimizes the data transmission amount between query groups is obtained.
  • a method for obtaining the minimum query group will be described later.
  • 15B shows an example in which all queries are processed in one query group.
  • Processing time is calculated for 15B, which has the smallest data transmission amount, and it is determined whether there is a server that can process it within the allowable transmission interval.
  • the processing time is calculated using the information in FIG.
  • the sum of the estimated processing times of the queries constituting the query group is the estimated processing time of the query group.
  • the total estimated processing time of data received per second is 2 seconds, which is longer than the calculated data transmission interval. Therefore, even if data is received every second, the received data cannot be processed, and unprocessed data is accumulated indefinitely.
  • 15C is a query group allocation with the next smallest data transmission amount between query groups after 15B.
  • the total estimated processing time of the query group assigned to the server 1 is 1 second, and the input data amount and the processable amount are balanced.
  • The query group assigning unit 113 assigns the query 1505, which has not yet been assigned to a query group, to a query group on another server. With the above operation, data can be processed without causing a delay in the output of the processing result.
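  • The reason a 2-second query group cannot keep up with a 1-second transmission interval can be checked with simple arithmetic. The sketch below uses a simplified model that is an assumption of this example: one batch arrives per interval, and the server can do at most one interval's worth of work per interval. The function name is hypothetical.

```python
def backlog_ms(intervals_elapsed, interval_ms, group_time_ms):
    """Unprocessed work (in ms of processing time) after a number of intervals.

    Each interval adds group_time_ms of work while the server can finish
    at most interval_ms of it, so only the surplus accumulates.
    """
    surplus = max(0, group_time_ms - interval_ms)
    return intervals_elapsed * surplus


print(backlog_ms(10, 1000, 2000))  # one group for all queries: 10000 ms behind
print(backlog_ms(10, 1000, 1000))  # group of 1 s total: 0, input and capacity balance
```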
  • FIG. 17 shows an example of the flow of query group extraction processing with small inter-query group traffic.
  • the start of the query group extraction process may be the timing at which the transmission interval calculation unit 111 calculates the data transmission interval, or may be started by an instruction from the administrator of the management system of the management server 100.
  • “query group candidate having the largest number of queries for performing RSTREAM processing including Group by on the data input side” is extracted, and “the number of RSTREAM processing including Group by on the data input side” is stored.
  • FIG. 18 shows a query to be assigned.
  • Reference numerals 1801 to 1806 and 1811 to 1816 denote queries.
  • FIGS. 19A and 19B are specific examples in which data does and does not reciprocate between query groups, respectively. When all the queries in the query graph are assigned to one query group, data transmission between query groups is not necessary, and therefore the data transmission amount is minimized.
  • Processing starts when all queries shown in 18A are included in one query group.
  • 18A is passed to the query group assignment process as the query group with the smallest inter-query group communication amount, and when the assignment cannot be performed, the query group with the next smallest inter-query group transmission amount is extracted.
  • the query group with the smallest inter-query group transmission amount is 18B that does not include the 1815 query that is “RSTREAM processing including Group by”.
  • Query group assignment processing is performed for this query group. If the 18B query group cannot be assigned, a query group with few “RSTREAM processes including Group by” is extracted next. In this extraction, it is desirable that the data round trip between query groups does not occur as shown in FIG. Queries 1901 and 1902 are related to data output.
  • In FIG. 19A, data reciprocates between query group 1 and query group 2.
  • data does not reciprocate between the query group 3 and the query group.
  • In the query graph 19A, data is exchanged twice between groups, that is, between servers, before the data is output, whereas in the query graph 19B, data exchange between servers occurs only once.
  • One example of a method for determining whether data makes a round trip between query groups is to check, for each path from data input to data output, whether the same query group is passed through multiple times.
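  • The round-trip check described above can be sketched as follows. This is an illustrative sketch only: the function names, the query names "a", "b", "c", and the group labels are hypothetical and are not taken from FIG. 19.

```python
def path_revisits_group(path, group_of):
    """Return True if the path leaves a query group and later re-enters it."""
    seen = set()
    previous = None
    for query in path:
        group = group_of[query]
        if group != previous:
            if group in seen:
                return True  # the same group is entered a second time
            seen.add(group)
            previous = group
    return False


def has_round_trip(paths, group_of):
    """Check every path from data input to data output."""
    return any(path_revisits_group(path, group_of) for path in paths)


paths = [["a", "b", "c"]]
# Data goes group 1 -> group 2 -> group 1: a round trip.
print(has_round_trip(paths, {"a": 1, "b": 2, "c": 1}))  # True
# Data goes group 3 -> group 3 -> group 4: a single crossing.
print(has_round_trip(paths, {"a": 3, "b": 3, "c": 4}))  # False
```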
  • FIG. 20 is an example in which the administrator of the management server 100 confirms the query graph 121 through the output device 160.
  • a query graph 2002 represents a query graph for which a data transmission interval is calculated.
  • a calculation result 2003 is a calculation result of the data transmission interval.
  • the tuning point 2004 shows a method for extending the data transmission interval.
  • the calculation result 2003 indicates that the data transmission interval is 1 second and that the allowable transmission interval between the straight line A and the straight line B is 1 second.
  • the tuning point 2004 indicates that the data transmission interval can be increased to 2 seconds when the allowable transmission interval of the query a and the query b is 2 seconds.
  • The administrator of the management system of the management server 100 can confirm the query graph 121 on the interface 2001, change queries within a range that satisfies the business requirements, and set or change the allowable transmission intervals, thereby controlling the data transmission interval of log data. By controlling the data transmission interval of log data from the computer 300 to the server 200, the administrator can reduce the number of log data transmissions and suppress power consumption.
  • The overall allowable transmission interval of the query graph can also be calculated by a method other than the straight-line expansion method of the first embodiment. Specifically, based on the query graph, the routes between the query that receives the data and the query that outputs the processing result are traced, and it is checked whether every route contains a query for which an allowable transmission interval can be set. When an allowable transmission interval can be set on all routes, the greatest common divisor of the allowable transmission intervals of the routes can be set as the allowable transmission interval of the entire process.
  • FIG. 21 is an example of a query graph after setting an allowable transmission interval for each query.
  • the number on the query indicates the allowable transmission interval.
  • 2111, 2112, 2113, 2114, 2115, 2116, 2117, 2121, 2122, 2123, 2124, 2131, 2132, 2133, 2134, 2135 represent queries.
  • An allowable transmission interval of 2 seconds is set for 2114, an allowable transmission interval of 1 second for 2115, an allowable transmission interval of 1 second for 2116, an allowable transmission interval of 2 seconds for 2123, and an allowable transmission interval of 3 seconds for 2133.
  • The data transmission interval is calculated for the processing of the computer 1 (2110). It is checked whether an allowable transmission interval can be set for each path from the output 2140, which is the query that outputs the processing result based on the query graph, to 2112, which is the query that receives data. There are three routes from 2140 to 2111: 2115 and 2116 have an allowable transmission interval of 1 second, and 2114 has an allowable transmission interval of 2 seconds. One second, which is the greatest common divisor of the allowable transmission intervals of the paths, is set as the data transmission interval of the computer 1 process.
  • The data transmission interval is calculated for the processing of the computer 2. It is checked whether an allowable transmission interval can be set for each path from the output 2140, which is the query that outputs the processing result based on the query graph, to 2122, which is the query that receives data. There is one route from 2140 to 2122, and an allowable transmission interval of 2 seconds is set in 2123. This allowable transmission interval of 2 seconds is set as the data transmission interval of the computer 2 process.
  • The data transmission interval is calculated for the processing of the computer 3. It is checked whether an allowable transmission interval can be set for each path from the output 2140, which is the query that outputs the processing result based on the query graph, to 2132, which is the query that receives data. There are two routes from 2140 to 2132, and an allowable transmission interval of 3 seconds is set in 2133. However, since an allowable transmission interval cannot be set for the route through 2140, 2134, and 2132, a data transmission interval cannot be set for the processing of the computer 3.
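  • The three results of the FIG. 21 example can be written as a worked calculation. The symbol T for the data transmission interval of each computer's process is notation introduced here for illustration, not notation from the source.

```latex
\begin{aligned}
T_{\text{computer 1}} &= \gcd(1,\ 1,\ 2) = 1\ \text{s} \\
T_{\text{computer 2}} &= \gcd(2) = 2\ \text{s} \\
T_{\text{computer 3}} &\ \text{cannot be set, because one of the two routes has no allowable transmission interval}
\end{aligned}
```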
  • FIG. 22 shows an example of the flow of data transmission interval calculation processing.
  • the transmission interval calculation unit 111 starts a calculation process of a data transmission interval from the computer 300 to the server 200.
  • The analysis of the log data may be started by a query graph execution request from an administrator of the management system of the management server 100, or at the timing when a query graph is newly stored in the storage device 120.
  • the transmission interval calculation unit 111 sets an allowable transmission interval for each query in the query graph.
  • the transmission interval calculation unit 111 determines whether there is an unconfirmed route in which an allowable transmission interval can be set in the query graph.
  • the transmission interval calculation unit 111 checks whether an allowable transmission interval can be set for one of the paths of the query graph, and calculates the allowable transmission interval if it can be set.
  • (S2205) The transmission interval calculation unit 111 determines whether or not an allowable transmission interval is set for all routes. If the determination result is affirmative, S2206 is executed, and if the determination result is negative, S2207 is executed.
  • (S2206) The transmission interval calculation unit 111 sets the greatest common divisor of the allowable transmission intervals of the paths as the allowable transmission interval of the entire query graph.
  • the transmission interval calculation unit 111 notifies the transmission interval transmission unit 112 of the allowable transmission interval of the entire query graph as the data transmission interval of log data from the computer 300 to the server 200. This completes the process.
  • the transmission interval calculation unit 111 notifies the transmission interval transmission unit 112 that the data transmission interval of data from the computer 300 to the server 200 cannot be set. This completes the process.
  • data is sequentially transmitted from the computer 300 to the server 200.
  • S2205 may be performed without performing S2202 and S2203. As a result, the procedure up to the calculation of the data transmission interval can be reduced.
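  • The path-based calculation flow can be sketched as a traversal of the query graph. This is a minimal sketch under stated assumptions, not the flow of FIG. 22 itself: the graph is an acyclic adjacency list from the receiving query toward the output, intervals are whole seconds, and when one path contains several queries with an allowable transmission interval they are combined with the greatest common divisor as well (a conservative choice made for this example). All names and the sample graph are hypothetical.

```python
from functools import reduce
from math import gcd


def all_paths(graph, start, goal):
    """Yield every path from start to goal in an acyclic adjacency-list graph."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            yield path
            continue
        for successor in graph.get(node, []):
            stack.append(path + [successor])


def data_transmission_interval(graph, intervals, receive, output):
    """Return the interval in seconds, or None when it cannot be set."""
    per_path = []
    for path in all_paths(graph, receive, output):
        on_path = [intervals[q] for q in path if q in intervals]
        if not on_path:
            return None  # one path has no allowable interval: send sequentially
        per_path.append(reduce(gcd, on_path))
    return reduce(gcd, per_path) if per_path else None


graph = {"recv": ["q1", "q2", "q3"], "q1": ["out"], "q2": ["out"], "q3": ["out"]}
print(data_transmission_interval(graph, {"q1": 2, "q2": 1, "q3": 1}, "recv", "out"))  # 1
print(data_transmission_interval(graph, {"q1": 2, "q2": 1}, "recv", "out"))           # None
```

  • A result of None corresponds to the case in which the data transmission interval cannot be set and data is transmitted sequentially.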
  • the present invention is not limited to the embodiment described above. For example, a case where the present invention is applied to a factory IoT system will be described.
  • Data is collected from sensor devices installed in the factory, and equipment failure is detected based on the query graph 121. Because the sensor devices are installed in dangerous places, many of them are battery-driven and must send the acquired data to the server wirelessly.
  • By setting the data transmission interval based on the query graph 121, it is possible to reduce the number of data transmissions, suppress battery consumption, and reduce the frequency of battery replacement.
  • 100 management server; 110 memory; 111 transmission interval calculation unit; 112 transmission interval transmission unit; 113 query group allocation unit; 114 query group execution result processing unit; 120 storage device; 121 query graph; 122 configuration information; 130 processor; 140 network interface; 150 input device; 160 output device; 200 server; 210 network interface; 220 processor; 230 memory; 231 query group execution unit; 300 computer; 310 network interface; 320 processor; 330 memory; 331 data transmission unit; 340 storage device; 341 log data; 400 communication network

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

When a query graph includes processing that is executed every few seconds on stream data, the output of the processing result is not affected even when the data is processed collectively in units of up to that number of seconds (the conversion seconds). The present invention uses this characteristic of stream data processing, extracts the number of conversion seconds from each query, and calculates the maximum value of the data transmission interval of the stream data on the basis of the extracted numbers of conversion seconds and the query graph.
PCT/JP2016/068942 2016-06-27 2016-06-27 Management device, execution environment setting method, stream data processing system WO2018002976A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2016/068942 WO2018002976A1 (fr) 2016-06-27 2016-06-27 Management device, execution environment setting method, stream data processing system
JP2018524589A JP6626198B2 (ja) 2016-06-27 2016-06-27 管理装置、実行環境設定方法、ストリームデータ処理システム

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2016/068942 WO2018002976A1 (fr) 2016-06-27 2016-06-27 Management device, execution environment setting method, stream data processing system

Publications (1)

Publication Number Publication Date
WO2018002976A1 true WO2018002976A1 (fr) 2018-01-04

Family

ID=60786253

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/068942 WO2018002976A1 (fr) 2016-06-27 2016-06-27 Management device, execution environment setting method, stream data processing system

Country Status (2)

Country Link
JP (1) JP6626198B2 (fr)
WO (1) WO2018002976A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022113578A1 (fr) * 2020-11-27 2022-06-02 日本電気株式会社 Dispositif de gestion, système, procédé de gestion et support d'enregistrement

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010217968A (ja) * 2009-03-13 2010-09-30 Hitachi Ltd ストリームデータ処理システムにおける障害回復方法、計算機システム及び障害回復プログラム
JP2014081759A (ja) * 2012-10-16 2014-05-08 Hitachi Ltd ストリームデータ処理方法、ストリームデータ処理装置及びプログラム
WO2014188500A1 (fr) * 2013-05-20 2014-11-27 富士通株式会社 Programme et système de parallélisation du traitement d'un flux de données
WO2014204489A2 (fr) * 2013-06-21 2014-12-24 Hitachi, Ltd. Procédé de traitement de données de flux à réglage temporel

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5862245B2 (ja) * 2011-11-30 2016-02-16 富士通株式会社 配置装置、配置プログラムおよび配置方法

Also Published As

Publication number Publication date
JPWO2018002976A1 (ja) 2019-04-11
JP6626198B2 (ja) 2019-12-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16907210

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2018524589

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16907210

Country of ref document: EP

Kind code of ref document: A1