WO2015008379A1 - Data processing apparatus and data processing method - Google Patents
Data processing apparatus and data processing method
- Publication number
- WO2015008379A1 (PCT/JP2013/069630)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- stream
- data
- processing
- program
- batch
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
Definitions
- The present invention relates to a data processing apparatus and a data processing method for processing data.
- In time series data monitoring processing, rules are defined in advance for time series data such as sensor values and logs, and processing such as filtering, aggregation, anomaly detection, and future prediction of the time series data is executed according to those rules.
- Examples of time series data monitoring processing include monitoring of a plant in a factory and monitoring of a server.
- Plant monitoring in a factory is a process that acquires the values of sensors, such as temperature and voltage sensors, attached to a machine, extracts singular points from the time series of sensor values spanning several hours to a day, and judges them to be abnormal.
- Server monitoring is a process that obtains the CPU (Central Processing Unit) and hard disk usage or the network packet volume from the server log, and detects abnormalities by monitoring time-series changes over periods ranging from several seconds to several hours.
- Methods for executing a time series data monitoring processing program include batch processing and stream processing.
- A program for batch processing (hereinafter referred to as a "batch program") collectively inputs time series data stored in a file or database as vector data and outputs the processing results as vector data.
- Examples of middleware that supports execution of a batch program include the batch processing platform disclosed in Patent Document 1 below.
- The batch processing platform is middleware that schedules, starts, and stops batch programs. Batch processing is used for applications, such as factory plant monitoring, whose response-time requirements are relaxed but whose processing throughput and cost requirements are demanding.
- A stream processing program (a "stream program") sequentially processes stream data delivered from moment to moment and sequentially outputs the processing results as stream data.
- Examples of middleware that supports execution of a stream program include the stream processing platform disclosed in Non-Patent Document 1.
- The stream processing platform is middleware that schedules, starts, and stops stream programs.
- Stream processing is used for applications, such as server monitoring, whose response-time requirements are strict but whose processing throughput and cost requirements are relaxed.
- Patent Document 1 supports operating a stream program on a batch processing platform. To do so, Patent Document 1 designates a time range of input data for the data accumulated on the batch processing platform, converts the data within that range into stream data, and executes the stream program. Conversely, Non-Patent Document 1 supports operating a batch program on a stream processing platform. To do so, Non-Patent Document 1 collects a plurality of stream data into data blocks called SigSegs on the stream processing platform and executes a batch program that uses these data blocks as input and output.
- SigSegs: a data block
- However, when a batch program is executed on the stream processing platform of Non-Patent Document 1, no consideration is given to providing the same stream data in a plurality of data blocks, that is, to overlapping the input data of the batch program. Therefore, the stream processing platform of Non-Patent Document 1 cannot execute a batch program that holds a fixed number of time series data in a window and slides the window during processing.
- For batch processing and stream processing, an object of the present invention is to make it possible to execute, on one processing platform, the program of the other processing with overlapping time series data, without changing the code or algorithm of that program as executed on the other processing platform.
- In one aspect, the data processing apparatus includes a processor and a memory that stores a stream program that executes stream processing, a batch program that executes batch processing, and a stream processing control program that controls the stream program, and the data processing method is executed by this data processing apparatus. Using the stream processing control program, the processor executes the following procedures on time-series stream data:
- a first generation procedure that generates, for a first stream data group in time series, first vector data whose elements are the individual stream data of the first stream data group; a second generation procedure that generates, for a second stream data group in time series that starts from a stream data item in the middle of the first stream data group and has the same number of data as the first stream data group, second vector data whose elements are the individual stream data of the second stream data group; and a control procedure that inputs the first vector data and the second vector data generated by the first and second generation procedures to the batch program to execute the batch processing.
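The first and second generation procedures amount to cutting overlapping groups out of a time-ordered stream, where each group starts in the middle of the previous one. The following Python sketch illustrates that idea under stated assumptions: stream data are modeled as (time, value) tuples, and the helper name `make_overlapping_vectors` and its `size`/`overlap` parameters are hypothetical, not taken from the source.

```python
from typing import List, Sequence, Tuple

StreamData = Tuple[int, float]  # (time in seconds, measured value)


def make_overlapping_vectors(stream: Sequence[StreamData], size: int,
                             overlap: int) -> List[List[StreamData]]:
    """Cut a time-ordered stream into vectors of `size` elements.

    Each vector starts in the middle of the previous one, so consecutive
    vectors share exactly `overlap` elements (hypothetical helper).
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    vectors: List[List[StreamData]] = []
    start = 0
    # Only complete groups become vectors; a trailing partial group
    # would wait for more stream data.
    while start + size <= len(stream):
        vectors.append(list(stream[start:start + size]))
        start += step
    return vectors
```

For example, with `size=4` and `overlap=2`, a stream of ten readings yields four vectors starting at the 1st, 3rd, 5th, and 7th readings; each vector would then be passed to the batch program as in the control procedure.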
- In another aspect, the data processing apparatus includes a processor and a memory that stores a batch program that executes batch processing, a stream program that executes stream processing, and a batch processing control program that controls the batch program, and the data processing method is executed by this data processing apparatus. Using the batch processing control program, the processor executes the following procedures on vector data consisting of an element sequence of per-time values:
- a first generation procedure that generates, from the vector data, a first stream data group by splitting each element of a first element group in the element sequence into time-series stream data; a second generation procedure that generates a second stream data group by splitting into time-series stream data each element of a second element group that starts from an element in the middle of the first element group and has the same number of elements as the first element group; and a control procedure that inputs the first and second stream data groups generated by the first and second generation procedures to the stream program to execute stream processing.
- In the control procedure, when the batch processing control program inputs the first stream data group to the stream program, it acquires the stream data of a third stream data group, which is the execution result of the stream processing, and then inputs the second stream data group to the stream program and performs the control.
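The reverse direction can be sketched the same way: split the vector's element sequence into overlapping groups and feed each group, element by element, to a stream program. This is a minimal sketch assuming elements are (time, value) tuples and the stream program is any callable that consumes an iterator; `split_into_stream_groups`, `run_stream_program`, and the example stream program `consecutive_diff` are hypothetical names.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Element = Tuple[int, float]  # (time in seconds, value)


def split_into_stream_groups(vector: Sequence[Element], size: int,
                             overlap: int) -> List[List[Element]]:
    """Split an element sequence into groups of `size` elements; each group
    starts in the middle of the previous one and shares `overlap` elements."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    return [list(vector[s:s + size])
            for s in range(0, len(vector) - size + 1, step)]


def run_stream_program(groups: List[List[Element]],
                       stream_program: Callable[[Iterator[Element]], List[Element]]
                       ) -> List[List[Element]]:
    """Feed each group to the stream program one element at a time, in time
    order, and collect the output stream of each run."""
    return [stream_program(iter(group)) for group in groups]


def consecutive_diff(items: Iterable[Element]) -> List[Element]:
    """Example stream program: emit the change from the previous value."""
    out: List[Element] = []
    prev = None
    for time, value in items:
        if prev is not None:
            out.append((time, value - prev))
        prev = value
    return out
```

Because the second group repeats the tail of the first, the stream program gets the context it needs at the start of each run; this sketch keeps each run's output separately and does not deduplicate results for the shared times.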
- According to the present invention, the program of the other processing can be executed on one processing platform with overlapping time series data, without changing the code or algorithm of that program as executed on the other processing platform.
- FIG. 33 is an explanatory diagram illustrating an example of the calculation state storage area illustrated in FIG. 32. Other drawings include a flowchart showing an example of a processing procedure performed by a second static determination unit, a flowchart showing an example of a processing procedure performed by a stream execution monitoring unit, a flowchart showing an example of a processing procedure performed by a second dynamic determination unit, a flowchart illustrating an example of a processing procedure performed by the input data/vector TO stream conversion unit illustrated in FIG. 32, an explanatory diagram showing an example of conversion from vector data to stream data, and a flowchart illustrating an example of a processing procedure performed by the output data/stream TO vector conversion unit illustrated in FIG.
- Because the program of the other processing is executed with overlapping time series data, it can be executed on one processing platform without changing its code or algorithm as used on the other processing platform.
- In the following, an example in which a batch program is executed on a stream processing platform (Example 1) and an example in which a stream program is executed on a batch processing platform (Example 2) are described.
- In the following description, a "program" or a "processing platform" may be described as the subject of an operation.
- However, since a program or processing platform performs its defined processing by being executed by the processor while using the memory and a communication port (communication control device), the description may also be given with the processor as the subject.
- Processing disclosed with a program as the subject may be processing performed by a computer. Part or all of a program may also be realized by dedicated hardware.
- Example 1: FIGS. 1 and 2 are explanatory diagrams illustrating examples of executing a batch program on a stream processing platform.
- FIG. 1 shows an execution example without data overlap.
- In FIG. 1, the batch program has a program configuration with a window width of four data items and a slide count (the number of data items by which the window is shifted) of two. The unit of time is seconds, as an example.
- The stream processing platform executes a predetermined calculation while sliding a window of four stream data items by two items at a time.
- The stream processing platform converts the stream data sequence 101 from time 1:00 to 1:03 into vector data 102, which is a data block, by stream TO vector conversion 100. The stream processing platform then gives the vector data 102 to the batch program BP, and the batch program BP executes the calculation using the vector data 102. Since the slide count is two, the next target is the stream data sequence 103 at time 1:05.
- The stream data sequence 103 at time 1:05 overlaps with the stream data sequence 101 because it includes the stream data at times 1:02 and 1:03. Since this stream processing platform does not support data overlap, the stream data sequence 103 cannot be converted into the vector data 104 by the stream TO vector conversion 100. Therefore, the stream processing platform cannot give the vector data 104 to the batch program BP, and the batch program BP cannot execute the calculation using the vector data 104. Since the slide count is two, the next target is the stream data sequence 105 at time 1:07.
- The stream data sequence 105 at time 1:07 does not overlap with the stream data in the stream data sequence 101. Therefore, even though the stream processing platform does not support overlap in the converted vector data, the stream data sequence 105 is converted into the vector data 106 by the stream TO vector conversion 100. The stream processing platform then gives the vector data 106 to the batch program BP, and the batch program BP executes the calculation using the vector data 106.
- FIG. 2 shows an execution example with data overlap.
- The program configuration of the batch program is the same as that shown in FIG. In FIG. 2, the response time, which is a platform requirement, is 16 seconds. The response time is the time from when stream data is input until its processing is completed.
- The stream processing platform determines the input data size and the overlap width based on the program configuration and the platform requirement.
- The input data size is the number of stream data included in the vector data generated by the conversion. Here, it is eight.
- The overlap width is two, as in FIG.
- The stream processing platform converts the stream data string 201 from time 0:56 to 1:03 into vector data 202, which is a data block, by stream TO vector conversion 100.
- The stream processing platform gives the vector data 202 to the batch program BP, and the batch program BP executes the calculation using the vector data 202. Since the slide count is two, the next target is the stream data string 203 at time 1:05.
- Since the stream data string 203 at time 1:05 includes the stream data at times 0:58 to 1:03, its overlap with the stream data string 201 is six items. Because this exceeds the set overlap width of two, the stream processing platform does not convert the stream data string 203 into vector data at time 1:05. Since the slide count is two, the next target is the stream data string 205 at time 1:07.
- Since the stream data string 205 at time 1:07 includes the stream data at times 1:00 to 1:03, its overlap with the stream data string 201 is four items. Because this still exceeds the set overlap width of two, the stream processing platform does not convert the stream data string 205 into vector data at time 1:07 either. Since the slide count is two, the next target is the stream data string 207 at time 1:09.
- Since the stream data string 207 at time 1:09 includes the stream data at times 1:02 and 1:03, its overlap with the stream data string 201 is two items. Because this matches the set overlap width of two, the stream processing platform converts the stream data string 207 into vector data 208 by the stream TO vector conversion 100 at time 1:09. The stream processing platform then gives the vector data 208 to the batch program BP, and the batch program BP executes the calculation using the vector data 208.
- In this way, the data size of the vector data is determined so that only as many data as the set overlap width overlap. In the example of FIG. 2, no vector data is generated at times 1:05 and 1:07, because the data those windows would cover is already covered by the vector data 202 and 208. Since generating vector data whose overlap exceeds the overlap width is suppressed, the processing load can be reduced.
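The selection rule in FIGS. 1 and 2 can be replayed with a short simulation. This sketch assumes one stream datum arrives per second and writes times as seconds (1:00 becomes 60); the function `select_windows` and its parameters are hypothetical. A candidate window ending every `slide` seconds is converted only if its overlap with the last converted window equals the set overlap width (zero in FIG. 1).

```python
from typing import List, Tuple


def select_windows(first_end: int, last_end: int, size: int, slide: int,
                   overlap: int) -> List[Tuple[int, int]]:
    """Return the inclusive (start, end) time ranges converted into vector
    data, assuming one stream datum per second."""
    converted: List[Tuple[int, int]] = []
    end = first_end
    while end <= last_end:
        start = end - size + 1
        if not converted:
            converted.append((start, end))  # first full window is always used
        else:
            _, prev_end = converted[-1]
            shared = max(0, prev_end - start + 1)  # data in both windows
            if shared == overlap:
                converted.append((start, end))
            # otherwise the window overlaps too much and is skipped
        end += slide
    return converted


# FIG. 1: window of 4, slide 2, no overlap -> 1:00-1:03 and 1:04-1:07.
print(select_windows(63, 67, size=4, slide=2, overlap=0))
# FIG. 2: input size 8, slide 2, overlap 2 -> 0:56-1:03 and 1:02-1:09.
print(select_windows(63, 69, size=8, slide=2, overlap=2))
```

Testing for an exact match assumes the slide count divides the input size minus the overlap width, as it does in both figures; otherwise no later window would ever share exactly that many data.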
- FIG. 3 is a system configuration diagram illustrating an example of the stream processing system 300.
- The stream processing system 300 has a configuration in which a client 301, a data source 302, and a stream processing server 303 are communicably connected via a network 304.
- The network 304 may be a local area network (LAN) connected by Ethernet (registered trademark), optical fiber, or the like, or a wide area network (WAN), including the Internet, that is slower than a LAN.
- The client 301, the data source 302, and the stream processing server 303 may each be any computer system, such as a personal computer (PC) or a blade computer system.
- PC: personal computer
- The client 301 is a computer that executes registration processing for the stream processing server 303. Details of the registration processing are described later.
- The data source 302 supplies a series of time-series data to be processed to the stream processing server 303; examples include the factory plant and the server described above.
- In the case of the plant, the values of sensors such as temperature and voltage sensors attached to a machine are the time-series data.
- In the case of the server, for example, the CPU and hard disk usage obtained from the server log, or the packet volume of the network 304, is the time-series data.
- The stream processing server 303 is a computer in which a CPU 311, a memory 312, an I/O interface 313, and a storage 314 are coupled via a bus 315.
- The stream processing server 303 accesses the network 304 via the I/O interface 313. The stream processing server 303 can also store processing results, intermediate results of processing, and setting data necessary for system operation in the nonvolatile storage 314.
- Here, the storage 314 is directly connected via the I/O interface 313, but it may instead be located outside the stream processing server 303 and connected through the I/O interface 313 and the network 304.
- A stream processing platform 321 is mapped to the memory 312.
- The stream processing platform 321 is middleware onto which general stream processing modules are mapped, such as modules that start, stop, and schedule a stream program group 331 consisting of one or more stream programs.
- In addition, a batch program input/output static determination unit 332, a batch program input/output dynamic determination unit 333, and a batch program execution unit 335 including a batch program group 334 consisting of one or more batch programs are mapped onto the stream processing platform 321.
- FIG. 4 is an explanatory diagram showing an example of a stream program in the stream program group 331 shown in FIG.
- The stream program 400 is a program that inputs and outputs stream data.
- FIG. 4 shows a stream program 400 defined in CQL (Continuous Query Language).
- The stream program 400 includes an input stream definition, an output stream definition, and a query definition group.
- As the input stream definition, a sensor stream 401 having "time" and "measured value" as columns is defined. As the output stream definition, an abnormal sensor stream 402 having "time" and "measured value" as columns is defined.
- The query definition group includes query definition 1 and query definition 2.
- As query definition 1, a noise removal query 403 is defined, and as query definition 2, an abnormal sensor query 404 is defined.
- The noise removal query 403 receives stream data from the sensor stream 401 and calculates the average of the latest four measured values.
- The abnormal sensor query 404 outputs the stream data of the sensor stream 401 to the abnormal sensor stream 402 when the average calculated by the noise removal query 403 is larger than a threshold.
- FIG. 4 shows only an example of the stream program 400; the stream program 400 may be defined in C, Java, or any other programming language.
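Since the stream program 400 may be written in any language, its two queries can be mimicked in a few lines of Python. This is a sketch, not a translation of the CQL in FIG. 4: it assumes the average is taken over however many values are available until four have arrived, and the threshold is passed in as a parameter because its value is not given here. `abnormal_sensor_stream` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Reading = Tuple[int, float]  # (time in seconds, measured value)


def abnormal_sensor_stream(sensor_stream: Iterable[Reading], threshold: float,
                           window: int = 4) -> Iterator[Reading]:
    """Mimic the noise removal query 403 and the abnormal sensor query 404."""
    recent: Deque[float] = deque(maxlen=window)  # latest `window` measured values
    for time, value in sensor_stream:
        recent.append(value)
        # Noise removal: average of the latest values (fewer until the window fills).
        average = sum(recent) / len(recent)
        # Abnormal sensor query: forward the reading when the average is too high.
        if average > threshold:
            yield (time, value)
```

For instance, feeding the FIG. 5 values at 1:00 to 1:02 (10.0, 15.0, 14.0) followed by two made-up readings, 13.0 and 20.0, with a threshold of 13.0 forwards only the reading at 1:04, whose four-value average is 15.5.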
- FIG. 5 is an explanatory diagram showing an example of stream data.
- Each of the stream data 500 to 513 has a time and is stored in the stream storage queue Q in time order.
- As shown in the legend, FIG. 5 shows stream data 500 to 513 having time and measured value as columns.
- The stream data 500 with time "1:00" and measured value "10.0" is stored at the head, followed by the stream data 501 with time "1:01" and measured value "15.0".
- Next, the stream data 502 with time "1:02" and measured value "14.0" is stored.
- The stream data 513 with time "1:13" and measured value "12.0" is stored at the end of the stream storage queue Q.
- FIG. 6 is an explanatory diagram showing an example of a batch program in the batch program group 334 shown in FIG.
- The batch program 600 is a program that uses vector data as input and output.
- The batch program 600 includes a vector data definition and a batch processing function definition.
- As the vector data definition, a sensor array 601 having "time" and "measured value" as columns is defined.
- As the batch processing function definition, a preprocessing function 602 is defined.
- The preprocessing function 602 receives the sensor array 601 as input and smooths the measured values by computing, with the function SMOOTHING, a weighted average of the three most recent measured values.
- Next, using the function DERIVATION, the preprocessing function 602 calculates a differential value of the smoothed values from the current value and the previous value.
- The function DERIVATION also samples the elements of the vector data, reducing the number of elements by 50%. For example, when the eight elements at times 1:01, 1:02, 1:03, 1:04, 1:05, 1:06, 1:07, and 1:08 are input to the function DERIVATION, it reduces them by 50% and outputs the four elements at times 1:01, 1:03, 1:05, and 1:07.
- The preprocessing function 602 outputs the differential values calculated by the function DERIVATION to the sensor array 601 and ends the processing.
- The batch program 600 is only an example; it may be defined in R, C, Java, or any other programming language.
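The preprocessing function 602 can be sketched in Python as a chain of smoothing, differencing, and 50% sampling. The SMOOTHING weights are not given in the text, so the weights (1, 2, 3) below are an assumption; likewise, modeling DERIVATION as "difference, then keep every other result" is an interpretation. All function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def smoothing(values: Sequence[float],
              weights: Sequence[float] = (1.0, 2.0, 3.0)) -> List[float]:
    """Weighted average of each run of three consecutive values; the weights
    are an assumption (the newest value is weighted most)."""
    total = sum(weights)
    return [sum(w * v for w, v in zip(weights, values[i - 2:i + 1])) / total
            for i in range(2, len(values))]


def sample_half(items: Sequence[T]) -> List[T]:
    """Keep every other element, starting with the first (50% reduction)."""
    return list(items[::2])


def derivation(values: Sequence[float]) -> List[float]:
    """Difference between each value and the previous one, then sampled."""
    diffs = [cur - prev for prev, cur in zip(values, values[1:])]
    return sample_half(diffs)


def preprocess(values: Sequence[float]) -> List[float]:
    """Sketch of the preprocessing function 602."""
    return derivation(smoothing(values))
```

Under these assumptions, four raw values produce exactly one output (three values per smoothed point, two smoothed points per difference), which is consistent with a window width of four for this batch program.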
- FIG. 7 is an explanatory diagram showing an example of vector data.
- The vector data VD is a collection of a plurality of elements.
- The vector data VD shown in FIG. 7 is realized as an array, and each element of the array has a time and a measured value.
- Index 0 holds an element 700 with time "0:58" and measured value "11.0".
- Index 1 holds an element 701 with time "0:59" and measured value "14.0".
- Index 2 holds an element 702 with time "1:00" and measured value "10.0".
- Index 7 holds an element 707 with time "1:05" and measured value "12.0".
- Besides an array, the vector data VD may be realized as a list or another data structure.
- FIG. 8 is an explanatory diagram showing the input/output relationship of the batch program input/output static determination unit 332 shown in FIG.
- The batch program input/output static determination unit 332 is a program executed by the CPU 311 on the stream processing platform 321, and determines the static input/output setting of the batch program 600.
- The batch program input/output static determination unit 332 includes a first static determination unit 804.
- The first static determination unit 804 receives registration information, such as a program configuration 801, a platform requirement 802, and a batch execution specification 803, from the client 301, and determines the static input data size and overlap width of the batch program 600. The determined input data size and overlap width are output as the batch program input/output setting 805. The input data size and overlap width are described later.
- FIG. 9 is an explanatory diagram showing an example of the program configuration 801 shown in FIG.
- The program configuration 801 is information in which the parameters that configure the operation of a program are set.
- The parameters include, for example, a window width 901 and a number of slides 902, and are designated by the user by operating the client 301.
- The window width 901 indicates the width of the window containing the time series data needed for processing by the stream program 400 or the batch program 600.
- The window width is the number of time series data included in the window.
- The number of slides 902 is the amount by which the window is shifted for each processing step. For example, in the stream program 400 shown in FIG. 4, the noise removal query 403 keeps calculating the average of the latest four measured values while shifting by one value, so the window width 901 is four and the number of slides 902 is one.
- For the batch program 600, the function SMOOTHING uses three measured values and the function DERIVATION uses two consecutive smoothed values, so the window width 901 is four in total.
- The number of slides 902 is two because the function DERIVATION samples the elements of the vector data VD and reduces them by 50%.
- FIG. 10 is an explanatory diagram showing an example of the platform requirement 802 shown in FIG.
- The platform requirement 802 is a condition imposed on the stream processing platform 321.
- Its parameters include, for example, a response time 1001, which is designated by the user by operating the client 301.
- The response time 1001 is the maximum time the user allows from when data is input to the stream processing server 303 until processing of the data is completed. In FIG. 10, the response time 1001 is designated as "16 seconds", so up to 16 seconds are allowed from the input of data to the completion of its processing.
- FIG. 11 is an explanatory diagram showing an example of the batch execution specification 803 shown in FIG.
- The batch execution specification 803 is information that defines how batch processing is executed.
- The input rate 1101 indicates the rate at which the stream data input to the batch program 600 arrives.
- The processing throughput 1102 indicates the number of elements of the vector data VD that the batch program 600 processes per unit time.
- In the example of FIG. 11, the batch program 600 can process one element per second.
- FIG. 12 is an explanatory diagram showing the input/output relationship of the batch program input/output dynamic determination unit 333 shown in FIG.
- The batch program input/output dynamic determination unit 333 is a program executed by the CPU 311 on the stream processing platform 321, and determines the dynamic input/output setting of the batch program 600.
- The batch program input/output dynamic determination unit 333 includes a batch execution monitoring unit 1201 and a first dynamic determination unit 1203.
- The batch execution monitoring unit 1201 monitors the batch program 600 being executed and generates a batch execution monitoring value 1202.
- The batch execution monitoring value 1202 is a value observed in the batch program 600 being executed.
- The batch execution monitoring value 1202 is described later.
- The first dynamic determination unit 1203 receives the program configuration 801 shown in FIG. 9 and the platform requirement 802 from the client 301, together with the batch execution monitoring value 1202 shown in FIG. The first dynamic determination unit 1203 then determines the dynamic input data size and overlap width of the batch program 600. The determined input data size and overlap width are output as the batch program input/output setting 805. The input data size and overlap width are described later.
- FIG. 13 is an explanatory diagram showing an example of the batch execution monitoring value 1202.
- The batch execution monitoring value 1202 has the number of processing target data 1301 and the processing throughput 1302 as parameters, and is output by the batch execution monitoring unit 1201.
- The number of processing target data 1301 indicates the number of stream data stored in the stream storage queue Q that are to be input to the batch program 600. In FIG. 13, the number of processing target data 1301 is "6", indicating that there are six stream data in the stream storage queue Q.
- The processing throughput 1302 indicates the amount of vector data VD that the batch program 600 processes per unit time.
- FIG. 14 is an explanatory diagram showing an example of the batch program input/output setting 805 shown in FIGS. 8 and 12.
- The batch program input/output setting 805 is information that defines the data input to and output from the batch program 600. Its parameters include, for example, an input data size 1401 and an overlap width 1402, which are specified by the user by operating the client 301.
- The input data size 1401 indicates the number of elements of the vector data VD input to the batch program 600.
- In FIG. 14, the vector data VD has eight elements.
- The overlap width 1402 indicates the number of elements shared between the vector data VD currently input to the batch program 600 and the vector data VD previously input to it.
- For example, suppose the vector data VD of FIG. 7 is the current input and the previously input vector data VD also included the element 700 at index 0 and the element 701 at index 1. In this case, the overlap width 1402 is two.
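The overlap width can be checked by counting the elements whose times appear in both vectors. This is a minimal sketch with a hypothetical `overlap_width` helper that works on times only (written as seconds, so 0:58 becomes 58); the previous vector's range 0:52 to 0:59 used in the example is made up for illustration.

```python
from typing import Iterable


def overlap_width(previous_times: Iterable[int], current_times: Iterable[int]) -> int:
    """Count elements of the current vector whose times also appear in the
    previous vector."""
    seen = set(previous_times)
    return sum(1 for t in current_times if t in seen)


# Current vector as in FIG. 7 (0:58 to 1:05); previous vector assumed to end at 0:59.
print(overlap_width(range(52, 60), range(58, 66)))
```

Comparing times rather than positions keeps the count correct even if the vectors are not contiguous, at the cost of building a set.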
- FIG. 15 is an explanatory diagram showing the input/output relationship of the batch program execution unit 335 shown in FIG.
- The batch program execution unit 335 includes an input data/stream TO vector conversion unit 1501 and an output data/vector TO stream conversion unit 1502.
- The input data/stream TO vector conversion unit 1501 receives the batch program input/output setting 805.
- The batch program input/output setting 805 may be generated by the first static determination unit 804 or the first dynamic determination unit 1203, or may be created manually by the user.
- The input data/stream TO vector conversion unit 1501 reads the stream data strings SD1 and SD2 from the stream storage queue Q and the overlap data storage area 1500, respectively, and converts them into vector data VD1 according to the batch program input/output setting 805.
- The overlap data storage area 1500 is an area that stores the stream data string SD2, which consists of the most recent stream data amounting to the overlap width 1402. Details are described later.
- The stream data string SD1 in the stream storage queue Q may be generated by the stream program 400 or by another program.
- The batch program 600 receives the vector data VD1 output from the input data/stream TO vector conversion unit 1501 and outputs vector data VD2 as the processing result. The output data/vector TO stream conversion unit 1502 then receives the vector data VD2 output from the batch program 600 and converts it into a stream data string SD3.
- The stream data string SD3 output by the output data/vector TO stream conversion unit 1502 is stored in the stream storage queue Q. The stream data string SD3 in the stream storage queue Q may subsequently be input to the stream program 400 or to another program.
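The data flow through the batch program execution unit 335 can be sketched as one step function: combine the stored overlap data with new stream data into a vector, remember the newest elements for next time, run the batch program, and turn its output vector back into stream data. This is a minimal sketch, not the source's implementation; the class `BatchProgramExecutor` is hypothetical, and it writes results to a separate output queue rather than back into Q so that outputs are not re-read as inputs.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

StreamData = Tuple[int, float]  # (time in seconds, measured value)
Vector = List[StreamData]


class BatchProgramExecutor:
    """Sketch of the batch program execution unit 335 (names are hypothetical)."""

    def __init__(self, input_size: int, overlap: int,
                 batch_program: Callable[[Vector], Vector]) -> None:
        if not 0 <= overlap < input_size:
            raise ValueError("need 0 <= overlap < input_size")
        self.input_size = input_size
        self.batch_program = batch_program
        # Overlap data storage area 1500: keeps only the newest `overlap` items.
        self.overlap_store: Deque[StreamData] = deque(maxlen=overlap)

    def step(self, queue_in: Deque[StreamData], queue_out: Deque[StreamData]) -> bool:
        """Run the batch program once if enough stream data is queued."""
        needed = self.input_size - len(self.overlap_store)
        if len(queue_in) < needed:
            return False
        # Stream TO vector conversion: stored overlap data (SD2) + new data (SD1).
        vd1 = list(self.overlap_store) + [queue_in.popleft() for _ in range(needed)]
        self.overlap_store.extend(vd1)  # maxlen trims to the newest `overlap` items
        vd2 = self.batch_program(vd1)
        # Vector TO stream conversion: each output element becomes one stream datum (SD3).
        queue_out.extend(vd2)
        return True
```

With `input_size=8` and `overlap=2`, and stream data from 0:56 to 1:09 queued, the first step builds the vector 0:56 to 1:03 and the second builds 1:02 to 1:09, matching FIG. 2.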
- FIG. 16 is an explanatory diagram showing an example of the overlapping data storage area 1500.
- The overlap data storage area 1500 stores stream data with time "0:58" and measured value "11.0" and stream data with time "0:59" and measured value "14.0".
- The input data/stream TO vector conversion unit 1501 reads the stream data held in the overlap data storage area 1500 and uses it to generate vector data.
- FIG. 17 is a flowchart illustrating an example of a processing procedure performed by the batch program input / output static determination unit 332.
- The first static determination unit 804 reads the program configuration 801, the platform requirement 802, and the batch execution specification 803 specified by the user (S1701).
- The first static determination unit 804 sets the overlap width 1402 to "window width − number of slides" in the batch program input/output setting 805 (S1702).
- The time until the head data of the stream storage queue Q, which stores the stream data to be input to the batch program 600, is processed by the batch program 600 is called the waiting time.
- The time during which the batch program 600 executes is called the execution time.
- If the waiting time plus the execution time is equal to or shorter than the response time 1001, the first static determination unit 804 can satisfy the required response time 1001.
- The waiting time is "number of data in the stream storage queue Q (hereinafter, the number of queue data) / input rate", and the execution time is "number of queue data / processing throughput". Therefore, "number of queue data / input rate + number of queue data / processing throughput" must be equal to or shorter than the response time.
- The number of data in the stream storage queue Q that can be processed (hereinafter, the number of processable data) is the maximum number of queue data that satisfies "number of queue data / input rate + number of queue data / processing throughput ≤ response time".
- Solving this inequality, the number of processable data is [response time × processing throughput × input rate / (processing throughput + input rate)] (S1703), where [ ] denotes the Gauss symbol (the floor function).
- if the number of processable data is equal to or larger than the window width 901, the first static determination unit 804 sets the input data size 1401 to the number of processable data in the batch program input / output setting 805 (step S1705). Thereby, the input data size 1401 is maximized while the requirement of the response time 1001 is satisfied.
- otherwise, since the batch program 600 cannot process vector data VD having fewer elements than the window width 901, the first static determination unit 804 sets the input data size 1401 to the window width 901, and sets the calculated input data size 1401 and overlap width 1402 in the batch program input / output setting 805 (S1706).
- for example, assume that the window width 901 is four, the number of slides 902 is two, the response time 1001 is 16 seconds as shown in FIG. 10, and the input rate 1101 and the processing throughput 1102 are both 1 piece/second. The number of processable data is then [16 [seconds] × 1 [piece/second] × 1 [piece/second] / (1 [piece/second] + 1 [piece/second])] = 8. Since this is larger than the window width 901 of four, the input data size 1401 is set to eight, the number of processable data, and the overlap width 1402 is 4 − 2 = 2.
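The static sizing rule of steps S1702 to S1706 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `static_batch_io_setting` is hypothetical, and the comparison that selects step S1705 or S1706 (number of processable data versus window width 901) is inferred from the surrounding description. With the values of the worked example it returns an input data size of 8 and an overlap width of 2.

```python
import math


def static_batch_io_setting(window_width, num_slides, response_time,
                            input_rate, processing_throughput):
    """Hypothetical sketch of steps S1702-S1706.

    Returns (input_data_size, overlap_width) for the batch program
    input/output setting. Rates are items per second, times are seconds.
    """
    # S1702: consecutive windows share (window width - number of slides) items.
    overlap_width = window_width - num_slides
    # S1703: the largest n with n / input_rate + n / throughput <= response_time
    # is floor(T * p * r / (p + r)); the brackets in the text denote floor.
    processable = math.floor(response_time * processing_throughput * input_rate
                             / (processing_throughput + input_rate))
    # S1704-S1706 (comparison assumed): never pass fewer items than one window.
    input_data_size = max(processable, window_width)
    return input_data_size, overlap_width


# Worked example: window 4, slide 2, 16 s response time, 1 item/s in and out.
print(static_batch_io_setting(4, 2, 16, 1, 1))  # (8, 2)
```

Using `max` keeps the two branches (S1705 and S1706) in one expression; a tighter response time simply falls back to one window's worth of data.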
- FIG. 18 is a flowchart illustrating an example of a processing procedure performed by the batch execution monitoring unit 1201.
- the batch execution monitoring unit 1201 acquires the current number of data in the stream storage queue Q that stores the stream data input to the batch program 600, and sets it as the number of processing target data 1301 of the batch execution monitoring value 1202 (S1801).
- the batch execution monitoring unit 1201 extracts the processing throughput 1102 from the log of the stream processing base 321 and sets it to the processing throughput 1102 of the batch execution monitoring value 1202 (S1802).
- the batch execution monitoring unit 1201 returns to step S1801 if the stream processing infrastructure 321 has not ended (step S1803: No), and ends the processing if it has ended (step S1803: Yes).
- FIG. 19 is a flowchart illustrating an example of a processing procedure performed by the first dynamic determination unit 1203.
- the first dynamic determination unit 1203 first reads the program configuration 801 and the platform requirement 802 (S1901). Next, it sets the overlap width 1402 to “window width 901 − number of slides” in the batch program input / output setting 805 (S1902).
- the first dynamic determination unit 1203 waits until stream data to be processed is input to the stream storage queue Q (step S1903: No).
- when such stream data has been input (step S1903: Yes), the first dynamic determination unit 1203 reads the batch execution monitoring value 1202 (S1904).
- based on the response time 1001, the processing throughput 1102, the current time, and the oldest time of the data in the stream storage queue Q, the first dynamic determination unit 1203 sets the maximum number of stream data that satisfies the requirement of the response time 1001 (hereinafter, the “processable data count”) to “processing throughput × (response time − (current time − oldest time of processing target data))” (S1905).
- if the processable data count is smaller than the window width 901 (step S1906: Yes), the batch program 600 cannot process vector data VD having fewer elements than the window width 901, so the first dynamic determination unit 1203 sets the input data size 1401 to the window width 901 in the batch program input / output setting 805 (S1907). Control then goes to step S1911.
- otherwise (step S1906: No), the first dynamic determination unit 1203 determines whether the processable data count is equal to or less than the number of processing target data 1301 + the overlap width 1402 (step S1908). If it is (step S1908: Yes), the first dynamic determination unit 1203 sets the input data size 1401 to the processable data count in the batch program input / output setting 805 (S1909). Control then goes to step S1911.
- if the processable data count is larger than the number of processing target data + the overlap width (step S1908: No), the first dynamic determination unit 1203 sets the input data size 1401 to “the number of processing target data 1301 + the overlap width 1402” in the batch program input / output setting 805 (S1910). Control then goes to step S1911.
- in step S1911, the first dynamic determination unit 1203 determines whether the stream processing base 321 has ended. If it has not ended (step S1911: No), the process returns to step S1903; if it has ended (step S1911: Yes), the processing by the first dynamic determination unit 1203 ends.
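One pass of the first dynamic determination (steps S1905 to S1910) might look like the following sketch. The function and parameter names are hypothetical; rounding the processable data count down to an integer is an assumption, the S1906 test is read as "processable data count < window width 901", and step S1910 is read as taking all available data, that is, the number of processing target data 1301 plus the overlap width 1402.

```python
import math


def dynamic_batch_input_size(window_width, overlap_width, target_count,
                             response_time, processing_throughput,
                             now, oldest_time):
    """Hypothetical sketch of steps S1905-S1910 (one pass of the loop)."""
    # S1905: how many items can still be processed before the oldest queued
    # item misses its response-time deadline (floor is an assumption here).
    remaining = response_time - (now - oldest_time)
    processable = math.floor(processing_throughput * remaining)
    if processable < window_width:                    # S1906: Yes
        return window_width                           # S1907
    if processable <= target_count + overlap_width:   # S1908: Yes
        return processable                            # S1909
    return target_count + overlap_width               # S1910, as read here


# 6 queued items, overlap 2, 16 s budget, oldest item queued 6 s ago.
print(dynamic_batch_input_size(4, 2, 6, 16, 1, now=10, oldest_time=4))  # 8
```

As the oldest item ages, `remaining` shrinks and the returned size drops toward the window width, trading batch size for latency.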
- FIG. 20 is a flowchart illustrating an example of a processing procedure performed by the input data stream TO vector conversion unit 1501.
- the input data stream TO vector conversion unit 1501 reads the batch program input / output setting (S2001).
- the input data stream TO vector conversion unit 1501 determines whether or not stream data equal to or larger than the number obtained by subtracting the overlap width 1402 from the input data size 1401 exists in the stream storage queue Q (step S2002).
- if the stream storage queue Q does not hold at least as many stream data as the input data size 1401 minus the overlap width 1402 (step S2002: No), the process returns to step S2001 and waits until stream data is accumulated.
- if it does (step S2002: Yes), the input data stream TO vector conversion unit 1501 acquires from the stream storage queue Q as many stream data as the input data size 1401 minus the overlap width 1402 (S2003).
- the input data stream TO vector conversion unit 1501 acquires stream data from the overlap data storage area 1500 (S2004).
- the input data / stream TO vector conversion unit 1501 converts the stream data acquired from the stream storage queue Q and the overlap data storage area 1500 into vector data VD (S2005), and starts the batch program 600 with the vector data VD as an input. (S2006).
- the input data stream TO vector conversion unit 1501 then stores, in the overlap data storage area 1500, the newest stream data, as many as the overlap width 1402, among the stream data acquired from the stream storage queue Q and the overlap data storage area 1500 (S2007).
- the input data stream TO vector conversion unit 1501 then determines whether the stream processing infrastructure 321 has ended (step S2008). If it has not ended (step S2008: No), the process returns to step S2002; if it has ended (step S2008: Yes), the processing by the input data stream TO vector conversion unit 1501 ends.
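The stream-to-vector conversion with an overlap buffer (steps S2002 to S2007) can be sketched as below, assuming the stream storage queue Q behaves like a FIFO `collections.deque`. The class name is hypothetical, only time labels are used as items (the measurement values for 1:00 to 1:05 are not given), and launching the batch program in step S2006 is left to the caller. The usage follows the FIG. 21 walk-through.

```python
from collections import deque


class StreamToVectorConverter:
    """Hypothetical sketch of the input data stream TO vector conversion unit 1501."""

    def __init__(self, input_data_size, overlap_width, initial_overlap=()):
        self.input_data_size = input_data_size
        self.overlap_width = overlap_width
        # Stand-in for the overlap data storage area 1500.
        self.overlap_area = list(initial_overlap)

    def try_convert(self, queue):
        """Run steps S2002-S2007 once; return a vector, or None if data is short."""
        needed = self.input_data_size - self.overlap_width
        if len(queue) < needed:                              # S2002: No
            return None
        fresh = [queue.popleft() for _ in range(needed)]     # S2003
        vector = self.overlap_area + fresh                   # S2004-S2005
        # S2007: keep only the newest `overlap_width` items for the next run.
        self.overlap_area = vector[max(0, len(vector) - self.overlap_width):]
        return vector  # the caller would start the batch program here (S2006)


queue = deque(["1:00", "1:01", "1:02", "1:03", "1:04", "1:05"])
converter = StreamToVectorConverter(8, 2, initial_overlap=["0:58", "0:59"])
vd1 = converter.try_convert(queue)
print(len(vd1))                 # 8
print(converter.overlap_area)   # ['1:04', '1:05']
```

Only `input_data_size - overlap_width` items are consumed from the queue per run, so each queued item is read from Q exactly once while the overlap is served from the small buffer.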
- FIG. 21 is an explanatory diagram showing an example of conversion from stream data to vector data VD. FIG. 21 is described below with reference to the step numbers in FIG. 20.
- in step S2001, the input data stream TO vector conversion unit 1501 reads the batch program input / output setting 805, in which the input data size 1401 is eight and the overlap width 1402 is two.
- in step S2002, the input data stream TO vector conversion unit 1501 determines whether the stream storage queue Q holds at least as many stream data as the input data size 1401 minus the overlap width 1402, that is, six.
- in step S2003, the input data stream TO vector conversion unit 1501 acquires the six stream data 501 to 505 from time 1:00 to 1:05 from the stream storage queue Q.
- in step S2004, the input data stream TO vector conversion unit 1501 acquires the stream data 1601 and 1602 at times 0:58 and 0:59 from the overlap data storage area 1500.
- in step S2005, the input data / stream TO vector conversion unit 1501 converts the stream data 1601 and 1602 at times 0:58 and 0:59 and the six stream data 501 to 505 into vector data VD1 having elements 700 to 707 at indexes 0 to 7.
- in step S2007, the input data stream TO vector conversion unit 1501 selects, from the eight stream data from time 0:58 to 1:05, the two stream data with the newest times, corresponding to the overlap width 1402 of two. In this case, the two stream data at times 1:04 and 1:05 are selected.
- the input data stream TO vector conversion unit 1501 overwrites and saves the two stream data at times 1:04 and 1:05 in the overlapping data storage area 1500.
- the overlap data storage area 1500 stores stream data at times 1:04 and 1:05 instead of stream data at times 0:58 and 0:59. Therefore, when stream data is newly acquired in step S2003, stream data at times 1:04 and 1:05 is acquired from the overlapping data storage area 1500 in step S2004.
- FIG. 22 is a flowchart illustrating an example of a processing procedure performed by the output data / vector TO stream conversion unit 1502.
- the output data / vector TO stream conversion unit 1502 acquires the vector data VD output from the batch program 600 (S2201).
- the output data / vector TO stream conversion unit 1502 sequentially acquires elements from the vector data VD and attaches a stream time to each acquired element to generate stream data (step S2202).
- the output data / vector TO stream conversion unit 1502 stores the generated stream data in the stream storage queue Q in time order (S2203). After that, the output data / vector TO stream conversion unit 1502 determines whether or not the stream processing infrastructure 321 has ended (step S2204), and when it has not ended (step S2204: No), the process returns to step S2201. On the other hand, if completed (step S2204: Yes), the processing by the output data / vector TO stream converter 1502 is terminated.
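The vector-to-stream direction (steps S2202 and S2203) can be sketched as follows. The source does not say how each element's stream time is derived, so this hypothetical `vector_to_stream` function takes the stream times as an extra argument supplied by the caller; times are written as minutes (61 for 1:01) and the element values are placeholders, not data from the source.

```python
from collections import deque


def vector_to_stream(vector, stream_times, queue):
    """Hypothetical sketch of steps S2202-S2203 of unit 1502.

    `stream_times` is an assumed input: the caller decides each element's time.
    """
    # S2202: attach a stream time to each element of the vector data.
    stream_data = list(zip(stream_times, vector))
    # S2203: enqueue in time order.
    stream_data.sort(key=lambda item: item[0])
    queue.extend(stream_data)
    return stream_data


# FIG. 23-style usage: three elements stamped 1:01, 1:03, 1:05 (as minutes).
queue = deque()
vector_to_stream(["e0", "e1", "e2"], [61, 63, 65], queue)
print(list(queue))  # [(61, 'e0'), (63, 'e1'), (65, 'e2')]
```

Sorting before enqueueing guarantees that a downstream stream program reading from the front of the queue always sees non-decreasing times.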
- FIG. 23 is an explanatory diagram showing an example of conversion from vector data VD to stream data. FIG. 23 is described below with reference to the step numbers in FIG. 22.
- the output data / vector TO stream conversion unit 1502 obtains vector data VD2 having elements of indexes 0-2.
- the output data / vector TO stream conversion unit 1502 generates stream data 2311, 2313, and 2315 at times 1:01, 1:03, and 1:05 corresponding to the elements, and stores them in the stream storage queue Q.
- the downstream stream program 400 can acquire the stream data 2311, 2313, and 2315 stored in the stream storage queue Q and execute the stream processing.
- as described above, according to Example 1, the vector data can be overlapped, which avoids states in which the calculation cannot be executed and preserves the semantics of the batch program. The batch program 600 can therefore be executed on the stream processing base 321 without changing its code or algorithm. Further, the data size of the vector data VD is determined so that only the data corresponding to the overlap width 1402 is overlapped. Since this suppresses the generation of vector data VD that overlaps by more than the overlap width 1402, the processing load can be reduced.
- the vector data output as the execution result is converted into a stream data group, so that it can be given as input data to a subsequent stream program executed on the stream processing base 321.
- this makes stream processing on the stream processing base 321 more efficient.
- Example 2: Next, Example 2 will be described.
- the second embodiment is an example in which the stream program 400 is executed on a batch processing base.
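- components identical to those in Example 1 are denoted by the same reference symbols, and their description is omitted.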
- FIGS. 24 and 25 are explanatory diagrams showing examples of executing the stream program SP on the batch processing base.
- FIG. 24 shows an example of execution when data is not overlapped.
- the stream program SP has a program configuration in which the window width is 60 and the number of slides is one.
- the unit of time is “minute” as an example.
- the batch processing platform executes batch processing every 4 hours as an example.
- the batch processing platform executes vector TO stream conversion 2400, which converts the value for each time in the file F into stream data in time order.
- the converted stream data string 2401 is input to the stream program SP.
- the stream program SP executes a predetermined process using the input stream data string 2401 while sliding one by one.
- at the next batch processing, at time 8:59, vector TO stream conversion 2400 is performed again. That is, at time 8:59, the batch processing platform generates a stream data string 2402 by performing vector TO stream conversion on the values for each time from 5:00 to 8:59.
- the stream data at time 5:00 is the head data of the vector TO stream conversion at time 8:59. Accordingly, when the stream data at time 5:00 is input to the stream program SP, the stream program SP does not have the 59 stream data from time 4:01 to 4:59, so it cannot execute the predetermined process for the stream data at time 5:00. Therefore, for the stream program SP to process the stream data at time 5:00, it is necessary either to provide the preceding 59 stream data or to hold the calculation state of the stream program SP at time 4:59.
- FIG. 25 shows an execution example when data overlap is provided.
- the program configuration of the stream program SP is the same as that shown in FIG. 24.
- the platform requirement for the stream program SP is 480 minutes.
- the batch processing platform determines the overlap width or whether or not to maintain the calculation state based on the program configuration and platform requirements.
- the batch processing base compares the amount of calculation processing required by each option and then selects whether to apply the overlap width or to hold the calculation state.
- the batch processing platform converts the data including the overlap width into stream data by the vector TO stream conversion 2400.
- at time 8:59, the vector TO stream conversion 2400 generates the stream data string 2402 from time 5:00 to 8:59 in FIG. 24, whereas in FIG. 25 it generates a stream data string 2501 from time 4:01 to 8:00, which includes the 59 stream data of the overlap width immediately before time 5:00. The stream data at times 8:01 to 8:59 that have not yet been generated are generated at the next batch processing.
- alternatively, when holding the calculation state is selected, the calculation state of the stream program SP at time 4:59 is held.
- the vector TO stream conversion 2400 generates stream data 2502 at times 5:00 to 8:59 at time 8:59 as in FIG. 24, and outputs the stream data 2502 to the stream program SP.
- the stream program SP executes a predetermined process using the calculation state of the stream program SP at time 4:59 and the stream data string 2502 at time 5:00 to 8:59.
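To illustrate why the second batch in FIG. 24 loses results, and how both remedies of FIG. 25 recover them, the following Python sketch uses a hypothetical stream program that averages the last `width` values with a slide of one. Small stand-in numbers (window 4, batch 8) replace the 60-minute window and 240-item batch, and the data are arbitrary integers rather than values from the source. The overlap remedy re-feeds window width − number of slides = 3 items; the state remedy starts the next run from the previous run's window contents.

```python
from collections import deque


class SlidingMean:
    """Hypothetical stream program: mean of the last `width` values, slide 1."""

    def __init__(self, width, state=None):
        self.width = width
        # The window contents are this program's calculation state.
        self.window = deque(state or [], maxlen=width)

    def feed(self, values):
        results = []
        for value in values:
            self.window.append(value)
            if len(self.window) == self.width:  # only full windows produce output
                results.append(sum(self.window) / self.width)
        return results


data = list(range(16))          # arbitrary per-minute values
width, batch = 4, 8             # small stand-ins for 60 and 240
reference = SlidingMean(width).feed(data)

first = SlidingMean(width).feed(data[:batch])
# (a) No overlap, no state: the second batch restarts with an empty window.
no_overlap = SlidingMean(width).feed(data[batch:])
# (b) Overlap: re-feed the previous width - 1 items ahead of the new batch.
overlap = SlidingMean(width).feed(data[batch - (width - 1):])
# (c) State holding: start the second run from the first run's window.
runner = SlidingMean(width)
runner.feed(data[:batch])
held = SlidingMean(width, state=list(runner.window)).feed(data[batch:])

print(len(reference), len(first + no_overlap))                  # 13 10
print(first + overlap == reference, first + held == reference)  # True True
```

Both remedies reproduce the uninterrupted run; they differ only in cost, recomputing `width - 1` items versus storing and reloading the window, which is the trade-off the batch processing base weighs.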
- FIG. 26 is a system configuration diagram illustrating an example of a batch processing system.
- the batch processing system 2600 has a configuration in which a client 301, a data source 302, and a batch processing server 2603 are communicably connected via a network 304.
- the network 304 can be Ethernet, LAN, or WAN.
- the client 301, the data source 302, and the batch processing server 2603 may be any computer system such as a PC or a blade-type computer system.
- the client 301 is a computer that executes registration processing for the batch processing server 2603. Details of the registration process will be described later.
- the data source 302 is a supply source that supplies a series of time-series data to be processed to the batch processing server 2603, and examples thereof include the plant and server of the factory described above.
- the batch processing server 2603 is a computer in which an I / O interface 2613, a CPU 2611, a memory 2612, and a storage 2614 are coupled by a bus 2615.
- the batch processing server 2603 accesses the network 304 via the I / O interface 2613. Further, the batch processing server 2603 can store processing results, intermediate results of processing, and setting data necessary for system operation in the nonvolatile storage 2614.
- the storage 2614 is directly connected via the I / O interface 2613 here, but it may instead be located outside the batch processing server 2603 and connected via the I / O interface 2613 and the network 304.
- a batch processing base 2621 is mapped to the memory 2612.
- the batch processing platform 2621 is middleware that starts, stops, and schedules processing modules such as a batch program group 334, which consists of one or more batch programs.
- on the batch processing platform 2621, a stream program input / output setting static determination unit 2632, a stream program input / output setting dynamic determination unit 2633, and a stream program execution unit 2635 including a stream program group 331 consisting of one or more stream programs are mapped.
- FIG. 27 is an explanatory diagram showing the input / output relationship of the stream program input / output setting static determination unit 2632 shown in FIG. 26.
- the stream program input / output setting static determination unit 2632 is a program executed by the CPU 2611 on the batch processing base 2621 and determines the static input / output setting of the stream program 400.
- the stream program input / output setting static determination unit 2632 includes a second static determination unit 2702.
- the second static determination unit 2702 receives registration information such as a program configuration 801, a platform requirement 802, and a stream execution specification 2701 from the client 301. The second static determination unit 2702 then determines the static input data size and overlap width of the stream program 400 and whether its calculation state is held.
- the calculation state holding means holding a calculation state that is an execution result of the stream program 400 executed on the batch processing base 2621.
- the determined input data size and overlap width are output as a stream program input / output setting 2703.
- FIG. 28 is an explanatory diagram showing an example of the stream execution specification 2701.
- the stream execution specification 2701 is information that defines an execution method of stream processing.
- the parameters include, for example, an input rate 2801, a processing throughput 2802, a calculation state holding / reading time 2803, and are designated by the user by operating the client 301.
- the input rate 2801 indicates an interval at which elements of the vector data VD input by the stream program 400 arrive.
- the processing throughput 2802 indicates the number of stream data that the stream program 400 processes per unit time. In FIG. 28, since the processing throughput 2802 is 1 / min, it indicates that the stream program 400 can process one value per minute.
- the calculation state holding / reading time 2803 indicates the time required for holding and reading the calculation state. In FIG. 28, since the calculation state holding / reading time 2803 is 5 minutes, the calculation state holding and reading takes 5 minutes.
- FIG. 29 is an explanatory diagram showing the input / output relationship of the stream program input / output setting dynamic determination unit 2633 shown in FIG. 26.
- the stream program input / output setting dynamic determination unit 2633 is a program executed by the CPU 2611 on the batch processing base 2621 and determines the dynamic input / output setting of the stream program 400.
- the stream program input / output setting dynamic determination unit 2633 includes a stream execution monitoring unit 2901 and a second dynamic determination unit 2903.
- the stream execution monitoring unit 2901 monitors the stream program 400 being executed and generates a stream execution monitoring value 2902.
- the stream execution monitoring value 2902 is a value observed in the stream program 400 being executed.
- the stream execution monitoring value 2902 will be described later.
- the second dynamic determination unit 2903 receives the program configuration 801 shown in FIG. 9 and the platform requirement 802 from the client 301, and receives the stream execution monitoring value 2902 from the stream execution monitoring unit 2901. The second dynamic determination unit 2903 then determines the dynamic input data size and overlap width of the stream program 400 and whether its calculation state is held.
- the determined input data size 3101 and overlap width 3102 are output as a stream program input / output setting 2703. The input data size and overlap width are described later.
- FIG. 30 is an explanatory diagram showing an example of the stream execution monitoring value 2902.
- the stream execution monitoring value 2902 has, as parameters, the number of data to be processed 3001, a processing throughput 3002, and a calculation state holding / reading time 3003, and is output by the stream execution monitoring unit 2901.
- the processing target data number 3001 indicates the number of elements of the vector data VD input by the stream program 400.
- the elements are stored in a file, for example.
- for example, when the processing target data number 3001 is 240, the file holds 240 elements of vector data VD.
- the processing throughput 3002 indicates the number of stream data processed by the stream program 400 per unit time.
- the calculation state holding / reading time 3003 indicates the time required for holding and reading the calculation state.
- FIG. 31 is an explanatory diagram showing an example of the stream program input / output setting 2703 shown in FIG. 27 and FIG. 29.
- the stream program input / output setting 2703 is information that defines data to be input / output to / from the stream program 400.
- As parameters for example, there are an input data size 3101, an overlap width 3102, and a calculation state holding presence / absence 3103, which are designated by the user by operating the client 301.
- the input data size 3101 indicates the size of the stream data input by the stream program 400.
- in FIG. 31, since the input data size 3101 is 240, the stream program 400 inputs 240 stream data.
- the overlap width 3102 indicates the number of stream data shared between the stream data input by the stream program 400 and the stream data it input in the previous execution. In FIG. 31, since the overlap width 3102 is three, three stream data overlap with those input by the stream program 400 in the previous execution.
- the calculation state holding presence / absence 3103 indicates whether or not the calculation state of the stream program 400 is held. In FIG. 31, since the calculation state holding presence / absence 3103 is “none”, the calculation state is not held.
- FIG. 32 is an explanatory diagram showing the input / output relationship of the stream program execution unit 2635 shown in FIG. 26.
- the stream program execution unit 2635 includes an input data / vector TO stream conversion unit 3201, an output data / stream TO vector conversion unit 3202, a calculation state reading unit 3203, and a calculation state holding unit 3204.
- the input data / vector TO stream conversion unit 3201, the calculation state reading unit 3203, and the calculation state holding unit 3204 receive the stream program input / output setting 2703.
- the stream program input / output setting 2703 may be generated by the second static determination unit 2702 or the second dynamic determination unit 2903, or may be manually created by the user.
- the calculation state reading unit 3203 reads the calculation state 3211 stored in the calculation state storage area 3210 and inputs it to the stream program 400 according to the stream program input / output setting 2703 when the execution of the stream program 400 is started.
- the calculation state holding unit 3204 acquires the calculation state 3211 from the stream program 400 according to the stream program input / output setting 2703 when the execution of the stream program 400 ends, and stores it in the calculation state storage area 3210.
- the input data / vector TO stream conversion unit 3201 receives the vector data VD3 in the file F1, which is the output of the batch program BP1, and converts the vector data VD3 into stream data according to the stream program input / output setting 2703.
- the vector data VD3 input by the input data / vector TO stream conversion unit 3201 may be stored in a file, a database, or another storage area. The vector data VD3 may be stored in the file F1 or another storage area by the batch program 600 or by another program.
- the stream program 400 receives the stream data output from the input data / vector TO stream conversion unit 3201, and outputs the stream data as the processing result. Then, the output data / stream TO vector conversion unit 3202 receives the stream data SD4 output from the stream program 400 and converts it into vector data VD4.
- the vector data VD output from the output data stream TO vector conversion unit 3202 is stored in the file F2, a database, or other storage area.
- the vector data VD4 stored in the file F2 or other storage area may be input by the batch program 600 or other programs.
- FIG. 33 is an explanatory diagram showing an example of the overlapping data time shown in FIG.
- the overlap data time OT indicates the time of stream data that overlaps with the stream data input by the stream program 400 in the previous execution in the stream data input by the stream program 400.
- for example, when the overlap data time OT is “0:57 to 0:59”, the stream data at times “0:57” to “0:59” are used in both the previous and current executions of the stream program 400.
- the overlap data time OT is set by the input data / vector TO stream conversion unit 3201 and used by the output data / stream TO vector conversion unit 3202.
- FIG. 34 is an explanatory diagram of an example of an operator tree.
- the operator tree 3400 is generated by compiling the stream program 400 described in CQL.
- the stream processing infrastructure 321 executes the operators 3401 to 3404 constituting the operator tree 3400 in the order specified by the operator tree 3400.
- the operator tree 3400 shown in FIG. 34 is generated by compiling the noise removal query 403 and the abnormal sensor query 404 described earlier.
- the operator tree 3400 includes, for example, ROWS 3401, GROUP BY 3402, ISTREAM 3403, and ISTREAM 3404.
- the operator tree 3400 is executed in the order of ROWS 3401, GROUP BY 3402, ISTREAM 3403, and ISTREAM 3404.
- FIG. 35 is an explanatory diagram showing an example of the calculation state storage area 3210 shown in FIG. 32.
- a calculation state 3211 is stored in the calculation state storage area 3210.
- the calculation state 3211 indicates the state used in the calculation of each of the operators 3401 to 3404. For example, since ROWS 3401 is a window that holds the four most recent stream data, its calculation state stores those four stream data. The calculation state 3211 of the operator GROUP BY 3402 stores the average of the latest four measurement values.
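A minimal sketch of holding and reading per-operator calculation states, modeled loosely on ROWS 3401 (latest four values) and GROUP BY 3402 (their average), is shown below. All class and function names are hypothetical, the ISTREAM operators are omitted, and a JSON string stands in for the calculation state storage area 3210. The resumed operators produce the same next result as an uninterrupted run would.

```python
import json
from collections import deque


class RowsWindow:
    """Hypothetical stand-in for ROWS 3401: keeps the latest n values."""

    def __init__(self, n):
        self.n = n
        self.window = deque(maxlen=n)

    def step(self, value):
        self.window.append(value)
        return list(self.window)

    def get_state(self):
        return list(self.window)

    def set_state(self, state):
        self.window = deque(state, maxlen=self.n)


class WindowAverage:
    """Hypothetical stand-in for GROUP BY 3402 computing an average."""

    def __init__(self):
        self.average = None

    def step(self, window):
        self.average = sum(window) / len(window)
        return self.average

    def get_state(self):
        return self.average

    def set_state(self, state):
        self.average = state


def run(operators, value):
    """Push one value through the operators in tree order (ROWS, then GROUP BY)."""
    rows, avg = operators
    return avg.step(rows.step(value))


def hold_state(operators):
    """Like the calculation state holding unit 3204: serialize each state."""
    return json.dumps([op.get_state() for op in operators])


def read_state(operators, blob):
    """Like the calculation state reading unit 3203: restore the held states."""
    for op, state in zip(operators, json.loads(blob)):
        op.set_state(state)


# First execution, then hold the state.
ops = [RowsWindow(4), WindowAverage()]
for v in [1.0, 2.0, 3.0, 4.0, 5.0]:
    run(ops, v)
stored = hold_state(ops)

# Next execution: fresh operators, state read back before new data arrives.
resumed = [RowsWindow(4), WindowAverage()]
read_state(resumed, stored)
print(run(resumed, 6.0))  # 4.5
```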
- FIG. 36 is a flowchart illustrating an example of a processing procedure performed by the second static determination unit 2702.
- the second static determination unit 2702 reads the program configuration 801, the platform requirement 802, and the stream execution specification 2701 (step S3601).
- the second static determination unit 2702 determines whether “(window width − number of slides) / processing throughput”, the time needed to reprocess the overlapping data, is greater than the calculation state holding / reading time 2803 (step S3602). If it is greater (step S3602: Yes), the second static determination unit 2702 sets the calculation state holding presence / absence 3103 to “present” in the stream program input / output setting 2703 (step S3603) and sets the overlap width 3102 to 0 (step S3604). Control then goes to step S3607.
- if “(window width − number of slides) / processing throughput” is equal to or shorter than the calculation state holding / reading time 2803 (step S3602: No), the second static determination unit 2702 sets the calculation state holding presence / absence 3103 to “none” in the stream program input / output setting 2703 (step S3605) and sets the overlap width 3102 to “window width − number of slides” (step S3606). Control then goes to step S3607.
- the waiting time is the time until the vector data VD to be processed is executed by the stream program 400, and the execution time is the time the stream program 400 takes to execute it. If the waiting time + the execution time is equal to or shorter than the response time 1001, processing can be completed within the requested response time 1001.
- the waiting time is “size of vector data VD to be processed (hereinafter referred to as vector size) / input rate”, and the execution time is “vector size / processing throughput”.
- “vector size / input rate + vector size / processing throughput” must therefore be equal to or shorter than the response time 1001. The processable data size of the vector data VD (hereinafter, the “number of processable data”) is the maximum vector size that satisfies “vector size / input rate + vector size / processing throughput ≤ response time”, which is [response time × processing throughput × input rate / (processing throughput + input rate)] ([ ] is the Gaussian symbol).
- the second static determination unit 2702 sets the input data size 3101 to [response time × processing throughput × input rate / (processing throughput + input rate)] in the stream program input / output setting 2703 (step S3607). The processing by the second static determination unit 2702 then ends.
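Steps S3602 to S3607 can be sketched as follows. The names are hypothetical; the sketch treats the input rate as items per unit time, as the sizing formula does, and assumes an input rate of 1 item per minute for the Example 2 values because the text does not give one. With a window of 60, one slide, a 480-minute response time, a throughput of 1 item per minute, and a 5-minute holding / reading time, reprocessing 59 items (59 minutes) costs more than holding the state, so the state is held and 240 items are processed per run.

```python
import math
from dataclasses import dataclass


@dataclass
class StreamProgramIOSetting:
    """Mirrors input data size 3101, overlap width 3102, state holding 3103."""
    input_data_size: int
    overlap_width: int
    hold_state: bool


def static_stream_io_setting(window_width, num_slides, response_time,
                             input_rate, processing_throughput, state_io_time):
    """Hypothetical sketch of steps S3602-S3607 (second static determination)."""
    # S3602: time to reprocess the overlapping part vs. time to hold and read state.
    recompute_time = (window_width - num_slides) / processing_throughput
    if recompute_time > state_io_time:
        hold, overlap = True, 0                              # S3603, S3604
    else:
        hold, overlap = False, window_width - num_slides     # S3605, S3606
    # S3607: largest vector size meeting the response time, floor(T*p*r/(p+r)).
    size = math.floor(response_time * processing_throughput * input_rate
                      / (processing_throughput + input_rate))
    return StreamProgramIOSetting(size, overlap, hold)


# Example 2 numbers, assuming an input rate of 1 item/min (not given in the text).
print(static_stream_io_setting(60, 1, 480, 1, 1, 5))
# StreamProgramIOSetting(input_data_size=240, overlap_width=0, hold_state=True)
```

With a window width of four and one slide instead, the reprocessing time (3 minutes) is below the 5-minute holding / reading time, and the sketch yields the combination 240, 3, and no state holding, the same values that FIG. 31 lists.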
- FIG. 37 is a flowchart illustrating a processing procedure example by the stream execution monitoring unit 2901.
- the stream execution monitoring unit 2901 refers to the file storing the stream data to be input to the stream program 400, and sets the number of data in it as the processing target data number 3001 of the stream execution monitoring value 2902 (step S3701).
- the stream execution monitoring unit 2901 extracts the processing throughput from the log of the batch processing infrastructure 2621 and sets it as the processing throughput 3002 of the stream execution monitoring value 2902 (step S3702). The stream execution monitoring unit 2901 then returns to step S3701 if the batch processing base 2621 has not ended (step S3703: No), and ends the processing if it has ended (step S3703: Yes).
- FIG. 38 is a flowchart illustrating an example of a processing procedure performed by the second dynamic determination unit 2903.
- the second dynamic determination unit 2903 first reads the program configuration 801 and the platform requirement 802 (step S3801). Next, the second dynamic determination unit 2903 determines whether or not the element of the vector data VD to be processed exists in the file (step S3802). When the vector data VD to be processed exists in the file (step S3802: Yes), the second dynamic determination unit 2903 reads the stream execution monitoring value 2902 (step S3803).
- the second dynamic determination unit 2903 determines whether “(window width − number of slides) / processing throughput” is greater than the calculation state holding / reading time 3003 (step S3804). If it is greater (step S3804: Yes), the second dynamic determination unit 2903 sets the calculation state holding presence / absence 3103 to “present” in the stream program input / output setting 2703 (step S3805) and sets the overlap width 3102 to 0 (step S3806). Control then goes to step S3809.
- if “(window width − number of slides) / processing throughput” is equal to or shorter than the calculation state holding / reading time 3003 (step S3804: No), the second dynamic determination unit 2903 sets the calculation state holding presence / absence 3103 to “none” in the stream program input / output setting 2703 (step S3807) and sets the overlap width 3102 to “window width − number of slides” (step S3808). Control then goes to step S3809.
- For the vector data VD to be processed, the time until the stream program 400 starts executing on it is the waiting time, and the time the stream program 400 takes to execute is the execution time. If waiting time + execution time is equal to or shorter than the response time 1001, the data can be processed within the requested response time 1001.
- The waiting time is "size of the vector data VD to be processed (hereinafter, vector size) / input rate", and the execution time is "vector size / processing throughput".
- The vector size must therefore satisfy $\text{vector size}/\text{input rate} + \text{vector size}/\text{processing throughput} \le \text{response time}$. The data size of the vector data VD that can be processed (hereinafter, the number of processable data) is the largest vector size satisfying this inequality, namely $\left\lfloor \dfrac{\text{response time} \times \text{processing throughput} \times \text{input rate}}{\text{processing throughput} + \text{input rate}} \right\rfloor$, where $\lfloor\,\rfloor$ is the floor function (Gauss symbol).
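The bound above can be checked numerically. This is a minimal Python sketch, assuming times in seconds and rates in elements per second; `processable_size` is a hypothetical helper, not part of the described apparatus. With the values of FIG. 10 and FIG. 11 (response time 16 seconds, processing throughput and input rate both 1 element/second) it gives 8, which matches the input data size 1401 used in FIG. 21, assuming the same bound also applies to the first embodiment.

```python
import math


def processable_size(response_time: float, throughput: float, input_rate: float) -> int:
    """Largest vector size S with S / input_rate + S / throughput <= response_time.

    Solving the inequality for S gives
    S <= response_time * throughput * input_rate / (throughput + input_rate),
    and the floor (Gauss symbol) turns that bound into a whole number of elements.
    """
    return math.floor(response_time * throughput * input_rate / (throughput + input_rate))


# FIG. 10 / FIG. 11 values: 16 s response time, 1 element/s throughput and input rate.
size = processable_size(16, 1, 1)
print(size)  # -> 8
# Waiting time 8 s plus execution time 8 s exactly meets the 16 s response time.
assert size / 1 + size / 1 <= 16
```

Plain floating-point arithmetic is used for brevity; inputs that do not divide exactly could land just below an integer, so exact rational arithmetic may be preferable in practice.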
- The second dynamic determination unit 2903 sets the input data size 3101 in the stream program input/output setting 2703 to $\lfloor \text{response time} \times \text{processing throughput} \times \text{input rate} / (\text{processing throughput} + \text{input rate}) \rfloor$ (step S3809).
- Next, the second dynamic determination unit 2903 determines whether the batch processing platform 2621 has ended (step S3810). If it has not ended (step S3810: No), the process returns to step S3802; if it has ended (step S3810: Yes), the processing by the second dynamic determination unit 2903 ends.
- FIG. 39 is a flowchart showing an example of a processing procedure performed by the input data / vector TO stream conversion unit 3201 shown in FIG. 32.
- The input data / vector TO stream conversion unit 3201 first sets the read index of the file to "index of the last input data + 1 − overlap width" (step S3901).
- Here, the last input data is the element of the vector data VD that was last read from the file.
- The input data / vector TO stream conversion unit 3201 sets, as the overlap data time OT, the range from the time of the element at the read index of the file to the time of the element whose index is the index of the last input data minus 1 (step S3902).
- The input data / vector TO stream conversion unit 3201 acquires the element of the vector data VD at the read index from the file (step S3903), adds the time to the acquired element to generate stream data (step S3904), and stores the stream data in the stream storage queue Q (step S3905).
- The input data / vector TO stream conversion unit 3201 then determines whether the number of acquired elements is smaller than the input data size (step S3906). If it is smaller (step S3906: Yes), the unit increments the read index by one (step S3907), returns to step S3903, and executes steps S3903 to S3905 again.
- If the number of acquired elements has reached the input data size (step S3906: No), the input data / vector TO stream conversion unit 3201 sets the element at the last read index in the file as the last input data (step S3908), and the processing by the input data / vector TO stream conversion unit 3201 ends.
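Steps S3901 to S3908 can be sketched in Python as follows. This is an illustration under assumptions, not the apparatus itself: the file is modeled as a list of `(time, value)` pairs, the overlap data time OT is returned as a set of times, the overlap range is taken to run through the last input data as in the FIG. 40 example, and the loop also stops at the end of the file, which the flowchart does not mention. `vector_to_stream` is a hypothetical name.

```python
from typing import List, Set, Tuple

Element = Tuple[int, float]  # (time in seconds, value) of one element in the file


def vector_to_stream(file: List[Element], last_index: int, overlap_width: int,
                     input_data_size: int) -> Tuple[List[Element], Set[int], int]:
    """Sketch of S3901 to S3908 of the input data / vector TO stream conversion unit 3201.

    Returns the stream data put on the queue, the overlap data time OT (as a set
    of times) and the index of the new last input data.
    """
    # S3901: back up by the overlap width from the element after the last input data.
    read_index = max(0, last_index + 1 - overlap_width)
    # S3902: OT = times of the elements that were already fed in the previous run.
    overlap_times = {t for t, _ in file[read_index:last_index + 1]}
    queue: List[Element] = []
    # S3903 to S3907: read elements until input_data_size of them have been queued.
    while len(queue) < input_data_size and read_index < len(file):
        t, v = file[read_index]
        queue.append((t, v))  # stream data = element value tagged with its time
        read_index += 1
    # S3908: the last element read becomes the new last input data.
    return queue, overlap_times, read_index - 1


data = [(t, float(t)) for t in range(20)]
q1, ot1, last = vector_to_stream(data, -1, 2, 8)    # times 0..7, OT empty
q2, ot2, last = vector_to_stream(data, last, 2, 8)  # times 6..13, OT {6, 7}
```

For the first run `last_index` is -1, so nothing overlaps; each later run starts `overlap_width` elements before the previous end.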
- FIG. 40 is an explanatory diagram showing an example of conversion from vector data VD to stream data. FIG. 40 will be described with reference to the step numbers in FIG. 39. Assume that the last input data at this point is the element (10.0) at index 1002.
- In step S3902, the input data / vector TO stream conversion unit 3201 sets the overlap data time OT to the range from time 0:57 of the element at read index 1000 of the file to time 0:59 of the element at index 1002, the last input data.
- In step S3903, the input data / vector TO stream conversion unit 3201 acquires the element at read index 1000 from the file; in step S3904 it adds the element's time 0:57 to generate stream data, which is stored in the stream storage queue Q in step S3905.
- In step S3906, the input data / vector TO stream conversion unit 3201 determines whether the number of acquired elements (at this stage, the single element at index 1000) is smaller than the input data size of 240. Because it is smaller, the unit changes the read index from 1000 to 1001. By repeating this loop, the elements of the vector data VD at indexes 1001 to 1239 are acquired in turn, and stream data 3806 to 3809 at times 0:57 to 4:59 can be stored in the stream storage queue Q501.
- FIG. 41 is a flowchart showing an example of a processing procedure performed by the output data / stream TO vector conversion unit 3202 shown in FIG. 32.
- The output data / stream TO vector conversion unit 3202 sequentially acquires stream data from the stream storage queue Q (step S4101).
- It then determines whether the time of the acquired stream data coincides with the overlap data time OT (step S4102).
- If they coincide (step S4102: Yes), the process proceeds to step S4104. If they do not (step S4102: No), the output data / stream TO vector conversion unit 3202 stores the acquired stream data in a file (step S4103), and the process proceeds to step S4104.
- In step S4104, the output data / stream TO vector conversion unit 3202 determines whether the batch processing platform 2621 has ended. If it has not ended (step S4104: No), the process returns to step S4101; if it has ended (step S4104: Yes), the processing by the output data / stream TO vector conversion unit 3202 ends.
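The following Python sketch illustrates why dropping results at the overlap data time OT (steps S4101 to S4103) preserves the stream program's semantics. The stream program here is a hypothetical rolling sum over the last 3 values (window width 3, slide count 1, so overlap width 2), restarted with empty state for each run; `rolling_sum` and `stream_to_vector` are illustrative names. Results at OT come from incomplete windows and are discarded, so the stitched output matches a single continuous run.

```python
from collections import deque
from typing import Iterable, List, Set, Tuple

Item = Tuple[int, float]  # (time in seconds, value)


def rolling_sum(stream: Iterable[Item], width: int = 3) -> List[Item]:
    """Hypothetical stream program: sum of the last `width` values, from empty state."""
    window: deque = deque(maxlen=width)
    out: List[Item] = []
    for t, v in stream:
        window.append(v)
        out.append((t, sum(window)))
    return out


def stream_to_vector(results: Iterable[Item], overlap_times: Set[int]) -> List[Item]:
    """Sketch of S4101 to S4103: keep only results whose time is not in OT."""
    return [(t, v) for t, v in results if t not in overlap_times]


data = [(t, float(t)) for t in range(20)]
# Runs of 8 elements overlapping by 2 (= window width 3 - slide count 1).
runs = [(0, 8, set()), (6, 14, {6, 7}), (12, 20, {12, 13})]
output: List[Item] = []
for start, end, ot in runs:
    output.extend(stream_to_vector(rolling_sum(data[start:end]), ot))

# The stitched output equals one continuous run over all the data.
print(output == rolling_sum(data))  # -> True
```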
- FIG. 42 is an explanatory diagram showing an example of conversion from stream data to vector data VD. FIG. 42 will be described with reference to the step numbers in FIG. 41.
- In step S4101, the output data / stream TO vector conversion unit 3202 sequentially acquires the stream data 4201 to 4204 from the stream storage queue Q.
- In step S4102, the output data / stream TO vector conversion unit 3202 determines whether the time of each acquired stream data item coincides with the overlap data time OT. Here, the stream data 4201 at time 0:58 coincides, while the stream data from 4202 (time 1:02) onward do not. Therefore, in step S4103, the unit stores only the non-coinciding stream data 4202 to 4204 (time 1:02 onward) in the file F2. Stream data whose time coincides with data already output in the previous execution are thus not converted to vector data and not output, so the downstream batch program 600 can execute batch processing by referring to the file.
- FIG. 43 is a flowchart showing a processing procedure performed by the calculation state reading unit 3203 shown in FIG. 32.
- The calculation state reading unit 3203 sequentially refers to the operators constituting the operator tree 3400 (step S4301), extracts the calculation state 3211 of each referenced operator from the calculation state storage area 3210, and writes it into the stream program 400 (step S4302).
- The calculation state reading unit 3203 then determines whether all operators in the operator tree 3400 have been referred to (step S4303). If not (step S4303: No), the process returns to step S4301; if so (step S4303: Yes), the processing by the calculation state reading unit 3203 ends.
- FIG. 44 is a flowchart showing a processing procedure performed by the calculation state holding unit 3204 shown in FIG. 32.
- The calculation state holding unit 3204 sequentially refers to the operators constituting the operator tree 3400 (step S4401), reads the calculation state 3211 of each referenced operator from the stream program 400, and holds it in the calculation state storage area 3210 (step S4402). It then determines whether all operators in the operator tree 3400 have been referred to (step S4403). If not (step S4403: No), the process returns to step S4401; if so (step S4403: Yes), the processing by the calculation state holding unit 3204 ends.
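The two flowcharts in FIG. 43 and FIG. 44 mirror each other: one restores each operator's calculation state 3211 from the storage area 3210, the other saves it. The Python sketch below pairs them for the case where the calculation state is held and the overlap width is 0 (steps S3805 and S3806). The operator classes, their `get_state`/`set_state` methods, and the dictionary standing in for the calculation state storage area 3210 are all hypothetical; they only loosely mirror the ROWS and GROUP BY operators of FIG. 34 and FIG. 35.

```python
from collections import deque
from typing import Dict, List


class RowsOperator:
    """Hypothetical ROWS-like operator; its calculation state is the last n values."""

    def __init__(self, name: str, n: int):
        self.name = name
        self.window: deque = deque(maxlen=n)

    def process(self, value: float) -> List[float]:
        self.window.append(value)
        return list(self.window)

    def get_state(self):
        return list(self.window)

    def set_state(self, state) -> None:
        self.window = deque(state, maxlen=self.window.maxlen)


class AvgOperator:
    """Hypothetical GROUP BY-like aggregate; its calculation state is the latest average."""

    def __init__(self, name: str):
        self.name = name
        self.average = None

    def process(self, window: List[float]) -> float:
        self.average = sum(window) / len(window)
        return self.average

    def get_state(self):
        return self.average

    def set_state(self, state) -> None:
        self.average = state


def read_states(tree, area: Dict[str, object]) -> None:
    """S4301 to S4303: write each operator's saved state back into the program."""
    for op in tree:
        if op.name in area:
            op.set_state(area[op.name])


def hold_states(tree, area: Dict[str, object]) -> None:
    """S4401 to S4403: copy each operator's state into the storage area."""
    for op in tree:
        area[op.name] = op.get_state()


def run_batch(values: List[float], area: Dict[str, object]) -> List[float]:
    # A fresh operator tree per batch, executed in tree order (ROWS, then GROUP BY).
    tree = [RowsOperator("ROWS", 4), AvgOperator("GROUP BY")]
    read_states(tree, area)
    out = []
    for v in values:
        x = v
        for op in tree:
            x = op.process(x)
        out.append(x)
    hold_states(tree, area)
    return out


data = [float(i) for i in range(12)]
area: Dict[str, object] = {}
batched = run_batch(data[:5], area) + run_batch(data[5:], area)
print(batched == run_batch(data, {}))  # -> True, with no overlapping input
```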
- According to the second embodiment, the stream data can effectively be overlapped by reusing the execution result of the stream processing from the previous batch run. This avoids situations in which a calculation cannot be performed and preserves the semantics of the stream program.
- Processing throughput can be improved by increasing the input data size of each batch program execution. In addition, a stream program that requires overlapping input data can be executed on a batch processing platform.
- When a program for one type of processing (stream or batch) is executed on the platform for the other type, that program can be executed with overlapping time-series data.
- A program for one type of processing can be executed on the platform for the other type without changing its code or algorithm. An existing program can therefore be reused on a platform designed for a different processing model and executed easily and efficiently.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Stored Programmes (AREA)
- Debugging And Monitoring (AREA)
Abstract
Description
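FIGS. 1 and 2 are explanatory diagrams showing examples of executing a batch program on a stream processing platform. FIG. 1 is an execution example in which the data are not overlapped. The batch program has a program configuration in which the window width (a number of data items) is 4 and the slide count (the number of data items by which the window is slid) is 2. As an example, the unit of time is "seconds". The stream processing platform executes a predetermined calculation while sliding a window of 4 stream data items by 2 items at a time.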
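The sliding behavior in FIG. 1 (window width 4, slide count 2) can be reproduced with a short Python sketch. The data values 0 to 7 and the helper name `sliding_windows` are hypothetical; the point is that consecutive windows share window width − slide count = 2 items, the same quantity later set as the overlap width in step S1702.

```python
from typing import List, Sequence


def sliding_windows(data: Sequence[int], width: int, slide: int) -> List[Sequence[int]]:
    """Windows of `width` consecutive items, advancing `slide` items at a time."""
    return [data[i:i + width] for i in range(0, len(data) - width + 1, slide)]


windows = sliding_windows(list(range(8)), width=4, slide=2)
print(windows)  # -> [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
# Consecutive windows share width - slide = 2 items.
print(windows[0][-2:] == windows[1][:2])  # -> True
```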
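FIG. 3 is a system configuration diagram showing an example of a stream processing system 300. In the stream processing system 300, a client 301, a data source 302, and a stream processing server 303 are communicably connected via a network 304. The network 304 may be a local area network (LAN) connected by Ethernet (registered trademark), optical fiber, or the like, or a wide area network (WAN), including the Internet, that is slower than a LAN. The client 301, the data source 302, and the stream processing server 303 may each be any computer system, such as a personal computer (PC) or a blade-type computer system.
FIG. 4 is an explanatory diagram showing an example of one stream program in the stream program group 331 shown in FIG. 3. The stream program 400 is a program whose input and output are stream data. FIG. 4 shows a stream program 400 defined in CQL (Continuous Query Language). The stream program 400 includes an input stream definition, an output stream definition, and a group of query definitions.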
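FIG. 6 is an explanatory diagram showing an example of one batch program in the batch program group 334 shown in FIG. 3. The batch program 600 is a program whose input and output are vector data. In FIG. 6, the batch program 600 includes a definition of vector data and a batch processing function.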
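FIG. 8 is an explanatory diagram showing the input/output relationship of the batch program input/output static determination unit 332 shown in FIG. 3. The batch program input/output static determination unit 332 is a program executed by the CPU 311 on the stream processing platform 321 and determines the static input/output settings of the batch program 600. The batch program input/output static determination unit 332 has a first static determination unit 804.
FIG. 9 is an explanatory diagram showing an example of the program configuration 801 shown in FIG. 8. The program configuration 801 is information in which the parameters that configure a program's operation are set. The parameters include, for example, a window width 901 and a slide count 902, which the user specifies by operating the client 301.
FIG. 10 is an explanatory diagram showing an example of the platform requirement 802 shown in FIG. 8. The platform requirement 802 is a condition imposed on the stream processing platform 321. The parameters include, for example, a response time 1001, which the user specifies by operating the client 301. The response time 1001 is the time, acceptable to the user, from when data is input to the stream processing server 303 until processing of that data is completed. In FIG. 10, the response time 1001 is specified as "16 seconds", so up to 16 seconds are allowed from the input of data until its processing is completed.
FIG. 11 is an explanatory diagram showing an example of the batch execution specification 803 shown in FIG. 8. The batch execution specification 803 is information that defines how batch processing is executed. The parameters include, for example, an input rate 1101 and a processing throughput 1102, which the user specifies by operating the client 301. The input rate 1101 indicates how frequently the stream data input to the batch program 600 arrive. In FIG. 11, the input rate 1101 is 1 item/second, meaning one stream data item arrives every second. The processing throughput 1102 indicates the number of elements of the vector data VD that the batch program 600 processes per unit time. In FIG. 11, the processing throughput 1102 is 1 item/second, meaning the batch program 600 can process one element per second.
FIG. 12 is an explanatory diagram showing the input/output relationship of the batch program input/output dynamic determination unit 333 shown in FIG. 3. The batch program input/output dynamic determination unit 333 is a program executed by the CPU 311 on the stream processing platform 321 and determines the dynamic input/output settings of the batch program 600. The batch program input/output dynamic determination unit 333 has a batch execution monitoring unit 1201 and a first dynamic determination unit 1203.
FIG. 13 is an explanatory diagram showing an example of the batch execution monitoring value 1202. The batch execution monitoring value 1202 has, as parameters, a processing target data count 1301 and a processing throughput 1302, and is output by the batch execution monitoring unit 1201. The processing target data count 1301 indicates the number of stream data items, to be input to the batch program 600, that are stored in the stream storage queue Q. In FIG. 13, the processing target data count 1301 is "6", meaning there are six stream data items in the stream storage queue Q. The processing throughput 1302 indicates the size of the values of the vector data VD that the batch program 600 processes per unit time.
FIG. 14 is an explanatory diagram showing an example of the batch program input/output setting 805 shown in FIGS. 8 and 12. The batch program input/output setting 805 is information that defines the data input to and output from the batch program 600. The parameters include, for example, an input data size 1401 and an overlap width 1402, which the user specifies by operating the client 301. The input data size 1401 indicates the number of elements of the vector data VD input to the batch program 600.
FIG. 15 is an explanatory diagram showing the input/output relationship of the batch program execution unit 335 shown in FIG. 3. The batch program execution unit 335 has an input data / stream TO vector conversion unit 1501 and an output data / vector TO stream conversion unit 1502. The input data / stream TO vector conversion unit 1501 receives the batch program input/output setting 805 as input. The batch program input/output setting 805 may be generated by the first static determination unit 804 or the first dynamic determination unit 1203, or may be created manually by the user.
FIG. 17 is a flowchart showing an example of a processing procedure performed by the batch program input/output static determination unit 332. The first static determination unit 804 first reads the program configuration 801, the platform requirement 802, and the batch execution specification 803 specified by the user (S1701). Next, the first static determination unit 804 sets the overlap width 1402 in the batch program input/output setting 805 to "window width − slide count" (S1702).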
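FIG. 18 is a flowchart showing an example of a processing procedure performed by the batch execution monitoring unit 1201. The batch execution monitoring unit 1201 acquires the current number of data items in the stream storage queue Q, which stores the stream data input to the batch program 600, and sets it as the processing target data count 1301 of the batch execution monitoring value 1202 (S1801). Next, the batch execution monitoring unit 1201 extracts the processing throughput from the log of the stream processing platform 321 and sets it as the processing throughput 1302 of the batch execution monitoring value 1202 (S1802). Then, if the stream processing platform 321 has not ended (step S1803: No), the batch execution monitoring unit 1201 returns to step S1801; if it has ended (step S1803: Yes), the processing ends.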
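FIG. 19 is a flowchart showing an example of a processing procedure performed by the first dynamic determination unit 1203. The first dynamic determination unit 1203 first reads the program configuration 801 and the platform requirement 802 (S1901). Next, the first dynamic determination unit 1203 sets the overlap width 1402 in the batch program input/output setting 805 to "window width 901 − slide count" (S1902).
FIG. 20 is a flowchart showing an example of a processing procedure performed by the input data / stream TO vector conversion unit 1501. The input data / stream TO vector conversion unit 1501 reads the batch program input/output setting (S2001). Next, it determines whether the stream storage queue Q contains at least as many stream data items as the input data size 1401 minus the overlap width 1402 (step S2002).
FIG. 21 is an explanatory diagram showing an example of conversion from stream data to vector data VD. FIG. 21 will be described with reference to the step numbers in FIG. 20. In step S2001, the input data / stream TO vector conversion unit 1501 reads a batch program input/output setting 805 in which the input data size 1401 is 8 and the overlap width 1402 is 2.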
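The conversion of FIG. 20 and FIG. 21 (input data size 8, overlap width 2) can be sketched in Python as a buffer that carries the last overlap-width elements into the next vector. The class and method names are hypothetical, and the rule that the very first vector waits for a full input data size of stream data is an assumption; the text only states the check for at least input data size − overlap width items (step S2002).

```python
from collections import deque
from typing import Deque, List, Optional


class StreamToVector:
    """Hypothetical sketch of the input data / stream TO vector conversion unit 1501."""

    def __init__(self, input_data_size: int, overlap_width: int):
        self.size = input_data_size
        self.overlap = overlap_width
        self.tail: List[float] = []  # last `overlap` elements of the previous vector

    def try_build(self, queue: Deque[float]) -> Optional[List[float]]:
        # New stream data needed: input_data_size - overlap_width once a previous
        # vector exists (the S2002 check); the full size for the very first vector.
        needed = self.size - len(self.tail)
        if len(queue) < needed:
            return None  # wait for more stream data
        vector = self.tail + [queue.popleft() for _ in range(needed)]
        # Remember the trailing elements that the next vector must repeat.
        self.tail = vector[len(vector) - self.overlap:] if self.overlap else []
        return vector


q: Deque[float] = deque(float(i) for i in range(20))
conv = StreamToVector(input_data_size=8, overlap_width=2)
print(conv.try_build(q))  # elements 0..7
print(conv.try_build(q))  # elements 6..13 (6 and 7 repeated)
print(conv.try_build(q))  # elements 12..19
print(conv.try_build(q))  # -> None: fewer than 6 new items are queued
```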
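FIG. 22 is a flowchart showing an example of a processing procedure performed by the output data / vector TO stream conversion unit 1502. The output data / vector TO stream conversion unit 1502 acquires the vector data VD output by the batch program 600 (S2201). Next, it sequentially acquires elements from the vector data VD and adds a time to each acquired element to generate stream data (step S2202).
FIG. 23 is an explanatory diagram showing an example of conversion from vector data VD to stream data. FIG. 23 will be described with reference to the step numbers in FIG. 22. In step S2201, the output data / vector TO stream conversion unit 1502 acquires vector data VD2 having elements at indexes 0 to 2.
Next, the second embodiment will be described. The second embodiment is an example of executing the stream program 400 on a batch processing platform. Components identical to those of the first embodiment are denoted by the same reference numerals, and their description is omitted.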
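FIG. 26 is a system configuration diagram showing an example of a batch processing system. In the batch processing system 2600, a client 301, a data source 302, and a batch processing server 2603 are communicably connected via a network 304. The network 304 may be Ethernet (registered trademark), a LAN, or a WAN. The client 301, the data source 302, and the batch processing server 2603 may each be any computer system, such as a PC or a blade-type computer system.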
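FIG. 27 is an explanatory diagram showing the input/output relationship of the stream program input/output setting static determination unit 2632 shown in FIG. 26. The stream program input/output setting static determination unit 2632 is a program executed by the CPU 2611 on the batch processing platform 2621 and determines the static input/output settings of the stream program 400. The stream program input/output setting static determination unit 2632 has a second static determination unit 2702.
FIG. 28 is an explanatory diagram showing an example of the stream execution specification 2701. The stream execution specification 2701 is information that defines how stream processing is executed. The parameters include, for example, an input rate 2801, a processing throughput 2802, and a calculation state holding/reading time 2803, which the user specifies by operating the client 301. The input rate 2801 indicates how frequently the elements of the vector data VD input to the stream program 400 arrive.
FIG. 29 is an explanatory diagram showing the input/output relationship of the stream program input/output setting dynamic determination unit 2633 shown in FIG. 26. The stream program input/output setting dynamic determination unit 2633 is a program executed by the CPU 2611 on the batch processing platform 2621 and determines the dynamic input/output settings of the stream program 400. The stream program input/output setting dynamic determination unit 2633 has a stream execution monitoring unit 2901 and a second dynamic determination unit 2903.
FIG. 30 is an explanatory diagram showing an example of the stream execution monitoring value 2902. The stream execution monitoring value 2902 has, as parameters, a processing target data count 3001, a processing throughput 3002, and a calculation state holding/reading time 3003, and is output by the stream execution monitoring unit 2901. The processing target data count 3001 indicates the number of elements of the vector data VD input to the stream program 400. These elements are stored, for example, in a file.
FIG. 31 is an explanatory diagram showing an example of the stream program input/output setting 2703 shown in FIGS. 27 and 29. The stream program input/output setting 2703 is information that defines the data input to and output from the stream program 400. The parameters include, for example, an input data size 3101, an overlap width 3102, and a calculation state holding presence/absence 3103, which the user specifies by operating the client 301. The input data size 3101 indicates the size of the stream data input to the stream program 400.
FIG. 32 is an explanatory diagram showing the input/output relationship of the stream program execution unit 2635 shown in FIG. 26. The stream program execution unit 2635 has an input data / vector TO stream conversion unit 3201, an output data / stream TO vector conversion unit 3202, a calculation state reading unit 3203, and a calculation state holding unit 3204.
FIG. 33 is an explanatory diagram showing an example of the overlap data time shown in FIG. 32. The overlap data time OT indicates the times of those stream data, among the stream data input to the stream program 400, that duplicate stream data input to the stream program 400 in the immediately preceding execution. In FIG. 33, the overlap data time OT is "0:57 to 0:59", so both the current and the previous execution of the stream program 400 take the stream data at times 0:57 to 0:59 as input. The overlap data time OT is set by the input data / vector TO stream conversion unit 3201 and used by the output data / stream TO vector conversion unit 3202.
FIG. 34 is an explanatory diagram showing an example of an operator tree. The operator tree 3400 is generated by compiling the stream program 400 written in CQL. The stream processing platform 321 executes the operators 3401 to 3404 constituting the operator tree 3400 in the order specified by the operator tree 3400. The operator tree 3400 shown in FIG. 34 is generated by compiling the noise removal query 403 and the abnormal sensor query 404 shown in FIG. 4. The operator tree 3400 consists of, for example, ROWS 3401, GROUP BY 3402, ISTREAM 3403, and ISTREAM 3404, which are executed in the order ROWS 3401, GROUP BY 3402, ISTREAM 3403, ISTREAM 3404.
FIG. 35 is an explanatory diagram showing an example of the calculation state storage area 3210 shown in FIG. 32. The calculation state storage area 3210 stores calculation states 3211. A calculation state 3211 is the state used in the calculation of each of the operators 3401 to 3404. For example, the calculation state of ROWS 3401 is a window holding the 4 most recent stream data items, so it stores 4 stream data items. The calculation state 3211 of the GROUP BY operator 3402 stores the average of the 4 most recent measured values.
FIG. 36 is a flowchart showing an example of a processing procedure performed by the second static determination unit 2702. The second static determination unit 2702 first reads the program configuration 801, the platform requirement 802, and the stream execution specification 2701 (step S3601).
FIG. 37 is a flowchart showing an example of a processing procedure performed by the stream execution monitoring unit 2901. The stream execution monitoring unit 2901 refers to the file storing the stream data input to the stream program 400 and sets the number of such data as the processing target data count 3001 of the stream execution monitoring value 2902 (step S3701).
FIG. 38 is a flowchart showing an example of a processing procedure performed by the second dynamic determination unit 2903. The second dynamic determination unit 2903 first reads the program configuration 801 and the platform requirement 802 (step S3801). Next, it determines whether elements of the vector data VD to be processed exist in the file (step S3802). If the vector data VD to be processed exist in the file (step S3802: Yes), the second dynamic determination unit 2903 reads the stream execution monitoring value 2902 (step S3803).
FIG. 39 is a flowchart showing an example of a processing procedure performed by the input data / vector TO stream conversion unit 3201 shown in FIG. 32. The input data / vector TO stream conversion unit 3201 first sets the read index of the file to "index of the last input data + 1 − overlap width" (step S3901). The last input data is the element of the vector data VD that was last read from the file.
FIG. 40 is an explanatory diagram showing an example of conversion from vector data VD to stream data. FIG. 40 will be described with reference to the step numbers in FIG. 39. Assume that the last input data at this point is the element (10.0) at index 1002.
FIG. 41 is a flowchart showing an example of a processing procedure performed by the output data / stream TO vector conversion unit 3202 shown in FIG. 32. The output data / stream TO vector conversion unit 3202 sequentially acquires stream data from the stream storage queue Q (S4101). Next, it determines whether the time of the acquired stream data coincides with the overlap data time OT (step S4102).
FIG. 42 is an explanatory diagram showing an example of conversion from stream data to vector data VD. FIG. 42 will be described with reference to the step numbers in FIG. 41. In step S4101, the output data / stream TO vector conversion unit 3202 sequentially acquires the stream data 4201 to 4204 from the stream storage queue Q.
FIG. 43 is a flowchart showing a processing procedure performed by the calculation state reading unit 3203 shown in FIG. 32. First, the calculation state reading unit 3203 sequentially refers to the operators constituting the operator tree 3400 (step S4301), extracts the calculation state 3211 of each referenced operator from the calculation state storage area 3210, and writes it into the stream program 400 (step S4302). Then, the calculation state reading unit 3203 determines whether all operators in the operator tree 3400 have been referred to (step S4303). If not (step S4303: No), the process returns to step S4301. If all operators have been referred to (step S4303: Yes), the processing by the calculation state reading unit 3203 ends.
FIG. 44 is a flowchart showing a processing procedure performed by the calculation state holding unit 3204 shown in FIG. 32. First, the calculation state holding unit 3204 sequentially refers to the operators constituting the operator tree 3400 (step S4401), reads the calculation state 3211 of each referenced operator from the stream program 400, and holds it in the calculation state storage area 3210 (step S4402). Then, the calculation state holding unit 3204 determines whether all operators in the operator tree 3400 have been referred to (step S4403). If not (step S4403: No), the process returns to step S4401. If all operators have been referred to (step S4403: Yes), the processing by the calculation state holding unit 3204 ends.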
Claims (14)
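- A data processing apparatus comprising: a processor; and a memory that stores a stream program that executes stream processing, a batch program that executes batch processing, and a stream processing control program that controls the stream program, wherein
the processor executes:
a first generation procedure of generating, by the stream processing control program, for a time-series first stream data group starting from a certain stream data item of a time-series stream data sequence, first vector data in which the stream data items of the first stream data group are collected as elements;
a second generation procedure of generating, by the stream processing control program, for a time-series second stream data group of the time-series stream data sequence that starts with a stream data item partway through the first stream data group and has the same number of data items as the first stream data group, second vector data in which the stream data items of the second stream data group are collected as elements; and
a control procedure of inputting, by the stream processing control program, the first vector data and the second vector data generated by the first generation procedure and the second generation procedure into the batch program to cause the batch program to execute batch processing.
- The data processing apparatus according to claim 1, wherein the processor executes:
a first conversion procedure of converting, by the stream processing control program, for third vector data that is the execution result of batch processing executed by the control procedure with the first vector data input to the batch program, an element group of per-time values included in the third vector data into a time-series third stream data group of values associated with times; and
a second conversion procedure of converting, by the stream processing control program, for fourth vector data that is the execution result of batch processing executed by the control procedure with the second vector data input to the batch program, an element group of per-time values included in the fourth vector data into a time-series fourth stream data group of values associated with times.
- The data processing apparatus according to claim 1, wherein
the processor executes, by the stream processing control program, an output procedure of outputting the first stream data group and the second stream data group that are processing results of stream processing executed by controlling the stream program,
in the first generation procedure,
the first vector data is generated by the stream processing control program for the first stream data group output by the output procedure, and
in the second generation procedure,
the second vector data is generated by the stream processing control program for the second stream data group output by the output procedure.
- The data processing apparatus according to claim 1, wherein
the processor executes a setting procedure of setting, by the stream processing control program, as an input data size, the number of elements to be included in the first vector data and the second vector data, which equals the number of data items present in the first stream data group and the second stream data group, and setting, as an overlap width, the number of elements duplicated between the first vector data and the second vector data, which equals the number of stream data items duplicated between the first stream data group and the second stream data group,
in the first generation procedure,
the first vector data is generated by the stream processing control program, in accordance with the input data size and the overlap width set by the setting procedure, for the first stream data group output by the output procedure, and
in the second generation procedure,
the second vector data is generated by the stream processing control program, in accordance with the input data size and the overlap width set by the setting procedure, for the second stream data group output by the output procedure.
- The data processing apparatus according to claim 4, wherein
in the setting procedure,
the processor sets, by the stream processing control program, the input data size on the basis of a response time from when each stream data item of the time-series stream data sequence is input to the batch program until batch processing is completed, an input rate that is the interval at which each stream data item is input to the batch program, and a processing throughput that is the number of elements of vector data processed by the batch program per unit time.
- The data processing apparatus according to claim 4, wherein
the processor executes:
an acquisition procedure of acquiring a processing target data count, which is the number of stream data items of the time-series stream data sequence stored in a queue, and a processing throughput, which is the number of elements of vector data that the batch program currently processes per unit time under the stream processing control program; and
a calculation procedure of calculating the number of stream data items processable by the batch program on the basis of the processing throughput acquired by the acquisition procedure, the oldest of the times held by the stream data stored in the queue, and a response time from when each stream data item of the time-series stream data sequence is input to the batch program until batch processing is completed, and
in the setting procedure,
the input data size is set on the basis of the processing target data count acquired by the acquisition procedure, the overlap width, and the number of processable stream data items calculated by the calculation procedure.
- A data processing apparatus comprising: a processor; and a memory that stores a batch program that executes batch processing, a stream program that executes stream processing, and a batch processing control program that controls the batch program, wherein
the processor executes:
a first generation procedure of generating, by the batch processing control program, from vector data including an element sequence of per-time values, a first stream data group in which the elements of a first element group in the element sequence are divided and arranged in time series;
a second generation procedure of generating, by the batch processing control program, for a time-series second element group of the element sequence that starts with an element partway through the first element group and has the same number of elements as the first element group, a second stream data group in which the elements of the second element group are divided and arranged in time series;
a control procedure of inputting, by the batch processing control program, the first stream data group and the second stream data group generated by the first generation procedure and the second generation procedure into the stream program to cause the stream program to execute stream processing; and
a conversion procedure of, by the batch processing control program, acquiring the stream data of a third stream data group that is the execution result of stream processing executed by the control procedure with the first stream data group input to the stream program, acquiring a fourth stream data group that is the execution result of stream processing executed by the control procedure with the second stream data group input to the stream program, and converting into second vector data the stream data group obtained by excluding, from the fourth stream data group, the stream data that duplicate stream data of the third stream data group.
- The data processing apparatus according to claim 7, wherein
the processor executes a setting procedure of setting, by the batch processing control program, an input data size that is the number of stream data items to be included in the first stream data group and the second stream data group, and an overlap width that is the number of stream data items duplicated between the first stream data group and the second stream data group,
the first generation procedure
generates, by the stream processing control program, the first stream data group from the first vector data in accordance with the input data size and the overlap width set by the setting procedure, and
the second generation procedure
generates, by the stream processing control program, the second stream data group from the first vector data in accordance with the input data size and the overlap width set by the setting procedure.
- A data processing method executed by a data processing apparatus having a processor and a memory that stores a stream program that executes stream processing, a batch program that executes batch processing, and a stream processing control program that controls the stream program, wherein
the processor executes:
a first generation procedure of generating, by the stream processing control program, for a time-series first stream data group starting from a certain stream data item of a time-series stream data sequence, first vector data in which the stream data items of the first stream data group are collected as elements;
a second generation procedure of generating, by the stream processing control program, for a time-series second stream data group of the time-series stream data sequence that starts with a stream data item partway through the first stream data group and has the same number of data items as the first stream data group, second vector data in which the stream data items of the second stream data group are collected as elements; and
a control procedure of inputting, by the stream processing control program, the first vector data and the second vector data generated by the first generation procedure and the second generation procedure into the batch program to cause the batch program to execute batch processing.
- The data processing method according to claim 9, wherein the processor executes:
a first conversion procedure of converting, by the stream processing control program, for third vector data that is the execution result of batch processing executed by the control procedure with the first vector data input to the batch program, an element group of per-time values included in the third vector data into a time-series third stream data group of values associated with times; and
a second conversion procedure of converting, by the stream processing control program, for fourth vector data that is the execution result of batch processing executed by the control procedure with the second vector data input to the batch program, an element group of per-time values included in the fourth vector data into a time-series fourth stream data group of values associated with times.
- The data processing method according to claim 9, wherein
the processor executes, by the stream processing control program, an output procedure of outputting the first stream data group and the second stream data group that are processing results of stream processing executed by controlling the stream program,
in the first generation procedure,
the first vector data is generated by the stream processing control program for the first stream data group output by the output procedure, and
in the second generation procedure,
the second vector data is generated by the stream processing control program for the second stream data group output by the output procedure.
- The data processing method according to claim 9, wherein
the processor executes a setting procedure of setting, by the stream processing control program, as an input data size, the number of elements to be included in the first vector data and the second vector data, which equals the number of data items present in the first stream data group and the second stream data group, and setting, as an overlap width, the number of elements duplicated between the first vector data and the second vector data, which equals the number of stream data items duplicated between the first stream data group and the second stream data group,
in the first generation procedure,
the first vector data is generated by the stream processing control program, in accordance with the input data size and the overlap width set by the setting procedure, for the first stream data group output by the output procedure, and
in the second generation procedure,
the second vector data is generated by the stream processing control program, in accordance with the input data size and the overlap width set by the setting procedure, for the second stream data group output by the output procedure.
- The data processing method according to claim 12, wherein
in the setting procedure,
the processor sets, by the stream processing control program, the input data size on the basis of a response time from when each stream data item of the time-series stream data sequence is input to the batch program until batch processing is completed, an input rate that is the interval at which each stream data item is input to the batch program, and a processing throughput that is the number of elements of vector data processed by the batch program per unit time.
- The data processing method according to claim 12, wherein
the processor executes:
an acquisition procedure of acquiring a processing target data count, which is the number of stream data items of the time-series stream data sequence stored in a queue, and a processing throughput, which is the number of elements of vector data that the batch program currently processes per unit time under the stream processing control program; and
a calculation procedure of calculating the number of stream data items processable by the batch program on the basis of the processing throughput acquired by the acquisition procedure, the oldest of the times held by the stream data stored in the queue, and a response time from when each stream data item of the time-series stream data sequence is input to the batch program until batch processing is completed, and
in the setting procedure,
the input data size is set on the basis of the processing target data count acquired by the acquisition procedure, the overlap width, and the number of processable stream data items calculated by the calculation procedure.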
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2013/069630 WO2015008379A1 (ja) | 2013-07-19 | 2013-07-19 | データ処理装置およびデータ処理方法 |
US14/771,064 US9921869B2 (en) | 2013-07-19 | 2013-07-19 | Data processing apparatus and data processing method |
JP2015527126A JP6038324B2 (ja) | 2013-07-19 | 2013-07-19 | データ処理装置およびデータ処理方法 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2013/069630 WO2015008379A1 (ja) | 2013-07-19 | 2013-07-19 | データ処理装置およびデータ処理方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015008379A1 true WO2015008379A1 (ja) | 2015-01-22 |
Family
ID=52345871
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2013/069630 WO2015008379A1 (ja) | 2013-07-19 | 2013-07-19 | データ処理装置およびデータ処理方法 |
Country Status (3)
Country | Link |
---|---|
US (1) | US9921869B2 (ja) |
JP (1) | JP6038324B2 (ja) |
WO (1) | WO2015008379A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116841753A (zh) * | 2023-08-31 | 2023-10-03 | 杭州迅杭科技有限公司 | 一种流处理和批处理的切换方法及切换装置 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10748646B2 (en) * | 2016-12-30 | 2020-08-18 | General Electric Company | Chunk-wise transmission of time-series data to mobile devices |
CN111506350A (zh) * | 2020-04-30 | 2020-08-07 | 中科院计算所西部高等技术研究院 | 具有ooda循环分区机制的流式处理器 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7310638B1 (en) * | 2004-10-06 | 2007-12-18 | Metra Tech | Method and apparatus for efficiently processing queries in a streaming transaction processing system |
JP2010108073A (ja) * | 2008-10-28 | 2010-05-13 | Hitachi Ltd | ストリームデータ処理方法、及びシステム |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5423553B2 (ja) | 2010-04-09 | 2014-02-19 | 株式会社日立製作所 | データベース管理方法、計算機、センサネットワークシステム及びデータベース検索プログラム |
US9275093B2 (en) * | 2011-01-28 | 2016-03-01 | Cisco Technology, Inc. | Indexing sensor data |
US8978034B1 (en) * | 2013-03-15 | 2015-03-10 | Natero, Inc. | System for dynamic batching at varying granularities using micro-batching to achieve both near real-time and batch processing characteristics |
-
2013
- 2013-07-19 JP JP2015527126A patent/JP6038324B2/ja not_active Expired - Fee Related
- 2013-07-19 US US14/771,064 patent/US9921869B2/en not_active Expired - Fee Related
- 2013-07-19 WO PCT/JP2013/069630 patent/WO2015008379A1/ja active Application Filing
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116841753A (zh) * | 2023-08-31 | 2023-10-03 | 杭州迅杭科技有限公司 | 一种流处理和批处理的切换方法及切换装置 |
CN116841753B (zh) * | 2023-08-31 | 2023-11-17 | 杭州迅杭科技有限公司 | 一种流处理和批处理的切换方法及切换装置 |
Also Published As
Publication number | Publication date |
---|---|
JP6038324B2 (ja) | 2016-12-07 |
US20160004555A1 (en) | 2016-01-07 |
US9921869B2 (en) | 2018-03-20 |
JPWO2015008379A1 (ja) | 2017-03-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP2932407B1 (en) | Method for storage, querying, and analysis of time series data | |
EP3502814B1 (en) | Processing loads balancing of control and monitoring functions | |
JP6038324B2 (ja) | データ処理装置およびデータ処理方法 | |
CN105980941A (zh) | 监视装置及监视方法 | |
US10452048B2 (en) | Control system and control device | |
Sobaszek et al. | Predictive scheduling as a part of intelligent job scheduling system | |
WO2020072155A1 (en) | Orchestration of containerized applications | |
CN116594349B (zh) | 机床预测方法、装置、终端设备以及计算机可读存储介质 | |
JP2018512630A (ja) | 工作機械の監視方法 | |
CN105988854A (zh) | 动态编译方法及装置 | |
Alrabghi et al. | A novel framework for simulation-based optimisation of maintenance systems | |
Huba | Some practical issues in the smith predictor design for FOTD systems | |
Balbastre et al. | Control tasks delay reduction under static and dynamic scheduling policies | |
JP5149254B2 (ja) | 実行可能な設定の生成 | |
US10871759B2 (en) | Machining time prediction device for predicting an execution time for tool change | |
US20170371908A1 (en) | Automatic updating of operational tables | |
EP4130899A1 (en) | Control device, program, and control method | |
Su et al. | A Decentralized MILP Method for Rescheduling Semiconductor Assembly Systems with Re-Entrance and Time Window Constraints | |
US10088834B2 (en) | Control system having function for optimizing control software of numerical controller in accordance with machining program | |
JP6349837B2 (ja) | スケジューラ装置及びそのスケジューリング方法、演算処理システム、並びにコンピュータ・プログラム | |
US20160154387A1 (en) | Function unit, analog input unit, and programmable controller system | |
CN117896417B (zh) | 一种嵌入式工业物联网控制器 | |
Žužek et al. | A max-plus algebra approach for generating non-delay schedule | |
US20180173504A1 (en) | Apparatus for Providing Program | |
JP2007011685A (ja) | パラメータ設定装置、パラメータ設定方法およびプログラム |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 13889631 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 14771064 Country of ref document: US |
|
ENP | Entry into the national phase |
Ref document number: 2015527126 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 13889631 Country of ref document: EP Kind code of ref document: A1 |