WO2014118954A1 - Data compression apparatus, data compression method, and program - Google Patents
Data compression apparatus, data compression method, and program
- Publication number
- WO2014118954A1 PCT/JP2013/052245
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- starting point
- time
- input
- candidate
- Prior art date
Classifications
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3059—Digital compression and data reduction techniques where the original information is represented by a subset or similar information, e.g. lossy compression
Definitions
- Embodiments described herein relate generally to a data compression apparatus, a data compression method, and a program.
- A method of compressing time-series data by thinning out the point data that constitute it is known.
- Such compression methods include the Box Car algorithm, the Backward Slope algorithm, and the Swinging Door algorithm.
- The Swinging Door algorithm is a typical example of an algorithm that thins out data by linear approximation so that the error is at or below a preset threshold value.
- In the Swinging Door algorithm, one starting point is determined, and linear approximation is performed from this starting point so that the error does not exceed the preset threshold value.
- time series data stored in the time series database tend to increase, and there is a need for a method for compressing time series data more efficiently.
- the data compression apparatus includes a reception unit, a generation unit, a selection unit, and a compression unit.
- the reception unit receives a plurality of input data input in time series.
- the generation unit generates a plurality of starting point candidates that are data whose error with respect to the starting point data that is input data input at the first time is within a threshold value.
- The selection unit uses a starting point candidate, end point data that is input data input at a second time, and intermediate data that is input data input at a time between the first time and the second time. From among the starting point candidates, it selects a starting point candidate for which a larger number of intermediate data can be approximated, between the starting point candidate and the end point data, with the error within the threshold value.
- the compression unit outputs the selected starting point candidate and end point data as output data obtained by compressing the starting point data, the intermediate data, and the end point data.
- FIG. 1 is a block diagram illustrating an example of the configuration of the data compression apparatus according to the first embodiment.
- FIG. 2 is a diagram illustrating an example of time-series data.
- FIG. 3 is a diagram for explaining a first method of compressing time-series data.
- FIG. 4 is a diagram for explaining a first method for compressing time-series data.
- FIG. 5 is a diagram for explaining a first method for compressing time-series data.
- FIG. 6 is a diagram for explaining a second method for compressing time-series data.
- FIG. 7 is a diagram for explaining a second method for compressing time-series data.
- FIG. 8 is a diagram for explaining a second method for compressing time-series data.
- FIG. 9 is a diagram for explaining a second method for compressing time-series data.
- FIG. 10 is a diagram illustrating an example of the time-series data compression method according to the first embodiment.
- FIG. 11 is a diagram illustrating an example of a time-series data compression method according to the first embodiment.
- FIG. 12 is a diagram illustrating an example of the time-series data compression method according to the first embodiment.
- FIG. 13 is a diagram illustrating an example of a time-series data compression method according to the first embodiment.
- FIG. 14 is a flowchart showing an overall flow of data compression processing in the first embodiment.
- FIG. 15 is a diagram for explaining an example of post-processing.
- FIG. 16 is a block diagram illustrating an example of the configuration of the data compression apparatus according to the second embodiment.
- FIG. 17 is a diagram for explaining an example of the minimum lower limit gradient and the maximum upper limit gradient.
- FIG. 18 is a diagram illustrating an example of the upper limit gradient and the lower limit gradient.
- FIG. 19 is a diagram illustrating an example of the upper limit gradient and the lower limit gradient.
- FIG. 20 is a diagram illustrating an example of the upper limit gradient and the lower limit gradient.
- FIG. 21 is a flowchart showing an overall flow of data compression processing in the second embodiment.
- FIG. 22 is an explanatory diagram showing the hardware configuration of the data compression apparatus according to the first or second embodiment.
- the data compression apparatus determines a plurality of starting points (starting point candidates) and employs starting point candidates that can be more efficiently compressed to compress time-series data.
- Time series data is a series of values (point data string) obtained by observing or measuring a temporal change of a certain phenomenon. Time series data is usually measured at predetermined time intervals. Stock prices, plant equipment sensor values, etc. are examples of time-series data. For example, it can be said that each series of values such as temperature, vibration, and control set values of a large number of devices constituting the plant facility is one time series data.
- the time series database is a database of time series data.
- the time series database stores a large amount of time series data in a time series in a memory on a computer and an external storage device (hard disk).
- Data items that are the minimum unit of data storage are also called tags.
- the tag is composed of a data value, a time stamp, a data status, and the like.
- the types of data to be collected include operation data input from a control system, calculation data obtained by an online calculation function, data input manually by an operator, interface data input from another system, and the like.
- a time-series database generally has thousands to tens of thousands of tags, and the data storage period of each tag is one to several years.
- the data collection period depends on the real-time nature of the target system (plant equipment, etc.), but it is a few seconds to 1 minute.
- the time series database needs a capacity of about 10 GB (gigabyte) to 10 TB (terabyte).
- the search performance is inevitably deteriorated.
- FIG. 1 is a block diagram illustrating an example of the configuration of the data compression apparatus 100 according to the first embodiment.
- the data compression apparatus 100 includes a reception unit 101, a registration unit 110, a search unit 114, and a storage unit 121.
- The reception unit 101, the registration unit 110, and the search unit 114 may be realized by software, for example by causing a processing device such as a CPU (Central Processing Unit) to execute a program; by hardware such as an IC (Integrated Circuit); or by a combination of software and hardware.
- the storage unit 121 stores various data.
- the storage unit 121 stores time-series data after being compressed by the compression unit 113.
- the storage unit 121 can be configured by any commonly used storage medium such as an HDD (Hard Disk Drive), an optical disk, a memory card, and a RAM (Random Access Memory).
- The reception unit 101 receives processing requests and data input from an external device such as a client device.
- A time-series data registration request, a time-series data search request, and the like correspond to processing requests.
- The reception unit 101 receives a plurality of input data (point data of time-series data) input in time series.
- The reception unit 101 may receive point data input in real time.
- The reception unit 101 stores the point data input in real time in the storage unit 121, for example.
- The reception unit 101 may receive point data in time-series order from the time-series data stored in the storage unit 121 or the like.
- The reception unit 101 may also be configured to receive point data sequentially while going back in time from a certain time, that is, to receive the point data of each preceding time in turn.
- The registration unit 110 performs processing (compression processing) that thins out point data from the input point data series based on the allowable error, and registers the thinned point data in the storage unit 121 as time-series data. Any conventionally used algorithm, such as the Swinging Door algorithm, can be applied as the algorithm for thinning out point data using a starting point candidate and other point data.
- the registration unit 110 includes a generation unit 111, a selection unit 112, and a compression unit 113.
- The generation unit 111 generates a plurality of starting point candidates, each of which is data whose error with respect to the starting point data, that is, the point data at a certain time (first time), is within a predetermined threshold value.
- The selection unit 112 selects, from among the starting point candidates, a starting point candidate that can compress the time-series data more efficiently. For example, the selection unit 112 selects the starting point candidate with the larger number of point data (intermediate data) that are approximated, with the error within the threshold value, by the starting point candidate and the end point data input at a time (second time) different from that of the starting point data.
- the compression unit 113 outputs the selected starting point candidate and end point data as time-series data (output data) after compression. For example, the compression unit 113 sequentially stores the time-series data after compression in the storage unit 121.
- the compression unit 113 may store a plurality of time-series data after compression in the storage unit 121 in a lump.
- The search unit 114 searches the time-series data stored in the storage unit 121. For example, when the start time, the end time, and the sampling interval are specified, the search unit 114 retrieves from the time series database the point data series in the section from the start time to the end time at the specified sampling interval. Since point data may have been thinned out by the registration unit 110, point data may not exist at every specified sampling time. In such a case, the search unit 114 interpolates the point data using, for example, a linear interpolation formula.
- the linear interpolation formula is an example of a method for interpolating between two points.
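The interpolation step can be illustrated with a minimal Python sketch, assuming the usual two-point formula y = ys + (x - xs)(ye - ys)/(xe - xs). The function names `interpolate` and `resample`, and the representation of archived data as a time-sorted list of (time, value) tuples with at least two entries, are assumptions made for illustration and are not part of the described apparatus.

```python
def interpolate(xs, ys, xe, ye, x):
    """Two-point linear interpolation between (xs, ys) and (xe, ye)."""
    if xe == xs:
        raise ValueError("the two archived points must have different times")
    return ys + (x - xs) * (ye - ys) / (xe - xs)


def resample(archived, start, end, interval):
    """Return (time, value) pairs at a fixed interval from thinned data.

    `archived` is a time-sorted list of at least two (time, value) tuples
    that covers [start, end]. Times that were thinned out are interpolated.
    """
    out = []
    i = 0
    t = start
    while t <= end:
        # Advance to the archived segment that contains time t.
        while i + 2 < len(archived) and archived[i + 1][0] < t:
            i += 1
        (xs, ys), (xe, ye) = archived[i], archived[i + 1]
        out.append((t, interpolate(xs, ys, xe, ye, t)))
        t += interval
    return out


archived = [(0, 0.0), (4, 8.0), (6, 4.0)]
samples = resample(archived, start=0, end=6, interval=2)
print(samples)  # [(0, 0.0), (2, 4.0), (4, 8.0), (6, 4.0)]
```

The sample at time 2 was not archived, so it is reconstructed from the segment between times 0 and 4.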
- FIG. 2 is a diagram illustrating an example of time-series data.
- FIG. 2 shows time-series data including five point data P1, P2, P3, P4, and P5.
- the point data is, for example, a combination of time (Time) and value (Value).
- the time interval does not necessarily have to be constant.
- FIGS. 3 to 5 are diagrams for explaining the first method of compressing time-series data. As shown in FIG. 3, let the allowable error designated in advance be ε. P1 is set as the starting point data. When P2 is input as new point data, the registration unit 110 obtains an upper limit gradient US2 and a lower limit gradient LS2 for P2.
- The registration unit 110 sets two point data, P2′<t2, v2+ε> and P2′′<t2, v2−ε>, that lie at the maximum allowable error at time t2 with respect to P2<t2, v2>. The gradient from the starting point to P2′ is the upper limit gradient US2, and the gradient from the starting point to P2′′ is the lower limit gradient LS2.
- the registration unit 110 obtains an upper limit gradient US3 and a lower limit gradient LS3 for P3.
- the registration unit 110 obtains an upper limit gradient US4 and a lower limit gradient LS4 for P4. If the upper limit gradient US4 up to P4 is smaller than the upper limit gradient US3 up to P3 and the lower limit gradient LS4 up to P4 is larger than the lower limit gradient LS3 up to P3, the old point data P3 is thinned out.
- the lower limit gradient LS4 up to P4 is smaller than the lower limit gradient LS3 up to P3.
- P4 cannot be thinned out, and P3 remains as end point data.
- Two point data P1 and P3 are archived in the time series database (storage unit 121).
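The upper and lower limit gradients used in the steps above can be sketched in Python as follows. This is an illustrative sketch only: the function name `gradient_bounds`, the (time, value) tuple representation, and the restriction to points later than the starting point are assumptions, not details taken from the text.

```python
def gradient_bounds(start, point, eps):
    """Slopes from `start` to the top and bottom of the error band at `point`.

    `start` and `point` are (time, value) tuples and `eps` is the allowable
    error. Returns (upper_limit_gradient, lower_limit_gradient).
    Assumes `point` is later in time than `start`.
    """
    t0, v0 = start
    t, v = point
    dt = t - t0
    if dt <= 0:
        raise ValueError("point must be later than the starting point")
    upper = (v + eps - v0) / dt  # slope toward <t, v + eps>
    lower = (v - eps - v0) / dt  # slope toward <t, v - eps>
    return upper, lower


p1, p2 = (0.0, 10.0), (2.0, 11.0)
us2, ls2 = gradient_bounds(p1, p2, eps=1.0)
print(us2, ls2)  # 1.0 0.0
```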
- FIGS. 6 to 9 are diagrams for explaining a second method of compressing time-series data.
- Let the allowable error designated in advance be ε.
- P1 is set as the starting point.
- the registration unit 110 obtains an upper limit gradient US2 and a lower limit gradient LS2 for P2.
- the registration unit 110 can obtain the upper limit gradient US2 and the lower limit gradient LS2 by the same method as in FIG.
- the difference from the first method in FIG. 3 is that an allowable error range is obtained.
- the allowable error range for P2 is represented by a hatched portion.
- the allowable error range for P2 is specified by two parameters, an upper limit gradient US2 and a lower limit gradient LS2.
- the registration unit 110 obtains a temporary upper limit gradient US3 and a temporary lower limit gradient LS3 for P3.
- The temporary allowable error range for P3 is specified by two parameters, the temporary upper limit gradient US3 and the temporary lower limit gradient LS3.
- The registration unit 110 sets the overlapping portion of the allowable error range for P2 and the temporary allowable error range for P3 as the allowable error range for P3.
- The registration unit 110 calculates, for example, "LS2 > US3 ∨ LS3 > US2". If this value is true, the registration unit 110 determines that the allowable error range for P2 and the temporary allowable error range for P3 do not overlap. If this value is false, the registration unit 110 determines that the two ranges overlap.
- When the ranges overlap, P2 is thinned out, and the registration unit 110 obtains the allowable error range for P3 as follows.
- US3′ = Min(US3, US2)
- LS3′ = Max(LS3, LS2)
- Min(A, B) is a function that returns the smaller of A and B.
- Max(A, B) is a function that returns the larger of A and B.
- the registration unit 110 obtains a temporary upper limit gradient US5 and a temporary lower limit gradient LS5 for P5.
- The registration unit 110 calculates "LS4 > US5 ∨ LS5 > US4".
- the registration unit 110 determines that the allowable error range for P4 and the provisional allowable error range for P5 do not overlap. As a result, P5 cannot be thinned out, and P4 remains as end point data.
- Two point data P1 and P4 are archived in the time series database (storage unit 121).
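The second method, with a single starting point, can be sketched in Python as below. The sketch narrows the allowable range of gradients with Min and Max and stops at the first point whose temporary range does not overlap it. The function name `find_end_point` and the sample values are illustrative assumptions; only one segment is handled, and points are assumed to be time-sorted (time, value) tuples.

```python
def find_end_point(points, eps):
    """Return the index of the end point for the segment starting at points[0].

    All points between index 0 and the returned index can be thinned out.
    """
    t0, v0 = points[0]
    us, ls = float("inf"), float("-inf")  # allowable gradient range so far
    end = 0
    for i in range(1, len(points)):
        t, v = points[i]
        new_us = (v + eps - v0) / (t - t0)  # temporary upper limit gradient
        new_ls = (v - eps - v0) / (t - t0)  # temporary lower limit gradient
        if ls > new_us or new_ls > us:
            break  # ranges do not overlap: the previous point stays as end point
        us, ls = min(us, new_us), max(ls, new_ls)  # keep only the overlap
        end = i
    return end


pts = [(1, 10.0), (2, 10.5), (3, 10.2), (4, 10.8), (5, 13.0)]
end = find_end_point(pts, eps=0.5)
print(pts[0], pts[end])  # (1, 10.0) (4, 10.8)
```

With these sample values the fifth point falls outside the narrowed range, so the first and fourth points are the ones that would be archived, mirroring the P1 and P4 outcome described above.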
- the registration unit 110 may apply either the first or second algorithm as a compression method algorithm. Also, other algorithms may be applied. Conventionally, these algorithms have been applied with a single starting point.
- the registration unit 110 determines a plurality of starting points (starting point candidates) and applies the above algorithm to the plurality of starting point candidates.
- FIGS. 10 to 13 are diagrams for explaining an example of the time-series data compression method according to this embodiment.
- In the first and second methods described above, the starting point at t1 was a single point.
- a plurality of starting point candidates are set, and each starting point candidate is regarded as a starting point, and thinning calculation is performed in parallel.
- When the number of starting points to be generated is 3, the generation unit 111 generates, for example, P1<t1, v1>, P1′<t1, v1+ε>, and P1′′<t1, v1−ε> as starting point candidates.
- When the number of starting points to be generated is N, the generation unit 111 generates, for example, <t1, v1+ε>, <t1, v1+ε×(1−2/(N−1)×1)>, <t1, v1+ε×(1−2/(N−1)×2)>, ..., P1<t1, v1>, ..., <t1, v1−ε> as starting point candidates.
- The starting point candidate generation method is not limited to this, and any point data may be used as a starting point candidate as long as its value is within the range of the allowable error ε centered on the starting point data.
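One possible way to generate the candidates is sketched below in Python. It assumes the candidates are evenly spaced between the value plus the allowable error and the value minus the allowable error, which is only one of the choices the text permits. The function name `generate_candidates` is hypothetical.

```python
def generate_candidates(start, eps, n):
    """Return n starting point candidates within +/- eps of the start value.

    `start` is a (time, value) tuple. Candidates are evenly spaced from
    value + eps down to value - eps (an assumption for illustration).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    t, v = start
    if n == 1:
        return [(t, v)]
    return [(t, v + eps * (1 - 2 * k / (n - 1))) for k in range(n)]


print(generate_candidates((1, 10.0), eps=0.5, n=3))
# [(1, 10.5), (1, 10.0), (1, 9.5)]
```

For an odd number of candidates the original starting point data is always among them, so the result can never be worse than the single-starting-point case.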
- With reference to FIGS. 11 to 13, an example in which the thinning calculation of the second compression method is applied will be described.
- The starting point candidates are P1<t1, v1>, P1′<t1, v1+ε>, and P1′′<t1, v1−ε>.
- FIG. 11 shows an example in which thinning is performed using P1′′<t1, v1−ε> as the starting point among these starting point candidates.
- thinning is possible up to P2, P3, and P4, but thinning is not possible at P5.
- FIG. 12 shows an example in which thinning is performed starting from P1<t1, v1>. As shown in FIG. 12, when starting from P1<t1, v1>, thinning is possible up to P2, P3, and P4, but thinning is not possible at P5.
- FIG. 13 shows an example in which thinning is performed starting from P1′<t1, v1+ε>. As shown in FIG. 13, when P1′<t1, v1+ε> is the starting point, thinning is possible up to P2, P3, P4, and P5.
- Conventionally, the starting point at t1 is a single point, but in this embodiment a plurality of starting point candidates are set, each is regarded as the starting point, and the thinning calculation is performed in parallel. In the above example, a single starting point allows thinning up to P4 at most, whereas the method of this embodiment allows thinning up to P5. Thus, the method of the present embodiment increases the compression rate for the same allowable error.
- FIG. 14 is a flowchart showing an overall flow of data compression processing in the first embodiment.
- FIG. 14 shows an example in which the second compression method described above is applied.
- The selection unit 112 selects starting point data (step S101). For example, when time-series data is input in real time, the selection unit 112 may use, as the starting point data, the point data input first or the point data input after the thinning-out process for the previously input point data is completed. The same applies when point data is input sequentially from stored time-series data.
- the generating unit 111 generates a plurality of starting point candidates whose error with respect to the selected starting point data is within an allowable error (step S102).
- the selection unit 112 selects the next point data (step S103).
- the next point data is point data that is sequentially input at successive times (second time) with reference to the time (first time) when the starting point data is input.
- the next point data is selected while sequentially shifting the time until thinning cannot be performed.
- The next point data selected at the previous time is referred to as the old next point data.
- The old next point data at the time thinning can no longer be performed corresponds to the end point data.
- The point data selected before the old next point data correspond to intermediate data input between the starting point data and the finally remaining end point data (old next point data).
- the time when the next point data is input may be either before or after the time when the starting point data is input.
- If the starting point data selected in step S101 is the last point data, no next point data can be selected (acquired).
- In this case, the registration unit 110 may end the data compression process.
- Alternatively, the selection unit 112 may wait at step S103 until the next point data can be acquired.
- the selection unit 112 selects one starting point candidate among the generated starting point candidates (step S104).
- the selection unit 112 determines whether or not the selected starting point candidate is invalidated (step S105). Invalidation means that a starting point candidate that can no longer be thinned out using the selected next point data is excluded from subsequent processing. For example, the starting point candidate that could not be thinned out in the process with the old next point data is invalidated when the old next point data is processed (step S109 described later).
- In step S105, it is determined whether the starting point candidate has been invalidated in the processing up to the previous time.
- If the candidate has been invalidated (step S105: Yes), the process returns to step S104, and the selection unit 112 selects the next starting point candidate and repeats the process. If it has not been invalidated (step S105: No), the selection unit 112 calculates an upper limit gradient and a lower limit gradient from the selected starting point candidate to the next point data (step S106). The selection unit 112 compares the calculated upper limit gradient and lower limit gradient with the upper limit gradient and lower limit gradient calculated for the old next point data (step S107). For example, the selection unit 112 determines whether the allowable error range specified by the upper limit gradient and the lower limit gradient of the old next point data overlaps the allowable error range specified by the upper limit gradient and the lower limit gradient of the next point data.
- the selection unit 112 determines whether or not the allowable error ranges of both overlap (step S108). If they do not overlap (step S108: No), the selection unit 112 invalidates the currently selected starting point candidate (step S109) and returns to step S104. When overlapping (step S108: Yes), the selection unit 112 updates the upper limit gradient and the lower limit gradient from the starting point candidate to the upper limit gradient and the lower limit gradient calculated for the current next point data (step S110).
- The selection unit 112 determines whether all the starting point candidates have been processed (step S111). If not (step S111: No), the process returns to step S104 and is repeated. When all the starting point candidates have been processed (step S111: Yes), the selection unit 112 determines whether all the starting point candidates have been invalidated (step S112). If not all of them have been invalidated (step S112: No), the selection unit 112 selects the next successive point data as the new next point data and repeats the process (step S103).
- If all the starting point candidates have been invalidated (step S112: Yes), the selection unit 112 selects the starting point candidate that was invalidated last (step S113). By such processing, the selection unit 112 can select a starting point candidate having a larger number of point data (intermediate data) approximated so that the error is within the allowable error.
- If a plurality of starting point candidates were invalidated last, the selection unit 112 selects any one of them.
- the selection unit 112 may select a starting point candidate having a value closer to the starting point data from among a plurality of starting point candidates invalidated last.
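The candidate loop of steps S103 to S113 can be sketched in Python as follows. This is a simplified sketch under several assumptions: candidates are evenly spaced within the allowable error, the range update uses the Min/Max overlap of the second method, and ties are broken by closeness to the original value. The name `select_candidate` and the sample data are hypothetical.

```python
def select_candidate(points, eps, n=3):
    """Return (selected starting point candidate, index of the end point)."""
    t0, v0 = points[0]
    cands = [v0] if n == 1 else [v0 + eps * (1 - 2 * k / (n - 1)) for k in range(n)]
    us = [float("inf")] * len(cands)
    ls = [float("-inf")] * len(cands)
    alive = [True] * len(cands)
    end = 0
    for i in range(1, len(points)):              # S103: next point data
        t, v = points[i]
        dropped = []
        for c, cv in enumerate(cands):           # S104: each candidate
            if not alive[c]:                     # S105: already invalidated
                continue
            new_us = (v + eps - cv) / (t - t0)   # S106
            new_ls = (v - eps - cv) / (t - t0)
            if ls[c] > new_us or new_ls > us[c]:  # S108: no overlap
                alive[c] = False                 # S109: invalidate
                dropped.append(c)
            else:                                # S110: narrow the range
                us[c], ls[c] = min(us[c], new_us), max(ls[c], new_ls)
        if not any(alive):                       # S112: all invalidated
            finalists = dropped
            break
        end = i
    else:
        # Input exhausted: the surviving candidates reached furthest.
        finalists = [c for c in range(len(cands)) if alive[c]]
    best = min(finalists, key=lambda c: abs(cands[c] - v0))  # S113 with tie-break
    return (t0, cands[best]), end


pts = [(0.0, 0.0), (1.0, -1.5), (2.0, -2.0), (3.0, -4.0), (4.0, -7.6)]
start, end = select_candidate(pts, eps=1.0)
print(start, end)  # (0.0, 1.0) 4
```

In this sample, the candidates with values 0 and -1 are both invalidated at the fifth point, while the candidate raised by the allowable error still covers it, so it is selected and the end point moves one step further than with a single starting point.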
- the compression unit 113 performs post-processing to correct the value of the end point data (old next point data) according to the selected starting point candidate (step S114). Note that the end point data may be output without performing post-processing.
- FIG. 15 is a diagram for explaining an example of post-processing.
- FIG. 15 shows an example in which thinning is not possible when P5 is the next point data, and P4 remains as the old next point data (end point data). Further, it is assumed that P1 ′′ is selected as a starting point candidate.
- The compression unit 113 obtains the average of the upper limit gradient and the lower limit gradient of the old next point data.
- The compression unit 113 regards the straight line whose slope is this average gradient as the approximated data string, and obtains the value of the data string at t4.
- the compression unit 113 sets this value as the corrected end point data value (P4 ').
- the compression unit 113 stores the selected starting point candidate and the corrected end point data in the storage unit 121.
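The post-processing step can be sketched in Python as below. The sketch assumes that the upper and lower limit gradients passed in are the ones held for the old next point data relative to the selected starting point candidate. The function name `corrected_end_point` and the sample numbers are illustrative assumptions.

```python
def corrected_end_point(candidate, end_time, upper_gradient, lower_gradient):
    """Return the corrected end point as a (time, value) tuple.

    The value is read off the straight line that starts at `candidate`
    and whose slope is the average of the two limit gradients.
    """
    t0, v0 = candidate
    mean_gradient = (upper_gradient + lower_gradient) / 2
    return end_time, v0 + mean_gradient * (end_time - t0)


p = corrected_end_point((0.0, 1.0), end_time=4.0,
                        upper_gradient=-1.5, lower_gradient=-2.0)
print(p)  # (4.0, -6.0)
```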
- The selection unit 112 determines whether all input data has been processed (step S115). If not (step S115: No), the selection unit 112 selects the next point data as new starting point data (step S101) and repeats the process. For example, the next point data at the time all the starting point candidates were invalidated is selected as the new starting point data.
- a plurality of starting point candidates are set, and each starting point candidate is regarded as a starting point, and thinning calculation is performed. Then, a starting point candidate capable of thinning out more data is selected, and data thinned out by the selected starting point candidate is output as compression result data. Thereby, the compression rate of time series data can be increased.
- The data compression apparatus according to the second embodiment further executes a process (filter process) that omits the thinning calculation.
- FIG. 16 is a block diagram showing an example of the configuration of the data compression apparatus 100-2 according to the second embodiment.
- the data compression apparatus 100-2 includes a receiving unit 101, a registration unit 110-2, a search unit 114, and a storage unit 121.
- FIG. 1 is a block diagram showing the configuration of the data compression apparatus 100 according to the first embodiment.
- The selection unit 112-2 adds a filtering function to the functions of the selection unit 112 described above. Before executing the processing for each starting point candidate, the selection unit 112-2 determines whether the range approximated within the allowable error for the old next point data and the range approximated within the allowable error for the next point data satisfy a predetermined condition. If the condition is satisfied, the selection unit 112-2 determines that the next point data cannot be approximated within the allowable error, and does not execute the determination process for each starting point candidate.
- The selection unit 112-2 compares the minimum lower limit gradient and the maximum upper limit gradient of the old next point data with the minimum lower limit gradient and the maximum upper limit gradient of the (current) next point data, and determines whether a predetermined condition is satisfied. The selection unit 112-2 then obtains a determination value (for example, true or false) indicating whether the condition is satisfied, and omits the processing for each starting point candidate according to the determination value.
- The minimum lower limit gradient is the minimum of the gradients between each starting point candidate and the value obtained by subtracting the allowable error from the point data.
- The maximum upper limit gradient is the maximum of the gradients between each starting point candidate and the value obtained by adding the allowable error to the point data.
- FIG. 17 is a diagram for explaining an example of the minimum lower limit gradient MinLS and the maximum upper limit gradient MaxUS.
- MaxUS = Max(US5, US5′, US5′′)
- MinLS = Min(LS5, LS5′, LS5′′)
- FIGS. 18 to 20 are diagrams showing examples of the upper limit gradient and the lower limit gradient for P4.
- FIG. 18 shows the upper limit gradient US4 ′′ and the lower limit gradient LS4 ′′ of P4 when starting from P1 ′′.
- FIG. 19 shows the upper limit gradient US4 and the lower limit gradient LS4 of P4 when starting from P1.
- FIG. 20 shows the upper limit gradient US4 'and the lower limit gradient LS4' of P4 when P1 'is the starting point.
- MaxUS = Max(US4, US4′, US4′′)
- MinLS = Min(LS4, LS4′, LS4′′)
- the minimum lower limit gradient and the maximum upper limit gradient of P4 are set to MinLS4 and MaxUS4, respectively.
- The selection unit 112-2 compares MinLS4, MaxUS4, MinLS5, and MaxUS5 according to the following condition, and calculates a determination value indicating whether the condition is satisfied: "MaxUS4 < MinLS5" ∨ "MinLS4 > MaxUS5"
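The filter condition can be sketched in Python as follows. The function names `gradient_envelope` and `can_skip`, the candidate values, and the sample points are assumptions made for illustration; the comparison itself follows the condition stated above.

```python
def gradient_envelope(candidates, point, eps):
    """Return (MinLS, MaxUS) over all starting point candidates for `point`.

    `candidates` is a list of (time, value) tuples sharing the start time,
    `point` is a later (time, value) tuple, `eps` is the allowable error.
    """
    t, v = point
    lower = [(v - eps - cv) / (t - ct) for ct, cv in candidates]
    upper = [(v + eps - cv) / (t - ct) for ct, cv in candidates]
    return min(lower), max(upper)


def can_skip(prev_env, cur_env):
    """True when the per-candidate thinning calculation can be omitted."""
    min_ls_prev, max_us_prev = prev_env
    min_ls_cur, max_us_cur = cur_env
    return max_us_prev < min_ls_cur or min_ls_prev > max_us_cur


cands = [(0.0, 1.0), (0.0, 0.0), (0.0, -1.0)]
env4 = gradient_envelope(cands, (3.0, -4.0), eps=1.0)
near = gradient_envelope(cands, (4.0, -7.6), eps=1.0)
far = gradient_envelope(cands, (4.0, -20.0), eps=1.0)
print(can_skip(env4, near), can_skip(env4, far))  # False True
```

When the determination value is true, no candidate can cover the new point, so the loop over candidates is skipped entirely. When it is false, the per-candidate processing is still required.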
- FIG. 21 is a flowchart showing an overall flow of data compression processing in the second embodiment.
- Since steps S201 to S203 are the same processes as steps S101 to S103 in the data compression apparatus 100 according to the first embodiment, description thereof is omitted.
- the selection unit 112-2 calculates the determination value as described above (step S204). The selection unit 112-2 determines whether or not the determination value is true (step S205). If false (step S205: No), processing is performed for each starting point candidate (steps S206 to S214). Steps S206 to S214 are the same processes as steps S104 to S112 of the first embodiment, and thus description thereof is omitted.
- If the determination value is true (step S205: Yes), the selection unit 112-2 does not execute steps S206 to S214 and proceeds to step S215.
- Steps S215 to S217 are the same as steps S113 to S115 in the first embodiment, and thus the description thereof is omitted.
- The data compression apparatus further executes the filter process that omits the thinning calculation. This suppresses the increase in computational complexity caused by using a plurality of starting points.
- FIG. 22 is an explanatory diagram showing the hardware configuration of the data compression apparatus according to the first or second embodiment.
- The data compression apparatus includes a control device such as a CPU (Central Processing Unit) 51, storage devices such as a ROM (Read Only Memory) 52 and a RAM (Random Access Memory) 53, a communication I/F 54 that connects to a network and performs communication, and a bus 61 that connects the units.
- the data compression program executed by the data compression apparatus according to the first or second embodiment is provided by being incorporated in advance in the ROM 52 or the like.
- The data compression program executed by the data compression apparatus may be provided as a file in an installable or executable format, recorded on a computer-readable recording medium such as a CD-ROM (Compact Disk Read Only Memory), a flexible disk (FD), a CD-R (Compact Disk Recordable), or a DVD (Digital Versatile Disk).
- The data compression program executed by the data compression apparatus according to the first or second embodiment may be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network. Further, the data compression program may be provided or distributed via a network such as the Internet.
- the data compression program executed by the data compression apparatus according to the first or second embodiment can cause a computer to function as each unit of the above-described data compression apparatus.
- The CPU 51 can read the data compression program from a computer-readable storage medium onto a main storage device and execute it.
- the present invention is not limited to the above-described embodiment as it is, and can be embodied by modifying the constituent elements without departing from the scope of the invention in the implementation stage.
- various inventions can be formed by appropriately combining a plurality of constituent elements disclosed in the embodiment. For example, some components may be deleted from all the components shown in the embodiment. Furthermore, the constituent elements over different embodiments may be appropriately combined.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Description
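As described above, the Swinging Door algorithm and similar methods fix a single starting point and compress time-series data by linear approximation. The data compression apparatus according to the first embodiment instead defines a plurality of starting points (starting point candidates) and compresses the time-series data using the starting point candidate that allows more efficient compression.
y = ys + (x - xs)(ye - ys)/(xe - xs) ... (1)
P1<t1,v1>, P2<t2,v2>, P3<t3,v3>, P4<t4,v4>, P5<t5,v5>, where t1 < t2 < t3 < t4 < t5.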
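Equation (1) can be read as linear interpolation between two retained points. The following is a minimal Python sketch under the assumption that (xs, ys) is the retained starting point, (xe, ye) is the retained end point, and x is the time of a thinned-out point; the function names `interpolate` and `within_threshold` are hypothetical and do not appear in the source.

```python
def interpolate(xs, ys, xe, ye, x):
    """Value at time x on the line through (xs, ys) and (xe, ye), as in equation (1)."""
    if xe == xs:
        raise ValueError("start and end times must differ")
    return ys + (x - xs) * (ye - ys) / (xe - xs)


def within_threshold(points, start, end, threshold):
    """True if every (time, value) in points lies within threshold of the line start-end."""
    (xs, ys), (xe, ye) = start, end
    return all(
        abs(v - interpolate(xs, ys, xe, ye, t)) <= threshold
        for t, v in points
    )
```

Under these assumptions, intermediate points that pass `within_threshold` could be dropped and later reconstructed from the two retained points alone.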
US3'=Min(US3,US2)
LS3'=Max(LS3,LS2)
US4'=Min(US4,US3)
LS4'=Max(LS4,LS3)
US4=US4'
LS4=LS4'
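The update rules above narrow the range of admissible slopes each time a new point is considered. The sketch below assumes the usual Swinging Door definitions, which this excerpt does not spell out: the upper slope US runs from the starting point to the new point raised by the threshold, and the lower slope LS runs to the new point lowered by the threshold. The names `slopes` and `narrow` are hypothetical.

```python
def slopes(start, point, threshold):
    """Return (US, LS) for a point as seen from the starting point.

    Assumed definitions: US goes through (t, v + threshold),
    LS goes through (t, v - threshold).
    """
    (t0, v0), (t, v) = start, point
    dt = t - t0
    return (v + threshold - v0) / dt, (v - threshold - v0) / dt


def narrow(prev, cur):
    """Apply US' = Min(US, previous US) and LS' = Max(LS, previous LS)."""
    (us_prev, ls_prev), (us, ls) = prev, cur
    return min(us, us_prev), max(ls, ls_prev)
```

As long as the narrowed pair satisfies LS' <= US', at least one line from the starting point stays within the threshold of every point seen so far, so the earlier point can be thinned out.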
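The technique of the first embodiment improves the compression ratio, but the amount of computation increases because the thinning calculation is performed in parallel for a plurality of starting point candidates. The data compression apparatus according to the second embodiment therefore additionally executes processing (filter processing) that omits the thinning calculation.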
MaxUS=Max(US5,US5',US5'')
MinLS=Min(LS5,LS5',LS5'')
MaxUS=Max(US4,US4',US4'')
MinLS=Min(LS4,LS4',LS4'')
"MaxUS4 < MinLS5" ∨ "MinLS4 > MaxUS5"
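The filter condition can be sketched as a cheap pre-check that compares only the extreme slopes over all starting point candidates. This is an illustrative reading, assuming MaxUS4 and MinLS4 denote the values computed at time t4 and MaxUS5 and MinLS5 those computed at time t5; the function name `can_skip_thinning` and the list-of-pairs input layout are hypothetical.

```python
def can_skip_thinning(prev_slopes, cur_slopes):
    """Return True when the filter condition holds.

    prev_slopes and cur_slopes are lists of (US, LS) pairs, one pair per
    starting point candidate, for the previous and the current time.
    """
    max_us_prev = max(us for us, _ in prev_slopes)
    min_ls_prev = min(ls for _, ls in prev_slopes)
    max_us_cur = max(us for us, _ in cur_slopes)
    min_ls_cur = min(ls for _, ls in cur_slopes)
    # "MaxUS4 < MinLS5" or "MinLS4 > MaxUS5": the slope ranges of the two
    # times cannot overlap for any candidate.
    return max_us_prev < min_ls_cur or min_ls_prev > max_us_cur
```

When the check returns True, no candidate can have overlapping ranges at the two times, so the per-candidate thinning calculation for the new point can be omitted.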
101 reception unit
110 registration unit
111 generation unit
112 selection unit
113 compression unit
114 search unit
121 storage unit
Claims (10)
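- A data compression apparatus comprising:
a reception unit that receives a plurality of pieces of input data input in time series;
a generation unit that generates a plurality of starting point candidates, each being data whose error with respect to starting point data, which is the input data input at a first time, is within a threshold;
a selection unit that uses the starting point candidates, end point data that is the input data input at a second time, and intermediate data that is the input data input at times between the first time and the second time, to select, from among the starting point candidates, the starting point candidate for which the number of pieces of the intermediate data approximated by the starting point candidate and the end point data with an error within the threshold is larger than for the other starting point candidates; and
a compression unit that outputs the selected starting point candidate and the end point data as output data obtained by compressing the starting point data, the intermediate data, and the end point data.
- The data compression apparatus according to claim 1, wherein the selection unit repeatedly executes, for each starting point candidate and while changing the second time, a determination process of determining whether the intermediate data is included in a range approximated by the starting point candidate and the end point data with an error within the threshold, and selects the starting point candidate for which the number of pieces of the intermediate data at the time the intermediate data was last determined to be included in the approximated range is larger than for the other starting point candidates.
- The data compression apparatus according to claim 2, wherein, when the second time is changed, the selection unit determines whether a range approximated with an error within the threshold at the time before the change and a range approximated with an error within the threshold at the time after the change satisfy a predetermined condition and, when the condition is satisfied, determines that the intermediate data is not included in the range approximated with an error within the threshold at the time after the change.
- The data compression apparatus according to claim 3, wherein the selection unit calculates, at each of the time before the change and the time after the change, a minimum lower-limit slope, which is the minimum value of the slopes of line segments from the starting point candidates to data whose error with respect to the end point data is within the threshold, and a maximum upper-limit slope, which is the maximum value of the slopes of the line segments, and determines whether the minimum lower-limit slope and the maximum upper-limit slope at the time before the change and the minimum lower-limit slope and the maximum upper-limit slope at the time after the change satisfy the condition.
- The data compression apparatus according to claim 1, wherein, when a plurality of the starting point candidates have the same number of pieces of the intermediate data, the selection unit selects, from among those starting point candidates, the starting point candidate whose difference from the starting point data is smaller.
- The data compression apparatus according to claim 1, wherein the compression unit outputs, as the output data, the selected starting point candidate and the end point data corrected according to the selected starting point candidate.
- The data compression apparatus according to claim 1, wherein the selection unit selects, from among the starting point candidates, the starting point candidate for which the number of pieces of the intermediate data included in an allowable error range, determined by the starting point candidate and data whose error with respect to the end point data is within the threshold, is larger than for the other starting point candidates.
- A data compression method comprising:
a reception step of receiving a plurality of pieces of input data input in time series;
a generation step of generating a plurality of starting point candidates, each being data whose error with respect to starting point data, which is the input data input at a first time, is within a threshold;
a selection step of using the starting point candidates, end point data that is the input data input at a second time, and intermediate data that is the input data input at times between the first time and the second time, to select, from among the starting point candidates, the starting point candidate for which the number of pieces of the intermediate data approximated by the starting point candidate and the end point data with an error within the threshold is larger than for the other starting point candidates; and
a compression step of outputting the selected starting point candidate and the end point data as output data obtained by compressing the starting point data, the intermediate data, and the end point data.
- A data compression method for compressing time-series data, which is a series of point data each including a time and a value obtained by measuring a temporal change of a phenomenon, the method comprising:
a step of setting the point data at a first time as a starting point;
a step of generating a plurality of starting point candidates whose error with respect to the starting point is within a threshold;
a step of setting point data obtained at a time after the first time as a point of interest;
a step of calculating, for each starting point candidate, an upper-limit slope and a lower-limit slope based on the error range for the point of interest;
a step of, when the allowable error range specified by the calculated upper-limit slope and lower-limit slope overlaps the allowable error range specified by the upper-limit slope and lower-limit slope calculated for the point of interest at the previous time, updating the upper-limit slope and the lower-limit slope and thinning out the point of interest at the previous time;
a step of, when the allowable error range specified by the calculated upper-limit slope and lower-limit slope does not overlap the allowable error range specified by the upper-limit slope and lower-limit slope calculated for the point of interest at the previous time, invalidating the starting point candidate; and
a step of, when starting point candidates that have not been invalidated remain, continuing the thinning process for the remaining starting point candidates with the point data obtained at the next time as the point of interest.
- A program for causing a computer to function as:
a reception unit that receives a plurality of pieces of input data input in time series;
a generation unit that generates a plurality of starting point candidates, each being data whose error with respect to starting point data, which is the input data input at a first time, is within a threshold;
a selection unit that uses the starting point candidates, end point data that is the input data input at a second time, and intermediate data that is the input data input at times between the first time and the second time, to select, from among the starting point candidates, the starting point candidate for which the number of pieces of the intermediate data approximated by the starting point candidate and the end point data with an error within the threshold is larger than for the other starting point candidates; and
a compression unit that outputs, to a storage unit, the selected starting point candidate and the end point data as output data obtained by compressing the starting point data, the intermediate data, and the end point data.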
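The method claim that starts from point data and invalidates candidates can be sketched roughly as follows. This is a simplified illustration rather than the claimed implementation: the way candidates are generated (the hypothetical `offsets` parameter, expressed as fractions of the threshold), the slope definitions, and the function name `best_start` are all assumptions. The tie-break follows the dependent claim that prefers the candidate closer to the starting point data.

```python
def best_start(points, threshold, offsets=(-1.0, 0.0, 1.0)):
    """Pick the starting point candidate that reaches the farthest point.

    points: list of (time, value) sorted by time; points[0] is the starting point.
    offsets: hypothetical candidate offsets as fractions of the threshold.
    Returns (candidate, end_index), where end_index is the index of the last
    point reached before the candidate's slope range became empty.
    """
    t0, v0 = points[0]
    candidates = [(t0, v0 + f * threshold) for f in offsets]
    # Per candidate: current (upper slope, lower slope), initially unbounded.
    doors = {c: (float("inf"), float("-inf")) for c in candidates}
    reach = {c: 0 for c in candidates}
    for i in range(1, len(points)):
        t, v = points[i]
        dt = t - t0
        for c in list(doors):
            us, ls = doors[c]
            us = min(us, (v + threshold - c[1]) / dt)
            ls = max(ls, (v - threshold - c[1]) / dt)
            if ls > us:
                del doors[c]  # ranges no longer overlap: invalidate candidate
            else:
                doors[c] = (us, ls)
                reach[c] = i
        if not doors:
            break  # every candidate has been invalidated
    # Farthest reach wins; ties go to the candidate closest to the original value.
    best = max(candidates, key=lambda c: (reach[c], -abs(c[1] - v0)))
    return best, reach[best]
```

For example, with `points = [(0, 0), (1, 1), (2, 0), (3, 1)]` and `threshold = 0.6`, the unshifted starting point is invalidated at the third point, while the candidate shifted up by the threshold reaches the last point, so the shifted candidate would be selected.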
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201380003042.4A CN104160629B (zh) | 2013-01-31 | 2013-01-31 | Data compression device and data compression method |
JP2014504111A JP5622967B1 (ja) | 2013-01-31 | 2013-01-31 | Data compression device, data compression method, and program |
PCT/JP2013/052245 WO2014118954A1 (ja) | 2013-01-31 | 2013-01-31 | Data compression device, data compression method, and program |
EP13836184.5A EP2953266B1 (en) | 2013-01-31 | 2013-01-31 | Data compression device, data compression method, and program |
AU2013376200A AU2013376200B2 (en) | 2013-01-31 | 2013-01-31 | Data compression device, data compression method, and program |
US14/208,061 US9838032B2 (en) | 2013-01-31 | 2014-03-13 | Data compression device, data compression method, and computer program product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2013/052245 WO2014118954A1 (ja) | 2013-01-31 | 2013-01-31 | Data compression device, data compression method, and program |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/208,061 Continuation US9838032B2 (en) | 2013-01-31 | 2014-03-13 | Data compression device, data compression method, and computer program product |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014118954A1 (ja) | 2014-08-07 |
Family
ID=51224109
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2013/052245 WO2014118954A1 (ja) | Data compression device, data compression method, and program | 2013-01-31 | 2013-01-31 |
Country Status (6)
Country | Link |
---|---|
US (1) | US9838032B2 (ja) |
EP (1) | EP2953266B1 (ja) |
JP (1) | JP5622967B1 (ja) |
CN (1) | CN104160629B (ja) |
AU (1) | AU2013376200B2 (ja) |
WO (1) | WO2014118954A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108667463A (zh) * | 2018-03-27 | 2018-10-16 | 江苏中科羿链通信技术有限公司 | Monitoring data compression method |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107807271B (zh) * | 2017-09-29 | 2021-04-16 | 中国电力科学研究院 | Method and system for automatically compressing overvoltage monitoring data |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001095496A1 (fr) * | 2000-06-06 | 2001-12-13 | Sakai, Yasue | Compression method and apparatus, expansion method and apparatus, and compression-expansion system |
JP2003015734A (ja) * | 2001-07-02 | 2003-01-17 | Toshiba Corp | Time-series data compression method, time-series data storage device, and program |
JP2012010319A (ja) * | 2010-05-28 | 2012-01-12 | Hitachi Ltd | Time-series data compression method and compression device |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4669097A (en) * | 1985-10-21 | 1987-05-26 | The Foxboro Company | Data compression for display and storage |
EP0407935B1 (en) * | 1989-07-10 | 1999-10-06 | Hitachi, Ltd. | Document data processing apparatus using image data |
US7076402B2 (en) | 2004-09-28 | 2006-07-11 | General Electric Company | Critical aperture convergence filtering and systems and methods thereof |
JP4719667B2 (ja) | 2006-12-28 | 2011-07-06 | 日立オートモティブシステムズ株式会社 | Time-series data compression method |
JP5369041B2 (ja) * | 2010-03-30 | 2013-12-18 | 富士フイルム株式会社 | Page description data processing device, method, and program |
EP2410589B1 (de) | 2010-07-23 | 2013-10-09 | Grützediek, Ursula | Method for producing a TMR component |
CA2749661C (en) * | 2010-08-20 | 2022-03-15 | Pratt & Whitney Canada Corp. | Method and system for generating a data set |
CN102393855B (zh) * | 2011-10-18 | 2013-07-31 | 国电南瑞科技股份有限公司 | Dynamic control method for the lossy compression ratio of process data |
CN102510287B (zh) * | 2011-11-03 | 2014-06-11 | 电子科技大学 | Fast compression method for industrial real-time data |
CN102664635B (zh) * | 2012-03-06 | 2015-07-29 | 华中科技大学 | Adaptive data compression method with controllable precision |
2013
- 2013-01-31 JP JP2014504111A patent/JP5622967B1/ja active Active
- 2013-01-31 WO PCT/JP2013/052245 patent/WO2014118954A1/ja active Application Filing
- 2013-01-31 EP EP13836184.5A patent/EP2953266B1/en active Active
- 2013-01-31 AU AU2013376200A patent/AU2013376200B2/en active Active
- 2013-01-31 CN CN201380003042.4A patent/CN104160629B/zh active Active
2014
- 2014-03-13 US US14/208,061 patent/US9838032B2/en active Active
Non-Patent Citations (5)
Title |
---|
E. H. BRISTOL: "Swinging Door Trending: Adaptive Trend Recording?", ISA NATIONAL CONF. PROC., 1990, pages 749 - 754 |
FENG XIAODONG ET AL.: "An Improved Process Data Compression Algorithm", PROCEEDINGS OF THE 4TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, vol. 3, June 2002 (2002-06-01), pages 2190 - 2193, XP010595009 * |
GANG CHEN ET AL.: "An Optimized Algorithm for Lossy Compression of Real-Time Data, Intelligent Computing and Intelligent Systems(ICIS", 2010 IEEE INTERNATIONAL CONFERENCE ON, vol. 2, October 2010 (2010-10-01), pages 187 - 191, XP031818817 * |
MATTHEW J. WATSON ET AL.: "A Practical Assessment of Process Data Compression Techniques", IND. ENG. CHEM. RES., vol. 37, no. 1, 1998, pages 267 - 274 |
PETER A. JAMES, DATA COMPRESSION FOR PROCESS HISTORIANS, 1995, Retrieved from the Internet <URL:http://www.castdiv.org/archive/data_compression.pdf> |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108667463A (zh) * | 2018-03-27 | 2018-10-16 | 江苏中科羿链通信技术有限公司 | Monitoring data compression method |
CN108667463B (zh) * | 2018-03-27 | 2021-11-02 | 江苏中科羿链通信技术有限公司 | Monitoring data compression method |
Also Published As
Publication number | Publication date |
---|---|
EP2953266A1 (en) | 2015-12-09 |
EP2953266A4 (en) | 2016-08-31 |
EP2953266B1 (en) | 2019-09-25 |
JPWO2014118954A1 (ja) | 2017-01-26 |
AU2013376200B2 (en) | 2016-06-23 |
CN104160629B (zh) | 2017-09-01 |
US9838032B2 (en) | 2017-12-05 |
JP5622967B1 (ja) | 2014-11-12 |
AU2013376200A1 (en) | 2015-02-19 |
US20140214781A1 (en) | 2014-07-31 |
CN104160629A (zh) | 2014-11-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
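US9727671B2 (en) | Method, system, and program storage device for automating prognostics for physical assets | |
JP6551101B2 (ja) | Information processing device, information processing method, and program | |
US11456194B2 (en) | Determining critical parameters using a high-dimensional variable selection model | |
JP2014203228A (ja) | Project management support system | |
JP5509153B2 (ja) | Gait analysis method, gait analysis device, and program therefor | |
JP5622967B1 (ja) | Data compression device, data compression method, and program | |
US20200333170A1 (en) | Missing value imputation device, missing value imputation method, and missing value imputation program | |
CN116778935A (zh) | Watermark generation, information processing, and audio watermark generation model training methods and devices | |
JP3995569B2 (ja) | Method and program for extracting features for equipment diagnosis and monitoring from waveform pattern data | |
JP2010213230A (ja) | Approximate calculation processing device, approximate wavelet coefficient calculation processing device, and approximate wavelet coefficient calculation processing method | |
JP5791555B2 (ja) | State tracking device, method, and program | |
CN106375849B (zh) | Method and device for generating a template, and method and device for updating a video | |
JP2020067910A (ja) | Learning curve prediction device, learning curve prediction method, and program | |
JP7113674B2 (ja) | Information processing device and information processing method | |
JP2019095932A (ja) | Abnormality determination method and device | |
JP4550398B2 (ja) | Method for representing the motion of an object appearing in a sequence of images, method for identifying the selection of an object in an image of a sequence of images, method for searching a sequence of images by processing signals corresponding to the images, and apparatus | |
JP2008108204A (ja) | Environmental load evaluation system, method, and program | |
CN111539536B (zh) | Method and device for evaluating hyperparameters of a business model | |
JP5746078B2 (ja) | Temporal recurrence probability estimation device, state tracking device, method, and program | |
JP2012196250A (ja) | Waveform analysis device, waveform analysis method, and program | |
CN117271098B (zh) | AI model compute kernel scheduling method, apparatus, device, and storage medium | |
WO2016180350A1 (zh) | Intelligent management method for a terminal desktop, terminal, and computer storage medium | |
CN111526054B (zh) | Method and device for acquiring a network | |
JP2023009625A (ja) | Information processing device, information processing method, and information processing program | |
JP2024021145A (ja) | Information processing device, information processing method, and program |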
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | ENP | Entry into the national phase | Ref document number: 2014504111; Country of ref document: JP; Kind code of ref document: A |
 | WWE | Wipo information: entry into national phase | Ref document number: 2013836184; Country of ref document: EP |
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13836184; Country of ref document: EP; Kind code of ref document: A1 |
 | ENP | Entry into the national phase | Ref document number: 2013376200; Country of ref document: AU; Date of ref document: 20130131; Kind code of ref document: A |
 | NENP | Non-entry into the national phase | Ref country code: DE |