WO2011027714A1 - Data summarization system and method, and storage medium - Google Patents

Publication number
WO2011027714A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
approximate
approximate expression
value
expression
Prior art date
Application number
PCT/JP2010/064538
Other languages
English (en)
Japanese (ja)
Inventor
今井照之
喜田弘司
海老山知生
Original Assignee
日本電気株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電気株式会社
Priority to JP2011529886A (JPWO2011027714A1/ja)
Publication of WO2011027714A1 (WO2011027714A1/fr)

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Definitions

  • The present invention relates to a data summarization system, a data summarization method, a data summarization program, and a recording medium that reduce the amount of information of sequentially generated data.
  • Patent Document 1 describes a data compression method based on a difference for process data.
  • Process data is time-series data values.
  • The difference-based compression method described in Patent Document 1 represents each process data item on a two-dimensional plane with time on the x-axis and the process value on the y-axis, and uses one process data item as a reference point. The difference in process value is obtained between the reference point and the process data to be processed.
  • Process data for which the absolute value of the difference exceeds the compression accuracy range calculated for the process data to be processed is stored in the time-series process data storage unit, and the other process data is thinned out as much as possible.
  • Patent Document 1 compresses data.
  • the compression accuracy is set for each process data and is used to determine whether or not to compress.
  • the higher the compression accuracy the higher the possibility of compression. That is, in the technique described in Patent Document 1, high compression accuracy is a concept similar to an increase in compression rate, and low compression accuracy is a concept similar to a reduction in compression rate.
  • Patent Document 2 describes a technique for predicting a next input value based on a past input value, storing a difference between the actual input value and the predicted value, and performing data compression.
  • JP 2003-15734 A, paragraphs 0039 and 0063 to 0065
  • JP-A-2006-259937, paragraphs 0031 to 0033
  • The technique described in Patent Document 1 determines whether or not to compress process data by comparing the difference between the process value of the process data and the reference value with the compression accuracy. For this reason, when the process value of the process data to be compressed changes greatly and discontinuously, the difference between the process value and the reference value exceeds the compression accuracy, and the process data becomes difficult to thin out (that is, difficult to compress). Therefore, with the technique described in Patent Document 1, efficient data compression is difficult when the data value often changes greatly and suddenly.
  • the technique described in Patent Literature 2 realizes data compression by storing a difference between a prediction result based on a past value and an actual input value.
  • An object of the present invention is to provide a data summarization system, a data summarization method, a data summarization program, and a recording medium capable of efficiently compressing data that is generated sequentially with a certain tendency and that may change greatly and irregularly.
  • Another object is to provide a data structure suitably applied to such a data summarization system, data summarization method, data summarization program, and recording medium.
  • A data summarization system according to one aspect of the present invention handles approximate expressions, each of which calculates an approximate value of a data value in data that includes the data value and its occurrence time. Each approximate expression takes the occurrence time as its variable, and the effective domain of the variable is defined as a set of time intervals or time points.
  • The system includes an approximate value calculation unit that calculates, for each approximate expression, an approximate value of the data value of new data by substituting the occurrence time included in the new data into that approximate expression.
  • The system includes an approximate expression evaluation unit that, based on the approximate value calculated for each approximate expression and the data value of the new data, either selects an approximate expression suitable for calculating the approximate value of the data value of the new data or determines that no such approximate expression exists.
  • The system includes an unconfirmed data storage unit that stores, as approximate expression unconfirmed data, new data for which it has been determined that no approximate expression suitable for calculating the approximate value of the data value exists.
  • The system includes a new approximate expression generation unit that generates a new approximate expression and defines a set of time intervals or time points as the effective domain of that approximate expression.
  • The system includes an update unit that, when the approximate expression evaluation unit selects an approximate expression suitable for calculating the approximate value of the data value of the new data, updates the effective domain of that approximate expression so that it includes the occurrence time of the new data.
  • A data summarization method according to one aspect of the present invention uses approximate expressions that calculate an approximate value of a data value in data including the data value and its occurrence time, where the occurrence time is the variable and the effective domain of the variable is defined as a set of time intervals or time points. The method calculates an approximate value of the data value of new data for each approximate expression by substituting the occurrence time included in the new data into that approximate expression.
  • Based on the approximate value calculated for each approximate expression and the data value of the new data, the method either selects an approximate expression suitable for calculating the approximate value of the data value of the new data or determines that no such approximate expression exists.
  • New data for which it has been determined that no suitable approximate expression exists is stored as approximate expression unconfirmed data. When new data is input, the method determines whether a new approximate expression can be generated from the new data and the approximate expression unconfirmed data; if it can, the method generates the new approximate expression and defines a set of time intervals or time points as the effective domain of that approximate expression.
  • A recording medium according to one aspect of the present invention stores a data summarization program that causes a computer to use approximate expressions that calculate an approximate value of a data value in data including the data value and its occurrence time, where the occurrence time is the variable and the effective domain of the variable is defined as a set of time intervals or time points.
  • The program causes the computer to execute approximate value calculation processing that calculates an approximate value of the data value of new data for each approximate expression by substituting the occurrence time included in the new data into that approximate expression.
  • The program causes the computer to execute approximate expression evaluation processing that, based on the approximate value calculated for each approximate expression and the data value of the new data, either selects an approximate expression suitable for calculating the approximate value of the data value of the new data or determines that no such approximate expression exists.
  • The program causes the computer to execute unconfirmed data storage processing that stores, in an unconfirmed data storage unit as approximate expression unconfirmed data, new data for which it has been determined that no suitable approximate expression exists.
  • The program causes the computer to execute processing that, when new data is input, determines whether a new approximate expression can be generated from the new data and the approximate expression unconfirmed data and, if it can, generates the new approximate expression.
  • A data structure according to one aspect of the present invention includes an approximate expression for calculating an approximate value of a data value by substituting a variable, and an effective domain, which is the domain of the variable over which the approximate value of the data value can be obtained.
  • The effective domain is represented as a set of intervals of the variable or points each representing a single variable value.
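  • The data structure described above can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the class name `ApproxExpression`, its field names, the choice of a linear expression, and the use of closed intervals are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class ApproxExpression:
    """Hypothetical linear approximate expression x = a*t + b plus its effective domain."""
    a: float  # first-order coefficient
    b: float  # constant term
    # Effective domain: closed sections [start, end] and isolated time points (assumed layout).
    intervals: List[Tuple[float, float]] = field(default_factory=list)
    points: Set[float] = field(default_factory=set)

    def value(self, t: float) -> float:
        """Approximate data value at occurrence time t."""
        return self.a * t + self.b

    def in_domain(self, t: float) -> bool:
        """True if t lies in one of the sections or equals one of the points."""
        return t in self.points or any(lo <= t <= hi for lo, hi in self.intervals)


expr = ApproxExpression(a=2.0, b=1.0, intervals=[(0.0, 10.0), (20.0, 30.0)], points={15.0})
```

  • Here one expression carries two sections and one point, which reflects the statement that several intervals or points may belong to a single effective domain.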
  • Each form of the present invention can efficiently compress data that is sequentially generated with a certain tendency and that may change greatly irregularly.
  • the data structure according to an aspect of the present invention can be suitably used for a data summarization system, a data summarization method, a data summarization program, and a recording medium having such advantages.
  • FIG. 1 is an explanatory diagram showing an example of an effective domain.
  • FIG. 2 is a block diagram illustrating an example of the data summarization system according to the first embodiment of this invention.
  • FIG. 3 is an explanatory diagram illustrating an example of one data input to the data input unit 10.
  • FIG. 4 is an explanatory diagram illustrating an example of uncertain points stored in the uncertain point storage unit 13.
  • FIG. 5 is an explanatory diagram schematically showing uncertain points and newly generated data.
  • FIG. 6 is an explanatory diagram schematically showing an approximate expression generated from each data shown in FIG.
  • FIG. 7 is an explanatory diagram illustrating an example of the expression format of the approximate expression.
  • FIG. 8 is an explanatory diagram showing an example of an approximate expression stored in the approximate expression storage unit 15 and an approximate expression ID thereof.
  • FIG. 9 is an explanatory diagram illustrating an example of final data information.
  • FIG. 10 is an explanatory diagram showing an example of processing of the new data generation time substitution unit 12.
  • FIG. 11 is an explanatory diagram showing an example of an effective definition area for each approximate expression.
  • FIG. 12 is an explanatory diagram illustrating an example of selecting an approximate expression that satisfies a criterion.
  • FIG. 13 is an explanatory diagram illustrating an example of selecting an approximate expression that satisfies a criterion.
  • FIG. 14 is an explanatory diagram illustrating an example of selecting an approximate expression that satisfies a criterion.
  • FIG. 15 is an explanatory diagram illustrating an example of valid domain update when a new approximate expression is generated.
  • FIG. 16 is a flowchart illustrating an example of processing progress of the first embodiment.
  • FIG. 17 is a flowchart illustrating an example of the processing progress of step S105.
  • FIG. 18 is an explanatory diagram illustrating an example in which data compression is performed by applying the technique described in Patent Document 1.
  • FIG. 19 is an explanatory diagram illustrating an example of a situation in which irregular data is continuously generated temporarily.
  • FIG. 20 is a block diagram illustrating an example of a data summarization system according to the second embodiment of this invention.
  • FIG. 21 is an explanatory diagram showing an example of a predetermined formula.
  • FIG. 22 is an explanatory diagram showing an example of a case where a predetermined expression is expressed only by a constant term.
  • FIG. 23 is a block diagram illustrating an example of a data summarization system according to the third embodiment of this invention.
  • FIG. 24 is a schematic diagram in which the data stored in the unsummarized data storage unit 30 is schematically arranged in the order of occurrence time.
  • FIG. 25 is a block diagram illustrating an example of a data summarization system according to the fourth embodiment of this invention.
  • FIG. 26 is an explanatory diagram showing an example of deriving a default effective definition area.
  • FIG. 28 is a flowchart illustrating an example of the progress of the default update process according to the fourth embodiment.
  • FIG. 29 is a block diagram showing the minimum configuration of the present invention.
  • a data summarization system reduces the amount of information of data that is sequentially generated over time.
  • the data summarization system can reduce the amount of information, thereby reducing the storage capacity required for storing data as compared with the case where the data itself is stored accurately as it is. This reduction of data information is called “summary”. Generally, the accuracy of data decreases due to summarization.
  • A data summarization system according to one aspect of the present invention can accurately and efficiently summarize data that is generated sequentially over time with a constant tendency and whose tendency may change irregularly.
  • The CPU usage rate is one example of data that is generated with a constant tendency and whose tendency may change irregularly.
  • Such data is not limited to the CPU usage rate. For example, the number of accesses per unit time to an observed Web page and the total number of accesses since the Web page was published also correspond to "data that is sequentially generated with a certain tendency and whose tendency may change irregularly".
  • the communication amount of the network device also corresponds to “data that is sequentially generated in a certain tendency and the data tendency may change irregularly”.
  • “Sequentially generated data” is a concept including “sequentially observable data”.
  • numerical values that occur with the passage of time will be described as an example as “data that occurs sequentially with a certain tendency and may change irregularly”.
  • Even if the data itself is not a numerical value, any data can be used as long as it can be converted into a numerical value and the difference between the converted numerical values can be derived.
  • An example of applying the present invention to such data other than numerical values will be described later.
  • each data includes a data value and an occurrence time of the data value. In the following description, the occurrence time of this data value may be simply referred to as the occurrence time of data.
  • the data summarization system derives a function for calculating a data value (numerical value) from a set of generated data, using the generation time as a variable.
  • This function is an approximate expression for obtaining an approximate value of the data value from the occurrence time.
  • For this approximate expression, a domain in which an approximate value of the data value can be obtained is determined.
  • This domain is hereinafter referred to as an effective domain.
  • the effective domain is represented by a set of sections (time zones) or points (specific time). A plurality of sections or points may be defined for one effective definition area.
  • FIG. 1 is an explanatory diagram showing an example of an effective domain. In FIG. 1, the horizontal axis represents time, and the vertical axis represents data values.
  • When new data is generated, the data summarization system determines whether there is an approximate expression that appropriately yields an approximate value of its data value. If there is, the generation time of the new data is added to the effective domain of that approximate expression. If there is not, the new data is stored as a point for which the corresponding approximate expression is not determined (hereinafter referred to as an uncertain point).
  • the data summarization system creates an approximate expression from the uncertain points when the uncertain points are accumulated in a number that can derive a new approximate expression.
  • the data summarization system stores each data in the form of an approximate expression and an effective definition area, instead of storing the generation time and the data value for each data. Furthermore, the data summarization system according to an aspect of the present invention allows a plurality of sections and points (specific times) to be defined as an effective definition area of one approximate expression. As a result, a data summarization system according to one aspect of the present invention efficiently compresses (ie summarizes) data.
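  • The overall flow described above can be sketched as follows. This is a simplified sketch under stated assumptions, not the patented procedure: the function name `summarize`, the dictionary layout, the rule of picking the expression with the smallest error, the absolute-error threshold `eps`, and the generation of a line from exactly two uncertain points are all choices made for illustration. The effective domain is kept as a plain list of occurrence times, and merging times into sections is omitted.

```python
def summarize(stream, eps):
    """Summarize (t, x) pairs into linear expressions x = a*t + b.

    Returns (expressions, uncertain). Each expression is a dict holding its
    coefficients and the occurrence times that form its effective domain.
    Occurrence times are assumed to be distinct.
    """
    expressions, uncertain = [], []
    for t, x in stream:
        # Pick the existing expression whose approximate value is closest to x.
        best = min(expressions, key=lambda e: abs(e["a"] * t + e["b"] - x), default=None)
        if best is not None and abs(best["a"] * t + best["b"] - x) < eps:
            best["times"].append(t)  # add the occurrence time to the effective domain
            continue
        uncertain.append((t, x))  # no suitable expression: keep as an uncertain point
        if len(uncertain) == 2:  # enough points for a straight line through two points
            (t0, x0), (t1, x1) = uncertain
            a = (x1 - x0) / (t1 - t0)
            expressions.append({"a": a, "b": x0 - a * t0, "times": [t0, t1]})
            uncertain = []
    return expressions, uncertain


stream = [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8), (5, 50), (6, 50), (7, 50), (8, 16)]
exprs, leftover = summarize(stream, eps=0.5)
```

  • In this run the value jumps at t = 5 and returns to the earlier trend at t = 8, so the first expression ends up with a domain that is not one contiguous range. That is the situation the multiple-section effective domain is meant to handle.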
  • FIG. 2 is a block diagram illustrating an example of the data summarization system according to the first embodiment of this invention.
  • The data summarization system of the first embodiment includes a data input unit 10, a final time storage unit 11, a new data generation time substitution unit 12, an uncertain point storage unit 13, a new approximate expression generation unit 14, an approximate expression storage unit 15, an accuracy constraint input unit 16, a graph evaluation unit 17, a graph update unit 18, and a definite graph storage unit 19.
  • the data summarization system according to the first embodiment of the present invention summarizes data sequentially input to the data input unit 10 and stores the summarized result in the definite graph storage unit 19.
  • the data input unit 10 acquires data from a data generation source (not shown) that sequentially generates data over time.
  • the mode of the data source differs depending on the type of data.
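  • For example, when the data is the number of accesses to a Web page, the web server may be the data generation source.
  • When the data is the CPU usage rate, the unit that monitors the usage rate of the CPU may be the data generation source.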
  • Each piece of data input from the data generation source to the data input unit 10 includes at least a data value and a generation time of the data value.
  • FIG. 3 is an explanatory diagram illustrating an example of one data input to the data input unit 10.
  • FIG. 3 illustrates the case where the data value is the CPU usage rate.
  • the data includes the generation time of the data value and the data value (CPU usage rate in this example).
  • The uncertain point storage unit 13 is a storage device that stores data for which an approximate expression for obtaining an approximate value of the data value has not yet been specified. Data for which the difference between the actual data value and the approximate value is judged to be large, no matter which existing approximate expression is used to calculate the approximate value, is stored in the uncertain point storage unit 13 as an uncertain point.
  • the data can be regarded as a point having the occurrence time and the data value as coordinates.
  • data for which an approximate expression for obtaining an approximate value of a data value has not yet been specified is expressed using the word “indeterminate point”.
  • The approximate expression for obtaining the approximate value of the data value using the occurrence time as a variable is derived from a plurality of uncertain points stored in the uncertain point storage unit 13. Until the number of data items necessary for determining the approximate expression is obtained, the uncertain point storage unit 13 stores generated data that corresponds to uncertain points. The number of data items necessary to determine the approximate expression depends on the type of approximate expression (a linear expression, a quadratic expression, an expression using a trigonometric function, and so on) and on the algorithm used to determine the approximate expression.
  • FIG. 4 is an explanatory diagram showing an example of uncertain points stored in the uncertain point storage unit 13.
  • Each undetermined point includes an occurrence time of the undetermined point (data) and a data value (CPU usage rate in this example).
  • Generated data for which it is determined that no corresponding approximate expression can be specified becomes an uncertain point; the data structure of each uncertain point is therefore the same as that of the data illustrated in FIG. 3.
  • When newly generated data is input to the new approximate expression generation unit 14, the unit determines whether the total number of the generated data and the uncertain points stored in the uncertain point storage unit 13 has reached the number necessary for determining an approximate expression.
  • When the new approximate expression generation unit 14 determines that the number of data items necessary for determining the approximate expression is available, it generates from those data a function (approximate expression) that calculates a data value using the occurrence time as a variable. For example, assume that the number of data items required for generating the approximate expression is k, and that k − 1 uncertain points are stored in the uncertain point storage unit 13.
  • In this case, when one newly generated data item is input to the new approximate expression generation unit 14, the unit generates an approximate expression from the k data items obtained by adding that data item to the uncertain points. After generating the approximate expression, the new approximate expression generation unit 14 outputs the approximate expression and each data item used to generate it (that is, the uncertain points stored in the uncertain point storage unit 13 and the newly generated data) to the graph update unit 18. Instead of outputting the newly generated approximate expression to the graph update unit 18, the new approximate expression generation unit 14 may store the approximate expression and its ID (identification information) in the approximate expression storage unit 15 and output the approximate expression ID to the graph update unit 18.
  • FIG. 5 is an explanatory diagram schematically showing uncertain points used for generating an approximate expression and newly generated data.
  • the horizontal axis shown in FIG. 5 represents time t, and the vertical axis shown in FIG. 5 represents the data value x.
  • In this example, the new approximate expression generation unit 14 generates a linear expression as the approximate expression.
  • The generation method is assumed to be the least squares method.
  • It is also assumed that, when the new approximate expression generation unit 14 generates a linear expression by the least squares method, the number of necessary data items is four.
  • Suppose that the three uncertain points $P_{400}$ shown in the figure are stored in the uncertain point storage unit 13 and that data $P_{401}$ is newly input to the data input unit 10. Then, the new approximate expression generation unit 14 generates an approximate expression from the three uncertain points $P_{400}$ and the new data $P_{401}$.
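  • As an illustration of this step, the following sketch fits a linear expression x = a*t + b to three uncertain points and one new data item using the standard closed-form least squares solution. The function name `fit_line` and the sample values are assumptions for the example; the source does not give concrete numbers.

```python
def fit_line(points):
    """Ordinary least-squares fit of x = a*t + b to a list of (t, x) pairs."""
    n = len(points)
    st = sum(t for t, _ in points)        # sum of occurrence times
    sx = sum(x for _, x in points)        # sum of data values
    stt = sum(t * t for t, _ in points)   # sum of squared times
    stx = sum(t * x for t, x in points)   # sum of time * value
    denom = n * stt - st * st
    if denom == 0:
        raise ValueError("need at least two distinct occurrence times")
    a = (n * stx - st * sx) / denom       # first-order coefficient
    b = (sx - a * st) / n                 # constant term
    return a, b


# Hypothetical sample: three stored uncertain points plus one newly generated data item.
uncertain_points = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9)]
new_data = (3.0, 7.0)
a, b = fit_line(uncertain_points + [new_data])
```

  • The returned pair (a, b) corresponds to the combination of first-order coefficient and constant term that the description uses to represent a linear approximate expression.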
  • FIG. 6 is an explanatory view schematically showing an approximate expression generated from each data shown in FIG.
  • FIG. 7 is an explanatory diagram showing an example of the expression format of the approximate expression generated by the new approximate expression generation unit 14.
  • The above example shows the case where the new approximate expression generation unit 14 obtains an approximate expression represented by a linear function by the least squares method, using three uncertain points and one newly generated data item.
  • The function used by the new approximate expression generation unit 14 as an approximate expression is not limited to a linear function.
  • The new approximate expression generation unit 14 may generate an approximate expression represented by a polynomial function of degree two or higher, an exponential function, or a trigonometric function.
  • the method of generating the approximate expression is not limited to the least square method, and the approximate expression may be generated by another method.
  • the number of data necessary for generating the approximate expression differs depending on the type of approximate expression and the method for generating the approximate expression.
  • the new approximate expression generation unit 14 may generate an approximate expression.
  • the new approximate expression generation unit 14 may generate an approximate expression as a linear function connecting two points. In this case, if there are two data, the new approximate expression generation unit 14 can generate an approximate expression.
  • the new approximate expression generation unit 14 generates, as an approximate expression, a straight line connecting two points in the plane of the generation time t and the data value x from one uncertain point and one newly generated data. May be. In this case, the number of data necessary for generating the approximate expression is two. Further, the new approximate expression generation unit 14 may generate the approximate expression using other methods such as spline interpolation. In the following description, a case where the new approximate expression generation unit 14 generates an approximate expression of a linear function will be described as an example.
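  • The two-point case mentioned above can be sketched as follows. The function name `line_through` and the sample values are assumptions for illustration; the only requirement taken from the description is that one uncertain point and one newly generated data item determine a straight line in the plane of occurrence time t and data value x.

```python
def line_through(p0, p1):
    """Return (a, b) of the line x = a*t + b passing through two (t, x) points."""
    (t0, x0), (t1, x1) = p0, p1
    if t0 == t1:
        raise ValueError("the two data items must have different occurrence times")
    a = (x1 - x0) / (t1 - t0)  # slope between the two points
    return a, x0 - a * t0      # constant term chosen so the line passes through p0


# Hypothetical sample: one stored uncertain point and one newly generated data item.
a, b = line_through((10.0, 40.0), (12.0, 44.0))
```

  • With this method only two data items are needed, so a new expression becomes available sooner than with a four-point least squares fit, at the cost of being more sensitive to a single noisy value.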
  • the approximate expression storage unit 15 is a storage device that stores an approximate expression for obtaining a data value using the occurrence time as a variable, together with an ID of the approximate expression.
  • the graph update unit 18 stores the approximate expression in the approximate expression storage unit 15 together with the ID.
  • the graph update unit 18 may assign the ID of the approximate expression.
  • the approximate expression storage unit 15 may store a combination of a first-order coefficient and a constant term as in the case shown in FIG.
  • this storage mode is an example, and the approximate expression storage unit 15 may store the approximate expression in another form.
  • FIG. 8 is an explanatory diagram illustrating an example of an approximate expression stored in the approximate expression storage unit 15 and its approximate expression ID. As shown in FIG. 8, the approximate expression storage unit 15 stores an approximate expression ID, which is identification information of the approximate expression, in association with the approximate expression (in this example, expressed by a combination of a first-order coefficient and a constant term).
  • The final time storage unit 11 is a storage device that stores a pair consisting of the generation time of the most recently generated data other than uncertain points and the approximate expression that approximates the data value of that data.
  • In other words, among the data for which an approximate expression that appropriately yields an approximate value of the data value has been specified, the final time storage unit 11 stores the generation time of the last-generated data paired with the approximate expression that appropriately approximates that data.
  • FIG. 9 is an explanatory diagram showing an example of final data information stored in the final time storage unit 11.
  • the final data information includes an approximate expression ID and a final time.
  • the approximate expression is specified by the approximate expression ID.
  • the final time is the data generation time of the data that has occurred last among the data that can be approximated by the approximate expression.
  • An uncertain point stored in the uncertain point storage unit 13 is data that cannot be approximated by any known approximate expression, and is therefore not a target to be stored in the final time storage unit 11.
  • Accordingly, when newly generated data becomes an uncertain point, the final time stored in the final time storage unit 11 is not updated.
  • In this example, the approximate expression is represented by the approximate expression ID, but the final time storage unit 11 may store the approximate expression in the same format as the approximate expression generated by the new approximate expression generation unit 14 or stored in the approximate expression storage unit 15.
  • the final time storage unit 11 may store a combination of a primary coefficient and a constant term as information representing an approximate expression instead of the approximate expression ID.
  • the new data generation time substitution unit 12 substitutes the generation time of the generation data newly input to the data input unit 10 for each approximate expression generated in the past, and calculates an approximate value of the data value.
  • the new data generation time substitution unit 12 outputs a set of each approximate expression and an approximate value calculated for each approximate expression to the graph evaluation unit 17.
  • the new data generation time substitution unit 12 reads the final data information from the final time storage unit 11.
  • The new data generation time substitution unit 12 also outputs to the graph evaluation unit 17 information indicating which of the sets of approximate expression and approximate value corresponds to the approximate expression indicated by the final data information. Further, the new data generation time substitution unit 12 outputs the generated data newly input to the data input unit 10 to the graph evaluation unit 17.
  • FIG. 10 is an explanatory diagram showing an example of processing of the new data generation time substitution unit 12.
  • x f0 (t)
  • x f1 (t)
  • x f2 (t)
  • each black circle represents each data.
  • the data shown in the vicinity of the line representing each approximate expression is data that can approximate the data value with the approximate expression.
  • The occurrence time ti of the new generated data is substituted into each approximate expression to calculate approximate values. These approximate values are X1010, X1011, X1012, and X1013.
  • the accuracy constraint input unit 16 receives a standard (accuracy) that can be said that the approximate value calculated by the approximate expression appropriately approximates the actual data value, and stores the standard.
  • One example of such a standard (accuracy) is that the absolute value of the difference between the approximate value f(t) given by the approximate expression f and the actual generated data value x is less than a predetermined threshold value.
  • This criterion can be expressed as |f(t) - x| < (threshold value).
  • Alternatively, the criterion may be that the absolute value of the ratio of the difference between the approximate value f(t) and the actual generated data value x to the approximate value f(t) is less than the threshold value.
  • This criterion can be expressed as |(f(t) - x) / f(t)| < (threshold value).
  • the case where the above calculation results are both less than the threshold is exemplified as the reference, but a reference that the above calculation results are equal to or less than the threshold may be used.
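  • The two accuracy criteria described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the `inclusive` flag (for the "equal to or less than" variant), and the handling of f(t) = 0 in the ratio criterion are all assumptions of this sketch.

```python
def within_absolute(approx: float, actual: float, threshold: float,
                    inclusive: bool = False) -> bool:
    """Absolute criterion: |f(t) - x| < threshold (or <= when inclusive)."""
    error = abs(approx - actual)
    return error <= threshold if inclusive else error < threshold


def within_relative(approx: float, actual: float, threshold: float,
                    inclusive: bool = False) -> bool:
    """Ratio criterion: |(f(t) - x) / f(t)| < threshold (or <= when inclusive)."""
    if approx == 0.0:
        # The ratio is undefined here; treating it as "not satisfied"
        # is an assumption of this sketch.
        return False
    error = abs((approx - actual) / approx)
    return error <= threshold if inclusive else error < threshold
```

  • For instance, `within_absolute(10.0, 10.4, 0.5)` holds because the difference 0.4 is below the threshold 0.5, while a difference of exactly 0.5 passes only with `inclusive=True`.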
  • the definite graph storage unit 19 is a storage device that stores an effective domain for each approximate expression that approximates past generated data.
  • FIG. 11 is an explanatory diagram illustrating an example of an effective definition area for each approximate expression stored in the definite graph storage unit 19. In the example shown in FIG. 11, each approximate expression is represented by an approximate expression ID.
  • a time range that is an effective definition area and a time point that is an effective definition area are determined.
  • a portion expressed as a time range (that is, a time zone) in the valid definition area is shown as “section”, and a portion expressed as a specific point of time is shown as “point”.
  • In this example, the effective domain comprises t22b ≤ t ≤ t22e, t23b ≤ t ≤ t23e, and t = t21.
  • the effective domain of another approximate expression can also be specified from the information stored in the definite graph storage unit 19.
  • The confirmed graph storage unit 19 may store information of the following data structure. That is, the confirmed graph storage unit 19 may store, in association with each other, an approximate expression for calculating an approximate value of a data value by substituting a variable and an effective definition area, which is the range of the variable over which the approximate value of the data value can be obtained.
  • In this data structure, the effective domain is represented as a set of sections of the variable and points each representing one value of the variable.
  • this variable is a variable representing time.
  • In this data structure, a section or point of the effective domain associated with another approximate expression is permitted to lie between two sections, between two points, or between a section and a point of the effective domain associated with a given approximate expression. For example, the order of the time series of each section and point shown in FIG.
  • the data summarization system, data summarization method, and data summarization program according to each aspect of the present invention can preferably use such a data structure.
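  • The data structure described above (an approximate expression associated with an effective domain made of sections and points) could be modelled as in the following sketch. The class name `EffectiveDomain`, the numeric times, and the cost model in `stored_numbers` (two numbers per section, one per point) are illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class EffectiveDomain:
    """Times at which one approximate expression is considered valid."""
    sections: list = field(default_factory=list)  # closed intervals (start, end)
    points: list = field(default_factory=list)    # isolated time points

    def contains(self, t: float) -> bool:
        return any(b <= t <= e for b, e in self.sections) or t in self.points

    def stored_numbers(self) -> int:
        # Assumed cost model: two numbers per section, one per point.
        return 2 * len(self.sections) + len(self.points)


# Confirmed graph storage: approximate expression ID -> effective domain.
# A section of "f1" lies between the two sections of "f2", which is allowed.
confirmed_graph = {
    "f1": EffectiveDomain(sections=[(0.0, 4.0), (10.0, 11.0)]),
    "f2": EffectiveDomain(sections=[(6.0, 9.0), (12.0, 15.0)], points=[5.0]),
}
```

  • Keeping sections and points in separate lists mirrors the "section" and "point" items of the stored table, so a lookup for a given time only has to scan the entries of one approximate expression ID.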
  • The graph evaluation unit 17 compares each approximate value, which the new data generation time substitution unit 12 calculated by substituting the generation time of the new data into each approximate expression stored in the approximate expression storage unit 15, with the actual data value of the new data. The graph evaluation unit 17 then specifies an approximate expression that satisfies the criterion stored in the accuracy constraint input unit 16. Furthermore, when a plurality of approximate expressions satisfy the criterion, the graph evaluation unit 17 identifies, among them, the approximate expression that minimizes the increase in storage capacity when the effective domain is updated.
  • The graph evaluation unit 17 decides to approximate the data value of the new data with the approximate expression thus identified. It then outputs the determined approximate expression, its valid domain, and the new data (the generated data received from the new data generation time substitution unit 12) to the graph update unit 18. At this time, the graph evaluation unit 17 also outputs information indicating whether or not the determined approximate expression is the final approximate expression to the graph update unit 18. The graph evaluation unit 17 may simply output the approximate expression ID to the graph update unit 18 as the determined approximate expression.
  • the graph evaluation unit 17 may read the effective definition area from the confirmed graph storage unit 19.
  • When the approximate expressions output from the new data generation time substitution unit 12 to the graph evaluation unit 17 are expressed in the form of approximate expression IDs, the graph evaluation unit 17 reads all the approximate expressions stored in the approximate expression storage unit 15 from that unit. The graph evaluation unit 17 may not be able to specify an approximate expression that satisfies the criterion stored in the accuracy constraint input unit 16; that is, no approximate expression may satisfy the criterion. In that case, the graph evaluation unit 17 may output the new data (the generated data received from the new data generation time substitution unit 12) to the graph update unit 18 without selecting an approximate expression. An example of approximate expression selection by the graph evaluation unit 17 will be specifically described with reference to FIGS.
  • FIGS. 11 , I 01 , I 12 Etc. are sections and points included in the effective definition area, and correspond to the sections and points illustrated in FIG.
  • x = f0(t), x = f1(t), x = f2(t), x = f3(t)
  • The generation time of the new generated data received by the graph evaluation unit 17 is denoted ti.
  • The data value of the generated data is denoted xi.
  • The case where data P1021 occurs at time ti is illustrated.
  • When the criterion holds only for x = f1(t), the graph evaluation unit 17 selects x = f1(t). When a plurality of approximate expressions satisfy the accuracy criterion, the graph evaluation unit 17 considers each of those approximate expressions in turn. For the approximate expression under consideration, the graph evaluation unit 17 calculates the increase in the storage capacity needed to store the effective domain when the effective domain is updated on the assumption that this approximate expression represents the new generated data.
  • FIG. 13 illustrates a selection example of this aspect.
  • The case where data P1022 occurs is illustrated.
  • In this case, two approximate expressions both satisfy the criterion that the absolute value of the difference between the data value xi of the data P1022 and the approximate value is less than the threshold value.
  • When the approximate expressions satisfying the criterion include the final approximate expression and the end of its effective domain is the end point of a “section”, the graph evaluation unit 17 may select that final approximate expression.
  • This is because, when ti is added to the effective definition area of such a final approximate expression, only the end point of the last “section” changes, so the storage capacity necessary for expressing the effective domain does not increase when the effective domain is updated.
  • For example, when the effective domain after the update is {t21} ∪ [t22b, t22e] ∪ [t23b, ti], the storage capacity necessary for expressing the effective domain does not increase.
  • The graph evaluation unit 17 may select one approximate expression from among these approximate expressions.
  • the method for selecting the approximate expression may be a method in which one approximate expression is arbitrarily determined in advance.
  • FIG. 14 illustrates the case where data P1023 occurs at time ti.
  • In this case, two approximate expressions both satisfy the criterion that the absolute value of the difference between the data value xi of the data P1023 and the approximate value is less than the threshold value.
  • This selection method is an example of selecting one approximate expression from a plurality of approximate expressions having the same increase in storage capacity, and the approximate expression may be selected by another method. For example, among a plurality of approximate expressions having the same increase in storage capacity when ti is added to the effective definition area, the graph evaluation unit 17 may select the approximate expression whose upper limit (end) of the effective domain is closest to the current time. In other words, the graph evaluation unit 17 may select the approximate expression having the maximum upper limit (end) of the effective domain. In the example shown in FIG.
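  • The selection rule described above (minimize the increase in storage capacity, then break ties by the latest end of the effective domain) can be sketched as follows. The `Candidate` record and the cost model in `storage_increase` (zero when the last section of the final approximate expression is simply extended, one otherwise) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    expr_id: str
    is_final: bool           # True if this expression approximated the latest data
    ends_with_section: bool  # True if the domain's last element is a "section"
    domain_end: float        # upper limit (end) of the effective domain


def storage_increase(c: Candidate) -> int:
    """Extra stored numbers needed to add the new time (assumed cost model)."""
    if c.is_final and c.ends_with_section:
        return 0  # only the end point of the last section is overwritten
    return 1      # a point becomes a section, or a new point is added


def select_expression(candidates: List[Candidate]) -> Optional[Candidate]:
    """Smallest storage increase first; ties go to the latest domain end."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: (storage_increase(c), -c.domain_end))
```

  • Returning `None` for an empty candidate list corresponds to the case where no approximate expression satisfies the criterion and the new data is handled as an undetermined point.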
  • Depending on what it receives from the graph evaluation unit 17, the graph update unit 18 either updates the effective definition area of the approximate expression selected by the graph evaluation unit 17 or adds an undetermined point to the uncertain point storage unit 13.
  • the graph update unit 18 when there is an input from the new approximate expression generation unit 14, the graph update unit 18 newly registers the approximate expression in the approximate expression storage unit 15.
  • The graph update unit 18 receives from the graph evaluation unit 17 the approximate expression determined by the graph evaluation unit 17, its valid definition area, the new generated data, and information indicating whether the determined approximate expression is the final approximate expression.
  • the graph updating unit 18 updates the effective domain of the approximate expression.
  • the graph updating unit 18 may update the effective definition area so as to add the generation time of new generation data to the effective definition area of the input approximate expression.
  • the graph update unit 18 stores the updated effective definition area of the approximate expression determined by the graph evaluation unit 17 in the confirmed graph storage unit 19. At this time, the graph updating unit 18 may update the effective domain by dividing the case as follows.
  • The graph update unit 18 may create a new “section” whose start point is that point and whose end point is the generation time of the new generated data.
  • the graph updating unit 18 may exclude the “point” that is the end of the effective definition area from the classification of “points” in the effective definition area. In the example illustrated in FIG. 11, the graph updating unit 18 excludes one point from the “point” item and adds a new section to the “section” item.
  • The graph update unit 18 adds the generation time of the new generated data, as a “point”, to the effective domain of the approximate expression. Further, when updating the effective definition area of the approximate expression determined by the graph evaluation unit 17, the graph update unit 18 also updates the final data information. The graph update unit 18 updates the approximate expression ID of the final data information (see FIG.
  • the graph update unit 18 stores the generated data as an undefined point in the undefined point storage unit 13.
  • In this case, the graph update unit 18 only stores one undetermined point in the uncertain point storage unit 13; the information stored in the final time storage unit 11, the approximate expression storage unit 15, and the confirmed graph storage unit 19 is not updated.
  • When the new approximate expression generation unit 14 newly generates an approximate expression from the undetermined points, it outputs the approximate expression and each piece of data used for generating it (that is, each undetermined point stored in the uncertain point storage unit 13 and the newly input generated data) to the graph update unit 18.
  • the graph updating unit 18 assigns an approximate expression ID to the new approximate expression, and stores the new approximate expression and the approximate expression ID in association with each other in the approximate expression storage unit 15.
  • the graph update unit 18 deletes the undetermined point used for generating a new approximate expression. In other words, the graph update unit 18 deletes each undetermined point stored in the undetermined point storage unit 13.
  • the graph update unit 18 determines an effective definition area of the new approximate expression from the generation time of each data used for generating the approximate expression, and stores it in the definite graph storage unit 19 together with the approximate expression ID.
  • At this time, the graph updating unit 18 determines the effective domain so that the number of sections whose start and end points are generation times of the data used for generating the new approximate expression is as small as possible, and so that it does not overlap the effective domain of any existing approximate expression.
  • The graph updating unit 18 may define as a “point” a time that lies in isolation between the sections and points of the effective definition areas of existing approximate expressions and therefore cannot be set as the start point or end point of a section. Referring to FIG. 11, FIG. 13, FIG. 12, and FIG.
  • Since the end of the effective domain is the end point of the section I23, the graph updating unit 18 updates the end point t23e of the section I23 = [t23b, t23e] (see FIG. 11) to ti.
  • the graph update unit 18 stores the updated valid definition area in the confirmed graph storage unit 19.
  • The sections corresponding to the approximate expression ID “f2” are updated from [t22b, t22e], [t23b, t23e] to [t22b, t22e], [t23b, ti].
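  • The update cases described above (extending the last section, turning a trailing point into a section, or adding a new point) can be sketched as follows. The function `add_time`, its argument layout, and the numeric times are hypothetical; the sketch assumes that times arrive in increasing order.

```python
def add_time(sections, points, t_new, is_final):
    """Return updated (sections, points) after adding generation time t_new.

    sections: list of (start, end) tuples; points: list of isolated times.
    is_final: True if this expression approximated the previous data.
    """
    sections, points = list(sections), list(points)
    last_section_end = max((e for _, e in sections), default=None)
    last_point = max(points, default=None)
    ends_with_section = last_section_end is not None and (
        last_point is None or last_section_end > last_point)
    if is_final and ends_with_section:
        # Domain ends with a section: move its end point to t_new.
        i = max(range(len(sections)), key=lambda k: sections[k][1])
        sections[i] = (sections[i][0], t_new)
    elif is_final and last_point is not None:
        # Domain ends with a point: replace it by the section [point, t_new].
        points.remove(last_point)
        sections.append((last_point, t_new))
    else:
        # Not the final expression: t_new is stored as an isolated point.
        points.append(t_new)
    return sections, points
```

  • With sections [(6, 9), (12, 15)] and point 5 standing in for [t22b, t22e], [t23b, t23e], and t21, adding a later time to the final approximate expression changes only the end of the last section, so the number of stored values stays the same.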
  • The graph update unit 18 also determines the effective domain from the undetermined point P1030 and the new generated data P1031. The graph update unit 18 determines the uncertain point P shown in FIG.
  • The data input unit 10, the new data generation time substitution unit 12, the new approximate expression generation unit 14, the accuracy constraint input unit 16, the graph evaluation unit 17, and the graph update unit 18 are realized by, for example, a CPU of a computer that operates according to the data summarization program.
  • In this case, a program storage device (not shown) of the computer stores the data summarization program, and the CPU reads the program and, according to the program, operates as the data input unit 10, the new data generation time substitution unit 12, the new approximate expression generation unit 14, the accuracy constraint input unit 16, the graph evaluation unit 17, and the graph update unit 18.
  • each of these units may be realized by separate hardware.
  • the final time storage unit 11, the uncertain point storage unit 13, the approximate expression storage unit 15, and the confirmed graph storage unit 19 may be realized by separate storage devices. Alternatively, it may be realized by the same storage device. Further, some combinations of the final time storage unit 11, the uncertain point storage unit 13, the approximate expression storage unit 15, and the confirmed graph storage unit 19 may be realized by the same storage device.
  • FIG. 16 is a flowchart illustrating an example of processing progress of the first embodiment.
  • While a data generation source (not shown) is sequentially generating data (No in step S100), data is sequentially input from the data generation source to the data input unit 10 (step S101).
  • step S101 data is input to the data input unit 10 in order of generation time one by one.
  • the data summarization system performs the subsequent operations for each piece of data.
  • a plurality of data may be input to the data input unit 10 all together, but even in that case, the data summarization system performs the subsequent processing for each piece of data in order of data generation time.
  • The new approximate expression generation unit 14 determines whether a new approximate expression can be generated from the one piece of generated data input to the data input unit 10 and the undetermined points stored in the uncertain point storage unit 13 (step S102).
  • In step S102, the new approximate expression generation unit 14 may read each undetermined point stored in the uncertain point storage unit 13 and determine whether the undetermined points together with the new generated data input to the data input unit 10 provide the number of data necessary for generating an approximate expression.
  • The new approximate expression generation unit 14 may determine that a new approximate expression can be generated if the necessary number of data is available, and that a new approximate expression cannot be generated otherwise.
  • When the new approximate expression generation unit 14 determines that a new approximate expression can be generated (Yes in step S102), it generates, from the undetermined points and the one piece of generated data input to the data input unit 10, an approximate expression that approximates the data value using the generation time of the data as a variable (step S103).
  • The type of approximate expression and the algorithm for generating it are determined in advance, but neither is particularly limited. As already described, suppose that the approximate expression is a linear expression with the occurrence time t as a variable. When four pieces of data are available, the new approximate expression generation unit 14 may generate the approximate expression by determining the first-order coefficient and the constant term from the four pairs of occurrence time and data value by the least squares method.
  • the new approximate expression generation unit 14 obtains a straight line passing through two points having (occurrence time, data value) as coordinates as a linear expression having the occurrence time t as a variable. May be. Also in this case, the new approximate expression generation unit 14 may determine the primary coefficient and the constant term.
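  • As an illustration of the generation step, the following sketch determines the first-order coefficient and the constant term of x = a·t + b by ordinary least squares. The function name `fit_least_squares` is hypothetical. With exactly two samples, the result is the straight line through both points, so the same routine covers the two-point case as well.

```python
def fit_least_squares(samples):
    """Fit x = a*t + b to (occurrence time, data value) pairs.

    Returns (a, b): the first-order coefficient and the constant term.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_x = sum(x for _, x in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("at least two distinct occurrence times are required")
    cov = sum((t - mean_t) * (x - mean_x) for t, x in samples)
    a = cov / var_t
    b = mean_x - a * mean_t
    return a, b
```

  • For the four samples (0, 1), (1, 3), (2, 5), (3, 7), which lie exactly on one line, the routine returns the coefficient 2 and the constant term 1.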
  • the approximate expression is a linear expression, and the approximate expression is represented by a primary coefficient and a constant term of the variable t as shown in FIG.
  • the expression form of the approximate expression is not limited to this form, and the approximate expression may be expressed in another form.
  • In step S103, the new approximate expression generation unit 14 outputs the generated new approximate expression and each piece of data (each undetermined point and the new generated data) used for generating the approximate expression to the graph update unit 18.
  • the graph update unit 18 assigns an approximate expression ID to the approximate expression received from the new approximate expression generation unit 14 and stores the approximate expression ID in the approximate expression storage unit 15 (step S104). As a result, one additional approximate expression is newly registered.
  • In step S104, the graph updating unit 18 further determines the effective definition area of the newly generated approximate expression from the generation times of the data used for generating it, and stores the effective definition area in the definite graph storage unit 19 together with the approximate expression ID.
  • The graph updating unit 18 determines the effective definition area of the new approximate expression so that the number of sections whose start and end points are generation times of the data used for generating the approximate expression is as small as possible, and so that it does not overlap the effective definition area of any existing approximate expression.
  • The graph update unit 18 may register as a “point” (see FIG. 11) a time that lies in isolation between the sections and points of the effective definition areas of existing approximate expressions and therefore cannot be set as the start point or end point of a section.
  • The new data generation time substitution unit 12 substitutes the generation time of the data input to the data input unit 10 into each approximate expression already generated in the past (that is, each approximate expression stored in the approximate expression storage unit 15), and calculates an approximate value of the data input to the data input unit 10 for each approximate expression (step S105). The new data generation time substitution unit 12 then outputs each set of an approximate expression and the approximate value calculated with it to the graph evaluation unit 17.
  • The new data generation time substitution unit 12 also refers to the final data information and outputs to the graph evaluation unit 17 information indicating which set consists of the final approximate expression and the approximate value calculated from it. Further, the new data generation time substitution unit 12 also outputs the generated data currently being processed (the data input to the data input unit 10) to the graph evaluation unit 17. Details of the processing in step S105 will be described later. After step S105, the graph evaluation unit 17 compares the approximate value of the data calculated for each approximate expression with the actual data value of the data input to the data input unit 10 (step S106).
  • Here, an example is described in which the accuracy constraint input unit 16 stores the criterion that the absolute value of the difference between the approximate value and the actual data value is less than the threshold value (i.e., |f(t) - x| < (threshold value)), and the graph evaluation unit 17 selects an approximate expression that satisfies this criterion. In this case, in step S106 the graph evaluation unit 17 calculates the absolute value of the difference between the approximate value of the data calculated for each approximate expression and the actual data value of the data input to the data input unit 10. Next, the graph evaluation unit 17 determines whether there is an approximate expression that satisfies the criterion that the absolute value of the difference calculated in step S106 is less than the threshold value (step S107).
  • The graph evaluation unit 17 selects, from among the approximate expressions that satisfy the criterion, the approximate expression that minimizes the increase in storage capacity when the effective domain is updated (step S108). That is, the graph evaluation unit 17 selects the approximate expression that minimizes the increase from the storage capacity for storing the effective definition area before the update to the storage capacity for storing the effective definition area after the update. However, if only one approximate expression satisfies the criterion, the graph evaluation unit 17 may select that approximate expression.
  • The graph evaluation unit 17 may select the final approximate expression. This is because the final approximate expression minimizes the increase in storage capacity. If a plurality of approximate expressions satisfy the criterion and none of them is a final approximate expression whose effective domain ends at the end point of a “section”, the graph evaluation unit 17 selects one from the plurality of approximate expressions that satisfy the criterion. This selection method may be determined in advance.
  • the graph evaluation unit 17 selects one approximate expression satisfying the criterion in step S108, the graph evaluation unit 17 outputs the selected approximate expression and its effective definition area and the generated data to be processed to the graph update unit 18.
  • the graph evaluation unit 17 also outputs information indicating whether or not the selected approximate expression is the final approximate expression to the graph update unit 18.
  • the graph evaluation unit 17 may output an approximate expression ID for identifying the approximate expression to the graph update unit 18 as an approximate expression.
  • the graph update unit 18 updates the effective definition area of the approximate expression received from the graph evaluation unit 17 in step S108, and stores it in the confirmed graph storage unit 19 (step S109). Since each mode in which the graph update unit 18 updates the valid domain has already been described, a description thereof is omitted here.
  • In step S109, the graph update unit 18 updates the approximate expression ID of the final data information (see FIG. 9) stored in the final time storage unit 11 to the approximate expression ID of the approximate expression selected in step S108. It then updates the final time (see FIG. 9) of the final data information to the generation time of the generated data being processed.
  • the graph evaluation unit 17 outputs the generated data to be processed to the graph update unit 18, and the graph update unit 18 stores the generated data as an undetermined point in the undetermined point storage unit 13 (step S110).
  • When the data summarization system completes any one of steps S104, S109, and S110, it repeats the same processing for the next data (the next data in order of occurrence time) input to the data input unit 10. In this way, the processing from step S102 onward is performed individually for each piece of generated data input to the data input unit 10, in order of generation time.
  • When the data generation source ends the data generation (Yes in step S100), the data summarization system ends the process.
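  • The flow of steps S101 to S110 can be condensed into the following sketch. It is a simplified illustration under several assumptions that are not part of the patent text: a new approximate expression is a straight line through one undetermined point and the new data, the effective domain is kept as a plain list of times rather than sections and points, the selection in step S108 merely prefers the final approximate expression, and occurrence times are strictly increasing. The name `summarize` is hypothetical.

```python
def summarize(stream, threshold):
    """Process (time, value) pairs one at a time, in order of occurrence time."""
    expressions = {}   # expression ID -> (a, b) for x = a*t + b
    domains = {}       # expression ID -> list of covered times (simplified)
    pending = []       # undetermined points
    final_id = None    # ID of the expression that approximated the latest data
    for t, x in stream:                                  # S101
        if pending:                                      # S102: enough data?
            t0, x0 = pending.pop()
            a = (x - x0) / (t - t0)                      # S103: line through 2 points
            b = x0 - a * t0
            final_id = f"f{len(expressions)}"
            expressions[final_id] = (a, b)               # S104: register expression
            domains[final_id] = [t0, t]
            continue
        ok = [eid for eid, (a, b) in expressions.items()
              if abs(a * t + b - x) < threshold]         # S105 to S107
        if ok:
            chosen = final_id if final_id in ok else ok[0]  # S108 (simplified)
            domains[chosen].append(t)                    # S109: update domain
            final_id = chosen
        else:
            pending.append((t, x))                       # S110: undetermined point
    return expressions, domains, pending
```

  • In a stream where the values follow one line, deviate for a while, and then return to the original line, the last data is attached to the first approximate expression again instead of producing a third one.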
  • In the above description, the case where the accuracy constraint input unit 16 stores the criterion that the absolute value of the difference between the approximate value and the actual data value is less than the threshold value is exemplified, but the criterion stored in the accuracy constraint input unit 16 is not limited to this criterion.
  • In step S105, the new data generation time substitution unit 12 first determines whether there is an approximate expression that has not yet been read from the approximate expression storage unit 15 (step S201). If there is an approximate expression that has not been read, the new data generation time substitution unit 12 reads that approximate expression and its approximate expression ID from the approximate expression storage unit 15 (step S202).
  • The new data generation time substitution unit 12 then outputs the approximate expression ID “f” and the approximate value f(ti), obtained by substituting the generation time ti into the approximate expression, to the graph evaluation unit 17 (step S203). After step S203, the new data generation time substitution unit 12 repeats the processing from step S201. If there is no approximate expression that has not yet been read from the approximate expression storage unit 15 (No in step S201), the new data generation time substitution unit 12 reads the final data information from the final time storage unit 11 (step S204) and outputs the approximate expression ID of the final approximate expression included in the final data information to the graph evaluation unit 17 (step S205). At this time, the new data generation time substitution unit 12 also outputs to the graph evaluation unit 17 the data to be processed that was input to the data input unit 10.
  • the new data generation time substitution unit 12 may execute steps S204 and S205 before the loop processing of steps S201 to S203.
  • The data summarization system according to the first embodiment stores “data that is generated sequentially with a certain tendency but may change irregularly, such as the CPU data rate and the number of web page accesses” as an approximate expression using the occurrence time as a variable, together with an effective domain in which approximation by that approximate expression can be said to be appropriate.
  • The effective domain of one approximate expression is represented as a set of sections indicating time intervals and points indicating time points.
  • the data summarization system according to the first embodiment specifies an approximate expression that satisfies the criteria stored in the accuracy constraint input unit 16 for new data, and approximates the new data using the approximate expression.
  • the data summarization system of the first embodiment can summarize (compress) data with high accuracy.
  • Because the effective definition area of one approximate expression is allowed to include a plurality of “sections” and “points”, the data summarization system of the first embodiment can suppress the storage capacity of the summarized data and realize efficient data summarization. For example, assume that after a state in which data that can be approximated by a certain approximate expression is continuously generated (first state), the tendency of the data value temporarily changes (second state), and then a state in which data that can be approximated by the original approximate expression is generated again (third state) occurs.
  • In this case, the data summarization system of the first embodiment can express the generated data in the first state and the third state by the same approximate expression, and the storage capacity can be reduced accordingly.
  • Otherwise, a data summarization system would need to store an approximate expression and effective domain for the first state and an approximate expression and effective domain for the third state separately, so the same approximate expression would be stored redundantly and the storage capacity would increase.
  • the data summarization system of the first embodiment can prevent such an increase in storage capacity.
  • the data summarization system must store the time of occurrence and data value for one point.
  • the data summarization system according to the first embodiment does not store data with a storage capacity larger than at least the storage capacity in the case of storing the data itself as it is.
  • For each piece of data, the data summarization system according to the first embodiment stores two numerical values, that is, a data value and an occurrence time, so two pieces of data require storage capacity equivalent to four numerical values.
  • When one undetermined point and one piece of new data are available, the data summarization system according to the first embodiment generates a straight line connecting the two points as an approximate expression.
  • the approximate expression is a linear expression
  • the data summarization system according to the first embodiment needs to store a linear coefficient and a constant term.
  • FIG. 20 is a block diagram illustrating an example of a data summarization system according to the second embodiment of this invention. Constituent elements similar to those in the first embodiment are denoted by the same reference numerals as those in FIG.
  • The data summarization system of the second embodiment includes a data input unit 10, a final time storage unit 11, a new data generation time substitution unit 12, a new approximate expression generation unit 14, an indeterminate point storage unit 13, an approximate expression storage unit 15, an accuracy constraint input unit 16, a graph evaluation unit 17a, a graph update unit 18a, a deterministic graph storage unit 19, a default expression input unit 20, and a default expression storage unit 21.
  • the default formula storage unit 21 is a storage device that stores one approximate expression that is known in advance as an approximate expression for obtaining an approximate value of a data value.
  • FIG. 21 is an explanatory diagram illustrating an example of a default formula stored in the default formula storage unit 21.
  • the default formula storage unit 21 stores the first-order coefficient and the constant term of the default formula.
  • FIG. 21 illustrates a case where a default formula that is a linear expression in the variable t is stored.
  • the default formula input unit 20 receives a default formula input from the user and stores the default formula in the default formula storage unit 21.
  • the graph evaluation unit 17a compares each approximate value, which the new data generation time substitution unit 12 calculates by substituting the generation time of the new data into each approximate expression stored in the approximate expression storage unit 15, with the actual data value of the new data. In this respect it is the same as the graph evaluation unit 17 of the first embodiment.
  • the graph evaluation unit 17a of the second embodiment further calculates an approximate value by substituting the generation time of the new data (the newly generated data received from the new data generation time substitution unit 12) into the default formula, and also compares this approximate value with the actual data value of the new data. The graph evaluation unit 17a then identifies the approximate expressions that satisfy the criterion stored in the accuracy constraint input unit 16. When a plurality of approximate expressions satisfy the criterion, the graph evaluation unit 17a identifies, among them, the approximate expression that minimizes the increase in storage capacity when the effective domain is updated, and determines that this approximate expression approximates the data value of the new data.
  • the graph evaluation unit 17a determines that the approximate expression approximates the data value of the new data.
  • the graph evaluation unit 17a selects the default formula if the default formula is among the approximate expressions that satisfy the criterion. This is because no effective domain is defined for the default formula, so the storage capacity needed to represent its effective domain is zero.
  • the method of selecting an approximate expression is the same as that of the graph evaluation unit 17 in the first embodiment, and a description thereof will be omitted.
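The selection rule described above can be sketched as follows. This is only an illustration under stated assumptions: approximate expressions are linear, the accuracy criterion is an absolute error below a threshold `eps`, times are integers, and the cost model in `capacity_increase` (zero when the last interval is extended, two numbers for a new interval) is a hypothetical stand-in for the first embodiment's method, which is not reproduced in this section. The names `Approx`, `select`, and `DEFAULT_ID` are likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

DEFAULT_ID = "default"  # hypothetical dedicated ID for the default formula


@dataclass
class Approx:
    a: float  # first-order coefficient
    b: float  # constant term
    domain: List[Tuple[int, int]] = field(default_factory=list)  # closed intervals

    def value(self, t: int) -> float:
        return self.a * t + self.b


def capacity_increase(expr_id: str, expr: Approx, t: int) -> int:
    """Assumed cost model: numbers added to the stored effective domain."""
    if expr_id == DEFAULT_ID:
        return 0  # the default formula stores no effective domain
    if expr.domain and expr.domain[-1][1] + 1 == t:
        return 0  # extending the last interval adds no numbers
    return 2      # a new interval needs a start and an end


def select(exprs: Dict[str, Approx], t: int, x: float, eps: float) -> Optional[str]:
    """Return the ID of the chosen approximate expression, or None."""
    ok = [i for i, e in exprs.items() if abs(e.value(t) - x) < eps]
    if not ok:
        return None
    if DEFAULT_ID in ok:
        return DEFAULT_ID  # the default formula is preferred when it qualifies
    return min(ok, key=lambda i: capacity_increase(i, exprs[i], t))


exprs = {
    "f1": Approx(2.0, 1.0, [(0, 4)]),
    "f2": Approx(2.0, 1.2, [(0, 2)]),
    DEFAULT_ID: Approx(0.0, 11.0),
}
chosen = select(exprs, t=5, x=11.0, eps=0.5)
```

In this example all three expressions satisfy the criterion, and the default formula is chosen because its effective domain costs nothing to store.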
  • the graph evaluation unit 17a outputs the determined approximate expression and its effective domain, together with the new data (the generated data received from the new data generation time substitution unit 12), to the graph update unit 18a. At this time, the graph evaluation unit 17a also outputs information indicating whether or not the determined approximate expression is the final approximate expression to the graph update unit 18a. Further, the graph evaluation unit 17a may output, for example, the approximate expression ID to the graph update unit 18a as the determined approximate expression. This operation is the same as the operation of the graph evaluation unit 17 in the first embodiment.
  • the graph evaluation unit 17a outputs information notifying that the default expression has been selected and the input new generated data to the graph update unit 18a.
  • a dedicated default formula ID representing the default formula may be used.
  • the graph evaluation unit 17a may output new data to the graph update unit 18a without selecting an approximate expression. This operation is the same as the operation of the graph evaluation unit 17 in the first embodiment.
  • when the graph evaluation unit 17a determines an approximate expression other than the default formula, the graph update unit 18a receives from the graph evaluation unit 17a the approximate expression and its effective domain, the new generated data, and the information about the determined approximate expression, and updates the effective domain of the approximate expression.
  • the graph updating unit 18a may update the effective definition area so as to add the generation time of new generation data to the effective definition area of the received approximate expression.
  • the graph update unit 18a stores the updated effective definition area of the approximate expression determined by the graph evaluation unit 17a in the confirmed graph storage unit 19. This operation is the same as the operation of the graph update unit 18 in the first embodiment.
  • the graph evaluation unit 17a selects the default formula as the approximate expression that approximates the data value of the new data, and outputs information notifying that the default formula has been selected, together with the input new generated data, to the graph update unit 18a.
  • the graph update unit 18a operates as follows. That is, the graph update unit 18a stores the generation time of the new data and the dedicated default formula ID indicating the default formula in the final time storage unit 11 as the final data information.
  • the graph update unit 18a stores the generated data as an uncertain point in the uncertain point storage unit 13.
  • This operation is the same as the operation of the graph update unit 18 in the first embodiment.
  • the operation of the graph update unit 18a when the new approximate expression generation unit 14 newly generates an approximate expression from the uncertain points and outputs the approximate expression and each piece of data used to generate it (that is, each uncertain point stored in the uncertain point storage unit 13 and the newly generated data) to the graph update unit 18a is the same as the operation of the graph update unit 18 in the first embodiment.
  • the graph evaluation unit 17a, the graph update unit 18a, and the default formula input unit 20 in the second embodiment are realized by a CPU of a computer that operates according to a data summarization program, for example.
  • a program storage device (not shown) of the computer stores the data summarization program, and the CPU may read the program and, according to the program, operate as the data input unit 10, the new data generation time substitution unit 12, the new approximate expression generation unit 14, the accuracy constraint input unit 16, the graph evaluation unit 17a, the graph update unit 18a, and the default formula input unit 20.
  • each of these units may be realized by separate hardware.
  • the graph evaluation unit 17a compares the approximate value of the data calculated for each approximate expression with the actual data value of the data input to the data input unit 10 (step S106). However, in the second embodiment, the graph evaluation unit 17a also substitutes the generation time of the new data into the default formula stored in the default formula storage unit 21, and compares the resulting approximate value with the actual data value. For example, when the accuracy constraint input unit 16 stores a criterion that the absolute value of the difference be less than a threshold, the graph evaluation unit 17a determines whether there is an approximate expression that satisfies the criterion that the absolute value of the difference calculated in step S106 is less than the threshold (step S107).
  • the operation when there is no approximate expression that satisfies the criterion (No in step S107) is the same as in the first embodiment.
  • the graph evaluation unit 17a selects, from the approximate expressions that satisfy the criterion, the approximate expression that minimizes the increase in storage capacity when the effective domain is updated (step S108).
  • in step S108, if the default formula is among the approximate expressions satisfying the criterion, the graph evaluation unit 17a selects the default formula and outputs information notifying that the default formula has been selected, together with the newly generated data that has been input, to the graph update unit 18a.
  • the operation when the default formula is not among the approximate expressions that satisfy the criterion is the same as in the first embodiment.
  • the graph update unit 18a stores the final data information, including the default formula ID and the generation time of the data, in the final time storage unit 11 (step S109).
  • step S109 in other cases is the same as that in the first embodiment, and a description thereof will be omitted.
  • the data summarization system of the second embodiment does not provide an effective definition area for the default formula.
  • generated data for which the approximate expression approximating its data value is determined not to be the default formula is associated with one of the approximate expressions. Therefore, the data summarization system of the second embodiment can efficiently summarize generated data approximated by an approximate expression other than the default formula with a small storage capacity, as in the first embodiment.
  • the data summarization system of the second embodiment does not store an effective domain for data approximated by the default formula, so that the summarization can be performed more efficiently.
  • the data summarization system according to the second embodiment can efficiently summarize with a smaller storage capacity than the first embodiment.
  • FIG. 23 is a block diagram illustrating an example of a data summarization system according to the third embodiment of this invention.
  • the data summarization system includes a data input unit 10, a final time storage unit 11, a new data generation time substitution unit 12, a new approximate expression generation unit 14, an uncertain point storage unit 13, an approximate expression storage unit 15, an accuracy constraint input unit 16, a graph evaluation unit 17, a graph update unit 18, a confirmed graph storage unit 19, an unsummarized data storage unit 30, an available storage area monitoring unit 31, and a summary control unit 32. Components similar to those in the first embodiment are denoted by the same reference numerals as those in FIG. 2, and detailed description thereof is omitted. However, when data is input to the data input unit 10 from a data generation source (not shown), the data input unit 10 outputs the input data to the summary control unit 32.
  • the new data generation time substitution unit 12 and the new approximate expression generation unit 14 receive data from the summary control unit 32.
  • the unsummarized data storage unit 30 is a storage device that stores data generated by a data generation source (not shown) in an unsummarized state. Since the data includes a data value and an occurrence time, the unsummary data storage unit 30 stores a data group including the data value and the occurrence time.
  • FIG. 4 is an explanatory diagram illustrating a plurality of uncertain points stored in the uncertain point storage unit 13; the unsummarized data storage unit 30 likewise stores multiple pieces of data, each including a data value and a generation time, in the same manner as illustrated there. The process of storing data in the unsummarized data storage unit 30 is performed by the summary control unit 32.
  • the unsummarized data storage unit 30, the final time storage unit 11, the uncertain point storage unit 13, the approximate expression storage unit 15, and the confirmed graph storage unit 19 may be realized by separate storage devices. Alternatively, they may be realized by the same storage device. Further, some combination of the unsummarized data storage unit 30, the final time storage unit 11, the uncertain point storage unit 13, the approximate expression storage unit 15, and the confirmed graph storage unit 19 may be realized by the same storage device. For example, the unsummarized data storage unit 30 and the final time storage unit 11 may be realized by the same storage device, and the approximate expression storage unit 15 and the confirmed graph storage unit 19 may be realized by a storage device different from that storage device. The uncertain point storage unit 13 may be realized by the storage device.
  • the available storage area monitoring unit 31 monitors the amount of available resources in the storage device that stores at least unsummarized data, and outputs the monitoring result to the summary control unit 32.
  • the unsummarized data here means data stored in the unsummarized data storage unit 30 from the data generation source via the data input unit 10 and the summary control unit 32. That is, it means data that has not yet been summarized as a target of the summary process. Therefore, the available storage area monitoring unit 31 only needs to monitor the amount of resources that can be used in the unsummarized data storage unit 30. If the unsummarized data storage unit 30 is realized by the same storage device as other storage units, the available storage area monitoring unit 31 may monitor the amount of resources that can be used in the storage device.
  • the available storage area monitoring unit 31 monitors an available resource amount
  • the available storage area monitoring unit 31 may monitor the resource amount in another aspect.
  • when the unsummarized data storage unit 30 is a disk storage device, the unused rate of the disk may be monitored.
  • the case where the available storage area monitoring unit 31 monitors the amount of available resources is described here, but the available storage area monitoring unit 31 may instead monitor the amount of resources already used (for example, the disk usage rate).
  • the case where the available storage area monitoring unit 31 monitors the amount of available resources in the unsummarized data storage unit 30 will be described as an example.
  • the available storage area monitoring unit 31 may monitor the unsummarized data storage unit 30 at regular intervals. Alternatively, for example, the available storage area monitoring unit 31 may monitor the unsummarized data storage unit 30 when an instruction to perform monitoring is input at an arbitrary timing from a user or the like.
  • the summary control unit 32 stores the generated data input to the data input unit 10 in the unsummarized data storage unit 30 in an unsummarized state according to the monitoring result by the available storage area monitoring unit 31. Alternatively, the summary control unit 32 performs summary control for summarizing the data stored in the unsummarized data storage unit 30 according to the monitoring result by the available storage area monitoring unit 31.
  • if the amount of available resources in the unsummarized data storage unit 30 is greater than a threshold value, the summary control unit 32 stores the generated data input to the data input unit 10 in the unsummarized data storage unit 30 without summarizing it. On the other hand, if the amount of available resources in the unsummarized data storage unit 30 is equal to or less than the threshold value, the summary control unit 32 performs summary control for summarizing the data stored in the unsummarized data storage unit 30. Specifically, the summary control unit 32 outputs the data stored in the unsummarized data storage unit 30 to the new approximate expression generation unit 14 and the new data generation time substitution unit 12 to start the data summarization process.
  • when the summary control unit 32 outputs the data stored in the unsummarized data storage unit 30 to the new approximate expression generation unit 14 and the new data generation time substitution unit 12, the summary control unit 32 erases that data from the unsummarized data storage unit 30. In addition, when the available storage area monitoring unit 31 monitors the amount of resources already used, the summary control unit 32 may store the generated data in the unsummarized data storage unit 30 when the amount of used resources is less than the threshold, and perform the summary control when the amount of used resources is equal to or greater than the threshold. Also, when performing summary control, the summary control unit 32 may store new generated data in the unsummarized data storage unit 30 and at the same time perform summary control on that data.
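The threshold behavior described above can be sketched roughly as follows. The sketch assumes that the resource amount is measured as the number of free slots in an in-memory list, and that a single callback `summarize` stands in for the new approximate expression generation unit 14 and the new data generation time substitution unit 12. The class name `SummaryControl` and its parameters are hypothetical.

```python
from typing import Callable, List, Tuple

Datum = Tuple[int, float]  # (generation time, data value)


class SummaryControl:
    def __init__(self, capacity: int, threshold: int,
                 summarize: Callable[[Datum], None]) -> None:
        self.capacity = capacity      # assumed size of the unsummarized store
        self.threshold = threshold    # summarize when free slots <= threshold
        self.summarize = summarize    # stands in for units 12 and 14
        self.unsummarized: List[Datum] = []

    def available(self) -> int:
        """Amount of available resources, here the number of free slots."""
        return self.capacity - len(self.unsummarized)

    def on_input(self, datum: Datum) -> None:
        if self.available() > self.threshold:
            self.unsummarized.append(datum)  # plenty of room: keep raw data
            return
        # Resources are low: store the new datum too, then summarize
        # everything in order of generation time and erase it from the store.
        self.unsummarized.append(datum)
        for d in sorted(self.unsummarized):
            self.summarize(d)
        self.unsummarized.clear()


summarized: List[Datum] = []
control = SummaryControl(capacity=3, threshold=1, summarize=summarized.append)
for item in [(2, 0.5), (1, 0.4), (3, 0.6)]:
    control.on_input(item)
```

Here the third input finds only one free slot, so all three stored items are handed to the summarization step in time order and removed from the unsummarized store.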
  • the summary control unit 32 outputs the data stored in the non-summary data storage unit 30 to the new approximate expression generation unit 14 and the new data generation time substitution unit 12 one by one when performing summary control.
  • the summary control unit 32 outputs the same data simultaneously to the new approximate expression generation unit 14 and the new data generation time substitution unit 12, for example.
  • it suffices for the summary control unit 32 to output the data one by one in an order that satisfies the condition that data output later has a later generation time than data output earlier.
  • the summary control unit 32 does not need to delete the output data in order of time, provided that the condition that already output data is not output again is satisfied.
  • FIG. 24 is a schematic diagram in which the data stored in the unsummarized data storage unit 30 is schematically arranged in the order of occurrence time. It is assumed that the data 51 is data that is first input to the data input unit 10 and stored in the non-summary data storage unit 30, and thereafter, the data 52 and subsequent data are sequentially stored in the non-summary data storage unit 30.
  • the summary control unit 32 may output the data to the new approximate expression generation unit 14 and the new data generation time substitution unit 12 in order of generation time, starting from the data 51.
  • the summary control unit 32 may start from the middle of the generated data (for example, from the data 55) and output the data in order of generation time to the new approximate expression generation unit 14 and the new data generation time substitution unit 12.
  • the data 51 to 54 are not subject to summarization and are kept in the unsummarized data storage unit 30 without being deleted. As long as the condition that data output later to the new approximate expression generation unit 14 and the new data generation time substitution unit 12 has a later generation time than data output earlier is satisfied, the summary control unit 32 may skip some of the data when outputting. For example, after the output of the data 51 to 54, the summary control unit 32 may skip the data 55 and output the data 56 to 59.
  • the skipped data is not included in the summary target and is kept in the unsummarized data storage unit 30.
  • the summary control unit 32 may output data satisfying the condition that the generation time is after the final time indicated by the final data information to the new approximate expression generation unit 14 and the new data generation time substitution unit 12.
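The ordering conditions above can be sketched as a small generator. It assumes integer generation times; the function name `output_order` and the parameters `final_time` and `skip_times` are hypothetical names chosen for this illustration.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple

Datum = Tuple[int, float]  # (generation time, data value)


def output_order(stored: Iterable[Datum],
                 final_time: Optional[int] = None,
                 skip_times: Optional[Set[int]] = None) -> Iterator[Datum]:
    """Yield data so that each output has a later generation time than the last."""
    skip = skip_times or set()
    last = final_time
    for t, x in sorted(stored):
        if t in skip:
            continue  # skipped data stays in the unsummarized store
        if last is not None and t <= last:
            continue  # not after the final time or the previous output
        last = t
        yield (t, x)


stored = [(3, 0.3), (1, 0.1), (2, 0.2), (4, 0.4)]
ordered = list(output_order(stored, final_time=1, skip_times={3}))
```

Data at or before the final time and data that is deliberately skipped are never emitted, so the generation times of the emitted data are strictly increasing.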
  • the summary control unit 32 may output the input generated data as is to the new data generation time substitution unit 12 and the new approximate expression generation unit 14, so that newly generated data is summarized directly when the available resources are even scarcer.
  • a threshold for determining whether to summarize new data without storing it in the unsummarized data storage unit 30 may be set in advance separately from the above threshold.
  • when the summary control unit 32 outputs data to the new approximate expression generation unit 14 and the new data generation time substitution unit 12, the operations of the new approximate expression generation unit 14, the new data generation time substitution unit 12, the graph evaluation unit 17, and the graph update unit 18 are the same as in the first embodiment. That is, for each piece of data output from the summary control unit 32, these units perform the operations from step S102 onward shown in FIG. 16.
  • the new data generation time substitution unit 12 does not perform processing.
  • the available storage area monitoring unit 31 and the summary control unit 32 in the third embodiment are realized by, for example, a CPU of a computer that operates according to a data summarization program.
  • the program storage device (not shown) of the computer stores the data summarization program, and the CPU may read the program and, according to the program, operate as the data input unit 10, the available storage area monitoring unit 31, the summary control unit 32, the new data generation time substitution unit 12, the new approximate expression generation unit 14, the accuracy constraint input unit 16, the graph evaluation unit 17, and the graph update unit 18.
  • each of these units may be realized by separate hardware.
  • the data summarization system of the third embodiment stores data in the unsummarized data storage unit 30 without summarizing the data if there are many resources that can be used to store the data.
  • the data summarization system according to the third embodiment does not summarize the data, and therefore can hold the data with high accuracy.
  • the data summarization system according to the third embodiment summarizes the data stored in the unsummarized data storage unit 30 when the resources that can be used to store the data decrease, and can thus, as in the embodiments described above, store the data efficiently with a small storage capacity in the form of approximate expressions and their effective domains. Therefore, the data summarization system of the third embodiment can realize efficient data summarization as in the other embodiments, and can hold data with high accuracy when there are many available resources.
  • each component shown in FIG. 23 may be realized by a plurality of devices instead of being realized by one device.
  • the data input unit 10, the available storage area monitoring unit 31, the summary control unit 32, the unsummarized data storage unit 30, and the final time storage unit 11 may be realized by the first information processing apparatus.
  • the new data generation time substitution unit 12, the new approximate expression generation unit 14, the uncertain point storage unit 13, the accuracy constraint input unit 16, the graph evaluation unit 17, and the graph update unit 18 may be realized by the second information processing apparatus.
  • the data summarization system may be configured to include the first information processing device, the second information processing device, and the database device.
  • the data summarization system of the third embodiment includes a default formula input unit 20 and a default formula storage unit 21 (see FIG. 20), as in the second embodiment, and includes a graph evaluation unit 17 and a graph update unit 18.
  • FIG. 25 is a block diagram illustrating an example of a data summarization system according to the fourth embodiment of this invention.
  • the data summarization system includes a data input unit 10, a final time storage unit 11, a new data generation time substitution unit 12, a new approximate expression generation unit 14, an uncertain point storage unit 13, an approximate expression storage unit 15, an accuracy constraint input unit 16, a graph evaluation unit 17a, a graph update unit 18a, a confirmed graph storage unit 19, a default formula input unit 20, a default formula storage unit 21, a default formula valid domain calculation unit 40, a valid domain capacity evaluation unit 41, and a valid domain update unit 42. As described in the second embodiment, no effective domain is defined for the default formula stored in the default formula storage unit 21.
  • in order to determine the magnitude relationship between the storage capacity required for the effective domain of each approximate expression other than the default formula and the storage capacity that would be required for the effective domain of the default formula if one were defined, the default formula valid domain calculation unit 40 calculates the effective domain of the default formula and outputs it to the valid domain capacity evaluation unit 41.
  • FIG. 26 is an explanatory diagram showing an example of deriving the effective domain of the default formula. The horizontal axis in FIG. 26 represents the data generation time t, and the vertical axis represents the data value x.
  • Each piece of data is assumed to occur in the time period from $T_0$ to $T_7$.
  • the predetermined effective range calculation unit 40 may calculate the default effective range as shown in the following equation (2).
  • the valid domain capacity evaluation unit 41 refers to the effective domain of the default formula received from the default formula valid domain calculation unit 40 and the effective domain of each approximate expression stored in the confirmed graph storage unit 19, and identifies the approximate expression that maximizes the storage capacity required to store the effective domain.
  • the effective domain capacity evaluation unit 41 outputs the approximate expression and the default effective domain to the effective domain update unit 42. Note that the valid domain capacity evaluation unit 41 may output an approximate formula ID or a default formula ID to the valid domain update unit 42 instead of outputting the approximate formula itself. In the example shown in FIG.
  • the effective domain of the approximate expression with the approximate expression ID “f1” is $[T_0, T_1] \cup [T_2+1, T_3] \cup [T_5+1, T_6]$. Therefore, a storage capacity for six numerical values is required.
  • the effective domain of the approximate expression with the approximate expression ID “f2” is $[T_3+1, T_4] \cup [T_6+1, T_7]$. Therefore, a storage capacity for four numerical values is required.
  • the effective domain calculated for the default formula is $[T_1+1, T_2] \cup [T_4+1, T_5]$. Therefore, a storage capacity for four numerical values is required.
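The counts in this example can be reproduced with a short sketch. It assumes integer times, closed intervals, and that the effective domain of the default formula is the part of the overall period not covered by any other approximate expression, which is consistent with the three domains listed above but is not stated as a formula in this section. The concrete values chosen for $T_0$ to $T_7$ and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # closed interval [start, end] of integer times


def capacity(domain: List[Interval]) -> int:
    """Numbers needed to store a domain: one start and one end per interval."""
    return 2 * len(domain)


def complement(domains: List[List[Interval]], start: int, end: int) -> List[Interval]:
    """Times in [start, end] not covered by any of the given domains."""
    gaps: List[Interval] = []
    cursor = start
    for lo, hi in sorted(iv for d in domains for iv in d):
        if lo > cursor:
            gaps.append((cursor, lo - 1))
        cursor = max(cursor, hi + 1)
    if cursor <= end:
        gaps.append((cursor, end))
    return gaps


# Hypothetical values standing in for T_0 .. T_7.
T = [0, 10, 20, 30, 40, 50, 60, 70]
stored: Dict[str, List[Interval]] = {
    "f1": [(T[0], T[1]), (T[2] + 1, T[3]), (T[5] + 1, T[6])],
    "f2": [(T[3] + 1, T[4]), (T[6] + 1, T[7])],
}
default_domain = complement(list(stored.values()), T[0], T[7])
sizes = {name: capacity(dom) for name, dom in stored.items()}
sizes["default"] = capacity(default_domain)
```

With these values the sketch yields six numbers for "f1", four for "f2", and four for the default formula, matching the example.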
  • the valid domain update unit 42 updates the default formula according to the input contents. However, if the approximate expression that maximizes the storage capacity of the effective domain is the current default formula, the valid domain update unit 42 ends the process without updating.
  • the default valid domain calculation unit 40, the valid domain capacity evaluation unit 41, and the valid domain update unit 42 in the fourth embodiment are realized by a CPU of a computer that operates according to a data summarization program, for example.
  • a program storage device (not shown) of the computer stores the data summarization program, and the CPU may read the program and, according to the program, operate as the data input unit 10, the new data generation time substitution unit 12, the new approximate expression generation unit 14, the accuracy constraint input unit 16, the graph evaluation unit 17a, the graph update unit 18a, the default formula input unit 20, the default formula valid domain calculation unit 40, the valid domain capacity evaluation unit 41, and the valid domain update unit 42.
  • each of these units may be realized by separate hardware.
  • FIG. 28 is a flowchart showing an example of the progress of the default formula update process by the default formula valid domain calculation unit 40, the valid domain capacity evaluation unit 41, and the valid domain update unit 42 in the fourth embodiment.
  • the data summarization system performs this default formula update process asynchronously with the data summarization processing (the same data summarization processing as in the second embodiment) performed by the data input unit 10, the final time storage unit 11, the new data generation time substitution unit 12, the new approximate expression generation unit 14, the uncertain point storage unit 13, the approximate expression storage unit 15, the graph evaluation unit 17a, the graph update unit 18a, the confirmed graph storage unit 19, and the default formula storage unit 21.
  • the data summarization system may execute the default formula update process shown in FIG. 28 at regular time intervals.
  • the data summarization system may execute a default update process.
  • the data summarization system may execute a default expression update process.
  • the default formula valid domain calculation unit 40 reads the effective domains of all approximate expressions from the confirmed graph storage unit 19 (step S401). Further, the default formula valid domain calculation unit 40 performs the calculation of the above-described formula (1) to calculate the effective domain $S_{default}$ of the default formula (step S402).
  • the default formula valid domain calculation unit 40 outputs the effective domain to the valid domain capacity evaluation unit 41.
  • the valid domain capacity evaluation unit 41 refers to the effective domain $S_{default}$ of the default formula and the effective domain of each approximate expression, and identifies the approximate expression that maximizes the storage capacity required to store the effective domain (step S403).
  • the valid domain capacity evaluation unit 41 outputs the identified approximate expression and the default valid domain to the valid domain update unit 42.
  • the valid domain update unit 42 determines whether or not the approximate expression specified in step S403, that is, the approximate expression that maximizes the storage capacity required to store the effective domain, is the default formula (step S404).
  • if it is not the default formula (No in step S404), the valid domain update unit 42 sets the approximate expression specified in step S403 as the new default formula (step S405). Specifically, the valid domain update unit 42 performs the following processing. The valid domain update unit 42 takes the approximate expression that maximizes the storage capacity of the effective domain as the new default formula, and updates the default formula stored in the default formula storage unit 21 to this new default formula. Then, the valid domain update unit 42 deletes the approximate expression that has become the new default formula, together with its approximate expression ID, from the approximate expression storage unit 15. Further, the valid domain update unit 42 stores the approximate expression that had been the default formula until then in the approximate expression storage unit 15.
  • the valid domain update unit 42 assigns an approximate expression ID to that approximate expression (the approximate expression that had been the default formula until then), and stores the approximate expression together with the approximate expression ID in the approximate expression storage unit 15.
  • the valid domain update unit 42 deletes the approximate expression ID of the approximate expression that has become the new default formula, and its effective domain, from the confirmed graph storage unit 19. Further, the valid domain update unit 42 stores the approximate expression ID assigned to the approximate expression that had been the default formula, and its effective domain (the effective domain calculated by the default formula valid domain calculation unit 40), in the confirmed graph storage unit 19. Further, if the approximate expression ID of the new default formula is stored in the final time storage unit 11 as the final approximate expression ID, the valid domain update unit 42 updates that ID to the default formula ID.
  • the valid domain update unit 42 updates the default formula ID to the approximate expression ID assigned to the approximate expression that had been the default formula until then. If the approximate expression specified in step S403 is the default formula (Yes in step S404), the valid domain update unit 42 does not update the default formula and ends the process as is. That is, the valid domain update unit 42 ends the process without updating the contents stored in the default formula storage unit 21, the approximate expression storage unit 15, the confirmed graph storage unit 19, and the final time storage unit 11.
  • The data summarization system of the fourth embodiment compares the valid domains of the approximate expressions, including the default formula, and makes the approximate expression whose valid domain requires the largest storage capacity the new default formula. Because no valid domain is defined for the default formula, updating the default formula in this way reduces the storage capacity required for storing valid domains and allows data to be summarized more efficiently.
  • In this example, the data summarization system of the fourth embodiment can reduce the capacity required for storing the valid domain by two.
  • The data summarization system according to the fourth embodiment may also include the unsummarized data storage unit 30, the available storage area monitoring unit 31, and the summary control unit 32.
  • In that case, data may be stored as it is while ample resources are available, and data summarization may be performed when resources become scarce.
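The behaviour described above, storing data as it is while resources are ample and summarizing once they run low, can be sketched as follows. The class name `SummaryControl`, the capacity counted in data items, and the `low_water` threshold are hypothetical choices for illustration; the specification does not fix how resources are measured here.

```python
class SummaryControl:
    """Hypothetical sketch combining units 30 (raw store), 31 (monitor), 32 (control)."""

    def __init__(self, capacity, low_water, summarize):
        self.capacity = capacity    # assumed: number of raw items that fit
        self.low_water = low_water  # assumed: free-slot threshold that triggers summarization
        self.summarize = summarize  # callback receiving one (time, value) pair
        self.raw = []               # unsummarized data

    def free_slots(self):
        # Role of the monitoring unit: report remaining resources.
        return self.capacity - len(self.raw)

    def on_data(self, time, value):
        self.raw.append((time, value))
        if self.free_slots() <= self.low_water:
            # Resources are scarce: hand all raw data to the summarizer.
            pending, self.raw = self.raw, []
            for item in pending:
                self.summarize(item)
```

A caller would pass the summarizer's input function as `summarize`, so that raw data is only summarized when the free space drops to the threshold.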
  • In each of the above embodiments, the case where the data includes a data value that is a numerical value has been taken as an example.
  • However, a data value that is not a numerical value may be included, provided that it can be converted into a numerical value and a difference between the converted values can be derived.
  • For example, text information may be used as a data value if a rule for converting it into numerical values is defined.
  • In this case, the data input unit 10 may convert the text information into a numerical value.
  • The subsequent processing is the same as in the above embodiments.
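As a minimal sketch of such a conversion, assume a fixed lookup table as the conversion rule. The table `LEVELS` and the function names are hypothetical; the specification only requires that some rule be defined.

```python
# Hypothetical conversion rule from text information to numerical values.
LEVELS = {"low": 0.0, "normal": 1.0, "high": 2.0}


def to_numeric(value):
    """Return the value as a float, converting text through the assumed rule."""
    if isinstance(value, (int, float)):
        return float(value)
    return LEVELS[value]  # raises KeyError for text without a rule


def input_data(time, value):
    """Role of the data input unit: emit (occurrence time, numerical data value)."""
    return (time, to_numeric(value))
```

After this step every data item carries a numerical data value, so differences between values can be derived as usual.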
  • A vector may be included as a data value. That is, the data may include a vector and an occurrence time.
  • In this case, the new approximate expression generation unit 14 may generate an approximate expression for deriving an approximate value of the vector from a plurality of data (unconfirmed points and newly generated data).
  • In step S106, when the graph evaluation unit 17 or the graph evaluation unit 17a compares the vector calculated as the approximate value with the vector actually included in the data, it may calculate the distance between the two in the vector space.
  • The graph evaluation unit 17 or the graph evaluation unit 17a may then determine whether there is an approximate expression for which this distance is at most the threshold value (or strictly less than the threshold value).
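The vector comparison can be sketched as below. The Euclidean distance is an assumption made for the example, since the text only speaks of a distance in the vector space, and the function name and `strict` flag are hypothetical.

```python
import math


def within_threshold(approx, actual, threshold, strict=False):
    """Check whether the approximate vector is close enough to the actual vector.

    Uses the Euclidean distance (an assumption). With strict=True the distance
    must be below the threshold; otherwise it may also equal the threshold.
    """
    distance = math.dist(approx, actual)
    return distance < threshold if strict else distance <= threshold
```

The two comparison modes correspond to the two variants mentioned above: distance at most the threshold, or strictly less than it.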
  • FIG. 29 is a block diagram showing the minimum configuration of the present invention.
  • The data summarization system of the present invention includes an approximate value calculation unit 61, an approximate expression evaluation unit 62, an unconfirmed data storage unit 63, a new approximate expression generation unit 64, and an update unit 65.
  • The approximate value calculation unit 61 (for example, the new data generation time substitution unit 12) handles approximate expressions for calculating an approximate value of a data value in data that includes the data value and its occurrence time. Each approximate expression takes the occurrence time as its variable, and the effective domain of that variable is defined as a time interval or a set of time points. The approximate value calculation unit 61 substitutes the occurrence time included in new data into each approximate expression and thereby calculates, for each approximate expression, an approximate value of the data value of the new data.
  • The approximate expression evaluation unit 62 (for example, the graph evaluation unit 17) selects, based on the approximate value calculated for each approximate expression and the data value of the new data, an approximate expression suitable for calculating the approximate value of the data value of the new data, or determines that no such approximate expression exists.
  • The unconfirmed data storage unit 63 (for example, the unconfirmed point storage unit 13) stores, as approximate-expression-unconfirmed data (for example, unconfirmed points), new data for which it has been determined that no suitable approximate expression exists.
  • The new approximate expression generation unit 64 (for example, the new approximate expression generation unit 14) determines whether an approximate expression can be generated from the new data and the approximate-expression-unconfirmed data, generates a new approximate expression if it can, and defines a time interval or a set of time points as the effective domain of that approximate expression.
  • The update unit 65 (for example, the graph update unit 18), when the approximate expression evaluation unit 62 selects an approximate expression suitable for calculating the approximate value of the data value of the new data, updates the effective domain of that approximate expression so that it includes the generation time of the new data.
  • A data summarization system with the above configuration stores data in the form of approximate expressions and their effective domains, and defines the effective domain of each approximate expression as a time interval or a set of time points. It therefore requires only a small storage capacity for storing the approximate expressions and their effective domains.
  • As a result, the data summarization system can summarize (compress) the data efficiently.
  • This advantage is especially pronounced when summarizing data that is generated sequentially with a certain tendency but may vary greatly and irregularly.
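The interplay of the five units can be sketched as follows. Several simplifications are assumptions of this sketch and not part of the specification: approximate expressions are straight lines fitted through two points, each effective domain is a single closed interval, the first expression within the tolerance is selected, and the class and attribute names are hypothetical.

```python
class Summarizer:
    """Hypothetical sketch of the minimum configuration (units 61 to 65)."""

    def __init__(self, tolerance):
        self.tolerance = tolerance
        self.exprs = []        # each: {"coef": (a, b), "start": t0, "end": t1}
        self.unconfirmed = []  # unconfirmed data storage unit 63: (time, value)

    def add(self, t, v):
        for e in self.exprs:
            a, b = e["coef"]
            approx = a * t + b                     # approximate value calculation unit 61
            if abs(approx - v) <= self.tolerance:  # approximate expression evaluation unit 62
                # Update unit 65: extend the domain to include the new time.
                e["start"] = min(e["start"], t)
                e["end"] = max(e["end"], t)
                return "matched"
        # New approximate expression generation unit 64: try to fit a line
        # through the latest unconfirmed point and the new data.
        if self.unconfirmed and self.unconfirmed[-1][0] != t:
            t0, v0 = self.unconfirmed.pop()
            a = (v - v0) / (t - t0)
            b = v0 - a * t0
            self.exprs.append({"coef": (a, b), "start": min(t0, t), "end": max(t0, t)})
            return "generated"
        self.unconfirmed.append((t, v))  # no suitable expression: keep as unconfirmed
        return "unconfirmed"
```

Under these assumptions, a run of points that follow one line is stored as a single expression and one interval, and only outliers remain as unconfirmed data.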
  • The above embodiments describe data summarization systems having the following configurations.
  • (1) A data summarization system comprising: an approximate value calculation unit (for example, the new data generation time substitution unit 12) that, for each approximate expression for calculating an approximate value of a data value in data including the data value and its occurrence time, in which the occurrence time is the variable and the effective domain of the variable is defined as a time interval or a set of time points, calculates an approximate value of the data value of new data by substituting the occurrence time included in the new data; an approximate expression evaluation unit (for example, the graph evaluation unit 17) that, based on the approximate value calculated for each approximate expression and the data value of the new data, selects an approximate expression suitable for calculating the approximate value of the data value of the new data or determines that there is no such approximate expression; an unconfirmed data storage unit (for example, the unconfirmed point storage unit 13) that stores, as approximate-expression-unconfirmed data (for example, unconfirmed points), new data for which it has been determined that there is no suitable approximate expression; a new approximate expression generation unit (for example, the new approximate expression generation unit 14) that determines whether an approximate expression can be generated from the new data and the approximate-expression-unconfirmed data, generates a new approximate expression if it can, and defines a time interval or a set of time points as the effective domain of that approximate expression; and an update unit (for example, the graph update unit 18) that, when the approximate expression evaluation unit selects an approximate expression suitable for calculating the approximate value of the data value of the new data, updates the effective domain of that approximate expression so that it includes the generation time of the new data.
  • (2) A data summarization system in which the approximate expression evaluation unit specifies the approximate expressions for which the relationship between the approximate value and the data value of the new data satisfies a predetermined criterion (for example, the criterion stored in the accuracy constraint input unit 16); selects that approximate expression if exactly one satisfies the criterion; selects one from among them, with regard to the effective domain that would include the occurrence time of the new data, if several satisfy the criterion; and determines that there is no approximate expression suitable for calculating the approximate value of the data value of the new data if none satisfies the criterion.
  • (3) A data summarization system that includes a default formula storage unit (for example, the default formula storage unit 21) that stores a default formula, which is an approximate expression for calculating the approximate value of the data value and for which no effective domain is defined, in which the approximate value calculation unit calculates, for each approximate expression including the default formula, an approximate value of the data value of the new data by substituting the occurrence time included in the new data, and in which the approximate expression evaluation unit specifies, from among the approximate expressions including the default formula, those for which the relationship between the approximate value and the data value of the new data satisfies the predetermined criterion; selects that approximate expression if exactly one satisfies the criterion; selects the default formula if several satisfy the criterion and the default formula is among them; selects one from among them if several satisfy the criterion and the default formula is not among them; and determines that there is no approximate expression suitable for calculating the approximate value of the data value of the new data if none satisfies the criterion.
  • (4) A data summarization system comprising: a default formula valid domain calculation unit (for example, the default formula valid domain calculation unit 40) that calculates the valid domain of the default formula; an effective domain capacity evaluation unit (for example, the effective domain capacity evaluation unit 41) that specifies, from among the approximate expressions including the default formula, the approximate expression whose effective domain requires the largest storage capacity; and a default formula update unit (for example, the effective domain update unit 42) that, if the specified approximate expression is not the default formula, stores it in the default formula storage unit as the new default formula and removes that approximate expression and its effective domain from the stored approximate expressions.
  • (5) A data summarization system comprising: a new data storage unit (for example, the unsummarized data storage unit 30) that stores input new data; a monitoring unit (for example, the available storage area monitoring unit 31) that monitors the resources available for storing new data in the new data storage unit; and a summary control unit (for example, the summary control unit 32) that outputs the new data stored in the new data storage unit for summarization depending on the monitored resources.
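The selection rule of the approximate expression evaluation unit when a default formula is stored can be sketched as follows. The error measure (an absolute difference passed in by the caller), the `"default"` key, and in particular the tie-breaker among several non-default candidates (smallest error) are assumptions of this sketch; the text above does not fully specify that last choice.

```python
def select_expression(errors, tolerance, default_id="default"):
    """Pick an expression ID given per-expression errors, or None if none fits.

    errors maps an expression ID to |approximate value - data value|.
    """
    candidates = [i for i, err in errors.items() if err <= tolerance]
    if not candidates:
        return None                # no suitable approximate expression
    if len(candidates) == 1:
        return candidates[0]
    if default_id in candidates:
        return default_id          # prefer the default formula
    # Assumed tie-breaker: the candidate with the smallest error.
    return min(candidates, key=lambda i: errors[i])
```

Preferring the default formula is consistent with the storage argument made earlier: data assigned to the default formula does not enlarge any stored effective domain.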
  • Each form of the present invention can be suitably applied to a data summarization apparatus that summarizes data that is sequentially generated with a certain tendency and may change irregularly.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention relates to a data summarization system capable of efficiently compressing data that is generated sequentially following a fixed tendency and that changes in a substantially irregular manner. An approximate value calculation unit (61) calculates, for each approximate expression, an approximate value of the data value of new data by substituting the generation time of the new data into the approximate expressions for calculating approximate values of data values, each approximate expression being defined so that its variable is the generation time and the valid domain of the variable is a time interval or a set of time points. An approximate expression evaluation unit (62) selects an approximate expression suitable for calculating the approximate value of the data value of the new data, or determines that no approximate expression suitable for that calculation exists, based on the data value of the new data and the approximate value calculated for each approximate expression. An update unit (65) updates the valid domain of the approximate expression so that it includes the generation time of the new data when the approximate expression evaluation unit (62) has selected an approximate expression suitable for calculating the approximate value of the data value of the new data.
PCT/JP2010/064538 2009-09-04 2010-08-20 Data summarization system and method, and storage medium WO2011027714A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2011529886A JPWO2011027714A1 (ja) 2009-09-04 2010-08-20 データ要約システム、データ要約方法および記録媒体

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2009-205012 2009-09-04
JP2009205012 2009-09-04

Publications (1)

Publication Number Publication Date
WO2011027714A1 true WO2011027714A1 (fr) 2011-03-10

Family

ID=43649253

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2010/064538 WO2011027714A1 (fr) 2009-09-04 2010-08-20 Data summarization system and method, and storage medium

Country Status (2)

Country Link
JP (1) JPWO2011027714A1 (fr)
WO (1) WO2011027714A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113254261A (zh) * 2020-02-07 2021-08-13 伊姆西Ip控股有限责任公司 数据备份方法、电子设备和计算机程序产品

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05159185A (ja) * 1991-12-02 1993-06-25 Toshiba Corp 発電プラント監視データ圧縮保存方法
JPH06175664A (ja) * 1992-12-10 1994-06-24 Yamaha Corp 楽音波形記憶方法及び楽音波形発生装置
JPH0765168A (ja) * 1993-08-31 1995-03-10 Hitachi Ltd 関数近似装置及び方法
JP2000149001A (ja) * 1998-11-17 2000-05-30 Sony Corp ディジタル画像記録装置
JP2002314922A (ja) * 2001-04-10 2002-10-25 Asahi Optical Co Ltd 画像データ記録装置、画像データ記録プログラム、そのプログラムが格納されるコンピュータ記録媒体およびそれを備えた画像入力機器
JP2002351860A (ja) * 2002-03-11 2002-12-06 Shiraishi Kenji 情報演算装置
WO2005039058A1 (fr) * 2003-10-17 2005-04-28 Matsushita Electric Industrial Co., Ltd. Procede et dispositif de generation de donnees de codage

Also Published As

Publication number Publication date
JPWO2011027714A1 (ja) 2013-02-04

Similar Documents

Publication Publication Date Title
CN109753356A (zh) 一种容器资源调度方法、装置及计算机可读存储介质
US10346222B2 (en) Adaptive tree structure for visualizing data
JP5928091B2 (ja) タググループ分類方法、装置及びデータマッシュアップ方法、装置
WO2017188419A1 (fr) Dispositif de gestion de ressource de calcul, procédé de gestion de ressource de calcul et support d'enregistrement lisible par ordinateur
Choi et al. Scheduling algorithms to minimize the number of tardy jobs in two-stage hybrid flow shops
JP6176390B2 (ja) 情報処理装置、解析方法、及び、プログラム記録媒体
US11004007B2 (en) Predictor management system, predictor management method, and predictor management program
You Performance of synthetic double sampling chart with estimated parameters based on expected average run length
JPWO2013038473A1 (ja) ストリームデータの異常検知方法および装置
JP2008158806A (ja) 複数プロセッサエレメントを備えるプロセッサ用プログラム及びそのプログラムの生成方法及び生成装置
JP2015215756A (ja) プログラム中のif文の最適化方法
WO2011027714A1 (fr) Data summarization system and method, and storage medium
US20180210762A1 (en) Apparatus, method, and program medium for parallel-processing parameter determination
JP7353539B2 (ja) 定常範囲決定システム、定常範囲決定方法、および、定常範囲決定プログラム
CN115185456A (zh) 集群缩容风险提示方法、装置、设备及介质
AU2020462915B2 (en) Information processing system for assisting in solving allocation problems, and method
US12105772B2 (en) Dynamic and continuous composition of features extraction and learning operation tool for episodic industrial process
US20130254894A1 (en) Information processing device, non-transitory computer readable medium, and information processing method
JP6213665B2 (ja) 情報処理装置、及び、クラスタリング方法
JP2015108877A (ja) 予測時間分布生成装置、制御方法、及びプログラム
WO2020044413A1 (fr) Dispositif d'inférence d'hypothèse, procédé d'inférence d'hypothèse, et support d'enregistrement lisible par ordinateur
JP2011076544A (ja) 作業手順策定支援システム、作業手順策定支援方法、および作業手順策定支援プログラム
US20220253364A1 (en) Method of calculating predicted exhaustion date and non-transitory computer-readable medium
JP7355375B2 (ja) 入力項目表示制御システム、および入力項目表示制御方法
JP7258253B1 (ja) 正常モデル生成プログラム、正常モデル生成装置および正常モデル生成方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10813663

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2011529886

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10813663

Country of ref document: EP

Kind code of ref document: A1