WO2011027714A1 - Data summarization system, data summarization method and storage medium - Google Patents


Info

Publication number
WO2011027714A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
approximate
approximate expression
value
expression
Application number
PCT/JP2010/064538
Other languages
French (fr)
Japanese (ja)
Inventor
今井照之
喜田弘司
海老山知生
Original Assignee
日本電気株式会社 (NEC Corporation)
Application filed by 日本電気株式会社 (NEC Corporation)
Priority to JP2011529886A (JPWO2011027714A1)
Publication of WO2011027714A1

Classifications

    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03M: CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00: Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Definitions

  • the present invention relates to a data summarization system, a data summarization method, a data summarization program, and a recording medium that reduce the amount of information in sequentially generated data.
  • Patent Document 1 describes a data compression method based on a difference for process data.
  • Process data consists of time-series data values.
  • when each piece of process data is plotted on a two-dimensional plane with time on the x-axis and the process value on the y-axis, the difference-based compression method described in Patent Document 1 takes a piece of process data as a reference point and obtains the difference in process value between the reference point and the process data being processed.
  • process data whose absolute difference exceeds the compression accuracy range calculated for that process data is stored in the time-series process data storage unit, and the other process data is thinned out, so that the amount of stored data is reduced as much as possible.
  • in this way, the method of Patent Document 1 compresses data.
  • the compression accuracy is set for each process data and is used to determine whether or not to compress.
  • the higher the compression accuracy the higher the possibility of compression. That is, in the technique described in Patent Document 1, high compression accuracy is a concept similar to an increase in compression rate, and low compression accuracy is a concept similar to a reduction in compression rate.
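The difference-based thinning of Patent Document 1 can be illustrated with a short Python sketch. It is a simplified illustration, not the procedure of Patent Document 1: it assumes one fixed tolerance for every sample (standing in for the per-sample compression accuracy) and assumes that each stored sample becomes the new reference point. The function name `deadband_compress` is hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (time, process value)


def deadband_compress(samples: List[Sample], tolerance: float) -> List[Sample]:
    """Keep a sample only if its value differs from the current reference
    value by more than `tolerance`; a kept sample becomes the new reference.

    Simplified illustration (assumed fixed tolerance), not the exact
    procedure of Patent Document 1.
    """
    if not samples:
        return []
    kept = [samples[0]]          # the first sample always serves as the reference
    ref_value = samples[0][1]
    for t, v in samples[1:]:
        if abs(v - ref_value) > tolerance:
            kept.append((t, v))  # difference exceeds the accuracy range: store it
            ref_value = v        # and use it as the new reference point
        # otherwise the sample is thinned out (not stored)
    return kept


print(deadband_compress([(0, 10.0), (1, 10.2), (2, 10.4), (3, 15.0), (4, 15.1)], 0.5))
print(deadband_compress([(0, 0.0), (1, 5.0), (2, 0.0), (3, 5.0)], 1.0))
```

With slowly drifting values most samples are thinned out, while a signal that jumps back and forth by more than the tolerance keeps every sample. This is the weakness discussed for Patent Document 1.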
  • Patent Document 2 describes a technique for predicting a next input value based on a past input value, storing a difference between the actual input value and the predicted value, and performing data compression.
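The prediction-plus-residual idea attributed to Patent Document 2 can be sketched in Python. The predictor is an assumption chosen for illustration: linear extrapolation from the two previous values, with 0 and then the previous value used for the first two samples. Patent Document 2's actual predictor is not described in this text, and the function names are hypothetical.

```python
from typing import List


def _predict(history: List[int]) -> int:
    """Assumed predictor: linear extrapolation from the last two values."""
    if not history:
        return 0
    if len(history) == 1:
        return history[-1]
    return 2 * history[-1] - history[-2]


def encode_residuals(values: List[int]) -> List[int]:
    """Store only (actual value - predicted value) for each input."""
    return [v - _predict(values[:i]) for i, v in enumerate(values)]


def decode_residuals(residuals: List[int]) -> List[int]:
    """Rebuild the values by re-running the same predictor on decoded data."""
    values: List[int] = []
    for r in residuals:
        values.append(_predict(values) + r)
    return values


data = [10, 12, 14, 16, 30, 32]
print(encode_residuals(data))  # small residuals while the trend holds
```

Residuals stay near zero while the trend holds and become large right after a jump. The decoder recovers the original values exactly from the residuals.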
  • JP 2003-15734 A, paragraphs 0039 and 0063 to 0065
  • JP 2006-259937 A, paragraphs 0031 to 0033
  • the technique described in Patent Document 1 decides whether to compress process data by comparing the compression accuracy with the difference between the process value of the process data and the reference value. Consequently, when the process value of the data to be compressed changes greatly and discontinuously, the difference between the process value and the reference value exceeds the compression accuracy, and the process data is difficult to thin out (that is, difficult to compress). The technique of Patent Document 1 therefore has difficulty compressing data efficiently when the data value often changes greatly and suddenly.
  • the technique described in Patent Literature 2 realizes data compression by storing a difference between a prediction result based on a past value and an actual input value.
  • an object of the present invention is to provide a data summarization system, a data summarization method, a data summarization program, and a recording medium capable of efficiently compressing data that is generated sequentially with a certain tendency but may change greatly and irregularly.
  • Another object is to provide a data structure suitably applied to such a data summarization system, data summarization method, data summarization program, and recording medium.
  • a data summarization system according to an aspect of the present invention handles data consisting of a data value and the occurrence time of that data value, using approximate expressions. Each approximate expression calculates an approximate value of the data value with the occurrence time as its variable, and the effective domain of that variable is defined as a time interval or a set of time points. The system includes: an approximate value calculation unit that substitutes the occurrence time of new data into each approximate expression to calculate, for each expression, an approximate value of the new data's data value; an approximate expression evaluation unit that, based on the approximate value calculated for each expression and the new data's data value, either selects an approximate expression suitable for approximating the new data's data value or determines that no such expression exists; an unconfirmed data storage unit that stores new data for which no suitable approximate expression exists as approximate-expression-unconfirmed data; a new approximate expression generation unit that, when new data is input, determines whether a new approximate expression can be generated from the new data and the unconfirmed data and, if so, generates it and defines a time interval or a set of time points as its effective domain; and an update unit that, when the approximate expression evaluation unit selects a suitable approximate expression, updates that expression's effective domain so that it includes the occurrence time of the new data.
  • a data summarization method according to an aspect of the present invention uses approximate expressions that calculate an approximate value of the data value in data consisting of a data value and its occurrence time, with the occurrence time as the variable and an effective domain defined as a time interval or a set of time points. The method substitutes the occurrence time of new data into each approximate expression to calculate, for each expression, an approximate value of the new data's data value. Based on these approximate values and the new data's data value, it either selects an approximate expression suitable for approximating the new data's data value or determines that none exists. It stores new data for which no suitable expression exists as approximate-expression-undetermined data. When new data is input, it determines whether a new approximate expression can be generated from the new data and the undetermined data and, if so, generates it and defines a time interval or a single point as its effective domain.
  • a recording medium according to an aspect of the present invention stores a data summarization program that causes a computer to execute the following processes for data consisting of a data value and its occurrence time, using approximate expressions whose variable is the occurrence time and whose effective domain is defined as a time interval or a set of time points:
  • approximate value calculation processing that substitutes the occurrence time of new data into each approximate expression to calculate, for each expression, an approximate value of the new data's data value;
  • approximate expression evaluation processing that, based on the approximate value calculated for each expression and the new data's data value, either selects an approximate expression suitable for approximating the new data's data value or determines that no such expression exists;
  • unconfirmed data storage processing that stores new data for which no suitable approximate expression exists in an unconfirmed data storage unit as approximate-expression-unconfirmed data;
  • new approximate expression generation processing that, when new data is input, determines whether a new approximate expression can be generated from the new data and the unconfirmed data and, if so, generates it.
  • the data structure according to an aspect of the present invention includes an approximate expression, which yields an approximate value of a data value when a variable is substituted into it, and an effective domain, which is the range of the variable over which the approximate expression yields that approximate value.
  • the effective domain is represented as a set whose elements are intervals of the variable or points, each point being a single value of the variable.
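The data structure above (an approximate expression paired with an effective domain made of intervals and points) can be sketched as a Python dataclass. The sketch makes several assumptions. It assumes a linear expression x = a*t + b. It assumes closed intervals for the sections. The name `SummaryEntry` is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SummaryEntry:
    """One approximate expression plus its effective domain (assumed linear)."""
    a: float  # first-order coefficient
    b: float  # constant term
    sections: List[Tuple[float, float]] = field(default_factory=list)  # closed intervals [lo, hi]
    points: List[float] = field(default_factory=list)                  # isolated time points

    def covers(self, t: float) -> bool:
        """True if t lies in one of the sections or equals one of the points."""
        return t in self.points or any(lo <= t <= hi for lo, hi in self.sections)

    def value_at(self, t: float) -> Optional[float]:
        """Approximate data value at time t, or None outside the effective domain."""
        return self.a * t + self.b if self.covers(t) else None


entry = SummaryEntry(2.0, 1.0, sections=[(0.0, 3.0), (10.0, 12.0)], points=[7.0])
print(entry.value_at(1.0), entry.value_at(7.0), entry.value_at(5.0))
```

Only two coefficients and a few interval endpoints are stored, yet any time inside the domain can be turned back into an approximate value.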
  • each aspect of the present invention can efficiently compress data that is generated sequentially with a certain tendency but may change greatly and irregularly.
  • the data structure according to an aspect of the present invention can be suitably used for a data summarization system, a data summarization method, a data summarization program, and a recording medium having such advantages.
  • FIG. 1 is an explanatory diagram showing an example of an effective domain.
  • FIG. 2 is a block diagram illustrating an example of the data summarization system according to the first embodiment of this invention.
  • FIG. 3 is an explanatory diagram illustrating an example of a single piece of data input to the data input unit 10.
  • FIG. 4 is an explanatory diagram illustrating an example of uncertain points stored in the uncertain point storage unit 13.
  • FIG. 5 is an explanatory diagram schematically showing uncertain points and newly generated data.
  • FIG. 6 is an explanatory diagram schematically showing an approximate expression generated from the data shown in FIG. 5.
  • FIG. 7 is an explanatory diagram illustrating an example of the expression format of the approximate expression.
  • FIG. 8 is an explanatory diagram showing an example of an approximate expression stored in the approximate expression storage unit 15 and an approximate expression ID thereof.
  • FIG. 9 is an explanatory diagram illustrating an example of final data information.
  • FIG. 10 is an explanatory diagram showing an example of processing of the new data generation time substitution unit 12.
  • FIG. 11 is an explanatory diagram showing an example of an effective definition area for each approximate expression.
  • FIG. 12 is an explanatory diagram illustrating an example of selecting an approximate expression that satisfies a criterion.
  • FIG. 13 is an explanatory diagram illustrating an example of selecting an approximate expression that satisfies a criterion.
  • FIG. 14 is an explanatory diagram illustrating an example of selecting an approximate expression that satisfies a criterion.
  • FIG. 15 is an explanatory diagram illustrating an example of valid domain update when a new approximate expression is generated.
  • FIG. 16 is a flowchart illustrating an example of processing progress of the first embodiment.
  • FIG. 17 is a flowchart illustrating an example of the processing progress of step S105.
  • FIG. 18 is an explanatory diagram illustrating an example in which data compression is performed by applying the technique described in Patent Document 1.
  • FIG. 19 is an explanatory diagram illustrating an example of a situation in which irregular data is continuously generated temporarily.
  • FIG. 20 is a block diagram illustrating an example of a data summarization system according to the second embodiment of this invention.
  • FIG. 21 is an explanatory diagram showing an example of a predetermined formula.
  • FIG. 22 is an explanatory diagram showing an example of a case where a predetermined expression is expressed only by a constant term.
  • FIG. 23 is a block diagram illustrating an example of a data summarization system according to the third embodiment of this invention.
  • FIG. 24 is a schematic diagram in which the data stored in the unsummarized data storage unit 30 is schematically arranged in the order of occurrence time.
  • FIG. 25 is a block diagram illustrating an example of a data summarization system according to the fourth embodiment of this invention.
  • FIG. 26 is an explanatory diagram showing an example of deriving a default effective definition area.
  • FIG. 28 is a flowchart illustrating an example of the progress of the default update process according to the fourth embodiment.
  • FIG. 29 is a block diagram showing the minimum configuration of the present invention.
  • a data summarization system reduces the amount of information of data that is sequentially generated over time.
  • by reducing the amount of information, the data summarization system reduces the storage capacity required compared with storing the data itself exactly as it is. This reduction of the amount of information is called "summarization". In general, summarization lowers the accuracy of the data.
  • a data summarization system according to an aspect of the present invention can accurately and efficiently summarize data that is generated sequentially over time with a certain tendency, even when that tendency may change irregularly.
  • the CPU usage rate has been given as an example of data that is generated with a certain tendency and whose tendency may change irregularly.
  • such data is not limited to the CPU usage rate. For example, the number of accesses per unit time to an observed Web page, and the cumulative number of accesses since the page was published, can change suddenly; they also correspond to "data that is generated sequentially with a certain tendency and whose tendency may change irregularly".
  • the communication amount of the network device also corresponds to “data that is sequentially generated in a certain tendency and the data tendency may change irregularly”.
  • “Sequentially generated data” is a concept including “sequentially observable data”.
  • in the following, numerical values that occur over time are described as an example of "data that is generated sequentially with a certain tendency and may change irregularly".
  • even if the data itself is not numerical, any data can be used as long as it can be converted into numerical values and the difference between those numerical values can be derived.
  • An example of applying the present invention to such data other than numerical values will be described later.
  • each data includes a data value and an occurrence time of the data value. In the following description, the occurrence time of this data value may be simply referred to as the occurrence time of data.
  • the data summarization system derives a function for calculating a data value (numerical value) from a set of generated data, using the generation time as a variable.
  • This function is an approximate expression for obtaining an approximate value of the data value from the occurrence time.
  • a domain in which an approximate value of the data value can be obtained is determined.
  • This domain is hereinafter referred to as an effective domain.
  • the effective domain is represented by a set of sections (time zones) or points (specific time). A plurality of sections or points may be defined for one effective definition area.
  • FIG. 1 is an explanatory diagram showing an example of an effective domain. In FIG. 1, the horizontal axis represents time, and the vertical axis represents data values. Further, in FIG.
  • when new data is generated, the data summarization system determines whether any approximate expression appropriately approximates its data value. If such an expression exists, the generation time of the new data is added to that expression's effective domain. If none exists, the new data is stored as a point whose approximate expression has not yet been determined (hereinafter, an indeterminate point).
  • once enough indeterminate points have accumulated to derive a new approximate expression, the data summarization system creates an approximate expression from them.
  • the data summarization system stores each data in the form of an approximate expression and an effective definition area, instead of storing the generation time and the data value for each data. Furthermore, the data summarization system according to an aspect of the present invention allows a plurality of sections and points (specific times) to be defined as an effective definition area of one approximate expression. As a result, a data summarization system according to one aspect of the present invention efficiently compresses (ie summarizes) data.
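The overall flow described above can be sketched in Python. Each new value is checked against the existing expressions. A match extends that expression's domain; otherwise the value is held as an indeterminate point, and a new expression is derived once enough points accumulate. The sketch rests on several assumptions the text does not fix. It uses linear expressions fitted by least squares and an absolute-error criterion. It selects the closest qualifying expression and accepts a fitted line without re-checking its points. It also keeps the effective domain as a plain list of times rather than merged sections. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (occurrence time t, data value x)


@dataclass
class Expr:
    a: float  # first-order coefficient
    b: float  # constant term
    domain: List[float] = field(default_factory=list)  # covered times (raw points here)


def fit_line(pts: List[Point]) -> Tuple[float, float]:
    """Ordinary least-squares line x = a*t + b through the given points."""
    n = len(pts)
    st = sum(t for t, _ in pts)
    sx = sum(x for _, x in pts)
    stt = sum(t * t for t, _ in pts)
    stx = sum(t * x for t, x in pts)
    denom = n * stt - st * st
    if denom == 0:  # all points share one time: fall back to a constant
        return 0.0, sx / n
    a = (n * stx - st * sx) / denom
    return a, (sx - a * st) / n


class Summarizer:
    def __init__(self, tolerance: float, k: int) -> None:
        self.tolerance = tolerance      # criterion: |f(t) - x| < tolerance
        self.k = k                      # data needed to derive a new expression
        self.exprs: List[Expr] = []
        self.pending: List[Point] = []  # indeterminate points

    def add(self, t: float, x: float) -> None:
        # Substitute t into every known expression; keep the closest one under the tolerance.
        best: Optional[Expr] = None
        best_err = self.tolerance
        for e in self.exprs:
            err = abs(e.a * t + e.b - x)
            if err < best_err:
                best, best_err = e, err
        if best is not None:
            best.domain.append(t)  # extend that expression's effective domain
            return
        self.pending.append((t, x))  # no suitable expression: indeterminate point
        if len(self.pending) >= self.k:  # enough data to derive a new expression
            a, b = fit_line(self.pending)
            self.exprs.append(Expr(a, b, [pt for pt, _ in self.pending]))
            self.pending.clear()


s = Summarizer(tolerance=0.5, k=4)
for t in range(4):
    s.add(float(t), 2.0 * t + 1.0)  # four points on x = 2t + 1 produce one expression
s.add(4.0, 9.2)   # close to the line: absorbed into its domain
s.add(5.0, 40.0)  # sudden jump: held as an indeterminate point
```

Only the expressions, their domains, and the few pending points need to be stored, instead of every (time, value) pair.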
  • FIG. 2 is a block diagram illustrating an example of the data summarization system according to the first embodiment of this invention.
  • the data summarization system of the first embodiment includes a data input unit 10, a final time storage unit 11, a new data generation time substitution unit 12, an indeterminate point storage unit 13, a new approximate expression generation unit 14, an approximate expression storage unit 15, an accuracy constraint input unit 16, a graph evaluation unit 17, a graph update unit 18, and a confirmed graph storage unit 19.
  • the data summarization system according to the first embodiment of the present invention summarizes data sequentially input to the data input unit 10 and stores the summarized result in the definite graph storage unit 19.
  • the data input unit 10 acquires data from a data generation source (not shown) that sequentially generates data over time.
  • the mode of the data source differs depending on the type of data.
  • for example, when the data is the number of accesses to a Web page, the Web server may be the data generation source.
  • when the data is the CPU usage rate, the unit that monitors the CPU usage rate may be the data generation source.
  • Each piece of data input from the data generation source to the data input unit 10 includes at least a data value and a generation time of the data value.
  • FIG. 3 is an explanatory diagram illustrating an example of one data input to the data input unit 10.
  • FIG. 3 illustrates the case where the data value is the CPU usage rate.
  • the data includes the generation time of the data value and the data value (CPU usage rate in this example).
  • the uncertain point storage unit 13 is a storage device that stores data for which an approximate expression for obtaining an approximate value of the data value has not yet been specified. Data for which every existing approximate expression yields an approximate value that differs greatly from the actual data value is stored in the indeterminate point storage unit 13 as an indeterminate point.
  • the data can be regarded as a point having the occurrence time and the data value as coordinates.
  • data for which an approximate expression for obtaining an approximate value of a data value has not yet been specified is expressed using the word “indeterminate point”.
  • the approximate expression for obtaining the approximate value of the data value with the occurrence time as a variable is derived from a plurality of uncertain points stored in the uncertain point storage unit 13. Until the number of pieces of data necessary for determining the approximate expression is available, the uncertain point storage unit 13 keeps storing generated data as uncertain points. The number of data necessary to determine the approximate expression depends on the type of approximate expression (a linear expression, a quadratic expression, an expression using a trigonometric function, and so on) and on the algorithm used to determine it.
  • FIG. 4 is an explanatory diagram showing an example of uncertain points stored in the uncertain point storage unit 13.
  • Each undetermined point includes an occurrence time of the undetermined point (data) and a data value (CPU usage rate in this example).
  • because generated data for which no corresponding approximate expression can be specified becomes an undetermined point, each undetermined point has the same data structure as the data illustrated in FIG. 3.
  • when newly generated data is input to the new approximate expression generation unit 14, the unit determines whether the newly generated data together with the undetermined points stored in the undetermined point storage unit 13 reach the number needed to determine an approximate expression.
  • when the new approximate expression generation unit 14 determines that the number of pieces of data necessary for determining the approximate expression is available, it generates from the data a function (approximate expression) that calculates a data value with the occurrence time as a variable. For example, assume that k pieces of data are required to generate the approximate expression and that k − 1 uncertain points are stored in the uncertain point storage unit 13.
  • when one piece of newly generated data is then input to the new approximate expression generation unit 14, the unit generates an approximate expression from the k pieces of data consisting of the undetermined points plus the new data. The new approximate expression generation unit 14 then outputs the generated approximate expression and each piece of data used to generate it (that is, the uncertain points stored in the uncertain point storage unit 13 and the newly generated data) to the graph update unit 18. Instead of outputting the newly generated approximate expression itself to the graph update unit 18, the new approximate expression generation unit 14 may store the approximate expression and its ID (identification information) in the approximate expression storage unit 15 and output the approximate expression ID to the graph update unit 18.
  • FIG. 5 is an explanatory diagram schematically showing uncertain points used for generating an approximate expression and newly generated data.
  • the horizontal axis shown in FIG. 5 represents time t, and the vertical axis shown in FIG. 5 represents the data value x.
  • the new approximate expression generation unit 14 generates a linear expression as an approximate expression.
  • the generation method is assumed to be a least square method.
  • in this example, four pieces of data are needed for the new approximate expression generation unit 14 to generate a linear expression by the least squares method.
  • suppose that the three uncertain points P400 shown in FIG. 5 are stored in the uncertain point storage unit 13 and that new data P401 is input to the data input unit 10. The new approximate expression generation unit 14 then generates an approximate expression from the three uncertain points P400 and the new data P401.
  • FIG. 6 is an explanatory view schematically showing an approximate expression generated from the data shown in FIG. 5.
  • FIG. 7 is an explanatory diagram showing an example of the expression format of the approximate expression generated by the new approximate expression generation unit 14.
  • FIG. 7 shows an example in which the new approximate expression generation unit 14 obtains an approximate expression represented by a linear function by the least squares method, using three uncertain points and one piece of newly generated data.
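For the least-squares linear fit above, the coefficients have the standard closed form. The derivation below takes n data points (t_j, x_j), with n = 4 in the FIG. 5 example, and the approximate expression x = a t + b. The labels a (first-order coefficient) and b (constant term) are our own, chosen to match the coefficient-plus-constant format of FIG. 7. The denominator is nonzero as long as at least two occurrence times differ.

```latex
\begin{aligned}
S(a, b) &= \sum_{j=1}^{n} \bigl(x_j - a t_j - b\bigr)^2 \\
\frac{\partial S}{\partial b} = 0 &\;\Rightarrow\; b = \frac{1}{n}\Bigl(\sum_{j} x_j - a \sum_{j} t_j\Bigr) \\
\frac{\partial S}{\partial a} = 0 &\;\Rightarrow\; a = \frac{n \sum_{j} t_j x_j - \sum_{j} t_j \sum_{j} x_j}{n \sum_{j} t_j^2 - \bigl(\sum_{j} t_j\bigr)^2}
\end{aligned}
```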
  • the function used by the new approximate expression generation unit 14 as an approximate expression is not limited to a linear function.
  • the new approximate expression generation unit 14 may generate an approximate expression represented by a polynomial of degree two or higher, an exponential function, or a trigonometric function.
  • the method of generating the approximate expression is not limited to the least square method, and the approximate expression may be generated by another method.
  • the number of data necessary for generating the approximate expression differs depending on the type of approximate expression and the method for generating the approximate expression.
  • once the number of data required for the chosen type and generation method is available, the new approximate expression generation unit 14 may generate an approximate expression.
  • the new approximate expression generation unit 14 may generate an approximate expression as a linear function connecting two points. In this case, if there are two data, the new approximate expression generation unit 14 can generate an approximate expression.
  • the new approximate expression generation unit 14 generates, as an approximate expression, a straight line connecting two points in the plane of the generation time t and the data value x from one uncertain point and one newly generated data. May be. In this case, the number of data necessary for generating the approximate expression is two. Further, the new approximate expression generation unit 14 may generate the approximate expression using other methods such as spline interpolation. In the following description, a case where the new approximate expression generation unit 14 generates an approximate expression of a linear function will be described as an example.
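The two-point variant, a straight line through one uncertain point and one newly generated data item, reduces to computing a slope and an intercept. A minimal Python sketch follows. The function name `line_through` and the choice to reject two points with the same time are assumptions.

```python
from typing import Tuple

Point = Tuple[float, float]  # (generation time t, data value x)


def line_through(p: Point, q: Point) -> Tuple[float, float]:
    """Return (first-order coefficient, constant term) of the line through p and q."""
    (t1, x1), (t2, x2) = p, q
    if t1 == t2:
        raise ValueError("two points with the same time do not define x as a function of t")
    a = (x2 - x1) / (t2 - t1)  # slope in the (t, x) plane
    b = x1 - a * t1            # intercept at t = 0
    return a, b


print(line_through((1.0, 3.0), (3.0, 7.0)))  # the line x = 2t + 1
```

Compared with a least-squares fit over four points, this needs only two data items before an expression exists, at the cost of fitting noise in either point exactly.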
  • the approximate expression storage unit 15 is a storage device that stores an approximate expression for obtaining a data value using the occurrence time as a variable, together with an ID of the approximate expression.
  • the graph update unit 18 stores the approximate expression in the approximate expression storage unit 15 together with the ID.
  • the graph update unit 18 may assign the ID of the approximate expression.
  • the approximate expression storage unit 15 may store a combination of a first-order coefficient and a constant term, as in the case shown in FIG. 7.
  • this storage mode is an example, and the approximate expression storage unit 15 may store the approximate expression in another form.
  • FIG. 8 is an explanatory diagram illustrating an example of approximate expressions stored in the approximate expression storage unit 15 together with their approximate expression IDs. As shown in FIG. 8, the approximate expression storage unit 15 stores each approximate expression ID, which identifies an approximate expression, together with the approximate expression (in this example, expressed as the combination of a first-order coefficient and a constant term).
  • the final time storage unit 11 is a storage device that stores a pair consisting of (a) the generation time of the most recently generated data among the data, other than uncertain points, for which an approximate expression that appropriately approximates the data value has been identified, and (b) that approximate expression.
  • FIG. 9 is an explanatory diagram showing an example of final data information stored in the final time storage unit 11.
  • the final data information includes an approximate expression ID and a final time.
  • the approximate expression is specified by the approximate expression ID.
  • the final time is the data generation time of the data that has occurred last among the data that can be approximated by the approximate expression.
  • the uncertain point stored in the uncertain point storage unit 13 is data that cannot be approximated by each known approximate expression, and thus is not a target to be stored in the final time storage unit 11.
  • accordingly, when new data becomes an uncertain point, the final time stored in the final time storage unit 11 is not updated.
  • in FIG. 9 the approximate expression is represented by its approximate expression ID, but the final time storage unit 11 may instead store the approximate expression itself in the same format as the approximate expressions generated by the new approximate expression generation unit 14 and stored in the approximate expression storage unit 15.
  • the final time storage unit 11 may store a combination of a primary coefficient and a constant term as information representing an approximate expression instead of the approximate expression ID.
  • the new data generation time substitution unit 12 substitutes the generation time of the generation data newly input to the data input unit 10 for each approximate expression generated in the past, and calculates an approximate value of the data value.
  • the new data generation time substitution unit 12 outputs a set of each approximate expression and an approximate value calculated for each approximate expression to the graph evaluation unit 17.
  • the new data generation time substitution unit 12 reads the final data information from the final time storage unit 11.
  • the new data generation time substitution unit 12 also outputs to the graph evaluation unit 17 information indicating which of the (approximate expression, approximate value) pairs corresponds to the approximate expression indicated by the final data information. It also outputs the generated data newly input to the data input unit 10 to the graph evaluation unit 17.
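The role of the new data generation time substitution unit 12 can be sketched as a Python function that evaluates every stored expression at the new occurrence time. It also reports which expression the final data information names. The sketch assumes linear expressions stored as (first-order coefficient, constant term) keyed by an ID string, and final data information given as an (ID, final time) pair. The function name is hypothetical.

```python
from typing import Dict, List, Tuple


def substitute_time(
    exprs: Dict[str, Tuple[float, float]],
    final_info: Tuple[str, float],
    t_new: float,
) -> Tuple[List[Tuple[str, float]], str]:
    """Evaluate every stored linear expression x = a*t + b at t_new.

    Returns the (expression ID, approximate value) pairs and the ID of the
    expression named in the final data information, so a downstream
    evaluator can tell which pair belongs to the most recently used expression.
    """
    pairs = [(expr_id, a * t_new + b) for expr_id, (a, b) in exprs.items()]
    final_expr_id, _final_time = final_info
    return pairs, final_expr_id


pairs, final_id = substitute_time(
    {"f0": (1.0, 0.0), "f1": (0.0, 5.0), "f2": (-1.0, 10.0)}, ("f1", 9.0), 3.0
)
print(pairs, final_id)
```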
  • FIG. 10 is an explanatory diagram showing an example of processing of the new data generation time substitution unit 12.
  • in FIG. 10, the approximate expressions are drawn as the lines $x = f_0(t)$, $x = f_1(t)$, and $x = f_2(t)$.
  • each black circle represents each data.
  • the data shown in the vicinity of the line representing each approximate expression is data that can approximate the data value with the approximate expression.
  • The generation time $t_i$ of the new data is substituted into each approximate expression to calculate approximate values; in FIG. 10, these approximate values are $X_{1010}$, $X_{1011}$, $X_{1012}$ and $X_{1013}$.
  • The accuracy constraint input unit 16 receives and stores a criterion (accuracy) under which the approximate value calculated by an approximate expression can be regarded as an appropriate approximation of the actual data value.
  • One example of such a criterion is that the absolute value of the difference between the approximate value $f(t)$ given by approximate expression $f$ and the actual data value $x$ is less than a predetermined threshold $\varepsilon$.
  • This criterion can be expressed as $|f(t) - x| < \varepsilon$.
  • Alternatively, the criterion may be that the absolute value of the ratio of the difference between the approximate value $f(t)$ and the actual data value $x$ to the approximate value $f(t)$ given by approximate expression $f$ is less than the threshold $\varepsilon$.
  • This criterion can be expressed as $\left|\frac{f(t) - x}{f(t)}\right| < \varepsilon$.
  • Both criteria above are stated with a strict inequality (less than the threshold); a criterion that the calculated value is equal to or less than the threshold may be used instead.
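  • The two accuracy criteria can be checked with simple predicates. The following is a minimal Python sketch; the function names and the `inclusive` flag (which switches to the "equal to or less than" variant) are hypothetical, and treating f(t) = 0 as failing the ratio criterion is an assumption, since the text does not say how that case is handled.

```python
def within_abs_error(approx: float, actual: float, eps: float, inclusive: bool = False) -> bool:
    """Absolute criterion |f(t) - x| < eps (or <= eps when inclusive=True)."""
    err = abs(approx - actual)
    return err <= eps if inclusive else err < eps


def within_rel_error(approx: float, actual: float, eps: float, inclusive: bool = False) -> bool:
    """Ratio criterion |(f(t) - x) / f(t)| < eps (or <= eps when inclusive=True)."""
    if approx == 0.0:
        # Assumption: the ratio is undefined when f(t) = 0, so the criterion fails.
        return False
    err = abs((approx - actual) / approx)
    return err <= eps if inclusive else err < eps


print(within_abs_error(10.0, 10.4, 0.5))  # True
print(within_rel_error(10.0, 12.0, 0.1))  # False: |(10 - 12) / 10| = 0.2
```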
  • the definite graph storage unit 19 is a storage device that stores an effective domain for each approximate expression that approximates past generated data.
  • FIG. 11 is an explanatory diagram illustrating an example of an effective definition area for each approximate expression stored in the definite graph storage unit 19. In the example shown in FIG. 11, each approximate expression is represented by an approximate expression ID.
  • For each approximate expression, the time ranges and time points that make up its effective domain are stored.
  • In FIG. 11, a part of the effective domain expressed as a time range (that is, a time period) is shown as a "section", and a part expressed as a specific point in time is shown as a "point".
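  • For example, the effective domain of one approximate expression in FIG. 11 (approximate expression ID "f2") is $\{t_{21}\} \cup \{t \mid t_{22b} \le t \le t_{22e}\} \cup \{t \mid t_{23b} \le t \le t_{23e}\}$.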
  • the effective domain of another approximate expression can also be specified from the information stored in the definite graph storage unit 19.
  • The confirmed graph storage unit 19 may store information having the following data structure: an approximate expression, which yields an approximate value of a data value when a variable is substituted into it, is stored in association with its effective domain, which is the range of the variable over which that approximate value can be obtained.
  • In this data structure, the effective domain may be represented by a set of variable intervals (sections) and points, each point representing a single variable value.
  • this variable is a variable representing time.
  • In this data structure, sections or points belonging to the effective domain of another approximate expression are allowed to lie between the sections and points of the effective domain associated with a given approximate expression (between two of its sections, between two of its points, or between a section and a point). For example, the time-series order of the sections and points shown in FIG.
  • the data summarization system, data summarization method, and data summarization program according to each aspect of the present invention can preferably use such a data structure.
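  • The section-and-point data structure can be sketched as follows. This is a hypothetical in-memory model, not the patent's storage format: it assumes linear approximate expressions $x = a t + b$, closed sections, and made-up class names (`EffectiveDomain`, `ConfirmedGraph`). Because the effective domains of different approximate expressions do not overlap, at most one expression covers any stored time, so a summarized value can be reconstructed by looking that expression up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class EffectiveDomain:
    """Effective domain of one approximate expression: closed sections plus isolated points."""
    sections: List[Tuple[float, float]] = field(default_factory=list)  # (start, end) pairs
    points: List[float] = field(default_factory=list)                   # single time values

    def contains(self, t: float) -> bool:
        return t in self.points or any(start <= t <= end for start, end in self.sections)


@dataclass
class ConfirmedGraph:
    """Hypothetical stand-in for storage units 15 and 19, for linear expressions only."""
    expressions: Dict[str, Tuple[float, float]] = field(default_factory=dict)  # ID -> (a, b)
    domains: Dict[str, EffectiveDomain] = field(default_factory=dict)          # ID -> domain

    def approximate(self, t: float) -> Optional[float]:
        """Summarized value at time t, or None if no effective domain contains t."""
        for expr_id, domain in self.domains.items():
            if domain.contains(t):
                a, b = self.expressions[expr_id]
                return a * t + b
        return None


graph = ConfirmedGraph()
graph.expressions["f1"] = (2.0, 1.0)
graph.domains["f1"] = EffectiveDomain(sections=[(0.0, 3.0)], points=[7.0])
graph.expressions["f2"] = (-1.0, 30.0)
graph.domains["f2"] = EffectiveDomain(sections=[(4.0, 6.0)])
print(graph.approximate(7.0), graph.approximate(5.0), graph.approximate(3.5))  # 15.0 25.0 None
```

  • Here the expression $x = 2t + 1$ is valid on $[0, 3] \cup \{7\}$, and the section $[4, 6]$ of another expression lies between the two parts of its domain, which is exactly the interleaving the data structure permits.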
  • The graph evaluation unit 17 compares each approximate value, calculated by the new data generation time substitution unit 12 by substituting the generation time of the new data into each approximate expression stored in the approximate expression storage unit 15, with the actual data value of the new data. It then identifies the approximate expressions that satisfy the criterion stored in the accuracy constraint input unit 16. When several approximate expressions satisfy the criterion, the graph evaluation unit 17 identifies, among them, the one that minimizes the increase in storage capacity when its effective domain is updated.
  • The graph evaluation unit 17 decides to approximate the data value of the new data with the identified approximate expression. It then outputs the determined approximate expression, its effective domain, and the new data (the generated data received from the new data generation time substitution unit 12) to the graph update unit 18, together with information indicating whether the determined approximate expression is the final approximate expression. As the determined approximate expression, the graph evaluation unit 17 may, for example, output only its approximate expression ID to the graph update unit 18.
  • the graph evaluation unit 17 may read the effective definition area from the confirmed graph storage unit 19.
  • When the approximate expressions output from the new data generation time substitution unit 12 to the graph evaluation unit 17 are given in the form of approximate expression IDs, the graph evaluation unit 17 reads all the approximate expressions stored in the approximate expression storage unit 15. The graph evaluation unit 17 may be unable to identify an approximate expression that satisfies the criterion stored in the accuracy constraint input unit 16; that is, there may be no such approximate expression. In that case, the graph evaluation unit 17 may output the new data (the generated data received from the new data generation time substitution unit 12) to the graph update unit 18 without selecting an approximate expression. An example of approximate expression selection by the graph evaluation unit 17 is described specifically with reference to FIGS.
  • In these figures, $I_{01}$, $I_{12}$, etc. denote sections and points included in the effective domains, and correspond to the sections and points illustrated in FIG. 11.
  • The approximate expressions shown are $x = f_0(t)$, $x = f_1(t)$, $x = f_2(t)$ and $x = f_3(t)$.
  • Let $t_i$ be the generation time of the new data received by the graph evaluation unit 17, and let $x_i$ be its data value. The case where data $P_{1021}$ occurs at time $t_i$ is illustrated.
  • If only $x = f_1(t)$ satisfies the criterion, that is, only $|f_1(t_i) - x_i| < \varepsilon$ holds, the graph evaluation unit 17 selects $x = f_1(t)$. When several approximate expressions satisfy the accuracy criterion, the graph evaluation unit 17 considers each of them in turn and calculates how much the storage capacity for the effective domain would increase if the effective domain were updated on the assumption that the approximate expression under consideration represents the new data.
  • FIG. 13 illustrates a selection example of this aspect.
  • The case where data $P_{1022}$ occurs at time $t_i$ is illustrated.
  • For both approximate expressions, the data value $x_i$ of $P_{1022}$ satisfies the criterion that the absolute value of its difference from the approximate value is less than the threshold $\varepsilon$.
  • In this case, the graph evaluation unit 17 may select, from among the approximate expressions that satisfy the criterion, the final approximate expression whose effective domain ends at the end point of a "section".
  • In FIG. 13, the end of the effective domain of the final approximate expression is $t_{23e}$, which is the end point of a "section".
  • the graph evaluation unit 17 selects the final approximate expression in which the end of the effective domain is the end point of the “section”.
  • This is because the storage capacity needed to express the effective domain does not increase when that effective domain is updated.
  • The updated effective domain is $\{t_{21}\} \cup [t_{22b}, t_{22e}] \cup [t_{23b}, t_i]$: only the end point of the last section changes, from $t_{23e}$ to $t_i$.
  • If the approximate expressions that satisfy the criterion do not include a final approximate expression whose effective domain ends at the end point of a "section", the graph evaluation unit 17 selects one approximate expression from among them.
  • The selection method may, for example, be one determined arbitrarily in advance.
  • FIG. 14 illustrates the case where data $P_{1023}$ occurs at time $t_i$.
  • For both approximate expressions, the data value $x_i$ of $P_{1023}$ satisfies the criterion that the absolute value of its difference from the approximate value is less than the threshold $\varepsilon$.
  • This selection method is only one example of choosing among several approximate expressions with the same increase in storage capacity; another method may be used. For example, among several approximate expressions whose storage capacity increases by the same amount when $t_i$ is added to their effective domains, the graph evaluation unit 17 may select the one whose effective domain has an upper limit (end) closest to the current time; in other words, the one whose effective domain has the largest upper limit (end). In the example shown in FIG.
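  • The selection rule (smallest storage increase first, then the effective domain whose end is closest to the current time) can be sketched as follows. The cost model is an assumption made for illustration: a section is taken to cost two numbers (start, end) and a point one number, so extending the last section of the final approximate expression costs nothing, turning its final point into a section costs one extra number, and adding a new point costs one. The names `Domain`, `storage_increase` and `select_expression` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Domain:
    """Hypothetical effective domain: closed sections (start, end) and isolated points."""
    sections: List[Tuple[float, float]] = field(default_factory=list)
    points: List[float] = field(default_factory=list)

    def end(self) -> float:
        """Upper limit (end) of the effective domain."""
        return max([e for _, e in self.sections] + self.points)

    def ends_with_section(self) -> bool:
        """True if the end of the domain is the end point of a section."""
        return any(e == self.end() for _, e in self.sections)


def storage_increase(domain: Domain, is_final: bool) -> int:
    """Extra numbers needed to add the new time t_i (assumed: 2 per section, 1 per point)."""
    if is_final and domain.ends_with_section():
        return 0  # only the last section's end point moves to t_i
    return 1      # final point -> section (+2 - 1), or a new isolated point (+1)


def select_expression(candidates: List[str], domains: Dict[str, Domain],
                      final_id: Optional[str]) -> Optional[str]:
    """Among expressions that meet the accuracy criterion, pick the smallest storage
    increase; break ties by the largest effective-domain end (closest to now)."""
    if not candidates:
        return None  # no expression fits: the data becomes an uncertain point
    return min(candidates,
               key=lambda f: (storage_increase(domains[f], f == final_id), -domains[f].end()))
```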
  • Depending on what it receives, the graph update unit 18 either updates the effective domain of the approximate expression selected by the graph evaluation unit 17 or adds an uncertain point to the uncertain point storage unit 13.
  • When there is input from the new approximate expression generation unit 14, the graph update unit 18 newly registers the approximate expression in the approximate expression storage unit 15.
  • When the graph update unit 18 receives from the graph evaluation unit 17 the determined approximate expression, its effective domain, the new data, and information indicating whether the determined approximate expression is the final approximate expression, it updates the effective domain of that approximate expression.
  • Specifically, the graph update unit 18 may update the effective domain so that the generation time of the new data is added to the effective domain of the received approximate expression.
  • The graph update unit 18 stores the updated effective domain of the approximate expression determined by the graph evaluation unit 17 in the confirmed graph storage unit 19. When doing so, the graph update unit 18 may update the effective domain by distinguishing the following cases.
  • If the determined approximate expression is the final approximate expression and the end of its effective domain is a "point", the graph update unit 18 may create a new "section" that starts at that point and ends at the generation time of the new data.
  • In that case, the graph update unit 18 may remove the "point" that was the end of the effective domain from the "points" of the effective domain. In the example illustrated in FIG. 11, the graph update unit 18 removes one entry from the "point" item and adds a new section to the "section" item.
  • When the determined approximate expression is not the final approximate expression, the graph update unit 18 adds the generation time of the new data to the effective domain of the approximate expression as a "point". In addition, when updating the effective domain of the approximate expression determined by the graph evaluation unit 17, the graph update unit 18 also updates the final data information: it updates the approximate expression ID of the final data information (see FIG. 9) to the ID of the determined approximate expression, and the final time to the generation time of the new data.
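  • The update cases of the graph update unit 18 can be sketched as follows, using a hypothetical section/point representation. The three branches follow the text: extend the last section of the final approximate expression, turn its final point into a new section, or otherwise add the generation time as a point. The class and function names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Domain:
    sections: List[Tuple[float, float]] = field(default_factory=list)
    points: List[float] = field(default_factory=list)


@dataclass
class FinalDataInfo:
    expr_id: str       # ID of the final approximate expression
    final_time: float  # generation time of the most recent summarized data


def update_domain(domain: Domain, t_new: float, is_final: bool) -> None:
    """Add generation time t_new to an effective domain, case by case (in place)."""
    last_section_end = max((e for _, e in domain.sections), default=float("-inf"))
    last_point = max(domain.points, default=float("-inf"))
    if is_final and domain.sections and last_section_end > last_point:
        # Domain ends with a section: move that section's end point to t_new.
        i = next(k for k, (_, e) in enumerate(domain.sections) if e == last_section_end)
        start, _ = domain.sections[i]
        domain.sections[i] = (start, t_new)
    elif is_final and domain.points and last_point > last_section_end:
        # Domain ends with a point: that point starts a new section ending at t_new.
        domain.points.remove(last_point)
        domain.sections.append((last_point, t_new))
    else:
        # Not the final expression: record t_new as an isolated point.
        domain.points.append(t_new)


def update_final_info(info: FinalDataInfo, expr_id: str, t_new: float) -> None:
    info.expr_id, info.final_time = expr_id, t_new
```

  • With the first branch, a domain such as $\{t_{21}\} \cup [t_{22b}, t_{22e}] \cup [t_{23b}, t_{23e}]$ becomes $\{t_{21}\} \cup [t_{22b}, t_{22e}] \cup [t_{23b}, t_i]$, matching the update described above.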
  • When the graph evaluation unit 17 has selected no approximate expression, the graph update unit 18 stores the generated data as an uncertain point in the uncertain point storage unit 13.
  • In this case, the graph update unit 18 only stores that one uncertain point in the uncertain point storage unit 13 and does not update the information stored in the final time storage unit 11, the approximate expression storage unit 15, or the confirmed graph storage unit 19.
  • When the new approximate expression generation unit 14 generates a new approximate expression from the uncertain points, it outputs the approximate expression and each data item used to generate it (that is, each uncertain point stored in the uncertain point storage unit 13 and the newly input data) to the graph update unit 18.
  • the graph updating unit 18 assigns an approximate expression ID to the new approximate expression, and stores the new approximate expression and the approximate expression ID in association with each other in the approximate expression storage unit 15.
  • The graph update unit 18 deletes the uncertain points used to generate the new approximate expression; in other words, it deletes each uncertain point stored in the uncertain point storage unit 13.
  • The graph update unit 18 determines the effective domain of the new approximate expression from the generation times of the data used to generate it, and stores the effective domain in the confirmed graph storage unit 19 together with the approximate expression ID.
  • In doing so, the graph update unit 18 determines the effective domain so that the number of sections whose start and end points are generation times of the data used to generate the new approximate expression is as small as possible, and so that it does not overlap the effective domains of existing approximate expressions.
  • The graph update unit 18 may set, as a "point", any such time that lies isolated between the sections and points of the effective domains of existing approximate expressions and therefore cannot be the start point or end point of a section. An example is described with reference to FIG. 11, FIG. 13, FIG. 12, and FIG.
  • Since the end of the effective domain is the end point of section $I_{23}$, the graph update unit 18 updates the end point $t_{23e}$ of section $I_{23} = [t_{23b}, t_{23e}]$ (see FIG. 11) to $t_i$.
  • the graph update unit 18 stores the updated valid definition area in the confirmed graph storage unit 19.
  • As a result, the sections for approximate expression ID "f2" shown in FIG. 11 are updated from $[t_{22b}, t_{22e}]$, $[t_{23b}, t_{23e}]$ to $[t_{22b}, t_{22e}]$, $[t_{23b}, t_i]$.
  • The graph update unit 18 also determines an effective domain from the uncertain point $P_{1030}$ and the new data $P_{1031}$. The graph update unit 18 determines the uncertain point P shown in FIG.
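  • The rule for assigning an effective domain to a new approximate expression (as few sections as possible, no overlap with existing effective domains, isolated times kept as points) admits a simple greedy sketch: walk the generation times in order and close the current section whenever an existing section or point lies in between. Breaking only where an obstacle forces it yields the minimum number of groups. This greedy grouping is an illustrative choice, not necessarily the procedure of the graph update unit 18; the function names are hypothetical, and the sketch assumes the new generation times do not themselves fall inside existing effective domains.

```python
from typing import List, Tuple

Section = Tuple[float, float]


def blocked(start: float, end: float, other_sections: List[Section],
            other_points: List[float]) -> bool:
    """True if an existing section or point lies inside (start, end)."""
    if any(start < p < end for p in other_points):
        return True
    return any(s < end and e > start for s, e in other_sections)


def new_effective_domain(times: List[float], other_sections: List[Section],
                         other_points: List[float]) -> Tuple[List[Section], List[float]]:
    """Group the generation times of the data used for a new approximate expression
    into sections, breaking only where an existing domain lies in between; a group
    with a single time cannot start and end a section, so it becomes a point."""
    times = sorted(times)
    sections: List[Section] = []
    points: List[float] = []

    def close(run_start: float, run_end: float) -> None:
        if run_start == run_end:
            points.append(run_start)
        else:
            sections.append((run_start, run_end))

    run_start = run_end = times[0]
    for t in times[1:]:
        if blocked(run_start, t, other_sections, other_points):
            close(run_start, run_end)  # an existing domain sits between run_end and t
            run_start = t
        run_end = t
    close(run_start, run_end)
    return sections, points


print(new_effective_domain([1, 2, 3, 6, 7, 10], [(8, 9)], [4.5]))  # ([(1, 3), (6, 7)], [10])
```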
  • the data input unit 10, the new data generation time substitution unit 12, the new approximate expression generation unit 14, the accuracy constraint input unit 16, the graph evaluation unit 17, and the graph update unit 18 are executed by, for example, a CPU of a computer that operates according to the data summarization program. Realized.
  • For example, a program storage device (not shown) of the computer may store the data summarization program, and the CPU may read the program and, according to it, operate as the data input unit 10, the new data generation time substitution unit 12, the new approximate expression generation unit 14, the accuracy constraint input unit 16, the graph evaluation unit 17, and the graph update unit 18.
  • each of these units may be realized by separate hardware.
  • The final time storage unit 11, the uncertain point storage unit 13, the approximate expression storage unit 15, and the confirmed graph storage unit 19 may be realized by separate storage devices, or all of them may be realized by a single storage device. Alternatively, some of them may share one storage device.
  • FIG. 16 is a flowchart illustrating an example of processing progress of the first embodiment.
  • While a data generation source (not shown) continues to generate data (No in step S100), data is input sequentially from the data generation source to the data input unit 10 (step S101).
  • In step S101, data is input to the data input unit 10 one item at a time, in order of generation time.
  • the data summarization system performs the subsequent operations for each piece of data.
  • a plurality of data may be input to the data input unit 10 all together, but even in that case, the data summarization system performs the subsequent processing for each piece of data in order of data generation time.
  • Next, the new approximate expression generation unit 14 determines whether a new approximate expression can be generated from the one item of data input to the data input unit 10 and the uncertain points stored in the uncertain point storage unit 13 (step S102).
  • In step S102, the new approximate expression generation unit 14 may read each uncertain point stored in the uncertain point storage unit 13 and determine whether those uncertain points, together with the new data input to the data input unit 10, provide the number of data items required to generate an approximate expression.
  • If the required number of data items is available, the new approximate expression generation unit 14 may determine that a new approximate expression can be generated; otherwise, it may determine that one cannot be generated.
  • When the new approximate expression generation unit 14 determines that a new approximate expression can be generated (Yes in step S102), it generates, from the uncertain points and the one item of data input to the data input unit 10, an approximate expression that approximates the data value using the data generation time as its variable (step S103).
  • The type of approximate expression and the algorithm for generating it are determined in advance, but neither is particularly limited. For example, assume, as already described, that the approximate expression is a linear expression in the generation time t. When four data items are available, the new approximate expression generation unit 14 may generate the approximate expression by determining the first-order coefficient and the constant term from the four pairs of generation time and data value by the least-squares method.
  • Alternatively, the new approximate expression generation unit 14 may obtain, as a linear expression in the generation time t, the straight line passing through two points whose coordinates are (generation time, data value). In this case as well, the new approximate expression generation unit 14 determines the first-order coefficient and the constant term.
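  • Both generation methods can be covered by one ordinary least-squares fit of $x = a t + b$; with exactly two samples the fit reduces to the straight line through both points. This is a minimal pure-Python sketch; the function name and the error handling are assumptions.

```python
from typing import List, Tuple


def fit_linear(samples: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of x = a*t + b to (generation time, data value) pairs.
    Returns (a, b) = (first-order coefficient, constant term). With exactly two
    samples the result is the straight line through both points."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    mean_t = sum(t for t, _ in samples) / n
    mean_x = sum(x for _, x in samples) / n
    s_tt = sum((t - mean_t) ** 2 for t, _ in samples)
    if s_tt == 0:
        raise ValueError("generation times must not all be equal")
    s_tx = sum((t - mean_t) * (x - mean_x) for t, x in samples)
    a = s_tx / s_tt
    b = mean_x - a * mean_t
    return a, b


print(fit_linear([(0, 1), (2, 5)]))                  # (2.0, 1.0): line through two points
print(fit_linear([(0, 1), (1, 3), (2, 5), (3, 7)]))  # (2.0, 1.0): four points on x = 2t + 1
```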
  • Here, the approximate expression is a linear expression and is represented by the first-order coefficient and the constant term of the variable t, as shown in FIG.
  • the expression form of the approximate expression is not limited to this form, and the approximate expression may be expressed in another form.
  • In step S103, the new approximate expression generation unit 14 outputs the generated approximate expression and each data item used to generate it (each uncertain point and the new data) to the graph update unit 18.
  • The graph update unit 18 assigns an approximate expression ID to the approximate expression received from the new approximate expression generation unit 14 and stores the approximate expression together with its ID in the approximate expression storage unit 15 (step S104). As a result, one more approximate expression is registered.
  • In step S104, the graph update unit 18 further determines the effective domain of the newly generated approximate expression from the generation times of the data used to generate it, and stores the effective domain together with the approximate expression ID in the confirmed graph storage unit 19.
  • The graph update unit 18 determines the effective domain of the new approximate expression so that the number of sections whose start and end points are generation times of the data used to generate the approximate expression is as small as possible, and so that the effective domain does not overlap the effective domains of existing approximate expressions.
  • The graph update unit 18 may set, as a "point" (see FIG. 11), any such time that lies isolated between the sections and points of the effective domains of existing approximate expressions and therefore cannot be the start point or end point of a section.
  • When a new approximate expression cannot be generated (No in step S102), the new data generation time substitution unit 12 substitutes the generation time of the data input to the data input unit 10 into each approximate expression already generated in the past (that is, each approximate expression stored in the approximate expression storage unit 15), and thereby calculates, for each approximate expression, an approximate value of the data input to the data input unit 10 (step S105). It then outputs each pair of approximate expression and calculated approximate value to the graph evaluation unit 17.
  • Referring to the final data information, the new data generation time substitution unit 12 also outputs to the graph evaluation unit 17 information indicating which pair consists of the final approximate expression and the approximate value calculated from it. It also outputs the generated data currently being processed (the data input to the data input unit 10) to the graph evaluation unit 17. The processing of step S105 is described in detail later. After step S105, the graph evaluation unit 17 compares the approximate value calculated with each approximate expression against the actual data value of the data input to the data input unit 10 (step S106).
  • Suppose, for example, that the accuracy constraint input unit 16 stores the criterion that the absolute value of the difference between the approximate value and the actual data value is less than the threshold $\varepsilon$ (that is, $|f(t) - x| < \varepsilon$).
  • The following describes how the graph evaluation unit 17 selects an approximate expression that satisfies this criterion. In step S106, the graph evaluation unit 17 calculates, for each approximate expression, the absolute value of the difference between the approximate value and the actual data value of the data input to the data input unit 10. Next, the graph evaluation unit 17 determines whether any approximate expression satisfies the criterion that this absolute difference is less than the threshold $\varepsilon$ (step S107).
  • If such approximate expressions exist, the graph evaluation unit 17 selects, from among those satisfying the criterion, the one that minimizes the increase in storage capacity when the effective domain is updated (step S108). That is, it selects the approximate expression that minimizes the increase from the storage capacity needed for the effective domain before the update to that needed after the update. If only one approximate expression satisfies the criterion, the graph evaluation unit 17 may simply select it.
  • In particular, when the approximate expressions satisfying the criterion include the final approximate expression and the end of its effective domain is the end point of a "section", the graph evaluation unit 17 may select that final approximate expression, because it minimizes the increase in storage capacity. When several approximate expressions satisfy the criterion but none of them is a final approximate expression whose effective domain ends at the end point of a "section", the graph evaluation unit 17 selects one of them by a selection method that may be determined in advance.
  • After selecting one approximate expression satisfying the criterion in step S108, the graph evaluation unit 17 outputs the selected approximate expression, its effective domain, and the generated data being processed to the graph update unit 18.
  • the graph evaluation unit 17 also outputs information indicating whether or not the selected approximate expression is the final approximate expression to the graph update unit 18.
  • the graph evaluation unit 17 may output an approximate expression ID for identifying the approximate expression to the graph update unit 18 as an approximate expression.
  • the graph update unit 18 updates the effective definition area of the approximate expression received from the graph evaluation unit 17 in step S108, and stores it in the confirmed graph storage unit 19 (step S109). Since each mode in which the graph update unit 18 updates the valid domain has already been described, a description thereof is omitted here.
  • In step S109, the graph update unit 18 also updates the approximate expression ID of the final data information (see FIG. 9) stored in the final time storage unit 11 to the ID of the approximate expression selected in step S108, and updates the final time (see FIG. 9) of the final data information to the generation time of the generated data being processed.
  • If no approximate expression satisfies the criterion (No in step S107), the graph evaluation unit 17 outputs the generated data being processed to the graph update unit 18, and the graph update unit 18 stores it as an uncertain point in the uncertain point storage unit 13 (step S110).
  • When the data summarization system completes any one of steps S104, S109 and S110, it repeats the same processing for the next data input to the data input unit 10 (the next data in order of generation time). In this way, the processing from step S102 onward is performed individually for each item of generated data input to the data input unit 10, in order of generation time.
  • When the data generation source ends data generation (Yes in step S100), the data summarization system ends the process.
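  • Putting steps S102 to S110 together, the per-item loop can be sketched as below. This is a simplified, hypothetical model: it assumes linear approximate expressions generated from exactly two data items (one uncertain point plus the new data), the absolute-error criterion $|f(t) - x| < \varepsilon$, a storage-cost model of two numbers per section and one per point, and that a newly generated expression becomes the final approximate expression (the text does not state this). Class and attribute names only loosely mirror the storage units.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

REQUIRED = 2  # assumption: a new expression is the line through two data items


@dataclass
class Summarizer:
    """Hypothetical sketch of steps S102-S110 for linear expressions x = a*t + b."""
    eps: float                                                            # threshold epsilon
    exprs: Dict[str, Tuple[float, float]] = field(default_factory=dict)   # like unit 15
    sections: Dict[str, List[List[float]]] = field(default_factory=dict)  # like unit 19
    points: Dict[str, List[float]] = field(default_factory=dict)          # like unit 19
    uncertain: List[Tuple[float, float]] = field(default_factory=list)    # like unit 13
    final: Optional[Tuple[str, float]] = None                             # like unit 11

    def _blocked(self, lo: float, hi: float) -> bool:
        """True if any existing section or point lies inside (lo, hi)."""
        if any(s < hi and e > lo for secs in self.sections.values() for s, e in secs):
            return True
        return any(lo < p < hi for pts in self.points.values() for p in pts)

    def _end(self, f: str) -> float:
        return max([e for _, e in self.sections[f]] + self.points[f])

    def _is_final(self, f: str) -> bool:
        return self.final is not None and self.final[0] == f

    def process(self, t: float, x: float) -> None:
        """Handle one data item, in order of generation time."""
        if len(self.uncertain) + 1 >= REQUIRED:                 # S102: enough data?
            t0, x0 = self.uncertain[0]
            a = (x - x0) / (t - t0)                             # S103: two-point line
            f = f"f{len(self.exprs)}"
            self.exprs[f] = (a, x0 - a * t0)                    # S104: register with an ID
            if self._blocked(t0, t):                            # keep domains disjoint
                self.sections[f], self.points[f] = [], [t0, t]
            else:
                self.sections[f], self.points[f] = [[t0, t]], []
            self.uncertain.clear()
            self.final = (f, t)  # assumption: the new expression becomes the final one
            return
        # S105-S107: substitute t into every stored expression and test the criterion.
        ok = [f for f, (a, b) in self.exprs.items() if abs(a * t + b - x) < self.eps]
        if not ok:
            self.uncertain.append((t, x))                       # S110
            return

        def cost(f: str) -> Tuple[int, float]:
            # S108: extending the final expression's last section is free; anything
            # else costs one number; ties go to the largest effective-domain end.
            extends = self._is_final(f) and any(e == self._end(f) for _, e in self.sections[f])
            return (0 if extends else 1, -self._end(f))

        f = min(ok, key=cost)
        end = self._end(f)                                      # S109: update the domain
        if self._is_final(f) and end in self.points[f]:
            self.points[f].remove(end)                          # final point starts a section
            self.sections[f].append([end, t])
        elif self._is_final(f):
            last = next(sec for sec in self.sections[f] if sec[1] == end)
            last[1] = t                                         # extend the last section
        else:
            self.points[f].append(t)                            # add an isolated point
        self.final = (f, t)                                     # update final data info


s = Summarizer(eps=0.5)
for t in range(5):
    s.process(t, 2 * t + 1)  # data on the line x = 2t + 1
s.process(5, 100)            # a temporary change in tendency
s.process(6, 100)
s.process(7, 15)             # back on x = 2t + 1
s.process(8, 17)
print(s.sections["f0"], s.points["f0"], s.sections["f1"])  # [[0, 4], [7, 8]] [] [[5, 6]]
```

  • In this run, expression `f0` ends up with the effective domain $[0, 4] \cup [7, 8]$ while `f1` covers only $[5, 6]$, illustrating how one approximate expression can cover data before and after a temporary change without being stored twice.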
  • The case where the accuracy constraint input unit 16 stores the criterion that the absolute value of the difference between the approximate value and the actual data value is less than the threshold $\varepsilon$ has been described as an example, but the criterion stored in the accuracy constraint input unit 16 is not limited to it.
  • In step S105, the new data generation time substitution unit 12 first determines whether there is an approximate expression that has not yet been read from the approximate expression storage unit 15 (step S201). If there is, it reads that approximate expression and its approximate expression ID from the approximate expression storage unit 15 (step S202).
  • The new data generation time substitution unit 12 substitutes the generation time $t_i$ of the data being processed into the read approximate expression f and outputs the approximate expression ID "f" and the approximate value $f(t_i)$ to the graph evaluation unit 17 (step S203). After step S203, the new data generation time substitution unit 12 repeats the processing from step S201. If no unread approximate expression remains in the approximate expression storage unit 15 (No in step S201), the new data generation time substitution unit 12 reads the final data information from the final time storage unit 11 (step S204) and outputs the approximate expression ID of the final approximate expression included in the final data information to the graph evaluation unit 17 (step S205). At this time, it also outputs the data being processed, input to the data input unit 10, to the graph evaluation unit 17.
  • the new data generation time substitution unit 12 may execute steps S204 and S205 before the loop processing of steps S201 to S203.
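  • The substitution loop of steps S201 to S205 amounts to evaluating every stored expression at the new generation time. A minimal sketch, assuming linear expressions stored as (first-order coefficient, constant term) pairs keyed by a hypothetical ID; since steps S204 and S205 may equally run before the loop, the order of the returned items carries no meaning here.

```python
from typing import Dict, List, Tuple


def substitute_new_time(t_i: float,
                        expressions: Dict[str, Tuple[float, float]],
                        final_expr_id: str) -> Tuple[List[Tuple[str, float]], str]:
    """Evaluate each stored expression x = a*t + b at t_i (steps S201-S203) and
    return the (ID, approximate value) pairs plus the final expression's ID (S204-S205)."""
    pairs: List[Tuple[str, float]] = []
    for expr_id, (a, b) in expressions.items():  # S201/S202: next unread expression
        pairs.append((expr_id, a * t_i + b))     # S203: output ID and f(t_i)
    return pairs, final_expr_id                  # S204/S205: final data information


print(substitute_new_time(3.0, {"f0": (2.0, 1.0), "f1": (0.0, 100.0)}, "f1"))
# ([('f0', 7.0), ('f1', 100.0)], 'f1')
```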
  • As described above, the data summarization system according to the first embodiment stores data that is generated sequentially with a certain tendency but may change irregularly, such as the CPU usage rate and the number of web page accesses, as approximate expressions that use the generation time as a variable, together with effective domains over which approximation by each expression can be regarded as appropriate.
  • the effective domain of one approximate expression is represented as a set of points indicating a time interval and a time point.
  • the data summarization system according to the first embodiment specifies an approximate expression that satisfies the criteria stored in the accuracy constraint input unit 16 for new data, and approximates the new data using the approximate expression.
  • the data summarization system of the first embodiment can summarize (compress) data with high accuracy.
  • Because the effective domain of one approximate expression is allowed to include multiple "sections" and "points", the data summarization system of the first embodiment can keep the storage capacity of the summarized data small.
  • As a result, efficient data summarization can be realized. For example, assume that data that can be approximated by a certain approximate expression is generated continuously (first state), the tendency of the data values then changes temporarily (second state), and data that can again be approximated by the original approximate expression is generated (third state).
  • In this case, the data summarization system of the first embodiment can express the data generated in the first state and in the third state with the same approximate expression, reducing the storage capacity accordingly.
  • By contrast, if the effective domain of an approximate expression could not contain multiple sections, the data summarization system would have to store an approximate expression and effective domain for the first state and, separately, an approximate expression and effective domain for the third state; the same approximate expression would be stored twice, increasing the storage capacity.
  • the data summarization system of the first embodiment can prevent such an increase in storage capacity.
  • For each uncertain point, the data summarization system must store its generation time and data value.
  • The data summarization system according to the first embodiment never stores data using more storage capacity than would be needed to store the data itself as-is.
  • Storing one data item as-is requires two numerical values, its data value and its generation time, so two data items require storage for four numerical values.
  • the data summarization system according to the first embodiment generates a straight line connecting two points as an approximate expression when one uncertain point and one new data are prepared. To do.
  • Because the approximate expression is a linear expression, the data summarization system according to the first embodiment needs to store its first-order coefficient and constant term.
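  • The storage comparison can be made concrete with a small counting sketch. The accounting is an assumption for illustration only: two numbers per linear approximate expression (first-order coefficient and constant term), two per section (start and end), one per point, and expression IDs ignored. Under this model, two raw data items and their two-point line with one section (or with two points) both cost four numbers, and the summary stays at four while further items extend the same section.

```python
def raw_cost(n_items: int) -> int:
    """Numbers needed to store n_items data items as-is (generation time + data value)."""
    return 2 * n_items


def summarized_cost(n_expressions: int, n_sections: int, n_points: int) -> int:
    """Numbers needed for the summary under the assumed accounting:
    2 per linear expression, 2 per section, 1 per point (IDs ignored)."""
    return 2 * n_expressions + 2 * n_sections + n_points


# Two data items summarized by the line through them, with one section [t0, t1]:
print(raw_cost(2), summarized_cost(n_expressions=1, n_sections=1, n_points=0))   # 4 4
# Ten items on the same line, still covered by one section:
print(raw_cost(10), summarized_cost(n_expressions=1, n_sections=1, n_points=0))  # 20 4
```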
  • FIG. 20 is a block diagram illustrating an example of the data summarization system according to the second embodiment of the present invention. Constituent elements similar to those of the first embodiment are denoted by the same reference numerals as in FIG.
  • the data summarization system of the second embodiment includes a data input unit 10, a final time storage unit 11, a new data generation time substitution unit 12, a new approximate expression generation unit 14, an indeterminate point storage unit 13, and an approximation.
  • An expression storage unit 15, an accuracy constraint input unit 16, a graph evaluation unit 17a, a graph update unit 18a, a deterministic graph storage unit 19, a default expression input unit 20, and a default expression storage unit 21 are provided.
  • The default formula storage unit 21 is a storage device that stores one predetermined approximate expression (the default formula) for obtaining approximate values of data values.
  • FIG. 21 is an explanatory diagram illustrating an example of a default formula stored in the default formula storage unit 21.
  • The default formula storage unit 21 stores the first-order coefficient and the constant term of the default formula.
  • FIG. 21 illustrates a case where a predetermined linear expression in the variable t is stored.
  • The default formula input unit 20 receives a default formula input by the user and stores it in the default formula storage unit 21.
  • The graph evaluation unit 17a compares each approximate value, obtained when the new data generation time substitution unit 12 substitutes the generation time of the new data into each approximate expression stored in the approximate expression storage unit 15, with the actual data value of the new data. In this respect it is the same as the graph evaluation unit 17 of the first embodiment.
  • The graph evaluation unit 17a of the second embodiment additionally calculates the approximate value obtained by substituting the generation time of the new data (the generated data newly received from the new data generation time substitution unit 12) into the default formula, and also compares this approximate value with the actual data value of the new data. The graph evaluation unit 17a then identifies the approximate expressions that satisfy the criterion stored in the accuracy constraint input unit 16. When several approximate expressions satisfy the criterion, the graph evaluation unit 17a identifies, among them, the one that minimizes the increase in storage capacity when its effective domain is updated, and determines that this approximate expression approximates the data value of the new data.
  • When exactly one approximate expression satisfies the criterion, the graph evaluation unit 17a determines that this approximate expression approximates the data value of the new data.
  • If the default formula is among the approximate expressions that satisfy the criterion, the graph evaluation unit 17a selects the default formula. This is because no effective domain is defined for the default formula, so the storage capacity needed to express its effective domain is zero.
  • Otherwise, the approximate expression is selected in the same way as by the graph evaluation unit 17 of the first embodiment, and a description thereof is omitted.
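The selection rule above can be sketched as follows. The sketch assumes the accuracy criterion is an absolute-error threshold and assumes a simple storage-cost model: zero for the default formula, zero for the final approximate expression (whose last interval is assumed to be extended in place), and two numbers for opening a new interval otherwise. The names `Candidate`, `storage_increase`, `select_expression`, and the ID `"default"` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_ID = "default"  # hypothetical ID reserved for the default formula


@dataclass
class Candidate:
    expr_id: str
    abs_error: float  # |approximate value - actual data value| from step S106


def storage_increase(expr_id: str, final_expr_id: Optional[str]) -> int:
    """Assumed number of extra numerical values needed to record the new time."""
    if expr_id == DEFAULT_ID:
        return 0  # no effective domain is kept for the default formula
    if expr_id == final_expr_id:
        return 0  # assumption: the last interval's end point is overwritten
    return 2      # assumption: a new interval [t, t] must be stored


def select_expression(cands: List[Candidate], threshold: float,
                      final_expr_id: Optional[str]) -> Optional[str]:
    """Return the ID to use for the new data, or None if none is accurate enough."""
    passing = [c for c in cands if c.abs_error < threshold]
    if not passing:
        return None  # no approximate expression satisfies the criterion
    if any(c.expr_id == DEFAULT_ID for c in passing):
        return DEFAULT_ID  # the default formula is always preferred
    best = min(passing, key=lambda c: storage_increase(c.expr_id, final_expr_id))
    return best.expr_id
```

Checking for the default formula explicitly, rather than relying on its zero cost alone, keeps the preference stated in the text even when the final approximate expression would also cost nothing.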
  • The graph evaluation unit 17a outputs the determined approximate expression, its effective domain, and the new data (the generated data received from the new data generation time substitution unit 12) to the graph update unit 18a. At this time, the graph evaluation unit 17a also outputs to the graph update unit 18a information indicating whether the determined approximate expression is the final approximate expression. The graph evaluation unit 17a may output, for example, the approximate expression ID as the determined approximate expression. This operation is the same as that of the graph evaluation unit 17 of the first embodiment.
  • When the default formula is selected, the graph evaluation unit 17a outputs information indicating that the default formula has been selected, together with the input new data, to the graph update unit 18a.
  • As this information, a default formula ID reserved for the default formula may be used.
  • When no approximate expression satisfies the criterion, the graph evaluation unit 17a may output the new data to the graph update unit 18a without selecting an approximate expression. This operation is the same as that of the graph evaluation unit 17 of the first embodiment.
  • When the graph evaluation unit 17a determines an approximate expression other than the default formula and the graph update unit 18a receives from it the approximate expression and its effective domain, the new data, and the information indicating whether the determined approximate expression is the final approximate expression, the graph update unit 18a updates the effective domain of that approximate expression.
  • The graph update unit 18a may update the effective domain by adding the generation time of the new data to the effective domain of the received approximate expression.
  • The graph update unit 18a stores the updated effective domain of the approximate expression determined by the graph evaluation unit 17a in the confirmed graph storage unit 19. This operation is the same as that of the graph update unit 18 of the first embodiment.
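Adding a generation time to an effective domain can be sketched with the domain held as a list of closed intervals on integer occurrence times. The sketch assumes, hypothetically, that when the selected expression is also the final approximate expression its last interval is extended, and that otherwise a new single-point interval is opened; `add_time` is an illustrative name.

```python
from typing import List

Interval = List[int]  # [start, end], closed, in occurrence-time units


def add_time(domain: List[Interval], t: int, is_final: bool) -> List[Interval]:
    """Return a new effective domain that also covers occurrence time t.

    Assumption for this sketch: if the selected expression was also the final
    approximate expression (used for the previous data), its last interval is
    extended; otherwise a new single-point interval [t, t] is appended.
    """
    updated = [iv[:] for iv in domain]  # copy so the caller's list is untouched
    if is_final and updated:
        if t < updated[-1][1]:
            raise ValueError("occurrence times must be non-decreasing")
        updated[-1][1] = t  # extend the last interval; no new numbers stored
    else:
        updated.append([t, t])  # a new interval costs two more numbers
    return updated


d = add_time([[0, 4]], 5, is_final=True)  # [[0, 5]]
d = add_time(d, 9, is_final=False)        # [[0, 5], [9, 9]]
print(d)
```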
  • When the graph evaluation unit 17a selects the default formula as the approximate expression that approximates the data value of the new data and notifies the graph update unit 18a that the default formula has been selected, together with the input new data, the graph update unit 18a operates as follows. It stores the generation time of the new data and the default formula ID reserved for the default formula in the final time storage unit 11 as the final data information.
  • The graph update unit 18a also stores the new data in the uncertain point storage unit 13 as an uncertain point.
  • This operation is the same as the operation of the graph update unit 18 in the first embodiment.
  • When the new approximate expression generation unit 14 newly generates an approximate expression from the uncertain point and outputs the approximate expression and each piece of data used to generate it (that is, each uncertain point stored in the uncertain point storage unit 13 and the newly generated data) to the graph update unit 18a, the graph update unit 18a operates in the same way as the graph update unit 18 of the first embodiment.
  • The graph evaluation unit 17a, the graph update unit 18a, and the default formula input unit 20 in the second embodiment are realized, for example, by a CPU of a computer that operates according to a data summarization program.
  • For example, a program storage device (not shown) of the computer may store the data summarization program, and the CPU may read the program and, according to it, operate as the data input unit 10, the new data generation time substitution unit 12, the new approximate expression generation unit 14, the accuracy constraint input unit 16, the graph evaluation unit 17a, the graph update unit 18a, and the default formula input unit 20.
  • each of these units may be realized by separate hardware.
  • The graph evaluation unit 17a compares the approximate value of the data calculated for each approximate expression with the actual data value of the data input to the data input unit 10 (step S106). In the second embodiment, however, the graph evaluation unit 17a also substitutes the generation time of the new data into the default formula stored in the default formula storage unit 21 and compares the resulting approximate value with the actual data value. For example, when the accuracy constraint input unit 16 stores a criterion of
  • The graph evaluation unit 17a determines whether any approximate expression satisfies the criterion that the absolute value of the difference calculated in step S106 is less than the threshold (step S107).
  • The operation when no approximate expression satisfies the criterion (No in step S107) is the same as in the first embodiment.
  • When approximate expressions satisfy the criterion (Yes in step S107), the graph evaluation unit 17a selects, from among them, the approximate expression that minimizes the increase in storage capacity when the effective domain is updated (step S108).
  • In step S108, if the default formula is among the approximate expressions satisfying the criterion, the graph evaluation unit 17a selects the default formula and outputs information indicating that the default formula has been selected, together with the input new data, to the graph update unit 18a.
  • The operation when the default formula is not among the approximate expressions satisfying the criterion is the same as in the first embodiment.
  • When the default formula has been selected, the graph update unit 18a stores final data information consisting of the default formula ID and the generation time of the data in the final time storage unit 11 (step S109).
  • In other cases, step S109 is the same as in the first embodiment, and a description thereof is omitted.
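Steps S106 and S107 can be sketched for linear expressions as follows: the generation time is substituted into every stored approximate expression and into the default formula, and the expressions whose absolute error is below the threshold are kept. The function `evaluate`, the `(a, b)` representation, and the ID `"default"` are illustrative assumptions, not the patent's data layout.

```python
from typing import Dict, List, Tuple

Linear = Tuple[float, float]  # (a, b) for the linear expression x = a * t + b


def evaluate(exprs: Dict[str, Linear], default: Linear, t: float, x: float,
             threshold: float) -> List[str]:
    """Steps S106-S107 (sketch): IDs whose |approximate - actual| < threshold."""
    candidates = dict(exprs)         # approximate expression storage unit 15
    candidates["default"] = default  # default formula (hypothetical ID)
    passing = []
    for expr_id, (a, b) in candidates.items():
        approx = a * t + b           # substitute the generation time of new data
        if abs(approx - x) < threshold:
            passing.append(expr_id)
    return passing


print(evaluate({"f1": (2.0, 1.0), "f2": (0.0, 5.0)}, (1.0, 0.0), 2.0, 2.2, 0.5))
# ['default']
```

An empty result corresponds to "No in step S107"; a non-empty one is passed on to the selection in step S108.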
  • The data summarization system of the second embodiment does not define an effective domain for the default formula.
  • Generated data whose data value is determined to be approximated by an approximate expression other than the default formula is associated with one of the approximate expressions. Therefore, as in the first embodiment, the data summarization system of the second embodiment can efficiently summarize generated data approximated by approximate expressions other than the default formula with a small storage capacity.
  • In addition, because it stores no effective domain for data approximated by the default formula, the data summarization system of the second embodiment summarizes data even more efficiently.
  • As a result, the data summarization system according to the second embodiment can summarize data with a smaller storage capacity than the first embodiment.
  • FIG. 23 is a block diagram illustrating an example of a data summarization system according to the third embodiment of this invention.
  • The data summarization system includes a data input unit 10, a final time storage unit 11, a new data generation time substitution unit 12, a new approximate expression generation unit 14, an uncertain point storage unit 13, an approximate expression storage unit 15, an accuracy constraint input unit 16, a graph evaluation unit 17, a graph update unit 18, a confirmed graph storage unit 19, an unsummarized data storage unit 30, an available storage area monitoring unit 31, and a summary control unit 32. Components similar to those in the first embodiment are denoted by the same reference numerals as in FIG. 2, and detailed description thereof is omitted. However, when data is input to the data input unit 10 from a data generation source (not shown), the data input unit 10 outputs the input data to the summary control unit 32.
  • the new data generation time substitution unit 12 and the new approximate expression generation unit 14 receive data from the summary control unit 32.
  • The unsummarized data storage unit 30 is a storage device that stores data generated by a data generation source (not shown) in an unsummarized state. Since each piece of data consists of a data value and an occurrence time, the unsummarized data storage unit 30 stores a data group of data values and occurrence times.
  • FIG. 4 is an explanatory diagram illustrating a plurality of uncertain points stored in the uncertain point storage unit 13; the unsummarized data storage unit 30 likewise stores multiple pieces of data, each consisting of a data value and a generation time, in the same form. Data is stored in the unsummarized data storage unit 30 by the summary control unit 32.
  • The unsummarized data storage unit 30, the final time storage unit 11, the uncertain point storage unit 13, the approximate expression storage unit 15, and the confirmed graph storage unit 19 may be realized by separate storage devices or by the same storage device. Some of them may also share a storage device. For example, the unsummarized data storage unit 30 and the final time storage unit 11 may be realized by one storage device, the approximate expression storage unit 15 and the confirmed graph storage unit 19 by a different storage device, and the uncertain point storage unit 13 by one of these storage devices.
  • the available storage area monitoring unit 31 monitors the amount of available resources in the storage device that stores at least unsummarized data, and outputs the monitoring result to the summary control unit 32.
  • the unsummarized data here means data stored in the unsummarized data storage unit 30 from the data generation source via the data input unit 10 and the summary control unit 32. That is, it means data that has not yet been summarized as a target of the summary process. Therefore, the available storage area monitoring unit 31 only needs to monitor the amount of resources that can be used in the unsummarized data storage unit 30. If the unsummarized data storage unit 30 is realized by the same storage device as other storage units, the available storage area monitoring unit 31 may monitor the amount of resources that can be used in the storage device.
  • Although the available storage area monitoring unit 31 is described here as monitoring the amount of available resources, it may monitor the resource amount in another form.
  • For example, when the unsummarized data storage unit 30 is a disk storage device, the unused rate of the disk may be monitored.
  • Instead of the amount of available resources, the available storage area monitoring unit 31 may monitor the amount of resources already used (for example, the disk usage rate).
  • In the following, the case where the available storage area monitoring unit 31 monitors the amount of available resources in the unsummarized data storage unit 30 is described as an example.
  • the available storage area monitoring unit 31 may monitor the unsummarized data storage unit 30 at regular intervals. Alternatively, for example, the available storage area monitoring unit 31 may monitor the unsummarized data storage unit 30 when an instruction to perform monitoring is input at an arbitrary timing from a user or the like.
  • the summary control unit 32 stores the generated data input to the data input unit 10 in the unsummarized data storage unit 30 in an unsummarized state according to the monitoring result by the available storage area monitoring unit 31. Alternatively, the summary control unit 32 performs summary control for summarizing the data stored in the unsummarized data storage unit 30 according to the monitoring result by the available storage area monitoring unit 31.
  • If the amount of available resources in the unsummarized data storage unit 30 exceeds a threshold, the summary control unit 32 stores the generated data input to the data input unit 10 in the unsummarized data storage unit 30 without summarizing it. If the amount of available resources in the unsummarized data storage unit 30 is equal to or less than the threshold, the summary control unit 32 performs summary control to summarize the data stored in the unsummarized data storage unit 30. Specifically, the summary control unit 32 outputs the data stored in the unsummarized data storage unit 30 to the new approximate expression generation unit 14 and the new data generation time substitution unit 12 to start the data summarization process.
  • When the summary control unit 32 outputs the data stored in the unsummarized data storage unit 30 to the new approximate expression generation unit 14 and the new data generation time substitution unit 12, it erases that data from the unsummarized data storage unit 30. When the available storage area monitoring unit 31 monitors the amount of resources already used, the summary control unit 32 may store the generated data in the unsummarized data storage unit 30 while the amount of used resources is less than the threshold and perform summary control when it is equal to or greater than the threshold. When performing summary control, the summary control unit 32 may also store newly generated data in the unsummarized data storage unit 30 and summarize it at the same time.
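The single-threshold summary control can be sketched as below. `SummaryControl`, `on_input`, and the `summarize` callback (standing in for the new approximate expression generation unit 14 and the new data generation time substitution unit 12) are hypothetical; `free_ratio` shows one way to obtain a disk unused rate with the standard `shutil.disk_usage` call. This variant always stores the new data first and then drains the store when resources are low, which is one of the options the text allows.

```python
import shutil
from typing import Callable, List, Tuple

Datum = Tuple[float, float]  # (occurrence time, data value)


def free_ratio(path: str = ".") -> float:
    """One possible monitoring metric: the unused fraction of a disk."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    return usage.free / usage.total


class SummaryControl:
    """Sketch of the summary control unit 32 with a single threshold."""

    def __init__(self, threshold: float, summarize: Callable[[Datum], None]):
        self.threshold = threshold
        self.summarize = summarize
        self.unsummarized: List[Datum] = []  # unsummarized data storage unit 30

    def on_input(self, datum: Datum, available: float) -> None:
        self.unsummarized.append(datum)
        if available > self.threshold:
            return  # plenty of room: keep the raw data as-is
        # Resources are low: summarize stored data oldest-first, then erase it.
        for d in sorted(self.unsummarized, key=lambda d: d[0]):
            self.summarize(d)
        self.unsummarized.clear()


out: List[Datum] = []
ctl = SummaryControl(0.2, out.append)
ctl.on_input((1.0, 10.0), available=0.5)  # stored only
ctl.on_input((2.0, 11.0), available=0.1)  # triggers summarization
print(out)  # [(1.0, 10.0), (2.0, 11.0)]
```

In a real deployment `available` could be `free_ratio(path)` sampled at regular intervals, as the monitoring description suggests.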
  • When performing summary control, the summary control unit 32 outputs the data stored in the unsummarized data storage unit 30 one piece at a time to the new approximate expression generation unit 14 and the new data generation time substitution unit 12.
  • the summary control unit 32 outputs the same data simultaneously to the new approximate expression generation unit 14 and the new data generation time substitution unit 12, for example.
  • The summary control unit 32 need only output the data one piece at a time in an order such that each piece output later has a later generation time than the pieces output earlier.
  • Likewise, the summary control unit 32 need not delete the output data in time order, provided that data already output is never output again.
  • FIG. 24 schematically shows the data stored in the unsummarized data storage unit 30 arranged in order of occurrence time. Data 51 is assumed to be the first data input to the data input unit 10 and stored in the unsummarized data storage unit 30, with data 52 and subsequent data stored there in sequence.
  • The summary control unit 32 may output the data to the new approximate expression generation unit 14 and the new data generation time substitution unit 12 in order of occurrence time, starting from data 51.
  • Alternatively, the summary control unit 32 may start partway through the generated data (for example, from data 55) and output the data in order of generation time to the new approximate expression generation unit 14 and the new data generation time substitution unit 12. In that case, data 51 to 54 are not summarized and are kept in the unsummarized data storage unit 30 without being deleted. As long as each piece of data output later to the new approximate expression generation unit 14 and the new data generation time substitution unit 12 has a later generation time than the data output earlier, the summary control unit 32 may also skip data. For example, after outputting data 51 to 54, the summary control unit 32 may skip data 55 and output data 56 to 59.
  • the skipped data is not included in the summary target and is kept in the unsummarized data storage unit 30.
  • the summary control unit 32 may output data satisfying the condition that the generation time is after the final time indicated by the final data information to the new approximate expression generation unit 14 and the new data generation time substitution unit 12.
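The output-order conditions above (strictly increasing generation times, no datum output twice, optional skipping, and only data after the final time) can be sketched as a generator. The function `output_order` and its `final_time` and `skip` parameters are illustrative assumptions.

```python
from typing import AbstractSet, Iterable, Iterator, Optional, Tuple

Datum = Tuple[int, float]  # (occurrence time, data value)


def output_order(stored: Iterable[Datum], final_time: Optional[int] = None,
                 skip: AbstractSet[int] = frozenset()) -> Iterator[Datum]:
    """Yield stored data in increasing occurrence time for summarization.

    Data at or before `final_time` (the time in the final data information)
    are not output, and data whose times are in `skip` are left unsummarized.
    Each datum is yielded at most once.
    """
    last = final_time
    for t, x in sorted(stored, key=lambda d: d[0]):
        if last is not None and t <= last:
            continue  # would break the "later than earlier output" rule
        if t in skip:
            continue  # skipped data stay in the unsummarized store
        last = t
        yield (t, x)


data = [(t, float(t)) for t in range(51, 60)]  # stands in for data 51 to 59
print([t for t, _ in output_order(data, skip={55})])
# [51, 52, 53, 54, 56, 57, 58, 59]
```

Passing `final_time=54` instead reproduces the "start from data 55" variant, leaving data 51 to 54 in the unsummarized store.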
  • When the available resources become even smaller, the summary control unit 32 may output newly input generated data directly to the new data generation time substitution unit 12 and the new approximate expression generation unit 14 so that the new data is summarized immediately.
  • a threshold for determining whether to summarize new data without storing it in the unsummarized data storage unit 30 may be set in advance separately from the above threshold.
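With the second threshold described above, the handling of each newly input datum becomes a three-way decision. The sketch below assumes the bypass threshold is set below the storage threshold; `Action`, `decide`, and both parameter names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    STORE = "store raw data in unsummarized data storage unit 30"
    SUMMARIZE_STORED = "store, then summarize the stored data"
    SUMMARIZE_DIRECT = "pass new data straight to summarization"


def decide(available: float, store_threshold: float,
           bypass_threshold: float) -> Action:
    """Choose how to handle newly input data (illustrative thresholds).

    Assumes bypass_threshold < store_threshold: at or below the lower
    threshold, new data skip the unsummarized store entirely.
    """
    if bypass_threshold >= store_threshold:
        raise ValueError("bypass_threshold must be below store_threshold")
    if available > store_threshold:
        return Action.STORE
    if available > bypass_threshold:
        return Action.SUMMARIZE_STORED
    return Action.SUMMARIZE_DIRECT
```

The comparisons follow the text's convention that summarization starts when the available amount is equal to or less than a threshold.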
  • When the summary control unit 32 outputs data to the new approximate expression generation unit 14 and the new data generation time substitution unit 12, the operations of the new approximate expression generation unit 14, the new data generation time substitution unit 12, the graph evaluation unit 17, and the graph update unit 18 are the same as in the first embodiment. That is, these units perform the operations from step S102 onward shown in FIG. 16 for each piece of data output from the summary control unit 32.
  • the new data generation time substitution unit 12 does not perform processing.
  • the available storage area monitoring unit 31 and the summary control unit 32 in the third embodiment are realized by, for example, a CPU of a computer that operates according to a data summarization program.
  • the program storage device (not shown) of the computer stores the data summarization program, and the CPU reads the program, and according to the program, the data input unit 10, the available storage area monitoring unit 31, the summarization control unit 32, the new The data generation time substitution unit 12, the new approximate expression generation unit 14, the accuracy constraint input unit 16, the graph evaluation unit 17, and the graph update unit 18 may be operated.
  • each of these units may be realized by separate hardware.
  • the data summarization system of the third embodiment stores data in the unsummarized data storage unit 30 without summarizing the data if there are many resources that can be used to store the data.
  • the data summarization system according to the third embodiment does not summarize the data, and therefore can hold the data with high accuracy.
  • When the resources available for storing data decrease, the data summarization system according to the third embodiment summarizes the data stored in the unsummarized data storage unit 30, so that, as in the other embodiments, the data can be stored efficiently with a small storage capacity in the form of approximate expressions and their effective domains. Therefore, the data summarization system of the third embodiment achieves efficient data summarization as in the other embodiments, while holding data with high accuracy when ample resources are available.
  • each component shown in FIG. 23 may be realized by a plurality of devices instead of being realized by one device.
  • For example, the data input unit 10, the available storage area monitoring unit 31, the summary control unit 32, the unsummarized data storage unit 30, and the final time storage unit 11 may be realized by a first information processing apparatus.
  • The new data generation time substitution unit 12, the new approximate expression generation unit 14, the uncertain point storage unit 13, the accuracy constraint input unit 16, the graph evaluation unit 17, and the graph update unit 18 may be realized by a second information processing apparatus.
  • the data summarization system may be configured to include the first information processing device, the second information processing device, and the database device.
  • the data summarization system of the third embodiment includes a default formula input unit 20 and a default formula storage unit 21 (see FIG. 20), as in the second embodiment, and includes a graph evaluation unit 17 and a graph update unit 18.
  • FIG. 25 is a block diagram illustrating an example of a data summarization system according to the fourth embodiment of this invention.
  • The data summarization system includes a data input unit 10, a final time storage unit 11, a new data generation time substitution unit 12, a new approximate expression generation unit 14, an uncertain point storage unit 13, an approximate expression storage unit 15, an accuracy constraint input unit 16, a graph evaluation unit 17a, a graph update unit 18a, a confirmed graph storage unit 19, a default formula input unit 20, a default formula storage unit 21, a default formula effective domain calculation unit 40, an effective domain capacity evaluation unit 41, and an effective domain update unit 42. As described in the second embodiment, no effective domain is defined for the default formula stored in the default formula storage unit 21.
  • To compare the storage capacity required for the effective domain of each approximate expression other than the default formula with the storage capacity that the effective domain of the default formula would require if one were defined, the default formula effective domain calculation unit 40 calculates a default effective domain and outputs it to the effective domain capacity evaluation unit 41.
  • FIG. 26 is an explanatory diagram showing an example of deriving the default effective domain. In FIG. 26, the horizontal axis represents the data generation time t, and the vertical axis represents the data value x.
  • All data are assumed to occur in the time range from $T_0$ to $T_7$.
  • The default formula effective domain calculation unit 40 may calculate the default effective domain as shown in equation (2).
  • The effective domain capacity evaluation unit 41 refers to the effective domain of the default formula received from the default formula effective domain calculation unit 40 and the effective domain of each approximate expression stored in the confirmed graph storage unit 19, and identifies the approximate expression that requires the largest storage capacity to store its effective domain.
  • The effective domain capacity evaluation unit 41 outputs that approximate expression and the default effective domain to the effective domain update unit 42. Instead of the approximate expression itself, the effective domain capacity evaluation unit 41 may output an approximate expression ID or the default formula ID to the effective domain update unit 42.
  • In the example shown in FIG., the effective domain of the approximate expression with approximate expression ID “f1” is $[T_0, T_1] \cup [T_2+1, T_3] \cup [T_5+1, T_6]$, so storage for six numerical values is required.
  • The effective domain of the approximate expression with approximate expression ID “f2” is $[T_3+1, T_4] \cup [T_6+1, T_7]$, so storage for four numerical values is required.
  • The effective domain calculated for the default formula is $[T_1+1, T_2] \cup [T_4+1, T_5]$, so storage for four numerical values is required.
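The example values above are consistent with reading the default effective domain as the part of $[T_0, T_7]$ not covered by any other approximate expression's effective domain. Since equation (2) itself is not reproduced here, the sketch below assumes that reading and uses hypothetical integer values for $T_0$ to $T_7$; `complement` and `cost` are illustrative names, and each closed interval is assumed to cost two numerical values.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # closed [start, end] on integer occurrence times


def complement(domains: Dict[str, List[Interval]], t_start: int,
               t_end: int) -> List[Interval]:
    """Times in [t_start, t_end] not covered by any approximate expression."""
    covered = sorted(iv for ivs in domains.values() for iv in ivs)
    result: List[Interval] = []
    cursor = t_start  # first time not yet known to be covered
    for start, end in covered:
        if start > cursor:
            result.append((cursor, start - 1))  # gap before this interval
        cursor = max(cursor, end + 1)
    if cursor <= t_end:
        result.append((cursor, t_end))
    return result


def cost(intervals: List[Interval]) -> int:
    """Each closed interval is stored as two numerical values."""
    return 2 * len(intervals)


T = [0, 9, 19, 29, 39, 49, 59, 69]  # hypothetical values for T_0 .. T_7
domains = {
    "f1": [(T[0], T[1]), (T[2] + 1, T[3]), (T[5] + 1, T[6])],
    "f2": [(T[3] + 1, T[4]), (T[6] + 1, T[7])],
}
s_default = complement(domains, T[0], T[7])
print(s_default)                                 # [(10, 19), (40, 49)]
print({k: cost(v) for k, v in domains.items()})  # {'f1': 6, 'f2': 4}
print(cost(s_default))                           # 4
```

The result matches $[T_1+1, T_2] \cup [T_4+1, T_5]$ and the storage counts of six, four, and four numerical values given in the text.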
  • The effective domain update unit 42 updates the default formula according to the input contents. However, if the approximate expression that requires the largest storage capacity for its effective domain is the current default formula, the effective domain update unit 42 ends the process without updating.
  • The default formula effective domain calculation unit 40, the effective domain capacity evaluation unit 41, and the effective domain update unit 42 in the fourth embodiment are realized, for example, by a CPU of a computer that operates according to a data summarization program.
  • For example, a program storage device (not shown) of the computer may store the data summarization program, and the CPU may read the program and, according to it, operate as the data input unit 10, the new data generation time substitution unit 12, the new approximate expression generation unit 14, the accuracy constraint input unit 16, the graph evaluation unit 17a, the graph update unit 18a, the default formula input unit 20, the default formula effective domain calculation unit 40, the effective domain capacity evaluation unit 41, and the effective domain update unit 42.
  • each of these units may be realized by separate hardware.
  • FIG. 28 is a flowchart showing an example of the progress of the default formula update process performed by the default formula effective domain calculation unit 40, the effective domain capacity evaluation unit 41, and the effective domain update unit 42 in the fourth embodiment.
  • The data summarization system performs this default formula update process asynchronously with the data summarization processing (the same data summarization processing as in the second embodiment) performed by the data input unit 10, the final time storage unit 11, the new data generation time substitution unit 12, the new approximate expression generation unit 14, the uncertain point storage unit 13, the approximate expression storage unit 15, the graph evaluation unit 17a, the graph update unit 18a, the confirmed graph storage unit 19, and the default formula storage unit 21.
  • For example, the data summarization system may execute the default formula update process shown in FIG. 28 at regular time intervals.
  • The data summarization system may also execute the default formula update process under other trigger conditions.
  • The default formula effective domain calculation unit 40 reads the effective domains of all approximate expressions from the confirmed graph storage unit 19 (step S401). It then calculates the default effective domain $S_{\mathrm{default}}$ using formula (1) described above (step S402).
  • The default formula effective domain calculation unit 40 outputs the calculated effective domain to the effective domain capacity evaluation unit 41.
  • The effective domain capacity evaluation unit 41 refers to the default effective domain $S_{\mathrm{default}}$ and the effective domain of each approximate expression, and identifies the approximate expression that requires the largest storage capacity to store its effective domain (step S403).
  • The effective domain capacity evaluation unit 41 outputs the identified approximate expression and the default effective domain to the effective domain update unit 42.
  • The effective domain update unit 42 determines whether the approximate expression identified in step S403 as requiring the largest storage capacity for its effective domain is the default formula (step S404).
  • If it is not the default formula (No in step S404), the effective domain update unit 42 makes the approximate expression identified in step S403 the new default formula (step S405). Specifically, the effective domain update unit 42 performs the following processing. It sets the approximate expression requiring the largest storage capacity for its effective domain as the new default formula and updates the default formula stored in the default formula storage unit 21 to this new default formula. It then deletes the approximate expression that has become the new default formula, together with its approximate expression ID, from the approximate expression storage unit 15, and stores the approximate expression that had been the default formula until then in the approximate expression storage unit 15.
  • To do so, the effective domain update unit 42 assigns an approximate expression ID to the approximate expression that had been the default formula and stores it together with that ID in the approximate expression storage unit 15.
  • The effective domain update unit 42 also deletes the approximate expression ID of the new default formula and its effective domain from the confirmed graph storage unit 19, and stores in the confirmed graph storage unit 19 the approximate expression ID assigned to the former default formula together with its effective domain (the effective domain calculated by the default formula effective domain calculation unit 40). Further, if the approximate expression ID of the new default formula is stored in the final time storage unit 11 as the final approximate expression ID, the effective domain update unit 42 replaces that ID with the default formula ID.
  • Conversely, if the default formula ID is stored in the final time storage unit 11, the effective domain update unit 42 updates it to the approximate expression ID assigned to the approximate expression that had been the default formula. If the approximate expression identified in step S403 is the default formula (Yes in step S404), the effective domain update unit 42 does not update the default formula and ends the process. That is, it ends the process without updating the contents of the default formula storage unit 21, the approximate expression storage unit 15, the confirmed graph storage unit 19, and the final time storage unit 11.
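Steps S403 to S405 can be sketched with in-memory dictionaries standing in for the approximate expression storage unit 15 and the confirmed graph storage unit 19. The function `update_default`, the reserved ID `"default"`, the `new_id` parameter, and the two-numbers-per-interval cost are assumptions for illustration; the remapping of IDs in the final time storage unit 11 is not modeled.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]    # closed [start, end] in occurrence-time units
Linear = Tuple[float, float]  # (a, b) for x = a * t + b
DEFAULT_ID = "default"        # hypothetical ID reserved for the default formula


def update_default(exprs: Dict[str, Linear], domains: Dict[str, List[Interval]],
                   default: Linear, s_default: List[Interval],
                   new_id: str) -> Tuple[Linear, str]:
    """Steps S403-S405 (sketch): promote the costliest expression to default.

    `exprs` and `domains` are modified in place. `new_id` is the hypothetical
    approximate expression ID given to the former default formula. Returns the
    default formula after the update and the ID that was promoted.
    """
    # S403: the default is listed first so that ties keep the current default.
    costs = {DEFAULT_ID: 2 * len(s_default)}
    costs.update({k: 2 * len(ivs) for k, ivs in domains.items()})
    winner = max(costs, key=lambda k: costs[k])
    if winner == DEFAULT_ID:  # S404 "Yes": leave everything unchanged
        return default, DEFAULT_ID
    # S405: the winner becomes the default and its stored domain is dropped;
    # the former default becomes an ordinary expression with domain s_default.
    new_default = exprs.pop(winner)
    del domains[winner]
    exprs[new_id] = default
    domains[new_id] = list(s_default)
    return new_default, winner
```

Applied to the example above, "f1" (six numerical values) is promoted, and the stored effective domains shrink from 6 + 4 = 10 to 4 + 4 = 8 numerical values.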
  • the data summarization system of the fourth embodiment compares the valid domains of the approximate expressions, including the default expression, and updates the default expression so that the approximate expression whose valid domain requires the largest storage capacity becomes the new default expression. Because no valid domain is stored for the default expression, updating the default expression in this way reduces the storage capacity required for valid domains, so the data summarization system of the fourth embodiment summarizes data more efficiently.
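The default-expression update in steps S403 to S405 can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the source: a valid domain is a list of `(start, end)` tuples (a single time point has `start == end`), the storage cost of a valid domain is simply the number of intervals and points, the default expression's valid domain has already been computed (as the default formula valid domain calculation unit 40 would do), and expression IDs are kept rather than reassigned. `storage_cost` and `update_default` are hypothetical names.

```python
# Hypothetical sketch of the fourth embodiment's default-expression update.
Interval = tuple[float, float]  # (start, end); start == end encodes a single time point


def storage_cost(domain: list[Interval]) -> int:
    """Assumed cost model: one storage unit per interval or point."""
    return len(domain)


def update_default(
    default_id: str,
    default_domain: list[Interval],
    stored: dict[str, list[Interval]],
) -> tuple[str, dict[str, list[Interval]]]:
    """Return (new default ID, valid domains that must still be stored)."""
    # Put the current default first so that it wins ties and no needless swap happens.
    candidates = {default_id: default_domain, **stored}
    best_id = max(candidates, key=lambda k: storage_cost(candidates[k]))
    if best_id == default_id:
        return default_id, stored  # Yes in step S404: nothing changes
    # The new default no longer needs a stored valid domain; the old default now does.
    remaining = {k: v for k, v in stored.items() if k != best_id}
    remaining[default_id] = default_domain
    return best_id, remaining
```

For example, with stored domains `{"f1": [(0, 1), (3, 4), (6, 6)], "f2": [(1, 3)]}` and a default `"f0"` valid on `[(4, 6)]`, `f1` becomes the default and only two intervals remain stored instead of four.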
  • the data summarization system of the fourth embodiment can reduce the capacity required for storing the effective domain by two in this example.
  • the data summarization system according to the fourth embodiment may also include an unsummarized data storage unit 30, an available storage area monitoring unit 31, and a summary control unit 32, so that data is stored as is while ample resources are available and is summarized when the available resources decrease.
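The source gives no algorithm for the summary control unit 32, so the following Python class is only a hypothetical sketch of the store-as-is-then-summarize policy described above. It assumes that the monitored resource is the number of buffered items, that summarization starts once free slots fall to a low-water mark, and that all buffered items are handed to a summarization callback at once. `SummaryController`, `capacity`, `low_water`, and `summarize` are illustrative names.

```python
from typing import Callable


class SummaryController:
    """Hypothetical stand-in for units 30 to 32: buffer raw data, summarize when space runs low."""

    def __init__(self, capacity: int, low_water: int, summarize: Callable[[list], None]):
        self.capacity = capacity    # assumed: resource measured as a number of buffered items
        self.low_water = low_water  # summarize once free slots drop to this level
        self.summarize = summarize  # e.g. a function that feeds items to the summarizer
        self.buffer: list = []      # plays the role of the unsummarized data storage unit

    def free_slots(self) -> int:
        # Monitoring step: how many more items still fit in the buffer.
        return self.capacity - len(self.buffer)

    def add(self, item) -> None:
        self.buffer.append(item)  # store as is while resources allow
        if self.free_slots() <= self.low_water:
            batch, self.buffer = self.buffer, []
            self.summarize(batch)  # resources are low: hand the buffered data over
```

Summarizing everything at once keeps the sketch short; a real controller might instead summarize only the oldest items.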
  • the embodiments above take as an example the case where the data includes a data value that is a numerical value.
  • however, the data value need not be numerical; the data may include any data value that can be converted into a numerical value such that differences between the converted values can be derived.
  • for example, text information may be used as a data value if a rule for converting it into numerical values is defined.
  • the data input unit 10 may convert the text information into a numerical value.
  • the subsequent processing is the same as in the above embodiment.
  • a vector may be included as a data value. That is, the data may include a vector and an occurrence time.
  • the new approximate expression generation unit 14 may generate an approximate expression for deriving an approximate value of a vector from a plurality of data (undefined points and newly generated data).
  • in step S106, when the graph evaluation unit 17 or the graph evaluation unit 17a compares the vector calculated as the approximate value with the vector actually included in the data, it may calculate the distance between the two vectors in the vector space.
  • the graph evaluation unit 17 or the graph evaluation unit 17a may then determine whether there is an approximate expression for which this distance is less than the threshold (or less than or equal to the threshold).
  • the threshold value ⁇ or less than ⁇ .
  • FIG. 29 is a block diagram showing the minimum configuration of the present invention.
  • the data summarization system of the present invention includes an approximate value calculation unit 61, an approximate expression evaluation unit 62, an unconfirmed data storage unit 63, a new approximate expression generation unit 64, and an update unit 65.
  • the approximate value calculation unit 61 (for example, the new data generation time substitution unit 12) works with approximate expressions, each of which calculates an approximate value of the data value in data consisting of a data value and its occurrence time, takes the occurrence time as its variable, and has the valid domain of that variable defined as a time interval or a set of time points. It substitutes the occurrence time included in the new data into each approximate expression and thereby calculates, for each approximate expression, an approximate value of the data value of the new data.
  • the approximate expression evaluation unit 62 (for example, the graph evaluation unit 17), based on the approximate value calculated for each approximate expression and the data value of the new data, either selects an approximate expression suitable for calculating the approximate value of the data value of the new data or determines that there is no such approximate expression.
  • the unconfirmed data storage unit 63 (for example, the unconfirmed point storage unit 13) stores new data determined to have no approximate expression suitable for calculating the approximate value of its data value as approximate-expression-unconfirmed data (for example, unconfirmed points).
  • when new data is input, the new approximate expression generation unit 64 (for example, the new approximate expression generation unit 14) determines whether a new approximate expression can be generated from the new data and the unconfirmed data; if so, it generates the new approximate expression and defines a time interval or a set of time points as its valid domain.
  • when the approximate expression evaluation unit 62 selects an approximate expression suitable for calculating the approximate value of the data value of the new data, the update unit 65 (for example, the graph update unit 18) updates the valid domain of that approximate expression so that it includes the occurrence time of the new data.
  • the data summarization system with the above configuration stores data in the form of approximate expressions and their valid domains, and defines the valid domain of each approximate expression as a time interval or a set of time points. The data summarization system therefore needs only a small storage capacity to store the approximate expressions and their valid domains.
  • the data summarization system can efficiently summarize (compress) the data.
  • this advantage is particularly pronounced when summarizing data that is generated sequentially with a certain tendency but may change greatly and irregularly.
  • a data summarization system having the following configuration is described.
  • (1) A data summarization system comprising:
  • an approximate value calculation unit (for example, the new data generation time substitution unit 12) that, for each approximate expression that calculates an approximate value of a data value in data including the data value and its occurrence time, takes the occurrence time as its variable, and has the valid domain of that variable defined as a time interval or a set of time points, substitutes the occurrence time included in new data and thereby calculates, for each approximate expression, an approximate value of the data value of the new data;
  • an approximate expression evaluation unit (for example, the graph evaluation unit 17) that, based on the approximate value calculated for each approximate expression and the data value of the new data, either selects an approximate expression suitable for calculating the approximate value of the data value of the new data or determines that there is no such approximate expression;
  • an unconfirmed data storage unit (for example, the unconfirmed point storage unit 13) that stores new data determined to have no suitable approximate expression as approximate-expression-unconfirmed data (for example, unconfirmed points);
  • a new approximate expression generation unit (for example, the new approximate expression generation unit 14) that determines whether a new approximate expression can be generated from the new data and the unconfirmed data, generates it if possible, and defines a time interval or a set of time points as its valid domain; and
  • an update unit (for example, the graph update unit 18) that, when the approximate expression evaluation unit selects an approximate expression suitable for calculating the approximate value of the data value of the new data, updates the valid domain of that approximate expression so that it includes the occurrence time of the new data.
  • (2) A data summarization system wherein the approximate expression evaluation unit identifies the approximate expressions for which the relationship between the approximate value and the data value of the new data satisfies a predetermined criterion (for example, the criterion stored in the accuracy constraint input unit 16). If exactly one approximate expression satisfies the criterion, it selects that approximate expression; if multiple approximate expressions satisfy the criterion, it selects one of them with reference to their valid domains and the occurrence time of the new data; and if none satisfies the criterion, it determines that there is no approximate expression suitable for calculating the approximate value of the data value of the new data.
  • (3) A data summarization system further comprising a default formula storage unit (for example, the default formula storage unit 21) that stores a default expression, that is, an approximate expression for calculating an approximate value of the data value for which no valid domain is defined, wherein the approximate value calculation unit substitutes the occurrence time included in the new data into each approximate expression, including the default expression, and thereby calculates an approximate value of the data value of the new data for each approximate expression.
  • the approximate expression evaluation unit identifies, from among the approximate expressions including the default expression, those for which the relationship between the approximate value and the data value of the new data satisfies the predetermined criterion. If exactly one approximate expression satisfies the criterion, it selects that approximate expression. If multiple approximate expressions satisfy the criterion and they include the default expression, it selects the default expression; if multiple approximate expressions satisfy the criterion but do not include the default expression, it selects one of them.
  • if no approximate expression satisfies the criterion, the data summarization system determines that there is no approximate expression suitable for calculating the approximate value of the data value of the new data.
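The selection rule with a default expression can be sketched in Python as follows. Two details are assumptions: the "predetermined criterion" is taken to be an absolute error below `tolerance`, and when several non-default expressions qualify, the sketch breaks the tie by smallest error, because the exact rule is not stated here. `select_expression` is a hypothetical name.

```python
from typing import Callable, Optional


def select_expression(
    t: float,
    value: float,
    exprs: dict[str, Callable[[float], float]],
    default_id: Optional[str],
    tolerance: float,
) -> Optional[str]:
    """Return the ID of the selected approximate expression, or None if none is suitable."""
    errors = {k: abs(f(t) - value) for k, f in exprs.items()}
    satisfied = [k for k, err in errors.items() if err < tolerance]  # assumed criterion
    if not satisfied:
        return None  # the caller stores the data as an unconfirmed point
    if len(satisfied) == 1:
        return satisfied[0]
    if default_id in satisfied:
        return default_id  # no valid domain is stored for the default expression
    # Hypothetical tie-break by smallest error; the source's own rule may differ.
    return min(satisfied, key=lambda k: errors[k])
```

Preferring the default expression means that data it covers never enlarges a stored valid domain.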
  • (4) A data summarization system further comprising: a default formula valid domain calculation unit (for example, the default formula valid domain calculation unit 40) that calculates the valid domain of the default expression; an effective domain capacity evaluation unit (for example, the effective domain capacity evaluation unit 41) that identifies, among the approximate expressions including the default expression, the approximate expression whose valid domain requires the largest storage capacity; and a default expression update unit (for example, the valid domain update unit 42) that, if the approximate expression requiring the largest storage capacity is not the default expression, stores it in the default formula storage unit as the new default expression and removes that approximate expression and its valid domain from the stored approximate expressions.
  • (5) A data summarization system further comprising: a new data storage unit (for example, the unsummarized data storage unit 30) that stores input new data; a monitoring unit (for example, the available storage area monitoring unit 31) that monitors the resources available for storing new data in the new data storage unit; and a summary control unit (for example, the summary control unit 32) that, depending on the monitored resources, outputs the new data stored in the new data storage unit for summarization.
  • Each form of the present invention can be suitably applied to a data summarization apparatus that summarizes data that is sequentially generated with a certain tendency and may change irregularly.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided is a data summarization system capable of efficiently compressing data that is generated sequentially with a certain tendency and that may change greatly and irregularly. Each approximation formula calculates an approximate value of a data value, takes the time of generation as its variable, and has the valid domain of that variable defined as a time interval or a set of time points. An approximate value calculation unit (61) substitutes the time of generation of new data into each approximation formula and thereby calculates, for each formula, an approximate value of the data value of the new data. On the basis of the data value of the new data and the approximate value calculated for each approximation formula, an approximation formula evaluation unit (62) either selects an approximation formula suitable for calculating the approximate value of the data value of the new data or determines that there is no such approximation formula. When the approximation formula evaluation unit (62) selects a suitable approximation formula, an update unit (65) updates the valid domain of that formula so that it includes the time of generation of the new data.

Description

Data summarization system, data summarization method and recording medium
The present invention relates to a data summarization system, a data summarization method, a data summarization program, and a recording medium that reduce the amount of information in sequentially generated data, and to a data structure applied to that data summarization system, data summarization method, data summarization program, and recording medium.
Techniques for dynamically compressing and storing continuously generated data have been proposed. For example, Patent Document 1 describes a difference-based data compression method for process data, which are time-series data values. In this method, each process data item is plotted on a two-dimensional plane with time on the x-axis and the process value on the y-axis, an arbitrary process data item is taken as a reference point, and the difference in process value between the reference point and the process data item being processed is computed. Process data items for which the absolute value of this difference exceeds the compression accuracy range calculated for that item are stored in the time-series process data storage unit, and the other process data items are thinned out as much as possible. This thinning is how the method described in Patent Document 1 compresses data. Here, the compression accuracy is set for each process data item and is used to decide whether that item is compressed; the larger the compression accuracy, the more likely the item is to be compressed. In other words, in the technique described in Patent Document 1, a high compression accuracy corresponds roughly to a high compression ratio, and a low compression accuracy to a low compression ratio. Patent Document 1 calls the reference point the pivot.
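A simplified Python sketch of this pivot-based thinning makes the later criticism easier to follow. It assumes a single constant compression accuracy (Patent Document 1 computes one per process data item) and assumes that the pivot moves to each sample that is stored; neither detail is confirmed by the description above, and `pivot_thin` is a hypothetical name.

```python
def pivot_thin(samples: list[tuple[float, float]], accuracy: float) -> list[tuple[float, float]]:
    """Keep (time, value) samples whose value differs from the pivot by more than `accuracy`."""
    if not samples:
        return []
    kept = [samples[0]]           # the first sample serves as the initial pivot
    pivot_value = samples[0][1]
    for t, v in samples[1:]:
        if abs(v - pivot_value) > accuracy:
            kept.append((t, v))   # outside the accuracy range: store it
            pivot_value = v       # assumed: the stored sample becomes the new pivot
        # otherwise the sample is thinned out (dropped)
    return kept
```

Each abrupt jump in the values forces at least one more sample to be stored, which is the weakness pointed out for data whose values often change suddenly.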
Patent Document 2 describes a technique that predicts the next input value from past input values, stores the difference between the actual input value and the predicted value, and thereby compresses data.
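The predict-and-store-the-residual idea can be shown with a minimal Python sketch. The predictor here (linear extrapolation from the last two values, falling back to the last value or to 0) is an assumption chosen for illustration and is not taken from Patent Document 2; `predict`, `encode`, and `decode` are hypothetical names.

```python
def predict(history: list[int]) -> int:
    """Assumed predictor: linear extrapolation from the last two values."""
    if not history:
        return 0
    if len(history) == 1:
        return history[-1]
    return 2 * history[-1] - history[-2]


def encode(values: list[int]) -> list[int]:
    # Store only the difference between each actual value and its prediction.
    return [v - predict(values[:i]) for i, v in enumerate(values)]


def decode(residuals: list[int]) -> list[int]:
    values: list[int] = []
    for r in residuals:
        values.append(r + predict(values))  # rebuild using the same predictor
    return values
```

A steady ramp such as `[10, 12, 14, 16]` encodes to small residuals `[10, 2, 0, 0]`, but an irregular jump to 40 produces a residual of 22, which is why irregular changes cost more storage.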
Patent Document 1: JP 2003-15734 A (paragraph 0039 and paragraphs 0063 to 0065)
Patent Document 2: JP 2006-259937 A (paragraphs 0031 to 0033)
The technique described in Patent Document 1 decides whether to compress process data by comparing the difference between the process value of the process data and the reference data with the compression accuracy. Consequently, when the process value of the process data to be compressed changes greatly and discontinuously, the difference between the process value and the reference value exceeds the compression accuracy, and the process data is less likely to be thinned out (that is, less likely to be compressed). The technique described in Patent Document 1 therefore has difficulty compressing data efficiently when the data value often changes greatly and suddenly.
The technique described in Patent Document 2 compresses data by storing the difference between a prediction based on past values and the actual input value. Consequently, if the value of the data to be compressed changes at irregular timings, the difference between the predicted value and the actual input value becomes large, and correspondingly more storage capacity is required. The technique described in Patent Document 2 therefore cannot compress such data efficiently.
One example of data that is generated sequentially and whose value may change greatly and suddenly is the CPU usage rate. As described above, the techniques described in Patent Documents 1 and 2 are not suited to compressing data whose values change greatly, suddenly, and irregularly.
Accordingly, one example object of the present invention is to provide a data summarization system, a data summarization method, a data summarization program, and a recording medium capable of efficiently compressing data that is generated sequentially with a certain tendency and that may change greatly and irregularly. Another example object is to provide a data structure suitably applied to such a data summarization system, data summarization method, data summarization program, and recording medium.
A data summarization system according to one aspect of the present invention includes: an approximate value calculation unit that, for each approximate expression that calculates an approximate value of a data value in data including the data value and its occurrence time, takes the occurrence time as its variable, and has the valid domain of that variable defined as a time interval or a set of time points, substitutes the occurrence time included in new data and thereby calculates, for each approximate expression, an approximate value of the data value of the new data; an approximate expression evaluation unit that, based on the approximate value calculated for each approximate expression and the data value of the new data, either selects an approximate expression suitable for calculating the approximate value of the data value of the new data or determines that there is no such approximate expression; an unconfirmed data storage unit that stores new data determined to have no suitable approximate expression as approximate-expression-unconfirmed data; a new approximate expression generation unit that, when new data is input, determines whether a new approximate expression can be generated from the new data and the unconfirmed data, generates it if possible, and defines a time interval or a set of time points as its valid domain; and an update unit that, when the approximate expression evaluation unit selects an approximate expression suitable for calculating the approximate value of the data value of the new data, updates the valid domain of that approximate expression so that it includes the occurrence time of the new data.
A data summarization method according to one aspect of the present invention comprises: for each approximate expression that calculates an approximate value of a data value in data including the data value and its occurrence time, takes the occurrence time as its variable, and has the valid domain of that variable defined as a time interval or a set of time points, substituting the occurrence time included in new data to calculate, for each approximate expression, an approximate value of the data value of the new data; based on the approximate value calculated for each approximate expression and the data value of the new data, either selecting an approximate expression suitable for calculating the approximate value of the data value of the new data or determining that there is no such approximate expression; storing new data determined to have no suitable approximate expression as approximate-expression-unconfirmed data; when new data is input, determining whether a new approximate expression can be generated from the new data and the unconfirmed data, generating it if possible, and defining a time interval or a set of time points as its valid domain; and, when an approximate expression suitable for calculating the approximate value of the data value of the new data is selected, updating the valid domain of that approximate expression so that it includes the occurrence time of the new data.
A recording medium according to one aspect of the present invention stores a data summarization program that causes a computer to execute: an approximate value calculation process of substituting the occurrence time included in new data into each approximate expression, each of which calculates an approximate value of a data value in data including the data value and its occurrence time, takes the occurrence time as its variable, and has the valid domain of that variable defined as a time interval or a set of time points, thereby calculating, for each approximate expression, an approximate value of the data value of the new data; an approximate expression evaluation process of either selecting, based on the approximate value calculated for each approximate expression and the data value of the new data, an approximate expression suitable for calculating the approximate value of the data value of the new data, or determining that there is no such approximate expression; an unconfirmed data storage process of storing new data determined to have no suitable approximate expression in an unconfirmed data storage unit as approximate-expression-unconfirmed data; a new approximate expression generation process of determining, when new data is input, whether a new approximate expression can be generated from the new data and the unconfirmed data, generating it if possible, and defining a time interval or a set of time points as its valid domain; and an update process of updating, when the approximate expression evaluation process selects an approximate expression suitable for calculating the approximate value of the data value of the new data, the valid domain of that approximate expression so that it includes the occurrence time of the new data.
A data structure according to one aspect of the present invention associates an approximate expression, into which a variable is substituted to calculate an approximate value of a data value, with a valid domain, which is the domain of the variable over which the approximate value of the data value can be obtained. The valid domain is represented as a set of intervals of the variable or of points each representing a single value of the variable.
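A minimal Python sketch of this data structure follows, assuming closed intervals and a separate set for isolated points; the class name `ApproxExpression` and its fields are illustrative only. The usage mirrors FIG. 1, where one expression (function 91) is valid on two separate intervals.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ApproxExpression:
    """An approximate expression paired with its valid domain (hypothetical layout)."""
    func: Callable[[float], float]
    intervals: list[tuple[float, float]] = field(default_factory=list)  # closed [start, end]
    points: set[float] = field(default_factory=set)  # isolated single times

    def covers(self, t: float) -> bool:
        """True if t lies in any interval or equals one of the points."""
        return t in self.points or any(a <= t <= b for a, b in self.intervals)

    def approximate(self, t: float) -> float:
        if not self.covers(t):
            raise ValueError(f"t={t} is outside the valid domain")
        return self.func(t)


# Like function 91 in FIG. 1: one expression valid on two separate intervals.
f91 = ApproxExpression(lambda t: 2.0 * t, [(0, 10), (20, 30)])
```

Only the coefficients of the expression and the interval endpoints are stored, no matter how many data items each interval covers.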
Each aspect of the present invention can efficiently compress data that is generated sequentially with a certain tendency and that may change greatly and irregularly. The data structure according to one aspect of the present invention can be suitably used in a data summarization system, a data summarization method, a data summarization program, and a recording medium having this advantage.
FIG. 1 is an explanatory diagram showing an example of valid domains.
FIG. 2 is a block diagram illustrating an example of the data summarization system according to the first embodiment of the present invention.
FIG. 3 is an explanatory diagram illustrating an example of one data item input to the data input unit 10.
FIG. 4 is an explanatory diagram illustrating an example of unconfirmed points stored in the unconfirmed point storage unit 13.
FIG. 5 is an explanatory diagram schematically showing unconfirmed points and newly generated data.
FIG. 6 is an explanatory diagram schematically showing an approximate expression generated from the data shown in FIG. 5.
FIG. 7 is an explanatory diagram illustrating an example of the representation format of approximate expressions.
FIG. 8 is an explanatory diagram showing an example of approximate expressions stored in the approximate expression storage unit 15 and their approximate expression IDs.
FIG. 9 is an explanatory diagram illustrating an example of final data information.
FIG. 10 is an explanatory diagram showing an example of processing by the new data generation time substitution unit 12.
FIG. 11 is an explanatory diagram showing an example of the valid domain of each approximate expression.
FIG. 12 is an explanatory diagram illustrating an example of selecting an approximate expression that satisfies a criterion.
FIG. 13 is an explanatory diagram illustrating an example of selecting an approximate expression that satisfies a criterion.
FIG. 14 is an explanatory diagram illustrating an example of selecting an approximate expression that satisfies a criterion.
FIG. 15 is an explanatory diagram illustrating an example of updating valid domains when a new approximate expression is generated.
FIG. 16 is a flowchart illustrating an example of the processing flow of the first embodiment.
FIG. 17 is a flowchart illustrating an example of the processing flow of step S105.
FIG. 18 is an explanatory diagram illustrating an example in which data is compressed by applying the technique described in Patent Document 1.
FIG. 19 is an explanatory diagram illustrating an example of a situation in which irregular data is temporarily generated in succession.
FIG. 20 is a block diagram illustrating an example of the data summarization system according to the second embodiment of the present invention.
FIG. 21 is an explanatory diagram showing an example of a default expression.
FIG. 22 is an explanatory diagram showing an example in which a default expression is expressed by a constant term only.
FIG. 23 is a block diagram illustrating an example of the data summarization system according to the third embodiment of the present invention.
FIG. 24 is a schematic diagram in which the data stored in the unsummarized data storage unit 30 is arranged in order of occurrence time.
FIG. 25 is a block diagram illustrating an example of the data summarization system according to the fourth embodiment of the present invention.
FIG. 26 is an explanatory diagram showing an example of deriving the valid domain of a default expression.
FIG. 27 is an explanatory diagram summarizing the valid domains of x = f1(t) and x = f2(t) illustrated in FIG. 26.
FIG. 28 is a flowchart illustrating an example of the processing flow of the default expression update process in the fourth embodiment.
FIG. 29 is a block diagram showing the minimum configuration of the present invention.
Embodiments of the present invention are described below with reference to the drawings.
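First, an outline of the data summarization system according to one aspect of the present invention is given. This data summarization system reduces the amount of information in data generated sequentially over time. By reducing the amount of information, it requires less storage capacity than storing the data exactly as is. Reducing the information in data in this way is called "summarization".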
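In general, summarization lowers the accuracy of data. However, the data summarization system according to one aspect of the present invention accurately and efficiently summarizes data that is generated sequentially over time with a certain tendency but whose tendency may change irregularly. The CPU usage rate was given above as an example of such data, but such data is not limited to the CPU usage rate. For example, the number of accesses per unit time to a monitored Web page, or the total number of accesses since the Web page was published, also changes abruptly and therefore qualifies as "data that is generated sequentially with a certain tendency and may change irregularly". The traffic volume of a network device is another example of such data. Note that "sequentially generated data" also covers "sequentially observable data".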
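In the following description, numerical values generated over time are used as the example of "data generated sequentially with a certain tendency that may change irregularly". The data themselves need not be numerical values, however; any data suffice that can be quantified and for which the difference between quantified data items can be derived. Examples of applying the present invention to such non-numerical data are described later.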
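In the examples below, each data item includes a data value and the occurrence time of that data value. In the following description, the occurrence time of the data value may be referred to simply as the occurrence time of the data.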
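The data summarization system according to one aspect of the present invention derives, from a set of generated data, a function that takes the occurrence time as its variable and computes a data value (numerical value). This function is an approximate expression that gives an approximate value of the data value from the occurrence time. For this approximate expression, a domain over which it can give approximate values of the data value is defined; this domain is hereinafter referred to as the valid domain. A valid domain is represented as a set of intervals (time periods) and/or points (specific times), and a single valid domain may contain a plurality of intervals or points. FIG. 1 is an explanatory diagram illustrating an example of valid domains, with time on the horizontal axis and data values on the vertical axis. FIG. 1 illustrates a case in which the data value changes with the same tendency over the interval a to b, the tendency changes greatly at time b, and the tendency changes again at time c. Assume that functions 91 and 92 have been obtained as approximate expressions. In the example of FIG. 1, approximate values for the data generated in the intervals a to b and c to d are given by the approximate expression represented by function 91, so the valid domain of that expression is the union of the intervals a to b and c to d. The valid domain of the approximate expression represented by function 92 is the interval b to c.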
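When new data are generated, the data summarization system determines whether there is an approximate expression that appropriately gives an approximate value of the data value; if there is, it adds the occurrence time of the new data to the valid domain of that approximate expression. If no approximate expression appropriately approximates the newly generated data, the system stores the new data as a point for which no corresponding approximate expression has been determined (hereinafter, an undetermined point). Once enough undetermined points have accumulated to derive a new approximate expression, the system creates an approximate expression from them.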
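In this way, rather than storing the occurrence time and data value of every data item, the data summarization system stores the data in the form of approximate expressions and valid domains. Furthermore, it allows a plurality of intervals and points (specific times) to be defined as the valid domain of a single approximate expression. As a result, the system compresses (that is, summarizes) data efficiently.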
Embodiment 1.
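FIG. 2 is a block diagram illustrating an example of a data summarization system according to the first embodiment of the present invention. The data summarization system of the first embodiment includes a data input unit 10, a final time storage unit 11, a new data occurrence time substitution unit 12, a new approximate expression generation unit 14, an undetermined point storage unit 13, an approximate expression storage unit 15, an accuracy constraint input unit 16, a graph evaluation unit 17, a graph update unit 18, and a determined graph storage unit 19. It summarizes data sequentially input to the data input unit 10 and stores the summarized result in the determined graph storage unit 19.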
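The data input unit 10 acquires data from a data generation source (not shown) that generates data sequentially over time. The form of the data generation source depends on the type of data. For example, when the data are the number of accesses to a Web page, a Web server may be the data generation source; when the data are the CPU usage rate, a unit that monitors the CPU usage rate may be the data generation source. Each data item input from the data generation source to the data input unit 10 includes at least a data value and the occurrence time of that data value.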
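FIG. 3 is an explanatory diagram illustrating an example of one data item input to the data input unit 10, in a case where the data value is a CPU usage rate. As shown in FIG. 3, the data item includes the occurrence time of the data value and the data value (the CPU usage rate in this example). The data item in FIG. 3 indicates that the data occurred at "2009/01/01 00:00:00" and that the CPU usage rate at that time was 5.0%. Data generated at the data generation source and input to the data input unit 10 may be referred to as generated data.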
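The undetermined point storage unit 13 is a storage device that stores data for which no approximate expression for computing an approximate value of the data value has yet been identified. Data for which the difference between the actual data value and the approximate value is judged to be large, whichever existing approximate expression is used, are stored in the undetermined point storage unit 13 as undetermined points. In the first embodiment, a data item can be regarded as a point whose coordinates are its occurrence time and data value; data for which no approximate expression has yet been identified are therefore referred to as "undetermined points".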
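An approximate expression that gives an approximate value of the data value with the occurrence time as its variable is derived from a plurality of undetermined points stored in the undetermined point storage unit 13. The undetermined point storage unit 13 keeps storing generated data that are undetermined points until the number of data needed to determine an approximate expression has been collected. That number depends on the type of approximate expression (linear, quadratic, trigonometric, and so on) and on the algorithm used to determine it. The type of approximate expression and the determination algorithm are fixed in advance, and the number of data needed is fixed in advance accordingly.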
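FIG. 4 is an explanatory diagram illustrating an example of undetermined points stored in the undetermined point storage unit 13. Each undetermined point includes its occurrence time and its data value (the CPU usage rate in this example). Because undetermined points are generated data for which no corresponding approximate expression could be identified, each undetermined point has the same data structure as the data item illustrated in FIG. 3.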
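When new generated data are input to the new approximate expression generation unit 14, the unit determines whether the number of data, counting the new data and the undetermined points stored in the undetermined point storage unit 13, has reached the number needed to determine an approximate expression. If the needed number of data has been collected, it generates from those data a function (approximate expression) that computes the data value with the occurrence time as its variable. For example, assume that k data are needed to generate an approximate expression and that k−1 undetermined points are stored in the undetermined point storage unit 13. When one new generated data item is then input, the new approximate expression generation unit 14 generates an approximate expression from the k data consisting of the undetermined points plus that data item.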
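After generating an approximate expression, the new approximate expression generation unit 14 outputs the generated expression and each data item used to generate it (that is, the undetermined points stored in the undetermined point storage unit 13 and the newly input generated data item) to the graph update unit 18. Alternatively, instead of outputting the newly generated expression itself, the new approximate expression generation unit 14 may store the expression and its ID (identification information) in the approximate expression storage unit 15 and output only the ID to the graph update unit 18.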
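FIG. 5 is an explanatory diagram schematically showing the undetermined points used to generate an approximate expression and newly generated data. The horizontal axis in FIG. 5 represents time t and the vertical axis represents the data value x. Here, assume that the new approximate expression generation unit 14 generates a linear expression by the least squares method, and that four data are needed in this case. Suppose that the three undetermined points P400 shown in FIG. 5 are stored in the undetermined point storage unit 13 and new data P401 are input to the data input unit 10. The new approximate expression generation unit 14 then obtains an approximate expression from the three undetermined points P400 and the new generated data P401 by the least squares method.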
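FIG. 6 is an explanatory diagram schematically showing the approximate expression generated from the data shown in FIG. 5. From the four data (that is, four pairs of occurrence time and data value), the new approximate expression generation unit 14 determines by the least squares method the approximate expression "x = at + b", which expresses the data value x as a linear function of the occurrence time t. That is, it determines the coefficient "a" of the variable t and the constant term "b" from the four pairs. By determining the coefficient and the constant term in this way, the unit generates the approximate expression.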
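As a minimal sketch, assuming plain Python floats and the linear form x = a t + b of FIG. 6, the least-squares fit over the (t, x) pairs can be written as below. The function name `fit_line_least_squares`, the constant `REQUIRED_POINTS`, and the sample values are hypothetical and only illustrate the computation; the returned pair (a, b) corresponds to the first-order coefficient and constant term that FIG. 7 stores.

```python
def fit_line_least_squares(points):
    """Fit x = a*t + b to (t, x) pairs by ordinary least squares.

    Returns (a, b): the first-order coefficient and the constant term.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_t = sum(t for t, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    s_tt = sum((t - mean_t) ** 2 for t, _ in points)
    if s_tt == 0:
        raise ValueError("occurrence times must not all be equal")
    s_tx = sum((t - mean_t) * (x - mean_x) for t, x in points)
    a = s_tx / s_tt
    b = mean_x - a * mean_t
    return a, b


REQUIRED_POINTS = 4  # as in the FIG. 5 example; a design parameter, not a mathematical minimum

# Hypothetical values: three undetermined points (like P400) and one new datum (like P401)
undetermined = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9)]
new_datum = (3.0, 7.0)
if len(undetermined) + 1 >= REQUIRED_POINTS:
    a, b = fit_line_least_squares(undetermined + [new_datum])
    print(f"x = {a:.2f} t + {b:.2f}")
```

Only the two numbers a and b need to be kept afterwards, which is what makes the expression cheaper to store than the four raw points.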
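FIG. 7 is an explanatory diagram illustrating an example of the representation format of approximate expressions generated by the new approximate expression generation unit 14. The approximate expression "x = at + b" is uniquely determined once the coefficient of the variable t (first-order coefficient) and the constant term are fixed. Therefore, when the approximate expression is linear, it may be represented by the first-order coefficient of t and the constant term alone, as illustrated in FIG. 7.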
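This example shows the new approximate expression generation unit 14 obtaining a linear approximate expression from three undetermined points and one new generated data item by the least squares method, but the function used as the approximate expression is not limited to a linear function. For example, the unit may generate an approximate expression represented by a polynomial of degree two or higher, an exponential function, or a trigonometric function, and the generation method is not limited to the least squares method either. As already described, the number of data needed to generate an approximate expression depends on the type of expression and the generation method, and the unit generates the expression when the number of undetermined points plus newly generated data reaches that number. The unit may also generate the approximate expression as a linear function connecting two points, in which case two data suffice: from one undetermined point and one newly generated data item, it may generate the straight line connecting the two points in the plane of occurrence time t and data value x. The unit may also generate the approximate expression by other methods such as spline interpolation.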
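For the two-point variant described above, a sketch that returns the same (a, b) representation might look like this; the name `line_through_two_points` and the sample values are hypothetical.

```python
def line_through_two_points(p, q):
    """Return (a, b) for the line x = a*t + b through p = (t1, x1) and q = (t2, x2)."""
    (t1, x1), (t2, x2) = p, q
    if t1 == t2:
        raise ValueError("occurrence times must differ")
    a = (x2 - x1) / (t2 - t1)
    return a, x1 - a * t1


# Hypothetical: one undetermined point and one newly generated datum
a, b = line_through_two_points((1.0, 2.0), (3.0, 6.0))
```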
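The following description takes as an example the case in which the new approximate expression generation unit 14 generates a linear approximate expression.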
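The approximate expression storage unit 15 is a storage device that stores approximate expressions for obtaining data values with the occurrence time as the variable, each together with its ID. When the new approximate expression generation unit 14 generates a new approximate expression and outputs it to the graph update unit 18, the graph update unit 18 stores the expression together with an ID in the approximate expression storage unit 15; in this case the graph update unit 18 may assign the ID. The approximate expression storage unit 15 may store, for example, the combination of first-order coefficient and constant term, as in FIG. 7. This storage format is only an example, and other formats may be used.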
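FIG. 8 is an explanatory diagram illustrating an example of approximate expressions and their approximate expression IDs stored in the approximate expression storage unit 15. As shown in FIG. 8, the approximate expression storage unit 15 stores each approximate expression ID, which identifies an approximate expression, in association with that approximate expression (represented in this example by a combination of first-order coefficient and constant term).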
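The final time storage unit 11 is a storage device that stores a pair consisting of the occurrence time of the most recently generated data item that is not an undetermined point and the approximate expression that approximates its data value. That is, among the data for which an approximate expression that appropriately gives an approximate value has been identified, it stores the occurrence time of the most recent data item together with that approximate expression. Hereinafter, the combination of data occurrence time and approximate expression stored in the final time storage unit 11 is referred to as final data information, and the approximate expression indicated by the final data information is referred to as the final approximate expression.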
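FIG. 9 is an explanatory diagram illustrating an example of the final data information stored in the final time storage unit 11. In the example of FIG. 9, the final data information includes an approximate expression ID, which identifies the approximate expression, and a final time. The final time is the occurrence time of the most recently generated data item among the data whose values can be approximated by an approximate expression. Undetermined points stored in the undetermined point storage unit 13 cannot be approximated by any known approximate expression and are therefore never stored in the final time storage unit 11. Consequently, even if an undetermined point is generated after the stored final time, the final time is not updated.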
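Although the approximate expression is represented by its ID in FIG. 9, the final time storage unit 11 may store the approximate expression in the same format as the expressions generated by the new approximate expression generation unit 14 or stored in the approximate expression storage unit 15. For example, instead of the approximate expression ID, it may store the combination of first-order coefficient and constant term.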
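The new data occurrence time substitution unit 12 substitutes the occurrence time of the generated data newly input to the data input unit 10 into each previously generated approximate expression and calculates approximate values of the data value. It then outputs each pair of approximate expression and the approximate value calculated with it to the graph evaluation unit 17. At this time, it reads the final data information from the final time storage unit 11 and also outputs to the graph evaluation unit 17 information indicating which pair consists of the approximate expression indicated by the final data information and the approximate value obtained from it. It further outputs the generated data newly input to the data input unit 10 to the graph evaluation unit 17.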
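Obtaining an approximate value by substituting the occurrence time of newly generated data into an approximate expression amounts to extending the line that the expression represents, in the plane with time on the horizontal axis and data value on the vertical axis, up to the latest occurrence time on the horizontal axis.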
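FIG. 10 is an explanatory diagram illustrating an example of the processing of the new data occurrence time substitution unit 12. In the example of FIG. 10, four approximate expressions x = f0(t), x = f1(t), x = f2(t), and x = f3(t) have been generated in the past. The value of x given by x = f0(t) is the constant 0; that is, f0(t) = 0·t + 0. In FIG. 10, each black dot represents a data item, and the data shown near the line representing an approximate expression are the data whose values can be approximated by that expression. For example, in FIG. 10 the data values of six data items are approximated by x = f1(t).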
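In FIG. 10, the final time indicated by the final data information is t_last, and the data value of data P1010 generated at the final time t_last is approximated by the approximate expression x = f2(t).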
Let t be the occurrence time of the new generated data input to the data input unit 10 after the final time t_last.
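The new data occurrence time substitution unit 12 substitutes the occurrence time t of the new generated data into each of the approximate expressions x = f0(t), x = f1(t), x = f2(t), and x = f3(t) stored in the approximate expression storage unit 15 and calculates approximate values. Denoting these approximate values by X1010, X1011, X1012, and X1013, the unit calculates X1010 = f0(t), X1011 = f1(t), X1012 = f2(t), and X1013 = f3(t). It also reads the final data information from the final time storage unit 11 and identifies that the pair consisting of the final approximate expression and the approximate value obtained from it is (x = f2(t), X1012). It then outputs to the graph evaluation unit 17 each pair of approximate expression and approximate value, information indicating which pair is based on the final approximate expression, and the new data generated at time t. The approximate expressions may be represented either by their IDs or in the format in which they are stored in the approximate expression storage unit 15.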
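The accuracy constraint input unit 16 receives from the user, and stores, a criterion (accuracy) under which an approximate value calculated with an approximate expression can be said to approximate the actual data value appropriately. One example criterion is that the absolute value of the difference between the approximate value f(t) given by the approximate expression f and the data value x of the actual generated data is less than a predetermined threshold ε, written |x − f(t)| < ε. Another example is that the absolute value of the ratio of the difference between f(t) and x to the approximate value f(t) is less than the threshold ε, written |(x − f(t))/f(t)| < ε. Both examples require the calculated value to be less than the threshold, but criteria requiring it to be less than or equal to the threshold may also be used. These criteria are examples, and other criteria may be defined.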
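The two example criteria, each with a strict "<" and a non-strict "≤" variant, might be sketched as follows. The function names are hypothetical, and the handling of f(t) = 0 in the relative criterion is an assumption, since the text does not address that case.

```python
def abs_criterion(x, approx, eps, strict=True):
    """|x - f(t)| < eps (or <= eps when strict is False)."""
    d = abs(x - approx)
    return d < eps if strict else d <= eps


def rel_criterion(x, approx, eps, strict=True):
    """|(x - f(t)) / f(t)| < eps (or <= eps when strict is False)."""
    if approx == 0:
        return False  # assumption: the ratio is undefined, so treat the criterion as unmet
    r = abs((x - approx) / approx)
    return r < eps if strict else r <= eps
```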
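The following description takes as an example the case in which the criterion that the absolute value of the difference between the approximate value f(t) and the data value x of the actual generated data is less than the predetermined threshold ε (that is, |x − f(t)| < ε) has been input to the accuracy constraint input unit 16, which stores it.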
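The determined graph storage unit 19 is a storage device that stores the valid domain of each approximate expression that approximates past generated data. FIG. 11 is an explanatory diagram illustrating an example of the valid domain of each approximate expression stored in the determined graph storage unit 19. In FIG. 11, each approximate expression is represented by its approximate expression ID, and for each ID a time range, a single time, or both are defined as parts of the valid domain. Parts represented as time ranges (that is, time periods) are shown as "intervals", and parts represented as single specific times are shown as "points". Because a valid domain may include several "intervals" and "points", two or more "intervals" or two or more "points" may be defined for one approximate expression.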
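In the example of FIG. 11, three intervals [t01b, t01e], [t02b, t02e], and [t03b, t03e] are defined for approximate expression ID "f0", where t01b, t01e, and so on are the times at which the intervals start or end. Accordingly, the approximate expression with ID "f0" (call it x = f0(t)) can approximate the data value at any time t satisfying t01b ≤ t ≤ t01e, t02b ≤ t ≤ t02e, or t03b ≤ t ≤ t03e by the approximate value f0(t).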
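Similarly, one interval [t11b, t11e] and two times t12 and t13 are defined for approximate expression ID "f1". Accordingly, the approximate expression with ID "f1" (call it x = f1(t)) can approximate the data value at any time t satisfying t11b ≤ t ≤ t11e, t = t12, or t = t13 by the approximate value f1(t).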
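The set of these "intervals" and "points" is the valid domain. In the example of FIG. 11, the valid domain of the approximate expression with ID "f0" is [t01b, t01e] ∪ [t02b, t02e] ∪ [t03b, t03e] = {t | t01b ≤ t ≤ t01e, t02b ≤ t ≤ t02e, t03b ≤ t ≤ t03e}. Similarly, the valid domain of the expression with ID "f1" is [t11b, t11e] ∪ {t12} ∪ {t13} = {t | t11b ≤ t ≤ t11e, t = t12, t = t13}, and that of the expression with ID "f2" is [t22b, t22e] ∪ [t23b, t23e] ∪ {t21} = {t | t22b ≤ t ≤ t22e, t23b ≤ t ≤ t23e, t = t21}. The valid domains of the other approximate expressions can likewise be identified from the information stored in the determined graph storage unit 19.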
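The determined graph storage unit 19 may store information with the following data structure: an approximate expression, into which a variable is substituted to calculate an approximate value of a data value, is associated with a valid domain, which is the domain of the variable over which the approximate value can be obtained, and the valid domain is represented as a set of intervals of the variable and/or points each representing a single value of the variable. In the first embodiment, this variable represents time.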
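In this data structure, an interval or point of the valid domain associated with another approximate expression may lie between two intervals, between two points, or between an interval and a point of the valid domain associated with a given approximate expression. For example, suppose the intervals and points shown in FIG. 11 occur in the chronological order [t11b, t11e], [t01b, t01e], t12, [t02b, t02e], t21, t13, [t22b, t22e], [t03b, t03e], [t31b, t31e], [t23b, t23e]. Then the point t12 of approximate expression "f1" lies between the intervals [t01b, t01e] and [t02b, t02e] of approximate expression "f0"; the interval [t02b, t02e] of "f0" and the point t21 of "f2" lie between the points t12 and t13 of "f1"; and the interval [t01b, t01e] of "f0" lies between the interval [t11b, t11e] and the point t12 of "f1". All of these arrangements are allowed: elements (intervals or points) of the valid domain of one approximate expression may be interleaved with elements of the valid domains of other approximate expressions. The data summarization system, data summarization method, and data summarization program in each aspect of the present invention can suitably use this data structure.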
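The data structure of FIG. 11 can be sketched as a mapping from approximate expression IDs to valid domains, each holding a list of closed intervals and a list of isolated points. The class name `ValidDomain`, the helper `overlapping`, and the numeric times standing in for t01b, t12 and so on are hypothetical; the times are chosen so that the elements interleave in the chronological order discussed for FIG. 11.

```python
from dataclasses import dataclass, field


@dataclass
class ValidDomain:
    """Valid domain of one approximate expression: closed intervals plus isolated points."""
    intervals: list = field(default_factory=list)  # (begin, end) pairs
    points: list = field(default_factory=list)     # single times

    def contains(self, t):
        return any(b <= t <= e for b, e in self.intervals) or t in self.points

    def stored_numbers(self):
        # an interval costs two numbers (begin, end); a point costs one
        return 2 * len(self.intervals) + len(self.points)


def overlapping(domains):
    """True if the valid domains of two different expressions share any time."""
    items = list(domains.values())
    for i, d1 in enumerate(items):
        for d2 in items[i + 1:]:
            for b, e in d2.intervals:
                if any(b <= p <= e for p in d1.points):
                    return True
                if any(b1 <= e and b <= e1 for b1, e1 in d1.intervals):
                    return True
            if any(d1.contains(p) for p in d2.points):
                return True
    return False


# Hypothetical times giving the chronological order
# [t11b,t11e], [t01b,t01e], t12, [t02b,t02e], t21, t13,
# [t22b,t22e], [t03b,t03e], [t31b,t31e], [t23b,t23e]
determined_graph = {
    "f0": ValidDomain(intervals=[(3, 4), (6, 7), (12, 13)]),
    "f1": ValidDomain(intervals=[(1, 2)], points=[5, 9]),
    "f2": ValidDomain(intervals=[(10, 11), (16, 17)], points=[8]),
    "f3": ValidDomain(intervals=[(14, 15)]),
}
```

Interleaving is permitted, but the sketch's `overlapping` check reflects the requirement elsewhere in the text that no two valid domains cover the same time.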
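The graph evaluation unit 17 compares each approximate value, which the new data occurrence time substitution unit 12 calculated by substituting the occurrence time of the new data into each approximate expression stored in the approximate expression storage unit 15, with the actual data value of the new data, and identifies the approximate expressions that satisfy the criterion stored in the accuracy constraint input unit 16. If more than one approximate expression satisfies the criterion, the graph evaluation unit 17 identifies the one whose valid domain update requires the smallest increase in storage capacity and decides to approximate the data value of the new data with it. If exactly one approximate expression satisfies the criterion, the graph evaluation unit 17 decides to approximate the data value of the new data with that expression. The graph evaluation unit 17 then outputs the chosen approximate expression, its valid domain, and the new data (the generated data received from the new data occurrence time substitution unit 12) to the graph update unit 18, together with information indicating whether the chosen expression is the final approximate expression. It may output, for example, the approximate expression ID as the chosen expression, and it may read the valid domain from the determined graph storage unit 19.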
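When the approximate expressions output from the new data occurrence time substitution unit 12 to the graph evaluation unit 17 are represented as approximate expression IDs, the graph evaluation unit 17 also reads all the approximate expressions stored in the approximate expression storage unit 15.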
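The graph evaluation unit 17 may be unable to identify any approximate expression that satisfies the criterion stored in the accuracy constraint input unit 16; that is, no such expression may exist. In that case, the graph evaluation unit 17 selects no approximate expression and outputs only the new data (the generated data received from the new data occurrence time substitution unit 12) to the graph update unit 18.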
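An example of approximate expression selection by the graph evaluation unit 17 is described concretely with reference to FIGS. 12 to 14, which are explanatory diagrams illustrating examples of selecting an approximate expression that satisfies the criterion stored in the accuracy constraint input unit 16. I11, I01, I12, and so on in FIGS. 12 to 14 are intervals or points included in valid domains and correspond to the intervals and points illustrated in FIG. 11. In each of FIGS. 12 to 14, the previously generated approximate expressions are x = f0(t), x = f1(t), x = f2(t), and x = f3(t). The valid domain of x = f0(t) is I01 ∪ I02 ∪ I03, that of x = f1(t) is I11 ∪ I12 ∪ I13, that of x = f2(t) is I21 ∪ I22 ∪ I23, and that of x = f3(t) is I31. Let t be the occurrence time of the new generated data received by the graph evaluation unit 17, and x its data value.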
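FIG. 12 illustrates a case in which data P1021 are generated at time t. In the example of FIG. 12, the approximate value obtained by substituting the occurrence time t into x = f1(t) is X1011 = f1(t). If x = f1(t) is the only approximate expression satisfying the criterion that the absolute difference between the data value x of P1021 and the approximate value is less than the threshold ε, that is, if only |x − X1011| < ε holds, the graph evaluation unit 17 selects x = f1(t).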
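When several approximate expressions satisfy the accuracy criterion, the graph evaluation unit 17 considers each of those expressions in turn. For each one, it calculates how much the storage capacity needed to store the valid domain would increase if the valid domain were updated on the assumption that this expression represents the new generated data. It then selects the approximate expression with the smallest increase. FIG. 13 illustrates an example of this selection.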
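FIG. 13 illustrates a case in which data P1022 are generated at time t. In the example of FIG. 13, the approximate value obtained by substituting t into x = f2(t) is X1012 = f2(t), and that obtained by substituting t into x = f3(t) is X1013 = f3(t). Both approximate expressions satisfy the criterion that the absolute difference between the data value x of P1022 and the approximate value is less than the threshold ε; that is, both |x − X1012| < ε and |x − X1013| < ε hold.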
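In this case, the graph evaluation unit 17 selects, from among the approximate expressions satisfying the criterion, the final approximate expression if its valid domain terminates with the end point of an "interval". Adding t to the valid domain of such a final approximate expression only requires replacing the end point of that "interval" with t, so the storage capacity needed to represent the valid domain does not increase at all. By contrast, for an approximate expression other than the final approximate expression, or for a final approximate expression whose valid domain terminates with a "point", adding t increases the storage capacity needed for the valid domain by one numerical value. Therefore, when several approximate expressions satisfy the criterion, the graph evaluation unit 17 selects the final approximate expression whose valid domain terminates with the end point of an "interval".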
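In the example of FIG. 13, if the approximate expression representing generated data P1022 is taken to be x = f2(t), updating the valid domain does not increase the storage capacity needed to represent it. Before P1022 is input, the valid domain of x = f2(t) is I21 ∪ I22 ∪ I23. With I21 = {t21}, I22 = [t22b, t22e], and I23 = [t23b, t23e], this valid domain is represented by the five numerical values t21, t22b, t22e, t23b, and t23e. Its terminal element is the end point (right end) t23e = t_last of interval I23, so adding time t to I23 gives I23 ∪ {t} = [t23b, t]. That is, when the valid domain of x = f2(t) is updated to include t, the updated valid domain can be written {t21} ∪ [t22b, t22e] ∪ [t23b, t], which still takes five numerical values.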
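In contrast, in the example of FIG. 13, if the approximate expression representing P1022 were x = f3(t), updating the valid domain would increase the storage capacity needed to represent it. Before P1022 is input, the valid domain of x = f3(t) is I31 = [t31b, t31e], represented by the two numerical values t31b and t31e. Here t31e < t_last < t. Because t_last belongs to the valid domain of x = f2(t), the graph evaluation unit 17 cannot include t_last in the valid domain of x = f3(t). To add t to the valid domain of x = f3(t), it must therefore add {t} as a point, giving I31 ∪ {t} = [t31b, t31e] ∪ {t}. The valid domain then needs the three numerical values t31b, t31e, and t, one more than before.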
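Accordingly, in the example of FIG. 13, the graph evaluation unit 17 selects x = f2(t) from among the approximate expressions that satisfy the criterion.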
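Even for a final approximate expression whose valid domain terminates with a "point", adding a new occurrence time t increases the storage capacity needed to represent the valid domain by one numerical value. This is because a new interval must be defined that starts at the time previously stored as a "point" and ends at t; that is, the part that had been stored as a "point" must now be stored as an "interval".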
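The storage-increase rule of the preceding paragraphs can be sketched as a small function. It assumes that a valid domain is stored as a list of (begin, end) intervals and a list of isolated points; the name `storage_increase` and the numeric times in the tests (which mimic FIG. 13) are hypothetical.

```python
def storage_increase(intervals, points, is_final):
    """Extra numbers needed to add a new occurrence time t to one valid domain.

    intervals: list of (begin, end); points: list of times.
    is_final: True if this is the final approximate expression (FIG. 9).
    """
    last_end = max((e for _, e in intervals), default=None)
    last_point = max(points, default=None)
    ends_with_interval = last_end is not None and (last_point is None or last_end > last_point)
    if is_final and ends_with_interval:
        return 0  # the terminal interval's end point is simply replaced by t
    # final expression ending with a point: point (1 number) becomes an interval (2 numbers);
    # any other expression: t is appended as a new point (1 number)
    return 1
```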
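When several approximate expressions satisfy the accuracy criterion and the increase in storage capacity needed to store the updated valid domain is the same for each, the graph evaluation unit 17 may select any one of them; for example, one approximate expression may be designated arbitrarily in advance.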
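FIG. 14 illustrates a case in which data P1023 are generated at time t. In the example of FIG. 14, the approximate value obtained by substituting t into x = f1(t) is X1011 = f1(t), and that obtained by substituting t into x = f3(t) is X1013 = f3(t). Both expressions satisfy the criterion that the absolute difference between the data value x of P1023 and the approximate value is less than the threshold ε; that is, both |x − X1011| < ε and |x − X1013| < ε hold. Moreover, neither x = f1(t) nor x = f3(t) is the final approximate expression, so adding t to either valid domain increases the storage capacity by the same amount.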
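In this case, the graph evaluation unit 17 may select the approximate expression whose approximate value differs least from the actual data value x. In the example of FIG. 14, |x − X1011| < |x − X1013| < ε, so the graph evaluation unit 17 may select x = f1(t). This is one example of choosing among several expressions with equal storage increase; other selection methods may be used.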
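For example, among several approximate expressions with equal storage increase, the graph evaluation unit 17 may select the one whose valid domain has its upper limit (terminal element) closest to the current time, in other words, the one whose valid domain has the largest upper limit. In the example of FIG. 14, the upper limit of the valid domain of x = f1(t) is I13 = {t13}, and that of x = f3(t) is the end point t31e of interval I31. Because t13 < t31e, the graph evaluation unit 17 may select x = f3(t). The graph evaluation unit 17 may also choose among several expressions by methods other than those illustrated here.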
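Combining the storage-increase rule with either of the tie-breaks described above, the selection in the graph evaluation unit 17 might be sketched as follows. The candidate dictionaries, the `tie_break` parameter, and all numbers are hypothetical; they only mimic the situations of FIGS. 13 and 14.

```python
def select_expression(candidates, tie_break="error"):
    """Pick one expression among those meeting the accuracy criterion.

    candidates: dicts with keys "id", "error" (|x - f(t)|), "increase"
    (extra stored numbers) and "domain_end" (largest time in the valid domain).
    Returns None when no expression qualifies (the datum becomes undetermined).
    """
    if not candidates:
        return None
    if tie_break == "error":  # smallest approximation error wins ties
        def key(c):
            return (c["increase"], c["error"])
    elif tie_break == "latest_end":  # valid domain ending closest to the present wins ties
        def key(c):
            return (c["increase"], -c["domain_end"])
    else:
        raise ValueError(f"unknown tie_break: {tie_break}")
    return min(candidates, key=key)["id"]


# Made-up numbers in the spirit of FIG. 13 (f2 is final and ends with an interval)
fig13 = [{"id": "f2", "error": 0.4, "increase": 0, "domain_end": 17},
         {"id": "f3", "error": 0.1, "increase": 1, "domain_end": 15}]
# ... and of FIG. 14 (neither candidate is final, so both cost one number)
fig14 = [{"id": "f1", "error": 0.1, "increase": 1, "domain_end": 9},
         {"id": "f3", "error": 0.3, "increase": 1, "domain_end": 15}]
```

Storage increase always dominates; the error or recency rule only decides among candidates that cost the same.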
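When the graph update unit 18 receives input from the graph evaluation unit 17, it either updates the valid domain of the approximate expression selected by the graph evaluation unit 17 or adds an undetermined point to the undetermined point storage unit 13, depending on that input. When it receives input from the new approximate expression generation unit 14, it newly registers the approximate expression in the approximate expression storage unit 15.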
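When the graph update unit 18 receives from the graph evaluation unit 17 the approximate expression chosen by the graph evaluation unit 17, its valid domain, the new generated data, and information indicating whether the chosen expression is the final approximate expression, it updates the valid domain of that expression so that it includes the occurrence time of the new generated data. The graph update unit 18 stores the updated valid domain in the determined graph storage unit 19.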
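The graph update unit 18 updates the valid domain according to the following cases. In the first update mode, if the approximate expression received from the graph evaluation unit 17 is the final approximate expression and its valid domain terminates with the end point of an "interval", the graph update unit 18 replaces that end point with the occurrence time of the new generated data.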
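In the second update mode, if the approximate expression received from the graph evaluation unit 17 is the final approximate expression and its valid domain terminates with a "point", the graph update unit 18 creates a new "interval" that starts at that point and ends at the occurrence time of the new generated data. Because the terminal "point" is now contained in this "interval", the graph update unit 18 removes it from the "points" of the valid domain. In the example of FIG. 11, the graph update unit 18 removes one entry from the "points" column and adds a new entry to the "intervals" column.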
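In the third update mode, if the approximate expression received from the graph evaluation unit 17 is not the final approximate expression, the graph update unit 18 adds the occurrence time of the new generated data to the valid domain of that expression as a "point".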
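The three update modes can be sketched as one in-place function, assuming intervals are stored as mutable [begin, end] lists and points as a list of times. The name `add_time` and the values below (which mimic the domain of x = f2(t) in FIG. 13) are hypothetical.

```python
def add_time(intervals, points, t, is_final):
    """Add occurrence time t to one valid domain in place.

    intervals: list of [begin, end] lists; points: list of times.
    is_final: True if this is the final approximate expression.
    """
    last_iv = max(intervals, key=lambda iv: iv[1], default=None)
    last_pt = max(points, default=None)
    if is_final and last_iv is not None and (last_pt is None or last_iv[1] > last_pt):
        last_iv[1] = t              # mode 1: move the terminal interval's end point to t
    elif is_final and last_pt is not None:
        points.remove(last_pt)      # mode 2: the terminal point starts a new interval
        intervals.append([last_pt, t])
    else:
        points.append(t)            # mode 3: not final, so t is added as a point


# Hypothetical domain {8} ∪ [10, 11] ∪ [16, 17] of the final expression
f2_intervals, f2_points = [[10, 11], [16, 17]], [8]
add_time(f2_intervals, f2_points, 18, is_final=True)
```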
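When updating the valid domain of the approximate expression chosen by the graph evaluation unit 17, the graph update unit 18 also updates the final data information: it sets the approximate expression ID in the final data information (see FIG. 9) stored in the final time storage unit 11 to the ID of the chosen expression, and sets the final time to the occurrence time of the new generated data.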
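When the graph evaluation unit 17 selects no approximate expression and outputs only the new generated data to the graph update unit 18, the graph update unit 18 stores the generated data in the undetermined point storage unit 13 as an undetermined point. In this case, the graph update unit 18 only adds one undetermined point and does not update the information stored in the final time storage unit 11, the approximate expression storage unit 15, or the determined graph storage unit 19.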
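When the new approximate expression generation unit 14 generates a new approximate expression from undetermined points, it outputs the expression and each data item used to generate it (that is, the undetermined points stored in the undetermined point storage unit 13 and the newly input generated data) to the graph update unit 18. The graph update unit 18 then assigns an approximate expression ID to the new expression and stores the expression in association with that ID in the approximate expression storage unit 15. At this time, the graph update unit 18 deletes the undetermined points used to generate the new expression, that is, each undetermined point stored in the undetermined point storage unit 13. From the occurrence times of the data used to generate the expression, the graph update unit 18 also determines the valid domain of the new expression and stores it, together with the ID, in the determined graph storage unit 19. It determines the valid domain so that the number of intervals whose start and end points are occurrence times of those data is as small as possible and so that the valid domain does not overlap the valid domain of any existing approximate expression. A time that lies alone between intervals or points of existing valid domains, and therefore cannot be the start or end point of an interval, is defined as a point.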
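One way to realize this rule, sketched under the assumption that the existing valid domains are given as lists of intervals and points, is to walk the fitting times in order and start a new run whenever some existing domain element lies strictly between two consecutive times; runs of two or more times become intervals and single times become points. The function name is hypothetical, and this greedy grouping is one reading of the rule rather than a procedure the text spells out.

```python
def domain_for_new_expression(times, other_intervals, other_points):
    """Valid domain (intervals, points) for a new expression fitted at `times`.

    other_intervals / other_points: elements of all existing valid domains.
    """
    if not times:
        raise ValueError("no occurrence times given")

    def blocked(t1, t2):
        # True if an existing domain element lies strictly between t1 and t2
        return (any(b < t2 and e > t1 for b, e in other_intervals)
                or any(t1 < p < t2 for p in other_points))

    ordered = sorted(times)
    runs = [[ordered[0]]]
    for t in ordered[1:]:
        if blocked(runs[-1][-1], t):
            runs.append([t])        # something else intervenes: start a new run
        else:
            runs[-1].append(t)      # extend the current run
    intervals = [(r[0], r[-1]) for r in runs if len(r) > 1]
    points = [r[0] for r in runs if len(r) == 1]
    return intervals, points


# FIG. 15-like case: nothing else lies between the fitting times
single = domain_for_new_expression([20, 21, 22, 23], [(0, 19)], [])
# Interleaved case: existing points at 2 and 4 separate the fitting times
split = domain_for_new_expression([1, 3, 5, 6], [], [2, 4])
```

Because runs are split only where another domain element intervenes, each run is as long as it can be, which keeps the number of intervals minimal under the no-overlap constraint.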
Examples of valid domain updates by the graph update unit 18 are described with reference to FIGS. 11, 12, 13, and 15.
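First, an example of the valid domain update when the approximate expression that the graph evaluation unit 17 chose to approximate the new data is the final approximate expression is described with reference to FIG. 13. In the example of FIG. 13, the approximate expression ID designating "x = f2(t)", its valid domain, generated data P1022, and information indicating that "x = f2(t)" is the final approximate expression are output from the graph evaluation unit 17 to the graph update unit 18. The graph update unit 18 updates the valid domain I21 ∪ I22 ∪ I23 of "x = f2(t)" so that it includes {t}. Because the valid domain terminates with the end point of interval I23, the graph update unit 18 simply changes the end point t23e of I23 = [t23b, t23e] (see FIG. 11) to t and stores the updated valid domain in the determined graph storage unit 19. As a result, the intervals for approximate expression ID "f2" in FIG. 11 change from [t22b, t22e], [t23b, t23e] to [t22b, t22e], [t23b, t].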
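Next, an example of the valid domain update when the approximate expression chosen by the graph evaluation unit 17 is not the final approximate expression is described with reference to FIG. 12. In the example of FIG. 12, the approximate expression ID designating "x = f1(t)", its valid domain, generated data P1021, and information indicating that "x = f1(t)" is not the final approximate expression are output from the graph evaluation unit 17 to the graph update unit 18. Because the designated expression is not the final approximate expression, the graph update unit 18 adds the occurrence time t of P1021 to the valid domain of x = f1(t) as a "point". In the example of FIG. 11, the graph update unit 18 thus updates the contents of the determined graph storage unit 19 by adding t to the "points" of the valid domain for approximate expression ID "f1".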
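Next, an example of registering a new valid domain when the new approximate expression generation unit 14 generates a new approximate expression is described. In the example of FIG. 15, the new approximate expression generation unit 14 generates the approximate expression "x = fnew(t)" from undetermined points P1030 and new generated data P1031, and outputs x = fnew(t), the undetermined points P1030, and the new generated data P1031 to the graph update unit 18. The graph update unit 18 assigns an approximate expression ID to "x = fnew(t)" and stores it with the expression in the approximate expression storage unit 15. It also determines the valid domain from P1030 and P1031: for the data shown in FIG. 15, a single interval Inew running from the occurrence time of the first data item to that of the last can be defined without overlapping the valid domain of any other approximate expression. The graph update unit 18 therefore stores the valid domain (interval Inew) together with the ID of "x = fnew(t)" in the determined graph storage unit 19.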
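The data input unit 10, new data occurrence time substitution unit 12, new approximate expression generation unit 14, accuracy constraint input unit 16, graph evaluation unit 17, and graph update unit 18 are implemented, for example, by the CPU of a computer operating according to a data summarization program. In this case, a program storage device (not shown) of the computer stores the data summarization program, and the CPU reads the program and, following it, operates as the data input unit 10, new data occurrence time substitution unit 12, new approximate expression generation unit 14, accuracy constraint input unit 16, graph evaluation unit 17, and graph update unit 18. Alternatively, each of these units may be implemented by separate hardware.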
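The final time storage unit 11, undetermined point storage unit 13, approximate expression storage unit 15, and determined graph storage unit 19 may be implemented by separate storage devices or by a single storage device, and some combination of them may also share a storage device.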
Next, the operation of the first embodiment is described.
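FIG. 16 is a flowchart illustrating an example of the processing flow of the first embodiment. While the data generation source (not shown) is sequentially generating data (No in step S100), data are sequentially input from the data generation source to the data input unit 10 (step S101). In this example, data are input to the data input unit 10 one at a time in order of occurrence time, and the data summarization system performs the subsequent operations for each data item. Several data may also be input to the data input unit 10 at once; even then, the system performs the subsequent processing one data item at a time in order of occurrence time.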
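Next, the new approximate expression generation unit 14 determines whether a new approximate expression can be generated from the one generated data item input to the data input unit 10 and the undetermined points stored in the undetermined point storage unit 13 (step S102). In step S102, the unit reads each undetermined point stored in the undetermined point storage unit 13 and determines whether those undetermined points and the new generated data item together make up the number of data needed to generate an approximate expression. If they do, it determines that a new approximate expression can be generated; otherwise, it determines that one cannot. As already described, the needed number is fixed in advance according to the type of approximate expression and the generation algorithm.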
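If the new approximate expression generation unit 14 determines that a new approximate expression can be generated (Yes in step S102), it generates, from the undetermined points and the one generated data item input to the data input unit 10, a new approximate expression that approximates the data value with the occurrence time as its variable (step S103).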
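The type of approximate expression and the generation algorithm are fixed in advance but are not otherwise restricted. As already described, if the approximate expression is linear in the occurrence time t, the new approximate expression generation unit 14 may generate it, once four data have been collected, by determining its first-order coefficient and constant term from the four pairs of occurrence time and data value by the least squares method. Alternatively, once two data have been collected, it may take as the linear expression in t the straight line passing through the two points whose coordinates are (occurrence time, data value). In this case too, it determines the first-order coefficient and the constant term.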
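In this example, the approximate expression is linear and is represented by the first-order coefficient of the variable t and a constant term, as shown in FIG. 7. The representation format is not limited to this, however, and approximate expressions may be represented in other formats.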
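In step S103, the new approximate expression generation unit 14 outputs the newly generated approximate expression and each data item used to generate it (each undetermined point and the new generated data item) to the graph update unit 18.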
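The graph update unit 18 assigns an approximate expression ID to the approximate expression received from the new approximate expression generation unit 14 and stores it in the approximate expression storage unit 15 (step S104), so that one new approximate expression has been registered. In step S104, the graph update unit 18 further determines the valid domain of the new expression based on the occurrence times of the data used to generate it and stores the valid domain, together with the ID, in the determined graph storage unit 19. It determines the valid domain so that the number of intervals whose start and end points are occurrence times of those data is as small as possible and so that the valid domain does not overlap the valid domain of any existing approximate expression. A time that lies alone between intervals or points of existing valid domains, and therefore cannot be the start or end point of an interval, becomes a "point" (see FIG. 11).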
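If the new approximate expression generation unit 14 determines that a new approximate expression cannot be generated (No in step S102), the new data occurrence time substitution unit 12 substitutes the occurrence time of the data input to the data input unit 10 into each previously generated approximate expression (that is, each approximate expression stored in the approximate expression storage unit 15) and calculates an approximate value of the input data for each expression (step S105). It outputs each pair of calculated approximate value and approximate expression to the graph evaluation unit 17. At this time, it refers to the final data information and also outputs information indicating which pair consists of the final approximate expression and the approximate value calculated from it. It also outputs the generated data currently being processed (the data input to the data input unit 10) to the graph evaluation unit 17. The details of step S105 are described later.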
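After step S105, the graph evaluation unit 17 compares the approximate value calculated with each approximate expression with the actual data value of the data input to the data input unit 10 (step S106). In this example, the accuracy constraint input unit 16 stores the criterion that the absolute difference between the approximate value and the actual data value is less than the threshold ε (that is, |x − f(t)| < ε), and the graph evaluation unit 17 selects approximate expressions that satisfy it. In step S106, the graph evaluation unit 17 therefore calculates, for each approximate expression, the absolute difference between the approximate value and the actual data value of the input data.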
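Next, the graph evaluation unit 17 determines whether any approximate expression satisfies the criterion that the absolute difference calculated in step S106 is less than the threshold ε (step S107).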
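If an approximate expression satisfying the criterion exists (Yes in step S107), the graph evaluation unit 17 selects, from among the expressions satisfying the criterion, the one whose valid domain update requires the smallest increase in storage capacity (step S108), that is, the one for which the storage capacity needed for the valid domain grows least from before the update to after it.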
If only one approximate expression satisfies the criterion, the graph evaluation unit 17 simply selects that expression.
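If several approximate expressions satisfy the criterion and one of them is the final approximate expression with a valid domain that terminates with the end point of an "interval", the graph evaluation unit 17 selects that final approximate expression, because it gives the smallest increase in storage capacity.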
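If several approximate expressions satisfy the criterion and none of them is a final approximate expression with a valid domain that terminates with the end point of an "interval", the graph evaluation unit 17 selects one of them. The selection method may be fixed in advance.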
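Having selected one approximate expression satisfying the criterion in step S108, the graph evaluation unit 17 outputs the selected expression, its valid domain, and the generated data being processed to the graph update unit 18, together with information indicating whether the selected expression is the final approximate expression. It may output the approximate expression ID that identifies the expression in place of the expression itself.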
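The graph update unit 18 updates the valid domain of the approximate expression received from the graph evaluation unit 17 in step S108 and stores it in the determined graph storage unit 19 (step S109). The ways in which the graph update unit 18 updates the valid domain have already been described and are not repeated here. In step S109, the graph update unit 18 also sets the approximate expression ID in the final data information (see FIG. 9) stored in the final time storage unit 11 to the ID of the expression selected in step S108, and sets the final time in the final data information to the occurrence time of the generated data being processed.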
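If it is determined in step S107 that no approximate expression satisfies the criterion (No in step S107), the graph evaluation unit 17 outputs the generated data being processed to the graph update unit 18, which stores the data in the undetermined point storage unit 13 as an undetermined point (step S110).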
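After completing step S104, S109, or S110, the data summarization system repeats the same processing for the next data item input to the data input unit 10 (the next item in order of occurrence time). In this way, step S102 and the subsequent processing are performed individually for each generated data item, in order of occurrence time. When the data generation source stops generating data (Yes in step S100), the data summarization system ends the processing.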
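The flow of FIG. 16 (steps S102 to S110, including the substitution of step S105) can be tied together in a compact sketch. It assumes linear expressions built from two consecutive data (the two-point variant), the criterion |x − f(t)| < ε, and the smallest-error tie-break; the class and attribute names are hypothetical. The text does not say whether the final data information is updated in step S104; the sketch assumes it is, which keeps a later extension of another expression's interval from overlapping the newly created valid domain.

```python
K = 2  # hypothetical: each new expression is the line through two consecutive data


class Summarizer:
    """Sketch of steps S102 to S110 (FIG. 16) for expressions x = a*t + b."""

    def __init__(self, eps):
        self.eps = eps          # accuracy criterion |x - f(t)| < eps
        self.exprs = {}         # approximate expression storage: id -> (a, b)
        self.domains = {}       # determined graph storage: id -> (intervals, points)
        self.undetermined = []  # undetermined point storage: [(t, x), ...]
        self.final = None       # final data information: (expression id, final time)

    def _ends_with_interval(self, eid):
        intervals, points = self.domains[eid]
        last_end = max((e for _, e in intervals), default=float("-inf"))
        return last_end > max(points, default=float("-inf"))

    def process(self, t, x):
        """Handle one datum; data must arrive in occurrence-time order."""
        if len(self.undetermined) + 1 >= K:                  # S102
            t1, x1 = self.undetermined[0]
            a = (x - x1) / (t - t1)                          # S103: line through both points
            eid = f"f{len(self.exprs)}"
            self.exprs[eid] = (a, x1 - a * t1)               # S104: register the expression
            # With K = 2 the two data are consecutive, so no other domain element
            # lies between them and the valid domain is the single interval [t1, t].
            self.domains[eid] = ([[t1, t]], [])
            self.undetermined.clear()
            self.final = (eid, t)                            # assumption (see lead-in)
            return eid
        ok = []                                              # S105 to S107
        for eid, (a, b) in self.exprs.items():
            err = abs(x - (a * t + b))
            if err < self.eps:
                ok.append((eid, err))
        if not ok:                                           # S110
            self.undetermined.append((t, x))
            return None
        final_id = self.final[0] if self.final else None

        def increase(eid):                                   # extra numbers to store
            return 0 if eid == final_id and self._ends_with_interval(eid) else 1

        eid = min(ok, key=lambda c: (increase(c[0]), c[1]))[0]   # S108
        intervals, points = self.domains[eid]                # S109: update the valid domain
        if eid == final_id and self._ends_with_interval(eid):
            last = max(intervals, key=lambda iv: iv[1])      # mode 1: move interval end to t
            last[1] = t
        elif eid == final_id:
            p = max(points)                                  # mode 2: point becomes [p, t]
            points.remove(p)
            intervals.append([p, t])
        else:
            points.append(t)                                 # mode 3: add t as a point
        self.final = (eid, t)
        return eid


s = Summarizer(eps=0.5)
stream = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 10), (5, 5), (6, 6)]
results = [s.process(t, x) for t, x in stream]
```

In this run the datum (5, 5) goes into a new expression together with the undetermined point (4, 10) even though it also fits f0, because FIG. 16 checks step S102 before step S105; the datum (6, 6) then returns to f0 and is added to its valid domain as a point.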
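Although FIG. 16 illustrates the case in which the accuracy constraint input unit 16 stores the criterion that the absolute difference between the approximate value and the actual data value is less than the threshold ε, the criterion stored by the accuracy constraint input unit 16 is not limited to this.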
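Next, step S105 is described in more detail. FIG. 17 is a flowchart illustrating an example of the processing flow of step S105. In step S105, the new data occurrence time substitution unit 12 determines whether any approximate expression in the approximate expression storage unit 15 has not yet been read (step S201). If there is one, it reads that approximate expression and its approximate expression ID from the approximate expression storage unit 15 (step S202). Let the read expression be x = F(t) and its ID be "f". The unit then substitutes the occurrence time t of the data input to the data input unit 10 into x = F(t) to calculate the approximate value F(t), and outputs the pair of approximate expression ID "f" and approximate value F(t) to the graph evaluation unit 17 (step S203).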
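After step S203, the new data occurrence time substitution unit 12 repeats the processing from step S201. When no unread approximate expression remains in the approximate expression storage unit 15 (No in step S201), it reads the final data information from the final time storage unit 11 (step S204) and outputs the ID of the final approximate expression contained in it to the graph evaluation unit 17 (step S205). At this time, it also outputs the data being processed, which were input to the data input unit 10, to the graph evaluation unit 17.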
The new data occurrence time substitution unit 12 may execute steps S204 and S205 before the loop of steps S201 to S203.
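The data summarization system of the first embodiment stores "data generated sequentially with a certain tendency that may change irregularly", such as the CPU usage rate or the number of accesses to a Web page, as approximate expressions with the occurrence time as their variable, together with valid domains over which approximation by those expressions can be said to be appropriate. The valid domain of one approximate expression is represented as a set of intervals, each indicating a time span, and points, each indicating a single time. For each new data item, the system identifies an approximate expression that satisfies the criterion stored in the accuracy constraint input unit 16 and approximates the new data item with that expression. The system can therefore summarize (compress) data accurately.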
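Because the valid domain of one approximate expression may include several "intervals" and "points", the data summarization system of the first embodiment can keep the storage capacity of the summarized data low and summarize efficiently. For example, suppose that data approximable by a certain approximate expression are generated continuously (a first state), then the tendency of the data values changes temporarily (a second state), and then data approximable by the original expression are generated again (a third state). Although there are three states, the system can represent the generated data of the first and third states with the same approximate expression, which saves storage accordingly. If a valid domain could contain only one interval or point, the system would have to store an approximate expression and valid domain for the first state and another pair for the third state, storing the same expression twice and increasing the storage capacity. The data summarization system of the first embodiment avoids this increase.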
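A concrete example uses data generated sequentially as in FIG. 13. Suppose these data are compressed by applying the technique described in Patent Document 1. In that technique, a certain data item is used as a reference point (pivot), and data whose value differs from the pivot by more than the compression accuracy are not approximated together with it. As shown in FIG. 18, the generated data are then represented by five approximate expressions x = h1(t), x = h2(t), x = h3(t), x = h4(t), and x = h5(t), plus points P191, P192, and P193 that these expressions cannot approximate, and one valid time interval is defined for each approximate expression. Each approximate expression is a linear function represented by a first-order coefficient and a constant term, that is, by two numerical values, and its single interval is represented by two numerical values, its start and end points. Storing one approximate expression and its interval therefore requires four numerical values, and storing the five expressions of FIG. 18 with their intervals requires 4 × 5 = 20 numerical values.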
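For convenience, storage capacity is expressed here as a number of numerical values. In the example of FIG. 18, points P191, P192, and P193 must also be stored, each with both its occurrence time and its data value; for example, point P191 requires its occurrence time t and its data value X191. Each point therefore requires two numerical values, and the total storage in the case of FIG. 18 is 4 × 5 + 2 × 3 = 26 numerical values.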
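In contrast, the data summarization system of the first embodiment approximates the same generated data with the four approximate expressions shown in FIG. 13, so storing the expressions themselves requires 2 × 4 = 8 numerical values. As for the valid domains, an interval requires two numerical values and a point one. The valid domain of x = f1(t) consists of one interval and two points, requiring 2 × 1 + 2 = 4 numerical values; that of x = f0(t) consists of three intervals, that of x = f2(t) of two intervals and one point, and that of x = f3(t) of one interval. Storing all valid domains therefore requires (2 × 3) + (2 × 1 + 2) + (2 × 2 + 1) + (2 × 1) = 17 numerical values, and the system stores the summarization result in 8 + 17 = 25 numerical values, fewer than the 26 needed in FIG. 18.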
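The counts above can be reproduced with a few lines of arithmetic. The helper names are hypothetical; as in the text, each linear expression costs two numbers, each interval two, each valid-domain point one, and each stand-alone data point two (time and value).

```python
def single_interval_storage(num_expressions, num_loose_points):
    # Patent Document 1 style (FIG. 18): each linear expression = 2 coefficients
    # + one interval (2 numbers); each leftover point = time + value (2 numbers)
    return 4 * num_expressions + 2 * num_loose_points


def valid_domain_storage(domains):
    # domains: one (number_of_intervals, number_of_points) pair per linear expression
    coefficients = 2 * len(domains)
    return coefficients + sum(2 * n_iv + n_pt for n_iv, n_pt in domains)


fig18 = single_interval_storage(5, 3)
fig13 = valid_domain_storage([(3, 0), (1, 2), (2, 1), (1, 0)])  # f0, f1, f2, f3
```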
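The present invention aims to summarize "data generated sequentially with a certain tendency whose tendency may change irregularly", but irregular data may also be generated in succession for a while. FIG. 19 is an explanatory diagram illustrating an example of such a situation. In period "b" of FIG. 19, data represented by the approximate expression x = f2(t) continue to be generated; likewise, data represented by x = f1(t) continue in period "c", and data represented by x = f3(t) continue in period "d". In period "a", however, data with differing tendencies (in other words, data approximated by different approximate expressions) are generated in succession. As in period "a", irregular data may be generated in succession and accumulated as undetermined points, in which case approximate expressions different from the ideal ones are derived from those points. However, for data whose tendency persists for a while and then changes irregularly, even if a non-ideal approximate expression is obtained temporarily, data with a steady tendency are often generated continuously in the long run, and ideal approximate expressions are obtained from those data. For example, even if undesirable approximate expressions are obtained in period a of FIG. 19, ideal approximate expressions such as x = f1(t), x = f2(t), and x = f3(t) are obtained in the subsequent periods b, c, and d. As a result, data are summarized efficiently overall.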
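Suppose further that the approximate expression is linear in time t and that, as soon as one undetermined point and one new data item are available, the data summarization system of the first embodiment generates the straight line connecting the two points as the approximate expression. In this case, the system never needs more storage than storing the data themselves as they are. Storing a data item as it is requires two numerical values, its data value and its occurrence time, so storing two data items requires four numerical values. With the two-point construction, the approximate expression is linear, so the system stores its first-order coefficient and constant term, and for the valid domain it stores the occurrence times of the two data items as the start and end points. Storing the approximate expression and valid domain therefore also requires four numerical values. Thus, even when this case yields no real compression, the system stores the approximate expression and valid domain in the same capacity that the two raw data items would occupy.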
Embodiment 2.
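FIG. 20 is a block diagram illustrating an example of a data summarization system according to the second embodiment of the present invention. Components similar to those of the first embodiment are given the same reference signs as in FIG. 2, and their detailed description is omitted. The data summarization system of the second embodiment includes a data input unit 10, a final time storage unit 11, a new data occurrence time substitution unit 12, a new approximate expression generation unit 14, an undetermined point storage unit 13, an approximate expression storage unit 15, an accuracy constraint input unit 16, a graph evaluation unit 17a, a graph update unit 18a, a determined graph storage unit 19, a predetermined expression input unit 20, and a predetermined expression storage unit 21.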
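The predetermined expression storage unit 21 is a storage device that stores one approximate expression already known as an approximate expression for obtaining approximate values of data values. For example, FIG. 10 of the first embodiment shows four generated approximate expressions, x = f0(t) to x = f3(t); suppose that one of them is known before processing starts. The predetermined expression storage unit 21 stores such an expression, known before processing starts, as the predetermined expression.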
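By defining no valid domain for the predetermined expression, the data summarization system of the second embodiment reduces the storage capacity spent on valid domains and summarizes data even more efficiently than the first embodiment.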
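FIG. 21 is an explanatory diagram showing an example of the predetermined expression stored in the predetermined expression storage unit 21. In the example of FIG. 21, the predetermined expression storage unit 21 stores the first-order coefficient and constant term of the predetermined expression; as in FIGS. 7 and 8, a linear expression in the variable t is represented by its first-order coefficient and constant term. Specifically, FIG. 21 illustrates the case in which the predetermined expression storage unit 21 stores the predetermined expression x = a × t + b.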
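Although FIG. 21 illustrates, as in FIGS. 7 and 8, a predetermined expression represented by a combination of first-order coefficient and constant term, other formats may be used. FIG. 22 shows an example in which the predetermined expression is represented by a constant term alone, that is, a predetermined expression of the form x = (constant). This is equivalent to a predetermined expression whose first-order coefficient is 0 and means that the approximate value is that constant regardless of the variable (occurrence time). The predetermined expression may also be represented by various other functions, such as polynomials of degree two or higher, exponential functions, and trigonometric functions.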
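The predetermined expression input unit 20 receives a predetermined expression input by the user and stores it in the predetermined expression storage unit 21.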
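Like the graph evaluation unit 17 of the first embodiment, the graph evaluation unit 17a of the second embodiment compares each approximate value, which the new data occurrence time substitution unit 12 calculated by substituting the occurrence time of the new data into each approximate expression stored in the approximate expression storage unit 15, with the actual data value of the new data. In addition, the graph evaluation unit 17a calculates the approximate value obtained by substituting the time of the new data (the newly generated data received from the new data occurrence time substitution unit 12) into the predetermined expression and compares it with the actual data value as well. The graph evaluation unit 17a then identifies the approximate expressions that satisfy the criterion stored in the accuracy constraint input unit 16. If more than one satisfies it, the graph evaluation unit 17a identifies the one whose valid domain update requires the smallest increase in storage capacity and decides that this expression approximates the data value of the new data. If only one satisfies it, the graph evaluation unit 17a decides that this expression approximates the data value of the new data.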
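If several approximate expressions satisfy the criterion and the predetermined expression is among them, the graph evaluation unit 17a selects the predetermined expression: no valid domain is defined for it, so the storage capacity needed to represent its valid domain is zero. When several approximate expressions satisfy the criterion and the predetermined expression is not among them, the selection works as in the graph evaluation unit 17 of the first embodiment, and its description is omitted.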
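If the approximate expression chosen by the graph evaluation unit 17a is not the predetermined expression, the graph evaluation unit 17a outputs the chosen expression, its valid domain, and the new data (the generated data received from the new data occurrence time substitution unit 12) to the graph update unit 18a, together with information indicating whether the chosen expression is the final approximate expression. It may output, for example, the approximate expression ID as the chosen expression. This operation is the same as that of the graph evaluation unit 17 of the first embodiment.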
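If the chosen approximate expression is the predetermined expression, the graph evaluation unit 17a outputs information notifying that the predetermined expression has been selected, together with the input new generated data, to the graph update unit 18a. A predetermined expression ID reserved for the predetermined expression may serve as this notification.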
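It is also possible that no approximate expression satisfies the criterion stored in the accuracy constraint input unit 16. In that case, the graph evaluation unit 17a selects no approximate expression and outputs the new data to the graph update unit 18a, as the graph evaluation unit 17 of the first embodiment does.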
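When the graph evaluation unit 17a chooses an approximate expression other than the predetermined expression and the graph update unit 18a of the second embodiment receives from it that expression, its valid domain, the new generated data, and information indicating whether the chosen expression is the final approximate expression, the graph update unit 18a updates the valid domain of that expression so that it includes the occurrence time of the new generated data, and stores the updated valid domain in the determined graph storage unit 19. This operation is the same as that of the graph update unit 18 of the first embodiment.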
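When the graph evaluation unit 17a selects the predetermined expression to approximate the data value of the new data and outputs the notification of that selection together with the input new generated data, the graph update unit 18a stores the occurrence time of the new data and the reserved predetermined expression ID in the final time storage unit 11 as the final data information.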
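When the graph evaluation unit 17a selects no approximate expression and outputs only the new generated data to the graph update unit 18a, the graph update unit 18a stores the generated data in the undetermined point storage unit 13 as an undetermined point, as the graph update unit 18 of the first embodiment does.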
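When the new approximate expression generation unit 14 generates a new approximate expression from undetermined points and outputs the expression and each data item used to generate it (that is, the undetermined points stored in the undetermined point storage unit 13 and the newly input generated data) to the graph update unit 18a, the graph update unit 18a operates in the same way as the graph update unit 18 of the first embodiment.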
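The graph evaluation unit 17a, graph update unit 18a, and predetermined expression input unit 20 of the second embodiment are implemented, for example, by the CPU of a computer operating according to a data summarization program. In this case, a program storage device (not shown) of the computer stores the data summarization program, and the CPU reads the program and, following it, operates as the data input unit 10, new data occurrence time substitution unit 12, new approximate expression generation unit 14, accuracy constraint input unit 16, graph evaluation unit 17a, graph update unit 18a, and predetermined expression input unit 20. Alternatively, each of these units may be implemented by separate hardware.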
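An example of the processing flow of the second embodiment is described with reference to FIG. 16; processing identical to that of the first embodiment is not described again. The operations up to step S105 are the same as in the first embodiment. After step S105, the graph evaluation unit 17a compares the approximate value calculated with each approximate expression with the actual data value of the data input to the data input unit 10 (step S106). In the second embodiment, however, the graph evaluation unit 17a also substitutes the occurrence time of the new data into the predetermined expression stored in the predetermined expression storage unit 21 and compares the resulting approximate value with the actual data value. For example, when the accuracy constraint input unit 16 stores the criterion |x − f(t)| < ε, the graph evaluation unit 17a calculates the absolute difference between each approximate value and the actual data value.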
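Next, the graph evaluation unit 17a determines whether any approximate expression satisfies the criterion that the absolute difference calculated in step S106 is less than the threshold ε (step S107). The operation when no such expression exists (No in step S107) is the same as in the first embodiment.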
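If at least one approximate expression satisfies the criterion (Yes in step S107), the graph evaluation unit 17a selects, from among those expressions, the one whose valid domain update requires the smallest increase in storage capacity (step S108). In step S108, if the predetermined expression is among the expressions satisfying the criterion, the graph evaluation unit 17a selects it and outputs information notifying that the predetermined expression has been selected, together with the input new generated data, to the graph update unit 18a. The operation when the predetermined expression is not among them is the same as in the first embodiment.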
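When the graph update unit 18a receives from the graph evaluation unit 17a the notification that the predetermined expression has been selected, together with the generated data, it stores final data information containing the predetermined expression ID and the occurrence time of the data in the final time storage unit 11 (step S109). In all other cases, step S109 operates as in the first embodiment, and its description is omitted.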
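The data summarization system of the second embodiment defines no valid domain for the predetermined expression, and it associates generated data that the predetermined expression does not approximate with one of the other approximate expressions. For generated data approximated by expressions other than the predetermined expression, it therefore summarizes efficiently with little storage capacity, as in the first embodiment; for data approximated by the predetermined expression, it stores no valid domain at all, which makes summarization still more efficient. As a result, the data summarization system of the second embodiment needs even less storage capacity than the first embodiment.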
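For example, suppose that x = f0(t), one of the four approximate expressions shown in FIG. 13, is the predetermined expression. Storing the expressions themselves still requires 8 (= 2 × 4) numerical values. Because the system of the second embodiment need not store the valid domain of x = f0(t), storing the valid domains requires (2 × 1 + 2) + (2 × 2 + 1) + (2 × 1) = 11 numerical values. The summarized data can therefore be stored in 19 (= 8 + 11) numerical values, compared with 25 in the first embodiment.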
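The same arithmetic with a predetermined expression can be sketched as follows, assuming, as the text does, that the predetermined expression keeps its two coefficients but stores no valid domain. The helper name is hypothetical.

```python
def storage_with_predetermined(domains, predetermined_id=None):
    """domains: id -> (number_of_intervals, number_of_points) for linear expressions.

    Every expression stores two coefficients; the predetermined expression
    (if any) stores no valid domain.
    """
    coefficients = 2 * len(domains)
    return coefficients + sum(2 * n_iv + n_pt
                              for eid, (n_iv, n_pt) in domains.items()
                              if eid != predetermined_id)


fig13 = {"f0": (3, 0), "f1": (1, 2), "f2": (2, 1), "f3": (1, 0)}
```

With no predetermined expression the same function gives the first embodiment's 25 numbers; naming f0 as the predetermined expression removes its six interval numbers.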
Embodiment 3.
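The data summarization system of the third embodiment initially stores data received from the data generation source (not shown) as they are, without summarizing them, and summarizes the stored data when the storage resources (memory resources) for storing data run short.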
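FIG. 23 is a block diagram illustrating an example of a data summarization system according to the third embodiment of the present invention. The data summarization system of the third embodiment includes a data input unit 10, a final time storage unit 11, a new data occurrence time substitution unit 12, a new approximate expression generation unit 14, an undetermined point storage unit 13, an approximate expression storage unit 15, an accuracy constraint input unit 16, a graph evaluation unit 17, a graph update unit 18, a determined graph storage unit 19, an unsummarized data storage unit 30, an available storage monitoring unit 31, and a summarization control unit 32.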
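Components similar to those of the first embodiment are given the same reference signs as in FIG. 2, and their detailed description is omitted. However, when data are input from the data generation source (not shown) to the data input unit 10, the data input unit 10 outputs the input data to the summarization control unit 32, and the new data occurrence time substitution unit 12 and the new approximate expression generation unit 14 receive data from the summarization control unit 32.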
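The unsummarized data storage unit 30 is a storage device that stores data generated at the data generation source (not shown) in unsummarized form. Because each data item includes a data value and an occurrence time, the unsummarized data storage unit 30 stores a group of data items each containing a data value and an occurrence time. FIG. 4 illustrates a plurality of undetermined points stored in the undetermined point storage unit 13; the unsummarized data storage unit 30 likewise stores a plurality of data items containing data values and occurrence times, as in FIG. 4. Data are stored in the unsummarized data storage unit 30 by the summarization control unit 32.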
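The unsummarized data storage unit 30, final time storage unit 11, undetermined point storage unit 13, approximate expression storage unit 15, and determined graph storage unit 19 may be implemented by separate storage devices or by a single storage device, and some combination of them may also share a storage device. For example, the unsummarized data storage unit 30 and the final time storage unit 11 may be implemented by one storage device, the approximate expression storage unit 15 and the determined graph storage unit 19 by another, and the undetermined point storage unit 13 by yet another.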
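The available storage monitoring unit 31 monitors the amount of usable resources in at least the storage device that stores unsummarized data and outputs the monitoring result to the summarization control unit 32. Unsummarized data here means data that have been stored in the unsummarized data storage unit 30 from the data generation source via the data input unit 10 and the summarization control unit 32, that is, data whose summarization has not yet started. The available storage monitoring unit 31 therefore monitors the amount of usable resources in the unsummarized data storage unit 30; if that unit shares a storage device with other storage units, it monitors the amount of usable resources in that storage device.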
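One way for the available storage monitoring unit 31 to monitor the amount of usable resources is to monitor the remaining usable memory capacity. This is only an example, and other measures may be monitored; for instance, when the unsummarized data storage unit 30 is a disk storage device, the unit may monitor the unused ratio of the disk. Although monitoring of usable resources is described here, the available storage monitoring unit 31 may instead monitor the amount of resources already in use (for example, the disk usage rate). The following description takes as an example the case in which the available storage monitoring unit 31 monitors the amount of usable resources in the unsummarized data storage unit 30.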
The available storage monitoring unit 31 may monitor the unsummarized data storage unit 30 at regular intervals, or, for example, whenever a user or another party issues an instruction to do so at an arbitrary time.
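Depending on the monitoring result of the available storage monitoring unit 31, the summarization control unit 32 either stores the generated data input to the data input unit 10 in the unsummarized data storage unit 30 without summarizing them, or performs summarization control, that is, summarizes the data stored in the unsummarized data storage unit 30. If the amount of usable resources in the unsummarized data storage unit 30 (for example, the remaining memory capacity or the unused ratio of the disk) is greater than a threshold, the summarization control unit 32 stores the generated data input to the data input unit 10 in the unsummarized data storage unit 30 as they are. If the amount of usable resources is less than or equal to the threshold, the summarization control unit 32 performs summarization control on the data stored in the unsummarized data storage unit 30: it outputs those data to the new approximate expression generation unit 14 and the new data occurrence time substitution unit 12 to start the summarization processing, and deletes the data it has output from the unsummarized data storage unit 30.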
 また、利用可能記憶域監視部31が既に使用されているリソース量を監視する場合には、要約制御部32は、使用されているリソース量が閾値未満の場合に発生データを未要約データ記憶部30に記憶させ、使用されているリソース量が閾値以上である場合に要約制御を行えばよい。
 また、要約制御部32は、要約制御を行うときに、新たな発生データを未要約データ記憶部30に記憶させると同時に、そのデータに対する要約制御を行ってもよい。
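 When performing summarization control, the summarization control unit 32 outputs the data stored in the unsummarized data storage unit 30 one piece at a time to the new approximate expression generation unit 14 and the new data generation time substitution unit 12. For example, it outputs the same data to both units at the same time. The only ordering requirement is that each piece of data output later has a later occurrence time than every piece output earlier. Output data may be deleted from the unsummarized data storage unit 30 in any order, as long as no piece of data is ever output twice.
 The summarization control unit 32 does not have to output all of the data stored in the unsummarized data storage unit 30 for summarization. FIG. 24 is a schematic diagram of the data stored in the unsummarized data storage unit 30, arranged in order of occurrence time. Assume that data 51 was the first data input to the data input unit 10 and stored in the unsummarized data storage unit 30, and that data 52 onward were stored after it in order. The summarization control unit 32 may output data in order of occurrence time, starting from data 51, to the new approximate expression generation unit 14 and the new data generation time substitution unit 12. Alternatively, it may start partway through (for example, at data 55) and output data from there in order of occurrence time. In that case, data 51 to 54 are not summarized; they are not deleted and remain in the unsummarized data storage unit 30.
 The summarization control unit 32 may also skip data, as long as each piece of data output later has a later occurrence time than every piece output earlier. For example, after outputting data 51 to 54, it may skip data 55 and output data 56 to 59. Here too, the skipped data is not summarized and remains in the unsummarized data storage unit 30.
 Suppose also that data summarization has started and final data information is stored in the final time storage unit 11. In this case, the summarization control unit 32 may restrict its output to data whose occurrence time is at or after the final time indicated by the final data information. That data goes to the new approximate expression generation unit 14 and the new data generation time substitution unit 12.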
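The ordering rules above can be sketched in a few lines:

- occurrence times must strictly increase;
- summarization may start partway through the stored data or skip data;
- output may be restricted to data at or after the final time.

This is a minimal sketch under stated assumptions. The names `Datum` and `drain_backlog` are hypothetical, times are integers, and skipping is expressed as a set of occurrence times to leave in the unsummarized store.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Datum:
    time: int      # occurrence time
    value: float   # data value


def drain_backlog(
    backlog: List[Datum],
    skip_times: FrozenSet[int] = frozenset(),
    final_time: Optional[int] = None,
) -> Tuple[List[Datum], List[Datum]]:
    """Split unsummarized data into data to summarize and data to keep.

    Data is emitted in strictly increasing order of occurrence time. Skipped
    data, data earlier than `final_time`, and data whose time equals an
    already emitted time stay in the unsummarized store.
    """
    to_summarize: List[Datum] = []
    kept: List[Datum] = []
    last_emitted: Optional[int] = None
    for d in sorted(backlog, key=lambda item: item.time):
        eligible = d.time not in skip_times and (
            final_time is None or d.time >= final_time
        )
        if eligible and (last_emitted is None or d.time > last_emitted):
            to_summarize.append(d)
            last_emitted = d.time
        else:
            kept.append(d)
    return to_summarize, kept
```

Usage: for illustration only, the FIG. 24 reference numbers 51 to 59 are reused as occurrence times. Then `drain_backlog(data, skip_times=frozenset({55}))` emits 51 to 54 and 56 to 59 and keeps 55. Starting partway through (at data 55) is the special case `final_time=55`.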
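 When available resources are scarcer still, the summarization control unit 32 outputs newly input generated data directly to the new data generation time substitution unit 12 and the new approximate expression generation unit 14, so that new data is summarized as well. The threshold used to decide whether new data should be summarized without being stored in the unsummarized data storage unit 30 may be predetermined separately from the threshold described above.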
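The two thresholds described above give a three-way policy for the summarization control unit 32:

- keep new data unsummarized;
- summarize the stored data;
- summarize new data directly.

The sketch below is hypothetical. The function name, the threshold parameter names and the returned labels are illustrative assumptions, and "available" is the free resource amount reported by the available storage monitoring unit 31.

```python
def summarization_action(available: float, store_threshold: float,
                         direct_threshold: float) -> str:
    """Decide how newly input data is handled (hypothetical labels).

    available: free resources of the unsummarized data store, e.g. remaining
    memory in bytes or the unused disk ratio.
    """
    if direct_threshold > store_threshold:
        raise ValueError("direct_threshold must not exceed store_threshold")
    if available > store_threshold:
        # Plenty of room: keep new data unsummarized, at full accuracy.
        return "store"
    if available > direct_threshold:
        # At or below the first threshold: new data may still be stored,
        # and stored data is fed to the summarizer one piece at a time.
        return "summarize_backlog"
    # Scarcer still: send new data straight to the summarizer.
    return "summarize_direct"
```

When used rather than available resources are monitored, the comparisons flip: data is stored while usage is below the threshold. Under this sketch, that is equivalent to passing `capacity - used` as `available`.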
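 When the summarization control unit 32 outputs data to the new approximate expression generation unit 14 and the new data generation time substitution unit 12, the following units operate as in the first embodiment:

- the new approximate expression generation unit 14;
- the new data generation time substitution unit 12;
- the graph evaluation unit 17;
- the graph update unit 18.

That is, these units perform the operations from step S102 onward in FIG. 16 for each piece of data output by the summarization control unit 32. When the process proceeds from step S102 to step S103, the new data generation time substitution unit 12 does no processing.
 The available storage monitoring unit 31 and the summarization control unit 32 of the third embodiment are implemented, for example, by the CPU of a computer operating according to a data summarization program. In this case, a program storage device (not shown) of the computer stores the data summarization program. The CPU reads the program and, according to it, operates as the following units:

- the data input unit 10;
- the available storage monitoring unit 31;
- the summarization control unit 32;
- the new data generation time substitution unit 12;
- the new approximate expression generation unit 14;
- the accuracy constraint input unit 16;
- the graph evaluation unit 17;
- the graph update unit 18.

Alternatively, each of these units may be implemented by separate hardware.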
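 If ample resources remain for storing data, the data summarization system of the third embodiment stores data in the unsummarized data storage unit 30 without summarizing it. Because the data is not summarized in this case, the system retains it with high accuracy. When the resources available for storing data become scarce, the system summarizes the data stored in the unsummarized data storage unit 30. As in the first embodiment, it then stores the data efficiently, in little storage capacity, as approximate expressions and their effective domains. Like the other embodiments, the data summarization system of the third embodiment therefore summarizes data efficiently. In addition, it retains data with high accuracy while ample resources are available.
 The components shown in FIG. 23 need not each be implemented by a single device; a component may be implemented by a plurality of devices. For example:

- A first information processing device may implement the data input unit 10, the available storage monitoring unit 31, the summarization control unit 32, the unsummarized data storage unit 30, and the final time storage unit 11.
- A second information processing device may implement the new data generation time substitution unit 12, the new approximate expression generation unit 14, the uncertain point storage unit 13, the accuracy constraint input unit 16, the graph evaluation unit 17, and the graph update unit 18.
- A database device may implement the approximate expression storage unit 15 and the confirmed graph storage unit 19.

The data summarization system may then consist of the first information processing device, the second information processing device, and the database device.
 As in the second embodiment, the data summarization system of the third embodiment may include a default expression input unit 20 and a default expression storage unit 21 (see FIG. 20). It may also include a graph evaluation unit 17a and a graph update unit 18a (see FIG. 20) instead of the graph evaluation unit 17 and the graph update unit 18. A data summarization system with such a configuration can summarize data even more efficiently, as in the second embodiment.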
Embodiment 4.
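 The fourth embodiment uses a default expression, as the second embodiment does. In addition, the data summarization system of the fourth embodiment compares two storage capacities:

- the storage capacity required for the effective domain of each approximate expression other than the default expression;
- the storage capacity that would be required if an effective domain were defined for the default expression.

If some approximate expression's effective domain requires more storage than the default expression's would, the system makes that approximate expression the default expression.
 FIG. 25 is a block diagram showing an example of the data summarization system according to the fourth embodiment of the present invention. Components similar to those of the second embodiment are given the same reference numerals as in FIG. 20, and their detailed description is omitted. The data summarization system of the fourth embodiment includes the following units:

- a data input unit 10;
- a final time storage unit 11;
- a new data generation time substitution unit 12;
- a new approximate expression generation unit 14;
- an uncertain point storage unit 13;
- an approximate expression storage unit 15;
- an accuracy constraint input unit 16;
- a graph evaluation unit 17a;
- a graph update unit 18a;
- a confirmed graph storage unit 19;
- a default expression input unit 20;
- a default expression storage unit 21;
- a default expression effective domain calculation unit 40;
- an effective domain capacity evaluation unit 41;
- an effective domain update unit 42.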
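 As described in the second embodiment, no effective domain is defined for the default expression stored in the default expression storage unit 21. The default expression effective domain calculation unit 40 calculates the effective domain that the default expression would have, and outputs it to the effective domain capacity evaluation unit 41. This makes it possible to compare the storage needed for the effective domain of each approximate expression other than the default expression with the storage the default expression's effective domain would need.

The default expression effective domain calculation unit 40 calculates the default expression's effective domain as follows. Let U denote the period from the occurrence time of the first generated data to the occurrence time of the most recent generated data at the present time. Let S_0, S_1, ..., S_n denote the effective domains of the approximate expressions x = f0(t), x = f1(t), ..., x = fn(t) that have effective domains, that is, the approximate expressions other than the default expression. Let S_default denote the effective domain of the default expression. The unit obtains S_default by evaluating Equation (1) below.
$$S_{\mathrm{default}} = U \setminus \left( S_0 \cup S_1 \cup \cdots \cup S_n \right) \qquad (1)$$
 That is, the default expression effective domain calculation unit 40 starts from the period between the occurrence time of the first generated data and that of the most recent generated data. It removes the union of the effective domains of the approximate expressions from that period, and the remainder is the effective domain S_default of the default expression.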
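Because occurrence times are integers, Equation (1) reduces to a gap search over sorted closed intervals. Below is a minimal sketch, assuming each effective domain is a list of `(start, end)` integer pairs and that a single time point is written `(t, t)`. The function name is hypothetical, and the example times are illustrative values, since the boundary times of FIG. 26 are not given in the text.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # closed integer interval [start, end]; (t, t) is one point


def default_domain(universe: Interval, domains: List[List[Interval]]) -> List[Interval]:
    """Equation (1): U minus the union of all stored effective domains.

    Times are integers, so the gap after an interval ending at b starts at b + 1.
    """
    start, end = universe
    intervals = sorted(iv for dom in domains for iv in dom)
    result: List[Interval] = []
    cursor = start  # earliest time not yet known to be covered
    for a, b in intervals:
        if b < cursor:
            continue  # lies entirely inside the part already covered
        if a > cursor:
            # Uncovered gap [cursor, a - 1], clipped to the end of U.
            result.append((cursor, min(a - 1, end)))
        cursor = max(cursor, b + 1)
        if cursor > end:
            break
    if cursor <= end:
        result.append((cursor, end))
    return result
```

With illustrative times, `default_domain((0, 69), [[(0, 9), (30, 39), (60, 69)], [(10, 19), (40, 49)]])` returns `[(20, 29), (50, 59)]`. These are the two gaps that no stored approximate expression covers.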
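 FIG. 26 is an explanatory diagram showing an example of deriving the effective domain of the default expression. In FIG. 26, the horizontal axis represents the data occurrence time t and the vertical axis represents the data value x. All data is assumed to occur within the period U = [T_start, T_end] shown in FIG. 26, and times are integers not less than T_start.

In FIG. 26, the default expression is x = f0(t). The approximate expressions x = f1(t) and x = f2(t) are not default expressions. The effective domain of x = f1(t) is I11 ∪ I12 ∪ I13, and the effective domain of x = f2(t) is I21 ∪ I22. Each of these is a closed interval of integer times. The interval from time T_j to time T_k is written [T_j, T_k]. Because times are integers, an interval that starts right after another interval ending at T_k is written [T_k + 1, T_l]. FIG. 27 summarizes the effective domains of x = f1(t) and x = f2(t).
 In this example, the default expression effective domain calculation unit 40 calculates the effective domain of the default expression as in Equation (2) below.
$$S_{\mathrm{default}} = U \setminus \left( (I_{11} \cup I_{12} \cup I_{13}) \cup (I_{21} \cup I_{22}) \right) = I_{01} \cup I_{02} \qquad (2)$$
 Here, I01 and I02 are the intervals shown in FIG. 26.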
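 The effective domain capacity evaluation unit 41 refers to two kinds of effective domain:

- the default expression's effective domain, received from the default expression effective domain calculation unit 40;
- the effective domain of each approximate expression, stored in the confirmed graph storage unit 19.

It then identifies the approximate expression whose effective domain requires the most storage capacity. It outputs that approximate expression and the default expression's effective domain to the effective domain update unit 42. Instead of the approximate expression itself, it may output the approximate expression ID or the default expression ID.
 In the example of FIG. 27, the following storage is required, counting one start time and one end time per interval:

| Expression | Effective domain | Intervals | Numeric values |
|---|---|---|---|
| x = f1(t) (ID "f1") | I11 ∪ I12 ∪ I13 | 3 | 6 |
| x = f2(t) (ID "f2") | I21 ∪ I22 | 2 | 4 |
| Default expression (calculated) | I01 ∪ I02 | 2 | 4 |

The effective domain capacity evaluation unit 41 therefore outputs two items to the effective domain update unit 42: the approximate expression x = f1(t), whose effective domain requires the most storage, and the default expression's effective domain I01 ∪ I02.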
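The storage count above can be checked mechanically. The text counts two numeric values per interval; the cost of a single time point is not stated, so the sketch below assumes one value per point. Function names are hypothetical, and the interval endpoints are illustrative, because only the interval counts of FIG. 27 matter here.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # closed interval; (t, t) denotes a single time point


def domain_cost(domain: List[Interval]) -> int:
    """Numeric values needed to store an effective domain.

    Assumption: an interval costs 2 values (start, end); a single point costs 1.
    """
    return sum(1 if a == b else 2 for a, b in domain)


def costliest(domains: Dict[str, List[Interval]]) -> Tuple[str, int]:
    """Return the ID whose effective domain needs the most storage, and its cost."""
    best = max(domains, key=lambda k: domain_cost(domains[k]))
    return best, domain_cost(domains[best])
```

For the FIG. 27 shapes (three intervals for "f1", two each for "f2" and the default), `costliest` returns `("f1", 6)`. Storing f1 and f2 together takes 10 values.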
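 The effective domain update unit 42 receives the effective domain calculated for the default expression and the approximate expression whose effective domain requires the most storage. It then updates the default expression accordingly. However, if the approximate expression whose effective domain requires the most storage is the current default expression, the unit ends processing without updating anything.
 The default expression effective domain calculation unit 40, the effective domain capacity evaluation unit 41, and the effective domain update unit 42 of the fourth embodiment are implemented, for example, by the CPU of a computer operating according to a data summarization program. In this case, a program storage device (not shown) of the computer stores the data summarization program. The CPU reads the program and, according to it, operates as the following units:

- the data input unit 10;
- the new data generation time substitution unit 12;
- the new approximate expression generation unit 14;
- the accuracy constraint input unit 16;
- the graph evaluation unit 17a;
- the graph update unit 18a;
- the default expression input unit 20;
- the default expression effective domain calculation unit 40;
- the effective domain capacity evaluation unit 41;
- the effective domain update unit 42.

Alternatively, each of these units may be implemented by separate hardware.
 Next, the operation of the fourth embodiment will be described.
 FIG. 28 is a flowchart showing an example of the default expression update process performed by the default expression effective domain calculation unit 40, the effective domain capacity evaluation unit 41, and the effective domain update unit 42 in the fourth embodiment.
 The data summarization system may run this default expression update process asynchronously with the data summarization process, which is the same as in the second embodiment. The data summarization process is performed by the following units:

- the data input unit 10;
- the final time storage unit 11;
- the new data generation time substitution unit 12;
- the new approximate expression generation unit 14;
- the uncertain point storage unit 13;
- the approximate expression storage unit 15;
- the graph evaluation unit 17a;
- the graph update unit 18a;
- the confirmed graph storage unit 19;
- the default expression storage unit 21.

For example, the system may run the default expression update process of FIG. 28 at fixed intervals. It may instead run it each time a predetermined number of pieces of generated data are input to the data input unit 10, or whenever a new approximate expression is added to the approximate expression storage unit 15. These are only examples of when to run the default expression update process; the timing is not limited to them.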
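 When the default expression update process starts, the default expression effective domain calculation unit 40 first reads the effective domains of all approximate expressions from the confirmed graph storage unit 19 (step S401). It then calculates the effective domain S_default of the default expression by evaluating Equation (1) above (step S402), and outputs that effective domain to the effective domain capacity evaluation unit 41.
 Next, the effective domain capacity evaluation unit 41 refers to S_default and to the effective domain of each approximate expression, and identifies the approximate expression whose effective domain requires the most storage (step S403). It then outputs the identified approximate expression and the default expression's effective domain to the effective domain update unit 42.
 Next, the effective domain update unit 42 determines whether the approximate expression identified in step S403 is the default expression (step S404).
 If the approximate expression identified in step S403 is not the default expression (No in step S404), the effective domain update unit 42 makes it the new default expression (step S405). Specifically, the effective domain update unit 42 performs the following processing.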
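 The effective domain update unit 42 takes the approximate expression whose effective domain requires the most storage as the new default expression, and replaces the default expression stored in the default expression storage unit 21 with it. It then deletes that approximate expression and its approximate expression ID from the approximate expression storage unit 15. Next, it stores the previous default expression in the approximate expression storage unit 15. To do so, it assigns an approximate expression ID to the previous default expression and stores the expression together with that ID.
 The effective domain update unit 42 deletes the approximate expression ID of the new default expression, and its effective domain, from the confirmed graph storage unit 19. It then stores two items in the confirmed graph storage unit 19: the ID assigned to the previous default expression, and that expression's effective domain (the effective domain calculated by the default expression effective domain calculation unit 40).
 The final time storage unit 11 also needs updating. If the final time storage unit 11 holds the ID of the new default expression as the final approximate expression ID, the effective domain update unit 42 replaces it with the default expression ID. Conversely, if it holds the default expression ID as the final approximate expression ID, the effective domain update unit 42 replaces it with the ID assigned to the previous default expression.
 If the approximate expression identified in step S403 is the default expression (Yes in step S404), the effective domain update unit 42 ends processing without updating the default expression. The contents of the default expression storage unit 21, the approximate expression storage unit 15, the confirmed graph storage unit 19, and the final time storage unit 11 are left unchanged.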
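Steps S403 to S405, including the ID bookkeeping in the final time storage unit 11, can be sketched on an in-memory state as follows. This is a hypothetical sketch, not the system's actual data layout:

- approximate expressions are `(a, b)` pairs for x = a*t + b;
- `DEFAULT_ID` and `SummaryState` are illustrative names;
- storage cost follows the same assumption as above (2 values per interval, 1 per point).

The default expression is listed first among the candidates, so a tie keeps the current default.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

DEFAULT_ID = "default"  # hypothetical reserved ID for the default expression
Interval = Tuple[int, int]


def cost(domain: List[Interval]) -> int:
    """Assumed storage cost: 2 values per interval, 1 per single point."""
    return sum(1 if a == b else 2 for a, b in domain)


@dataclass
class SummaryState:
    default_expr: Tuple[float, float]              # no effective domain stored
    exprs: Dict[str, Tuple[float, float]]          # approximate expression ID -> (a, b)
    domains: Dict[str, List[Interval]]             # approximate expression ID -> domain
    final_id: str                                  # ID of the final approximate expression


def update_default(state: SummaryState, default_domain: List[Interval],
                   fresh_id: str) -> bool:
    """Return True if the default expression was replaced."""
    # S403: find the costliest domain; listing the default first means a tie keeps it.
    candidates = {DEFAULT_ID: default_domain, **state.domains}
    best = max(candidates, key=lambda k: cost(candidates[k]))
    if best == DEFAULT_ID:  # S404 "Yes": leave every store unchanged
        return False
    # S405: promote `best`; demote the old default under `fresh_id`.
    old_default = state.default_expr
    state.default_expr = state.exprs.pop(best)
    del state.domains[best]
    state.exprs[fresh_id] = old_default
    state.domains[fresh_id] = list(default_domain)
    # Keep the final approximate expression ID consistent.
    if state.final_id == best:
        state.final_id = DEFAULT_ID
    elif state.final_id == DEFAULT_ID:
        state.final_id = fresh_id
    return True
```

With the FIG. 26 shapes and illustrative times, promoting f1 brings the stored effective domains from 10 values down to 8. Running the update again with the recomputed default domain then changes nothing.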
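 The data summarization system of the fourth embodiment compares the effective domains of all approximate expressions, including the default expression. It makes the approximate expression whose effective domain requires the most storage the new default expression. Since no effective domain is stored for the default expression, this update reduces the storage needed for effective domains, so the system can summarize data even more efficiently.
 For example, suppose data is input as illustrated in FIG. 26 and x = f0(t) is the default expression. As shown in FIG. 27, the effective domains of x = f1(t) and x = f2(t) then require storage for ten numeric values. Now suppose the data summarization system of the fourth embodiment makes x = f1(t) the default expression and stores the former default x = f0(t) together with its effective domain. The system then only needs to store I01 ∪ I02 (four values) instead of I11 ∪ I12 ∪ I13 (six values). In this example, the storage needed for effective domains therefore drops by two numeric values, from ten to eight.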
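 Like the third embodiment, the data summarization system of the fourth embodiment may include an unsummarized data storage unit 30, an available storage monitoring unit 31, and a summarization control unit 32. In that configuration, it stores data as is while ample resources are available for storing data, and summarizes data once those resources decrease.
 In each of the above embodiments, the data value is a number. Instead of a number, the data may include a data value that can be converted into a number, provided that the difference between two converted values can be derived. For example, text information may be used as the data value if rules for converting it into numbers are defined. When sequentially generated data includes such text information and an occurrence time, the data input unit 10, for example, converts the text information into a number. The rest of the processing is the same as in the above embodiments.
 Each of the above embodiments uses a scalar quantity as the data value, but the data value may instead be a vector; that is, data may include a vector and an occurrence time. In this case, the new approximate expression generation unit 14 generates, from a plurality of pieces of data (the uncertain points and the new generated data), an approximate expression that yields an approximate value of the vector.

In step S106 (see FIG. 16), the graph evaluation unit 17 or the graph evaluation unit 17a compares the vector calculated as the approximate value with the vector actually contained in the data. It may do so by calculating the distance between the two vectors in vector space. Then, in step S107, the graph evaluation unit 17 or the graph evaluation unit 17a determines whether any approximate expression has a distance less than (or not greater than) the threshold ε.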
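For the vector-valued variant, the check in steps S106 and S107 compares each approximate vector with the actual vector by distance. Below is a minimal sketch, assuming Euclidean distance via Python's `math.dist` (Python 3.8 or later). It also assumes a hypothetical linear approximate expression per component; the text does not fix the form of the vector expression. The `inclusive` flag selects between "less than ε" and "not greater than ε".

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

VectorExpr = Callable[[float], Tuple[float, ...]]


def linear_vector_expr(coeffs: Sequence[Tuple[float, float]]) -> VectorExpr:
    """Hypothetical vector-valued expression: component k is a_k * t + b_k."""
    pairs = list(coeffs)
    return lambda t: tuple(a * t + b for a, b in pairs)


def suitable_exprs(exprs: Dict[str, VectorExpr], t: float,
                   actual: Sequence[float], eps: float,
                   inclusive: bool = False) -> List[str]:
    """IDs whose approximate vector at time t lies within eps of the actual vector."""
    result = []
    for eid, f in exprs.items():
        d = math.dist(f(t), actual)  # Euclidean distance in vector space
        if d < eps or (inclusive and d == eps):
            result.append(eid)
    return result
```

An empty result corresponds to "no suitable approximate expression": the new data then becomes an uncertain point.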
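 Modes for carrying out the present invention have been described above, but the present invention is not limited to these embodiments. Various modifications and additions that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.
 Next, the minimum configuration of the present invention will be described. FIG. 29 is a block diagram showing the minimum configuration of the present invention. The data summarization system of the present invention includes an approximate value calculation unit 61, an approximate expression evaluation unit 62, an uncertain data storage unit 63, a new approximate expression generation unit 64, and an update unit 65.
 The approximate value calculation unit 61 (for example, the new data generation time substitution unit 12) works with approximate expressions. Each approximate expression calculates an approximate value of the data value in data that includes a data value and the occurrence time of that data value. It takes the occurrence time as its variable, and its effective domain for that variable is defined as a set of time intervals or single time points. The unit substitutes the occurrence time contained in new data into each approximate expression, and thereby calculates an approximate value of the new data's data value for each approximate expression.
 The approximate expression evaluation unit 62 (for example, the graph evaluation unit 17) compares the approximate value calculated for each approximate expression with the data value of the new data. Based on this, it either selects an approximate expression suitable for approximating the new data's data value, or determines that no such approximate expression exists.
 The uncertain data storage unit 63 (for example, the uncertain point storage unit 13) handles new data for which no approximate expression is suitable. It stores such data as approximate-expression-undetermined data (for example, an uncertain point).
 The new approximate expression generation unit 64 (for example, the new approximate expression generation unit 14) acts when new data is input to it. It first determines whether a new approximate expression can be generated from the new data and the approximate-expression-undetermined data. If so, it generates the new approximate expression and defines a set of time intervals or single time points as that expression's effective domain.
 The update unit 65 (for example, the graph update unit 18) acts when the approximate expression evaluation unit 62 selects an approximate expression suitable for approximating the new data's data value. It updates the effective domain of that approximate expression so that the effective domain includes the new data's occurrence time.
 A data summarization system with the above configuration stores each piece of data as an approximate expression and its effective domain. The effective domain of one approximate expression is defined as a set of time intervals or single time points. As a result, little storage is needed for the approximate expressions and their effective domains, and the system can summarize (compress) data efficiently. This advantage is especially pronounced for data that is generated sequentially with a constant tendency but may occasionally change greatly and irregularly.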
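 The above embodiments also describe data summarization systems configured as follows.
(1) A data summarization system comprising the following units.
  - An approximate value calculation unit (for example, the new data generation time substitution unit 12). It works with approximate expressions, each of which calculates an approximate value of the data value in data that includes a data value and the occurrence time of that data value. Each approximate expression takes the occurrence time as its variable and has an effective domain for that variable defined as a set of time intervals or single time points. The unit calculates, for each approximate expression, an approximate value of the data value of new data by substituting the occurrence time contained in the new data.
  - An approximate expression evaluation unit (for example, the graph evaluation unit 17). Based on the approximate values calculated for each approximate expression and the data value of the new data, it selects an approximate expression suitable for approximating the new data's data value, or determines that no such approximate expression exists.
  - An uncertain data storage unit (for example, the uncertain point storage unit 13). It stores new data for which no suitable approximate expression exists as approximate-expression-undetermined data (for example, uncertain points).
  - A new approximate expression generation unit (for example, the new approximate expression generation unit 14). When new data is input, it determines whether a new approximate expression can be generated from the new data and the approximate-expression-undetermined data. If so, it generates the new approximate expression and defines a set of time intervals or single time points as that expression's effective domain.
  - An update unit (for example, the graph update unit 18). When the approximate expression evaluation unit selects an approximate expression suitable for approximating the new data's data value, it updates that approximate expression's effective domain to include the occurrence time of the new data.
(2) The data summarization system in which the approximate expression evaluation unit identifies the approximate expressions whose approximate values satisfy a predetermined criterion in relation to the new data's data value (for example, a criterion stored by the accuracy constraint input unit 16). The unit then chooses as follows.
  - If exactly one approximate expression satisfies the criterion, it selects that expression.
  - If several satisfy it, it selects the one whose effective domain would need the smallest increase in storage after being updated to include the new data's occurrence time.
  - If none satisfies it, it determines that no approximate expression is suitable for approximating the new data's data value.
(3) The data summarization system further comprising a default expression storage unit (for example, the default expression storage unit 21) that stores a default expression, that is, an approximate expression for calculating an approximate value of the data value that has no effective domain. In this system, the approximate value calculation unit substitutes the occurrence time contained in the new data into each approximate expression, including the default expression, and calculates an approximate value of the new data's data value for each. The approximate expression evaluation unit identifies, among the approximate expressions including the default expression, those whose approximate values satisfy a predetermined criterion in relation to the new data's data value, and then chooses as follows.
  - If exactly one satisfies the criterion, it selects that expression.
  - If several satisfy it and the default expression is among them, it selects the default expression.
  - If several satisfy it and the default expression is not among them, it selects the one whose effective domain would need the smallest increase in storage after being updated to include the new data's occurrence time.
  - If none satisfies it, it determines that no approximate expression is suitable for approximating the new data's data value.
(4) The data summarization system further comprising the following units.
  - A default expression effective domain calculation unit (for example, the default expression effective domain calculation unit 40) that calculates the effective domain of the default expression.
  - An effective domain capacity evaluation unit (for example, the effective domain capacity evaluation unit 41) that identifies, among the approximate expressions including the default expression, the one whose effective domain requires the most storage.
  - A default expression update unit (for example, the effective domain update unit 42). If the approximate expression whose effective domain requires the most storage is not the default expression, it stores that expression in the default expression storage unit as the new default expression and removes that expression and its effective domain.
(5) The data summarization system further comprising the following units.
  - A new data storage unit (for example, the unsummarized data storage unit 30) that stores input new data.
  - A monitoring unit (for example, the available storage monitoring unit 31) that monitors the resources available in the new data storage unit for storing new data.
  - A summarization control unit (for example, the summarization control unit 32). When the resources available for storing new data fall below a predetermined amount, it outputs the new data stored in the new data storage unit, one piece at a time, to the approximate value calculation unit and the new approximate expression generation unit.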
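Items (2) and (3) describe a selection rule applied to the approximate expressions that already meet the accuracy criterion: prefer the default expression, otherwise take the smallest growth in stored effective domain. Below is a minimal sketch under these assumptions:

- effective domains are lists of closed integer intervals;
- adjacent times merge into one interval;
- storage cost is 2 values per interval and 1 per single point (this cost model is not stated in the text).

`add_time` and `select_expression` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # closed integer interval; (t, t) is a single point


def cost(domain: List[Interval]) -> int:
    # Assumed storage model: 2 numbers per interval, 1 per single point.
    return sum(1 if a == b else 2 for a, b in domain)


def add_time(domain: List[Interval], t: int) -> List[Interval]:
    """Insert integer time t, merging touching or overlapping intervals."""
    merged: List[Interval] = []
    for a, b in sorted(domain + [(t, t)]):
        if merged and a <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged


def select_expression(candidates: List[str], domains: Dict[str, List[Interval]],
                      t: int, default_id: str = "default") -> Optional[str]:
    """Apply rules (2)/(3) to the IDs that already met the accuracy criterion."""
    if not candidates:
        return None                     # new data becomes an uncertain point
    if len(candidates) == 1:
        return candidates[0]
    if default_id in candidates:
        return default_id               # the default has no domain to grow
    return min(candidates,
               key=lambda k: cost(add_time(domains[k], t)) - cost(domains[k]))
```

Extending the interval (0, 9) to time 10 costs nothing extra, while adding 10 next to the isolated point 0 adds a new point. The first candidate therefore wins.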
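 The above embodiments also describe the following data structures.
(1) A data structure in which an approximate expression is associated with an effective domain. The approximate expression calculates an approximate value of a data value when a variable is substituted into it. The effective domain is the domain of the variable over which that approximate value can be obtained, and it is represented as a set of intervals of the variable or of points, each point representing a single value of the variable.
(2) A data structure in which the effective domain of one approximate expression may have intervals or points of another approximate expression's effective domain lying between its own parts. This holds whether those parts are two intervals, two points, or an interval and a point.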
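The two data structures above can be modeled as entries that pair an expression with a list of intervals and points. Because entries are looked up by time rather than by position, intervals of different expressions are free to interleave. The sketch below makes several assumptions:

- the class and function names are hypothetical;
- expressions are plain callables;
- reconstruction falls back to an optional default expression for times outside every stored domain, as in the second embodiment.

Uncertain points, which are stored separately as raw data, are ignored here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Interval = Tuple[int, int]  # closed interval of the variable; (t, t) is one point


@dataclass
class ApproxEntry:
    """One approximate expression paired with its effective domain."""
    expr: Callable[[int], float]
    domain: List[Interval]

    def covers(self, t: int) -> bool:
        return any(a <= t <= b for a, b in self.domain)


def reconstruct(entries: List[ApproxEntry], t: int,
                default: Optional[Callable[[int], float]] = None) -> Optional[float]:
    """Approximate the data value at time t from the summarized form."""
    for entry in entries:
        if entry.covers(t):
            return entry.expr(t)
    # Outside every stored effective domain: use the default expression
    # if one is configured, otherwise report that no value is available.
    return default(t) if default is not None else None
```

In the test below, the domain of the constant expression sits between the two intervals of the linear one, which is the interleaving that data structure (2) permits.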
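 This application claims priority based on Japanese Patent Application No. 2009-205012, filed on September 4, 2009, the entire disclosure of which is incorporated herein.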
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
First, an outline of a data summarization system according to an embodiment of the present invention will be described. A data summarization system according to an embodiment of the present invention reduces the amount of information in data that is generated sequentially over time. Storing the data exactly as it is would require more storage; by reducing the amount of information, the data summarization system according to an aspect of the present invention reduces the storage capacity needed. This reduction of the amount of information in data is called "summarization".
Summarization generally lowers the accuracy of data. However, a data summarization system according to an embodiment of the present invention accurately and efficiently summarizes a particular kind of data: data generated sequentially over time with a constant tendency, whose tendency may change irregularly. "CPU usage rate" has been given as an example of such data, but such data is not limited to the CPU usage rate. Other examples of "data that is generated sequentially with a constant tendency and whose tendency may change irregularly" include:
- the number of accesses per unit time to an observed Web page;
- the total number of accesses since the Web page was published;
- the communication volume of a network device.
"Sequentially generated data" is a concept that includes "sequentially observable data".
The following description uses numerical values that occur over time as the example of "data that is generated sequentially with a constant tendency and whose tendency may change irregularly". However, the data itself need not be numerical. Any data can be used, as long as it can be converted into numerical values and the difference between two converted values can be derived. An example of applying the present invention to such non-numerical data is described later.
In the example shown below, each data includes a data value and an occurrence time of the data value. In the following description, the occurrence time of this data value may be simply referred to as the occurrence time of data.
In addition, from a set of generated data, the data summarization system according to an aspect of the present invention derives a function that calculates the data value (a numerical value) with the occurrence time as its variable. This function is an approximate expression that gives an approximate value of the data value from the occurrence time. Each approximate expression has a domain over which it yields an approximate value of the data value; this domain is hereinafter referred to as the effective domain. The effective domain is a set of sections (time periods) or points (specific times), and one effective domain may consist of several sections or points.

FIG. 1 is an explanatory diagram showing an example of effective domains. In FIG. 1, the horizontal axis represents time and the vertical axis represents data values. In FIG. 1, the data value changes with a consistent tendency over the section a to b. The tendency changes greatly at time b and changes again at time c. Assume that functions 91 and 92 are obtained as approximate expressions. In the example of FIG. 1, the approximate expression represented by function 91 gives the approximate values for the data generated in the sections a to b and c to d. Its effective domain is therefore the set of sections a to b and c to d. The effective domain of the approximate expression represented by function 92 is the section b to c.
When new data is generated, the data summarization system according to an aspect of the present invention determines whether any approximate expression appropriately yields an approximate value of the new data's value. If one does, the occurrence time of the new data is added to the effective domain of that approximate expression. If none does, the new data is stored as a point whose corresponding approximate expression has not been determined (hereinafter referred to as an uncertain point). Once enough uncertain points have accumulated to derive a new approximate expression, the data summarization system according to an embodiment of the present invention creates an approximate expression from them.
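The overall flow just described can be sketched as a small class. A datum that fits an existing approximate expression extends that expression's effective domain. Otherwise the datum becomes an uncertain point, and once enough uncertain points accumulate, a new expression is fitted. The sketch makes these assumptions:

- the thresholds `K = 4` and `EPS = 1.0` are illustrative;
- a new expression is a least-squares line;
- the first fitting expression wins (the storage-based selection rule of the later embodiments is not modeled);
- effective domains are kept as plain lists of times rather than intervals.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

K = 4      # assumed number of points needed to fit a new expression
EPS = 1.0  # assumed accuracy threshold: |approximation - actual| < EPS


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares line x = a*t + b through (t, x) points."""
    n = len(points)
    st = sum(t for t, _ in points)
    sx = sum(x for _, x in points)
    stt = sum(t * t for t, _ in points)
    stx = sum(t * x for t, x in points)
    denom = n * stt - st * st
    if denom == 0:
        raise ValueError("need at least two distinct occurrence times")
    a = (n * stx - st * sx) / denom
    return a, (sx - a * st) / n


@dataclass
class Summarizer:
    exprs: Dict[str, Tuple[float, float]] = field(default_factory=dict)  # ID -> (a, b)
    domains: Dict[str, List[float]] = field(default_factory=dict)        # ID -> covered times
    uncertain: List[Tuple[float, float]] = field(default_factory=list)   # uncertain points

    def add(self, t: float, x: float) -> None:
        # Evaluate every existing approximate expression at the new occurrence time.
        fitting = [eid for eid, (a, b) in self.exprs.items() if abs(a * t + b - x) < EPS]
        if fitting:
            self.domains[fitting[0]].append(t)  # extend the effective domain
            return
        # No expression fits: keep the datum as an uncertain point.
        self.uncertain.append((t, x))
        if len(self.uncertain) >= K:
            eid = f"f{len(self.exprs)}"
            self.exprs[eid] = fit_line(self.uncertain)
            self.domains[eid] = [p[0] for p in self.uncertain]
            self.uncertain.clear()
```

Feeding four points on x = 2t produces the expression `f0` = (2.0, 0.0). A fifth point on the same line only extends f0's effective domain, while an outlier becomes a new uncertain point.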
As described above, the data summarization system according to an embodiment of the present invention stores data as approximate expressions and their effective domains, instead of storing the occurrence time and data value of every piece of data. Furthermore, the data summarization system according to an aspect of the present invention allows the effective domain of one approximate expression to consist of several sections and points (specific times). As a result, it compresses (that is, summarizes) data efficiently.
Embodiment 1.
FIG. 2 is a block diagram illustrating an example of the data summarization system according to the first embodiment of the present invention. The data summarization system of the first embodiment includes the following units:
- a data input unit 10;
- a final time storage unit 11;
- a new data generation time substitution unit 12;
- a new approximate expression generation unit 14;
- an uncertain point storage unit 13;
- an approximate expression storage unit 15;
- an accuracy constraint input unit 16;
- a graph evaluation unit 17;
- a graph update unit 18;
- a confirmed graph storage unit 19.

The data summarization system of the first embodiment summarizes the data sequentially input to the data input unit 10 and stores the summarized result in the confirmed graph storage unit 19.
The data input unit 10 acquires data from a data generation source (not shown) that sequentially generates data over time. The mode of the data source differs depending on the type of data. For example, when the data is the number of web page accesses, the web server may be the data generation source. Further, when the data is the usage rate of the CPU, the unit that monitors the usage rate of the CPU may be the data generation source. Each piece of data input from the data generation source to the data input unit 10 includes at least a data value and a generation time of the data value.
FIG. 3 is an explanatory diagram illustrating an example of one data input to the data input unit 10. FIG. 3 illustrates the case where the data value is the CPU usage rate. As shown in FIG. 3, the data includes the generation time of the data value and the data value (CPU usage rate in this example). The data illustrated in FIG. 3 indicates that the data generation time is “2009/01/01 00:00:00” and the CPU usage rate at that time is 5.0%. Data generated at the data generation source and input to the data input unit 10 may be referred to as generation data.
The uncertain point storage unit 13 is a storage device that stores data for which no approximate expression yielding an approximate value of the data value has yet been identified. A piece of data is stored there as an uncertain point when the difference between its actual data value and the approximate value is judged to be large for every existing approximate expression. In the first embodiment, each piece of data can be regarded as a point whose coordinates are its occurrence time and its data value. For this reason, data for which no approximate expression has yet been identified is called an "uncertain point".
The approximate expression that yields an approximate value of the data value from the occurrence time is derived from several uncertain points stored in the uncertain point storage unit 13. The uncertain point storage unit 13 keeps the generated data corresponding to the uncertain points until there are enough pieces of data to determine an approximate expression. How many pieces of data are needed depends on the type of approximate expression (linear, quadratic, trigonometric, and so on) and on the algorithm used to determine it. Both the type and the algorithm are fixed in advance, so the required number of pieces of data is also fixed in advance.
FIG. 4 is an explanatory diagram showing an example of the uncertain points stored in the uncertain point storage unit 13. Each uncertain point includes its occurrence time and its data value (the CPU usage rate in this example). An uncertain point is simply generated data for which no corresponding approximate expression could be identified, so its data structure is the same as that of the data illustrated in FIG. 3.
When newly generated data is input to the new approximate expression generation unit 14, the unit checks whether the new data, together with the uncertain points stored in the uncertain point storage unit 13, reaches the number of pieces of data needed to determine an approximate expression. If it does, the unit generates a function (approximate expression) that calculates the data value with the occurrence time as its variable. For example, suppose that k pieces of data are needed to generate an approximate expression and that k−1 uncertain points are stored in the uncertain point storage unit 13. When one new piece of generated data is input, the new approximate expression generation unit 14 generates an approximate expression from the k pieces of data formed by the uncertain points plus the new data.
The new approximate expression generation unit 14 then outputs the generated approximate expression to the graph update unit 18. It also outputs the data used to generate it, that is, the uncertain points stored in the uncertain point storage unit 13 and the newly generated data. Instead of outputting the new approximate expression itself to the graph update unit 18, the new approximate expression generation unit 14 may store the approximate expression and its ID (identification information) in the approximate expression storage unit 15 and output only the ID to the graph update unit 18.
FIG. 5 is an explanatory diagram schematically showing the uncertain points used to generate an approximate expression and the newly generated data. The horizontal axis in FIG. 5 represents time t and the vertical axis represents the data value x. Here, the new approximate expression generation unit 14 is assumed to generate a linear expression as the approximate expression by the least squares method, and four pieces of data are assumed to be required in this case. Suppose that the three uncertain points P400 shown in FIG. 5 are stored in the uncertain point storage unit 13 and new data P401 is input to the data input unit 10. The new approximate expression generation unit 14 then obtains an approximate expression by the least squares method from the three uncertain points P400 and the new data P401.
FIG. 6 is an explanatory diagram schematically showing the approximate expression generated from the data shown in FIG. 5. The new approximate expression generation unit 14 starts from the four pieces of data, that is, four pairs of occurrence time and data value. From them, it determines by the least squares method the approximate expression x = at + b, which expresses the data value x as a linear function of the occurrence time t. In other words, the least squares method determines the coefficient a of the variable t and the constant term b, and this determines the approximate expression.
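For a linear expression x = at + b, the least squares method has a closed-form solution from the usual sums over the n pairs (t, x):

- a = (nΣtx − ΣtΣx) / (nΣt² − (Σt)²)
- b = (Σx − aΣt) / n

The sketch below computes a and b for the four pairs of FIG. 5. The function name is hypothetical, and the sample points are illustrative, since the coordinates of P400 and P401 are not given in the text.

```python
from typing import List, Tuple


def least_squares_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (a, b) minimizing sum((a*t + b - x)**2) over (t, x) pairs."""
    n = len(points)
    st = sum(t for t, _ in points)
    sx = sum(x for _, x in points)
    stt = sum(t * t for t, _ in points)
    stx = sum(t * x for t, x in points)
    denom = n * stt - st * st
    if denom == 0:
        raise ValueError("all occurrence times are identical")
    a = (n * stx - st * sx) / denom
    b = (sx - a * st) / n
    return a, b
```

Three illustrative uncertain points plus one new datum on x = 2t + 1, `[(0, 1), (1, 3), (2, 5), (3, 7)]`, give `(2.0, 1.0)`. Only this coefficient and constant term need to be stored, in the format of FIG. 7.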
FIG. 7 is an explanatory diagram showing an example of the expression format of the approximate expression generated by the new approximate expression generation unit 14. The approximate expression “x = at + b” is uniquely determined if the coefficient (primary coefficient) of the variable t and the constant term are determined. Therefore, when the approximate expression is a linear expression, the approximate expression may be expressed by only the primary coefficient and the constant term of the variable t as illustrated in FIG.
In this example, the new approximate expression generation unit 14 obtains an approximate expression in the form of a linear function, by the least squares method, from three uncertain points and one piece of new data. However, the approximate expression is not limited to a linear function. For example, the new approximate expression generation unit 14 may generate an approximate expression that is a polynomial of degree two or higher, an exponential function, or a trigonometric function. Nor is the generation method limited to the least squares method; another method may be used. As already described, the number of pieces of data needed depends on the type of approximate expression and on the generation method. The new approximate expression generation unit 14 may generate an approximate expression once the uncertain points plus the new data reach that number.

The unit may also generate, as the approximate expression, a linear function connecting two points. In that case two pieces of data suffice: from one uncertain point and one piece of new data, the unit generates the straight line connecting the two points in the plane of occurrence time t and data value x. The new approximate expression generation unit 14 may also generate an approximate expression by other methods, such as spline interpolation.
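The two-point variant needs no fitting at all, since the line through two (t, x) points is determined directly. Below is a minimal sketch; the function name and the `REQUIRED_POINTS` table are hypothetical. The counts in the table are the ones used in this document's examples: four for the least-squares line, two for the line through two points.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]  # (occurrence time t, data value x)

# Pieces of data needed per generation method, as used in this document's examples.
REQUIRED_POINTS: Dict[str, int] = {"least_squares_line": 4, "two_point_line": 2}


def line_through(p: Point, q: Point) -> Tuple[float, float]:
    """Return (a, b) of the straight line x = a*t + b through two points."""
    (t1, x1), (t2, x2) = p, q
    if t1 == t2:
        raise ValueError("occurrence times must differ")
    a = (x2 - x1) / (t2 - t1)
    return a, x1 - a * t1
```

With one uncertain point (1, 3) and a new datum (3, 7), `line_through` returns `(2.0, 1.0)`, that is, x = 2t + 1.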
In the following description, a case where the new approximate expression generation unit 14 generates an approximate expression of a linear function will be described as an example.
The approximate expression storage unit 15 is a storage device that stores approximate expressions, each of which gives a data value with the occurrence time as its variable, together with their IDs. When the new approximate expression generation unit 14 generates an approximate expression and outputs it to the graph update unit 18, the graph update unit 18 stores the expression in the approximate expression storage unit 15 together with an ID. The graph update unit 18 may assign that ID itself. For example, the approximate expression storage unit 15 may store the combination of the first-order coefficient and the constant term, as in FIG. 7. This storage format is only an example, and the approximate expression storage unit 15 may store approximate expressions in other formats.
FIG. 8 is an explanatory diagram illustrating an example of the approximate expressions stored in the approximate expression storage unit 15 and their IDs. As shown in FIG. 8, the approximate expression storage unit 15 stores each approximate expression ID (the identification information of an approximate expression) in association with the approximate expression. In this example, the approximate expression is expressed as a combination of a first-order coefficient and a constant term.
The final time storage unit 11 is a storage device that stores a pair: the occurrence time of the most recently generated data other than uncertain points, and the approximate expression that approximates that data's value. Put differently, consider only the data for which an approximate expression yielding an appropriate approximate value has been identified. The final time storage unit 11 stores the occurrence time of the last such data, paired with the approximate expression that approximates it. Hereinafter, this pair of data occurrence time and approximate expression is called the final data information. The approximate expression indicated by the final data information is called the final approximate expression.
FIG. 9 is an explanatory diagram showing an example of final data information stored in the final time storage unit 11. In the example shown in FIG. 9, the final data information includes an approximate expression ID and a final time. In this example, the approximate expression is specified by the approximate expression ID. The final time is the data generation time of the data that has occurred last among the data that can be approximated by the approximate expression. The uncertain point stored in the uncertain point storage unit 13 is data that cannot be approximated by each known approximate expression, and thus is not a target to be stored in the final time storage unit 11. Therefore, even if an undetermined point occurs after the final time stored in the final time storage unit 11, the final time stored in the final time storage unit 11 is not updated.
In FIG. 9, the approximate expression is represented by its ID. The final time storage unit 11 may instead store the approximate expression in the same format used by the new approximate expression generation unit 14 when generating it and by the approximate expression storage unit 15 when storing it. For example, the final time storage unit 11 may store the combination of the first-order coefficient and the constant term, instead of the approximate expression ID, as the information representing the approximate expression.
The new data generation time substitution unit 12 substitutes the occurrence time of the generated data newly input to the data input unit 10 into each previously generated approximate expression, and thereby calculates approximate values of the data value. It then outputs to the graph evaluation unit 17 each pair of approximate expression and approximate value calculated with it. At the same time, it reads the final data information from the final time storage unit 11. It also outputs to the graph evaluation unit 17 information indicating which pair belongs to the final approximate expression, that is, the pair consisting of the final approximate expression and the approximate value obtained from it. Finally, it outputs the generated data newly input to the data input unit 10 to the graph evaluation unit 17.
Note that obtaining an approximate value by substituting the generation time of newly generated data into an approximate expression amounts to extending the line represented by that approximate expression, drawn in a plane whose horizontal axis is time and whose vertical axis is the data value, up to the latest generation time on the horizontal axis.
FIG. 10 is an explanatory diagram showing an example of processing by the new data generation time substitution unit 12. In the example shown in FIG. 10, four approximate expressions x = f0(t), x = f1(t), x = f2(t), and x = f3(t) are assumed to have been generated in the past. The value of x given by x = f0(t) is the constant 0; that is, f0(t) = 0 · t + 0. Each black circle in FIG. 10 represents one datum. Data plotted near the line of an approximate expression are data whose values can be approximated by that expression. For example, in FIG. 10, the values of six data are approximated by x = f1(t).
In FIG. 10, the final time indicated by the final data information is $t_{\mathrm{last}}$, and the datum $P_{1010}$ generated at the final time $t_{\mathrm{last}}$ is approximated by the approximate expression x = f2(t).
Also, suppose that the generation time of the new datum input to the data input unit 10 after the final time $t_{\mathrm{last}}$ is $t = t_i$.
The new data generation time substitution unit 12 substitutes the generation time $t_i$ of the new datum into each of the approximate expressions x = f0(t), x = f1(t), x = f2(t), and x = f3(t) stored in the approximate expression storage unit 15 and calculates approximate values. Denoting these approximate values by $X_{1010}$, $X_{1011}$, $X_{1012}$, and $X_{1013}$, the new data generation time substitution unit 12 calculates $X_{1010} = f0(t_i)$, $X_{1011} = f1(t_i)$, $X_{1012} = f2(t_i)$, and $X_{1013} = f3(t_i)$. It also reads the final data information from the final time storage unit 11 and determines that the pair corresponding to the approximate expression indicated by the final data information is (x = f2(t), $X_{1012}$). The new data generation time substitution unit 12 then outputs to the graph evaluation unit 17 each pair of approximate expression and approximate value, information indicating which pair is based on the final approximate expression, and the new datum generated at time $t_i$. The approximate expression may be expressed by its approximate expression ID or in the form in which it is stored in the approximate expression storage unit 15.
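The substitution step can be sketched in Python as follows. This is a minimal illustration under assumptions not stated in the text: approximate expressions are linear and held in a hypothetical `LinearExpr` record (first-order coefficient and constant term), and the final data information is reduced to the ID of the final approximate expression.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LinearExpr:
    """Hypothetical record for an approximate expression x = slope * t + intercept."""
    expr_id: str
    slope: float       # first-order coefficient
    intercept: float   # constant term

    def __call__(self, t: float) -> float:
        return self.slope * t + self.intercept


def substitute_new_time(exprs: List[LinearExpr], final_expr_id: str,
                        t_new: float) -> List[Tuple[str, float, bool]]:
    """Evaluate every stored expression at the new generation time t_new.

    Each result is (expression ID, approximate value, is_final), where is_final
    flags the pair built from the expression named in the final data information.
    """
    return [(e.expr_id, e(t_new), e.expr_id == final_expr_id) for e in exprs]


exprs = [LinearExpr("f0", 0.0, 0.0), LinearExpr("f1", 2.0, 1.0),
         LinearExpr("f2", -1.0, 10.0)]
pairs = substitute_new_time(exprs, "f2", 3.0)
# pairs == [("f0", 0.0, False), ("f1", 7.0, False), ("f2", 7.0, True)]
```

Returning flagged triples keeps the "which pair is final" information attached to the values, so the evaluator does not need a second lookup.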
The accuracy constraint input unit 16 receives and stores a criterion (accuracy) under which the approximate value calculated by an approximate expression can be said to approximate the actual data value appropriately. One example of this criterion is that the absolute value of the difference between the approximate value f(t) given by approximate expression f and the actual data value x is less than a predetermined threshold ε, which can be written as |x − f(t)| < ε. Another example is that the absolute value of the ratio of that difference to the approximate value f(t) is less than the threshold ε, which can be written as |(x − f(t)) / f(t)| < ε. Both examples use "less than the threshold" as the criterion, but "equal to or less than the threshold" may be used instead. These criteria are only examples, and other criteria may be defined.
In the following description, it is assumed that the criterion that the absolute value of the difference between the approximate value f(t) given by approximate expression f and the data value x of the actually generated datum is less than a predetermined threshold ε (that is, |x − f(t)| < ε) is input to the accuracy constraint input unit 16, and that the accuracy constraint input unit 16 stores this criterion.
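The two example criteria can be written as small predicate factories. This is a sketch only; the names are hypothetical, and treating an approximate value of exactly 0 as failing the relative criterion is an assumption, since the text does not say how a zero denominator is handled.

```python
from typing import Callable

# A criterion takes (actual data value x, approximate value f(t)) and says whether it passes.
Criterion = Callable[[float, float], bool]


def absolute_criterion(eps: float, inclusive: bool = False) -> Criterion:
    """|x - f(t)| < eps, or <= eps when inclusive is True."""
    def ok(x: float, approx: float) -> bool:
        d = abs(x - approx)
        return d <= eps if inclusive else d < eps
    return ok


def relative_criterion(eps: float, inclusive: bool = False) -> Criterion:
    """|(x - f(t)) / f(t)| < eps, or <= eps when inclusive is True."""
    def ok(x: float, approx: float) -> bool:
        if approx == 0:
            return False  # assumption: the ratio is undefined, so the datum is rejected
        r = abs((x - approx) / approx)
        return r <= eps if inclusive else r < eps
    return ok


within_half = absolute_criterion(0.5)
print(within_half(10.2, 10.0), within_half(10.5, 10.0))
```

Returning closures lets the accuracy constraint input unit hold one configured criterion and apply it uniformly to every candidate expression.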
The confirmed graph storage unit 19 is a storage device that stores the effective domain of each approximate expression used to approximate past generated data. FIG. 11 is an explanatory diagram showing an example of the effective domain of each approximate expression stored in the confirmed graph storage unit 19. In the example shown in FIG. 11, each approximate expression is represented by an approximate expression ID, and for each approximate expression ID, one or both of the time ranges and the time points belonging to the effective domain are specified. In FIG. 11, the parts of the effective domain expressed as time ranges (that is, time intervals) are listed under “section”, and the parts expressed as individual points in time are listed under “point”. Because an effective domain may contain several “sections” and “points”, two or more “sections” and two or more “points” may be defined for a single approximate expression.
In the example shown in FIG. 11, three sections $[t_{01b}, t_{01e}]$, $[t_{02b}, t_{02e}]$, and $[t_{03b}, t_{03e}]$ are defined for approximate expression ID “f0”. Here $t_{01b}$, $t_{01e}$, and so on are the start and end times of each section. Accordingly, for the approximate expression with ID “f0” (x = f0(t)), the data value at time t can be approximated by the approximate value f0(t) whenever $t_{01b} \le t \le t_{01e}$, $t_{02b} \le t \le t_{02e}$, or $t_{03b} \le t \le t_{03e}$.
Similarly, one section $[t_{11b}, t_{11e}]$ and two time points $t_{12}$ and $t_{13}$ are defined for approximate expression ID “f1”. Accordingly, for the approximate expression with ID “f1” (x = f1(t)), the data value at time t can be approximated by the approximate value f1(t) whenever $t_{11b} \le t \le t_{11e}$, $t = t_{12}$, or $t = t_{13}$.
The set formed by these “sections” and “points” is the effective domain. In the example shown in FIG. 11, the effective domain of the approximate expression with ID “f0” is $[t_{01b}, t_{01e}] \cup [t_{02b}, t_{02e}] \cup [t_{03b}, t_{03e}] = \{t \mid t_{01b} \le t \le t_{01e} \lor t_{02b} \le t \le t_{02e} \lor t_{03b} \le t \le t_{03e}\}$. Similarly, the effective domain of the approximate expression with ID “f1” is $[t_{11b}, t_{11e}] \cup \{t_{12}\} \cup \{t_{13}\} = \{t \mid t_{11b} \le t \le t_{11e} \lor t = t_{12} \lor t = t_{13}\}$, and the effective domain of the approximate expression with ID “f2” is $[t_{22b}, t_{22e}] \cup [t_{23b}, t_{23e}] \cup \{t_{21}\} = \{t \mid t_{22b} \le t \le t_{22e} \lor t_{23b} \le t \le t_{23e} \lor t = t_{21}\}$. The effective domains of the other approximate expressions can likewise be determined from the information stored in the confirmed graph storage unit 19.
The confirmed graph storage unit 19 may store information with the following data structure. In this data structure, an approximate expression, which yields an approximate value of a data value when a variable is substituted into it, is associated with an effective domain, which is the range of the variable over which that approximate value can be obtained; and the effective domain is represented by a set of sections of the variable and points each representing a single value of the variable. In the first embodiment, this variable represents time.
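One row of FIG. 11 could be held in a structure like the following. The class and method names are hypothetical; counting two stored numbers per section and one per point follows the storage accounting used later in the text (for example, five numbers for the “f2” domain).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class EffectiveDomain:
    """Hypothetical layout of one FIG. 11 row: closed sections plus isolated points."""
    sections: List[Tuple[float, float]] = field(default_factory=list)  # (start, end)
    points: List[float] = field(default_factory=list)

    def contains(self, t: float) -> bool:
        """True if time t lies in some section or equals some point."""
        return any(b <= t <= e for b, e in self.sections) or t in self.points

    def stored_numbers(self) -> int:
        """Numeric values needed to store the domain: two per section, one per point."""
        return 2 * len(self.sections) + len(self.points)

    def end(self) -> Tuple[float, str]:
        """Latest time in a non-empty domain, tagged 'section' (an end point) or 'point'."""
        candidates = [(e, "section") for _, e in self.sections]
        candidates += [(p, "point") for p in self.points]
        return max(candidates)


# Shaped like "f1" in FIG. 11: one section and two points (times are made up).
f1 = EffectiveDomain(sections=[(1.0, 3.0)], points=[5.0, 7.0])
```

Keeping sections and points in separate lists mirrors the “section” and “point” columns of FIG. 11 and makes the storage cost easy to read off.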
In this data structure, a section or point of the effective domain associated with another approximate expression is allowed to lie between sections, between points, or between a section and a point of the effective domain associated with a given approximate expression. For example, the sections and points shown in FIG. 11 are arranged in time order as follows: $[t_{11b}, t_{11e}]$, $[t_{01b}, t_{01e}]$, $t_{12}$, $[t_{02b}, t_{02e}]$, $t_{21}$, $t_{13}$, $[t_{22b}, t_{22e}]$, $[t_{03b}, t_{03e}]$, $[t_{31b}, t_{31e}]$, $[t_{23b}, t_{23e}]$. In this example, the point $t_{12}$ of another approximate expression “f1” lies between the sections $[t_{01b}, t_{01e}]$ and $[t_{02b}, t_{02e}]$ of approximate expression “f0”. Likewise, the section $[t_{02b}, t_{02e}]$ of “f0” and the point $t_{21}$ of “f2” lie between the points $t_{12}$ and $t_{13}$ of “f1”, and the section $[t_{01b}, t_{01e}]$ of “f0” lies between the section $[t_{11b}, t_{11e}]$ and the point $t_{12}$ of “f1”. In this way, elements (sections or points) of the effective domain of one approximate expression may lie between elements of the effective domain of another. The data summarization system, data summarization method, and data summarization program according to each aspect of the present invention can preferably use such a data structure.
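The interleaving described above can be made concrete by flattening every effective domain into one time-ordered list. This sketch uses a hypothetical encoding in which a point is stored as a degenerate section (start equal to end); the non-overlap check reflects the later requirement that a new effective domain must not overlap existing ones.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: every element is (start, end); a "point" has start == end.
Domains = Dict[str, List[Tuple[float, float]]]


def time_ordered_elements(domains: Domains) -> List[Tuple[float, float, str]]:
    """Flatten all effective domains into one list sorted by start time."""
    flat = [(b, e, expr_id) for expr_id, elems in domains.items() for b, e in elems]
    return sorted(flat)


def has_overlap(domains: Domains) -> bool:
    """True if any two elements share a time.

    After sorting by start, any overlap implies an overlap between neighbours,
    so checking adjacent pairs is sufficient.
    """
    flat = time_ordered_elements(domains)
    return any(nxt[0] <= cur[1] for cur, nxt in zip(flat, flat[1:]))


domains = {"f0": [(2.0, 3.0), (5.0, 6.0)],
           "f1": [(0.0, 1.0), (4.0, 4.0), (7.0, 7.0)]}
order = [expr_id for _, _, expr_id in time_ordered_elements(domains)]
# order == ["f1", "f0", "f1", "f0", "f1"]: elements of f0 and f1 interleave
```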
The graph evaluation unit 17 compares each approximate value, calculated by the new data generation time substitution unit 12 by substituting the generation time of the new datum into an approximate expression stored in the approximate expression storage unit 15, with the actual data value of the new datum, and identifies the approximate expressions that satisfy the criterion stored in the accuracy constraint input unit 16. When several approximate expressions satisfy the criterion, the graph evaluation unit 17 identifies among them the one that minimizes the increase in storage capacity when its effective domain is updated, and decides to approximate the data value of the new datum with that approximate expression. When exactly one approximate expression satisfies the criterion, the graph evaluation unit 17 decides to approximate the data value of the new datum with it. The graph evaluation unit 17 then outputs the selected approximate expression, its effective domain, and the new datum (the generated datum received from the new data generation time substitution unit 12) to the graph update unit 18, together with information indicating whether the selected approximate expression is the final approximate expression. The graph evaluation unit 17 may, for example, output the approximate expression ID as the selected approximate expression, and may read the effective domain from the confirmed graph storage unit 19.
When the approximate expressions output from the new data generation time substitution unit 12 to the graph evaluation unit 17 are given as approximate expression IDs, the graph evaluation unit 17 reads all of the approximate expressions stored in the approximate expression storage unit 15.
The graph evaluation unit 17 may be unable to find any approximate expression that satisfies the criterion stored in the accuracy constraint input unit 16. In that case, the graph evaluation unit 17 may output only the new datum (the generated datum received from the new data generation time substitution unit 12) to the graph update unit 18, without selecting an approximate expression.
An example of approximate expression selection by the graph evaluation unit 17 will now be described in detail with reference to FIGS. 12 to 14. FIGS. 12, 13, and 14 are explanatory diagrams each showing an example of selecting an approximate expression that satisfies the criterion stored in the accuracy constraint input unit 16. $I_{11}$, $I_{01}$, $I_{12}$, and so on in FIGS. 12 to 14 are the sections and points included in the effective domains and correspond to those illustrated in FIG. 11. In the examples shown in FIGS. 12 to 14, the approximate expressions generated in the past are x = f0(t), x = f1(t), x = f2(t), and x = f3(t). The effective domain of x = f0(t) is $I_{01} \cup I_{02} \cup I_{03}$, that of x = f1(t) is $I_{11} \cup I_{12} \cup I_{13}$, that of x = f2(t) is $I_{21} \cup I_{22} \cup I_{23}$, and that of x = f3(t) is $I_{31}$. The generation time of the new datum received by the graph evaluation unit 17 is $t_i$, and its data value is $x_i$.
FIG. 12 illustrates the case where datum $P_{1021}$ is generated at time $t_i$. In the example shown in FIG. 12, the approximate value obtained by substituting the generation time $t_i$ into x = f1(t) is $X_{1011} = f1(t_i)$. If x = f1(t) is the only approximate expression satisfying the criterion that the absolute value of the difference between the data value $x_i$ of datum $P_{1021}$ and the approximate value is less than the threshold ε, the graph evaluation unit 17 selects x = f1(t). That is, if only $|x_i - X_{1011}| < \varepsilon$ holds, the graph evaluation unit 17 selects x = f1(t).
When several approximate expressions satisfy the accuracy criterion, the graph evaluation unit 17 considers each of them in turn. For the approximate expression under consideration, it calculates how much the storage capacity needed to store the effective domain would increase if that expression were taken to represent the new datum and its effective domain were updated accordingly. The graph evaluation unit 17 then selects the approximate expression with the smallest increase. FIG. 13 illustrates an example of selection in this mode.
FIG. 13 illustrates the case where datum $P_{1022}$ is generated at time $t_i$. In the example shown in FIG. 13, the approximate value obtained by substituting $t_i$ into x = f2(t) is $X_{1012} = f2(t_i)$, and the approximate value obtained by substituting $t_i$ into x = f3(t) is $X_{1013} = f3(t_i)$. Both of these approximate expressions satisfy the criterion that the absolute value of the difference between the data value $x_i$ of datum $P_{1022}$ and the approximate value is less than the threshold ε; that is, both $|x_i - X_{1012}| < \varepsilon$ and $|x_i - X_{1013}| < \varepsilon$ hold.
In this case, if one of the approximate expressions satisfying the criterion is the final approximate expression and its effective domain ends with the end point of a “section”, the graph evaluation unit 17 may select that final approximate expression. When $t_i$ is added to the effective domain of such a final approximate expression, the graph evaluation unit 17 need only replace the end point of that “section” with $t_i$, so the storage capacity needed to represent the effective domain does not increase at all. By contrast, for an approximate expression other than the final approximate expression, or for a final approximate expression whose effective domain ends with a “point”, adding $t_i$ to the effective domain increases the storage capacity needed to store it by one numerical value. Therefore, when several approximate expressions satisfy the criterion, the graph evaluation unit 17 selects the final approximate expression whose effective domain ends with the end point of a “section”.
In the example shown in FIG. 13, if the approximate expression representing the generated datum $P_{1022}$ is taken to be x = f2(t), updating the effective domain does not increase the storage capacity needed to express it. Before $P_{1022}$ is input, the effective domain of x = f2(t) is $I_{21} \cup I_{22} \cup I_{23}$. Since $I_{21} = \{t_{21}\}$, $I_{22} = [t_{22b}, t_{22e}]$, and $I_{23} = [t_{23b}, t_{23e}]$, this effective domain is expressed by the five numerical values $t_{21}$, $t_{22b}$, $t_{22e}$, $t_{23b}$, and $t_{23e}$. The effective domain ends at the end point (right end) of section $I_{23}$, namely $t_{23e} = t_{\mathrm{last}}$. Therefore, section $I_{23}$ with time $t_i$ added can be expressed as $I_{23} \cup \{t_i\} = [t_{23b}, t_i]$. That is, when the graph evaluation unit 17 adds time $t_i$ to the effective domain of x = f2(t), the updated effective domain is $\{t_{21}\} \cup [t_{22b}, t_{22e}] \cup [t_{23b}, t_i]$, so the storage capacity needed to express it does not increase.
On the other hand, in the example shown in FIG. 13, if the approximate expression representing $P_{1022}$ is taken to be x = f3(t), updating the effective domain increases the storage capacity needed to express it. Before $P_{1022}$ is input, the effective domain of x = f3(t) is $I_{31}$. Since $I_{31} = [t_{31b}, t_{31e}]$, this effective domain is expressed by the two numerical values $t_{31b}$ and $t_{31e}$. Here $t_{31e} < t_{\mathrm{last}} < t_i$. Because $t_{\mathrm{last}}$ belongs to the effective domain of x = f2(t), the graph evaluation unit 17 cannot include $t_{\mathrm{last}}$ in the effective domain of x = f3(t). Therefore, to add time $t_i$ to the effective domain of x = f3(t), the graph evaluation unit 17 must add $\{t_i\}$ as a point, giving $I_{31} \cup \{t_i\} = [t_{31b}, t_{31e}] \cup \{t_i\}$. Representing this effective domain then requires the three numerical values $t_{31b}$, $t_{31e}$, and $t_i$, so the storage capacity increases by one numerical value.
Therefore, in the example shown in FIG. 13, the graph evaluation unit 17 selects x = f2 (t) among the approximate expressions that satisfy the criterion.
Note that adding a new generation time $t_i$ to the effective domain of a final approximate expression whose effective domain ends with a “point” also increases the storage capacity needed to express the effective domain by one numerical value. This is because a new section must be defined that starts at the time that was a “point” before the update and ends at $t_i$; the graph evaluation unit 17 has to store what was stored as a “point” (one value) as a “section” (two values) instead.
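The three storage-increase cases can be captured in one small function. This is a sketch under the assumption that an effective domain is stored as a list of (start, end) sections and a list of points, and that the cost is the count of stored numbers; the function name is hypothetical.

```python
from typing import List, Tuple


def storage_increase(sections: List[Tuple[float, float]], points: List[float],
                     is_final: bool) -> int:
    """Numbers added to the stored effective domain if the new time t_i is appended.

    * final expression whose domain ends with a section end point -> 0
      (the end point is simply overwritten with t_i);
    * final expression whose domain ends with a point -> 1
      (that point, one number, becomes a section, two numbers);
    * any non-final expression -> 1 (t_i is stored as a new point).
    """
    if not is_final:
        return 1
    last_section_end = max((e for _, e in sections), default=float("-inf"))
    last_point = max(points, default=float("-inf"))
    return 0 if last_section_end > last_point else 1


# FIG. 13 situation with made-up times: f2 is final and ends with a section; f3 is not final.
print(storage_increase([(22.0, 23.0), (25.0, 30.0)], [21.0], is_final=True))
print(storage_increase([(10.0, 12.0)], [], is_final=False))
```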
When several approximate expressions satisfy the accuracy criterion and each would increase the storage capacity for the updated effective domain by the same amount, the graph evaluation unit 17 selects one of them. The rule for choosing one may be any rule determined in advance.
FIG. 14 illustrates the case where datum $P_{1023}$ is generated at time $t_i$. In the example shown in FIG. 14, the approximate value obtained by substituting $t_i$ into x = f1(t) is $X_{1011} = f1(t_i)$, and the approximate value obtained by substituting $t_i$ into x = f3(t) is $X_{1013} = f3(t_i)$. Both of these approximate expressions satisfy the criterion that the absolute value of the difference between the data value $x_i$ of datum $P_{1023}$ and the approximate value is less than the threshold ε; that is, both $|x_i - X_{1011}| < \varepsilon$ and $|x_i - X_{1013}| < \varepsilon$ hold. Moreover, neither x = f1(t) nor x = f3(t) is the final approximate expression, so adding $t_i$ to the effective domain increases the storage capacity by the same amount for both.
In this case, the graph evaluation unit 17 may select the approximate expression that minimizes the difference between the approximate value and the actual data value $x_i$. In the example shown in FIG. 14, $|x_i - X_{1011}| < |x_i - X_{1013}| < \varepsilon$, so the graph evaluation unit 17 may select x = f1(t). This is only one way to choose one approximate expression from several that give the same increase in storage capacity; other methods may be used.
For example, among several approximate expressions that give the same increase in storage capacity when $t_i$ is added to the effective domain, the graph evaluation unit 17 may select the one whose effective domain has an upper limit (end) closest to the current time, in other words, the one whose effective domain has the largest upper limit. In the example shown in FIG. 14, the upper limit of the effective domain of x = f1(t) is the point $I_{13} = \{t_{13}\}$, and the upper limit of the effective domain of x = f3(t) is the end point $t_{31e}$ of section $I_{31}$. Since $t_{13} < t_{31e}$, the graph evaluation unit 17 may select x = f3(t). The graph evaluation unit 17 may also select one approximate expression from several by a method other than those exemplified here.
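Putting the selection rules together: filter by the accuracy criterion, minimize the storage increase, then break ties either by smallest error or by latest domain end. This is a sketch, not the patent's implementation; the `Candidate` record and the `tie_break` switch are hypothetical, and the storage increase is assumed to be precomputed as described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical per-expression summary handed to the evaluator."""
    expr_id: str
    approx: float       # f(t_i)
    increase: int       # storage increase if t_i is added (0 or 1)
    domain_end: float   # upper limit (end) of the current effective domain


def select_expression(cands: List[Candidate], x_i: float, eps: float,
                      tie_break: str = "error") -> Optional[str]:
    """Return the ID of the expression that will approximate x_i, or None.

    1. keep expressions with |x_i - f(t_i)| < eps (None means x_i becomes uncertain);
    2. among them, keep those with the smallest storage increase;
    3. break ties by smallest error ("error") or latest domain end ("latest").
    """
    ok = [c for c in cands if abs(x_i - c.approx) < eps]
    if not ok:
        return None
    best = min(c.increase for c in ok)
    tied = [c for c in ok if c.increase == best]
    if tie_break == "error":
        return min(tied, key=lambda c: abs(x_i - c.approx)).expr_id
    return max(tied, key=lambda c: c.domain_end).expr_id


# FIG. 14 shape with made-up numbers: both pass, both cost one extra number.
fig14 = [Candidate("f1", 10.1, 1, 13.0), Candidate("f3", 10.3, 1, 31.0)]
print(select_expression(fig14, 10.0, 0.5), select_expression(fig14, 10.0, 0.5, "latest"))
```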
When it receives input from the graph evaluation unit 17, the graph update unit 18, depending on that input, either updates the effective domain of the approximate expression selected by the graph evaluation unit 17 or adds an uncertain point to the uncertain point storage unit 13. When it receives input from the new approximate expression generation unit 14, the graph update unit 18 newly registers the approximate expression in the approximate expression storage unit 15.
When the graph evaluation unit 17 inputs the selected approximate expression and its effective domain, the new datum, and information indicating whether the selected approximate expression is the final approximate expression, the graph update unit 18 updates the effective domain of that approximate expression. The graph update unit 18 may update the effective domain so that the generation time of the new datum is added to the effective domain of the input approximate expression. The graph update unit 18 stores the updated effective domain of the approximate expression selected by the graph evaluation unit 17 in the confirmed graph storage unit 19.
At this time, the graph update unit 18 may update the effective domain according to the following cases. In the first update mode, if the approximate expression input from the graph evaluation unit 17 is the final approximate expression and its effective domain ends with the end point of a “section”, the graph update unit 18 may replace that end point with the generation time of the new datum.
In the second update mode, if the approximate expression input from the graph evaluation unit 17 is the final approximate expression and its effective domain ends with a “point”, the graph update unit 18 may create a new “section” that starts at that point and ends at the generation time of the new datum. Because the “point” that ended the effective domain is now contained in this “section”, the graph update unit 18 may remove it from the “points” of the effective domain. In the example shown in FIG. 11, the graph update unit 18 removes one point from the “point” column and adds a new section to the “section” column.
In the third update mode, if the approximate expression input from the graph evaluation unit 17 is not the final approximate expression, the graph update unit 18 adds the generation time of the new datum to the effective domain of that approximate expression as a “point”.
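The three update modes can be sketched as a single function returning the new (sections, points). Assumptions: the same list-based layout as in the earlier sketches, a new time later than every time already in the domain, and a non-empty domain whenever the expression is final (it always contains the final time). The function name is hypothetical.

```python
from typing import List, Tuple


def update_domain(sections: List[Tuple[float, float]], points: List[float],
                  is_final: bool, t_new: float):
    """Return updated (sections, points) after adding t_new."""
    sections, points = list(sections), list(points)
    if not is_final:                      # mode 3: store t_new as a new point
        points.append(t_new)
        return sections, points
    last_sec = max(range(len(sections)), key=lambda k: sections[k][1], default=None)
    last_pt = max(points, default=None)
    if last_sec is not None and (last_pt is None or sections[last_sec][1] > last_pt):
        start, _ = sections[last_sec]     # mode 1: overwrite the section end point
        sections[last_sec] = (start, t_new)
    else:                                 # mode 2: the last point starts a new section
        points.remove(last_pt)
        sections.append((last_pt, t_new))
    return sections, points


# Mode 1, shaped like the FIG. 13 update of "f2" (times are made up).
print(update_domain([(22.0, 23.0), (25.0, 30.0)], [21.0], True, 35.0))
```

Copying the input lists first keeps the function free of side effects, so the caller decides when to write the result back to the confirmed graph storage.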
When updating the effective domain of the approximate expression selected by the graph evaluation unit 17, the graph update unit 18 also updates the final data information. It updates the approximate expression ID in the final data information (see FIG. 9) stored in the final time storage unit 11 to the ID of the approximate expression selected by the graph evaluation unit 17, and updates the final time to the generation time of the new datum.
When the graph evaluation unit 17 outputs only the new datum to the graph update unit 18 without selecting an approximate expression, the graph update unit 18 stores that datum in the uncertain point storage unit 13 as an uncertain point. In this case, the graph update unit 18 only adds one uncertain point to the uncertain point storage unit 13 and does not update the information stored in the final time storage unit 11, the approximate expression storage unit 15, or the confirmed graph storage unit 19.
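The bookkeeping for the two outcomes can be sketched as follows. `SummaryState` is a hypothetical in-memory stand-in for the final time storage unit 11 and the uncertain point storage unit 13; the effective-domain update itself is left to a separate step.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SummaryState:
    """Hypothetical stand-ins for storage units 11 and 13."""
    final_expr_id: Optional[str] = None      # final data information (FIG. 9)
    final_time: Optional[float] = None
    uncertain: List[Tuple[float, float]] = field(default_factory=list)  # (t, x)


def record_outcome(state: SummaryState, selected: Optional[str],
                   t: float, x: float) -> None:
    """Apply the bookkeeping for one evaluated datum (t, x).

    If an expression was selected, overwrite the final data information.
    Otherwise store the datum only as an uncertain point and change nothing else.
    """
    if selected is None:
        state.uncertain.append((t, x))
    else:
        state.final_expr_id, state.final_time = selected, t


state = SummaryState("f2", 30.0)
record_outcome(state, None, 31.0, 5.0)   # no expression fits: final data info unchanged
record_outcome(state, "f1", 32.0, 7.0)   # f1 selected: final data info now points at f1
```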
When the new approximate expression generation unit 14 generates a new approximate expression from uncertain points, it outputs the approximate expression and each datum used to generate it (that is, each uncertain point stored in the uncertain point storage unit 13 and the newly input datum) to the graph update unit 18. In this case, the graph update unit 18 assigns an approximate expression ID to the new approximate expression and stores the two in association with each other in the approximate expression storage unit 15. The graph update unit 18 also deletes the uncertain points used to generate the new approximate expression; that is, it deletes each uncertain point stored in the uncertain point storage unit 13. Further, the graph update unit 18 determines the effective domain of the new approximate expression from the generation times of the data used to generate it and stores that effective domain in the confirmed graph storage unit 19 together with the approximate expression ID. In determining the effective domain, the graph update unit 18 uses sections whose start and end points are generation times of the data used to generate the new approximate expression, keeps the number of such sections as small as possible, and ensures that the effective domain does not overlap the effective domains of existing approximate expressions. A time that lies isolated between sections and points of the effective domains of existing approximate expressions, and therefore cannot serve as the start or end point of a section, may be determined by the graph update unit 18 as a point.
Examples of updating the effective domain by the graph update unit 18 will be described with reference to FIGS. 11, 13, 12, and 15.
First, an example of updating the effective domain when the approximate expression selected by the graph evaluation unit 17 to approximate the data value of the new datum is the final approximate expression will be described with reference to FIG. 13. In the example shown in FIG. 13, the graph evaluation unit 17 outputs to the graph update unit 18 the approximate expression ID designating the approximate expression “x = f2(t)”, the effective domain of that approximate expression, the generated datum $P_{1022}$, and information indicating that “x = f2(t)” is the final approximate expression. The graph update unit 18 updates the effective domain $I_{21} \cup I_{22} \cup I_{23}$ of “x = f2(t)” so that $\{t_i\}$ is added to it. Since the effective domain ends with the end point of section $I_{23}$, the graph update unit 18 updates the end point $t_{23e}$ of $I_{23} = [t_{23b}, t_{23e}]$ (see FIG. 11) to $t_i$, and stores the updated effective domain in the confirmed graph storage unit 19. As a result, the sections corresponding to approximate expression ID “f2” shown in FIG. 11 are updated from $[t_{22b}, t_{22e}]$, $[t_{23b}, t_{23e}]$ to $[t_{22b}, t_{22e}]$, $[t_{23b}, t_i]$.
Next, an example of updating the effective domain when the approximate expression selected by the graph evaluation unit 17 to approximate the data value of the new datum is not the final approximate expression will be described with reference to FIG. 12. In the example shown in FIG. 12, the graph evaluation unit 17 outputs to the graph update unit 18 the approximate expression ID designating the approximate expression “x = f1(t)”, the effective domain of that approximate expression, the generated datum $P_{1021}$, and information indicating that “x = f1(t)” is not the final approximate expression. Since the approximate expression selected by the graph evaluation unit 17 is not the final approximate expression, the graph update unit 18 may add the generation time $t_i$ of $P_{1021}$ to the effective domain of x = f1(t) as a “point”. That is, in the example shown in FIG. 11, the graph update unit 18 updates the contents of the confirmed graph storage unit 19 by adding $t_i$ to the “points” of the effective domain corresponding to approximate expression ID “f1”.
Next, an example of registering a new effective domain when the new approximate expression generation unit 14 generates a new approximate expression is shown. In the example shown in FIG. 15, the new approximate expression generation unit 14 generates the approximate expression “$x = f_{\mathrm{new}}(t)$” from the uncertain point $P_{1030}$ and the new datum $P_{1031}$, and outputs $x = f_{\mathrm{new}}(t)$, the uncertain point $P_{1030}$, and the new datum $P_{1031}$ to the graph update unit 18. The graph update unit 18 assigns an approximate expression ID to “$x = f_{\mathrm{new}}(t)$” and stores it in the approximate expression storage unit 15. The graph update unit 18 also determines the effective domain from $P_{1030}$ and $P_{1031}$. For the uncertain point $P_{1030}$ and the new datum $P_{1031}$ shown in FIG. 15, the graph update unit 18 can define a single section $I_{\mathrm{new}}$ that runs from the earliest to the latest data generation time and does not overlap the effective domain of any other approximate expression. The graph update unit 18 therefore stores the approximate expression ID of “$x = f_{\mathrm{new}}(t)$” and its effective domain (section $I_{\mathrm{new}}$) in the confirmed graph storage unit 19.
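The text states the goals for a new effective domain (few sections, no overlap with existing domains) but not a procedure. The greedy sketch below is one hypothetical way to meet them, not the method of the specification: walk the sorted generation times, extend the current run while the span to the next time touches no existing element, and turn runs of two or more times into sections and isolated times into points.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # existing element (start, end); a point has start == end


def new_effective_domain(times: List[float], existing: List[Interval]):
    """Build (sections, points) for a newly generated expression (hypothetical greedy)."""
    def blocked(a: float, b: float) -> bool:
        # True if some existing element intersects the closed span [a, b].
        return any(s <= b and e >= a for s, e in existing)

    ts = sorted(times)
    runs: List[List[float]] = [[ts[0]]]
    for prev, nxt in zip(ts, ts[1:]):
        if blocked(prev, nxt):
            runs.append([nxt])      # an existing element sits in between: start a new run
        else:
            runs[-1].append(nxt)    # safe to stretch the current section
    sections = [(r[0], r[-1]) for r in runs if len(r) > 1]
    points = [r[0] for r in runs if len(r) == 1]
    return sections, points


# FIG. 15 shape with made-up times: nothing lies between the two data, so one section.
print(new_effective_domain([40.0, 41.0], [(10.0, 12.0), (25.0, 35.0)]))
```

Because each run is extended as far as possible before a blocking element forces a break, the number of sections is the smallest possible under this non-overlap rule.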
The data input unit 10, the new data generation time substitution unit 12, the new approximate expression generation unit 14, the accuracy constraint input unit 16, the graph evaluation unit 17, and the graph update unit 18 are realized, for example, by the CPU of a computer operating according to the data summarization program. In this case, a program storage device (not shown) of the computer stores the data summarization program, and the CPU may read the program and, according to it, operate as the data input unit 10, the new data generation time substitution unit 12, the new approximate expression generation unit 14, the accuracy constraint input unit 16, the graph evaluation unit 17, and the graph update unit 18. Alternatively, each of these units may be realized by separate hardware.
The final time storage unit 11, the uncertain point storage unit 13, the approximate expression storage unit 15, and the confirmed graph storage unit 19 may be realized by separate storage devices or by the same storage device. Alternatively, some combinations of these units may share a storage device.
Next, the operation of the first embodiment will be described.
FIG. 16 is a flowchart showing an example of the processing flow of the first embodiment. While a data generation source (not shown) is generating data one after another (No in step S100), data are input sequentially from the data generation source to the data input unit 10 (step S101). In this example, data are assumed to be input to the data input unit 10 one at a time in order of generation time, and the data summarization system performs the subsequent operations for each datum. Several data may be input to the data input unit 10 at once; even then, the data summarization system performs the subsequent processing for each datum in order of generation time.
Next, the new approximate expression generation unit 14 determines whether a new approximate expression can be generated from the one piece of generated data input to the data input unit 10 and the uncertain points stored in the uncertain point storage unit 13 (step S102). In step S102, the new approximate expression generation unit 14 may read each uncertain point stored in the uncertain point storage unit 13 and determine whether those uncertain points, together with the new generated data input to the data input unit 10, provide the number of data required to generate an approximate expression. If the required number of data is available, the new approximate expression generation unit 14 determines that a new approximate expression can be generated; otherwise, it determines that one cannot be generated. As already described, the number of data required to generate an approximate expression is determined in advance according to the type of approximate expression and the approximate expression generation algorithm.
When the new approximate expression generation unit 14 determines that a new approximate expression can be generated (Yes in step S102), it generates, from each uncertain point and the one piece of generated data input to the data input unit 10, an approximate expression that approximates the data value with the generation time of the data as the variable (step S103).
The type of approximate expression and the approximate expression generation algorithm are determined in advance, but they are not particularly limited. As already described, assume that the approximate expression is a linear expression in the occurrence time t. When four data are available, the new approximate expression generation unit 14 may generate the approximate expression by determining the first-order coefficient and the constant term from the four pairs of occurrence time and data value by the least squares method. When two data are available, the new approximate expression generation unit 14 may instead obtain, as a linear expression in t, the straight line passing through the two points whose coordinates are (occurrence time, data value). In this case as well, the new approximate expression generation unit 14 determines the first-order coefficient and the constant term.
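The two generation options just described can be sketched in Python as follows. This is a minimal illustration, assuming data arrive as (occurrence time, data value) pairs and the expression is kept as the pair (first-order coefficient, constant term), as in FIGS. 7 and 8; the function name `fit_linear` is hypothetical. Ordinary least squares on two points with distinct times reduces to the straight line through them, so one function covers both the four-point and the two-point case.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (occurrence time t, data value x)


def fit_linear(points: List[Point]) -> Tuple[float, float]:
    """Return (a, b) for the approximate expression x = a*t + b.

    Uses ordinary least squares; with exactly two points at distinct
    times this is the straight line through both points.
    """
    n = len(points)
    if n < 2:
        raise ValueError("at least two data are required")
    mean_t = sum(t for t, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    s_tt = sum((t - mean_t) ** 2 for t, _ in points)
    if s_tt == 0:
        raise ValueError("occurrence times must not all be equal")
    s_tx = sum((t - mean_t) * (x - mean_x) for t, x in points)
    a = s_tx / s_tt            # first-order coefficient
    b = mean_x - a * mean_t    # constant term
    return a, b


# Four data on x = 2t + 1 recover the coefficients exactly.
print(fit_linear([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]))  # (2.0, 1.0)
```

How many data trigger generation (two or four here) would be the predetermined parameter mentioned in step S102.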
In this example, it is assumed that the approximate expression is a linear expression, and the approximate expression is represented by a primary coefficient and a constant term of the variable t as shown in FIG. However, the expression form of the approximate expression is not limited to this form, and the approximate expression may be expressed in another form.
In step S103, the new approximate expression generation unit 14 outputs the generated new approximate expression and each data (each indeterminate point and new generated data) used for generating the approximate expression to the graph update unit 18.
The graph update unit 18 assigns an approximate expression ID to the approximate expression received from the new approximate expression generation unit 14 and stores the approximate expression, together with that ID, in the approximate expression storage unit 15 (step S104). As a result, one approximate expression is newly registered. In step S104, the graph update unit 18 further determines the effective domain of the new approximate expression based on the generation times of the data used to generate it, and stores that effective domain together with the approximate expression ID in the confirmed graph storage unit 19. At this time, the graph update unit 18 determines the effective domain of the new approximate expression so that it consists of sections whose start and end points are generation times of the data used to generate the expression, the number of such sections is as small as possible, and none of them overlaps the effective domain of any existing approximate expression. A generation time that lies isolated between the sections and points of existing effective domains, and therefore cannot serve as the start or end point of a section, may be registered by the graph update unit 18 as a "point" (see FIG. 11).
When the new approximate expression generation unit 14 determines that a new approximate expression cannot be generated (No in step S102), the new data generation time substitution unit 12 substitutes the generation time of the data input to the data input unit 10 into each approximate expression generated in the past (that is, each approximate expression stored in the approximate expression storage unit 15), and thereby calculates, for each approximate expression, an approximate value of the input data (step S105). The new data generation time substitution unit 12 then outputs each approximate expression paired with the approximate value calculated from it to the graph evaluation unit 17. At this time, the new data generation time substitution unit 12 also refers to the final data information and outputs to the graph evaluation unit 17 information indicating which approximate expression is the final approximate expression, together with the approximate value calculated from it. The new data generation time substitution unit 12 further outputs the generated data currently being processed (the data input to the data input unit 10) to the graph evaluation unit 17. Details of step S105 are described later.
After step S105, the graph evaluation unit 17 compares the approximate value calculated for each approximate expression with the actual data value of the data input to the data input unit 10 (step S106). This example assumes that the accuracy constraint input unit 16 stores the criterion that the absolute value of the difference between the approximate value and the actual data value is less than a threshold ε (that is, |x − f(t)| < ε), and that the graph evaluation unit 17 selects an approximate expression satisfying this criterion. In this case, in step S106 the graph evaluation unit 17 calculates, for each approximate expression, the absolute value of the difference between the approximate value and the actual data value of the input data.
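One way to meet the three conditions on the new effective domain (sections bounded by the data's generation times, as few sections as possible, no overlap with existing effective domains) is to split the sorted generation times wherever an existing domain item falls between two consecutive times, turning runs of two or more times into sections and isolated times into points. The Python sketch below assumes domain items are tuples `("section", start, end)` or `("point", t)`; the names and this representation are hypothetical, not taken from the source.

```python
from typing import List, Tuple, Union

Section = Tuple[str, float, float]  # ("section", start, end)
DomainPoint = Tuple[str, float]     # ("point", t)
Item = Union[Section, DomainPoint]


def blocks_gap(item: Item, lo: float, hi: float) -> bool:
    """True if an existing domain item lies (at least partly) strictly between lo and hi."""
    if item[0] == "point":
        return lo < item[1] < hi
    return item[1] < hi and item[2] > lo


def initial_domain(times: List[float], existing: List[Item]) -> List[Item]:
    """Effective domain for a new expression built from data at `times`."""
    ts = sorted(times)
    runs = [[ts[0]]]
    for prev, cur in zip(ts, ts[1:]):
        if any(blocks_gap(item, prev, cur) for item in existing):
            runs.append([cur])      # an existing domain intervenes: start a new run
        else:
            runs[-1].append(cur)    # safe to cover [prev, cur] with one section
    # Runs of length one cannot form a section and become isolated points.
    return [("point", run[0]) if len(run) == 1 else ("section", run[0], run[-1])
            for run in runs]
```

For example, `initial_domain([1.0, 4.0, 5.0], [("section", 1.5, 3.5)])` gives `[("point", 1.0), ("section", 4.0, 5.0)]`, because the existing section separates time 1.0 from the other two.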
Next, the graph evaluation unit 17 determines whether there is an approximate expression that satisfies the criterion that the absolute value of the difference calculated in step S106 is less than the threshold ε (step S107).
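Steps S106 and S107 amount to filtering the candidate expressions by the criterion |x − f(t)| < ε. A minimal Python sketch, assuming the approximate values from step S105 arrive as a mapping from approximate expression ID to value (a hypothetical representation):

```python
from typing import Dict, List


def satisfying_ids(approx_values: Dict[str, float], x: float, eps: float) -> List[str]:
    """IDs of approximate expressions whose value f(t) satisfies |x - f(t)| < eps.

    An empty result corresponds to "No in step S107".
    """
    return [expr_id for expr_id, fx in approx_values.items() if abs(x - fx) < eps]


print(satisfying_ids({"f0": 5.0, "f1": 1.0, "f2": 5.4}, x=5.1, eps=0.5))  # ['f0', 'f2']
```

The inequality is strict, so a difference exactly equal to ε does not satisfy the criterion.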
When there is an approximate expression that satisfies the criterion (Yes in step S107), the graph evaluation unit 17 selects an approximate expression that minimizes the amount of increase in storage capacity at the time of updating the effective domain from among the approximate expressions that satisfy the criterion. Select (step S108). That is, the graph evaluation unit 17 selects an approximate expression that minimizes the amount of increase from the storage capacity for storing the effective definition area before the update to the storage capacity for storing the effective definition area after the update.
However, if there is only one approximate expression that satisfies the criterion, the graph evaluation unit 17 may select the approximate expression.
If several approximate expressions satisfy the criterion and one of them is the final approximate expression whose effective domain ends with a "section", the graph evaluation unit 17 may select that final approximate expression. This is because adding the new generation time to that expression only moves the end point of its last section, so the increase in storage capacity is minimal.
If several approximate expressions satisfy the criterion but none of them is a final approximate expression whose effective domain ends with a "section", the graph evaluation unit 17 selects one of the approximate expressions that satisfy the criterion. The selection method may be determined in advance.
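The selection rule of steps S107 and S108 can be sketched as below, counting storage in numbers as in the comparison later in this section (a section costs two numbers, a point one). As simplifying assumptions not spelled out in the source, effective domains are kept in time order, and "final" is taken to mean that no uncertain point has been recorded since the final time, so the trailing section can safely be extended. Ties go to the first candidate in the given order, standing in for the predetermined selection method; all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Item = Tuple  # ("section", start, end) or ("point", t), kept in time order


def storage_increase(domain: List[Item], is_final: bool) -> int:
    """Extra numbers needed to add the new generation time to `domain`."""
    if is_final and domain and domain[-1][0] == "section":
        return 0  # only the end point of the trailing section moves
    return 1      # a trailing point becomes a section (1 -> 2), or a new point is added


def select_expression(candidates: List[str], domains: Dict[str, List[Item]],
                      final_id: Optional[str]) -> str:
    """Pick the candidate whose effective-domain update costs the least storage."""
    # min() returns the first of several equal minima, so candidate order breaks ties.
    return min(candidates,
               key=lambda eid: storage_increase(domains[eid], eid == final_id))
```

With a single candidate this simply returns it, matching the case where only one approximate expression satisfies the criterion.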
When the graph evaluation unit 17 selects one approximate expression satisfying the criterion in step S108, the graph evaluation unit 17 outputs the selected approximate expression and its effective definition area and the generated data to be processed to the graph update unit 18. The graph evaluation unit 17 also outputs information indicating whether or not the selected approximate expression is the final approximate expression to the graph update unit 18. The graph evaluation unit 17 may output an approximate expression ID for identifying the approximate expression to the graph update unit 18 as an approximate expression.
The graph update unit 18 updates the effective domain of the approximate expression received from the graph evaluation unit 17 in step S108 and stores it in the confirmed graph storage unit 19 (step S109). The ways in which the graph update unit 18 updates the effective domain have already been described, so their description is omitted here. In step S109, the graph update unit 18 also updates the approximate expression ID of the final data information (see FIG. 9) stored in the final time storage unit 11 to the ID of the approximate expression selected in step S108, and updates the final time (see FIG. 9) of the final data information to the generation time of the generated data being processed.
When it is determined in step S107 that there is no approximate expression that satisfies the criterion (No in step S107), the graph evaluation unit 17 outputs the generated data to be processed to the graph update unit 18, and the graph update unit 18 stores the generated data as an undetermined point in the undetermined point storage unit 13 (step S110).
When the data summarization system completes any one of steps S104, S109, and S110, it repeats the same processing for the next data input to the data input unit 10 (the next data in order of occurrence time). In this way, the processing from step S102 onward is performed individually for each piece of generated data input to the data input unit 10, in order of generation time. When the data generation source stops generating data (Yes in step S100), the data summarization system ends the process.
The example in FIG. 16 assumes that the accuracy constraint input unit 16 stores the criterion that the absolute value of the difference between the approximate value and the actual data value is less than the threshold ε. However, the criterion stored in the accuracy constraint input unit 16 is not limited to this one.
Next, the operation of step S105 is described in more detail. FIG. 17 is a flowchart illustrating an example of the processing of step S105. In step S105, the new data generation time substitution unit 12 determines whether any approximate expression has not yet been read from the approximate expression storage unit 15 (step S201). If one has not been read, the new data generation time substitution unit 12 reads that approximate expression and its approximate expression ID from the approximate expression storage unit 15 (step S202). Here, let the approximate expression read by the new data generation time substitution unit 12 be $x = F(t)$ and its approximate expression ID be "f". Next, the new data generation time substitution unit 12 substitutes the generation time $t_i$ of the data input to the data input unit 10 into the read approximate expression $x = F(t)$ to obtain the approximate value $F(t_i)$. The new data generation time substitution unit 12 then outputs the pair of the approximate expression ID "f" and the approximate value $F(t_i)$ to the graph evaluation unit 17 (step S203).
After step S203, the new data generation time substitution unit 12 repeats the processing from step S201. When no approximate expression remains unread in the approximate expression storage unit 15 (No in step S201), the new data generation time substitution unit 12 reads the final data information from the final time storage unit 11 (step S204) and outputs the approximate expression ID of the final approximate expression included in that information to the graph evaluation unit 17 (step S205). At this time, the new data generation time substitution unit 12 also outputs the data being processed, which was input to the data input unit 10, to the graph evaluation unit 17.
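The whole loop of FIG. 16 can be illustrated with a compact Python sketch. To stay short it makes several assumptions that are not from the source: approximate expressions are lines x = a*t + b generated from exactly two data (one uncertain point plus the new datum, the two-point option described for step S103), generation times are strictly increasing, a newly generated expression is treated as the final approximate expression, and the final approximate expression is preferred whenever it satisfies the criterion. It illustrates the control flow only and is not the patented implementation.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def summarize(stream: Iterable[Tuple[float, float]], eps: float):
    """Simplified FIG. 16 loop over (generation time, data value) pairs."""
    exprs: Dict[str, Tuple[float, float]] = {}  # approximate expressions: ID -> (a, b)
    domains: Dict[str, List[tuple]] = {}        # effective domains, in time order
    pending: List[Tuple[float, float]] = []     # uncertain points
    final_id: Optional[str] = None              # ID part of the final data information

    for t, x in stream:                          # S100/S101: one datum at a time
        if pending:                              # S102: two data available
            t0, x0 = pending.pop()               # S103: line through both points
            a = (x - x0) / (t - t0)
            eid = f"f{len(exprs)}"
            exprs[eid] = (a, x0 - a * t0)
            domains[eid] = [("section", t0, t)]  # S104
            final_id = eid                       # assumption (see lead-in)
            continue
        # S105-S107: substitute t into every expression and apply |x - f(t)| < eps
        ok = [e for e, (a, b) in exprs.items() if abs(x - (a * t + b)) < eps]
        if not ok:
            pending.append((t, x))               # S110: keep as an uncertain point
            continue
        eid = final_id if final_id in ok else ok[0]  # S108
        dom = domains[eid]
        if eid == final_id:
            dom[-1] = ("section", dom[-1][1], t)     # S109: extend the trailing item
        else:
            dom.append(("point", t))                 # S109: add an isolated point
        final_id = eid
    return exprs, domains, pending


exprs, domains, pending = summarize([(0, 1), (1, 3), (2, 5), (3, 40), (4, 9), (5, 11)], eps=0.5)
print(domains)  # {'f0': [('section', 0, 2), ('point', 5)], 'f1': [('section', 3, 4)]}
```

In this run the datum at t = 4 lies on x = 2t + 1, yet it is absorbed into the new expression f1 because step S102 is evaluated first; the later datum at t = 5 returns to f0 as an isolated point.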
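Steps S201 to S205 can be sketched in Python as follows, assuming each stored approximate expression is a linear expression kept as (first-order coefficient, constant term) under its ID, and that the final data information holds the final approximate expression's ID and the final time (as in FIG. 9). The types and names are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FinalDataInfo:
    expr_id: str       # ID of the final approximate expression
    final_time: float  # generation time of the last data it approximated


def substitute_new_time(exprs: Dict[str, Tuple[float, float]],
                        final_info: FinalDataInfo,
                        t_i: float) -> Tuple[List[Tuple[str, float]], str]:
    """Return [(ID, F(t_i)), ...] for every stored expression, plus the final expression's ID."""
    pairs = []
    for expr_id, (a, b) in exprs.items():     # S201/S202: read each expression in turn
        pairs.append((expr_id, a * t_i + b))  # S203: approximate value F(t_i)
    return pairs, final_info.expr_id          # S204/S205: report the final expression


pairs, final_id = substitute_new_time({"f0": (2.0, 1.0), "f1": (0.0, 3.0)},
                                      FinalDataInfo("f1", 3.0), t_i=4.0)
print(pairs, final_id)  # [('f0', 9.0), ('f1', 3.0)] f1
```

As noted in the text, reading the final data information (S204/S205) could equally be done before the loop.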
Note that the new data generation time substitution unit 12 may execute steps S204 and S205 before the loop processing of steps S201 to S203.
The data summarization system of the first embodiment stores "data that is generated sequentially with a certain tendency and whose tendency may change irregularly, such as CPU utilization or the number of accesses to a web page" as approximate expressions with the occurrence time as the variable, together with effective domains in which approximation by each approximate expression can be regarded as appropriate. The effective domain of one approximate expression is represented as a set of sections indicating time intervals and points indicating time instants. For each new piece of data, the data summarization system of the first embodiment identifies an approximate expression that satisfies the criterion stored in the accuracy constraint input unit 16 and approximates the new data with that expression. Therefore, the data summarization system of the first embodiment can summarize (compress) data with high accuracy.
In addition, because the effective domain of one approximate expression may contain multiple "sections" and "points", the data summarization system of the first embodiment can keep the storage capacity of the summarized data small and summarize data efficiently. For example, suppose that data that can be approximated by a certain approximate expression is generated continuously (first state), the tendency of the data values then changes temporarily (second state), and afterward data that can again be approximated by the original approximate expression is generated (third state). Although there are three states here, because an effective domain may contain multiple "sections" and "points", the data summarization system of the first embodiment can express the generated data of both the first and third states with the same approximate expression, reducing storage capacity accordingly. If each effective domain could contain only one section or point, the data summarization system in this example would have to store one approximate expression and effective domain for the first state and another for the third state; the same approximate expression would be stored twice, increasing storage capacity. The data summarization system of the first embodiment avoids this increase.
As a specific example, consider the case in which data is generated sequentially as in the case shown in FIG. Suppose that the technique described in Patent Document 1 is applied to compress such data. In that technique, a certain piece of data is set as a reference point (pivot), and generated data whose data value differs from the pivot by more than the compression accuracy is not approximated. In this case, as shown in FIG. 18, the generated data group is represented by five approximate expressions, x = h1(t), x = h2(t), x = h3(t), x = h4(t), and x = h5(t), and by the points $P_{191}$, $P_{192}$, and $P_{193}$ that cannot be approximated by them. One valid time period is determined for each approximate expression. Each approximate expression is a linear function represented by its first-order coefficient and constant term, that is, by two numbers. Since each approximate expression has one corresponding section, that section is represented by two numbers: its start point and end point. Storing one approximate expression and its section therefore requires storage for four numbers. In the example shown in FIG. 18 there are five approximate expressions, so the storage required for the approximate expressions and their sections is 4 × 5 = 20.
Here, for convenience, storage capacity is expressed as a count of numerical values. In the example shown in FIG. 18, the points $P_{191}$, $P_{192}$, and $P_{193}$ must also be stored. For each point, the data summarization system must store both the occurrence time and the data value; for example, for point $P_{191}$ it must store the occurrence time $t_1$ and the data value $X_{191}$. Each point therefore requires storage for two numbers, and the total storage required in the case of FIG. 18 is 4 × 5 + 2 × 3 = 26.
On the other hand, the data summarization system of the first embodiment approximates the same generated data with the four approximate expressions shown in FIG. In this case, the capacity required for storing the approximate expression itself is 2 × 4 = 8. Focusing on the storage capacity of the effective domain, the data summarization system of the first embodiment may store one section and two points for x = f1 (t). Regarding the section to be stored as an effective definition area, a storage capacity for two numerical values is sufficient, and for a point, a storage capacity for one numerical value is sufficient. Therefore, for the effective definition area of x = f1 (t), the required storage capacity is 2 × 1 + 2 = 4. The effective domain of the approximate expression x = f0 (t) is 3 sections. The effective domain of the approximate expression x = f2 (t) is two sections and one point. The effective domain of the approximate expression x = f3 (t) is one section. Therefore, the capacity required for storing each effective domain is (2 × 3) + (2 × 1 + 2) + (2 × 2 + 1) + (2 × 1) = 17. Therefore, the data summarization system of the first embodiment can store the summary results with a storage capacity of 8 + 17 = 25. Therefore, the data summarization system of the first embodiment can reduce the storage capacity.
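The storage comparison above can be checked with a few lines of Python. The per-item costs below simply restate the counting rules in the text (two numbers per linear expression, two per section, one per point in an effective domain, two per unapproximated point); the (sections, points) counts for f0 to f3 are those listed in the preceding paragraph.

```python
EXPR = 2          # first-order coefficient + constant term
SECTION = 2       # start point + end point
DOMAIN_POINT = 1  # a point in an effective domain stores only its time
RAW_POINT = 2     # an unapproximated point stores its time and data value


def domain_cost(sections: int, points: int) -> int:
    return SECTION * sections + DOMAIN_POINT * points


# FIG. 18: five expressions with one section each, plus points P191, P192, P193
pivot_based = 5 * (EXPR + SECTION) + 3 * RAW_POINT

# First embodiment: (sections, points) in the effective domain of each expression
domains = {"f0": (3, 0), "f1": (1, 2), "f2": (2, 1), "f3": (1, 0)}
first_embodiment = len(domains) * EXPR + sum(domain_cost(s, p) for s, p in domains.values())

print(pivot_based, first_embodiment)  # 26 25
```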
Note that the present invention is aimed at summarizing "data that is generated sequentially with a certain tendency and whose tendency may change irregularly", but irregular data may temporarily be generated in succession. FIG. 19 is an explanatory diagram illustrating an example of such a situation. In period "b" in FIG. 19, data represented by the approximate expression x = f2(t) continues to be generated. Similarly, data represented by x = f1(t) continues in period "c", and data represented by x = f3(t) continues in period "d". In period "a", however, data with differing tendencies (in other words, data approximated by different approximate expressions) is generated in succession. Such irregular data may accumulate as uncertain points, and an approximate expression different from the ideal one may then be obtained from them. Even so, when data is generated with a certain tendency that changes only irregularly, a non-ideal approximate expression is obtained only temporarily; in the long run, data with a consistent tendency is usually generated continuously, so ideal approximate expressions can be obtained from it. For example, even if an undesirable approximate expression is obtained in period a of FIG. 19, ideal approximate expressions such as x = f1(t), x = f2(t), and x = f3(t) are obtained in the subsequent periods b, c, and d. As a result, efficient data summarization is achieved overall.
Suppose further that the approximate expression is linear in the time t and that, when one uncertain point and one new piece of data are available, the data summarization system of the first embodiment generates the straight line through these two points as the approximate expression. In this case, the data summarization system of the first embodiment never uses more storage than would be needed to store the data itself as is. For example, storing two pieces of data as is requires two numbers for each, namely the data value and the occurrence time, so two pieces of data need storage for four numbers. When the straight line through the two points is used as the approximate expression, the data summarization system of the first embodiment must store the first-order coefficient and the constant term of the linear expression, and for the effective domain it need only store the occurrence times of the two data as the start and end points. The storage required for the approximate expression and its effective domain is therefore also four numbers. Consequently, even when compression is not efficient in this case, the data summarization system of the first embodiment can store the approximate expression and its effective domain with no more storage than storing the two pieces of data as they are.
Embodiment 2.
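The break-even argument can be illustrated with a short Python sketch (the function name is hypothetical): two raw data occupy four numbers, and the line through them plus a one-section effective domain also occupies four numbers while reproducing both data values.

```python
import math
from typing import Tuple


def two_point_summary(p0: Tuple[float, float],
                      p1: Tuple[float, float]) -> Tuple[float, float, float, float]:
    """Return (a, b, start, end): the line x = a*t + b through p0 and p1 and its section."""
    (t0, x0), (t1, x1) = p0, p1
    a = (x1 - x0) / (t1 - t0)   # first-order coefficient
    b = x0 - a * t0             # constant term
    return a, b, t0, t1         # four numbers, the same as storing p0 and p1 as is


a, b, start, end = two_point_summary((1.0, 0.1), (3.0, 0.7))
# Both original data values are recovered from the summary (up to rounding).
print(math.isclose(a * start + b, 0.1), math.isclose(a * end + b, 0.7))  # True True
```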
FIG. 20 is a block diagram illustrating an example of a data summarization system according to the second embodiment of the present invention. Components similar to those of the first embodiment are given the same reference numerals as in FIG. 2, and their detailed description is omitted. The data summarization system of the second embodiment includes a data input unit 10, a final time storage unit 11, a new data generation time substitution unit 12, a new approximate expression generation unit 14, an uncertain point storage unit 13, an approximate expression storage unit 15, an accuracy constraint input unit 16, a graph evaluation unit 17a, a graph update unit 18a, a confirmed graph storage unit 19, a default expression input unit 20, and a default expression storage unit 21.
The default expression storage unit 21 is a storage device that stores one approximate expression that is known in advance as an approximate expression for obtaining approximate values of data values. For example, FIG. 10 of the first embodiment shows a case in which four approximate expressions, x = f0(t) to x = f3(t), are generated; suppose that such an approximate expression is already known before processing starts. The default expression storage unit 21 stores an approximate expression known before processing starts in this way as the default expression.
The data summarization system of the second embodiment does not define an effective domain for the default expression, which reduces the storage needed for effective domains and makes data summarization more efficient than in the first embodiment.
FIG. 21 is an explanatory diagram illustrating an example of the default expression stored in the default expression storage unit 21. In the example shown in FIG. 21, the default expression storage unit 21 stores the first-order coefficient and the constant term of the default expression. As in FIGS. 7 and 8, a linear expression in the variable t is represented by the first-order coefficient of the variable and the constant term; that is, FIG. 21 illustrates the case in which a predetermined linear expression in t is stored. Specifically, it illustrates the case in which the default expression storage unit 21 stores the default expression $x = a_0 t + b_0$.
Although FIG. 21, like FIGS. 7 and 8, illustrates a default expression represented by a combination of a first-order coefficient and a constant term, the default expression may be represented in other forms. FIG. 22 shows an example in which the default expression is represented only by a constant term, namely the default expression $x = x_0$ (a constant). This is equivalent to a default expression whose first-order coefficient is 0, and means that the approximate value is $x_0$ regardless of the variable (occurrence time). The default expression may also be expressed by various other functions, such as a polynomial of degree two or higher, an exponential function, or a trigonometric function.
The default expression input unit 20 receives a default expression entered by the user and stores it in the default expression storage unit 21.
The graph evaluation unit 17a in the second embodiment compares each approximate value, calculated by the new data generation time substitution unit 12 by substituting the generation time of new data into each approximate expression stored in the approximate expression storage unit 15, with the actual data value of the new data; in this respect it is the same as the graph evaluation unit 17 of the first embodiment. The graph evaluation unit 17a of the second embodiment additionally calculates the approximate value obtained by substituting the generation time of the new data (the newly generated data received from the new data generation time substitution unit 12) into the default expression, and also compares that approximate value with the actual data value of the new data. The graph evaluation unit 17a then identifies the approximate expressions that satisfy the criterion stored in the accuracy constraint input unit 16. When several approximate expressions satisfy the criterion, the graph evaluation unit 17a determines, as the expression that approximates the data value of the new data, the one among them that minimizes the increase in storage capacity when its effective domain is updated. When only one approximate expression satisfies the criterion, the graph evaluation unit 17a determines that expression as the one approximating the data value of the new data.
Here, when several approximate expressions satisfy the criterion and the default expression is among them, the graph evaluation unit 17a selects the default expression. This is because no effective domain is defined for the default expression, so the storage needed to represent its effective domain is zero. When several approximate expressions satisfy the criterion and the default expression is not among them, the graph evaluation unit 17a selects an approximate expression in the same way as the graph evaluation unit 17 of the first embodiment, so that description is omitted.
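The second embodiment's selection rule can be sketched in Python as below. The reserved ID `"default"` and the per-expression storage-increase mapping are hypothetical; the increase for ordinary expressions would be computed as in the first embodiment.

```python
from typing import Dict, Optional

DEFAULT_ID = "default"  # hypothetical reserved ID for the default expression


def choose_expression(approx_values: Dict[str, float], x: float, eps: float,
                      increase: Dict[str, int]) -> Optional[str]:
    """Return the ID chosen to approximate x, or None if no expression meets the criterion.

    approx_values: ID -> approximate value at the new generation time,
                   including the default expression under DEFAULT_ID.
    increase:      ID -> storage increase of updating that expression's effective domain.
    """
    ok = [eid for eid, fx in approx_values.items() if abs(x - fx) < eps]
    if not ok:
        return None          # the new data becomes an uncertain point
    if DEFAULT_ID in ok:
        return DEFAULT_ID    # no effective domain is stored, so the increase is zero
    # min() keeps the first of equal minima, standing in for a predetermined tie-break.
    return min(ok, key=lambda eid: increase[eid])
```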
If the approximate expression determined by the graph evaluation unit 17a is an approximate expression other than the default expression, the graph evaluation unit 17a includes the determined approximate expression and its effective domain, and new data (from the new data generation time substitution unit 12). The received generated data) is output to the graph updating unit 18a. At this time, the graph evaluation unit 17a also outputs information indicating whether or not the determined approximate expression is the final approximate expression to the graph update unit 18a. Further, the graph evaluation unit 17a may output, for example, the approximate expression ID to the graph update unit 18a as the determined approximate expression. This operation is the same as the operation of the graph evaluation unit 17 in the first embodiment.
On the other hand, if the determined approximate expression is the default expression, the graph evaluation unit 17a outputs, to the graph update unit 18a, information indicating that the default expression has been selected, together with the newly input generated data. As this information, a dedicated default expression ID that identifies the default expression may be used.
Also, there may be no approximate expression that satisfies the criteria stored in the accuracy constraint input unit 16. In that case, the graph evaluation unit 17a may output new data to the graph update unit 18a without selecting an approximate expression. This operation is the same as the operation of the graph evaluation unit 17 in the first embodiment.
When the graph evaluation unit 17a has determined an approximate expression other than the default expression, and the graph update unit 18a of the second embodiment receives from it that approximate expression and its effective domain, the new generated data, and information indicating whether the determined approximate expression is the final approximate expression, the graph update unit 18a updates the effective domain of that approximate expression. The graph update unit 18a may update the effective domain by adding the generation time of the new generated data to the effective domain of the received approximate expression. The graph update unit 18a stores the updated effective domain of the approximate expression determined by the graph evaluation unit 17a in the confirmed graph storage unit 19. This operation is the same as that of the graph update unit 18 in the first embodiment.
When the graph evaluation unit 17a selects the default expression as the approximate expression for the data value of the new data and notifies the graph update unit 18a of that selection together with the newly input generated data, the graph update unit 18a operates as follows: it stores the generation time of the new data and the dedicated default expression ID identifying the default expression in the final time storage unit 11 as the final data information.
When the graph evaluation unit 17a outputs only the new generated data to the graph update unit 18a without selecting an approximate expression, the graph update unit 18a stores the generated data as an uncertain point in the uncertain point storage unit 13. This operation is the same as that of the graph update unit 18 in the first embodiment.
Further, the new approximate expression generation unit 14 newly generates an approximate expression from the uncertain point, and the approximate expression and each data used for generating the approximate expression (that is, each of the data stored in the uncertain point storage unit 13). The operation of the graph update unit 18a when the uncertain point and newly generated data) are output to the graph update unit 18a is the same as the operation of the graph update unit 18 in the first embodiment.
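The three cases handled by the graph update unit 18a after a selection can be summarized in one Python sketch. The state layout, the reserved ID `"default"`, and the rule of extending the trailing domain item when the chosen expression is also the final one (assuming no uncertain point was stored in between) are assumptions for illustration; registering a brand-new expression generated from uncertain points is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

DEFAULT_ID = "default"  # hypothetical reserved ID for the default expression


@dataclass
class SummaryState:
    domains: Dict[str, List[tuple]] = field(default_factory=dict)  # confirmed graph
    uncertain: List[Tuple[float, float]] = field(default_factory=list)
    final: Optional[Tuple[str, float]] = None  # final data information: (ID, time)


def apply_decision(state: SummaryState, decision: Optional[str], t: float, x: float) -> None:
    if decision is None:                  # no expression met the criterion
        state.uncertain.append((t, x))
        return
    if decision != DEFAULT_ID:            # ordinary expression: add t to its domain
        dom = state.domains.setdefault(decision, [])
        if dom and state.final is not None and state.final[0] == decision:
            dom[-1] = ("section", dom[-1][1], t)  # extend the trailing section or point
        else:
            dom.append(("point", t))
    state.final = (decision, t)           # default or not, record the final data information
```

Note that the default expression never receives an effective domain; only the final data information records that it approximated the latest data.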
The graph evaluation unit 17a, the graph update unit 18a, and the default expression input unit 20 in the second embodiment are realized, for example, by the CPU of a computer operating according to a data summarization program. In this case, a program storage device (not shown) of the computer stores the data summarization program, and the CPU may read the program and, according to it, operate as the data input unit 10, the new data generation time substitution unit 12, the new approximate expression generation unit 14, the accuracy constraint input unit 16, the graph evaluation unit 17a, the graph update unit 18a, and the default expression input unit 20. Alternatively, each of these units may be realized by separate hardware.
An example of the processing flow of the second embodiment will be described with reference to FIG. Processing identical to that of the first embodiment is not described again. The operation up to step S105 is the same as in the first embodiment. After step S105, the graph evaluation unit 17a compares the approximate value calculated for each approximate expression with the actual data value of the data input to the data input unit 10 (step S106). In the second embodiment, the graph evaluation unit 17a additionally substitutes the generation time of the new data into the default expression stored in the default expression storage unit 21 and compares the resulting approximate value with the actual data value. For example, when the accuracy constraint input unit 16 stores the criterion $|x - f(t)| < \varepsilon$, the graph evaluation unit 17a may calculate the absolute value of the difference between the approximate value and the actual data value.
Next, the graph evaluation unit 17a determines whether any approximate expression satisfies the criterion that the absolute value of the difference calculated in step S106 is less than the threshold ε (step S107). When no approximate expression satisfies the criterion (No in step S107), the operation is the same as in the first embodiment.
When one or more approximate expressions satisfy the criterion (Yes in step S107), the graph evaluation unit 17a selects from among them the approximate expression that minimizes the increase in storage capacity when its effective domain is updated (step S108). In step S108, if the default expression is among the approximate expressions satisfying the criterion, the graph evaluation unit 17a selects the default expression and outputs information notifying that the default expression has been selected, together with the input new data, to the graph update unit 18a. When the default expression is not among them, the operation is the same as in the first embodiment.
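The evaluation in steps S106 to S108 can be sketched as follows. This is a minimal illustration under assumptions not stated in the text: expressions are plain Python callables, the accuracy criterion is $|x - f(t)| < \varepsilon$, `DEFAULT_ID` is a hypothetical ID reserved for the default expression, and each candidate carries a precomputed `extra_storage` value standing in for the storage increase its effective domain would incur.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical ID reserved for the default expression.
DEFAULT_ID = "default"


@dataclass
class Candidate:
    expr_id: str
    f: Callable[[int], float]
    # Hypothetical: numerals added to this expression's effective domain
    # if the new generation time were appended to it.
    extra_storage: int


def select_expression(t: int, x: float, candidates: List[Candidate],
                      default_f: Callable[[int], float],
                      eps: float) -> Optional[str]:
    """Return the ID of the selected expression, or None (uncertain point)."""
    # Steps S106/S107 for the default expression: if it satisfies
    # |x - f(t)| < eps it is preferred (step S108), because choosing it
    # adds nothing to any stored effective domain.
    if abs(x - default_f(t)) < eps:
        return DEFAULT_ID
    # Steps S106/S107 for the remaining approximate expressions.
    satisfying = [c for c in candidates if abs(x - c.f(t)) < eps]
    if not satisfying:
        return None  # no expression fits: the data becomes an uncertain point
    # Step S108: pick the expression whose effective-domain update is cheapest.
    return min(satisfying, key=lambda c: c.extra_storage).expr_id
```

Checking the default expression first is equivalent to the rule in step S108, since the default expression is chosen whenever it satisfies the criterion.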
When the graph update unit 18a receives from the graph evaluation unit 17a the information notifying that the default expression has been selected, together with the generated data, it stores final data information consisting of the default expression ID and the generation time of the data in the final time storage unit 11 (step S109). In all other cases, the operation in step S109 is the same as in the first embodiment, and its description is omitted.
The data summarization system of the second embodiment does not define an effective domain for the default expression. Only generated data whose approximating expression is not the default expression is associated with one of the other approximate expressions. The data summarization system of the second embodiment therefore summarizes data approximated by expressions other than the default expression efficiently and with a small storage capacity, as in the first embodiment. Moreover, because it stores no effective domain for data approximated by the default expression, it summarizes even more efficiently, requiring less storage capacity than the first embodiment.
For example, assume that $x = f_0(t)$ is the default expression among the four approximate expressions shown in FIG. In this case, storing the approximate expressions themselves requires 8 (= 2 × 4) numerical values. Because the data summarization system of the second embodiment does not need to store the effective domain of $x = f_0(t)$, storing the remaining effective domains requires (2 × 1 + 2) + (2 × 2 + 1) + (2 × 1) = 11 numerical values. The capacity needed to store the summarized data is therefore 19 (= 8 + 11), so the data summarization system of the second embodiment reduces the storage capacity further than the first embodiment.
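The arithmetic above can be reproduced as follows. The breakdown is an assumed reading of the terms, not stated in the text: each approximate expression is stored as 2 numerals (for example, the coefficients of a line), each closed interval in an effective domain as 2 numerals (its endpoints), and each isolated time as 1 numeral, so that (2 × 1 + 2) would be one interval plus two isolated times.

```python
from typing import List, Tuple

COEFFS_PER_EXPR = 2  # assumption: e.g. slope and intercept of a line


def domain_storage(intervals: int, isolated_points: int) -> int:
    # Assumption: an interval [a, b] costs 2 numerals, a single time costs 1.
    return 2 * intervals + isolated_points


def summary_storage(num_exprs: int, domains: List[Tuple[int, int]]) -> int:
    """Numerals needed for the expressions plus the stored effective domains."""
    expr_cost = COEFFS_PER_EXPR * num_exprs
    return expr_cost + sum(domain_storage(i, p) for i, p in domains)


# Four expressions; the default x = f0(t) has no stored effective domain,
# so only three (intervals, isolated points) pairs are listed.
domains = [(1, 2), (2, 1), (1, 0)]
total = summary_storage(4, domains)  # 8 + (4 + 5 + 2)
```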
Embodiment 3.
The data summarization system according to the third embodiment first stores the data received from a data generation source (not shown) as-is, without summarizing it, and summarizes the stored data when the storage resources (memory resources) for holding it run short.
FIG. 23 is a block diagram illustrating an example of a data summarization system according to the third embodiment of this invention. The data summarization system according to the third embodiment includes a data input unit 10, a final time storage unit 11, a new data generation time substitution unit 12, a new approximate expression generation unit 14, an uncertain point storage unit 13, an approximate expression storage unit 15, an accuracy constraint input unit 16, a graph evaluation unit 17, a graph update unit 18, a confirmed graph storage unit 19, an unsummarized data storage unit 30, an available storage area monitoring unit 31, and a summary control unit 32.
Components similar to those in the first embodiment are denoted by the same reference numerals as in FIG. 2, and their detailed description is omitted. In this embodiment, however, when data is input to the data input unit 10 from a data generation source (not shown), the data input unit 10 outputs it to the summary control unit 32, and the new data generation time substitution unit 12 and the new approximate expression generation unit 14 receive data from the summary control unit 32.
The unsummarized data storage unit 30 is a storage device that stores data generated by a data generation source (not shown) in unsummarized form. Because each data item consists of a data value and a generation time, the unsummarized data storage unit 30 stores a group of such items, in the same form as the uncertain points stored in the uncertain point storage unit 13 illustrated in FIG. 4. The summary control unit 32 writes the data into the unsummarized data storage unit 30.
The unsummarized data storage unit 30, the final time storage unit 11, the uncertain point storage unit 13, the approximate expression storage unit 15, and the confirmed graph storage unit 19 may be realized by separate storage devices or by a single storage device. Alternatively, some of them may share a storage device. For example, the unsummarized data storage unit 30 and the final time storage unit 11 may be realized by one storage device, the approximate expression storage unit 15 and the confirmed graph storage unit 19 by a different storage device, and the uncertain point storage unit 13 by yet another storage device.
The available storage area monitoring unit 31 monitors the amount of resources available in the storage device that stores at least the unsummarized data, and outputs the monitoring result to the summary control unit 32. Unsummarized data here means data that has been stored in the unsummarized data storage unit 30 from the data generation source via the data input unit 10 and the summary control unit 32, and that has not yet undergone the summarization process. The available storage area monitoring unit 31 therefore only needs to monitor the resources available in the unsummarized data storage unit 30. If the unsummarized data storage unit 30 shares a storage device with other storage units, the available storage area monitoring unit 31 may monitor the resources available in that storage device.
One way for the available storage area monitoring unit 31 to monitor the available resource amount is to monitor the remaining free memory capacity. This is only an example, and the available storage area monitoring unit 31 may measure resources in other ways. For example, when the unsummarized data storage unit 30 is a disk storage device, it may monitor the unused rate of the disk. Although monitoring the available resource amount is described here, the available storage area monitoring unit 31 may instead monitor the amount of resources already in use (for example, the disk usage rate). The following description takes as an example the case where the available storage area monitoring unit 31 monitors the amount of resources available in the unsummarized data storage unit 30.
Further, the available storage area monitoring unit 31 may monitor the unsummarized data storage unit 30 at regular intervals. Alternatively, for example, the available storage area monitoring unit 31 may monitor the unsummarized data storage unit 30 when an instruction to perform monitoring is input at an arbitrary timing from a user or the like.
Depending on the monitoring result of the available storage area monitoring unit 31, the summary control unit 32 either stores the generated data input to the data input unit 10 in the unsummarized data storage unit 30 without summarizing it, or performs summary control to summarize the data stored in the unsummarized data storage unit 30. If the amount of available resources in the unsummarized data storage unit 30 (for example, the remaining memory capacity or the disk unused rate) is larger than a threshold, the summary control unit 32 stores the generated data input to the data input unit 10 in the unsummarized data storage unit 30 without summarizing it. If the available resource amount is equal to or less than the threshold, the summary control unit 32 performs summary control on the data stored in the unsummarized data storage unit 30. Specifically, it outputs that data to the new approximate expression generation unit 14 and the new data generation time substitution unit 12 to start the data summarization process, and erases the output data from the unsummarized data storage unit 30.
When the available storage area monitoring unit 31 instead monitors the amount of resources already in use, the summary control unit 32 may store the generated data in the unsummarized data storage unit 30 while the amount of used resources is less than the threshold, and perform summary control when it is equal to or greater than the threshold.
When performing summary control, the summary control unit 32 may also store newly generated data in the unsummarized data storage unit 30 and include that data in the summary control at the same time.
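The threshold-based control described in the preceding paragraphs can be sketched as below. Assumptions for illustration: the unsummarized data storage unit 30 is modeled as an in-memory list whose capacity is counted in data items, free capacity stands in for the monitored resource amount, and `summarize` is a hypothetical callback standing in for the new data generation time substitution unit 12 and the new approximate expression generation unit 14. The sketch follows the variant in which new data is stored first and then included in the summary control.

```python
from typing import Callable, List, Tuple

Datum = Tuple[int, float]  # (generation time, data value)


class SummaryController:
    """Sketch of the summary control unit 32 (hypothetical names)."""

    def __init__(self, capacity: int, threshold: int,
                 summarize: Callable[[Datum], None]) -> None:
        self.capacity = capacity        # items the unsummarized store can hold
        self.threshold = threshold      # summarize when free items <= threshold
        self.summarize = summarize      # stands in for units 12 and 14
        self.buffer: List[Datum] = []   # unsummarized data storage unit 30

    def free(self) -> int:
        # Plays the role of the available storage area monitoring unit 31.
        return self.capacity - len(self.buffer)

    def on_data(self, datum: Datum) -> None:
        # Store the new data without summarizing it.
        self.buffer.append(datum)
        if self.free() <= self.threshold:
            # Summary control: output items in generation-time order,
            # then erase them from the unsummarized store.
            for item in sorted(self.buffer):
                self.summarize(item)
            self.buffer.clear()
```

With a capacity of 5 items and a threshold of 2, the third stored item triggers summarization of all three buffered items, and the fourth item is held unsummarized again.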
When performing summary control, the summary control unit 32 outputs the data stored in the unsummarized data storage unit 30 one item at a time to the new approximate expression generation unit 14 and the new data generation time substitution unit 12, for example sending the same item to both units simultaneously. The output order only needs to satisfy the condition that each item output later has a later generation time than the item output before it. When erasing output data from the unsummarized data storage unit 30, however, the summary control unit 32 need not erase it in time order, as long as no item that has already been output is output again.
The summary control unit 32 also need not output all of the data stored in the unsummarized data storage unit 30 to the new approximate expression generation unit 14 and the new data generation time substitution unit 12 for summarization. FIG. 24 schematically shows the data stored in the unsummarized data storage unit 30 arranged in order of generation time. Assume that data 51 was the first data input to the data input unit 10 and stored in the unsummarized data storage unit 30, and that data 52 and subsequent data were stored after it in sequence. The summary control unit 32 may output the data to the new approximate expression generation unit 14 and the new data generation time substitution unit 12 in order of generation time, starting from data 51. Alternatively, it may start partway through the generated data (for example, from data 55) and output the data from there in order of generation time. In this case, data 51 to 54 are not summarized and are kept in the unsummarized data storage unit 30 without being erased.
As long as each item output later to the new approximate expression generation unit 14 and the new data generation time substitution unit 12 has a later generation time than the items output before it, the summary control unit 32 may also skip data. For example, after outputting data 51 to 54, the summary control unit 32 may skip data 55 and output data 56 to 59. In this case too, the skipped data is excluded from summarization and kept in the unsummarized data storage unit 30.
Suppose further that data summarization has already started and final data information is stored in the final time storage unit 11. In this case, the summary control unit 32 may output to the new approximate expression generation unit 14 and the new data generation time substitution unit 12 only data whose generation time is later than the final time indicated by the final data information.
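The selection rules in the last few paragraphs (time-ordered output, starting partway, skipping items, and outputting only data newer than the final time) can be combined in one helper. This is a hypothetical sketch: `start_time`, `skip`, and the function name are illustrative and not named in the text, and data items are (generation time, value) pairs. The text does not say what happens to stored data at or before the final time; this sketch simply leaves it in the store.

```python
from typing import AbstractSet, Iterable, List, Optional, Tuple

Datum = Tuple[int, float]  # (generation time, data value)


def select_for_summary(stored: Iterable[Datum],
                       final_time: Optional[int] = None,
                       start_time: Optional[int] = None,
                       skip: AbstractSet[int] = frozenset()
                       ) -> Tuple[List[Datum], List[Datum]]:
    """Split stored data into (to_output, kept_unsummarized).

    to_output is in increasing generation time, so every item output
    later has a later generation time than the items output before it.
    """
    to_output: List[Datum] = []
    kept: List[Datum] = []
    for t, x in sorted(stored):
        # Assumption: data at or before the final time is left in the store.
        too_old = final_time is not None and t <= final_time
        before_start = start_time is not None and t < start_time
        if too_old or before_start or t in skip:
            kept.append((t, x))  # not summarized; stays in unit 30
        else:
            to_output.append((t, x))
    return to_output, kept
```

Calling it with `skip={5}` on items at times 1 to 9 mirrors skipping data 55 in FIG. 24, and `start_time=5` mirrors starting from data 55.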
When the available resources become smaller still, the summary control unit 32 outputs newly input generated data directly to the new data generation time substitution unit 12 and the new approximate expression generation unit 14 so that it is summarized immediately. A threshold for deciding whether to summarize new data in this way, without first storing it in the unsummarized data storage unit 30, may be set in advance separately from the threshold described above.
When the summary control unit 32 outputs data to the new approximate expression generation unit 14 and the new data generation time substitution unit 12, the new approximate expression generation unit 14, the new data generation time substitution unit 12, the graph evaluation unit 17, and the graph update unit 18 operate as in the first embodiment. That is, they perform the operations from step S102 onward in FIG. 16 for each data item output from the summary control unit 32. When the process proceeds from step S102 to step S103, the new data generation time substitution unit 12 performs no processing.
The available storage area monitoring unit 31 and the summary control unit 32 in the third embodiment are realized by, for example, a CPU of a computer that operates according to a data summarization program. In this case, the program storage device (not shown) of the computer stores the data summarization program, and the CPU reads the program, and according to the program, the data input unit 10, the available storage area monitoring unit 31, the summarization control unit 32, the new The data generation time substitution unit 12, the new approximate expression generation unit 14, the accuracy constraint input unit 16, the graph evaluation unit 17, and the graph update unit 18 may be operated. In addition, each of these units may be realized by separate hardware.
If plenty of resources are available for storing data, the data summarization system of the third embodiment stores data in the unsummarized data storage unit 30 without summarizing it, and can therefore hold the data with high accuracy. When the resources available for storing data run low, it summarizes the data stored in the unsummarized data storage unit 30 and, as in the first embodiment, stores it efficiently with a small storage capacity in the form of approximate expressions and their effective domains. The data summarization system of the third embodiment therefore achieves efficient data summarization, as the other embodiments do, while holding data with high accuracy when ample resources are available.
Each component shown in FIG. 23 may be realized by multiple devices instead of a single device. For example, the data input unit 10, the available storage area monitoring unit 31, the summary control unit 32, the unsummarized data storage unit 30, and the final time storage unit 11 may be realized by a first information processing apparatus; the new data generation time substitution unit 12, the new approximate expression generation unit 14, the uncertain point storage unit 13, the accuracy constraint input unit 16, the graph evaluation unit 17, and the graph update unit 18 by a second information processing apparatus; and the approximate expression storage unit 15 and the confirmed graph storage unit 19 by a database apparatus. The data summarization system may then consist of the first information processing apparatus, the second information processing apparatus, and the database apparatus.
The data summarization system of the third embodiment may also, as in the second embodiment, include a default expression input unit 20 and a default expression storage unit 21 (see FIG. 20), and a graph evaluation unit 17a and a graph update unit 18a (see FIG. 20) in place of the graph evaluation unit 17 and the graph update unit 18. Such a configuration summarizes data more efficiently, as in the second embodiment.
Embodiment 4.
The fourth embodiment uses a default expression, as the second embodiment does. In the data summarization system of the fourth embodiment, however, if the storage capacity of the effective domain of some approximate expression other than the default expression is larger than the storage capacity the default expression would need if an effective domain were defined for it, that approximate expression is made the new default expression.
FIG. 25 is a block diagram illustrating an example of a data summarization system according to the fourth embodiment of this invention. Components identical to those of the second embodiment are denoted by the same reference numerals as in FIG. 20, and their detailed description is omitted. The data summarization system according to the fourth embodiment includes a data input unit 10, a final time storage unit 11, a new data generation time substitution unit 12, a new approximate expression generation unit 14, an uncertain point storage unit 13, an approximate expression storage unit 15, an accuracy constraint input unit 16, a graph evaluation unit 17a, a graph update unit 18a, a confirmed graph storage unit 19, a default expression input unit 20, a default expression storage unit 21, a default expression effective domain calculation unit 40, an effective domain capacity evaluation unit 41, and an effective domain update unit 42.
As described in the second embodiment, no effective domain is defined for the default expression stored in the default expression storage unit 21. To compare the storage capacity required for the effective domain of each approximate expression other than the default expression with the storage capacity that the default expression would require if it had an effective domain, the default expression effective domain calculation unit 40 calculates this default effective domain and outputs it to the effective domain capacity evaluation unit 41. The calculation proceeds as follows. Let $U$ denote the time span from the generation time of the first generated data to the generation time of the most recent generated data. Let $S_0, S_1, \ldots, S_n$ denote the effective domains of the approximate expressions $x = f_0(t), x = f_1(t), \ldots, x = f_n(t)$, and let $S_{\mathrm{default}}$ denote the default effective domain. The default expression effective domain calculation unit 40 obtains $S_{\mathrm{default}}$ by evaluating formula (1):
$S_{\mathrm{default}} = U \setminus (S_0 \cup S_1 \cup \cdots \cup S_n)$   Formula (1)
That is, the default expression effective domain calculation unit 40 calculates the default effective domain $S_{\mathrm{default}}$ by removing the union of the effective domains of all approximate expressions from the time span between the generation time of the first generated data and that of the most recent generated data.
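Formula (1) can be computed with a single sweep over sorted intervals. A minimal sketch, assuming integer generation times and effective domains stored as lists of closed intervals (a, b); the function name is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # closed interval [a, b] of integer times


def default_domain(span: Interval,
                   domains: List[List[Interval]]) -> List[Interval]:
    """S_default = U minus (S_0 u S_1 u ... u S_n), as closed intervals."""
    lo, hi = span
    covered = sorted(iv for domain in domains for iv in domain)
    result: List[Interval] = []
    cursor = lo  # earliest time not yet known to be covered
    for a, b in covered:
        if cursor > hi:
            break
        if a > cursor:
            # The gap [cursor, a - 1] is covered by no effective domain.
            result.append((cursor, min(a - 1, hi)))
        cursor = max(cursor, b + 1)
    if cursor <= hi:
        result.append((cursor, hi))
    return result
```

With hypothetical values $T_0, \ldots, T_7 = 0, 10, \ldots, 70$ for the example of FIG. 26, `default_domain((0, 70), [f1, f2])` with `f1 = [(0, 10), (21, 30), (51, 60)]` and `f2 = [(31, 40), (61, 70)]` yields `[(11, 20), (41, 50)]`, that is, $[T_1+1, T_2] \cup [T_4+1, T_5]$.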
FIG. 26 is an explanatory diagram showing an example of deriving the default effective domain. The horizontal axis in FIG. 26 represents the data generation time t, and the vertical axis represents the data value x. All data are assumed to be generated in the interval from $T_0$ to $T_7$. In FIG. 26, the default expression is $x = f_0(t)$, and times are assumed to be integers greater than or equal to $T_0$. $x = f_1(t)$ and $x = f_2(t)$ are approximate expressions other than the default expression. The effective domain of $x = f_1(t)$ is $I_{11} \cup I_{12} \cup I_{13}$, more specifically $[T_0, T_1] \cup [T_2+1, T_3] \cup [T_5+1, T_6]$. The effective domain of $x = f_2(t)$ is $I_{21} \cup I_{22}$, more specifically $[T_3+1, T_4] \cup [T_6+1, T_7]$. Here $[T_0, T_1]$ denotes the interval from time $T_0$ to time $T_1$ inclusive, and the other intervals are written in the same way. FIG. 27 is an explanatory diagram summarizing the effective domains of $x = f_1(t)$ and $x = f_2(t)$.
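In this example, the span from the first to the most recent generation time is $[T_0, T_7]$, so the default expression effective domain calculation unit 40 may calculate the default effective domain as in formula (2):

$$
\begin{aligned}
S_{\mathrm{default}} &= [T_0, T_7] \setminus \big( ([T_0, T_1] \cup [T_2+1, T_3] \cup [T_5+1, T_6]) \cup ([T_3+1, T_4] \cup [T_6+1, T_7]) \big) \\
&= [T_1+1, T_2] \cup [T_4+1, T_5] \qquad (2)
\end{aligned}
$$

The intervals $[T_1+1, T_2]$ and $[T_4+1, T_5]$ are the intervals $I_{01}$ and $I_{02}$ shown in FIG.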
The effective domain capacity evaluation unit 41 refers to the effective domain of the default expression received from the default expression effective domain calculation unit 40 and to the effective domain of each approximate expression stored in the confirmed graph storage unit 19, and identifies the approximate expression whose effective domain requires the most storage capacity. It outputs this approximate expression and the default effective domain to the effective domain update unit 42. Instead of the approximate expression itself, the effective domain capacity evaluation unit 41 may output its approximate expression ID, or the default expression ID, to the effective domain update unit 42.
In the example shown in FIG. 27, the effective domain of the approximate expression with ID "f1" is $[T_0, T_1] \cup [T_2+1, T_3] \cup [T_5+1, T_6]$, which requires storage for 6 numerical values. The effective domain of the approximate expression with ID "f2" is $[T_3+1, T_4] \cup [T_6+1, T_7]$, which requires storage for 4 numerical values. The effective domain calculated for the default expression is $[T_1+1, T_2] \cup [T_4+1, T_5]$, which also requires storage for 4 numerical values. The effective domain capacity evaluation unit 41 therefore outputs $x = f_1(t)$, whose effective domain requires the most storage, together with the default effective domain $[T_1+1, T_2] \cup [T_4+1, T_5]$ to the effective domain update unit 42.
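The comparison made by the effective domain capacity evaluation unit 41 can be sketched as follows, assuming, as in the example above, that every interval costs two numerals (its two endpoints). The IDs and intervals use hypothetical times $T_0, \ldots, T_7 = 0, 10, \ldots, 70$ for the example of FIG. 26, and the function names are illustrative.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


def domain_cost(intervals: List[Interval]) -> int:
    # Each closed interval [a, b] is stored as two numerals.
    return 2 * len(intervals)


def largest_domain(domains: Dict[str, List[Interval]]) -> str:
    """Return the ID whose effective domain needs the most storage."""
    return max(domains, key=lambda expr_id: domain_cost(domains[expr_id]))


domains = {
    "f0": [(11, 20), (41, 50)],           # default effective domain
    "f1": [(0, 10), (21, 30), (51, 60)],
    "f2": [(31, 40), (61, 70)],
}
```

Ties are broken by dictionary order in this sketch; the text does not specify a tie-breaking rule.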
When the effective domain calculated for the default expression and the approximate expression whose effective domain requires the most storage capacity are input to the effective domain update unit 42, the effective domain update unit 42 updates the default expression accordingly. If that approximate expression is the current default expression, however, the effective domain update unit 42 ends the process without updating anything.
The default expression effective domain calculation unit 40, the effective domain capacity evaluation unit 41, and the effective domain update unit 42 in the fourth embodiment are realized, for example, by the CPU of a computer operating according to a data summarization program. In this case, a program storage device (not shown) of the computer stores the data summarization program, and the CPU may read the program and, according to it, operate as the data input unit 10, the new data generation time substitution unit 12, the new approximate expression generation unit 14, the accuracy constraint input unit 16, the graph evaluation unit 17a, the graph update unit 18a, the default expression input unit 20, the default expression effective domain calculation unit 40, the effective domain capacity evaluation unit 41, and the effective domain update unit 42. Alternatively, each of these units may be realized by separate hardware.
Next, the operation of the fourth embodiment will be described.
FIG. 28 is a flowchart showing an example of the default expression update process performed by the default expression effective domain calculation unit 40, the effective domain capacity evaluation unit 41, and the effective domain update unit 42 in the fourth embodiment.
The data summarization system performs this default expression update process asynchronously with the data summarization process (the same data summarization process as in the second embodiment) performed by the data input unit 10, the final time storage unit 11, the new data generation time substitution unit 12, the new approximate expression generation unit 14, the uncertain point storage unit 13, the approximate expression storage unit 15, the graph evaluation unit 17a, the graph update unit 18a, the confirmed graph storage unit 19, and the default expression storage unit 21. For example, the data summarization system may execute the default expression update process shown in FIG. 28 at regular time intervals. Alternatively, it may execute the process each time a predetermined number of data items have been input to the data input unit 10, or whenever a new approximate expression is added to the approximate expression storage unit 15. These are only examples, and the execution timing of the default expression update process shown in FIG. 28 is not limited to them.
When the default expression update process starts, the default expression effective domain calculation unit 40 first reads the effective domains of all approximate expressions from the confirmed graph storage unit 19 (step S401). It then calculates the default effective domain $S_{\mathrm{default}}$ by evaluating formula (1) above (step S402) and outputs this effective domain to the effective domain capacity evaluation unit 41.
Next, the effective domain capacity evaluation unit 41 refers to the default effective domain $S_{\mathrm{default}}$ and the effective domain of each approximate expression, and identifies the approximate expression whose effective domain requires the most storage capacity (step S403). It then outputs the identified approximate expression and the default effective domain to the effective domain update unit 42.
Next, the effective domain update unit 42 determines whether the approximate expression identified in step S403 as requiring the most storage capacity for its effective domain is the current default expression (step S404).
If the approximate expression identified in step S403 is not the default expression (No in step S404), the effective domain update unit 42 makes it the new default expression (step S405). Specifically, the effective domain update unit 42 performs the following processing.
The effective domain update unit 42 sets the approximate expression whose effective domain requires the most storage capacity as the new default expression and updates the default expression stored in the default expression storage unit 21 accordingly. It deletes that approximate expression and its approximate expression ID from the approximate expression storage unit 15. It then assigns an approximate expression ID to the former default expression and stores the former default expression together with that ID in the approximate expression storage unit 15.
The effective domain update unit 42 also deletes the approximate expression ID of the new default expression and its effective domain from the confirmed graph storage unit 19, and stores in the confirmed graph storage unit 19 the approximate expression ID assigned to the former default expression together with its effective domain (the one calculated by the default expression effective domain calculation unit 40).
Furthermore, if the final time storage unit 11 holds the approximate expression ID of the new default expression as the ID of the last approximate expression, the effective domain update unit 42 replaces it with the default expression ID. Conversely, if the final time storage unit 11 holds the default expression ID as the ID of the last approximate expression, the effective domain update unit 42 replaces it with the approximate expression ID newly assigned to the former default expression.
If the approximate expression identified in step S403 is the default expression (Yes in step S404), the effective domain update unit 42 ends the process without changing the default expression. That is, it leaves the contents of the default expression storage unit 21, the approximate expression storage unit 15, the confirmed graph storage unit 19, and the final time storage unit 11 unchanged.
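Steps S403 to S405 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: effective domains are lists of inclusive integer intervals, a single time point is written as the interval (t, t), and storage is counted as two numbers per interval and one per point. The names `domain_cost`, `Summary`, and `update_default` are hypothetical, and the final time storage unit bookkeeping of step S405 is not modeled.

```python
from dataclasses import dataclass, field

Interval = tuple[int, int]  # inclusive (start, end); (t, t) is a single time point


def domain_cost(domain: list[Interval]) -> int:
    """Numbers needed to store a domain: 2 per interval, 1 per single point (assumed encoding)."""
    return sum(1 if s == e else 2 for s, e in domain)


@dataclass
class Summary:
    default_id: str  # ID of the default expression, whose domain is not stored
    domains: dict[str, list[Interval]] = field(default_factory=dict)  # stored domains of the others


def update_default(summary: Summary, default_domain: list[Interval]) -> bool:
    """Promote the expression whose domain is most costly to store (steps S403-S405).

    default_domain is the domain the current default expression would need,
    i.e. what unit 40 computes. Returns True if the default expression changed.
    """
    # Default listed first so that, on a tie, it stays the default (no saving from a swap)
    candidates = {summary.default_id: default_domain, **summary.domains}
    # S403: expression whose domain needs the most storage
    best = max(candidates, key=lambda k: domain_cost(candidates[k]))
    # S404: nothing to do if the current default is already the most costly
    if best == summary.default_id:
        return False
    # S405: the old default gets an explicit domain; the new default drops its domain
    summary.domains[summary.default_id] = default_domain
    del summary.domains[best]
    summary.default_id = best
    return True
```

Listing the default expression first in `candidates` means that, when two domains cost the same, no swap happens, because swapping would not save any storage.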
The data summarization system of the fourth embodiment compares the effective domains of all approximate expressions, including the default expression, and makes the approximate expression whose effective domain requires the largest storage capacity the new default expression. Because no effective domain is stored for the default expression, updating the default expression in this way reduces the storage capacity needed for effective domains, so the data summarization system of the fourth embodiment can summarize data more efficiently.
For example, assume that data is input as illustrated in FIG. 26 and that $x = f_0(t)$ is the default expression. In this case, as shown in FIG. 27, storing the effective domains of $x = f_1(t)$ and $x = f_2(t)$ requires a storage capacity of 10 numerical values. If the data summarization system of the fourth embodiment makes $x = f_1(t)$ the default expression and stores the former default expression $x = f_0(t)$ together with its effective domain, it needs to store only $[T_1+1, T_2] \cup [T_4+1, T_5]$ (four values) instead of $[T_0, T_1] \cup [T_2+1, T_3] \cup [T_5+1, T_6]$ (six values). In this example, the data summarization system of the fourth embodiment therefore reduces the capacity required for storing effective domains by two numerical values.
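The arithmetic of this example can be checked with a few lines of Python. The sketch assumes, as the interval notation suggests, that each closed interval costs two stored values; endpoints are kept as symbolic strings because only their count matters. The helper `domain_cost` is a hypothetical name.

```python
def domain_cost(domain):
    """Count stored values: 2 per closed interval [a, b], 1 per single time point."""
    return sum(1 if a == b else 2 for a, b in domain)


# Endpoints are symbolic; only their number matters for the storage cost.
f1_domain = [("T0", "T1"), ("T2+1", "T3"), ("T5+1", "T6")]  # stored while f0 is the default
f0_domain = [("T1+1", "T2"), ("T4+1", "T5")]                # stored once f1 becomes the default

saving = domain_cost(f1_domain) - domain_cost(f0_domain)
print(domain_cost(f1_domain), domain_cost(f0_domain), saving)  # prints: 6 4 2
```

Since the domain of $f_2$ is unaffected by the swap, the total stored for effective domains falls from 10 to 8 values.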
As in the third embodiment, the data summarization system of the fourth embodiment may include an unsummarized data storage unit 30, an available storage area monitoring unit 31, and a summary control unit 32, so that data is stored as is while ample storage resources are available and is summarized once those resources run low.
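The store-then-summarize behavior can be sketched as a bounded buffer that hands data to the summarizer one item at a time once free space drops below a threshold. This is a hypothetical illustration: `SummaryController`, `capacity`, and `low_watermark` are invented names, resources are assumed to be measured in buffered items, and the `summarize` callback stands in for the approximate value calculation and new approximate expression generation path.

```python
from collections import deque
from typing import Callable

Datum = tuple[int, float]  # (occurrence time, data value)


class SummaryController:
    """Hypothetical sketch of units 30-32: buffer raw data, summarize only when space runs low."""

    def __init__(self, capacity: int, low_watermark: int, summarize: Callable[[Datum], None]):
        self.buffer: deque = deque()         # stands in for the unsummarized data storage unit 30
        self.capacity = capacity             # assumed: resources measured in buffered items
        self.low_watermark = low_watermark   # the "predetermined amount" of free resources
        self.summarize = summarize           # entry point of the summarization pipeline

    def free(self) -> int:
        # Role of the available storage area monitoring unit 31
        return self.capacity - len(self.buffer)

    def add(self, datum: Datum) -> None:
        self.buffer.append(datum)
        # Role of the summary control unit 32: hand over data one item at a time, oldest first
        while self.free() < self.low_watermark and self.buffer:
            self.summarize(self.buffer.popleft())
```

Handing over the oldest items first keeps the summarizer's input in occurrence-time order, which the sequential processing of the earlier embodiments relies on.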
In each of the above embodiments, the data value is a numerical value. However, the data value need not be numerical itself; it may be any value that can be converted into a numerical value such that differences between the converted values can be computed. For example, text information can serve as a data value if a rule for converting it into numerical values is defined. When sequentially generated data includes such text information and an occurrence time, the data input unit 10, for example, may convert the text information into a numerical value. The subsequent processing is the same as in the above embodiments.
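A conversion rule for text data values might look like the sketch below. The severity-level mapping is purely an assumed example of such a rule; the text does not specify one, and `SEVERITY_TO_NUMBER` and `to_numeric` are hypothetical names.

```python
# Assumed conversion rule: ordered severity levels mapped to numbers so that
# differences between converted values are meaningful.
SEVERITY_TO_NUMBER = {"DEBUG": 0.0, "INFO": 1.0, "WARNING": 2.0, "ERROR": 3.0, "CRITICAL": 4.0}


def to_numeric(record: tuple[int, str]) -> tuple[int, float]:
    """Convert (occurrence time, text value) into (occurrence time, numeric value)."""
    t, text = record
    if text not in SEVERITY_TO_NUMBER:
        raise ValueError(f"no conversion rule defined for {text!r}")
    return t, SEVERITY_TO_NUMBER[text]
```

After this step the data has the same (time, number) shape as in the numeric embodiments, so the rest of the pipeline needs no change.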
In each of the above embodiments, the data value is a scalar, but it may instead be a vector. That is, the data may include a vector and an occurrence time. In this case, the new approximate expression generation unit 14 may generate an approximate expression that derives an approximate vector from a plurality of data items (unconfirmed points and the newly generated data). In step S106 (see FIG. 16), when the graph evaluation unit 17 or 17a compares the vector calculated as the approximate value with the vector actually included in the data, it may calculate the distance between the two vectors in the vector space. In step S107, the graph evaluation unit 17 or 17a may then determine whether there is an approximate expression for which this distance is at most the threshold ε (or less than ε).
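For vector data values, the comparison in steps S106 and S107 can be sketched with the Euclidean distance. The choice of Euclidean distance is an assumption (the text only says "distance in the vector space"), and `fitting_expressions` with its `strict` flag, which switches between "at most ε" and "less than ε", is a hypothetical name.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def fitting_expressions(expressions: dict[str, Callable[[int], Vector]],
                        t: int, value: Vector, eps: float,
                        strict: bool = False) -> list[str]:
    """IDs of expressions whose approximate vector at time t lies within eps of value."""
    hits = []
    for expr_id, f in expressions.items():
        d = math.dist(f(t), value)  # Euclidean distance; ValueError on dimension mismatch
        if d < eps or (not strict and d == eps):
            hits.append(expr_id)
    return hits
```

Any other metric (for example, the maximum absolute component difference) could be substituted without changing the surrounding logic.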
Although embodiments for carrying out the present invention have been described above, the present invention is not limited to these embodiments. Various additions and modifications that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.
Next, the minimum configuration of the present invention will be described. FIG. 29 is a block diagram showing the minimum configuration of the present invention. The data summarization system of the present invention includes an approximate value calculation unit 61, an approximate expression evaluation unit 62, an unconfirmed data storage unit 63, a new approximate expression generation unit 64, and an update unit 65.
The approximate value calculation unit 61 (for example, the new data generation time substitution unit 12) operates on approximate expressions, each of which calculates an approximate value of the data value in data consisting of a data value and the occurrence time of that data value, takes the occurrence time as its variable, and has an effective domain for that variable defined as a time interval or a set of single time points. By substituting the occurrence time included in new data into each approximate expression, the approximate value calculation unit 61 calculates, for each approximate expression, an approximate value of the new data's data value.
Based on the approximate value calculated for each approximate expression and the data value of the new data, the approximate expression evaluation unit 62 (for example, the graph evaluation unit 17) either selects an approximate expression suitable for approximating the data value of the new data or determines that no such approximate expression exists.
The unconfirmed data storage unit 63 (for example, the unconfirmed point storage unit 13) stores new data for which no suitable approximate expression was found as approximate-expression-unconfirmed data (for example, unconfirmed points).
When new data is input, the new approximate expression generation unit 64 (for example, the new approximate expression generation unit 14) determines whether a new approximate expression can be generated from the new data and the approximate-expression-unconfirmed data. If so, it generates the new approximate expression and defines a time interval or a set of time points as that expression's effective domain.
When the approximate expression evaluation unit 62 selects an approximate expression suitable for approximating the data value of the new data, the update unit 65 (for example, the graph update unit 18) updates that approximate expression's effective domain so that it includes the occurrence time of the new data.
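The interplay of units 61 to 65 can be sketched as a single per-datum routine. This is a minimal sketch under assumptions that the text does not fix: expressions are linear ($x = at + b$), occurrence times are increasing integers, the first expression within ε is selected (the refined selection rules of items (2) and (3) are omitted), and a new expression is generated by fitting a line through the most recent unconfirmed point and the new datum. `LinearExpr`, `Summarizer`, and their methods are hypothetical names.

```python
from dataclasses import dataclass, field

Interval = tuple[int, int]  # inclusive (start, end); (t, t) is a single time point


@dataclass
class LinearExpr:
    """Hypothetical approximate expression x = a * t + b with its effective domain."""
    a: float
    b: float
    domain: list[Interval] = field(default_factory=list)

    def __call__(self, t: int) -> float:
        return self.a * t + self.b

    def add_time(self, t: int) -> None:
        """Update step (unit 65): extend the last interval if t is adjacent, else add a point."""
        if self.domain and self.domain[-1][1] == t - 1:
            self.domain[-1] = (self.domain[-1][0], t)
        else:
            self.domain.append((t, t))


@dataclass
class Summarizer:
    eps: float  # accuracy threshold
    expressions: list[LinearExpr] = field(default_factory=list)
    unconfirmed: list[tuple[int, float]] = field(default_factory=list)  # role of unit 63

    def add(self, t: int, x: float) -> None:
        # Units 61 and 62: substitute t into each expression; take the first one within eps
        for f in self.expressions:
            if abs(f(t) - x) <= self.eps:
                f.add_time(t)  # unit 65
                return
        # Unit 64 (assumed rule): line through the latest unconfirmed point and the new datum
        if self.unconfirmed and self.unconfirmed[-1][0] != t:
            t0, x0 = self.unconfirmed.pop()
            a = (x - x0) / (t - t0)
            f = LinearExpr(a, x0 - a * t0)
            f.add_time(t0)
            f.add_time(t)
            self.expressions.append(f)
        else:
            # Unit 63: no expression fits and none can be generated yet
            self.unconfirmed.append((t, x))
```

For example, feeding points on $x = 2t + 1$ for $t = 0, \dots, 4$, an outlier at $t = 5$, and a point back on the line at $t = 6$ yields one expression with effective domain $[0, 4] \cup \{6\}$, while the outlier remains unconfirmed.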
A data summarization system with the above configuration stores the data in the form of approximate expressions and their effective domains, and defines each approximate expression's effective domain as a time interval or a set of time points. Storing the approximate expressions and their effective domains therefore requires little storage capacity, so the data summarization system can summarize (compress) data efficiently. This advantage is especially pronounced for data that is generated sequentially with a certain tendency but may occasionally change greatly and irregularly.
In each of the above embodiments, a data summarization system having the following configuration is described.
(1) A data summarization system comprising: an approximate value calculation unit (for example, the new data generation time substitution unit 12) that, for each approximate expression, where each approximate expression calculates an approximate value of the data value in data including a data value and the occurrence time of that data value, takes the occurrence time as its variable, and has an effective domain for that variable defined as a time interval or a set of single time points, calculates an approximate value of the data value of new data by substituting the occurrence time included in the new data; an approximate expression evaluation unit (for example, the graph evaluation unit 17) that, based on the approximate value calculated for each approximate expression and the data value of the new data, either selects an approximate expression suitable for approximating the data value of the new data or determines that no such approximate expression exists; an unconfirmed data storage unit (for example, the unconfirmed point storage unit 13) that stores new data for which no suitable approximate expression exists as approximate-expression-unconfirmed data (for example, unconfirmed points); a new approximate expression generation unit (for example, the new approximate expression generation unit 14) that, when new data is input, determines whether a new approximate expression can be generated from the new data and the approximate-expression-unconfirmed data and, if so, generates the new approximate expression and defines a time interval or a set of time points as its effective domain; and an update unit (for example, the graph update unit 18) that, when the approximate expression evaluation unit selects an approximate expression suitable for approximating the data value of the new data, updates the effective domain of that approximate expression so that it includes the occurrence time of the new data.
(2) A data summarization system in which the approximate expression evaluation unit identifies the approximate expressions whose approximate values satisfy a predetermined criterion with respect to the data value of the new data (for example, the criterion held by the accuracy constraint input unit 16). If exactly one approximate expression satisfies the criterion, the unit selects it. If several do, it selects, from among them, the one whose effective domain, once updated to include the occurrence time of the new data, requires the smallest increase in storage capacity. If none does, it determines that no approximate expression is suitable for approximating the data value of the new data.
(3) A data summarization system further comprising a default expression storage unit (for example, the default expression storage unit 21) that stores a default expression, which is an approximate expression for calculating an approximate value of the data value for which no effective domain is defined. The approximate value calculation unit calculates, for each approximate expression including the default expression, an approximate value of the data value of the new data by substituting the occurrence time included in the new data. The approximate expression evaluation unit identifies, among the approximate expressions including the default expression, those whose approximate values satisfy the predetermined criterion with respect to the data value of the new data. If exactly one satisfies the criterion, it selects that expression. If several satisfy it and the default expression is among them, it selects the default expression. If several satisfy it and the default expression is not among them, it selects the one whose effective domain, once updated to include the occurrence time of the new data, requires the smallest increase in storage capacity. If none satisfies the criterion, it determines that no approximate expression is suitable for approximating the data value of the new data.
(4) A data summarization system comprising: a default expression effective domain calculation unit (for example, the default expression effective domain calculation unit 40) that calculates the effective domain of the default expression; an effective domain capacity evaluation unit (for example, the effective domain capacity evaluation unit 41) that identifies, among the approximate expressions including the default expression, the approximate expression whose effective domain requires the largest storage capacity; and a default expression update unit (for example, the effective domain update unit 42) that, when the identified approximate expression is not the default expression, stores it in the default expression storage unit as the new default expression and removes that approximate expression and its effective domain from the stored approximate expressions.
(5) A data summarization system comprising: a new data storage unit (for example, the unsummarized data storage unit 30) that stores input new data; a monitoring unit (for example, the available storage area monitoring unit 31) that monitors the resources available in the new data storage unit for storing new data; and a summary control unit (for example, the summary control unit 32) that, when those resources fall below a predetermined amount, outputs the new data stored in the new data storage unit one item at a time to the approximate value calculation unit and the new approximate expression generation unit.
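The selection rules of items (2) and (3) can be sketched together. This is a minimal illustration, assuming integer occurrence times arriving in increasing order, effective domains stored as inclusive intervals with a single point written as (t, t), and a cost of two stored values per interval and one per point. The function names `cost`, `with_time`, and `select` are hypothetical, and the candidates passed to `select` are assumed to have already passed the accuracy criterion (for example, an error of at most ε).

```python
from typing import Optional

Interval = tuple[int, int]  # inclusive (start, end); (t, t) is a single time point


def cost(domain: list[Interval]) -> int:
    """Stored values: 2 per interval, 1 per single time point (assumed encoding)."""
    return sum(1 if s == e else 2 for s, e in domain)


def with_time(domain: list[Interval], t: int) -> list[Interval]:
    """Domain after adding time t; t is assumed later than every time already stored."""
    if domain and domain[-1][1] == t - 1:
        return domain[:-1] + [(domain[-1][0], t)]  # extend the last interval or point
    return domain + [(t, t)]                        # start a new single point


def select(fitting: dict[str, list[Interval]], t: int,
           default_id: Optional[str] = None) -> Optional[str]:
    """Choose among the expressions that already meet the accuracy criterion.

    fitting maps expression IDs to effective domains; if the default expression
    fits, it appears under default_id (its domain is never stored).
    """
    if not fitting:
        return None  # no suitable expression: the new datum becomes unconfirmed
    if default_id in fitting:
        return default_id  # item (3): the default expression costs no storage
    # Item (2): smallest increase in stored values after adding t
    return min(fitting, key=lambda k: cost(with_time(fitting[k], t)) - cost(fitting[k]))
```

For example, at $t = 5$ a domain $[0, 4]$ grows at no cost (its end simply moves to 5), whereas $[0, 3]$ needs an extra single point, so the expression owning $[0, 4]$ is preferred.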
In each of the above embodiments, the following data structure is described.
(1) A data structure in which an approximate expression, which calculates an approximate value of a data value when a variable is substituted into it, is associated with an effective domain, that is, the range of the variable over which the expression yields the approximate value of the data value, the effective domain being represented by intervals of the variable or by a set of points each representing a single value of the variable.
(2) A data structure in which an interval or point of the effective domain associated with another approximate expression may lie between two intervals, between two points, or between an interval and a point of the effective domain associated with one approximate expression.
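Reading a value back out of this data structure amounts to finding which expression's effective domain contains a given time. The sketch below assumes non-overlapping inclusive integer intervals, points stored as (t, t), and a default expression (as in the third and fourth embodiments) covering every time outside the stored domains; `build_index` and `approx_at` are hypothetical names.

```python
import bisect
from typing import Callable

Interval = tuple[int, int]  # inclusive (start, end); (t, t) is a single time point


def build_index(domains: dict[str, list[Interval]]):
    """Flatten all effective domains into (starts, entries) sorted by start time."""
    entries = sorted((s, e, expr_id) for expr_id, dom in domains.items() for s, e in dom)
    return [s for s, _, _ in entries], entries


def approx_at(t: int, index, exprs: dict[str, Callable[[int], float]], default_id: str) -> float:
    """Approximate value at time t: use the expression whose domain contains t, else the default."""
    starts, entries = index
    i = bisect.bisect_right(starts, t) - 1  # last interval starting at or before t
    if i >= 0 and entries[i][0] <= t <= entries[i][1]:
        return exprs[entries[i][2]](t)
    return exprs[default_id](t)
```

With domains such as `{"f1": [(0, 4), (10, 12)], "f2": [(5, 5)]}`, the single point of `f2` sits between two intervals of `f1`, which is the interleaving that structure (2) permits; the time 7 falls outside both and is served by the default expression.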
This application claims priority based on Japanese Patent Application No. 2009-205012, filed on Sep. 4, 2009, the entire disclosure of which is incorporated herein.
 Each embodiment of the present invention can be suitably applied to a data summarization apparatus that summarizes data such as data that is generated sequentially with a certain tendency but may occasionally change greatly and irregularly.
 DESCRIPTION OF SYMBOLS
 10 Data input unit
 11 Final time storage unit
 12 New data generation time substitution unit
 13 Unconfirmed point storage unit
 14 New approximate expression generation unit
 15 Approximate expression storage unit
 16 Accuracy constraint input unit
 17, 17a Graph evaluation unit
 18, 18a Graph update unit
 19 Confirmed graph storage unit
 20 Default expression input unit
 21 Default expression storage unit
 30 Unsummarized data storage unit
 31 Available storage area monitoring unit
 32 Summary control unit
 40 Default expression effective domain calculation unit
 41 Effective domain capacity evaluation unit
 42 Effective domain update unit
 61 Approximate value calculation unit
 62 Approximate expression evaluation unit
 63 Unconfirmed data storage unit
 64 New approximate expression generation unit
 65 Update unit

Claims (13)

  1.  A data summarization system comprising:
     an approximate value calculation unit that, for each approximate expression that calculates an approximate value of the data value in data including a data value and the occurrence time of the data value, takes the occurrence time as its variable, and has an effective domain for the variable defined as a time interval or a set of single time points, calculates an approximate value of the data value of new data by substituting the occurrence time included in the new data;
     an approximate expression evaluation unit that, based on the approximate value calculated for each approximate expression and the data value of the new data, selects an approximate expression suitable for calculating an approximate value of the data value of the new data, or determines that no such approximate expression exists;
     an unconfirmed data storage unit that stores new data for which it has been determined that no approximate expression is suitable for calculating an approximate value of its data value, as approximate-expression-unconfirmed data;
     a new approximate expression generation unit that, when new data is input, determines whether a new approximate expression can be generated from the new data and the approximate-expression-unconfirmed data and, if so, generates the new approximate expression and defines a time interval or a set of time points as the effective domain of the approximate expression; and
     an update unit that, when the approximate expression evaluation unit selects an approximate expression suitable for calculating an approximate value of the data value of the new data, updates the effective domain of that approximate expression so as to include the occurrence time of the new data.
  2.  The data summarization system according to claim 1, wherein the approximate expression evaluation unit:
     identifies the approximate expressions whose approximate value satisfies a predetermined criterion with respect to the data value of the new data;
     if exactly one approximate expression satisfies the criterion, selects that approximate expression;
     if a plurality of approximate expressions satisfy the criterion, selects, from among them, the approximate expression that minimizes the increase in storage capacity required to store the effective domain when the effective domain is updated to include the occurrence time of the new data; and
     if no approximate expression satisfies the criterion, determines that no approximate expression is suitable for calculating an approximate value of the data value of the new data.
  3.  The data summarization system according to claim 2, further comprising a default expression storage unit that stores a default expression, which is an approximate expression for calculating an approximate value of the data value for which no effective domain is defined, wherein:
     the approximate value calculation unit calculates, for each approximate expression including the default expression, an approximate value of the data value of the new data by substituting the occurrence time included in the new data; and
     the approximate expression evaluation unit identifies, from among the approximate expressions including the default expression, the approximate expressions whose approximate value satisfies the predetermined criterion with respect to the data value of the new data;
     if exactly one approximate expression satisfies the criterion, selects that approximate expression;
     if a plurality of approximate expressions satisfy the criterion and the default expression is among them, selects the default expression;
     if a plurality of approximate expressions satisfy the criterion and the default expression is not among them, selects, from among them, the approximate expression that minimizes the increase in storage capacity required to store the effective domain when the effective domain is updated to include the occurrence time of the new data; and
     if no approximate expression satisfies the criterion, determines that no approximate expression is suitable for calculating an approximate value of the data value of the new data.
  4.  The data summarization system according to claim 3, further comprising:
     a default expression effective domain calculation unit that calculates the effective domain of the default expression;
     an effective domain capacity evaluation unit that identifies, from among the approximate expressions including the default expression, the approximate expression whose effective domain requires the largest storage capacity; and
     a default expression update unit that, when the approximate expression whose effective domain requires the largest storage capacity is not the default expression, stores that approximate expression in the default expression storage unit as a new default expression and excludes that approximate expression and its effective domain.
  5.  The data summarization system according to any one of claims 1 to 4, further comprising:
     a new data storage unit that stores input new data;
     a monitoring unit that monitors the resources of the new data storage unit available for storing new data; and
     a summary control unit that, when the resources available for storing new data fall below a predetermined amount, outputs the new data stored in the new data storage unit one item at a time to the approximate value calculation unit and the new approximate expression generation unit.
  6.  A data summarization method comprising:
     for each approximate expression that calculates an approximate value of the data value in data including a data value and the occurrence time of the data value, takes the occurrence time as its variable, and has an effective domain for the variable defined as a time interval or a set of single time points, calculating an approximate value of the data value of new data by substituting the occurrence time included in the new data;
     based on the approximate value calculated for each approximate expression and the data value of the new data, selecting an approximate expression suitable for calculating an approximate value of the data value of the new data, or determining that no such approximate expression exists;
     storing new data for which it has been determined that no approximate expression is suitable for calculating an approximate value of its data value, as approximate-expression-unconfirmed data;
     when new data is input, determining whether a new approximate expression can be generated from the new data and the approximate-expression-unconfirmed data and, if so, generating the new approximate expression and defining a time interval or a set of time points as the effective domain of the approximate expression; and
     when an approximate expression suitable for calculating an approximate value of the data value of the new data is selected, updating the effective domain of that approximate expression so as to include the occurrence time of the new data.
  7.  The data summarization method according to claim 6, wherein selecting an approximate expression suitable for calculating an approximate value of the data value of the new data comprises:
     identifying the approximate expressions whose approximate value satisfies a predetermined criterion with respect to the data value of the new data;
     if exactly one approximate expression satisfies the criterion, selecting that approximate expression;
     if a plurality of approximate expressions satisfy the criterion, selecting, from among them, the approximate expression that minimizes the increase in storage capacity required to store the effective domain when the effective domain is updated to include the occurrence time of the new data; and
     if no approximate expression satisfies the criterion, determining that no approximate expression is suitable for calculating an approximate value of the data value of the new data.
  8.  The data summarization method according to claim 7, further comprising:
     storing a default expression, which is an approximate expression for calculating an approximate value of the data value for which no effective domain is defined;
     calculating, for each approximate expression including the default expression, an approximate value of the data value of the new data by substituting the occurrence time included in the new data; and
     when selecting an approximate expression suitable for calculating an approximate value of the data value of the new data:
     identifying, from among the approximate expressions including the default expression, the approximate expressions whose approximate value satisfies the predetermined criterion with respect to the data value of the new data;
     if exactly one approximate expression satisfies the criterion, selecting that approximate expression;
     if a plurality of approximate expressions satisfy the criterion and the default expression is among them, selecting the default expression;
     if a plurality of approximate expressions satisfy the criterion and the default expression is not among them, selecting, from among them, the approximate expression that minimizes the increase in storage capacity required to store the effective domain when the effective domain is updated to include the occurrence time of the new data; and
     if no approximate expression satisfies the criterion, determining that no approximate expression is suitable for calculating an approximate value of the data value of the new data.
  9.  The data summarization method according to claim 8, further comprising:
     calculating the effective domain of the default expression;
     identifying, from among the approximate expressions including the default expression, the approximate expression whose effective domain requires the largest storage capacity; and
     when the approximate expression whose effective domain requires the largest storage capacity is not the default expression, storing that approximate expression as a new default expression and excluding that approximate expression and its effective domain.
  10.  The data summarization method according to any one of claims 6 to 9, further comprising:
     storing input new data;
     monitoring the resources available for storing new data; and
     when the resources available for storing new data fall below a predetermined amount, taking the stored new data, one item at a time, as the target of the approximate value calculation and of the determination of whether a new approximate expression can be generated.
  11.  A recording medium storing a data summarization program for causing a computer to execute:
     an approximate value calculation process of calculating, for each approximate expression, an approximate value of the data value of new data by substituting the occurrence time included in the new data into that approximate expression, where each approximate expression calculates an approximate value of a data value in data comprising the data value and the occurrence time of that data value, takes the occurrence time as its variable, and has an effective domain of the variable defined as a time interval or a set of single time points;
     an approximate expression evaluation process of either selecting, based on the approximate value calculated for each approximate expression and the data value of the new data, an approximate expression suitable for calculating the approximate value of the data value of the new data, or determining that no such approximate expression exists;
     an unconfirmed data storage process of storing, in an unconfirmed data storage unit and as approximate-expression-unconfirmed data, new data for which it has been determined that no approximate expression is suitable for calculating the approximate value of its data value;
     a new approximate expression generation process of determining, when new data is input, whether a new approximate expression can be generated from the new data and the approximate-expression-unconfirmed data, and, if it can, generating the new approximate expression and defining a time interval or a set of time points as its effective domain; and
     an update process of updating, when the approximate expression evaluation process selects an approximate expression suitable for calculating the approximate value of the data value of the new data, the effective domain of that approximate expression so that it includes the occurrence time of the new data.
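The five processes of claim 11 fit together roughly as in the following Python sketch. The claims specify neither the form of the approximate expressions nor how a new one is generated, so the sketch assumes:

- Expressions are straight lines v = a * t + b, fitted by least squares to the `min_points` most recent unconfirmed records.
- The fit criterion is an absolute-error tolerance.
- When several expressions fit, the closest one is selected.
- Data arrives in increasing time order.

`Line`, `fit_line`, `include` and `Summarizer` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional

Span = tuple[int, int]  # (start, end); start == end is a single time point


def include(domain: list[Span], t: int) -> None:
    """Add time t, assuming data arrives in increasing time order."""
    if domain and domain[-1][1] + 1 == t:
        domain[-1] = (domain[-1][0], t)   # extend the current interval
    else:
        domain.append((t, t))             # start a new single-point entry


@dataclass
class Line:
    """Assumed expression form: v ~ a * t + b, plus its effective domain."""
    a: float
    b: float
    domain: list[Span] = field(default_factory=list)

    def __call__(self, t: int) -> float:
        return self.a * t + self.b


def fit_line(points: list[tuple[int, float]]) -> Line:
    """Ordinary least-squares line through (t, v) points."""
    n = len(points)
    mt = sum(t for t, _ in points) / n
    mv = sum(v for _, v in points) / n
    stt = sum((t - mt) ** 2 for t, _ in points)
    stv = sum((t - mt) * (v - mv) for t, v in points)
    a = stv / stt if stt else 0.0
    return Line(a, mv - a * mt)


class Summarizer:
    def __init__(self, tol: float, min_points: int = 3):
        self.tol = tol                # assumed fit criterion |f(t) - v| <= tol
        self.min_points = min_points  # assumed minimum to generate a Line
        self.exprs: list[Line] = []
        self.unconfirmed: list[tuple[int, float]] = []

    def add(self, t: int, v: float) -> None:
        # Approximate value calculation and evaluation, per expression.
        fits = [(abs(e(t) - v), i) for i, e in enumerate(self.exprs)]
        fits = [(err, i) for err, i in fits if err <= self.tol]
        if fits:
            include(self.exprs[min(fits)[1]].domain, t)   # update process
            return
        self.unconfirmed.append((t, v))   # unconfirmed data storage
        self._try_generate()

    def _try_generate(self) -> None:
        # New expression generation from the latest unconfirmed records.
        pts = self.unconfirmed[-self.min_points:]
        if len(pts) < self.min_points:
            return
        line = fit_line(pts)
        if all(abs(line(t) - v) <= self.tol for t, v in pts):
            for t, _ in pts:
                include(line.domain, t)
            self.exprs.append(line)
            del self.unconfirmed[-self.min_points:]

    def estimate(self, t: int) -> Optional[float]:
        """Reconstruct a value from the summary (raw value if unconfirmed)."""
        for e in self.exprs:
            if any(s <= t <= end for s, end in e.domain):
                return e(t)
        return dict(self.unconfirmed).get(t)


s = Summarizer(tol=0.1)
for t in range(6):
    s.add(t, 2.0 * t)
for t in range(6, 9):
    s.add(t, 100.0)
print([(e.a, e.b, e.domain) for e in s.exprs])
```

Six samples on the line v = 2t produce one expression covering the interval (0, 5). The jump to 100.0 at t = 6 leaves three unconfirmed records, from which a second, constant expression covering (6, 8) is generated. `estimate` then rebuilds values from the summary alone.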
  12.  A data structure in which
     an approximate expression for calculating an approximate value of a data value by substituting a variable
     is associated with an effective domain, which is the domain of the variable over which the approximate value of the data value can be obtained,
     the effective domain being represented by intervals of the variable or by a set of points each representing a single variable value.
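The data structure of claim 12 might be modeled as below: an expression paired with an effective domain. The domain is stored as sorted, non-overlapping closed intervals, where a single point x is the degenerate interval (x, x).

The sorted-interval representation and the lookup by binary search with the standard `bisect` module are implementation choices assumed for this sketch. `EffectiveDomain` and `ApproximateExpression` are hypothetical names.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class EffectiveDomain:
    """Sorted, non-overlapping closed intervals; a single point x is (x, x)."""
    spans: tuple[tuple[float, float], ...]

    def contains(self, x: float) -> bool:
        starts = [s for s, _ in self.spans]
        i = bisect_right(starts, x) - 1   # last span starting at or before x
        return i >= 0 and x <= self.spans[i][1]


@dataclass(frozen=True)
class ApproximateExpression:
    """An expression paired with the domain where its output is valid."""
    f: Callable[[float], float]
    domain: EffectiveDomain

    def approximate(self, x: float) -> Optional[float]:
        # Outside the effective domain no approximate value is defined.
        return self.f(x) if self.domain.contains(x) else None


expr = ApproximateExpression(
    f=lambda x: 3.0 * x + 1.0,
    domain=EffectiveDomain(((0.0, 10.0), (15.0, 15.0))),  # interval + point
)
print(expr.approximate(2.0), expr.approximate(12.0))  # prints 7.0 None
```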
  13.  The data structure according to claim 12, wherein an interval or point of the effective domain associated with another approximate expression is allowed to lie between two intervals, between two points, or between an interval and a point of the effective domain associated with one approximate expression.
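Claim 13 allows one expression's intervals and points to be interleaved with those of other expressions. The Python sketch below shows how such a summary could be read back. It assumes integer times and domains that do not overlap; the claim only states that interleaving is allowed. `Summary` and `reconstruct` are hypothetical names.

```python
from typing import Callable, Optional

Span = tuple[int, int]  # (start, end); start == end is a single point
Summary = dict[str, tuple[Callable[[int], float], list[Span]]]


def reconstruct(summary: Summary, t: int) -> Optional[float]:
    """Evaluate whichever expression's effective domain covers time t."""
    for f, spans in summary.values():
        if any(s <= t <= e for s, e in spans):
            return f(t)
    return None


# "ramp" owns two intervals; points owned by other expressions sit between
# them, which is the interleaving that claim 13 permits.
summary: Summary = {
    "ramp": (lambda t: float(t), [(0, 3), (7, 9)]),
    "spike": (lambda t: 50.0, [(4, 4), (6, 6)]),
    "flat": (lambda t: 5.0, [(5, 5)]),
}
print([reconstruct(summary, t) for t in range(10)])
# prints [0.0, 1.0, 2.0, 3.0, 50.0, 5.0, 50.0, 7.0, 8.0, 9.0]
```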

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
JP2011529886A (JPWO2011027714A1) | 2009-09-04 | 2010-08-20 | Data summarization system, data summarization method and recording medium

Applications Claiming Priority (2)

Application Number | Priority Date
JP2009-205012 | 2009-09-04
JP2009205012 | 2009-09-04

Publications (1)

Publication Number | Publication Date
WO2011027714A1 | 2011-03-10

Family ID: 43649253

Family Applications (1)

Application Number | Priority Date | Filing Date | Title
PCT/JP2010/064538 (WO2011027714A1) | 2009-09-04 | 2010-08-20 | Data summarization system, data summarization method and storage medium

Country Status (2)

Country | Link
JP (1) | JPWO2011027714A1
WO (1) | WO2011027714A1

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113254261A * | 2020-02-07 | 2021-08-13 | 伊姆西Ip控股有限责任公司 (EMC IP Holding Company LLC) | Data backup method, electronic device and computer program product

Citations (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
JPH05159185A * | 1991-12-02 | 1993-06-25 | Toshiba Corp | Power generation plant monitoring data compression and preservation method
JPH06175664A * | 1992-12-10 | 1994-06-24 | Yamaha Corp | Musical sound waveform storing method and musical sound waveform generating device
JPH0765168A * | 1993-08-31 | 1995-03-10 | Hitachi Ltd | Device and method for function approximation
JP2000149001A * | 1998-11-17 | 2000-05-30 | Sony Corp | Digital image recorder
JP2002314922A * | 2001-04-10 | 2002-10-25 | Asahi Optical Co Ltd | Image data recording device, image data recording program, computer recording medium storing the same, and image inputting equipment provided therewith
JP2002351860A * | 2002-03-11 | 2002-12-06 | Shiraishi Kenji | Information arithmetic device
WO2005039058A1 * | 2003-10-17 | 2005-04-28 | Matsushita Electric Industrial Co., Ltd. | Encoding data generation method and device

Also Published As

Publication number | Publication date
JPWO2011027714A1 | 2013-02-04

Similar Documents

Publication | Title
US20140089945A1 | Adaptive tree structure for visualizing data
JP5928091B2 | Tag group classification method, apparatus, and data mashup method, apparatus
JPWO2017188419A1 | Computer resource management device, computer resource management method, and program
Choi et al. | Scheduling algorithms to minimize the number of tardy jobs in two-stage hybrid flow shops
You | Performance of synthetic double sampling chart with estimated parameters based on expected average run length
JP6366033B2 | Optimization method of IF statement in program
JPWO2013038473A1 | Stream data abnormality detection method and apparatus
JP2008158806A | Processor program with multiple processor elements, and method and device for generating the program
WO2011027714A1 | Data summarization system, data summarization method and storage medium
JP2019053448A | Information processing apparatus, information processing method, and information processing program
US10089151B2 | Apparatus, method, and program medium for parallel-processing parameter determination
US11004007B2 | Predictor management system, predictor management method, and predictor management program
CN115185456A | Cluster capacity shrinkage risk prompting method, device, equipment and medium
AU2020462915B2 | Information processing system for assisting in solving allocation problems, and method
JP6733656B2 | Information processing apparatus, information processing system, plant system, information processing method, and program
JP6213665B2 | Information processing apparatus and clustering method
JP2015108877A | Prediction time distribution generation device, control method, and program
WO2017109821A1 | Management system and management method for computer system
US20170185397A1 | Associated information generation device, associated information generation method, and recording medium storing associated information generation program
CN112906723A | Feature selection method and device
US20130254894A1 | Information processing device, non-transitory computer readable medium, and information processing method
US20220253364A1 | Method of calculating predicted exhaustion date and non-transitory computer-readable medium
JP7355375B2 | Input item display control system and input item display control method
JP7258253B1 | Normal model generation program, normal model generation device, and normal model generation method
JP7353539B2 | Steady range determination system, steady range determination method, and steady range determination program

Legal Events

Code | Title | Description
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 10813663; Country of ref document: EP; Kind code of ref document: A1
WWE | WIPO information: entry into national phase | Ref document number: 2011529886; Country of ref document: JP
NENP | Non-entry into the national phase | Ref country code: DE
122 | EP: PCT application non-entry in European phase | Ref document number: 10813663; Country of ref document: EP; Kind code of ref document: A1