WO2014141344A1 - Data prediction apparatus
- Publication number
- WO2014141344A1 (application PCT/JP2013/007424)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- state model
- model
- steady
- series data
- time
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
Definitions
- the present invention relates to a data prediction apparatus, and more particularly to a data prediction apparatus that predicts time-series data values.
- communication networks such as the Internet network and mobile packet network
- communication services are provided mainly on a best-effort basis, and communication throughput, i.e., the amount of data delivered (transmitted) per unit time, can vary greatly under the influence of cross traffic and radio wave conditions. For this reason, a service provider, for example, needs to predict the communication throughput and take countermeasures in advance, and communication throughput prediction apparatuses for this purpose have been developed.
- model parameters of a mathematical model are determined from past time series data, and a predicted value is calculated based on the mathematical model.
- as another communication throughput prediction apparatus, there is the one described in Non-Patent Document 1.
- a fluctuation process steady process / unsteady process
- a mixed model in which a steady process model and an unsteady process model are mixed is constructed based on the discrimination history.
- a probability distribution probability density function
- a probability spread probability spread
- the communication throughput of communication according to TCP/IP varies from moment to moment due to the complex interaction of various factors (for example, end-to-end delay, packet loss, cross traffic, and radio wave intensity in wireless communication).
- the communication throughput fluctuation process stationary process / non-stationary process
- the probability distribution probability density function
- each of the above prediction techniques uses a time series model described by a recurrence formula (difference equation) as its prediction model. A prediction model therefore cannot be constructed accurately unless the observed past communication throughput data points are equally spaced in time, and when the past time-series data is unequally spaced, future communication throughput cannot be predicted accurately. This problem arises not only in predicting communication throughput but in predicting the values of any time-series data.
- an object of the present invention is to solve the above-described problem that the value of time series data cannot be predicted with high accuracy.
- a data prediction apparatus that is one form of the present invention is configured to include:
- data observation means for observing time-series data values;
- model identification means for identifying, each with a stochastic differential equation model based on observed past time-series data, a steady-state model representing the time-series data when its fluctuation process is a stationary process and a non-steady-state model representing the time-series data when its fluctuation process is a non-stationary process;
- likelihood calculation means for calculating, based on the observed past time-series data, a likelihood, which is a value representing plausibility, for each of the steady-state model and the non-steady-state model;
- mixing ratio calculation means for calculating a mixing ratio of the steady-state model and the non-steady-state model based on the respective likelihoods of the two models; and
- probability distribution prediction means for predicting a probability distribution of time-series data based on a prediction model obtained by mixing the steady-state model and the non-steady-state model according to the mixing ratio.
- a program that is another form of the present invention causes an information processing device to realize:
- data observation means for observing time-series data values;
- model identification means for identifying, each with a stochastic differential equation model based on observed past time-series data, a steady-state model representing the time-series data when its fluctuation process is a stationary process and a non-steady-state model representing the time-series data when its fluctuation process is a non-stationary process;
- likelihood calculation means for calculating, based on the observed past time-series data, a likelihood, which is a value representing plausibility, for each of the steady-state model and the non-steady-state model;
- mixing ratio calculation means for calculating a mixing ratio of the steady-state model and the non-steady-state model based on the respective likelihoods of the two models; and
- probability distribution prediction means for predicting a probability distribution of time-series data based on a prediction model obtained by mixing the steady-state model and the non-steady-state model according to the mixing ratio.
- a data prediction method that is another form of the present invention includes: observing time-series data values; identifying, each with a stochastic differential equation model based on the observed past time-series data, a steady-state model representing the time-series data when its fluctuation process is a stationary process and a non-steady-state model representing the time-series data when its fluctuation process is a non-stationary process; calculating, based on the observed past time-series data, a likelihood, which is a value representing plausibility, for each of the two models; calculating a mixing ratio of the steady-state model and the non-steady-state model based on the respective likelihoods; and predicting a probability distribution of time-series data based on a prediction model obtained by mixing the two models according to the mixing ratio.
- the present invention is configured as described above, so that the value of time series data can be predicted with high accuracy.
- FIG. 1 is a functional block diagram showing the configuration of the data prediction apparatus.
- FIG. 2 is a graph showing information used in the data prediction apparatus.
- FIG. 3 is a schematic diagram showing a probability distribution of data to be predicted.
- FIG. 4 is a graph comparing the data prediction accuracy in the present embodiment with the data prediction accuracy in other techniques.
- the data prediction device 1 in the present invention is a general information processing device including an arithmetic device and a storage device.
- the data prediction apparatus 1 is built by installing a program in the arithmetic device and comprises a data observation unit 11, a stationary stochastic differential equation model identification unit 12, a non-stationary stochastic differential equation model identification unit 13, a likelihood calculation unit 14, a likelihood ratio test unit 15, a mixing ratio calculation unit 16, and a probability distribution prediction unit 17.
- the data observation unit 11 observes the target time series data {x_t}.
- the time series data is an observed data string of random variables that change with time.
- the time series data that is the target in the data prediction apparatus is not limited to the communication throughput, and may be any time series data.
- the time interval between adjacent data of the observed time series data need not be equal.
- in the data prediction apparatus according to the present invention, the time intervals between adjacent data may be unequal, as in the above example. This is because, as will be described later, the model of the data at a given time is identified as a stochastic differential equation model.
- based on the time series data observed by the data observation unit 11, the stationary stochastic differential equation model identification unit 12 (model identification means) identifies the stochastic differential equation model that represents the time series data when its variation process is a stationary process (the stationary stochastic differential equation model, i.e., the steady-state model).
- the stochastic differential equation model described by Equation 1 is used as the stochastic differential equation model representing time series data.
- Equation 1 is a stochastic differential equation model obtained by replacing the difference with a differential in the time series model described by the recurrence formula (difference equation) of Non-Patent Document 1. Letting the time interval of the time series model approach zero in this way yields a more accurate predicted value even when the observed time series data is unequally spaced.
- the stochastic differential equation model expressed by Equation 1 is a stationary process when the real constant a satisfies a > 0 and a non-stationary process when a ≤ 0.
- the stationary stochastic differential equation model identification unit 12 identifies the stationary stochastic differential equation model with a > 0 in Equation 1. This is equivalent to estimating a, b, and σ, the parameters of the stationary stochastic differential equation model of Equation 1.
- a method for identifying a stationary stochastic differential equation model will be described in detail.
- the stochastic differential equation model expressed by Equation 1 is a stochastic process called the Ornstein-Uhlenbeck process.
- when a, b, and σ are constants, it is called the Vasicek model, and a general solution is available.
- x_s is observed at time s
- the general solution for x_t at a subsequent time t (> s) is expressed by Equation 2.
- the conditional expected value and conditional variance of x_t at the subsequent time t (> s) are calculated by Equation 3 and Equation 4, respectively.
- since the Ornstein-Uhlenbeck process belongs to the class of Gaussian processes, the probability distribution at each time of the general solution expressed by Equation 2 is a Gaussian distribution. Therefore, if E [x t
- the stationary stochastic differential equation model identification unit 12 aims to estimate the model parameters a, b, and σ.
- a method for estimating the model parameters a, b, and σ using the maximum likelihood estimation method will be described.
- the likelihood function L is also a function of a, b, and σ.
- the a, b, and σ that maximize the likelihood function L are obtained.
- Equation 7 is obtained.
- Δt_i = t_i − t_{i−1}.
- maximizing the likelihood function L is equivalent to maximizing ln L, the logarithm of the likelihood function L. Since the first term on the right side of Equation 7 does not depend on a, b, or σ, it suffices to maximize the sum of the second and third terms.
- Equation 8 Equation 9
- the quasi-Newton method is used to calculate the a, b, and σ that minimize F + G.
- the specific quasi-Newton processing steps are as follows.
- ∇(F + G) is defined by Equation 11.
- (Step 2) The search step width is determined according to the Armijo condition, as shown in Steps 2.1 to 2.4 below.
- (Step 2.2) If the Armijo condition expressed by Equation 12 is satisfied, go to Step 2.4. Otherwise go to Step 2.3.
- (Step 3) θ is updated by Equation 13.
- (Step 4) End if the stop condition is satisfied. Otherwise go to Step 5.
- Equation 14 and Equation 15 serve as the stop conditions.
- Equations 16 and 17 are calculated.
- (Step 6) The matrix B_k is updated using Equation 18 (the BFGS formula).
- the Armijo condition is used to determine the step width of the search in Step 2, but the Wolfe condition may be used instead.
- instead of the BFGS formula for the matrix B_k, the H formula, which is based on the inverse matrix H_k of B_k, may be used.
- based on the time series data observed by the data observation unit 11, the non-stationary stochastic differential equation model identification unit 13 (model identification means) identifies the non-stationary stochastic differential equation model (non-steady-state model), i.e., the stochastic differential equation model that represents the time series data when its variation process is a non-stationary process. That is, the model parameters of the non-stationary stochastic differential equation model are estimated.
- the stochastic differential equation underlying the time-series data model is Equation 1, and this stochastic differential equation is non-stationary when a ≤ 0.
- the stochastic differential equation model of Equation 19 is equivalent to the Brownian motion model and has only one model parameter, σ. Identifying the non-stationary stochastic differential equation model therefore amounts to estimating σ.
- σ is estimated using the maximum likelihood estimation method.
- the general solution of the non-stationary stochastic differential equation model of Equation 19 is Equation 20.
- the conditional expectation, conditional variance, and conditional probability density function of x_t at time t (> s), after x_s is observed at time s, are as shown in Equations 21, 22, and 23.
- the σ that maximizes ln L, the logarithm of the likelihood function L of Equation 24, is calculated.
- this σ is obtained analytically and is given by Equation 25.
- the likelihood calculation unit 14 calculates, based on the observed time series data, the likelihood, a value representing plausibility, of each stochastic differential equation model identified by the stationary stochastic differential equation model identification unit 12 and the non-stationary stochastic differential equation model identification unit 13.
- the likelihood of the stationary stochastic differential equation model can be obtained by calculating based on Equation 6 and the likelihood of the non-stationary stochastic differential equation model based on Equation 24, respectively.
- the likelihood ratio test unit 15 (test means) performs a hypothesis test, based on the ratio between the likelihood of the stationary stochastic differential equation model and the likelihood of the non-stationary stochastic differential equation model calculated by the likelihood calculation unit 14, to determine whether the observed time series data fits the stationary stochastic differential equation model or the non-stationary stochastic differential equation model.
- the hypothesis that "the observed time series data is data generated from a non-stationary stochastic differential equation model" is taken as the null hypothesis.
- the alternative hypothesis is that the observed time series data is data generated from a stationary stochastic differential equation model.
- the test uses the statistic R (Equation 27), obtained by multiplying the logarithm of the likelihood ratio (Equation 26) by −2. Here L_s is the likelihood of the stationary stochastic differential equation model (Equation 6), and sup{L_s} is its supremum.
- L_n is the likelihood of the non-stationary stochastic differential equation model (Equation 24), and sup{L_n} is its supremum.
- the likelihoods calculated by the likelihood ratio test unit 15 may be used for sup{L_s} and sup{L_n}, respectively. This is because each is calculated with the model parameters that maximize the corresponding likelihood function (Equation 6 and Equation 24) and can therefore be regarded as the supremum.
- the supremum sup{L_s} of the likelihood of the stationary stochastic differential equation model is always greater than or equal to the supremum sup{L_n} of the likelihood of the non-stationary stochastic differential equation model (sup{L_s} ≥ sup{L_n}). This is because the stationary model has three model parameters (a, b, and σ), whereas the non-stationary model has only one (σ). Therefore, as in Equation 28, the statistic R is a non-negative real number.
- the more the likelihood sup{L_s} of the stationary stochastic differential equation model exceeds the likelihood sup{L_n} of the non-stationary stochastic differential equation model, the larger the statistic R becomes. Using this fact, if R exceeds a predetermined value, the null hypothesis is rejected and the alternative hypothesis (the hypothesis of the stationary stochastic differential equation model) is adopted. If R falls below the predetermined value, the null hypothesis is accepted without being rejected.
- the threshold for rejecting the null hypothesis is determined by the distribution of the statistic R when the null hypothesis is true (called the null distribution) and a predetermined significance level. Since the null distribution is difficult to obtain analytically, this embodiment uses a distribution obtained by Monte Carlo simulation.
- FIG. 2 shows a null distribution (cumulative distribution function) obtained by Monte Carlo simulation.
- the null distribution is obtained by repeating 3 million times a trial that generates 100 points of time series data under the null hypothesis (the non-stationary stochastic differential equation model) and calculates the statistic R.
- the null hypothesis can be rejected with R > 7.6 when the significance level is 0.1, R > 9.2 when the significance level is 0.05, and R > 12.8 when the significance level is 0.01.
- the likelihood ratio test unit 15 prepares in advance either the null distribution and the significance level, or a threshold value obtained from them (for example, a threshold of 7.6 for a significance level of 0.1). It then calculates the statistic R from the observed time-series data based on Equations 26 and 27 and, by comparing R with the threshold, either adopts the hypothesis that the model is the stationary stochastic differential equation model or accepts the hypothesis that it is the non-stationary stochastic differential equation model.
- the exponentially weighted moving average λ_t of u_t, given by Equation 30, is adopted as the mixing ratio.
- the smoothing coefficient of the exponentially weighted moving average takes a value between 0 and 1.
- the stationary stochastic differential equation model and the non-stationary stochastic differential equation model are mixed. From the definition of Equation 29, λ_t corresponds to the proportion of the non-stationary stochastic differential equation model.
- the probability distribution prediction unit 17 predicts the probability distribution of future data from the mixing ratio calculated by the mixing ratio calculation unit 16 and from the stationary stochastic differential equation model identified by the identification unit 12 and the non-stationary stochastic differential equation model identified by the identification unit 13, mixed according to that ratio.
- in Equation 31, the probability density function of the random variable in the stationary stochastic differential equation model, expressed by Equation 5, is written f(x_t), and that in the non-stationary stochastic differential equation model, expressed by Equation 23, is written g(x_t).
- Equation 31 is a mixture of two normal distributions, and its expected value E_mix[x_t] and variance V_mix[x_t] are calculated as in Equations 32 and 33.
- E_s[x_t] and V_s[x_t] are the expected value and variance of x_t in the stationary stochastic differential equation model, and E_n[x_t] and V_n[x_t] are those in the non-stationary stochastic differential equation model.
- the stochastic diffusion represented by Equation 34 is the pair of values obtained by adding to and subtracting from the expected value a constant multiple of the standard deviation.
- FIG. 3 is a schematic diagram showing the probability density function, expected value, and stochastic diffusion of the prediction model. Stochastic diffusion spreads over time, which represents the uncertainty of the predicted value of the data over time. In the stochastic diffusion, the spread increases as the ratio of the non-stationary stochastic differential equation model increases, and the spread decreases as the ratio of the stationary stochastic differential equation model increases.
- the prediction accuracy is shown in FIG. 4.
- a diffusion value was obtained from a histogram of the variations in the actual data values, and the value obtained by subtracting from 100% the error (%) of the predicted stochastic diffusion relative to it was taken as the prediction accuracy.
- the data to be predicted is time-series data of communication throughput in a mobile network, unequally spaced with intervals following an exponential distribution with a mean of 2 seconds. The prediction method using the stochastic differential equation model can be seen to have the higher prediction accuracy.
- a data prediction apparatus 100 comprising:
- data observation means 101 for observing time-series data values;
- model identification means 102 for identifying, each with a stochastic differential equation model based on observed past time-series data, a steady-state model representing the time-series data when its fluctuation process is a stationary process and a non-steady-state model representing the time-series data when its fluctuation process is a non-stationary process;
- likelihood calculation means 103 for calculating, based on the observed past time-series data, a likelihood, which is a value representing plausibility, for each of the steady-state model and the non-steady-state model;
- mixing ratio calculation means 104 for calculating a mixing ratio of the steady-state model and the non-steady-state model based on the respective likelihoods of the two models; and
- probability distribution prediction means 105 for predicting a probability distribution of time-series data based on a prediction model obtained by mixing the steady-state model and the non-steady-state model according to the mixing ratio.
- Appendix 2: The data prediction apparatus according to appendix 1, wherein the model identification means identifies the steady-state model and the non-steady-state model with different stochastic differential equation models.
- Appendix 3: The data prediction apparatus according to appendix 1 or 2, wherein the model identification means identifies the steady-state model with a Vasicek model and identifies the non-steady-state model with a Brownian motion model.
- Appendix 4: The data prediction apparatus according to any one of appendices 1 to 3, further comprising test means for testing, based on the ratio between the likelihood of the steady-state model and the likelihood of the non-steady-state model, whether the observed time-series data fits the steady-state model or the non-steady-state model, wherein the mixing ratio calculation means calculates the mixing ratio of the steady-state model and the non-steady-state model based on the result of the test.
- Appendix 5: The data prediction apparatus according to appendix 4, wherein the test means performs a hypothesis test with the null hypothesis that the observed time-series data fits the non-steady-state model and the alternative hypothesis that the observed time-series data fits the steady-state model.
- the data prediction apparatus according to appendix 4 or 5, wherein the mixing ratio calculation means sets a variable that is "0" when the test result fits the steady-state model and "1" when it fits the non-steady-state model, and calculates a smoothed value of that variable as the mixing ratio.
- a program for causing an information processing device to realize: data observation means for observing time-series data values; model identification means for identifying, each with a stochastic differential equation model based on observed past time-series data, a steady-state model representing the time-series data when its fluctuation process is a stationary process and a non-steady-state model representing the time-series data when its fluctuation process is a non-stationary process; likelihood calculation means for calculating, based on the observed past time-series data, a likelihood for each of the steady-state model and the non-steady-state model; mixing ratio calculation means for calculating a mixing ratio of the two models based on their respective likelihoods; and probability distribution prediction means for predicting a probability distribution of time-series data based on a prediction model obtained by mixing the two models according to the mixing ratio.
- the program according to appendix 7, wherein the model identification means identifies the steady-state model with a Vasicek model and identifies the non-steady-state model with a Brownian motion model.
- a data prediction method in which a mixing ratio of the steady-state model and the non-steady-state model is calculated, and a probability distribution of time-series data is predicted based on a prediction model obtained by mixing the steady-state model and the non-steady-state model according to the mixing ratio.
- the above-described program is stored in a storage device or recorded on a computer-readable recording medium.
- the recording medium is a portable medium such as a flexible disk, an optical disk, a magneto-optical disk, and a semiconductor memory.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Biology (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Operations Research (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Algebra (AREA)
- Computer Hardware Design (AREA)
- Geometry (AREA)
- Evolutionary Computation (AREA)
- Complex Calculations (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
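The apparatus is configured to comprise:
data observation means for observing values of time-series data;
model identification means for identifying, each as a stochastic differential equation model based on observed past time-series data, a steady-state model representing the time-series data when its fluctuation process is a stationary process and a non-steady-state model representing the time-series data when its fluctuation process is a non-stationary process;
likelihood calculation means for calculating, based on the observed past time-series data, a likelihood, which is a value representing plausibility, for each of the steady-state model and the non-steady-state model;
mixing ratio calculation means for calculating a mixing ratio of the steady-state model and the non-steady-state model based on the respective likelihoods of the steady-state model and the non-steady-state model; and
probability distribution prediction means for predicting a probability distribution of time-series data based on a prediction model obtained by mixing the steady-state model and the non-steady-state model according to the mixing ratio.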
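The program causes an information processing device to realize:
data observation means for observing values of time-series data;
model identification means for identifying, each as a stochastic differential equation model based on observed past time-series data, a steady-state model representing the time-series data when its fluctuation process is a stationary process and a non-steady-state model representing the time-series data when its fluctuation process is a non-stationary process;
likelihood calculation means for calculating, based on the observed past time-series data, a likelihood, which is a value representing plausibility, for each of the steady-state model and the non-steady-state model;
mixing ratio calculation means for calculating a mixing ratio of the steady-state model and the non-steady-state model based on the respective likelihoods of the steady-state model and the non-steady-state model; and
probability distribution prediction means for predicting a probability distribution of time-series data based on a prediction model obtained by mixing the steady-state model and the non-steady-state model according to the mixing ratio.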
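The method is configured to:
observe values of time-series data;
identify, each as a stochastic differential equation model based on the observed past time-series data, a steady-state model representing the time-series data when its fluctuation process is a stationary process and a non-steady-state model representing the time-series data when its fluctuation process is a non-stationary process;
calculate, based on the observed past time-series data, a likelihood, which is a value representing plausibility, for each of the steady-state model and the non-steady-state model;
calculate a mixing ratio of the steady-state model and the non-steady-state model based on the respective likelihoods of the steady-state model and the non-steady-state model; and
predict a probability distribution of time-series data based on a prediction model obtained by mixing the steady-state model and the non-steady-state model according to the mixing ratio.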
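A first embodiment of the present invention will be described with reference to FIGS. 1 to 4. FIG. 1 is a functional block diagram showing the configuration of the data prediction apparatus. FIG. 2 is a graph showing information used by the data prediction apparatus. FIG. 3 is a schematic diagram showing the probability distribution of the data to be predicted. FIG. 4 is a graph comparing the data prediction accuracy of this embodiment with that of other techniques.
The data observation unit 11 (data observation means) observes the target time-series data {x_t}. Time-series data is an observed sequence of values of a random variable that varies over time. For example, if the target time-series data is communication throughput and the values x = 5 [Mbps], x = 3 [Mbps], and x = 7 [Mbps] are observed at times t = 0 [s], t = 1.5 [s], and t = 4.1 [s], respectively, the observed time-series data is {x_0 = 5, x_1.5 = 3, x_4.1 = 7}. The time-series data handled by the data prediction apparatus is not limited to communication throughput and may be any time-series data.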
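The throughput example above can be written down directly as unequally spaced observations. The following is a minimal sketch in Python; the names `observations` and `time_steps` are illustrative and do not come from the patent.

```python
# Observed (time [s], throughput [Mbps]) pairs taken from the example above.
observations = [(0.0, 5.0), (1.5, 3.0), (4.1, 7.0)]

def time_steps(obs):
    """Return the intervals between adjacent observation times."""
    times = [t for t, _ in obs]
    return [t1 - t0 for t0, t1 in zip(times, times[1:])]

steps = time_steps(observations)
# The two intervals differ (about 1.5 s and 2.6 s): the series is unequally spaced.
```

A recurrence-based model would need a single fixed step here, whereas a stochastic differential equation model can consume each interval as it is.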
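The stationary stochastic differential equation model identification unit 12 (model identification means) identifies, based on the time-series data observed by the data observation unit 11, the stochastic differential equation model that represents the time-series data when its fluctuation process is a stationary process (the stationary stochastic differential equation model, i.e., the steady-state model).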
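To illustrate why a stochastic differential equation model tolerates unequal sampling intervals, the sketch below assumes that Equation 1 has the usual Vasicek form dx_t = a(b − x_t)dt + σ dW_t. The equation itself is not reproduced in this text, so that form, and the closed-form conditional moments derived from it, are assumptions; the function names are hypothetical. Under this assumption each transition density is Gaussian, so the log-likelihood is a sum over the actual intervals between observations.

```python
import math

def ou_conditional_moments(x_s, dt, a, b, sigma):
    """Mean and variance of x_t given x_s, with dt = t - s > 0.

    Assumes dx = a*(b - x)*dt + sigma*dW with a > 0 (assumed Vasicek form).
    """
    decay = math.exp(-a * dt)
    mean = b + (x_s - b) * decay
    var = sigma ** 2 * (1.0 - decay ** 2) / (2.0 * a)
    return mean, var

def ou_log_likelihood(times, values, a, b, sigma):
    """Gaussian log-likelihood of the observed transitions at arbitrary times."""
    total = 0.0
    for t0, t1, x0, x1 in zip(times, times[1:], values, values[1:]):
        mean, var = ou_conditional_moments(x0, t1 - t0, a, b, sigma)
        total += -0.5 * math.log(2.0 * math.pi * var) - (x1 - mean) ** 2 / (2.0 * var)
    return total
```

Maximizing `ou_log_likelihood` over a, b, and σ corresponds to the maximum likelihood estimation the document describes; the optimizer itself is left out of this sketch.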
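(Step 0) Give a suitable initial value θ_0, and let the initial matrix B_0 be the 3×3 identity matrix.
(Step 1) Solve the system of linear equations given by Equation 10 to obtain the search direction vector d.
(Step 2.1) Set β_{k,0} = 1, i = 0, 0 < ξ < 1, and 0 < τ < 1.
(Step 2.2) If the Armijo condition given by Equation 12 is satisfied, go to Step 2.4; otherwise go to Step 2.3.
(Step 2.4) Set α_k = β_{k,i}.
(Step 3) Update θ using Equation 13.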
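The quasi-Newton iteration outlined in the steps above can be sketched as follows. Several points are assumptions, because the referenced equations are not reproduced in this text: the Armijo test uses its standard form, the backtracking rule multiplies the step width by τ, the stop condition is a small gradient norm, and the update is written for the inverse matrix H (the alternative the document mentions) so that no linear system has to be solved. The function name and the quadratic test objective are hypothetical.

```python
import math

def bfgs_minimize(f, grad, theta0, xi=1e-4, tau=0.5, tol=1e-8, max_iter=200):
    """Minimize f by a BFGS quasi-Newton iteration with Armijo backtracking."""
    n = len(theta0)
    theta = list(theta0)
    # Step 0: start from the identity matrix (here for the inverse matrix H).
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = grad(theta)
    for _ in range(max_iter):
        # Step 1: search direction d = -H g (equivalent to solving B d = -g).
        d = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        slope = sum(gi * di for gi, di in zip(g, d))
        # Step 2: shrink the step width beta until the Armijo condition holds.
        beta, f0 = 1.0, f(theta)
        while beta > 1e-16:
            trial = [t + beta * di for t, di in zip(theta, d)]
            if f(trial) <= f0 + xi * beta * slope:
                break
            beta *= tau
        # Step 3: update theta.
        new_theta = [t + beta * di for t, di in zip(theta, d)]
        new_g = grad(new_theta)
        s = [p - q for p, q in zip(new_theta, theta)]
        y = [p - q for p, q in zip(new_g, g)]
        theta, g = new_theta, new_g
        # Step 4: stop when the gradient norm is small (assumed stop condition).
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break
        # Steps 5-6: BFGS update of H, skipped when the curvature s.y is not
        # safely positive relative to |s||y|.
        sy = sum(si * yi for si, yi in zip(s, y))
        scale = math.sqrt(sum(si * si for si in s) * sum(yi * yi for yi in y))
        if sy > 1e-10 * scale:
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(yi * hi for yi, hi in zip(y, Hy))
            for i in range(n):
                for j in range(n):
                    H[i][j] += ((sy + yHy) * s[i] * s[j] / sy ** 2
                                - (Hy[i] * s[j] + s[i] * Hy[j]) / sy)
    return theta

def objective(th):
    """Hypothetical test objective with its minimum at (1, -2)."""
    return (th[0] - 1.0) ** 2 + 10.0 * (th[1] + 2.0) ** 2

def objective_grad(th):
    return [2.0 * (th[0] - 1.0), 20.0 * (th[1] + 2.0)]

theta_hat = bfgs_minimize(objective, objective_grad, [0.0, 0.0])
```

In the document's setting the objective would be F + G as a function of θ = (a, b, σ), which is why the initial matrix there is 3×3.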
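The non-stationary stochastic differential equation model identification unit 13 (model identification means) identifies, based on the time-series data observed by the data observation unit 11, the non-stationary stochastic differential equation model (non-steady-state model), i.e., the stochastic differential equation model that represents the time-series data when its fluctuation process is a non-stationary process. In other words, it estimates the model parameters of the non-stationary stochastic differential equation model.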
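For the non-stationary model the document states that the only parameter, σ, is obtained analytically. The sketch below assumes the Brownian motion form dx_t = σ dW_t, under which each increment is Gaussian with zero mean and variance σ²Δt; the closed-form estimator shown follows from that assumption and is not copied from Equation 25, which is not reproduced in this text. Function names are hypothetical.

```python
import math

def brownian_sigma_mle(times, values):
    """Maximum likelihood estimate of sigma for dx = sigma*dW at arbitrary times."""
    terms = [(x1 - x0) ** 2 / (t1 - t0)
             for t0, t1, x0, x1 in zip(times, times[1:], values, values[1:])]
    return math.sqrt(sum(terms) / len(terms))

def brownian_log_likelihood(times, values, sigma):
    """Gaussian log-likelihood of the observed increments."""
    total = 0.0
    for t0, t1, x0, x1 in zip(times, times[1:], values, values[1:]):
        var = sigma ** 2 * (t1 - t0)
        total += -0.5 * math.log(2.0 * math.pi * var) - (x1 - x0) ** 2 / (2.0 * var)
    return total
```

Because each squared increment is divided by its own interval, unequally spaced observations need no resampling.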
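The likelihood calculation unit 14 (likelihood calculation means) calculates, based on the observed time-series data, the likelihood, a value representing plausibility, of each stochastic differential equation model identified by the stationary stochastic differential equation model identification unit 12 and the non-stationary stochastic differential equation model identification unit 13. The likelihood of the stationary stochastic differential equation model is obtained from Equation 6, and that of the non-stationary stochastic differential equation model from Equation 24.
The likelihood ratio test unit 15 (test means) performs a hypothesis test, based on the ratio between the likelihood of the stationary stochastic differential equation model and the likelihood of the non-stationary stochastic differential equation model calculated by the likelihood calculation unit 14, to determine whether the observed time-series data fits the stationary stochastic differential equation model or the non-stationary stochastic differential equation model.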
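The decision rule of the likelihood ratio test can be sketched as below. The sketch assumes that the likelihood ratio is sup{L_n}/sup{L_s}, which is consistent with the statistic R being non-negative; Equations 26 and 27 themselves are not reproduced in this text. The thresholds are the values this document quotes for 100-point series, and the function names are hypothetical.

```python
# Rejection thresholds quoted in this document (null distribution for 100 points).
THRESHOLDS = {0.1: 7.6, 0.05: 9.2, 0.01: 12.8}

def lr_statistic(log_l_stationary, log_l_nonstationary):
    """R = -2 ln(sup L_n / sup L_s), computed from maximized log-likelihoods."""
    return 2.0 * (log_l_stationary - log_l_nonstationary)

def select_model(r, significance=0.05):
    """Reject the null hypothesis (non-stationary) when R exceeds the threshold."""
    return "stationary" if r > THRESHOLDS[significance] else "non-stationary"

r_value = lr_statistic(-100.0, -105.0)
decision = select_model(r_value, 0.05)
```

For series of a different length the thresholds would have to be regenerated, for example by the Monte Carlo procedure the document describes.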
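The mixing ratio calculation unit 16 (mixing ratio calculation means) calculates, based on the history of test results from the likelihood ratio test unit 15, a mixing ratio representing the proportion in which the stationary stochastic differential equation model identified by the stationary stochastic differential equation model identification unit 12 and the non-stationary stochastic differential equation model identified by the non-stationary stochastic differential equation model identification unit 13 are mixed.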
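One way to turn the history of test results into a mixing ratio is an exponentially weighted moving average of a 0/1 indicator, as the document describes. The recursion below is an assumed form of that average, since Equation 30 is not reproduced in this text; the name `update_mixing_ratio` and the parameter `smoothing` are hypothetical.

```python
def update_mixing_ratio(prev_ratio, test_result, smoothing=0.1):
    """Update the proportion of the non-stationary model after one test.

    The indicator is 1 when the test accepted the non-stationary model and 0
    when it adopted the stationary model; smoothing lies between 0 and 1.
    """
    indicator = 1.0 if test_result == "non-stationary" else 0.0
    return smoothing * indicator + (1.0 - smoothing) * prev_ratio

ratio = 0.5
for result in ["non-stationary", "non-stationary", "stationary"]:
    ratio = update_mixing_ratio(ratio, result, smoothing=0.2)
```

A larger smoothing value makes the ratio follow recent test results more closely, while a smaller one averages over a longer history.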
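The probability distribution prediction unit 17 (probability distribution prediction means) predicts the probability distribution of future data from the mixing ratio calculated by the mixing ratio calculation unit 16 and from the stationary stochastic differential equation model identified by the stationary stochastic differential equation model identification unit 12 and the non-stationary stochastic differential equation model identified by the non-stationary stochastic differential equation model identification unit 13, mixed according to that ratio.
When predicting future data values, it can be useful to have an indication of the range within which the future data is likely to fall. This probabilistic range of variation is called the stochastic diffusion and is defined by Equation 34.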
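The expected value, variance, and stochastic diffusion of the mixed prediction model can be sketched with the standard moment formulas for a two-component mixture. The sketch assumes the mixed density is (1 − λ) f + λ g, with λ the proportion of the non-stationary model, and that the diffusion is the expected value plus or minus a constant multiple of the standard deviation; Equations 31 to 34 are not reproduced in this text, and the names and the multiplier `k` are hypothetical.

```python
import math

def mixture_moments(lam, mean_s, var_s, mean_n, var_n):
    """Mean and variance of the mixture (1 - lam)*stationary + lam*non-stationary."""
    mean = (1.0 - lam) * mean_s + lam * mean_n
    second_moment = ((1.0 - lam) * (var_s + mean_s ** 2)
                     + lam * (var_n + mean_n ** 2))
    return mean, second_moment - mean ** 2

def stochastic_spread(mean, var, k=1.0):
    """Lower and upper bounds: mean -/+ k standard deviations."""
    sd = math.sqrt(var)
    return mean - k * sd, mean + k * sd

mix_mean, mix_var = mixture_moments(0.5, 0.0, 1.0, 2.0, 1.0)
lower, upper = stochastic_spread(mix_mean, mix_var, k=2.0)
```

Since the variance of Brownian motion grows without bound over the prediction horizon while that of a stationary process saturates, a larger λ widens the band, which matches the behavior the document describes for FIG. 3.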
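Part or all of the above embodiment can also be described as in the following appendices. The outline of the configurations of the data prediction apparatus (see FIG. 5), the program, and the data prediction method of the present invention is described below. However, the present invention is not limited to the following configurations.
A data prediction apparatus 100 comprising:
data observation means 101 for observing values of time-series data;
model identification means 102 for identifying, each as a stochastic differential equation model based on observed past time-series data, a steady-state model representing the time-series data when its fluctuation process is a stationary process and a non-steady-state model representing the time-series data when its fluctuation process is a non-stationary process;
likelihood calculation means 103 for calculating, based on the observed past time-series data, a likelihood, which is a value representing plausibility, for each of the steady-state model and the non-steady-state model;
mixing ratio calculation means 104 for calculating a mixing ratio of the steady-state model and the non-steady-state model based on the respective likelihoods of the steady-state model and the non-steady-state model; and
probability distribution prediction means 105 for predicting a probability distribution of time-series data based on a prediction model obtained by mixing the steady-state model and the non-steady-state model according to the mixing ratio.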
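(Supplementary Note 2)
The data prediction apparatus according to Supplementary Note 1,
wherein the model identification means identifies the steady-state model and the non-steady-state model as mutually different stochastic differential equation models.
(Supplementary Note 3)
The data prediction apparatus according to Supplementary Note 1 or 2,
wherein the model identification means identifies the steady-state model as a Vasicek model and identifies the non-steady-state model as a Brownian motion model.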
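(Supplementary Note 4)
The data prediction apparatus according to any one of Supplementary Notes 1 to 3, further comprising
test means for testing, based on the ratio between the likelihood of the steady-state model and the likelihood of the non-steady-state model, whether the observed time-series data fits the steady-state model or the non-steady-state model,
wherein the mixing ratio calculation means calculates the mixing ratio of the steady-state model and the non-steady-state model based on the result of the test.
(Supplementary Note 5)
The data prediction apparatus according to Supplementary Note 4,
wherein the test means performs a hypothesis test in which the null hypothesis is that the observed time-series data fits the non-steady-state model and the alternative hypothesis is that the observed time-series data fits the steady-state model.
(Supplementary Note 6)
The data prediction apparatus according to Supplementary Note 4 or 5,
wherein the mixing ratio calculation means sets a variable that takes the value "0" when the test result indicates a fit to the steady-state model and the value "1" when the test result indicates a fit to the non-steady-state model, and calculates a smoothed value of the variable as the mixing ratio.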
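(Supplementary Note 7)
A program for causing an information processing apparatus to realize:
data observation means for observing values of time-series data;
model identification means for identifying, based on observed past time-series data and each as a stochastic differential equation model, a steady-state model representing the time-series data when the variation process of the time-series data is a stationary process and a non-steady-state model representing the time-series data when the variation process of the time-series data is a non-stationary process;
likelihood calculation means for calculating, based on the observed past time-series data, a likelihood of each of the steady-state model and the non-steady-state model, the likelihood being a value representing the plausibility of the model;
mixing ratio calculation means for calculating a mixing ratio of the steady-state model and the non-steady-state model based on the respective likelihoods of the steady-state model and the non-steady-state model; and
probability distribution prediction means for predicting a probability distribution of the time-series data based on a prediction model obtained by mixing the steady-state model and the non-steady-state model according to the mixing ratio.
(Supplementary Note 8)
The program according to Supplementary Note 7,
wherein the model identification means identifies the steady-state model as a Vasicek model and identifies the non-steady-state model as a Brownian motion model.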
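(Supplementary Note 9)
A data prediction method comprising:
observing values of time-series data;
identifying, based on observed past time-series data and each as a stochastic differential equation model, a steady-state model representing the time-series data when the variation process of the time-series data is a stationary process and a non-steady-state model representing the time-series data when the variation process of the time-series data is a non-stationary process;
calculating, based on the observed past time-series data, a likelihood of each of the steady-state model and the non-steady-state model, the likelihood being a value representing the plausibility of the model;
calculating a mixing ratio of the steady-state model and the non-steady-state model based on the respective likelihoods of the steady-state model and the non-steady-state model; and
predicting a probability distribution of the time-series data based on a prediction model obtained by mixing the steady-state model and the non-steady-state model according to the mixing ratio.
(Supplementary Note 10)
The data prediction method according to Supplementary Note 9,
wherein the steady-state model is identified as a Vasicek model and the non-steady-state model is identified as a Brownian motion model.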
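11 Data observation unit
12 Stationary stochastic differential equation model identification unit
13 Non-stationary stochastic differential equation model identification unit
14 Likelihood calculation unit
15 Likelihood ratio test unit
16 Mixing ratio calculation unit
17 Probability distribution prediction unit
100 Data prediction apparatus
101 Data observation means
102 Model identification means
103 Likelihood calculation means
104 Mixing ratio calculation means
105 Probability distribution prediction means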
Claims (10)
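- A data prediction apparatus comprising:
data observation means for observing values of time-series data;
model identification means for identifying, based on observed past time-series data and each as a stochastic differential equation model, a steady-state model representing the time-series data when the variation process of the time-series data is a stationary process and a non-steady-state model representing the time-series data when the variation process of the time-series data is a non-stationary process;
likelihood calculation means for calculating, based on the observed past time-series data, a likelihood of each of the steady-state model and the non-steady-state model, the likelihood being a value representing the plausibility of the model;
mixing ratio calculation means for calculating a mixing ratio of the steady-state model and the non-steady-state model based on the respective likelihoods of the steady-state model and the non-steady-state model; and
probability distribution prediction means for predicting a probability distribution of the time-series data based on a prediction model obtained by mixing the steady-state model and the non-steady-state model according to the mixing ratio.
- The data prediction apparatus according to claim 1,
wherein the model identification means identifies the steady-state model and the non-steady-state model as mutually different stochastic differential equation models.
- The data prediction apparatus according to claim 1 or 2,
wherein the model identification means identifies the steady-state model as a Vasicek model and identifies the non-steady-state model as a Brownian motion model.
- The data prediction apparatus according to any one of claims 1 to 3, further comprising
test means for testing, based on the ratio between the likelihood of the steady-state model and the likelihood of the non-steady-state model, whether the observed time-series data fits the steady-state model or the non-steady-state model,
wherein the mixing ratio calculation means calculates the mixing ratio of the steady-state model and the non-steady-state model based on the result of the test.
- The data prediction apparatus according to claim 4,
wherein the test means performs a hypothesis test in which the null hypothesis is that the observed time-series data fits the non-steady-state model and the alternative hypothesis is that the observed time-series data fits the steady-state model.
- The data prediction apparatus according to claim 4 or 5,
wherein the mixing ratio calculation means sets a variable that takes the value "0" when the test result indicates a fit to the steady-state model and the value "1" when the test result indicates a fit to the non-steady-state model, and calculates a smoothed value of the variable as the mixing ratio.
- A program for causing an information processing apparatus to realize:
data observation means for observing values of time-series data;
model identification means for identifying, based on observed past time-series data and each as a stochastic differential equation model, a steady-state model representing the time-series data when the variation process of the time-series data is a stationary process and a non-steady-state model representing the time-series data when the variation process of the time-series data is a non-stationary process;
likelihood calculation means for calculating, based on the observed past time-series data, a likelihood of each of the steady-state model and the non-steady-state model, the likelihood being a value representing the plausibility of the model;
mixing ratio calculation means for calculating a mixing ratio of the steady-state model and the non-steady-state model based on the respective likelihoods of the steady-state model and the non-steady-state model; and
probability distribution prediction means for predicting a probability distribution of the time-series data based on a prediction model obtained by mixing the steady-state model and the non-steady-state model according to the mixing ratio.
- The program according to claim 7,
wherein the model identification means identifies the steady-state model as a Vasicek model and identifies the non-steady-state model as a Brownian motion model.
- A data prediction method comprising:
observing values of time-series data;
identifying, based on observed past time-series data and each as a stochastic differential equation model, a steady-state model representing the time-series data when the variation process of the time-series data is a stationary process and a non-steady-state model representing the time-series data when the variation process of the time-series data is a non-stationary process;
calculating, based on the observed past time-series data, a likelihood of each of the steady-state model and the non-steady-state model, the likelihood being a value representing the plausibility of the model;
calculating a mixing ratio of the steady-state model and the non-steady-state model based on the respective likelihoods of the steady-state model and the non-steady-state model; and
predicting a probability distribution of the time-series data based on a prediction model obtained by mixing the steady-state model and the non-steady-state model according to the mixing ratio.
- The data prediction method according to claim 9,
wherein the steady-state model is identified as a Vasicek model and the non-steady-state model is identified as a Brownian motion model.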
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2015505088A JP6337881B2 (ja) | 2013-03-14 | 2013-12-18 | データ予測装置 |
US14/775,485 US20160042101A1 (en) | 2013-03-14 | 2013-12-18 | Data prediction apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2013-051205 | 2013-03-14 | ||
JP2013051205 | 2013-03-14 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014141344A1 (ja) | 2014-09-18 |
Family
ID=51536047
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2013/007424 WO2014141344A1 (ja) | 2013-03-14 | 2013-12-18 | データ予測装置 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20160042101A1 (ja) |
JP (1) | JP6337881B2 (ja) |
WO (1) | WO2014141344A1 (ja) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018139300A1 (ja) * | 2017-01-24 | 2018-08-02 | 日本電気株式会社 | 情報処理装置、情報処理方法、及び、情報処理プログラムが記録された記録媒体 |
WO2019215810A1 (ja) * | 2018-05-08 | 2019-11-14 | 株式会社日立製作所 | データ分析装置、電力潮流解析装置およびデータ分析方法 |
KR20210092482A (ko) * | 2020-01-16 | 2021-07-26 | 주식회사 에이젠글로벌 | 인공지능을 이용한 사기거래 탐지 시스템 및 사기거래 탐지 방법 |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9934259B2 (en) | 2013-08-15 | 2018-04-03 | Sas Institute Inc. | In-memory time series database and processing in a distributed environment |
US9892370B2 (en) | 2014-06-12 | 2018-02-13 | Sas Institute Inc. | Systems and methods for resolving over multiple hierarchies |
US9418339B1 (en) * | 2015-01-26 | 2016-08-16 | Sas Institute, Inc. | Systems and methods for time series analysis techniques utilizing count data sets |
CN107665276A (zh) * | 2017-09-18 | 2018-02-06 | 天津大学 | 基于符号化模态及转换频次的时间序列复杂性测算方法 |
US10560313B2 (en) | 2018-06-26 | 2020-02-11 | Sas Institute Inc. | Pipeline system for time-series data forecasting |
US10685283B2 (en) | 2018-06-26 | 2020-06-16 | Sas Institute Inc. | Demand classification based pipeline system for time-series data forecasting |
JP2022072271A (ja) | 2020-10-29 | 2022-05-17 | 本田技研工業株式会社 | 情報処理装置、移動体、プログラム、及び情報処理方法 |
JP7410839B2 (ja) | 2020-10-29 | 2024-01-10 | 本田技研工業株式会社 | 情報処理装置、移動体、プログラム、及び情報処理方法 |
-
2013
- 2013-12-18 US US14/775,485 patent/US20160042101A1/en not_active Abandoned
- 2013-12-18 WO PCT/JP2013/007424 patent/WO2014141344A1/ja active Application Filing
- 2013-12-18 JP JP2015505088A patent/JP6337881B2/ja active Active
Non-Patent Citations (1)
Title |
---|
YUJI YOSHIDA ET AL.: "Application of TCP Throughput Prediction to Video Streaming Control and Its Evaluation", IEICE TECHNICAL REPORT, vol. 112, no. 464, 28 February 2013 (2013-02-28), pages 281 - 286 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018139300A1 (ja) * | 2017-01-24 | 2018-08-02 | 日本電気株式会社 | 情報処理装置、情報処理方法、及び、情報処理プログラムが記録された記録媒体 |
JPWO2018139300A1 (ja) * | 2017-01-24 | 2019-11-07 | 日本電気株式会社 | 情報処理装置、情報処理方法、及び、情報処理プログラムが記録された記録媒体 |
WO2019215810A1 (ja) * | 2018-05-08 | 2019-11-14 | 株式会社日立製作所 | データ分析装置、電力潮流解析装置およびデータ分析方法 |
JPWO2019215810A1 (ja) * | 2018-05-08 | 2021-01-14 | 株式会社日立製作所 | データ分析装置、電力潮流解析装置およびデータ分析方法 |
KR20210092482A (ko) * | 2020-01-16 | 2021-07-26 | 주식회사 에이젠글로벌 | 인공지능을 이용한 사기거래 탐지 시스템 및 사기거래 탐지 방법 |
KR102412432B1 (ko) * | 2020-01-16 | 2022-06-23 | 주식회사 에이젠글로벌 | 인공지능을 이용한 사기거래 탐지 시스템 및 사기거래 탐지 방법 |
Also Published As
Publication number | Publication date |
---|---|
JP6337881B2 (ja) | 2018-06-06 |
US20160042101A1 (en) | 2016-02-11 |
JPWO2014141344A1 (ja) | 2017-02-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6337881B2 (ja) | データ予測装置 | |
CN110149237A (zh) | 一种Hadoop平台计算节点负载预测方法 | |
Ganji et al. | Advance first order second moment (AFOSM) method for single reservoir operation reliability analysis: a case study | |
Azzouz et al. | Steady state IBEA assisted by MLP neural networks for expensive multi-objective optimization problems | |
Dushkin et al. | An improved method for predicting the evolution of the characteristic parameters of an information system | |
Daolio et al. | Local optima networks and the performance of iterated local search | |
CN111460692A (zh) | 考虑退化速率相互影响的设备剩余寿命预测方法及系统 | |
US20090138238A1 (en) | Sequential fixed-point quantile estimation | |
Lee et al. | Test for parameter change in diffusion processes by cusum statistics based on one-step estimators | |
Abate et al. | Approximate abstractions of stochastic systems: A randomized method | |
Oreshkin et al. | Efficient delay-tolerant particle filtering | |
CN102932264A (zh) | 流量溢出的判断方法和装置 | |
Shu et al. | Adaptive CUSUM procedures with Markovian mean estimation | |
Wei et al. | History-based throughput prediction with Hidden Markov Model in mobile networks | |
US10445444B2 (en) | Flow rate prediction device, mixing ratio estimation device, method, and computer-readable recording medium | |
Borodina et al. | Application of splitting to failure estimation in controllable degradation system | |
Mohammadpour et al. | Selecting the best flood flow frequency model using multi-criteria group decision-making | |
CN104679939A (zh) | 一种飞机设计经济可承受性评估过程的多准则决策方法 | |
Wu et al. | Software reliability modeling based on SVM and virtual sample | |
WO2023139640A1 (ja) | 情報処理装置および情報処理方法 | |
Bladt et al. | Simple simulation of diffusion bridges with application to likelihood inference for diffusions | |
Begin et al. | A DFO technique to calibrate queueing models | |
Sadre et al. | Fitting heavy-tailed HTTP traces with the new stratified EM-algorithm | |
Rehman et al. | Is there self-similarity in cloud QoS data? | |
Gunasekera | Inferences on the common scale parameter of several exponential populations based on the generalized variable method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 13877559 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 14775485 Country of ref document: US |
|
ENP | Entry into the national phase |
Ref document number: 2015505088 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 13877559 Country of ref document: EP Kind code of ref document: A1 |