WO2024089770A1 - Information processing program, device, and method - Google Patents

Information processing program, device, and method

Info

Publication number
WO2024089770A1
WO2024089770A1 (PCT/JP2022/039750)
Authority
WO
WIPO (PCT)
Prior art keywords
predicted
latent variables
latent
time
data
Prior art date
Application number
PCT/JP2022/039750
Other languages
French (fr)
Japanese (ja)
Inventor
Masayuki Hiromoto
Akira Nakagawa
Original Assignee
Fujitsu Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Limited
Priority to PCT/JP2022/039750
Publication of WO2024089770A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Description

  • The disclosed technology relates to an information processing program, an information processing device, and an information processing method.
  • Conventionally, future values of each data item have been predicted from multidimensional past time series data. For example, when predicting the stock prices of multiple stocks, the stock price of each stock at the next time step is predicted from past data.
  • One method for predicting such time series data uses a Gaussian process. By predicting the time series as a probability distribution, this method can calculate the predicted value and its variance at the same time. There are also methods that predict time series data with a model based on a recurrent neural network (RNN) or a long short-term memory (LSTM) neural network; using a neural network allows highly accurate predictions. In addition, there is a method for predicting multidimensional data that uses a generative-model-type deep learning technique with isometry. With this method, a highly accurate machine learning model that treats the latent representation of the data as a probability distribution yields a predicted value and the variance of that predicted value simultaneously.
  • However, prediction methods using Gaussian processes are limited in their prediction accuracy for complex, correlated data. Because such a method is built on a simple regression model, measures such as manually selecting an appropriate kernel are needed to improve prediction accuracy, and because it assumes a stationary process, it is unsuitable for predicting complex, context-dependent data. Methods that use RNN or LSTM neural networks only predict the data themselves, so the variance of a predicted value cannot be obtained. And the method using a generative-model-type deep learning technique with isometry is unsuitable for predicting time series data.
  • In one aspect, the disclosed technology aims to obtain the covariance of the predicted values together with the predicted values themselves when predicting each data item from multidimensional time series data.
  • In one embodiment, the disclosed technology trains an autoencoder, comprising an encoder that converts multidimensional input data into latent variables and a decoder that restores the latent variables, so as to minimize a loss function. The loss function includes the restoration error of the autoencoder and the encoded information amount of the latent variables, calculated from the probability distribution generated by adding noise to the latent variables and from a prior distribution of the latent variables predicted from time series data of the input data over a predetermined past period. The disclosed technology then inputs latent variables, predicted from time series data of the multidimensional prediction target data over a predetermined past period, into the decoder of the trained autoencoder, thereby calculating the predicted values of the multidimensional prediction target data and the covariance between those predicted values.
  • In one aspect, this makes it possible, when predicting each data item from multidimensional time series data, to obtain the covariance of the predicted values at the same time as the predicted values.
  • FIG. 1 is a diagram illustrating an example configuration of an isometric generative model, which is an existing technology.
  • FIG. 2 is a functional block diagram of a machine learning device according to the present embodiment.
  • FIG. 3 is a diagram illustrating an example configuration of the machine learning device.
  • FIG. 4 is a functional block diagram of a prediction device according to the present embodiment.
  • FIG. 5 is a diagram illustrating an example configuration of the prediction device.
  • FIG. 6 is a block diagram showing a schematic configuration of a computer that functions as the machine learning device.
  • FIG. 7 is a block diagram showing a schematic configuration of a computer that functions as the prediction device.
  • FIG. 8 is a flowchart illustrating an example of a machine learning process.
  • FIG. 9 is a flowchart illustrating an example of a prediction process.
  • FIG. 10 is a diagram showing that predicted values, variances, and covariances are calculated for multidimensional time series data.
  • FIG. 11 is a comparison table between the present method and the reference methods.
  • An example embodiment of the disclosed technology is described below with reference to the drawings. In this embodiment, a prior distribution of context-dependent latent variables is introduced into the method that uses a generative-model-type deep learning technique with isometry (hereinafter, the "isometric generative model").
  • First, the isometric generative model of the existing technology (for example, Non-Patent Documents 1 and 2) is described. As shown in FIG. 1, it consists of an autoencoder that includes an encoder, which encodes input data x into a latent variable z, and a decoder, which restores the latent variable z into an output x̂ (written with a hat above "x" in the figures and formulas). With encoder parameters φ and decoder parameters θ, f_φ(x) = z and g_θ(z) = x̂. The decoder receives the latent variable z with noise ε ~ N(0, (β/2)I) applied, where β is a hyperparameter and I is the identity matrix.
  • The isometric generative model also calculates the probability distribution p_ψ(z) of the latent variable z obtained by converting the input x with the encoder f_φ, and calculates the encoded information amount of z as R = -log(p_ψ(z)). It further calculates the restoration error D(x, x̂) between the input x and the output x̂, and trains the encoder f_φ, the decoder g_θ, and the probability distribution p_ψ by the optimization shown in formula (1). There, E_{x~p(x), ε~N(0,(β/2)I)}[X] denotes the average of X when the input x is repeatedly sampled from p(x) and the noise ε from N(0, (β/2)I). By reducing the encoded information amount R of the latent variable z together with the restoration error D, the isometric generative model acquires a low-dimensional latent representation that preserves the characteristics of the input data, and its isometry makes the probability distribution of the output computable. A minimal sketch of this type of objective is given below.
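For illustration, here is a minimal PyTorch sketch of this kind of rate-distortion objective. The layer shapes, the diagonal-Gaussian density model standing in for p_ψ(z), and the β-weighted combination of D and R are assumptions made for the sketch; the actual objective is the one defined by formula (1) and the cited non-patent documents.

```python
import torch
import torch.nn as nn

N, K, BETA = 8, 2, 0.01  # illustrative sizes and hyperparameter

f_phi = nn.Sequential(nn.Linear(N, 64), nn.ReLU(), nn.Linear(64, K))    # encoder f_phi
g_theta = nn.Sequential(nn.Linear(K, 64), nn.ReLU(), nn.Linear(64, N))  # decoder g_theta

# p_psi(z): a learnable diagonal Gaussian, standing in for the density model.
mu_psi = nn.Parameter(torch.zeros(K))
log_std_psi = nn.Parameter(torch.zeros(K))

x = torch.randn(32, N)                             # a batch sampled from p(x)
z = f_phi(x)                                       # z = f_phi(x)
eps = torch.randn_like(z) * (BETA / 2) ** 0.5      # eps ~ N(0, (BETA/2) I)
x_hat = g_theta(z + eps)                           # x_hat = g_theta(z + eps)

D = ((x - x_hat) ** 2).sum(dim=1)                  # restoration error D(x, x_hat)
p_psi = torch.distributions.Normal(mu_psi, log_std_psi.exp())
R = -p_psi.log_prob(z).sum(dim=1)                  # encoded information R = -log p_psi(z)
loss = (D + BETA * R).mean()                       # assumed D + beta * R trade-off
```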
  • However, as noted above, the isometric generative model is not suited to predicting time series data. In this embodiment, therefore, past time series data is given to the isometric generative model as context, which makes time series prediction possible. The embodiment retains the advantages of the isometric generative model: data prediction is highly accurate, and the variance of a predicted value can be calculated at the same time as the predicted value itself. Furthermore, the covariance between the predicted values of the individual dimensions of the multidimensional data can also be calculated.
  • The information processing system according to this embodiment is described in detail below. It includes a machine learning device and a prediction device.
  • First, the machine learning device is described. As shown in FIG. 2, the machine learning device 10 functionally includes a distribution prediction unit 11, a first conversion unit 12, a second conversion unit 13, a loss calculation unit 14, and an update unit 15. Each functional unit is described below with reference to the configuration example of the machine learning device 10 shown in FIG. 3.
  • The distribution prediction unit 11 predicts the prior distribution of the latent variables from time series data of the input data over a predetermined past period. Specifically, it inputs that time series data into a distribution predictor, which predicts the mean and variance of the latent variables for the input data. In the example of FIG. 3, the N-dimensional data s_t = {s_t^1, s_t^2, ..., s_t^N} at time t is the input x, and the time series data of the past period T is the context C = {s_{t-1}, s_{t-2}, ..., s_{t-T}}. Here h_ψ is the distribution predictor and ψ is its parameter. The mean μ_C and variance σ_C predicted by the distribution predictor are written μ_C, σ_C = h_ψ(C), with μ_C = {μ_1^C, ..., μ_k^C, ..., μ_K^C} and σ_C = {σ_1^C, ..., σ_k^C, ..., σ_K^C}, where k indexes the dimensions of the latent variable, K is the number of latent dimensions, and K < N.
  • From the predicted mean μ_C and variance σ_C, the distribution prediction unit 11 predicts the context-dependent prior distribution q(z|C) of the latent variable z by combining the per-dimension probability distributions, as expressed by formula (2); there, the distribution of each dimension's latent variable z_k is a normal distribution with mean μ_k^C and variance σ_k^C. A sketch of such a predictor follows.
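The source does not specify the architecture of the distribution predictor h_ψ. As a sketch under that assumption, a small LSTM over the context window can map C to μ_C and σ_C:

```python
import torch
import torch.nn as nn

class DistributionPredictor(nn.Module):
    """h_psi: maps context C = (s_{t-T}, ..., s_{t-1}) to the mean mu_C and
    variance sigma_C of the K latent dimensions. The LSTM is an assumption;
    the patent only requires some model C -> (mu_C, sigma_C)."""
    def __init__(self, n_dim: int, k_dim: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.LSTM(n_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2 * k_dim)

    def forward(self, context):                        # context: (batch, T, N)
        _, (h_last, _) = self.rnn(context)
        mu, raw = self.head(h_last[-1]).chunk(2, dim=-1)
        sigma = nn.functional.softplus(raw) + 1e-6     # variances must stay positive
        return mu, sigma                               # each (batch, K)

h_psi = DistributionPredictor(n_dim=8, k_dim=2)
mu_c, sigma_c = h_psi(torch.randn(32, 10, 8))          # a batch of contexts, T = 10
```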
  • The first conversion unit 12 converts the N-dimensional input x into a K-dimensional latent variable y using the encoder of the autoencoder. In the example of FIG. 3, f_φ is the encoder and φ is its parameter; the conversion is written y = f_φ(x), with y = {y_t^1, ..., y_t^k, ..., y_t^K}.
  • The second conversion unit 13 applies noise ε to the latent variable y to convert it into the latent variable z. In the example of FIG. 3, the noise ε follows a normal distribution with mean 0 and variance (β/2)I, where β is a hyperparameter and I is the identity matrix. The probability distribution of z given the input x is therefore z ~ p(z|x) = N(z; y, (β/2)I). The second conversion unit 13 then converts the K-dimensional latent variable z into the N-dimensional output x̂ using the decoder of the autoencoder. In the example of FIG. 3, g_θ is the decoder and θ is its parameter; the conversion is written x̂ = g_θ(z).
  • The loss calculation unit 14 calculates a loss function that includes the restoration error of the autoencoder and the encoded information amount of the latent variables. Specifically, it calculates the squared error between the input x and the output x̂ as the restoration error D(x, x̂), and calculates the amount of information expressing the difference between the probability distribution p(z|x) of the latent variable z (obtained by applying noise ε to y) and the context-dependent prior q(z|C) as the encoded information amount of the latent variables. FIG. 3 shows an example in which the Kullback-Leibler divergence between p(z|x) and q(z|C) is used, i.e., D_KL(p(z|x) || q(z|C)). As shown in formula (3), the loss calculation unit 14 then calculates the weighted sum of the restoration error D(x, x̂) and the encoded information amount D_KL(p(z|x) || q(z|C)) as the loss function L_θ,φ,ψ(x, C). A sketch of this computation follows.
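The sketch below computes this loss for the diagonal-Gaussian case, using the standard closed-form KL divergence between two diagonal Gaussians. The weight lam on the KL term is an assumption; the exact weighting in formula (3) is not reproduced in the text.

```python
import torch

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """Closed-form KL( N(mu_p, diag(var_p)) || N(mu_q, diag(var_q)) ),
    summed over the latent dimensions."""
    return 0.5 * (torch.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q - 1).sum(dim=-1)

def loss_fn(x, x_hat, y, mu_c, sigma_c, beta=0.01, lam=1.0):
    """Weighted sum D(x, x_hat) + lam * D_KL( p(z|x) || q(z|C) ), with
    p(z|x) = N(y, (beta/2) I) and q(z|C) = N(mu_c, diag(sigma_c))."""
    d = ((x - x_hat) ** 2).sum(dim=-1)           # restoration error D(x, x_hat)
    var_p = torch.full_like(y, beta / 2)
    kl = gaussian_kl(y, var_p, mu_c, sigma_c)    # encoded information amount
    return (d + lam * kl).mean()
```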
  • The update unit 15 trains the encoder f_φ, the decoder g_θ, and the distribution predictor h_ψ so as to minimize the loss function calculated by the loss calculation unit 14. Specifically, as shown in formula (4), the parameter φ of the encoder f_φ, the parameter θ of the decoder g_θ, and the parameter ψ of the distribution predictor h_ψ are updated so as to minimize the loss function L_θ,φ,ψ(x, C).
  • Next, the prediction device is described. As shown in FIG. 4, the prediction device 20 functionally includes a distribution prediction unit 21, a data prediction unit 22, and a covariance calculation unit 23. Each functional unit is described below with reference to the configuration example of the prediction device 20 shown in FIG. 5.
  • The distribution prediction unit 21 inputs the time series data of the past period T for the data at time t to be predicted, i.e., the context C, into the distribution predictor trained by the machine learning device 10, and thereby predicts the mean and variance of the latent variables for the prediction target data. In the example of FIG. 5, h_ψ is the trained distribution predictor, and the predicted mean μ_C and variance σ_C are written μ_C, σ_C = h_ψ(C).
  • The data prediction unit 22 inputs the mean μ_C of the K-dimensional latent variables predicted by the distribution prediction unit 21 into the decoder of the autoencoder trained by the machine learning device 10, and thereby predicts the prediction target data. In the example of FIG. 5, g_θ is the decoder and θ is its parameter; the conversion is written x̃ = g_θ(μ_C), where x̃ (written with a tilde above "x" in the figures and formulas) is the predicted value of the prediction target data. The data prediction unit 22 outputs the predicted value x̃.
  • The covariance calculation unit 23 calculates the covariance Cov(x̃_i, x̃_j) of the predicted values from the variance σ_C of the K-dimensional latent variables predicted by the distribution prediction unit 21 and the gradient obtained by numerical differentiation of the decoder, as shown in formula (5), where i and j index the dimensions of the predicted value x̃. The covariance calculation unit 23 outputs the calculated covariance Cov(x̃_i, x̃_j). The sketch below illustrates one way to compute it.
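Formula (5) itself is not reproduced in the text, which states only that the covariance is obtained from the latent variance σ_C and gradients of the decoder computed by numerical differentiation. A first-order (delta-method) reading consistent with that description is Cov ≈ J diag(σ_C) Jᵀ, where J is the decoder Jacobian at μ_C; the sketch below estimates J by central finite differences.

```python
import torch

def predict_with_covariance(g_theta, mu_c, sigma_c, h=1e-4):
    """Predicted value x_tilde = g_theta(mu_c) plus an assumed delta-method
    covariance: Cov(x_i, x_j) = sum_k sigma_c[k] * J[i, k] * J[j, k], where J
    is the decoder Jacobian at mu_c, estimated by central finite differences."""
    mu_c = mu_c.detach()
    with torch.no_grad():
        x_tilde = g_theta(mu_c)                      # predicted values, shape (N,)
        cols = []
        for k in range(mu_c.shape[-1]):
            d = torch.zeros_like(mu_c)
            d[k] = h
            cols.append((g_theta(mu_c + d) - g_theta(mu_c - d)) / (2 * h))
        J = torch.stack(cols, dim=-1)                # numerical Jacobian, (N, K)
    cov = J @ torch.diag(sigma_c) @ J.T              # Cov(x_i~, x_j~)
    return x_tilde, cov
```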
  • The machine learning device 10 may be realized, for example, by the computer 30 shown in FIG. 6. The computer 30 includes a CPU (Central Processing Unit) 31, a memory 32 serving as a temporary storage area, and a non-volatile storage device 33. The computer 30 also includes an input/output device 34, such as an input device and a display device, and an R/W (Read/Write) device 35 that controls reading and writing of data to and from a storage medium 39. The computer 30 further includes a communication I/F (Interface) 36 connected to a network such as the Internet. The CPU 31, the memory 32, the storage device 33, the input/output device 34, the R/W device 35, and the communication I/F 36 are connected to one another via a bus 37.
  • The storage device 33 is, for example, an HDD (Hard Disk Drive), an SSD (Solid State Drive), or flash memory. The storage device 33, serving as a storage medium, stores a machine learning program 40 for causing the computer 30 to function as the machine learning device 10. The machine learning program 40 has a distribution prediction process control instruction 41, a first conversion process control instruction 42, a second conversion process control instruction 43, a loss calculation process control instruction 44, and an update process control instruction 45. The storage device 33 also has an information storage area 50 that stores the information constituting the distribution predictor and the encoder and decoder of the autoencoder.
  • The CPU 31 reads the machine learning program 40 from the storage device 33, loads it into the memory 32, and sequentially executes its control instructions. By executing the distribution prediction process control instruction 41, the CPU 31 operates as the distribution prediction unit 11 shown in FIG. 2; by executing the first conversion process control instruction 42, as the first conversion unit 12; by executing the second conversion process control instruction 43, as the second conversion unit 13; by executing the loss calculation process control instruction 44, as the loss calculation unit 14; and by executing the update process control instruction 45, as the update unit 15. The CPU 31 also reads information from the information storage area 50 and loads the distribution predictor and the encoder and decoder of the autoencoder into the memory 32. The computer 30 that has executed the machine learning program 40 thereby functions as the machine learning device 10. Note that the CPU 31 executing the program is hardware.
  • The prediction device 20 may be realized, for example, by the computer 60 shown in FIG. 7. The computer 60 includes a CPU 61, a memory 62 serving as a temporary storage area, and a non-volatile storage device 63. The computer 60 also includes an input/output device 64, an R/W device 65 that controls reading and writing of data to and from a storage medium 69, and a communication I/F 66. The CPU 61, the memory 62, the storage device 63, the input/output device 64, the R/W device 65, and the communication I/F 66 are connected to one another via a bus 67.
  • The storage device 63 is, for example, an HDD, an SSD, or flash memory. The storage device 63, serving as a storage medium, stores a prediction program 70 for causing the computer 60 to function as the prediction device 20. The prediction program 70 has a distribution prediction process control instruction 71, a data prediction process control instruction 72, and a covariance calculation process control instruction 73. The storage device 63 also has an information storage area 80 that stores the information constituting the trained distribution predictor and the decoder of the autoencoder.
  • The CPU 61 reads the prediction program 70 from the storage device 63, loads it into the memory 62, and sequentially executes its control instructions. By executing the distribution prediction process control instruction 71, the CPU 61 operates as the distribution prediction unit 21 shown in FIG. 4; by executing the data prediction process control instruction 72, as the data prediction unit 22; and by executing the covariance calculation process control instruction 73, as the covariance calculation unit 23. The CPU 61 also reads information from the information storage area 80 and loads the trained distribution predictor and the decoder of the autoencoder into the memory 62. The computer 60 that has executed the prediction program 70 thereby functions as the prediction device 20. Note that the CPU 61 executing the program is hardware.
  • The functions realized by each of the machine learning program 40 and the prediction program 70 may also be realized by, for example, a semiconductor integrated circuit, more specifically an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or the like. The machine learning program 40 and the prediction program 70 are examples of the information processing program of the disclosed technology.
  • Next, the operation of the information processing system according to this embodiment is described. When the distribution predictor and the encoder and decoder of the autoencoder are trained, the machine learning device 10 executes the machine learning process shown in FIG. 8; when the prediction target data is predicted, the prediction device 20 executes the prediction process shown in FIG. 9. The machine learning process and the prediction process are examples of the information processing method of the disclosed technology. The machine learning process of FIG. 8 is described first.
  • In step S11, the first conversion unit 12 acquires the N-dimensional data s_t at time t as the input x, and the distribution prediction unit 11 acquires the time series data of the past period T (from t-T to t-1) as the context C.
  • In step S12, the first conversion unit 12 converts the input x into the K-dimensional latent variable y using the encoder f_φ of the autoencoder. In step S13, the second conversion unit 13 applies the noise ε to y to obtain the latent variable z. In step S14, the second conversion unit 13 converts the K-dimensional z into the N-dimensional output x̂ using the decoder g_θ of the autoencoder. In step S15, the loss calculation unit 14 calculates the restoration error D(x, x̂) between the input x and the output x̂.
  • In step S16, the distribution prediction unit 11 inputs the context C into the distribution predictor h_ψ to predict the mean μ_C and variance σ_C of the latent variables for the input x. In step S17, the distribution prediction unit 11 predicts the context-dependent prior distribution q(z|C) of the latent variable z from the predicted mean μ_C and variance σ_C.
  • In step S18, the loss calculation unit 14 calculates the encoded information amount D_KL(p(z|x) || q(z|C)) from the probability distribution p(z|x) of the latent variable z and the context-dependent prior q(z|C). In step S19, the loss calculation unit 14 calculates the weighted sum of the restoration error D(x, x̂) and the encoded information amount D_KL(p(z|x) || q(z|C)) as the loss function L_θ,φ,ψ(x, C). In step S20, the update unit 15 updates the parameter φ of the encoder f_φ, the parameter θ of the decoder g_θ, and the parameter ψ of the distribution predictor h_ψ so as to minimize L_θ,φ,ψ(x, C).
  • In step S21, the update unit 15 determines whether the machine learning has converged. For example, it may determine that the learning has converged when the number of updates of the parameters θ, φ, and ψ reaches a predetermined number, when the value of the loss function falls to or below a predetermined value, or when the difference between the previously calculated loss and the currently calculated loss falls to or below a predetermined value. If the learning has not converged, the process returns to step S12; if it has converged, the machine learning process ends.
  • The processing of steps S12 to S15 and that of steps S16 to S17 may be executed in the reverse order or in parallel. An end-to-end sketch of the training loop follows.
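Tying steps S11 through S21 together, a compact training loop might look as follows. It reuses DistributionPredictor and loss_fn from the earlier sketches; the synthetic series, window length T, and fixed epoch count standing in for the convergence test are all illustrative assumptions.

```python
import itertools
import torch
import torch.nn as nn

N, K, T, BETA = 8, 2, 10, 0.01                     # illustrative sizes

f_phi = nn.Sequential(nn.Linear(N, 64), nn.ReLU(), nn.Linear(64, K))
g_theta = nn.Sequential(nn.Linear(K, 64), nn.ReLU(), nn.Linear(64, N))
h_psi = DistributionPredictor(n_dim=N, k_dim=K)    # from the sketch above

series = torch.randn(500, N)                       # stand-in multidimensional series
opt = torch.optim.Adam(itertools.chain(f_phi.parameters(),
                                       g_theta.parameters(),
                                       h_psi.parameters()), lr=1e-3)

for epoch in range(100):                           # simplified convergence test (S21)
    for t in range(T, series.shape[0]):
        x = series[t].unsqueeze(0)                 # S11: input x = s_t
        C = series[t - T:t].unsqueeze(0)           # S11: context s_{t-T} .. s_{t-1}
        y = f_phi(x)                               # S12: y = f_phi(x)
        z = y + torch.randn_like(y) * (BETA / 2) ** 0.5  # S13: apply noise
        x_hat = g_theta(z)                         # S14: x_hat = g_theta(z)
        mu_c, sigma_c = h_psi(C)                   # S16/S17: prior q(z|C)
        loss = loss_fn(x, x_hat, y, mu_c, sigma_c, beta=BETA)  # S15, S18, S19
        opt.zero_grad()
        loss.backward()
        opt.step()                                 # S20: update phi, theta, psi
```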
  • Next, the prediction process of FIG. 9 is described. The prediction process uses the distribution predictor h_ψ with the final parameter ψ updated by the machine learning process and the decoder g_θ with the final parameter θ, that is, the trained distribution predictor h_ψ and decoder g_θ.
  • In step S31, the distribution prediction unit 21 acquires the time series data of the past period T (from t-T to t-1) as the context C for the data at time t, which is the data to be predicted. In step S32, the distribution prediction unit 21 inputs the context C into the trained distribution predictor h_ψ to predict the mean μ_C and variance σ_C of the latent variables for the prediction target data.
  • In step S33, the data prediction unit 22 inputs the predicted mean μ_C of the K-dimensional latent variables into the decoder g_θ of the trained autoencoder to obtain the predicted value x̃ of the prediction target data. In step S34, the covariance calculation unit 23 calculates the covariance Cov(x̃_i, x̃_j) of the predicted values from the predicted variance σ_C of the K-dimensional latent variables and the gradient obtained by numerical differentiation of the decoder. In step S35, the data prediction unit 22 outputs the predicted value x̃ and the covariance calculation unit 23 outputs the calculated covariance Cov(x̃_i, x̃_j), and the prediction process ends. A usage sketch follows.
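As a usage sketch, the prediction process of steps S31 to S35 then reduces to a few lines, reusing the trained h_psi and g_theta from the training-loop sketch and the predict_with_covariance helper above:

```python
with torch.no_grad():
    C = series[-T:].unsqueeze(0)                   # S31: latest T steps as context
    mu_c, sigma_c = h_psi(C)                       # S32: mu_C, sigma_C = h_psi(C)

x_tilde, cov = predict_with_covariance(g_theta, mu_c[0], sigma_c[0])  # S33, S34
print(x_tilde)                                     # S35: predicted values x~
print(cov.diagonal())                              # variances of each dimension
print(cov)                                         # covariances Cov(x_i~, x_j~)
```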
  • As described above, according to the information processing system of this embodiment, the machine learning device trains an autoencoder, comprising an encoder that converts multidimensional input data into latent variables and a decoder that restores the latent variables, so as to minimize a loss function. The loss function includes the restoration error of the autoencoder and the encoded information amount of the latent variables, calculated from the probability distribution generated by adding noise to the latent variables and from the prior distribution of the latent variables predicted from time series data of the input data over a predetermined past period. The prediction device then inputs latent variables, predicted from time series data of the multidimensional prediction target data over a predetermined past period, into the decoder of the trained autoencoder, thereby calculating the multidimensional predicted values of the prediction target data and the covariance between those predicted values.
  • As a result, as shown in FIG. 10, when predicting each data item from multidimensional time series data, the variance and covariance of the predicted values can be obtained along with the predicted values themselves. FIG. 10 shows an example in which the predicted value and variance of each item of two-dimensional time series data and the covariance between the items are calculated.
  • Because the covariance of the predicted values is calculated at the same time that the multidimensional time series data is predicted, the method is well suited to predicting data whose dimensions are correlated. For example, when predicting the stock prices of multiple stocks, the variance obtained for each stock's predicted value is useful for deciding which stocks to buy or sell, and the covariance between the predicted values of different stocks is useful for deciding in which combinations to buy or sell them.
  • FIG. 11 shows a comparison table between the method according to this embodiment (hereinafter, "this method") and the reference methods. Reference method 1 is the prediction method using a Gaussian process described above, reference method 2 is the method using an RNN or LSTM neural network, and reference method 3 is the isometric generative model. Reference method 1 has limited prediction accuracy for complex, context-dependent data; reference method 2 cannot calculate the variance simultaneously with the prediction; and reference method 3 is not suited to predicting time series data. This method resolves all of these problems and, in addition, can calculate the covariance between predicted values.
  • In the embodiment above, the machine learning device and the prediction device are configured as separate computers, but this is not limiting; they may be realized as a single information processing device on one computer, with a machine learning unit equivalent to the machine learning device and a prediction unit equivalent to the prediction device. Also, although the embodiment above stores (installs) the machine learning program and the prediction program in the storage device in advance, this is not limiting either: the programs according to the disclosed technology may be provided stored on a storage medium such as a CD-ROM, a DVD-ROM, or a USB memory.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention trains an autoencoder, which includes an encoder that converts multidimensional input data into latent variables and a decoder that reconstructs the latent variables, so as to minimize a loss function that includes the reconstruction error of the autoencoder and a coded information quantity of the latent variables, the latter calculated from a probability distribution generated by adding noise to the latent variables and from a prior distribution of the latent variables predicted from time series data of the input data over a prescribed past period. The predicted values of the multidimensional data to be predicted and the covariance of those predicted values are then calculated by inputting, into the decoder of the trained autoencoder, the latent variables predicted from time series data of the multidimensional data to be predicted over the prescribed past period.

Description

情報処理プログラム、装置、及び方法Information processing program, device, and method
 開示の技術は、情報処理プログラム、情報処理装置、及び情報処理方法に関する。 The disclosed technology relates to an information processing program, an information processing device, and an information processing method.
 従来、多次元の過去の時系列データに基づいて、各データの今後の値を予測することが行われている。例えば、複数銘柄の株価の予測において、過去のデータを元に次の時刻の各銘柄の株価を予測するような場合である。  Traditionally, future values of each data item have been predicted based on multidimensional past time series data. For example, when predicting the stock prices of multiple stocks, the stock price of each stock at the next time is predicted based on past data.
 このような時系列データの予測を行う手法として、ガウス過程を用いた予測手法が存在する。この手法では、時系列データを確率分布として予測することで、予測値と分散とを同時に算出可能である。また、回帰型ニューラルネットワーク(RNN:Recurrent neural network)や長・短期記憶(LSTM:Long Short Term Memory)ニューラルネットワークを利用した予測モデルにより、時系列データの予測を行う手法も存在する。この手法では、ニューラルネットワークを用いることで、高精度な予測が可能である。 One method for predicting such time series data is to use a Gaussian process. With this method, it is possible to simultaneously calculate the predicted value and variance by predicting the time series data as a probability distribution. There are also methods for predicting time series data using a predictive model that uses a recurrent neural network (RNN) or a long short-term memory (LSTM) neural network. With this method, the use of a neural network allows for highly accurate predictions.
 また、多次元データの予測を行う手法として、等長性を持つ生成モデル型ディープラーニング技術を用いた手法も存在する。この手法では、データの潜在表現を確率分布として扱う高精度な機械学習モデルにより、予測値とその予測値の分散とが同時に得られる。 There is also a method for predicting multidimensional data that uses generative model-based deep learning technology with isometric properties. With this method, a highly accurate machine learning model that treats the latent representation of the data as a probability distribution can be used to simultaneously obtain a predicted value and the variance of that predicted value.
国際公開第2021/059348号International Publication No. 2021/059348 国際公開第2021/059349号International Publication No. 2021/059349
 しかしながら、ガウス過程による予測手法では、相関を持つ複雑なデータに対する予測精度に限界があるという問題がある。具体的には、この手法では、単純な回帰モデルを用いることが基本であるため、予測精度の向上のためには、手動で適切なカーネルを選定する等の対処が必要となる。また、この手法では、定常過程の予測を想定しており、コンテキスト依存性を持つ複雑なデータの予測には不適である。また、RNNやLSTMニューラルネットワークを利用した手法では、データの予測のみを行うため、予測値の分散を得ることができない。また、等長性を持つ生成モデル型ディープラーニング技術を用いた手法は、時系列データの予測には不適であるという問題がある。 However, prediction methods using Gaussian processes have the problem that they are limited in their prediction accuracy for complex correlated data. Specifically, because this method is based on using a simple regression model, measures such as manually selecting an appropriate kernel are necessary to improve prediction accuracy. In addition, this method assumes the prediction of a stationary process and is unsuitable for predicting complex data that is context-dependent. Furthermore, methods that use RNN or LSTM neural networks only predict data, so the variance of the predicted value cannot be obtained. Additionally, methods that use generative model-type deep learning technology with isometricity have the problem that they are unsuitable for predicting time series data.
 一つの側面として、開示の技術は、多次元の時系列データに基づく各データの予測において、予測値と同時に予測値の共分散を得ることを目的とする。 In one aspect, the disclosed technology aims to obtain the covariance of predicted values at the same time as the predicted values when predicting each data item based on multidimensional time series data.
 一つの態様として、開示の技術は、多次元の入力データを潜在変数に変換するエンコーダ、及び前記潜在変数を復元するデコーダを含むオートエンコーダを、損失関数を最小化するように訓練する。損失関数は、前記オートエンコーダの復元誤差を含む。また、損失関数は、前記潜在変数にノイズを加えて生成された確率分布と前記入力データについての過去所定期間分の時系列データに基づいて予測された前記潜在変数の事前分布とに基づいて算出された前記潜在変数の符号化情報量を含む。また、開示の技術は、訓練された前記オートエンコーダのデコーダに、多次元の予測対象データについての過去所定期間分の時系列データに基づいて予測される潜在変数を入力する。これにより、開示の技術は、前記多次元の予測対象データの予測値、及び前記予測値間の共分散を算出する。 In one embodiment, the disclosed technology trains an autoencoder that includes an encoder that converts multidimensional input data into latent variables and a decoder that restores the latent variables to minimize a loss function. The loss function includes a restoration error of the autoencoder. The loss function also includes an encoded information amount of the latent variables calculated based on a probability distribution generated by adding noise to the latent variables and a prior distribution of the latent variables predicted based on time series data for the input data for a predetermined period of time in the past. The disclosed technology also inputs latent variables predicted based on time series data for multidimensional prediction target data for a predetermined period of time to the decoder of the trained autoencoder. In this way, the disclosed technology calculates the predicted value of the multidimensional prediction target data and the covariance between the predicted values.
 一つの側面として、多次元の時系列データに基づく各データの予測において、予測値と同時に予測値の共分散を得ることができる、という効果を有する。 One aspect is that when predicting each piece of data based on multidimensional time series data, it is possible to obtain the covariance of the predicted values at the same time as the predicted values.
既存技術である等長性生成モデルの構成例を示す図である。FIG. 1 is a diagram illustrating an example of the configuration of an isometric generation model that is an existing technology. 本実施形態に係る機械学習装置の機能ブロック図である。FIG. 1 is a functional block diagram of a machine learning device according to an embodiment of the present invention. 機械学習装置の構成例を示す図である。FIG. 1 illustrates an example of the configuration of a machine learning device. 本実施形態に係る予測装置の機能ブロック図である。FIG. 2 is a functional block diagram of the prediction device according to the present embodiment. 予測装置の構成例を示す図である。FIG. 1 illustrates an example of the configuration of a prediction device. 機械学習装置として機能するコンピュータの概略構成を示すブロック図である。FIG. 1 is a block diagram showing a schematic configuration of a computer that functions as a machine learning device. 予測装置として機能するコンピュータの概略構成を示すブロック図である。FIG. 1 is a block diagram showing a schematic configuration of a computer functioning as a prediction device. 機械学習処理の一例を示すフローチャートである。1 is a flowchart illustrating an example of a machine learning process. 予測処理の一例を示すフローチャートである。13 is a flowchart illustrating an example of a prediction process. 多次元の時系列データについての予測値、分散、及び共分散が算出されることを示す図である。FIG. 1 illustrates how forecast values, variances, and covariances are calculated for multi-dimensional time series data. 本手法と参考手法との比較表である。1 is a comparison table between the present method and the reference method.
 以下、図面を参照して、開示の技術に係る実施形態の一例を説明する。 Below, an example of an embodiment of the disclosed technology is described with reference to the drawings.
 まず、本実施形態の概要について説明する。本実施形態では、等長性を持つ生成モデル型ディープラーニング技術を用いた手法(以下、「等長性生成モデル」という)に、コンテキスト依存の潜在変数の事前分布を導入する。 First, an overview of this embodiment will be described. In this embodiment, a prior distribution of context-dependent latent variables is introduced into a method using a generative model-type deep learning technology with isometry (hereinafter, referred to as an "isometry generative model").
 ここで、既存技術である等長性生成モデル(例えば、非特許文献1及び2)について説明する。図1に示すように、等長性生成モデルは、入力データxを符号化して潜在変数zを得るエンコーダと、潜在変数zを復元して出力x^(図及び後述する数式内では、「x」の上に「^(ハット)」の表記)に変換するデコーダとを含むオートエンコーダで構成される。エンコーダのパラメータをφ、デコーダのパラメータをθとすると、fφ(x)=z、gθ(z)=x^である。デコーダには、潜在変数zにノイズε~N(0,(β/2)I)が印加された潜在変数zが入力される。なお、βはハイパーパラメータ、Iは単位行列である。 Here, an isometric generative model (for example, Non-Patent Documents 1 and 2), which is an existing technology, will be described. As shown in FIG. 1, the isometric generative model is composed of an autoencoder including an encoder that encodes input data x to obtain a latent variable z, and a decoder that restores the latent variable z and converts it into an output x^ (in the figure and in the formulas described later, a "^ (hat)" is written above "x"). If the parameter of the encoder is φ and the parameter of the decoder is θ, then f φ (x) = z and g θ (z) = x^. The decoder receives the latent variable z to which noise ε ~ N(0, (β/2)I) has been applied. Note that β is a hyperparameter and I is a unit matrix.
 また、等長性生成モデルは、入力xをエンコーダfφで変換した潜在変数zの確率分布pψ(z)を算出し、潜在変数zの符号化情報量を、R=-log(pψ(z))として算出する。さらに、等長性生成モデルは、入力xと出力x^との誤差である復元誤差D(x,x^)を算出し、下記(1)式に示す最適化により、エンコーダfφ、デコーダgθ、及び潜在変数zの確率分布pψを訓練する。 The isometric generative model also calculates a probability distribution p ψ (z) of a latent variable z obtained by converting an input x with an encoder f φ , and calculates the amount of encoded information of the latent variable z as R = -log(p ψ (z)). Furthermore, the isometric generative model calculates a restoration error D(x, x^), which is the error between the input x and the output x^, and trains the encoder f φ , the decoder g θ , and the probability distribution p ψ of the latent variable z by optimization shown in the following formula (1).
 なお、Ex~p(x),ε~N(0,(β/2)I)[X]は、確率分布p(x)から入力xを、正規分布N(0,(β/2)I)からノイズεをそれぞれ複数サンプリングした場合のXの平均を表す。このように、等長性生成モデルは、復元誤差Dと同時に、潜在変数zの符号化情報量Rを削減することで、入力データの特徴を保ちつつ低次元な潜在表現を獲得可能であり、また、等長性により出力の確率分布が算出可能である。 Here, E x~p(x), ε~N(0,(β/2)I) [X] represents the average of X when the input x is sampled multiple times from the probability distribution p(x) and the noise ε is sampled multiple times from the normal distribution N(0,(β/2)I). In this way, the isometric generative model can acquire a low-dimensional latent representation while maintaining the characteristics of the input data by reducing the amount of encoded information R of the latent variable z at the same time as the recovery error D, and can also calculate the probability distribution of the output by isometricity.
 しかし、上述したように、上記の等長性生成モデルは、時系列データの予測に不適である。そこで、本実施形態では、上記の等長性生成モデルに、過去の時系列データをコンテキストとして与えることにより、時系列データの予測を可能とする。また、本実施形態では、等長性生成モデルの利点を活かし、データ予測を高精度に行うと共に、予測値と同時に予測値の分散も算出可能である。さらに、本実施形態では、多次元データの各次元のデータの予測値間の共分散も算出可能である。 However, as mentioned above, the above isometry generative model is not suitable for predicting time series data. Therefore, in this embodiment, past time series data is provided as context to the above isometry generative model, thereby making it possible to predict time series data. Furthermore, in this embodiment, by taking advantage of the advantages of the isometry generative model, data prediction can be performed with high accuracy, and the variance of the predicted value can be calculated at the same time as the predicted value. Furthermore, in this embodiment, the covariance between the predicted values of data for each dimension of multidimensional data can also be calculated.
 以下、本実施形態に係る情報処理システムについて詳述する。 The information processing system according to this embodiment is described in detail below.
 本実施形態に係る情報処理システムは、機械学習装置と予測装置とを含む。 The information processing system according to this embodiment includes a machine learning device and a prediction device.
 まず、機械学習装置について説明する。図2に示すように、機械学習装置10は、機能的には、分布予測部11と、第1変換部12と、第2変換部13と、損失算出部14と、更新部15とを含む。以下、図3に示す機械学習装置10の構成例を参照し、各機能部について説明する。 First, the machine learning device will be described. As shown in FIG. 2, the machine learning device 10 functionally includes a distribution prediction unit 11, a first conversion unit 12, a second conversion unit 13, a loss calculation unit 14, and an update unit 15. Below, each functional unit will be described with reference to the configuration example of the machine learning device 10 shown in FIG. 3.
 分布予測部11は、入力データについての過去所定期間分の時系列データに基づいて、潜在変数の事前分布を予測する。具体的には、分布予測部11は、分布予測器に、入力データについての過去所定期間分の時系列データを入力することにより、入力データについての潜在変数の平均及び分散を予測する。図3の例において、時刻tにおけるN次元のデータs={s ,s ,・・・,s }を入力xとし、過去の期間Tの時系列データをコンテキストC={st-1,st-2,・・・,st-T}とする。また、図3の例では、hψが分布予測器であり、ψは分布予測器hψのパラメータである。分布予測器hψで予測される平均μ及び分散σは、μ,σ=hψ(C)と表される。平均μは、μ={μ ,・・・,μ ,・・・,μ }、分散σは、σ={σ ,・・・,σ ,・・・,σ }である。kは、潜在変数の各次元のインデックスであり、Kは、潜在変数の次元数であり、K<Nである。 The distribution prediction unit 11 predicts the prior distribution of the latent variable based on the time series data of the input data for a predetermined period of time in the past. Specifically, the distribution prediction unit 11 predicts the mean and variance of the latent variable for the input data by inputting the time series data of the input data for a predetermined period of time in the past to the distribution predictor. In the example of FIG. 3, N-dimensional data s t = {s t 1 , s t 2 , ..., s t N } at time t is input x, and time series data of the past period T is context C = {s t-1 , s t-2 , ..., s t-T }. In the example of FIG. 3, h ψ is the distribution predictor, and ψ is a parameter of the distribution predictor h ψ . The mean μ C and variance σ C predicted by the distribution predictor h ψ are expressed as μ C , σ C = h ψ (C). The mean μ C is μ C = {μ 1 C , ..., μ k C , ..., μ K C }, and the variance σ C is σ C = {σ 1 C , ..., σ k C , ..., σ K C }, where k is the index of each dimension of the latent variable, K is the number of dimensions of the latent variable, and K<N.
 分布予測部11は、分布予測器hψで予測される平均μ及び分散σで表される、潜在変数の次元数分の確率分布を混合することにより、潜在変数zについてのコンテキストCに依存した事前分布q(z|C)を予測する。すなわち、事前分布q(z|C)は、下記(2)式で表される。なお、(2)式では、各次元の潜在変数zの確率分布を、平均μ 、分散σ の正規分布とする例を示している。 The distribution prediction unit 11 predicts a prior distribution q (z|C) for a latent variable z that depends on a context C by mixing probability distributions for the number of dimensions of the latent variable, represented by the mean μ C and variance σ C predicted by the distribution predictor h ψ. That is, the prior distribution q(z|C) is represented by the following formula (2). Note that formula (2) shows an example in which the probability distribution of the latent variable z k of each dimension is a normal distribution with mean μ k C and variance σ k C.
 第1変換部12は、N次元の入力データである入力xを、オートエンコーダのエンコーダを用いて、K次元の潜在変数に変換する。図3の例では、fφがエンコーダであり、φはエンコーダfφのパラメータである。エンコーダfφによる変換は、y=fφ(x)と表される。なお、y={y ,・・・,y ,・・・,y }である。 The first conversion unit 12 converts the input x, which is N-dimensional input data, into a K-dimensional latent variable using the encoder of the autoencoder. In the example of FIG. 3, f φ is the encoder, and φ is a parameter of the encoder f φ . The conversion by the encoder f φ is expressed as y=f φ (x), where y={y t 1 ,..., y t k ,..., y t K }.
 第2変換部13は、潜在変数yにノイズεを印加して潜在変数zに変換する。図3の例では、ノイズεは、平均0、分散(β/2)Iの正規分布に従う。βはハイパーパラメータ、Iは単位行列である。したがって、入力xについての潜在変数zの確率分布p(z|x)は、z~p(z|x)=N(z;y,(β/2)I)となる。そして、第2変換部13は、K次元の潜在変数zを、オートエンコーダのデコーダを用いて、N次元の出力x^に変換する。図3の例では、gθがデコーダであり、θはデコーダgθのパラメータである。デコーダgθによる変換は、x^=gθ(z)と表される。 The second conversion unit 13 applies noise ε to the latent variable y to convert it into a latent variable z. In the example of FIG. 3, the noise ε follows a normal distribution with a mean of 0 and a variance of (β/2)I. β is a hyperparameter, and I is a unit matrix. Therefore, the probability distribution p(z|x) of the latent variable z for the input x is z~p(z|x)=N(z;y,(β/2)I). Then, the second conversion unit 13 converts the K-dimensional latent variable z into an N-dimensional output x^ using a decoder of an autoencoder. In the example of FIG. 3, g θ is the decoder, and θ is a parameter of the decoder g θ . The conversion by the decoder g θ is expressed as x^=g θ (z).
 損失算出部14は、オートエンコーダの復元誤差と、潜在変数の符号化情報量とを含む損失関数を算出する。具体的には、損失算出部14は、入力xと出力x^との二乗誤差等を、復元誤差D(x,x^)として算出する。また、損失算出部14は、潜在変数yにノイズεを印加した潜在変数zの確率分布p(z|x)と、コンテキストCに依存する潜在変数の事前分布q(z|C)との差を表す情報量を、潜在変数の符号化情報量として算出する。図3では、確率分布p(z|x)と事前分布q(z|C)とのカルバック・ライブラー情報量を、潜在変数の符号化情報量DKL(p(z|x)||q(z|C))として算出する例を示している。 The loss calculation unit 14 calculates a loss function including the recovery error of the autoencoder and the amount of coded information of the latent variable. Specifically, the loss calculation unit 14 calculates the square error between the input x and the output x^ as the recovery error D(x, x^). In addition, the loss calculation unit 14 calculates the amount of information representing the difference between the probability distribution p(z|x) of the latent variable z obtained by applying noise ε to the latent variable y and the prior distribution q(z|C) of the latent variable depending on the context C as the amount of coded information of the latent variable. FIG. 3 shows an example in which the Kullback-Leibler divergence between the probability distribution p(z|x) and the prior distribution q(z|C) is calculated as the amount of coded information of the latent variable D KL (p(z|x)∥q(z|C)).
 損失算出部14は、下記(3)式に示すように、算出した復元誤差D(x,x^)と、潜在変数の符号化情報量DKL(p(z|x)||q(z|C))との重み付き和を、損失関数Lθ,φ,ψ(x,C)として算出する。 The loss calculation unit 14 calculates the weighted sum of the calculated recovery error D(x, x^) and the amount of encoded information of the latent variable D KL (p(z|x)∥q(z|C)) as a loss function L θ,φ,ψ (x, C) as shown in the following equation (3).
 更新部15は、損失算出部14で算出された損失関数を最小化するように、オートエンコーダのエンコーダfφ、デコーダgθ、及び分布予測器hψを訓練する。具体的には、下記(4)式に示すように、損失関数Lθ,φ,ψ(x,C)を最小化するように、エンコーダfφのパラメータφ、デコーダgθのパラメータθ、及び分布予測器hψのパラメータψを更新する。 The update unit 15 trains the encoder f φ , the decoder g θ , and the distribution predictor h ψ of the autoencoder so as to minimize the loss function calculated by the loss calculation unit 14. Specifically, as shown in the following formula (4), the parameter φ of the encoder f φ , the parameter θ of the decoder g θ , and the parameter ψ of the distribution predictor h ψ are updated so as to minimize the loss function L θ,φ,ψ (x, C).
 次に、予測装置について説明する。図4に示すように、予測装置20は、機能的には、分布予測部21と、データ予測部22と、共分散算出部23とを含む。以下、図5に示す予測装置20の構成例を参照し、各機能部について説明する。 Next, the prediction device will be described. As shown in FIG. 4, the prediction device 20 functionally includes a distribution prediction unit 21, a data prediction unit 22, and a covariance calculation unit 23. Below, each functional unit will be described with reference to the configuration example of the prediction device 20 shown in FIG. 5.
 データ予測部22は、機械学習装置10により訓練された分布予測器に、予測対象データである時刻tのデータについての過去の期間Tの時系列データ、すなわちコンテキストCを入力することで、予測対象データについての潜在変数の平均及び分散を予測する。図5の例では、hψが分布予測器であり、分布予測器hψで予測される平均μ及び分散σは、μ,σ=hψ(C)と表される。 The data prediction unit 22 predicts the mean and variance of latent variables for the prediction target data by inputting time series data for a past period T for data at time t, which is the prediction target data, i.e., context C, to a distribution predictor trained by the machine learning device 10. In the example of Fig. 5, h ψ is the distribution predictor, and the mean μ C and variance σ C predicted by the distribution predictor h ψ are expressed as μ C , σ C = h ψ (C).
 データ予測部22は、分布予測部21で予測されたK次元の潜在変数の平均μを、機械学習装置10により訓練されたオートエンコーダのデコーダに入力することで、予測対象データを予測する。図5の例では、gθがデコーダであり、θはデコーダgθのパラメータである。デコーダgθによる変換は、x=gθ(μ)と表される。x(図及び後述する数式内では、「x」の上に「~(チルダ)」の表記)は、予測対象データの予測値である。データ予測部22は、予測した予測対象データの予測値xを出力する。 The data prediction unit 22 predicts the prediction target data by inputting the mean μ C of the K-dimensional latent variables predicted by the distribution prediction unit 21 to the decoder of the autoencoder trained by the machine learning device 10. In the example of FIG. 5, g θ is the decoder, and θ is a parameter of the decoder g θ . The conversion by the decoder g θ is expressed as x =g θC ). x ∼ (in the figures and in the formulas described later, the notation "∼ (tilde)" above "x") is the predicted value of the prediction target data. The data prediction unit 22 outputs the predicted value x of the prediction target data.
 共分散算出部23は、分布予測部21で予測されたK次元の潜在変数の分散σと、デコーダの数値微分により得られた勾配から、下記(5)式に示すように、予測値の共分散Cov(x ,x )を算出する。i及びjは、予測値xの各次元のインデックスである。共分散算出部23は、算出した共分散Cov(x ,x )を出力する。 The covariance calculation unit 23 calculates the covariance Cov(x i ~ , x j ~ ) of the predicted values from the variance σ C of the K-dimensional latent variables predicted by the distribution prediction unit 21 and the gradient obtained by numerical differentiation of the decoder, as shown in the following formula (5) . i and j are indexes of each dimension of the predicted value x ~ . The covariance calculation unit 23 outputs the calculated covariance Cov( x i ~ , x j ~ ) .
 機械学習装置10は、例えば図6に示すコンピュータ30で実現されてよい。コンピュータ30は、CPU(Central Processing Unit)31と、一時記憶領域としてのメモリ32と、不揮発性の記憶装置33とを備える。また、コンピュータ30は、入力装置、表示装置等の入出力装置34と、記憶媒体39に対するデータの読み込み及び書き込みを制御するR/W(Read/Write)装置35とを備える。また、コンピュータ30は、インターネット等のネットワークに接続される通信I/F(Interface)36を備える。CPU31、メモリ32、記憶装置33、入出力装置34、R/W装置35、及び通信I/F36は、バス37を介して互いに接続される。 The machine learning device 10 may be realized, for example, by a computer 30 shown in FIG. 6. The computer 30 includes a CPU (Central Processing Unit) 31, a memory 32 as a temporary storage area, and a non-volatile storage device 33. The computer 30 also includes an input/output device 34 such as an input device and a display device, and an R/W (Read/Write) device 35 that controls the reading and writing of data from and to a storage medium 39. The computer 30 also includes a communication I/F (Interface) 36 that is connected to a network such as the Internet. The CPU 31, memory 32, storage device 33, input/output device 34, R/W device 35, and communication I/F 36 are connected to each other via a bus 37.
 記憶装置33は、例えば、HDD(Hard Disk Drive)、SSD(Solid State Drive)、フラッシュメモリ等である。記憶媒体としての記憶装置33には、コンピュータ30を、機械学習装置10として機能させるための機械学習プログラム40が記憶される。機械学習プログラム40は、分布予測プロセス制御命令41と、第1変換プロセス制御命令42と、第2変換プロセス制御命令43と、損失算出プロセス制御命令44と、更新プロセス制御命令45とを有する。また、記憶装置33は、分布予測器、オートエンコーダのエンコーダ及びデコーダを構成する情報が記憶される情報記憶領域50を有する。 The storage device 33 is, for example, a hard disk drive (HDD), a solid state drive (SSD), flash memory, etc. The storage device 33, which serves as a storage medium, stores a machine learning program 40 for causing the computer 30 to function as the machine learning device 10. The machine learning program 40 has a distribution prediction process control instruction 41, a first conversion process control instruction 42, a second conversion process control instruction 43, a loss calculation process control instruction 44, and an update process control instruction 45. The storage device 33 also has an information storage area 50 in which information constituting the distribution predictor, the encoder and the decoder of the autoencoder is stored.
 CPU31は、機械学習プログラム40を記憶装置33から読み出してメモリ32に展開し、機械学習プログラム40が有する制御命令を順次実行する。CPU31は、分布予測プロセス制御命令41を実行することで、図2に示す分布予測部11として動作する。また、CPU31は、第1変換プロセス制御命令42を実行することで、図2に示す第1変換部12として動作する。また、CPU31は、第2変換プロセス制御命令43を実行することで、図2に示す第2変換部13として動作する。また、CPU31は、損失算出プロセス制御命令44を実行することで、図2に示す損失算出部14として動作する。また、CPU31は、更新プロセス制御命令45を実行することで、図2に示す更新部15として動作する。また、CPU31は、情報記憶領域50から情報を読み出して、分布予測器、オートエンコーダのエンコーダ及びデコーダをメモリ32に展開する。これにより、機械学習プログラム40を実行したコンピュータ30が、機械学習装置10として機能することになる。なお、プログラムを実行するCPU31はハードウェアである。 The CPU 31 reads the machine learning program 40 from the storage device 33, expands it in the memory 32, and sequentially executes the control instructions of the machine learning program 40. The CPU 31 operates as the distribution prediction unit 11 shown in FIG. 2 by executing the distribution prediction process control instruction 41. The CPU 31 also operates as the first conversion unit 12 shown in FIG. 2 by executing the first conversion process control instruction 42. The CPU 31 also operates as the second conversion unit 13 shown in FIG. 2 by executing the second conversion process control instruction 43. The CPU 31 also operates as the loss calculation unit 14 shown in FIG. 2 by executing the loss calculation process control instruction 44. The CPU 31 also operates as the update unit 15 shown in FIG. 2 by executing the update process control instruction 45. The CPU 31 also reads information from the information storage area 50 and expands the distribution predictor and the encoder and decoder of the autoencoder in the memory 32. As a result, the computer 30 that has executed the machine learning program 40 functions as the machine learning device 10. The CPU 31 that executes the program is hardware.
 予測装置20は、例えば図7に示すコンピュータ60で実現されてよい。コンピュータ60は、CPU61と、一時記憶領域としてのメモリ62と、不揮発性の記憶装置63とを備える。また、コンピュータ60は、入出力装置64と、記憶媒体69に対するデータの読み込み及び書き込みを制御するR/W装置65と、通信I/F66とを備える。CPU61、メモリ62、記憶装置63、入出力装置64、R/W装置65、及び通信I/F66は、バス67を介して互いに接続される。 The prediction device 20 may be realized, for example, by a computer 60 shown in FIG. 7. The computer 60 includes a CPU 61, a memory 62 as a temporary storage area, and a non-volatile storage device 63. The computer 60 also includes an input/output device 64, an R/W device 65 that controls the reading and writing of data from and to a storage medium 69, and a communication I/F 66. The CPU 61, memory 62, storage device 63, input/output device 64, R/W device 65, and communication I/F 66 are connected to each other via a bus 67.
 記憶装置63は、例えば、HDD、SSD、フラッシュメモリ等である。記憶媒体としての記憶装置63には、コンピュータ60を、予測装置20として機能させるための予測プログラム70が記憶される。予測プログラム70は、分布予測プロセス制御命令71と、データ予測プロセス制御命令72と、共分散算出プロセス制御命令73とを有する。また、記憶装置63は、訓練済みの分布予測器、及びオートエンコーダのデコーダを構成する情報が記憶される情報記憶領域80を有する。 The storage device 63 is, for example, an HDD, SSD, flash memory, etc. A prediction program 70 for causing the computer 60 to function as the prediction device 20 is stored in the storage device 63 as a storage medium. The prediction program 70 has a distribution prediction process control instruction 71, a data prediction process control instruction 72, and a covariance calculation process control instruction 73. The storage device 63 also has an information storage area 80 in which information constituting the trained distribution predictor and the decoder of the autoencoder is stored.
 CPU61は、予測プログラム70を記憶装置63から読み出してメモリ62に展開し、予測プログラム70が有する制御命令を順次実行する。CPU61は、分布予測プロセス制御命令71を実行することで、図4に示す分布予測部21として動作する。また、CPU61は、データ予測プロセス制御命令72を実行することで、図4に示すデータ予測部22として動作する。また、CPU61は、共分散算出プロセス制御命令73を実行することで、図4に示す共分散算出部23として動作する。また、CPU61は、情報記憶領域80から情報を読み出して、訓練済みの分布予測器、及びオートエンコーダのデコーダをメモリ62に展開する。これにより、予測プログラム70を実行したコンピュータ60が、予測装置20として機能することになる。なお、プログラムを実行するCPU61はハードウェアである。 The CPU 61 reads out the prediction program 70 from the storage device 63, loads it in the memory 62, and sequentially executes the control instructions of the prediction program 70. The CPU 61 operates as the distribution prediction unit 21 shown in FIG. 4 by executing the distribution prediction process control instruction 71. The CPU 61 also operates as the data prediction unit 22 shown in FIG. 4 by executing the data prediction process control instruction 72. The CPU 61 also operates as the covariance calculation unit 23 shown in FIG. 4 by executing the covariance calculation process control instruction 73. The CPU 61 also reads out information from the information storage area 80, and loads the trained distribution predictor and the autoencoder decoder in the memory 62. In this way, the computer 60 that has executed the prediction program 70 functions as the prediction device 20. The CPU 61 that executes the program is hardware.
 なお、機械学習プログラム40及び予測プログラム70の各々により実現される機能は、例えば半導体集積回路、より詳しくはASIC(Application Specific Integrated Circuit)、FPGA(Field-Programmable Gate Array)等で実現されてもよい。機械学習プログラム40及び予測プログラム70は、開示の技術の情報処理プログラムの一例である。 The functions realized by each of the machine learning program 40 and the prediction program 70 may be realized, for example, by a semiconductor integrated circuit, more specifically, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), etc. The machine learning program 40 and the prediction program 70 are examples of information processing programs of the disclosed technology.
 次に、本実施形態に係る情報処理システムの動作について説明する。分布予測器、オートエンコーダのエンコーダ及びデコーダの訓練時において、機械学習装置10が、図8に示す機械学習処理を実行する。また、予測対象データの予測時において、予測装置20が、図9に示す予測処理を実行する。なお、機械学習処理及び予測処理は、開示の技術の情報処理方法の一例である。 Next, the operation of the information processing system according to this embodiment will be described. When training the distribution predictor and the encoder and decoder of the autoencoder, the machine learning device 10 executes the machine learning process shown in FIG. 8. When predicting the data to be predicted, the prediction device 20 executes the prediction process shown in FIG. 9. Note that the machine learning process and the prediction process are examples of the information processing method of the disclosed technology.
 まず、図8に示す機械学習処理について説明する。 First, we will explain the machine learning process shown in Figure 8.
 ステップS11で、第1変換部12が、入力xとして、時刻tにおけるN次元のデータsを取得する。また、分布予測部11が、コンテキストCとして、過去の期間T(t-t~t-1まで)の時系列データを取得する。 In step S11, the first conversion unit 12 acquires N-dimensional data s t at time t as an input x. The distribution prediction unit 11 acquires time-series data for a past period T (from t−t to t−1) as a context C.
 次に、ステップS12で、第1変換部12が、入力xを、オートエンコーダのエンコーダfφを用いて、K次元の潜在変数yに変換する。次に、ステップS13で、第2変換部13が、潜在変数yにノイズεを印加して潜在変数zに変換する。次に、ステップS14で、第2変換部13が、K次元の潜在変数zを、オートエンコーダのデコーダgθを用いて、N次元の出力x^に変換する。次に、ステップS15で、損失算出部14が、入力xと出力x^との復元誤差D(x,x^)を算出する。 Next, in step S12, the first conversion unit 12 converts the input x into a K-dimensional latent variable y using the encoder f φ of the autoencoder. Next, in step S13, the second conversion unit 13 applies noise ε to the latent variable y to convert it into a latent variable z. Next, in step S14, the second conversion unit 13 converts the K-dimensional latent variable z into an N-dimensional output x^ using a decoder g θ of the autoencoder. Next, in step S15, the loss calculation unit 14 calculates the recovery error D(x, x^) between the input x and the output x^.
 次に、ステップS16で、分布予測部11が、コンテキストCを分布予測器hψに入力することにより、入力xについての潜在変数の平均μ及び分散σを予測する。次に、ステップS17で、分布予測部11が、予測した平均μ及び分散σから、潜在変数zについてのコンテキストCに依存した事前分布q(z|C)を予測する。 Next, in step S16, the distribution prediction unit 11 predicts the mean μ C and variance σ C of the latent variable for the input x by inputting the context C to the distribution predictor h ψ . Next, in step S17, the distribution prediction unit 11 predicts a prior distribution q(z|C) that depends on the context C for the latent variable z from the predicted mean μ C and variance σ C.
 次に、ステップS18で、損失算出部14が、潜在変数yにノイズεを印加した潜在変数zの確率分布p(z|x)と、コンテキストCに依存する潜在変数の事前分布q(z|C)とから、潜在変数の符号化情報量DKL(p(z|x)||q(z|C))を算出する。次に、ステップS19で、損失算出部14が、算出した復元誤差D(x,x^)と、潜在変数の符号化情報量DKL(p(z|x)||q(z|C))との重み付き和を、損失関数Lθ,φ,ψ(x,C)として算出する。次に、ステップS20で、更新部15が、損失関数Lθ,φ,ψ(x,C)を最小化するように、エンコーダfφのパラメータφ、デコーダgθのパラメータθ、及び分布予測器hψのパラメータψを更新する。 Next, in step S18, the loss calculation unit 14 calculates the amount of coded information D KL (p(z|x )∥q (z|C)) of the latent variable from the probability distribution p(z|x) of the latent variable z obtained by applying noise ε to the latent variable y and the prior distribution q(z|C) of the latent variable depending on the context C. Next, in step S19, the loss calculation unit 14 calculates the weighted sum of the calculated recovery error D(x, x^) and the amount of coded information D KL (p(z|x)∥q(z|C)) of the latent variable as the loss function L θ,φ,ψ (x,C). Next, in step S20, the update unit 15 updates the parameter φ of the encoder f φ , the parameter θ of the decoder g θ , and the parameter ψ of the distribution predictor h ψ so as to minimize the loss function L θ,φ, ψ (x,C).
 Next, in step S21, the update unit 15 determines whether the machine learning has converged. For example, the update unit 15 may determine that the machine learning has converged when the number of updates of the parameters θ, φ, and ψ reaches a predetermined number, when the value of the loss function falls to or below a predetermined value, or when the difference between the previously calculated loss function and the currently calculated loss function falls to or below a predetermined value. If the machine learning has not converged, the process returns to step S12; if it has converged, the machine learning process ends.
 Note that the processing of steps S12 to S15 and the processing of steps S16 to S17 of the machine learning process may be executed in the reverse order or in parallel.
 Next, the prediction process shown in FIG. 9 will be described. The prediction process is executed using the distribution predictor h_ψ in which the final parameter ψ updated by the above machine learning process is set, and the decoder g_θ of the autoencoder in which the final parameter θ is set, that is, the trained distribution predictor h_ψ and decoder g_θ.
 In step S31, the distribution prediction unit 21 acquires time-series data for a past period T (from t−T to t−1) as the context C for the data at time t, which is the prediction target data. Next, in step S32, the distribution prediction unit 21 predicts the mean μ_C and variance σ_C of the latent variables for the prediction target data by inputting the context C to the trained distribution predictor h_ψ.
 Next, in step S33, the data prediction unit 22 predicts the predicted value x of the prediction target data by inputting the predicted mean μ_C of the K-dimensional latent variables to the decoder g_θ of the trained autoencoder. Next, in step S34, the covariance calculation unit 23 calculates the covariance Cov(x_i, x_j) of the predicted values from the variance σ_C of the K-dimensional latent variables predicted by the distribution prediction unit 21 and the gradient obtained by numerical differentiation of the decoder. Next, in step S35, the data prediction unit 22 outputs the predicted value x, the covariance calculation unit 23 outputs the calculated covariance Cov(x_i, x_j), and the prediction process ends.
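 One standard way to realize steps S33 and S34, reusing the stand-in g_theta, mu_C, and sigma_C from the training sketch above, is to linearize the decoder around the predicted latent mean: the Jacobian J is estimated by central-difference numerical differentiation, and the latent variance is propagated as J diag(sigma_C^2) J^T. The exact combination of gradient and variance is not spelled out in this description, so the first-order propagation below is an assumption.

```python
def decoder_jacobian(g, z, h=1e-4):
    """Numerical differentiation of the decoder by central differences:
    J[i, k] = d g(z)[i] / d z[k], evaluated at the point z."""
    cols = []
    for k in range(z.shape[0]):
        dz = np.zeros_like(z)
        dz[k] = h
        cols.append((g(z + dz) - g(z - dz)) / (2.0 * h))
    return np.stack(cols, axis=1)            # shape (N, K)

x_pred = g_theta(mu_C)                       # step S33: predicted value of the data
J = decoder_jacobian(g_theta, mu_C)          # step S34: gradient by numerical diff.
cov = J @ np.diag(sigma_C ** 2) @ J.T        #   Cov(x_i, x_j) = J diag(sigma_C^2) J^T
var = np.diag(cov)                           # per-dimension variance of the prediction
```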
 As described above, according to the information processing system of this embodiment, the machine learning device trains an autoencoder, which includes an encoder that converts multidimensional input data into latent variables and a decoder that restores the latent variables, so as to minimize a loss function. The loss function includes the restoration error of the autoencoder and the encoded information amount of the latent variables calculated based on a probability distribution generated by adding noise to the latent variables and a prior distribution of the latent variables predicted based on time-series data for the input data for a predetermined past period. The prediction device then calculates the multidimensional predicted values of the prediction target data and the covariance between the multidimensional predicted values by inputting latent variables predicted based on time-series data for the multidimensional prediction target data for a predetermined past period into the decoder of the trained autoencoder.
 As a result, as shown in FIG. 10, when predicting each piece of data based on multidimensional time-series data, the variance and covariance of the predicted values can be obtained at the same time as the predicted values themselves. FIG. 10 shows an example in which the predicted value and variance for each dimension of two-dimensional time-series data, and the covariance between the two, are calculated.
 In this way, in this embodiment, the covariance of the predicted values is obtained at the same time as the prediction of the multidimensional time-series data, so the method is particularly useful for predicting data whose dimensions are correlated. For example, when predicting the stock prices of multiple stocks, the variance obtained for the predicted value of each stock is useful information for deciding which stocks to buy or sell. Furthermore, the covariance of the predicted values between stocks is useful information for deciding in which combinations stocks should be bought or sold, as the hypothetical example after this paragraph illustrates.
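 As a hypothetical illustration of this point (the numbers below are invented for the example and are not taken from any embodiment), the variance of a two-stock position depends on the predicted covariance through w^T Sigma w:

```python
import numpy as np

# Hypothetical predicted covariance matrix for two stocks (illustrative values)
Sigma = np.array([[0.04, -0.01],
                  [-0.01, 0.09]])
w = np.array([0.5, 0.5])              # candidate portfolio weights
portfolio_var = float(w @ Sigma @ w)  # variance of the combined position: 0.0275
# The negative off-diagonal entry (anti-correlated stocks) lowers portfolio_var
# below the weighted sum of the individual variances, which is exactly the
# information that per-stock variances alone cannot provide.
```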
 FIG. 11 shows a comparison table between the method according to this embodiment (hereinafter, "the present method") and reference methods. In FIG. 11, reference method 1 is the prediction method using a Gaussian process described above, reference method 2 is the method using an RNN or LSTM neural network, and reference method 3 is the isometric generative model. Reference method 1 has limited prediction accuracy for complex, context-dependent data. Reference method 2 cannot calculate the variance simultaneously with the prediction. Reference method 3 is unsuitable for predicting time-series data. The present method resolves all of these problems and, in addition, can calculate the covariance between predicted values.
 In the above embodiment, the machine learning device and the prediction device are configured as separate computers, but this is not limiting. They may be realized on a single computer as an information processing device having a machine learning unit corresponding to the machine learning device and a prediction unit corresponding to the prediction device.
 In the above embodiment, the machine learning program and the prediction program are stored (installed) in advance in the storage device, but this is not limiting. The programs according to the disclosed technology may be provided in a form stored in a storage medium such as a CD-ROM, DVD-ROM, or USB memory.
REFERENCE SIGNS LIST

10  Machine learning device
11  Distribution prediction unit
12  First conversion unit
13  Second conversion unit
14  Loss calculation unit
15  Update unit
20  Prediction device
21  Distribution prediction unit
22  Data prediction unit
23  Covariance calculation unit
30, 60  Computer
31, 61  CPU
32, 62  Memory
33, 63  Storage device
34, 64  Input/output device
35, 65  R/W device
36, 66  Communication I/F
37, 67  Bus
39, 69  Storage medium
40  Machine learning program
41  Distribution prediction process control instruction
42  First conversion process control instruction
43  Second conversion process control instruction
44  Loss calculation process control instruction
45  Update process control instruction
50  Information storage area
70  Prediction program
71  Distribution prediction process control instruction
72  Data prediction process control instruction
73  Covariance calculation process control instruction
80  Information storage area

Claims (13)

  1.  An information processing program for causing a computer to execute a process comprising:
     training an autoencoder, which includes an encoder that converts multidimensional input data into latent variables and a decoder that restores the latent variables, so as to minimize a loss function that includes a restoration error of the autoencoder and an encoded information amount of the latent variables calculated based on a probability distribution generated by adding noise to the latent variables and a prior distribution of the latent variables predicted based on time-series data for the input data for a predetermined past period; and
     calculating predicted values of multidimensional prediction target data and a covariance between the predicted values by inputting latent variables predicted based on time-series data for the multidimensional prediction target data for a predetermined past period into the decoder of the trained autoencoder.
  2.  The information processing program according to claim 1, wherein the process further comprises training, together with the autoencoder, a distribution predictor that predicts a mean and a variance of the latent variables for the input data based on the time-series data for the input data for the predetermined past period.
  3.  The information processing program according to claim 2, wherein the prior distribution of the latent variables is predicted by mixing probability distributions, one for each dimension of the latent variables, each represented by the mean and variance predicted by the distribution predictor.
  4.  The information processing program according to claim 2 or 3, wherein the process further comprises:
     predicting a mean and a variance of the latent variables for the prediction target data by inputting the time-series data for the prediction target data for the predetermined past period into the trained distribution predictor;
     calculating the predicted values by inputting the predicted mean of the latent variables into the decoder; and
     calculating the covariance based on the predicted variance of the latent variables and a gradient obtained by numerical differentiation of the decoder.
  5.  An information processing device comprising:
     a machine learning unit that trains an autoencoder, which includes an encoder that converts multidimensional input data into latent variables and a decoder that restores the latent variables, so as to minimize a loss function that includes a restoration error of the autoencoder and an encoded information amount of the latent variables calculated based on a probability distribution generated by adding noise to the latent variables and a prior distribution of the latent variables predicted based on time-series data for the input data for a predetermined past period; and
     a prediction unit that calculates predicted values of multidimensional prediction target data and a covariance between the predicted values by inputting latent variables predicted based on time-series data for the multidimensional prediction target data for a predetermined past period into the decoder of the trained autoencoder.
  6.  The information processing device according to claim 5, wherein the machine learning unit trains, together with the autoencoder, a distribution predictor that predicts a mean and a variance of the latent variables for the input data based on the time-series data for the input data for the predetermined past period.
  7.  The information processing device according to claim 6, wherein the machine learning unit predicts the prior distribution of the latent variables by mixing probability distributions, one for each dimension of the latent variables, each represented by the mean and variance predicted by the distribution predictor.
  8.  The information processing device according to claim 6 or 7, wherein the prediction unit:
     predicts a mean and a variance of the latent variables for the prediction target data by inputting the time-series data for the prediction target data for the predetermined past period into the trained distribution predictor;
     calculates the predicted values by inputting the predicted mean of the latent variables into the decoder; and
     calculates the covariance based on the predicted variance of the latent variables and a gradient obtained by numerical differentiation of the decoder.
  9.  An information processing method in which a computer executes a process comprising:
     training an autoencoder, which includes an encoder that converts multidimensional input data into latent variables and a decoder that restores the latent variables, so as to minimize a loss function that includes a restoration error of the autoencoder and an encoded information amount of the latent variables calculated based on a probability distribution generated by adding noise to the latent variables and a prior distribution of the latent variables predicted based on time-series data for the input data for a predetermined past period; and
     calculating predicted values of multidimensional prediction target data and a covariance between the predicted values by inputting latent variables predicted based on time-series data for the multidimensional prediction target data for a predetermined past period into the decoder of the trained autoencoder.
  10.  The information processing method according to claim 9, wherein the process further comprises training, together with the autoencoder, a distribution predictor that predicts a mean and a variance of the latent variables for the input data based on the time-series data for the input data for the predetermined past period.
  11.  The information processing method according to claim 10, wherein the prior distribution of the latent variables is predicted by mixing probability distributions, one for each dimension of the latent variables, each represented by the mean and variance predicted by the distribution predictor.
  12.  The information processing method according to claim 10 or 11, wherein the process further comprises:
     predicting a mean and a variance of the latent variables for the prediction target data by inputting the time-series data for the prediction target data for the predetermined past period into the trained distribution predictor;
     calculating the predicted values by inputting the predicted mean of the latent variables into the decoder; and
     calculating the covariance based on the predicted variance of the latent variables and a gradient obtained by numerical differentiation of the decoder.
  13.  A non-transitory storage medium storing an information processing program for causing a computer to execute a process comprising:
     training an autoencoder, which includes an encoder that converts multidimensional input data into latent variables and a decoder that restores the latent variables, so as to minimize a loss function that includes a restoration error of the autoencoder and an encoded information amount of the latent variables calculated based on a probability distribution generated by adding noise to the latent variables and a prior distribution of the latent variables predicted based on time-series data for the input data for a predetermined past period; and
     calculating predicted values of multidimensional prediction target data and a covariance between the predicted values by inputting latent variables predicted based on time-series data for the multidimensional prediction target data for a predetermined past period into the decoder of the trained autoencoder.
PCT/JP2022/039750 2022-10-25 2022-10-25 Information processing program, device, and method WO2024089770A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/039750 WO2024089770A1 (en) 2022-10-25 2022-10-25 Information processing program, device, and method

Publications (1)

Publication Number Publication Date
WO2024089770A1 (en)

Family

ID=90830360

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/039750 WO2024089770A1 (en) 2022-10-25 2022-10-25 Information processing program, device, and method

Country Status (1)

Country Link
WO (1) WO2024089770A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021059348A1 (en) * 2019-09-24 2021-04-01 富士通株式会社 Learning method, learning program, and learning device
JP2022074133A (en) * 2020-11-02 2022-05-17 インターナショナル・ビジネス・マシーンズ・コーポレーション Computing device, computer-implemented method and computer-readable storage medium for multivariate time series modeling and forecasting (probabilistic nonlinear relationships across multiple time series and external factors for improved multivariate time series modeling and forecasting)

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
EIKI KAWATA, RIKU TANAKA, TOMOYA SUZUKI: "N-2-16: Behavioral Econometric Factors Extracted by Anomaly Detection with Autoencoder", IEICE General Conference, The Institute of Electronics, Information and Communication Engineers, Japan, 23 February 2021 - 12 March 2021, p. 254, XP009555175, ISSN: 1349-1369 *
KEIZO KATO: "Rate-Distortion Optimization Guided Autoencoder for Isometric Embedding in Euclidean Latent Space", arXiv.org, Cornell University Library, Ithaca, 31 August 2020, XP093163303, DOI: 10.48550/arxiv.1910.04329 *

Similar Documents

Publication Publication Date Title
Triebe et al. Ar-net: A simple auto-regressive neural network for time-series
Hoogeboom et al. Blurring diffusion models
US11468324B2 (en) Method and apparatus with model training and/or sequence recognition
US20230252301A1 (en) Initialization of Parameters for Machine-Learned Transformer Neural Network Architectures
JP6182242B1 (en) Machine learning method, computer and program related to data labeling model
US20220092411A1 (en) Data prediction method based on generative adversarial network and apparatus implementing the same method
JP2020009410A (en) System and method for classifying multidimensional time series of parameter
JP7205640B2 (en) LEARNING METHODS, LEARNING PROGRAMS AND LEARNING DEVICES
JP7344149B2 (en) Optimization device and optimization method
WO2020234984A1 (en) Learning device, learning method, computer program, and recording medium
US20210090552A1 (en) Learning apparatus, speech recognition rank estimating apparatus, methods thereof, and program
US11763152B2 (en) System and method of improving compression of predictive models
JP7007659B2 (en) Kernel learning device that uses the transformed convex optimization problem
JP7205641B2 (en) LEARNING METHODS, LEARNING PROGRAMS AND LEARNING DEVICES
WO2020218246A1 (en) Optimization device, optimization method, and program
WO2024089770A1 (en) Information processing program, device, and method
KR102617958B1 (en) Method and apparatus for cross attention mechanism based compound-protein interaction prediction
Zhu et al. A hybrid model for nonlinear regression with missing data using quasilinear kernel
JPWO2021059349A5 (en)
CA3160910A1 (en) Systems and methods for semi-supervised active learning
JP7163977B2 (en) Estimation device, learning device, method thereof, and program
JP7047665B2 (en) Learning equipment, learning methods and learning programs
JP6981860B2 (en) Series data analysis device, series data analysis method and program
JP7465497B2 (en) Learning device, learning method, and program
JP2019095894A (en) Estimating device, learning device, learned model, estimation method, learning method, and program

Legal Events

Date Code Title Description
121  EP: the EPO has been informed by WIPO that EP was designated in this application
     Ref document number: 22963422
     Country of ref document: EP
     Kind code of ref document: A1