WO2024089770A1

WO2024089770A1 - Information processing program, device, and method

Info

Publication number: WO2024089770A1
Application number: PCT/JP2022/039750
Authority: WO
Inventors: 正之廣本; 章中川
Original assignee: 富士通株式会社
Priority date: 2022-10-25
Filing date: 2022-10-25
Publication date: 2024-05-02

Abstract

The present invention calculates predicted values of multi-dimensional data to be predicted and the covariance of the predicted values by: training an auto-encoder, which includes an encoder that converts multi-dimensional input data into latent variables and a decoder that reconstructs the latent variables, so as to minimize a loss function including a coding information quantity of the latent variables, the coding information quantity being calculated on the basis of a probability distribution, which is generated by adding noise to the latent variables and a reconstruction error of the auto-encoder, and a priori distribution, which is predicted on the basis of time-series data regarding input data for the past prescribed time period, of the latent variables; and inputting, to the decoder of the trained auto-encoder, the latent variables predicted on the basis of the time-series data regarding the multi-dimensional data to be predicted for the past prescribed time period.

Description

Information processing program, device, and method

The disclosed technology relates to an information processing program, an information processing device, and an information processing method.

　Traditionally, future values of each data item have been predicted based on multidimensional past time series data. For example, when predicting the stock prices of multiple stocks, the stock price of each stock at the next time is predicted based on past data.

One method for predicting such time series data is to use a Gaussian process. With this method, it is possible to simultaneously calculate the predicted value and variance by predicting the time series data as a probability distribution. There are also methods for predicting time series data using a predictive model that uses a recurrent neural network (RNN) or a long short-term memory (LSTM) neural network. With this method, the use of a neural network allows for highly accurate predictions.

There is also a method for predicting multidimensional data that uses generative model-based deep learning technology with isometric properties. With this method, a highly accurate machine learning model that treats the latent representation of the data as a probability distribution can be used to simultaneously obtain a predicted value and the variance of that predicted value.

International Publication No. 2021/059348 International Publication No. 2021/059349

However, prediction methods using Gaussian processes have the problem that they are limited in their prediction accuracy for complex correlated data. Specifically, because this method is based on using a simple regression model, measures such as manually selecting an appropriate kernel are necessary to improve prediction accuracy. In addition, this method assumes the prediction of a stationary process and is unsuitable for predicting complex data that is context-dependent. Furthermore, methods that use RNN or LSTM neural networks only predict data, so the variance of the predicted value cannot be obtained. Additionally, methods that use generative model-type deep learning technology with isometricity have the problem that they are unsuitable for predicting time series data.

In one aspect, the disclosed technology aims to obtain the covariance of predicted values at the same time as the predicted values when predicting each data item based on multidimensional time series data.

In one embodiment, the disclosed technology trains an autoencoder that includes an encoder that converts multidimensional input data into latent variables and a decoder that restores the latent variables to minimize a loss function. The loss function includes a restoration error of the autoencoder. The loss function also includes an encoded information amount of the latent variables calculated based on a probability distribution generated by adding noise to the latent variables and a prior distribution of the latent variables predicted based on time series data for the input data for a predetermined period of time in the past. The disclosed technology also inputs latent variables predicted based on time series data for multidimensional prediction target data for a predetermined period of time to the decoder of the trained autoencoder. In this way, the disclosed technology calculates the predicted value of the multidimensional prediction target data and the covariance between the predicted values.

One aspect is that when predicting each piece of data based on multidimensional time series data, it is possible to obtain the covariance of the predicted values at the same time as the predicted values.

FIG. 1 is a diagram illustrating an example of the configuration of an isometric generation model that is an existing technology. FIG. 1 is a functional block diagram of a machine learning device according to an embodiment of the present invention. FIG. 1 illustrates an example of the configuration of a machine learning device. FIG. 2 is a functional block diagram of the prediction device according to the present embodiment. FIG. 1 illustrates an example of the configuration of a prediction device. FIG. 1 is a block diagram showing a schematic configuration of a computer that functions as a machine learning device. FIG. 1 is a block diagram showing a schematic configuration of a computer functioning as a prediction device. 1 is a flowchart illustrating an example of a machine learning process. 13 is a flowchart illustrating an example of a prediction process. FIG. 1 illustrates how forecast values, variances, and covariances are calculated for multi-dimensional time series data. 1 is a comparison table between the present method and the reference method.

Below, an example of an embodiment of the disclosed technology is described with reference to the drawings.

First, an overview of this embodiment will be described. In this embodiment, a prior distribution of context-dependent latent variables is introduced into a method using a generative model-type deep learning technology with isometry (hereinafter, referred to as an "isometry generative model").

Here, an isometric generative model (for example, Non-Patent Documents 1 and 2), which is an existing technology, will be described. As shown in FIG. 1, the isometric generative model is composed of an autoencoder including an encoder that encodes input data x to obtain a latent variable z, and a decoder that restores the latent variable z and converts it into an output x^ (in the figure and in the formulas described later, a "^ (hat)" is written above "x"). If the parameter of the encoder is φ and the parameter of the decoder is θ, then f _φ (x) = z and g _θ (z) = x^. The decoder receives the latent variable z to which noise ε ~ N(0, (β/2)I) has been applied. Note that β is a hyperparameter and I is a unit matrix.

The isometric generative model also calculates a probability distribution p _ψ (z) of a latent variable z obtained by converting an input x with an encoder f _φ , and calculates the amount of encoded information of the latent variable z as R = -log(p _ψ (z)). Furthermore, the isometric generative model calculates a restoration error D(x, x^), which is the error between the input x and the output x^, and trains the encoder f _φ , the decoder g _θ , and the probability distribution p _ψ of the latent variable z by optimization shown in the following formula (1).

Here, E _{x~p(x), ε~N(0,(β/2)I)} [X] represents the average of X when the input x is sampled multiple times from the probability distribution p(x) and the noise ε is sampled multiple times from the normal distribution N(0,(β/2)I). In this way, the isometric generative model can acquire a low-dimensional latent representation while maintaining the characteristics of the input data by reducing the amount of encoded information R of the latent variable z at the same time as the recovery error D, and can also calculate the probability distribution of the output by isometricity.

However, as mentioned above, the above isometry generative model is not suitable for predicting time series data. Therefore, in this embodiment, past time series data is provided as context to the above isometry generative model, thereby making it possible to predict time series data. Furthermore, in this embodiment, by taking advantage of the advantages of the isometry generative model, data prediction can be performed with high accuracy, and the variance of the predicted value can be calculated at the same time as the predicted value. Furthermore, in this embodiment, the covariance between the predicted values of data for each dimension of multidimensional data can also be calculated.

The information processing system according to this embodiment is described in detail below.

The information processing system according to this embodiment includes a machine learning device and a prediction device.

First, the machine learning device will be described. As shown in FIG. 2, the machine learning device 10 functionally includes a distribution prediction unit 11, a first conversion unit 12, a second conversion unit 13, a loss calculation unit 14, and an update unit 15. Below, each functional unit will be described with reference to the configuration example of the machine learning device 10 shown in FIG. 3.

The distribution prediction unit 11 predicts the prior distribution of the latent variable based on the time series data of the input data for a predetermined period of time in the past. Specifically, the distribution prediction unit 11 predicts the mean and variance of the latent variable for the input data by inputting the time series data of the input data for a predetermined period of time in the past to the distribution predictor. In the example of FIG. 3, N-dimensional data s _t = {s _t ¹ , s _t ² , ..., s _t ^N } at time t is input x, and time series data of the past period T is context C = {s _t-1 , s _t-2 , ..., s _t-T }. In the example of FIG. 3, h _ψ is the distribution predictor, and ψ is a parameter of the distribution predictor h _ψ . The mean μ ^C and variance σ ^C predicted by the distribution predictor h _ψ are expressed as μ ^C , σ ^C = h _ψ (C). The mean μ ^C is μ ^C = {μ ₁ ^C , ..., μ _k ^C , ..., μ _K ^C }, and the variance σ ^C is σ ^C = {σ ₁ ^C , ..., σ _k ^C , ..., σ _K ^C }, where k is the index of each dimension of the latent variable, K is the number of dimensions of the latent variable, and K<N.

The distribution prediction unit 11 predicts a prior distribution _q (z|C) for a latent variable z that depends on a context C by mixing probability distributions for the number of dimensions of the latent variable, represented by the mean μ ^C and variance σ ^C predicted by the distribution predictor h ψ. That is, the prior distribution q(z|C) is represented by the following formula (2). Note that formula (2) shows an example in which the probability distribution of the latent variable z _k of each dimension is a normal distribution with mean μ _k ^C and variance σ _k ^C.

The first conversion unit 12 converts the input x, which is N-dimensional input data, into a K-dimensional latent variable using the encoder of the autoencoder. In the example of FIG. 3, f _φ is the encoder, and φ is a parameter of the encoder f _φ . The conversion by the encoder f _φ is expressed as y=f _φ (x), where y={y _t ¹ ,..., y _t ^k ,..., y _t ^K }.

The second conversion unit 13 applies noise ε to the latent variable y to convert it into a latent variable z. In the example of FIG. 3, the noise ε follows a normal distribution with a mean of 0 and a variance of (β/2)I. β is a hyperparameter, and I is a unit matrix. Therefore, the probability distribution p(z|x) of the latent variable z for the input x is z~p(z|x)=N(z;y,(β/2)I). Then, the second conversion unit 13 converts the K-dimensional latent variable z into an N-dimensional output x^ using a decoder of an autoencoder. In the example of FIG. 3, g _θ is the decoder, and θ is a parameter of the decoder g _θ . The conversion by the decoder g _θ is expressed as x^=g _θ (z).

The loss calculation unit 14 calculates a loss function including the recovery error of the autoencoder and the amount of coded information of the latent variable. Specifically, the loss calculation unit 14 calculates the square error between the input x and the output x^ as the recovery error D(x, x^). In addition, the loss calculation unit 14 calculates the amount of information representing the difference between the probability distribution p(z|x) of the latent variable z obtained by applying noise ε to the latent variable y and the prior distribution q(z|C) of the latent variable depending on the context C as the amount of coded information of the latent variable. _{FIG. 3 shows an example in which the Kullback-Leibler divergence between the probability distribution p(z|x) and the prior distribution q(z|C) is calculated as the amount of coded information of the latent variable D KL} (p(z|x)∥q(z|C)).

The loss calculation unit 14 calculates the weighted sum of the calculated recovery error D(x, x^) and the amount of encoded information of the latent variable D _KL (p(z|x)∥q(z|C)) as a loss function L _θ,φ,ψ (x, C) as shown in the following equation (3).

The update unit 15 trains the encoder f _φ , the decoder g _θ , and the distribution predictor h _ψ of the autoencoder so as to minimize the loss function calculated by the loss calculation unit 14. Specifically, as shown in the following formula (4), the parameter φ of the encoder f _φ , the parameter θ of the decoder g _θ , and the parameter ψ of the distribution predictor h _ψ are updated so as to minimize the loss function L _θ,φ,ψ (x, C).

Next, the prediction device will be described. As shown in FIG. 4, the prediction device 20 functionally includes a distribution prediction unit 21, a data prediction unit 22, and a covariance calculation unit 23. Below, each functional unit will be described with reference to the configuration example of the prediction device 20 shown in FIG. 5.

The data prediction unit 22 predicts the mean and variance of latent variables for the prediction target data by inputting time series data for a past period T for data at time t, which is the prediction target data, i.e., context C, to a distribution predictor trained by the machine learning device 10. In the example of Fig. 5, h _ψ is the distribution predictor, and the mean μ ^C and variance σ ^C predicted by the distribution predictor h _ψ are expressed as μ ^C , σ ^C = h _ψ (C).

The data prediction unit 22 predicts the prediction target data by inputting the mean μ ^C of the K-dimensional latent variables predicted by the distribution prediction unit 21 to the decoder of the autoencoder trained by the machine learning device 10. In the example of FIG. 5, g _θ is the decoder, and θ is a parameter of the decoder g _θ . The conversion by the decoder g _θ is expressed as x ^∼ =g _θ (μ ^C ). ^{x ∼} (in the figures and in the formulas described later, the notation "∼ (tilde)" above "x") is the predicted value of the prediction target data. The data prediction unit 22 outputs the predicted value x ^∼ of the prediction target data.

The covariance calculation unit 23 calculates the covariance Cov(x i ~ , x j ~ ) of the predicted values from the variance σ ^C of the K-dimensional latent variables predicted by the distribution prediction unit 21 and the gradient obtained by numerical differentiation of the decoder, as shown in the following formula (5) ^. i and j are indexes of each dimension of the predicted value x ^~ . The covariance calculation unit 23 outputs the calculated covariance Cov( _{x i} _~ ^, ^x _j _~ ) ^.

The machine learning device 10 may be realized, for example, by a computer 30 shown in FIG. 6. The computer 30 includes a CPU (Central Processing Unit) 31, a memory 32 as a temporary storage area, and a non-volatile storage device 33. The computer 30 also includes an input/output device 34 such as an input device and a display device, and an R/W (Read/Write) device 35 that controls the reading and writing of data from and to a storage medium 39. The computer 30 also includes a communication I/F (Interface) 36 that is connected to a network such as the Internet. The CPU 31, memory 32, storage device 33, input/output device 34, R/W device 35, and communication I/F 36 are connected to each other via a bus 37.

The storage device 33 is, for example, a hard disk drive (HDD), a solid state drive (SSD), flash memory, etc. The storage device 33, which serves as a storage medium, stores a machine learning program 40 for causing the computer 30 to function as the machine learning device 10. The machine learning program 40 has a distribution prediction process control instruction 41, a first conversion process control instruction 42, a second conversion process control instruction 43, a loss calculation process control instruction 44, and an update process control instruction 45. The storage device 33 also has an information storage area 50 in which information constituting the distribution predictor, the encoder and the decoder of the autoencoder is stored.

The CPU 31 reads the machine learning program 40 from the storage device 33, expands it in the memory 32, and sequentially executes the control instructions of the machine learning program 40. The CPU 31 operates as the distribution prediction unit 11 shown in FIG. 2 by executing the distribution prediction process control instruction 41. The CPU 31 also operates as the first conversion unit 12 shown in FIG. 2 by executing the first conversion process control instruction 42. The CPU 31 also operates as the second conversion unit 13 shown in FIG. 2 by executing the second conversion process control instruction 43. The CPU 31 also operates as the loss calculation unit 14 shown in FIG. 2 by executing the loss calculation process control instruction 44. The CPU 31 also operates as the update unit 15 shown in FIG. 2 by executing the update process control instruction 45. The CPU 31 also reads information from the information storage area 50 and expands the distribution predictor and the encoder and decoder of the autoencoder in the memory 32. As a result, the computer 30 that has executed the machine learning program 40 functions as the machine learning device 10. The CPU 31 that executes the program is hardware.

The prediction device 20 may be realized, for example, by a computer 60 shown in FIG. 7. The computer 60 includes a CPU 61, a memory 62 as a temporary storage area, and a non-volatile storage device 63. The computer 60 also includes an input/output device 64, an R/W device 65 that controls the reading and writing of data from and to a storage medium 69, and a communication I/F 66. The CPU 61, memory 62, storage device 63, input/output device 64, R/W device 65, and communication I/F 66 are connected to each other via a bus 67.

The storage device 63 is, for example, an HDD, SSD, flash memory, etc. A prediction program 70 for causing the computer 60 to function as the prediction device 20 is stored in the storage device 63 as a storage medium. The prediction program 70 has a distribution prediction process control instruction 71, a data prediction process control instruction 72, and a covariance calculation process control instruction 73. The storage device 63 also has an information storage area 80 in which information constituting the trained distribution predictor and the decoder of the autoencoder is stored.

The CPU 61 reads out the prediction program 70 from the storage device 63, loads it in the memory 62, and sequentially executes the control instructions of the prediction program 70. The CPU 61 operates as the distribution prediction unit 21 shown in FIG. 4 by executing the distribution prediction process control instruction 71. The CPU 61 also operates as the data prediction unit 22 shown in FIG. 4 by executing the data prediction process control instruction 72. The CPU 61 also operates as the covariance calculation unit 23 shown in FIG. 4 by executing the covariance calculation process control instruction 73. The CPU 61 also reads out information from the information storage area 80, and loads the trained distribution predictor and the autoencoder decoder in the memory 62. In this way, the computer 60 that has executed the prediction program 70 functions as the prediction device 20. The CPU 61 that executes the program is hardware.

The functions realized by each of the machine learning program 40 and the prediction program 70 may be realized, for example, by a semiconductor integrated circuit, more specifically, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), etc. The machine learning program 40 and the prediction program 70 are examples of information processing programs of the disclosed technology.

Next, the operation of the information processing system according to this embodiment will be described. When training the distribution predictor and the encoder and decoder of the autoencoder, the machine learning device 10 executes the machine learning process shown in FIG. 8. When predicting the data to be predicted, the prediction device 20 executes the prediction process shown in FIG. 9. Note that the machine learning process and the prediction process are examples of the information processing method of the disclosed technology.

First, we will explain the machine learning process shown in Figure 8.

In step S11, the first conversion unit 12 acquires N-dimensional data s _t at time t as an input x. The distribution prediction unit 11 acquires time-series data for a past period T (from t−t to t−1) as a context C.

Next, in step S12, the first conversion unit 12 converts the input x into a K-dimensional latent variable y using the encoder f _φ of the autoencoder. Next, in step S13, the second conversion unit 13 applies noise ε to the latent variable y to convert it into a latent variable z. Next, in step S14, the second conversion unit 13 converts the K-dimensional latent variable z into an N-dimensional output x^ using a decoder g _θ of the autoencoder. Next, in step S15, the loss calculation unit 14 calculates the recovery error D(x, x^) between the input x and the output x^.

Next, in step S16, the distribution prediction unit 11 predicts the mean μ ^C and variance σ ^C of the latent variable for the input x by inputting the context C to the distribution predictor h _ψ . Next, in step S17, the distribution prediction unit 11 predicts a prior distribution q(z|C) that depends on the context C for the latent variable z from the predicted mean μ ^C and variance σ ^C.

Next, in step S18, the loss calculation unit 14 calculates the amount of coded information D KL (p(z|x _)∥q (z|C)) of the latent variable from the probability distribution p(z|x) of the latent variable z obtained by applying noise ε to the latent variable y and the prior distribution q(z|C) of the latent variable depending on the context C. Next, in step S19, the loss calculation unit 14 calculates the weighted sum of the calculated recovery error D(x, x^) and the amount of coded information D _KL (p(z|x)∥q(z|C)) of the latent variable as the loss function L _θ,φ,ψ (x,C). Next, in step S20, the update unit 15 updates the parameter φ of the encoder f _φ , the parameter θ of the decoder g _θ , and the parameter ψ of the distribution predictor h _ψ so as to minimize the loss function L θ,φ, _ψ (x,C).

Next, in step S21, the update unit 15 determines whether the machine learning has converged. For example, the update unit 15 may determine that the machine learning has converged when the number of updates of the parameters θ, φ, and ψ reaches a predetermined number, when the value of the loss function becomes equal to or less than a predetermined value, when the difference between the loss function calculated last time and the loss function calculated this time becomes equal to or less than a predetermined value, etc. If the machine learning has not converged, the process returns to step S12, and if the machine learning has converged, the machine learning process ends.

Note that the order of steps S12 to S15 and steps S16 to S17 of the machine learning process may be reversed, or may be executed in parallel.

Next, the prediction process shown in Fig. 9 will be described. The prediction process is performed using a distribution predictor h _ψ to which the final parameter ψ updated by the above machine learning process is set, and a decoder g _θ of the autoencoder to which the final parameter θ is set, that is, a trained distribution predictor h _ψ and a decoder g _θ .

In step S31, the distribution prediction unit 21 acquires time series data for a past period T (from t-t to t-1) as a context C for the data at time t, which is the data to be predicted. Next, in step S32, the distribution prediction unit 21 inputs the context C to a trained distribution predictor h _ψ , thereby predicting the mean μ ^C and variance σ ^C of the latent variables for the data to be predicted.

Next, in step S33, the data prediction unit 22 predicts the predicted value x ^∼ of the prediction target data by inputting the predicted mean μ ^C of the K-dimensional latent variable to the decoder g _θ of the trained autoencoder. Next, in step S34, the covariance calculation unit 23 calculates the covariance Cov(x _i ^∼ , x _j ^∼ ) of the predicted value from the variance σ ^C of the K-dimensional latent variable predicted by the distribution prediction unit 21 and the gradient obtained by numerical differentiation of the decoder. Next, in step S35, the data prediction unit 22 outputs the predicted predicted value x ^∼ , and the covariance calculation unit 23 outputs the calculated covariance Cov(x _i ^∼ , x _j ^∼ ), and the prediction process ends.

As described above, according to the information processing system of this embodiment, the machine learning device trains an autoencoder including an encoder that converts multidimensional input data into latent variables and a decoder that restores the latent variables so as to minimize a loss function. The loss function includes the restoration error of the autoencoder and the encoded information amount of the latent variables calculated based on a probability distribution generated by adding noise to the latent variables and a prior distribution of the latent variables predicted based on time series data for a predetermined period of time in the past for the input data. Then, the prediction device inputs latent variables predicted based on time series data for the multidimensional prediction target data for a predetermined period of time into the decoder of the trained autoencoder, thereby calculating multidimensional prediction values of the prediction target data and covariance between the multidimensional prediction values.

As a result, as shown in Figure 10, when predicting each piece of data based on multidimensional time series data, the predicted value as well as the variance and covariance of the predicted value can be obtained. Note that Figure 10 shows an example in which the predicted value and variance for each piece of two-dimensional time series data and the covariance between the data are calculated.

In this way, in this embodiment, since the covariance of predicted values is calculated at the same time as predicting multidimensional time series data, it is useful to apply this to predicting data in which there is correlation between dimensions. For example, when predicting the stock prices of multiple stocks, the variance obtained for the predicted values of each stock is useful information for determining which stocks should be bought and sold. Furthermore, the covariance of predicted values between stocks is useful information for determining in which combination stocks should be bought and sold.

FIG. 11 shows a comparison table between the method according to this embodiment (hereinafter referred to as "this method") and the reference methods. In FIG. 11, reference method 1 is the prediction method using the Gaussian process described above, reference method 2 is a method that uses an RNN or LSTM neural network, and reference method 3 is an isometric generative model. Reference method 1 has a problem with prediction accuracy for complex data that has context dependency. Reference method 2 has a problem in that it is not possible to calculate variance simultaneously with prediction. Reference method 3 has a problem in that it is not suitable for predicting time series data. This method solves all of the problems of the reference methods, and furthermore, it is possible to calculate covariance between predicted values.

In the above embodiment, the machine learning device and the prediction device are configured as separate computers, but this is not limiting. They may be realized as an information processing device having a machine learning unit equivalent to the machine learning device and a prediction unit equivalent to the prediction device, on a single computer.

In addition, in the above embodiment, the machine learning program and the prediction program are pre-stored (installed) in the storage device, but this is not limited to this. The programs related to the disclosed technology may be provided in a form stored in a storage medium such as a CD-ROM, DVD-ROM, or USB memory.

REFERENCE SIGNS LIST 10 Machine learning device 11 Distribution prediction unit 12 First conversion unit 13 Second conversion unit 14 Loss calculation unit 15 Update unit 20 Prediction device 21 Distribution prediction unit 22 Data prediction unit 23 Covariance calculation unit 30, 60 Computer 31, 61 CPU
32, 62 Memory 33, 63 Storage device 34, 64 Input/output device 35, 65 R/W device 36, 66 Communication I/F
37, 67 Bus 39, 69 Storage medium 40 Machine learning program 41 Distribution prediction process control instruction 42 First conversion process control instruction 43 Second conversion process control instruction 44 Loss calculation process control instruction 45 Update process control instruction 50 Information storage area 70 Prediction program 71 Distribution prediction process control instruction 72 Data prediction process control instruction 73 Covariance calculation process control instruction 80 Information storage area

Claims

an autoencoder including an encoder that converts multidimensional input data into latent variables and a decoder that restores the latent variables is trained so as to minimize a restoration error of the autoencoder and a loss function including an encoded information amount of the latent variables calculated based on a probability distribution generated by adding noise to the latent variables and a prior distribution of the latent variables predicted based on time-series data for the input data for a predetermined period of time;
An information processing program for causing a computer to execute a process including: calculating a predicted value of multidimensional prediction target data and a covariance between the predicted values by inputting a latent variable predicted based on time series data for the multidimensional prediction target data for a predetermined period of time into a decoder of the trained autoencoder.
The information processing program according to claim 1, further comprising: training, together with the autoencoder, a distribution predictor that predicts the mean and variance of latent variables for the input data based on time series data for the input data for a predetermined period of time in the past.
The information processing program according to claim 2, which predicts the prior distribution of the latent variable by mixing probability distributions for the number of dimensions of the latent variable, which are represented by the mean and variance predicted by the distribution predictor.
predicting a mean and a variance of a latent variable for the prediction target data by inputting time series data for a predetermined period of time for the prediction target data into the trained distribution predictor;
Calculating the predicted value by inputting the predicted mean of the latent variables to the decoder;
Calculating the covariance based on the predicted variances of the latent variables and gradients obtained by numerical differentiation of the decoder.
4. The information processing program according to claim 2 or 3.
a machine learning unit that trains an autoencoder including an encoder that converts multidimensional input data into latent variables and a decoder that restores the latent variables so as to minimize a restoration error of the autoencoder and a loss function including an encoded information amount of the latent variables calculated based on a probability distribution generated by adding noise to the latent variables and a prior distribution of the latent variables predicted based on time-series data for the input data for a predetermined period of time;
a prediction unit that calculates a predicted value of the multidimensional prediction target data and a covariance between the predicted values by inputting a latent variable predicted based on time series data for a predetermined past period for the multidimensional prediction target data into a decoder of the trained autoencoder;
An information processing device comprising:
The information processing device according to claim 5, wherein the machine learning unit trains, together with the autoencoder, a distribution predictor that predicts the mean and variance of latent variables for the input data based on time series data for the input data for a predetermined period in the past.
The information processing device according to claim 6, wherein the machine learning unit predicts the prior distribution of the latent variable by mixing probability distributions for the number of dimensions of the latent variable, the distribution distributions being represented by the mean and variance predicted by the distribution predictor.
The prediction unit is
predicting a mean and a variance of a latent variable for the prediction target data by inputting time series data for a predetermined period of time for the prediction target data into the trained distribution predictor;
Calculating the predicted value by inputting the predicted mean of the latent variables to the decoder;
Calculating the covariance based on the predicted variances of the latent variables and gradients obtained by numerical differentiation of the decoder.
8. The information processing device according to claim 6 or 7.
an autoencoder including an encoder that converts multidimensional input data into latent variables and a decoder that restores the latent variables is trained so as to minimize a restoration error of the autoencoder and a loss function including an encoded information amount of the latent variables calculated based on a probability distribution generated by adding noise to the latent variables and a prior distribution of the latent variables predicted based on time-series data for the input data for a predetermined period of time;
An information processing method in which a computer executes a process including: calculating a predicted value of multidimensional prediction target data and a covariance between the predicted values by inputting latent variables predicted based on time-series data for the multidimensional prediction target data for a predetermined period of time into a decoder of the trained autoencoder.
The information processing method according to claim 9, further comprising training a distribution predictor together with the autoencoder to predict the mean and variance of latent variables for the input data based on time series data for the input data for a predetermined period of time in the past.
The information processing method according to claim 10, which predicts the prior distribution of the latent variable by mixing probability distributions for the number of dimensions of the latent variable, which are represented by the mean and variance predicted by the distribution predictor.
predicting a mean and a variance of a latent variable for the prediction target data by inputting time series data for a predetermined period of time for the prediction target data into the trained distribution predictor;
Calculating the predicted value by inputting the predicted mean of the latent variables to the decoder;
Calculating the covariance based on the predicted variances of the latent variables and gradients obtained by numerical differentiation of the decoder.
The information processing method according to claim 10 or 11.
an autoencoder including an encoder that converts multidimensional input data into latent variables and a decoder that restores the latent variables is trained so as to minimize a restoration error of the autoencoder and a loss function including an encoded information amount of the latent variables calculated based on a probability distribution generated by adding noise to the latent variables and a prior distribution of the latent variables predicted based on time-series data for the input data for a predetermined period of time;
A non-transitory storage medium storing an information processing program for causing a computer to execute a process including: calculating a predicted value of multidimensional prediction target data and a covariance between the predicted values by inputting a latent variable predicted based on time series data for the multidimensional prediction target data for a predetermined period of time into a decoder of the trained autoencoder.