
JP2019139482A - Information estimation device and information estimation method

Info

Publication number: JP2019139482A
Application number: JP2018021943A
Authority: JP
Inventors: 仁吾安達; Jingo Adachi
Original assignee: Denso IT Laboratory Inc
Current assignee: Denso IT Laboratory Inc
Priority date: 2018-02-09
Filing date: 2018-02-09
Publication date: 2019-08-22
Anticipated expiration: 2038-02-09
Also published as: JP6893483B2

Abstract

To realize a new auto-encoder with a stochastic element in estimation technology that uses a neural network. SOLUTION: In an information estimation device comprising an auto-encoder composed of an encoder and a decoder implemented with neural networks, at least one integrated layer is provided as the final layer of the encoder. This layer combines a dropout layer, which drops out part of the input data, with a fully connected (FC) layer, which performs the weight calculation. As a result, the encoder's output values in the latent space (the latent variables) become a multidimensional random variable vector. The parameters of the probability distribution in the latent space can then be calculated without increasing the number of latent dimensions (the number of neurons). Estimation can also be performed by calculating the shape of that distribution analytically. SELECTED DRAWING: Figure 2

Description

The present invention relates to an information estimation device and an information estimation method that perform estimation processing using a neural network. In particular, it relates to an information estimation device and method that improve on the variational auto-encoder (Variational AutoEncoder), which is a type of auto-encoder.

Compared with other estimators, estimators using neural networks (NNs) can take large amounts of information, such as images and sensor signal data, as input for estimation. For this reason, they are expected to find application in a wide range of fields.

One type of neural network is the auto-encoder, an unsupervised learner built from a neural network. In a typical auto-encoder, the input layer has many neurons, where the number of neurons corresponds to the number of dimensions. The number of neurons decreases gradually in the subsequent layers, so the central layer, which represents the latent space, is the most compressed and has the fewest neurons. After that central layer, the number of neurons increases again, and the final output layer has the same number of neurons as the input layer. In other words, the input and output layers have the same number of dimensions, and the central latent-space layer has fewer dimensions than either. The first half, from the input layer to the latent-space layer, is called the encoder. The second half, from the latent-space layer to the output layer, is called the decoder.

When unlabeled training data (an $n_{X_{in}}$-dimensional vector $x$) is input, the encoder first compresses it into latent-space data with fewer dimensions (an $n_z$-dimensional vector $z$, also called the latent variable). In the latent space, the data gather into multiple clusters according to the similarity of the original data. The compressed data $z$ then passes through the decoder, which reconstructs the input $x$. This is the classic auto-encoder. For a fixed input $x$, its output is uniquely determined, just as the input is, so the classic auto-encoder is deterministic.
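The following toy encoder and decoder illustrate the deterministic behavior just described. This is a minimal sketch, not part of the invention: the weights are random and untrained, and all names and sizes (`N_XIN`, `N_Z`, `encode`, `decode`) are hypothetical. It shows that the input is compressed to $n_z$ dimensions and that a fixed input always yields the same output.

```python
import math
import random


def linear(weights, bias, x):
    """Fully connected layer: y_i = sum_j W[i][j] * x[j] + b[i]."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(weights, bias)]


def random_layer(n_out, n_in, rng):
    """Hypothetical untrained weights; a real auto-encoder would learn these."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


rng = random.Random(0)
N_XIN, N_Z = 8, 2  # illustrative n_Xin (input/output size) and n_z (latent size)
enc_w, enc_b = random_layer(N_Z, N_XIN, rng)
dec_w, dec_b = random_layer(N_XIN, N_Z, rng)


def encode(x):
    """Compress an n_Xin-dimensional input to an n_z-dimensional latent vector."""
    return [math.tanh(v) for v in linear(enc_w, enc_b, x)]


def decode(z):
    """Expand the latent vector back to n_Xin dimensions."""
    return linear(dec_w, dec_b, z)


def reconstruct(x):
    return decode(encode(x))


x = [0.1 * k for k in range(N_XIN)]
print(len(encode(x)))                    # 2: the latent vector has n_z dimensions
print(reconstruct(x) == reconstruct(x))  # True: same input, same output
```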

In contrast, Non-Patent Document 1 proposes a stochastic auto-encoder called the variational auto-encoder (Variational AutoEncoder, hereinafter abbreviated as VAE). It contains a stochastic element, so for a given fixed input $x$ the output value changes with every calculation.

In the classic auto-encoder described above, the vector $z$ in the compressed $n_z$-dimensional latent space is uniquely determined for an input vector $x$. In a VAE, by contrast, $z$ is not uniquely determined for an input $x$. Instead, it is obtained as a vector of random variables that follows a posterior probability distribution $p(z \mid x)$. This posterior is represented, for example, by an $n_z$-dimensional multivariate Gaussian distribution. The theory proposed in Non-Patent Document 1 is described below.

In a VAE, given data $x$ is explained by integrating over all values of $z$, the latent factors that gave rise to it. Mathematically:
$$ p_\theta(x) = \int p_\theta(x \mid z)\, p_\theta(z)\, dz $$

Here, $p_\theta$ denotes a probability distribution whose shape is determined by a parameter $\theta$. The larger the probability of $x$ obtained by integrating over all $z$ on the right-hand side, the better $x$ is explained.

Given data $x$, we want the posterior distribution $p(z \mid x)$, which describes how the latent random variable $z$ that produced $x$ is distributed. This posterior cannot be calculated analytically, so a variational method, for example, is used instead. Assume a proposal function $q_\phi$ (a probability distribution whose shape is determined by a parameter $\phi$) that is close to $p(z \mid x)$. Then the following relation holds. From it, $q_\phi$ can be obtained and used as an approximation of $p(z \mid x)$:
$$ \log p_\theta(x) = D_{KL}\big(q_\phi(z \mid x)\,\|\,p_\theta(z \mid x)\big) + \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x, z) - \log q_\phi(z \mid x)\big] \qquad (1) $$

Here, the left-hand side of equation (1) is the log-likelihood, which expresses how plausibly the given data $x$ is explained.

$D_{KL}$ in the first term on the right-hand side of equation (1) denotes the KL divergence. It returns a non-negative value that expresses how close two distributions are, i.e., a kind of distance between them.

To obtain a proposal function $q_\phi$ that approximates the posterior $p(z \mid x)$, one first decides which functional form the distributions take and then determines their parameters $\theta$ and $\phi$. Suppose equation (1) holds with near-optimal $\theta$ and $\phi$ over a large amount of data $x$. Then the log-likelihood $\log p_\theta(x)$ on the left-hand side should be high, because the data are well explained. The first term $D_{KL}$ on the right-hand side can also be regarded as approaching zero, because $q_\phi$ approaches the unknowable posterior $p(z \mid x)$.

Writing the second term on the right-hand side as $L(\theta, \phi; x)$, it can be expressed as the sum of two terms:
$$ L(\theta, \phi; x) = -D_{KL}\big(q_\phi(z \mid x)\,\|\,p_\theta(z)\big) + \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \qquad (2) $$

The first term of equation (2) represents regularization. The second term represents how well the input data can be reconstructed at the output.

Because the KL term in equation (1) is non-negative, $L(\theta, \phi; x)$ is a lower bound on $\log p_\theta(x)$. To raise the likelihood, $L(\theta, \phi; x)$ must therefore be maximized, which means maximizing both terms of equation (2). Optimization in training means finding the parameters $\theta$ and $\phi$ that maximize the objective $L(\theta, \phi; x)$ over a large amount of training data $x$. A neural network, with its capacity to process large amounts of data, is well suited to this task and is used as the tool for the parameter optimization.
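The decomposition in equations (1) and (2) can be checked numerically. This sketch assumes a toy model with a three-valued discrete latent $z$ and made-up probabilities. A VAE uses a continuous $z$, but the identity is the same. The code computes both sides of equation (1), with $L$ evaluated in the two-term form of equation (2). All numbers and names are hypothetical.

```python
import math

# Toy model (hypothetical numbers): latent z takes one of three discrete values.
prior = [0.5, 0.3, 0.2]    # p_theta(z)
lik = [0.10, 0.40, 0.70]   # p_theta(x | z) for one fixed observation x
q = [0.2, 0.5, 0.3]        # proposal q_phi(z | x); any valid distribution works


def kl(a, b):
    """D_KL(a || b) for discrete distributions with full support."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)


p_x = sum(pz * lz for pz, lz in zip(prior, lik))            # marginal likelihood p_theta(x)
posterior = [pz * lz / p_x for pz, lz in zip(prior, lik)]   # p_theta(z | x)

# Equation (2): L = -D_KL(q || prior) + E_q[log p_theta(x | z)]
elbo = -kl(q, prior) + sum(qz * math.log(lz) for qz, lz in zip(q, lik))

# Equation (1): log p_theta(x) should equal D_KL(q || posterior) + L
print(math.log(p_x), kl(q, posterior) + elbo)
```

Since the KL term is never negative, `elbo` can never exceed `math.log(p_x)`. It reaches that value only when `q` equals the posterior.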

In the VAE proposed in Non-Patent Document 1, $q_\phi(z \mid x)$ is taken to be an $n_z$-dimensional multivariate Gaussian distribution. The parameters $\phi$ that determine its shape are taken to be two quantities: the mean $\mu_z$ and the variances $\mathrm{diag}(\Sigma_z)$ of the covariance matrix $\Sigma_z$, where $\mathrm{diag}$ denotes the diagonal elements of a matrix. Non-Patent Document 1 sets the remaining off-diagonal part $\mathrm{offdiag}(\Sigma_z)$ to zero. The covariances are therefore neither calculated nor specified in the VAE proposed in Non-Patent Document 1. That is, that VAE imposes the following conditions:
$$ q_\phi(z \mid x) = \mathcal{N}(z;\, \mu_z, \Sigma_z), \qquad \mathrm{offdiag}(\Sigma_z) = 0 $$

The parameters $\phi$ are computed as the encoder's output, so the latent-space layer has $2 \times n_z$ neurons. That is, the following $2 \times n_z$ parameter values are output in order from the encoder, where $\sigma^2_{z,k}$ is the $k$-th diagonal element of $\Sigma_z$:
$$ \mu_{z,1}, \ldots, \mu_{z,n_z},\ \sigma^2_{z,1}, \ldots, \sigma^2_{z,n_z} $$

As described above, the optimization must maximize the objective $L(\theta, \phi; x)$. This requires maximizing the first term of equation (2), which represents regularization:
$$ -D_{KL}\big(q_\phi(z \mid x)\,\|\,p_\theta(z)\big) $$
Maximizing this term means minimizing
$$ D_{KL}\big(q_\phi(z \mid x)\,\|\,p_\theta(z)\big), $$
so the distribution $q_\phi(z \mid x)$ being sought must be as close in shape as possible to $p_\theta(z)$. Here $p_\theta(z)$ is the prior distribution of $z$. According to Non-Patent Document 1, it is computed as a standard Gaussian distribution whose mean $\mu_0$ is the zero vector and whose variances are all 1 ($\Sigma_0$ is the identity matrix):
$$ p_\theta(z) = \mathcal{N}(z;\, \mu_0 = \mathbf{0},\, \Sigma_0 = I) $$

From the above, the first term of equation (2), which represents regularization, is expressed as follows:
$$ -D_{KL}\big(q_\phi(z \mid x)\,\|\,p_\theta(z)\big) = \frac{1}{2}\sum_{k=1}^{n_z}\left(1 + \log \sigma^2_{z,k} - \mu^2_{z,k} - \sigma^2_{z,k}\right) $$
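The closed form above can be checked against a Monte Carlo estimate of the same expectation, $\mathbb{E}_{q}[\log p_\theta(z) - \log q_\phi(z \mid x)]$. This sketch uses hypothetical encoder outputs for $n_z = 2$. The function names are illustrative, not from the source.

```python
import math
import random


def neg_kl_diag_gaussian(mu, var):
    """-D_KL(N(mu, diag(var)) || N(0, I)) = 1/2 * sum(1 + log var - mu^2 - var)."""
    return 0.5 * sum(1.0 + math.log(v) - m * m - v for m, v in zip(mu, var))


def log_normal_diag(z, mu, var):
    """Log density of a diagonal Gaussian evaluated at z."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (zi - m) ** 2 / v)
               for zi, m, v in zip(z, mu, var))


def neg_kl_monte_carlo(mu, var, n_samples, rng):
    """Estimate E_q[log p(z) - log q(z)] by drawing z ~ q = N(mu, diag(var))."""
    zeros, ones = [0.0] * len(mu), [1.0] * len(mu)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.gauss(m, math.sqrt(v)) for m, v in zip(mu, var)]
        total += log_normal_diag(z, zeros, ones) - log_normal_diag(z, mu, var)
    return total / n_samples


mu, var = [0.5, -1.0], [0.8, 0.3]  # hypothetical encoder outputs (means, variances)
exact = neg_kl_diag_gaussian(mu, var)
approx = neg_kl_monte_carlo(mu, var, 100_000, random.Random(1))
print(exact, approx)  # the two values should agree to roughly two decimal places
```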

According to Non-Patent Document 1, the other parameter $\theta$ corresponds to the output of the decoder. A concrete value of $z$ is sampled from the distribution $q_\phi(z \mid x)$ obtained above, i.e., the distribution brought as close as possible to the unknowable posterior $p(z \mid x)$. The decoder then reconstructs the input from that sample. The second term of equation (2), which concerns reconstruction, is a log-likelihood: it expresses whether the reconstructed $x$ takes the same value as the input data $x$.

As noted above, the value output by the decoder's final layer is therefore not $x$ itself. It is the parameter $\theta$ that determines the shape of the probability distribution $p_\theta(x \mid z)$ of $x$.

Suppose, for example, that the data $x$ is a black-and-white image. Its distribution is then taken to be a Bernoulli distribution. The parameter $\theta$ of that distribution is used to calculate the probability $p_\theta(x \mid z)$ that the output equals the input $x$, and taking its logarithm gives $\log p_\theta(x \mid z)$. The expectation in the second term of equation (2),
$$ \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big], $$
is regarded as being computed equivalently by processing multiple samples in a batch.
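Here is a minimal sketch of the reconstruction term for a binary image. It assumes the decoder outputs one Bernoulli parameter $\theta_k$ per pixel. Averaging over several sampled $z$ stands in for the expectation, as the batch processing described above does. Clipping $\theta$ with a small `eps` is an implementation assumption that keeps the logarithms finite. The function names and values are hypothetical.

```python
import math


def bernoulli_log_likelihood(x, theta, eps=1e-7):
    """log p_theta(x | z) = sum_k [x_k log theta_k + (1 - x_k) log(1 - theta_k)].

    x: binary pixels (0 or 1) of the input image.
    theta: decoder outputs in (0, 1), one Bernoulli parameter per pixel.
    """
    total = 0.0
    for xk, tk in zip(x, theta):
        tk = min(max(tk, eps), 1.0 - eps)  # clip away from 0 and 1
        total += xk * math.log(tk) + (1 - xk) * math.log(1.0 - tk)
    return total


def expected_reconstruction(x, thetas):
    """Average log-likelihood over decoder outputs for several sampled z."""
    return sum(bernoulli_log_likelihood(x, t) for t in thetas) / len(thetas)


x = [1, 0, 1, 1]
good = [0.9, 0.1, 0.8, 0.95]  # hypothetical decoder output close to x
bad = [0.2, 0.7, 0.4, 0.5]    # hypothetical decoder output far from x
print(bernoulli_log_likelihood(x, good) > bernoulli_log_likelihood(x, bad))  # True
```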

FIG. 1 schematically shows an example of a VAE in the prior art. As shown in FIG. 1, an input $X$ (an $n_{X_{in}}$-dimensional vector) passes through an encoder built from a neural network. The encoder outputs the mean ($n_z$ dimensions) and the variances ($n_z$ dimensions) of a Gaussian distribution. A concrete value of $z$ is then sampled based on the encoder's output and fed to a decoder built from a neural network, which outputs an $n_{X_{out}}$-dimensional vector. The decoder output is optimized to match the input $X$, and the input and output have the same number of dimensions ($n_{X_{in}} = n_{X_{out}}$).
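The prior-art flow of FIG. 1, from encoder output to sampled $z$, can be sketched as follows. This sketch makes two assumptions. First, the encoder's $2 \times n_z$ outputs are laid out as $n_z$ means followed by $n_z$ variances. Second, $z$ is drawn as $\mu + \sigma \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, 1)$, which is the reparameterization used in Non-Patent Document 1. Many implementations output log-variances rather than raw variances; the raw-variance layout here simply follows the text. The helper names are hypothetical.

```python
import math
import random


def split_encoder_output(h, n_z):
    """Split a 2*n_z encoder output into (means, variances)."""
    if len(h) != 2 * n_z:
        raise ValueError("expected 2 * n_z values")
    return h[:n_z], h[n_z:]


def sample_latent(mean, var, rng):
    """Reparameterized sample z = mu + sqrt(var) * eps with eps ~ N(0, 1)."""
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mean, var)]


rng = random.Random(0)
n_z = 2
h = [0.3, -0.7, 0.04, 0.09]  # hypothetical encoder output: 2 means, 2 variances
mean, var = split_encoder_output(h, n_z)
samples = [sample_latent(mean, var, rng) for _ in range(50_000)]
emp_mean = [sum(s[k] for s in samples) / len(samples) for k in range(n_z)]
print(emp_mean)  # should lie close to the means [0.3, -0.7]
```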

Patent Document 1: International Publication No. WO2014105866A1

Non-Patent Document 1: Diederik P. Kingma and Max Welling, "Auto-Encoding Variational Bayes," December 20, 2013 (available at https://arxiv.org/abs/1312.6114)
Non-Patent Document 2: John R. Hershey and Peder A. Olsen, "Approximating the Kullback Leibler Divergence Between Gaussian Mixture Models," April 15-20, 2007 (available at http://ieeexplore.ieee.org/document/4218101/)

The VAE proposed in Non-Patent Document 1 has a stochastic element. However, the output of its neural network in the latent space is not the value of $z$ itself. It is a set of parameters that determine the shape of the probability distribution of the values $z$ can take. As described above, that VAE treats $q_\phi(z \mid x)$ as an $n_z$-dimensional multivariate Gaussian distribution. The parameters $\phi$ in its latent-space layer are $n_z$ means and $n_z$ variances, and the model is simplified by setting all covariances to zero.

However, when a designer wants the latent space to take a more complex distribution, more parameters are needed to determine its shape. Suppose, for example, that the latent distribution is a 10-dimensional multivariate Gaussian. Determining its shape then requires 10 means, 10 variances, and (10 × 10 − 10)/2 = 45 covariances. If the latent distribution is instead a Gaussian mixture or similar, the problem becomes even more complex.
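The counts in this example follow directly from the size of a symmetric covariance matrix. The sketch below reproduces the 10-dimensional figure. As an illustrative extension not stated in the text, it also counts the parameters of a K-component full-covariance Gaussian mixture, assuming K − 1 free mixture weights. The function names are hypothetical.

```python
def diagonal_gaussian_params(n):
    """Parameters of the prior-art VAE latent layer: n means and n variances."""
    return 2 * n


def full_gaussian_params(n):
    """n means + n variances + (n*n - n)/2 distinct off-diagonal covariances."""
    return n + n + (n * n - n) // 2


def gmm_params(n, k):
    """k full-covariance Gaussian components plus k - 1 free mixture weights."""
    return k * full_gaussian_params(n) + (k - 1)


print(diagonal_gaussian_params(10), full_gaussian_params(10), gmm_params(10, 3))
# 20 65 197
```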

To solve the above problem, an object of the present invention is to provide an information estimation device and an information estimation method that realize a new auto-encoder with a stochastic element.

To achieve this object, the present invention provides an information estimation device and an information estimation method that change what the encoder outputs in the latent space. In the prior-art VAE, the output $z$ is a set of parameters that determine the distribution of $z$. In the present invention, it is instead the value of $z$ itself, as in the classic auto-encoder described above. Moreover, that value is not a deterministic value, as in the classic auto-encoder, but a random variable sampled from a probability distribution.

To achieve the above object, an information estimation device according to the present invention is, for example, an information estimation device that performs estimation processing using a neural network, comprising:
an auto-encoder calculation unit that has an auto-encoder composed of an encoder and a decoder, and that is configured to perform calculation processing sequentially in the encoder and the decoder based on input data given to the auto-encoder and to output, from the auto-encoder, output data as the result of the estimation processing,
wherein at least one integrated layer, consisting of a combination of a dropout layer that drops out part of the data and a fully connected layer that performs a weight calculation on the data output from the dropout layer, is provided as the final layer of the encoder, so that the output value in the latent space, which is the output value of the encoder, is a multidimensional random variable vector.

Likewise, to achieve the above object, an information estimation method according to the present invention is, for example, an information estimation method performed by an information estimation device that performs estimation processing using a neural network, comprising:
an auto-encoder calculation step of using an auto-encoder composed of an encoder and a decoder to perform calculation processing sequentially in the encoder and the decoder based on input data given to the auto-encoder, and outputting, from the auto-encoder, output data as the result of the estimation processing,
wherein at least one integrated layer, consisting of a combination of a dropout layer that drops out part of the data and a fully connected layer that performs a weight calculation on the data output from the dropout layer, is provided as the final layer of the encoder, so that the output value in the latent space, which is the output value of the encoder, is a multidimensional random variable vector.

The present invention realizes a new auto-encoder with a stochastic element. It can accommodate a probability distribution of arbitrary shape in the latent space while limiting growth in the number of latent dimensions (the number of neurons). In addition, the shape of the probability distribution in the latent space can be estimated by analytical calculation. This makes it possible to evaluate more accurately how the input data are separated in the latent space.

FIG. 1 is a diagram schematically showing an example of a VAE in the prior art.
FIG. 2 is a diagram schematically showing a first example of the auto-encoder in the first embodiment of the present invention.
FIG. 3 is a diagram showing details of the DF layer for the first example of the auto-encoder in the first embodiment of the present invention.
FIG. 4 is a diagram showing a second example of the auto-encoder in the first embodiment of the present invention.
FIG. 5 is a diagram showing details of the DF layers for the second example of the auto-encoder in the first embodiment of the present invention.
FIG. 6 is a block diagram showing an example of the configuration of an information estimation device that includes the auto-encoder calculation processing function in the first embodiment of the present invention.
FIG. 7 is a flowchart showing an example of the calculation processing in the first embodiment of the present invention.
FIG. 8(a) illustrates a display method that shows the contour ellipse at σ, which represents the width of a Gaussian distribution, together with a scatter plot of points sampled from that distribution by Monte Carlo sampling. FIG. 8(b) illustrates a display method that shows the same σ contour ellipse together with the center of the Gaussian ellipse, i.e., the point at the mean.
FIG. 9 shows the distribution of the values of $z$ in the latent space for a latent dimension of $n_z = 2$, obtained in an experiment using the information estimation device of the first embodiment of the present invention, drawn with the display method of FIG. 8(a).
FIG. 10 shows the distribution of the values of $z$ in the latent space for $n_z = 2$, obtained in the same kind of experiment, drawn with the display method of FIG. 8(b).
FIG. 11(a) is a diagram created to evaluate experimental results obtained with the information estimation device of the first embodiment of the present invention, showing input images reconstructed by the auto-encoder before training. FIG. 11(b) is a diagram created for the same evaluation, showing input images reconstructed by the auto-encoder after training.
FIG. 12 shows experimental results in which the posterior distribution (Gaussian) of FIG. 9 is extended to a Gaussian mixture according to the second embodiment of the present invention, for the case where the input image is the letter "H" shown at the upper right. The analytically calculated Gaussian mixture is shown by contour lines, overlaid with a Monte Carlo scatter plot of the distribution.
FIG. 13 shows another experimental result for the same extension of the posterior distribution (Gaussian) of FIG. 9 to a Gaussian mixture according to the second embodiment of the present invention, again for the input image of the letter "H" shown at the upper right. The analytically calculated Gaussian mixture is shown by contour lines, overlaid with a Monte Carlo scatter plot of the distribution.

The first and second embodiments of the present invention are described below with reference to the drawings.

<First Embodiment>
In the first embodiment of the present invention, the output $z$ of the auto-encoder in the latent space is not a set of parameters that determine the distribution of $z$. Instead, as in the classic auto-encoder described above, it is the value of $z$ itself. Unlike in the classic auto-encoder, however, this value is not deterministic but is a random variable sampled from a probability distribution.

Specifically, in the first embodiment of the present invention, a dropout layer is added to the neural network that constitutes the encoder. As a result, for fixed input data, the value output from the encoder becomes a random variable. The shape of that random variable's distribution is obtained by calculating analytically how the Bernoulli distribution introduced by dropout propagates through the neural network. This distribution is then used in the regularization calculation, as in the prior-art VAE.

The structure of the auto-encoder in the embodiment of the present invention is described below with reference to FIGS. 2 to 5:
- FIG. 2 schematically shows a first example of the auto-encoder in the first embodiment, and FIG. 3 shows details of its DF layer. In this example, the encoder has one dropout layer.
- FIG. 4 shows a second example of the auto-encoder in the first embodiment, and FIG. 5 shows details of its DF layers. In this example, the encoder has two dropout layers.

In the auto-encoder of the first embodiment of the present invention, two layers are added to the encoder of a classic auto-encoder. The first is a dropout layer, which introduces randomness by dropping out part of the input data. The second is a fully connected (FC) layer, which performs the weight calculation on the output of the dropout layer. The distribution of the values output from the dropout layer and the FC layer is then calculated analytically and used as the regularization condition. For brevity, this specification calls the integrated layer that combines a dropout layer and an FC layer a DF layer, and treats the dropout-layer and FC-layer calculations as being performed together.
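Below is a minimal sketch of one stochastic forward pass through a DF layer (dropout followed by an FC layer). The text does not specify two details, so both are assumptions of this sketch. First, the dropout mask is drawn over the input elements and shared by every output neuron. Second, no $1/(1 - p_{drop})$ rescaling is applied, although common frameworks apply it during training. The weights, sizes, and names are hypothetical. Repeated calls on the same fixed input generally give different latent vectors.

```python
import random


def df_layer(x_in, weights, bias, p_drop, rng):
    """One stochastic pass through a DF layer (dropout, then fully connected).

    Each input element x_in[j] is dropped (set to zero) with probability p_drop,
    and the same mask applies to every output neuron i. No 1/(1 - p_drop)
    rescaling is applied (an assumption of this sketch).
    """
    mask = [0.0 if rng.random() < p_drop else 1.0 for _ in x_in]
    return [sum(w_ij * m_j * x_j for w_ij, m_j, x_j in zip(row, mask, x_in)) + b_i
            for row, b_i in zip(weights, bias)]


rng = random.Random(42)
x_in = [1.0, 2.0, -1.0, 0.5]   # fixed input to the DF layer (n_h = 4)
W = [[0.2, -0.1, 0.4, 0.3],    # hypothetical weights for n_z = 2 outputs
     [-0.5, 0.2, 0.1, -0.2]]
b = [0.1, -0.1]
z1 = df_layer(x_in, W, b, 0.5, rng)
z2 = df_layer(x_in, W, b, 0.5, rng)
print(z1, z2)  # the same fixed input generally yields different latent vectors
```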

First, the case where the encoder has one dropout layer is described; FIG. 2 shows this case. In the conventional VAE shown in FIG. 1, the number of dimensions of the latent-space values equals the number of parameters of the probability distribution of $z$. In the auto-encoder of the first embodiment shown in FIG. 2, by contrast, the number of dimensions of the latent-space values is the dimension $n_z$ of $z$ itself.

FIG. 3 shows the DF1 layer of the encoder when the encoder has one dropout layer. It extracts the dropout layer and FC layer contained in the encoder of FIG. 2. The input $X_{in}^{DF1}$ to the DF1 layer in FIG. 3 is a fixed value, and its output $X_{out}^{DF1}$ is a random variable produced by the dropout layer. The probability distribution of $X_{out}^{DF1}$ can be calculated using, for example, the calculation method proposed in Patent Document 1. That calculation method is described below.

The following notation is used for the DF1 layer:
- $X_{in}^{DF1}$ is the input to the DF1 layer and $X_{out}^{DF1}$ is its output.
- $p_{drop}^{DF1}$ is the dropout rate (the probability of randomly dropping data) preset for the dropout layer of the DF1 layer.
- $W_{i,j}^{DF1}$ are the weights and $b_i^{DF1}$ the biases preset for the FC layer of the DF1 layer. The indices $i$ and $j$ are integers satisfying $1 \le i \le n_{X_{out}^{DF1}}$ and $1 \le j \le n_{X_{in}^{DF1}}$.
- $n_{X_{in}^{DF1}}$ and $n_{X_{out}^{DF1}}$ denote the numbers of dimensions of $X_{in}^{DF1}$ and $X_{out}^{DF1}$, respectively.

The input X_in^DF1 to the DF1 layer is a fixed value: an n_Xin^DF1-dimensional vector of constants, expressed as follows.

The output X_out^DF1 from the DF1 layer, on the other hand, is expressed as follows.

The output X_out^DF1 is an n_Xout^DF1-dimensional vector whose i-th element is given by the following.

Because of dropout in the dropout layer, each term W_{i,j}^DF1 X_in,j^DF1 (1 ≤ j ≤ n_Xin^DF1) on the right-hand side randomly vanishes (becomes zero) with probability p_drop^DF1. The left-hand side X_out,i^DF1, which is the sum of these terms, can therefore be treated and computed as a "sampling sum". The output X_out^DF1 is thus a random variable and is assumed, for example, to follow the following n_Xout^DF1-dimensional multivariate Gaussian distribution.

Here μ_out^DF1 is the n_Xout^DF1-dimensional mean vector and Σ_out^DF1 is the n_Xout^DF1 × n_Xout^DF1 variance-covariance matrix. They are obtained from the following equations.
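The equations referenced here did not survive extraction. As a derivation sketch (not a reproduction of the patent's own equations), assume that each input component j is kept independently with probability 1 − p, where p = p_drop^DF1, that a single mask m is shared by all output units i, and that kept terms are not rescaled. Writing the sampling sum with Bernoulli mask variables m_j, the mean and covariance follow from E[m_j] = 1 − p and Var[m_j] = p(1 − p):

```latex
X^{DF1}_{\mathrm{out},i} = \sum_{j=1}^{n^{DF1}_{X_\mathrm{in}}} m_j\, W^{DF1}_{i,j}\, X^{DF1}_{\mathrm{in},j} + b^{DF1}_i ,
\qquad m_j \sim \mathrm{Bernoulli}(1-p)\ \text{i.i.d.}

\mu^{DF1}_{\mathrm{out},i} = (1-p) \sum_{j} W^{DF1}_{i,j}\, X^{DF1}_{\mathrm{in},j} + b^{DF1}_i

\bigl(\Sigma^{DF1}_{\mathrm{out}}\bigr)_{i,k}
  = \sum_{j} \mathrm{Var}[m_j]\, W^{DF1}_{i,j} W^{DF1}_{k,j} \bigl(X^{DF1}_{\mathrm{in},j}\bigr)^2
  = p(1-p) \sum_{j} W^{DF1}_{i,j} W^{DF1}_{k,j} \bigl(X^{DF1}_{\mathrm{in},j}\bigr)^2
```

Because the same mask element m_j enters every output i, the off-diagonal entries are generally nonzero. If inverted dropout (rescaling kept terms by 1/(1 − p)) were used instead, the mean would lose its (1 − p) factor and the covariance factor would become p/(1 − p).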

The output of the DF1 layer in FIG. 3 is the output of the encoder of the autoencoder in FIG. 2, and corresponds to the probability distribution q_φ(z|x) of the latent value z output by the encoder. The notation can therefore be changed by replacing X_out^DF1 with z, μ_out^DF1 with μ_z, Σ_out^DF1 with Σ_z, n_Xin^DF1 with n_h, and n_Xout^DF1 with n_z, so that the latent value z output by the encoder is expressed as the following multivariate Gaussian distribution.

Here μ_z is an n_z-dimensional vector and Σ_z is an n_z × n_z variance-covariance matrix.
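A minimal NumPy sketch of these single-DF-layer moments, under the assumptions of independent per-input masks shared across outputs and no 1/(1 − p) rescaling. The function names and the exhaustive-enumeration checker are illustrative, not taken from the patent or from Patent Document 1. Enumeration is only feasible for a handful of inputs; it is included to show that the closed form gives the exact mean and covariance of the dropout sampling sum.

```python
import itertools

import numpy as np


def df1_moments(x_in, W, b, p_drop):
    """Analytic mean and covariance of a dropout + FC (DF) layer with fixed input.

    Assumptions (sketch): each input component is dropped independently with
    probability p_drop, one mask is shared by all output units, and kept
    components are NOT rescaled by 1 / (1 - p_drop).
    """
    x_in = np.asarray(x_in, dtype=float)
    keep = 1.0 - p_drop
    mu = keep * (W @ x_in) + b
    # Cov(X_i, X_k) = p(1 - p) * sum_j W_ij * W_kj * x_j^2
    cov = p_drop * keep * (W * x_in**2) @ W.T
    return mu, cov


def df1_moments_exact(x_in, W, b, p_drop):
    """Brute-force check: enumerate every dropout mask (small inputs only)."""
    x_in = np.asarray(x_in, dtype=float)
    n_out = W.shape[0]
    mean = np.zeros(n_out)
    second = np.zeros((n_out, n_out))
    for mask in itertools.product([0.0, 1.0], repeat=x_in.size):
        m = np.array(mask)
        prob = np.prod(np.where(m == 1.0, 1.0 - p_drop, p_drop))
        out = W @ (m * x_in) + b
        mean += prob * out
        second += prob * np.outer(out, out)
    return mean, second - np.outer(mean, mean)
```

Renaming the outputs as in the text (mu as μ_z, cov as Σ_z) gives the parameters of q_φ(z|x) for an encoder whose final layer is a single DF layer with n_h inputs and n_z outputs.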

Next, the case where the encoder has two dropout layers will be described. FIG. 4 shows this more complex case. FIG. 5 shows the DF1 layer, the ReLU (Rectified Linear Unit) layer, and the DF2 layer of the encoder in this case; it is the portion of the encoder of FIG. 4 consisting of the two dropout-plus-FC combinations and the ReLU layer sandwiched between them. The calculation method when there are two DF layers is described below.

In FIG. 5, two DF layers, the DF1 layer and the DF2 layer, are arranged with a ReLU layer between them. The input and output of the first layer, DF1, are as described above. A nonlinear function such as the ReLU layer between DF1 and DF2 can be handled, for example, by a multivariate Gaussian approximation as in Patent Document 1, or by a simpler approximation that judges whether the Gaussian lies in the negative or the positive region (the calculation method described in the specification and drawings of Japanese Patent Application No. 2017-196740, filed with the present inventor as inventor and unpublished at the time of filing of this application). The present invention is not, however, limited to these calculation methods.
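Neither of the two cited ReLU methods is reproduced here. As one common stand-in (an assumption, not the cited methods), the mean and variance of max(0, X) for a Gaussian X have a closed form, which can be applied element by element to the DF1 output before it enters DF2. This sketch handles one component at a time, so it keeps only the diagonal; how covariances between ReLU outputs are propagated is left open.

```python
import math


def relu_gaussian_moments(mu, var):
    """Mean and variance of max(0, X) for scalar X ~ N(mu, var)."""
    if var <= 0.0:
        # Degenerate (deterministic) input: ReLU is applied directly.
        return max(mu, 0.0), 0.0
    sigma = math.sqrt(var)
    a = mu / sigma
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)  # standard normal pdf at a
    cdf = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))          # standard normal cdf at a
    mean = mu * cdf + sigma * pdf
    second = (mu * mu + var) * cdf + mu * sigma * pdf         # E[max(0, X)^2]
    # Clip tiny negative values caused by floating-point cancellation.
    return mean, max(second - mean * mean, 0.0)
```

For a DF1 output with mean vector μ and covariance Σ, the function would be called once per component with (μ_i, Σ_ii).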

The input and output of the second layer, DF2, are described next. Let X_in^DF2 be the input to the DF2 layer, X_out^DF2 its output, and p_drop^DF2 its dropout rate. Let W_{i,j}^DF2 be the weights and b_i^DF2 the biases of the FC layer of the DF2 layer, where the indices i and j are integers satisfying 1 ≤ i ≤ n_Xout^DF2 and 1 ≤ j ≤ n_Xin^DF2. The notation n_Xin^DF2 denotes n with the subscript X_in^DF2, and n_Xout^DF2 denotes n with the subscript X_out^DF2.

Both the input X_in^DF2 and the output X_out^DF2 of the DF2 layer are random variables following multivariate Gaussian distributions, expressed as follows.

Here μ_in^DF2 is an n_Xin^DF2-dimensional vector, Σ_in^DF2 is an n_Xin^DF2 × n_Xin^DF2 variance-covariance matrix, μ_out^DF2 is an n_Xout^DF2-dimensional vector, and Σ_out^DF2 is an n_Xout^DF2 × n_Xout^DF2 variance-covariance matrix.

The mean can be calculated as follows.

The variance-covariance matrix can be calculated as follows.

The first term on the right-hand side can be calculated as follows.
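The DF2 equations are also missing from this text. Under the assumptions used for DF1 (independent masks, no rescaling), plus the assumption that the DF2 dropout mask is independent of X_in^DF2, a direct expansion gives the following. The split into a (1 − p)² term and a p(1 − p) term is one convenient way to write the result and is not necessarily the decomposition used in the original equations. Here p = p_drop^DF2, and the superscript DF2 is omitted on W, b, μ_in, Σ_in, μ_out and Σ_out for brevity:

```latex
\mu_{\mathrm{out},i} = (1-p) \sum_{j} W_{i,j}\, \mu_{\mathrm{in},j} + b_i

\bigl(\Sigma_{\mathrm{out}}\bigr)_{i,k}
  = (1-p)^2 \sum_{j}\sum_{l} W_{i,j}\, W_{k,l}\, \bigl(\Sigma_{\mathrm{in}}\bigr)_{j,l}
  + p(1-p) \sum_{j} W_{i,j}\, W_{k,j} \Bigl( \bigl(\Sigma_{\mathrm{in}}\bigr)_{j,j} + \mu_{\mathrm{in},j}^2 \Bigr)
```

In matrix form this is Σ_out = (1 − p)² W Σ_in Wᵀ + p(1 − p) W diag(diag(Σ_in) + μ_in ∘ μ_in) Wᵀ. Setting Σ_in = 0 and μ_in = X_in^DF1 recovers the single-layer DF1 result, which serves as a consistency check.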

The output of the DF2 layer in FIG. 5 is the output of the encoder of the autoencoder in FIG. 4, and corresponds to the probability distribution q_φ(z|x) of the latent value z output by the encoder. As in the single-dropout-layer case, the notation can therefore be changed by replacing X_out^DF2 with z, μ_out^DF2 with μ_z, Σ_out^DF2 with Σ_z, n_Xin^DF2 with n_h, and n_Xout^DF2 with n_z, so that the latent value z output by the encoder is expressed as the following multivariate Gaussian distribution.
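The DF2 moments can be sketched in NumPy in the same way. As before, the names are illustrative. The assumptions are independent per-input masks, a mask independent of X_in^DF2, and no rescaling. The checker conditions on each mask: given the mask, the output is an affine function of a Gaussian and hence Gaussian, so the exact moments of the resulting mixture can be summed over all masks.

```python
import itertools

import numpy as np


def df2_moments(mu_in, cov_in, W, b, p_drop):
    """Mean and covariance of a DF layer whose input is Gaussian N(mu_in, cov_in)."""
    mu_in = np.asarray(mu_in, dtype=float)
    cov_in = np.asarray(cov_in, dtype=float)
    keep = 1.0 - p_drop
    mu_out = keep * (W @ mu_in) + b
    # E[X_in,j^2] = Var[X_in,j] + mean^2, needed for the dropout-variance term.
    second_moment = np.diag(cov_in) + mu_in**2
    cov_out = keep**2 * (W @ cov_in @ W.T) + p_drop * keep * (W * second_moment) @ W.T
    return mu_out, cov_out


def df2_moments_exact(mu_in, cov_in, W, b, p_drop):
    """Brute-force check over all masks (small inputs only)."""
    mu_in = np.asarray(mu_in, dtype=float)
    cov_in = np.asarray(cov_in, dtype=float)
    n_out = W.shape[0]
    mean = np.zeros(n_out)
    second = np.zeros((n_out, n_out))
    for mask in itertools.product([0.0, 1.0], repeat=mu_in.size):
        m = np.array(mask)
        prob = np.prod(np.where(m == 1.0, 1.0 - p_drop, p_drop))
        Wm = W * m                    # zero the columns of dropped inputs
        mu_m = Wm @ mu_in + b         # conditional mean given the mask
        cov_m = Wm @ cov_in @ Wm.T    # conditional covariance given the mask
        mean += prob * mu_m
        second += prob * (cov_m + np.outer(mu_m, mu_m))
    return mean, second - np.outer(mean, mean)
```

Passing the DF1 moments through the ReLU approximation described above and then through df2_moments, and renaming the result as μ_z and Σ_z, corresponds to the two-layer encoder of FIGS. 4 and 5; a further DF layer would call the same function again.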

Although the case of two dropout layers has been described here, three or more dropout layers may be provided. For example, the output of the DF2 layer may be fed into a further (third) dropout layer; in that case as well, the output of the further dropout layer can be obtained by the same calculation method as for the DF2 layer described above.

As described above, in the first embodiment of the present invention, fixed-valued input data is converted into a random variable by dropout, giving rise to a probability distribution, and that distribution is computed analytically. As with the conventional VAE, the result is then used in a regularization condition: a constraint is imposed that keeps the probability distribution q_φ(z|x), expressed by the following equation, close in shape to the prior distribution p_θ(z), expressed by the following equation, so that the two do not differ too much.

For example, to judge whether q_φ(z|x) and p_θ(z) remain close in shape, the KL divergence between multivariate Gaussian distributions is used, as described above, and a cost function is set that minimizes the distance between the two multivariate Gaussian distributions. The equation is shown below.
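The cost equation itself is not reproduced in this text. For reference, the standard closed form of the KL divergence from q_φ(z|x) = N(μ_z, Σ_z) to a Gaussian prior p_θ(z) = N(μ_p, Σ_p) is shown below, where μ_p and Σ_p are symbols introduced here for the prior's parameters. It uses the full Σ_z, off-diagonal covariances included:

```latex
D_{\mathrm{KL}}\bigl(q_\phi(z\mid x)\,\Vert\,p_\theta(z)\bigr)
  = \frac{1}{2}\Bigl[
      \operatorname{tr}\bigl(\Sigma_p^{-1}\Sigma_z\bigr)
      + (\mu_p-\mu_z)^{\top}\Sigma_p^{-1}(\mu_p-\mu_z)
      - n_z
      + \ln\frac{\det\Sigma_p}{\det\Sigma_z}
    \Bigr]
```

The log-determinant term requires Σ_z to be positive definite. With a single DF layer, Σ_z = p(1 − p) W diag(X_in ∘ X_in) Wᵀ, which is positive definite only if the columns of W belonging to nonzero input components span the n_z-dimensional latent space.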

The calculation method of the first embodiment differs significantly from the prior-art method disclosed in Non-Patent Document 1 in that it computes covariance values. Non-Patent Document 1 does not obtain covariances: they are set to zero, and obtaining them would require further increasing the number of neurons. In the first embodiment, by contrast, the covariances are also obtained through the analytical calculation described above, while the encoder uses fewer neurons.

The calculation method of the first embodiment also makes it simpler than with the prior-art VAE to evaluate whether the autoencoder output reproduces the input data. In the prior art, the encoder output consists of the parameters of the probability distribution of z, so to obtain a value to feed to the decoder, that distribution must additionally be constructed and a value of z sampled from it. In the first embodiment, by contrast, the encoder output is itself a value of z and can be used directly as the decoder input. Processing in the decoder after z is obtained is the same in the first embodiment and in the prior art.
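The difference can be made concrete with a sketch of a single stochastic forward pass through a one-DF-layer encoder (the function name and the fixed numbers are illustrative assumptions). Each pass draws a fresh dropout mask, and the resulting vector is itself a sample of z that can be handed directly to the decoder, with no separate step of building a distribution and sampling from it. Averaging many passes approaches the analytic mean (1 − p) W x + b.

```python
import numpy as np


def encoder_sample(x_in, W, b, p_drop, rng):
    """One stochastic pass through a DF layer; the result is a sample of z."""
    x_in = np.asarray(x_in, dtype=float)
    keep_mask = rng.random(x_in.shape) >= p_drop  # True with probability 1 - p_drop
    return W @ (keep_mask * x_in) + b             # kept terms are not rescaled


# Illustrative values: n_h = 4 inputs, n_z = 2 latent dimensions, p_drop = 0.7.
W = np.array([[0.5, -1.0, 0.25, 2.0],
              [1.0, 0.5, -0.5, 0.0]])
b = np.array([0.1, -0.2])
x = np.array([1.0, 2.0, -1.0, 0.5])
rng = np.random.default_rng(0)
samples = np.array([encoder_sample(x, W, b, 0.7, rng) for _ in range(20000)])
analytic_mean = (1.0 - 0.7) * (W @ x) + b
```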

In the first embodiment, the dropout rate is used to express the probability distribution of z generated by the encoder. For example, when there is only one dropout layer, the dropout rate is preferably set to a relatively large value (for example, 0.7 or more).

Next, an information estimation apparatus capable of executing the processing of the first embodiment will be described. FIG. 6 is a block diagram showing an example configuration of the information estimation apparatus according to the first embodiment of the present invention. The information estimation apparatus 10 in FIG. 6 is an estimator that performs estimation using a neural network, and includes an autoencoder calculation unit 20, an encoder output distribution shape calculation unit 30, a cost function calculation unit 40, and a parameter optimization calculation unit 50.

The block diagram in FIG. 6 shows only the functions related to the present invention; in an actual implementation, these may be realized by hardware, software, firmware, or any combination thereof. Functions implemented in software may be stored as one or more instructions or code on any computer-readable medium and executed by a hardware-based processing unit such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit). The functions related to the present invention may also be realized by various devices, including ICs (Integrated Circuits) and IC chipsets.

The autoencoder calculation unit 20 has an autoencoder comprising an encoder and a decoder, each built from a neural network, and processes the input data X through the encoder and decoder to produce the output data X (the reconstruction of the input). As described with reference to FIGS. 2 to 5, the autoencoder used by the autoencoder calculation unit 20 has one or more dropout layers in the encoder, in which part of the data is randomly dropped. This makes the encoder output value z (the output in the latent space) a random variable.

The encoder output distribution shape calculation unit 30 analytically calculates the shape of the probability distribution that the input data x takes on as a result of dropout in the encoder. For example, it can calculate the distribution shape of the latent output z from the input data x, the dropout rate of the dropout layer, and the parameters (for example, the weights and biases of the FC layer).

The cost function calculation unit 40 evaluates, from the distribution shape calculated by the encoder output distribution shape calculation unit 30 (the shape of the dropout-induced distribution of the latent output z), whether the regularization condition is satisfied. It also evaluates how similar the output x computed by the autoencoder calculation unit 20 is to the input x, and combines these two results into the value of the overall cost function.
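A minimal sketch of how the two parts might be combined. The text does not say which reconstruction measure is used, so a sum of squared errors is assumed here, and the weighting factor beta is a hypothetical knob (beta = 1 gives a plain sum). The regularization part is the closed-form KL divergence between the encoder's full-covariance Gaussian and a Gaussian prior.

```python
import numpy as np


def total_cost(x, x_rec, mu_z, cov_z, mu_prior, cov_prior, beta=1.0):
    """Reconstruction error plus KL( N(mu_z, cov_z) || N(mu_prior, cov_prior) )."""
    x, x_rec = np.asarray(x, float), np.asarray(x_rec, float)
    mu_z, mu_prior = np.asarray(mu_z, float), np.asarray(mu_prior, float)
    cov_z, cov_prior = np.asarray(cov_z, float), np.asarray(cov_prior, float)

    # How well the output reproduces the input (assumed: sum of squared errors).
    reconstruction = float(np.sum((x - x_rec) ** 2))

    # Regularization: closed-form KL between two multivariate Gaussians,
    # using the full covariance matrices (off-diagonal terms included).
    n_z = mu_z.size
    diff = mu_prior - mu_z
    _, logdet_prior = np.linalg.slogdet(cov_prior)
    _, logdet_z = np.linalg.slogdet(cov_z)
    kl = 0.5 * (np.trace(np.linalg.solve(cov_prior, cov_z))
                + diff @ np.linalg.solve(cov_prior, diff)
                - n_z + logdet_prior - logdet_z)
    return reconstruction + beta * float(kl)
```

Because slogdet and the log term assume a positive-definite cov_z, an implementation might add a small diagonal jitter to the analytic covariance; that detail is an assumption, not something the text specifies.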

The parameter optimization calculation unit 50 calculates the values to which the weights and biases used by the autoencoder calculation unit 20 should be set so as to optimize the cost function value calculated by the cost function calculation unit 40. Specifically, it calculates the parameters (weights and biases) that minimize the cost function; the resulting parameters are supplied to the autoencoder calculation unit 20, and the autoencoder parameters are updated.

In the information estimation apparatus 10 configured as above, this optimization is repeated over a large amount of input data X so that the autoencoder yields an optimal solution.

Next, an example of processing in the information estimation apparatus 10 shown in FIG. 6 will be described with reference to FIG. 7, which is a flowchart showing an example of processing of the information estimation apparatus according to the first embodiment of the present invention.

In the flowchart of FIG. 7, the autoencoder calculation unit 20 first initializes the autoencoder parameters (weights and biases) (step S101). When training data X is supplied as the autoencoder input X (step S102), the autoencoder calculation unit 20 calculates the latent value z in the encoder of the autoencoder (step S103).

The encoder output distribution shape calculation unit 30 calculates the distribution shape of the latent value z from the dropout rate, the input data X, and the parameters (weights and biases) (step S104). Information on this distribution shape is supplied to the cost function calculation unit 40.

The autoencoder calculation unit 20 then calculates the decoder output X from the latent value z (step S105). This decoder output X is supplied to the cost function calculation unit 40.

The cost function calculation unit 40 evaluates whether the regularization condition is satisfied based on the information on the distribution shape of z in the latent space, evaluates how similar the output X is to the input X, and combines the two results into the overall cost function value (step S106).

The parameter optimization calculation unit 50 calculates the parameters (weights and biases) that minimize the cost function value calculated by the cost function calculation unit 40, and the autoencoder parameters in the autoencoder calculation unit 20 are updated based on the result (step S107).

If new, unprocessed training data X exists ("Yes" in step S108), the process returns to step S102, and the same processing (steps S103 to S107) is executed on the new training data X. In this way, steps S103 to S107 are executed repeatedly over a large amount of training data X. When all training data X has been processed and no unprocessed data remains ("No" in step S108), the process ends.
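The control flow of FIG. 7 can be sketched as a plain loop. Every callable here is a hypothetical placeholder for the corresponding unit of FIG. 6. In particular, `update` stands for whatever gradient computation and optimizer step an actual implementation would use, which this sketch does not provide.

```python
from typing import Any, Callable, Iterable, List, Tuple


def train(
    data: Iterable[Any],
    init_params: Callable[[], Any],                                 # S101 (unit 20)
    encode: Callable[[Any, Any], Any],                              # S103 (unit 20)
    encoder_distribution: Callable[[Any, Any], Tuple[Any, Any]],    # S104 (unit 30)
    decode: Callable[[Any, Any], Any],                              # S105 (unit 20)
    cost: Callable[[Any, Any, Any, Any], float],                    # S106 (unit 40)
    update: Callable[[Any, Any, float], Any],                       # S107 (unit 50)
) -> Tuple[Any, List[float]]:
    """Run steps S101 to S108 once over `data`; returns final params and costs."""
    params = init_params()                              # S101: initialize weights/biases
    history: List[float] = []
    for x in data:                                      # S102; loop exits at S108 "No"
        z = encode(params, x)                           # S103: latent value z
        mu_z, cov_z = encoder_distribution(params, x)   # S104: analytic shape of z
        x_rec = decode(params, z)                       # S105: decoder output
        c = cost(x, x_rec, mu_z, cov_z)                 # S106: total cost
        params = update(params, x, c)                   # S107: parameter update
        history.append(c)
    return params, history
```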

Next, experimental results of actually performing learning optimization with the information estimation apparatus of the first embodiment are presented. The experiments use the autoencoder shown in FIGS. 2 and 3, with one dropout layer in the encoder, and the dimensionality of the latent value z is set to n_z = 2. The autoencoder is trained on MNIST data (a set of images of handwritten digits 0 to 9), which is widely used in the technical field of the present invention, so that it reconstructs the input MNIST images at its output.

For optimization, a root mean square (RMS) method is used with a learning rate of 0.001 to calculate the weights and biases of the autoencoder. The prior distribution described above is computed as follows.
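The text names only a "root mean square (RMS)" method. Assuming this refers to RMSProp, one parameter update could look like the sketch below. The decay rate 0.9 and epsilon 1e-8 are assumed defaults; only the learning rate 0.001 comes from the text.

```python
import numpy as np


def rmsprop_step(param, grad, cache, lr=0.001, decay=0.9, eps=1e-8):
    """One RMSProp update; `cache` is the running mean of squared gradients."""
    cache = decay * cache + (1.0 - decay) * grad**2
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache
```

Starting from a zero cache, the first step moves each parameter by about lr / sqrt(1 − decay), roughly 0.0032, opposite to the sign of its gradient.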

Of course, the off-diagonal elements of the variance-covariance matrix, that is, the covariances, may also be set to nonzero values to give positive or negative correlations.

FIGS. 9 and 10 show the distribution of z in the latent space for n_z = 2, obtained in experiments using the information estimation apparatus of the first embodiment. A two-dimensional Gaussian distribution can be visualized, for example, as shown in FIG. 8(a), by the σ-contour ellipse representing the width of the distribution together with a scatter plot of points sampled from it in Monte Carlo fashion (by repeating trials many times), or, as shown in FIG. 8(b), by the σ-contour ellipse together with the center of the ellipse, that is, the mean. FIG. 9 shows the experimental results in the style of FIG. 8(a), and FIG. 10 shows them in the style of FIG. 8(b).
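The σ-contour ellipse of FIG. 8 can be derived from a 2 × 2 covariance matrix by an eigendecomposition. The eigenvalues give the squared semi-axis lengths (per σ), and the eigenvector of the larger eigenvalue gives the orientation. The function below is an illustrative sketch; its name and return convention are assumptions.

```python
import numpy as np


def sigma_ellipse(cov, n_sigma=1.0):
    """Semi-axes and orientation of the n_sigma contour of a 2-D Gaussian.

    Returns (semi_major, semi_minor, angle_deg), with the angle of the major
    axis measured counterclockwise from the first latent axis, in [0, 180).
    """
    eigvals, eigvecs = np.linalg.eigh(np.asarray(cov, dtype=float))  # ascending order
    semi_minor, semi_major = n_sigma * np.sqrt(np.clip(eigvals, 0.0, None))
    vx, vy = eigvecs[:, 1]                  # eigenvector of the larger eigenvalue
    angle = np.degrees(np.arctan2(vy, vx)) % 180.0
    return float(semi_major), float(semi_minor), float(angle)
```

With matplotlib, these values map onto `matplotlib.patches.Ellipse(xy=mu_z, width=2 * semi_major, height=2 * semi_minor, angle=angle)`, since `Ellipse` takes full axis lengths rather than semi-axes.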

The results in FIGS. 9 and 10 show the distribution of z in the latent space after 5000 optimization iterations on MNIST data, with 400 points sampled in Monte Carlo fashion. For each single input image of a handwritten digit (one of 0 to 9) from the MNIST data, there is one distribution (ellipse) of z in the latent space. In FIGS. 9 and 10, the values of z corresponding to the different handwritten digits 0 to 9 are shown in different colors.

In the technical field of the present invention, it is customary to color-code the latent values z (for example, of a VAE) according to the handwritten digits 0 to 9 of the MNIST data. FIGS. 9 and 10 follow this convention so that they can be readily understood by those skilled in the art; they are originally color drawings, but colors are difficult to convey in monochrome drawings. In outline, the latent values z in FIGS. 9 and 10 are colored red for the digit 0, green for 1, blue for 2, yellow for 3, light blue for 4, purple for 5, orange for 6, pink for 7, gray for 8, and black for 9. Roughly speaking (this is not an exact description), relative to the center of FIGS. 9 and 10, the points form spread-out clusters in the following directions: red at 1 o'clock, green at 9 o'clock, blue at 12 o'clock, yellow at 5 o'clock, light blue at 5 o'clock, purple at 6 o'clock, orange at 5 o'clock, pink at 6 o'clock, gray at 11 o'clock, and black at 4 o'clock. Thus, in FIGS. 9 and 10, points of the same color, that is, of the same handwritten digit, form clusters in the two-dimensional latent space. This shows that the input MNIST data is automatically classified by digit (0 to 9) through unsupervised learning, without ground-truth labels.

In FIG. 9, for example, the posterior probability distribution q_φ(z|x) corresponding to each handwritten-digit input is drawn as an ellipse, based on the parameters (mean and variance-covariance) of the Gaussian distribution of z in the latent space obtained by the analytical calculation of the first embodiment. In addition, to verify visually that these analytically obtained posterior distributions (ellipses) are correct, 400 points per ellipse, scattered stochastically by dropout in Monte Carlo fashion, are plotted as a scatter plot. This was done only to confirm that the analytically obtained ellipses do capture the probability distribution produced by dropout; in actual use, the sampling needed to draw these individual points is unnecessary.

In the prior-art VAE disclosed in Non-Patent Document 1, by contrast, as described with reference to FIG. 1, what can be computed for the latent space at the center of the autoencoder is not the value of z itself but the parameters of the distribution of z. The prior-art VAE therefore cannot directly draw scatter plots of z like those in FIGS. 9 and 10. Moreover, because the prior-art VAE does not compute covariances, it cannot draw the elliptical shapes shown in FIGS. 9 and 10, that is, the shape of the probability distribution, which is determined only by the mean, variance, and covariance together. Consequently, with the prior-art VAE one cannot see whether the individual values of z for different input digit images overlap in the latent space or are cleanly separated.

To display distributions like those in FIGS. 9 and 10 using the results of a conventional VAE, one would need to add, as outputs describing the distribution of z, not only the mean μ_z and the variances diag(Σ_z) but also the covariances offdiag(Σ_z) in the latent space, train the weights, sample from the resulting distribution after training, and display the samples as a scatter plot. That is, computing covariances with a conventional VAE requires more parameters to determine the distribution shape and a more complex architecture.

FIGS. 11(a) and 11(b) were created to evaluate the experimental results obtained with the information estimation apparatus of the first embodiment. They are produced by sampling the two-dimensional latent space on a 20 × 20 grid, reconstructing the value at each grid point into a handwritten-digit image with the decoder, and arranging the images according to their grid positions. FIG. 11(a) shows the output obtained when the number of optimization iterations of the autoencoder is zero (iterations = 0, i.e., before training), and FIG. 11(b) shows the output after 5000 iterations (iterations = 5000, i.e., after training).
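A sketch of how such a grid figure could be assembled. The text gives only the 20 × 20 grid size; the latent range of ±3 used here and the decoding step itself (not shown) are assumptions. The first function lays out the grid points row by row, top row first, and the second tiles the decoded images in the same order.

```python
import numpy as np


def latent_grid(n=20, lim=3.0):
    """(n*n, 2) grid of latent points, row-major with the top row first."""
    xs = np.linspace(-lim, lim, n)   # left to right
    ys = np.linspace(lim, -lim, n)   # top to bottom
    gx, gy = np.meshgrid(xs, ys)
    return np.column_stack([gx.ravel(), gy.ravel()])


def tile_images(images, n):
    """Arrange (n*n, h, w) images, in grid order, into one (n*h, n*w) mosaic."""
    k, h, w = images.shape
    return images.reshape(n, n, h, w).transpose(0, 2, 1, 3).reshape(n * h, n * w)
```

Decoding each row of latent_grid() with the trained decoder and reshaping the outputs to 28 × 28 MNIST images before calling tile_images would give a layout like FIG. 11.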

With zero optimization iterations, the autoencoder output does not reconstruct the input handwritten-digit images and, as shown in FIG. 11(a), is merely random noise. After 5000 iterations, by contrast, the autoencoder output reconstructs the input handwritten-digit images, as shown in FIG. 11(b). Moreover, digits with similar shapes lie in nearby regions of the latent space, a result similar to that of the prior-art VAE.

<Second Embodiment>
Next, a second embodiment of the present invention will be described. In the first embodiment, the probability distribution q_φ(z|x) of z in the latent space is computed as a multivariate Gaussian distribution. However, when one of the terms x_in,j^DF W_{i,j}^DF used to compute the output X_out^DF of a DF layer is exceptionally large compared with the other terms, the approximation of the first embodiment, in which X_out^DF is treated as a multivariate Gaussian distribution, no longer holds. In that case, as described in Patent Document 1, such an outlying term x_in,j^DF W_{i,j}^DF, called a peak term, is handled by considering separately the case in which it is dropped out and the case in which it is not. In each case the peak term is treated not as a random variable but as a constant under the conditional probability, and the output can be computed as a multivariate Gaussian distribution as in the first embodiment. Because this yields one multivariate Gaussian distribution per case, each under its conditional probability, the output X_out^DF of the DF layer becomes a multivariate Gaussian "mixture" distribution.

In the first embodiment, the term corresponding to the weight calculation for the DF-layer output Xout^DF was written W^DF_{i,j} Xin^DF_j; in the second embodiment it is written xin^DF_j W^DF_{i,j}. The notation differs, but both denote the same term.

For a DF layer consisting of a dropout layer and a fully connected layer, the i-th element Xout^DF_i of the output vector is the sum of the products of the weights W and the inputs Xin^DF, plus a bias term b^DF_i, as expressed by the following equation.
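The equation referred to here is not reproduced in this text. Reconstructed from the definitions in the paragraph above (taking xin^DF_j to be the output of the dropout layer, so that dropped entries are zero), it can be written as:

```latex
X^{DF}_{out,i} = \sum_{j=1}^{n^{DF}_{X_{in}}} x^{DF}_{in,j}\, W^{DF}_{i,j} + b^{DF}_{i}
```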

If one of these terms is a peak term (j = peak) whose absolute value deviates from and is much larger than the others, that is, if the following condition holds, the output follows a Gaussian mixture of two Gaussian distributions.
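The condition itself is missing from this text. A reconstruction consistent with the description (the peak term's absolute value is much larger than that of every other term in the row) is:

```latex
\left| x^{DF}_{in,peak}\, W^{DF}_{i,peak} \right| \gg \left| x^{DF}_{in,j}\, W^{DF}_{i,j} \right| \qquad (j \neq peak)
```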

The inequality sign "≫" in the above expression means that the value on the left side is much larger than the value on the right side.

The following describes, for the more general case, how the probability distribution of the output vector Xout^DF of a DF layer (for example, the DF1 layer in FIG. 3) is calculated as a multivariate Gaussian mixture.

Exactly as in the first embodiment, the n^DF_Xout-dimensional output vector Xout^DF is a random-variable vector with n^DF_Xout elements, and its i-th element (1 ≤ i ≤ n^DF_Xout) is denoted Xout^DF_i. Each element Xout^DF_i is an expression containing n^DF_Xin xW terms indexed by j (1 ≤ j ≤ n^DF_Xin), as in the following equation.
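This equation is also missing from the text. Written out term by term, with xW_{i,j} used here as shorthand for the j-th xW term of row i (a reconstruction from the definitions above), it reads:

```latex
X^{DF}_{out,i} = xW_{i,1} + xW_{i,2} + \cdots + xW_{i,n^{DF}_{X_{in}}} + b^{DF}_{i},
\qquad xW_{i,j} = x^{DF}_{in,j}\, W^{DF}_{i,j}
```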

Here, the peak term (j = peak) is not an xW term that deviates with a large value within a single row i (Xout^DF_i). Rather, it is the most deviating xW term sharing a common index j across all rows 1 ≤ i ≤ n^DF_Xout, that is, the j-th column. The peak term therefore cannot be determined from the single row specified by one index i; the column of the peak term (j = peak) must be found by examining every row i, for example with the following procedure.

First, for every one of the n^DF_Xin columns, prepare a box PeakScore_j (1 ≤ j ≤ n^DF_Xin) that holds the column's degree of deviation, and initialize it to zero as follows.

Next, search for the peak term in the i-th row: calculate the mean xWMean_i of all xW_j terms (1 ≤ j ≤ n^DF_Xin) in that row.

The right side denotes the mean of the xW_j terms over all indices j in the i-th row. Then, for each xW_j term (1 ≤ j ≤ n^DF_Xin) in the i-th row, calculate a value xWDeviation_{i,j} indicating how far the term deviates from the mean. This value is calculated, for example, as the absolute value of the difference from the mean, as in the following equation.

This gives a score (degree of deviation) indicating how far the j-th term xW_j in the i-th row deviates from the mean. The above calculation is performed for every row (every index i), and the scores for each index j are accumulated. For example, the value of xWDeviation_{i,j} is added to the column box PeakScore_j described above, as follows.
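The four relations referred to in the preceding paragraphs (initialization, row mean, deviation, and accumulation) are not reproduced in this text. Reconstructed from the definitions given there, with xW_{i,j} denoting the j-th xW term of row i and the absolute difference used as the example deviation measure, they are:

```latex
\begin{aligned}
PeakScore_j &= 0 && (1 \le j \le n^{DF}_{X_{in}}) \\
xWMean_i &= \frac{1}{n^{DF}_{X_{in}}} \sum_{j=1}^{n^{DF}_{X_{in}}} xW_{i,j} \\
xWDeviation_{i,j} &= \left| xW_{i,j} - xWMean_i \right| \\
PeakScore_j &\leftarrow PeakScore_j + xWDeviation_{i,j}
\end{aligned}
```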

Repeating the above calculation for all rows (all indices i, 1 ≤ i ≤ n^DF_Xout) and updating PeakScore_j finally yields the degree of deviation of each column (each index j). The final PeakScore_j values (1 ≤ j ≤ n^DF_Xin) are then sorted in descending order, and the indices j of a predetermined number (for example, K) of the largest are recorded. In this way, K indices j (j_{k=1}, j_{k=2}, …, j_{k=K}) are identified as the candidate columns for the peak terms xW_j.
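The peak-column search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `find_peak_columns`, the list-of-rows layout for W, and the use of the absolute deviation as the score (the text gives it only as an example) are all assumptions. Indices are 0-based, whereas the text numbers columns from 1.

```python
def find_peak_columns(xin, W, K):
    """Return the 0-based indices of the K columns whose xW terms deviate most.

    xin: list of n_in input values (Xin^DF)
    W:   weight matrix W^DF as a list of n_out rows, each of length n_in
    K:   number of peak columns to keep
    """
    n_in = len(xin)
    peak_score = [0.0] * n_in  # PeakScore_j, initialised to zero

    for row in W:
        xw = [x * w for x, w in zip(xin, row)]  # xW_j terms of this row i
        mean = sum(xw) / n_in                   # xWMean_i
        for j, term in enumerate(xw):
            peak_score[j] += abs(term - mean)   # accumulate xWDeviation_{i,j}

    # Sort column indices by descending score (stable for ties) and keep the top K.
    order = sorted(range(n_in), key=lambda j: peak_score[j], reverse=True)
    return order[:K]
```

Scoring columns rather than individual entries reflects the point made above: one dropout decision for input unit j affects the whole column j, so a column is a peak candidate only if it stands out across all rows.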

Next, for the peak terms xW_j, the combinations of each being dropped out or not dropped out are considered, and a Gaussian mixture is constructed. When K peak terms are considered, the Gaussian mixture has 2^K components.

The larger the number K of terms recorded as peak terms, the more accurately the true probability distribution can be calculated; however, a larger K also increases the computational load. The value of K may therefore be specified in advance by the user, within a computationally feasible range, as a trade-off against the computational load. K can be 1 or any integer of 2 or more; if K is zero, the calculation is the same as in the first embodiment described above.

The following uses a concrete example to describe a calculation method in which, within the calculation of the first embodiment, the cases where each of the K peak terms xW_j (j = j_{k=1}, j_{k=2}, …, j_{k=K}) is dropped out or not dropped out are taken into account, and the probability distribution of the output Xout^DF is calculated by approximating it as a Gaussian distribution under the conditional probability of each case.

As a concrete example, let the number of peak terms be two (K = 2), and suppose that the indices j (j = j_{k=1}, j_{k=2}) of the two peak terms xW_j calculated from PeakScore_j are j_{k=1} = 3 and j_{k=2} = 5. That is, the peak terms are xW_{j=3} and xW_{j=5}.

There are 2^K = 2^2 = 4 combinations of the two peak terms xW_{j=3} and xW_{j=5} being dropped out or not dropped out, listed as cases (1) to (4) below.

(1) xW_{j=3} is dropped out; xW_{j=5} is dropped out
(2) xW_{j=3} is dropped out; xW_{j=5} is not dropped out
(3) xW_{j=3} is not dropped out; xW_{j=5} is dropped out
(4) xW_{j=3} is not dropped out; xW_{j=5} is not dropped out

Taking the four cases (1) to (4) into account, the probability distribution of the output Xout^DF becomes a multivariate Gaussian mixture of four components. With p^DF_Drop denoting the dropout rate of the DF layer, the probability of each of cases (1) to (4) is as follows.
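The case probabilities are not reproduced in this text. Assuming each input unit is dropped independently with probability p = p^DF_Drop, they follow directly from the case list (and sum to 1):

```latex
P_{(1)} = p^{2}, \qquad P_{(2)} = p\,(1-p), \qquad P_{(3)} = (1-p)\,p, \qquad P_{(4)} = (1-p)^{2},
\qquad p = p^{DF}_{Drop}
```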

The xin^DF_j W^DF_{i,j} terms for all indices j other than the peak-term indices j_{k=1} = 3 and j_{k=2} = 5 (1 ≤ j ≤ n^DF_Xin, j ≠ 3, j ≠ 5) are random variables that fluctuate, vanishing or remaining depending on dropout. The peak terms xin^DF_3 W^DF_{i,3} and xin^DF_5 W^DF_{i,5}, on the other hand, can be treated as fixed values under each condition, because the cases in which each is dropped out or not dropped out are considered separately. Therefore, in the second embodiment, the peak terms xin^DF_3 W^DF_{i,3} and xin^DF_5 W^DF_{i,5} are removed from the group of xin^DF_j W^DF_{i,j} terms in the i-th row that the calculation of the first embodiment treats as random variables, and the calculation is performed as follows.
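The split referred to here is not reproduced in this text. Reconstructed from the paragraph above, with xW_{i,j} = xin^DF_j W^DF_{i,j}, it separates the random part from the two peak terms that are constant within each case:

```latex
X^{DF}_{out,i} = \sum_{\substack{1 \le j \le n^{DF}_{X_{in}} \\ j \neq 3,\; j \neq 5}} xW_{i,j}
\;+\; xW_{i,3} \;+\; xW_{i,5} \;+\; b^{DF}_{i}
```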

Therefore, in each of cases (1) to (4), the mean is as follows.
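The per-case means are missing from the text. A reconstruction that applies the first embodiment's rule (multiply the sum of the random terms by the survival rate 1 - p and add the bias, as stated in the claims) to the non-peak terms, and adds each peak term in full only when it is not dropped out, is:

```latex
\begin{aligned}
\mu^{(1)}_i &= (1-p) \sum_{j \neq 3,\,5} xW_{i,j} + b^{DF}_{i} \\
\mu^{(2)}_i &= \mu^{(1)}_i + xW_{i,5} \\
\mu^{(3)}_i &= \mu^{(1)}_i + xW_{i,3} \\
\mu^{(4)}_i &= \mu^{(1)}_i + xW_{i,3} + xW_{i,5}
\end{aligned}
```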

The variance can be calculated with the same formula as in the first embodiment, as follows.

However, because the two peak terms are treated as constants rather than random variables, the peak terms xin^DF_3 W^DF_{i,3} and xin^DF_5 W^DF_{i,5} can be ignored in ListW^DFx^DF_i, just like the bias term. Therefore, as in the following equation, the list ListW^DFx^DF_{j≠3,j≠5,i} of xin^DF_j W^DF_{i,j} terms, excluding the indices j_{k=1} = 3 and j_{k=2} = 5 corresponding to the peak terms, is used in the calculation.
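The list definition is not reproduced in this text; reconstructed from the paragraph above, it is:

```latex
ListW^{DF}x^{DF}_{j \neq 3,\, j \neq 5,\, i}
= \left\{\, x^{DF}_{in,j}\, W^{DF}_{i,j} \;\middle|\; 1 \le j \le n^{DF}_{X_{in}},\; j \neq 3,\; j \neq 5 \,\right\}
```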

Using ListW^DFx^DF_i with the peak terms removed in this way, the variance Var(Xout^DF_i) is obtained from the formula described above. The variance Var(Xout^DF_i) takes the same value in all of cases (1) to (4).

The covariance is also obtained in the same way as in the first embodiment.

The covariance takes the same value in all of cases (1) to (4).

Consequently, the variance-covariance matrix is the same in all of cases (1) to (4).
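The per-case quantities above can be sketched in Python under one simplifying assumption: each input unit j is dropped independently with probability p, and kept values are not rescaled. The non-peak terms then contribute a variance of p(1 - p)·Σ xW_{i,j}² and a covariance of p(1 - p)·Σ xW_{i,j}·xW_{k,j}. This Bernoulli formula is a stand-in; the first embodiment's own variance formula (based on the variance of a sample mean) is not reproduced in this excerpt and may differ. The function name `mixture_components` and its argument layout are hypothetical.

```python
from itertools import product


def mixture_components(xin, W, b, p, peaks):
    """Return (weight, mean, cov) for each of the 2**K dropout cases of the peak columns.

    xin:   dropout-layer input values Xin^DF (length n_in)
    W:     weight matrix as a list of n_out rows, each of length n_in
    b:     bias vector (length n_out)
    p:     dropout rate p_Drop^DF
    peaks: 0-based column indices of the K peak terms
    Assumes independent Bernoulli dropout per input unit and no rescaling of kept values.
    """
    n_out, n_in = len(W), len(xin)
    xw = [[xin[j] * W[i][j] for j in range(n_in)] for i in range(n_out)]
    rest = [j for j in range(n_in) if j not in peaks]  # terms still treated as random
    q = 1.0 - p  # probability that a term survives dropout

    # Shared variance-covariance matrix: the peak terms are constants in every case,
    # so only the non-peak terms contribute.
    cov = [[p * q * sum(xw[i][j] * xw[k][j] for j in rest) for k in range(n_out)]
           for i in range(n_out)]
    # Mean contribution of the non-peak terms plus the bias (identical in every case).
    base = [q * sum(xw[i][j] for j in rest) + b[i] for i in range(n_out)]

    components = []
    for kept in product((False, True), repeat=len(peaks)):
        weight = 1.0
        for is_kept in kept:
            weight *= q if is_kept else p
        # A kept peak term adds its full value; a dropped one adds nothing.
        mean = [base[i] + sum(xw[i][j] for j, is_kept in zip(peaks, kept) if is_kept)
                for i in range(n_out)]
        components.append((weight, mean, cov))
    return components
```

With `peaks=[2, 4]` (0-based versions of columns 3 and 5), the four components come out in the order of cases (1) to (4), because `itertools.product((False, True), repeat=2)` enumerates (dropped, dropped), (dropped, kept), (kept, dropped), (kept, kept). All components share one `cov` object, mirroring the statement that the variance-covariance matrix is the same in every case.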

As described above, for each of the four cases (1) to (4), the probability that the case occurs and the mean, variance, and covariance in that case can be calculated. By simply summing the cases with their probabilities as weights, the probability distribution of the output can be calculated as a multivariate Gaussian mixture of four Gaussian distributions, as in the following equation.
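The mixture equation is missing from the text. Reconstructed from the paragraph above, with P_(c) the case probabilities, μ^(c) the per-case mean vectors, and Σ the variance-covariance matrix shared by all cases, it reads:

```latex
p\!\left(X^{DF}_{out}\right) = \sum_{c=1}^{4} P_{(c)}\, \mathcal{N}\!\left(X^{DF}_{out};\, \mu^{(c)},\, \Sigma\right)
```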

In the first embodiment, to determine whether the output distribution q_φ(z|x) satisfies the regularization condition, the KL divergence between q_φ(z|x), a multivariate Gaussian distribution, and the prior distribution p_θ(z) is calculated. In the second embodiment, by contrast, the output distribution q_φ(z|x) is a Gaussian mixture. The KL divergence of a Gaussian mixture has no analytical solution, but it can be calculated by various approximation methods, such as the variational approximation method cited in Non-Patent Document 2.
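One such approximation can be sketched in Python. This sketch uses the variational approximation of Hershey and Olsen for the KL divergence between Gaussian mixtures; whether it is the exact method of Non-Patent Document 2 is an assumption, since that document is not identified in this excerpt. It also assumes the prior p_θ(z) is the standard normal N(0, I) and that all components share one covariance matrix, as in the four-case example. With a single-component prior, the approximation reduces to Σ_a π_a [log Σ_b π_b exp(-KL(f_a‖f_b)) + KL(f_a‖g)]. The function names are hypothetical; a small Cholesky factorization keeps the block free of third-party dependencies.

```python
import math


def _cholesky(S):
    """Lower-triangular L with S = L L^T (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def _solve_lower(L, v):
    """Forward substitution: return y such that L y = v."""
    y = []
    for i, vi in enumerate(v):
        y.append((vi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def kl_mixture_vs_standard_normal(weights, means, cov):
    """Variational approximation of KL(f || N(0, I)) for f = sum_a w_a N(mu_a, cov)."""
    d = len(cov)
    L = _cholesky(cov)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    trace = sum(cov[i][i] for i in range(d))

    def kl_to_prior(mu):
        # Closed-form KL(N(mu, cov) || N(0, I)).
        return 0.5 * (trace + sum(m * m for m in mu) - d - logdet)

    def kl_between(mu_a, mu_b):
        # Closed-form KL between two Gaussians sharing cov: half the squared Mahalanobis distance.
        y = _solve_lower(L, [a - b for a, b in zip(mu_a, mu_b)])
        return 0.5 * sum(t * t for t in y)

    total = 0.0
    for w_a, mu_a in zip(weights, means):
        inner = sum(w_b * math.exp(-kl_between(mu_a, mu_b))
                    for w_b, mu_b in zip(weights, means))
        total += w_a * (math.log(inner) + kl_to_prior(mu_a))
    return total
```

For a one-component mixture the approximation is exact and equals the closed-form Gaussian KL used in the first embodiment, which makes it a natural drop-in replacement when K = 0.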

With the calculation method of the second embodiment described above, as an extension of the first embodiment, the probability distribution q_φ(z|x) of the value of z in the latent space can be calculated as a multivariate Gaussian mixture. As results of this calculation, FIGS. 12 and 13 each show a two-dimensional plot of q_φ(z|x) in the latent space, calculated with four peak terms (K = 4) as a Gaussian mixture of 2^K = 2^4 = 16 Gaussian distributions. The input image in this case was an image of the letter "H", shown small in the upper right of each figure. As with the Gaussian distribution shown in FIG. 9, the Monte Carlo distribution (scatter plot and one-dimensional histograms) matches the analytical distribution (two-dimensional contour lines and one-dimensional function shapes), showing that the distribution can be calculated analytically as a Gaussian mixture.

Even when a plurality of dropout layers is provided, as shown in FIG. 5, the probability distribution of the output q_φ(z|x) can be calculated by applying the calculation of the first embodiment individually to each Gaussian component under its conditional probability in the Gaussian mixture. However, each DF layer in the encoder splits every Gaussian component into a further mixture of Gaussians, so the number of mixture components grows with every DF layer the distribution propagates through. The calculation may therefore be performed while reducing the number of mixture components, for example by using existing techniques to merge similar Gaussian components.
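One existing technique for such merging is moment matching, sketched below in Python as an illustration rather than the patent's method. Two weighted components are replaced by a single Gaussian with the same total weight, mean, and covariance as the pair. How "similar" pairs are chosen (for example, by a distance threshold between means) is left open here, and the name `merge_components` is hypothetical.

```python
def merge_components(w1, mu1, cov1, w2, mu2, cov2):
    """Moment-matching merge of two weighted Gaussian components into one.

    The merged component preserves the total weight, mean, and covariance
    of the two-component mixture it replaces.
    """
    w = w1 + w2
    a, b = w1 / w, w2 / w
    mu = [a * x + b * y for x, y in zip(mu1, mu2)]
    d1 = [x - m for x, m in zip(mu1, mu)]  # offset of component 1 from the merged mean
    d2 = [y - m for y, m in zip(mu2, mu)]  # offset of component 2 from the merged mean
    n = len(mu)
    # Within-component covariance plus the spread of the means around the merged mean.
    cov = [[a * (cov1[i][j] + d1[i] * d1[j]) + b * (cov2[i][j] + d2[i] * d2[j])
            for j in range(n)] for i in range(n)]
    return w, mu, cov
```

Each merge reduces the component count by one while keeping the first two moments exact, which bounds the growth across successive DF layers at the cost of some shape detail.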

The information estimation apparatus according to the second embodiment of the present invention can be realized by extending the configuration of the information estimation apparatus according to the first embodiment (the configuration shown in FIG. 6). For example, the auto encoder calculation unit 20 may be provided with a data analysis unit that determines the peak terms among the xW terms, i.e., the products of the weights W and the inputs Xin^DF that appear when the output values Xout^DF_i of the DF layer are calculated. The auto encoder calculation unit 20 is then extended to perform the calculation described above for the K peak terms identified by the data analysis unit, so that it can output values of z that follow a multivariate Gaussian mixture in the latent space. The calculation related to the regularization condition can likewise be handled by extending the auto encoder calculation unit 20 to perform the calculation described above.

The present invention is applicable to estimation techniques using neural networks and can realize a new auto encoder equipped with a probabilistic element.

DESCRIPTION OF SYMBOLS
10 Information estimation apparatus
20 Auto encoder calculation unit
30 Encoder output distribution shape calculation unit
40 Cost function calculation unit
50 Parameter optimization calculation unit

Claims

An information estimation device that performs an estimation process using a neural network,
comprising an auto encoder calculation unit that includes an auto encoder composed of an encoder and a decoder, and that is configured so that the encoder and the decoder sequentially perform calculation processing based on input data input to the auto encoder and output the output data of the auto encoder as a result of the estimation process,
wherein at least one integration layer, composed of a combination of a dropout layer that drops out part of the data and a fully connected layer that performs weight calculation on the data output from the dropout layer, is provided as the final layer of the encoder, so that the output value in the latent space, which is the output value of the encoder, becomes a multidimensional random variable vector.

The information estimation apparatus according to claim 1, wherein the auto encoder calculation unit is configured to cause the dropout layer to drop out part of the data input to the integration layer according to a predetermined dropout rate, and to cause the fully connected layer to calculate the sum of a list of terms, obtained by multiplying the vector of data output from the dropout layer by a matrix of weights, plus a bias,
and wherein some of the terms included in the list become zero according to the dropout rate.

The information estimation apparatus according to claim 2, further comprising an encoder output distribution shape calculation unit that calculates, based on the data input to the integration layer, the dropout rate, the weights, and the bias, the mean, the variance, and the covariance of the probability distribution followed by the multidimensional random variable vector that is the output value in the latent space.

The information estimation device, wherein the encoder output distribution shape calculation unit:
multiplies the sum of the terms in the list by the rate at which terms remain without being dropped out and adds the bias, to calculate the mean of the distribution followed by the sum of the list;
calculates the variance of the list and then the variance of the sample mean, to calculate the variance of the distribution followed by the sum of the list;
calculates, from the variances of the distributions followed by the sums of the list, a covariance indicating the correlation between two elements given by sums of the list; and
analytically calculates, from the mean, the variance, and the covariance, the shape of the probability distribution followed by the multidimensional random variable vector that is the output value in the latent space.

The information estimation device according to any one of claims 1 to 4, further comprising: a cost function calculation unit that calculates a cost function for evaluating a regularization process, which regularizes the probability distribution followed by the multidimensional random variable vector that is the output value in the latent space so that it keeps the same shape as the prior distribution, and a restoration process, which restores, as the output data of the auto encoder, the input data input to the auto encoder; and
a parameter optimization calculation unit that calculates, based on the cost function, parameters that optimize the regularization process and the restoration process, and updates the parameters used in the calculation of the auto encoder with the optimized parameters.

The information estimation apparatus according to claim 2, further comprising a data analysis unit that, in the list of terms obtained by multiplying the vector of data output from the dropout layer by the matrix of weights, which is used in calculating each element of the multidimensional random variable vector output from the integration layer, refers to the terms specified by an index common to all elements of the multidimensional random variable vector, and extracts a predetermined number of indices whose terms are larger than the terms specified by the other indices, identifying them as peak terms having values larger than the other terms;
wherein the auto encoder calculation unit calculates a multivariate Gaussian mixture by distinguishing the case where each peak term is dropped out in the dropout layer from the case where it is not, calculating the mean, the variance, and the covariance of the Gaussian distribution in each case, and then calculating the mixture sum of the Gaussian distributions of all cases, weighted by the probability that each case occurs.

An information estimation method performed by an information estimation apparatus that performs an estimation process using a neural network,
the method comprising an auto encoder calculation step of using an auto encoder composed of an encoder and a decoder, in which the encoder and the decoder sequentially perform calculation processing based on input data input to the auto encoder, and outputting the output data of the auto encoder as a result of the estimation process,
wherein at least one integration layer, composed of a combination of a dropout layer that drops out part of the data and a fully connected layer that performs weight calculation on the data output from the dropout layer, is provided as the final layer of the encoder, so that the output value in the latent space, which is the output value of the encoder, is a multidimensional random variable vector.

The information estimation method according to claim 7, wherein the auto encoder calculation step causes the dropout layer to drop out part of the data input to the integration layer according to a predetermined dropout rate, and causes the fully connected layer to calculate the sum of a list of terms, obtained by multiplying the vector of data output from the dropout layer by a matrix of weights, plus a bias,
and wherein some of the terms included in the list become zero according to the dropout rate.

The information estimation method according to claim 8, further comprising an encoder output distribution shape calculation step of calculating, based on the data input to the integration layer, the dropout rate, the weights, and the bias, the mean, the variance, and the covariance of the probability distribution followed by the multidimensional random variable vector that is the output value in the latent space.

The information estimation method according to claim 9, wherein the encoder output distribution shape calculation step includes:
multiplying the sum of the terms included in the list by the rate at which terms remain without being dropped out and adding the bias, to calculate the mean of the distribution followed by the sum of the list;
calculating the variance of the list and then the variance of the sample mean, to calculate the variance of the distribution followed by the sum of the list;
calculating, from the variances of the distributions followed by the sums of the list, a covariance indicating the correlation between two elements given by sums of the list; and
analytically calculating, from the mean, the variance, and the covariance, the shape of the probability distribution followed by the multidimensional random variable vector that is the output value in the latent space.

The information estimation method according to any one of claims 7 to 10, further comprising: a cost function calculation step of calculating a cost function for evaluating a regularization process, which regularizes the probability distribution followed by the multidimensional random variable vector that is the output value in the latent space so that it keeps the same shape as the prior distribution, and a restoration process, which restores, as the output data of the auto encoder, the input data input to the auto encoder; and
a parameter optimization calculation step of calculating, based on the cost function, parameters that optimize the regularization process and the restoration process, and updating the parameters used in the calculation of the auto encoder with the optimized parameters.

A data analysis step of, in the list of terms obtained by multiplying the vector of data output from the dropout layer by the matrix of weights, which is used in calculating each element of the multidimensional random variable vector output from the integration layer, referring to the terms specified by an index common to all elements of the multidimensional random variable vector, and extracting a predetermined number of indices whose terms are larger than the terms specified by the other indices, identifying them as peak terms having values larger than the other terms;
wherein the auto encoder calculation step calculates a multivariate Gaussian mixture by distinguishing the case where each peak term is dropped out in the dropout layer from the case where it is not, calculating the mean, the variance, and the covariance of the Gaussian distribution in each case, and then calculating the mixture sum of the Gaussian distributions of all cases, weighted by the probability that each case occurs. Information estimation method described in 1.