CN108171324A - Variational auto-encoding mixture model - Google Patents
Variational auto-encoding mixture model
- Publication number: CN108171324A
- Application number: CN201711433048.1A
- Authority: CN (China)
- Prior art keywords
- model
- variational
- autoencoding
- latent variable
- weight
- Prior art date
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G - PHYSICS
- G06 - COMPUTING; CALCULATING OR COUNTING
- G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00 - Computing arrangements based on biological models
- G06N3/02 - Neural networks
- G06N3/04 - Architecture, e.g. interconnection topology
- G06N3/047 - Probabilistic or stochastic networks
- G06N3/08 - Learning methods
Abstract
The present invention relates to a variational auto-encoding mixture model. Its technical features are: the model is composed of K variational auto-encoder components, where each component is indicated by a K-dimensional binary random latent variable; the probabilistic decoder model and probabilistic encoder model of each component are formed from single-hidden-layer neural networks; and the posterior probability distribution of the indicator latent variable is represented by a neural network based on the stick-breaking construction. The invention is reasonably designed: it uses the auto-encoding mixture model to estimate the relationship between latent variables and samples, which improves the model's ability to generate samples; learning the weights of the mixture of generative models from the latent variable keeps sampling of the latent variables simple and allows the best generative component to be selected autonomously through the latent variable when sampling new data; at the same time, the invention effectively extends the latent-variable space, i.e. the probabilistic-encoding space, and improves the expressive precision of the model.
Description
Technical field
The invention belongs to the field of machine learning, and in particular to a variational auto-encoding mixture model.
Background technology
Variational autoencoders (Variational Autoencoders, VAEs) are an important class of representation models that use variational methods to approximately solve for a generative model (probabilistic decoder) and a recognition model (probabilistic encoder). Let X = {x1, x2, …, xN} denote a set of N independent, identically distributed samples. Each variable x = [x1, x2, …, xd]T is a d-dimensional vector, which may be discrete or continuous. The VAE assumes that the data x are generated by a conditional distribution pθ(x | z), where z is a continuous latent variable with prior distribution pθ(z), and θ denotes the model parameters. The learning task is to solve for the model parameters through the marginal likelihood pθ(x) and the posterior distribution pθ(z | x) of the latent variable z, i.e.:

pθ(x) = ∫ pθ(x | z) pθ(z) dz   (1)

Both the marginal likelihood and the posterior distribution are intractable. The VAE therefore introduces a free distribution qφ(z | x) to approximate the posterior pθ(z | x), converting the intractable integration into an optimization problem over qφ(z | x) with objective LVAEs(x, θ, φ), the variational lower bound:

LVAEs(x, θ, φ) = E_qφ(z|x)[log pθ(x | z)] − KL(qφ(z | x) || pθ(z))   (2)

In the VAE, the conditional distribution pθ(x | z) is called the generative model or probabilistic decoder, and the free distribution qφ(z | x) is called the recognition model or probabilistic encoder. Specifically, qφ(z | x) = N(z; μφ(x), Σφ(x)), where fθ(z), μφ(x) and Σφ(x) are implemented by single-hidden-layer neural networks. The VAE parameters {θ, φ} are learned by solving the optimization problem with stochastic gradient descent.
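The VAE machinery described above (a Gaussian encoder, a Bernoulli decoder, the reparameterization z = μ + δ·ε, and a Monte-Carlo estimate of the variational lower bound) can be sketched as follows. This is a toy illustration under assumed layer sizes and randomly initialized weights, not the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, dz = 784, 200, 20  # data dim, hidden units, latent dim (illustrative sizes)

# Encoder q_phi(z|x) = N(z; mu, diag(delta^2)): one tanh hidden layer.
W3, b3 = 0.01 * rng.standard_normal((h, d)), np.zeros(h)
W4, b4 = 0.01 * rng.standard_normal((dz, h)), np.zeros(dz)
W5, b5 = 0.01 * rng.standard_normal((dz, h)), np.zeros(dz)
# Decoder p_theta(x|z): one tanh hidden layer, sigmoid output for binary pixels.
W1, b1 = 0.01 * rng.standard_normal((h, dz)), np.zeros(h)
W2, b2 = 0.01 * rng.standard_normal((d, h)), np.zeros(d)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def elbo(x):
    """One-sample Monte-Carlo estimate of the variational lower bound."""
    hid = np.tanh(W3 @ x + b3)
    mu, log_var = W4 @ hid + b4, W5 @ hid + b5
    eps = rng.standard_normal(dz)
    z = mu + np.exp(0.5 * log_var) * eps          # reparameterization trick
    y = sigmoid(W2 @ np.tanh(W1 @ z + b1) + b2)   # Bernoulli means
    log_px = np.sum(x * np.log(y + 1e-9) + (1 - x) * np.log(1 - y + 1e-9))
    # KL of a diagonal Gaussian q against the standard-normal prior, in closed form.
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
    return log_px - kl

x = (rng.random(d) > 0.5).astype(float)  # a random binary "image"
print(elbo(x))
```

In practice the gradient of this estimate with respect to {θ, φ} flows through the reparameterized z, which is what makes stochastic-gradient training possible.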
In 2014, Karol Gregor et al., in "Deep AutoRegressive Networks", applied the concept of autoregressive networks to autoencoders, constructing a more complex autoregressive autoencoder network that can accurately fit the latent distribution of the samples. In 2016, Danilo Jimenez Rezende, in "Variational Inference with Normalizing Flows", used normalizing flows to make the posterior distribution of the latent variable in the VAE more complex, obtaining a more scalable family of distributions. Although both the autoregressive autoencoder and the normalizing-flow autoencoder improve the marginal likelihood of variational inference, both approaches disturb the distribution in feature space of the latent variables of different sample classes, so the samples generated from randomly sampled latent variables follow no regular distribution. In 2017, Serena Yeung proposed the "Epitomic Variational Autoencoder (eVAE)". Analyzing the intermediate hidden layer of the VAE, Yeung found that some intermediate hidden units are inactive for most samples, and that the values of those units change little across different samples, with small variance. Based on this observation, the eVAE divides the units of the intermediate hidden layer into several groups: each sample corresponds to one group of hidden units while the other groups are masked, and a latent variable is added to the model to specify which group of intermediate hidden units a sample corresponds to. How to effectively combine the VAE model with an indicator latent variable remains open; the precision and latent space of such models therefore still have shortcomings.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and to propose a variational auto-encoding mixture model that is reasonably designed, achieves high precision, and effectively extends the latent-variable space.
The present invention solves its technical problem by adopting the following technical scheme:
A variational auto-encoding mixture model is composed of K variational auto-encoder components. Each component is indicated by a K-dimensional binary random latent variable; the probabilistic decoder model and probabilistic encoder model of each component are formed from single-hidden-layer neural networks; and the posterior probability distribution of the indicator latent variable is represented by a neural network based on the stick-breaking construction.
Each component is represented as follows:
Let {θ1, θ2, …, θK} denote the parameters of the mixture components and π = [π1, π2, …, πK] their weights, with Σk πk = 1. The K-dimensional binary indicator latent variable c = [c1, c2, …, cK]T satisfies ck ∈ {0, 1} and Σk ck = 1; then πk = p(ck = 1) is the weight of the k-th component. In the variational auto-encoding mixture model, the probability distribution p(c | π) of the indicator latent variable and the conditional probability distribution p(x | z, c) of the generated data are, respectively:

p(c | π) = ∏k πk^ck,   p(x | z, c) = ∏k pθk(x | z)^ck

The joint probability distribution of the variational auto-encoding mixture model is:
p(x, z, c) = p(x | z, c) p(z) p(c | π).
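Sampling from the joint distribution above proceeds ancestrally: draw the indicator c from p(c | π), draw z from the prior p(z) = N(0, I), and then decode with the component that c selects. A minimal sketch, with hypothetical toy decoders and uniform weights (not the patent's code):

```python
import numpy as np

rng = np.random.default_rng(1)
K, dz, d = 4, 2, 6  # number of components, latent dim, data dim (illustrative)

# One hypothetical single-hidden-layer-free toy decoder per mixture component.
decoders = [(0.1 * rng.standard_normal((d, dz)), rng.standard_normal(d)) for _ in range(K)]
pi = np.full(K, 1.0 / K)  # mixture weights, summing to 1

def sample(n):
    """Ancestral sampling from p(x, z, c) = p(x|z,c) p(z) p(c|pi)."""
    xs = []
    for _ in range(n):
        c = rng.choice(K, p=pi)       # c ~ p(c|pi): the indicator picks one component
        z = rng.standard_normal(dz)   # z ~ p(z) = N(0, I)
        W, b = decoders[c]
        probs = 1.0 / (1.0 + np.exp(-(W @ z + b)))        # Bernoulli means
        xs.append((rng.random(d) < probs).astype(int))    # x ~ p(x|z,c)
    return np.array(xs)

X = sample(5)
print(X.shape)
```

Because exactly one ck equals 1, the products in p(c | π) and p(x | z, c) reduce to selecting a single component, which is what the indexing above implements.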
For a discrete vector x, the conditional distribution pθ(x | z) of the single-hidden-layer neural network is:

log pθ(x | z) = Σi [xi log yi + (1 − xi) log(1 − yi)]
y = fσ(W2 tanh(W1 z + b1) + b2)

where fσ is the sigmoid function, W1, b1 are the weight and bias of the input layer of the single-hidden-layer network, and W2, b2 the weight and bias of its output layer, so the parameters are θ = {W1, W2, b1, b2}.
For the continuous latent variable z, the conditional distribution qφ(z | x) based on a single-hidden-layer neural network is:

log qφ(z | x) = log N(z; μ, δ2I)
μ = W4 h + b4
log δ2 = W5 h + b5
h = tanh(W3 x + b3)

where W3, b3 are the weight and bias of the input layer of the single-hidden-layer network in the probabilistic encoder, and W4, b4, W5, b5 the weights and biases of its output layer, so the parameters are φ = {W3, W4, W5, b3, b4, b5}.
The posterior probability distribution of the indicator latent variable is represented as follows:
For the latent variable π, the posterior qη(π | z) is learned with a single-hidden-layer neural network:

qη(π | z) = Dir(π; α)
α = tanh(W7(W6 z + b6) + b7)

where W6, b6 are the weight and bias of the input layer of the single-hidden-layer network, and W7, b7 the weight and bias of its output layer, so the parameters are η = {W6, W7, b6, b7}.
The variational auto-encoding mixture model includes the model parameters of the probabilistic decoder model, the model parameters of the probabilistic encoder model, and the model parameters of the posterior probability distribution of the indicator latent variable; these parameters are computed by optimizing the objective function with gradient descent.
The advantages and positive effects of the present invention are:
The present invention is reasonably designed. It uses the auto-encoding mixture model to estimate the relationship between latent variables and samples, which improves the model's ability to generate samples. Learning the weights of the mixture of generative models from the latent variable keeps sampling of the latent variables simple and allows the best generative component to be selected autonomously through the latent variable when sampling new data. The invention effectively extends the latent-variable space, i.e. the probabilistic-encoding space, improves the expressive precision of the model, and generates samples efficiently.
Description of the drawings
Fig. 1 is the graphical-model structure diagram of the variational auto-encoding mixture model;
Fig. 2 shows the MNIST handwritten-digit data set;
Fig. 3 shows the convergence of the variational lower bound during training of the variational auto-encoding mixture model on the MNIST data set;
Fig. 4 shows new handwritten-digit samples generated after training the variational auto-encoding mixture model on the MNIST data set;
Fig. 5 shows handwritten digits generated by uniform sampling of the latent-variable space after training the variational auto-encoding mixture model on the MNIST data set.
Specific embodiments
Embodiments of the present invention are further described below with reference to the drawings.
As shown in Fig. 1, a variational auto-encoding mixture model is formed from K variational auto-encoder components. Let {θ1, θ2, …, θK} denote the parameters of the components and π = [π1, π2, …, πK] their weights, with Σk πk = 1. A K-dimensional binary random latent variable c = [c1, c2, …, cK]T is introduced, satisfying ck ∈ {0, 1} and Σk ck = 1; then πk = p(ck = 1) is the weight of the k-th component. The probability form of the variational auto-encoding mixture model is:

p(x | z, π) = Σk πk pθk(x | z)   (5)

The joint probability form of the variational auto-encoding mixture model is:

p(x, z, c) = p(x | z, c) p(z) p(c | π)   (6)

Computing the posterior distributions p(z | x) and p(c | x) of the latent variables z and c is intractable. Following the variational approximate-inference method, free distributions qφ(z | x) and qη(c | x) are introduced to convert the integration problem into an optimization problem; the derivation yields the variational lower bound:

log p(x) ≥ E_qφ(z|x)qη(c|x)[log p(x | z, c)] − KL(qφ(z | x) || p(z)) − KL(qη(c | x) || p(c | π)) = L(x, θ, φ, η)   (7)

Therefore, the variational optimization problem of the variational auto-encoding mixture model is:

max{θ,φ,η} L(x, θ, φ, η)   (8)

The learning task on the variational auto-encoding mixture model is to learn the model parameters {θ, φ, η} by solving the variational optimization problem (8). The graphical-model representation of the variational auto-encoding mixture model is shown in Fig. 1.
In the variational auto-encoding mixture model, the probabilistic decoder model (or generative model) pθ(x | z) and the probabilistic encoder model (or recognition model) qφ(z | x) are formed from single-hidden-layer neural networks. Specifically, when the vector x is discrete, the conditional distribution pθ(x | z) of the single-hidden-layer neural network is:

log pθ(x | z) = Σi [xi log yi + (1 − xi) log(1 − yi)]
y = fσ(W2 tanh(W1 z + b1) + b2)

where fσ is the sigmoid function, W1, b1 are the weight and bias of the input layer of the single-hidden-layer network in the probabilistic decoder, and W2, b2 the weight and bias of its output layer, so the parameters are θ = {W1, W2, b1, b2}.
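For binary x, the decoder likelihood just defined can be evaluated directly. A small sketch with hypothetical weights and toy sizes, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
dz, h, d = 2, 8, 5  # latent dim, hidden units, data dim (illustrative)

# Hypothetical single-hidden-layer decoder parameters theta = {W1, W2, b1, b2}.
W1, b1 = rng.standard_normal((h, dz)), np.zeros(h)
W2, b2 = rng.standard_normal((d, h)), np.zeros(d)

def log_p_x_given_z(x, z):
    """Bernoulli log-likelihood log p_theta(x|z), y = sigmoid(W2 tanh(W1 z + b1) + b2)."""
    y = 1.0 / (1.0 + np.exp(-(W2 @ np.tanh(W1 @ z + b1) + b2)))
    return float(np.sum(x * np.log(y + 1e-9) + (1 - x) * np.log(1 - y + 1e-9)))

x = np.array([1, 0, 1, 1, 0], dtype=float)
z = rng.standard_normal(dz)
print(log_p_x_given_z(x, z))
```

Each output unit yi is the mean of an independent Bernoulli over pixel xi, which is why the log-likelihood is a sum of per-dimension terms.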
For the continuous latent variable z, the conditional distribution qφ(z | x) based on a single-hidden-layer neural network is:

log qφ(z | x) = log N(z; μ, δ2I)
μ = W4 h + b4
log δ2 = W5 h + b5
h = tanh(W3 x + b3)

where W3, b3 are the weight and bias of the input layer of the single-hidden-layer network in the probabilistic encoder, and W4, b4, W5, b5 the weights and biases of its output layer, so the parameters are φ = {W3, W4, W5, b3, b4, b5}.
The latent variable π takes its conjugate prior, the Dirichlet distribution Dir(α), and p(π) is approximated with the classical stick-breaking construction of the Dirichlet distribution, i.e.

(π1, π2, …, πK) ~ Dir(α1, α2, …, αK)   (11)
σk ~ Beta(1, αk), k = 1, …, K − 1   (12)
πk = σk ∏j<k (1 − σj)   (13)

In the variational auto-encoding mixture model, the Beta draws of the stick-breaking process above are realized with a differentiable mapping:

π1 = sigmoid(f1(z, η))   (14)
π2 = (1 − sigmoid(f1(z, η))) sigmoid(f2(z, η))   (15)
…
πK = (1 − sigmoid(fK−1(z, η))) ⋯ (1 − sigmoid(f1(z, η)))   (16)

This construction can be written compactly as:

πk = sigmoid(fk(z, η)) ∏j<k (1 − sigmoid(fj(z, η))), with sigmoid(fK(z, η)) ≡ 1 so that Σk πk = 1   (17)
For the latent variable π, the posterior qη(π | z) is learned with a single-hidden-layer neural network:

qη(π | z) = Dir(π; α)
α = tanh(W7(W6 z + b6) + b7)   (19)

where W6, b6 are the weight and bias of the input layer of the single-hidden-layer network, and W7, b7 the weight and bias of its output layer, so the parameters are η = {W6, W7, b6, b7}.
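The stick-breaking mapping of equations (14)-(16) guarantees that the weights lie on the simplex: each sigmoid breaks off a fraction of the remaining stick, and the K-th component takes whatever is left, so the weights always sum to 1. A minimal sketch, where the inputs stand in for hypothetical fk(z, η) values:

```python
import numpy as np

def stick_breaking(logits):
    """Map K-1 unconstrained values f_1..f_{K-1} to K mixture weights pi on the simplex."""
    v = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))  # sigmoid(f_k)
    pi, remaining = [], 1.0
    for vk in v:
        pi.append(remaining * vk)   # pi_k = sigmoid(f_k) * prod_{j<k}(1 - sigmoid(f_j))
        remaining *= (1.0 - vk)     # shrink the remaining stick
    pi.append(remaining)            # pi_K takes the rest of the stick
    return np.array(pi)

pi = stick_breaking([0.3, -1.2, 2.0])  # K = 4 weights from 3 logits
print(pi, pi.sum())
```

Because every operation here is differentiable in the logits, gradients can flow from the mixture weights back into the network producing fk(z, η).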
L(x, θ, φ, η) is the variational lower bound of the marginal likelihood of the variational auto-encoding mixture model, and the goal of training is to maximize this value. To obtain an estimate with minimal bias, a mini-batch of samples XM = {x(1), …, x(M)} is processed at a time. The latent variable z is sampled with the reparameterization trick, and the objective function L(x, θ, φ, η) is then optimized with stochastic gradient descent.
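The effect of mini-batch averaging on the reparameterized estimator can be checked numerically. In this toy sketch (assumed scalar encoder outputs and a made-up reconstruction term, for illustration only), averaging the one-sample lower-bound estimate over M = 8 draws reduces its variance roughly M-fold:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, log_var = 0.5, -1.0  # hypothetical encoder outputs for a single sample x

def elbo_draw():
    """One reparameterized draw of a toy lower-bound estimate for fixed (mu, log_var)."""
    eps = rng.standard_normal()
    z = mu + np.exp(0.5 * log_var) * eps                   # z = mu + delta * eps
    log_px = -0.5 * (z - 1.0) ** 2                         # stand-in reconstruction term
    kl = 0.5 * (np.exp(log_var) + mu**2 - 1.0 - log_var)   # closed-form KL to N(0, 1)
    return log_px - kl

single = np.var([elbo_draw() for _ in range(2000)])
batched = np.var([np.mean([elbo_draw() for _ in range(8)]) for _ in range(2000)])
print(single, batched)  # averaging 8 draws shrinks the estimator variance about 8x
```

The KL term is deterministic given (μ, log δ2); only the reconstruction term is stochastic, which is why averaging over draws tames the gradient noise.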
The variational auto-encoding mixture model is illustrated below with the MNIST handwritten-digit data set shown in Fig. 2. The MNIST data set comes from the U.S. National Institute of Standards and Technology and contains the ten handwritten digits 0-9. The training set contains 60,000 digit images, written by 250 different people: 50% high-school students and 50% employees of the Census Bureau. A few MNIST examples are shown in Fig. 2. The variational auto-encoding mixture model is trained on the MNIST data set by solving optimization problem (8) to compute the model parameters {θ, φ, η} and the variational lower bound L(x, θ, φ, η). The convergence of the variational lower bound is shown in Fig. 3, where the abscissa is the number of iterations and the ordinate is the variational lower bound. After training, the generative model pθ(x | z) is obtained and can be used to generate new handwritten-digit samples; some newly generated samples are shown in Fig. 4. Fig. 5 shows handwritten digits generated by uniform sampling of the latent space z after training the variational auto-encoding mixture model on the MNIST data set.
It should be emphasized that the embodiments described above are illustrative rather than restrictive; the present invention therefore includes, but is not limited to, the embodiments described in the detailed description. Other embodiments obtained by those skilled in the art from the technical scheme of the present invention also fall within the scope of protection of the present invention.
Claims (5)
1. A variational auto-encoding mixture model, characterized in that: it is composed of K variational auto-encoder components; each component is indicated by a K-dimensional binary random latent variable; the probabilistic decoder model and probabilistic encoder model of each component are formed from single-hidden-layer neural networks; and the posterior probability distribution of the indicator latent variable is represented by a neural network based on the stick-breaking construction.
2. The variational auto-encoding mixture model according to claim 1, characterized in that each component is represented as follows:
Let {θ1, θ2, …, θK} denote the parameters of the mixture components and π = [π1, π2, …, πK] their weights, with Σk πk = 1. The K-dimensional binary indicator latent variable c = [c1, c2, …, cK]T satisfies ck ∈ {0, 1} and Σk ck = 1; then πk = p(ck = 1) is the weight of the k-th component. In the variational auto-encoding mixture model, the probability distribution p(c | π) of the indicator latent variable and the conditional probability distribution p(x | z, c) of the generated data are, respectively:

p(c | π) = ∏k πk^ck,   p(x | z, c) = ∏k pθk(x | z)^ck

The joint probability distribution of the variational auto-encoding mixture model is:
p(x, z, c) = p(x | z, c) p(z) p(c | π).
3. The variational auto-encoding mixture model according to claim 1, characterized in that the conditional distribution pθ(x | z) of the single-hidden-layer neural network is:

log pθ(x | z) = Σi [xi log yi + (1 − xi) log(1 − yi)]
y = fσ(W2 tanh(W1 z + b1) + b2)

where W1, b1 are the weight and bias of the input layer of the single-hidden-layer network in the probabilistic decoder, and W2, b2 the weight and bias of its output layer, so the parameters are θ = {W1, W2, b1, b2}.
For the continuous latent variable z, the conditional distribution qφ(z | x) based on a single-hidden-layer neural network is:

log qφ(z | x) = log N(z; μ, δ2I)
μ = W4 h + b4
log δ2 = W5 h + b5
h = tanh(W3 x + b3)

where W3, b3 are the weight and bias of the input layer of the single-hidden-layer network in the probabilistic encoder, and W4, b4, W5, b5 the weights and biases of its output layer, so the parameters are φ = {W3, W4, W5, b3, b4, b5}.
4. The variational auto-encoding mixture model according to claim 1, characterized in that the posterior probability distribution of the indicator latent variable is represented as follows:
For the latent variable π, the posterior qη(π | z) is learned with a single-hidden-layer neural network:

qη(π | z) = Dir(π; α)
α = tanh(W7(W6 z + b6) + b7)

where W6, b6 are the weight and bias of the input layer of the single-hidden-layer network, and W7, b7 the weight and bias of its output layer, so the parameters are η = {W6, W7, b6, b7}.
5. The variational auto-encoding mixture model according to claim 1, characterized in that the variational auto-encoding mixture model includes the model parameters of the probabilistic decoder model, the model parameters of the probabilistic encoder model, and the model parameters of the posterior probability distribution of the indicator latent variable; these parameters are computed by optimizing the objective function with gradient descent.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711433048.1A CN108171324A (en) | 2017-12-26 | 2017-12-26 | Variational auto-encoding mixture model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108171324A true CN108171324A (en) | 2018-06-15 |
Family
ID=62521049
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711433048.1A Pending CN108171324A (en) | 2017-12-26 | 2017-12-26 | A kind of variation own coding mixed model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108171324A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108959551A * | 2018-06-29 | 2018-12-07 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method, device, storage medium and terminal device for mining neighbor semantics |
CN109543100A * | 2018-10-31 | 2019-03-29 | Shanghai Jiao Tong University | User-interest modeling method and system based on collaborative learning |
CN110161480A * | 2019-06-18 | 2019-08-23 | Xidian University | Radar target recognition method based on a semi-supervised deep probabilistic model |
CN110753239A * | 2018-07-23 | 2020-02-04 | Shenzhen Horizon Robotics | Video prediction method, video prediction device, electronic equipment and vehicle |
CN110753239B * | 2018-07-23 | 2022-03-08 | Shenzhen Horizon Robotics | Video prediction method, video prediction device, electronic equipment and vehicle |
CN111224670A * | 2018-11-27 | 2020-06-02 | Fujitsu | Autoencoder, and method and medium for training the same |
CN111224670B * | 2018-11-27 | 2023-09-15 | Fujitsu | Autoencoder, and method and medium for training the same |
CN111243045A * | 2020-01-10 | 2020-06-05 | Hangzhou Dianzi University | Image generation method based on a variational autoencoder with a Gaussian-mixture-model prior |
CN111243045B * | 2020-01-10 | 2023-04-07 | Hangzhou Dianzi University | Image generation method based on a variational autoencoder with a Gaussian-mixture-model prior |
EP3767542A1 * | 2019-07-17 | 2021-01-20 | Robert Bosch GmbH | Training, data synthesis and probability inference using a nonlinear conditional normalizing-flow model |
Family events
- 2017-12-26: CN application CN201711433048.1A filed; patent CN108171324A (en), status Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180615 |