CN113298895A - Convergence guarantee-oriented unsupervised bidirectional generation automatic coding method and system - Google Patents

Convergence guarantee-oriented unsupervised bidirectional generation automatic coding method and system

Info

Publication number
CN113298895A
CN113298895A (application CN202110678193.6A)
Authority
CN
China
Prior art keywords
data
result
encoder
image
space
Prior art date
Legal status
Granted
Application number
CN202110678193.6A
Other languages
Chinese (zh)
Other versions
CN113298895B (en)
Inventor
钱诗友
华勤
曹健
薛广涛
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202110678193.6A
Publication of CN113298895A
Application granted
Publication of CN113298895B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 - Image coding
    • G06T 9/002 - Image coding using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 - Road transport of goods or passengers
    • Y02T 10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 - Engine management systems

Abstract

The invention provides a convergence guarantee-oriented unsupervised bidirectional generation automatic coding method and system, comprising the following steps: the batch data (x, z) are passed through an encoder E and a generator G, which simultaneously produce an encoding result E(x) and generated data G(z); the image-space data $\hat{x}$ and the hidden-variable-space data $\hat{z}$ are passed through convolution blocks Fx and Fz, respectively, which extract information from $\hat{x}$ and $\hat{z}$ to obtain the extracted image-space features $F_x(\hat{x})$ and hidden-variable-space features $F_z(\hat{z})$; a discriminator D is trained on the extracted features until its loss function is minimized; finally, image data are encoded by the trained encoder E, and the encoding result is input into the trained generator G to obtain the reconstructed image data. This completes the reconstruction of the image data, realizes overall collaborative optimization of the bidirectional mapping between the image space and the hidden variable space, and improves both the characterization capability and the image generation capability.

Description

Convergence guarantee-oriented unsupervised bidirectional generation automatic coding method and system
Technical Field
The invention relates to the technical field of encoders, in particular to a convergence guarantee-oriented unsupervised bidirectional generation automatic coding method and system, and more particularly to a convergence-guaranteed unsupervised bidirectional generative autoencoder.
Background
An autoencoder (AE) is a learning algorithm that encodes data efficiently in order to reduce its dimensionality. In recent years, autoencoders have been widely used in fields such as image classification and reconstruction, recommendation systems, and anomaly detection.
Currently, research on autoencoders focuses on improving the ability to generate and characterize images simultaneously. This means that the autoencoder should learn the bidirectional mapping between the generator/decoder and the encoder. Specifically, the generator/decoder focuses on the mapping from the hidden variable space to the data space, while the encoder aims at extracting semantically related feature representations in the inverse mapping from the data space to the hidden variable space. Generative adversarial networks (GANs), as the state-of-the-art generative models, have powerful mapping capabilities, especially in terms of generalization. Building an autoencoder on a GAN is therefore a feasible research direction.
Some previous work has proposed using GANs or adversarial models inside an autoencoder, for example AAE, ALAE and BiGAN. AAE generalizes the GAN framework when training the encoder and pushes the distribution of the encoding results toward a Gaussian distribution. ALAE trains the autoencoder by reconstructing the image from the style-coding results of the real image using the StyleGAN framework.
However, most of these efforts have two limitations. First, they do not achieve a good trade-off between the mapping and the inverse mapping. For example, AAE and ALAE typically treat the training process as a one-way optimization, regardless of the trade-off between the generator and the encoder. Second, convergence cannot be guaranteed in some bidirectional networks. For example, BiGAN implements the mapping and the inverse mapping by discriminating the joint distribution of the hidden variable space and the data space, but its convergence performance is poor. In addition, the characterization capability is not optimized in BiGAN.
Patent document CN111402179A (application number: 202010169306.5) discloses an image synthesis method and system that combines an adversarial autoencoder and a generative adversarial network. The method includes constructing an enhanced adversarial autoencoder including two different sets of encoders, two different sets of first discriminators, and a set of decoders; constructing an improved conditional generative adversarial network comprising a generator and a second discriminator; taking the manually segmented blood vessel tree image and the original fundus retina image as training data, and iteratively training the combined enhanced adversarial autoencoder and improved conditional generative adversarial network to obtain an optimal blood vessel tree image generator and an optimal fundus retina image generator; and performing fundus retina image synthesis on the to-be-processed manually segmented blood vessel tree image based on the two optimal generators to obtain a synthesized image.
The invention provides Bi-GAE, an unsupervised generative autoencoder based on BiGAN. First, the invention designs two schemes to trade off the mapping and the inverse mapping. Specifically, the invention introduces into the mapping a guiding term based on the SSIM loss function, which causes the model to generate images that follow the human visual pattern. In addition, the invention uses an embedded GAN to compute another guiding term, thereby enhancing the semantically related characterization capability in the inverse mapping. The cooperation of the two schemes strengthens the bidirectional information expansion between the hidden variable space and the data space, improving the overall performance of Bi-GAE. Second, the invention uses the Wasserstein distance to guarantee effective gradient computation, while the embedded GAN exploits MMD to strengthen the convergence of Bi-GAE as the discriminator approaches convergence.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide an unsupervised bidirectional generation automatic coding method and system for convergence guarantee.
The invention provides an unsupervised bidirectional generation automatic coding method for convergence guarantee, which comprises the following steps:
step S1: the batch data (x, z) are passed through the encoder E and the generator G, which simultaneously produce the encoding result E(x) and the generated data G(z), completing the mapping from the hidden variable space to the data space and the inverse mapping from the data space to the hidden variable space;

step S2: the image-space data $\hat{x}$ and the hidden-variable-space data $\hat{z}$ are passed through convolution blocks Fx and Fz, respectively, which extract information from $\hat{x}$ and $\hat{z}$ to obtain the extracted image-space features $F_x(\hat{x})$ and hidden-variable-space features $F_z(\hat{z})$;

step S3: the discriminator D is trained on the extracted features $F_x(\hat{x})$ and $F_z(\hat{z})$ until its loss function is minimized;

step S4: the image data are encoded by the trained encoder E to produce an encoding result, which is input into the trained generator G to obtain the reconstructed image data result. This completes the reconstruction of the image data, realizes overall collaborative optimization of the bidirectional mapping between the image space and the hidden variable space, and improves the characterization capability and the image generation capability.
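By way of illustration, the data flow of steps S1-S3 can be sketched in PyTorch as follows. A minimal sketch, assuming the modules E, G, Fx, Fz and D exist as nn.Module objects and that Fx/Fz produce concatenable feature vectors; none of these names or shapes are fixed by the patent.

```python
import torch

def forward_pass(E, G, Fx, Fz, D, x, z):
    """One batch of the bidirectional forward flow (steps S1-S3)."""
    e_x = E(x)    # inverse mapping: data space -> hidden variable space
    g_z = G(z)    # mapping: hidden variable space -> data space
    # Fx and Fz extract features with concatenable shapes (step S2).
    real_score = D(torch.cat([Fx(x), Fz(e_x)], dim=1))   # pair (x, E(x))
    fake_score = D(torch.cat([Fx(g_z), Fz(z)], dim=1))   # pair (G(z), z)
    return real_score, fake_score
```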
Preferably, the generator G and the encoder E use a deconvolution network module and a convolution network module, respectively, from the DCGAN structure.
Preferably, the loss function in step S3 comprises the Wasserstein distance:

$$W = \max_D \; \mathbb{E}_{x \sim P(x)}\big[D(F_x(x), F_z(E(x)))\big] - \mathbb{E}_{z \sim P(z)}\big[D(F_x(G(z)), F_z(z))\big]$$

wherein W denotes the Wasserstein distance; D outputs the likelihood probability of the input data pair; and $F_x$, $F_z$ denote the convolution blocks.
Preferably, the method further comprises the following steps: an embedded MMD discriminator module Dz is introduced, and the convolution blocks Fx and Fz, the encoder E and the generator G are multiplexed to realize an embedded GAN network; the embedded GAN network is used to reduce the MMD distance between the distributions of z and of the reconstruction result E(G(z)), strengthening the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder.
Preferably, reducing the MMD distance between the distributions of z and of the reconstruction result E(G(z)) with the embedded GAN network to strengthen the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder comprises:

step S5: the encoding result E(x) is reconstructed by the generator G into the generated data G(E(x)); the generated data G(z) is re-encoded by the encoder E into the reconstructed encoding result E(G(z));

step S6: the embedded MMD discriminator module Dz is used to compute the MMD difference loss function $L_{D_z}$ between the distributions of z and $\hat{z} = E(G(z))$, and $L_{D_z}$ is used to optimize the embedded MMD discriminator module Dz:

$$L_{D_z} = \mathbb{E}_{z,z' \sim P_z}\big[k_{rep}(y_z, y_{z'})\big] - \mathbb{E}_{\hat{z},\hat{z}' \sim P_{\hat{z}}}\big[k_{rep}(y_{\hat{z}}, y_{\hat{z}'})\big] + \lambda_1 \, \mathbb{E}_{\tilde{z}}\Big[\big(\|\nabla_{\tilde{z}} D_z(\tilde{z})\|_2 - 1\big)^2\Big], \qquad k_{rep}(a, b) = e^{-\|a - b\|_2^2 / (2\sigma^2)}, \quad \tilde{z} = \varepsilon z + (1 - \varepsilon)\hat{z}$$

wherein $P_z$ denotes the hidden variable space distribution; $y_z = D_z(z)$ and $y_{z'} = D_z(z')$, with z, z' samples on the hidden variable space; $\hat{z} = E(G(z))$ denotes the reconstruction result of the hidden variable z, i.e., the re-encoding result of the generated data; $P_{\hat{z}}$ denotes the conditional probability distribution of $\hat{z}$; $\lambda_1$ denotes the gradient penalty term weight parameter; $\nabla_{\tilde{z}} D_z(\tilde{z})$ denotes the gradient taken on $\tilde{z}$; $\sigma$ denotes the variance coefficient of the Gaussian distribution that $k_{rep}$ conforms to; e is the base constant of the natural logarithm; and $\varepsilon$ denotes the weight parameter used in the weighted sampling between z and its reconstruction result $\hat{z}$;

step S7: the optimized embedded MMD discriminator module Dz is used to compute the MMD loss function $L_{EG}^{MMD}$ of the distributions of z and E(G(z)) with respect to the encoder E and the generator G:

$$L_{EG}^{MMD} = \mathbb{E}\big[k_F(y_z, y_{z'})\big] - 2\,\mathbb{E}\big[k_F(y_z, y_{\hat{z}})\big] + \mathbb{E}\big[k_F(y_{\hat{z}}, y_{\hat{z}'})\big], \qquad k_F(a, b) = e^{-\min(\max(\|a - b\|_2^2,\, b_l),\, b_u) / (2\sigma^2)}$$

wherein $k_F$ denotes the (bounded) Gaussian kernel function, and $b_l$, $b_u$ denote the lower- and upper-bound parameters for the L2 distance between a and b;

step S8: the loss function $L_{EG}^{MMD}$ is used to generate the guiding term $T_z$ on the coding space, thereby completing the training of the embedded GAN network:

$$T_z = L_{EG}^{MMD} + \alpha_1 \cdot \frac{1}{n_b} \sum_{i=1}^{n_b} \|z_i - \hat{z}_i\|_2^2$$

wherein $\alpha_1$ denotes the weighting coefficient between $L_{EG}^{MMD}$ and the pixel-level $l_2$ loss of the reconstruction operation; $n_b$ denotes the data batch size; and $\|z_i - \hat{z}_i\|_2^2$ is the $l_2$ loss between the original hidden variable z and the reconstruction result $\hat{z}$;

step S9: the trained embedded GAN network optimizes the convergence of the current Bi-GAE bidirectional structure, and the generated guiding term further improves the semantic characterization capability of the encoder E.
Preferably, the method further comprises the following steps: using an SSIM module to compute a guiding term $T_x$ from the real image x and the reconstructed image G(E(x)), strengthening the generator's capability to generate images that match human visual characteristics;

SSIM considers luminance (measured by l(x, x')), contrast and structure (measured jointly by cs(x, x')):

$$MSSSIM(x, x') = l_m^{\gamma_m}(x, x') \cdot \prod_{j=1}^{m} cs_j^{\eta_j}(x, x')$$

$$L_{MSSSIM}(x, x') = 1 - MSSSIM(x, x') \tag{8}$$

$$T_x = \alpha_2 \cdot L_{MSSSIM}(x, x') + (1 - \alpha_2) \cdot \frac{1}{n_b} \sum_{i=1}^{n_b} \|x_i - x'_i\|_1$$

$$l(x, x') = \frac{2\mu_x \mu_{x'} + c_1}{\mu_x^2 + \mu_{x'}^2 + c_1}, \qquad cs(x, x') = \frac{2\sigma_{xx'} + c_2}{\sigma_x^2 + \sigma_{x'}^2 + c_2}$$

wherein x' = G(E(x)); $L_{MSSSIM}(x, x')$ denotes the SSIM loss function; $\alpha_2$ denotes the weighting coefficient between the SSIM loss and the $l_1$ loss; $\|x - x'\|_1$ denotes the $l_1$ loss between x and the reconstruction result; $l_m^{\gamma_m}$ denotes the result of l(x, x') after the m Gaussian filters; $\gamma$ denotes an attenuation parameter; m denotes the number of Gaussian filters; $c_1$, $c_2$ denote constant parameters; $cs_j^{\eta_j}$ denotes the result produced by the j-th filter; $\eta_j$ denotes the attenuation coefficient of the corresponding window; if x has size $(s_i \times s_i)$, the j-th filter window size is $(s_i/2^{m-j}) \times (s_i/2^{m-j})$; $p_j$ denotes the filter window; $\mu_x$, $\mu_{x'}$ denote the means of x and x'; $\sigma_x$, $\sigma_{x'}$ denote the standard deviations of x and x'; and $\sigma_{xx'}$ denotes their covariance.
The invention provides a convergence guarantee-oriented unsupervised bidirectional generation automatic coding system, comprising:
module M1: the batch data (x, z) are passed through the encoder E and the generator G, which simultaneously produce the encoding result E(x) and the generated data G(z), completing the mapping from the hidden variable space to the data space and the inverse mapping from the data space to the hidden variable space;

module M2: the image-space data $\hat{x}$ and the hidden-variable-space data $\hat{z}$ are passed through convolution blocks Fx and Fz, respectively, which extract information from $\hat{x}$ and $\hat{z}$ to obtain the extracted image-space features $F_x(\hat{x})$ and hidden-variable-space features $F_z(\hat{z})$;

module M3: the discriminator D is trained on the extracted features $F_x(\hat{x})$ and $F_z(\hat{z})$ until its loss function is minimized;

module M4: the image data are encoded by the trained encoder E to produce an encoding result, which is input into the trained generator G to obtain the reconstructed image data result. This completes the reconstruction of the image data, realizes overall collaborative optimization of the bidirectional mapping between the image space and the hidden variable space, and improves the characterization capability and the image generation capability.
Preferably, the loss function in the module M3 comprises the Wasserstein distance:

$$W = \max_D \; \mathbb{E}_{x \sim P(x)}\big[D(F_x(x), F_z(E(x)))\big] - \mathbb{E}_{z \sim P(z)}\big[D(F_x(G(z)), F_z(z))\big]$$

wherein W denotes the Wasserstein distance; D outputs the likelihood probability of the input data pair; and $F_x$, $F_z$ denote the convolution blocks.
Preferably, the system further comprises: an embedded MMD discriminator module Dz is introduced, and the convolution blocks Fx and Fz, the encoder E and the generator G are multiplexed to realize an embedded GAN network; the embedded GAN network is used to reduce the MMD distance between the distributions of z and of the reconstruction result E(G(z)), strengthening the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder;

reducing the MMD distance between the distributions of z and of the reconstruction result E(G(z)) with the embedded GAN network to strengthen the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder comprises:

module M5: the encoding result E(x) is reconstructed by the generator G into the generated data G(E(x)); the generated data G(z) is re-encoded by the encoder E into the reconstructed encoding result E(G(z));

module M6: the embedded MMD discriminator module Dz is used to compute the MMD difference loss function $L_{D_z}$ between the distributions of z and $\hat{z} = E(G(z))$, and $L_{D_z}$ is used to optimize the embedded MMD discriminator module Dz:

$$L_{D_z} = \mathbb{E}_{z,z' \sim P_z}\big[k_{rep}(y_z, y_{z'})\big] - \mathbb{E}_{\hat{z},\hat{z}' \sim P_{\hat{z}}}\big[k_{rep}(y_{\hat{z}}, y_{\hat{z}'})\big] + \lambda_1 \, \mathbb{E}_{\tilde{z}}\Big[\big(\|\nabla_{\tilde{z}} D_z(\tilde{z})\|_2 - 1\big)^2\Big], \qquad k_{rep}(a, b) = e^{-\|a - b\|_2^2 / (2\sigma^2)}, \quad \tilde{z} = \varepsilon z + (1 - \varepsilon)\hat{z}$$

wherein $P_z$ denotes the hidden variable space distribution; $y_z = D_z(z)$ and $y_{z'} = D_z(z')$, with z, z' samples on the hidden variable space; $\hat{z} = E(G(z))$ denotes the reconstruction result of the hidden variable z, i.e., the re-encoding result of the generated data; $P_{\hat{z}}$ denotes the conditional probability distribution of $\hat{z}$; $\lambda_1$ denotes the gradient penalty term weight parameter; $\nabla_{\tilde{z}} D_z(\tilde{z})$ denotes the gradient taken on $\tilde{z}$; $\sigma$ denotes the variance coefficient of the Gaussian distribution that $k_{rep}$ conforms to; e is the base constant of the natural logarithm; and $\varepsilon$ denotes the weight parameter used in the weighted sampling between z and its reconstruction result $\hat{z}$;

module M7: the optimized embedded MMD discriminator module Dz is used to compute the MMD loss function $L_{EG}^{MMD}$ of the distributions of z and E(G(z)) with respect to the encoder E and the generator G:

$$L_{EG}^{MMD} = \mathbb{E}\big[k_F(y_z, y_{z'})\big] - 2\,\mathbb{E}\big[k_F(y_z, y_{\hat{z}})\big] + \mathbb{E}\big[k_F(y_{\hat{z}}, y_{\hat{z}'})\big], \qquad k_F(a, b) = e^{-\min(\max(\|a - b\|_2^2,\, b_l),\, b_u) / (2\sigma^2)}$$

wherein $k_F$ denotes the (bounded) Gaussian kernel function, and $b_l$, $b_u$ denote the lower- and upper-bound parameters for the L2 distance between a and b;

module M8: the loss function $L_{EG}^{MMD}$ is used to generate the guiding term $T_z$ on the coding space, thereby completing the training of the embedded GAN network:

$$T_z = L_{EG}^{MMD} + \alpha_1 \cdot \frac{1}{n_b} \sum_{i=1}^{n_b} \|z_i - \hat{z}_i\|_2^2$$

wherein $\alpha_1$ denotes the weighting coefficient between $L_{EG}^{MMD}$ and the pixel-level $l_2$ loss of the reconstruction operation; $n_b$ denotes the data batch size; and $\|z_i - \hat{z}_i\|_2^2$ is the $l_2$ loss between the original hidden variable z and the reconstruction result $\hat{z}$;

module M9: the trained embedded GAN network optimizes the convergence of the current Bi-GAE bidirectional structure, and the generated guiding term further improves the semantic characterization capability of the encoder E.
Preferably, the system further comprises: using an SSIM module to compute a guiding term $T_x$ from the real image x and the reconstructed image G(E(x)), strengthening the generator's capability to generate images that match human visual characteristics;

SSIM considers luminance (measured by l(x, x')), contrast and structure (measured jointly by cs(x, x')):

$$MSSSIM(x, x') = l_m^{\gamma_m}(x, x') \cdot \prod_{j=1}^{m} cs_j^{\eta_j}(x, x')$$

$$L_{MSSSIM}(x, x') = 1 - MSSSIM(x, x') \tag{19}$$

$$T_x = \alpha_2 \cdot L_{MSSSIM}(x, x') + (1 - \alpha_2) \cdot \frac{1}{n_b} \sum_{i=1}^{n_b} \|x_i - x'_i\|_1$$

$$l(x, x') = \frac{2\mu_x \mu_{x'} + c_1}{\mu_x^2 + \mu_{x'}^2 + c_1}, \qquad cs(x, x') = \frac{2\sigma_{xx'} + c_2}{\sigma_x^2 + \sigma_{x'}^2 + c_2}$$

wherein x' = G(E(x)); $L_{MSSSIM}(x, x')$ denotes the SSIM loss function; $\alpha_2$ denotes the weighting coefficient between the SSIM loss and the $l_1$ loss; $\|x - x'\|_1$ denotes the $l_1$ loss between x and the reconstruction result; $l_m^{\gamma_m}$ denotes the result of l(x, x') after the m Gaussian filters; $\gamma$ denotes an attenuation parameter; m denotes the number of Gaussian filters; $c_1$, $c_2$ denote constant parameters; $cs_j^{\eta_j}$ denotes the result produced by the j-th filter; $\eta_j$ denotes the attenuation coefficient of the corresponding window; if x has size $(s_i \times s_i)$, the j-th filter window size is $(s_i/2^{m-j}) \times (s_i/2^{m-j})$; $p_j$ denotes the filter window; $\mu_x$, $\mu_{x'}$ denote the means of x and x'; $\sigma_x$, $\sigma_{x'}$ denote the standard deviations of x and x'; and $\sigma_{xx'}$ denotes their covariance.
Compared with the prior art, the invention has the following beneficial effects:
1. The convergence guarantee-oriented unsupervised bidirectional generation autoencoder of the invention improves the information expansion between the hidden variable space and the data space through stable convergence;
2. The invention introduces a guiding term into the loss function to optimize image reconstruction and generation in the mapping according to the human visual pattern;
3. The invention embeds a GAN to compute a guiding term that enhances the characterization of semantically related features in the inverse mapping and strengthens the convergence of the autoencoder.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a logical framework diagram of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific embodiments. The following examples will help those skilled in the art further understand the invention, but do not limit it in any way. It should be noted that various changes and modifications can be made by those skilled in the art without departing from the spirit of the invention, all of which fall within the scope of the present invention.
The invention discloses a convergence-guaranteed unsupervised bidirectional generative autoencoder. Improving the generation and characterization capabilities of autoencoders is a major research problem in the field of machine learning. However, optimizing the bidirectional mapping while stabilizing convergence presents significant challenges, and most existing autoencoders fail to automatically trade off the bidirectional mapping between the encoder and the decoder/generator. The invention proposes Bi-GAE, an unsupervised bidirectional generative autoencoder based on BiGAN. First, we introduce two guiding terms into the loss function: one enhances information expansion so that the mapping follows the human visual model, and the other improves the semantically related characterization capability in the inverse mapping. In addition, we embed a GAN to improve the convergence and characterization capability of Bi-GAE. Experimental results show that Bi-GAE is competitive in both generation and characterization and converges stably. Compared with similar methods, the characterization capability of Bi-GAE improves the classification accuracy of high-resolution images by about 6.607%. Furthermore, in image reconstruction, Bi-GAE increases the structural similarity (SSIM) index by 0.017 and decreases the Fréchet Inception Distance (FID) by 3.098.
Example 1
The invention provides an unsupervised bidirectional generation automatic coding method for convergence guarantee, which comprises the following steps:
step S1: the batch data (x, z) are passed through the encoder E and the generator G, which simultaneously produce the encoding result E(x) and the generated data G(z), completing the mapping from the hidden variable space to the data space and the inverse mapping from the data space to the hidden variable space;

step S2: the image-space data $\hat{x}$ and the hidden-variable-space data $\hat{z}$ are passed through convolution blocks Fx and Fz, respectively, which extract information from $\hat{x}$ and $\hat{z}$ to obtain the extracted image-space features $F_x(\hat{x})$ and hidden-variable-space features $F_z(\hat{z})$;

step S3: the discriminator D is trained on the extracted features $F_x(\hat{x})$ and $F_z(\hat{z})$ until its loss function is minimized;

step S4: the image data are encoded by the trained encoder E to produce an encoding result, which is input into the trained generator G to obtain the reconstructed image data result. This completes the reconstruction of the image data, realizes overall collaborative optimization of the bidirectional mapping between the image space and the hidden variable space, and improves the characterization capability and the image generation capability.
Specifically, the generator G and the encoder E use a deconvolution network module and a convolution network module, respectively, from the DCGAN structure.
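As a minimal sketch of what such DCGAN-style modules look like, the blocks below use strided convolutions for the encoder side and transposed convolutions for the generator side; the kernel sizes, strides and activations are assumptions taken from common DCGAN practice, not values fixed by the patent.

```python
import torch.nn as nn

def conv_block(c_in, c_out):
    # Encoder-side DCGAN block: a strided convolution halves the resolution.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.2, inplace=True),
    )

def deconv_block(c_in, c_out):
    # Generator-side DCGAN block: a transposed convolution doubles the resolution.
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )
```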
Specifically, the loss function in step S3 comprises the Wasserstein distance:

$$W = \max_D \; \mathbb{E}_{x \sim P(x)}\big[D(F_x(x), F_z(E(x)))\big] - \mathbb{E}_{z \sim P(z)}\big[D(F_x(G(z)), F_z(z))\big]$$

wherein W denotes the Wasserstein distance; D outputs the likelihood probability of the input data pair; and $F_x$, $F_z$ denote the convolution blocks.
Specifically, the method further comprises the following steps: an embedded MMD discriminator module Dz is introduced, and the convolution blocks Fx and Fz, the encoder E and the generator G are multiplexed to realize an embedded GAN network; the embedded GAN network is used to reduce the MMD distance between the distributions of z and of the reconstruction result E(G(z)), strengthening the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder.
Specifically, reducing the MMD distance between the distributions of z and of the reconstruction result E(G(z)) with the embedded GAN network to strengthen the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder comprises:

step S5: the encoding result E(x) is reconstructed by the generator G into the generated data G(E(x)); the generated data G(z) is re-encoded by the encoder E into the reconstructed encoding result E(G(z));

step S6: the embedded MMD discriminator module Dz is used to compute the MMD difference loss function $L_{D_z}$ between the distributions of z and $\hat{z} = E(G(z))$, and $L_{D_z}$ is used to optimize the embedded MMD discriminator module Dz:

$$L_{D_z} = \mathbb{E}_{z,z' \sim P_z}\big[k_{rep}(y_z, y_{z'})\big] - \mathbb{E}_{\hat{z},\hat{z}' \sim P_{\hat{z}}}\big[k_{rep}(y_{\hat{z}}, y_{\hat{z}'})\big] + \lambda_1 \, \mathbb{E}_{\tilde{z}}\Big[\big(\|\nabla_{\tilde{z}} D_z(\tilde{z})\|_2 - 1\big)^2\Big], \qquad k_{rep}(a, b) = e^{-\|a - b\|_2^2 / (2\sigma^2)}, \quad \tilde{z} = \varepsilon z + (1 - \varepsilon)\hat{z}$$

wherein $P_z$ denotes the hidden variable space distribution; $y_z = D_z(z)$ and $y_{z'} = D_z(z')$, with z, z' samples on the hidden variable space; $\hat{z} = E(G(z))$ denotes the reconstruction result of the hidden variable z, i.e., the re-encoding result of the generated data; $P_{\hat{z}}$ denotes the conditional probability distribution of $\hat{z}$; $\lambda_1$ denotes the gradient penalty term weight parameter; $\nabla_{\tilde{z}} D_z(\tilde{z})$ denotes the gradient taken on $\tilde{z}$; $\sigma$ denotes the variance coefficient of the Gaussian distribution that $k_{rep}$ conforms to; e is the base constant of the natural logarithm; and $\varepsilon$ denotes the weight parameter used in the weighted sampling between z and its reconstruction result $\hat{z}$;

step S7: the optimized embedded MMD discriminator module Dz is used to compute the MMD loss function $L_{EG}^{MMD}$ of the distributions of z and E(G(z)) with respect to the encoder E and the generator G:

$$L_{EG}^{MMD} = \mathbb{E}\big[k_F(y_z, y_{z'})\big] - 2\,\mathbb{E}\big[k_F(y_z, y_{\hat{z}})\big] + \mathbb{E}\big[k_F(y_{\hat{z}}, y_{\hat{z}'})\big], \qquad k_F(a, b) = e^{-\min(\max(\|a - b\|_2^2,\, b_l),\, b_u) / (2\sigma^2)}$$

wherein $k_F$ denotes the (bounded) Gaussian kernel function, and $b_l$, $b_u$ denote the lower- and upper-bound parameters for the L2 distance between a and b;

step S8: the loss function $L_{EG}^{MMD}$ is used to generate the guiding term $T_z$ on the coding space, thereby completing the training of the embedded GAN network:

$$T_z = L_{EG}^{MMD} + \alpha_1 \cdot \frac{1}{n_b} \sum_{i=1}^{n_b} \|z_i - \hat{z}_i\|_2^2$$

wherein $\alpha_1$ denotes the weighting coefficient between $L_{EG}^{MMD}$ and the pixel-level $l_2$ loss of the reconstruction operation; $n_b$ denotes the data batch size; and $\|z_i - \hat{z}_i\|_2^2$ is the $l_2$ loss between the original hidden variable z and the reconstruction result $\hat{z}$;

step S9: the trained embedded GAN network optimizes the convergence of the current Bi-GAE bidirectional structure, and the generated guiding term further improves the semantic characterization capability of the encoder E.
Specifically, the method further comprises the following steps: using an SSIM module to compute a guiding term $T_x$ from the real image x and the reconstructed image G(E(x)), strengthening the generator's capability to generate images that match human visual characteristics;

SSIM considers luminance (measured by l(x, x')), contrast and structure (measured jointly by cs(x, x')):

$$MSSSIM(x, x') = l_m^{\gamma_m}(x, x') \cdot \prod_{j=1}^{m} cs_j^{\eta_j}(x, x')$$

$$L_{MSSSIM}(x, x') = 1 - MSSSIM(x, x') \tag{8}$$

$$T_x = \alpha_2 \cdot L_{MSSSIM}(x, x') + (1 - \alpha_2) \cdot \frac{1}{n_b} \sum_{i=1}^{n_b} \|x_i - x'_i\|_1$$

$$l(x, x') = \frac{2\mu_x \mu_{x'} + c_1}{\mu_x^2 + \mu_{x'}^2 + c_1}, \qquad cs(x, x') = \frac{2\sigma_{xx'} + c_2}{\sigma_x^2 + \sigma_{x'}^2 + c_2}$$

wherein x' = G(E(x)); $L_{MSSSIM}(x, x')$ denotes the SSIM loss function; $\alpha_2$ denotes the weighting coefficient between the SSIM loss and the $l_1$ loss; $\|x - x'\|_1$ denotes the $l_1$ loss between x and the reconstruction result; $l_m^{\gamma_m}$ denotes the result of l(x, x') after the m Gaussian filters; $\gamma$ denotes an attenuation parameter; m denotes the number of Gaussian filters; $c_1$, $c_2$ denote constant parameters; $cs_j^{\eta_j}$ denotes the result produced by the j-th filter; $\eta_j$ denotes the attenuation coefficient of the corresponding window; if x has size $(s_i \times s_i)$, the j-th filter window size is $(s_i/2^{m-j}) \times (s_i/2^{m-j})$; $p_j$ denotes the filter window; $\mu_x$, $\mu_{x'}$ denote the means of x and x'; $\sigma_x$, $\sigma_{x'}$ denote the standard deviations of x and x'; and $\sigma_{xx'}$ denotes their covariance.
The convergence guarantee-oriented unsupervised bidirectional generation automatic coding system can be realized through the step flow in the convergence guarantee-oriented unsupervised bidirectional generation automatic coding method. The convergence guarantee oriented unsupervised bidirectional generation automatic coding method can be understood as a preferred example of the convergence guarantee oriented unsupervised bidirectional generation automatic coding system by those skilled in the art.
Example 2
Example 2 is a preferred example of example 1
To overcome the shortcomings of existing autoencoders in bidirectional mapping balance and convergence, the invention provides an unsupervised generative autoencoder based on BiGAN that effectively improves the generation and characterization capabilities of the autoencoder.
The invention provides an unsupervised generative autoencoder that simultaneously achieves balance and joint optimization of the mapping and the inverse mapping. Addressing the limitations of BiGAN, the invention makes two main optimizations in Bi-GAE. First, the invention introduces the Wasserstein distance and embeds a GAN to strengthen the convergence of Bi-GAE; to estimate the Wasserstein distance of the joint distribution, two convolution blocks are designed in Bi-GAE for feature extraction, and the convergence of Bi-GAE is proven theoretically. Second, the invention introduces two guiding terms, one each for the generator and the encoder, which enable information expansion in the mapping and the inverse mapping; these expansions effectively achieve the trade-off between the bidirectional mappings.
FIG. 1 shows the framework of the convergence guarantee-oriented unsupervised bidirectional generation autoencoder of the invention. Similar to BiGAN, the main part of Bi-GAE comprises a generator G, an encoder E and a discriminator D. As a bidirectional mapping encoder, the two optimization targets of the architecture are: (1) the mapping from the hidden variable space to the data space (generator G); (2) the inverse mapping from the data space to the hidden variable space (encoder E). Compared with ordinary encoders and generative encoders, the basic improvements of this architecture are: (1) the joint distribution of the data space and the hidden variable space is used to optimize the bidirectional mappings of the image space and the hidden variable space simultaneously, realizing overall collaborative optimization of the bidirectional process; (2) guiding mechanisms are introduced into the bidirectional mapping training process, guaranteeing the convergence of the bidirectional architecture while separately optimizing the visual properties of the generator and the semantically related characterization of the encoder. In the concrete implementation, these ideas build on the basic BiGAN architecture, introduce the Wasserstein distance as the judgment index of the discriminator D, and design four specific embedded modules to implement the two guiding mechanisms in the bidirectional mapping process.
As shown in FIG. 1, the main part of Bi-GAE is based on the BiGAN structure; in the concrete implementation, the encoder E and the generator G of Bi-GAE use a convolution network module and a deconvolution network module from the DCGAN structure, respectively. In the main structure of Bi-GAE, the Wasserstein distance is introduced as the index with which the discriminator D (implemented with a convolution block from DCGAN) judges the distance between the joint distributions. Accordingly, to extract features from the data space and the hidden variable space simultaneously, two feature extraction convolution blocks Fx and Fz are introduced, which extract the features of the two spaces as vectors that are input to D. For the two guiding mechanisms described above, an SSIM module computes the structural difference between the real data x and the corresponding reconstructed data G(E(x)); for the coding space, an MMD-based adversarial network module is further embedded, which strengthens the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder by reducing the MMD distance between the distributions of z and of the reconstruction result E(G(z)). The bidirectional training process, the Wasserstein distance, the two guiding mechanisms and the four embedded modules are elaborated below.
In each batch training step, the batch data input to Bi-GAE is (x, z): a batch of data-space instances x input to the encoder E and a batch of normally distributed hidden-variable-space samples z input to the generator G. The specific training process is as follows:
(1) Using (x, z), the encoder E and the generator G simultaneously produce the encoding result E(x) and the generated data G(z). This step simultaneously completes the mapping from the hidden variable space to the data space and the inverse mapping from the data space to the hidden variable space.
(2) After obtaining the image-space data $\hat{x}$ and the hidden-variable-space data $\hat{z}$, and following the BiGAN bidirectional training concept, the discriminator judges the difference between the two joint distributions formed by the real pairs (x, E(x)) and the generated pairs (G(z), z). The core idea is to train the discriminator D so that the distance it judges between the real distribution and the generated distribution is as large as possible; for the synchronously trained encoder E and generator G, the Wasserstein distance judged by the optimized D is used for training, with the opposite goal of making D unable to distinguish the two distributions, i.e., making the judged distance between the generated data and the real data as small as possible. This follows the core principle of BiGAN; mathematically, the goal can be stated as the two joint probabilities being equal, at which point the bidirectional optimization is considered successfully completed. The joint distribution of {x, E(x)} can be written as $P_{x,E} = P_E(z|x)P(x)$ and that of {G(z), z} as $P_{G,z} = P_G(x|z)P(z)$, where P(x) and P(z) are the true distributions of the data space and the hidden variable space, and $P_E$, $P_G$ are the conditional distributions of the encoder and the generator. Therefore, when the joint distributions coincide, $P_E$ matches P(z) and $P_G$ matches P(x), completing the synchronous optimization of the encoder and the generator. To guarantee convergence, the Wasserstein distance is introduced to replace the KL divergence used in BiGAN.
(3) Based on the main flow, the reconstruction G(E(x)) of x is obtained from the encoding result E(x), and the reconstruction E(G(z)) of z is obtained from the generated result G(z). Using (E(G(z)), z), the MMD embedded network guides the encoder E toward strengthened semantic characterization; using (G(E(x)), x), the SSIM module simultaneously guides the generator G toward satisfying the visual characteristics of the human eye.
In order to solve the problem that BiGAN is difficult to converge, the Wasserstein distance must be introduced into the BiGAN structure as a loss function. This requires first extracting features $\hat{x}$ from the image space and $\hat{z}$ from the hidden variable space, so two convolution modules Fx and Fz are designed for this purpose.
In order to make the generation capability of the generator G meet the visual characteristics of the human eye, the structural similarity index method (SSIM) loss is used to construct the guiding term Tx as part of the generator's loss function. The overall flow for computing Tx is: an SSIM module is introduced between G(E(x)) and x to compute the similarity loss between the reconstructed image space and the original image space.
in order to solve the problem that estimation of Wasserstein is invalid when a discriminator is close to convergence, an embedded GAN model based on MMD is nested on the basis of the existing structure. The embedded model effectively multiplexes the encoder E and generator G, and accordingly, an embedded GAN discriminator should be introduced.
Introducing Wasserstein distance
A traditional BiGAN network is optimized over the joint data space. Let the distribution of the image data space X be P(X) and the distribution of the hidden variable space Z be P(Z). The training target of the discriminator D of a traditional BiGAN network is to maximize the probability $P_D(Y|X,Z)$, where Y = 1 for pairs (x, E(x)) and Y = 0 for pairs (G(z), z), i.e., to maximize the judgment accuracy; the generator G and the encoder E are trained toward the opposite target. The training objective of BiGAN as a bidirectional structure is therefore:

$$\min_{G,E} \max_D \; \mathbb{E}_{x \sim P(X)}\big[\log D(x, E(x))\big] + \mathbb{E}_{z \sim P(Z)}\big[\log\big(1 - D(G(z), z)\big)\big] \tag{1}$$

wherein D(·) gives the likelihood probability of the input data pair, and G(·), E(·) are the outputs of the generator and the encoder.
As shown in equation (1), traditional BiGAN uses the Jensen-Shannon (JS) divergence or the Kullback-Leibler (KL) divergence when estimating the judged distance between the real data space (x, E(x)) and the generated data space (G(z), z). These divergences have an obvious problem: the gradient of the loss function associated with equation (1) becomes ineffective as D tends to converge, causing the training of the BiGAN structure to fail. An effective way to optimize the convergence of the BiGAN bidirectional structure is therefore to introduce a distance measure whose corresponding loss function always has an effective, nonzero gradient. For this reason, the Wasserstein distance is introduced into Bi-GAE. For a real data distribution $P_x$ and a generated data distribution $P_g$, the Wasserstein distance W between the two distributions is estimated in practice as follows:

$$W(P_x, P_g) = \max_{D} \; \mathbb{E}_{x \sim P_x}\big[D(x)\big] - \mathbb{E}_{z \sim P_z}\big[D(g_\theta(z))\big] \tag{2}$$

wherein $g_\theta(\cdot)$ and $D(\cdot)$ are the outputs of the generator G and the discriminator D, respectively, and $\mathbb{E}$ denotes expectation. However, equation (2) requires that D(·) satisfy the 1-Lipschitz constraint, and the Wasserstein distance still cannot be estimated correctly when D is close to convergence, so these problems are addressed by further design.
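The 1-Lipschitz constraint mentioned above is commonly enforced in practice with a gradient penalty in the style of WGAN-GP. The sketch below shows this standard technique on a generic critic; applying it to the joint features of both spaces, as equation (4) below does, follows the same pattern. The function returns the unweighted penalty, to be scaled by the weight λ.

```python
import torch

def gradient_penalty(D, real, fake):
    """Standard WGAN-GP penalty pushing the critic D toward 1-Lipschitz.

    `real` and `fake` are batches from the two distributions being
    compared; the penalty is evaluated at random interpolations of them.
    """
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)),
                     device=real.device)
    mixed = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads, = torch.autograd.grad(D(mixed).sum(), mixed, create_graph=True)
    return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
```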
Introducing the Wasserstein distance into the bidirectional adversarial network requires correspondingly designing two convolution blocks Fx and Fz, used respectively to extract information from the image-space data $\hat{x}$ and the hidden-variable-space data $\hat{z}$, so that the two pieces of information have data shapes that can be aggregated and concatenated. Writing the extracted features as $F_x(\hat{x})$ and $F_z(\hat{z})$, the final computation in Bi-GAE of the Wasserstein distance between the real data space and the generated data space is as follows:

$$W = \max_D \; \mathbb{E}_{x \sim P(x)}\big[D(F_x(x), F_z(E(x)))\big] - \mathbb{E}_{z \sim P(z)}\big[D(F_x(G(z)), F_z(z))\big] \tag{3}$$
in summary, the penalty function L of the discriminator-convolution block (D-Fx-Fz) is designed accordinglyDFJoint loss function L of sum generator-encoder (G-E)EGAs shown in equations (4) and (5), respectively:
Figure BDA0003121662180000138
wherein ,nbFor batch size, to ensure that the 1-Lipschit limit is met, a Gradient Penalty Term (GP Term) is introduced. In the penalty term, there are
Figure BDA0003121662180000139
Figure BDA00031216621800001310
wherein ,TxAs a guide term for the generation capability, ε represents the comparison of the real data sample x and the generated data sample
Figure BDA00031216621800001311
Weight parameter in weighted sampling, true hidden space data sample z and coding result sample
Figure BDA00031216621800001312
Calculating a weight parameter during weighted sampling, wherein lambda represents a weight coefficient of a ladder penalty term,
Figure BDA00031216621800001313
to represent
Figure BDA00031216621800001314
In that
Figure BDA00031216621800001315
Gradient found over variable, sigma representing TxThe weight parameter of (2).
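As a hedged sketch, equations (4) and (5) translate into batch-mean critic scores plus the penalty and guiding terms. The helper arguments and the default weight values here are assumptions for illustration; `gp_term` is an unweighted gradient penalty as in the earlier sketch, and `t_x` is the SSIM guiding term of equation (13) below.

```python
import torch

def bi_gae_losses(D, Fx, Fz, E, G, x, z, gp_term, t_x,
                  lam=10.0, sigma=1.0):
    """Empirical L_DF (eq. 4) and L_EG (eq. 5)."""
    real = D(torch.cat([Fx(x), Fz(E(x))], dim=1)).mean()
    fake = D(torch.cat([Fx(G(z)), Fz(z)], dim=1)).mean()
    l_df = fake - real + lam * gp_term   # trains D, Fx, Fz (eq. 4)
    l_eg = real - fake + sigma * t_x     # trains G, E (eq. 5)
    return l_df, l_eg
```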
Embedding an MMD-based GAN network
Introducing the Wasserstein distance does not by itself solve the problem of invalid estimation near convergence: when the parameters of the discriminator D tend to converge, the network can no longer estimate the Wasserstein distance correctly, so Bi-GAE cannot be trained with a correct gradient. Meanwhile, to strengthen the semantic characterization capability of the encoder in the mapping from the data space to the hidden variable space, a guiding term is designed from the error of the encoding reconstruction. Both of these goals can be achieved with an MMD-based embedded GAN network.
First, we briefly describe the flow of the MMD embedded GAN network. The invention introduces an embedded MMD discriminator module Dz and multiplexes the convolution blocks Fx and Fz, the encoder E and the generator G to realize the embedded GAN network. For the input hidden variable z and the reconstruction result E(G(z)) produced by the completed main process, the embedded GAN network operates as follows:
1) First, the embedded Dz module is used to compute the MMD difference loss $L_{D_z}$ between the distributions of z and E(G(z)); this loss is used to optimize the Dz module, as shown in equation (8).
2) The optimized Dz module is then used to compute the MMD loss $L_{EG}^{MMD}$ of the distributions of z and E(G(z)) with respect to E and G, as shown in equation (9); this loss is used to generate the guiding term Tz on the coding space, as shown in equation (10), thereby completing the training of the embedded GAN network.
The embedded GAN network further optimizes the convergence of the current Bi-GAE bidirectional structure, and the generated guiding term further improves the semantic characterization capability of the encoder E. The theoretical derivation and the design of the guiding mechanism of the MMD-based embedded GAN network are as follows.
Optimizing convergence
For the estimation failure of the Wasserstein distance when D is close to convergence, a solution can be found in how the Wasserstein distance is computed: the Wasserstein distance is in essence a special case of the Maximum Mean Discrepancy (MMD), namely the case of a linear kernel, so using MMD with a higher-order Gaussian kernel as the index for measuring the difference of distributions can further improve convergence performance. Combining the definition of MMD with the characteristics of the joint distributions to be solved in Bi-GAE, we define MMD on the joint distribution space of image data and hidden variables as follows, with a convergence threshold e and {f} a set of continuous functions:

$$M(a, b) = \sup_{f \in \{f\}} \; \mathbb{E}\big[f(a)\big] - \mathbb{E}\big[f(b)\big] \tag{6}$$

wherein $\mathbb{E}_f$ denotes the expected difference obtained when f is used to measure the sample difference between two distributions; it can be understood as reflecting the difference of the distributions. As the parameters of D tend to converge (i.e., as the measured difference falls to the threshold e), two lemmas, Lemma 1 and Lemma 2, allow an effective quantitative estimate of the convergence of the introduced embedded GAN structure.
Lemma 1: Suppose the discriminator D has converged, i.e., its judged distance on each space is below the threshold e. Then $M(E(x), z) \leq \sigma$ and $M(x, G(z)) \leq \varepsilon$, where $\sigma \leq e$ and $\varepsilon \leq e$, and by the MMD definition $\sigma > 0$ and $\varepsilon > 0$.

Proof: By analogy with equation (6), using the batch samples $\hat{z}_i$ and $z_i$ with a measurement function $f_0$, the expected difference between $\mathbb{E}[f_0(\hat{z})]$ and $\mathbb{E}[f_0(z)]$ yields the bound $\sigma$ on M(E(x), z); similarly, using a function $g_0$ on the batch samples $\hat{x}_i$ and $x_i$ yields the bound $\varepsilon$ on M(x, G(z)). The same argument applies symmetrically, which proves the statement for both spaces.

The proof of Lemma 1 means that when D converges on the hidden variable space and the image data space respectively, Bi-GAE also converges on the joint distribution space, i.e., achieves overall convergence.
Lemma 2: When f is a continuous function, $M(f(a), f(b)) \leq M(a, b)$.

Proof: Suppose $M(f(a), f(b)) > M(a, b) = \tau$; then there exists a continuous g such that $\mathbb{E}[g(f(a))] - \mathbb{E}[g(f(b))] > \tau$. Let $k = g \circ f$; then $\mathbb{E}[k(a)] - \mathbb{E}[k(b)] > \tau = M(a, b)$, which contradicts the MMD definition. Therefore $M(f(a), f(b)) \leq M(a, b)$.

Lemma 2 means that MMD is stable under continuous functions f (such as the G and E trained in Bi-GAE). Since the GAN embedded in Bi-GAE is MMD-based, the upper bound of M(E(G(z)), z) can be deduced from Lemmas 1 and 2.
From Lemma 1 and Lemma 2 we can derive Theorem 1 as follows:

Theorem 1: When D is close to convergence and no embedded GAN structure is introduced (i.e., without adding Dz), the upper bound of M(E(G(z)), z) tends to 2e.

Proof:
M(E(G(z)), z) ≤ M(E(G(z)), E(x)) + M(E(x), z)
≤ M(x, G(z)) + M(E(x), z)            // Lemma 2
≤ ε + σ ≤ 2·max({ε, σ}) ≤ 2e        // Lemma 1

When the embedded GAN is introduced, i.e., after the Dz module is added, it follows from Lemma 2 and equation (6) that once Dz converges, the MMD satisfies:

$$M'(E(G(z)), z) \leq M'(x, G(z)) + M'(E(x), z) \leq e \tag{7}$$

Equation (7) means that the upper bound of M'(E(G(z)), z) is reduced to e. Essentially, this reduction promotes the convergence of Bi-GAE by promoting the information interaction capability between the encoding and true hidden variable spaces (E(x), z) and between the real and generated image spaces (x, G(z)), respectively.
Enhancing the semantic characterization capability of the encoder
The goal of Bi-GAE in implementing the mapping from image space to hidden variable space is to enhance the semantically related characterization capability when disentanglement is taken into account. According to equation (7), the GAN embedded in Bi-GAE realizes information expansion in the encoding process by reducing M(E(x), z), thereby improving the characterization capability. The MMD-based loss of the embedded GAN is composed of a repulsion term $L_{rep}$ and an attraction term $L_{att}$. In the actual computation, a single bounded Gaussian kernel (kernel function $k_F$) is selected to reduce the amount of computation. In the computation, let $\hat{z} = E(G(z))$ and let $P_{\hat{z}}$ denote the conditional probability distribution of $\hat{z}$. In summary, the Dz loss function $L_{D_z}$ and the E-G (encoder-generator) loss function $L_{EG}^{MMD}$ are shown in equations (8) and (9), respectively:
$$L_{D_z} = \mathbb{E}_{z,z' \sim P_z}\big[k_{rep}(y_z, y_{z'})\big] - \mathbb{E}_{\hat{z},\hat{z}' \sim P_{\hat{z}}}\big[k_{rep}(y_{\hat{z}}, y_{\hat{z}'})\big] + \lambda_1 \, \mathbb{E}_{\tilde{z}}\Big[\big(\|\nabla_{\tilde{z}} D_z(\tilde{z})\|_2 - 1\big)^2\Big] \tag{8}$$

$$L_{EG}^{MMD} = \mathbb{E}\big[k_F(y_z, y_{z'})\big] - 2\,\mathbb{E}\big[k_F(y_z, y_{\hat{z}})\big] + \mathbb{E}\big[k_F(y_{\hat{z}}, y_{\hat{z}'})\big] \tag{9}$$

with $k_{rep}(a, b) = e^{-\|a - b\|_2^2/(2\sigma^2)}$, $k_F(a, b) = e^{-\min(\max(\|a - b\|_2^2,\, b_l),\, b_u)/(2\sigma^2)}$ and $\tilde{z} = \varepsilon z + (1 - \varepsilon)\hat{z}$,

wherein $P_z$ denotes the hidden variable space distribution; $y_z = D_z(z)$ and $y_{z'} = D_z(z')$, with z, z' samples on the hidden variable space; $\hat{z} = E(G(z))$ denotes the reconstruction result of the hidden variable z, i.e., the re-encoding result of the generated data; $P_{\hat{z}}$ denotes the conditional probability distribution of $\hat{z}$; $\lambda_1$ denotes the gradient penalty term weight parameter; $\nabla_{\tilde{z}} D_z(\tilde{z})$ denotes the gradient taken on $\tilde{z}$; $\sigma$ denotes the variance coefficient of the Gaussian distribution that $k_{rep}$ conforms to; e is the base constant of the natural logarithm (≈ 2.718281828459…); $\varepsilon$ denotes the weight parameter used in the weighted sampling between z and its reconstruction result $\hat{z}$; $k_F$ denotes the kernel function (a bounded Gaussian kernel); and $b_l$, $b_u$ denote the lower- and upper-bound parameters for the L2 distance between a and b.
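A sketch of the bounded Gaussian kernel and the resulting batch MMD² estimate between the Dz embeddings of z and of E(G(z)) follows; the bound values and kernel width are illustrative assumptions, not values from the patent.

```python
import torch

def k_f(a, b, sigma=1.0, b_l=0.25, b_u=4.0):
    # Bounded Gaussian kernel: the squared L2 distance is clipped to
    # [b_l, b_u] before the exponential, as in eq. (9).
    d2 = torch.cdist(a, b).pow(2).clamp(min=b_l, max=b_u)
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd2(y_z, y_zhat, **kw):
    # Biased batch estimate of MMD^2 between y_z = Dz(z) and
    # y_zhat = Dz(E(G(z))); diagonal self-terms are included.
    return (k_f(y_z, y_z, **kw).mean()
            - 2 * k_f(y_z, y_zhat, **kw).mean()
            + k_f(y_zhat, y_zhat, **kw).mean())
```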
For the encoder-decoder, in order to prevent excessive element-level errors, the L2 loss is used as a regularization term. The guiding term $T_z$ with which the encoder enhances its semantic characterization capability is finally defined as:

$$T_z = L_{EG}^{MMD} + \alpha_1 \cdot \frac{1}{n_b} \sum_{i=1}^{n_b} \|z_i - \hat{z}_i\|_2^2 \tag{10}$$

wherein $\alpha_1$ denotes the weighting coefficient between $L_{EG}^{MMD}$ and the pixel-level $l_2$ loss of the reconstruction operation; $n_b$ denotes the data batch size; and $\|z_i - \hat{z}_i\|_2^2$ is the $l_2$ loss between the original hidden variable z and the reconstruction result $\hat{z}$.
3) human eye visual feature generation capability of enhanced generator introduced with SSIM module
One of the goals of interest of Bi-GAE in generating image data is to facilitate image generation and reconstruction in accordance with the human visual model. In order to realize information expansion in the process of mapping from a hidden variable space to an image space, the Bi-GAE introduces a Structural Similarity Index Method (SSIM), which comprises three human visual indicators, namely brightness (luminance), contrast (contrast) and structure (structure). Therefore, Bi-GAE designs a guiding term T between the real image x and the reconstructed image G (E (x)))x
When training the encoder E and the generator G, we compute x' = G(E(x)). If x has size $(s_i \times s_i)$ and there are m Gaussian filters, MS-SSIM (multi-scale SSIM) is computed as follows:

$$MSSSIM(x, x') = l_m^{\gamma_m}(x, x') \cdot \prod_{j=1}^{m} cs_j^{\eta_j}(x, x') \tag{11}$$

wherein $l_m^{\gamma_m}$ denotes the result of l(x, x') after the m Gaussian filters, with $\gamma$ an attenuation parameter and m the number of Gaussian filters; $c_1$ is a constant parameter; $cs_j^{\eta_j}$ denotes the result produced by the j-th filter, with $\eta_j$ the attenuation coefficient of the corresponding window; if x has size $(s_i \times s_i)$, the j-th filter window size is $(s_i/2^{m-j}) \times (s_i/2^{m-j})$. l(x, x') and cs(x, x') are defined as follows:

$$l(x, x') = \frac{2\mu_x \mu_{x'} + c_1}{\mu_x^2 + \mu_{x'}^2 + c_1}, \qquad cs(x, x') = \frac{2\sigma_{xx'} + c_2}{\sigma_x^2 + \sigma_{x'}^2 + c_2}$$

wherein $p_j$ denotes the filter window; $\mu_x$, $\mu_{x'}$ denote the means of x and x'; $\sigma_x$, $\sigma_{x'}$ denote the standard deviations of x and x'; and $\sigma_{xx'}$ denotes their covariance.
The corresponding design SSIM loss function is:
LMSSSIM(x,x′)=1-MSSSIM(x,x′) (12)
analogy TzIn order to avoid excessive pixel-level errors during reconstruction, we need to add a regular term, considering that the dimension of the image space is high, and in order to prevent the problem of potential excessive penalty of the L2 loss function, we use the L1 loss function to implement the regular term here, and in conclusion, we design the self-guiding term T for the generatorxAs follows:
$$T_x = \alpha_2\, L_{MS\text{-}SSIM}(x,x') + (1-\alpha_2)\, l_1(x,x') \quad (13)$$

wherein x' = G(E(x)); $\alpha_2$ (0.84 by default) denotes the weighting coefficient between the SSIM loss and the $l_1$ loss; $l_1(x,x')$ denotes the $l_1$ loss value between x and its reconstruction result.
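As an illustration, the following sketch computes $T_x$ of equation (13), assuming PyTorch, single-channel images scaled to [0, 1], and a single-scale SSIM standing in for the multi-scale form of equation (11); all names and the window parameters are ours:

import torch
import torch.nn.functional as F

def gaussian_window(size: int = 11, sigma: float = 1.5) -> torch.Tensor:
    coords = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-coords.pow(2) / (2 * sigma ** 2))
    g = (g / g.sum()).unsqueeze(0)
    return (g.t() @ g).view(1, 1, size, size)   # separable 2-D Gaussian window

def ssim(x: torch.Tensor, y: torch.Tensor,
         c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> torch.Tensor:
    # x, y: (N, 1, H, W) images in [0, 1].
    w = gaussian_window().to(x.device)
    mu_x, mu_y = F.conv2d(x, w, padding=5), F.conv2d(y, w, padding=5)
    var_x = F.conv2d(x * x, w, padding=5) - mu_x ** 2
    var_y = F.conv2d(y * y, w, padding=5) - mu_y ** 2
    cov = F.conv2d(x * y, w, padding=5) - mu_x * mu_y
    l = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)   # luminance term
    cs = (2 * cov + c2) / (var_x + var_y + c2)                  # contrast-structure term
    return (l * cs).mean()

def guide_term_tx(x: torch.Tensor, x_rec: torch.Tensor,
                  alpha2: float = 0.84) -> torch.Tensor:
    # T_x = alpha2 * (1 - SSIM) + (1 - alpha2) * l1, with alpha2 = 0.84 by default.
    return alpha2 * (1.0 - ssim(x, x_rec)) + (1.0 - alpha2) * F.l1_loss(x, x_rec)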
System implementation
Each component of Bi-GAE is implemented based on the source code of DCGAN. Let $\theta_E$, $\theta_G$, $\theta_D$, $\theta_{Dz}$ and $\theta_F = \{\theta_{Fx}, \theta_{Fz}\}$ denote the parameters of E, G, D, Dz and {Fx, Fz}, respectively. Accordingly, the present invention uses three custom Adam optimizers with β1 = 0.5 and β2 = 0.9: Adam_EG for optimizing $\theta_G$ and $\theta_E$, Adam_FD for optimizing $\theta_F$ and $\theta_D$, and Adam_Dz for optimizing $\theta_{Dz}$, with learning rates lr_EG, lr_FD and lr_Dz, respectively.
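For concreteness, the optimizer setup could look as follows (a minimal sketch, assuming PyTorch; the stand-in modules and the learning-rate values are ours):

import itertools
import torch

# Stand-in modules; in the real system these are the DCGAN-based networks.
E, G, D, Dz, Fx, Fz = (torch.nn.Linear(8, 8) for _ in range(6))
lr_EG = lr_FD = lr_Dz = 2e-4      # illustrative learning rates
betas = (0.5, 0.9)                # beta1 = 0.5, beta2 = 0.9 as stated above

adam_EG = torch.optim.Adam(itertools.chain(G.parameters(), E.parameters()),
                           lr=lr_EG, betas=betas)    # optimizes theta_G and theta_E
adam_FD = torch.optim.Adam(itertools.chain(Fx.parameters(), Fz.parameters(),
                                           D.parameters()),
                           lr=lr_FD, betas=betas)    # optimizes theta_F and theta_D
adam_Dz = torch.optim.Adam(Dz.parameters(), lr=lr_Dz, betas=betas)  # optimizes theta_Dz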
Bi-GAE training comprises the following 4 steps:
Step 1: train D and {Fx, Fz} on a batch of data and latent samples (x, z) using the loss function in equation (4), i.e., the Wasserstein loss of the joint distribution over the data space and the latent space as judged by the discriminator D. The loss $L_{DF}$ is trained iteratively and is used to update the discriminator D and the convolution feature extraction modules F; this step is repeated $n_D$ times (default 5).
Step 2: train G and E with another batch of data (x, z), using the loss function $L_{EG}$ in equation (5). As equation (5) shows, this loss is a weighted combination of the joint-distribution Wasserstein distance judged by the optimized discriminator and the SSIM difference loss between the data x and its reconstruction result. Given x, compute x' = G(E_ng(x)) (E_ng denoting the encoder output used without gradient propagation) and use (x, x') to calculate $T_x$ in equation (13) to train G.
Step 3: input a batch of z to train Dz, performing iterative training with the loss function $L_{Dz}$ in equation (8); this loss is the MMD loss between z and its reconstruction result E(G(z)). This step is repeated $n_{Dz}$ times (default 3).
Step 4: input a batch of z to calculate $\hat{z} = E(G(z))$; input $(z, \hat{z})$ and train E with the loss function $T_z$ in equation (10).
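Putting the four steps together, a minimal training-loop sketch follows, assuming PyTorch; the toy linear modules and the simplified stand-in losses are ours and only illustrate the schedule, not equations (4)-(10) in full:

import itertools
import torch

dim_x, dim_z = 16, 4
n_D, n_Dz = 5, 3                                   # default repeat counts
E  = torch.nn.Linear(dim_x, dim_z)                 # encoder
G  = torch.nn.Linear(dim_z, dim_x)                 # generator
D  = torch.nn.Linear(dim_x + dim_z, 1)             # joint discriminator (Fx, Fz folded in)
Dz = torch.nn.Linear(dim_z, dim_z)                 # embedded MMD discriminator

betas = (0.5, 0.9)
adam_EG = torch.optim.Adam(itertools.chain(E.parameters(), G.parameters()),
                           lr=2e-4, betas=betas)
adam_FD = torch.optim.Adam(D.parameters(), lr=2e-4, betas=betas)
adam_Dz = torch.optim.Adam(Dz.parameters(), lr=2e-4, betas=betas)

def sample_batch(n: int = 32):
    return torch.randn(n, dim_x), torch.randn(n, dim_z)

def critic_gap() -> torch.Tensor:
    # Wasserstein-style gap between the joint pairs (x, E(x)) and (G(z), z).
    x, z = sample_batch()
    return (D(torch.cat([x, E(x)], 1)).mean()
            - D(torch.cat([G(z), z], 1)).mean())

for it in range(100):
    # Step 1: train D (with the feature extractors) n_D times on the
    # joint-distribution Wasserstein loss of equation (4).
    for _ in range(n_D):
        adam_FD.zero_grad()
        (-critic_gap()).backward()                 # the critic maximizes the gap
        adam_FD.step()

    # Step 2: train G and E with L_EG of equation (5); the SSIM guide term
    # T_x of equation (13) would be added to this loss.
    adam_EG.zero_grad()
    critic_gap().backward()                        # E and G minimize the gap
    adam_EG.step()

    # Step 3: train Dz n_Dz times; a critic-style gap stands in here for the
    # MMD loss of equation (8) between z and E(G(z)).
    for _ in range(n_Dz):
        _, z = sample_batch()
        z_rec = E(G(z)).detach()
        loss_dz = -(Dz(z).mean() - Dz(z_rec).mean())
        adam_Dz.zero_grad(); loss_dz.backward(); adam_Dz.step()

    # Step 4: train E with the guide term T_z of equation (10); plain l2
    # reconstruction stands in for the MMD part, and G is frozen here.
    _, z = sample_batch()
    t_z = (z - E(G(z).detach())).pow(2).sum(1).mean()
    adam_EG.zero_grad(); t_z.backward(); adam_EG.step()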
The settings of the parameters used during training and testing on the CelebA-HQ and MNIST data sets throughout this process are shown in Table 1.
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatuses, and their various modules provided by the present invention purely as computer readable program code, the method steps can be logically programmed so that the systems, apparatuses, and their various modules are realized in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Therefore, the systems, apparatuses, and modules provided by the present invention may be regarded as hardware components, and the modules included therein for implementing various programs may also be regarded as structures within the hardware components; modules for performing various functions may likewise be regarded both as software programs for implementing the methods and as structures within the hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. An unsupervised bidirectional generation automatic coding method for convergence guarantee is characterized by comprising the following steps:
step S1: the batch data (x, z) simultaneously generates an encoding result E (x) and generated data G (z) through an encoder E and a generator G, and the mapping from the hidden variable space to the data space and the reverse mapping from the data space to the hidden variable space are completed;
step S2: passing the image space data and the hidden variable space data through convolution blocks F_x and F_z, respectively, so as to extract information and obtain extracted image space data and extracted hidden variable space data;
step S3: training a discriminator D on the extracted image space data and the extracted hidden variable space data until the loss function is minimized;
step S4: the image data is encoded by the trained encoder E to generate an encoding result, the encoding result is input into the trained generator G to obtain a reconstructed image data result, the reconstruction of the image data is completed, the overall collaborative optimization of the image space and hidden variable space bidirectional mapping process is realized, and the representation capability and the image generation capability are improved.
2. The convergence-guarantee-oriented unsupervised bidirectional generation automatic encoding method of claim 1, wherein the generator G and the encoder E use the deconvolution network module and the convolution network module of the DCGAN structure, respectively.
3. The convergence-guarantee-oriented unsupervised bidirectional generation automatic encoding method of claim 1, wherein the loss function in the step S3 comprises a Wasserstein distance:

$$W = \mathbb{E}_{x \sim P_x}\big[D\big(F_x(x), F_z(E(x))\big)\big] - \mathbb{E}_{z \sim P_z}\big[D\big(F_x(G(z)), F_z(z)\big)\big]$$

wherein W represents the Wasserstein distance between the joint distributions; D represents the likelihood probability of the input data pair; F_x and F_z represent the convolution blocks.
4. The convergence-guarantee-oriented unsupervised bidirectional generation automatic coding method according to claim 1, further comprising: introducing an embedded MMD discriminator module Dz that reuses the convolution blocks Fx and Fz, so that the encoder E and the generator G realize an embedded GAN network; the embedded GAN network is utilized to reduce the MMD distance between the distributions of z and the reconstruction result E(G(z)), thereby strengthening the convergence of the whole bidirectional structure and the semantically related characterization capability of the encoder.
5. The convergence-guarantee-oriented unsupervised bidirectional generation automatic coding method of claim 4, wherein the reducing of the MMD distance between the distributions of z and the reconstruction result E(G(z)) by using the embedded GAN network, so as to strengthen the convergence of the whole bidirectional structure and the semantically related characterization capability of the encoder, comprises:
step S5: the encoding result E (x) is reconstructed by a generator G to generate data G (E (x)); generating data g (z) and reconstructing an encoding result E (g (z)) by an encoder E;
step S6: calculating, by using the embedded MMD discriminator module Dz, the MMD difference loss $\mathcal{L}_{D_z}^{MMD}$ between the distributions of z and E(G(z)), and optimizing the embedded MMD discriminator module Dz with this loss function:

$$\mathrm{MMD}_{k_{rep}}^2(P_z, P_{\hat z}) = \mathbb{E}_{z,z' \sim P_z}\big[k_{rep}(y_z, y_{z'})\big] + \mathbb{E}_{\hat z,\hat z' \sim P_{\hat z}}\big[k_{rep}(y_{\hat z}, y_{\hat z'})\big] - 2\,\mathbb{E}\big[k_{rep}(y_z, y_{\hat z})\big]$$

$$\mathcal{L}_{D_z}^{MMD} = -\mathrm{MMD}_{k_{rep}}^2(P_z, P_{\hat z}) + \lambda_1\,\mathbb{E}_{\tilde z}\Big[\big(\|\nabla_{\tilde z} D_z(\tilde z)\|_2 - 1\big)^2\Big],\qquad \tilde z = \varepsilon z + (1-\varepsilon)\hat z,\qquad k_{rep}(a,b) = e^{-\frac{\|a-b\|_2^2}{2\sigma^2}}$$

wherein $P_z$ represents the hidden variable space distribution; $y_z = D_z(z)$, $y_{z'} = D_z(z')$; z, z' represent samples on the hidden variable space; $\hat z = E(G(z))$ represents the reconstruction result of the hidden variable z, i.e., the re-encoding of the generated data; $P_{\hat z}$ represents the conditional probability distribution of $\hat z$; $\lambda_1$ represents the gradient penalty term weight parameter; $\nabla_{\tilde z} D_z(\tilde z)$ represents the gradient acquired on the interpolated sample $\tilde z$; σ denotes the variance coefficient of the Gaussian kernel $k_{rep}$; e is the base constant of the natural logarithm; ε represents the weight parameter used in the weighted sampling between z and its reconstruction result $\hat z$;
step S7: solving, by using the optimized embedded MMD discriminator module Dz, the MMD loss $\mathcal{L}_{EG}^{MMD}$ between the distributions of z and E(G(z)) with respect to the encoder E and the generator G:

$$\mathcal{L}_{EG}^{MMD} = \mathbb{E}_{z,z' \sim P_z}\big[k_F(y_z, y_{z'})\big] + \mathbb{E}_{\hat z,\hat z' \sim P_{\hat z}}\big[k_F(y_{\hat z}, y_{\hat z'})\big] - 2\,\mathbb{E}\big[k_F(y_z, y_{\hat z})\big]$$

$$k_F(a,b) = \exp\!\left(-\frac{\min\{\max\{\|a-b\|_2^2,\; b_l\},\; b_u\}}{2\sigma^2}\right)$$

wherein $k_F$ represents a Gaussian kernel function; $b_l$, $b_u$ represent the lower-bound and upper-bound parameters applied to the (squared) L2 distance between a and b, respectively;
step S8: generating a guide item $T_z$ on the coding space by using the loss function $\mathcal{L}_{EG}^{MMD}$, thereby completing the training of the embedded GAN network:

$$T_z = \mathcal{L}_{EG}^{MMD} + \frac{\propto_1}{n_b}\sum_{i=1}^{n_b}\big\|z_i - E(G(z_i))\big\|_2^2$$

wherein $\propto_1$ represents the weighting coefficient between $\mathcal{L}_{EG}^{MMD}$ and the pixel-level $l_2$ loss of the reconstruction operation; $n_b$ represents the data batch size; $\|z - E(G(z))\|_2^2$ is the $l_2$ loss between the original hidden variable z and the reconstruction result E(G(z));
step S9: the convergence of the current Bi-GAE bidirectional structure is optimized through the trained embedded GAN network, and the semantic representation capability of the encoder E is further improved through the generated guide items.
6. The convergence-guarantee-oriented unsupervised bidirectional generation automatic coding method of claim 5, further comprising: calculating a guide item $T_x$ by using an SSIM module according to the real image x and the reconstructed image G(E(x)), so as to enhance the human visual feature generation capability of the generator;
SSIM concerns luminance (measured by l(x,x')), and contrast and structure (measured jointly by cs(x,x')):

$$\mathrm{MS\text{-}SSIM}(x,x') = l_m(x,x')^{\gamma_m} \prod_{j=1}^{m} cs_j(x,x')^{\eta_j}$$

$$L_{MS\text{-}SSIM}(x,x') = 1 - \mathrm{MS\text{-}SSIM}(x,x') \quad (8)$$

$$T_x = \alpha_2\, L_{MS\text{-}SSIM}(x,x') + (1-\alpha_2)\, l_1(x,x')$$

$$l(x,x') = \frac{2\mu_x\mu_{x'} + c_1}{\mu_x^2 + \mu_{x'}^2 + c_1},\qquad cs(x,x') = \frac{2\sigma_{xx'} + c_2}{\sigma_x^2 + \sigma_{x'}^2 + c_2}$$

wherein x' = G(E(x)); $L_{MS\text{-}SSIM}(x,x')$ denotes the SSIM loss function; $\alpha_2$ denotes the weighting coefficient between the SSIM loss and the $l_1$ loss; $l_1(x,x')$ denotes the $l_1$ loss value between x and the reconstruction result; $l_m(x,x')^{\gamma_m}$ represents the result of l(x,x') after passing through m Gaussian filters; γ represents an attenuation parameter; m represents the number of Gaussian filters; $c_1$, $c_2$ represent constant parameters; $cs_j(x,x')^{\eta_j}$ represents the result generated by the j-th filter; $\eta_j$ represents the attenuation coefficient of the corresponding window; if x is of size (s_i × s_i), the j-th filter window size follows from the dyadic downsampling of s_i between scales; $p_j$ represents the filter window; $\mu_x$, $\mu_{x'}$ respectively represent the means of x and x'; $\sigma_x$, $\sigma_{x'}$ respectively represent the standard deviations of x and x', and $\sigma_{xx'}$ their covariance.
7. An unsupervised bi-directional generation automatic coding system for convergence guarantee, comprising:
module M1: the batch data (x, z) simultaneously generates an encoding result E (x) and generated data G (z) through an encoder E and a generator G, and the mapping from the hidden variable space to the data space and the reverse mapping from the data space to the hidden variable space are completed;
module M2: the image space data and the hidden variable space data are respectively passed through convolution blocks F_x and F_z, which extract information to obtain extracted image space data and extracted hidden variable space data;
module M3: a discriminator D is trained on the extracted image space data and the extracted hidden variable space data until the loss function is minimized;
module M4: the image data is encoded by the trained encoder E to generate an encoding result, the encoding result is input into the trained generator G to obtain a reconstructed image data result, the reconstruction of the image data is completed, the overall collaborative optimization of the image space and hidden variable space bidirectional mapping process is realized, and the representation capability and the image generation capability are improved.
8. The convergence-guarantee-oriented unsupervised bidirectional generation automatic coding system of claim 7, wherein the loss function in the module M3 comprises a Wasserstein distance:

$$W = \mathbb{E}_{x \sim P_x}\big[D\big(F_x(x), F_z(E(x))\big)\big] - \mathbb{E}_{z \sim P_z}\big[D\big(F_x(G(z)), F_z(z)\big)\big]$$

wherein W represents the Wasserstein distance between the joint distributions; D represents the likelihood probability of the input data pair; F_x and F_z represent the convolution blocks.
9. The convergence-guarantee-oriented unsupervised bidirectional generation automatic encoding system of claim 7, further comprising: an embedded MMD discriminator module Dz that reuses the convolution blocks Fx and Fz, so that the encoder E and the generator G realize an embedded GAN network, and the embedded GAN network is utilized to reduce the MMD distance between the distributions of z and the reconstruction result E(G(z)), thereby strengthening the convergence of the whole bidirectional structure and the semantically related characterization capability of the encoder;

wherein the reducing of the MMD distance between the distributions of z and the reconstruction result E(G(z)) by using the embedded GAN network comprises:
module M5: the encoding result E (x) is reconstructed by a generator G to generate data G (E (x)); generating data g (z) and reconstructing an encoding result E (g (z)) by an encoder E;
module M6: calculating, by using the embedded MMD discriminator module Dz, the MMD difference loss $\mathcal{L}_{D_z}^{MMD}$ between the distributions of z and E(G(z)), and optimizing the embedded MMD discriminator module Dz with this loss function:

$$\mathrm{MMD}_{k_{rep}}^2(P_z, P_{\hat z}) = \mathbb{E}_{z,z' \sim P_z}\big[k_{rep}(y_z, y_{z'})\big] + \mathbb{E}_{\hat z,\hat z' \sim P_{\hat z}}\big[k_{rep}(y_{\hat z}, y_{\hat z'})\big] - 2\,\mathbb{E}\big[k_{rep}(y_z, y_{\hat z})\big]$$

$$\mathcal{L}_{D_z}^{MMD} = -\mathrm{MMD}_{k_{rep}}^2(P_z, P_{\hat z}) + \lambda_1\,\mathbb{E}_{\tilde z}\Big[\big(\|\nabla_{\tilde z} D_z(\tilde z)\|_2 - 1\big)^2\Big],\qquad \tilde z = \varepsilon z + (1-\varepsilon)\hat z,\qquad k_{rep}(a,b) = e^{-\frac{\|a-b\|_2^2}{2\sigma^2}}$$

wherein $P_z$ represents the hidden variable space distribution; $y_z = D_z(z)$, $y_{z'} = D_z(z')$; z, z' represent samples on the hidden variable space; $\hat z = E(G(z))$ represents the reconstruction result of the hidden variable z, i.e., the re-encoding of the generated data; $P_{\hat z}$ represents the conditional probability distribution of $\hat z$; $\lambda_1$ represents the gradient penalty term weight parameter; $\nabla_{\tilde z} D_z(\tilde z)$ represents the gradient acquired on the interpolated sample $\tilde z$; σ denotes the variance coefficient of the Gaussian kernel $k_{rep}$; e is the base constant of the natural logarithm; ε represents the weight parameter used in the weighted sampling between z and its reconstruction result $\hat z$;
module M7: solving, by using the optimized embedded MMD discriminator module Dz, the MMD loss $\mathcal{L}_{EG}^{MMD}$ between the distributions of z and E(G(z)) with respect to the encoder E and the generator G:

$$\mathcal{L}_{EG}^{MMD} = \mathbb{E}_{z,z' \sim P_z}\big[k_F(y_z, y_{z'})\big] + \mathbb{E}_{\hat z,\hat z' \sim P_{\hat z}}\big[k_F(y_{\hat z}, y_{\hat z'})\big] - 2\,\mathbb{E}\big[k_F(y_z, y_{\hat z})\big]$$

$$k_F(a,b) = \exp\!\left(-\frac{\min\{\max\{\|a-b\|_2^2,\; b_l\},\; b_u\}}{2\sigma^2}\right)$$

wherein $k_F$ represents a Gaussian kernel function; $b_l$, $b_u$ represent the lower-bound and upper-bound parameters applied to the (squared) L2 distance between a and b, respectively;
module M8: generating a guide item $T_z$ on the coding space by using the loss function $\mathcal{L}_{EG}^{MMD}$, thereby completing the training of the embedded GAN network:

$$T_z = \mathcal{L}_{EG}^{MMD} + \frac{\propto_1}{n_b}\sum_{i=1}^{n_b}\big\|z_i - E(G(z_i))\big\|_2^2$$

wherein $\propto_1$ represents the weighting coefficient between $\mathcal{L}_{EG}^{MMD}$ and the pixel-level $l_2$ loss of the reconstruction operation; $n_b$ represents the data batch size; $\|z - E(G(z))\|_2^2$ is the $l_2$ loss between the original hidden variable z and the reconstruction result E(G(z));
module M9: the convergence of the current Bi-GAE bidirectional structure is optimized through the trained embedded GAN network, and the semantic representation capability of the encoder E is further improved through the generated guide items.
10. The convergence-guarantee-oriented unsupervised bidirectional generation automatic encoding system of claim 9, further comprising: calculating a guide item $T_x$ by using an SSIM module according to the real image x and the reconstructed image G(E(x)), so as to enhance the human visual feature generation capability of the generator;
SSIM concerns luminance (measured by l(x,x')), and contrast and structure (measured jointly by cs(x,x')):

$$\mathrm{MS\text{-}SSIM}(x,x') = l_m(x,x')^{\gamma_m} \prod_{j=1}^{m} cs_j(x,x')^{\eta_j}$$

$$L_{MS\text{-}SSIM}(x,x') = 1 - \mathrm{MS\text{-}SSIM}(x,x') \quad (19)$$

$$T_x = \alpha_2\, L_{MS\text{-}SSIM}(x,x') + (1-\alpha_2)\, l_1(x,x')$$

$$l(x,x') = \frac{2\mu_x\mu_{x'} + c_1}{\mu_x^2 + \mu_{x'}^2 + c_1},\qquad cs(x,x') = \frac{2\sigma_{xx'} + c_2}{\sigma_x^2 + \sigma_{x'}^2 + c_2}$$

wherein x' = G(E(x)); $L_{MS\text{-}SSIM}(x,x')$ denotes the SSIM loss function; $\alpha_2$ denotes the weighting coefficient between the SSIM loss and the $l_1$ loss; $l_1(x,x')$ denotes the $l_1$ loss value between x and the reconstruction result; $l_m(x,x')^{\gamma_m}$ represents the result of l(x,x') after passing through m Gaussian filters; γ represents an attenuation parameter; m represents the number of Gaussian filters; $c_1$, $c_2$ represent constant parameters; $cs_j(x,x')^{\eta_j}$ represents the result generated by the j-th filter; $\eta_j$ represents the attenuation coefficient of the corresponding window; if x is of size (s_i × s_i), the j-th filter window size follows from the dyadic downsampling of s_i between scales; $p_j$ represents the filter window; $\mu_x$, $\mu_{x'}$ respectively represent the means of x and x'; $\sigma_x$, $\sigma_{x'}$ respectively represent the standard deviations of x and x', and $\sigma_{xx'}$ their covariance.
CN202110678193.6A 2021-06-18 2021-06-18 Automatic encoding method and system for unsupervised bidirectional generation oriented to convergence guarantee Active CN113298895B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110678193.6A CN113298895B (en) 2021-06-18 2021-06-18 Automatic encoding method and system for unsupervised bidirectional generation oriented to convergence guarantee

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110678193.6A CN113298895B (en) 2021-06-18 2021-06-18 Automatic encoding method and system for unsupervised bidirectional generation oriented to convergence guarantee

Publications (2)

Publication Number Publication Date
CN113298895A true CN113298895A (en) 2021-08-24
CN113298895B CN113298895B (en) 2023-05-12

Family

ID=77328729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110678193.6A Active CN113298895B (en) 2021-06-18 2021-06-18 Automatic encoding method and system for unsupervised bidirectional generation oriented to convergence guarantee

Country Status (1)

Country Link
CN (1) CN113298895B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114330514A (en) * 2021-12-14 2022-04-12 深圳大学 Data reconstruction method and system based on depth features and gradient information
CN115242250A (en) * 2022-09-21 2022-10-25 成都工业学院 Encoding and decoding method for single-full mapping of multi-value chain data element allocation

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109146988A (en) * 2018-06-27 2019-01-04 南京邮电大学 Non-fully projection CT image rebuilding method based on VAEGAN
CN109523463A (en) * 2018-11-20 2019-03-26 中山大学 A kind of face aging method generating confrontation network based on condition
CN110751698A (en) * 2019-09-27 2020-02-04 太原理工大学 Text-to-image generation method based on hybrid network model
CN110866958A (en) * 2019-10-28 2020-03-06 清华大学深圳国际研究生院 Method for text to image
US10652565B1 (en) * 2017-10-12 2020-05-12 Amazon Technologies, Inc. Image compression and decompression using embeddings
CN111340791A (en) * 2020-03-02 2020-06-26 浙江浙能技术研究院有限公司 Photovoltaic module unsupervised defect detection method based on GAN improved algorithm
CN112070209A (en) * 2020-08-13 2020-12-11 河北大学 Stable controllable image generation model training method based on W distance
CN112424779A (en) * 2018-07-13 2021-02-26 映佳控制公司 Method and system for generating synthetic anonymous data for given task

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10652565B1 (en) * 2017-10-12 2020-05-12 Amazon Technologies, Inc. Image compression and decompression using embeddings
CN109146988A (en) * 2018-06-27 2019-01-04 南京邮电大学 Non-fully projection CT image rebuilding method based on VAEGAN
CN112424779A (en) * 2018-07-13 2021-02-26 映佳控制公司 Method and system for generating synthetic anonymous data for given task
CN109523463A (en) * 2018-11-20 2019-03-26 中山大学 A kind of face aging method generating confrontation network based on condition
CN110751698A (en) * 2019-09-27 2020-02-04 太原理工大学 Text-to-image generation method based on hybrid network model
CN110866958A (en) * 2019-10-28 2020-03-06 清华大学深圳国际研究生院 Method for text to image
CN111340791A (en) * 2020-03-02 2020-06-26 浙江浙能技术研究院有限公司 Photovoltaic module unsupervised defect detection method based on GAN improved algorithm
CN112070209A (en) * 2020-08-13 2020-12-11 河北大学 Stable controllable image generation model training method based on W distance

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JEFF DONAHUE 等: "ADVERSARIAL FEATURE LEARNING", 《ARXIV:1605.09782V7》 *
MARTIN ARJOVSKY 等: "Wasserstein GAN", 《ARXIV:1701.07875V3》 *
SHENG MAO 等: "Discriminative Autoencoding Framework for Simple and Efficient Anomaly Detection", 《DIGITAL OBJECT IDENTIFIER》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114330514A (en) * 2021-12-14 2022-04-12 深圳大学 Data reconstruction method and system based on depth features and gradient information
CN114330514B (en) * 2021-12-14 2024-04-05 深圳大学 Data reconstruction method and system based on depth features and gradient information
CN115242250A (en) * 2022-09-21 2022-10-25 成都工业学院 Encoding and decoding method for single-full mapping of multi-value chain data element allocation

Also Published As

Publication number Publication date
CN113298895B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
CN109754403A (en) Tumour automatic division method and system in a kind of CT image
Marimont et al. Anomaly detection through latent space restoration using vector quantized variational autoencoders
Li et al. Scconv: spatial and channel reconstruction convolution for feature redundancy
CN115409733B (en) Low-dose CT image noise reduction method based on image enhancement and diffusion model
CN111932444A (en) Face attribute editing method based on generation countermeasure network and information processing terminal
CN113298895A (en) Convergence guarantee-oriented unsupervised bidirectional generation automatic coding method and system
CN112541864A (en) Image restoration method based on multi-scale generation type confrontation network model
CN113822437A (en) Deep layered variational automatic encoder
CN112233012A (en) Face generation system and method
CN117058307A (en) Method, system, equipment and storage medium for generating heart three-dimensional nuclear magnetic resonance image
CN114332287A (en) Method, device, equipment and medium for reconstructing PET (positron emission tomography) image based on transformer feature sharing
CN113538608A (en) Controllable character image generation method based on generation countermeasure network
CN111626296A (en) Medical image segmentation system, method and terminal based on deep neural network
AU2022288157A1 (en) Method for producing an image of expected results of medical cosmetic treatments on a human anatomical feature from an image of the anatomical feature prior to these medical cosmetic treatments
Chen et al. Self-supervised neuron segmentation with multi-agent reinforcement learning
Yang et al. Low‐dose CT denoising with a high‐level feature refinement and dynamic convolution network
Andersson et al. Evaluation of data augmentation of MR images for deep learning
Poonkodi et al. 3d-medtrancsgan: 3d medical image transformation using csgan
Zwettler et al. Strategies for training deep learning models in medical domains with small reference datasets
Tang et al. A deep map transfer learning method for face recognition in an unrestricted smart city environment
CN112541566B (en) Image translation method based on reconstruction loss
Jeon et al. Continuous face aging generative adversarial networks
CN115482557A (en) Human body image generation method, system, device and storage medium
Ren et al. Medical image super-resolution based on semantic perception transfer learning
Ni et al. Natural Image Reconstruction from fMRI Based on Self-supervised Representation Learning and Latent Diffusion Model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant