CN113298895A - Convergence guarantee-oriented unsupervised bidirectional generation automatic coding method and system - Google Patents
- Publication number
- CN113298895A (Application CN202110678193.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- result
- encoder
- image
- space
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/002—Image coding using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention provides a convergence-guarantee-oriented unsupervised bidirectional generative auto-encoding method and system, comprising the following steps: batch data (x, z) are passed simultaneously through an encoder E and a generator G to produce the encoding result E(x) and the generated data G(z); the image-space data and the latent-variable-space data are passed through convolution blocks Fx and Fz respectively, which extract their information to obtain extracted image-space features and extracted latent-variable-space features; a discriminator D is trained on the extracted image-space and latent-variable-space features until the loss function is minimized; image data are encoded by the trained encoder E to produce an encoding result, which is input into the trained generator G to obtain the reconstructed image, completing the reconstruction of the image data. The method realizes joint collaborative optimization of the bidirectional mapping between the image space and the latent-variable space, improving both the characterization capability and the image-generation capability.
Description
Technical Field
The invention relates to the technical field of encoders, in particular to a convergence-guarantee-oriented unsupervised bidirectional generative auto-encoding method and system, and more particularly to an unsupervised bidirectional generative auto-encoder with convergence guarantees.
Background
An autoencoder (AE) is a learning algorithm that encodes data efficiently to reduce its dimensionality. In recent years, autoencoders have been widely used in fields such as image classification and reconstruction, recommendation systems, and anomaly detection.
Currently, research on autoencoders focuses on improving the ability to simultaneously generate and characterize images. This means that the autoencoder should learn the bidirectional mapping between the generator/decoder and the encoder. In particular, the generator/decoder focuses on the mapping from the latent-variable space to the data space, while the encoder aims to extract semantically related feature representations in the inverse mapping from the data space to the latent-variable space. Generative adversarial networks (GANs), as state-of-the-art generative models, have powerful mapping capabilities, especially in terms of generalization. Building an autoencoder on a GAN is therefore a promising approach.
Some previous work has applied GANs or adversarial models inside an autoencoder, for example AAE, ALAE and BiGAN. AAE generalizes the GAN framework when training the encoder, pushing the distribution of the encoding results toward a Gaussian distribution. ALAE trains the autoencoder within the StyleGAN framework by reconstructing the image from the style-coding results of the real image.
However, most of these efforts have two limitations. First, they do not achieve a good trade-off between the mapping and the inverse mapping. For example, AAE and ALAE typically treat training as a one-way optimization, ignoring the trade-off between the generator and the encoder. Second, convergence cannot be guaranteed in some bidirectional networks. For example, BiGAN realizes the mapping and inverse mapping by discriminating the joint distribution of the latent-variable space and the data space, but its convergence is poor. In addition, the characterization capability is not optimized in BiGAN.
Patent document CN111402179A (application number: 202010169306.5) discloses an image synthesis method and system that combines an adversarial autoencoder and a generative adversarial network. The method constructs an enhanced adversarial autoencoder comprising two different sets of encoders, two different sets of first discriminators, and a set of decoders; constructs an improved conditional generative adversarial network comprising a generator and a second discriminator; takes manually segmented blood-vessel-tree images and original fundus retina images as training data, and iteratively trains the combined enhanced adversarial autoencoder and the improved conditional generative adversarial network to obtain an optimal blood-vessel-tree image generator and an optimal fundus retina image generator; and synthesizes fundus retina images from the manually segmented blood-vessel-tree image to be processed, based on the two optimal generators, to obtain the synthesized image.
The invention provides Bi-GAE, an unsupervised generative autoencoder based on BiGAN. First, the invention designs two schemes to trade off the mapping and the inverse mapping. In particular, the invention introduces a guiding term into the mapping, based on the SSIM loss function, that drives the model to generate images that follow human visual patterns. In addition, the invention uses an embedded GAN to compute another guiding term, enhancing the semantically related characterization capability in the inverse mapping. The cooperation of the two schemes enhances the bidirectional information expansion between the latent-variable space and the data space, improving the overall performance of Bi-GAE. Second, the invention uses the Wasserstein distance to guarantee efficient gradient computation, while the embedded GAN exploits MMD to strengthen the convergence of Bi-GAE as the discriminator approaches convergence.
Disclosure of Invention
In view of the defects in the prior art, the object of the invention is to provide a convergence-guarantee-oriented unsupervised bidirectional generative auto-encoding method and system.
The invention provides a convergence-guarantee-oriented unsupervised bidirectional generative auto-encoding method, comprising the following steps:
step S1: batch data (x, z) are passed simultaneously through the encoder E and the generator G to produce the encoding result E(x) and the generated data G(z), completing the mapping from the latent-variable space to the data space and the inverse mapping from the data space to the latent-variable space;
step S2: the image-space data (the real image x and the generated data G(z)) and the latent-variable-space data (the latent variable z and the encoding result E(x)) are passed through convolution blocks Fx and Fz respectively, which extract their information to obtain the extracted image-space features and the extracted latent-variable-space features;
step S3: a discriminator D is trained on the extracted image-space features and the extracted latent-variable-space features until the loss function is minimized;
step S4: the image data is encoded by the trained encoder E to produce an encoding result, which is input into the trained generator G to obtain the reconstructed image data, completing the reconstruction of the image data; this realizes joint collaborative optimization of the bidirectional mapping between the image space and the latent-variable space, improving both the characterization capability and the image-generation capability.
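The data flow of steps S1 and S4 can be sketched as follows. This is a hypothetical toy illustration, not the patented implementation: simple linear maps stand in for the DCGAN-based encoder E and generator G, and all dimensions and names (`W_E`, `W_G`, `pair_from_data`, `pair_from_latent`) are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_z = 16, 4                 # toy data-space / latent-space dimensions

# Linear stand-ins for the networks (hypothetical; the patent uses DCGAN blocks).
W_E = rng.normal(size=(d_z, d_x)) * 0.1   # encoder E: data -> latent
W_G = rng.normal(size=(d_x, d_z)) * 0.1   # generator G: latent -> data

def E(x):                                 # inverse mapping (data -> latent)
    return x @ W_E.T

def G(z):                                 # mapping (latent -> data)
    return z @ W_G.T

# Step S1: a batch (x, z) produces E(x) and G(z) simultaneously,
# yielding one joint sample per direction of the bidirectional mapping.
x = rng.normal(size=(8, d_x))
z = rng.normal(size=(8, d_z))
pair_from_data = (x, E(x))       # joint sample from the encoder direction
pair_from_latent = (G(z), z)     # joint sample from the generator direction

# Step S4: reconstruction x -> E(x) -> G(E(x)).
x_rec = G(E(x))
```

In the full method, the two joint samples are what the discriminator D (step S3) must tell apart.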
Preferably, the encoder E and the generator G use a convolution network module and a deconvolution network module, respectively, from the DCGAN structure.
Preferably, the loss function in step S3 is based on the Wasserstein distance:

$W\big(P_{(x,E(x))}, P_{(G(z),z)}\big) = \sup_{\|D\|_L \le 1} \mathbb{E}_{x \sim P_x}\big[D(F_x(x), F_z(E(x)))\big] - \mathbb{E}_{z \sim P_z}\big[D(F_x(G(z)), F_z(z))\big]$

wherein $W$ represents the Wasserstein distance between the two joint distributions; $D$ represents the likelihood score of the input data pair; $F_x$ and $F_z$ represent the convolution blocks.
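As a minimal sketch of the critic objective above (a one-sided WGAN-style estimate, not the patented implementation): random tensors stand in for the outputs of $F_x$ and $F_z$, and a toy linear critic stands in for D; all names and dimensions here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_fx, d_fz = 8, 16, 4

# Hypothetical feature tensors, as they would come out of F_x and F_z.
fx_real = rng.normal(size=(n, d_fx)); fz_real = rng.normal(size=(n, d_fz))
fx_fake = rng.normal(size=(n, d_fx)) + 2.0; fz_fake = rng.normal(size=(n, d_fz))

w = rng.normal(size=(d_fx + d_fz,)) * 0.1   # toy linear critic D

def D(pair):                                # scores a concatenated (F_x, F_z) pair
    return pair @ w

def critic_objective(fx_r, fz_r, fx_f, fz_f):
    # WGAN-style estimate: mean score on (x, E(x)) features
    # minus mean score on (G(z), z) features.
    real = D(np.concatenate([fx_r, fz_r], axis=1))
    fake = D(np.concatenate([fx_f, fz_f], axis=1))
    return real.mean() - fake.mean()

loss = critic_objective(fx_real, fz_real, fx_fake, fz_fake)
```

In a real training loop this scalar would be maximized over D (under a Lipschitz constraint) and minimized over E and G.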
Preferably, the method further comprises: introducing an embedded MMD discriminator module Dz, which reuses the convolution blocks Fx and Fz together with the encoder E and the generator G to realize an embedded GAN network; the embedded GAN network reduces the MMD distance between the distributions of z and the reconstruction result E(G(z)), strengthening the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder.

Preferably, reducing the MMD distance between the distributions of z and E(G(z)) with the embedded GAN network, to strengthen the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder, comprises:
step S5: the encoding result E(x) is passed through the generator G to produce the reconstruction G(E(x)); the generated data G(z) is passed through the encoder E to produce the re-encoding E(G(z));
step S6: the embedded MMD discriminator module $D_z$ is used to compute the MMD discrepancy loss $L_{D_z}^{MMD}$ between the distributions of $z$ and $E(G(z))$, and $D_z$ is optimized with the loss function

$L_{D_z} = \mathbb{E}\big[k(y_z, y_{z'})\big] + \mathbb{E}\big[k(y_{rep}, y_{rep'})\big] - 2\,\mathbb{E}\big[k(y_z, y_{rep})\big] - \lambda_1\,\mathbb{E}_{\hat z}\big[(\|\nabla_{\hat z} D_z(\hat z)\|_2 - 1)^2\big]$

wherein $P_z$ represents the latent-variable-space distribution; $y_z = D_z(z)$ and $y_{z'} = D_z(z')$, with $z, z'$ samples from the latent-variable space; $y_{rep} = D_z(E(G(z)))$ represents the output of $D_z$ on the re-encoding of the data generated from the latent variable $z$; $k(a,b) = e^{-\|a-b\|_2^2/(2\sigma^2)}$ is a Gaussian kernel, where $\sigma$ denotes the variance coefficient of the Gaussian distribution the kernel conforms to and $e$ is the base of the natural logarithm; $\lambda_1$ represents the gradient-penalty weight, the gradient being taken at $\hat z = \varepsilon z + (1-\varepsilon)\,E(G(z))$, where $\varepsilon$ represents the weight used when sampling between $z$ and its reconstruction $E(G(z))$;
step S7: the optimized embedded MMD discriminator module $D_z$ is used to compute the MMD loss between the distributions of $z$ and $E(G(z))$ with respect to the encoder E and the generator G:

$L_{G,E}^{MMD} = \mathbb{E}\big[k_F(y_z, y_{z'})\big] + \mathbb{E}\big[k_F(y_{rep}, y_{rep'})\big] - 2\,\mathbb{E}\big[k_F(y_z, y_{rep})\big]$

wherein $k_F$ represents a Gaussian kernel function whose argument, the $L_2$ distance between $a$ and $b$, is bounded by the lower and upper bound parameters $b_l$ and $b_u$;
step S8: the loss function $L_{G,E}^{MMD}$ is used to generate the guiding term $T_z$ on the coding space, completing the training of the embedded GAN network:

$T_z = L_{G,E}^{MMD} + \alpha_1 \cdot \frac{1}{n_b}\sum_{i=1}^{n_b}\big\|z_i - E(G(z_i))\big\|_2$

wherein $\alpha_1$ represents the weighting coefficient of the pixel-level $l_2$ loss between the original latent variable $z$ and its reconstruction $E(G(z))$; $n_b$ represents the data batch size;
step S9: the convergence of the overall Bi-GAE bidirectional structure is optimized through the trained embedded GAN network, and the semantic characterization capability of the encoder E is further improved through the generated guiding term.
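The core quantity in steps S6–S8, the squared MMD between samples of $z$ and samples of $E(G(z))$, can be sketched with a plain Gaussian kernel (omitting the gradient penalty and the kernel bounds $b_l, b_u$; function names `gaussian_kernel` and `mmd2` are mine, not from the patent):

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # k(a, b) = exp(-||a - b||^2 / (2 sigma^2)), evaluated pairwise
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(z, z_rep, sigma=1.0):
    # Squared MMD between samples z ~ P_z and re-encodings z_rep ~ E(G(z))
    return (gaussian_kernel(z, z, sigma).mean()
            + gaussian_kernel(z_rep, z_rep, sigma).mean()
            - 2.0 * gaussian_kernel(z, z_rep, sigma).mean())

rng = np.random.default_rng(0)
z = rng.normal(size=(64, 4))
m_same = mmd2(z, z)          # identical samples: discrepancy is zero
m_shift = mmd2(z, z + 3.0)   # a shifted distribution: discrepancy is positive
```

Minimizing this quantity over E and G is what pulls the distribution of $E(G(z))$ toward $P_z$.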
Preferably, the method further comprises: using an SSIM module to compute, from the real image $x$ and the reconstructed image $G(E(x))$, a guiding term $T_x$ that enhances the generator's ability to produce images with human-perceived visual features.

SSIM measures luminance $l(x, x')$, together with contrast and structure, which are combined in $cs(x, x')$:

$l(x,x') = \dfrac{2\mu_x\mu_{x'} + c_1}{\mu_x^2 + \mu_{x'}^2 + c_1}, \qquad cs(x,x') = \dfrac{2\sigma_{xx'} + c_2}{\sigma_x^2 + \sigma_{x'}^2 + c_2}$

$MSSSIM(x,x') = l_M(x,x')^{\gamma_M}\,\prod_{j=1}^{M} cs_j(x,x')^{\eta_j}$

$L_{MSSSIM}(x,x') = 1 - MSSSIM(x,x')$ (8)

$T_x = \alpha_2\,L_{MSSSIM}(x,x') + (1-\alpha_2)\,l_1(x,x')$

wherein $x' = G(E(x))$; $L_{MSSSIM}(x,x')$ denotes the SSIM loss function; $\alpha_2$ indicates the weighting factor between the SSIM loss and the $l_1$ loss; $l_1(x,x')$ denotes the $l_1$ loss between $x$ and the reconstruction; $\gamma$ represents an attenuation parameter; $M$ represents the number of Gaussian filters; $c_1, c_2$ represent constant parameters; $cs_j$ represents the result produced by the $j$-th filter; $\eta_j$ represents the attenuation coefficient of the corresponding window; if $x$ is of size $(s_i \times s_i)$, the $j$-th filter window $p_j$ is scaled accordingly; $\mu_x, \mu_{x'}$ represent the means of $x$ and $x'$; $\sigma_x, \sigma_{x'}$ represent the standard deviations of $x$ and $x'$; $\sigma_{xx'}$ represents their covariance.
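A minimal single-window sketch of the luminance and contrast-structure terms above (not the patent's multi-scale, Gaussian-filtered version; the constants $c_1 = 0.01^2$, $c_2 = 0.03^2$ follow the common SSIM convention for images in $[0, 1]$, and the name `ssim_global` is mine):

```python
import numpy as np

def ssim_global(x, xp, c1=0.01 ** 2, c2=0.03 ** 2):
    # Single-window SSIM: luminance l(x, x') times contrast-structure cs(x, x')
    mu_x, mu_p = x.mean(), xp.mean()
    var_x, var_p = x.var(), xp.var()
    cov = ((x - mu_x) * (xp - mu_p)).mean()
    l = (2 * mu_x * mu_p + c1) / (mu_x ** 2 + mu_p ** 2 + c1)   # luminance
    cs = (2 * cov + c2) / (var_x + var_p + c2)                  # contrast-structure
    return l * cs

rng = np.random.default_rng(0)
img = rng.random((32, 32))
s_same = ssim_global(img, img)                 # identical images -> SSIM of 1
s_diff = ssim_global(img, rng.random((32, 32)))  # unrelated images -> low SSIM
loss = 1.0 - s_same                            # the 1 - SSIM loss of Eq. (8)
```

The multi-scale variant applies the $cs$ term at several Gaussian-filtered resolutions before combining.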
The invention also provides a convergence-guarantee-oriented unsupervised bidirectional generative auto-encoding system, comprising:
module M1: batch data (x, z) are passed simultaneously through the encoder E and the generator G to produce the encoding result E(x) and the generated data G(z), completing the mapping from the latent-variable space to the data space and the inverse mapping from the data space to the latent-variable space;
module M2: the image-space data (the real image x and the generated data G(z)) and the latent-variable-space data (the latent variable z and the encoding result E(x)) are passed through convolution blocks Fx and Fz respectively, which extract their information to obtain the extracted image-space features and the extracted latent-variable-space features;
module M3: a discriminator D is trained on the extracted image-space features and the extracted latent-variable-space features until the loss function is minimized;
module M4: the image data is encoded by the trained encoder E to produce an encoding result, which is input into the trained generator G to obtain the reconstructed image data, completing the reconstruction of the image data; this realizes joint collaborative optimization of the bidirectional mapping between the image space and the latent-variable space, improving both the characterization capability and the image-generation capability.
Preferably, the loss function in the module M3 is based on the Wasserstein distance:

$W\big(P_{(x,E(x))}, P_{(G(z),z)}\big) = \sup_{\|D\|_L \le 1} \mathbb{E}_{x \sim P_x}\big[D(F_x(x), F_z(E(x)))\big] - \mathbb{E}_{z \sim P_z}\big[D(F_x(G(z)), F_z(z))\big]$

wherein $W$ represents the Wasserstein distance between the two joint distributions; $D$ represents the likelihood score of the input data pair; $F_x$ and $F_z$ represent the convolution blocks.
Preferably, the system further comprises: an embedded MMD discriminator module Dz, which reuses the convolution blocks Fx and Fz together with the encoder E and the generator G to realize an embedded GAN network; the embedded GAN network reduces the MMD distance between the distributions of z and the reconstruction result E(G(z)), strengthening the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder;

reducing the MMD distance between the distributions of z and E(G(z)) with the embedded GAN network, to strengthen the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder, comprises:
module M5: the encoding result E(x) is passed through the generator G to produce the reconstruction G(E(x)); the generated data G(z) is passed through the encoder E to produce the re-encoding E(G(z));
module M6: the embedded MMD discriminator module $D_z$ is used to compute the MMD discrepancy loss $L_{D_z}^{MMD}$ between the distributions of $z$ and $E(G(z))$, and $D_z$ is optimized with the loss function

$L_{D_z} = \mathbb{E}\big[k(y_z, y_{z'})\big] + \mathbb{E}\big[k(y_{rep}, y_{rep'})\big] - 2\,\mathbb{E}\big[k(y_z, y_{rep})\big] - \lambda_1\,\mathbb{E}_{\hat z}\big[(\|\nabla_{\hat z} D_z(\hat z)\|_2 - 1)^2\big]$

wherein $P_z$ represents the latent-variable-space distribution; $y_z = D_z(z)$ and $y_{z'} = D_z(z')$, with $z, z'$ samples from the latent-variable space; $y_{rep} = D_z(E(G(z)))$ represents the output of $D_z$ on the re-encoding of the data generated from the latent variable $z$; $k(a,b) = e^{-\|a-b\|_2^2/(2\sigma^2)}$ is a Gaussian kernel, where $\sigma$ denotes the variance coefficient of the Gaussian distribution the kernel conforms to and $e$ is the base of the natural logarithm; $\lambda_1$ represents the gradient-penalty weight, the gradient being taken at $\hat z = \varepsilon z + (1-\varepsilon)\,E(G(z))$, where $\varepsilon$ represents the weight used when sampling between $z$ and its reconstruction $E(G(z))$;
module M7: the optimized embedded MMD discriminator module $D_z$ is used to compute the MMD loss between the distributions of $z$ and $E(G(z))$ with respect to the encoder E and the generator G:

$L_{G,E}^{MMD} = \mathbb{E}\big[k_F(y_z, y_{z'})\big] + \mathbb{E}\big[k_F(y_{rep}, y_{rep'})\big] - 2\,\mathbb{E}\big[k_F(y_z, y_{rep})\big]$

wherein $k_F$ represents a Gaussian kernel function whose argument, the $L_2$ distance between $a$ and $b$, is bounded by the lower and upper bound parameters $b_l$ and $b_u$;
module M8: the loss function $L_{G,E}^{MMD}$ is used to generate the guiding term $T_z$ on the coding space, completing the training of the embedded GAN network:

$T_z = L_{G,E}^{MMD} + \alpha_1 \cdot \frac{1}{n_b}\sum_{i=1}^{n_b}\big\|z_i - E(G(z_i))\big\|_2$

wherein $\alpha_1$ represents the weighting coefficient of the pixel-level $l_2$ loss between the original latent variable $z$ and its reconstruction $E(G(z))$; $n_b$ represents the data batch size;
module M9: the convergence of the overall Bi-GAE bidirectional structure is optimized through the trained embedded GAN network, and the semantic characterization capability of the encoder E is further improved through the generated guiding term.
Preferably, the system further comprises: an SSIM module that computes, from the real image $x$ and the reconstructed image $G(E(x))$, a guiding term $T_x$ that enhances the generator's ability to produce images with human-perceived visual features.

SSIM measures luminance $l(x, x')$, together with contrast and structure, which are combined in $cs(x, x')$:

$l(x,x') = \dfrac{2\mu_x\mu_{x'} + c_1}{\mu_x^2 + \mu_{x'}^2 + c_1}, \qquad cs(x,x') = \dfrac{2\sigma_{xx'} + c_2}{\sigma_x^2 + \sigma_{x'}^2 + c_2}$

$MSSSIM(x,x') = l_M(x,x')^{\gamma_M}\,\prod_{j=1}^{M} cs_j(x,x')^{\eta_j}$

$L_{MSSSIM}(x,x') = 1 - MSSSIM(x,x')$ (19)

$T_x = \alpha_2\,L_{MSSSIM}(x,x') + (1-\alpha_2)\,l_1(x,x')$

wherein $x' = G(E(x))$; $L_{MSSSIM}(x,x')$ denotes the SSIM loss function; $\alpha_2$ indicates the weighting factor between the SSIM loss and the $l_1$ loss; $l_1(x,x')$ denotes the $l_1$ loss between $x$ and the reconstruction; $\gamma$ represents an attenuation parameter; $M$ represents the number of Gaussian filters; $c_1, c_2$ represent constant parameters; $cs_j$ represents the result produced by the $j$-th filter; $\eta_j$ represents the attenuation coefficient of the corresponding window; if $x$ is of size $(s_i \times s_i)$, the $j$-th filter window $p_j$ is scaled accordingly; $\mu_x, \mu_{x'}$ represent the means of $x$ and $x'$; $\sigma_x, \sigma_{x'}$ represent the standard deviations of $x$ and $x'$; $\sigma_{xx'}$ represents their covariance.
Compared with the prior art, the invention has the following beneficial effects:
1. the convergence guarantee-oriented unsupervised bidirectional generation automatic encoder can improve the information expansion between a hidden variable space and a data space through stable convergence;
2. the invention introduces guiding terms into the loss function to optimize image reconstruction and generation in the mapping according to the human visual pattern;
3. the invention embeds a GAN to compute a guiding term that enhances the characterization of semantically related features in the inverse mapping and strengthens the convergence of the auto-encoder.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a logical framework diagram of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but do not limit the invention in any way. It should be noted that various changes and modifications, obvious to those skilled in the art, can be made without departing from the spirit of the invention; all of these fall within the scope of the present invention.
The invention discloses an unsupervised bidirectional generative autoencoder with convergence guarantees. Improving the generation and characterization capabilities of autoencoders is a major research problem in machine learning. However, optimizing the bidirectional mapping while stabilizing convergence presents significant challenges, and most existing autoencoders fail to automatically trade off the bidirectional mapping between the encoder and the decoder/generator. The invention proposes Bi-GAE, an unsupervised bidirectional generative autoencoder based on BiGAN. First, we introduce two guiding terms into the loss function: one enhances information expansion so that the mapping follows the human visual model, and the other improves the semantically related characterization capability in the inverse mapping. In addition, we embed a GAN to improve the convergence and characterization capability of Bi-GAE. Experimental results show that Bi-GAE is competitive in both generation and characterization and converges stably. Compared with similar methods, the characterization capability of Bi-GAE improves the classification accuracy of high-resolution images by about 6.607%. Furthermore, in image reconstruction, Bi-GAE increases the Structural Similarity (SSIM) index by 0.017 and decreases the Fréchet Inception Distance (FID) by 3.098.
Example 1
The invention provides a convergence-guarantee-oriented unsupervised bidirectional generative auto-encoding method, comprising the following steps:
step S1: batch data (x, z) are passed simultaneously through the encoder E and the generator G to produce the encoding result E(x) and the generated data G(z), completing the mapping from the latent-variable space to the data space and the inverse mapping from the data space to the latent-variable space;
step S2: the image-space data (the real image x and the generated data G(z)) and the latent-variable-space data (the latent variable z and the encoding result E(x)) are passed through convolution blocks Fx and Fz respectively, which extract their information to obtain the extracted image-space features and the extracted latent-variable-space features;
step S3: a discriminator D is trained on the extracted image-space features and the extracted latent-variable-space features until the loss function is minimized;
step S4: the image data is encoded by the trained encoder E to produce an encoding result, which is input into the trained generator G to obtain the reconstructed image data, completing the reconstruction of the image data; this realizes joint collaborative optimization of the bidirectional mapping between the image space and the latent-variable space, improving both the characterization capability and the image-generation capability.
Specifically, the encoder E and the generator G use a convolution network module and a deconvolution network module, respectively, from the DCGAN structure.
Specifically, the loss function in step S3 is based on the Wasserstein distance:

$W\big(P_{(x,E(x))}, P_{(G(z),z)}\big) = \sup_{\|D\|_L \le 1} \mathbb{E}_{x \sim P_x}\big[D(F_x(x), F_z(E(x)))\big] - \mathbb{E}_{z \sim P_z}\big[D(F_x(G(z)), F_z(z))\big]$

wherein $W$ represents the Wasserstein distance between the two joint distributions; $D$ represents the likelihood score of the input data pair; $F_x$ and $F_z$ represent the convolution blocks.
Specifically, the method further comprises: introducing an embedded MMD discriminator module Dz, which reuses the convolution blocks Fx and Fz together with the encoder E and the generator G to realize an embedded GAN network; the embedded GAN network reduces the MMD distance between the distributions of z and the reconstruction result E(G(z)), strengthening the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder.

Specifically, reducing the MMD distance between the distributions of z and E(G(z)) with the embedded GAN network, to strengthen the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder, comprises:
step S5: the encoding result E(x) is passed through the generator G to produce the reconstruction G(E(x)); the generated data G(z) is passed through the encoder E to produce the re-encoding E(G(z));
step S6: the embedded MMD discriminator module $D_z$ is used to compute the MMD discrepancy loss $L_{D_z}^{MMD}$ between the distributions of $z$ and $E(G(z))$, and $D_z$ is optimized with the loss function

$L_{D_z} = \mathbb{E}\big[k(y_z, y_{z'})\big] + \mathbb{E}\big[k(y_{rep}, y_{rep'})\big] - 2\,\mathbb{E}\big[k(y_z, y_{rep})\big] - \lambda_1\,\mathbb{E}_{\hat z}\big[(\|\nabla_{\hat z} D_z(\hat z)\|_2 - 1)^2\big]$

wherein $P_z$ represents the latent-variable-space distribution; $y_z = D_z(z)$ and $y_{z'} = D_z(z')$, with $z, z'$ samples from the latent-variable space; $y_{rep} = D_z(E(G(z)))$ represents the output of $D_z$ on the re-encoding of the data generated from the latent variable $z$; $k(a,b) = e^{-\|a-b\|_2^2/(2\sigma^2)}$ is a Gaussian kernel, where $\sigma$ denotes the variance coefficient of the Gaussian distribution the kernel conforms to and $e$ is the base of the natural logarithm; $\lambda_1$ represents the gradient-penalty weight, the gradient being taken at $\hat z = \varepsilon z + (1-\varepsilon)\,E(G(z))$, where $\varepsilon$ represents the weight used when sampling between $z$ and its reconstruction $E(G(z))$;
step S7: the optimized embedded MMD discriminator module $D_z$ is used to compute the MMD loss between the distributions of $z$ and $E(G(z))$ with respect to the encoder E and the generator G:

$L_{G,E}^{MMD} = \mathbb{E}\big[k_F(y_z, y_{z'})\big] + \mathbb{E}\big[k_F(y_{rep}, y_{rep'})\big] - 2\,\mathbb{E}\big[k_F(y_z, y_{rep})\big]$

wherein $k_F$ represents a Gaussian kernel function whose argument, the $L_2$ distance between $a$ and $b$, is bounded by the lower and upper bound parameters $b_l$ and $b_u$;
step S8: the loss function $L_{G,E}^{MMD}$ is used to generate the guiding term $T_z$ on the coding space, completing the training of the embedded GAN network:

$T_z = L_{G,E}^{MMD} + \alpha_1 \cdot \frac{1}{n_b}\sum_{i=1}^{n_b}\big\|z_i - E(G(z_i))\big\|_2$

wherein $\alpha_1$ represents the weighting coefficient of the pixel-level $l_2$ loss between the original latent variable $z$ and its reconstruction $E(G(z))$; $n_b$ represents the data batch size;
step S9: the convergence of the overall Bi-GAE bidirectional structure is optimized through the trained embedded GAN network, and the semantic characterization capability of the encoder E is further improved through the generated guiding term.
Specifically, the method further comprises: using an SSIM module to compute, from the real image $x$ and the reconstructed image $G(E(x))$, a guiding term $T_x$ that enhances the generator's ability to produce images with human-perceived visual features.

SSIM measures luminance $l(x, x')$, together with contrast and structure, which are combined in $cs(x, x')$:

$l(x,x') = \dfrac{2\mu_x\mu_{x'} + c_1}{\mu_x^2 + \mu_{x'}^2 + c_1}, \qquad cs(x,x') = \dfrac{2\sigma_{xx'} + c_2}{\sigma_x^2 + \sigma_{x'}^2 + c_2}$

$MSSSIM(x,x') = l_M(x,x')^{\gamma_M}\,\prod_{j=1}^{M} cs_j(x,x')^{\eta_j}$

$L_{MSSSIM}(x,x') = 1 - MSSSIM(x,x')$ (8)

$T_x = \alpha_2\,L_{MSSSIM}(x,x') + (1-\alpha_2)\,l_1(x,x')$

wherein $x' = G(E(x))$; $L_{MSSSIM}(x,x')$ denotes the SSIM loss function; $\alpha_2$ indicates the weighting factor between the SSIM loss and the $l_1$ loss; $l_1(x,x')$ denotes the $l_1$ loss between $x$ and the reconstruction; $\gamma$ represents an attenuation parameter; $M$ represents the number of Gaussian filters; $c_1, c_2$ represent constant parameters; $cs_j$ represents the result produced by the $j$-th filter; $\eta_j$ represents the attenuation coefficient of the corresponding window; if $x$ is of size $(s_i \times s_i)$, the $j$-th filter window $p_j$ is scaled accordingly; $\mu_x, \mu_{x'}$ represent the means of $x$ and $x'$; $\sigma_x, \sigma_{x'}$ represent the standard deviations of $x$ and $x'$; $\sigma_{xx'}$ represents their covariance.
The convergence guarantee-oriented unsupervised bidirectional generation automatic coding system can be realized through the step flow in the convergence guarantee-oriented unsupervised bidirectional generation automatic coding method. The convergence guarantee oriented unsupervised bidirectional generation automatic coding method can be understood as a preferred example of the convergence guarantee oriented unsupervised bidirectional generation automatic coding system by those skilled in the art.
Example 2
Example 2 is a preferred example of example 1
In order to overcome the defects of the existing automatic encoder in the aspects of bidirectional mapping balance and convergence, the invention provides the unsupervised generation automatic encoder based on the BiGAN, and the generation and characterization capabilities of the automatic encoder are effectively improved.
The invention provides an unsupervised generative autoencoder that simultaneously achieves the balancing and joint optimization of the mapping and the inverse mapping. Addressing the limitations of BiGAN, the invention makes two main optimizations in Bi-GAE. First, it introduces the Wasserstein distance and embeds a GAN to strengthen the convergence of Bi-GAE. To estimate the Wasserstein distance of the joint distribution, two convolution blocks are designed for feature extraction in Bi-GAE, and the convergence of Bi-GAE is proven theoretically. Second, two guiding terms are introduced for the generator and the encoder respectively, which enable information expansion in the mapping and the inverse mapping; these expansions effectively achieve the trade-off between the bidirectional mappings.
Fig. 1 shows a framework of an unsupervised bi-directional generation auto-encoder for convergence guarantee according to the present invention. Similar to BiGAN, the body part of Bi-GAE comprises a generator G, an encoder E and a discriminator D. As a bidirectional mapping encoder, the bidirectional optimization targets of the architecture are respectively: (1) mapping of hidden variables to data space (generator G); (2) inverse mapping of data space to hidden variable space (coding space). Compared with the common encoder and the generative encoder, the basic improvement idea of the architecture technical scheme comprises the following steps: (1) the joint distribution of the data space and the hidden variable space is utilized to realize the simultaneous optimization of the bidirectional mapping of the image space and the hidden variable space and further realize the overall cooperative optimization of the bidirectional process; (2) guiding mechanisms are respectively introduced in the two-way mapping training process, the convergence of the two-way architecture is guaranteed, and the visual property of the generator and the semantic correlation representation of the encoder are respectively optimized. The above idea is based on the basic architecture of BiGAN in the specific implementation of the technical solution, introduces Wasserstein distance as the judgment index of discriminator D, and designs 4 specific embedded modules to implement two guiding mechanisms in the bidirectional mapping process.
As shown in FIG. 1, the main part of Bi-GAE is based on the BiGAN structure, and the encoder E and the generator G of Bi-GAE respectively use a convolution network module and a deconvolution network module from the DCGAN structure in the specific implementation. In Bi-GAE, the Wasserstein distance is selectively introduced into the main structure as the index of joint-distribution distance in the discriminator D (specifically realized by a convolution block from DCGAN); correspondingly, in order to realize simultaneous feature extraction from the data space and the hidden variable space, two feature-extraction convolution blocks Fx and Fz are introduced, and the features of the data space and the hidden variable space are extracted as vectors and input into D. For the two guiding mechanisms described above, the SSIM module is utilized to calculate the structural difference between the real data x and the corresponding reconstructed data G(E(x)); for the coding space, an MMD-based adversarial network module is further embedded, which strengthens the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder by reducing the MMD distance between the distributions of z and the reconstruction result E(G(z)). The bidirectional training process, the Wasserstein distance, the two guiding mechanisms and the four embedded modules are further elaborated below.
In each batch training process, the batch data input by the Bi-GAE is (x, z), that is, the data space batch instance x input into the encoder E and the hidden variable space normal distribution batch sample z input into the generator G. The specific training process is as follows:
(1) using (x, z), the encoder E and the generator G simultaneously generate an encoding result E (x) and generate data G (z). The step is to complete the mapping from the hidden variable space to the data space and the reverse mapping from the data space to the hidden variable space at the same time.
(2) With the image space data and the latent variable space data obtained, and referring to the BiGAN bidirectional training concept, the difference between the two joint distributions is judged by the discriminator based on the basis data (x, E(x)) and the generated data (G(z), z). The core idea is to train the discriminator D so that its judged distance between the basis distribution and the generated distribution is as large as possible; for the synchronously trained encoder E and generator G, the Wasserstein distance between the distributions output by the optimized D is used for training, with the goal that D cannot distinguish the two distributions, i.e., that the judged distance between the generated data and the basis data is as small as possible. This follows the core principle of BiGAN; mathematically, the goal is that the two joint probabilities are equal, at which point the bidirectional optimization is considered successfully completed. The joint distribution of {x, E(x)} can be written as P_{x,E} = P_E(z|x)·P(x), and the joint distribution of {G(z), z} as P_{G,z} = P(z)·P_G(x|z), where P(x), P(z) are the true distributions of the data space and the hidden space, and P_E, P_G are the conditional distributions of the encoder and the generator. Therefore, under the premise of identical joint distributions, the matching of P_E with P(z) and of P(x) with P_G can be respectively realized, thereby completing the synchronous optimization of the encoder and the generator. To ensure convergence, the Wasserstein distance is introduced to replace the KL divergence in BiGAN.
(3) Based on the main flow, the x reconstruction result G(E(x)) is obtained from the encoding result E(x), and the z reconstruction result E(G(z)) is obtained from the generation result G(z). According to (E(G(z)), z), the MMD embedded network guides the encoder E toward strengthened semantic representation; according to (G(E(x)), x), the SSIM module simultaneously guides the generator G toward satisfying the visual characteristics of the human eye.
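The three steps above can be sketched end to end with toy linear stand-ins for E and G (all shapes and the linear maps here are illustrative assumptions; the actual modules in the patent are DCGAN convolution/deconvolution networks):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy stand-ins: in Bi-GAE these are DCGAN conv/deconv networks.
W_e = rng.normal(size=(64, 16)) * 0.1   # encoder E: data space (64-d) -> latent (16-d)
W_g = rng.normal(size=(16, 64)) * 0.1   # generator G: latent (16-d) -> data space (64-d)

def E(x):  # inverse mapping: data -> hidden variable
    return x @ W_e

def G(z):  # mapping: hidden variable -> data
    return z @ W_g

# Batch input (x, z): data-space instances and standard-normal latent samples.
x = rng.normal(size=(8, 64))
z = rng.normal(size=(8, 16))

Ex, Gz = E(x), G(z)                           # step (1): both directions at once
pair_real = np.concatenate([x, Ex], axis=1)   # (x, E(x)) joint sample for D
pair_fake = np.concatenate([Gz, z], axis=1)   # (G(z), z) joint sample for D

x_rec = G(E(x))                               # step (3): reconstructions feeding the
z_rec = E(G(z))                               # SSIM and MMD guide terms
print(pair_real.shape, x_rec.shape, z_rec.shape)
```

The two concatenated pairs are what the discriminator compares; the two reconstructions are what the guide terms score.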
In order to solve the problem that BiGAN is difficult to converge, the Wasserstein distance needs to be introduced into the BiGAN structure as a loss function; to do so, features must first be obtained from the image space and the hidden variable space, and two convolution modules Fx and Fz are therefore designed for this purpose.
In order to make the generation capability of the generator G meet human visual characteristics, the Structural Similarity Index Measure (SSIM) loss is used to construct the guide term Tx as part of the generator's loss function. The overall solution flow for Tx is: an SSIM module is introduced between G(E(x)) and x to calculate a similarity loss value between the reconstructed image space and the original image space;
in order to solve the problem that estimation of Wasserstein is invalid when a discriminator is close to convergence, an embedded GAN model based on MMD is nested on the basis of the existing structure. The embedded model effectively multiplexes the encoder E and generator G, and accordingly, an embedded GAN discriminator should be introduced.
Introducing Wasserstein distance
A traditional BiGAN network is optimized with respect to the joint data space; the distribution of the image data space X is P(X), the distribution of the hidden variable data space Z is P(Z), and the training target of the discriminator D of the traditional BiGAN network is to maximize the probability P_D(Y|X, Z), where Y = 1 for the pair (x, E(x)) and Y = 0 for the pair (G(z), z), that is, to maximize judgment accuracy; the opposite holds for the generator G and the encoder E. Therefore, the training target of BiGAN as a bidirectional structure is as follows:
wherein D () is the similarity probability of the input data pair, and G (), E () is the output of the generator and encoder.
As shown in equation (1), the traditional BiGAN uses the Jensen-Shannon (JS) divergence or the Kullback-Leibler (KL) divergence to estimate the judgment distance between the basis data space (x, E(x)) and the generated data space (G(z), z). However, these divergences have an obvious problem: the gradient of the loss function associated with equation (1) is ineffective when D tends to converge, leading to failure of training the BiGAN structure. Therefore, an effective way to optimize the convergence of the BiGAN bidirectional structure is to introduce a distance measure such that the gradient of the corresponding loss function is effective and nonzero at all times. Accordingly, the Wasserstein distance is introduced into Bi-GAE. For the basis data distribution P_x and the generated data distribution P_g, in practical applications the Wasserstein distance W between the two distributions is estimated as follows:
wherein g_θ(·) and D(·) are the outputs of the generator G and the discriminator D respectively, and E(·) represents expectation. However, equation (2) requires that D(·) satisfy the 1-Lipschitz constraint, and the Wasserstein distance still has the problem that W cannot be estimated correctly when D is close to convergence, so these problems are solved by further design.
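For intuition about the metric itself (not the critic-based estimate of equation (2), which is what Bi-GAE actually trains), the Wasserstein-1 distance between two one-dimensional empirical samples can be computed exactly by sorting, a standard property of W1:

```python
import numpy as np

def wasserstein1_1d(a, b):
    """Empirical 1-D Wasserstein-1 distance between equal-size samples.

    In one dimension W1 reduces to the mean absolute difference of the
    sorted samples (quantile coupling). Illustration only; the patent
    estimates W via a trained critic as in equation (2).
    """
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    assert a.shape == b.shape
    return float(np.mean(np.abs(a - b)))

# Identical samples are at distance 0; shifting one sample by c shifts W1 by c.
s = np.linspace(-1.0, 1.0, 101)
print(wasserstein1_1d(s, s))        # 0.0
print(wasserstein1_1d(s, s + 0.5))  # 0.5
```

Unlike JS/KL divergence, this distance varies smoothly as one distribution is translated, which is exactly the gradient behavior the patent wants from the discriminator.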
The Wasserstein distance is introduced into the bidirectional adversarial network, so two convolution blocks Fx and Fz are correspondingly designed for information extraction from the image space data and the hidden variable space data respectively, helping the two pieces of information take a data form (shape) that can be aggregated and concatenated. With the extracted features so obtained, the final calculation of the Wasserstein distance between the basis data space and the generated data space in Bi-GAE is as follows:
in summary, the penalty function L of the discriminator-convolution block (D-Fx-Fz) is designed accordinglyDFJoint loss function L of sum generator-encoder (G-E)EGAs shown in equations (4) and (5), respectively:
wherein n_b is the batch size; to ensure that the 1-Lipschitz constraint is met, a Gradient Penalty Term (GP term) is introduced. In the penalty term, there are:
wherein T_x is the guide term for the generation capability; ε represents the weight parameter used in the weighted sampling between the real data sample x and the generated data sample, and likewise the weight parameter used in the weighted sampling between the true hidden-space data sample z and the encoding result sample; λ represents the weight coefficient of the gradient penalty term; ∇ represents the gradient taken over the interpolated variable; σ represents the weight parameter of T_x.
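A minimal numeric sketch of the structure of L_DF (score gap plus gradient penalty) follows, using a hypothetical linear critic over pre-extracted features so the input gradient is available in closed form; a real D-Fx-Fz stack would need automatic differentiation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear critic over concatenated features [Fx(x), Fz(z)].
# For a linear critic D(v) = v @ w, the gradient w.r.t. the input is w itself,
# so the WGAN-GP-style penalty can be written without autograd.
dim, n_b, lam = 12, 16, 10.0
w = rng.normal(size=dim) * 0.3

real = rng.normal(size=(n_b, dim))          # features of (x, E(x)) pairs
fake = rng.normal(size=(n_b, dim)) + 1.0    # features of (G(z), z) pairs

def critic_loss(real, fake, w, lam):
    # Wasserstein term: the critic maximizes the score gap, so the loss negates it.
    wasserstein_gap = np.mean(fake @ w) - np.mean(real @ w)
    # Per-sample interpolates between real and fake (weight eps per sample);
    # unused by the linear critic's gradient, kept to show where GP is evaluated.
    eps = rng.uniform(size=(n_b, 1))
    interp = eps * real + (1.0 - eps) * fake
    grad_norm = np.linalg.norm(w)           # grad of D at interp is constant = w
    gp = lam * (grad_norm - 1.0) ** 2       # penalize deviation from norm 1
    return wasserstein_gap + gp

print(critic_loss(real, fake, w, lam))
```

The gradient penalty drives the critic toward the 1-Lipschitz constraint that equation (2) demands, instead of clipping weights.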
Embedding a MMD-based GAN network
Introducing the Wasserstein distance does not by itself solve the problem of invalid estimation of the Wasserstein distance when convergence is approached: when the parameters of the discriminator D tend to converge, the network may be unable to estimate the Wasserstein distance correctly, so Bi-GAE cannot be trained with a correct gradient. Meanwhile, in order to enhance the semantic representation capability of the encoder in the mapping from the data space to the hidden variable space, a guide term based on the encoding-reconstruction error is designed. Both of the above processes can be implemented using an MMD-based embedded GAN network.
First, the flow of the MMD embedded GAN network is briefly described. The invention introduces an embedded MMD discriminator module Dz, and multiplexes the convolution blocks Fx and Fz, the encoder E and the generator G to realize the embedded GAN network. For the input hidden variable z and the reconstruction result E(G(z)) of the completed main process, the embedded GAN network of the present invention has the following implementation process:
1) First, the embedded Dz module is used to solve the MMD difference loss function between the distributions of z and E(G(z)); this loss is used to optimize the Dz module, as shown in equation (8).
2) The optimized Dz module is then used to solve the MMD loss function of the distributions between z and E(G(z)) with respect to E and G, as shown in equation (9); this loss function is used to generate the guide term Tz on the coding space, as shown in equation (10), thereby completing the training of the embedded GAN network.
The embedded GAN network further optimizes the convergence of the current Bi-GAE bidirectional structure, and the generated guide item can further improve the semantic representation capability of the encoder E. The theoretical derivation and guidance mechanism of MMD-based embedded GAN networks is designed as follows.
Optimizing convergence
Aiming at the estimation-failure problem of the Wasserstein distance when D is close to convergence, a solution is found in the calculation method of the Wasserstein distance itself: the Wasserstein distance is in essence a special case of the Maximum Mean Discrepancy (MMD), namely the case of a linear kernel, so when MMD with a higher-order Gaussian kernel is used as the index measuring the difference of distributions, convergence can be further improved. Combining the definition of MMD with the characteristics of the joint distributions to be solved in Bi-GAE, MMD is defined on the joint distribution space of image data and hidden variables as follows, with a threshold e and a set of continuous functions {f}:
wherein E_f represents the expected difference obtained when using f to measure the sample difference between the two distributions, which can be understood as reflecting the difference between the distributions as the parameters of D tend to converge. Through two lemmas, Lemma 1 and Lemma 2, an effective quantitative estimation of the sequential convergence of the introduced embedded GAN structure is carried out.
Lemma 1: by analogy with equation (6), batch samples are used with the measurement function f_0 and expected values to take the difference on the hidden variable space, and with the function g_0 on the batch samples x_i for the image space; σ is the lower bound obtainable for the hidden-variable-space term and ε is the lower bound obtainable for the image-space term, and by the definition of MMD, σ > 0 and ε > 0.
Proof:
The proof of Lemma 1 means that when D converges on the hidden variable space and the image data space respectively, Bi-GAE achieves overall convergence on the joint distribution space.
Proof (of Lemma 2): suppose E(g(f(a))) - E(g(f(b))) > τ. Let k = g·f; then E(k(a)) - E(k(b)) > τ = M(a, b), which contradicts the definition of MMD, and therefore the claim holds.
Lemma 2 means that MMD is stable for continuous functions f (e.g., G and E trained in Bi-GAE). Since GAN embedded in Bi-GAE is MMD based, we can deduce the upper bound of M (E (g (z)), z) from lemmas 1 and 2.
By lemma 1 and lemma 2, we can derive theorem 1 as follows:
Theorem 1: when D is close to convergence and the embedded GAN structure is not introduced (i.e., no Dz module is newly added), the upper bound of M(E(G(z)), z) tends to 2e.
Proof:
M(E(G(z)), z) ≤ M(E(G(z)), E(x)) + M(E(x), z)
≤ M(x, G(z)) + M(E(x), z)    // Lemma 2
≤ ε + σ ≤ 2·max({ε, σ}) ≤ 2·e    // Lemma 1
After the embedded GAN is introduced, i.e., after the new Dz module is added, according to Lemma 2 and equation (6), when Dz converges the MMD should satisfy:
M′(E(G(z)),z)≤M′(x,G(z))+M′(E(x),z)≤e (7)
Equation (7) means that the upper bound of M'(E(G(z)), z) can now be reduced to e. This reduction essentially promotes the convergence of Bi-GAE by promoting the information-interaction capability between the encoding and true hidden variable spaces (E(x), z) and between the true and generated image spaces (x, G(z)), respectively.
1) Enhancing coding semantic representation capability
The goal of Bi-GAE in implementing the mapping from image space to latent variable space is to enhance the semantically related characterization capability when disentanglement is considered. According to equation (7), the GAN embedded in Bi-GAE realizes information expansion in the encoding process by reducing M(E(x), z), thereby improving the characterization capability. The MMD-based loss of the embedded GAN is composed of the repulsion term L_rep and the attraction term L_att. In actual computation, a single bounded Gaussian kernel (kernel function k_F) is selected to reduce the amount of computation. In the concrete calculation, the reconstruction ẑ = E(G(z)) is used, and P_ẑ denotes its conditional probability distribution. In summary, the Dz loss function and the E-G (encoder-generator) loss function are shown in equations (8) and (9), respectively:
wherein y_z = D_z(z), y_z' = D_z(z'); P_z represents the hidden variable space distribution; z, z' represent samples over the hidden variable space; ẑ represents the reconstruction result of the hidden variable z, i.e., the re-encoding result of the generated data; P_ẑ represents its conditional probability distribution; λ_1 represents the gradient penalty term weight parameter; ∇ represents the gradient taken over the interpolated sample; σ denotes the variance coefficient of the Gaussian distribution conformed to by k_rep; e is the base constant of the natural logarithm (≈ 2.718281828459…); ε represents the weight parameter used in the weighted sampling between z and its reconstruction result.
wherein k_F represents a kernel function (e.g., a Gaussian kernel); b_l, b_u represent the lower-bound and upper-bound parameters of the L2 distance between a and b, respectively.
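The MMD quantity that equations (8) and (9) are built on can be illustrated with a generic biased estimator under a single Gaussian kernel (the bounded kernel k_F with parameters b_l, b_u and the repulsion/attraction split of the patent are not reproduced here):

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Pairwise k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd2(z, z_rec, sigma=1.0):
    """Biased squared-MMD estimate between latent samples z and reconstructions.

    Generic single-Gaussian-kernel sketch of the discrepancy the Dz module
    measures between z and E(G(z)); not the patent's exact bounded kernel.
    """
    kzz = gaussian_kernel(z, z, sigma).mean()
    krr = gaussian_kernel(z_rec, z_rec, sigma).mean()
    kzr = gaussian_kernel(z, z_rec, sigma).mean()
    return kzz + krr - 2.0 * kzr

rng = np.random.default_rng(2)
z = rng.normal(size=(200, 8))
close = z + 0.01 * rng.normal(size=z.shape)   # near-perfect reconstruction
far = rng.normal(size=(200, 8)) + 2.0         # badly shifted "reconstruction"
print(mmd2(z, close), mmd2(z, far))  # the shifted batch has a much larger MMD
```

Driving this quantity down is what pulls the distribution of E(G(z)) toward that of z, which is the convergence and characterization effect the embedded GAN targets.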
For the encoder-decoder, in order to prevent excessive element-level errors, the L2 loss is used as a regularization term; the guide term Tz for the encoder to enhance the semantic representation capability is finally defined as follows:
wherein α_1 represents the weighting coefficient of the pixel-level l2 loss of the reconstruction operation; n_b represents the data batch size; l2(z, ẑ) is the l2 loss between the original hidden variable z and the reconstruction result ẑ;
3) Enhancing the generator's human-visual-feature generation capability by introducing the SSIM module
One of the goals of Bi-GAE in generating image data is to facilitate image generation and reconstruction in accordance with the human visual model. In order to realize information expansion in the mapping from the hidden variable space to the image space, Bi-GAE introduces the Structural Similarity Index Measure (SSIM), which covers three human visual indicators, namely luminance, contrast and structure. Accordingly, Bi-GAE designs a guide term T_x between the real image x and the reconstructed image G(E(x)).
In training encoder E and generator G, x' = G(E(x)) is computed. If x is of size (s_i × s_i) and there are m Gaussian filters, MS-SSIM (multi-scale SSIM) can be calculated as follows:
wherein l_m^{γ_m} denotes the result of l(x, x') after m Gaussian filters, where γ is the attenuation parameter, m denotes the number of Gaussian filters, and c_1 is a constant parameter; cs_j^{η_j} represents the result generated by the j-th filter, where η_j is the attenuation coefficient of the corresponding window; if x is of size (s_i × s_i), the j-th filter window size is set accordingly. l(x, x') and cs(x, x') are defined as follows:
wherein p_j represents the filter window; μ_x, μ_x' respectively represent the means of x and x'; σ_x, σ_x' respectively represent the standard deviations of x and x'.
The corresponding design SSIM loss function is:
L_MSSSIM(x, x') = 1 - MSSSIM(x, x')    (12)
Analogous to T_z, in order to avoid excessive pixel-level errors during reconstruction, a regularization term needs to be added. Considering that the dimensionality of the image space is high, and to prevent the potential over-penalization problem of the L2 loss function, the L1 loss function is used here to implement the regularization term. In summary, the self-guiding term T_x for the generator is designed as follows:
wherein, by default, α_2 = 0.84; x' = G(E(x)); α_2 indicates the weighting coefficient between the SSIM loss and the l1 loss; l1(x, x') denotes the l1 loss value between x and the reconstruction result.
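A minimal sketch of the luminance and contrast-structure terms, computed globally over one window rather than per Gaussian filter window p_j (the constant c2 in the contrast-structure term is an assumption, since only c1 is named in the text):

```python
import numpy as np

def l_cs(x, xp, c1=0.01**2, c2=0.03**2):
    """Luminance term l(x, x') and contrast-structure term cs(x, x').

    Global single-window sketch of the per-window quantities the document
    defines; c1, c2 are the usual small stabilizing constants.
    """
    mu_x, mu_y = x.mean(), xp.mean()
    sig_x, sig_y = x.std(), xp.std()
    sig_xy = ((x - mu_x) * (xp - mu_y)).mean()
    l = (2 * mu_x * mu_y + c1) / (mu_x**2 + mu_y**2 + c1)
    cs = (2 * sig_xy + c2) / (sig_x**2 + sig_y**2 + c2)
    return l, cs

rng = np.random.default_rng(3)
x = rng.uniform(size=(8, 8))
l_same, cs_same = l_cs(x, x)   # identical images: both terms equal 1
l_noisy, cs_noisy = l_cs(x, np.clip(x + 0.2 * rng.normal(size=x.shape), 0, 1))
print(l_same, cs_same, l_noisy, cs_noisy)
```

Since both terms equal 1 only for a perfect reconstruction, the loss 1 - MSSSIM pushes G(E(x)) toward x along the luminance, contrast and structure axes rather than only pixel-wise.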
System implementation
Each component in the Bi-GAE is implemented based on the source code of DCGAN. Let θ_E, θ_G, θ_D, θ_Dz and θ_F = {θ_Fx, θ_Fz} denote the parameters of E, G, D, Dz and {Fx, Fz}, respectively. Accordingly, the invention uses three custom Adam optimizers with β1 = 0.5 and β2 = 0.9: Adam_EG for optimizing θ_G and θ_E, Adam_FD for optimizing θ_F and θ_D, and Adam_Dz for optimizing θ_Dz; their learning rates are lr_EG, lr_FD and lr_Dz.
The Bi-GAE run included 4 steps:
Step 1: D and {Fx, Fz} are trained using the data and latent samples (x, z); the loss function in equation (4), i.e., the Wasserstein loss of the joint distributions over the data space and the hidden variable space judged by discriminator D, is used. L_DF is trained iteratively, updating the discriminator D and the convolution feature-extraction modules {Fx, Fz}; this step is repeated a set number of times (default 5).
Step 2: training G and E with another batch of data (x, z), and using the loss function L in equation (5)EGTraining G and E, tracing from equation (5)It can be seen that the penalty is a weighted result of the distance of the joint distribution Wasserstein judged by the optimized arbiter and the SSIM difference penalty between data x and its reconstructed result. Given x, calculate x' ═ G (E)ng(x) And calculates Tx in equation (13) using (x, x') to train G.
Step 3: a batch of z is input to train Dz, and the loss function L_Dz in equation (8) is used for iterative training; the loss is the MMD loss between z and the z reconstruction result E(G(z)). This step is repeated a set number of times (default 3).
Step 4: a batch of z is input to calculate E(G(z)), and E is trained with the loss function T_z in equation (10).
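As a minimal bookkeeping sketch (no actual training), the per-iteration schedule of the four steps with the stated defaults can be laid out as follows; the optimizer names Adam_FD, Adam_EG and Adam_Dz follow the system-implementation section above:

```python
# Records which module group each optimizer step touches, reproducing the
# 4-step schedule (defaults: 5 D/F steps, 1 G+E step, 3 Dz steps, 1 E step).
def epoch_schedule(n_d=5, n_dz=3):
    steps = []
    steps += [("AdamFD", "D,Fx,Fz", "L_DF eq.(4)")] * n_d       # step 1
    steps += [("AdamEG", "G,E", "L_EG eq.(5) + T_x eq.(13)")]   # step 2
    steps += [("AdamDz", "Dz", "L_Dz eq.(8)")] * n_dz           # step 3
    steps += [("AdamEG", "E", "T_z eq.(10)")]                   # step 4
    return steps

sched = epoch_schedule()
print(len(sched))  # 10 optimizer steps per default iteration
```

Training the critic side (steps 1 and 3) more often than the generator side mirrors the usual WGAN practice of keeping the distance estimate accurate before each G/E update.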
The settings of parameters used in training and testing on the Celeba-hq and Mnist data sets in the overall process are shown in Table 1, and Table 1 shows the parameter settings in the present invention.
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.
Claims (10)
1. An unsupervised bidirectional generation automatic coding method for convergence guarantee is characterized by comprising the following steps:
step S1: the batch data (x, z) simultaneously generates an encoding result E (x) and generated data G (z) through an encoder E and a generator G, and the mapping from the hidden variable space to the data space and the reverse mapping from the data space to the hidden variable space are completed;
step S2: information is extracted from the image space data and the latent variable space data through convolution blocks Fx and Fz respectively, obtaining extracted image space data and extracted hidden variable space data;
step S3: a discriminator D is trained from the extracted image space data and the extracted hidden variable space data until the loss function is minimized;
step S4: the image data is encoded by the trained encoder E to generate an encoding result, the encoding result is input into the trained generator G to obtain a reconstructed image data result, the reconstruction of the image data is completed, the overall collaborative optimization of the image space and hidden variable space bidirectional mapping process is realized, and the representation capability and the image generation capability are improved.
2. The convergence guarantee oriented unsupervised bidirectional generation automatic encoding method of claim 1, wherein the generator G and the encoder E use a convolutional network module and a deconvolution network module, respectively, in a DCGAN structure.
3. The convergence assurance oriented unsupervised bi-directional generation automatic encoding method of claim 1, wherein the loss function in the step S3 comprises: a Wasserstein distance;
4. The convergence guarantee oriented unsupervised bidirectional generation automatic coding method according to claim 1, further comprising: an embedded MMD discriminator module Dz is introduced; the convolution blocks Fx and Fz, the encoder E and the generator G are multiplexed to realize an embedded GAN network; and the embedded GAN network is utilized to reduce the MMD distance between z and the distribution of the reconstruction result E(G(z)), thereby strengthening the convergence of the whole bidirectional structure and the semantically related characterization capability of the encoder.
5. The convergence guarantee oriented unsupervised bidirectional generation automatic coding method of claim 4, wherein the reducing MMD distance between z and E (G (z)) distribution of reconstruction results by using embedded GAN network to enhance convergence of the whole bidirectional structure and enhance semantically related characterization capability of the encoder comprises:
step S5: the encoding result E(x) is reconstructed by the generator G to obtain the data G(E(x)); the generated data G(z) is reconstructed by the encoder E to obtain the encoding result E(G(z));
step S6: the MMD difference loss function between the distributions of z and E(G(z)) is calculated using the embedded MMD discriminator module Dz, and this loss function is used to optimize the embedded MMD discriminator module Dz;
wherein P_z represents the hidden variable space distribution; y_z = D_z(z); y_z' = D_z(z'); z, z' represent samples on the hidden variable space; ẑ represents the reconstruction result of the hidden variable z, i.e., the re-encoding result of the generated data; P_ẑ represents its conditional probability distribution; λ_1 represents the gradient penalty term weight parameter; ∇ represents the gradient taken over the interpolated sample; σ denotes the variance coefficient of the Gaussian distribution conformed to by k_rep; e is the base constant of the natural logarithm; ε represents the weight parameter used in the weighted sampling between z and its reconstruction result;
step S7: the MMD loss function of the distributions between z and E(G(z)) with respect to the encoder E and the generator G is solved using the optimized embedded MMD discriminator module Dz;
wherein k_F represents a Gaussian kernel function; b_l, b_u represent the lower-bound and upper-bound parameters of the L2 distance between a and b, respectively;
step S8: the loss function is used to generate the guide term Tz on the coding space, thereby completing the training of the embedded GAN network;
wherein α_1 represents the weighting coefficient of the pixel-level l2 loss of the reconstruction operation; n_b represents the data batch size; l2(z, ẑ) is the l2 loss between the original hidden variable z and the reconstruction result ẑ;
step S9: the convergence of the current Bi-GAE bidirectional structure is optimized through the trained embedded GAN network, and the semantic representation capability of the encoder E is further improved through the generated guide items.
6. The convergence guarantee oriented unsupervised bidirectional generation automatic coding method according to claim 5, further comprising: calculating a guide term T_x using an SSIM module according to the real image x and the reconstructed image G(E(x)), enhancing the human visual feature generation capability of the generator;
SSIM concerns luminance (measured by l(x, x')), and contrast and structure (measured in combination by cs(x, x'));
L_MSSSIM(x, x') = 1 - MSSSIM(x, x')    (8)
wherein x' = G(E(x)); L_MSSSIM(x, x') denotes the SSIM loss function; α_2 denotes the weighting coefficient between the SSIM loss and the l1 loss; l1(x, x') denotes the l1 loss value between x and the reconstruction result; l_m^{γ_m} represents the result of l(x, x') after passing through m Gaussian filters; γ represents an attenuation parameter; m represents the number of Gaussian filters; c_1 represents a constant parameter; cs_j^{η_j} represents the result generated by the j-th filter; η_j represents the attenuation coefficient of the corresponding window; if x has a size of (s_i × s_i), the j-th filter window size is set accordingly; p_j represents the filter window; μ_x, μ_x' respectively represent the means of x and x'; σ_x, σ_x' respectively represent the standard deviations of x and x'.
7. An unsupervised bi-directional generation automatic coding system for convergence guarantee, comprising:
module M1: the batch data (x, z) simultaneously generates an encoding result E (x) and generated data G (z) through an encoder E and a generator G, and the mapping from the hidden variable space to the data space and the reverse mapping from the data space to the hidden variable space are completed;
module M2: image space dataAnd latent variable space dataBy rolling blocks Fx and Fz, respectively, image space data is processedAnd latent variable space dataExtracting information to obtain extracted image space dataAnd extracted hidden variable space data
Module M3: a discriminator D is trained from the extracted image space data and the extracted hidden variable space data until the loss function is minimized;
module M4: the image data is encoded by the trained encoder E to generate an encoding result, the encoding result is input into the trained generator G to obtain a reconstructed image data result, the reconstruction of the image data is completed, the overall collaborative optimization of the image space and hidden variable space bidirectional mapping process is realized, and the representation capability and the image generation capability are improved.
8. The convergence assurance oriented unsupervised bi-directional generation automatic coding system of claim 7, wherein the loss function in the module M3 comprises: a Wasserstein distance;
9. The convergence guarantee oriented unsupervised bidirectional generation automatic encoding system of claim 7, further comprising: an embedded MMD discriminator module Dz is introduced; the convolution blocks Fx and Fz, the encoder E and the generator G are multiplexed to realize an embedded GAN network; and the embedded GAN network is utilized to reduce the MMD distance between z and the distribution of the reconstruction result E(G(z)), thereby strengthening the convergence of the whole bidirectional structure and the semantically related characterization capability of the encoder;
the MMD distance between z and the distribution of the reconstruction result E (G (z)) is reduced by utilizing the embedded GAN network to strengthen the convergence of the whole bidirectional structure and strengthen the semantically-related characterization capability of the encoder, and the MMD distance comprises the following steps:
module M5: the encoding result E (x) is reconstructed by a generator G to generate data G (E (x)); generating data g (z) and reconstructing an encoding result E (g (z)) by an encoder E;
Module M6: calculating an MMD difference loss function between the distributions of z and E(G(z)) using the embedded MMD discriminator module Dz, and optimizing the embedded MMD discriminator module Dz with this loss function;
where P_z represents the hidden variable space distribution; y_z = Dz(z) and y_z' = Dz(z'), with z and z' representing samples in the hidden variable space; E(G(z)) represents the re-encoding result of the data generated by reconstructing the hidden variable z, with an associated conditional probability distribution; λ1 represents the gradient penalty term weight parameter, the gradient in the penalty term being taken over the weighted samples of z and its reconstruction result; σ represents the variance coefficient of the Gaussian distribution followed by k_rep; e is the base of the natural logarithm; ε represents the weight parameter used when performing weighted sampling between z and its reconstruction result;
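The symbols λ1 (gradient penalty weight) and ε (weighted-sampling parameter) defined above follow the familiar gradient-penalty pattern for discriminator training. A toy numpy sketch of that mechanism under stated assumptions — the linear discriminator and all shapes here are illustrative stand-ins, not the patent's Dz:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "discriminator" Dz(v) = v @ w; its input gradient is simply w,
# so the gradient penalty can be evaluated without automatic differentiation.
w = rng.normal(size=8)

def dz(v):
    return v @ w

def dz_grad(v):
    # Gradient of dz with respect to each input row (constant for a linear map).
    return np.tile(w, (v.shape[0], 1))

z = rng.normal(size=(64, 8))                  # hidden-variable samples
z_rec = z + 0.1 * rng.normal(size=(64, 8))    # stand-in for E(G(z))

# Weighted sampling between z and its reconstruction (the role of epsilon),
# then the penalty lambda1 * mean((||grad Dz|| - 1)^2) over the batch.
eps = rng.uniform(size=(64, 1))
z_hat = eps * z + (1.0 - eps) * z_rec
lambda1 = 10.0
gp = lambda1 * np.mean((np.linalg.norm(dz_grad(z_hat), axis=1) - 1.0) ** 2)
print(gp)
```

The penalty pulls the discriminator's input gradient norm toward 1 on the interpolated samples, which is what keeps the discriminator's optimization well conditioned.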
Module M7: solving the MMD loss function between the distributions of z and E(G(z)) with respect to the encoder E and the generator G, using the optimized embedded MMD discriminator module Dz;
where k_F represents a Gaussian kernel function; b_l and b_u represent the lower and upper bound parameters of the L2 distance between a and b, respectively;
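Module M7's loss is built from a Gaussian kernel k_F. A minimal numpy sketch of a Gaussian-kernel MMD estimate between a latent sample z and a stand-in for E(G(z)) — the kernel bandwidth and sample sizes are illustrative assumptions, and the b_l/b_u distance bounding of the claim is omitted:

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 * sigma^2)) for every pair of rows."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean())

rng = np.random.default_rng(0)
z = rng.normal(size=(128, 8))
z_rec = z + 0.05 * rng.normal(size=(128, 8))  # stand-in for E(G(z))

# Identical samples give (numerically) zero MMD; a far-shifted copy gives a
# much larger value than a lightly perturbed reconstruction.
print(mmd2(z, z))
print(mmd2(z, z_rec))
print(mmd2(z, z + 3.0))
```

Driving this quantity toward zero through the encoder and generator is what aligns the distribution of E(G(z)) with that of z.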
Module M8: generating a guide item Tz on the coding space using the loss function, thereby completing the training of the embedded GAN network;
where α1 represents the weighting coefficient of the pixel-level l2 loss of the reconstruction operation; n_b represents the data batch size; the l2 loss is taken between the original hidden variable z and its reconstruction result;
Module M9: optimizing the convergence of the current Bi-GAE bidirectional structure through the trained embedded GAN network, and further improving the semantic characterization capability of the encoder E through the generated guide item.
10. The convergence guarantee-oriented unsupervised bidirectional generation automatic coding system of claim 9, further comprising: an SSIM module that calculates a guide item Tx from the real image x and the reconstructed image G(E(x)), enhancing the generator's capability to generate features aligned with human vision;
SSIM considers brightness (l(x, x')), contrast, and structure, the latter two measured jointly by cs(x, x'):
L_MSSSIM(x, x') = 1 - MSSSIM(x, x')   (19)
where x' = G(E(x)); L_MSSSIM(x, x') denotes the SSIM loss function; α2 denotes the weighting coefficient between the SSIM loss and the l1 loss, the l1 loss being taken between x and the reconstruction result; l_m^(γ_m) denotes the result of l(x, x') after passing through m Gaussian filters; γ denotes an attenuation parameter; m denotes the number of Gaussian filters; c1 denotes a constant parameter; η_j denotes the attenuation coefficient of the window corresponding to the result generated by the jth filter; if x has size (s_i × s_i), the jth filter window size is determined accordingly; p_j denotes the filter window; μ_x and μ_x' denote the means of x and x', respectively; σ_x and σ_x' denote the standard deviations of x and x', respectively.
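As a point of reference for the quantities in claim 10, a single-window SSIM (the luminance term l times the contrast/structure term cs) and a loss of the form of equation (19) can be sketched as follows; the constants c1 and c2 and the use of one global window are illustrative assumptions, and the multi-scale Gaussian filtering of MSSSIM is omitted:

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Single-window SSIM: luminance term l(x, y) times contrast/structure cs(x, y)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()  # covariance between the two images
    l = (2.0 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    cs = (2.0 * cxy + c2) / (vx + vy + c2)
    return l * cs

def ssim_loss(x, y):
    """Loss of the 1 - SSIM form, analogous to equation (19)."""
    return 1.0 - ssim_global(x, y)

img = np.random.default_rng(1).random((16, 16))
print(ssim_loss(img, img))        # near zero: identical images incur no loss
print(ssim_loss(img, 1.0 - img))  # large: an inverted image is heavily penalized
```

Because SSIM compares local statistics rather than raw pixels, a loss of this form rewards reconstructions that preserve structure perceived by human vision, which is the stated purpose of the guide item Tx.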
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110678193.6A CN113298895B (en) | 2021-06-18 | 2021-06-18 | Automatic encoding method and system for unsupervised bidirectional generation oriented to convergence guarantee |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113298895A true CN113298895A (en) | 2021-08-24 |
CN113298895B CN113298895B (en) | 2023-05-12 |
Family
ID=77328729
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110678193.6A Active CN113298895B (en) | 2021-06-18 | 2021-06-18 | Automatic encoding method and system for unsupervised bidirectional generation oriented to convergence guarantee |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113298895B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109146988A (en) * | 2018-06-27 | 2019-01-04 | 南京邮电大学 | Non-fully projection CT image rebuilding method based on VAEGAN |
CN109523463A (en) * | 2018-11-20 | 2019-03-26 | 中山大学 | A kind of face aging method generating confrontation network based on condition |
CN110751698A (en) * | 2019-09-27 | 2020-02-04 | 太原理工大学 | Text-to-image generation method based on hybrid network model |
CN110866958A (en) * | 2019-10-28 | 2020-03-06 | 清华大学深圳国际研究生院 | Method for text to image |
US10652565B1 (en) * | 2017-10-12 | 2020-05-12 | Amazon Technologies, Inc. | Image compression and decompression using embeddings |
CN111340791A (en) * | 2020-03-02 | 2020-06-26 | 浙江浙能技术研究院有限公司 | Photovoltaic module unsupervised defect detection method based on GAN improved algorithm |
CN112070209A (en) * | 2020-08-13 | 2020-12-11 | 河北大学 | Stable controllable image generation model training method based on W distance |
CN112424779A (en) * | 2018-07-13 | 2021-02-26 | 映佳控制公司 | Method and system for generating synthetic anonymous data for given task |
Non-Patent Citations (3)
Title |
---|
JEFF DONAHUE et al.: "Adversarial Feature Learning", arXiv:1605.09782v7 * |
MARTIN ARJOVSKY et al.: "Wasserstein GAN", arXiv:1701.07875v3 * |
SHENG MAO et al.: "Discriminative Autoencoding Framework for Simple and Efficient Anomaly Detection", Digital Object Identifier * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114330514A (en) * | 2021-12-14 | 2022-04-12 | 深圳大学 | Data reconstruction method and system based on depth features and gradient information |
CN114330514B (en) * | 2021-12-14 | 2024-04-05 | 深圳大学 | Data reconstruction method and system based on depth features and gradient information |
CN115242250A (en) * | 2022-09-21 | 2022-10-25 | 成都工业学院 | Encoding and decoding method for single-full mapping of multi-value chain data element allocation |
Also Published As
Publication number | Publication date |
---|---|
CN113298895B (en) | 2023-05-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109754403A (en) | Tumour automatic division method and system in a kind of CT image | |
Marimont et al. | Anomaly detection through latent space restoration using vector quantized variational autoencoders | |
Li et al. | Scconv: spatial and channel reconstruction convolution for feature redundancy | |
CN115409733B (en) | Low-dose CT image noise reduction method based on image enhancement and diffusion model | |
CN111932444A (en) | Face attribute editing method based on generation countermeasure network and information processing terminal | |
CN113298895A (en) | Convergence guarantee-oriented unsupervised bidirectional generation automatic coding method and system | |
CN112541864A (en) | Image restoration method based on multi-scale generation type confrontation network model | |
CN113822437A (en) | Deep layered variational automatic encoder | |
CN112233012A (en) | Face generation system and method | |
CN117058307A (en) | Method, system, equipment and storage medium for generating heart three-dimensional nuclear magnetic resonance image | |
CN114332287A (en) | Method, device, equipment and medium for reconstructing PET (positron emission tomography) image based on transformer feature sharing | |
CN113538608A (en) | Controllable character image generation method based on generation countermeasure network | |
CN111626296A (en) | Medical image segmentation system, method and terminal based on deep neural network | |
AU2022288157A1 (en) | Method for producing an image of expected results of medical cosmetic treatments on a human anatomical feature from an image of the anatomical feature prior to these medical cosmetic treatments | |
Chen et al. | Self-supervised neuron segmentation with multi-agent reinforcement learning | |
Yang et al. | Low‐dose CT denoising with a high‐level feature refinement and dynamic convolution network | |
Andersson et al. | Evaluation of data augmentation of MR images for deep learning | |
Poonkodi et al. | 3d-medtrancsgan: 3d medical image transformation using csgan | |
Zwettler et al. | Strategies for training deep learning models in medical domains with small reference datasets | |
Tang et al. | A deep map transfer learning method for face recognition in an unrestricted smart city environment | |
CN112541566B (en) | Image translation method based on reconstruction loss | |
Jeon et al. | Continuous face aging generative adversarial networks | |
CN115482557A (en) | Human body image generation method, system, device and storage medium | |
Ren et al. | Medical image super-resolution based on semantic perception transfer learning | |
Ni et al. | Natural Image Reconstruction from fMRI Based on Self-supervised Representation Learning and Latent Diffusion Model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||