CN113298895A - Convergence guarantee-oriented unsupervised bidirectional generation automatic coding method and system - Google Patents

Convergence guarantee-oriented unsupervised bidirectional generation automatic coding method and system

Info

Publication number
CN113298895A
CN113298895A (application CN202110678193.6A)
Authority
CN
China
Prior art keywords
data
result
encoder
image
space
Prior art date
Legal status
Granted
Application number
CN202110678193.6A
Other languages
Chinese (zh)
Other versions
CN113298895B (en)
Inventor
钱诗友
华勤
曹健
薛广涛
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202110678193.6A
Publication of CN113298895A
Application granted
Publication of CN113298895B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 - Image coding
    • G06T 9/002 - Image coding using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 - Road transport of goods or passengers
    • Y02T 10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 - Engine management systems

Abstract

The invention provides a convergence guarantee-oriented unsupervised bidirectional generation automatic coding method and system, comprising the following steps: the batch data (x, z) are passed through an encoder E and a generator G, which simultaneously produce an encoding result E(x) and generated data G(z); the image-space data $\hat{x}$ and the hidden-variable-space data $\hat{z}$ are passed through convolution blocks Fx and Fz, respectively, which extract information from $\hat{x}$ and $\hat{z}$ to obtain the extracted image-space features $F_x(\hat{x})$ and hidden-variable-space features $F_z(\hat{z})$; a discriminator D is trained on the extracted features until its loss function is minimized; finally, image data are encoded by the trained encoder E, and the encoding result is input into the trained generator G to obtain the reconstructed image data. This completes the reconstruction of the image data, realizes overall collaborative optimization of the bidirectional mapping between the image space and the hidden variable space, and improves both the characterization capability and the image generation capability.

Description

Convergence guarantee-oriented unsupervised bidirectional generation automatic coding method and system
Technical Field
The invention relates to the technical field of encoders, in particular to a convergence guarantee-oriented unsupervised bidirectional generation automatic coding method and system, and more particularly to a convergence-guaranteed unsupervised bidirectional generative autoencoder.
Background
An autoencoder (AE) is a learning algorithm that encodes data efficiently in order to reduce its dimensionality. In recent years, autoencoders have been widely used in fields such as image classification and reconstruction, recommendation systems, and anomaly detection.
Currently, research on autoencoders focuses on improving the ability to generate and characterize images simultaneously. This means that the autoencoder should learn the bidirectional mapping between the generator/decoder and the encoder. Specifically, the generator/decoder focuses on the mapping from the hidden variable space to the data space, while the encoder aims at extracting semantically related feature representations in the inverse mapping from the data space to the hidden variable space. Generative adversarial networks (GANs), as the state-of-the-art generative models, have powerful mapping capabilities, especially in terms of generalization. Building an autoencoder on a GAN is therefore a feasible research direction.
Some previous work has proposed using GANs or adversarial models inside an autoencoder, for example AAE, ALAE and BiGAN. AAE generalizes the GAN framework when training the encoder and pushes the distribution of the encoding results toward a Gaussian distribution. ALAE trains the autoencoder by reconstructing the image from the style-coding results of the real image using the StyleGAN framework.
However, most of these efforts have two limitations. First, they do not achieve a good trade-off between the mapping and the inverse mapping. For example, AAE and ALAE typically treat the training process as a one-way optimization, regardless of the trade-off between the generator and the encoder. Second, convergence cannot be guaranteed in some bidirectional networks. For example, BiGAN implements the mapping and the inverse mapping by discriminating the joint distribution of the hidden variable space and the data space, but its convergence performance is poor. In addition, the characterization capability is not optimized in BiGAN.
Patent document CN111402179A (application number: 202010169306.5) discloses an image synthesis method and system that combines an adversarial autoencoder and a generative adversarial network. The method includes constructing an enhanced adversarial autoencoder including two different sets of encoders, two different sets of first discriminators, and a set of decoders; constructing an improved conditional generative adversarial network comprising a generator and a second discriminator; taking the manually segmented blood vessel tree image and the original fundus retina image as training data, and iteratively training the combined enhanced adversarial autoencoder and improved conditional generative adversarial network to obtain an optimal blood vessel tree image generator and an optimal fundus retina image generator; and performing fundus retina image synthesis on the to-be-processed manually segmented blood vessel tree image based on the two optimal generators to obtain a synthesized image.
The invention provides Bi-GAE, an unsupervised generative autoencoder based on BiGAN. First, the invention designs two schemes to trade off the mapping and the inverse mapping. Specifically, the invention introduces into the mapping a guiding term based on the SSIM loss function, which causes the model to generate images that follow the human visual pattern. In addition, the invention uses an embedded GAN to compute another guiding term, thereby enhancing the semantically related characterization capability in the inverse mapping. The cooperation of the two schemes strengthens the bidirectional information expansion between the hidden variable space and the data space, improving the overall performance of Bi-GAE. Second, the invention uses the Wasserstein distance to guarantee effective gradient computation, while the embedded GAN exploits MMD to strengthen the convergence of Bi-GAE as the discriminator approaches convergence.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide an unsupervised bidirectional generation automatic coding method and system for convergence guarantee.
The invention provides an unsupervised bidirectional generation automatic coding method for convergence guarantee, which comprises the following steps:
step S1: the batch data (x, z) are passed through the encoder E and the generator G, which simultaneously produce the encoding result E(x) and the generated data G(z), completing the mapping from the hidden variable space to the data space and the inverse mapping from the data space to the hidden variable space;

step S2: the image-space data $\hat{x}$ and the hidden-variable-space data $\hat{z}$ are passed through convolution blocks Fx and Fz, respectively, which extract information from $\hat{x}$ and $\hat{z}$ to obtain the extracted image-space features $F_x(\hat{x})$ and hidden-variable-space features $F_z(\hat{z})$;

step S3: the discriminator D is trained on the extracted features $F_x(\hat{x})$ and $F_z(\hat{z})$ until its loss function is minimized;

step S4: the image data are encoded by the trained encoder E to produce an encoding result, which is input into the trained generator G to obtain the reconstructed image data result. This completes the reconstruction of the image data, realizes overall collaborative optimization of the bidirectional mapping between the image space and the hidden variable space, and improves the characterization capability and the image generation capability.
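By way of illustration, the data flow of steps S1-S3 can be sketched in PyTorch as follows. A minimal sketch, assuming the modules E, G, Fx, Fz and D exist as nn.Module objects and that Fx/Fz produce concatenable feature vectors; none of these names or shapes are fixed by the patent.

```python
import torch

def forward_pass(E, G, Fx, Fz, D, x, z):
    """One batch of the bidirectional forward flow (steps S1-S3)."""
    e_x = E(x)    # inverse mapping: data space -> hidden variable space
    g_z = G(z)    # mapping: hidden variable space -> data space
    # Fx and Fz extract features with concatenable shapes (step S2).
    real_score = D(torch.cat([Fx(x), Fz(e_x)], dim=1))   # pair (x, E(x))
    fake_score = D(torch.cat([Fx(g_z), Fz(z)], dim=1))   # pair (G(z), z)
    return real_score, fake_score
```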
Preferably, the generator G and the encoder E use a deconvolution network module and a convolution network module, respectively, from the DCGAN structure.
Preferably, the loss function in step S3 comprises the Wasserstein distance:

$$W = \max_D \; \mathbb{E}_{x \sim P(x)}\big[D(F_x(x), F_z(E(x)))\big] - \mathbb{E}_{z \sim P(z)}\big[D(F_x(G(z)), F_z(z))\big]$$

wherein W denotes the Wasserstein distance; D outputs the likelihood probability of the input data pair; and $F_x$, $F_z$ denote the convolution blocks.
Preferably, the method further comprises the following steps: an embedded MMD discriminator module Dz is introduced, and the convolution blocks Fx and Fz, the encoder E and the generator G are multiplexed to realize an embedded GAN network; the embedded GAN network is used to reduce the MMD distance between the distributions of z and of the reconstruction result E(G(z)), strengthening the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder.
Preferably, reducing the MMD distance between the distributions of z and of the reconstruction result E(G(z)) with the embedded GAN network to strengthen the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder comprises:

step S5: the encoding result E(x) is reconstructed by the generator G into the generated data G(E(x)); the generated data G(z) is re-encoded by the encoder E into the reconstructed encoding result E(G(z));

step S6: the embedded MMD discriminator module Dz is used to compute the MMD difference loss function $L_{D_z}$ between the distributions of z and $\hat{z} = E(G(z))$, and $L_{D_z}$ is used to optimize the embedded MMD discriminator module Dz:

$$L_{D_z} = \mathbb{E}_{z,z' \sim P_z}\big[k_{rep}(y_z, y_{z'})\big] - \mathbb{E}_{\hat{z},\hat{z}' \sim P_{\hat{z}}}\big[k_{rep}(y_{\hat{z}}, y_{\hat{z}'})\big] + \lambda_1 \, \mathbb{E}_{\tilde{z}}\Big[\big(\|\nabla_{\tilde{z}} D_z(\tilde{z})\|_2 - 1\big)^2\Big], \qquad k_{rep}(a, b) = e^{-\|a - b\|_2^2 / (2\sigma^2)}, \quad \tilde{z} = \varepsilon z + (1 - \varepsilon)\hat{z}$$

wherein $P_z$ denotes the hidden variable space distribution; $y_z = D_z(z)$ and $y_{z'} = D_z(z')$, with z, z' samples on the hidden variable space; $\hat{z} = E(G(z))$ denotes the reconstruction result of the hidden variable z, i.e., the re-encoding result of the generated data; $P_{\hat{z}}$ denotes the conditional probability distribution of $\hat{z}$; $\lambda_1$ denotes the gradient penalty term weight parameter; $\nabla_{\tilde{z}} D_z(\tilde{z})$ denotes the gradient taken on $\tilde{z}$; $\sigma$ denotes the variance coefficient of the Gaussian distribution that $k_{rep}$ conforms to; e is the base constant of the natural logarithm; and $\varepsilon$ denotes the weight parameter used in the weighted sampling between z and its reconstruction result $\hat{z}$;

step S7: the optimized embedded MMD discriminator module Dz is used to compute the MMD loss function $L_{EG}^{MMD}$ of the distributions of z and E(G(z)) with respect to the encoder E and the generator G:

$$L_{EG}^{MMD} = \mathbb{E}\big[k_F(y_z, y_{z'})\big] - 2\,\mathbb{E}\big[k_F(y_z, y_{\hat{z}})\big] + \mathbb{E}\big[k_F(y_{\hat{z}}, y_{\hat{z}'})\big], \qquad k_F(a, b) = e^{-\min(\max(\|a - b\|_2^2,\, b_l),\, b_u) / (2\sigma^2)}$$

wherein $k_F$ denotes the (bounded) Gaussian kernel function, and $b_l$, $b_u$ denote the lower- and upper-bound parameters for the L2 distance between a and b;

step S8: the loss function $L_{EG}^{MMD}$ is used to generate the guiding term $T_z$ on the coding space, thereby completing the training of the embedded GAN network:

$$T_z = L_{EG}^{MMD} + \alpha_1 \cdot \frac{1}{n_b} \sum_{i=1}^{n_b} \|z_i - \hat{z}_i\|_2^2$$

wherein $\alpha_1$ denotes the weighting coefficient between $L_{EG}^{MMD}$ and the pixel-level $l_2$ loss of the reconstruction operation; $n_b$ denotes the data batch size; and $\|z_i - \hat{z}_i\|_2^2$ is the $l_2$ loss between the original hidden variable z and the reconstruction result $\hat{z}$;

step S9: the trained embedded GAN network optimizes the convergence of the current Bi-GAE bidirectional structure, and the generated guiding term further improves the semantic characterization capability of the encoder E.
Preferably, the method further comprises the following steps: using an SSIM module to compute a guiding term $T_x$ from the real image x and the reconstructed image G(E(x)), strengthening the generator's capability to generate images that match human visual characteristics;

SSIM considers luminance (measured by l(x, x')), contrast and structure (measured jointly by cs(x, x')):

$$MSSSIM(x, x') = l_m^{\gamma_m}(x, x') \cdot \prod_{j=1}^{m} cs_j^{\eta_j}(x, x')$$

$$L_{MSSSIM}(x, x') = 1 - MSSSIM(x, x') \tag{8}$$

$$T_x = \alpha_2 \cdot L_{MSSSIM}(x, x') + (1 - \alpha_2) \cdot \frac{1}{n_b} \sum_{i=1}^{n_b} \|x_i - x'_i\|_1$$

$$l(x, x') = \frac{2\mu_x \mu_{x'} + c_1}{\mu_x^2 + \mu_{x'}^2 + c_1}, \qquad cs(x, x') = \frac{2\sigma_{xx'} + c_2}{\sigma_x^2 + \sigma_{x'}^2 + c_2}$$

wherein x' = G(E(x)); $L_{MSSSIM}(x, x')$ denotes the SSIM loss function; $\alpha_2$ denotes the weighting coefficient between the SSIM loss and the $l_1$ loss; $\|x - x'\|_1$ denotes the $l_1$ loss between x and the reconstruction result; $l_m^{\gamma_m}$ denotes the result of l(x, x') after the m Gaussian filters; $\gamma$ denotes an attenuation parameter; m denotes the number of Gaussian filters; $c_1$, $c_2$ denote constant parameters; $cs_j^{\eta_j}$ denotes the result produced by the j-th filter; $\eta_j$ denotes the attenuation coefficient of the corresponding window; if x has size $(s_i \times s_i)$, the j-th filter window size is $(s_i/2^{m-j}) \times (s_i/2^{m-j})$; $p_j$ denotes the filter window; $\mu_x$, $\mu_{x'}$ denote the means of x and x'; $\sigma_x$, $\sigma_{x'}$ denote the standard deviations of x and x'; and $\sigma_{xx'}$ denotes their covariance.
The invention provides a convergence guarantee-oriented unsupervised bidirectional generation automatic coding system, comprising:
module M1: the batch data (x, z) are passed through the encoder E and the generator G, which simultaneously produce the encoding result E(x) and the generated data G(z), completing the mapping from the hidden variable space to the data space and the inverse mapping from the data space to the hidden variable space;

module M2: the image-space data $\hat{x}$ and the hidden-variable-space data $\hat{z}$ are passed through convolution blocks Fx and Fz, respectively, which extract information from $\hat{x}$ and $\hat{z}$ to obtain the extracted image-space features $F_x(\hat{x})$ and hidden-variable-space features $F_z(\hat{z})$;

module M3: the discriminator D is trained on the extracted features $F_x(\hat{x})$ and $F_z(\hat{z})$ until its loss function is minimized;

module M4: the image data are encoded by the trained encoder E to produce an encoding result, which is input into the trained generator G to obtain the reconstructed image data result. This completes the reconstruction of the image data, realizes overall collaborative optimization of the bidirectional mapping between the image space and the hidden variable space, and improves the characterization capability and the image generation capability.
Preferably, the loss function in the module M3 comprises the Wasserstein distance:

$$W = \max_D \; \mathbb{E}_{x \sim P(x)}\big[D(F_x(x), F_z(E(x)))\big] - \mathbb{E}_{z \sim P(z)}\big[D(F_x(G(z)), F_z(z))\big]$$

wherein W denotes the Wasserstein distance; D outputs the likelihood probability of the input data pair; and $F_x$, $F_z$ denote the convolution blocks.
Preferably, the system further comprises: an embedded MMD discriminator module Dz is introduced, and the convolution blocks Fx and Fz, the encoder E and the generator G are multiplexed to realize an embedded GAN network; the embedded GAN network is used to reduce the MMD distance between the distributions of z and of the reconstruction result E(G(z)), strengthening the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder;

reducing the MMD distance between the distributions of z and of the reconstruction result E(G(z)) with the embedded GAN network to strengthen the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder comprises:

module M5: the encoding result E(x) is reconstructed by the generator G into the generated data G(E(x)); the generated data G(z) is re-encoded by the encoder E into the reconstructed encoding result E(G(z));

module M6: the embedded MMD discriminator module Dz is used to compute the MMD difference loss function $L_{D_z}$ between the distributions of z and $\hat{z} = E(G(z))$, and $L_{D_z}$ is used to optimize the embedded MMD discriminator module Dz:

$$L_{D_z} = \mathbb{E}_{z,z' \sim P_z}\big[k_{rep}(y_z, y_{z'})\big] - \mathbb{E}_{\hat{z},\hat{z}' \sim P_{\hat{z}}}\big[k_{rep}(y_{\hat{z}}, y_{\hat{z}'})\big] + \lambda_1 \, \mathbb{E}_{\tilde{z}}\Big[\big(\|\nabla_{\tilde{z}} D_z(\tilde{z})\|_2 - 1\big)^2\Big], \qquad k_{rep}(a, b) = e^{-\|a - b\|_2^2 / (2\sigma^2)}, \quad \tilde{z} = \varepsilon z + (1 - \varepsilon)\hat{z}$$

wherein $P_z$ denotes the hidden variable space distribution; $y_z = D_z(z)$ and $y_{z'} = D_z(z')$, with z, z' samples on the hidden variable space; $\hat{z} = E(G(z))$ denotes the reconstruction result of the hidden variable z, i.e., the re-encoding result of the generated data; $P_{\hat{z}}$ denotes the conditional probability distribution of $\hat{z}$; $\lambda_1$ denotes the gradient penalty term weight parameter; $\nabla_{\tilde{z}} D_z(\tilde{z})$ denotes the gradient taken on $\tilde{z}$; $\sigma$ denotes the variance coefficient of the Gaussian distribution that $k_{rep}$ conforms to; e is the base constant of the natural logarithm; and $\varepsilon$ denotes the weight parameter used in the weighted sampling between z and its reconstruction result $\hat{z}$;

module M7: the optimized embedded MMD discriminator module Dz is used to compute the MMD loss function $L_{EG}^{MMD}$ of the distributions of z and E(G(z)) with respect to the encoder E and the generator G:

$$L_{EG}^{MMD} = \mathbb{E}\big[k_F(y_z, y_{z'})\big] - 2\,\mathbb{E}\big[k_F(y_z, y_{\hat{z}})\big] + \mathbb{E}\big[k_F(y_{\hat{z}}, y_{\hat{z}'})\big], \qquad k_F(a, b) = e^{-\min(\max(\|a - b\|_2^2,\, b_l),\, b_u) / (2\sigma^2)}$$

wherein $k_F$ denotes the (bounded) Gaussian kernel function, and $b_l$, $b_u$ denote the lower- and upper-bound parameters for the L2 distance between a and b;

module M8: the loss function $L_{EG}^{MMD}$ is used to generate the guiding term $T_z$ on the coding space, thereby completing the training of the embedded GAN network:

$$T_z = L_{EG}^{MMD} + \alpha_1 \cdot \frac{1}{n_b} \sum_{i=1}^{n_b} \|z_i - \hat{z}_i\|_2^2$$

wherein $\alpha_1$ denotes the weighting coefficient between $L_{EG}^{MMD}$ and the pixel-level $l_2$ loss of the reconstruction operation; $n_b$ denotes the data batch size; and $\|z_i - \hat{z}_i\|_2^2$ is the $l_2$ loss between the original hidden variable z and the reconstruction result $\hat{z}$;

module M9: the trained embedded GAN network optimizes the convergence of the current Bi-GAE bidirectional structure, and the generated guiding term further improves the semantic characterization capability of the encoder E.
Preferably, the system further comprises: using an SSIM module to compute a guiding term $T_x$ from the real image x and the reconstructed image G(E(x)), strengthening the generator's capability to generate images that match human visual characteristics;

SSIM considers luminance (measured by l(x, x')), contrast and structure (measured jointly by cs(x, x')):

$$MSSSIM(x, x') = l_m^{\gamma_m}(x, x') \cdot \prod_{j=1}^{m} cs_j^{\eta_j}(x, x')$$

$$L_{MSSSIM}(x, x') = 1 - MSSSIM(x, x') \tag{19}$$

$$T_x = \alpha_2 \cdot L_{MSSSIM}(x, x') + (1 - \alpha_2) \cdot \frac{1}{n_b} \sum_{i=1}^{n_b} \|x_i - x'_i\|_1$$

$$l(x, x') = \frac{2\mu_x \mu_{x'} + c_1}{\mu_x^2 + \mu_{x'}^2 + c_1}, \qquad cs(x, x') = \frac{2\sigma_{xx'} + c_2}{\sigma_x^2 + \sigma_{x'}^2 + c_2}$$

wherein x' = G(E(x)); $L_{MSSSIM}(x, x')$ denotes the SSIM loss function; $\alpha_2$ denotes the weighting coefficient between the SSIM loss and the $l_1$ loss; $\|x - x'\|_1$ denotes the $l_1$ loss between x and the reconstruction result; $l_m^{\gamma_m}$ denotes the result of l(x, x') after the m Gaussian filters; $\gamma$ denotes an attenuation parameter; m denotes the number of Gaussian filters; $c_1$, $c_2$ denote constant parameters; $cs_j^{\eta_j}$ denotes the result produced by the j-th filter; $\eta_j$ denotes the attenuation coefficient of the corresponding window; if x has size $(s_i \times s_i)$, the j-th filter window size is $(s_i/2^{m-j}) \times (s_i/2^{m-j})$; $p_j$ denotes the filter window; $\mu_x$, $\mu_{x'}$ denote the means of x and x'; $\sigma_x$, $\sigma_{x'}$ denote the standard deviations of x and x'; and $\sigma_{xx'}$ denotes their covariance.
Compared with the prior art, the invention has the following beneficial effects:
1. The convergence guarantee-oriented unsupervised bidirectional generation autoencoder of the invention improves the information expansion between the hidden variable space and the data space through stable convergence;
2. The invention introduces a guiding term into the loss function to optimize image reconstruction and generation in the mapping according to the human visual pattern;
3. The invention embeds a GAN to compute a guiding term that enhances the characterization of semantically related features in the inverse mapping and strengthens the convergence of the autoencoder.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a logical framework diagram of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific embodiments. The following examples will help those skilled in the art further understand the invention, but do not limit it in any way. It should be noted that various changes and modifications can be made by those skilled in the art without departing from the spirit of the invention, all of which fall within the scope of the present invention.
The invention discloses a convergence-guaranteed unsupervised bidirectional generative autoencoder. Improving the generation and characterization capabilities of autoencoders is a major research problem in the field of machine learning. However, optimizing the bidirectional mapping while stabilizing convergence presents significant challenges, and most existing autoencoders fail to automatically trade off the bidirectional mapping between the encoder and the decoder/generator. The invention proposes Bi-GAE, an unsupervised bidirectional generative autoencoder based on BiGAN. First, we introduce two guiding terms into the loss function: one enhances information expansion so that the mapping follows the human visual model, and the other improves the semantically related characterization capability in the inverse mapping. In addition, we embed a GAN to improve the convergence and characterization capability of Bi-GAE. Experimental results show that Bi-GAE is competitive in both generation and characterization and converges stably. Compared with similar methods, the characterization capability of Bi-GAE improves the classification accuracy of high-resolution images by about 6.607%. Furthermore, in image reconstruction, Bi-GAE increases the structural similarity (SSIM) index by 0.017 and decreases the Fréchet Inception Distance (FID) by 3.098.
Example 1
The invention provides an unsupervised bidirectional generation automatic coding method for convergence guarantee, which comprises the following steps:
step S1: the batch data (x, z) are passed through the encoder E and the generator G, which simultaneously produce the encoding result E(x) and the generated data G(z), completing the mapping from the hidden variable space to the data space and the inverse mapping from the data space to the hidden variable space;

step S2: the image-space data $\hat{x}$ and the hidden-variable-space data $\hat{z}$ are passed through convolution blocks Fx and Fz, respectively, which extract information from $\hat{x}$ and $\hat{z}$ to obtain the extracted image-space features $F_x(\hat{x})$ and hidden-variable-space features $F_z(\hat{z})$;

step S3: the discriminator D is trained on the extracted features $F_x(\hat{x})$ and $F_z(\hat{z})$ until its loss function is minimized;

step S4: the image data are encoded by the trained encoder E to produce an encoding result, which is input into the trained generator G to obtain the reconstructed image data result. This completes the reconstruction of the image data, realizes overall collaborative optimization of the bidirectional mapping between the image space and the hidden variable space, and improves the characterization capability and the image generation capability.
Specifically, the generator G and the encoder E use a deconvolution network module and a convolution network module, respectively, from the DCGAN structure.
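As a minimal sketch of what such DCGAN-style modules look like, the blocks below use strided convolutions for the encoder side and transposed convolutions for the generator side; the kernel sizes, strides and activations are assumptions taken from common DCGAN practice, not values fixed by the patent.

```python
import torch.nn as nn

def conv_block(c_in, c_out):
    # Encoder-side DCGAN block: a strided convolution halves the resolution.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.2, inplace=True),
    )

def deconv_block(c_in, c_out):
    # Generator-side DCGAN block: a transposed convolution doubles the resolution.
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )
```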
Specifically, the loss function in step S3 comprises the Wasserstein distance:

$$W = \max_D \; \mathbb{E}_{x \sim P(x)}\big[D(F_x(x), F_z(E(x)))\big] - \mathbb{E}_{z \sim P(z)}\big[D(F_x(G(z)), F_z(z))\big]$$

wherein W denotes the Wasserstein distance; D outputs the likelihood probability of the input data pair; and $F_x$, $F_z$ denote the convolution blocks.
Specifically, the method further comprises the following steps: an embedded MMD discriminator module Dz is introduced, and the convolution blocks Fx and Fz, the encoder E and the generator G are multiplexed to realize an embedded GAN network; the embedded GAN network is used to reduce the MMD distance between the distributions of z and of the reconstruction result E(G(z)), strengthening the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder.
Specifically, reducing the MMD distance between the distributions of z and of the reconstruction result E(G(z)) with the embedded GAN network to strengthen the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder comprises:

step S5: the encoding result E(x) is reconstructed by the generator G into the generated data G(E(x)); the generated data G(z) is re-encoded by the encoder E into the reconstructed encoding result E(G(z));

step S6: the embedded MMD discriminator module Dz is used to compute the MMD difference loss function $L_{D_z}$ between the distributions of z and $\hat{z} = E(G(z))$, and $L_{D_z}$ is used to optimize the embedded MMD discriminator module Dz:

$$L_{D_z} = \mathbb{E}_{z,z' \sim P_z}\big[k_{rep}(y_z, y_{z'})\big] - \mathbb{E}_{\hat{z},\hat{z}' \sim P_{\hat{z}}}\big[k_{rep}(y_{\hat{z}}, y_{\hat{z}'})\big] + \lambda_1 \, \mathbb{E}_{\tilde{z}}\Big[\big(\|\nabla_{\tilde{z}} D_z(\tilde{z})\|_2 - 1\big)^2\Big], \qquad k_{rep}(a, b) = e^{-\|a - b\|_2^2 / (2\sigma^2)}, \quad \tilde{z} = \varepsilon z + (1 - \varepsilon)\hat{z}$$

wherein $P_z$ denotes the hidden variable space distribution; $y_z = D_z(z)$ and $y_{z'} = D_z(z')$, with z, z' samples on the hidden variable space; $\hat{z} = E(G(z))$ denotes the reconstruction result of the hidden variable z, i.e., the re-encoding result of the generated data; $P_{\hat{z}}$ denotes the conditional probability distribution of $\hat{z}$; $\lambda_1$ denotes the gradient penalty term weight parameter; $\nabla_{\tilde{z}} D_z(\tilde{z})$ denotes the gradient taken on $\tilde{z}$; $\sigma$ denotes the variance coefficient of the Gaussian distribution that $k_{rep}$ conforms to; e is the base constant of the natural logarithm; and $\varepsilon$ denotes the weight parameter used in the weighted sampling between z and its reconstruction result $\hat{z}$;

step S7: the optimized embedded MMD discriminator module Dz is used to compute the MMD loss function $L_{EG}^{MMD}$ of the distributions of z and E(G(z)) with respect to the encoder E and the generator G:

$$L_{EG}^{MMD} = \mathbb{E}\big[k_F(y_z, y_{z'})\big] - 2\,\mathbb{E}\big[k_F(y_z, y_{\hat{z}})\big] + \mathbb{E}\big[k_F(y_{\hat{z}}, y_{\hat{z}'})\big], \qquad k_F(a, b) = e^{-\min(\max(\|a - b\|_2^2,\, b_l),\, b_u) / (2\sigma^2)}$$

wherein $k_F$ denotes the (bounded) Gaussian kernel function, and $b_l$, $b_u$ denote the lower- and upper-bound parameters for the L2 distance between a and b;

step S8: the loss function $L_{EG}^{MMD}$ is used to generate the guiding term $T_z$ on the coding space, thereby completing the training of the embedded GAN network:

$$T_z = L_{EG}^{MMD} + \alpha_1 \cdot \frac{1}{n_b} \sum_{i=1}^{n_b} \|z_i - \hat{z}_i\|_2^2$$

wherein $\alpha_1$ denotes the weighting coefficient between $L_{EG}^{MMD}$ and the pixel-level $l_2$ loss of the reconstruction operation; $n_b$ denotes the data batch size; and $\|z_i - \hat{z}_i\|_2^2$ is the $l_2$ loss between the original hidden variable z and the reconstruction result $\hat{z}$;

step S9: the trained embedded GAN network optimizes the convergence of the current Bi-GAE bidirectional structure, and the generated guiding term further improves the semantic characterization capability of the encoder E.
Specifically, the method further comprises the following steps: using an SSIM module to compute a guiding term $T_x$ from the real image x and the reconstructed image G(E(x)), strengthening the generator's capability to generate images that match human visual characteristics;

SSIM considers luminance (measured by l(x, x')), contrast and structure (measured jointly by cs(x, x')):

$$MSSSIM(x, x') = l_m^{\gamma_m}(x, x') \cdot \prod_{j=1}^{m} cs_j^{\eta_j}(x, x')$$

$$L_{MSSSIM}(x, x') = 1 - MSSSIM(x, x') \tag{8}$$

$$T_x = \alpha_2 \cdot L_{MSSSIM}(x, x') + (1 - \alpha_2) \cdot \frac{1}{n_b} \sum_{i=1}^{n_b} \|x_i - x'_i\|_1$$

$$l(x, x') = \frac{2\mu_x \mu_{x'} + c_1}{\mu_x^2 + \mu_{x'}^2 + c_1}, \qquad cs(x, x') = \frac{2\sigma_{xx'} + c_2}{\sigma_x^2 + \sigma_{x'}^2 + c_2}$$

wherein x' = G(E(x)); $L_{MSSSIM}(x, x')$ denotes the SSIM loss function; $\alpha_2$ denotes the weighting coefficient between the SSIM loss and the $l_1$ loss; $\|x - x'\|_1$ denotes the $l_1$ loss between x and the reconstruction result; $l_m^{\gamma_m}$ denotes the result of l(x, x') after the m Gaussian filters; $\gamma$ denotes an attenuation parameter; m denotes the number of Gaussian filters; $c_1$, $c_2$ denote constant parameters; $cs_j^{\eta_j}$ denotes the result produced by the j-th filter; $\eta_j$ denotes the attenuation coefficient of the corresponding window; if x has size $(s_i \times s_i)$, the j-th filter window size is $(s_i/2^{m-j}) \times (s_i/2^{m-j})$; $p_j$ denotes the filter window; $\mu_x$, $\mu_{x'}$ denote the means of x and x'; $\sigma_x$, $\sigma_{x'}$ denote the standard deviations of x and x'; and $\sigma_{xx'}$ denotes their covariance.
The convergence guarantee-oriented unsupervised bidirectional generation automatic coding system can be realized through the step flow in the convergence guarantee-oriented unsupervised bidirectional generation automatic coding method. The convergence guarantee oriented unsupervised bidirectional generation automatic coding method can be understood as a preferred example of the convergence guarantee oriented unsupervised bidirectional generation automatic coding system by those skilled in the art.
Example 2
Example 2 is a preferred example of example 1
To overcome the shortcomings of existing autoencoders in bidirectional mapping balance and convergence, the invention provides an unsupervised generative autoencoder based on BiGAN that effectively improves the generation and characterization capabilities of the autoencoder.
The invention provides an unsupervised generative autoencoder that simultaneously achieves balance and joint optimization of the mapping and the inverse mapping. Addressing the limitations of BiGAN, the invention makes two main optimizations in Bi-GAE. First, the invention introduces the Wasserstein distance and embeds a GAN to strengthen the convergence of Bi-GAE; to estimate the Wasserstein distance of the joint distribution, two convolution blocks are designed in Bi-GAE for feature extraction, and the convergence of Bi-GAE is proven theoretically. Second, the invention introduces two guiding terms, one each for the generator and the encoder, which enable information expansion in the mapping and the inverse mapping; these expansions effectively achieve the trade-off between the bidirectional mappings.
FIG. 1 shows the framework of the convergence guarantee-oriented unsupervised bidirectional generation autoencoder of the invention. Similar to BiGAN, the main part of Bi-GAE comprises a generator G, an encoder E and a discriminator D. As a bidirectional mapping encoder, the two optimization targets of the architecture are: (1) the mapping from the hidden variable space to the data space (generator G); (2) the inverse mapping from the data space to the hidden variable space (encoder E). Compared with ordinary encoders and generative encoders, the basic improvements of this architecture are: (1) the joint distribution of the data space and the hidden variable space is used to optimize the bidirectional mappings of the image space and the hidden variable space simultaneously, realizing overall collaborative optimization of the bidirectional process; (2) guiding mechanisms are introduced into the bidirectional mapping training process, guaranteeing the convergence of the bidirectional architecture while separately optimizing the visual properties of the generator and the semantically related characterization of the encoder. In the concrete implementation, these ideas build on the basic BiGAN architecture, introduce the Wasserstein distance as the judgment index of the discriminator D, and design four specific embedded modules to implement the two guiding mechanisms in the bidirectional mapping process.
As shown in FIG. 1, the main part of Bi-GAE is based on the BiGAN structure; in the concrete implementation, the encoder E and the generator G of Bi-GAE use a convolution network module and a deconvolution network module from the DCGAN structure, respectively. In the main structure of Bi-GAE, the Wasserstein distance is introduced as the index with which the discriminator D (implemented with a convolution block from DCGAN) judges the distance between the joint distributions. Accordingly, to extract features from the data space and the hidden variable space simultaneously, two feature extraction convolution blocks Fx and Fz are introduced, which extract the features of the two spaces as vectors that are input to D. For the two guiding mechanisms described above, an SSIM module computes the structural difference between the real data x and the corresponding reconstructed data G(E(x)); for the coding space, an MMD-based adversarial network module is further embedded, which strengthens the convergence of the overall bidirectional structure and the semantically related characterization capability of the encoder by reducing the MMD distance between the distributions of z and of the reconstruction result E(G(z)). The bidirectional training process, the Wasserstein distance, the two guiding mechanisms and the four embedded modules are elaborated below.
In each batch training step, the batch data input to Bi-GAE is (x, z): a batch of data-space instances x input to the encoder E and a batch of normally distributed hidden-variable-space samples z input to the generator G. The specific training process is as follows:
(1) Using (x, z), the encoder E and the generator G simultaneously produce the encoding result E(x) and the generated data G(z). This step simultaneously completes the mapping from the hidden variable space to the data space and the inverse mapping from the data space to the hidden variable space.
(2) After obtaining the image-space data $\hat{x}$ and the hidden-variable-space data $\hat{z}$, and following the BiGAN bidirectional training concept, the discriminator judges the difference between the two joint distributions formed by the real pairs (x, E(x)) and the generated pairs (G(z), z). The core idea is to train the discriminator D so that the distance it judges between the real distribution and the generated distribution is as large as possible; for the synchronously trained encoder E and generator G, the Wasserstein distance judged by the optimized D is used for training, with the opposite goal of making D unable to distinguish the two distributions, i.e., making the judged distance between the generated data and the real data as small as possible. This follows the core principle of BiGAN; mathematically, the goal can be stated as the two joint probabilities being equal, at which point the bidirectional optimization is considered successfully completed. The joint distribution of {x, E(x)} can be written as $P_{x,E} = P_E(z|x)P(x)$ and that of {G(z), z} as $P_{G,z} = P_G(x|z)P(z)$, where P(x) and P(z) are the true distributions of the data space and the hidden variable space, and $P_E$, $P_G$ are the conditional distributions of the encoder and the generator. Therefore, when the joint distributions coincide, $P_E$ matches P(z) and $P_G$ matches P(x), completing the synchronous optimization of the encoder and the generator. To guarantee convergence, the Wasserstein distance is introduced to replace the KL divergence used in BiGAN.
(3) Based on the main flow, the reconstruction G(E(x)) of x is obtained from the encoding result E(x), and the reconstruction E(G(z)) of z is obtained from the generated result G(z). Using (E(G(z)), z), the MMD embedded network guides the encoder E toward strengthened semantic characterization; using (G(E(x)), x), the SSIM module simultaneously guides the generator G toward satisfying the visual characteristics of the human eye.
In order to solve the problem that BiGAN is difficult to converge, the Wasserstein distance must be introduced into the BiGAN structure as a loss function. This requires first extracting features $\hat{x}$ from the image space and $\hat{z}$ from the hidden variable space, so two convolution modules Fx and Fz are designed for this purpose.
In order to make the generation capability of the generator G meet the visual characteristics of the human eye, the structural similarity index method (SSIM) loss is used to construct the guiding term Tx as part of the generator's loss function. The overall flow for computing Tx is: an SSIM module is introduced between G(E(x)) and x to compute the similarity loss between the reconstructed image space and the original image space.
in order to solve the problem that estimation of Wasserstein is invalid when a discriminator is close to convergence, an embedded GAN model based on MMD is nested on the basis of the existing structure. The embedded model effectively multiplexes the encoder E and generator G, and accordingly, an embedded GAN discriminator should be introduced.
Introducing Wasserstein distance
A traditional BiGAN network is optimized over the joint data space. Let the distribution of the image data space X be P(X) and the distribution of the hidden variable space Z be P(Z). The training target of the discriminator D of a traditional BiGAN network is to maximize the probability $P_D(Y|X,Z)$, where Y = 1 for pairs (x, E(x)) and Y = 0 for pairs (G(z), z), i.e., to maximize the judgment accuracy; the generator G and the encoder E are trained toward the opposite target. The training objective of BiGAN as a bidirectional structure is therefore:

$$\min_{G,E} \max_D \; \mathbb{E}_{x \sim P(X)}\big[\log D(x, E(x))\big] + \mathbb{E}_{z \sim P(Z)}\big[\log\big(1 - D(G(z), z)\big)\big] \tag{1}$$

wherein D(·) gives the likelihood probability of the input data pair, and G(·), E(·) are the outputs of the generator and the encoder.
As shown in equation (1), traditional BiGAN uses the Jensen-Shannon (JS) divergence or the Kullback-Leibler (KL) divergence when estimating the judged distance between the real data space (x, E(x)) and the generated data space (G(z), z). These divergences have an obvious problem: the gradient of the loss function associated with equation (1) becomes ineffective as D tends to converge, causing the training of the BiGAN structure to fail. An effective way to optimize the convergence of the BiGAN bidirectional structure is therefore to introduce a distance measure whose corresponding loss function always has an effective, nonzero gradient. For this reason, the Wasserstein distance is introduced into Bi-GAE. For a real data distribution $P_x$ and a generated data distribution $P_g$, the Wasserstein distance W between the two distributions is estimated in practice as follows:

$$W(P_x, P_g) = \max_{D} \; \mathbb{E}_{x \sim P_x}\big[D(x)\big] - \mathbb{E}_{z \sim P_z}\big[D(g_\theta(z))\big] \tag{2}$$

wherein $g_\theta(\cdot)$ and $D(\cdot)$ are the outputs of the generator G and the discriminator D, respectively, and $\mathbb{E}$ denotes expectation. However, equation (2) requires that D(·) satisfy the 1-Lipschitz constraint, and the Wasserstein distance still cannot be estimated correctly when D is close to convergence, so these problems are addressed by further design.
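The 1-Lipschitz constraint mentioned above is commonly enforced in practice with a gradient penalty in the style of WGAN-GP. The sketch below shows this standard technique on a generic critic; applying it to the joint features of both spaces, as equation (4) below does, follows the same pattern. The function returns the unweighted penalty, to be scaled by the weight λ.

```python
import torch

def gradient_penalty(D, real, fake):
    """Standard WGAN-GP penalty pushing the critic D toward 1-Lipschitz.

    `real` and `fake` are batches from the two distributions being
    compared; the penalty is evaluated at random interpolations of them.
    """
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)),
                     device=real.device)
    mixed = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads, = torch.autograd.grad(D(mixed).sum(), mixed, create_graph=True)
    return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
```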
Introducing the Wasserstein distance into the bidirectional adversarial network requires correspondingly designing two convolution blocks Fx and Fz, used respectively to extract information from the image-space data $\hat{x}$ and the hidden-variable-space data $\hat{z}$, so that the two pieces of information have data shapes that can be aggregated and concatenated. Writing the extracted features as $F_x(\hat{x})$ and $F_z(\hat{z})$, the final computation in Bi-GAE of the Wasserstein distance between the real data space and the generated data space is as follows:

$$W = \max_D \; \mathbb{E}_{x \sim P(x)}\big[D(F_x(x), F_z(E(x)))\big] - \mathbb{E}_{z \sim P(z)}\big[D(F_x(G(z)), F_z(z))\big] \tag{3}$$
in summary, the penalty function L of the discriminator-convolution block (D-Fx-Fz) is designed accordinglyDFJoint loss function L of sum generator-encoder (G-E)EGAs shown in equations (4) and (5), respectively:
Figure BDA0003121662180000138
wherein ,nbFor batch size, to ensure that the 1-Lipschit limit is met, a Gradient Penalty Term (GP Term) is introduced. In the penalty term, there are
Figure BDA0003121662180000139
Figure BDA00031216621800001310
wherein ,TxAs a guide term for the generation capability, ε represents the comparison of the real data sample x and the generated data sample
Figure BDA00031216621800001311
Weight parameter in weighted sampling, true hidden space data sample z and coding result sample
Figure BDA00031216621800001312
Calculating a weight parameter during weighted sampling, wherein lambda represents a weight coefficient of a ladder penalty term,
Figure BDA00031216621800001313
to represent
Figure BDA00031216621800001314
In that
Figure BDA00031216621800001315
Gradient found over variable, sigma representing TxThe weight parameter of (2).
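As a hedged sketch, equations (4) and (5) translate into batch-mean critic scores plus the penalty and guiding terms. The helper arguments and the default weight values here are assumptions for illustration; `gp_term` is an unweighted gradient penalty as in the earlier sketch, and `t_x` is the SSIM guiding term of equation (13) below.

```python
import torch

def bi_gae_losses(D, Fx, Fz, E, G, x, z, gp_term, t_x,
                  lam=10.0, sigma=1.0):
    """Empirical L_DF (eq. 4) and L_EG (eq. 5)."""
    real = D(torch.cat([Fx(x), Fz(E(x))], dim=1)).mean()
    fake = D(torch.cat([Fx(G(z)), Fz(z)], dim=1)).mean()
    l_df = fake - real + lam * gp_term   # trains D, Fx, Fz (eq. 4)
    l_eg = real - fake + sigma * t_x     # trains G, E (eq. 5)
    return l_df, l_eg
```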
Embedding an MMD-based GAN network
Introducing the Wasserstein distance does not by itself solve the problem of invalid estimation near convergence: when the parameters of the discriminator D tend to converge, the network can no longer estimate the Wasserstein distance correctly, so Bi-GAE cannot be trained with a correct gradient. Meanwhile, to strengthen the semantic characterization capability of the encoder in the mapping from the data space to the hidden variable space, a guiding term is designed from the error of the encoding reconstruction. Both of these goals can be achieved with an MMD-based embedded GAN network.
First, we briefly describe the flow of the MMD embedded GAN network. The invention introduces an embedded MMD discriminator module Dz and multiplexes the convolution blocks Fx and Fz, the encoder E and the generator G to realize the embedded GAN network. For the input hidden variable z and the reconstruction result E(G(z)) produced by the completed main process, the embedded GAN network operates as follows:
1) First, the embedded Dz module is used to compute the MMD difference loss $L_{D_z}$ between the distributions of z and E(G(z)); this loss is used to optimize the Dz module, as shown in equation (8).
2) The optimized Dz module is then used to compute the MMD loss $L_{EG}^{MMD}$ of the distributions of z and E(G(z)) with respect to E and G, as shown in equation (9); this loss is used to generate the guiding term Tz on the coding space, as shown in equation (10), thereby completing the training of the embedded GAN network.
The embedded GAN network further optimizes the convergence of the current Bi-GAE bidirectional structure, and the generated guiding term further improves the semantic characterization capability of the encoder E. The theoretical derivation and the design of the guiding mechanism of the MMD-based embedded GAN network are as follows.
Optimizing convergence
For the estimation failure of the Wasserstein distance when D is close to convergence, a solution can be found in how the Wasserstein distance is computed: the Wasserstein distance is in essence a special case of the Maximum Mean Discrepancy (MMD), namely the case of a linear kernel, so using MMD with a higher-order Gaussian kernel as the index for measuring the difference of distributions can further improve convergence performance. Combining the definition of MMD with the characteristics of the joint distributions to be solved in Bi-GAE, we define MMD on the joint distribution space of image data and hidden variables as follows, with a convergence threshold e and {f} a set of continuous functions:

$$M(a, b) = \sup_{f \in \{f\}} \; \mathbb{E}\big[f(a)\big] - \mathbb{E}\big[f(b)\big] \tag{6}$$

wherein $\mathbb{E}_f$ denotes the expected difference obtained when f is used to measure the sample difference between two distributions; it can be understood as reflecting the difference of the distributions. As the parameters of D tend to converge (i.e., as the measured difference falls to the threshold e), two lemmas, Lemma 1 and Lemma 2, allow an effective quantitative estimate of the convergence of the introduced embedded GAN structure.
Lemma 1: Suppose the discriminator D has converged, i.e., its judged distance on each space is below the threshold e. Then $M(E(x), z) \leq \sigma$ and $M(x, G(z)) \leq \varepsilon$, where $\sigma \leq e$ and $\varepsilon \leq e$, and by the MMD definition $\sigma > 0$ and $\varepsilon > 0$.

Proof: By analogy with equation (6), using the batch samples $\hat{z}_i$ and $z_i$ with a measurement function $f_0$, the expected difference between $\mathbb{E}[f_0(\hat{z})]$ and $\mathbb{E}[f_0(z)]$ yields the bound $\sigma$ on M(E(x), z); similarly, using a function $g_0$ on the batch samples $\hat{x}_i$ and $x_i$ yields the bound $\varepsilon$ on M(x, G(z)). The same argument applies symmetrically, which proves the statement for both spaces.

The proof of Lemma 1 means that when D converges on the hidden variable space and the image data space respectively, Bi-GAE also converges on the joint distribution space, i.e., achieves overall convergence.
Lemma 2: When f is a continuous function, $M(f(a), f(b)) \leq M(a, b)$.

Proof: Suppose $M(f(a), f(b)) > M(a, b) = \tau$; then there exists a continuous g such that $\mathbb{E}[g(f(a))] - \mathbb{E}[g(f(b))] > \tau$. Let $k = g \circ f$; then $\mathbb{E}[k(a)] - \mathbb{E}[k(b)] > \tau = M(a, b)$, which contradicts the MMD definition. Therefore $M(f(a), f(b)) \leq M(a, b)$.

Lemma 2 means that MMD is stable under continuous functions f (such as the G and E trained in Bi-GAE). Since the GAN embedded in Bi-GAE is MMD-based, the upper bound of M(E(G(z)), z) can be deduced from Lemmas 1 and 2.
From Lemma 1 and Lemma 2 we can derive Theorem 1 as follows:

Theorem 1: When D is close to convergence and no embedded GAN structure is introduced (i.e., without adding Dz), the upper bound of M(E(G(z)), z) tends to 2e.

Proof:
M(E(G(z)), z) ≤ M(E(G(z)), E(x)) + M(E(x), z)
≤ M(x, G(z)) + M(E(x), z)            // Lemma 2
≤ ε + σ ≤ 2·max({ε, σ}) ≤ 2e        // Lemma 1

When the embedded GAN is introduced, i.e., after the Dz module is added, it follows from Lemma 2 and equation (6) that once Dz converges, the MMD satisfies:

$$M'(E(G(z)), z) \leq M'(x, G(z)) + M'(E(x), z) \leq e \tag{7}$$

Equation (7) means that the upper bound of M'(E(G(z)), z) is reduced to e. Essentially, this reduction promotes the convergence of Bi-GAE by promoting the information interaction capability between the encoding and true hidden variable spaces (E(x), z) and between the real and generated image spaces (x, G(z)), respectively.
Enhancing the semantic characterization capability of the encoder
The goal of Bi-GAE in implementing the mapping from image space to hidden variable space is to enhance the semantically related characterization capability when disentanglement is taken into account. According to equation (7), the GAN embedded in Bi-GAE realizes information expansion in the encoding process by reducing M(E(x), z), thereby improving the characterization capability. The MMD-based loss of the embedded GAN is composed of a repulsion term $L_{rep}$ and an attraction term $L_{att}$. In the actual computation, a single bounded Gaussian kernel (kernel function $k_F$) is selected to reduce the amount of computation. In the computation, let $\hat{z} = E(G(z))$ and let $P_{\hat{z}}$ denote the conditional probability distribution of $\hat{z}$. In summary, the Dz loss function $L_{D_z}$ and the E-G (encoder-generator) loss function $L_{EG}^{MMD}$ are shown in equations (8) and (9), respectively:
$$L_{D_z} = \mathbb{E}_{z,z' \sim P_z}\big[k_{rep}(y_z, y_{z'})\big] - \mathbb{E}_{\hat{z},\hat{z}' \sim P_{\hat{z}}}\big[k_{rep}(y_{\hat{z}}, y_{\hat{z}'})\big] + \lambda_1 \, \mathbb{E}_{\tilde{z}}\Big[\big(\|\nabla_{\tilde{z}} D_z(\tilde{z})\|_2 - 1\big)^2\Big] \tag{8}$$

$$L_{EG}^{MMD} = \mathbb{E}\big[k_F(y_z, y_{z'})\big] - 2\,\mathbb{E}\big[k_F(y_z, y_{\hat{z}})\big] + \mathbb{E}\big[k_F(y_{\hat{z}}, y_{\hat{z}'})\big] \tag{9}$$

with $k_{rep}(a, b) = e^{-\|a - b\|_2^2/(2\sigma^2)}$, $k_F(a, b) = e^{-\min(\max(\|a - b\|_2^2,\, b_l),\, b_u)/(2\sigma^2)}$ and $\tilde{z} = \varepsilon z + (1 - \varepsilon)\hat{z}$,

wherein $P_z$ denotes the hidden variable space distribution; $y_z = D_z(z)$ and $y_{z'} = D_z(z')$, with z, z' samples on the hidden variable space; $\hat{z} = E(G(z))$ denotes the reconstruction result of the hidden variable z, i.e., the re-encoding result of the generated data; $P_{\hat{z}}$ denotes the conditional probability distribution of $\hat{z}$; $\lambda_1$ denotes the gradient penalty term weight parameter; $\nabla_{\tilde{z}} D_z(\tilde{z})$ denotes the gradient taken on $\tilde{z}$; $\sigma$ denotes the variance coefficient of the Gaussian distribution that $k_{rep}$ conforms to; e is the base constant of the natural logarithm (≈ 2.718281828459…); $\varepsilon$ denotes the weight parameter used in the weighted sampling between z and its reconstruction result $\hat{z}$; $k_F$ denotes the kernel function (a bounded Gaussian kernel); and $b_l$, $b_u$ denote the lower- and upper-bound parameters for the L2 distance between a and b.
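A sketch of the bounded Gaussian kernel and the resulting batch MMD² estimate between the Dz embeddings of z and of E(G(z)) follows; the bound values and kernel width are illustrative assumptions, not values from the patent.

```python
import torch

def k_f(a, b, sigma=1.0, b_l=0.25, b_u=4.0):
    # Bounded Gaussian kernel: the squared L2 distance is clipped to
    # [b_l, b_u] before the exponential, as in eq. (9).
    d2 = torch.cdist(a, b).pow(2).clamp(min=b_l, max=b_u)
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd2(y_z, y_zhat, **kw):
    # Biased batch estimate of MMD^2 between y_z = Dz(z) and
    # y_zhat = Dz(E(G(z))); diagonal self-terms are included.
    return (k_f(y_z, y_z, **kw).mean()
            - 2 * k_f(y_z, y_zhat, **kw).mean()
            + k_f(y_zhat, y_zhat, **kw).mean())
```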
For the encoder-decoder, in order to prevent excessive element-level errors, the L2 loss is used as a regularization term. The guiding term $T_z$ with which the encoder enhances its semantic characterization capability is finally defined as:

$$T_z = L_{EG}^{MMD} + \alpha_1 \cdot \frac{1}{n_b} \sum_{i=1}^{n_b} \|z_i - \hat{z}_i\|_2^2 \tag{10}$$

wherein $\alpha_1$ denotes the weighting coefficient between $L_{EG}^{MMD}$ and the pixel-level $l_2$ loss of the reconstruction operation; $n_b$ denotes the data batch size; and $\|z_i - \hat{z}_i\|_2^2$ is the $l_2$ loss between the original hidden variable z and the reconstruction result $\hat{z}$.
3) human eye visual feature generation capability of enhanced generator introduced with SSIM module
One of the goals of interest of Bi-GAE in generating image data is to facilitate image generation and reconstruction in accordance with the human visual model. In order to realize information expansion in the process of mapping from a hidden variable space to an image space, the Bi-GAE introduces a Structural Similarity Index Method (SSIM), which comprises three human visual indicators, namely brightness (luminance), contrast (contrast) and structure (structure). Therefore, Bi-GAE designs a guiding term T between the real image x and the reconstructed image G (E (x)))x
When training the encoder E and the generator G, we compute x' = G(E(x)). If x has size $(s_i \times s_i)$ and there are m Gaussian filters, MS-SSIM (multi-scale SSIM) is computed as follows:

$$MSSSIM(x, x') = l_m^{\gamma_m}(x, x') \cdot \prod_{j=1}^{m} cs_j^{\eta_j}(x, x') \tag{11}$$

wherein $l_m^{\gamma_m}$ denotes the result of l(x, x') after the m Gaussian filters, with $\gamma$ an attenuation parameter and m the number of Gaussian filters; $c_1$ is a constant parameter; $cs_j^{\eta_j}$ denotes the result produced by the j-th filter, with $\eta_j$ the attenuation coefficient of the corresponding window; if x has size $(s_i \times s_i)$, the j-th filter window size is $(s_i/2^{m-j}) \times (s_i/2^{m-j})$. l(x, x') and cs(x, x') are defined as follows:

$$l(x, x') = \frac{2\mu_x \mu_{x'} + c_1}{\mu_x^2 + \mu_{x'}^2 + c_1}, \qquad cs(x, x') = \frac{2\sigma_{xx'} + c_2}{\sigma_x^2 + \sigma_{x'}^2 + c_2}$$

wherein $p_j$ denotes the filter window; $\mu_x$, $\mu_{x'}$ denote the means of x and x'; $\sigma_x$, $\sigma_{x'}$ denote the standard deviations of x and x'; and $\sigma_{xx'}$ denotes their covariance.
The corresponding design SSIM loss function is:
LMSSSIM(x,x′)=1-MSSSIM(x,x′) (12)
analogy TzIn order to avoid excessive pixel-level errors during reconstruction, we need to add a regular term, considering that the dimension of the image space is high, and in order to prevent the problem of potential excessive penalty of the L2 loss function, we use the L1 loss function to implement the regular term here, and in conclusion, we design the self-guiding term T for the generatorxAs follows:
$$T_x = \alpha_2\, L_{MS\text{-}SSIM}(x,x') + (1-\alpha_2)\, l_1(x,x') \quad (13)$$

wherein x' = G(E(x)); $\alpha_2$ (0.84 by default) denotes the weighting coefficient between the SSIM loss and the $l_1$ loss; $l_1(x,x')$ denotes the $l_1$ loss value between x and its reconstruction result.
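As an illustration, the following sketch computes $T_x$ of equation (13), assuming PyTorch, single-channel images scaled to [0, 1], and a single-scale SSIM standing in for the multi-scale form of equation (11); all names and the window parameters are ours:

import torch
import torch.nn.functional as F

def gaussian_window(size: int = 11, sigma: float = 1.5) -> torch.Tensor:
    coords = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-coords.pow(2) / (2 * sigma ** 2))
    g = (g / g.sum()).unsqueeze(0)
    return (g.t() @ g).view(1, 1, size, size)   # separable 2-D Gaussian window

def ssim(x: torch.Tensor, y: torch.Tensor,
         c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> torch.Tensor:
    # x, y: (N, 1, H, W) images in [0, 1].
    w = gaussian_window().to(x.device)
    mu_x, mu_y = F.conv2d(x, w, padding=5), F.conv2d(y, w, padding=5)
    var_x = F.conv2d(x * x, w, padding=5) - mu_x ** 2
    var_y = F.conv2d(y * y, w, padding=5) - mu_y ** 2
    cov = F.conv2d(x * y, w, padding=5) - mu_x * mu_y
    l = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)   # luminance term
    cs = (2 * cov + c2) / (var_x + var_y + c2)                  # contrast-structure term
    return (l * cs).mean()

def guide_term_tx(x: torch.Tensor, x_rec: torch.Tensor,
                  alpha2: float = 0.84) -> torch.Tensor:
    # T_x = alpha2 * (1 - SSIM) + (1 - alpha2) * l1, with alpha2 = 0.84 by default.
    return alpha2 * (1.0 - ssim(x, x_rec)) + (1.0 - alpha2) * F.l1_loss(x, x_rec)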
System implementation
Each component of Bi-GAE is implemented based on the source code of DCGAN. Let $\theta_E$, $\theta_G$, $\theta_D$, $\theta_{Dz}$ and $\theta_F = \{\theta_{Fx}, \theta_{Fz}\}$ denote the parameters of E, G, D, Dz and {Fx, Fz}, respectively. Accordingly, the present invention uses three custom Adam optimizers with β1 = 0.5 and β2 = 0.9: Adam_EG for optimizing $\theta_G$ and $\theta_E$, Adam_FD for optimizing $\theta_F$ and $\theta_D$, and Adam_Dz for optimizing $\theta_{Dz}$, with learning rates lr_EG, lr_FD and lr_Dz, respectively.
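For concreteness, the optimizer setup could look as follows (a minimal sketch, assuming PyTorch; the stand-in modules and the learning-rate values are ours):

import itertools
import torch

# Stand-in modules; in the real system these are the DCGAN-based networks.
E, G, D, Dz, Fx, Fz = (torch.nn.Linear(8, 8) for _ in range(6))
lr_EG = lr_FD = lr_Dz = 2e-4      # illustrative learning rates
betas = (0.5, 0.9)                # beta1 = 0.5, beta2 = 0.9 as stated above

adam_EG = torch.optim.Adam(itertools.chain(G.parameters(), E.parameters()),
                           lr=lr_EG, betas=betas)    # optimizes theta_G and theta_E
adam_FD = torch.optim.Adam(itertools.chain(Fx.parameters(), Fz.parameters(),
                                           D.parameters()),
                           lr=lr_FD, betas=betas)    # optimizes theta_F and theta_D
adam_Dz = torch.optim.Adam(Dz.parameters(), lr=lr_Dz, betas=betas)  # optimizes theta_Dz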
Bi-GAE training comprises the following 4 steps:
Step 1: train D and {Fx, Fz} on a batch of data and latent samples (x, z) using the loss function in equation (4), i.e., the Wasserstein loss of the joint distribution over the data space and the latent space as judged by the discriminator D. The loss $L_{DF}$ is trained iteratively and is used to update the discriminator D and the convolution feature extraction modules F; this step is repeated $n_D$ times (default 5).
Step 2: train G and E with another batch of data (x, z), using the loss function $L_{EG}$ in equation (5). As equation (5) shows, this loss is a weighted combination of the joint-distribution Wasserstein distance judged by the optimized discriminator and the SSIM difference loss between the data x and its reconstruction result. Given x, compute x' = G(E_ng(x)) (E_ng denoting the encoder output used without gradient propagation) and use (x, x') to calculate $T_x$ in equation (13) to train G.
Step 3: input a batch of z to train Dz, performing iterative training with the loss function $L_{Dz}$ in equation (8); this loss is the MMD loss between z and its reconstruction result E(G(z)). This step is repeated $n_{Dz}$ times (default 3).
Step 4: input a batch of z to calculate $\hat{z} = E(G(z))$; input $(z, \hat{z})$ and train E with the loss function $T_z$ in equation (10).
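Putting the four steps together, a minimal training-loop sketch follows, assuming PyTorch; the toy linear modules and the simplified stand-in losses are ours and only illustrate the schedule, not equations (4)-(10) in full:

import itertools
import torch

dim_x, dim_z = 16, 4
n_D, n_Dz = 5, 3                                   # default repeat counts
E  = torch.nn.Linear(dim_x, dim_z)                 # encoder
G  = torch.nn.Linear(dim_z, dim_x)                 # generator
D  = torch.nn.Linear(dim_x + dim_z, 1)             # joint discriminator (Fx, Fz folded in)
Dz = torch.nn.Linear(dim_z, dim_z)                 # embedded MMD discriminator

betas = (0.5, 0.9)
adam_EG = torch.optim.Adam(itertools.chain(E.parameters(), G.parameters()),
                           lr=2e-4, betas=betas)
adam_FD = torch.optim.Adam(D.parameters(), lr=2e-4, betas=betas)
adam_Dz = torch.optim.Adam(Dz.parameters(), lr=2e-4, betas=betas)

def sample_batch(n: int = 32):
    return torch.randn(n, dim_x), torch.randn(n, dim_z)

def critic_gap() -> torch.Tensor:
    # Wasserstein-style gap between the joint pairs (x, E(x)) and (G(z), z).
    x, z = sample_batch()
    return (D(torch.cat([x, E(x)], 1)).mean()
            - D(torch.cat([G(z), z], 1)).mean())

for it in range(100):
    # Step 1: train D (with the feature extractors) n_D times on the
    # joint-distribution Wasserstein loss of equation (4).
    for _ in range(n_D):
        adam_FD.zero_grad()
        (-critic_gap()).backward()                 # the critic maximizes the gap
        adam_FD.step()

    # Step 2: train G and E with L_EG of equation (5); the SSIM guide term
    # T_x of equation (13) would be added to this loss.
    adam_EG.zero_grad()
    critic_gap().backward()                        # E and G minimize the gap
    adam_EG.step()

    # Step 3: train Dz n_Dz times; a critic-style gap stands in here for the
    # MMD loss of equation (8) between z and E(G(z)).
    for _ in range(n_Dz):
        _, z = sample_batch()
        z_rec = E(G(z)).detach()
        loss_dz = -(Dz(z).mean() - Dz(z_rec).mean())
        adam_Dz.zero_grad(); loss_dz.backward(); adam_Dz.step()

    # Step 4: train E with the guide term T_z of equation (10); plain l2
    # reconstruction stands in for the MMD part, and G is frozen here.
    _, z = sample_batch()
    t_z = (z - E(G(z).detach())).pow(2).sum(1).mean()
    adam_EG.zero_grad(); t_z.backward(); adam_EG.step()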
The settings of the parameters used during training and testing on the CelebA-HQ and MNIST data sets throughout this process are shown in Table 1.
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatuses, and their various modules provided by the present invention purely as computer readable program code, the method steps can be logically programmed so that the systems, apparatuses, and their various modules are realized in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Therefore, the systems, apparatuses, and modules provided by the present invention may be regarded as hardware components, and the modules included therein for implementing various programs may also be regarded as structures within the hardware components; modules for performing various functions may likewise be regarded both as software programs for implementing the methods and as structures within the hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. An unsupervised bidirectional generation automatic coding method for convergence guarantee is characterized by comprising the following steps:
step S1: the batch data (x, z) simultaneously generates an encoding result E (x) and generated data G (z) through an encoder E and a generator G, and the mapping from the hidden variable space to the data space and the reverse mapping from the data space to the hidden variable space are completed;
step S2: passing the image space data and the hidden variable space data through convolution blocks F_x and F_z, respectively, so as to extract information and obtain extracted image space data and extracted hidden variable space data;
step S3: training a discriminator D on the extracted image space data and the extracted hidden variable space data until the loss function is minimized;
step S4: the image data is encoded by the trained encoder E to generate an encoding result, the encoding result is input into the trained generator G to obtain a reconstructed image data result, the reconstruction of the image data is completed, the overall collaborative optimization of the image space and hidden variable space bidirectional mapping process is realized, and the representation capability and the image generation capability are improved.
2. The convergence-guarantee-oriented unsupervised bidirectional generation automatic encoding method of claim 1, wherein the generator G and the encoder E use the deconvolution network module and the convolution network module of the DCGAN structure, respectively.
3. The convergence-guarantee-oriented unsupervised bidirectional generation automatic encoding method of claim 1, wherein the loss function in the step S3 comprises a Wasserstein distance:

$$W = \mathbb{E}_{x \sim P_x}\big[D\big(F_x(x), F_z(E(x))\big)\big] - \mathbb{E}_{z \sim P_z}\big[D\big(F_x(G(z)), F_z(z)\big)\big]$$

wherein W represents the Wasserstein distance between the joint distributions; D represents the likelihood probability of the input data pair; F_x and F_z represent the convolution blocks.
4. The convergence-guarantee-oriented unsupervised bidirectional generation automatic coding method according to claim 1, further comprising: introducing an embedded MMD discriminator module Dz that reuses the convolution blocks Fx and Fz, so that the encoder E and the generator G realize an embedded GAN network; the embedded GAN network is utilized to reduce the MMD distance between the distributions of z and the reconstruction result E(G(z)), thereby strengthening the convergence of the whole bidirectional structure and the semantically related characterization capability of the encoder.
5. The convergence-guarantee-oriented unsupervised bidirectional generation automatic coding method of claim 4, wherein the reducing of the MMD distance between the distributions of z and the reconstruction result E(G(z)) by using the embedded GAN network, so as to strengthen the convergence of the whole bidirectional structure and the semantically related characterization capability of the encoder, comprises:
step S5: the encoding result E (x) is reconstructed by a generator G to generate data G (E (x)); generating data g (z) and reconstructing an encoding result E (g (z)) by an encoder E;
step S6: calculating, by using the embedded MMD discriminator module Dz, the MMD difference loss $\mathcal{L}_{D_z}^{MMD}$ between the distributions of z and E(G(z)), and optimizing the embedded MMD discriminator module Dz with this loss function:

$$\mathrm{MMD}_{k_{rep}}^2(P_z, P_{\hat z}) = \mathbb{E}_{z,z' \sim P_z}\big[k_{rep}(y_z, y_{z'})\big] + \mathbb{E}_{\hat z,\hat z' \sim P_{\hat z}}\big[k_{rep}(y_{\hat z}, y_{\hat z'})\big] - 2\,\mathbb{E}\big[k_{rep}(y_z, y_{\hat z})\big]$$

$$\mathcal{L}_{D_z}^{MMD} = -\mathrm{MMD}_{k_{rep}}^2(P_z, P_{\hat z}) + \lambda_1\,\mathbb{E}_{\tilde z}\Big[\big(\|\nabla_{\tilde z} D_z(\tilde z)\|_2 - 1\big)^2\Big],\qquad \tilde z = \varepsilon z + (1-\varepsilon)\hat z,\qquad k_{rep}(a,b) = e^{-\frac{\|a-b\|_2^2}{2\sigma^2}}$$

wherein $P_z$ represents the hidden variable space distribution; $y_z = D_z(z)$, $y_{z'} = D_z(z')$; z, z' represent samples on the hidden variable space; $\hat z = E(G(z))$ represents the reconstruction result of the hidden variable z, i.e., the re-encoding of the generated data; $P_{\hat z}$ represents the conditional probability distribution of $\hat z$; $\lambda_1$ represents the gradient penalty term weight parameter; $\nabla_{\tilde z} D_z(\tilde z)$ represents the gradient acquired on the interpolated sample $\tilde z$; σ denotes the variance coefficient of the Gaussian kernel $k_{rep}$; e is the base constant of the natural logarithm; ε represents the weight parameter used in the weighted sampling between z and its reconstruction result $\hat z$;
step S7: solving, by using the optimized embedded MMD discriminator module Dz, the MMD loss $\mathcal{L}_{EG}^{MMD}$ between the distributions of z and E(G(z)) with respect to the encoder E and the generator G:

$$\mathcal{L}_{EG}^{MMD} = \mathbb{E}_{z,z' \sim P_z}\big[k_F(y_z, y_{z'})\big] + \mathbb{E}_{\hat z,\hat z' \sim P_{\hat z}}\big[k_F(y_{\hat z}, y_{\hat z'})\big] - 2\,\mathbb{E}\big[k_F(y_z, y_{\hat z})\big]$$

$$k_F(a,b) = \exp\!\left(-\frac{\min\{\max\{\|a-b\|_2^2,\; b_l\},\; b_u\}}{2\sigma^2}\right)$$

wherein $k_F$ represents a Gaussian kernel function; $b_l$, $b_u$ represent the lower-bound and upper-bound parameters applied to the (squared) L2 distance between a and b, respectively;
step S8: generating a guide item $T_z$ on the coding space by using the loss function $\mathcal{L}_{EG}^{MMD}$, thereby completing the training of the embedded GAN network:

$$T_z = \mathcal{L}_{EG}^{MMD} + \frac{\propto_1}{n_b}\sum_{i=1}^{n_b}\big\|z_i - E(G(z_i))\big\|_2^2$$

wherein $\propto_1$ represents the weighting coefficient between $\mathcal{L}_{EG}^{MMD}$ and the pixel-level $l_2$ loss of the reconstruction operation; $n_b$ represents the data batch size; $\|z - E(G(z))\|_2^2$ is the $l_2$ loss between the original hidden variable z and the reconstruction result E(G(z));
step S9: the convergence of the current Bi-GAE bidirectional structure is optimized through the trained embedded GAN network, and the semantic representation capability of the encoder E is further improved through the generated guide items.
6. The convergence-guarantee-oriented unsupervised bidirectional generation automatic coding method of claim 5, further comprising: calculating a guide item $T_x$ by using an SSIM module according to the real image x and the reconstructed image G(E(x)), so as to enhance the human visual feature generation capability of the generator;
SSIM concerns luminance (measured by l(x,x')), and contrast and structure (measured jointly by cs(x,x')):

$$\mathrm{MS\text{-}SSIM}(x,x') = l_m(x,x')^{\gamma_m} \prod_{j=1}^{m} cs_j(x,x')^{\eta_j}$$

$$L_{MS\text{-}SSIM}(x,x') = 1 - \mathrm{MS\text{-}SSIM}(x,x') \quad (8)$$

$$T_x = \alpha_2\, L_{MS\text{-}SSIM}(x,x') + (1-\alpha_2)\, l_1(x,x')$$

$$l(x,x') = \frac{2\mu_x\mu_{x'} + c_1}{\mu_x^2 + \mu_{x'}^2 + c_1},\qquad cs(x,x') = \frac{2\sigma_{xx'} + c_2}{\sigma_x^2 + \sigma_{x'}^2 + c_2}$$

wherein x' = G(E(x)); $L_{MS\text{-}SSIM}(x,x')$ denotes the SSIM loss function; $\alpha_2$ denotes the weighting coefficient between the SSIM loss and the $l_1$ loss; $l_1(x,x')$ denotes the $l_1$ loss value between x and the reconstruction result; $l_m(x,x')^{\gamma_m}$ represents the result of l(x,x') after passing through m Gaussian filters; γ represents an attenuation parameter; m represents the number of Gaussian filters; $c_1$, $c_2$ represent constant parameters; $cs_j(x,x')^{\eta_j}$ represents the result generated by the j-th filter; $\eta_j$ represents the attenuation coefficient of the corresponding window; if x is of size (s_i × s_i), the j-th filter window size follows from the dyadic downsampling of s_i between scales; $p_j$ represents the filter window; $\mu_x$, $\mu_{x'}$ respectively represent the means of x and x'; $\sigma_x$, $\sigma_{x'}$ respectively represent the standard deviations of x and x', and $\sigma_{xx'}$ their covariance.
7. An unsupervised bi-directional generation automatic coding system for convergence guarantee, comprising:
module M1: the batch data (x, z) simultaneously generates an encoding result E (x) and generated data G (z) through an encoder E and a generator G, and the mapping from the hidden variable space to the data space and the reverse mapping from the data space to the hidden variable space are completed;
module M2: the image space data and the hidden variable space data are respectively passed through convolution blocks F_x and F_z, which extract information to obtain extracted image space data and extracted hidden variable space data;
module M3: a discriminator D is trained on the extracted image space data and the extracted hidden variable space data until the loss function is minimized;
module M4: the image data is encoded by the trained encoder E to generate an encoding result, the encoding result is input into the trained generator G to obtain a reconstructed image data result, the reconstruction of the image data is completed, the overall collaborative optimization of the image space and hidden variable space bidirectional mapping process is realized, and the representation capability and the image generation capability are improved.
8. The convergence-guarantee-oriented unsupervised bidirectional generation automatic coding system of claim 7, wherein the loss function in the module M3 comprises a Wasserstein distance:

$$W = \mathbb{E}_{x \sim P_x}\big[D\big(F_x(x), F_z(E(x))\big)\big] - \mathbb{E}_{z \sim P_z}\big[D\big(F_x(G(z)), F_z(z)\big)\big]$$

wherein W represents the Wasserstein distance between the joint distributions; D represents the likelihood probability of the input data pair; F_x and F_z represent the convolution blocks.
9. The convergence-guarantee-oriented unsupervised bidirectional generation automatic encoding system of claim 7, further comprising: an embedded MMD discriminator module Dz that reuses the convolution blocks Fx and Fz, so that the encoder E and the generator G realize an embedded GAN network, and the embedded GAN network is utilized to reduce the MMD distance between the distributions of z and the reconstruction result E(G(z)), thereby strengthening the convergence of the whole bidirectional structure and the semantically related characterization capability of the encoder;

wherein the reducing of the MMD distance between the distributions of z and the reconstruction result E(G(z)) by using the embedded GAN network comprises:
module M5: the encoding result E (x) is reconstructed by a generator G to generate data G (E (x)); generating data g (z) and reconstructing an encoding result E (g (z)) by an encoder E;
module M6: calculating, by using the embedded MMD discriminator module Dz, the MMD difference loss $\mathcal{L}_{D_z}^{MMD}$ between the distributions of z and E(G(z)), and optimizing the embedded MMD discriminator module Dz with this loss function:

$$\mathrm{MMD}_{k_{rep}}^2(P_z, P_{\hat z}) = \mathbb{E}_{z,z' \sim P_z}\big[k_{rep}(y_z, y_{z'})\big] + \mathbb{E}_{\hat z,\hat z' \sim P_{\hat z}}\big[k_{rep}(y_{\hat z}, y_{\hat z'})\big] - 2\,\mathbb{E}\big[k_{rep}(y_z, y_{\hat z})\big]$$

$$\mathcal{L}_{D_z}^{MMD} = -\mathrm{MMD}_{k_{rep}}^2(P_z, P_{\hat z}) + \lambda_1\,\mathbb{E}_{\tilde z}\Big[\big(\|\nabla_{\tilde z} D_z(\tilde z)\|_2 - 1\big)^2\Big],\qquad \tilde z = \varepsilon z + (1-\varepsilon)\hat z,\qquad k_{rep}(a,b) = e^{-\frac{\|a-b\|_2^2}{2\sigma^2}}$$

wherein $P_z$ represents the hidden variable space distribution; $y_z = D_z(z)$, $y_{z'} = D_z(z')$; z, z' represent samples on the hidden variable space; $\hat z = E(G(z))$ represents the reconstruction result of the hidden variable z, i.e., the re-encoding of the generated data; $P_{\hat z}$ represents the conditional probability distribution of $\hat z$; $\lambda_1$ represents the gradient penalty term weight parameter; $\nabla_{\tilde z} D_z(\tilde z)$ represents the gradient acquired on the interpolated sample $\tilde z$; σ denotes the variance coefficient of the Gaussian kernel $k_{rep}$; e is the base constant of the natural logarithm; ε represents the weight parameter used in the weighted sampling between z and its reconstruction result $\hat z$;
module M7: solving, by using the optimized embedded MMD discriminator module Dz, the MMD loss $\mathcal{L}_{EG}^{MMD}$ between the distributions of z and E(G(z)) with respect to the encoder E and the generator G:

$$\mathcal{L}_{EG}^{MMD} = \mathbb{E}_{z,z' \sim P_z}\big[k_F(y_z, y_{z'})\big] + \mathbb{E}_{\hat z,\hat z' \sim P_{\hat z}}\big[k_F(y_{\hat z}, y_{\hat z'})\big] - 2\,\mathbb{E}\big[k_F(y_z, y_{\hat z})\big]$$

$$k_F(a,b) = \exp\!\left(-\frac{\min\{\max\{\|a-b\|_2^2,\; b_l\},\; b_u\}}{2\sigma^2}\right)$$

wherein $k_F$ represents a Gaussian kernel function; $b_l$, $b_u$ represent the lower-bound and upper-bound parameters applied to the (squared) L2 distance between a and b, respectively;
module M8: generating a guide item $T_z$ on the coding space by using the loss function $\mathcal{L}_{EG}^{MMD}$, thereby completing the training of the embedded GAN network:

$$T_z = \mathcal{L}_{EG}^{MMD} + \frac{\propto_1}{n_b}\sum_{i=1}^{n_b}\big\|z_i - E(G(z_i))\big\|_2^2$$

wherein $\propto_1$ represents the weighting coefficient between $\mathcal{L}_{EG}^{MMD}$ and the pixel-level $l_2$ loss of the reconstruction operation; $n_b$ represents the data batch size; $\|z - E(G(z))\|_2^2$ is the $l_2$ loss between the original hidden variable z and the reconstruction result E(G(z));
module M9: the convergence of the current Bi-GAE bidirectional structure is optimized through the trained embedded GAN network, and the semantic representation capability of the encoder E is further improved through the generated guide items.
10. The convergence-guarantee-oriented unsupervised bidirectional generation automatic encoding system of claim 9, further comprising: calculating a guide item $T_x$ by using an SSIM module according to the real image x and the reconstructed image G(E(x)), so as to enhance the human visual feature generation capability of the generator;
SSIM concerns luminance (measured by l(x,x')), and contrast and structure (measured jointly by cs(x,x')):

$$\mathrm{MS\text{-}SSIM}(x,x') = l_m(x,x')^{\gamma_m} \prod_{j=1}^{m} cs_j(x,x')^{\eta_j}$$

$$L_{MS\text{-}SSIM}(x,x') = 1 - \mathrm{MS\text{-}SSIM}(x,x') \quad (19)$$

$$T_x = \alpha_2\, L_{MS\text{-}SSIM}(x,x') + (1-\alpha_2)\, l_1(x,x')$$

$$l(x,x') = \frac{2\mu_x\mu_{x'} + c_1}{\mu_x^2 + \mu_{x'}^2 + c_1},\qquad cs(x,x') = \frac{2\sigma_{xx'} + c_2}{\sigma_x^2 + \sigma_{x'}^2 + c_2}$$

wherein x' = G(E(x)); $L_{MS\text{-}SSIM}(x,x')$ denotes the SSIM loss function; $\alpha_2$ denotes the weighting coefficient between the SSIM loss and the $l_1$ loss; $l_1(x,x')$ denotes the $l_1$ loss value between x and the reconstruction result; $l_m(x,x')^{\gamma_m}$ represents the result of l(x,x') after passing through m Gaussian filters; γ represents an attenuation parameter; m represents the number of Gaussian filters; $c_1$, $c_2$ represent constant parameters; $cs_j(x,x')^{\eta_j}$ represents the result generated by the j-th filter; $\eta_j$ represents the attenuation coefficient of the corresponding window; if x is of size (s_i × s_i), the j-th filter window size follows from the dyadic downsampling of s_i between scales; $p_j$ represents the filter window; $\mu_x$, $\mu_{x'}$ respectively represent the means of x and x'; $\sigma_x$, $\sigma_{x'}$ respectively represent the standard deviations of x and x', and $\sigma_{xx'}$ their covariance.
CN202110678193.6A 2021-06-18 2021-06-18 Automatic encoding method and system for unsupervised bidirectional generation oriented to convergence guarantee Active CN113298895B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110678193.6A CN113298895B (en) 2021-06-18 2021-06-18 Automatic encoding method and system for unsupervised bidirectional generation oriented to convergence guarantee

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110678193.6A CN113298895B (en) 2021-06-18 2021-06-18 Automatic encoding method and system for unsupervised bidirectional generation oriented to convergence guarantee

Publications (2)

Publication Number Publication Date
CN113298895A true CN113298895A (en) 2021-08-24
CN113298895B CN113298895B (en) 2023-05-12

Family

ID=77328729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110678193.6A Active CN113298895B (en) 2021-06-18 2021-06-18 Automatic encoding method and system for unsupervised bidirectional generation oriented to convergence guarantee

Country Status (1)

Country Link
CN (1) CN113298895B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114330514A (en) * 2021-12-14 2022-04-12 深圳大学 Data reconstruction method and system based on depth features and gradient information
CN115242250A (en) * 2022-09-21 2022-10-25 成都工业学院 Encoding and decoding method for single-full mapping of multi-value chain data element allocation

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109146988A (en) * 2018-06-27 2019-01-04 南京邮电大学 Non-fully projection CT image rebuilding method based on VAEGAN
CN109523463A (en) * 2018-11-20 2019-03-26 中山大学 A kind of face aging method generating confrontation network based on condition
CN110751698A (en) * 2019-09-27 2020-02-04 太原理工大学 Text-to-image generation method based on hybrid network model
CN110866958A (en) * 2019-10-28 2020-03-06 清华大学深圳国际研究生院 Method for text to image
US10652565B1 (en) * 2017-10-12 2020-05-12 Amazon Technologies, Inc. Image compression and decompression using embeddings
CN111340791A (en) * 2020-03-02 2020-06-26 浙江浙能技术研究院有限公司 Photovoltaic module unsupervised defect detection method based on GAN improved algorithm
CN112070209A (en) * 2020-08-13 2020-12-11 河北大学 Stable controllable image generation model training method based on W distance
CN112424779A (en) * 2018-07-13 2021-02-26 映佳控制公司 Method and system for generating synthetic anonymous data for given task

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10652565B1 (en) * 2017-10-12 2020-05-12 Amazon Technologies, Inc. Image compression and decompression using embeddings
CN109146988A (en) * 2018-06-27 2019-01-04 南京邮电大学 Non-fully projection CT image rebuilding method based on VAEGAN
CN112424779A (en) * 2018-07-13 2021-02-26 映佳控制公司 Method and system for generating synthetic anonymous data for given task
CN109523463A (en) * 2018-11-20 2019-03-26 中山大学 A kind of face aging method generating confrontation network based on condition
CN110751698A (en) * 2019-09-27 2020-02-04 太原理工大学 Text-to-image generation method based on hybrid network model
CN110866958A (en) * 2019-10-28 2020-03-06 清华大学深圳国际研究生院 Method for text to image
CN111340791A (en) * 2020-03-02 2020-06-26 浙江浙能技术研究院有限公司 Photovoltaic module unsupervised defect detection method based on GAN improved algorithm
CN112070209A (en) * 2020-08-13 2020-12-11 河北大学 Stable controllable image generation model training method based on W distance

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JEFF DONAHUE 等: "ADVERSARIAL FEATURE LEARNING", 《ARXIV:1605.09782V7》 *
MARTIN ARJOVSKY 等: "Wasserstein GAN", 《ARXIV:1701.07875V3》 *
SHENG MAO 等: "Discriminative Autoencoding Framework for Simple and Efficient Anomaly Detection", 《DIGITAL OBJECT IDENTIFIER》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114330514A (en) * 2021-12-14 2022-04-12 深圳大学 Data reconstruction method and system based on depth features and gradient information
CN114330514B (en) * 2021-12-14 2024-04-05 深圳大学 Data reconstruction method and system based on depth features and gradient information
CN115242250A (en) * 2022-09-21 2022-10-25 成都工业学院 Encoding and decoding method for single-full mapping of multi-value chain data element allocation

Also Published As

Publication number Publication date
CN113298895B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
CN109754403A (en) Tumour automatic division method and system in a kind of CT image
Marimont et al. Anomaly detection through latent space restoration using vector quantized variational autoencoders
Li et al. Scconv: spatial and channel reconstruction convolution for feature redundancy
CN115409733B (en) Low-dose CT image noise reduction method based on image enhancement and diffusion model
CN111932444A (en) Face attribute editing method based on generation countermeasure network and information processing terminal
CN113298895A (en) Convergence guarantee-oriented unsupervised bidirectional generation automatic coding method and system
CN112541864A (en) Image restoration method based on multi-scale generation type confrontation network model
CN113822437A (en) Deep layered variational automatic encoder
CN112233012A (en) Face generation system and method
CN117058307A (en) Method, system, equipment and storage medium for generating heart three-dimensional nuclear magnetic resonance image
CN114332287A (en) Method, device, equipment and medium for reconstructing PET (positron emission tomography) image based on transformer feature sharing
CN113538608A (en) Controllable character image generation method based on generation countermeasure network
CN111626296A (en) Medical image segmentation system, method and terminal based on deep neural network
AU2022288157A1 (en) Method for producing an image of expected results of medical cosmetic treatments on a human anatomical feature from an image of the anatomical feature prior to these medical cosmetic treatments
Chen et al. Self-supervised neuron segmentation with multi-agent reinforcement learning
Yang et al. Low‐dose CT denoising with a high‐level feature refinement and dynamic convolution network
Andersson et al. Evaluation of data augmentation of MR images for deep learning
Poonkodi et al. 3d-medtrancsgan: 3d medical image transformation using csgan
Zwettler et al. Strategies for training deep learning models in medical domains with small reference datasets
Tang et al. A deep map transfer learning method for face recognition in an unrestricted smart city environment
CN112541566B (en) Image translation method based on reconstruction loss
Jeon et al. Continuous face aging generative adversarial networks
CN115482557A (en) Human body image generation method, system, device and storage medium
Ren et al. Medical image super-resolution based on semantic perception transfer learning
Ni et al. Natural Image Reconstruction from fMRI Based on Self-supervised Representation Learning and Latent Diffusion Model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant