CN109447906B - Picture synthesis method based on generative adversarial network - Google Patents

Picture synthesis method based on generative adversarial network

Info

Publication number: CN109447906B
Application number: CN201811325648.0A
Authority: CN (China)
Prior art keywords: network, loss, picture, pictures, sample
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN109447906A
Inventors: 解凯, 何翊卿, 李桐, 李婷, 孙磬宇
Current assignee: Beijing Institute of Graphic Communication
Original assignee: Beijing Institute of Graphic Communication
Application filed by Beijing Institute of Graphic Communication; priority to CN201811325648.0A
Published as CN109447906A (application); application granted; published as CN109447906B (grant)

Classifications

    • G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING; G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING; G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning


Abstract

The invention relates to a picture synthesis method based on a generative adversarial network, which extracts and fuses features from pictures in different domains to generate a new picture. The method comprises the following steps: first, collect and organize picture samples and group them so that each group of pictures shares the same characteristics; next, construct the generative adversarial network and initialize the network parameters; then, select a suitable loss function and optimization method; then, feed the samples into the generative adversarial network and begin training; finally, adjust the network parameters according to the training results so as to obtain better results. The invention synthesizes image contents into a new image while simplifying manual operation and improving working efficiency.

Description

Picture synthesis method based on generative adversarial network
Technical Field
The invention relates to a picture synthesis method based on a generative adversarial network, belonging to the technical field of deep learning and digital graphic image processing.
Background
With the development of computer hardware and neural networks, artificial intelligence is gaining attention and plays an increasingly important role in people's lives. Deep learning stems from the development of neural networks; the concept was proposed by Hinton et al. in 2006, with the aim of simulating the human brain's analysis and interpretation of data. The hope is to find, through deep learning, a deep neural network model that can represent the probability distributions of the various kinds of data encountered in artificial intelligence applications, including image processing and natural language processing.
Deep learning can be classified into supervised learning, semi-supervised learning, unsupervised learning, and so on. The generative adversarial network is a typical and very promising form of unsupervised learning: a neural network model that estimates a generative model through an adversarial process, whose optimization is a two-player minimax game. However, the original generative adversarial network suffers from unstable training and vanishing gradients, and mode collapse (Mode Collapse) also occurs frequently. DRAGAN treats the training of a GAN as a regret-minimization process, characterized by both parties to the repeated game using no-regret algorithms. DRAGAN has the advantages of fast training and less mode collapse once it reaches a stable state. InfoGAN, meanwhile, is a network model whose input components become interpretable through unsupervised training (Unsupervised Training); it can directly control the variation of the generated samples by varying the components of the input vectors.
As technology evolves, pictures have become an integral part of people's lives, and the demands of work, life, and other areas often call for combining a picture with different forms of presentation. For example, at work a person often needs to blend himself or herself, or a designated person, into a designated landscape. Currently, picture-processing software (such as Photoshop) is needed to complete the synthesis of the contents of two pictures. Compared with using a generative adversarial network, however, this approach requires more time, is cumbersome to operate, and requires the user to be familiar with the software.
Currently, scholars have proposed using generative adversarial networks for cross-domain image style transfer, such as CycleGAN, DiscoGAN, DualGAN, and StarGAN. CycleGAN, DiscoGAN, and DualGAN learn the features of two domains and transfer these features to each other, so that the resulting picture has the features of both domains. For example, CycleGAN can, through training, change pictures of horses into pictures of zebras, and can also change a summer landscape picture into its winter appearance; DiscoGAN can convert a real photo taken by a camera into a picture in the style of Van Gogh while retaining most of the photo's details; StarGAN can be used to transfer the picture styles of any two domains to each other, or to characterize a picture with more than two domains. These methods aim at migrating the style of an image, i.e., transferring characteristics such as the style and color of one image; the present invention instead aims at retaining the features and details of the pictures of both domains, so that a person can be added to a landscape picture without the person's position and lighting effects conflicting with the environment. The fundamental difference between the two is this: CycleGAN takes a picture in one domain as the original picture and adds to it some features of the other domain, so the content of the generated picture is mainly determined by the original picture; the invention extracts one picture from each of two domains and synthesizes the feature information and detail information of both, so the content of the generated picture is determined by the two pictures together.
Disclosure of Invention
The technical problem solved by the invention: overcoming the shortcomings of the prior art, a picture synthesis method based on a generative adversarial network is provided which uses techniques from the deep learning field to realize cross-domain picture synthesis, greatly simplifies the operation steps, and improves the generation efficiency and the quality of the finished product; at the same time, the operation is simple and convenient, tedious manual work is avoided, and human resources and time are saved.
The technical solution of the invention is as follows: a picture synthesis method based on a generative adversarial network, named the cross-domain synthesis adversarial network, CSGAN (Cross-domain Synthesis Generative Neural Net), into which some characteristics of DRAGAN and InfoGAN are incorporated. A CSGAN model is first built; the model is then trained to learn the common features of each class of samples; finally, the model can combine the learned features to generate a specified picture. In addition, a method from DRAGAN is adopted to improve the training speed and the quality of the results, and the extraction and processing of the implicit feature vector draws on techniques from InfoGAN.
The invention provides a picture synthesis method based on a generative adversarial network, which comprises the following steps:
(1) Collect the pictures required for training and make two picture sample sets, one sample set consisting of landscape pictures and the other of portrait pictures; all pictures in each sample set should contain some identical characteristics, and such a set is called a domain;
(2) Construct and train the generative adversarial network, which consists of a feature extractor, a decoder, a generation network, and a discrimination network. The feature extractor scans and collects the detail information of the pictures, identifies the patterns in them, and takes the patterns shared across a sample set as the basic features of that set's pictures; it randomly extracts a picture from a domain, extracts the features contained in the picture, and represents them with a high-dimensional vector containing both the common features of the domain and the features unique to each picture. The decoder is used to pre-train the feature extractor: it generates a new picture from the high-dimensional vector the feature extractor obtained from a certain domain, and the more similar the generated picture is to the original, the more features of that domain's pictures have been retained, which ensures that the feature extractor extracts enough information from the original picture; the decoder is deleted once the training of the feature extractor is finished. The discrimination network judges whether the picture produced by the generation network contains the features of the two domains; it simultaneously learns the features of the real pictures in both domains, measures the gap between the real pictures and the generated pictures, and guides the adversarial network to generate pictures with the features of both domains. Parameters are shared between the convolutional layers of the feature extractor and those of the discrimination network (parameter sharing); during training, the shared parameters are updated together with the discriminator's parameters but remain unchanged while the extractor's parameters are updated. The function of the generation network is to generate a brand-new image from the given information, realistic enough that the discrimination network has difficulty discriminating it accurately; its input is an N-dimensional vector c composed of two hidden variables h extracted by the feature extractors and a randomly sampled Gaussian noise z, where the two hidden variables h, which have the same length, store the features extracted from the pictures of the two domains, and z adds more randomness so that the generated images are more diversified;
(3) Initialize the parameters of each network layer (layer) in the generative adversarial network;
(4) The total loss of the adversarial network consists of the generation loss, given by the discrimination network between the generated sample and the real sample, and the reconstruction loss of the hidden vector. The reconstruction loss detects whether the feature extractor can find a common feature vector between the original image and the generated image; the generation loss measures whether the generated image is realistic and contains the features of a certain domain. A loss function is selected according to the actual problem, the reconstruction loss, generation loss, and discrimination loss are then calculated with the selected loss function, and the gradients of the adversarial network are computed during back-propagation; at the same time, an optimization method is selected to update the parameters of the adversarial network, these parameters comprising the weights and biases of all network layers;
(5) Train the generative adversarial network, record the generation results, and save the parameters of the network, including the weights and biases of all layers and the overall structure of the network;
(6) Evaluate the generation results of the generative adversarial network and adjust the network hyper-parameters, which comprise: the network learning rate, the number of training periods, the implicit vector dimension, and the number of feature maps; then carry out multiple tests to finally obtain the required generation result.
In step (3), in the generative adversarial network, the discrimination loss L_D consists of three parts: the discrimination loss of the real samples, the discrimination loss of the generated samples, and the loss obtained from the gradient penalty, corresponding respectively to the three expectation terms in the formula:

L_D = -E_{x~P_r}[log D_θ(x)] - E_{x~P_g}[log(1 - D_θ(x))] + ε·E_{x~P_r, δ~N_d(0,cI)}[max(0, ||∇_x D_θ(x + δ)||_2 - k)]

In the formula, P_r and P_g denote the data distributions of the real samples and the generated samples; ε is a coefficient controlling the strength of the gradient penalty, set to 10 after repeated trials; k specifies the range within which the gradient is to be controlled (gradients larger or smaller than k incur the penalty), and k is set to 1 by trial and error; N_d is a normal distribution with mean 0 and variance cI, where c is set to 10 by trial and error; ∇_x D is the gradient of the discriminator; θ denotes the parameters of the discriminator.
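For illustration only (this sketch is not part of the patent's verbatim disclosure), the three-part discrimination loss described above could be assembled in PyTorch as follows; the helper names, the logits-based binary cross-entropy, and the max(0, ·) form of the penalty follow the reading reconstructed above:

import torch
import torch.nn.functional as F

def dragan_gradient_penalty(D, real, eps=10.0, k=1.0, c=10.0):
    # Perturb the real batch with noise drawn from N(0, cI), as in DRAGAN.
    delta = torch.randn_like(real) * (c ** 0.5)
    x_hat = (real + delta).detach().requires_grad_(True)
    grad = torch.autograd.grad(D(x_hat).sum(), x_hat, create_graph=True)[0]
    grad_norm = grad.flatten(1).norm(2, dim=1)
    # Penalize discriminator gradient norms that rise above k: max(0, ||grad|| - k).
    return eps * torch.clamp(grad_norm - k, min=0.0).mean()

def discrimination_loss(D, real, fake):
    real_scores = D(real)
    fake_scores = D(fake.detach())
    # Real samples carry label 1, generated samples label 0.
    loss_real = F.binary_cross_entropy_with_logits(real_scores, torch.ones_like(real_scores))
    loss_fake = F.binary_cross_entropy_with_logits(fake_scores, torch.zeros_like(fake_scores))
    return loss_real + loss_fake + dragan_gradient_penalty(D, real)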
The total loss L_G of the generation network consists of the discrimination loss between the score given by the discrimination network and the sample label, and the reconstruction loss L_recon of the hidden vector; the formulas are:

L_G = -E_{a~P_A, z~N(0,1)}[D_A(G(Ex_A(a), Ex_B(b), z))] - E_{b~P_B, z~N(0,1)}[D_B(G(Ex_A(a), Ex_B(b), z))] + λ·L_recon

L_recon = V(Ex_A(G(Ex_A(a), Ex_B(b), z)), Ex_A(a)) + V(Ex_B(G(Ex_A(a), Ex_B(b), z)), Ex_B(b))

where P denotes a sample distribution, z denotes noise following the standard Gaussian distribution, and V denotes the reconstruction loss, commonly computed as a mean square error; λ is the weight of the reconstruction loss within the generation loss; E denotes the computation of a mathematical expectation and Ex denotes an extractor; a and b denote samples drawn from the two distributions, and z denotes Gaussian noise randomly sampled from a standard normal distribution. ζ is a hyper-parameter named the feature weight ratio, with a value between 0 and 1; as the name implies, it regulates the relative influence of the two classes of sample features on the generated sample. According to actual needs, the size of ζ can be changed to regulate the content of the generated sample, and the value of this coefficient is given directly by the user.
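A minimal PyTorch sketch of this generation loss, assuming extractors Ex_A and Ex_B that return the hidden vectors directly, discriminators D_A and D_B, and a generator G taking the concatenated vector; the mean-score form of the expectations and the choice of mse_loss for V are assumptions of the sketch:

import torch
import torch.nn.functional as F

def generation_loss(G, D_A, D_B, Ex_A, Ex_B, a, b, z, lam=1.0):
    # Extract the two hidden vectors and synthesize a cross-domain picture.
    h_a, h_b = Ex_A(a), Ex_B(b)
    fake = G(torch.cat([h_a, h_b, z], dim=1))
    # Adversarial terms: raise both discriminators' scores on the generated sample.
    adv = -(D_A(fake).mean() + D_B(fake).mean())
    # Hidden-vector reconstruction loss V, computed here as a mean square error.
    recon = F.mse_loss(Ex_A(fake), h_a.detach()) + F.mse_loss(Ex_B(fake), h_b.detach())
    return adv + lam * recon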
The loss function is a binary cross-entropy loss, a mean square error, an exponential loss function, or a range loss function.
Compared with the prior art, the invention has the following advantages:
(1) The invention uses techniques from the deep learning field to realize cross-domain picture synthesis, greatly simplifies the operation steps, absorbs the advantages of other generative adversarial networks, and improves the generation efficiency and the quality of the finished product. The model in the invention needs a large number of samples and considerable time for training, but once trained it can complete cross-domain picture synthesis in a short time; the operation is simple and convenient and of high practical value. Moreover, the method avoids tedious manual operation, saving human resources and time.
(2) The invention likewise learns the features of two domains, but it can also learn similar detail information between pictures of different domains and reflect those details in the generated pictures, i.e., it adds as much detail as possible while retaining the features of the pictures of the two domains. The invention can combine the pictures in the two domains, merging information such as illumination, color, and style, to make the generated image more realistic.
Drawings
Fig. 1 is a network flow diagram of the CSGAN according to the present invention;
Fig. 2 is a network structure diagram of the CSGAN according to the present invention;
Fig. 3 is a structure diagram of the feature extractor of the CSGAN according to the present invention;
Fig. 4 is a structure diagram of the generation network of the CSGAN according to the present invention.
Detailed Description
The picture synthesis method based on a generative adversarial network is explained in detail below with reference to the accompanying drawings; the basic flow is shown in Fig. 1.
1. Sample pictures are gathered from the network and preprocessed.
This generative adversarial network requires collecting and downloading a large number of samples from the network and extensive training in order to learn a probability distribution and generate data. All pictures require a resolution greater than 128×128, and the picture content comprises scenery and portraits. The portrait pictures come from the CelebA picture set, and the landscape pictures come from a network picture set obtained by crawling web pages with Python. These pictures are made into two sample sets: sample set A comprises N portrait pictures from the CelebA sample set, and sample set B stores N landscape pictures collected by the authors.
Number all sample images and delete blurred or watermarked pictures. Each grayscale picture is fitted with a textual description, and the textual description is converted into a vector used to constrain the style of the generated image. The text description is implemented using pixel normalization, formulated as:

X' = a + (X − X_min)(b − a) / (X_max − X_min)

where X' is the normalized pixel, X is the pixel currently being processed, and b = 255 and a = 0, since it is desirable to convert the text vector into grayscale form. During training, the normalized vector is passed into the generator and the discriminator as a description vector.
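Under the min-max reading reconstructed above, a short sketch of this normalization (the function name is illustrative):

import numpy as np

def normalize_to_grayscale(x, a=0.0, b=255.0):
    # Min-max map of a description vector into the grayscale range [a, b].
    x = np.asarray(x, dtype=np.float32)
    return a + (x - x.min()) * (b - a) / (x.max() - x.min())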
Select N portrait pictures as sample set A of the neural network and N landscape pictures as sample set B. During each training pass, the sample set is divided into training samples and random samples, and a downsampling operation is performed according to the resolution of the current stage of the generative adversarial network; the downsampling is completed with a self-defined autoencoder, which performs the visual dimension reduction. Then the labels of all real sample pictures are set to 1, indicating that they are real samples. Finally, the OpenCV graphics library is used to perform image-enhancement operations on the sample pictures so that the machine can better distinguish the images.
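For illustration, such a preprocessing step might look as follows with the OpenCV Python bindings; the interpolation mode and the contrast/brightness values are assumptions of this sketch, not values from the patent:

import cv2

def load_and_enhance(path, size):
    img = cv2.imread(path)                                             # BGR, uint8
    img = cv2.resize(img, (size, size), interpolation=cv2.INTER_AREA)  # downsample to the current stage's resolution
    img = cv2.convertScaleAbs(img, alpha=1.2, beta=10)                 # simple contrast/brightness enhancement
    return img, 1                                                      # label 1 marks a real sample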
2. Construction of CSGAN model
First, the functions and principles of the network are analyzed; during this analysis it must be determined how the picture synthesis function is to be realized. Then a new generative adversarial network architecture is built with the PyTorch deep learning library: on the basis of the original generative adversarial network structure, a feature-extraction layer is added, the network is designed as a two-way structure, hidden variables of limited dimension are added to represent the features, and a generative adversarial network for image synthesis is built by combining the relevant characteristics of DRAGAN and InfoGAN; it is named CSGAN, as shown in Fig. 2. After the model structure is determined, the parameters in the generative adversarial network first need to be weight-normalized (Weight Normalization) so as to equalize the learning rate.
This generative adversarial network consists of two feature extractors E_A and E_B, two discrimination networks D_A and D_B, and a generation network G. The feature extractor consists of a number of convolutional layers (Convolutional Layer), as shown in Fig. 3; it extracts feature maps (Feature Maps) from a given picture and flattens (Flatten) them into a one-dimensional array, which is the required implicit vector. The generation network is composed of a number of deconvolution layers (Transposed Convolutional Layer), each consisting of a transposed convolution, a batch regularization layer, and an activation function, as shown in Fig. 4; it performs a deconvolution operation on the input vector and then, through the regularization layer and the activation function, regenerates a new picture, namely a generated sample (Generated Sample). The discrimination network is a simple fully connected network (Fully Connected Nets) consisting of X linear layers, X depending on the complexity of the task; its role is to discriminate whether a picture is authentic and contains the features of a certain class of images. During training, the discrimination network can use the Dropout parameter-regularization method, i.e., each time the network is used, some units in the neural layers are randomly and temporarily hidden.
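The following PyTorch sketch mirrors this composition for 128×128 RGB pictures; the channel widths, kernel sizes, layer counts, and the two-headed extractor output (which anticipates the sampling in step (5.3) below) are illustrative assumptions rather than values disclosed by the patent:

import torch.nn as nn

class FeatureExtractor(nn.Module):
    # Convolutional layers that map a 128x128 picture to two H-dimensional vectors,
    # the pair (a, b) from which the implicit vector is later sampled (step (5.3)).
    def __init__(self, h_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.SELU(),     # 128 -> 64
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.SELU(),   # 64 -> 32
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.SELU(),  # 32 -> 16
        )
        self.fc_a = nn.Linear(256 * 16 * 16, h_dim)
        self.fc_b = nn.Linear(256 * 16 * 16, h_dim)

    def forward(self, x):
        feat = self.conv(x).flatten(1)  # feature maps flattened into a one-dimensional array
        return self.fc_a(feat), self.fc_b(feat)

class Generator(nn.Module):
    # Transposed-convolution stack that decodes the 3H-dimensional vector into a picture.
    def __init__(self, in_dim=3 * 128):
        super().__init__()
        self.fc = nn.Linear(in_dim, 256 * 16 * 16)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.BatchNorm2d(128), nn.SELU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.BatchNorm2d(64), nn.SELU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),  # 16 -> 128
        )

    def forward(self, z):
        return self.deconv(self.fc(z).view(-1, 256, 16, 16))

class Discriminator(nn.Module):
    # A simple fully connected net with Dropout that scores flattened pictures (logits).
    def __init__(self, in_dim=3 * 128 * 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.SELU(), nn.Dropout(0.3),
            nn.Linear(512, 256), nn.SELU(), nn.Dropout(0.3),
            nn.Linear(256, 1),
        )

    def forward(self, x):
        return self.net(x.flatten(1))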
3. Network layer parameters of the CSGAN are initialized.
Perform the parameter-initialization operation on the convolutional layers of the network: the bias of each layer is set to 0, and the weights are initialized by the LeCun normal-distribution initialization method so that the nonlinear layer SELU can function normally.
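A sketch of this initialization; PyTorch has no built-in LeCun-normal initializer, so the fan-in computation is written out by hand for the layer types used here:

import torch.nn as nn

def lecun_normal_init(m):
    # Zero each layer's bias and draw weights from N(0, 1/fan_in), the LeCun normal scheme.
    if isinstance(m, nn.Linear):
        fan_in = m.in_features
    elif isinstance(m, nn.Conv2d):
        fan_in = m.in_channels * m.kernel_size[0] * m.kernel_size[1]
    else:
        return
    nn.init.normal_(m.weight, mean=0.0, std=fan_in ** -0.5)
    if m.bias is not None:
        nn.init.zeros_(m.bias)

# Usage: each sub-network can apply it with model.apply(lecun_normal_init).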
4. Select the loss function proposed in DRAGAN to calculate the reconstruction loss, the generation loss, and the discrimination loss; select the Adam optimization function to update the network parameters during back-propagation. Common loss functions include the binary cross-entropy loss, the mean square error, exponential loss functions, and range loss functions.
5. Training CSGAN model
First, the CSGAN model is trained by fixing the generation network parameters and training the discrimination network; the steps are as follows:
(5.1) According to the batch size m set in the hyper-parameters, sample randomly from the two sample sets, taking m portrait pictures x_A and m landscape pictures x_B as training samples; record the pictures in sample set A as {x_A^1, x_A^2, ……, x_A^m}, the i-th sample picture being denoted x_A^i, and the pictures in sample set B as {x_B^1, x_B^2, ……, x_B^m}, the i-th sample picture being denoted x_B^i;
(5.2) Fix the parameters of the generation network G, allow only the parameters of the discrimination network D to be updated, and zero the gradients of the discrimination network;
(5.3) Pass x_A^i and x_B^i into the feature extractors E_A and E_B respectively, reducing the dimensionality of the pictures and generating two groups of H-dimensional hidden vectors h_1 and h_2. The implicit vector is generated as follows: input a sample picture into the feature extractor, which outputs two H-dimensional vectors a and b; sample an H-dimensional noise z from a normal distribution; and calculate the final hidden vector h as:

h_i = a_i + e^{b_i} · z_i

where e denotes the base of the natural logarithm and i denotes the i-th dimension of the vector.
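A sketch of this sampling, assuming the extractor returns the pair (a, b) and reading b as a log standard deviation (both are assumptions of this example):

import torch

def sample_hidden(extractor, x):
    a, b = extractor(x)              # two H-dimensional outputs per picture
    z = torch.randn_like(a)          # H-dimensional noise from a normal distribution
    return a + torch.exp(b) * z      # h_i = a_i + e^{b_i} * z_i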
(5.4) Sample randomly from a standard Gaussian distribution, generating m H-dimensional Gaussian noise vectors z_noise;
(5.5) Concatenate z_noise, h_1, and h_2 into a 3H-dimensional vector, which is the input vector z_fake of the generation network;
(5.6) Set the labels of the real pictures x_A^i and x_B^i to 1 and pass them into the discrimination networks D_A and D_B respectively, where the discrimination loss of the real samples is calculated from the given discrimination scores using the binary cross-entropy loss function (Binary Cross-Entropy Loss); this loss is back-propagated to update the gradients of the discrimination network parameters;
(5.7) Input the vector z_fake into the generation network G, forging m generated samples x_fake;
(5.8) Set the labels of the pictures x_fake to 0 and pass them into the discrimination networks D_A and D_B, where the discrimination loss of the generated samples is calculated from the given scores using the binary cross-entropy loss function; the loss is back-propagated to update the discrimination network parameters. This process is formulated as:

θ_d ← θ_d + η·∇V(θ_d)

where η denotes the learning rate among the hyper-parameters, set to 0.0001 in the invention (although a lower learning rate slows the convergence of training, the quality of the training results is better); ∇V(θ_d) denotes the gradient with respect to the discriminator parameters; ∇ denotes the gradient operator, and θ_d is a custom symbol representing the discriminator parameters.
(5.9) Apply a gradient penalty (Gradient Penalty) to the discrimination loss. Specifically, a penalty term is added in the calculation of the discrimination loss; its calculation method is derived from DRAGAN and is as follows:

λ·E_{x~P_r, δ~N_d(0,cI)}[max(0, ||∇_x D_θ(x + δ)||_2 − 1)]

In the formula, P_r represents the distribution of the real sample data, from which x is sampled; λ is a coefficient controlling the strength of the gradient penalty; N_d is a normal distribution with mean 0 and variance cI, where c is set to 10; ∇_x D is the gradient of the discriminator; θ denotes the parameters of the discriminator. This penalty term has the same effect as the gradient-penalty terms of WGAN and other generative adversarial networks with gradient penalties: it limits the gradient of the discriminator to around k, which is set to 1 in the invention; among these alternatives, only the penalty term from DRAGAN gives the better effect in the invention. During training, the gradient of the discriminator oscillates near 1; the oscillation amplitude gradually decreases as training proceeds, the rate of decrease can be accelerated by adjusting the penalty strength λ, and at the end of training the discrimination gradient is finally limited to 1. Back-propagate the penalty term and update the gradients of the discrimination network parameters;
(5.10) Optimize the discrimination network using the Adam function.
Then fix the parameters of the discrimination networks and the feature extractors, and train the generation network; the steps are as follows:
(5.11) Fix the parameters of the discrimination network D and the feature extractor E, allow only the parameters of the generation network G to be updated, and zero the gradients of the generation network;
(5.12) Pass x_A^i and x_B^i into the feature extractors E_A and E_B respectively, regenerating two groups of H-dimensional hidden vectors h_1 and h_2, then concatenate z_noise, h_1, and h_2 into the input vector z_fake;
(5.13) Input the vector z_fake into the generation network G, forging m generated samples x_fake;
(5.14) Set the labels of the pictures x_fake to 1 and pass them into the discrimination networks D_A and D_B, where the generation loss L_G of the generated samples is calculated from the difference between the given discrimination scores and the labels using the binary cross-entropy loss function; back-propagate the loss and update the parameters θ_g. This process is formulated as:

θ_g ← θ_g − η·∇V(θ_g)

where ∇V(θ_g) denotes the gradient with respect to the generator parameters.
(5.15) Pass the pictures x_fake into the feature extractors E_A and E_B to obtain the implicit vectors h_3 and h_4. Compare h_3 and h_4 with the implicit vectors h_1 and h_2, and from the calculated differences compute the reconstruction loss L_recon of the implicit vectors using the mean square error (Mean-Square Error);
(5.16) Multiply the reconstruction loss L_recon by the feature weight ζ and add it to the generation loss L_G to obtain the final generation loss; back-propagate this loss and update the network parameters.
(5.17) According to the number of training periods M specified in the hyper-parameters, repeat steps (5.1) to (5.16) M times, printing the network's loss values every period and saving the generated samples and the model parameters every M/10 periods.
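Putting steps (5.1) to (5.17) together, one training period could be sketched as follows, reusing the helpers sketched earlier (sample_hidden, discrimination_loss) and PyTorch optimizers opt_d and opt_g; batching details, logging, and checkpointing are omitted, and the value of the feature weight is illustrative:

import torch
import torch.nn.functional as F

H, zeta = 128, 0.5  # hidden dimension and feature weight (illustrative values)

for x_a, x_b in zip(loader_A, loader_B):  # m-sized batches from sample sets A and B
    # (5.1)-(5.5): extract hidden vectors and assemble the 3H-dimensional generator input.
    with torch.no_grad():
        h1 = sample_hidden(Ex_A, x_a)
        h2 = sample_hidden(Ex_B, x_b)
        z_fake = torch.cat([torch.randn(x_a.size(0), H), h1, h2], dim=1)
    # (5.2)-(5.10): update the discriminators while the generator stays fixed.
    opt_d.zero_grad()
    x_fake = G(z_fake).detach()
    d_loss = discrimination_loss(D_A, x_a, x_fake) + discrimination_loss(D_B, x_b, x_fake)
    d_loss.backward()
    opt_d.step()
    # (5.11)-(5.14): update the generator; opt_g holds only G's parameters,
    # so the discriminators and the shared extractor layers stay fixed.
    opt_g.zero_grad()
    x_fake = G(z_fake)
    s_a, s_b = D_A(x_fake), D_B(x_fake)
    g_adv = (F.binary_cross_entropy_with_logits(s_a, torch.ones_like(s_a))
             + F.binary_cross_entropy_with_logits(s_b, torch.ones_like(s_b)))
    # (5.15)-(5.16): hidden-vector reconstruction loss, weighted by the feature weight.
    recon = (F.mse_loss(sample_hidden(Ex_A, x_fake), h1)
             + F.mse_loss(sample_hidden(Ex_B, x_fake), h2))
    (g_adv + zeta * recon).backward()
    opt_g.step()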
6. Adjust the hyper-parameters of the generative adversarial network.
There are many hyper-parameters in the network, such as the learning rate, picture batch size, number of training periods, target resolution, and starting resolution. Adjusting these parameters influences the results the network generates. This amounts to an optimization procedure: finding the optimal parameters that enable the network to generate the best results.
7. Others
The invention optimizes and improves the structure and details, adds some original designs to realize the target functions, slightly adjusts the training approach and learning process of the neural network, and adaptively changes the overall structure of the network. The specific changes are as follows:
(7.1) In the convolutional layers of CSGAN, the activation function is changed to SELU (Scaled Exponential Linear Unit), replacing the common ReLU and Leaky ReLU. Its operation formula is:

SELU(z) = λ·z, if z > 0; λ·α·(e^z − 1), if z ≤ 0

where α ≈ 1.6732632 and λ ≈ 1.050701; these values are strictly derived and are substituted directly into the formula. The benefits of using SELU are: its result does not lose the details of the region where z is less than 0, as ReLU does; SELU has a saturation region (Saturation Region); and the slope of SELU is greater than 1 in most regions, i.e., it amplifies the input data by a factor of 1.05070098, which can speed up training. It should be noted, however, that using SELU requires constraining the initialized values of the weights: their distribution should be normal, with mean 0 and variance 1. Compared with the batch-normalization method this is more stable, the results obtained are more accurate, and the convergence rate is also increased.
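A quick numerical check of this formula with PyTorch's built-in SELU:

import torch

selu = torch.nn.SELU()
x = torch.tensor([-2.0, 0.0, 2.0])
print(selu(x))  # tensor([-1.5202,  0.0000,  2.1014])
# Negative inputs saturate toward -lambda*alpha (about -1.7581);
# positive inputs are scaled by lambda (about 1.0507).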
(7.2) The optimization function of the network uses Adam, which is essentially a combination of Momentum and RMSProp together with a correction of its bias. The learning rate of the discrimination network (i.e., Adam's step size) is 0.0001, the learning rate of the generation network is 0.0005, the exponential decay rates of the moment estimates β_1 and β_2 are 0.5 and 0.999 respectively, and the remaining parameters keep their defaults.
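In PyTorch these settings correspond to the following configuration sketch (grouping both discriminators under one optimizer is an assumption of the sketch):

import torch

opt_d = torch.optim.Adam(
    list(D_A.parameters()) + list(D_B.parameters()),  # both discriminators together
    lr=0.0001, betas=(0.5, 0.999))
opt_g = torch.optim.Adam(G.parameters(), lr=0.0005, betas=(0.5, 0.999))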
(7.3) The extractor and the discriminator in the invention share parameters, and there are two methods of sharing: 1. the extractor directly serves as the convolutional layers of the discriminator and contains no independent network layers, so the discriminator only needs to contain fully connected layers; 2. the extractor and part of the network layers in the discriminator are the same, and each additionally contains some independent convolutional layers and a fully connected layer. The practical training process uses the first method, which has the advantages of fewer total network parameters and higher training speed. The purpose of this sharing is as follows: the extractor detects the features of pictures and represents them with implicit vectors, which amounts to giving each picture a "number" by which to identify it, pictures of the same class having similar "numbers". The discriminator then uses the same extractor to extract these features from a generated picture, represents them as implicit variables, and discriminates whether this implicit vector matches the features of a certain class of pictures.
(7.4) The input vector of the generation network in the invention is composed of two parts: an implicit vector that identifies the features, and Gaussian noise obtained by random sampling. The implicit vector represents the picture features contained in the two input pictures, while the Gaussian noise adds more diversity to the details of the generated pictures, making their content and presentation elements richer.
(7.5) The invention adds a hyper-parameter ζ, named the feature weight, whose value lies between 0 and 1; as the name implies, it regulates the relative influence of the two classes of sample features on the generated sample. According to actual needs, the size of ζ can be changed to regulate the content of the generated sample, and the value of the coefficient is given directly by the user. When ζ is larger, the features of one class of samples influence the generated sample more prominently while the influence of the other class's features is weakened, and vice versa; thus ζ can be changed as needed to obtain the desired generated sample. This is because, during training, this hyper-parameter weights the two discrimination losses of the generated samples, formulated as follows:

L_G = ζ·L_{D_A} + (1 − ζ)·L_{D_B}

where L_G denotes the generation loss and L_{D_A} and L_{D_B} denote the two discrimination losses. Depending on ζ, the generation network weighs the generation losses contributed by the features of the various picture classes, paying more attention to, and correcting, the discrimination loss with the larger coefficient while performing regret minimization.
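Under the weighting reconstructed above, a one-line helper (illustrative):

def weighted_generation_loss(loss_d_a, loss_d_b, zeta=0.5):
    # zeta in [0, 1] shifts the generator's attention between the two domains' losses.
    return zeta * loss_d_a + (1.0 - zeta) * loss_d_b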
What has been described above is only one embodiment of a picture synthesis method based on a generative adversarial network embodying the invention; the present invention is not limited to the above embodiment. The description is intended to be illustrative, not to limit the scope of the claims. Many alternatives, modifications, and variations will be apparent to those skilled in the art. All technical schemes formed by equivalent substitution or equivalent transformation fall within the protection scope of the invention.

Claims (3)

1. A picture synthesis method based on a generative adversarial network, characterized by comprising the following steps:
(1) Collect the pictures required for training and make two picture sample sets, one sample set consisting of landscape pictures and the other of portrait pictures; all pictures in each sample set should contain some identical characteristics, and such a set is called a domain;
(2) Construct and train the generative adversarial network, which consists of a feature extractor, a decoder, a generation network, and a discrimination network. The feature extractor scans and collects the detail information of the pictures, identifies the patterns in them, and takes the patterns shared across a sample set as the basic features of that set's pictures; it randomly extracts a picture from a domain, extracts the features contained in the picture, and represents them with a high-dimensional vector containing both the common features of the domain and the features unique to each picture. The decoder is used to pre-train the feature extractor: it generates a new picture from the high-dimensional vector the feature extractor obtained from a certain domain, and the more similar the generated picture is to the original, the more features of that domain's pictures have been retained, which ensures that the feature extractor extracts enough information from the original picture; the decoder is deleted once the training of the feature extractor is finished. The discrimination network judges whether the picture produced by the generation network contains the features of the two domains; it simultaneously learns the features of the real pictures in both domains, measures the gap between the real pictures and the generated pictures, and guides the adversarial network to generate pictures with the features of both domains. Parameters can be shared between the convolutional layers of the feature extractor and those of the discrimination network; during training, the shared parameters are updated together with the discriminator's parameters but remain unchanged while the extractor's parameters are updated. The function of the generation network is to generate a brand-new image from the given information, realistic enough that the discrimination network has difficulty discriminating it accurately; its input is an N-dimensional vector c composed of two hidden variables h extracted by the feature extractors and a randomly sampled Gaussian noise z, where the two hidden variables h, which have the same length, store the features extracted from the pictures of the two domains, and z adds more randomness so that the generated images are more diversified;
(3) Initialize the parameters of each network layer in the generative adversarial network;
(4) The total loss of the adversarial network consists of the generation loss, given by the discrimination network between the generated sample and the real sample, and the reconstruction loss of the hidden vector. The reconstruction loss detects whether the feature extractor can find a common feature vector between the original image and the generated image; the generation loss measures whether the generated image is realistic and contains the features of a certain domain. A loss function is selected according to the actual problem, the reconstruction loss, generation loss, and discrimination loss are then calculated with the selected loss function, and the gradients of the adversarial network are computed during back-propagation; at the same time, an optimization method is selected to update the parameters of the adversarial network, these parameters comprising the weights and biases of all network layers;
(5) Train the generative adversarial network, record the generation results, and save the parameters of the network, including the weights and biases of all layers and the overall structure of the network;
(6) Evaluate the generation results of the generative adversarial network and adjust the network hyper-parameters, which comprise: the network learning rate, the number of training periods, the implicit vector dimension, and the number of feature maps; then carry out multiple tests to finally obtain the required generation result.
2. The picture synthesis method based on a generative adversarial network according to claim 1, characterized in that: in step (3), in the generative adversarial network, the discrimination loss L_D consists of three parts: the discrimination loss of the real samples, the discrimination loss of the generated samples, and the loss obtained from the gradient penalty, corresponding respectively to the three expectation terms in the formula:

L_D = -E_{x~P_r}[log D_θ(x)] - E_{x~P_g}[log(1 - D_θ(x))] + ε·E_{x~P_r, δ~N_d(0,cI)}[max(0, ||∇_x D_θ(x + δ)||_2 - k)]

In the formula, P_r and P_g respectively denote the data distributions of the real samples and the generated samples; ε is a coefficient controlling the strength of the gradient penalty; k specifies the range within which the gradient is to be controlled, gradients larger or smaller than k incurring the gradient penalty; N_d is a normal distribution with mean 0 and variance cI, where c is set to 10; ∇_x D is the gradient of the discriminator; θ denotes the parameters of the discriminator;
the total loss L_G of the generation network consists of the discrimination loss between the score given by the discrimination network and the sample label, and the reconstruction loss L_recon of the hidden vector; the formulas are:

L_G = -E_{a~P_A, z~N(0,1)}[D_A(G(Ex_A(a), Ex_B(b), z))] - E_{b~P_B, z~N(0,1)}[D_B(G(Ex_A(a), Ex_B(b), z))] + λ·L_recon

L_recon = V(Ex_A(G(Ex_A(a), Ex_B(b), z)), Ex_A(a)) + V(Ex_B(G(Ex_A(a), Ex_B(b), z)), Ex_B(b))

where P denotes a sample distribution, A being the selected portrait sample set and B the landscape picture sample set; V denotes the reconstruction loss, computed as a mean square error; λ is the weight of the reconstruction loss within the generation loss; E denotes the computation of a mathematical expectation and Ex denotes an extractor; a and b denote samples drawn from the two distributions; ζ is a hyper-parameter named the feature weight ratio, with a value between 0 and 1, which, as the name implies, regulates the relative influence of the two classes of sample features on the generated sample; according to actual needs, the size of ζ can be changed to regulate the content of the generated sample, and the value of the coefficient is given directly by the user.
3. The picture synthesis method based on a generative adversarial network according to claim 1, characterized in that: the loss function is a binary cross-entropy loss, a mean square error, an exponential loss function, or a range loss function.
CN201811325648.0A 2018-11-08 2018-11-08 Picture synthesis method based on a generative adversarial network Active CN109447906B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811325648.0A 2018-11-08 2018-11-08 Picture synthesis method based on a generative adversarial network

Publications (2)

Publication Number Publication Date
CN109447906A (en) 2019-03-08
CN109447906B (en) 2023-07-11

Family

ID=65551957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811325648.0A (Active) Picture synthesis method based on a generative adversarial network 2018-11-08 2018-11-08

Country Status (1)

Country Link
CN (1) CN109447906B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110210556B (en) * 2019-05-29 2022-09-06 中国科学技术大学 Pedestrian re-identification data generation method
CN110175575A (en) * 2019-05-29 2019-08-27 南京邮电大学 A kind of single Attitude estimation method based on novel high-resolution network model
CN110263858B (en) * 2019-06-21 2022-05-06 华北电力大学(保定) Bolt image synthesis method and device and related equipment
CN110598786B (en) * 2019-09-09 2022-01-07 京东方科技集团股份有限公司 Neural network training method, semantic classification method and semantic classification device
CN111008692A (en) * 2019-11-08 2020-04-14 国网天津市电力公司 Method and device for generating multi-energy metering characteristic data based on improved generation countermeasure network
WO2021092686A1 (en) * 2019-11-15 2021-05-20 Modiface Inc. Image-to-image translation using unpaired data for supervised learning
CN110991496B (en) * 2019-11-15 2023-05-30 北京三快在线科技有限公司 Model training method and device
CN111160555B (en) * 2019-12-26 2023-12-01 北京迈格威科技有限公司 Processing method and device based on neural network and electronic equipment
CN111242133B (en) * 2020-01-14 2022-06-28 山东浪潮科学研究院有限公司 Method and system for generating correlation of object in image and GAN hidden layer unit
CN111564160B (en) * 2020-04-21 2022-10-18 重庆邮电大学 Voice noise reduction method based on AEWGAN
CN111797891A (en) * 2020-05-21 2020-10-20 南京大学 Unpaired heterogeneous face image generation method and device based on generation countermeasure network
SG10202005064VA (en) * 2020-05-29 2021-12-30 Yitu Pte Ltd A decoder training method, a high-resolution face image generation method, a device and a computer device
CN111784565B (en) * 2020-07-01 2021-10-29 北京字节跳动网络技术有限公司 Image processing method, migration model training method, device, medium and equipment
SG10202006360VA (en) * 2020-07-01 2021-01-28 Yitu Pte Ltd Image generation method and device based on neural network
US20220405634A1 (en) * 2021-06-16 2022-12-22 Moxa Inc. Device of Handling Domain-Agnostic Meta-Learning
CN114092610B (en) * 2021-11-22 2023-04-07 哈尔滨工业大学(深圳) Character video generation method based on generation of confrontation network
CN117730309A (en) * 2022-05-31 2024-03-19 小米科技(武汉)有限公司 Model determination method, layout generation method, device, medium and chip
CN116822623B (en) * 2023-08-29 2024-01-12 苏州浪潮智能科技有限公司 Method, device, equipment and storage medium for generating countermeasures network joint training

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107437077A (en) * 2017-08-04 2017-12-05 深圳市唯特视科技有限公司 A kind of method that rotation face based on generation confrontation network represents study
CN108280885A (en) * 2018-01-09 2018-07-13 上海大学 The holographic idol method of structure
CN108288072A (en) * 2018-01-26 2018-07-17 深圳市唯特视科技有限公司 A kind of facial expression synthetic method based on generation confrontation network
CN108510435A (en) * 2018-03-28 2018-09-07 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN108711138A (en) * 2018-06-06 2018-10-26 北京印刷学院 A kind of gray scale picture colorization method based on generation confrontation network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9971958B2 (en) * 2016-06-01 2018-05-15 Mitsubishi Electric Research Laboratories, Inc. Method and system for generating multimodal digital images


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DualGAN: Unsupervised Dual Learning for Image-to-Image Translation; Zili Yi et al.; 2017 IEEE International Conference on Computer Vision; 2017-12-25; pp. 2868-2876 *
Hand-drawn image retrieval based on conditional generative adversarial networks (基于条件生成对抗网络的手绘图像检索); Liu Yujie et al.; Journal of Computer-Aided Design & Computer Graphics; 2017-12-15; No. 12; pp. 2336-2342 *
Image style transfer based on generative adversarial networks (基于生成对抗网络的图片风格迁移); Xu Zhehao et al.; Software Guide (软件导刊); 2018-05-29; No. 06; pp. 207-209, 212 *

Also Published As

Publication number Publication date
CN109447906A (en) 2019-03-08

Similar Documents

Publication Title
CN109447906B (en) Picture synthesis method based on generation countermeasure network
CN108711138B (en) Gray level picture colorizing method based on generation countermeasure network
CN111563841B (en) High-resolution image generation method based on generation countermeasure network
CN112465111B (en) Three-dimensional voxel image segmentation method based on knowledge distillation and countermeasure training
CN108717568B (en) A kind of image characteristics extraction and training method based on Three dimensional convolution neural network
Liang et al. Understanding mixup training methods
CN111242841B (en) Image background style migration method based on semantic segmentation and deep learning
CN108171266A (en) A kind of learning method of multiple target depth convolution production confrontation network model
CN107239514A (en) A kind of plants identification method and system based on convolutional neural networks
CN111161137B (en) Multi-style Chinese painting flower generation method based on neural network
CN107977629A (en) A kind of facial image aging synthetic method of feature based separation confrontation network
CN107437077A (en) A kind of method that rotation face based on generation confrontation network represents study
CN110009057A (en) A kind of graphical verification code recognition methods based on deep learning
CN111282267A (en) Information processing method, information processing apparatus, information processing medium, and electronic device
CN110728629A (en) Image set enhancement method for resisting attack
CN110176050B (en) Aesthetic optimization method for text generated image
CN109635653A (en) A kind of plants identification method
Dogan et al. Semi-supervised image attribute editing using generative adversarial networks
CN105447566B (en) Training device, training method and detection device
CN110102051A (en) The plug-in detection method and device of game
CN113505855A (en) Training method for anti-attack model
CN112241741A (en) Self-adaptive image attribute editing model and method based on classified countermeasure network
CN114332565A (en) Method for generating image by generating confrontation network text based on distribution estimation condition
CN116958712B (en) Image generation method, system, medium and device based on prior probability distribution
CN112084936B (en) Face image preprocessing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant