CN109447906B - Picture synthesis method based on generative adversarial network - Google Patents

Picture synthesis method based on generative adversarial network

Info

Publication number: CN109447906B
Application number: CN201811325648.0A
Authority: CN (China)
Prior art keywords: network, loss, picture, pictures, sample
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN109447906A
Inventors: 解凯, 何翊卿, 李桐, 李婷, 孙磬宇
Current assignee: Beijing Institute of Graphic Communication
Original assignee: Beijing Institute of Graphic Communication
Application filed by Beijing Institute of Graphic Communication; priority to CN201811325648.0A
Published as CN109447906A (application); application granted; published as CN109447906B (grant)

Classifications

    • G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING; G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING; G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning


Abstract

The invention relates to a picture synthesis method based on a generative adversarial network, which extracts and fuses features from pictures in different domains to generate a new picture. The method comprises the following steps: first, collect and organize picture samples and group them so that each group of pictures shares the same characteristics; next, construct the generative adversarial network and initialize the network parameters; then, select a suitable loss function and optimization method; then, feed the samples into the generative adversarial network and begin training; finally, adjust the network parameters according to the training results so as to obtain better results. The invention synthesizes image contents into a new image while simplifying manual operation and improving working efficiency.

Description

Picture synthesis method based on generative adversarial network
Technical Field
The invention relates to a picture synthesis method based on a generative adversarial network, belonging to the technical field of deep learning and digital graphic image processing.
Background
With the development of computer hardware and neural networks, artificial intelligence is gaining attention and plays an increasingly important role in people's lives. Deep learning stems from the development of neural networks; the concept was proposed by Hinton et al. in 2006, with the aim of simulating the human brain's analysis and interpretation of data. The hope is to find, through deep learning, a deep neural network model that can represent the probability distributions of the various kinds of data encountered in artificial intelligence applications, including image processing and natural language processing.
Deep learning can be classified into supervised learning, semi-supervised learning, unsupervised learning, and so on. The generative adversarial network is a typical and very promising form of unsupervised learning: a neural network model that estimates a generative model through an adversarial process, whose optimization is a two-player minimax game. However, the original generative adversarial network suffers from unstable training and vanishing gradients, and mode collapse (Mode Collapse) also occurs frequently. DRAGAN treats the training of a GAN as a regret-minimization process, characterized by both parties to the repeated game using no-regret algorithms. DRAGAN has the advantages of fast training and less mode collapse once it reaches a stable state. InfoGAN, meanwhile, is a network model whose input components become interpretable through unsupervised training (Unsupervised Training); it can directly control the variation of the generated samples by varying the components of the input vectors.
As technology evolves, pictures have become an integral part of people's lives, and the demands of work, life, and other areas often call for combining a picture with different forms of presentation. For example, at work a person often needs to blend himself or herself, or a designated person, into a designated landscape. Currently, picture-processing software (such as Photoshop) is needed to complete the synthesis of the contents of two pictures. Compared with using a generative adversarial network, however, this approach requires more time, is cumbersome to operate, and requires the user to be familiar with the software.
Currently, scholars have proposed using generative adversarial networks for cross-domain image style transfer, such as CycleGAN, DiscoGAN, DualGAN, and StarGAN. CycleGAN, DiscoGAN, and DualGAN learn the features of two domains and transfer these features to each other, so that the resulting picture has the features of both domains. For example, CycleGAN can, through training, change pictures of horses into pictures of zebras, and can also change a summer landscape picture into its winter appearance; DiscoGAN can convert a real photo taken by a camera into a picture in the style of Van Gogh while retaining most of the photo's details; StarGAN can be used to transfer the picture styles of any two domains to each other, or to characterize a picture with more than two domains. These methods aim at migrating the style of an image, i.e., transferring characteristics such as the style and color of one image; the present invention instead aims at retaining the features and details of the pictures of both domains, so that a person can be added to a landscape picture without the person's position and lighting effects conflicting with the environment. The fundamental difference between the two is this: CycleGAN takes a picture in one domain as the original picture and adds to it some features of the other domain, so the content of the generated picture is mainly determined by the original picture; the invention extracts one picture from each of two domains and synthesizes the feature information and detail information of both, so the content of the generated picture is determined by the two pictures together.
Disclosure of Invention
The technical problem solved by the invention: overcoming the shortcomings of the prior art, a picture synthesis method based on a generative adversarial network is provided which uses techniques from the deep learning field to realize cross-domain picture synthesis, greatly simplifies the operation steps, and improves the generation efficiency and the quality of the finished product; at the same time, the operation is simple and convenient, tedious manual work is avoided, and human resources and time are saved.
The technical solution of the invention is as follows: a picture synthesis method based on a generative adversarial network, named the cross-domain synthesis adversarial network, CSGAN (Cross-domain Synthesis Generative Neural Net), into which some characteristics of DRAGAN and InfoGAN are incorporated. A CSGAN model is first built; the model is then trained to learn the common features of each class of samples; finally, the model can combine the learned features to generate a specified picture. In addition, a method from DRAGAN is adopted to improve the training speed and the quality of the results, and the extraction and processing of the implicit feature vector draws on techniques from InfoGAN.
The invention provides a picture synthesis method based on a generative adversarial network, which comprises the following steps:
(1) Collect the pictures required for training and make two picture sample sets, one sample set consisting of landscape pictures and the other of portrait pictures; all pictures in each sample set should contain some identical characteristics, and such a set is called a domain;
(2) Construct and train the generative adversarial network, which consists of a feature extractor, a decoder, a generation network, and a discrimination network. The feature extractor scans and collects the detail information of the pictures, identifies the patterns in them, and takes the patterns shared across a sample set as the basic features of that set's pictures; it randomly extracts a picture from a domain, extracts the features contained in the picture, and represents them with a high-dimensional vector containing both the common features of the domain and the features unique to each picture. The decoder is used to pre-train the feature extractor: it generates a new picture from the high-dimensional vector the feature extractor obtained from a certain domain, and the more similar the generated picture is to the original, the more features of that domain's pictures have been retained, which ensures that the feature extractor extracts enough information from the original picture; the decoder is deleted once the training of the feature extractor is finished. The discrimination network judges whether the picture produced by the generation network contains the features of the two domains; it simultaneously learns the features of the real pictures in both domains, measures the gap between the real pictures and the generated pictures, and guides the adversarial network to generate pictures with the features of both domains. Parameters are shared between the convolutional layers of the feature extractor and those of the discrimination network (parameter sharing); during training, the shared parameters are updated together with the discriminator's parameters but remain unchanged while the extractor's parameters are updated. The function of the generation network is to generate a brand-new image from the given information, realistic enough that the discrimination network has difficulty discriminating it accurately; its input is an N-dimensional vector c composed of two hidden variables h extracted by the feature extractors and a randomly sampled Gaussian noise z, where the two hidden variables h, which have the same length, store the features extracted from the pictures of the two domains, and z adds more randomness so that the generated images are more diversified;
(3) Initialize the parameters of each network layer (layer) in the generative adversarial network;
(4) The total loss of the adversarial network consists of the generation loss, given by the discrimination network between the generated sample and the real sample, and the reconstruction loss of the hidden vector. The reconstruction loss detects whether the feature extractor can find a common feature vector between the original image and the generated image; the generation loss measures whether the generated image is realistic and contains the features of a certain domain. A loss function is selected according to the actual problem, the reconstruction loss, generation loss, and discrimination loss are then calculated with the selected loss function, and the gradients of the adversarial network are computed during back-propagation; at the same time, an optimization method is selected to update the parameters of the adversarial network, these parameters comprising the weights and biases of all network layers;
(5) Train the generative adversarial network, record the generation results, and save the parameters of the network, including the weights and biases of all layers and the overall structure of the network;
(6) Evaluate the generation results of the generative adversarial network and adjust the network hyper-parameters, which comprise: the network learning rate, the number of training periods, the implicit vector dimension, and the number of feature maps; then carry out multiple tests to finally obtain the required generation result.
In step (3), in the generative adversarial network, the discrimination loss L_D consists of three parts: the discrimination loss of the real samples, the discrimination loss of the generated samples, and the loss obtained from the gradient penalty, corresponding respectively to the three expectation terms in the formula:

L_D = -E_{x~P_r}[log D_θ(x)] - E_{x~P_g}[log(1 - D_θ(x))] + ε·E_{x~P_r, δ~N_d(0,cI)}[max(0, ||∇_x D_θ(x + δ)||_2 - k)]

In the formula, P_r and P_g denote the data distributions of the real samples and the generated samples; ε is a coefficient controlling the strength of the gradient penalty, set to 10 after repeated trials; k specifies the range within which the gradient is to be controlled (gradients larger or smaller than k incur the penalty), and k is set to 1 by trial and error; N_d is a normal distribution with mean 0 and variance cI, where c is set to 10 by trial and error; ∇_x D is the gradient of the discriminator; θ denotes the parameters of the discriminator.
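For illustration only (this sketch is not part of the patent's verbatim disclosure), the three-part discrimination loss described above could be assembled in PyTorch as follows; the helper names, the logits-based binary cross-entropy, and the max(0, ·) form of the penalty follow the reading reconstructed above:

import torch
import torch.nn.functional as F

def dragan_gradient_penalty(D, real, eps=10.0, k=1.0, c=10.0):
    # Perturb the real batch with noise drawn from N(0, cI), as in DRAGAN.
    delta = torch.randn_like(real) * (c ** 0.5)
    x_hat = (real + delta).detach().requires_grad_(True)
    grad = torch.autograd.grad(D(x_hat).sum(), x_hat, create_graph=True)[0]
    grad_norm = grad.flatten(1).norm(2, dim=1)
    # Penalize discriminator gradient norms that rise above k: max(0, ||grad|| - k).
    return eps * torch.clamp(grad_norm - k, min=0.0).mean()

def discrimination_loss(D, real, fake):
    real_scores = D(real)
    fake_scores = D(fake.detach())
    # Real samples carry label 1, generated samples label 0.
    loss_real = F.binary_cross_entropy_with_logits(real_scores, torch.ones_like(real_scores))
    loss_fake = F.binary_cross_entropy_with_logits(fake_scores, torch.zeros_like(fake_scores))
    return loss_real + loss_fake + dragan_gradient_penalty(D, real)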
The total loss L_G of the generation network consists of the discrimination loss between the score given by the discrimination network and the sample label, and the reconstruction loss L_recon of the hidden vector; the formulas are:

L_G = -E_{a~P_A, z~N(0,1)}[D_A(G(Ex_A(a), Ex_B(b), z))] - E_{b~P_B, z~N(0,1)}[D_B(G(Ex_A(a), Ex_B(b), z))] + λ·L_recon

L_recon = V(Ex_A(G(Ex_A(a), Ex_B(b), z)), Ex_A(a)) + V(Ex_B(G(Ex_A(a), Ex_B(b), z)), Ex_B(b))

where P denotes a sample distribution, z denotes noise following the standard Gaussian distribution, and V denotes the reconstruction loss, commonly computed as a mean square error; λ is the weight of the reconstruction loss within the generation loss; E denotes the computation of a mathematical expectation and Ex denotes an extractor; a and b denote samples drawn from the two distributions, and z denotes Gaussian noise randomly sampled from a standard normal distribution. ζ is a hyper-parameter named the feature weight ratio, with a value between 0 and 1; as the name implies, it regulates the relative influence of the two classes of sample features on the generated sample. According to actual needs, the size of ζ can be changed to regulate the content of the generated sample, and the value of this coefficient is given directly by the user.
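A minimal PyTorch sketch of this generation loss, assuming extractors Ex_A and Ex_B that return the hidden vectors directly, discriminators D_A and D_B, and a generator G taking the concatenated vector; the mean-score form of the expectations and the choice of mse_loss for V are assumptions of the sketch:

import torch
import torch.nn.functional as F

def generation_loss(G, D_A, D_B, Ex_A, Ex_B, a, b, z, lam=1.0):
    # Extract the two hidden vectors and synthesize a cross-domain picture.
    h_a, h_b = Ex_A(a), Ex_B(b)
    fake = G(torch.cat([h_a, h_b, z], dim=1))
    # Adversarial terms: raise both discriminators' scores on the generated sample.
    adv = -(D_A(fake).mean() + D_B(fake).mean())
    # Hidden-vector reconstruction loss V, computed here as a mean square error.
    recon = F.mse_loss(Ex_A(fake), h_a.detach()) + F.mse_loss(Ex_B(fake), h_b.detach())
    return adv + lam * recon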
The loss function is a binary cross-entropy loss, a mean square error, an exponential loss function, or a range loss function.
Compared with the prior art, the invention has the following advantages:
(1) The invention uses techniques from the deep learning field to realize cross-domain picture synthesis, greatly simplifies the operation steps, absorbs the advantages of other generative adversarial networks, and improves the generation efficiency and the quality of the finished product. The model in the invention needs a large number of samples and considerable time for training, but once trained it can complete cross-domain picture synthesis in a short time; the operation is simple and convenient and of high practical value. Moreover, the method avoids tedious manual operation, saving human resources and time.
(2) The invention likewise learns the features of two domains, but it can also learn similar detail information between pictures of different domains and reflect those details in the generated pictures, i.e., it adds as much detail as possible while retaining the features of the pictures of the two domains. The invention can combine the pictures in the two domains, merging information such as illumination, color, and style, to make the generated image more realistic.
Drawings
Fig. 1 is a network flow diagram of the CSGAN according to the present invention;
Fig. 2 is a network structure diagram of the CSGAN according to the present invention;
Fig. 3 is a structure diagram of the feature extractor of the CSGAN according to the present invention;
Fig. 4 is a structure diagram of the generation network of the CSGAN according to the present invention.
Detailed Description
The picture synthesis method based on a generative adversarial network is explained in detail below with reference to the accompanying drawings; the basic flow is shown in Fig. 1.
1. Sample pictures are gathered from the network and preprocessed.
This generative adversarial network requires collecting and downloading a large number of samples from the network and extensive training in order to learn a probability distribution and generate data. All pictures require a resolution greater than 128×128, and the picture content comprises scenery and portraits. The portrait pictures come from the CelebA picture set, and the landscape pictures come from a network picture set obtained by crawling web pages with Python. These pictures are made into two sample sets: sample set A comprises N portrait pictures from the CelebA sample set, and sample set B stores N landscape pictures collected by the authors.
Number all sample images and delete blurred or watermarked pictures. Each grayscale picture is fitted with a textual description, and the textual description is converted into a vector used to constrain the style of the generated image. The text description is implemented using pixel normalization, formulated as:

X' = a + (X − X_min)(b − a) / (X_max − X_min)

where X' is the normalized pixel, X is the pixel currently being processed, and b = 255 and a = 0, since it is desirable to convert the text vector into grayscale form. During training, the normalized vector is passed into the generator and the discriminator as a description vector.
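Under the min-max reading reconstructed above, a short sketch of this normalization (the function name is illustrative):

import numpy as np

def normalize_to_grayscale(x, a=0.0, b=255.0):
    # Min-max map of a description vector into the grayscale range [a, b].
    x = np.asarray(x, dtype=np.float32)
    return a + (x - x.min()) * (b - a) / (x.max() - x.min())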
Select N portrait pictures as sample set A of the neural network and N landscape pictures as sample set B. During each training pass, the sample set is divided into training samples and random samples, and a downsampling operation is performed according to the resolution of the current stage of the generative adversarial network; the downsampling is completed with a self-defined autoencoder, which performs the visual dimension reduction. Then the labels of all real sample pictures are set to 1, indicating that they are real samples. Finally, the OpenCV graphics library is used to perform image-enhancement operations on the sample pictures so that the machine can better distinguish the images.
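For illustration, such a preprocessing step might look as follows with the OpenCV Python bindings; the interpolation mode and the contrast/brightness values are assumptions of this sketch, not values from the patent:

import cv2

def load_and_enhance(path, size):
    img = cv2.imread(path)                                             # BGR, uint8
    img = cv2.resize(img, (size, size), interpolation=cv2.INTER_AREA)  # downsample to the current stage's resolution
    img = cv2.convertScaleAbs(img, alpha=1.2, beta=10)                 # simple contrast/brightness enhancement
    return img, 1                                                      # label 1 marks a real sample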
2. Construction of CSGAN model
First, the functions and principles of the network are analyzed; during this analysis it must be determined how the picture synthesis function is to be realized. Then a new generative adversarial network architecture is built with the PyTorch deep learning library: on the basis of the original generative adversarial network structure, a feature-extraction layer is added, the network is designed as a two-way structure, hidden variables of limited dimension are added to represent the features, and a generative adversarial network for image synthesis is built by combining the relevant characteristics of DRAGAN and InfoGAN; it is named CSGAN, as shown in Fig. 2. After the model structure is determined, the parameters in the generative adversarial network first need to be weight-normalized (Weight Normalization) so as to equalize the learning rate.
This generative adversarial network consists of two feature extractors E_A and E_B, two discrimination networks D_A and D_B, and a generation network G. The feature extractor consists of a number of convolutional layers (Convolutional Layer), as shown in Fig. 3; it extracts feature maps (Feature Maps) from a given picture and flattens (Flatten) them into a one-dimensional array, which is the required implicit vector. The generation network is composed of a number of deconvolution layers (Transposed Convolutional Layer), each consisting of a transposed convolution, a batch regularization layer, and an activation function, as shown in Fig. 4; it performs a deconvolution operation on the input vector and then, through the regularization layer and the activation function, regenerates a new picture, namely a generated sample (Generated Sample). The discrimination network is a simple fully connected network (Fully Connected Nets) consisting of X linear layers, X depending on the complexity of the task; its role is to discriminate whether a picture is authentic and contains the features of a certain class of images. During training, the discrimination network can use the Dropout parameter-regularization method, i.e., each time the network is used, some units in the neural layers are randomly and temporarily hidden.
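The following PyTorch sketch mirrors this composition for 128×128 RGB pictures; the channel widths, kernel sizes, layer counts, and the two-headed extractor output (which anticipates the sampling in step (5.3) below) are illustrative assumptions rather than values disclosed by the patent:

import torch.nn as nn

class FeatureExtractor(nn.Module):
    # Convolutional layers that map a 128x128 picture to two H-dimensional vectors,
    # the pair (a, b) from which the implicit vector is later sampled (step (5.3)).
    def __init__(self, h_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.SELU(),     # 128 -> 64
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.SELU(),   # 64 -> 32
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.SELU(),  # 32 -> 16
        )
        self.fc_a = nn.Linear(256 * 16 * 16, h_dim)
        self.fc_b = nn.Linear(256 * 16 * 16, h_dim)

    def forward(self, x):
        feat = self.conv(x).flatten(1)  # feature maps flattened into a one-dimensional array
        return self.fc_a(feat), self.fc_b(feat)

class Generator(nn.Module):
    # Transposed-convolution stack that decodes the 3H-dimensional vector into a picture.
    def __init__(self, in_dim=3 * 128):
        super().__init__()
        self.fc = nn.Linear(in_dim, 256 * 16 * 16)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.BatchNorm2d(128), nn.SELU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.BatchNorm2d(64), nn.SELU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),  # 16 -> 128
        )

    def forward(self, z):
        return self.deconv(self.fc(z).view(-1, 256, 16, 16))

class Discriminator(nn.Module):
    # A simple fully connected net with Dropout that scores flattened pictures (logits).
    def __init__(self, in_dim=3 * 128 * 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.SELU(), nn.Dropout(0.3),
            nn.Linear(512, 256), nn.SELU(), nn.Dropout(0.3),
            nn.Linear(256, 1),
        )

    def forward(self, x):
        return self.net(x.flatten(1))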
3. Network layer parameters of the CSGAN are initialized.
Perform the parameter-initialization operation on the convolutional layers of the network: the bias of each layer is set to 0, and the weights are initialized by the LeCun normal-distribution initialization method so that the nonlinear layer SELU can function normally.
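A sketch of this initialization; PyTorch has no built-in LeCun-normal initializer, so the fan-in computation is written out by hand for the layer types used here:

import torch.nn as nn

def lecun_normal_init(m):
    # Zero each layer's bias and draw weights from N(0, 1/fan_in), the LeCun normal scheme.
    if isinstance(m, nn.Linear):
        fan_in = m.in_features
    elif isinstance(m, nn.Conv2d):
        fan_in = m.in_channels * m.kernel_size[0] * m.kernel_size[1]
    else:
        return
    nn.init.normal_(m.weight, mean=0.0, std=fan_in ** -0.5)
    if m.bias is not None:
        nn.init.zeros_(m.bias)

# Usage: each sub-network can apply it with model.apply(lecun_normal_init).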
4. Select the loss function proposed in DRAGAN to calculate the reconstruction loss, the generation loss, and the discrimination loss; select the Adam optimization function to update the network parameters during back-propagation. Common loss functions include the binary cross-entropy loss, the mean square error, exponential loss functions, and range loss functions.
5. Training CSGAN model
First, the CSGAN model is trained by fixing the generation network parameters and training the discrimination network; the steps are as follows:
(5.1) According to the batch size m set in the hyper-parameters, sample randomly from the two sample sets, taking m portrait pictures x_A and m landscape pictures x_B as training samples; record the pictures in sample set A as {x_A^1, x_A^2, ……, x_A^m}, the i-th sample picture being denoted x_A^i, and the pictures in sample set B as {x_B^1, x_B^2, ……, x_B^m}, the i-th sample picture being denoted x_B^i;
(5.2) Fix the parameters of the generation network G, allow only the parameters of the discrimination network D to be updated, and zero the gradients of the discrimination network;
(5.3) Pass x_A^i and x_B^i into the feature extractors E_A and E_B respectively, reducing the dimensionality of the pictures and generating two groups of H-dimensional hidden vectors h_1 and h_2. The implicit vector is generated as follows: input a sample picture into the feature extractor, which outputs two H-dimensional vectors a and b; sample an H-dimensional noise z from a normal distribution; and calculate the final hidden vector h as:

h_i = a_i + e^{b_i} · z_i

where e denotes the base of the natural logarithm and i denotes the i-th dimension of the vector.
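A sketch of this sampling, assuming the extractor returns the pair (a, b) and reading b as a log standard deviation (both are assumptions of this example):

import torch

def sample_hidden(extractor, x):
    a, b = extractor(x)              # two H-dimensional outputs per picture
    z = torch.randn_like(a)          # H-dimensional noise from a normal distribution
    return a + torch.exp(b) * z      # h_i = a_i + e^{b_i} * z_i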
(5.4) Sample randomly from a standard Gaussian distribution, generating m H-dimensional Gaussian noise vectors z_noise;
(5.5) Concatenate z_noise, h_1, and h_2 into a 3H-dimensional vector, which is the input vector z_fake of the generation network;
(5.6) Set the labels of the real pictures x_A^i and x_B^i to 1 and pass them into the discrimination networks D_A and D_B respectively, where the discrimination loss of the real samples is calculated from the given discrimination scores using the binary cross-entropy loss function (Binary Cross-Entropy Loss); this loss is back-propagated to update the gradients of the discrimination network parameters;
(5.7) Input the vector z_fake into the generation network G, forging m generated samples x_fake;
(5.8) Set the labels of the pictures x_fake to 0 and pass them into the discrimination networks D_A and D_B, where the discrimination loss of the generated samples is calculated from the given scores using the binary cross-entropy loss function; the loss is back-propagated to update the discrimination network parameters. This process is formulated as:

θ_d ← θ_d + η·∇V(θ_d)

where η denotes the learning rate among the hyper-parameters, set to 0.0001 in the invention (although a lower learning rate slows the convergence of training, the quality of the training results is better); ∇V(θ_d) denotes the gradient with respect to the discriminator parameters; ∇ denotes the gradient operator, and θ_d is a custom symbol representing the discriminator parameters.
(5.9) Apply a gradient penalty (Gradient Penalty) to the discrimination loss. Specifically, a penalty term is added in the calculation of the discrimination loss; its calculation method is derived from DRAGAN and is as follows:

λ·E_{x~P_r, δ~N_d(0,cI)}[max(0, ||∇_x D_θ(x + δ)||_2 − 1)]

In the formula, P_r represents the distribution of the real sample data, from which x is sampled; λ is a coefficient controlling the strength of the gradient penalty; N_d is a normal distribution with mean 0 and variance cI, where c is set to 10; ∇_x D is the gradient of the discriminator; θ denotes the parameters of the discriminator. This penalty term has the same effect as the gradient-penalty terms of WGAN and other generative adversarial networks with gradient penalties: it limits the gradient of the discriminator to around k, which is set to 1 in the invention; among these alternatives, only the penalty term from DRAGAN gives the better effect in the invention. During training, the gradient of the discriminator oscillates near 1; the oscillation amplitude gradually decreases as training proceeds, the rate of decrease can be accelerated by adjusting the penalty strength λ, and at the end of training the discrimination gradient is finally limited to 1. Back-propagate the penalty term and update the gradients of the discrimination network parameters;
(5.10) Optimize the discrimination network using the Adam function.
Then fix the parameters of the discrimination networks and the feature extractors, and train the generation network; the steps are as follows:
(5.11) Fix the parameters of the discrimination network D and the feature extractor E, allow only the parameters of the generation network G to be updated, and zero the gradients of the generation network;
(5.12) Pass x_A^i and x_B^i into the feature extractors E_A and E_B respectively, regenerating two groups of H-dimensional hidden vectors h_1 and h_2, then concatenate z_noise, h_1, and h_2 into the input vector z_fake;
(5.13) Input the vector z_fake into the generation network G, forging m generated samples x_fake;
(5.14) Set the labels of the pictures x_fake to 1 and pass them into the discrimination networks D_A and D_B, where the generation loss L_G of the generated samples is calculated from the difference between the given discrimination scores and the labels using the binary cross-entropy loss function; back-propagate the loss and update the parameters θ_g. This process is formulated as:

θ_g ← θ_g − η·∇V(θ_g)

where ∇V(θ_g) denotes the gradient with respect to the generator parameters.
(5.15) Pass the pictures x_fake into the feature extractors E_A and E_B to obtain the implicit vectors h_3 and h_4. Compare h_3 and h_4 with the implicit vectors h_1 and h_2, and from the calculated differences compute the reconstruction loss L_recon of the implicit vectors using the mean square error (Mean-Square Error);
(5.16) Multiply the reconstruction loss L_recon by the feature weight ζ and add it to the generation loss L_G to obtain the final generation loss; back-propagate this loss and update the network parameters.
(5.17) According to the number of training periods M specified in the hyper-parameters, repeat steps (5.1) to (5.16) M times, printing the network's loss values every period and saving the generated samples and the model parameters every M/10 periods.
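Putting steps (5.1) to (5.17) together, one training period could be sketched as follows, reusing the helpers sketched earlier (sample_hidden, discrimination_loss) and PyTorch optimizers opt_d and opt_g; batching details, logging, and checkpointing are omitted, and the value of the feature weight is illustrative:

import torch
import torch.nn.functional as F

H, zeta = 128, 0.5  # hidden dimension and feature weight (illustrative values)

for x_a, x_b in zip(loader_A, loader_B):  # m-sized batches from sample sets A and B
    # (5.1)-(5.5): extract hidden vectors and assemble the 3H-dimensional generator input.
    with torch.no_grad():
        h1 = sample_hidden(Ex_A, x_a)
        h2 = sample_hidden(Ex_B, x_b)
        z_fake = torch.cat([torch.randn(x_a.size(0), H), h1, h2], dim=1)
    # (5.2)-(5.10): update the discriminators while the generator stays fixed.
    opt_d.zero_grad()
    x_fake = G(z_fake).detach()
    d_loss = discrimination_loss(D_A, x_a, x_fake) + discrimination_loss(D_B, x_b, x_fake)
    d_loss.backward()
    opt_d.step()
    # (5.11)-(5.14): update the generator; opt_g holds only G's parameters,
    # so the discriminators and the shared extractor layers stay fixed.
    opt_g.zero_grad()
    x_fake = G(z_fake)
    s_a, s_b = D_A(x_fake), D_B(x_fake)
    g_adv = (F.binary_cross_entropy_with_logits(s_a, torch.ones_like(s_a))
             + F.binary_cross_entropy_with_logits(s_b, torch.ones_like(s_b)))
    # (5.15)-(5.16): hidden-vector reconstruction loss, weighted by the feature weight.
    recon = (F.mse_loss(sample_hidden(Ex_A, x_fake), h1)
             + F.mse_loss(sample_hidden(Ex_B, x_fake), h2))
    (g_adv + zeta * recon).backward()
    opt_g.step()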
6. Adjust the hyper-parameters of the generative adversarial network.
There are many hyper-parameters in the network, such as the learning rate, picture batch size, number of training periods, target resolution, and starting resolution. Adjusting these parameters influences the results the network generates. This amounts to an optimization procedure: finding the optimal parameters that enable the network to generate the best results.
7. Others
The invention optimizes and improves the structure and details, adds some original designs to realize the target functions, slightly adjusts the training approach and learning process of the neural network, and adaptively changes the overall structure of the network. The specific changes are as follows:
(7.1) In the convolutional layers of CSGAN, the activation function is changed to SELU (Scaled Exponential Linear Unit), replacing the common ReLU and Leaky ReLU. Its operation formula is:

SELU(z) = λ·z, if z > 0; λ·α·(e^z − 1), if z ≤ 0

where α ≈ 1.6732632 and λ ≈ 1.050701; these values are strictly derived and are substituted directly into the formula. The benefits of using SELU are: its result does not lose the details of the region where z is less than 0, as ReLU does; SELU has a saturation region (Saturation Region); and the slope of SELU is greater than 1 in most regions, i.e., it amplifies the input data by a factor of 1.05070098, which can speed up training. It should be noted, however, that using SELU requires constraining the initialized values of the weights: their distribution should be normal, with mean 0 and variance 1. Compared with the batch-normalization method this is more stable, the results obtained are more accurate, and the convergence rate is also increased.
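A quick numerical check of this formula with PyTorch's built-in SELU:

import torch

selu = torch.nn.SELU()
x = torch.tensor([-2.0, 0.0, 2.0])
print(selu(x))  # tensor([-1.5202,  0.0000,  2.1014])
# Negative inputs saturate toward -lambda*alpha (about -1.7581);
# positive inputs are scaled by lambda (about 1.0507).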
(7.2) The optimization function of the network uses Adam, which is essentially a combination of Momentum and RMSProp together with a correction of its bias. The learning rate of the discrimination network (i.e., Adam's step size) is 0.0001, the learning rate of the generation network is 0.0005, the exponential decay rates of the moment estimates β_1 and β_2 are 0.5 and 0.999 respectively, and the remaining parameters keep their defaults.
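In PyTorch these settings correspond to the following configuration sketch (grouping both discriminators under one optimizer is an assumption of the sketch):

import torch

opt_d = torch.optim.Adam(
    list(D_A.parameters()) + list(D_B.parameters()),  # both discriminators together
    lr=0.0001, betas=(0.5, 0.999))
opt_g = torch.optim.Adam(G.parameters(), lr=0.0005, betas=(0.5, 0.999))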
(7.3) The extractor and the discriminator in the invention share parameters, and there are two methods of sharing: 1. the extractor directly serves as the convolutional layers of the discriminator and contains no independent network layers, so the discriminator only needs to contain fully connected layers; 2. the extractor and part of the network layers in the discriminator are the same, and each additionally contains some independent convolutional layers and a fully connected layer. The practical training process uses the first method, which has the advantages of fewer total network parameters and higher training speed. The purpose of this sharing is as follows: the extractor detects the features of pictures and represents them with implicit vectors, which amounts to giving each picture a "number" by which to identify it, pictures of the same class having similar "numbers". The discriminator then uses the same extractor to extract these features from a generated picture, represents them as implicit variables, and discriminates whether this implicit vector matches the features of a certain class of pictures.
(7.4) The input vector of the generation network in the invention is composed of two parts: an implicit vector that identifies the features, and Gaussian noise obtained by random sampling. The implicit vector represents the picture features contained in the two input pictures, while the Gaussian noise adds more diversity to the details of the generated pictures, making their content and presentation elements richer.
(7.5) The invention adds a hyper-parameter ζ, named the feature weight, whose value lies between 0 and 1; as the name implies, it regulates the relative influence of the two classes of sample features on the generated sample. According to actual needs, the size of ζ can be changed to regulate the content of the generated sample, and the value of the coefficient is given directly by the user. When ζ is larger, the features of one class of samples influence the generated sample more prominently while the influence of the other class's features is weakened, and vice versa; thus ζ can be changed as needed to obtain the desired generated sample. This is because, during training, this hyper-parameter weights the two discrimination losses of the generated samples, formulated as follows:

L_G = ζ·L_{D_A} + (1 − ζ)·L_{D_B}

where L_G denotes the generation loss and L_{D_A} and L_{D_B} denote the two discrimination losses. Depending on ζ, the generation network weighs the generation losses contributed by the features of the various picture classes, paying more attention to, and correcting, the discrimination loss with the larger coefficient while performing regret minimization.
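Under the weighting reconstructed above, a one-line helper (illustrative):

def weighted_generation_loss(loss_d_a, loss_d_b, zeta=0.5):
    # zeta in [0, 1] shifts the generator's attention between the two domains' losses.
    return zeta * loss_d_a + (1.0 - zeta) * loss_d_b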
What has been described above is only one embodiment of a picture synthesis method based on a generative adversarial network embodying the invention; the present invention is not limited to the above embodiment. The description is intended to be illustrative, not to limit the scope of the claims. Many alternatives, modifications, and variations will be apparent to those skilled in the art. All technical schemes formed by equivalent substitution or equivalent transformation fall within the protection scope of the invention.

Claims (3)

1. A picture synthesis method based on a generative adversarial network, characterized by comprising the following steps:
(1) Collect the pictures required for training and make two picture sample sets, one sample set consisting of landscape pictures and the other of portrait pictures; all pictures in each sample set should contain some identical characteristics, and such a set is called a domain;
(2) Construct and train the generative adversarial network, which consists of a feature extractor, a decoder, a generation network, and a discrimination network. The feature extractor scans and collects the detail information of the pictures, identifies the patterns in them, and takes the patterns shared across a sample set as the basic features of that set's pictures; it randomly extracts a picture from a domain, extracts the features contained in the picture, and represents them with a high-dimensional vector containing both the common features of the domain and the features unique to each picture. The decoder is used to pre-train the feature extractor: it generates a new picture from the high-dimensional vector the feature extractor obtained from a certain domain, and the more similar the generated picture is to the original, the more features of that domain's pictures have been retained, which ensures that the feature extractor extracts enough information from the original picture; the decoder is deleted once the training of the feature extractor is finished. The discrimination network judges whether the picture produced by the generation network contains the features of the two domains; it simultaneously learns the features of the real pictures in both domains, measures the gap between the real pictures and the generated pictures, and guides the adversarial network to generate pictures with the features of both domains. Parameters can be shared between the convolutional layers of the feature extractor and those of the discrimination network; during training, the shared parameters are updated together with the discriminator's parameters but remain unchanged while the extractor's parameters are updated. The function of the generation network is to generate a brand-new image from the given information, realistic enough that the discrimination network has difficulty discriminating it accurately; its input is an N-dimensional vector c composed of two hidden variables h extracted by the feature extractors and a randomly sampled Gaussian noise z, where the two hidden variables h, which have the same length, store the features extracted from the pictures of the two domains, and z adds more randomness so that the generated images are more diversified;
(3) Initialize the parameters of each network layer in the generative adversarial network;
(4) The total loss of the adversarial network consists of the generation loss, given by the discrimination network between the generated sample and the real sample, and the reconstruction loss of the hidden vector. The reconstruction loss detects whether the feature extractor can find a common feature vector between the original image and the generated image; the generation loss measures whether the generated image is realistic and contains the features of a certain domain. A loss function is selected according to the actual problem, the reconstruction loss, generation loss, and discrimination loss are then calculated with the selected loss function, and the gradients of the adversarial network are computed during back-propagation; at the same time, an optimization method is selected to update the parameters of the adversarial network, these parameters comprising the weights and biases of all network layers;
(5) Train the generative adversarial network, record the generation results, and save the parameters of the network, including the weights and biases of all layers and the overall structure of the network;
(6) Evaluate the generation results of the generative adversarial network and adjust the network hyper-parameters, which comprise: the network learning rate, the number of training periods, the implicit vector dimension, and the number of feature maps; then carry out multiple tests to finally obtain the required generation result.
2. The picture synthesis method based on a generative adversarial network according to claim 1, characterized in that: in step (3), in the generative adversarial network, the discrimination loss L_D consists of three parts: the discrimination loss of the real samples, the discrimination loss of the generated samples, and the loss obtained from the gradient penalty, corresponding respectively to the three expectation terms in the formula:

L_D = -E_{x~P_r}[log D_θ(x)] - E_{x~P_g}[log(1 - D_θ(x))] + ε·E_{x~P_r, δ~N_d(0,cI)}[max(0, ||∇_x D_θ(x + δ)||_2 - k)]

In the formula, P_r and P_g respectively denote the data distributions of the real samples and the generated samples; ε is a coefficient controlling the strength of the gradient penalty; k specifies the range within which the gradient is to be controlled, gradients larger or smaller than k incurring the gradient penalty; N_d is a normal distribution with mean 0 and variance cI, where c is set to 10; ∇_x D is the gradient of the discriminator; θ denotes the parameters of the discriminator;
the total loss L_G of the generation network consists of the discrimination loss between the score given by the discrimination network and the sample label, and the reconstruction loss L_recon of the hidden vector; the formulas are:

L_G = -E_{a~P_A, z~N(0,1)}[D_A(G(Ex_A(a), Ex_B(b), z))] - E_{b~P_B, z~N(0,1)}[D_B(G(Ex_A(a), Ex_B(b), z))] + λ·L_recon

L_recon = V(Ex_A(G(Ex_A(a), Ex_B(b), z)), Ex_A(a)) + V(Ex_B(G(Ex_A(a), Ex_B(b), z)), Ex_B(b))

where P denotes a sample distribution, A being the selected portrait sample set and B the landscape picture sample set; V denotes the reconstruction loss, computed as a mean square error; λ is the weight of the reconstruction loss within the generation loss; E denotes the computation of a mathematical expectation and Ex denotes an extractor; a and b denote samples drawn from the two distributions; ζ is a hyper-parameter named the feature weight ratio, with a value between 0 and 1, which, as the name implies, regulates the relative influence of the two classes of sample features on the generated sample; according to actual needs, the size of ζ can be changed to regulate the content of the generated sample, and the value of the coefficient is given directly by the user.
3. The picture synthesis method based on a generative adversarial network according to claim 1, characterized in that: the loss function is a binary cross-entropy loss, a mean square error, an exponential loss function, or a range loss function.
CN201811325648.0A 2018-11-08 2018-11-08 Picture synthesis method based on a generative adversarial network Active CN109447906B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811325648.0A 2018-11-08 2018-11-08 Picture synthesis method based on a generative adversarial network

Publications (2)

Publication Number Publication Date
CN109447906A (en) 2019-03-08
CN109447906B (en) 2023-07-11

Family

ID=65551957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811325648.0A (Active) Picture synthesis method based on a generative adversarial network 2018-11-08 2018-11-08

Country Status (1)

Country Link
CN (1) CN109447906B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110210556B (en) * 2019-05-29 2022-09-06 中国科学技术大学 Pedestrian re-identification data generation method
CN110175575A (en) * 2019-05-29 2019-08-27 南京邮电大学 A kind of single Attitude estimation method based on novel high-resolution network model
CN110263858B (en) * 2019-06-21 2022-05-06 华北电力大学(保定) Bolt image synthesis method and device and related equipment
CN110598786B (en) * 2019-09-09 2022-01-07 京东方科技集团股份有限公司 Neural network training method, semantic classification method and semantic classification device
CN111008692A (en) * 2019-11-08 2020-04-14 国网天津市电力公司 Method and device for generating multi-energy metering characteristic data based on improved generation countermeasure network
WO2021092686A1 (en) * 2019-11-15 2021-05-20 Modiface Inc. Image-to-image translation using unpaired data for supervised learning
CN110991496B (en) * 2019-11-15 2023-05-30 北京三快在线科技有限公司 Model training method and device
CN111160555B (en) * 2019-12-26 2023-12-01 北京迈格威科技有限公司 Processing method and device based on neural network and electronic equipment
CN111242133B (en) * 2020-01-14 2022-06-28 山东浪潮科学研究院有限公司 Method and system for generating correlation of object in image and GAN hidden layer unit
CN111564160B (en) * 2020-04-21 2022-10-18 重庆邮电大学 Voice noise reduction method based on AEWGAN
CN111797891A (en) * 2020-05-21 2020-10-20 南京大学 Unpaired heterogeneous face image generation method and device based on generation countermeasure network
SG10202005064VA (en) * 2020-05-29 2021-12-30 Yitu Pte Ltd A decoder training method, a high-resolution face image generation method, a device and a computer device
CN111784565B (en) * 2020-07-01 2021-10-29 北京字节跳动网络技术有限公司 Image processing method, migration model training method, device, medium and equipment
SG10202006360VA (en) * 2020-07-01 2021-01-28 Yitu Pte Ltd Image generation method and device based on neural network
US20220405634A1 (en) * 2021-06-16 2022-12-22 Moxa Inc. Device of Handling Domain-Agnostic Meta-Learning
CN114092610B (en) * 2021-11-22 2023-04-07 哈尔滨工业大学(深圳) Character video generation method based on generation of confrontation network
CN117730309A (en) * 2022-05-31 2024-03-19 小米科技(武汉)有限公司 Model determination method, layout generation method, device, medium and chip
CN116822623B (en) * 2023-08-29 2024-01-12 苏州浪潮智能科技有限公司 Method, device, equipment and storage medium for generating countermeasures network joint training

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107437077A (en) * 2017-08-04 2017-12-05 深圳市唯特视科技有限公司 A kind of method that rotation face based on generation confrontation network represents study
CN108280885A (en) * 2018-01-09 2018-07-13 上海大学 The holographic idol method of structure
CN108288072A (en) * 2018-01-26 2018-07-17 深圳市唯特视科技有限公司 A kind of facial expression synthetic method based on generation confrontation network
CN108510435A (en) * 2018-03-28 2018-09-07 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN108711138A (en) * 2018-06-06 2018-10-26 北京印刷学院 A kind of gray scale picture colorization method based on generation confrontation network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9971958B2 (en) * 2016-06-01 2018-05-15 Mitsubishi Electric Research Laboratories, Inc. Method and system for generating multimodal digital images


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DualGAN: Unsupervised Dual Learning for Image-to-Image Translation; Zili Yi et al.; 2017 IEEE International Conference on Computer Vision; 2017-12-25; pp. 2868-2876 *
Hand-drawn image retrieval based on conditional generative adversarial networks (基于条件生成对抗网络的手绘图像检索); Liu Yujie et al.; Journal of Computer-Aided Design & Computer Graphics; 2017-12-15; No. 12; pp. 2336-2342 *
Image style transfer based on generative adversarial networks (基于生成对抗网络的图片风格迁移); Xu Zhehao et al.; Software Guide (软件导刊); 2018-05-29; No. 06; pp. 207-209, 212 *

Also Published As

Publication number Publication date
CN109447906A (en) 2019-03-08

Similar Documents

Publication Title
CN109447906B (en) Picture synthesis method based on generation countermeasure network
CN108711138B (en) Gray level picture colorizing method based on generation countermeasure network
CN111563841B (en) High-resolution image generation method based on generation countermeasure network
CN112465111B (en) Three-dimensional voxel image segmentation method based on knowledge distillation and countermeasure training
CN108717568B (en) A kind of image characteristics extraction and training method based on Three dimensional convolution neural network
Liang et al. Understanding mixup training methods
CN111242841B (en) Image background style migration method based on semantic segmentation and deep learning
CN108171266A (en) A kind of learning method of multiple target depth convolution production confrontation network model
CN107239514A (en) A kind of plants identification method and system based on convolutional neural networks
CN111161137B (en) Multi-style Chinese painting flower generation method based on neural network
CN107977629A (en) A kind of facial image aging synthetic method of feature based separation confrontation network
CN107437077A (en) A kind of method that rotation face based on generation confrontation network represents study
CN110009057A (en) A kind of graphical verification code recognition methods based on deep learning
CN111282267A (en) Information processing method, information processing apparatus, information processing medium, and electronic device
CN110728629A (en) Image set enhancement method for resisting attack
CN110176050B (en) Aesthetic optimization method for text generated image
CN109635653A (en) A kind of plants identification method
Dogan et al. Semi-supervised image attribute editing using generative adversarial networks
CN105447566B (en) Training device, training method and detection device
CN110102051A (en) The plug-in detection method and device of game
CN113505855A (en) Training method for anti-attack model
CN112241741A (en) Self-adaptive image attribute editing model and method based on classified countermeasure network
CN114332565A (en) Method for generating image by generating confrontation network text based on distribution estimation condition
CN116958712B (en) Image generation method, system, medium and device based on prior probability distribution
CN112084936B (en) Face image preprocessing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant