CN112802048A - Method and device for generating layer generation countermeasure network with asymmetric structure - Google Patents

Method and device for generating layer generation countermeasure network with asymmetric structure

Info

Publication number
CN112802048A
CN112802048A (application CN202110120086.1A)
Authority
CN
China
Prior art keywords
image
foreground
layer
network
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110120086.1A
Other languages
Chinese (zh)
Other versions
CN112802048B (en)
Inventor
季向阳 (Ji Xiangyang)
杨宇 (Yang Yu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202110120086.1A
Publication of CN112802048A
Application granted
Publication of CN112802048B
Active legal status
Anticipated expiration legal status


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/2132 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for generating a layered generative adversarial network with an asymmetric structure. A layered generative adversarial network with an asymmetric structure is trained, a large number of images carrying foreground-mask pseudo-labels are generated with this network, and the images are used as a training data set for a segmentation network, thereby realizing unsupervised object segmentation. The method solves the problem that training of a layered adversarial generation network easily falls into a degenerate solution, so that the network can effectively generate layers. At the same time, the foreground mask serves as a segmentation pseudo-label of the image and is used to train a segmenter; no additional manual labels are needed in the whole process, which greatly reduces the time and cost of preparing data.

Description

Method and device for generating a layered generative adversarial network with an asymmetric structure
Technical Field
The invention relates to the technical fields of pattern recognition, computer vision and machine learning, and in particular to a method and a device for generating a layered generative adversarial network with an asymmetric structure.
Background
The generative adversarial network (GAN) is a basic and important technology at the intersection of deep learning, machine learning and computer vision. A GAN generally includes a generator and a discriminator. The generator converts random vectors drawn from a standard distribution (such as a normal distribution) into specific data samples (such as images), while the discriminator distinguishes the generated data samples from real data samples. The generator and the discriminator are updated alternately and iteratively in an adversarial manner: the discriminator gradually improves its discrimination ability, and correspondingly the data samples that the generator produces to fool the discriminator become more and more realistic. The final generator can generate pictures so realistic that the discriminator is essentially unable to distinguish them from real data.
On the one hand, a GAN can generate high-quality synthetic images, which makes it an important tool for numerous applications such as data generation, data conversion and image editing. On the other hand, training a GAN requires no manual supervision, which makes it an important method for unsupervised learning, weakly supervised learning and semi-supervised learning.
A layered GAN differs from an ordinary GAN mainly in the generator. The generator of an ordinary GAN maps a random vector directly to a generated image, whereas the generator of a layered GAN maps the random vector to image layers, such as a foreground object image, a foreground object mask and a background image, and then composites these layers to obtain the final generated image. A naive layered GAN has an important defect: it very easily falls into a degenerate solution during training, i.e. all layers collapse onto a single layer, so that one layer directly generates the whole picture and the corresponding foreground object mask becomes all zeros or all ones.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, an object of the present invention is to provide a method for generating a layered GAN with an asymmetric structure, which solves the problem that training of a layered GAN easily falls into a degenerate solution, so that the layered GAN can effectively generate layers.
Another object of the present invention is to provide an apparatus for generating a layered GAN with an asymmetric structure.
In order to achieve the above object, an embodiment of the present invention provides a method for generating a layered GAN with an asymmetric structure, including:
inputting a continuous random variable into the background generator of an asymmetric layer generator, and outputting a background image;
inputting the continuous random variable, a first discrete variable and a second discrete variable into the foreground generator of the asymmetric layer generator, and outputting a foreground image and a foreground mask;
perturbing the foreground image and the foreground mask through a layer perturber, and compositing the background image, the perturbed foreground image and the perturbed foreground mask to obtain a generated image;
inputting the generated image and a real image into a discriminator at the same time, obtaining the discriminator loss from the adversarial learning loss function, and training the discriminator;
inputting the generated image into the discriminator and computing the adversarial learning loss, pseudo-classifying the generated image and the generated foreground mask with auxiliary classifiers, computing cross entropies from the first discrete variable, the second discrete variable and the outputs of the auxiliary classifiers to obtain the generator loss, and training the generator;
repeating the alternate training of the discriminator and the generator to obtain the trained layered GAN.
In the method for generating a layered GAN with an asymmetric structure provided by the embodiment of the invention, perturbation introduced when compositing the layers prevents the all-ones foreground-mask degenerate solution, and the asymmetric structure prevents the all-zeros foreground-mask degenerate solution. A well-trained layered GAN can generate a large number of realistic images, and these images carry a layered representation that includes the foreground object mask. This further provides a solution for unsupervised object segmentation: using the generated data, the foreground object masks are regarded as segmentation labels and a segmentation network is trained, yielding an effective segmenter.
In addition, the method for generating a layered GAN with an asymmetric structure according to the above embodiment of the present invention may further have the following additional technical features:
further, in an embodiment of the present invention, the method further includes:
and generating a plurality of images with foreground masks by using the trained image layer confrontation network, training a segmentation network by using the images with the plurality of foreground masks, and segmenting the object by using the trained segmentation network.
Further, in an embodiment of the present invention, the first discrete variable and the second discrete variable are related through a hierarchical relationship, and the first discrete variable can be derived from the second discrete variable; the first discrete variable represents the category of shape and pose characteristics of the foreground object and is reflected in the foreground mask, while the second discrete variable represents the specific appearance style of the foreground object and is reflected in the foreground image.
Further, in an embodiment of the present invention, applying a perturbation to the foreground image and the foreground mask through the layer perturber includes:
applying a perturbation to the position, size and angle of the foreground layer:
$\tilde{x}(u, v) = x(u', v')$
$\begin{pmatrix} u' \\ v' \end{pmatrix} = s \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} \begin{pmatrix} u - \Delta u \\ v - \Delta v \end{pmatrix}$
where x is the foreground layer (a foreground image or a foreground mask), u and v are pixel coordinates, α is the perturbation angle, Δu and Δv are the perturbation offsets, and s is the perturbation scale factor.
Further, in an embodiment of the present invention, compositing the background image, the perturbed foreground image and the perturbed foreground mask to obtain a generated image includes:
$x_g = \hat{m} \odot \hat{x}_f + (1 - \hat{m}) \odot x_b$
where $x_g$ is the generated image, $x_b$ is the background image, $x_f$ is the foreground image, $\hat{m}$ is the perturbed foreground mask and $\hat{x}_f$ is the perturbed foreground image.
Further, in an embodiment of the present invention, the loss function of the layered GAN is:
$\min_{G, Q_c, Q_p} \max_{D} \; V(D, G) + \lambda_{MI,c} V_{MI,c}(G, Q_c) + \lambda_{MI,p} V_{MI,p}(G, Q_p) + \lambda_{bin} V_{bin}(G)$
where V(D, G) is the adversarial learning loss term, λ_MI,c is the weight of the second mutual-information loss term V_MI,c(G, Q_c), λ_MI,p is the weight of the first mutual-information loss term V_MI,p(G, Q_p), and λ_bin is the weight of the binarization loss term V_bin(G).
In order to achieve the above object, an embodiment of another aspect of the present invention provides an apparatus for generating a layered GAN with an asymmetric structure, including:
a background generation module, configured to input a continuous random variable into the background generator of an asymmetric layer generator and output a background image;
a foreground generation module, configured to input the continuous random variable, a first discrete variable and a second discrete variable into the foreground generator of the asymmetric layer generator and output a foreground image and a foreground mask;
a perturbation module, configured to apply a perturbation to the foreground image and the foreground mask through a layer perturber, and to composite the background image, the perturbed foreground image and the perturbed foreground mask to obtain a generated image;
a processing module, configured to judge the authenticity of the generated image with a discriminator to obtain an adversarial learning loss; to pseudo-classify the generated image with an auxiliary image classifier and compute a cross entropy from the second discrete variable and the class output by the auxiliary image classifier; to pseudo-classify the foreground mask with an auxiliary mask classifier and compute a cross entropy from the first discrete variable and the class output by the auxiliary mask classifier; and to sum the three terms with weights to obtain the generator loss;
a training module, configured to alternately train the discriminator and the generator according to the loss function of the layered GAN to obtain the trained layered GAN.
With the apparatus for generating a layered GAN with an asymmetric structure provided by the embodiment of the invention, perturbation introduced when compositing the layers prevents the all-ones foreground-mask degenerate solution, and the asymmetric structure prevents the all-zeros foreground-mask degenerate solution. A well-trained layered GAN can generate a large number of realistic images that carry a layered representation including the foreground object mask. This further provides a solution for unsupervised object segmentation: using the generated data, the foreground object masks are treated as segmentation labels and a segmentation network is trained, yielding an effective segmenter.
In addition, the apparatus for generating a layered GAN with an asymmetric structure according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, the apparatus further includes:
a segmentation module, configured to generate a plurality of images with foreground masks using the trained layered GAN, train a segmentation network with these images, and segment objects with the trained segmentation network.
Further, in an embodiment of the present invention, the first discrete variable and the second discrete variable are related through a hierarchical relationship, and the first discrete variable can be derived from the second discrete variable; the first discrete variable represents the category of shape and pose characteristics of the foreground object and is reflected in the foreground mask, while the second discrete variable represents the specific appearance style of the foreground object and is reflected in the foreground image.
Further, in an embodiment of the present invention, compositing the background image, the perturbed foreground image and the perturbed foreground mask to obtain a generated image includes:
$x_g = \hat{m} \odot \hat{x}_f + (1 - \hat{m}) \odot x_b$
where $x_g$ is the generated image, $x_b$ is the background image, $x_f$ is the foreground image, $\hat{m}$ is the perturbed foreground mask and $\hat{x}_f$ is the perturbed foreground image.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart of a method for generating a layered generative adversarial network with an asymmetric structure according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for generating a layered generative adversarial network with an asymmetric structure according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of segmentation network training according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the degenerate-solution collapse avoided by the asymmetric layered generative adversarial network according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an asymmetric layered generative adversarial network according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the segmentation network according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an apparatus for generating a layered generative adversarial network with an asymmetric structure according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The following describes a method and an apparatus for generating a layered GAN with an asymmetric structure according to embodiments of the present invention with reference to the drawings.
First, the method for generating a layered GAN with an asymmetric structure according to an embodiment of the present invention will be described with reference to the drawings.
Fig. 1 is a flowchart of a method for generating a layered GAN with an asymmetric structure according to an embodiment of the present invention.
Fig. 2 is a flowchart of a method for generating a layered GAN with an asymmetric structure according to an embodiment of the present invention.
As shown in fig. 1 and fig. 2, the method for generating a layered GAN with an asymmetric structure includes the following steps:
and step S1, inputting the continuous random variable into a background generator of the asymmetric layer generator, and outputting to obtain a background image.
And step S2, inputting the continuous random variable, the first discrete variable and the second discrete variable into a foreground generator of the asymmetric image layer generator, and outputting to obtain a foreground image and a foreground mask.
The asymmetric layered GAN comprises the asymmetric layer generator, a layer perturber, a discriminator and auxiliary classifiers. The inputs to the asymmetric layer generator are a random variable z, a first discrete variable c_p and a second discrete variable c_c, and its output is a set of layers including a background image, a foreground mask and a foreground image. The layer perturber applies a small perturbation to characteristics of the foreground layer such as its position, size and angle; this module is introduced to prevent degenerate solutions from occurring during training.
Specifically, the asymmetric layer generator includes a foreground generator and a background generator; because their structures differ, their functions are not interchangeable, which is why the generator is called asymmetric. The input to the background generator is a continuous random variable z and the output is a background image. The input to the foreground generator is the continuous random variable z shared with the background generator together with its exclusive discrete random variables c_p and c_c, and the output is the foreground image and the foreground mask. Here c_p and c_c are related through a hierarchical relationship, i.e. c_p can be derived from c_c. The variable c_p represents the category of shape and pose characteristics of the foreground object and is reflected in the foreground mask, while c_c represents the specific appearance style of the foreground object and is reflected in the generated foreground image. The fact that the input to the foreground generator contains exclusive variables can prevent the degenerate solution from occurring.
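To make the asymmetric structure concrete, the following is a minimal sketch of the two generators, assuming PyTorch; the layer widths, latent dimensions, output resolution and the names G_b/G_f are illustrative assumptions rather than values taken from the patent. The background generator receives only the shared continuous code z, while the foreground generator additionally receives the exclusive discrete codes c_p and c_c and produces both a foreground image and a foreground mask.

```python
import torch
import torch.nn as nn

def up_block(c_in, c_out):
    """Upsample by 2 followed by a 3x3 convolution."""
    return nn.Sequential(nn.Upsample(scale_factor=2),
                         nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.ReLU(inplace=True))

class BackgroundGenerator(nn.Module):
    """x_b = G_b(z): depends only on the shared continuous variable z."""
    def __init__(self, z_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256 * 4 * 4), nn.ReLU(inplace=True),
            nn.Unflatten(1, (256, 4, 4)),
            up_block(256, 128), up_block(128, 64), up_block(64, 32),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())

    def forward(self, z):
        return self.net(z)

class ForegroundGenerator(nn.Module):
    """(x_f, m_f) = G_f(z, c_p, c_c): shares z with G_b but also takes the exclusive codes."""
    def __init__(self, z_dim=128, cp_dim=10, cc_dim=30):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(z_dim + cp_dim + cc_dim, 256 * 4 * 4), nn.ReLU(inplace=True),
            nn.Unflatten(1, (256, 4, 4)),
            up_block(256, 128), up_block(128, 64), up_block(64, 32))
        self.to_image = nn.Sequential(nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())
        self.to_mask = nn.Sequential(nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, z, c_p, c_c):
        h = self.backbone(torch.cat([z, c_p, c_c], dim=1))
        return self.to_image(h), self.to_mask(h)   # foreground image x_f, foreground mask m_f
```

A hierarchical sampling scheme for the discrete codes could, for example, draw c_c from 30 appearance styles and derive c_p as the shape class that groups every few styles; the exact cardinalities and grouping are not specified here and are purely illustrative.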
In step S3, the foreground image and the foreground mask are perturbed by the layer perturber, and the background image, the perturbed foreground image and the perturbed foreground mask are composited to obtain a generated image.
Specifically, the foreground layer passes through the layer perturber, which applies a small perturbation to its position, size, angle and other characteristics, that is,
$\tilde{x}(u, v) = x(u', v')$
$\begin{pmatrix} u' \\ v' \end{pmatrix} = s \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} \begin{pmatrix} u - \Delta u \\ v - \Delta v \end{pmatrix}$
where α is the perturbation angle, Δu and Δv are the perturbation offsets, and s is the perturbation scale factor; during training they are sampled uniformly from fixed intervals. The perturbations are generally small so as not to affect the realism of the final composited image. The perturber also helps prevent degenerate solutions from occurring.
The perturbed layers are then composited to obtain the generated image:
$x_g = \hat{m} \odot \hat{x}_f + (1 - \hat{m}) \odot x_b$
where $x_b$ is the background image, $\hat{x}_f$ is the perturbed foreground image and $\hat{m}$ is the perturbed foreground mask.
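A minimal sketch of the layer perturber and the compositing step follows; the use of PyTorch's affine_grid/grid_sample, the sampling ranges, and the exact parameterization of the shift, rotation and scale are illustrative assumptions rather than the patent's own formulation.

```python
import math
import torch
import torch.nn.functional as F

def perturb_layers(layers, max_shift=0.1, max_angle=10.0, scale_range=(0.9, 1.1)):
    """Apply one small random similarity transform (shift, rotation, scale) per example
    to a stack of foreground layers (image channels and mask concatenated)."""
    b = layers.size(0)
    angle = (torch.rand(b) * 2 - 1) * math.radians(max_angle)
    shift = (torch.rand(b, 2) * 2 - 1) * max_shift
    scale = torch.empty(b).uniform_(*scale_range)
    cos, sin = torch.cos(angle) * scale, torch.sin(angle) * scale
    theta = torch.stack([torch.stack([cos, -sin, shift[:, 0]], dim=1),
                         torch.stack([sin,  cos, shift[:, 1]], dim=1)], dim=1)
    grid = F.affine_grid(theta.to(layers.device), layers.size(), align_corners=False)
    return F.grid_sample(layers, grid, align_corners=False, padding_mode="zeros")

def compose(x_b, x_f, m_f):
    """Perturb foreground image and mask with the same transform, then alpha-composite."""
    perturbed = perturb_layers(torch.cat([x_f, m_f], dim=1))
    xf_hat, m_hat = perturbed[:, :-1], perturbed[:, -1:]
    x_g = m_hat * xf_hat + (1 - m_hat) * x_b      # alpha-composite foreground over background
    return x_g, m_hat
```

Zero padding outside the sampled region leaves the mask zero there, so the background shows through where the foreground layer has been shifted away; whether the patent handles the border differently is not specified here.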
In step S4, the generated image and a real image are input into the discriminator at the same time, the discriminator loss is obtained from the adversarial learning loss function, and the discriminator is trained.
In step S5, the generated image is input into the discriminator and the adversarial learning loss is computed; the auxiliary classifiers pseudo-classify the generated image and the generated foreground mask, and cross entropies are computed from the first discrete variable, the second discrete variable and the outputs of the auxiliary classifiers to obtain the generator loss, with which the generator is trained.
In step S6, the alternate training of the discriminator and the generator is repeated to obtain the trained layered GAN.
The layers are composited into the final generated image, which is sent to the discriminator to judge its authenticity. During training the discriminator also receives real images and compares them with the generated images, so that the generated images become more and more realistic. In addition, the discriminator has an extra branch that performs a pseudo-classification, used to maximize the mutual information between the generated image and c_c. Furthermore, an auxiliary classifier performs a pseudo-classification of the generated foreground mask, further preventing degenerate solutions.
Specifically, the generated image is sent to the discriminator D to discriminate its authenticity. Besides the authenticity branch, the discriminator has an auxiliary branch that pseudo-classifies the generated image. Computing the cross entropy between the c_c used during generation and the class distribution output by this branch gives the loss function of this branch:
$V_{MI,c}(G, Q_c) = \mathbb{E}\left[-\log Q_c(c_c \mid x_g)\right]$
In addition, the auxiliary classifier D_m introduces a pseudo-classification of the foreground mask; similarly, its loss function is:
$V_{MI,p}(G, Q_p) = \mathbb{E}\left[-\log Q_p(c_p \mid \hat{m})\right]$
the learning problem of the whole map layer generating countermeasure network can be summarized according to the mode of countermeasure learning as follows:
$\min_{G, Q_c, Q_p} \max_{D} \; V(D, G) + \lambda_{MI,c} V_{MI,c}(G, Q_c) + \lambda_{MI,p} V_{MI,p}(G, Q_p) + \lambda_{bin} V_{bin}(G)$
where
$V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[f(D(x))\right] + \mathbb{E}\left[f(-D(x_g))\right]$
is the adversarial learning loss function and f is a concave function; depending on the particular adversarial generation network, the hinge loss, i.e. $f(t) = -\max(0, 1-t)$, may for example be chosen. The last term, $V_{bin}(G)$, is a binarization loss used to make the foreground mask as binary as possible.
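The alternating update described above can be sketched as follows, assuming PyTorch. The hinge form of the adversarial loss matches the example given in the text, but the two-output interface of the discriminator, the min(m, 1 - m) form of the binarization penalty and all loss weights are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def discriminator_step(D, x_real, x_fake, opt_d):
    """Hinge adversarial loss for the discriminator; D(x) is assumed to return
    (real/fake score, class logits for c_c)."""
    opt_d.zero_grad()
    loss_d = (F.relu(1 - D(x_real)[0]).mean() +
              F.relu(1 + D(x_fake.detach())[0]).mean())
    loss_d.backward()
    opt_d.step()
    return loss_d.item()

def generator_step(D, D_m, x_fake, m_fake, cc_idx, cp_idx, opt_g,
                   lam_c=1.0, lam_p=1.0, lam_bin=1.0):
    """Generator loss = adversarial term + cross-entropy terms for c_c (on the image)
    and c_p (on the mask) + a penalty pushing the mask values toward 0 or 1."""
    opt_g.zero_grad()
    score, cc_logits = D(x_fake)                    # image head predicts c_c
    _, cp_logits = D_m(m_fake)                      # mask classifier predicts c_p
    loss_g = (-score.mean()
              + lam_c * F.cross_entropy(cc_logits, cc_idx)
              + lam_p * F.cross_entropy(cp_logits, cp_idx)
              + lam_bin * torch.minimum(m_fake, 1 - m_fake).mean())
    loss_g.backward()
    opt_g.step()
    return loss_g.item()
```

Calling discriminator_step and generator_step in turn on each batch reproduces the alternate training of step S6.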
Further, in an embodiment of the present invention, the method further includes:
and generating a plurality of images with foreground masks by using the trained image layer confrontation network, training a segmentation network by using the images with the plurality of foreground masks, and segmenting the object by using the trained segmentation network.
It will be appreciated that, after training of the layered GAN is completed, it can be used to generate a large number of images with foreground masks. These images are then used to train a segmentation network in the usual supervised fashion, thereby achieving object segmentation.
Specifically, as shown in fig. 3, the generator of the trained layered GAN synthesizes images with foreground masks, and the foreground masks are used as segmentation pseudo-labels of these images for training the segmenter. On the basis of the layered GAN, the unsupervised object segmentation problem is thus solved: model training for object segmentation no longer depends on labeled data, which greatly reduces the time and cost of preparing data.
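As a sketch of how the pseudo-labels might be used: the helper names G_b, G_f and compose refer to the illustrative sketches above, the 0.5 binarization threshold and the BCE objective are assumptions, and the segmentation network can be any standard fully convolutional model.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def synthesize_pseudo_labeled_batch(G_b, G_f, batch=16, z_dim=128, n_cc=30, group=3):
    """Sample latent codes, render a generated image and a binary foreground-mask pseudo-label."""
    z = torch.randn(batch, z_dim)
    cc_idx = torch.randint(0, n_cc, (batch,))
    cp_idx = cc_idx // group                               # hierarchical: c_p derived from c_c
    c_c = F.one_hot(cc_idx, n_cc).float()
    c_p = F.one_hot(cp_idx, n_cc // group).float()
    x_f, m_f = G_f(z, c_p, c_c)
    x_g, m_hat = compose(G_b(z), x_f, m_f)
    return x_g, (m_hat > 0.5).float()                      # image + pseudo-label

def train_segmenter(seg_net, G_b, G_f, opt, steps=10_000):
    """Ordinary supervised-style training of the segmentation network on generated data only."""
    for _ in range(steps):
        x, y = synthesize_pseudo_labeled_batch(G_b, G_f)
        loss = F.binary_cross_entropy_with_logits(seg_net(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The whole pipeline therefore never touches a manually annotated mask: the only supervision signal for the segmenter comes from the generated layers.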
A layered GAN very easily falls into a degenerate solution during training, i.e. the multiple layers collapse into a single layer and the corresponding foreground mask becomes all zeros or all ones. On the one hand, embodiments of the invention introduce perturbations when compositing the layers to prevent the all-ones foreground-mask degenerate solution. On the other hand, embodiments of the invention provide an asymmetric structure to prevent the all-zeros foreground-mask degenerate solution. Specifically, generation of the background layer depends only on the shared variable, while generation of the foreground layer depends on both the shared variable and the exclusive variables; finally, the generated image is required to have high mutual information with the exclusive variables. This asymmetric structure effectively prevents the all-zeros foreground-mask degenerate solution.
As shown in fig. 4, the asymmetric layered GAN provided by the invention effectively avoids falling into degenerate solutions during training. The principle is as follows: if the all-ones foreground-mask degenerate solution occurs, the composited images will have abrupt, unrealistic boundaries because of the layer perturber, which is penalized by the discriminator; if the all-zeros foreground-mask degenerate solution occurs, the generated image contains no information about the exclusive variable c, so the mutual information between the generated image and c is low, which is penalized by the mutual-information loss function. For the normal solution, the generated image contains the information of c and the mutual information between the generated image and c is maximized.
Fig. 5 shows a network structure of the layer generator, the discriminator and the auxiliary classifier at 128 × 128 resolution. All convolutional layers of the generator use 3 × 3 kernels, while all convolutional layers of the discriminator use 4 × 4 kernels. The discriminator has two head branches that respectively realize real/fake discrimination and mutual-information estimation; the auxiliary classifier keeps only the mutual-information-estimation head.
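The two-head discriminator and the auxiliary classifier described for fig. 5 could be sketched as follows; the 4 × 4 convolutions and the two heads follow the description above, while the channel widths, pooling, class counts and the shared class name are illustrative assumptions.

```python
import torch.nn as nn

class TwoHeadDiscriminator(nn.Module):
    """4x4-conv backbone with a real/fake head and a mutual-information (classification) head.
    The auxiliary mask classifier uses the same layout but keeps only the classification head."""
    def __init__(self, in_ch=3, n_classes=30, with_adv_head=True):
        super().__init__()
        widths = [in_ch, 64, 128, 256, 512]
        blocks = []
        for c_in, c_out in zip(widths[:-1], widths[1:]):
            blocks += [nn.Conv2d(c_in, c_out, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
        self.backbone = nn.Sequential(*blocks, nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.adv_head = nn.Linear(512, 1) if with_adv_head else None
        self.cls_head = nn.Linear(512, n_classes)

    def forward(self, x):
        h = self.backbone(x)
        score = self.adv_head(h) if self.adv_head is not None else None
        return score, self.cls_head(h)

# Illustrative instantiation: an image discriminator predicting c_c, and an auxiliary
# mask classifier predicting c_p, with no real/fake head.
D = TwoHeadDiscriminator(in_ch=3, n_classes=30)
D_m = TwoHeadDiscriminator(in_ch=1, n_classes=10, with_adv_head=False)
```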
As shown in fig. 6, the unsupervised object segmentation method does not rely on manual labeling, so it places few constraints on data collection and greatly reduces the time and cost of preparing data. The segmentation network can be trained and tested at any resolution, such as 128 × 128 or 64 × 64, and all of its convolutional layers use 3 × 3 kernels.
According to the method for generating a layered GAN with an asymmetric structure provided by the embodiment of the invention, perturbation introduced when compositing the layers prevents the all-ones foreground-mask degenerate solution, and the asymmetric structure prevents the all-zeros foreground-mask degenerate solution. A well-trained layered GAN can generate a large number of realistic images that carry a layered representation including the foreground object mask. This further provides a solution for unsupervised object segmentation: using the generated data, the foreground object masks are regarded as segmentation labels and a segmentation network is trained, yielding an effective segmenter.
Next, an apparatus for generating a layered GAN with an asymmetric structure according to an embodiment of the present invention will be described with reference to the drawings.
Fig. 7 is a schematic structural diagram of an apparatus for generating a layered GAN with an asymmetric structure according to an embodiment of the present invention.
As shown in fig. 7, the apparatus for generating a layered GAN with an asymmetric structure includes: a background generation module 701, a foreground generation module 702, a perturbation module 703, a processing module 704 and a training module 705.
The background generation module 701 is configured to input a continuous random variable into the background generator of the asymmetric layer generator and output a background image.
The foreground generation module 702 is configured to input the continuous random variable, a first discrete variable and a second discrete variable into the foreground generator of the asymmetric layer generator and output a foreground image and a foreground mask.
The perturbation module 703 is configured to apply a perturbation to the foreground image and the foreground mask through the layer perturber, and to composite the background image, the perturbed foreground image and the perturbed foreground mask to obtain a generated image.
The processing module 704 is configured to judge the authenticity of the generated image with the discriminator to obtain an adversarial learning loss; to pseudo-classify the generated image with the auxiliary image classifier and compute a cross entropy from the second discrete variable and the class output by the auxiliary image classifier; to pseudo-classify the foreground mask with the auxiliary mask classifier and compute a cross entropy from the first discrete variable and the class output by the auxiliary mask classifier; and to sum the three terms with weights to obtain the generator loss.
The training module 705 is configured to alternately train the discriminator and the generator according to the loss function of the layered GAN to obtain the trained layered GAN.
Further, in an embodiment of the present invention, the apparatus further includes:
a segmentation module, configured to generate a plurality of images with foreground masks using the trained layered GAN, train a segmentation network with these images, and segment objects with the trained segmentation network.
Further, in an embodiment of the present invention, the first discrete variable and the second discrete variable are related through a hierarchical relationship, and the first discrete variable can be derived from the second discrete variable; the first discrete variable represents the category of shape and pose characteristics of the foreground object and is reflected in the foreground mask, while the second discrete variable represents the specific appearance style of the foreground object and is reflected in the foreground image.
Further, in an embodiment of the present invention, compositing the background image, the perturbed foreground image and the perturbed foreground mask to obtain a generated image includes:
$x_g = \hat{m} \odot \hat{x}_f + (1 - \hat{m}) \odot x_b$
where $x_g$ is the generated image, $x_b$ is the background image, $x_f$ is the foreground image, $\hat{m}$ is the perturbed foreground mask and $\hat{x}_f$ is the perturbed foreground image.
It should be noted that the foregoing explanation of the method embodiment is also applicable to the apparatus of this embodiment, and is not repeated herein.
According to the apparatus for generating a layered GAN with an asymmetric structure provided by the embodiment of the invention, perturbation introduced when compositing the layers prevents the all-ones foreground-mask degenerate solution, and the asymmetric structure prevents the all-zeros foreground-mask degenerate solution. A well-trained layered GAN can generate a large number of realistic images that carry a layered representation including the foreground object mask. This further provides a solution for unsupervised object segmentation: using the generated data, the foreground object masks are regarded as segmentation labels and a segmentation network is trained, yielding an effective segmenter.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A method for generating a layered generative adversarial network with an asymmetric structure, characterized by comprising the following steps:
inputting a continuous random variable into the background generator of an asymmetric layer generator, and outputting a background image;
inputting the continuous random variable, a first discrete variable and a second discrete variable into the foreground generator of the asymmetric layer generator, and outputting a foreground image and a foreground mask;
perturbing the foreground image and the foreground mask through a layer perturber, and compositing the background image, the perturbed foreground image and the perturbed foreground mask to obtain a generated image;
inputting the generated image and a real image into a discriminator at the same time, obtaining the discriminator loss from the adversarial learning loss function, and training the discriminator;
inputting the generated image into the discriminator and computing the adversarial learning loss, pseudo-classifying the generated image and the generated foreground mask with auxiliary classifiers, computing cross entropies from the first discrete variable, the second discrete variable and the outputs of the auxiliary classifiers to obtain the generator loss, and training the generator;
repeating the alternate training of the discriminator and the generator to obtain the trained layered generative adversarial network.
2. The method of claim 1, further comprising:
generating a plurality of images with foreground masks using the trained layered generative adversarial network, training a segmentation network with these images, and segmenting objects with the trained segmentation network.
3. The method according to claim 1, wherein the first discrete variable and the second discrete variable are related through a hierarchical relationship, the first discrete variable being derivable from the second discrete variable; the first discrete variable represents the category of shape and pose characteristics of the foreground object and is reflected in the foreground mask, and the second discrete variable represents the specific appearance style of the foreground object and is reflected in the foreground image.
4. The method of claim 1, wherein applying a perturbation to the foreground image and the foreground mask through the layer perturber comprises:
applying a perturbation to the position, size and angle of the foreground layer:
$\tilde{x}(u, v) = x(u', v')$
$\begin{pmatrix} u' \\ v' \end{pmatrix} = s \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} \begin{pmatrix} u - \Delta u \\ v - \Delta v \end{pmatrix}$
where x is the foreground layer (a foreground image or a foreground mask), u and v are pixel coordinates, α is the perturbation angle, Δu and Δv are the perturbation offsets, and s is the perturbation scale factor.
5. The method of claim 1, wherein compositing the background image, the perturbed foreground image and the perturbed foreground mask to obtain a generated image comprises:
$x_g = \hat{m} \odot \hat{x}_f + (1 - \hat{m}) \odot x_b$
where $x_g$ is the generated image, $x_b$ is the background image, $x_f$ is the foreground image, $\hat{m}$ is the perturbed foreground mask and $\hat{x}_f$ is the perturbed foreground image.
6. The method of claim 1, wherein the loss function of the layered generative adversarial network is:
$\min_{G, Q_c, Q_p} \max_{D} \; V(D, G) + \lambda_{MI,c} V_{MI,c}(G, Q_c) + \lambda_{MI,p} V_{MI,p}(G, Q_p) + \lambda_{bin} V_{bin}(G)$
where V(D, G) is the adversarial learning loss term, λ_MI,c is the weight of the second mutual-information loss term V_MI,c(G, Q_c), λ_MI,p is the weight of the first mutual-information loss term V_MI,p(G, Q_p), and λ_bin is the weight of the binarization loss term V_bin(G).
7. An apparatus for generating a layered generative adversarial network with an asymmetric structure, comprising:
a background generation module, configured to input a continuous random variable into the background generator of an asymmetric layer generator and output a background image;
a foreground generation module, configured to input the continuous random variable, a first discrete variable and a second discrete variable into the foreground generator of the asymmetric layer generator and output a foreground image and a foreground mask;
a perturbation module, configured to apply a perturbation to the foreground image and the foreground mask through a layer perturber, and to composite the background image, the perturbed foreground image and the perturbed foreground mask to obtain a generated image;
a processing module, configured to judge the authenticity of the generated image with a discriminator to obtain an adversarial learning loss; to pseudo-classify the generated image with an auxiliary image classifier and compute a cross entropy from the second discrete variable and the class output by the auxiliary image classifier; to pseudo-classify the foreground mask with an auxiliary mask classifier and compute a cross entropy from the first discrete variable and the class output by the auxiliary mask classifier; and to sum the three terms with weights to obtain the generator loss;
a training module, configured to alternately train the discriminator and the generator according to the loss function of the layered generative adversarial network to obtain the trained layered generative adversarial network.
8. The apparatus of claim 7, further comprising:
a segmentation module, configured to generate a plurality of images with foreground masks using the trained layered generative adversarial network, train a segmentation network with these images, and segment objects with the trained segmentation network.
9. The apparatus according to claim 7, wherein the first discrete variable and the second discrete variable are related through a hierarchical relationship, the first discrete variable being derivable from the second discrete variable; the first discrete variable represents the category of shape and pose characteristics of the foreground object and is reflected in the foreground mask, and the second discrete variable represents the specific appearance style of the foreground object and is reflected in the foreground image.
10. The apparatus of claim 7, wherein compositing the background image, the perturbed foreground image and the perturbed foreground mask to obtain a generated image comprises:
$x_g = \hat{m} \odot \hat{x}_f + (1 - \hat{m}) \odot x_b$
where $x_g$ is the generated image, $x_b$ is the background image, $x_f$ is the foreground image, $\hat{m}$ is the perturbed foreground mask and $\hat{x}_f$ is the perturbed foreground image.
CN202110120086.1A 2021-01-28 2021-01-28 Method and device for generating layer generation countermeasure network with asymmetric structure Active CN112802048B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110120086.1A CN112802048B (en) 2021-01-28 2021-01-28 Method and device for generating layer generation countermeasure network with asymmetric structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110120086.1A CN112802048B (en) 2021-01-28 2021-01-28 Method and device for generating layer generation countermeasure network with asymmetric structure

Publications (2)

Publication Number Publication Date
CN112802048A true CN112802048A (en) 2021-05-14
CN112802048B CN112802048B (en) 2022-09-09

Family

ID=75812603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110120086.1A Active CN112802048B (en) 2021-01-28 2021-01-28 Method and device for generating layer generation countermeasure network with asymmetric structure

Country Status (1)

Country Link
CN (1) CN112802048B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114820685A (en) * 2022-04-24 2022-07-29 清华大学 Generation method and device for generating countermeasure network by independent layer
CN114900586A (en) * 2022-04-28 2022-08-12 中国人民武装警察部队工程大学 Information steganography method and device based on DCGAN

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180260957A1 (en) * 2017-03-08 2018-09-13 Siemens Healthcare Gmbh Automatic Liver Segmentation Using Adversarial Image-to-Image Network
CN108665414A (en) * 2018-05-10 2018-10-16 上海交通大学 Natural scene picture generation method
US20190251401A1 (en) * 2018-02-15 2019-08-15 Adobe Inc. Image composites using a generative adversarial neural network
CN110443203A (en) * 2019-08-07 2019-11-12 中新国际联合研究院 The face fraud detection system counter sample generating method of network is generated based on confrontation

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180260957A1 (en) * 2017-03-08 2018-09-13 Siemens Healthcare Gmbh Automatic Liver Segmentation Using Adversarial Image-to-Image Network
US20190251401A1 (en) * 2018-02-15 2019-08-15 Adobe Inc. Image composites using a generative adversarial neural network
CN108665414A (en) * 2018-05-10 2018-10-16 上海交通大学 Natural scene picture generation method
CN110443203A (en) * 2019-08-07 2019-11-12 中新国际联合研究院 The face fraud detection system counter sample generating method of network is generated based on confrontation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DUC MINH VO et al.: "Paired-D GAN for Semantic Image Synthesis", Computer Vision - ACCV 2018, Pt IV *
SIDDHARTH PANDEY et al.: "An image augmentation approach using two-stage generative adversarial network for nuclei image segmentation", Biomedical Signal Processing and Control *
王晓红 (Wang Xiaohong) et al.: "Stylized calligraphy image generation based on generative adversarial networks", 《包装工程》 (Packaging Engineering) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114820685A (en) * 2022-04-24 2022-07-29 清华大学 Generation method and device for generating countermeasure network by independent layer
CN114820685B (en) * 2022-04-24 2023-01-31 清华大学 Generation method and device for generating countermeasure network by independent layer
CN114900586A (en) * 2022-04-28 2022-08-12 中国人民武装警察部队工程大学 Information steganography method and device based on DCGAN
CN114900586B (en) * 2022-04-28 2024-04-16 中国人民武装警察部队工程大学 Information steganography method and device based on DCGAN

Also Published As

Publication number Publication date
CN112802048B (en) 2022-09-09

Similar Documents

Publication Publication Date Title
CN110135366B (en) Shielded pedestrian re-identification method based on multi-scale generation countermeasure network
Anwar et al. Image colorization: A survey and dataset
Baldassarre et al. Deep koalarization: Image colorization using cnns and inception-resnet-v2
CN111340122B (en) Multi-modal feature fusion text-guided image restoration method
CN111612807B (en) Small target image segmentation method based on scale and edge information
Bakkay et al. BSCGAN: Deep background subtraction with conditional generative adversarial networks
CN109919013A (en) Method for detecting human face and device in video image based on deep learning
Akey Sungheetha Classification of remote sensing image scenes using double feature extraction hybrid deep learning approach
CN112802048B (en) Method and device for generating layer generation countermeasure network with asymmetric structure
CN113379771B (en) Hierarchical human body analysis semantic segmentation method with edge constraint
Singh et al. Steganalysis of digital images using deep fractal network
CN113344110B (en) Fuzzy image classification method based on super-resolution reconstruction
Rios et al. Feature visualization for 3D point cloud autoencoders
CN115439694A (en) High-precision point cloud completion method and device based on deep learning
Pan et al. Residual meshnet: Learning to deform meshes for single-view 3d reconstruction
Ling et al. Re-visiting discriminator for blind free-viewpoint image quality assessment
CN111652240A (en) Image local feature detection and description method based on CNN
Bounsaythip et al. Genetic algorithms in image processing-a review
CN114331946A (en) Image data processing method, device and medium
CN114187506B (en) Remote sensing image scene classification method of viewpoint-aware dynamic routing capsule network
CN113705358B (en) Multi-angle side face normalization method based on feature mapping
CN117094895B (en) Image panorama stitching method and system
CN112668662A (en) Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN117011515A (en) Interactive image segmentation model based on attention mechanism and segmentation method thereof
CN111191729A (en) Three-dimensional object fusion feature representation method based on multi-modal feature fusion

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Ji Xiangyang

Inventor after: Yang Yu

Inventor after: Zou Qiran

Inventor before: Ji Xiangyang

Inventor before: Yang Yu

CB03 Change of inventor or designer information
GR01 Patent grant
GR01 Patent grant