CN115424119B - Image generation training method and device capable of explaining GAN based on semantic fractal - Google Patents

Image generation training method and device capable of explaining GAN based on semantic fractal

Info

Publication number
CN115424119B
CN115424119B · CN202211373030.8A
Authority
CN
China
Prior art keywords
fractal
gan
semantic
interpretable
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211373030.8A
Other languages
Chinese (zh)
Other versions
CN115424119A (en)
Inventor
李超
王劲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202211373030.8A
Publication of CN115424119A
Application granted
Publication of CN115424119B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an image generation training method and device for an interpretable GAN based on semantic fractals, in which a conventional GAN model is modified into an interpretable GAN so that its high-level feature representations are clear and consistent. In the interpretable GAN, all filters in the same layer "learn" to be activated by the same part, so that the features inside the GAN represent clear and consistent semantic information. The method designs a fractal loss function that constrains the feature representations of the GAN using only ordinary training samples and no additional annotation, so that the GAN automatically learns to attend to the important parts of an object. By simultaneously optimizing the original GAN loss function and the fractal loss function, the interpretability of the GAN's internal features is improved while the quality of the pictures generated by the GAN is guaranteed.

Description

Image generation training method and device for interpretable GAN based on semantic fractal
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an image generation training method and device for an interpretable GAN based on semantic fractals.
Background
In recent years, with the rise and development of Generative Adversarial Networks (GANs), it has become possible to generate high-fidelity and diverse pictures directly from noise. A GAN consists of a generator (G), whose goal is to map random noise to samples, and a discriminator (D), whose purpose is to distinguish real samples from generated samples. The training goal of a GAN is to find a Nash equilibrium between the training processes of D and G. Traditional GAN research has focused on the poor generation quality of the original GAN, and previous improvements fall into two main categories. The first is modification of the GAN model structure: for example, SAGAN introduces an attention mechanism and ProGAN generates pictures progressively; such structural improvements greatly increase the resolution and fidelity of GAN-generated pictures. The second is improvement of the training process: without auxiliary training tricks, the original GAN training process is sensitive and requires careful fine-tuning and hyper-parameter adjustment to remain stable, so improving the stability of the original GAN training process is an important research direction, spanning both theoretical exploration and empirical attempts, such as modifying the GAN loss function, adding a gradient penalty term, or using normalization techniques to limit updates of the discriminator parameters.
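The original GAN objective referred to above, in which G and D play a minimax game whose solution is the Nash equilibrium mentioned in the text, is commonly written as:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr]
+ \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The loss-function modifications cited above (gradient penalties, normalization of D) adjust this objective or constrain its optimization to stabilize training.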
However, although the pictures generated by GANs are now comparable to real pictures, the mechanism by which a GAN model generates pictures internally remains a "black box", and research on GAN interpretability is increasingly drawing the attention of researchers.
On the one hand, the content of a picture generated by a GAN is uncertain: one cannot relate the input random noise to the content of the generated picture. To address this, InterfaceGAN proposes performing a linear transformation on the input random noise to apply a specific semantic edit to the generated picture. InfoGAN maximizes the mutual information between the random noise and the attributes of the generated picture, so that a given dimension of the noise controls a corresponding attribute of the picture. GANspace uses principal component analysis to find directions in the noise space or feature space that carry semantic information. On the other hand, the mechanism by which a GAN generates pictures internally is unknown, and the features of its middle layers are often complex and disordered. In particular, for a given high-level convolutional layer, its filters may be activated by different object parts. For example, when generating a picture of a human face, filter A may be activated by the eyes and nose, filter B by both ears, and filter C by the mouth; the high-level feature representation thus becomes complex, fuzzy, and inconsistent. However, there is relatively little interpretability research on the internal features of GANs, which poses a significant challenge for their further development.
Disclosure of Invention
In order to overcome the defects of the prior art, automatically attend to the important parts of an object in an image, improve the interpretability of internal image features, and guarantee the quality of the generated images, the invention adopts the following technical scheme:
an image generation training method capable of explaining GAN based on semantic fractal comprises the following steps:
s1, generating a countermeasure network based on loss function training through an image training set to obtain a generator of the countermeasure network;
s2, acquiring a first feature map output by the image generator in the middle layer, and performing semantic fractal on the first feature map;
s3, constructing fractal loss for the middle-layer characteristics of the generation countermeasure network according to the semantic fractal result, and comprising the following steps:
s3.1, obtaining a fractal result of the first feature graph divided based on a semantic fractal rule;
s3.2, acquiring a second feature map output by the image generator in the middle layer, and taking the matching degree of the second feature map and the fractal result as fractal loss so as to enable the fractal of the second feature map to be close to the first feature map;
and S4, generating the countermeasure network through joint optimization based on the fractal loss and the original loss function of the generated countermeasure network, and training image generation.
Further, step S2 comprises the following steps:
S2.1, designing the semantic fractal rule of the feature map: treating each fractal as a class, dividing the feature map into several non-overlapping fractals based on the principles of minimizing intra-class variance and maximizing inter-class distance;
S2.2, iteratively dividing the first feature map according to the fractal rule.
Further, in step S2, the first feature map is an activated feature map, and the fractal rule divides the feature map into several fractals according to the distribution of its activation values.
Further, in step S2.2, each iteration selects, among the current fractals, a fractal with a large area and a large variance of activation values, and splits it into two according to the principle of minimizing intra-class variance.
Further, the split into two according to the principle of minimizing intra-class variance uses the following formula:

min ( w_1 σ_1² + w_2 σ_2² )

where σ_1² and σ_2² respectively denote the variances of the activation values of the two sub-fractals after the split, and w_1 and w_2 respectively denote the area ratios of the two sub-fractals after the split.
Further, the fractal result in step S3.1 includes the fractal positions of the first feature map under the fractal rule; in step S3.2, the degree of match between the second feature map and the fractal positions of the first feature map is computed as the fractal loss function, so that all filters in the same middle layer of the generator are activated by the same part.
Further, in step S4, the generator is re-initialized with the parameters saved in step S1 and produces generator images from the training-set images; a randomly initialized discriminator distinguishes the generator images from the training-set images; and the generative adversarial network is fine-tuned with the loss function of step S4. During training, the fractal loss and the original loss function of the adversarial network alternately optimize the network parameters, so that convolution kernels in the same layer express consistent features; this explains the generation logic of the generator and realizes the construction of an interpretable GAN.
In order to train the GAN end-to-end so that it automatically learns feature maps with consistent fractals, the fractal loss function (frac loss) is added on top of the original GAN loss function, and the network parameters are alternately optimized between the two loss functions, encouraging all feature maps to share a consistent fractal while guaranteeing picture generation quality.
An image generation training device for an interpretable GAN based on semantic fractals comprises a memory and one or more processors, wherein the memory stores executable code, and the one or more processors, when executing the executable code, implement the above image generation training method for an interpretable GAN based on semantic fractals.
An image generation method for an interpretable GAN based on semantic fractals, wherein the image to be generated is produced through the trained generator.
An image generation device for an interpretable GAN based on semantic fractals comprises a memory and one or more processors, wherein the memory stores executable code, and the one or more processors, when executing the executable code, implement the above image generation method for an interpretable GAN based on semantic fractals.
The invention has the advantages and beneficial effects that:
The invention addresses the problem that the features of the middle layers of a GAN are complex and chaotic, making the filters in the same layer represent clear and consistent features. By designing a fractal loss function, the filters of the same layer can be activated by the same part without any extra manual labels. By simultaneously optimizing the original GAN loss function and the fractal loss function, the interpretability of the GAN's internal features is improved while the quality of the pictures generated by the GAN is guaranteed. The proposed method can be widely applied to various GAN models and has promising application prospects.
Drawings
Fig. 1 is a flowchart of the image generation training method for an interpretable GAN based on semantic fractals in an embodiment of the present invention.
FIG. 2 is a flowchart of the method for computing the semantic fractal of middle-layer features in a generative adversarial network in an embodiment of the present invention.
FIG. 3 is a flowchart of the method for constructing the semantic fractal constraint on middle-layer features in a generative adversarial network in an embodiment of the present invention.
Fig. 4 is an overview of the structure of the generative adversarial network in an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of the image generation device for an interpretable GAN based on semantic fractals in an embodiment of the present invention.
Detailed Description
The following describes embodiments of the invention in detail with reference to the accompanying drawings. It should be understood that the detailed description and specific examples are given by way of illustration and explanation only, and are not limiting.
The present invention aims to modify a conventional GAN model into an interpretable GAN whose high-level feature representations are unambiguous and consistent. In an interpretable GAN, all filters in the same layer "learn" to be activated by the same part, so that the features inside the GAN represent clear and consistent semantic information.
The key idea of the invention is to make the GAN automatically learn the important parts of an object of interest without additional input or labeling, using only ordinary training samples, such that attention is consistent across the different filters of the same high layer, i.e. the filters of the same layer are activated by the same parts.
The hardware used in the embodiment is a workstation capable of deep learning, and the auxiliary software is the deep learning training framework PyTorch.
Taking DCGAN as an example, the image generation training method for an interpretable GAN based on semantic fractals provided by the invention, as shown in fig. 1 and fig. 2, comprises the following steps:
S1, training a generative adversarial network on an image training set with its loss function to obtain the generator of the adversarial network;
In the embodiment of the invention, the FFHQ data set is selected, a DCGAN is trained using only the original DCGAN loss function, and the model parameters of the generator (G) are saved;
S2, acquiring a first feature map output by a middle layer of the generator, and performing semantic fractal division on the first feature map;
In the embodiment of the invention, the input map is divided iteratively according to the fractal rule. The selected generator layer outputs feature maps of size 32 × 32 activated by the ReLU function, and semantic fractal processing is performed on these feature maps. Specifically, as shown in fig. 3, the generator has n layers in total, of which the m-th layer is the semantic fractal layer; the semantic fractal division comprises the following steps:
s2.1, designing a semantic fractal rule of the feature map; dividing the feature graph into a plurality of fractal by taking each fractal as a class based on the principles of minimizing intra-class variance and maximizing inter-class distance, wherein the fractal is not overlapped;
the first characteristic diagram is an activated characteristic diagram, and the fractal rule is to divide the characteristic diagram into a plurality of fractal according to the distribution of the activation values of the characteristic diagram.
In the embodiment of the invention, a feature diagram A is randomly selected from output feature diagrams of a target convolutional layer, and a fractal rule is designed;
the semantic fractal of the feature graph means that the feature graph is divided according to a certain rule, and a complete feature graph is divided into a plurality of parts, namely a fractal, wherein the fractal does not overlap with each other;
in order to divide a feature graph into a plurality of fractal according to the distribution of the activation value of the feature graph, each fractal is regarded as a class, and then the feature graph is divided into a plurality of fractal on the basis of the principle of minimizing intra-class variance and maximizing inter-class distance.
S2.2, iteratively dividing the first feature map according to the fractal rule;
In each iteration, a fractal with a large area and a large variance of activation values is selected among the current fractals and split into two according to the principle of minimizing intra-class variance, i.e. by finding the split position that minimizes

w_1 σ_1² + w_2 σ_2²

where σ_1² and σ_2² respectively denote the variances of the activation values of the two sub-fractals after the split, and w_1 and w_2 respectively denote the area ratios of the two sub-fractals after the split.
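As an illustrative sketch rather than the patent's reference implementation, the area-weighted intra-class-variance split described above can be computed by an exhaustive Otsu-style search over a one-dimensional ordering of the activation values; the function name and the reduction to a scalar threshold search are assumptions:

```python
import numpy as np

def best_split(values):
    """Split a set of activation values into two groups at the threshold
    that minimizes the area-weighted intra-class variance
    w1*var1 + w2*var2 (an Otsu-style exhaustive search).
    Returns (threshold, cost)."""
    v = np.sort(np.asarray(values, dtype=float).ravel())
    n = v.size
    best_t, best_cost = None, np.inf
    for k in range(1, n):                      # candidate split: v[:k] | v[k:]
        low, high = v[:k], v[k:]
        cost = (k / n) * low.var() + ((n - k) / n) * high.var()
        if cost < best_cost:
            best_cost = cost
            best_t = 0.5 * (v[k - 1] + v[k])   # midpoint between the groups
    return best_t, best_cost
```

For a 2-D fractal region, the same criterion would be evaluated over the candidate spatial division positions instead of scalar thresholds, but the weighted-variance cost is identical.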
Step S3, constructing a fractal loss for the middle-layer features of the generative adversarial network according to the semantic fractal result; specifically, as shown in fig. 4, this comprises the following steps:
S3.1, obtaining the fractal result of dividing the first feature map according to the semantic fractal rule; the fractal result includes the fractal positions of the first feature map under the fractal rule;
In the embodiment of the invention, the fractal positions P_A of feature map A under the fractal rule are saved.
S3.2, acquiring a second feature map output by the same middle layer of the generator, and taking the degree of match between the second feature map and the fractal result as the fractal loss, so that the fractal of the second feature map approaches that of the first feature map;
The degree of match between the second feature map and the fractal positions of the first feature map is computed as the fractal loss function, so that all filters in the same layer of the generator are activated by the same part.
In the embodiment of the invention, a feature map B is selected from the output feature maps of the target convolutional layer, and the degree of match between feature map B and the fractal positions P_A is computed as the fractal loss function (frac loss):
L_frac(B, P_A) = Σ_i w_i σ_i²(B)

where i is the fractal index of P_A, σ_i²(B) denotes the variance of the activation values of feature map B inside the i-th fractal, and w_i denotes that fractal's area ratio. Through the fractal loss function (frac loss), the fractal of feature map B is encouraged to approach that of feature map A, yielding the feature-map fractal constraint and achieving the goal of having all filters in the same layer activated by the same part.
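A minimal numerical sketch of this match computation follows, assuming the loss takes the area-weighted within-fractal variance form (the patent's exact expression is rendered only as an image in the source); the function name `frac_loss` and the encoding of P_A as an integer label map are illustrative:

```python
import numpy as np

def frac_loss(feat_b, fractal_labels):
    """Area-weighted within-fractal variance of feature map B under the
    fractal positions P_A, encoded as an integer label map of the same
    shape. A low value means B's activations are nearly uniform inside
    each fractal of A, i.e. B's fractal structure matches A's."""
    feat_b = np.asarray(feat_b, dtype=float)
    labels = np.asarray(fractal_labels)
    n = labels.size
    total = 0.0
    for i in np.unique(labels):
        region = feat_b[labels == i]           # activations inside fractal i
        total += (region.size / n) * region.var()
    return total
```

A feature map that is constant within each fractal of P_A gives zero loss, while one whose structure cuts across the fractals gives a positive loss, which is the property the constraint exploits.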
S4, jointly optimizing the generative adversarial network based on the fractal loss and the original loss function of the network, and training image generation;
In order to train the GAN end-to-end so that it automatically learns feature maps with consistent fractals, the fractal loss function (frac loss) is added on top of the original GAN loss function, and the network parameters are alternately optimized between the two loss functions, encouraging all feature maps to share a consistent fractal while guaranteeing picture generation quality.
In the embodiment of the invention, the fractal loss function (frac loss) is added on top of the DCGAN loss function. Specifically, the generator (G) is re-initialized with the parameters saved in step S1 and produces generator images from the training-set images; a randomly initialized discriminator (D) distinguishes the generator images from the training-set images; the generative adversarial network (DCGAN) is then fine-tuned with the loss function of step S4. During training, the fractal loss and the original loss function of the adversarial network alternately optimize the network parameters, so that convolution kernels in the same layer express consistent features; this explains the generation logic of the generator (G) and realizes an interpretable DCGAN.
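The alternating optimization scheme can be sketched abstractly as follows; `grad_gan` and `grad_frac` are stand-ins for autograd gradients of the DCGAN loss and the frac loss with respect to the generator parameters, and the toy quadratic gradients below are for illustration only:

```python
import numpy as np

def alternate_optimize(theta, grad_gan, grad_frac, lr=0.1, steps=200):
    """Alternate gradient steps between two objectives: even steps
    descend the (stand-in) GAN loss, odd steps the (stand-in) frac loss,
    mirroring the alternating fine-tuning of step S4."""
    theta = np.asarray(theta, dtype=float)
    for t in range(steps):
        g = grad_gan(theta) if t % 2 == 0 else grad_frac(theta)
        theta = theta - lr * g
    return theta

# Toy quadratic objectives sharing a minimum at (1, 2); in the patent
# these gradients would come from backpropagation through the generator.
grad_gan = lambda th: 2.0 * (th - np.array([1.0, 2.0]))
grad_frac = lambda th: 4.0 * (th - np.array([1.0, 2.0]))
theta = alternate_optimize(np.zeros(2), grad_gan, grad_frac)
```

When the two objectives are compatible, as the patent argues they are for generation quality and fractal consistency, the alternating iterates settle near a point that is good for both.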
An image generation training device for an interpretable GAN based on semantic fractals comprises a memory and one or more processors, wherein the memory stores executable code, and the one or more processors, when executing the executable code, implement the above image generation training method for an interpretable GAN based on semantic fractals.
An image generation method for an interpretable GAN based on semantic fractals produces the image to be generated through the trained generator.
Referring to fig. 5, an image generation device for an interpretable GAN based on semantic fractals according to an embodiment of the present invention comprises a memory and one or more processors, wherein the memory stores executable code, and the one or more processors, when executing the executable code, implement the image generation method for an interpretable GAN based on semantic fractals of the above embodiment.
The embodiment of the image generation device for an interpretable GAN based on semantic fractals can be applied to any device with data processing capability, such as a computer. The device embodiments may be implemented by software, by hardware, or by a combination of the two. Taking a software implementation as an example, as a logical device it is formed by the processor of the device reading the corresponding computer program instructions from non-volatile storage into memory and running them. In terms of hardware, fig. 5 shows a hardware structure diagram of a device with data processing capability on which the image generation device of the invention is located; besides the processor, memory, network interface, and non-volatile storage shown in fig. 5, the device may also include other hardware according to its actual function, which is not described again here.
The specific details of the implementation process of the functions and actions of each unit in the above device are the implementation processes of the corresponding steps in the above method, and are not described herein again.
Since the device embodiment basically corresponds to the method embodiment, reference may be made to the corresponding parts of the method embodiment for relevant details. The device embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of the invention, which one of ordinary skill in the art can understand and implement without inventive effort.
An embodiment of the present invention further provides a computer-readable storage medium, on which a program is stored, and when the program is executed by a processor, the method for generating an image based on semantic fractal interpretable GAN in the above embodiments is implemented.
The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any data processing capability device described in any of the foregoing embodiments. The computer readable storage medium may also be any external storage device of a device with data processing capabilities, such as a plug-in hard disk, a Smart Media Card (SMC), an SD Card, a Flash memory Card (Flash Card), etc. provided on the device. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of any data processing capable device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the arbitrary data processing-capable device, and may also be used for temporarily storing data that has been output or is to be output.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. An image generation training method for an interpretable GAN based on semantic fractals, characterized by comprising the following steps:
S1, training a generative adversarial network on an image training set with its loss function to obtain the generator of the adversarial network;
S2, acquiring a first feature map output by a middle layer of the generator, and performing semantic fractal division on the first feature map, comprising the following steps:
S2.1, designing the semantic fractal rule of the feature map: treating each fractal as a class, dividing the feature map into several non-overlapping fractals based on the principles of minimizing intra-class variance and maximizing inter-class distance;
S2.2, iteratively dividing the first feature map according to the fractal rule;
S3, constructing a fractal loss for the middle-layer features of the generative adversarial network according to the semantic fractal result, comprising the following steps:
S3.1, obtaining the fractal result of dividing the first feature map according to the semantic fractal rule;
S3.2, acquiring a second feature map output by the same middle layer of the generator, and taking the degree of match between the second feature map and the fractal result as the fractal loss, so that the fractal of the second feature map approaches that of the first feature map;
S4, jointly optimizing the generative adversarial network based on the fractal loss and the original loss function of the network, and training image generation.
2. The image generation training method for an interpretable GAN based on semantic fractals according to claim 1, wherein: in step S2, the first feature map is an activated feature map, and the fractal rule divides the feature map into several fractals according to the distribution of its activation values.
3. The image generation training method for an interpretable GAN based on semantic fractals according to claim 2, wherein: in step S2.2, each iteration selects, among the current fractals, a fractal with a large area and a large variance of activation values, and splits it into two according to the principle of minimizing intra-class variance.
4. The image generation training method for an interpretable GAN based on semantic fractals according to claim 3, wherein the split into two according to the principle of minimizing intra-class variance uses the following formula:

min ( w_1 σ_1² + w_2 σ_2² )

where σ_1² and σ_2² respectively denote the variances of the activation values of the two sub-fractals after the split, and w_1 and w_2 respectively denote the area ratios of the two sub-fractals after the split.
5. The image generation training method for an interpretable GAN based on semantic fractals according to claim 1, wherein: the fractal result in step S3.1 includes the fractal positions of the first feature map under the fractal rule; in step S3.2, the degree of match between the second feature map and the fractal positions of the first feature map is computed as the fractal loss, so that all filters in the same middle layer of the generator are activated by the same part.
6. The image generation training method for an interpretable GAN based on semantic fractals according to claim 1, wherein: in step S4, the generator is re-initialized with the parameters saved in step S1 and produces generator images from the training-set images; a randomly initialized discriminator distinguishes the generator images from the training-set images; and the generative adversarial network is fine-tuned with the loss function of step S4. During training, the fractal loss and the original loss function of the generative adversarial network alternately optimize the network parameters, so that convolution kernels in the same layer express consistent features, thereby explaining the generation logic of the generator.
7. An image generation training device based on semantic fractal interpretable GAN, comprising a memory and one or more processors, wherein the memory stores executable code, and the one or more processors execute the executable code to implement the image generation training method based on semantic fractal interpretable GAN as claimed in any one of claims 1 to 6.
8. The image generation method based on semantic fractal interpretable GAN according to claim 1, wherein: the image to be generated is generated by the trained generator.
9. An image generation device based on semantic fractal interpretable GAN, comprising a memory and one or more processors, wherein the memory stores executable code, and the one or more processors execute the executable code to implement the image generation method based on semantic fractal interpretable GAN as claimed in claim 8.
CN202211373030.8A 2022-11-04 2022-11-04 Image generation training method and device capable of explaining GAN based on semantic fractal Active CN115424119B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211373030.8A CN115424119B (en) 2022-11-04 2022-11-04 Image generation training method and device capable of explaining GAN based on semantic fractal

Publications (2)

Publication Number Publication Date
CN115424119A CN115424119A (en) 2022-12-02
CN115424119B true CN115424119B (en) 2023-03-24

Family

ID=84207824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211373030.8A Active CN115424119B (en) 2022-11-04 2022-11-04 Image generation training method and device capable of explaining GAN based on semantic fractal

Country Status (1)

Country Link
CN (1) CN115424119B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476294A (en) * 2020-04-07 2020-07-31 南昌航空大学 Zero sample image identification method and system based on generation countermeasure network
EP3742346A2 (en) * 2019-05-23 2020-11-25 HTC Corporation Method for training generative adversarial network (gan), method for generating images by using gan, and computer readable storage medium
CN115017337A (en) * 2022-08-03 2022-09-06 中国电子科技集团公司第五十四研究所 Latent semantic interpretation method for deep learning model

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201809604D0 (en) * 2018-06-12 2018-07-25 Tom Tom Global Content B V Generative adversarial networks for image segmentation
CN110689086B (en) * 2019-10-08 2020-09-25 郑州轻工业学院 Semi-supervised high-resolution remote sensing image scene classification method based on generating countermeasure network
CN111489287B (en) * 2020-04-10 2024-02-09 腾讯科技(深圳)有限公司 Image conversion method, device, computer equipment and storage medium
CN112308862A (en) * 2020-06-04 2021-02-02 北京京东尚科信息技术有限公司 Image semantic segmentation model training method, image semantic segmentation model training device, image semantic segmentation model segmentation method, image semantic segmentation model segmentation device and storage medium
CN111832570A (en) * 2020-07-02 2020-10-27 北京工业大学 Image semantic segmentation model training method and system
CN112132149B (en) * 2020-09-10 2023-09-05 武汉汉达瑞科技有限公司 Semantic segmentation method and device for remote sensing image
CN112101463A (en) * 2020-09-17 2020-12-18 成都数之联科技有限公司 Image semantic segmentation network training method, segmentation device and medium
CN113450313B (en) * 2021-06-04 2022-03-15 电子科技大学 Image significance visualization method based on regional contrast learning
CN114359526B (en) * 2021-12-29 2024-05-28 中山大学 Cross-domain image style migration method based on semantic GAN
CN114757864A (en) * 2022-04-21 2022-07-15 西安交通大学 Multi-level fine-grained image generation method based on multi-scale feature decoupling
CN114998124A (en) * 2022-05-23 2022-09-02 北京航空航天大学 Image sharpening processing method for target detection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Conditional Generative Adversarial Network Based on Image Cloud Model Semantic Annotation; Du Qiuping et al.; Pattern Recognition and Artificial Intelligence; 2018-04-15 (No. 04); full text *


Similar Documents

Publication Publication Date Title
Hui et al. Linguistic structure guided context modeling for referring image segmentation
Gwak et al. Generative sparse detection networks for 3d single-shot object detection
CN111754596B (en) Editing model generation method, device, equipment and medium for editing face image
US20220172518A1 (en) Image recognition method and apparatus, computer-readable storage medium, and electronic device
WO2022105125A1 (en) Image segmentation method and apparatus, computer device, and storage medium
CN111275107A (en) Multi-label scene image classification method and device based on transfer learning
US9031894B2 (en) Parsing and rendering structured images
DE102022107186A1 (en) GENERATOR UTILIZATION FOR DEEPFAKE DETECTION
DE102022106057A1 (en) AUTHENTICATOR-INTEGRATED GENERATIVE ADVERSARIAL NETWORK (GAN) FOR SECURE DEEPFAKE GENERATION
Douillard et al. Tackling catastrophic forgetting and background shift in continual semantic segmentation
Rombach et al. Making sense of cnns: Interpreting deep representations and their invariances with inns
CN116958323A (en) Image generation method, device, electronic equipment, storage medium and program product
CN115131803A (en) Document word size identification method and device, computer equipment and storage medium
Xu et al. A novel image feature extraction algorithm based on the fusion AutoEncoder and CNN
Asperti et al. Image embedding for denoising generative models
CN117197292A (en) Method, apparatus, device and storage medium for generating image
CN117392293A (en) Image processing method, device, electronic equipment and storage medium
CN115424119B (en) Image generation training method and device capable of explaining GAN based on semantic fractal
CN112069412A (en) Information recommendation method and device, computer equipment and storage medium
Yu et al. Filling gaps of cartographic polylines by using an encoder–decoder model
CN112395834B (en) Brain graph generation method, device and equipment based on picture input and storage medium
CN114863000A (en) Method, device, medium and equipment for generating hairstyle
CN114580510A (en) Bone marrow cell fine-grained classification method, system, computer device and storage medium
Jiang et al. Image inpainting based on cross-hierarchy global and local aware network
CN116740540B (en) Data processing method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant