CN116109719A

CN116109719A - Fair controllable image generation method based on structured network priori knowledge

Info

Publication number: CN116109719A
Application number: CN202211607479.6A
Authority: CN
Inventors: 陈志勇
Original assignee: Yangtze River Delta Research Institute of UESTC Huzhou
Current assignee: Yangtze River Delta Research Institute of UESTC Huzhou
Priority date: 2022-12-14
Filing date: 2022-12-14
Publication date: 2023-05-12

Abstract

The invention discloses a fair and controllable image generation method based on structured network prior knowledge, belonging to the field of computer vision. In the first stage, a loss guided by structured prior knowledge provides good initialization parameters for the encoder network, so that the encoder can already produce discriminative representations of real images before formal training; this gives the encoder directional guidance and an initial guarantee for covering the real image data more comprehensively. In the second stage, the encoder pre-trained under the structured prior knowledge provides attribute control constraints for the generative adversarial network, and an auxiliary similarity loss ensures that the encoder retains the real-image representation extraction capability it has learned. In addition, the invention adopts a higher feature dimension and a batch normalization strategy for the linear layers, which further improves the representation extraction capability of the model. The fairness of the model in the image generation process is markedly improved, and the accuracy of its attribute control is improved at the same time.

Description

Fair controllable image generation method based on structured network priori knowledge

Technical Field

The invention belongs to the field of computer vision research, addresses the problem of controllable deep image generation, and is mainly applicable to the understanding and interpretation of deep neural networks, image data mining, the improvement of algorithm reliability, and the like.

Background

Image generation technology belongs to the family of generative algorithms and is one of the key algorithms in deep learning research. Generative algorithms and discriminative algorithms are the two branches of deep learning algorithms, and both play important roles in artificial intelligence research. Compared with a discriminative algorithm, a generative algorithm must establish a representation model of the data distribution, estimate the probability model of that distribution by means such as Bayesian inference, variational inference and Markov processes, and extract from the probability model the information researchers need for downstream tasks. A generative algorithm therefore requires a more complete and comprehensive understanding of the data and more rigorous mathematical reasoning, which gives its reliability a theoretical guarantee but also makes it more challenging to construct.

Existing image generation methods based on deep learning generally construct a deep neural network, extract a parameterized estimate of the image data distribution, and sample from the estimated distribution to generate new image samples that follow the distribution of real image data. In recent years, the rapid development of computer technology and deep learning hardware has produced many deep image generation techniques with excellent performance and quality. According to whether the probability model is estimated directly, existing image generation methods fall into two main categories: explicit probability density estimation methods and implicit probability density estimation methods. Explicit methods try to find an approximate solution for the image data distribution through techniques such as Bayesian variational inference and ordinary differential equations, and have made great progress over many years of development. Recently, the diffusion model, based on Markov chains and differential equations, has achieved remarkable image generation results. However, such methods generally need strict constraints, have complex mathematical model structures and high inference and training costs, and are still far from practical deployment.
Implicit probability density estimation methods approach the difficulty of estimating the image data distribution from the side of the generation result: the training loss of the deep neural network is constructed with good image generation quality as the goal, and the network itself learns the image data distribution. This markedly reduces the difficulty of image generation and yields good results, and such methods have been applied to video entertainment, image data restoration, adversarial attack and defense, data set expansion, and other fields.

The representative implicit image generation method is the generative adversarial network (GAN). In 2014, Goodfellow et al., drawing on game theory, constructed a generator-discriminator pair of adversarial models with completely opposite optimization objectives: the generator takes random noise as input and outputs an image sample, while the discriminator takes real and generated images as input and outputs the probability that the input image is real. Reference: Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial networks [J]. Communications of the ACM, 2020, 63(11): 139-144. During optimization, the discriminator acts as a real/fake classifier whose goal is to distinguish real images from generated ones as well as possible. The generator obtains its optimization signal from the discriminator's real/fake judgments on the generated images and updates its parameters according to the discriminator's predictions, so as to deceive the discriminator over successive iterations. In the ideal state, the discriminator assigns a probability of 0.5 to both real and generated images; the two networks then reach the equilibrium of a zero-sum game, which means that the quality of the generated images is comparable to that of real images and the image generation goal is achieved.
Through adversarial training, the generative adversarial network neatly avoids the need of traditional image generation methods to solve for an explicit probability distribution, reduces the training difficulty of image generation models, and represents a major breakthrough in image generation technology. After years of development, the image generation quality of generative adversarial networks has improved markedly, and they are widely applied in many fields.

Although the adversarial training of generative adversarial networks reduces the difficulty of modeling and optimization, it also brings problems such as unstable training, generation results that are hard to interpret, and an uncontrollable training process. In addition, as deep learning is applied more widely in real scenes, higher requirements are placed on the controllability, reliability and fairness of image generation methods. Mining and analyzing the intrinsic representation of image data and designing a reasonable attribute constraint paradigm help researchers control the image generation result accurately by simply modifying the input, which is of great significance for applying generative adversarial networks in fields such as security and medical care. Moreover, because different image modes are not equally difficult to generate, the generation capability of a model is biased across the attributes and categories of the same data set. In practical use, a generative adversarial network may prefer to generate certain attributes and neglect images of minority attributes, so that the generated results are uniform in structure and poor in diversity. This limits the practical application of generative adversarial networks and needs further research.

In research on controllable generation of image attributes, Sudipto Mukherjee et al., relying on the mutual information maximization principle within the generative adversarial network framework, proposed ClusterGAN, an unsupervised category-attribute-controllable generative adversarial network based on the idea of clustering, which achieves a certain degree of category-controllable image generation. Reference: Mukherjee S, Asnani H, Lin E, et al. ClusterGAN: Latent Space Clustering in Generative Adversarial Networks [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33: 4610-4617. Inspired by clustering in machine learning, the model decomposes the noise space fed to the generator, separates out a latent code vector that carries category attribute control, and uses a cross-entropy loss to constrain images generated under the same latent code to have consistent feature representations. This maximizes the mutual information between the latent code and the generated image and binds the latent code to the image attribute. ClusterGAN achieves good results on data sets such as MNIST and Fashion-MNIST. However, the model does not perform as well on the more complex CIFAR-10 data set. Moreover, ClusterGAN does not use a deep neural network with strong fitting capability, so the quality of its generated images hardly meets practical requirements and its category attribute control is inaccurate, which prevents its use in practical scenes.

To solve this problem, the Chinese patent 'A self-supervised attribute-controllable image generation method based on a deep siamese network' (application number: 202210006607.5) analyzed the shortcomings of existing unsupervised attribute-controllable image generation methods and found that current methods generally apply attribute constraints only to the generated images and ignore the structured prior information implicit in real images, which leads to insufficient attribute control. That invention therefore introduced a self-supervised representation learning method based on a deep siamese network. Reference: Chen X, He K. Exploring simple siamese representation learning [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 15750-15758. The method supplies the generative adversarial network with image representation information from real images, corrects the direction of the attribute constraint during image generation, and improves the attribute control of the generative adversarial network. However, although that method improves both attribute control and image generation quality, it does not study fairness in the image generation process, and the potential of its model remains to be further explored and improved.

Based on the above analysis, the invention mainly explores how to improve the fairness of generative adversarial networks and achieve more reliable and fair attribute-controlled image generation. The inventors noted that in recent years, in the field of natural language processing, large-scale pre-trained models have achieved good transfer performance on downstream tasks. Reference: Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding [J]. arXiv preprint arXiv:1810.04805, 2018. This means that initializing a deep learning model well through pre-training helps its transfer and extension to multiple tasks. Good transfer capability in turn indicates that the model has learned input data of different modes and attributes fully and completely, which reveals the importance of pre-training initialization for fair learning. Inspired by this, the invention provides a pre-trained reliable image generation algorithm based on structured prior learning: a self-supervised image representation algorithm introduces reliable and fair initialization parameters into the attribute constraint process of the generative adversarial network, which improves both the attribute control of the model and the fairness of its generated results. Moreover, the invention provides a fairer measure of image attribute control, improving the attribute control metric of the original method and describing model performance more comprehensively and evenly.

Disclosure of Invention

The invention discloses a fair and controllable image generation method based on structured network prior knowledge, which mainly solves the problems that existing controllable image generation methods control attributes insufficiently and generate data of different modes unfairly.

The invention introduces a self-supervised representation learning method based on a cosine similarity measure, designs a structured-prior model initialization training scheme suited to the attribute-controlled image generation task, and constructs an attribute-controllable generative adversarial network model with good image representation capability. The invention takes the CIFAR-10 data set as the research object and first normalizes the image data so that it meets the training requirements of deep learning algorithms. The proposed deep learning algorithm consists of three deep neural networks: a generator, an encoder and a discriminator. The invention designs a mixed generator input that decomposes the generator's input latent variable into a category latent variable and a category-independent latent variable; the category latent variable is sampled from a Category distribution and expanded into One-Hot coding format, and the category-independent latent variable is sampled from a Gaussian distribution. The encoder takes image data as input and provides the category attribute control constraint for the generator. The discriminator takes image data as input, forms a generative adversarial pair with the generator, and provides the generator with the authenticity constraint on generated images. The invention designs a two-stage training process: the first stage is a pre-training stage, in which a self-supervised representation learning algorithm pre-trains the encoder; the second stage is a formal training stage, in which the attribute-controllable generator is trained to meet the requirements of downstream tasks.
Specifically, in the first stage, a data enhancement strategy generates two perturbed enhanced samples of each real image, and a cosine similarity loss constrains the encoder to output similar encoded representations for the two samples. The encoder thereby acquires structured, category-specific feature extraction capability, which provides good initialization for the subsequent attribute-controllable image generation training and keeps the optimization of the subsequent task in the right direction. In the second stage, a joint optimization objective is established for the generator: the generative adversarial network structure uses spectral normalization, a hinge loss is introduced to optimize image generation quality, and a category attribute constraint based on cross-entropy loss controls the generator to achieve controllable image generation. Simulation results show that the proposed strategy of introducing structured network prior knowledge provides a good initialization parameter distribution for the encoder, promotes the fairness of the generator's image generation results, and improves attribute control. The general structure of the invention is shown in figure 1.

In order to more clearly describe the details of the invention, certain terms are first defined.

Definition 1: Gaussian distribution. Also called the normal distribution, this probability distribution is important in physics, mathematics and other fields, and has a very wide range of application because many regularities in real life follow it. If the probability density function of a random variable x satisfies

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

where $\mu$ is the mathematical expectation and $\sigma$ is the standard deviation of the normal distribution, then x is said to follow the Gaussian distribution, commonly written in the form $N(x \mid \mu, \sigma^2)$.

Definition 2: Category distribution. A generalization of the Bernoulli distribution. Specifically, for an event with n possible cases, the probabilities of the cases sum to one:

$$\sum_{i=1}^{n} p_i = 1$$

At each sampling, one case is drawn at random. Equiprobable sampling of the cases is widely used to sample class labels for data sets with a uniform class distribution.

Definition 3: Deep residual neural network. The deep residual neural network is an improved deep neural network based on the deep convolutional neural network, and alleviates the overfitting and poor stability of ordinary deep convolutional neural networks in deep architectures. Specifically, it takes the residual block as its basic structural unit. Each residual block uses a 'shortcut' cross-layer connection that adds the block's input to its output, so that feature information under different receptive fields is fused at the output of the block, giving it multi-scale feature fusion capability. Experiments show that the deep residual neural network is more stable than the plain deep convolutional neural network. In application, residual blocks of different sizes and numbers can be stacked as required to obtain strong image feature extraction capability and better image representation and generation.

Definition 4: Average pooling. Average pooling divides an input image into rectangular regions and averages each sub-region. The strategy derives from spatial filtering. For a given feature map X divided into sub-regions $X_k$, the output of each sub-region after average pooling is

$$y_k = \frac{1}{R_k} \sum_{(a,b) \in X_k} x_{kab}$$

where $R_k$ is the number of pixels in the k-th sub-region and $x_{kab}$ is the value of the element at position (a, b) in the k-th sub-region. Average pooling extracts the mean of each rectangular region and merges the region into a single pixel, thereby reducing the dimension.
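The averaging in Definition 4 can be illustrated with a minimal pure-Python sketch. The function name average_pool, the non-overlapping square windows and the example values are assumptions chosen for illustration, not part of the invention.

```python
def average_pool(feature_map, size):
    """Average-pool a 2-D feature map over non-overlapping size x size windows."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            # Collect every element x_kab of the current sub-region.
            window = [feature_map[r + a][c + b]
                      for a in range(size) for b in range(size)]
            # len(window) plays the role of R_k, the pixel count of the sub-region.
            out_row.append(sum(window) / len(window))
        pooled.append(out_row)
    return pooled


feature_map = [
    [1.0, 3.0, 2.0, 4.0],
    [5.0, 7.0, 6.0, 8.0],
    [0.0, 2.0, 1.0, 3.0],
    [4.0, 6.0, 5.0, 7.0],
]
pooled = average_pool(feature_map, 2)  # each 2 x 2 region collapses to one value
```

Here a 4 x 4 map is reduced to 2 x 2, which is the dimension reduction described above.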

Definition 5: Upsampling. Upsampling enlarges an input image or feature map and is thus the opposite of pooling. Common upsampling algorithms include interpolation algorithms such as bilinear interpolation, nearest-neighbor interpolation and mean filling. The invention selects nearest-neighbor interpolation as the upsampling algorithm, that is, the values of the expanded area are inferred from the values at adjacent positions; each upsampling operation doubles the size of the input image or feature map.
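A minimal sketch of nearest-neighbor upsampling by a factor of two, as described in Definition 5. The function name upsample_nearest_2x and the list-of-lists representation are illustrative assumptions.

```python
def upsample_nearest_2x(feature_map):
    """Double the height and width of a 2-D feature map by repeating each value."""
    upsampled = []
    for row in feature_map:
        # Repeat every value twice along the width.
        wide_row = [value for value in row for _ in range(2)]
        # Repeat the widened row twice along the height.
        upsampled.append(wide_row)
        upsampled.append(list(wide_row))
    return upsampled


up = upsample_nearest_2x([[1, 2], [3, 4]])
```

Every new position simply copies the value of its nearest original position, so no new values are introduced.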

Definition 6: batch-Norm normalized function. The batch normalization function is a normalization method which is provided for solving the problem that in the training process, the neuron updating is unstable due to the change of the characteristic distribution and the optimization is difficult. The method is characterized in that the mean value and variance of the image data of the same batch are obtained, then the samples are normalized to be in line with Gaussian distribution, and in order to prevent the differences of the normalized data from disappearing, translation and scaling operations are used for recovering the differences among the samples after normalization. The learning parameters exist in the batch normalization function, different parameter playback strategies are used in the training and testing stages so as to meet different task demands, and the method has very wide application in convolutional neural networks, is beneficial to improving generalization of the model, and relieves overfitting so that the model is easier to converge.

Definition 7: Spectral normalization function. Spectral normalization is a normalization method for the weights of the discriminator network, proposed to address the unstable training of the original generative adversarial network and the difficulty of reaching its ideal optimum. Specifically, spectral normalization divides the weight matrix $W^l$ of neural network layer l by its largest singular value $\sigma(W^l)$, so that the output of every layer of the discriminator satisfies 1-Lipschitz continuity, which meets the requirement for computing the optimal-transport earth mover's distance. The weight matrix modified by spectral normalization is

$$W_{SN}^l = \frac{W^l}{\sigma(W^l)}$$
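A minimal sketch of Definition 7, assuming the largest singular value is estimated by power iteration, a common choice that the text above does not itself specify. All function names and the example matrix are illustrative assumptions.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def l2_normalize(vector):
    norm = math.sqrt(sum(a * a for a in vector))
    return [a / norm for a in vector]


def largest_singular_value(w, iterations=50):
    """Estimate sigma(W) by power iteration, alternating between W and its transpose."""
    w_t = transpose(w)
    v = l2_normalize([1.0] * len(w[0]))
    for _ in range(iterations):
        u = l2_normalize(matvec(w, v))
        v = l2_normalize(matvec(w_t, u))
    return math.sqrt(sum(a * a for a in matvec(w, v)))


def spectral_normalize(w, iterations=50):
    """Divide every entry of W by the estimated largest singular value."""
    sigma = largest_singular_value(w, iterations)
    return [[a / sigma for a in row] for row in w]


w = [[3.0, 0.0], [0.0, 1.0]]
w_sn = spectral_normalize(w)  # largest singular value of w_sn is approximately 1
```

Practical implementations usually run only a few power iterations per training step and reuse the vectors between steps; the fixed 50 iterations here only keep the sketch simple.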

Definition 8: ReLU function. This activation function, also called the rectified linear unit, is a piecewise linear function that sets values smaller than 0 to 0 and leaves values larger than 0 unchanged. Its expression is $\mathrm{ReLU}(x) = \max(0, x)$. ReLU introduces nonlinearity into the deep neural network and improves its fitting capability.

Definition 9: Tanh function, i.e. the hyperbolic tangent function. It keeps a nonlinear, monotonically increasing relation between input and output. Its expression is similar to that of the Sigmoid activation function, but it has a wider gradient range than Sigmoid, which helps alleviate the vanishing gradient problem that often occurs in neural networks. The expression is

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

The invention applies the Tanh function only at the output of the generator network, so that the output data lies in the interval [-1, 1].

Definition 10: Cosine similarity. Sometimes also called the cosine distance, it is a common vector measure that evaluates the similarity between two vectors by the angle between them. It measures the similarity of vectors in high-dimensional space well and is widely adopted by self-supervised representation learning methods. The expression is:

$$\cos(x, y) = \frac{\sum_{i} x_i y_i}{\sqrt{\sum_{i} x_i^2}\,\sqrt{\sum_{i} y_i^2}}$$

where $x_i$ and $y_i$ are the components of the vectors x and y in the i-th dimension.
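A minimal sketch of the cosine similarity of Definition 10 for two vectors given as Python lists. The function name is an assumption, and zero vectors (for which the measure is undefined) are not handled.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between vectors x and y of equal length."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # close to 1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])                # 0
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])                 # -1
```

The value depends only on direction, not on vector length, which is why scaling a vector leaves the similarity unchanged.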

Definition 11: Normalized exponential function. The normalized exponential function (Softmax) is the most commonly used output normalization method in multi-class classification tasks. It is calculated as

$$\mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$

It compresses each component $x_i$ of an n-dimensional vector x into the interval [0, 1], and the compressed components satisfy

$$\sum_{i=1}^{n} \mathrm{Softmax}(x_i) = 1$$

so that the output vector obeys the rules of a class probability distribution. Softmax is therefore often used as the final output operation of probability prediction models.
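A minimal sketch of the Softmax computation of Definition 11. Subtracting the maximum before exponentiating is a standard numerical-stability step added here as an assumption; it does not change the result.

```python
import math


def softmax(logits):
    """Map a list of real values to probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
```

Larger inputs receive larger probabilities, and the outputs always lie in [0, 1] and sum to 1.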

Definition 12: one-bit efficient coding, also called One-Hot coding, is a coding mode for coding data into a binary form, wherein in the binary representation, other bits are set to 0 except an integer index position '1' corresponding to the coded data, and the binary representation is commonly used for vector representation of discrete variables.

Definition 13: Cross-entropy loss. Cross-entropy loss is a commonly used classification loss function. Minimizing the cross entropy between two distributions increases the mutual information between them and thereby associates categories with features. For two distributions p and q, the average code length needed to represent samples drawn from p with a code built for q is the cross entropy between the two, written as

$$H(p, q) = -\sum_{x} p(x) \log q(x)$$
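A minimal sketch of the cross entropy between a target distribution p and a predicted distribution q, using the standard form H(p, q) = -sum p(x) log q(x). The function name, the small constant eps that guards against log(0), and the example vectors are illustrative assumptions.

```python
import math


def cross_entropy(p, q, eps=1e-12):
    """Cross entropy H(p, q) between a target distribution p and a prediction q."""
    return -sum(p_i * math.log(q_i + eps) for p_i, q_i in zip(p, q))


target = [0.0, 1.0, 0.0]                              # One-Hot target for class 1
good = cross_entropy(target, [0.1, 0.8, 0.1])         # confident, correct prediction
bad = cross_entropy(target, [0.6, 0.2, 0.2])          # prediction favours the wrong class
```

With a One-Hot target the loss reduces to the negative log-probability assigned to the correct class, so the better prediction yields the smaller loss.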

Definition 14: Generative adversarial network. The generative adversarial network is the representative image generation model based on implicit probability inference. Its core is a two-player model formed by a generator and a discriminator. The task of the discriminator is to judge whether an input image is real; the task of the generator is to map an input latent code to a fake image, with the optimization goal of generating image samples realistic enough to deceive the discriminator. The two optimization objectives are opposed and form a minimax zero-sum game. During optimization, the network parameters of the generator fit the probability distribution of the real data automatically, without an explicit solution, which reduces the optimization difficulty.

Definition 15: Encoder network. An encoder network built on a deep neural network is a general term for a feature extraction module whose function is to extract representations of high-dimensional input data. The most common task of an encoder network is to summarize the input data: guided by supervision information or constraint functions, it mines the internal differences of the input data and derives distinct representations for different data. The encoder network used here is optimized to fully extract structured representation information from images with an unsupervised method, so that it can summarize the inherent category attribute differences of the image data and pass this information to the generator to guide attribute control.

Definition 16: Random data enhancement. Random data enhancement is a data expansion method proposed to alleviate insufficient diversity of training data. Enhanced samples are generated by transforming the position and color of the original image according to certain rules, by adding noise, and so on, in order to alleviate overfitting of deep neural networks. The invention uses the random data enhancement strategy to generate two differing copies of the same image sample, which are used for the downstream attribute constraint and image representation extraction.

Therefore, the technical scheme of the invention is as follows: a fair and controllable image generation method based on structured network prior knowledge, the method comprising:

step 1: preprocessing experimental image data;

all image data to be used are converted from RGB format to the Tensor format used for deep learning operations, the pixel value range is normalized from 0-255 to the interval 0-1 for convenient computation and inference, and every training image is constrained to have the same size;

step 2: carrying out random data enhancement on experimental image data;

random data enhancement is applied to the image data processed in step 1 using 8 operations: random scaled cropping, random horizontal flipping, random brightness change, random contrast change, random saturation change, random hue change, random graying and random Gaussian blur; these 8 operations are applied to each image independently, each with its own probability, as follows:

first, a cropping area covering 20%-100% of the original image area is selected at random, and the cropped image is resized to the original size;

second, the image is flipped horizontally with 50% probability;

third, the brightness, contrast and saturation of the image are each randomly changed to 60%-140% of the original, and the hue of the image is randomly shifted within a range of 10%; note that in practice these four operations are bound together and applied jointly with a probability of 80%;

fourth, the image is converted to a grayscale image with 20% probability;

fifth, the image is Gaussian blurred with 50% probability, yielding a blurrier sample;

in practice, the data enhancement methods are applied in the above order, and each image yields a random enhanced sample of the same size as the original image;
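The sampling logic of step 2 can be sketched as follows. The code only draws the random parameters of one enhancement pass, in the order listed above, and does not touch pixel data; the function name, the dictionary keys and the symmetric hue range of -10% to +10% are assumptions for illustration.

```python
import random


def sample_augmentation(rng):
    """Draw the random parameters of one enhancement pass, in application order."""
    plan = {}
    # 1) crop 20%-100% of the original area, then resize back to the original size
    plan["crop_area_fraction"] = rng.uniform(0.2, 1.0)
    # 2) horizontal flip with 50% probability
    plan["horizontal_flip"] = rng.random() < 0.5
    # 3) the four color operations are bound together and applied with 80% probability
    if rng.random() < 0.8:
        plan["color_jitter"] = {
            "brightness": rng.uniform(0.6, 1.4),
            "contrast": rng.uniform(0.6, 1.4),
            "saturation": rng.uniform(0.6, 1.4),
            "hue_shift": rng.uniform(-0.1, 0.1),  # assumed symmetric range
        }
    else:
        plan["color_jitter"] = None
    # 4) grayscale conversion with 20% probability
    plan["grayscale"] = rng.random() < 0.2
    # 5) Gaussian blur with 50% probability
    plan["gaussian_blur"] = rng.random() < 0.5
    return plan


rng = random.Random(0)
view_1 = sample_augmentation(rng)  # parameters for the first enhanced sample
view_2 = sample_augmentation(rng)  # independent parameters for the second sample
```

Drawing two independent plans for the same image gives the two differing copies used in the pre-training stage.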

step 3: constructing a deep neural network;

1) Constructing a generator network:

the input of the generator is a 128-dimensional noise vector consisting of 118-dimensional Gaussian noise and a 10-dimensional One-Hot code, and the output is an image of size 3×32×32; the generator network consists of a fully connected layer, a residual neural network formed by 3 residual modules, and a two-dimensional convolution layer connected in sequence, with the fully connected layer as the input end and the two-dimensional convolution layer as the output end;

2) Constructing a discriminator network:

the discriminator takes real images and generated images as input and outputs a 1-dimensional feature; its network structure consists of four spectrally normalized residual blocks, a global average pooling layer and a fully connected layer; the four spectrally normalized residual modules are connected in sequence to form a residual neural network, and the discriminator network connects the residual neural network, the global average pooling layer and the fully connected layer in sequence, with the residual neural network as the input end and the fully connected layer as the output end;

3) Constructing an encoder network:

the encoder takes random data enhancement samples of generated images and real images as input and outputs image feature vectors; the main body of the encoder network consists of a ResNet18 network and two fully connected layers connected in sequence, with the residual neural network as the input end and the last fully connected layer as the output end, and outputs 2048-dimensional feature vectors; the output feature vector of the encoder main body is sent to an additional prediction head consisting of two fully connected layers, which is used to calculate the cosine similarity loss of the features, and to a cluster head consisting of one fully connected layer, which provides the attribute control loss for the generator. The encoder designed by the invention has a high output dimension, which gives it stronger feature representation capability and lets it learn the feature information of the different data modes of the whole data set more fully; batch normalization is applied to all fully connected layers after the residual layers to mitigate overfitting;

Step 4: designing the structured network prior knowledge introduction loss;

The real image data in Tensor format produced in step 1 is denoted $x$;

The random data augmentation of step 2 is applied to $x$ to obtain two augmented samples, denoted $T_1(x)$ and $T_2(x)$; the encoder body network is denoted $E$ and the prediction head is denoted $P$; the encoder output feature is $q_i = E(T_i(x))$ and the prediction head output feature is $h_i = P(q_i)$;

To learn reliable and sufficient prior representation knowledge of real images, the encoder pre-training stage needs a reasonable and effective representation mining loss. This gives the encoder a reliable initialization of its network weights, so that in the subsequent image generation task it can output reliable and fair attribute constraints, which improves the fairness of image generation and attribute control. On this basis, the invention introduces a structured network prior knowledge loss based on cosine similarity, which requires the encoder network to output similar feature representations for different augmented samples of the same image:

In the above formula, stopgrad(·) denotes the gradient stopping strategy: no gradient is computed during backpropagation for a variable to which stopgrad is applied;
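A minimal sketch of such a cosine-similarity loss, assuming the symmetric negative-cosine form commonly paired with a stop-gradient; the exact form used by the invention may differ. The function names are hypothetical, and plain Python lists stand in for tensors, so stopgrad is indicated only in comments.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def structured_prior_loss(h1, q1, h2, q2):
    """Assumed form: -(cos(h1, stopgrad(q2)) + cos(h2, stopgrad(q1))) / 2.

    q_i = E(T_i(x)) are encoder outputs, h_i = P(q_i) prediction-head outputs.
    In an autograd framework q1 and q2 would be detached here, so gradients
    would flow only through h1 and h2; the loss value itself is unchanged.
    """
    return -0.5 * (cosine_similarity(h1, q2) + cosine_similarity(h2, q1))
```

Under this assumed form the loss reaches its minimum of -1 when both prediction-head outputs point in the same direction as the encoder output of the other augmented view.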

Step 5: designing the network losses for the formal training stage;

The 118-dimensional class-independent latent code obtained by random sampling from a Gaussian distribution is denoted $z$;

A random integer from 0 to 9, sampled from a categorical distribution in which each value has probability 0.1, is denoted $m$, and the corresponding 10-dimensional one-hot vector is denoted $c$;

$z$ and $c$ together form the input of the generator, so the generator's input latent code is 128-dimensional; the generator, the discriminator, and the encoder cluster head network are denoted $G$, $D$, and $C$ respectively, and any parameters not explained here have the same meaning as in step 4;
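The construction of the generator input can be sketched as follows. This is an illustration only: the source says "Gaussian" without parameters, so a standard normal is assumed, and `sample_latent` is a hypothetical helper name.

```python
import random

NOISE_DIM = 118    # class-independent latent code z
NUM_CLASSES = 10   # class latent code c (one-hot)

def sample_latent(rng):
    """Return the 128-dimensional generator input and the class index m."""
    # Assumption: z is drawn from a standard normal distribution.
    z = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
    # m takes each value 0..9 with probability 0.1.
    m = rng.randrange(NUM_CLASSES)
    c = [1.0 if k == m else 0.0 for k in range(NUM_CLASSES)]
    return z + c, m

rng = random.Random(0)
latent, m = sample_latent(rng)
```

Fixing `c` while resampling `z` is what later allows images of one class to be generated with varying appearance.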

1) Generator loss function $Loss^G$:

The generator is optimized to generate images that are as realistic as possible, while images generated under the same class latent code $c$ should have the same feature representation so that they belong to a consistent class. Accordingly, the generator loss comprises two parts, the generative adversarial network loss and the class consistency constraint loss, where:

In the above formulas, the expectation symbol denotes the expected loss computed over images generated from a batch of latent codes sampled from their distributions; $D(G(z, c))$ denotes the discriminator output when a generated image is the input; $C(E(G(z, c)))$ denotes the 10-dimensional feature produced when the encoder cluster head further summarizes the features that the encoder body network extracts from the generated image; $CE(\cdot, \cdot)$ denotes the cross-entropy loss. Note that a Softmax operation is applied to the image representation output by the cluster head $C$ before the cross-entropy loss is computed; this step is omitted from the formulas for simplicity;

Thus, the generator total loss function is:

In the above formula, the marker symbol indicates that the parameters of the corresponding network are fixed and no gradient is computed for that network; $\beta_c$ is a tunable hyperparameter giving the weight of the class consistency loss in the total generator loss;
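A minimal sketch of how the two generator terms could be combined, assuming a hinge-style adversarial term (the negated mean discriminator score), chosen to match the hinge-based discriminator loss of this step. The helper names and the list-based inputs are hypothetical stand-ins for network outputs.

```python
import math

def cross_entropy(logits, target_index):
    """Softmax followed by the negative log-likelihood of the target class."""
    shift = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(v - shift) for v in logits))
    return log_sum_exp - logits[target_index]

def generator_loss(d_fake, cluster_logits, class_ids, beta_c):
    """d_fake[i] stands for D(G(z_i, c_i)); cluster_logits[i] for C(E(G(z_i, c_i))).

    class_ids[i] is the integer m_i whose one-hot vector is c_i.
    D, E and C are treated as fixed here; only G would receive gradients.
    """
    n = len(d_fake)
    adversarial = -sum(d_fake) / n  # assumed hinge-style generator term
    consistency = sum(
        cross_entropy(logits, m) for logits, m in zip(cluster_logits, class_ids)
    ) / n
    return adversarial + beta_c * consistency
```

With uniform cluster-head logits the consistency term equals log(10), the cross-entropy of a uniform guess over ten classes.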

2) Discriminator loss function $Loss^D$:

The discriminator is optimized to distinguish real images from generated images as accurately as possible; its loss function, designed on the basis of the hinge loss, is as follows:

In the above formula, the expectation symbol denotes the expectation over samples randomly drawn from the real image distribution, $D(x)$ denotes the discriminator output when a real image is the input, and the remaining definitions are the same as in the generator loss function;
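For illustration, the widely used hinge form of a GAN discriminator loss is sketched below; it is an assumption that the invention uses exactly this form. The inputs are hypothetical lists of discriminator scores for one batch.

```python
def discriminator_hinge_loss(d_real, d_fake):
    """d_real[i] stands for D(x_i); d_fake[i] for D(G(z_i, c_i)).

    Real scores are pushed above +1 and generated scores below -1;
    samples already beyond the margin contribute zero loss.
    """
    real_term = sum(max(0.0, 1.0 - s) for s in d_real) / len(d_real)
    fake_term = sum(max(0.0, 1.0 + s) for s in d_fake) / len(d_fake)
    return real_term + fake_term
```

For example, real scores `[2.0, 0.5]` and generated scores `[-2.0, 0.0]` give a loss of 0.25 + 0.5 = 0.75, since only the second sample in each list violates its margin.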

3) Encoder loss function $Loss^E$:

The encoder is optimized for two purposes. First, it should provide feature representations that conform as closely as possible to the representation patterns of real image data, so that it does not provide misleading guidance. Second, according to a given class latent code $c$, it must constrain generated images that share the same class latent code to have similar feature representations. To this end, the invention retains the prior knowledge guidance loss designed for the encoder pre-training stage, which keeps the encoder optimizing in a direction consistent with the representation characteristics of real images throughout subsequent training, and also uses the same cross-entropy loss function as the generator to satisfy the attribute control constraint:

The terms in the above formulas are defined as before. The encoder loss in the formal training stage is:

Step 6: training the overall neural network;

First, using the three neural networks constructed in step 3, the encoder is pre-trained with the structured network prior knowledge introduction loss proposed in step 4, which gives the encoder well-structured initial network parameters. After the pre-training stage is complete, each network is trained with its corresponding loss function designed in step 5, using the Adam optimizer; the encoder and generator network parameters are fixed while the discriminator network parameters are updated, and the discriminator network parameters are fixed while the encoder and generator networks are updated;

Step 7: after the model is trained in step 6, the model parameters are saved and the generator is taken out; the generator's input latent code vector is constructed according to the method of step 5 and fed into the generator to obtain a generated image, where different latent code combinations produce different generated images.

The invention comprises the following improvement points:

a. To address the problem that current unsupervised attribute-controllable image generation methods do not consider the fairness of the generative model across data of different modes, a fair controllable image generation method based on structured network prior knowledge is proposed. A well-designed structured network prior loss function is used to pre-train the encoder network, which provides reliable parameter initialization for the attribute control process of the target network and ensures that the model's initial optimization path is correct.

b. To address the remaining room for improvement in the attribute control effect of current unsupervised attribute-controllable image generation methods, and in combination with the structured network prior knowledge introduction loss proposed by the invention, the encoder network structure was studied in detail: higher-dimensional features are used to represent images more fully, each linear layer component at the end of the encoder was examined experimentally, and batch normalization was finally introduced, further improving the encoder's representation extraction capability.

c. Considering the differences between representation extraction methods, a pre-trained classifier network is used to predict the class of each generated image, and the attribute control effect of the model is measured by the consistency between the predicted classes and the distribution of the class latent codes of the corresponding images, which provides a fairer strategy for measuring the attribute control effect.

The improvement in a can be applied to existing unsupervised attribute-controllable image generation algorithms without any modification to the network structure; it provides a structured prior knowledge guarantee for the model's parameter initialization and improves the fairness of the image generation process. The improvement in b significantly improves the model's attribute-controlled image generation without modifying the model optimization process and without significantly increasing the inference cost. The improvement in c concerns the measurement of the model's attribute control capability: a third-party classifier independent of the model evaluates the quality of attribute control, which makes the evaluation results more convincing. Experimental results show that the improvement to the encoder network structure alone yields relative improvements of 18.6% in clustering purity, 38.7% in normalized mutual information, and 45% in adjusted Rand index for the model's generated images. After the proposed structured network prior knowledge training strategy is further introduced, compared with the original method, the clustering purity of the generated results reaches 0.54, a relative improvement of 25.5%; the normalized mutual information reaches 0.48, a relative improvement of 54.8%; and the adjusted Rand index reaches 0.36, a relative improvement of 80%. These results show that the proposed structured network prior knowledge introduction strategy substantially improves the unsupervised attribute control performance of the generative adversarial network, and the improved attribute control indicates that the model captures images of different class modes more uniformly.
In addition, to measure more accurately how much the method improves the fairness of the generative adversarial network, the diversity of the generated results is used as the basis for measuring model fairness, since diversity directly reflects how completely the model covers the real image data. In tests, the method achieves a relative improvement of 13.2% in FID, the diversity index of the generated images, which indicates that the structured network prior knowledge guidance strategy effectively improves the fairness of the generative adversarial network model.

Drawings

FIG. 1 is a schematic diagram of the overall structure of the method of the present invention;

Detailed Description

Step 1: preprocessing experimental image data;

The CIFAR-10 data set is obtained from the official source; all image data are converted from RGB format to the Tensor format used by deep learning frameworks, and pixel values are normalized from the range 0-255 to the interval 0-1 to simplify processing and inference; every training image is constrained to have the same size.
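The normalization in this step can be illustrated with a small sketch. The channel-first (3 x H x W) layout and the helper name `to_unit_tensor` are assumptions; nested Python lists stand in for a framework tensor.

```python
def to_unit_tensor(image_hwc):
    """Convert an H x W x 3 image of 0..255 integers to 3 x H x W floats in [0, 1]."""
    height, width = len(image_hwc), len(image_hwc[0])
    return [
        [[image_hwc[row][col][ch] / 255.0 for col in range(width)]
         for row in range(height)]
        for ch in range(3)
    ]

tiny = [[(0, 128, 255), (255, 255, 255)]]  # a 1 x 2 RGB image
tensor = to_unit_tensor(tiny)
```

A CIFAR-10 image processed this way would have shape 3 x 32 x 32, matching the generator output size given in step 3.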

Step 2: applying random data augmentation to the experimental image data;

Second, horizontally flip the image with 50% probability;

Fourth, convert the image to a grayscale image with 20% probability;

In practice, the data augmentation methods are applied in the order given, and each image yields a randomly augmented sample of the same size as the original image.
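The two augmentations whose descriptions survive in this text (horizontal flip and grayscale conversion) can be sketched as below; the first and third augmentations are not covered. The function name and the luma weights (ITU-R BT.601) are assumptions, since the source gives no grayscale formula.

```python
import random

def random_flip_and_gray(image, rng, p_flip=0.5, p_gray=0.2):
    """image: list of rows, each row a list of (r, g, b) values in [0, 1]."""
    out = [list(row) for row in image]
    if rng.random() < p_flip:
        out = [row[::-1] for row in out]  # mirror each row left to right
    if rng.random() < p_gray:
        # Assumed luma weights; all three channels receive the same gray value,
        # so the image keeps its original size and channel count.
        out = [[(0.299 * r + 0.587 * g + 0.114 * b,) * 3 for r, g, b in row]
               for row in out]
    return out

rng = random.Random(0)
image = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]]
view_1 = random_flip_and_gray(image, rng)  # plays the role of T_1(x)
view_2 = random_flip_and_gray(image, rng)  # plays the role of T_2(x)
```

Calling the function twice on the same image gives the two independently augmented samples that step 4 feeds to the encoder.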

Step 3: constructing a deep neural network;

1) Constructing a generator network:

The input of the generator is a 128-dimensional noise vector consisting of 118-dimensional Gaussian noise and a 10-dimensional one-hot code, and the output is an image of size 3 × 32 × 32; the generator network consists, in order, of a fully connected layer, a residual neural network formed by 3 residual blocks, and a two-dimensional convolution layer, with the fully connected layer as the input end and the two-dimensional convolution layer as the output end.

2) Constructing a discriminator network:

The discriminator takes a real image or a generated image as input and outputs a 1-dimensional feature. Its network structure consists of four spectrally normalized residual blocks, a global average pooling layer, and a fully connected layer. The four spectrally normalized residual blocks are connected in sequence to form a residual neural network; the discriminator connects, in order, the residual neural network, the global average pooling layer, and the fully connected layer, with the residual neural network as the input end and the fully connected layer as the output end.

3) Constructing an encoder network:

The encoder takes as input generated images and randomly augmented samples of real images, and outputs image feature vectors. The body of the encoder network consists of a ResNet18 network followed by two fully connected layers, with the residual neural network as the input end and the last fully connected layer as the output end, and it outputs a 2048-dimensional feature vector. The output feature vector of the encoder body network is sent to a prediction head, made up of two additional fully connected layers, for computing the cosine similarity loss of the features, and to a cluster head, made up of one fully connected layer, for providing the attribute control loss to the generator. The encoder designed by the invention has a high-dimensional output, which gives it stronger feature representation capability and lets it learn the feature information of the different data modes of the complete data set more fully; batch normalization is applied to all fully connected layers after the residual layers to mitigate overfitting.

Step 4: designing the structured network prior knowledge introduction loss;

The real image data in Tensor format produced in step 1 is denoted $x$;

The random data augmentation of step 2 is applied to $x$ to obtain two augmented samples, denoted $T_1(x)$ and $T_2(x)$; the encoder body network is denoted $E$ and the prediction head is denoted $P$; the encoder output feature is $q_i = E(T_i(x))$ and the prediction head output feature is $h_i = P(q_i)$.

In the above formula, stopgrad(·) denotes the gradient stopping strategy: no gradient is computed during backpropagation for a variable to which stopgrad is applied.

Step 5: designing the network losses for the formal training stage;

A random integer from 0 to 9, sampled from a categorical distribution in which each value has probability 0.1, is denoted $m$, and the corresponding 10-dimensional one-hot vector is denoted $c$;

$z$ and $c$ together form the input of the generator, so the generator's input latent code is 128-dimensional; the generator, the discriminator, and the encoder cluster head network are denoted $G$, $D$, and $C$ respectively, and any parameters not explained here have the same meaning as in step 4.

1) Generator loss function $Loss^G$:

The generator loss comprises two parts, the generative adversarial network loss and the class consistency constraint loss, where:

In the above formulas, the expectation symbol denotes the expected loss computed over images generated from a batch of latent codes sampled from their distributions; $D(G(z, c))$ denotes the discriminator output when a generated image is the input; $C(E(G(z, c)))$ denotes the 10-dimensional feature produced when the encoder cluster head further summarizes the features that the encoder body network extracts from the generated image; $CE(\cdot, \cdot)$ denotes the cross-entropy loss. Note that a Softmax operation is applied to the image representation output by the cluster head $C$ before the cross-entropy loss is computed; this step is omitted from the formulas for simplicity.

Thus, the generator total loss function is:

In the above formula, the marker symbol indicates that the parameters of the corresponding network are fixed and no gradient is computed for that network; $\beta_c$ is a tunable hyperparameter giving the weight of the class consistency loss in the total generator loss.

2) Discriminator loss function $Loss^D$:

In the above formula, the expectation symbol denotes the expectation over samples randomly drawn from the real image distribution, $D(x)$ denotes the discriminator output when a real image is the input, and the remaining definitions are the same as in the generator loss function;

3) Encoder loss function $Loss^E$:

Step 6: training the overall neural network;

Using the three neural networks constructed in step 3, the encoder is first pre-trained with the structured network prior knowledge introduction loss proposed in step 4, which gives the encoder well-structured initial network parameters. The encoder pre-training stage uses the Adam optimizer with the learning rate set to 0.0002; 64 real images are fed in each iteration, and in total the encoder makes 500 passes over the complete data set.

After pre-training, each network is trained with its corresponding loss function designed in step 5, using the Adam optimizer with a learning rate of 0.0002. The encoder and the generator update their parameters together, alternating with the discriminator: the generator and encoder are updated once for every three discriminator updates. The encoder and generator network parameters are fixed while the discriminator network parameters are updated, and the discriminator network parameters are fixed while the encoder and generator networks are updated. Each discriminator update uses 64 generated images and 64 real images; each generator and encoder update uses 128 independently and identically sampled latent code vectors, and the encoder is additionally updated with 64 real images. In total the discriminator makes 350 passes over the complete data set.
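The alternating update pattern can be sketched as a schedule, shown here without any actual networks. The function name and the string labels are hypothetical; the 3-to-1 ratio and batch sizes in the comments come from the description above.

```python
def alternating_schedule(num_iterations, d_steps_per_g=3):
    """Return, for each iteration, the list of networks that are updated."""
    schedule = []
    for iteration in range(1, num_iterations + 1):
        # Discriminator step: generator and encoder parameters are frozen;
        # the batch holds 64 real and 64 generated images.
        step = ["D"]
        if iteration % d_steps_per_g == 0:
            # Joint generator/encoder step: discriminator parameters are frozen;
            # 128 latent codes are sampled, and the encoder also sees 64 real images.
            step.append("G+E")
        schedule.append(step)
    return schedule

schedule = alternating_schedule(6)
```

Here `schedule` contains six discriminator updates and two joint generator and encoder updates, at iterations 3 and 6.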

All experiments are implemented in the Python language on the PyTorch deep learning platform, with Python version 3.6 and PyTorch version 1.7.1.

Step 7: testing the overall neural network;

After the model is trained in step 6, the model parameters are saved and the generator is taken out; latent code vectors are constructed according to the method of step 5 and fed into the generator to obtain generated images, where different random noise inputs produce different generated images. 50000 images are generated in this way, the FID index of image generation diversity is computed, and the fairness of the generator's image generation process is evaluated. In addition, for each class, 1000 class-independent latent codes are randomly and independently sampled and paired with the class latent code, then fed into the generator to produce 10000 generated images; these are input into a classifier network pre-trained on the real CIFAR-10 data set for class prediction, and the attribute control effect of the model is evaluated from the class latent code distribution and the predicted class distribution using the clustering purity index (ACC), the normalized mutual information index (NMI), and the adjusted Rand index (ARI).
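As an illustration of the first of these metrics, clustering purity can be computed as below. This is a sketch under assumptions: images are grouped by the class latent code used to generate them, the labels come from the external classifier, and the function name is hypothetical. An accuracy based on one-to-one label matching would be computed differently.

```python
from collections import Counter, defaultdict

def clustering_purity(class_codes, predicted_labels):
    """Fraction of images whose predicted label is the majority label of their group.

    class_codes[i] is the class latent code index m used to generate image i;
    predicted_labels[i] is the label the pre-trained classifier assigns to it.
    """
    groups = defaultdict(Counter)
    for code, label in zip(class_codes, predicted_labels):
        groups[code][label] += 1
    majority_total = sum(max(counts.values()) for counts in groups.values())
    return majority_total / len(class_codes)

codes = [0, 0, 0, 1, 1, 1]
predictions = [2, 2, 5, 7, 7, 7]
purity = clustering_purity(codes, predictions)
```

In this toy example code 0 is mostly classified as label 2 (two of three images) and code 1 entirely as label 7, so the purity is 5/6.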

Claims

1. A fair controllable image generation method based on structured network prior knowledge, the method comprising:

Step 1: preprocessing experimental image data;

Step 2: applying random data augmentation to the experimental image data;

Second, horizontally flip the image with 50% probability;

Fourth, convert the image to a grayscale image with 20% probability;

Step 3: constructing a deep neural network;

1) Constructing a generator network:

2) Constructing a discriminator network:

3) Constructing an encoder network:

Step 4: designing the structured network prior knowledge introduction loss;

The real image data in Tensor format produced in step 1 is denoted $x$;

Step 5: designing the network losses for the formal training stage;

A random integer from 0 to 9, sampled from a categorical distribution in which each value has probability 0.1, is denoted $m$, and the corresponding 10-dimensional one-hot vector is denoted $c$;

1) Generator loss function $Loss^G$:

The generator loss comprises two parts, the generative adversarial network loss and the class consistency constraint loss, where:

In the above formulas, the expectation symbol denotes the expected loss computed over images generated from a batch of latent codes sampled from their distributions; $D(G(z, c))$ denotes the discriminator output when a generated image is the input; $C(E(G(z, c)))$ denotes the 10-dimensional feature produced when the encoder cluster head further summarizes the features that the encoder body network extracts from the generated image; $CE(\cdot, \cdot)$ denotes the cross-entropy loss. Note that a Softmax operation is applied to the image representation output by the cluster head $C$ before the cross-entropy loss is computed; this step is omitted from the formulas for simplicity;

Thus, the generator total loss function is:

In the above formula,

2) Discriminator loss function $Loss^D$:

In the above formula,

3) Encoder loss function $Loss^E$:

Step 6: training the overall neural network;