CN116109719A - Fair controllable image generation method based on structured network priori knowledge - Google Patents

Fair controllable image generation method based on structured network priori knowledge

Info

Publication number
CN116109719A
Authority
CN
China
Prior art keywords
image
network
encoder
loss
generator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211607479.6A
Other languages
Chinese (zh)
Inventor
陈志勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangtze River Delta Research Institute of UESTC Huzhou
Original Assignee
Yangtze River Delta Research Institute of UESTC Huzhou
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangtze River Delta Research Institute of UESTC Huzhou filed Critical Yangtze River Delta Research Institute of UESTC Huzhou
Priority to CN202211607479.6A priority Critical patent/CN116109719A/en
Publication of CN116109719A publication Critical patent/CN116109719A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00: 2D [Two Dimensional] image generation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761: Proximity, similarity or dissimilarity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a fair and controllable image generation method based on structured network prior knowledge, belonging to the field of computer vision. In the first stage, a loss guided by structured prior knowledge provides good initialization parameters for the encoder network, so that the encoder can already produce discriminative representations of real images before formal training; this gives the encoder directional guidance and an initial guarantee for covering the real image data more comprehensively. In the second stage, the well-initialized encoder provides attribute control constraints for the generative adversarial network, and an auxiliary similarity loss ensures that the encoder retains the real-image representation extraction capability it has learned. In addition, the invention proposes a higher feature dimension and a batch normalization strategy for linear layers, which further improve the representation extraction capability of the model. The fairness of the model during image generation is markedly improved, and the accuracy of its attribute control is improved as well.

Description

Fair controllable image generation method based on structured network priori knowledge
Technical Field
The invention belongs to the field of computer vision research. It addresses the problem of controllable image generation with deep neural networks, and is mainly applicable to the understanding and interpretation of deep neural networks, image data mining, and the improvement of algorithm reliability.
Background
Image generation belongs to the family of generative algorithms and is one of the key algorithms in deep learning research. Generative and discriminative algorithms are the two branches of deep learning, and both play important roles in artificial intelligence research. Compared with a discriminative algorithm, a generative algorithm must build a model of the data distribution: it estimates a probabilistic model of that distribution through techniques such as Bayesian inference, variational inference, and Markov processes, and extracts from it the information researchers need for downstream tasks. A generative algorithm therefore requires a more complete understanding of the data and more rigorous mathematical reasoning, which gives its reliability a theoretical guarantee but also makes it harder to construct.
Existing deep-learning-based image generation methods generally construct a deep neural network, learn a parameterized estimate of the image data distribution, and sample from the estimated distribution to generate new image samples that follow the distribution of real image data. In recent years, the rapid development of computing technology and deep learning hardware has produced many deep image generation techniques with excellent performance and quality. According to whether the probability model is estimated directly, existing methods fall into two main categories: explicit probability density estimation and implicit probability density estimation. Explicit methods try to find an approximate solution for the image data distribution through techniques such as Bayesian variational inference and ordinary differential equations. These methods have advanced considerably over the years, and diffusion models based on Markov chains and differential equations have recently achieved remarkable image generation results. However, such methods generally require strict constraints, have a complex mathematical structure, incur high training and inference costs, and still have a long way to go before practical deployment.
Implicit probability density estimation approaches the difficult problem of estimating the image data distribution from the side of the generation result: the training loss of the deep neural network is built around the goal of good image generation quality, and the network itself learns the image data distribution. This markedly reduces the difficulty of image generation while achieving good results, and such methods have been applied to video entertainment, image restoration, adversarial attack and defense, dataset augmentation, and other fields.
The representative implicit image generation method is the generative adversarial network (GAN). In 2014, Goodfellow et al. drew on game theory to construct a generator-discriminator pair with completely opposite optimization objectives: the generator takes random noise as input and outputs an image sample, while the discriminator takes real and generated images as input and outputs the probability that the input image is real. Reference: Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial networks [J]. Communications of the ACM, 2020, 63(11): 139-144. During optimization, the discriminator acts as a real/fake classifier whose goal is to separate real images from generated ones as well as possible. The generator obtains its optimization signal from the discriminator's real/fake predictions on the generated images and updates its parameters accordingly, aiming to fool the discriminator over successive iterations. In the ideal state, the discriminator assigns probability 0.5 to both real and generated images; the two networks have then reached the equilibrium of the zero-sum game, which means that the generated images are comparable in quality to real images and the image generation goal has been achieved.
Through adversarial training, the GAN neatly avoids the need of traditional image generation methods to solve for an explicit probability distribution and lowers the training difficulty of image generation models, which was a major breakthrough for image generation. After years of development, the image generation quality of GANs has improved markedly, and they are widely used in many fields.
Although adversarial training reduces the difficulty of modeling and optimization, it also brings problems such as unstable training, generation results that are hard to interpret, and uncontrollable training. In addition, as deep learning is applied more widely in real scenarios, higher demands are placed on the controllability, reliability, and fairness of image generation methods. Mining and analyzing the intrinsic representation of image data and designing a reasonable attribute constraint paradigm lets researchers control the generation result precisely by simply modifying the input, which is important for applying GANs in fields such as security and medicine. Moreover, because different image modes are not equally difficult to generate, a model's ability to generate images of different attributes and categories within the same dataset is biased. In practice, a GAN may prefer to generate certain attributes and neglect minority attributes, so the generated results are structurally uniform and lack diversity. This limits the application of GANs in practice and calls for further research.
On controllable generation of image attributes, Sudipto Mukherjee et al. built on the GAN framework and the principle of mutual information maximization to propose ClusterGAN, an unsupervised category-attribute-controllable GAN based on clustering, which achieves a certain degree of category-controllable image generation. Reference: Mukherjee S, Asnani H, Lin E, et al. ClusterGAN: Latent Space Clustering in Generative Adversarial Networks [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33: 4610-4617. Inspired by clustering in machine learning, the model decomposes the noise space fed to the generator and separates out a latent code vector that carries category attribute control. A cross-entropy loss constrains images generated under the same latent code to have consistent feature representations, which maximizes the mutual information between the latent code and the generated image and binds the latent code to the image attribute. ClusterGAN performs well on datasets such as MNIST and Fashion-MNIST. However, it does not perform as well on the more complex CIFAR-10 dataset. Moreover, ClusterGAN does not use a deep neural network with strong fitting capability, so the quality of its generated images falls short of practical requirements and the accuracy of its category attribute control is poor, which prevents its use in practical scenarios.
To address this problem, the Chinese patent 'a self-supervision attribute controllable image generation method based on a depth twin network' (application number: 202210006607.5) analyzed the shortcomings of existing unsupervised attribute-controllable image generation methods and found that they generally apply attribute constraints only to generated images, ignoring the structured prior information implicit in real images, which leads to insufficient attribute control. That invention therefore introduced a self-supervised representation learning method based on a deep Siamese network. Reference: Chen X, He K. Exploring simple siamese representation learning [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 15750-15758. The method supplies the GAN with representation information from real images, corrects the direction of the attribute constraint during image generation, and improves the attribute control of the GAN. However, although that method improves both attribute control and image generation quality, it does not study fairness in the image generation process, and the potential of the model remains to be explored and improved further.
Based on the above analysis, the invention explores how to improve the fairness of GANs and achieve more reliable and fair attribute-controlled image generation. The inventors noted that in recent years, large-scale pre-trained models in natural language processing have achieved good transfer performance on downstream tasks. Reference: Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding [J]. arXiv preprint arXiv:1810.04805, 2018. This indicates that a good initialization obtained through pre-training helps a deep learning model transfer and extend to multiple tasks. Good transferability in turn implies that the model has learned input data of different modes and attributes fully and completely, which reveals the importance of pre-training initialization for fair learning. Inspired by this, the invention proposes a pre-trained, reliable image generation algorithm based on structured prior learning: a self-supervised image representation algorithm introduces reliable and fair initialization parameters ahead of the attribute constraint process of the GAN, improving both the attribute control of the model and the fairness of its generated results. The invention also proposes a fairer metric for image attribute control, which improves on the attribute control metric of the original method and describes model performance more comprehensively and evenly.
Disclosure of Invention
The invention discloses a fair and controllable image generation method based on structured network prior knowledge. It mainly addresses the insufficient attribute control of existing controllable image generation methods and their poor fairness when generating data of different modes.
The invention introduces a self-supervised representation learning method based on a cosine similarity measure, designs a structured-prior initialization training scheme suited to attribute-controlled image generation, and constructs an attribute-controllable generative adversarial network with good image representation capability. The CIFAR-10 dataset is used as the research object, and the image data are first normalized to meet the training requirements of deep learning algorithms. The proposed algorithm consists of three deep neural networks: a generator, an encoder, and a discriminator. The generator uses a mixed input: its latent variable is decomposed into a category latent variable and a category-independent latent variable, where the category latent variable is sampled from a categorical distribution and expanded into One-Hot encoding, and the category-independent latent variable is sampled from a Gaussian distribution. The encoder takes image data as input and provides the category attribute control constraint for the generator. The discriminator takes image data as input, forms a generative adversarial pair with the generator, and provides the generator with the authenticity constraint on generated images. Training proceeds in two stages. The first is a pre-training stage, in which the encoder is pre-trained with a self-supervised representation learning algorithm. The second is the formal training stage, which trains the attribute-controllable generator to meet the requirements of downstream tasks.
Specifically, in the first stage, a data augmentation strategy generates two perturbed augmented samples from each real image, and a cosine similarity loss constrains the encoder to output similar representations for the two samples. The encoder thereby acquires a structured, class-specific feature extraction capability, which provides good initialization for the subsequent attribute-controllable image generation training and keeps the optimization of the subsequent task heading in the right direction. In the second stage, a joint optimization objective is established for the generator: the generative adversarial network is built with spectral normalization, a hinge loss is introduced to optimize image generation quality, and a category attribute constraint based on cross-entropy loss drives the generator toward controllable image generation. Simulation results show that the proposed strategy of introducing structured network prior knowledge provides a good initial parameter distribution for the encoder, promotes fairness in the generator's output, and improves attribute control. The overall structure of the invention is shown in Figure 1.
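The second-stage objective names a hinge loss without writing it out. The following is a minimal sketch of the hinge formulation commonly paired with spectrally normalized GANs; the exact form used by the invention is not given in the text, so the formulation, the function names, and the toy scores are all assumptions made for illustration.

```python
# Sketch of the widely used GAN hinge loss (assumed form; the patent text
# only states that a hinge loss is used).

def d_hinge_loss(real_scores, fake_scores):
    """Discriminator loss: push real scores above +1 and fake scores below -1."""
    real_term = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake_term = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def g_hinge_loss(fake_scores):
    """Generator loss: raise the discriminator's scores on generated images."""
    return -sum(fake_scores) / len(fake_scores)


# Hypothetical discriminator outputs for two real and two generated images.
real = [1.5, 0.5]
fake = [-2.0, 0.0]
print(d_hinge_loss(real, fake))  # 0.75
print(g_hinge_loss(fake))        # 1.0
```

Scores that already clear the margin (1.5 for a real image, -2.0 for a generated one) contribute nothing to the discriminator loss, which is the property that distinguishes the hinge form from a log-likelihood loss.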
In order to more clearly describe the details of the invention, certain terms are first defined.
Definition 1: Gaussian distribution. Also called the normal distribution, it is of great importance in physics, mathematics, and other fields, and it is very widely applicable because many regularities in real life follow it. If the probability density function of a random variable $x$ satisfies
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
where $\mu$ is the mathematical expectation and $\sigma$ is the standard deviation of the distribution, then $x$ is said to follow the Gaussian distribution, commonly denoted $N(x \mid \mu, \sigma^2)$.
Definition 2: Categorical distribution. It is a generalization of the Bernoulli distribution. Specifically, for an event with $n$ possible cases, the probabilities $p_i$ of the cases sum to one:
$$\sum_{i=1}^{n} p_i = 1$$
At each sampling, one case is drawn at random with equal probability; this is widely used to sample classes from a class-balanced dataset.
Definition 3: Deep residual neural network. The deep residual neural network is an improvement on the deep convolutional neural network that alleviates the overfitting and poor stability of ordinary deep convolutional networks at large depth. It uses the residual block as its basic structural unit. Each residual block contains a 'short-cut' cross-layer connection that carries the block's input over to its output, so that feature information from different receptive fields is fused within the block, giving it multi-scale feature fusion capability. Experiments show that deep residual networks are more stable than plain deep convolutional networks. In application, residual blocks of different sizes and numbers can be stacked as required, giving strong image feature extraction capability and better image representation and generation results.
Definition 4: Average pooling. Average pooling divides an input image into rectangular regions and averages each sub-region. The strategy derives from spatial filtering. For a given feature map $X$ divided into sub-regions $X_k$, the output of the $k$-th sub-region after average pooling is
$$y_k = \frac{1}{R_k} \sum_{(a,b)} x_{kab}$$
where $R_k$ is the number of pixels in the $k$-th sub-region and $x_{kab}$ is the value of the element at position $(a, b)$ in the $k$-th sub-region. By taking the mean of each rectangular region, average pooling merges the region into a single pixel and thereby reduces dimensionality.
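The averaging in Definition 4 can be illustrated with a short NumPy sketch. The non-overlapping 2 x 2 regions and the function name `average_pool` are assumptions chosen for illustration; the text does not fix a region size at this point.

```python
import numpy as np


def average_pool(feature_map, size=2):
    """Average non-overlapping size x size regions of a 2-D feature map."""
    h, w = feature_map.shape
    assert h % size == 0 and w % size == 0, "shape must be divisible by size"
    # Split each axis into (blocks, size) and average over the two size axes.
    blocks = feature_map.reshape(h // size, size, w // size, size)
    return blocks.mean(axis=(1, 3))


x = np.arange(16, dtype=float).reshape(4, 4)
pooled = average_pool(x)
print(pooled)  # [[ 2.5  4.5] [10.5 12.5]]
```

Each output element replaces a region of four pixels, so the 4 x 4 map shrinks to 2 x 2.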
Definition 5: Upsampling. Upsampling enlarges an input image or feature map, which is the opposite goal to pooling. Common upsampling algorithms include interpolation methods such as bilinear interpolation, nearest-neighbor interpolation, and mean filling. The invention uses nearest-neighbor upsampling, in which the values of the enlarged area are inferred from the values at adjacent positions, and each upsampling operation doubles the size of the input image or feature map.
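A minimal NumPy sketch of nearest-neighbor upsampling by a factor of two, as described in Definition 5. The function name `upsample_nearest` is a hypothetical helper introduced here.

```python
import numpy as np


def upsample_nearest(feature_map, factor=2):
    """Repeat every element `factor` times along height and then width."""
    return np.repeat(np.repeat(feature_map, factor, axis=0), factor, axis=1)


x = np.array([[1, 2],
              [3, 4]])
up = upsample_nearest(x)
print(up)
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```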
Definition 6: batch-Norm normalized function. The batch normalization function is a normalization method which is provided for solving the problem that in the training process, the neuron updating is unstable due to the change of the characteristic distribution and the optimization is difficult. The method is characterized in that the mean value and variance of the image data of the same batch are obtained, then the samples are normalized to be in line with Gaussian distribution, and in order to prevent the differences of the normalized data from disappearing, translation and scaling operations are used for recovering the differences among the samples after normalization. The learning parameters exist in the batch normalization function, different parameter playback strategies are used in the training and testing stages so as to meet different task demands, and the method has very wide application in convolutional neural networks, is beneficial to improving generalization of the model, and relieves overfitting so that the model is easier to converge.
Definition 7: Spectral normalization function. Spectral normalization is a normalization method for the weights of the discriminator network, proposed to address the unstable training of the original GAN and the difficulty of reaching its ideal optimum. Specifically, the weight matrix $W_l$ of neural network layer $l$ is divided by its largest singular value $\sigma(W_l)$, so that each layer of the discriminator satisfies 1-Lipschitz continuity, as required for computing the optimal-transport earth mover's distance. The weight matrix after spectral normalization is
$$\bar{W}_l = \frac{W_l}{\sigma(W_l)}$$
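As an illustration of Definition 7, the sketch below divides a weight matrix by its largest singular value computed with an exact SVD. Practical implementations usually estimate the singular value with power iteration instead; the exact SVD and the name `spectral_normalize` are simplifications assumed here.

```python
import numpy as np


def spectral_normalize(weight):
    """Divide a weight matrix by its largest singular value."""
    # np.linalg.svd returns singular values sorted in descending order.
    sigma = np.linalg.svd(weight, compute_uv=False)[0]
    return weight / sigma


rng = np.random.default_rng(0)
w = rng.normal(size=(6, 4))
w_sn = spectral_normalize(w)
top_singular_value = np.linalg.svd(w_sn, compute_uv=False)[0]
print(top_singular_value)
```

After normalization the largest singular value is 1 up to floating-point error, so the linear map cannot stretch any input vector by more than a factor of one.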
Definition 8: ReLU function. ReLU (rectified linear unit) is a piecewise linear activation function that sets values smaller than 0 to 0 and leaves values larger than 0 unchanged; its expression is ReLU(x) = max(0, x). It introduces nonlinearity into the deep neural network and improves its fitting capability.
Definition 9: Tanh function. The hyperbolic tangent function is nonlinear and monotonically increasing in its input. Its expression resembles the Sigmoid activation function, but its gradient range is wider than that of Sigmoid, which helps alleviate the vanishing gradient problem common in neural networks. Its expression is
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
The invention applies the Tanh function only at the output of the generator network, so that the output data lie in the interval [-1, 1].
Definition 10: Cosine similarity. Sometimes also referred to as cosine distance, it is a common vector measure that gauges the similarity of two vectors by the angle between them. It measures similarity in high-dimensional spaces well and is widely adopted in self-supervised representation learning. Its expression is
$$\cos(\vec{x}, \vec{y}) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$
where $x_i$ and $y_i$ are the components of the vectors $\vec{x}$ and $\vec{y}$ in the $i$-th dimension.
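A small Python sketch of cosine similarity, together with one way it might be turned into the first-stage loss. Using the negative similarity as the loss follows the SimSiam convention and is an assumption; the text only states that a cosine similarity loss is used, and both function names are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def negative_cosine_loss(z1, z2):
    """Assumed loss form: minimizing it pulls the two representations together."""
    return -cosine_similarity(z1, z2)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal vectors)
print(negative_cosine_loss([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Parallel vectors give a similarity of 1 regardless of their lengths, which is why the measure suits comparing encoder outputs whose scale is not meaningful.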
Definition 11: Normalized exponential function. The normalized exponential function (Softmax) is the most commonly used output normalization in multi-class classification. It is computed as
$$\mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$
It compresses each component $x_i$ of an $n$-dimensional vector $\vec{x}$ into the interval [0, 1], and the compressed vector satisfies
$$\sum_{i=1}^{n} \mathrm{Softmax}(x_i) = 1$$
so the output vector obeys the rules of a class probability distribution. Softmax is therefore often used as the final output operation of a probability prediction model.
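The Softmax computation can be sketched in a few lines of Python. Subtracting the maximum before exponentiating is a common numerical-stability step added here; it does not change the result.

```python
import math


def softmax(logits):
    """Map a list of real scores to probabilities that sum to one."""
    shift = max(logits)  # avoids overflow in exp; the output is unchanged
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```

The ordering of the inputs is preserved: the largest score receives the largest probability.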
Definition 12: one-bit efficient coding, also called One-Hot coding, is a coding mode for coding data into a binary form, wherein in the binary representation, other bits are set to 0 except an integer index position '1' corresponding to the coded data, and the binary representation is commonly used for vector representation of discrete variables.
Definition 13: Cross-entropy loss. Cross-entropy loss is a commonly used classification loss function. Minimizing the cross entropy between two distributions increases their mutual information and thereby ties categories to features. For two distributions $p$ and $q$, the cross entropy is the average information length needed to represent samples from $p$ using $q$, written
$$H(p, q) = -\sum_{x} p(x) \log q(x)$$
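A short sketch of cross entropy between a One-Hot target and a predicted class distribution, using the standard discrete form H(p, q) = -sum p(x) log q(x). The small constant `eps` and the example probabilities are assumptions added for illustration.

```python
import math


def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) log q(x) for two discrete distributions."""
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))


target = [0.0, 1.0, 0.0]        # One-Hot label for class 1
confident = [0.05, 0.90, 0.05]  # hypothetical prediction close to the label
unsure = [0.40, 0.30, 0.30]     # hypothetical prediction far from the label

print(cross_entropy(target, confident))
print(cross_entropy(target, unsure))
```

With a One-Hot target only the probability assigned to the true class contributes, so the confident prediction yields the smaller loss.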
Definition 14: an antagonizing network is generated. The generating countermeasure network is a representative model of an implicit probability inferred image generating method, the core component of the generating countermeasure network is a binary model formed by a generator and a discriminator, the task of the discriminator is to distinguish the authenticity of an input image, the task of the generator is to map an input hidden code to generate a false image, and the optimization aim is to generate a sufficiently realistic image sample to cheat the discriminator. The two optimization targets are opposite to form a pair of maximum and minimum zeros and games, and in the optimization process, the network parameters of the generator automatically fit probability distribution of real data without displaying solution, so that the strategy can reduce the optimization difficulty.
Definition 15: an encoder network. The encoder network constructed based on the deep neural network is a generic term of a feature extraction module, and has the function of extracting and representing output data of a high-dimensional space. The most common task of the encoder network is to generalize the input data, mine the internal differences of the input data according to the control of the supervision information or the constraint function, and generalize different characterization modes of different data. The optimization of the encoder network used herein aims at fully extracting the structured characterization information of the image by using an unsupervised method, can generalize the inherent category attribute differences of the image data, and transmits the generalized information to the generator to guide the generator to realize attribute control.
Definition 16: Random data augmentation. Random data augmentation is a data expansion method proposed to compensate for insufficient diversity in training data. Augmented samples are generated by transforming the position and color of the original image according to certain rules, or by adding noise, in order to alleviate overfitting of the deep neural network. Here, random data augmentation is used to generate two moderately different copies of the same image sample, which are used for the downstream attribute constraint and image representation extraction.
Therefore, the technical scheme of the invention is as follows: a fair and controllable image generation method based on structured network prior knowledge, comprising:
Step 1: Preprocess the experimental image data;
All image data to be used are converted from RGB image format to the Tensor format used for deep learning operations, and the pixel value range is normalized from 0-255 to the interval 0-1 to facilitate computation and inference; every training image is constrained to have the same size;
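Step 1 can be sketched with NumPy as below. The channels-first output layout follows the common deep learning convention and is an assumption, as is the helper name `to_tensor`; the text only specifies the conversion and the 0-1 range.

```python
import numpy as np


def to_tensor(image_uint8):
    """Convert an H x W x 3 uint8 RGB image to a 3 x H x W float array in [0, 1]."""
    scaled = image_uint8.astype(np.float32) / 255.0
    return np.transpose(scaled, (2, 0, 1))  # channels-first layout (assumed)


rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # stand-in image
tensor = to_tensor(rgb)
print(tensor.shape, tensor.dtype, tensor.min(), tensor.max())
```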
Step 2: Apply random data augmentation to the experimental image data;
The image data processed in step 1 are randomly augmented with 8 operations: random resized cropping, random horizontal flipping, random brightness change, random contrast change, random saturation change, random hue change, random grayscale conversion, and random Gaussian blur. These 8 operations are applied to each image independently and probabilistically as follows:
First, a crop region covering 20%-100% of the original image area is selected at random, and the cropped image is resized back to the original size;
Second, the image is flipped horizontally with 50% probability;
Third, the brightness, contrast, and saturation of the image are each randomly changed to 60%-140% of the original, and the hue is randomly shifted within a range of -10% to 10%. Note that in practice these four operations are bound together and applied as a group with 80% probability;
Fourth, the image is converted to grayscale with 20% probability;
Fifth, the image is Gaussian-blurred with 50% probability, producing a blurrier sample.
In practice, the augmentation operations are applied in the order above, and each image yields a randomly augmented sample of the same size as the original;
step 3: constructing a deep neural network;
1) Constructing a generator network:
the input of the generator is a 128-dimensional noise vector consisting of 118-dimensional Gaussian noise and a 10-dimensional One-Hot code, and the output is an image of size 3 × 32 × 32; the generator network is formed by connecting, in sequence, a fully connected layer, a residual neural network made up of 3 residual modules, and a two-dimensional convolution layer, with the fully connected layer as the input end and the two-dimensional convolution layer as the output end;
2) Constructing a discriminator network:
the discriminator takes real images and generated images as input and outputs a 1-dimensional feature; its network consists of four spectrally normalized residual blocks, a global average pooling layer and a fully connected layer; the four residual blocks are connected in sequence to form a residual neural network, which is followed by the global average pooling layer and then the fully connected layer, with the residual neural network as the input end and the fully connected layer as the output end;
3) Constructing an encoder network:
the encoder takes generated images and randomly enhanced samples of real images as input and outputs image feature vectors; the main body of the encoder network is formed by connecting a ResNet18 network and two fully connected layers in sequence, with the residual neural network as the input end and the last fully connected layer as the output end, and it outputs a 2048-dimensional feature vector; the feature vector output by the encoder body is sent to a prediction head made of two additional fully connected layers, which is used to compute the cosine similarity loss of the features, and to a cluster head made of one fully connected layer, which provides the attribute control loss for the generator. The encoder designed by the invention has a high output dimension, which gives it stronger feature representation capability and lets it learn the feature information of the different modes of the complete data set more fully; batch normalization is applied to all fully connected layers after the residual layers to mitigate over-fitting;
step 4: designing the structured network prior knowledge introduction loss;
the real image data in Tensor format processed in step 1 is denoted
Figure BDA0003998356360000081
Applying the random data enhancement of step 2 to x gives two enhanced samples, denoted T_1(x) and T_2(x); the main network of the encoder is denoted E and the prediction head is denoted P, so that the encoder output feature is q_i = E(T_i(x)) and the prediction head output feature is h_i = P(q_i);
In order to learn reliable and sufficient prior representation knowledge of the real images, a reasonable and effective representation-mining loss is needed in the encoder pre-training stage. Such a loss gives the encoder a reliable network weight initialization, so that the encoder can output reliable and fair attribute constraints in the subsequent image generation task, which promotes the fairness of image generation and attribute control. On this basis, the invention proposes a structured network prior knowledge introduction loss based on cosine similarity, which expects the encoder network to output similar feature representations for different enhanced samples of the same image:
Figure BDA0003998356360000091
in the above formula, stopgrad(·) denotes the stop-gradient strategy: no gradient is computed for a variable under stop-gradient during back-propagation;
step 5: designing network loss in the formal training process;
the 118-dimensional class-independent latent code randomly sampled from a Gaussian distribution is denoted
Figure BDA0003998356360000092
a random integer with a value of 0-9, sampled from a categorical distribution in which each value has probability 0.1, is denoted m, and the corresponding 10-dimensional One-Hot vector is denoted
Figure BDA0003998356360000093
z and c together form the input of the generator, so the input latent code of the generator is 128-dimensional; the generator, the discriminator and the encoder cluster head network are denoted G, D and C respectively, and the remaining parameters not explained here have the same meaning as in step 4;
1) Generator loss function Loss_G
The generator is optimized to generate images that are as realistic as possible, while images generated under the same class latent code c should have the same feature representation so that they share a consistent class. Accordingly, the generator loss consists of two parts, the generative adversarial network loss
Figure BDA0003998356360000094
and the class consistency constraint loss
Figure BDA0003998356360000095
where:
Figure BDA0003998356360000096
Figure BDA0003998356360000097
in the above formulas,
Figure BDA0003998356360000098
denotes the expectation of the loss computed over the images generated from a batch of latent codes sampled from their distributions; D(G(z, c)) denotes the output of the discriminator when a generator-produced image is taken as input; C(E(G(z, c))) denotes the 10-dimensional feature that the encoder cluster head produces from the features extracted from the generated image by the encoder body network; CE(·,·) denotes the cross-entropy loss. Note that when the cross-entropy loss is computed, a Softmax operation is applied to the image representation output by the cluster head C; this step is omitted from the notation for simplicity;
Thus, the generator total loss function is:
Figure BDA0003998356360000099
in the above formula, the symbol
Figure BDA0003998356360000101
indicates that the corresponding network parameters are fixed and that no gradient is computed for that network; β_c is an adjustable hyperparameter that sets the weight of the class consistency loss in the total generator loss;
2) Discriminator loss function Loss_D
The discriminator is optimized to distinguish real images from generated images as accurately as possible; its loss function, designed on the basis of the hinge loss, is:
Figure BDA0003998356360000102
in the above formula,
Figure BDA0003998356360000103
denotes the expectation over a batch of samples randomly drawn from the real image distribution, D(x) denotes the output of the discriminator when a real image is taken as input, and the remaining definitions are the same as in the generator loss function;
3) Encoder loss function Loss_E
The encoder is optimized for two purposes. First, it should provide feature representations that match the representation pattern of the real image data as closely as possible, so as to avoid giving the generator erroneous guidance. Second, given a class latent code c, it must also constrain generated images that share the same class latent code to have similar feature representations. To this end, the invention keeps the prior knowledge guidance loss designed for the encoder pre-training stage, which ensures that the encoder continues to be optimized in a direction consistent with the representation characteristics of real images during subsequent training, and uses the same cross-entropy loss function as the generator to satisfy the attribute control constraint:
Figure BDA0003998356360000104
Figure BDA0003998356360000105
The terms in the above formulas are defined as before. The total loss of the encoder in the formal training stage is:
Figure BDA0003998356360000106
step 6: training a total neural network;
using the three neural networks constructed in step 3, the encoder is first pre-trained with the structured network prior knowledge introduction loss proposed in step 4, which gives the encoder well-structured initial network parameters; after the pre-training stage is complete, each network is trained with its corresponding loss function designed in step 5, using the Adam momentum optimizer; the encoder and generator network parameters are fixed while the discriminator network parameters are updated, and the discriminator network parameters are fixed while the encoder and generator networks are updated;
step 7: after the model has been trained in step 6, the model parameters are saved, the generator is taken, and its input latent code vector is constructed according to the method in step 5 and fed into the generator to obtain a generated image; different latent code combinations produce different generated images.
The invention comprises the following improvement points:
a, To address the problem that current unsupervised attribute-controllable image generation methods do not consider the fairness of the generative model across data of different modes, a fair controllable image generation method based on structured network prior knowledge is proposed: a well-structured network prior loss function is designed to pre-train the encoder network, which provides reliable parameter initialization for the attribute control process of the target network and ensures that the initial optimization path of the model is correct.
b, To address the problem that the attribute control effect of current unsupervised attribute-controlled image generation methods still has room for improvement, and building on the structured network prior knowledge introduction loss proposed by the invention, the network structure of the encoder is studied carefully: a higher feature dimension is adopted to represent the image more fully, each linear layer component at the end of the encoder is examined in detailed experiments, and batch normalization is finally introduced, further improving the representation extraction capability of the encoder.
c, Considering the differences between representation extraction methods, the invention uses a pre-trained classifier network to predict the class of each generated image and measures the attribute control effect of the model by the consistency between the class prediction results and the distribution of the class latent codes corresponding to the images, which provides a fairer strategy for measuring the attribute control effect.
The improvement in a can be applied to existing unsupervised attribute-controlled image generation algorithms without any modification of the network structure; it provides a structured prior knowledge guarantee for the parameter initialization of the model and improves the fairness of the image generation process. The improvement in b clearly improves the attribute-controlled image generation effect of the model without modifying the model optimization process and without noticeably increasing the model inference cost. The improvement in c concerns the measurement of the model's attribute control capability: a third-party classifier independent of the model is used to evaluate the quality of attribute control, which makes the evaluation results more convincing. Experimental results show that the improvement to the encoder network structure alone gives relative improvements of 18.6% in the clustering purity index, 38.7% in the normalized mutual information index, and 45% in the adjusted Rand index of the generated images. After the structured network prior knowledge training strategy proposed by the invention is further introduced, and compared with the original method, the clustering purity of the generated results reaches 0.54, a relative improvement of 25.5%; the normalized mutual information reaches 0.48, a relative improvement of 54.8%; and the adjusted Rand index reaches 0.36, a relative improvement of 80%. These results show that the structured network prior knowledge introduction strategy proposed by the invention markedly improves the unsupervised attribute control performance of the generative adversarial network, and the improved attribute control effect in turn indicates that the model captures the image modes of the different classes more evenly.
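The relative improvements quoted above can be inverted to recover the baseline scores they imply. The short Python sketch below does only this arithmetic; the helper name `implied_baseline` is hypothetical, and the resulting baselines are rounded values inferred from the stated figures, not numbers reported in the text.

```python
def implied_baseline(new_score: float, relative_gain: float) -> float:
    """Return the baseline b that satisfies new_score = b * (1 + relative_gain)."""
    return new_score / (1.0 + relative_gain)


# (final score, relative improvement) as stated for the full method.
reported = {
    "clustering purity": (0.54, 0.255),
    "normalized mutual information": (0.48, 0.548),
    "adjusted Rand index": (0.36, 0.80),
}

for name, (score, gain) in reported.items():
    # Print the inferred baseline to two decimals.
    print(f"{name}: implied baseline ~ {implied_baseline(score, gain):.2f}")
```

Under this reading, a relative improvement is measured against the original method's score, so a small absolute change in a low baseline (as with the adjusted Rand index) appears as a large percentage.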
In addition, to measure more accurately how far the method improves the fairness of the generative adversarial network, the diversity of the generated results is used as the basis for measuring model fairness, since diversity directly reflects how completely the model covers the real image data. In tests, the method gives a relative improvement of 13.2% in the FID index of generated-image diversity, which indicates that the structured network prior knowledge guidance strategy effectively improves the fairness of the generative adversarial network model.
Drawings
FIG. 1 is a schematic diagram of the overall structure of the method of the present invention;
Detailed Description
Step 1: preprocessing experimental image data;
The CIFAR-10 data set is obtained from the official channel. All image data are converted from RGB format to a Tensor format that deep learning frameworks can operate on, and the pixel value range is normalized from 0-255 to the interval 0-1 to facilitate processing and inference by the computer; every training image is constrained to have the same size.
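The pixel normalization in this step can be illustrated with a small, dependency-free Python sketch. The function name `to_normalized_chw` is hypothetical, and the channel-first (3 × H × W) layout is an assumption based on the usual PyTorch tensor convention; the text itself only states the 0-255 to 0-1 rescaling.

```python
def to_normalized_chw(pixels_hwc):
    """Convert an H x W x 3 nested list of 0-255 integers into a
    3 x H x W nested list of floats in [0, 1] (channel-first layout, assumed)."""
    height, width = len(pixels_hwc), len(pixels_hwc[0])
    return [
        [[pixels_hwc[y][x][ch] / 255.0 for x in range(width)]
         for y in range(height)]
        for ch in range(3)
    ]


# A 2 x 2 toy image: each pixel is an (R, G, B) triple.
toy = [[(0, 128, 255), (255, 255, 255)],
       [(10, 20, 30), (0, 0, 0)]]
chw = to_normalized_chw(toy)
```

In a PyTorch workflow the same conversion is normally done by `torchvision.transforms.ToTensor`, which also rescales 8-bit images to the 0-1 range.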
Step 2: carrying out random data enhancement on experimental image data;
Random data enhancement is applied to the image data processed in step 1 using 8 operations: random resized cropping, random horizontal flipping, random brightness change, random contrast change, random saturation change, random hue change, random grayscale conversion and random Gaussian blur; these 8 operations are applied to each image independently, with the following probabilities:
first, a crop region covering a randomly chosen 20%-100% of the original image area is selected, and the cropped image is resized back to the original size;
second, the image is flipped horizontally with 50% probability;
third, the brightness, contrast and saturation of the image are each randomly scaled to 60%-140% of their original values, and the hue is randomly shifted within a range of -10% to 10%. Note that in practice these four operations are bundled together and applied jointly with 80% probability;
fourth, the image is converted to a grayscale image with 20% probability;
fifth, Gaussian blur is applied to the image with 50% probability, giving a more blurred sample.
In practice, the data enhancement operations are applied in the order given above, and each image yields a randomly enhanced sample of the same size as the original image.
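The probability-gated sequence above can be sketched as a function that draws the random parameters for one enhanced copy of an image. This is only an illustration of the sampling logic, not of the image operations themselves: the name `sample_enhancement_plan` and the dictionary keys are hypothetical, and the hue shift is taken here as the symmetric range -10% to +10%.

```python
import random


def sample_enhancement_plan(rng):
    """Draw the random parameters for one enhanced copy, in the stated order."""
    plan = {}
    # Step 1: crop 20%-100% of the original area, then resize back.
    plan["crop_area_fraction"] = rng.uniform(0.2, 1.0)
    # Step 2: horizontal flip with 50% probability.
    plan["flip"] = rng.random() < 0.5
    # Step 3: the four color operations are bundled and applied with 80% probability.
    if rng.random() < 0.8:
        plan["jitter"] = {
            "brightness": rng.uniform(0.6, 1.4),
            "contrast": rng.uniform(0.6, 1.4),
            "saturation": rng.uniform(0.6, 1.4),
            "hue_shift": rng.uniform(-0.1, 0.1),  # assumed symmetric range
        }
    else:
        plan["jitter"] = None
    # Step 4: grayscale conversion with 20% probability.
    plan["grayscale"] = rng.random() < 0.2
    # Step 5: Gaussian blur with 50% probability.
    plan["blur"] = rng.random() < 0.5
    return plan


# Two independent draws give the two differing copies T_1(x) and T_2(x).
rng = random.Random(0)
view_1 = sample_enhancement_plan(rng)
view_2 = sample_enhancement_plan(rng)
```

Because each copy is drawn independently, the two plans almost always differ, which is what lets the encoder be trained to agree on two views of the same image.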
Step 3: constructing a deep neural network;
1) Constructing a generator network:
The input of the generator is a 128-dimensional noise vector consisting of 118-dimensional Gaussian noise and a 10-dimensional One-Hot code, and the output is an image of size 3 × 32 × 32; the generator network is formed by connecting, in sequence, a fully connected layer, a residual neural network made up of 3 residual modules, and a two-dimensional convolution layer, with the fully connected layer as the input end and the two-dimensional convolution layer as the output end.
2) Constructing a discriminator network:
The discriminator takes real images and generated images as input and outputs a 1-dimensional feature; its network consists of four spectrally normalized residual blocks, a global average pooling layer and a fully connected layer; the four residual blocks are connected in sequence to form a residual neural network, which is followed by the global average pooling layer and then the fully connected layer, with the residual neural network as the input end and the fully connected layer as the output end.
3) Constructing an encoder network:
The encoder takes generated images and randomly enhanced samples of real images as input and outputs image feature vectors; the main body of the encoder network is formed by connecting a ResNet18 network and two fully connected layers in sequence, with the residual neural network as the input end and the last fully connected layer as the output end, and it outputs a 2048-dimensional feature vector; the feature vector output by the encoder body is sent to a prediction head made of two additional fully connected layers, which is used to compute the cosine similarity loss of the features, and to a cluster head made of one fully connected layer, which provides the attribute control loss for the generator. The encoder designed by the invention has a high output dimension, which gives it stronger feature representation capability and lets it learn the feature information of the different modes of the complete data set more fully; batch normalization is applied to all fully connected layers after the residual layers to mitigate over-fitting.
Step 4: designing the structured network prior knowledge introduction loss;
the real image data in Tensor format processed in step 1 is denoted
Figure BDA0003998356360000131
Applying the random data enhancement of step 2 to x gives two enhanced samples, denoted T_1(x) and T_2(x); the main network of the encoder is denoted E and the prediction head is denoted P, so that the encoder output feature is q_i = E(T_i(x)) and the prediction head output feature is h_i = P(q_i).
In order to learn reliable and sufficient prior representation knowledge of the real images, a reasonable and effective representation-mining loss is needed in the encoder pre-training stage. Such a loss gives the encoder a reliable network weight initialization, so that the encoder can output reliable and fair attribute constraints in the subsequent image generation task, which promotes the fairness of image generation and attribute control. On this basis, the invention proposes a structured network prior knowledge introduction loss based on cosine similarity, which expects the encoder network to output similar feature representations for different enhanced samples of the same image:
Figure BDA0003998356360000132
In the above formula, stopgrad(·) denotes the stop-gradient strategy: no gradient is computed for a variable under stop-gradient during back-propagation.
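Since the formula itself is not reproduced in this text, the Python sketch below shows one common way such a loss is written: a symmetric negative cosine similarity between each prediction head output and the other view's encoder output. The exact form (the 1/2 factor, the sign, and the pairing of h_1 with q_2 and h_2 with q_1) is an assumption, as are the function names, and the vectors are assumed to have equal length.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, 1e-12)  # guard against zero vectors


def prior_knowledge_loss(h1, q1, h2, q2):
    """Assumed form: -1/2 * [cos(h1, stopgrad(q2)) + cos(h2, stopgrad(q1))].

    In an autograd framework q1 and q2 would be detached (stop-gradient) so
    that only the prediction-head branch receives gradients; with plain floats
    there is no gradient, so stop-gradient does not change the value.
    """
    return -0.5 * (cosine_similarity(h1, q2) + cosine_similarity(h2, q1))
```

With this form the loss reaches its minimum of -1 when the two views produce perfectly aligned features, and is 0 when they are orthogonal.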
Step 5: designing network loss in the formal training process;
The 118-dimensional class-independent latent code randomly sampled from a Gaussian distribution is denoted
Figure BDA0003998356360000133
a random integer with a value of 0-9, sampled from a categorical distribution in which each value has probability 0.1, is denoted m, and the corresponding 10-dimensional One-Hot vector is denoted
Figure BDA0003998356360000134
z and c together form the input of the generator, so the input latent code of the generator is 128-dimensional; the generator, the discriminator and the encoder cluster head network are denoted G, D and C respectively, and the remaining parameters not explained here have the same meaning as in step 4.
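The construction of the generator input can be sketched in a few lines of Python. The function name `sample_latent`, the use of a standard normal distribution (zero mean, unit variance), and the ordering of z before c in the concatenated vector are assumptions; the text specifies only the dimensions and the uniform class probability.

```python
import random

NOISE_DIM, NUM_CLASSES = 118, 10


def sample_latent(rng, class_index=None):
    """Return (128-dimensional input latent code, class integer m)."""
    # z: class-independent Gaussian noise (standard normal assumed).
    z = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
    # m: uniform over 0-9 unless a specific class is requested.
    m = rng.randrange(NUM_CLASSES) if class_index is None else class_index
    # c: One-Hot encoding of m.
    c = [1.0 if k == m else 0.0 for k in range(NUM_CLASSES)]
    return z + c, m


latent, m = sample_latent(random.Random(0))
```

Passing `class_index` fixes the class latent code while the noise part still varies, which is how images of a chosen class would be requested from a trained generator.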
1) Generator loss function Loss_G
The generator is optimized to generate images that are as realistic as possible, while images generated under the same class latent code c should have the same feature representation so that they share a consistent class. Accordingly, the generator loss consists of two parts, the generative adversarial network loss
Figure BDA0003998356360000141
and the class consistency constraint loss
Figure BDA0003998356360000142
where:
Figure BDA0003998356360000143
Figure BDA0003998356360000144
In the above formulas,
Figure BDA0003998356360000145
denotes the expectation of the loss computed over the images generated from a batch of latent codes sampled from their distributions; D(G(z, c)) denotes the output of the discriminator when a generator-produced image is taken as input; C(E(G(z, c))) denotes the 10-dimensional feature that the encoder cluster head produces from the features extracted from the generated image by the encoder body network; CE(·,·) denotes the cross-entropy loss. Note that when the cross-entropy loss is computed, a Softmax operation is applied to the image representation output by the cluster head C; this step is omitted from the notation for simplicity.
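The class consistency term, including the Softmax step that the notation omits, can be sketched as follows for a single generated image. The function names are hypothetical; `cluster_logits` stands for the 10-dimensional output C(E(G(z, c))) and `class_index` for the integer m whose One-Hot vector is c.

```python
import math


def softmax(logits):
    """Numerically stable Softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_consistency_loss(cluster_logits, class_index):
    """Cross entropy between Softmax(cluster_logits) and a One-Hot target.

    With a One-Hot target the sum collapses to -log of the probability
    assigned to the target class.
    """
    probs = softmax(cluster_logits)
    return -math.log(probs[class_index])
```

A cluster head that is undecided (all logits equal) gives a loss of log(10), while one that assigns nearly all probability to the class named by the latent code gives a loss near zero.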
Thus, the generator total loss function is:
Figure BDA0003998356360000146
In the above formula, the symbol
Figure BDA0003998356360000147
indicates that the corresponding network parameters are fixed and that no gradient is computed for that network; β_c is an adjustable hyperparameter that sets the weight of the class consistency loss in the total generator loss.
2) Discriminator loss function Loss_D
The discriminator is optimized to distinguish real images from generated images as accurately as possible; its loss function, designed on the basis of the hinge loss, is:
Figure BDA0003998356360000148
In the above formula,
Figure BDA0003998356360000149
denotes the expectation over a batch of samples randomly drawn from the real image distribution, D(x) denotes the output of the discriminator when a real image is taken as input, and the remaining definitions are the same as in the generator loss function;
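As the hinge-based formula is not reproduced in this text, the sketch below uses the standard hinge formulation for generative adversarial networks, which is assumed rather than confirmed to match it: the discriminator is penalized when real scores fall below +1 or generated scores rise above -1, and the generator's adversarial term is the negated mean score of its images. Function names are hypothetical.

```python
def discriminator_hinge_loss(d_real, d_fake):
    """Standard hinge loss (assumed form) over batches of discriminator scores.

    d_real: scores D(x) for real images; d_fake: scores D(G(z, c)).
    """
    real_term = sum(max(0.0, 1.0 - s) for s in d_real) / len(d_real)
    fake_term = sum(max(0.0, 1.0 + s) for s in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_adversarial_loss(d_fake):
    """Adversarial part of the generator loss in the hinge setting (assumed)."""
    return -sum(d_fake) / len(d_fake)
```

Once real images score above +1 and generated images below -1, the discriminator loss is exactly zero, so well-separated samples stop contributing gradient.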
3) Encoder loss function Loss_E
The encoder is optimized for two purposes. First, it should provide feature representations that match the representation pattern of the real image data as closely as possible, so as to avoid giving the generator erroneous guidance. Second, given a class latent code c, it must also constrain generated images that share the same class latent code to have similar feature representations. To this end, the invention keeps the prior knowledge guidance loss designed for the encoder pre-training stage, which ensures that the encoder continues to be optimized in a direction consistent with the representation characteristics of real images during subsequent training, and uses the same cross-entropy loss function as the generator to satisfy the attribute control constraint:
Figure BDA0003998356360000151
Figure BDA0003998356360000152
The terms in the above formulas are defined as before. The total loss of the encoder in the formal training stage is:
Figure BDA0003998356360000153
step 6: training a total neural network;
Using the three neural networks constructed in step 3, the encoder is first pre-trained with the structured network prior knowledge introduction loss proposed in step 4, which gives the encoder well-structured initial network parameters. The encoder pre-training stage uses the Adam momentum optimizer with a learning rate of 0.0002 and feeds 64 real images per iteration; in the experiments the encoder is trained for 500 passes over the complete data set.
After pre-training, each network is trained with its corresponding loss function designed in step 5, using the Adam momentum optimizer with a learning rate of 0.0002. The encoder and the generator update their parameters together, alternating with the discriminator: the generator and the encoder are updated once for every three discriminator updates. The encoder and generator network parameters are fixed while the discriminator network parameters are updated, and the discriminator network parameters are fixed while the encoder and generator networks are updated. Each discriminator update uses 64 generated images and 64 real images; each generator and encoder update uses 128 groups of independent and identically distributed latent code vectors, and the encoder is additionally updated with 64 real images. In the experiments the discriminator is trained for 350 passes over the complete data set.
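The alternating update schedule can be sketched as a loop that records which networks are updated at each step. Only the 3:1 update ratio and the batch sizes come from the text; the function name `run_schedule` and the log labels are hypothetical, and the actual optimizer steps are left out.

```python
D_STEPS_PER_GE_STEP = 3  # three discriminator updates per generator/encoder update


def run_schedule(num_discriminator_updates):
    """Return the order of updates as a list of labels."""
    log = []
    for step in range(1, num_discriminator_updates + 1):
        # Discriminator update: 64 generated + 64 real images,
        # generator and encoder parameters held fixed.
        log.append("D")
        if step % D_STEPS_PER_GE_STEP == 0:
            # Joint generator/encoder update: 128 latent codes, plus
            # 64 real images for the encoder; discriminator held fixed.
            log.append("G+E")
    return log


print(run_schedule(6))
```

Updating the discriminator more often than the generator is a common way to keep the discriminator's feedback informative during adversarial training.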
All experiments are implemented in the Python language on the PyTorch deep learning platform, with Python version 3.6 and PyTorch version 1.7.1.
Step 7: testing a total neural network;
After the model has been trained in step 6, the model parameters are saved and the generator is taken; latent code vectors are constructed according to the method in step 5 and fed into the generator to obtain generated images, with different random noise inputs producing different generated images. 50000 images are generated in this way, and the FID index of image generation diversity is computed to evaluate the fairness of the generator's image generation process. In addition, for each class, 1000 class-independent latent codes are randomly and independently sampled and paired with the class latent code, and these are fed into the generator to produce 10000 generated images; the 10000 images are input into a classifier network pre-trained on the real CIFAR-10 data set for class prediction, and the attribute control effect of the model is evaluated from the class latent code distribution and the predicted class distribution using the clustering purity index ACC, the normalized mutual information index NMI and the adjusted Rand index ARI.
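The clustering purity part of this evaluation can be sketched as follows. The text does not spell out the definition, so the standard one is assumed: images are grouped by the class latent code used to generate them, and each group contributes the count of its most frequent classifier prediction. The function name `clustering_purity` is hypothetical.

```python
from collections import Counter, defaultdict


def clustering_purity(latent_classes, predicted_classes):
    """Fraction of images whose predicted class is the majority class
    within their latent-code group (standard purity, assumed definition)."""
    groups = defaultdict(Counter)
    for m, pred in zip(latent_classes, predicted_classes):
        groups[m][pred] += 1
    majority_total = sum(max(counter.values()) for counter in groups.values())
    return majority_total / len(latent_classes)


# Toy example: six images from two class latent codes.
latent = [0, 0, 0, 1, 1, 1]
predicted = [3, 3, 5, 7, 7, 7]
purity = clustering_purity(latent, predicted)
```

Purity does not require latent code k to correspond to classifier label k, which suits an unsupervised model whose class codes are assigned arbitrarily. For NMI and ARI, scikit-learn provides `normalized_mutual_info_score` and `adjusted_rand_score` in `sklearn.metrics`.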

Claims (1)

1. A method for generating a fair and controllable image based on a priori knowledge of a structured network, the method comprising:
step 1: preprocessing experimental image data;
all image data to be used are converted from RGB format to a Tensor format that deep learning frameworks can operate on, and the pixel value range is normalized from 0-255 to the interval 0-1 to facilitate processing and inference by the computer; every training image is constrained to have the same size;
step 2: carrying out random data enhancement on experimental image data;
random data enhancement is applied to the image data processed in step 1 using 8 operations: random resized cropping, random horizontal flipping, random brightness change, random contrast change, random saturation change, random hue change, random grayscale conversion and random Gaussian blur; these 8 operations are applied to each image independently, with the following probabilities:
first, a crop region covering a randomly chosen 20%-100% of the original image area is selected, and the cropped image is resized back to the original size;
second, the image is flipped horizontally with 50% probability;
third, the brightness, contrast and saturation of the image are each randomly scaled to 60%-140% of their original values, and the hue is randomly shifted within a range of -10% to 10%. Note that in practice these four operations are bundled together and applied jointly with 80% probability;
fourth, the image is converted to a grayscale image with 20% probability;
fifth, Gaussian blur is applied to the image with 50% probability, giving a more blurred sample.
In practice, the data enhancement operations are applied in the order given above, and each image yields a randomly enhanced sample of the same size as the original image;
step 3: constructing a deep neural network;
1) Constructing a generator network:
the input of the generator is a 128-dimensional noise vector consisting of 118-dimensional Gaussian noise and a 10-dimensional One-Hot code, and the output is an image of size 3 × 32 × 32; the generator network is formed by connecting, in sequence, a fully connected layer, a residual neural network made up of 3 residual modules, and a two-dimensional convolution layer, with the fully connected layer as the input end and the two-dimensional convolution layer as the output end;
2) Constructing a discriminator network:
the discriminator takes real images and generated images as input and outputs a 1-dimensional feature; its network consists of four spectrally normalized residual blocks, a global average pooling layer and a fully connected layer; the four residual blocks are connected in sequence to form a residual neural network, which is followed by the global average pooling layer and then the fully connected layer, with the residual neural network as the input end and the fully connected layer as the output end;
3) Constructing an encoder network:
the encoder takes generated images and randomly enhanced samples of real images as input and outputs image feature vectors; the main body of the encoder network is formed by connecting a ResNet18 network and two fully connected layers in sequence, with the residual neural network as the input end and the last fully connected layer as the output end, and it outputs a 2048-dimensional feature vector; the feature vector output by the encoder body is sent to a prediction head made of two additional fully connected layers, which is used to compute the cosine similarity loss of the features, and to a cluster head made of one fully connected layer, which provides the attribute control loss for the generator. The encoder designed by the invention has a high output dimension, which gives it stronger feature representation capability and lets it learn the feature information of the different modes of the complete data set more fully; batch normalization is applied to all fully connected layers after the residual layers to mitigate over-fitting;
step 4: designing a priori knowledge of a structured network to introduce loss;
denote the real-image data in Tensor format processed in step 1 as $x$.
Applying the random data augmentation of step 2 to $x$ yields two augmented samples, denoted $T_1(x)$ and $T_2(x)$; the encoder main network is denoted $E$ and the prediction head $P$, so that for $i = 1, 2$ the encoder output feature is $q_i = E(T_i(x))$ and the prediction head output feature is $h_i = P(q_i)$;
To learn reliable and sufficient prior representation knowledge from the real images, the encoder's pre-training stage needs a reasonable and effective representation-mining loss. Such a loss gives the encoder a reliable initialization of its network weights, so that in the subsequent image generation task the encoder can output reliable and fair attribute constraints, which promotes fairness in image generation and attribute control. On this basis, the invention introduces a structured-network prior-knowledge guidance loss based on cosine similarity, which expects the encoder network to output similar feature representations for different augmented samples of the same image:
Figure FDA0003998356350000022
in the above formula, stopgrad(·) denotes the stop-gradient strategy: no gradient is computed for a variable wrapped in stopgrad(·) during back-propagation;
step 5: designing the network losses for the formal training stage;
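The formula itself survives only as an image reference, so its exact form is not recoverable from this text. As a hedged illustration, the sketch below assumes a symmetric negative cosine similarity between each prediction-head output and the stop-gradient encoder output of the other view; the 0.5 factor, the sign and the pairing are assumptions, and `stopgrad` here is only a forward-pass placeholder.

```python
import math


def cosine_similarity(a, b, eps=1e-8):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (max(na, eps) * max(nb, eps))


def stopgrad(v):
    """Forward-pass placeholder: returns a copy of v unchanged.

    In an autodiff framework this would detach v from the graph so that
    no gradient flows through it during back-propagation.
    """
    return list(v)


def prior_loss(h1, h2, q1, q2):
    """Assumed form: -1/2 * [cos(h1, stopgrad(q2)) + cos(h2, stopgrad(q1))]."""
    return -0.5 * (
        cosine_similarity(h1, stopgrad(q2))
        + cosine_similarity(h2, stopgrad(q1))
    )
```

Under this assumed form the loss reaches its minimum of -1 when the two augmented views of an image map to features pointing in the same direction.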
denote the 118-dimensional class-independent latent code randomly sampled from a Gaussian distribution as $z$.
Denote as $m$ a random integer from 0 to 9 sampled from a categorical distribution in which each value has probability 0.1, and denote the corresponding 10-dimensional One-Hot vector as $c$.
$z$ and $c$ together form the input of the generator, so the generator's input latent code is 128-dimensional; the generator, the discriminator and the encoder cluster head network are denoted $G$, $D$ and $C$ respectively, and parameters not explained here have the same meaning as in step 4;
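The construction of the generator input can be sketched as follows. The text states only that the class-independent code is Gaussian; the standard normal parameters, the concatenation order (z first, then c) and the function name are assumptions made for illustration.

```python
import random


def sample_generator_input(rng, z_dim=118, n_classes=10):
    """Return a 128-dimensional latent code and the sampled class index m."""
    # Class-independent part: assumed to be standard normal.
    z = [rng.gauss(0.0, 1.0) for _ in range(z_dim)]
    # Class part: uniform categorical, so each of the 10 values has probability 0.1.
    m = rng.randrange(n_classes)
    c = [1.0 if k == m else 0.0 for k in range(n_classes)]
    # Assumed order: z followed by the One-Hot vector c.
    return z + c, m
```

Holding c fixed while resampling z yields different inputs that share one class code, which is the situation the class-consistency constraint in this step addresses.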
1) Generator loss function $\mathrm{Loss}_G$
The generator is optimized to produce images that are as realistic as possible, while images generated under the same class latent code c should have the same feature representation so that they belong to a consistent class. Accordingly, the generator loss comprises a generative adversarial network loss
Figure FDA0003998356350000031
and a class-consistency constraint loss
Figure FDA0003998356350000032
where:
Figure FDA0003998356350000033
Figure FDA0003998356350000034
In the above formulas,
Figure FDA0003998356350000035
denotes the expectation of the loss computed over the images generated from a batch of latent codes sampled from their distributions; $D(G(z, c))$ denotes the discriminator's output when the generated image is its input; $C(E(G(z, c)))$ denotes the 10-dimensional feature representation that the encoder cluster head produces from the features extracted from the generated image by the encoder main network; $CE(\cdot, \cdot)$ denotes the cross-entropy loss. Note that a Softmax operation is applied to the image representation output by the cluster head $C$ before the cross-entropy loss is calculated; this step is omitted from the formulas for brevity;
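The Softmax step that the formulas omit can be made explicit in a short sketch. It assumes the cluster head returns one 10-dimensional list of raw scores per generated image and that the batch expectation is a plain mean; the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable Softmax over a list of raw scores."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, one_hot, eps=1e-12):
    """CE between Softmax(logits) and a One-Hot target vector."""
    probs = softmax(logits)
    return -sum(t * math.log(p + eps) for p, t in zip(probs, one_hot))


def class_consistency_loss(batch_logits, batch_one_hot):
    """Mean of CE(C(E(G(z, c))), c) over a batch (mean is an assumption)."""
    losses = [cross_entropy(l, c) for l, c in zip(batch_logits, batch_one_hot)]
    return sum(losses) / len(losses)
```

With uniform scores the loss equals ln 10, the value expected when the cluster head carries no class information; it approaches 0 as the score of the target class dominates.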
Thus, the generator total loss function is:
Figure FDA0003998356350000036
In the above formula,
Figure FDA0003998356350000037
indicates that the parameters of the corresponding network are fixed and no gradient is computed for that network; $\beta_c$ is an adjustable hyperparameter representing the weight of the class-consistency loss in the total generator loss;
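How the two generator terms combine can be sketched as below. The adversarial term is assumed to take the usual hinge-style generator form, the negative mean of the discriminator scores on generated images, because the discriminator loss is described as hinge-based; the patent's actual expression is available only as an image, so this form is an assumption.

```python
def generator_adversarial_loss(d_fake):
    """Assumed hinge-style generator term: -mean(D(G(z, c)))."""
    return -sum(d_fake) / len(d_fake)


def generator_total_loss(d_fake, class_loss, beta_c):
    """Adversarial term plus beta_c times the class-consistency loss.

    d_fake: discriminator scores on generated images (D is held fixed).
    class_loss: class-consistency loss, already averaged over the batch.
    beta_c: adjustable weight of the class-consistency loss.
    """
    return generator_adversarial_loss(d_fake) + beta_c * class_loss
```

A larger beta_c places more weight on class consistency relative to realism; the text gives no recommended value.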
2) Discriminator loss function $\mathrm{Loss}_D$
The discriminator is optimized to distinguish real images from generated images as accurately as possible; its loss function, designed on the basis of the hinge loss, is:
Figure FDA0003998356350000038
In the above formula,
Figure FDA0003998356350000039
denotes the expectation over a batch of samples randomly drawn from the real-image distribution; $D(x)$ denotes the discriminator's output when a real image is its input; the remaining definitions are the same as in the generator loss function;
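For reference, the standard hinge formulation of a discriminator loss is sketched below. It is assumed, not confirmed by the extracted text, that the patent's formula matches this standard form, with the expectations taken as batch means.

```python
def discriminator_hinge_loss(d_real, d_fake):
    """Standard hinge loss for a GAN discriminator.

    d_real: discriminator scores D(x) on real images.
    d_fake: discriminator scores D(G(z, c)) on generated images.
    """
    # Real images are penalized when their score falls below +1.
    real_term = sum(max(0.0, 1.0 - s) for s in d_real) / len(d_real)
    # Generated images are penalized when their score rises above -1.
    fake_term = sum(max(0.0, 1.0 + s) for s in d_fake) / len(d_fake)
    return real_term + fake_term
```

The loss is zero once every real image scores at least 1 and every generated image scores at most -1, so samples that are already well separated stop contributing to the update.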
3) Encoder loss function $\mathrm{Loss}_E$
The encoder is optimized for two purposes. First, it should provide feature representations that conform as closely as possible to the representation pattern of the real image data, so that misleading guidance is avoided. Second, given a class latent code c, it must constrain the generated images that share that class latent code to have similar feature representations. To this end, the invention carries the prior-knowledge guidance loss designed for the encoder's pre-training stage over into formal training, which keeps the encoder optimized in a direction consistent with the representation characteristics of the real images throughout subsequent training, and uses the same cross-entropy loss function as the generator to meet the attribute control constraint:
Figure FDA0003998356350000041
Figure FDA0003998356350000042
The terms in the above formulas are defined as before. The loss expression of the encoder in the formal training stage is:
Figure FDA0003998356350000043
step 6: training the overall neural network;
first, for the three neural networks constructed in step 3, the encoder is pre-trained with the structured-network prior-knowledge guidance loss proposed in step 4, which gives the encoder well-structured initial network parameters; after the pre-training stage is completed, each network is trained with its corresponding loss function designed in step 5, using the Adam optimizer; the encoder and generator network parameters are fixed when the discriminator network parameters are updated, and the discriminator network parameters are fixed when the encoder and generator networks are updated;
step 7: after the model has been trained in step 6, the model parameters are saved and the generator is taken out; an input latent code vector is constructed for the generator according to the method in step 5 and fed into the generator to obtain a generated image, with different latent code combinations producing different generated images.
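The two-stage procedure can be summarized as an update schedule. The sketch assumes one discriminator update followed by one encoder and generator update per iteration, and that the generator and discriminator are left untouched during pre-training; neither the ratio nor the order is stated in the text, and the letters E, G and D simply label the three networks.

```python
def training_schedule(pretrain_steps, train_steps):
    """Yield (phase, updated networks, frozen networks) for each update."""
    for _ in range(pretrain_steps):
        # Pre-training: only the encoder is optimized, with the step-4 loss.
        yield ("pretrain", ("E",), ("G", "D"))
    for _ in range(train_steps):
        # Discriminator update: encoder and generator parameters are fixed.
        yield ("train", ("D",), ("E", "G"))
        # Encoder and generator update: discriminator parameters are fixed.
        yield ("train", ("E", "G"), ("D",))
```

In a real training loop each yielded tuple would select which optimizer steps and which parameters have gradient computation disabled for that update.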
CN202211607479.6A 2022-12-14 2022-12-14 Fair controllable image generation method based on structured network priori knowledge Pending CN116109719A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211607479.6A CN116109719A (en) 2022-12-14 2022-12-14 Fair controllable image generation method based on structured network priori knowledge

Publications (1)

Publication Number Publication Date
CN116109719A true CN116109719A (en) 2023-05-12

Family

ID=86260669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211607479.6A Pending CN116109719A (en) 2022-12-14 2022-12-14 Fair controllable image generation method based on structured network priori knowledge

Country Status (1)

Country Link
CN (1) CN116109719A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116778208A (en) * 2023-08-24 2023-09-19 吉林大学 Image clustering method based on depth network model
CN116993770A (en) * 2023-08-16 2023-11-03 哈尔滨工业大学 Image segmentation method based on residual error diffusion model
CN117670689A (en) * 2024-01-31 2024-03-08 四川辰宇微视科技有限公司 Method for improving image quality of ultraviolet image intensifier through AI algorithm control

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116993770A (en) * 2023-08-16 2023-11-03 哈尔滨工业大学 Image segmentation method based on residual error diffusion model
CN116993770B (en) * 2023-08-16 2024-05-28 哈尔滨工业大学 Image segmentation method based on residual error diffusion model
CN116778208A (en) * 2023-08-24 2023-09-19 吉林大学 Image clustering method based on depth network model
CN116778208B (en) * 2023-08-24 2023-11-10 吉林大学 Image clustering method based on depth network model
CN117670689A (en) * 2024-01-31 2024-03-08 四川辰宇微视科技有限公司 Method for improving image quality of ultraviolet image intensifier through AI algorithm control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination