WO2021254499A1 - Editing model generation method and apparatus, face image editing method and apparatus, device, and medium - Google Patents


Info

Publication number
WO2021254499A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
generator
training
discriminator
editing
Application number
PCT/CN2021/101007
Other languages
French (fr)
Chinese (zh)
Inventor
吴臻志
祝夭龙
Original Assignee
北京灵汐科技有限公司
Application filed by 北京灵汐科技有限公司
Publication of WO2021254499A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 - 2D [Two Dimensional] image generation
    • G06T11/001 - Texturing; Colouring; Generation of texture or colour
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning

Definitions

  • the embodiment of the present invention relates to the field of artificial intelligence, in particular to the editing of face images and the generation of corresponding editing models.
  • a Generative Adversarial Network (GAN) including a generator and a discriminator can be used to generate realistic face images.
  • the generator is used to generate the face image
  • the discriminator is used to determine whether the generated face image is real or fake.
  • training the generative adversarial network actually means training the generator and the discriminator within it.
  • compared with the generator, the discriminator usually finishes training faster. Soon after training begins, the discriminator can be trained to judge the authenticity of generated images relatively accurately, but at that point the generator has not yet learned how to generate realistic images, so none of the images it generates can pass the discriminator's judgment, causing the training of the entire generative adversarial network to fail.
  • when training fails, the authenticity of images generated by the trained generative adversarial network cannot be guaranteed, nor can the editing effect of an image editing model built on that network.
  • the embodiments of the present invention provide an editing model generation method, a face image editing method, and corresponding apparatuses, devices, and media, which can improve the training consistency of the generator and the discriminator and improve the authenticity of generated images.
  • an embodiment of the present invention provides an editing model generation method, including: performing iterative training on a generative adversarial network, the network including a generator and a discriminator; in the iterative training, updating the generative adversarial network according to gradient update configuration information of the discriminator, wherein the gradient update configuration information is determined by a Lipschitz constraint condition; and, when it is determined that the generative adversarial network satisfies a training end condition, generating an image editing model from the generator in the trained generative adversarial network.
  • an embodiment of the present invention provides a face image editing method, including: acquiring a face image to be edited; and inputting the face image to be edited into an image editing model to obtain the edited face image output by the image editing model.
  • the image editing model is generated by the above-mentioned editing model generation method.
  • an embodiment of the present invention also provides an editing model generation device, including: a network training module for iteratively training a generative adversarial network, the network including a generator and a discriminator; a network update module for updating the generative adversarial network according to gradient update configuration information of the discriminator during the iterative training, the gradient update configuration information being determined by a Lipschitz constraint condition; and a model generation module for generating an image editing model from the generator in the trained generative adversarial network when it is determined that the network satisfies the training end condition.
  • an embodiment of the present invention also provides a face image editing device, including: a face image acquisition module for acquiring a face image to be edited; and a face image editing module for inputting the face image to be edited into an image editing model to obtain the edited face image output by the image editing model, wherein the image editing model is generated by the above-mentioned editing model generation method.
  • an embodiment of the present invention also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein, when the processor executes the program, the editing model generation method or the face image editing method according to any embodiment of the present invention is implemented.
  • an embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the editing model generation method or the face image editing method according to any embodiment of the present invention is implemented.
  • when training a generative adversarial network including a generator and a discriminator, the embodiments of the present invention limit the learning rate of the discriminator's parameter items according to a Lipschitz constraint condition, so as to slow down the training of the discriminator and improve the training consistency of the discriminator and the generator. This ensures the accuracy of the discriminator in identifying real and fake images while enabling the generator to quickly learn how to generate realistic images, thereby improving the authenticity of the editing effect of the image editing model built from the generator.
  • FIG. 1A is a flowchart of an editing model generation method in Embodiment 1 of the present invention.
  • FIG. 1B is a schematic diagram of an application scenario of training a generative adversarial network in Embodiment 1 of the present invention.
  • FIG. 2 is a flowchart of an editing model generation method in the second embodiment of the present invention.
  • FIG. 3A is a flowchart of an editing model generation method in Embodiment 3 of the present invention.
  • FIG. 3B is a schematic diagram of an application scenario of self-supervised training of a convolutional neural network in Embodiment 3 of the present invention.
  • FIG. 4A is a flowchart of a method for editing a face image in the fourth embodiment of the present invention.
  • FIG. 4B is a schematic diagram of a face image editing example in the fourth embodiment of the present invention.
  • Figure 5 is a schematic structural diagram of an editing model generating device in the fifth embodiment of the present invention.
  • Fig. 6 is a schematic structural diagram of a face image editing device in the sixth embodiment of the present invention.
  • Fig. 7 is a schematic structural diagram of a computer device in the seventh embodiment of the present invention.
  • FIG. 1A is a flowchart of a method for generating an editing model in Embodiment 1 of the present invention.
  • This embodiment may be suitable for training a generative adversarial network and generating an image editing model from the generator in the trained network.
  • the method can be executed by the editing model generation device provided in the embodiment of the present invention, and the device can be implemented in software and/or hardware, and generally can be integrated in computer equipment. As shown in FIG. 1A, the method of this embodiment specifically includes:
  • S110: Perform iterative training on a generative adversarial network (GAN), which includes a generator and a discriminator.
  • the generator to be trained and the discriminator to be trained constitute a GAN.
  • the generator and the discriminator are actually trained at the same time.
  • samples are used to train the generative confrontation network.
  • training the generative adversarial network includes: inputting samples of real images or noise images into the generative adversarial network, and training it iteratively.
  • the samples include noisy images and real images.
  • the noise image may be a random noise image
  • the real image may include images with real attributes such as real people, real animals, or real scenes.
  • the real image may include a real face image, for example, a face photo.
  • multiple samples can be formed into a sample group, and multiple rounds of iterative training can be performed on the generative adversarial network. Each round of training can use a set number of samples, and the set number can be selected according to the actual situation, for example eight; this is not limited in the embodiments of the present invention.
  • a set number of samples can be determined as a sample group, and in one round of training, a sample group is used to train the generative adversarial network.
  • the generative adversarial network includes a generator 101 and a discriminator 102.
  • random noise images or real images can be input into the generator 101 as samples, and the generated images output by the generator 101 can be obtained.
  • the generated image output by the generator 101 and the corresponding random noise image or real image sample can be input into the discriminator 102 to obtain the discrimination result output by the discriminator 102; updating the parameter items of the generator 101 and the discriminator 102 based on the discrimination result constitutes the training process. A minimal sketch of one such training step is given below.
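  • as a rough illustration of this training loop (a minimal sketch, not the patented procedure itself), the following assumes PyTorch, simple fully connected stand-ins for the generator 101 and discriminator 102, and illustrative hyperparameters:

```python
# Minimal GAN training step (illustrative sketch; architectures, optimizers,
# and hyperparameters are assumptions, not the patent's prescription).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_images):                       # real_images: (batch, 784)
    batch = real_images.size(0)
    noise = torch.randn(batch, 64)                 # random noise samples

    # Discriminator: judge real samples as real (1) and generated ones as fake (0).
    loss_d = bce(D(real_images), torch.ones(batch, 1)) + \
             bce(D(G(noise).detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: try to make the discriminator judge generated images as real.
    loss_g = bce(D(G(noise)), torch.ones(batch, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```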
  • the generator 101 can be used to edit any image and output the generated image
  • the discriminator 102 is used to determine whether the generated image output by the generator 101 meets the real conditions (or rules). It should be noted that the discriminator 102 is not used to determine whether the generated image is accurate, that is, whether the original image has been edited into the required image effect, but to determine the degree of authenticity of the generated image, that is, whether it is real or fake.
  • the generated image is a face image
  • the authenticity of the generated image can be determined according to the positional relationship between the nose and the mouth, and the corresponding real conditions may include that the nose is located directly above the mouth.
  • if the nose is not located above the mouth in the generated image output by the generator 101, the generated image is determined to be fake; if the nose is located above the mouth, the generated image is determined to be real.
  • the real condition can be used by the discriminator 102 to determine whether the generated image output by the generator is real.
  • the discriminator 102 can learn real features to determine whether the image is true or false.
  • the gradient update configuration information is used to determine the learning rate at which the parameter items learn from each sample, where the learning rate measures the rate of change of a parameter item. Updating the generative adversarial network actually means updating the parameter items of the generator and/or the parameter items of the discriminator. In this step, the learning rate of each parameter item can be determined according to the gradient update configuration information of the discriminator, so as to update the generative adversarial network based on that learning rate.
  • the target learning rate of each parameter item may be determined according to the gradient update configuration information, where the target learning rate indicates the fastest learning rate achievable for each parameter item, or defines an upper bound on a suitable learning rate. After determining the target learning rate, the generative adversarial network can be updated according to the current learning rate and the target learning rate.
  • the value of a parameter item when entering the current round of training can be taken as the pre-update value, and the value of the parameter item determined in the current round of training as the proposed update value; the current learning rate of the parameter item is then calculated from the proposed update value and the pre-update value, and it is judged whether the current learning rate matches the target learning rate.
  • if the current learning rate matches the target learning rate, the proposed update value is used to update the parameter item, that is, the proposed update value becomes the final value of the parameter item in the current round of training and the pre-update value in the next round. If the current learning rate does not match the target learning rate, the target value of the parameter item is determined according to the target learning rate, and the target value is used to update the parameter item, that is, the target value becomes the final value of the parameter item in the current round of training and the pre-update value in the next round.
  • the target learning rate can be determined according to Lipschitz constraints.
  • if a function f(x) satisfies the Lipschitz condition, then f(x) is uniformly continuous.
  • the Lipschitz constraint limits the rate of change of the function f(x), that is, it requires that the variation of f(x) not exceed a certain constant: the slope must be bounded by the Lipschitz constant L. The learning rate can be determined based on the Lipschitz constant L, as expressed below.
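  • stated formally, the Lipschitz condition referred to here is the standard one:

$$|f(x_1) - f(x_2)| \le L \, |x_1 - x_2| \quad \text{for all } x_1, x_2$$

  • bounding the constant L bounds how quickly f can change, and it is this bound that is carried over to cap the learning rate of the discriminator's parameter items.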
  • in this way, the learning rate of each parameter item in the discriminator can be reduced, so as to ensure the accuracy of the discriminator in identifying real and fake images while allowing the generator to quickly learn how to generate realistic images, so that the generator can be effectively applied to an image editing model for real images.
  • updating the generative adversarial network according to the gradient update configuration information of the discriminator includes: determining, according to the gradient update configuration information of the discriminator, the maximum parameter learning rate threshold corresponding to each of one or more feature extraction layers included in the discriminator; and, for each feature extraction layer in the discriminator, updating the parameter items of that feature extraction layer according to its maximum parameter learning rate threshold, so that the update rate of the parameter items associated with the feature extraction layer matches the maximum parameter learning rate threshold corresponding to that layer.
  • the maximum parameter learning rate threshold is used to determine the maximum learning rate of the parameter item.
  • the parameter items refer to the parameter items of the generative adversarial network; specifically, they can refer to one or more parameter items corresponding to one or more feature extraction layers in the discriminator.
  • the feature extraction layer is used to extract feature information from the input and output it.
  • the discriminator can be a learning model of any depth, and is usually a structure including multiple feature extraction layers.
  • the learning rate of the parameter item to be updated relative to the value before the update needs to be less than or equal to the maximum learning rate determined according to the maximum threshold of the parameter learning rate.
  • the maximum threshold of the parameter learning rate can be configured for some or all of the parameter items; or, the parameter item that requires the configuration of the maximum threshold of the parameter learning rate can be customized according to the actual situation.
  • the embodiment of the present invention does not make a specific limitation.
  • updating the parameter items of a feature extraction layer in the discriminator may specifically be: determining, according to the gradient update configuration information, the target learning rate of each of the one or more parameter items associated with the feature extraction layer; for each parameter item, obtaining its proposed update value; calculating the learning rate of the parameter item from the proposed update value and the pre-update value; and comparing that learning rate with the target learning rate of the parameter item.
  • if the learning rate is less than or equal to the target learning rate, it is determined that the learning rate matches the target learning rate, and the parameter item is updated with the proposed update value; if the learning rate is greater than the target learning rate, it is determined that the learning rate does not match the target learning rate, the target value of the parameter item is calculated according to the target learning rate, and the parameter item is updated with the target value, as illustrated in the sketch below.
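  • a minimal sketch of this clamping step is given below, assuming PyTorch tensors; measuring a parameter item's current learning rate by the largest element-wise change is an interpretive assumption, since the embodiment does not fix the exact metric:

```python
import torch

def clamped_update(theta_before: torch.Tensor,
                   theta_proposed: torch.Tensor,
                   target_rate: float) -> torch.Tensor:
    """Apply a proposed parameter update, capping its rate of change.

    If the change already satisfies the layer's maximum parameter learning
    rate threshold, the proposed value is kept; otherwise the step is scaled
    down so its magnitude matches the target rate (one assumed realization of
    "determine the target value according to the target learning rate").
    """
    delta = theta_proposed - theta_before
    rate = delta.abs().max().item()     # current "learning rate" of this item
    if rate <= target_rate:             # matches the target: accept as-is
        return theta_proposed
    return theta_before + delta * (target_rate / rate)  # capped target value
```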
  • the target value of the parameter item can be calculated based on the following formula:

$$\theta_1 = \theta_0 - \alpha \, \frac{\partial J(\theta_0, \theta_1)}{\partial \theta_0}$$

  • where α is the learning rate, J(θ0, θ1) is the fitting function, θ0 is the pre-update value of the parameter item, and θ1 is the target value of the parameter item.
  • the value of α can be determined according to the value of the constant L in the aforementioned Lipschitz constraint condition; for example, it can be equal to the aforementioned target learning rate.
  • in this way, the maximum learning rate of each parameter item can be limited and the learning rate of each parameter item of the discriminator slowed down, which can effectively improve the learning consistency of the discriminator and the generator in the generative adversarial network. This enables the generator to quickly learn how to generate realistic images while the accuracy of the discriminator in identifying real and fake images is ensured, so that the generator can be effectively applied to the construction of an image editing model for real images.
  • the training end condition is used to judge whether the training of the generative adversarial network is completed.
  • the loss function will converge to a set value
  • the training end condition can be configured to be that the calculated value of the loss function is less than the set value, or that the update rate of the loss function is less than the set threshold, etc.
  • the generator in it can generate real images relatively accurately.
  • the image editing model can be obtained by adjusting the generator.
  • the real image can be edited using the image editing model, and the edited image output correspondingly is the real image.
  • the editing mode of the image editing model may include changes in attributes such as the position, size, brightness, and color of pixels in the image.
  • the editing method of the image editing model does not change the true nature of the image, and usually the image obtained after editing the real image is still the real image.
  • the editing method includes editing at least one of the skin color, age, gender, and organ region of the human face. For example, edit the skin color of the face from yellow to white; edit the age feature of the face from 50 to 10; edit the gender feature of the face from male to female; edit the single eyelid of the face to double eyelid, etc.
  • the generator includes an encoder and a decoder.
  • the generator usually contains multiple intermediate layers, and the intermediate results corresponding to these intermediate layers can affect the final output result of the generator, that is, the final image editing effect.
  • the output results of one or more specific intermediate layers can be obtained from the generator as a hidden space (Latent Space); the hidden space can be adjusted and fed into the cascade structure behind it in the generator to achieve the image editing effect. That is, the image editing model can be generated by adjusting the parameters of the hidden space of the generator.
  • the gender characteristics of the face image can be adjusted by editing the hidden space.
  • a female face is input, and a male face is output.
  • the hidden space can be selected according to the specific structure of the generator.
  • the generator includes an encoder and a decoder, and the hidden space is a neural network layer in the decoder. Editing the hidden space may be: obtaining the parameter items of a pre-trained image editing model and using them to update the parameter items of the generator's hidden space, as in the sketch below.
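  • as an illustration, the parameter transfer into the hidden-space layer might be sketched as follows; the layer shapes and module names are assumptions made only for this example:

```python
import torch.nn as nn

# Stand-ins: in practice these come from the trained generator's decoder and
# from a pre-trained image editing model (names and shapes are assumptions).
latent_layer = nn.Linear(512, 512)           # hidden-space layer in the decoder
pretrained_edit_layer = nn.Linear(512, 512)  # layer holding learned edit params

# Editing the hidden space: overwrite the layer's parameter items with the
# pre-trained editing parameters.
latent_layer.load_state_dict(pretrained_edit_layer.state_dict())
```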
  • image editing samples can be used to continue training the generator to generate the image editing model; a sketch of this fine-tuning step is given after this passage.
  • the image editing sample includes the real image before editing and the real image after editing.
  • the image editing sample may include a face image before editing and a face image after editing.
  • the correlation between the face image after editing and the face image before editing can be selected according to the actual situation.
  • the correlation may include gender, age, skin color, etc., which is not limited in the embodiment of the present invention.
  • a pre-trained standard encoder can be used to replace the encoder in the generator to extract effective features from the input image.
  • the standard encoder is used to learn how to extract features that can characterize the input image from the input image.
  • the input size of the decoder in the generator matches the output size of the standard encoder, where the size can be the dimension of the vector.
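  • a sketch of this continued training on paired image editing samples is given below; the L1 reconstruction loss between the generator output and the edited ground-truth image is an assumption made for illustration, since the embodiment does not specify the fine-tuning objective:

```python
import torch.nn as nn

l1 = nn.L1Loss()

def finetune_step(generator, optimizer, img_before, img_after):
    """One fine-tuning step on an image editing sample.

    img_before: the real image before editing; img_after: the real image
    after editing. The L1 objective is an illustrative assumption.
    """
    edited = generator(img_before)   # generator proposes the edited image
    loss = l1(edited, img_after)     # match the ground-truth edited image
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```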
  • in the embodiment of the present invention, the generative adversarial network is trained by inputting sample groups including noise images and/or real images into a generative adversarial network that includes a discriminator and a generator, and the learning rates of the discriminator's parameter items are limited according to a Lipschitz constraint. Slowing down the learning rate of each parameter item of the discriminator can effectively improve the consistency of the training rates of the discriminator and the generator in the generative adversarial network.
  • FIG. 2 is a flowchart of a method for generating an editing model in Embodiment 2 of the present invention. This embodiment is embodied on the basis of Embodiment 1 described above.
  • the method of this embodiment specifically includes:
  • S210: Perform iterative training on the generative adversarial network, which includes a generator and a discriminator.
  • the loss function configuration information can be used to add the Euclidean distance norm to the initial loss function, and the elements included in the Euclidean distance norm are the parameter items of the encoder in the generator.
  • the training process of the generative adversarial network is actually the process of solving the algorithm that maps the input to the output, and solving the algorithm actually means determining the numerical value of each parameter item in the algorithm.
  • the algorithm has an objective function, and the solution process of the algorithm is an optimization process of the objective function.
  • the loss function can be used as the objective function.
  • the loss function is used to express the degree to which the predicted value of the generative adversarial network differs from the true value. For example, the smaller the value of the loss function, the better the performance of the corresponding generative adversarial network.
  • different models use different loss functions.
  • the loss function is used as the training target of the generative adversarial network.
  • the loss function can be of the following form:

$$\min_G \max_D V(D, G) = \mathbb{E}_{m \sim p_{\text{data}}(m)}[\log D(m)] + \mathbb{E}_{n \sim p_n(n)}[\log(1 - D(G(n)))]$$

  • the discriminator D is trained with maximizing log D(m) as the training goal, so as to continuously improve the accuracy with which it judges whether the generated image output by the generator is real, while the generator G is trained with minimizing log(1 − D(G(n))) as the training goal, so as to continuously reduce the difference between the generated image output by the generator and the real image.
  • in this way, the effect of adversarial training between the discriminator and the generator can be achieved.
  • the Euclidean distance norm can be added as a constraint condition on the basis of the initial loss function. Since the Euclidean distance norm can be decomposed into a combination of two low-dimensional parameter matrices, adding the Euclidean distance norm as a constraint condition can effectively reduce the dimension of the parameter matrix and the sample requirement.
  • the training of the generative adversarial network may also suffer from the problem of over-fitting.
  • that is, the trained generative adversarial network has a good generation effect and discrimination accuracy only for certain types of real images, while its generation effect and discrimination accuracy for unknown types of real images are poor.
  • therefore, the Euclidean distance norm can also be added as a constraint condition on the basis of the initial loss function, so that the distribution of the mapping to the hidden space is more even, thereby reducing the coupling of the feature vectors and correspondingly improving the generalization ability of the generative adversarial network.
  • the loss function configuration information is used to add the Euclidean distance norm on the basis of the initial loss function.
  • the Euclidean distance norm can also be called the regularization term, or the L2 norm, which refers to the square root of the sum of the squares of the elements.
  • adding the Euclidean distance norm is equivalent to adding a constraint to the initial loss function. In effect, it heavily penalizes large-value weight vectors in favor of more dispersed weight vectors, so as to achieve a more uniform weight distribution and avoid the weights being concentrated in a small number of vectors, making the generative adversarial network closer to a low-dimensional model. The lower the dimensionality, the smaller the amount of data needed for training. Therefore, adding the Euclidean distance norm to the initial loss function as a constraint condition can reduce the amount of data used in training the generative adversarial network, thereby reducing the complexity of its training.
  • the updated loss function can be of the following form:

$$\min_G \max_D V(D, G) = \mathbb{E}_{m \sim p_{\text{data}}(m)}[\log D(m)] + \mathbb{E}_{n \sim p_n(n)}[\log(1 - D(G(n)))] + \lambda \, \lVert \theta_g \rVert_F^2$$

  • where θg represents the parameter item matrix of the hidden space of the generator G (specifically, it can be the parameter item matrix of the hidden space of the encoder in the generator G); λ is the penalty coefficient, used to adjust the training complexity of the generative adversarial network, and can be set according to the actual situation; ‖·‖F denotes the norm operation; and ‖θg‖²F represents the Euclidean distance norm of the hidden-space parameter matrix θg. A code sketch of adding this term follows.
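  • in code, adding this regularization term might look like the following sketch (assuming PyTorch, with `encoder` standing for the generator's encoder whose parameter items form θg):

```python
import torch
import torch.nn as nn

def regularized_generator_loss(gen_loss: torch.Tensor,
                               encoder: nn.Module,
                               penalty: float) -> torch.Tensor:
    """Add the squared Frobenius (Euclidean distance) norm of the encoder's
    parameter items to the generator loss, weighted by the penalty coefficient."""
    reg = sum(p.pow(2).sum() for p in encoder.parameters())
    return gen_loss + penalty * reg
```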
  • the stability condition is used to judge whether the loss function tends to be stable or convergent.
  • the stable condition is used to determine whether the change rate of the loss function in adjacent training rounds is less than a set change rate threshold, and the size of the change rate threshold can be limited according to actual conditions. It can be understood that the value of the loss function changes very little with the number of training rounds, which indicates that the loss function is stable.
  • the rate of change of the loss function may be: the ratio of the difference between the current value of the loss function calculated in the current round of training and the historical value calculated in the previous round, to the current value of the loss function.
  • alternatively, the stability condition may be to determine whether the number of training rounds exceeds a set round-number threshold; if the generative adversarial network has been trained for enough rounds, it can be determined that its training is completed. A sketch combining both checks is given below.
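  • a sketch of such a stability check, combining the relative-change criterion and the round-count criterion (both threshold values are illustrative):

```python
def training_finished(loss_history, eps=1e-4, max_rounds=100_000):
    """Stability condition sketch: stop when the loss's relative change
    between adjacent rounds falls below eps, or when the number of
    training rounds exceeds a set threshold."""
    if len(loss_history) >= max_rounds:
        return True
    if len(loss_history) < 2:
        return False
    current, previous = loss_history[-1], loss_history[-2]
    change_rate = abs(current - previous) / max(abs(current), 1e-12)
    return change_rate < eps
```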
  • in this way, the weight distribution of the vectors can be made more uniform and the weights prevented from being concentrated on a few vectors, which not only reduces the amount of data used in training the generative adversarial network and its computational complexity, but also improves the generalization ability of the trained network, that is, expands the range of real images to which the trained generative adversarial network is applicable, thereby ensuring the accuracy and editing effect when it is applied to images of unknown categories.
  • FIG. 3A is a flowchart of a method for generating an editing model in Embodiment 3 of the present invention. This embodiment is embodied on the basis of the above-mentioned embodiment.
  • the method of this embodiment specifically includes:
  • the generative adversarial network includes a generator and a discriminator.
  • the image feature detection model is obtained by training based on image feature samples.
  • the image feature sample may include two regional image blocks in the same image and relationship data between the two regional image blocks.
  • the image feature detection model may include two convolutional neural networks sharing weights, a feature vector splicer, and a fully connected network classifier.
  • the convolutional neural networks are used to extract the feature information of the regional image blocks and form feature vectors; the feature vector splicer is used to splice the feature vectors generated by the convolutional neural networks into a target feature vector; and the fully connected network classifier is used to classify the target feature vector and output the relationship data between the regional image blocks.
  • the image feature detection model is used to extract features from the image, and can be a pre-trained deep learning model. Specifically, the image feature detection model can learn to extract the features of image blocks in different regions and the relationship between image blocks in different regions in a self-supervised manner.
  • the regional image blocks are partial image regions in the same image, and the two regional image blocks do not overlap each other.
  • the number and specific settings of regional image blocks in the same image can be selected according to actual conditions.
  • the target object is detected in the image, and the target object is divided into nine equal parts (for example, in the form of nine square grids).
  • the embodiment of the present invention does not make a specific limitation.
  • the relationship data is used to describe the relationship between the two regional image blocks.
  • the relationship data may be at least one of the position relationship, size relationship, shape relationship, and color relationship of the regional image blocks.
  • the relationship data includes a position relationship.
  • the positional relationship in the relationship data may include upper left, upper middle, upper right, left, middle, right, bottom left, bottom middle, and bottom right.
  • the feature information of the regional image block is used to represent the regional image block in the form of data, for example, a feature vector.
  • the feature information represents regional image blocks from different dimensions, and the feature vector can be used to represent the corresponding dimensional information.
  • the convolutional neural network and the feature vector stitcher are used to map the original image data to the hidden space, and the fully connected network classifier is used to map the learned distributed feature representation to the sample label space, so that the classification of the sample can be determined based on the sample label.
  • the convolutional neural network can use the PixelShuffle method to upsample feature maps, which reduces the artifact effects caused by transposed convolution or ordinary linear interpolation upsampling and can thereby improve the authenticity of the generated image output by a generator built on this convolutional neural network structure, as in the sketch below.
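  • as a concrete sketch of PixelShuffle-based upsampling (the channel count and upscale factor are illustrative):

```python
import torch
import torch.nn as nn

r = 2  # upsampling factor
upsample = nn.Sequential(
    # Expand channels by r*r, then rearrange them into spatial resolution.
    nn.Conv2d(64, 64 * r * r, kernel_size=3, padding=1),
    nn.PixelShuffle(r),  # (N, 64*r*r, H, W) -> (N, 64, H*r, W*r)
)

x = torch.randn(1, 64, 16, 16)
print(upsample(x).shape)  # torch.Size([1, 64, 32, 32])
```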
  • the image feature detection model may include a first convolutional neural network 301 and a second convolutional neural network 302 that share weights, a feature vector splicer 303, and a fully connected network classifier 304.
  • the convolutional neural network used to construct the generator may be any one of the first convolutional neural network 301 and the second convolutional neural network 302.
  • the specific operation of the image feature detection model may include: dividing the face image into at least two regional image blocks (for example, a mouth region image block and a right-eye region image block); inputting the mouth region image block into the first convolutional neural network 301 for feature extraction to obtain the first feature vector output by the first convolutional neural network 301; inputting the right-eye region image block into the second convolutional neural network 302 for feature extraction to obtain the second feature vector output by the second convolutional neural network 302; inputting the first feature vector and the second feature vector into the feature vector splicer 303 for splicing to obtain the spliced feature vector output by the feature vector splicer 303; and inputting the spliced feature vector into the fully connected network classifier 304 for classification to obtain the relationship data between the mouth region image block and the right-eye region image block.
  • for example, the fully connected network classifier 304 can determine that the right-eye region image block is at the upper right of the mouth region image block. A sketch of this architecture is given below.
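  • the following sketch illustrates this architecture, with a single CNN applied to both patches to realize weight sharing, feature splicing, and a fully connected classifier over the nine relative-position classes; the layer widths and the 64x64 patch size are illustrative assumptions:

```python
import torch
import torch.nn as nn

class PatchRelationModel(nn.Module):
    """Two weight-sharing CNN branches + feature splicer + FC classifier.

    Predicts the relationship data (here, one of nine relative positions)
    between two regional image blocks taken from the same image.
    """
    def __init__(self, num_relations: int = 9):
        super().__init__()
        # Applying one CNN to both patches realizes weight sharing.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),    # -> (N, 64)
        )
        self.classifier = nn.Sequential(              # FC network classifier
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, num_relations),
        )

    def forward(self, patch_a, patch_b):
        feat = torch.cat([self.cnn(patch_a), self.cnn(patch_b)], dim=1)  # splice
        return self.classifier(feat)                  # relationship logits

model = PatchRelationModel()
logits = model(torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64))
print(logits.shape)  # torch.Size([4, 9])
```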
  • the image feature sample may include two face organ region image blocks in the same face image and the relationship data between the two face organ region image blocks.
  • the facial organ region image blocks may be image blocks divided according to facial organs, for example, a nose region image block and a mouth region image block.
  • the relationship data between the image blocks of the face organ region may indicate the relative positional relationship of the two face organ region image blocks in the face image. For example, if the nose area image block is in the middle and the mouth area image block is in the middle and bottom, the relational data may be that the nose area image block is located above the mouth area image block.
  • by using the facial organ region image blocks in the face image as the image feature samples, the feature information used to distinguish the facial organs can be accurately extracted from the face image and learned. In this way, the various organs of the face image to be edited can be accurately identified, so that the authenticity of face editing can be improved.
  • the decoder of the generator may include a convolutional neural network.
  • the convolutional neural network in the pre-trained image feature detection model can be used as the convolutional neural network in the decoder of the generator; or the parameter items of the convolutional neural network in the pre-trained image feature detection model can be migrated to the convolutional neural network in the decoder of the generator.
  • alternatively, the convolutional neural network in the pre-trained image feature detection model may be added to the existing feature extraction network of the decoder of the generator. In that case, the convolutional neural network and the other feature extraction networks can share weights, the output feature vector of the convolutional neural network and the output feature vectors of the other feature extraction layers can be spliced, and the spliced feature vector can be input into the module that originally received the feature extraction layers' output feature vectors, for example a fully connected network classifier. A sketch of the parameter migration is given below.
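  • one way this parameter migration might be sketched (the stand-in modules below assume the pre-trained CNN and the decoder's CNN have identical architectures):

```python
import torch.nn as nn

# Stand-ins: in practice these are the CNN from the pre-trained image feature
# detection model and the CNN inside the generator's decoder.
pretrained_cnn = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
decoder_cnn = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())

# Migrate the pre-trained parameter items into the decoder's CNN.
decoder_cnn.load_state_dict(pretrained_cnn.state_dict())
```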
  • S340 Generate an image editing model according to the updated generator.
  • because the updated generator uses a convolutional neural network trained by self-supervised learning, a small number of samples can complete the training of the convolutional neural network, which effectively reduces the generator's demand for training samples and improves the training speed.
  • in the embodiment of the present invention, the generator in the generative adversarial network is updated with a convolutional neural network trained in advance by a self-supervised learning method, and the image editing model is built on the updated generator. This allows features to be extracted effectively from the input image of the image editing model, reduces the demand for labeled samples, and reduces the training sample size of the image editing model, thereby increasing the generation speed of the image editing model and reducing its labeling labor cost.
  • Fig. 4A is a flowchart of a method for editing a face image in the fourth embodiment of the present invention.
  • This embodiment is applicable to a situation where an image editing model is used to edit a face image.
  • the method may be executed by the face image editing apparatus provided by the embodiment of the present invention, and the apparatus may be implemented in software and/or hardware, and may generally be integrated into computer equipment.
  • the method of this embodiment specifically includes:
  • the human face image is a real image including the human face. For example, photos taken by users themselves. It should be noted that the face images of cartoon characters are not real images.
  • S420 Input the face image to be edited into the image editing model to obtain the edited face image output by the image editing model.
  • the image editing model is generated by the editing model generation method in any of the foregoing embodiments of the present invention.
  • the generator in the image editing model, or the decoder in that generator, is derived from the generative adversarial network obtained by the editing model generation method of any of the foregoing embodiments of the present invention; a minimal inference sketch is given below.
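  • at inference time, the editing step itself is straightforward; a minimal sketch follows (the preprocessing pipeline and tensor shape are assumptions):

```python
import torch

@torch.no_grad()
def edit_face(image_editing_model, face_image: torch.Tensor) -> torch.Tensor:
    """Run the trained image editing model on a face image to be edited.

    face_image: a (1, 3, H, W) tensor holding the real face photo, already
    preprocessed into the model's expected input format.
    """
    image_editing_model.eval()
    return image_editing_model(face_image)  # the edited face image
```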
  • the generative adversarial network includes a generator and a discriminator, and a Lipschitz constraint condition is used to determine the gradient update configuration information of the discriminator so as to slow down the learning rate of each parameter item of the discriminator. In this way, the rate at which the discriminator learns how to distinguish real images and the rate at which the generator learns how to generate real images are kept as consistent as possible, ensuring both the accuracy of the generative adversarial network in distinguishing real images and the authenticity of the generated images.
  • the first image on the left is a standard processed image commonly used in textbooks, and this image can be used as a real face image.
  • the second image in the middle is a video frame in a dynamic video.
  • the third image on the right is the edited image formed by editing the first image to simulate the mouth opening action of the intermediate video frame.
  • in the training of the generative adversarial network, the embodiment of the present invention uses a Lipschitz constraint condition to constrain the gradient update configuration information of the discriminator, which can effectively slow down the learning rate of each parameter item of the discriminator and improve the training consistency of the discriminator and the generator, guaranteeing the authenticity of the generated image output by the generator while ensuring the discrimination accuracy of the finally trained discriminator. In this way, when the editing model built from the generator of the finally trained generative adversarial network is used to obtain an edited image of a real face image, the authenticity of the image editing effect can be effectively ensured, further improving the user experience.
  • Fig. 5 is a schematic diagram of an editing model generating device in the fifth embodiment of the present invention.
  • the fifth embodiment is a corresponding device that implements the editing model generation method provided in the foregoing embodiment of the present invention.
  • the device can be implemented in software and/or hardware, and generally can be integrated with computer equipment.
  • the apparatus of this embodiment may include:
  • the network training module 510 is used for iteratively training a generative adversarial network, the generative adversarial network including a generator and a discriminator;
  • the network update module 520 is configured to update the generative adversarial network according to the gradient update configuration information of the discriminator in the iterative training, the gradient update configuration information being determined by a Lipschitz constraint condition;
  • the model generation module 530 is configured to generate an image editing model from the generator in the trained generative adversarial network when it is determined that the network meets the training end condition.
  • the embodiment of the present invention inputs real images and/or noise images as samples into a generative adversarial network including a generator and a discriminator to train it iteratively, and limits the learning rates of the discriminator's parameter items according to a Lipschitz constraint in order to improve the learning consistency of the discriminator and the generator. This ensures the accuracy of the discriminator in identifying real and fake images while ensuring the authenticity of the generated image output by the generator, so that the generator can be effectively applied to the construction of an image editing model for real images, improving the authenticity of the effect of editing face images with such a model.
  • the model generation module 530 includes a loss function calculation unit configured to: add the Euclidean distance norm as a constraint condition on the basis of the initial loss function according to the loss function configuration information, to obtain the loss function of the generative adversarial network, where the elements included in the Euclidean distance norm are the parameter items of the encoder in the generator; and, when it is determined that the loss function meets the convergence condition, determine that the generative adversarial network meets the training end condition and generate the image editing model from the generator in the trained network.
  • the network update module 520 includes a discriminator parameter item update unit configured to: determine, according to the gradient update configuration information of the discriminator, the maximum parameter learning rate threshold corresponding to each of the one or more feature extraction layers included in the discriminator; and, for each feature extraction layer in the discriminator, update the parameter items of that feature extraction layer according to its maximum parameter learning rate threshold, so that the update rate of the parameter items associated with the feature extraction layer matches the maximum parameter learning rate threshold corresponding to that layer.
  • the model generation module 530 includes a self-supervised generation unit configured to: update the generator in the trained generative adversarial network, specifically the decoder in the generator, based on the convolutional neural network of the pre-trained image feature detection model; and generate the image editing model from the updated generator.
  • the image feature detection model is obtained by training based on image feature samples, and the image feature samples include two regional image blocks in the same image and the relationship data between the two regional image blocks.
  • the image feature detection model can include two convolutional neural networks that share weights, a feature vector splicer, and a fully connected network classifier.
  • the convolutional neural network is used to extract the feature information of the regional image block and form a feature vector; the feature vector splicer is used to synthesize the feature vector generated by each convolutional neural network into the target feature vector; the fully connected network classifier is used to perform the target feature vector Categorize and output the relationship data between image blocks in each area.
  • the image feature sample includes two face organ region image blocks in the same face image and the relationship data between the two face organ region image blocks.
  • the network training module 510 includes a training unit for inputting samples including real images and/or noise images into the generative adversarial network and performing a round of training on it.
  • the above-mentioned editing model generating device can execute the editing model generating method provided by any one of the embodiments of the present invention, and achieve the same beneficial effects.
  • Fig. 6 is a schematic diagram of a face image editing device in the sixth embodiment of the present invention.
  • the sixth embodiment is a corresponding device that implements the face image editing method provided in the foregoing embodiment of the present invention.
  • the device can be implemented in software and/or hardware, and generally can be integrated with computer equipment.
  • the apparatus of this embodiment may include:
  • the face image acquisition module 610 is used to acquire the face image to be edited
  • the face image editing module 620 is configured to input the face image to be edited into the image editing model to obtain the edited face image output by the image editing model; wherein, the image editing model is adopted as in any of the foregoing embodiments of the present invention Generated by editing model generation method.
  • in the embodiment of the present invention, the gradient update configuration information of the discriminator in the generative adversarial network is determined according to a Lipschitz constraint condition, and the learning rates of the discriminator's parameter items are restricted based on that configuration information. This can improve the training consistency of the discriminator and the generator in the generative adversarial network and ensure the authenticity of the image editing effect of the editing model built from the generator of the finally trained network, effectively improving the user experience.
  • the aforementioned facial image editing device can execute the facial image editing method provided by any one of the embodiments of the present invention, and achieve the same beneficial effects.
  • FIG. 7 is a schematic structural diagram of a computer device according to Embodiment 7 of the present invention.
  • Figure 7 shows a block diagram of an exemplary computer device 12 suitable for implementing embodiments of the present invention.
  • the computer device 12 shown in FIG. 7 is only an example, and should not bring any limitation to the function and application scope of the embodiment of the present invention.
  • the computer device 12 is represented in the form of a general-purpose computing device.
  • the components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting different system components (including the system memory 28 and the processing unit 16).
  • the bus 18 represents one or more of several types of bus structures, including a memory bus, a peripheral bus, or a local bus using any of a variety of bus structures.
  • these bus structures include but are not limited to the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • the computer device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by the computer device 12, including volatile and nonvolatile media, removable and non-removable media.
  • the system memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32.
  • the computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media.
  • the storage system 34 may be used to read and write non-removable, non-volatile magnetic media (not shown in FIG. 7 and generally referred to as a "hard drive").
  • a disk drive for reading and writing a removable non-volatile disk (such as a "floppy disk") and an optical disk drive for reading and writing a removable non-volatile optical disk (such as a Compact Disc Read-Only Memory (CD-ROM)) can also be provided.
  • each drive can be connected to the bus 18 through one or more data media interfaces.
  • the system memory 28 may store at least one program product, the program product having a set (for example, at least one) program modules, and these program modules are configured to perform the functions of the various embodiments of the present invention.
  • the program/utility tool 40 having a set of (at least one) program module 42 may be stored in the system memory 28, for example.
  • the program module 42 includes, but is not limited to, an operating system, one or more application programs, other program modules, and program data. Each of these examples or some combination may include the implementation of a network environment.
  • the program module 42 generally executes the functions and/or methods in the described embodiments of the present invention.
  • the computer device 12 can also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable users to interact with the computer device 12, and/or with any device (such as a network card, a modem, etc.) that enables the computer device 12 to communicate with one or more other computing devices. Such communication can be performed through an input/output (I/O) interface 22.
  • the computer device 12 may also communicate with one or more networks (for example, a local area network (LAN) or a wide area network (WAN)) through the network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the computer device 12 through the bus 18. It should be understood that, although not shown in FIG. 7, other hardware and/or software modules may be used in conjunction with the computer device 12.
  • the processing unit 16 executes various functional applications and data processing by running the program module 42 stored in the system memory 28, for example, to implement an editing model generation method and/or a face image editing method provided by any embodiment of the present invention .
  • the eighth embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, it implements the editing model generation method provided in any embodiment of this application, or implements the face image editing method provided in any embodiment of this application.
  • the computer storage medium of the embodiment of the present invention may adopt any combination of one or more computer-readable media.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, a RAM, a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash memory, an optical fiber, a portable CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device.
  • the computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and computer-readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • the computer-readable medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, radio frequency, etc., or any suitable combination of the foregoing.
  • the computer program code used to perform the operations of the present invention can be written in one or more programming languages or a combination thereof.
  • the programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer may be connected to a user computer through any kind of network including a LAN or WAN, or may be connected to an external computer (for example, using an Internet service provider to connect through the Internet).

Abstract

An editing model generation method and apparatus, a face image editing method and apparatus, a device, and a medium. The editing model generation method comprises: iteratively training a generative adversarial network comprising a generator and a discriminator (S110); in the iterative training, updating the generative adversarial network according to gradient update configuration information of the discriminator, the gradient update configuration information being determined by means of a Lipschitz constraint (S120); and when it is determined that the generative adversarial network satisfies a training end condition, generating an image editing model according to the generator in the trained generative adversarial network (S130).

Description

Editing model generation method and apparatus, face image editing method and apparatus, device, and medium
Technical Field
The embodiments of the present invention relate to the field of artificial intelligence, and in particular to the editing of face images and the generation of corresponding editing models.
Background
In recent years, people have increasingly high requirements for the realism of synthesized images, and it is hoped that more realistic and natural images can be generated by algorithms. In particular, people often edit face images and expect the edited face image to still be a realistic face.
At present, a generative adversarial network (GAN) including a generator and a discriminator can be used to generate realistic face images. In the training process of the generative adversarial network, the generator is used to generate face images, and the discriminator is used to judge whether the generated face images are real or fake.
Training a generative adversarial network is in fact training the generator and the discriminator in the network. In the actual training process, the discriminator usually finishes training faster than the generator. Soon after training begins, the discriminator can already judge the authenticity of generated images relatively accurately, but at that point the generator has not yet learned how to generate realistic images, and none of the images it generates can pass the discriminator's judgment, causing the training of the entire generative adversarial network to fail. When training fails, the authenticity of images generated by the trained generative adversarial network cannot be guaranteed, nor can the editing effect of an image editing model obtained from the generative adversarial network.
Summary of the Invention
Embodiments of the present invention provide an editing model generation method, a face image editing method, and corresponding apparatuses, devices, and media, which can improve the training consistency between the generator and the discriminator and improve the authenticity of generated images.
In a first aspect, an embodiment of the present invention provides an editing model generation method, including: performing iterative training on a generative adversarial network, the generative adversarial network including a generator and a discriminator; in the iterative training, updating the generative adversarial network according to gradient update configuration information of the discriminator, wherein the gradient update configuration information is determined through a Lipschitz constraint; and when it is determined that the generative adversarial network satisfies a training end condition, generating an image editing model according to the generator in the trained generative adversarial network.
In a second aspect, an embodiment of the present invention provides a face image editing method, including: acquiring a face image to be edited; and inputting the face image to be edited into an image editing model to obtain an edited face image output by the image editing model, wherein the image editing model is generated by the editing model generation method described above.
In a third aspect, an embodiment of the present invention further provides an editing model generation apparatus, including: a network training module configured to perform iterative training on a generative adversarial network, the generative adversarial network including a generator and a discriminator; a network update module configured to, in the iterative training, update the generative adversarial network according to gradient update configuration information of the discriminator, the gradient update configuration information being determined through a Lipschitz constraint; and a model generation module configured to, when it is determined that the generative adversarial network satisfies a training end condition, generate an image editing model according to the generator in the trained generative adversarial network.
In a fourth aspect, an embodiment of the present invention further provides a face image editing apparatus, including: a face image acquisition module configured to acquire a face image to be edited; and a face image editing module configured to input the face image to be edited into an image editing model to obtain an edited face image output by the image editing model, wherein the image editing model is generated by the editing model generation method described above.
In a fifth aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the editing model generation method or the face image editing method according to any embodiment of the present invention.
In a sixth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the editing model generation method or the face image editing method according to any embodiment of the present invention.
In the embodiments of the present invention, when a generative adversarial network including a generator and a discriminator is trained, the learning rate of the discriminator's parameters is limited according to a Lipschitz constraint so as to slow down the training of the discriminator, which can improve the training consistency between the discriminator and the generator. In this way, the generator can quickly learn how to generate realistic images while the discriminator maintains its accuracy in distinguishing real images from fake ones, thereby improving the realism of the editing results of an image editing model built on the generator.
Brief Description of the Drawings
FIG. 1A is a flowchart of an editing model generation method in Embodiment 1 of the present invention;
FIG. 1B is a schematic diagram of an application scenario of training a generative adversarial network in Embodiment 1 of the present invention;
FIG. 2 is a flowchart of an editing model generation method in Embodiment 2 of the present invention;
FIG. 3A is a flowchart of an editing model generation method in Embodiment 3 of the present invention;
FIG. 3B is a schematic diagram of an application scenario of self-supervised training of a convolutional neural network in Embodiment 3 of the present invention;
FIG. 4A is a flowchart of a face image editing method in Embodiment 4 of the present invention;
FIG. 4B is a schematic diagram of face image editing in Embodiment 4 of the present invention;
FIG. 5 is a schematic structural diagram of an editing model generation apparatus in Embodiment 5 of the present invention;
FIG. 6 is a schematic structural diagram of a face image editing apparatus in Embodiment 6 of the present invention;
FIG. 7 is a schematic structural diagram of a computer device in Embodiment 7 of the present invention.
Detailed Description
The present invention will be further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure. In addition, the scope of disclosure of the present invention is not limited to technical solutions formed by the specific combinations of the technical features described above; it should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept.
Embodiment 1
FIG. 1A is a flowchart of an editing model generation method in Embodiment 1 of the present invention. This embodiment is applicable to training a generative adversarial network and generating an image editing model from the generator of the trained network. The method can be executed by the editing model generation apparatus provided in the embodiments of the present invention; the apparatus can be implemented in software and/or hardware, and can generally be integrated into computer equipment. As shown in FIG. 1A, the method of this embodiment specifically includes:
S110: Iteratively train a generative adversarial network (GAN), the GAN including a generator and a discriminator.
In this embodiment, the generator to be trained and the discriminator to be trained constitute the GAN. Training the GAN is in fact training the generator and the discriminator at the same time.
In this embodiment, samples are used to train the GAN.
Optionally, training the GAN includes: inputting samples that are real images or noise images into the GAN, and iteratively training the GAN.
The samples include noise images and real images. A noise image may be a random noise image, and a real image may be an image with real attribute features, such as an image of a real person, a real animal, or a real scene. Exemplarily, a real image may include a real face image, for example, a face photo. Exemplarily, multiple samples may form one sample group, and the GAN may be trained in multiple iterative rounds; each round may use a set number of samples, and the set number may be chosen according to the actual situation, for example, 8, which is not limited in the embodiments of the present invention. The set number of samples may be taken as one sample group, and in one training round, one sample group is used to train the GAN.
As shown in FIG. 1B, the GAN includes a generator 101 and a discriminator 102. During training, a random noise image or a real image can be input into the generator 101 as a sample, and the generated image output by the generator 101 is obtained. Then, the generated image output by the generator 101 and the corresponding sample (the random noise image or real image) can be input into the discriminator 102 to obtain the discrimination result output by the discriminator 102, and the parameters of the generator 101 and the discriminator 102 are updated based on the discrimination result.
In the embodiments of the present invention, the generator 101 can be used to edit an arbitrary image and output a generated image, and the discriminator 102 is used to judge whether the generated image output by the generator 101 satisfies realism conditions (or rules). It should be noted that the discriminator 102 does not judge whether the generated image is accurate, that is, whether the original image has been edited into the desired effect; it judges the degree of realism of the generated image, that is, whether it is real or fake. For example, when the generated image is a face image, its authenticity can be judged from the positional relationship between the nose and the mouth, and the corresponding realism condition may include that the nose is located directly above the mouth. Exemplarily, if the nose is below the mouth in the generated image output by the generator 101, the generated image is determined to be fake; if the nose is above the mouth, the generated image is determined to be real. The realism conditions can be used by the discriminator 102 to judge whether the generated image output by the generator is realistic. For example, the discriminator 102 can learn real features in order to judge whether an image is real or fake.
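To make the alternating update concrete, the following is a minimal sketch of one training round under stated assumptions: PyTorch-style `generator` and `discriminator` modules, a discriminator that outputs a realness probability, and pre-built optimizers. All names are illustrative, not the patent's own code.

```python
# A minimal sketch of one adversarial training round (module, optimizer,
# and label names are assumptions, not part of the original disclosure).
import torch
import torch.nn as nn

bce = nn.BCELoss()  # discriminator outputs a probability of being real

def train_round(generator, discriminator, g_opt, d_opt, sample):
    batch = sample.size(0)
    real_label = torch.ones(batch, 1)
    fake_label = torch.zeros(batch, 1)

    # Discriminator step: score the sample as real, the generated image as fake.
    d_opt.zero_grad()
    generated = generator(sample).detach()
    d_loss = bce(discriminator(sample), real_label) + \
             bce(discriminator(generated), fake_label)
    d_loss.backward()
    d_opt.step()

    # Generator step: try to make the generated image pass the discriminator.
    g_opt.zero_grad()
    g_loss = bce(discriminator(generator(sample)), real_label)
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```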
S120: In the iterative training, update the GAN according to gradient update configuration information of the discriminator, wherein the gradient update configuration information is determined through a Lipschitz constraint.
The gradient update configuration information is used to determine the learning rate of the parameters learned from each sample, where the learning rate measures the rate of change of a parameter. Updating the GAN in fact means updating the parameters of the generator and/or the parameters of the discriminator. In this step, the learning rate of each parameter can be determined according to the gradient update configuration information of the discriminator, and the GAN is then updated based on the learning rate.
In some optional embodiments, a target learning rate can be determined for each parameter according to the gradient update configuration information. The target learning rate represents the fastest learning rate that a parameter may reach, or is used to judge whether a learning rate is appropriate. After the target learning rate is determined, the GAN can be updated according to the current learning rate and the target learning rate.
For example, the value of a parameter when entering the current training round can be taken as its pre-update value, and the value of the parameter determined in the current round can be taken as its proposed update value. The current learning rate of the parameter is calculated from the proposed update value and the pre-update value, and it is judged whether the current learning rate matches the target learning rate. If they match, the parameter is updated with the proposed update value; that is, the proposed update value becomes the final value of the parameter for the current round and the pre-update value for the next round. If they do not match, a target value of the parameter is determined according to the target learning rate, and the parameter is updated with that target value, which becomes the final value for the current round and the pre-update value for the next round.
The target learning rate can be determined according to the Lipschitz constraint.
Definition of the Lipschitz constraint:
If there exists a constant L such that, for any two distinct real numbers x1 and x2 in the domain D, the following inequality holds:
|f(x1) − f(x2)| ≤ L‖x1 − x2‖
then the function f(x) is said to satisfy the Lipschitz constraint on D. Here, L is called the Lipschitz constant, and its specific value depends on the function f(x).
Obviously, if the function f(x) satisfies the Lipschitz condition, then f(x) is uniformly continuous. In fact, the Lipschitz constraint limits the rate of change of f(x): the magnitude of change of f(x) cannot exceed a certain constant, and its slope must be smaller than the Lipschitz constant L, so the learning rate can be determined according to the Lipschitz constant L.
The inventors found through research that, when training a generative adversarial network that includes a generator and a discriminator, if the update step is unconstrained, the discriminator learns how to judge whether images are real or fake too quickly, while the generator learns how to generate realistic images too slowly. Because of this difference in learning rates, the discriminator can usually judge the authenticity of the generator's output relatively accurately before the generator has finished training, for example judging the outputs of the not-yet-trained generator to be fake, so that no matter how much further the generator trains and learns, it cannot produce a generated image that the discriminator determines to be real. In other words, after the discriminator completes training, most of its judgments on the generator's outputs are "this generated image is fake", causing the generator's gradient to vanish so that it cannot continue learning how to generate realistic images. As a result, the authenticity of the images output by the trained generator cannot be guaranteed.
In view of this, the Lipschitz constraint can be used to reduce the learning rate of each parameter in the discriminator, so that the generator can quickly learn how to generate realistic images while the discriminator maintains its accuracy in distinguishing real images from fake ones, and the generator can then be effectively applied in an image editing model for real images.
Optionally, updating the GAN according to the gradient update configuration information of the discriminator includes: determining, according to the gradient update configuration information of the discriminator, a maximum parameter learning rate threshold for each of one or more feature extraction layers included in the discriminator; and, for each feature extraction layer in the discriminator, updating the parameters of that layer according to its maximum parameter learning rate threshold, so that the update rate of the parameters associated with that layer matches the layer's maximum parameter learning rate threshold.
The maximum parameter learning rate threshold is used to determine the maximum learning rate of a parameter. A parameter here refers to a parameter of the GAN, specifically one or more parameters corresponding to each of the one or more feature extraction layers in the discriminator. A feature extraction layer extracts feature information from its input and outputs it. The discriminator can be a learning model of any depth, and usually has a structure that includes multiple feature extraction layers.
The learning rate of a parameter's proposed update value relative to its pre-update value needs to be less than or equal to the maximum learning rate determined by the maximum parameter learning rate threshold. In one possible implementation, the maximum parameter learning rate threshold can be configured for some or all of the parameters; alternatively, the parameters that need such a threshold can be customized according to the actual situation. This is not specifically limited in the embodiments of the present invention.
Updating the parameters of a feature extraction layer in the discriminator according to the layer's maximum parameter learning rate threshold may specifically be as follows: the target learning rate of each of the one or more parameters associated with the layer is determined according to the gradient update configuration information; for each parameter, its proposed update value is obtained; the learning rate of the parameter is calculated from the proposed update value and the pre-update value; and the magnitude relationship between this learning rate and the parameter's target learning rate is judged. When the learning rate is less than or equal to the target learning rate, it is determined that the learning rate matches the target learning rate, and the parameter is updated with its proposed update value; when the learning rate is greater than the target learning rate, it is determined that the learning rate does not match the target learning rate, the target value of the parameter is calculated according to the target learning rate, and the parameter is updated with the target value.
Exemplarily, the target value of the parameter can be calculated based on the following formula:
θ1 = θ0 − α · ∂J(θ0, θ1)/∂θ0
where α is the learning rate, J(θ0, θ1) is the fitting function, θ0 is the pre-update value of the parameter, and θ1 is the target value of the parameter. The value of α can be determined according to the value of the constant L in the aforementioned Lipschitz constraint; for example, it can be equal to the above target learning rate.
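As a hedged illustration of the matching step above, the following sketch clamps each layer's proposed updates to its maximum learning-rate threshold. The layer-to-threshold mapping, the snapshot dictionary, and the reading of "learning rate" as relative change per round are all assumptions, not the patent's own code.

```python
# A minimal sketch of capping each discriminator layer's parameter update
# rate at that layer's maximum learning-rate threshold (names and the
# definition of "rate" as relative change are assumptions).
import torch

@torch.no_grad()
def clamp_layer_updates(discriminator, pre_update, max_rate):
    """pre_update: name -> parameter snapshot taken before the optimizer step;
    max_rate: feature-extraction-layer name -> maximum allowed update rate."""
    for name, param in discriminator.named_parameters():
        layer = name.split(".")[0]
        if layer not in max_rate:
            continue
        old = pre_update[name]
        delta = param - old                           # proposed update
        limit = max_rate[layer] * (old.abs() + 1e-8)  # largest allowed change
        # Fall back to the target value when the proposed rate is too fast.
        param.copy_(old + delta.clamp(-limit, limit))
```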
By configuring a maximum parameter learning rate threshold for each feature extraction layer in the discriminator, the maximum learning rate of each parameter can be limited, the learning rate of each parameter of the discriminator can be slowed down, and the learning consistency between the discriminator and the generator in the GAN can be effectively improved. This allows the generator to quickly learn how to generate realistic images while the discriminator maintains its accuracy in distinguishing real images from fake ones, so that the generator can be effectively applied in an image editing model structure for real images.
S130: When it is determined that the GAN satisfies the training end condition, generate an image editing model according to the generator in the trained GAN.
The training end condition is used to judge whether the training of the GAN is complete. Usually, the loss function converges to a set value, so the training end condition can be configured as the calculated value of the loss function being less than a set value, or the update change rate of the loss function being less than a set threshold, and so on.
When the training of the GAN is completed, its generator can generate realistic images relatively accurately. The image editing model can be obtained by adjusting the generator. The image editing model can then be used to edit real images, and the corresponding output edited images are realistic images.
The editing operations of the image editing model may include changes to attributes such as the position, size, brightness, and color of pixels in the image. These editing operations do not change the realistic nature of the image: the image obtained by editing a real image is usually still a realistic image. Exemplarily, the editing operations include editing at least one of the skin color, age, gender, and organ regions of a face. For example: editing the skin color of a face from yellow to white; editing the age characteristics of a face from 50 years old to 10 years old; editing the gender characteristics of a face from male to female; editing a single eyelid into a double eyelid; and so on.
In some optional embodiments, the generator includes an encoder and a decoder. In fact, the generator structure contains multiple cascaded intermediate layers, and the intermediate results of these layers can affect the final output of the generator, that is, the final image editing effect. The outputs of one or more specific intermediate layers can be taken from the generator as a latent space; the latent space is adjusted and then fed into the cascade structure that follows it in the generator, so as to achieve the image editing effect. That is, an image editing model can be generated by adjusting the parameters of the generator's latent space.
For example, the gender characteristics of a face image can be adjusted by editing the latent space. Exemplarily, a female face is input and a male face is output. The latent space can be selected according to the specific structure of the generator. Optionally, the generator includes an encoder and a decoder, and the latent space is a neural network layer in the decoder. Editing the latent space may be: obtaining the parameters of a pre-trained image editing model, and updating the parameters of the generator's latent space with them.
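One common way to realize such a latent-space adjustment is to shift the latent code along a learned attribute direction before decoding; the sketch below assumes that setup, and the `encoder`/`decoder` split, the `direction` vector, and `strength` are all illustrative assumptions rather than the patent's exact method.

```python
# A minimal sketch of latent-space editing (module names and the learned
# attribute direction are assumptions, not the patent's own code).
def edit_image(generator, image, direction, strength=1.0):
    z = generator.encoder(image)          # map the input image to the latent space
    z_edited = z + strength * direction   # adjust the latent code (e.g., gender)
    return generator.decoder(z_edited)    # decode the edited latent code
```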
For another example, image editing samples can be used to continue training the generator so as to generate the image editing model. An image editing sample includes a real image before editing and a real image after editing. For example, an image editing sample may include a face image before editing and the corresponding face image after editing. The relationship between the edited face image and the face image before editing can be selected according to the actual situation; for example, the relationship may involve gender, age, skin color, and so on, which is not limited in the embodiments of the present invention.
In addition, a pre-trained standard encoder can be used to replace the encoder in the generator, in order to extract effective features from the input image. The standard encoder learns how to extract features that can characterize the input image. The input size of the decoder in the generator matches the output size of the standard encoder, where the size can be the dimension of a vector.
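A hedged sketch of that replacement, where the module references and the `out_dim`/`in_dim` attributes are assumptions:

```python
# A minimal sketch of swapping in a pre-trained standard encoder
# (all attribute names are assumptions).
assert standard_encoder.out_dim == generator.decoder.in_dim  # sizes must match
generator.encoder = standard_encoder
```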
In the embodiments of the present invention, a sample group including noise images and/or real images is input into a GAN that includes a discriminator and a generator, the GAN is trained, and the learning rate of the discriminator's parameters is limited according to the Lipschitz constraint to slow down the learning rate of each parameter of the discriminator, which can effectively improve the consistency of the training rates of the discriminator and the generator in the GAN. This not only makes the changes of the discriminator's parameters more continuous and smooth, but also maintains the discriminator's accuracy in distinguishing real images from fake ones while allowing the generator to quickly learn how to generate realistic images. The generator can then be effectively applied in an image editing model structure for real images, improving the realism of the results of editing real images based on that image editing model structure.
Embodiment 2
FIG. 2 is a flowchart of an editing model generation method in Embodiment 2 of the present invention. This embodiment is elaborated on the basis of Embodiment 1 above.
As shown in FIG. 2, the method of this embodiment specifically includes:
S210: Iteratively train a generative adversarial network, the GAN including a generator and a discriminator.
For details not described in this embodiment, reference may be made to the foregoing embodiments.
S220: In the iterative training, update the GAN according to gradient update configuration information of the discriminator, wherein the gradient update configuration information is determined through a Lipschitz constraint.
S230: Calculate the value of the loss function of the GAN according to loss function configuration information. The loss function configuration information can be used to add a Euclidean distance norm to the initial loss function, and the elements included in the Euclidean distance norm are the parameters of the encoder in the generator.
The training process of the GAN is in fact the process of solving the algorithm that realizes the mapping from input to output, and solving the algorithm in fact means solving for the values of its parameters. The algorithm has an objective function, and solving the algorithm is a process of optimizing this objective function. Usually, the loss function can be used as the objective function. The loss function expresses the degree to which the predicted value of the GAN differs from the true value. For example, the smaller the value of the loss function, the better the performance of the corresponding GAN usually is. Different models generally use different loss functions.
In the embodiments of the present invention, the loss function serves as the training target of the GAN. The loss function can take the following form:
LOSS = E_{m∼P_data(m)}[log D(m, θ_d)] + E_{n∼P_noise(n)}[log(1 − D(G(n, θ_g), θ_d))]
where LOSS is the initial loss function, representing the sum of the loss function LOSS_D of the discriminator D and the loss function LOSS_G of the generator G; E(·) denotes the expected value over a distribution; m denotes a real image; θ_d denotes the parameter matrix of the discriminator D; P_data(m) denotes the distribution of the real-image samples, which can be mapped by the discriminator D to a higher-dimensional data space to obtain D(m, θ_d); n denotes random noise; θ_g denotes the parameter matrix of the generator G; and P_noise(n) denotes the noise distribution, which can be mapped by the generator G to a higher-dimensional data space to obtain G(n, θ_g).
The term E_{m∼P_data(m)}[log D(m, θ_d)] denotes the loss function LOSS_D of the discriminator D, and maximizing LOSS_D can be taken as the training target of the discriminator D; the term E_{n∼P_noise(n)}[log(1 − D(G(n, θ_g), θ_d))] denotes the loss function LOSS_G of the generator G, and minimizing LOSS_G can be taken as the training target of the generator G. In other words, the discriminator D is trained with maximizing log D(m) as its target, continuously improving its accuracy in judging whether the generated image output by the generator is real, while the generator G is trained with minimizing 1 − log D(G(n)) as its target, continuously reducing the difference between the generated images output by the generator and real images. In this way, by taking the maximization of the discriminator D's loss function and the simultaneous minimization of the generator G's loss function as the training target of the generative adversarial network, the effect of adversarial training between the discriminator and the generator is achieved.
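A hedged sketch of these two terms, assuming the discriminator outputs a probability in (0, 1); the function and argument names are illustrative:

```python
# A minimal sketch of LOSS_D and LOSS_G as reconstructed above; d_real and
# d_fake are the discriminator's outputs on real and generated images.
import torch

def loss_d(d_real):
    # LOSS_D = E[log D(m)]; the discriminator maximizes this term.
    return torch.log(d_real).mean()

def loss_g(d_fake):
    # LOSS_G = E[log(1 - D(G(n)))]; the generator minimizes this term.
    return torch.log(1 - d_fake).mean()
```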
Using only the initial loss function as the training target of the GAN requires a large number of image feature extraction operations, which is computationally expensive and slow to solve. In view of this, a Euclidean distance norm can be added to the initial loss function as a constraint. Since the Euclidean distance norm can be decomposed into a combination of two low-dimensional parameter matrices, adding it as a constraint can effectively reduce the dimension of the parameter matrices and the number of samples required.
Training a GAN may also suffer from overfitting, so that the trained GAN has a good generation effect and discrimination accuracy only for certain categories of real images, while its generation effect and discrimination accuracy for real images of unknown categories are poor. In view of this, adding the Euclidean distance norm to the initial loss function as a constraint can also be considered, so that the distribution mapped into the latent space is more even, which reduces the coupling between feature vectors and correspondingly improves the generalization ability of the GAN, ensuring its applicability to real images of unknown categories.
The loss function configuration information is used to add the Euclidean distance norm on the basis of the initial loss function. The Euclidean distance norm, which may also be called a regularization term or L2 norm, is the square root of the sum of the squares of the elements. Adding the Euclidean distance norm is equivalent to adding a constraint to the initial loss function; in effect, it severely penalizes weight vectors with large values in favor of more dispersed weight vectors, so that the weights are distributed more evenly and are not concentrated on a few vectors, which brings the GAN closer to a low-dimensional model. The lower the dimension, the smaller the amount of data used for training. Therefore, adding the Euclidean distance norm to the initial loss function as a constraint can reduce the amount of data used to train the GAN, and thus reduce the complexity of training the GAN.
Specifically, the updated loss function can take the following form:
LOSS′ = LOSS + λ‖θ_g‖_F
where θ_g denotes the parameter matrix of the latent space of the generator G, specifically the parameter matrix of the latent space of the encoder in the generator G; λ is a penalty coefficient used to adjust the complexity of training the GAN, and can be set according to the actual situation; ‖·‖_F denotes the norm operation; and ‖θ_g‖_F denotes the Euclidean distance norm of the latent-space parameter matrix θ_g.
The elements included in the Euclidean distance norm can be the parameter matrix θ_g of the latent space of the generator G, specifically the parameters of the encoder in the generator.
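A hedged sketch of adding this norm to the loss, assuming the regularized parameters are those of the generator's encoder; `lam` stands in for the penalty coefficient λ:

```python
# A minimal sketch of adding the Euclidean distance (L2/Frobenius) norm of
# the encoder's latent-space parameters to the loss (names are assumptions).
import torch

def regularized_loss(initial_loss, encoder, lam=1e-4):
    # Frobenius norm: square root of the sum of squares of all elements.
    l2 = torch.sqrt(sum((p ** 2).sum() for p in encoder.parameters()))
    return initial_loss + lam * l2
```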
S240: If it is determined that the loss function satisfies a stability condition (which may also be called a convergence condition), it is determined that the GAN satisfies the training end condition, and the image editing model can be generated according to the generator in the trained GAN.
The stability condition is used to judge whether the loss function tends to be stable, which may also be described as tending to converge. For example, the stability condition is used to judge whether the change rate of the loss function between adjacent training rounds is less than a set change-rate threshold, where the size of the threshold can be determined according to the actual situation. It can be understood that if the value of the loss function changes very little with the number of training rounds, the loss function is stable. The change rate of the loss function may be: the difference between the current value of the loss function calculated in the current round and the historical value calculated in the previous round, divided by the current value of the loss function. If this ratio is less than the set threshold, it is determined that the loss function would change only slightly even with further training, indicating that the loss function has stabilized, that is, converged. At this point, it is determined that the training of the GAN is complete. Alternatively, the stability condition may be judging whether the number of training rounds exceeds a set round-number threshold; if the GAN has been trained for enough rounds, it can be determined that its training is complete.
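A hedged sketch of the relative-change stability check described above; the threshold value is an illustrative assumption:

```python
# A minimal sketch of the relative-change stability (convergence) check.
def has_converged(prev_loss, curr_loss, threshold=1e-3):
    if prev_loss is None:
        return False
    # Relative change of the loss between adjacent training rounds.
    return abs(curr_loss - prev_loss) / abs(curr_loss) < threshold
```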
In the embodiments of the present invention, adding a norm to the initial loss function as a constraint makes the weight distribution of the vectors more even and prevents the weights from being concentrated on a few vectors. This not only reduces the amount of data and the computational complexity of training the GAN, but also improves the generalization ability of the trained GAN: it expands the range of real-image categories to which the trained GAN is applicable, thereby ensuring its discrimination accuracy and editing effect when applied to images of unknown categories.
Embodiment 3
FIG. 3A is a flowchart of an editing model generation method in Embodiment 3 of the present invention. This embodiment is elaborated on the basis of the above embodiments.
As shown in FIG. 3A, the method of this embodiment specifically includes:
S310: Iteratively train a generative adversarial network, the GAN including a generator and a discriminator.
For details not described in this embodiment, reference may be made to the foregoing embodiments.
S320: In the iterative training, update the GAN according to gradient update configuration information of the discriminator, wherein the gradient update configuration information is determined through a Lipschitz constraint.
S330: When it is determined that the GAN satisfies the training end condition, update the generator in the trained GAN according to the convolutional neural network in a pre-trained image feature detection model. The image feature detection model is trained on image feature samples; an image feature sample may include two regional image blocks from the same image and the relationship data between the two regional image blocks.
Exemplarily, the image feature detection model may include two convolutional neural networks that share weights, a feature vector splicer, and a fully connected network classifier. The convolutional neural networks extract the feature information of the regional image blocks and form feature vectors; the feature vector splicer combines the feature vectors generated by the convolutional neural networks into a target feature vector; and the fully connected network classifier classifies the target feature vector and outputs the relationship data between the regional image blocks.
The image feature detection model is used to extract features from images and can be a pre-trained deep learning model. Specifically, the image feature detection model can learn, in a self-supervised manner, to extract the features of image blocks in different regions and the relationships between image blocks in different regions.
The image blocks of different regions are local image regions within the same image, and no two regional image blocks overlap. The number and specific arrangement of regional image blocks in an image can be chosen according to the actual situation; for example, a target object can be detected in the image and divided into nine equal parts (for example, a three-by-three grid). This is not specifically limited in the embodiments of the present invention.
The relationship data describes the relationship between two regional image blocks and can be at least one of their positional, size, shape, and color relationships. Exemplarily, the relationship data includes a positional relationship. For example, when an image is divided into a three-by-three grid of regional image blocks, the positional relationships in the relationship data may include upper left, upper middle, upper right, middle left, center, middle right, lower left, lower middle, and lower right.
The feature information of a regional image block represents the block in data form, for example, as a feature vector. In fact, the feature information represents the regional image block along different dimensions, and a feature vector can represent the corresponding dimensional information.
The convolutional neural networks and the feature vector splicer map the original image data to a latent space, and the fully connected network classifier maps the learned distributed feature representation to the sample label space, so that the classification of a sample can be determined from the sample label. The convolutional neural network can use the PixelShuffle method to upsample feature maps, reducing the artifact effects caused by transposed convolution or ordinary linear-interpolation upsampling, thereby improving the realism of the generated images output by a generator built on the convolutional neural network.
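As a hedged illustration of PixelShuffle-style upsampling (the channel sizes and upscale factor are assumptions):

```python
# A minimal sketch of PixelShuffle upsampling: a convolution produces
# r*r times the channels, and PixelShuffle rearranges them into a feature
# map r times larger in height and width (here r = 2).
import torch.nn as nn

upsample = nn.Sequential(
    nn.Conv2d(64, 64 * 4, kernel_size=3, padding=1),
    nn.PixelShuffle(2),  # (B, 256, H, W) -> (B, 64, 2H, 2W)
)
```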
Specifically, as shown in FIG. 3B, the image feature detection model may include a first convolutional neural network 301 and a second convolutional neural network 302 that share weights, a feature vector splicer 303, and a fully connected network classifier 304. The convolutional neural network used to construct the generator can be either the first convolutional neural network 301 or the second convolutional neural network 302.
The specific operation of the image feature detection model may include: dividing a face image into at least two regional image blocks (for example, a mouth region image block and a right-eye region image block). In this embodiment, the mouth region image block can be input into the first convolutional neural network 301 for feature extraction to obtain the first feature vector output by the first convolutional neural network 301; the right-eye region image block is input into the second convolutional neural network 302 for feature extraction to obtain the second feature vector output by the second convolutional neural network 302; the first feature vector and the second feature vector are input into the feature vector splicer 303 for splicing to obtain the spliced feature vector output by the feature vector splicer 303; and the spliced feature vector is input into the fully connected network classifier 304 for classification to obtain the relationship data between the mouth region image block and the right-eye region image block. For example, the fully connected network classifier 304 can conclude that the right-eye region image block is at the upper right of the mouth region image block.
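A hedged end-to-end sketch of this weight-sharing two-branch structure follows; the layer sizes, the nine-way relation set, and all names are assumptions rather than the patent's exact architecture:

```python
# A minimal sketch of the relative-position prediction model: two branches
# that share one CNN, feature splicing, and a fully connected classifier.
import torch
import torch.nn as nn

class RelativePositionModel(nn.Module):
    def __init__(self, num_relations=9, feat_dim=128):
        super().__init__()
        # A single CNN applied to both patches, so the branches share weights.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Classifier over the spliced (concatenated) feature vectors.
        self.classifier = nn.Linear(2 * feat_dim, num_relations)

    def forward(self, patch_a, patch_b):
        f_a = self.cnn(patch_a)                 # first branch (e.g., mouth patch)
        f_b = self.cnn(patch_b)                 # second branch (e.g., eye patch)
        spliced = torch.cat([f_a, f_b], dim=1)  # feature vector splicer
        return self.classifier(spliced)         # relationship logits
```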
Optionally, an image feature sample may include two facial-organ region image blocks from the same face image and the relationship data between the two facial-organ region image blocks.
A facial-organ region image block can be an image block divided according to facial organs, for example, a nose region image block and a mouth region image block. The relationship data between facial-organ region image blocks can represent the relative positional relationship of the two blocks within the face image. For example, if the nose region image block is in the center and the mouth region image block is in the lower middle, the relationship data may be that the nose region image block is located above the mouth region image block.
By using the facial-organ region image blocks of face images as image feature samples, the feature information that distinguishes the facial organs can be accurately extracted from face images and learned. This helps to accurately identify the individual organs of the face image to be edited, thereby improving the realism of face editing.
生成器的解码器中可以包括卷积神经网络。本步骤可以将预先训练的图像特征检测模型中的卷积神经网络,作为生成器的解码器中的卷积神经网络;或者,将预先训练的图像特征检测模型中的卷积神经网络的参数项,迁移到生成器的解码器中的卷积神经网络。The decoder of the generator may include a convolutional neural network. In this step, the convolutional neural network in the pre-trained image feature detection model can be used as the convolutional neural network in the decoder of the generator; or, the parameter items of the convolutional neural network in the pre-trained image feature detection model , Migrate to the convolutional neural network in the decoder of the generator.
在一些可选实施例中,可以在生成器的解码器既有存在的特征提取网络中额外添加预先训练的图像特征检测模型中的卷积神经网络。例如,可以将卷积神经网络和其他特征提取网络进行共享权重,将卷积神经网络的输出特征向量和其他特征提取层的输出特征向量进行拼接,并将拼接后的特征向量输入到原特征提取层输出特征向量的模块中,例如,全连接网络分类器中等。In some optional embodiments, the convolutional neural network in the pre-trained image feature detection model may be additionally added to the existing feature extraction network of the decoder of the generator. For example, the convolutional neural network and other feature extraction networks can share weights, the output feature vector of the convolutional neural network and the output feature vector of other feature extraction layers can be spliced, and the spliced feature vector can be input to the original feature extraction Layer output feature vector module, for example, fully connected network classifier, etc.
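By way of illustration only, the two reuse variants described above could look as follows in PyTorch, reusing the PatchRelationNet sketch above; the checkpoint path and the generator object with a decoder.cnn attribute are assumptions of this sketch:

import torch

detector = PatchRelationNet()
detector.load_state_dict(torch.load("feature_detector.pt"))

# Variant 1: use the pre-trained CNN directly as the decoder's CNN.
generator.decoder.cnn = detector.cnn

# Variant 2: migrate only the parameter items into an existing decoder
# CNN of identical architecture.
generator.decoder.cnn.load_state_dict(detector.cnn.state_dict())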
S340: Generate an image editing model according to the updated generator.
The updated generator uses a convolutional neural network trained by self-supervised learning; the training of this convolutional neural network can be completed with a small number of samples, which effectively reduces the number of training samples required by the generator and increases the training speed.
In this embodiment of the present invention, the generator in the generative adversarial network is updated with a convolutional neural network pre-trained by self-supervised learning, and the image editing model is constructed based on the updated generator. This effectively extracts the features in the input image of the image editing model, reduces the number of labeled samples required, and reduces the number of training samples of the image editing model, thereby increasing the generation speed of the image editing model and reducing its labeling labor cost.
Embodiment 4
Fig. 4A is a flowchart of a face image editing method in Embodiment 4 of the present invention. This embodiment is applicable to the case where an image editing model is used to edit a face image. The method may be executed by the face image editing apparatus provided by an embodiment of the present invention; the apparatus may be implemented in software and/or hardware and may generally be integrated into a computer device. As shown in Fig. 4A, the method of this embodiment specifically includes:
S410: Acquire a face image to be edited.
A face image is a real image that includes a human face, for example, a photo taken by a user. It should be noted that the face image of a cartoon character is not a real image.
For details not exhaustively described in this embodiment, reference may be made to the foregoing embodiments.
S420: Input the face image to be edited into the image editing model to obtain an edited face image output by the image editing model, wherein the image editing model is generated by the editing model generation method of any of the foregoing embodiments of the present invention.
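By way of illustration only, inference with such a model could look as follows in PyTorch; the checkpoint path, the 256x256 input size, and the [0, 1] output range are assumptions of this sketch:

import torch
from PIL import Image
import torchvision.transforms.functional as TF

model = torch.load("editing_model.pt")
model.eval()

face = TF.to_tensor(Image.open("face.jpg").resize((256, 256))).unsqueeze(0)
with torch.no_grad():
    edited = model(face)  # edited face image tensor
TF.to_pil_image(edited.squeeze(0).clamp(0, 1)).save("face_edited.jpg")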
In this embodiment, the image editing model may be generated by the editing model generation method of any of the foregoing embodiments of the present invention. In other words, the generator in the image editing model, or the decoder in that generator, is derived from the generative adversarial network obtained by the editing model generation method of any of the foregoing embodiments. The generative adversarial network includes a generator and a discriminator, and a Lipschitz constraint is used to determine the gradient update configuration information of the discriminator, so as to slow down the learning rate of each parameter item of the discriminator and keep the rate at which the discriminator learns to distinguish real images as consistent as possible with the rate at which the generator learns to generate real images, thereby ensuring both the accuracy of the generative adversarial network in distinguishing real images and the realism of the generated images.
Specifically, as shown in Fig. 4B, among the three images, the first image on the left is a standard test image commonly used in textbooks and can serve as a real face image. The second image in the middle is a video frame from a dynamic video. The third image on the right is the edited image formed by editing the first image to imitate the mouth-opening action of the middle video frame.
In this embodiment of the present invention, using a Lipschitz constraint to constrain the gradient update configuration information of the discriminator during the training of the generative adversarial network can effectively slow down the learning rate of each parameter item of the discriminator and improve the training consistency between the discriminator and the generator, guaranteeing the realism of the images output by the generator while ensuring the discrimination accuracy of the finally trained discriminator. In this way, when the editing model built from the generator of the finally trained generative adversarial network is used to obtain edited versions of real face images, the realism of the editing effect is effectively guaranteed, further improving the user experience.
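By way of illustration only, one way to realize such a learning-rate cap on the discriminator is sketched below in PyTorch. The cap value, the use of the relative parameter change as the learning-rate measure, and the post-step rescaling are assumptions of this sketch, not the exact procedure of this disclosure:

import torch

def capped_discriminator_step(discriminator, loss, optimizer, max_rate=0.01):
    """Apply one optimizer update, then rescale each parameter's change so
    its relative change stays at or below the maximum learning-rate cap."""
    before = {n: p.detach().clone() for n, p in discriminator.named_parameters()}
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        for name, param in discriminator.named_parameters():
            delta = param - before[name]
            rate = delta.norm() / (before[name].norm() + 1e-12)
            if rate > max_rate:
                # Shrink the step so the effective update rate hits the cap,
                # which slows the discriminator relative to the generator.
                param.copy_(before[name] + delta * (max_rate / rate))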
Embodiment 5
Fig. 5 is a schematic diagram of an editing model generation apparatus in Embodiment 5 of the present invention. Embodiment 5 provides the apparatus that implements the editing model generation method of the foregoing embodiments of the present invention; the apparatus may be implemented in software and/or hardware and may generally be integrated into a computer device.
As shown in Fig. 5, the apparatus of this embodiment may include:
a network training module 510, configured to perform iterative training on a generative adversarial network, the generative adversarial network including a generator and a discriminator;
a network update module 520, configured to, in the iterative training, update the generative adversarial network according to gradient update configuration information of the discriminator, the gradient update configuration information being determined by a Lipschitz constraint; and
a model generation module 530, configured to generate an image editing model according to the generator in the trained generative adversarial network when it is determined that the generative adversarial network satisfies a training end condition.
In this embodiment of the present invention, real images and/or noise images are input as samples into the generative adversarial network to iteratively train the generative adversarial network including the generator and the discriminator, and the learning rate of the parameter items of the discriminator is limited according to a Lipschitz constraint to improve the learning consistency between the discriminator and the generator. This ensures the discriminator's accuracy in distinguishing real from fake images while guaranteeing the realism of the images output by the generator, so that the generator can be effectively applied in an image editing model structure for real images, improving the realism of face image editing based on that structure.
Further, the model generation module 530 includes a loss function calculation unit, configured to: obtain the loss function of the generative adversarial network by adding a Euclidean distance norm as a constraint to the initial loss function according to loss function configuration information, the elements included in the Euclidean distance norm being the parameter items of the encoder in the generator; and, when it is determined that the loss function satisfies a convergence condition, determine that the generative adversarial network satisfies the training end condition and generate the image editing model according to the generator in the trained generative adversarial network.
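By way of illustration only, adding the encoder-parameter norm to the loss could look as follows in PyTorch; the penalty coefficient value is an assumption of this sketch:

import torch

def loss_with_encoder_norm(initial_loss, encoder, penalty=1e-4):
    """Add the Euclidean norm over the generator encoder's parameter items
    to the initial loss of the generative adversarial network."""
    theta_g = torch.cat([p.flatten() for p in encoder.parameters()])
    return initial_loss + penalty * theta_g.norm()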
Further, the network update module 520 includes a discriminator parameter item update unit, configured to: determine, according to the gradient update configuration information of the discriminator, a maximum parameter learning rate threshold corresponding to each of one or more feature extraction layers included in the discriminator; and, for each feature extraction layer in the discriminator, update the parameter items of that feature extraction layer according to its maximum parameter learning rate threshold, so that the update rate of the parameter items associated with that feature extraction layer matches the maximum parameter learning rate threshold corresponding to that layer.
Further, the model generation module 530 includes a self-supervised generation unit, configured to: update the generator in the trained generative adversarial network, specifically the decoder in the generator, based on the convolutional neural network in a pre-trained image feature detection model; and generate the image editing model according to the updated generator. The image feature detection model is obtained by training on image feature samples, each of which includes two regional image blocks in the same image and the relationship data between the two regional image blocks. The image feature detection model may include two weight-sharing convolutional neural networks, a feature vector splicer, and a fully connected network classifier. The convolutional neural networks are used to extract the feature information of the regional image blocks and form feature vectors; the feature vector splicer is used to combine the feature vectors generated by the convolutional neural networks into a target feature vector; and the fully connected network classifier is used to classify the target feature vector and output the relationship data between the regional image blocks.
Further, an image feature sample includes two facial organ region image blocks from the same face image and the relationship data between the two facial organ region image blocks.
Further, the network training module 510 includes a training unit, configured to input samples including real images and/or noise images into the generative adversarial network and perform a round of training on the generative adversarial network.
The above editing model generation apparatus can execute the editing model generation method provided by any embodiment of the present invention and achieve the same beneficial effects.
Embodiment 6
Fig. 6 is a schematic diagram of a face image editing apparatus in Embodiment 6 of the present invention. Embodiment 6 provides the apparatus that implements the face image editing method of the foregoing embodiments of the present invention; the apparatus may be implemented in software and/or hardware and may generally be integrated into a computer device.
As shown in Fig. 6, the apparatus of this embodiment may include:
a face image acquisition module 610, configured to acquire a face image to be edited; and
a face image editing module 620, configured to input the face image to be edited into an image editing model to obtain an edited face image output by the image editing model, wherein the image editing model is generated by the editing model generation method of any of the foregoing embodiments of the present invention.
In this embodiment of the present invention, the gradient update configuration information of the discriminator in the generative adversarial network is determined according to a Lipschitz constraint, and the learning rate of the parameter items of the discriminator is constrained based on that gradient update configuration information. This improves the training consistency between the discriminator and the generator in the generative adversarial network, thereby ensuring the realism of the image editing effect of the editing model built from the generator of the finally trained generative adversarial network and effectively improving the user experience.
The above face image editing apparatus can execute the face image editing method provided by any embodiment of the present invention and achieve the same beneficial effects.
Embodiment 7
Fig. 7 is a schematic structural diagram of a computer device provided by Embodiment 7 of the present invention. Fig. 7 shows a block diagram of an exemplary computer device 12 suitable for implementing embodiments of the present invention. The computer device 12 shown in Fig. 7 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in Fig. 7, the computer device 12 takes the form of a general-purpose computing device. The components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus, a peripheral bus, or any of a variety of bus structures. For example, these bus structures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
The computer device 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the computer device 12, including volatile and non-volatile media and removable and non-removable media.
The system memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 7 and commonly called a "hard drive"). Although not shown in Fig. 7, a disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disc drive for reading from and writing to a removable non-volatile optical disc (e.g., a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM), or other optical media), may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The system memory 28 may store at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in the system memory 28. The program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present invention.
The computer device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (e.g., a network card, a modem, etc.) that enables the computer device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. In addition, the computer device 12 may also communicate with one or more networks (e.g., a local area network (LAN) or a wide area network (WAN)) through a network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the computer device 12 through the bus 18. It should be understood that, although not shown in Fig. 7, other hardware and/or software modules may be used in conjunction with the computer device 12, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Inexpensive Disks (RAID) systems, tape drives, and data backup storage systems.
The processing unit 16 executes various functional applications and data processing by running the program modules 42 stored in the system memory 28, for example, implementing an editing model generation method and/or a face image editing method provided by any embodiment of the present invention.
Embodiment 8
Embodiment 8 of the present invention provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the editing model generation method provided by any embodiment of the present application, or implements the face image editing method provided by any embodiment of the present application.
The computer storage medium of the embodiments of the present invention may adopt any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, a RAM, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a portable CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; such a medium may send, propagate, or transmit the program for use by or in combination with an instruction execution system, apparatus, or device.
The program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wireline, optical cable, radio frequency, etc., or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a LAN or a WAN, or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments herein, and that various obvious changes, readjustments, and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, the present invention is not limited to the above embodiments and may include more other equivalent embodiments without departing from the concept of the present invention; the scope of the present invention is determined by the scope of the appended claims.

Claims (18)

  1. An editing model generation method, comprising:
    performing iterative training on a generative adversarial network, the generative adversarial network comprising a generator and a discriminator;
    in the iterative training, updating the generative adversarial network according to gradient update configuration information of the discriminator, wherein the gradient update configuration information is determined by a Lipschitz constraint; and
    when it is determined that the generative adversarial network satisfies a training end condition, generating an image editing model according to the generator in the trained generative adversarial network.
  2. The method according to claim 1, wherein updating the generative adversarial network according to the gradient update configuration information of the discriminator comprises:
    determining, according to the gradient update configuration information, a maximum parameter learning rate threshold corresponding to each of one or more feature extraction layers included in the discriminator; and
    updating parameter items of the discriminator according to the maximum parameter learning rate threshold of each feature extraction layer.
  3. The method according to claim 2, wherein updating the parameter items of the discriminator according to the maximum parameter learning rate threshold of each feature extraction layer comprises:
    acquiring the value of a parameter item associated with the feature extraction layer upon entering the current round of training, as a pre-update value;
    acquiring the value of the parameter item associated with the feature extraction layer calculated in the current round of training, as a candidate update value;
    determining a learning rate of the parameter item according to the candidate update value and the pre-update value;
    in a case where the learning rate is less than or equal to the maximum parameter learning rate threshold, updating the parameter item associated with the feature extraction layer with the candidate update value; and
    in a case where the learning rate is greater than the maximum parameter learning rate threshold, updating the parameter item associated with the feature extraction layer with a target value, the target value being obtained according to the maximum parameter learning rate threshold.
  4. The method according to claim 3, wherein the target value is calculated according to the following formula:
    $\theta_1 = \theta_0 - \alpha \cdot \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
    where $\theta_1$ denotes the target value, $\theta_0$ denotes the pre-update value, $\alpha$ denotes the maximum parameter learning rate threshold, and $J(\theta_0, \theta_1)$ is the fitting function.
  5. The method according to any one of claims 1 to 4, wherein the training end condition comprises:
    a loss function converging to a set value,
    wherein the loss function is obtained by adding a Euclidean distance norm to an initial loss function according to loss function configuration information, and the elements included in the Euclidean distance norm are parameter items of an encoder in the generator.
  6. The method according to claim 5, wherein the Euclidean distance norm is calculated as follows:
    $\lambda \, \|\theta_g\|_F$
    where $\lambda$ denotes a penalty coefficient, $\|\cdot\|_F$ denotes a norm operation, and $\theta_g$ denotes the parameter items of the encoder in the generator.
  7. The method according to claim 5, wherein the initial loss function comprises:
    a loss function of the discriminator and a loss function of the generator, wherein the training objective for the discriminator is to maximize the loss function of the discriminator, and the training objective for the generator is to minimize the loss function of the generator.
  8. The method according to any one of claims 1 to 7, wherein generating the image editing model according to the generator in the trained generative adversarial network comprises:
    updating the generator according to a convolutional neural network in a pre-trained image feature detection model; and
    generating the image editing model according to the updated generator,
    wherein the image feature detection model is obtained by training on image feature samples, and an image feature sample comprises two regional image blocks in a same image and relationship data between the two regional image blocks.
  9. The method according to claim 8, wherein the image feature detection model comprises:
    two weight-sharing convolutional neural networks, configured to respectively extract feature information of the two regional image blocks and form feature vectors;
    a feature vector splicer, configured to combine the feature vectors generated by the convolutional neural networks into a target feature vector; and
    a fully connected network classifier, configured to classify the target feature vector and output the relationship data between the two regional image blocks.
  10. The method according to claim 8 or 9, wherein updating the generator according to the convolutional neural network in the pre-trained image feature detection model comprises at least one of the following:
    using the convolutional neural network in the image feature detection model as a convolutional neural network in a decoder of the generator;
    migrating parameter items of the convolutional neural network in the image feature detection model to the convolutional neural network in the decoder of the generator; and
    adding the convolutional neural network in the image feature detection model to a feature extraction network of the decoder of the generator.
  11. The method according to any one of claims 8 to 10, wherein the image feature sample comprises:
    two facial organ region image blocks in a same face image, and relationship data between the two facial organ region image blocks.
  12. The method according to any one of claims 8 to 11, wherein the relationship data represents any one or more of the following relationships between the two regional image blocks: a positional relationship, a size relationship, a shape relationship, and a color relationship.
  13. The method according to any one of claims 1 to 12, wherein performing iterative training on the generative adversarial network comprises:
    inputting samples into the generative adversarial network to train the generative adversarial network, wherein the samples comprise at least one of a real image and a noise image.
  14. A face image editing method, comprising:
    acquiring a face image to be edited; and
    inputting the face image to be edited into an image editing model to obtain an edited face image output by the image editing model,
    wherein the image editing model is generated by the editing model generation method according to any one of claims 1 to 13.
  15. An editing model generation apparatus, comprising:
    a network training module, configured to perform iterative training on a generative adversarial network, the generative adversarial network comprising a generator and a discriminator;
    a network update module, configured to, in the iterative training, update the generative adversarial network according to gradient update configuration information of the discriminator, the gradient update configuration information being determined by a Lipschitz constraint; and
    a model generation module, configured to generate an image editing model according to the generator in the trained generative adversarial network when it is determined that the generative adversarial network satisfies a training end condition.
  16. A face image editing apparatus, comprising:
    a face image acquisition module, configured to acquire a face image to be edited; and
    a face image editing module, configured to input the face image to be edited into an image editing model to obtain an edited face image output by the image editing model, wherein the image editing model is generated by the editing model generation method according to any one of claims 1 to 13.
  17. A computer device, comprising:
    a memory;
    a processor; and
    a computer program stored on the memory and executable on the processor,
    wherein, when executing the program, the processor implements the editing model generation method according to any one of claims 1 to 13, or the face image editing method according to claim 14.
  18. A computer-readable storage medium on which a computer program is stored,
    wherein, when executed by a processor, the program implements the editing model generation method according to any one of claims 1 to 13, or the face image editing method according to claim 14.
PCT/CN2021/101007 2020-06-19 2021-06-18 Editing model generation method and apparatus, face image editing method and apparatus, device, and medium WO2021254499A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010568177.7 2020-06-19
CN202010568177.7A CN111754596B (en) 2020-06-19 2020-06-19 Editing model generation method, device, equipment and medium for editing face image

Publications (1)

Publication Number Publication Date
WO2021254499A1 true WO2021254499A1 (en) 2021-12-23

Family

ID=72675543

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/101007 WO2021254499A1 (en) 2020-06-19 2021-06-18 Editing model generation method and apparatus, face image editing method and apparatus, device, and medium

Country Status (2)

Country Link
CN (1) CN111754596B (en)
WO (1) WO2021254499A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111754596B (en) * 2020-06-19 2023-09-19 北京灵汐科技有限公司 Editing model generation method, device, equipment and medium for editing face image
CN112232281B (en) * 2020-11-04 2024-06-11 深圳大学 Face attribute editing method and device, intelligent terminal and storage medium
CN112651915B (en) * 2020-12-25 2023-08-29 百果园技术(新加坡)有限公司 Face image synthesis method, system, electronic equipment and storage medium
CN112668529A (en) * 2020-12-31 2021-04-16 神思电子技术股份有限公司 Dish sample image enhancement identification method
CN112819689B (en) * 2021-02-02 2024-08-27 百果园技术(新加坡)有限公司 Training method of human face attribute editing model, human face attribute editing method and human face attribute editing equipment
CN113158977B (en) * 2021-05-12 2022-07-29 河南师范大学 Image character editing method for improving FANnet generation network

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108537152B (en) * 2018-03-27 2022-01-25 百度在线网络技术(北京)有限公司 Method and apparatus for detecting living body
CN109308450A (en) * 2018-08-08 2019-02-05 杰创智能科技股份有限公司 A kind of face's variation prediction method based on generation confrontation network
CN110457994B (en) * 2019-06-26 2024-05-10 平安科技(深圳)有限公司 Face image generation method and device, storage medium and computer equipment
CN110659582A (en) * 2019-08-29 2020-01-07 深圳云天励飞技术有限公司 Image conversion model training method, heterogeneous face recognition method, device and equipment
CN110889370B (en) * 2019-11-26 2023-10-24 上海大学 System and method for synthesizing face by end-to-end side face based on condition generation countermeasure network
CN111275784B (en) * 2020-01-20 2023-06-13 北京百度网讯科技有限公司 Method and device for generating image
CN111275613A (en) * 2020-02-27 2020-06-12 辽宁工程技术大学 Editing method for generating confrontation network face attribute by introducing attention mechanism

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170316281A1 (en) * 2016-04-28 2017-11-02 Microsoft Technology Licensing, Llc Neural network image classifier
CN107943784A (en) * 2017-11-02 2018-04-20 南华大学 Relation extraction method based on generation confrontation network
CN108564119A (en) * 2018-04-04 2018-09-21 华中科技大学 A kind of any attitude pedestrian Picture Generation Method
CN110197514A (en) * 2019-06-13 2019-09-03 南京农业大学 A kind of mushroom phenotype image generating method based on production confrontation network
CN110689480A (en) * 2019-09-27 2020-01-14 腾讯科技(深圳)有限公司 Image transformation method and device
CN111754596A (en) * 2020-06-19 2020-10-09 北京灵汐科技有限公司 Editing model generation method, editing model generation device, editing method, editing device, editing equipment and editing medium

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114359034A (en) * 2021-12-24 2022-04-15 北京航空航天大学 Method and system for generating face picture based on hand drawing
CN114359034B (en) * 2021-12-24 2023-08-08 北京航空航天大学 Face picture generation method and system based on hand drawing
CN114359667B (en) * 2021-12-30 2024-01-30 西安交通大学 Intensity coherent identification method and equipment based on generation type countermeasure network
CN114359667A (en) * 2021-12-30 2022-04-15 西安交通大学 Strength coherent identification method and equipment based on generating type countermeasure network
CN114492176A (en) * 2022-01-12 2022-05-13 北京仿真中心 Dynamic model parameter identification method and system based on generation countermeasure network
CN114549287B (en) * 2022-01-27 2024-03-01 西北大学 Method and system for constructing arbitrary attribute editing model of human face
CN114549287A (en) * 2022-01-27 2022-05-27 西北大学 Method and system for constructing human face arbitrary attribute editing model
WO2023143126A1 (en) * 2022-01-30 2023-08-03 北京字跳网络技术有限公司 Image processing method and apparatus, electronic device, and storage medium
CN114663539B (en) * 2022-03-09 2023-03-14 东南大学 2D face restoration technology under mask based on audio drive
CN114663539A (en) * 2022-03-09 2022-06-24 东南大学 2D face restoration technology under mask based on audio drive
CN114724214B (en) * 2022-03-31 2024-05-14 华南理工大学 Micro-expression editing method and system based on facial action unit
CN114724214A (en) * 2022-03-31 2022-07-08 华南理工大学 Micro-expression editing method and system based on face action unit
WO2024108472A1 (en) * 2022-11-24 2024-05-30 北京京东方技术开发有限公司 Model training method and apparatus, text image processing method, device, and medium
CN116415687B (en) * 2022-12-29 2023-11-21 江苏东蓝信息技术有限公司 Artificial intelligent network optimization training system and method based on deep learning
CN116415687A (en) * 2022-12-29 2023-07-11 江苏东蓝信息技术有限公司 Artificial intelligent network optimization training system and method based on deep learning
CN116187294B (en) * 2023-04-24 2023-07-07 开元华创科技(集团)有限公司 Method and system for rapidly generating electronic file of informationized detection laboratory
CN116187294A (en) * 2023-04-24 2023-05-30 开元华创科技(集团)有限公司 Method and system for rapidly generating electronic file of informationized detection laboratory
CN117853638A (en) * 2024-03-07 2024-04-09 厦门大学 End-to-end 3D face rapid generation and editing method based on text driving

Also Published As

Publication number Publication date
CN111754596B (en) 2023-09-19
CN111754596A (en) 2020-10-09

Similar Documents

Publication Publication Date Title
WO2021254499A1 (en) Editing model generation method and apparatus, face image editing method and apparatus, device, and medium
US11481869B2 (en) Cross-domain image translation
US11508169B2 (en) System and method for synthetic image generation with localized editing
CN110785767B (en) Compact linguistics-free facial expression embedding and novel triple training scheme
WO2023082882A1 (en) Pose estimation-based pedestrian fall action recognition method and device
WO2020216033A1 (en) Data processing method and device for facial image generation, and medium
CN112085041B (en) Training method and training device of neural network and electronic equipment
CN114240735B (en) Arbitrary style migration method, system, storage medium, computer equipment and terminal
CN111091010A (en) Similarity determination method, similarity determination device, network training device, network searching device and storage medium
KR20210147507A (en) Image generation system and image generation method using the system
CN113177572A (en) Method and computer readable medium for automatic learning from sensors
Liu et al. Learning explicit shape and motion evolution maps for skeleton-based human action recognition
CN112801107A (en) Image segmentation method and electronic equipment
Chen et al. A unified framework for generative data augmentation: A comprehensive survey
AU2023204419A1 (en) Multidimentional image editing from an input image
CN115019053A (en) Dynamic graph semantic feature extraction method for point cloud classification and segmentation
US20240290022A1 (en) Automatic avatar generation using semi-supervised machine learning
WO2023240583A1 (en) Cross-media corresponding knowledge generating method and apparatus
KR20210063171A (en) Device and method for image translation
CN114781642B (en) Cross-media corresponding knowledge generation method and device
CN118071867B (en) Method and device for converting text data into image data
WO2023178801A1 (en) Image description method and apparatus, computer device, and storage medium
Chen et al. Unsupervised Learning: Deep Generative Model
CN116823991A (en) Image processing method and device
Cai Monocular visual scene analysis: saliency detection and 3D face reconstruction using GAN.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21825129

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 31/03/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21825129

Country of ref document: EP

Kind code of ref document: A1