WO2023072067A1 - Face attribute editing model training and face attribute editing methods - Google Patents

Face attribute editing model training and face attribute editing methods Download PDF

Info

Publication number
WO2023072067A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
image
target
attribute editing
loss function
Prior art date
Application number
PCT/CN2022/127361
Other languages
French (fr)
Chinese (zh)
Inventor
黄嘉彬
李玉乐
项伟
Original Assignee
百果园技术(新加坡)有限公司
黄嘉彬
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百果园技术(新加坡)有限公司, 黄嘉彬 filed Critical 百果园技术(新加坡)有限公司
Publication of WO2023072067A1 publication Critical patent/WO2023072067A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Definitions

  • The embodiments of the present application relate to the technical field of image processing, and in particular to a method for training a face attribute editing model and a method for editing face attributes.
  • Face attribute editing is an important technology in the field of computer vision. It is widely used in content production, film production, and entertainment video, for example bald-head effects, hairstyle changes, child-face effects, and celebrity face swaps. Face attribute editing takes an input image containing a face and a target attribute to be edited, transforms the input image into a target-domain face image carrying the target attribute, and ensures that the other original attribute features in the face image remain unchanged.
  • In the related art, a Generative Adversarial Network (GAN) is usually pre-trained to achieve target attribute editing of face images.
  • A global loss function is uniformly set for the generative adversarial network and used to train it, thereby guiding the generative adversarial network so that the face images it outputs have the specified target attributes.
  • Embodiments of the present application provide a face attribute editing model training and a face attribute editing method.
  • In a first aspect, an embodiment of the present application provides a training method for a face attribute editing model, the method comprising:
  • constructing an initial face attribute editing model according to reconstruction parameters of a face training image, wherein a target adversarial loss function and a similarity loss function are preset in the face attribute editing model; the target adversarial loss function is set to constrain the editing authenticity of the target attribute in the face training image, and the similarity loss function is set to constrain the similarity between the non-target attributes of the face training image when it passes through the face attribute editing model and the non-target attributes when the face training image is reconstructed;
  • inputting the face training image into the face attribute editing model, and training the face attribute editing model by using the target adversarial loss function and the similarity loss function to obtain the trained face attribute editing model.
  • the embodiment of the present application provides a method for editing face attributes, the method comprising:
  • the embodiment of the present application provides a training device for a face attribute editing model, the device comprising:
  • a model building module, configured to construct an initial face attribute editing model according to reconstruction parameters of a face training image, wherein a target adversarial loss function and a similarity loss function are preset in the face attribute editing model; the target adversarial loss function is set to constrain the editing authenticity of the target attribute in the face training image, and the similarity loss function is set to constrain the similarity between the non-target attributes of the face training image when it passes through the face attribute editing model and the non-target attributes when the face training image is reconstructed;
  • a model training module, configured to input the face training image into the face attribute editing model and train the face attribute editing model by using the target adversarial loss function and the similarity loss function, to obtain the trained face attribute editing model.
  • An embodiment of the present application provides a device for editing face attributes, the device comprising:
  • the preliminary editing module is configured to input the current face image to be edited into the face attribute editing model trained by the training method of the face attribute editing model provided by the above-mentioned first aspect, to obtain a corresponding face editing image;
  • a target segmentation module, configured to perform target segmentation on the current face image to obtain a corresponding target domain mask map;
  • the editing and repairing module is configured to use the target domain mask image to perform image repair on the edited face image to obtain a face image with edited target attributes.
  • An embodiment of the present application provides an electronic device, the electronic device comprising:
  • one or more processors;
  • a storage device configured to store one or more programs;
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the training method for the face attribute editing model provided in the first aspect above, or implement the face attribute editing method provided in the second aspect above.
  • An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the training method for the face attribute editing model provided in the first aspect above, or the face attribute editing method provided in the second aspect above.
  • FIG. 1A is a flowchart of a training method for a face attribute editing model provided in Embodiment 1 of the present application;
  • FIG. 1B is a schematic diagram of the training process of the face attribute editing model provided in Embodiment 1 of the present application;
  • FIG. 2A is a flowchart of a training method for a face attribute editing model provided in Embodiment 2 of the present application;
  • FIG. 2B is a schematic diagram of the training process of the face attribute editing model provided in Embodiment 2 of the present application.
  • FIG. 2C is a schematic structural diagram of the face attribute editing model provided in Embodiment 2 of the present application.
  • FIG. 3A is a flowchart of a face attribute editing method provided in Embodiment 3 of the present application.
  • FIG. 3B is a schematic diagram of the principle of the face attribute editing process provided by Embodiment 3 of the present application.
  • FIG. 4 is a schematic structural diagram of a training device for a face attribute editing model provided in Embodiment 4 of the present application;
  • FIG. 5 is a schematic structural diagram of a face attribute editing device provided in Embodiment 5 of the present application.
  • FIG. 6 is a schematic structural diagram of an electronic device provided in Embodiment 6 of the present application.
  • FIG. 1A is a flowchart of a training method for a face attribute editing model provided in Embodiment 1 of the present application.
  • various attributes in any face image can be re-edited to change the face style.
  • The training method of the face attribute editing model provided by this embodiment can be executed by the training device of the face attribute editing model provided by the embodiments of this application; the device can be realized by means of software and/or hardware and integrated into the electronic device that executes the method.
  • the method may include the following steps:
  • In the related art, a preset global loss function is usually used to analyze the difference between the output image and the input image, so as to train the network model to make accurate edits to the target attributes in the face image.
  • Because the global loss function imposes a correspondingly strong constraint between the input image and the output image, it may cause changes in the background area and non-target attributes while the target attributes are edited, making the edited output image insufficiently realistic and natural.
  • For this reason, this embodiment builds a dedicated face attribute editing model for all types of face images, so as to achieve natural and realistic editing of the target attributes within them.
  • The target attribute can be a specific keypoint feature to be edited in the face image according to the user's editing requirements. For example, when the user needs to re-edit the portrait in the face image into a bald head, the target attribute is the hair region feature in the face image.
  • This embodiment preliminarily constructs an initial face attribute editing model on the basis of face image reconstruction information, so that the initial face attribute editing model can realize the corresponding face image reconstruction. Subsequently, by adopting transfer learning, the reconstruction ability of the face attribute editing model can be gradually adapted so that the model gains the ability to accurately edit the target attribute.
  • When constructing the initial face attribute editing model, a large number of face training images are first obtained as training samples for the constructed face attribute editing model. Each face training image can then be reconstructed through an existing face image reconstruction method to determine the reconstruction parameters used during reconstruction, and the initial face attribute editing model is constructed from these reconstruction parameters, so that the initial face attribute editing model has the face image reconstruction ability needed for the subsequent transfer learning of target attribute editing.
  • This embodiment presets two loss functions in the face attribute editing model: a target adversarial loss function and a similarity loss function. The target adversarial loss function constrains the editing authenticity of the target attributes in the face training image, and the similarity loss function constrains the similarity between the non-target attributes of the face training image after passing through the face attribute editing model and the non-target attributes when the face training image is reconstructed.
  • The target adversarial loss function constrains the editing authenticity of the face attribute editing model with respect to the target attributes in the face training image, ensuring accurate editing of the target attributes;
  • the similarity loss function constrains the similarity between the non-target attributes after the face training image passes through the face attribute editing model and the non-target attributes when the face training image is reconstructed, ensuring that editing the target attributes does not affect the non-target attributes in the face training image and keeping the non-target attributes as unchanged as possible before and after editing.
  • The face attribute editing model in this embodiment can adopt a generative adversarial model (such as StyleGAN).
  • The target adversarial loss function (GAN loss) can be a WGAN-GP function with a gradient penalty, which is used to guide the face attribute editing model toward the target attribute.
  • The WGAN-GP function replaces the divergence measure between the two probability distributions in the original GAN with the Wasserstein distance, making the training process of the face attribute editing model more stable.
  • The WGAN-GP function is optimized on the basis of the original loss function by adding the regularization term GP (gradient penalty), which constrains the L2 norm of the discriminator's input gradient to stay near 1.
  • The target adversarial loss function in this embodiment can take the standard WGAN-GP form:
  • Loss_G = -E[D(G(x))]
  • Loss_D = E[D(G(x))] - E[D(x)] + λ·E[(‖∇D(x̂)‖₂ - 1)²], where x̂ is obtained by interpolating between real and generated images;
  • D is the discriminator when the face attribute editing model adopts a generative adversarial network;
  • G is the generator when the face attribute editing model adopts a generative adversarial network;
  • x is the face training image input into the generator of the face attribute editing model;
  • y = G(x) is the output image after the generator edits the target attributes in the face training image, which is input into the discriminator of the face attribute editing model for authenticity discrimination.
  • The generative adversarial network can be divided into two parts, the generator and the discriminator; Loss_G is the loss function of the generator and Loss_D is the loss function of the discriminator.
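  The WGAN-GP terms above can be sketched in PyTorch as follows. This is a minimal illustration, not the application's implementation: the function names and the gradient-penalty weight of 10 are assumptions.

```python
import torch

def generator_loss(d_fake):
    # WGAN generator term: minimize -E[D(G(x))]
    return -d_fake.mean()

def discriminator_loss(d_real, d_fake):
    # WGAN critic term: E[D(G(x))] - E[D(x)]
    return d_fake.mean() - d_real.mean()

def gradient_penalty(discriminator, real, fake, gp_weight=10.0):
    # Interpolate between real and edited images and penalize the
    # critic's input-gradient L2 norm for deviating from 1.
    alpha = torch.rand(real.size(0), 1, 1, 1)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    d_interp = discriminator(interp)
    grads = torch.autograd.grad(
        outputs=d_interp, inputs=interp,
        grad_outputs=torch.ones_like(d_interp),
        create_graph=True)[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return gp_weight * ((grad_norm - 1) ** 2).mean()
```

  In a training loop, Loss_D would be `discriminator_loss(...) + gradient_penalty(...)` for the critic step and `generator_loss(...)` for the generator step.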
  • this embodiment divides the similarity loss function into two types: a low-resolution perceptual loss function and a mask perceptual loss function.
  • The low-resolution perceptual loss function is used to constrain the similarity between the first intermediate low-resolution image output when the face training image undergoes target attribute editing through the face attribute editing model and the second intermediate low-resolution image generated when the face training image is reconstructed.
  • The mask perceptual loss function is used to constrain the similarity between the first keypoint mask feature extracted after target attribute editing of the face training image through the face attribute editing model and the second keypoint mask feature extracted after reconstruction of the face training image.
  • The low-resolution perceptual loss function (low-resolution perceptual loss) constrains the similarity of the intermediate low-resolution images during target attribute editing and during reconstruction of the face training image, so that the similarity constraint between the input image and the output image of the face attribute editing model does not become so strong that the target attribute can no longer be edited.
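  One plausible sketch of such a low-resolution perceptual loss, assuming the two intermediate images are available as tensors (the function name and use of an MSE distance are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def low_res_perceptual_loss(edit_lowres, recon_lowres):
    # Penalize the distance between the intermediate low-resolution
    # image produced during target-attribute editing and the one
    # produced during plain reconstruction. Working at low resolution
    # keeps the constraint loose enough for the target attribute to
    # still change.
    if edit_lowres.shape != recon_lowres.shape:
        recon_lowres = F.interpolate(
            recon_lowres, size=edit_lowres.shape[-2:],
            mode='bilinear', align_corners=False)
    return F.mse_loss(edit_lowres, recon_lowres)
```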
  • For the mask perceptual loss function (masked perceptual loss), this embodiment uses a face keypoint network to extract keypoint features (such as eyes, nose, and mouth) from the two images output after target attribute editing and after reconstruction of the face training image, so as to obtain the first keypoint mask feature extracted after the face attribute editing model edits the target attribute and the second keypoint mask feature extracted after the face training image is reconstructed. The mask perceptual loss function is then used to analyze the mask similarity between the same keypoints, thereby constraining the similarity between the non-target attributes represented by the keypoint mask features extracted during target attribute editing and during reconstruction of the face training image.
  • The mask perceptual loss function in this embodiment can be:
  • Loss_mask = ‖VGG(m·G_transfer(x)) - VGG(m·G_rec(x))‖
  • VGG(m·G_transfer(x)) is the first keypoint mask feature extracted from the face training image after target attribute editing by the face attribute editing model;
  • VGG(m·G_rec(x)) is the second keypoint mask feature extracted after the face training image is reconstructed;
  • G_transfer(x) is the face edited image output after target attribute editing of the face training image;
  • G_rec(x) is the face reconstructed image output after the face training image is reconstructed;
  • m is the face keypoint mask.
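  The masked perceptual comparison can be sketched as below. The feature extractor is passed in as a parameter (the application names VGG; any feature network with the same call shape would do), and the function name and L1 distance are illustrative assumptions:

```python
import torch

def masked_perceptual_loss(feature_net, mask, edited, reconstructed):
    # Keep only the facial keypoint regions (eyes, nose, mouth) via
    # the mask m, extract features from both masked images, and
    # compare them so that non-target attributes stay close to the
    # reconstruction.
    feat_edit = feature_net(mask * edited)        # VGG(m * G_transfer(x))
    feat_rec = feature_net(mask * reconstructed)  # VGG(m * G_rec(x))
    return torch.nn.functional.l1_loss(feat_edit, feat_rec)
```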
  • During training, each face training image is continuously input into the face attribute editing model; the model parameters set in the face attribute editing model perform feature processing on the input face training image, and the input face training image is simultaneously reconstructed, so as to obtain the quantities required by the target adversarial loss function and the similarity loss function set in the face attribute editing model.
  • The loss result is then back-propagated into the face attribute editing model, so as to correspondingly correct the current model parameters in the face attribute editing model.
  • In the technical solution of this embodiment, the initial face attribute editing model is constructed according to the reconstruction parameters of the face training image, and the target adversarial loss function and the similarity loss function are preset in the face attribute editing model: the target adversarial loss function constrains the editing authenticity of the target attributes in the face training image, and the similarity loss function constrains the similarity between the non-target attributes of the face training image when it passes through the face attribute editing model and the non-target attributes when the face training image is reconstructed. The face attribute editing model is jointly trained by using the target adversarial loss function and the similarity loss function, so that when editing the target attribute in a face image the model realizes mutual constraints between the target attribute and the non-target attributes, avoiding changes to the background area and the non-target attributes while the target attribute is edited and keeping the non-target attributes unchanged, thereby improving the editing accuracy of the face attribute editing model for the target attributes.
  • FIG. 2A is a flowchart of a training method for a face attribute editing model provided in Embodiment 2 of the present application;
  • FIG. 2B is a schematic diagram of the principle of the training process of the face attribute editing model provided in Embodiment 2 of the present application;
  • FIG. 2C is a schematic structural diagram of the face attribute editing model provided in Embodiment 2 of the present application.
  • The StyleGAN network has a powerful generation ability and can generate real and natural images.
  • However, because the StyleGAN network is a model that maps random noise to images, it cannot directly accept real images as input.
  • The face attribute editing model in this embodiment can therefore include three parts, namely an image encoding network, a transfer decoding network, and the latent space of the transfer decoding network, so that the StyleGAN network can be used to accurately realize target attribute editing of face images.
  • The image encoding network (the pSp encoder) in the face attribute editing model of this embodiment can adopt a feature pyramid network structure: each face training image is input to it, and it outputs the latent variable features of the face training image into the latent space of the transfer decoding network.
  • The image encoding network based on the feature pyramid structure can map feature maps carrying different semantic information into the latent space and transform features of different granularities into multiple latent variable features, so that latent variable features with different semantic information are input to different layers of the transfer decoding network (the StyleGAN decoder); the transfer decoding network then takes the latent variable features from the latent space, edits the target attributes in the face training image, and outputs the corresponding face edited image, enhancing the image reconstruction ability.
  • The basic module of the image encoding network uses a residual module, which applies the input of the image encoding network to its output in the form of a shortcut, so that during backpropagation the gradient can be transmitted directly back to the shallow-layer parameters, effectively suppressing problems such as gradient vanishing.
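  A minimal sketch of such a residual module and a pyramid-style encoder that maps feature maps of different granularities to separate latent codes. The class names, channel counts, and latent dimension are illustrative assumptions, not the pSp encoder's actual architecture:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # Shortcut connection lets gradients flow directly back to the
    # shallow layers during backpropagation, mitigating vanishing
    # gradients.
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class PyramidEncoder(nn.Module):
    # Fine / medium / coarse feature maps are each mapped to a latent
    # code, intended to feed different layers of a StyleGAN-style
    # decoder.
    def __init__(self, latent_dim=512):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), ResidualBlock(32))
        self.down1 = nn.Sequential(
            nn.Conv2d(32, 64, 3, stride=2, padding=1), ResidualBlock(64))
        self.down2 = nn.Sequential(
            nn.Conv2d(64, 128, 3, stride=2, padding=1), ResidualBlock(128))
        self.to_latent = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(c, latent_dim))
            for c in (32, 64, 128)])

    def forward(self, x):
        f0 = self.stem(x)
        f1 = self.down1(f0)
        f2 = self.down2(f1)
        return [head(f) for head, f in zip(self.to_latent, (f0, f1, f2))]
```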
  • this embodiment may include the following steps:
  • To ensure the training efficiency of the face attribute editing model, before constructing the initial face attribute editing model according to the reconstruction parameters of the face training images, this embodiment first uses a large number of face images with as much diversity as possible as training samples to pre-train a corresponding reconstruction decoding network, so that the reconstruction decoding network has the ability to reconstruct face training images. Then, when constructing the initial face attribute editing model, the network parameters of the trained reconstruction decoding network can be directly used to construct the initial transfer decoding network, so that the transfer decoding network, through transfer learning based on the reconstruction decoding network, can be continuously trained into a face attribute editing model with the ability to edit target attributes.
  • In addition, a large number of face images can be used to train the corresponding image encoding network, so that the image encoding network can accurately analyze the latent variable features in the face training images.
  • The pre-trained image encoding network, the initial transfer decoding network, and the latent space of the transfer decoding network are combined to construct the initial face attribute editing model.
  • The training of the face attribute editing model in this embodiment is mainly aimed at training the transfer decoding network.
  • During training, each face training image is input into the image encoding network in the face attribute editing model; the image encoding network analyzes the features of the face training image, outputs its latent variable features, and writes each latent variable feature into the latent space of the transfer decoding network, realizing the transformation of the face training image from a real image into latent variables, so that the transfer decoding network can subsequently take the corresponding latent variable features from the latent space as accurate input.
  • This embodiment inputs the latent variable features in the latent space into the transfer decoding network and the reconstruction decoding network respectively.
  • The transfer decoding network performs the target attribute editing operation according to the latent variable features, thereby outputting a first face image associated with target attribute editing.
  • A corresponding image reconstruction operation is performed by the reconstruction decoding network according to the latent variable features, thereby outputting a second face image associated with reconstruction.
  • The first face image and the second face image serve as the loss analysis objects of the target adversarial loss function and the similarity loss function set in the face attribute editing model, from which the model loss of the transfer decoding network can subsequently be calculated in order to train the transfer decoding network.
  • After obtaining the first face image associated with target attribute editing output by the transfer decoding network and the second face image associated with reconstruction output by the reconstruction decoding network, the first face image can be directly substituted into the target adversarial loss function to analyze the editing loss for the target attributes in the face training image.
  • Substituting the first face image and the second face image into the similarity loss function allows analysis of the similarity loss between the non-target attributes of the face training image under target attribute editing and under reconstruction.
  • The editing loss and the similarity loss are back-propagated through the transfer decoding network to correct its network parameters and realize its training.
  • Each face training image is continuously input into the image encoding network and the above steps are executed cyclically, so as to continuously train the transfer decoding network until its loss function converges; from the trained transfer decoding network, the trained face attribute editing model is obtained.
  • Since the similarity loss function in this embodiment includes two types, the low-resolution perceptual loss function and the mask perceptual loss function, in order to ensure the training accuracy of the transfer decoding network, this embodiment distinguishes the loss objects of the low-resolution perceptual loss function and the mask perceptual loss function.
  • The first face image associated with target attribute editing output by the transfer decoding network may include the first intermediate low-resolution image output when the face training image passes through the transfer decoding network for target attribute editing and the face edited image output after the target attribute editing is performed; the second face image associated with reconstruction output by the reconstruction decoding network may include the second intermediate low-resolution image generated when the face training image is reconstructed through the reconstruction decoding network and the face reconstructed image output after the reconstruction is performed.
  • Substituting the first face image and the second face image into the similarity loss function to train the transfer decoding network may include: substituting the first intermediate low-resolution image and the second intermediate low-resolution image into the low-resolution perceptual loss function, and substituting the first keypoint mask feature extracted from the face edited image and the second keypoint mask feature extracted from the face reconstructed image into the mask perceptual loss function.
  • The low-resolution perceptual loss function is used to constrain the similarity between the first intermediate low-resolution image output when the face training image undergoes target attribute editing through the face attribute editing model and the second intermediate low-resolution image generated when the face training image is reconstructed; therefore, the first intermediate low-resolution image output by the transfer decoder during the target editing operation and the second intermediate low-resolution image output by the reconstruction decoder during the image reconstruction operation can be substituted into the low-resolution perceptual loss function to analyze the corresponding low-resolution image editing loss.
  • An existing feature extractor is used to extract the first keypoint mask feature from the face edited image and the second keypoint mask feature from the face reconstructed image, and the two are respectively substituted into the mask perceptual loss function to analyze the mask loss at each keypoint. Then, combining the loss result of the target adversarial loss function, the low-resolution image editing loss of the low-resolution perceptual loss function, and the mask loss of the mask perceptual loss function at each keypoint, the transfer decoder is jointly trained to obtain the trained face attribute editing model.
  • In the technical solution of this embodiment, the initial face attribute editing model is constructed according to the reconstruction parameters of the face training image, and the target adversarial loss function and the similarity loss function are preset in the face attribute editing model: the target adversarial loss function constrains the editing authenticity of the target attributes in the face training image, and the similarity loss function constrains the similarity between the non-target attributes of the face training image when it passes through the face attribute editing model and the non-target attributes when the face training image is reconstructed. The face attribute editing model is jointly trained by using the target adversarial loss function and the similarity loss function, so that when editing the target attribute in a face image the model realizes mutual constraints between the target attribute and the non-target attributes, avoiding changes to the background area and the non-target attributes while the target attribute is edited and keeping the non-target attributes unchanged, thereby improving the editing accuracy of the face attribute editing model for the target attributes.
  • FIG. 3A is a flow chart of a face attribute editing method provided in Embodiment 3 of the present application.
  • various attributes in any face image can be re-edited to change the face style.
  • the face attribute editing method provided in this embodiment can be executed by the face attribute editing device provided in the embodiment of the present application.
  • the device can be realized by means of software and/or hardware, and integrated in the electronic device that executes the method.
  • the method may include the following steps:
  • In this embodiment, a corresponding face attribute editing model is trained by adopting the training method of the face attribute editing model provided in the above embodiments, and the face attribute editing model has the ability to accurately edit target attributes. Therefore, the current face image to be edited is input into the trained face attribute editing model, which performs target attribute editing on the current face image and outputs the corresponding face edited image. Compared with the current face image, the face edited image has re-edited target attributes, while the non-target attributes can remain unchanged.
  • This embodiment also performs a post-processing optimization operation for target attribute editing on the face edited image output by the face attribute editing model.
  • Image segmentation is used to segment the mask area where the target attribute is located from the current face image, obtaining the corresponding target-domain mask map.
  • Hair segmentation can be performed on the current face image to obtain a corresponding hair mask map (denoted as m_hair), which is used as the target-domain mask map in this embodiment.
  • The target-domain mask map can fully represent the target attribute information in the current face image. Therefore, this embodiment can use the target-domain mask map to analyze non-target attribute information, such as the background area, in the edited face image, and then use the fusion result of the current face image and the target-domain mask map for the non-target attributes to perform image restoration on the edited face image, obtaining a face image whose target attribute has been edited. In this way, the non-target attributes in the final face image and in the current face image remain consistent, so the non-target attributes are kept unchanged while the target attribute is edited.
  • When performing image restoration on the edited face image in this embodiment, the process may be as follows: perform portrait segmentation on the edited face image to obtain a corresponding target portrait mask map; fuse the target portrait mask map with the current face image to obtain a corresponding face fusion image; use the difference set of the target-domain mask map with respect to the target portrait mask map to perform image repair on the face fusion image, obtaining a face image whose target attribute has been edited. That is, by performing portrait segmentation on the edited face image to obtain the corresponding target portrait mask map, the background area in the edited face image can be eliminated, preventing the background area from changing after target attribute editing. Taking editing a face to a bald head as an example, the edited face image can be segmented to obtain a corresponding bald-head portrait mask map (denoted as m_bald), which is used as the target portrait mask map in this embodiment.
  • This embodiment fuses the target portrait mask map with the current face image, thereby retaining the non-portrait area (such as the background area) of the current face image and combining it with the portrait area of the edited face image indicated by the target portrait mask map, to obtain the corresponding face fusion image.
  • This embodiment computes the difference set between the target-domain mask map and the target portrait mask map (that is, the difference set of m_hair with respect to m_bald), and then inputs this difference set together with the face fusion image into a pretrained image inpainting model. Based on the difference set between the target-domain mask map and the target portrait mask map, the image inpainting model performs image repair on the face fusion image, so as to obtain a face image whose target attribute has been edited.
  • After the current face image to be edited is input into the face attribute editing model trained in the above manner, the corresponding edited face image can be obtained; target segmentation is performed on the current face image to obtain the corresponding target-domain mask map, and the target-domain mask map is then used to perform image restoration on the edited face image, obtaining a face image whose target attribute has been edited.
  • Performing mask-based restoration on the edited face image output by the face attribute editing model can improve the accuracy of face target attribute editing, and ensures that the target attribute edited into the current face image looks authentic and natural.
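The fusion-and-repair pipeline above can be sketched numerically as follows. This is a hypothetical NumPy illustration, not the patent's implementation: `inpaint_fn` stands in for the pretrained image inpainting model, and the function name, signature, and array shapes are assumptions.

```python
import numpy as np

def repair_edited_face(current_img, edited_img, hair_mask, bald_mask, inpaint_fn):
    """Mask-based post-processing sketch for target attribute editing.

    current_img, edited_img: (H, W, 3) float arrays in [0, 1]
    hair_mask  (m_hair): 1 where the target attribute (hair) sits in the
                         current face image, 0 elsewhere
    bald_mask  (m_bald): 1 where the portrait sits in the edited face image
    inpaint_fn: stand-in for the pretrained image inpainting model
    """
    # Fusion: keep the edited portrait, restore the original background.
    portrait = bald_mask[..., None]
    fused = portrait * edited_img + (1.0 - portrait) * current_img

    # Difference set m_hair \ m_bald: former hair pixels no longer covered
    # by the edited portrait; these holes are filled by inpainting.
    diff = np.clip(hair_mask - bald_mask, 0.0, 1.0)
    return inpaint_fn(fused, diff)
```

With an identity stub for `inpaint_fn`, pixels inside the portrait mask come from the edited image and the background comes unchanged from the current image, which is exactly the invariance the post-processing is meant to enforce.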
  • FIG. 4 is a schematic structural diagram of a training device for a face attribute editing model provided in Embodiment 4 of the present application. As shown in FIG. 4 , the device may include:
  • The model construction module 410 is configured to construct an initial face attribute editing model according to reconstruction parameters of a face training image, where a target adversarial loss function and a similarity loss function are preset in the face attribute editing model; the target adversarial loss function is used to constrain the editing authenticity of a target attribute in the face training image, and the similarity loss function is used to constrain the similarity between the non-target attributes of the face training image when it passes through the face attribute editing model and the non-target attributes when the face training image is reconstructed;
  • The model training module 420 is configured to input the face training image into the face attribute editing model, and to train the face attribute editing model using the target adversarial loss function and the similarity loss function, obtaining a trained face attribute editing model.
  • The initial face attribute editing model is constructed according to the reconstruction parameters of the face training image, with the target adversarial loss function and the similarity loss function preset in the model, so that the target adversarial loss function constrains the editing authenticity of the target attributes in the face training image, and the similarity loss function constrains the similarity between the non-target attributes of the face training image when it passes through the face attribute editing model and the non-target attributes when the face training image is reconstructed.
  • The face attribute editing model is jointly trained using the target adversarial loss function and the similarity loss function, so that when editing the target attribute in a face image the model applies mutual constraints between the target attribute and the non-target attributes. This avoids changing the background area and non-target attributes while the target attribute is edited, ensures that the non-target attributes remain unchanged, and thereby improves the editing accuracy of the face attribute editing model for the target attribute.
  • the training device for the face attribute editing model provided in this embodiment can be applied to the training method for the face attribute editing model provided in any of the above embodiments, and has corresponding functions and beneficial effects.
  • FIG. 5 is a schematic structural diagram of a face attribute editing device provided in Embodiment 5 of the present application. As shown in FIG. 5, the device may include:
  • The preliminary editing module 510 is configured to input the current face image to be edited into the face attribute editing model trained by the training method of the face attribute editing model provided in the above embodiments, to obtain a corresponding edited face image;
  • The target segmentation module 520 is configured to perform target segmentation on the current face image to obtain a corresponding target-domain mask map;
  • the editing and repairing module 530 is configured to use the target domain mask map to perform image repair on the edited face image to obtain a face image with edited target attributes.
  • After the current face image to be edited is input into the face attribute editing model trained in the above manner, the corresponding edited face image can be obtained; target segmentation is performed on the current face image to obtain the corresponding target-domain mask map, and the target-domain mask map is then used to perform image restoration on the edited face image, obtaining a face image whose target attribute has been edited.
  • Performing mask-based restoration on the edited face image output by the face attribute editing model can improve the accuracy of face target attribute editing, and ensures that the target attribute edited into the current face image looks authentic and natural.
  • the face attribute editing device provided in this embodiment can be applied to the face attribute editing method provided in any of the above embodiments, and has corresponding functions and beneficial effects.
  • FIG. 6 is a schematic structural diagram of an electronic device provided in Embodiment 6 of the present application.
  • The electronic device includes a processor 60, a storage device 61, and a communication device 62. The number of processors 60 in the electronic device may be one or more; one processor 60 is taken as an example in FIG. 6. The processor 60, storage device 61, and communication device 62 in the electronic device may be connected through a bus or in other ways; a bus connection is taken as an example in FIG. 6.
  • An electronic device provided in this embodiment can be used to execute the training method of the face attribute editing model provided in any of the above embodiments, or the face attribute editing method, and has corresponding functions and beneficial effects.
  • Embodiment 7 of the present application also provides a computer-readable storage medium, on which a computer program is stored.
  • When the program is executed by a processor, it implements the training method of the face attribute editing model in any of the above embodiments, or the face attribute editing method.
  • For the storage medium containing computer-executable instructions provided in an embodiment of the present application, the computer-executable instructions are not limited to the method operations described above, and can also perform related operations in the training method of the face attribute editing model, or in the face attribute editing method, provided in any embodiment of the present application.
  • The present application can be implemented by means of software and necessary general-purpose hardware, and of course it can also be implemented by hardware, but in many cases the former is the better implementation.
  • The essence of the embodiments of the present application, or the part that contributes over the related technology, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disc, and includes several instructions to make a computer device (which can be a personal computer, a server, or a network device, etc.) execute the methods described in the embodiments of the present application.
  • the storage medium may be a non-transitory storage medium.
  • The units and modules included are only divided according to functional logic, but the division is not limited to the above, as long as the corresponding functions can be realized; in addition, the specific names of the functional units are only for the convenience of distinguishing them from each other, and are not used to limit the protection scope of the present application.


Abstract

The present application discloses face attribute editing model training and face attribute editing methods. The training method comprises: constructing an initial face attribute editing model according to a reconstruction parameter of a face training image, the face attribute editing model having a preset target adversarial loss function and a similarity loss function; inputting the face training image into the face attribute editing model, and training the face attribute editing model using the target adversarial loss function and the similarity loss function, to obtain a trained face attribute editing model.

Description

Training of Face Attribute Editing Model and Face Attribute Editing Method

This application claims priority to the Chinese patent application with application number 202111239087.4, filed with the China Patent Office on October 25, 2021, the entire contents of which are incorporated herein by reference.
Technical Field

The embodiments of the present application relate to the technical field of image processing, and for example relate to the training of a face attribute editing model and a face attribute editing method.
Background

Face attribute editing is an important technology in the field of computer vision, widely used in content production, film making, entertainment video, and the like, for example changing a face to a bald head, changing the hairstyle, or turning a face into a child's or a celebrity's. Given an input image containing a face and a target attribute to be edited, face attribute editing transforms the input image into a target-domain face image that has the target attribute, while ensuring that the other original attribute features in the face image remain unchanged.

A generative adversarial network (GAN) is usually pretrained to implement target attribute editing of face images. In this case, a single global loss function is set for the GAN to measure the difference between its output image and input image, and the GAN is trained with this global loss function so that the face images it outputs have the specific target attribute.

However, the global loss function imposes a correspondingly strong constraint between the input and output images of the GAN. As a result, although the target attribute in the face images output by the trained GAN can be edited, editing the target attribute may also change the background area and non-target attribute areas compared with the input image, making their updates look unnatural and greatly reducing the accuracy of face target attribute editing.
Summary

The embodiments of the present application provide a method for training a face attribute editing model and a face attribute editing method.

In a first aspect, an embodiment of the present application provides a method for training a face attribute editing model, the method including:

constructing an initial face attribute editing model according to reconstruction parameters of a face training image, where a target adversarial loss function and a similarity loss function are preset in the face attribute editing model; the target adversarial loss function is set to constrain the editing authenticity of a target attribute in the face training image, and the similarity loss function is set to constrain the similarity between the non-target attributes of the face training image when it passes through the face attribute editing model and the non-target attributes when the face training image is reconstructed;

inputting the face training image into the face attribute editing model, and training the face attribute editing model using the target adversarial loss function and the similarity loss function, to obtain a trained face attribute editing model.
In a second aspect, an embodiment of the present application provides a face attribute editing method, the method including:

inputting a current face image to be edited into a face attribute editing model trained by the training method of the face attribute editing model provided in the first aspect, to obtain a corresponding edited face image;

performing target segmentation on the current face image to obtain a corresponding target-domain mask map;

performing image restoration on the edited face image using the target-domain mask map, to obtain a face image whose target attribute has been edited.
In a third aspect, an embodiment of the present application provides a training device for a face attribute editing model, the device including:

a model construction module, configured to construct an initial face attribute editing model according to reconstruction parameters of a face training image, where a target adversarial loss function and a similarity loss function are preset in the face attribute editing model; the target adversarial loss function is set to constrain the editing authenticity of a target attribute in the face training image, and the similarity loss function is set to constrain the similarity between the non-target attributes of the face training image when it passes through the face attribute editing model and the non-target attributes when the face training image is reconstructed;

a model training module, configured to input the face training image into the face attribute editing model, and to train the face attribute editing model using the target adversarial loss function and the similarity loss function, to obtain a trained face attribute editing model.
In a fourth aspect, an embodiment of the present application provides a face attribute editing device, the device including:

a preliminary editing module, configured to input a current face image to be edited into a face attribute editing model trained by the training method of the face attribute editing model provided in the first aspect, to obtain a corresponding edited face image;

a target segmentation module, configured to perform target segmentation on the current face image to obtain a corresponding target-domain mask map;

an editing and repair module, configured to perform image restoration on the edited face image using the target-domain mask map, to obtain a face image whose target attribute has been edited.
In a fifth aspect, an embodiment of the present application provides an electronic device, including:

one or more processors;

a storage device, configured to store one or more programs;

where, when the one or more programs are executed by the one or more processors, the one or more processors implement the training method of the face attribute editing model provided in the first aspect, or the face attribute editing method provided in the second aspect.

In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the training method of the face attribute editing model provided in the first aspect, or the face attribute editing method provided in the second aspect.
Description of the Drawings

FIG. 1A is a flowchart of a method for training a face attribute editing model provided in Embodiment 1 of the present application;

FIG. 1B is a schematic diagram of the training process of the face attribute editing model provided in Embodiment 1 of the present application;

FIG. 2A is a flowchart of a method for training a face attribute editing model provided in Embodiment 2 of the present application;

FIG. 2B is a schematic diagram of the training process of the face attribute editing model provided in Embodiment 2 of the present application;

FIG. 2C is a schematic structural diagram of the face attribute editing model provided in Embodiment 2 of the present application;

FIG. 3A is a flowchart of a face attribute editing method provided in Embodiment 3 of the present application;

FIG. 3B is a schematic diagram of the face attribute editing process provided in Embodiment 3 of the present application;

FIG. 4 is a schematic structural diagram of a training device for a face attribute editing model provided in Embodiment 4 of the present application;

FIG. 5 is a schematic structural diagram of a face attribute editing device provided in Embodiment 5 of the present application;

FIG. 6 is a schematic structural diagram of an electronic device provided in Embodiment 6 of the present application.
Detailed Description

The present application is described below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present application rather than the full structure. In addition, the embodiments in the present application and the features in the embodiments can be combined with each other where no conflict arises.
Embodiment 1

FIG. 1A is a flowchart of a method for training a face attribute editing model provided in Embodiment 1 of the present application. In this embodiment, various attributes in any face image can be re-edited to change the face style. The training method of the face attribute editing model provided in this embodiment can be executed by the training device for the face attribute editing model provided in the embodiments of the present application; the device can be implemented in software and/or hardware and integrated into the electronic device that executes the method.

Referring to FIG. 1A, the method may include the following steps:
S110: Construct an initial face attribute editing model according to reconstruction parameters of a face training image, where a target adversarial loss function and a similarity loss function are preset in the face attribute editing model.

For example, when training a network model for face attribute editing, a preset global loss function is usually used to analyze the difference between the output image and the input image, so as to train the network model to accurately edit the target attribute in the face image. However, because the global loss function imposes a strong constraint between the input and output images, editing the target attribute may also change the background area and non-target attributes, making the edited output image insufficiently realistic and natural.

Therefore, in order to accurately re-edit the various attributes in face images and meet users' diverse needs for face attribute editing, this embodiment builds a dedicated face attribute editing model for natural and realistic editing of the target attribute in all kinds of face images. The target attribute can be a specific key-point feature to be edited in the face image, determined according to the user's editing requirements; for example, when the user wants to re-edit the portrait in the face image into a bald head, the target attribute is the hair region feature of the face image.
It should be noted that, to ensure the training efficiency of the face attribute editing model, and considering that face image reconstruction methods are widely available and that target attribute editing of a face image also performs similar reconstruction steps before the corresponding target attribute editing, this embodiment can preliminarily construct the initial face attribute editing model on the basis of face image reconstruction information, so that the initial face attribute editing model can realize corresponding face image reconstruction. Subsequently, by means of transfer learning, the reconstruction capability of the face attribute editing model can be continuously adapted, so that the model acquires the ability to accurately edit the target attribute.

For example, when constructing the initial face attribute editing model, a large number of face training images are first obtained as training samples for the model. Then, each face training image can be reconstructed by an existing face image reconstruction method to determine the reconstruction parameters used when the face training images are reconstructed, and the initial face attribute editing model is constructed according to these reconstruction parameters, so that it has the ability of face image reconstruction for subsequent transfer learning of target attribute editing.
Moreover, to avoid the problems caused by a global loss function imposing a strong constraint between the input and output images, this embodiment presets two loss functions in the face attribute editing model: a target adversarial loss function and a similarity loss function. The target adversarial loss function constrains the editing authenticity of the target attribute in the face training image, and the similarity loss function constrains the similarity between the non-target attributes of the face training image when it passes through the face attribute editing model and the non-target attributes when the face training image is reconstructed.

That is, for the target attribute to be edited and the non-target attributes that need not be edited in the face image, this embodiment sets different loss functions in the face attribute editing model, so that during training the editing of target attributes and non-target attributes in the face training image is guided separately. The target adversarial loss function constrains the editing authenticity of the target attribute in the face training image, ensuring that the target attribute is edited accurately; the similarity loss function constrains the similarity between the non-target attributes of the face training image after passing through the face attribute editing model and the non-target attributes after reconstruction, ensuring that editing the target attribute does not affect the non-target attributes, so that they remain unchanged before and after editing as far as possible.
示例性的,本实施例中的人脸属性编辑模型可以采用生成对抗(如StyleGAN)模型。此时,目标对抗损失函数(GAN Loss)可以为带有梯度惩罚的WGAN-GP函数,用于指引人脸属性编辑模型往目标属性靠拢。其中,该WGAN-GP函数将现有GAN网络中两个概率分布的度量替换为Wasserstein距离,使得人脸属性编辑模型的训练过程更加稳定。而且,WGAN-GP函数在现有损失函数的基础上进行优化,添加了正则项GP(gradient penalty),用于约束要求对判别器输入梯度的L2范数要约束在1附近。本实施例中的目标对抗损失函数可以为:Exemplarily, the face attribute editing model in this embodiment can adopt a generative confrontation (such as StyleGAN) model. At this time, the target confrontation loss function (GAN Loss) can be a WGAN-GP function with a gradient penalty, which is used to guide the face attribute editing model to move closer to the target attribute. Among them, the WGAN-GP function replaces the measures of the two probability distributions in the existing GAN network with the Wasserstein distance, making the training process of the face attribute editing model more stable. Moreover, the WGAN-GP function is optimized on the basis of the existing loss function, and the regular term GP (gradient penalty) is added to constrain the L2 norm of the input gradient of the discriminator to be constrained near 1. The target adversarial loss function in this embodiment can be:
Loss_G = −E[D(G(x))]
Loss_D = E[D(G(x))] − E[D(x)] + λ·E[(‖∇_y D(y)‖_2 − 1)²]
where D is the discriminator of the generative adversarial network adopted by the face attribute editing model, G is its generator, x is the face training image fed to the generator in the face attribute editing model, and y is the image output by the generator after editing the target attributes in the face training image, which is fed to the discriminator in the face attribute editing model for authenticity judgment.
The generative adversarial network consists of two parts, a generator and a discriminator. Loss_G is the loss function of the generator, and Loss_D is the loss function of the discriminator.
Here, ∇_y D(y) denotes the gradient of D(y) with respect to y, and λ is a weight parameter that controls the proportion of the gradient-penalty term in the overall loss function.
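As a hedged illustration of the WGAN-GP objectives above, the sketch below uses a toy linear discriminator D(y) = w·y, whose gradient with respect to its input is exactly w, so the gradient-penalty term can be computed in closed form. The linear discriminator and all names here are illustrative assumptions; the patent's actual networks are neural networks trained with automatic differentiation.

```python
import math

# Toy sketch of the WGAN-GP losses described above, assuming a linear
# discriminator D(y) = w . y so that grad_y D(y) = w exactly.
# Real models would use neural networks and autograd instead.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def discriminator(w, y):
    return dot(w, y)

def grad_norm(w):
    # L2 norm of grad_y D(y); for a linear D it equals ||w||.
    return math.sqrt(dot(w, w))

def loss_G(w, fake):
    # Generator loss: -E[D(G(x))], here a single-sample estimate.
    return -discriminator(w, fake)

def loss_D(w, real, fake, lam=10.0):
    # Discriminator loss: E[D(y)] - E[D(x)] + lam * (||grad|| - 1)^2
    gp = (grad_norm(w) - 1.0) ** 2
    return discriminator(w, fake) - discriminator(w, real) + lam * gp

w = [0.6, 0.8]     # ||w|| = 1, so the gradient penalty vanishes here
real = [1.0, 2.0]  # stands in for a real face image x
fake = [0.5, 1.0]  # stands in for the edited image y = G(x)

# Both losses equal -1.1 here, up to float rounding.
print(loss_G(w, fake))
print(loss_D(w, real, fake))
```

When ‖w‖ drifts away from 1, the λ-weighted penalty term grows, which is exactly the pressure that keeps the discriminator 1-Lipschitz during training.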
Moreover, in one embodiment, when the face training image passes through the face attribute editing model for target attribute editing, or during image reconstruction, residual skip connections are applied over RGB images at different resolutions, so intermediate products, i.e., intermediate low-resolution images, are output at the different resolutions. To ensure that the face attribute editing model handles the non-target attributes of the face training image accurately, this embodiment divides the similarity loss function into two kinds: a low-resolution perceptual loss function and a masked perceptual loss function.
The low-resolution perceptual loss function constrains the similarity between the first intermediate low-resolution image output when the face training image undergoes target attribute editing by the face attribute editing model and the second intermediate low-resolution image generated when the face training image is reconstructed; the masked perceptual loss function constrains the similarity between the first keypoint mask feature extracted after the face training image undergoes target attribute editing by the face attribute editing model and the second keypoint mask feature extracted after the face training image is reconstructed.
For example, this embodiment uses a low-resolution perceptual loss to constrain the similarity of the intermediate low-resolution images produced during target attribute editing and during reconstruction of the face training image. This imposes a similarity constraint between the input and output of the face attribute editing model without being so strong that the target attribute can no longer be edited. The low-resolution perceptual loss of this embodiment may be: low-resolution perceptual loss = (VGG(G_rec_l(x)) − VGG(G_transfer_l(x)))², where G_transfer_l(x) is the first intermediate low-resolution image output when the face training image undergoes target attribute editing by the face attribute editing model, G_rec_l(x) is the second intermediate low-resolution image generated when the face training image is reconstructed, and VGG is a pre-trained feature extractor used to extract the features of the two intermediate low-resolution images.
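A minimal sketch of this loss, with a trivial stand-in "feature extractor" (a global average) in place of the pre-trained VGG network named in the text; the function names and toy 2×2 images are illustrative assumptions, not the patent's implementation:

```python
# Sketch of the low-resolution perceptual loss: squared distance between
# the features of the two intermediate low-resolution images.

def features(image):
    # Stand-in for VGG(.): collapse the image into one scalar feature.
    flat = [p for row in image for p in row]
    return sum(flat) / len(flat)

def low_resolution_perceptual_loss(rec_l, transfer_l):
    # (VGG(G_rec_l(x)) - VGG(G_transfer_l(x)))^2
    return (features(rec_l) - features(transfer_l)) ** 2

rec_l = [[0.0, 0.5], [0.5, 1.0]]       # second intermediate low-res image
transfer_l = [[0.0, 0.5], [0.5, 1.0]]  # first intermediate low-res image

# Identical intermediate images give zero loss...
print(low_resolution_perceptual_loss(rec_l, transfer_l))
# ...and diverging ones give a positive loss.
print(low_resolution_perceptual_loss(rec_l, [[1.0, 1.0], [1.0, 1.0]]))
```

Because the constraint is applied only at low resolution, fine-grained changes in the target attribute region survive while coarse structure is held close to the reconstruction.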
Moreover, if the final output of the face attribute editing model is not constrained with respect to the non-target attributes, the non-target attributes in the output image cannot keep the pose of the input image, causing a large deviation from the input image. Therefore, this embodiment uses a masked perceptual loss: a face keypoint network extracts keypoint features for the face keypoints (e.g., eyes, nose, mouth) in the two output images obtained after target attribute editing and after reconstruction of the face training image, yielding the first keypoint mask feature extracted after the face training image undergoes target attribute editing by the face attribute editing model and the second keypoint mask feature extracted after the face training image is reconstructed; the masked perceptual loss function then measures the mask similarity between corresponding keypoints, thereby constraining the similarity between the non-target attributes represented by the keypoint mask features extracted after target attribute editing and after reconstruction. The masked perceptual loss function in this embodiment may be:
masked perceptual loss = (VGG(m·G_rec(x)) − VGG(m·G_transfer(x)))²
where VGG(m·G_transfer(x)) is the first keypoint mask feature extracted after the face training image undergoes target attribute editing by the face attribute editing model, VGG(m·G_rec(x)) is the second keypoint mask feature extracted after the face training image is reconstructed, G_transfer(x) is the face editing image output after target attribute editing of the face training image, G_rec(x) is the face reconstruction image output after reconstruction of the face training image, and m is the face keypoint mask.
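A hedged sketch of the masked term: a binary keypoint mask m is applied elementwise before feature extraction, so only keypoint regions (eyes, nose, mouth, etc.) contribute to the loss. The mask values, images, and the averaging "extractor" are illustrative stand-ins for the keypoint network and VGG features:

```python
# Sketch of the masked perceptual loss: (VGG(m.G_rec(x)) - VGG(m.G_transfer(x)))^2

def masked(m, image):
    # m . G(x): keep only pixels under the keypoint mask.
    return [[mi * pi for mi, pi in zip(mr, pr)] for mr, pr in zip(m, image)]

def features(image):
    # Stand-in for the VGG feature extractor: global average.
    flat = [p for row in image for p in row]
    return sum(flat) / len(flat)

def masked_perceptual_loss(m, rec, transfer):
    return (features(masked(m, rec)) - features(masked(m, transfer))) ** 2

m = [[1, 0], [0, 1]]                 # keypoint mask
rec = [[0.2, 0.9], [0.9, 0.4]]       # face reconstruction image G_rec(x)
transfer = [[0.2, 0.1], [0.1, 0.4]]  # face editing image G_transfer(x)

# The two images differ only off-mask, so the masked loss is zero:
print(masked_perceptual_loss(m, rec, transfer))
```

This illustrates why the constraint does not block target attribute editing: edits that fall outside the keypoint mask (here, the off-mask pixels) incur no masked perceptual penalty.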
S120. Input the face training image into the face attribute editing model, and train the face attribute editing model using the target adversarial loss function and the similarity loss function to obtain a trained face attribute editing model.
After the initial face attribute editing model is constructed, face training images are continuously fed into it. The model parameters set in the face attribute editing model perform feature processing on each input face training image while the input image is also reconstructed, thereby obtaining the quantities required by the target adversarial loss function and the similarity loss function set in the model, from which the loss values of the two functions are computed. These loss values are then back-propagated through the face attribute editing model to correct its current model parameters. The next face training image is then fed into the corrected face attribute editing model, and its model parameters are corrected again in the same way, and so on in a loop, until the face attribute editing model can accurately edit the target attributes while keeping the non-target attributes unchanged, yielding the trained face attribute editing model.
In this embodiment, an initial face attribute editing model is constructed according to the reconstruction parameters of the face training images, and a target adversarial loss function and a similarity loss function are preset in the model: the target adversarial loss function constrains the realism of editing the target attributes in the face training image, and the similarity loss function constrains the similarity between the non-target attributes of the face training image after passing through the face attribute editing model and the non-target attributes of the face training image after reconstruction. The two loss functions are then used together to train the face attribute editing model, so that when the model edits the target attributes of a face image it jointly constrains the target and non-target attributes, avoiding changes to the background region and the non-target attributes while the target attributes are edited, and ensuring that the non-target attributes remain invariant during editing, thereby improving the model's editing accuracy for the target attributes.
Embodiment Two
FIG. 2A is a flowchart of a training method for a face attribute editing model provided in Embodiment 2 of the present application, FIG. 2B is a schematic diagram of the principle of the training process of the face attribute editing model provided in Embodiment 2, and FIG. 2C is a schematic structural diagram of the face attribute editing model provided in Embodiment 2. This embodiment is an adjustment on the basis of the above embodiments. For example, as shown in FIG. 2B, considering that the StyleGAN network has a powerful generation ability and can generate realistic, natural images, but is a model that maps random noise to images and cannot directly accept real images as input, the face attribute editing model in this embodiment may include three parts, an image encoding network, a transfer decoding network, and the latent space of the transfer decoding network, so that a StyleGAN network can be used to accurately edit the target attributes of a face image.
As shown in FIG. 2C, the image encoding network (the pSp encoder) in the face attribute editing model of this embodiment may adopt a feature-pyramid network structure: it takes each face training image as input and outputs the latent variable features of the face training image to the latent space of the transfer decoding network. The image encoding network based on the feature pyramid can map feature maps carrying different semantic information into the latent space, converting features of different granularities into multiple latent variable features, which are fed into different layers of the transfer decoding network (the StyleGAN decoder); the transfer decoding network takes the latent variable features in the latent space as input, edits the target attributes in the face training image, and outputs the corresponding face editing image, thereby enhancing the image reconstruction capability. Moreover, the basic module of the image encoding network uses a residual module, which applies the input of the image encoding network to its output as a shortcut, so that gradients can flow directly back to the shallow parameters during back-propagation, effectively suppressing problems such as vanishing gradients.
In one embodiment, as shown in FIG. 2A, the method may include the following steps:
S210. Construct an initial transfer decoding network according to the network parameters of a trained reconstruction decoding network.
To ensure the efficiency of the face attribute editing model, before constructing the initial face attribute editing model according to the reconstruction parameters of the face training images, this embodiment first uses a large number of face images with diversity as strong as possible as training samples to pre-train a corresponding reconstruction decoding network, so that the reconstruction decoding network is capable of reconstructing face training images. Then, when constructing the initial face attribute editing model, the network parameters of the trained reconstruction decoding network can be used directly to construct the initial transfer decoding network, so that the transfer decoding network can be continuously trained by transfer learning on the basis of the reconstruction decoding network to finally obtain a face attribute editing model capable of editing the target attributes.
S220. Construct an initial face attribute editing model from the pre-trained image encoding network, the initial transfer decoding network, and the latent space of the transfer decoding network.
Exemplarily, after the initial transfer decoding network is constructed, a large number of face images can be used to train the corresponding image encoding network, so that the image encoding network can accurately analyze the latent variable features of face training images. Then, as shown in FIG. 2B, the pre-trained image encoding network, the initial transfer decoding network, and the latent space of the transfer decoding network together form the initial face attribute editing model. Since the image encoding network in the face attribute editing model is pre-trained, the training of the face attribute editing model in this embodiment mainly targets the transfer decoding network; once the transfer decoding network is trained, the trained face attribute editing model is obtained.
S230. Input the face training image into the image encoding network, and output the latent variable features of the face training image to the latent space.
After the initial face attribute editing model is constructed, it is trained. As shown in FIG. 2B, each face training image is input into the image encoding network of the face attribute editing model; the image encoding network analyzes the features of the face training image, outputs its latent variable features, and then outputs the latent variable features into the latent space of the transfer decoding network, converting the face training image from a real image into latent variables, so that the transfer decoding network can subsequently read the corresponding latent variable features from the latent space, ensuring the accuracy of the transfer decoding network's input.
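The feature-pyramid encoding described above can be sketched as mapping each pyramid level to one latent code and routing the codes to different decoder layers. The averaging projection and the three tiny "feature maps" below are hypothetical stand-ins for the pSp encoder's learned mappings, not the patent's actual structure:

```python
# Sketch: each pyramid level (coarse / medium / fine semantics) is mapped
# to a latent variable feature, and each latent code feeds a different
# layer of the transfer decoding network.

def map_to_latent(feature_map):
    # Stand-in for the learned map-to-latent projection.
    flat = [v for row in feature_map for v in row]
    return sum(flat) / len(flat)

pyramid = [
    [[0.5, 0.5], [0.5, 0.5]],          # coarse semantics
    [[0.25, 0.25], [0.25, 0.25]],      # medium
    [[0.125, 0.125], [0.125, 0.125]],  # fine detail
]

latents = [map_to_latent(f) for f in pyramid]  # one code per level
print(latents)  # -> [0.5, 0.25, 0.125]

# Each latent code is routed to a different decoder layer:
routing = {f"decoder_layer_{i}": w for i, w in enumerate(latents)}
print(routing)
```

The point of the sketch is the routing, not the arithmetic: different semantic granularities end up controlling different layers of the decoder.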
S240. Input the latent variable features into the transfer decoding network to obtain a first face image associated with target attribute editing; input the latent variable features into the trained reconstruction decoding network to obtain a second face image associated with reconstruction.
Considering the objects constrained by the target adversarial loss function and the similarity loss function set in the face attribute editing model, this embodiment inputs the latent variable features in the latent space into the transfer decoding network and the reconstruction decoding network separately. The transfer decoding network performs the target attribute editing operation according to the latent variable features and outputs the first face image associated with target attribute editing; the reconstruction decoding network performs the corresponding image reconstruction operation according to the latent variable features and outputs the second face image associated with reconstruction. The first face image and the second face image then serve as the loss-analysis objects of the target adversarial loss function and the similarity loss function set in the face attribute editing model, so that the model loss of the transfer decoding network can subsequently be computed in order to train it.
S250. Substitute the first face image into the target adversarial loss function, substitute the first face image and the second face image into the similarity loss function, and train the transfer decoding network to obtain a trained face attribute editing model.
Exemplarily, after obtaining the first face image associated with target attribute editing output by the transfer decoding network and the second face image associated with reconstruction output by the reconstruction decoding network, the first face image can be substituted directly into the target adversarial loss function to analyze the editing loss on the target attributes of the face training image. Meanwhile, substituting the first face image and the second face image into the similarity loss function analyzes the similarity loss between the non-target attributes of the face training image under target attribute editing and under reconstruction. The editing loss and the similarity loss are then back-propagated together through the transfer decoding network to correct its network parameters and thereby train it. Each face training image is continuously input into the image encoding network and the above steps are executed in a loop, so that the transfer decoding network is trained until its loss function converges, yielding a fully trained transfer decoding network and thus the trained face attribute editing model.
It should be noted that since the similarity loss function in this embodiment includes both a low-resolution perceptual loss function and a masked perceptual loss function, in order to ensure the training accuracy of the transfer decoding network, this embodiment distinguishes the loss objects of the two functions. The first face image associated with target attribute editing output by the transfer decoding network may include the first intermediate low-resolution image output during target attribute editing and the face editing image output after target attribute editing; the second face image associated with reconstruction output by the reconstruction decoding network may include the second intermediate low-resolution image generated during reconstruction and the face reconstruction image output after reconstruction.
Therefore, in this embodiment, substituting the first face image and the second face image into the similarity loss function to train the transfer decoding network may include: substituting the first intermediate low-resolution image and the second intermediate low-resolution image into the low-resolution perceptual loss function, and substituting the first keypoint mask feature extracted from the face editing image and the second keypoint mask feature extracted from the face reconstruction image into the masked perceptual loss function, to train the transfer decoding network.
That is, since the low-resolution perceptual loss function constrains the similarity between the first intermediate low-resolution image output during target attribute editing and the second intermediate low-resolution image generated during reconstruction, the first intermediate low-resolution image output by the transfer decoder while performing the target editing operation and the second intermediate low-resolution image output by the reconstruction decoder while performing the image reconstruction operation can be substituted into the low-resolution perceptual loss function to analyze the corresponding low-resolution image editing loss. Meanwhile, for the face editing image output by the transfer decoder after the target editing operation and the face reconstruction image output by the reconstruction decoder after the image reconstruction operation, an existing feature extractor is used to extract the first keypoint mask feature from the face editing image and the second keypoint mask feature from the face reconstruction image, and the two are substituted into the masked perceptual loss function to analyze the mask loss at each keypoint. Then, the loss of the target adversarial loss function, the low-resolution image editing loss of the low-resolution perceptual loss function, and the mask losses of the masked perceptual loss function at the keypoints are combined to train the transfer decoder, yielding the trained face attribute editing model.
In this embodiment, an initial face attribute editing model is constructed according to the reconstruction parameters of the face training images, and a target adversarial loss function and a similarity loss function are preset in the model: the target adversarial loss function constrains the realism of editing the target attributes in the face training image, and the similarity loss function constrains the similarity between the non-target attributes of the face training image after passing through the face attribute editing model and the non-target attributes of the face training image after reconstruction. The two loss functions are then used together to train the face attribute editing model, so that when the model edits the target attributes of a face image it jointly constrains the target and non-target attributes, avoiding changes to the background region and the non-target attributes while the target attributes are edited, and ensuring that the non-target attributes remain invariant during editing, thereby improving the model's editing accuracy for the target attributes.
Embodiment Three
FIG. 3A is a flowchart of a face attribute editing method provided in Embodiment 3 of the present application. This embodiment can re-edit various attributes in any face image to change the face style. The face attribute editing method provided in this embodiment can be executed by the face attribute editing apparatus provided in the embodiments of the present application; the apparatus can be implemented in software and/or hardware and integrated into the electronic device that executes the method.
Referring to FIG. 3A, the method may include the following steps:
S310. Input the current face image to be edited into a face attribute editing model trained with the training method for a face attribute editing model provided in the above embodiments, to obtain the corresponding face editing image.
For example, a corresponding face attribute editing model is trained using the training method for a face attribute editing model provided in the above embodiments, so that the model is capable of accurately editing the target attributes. Therefore, in this embodiment, the current face image to be edited is input into the trained face attribute editing model, which performs target attribute editing on the current face image and outputs the corresponding face editing image; compared with the current face image, the target attributes in the face editing image have been re-edited while the non-target attributes remain unchanged.
S320. Perform target segmentation on the current face image to obtain the corresponding target domain mask map.
Since the target-attribute editing capability of the face attribute editing model depends on its training results, in order to further ensure that the target attribute editing of the current face image is realistic and natural, this embodiment also performs a post-processing optimization operation, for the target attribute editing, on the face editing image output by the face attribute editing model.
For example, image segmentation is first used to segment the mask region where the target attribute is located out of the current face image, yielding the corresponding target domain mask map. Taking editing to a bald head as an example of target attribute editing, hair segmentation can be performed on the current face image to obtain the corresponding hair mask map (denoted m_hair) as the target domain mask map in this embodiment.
S330. Perform image inpainting on the face editing image using the target domain mask map to obtain a face image with the target attribute editing completed.
Since the features of the face editing image in non-target attribute regions such as the face background are usually inconsistent with the current face image, while the target domain mask map can completely represent the target attribute information in the current face image, this embodiment can use the target domain mask map to analyze the non-target attribute information, such as the background region, in the face editing image, and then use the fusion of the current face image and the target domain mask map over the non-target attributes to inpaint the face editing image, obtaining a face image with the target attribute editing completed. In this way, the non-target attributes of the final face image match those of the current face image, so that the non-target attributes remain invariant while the target attributes are edited.
Exemplarily, when performing image inpainting on the face editing image, this embodiment may: perform portrait segmentation on the face editing image to obtain the corresponding target portrait mask map; fuse the target portrait mask map with the current face image to obtain the corresponding face fusion image; and use the set difference of the target domain mask map with respect to the target portrait mask map to inpaint the face fusion image, obtaining a face image with the target attribute editing completed. That is, by performing portrait segmentation on the face editing image to obtain the corresponding target portrait mask map, the background region in the face editing image can be excluded, preventing the background from changing after target attribute editing. Taking editing to a bald head as an example, portrait segmentation can be performed on the face editing image to obtain the corresponding bald-head portrait mask map (denoted m_bald) as the target portrait mask map in this embodiment.
Then, as shown in FIG. 3B, to ensure that the non-target attributes are unchanged before and after target attribute editing, this embodiment fuses the target portrait mask map with the current face image: the non-portrait region of the current face image (for example, the background region) is retained and combined with the region of the edited face image selected by the target portrait mask map, yielding the corresponding face fusion image. Finally, because some regions covered by the target-domain mask map become background after the target attribute is edited, this embodiment computes the set difference of the target-domain mask map with respect to the target portrait mask map (that is, the difference of m_hair with respect to m_bald), and then feeds this difference set together with the face fusion image into a pre-trained image inpainting model. The inpainting model repairs the face fusion image according to that difference set, producing the face image in which the target attribute has been edited.
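The fusion and set-difference steps above reduce to simple mask arithmetic. The following is a minimal NumPy sketch, not the embodiment's actual implementation; the function name is illustrative, and the pre-trained inpainting model is treated as an external component that consumes the two outputs.

```python
import numpy as np

def prepare_inpainting_inputs(current_img, edited_img, m_bald, m_hair):
    """Fuse the edited portrait with the original background and compute
    the region that the inpainting model must repair (illustrative sketch)."""
    # Face fusion image: portrait pixels (per the bald-head portrait mask
    # m_bald) come from the edited image; everything else, e.g. the
    # background, is kept from the current face image.
    fused = np.where(m_bald[..., None].astype(bool), edited_img, current_img)
    # Set difference m_hair \ m_bald: pixels that belonged to the hair
    # region of the source but are no longer portrait after editing; these
    # are the holes the inpainting model is asked to fill.
    diff = np.clip(m_hair.astype(np.int8) - m_bald.astype(np.int8), 0, 1)
    return fused, diff
```

The `fused` image and the `diff` mask would then be passed jointly to the pre-trained inpainting model, matching the data flow described above.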
In this embodiment, the current face image to be edited is input into the face attribute editing model trained in the above manner to obtain the corresponding edited face image; target segmentation is performed on the current face image to obtain the corresponding target-domain mask map; and the target-domain mask map is then used to inpaint the edited face image, yielding a face image in which the target attribute has been edited. Applying this mask-based inpainting to the output of the face attribute editing model improves the accuracy of face target attribute editing and ensures that the current face image looks real and natural after the target attribute is edited.
Embodiment Four
FIG. 4 is a schematic structural diagram of a training apparatus for a face attribute editing model provided in Embodiment Four of the present application. As shown in FIG. 4, the apparatus may include:
a model construction module 410, configured to construct an initial face attribute editing model according to reconstruction parameters of a face training image, wherein a target adversarial loss function and a similarity loss function are preset in the face attribute editing model; the target adversarial loss function constrains the editing realism of the target attribute in the face training image, and the similarity loss function constrains the similarity between the non-target attributes of the face training image after passing through the face attribute editing model and the non-target attributes of the face training image after reconstruction; and
a model training module 420, configured to input the face training image into the face attribute editing model and train the face attribute editing model using the target adversarial loss function and the similarity loss function, to obtain a trained face attribute editing model.
In this embodiment, an initial face attribute editing model is constructed according to the reconstruction parameters of the face training image, and a target adversarial loss function and a similarity loss function are preset in the model: the target adversarial loss function constrains the editing realism of the target attribute in the face training image, and the similarity loss function constrains the similarity between the non-target attributes of the face training image after passing through the model and the non-target attributes of the reconstructed face training image. The two loss functions then jointly train the face attribute editing model, so that when editing the target attribute of a face image the model constrains the target and non-target attributes together. This avoids the problem of the background region and non-target attributes changing along with the edited target attribute, ensures that the non-target attributes remain invariant while the target attribute is edited, and thereby improves the editing accuracy of the face attribute editing model for the target attribute.
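The joint objective described above can be sketched as an adversarial term plus a weighted similarity term. The concrete loss forms used below (a softplus non-saturating adversarial loss, L1 distances for the two perceptual terms, and the weight `lam`) are assumptions for illustration only; the embodiment does not fix them here.

```python
import numpy as np

def similarity_loss(lr_edit, lr_recon, mask_feat_edit, mask_feat_recon):
    # Low-resolution perceptual term: intermediate low-resolution output of
    # the editing branch vs. that of the reconstruction branch.
    l_lowres = np.abs(lr_edit - lr_recon).mean()
    # Mask perceptual term: keypoint-mask features of the edited face
    # vs. those of the reconstructed face.
    l_mask = np.abs(mask_feat_edit - mask_feat_recon).mean()
    return l_lowres + l_mask

def generator_objective(d_logits_fake, lr_edit, lr_recon,
                        mask_feat_edit, mask_feat_recon, lam=10.0):
    # Adversarial term (assumed non-saturating form): pushes the
    # discriminator to score edited faces as real, constraining realism.
    l_adv = np.logaddexp(0.0, -d_logits_fake).mean()  # softplus(-logits)
    # Similarity term: keeps non-target attributes close to the
    # reconstruction branch, as described above.
    return l_adv + lam * similarity_loss(lr_edit, lr_recon,
                                         mask_feat_edit, mask_feat_recon)
```

In a real training loop this scalar would be minimized over the migration decoding network's parameters by backpropagation; the discriminator supplying `d_logits_fake` would be trained in alternation.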
The training apparatus for a face attribute editing model provided in this embodiment is applicable to the training method for a face attribute editing model provided in any of the above embodiments, and has the corresponding functions and beneficial effects.
Embodiment Five
FIG. 5 is a schematic structural diagram of a face attribute editing apparatus provided in Embodiment Five of the present application. As shown in FIG. 5, the apparatus may include:
a preliminary editing module 510, configured to input a current face image to be edited into a face attribute editing model trained by the training method for a face attribute editing model provided in the above embodiments, to obtain a corresponding edited face image;
a target segmentation module 520, configured to perform target segmentation on the current face image, to obtain a corresponding target-domain mask map; and
an editing repair module 530, configured to perform image inpainting on the edited face image using the target-domain mask map, to obtain a face image in which the target attribute has been edited.
In this embodiment, the current face image to be edited is input into the face attribute editing model trained in the above manner to obtain the corresponding edited face image; target segmentation is performed on the current face image to obtain the corresponding target-domain mask map; and the target-domain mask map is then used to inpaint the edited face image, yielding a face image in which the target attribute has been edited. Applying this mask-based inpainting to the output of the face attribute editing model improves the accuracy of face target attribute editing and ensures that the current face image looks real and natural after the target attribute is edited.
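The three modules of the editing apparatus can be wired together as in the following sketch. The editing model, target segmentation, and inpainting are treated as opaque callables supplied by the caller; all names here are illustrative rather than taken from the embodiment.

```python
class FaceAttributeEditor:
    """Minimal wiring of the preliminary-editing, target-segmentation and
    editing-repair stages described above (illustrative sketch)."""

    def __init__(self, edit_model, segment_target, inpaint):
        self.edit_model = edit_model          # trained face attribute editing model
        self.segment_target = segment_target  # target segmentation (e.g. hair region)
        self.inpaint = inpaint                # mask-guided image repair

    def edit(self, current_face):
        edited = self.edit_model(current_face)           # preliminary editing
        target_mask = self.segment_target(current_face)  # target-domain mask map
        return self.inpaint(edited, target_mask)         # final repaired image
```

Keeping the three stages as injected callables mirrors the module split of FIG. 5 and lets each stage be replaced or tested independently.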
The face attribute editing apparatus provided in this embodiment is applicable to the face attribute editing method provided in any of the above embodiments, and has the corresponding functions and beneficial effects.
Embodiment Six
FIG. 6 is a schematic structural diagram of an electronic device provided in Embodiment Six of the present application. As shown in FIG. 6, the electronic device includes a processor 60, a storage apparatus 61, and a communication apparatus 62. The electronic device may contain one or more processors 60; one processor 60 is taken as an example in FIG. 6. The processor 60, the storage apparatus 61, and the communication apparatus 62 in the electronic device may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 6.
The electronic device provided in this embodiment can be used to execute the training method for a face attribute editing model, or the face attribute editing method, provided in any of the above embodiments, and has the corresponding functions and beneficial effects.
Embodiment Seven
Embodiment Seven of the present application further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the training method for a face attribute editing model, or the face attribute editing method, of any of the above embodiments.
For the storage medium containing computer-executable instructions provided in the embodiments of the present application, the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the training method for a face attribute editing model, or in the face attribute editing method, provided in any embodiment of the present application.
From the above description of the implementations, those skilled in the art will clearly understand that the present application can be implemented by means of software plus the necessary general-purpose hardware, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the essence of the embodiments of the present application, or the part contributing to the related art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments of the present application.
The storage medium may be a non-transitory storage medium.
It is worth noting that, in the above embodiments of the training apparatus for a face attribute editing model, or of the face attribute editing apparatus, the included units and modules are divided only according to functional logic; the division is not limited thereto, as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for ease of mutual distinction and are not used to limit the protection scope of the present application.

Claims (12)

1. A training method for a face attribute editing model, comprising:
constructing an initial face attribute editing model according to reconstruction parameters of a face training image, wherein a target adversarial loss function and a similarity loss function are preset in the face attribute editing model; wherein the target adversarial loss function is configured to constrain the editing realism of a target attribute in the face training image, and the similarity loss function is configured to constrain the similarity between non-target attributes of the face training image after passing through the face attribute editing model and non-target attributes of the face training image after reconstruction; and
inputting the face training image into the face attribute editing model, and training the face attribute editing model using the target adversarial loss function and the similarity loss function, to obtain a trained face attribute editing model.
2. The method according to claim 1, wherein the similarity loss function comprises a low-resolution perceptual loss function and a mask perceptual loss function;
wherein the low-resolution perceptual loss function is configured to constrain the similarity between a first intermediate low-resolution image output when target attribute editing is performed on the face training image by the face attribute editing model and a second intermediate low-resolution image generated when the face training image is reconstructed; and
the mask perceptual loss function is configured to constrain the similarity between a first keypoint mask feature extracted after target attribute editing is performed on the face training image by the face attribute editing model and a second keypoint mask feature extracted after the face training image is reconstructed.
3. The method according to claim 1, wherein the face attribute editing model comprises an image encoding network, a migration decoding network, and a latent space of the migration decoding network;
wherein the image encoding network is configured to take the face training image as input and output latent variable features of the face training image to the latent space; and
the migration decoding network is configured to take the latent variable features as input, edit the target attribute in the face training image, and output a corresponding edited face image.
4. The method according to claim 3, wherein constructing the initial face attribute editing model according to the reconstruction parameters of the face training image comprises:
constructing an initial migration decoding network according to network parameters of a trained reconstruction decoding network; and
constructing a pre-trained image encoding network, the initial migration decoding network, and the latent space of the migration decoding network as the initial face attribute editing model.
5. The method according to claim 3 or 4, wherein inputting the face training image into the face attribute editing model and training the face attribute editing model using the target adversarial loss function and the similarity loss function, to obtain the trained face attribute editing model, comprises:
inputting the face training image into the image encoding network, and outputting the latent variable features of the face training image to the latent space;
inputting the latent variable features into the migration decoding network, to obtain a first face image associated with target attribute editing;
inputting the latent variable features into a trained reconstruction decoding network, to obtain a second face image associated with reconstruction; and
substituting the first face image into the target adversarial loss function, and substituting the first face image and the second face image into the similarity loss function, to train the migration decoding network and obtain the trained face attribute editing model.
6. The method according to claim 5, wherein the similarity loss function comprises a low-resolution perceptual loss function and a mask perceptual loss function, the first face image comprises a first intermediate low-resolution image output while the migration decoding network performs target attribute editing and an edited face image output after the target attribute editing is performed, and the second face image comprises a second intermediate low-resolution image generated while the reconstruction decoding network performs reconstruction and a reconstructed face image output after the reconstruction is performed;
wherein substituting the first face image and the second face image into the similarity loss function to train the migration decoding network comprises:
substituting the first intermediate low-resolution image and the second intermediate low-resolution image into the low-resolution perceptual loss function, and substituting a first keypoint mask feature extracted from the edited face image and a second keypoint mask feature extracted from the reconstructed face image into the mask perceptual loss function, to train the migration decoding network.
7. A face attribute editing method, comprising:
inputting a current face image to be edited into a face attribute editing model trained by the training method for a face attribute editing model according to any one of claims 1-6, to obtain a corresponding edited face image;
performing target segmentation on the current face image, to obtain a corresponding target-domain mask map; and
performing image inpainting on the edited face image using the target-domain mask map, to obtain a face image in which the target attribute has been edited.
8. The method according to claim 7, wherein performing image inpainting on the edited face image using the target-domain mask map, to obtain the face image in which the target attribute has been edited, comprises:
performing portrait segmentation on the edited face image, to obtain a corresponding target portrait mask map;
fusing the target portrait mask map with the current face image, to obtain a corresponding face fusion image; and
performing image inpainting on the face fusion image using the set difference of the target-domain mask map with respect to the target portrait mask map, to obtain the face image in which the target attribute has been edited.
9. A training apparatus for a face attribute editing model, comprising:
a model construction module, configured to construct an initial face attribute editing model according to reconstruction parameters of a face training image, wherein a target adversarial loss function and a similarity loss function are preset in the face attribute editing model; wherein the target adversarial loss function is configured to constrain the editing realism of a target attribute in the face training image, and the similarity loss function is configured to constrain the similarity between non-target attributes of the face training image after passing through the face attribute editing model and non-target attributes of the face training image after reconstruction; and
a model training module, configured to input the face training image into the face attribute editing model, and train the face attribute editing model using the target adversarial loss function and the similarity loss function, to obtain a trained face attribute editing model.
10. A face attribute editing apparatus, comprising:
a preliminary editing module, configured to input a current face image to be edited into a face attribute editing model trained by the training method for a face attribute editing model according to any one of claims 1-6, to obtain a corresponding edited face image;
a target segmentation module, configured to perform target segmentation on the current face image, to obtain a corresponding target-domain mask map; and
an editing repair module, configured to perform image inpainting on the edited face image using the target-domain mask map, to obtain a face image in which the target attribute has been edited.
11. An electronic device, comprising:
one or more processors; and
a storage apparatus, configured to store one or more programs;
wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the training method for a face attribute editing model according to any one of claims 1-6, or implement the face attribute editing method according to claim 7 or 8.
12. A computer-readable storage medium, storing a computer program, wherein when the computer program is executed by a processor, the training method for a face attribute editing model according to any one of claims 1-6, or the face attribute editing method according to claim 7 or 8, is implemented.
PCT/CN2022/127361 2021-10-25 2022-10-25 Face attribute editing model training and face attribute editing methods WO2023072067A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111239087.4A CN113963409A (en) 2021-10-25 2021-10-25 Training of face attribute editing model and face attribute editing method
CN202111239087.4 2021-10-25

Publications (1)

Publication Number Publication Date
WO2023072067A1 true WO2023072067A1 (en) 2023-05-04

Family

ID=79466512

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/127361 WO2023072067A1 (en) 2021-10-25 2022-10-25 Face attribute editing model training and face attribute editing methods

Country Status (2)

Country Link
CN (1) CN113963409A (en)
WO (1) WO2023072067A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116363263A (en) * 2023-06-01 2023-06-30 北京邃芒科技有限公司 Image editing method, system, electronic device and storage medium
CN116629315A (en) * 2023-05-23 2023-08-22 北京百度网讯科技有限公司 Training method, device, equipment and medium of perception model
CN117115295A (en) * 2023-09-28 2023-11-24 北京数字力场科技有限公司 Face texture generation method, electronic equipment and computer storage medium
CN117765620A (en) * 2023-12-26 2024-03-26 中国信息通信研究院 Self-enhancement-based false identification method and system for deep fake image

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113963409A (en) * 2021-10-25 2022-01-21 百果园技术(新加坡)有限公司 Training of face attribute editing model and face attribute editing method
CN115082292A (en) * 2022-06-06 2022-09-20 华南理工大学 Human face multi-attribute editing method based on global attribute editing direction
CN115937009A (en) * 2022-06-10 2023-04-07 脸萌有限公司 Image processing method, image processing device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560758A (en) * 2020-12-24 2021-03-26 百果园技术(新加坡)有限公司 Face attribute editing method, system, electronic equipment and storage medium
CN112819689A (en) * 2021-02-02 2021-05-18 百果园技术(新加坡)有限公司 Training method of face attribute editing model, face attribute editing method and equipment
CN113255551A (en) * 2021-06-04 2021-08-13 广州虎牙科技有限公司 Training, face editing and live broadcasting method of face editor and related device
US20210319532A1 (en) * 2020-04-14 2021-10-14 Adobe Inc. Automatic image warping for warped image generation
CN113963409A (en) * 2021-10-25 2022-01-21 百果园技术(新加坡)有限公司 Training of face attribute editing model and face attribute editing method


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116629315A (en) * 2023-05-23 2023-08-22 北京百度网讯科技有限公司 Training method, device, equipment and medium of perception model
CN116629315B (en) * 2023-05-23 2024-02-20 北京百度网讯科技有限公司 Training method, device, equipment and medium of perception model
CN116363263A (en) * 2023-06-01 2023-06-30 北京邃芒科技有限公司 Image editing method, system, electronic device and storage medium
CN116363263B (en) * 2023-06-01 2023-10-27 北京邃芒科技有限公司 Image editing method, system, electronic device and storage medium
CN117115295A (en) * 2023-09-28 2023-11-24 北京数字力场科技有限公司 Face texture generation method, electronic equipment and computer storage medium
CN117765620A (en) * 2023-12-26 2024-03-26 中国信息通信研究院 Self-enhancement-based false identification method and system for deep fake image

Also Published As

Publication number Publication date
CN113963409A (en) 2022-01-21

Similar Documents

Publication Publication Date Title
WO2023072067A1 (en) Face attribute editing model training and face attribute editing methods
CN111489287B (en) Image conversion method, device, computer equipment and storage medium
Lu et al. Image generation from sketch constraint using contextual gan
WO2021073417A1 (en) Expression generation method and apparatus, device and storage medium
CN111858954A (en) Task-oriented text-generated image network model
US11914645B2 (en) Systems and methods for generating improved content based on matching mappings
KR102602112B1 (en) Data processing method, device, and medium for generating facial images
CN113901894A (en) Video generation method, device, server and storage medium
WO2024109374A1 (en) Training method and apparatus for face swapping model, and device, storage medium and program product
CN113781324B (en) Old photo restoration method
CN112861805B (en) Face image generation method based on content characteristics and style characteristics
Sun et al. Masked lip-sync prediction by audio-visual contextual exploitation in transformers
Huang et al. Multi-density sketch-to-image translation network
CN114972016A (en) Image processing method, image processing apparatus, computer device, storage medium, and program product
CN116912924B (en) Target image recognition method and device
CN116977903A (en) AIGC method for intelligently generating short video through text
CN113012030A (en) Image splicing method, device and equipment
WO2022252372A1 (en) Image processing method, apparatus and device, and computer-readable storage medium
CN114863000A (en) Method, device, medium and equipment for generating hairstyle
CN113129399A (en) Pattern generation
Mitsouras et al. U-Sketch: An Efficient Approach for Sketch to Image Diffusion Models
Liu et al. Prediction with Visual Evidence: Sketch Classification Explanation via Stroke-Level Attributions
Guo Researches Advanced in Generative Adversarial Networks and Their Applications for Image-Generating NFT
US20240169701A1 (en) Affordance-based reposing of an object in a scene
Peng et al. Research on Colorization of Qinghai Farmer Painting Image Based on Generative Adversarial Networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22885944

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE