WO2023072067A1 - Face attribute editing model training and face attribute editing methods - Google Patents

Face attribute editing model training and face attribute editing methods Download PDF

Info

Publication number
WO2023072067A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
image
target
attribute editing
loss function
Prior art date
Application number
PCT/CN2022/127361
Other languages
French (fr)
Chinese (zh)
Inventor
黄嘉彬
李玉乐
项伟
Original Assignee
百果园技术(新加坡)有限公司
黄嘉彬
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百果园技术(新加坡)有限公司, 黄嘉彬 filed Critical 百果园技术(新加坡)有限公司
Publication of WO2023072067A1 publication Critical patent/WO2023072067A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Definitions

  • The embodiments of the present application relate to the technical field of image processing, and in particular to a method for training a face attribute editing model and a method for editing face attributes.
  • Face attribute editing is an important technology in the field of computer vision. It is widely used in content production, film production, and entertainment video, for example bald-head effects, hairstyle changes, child-face effects, and celebrity face swaps. Face attribute editing takes an input image containing a face and a target attribute to be edited, transforms the input image into a target-domain face image carrying the target attribute, and ensures that the other original attribute features in the face image remain unchanged.
  • In the related art, a Generative Adversarial Network (GAN) is usually pre-trained to achieve target attribute editing of face images.
  • A global loss function is uniformly set for the generative adversarial network and used to train it, thereby guiding the generative adversarial network so that the face images it outputs have the specified target attributes.
  • Embodiments of the present application provide a face attribute editing model training and a face attribute editing method.
  • In a first aspect, an embodiment of the present application provides a training method for a face attribute editing model, the method comprising:
  • constructing an initial face attribute editing model according to reconstruction parameters of a face training image, wherein a target adversarial loss function and a similarity loss function are preset in the face attribute editing model; the target adversarial loss function is set to constrain the editing authenticity of the target attribute in the face training image, and the similarity loss function is set to constrain the similarity between the non-target attributes of the face training image when it passes through the face attribute editing model and the non-target attributes when the face training image is reconstructed;
  • inputting the face training image into the face attribute editing model, and training the face attribute editing model by using the target adversarial loss function and the similarity loss function to obtain the trained face attribute editing model.
  • the embodiment of the present application provides a method for editing face attributes, the method comprising:
  • the embodiment of the present application provides a training device for a face attribute editing model, the device comprising:
  • a model building module, configured to construct an initial face attribute editing model according to reconstruction parameters of a face training image, wherein a target adversarial loss function and a similarity loss function are preset in the face attribute editing model; the target adversarial loss function is set to constrain the editing authenticity of the target attribute in the face training image, and the similarity loss function is set to constrain the similarity between the non-target attributes of the face training image when it passes through the face attribute editing model and the non-target attributes when the face training image is reconstructed;
  • a model training module, configured to input the face training image into the face attribute editing model and train the face attribute editing model by using the target adversarial loss function and the similarity loss function, to obtain the trained face attribute editing model.
  • An embodiment of the present application provides a device for editing face attributes, the device comprising:
  • the preliminary editing module is configured to input the current face image to be edited into the face attribute editing model trained by the training method of the face attribute editing model provided by the above-mentioned first aspect, to obtain a corresponding face editing image;
  • a target segmentation module, configured to perform target segmentation on the current face image to obtain a corresponding target domain mask map;
  • the editing and repairing module is configured to use the target domain mask image to perform image repair on the edited face image to obtain a face image with edited target attributes.
  • An embodiment of the present application provides an electronic device, the electronic device comprising:
  • one or more processors;
  • a storage device configured to store one or more programs;
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the training method for the face attribute editing model provided in the first aspect above, or implement the face attribute editing method provided in the second aspect above.
  • An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the training method for the face attribute editing model provided in the first aspect above, or the face attribute editing method provided in the second aspect above.
  • FIG. 1A is a flowchart of a training method for a face attribute editing model provided in Embodiment 1 of the present application;
  • FIG. 1B is a schematic diagram of the training process of the face attribute editing model provided in Embodiment 1 of the present application;
  • FIG. 2A is a flowchart of a training method for a face attribute editing model provided in Embodiment 2 of the present application;
  • FIG. 2B is a schematic diagram of the training process of the face attribute editing model provided in Embodiment 2 of the present application.
  • FIG. 2C is a schematic structural diagram of the face attribute editing model provided in Embodiment 2 of the present application.
  • FIG. 3A is a flowchart of a face attribute editing method provided in Embodiment 3 of the present application.
  • FIG. 3B is a schematic diagram of the principle of the face attribute editing process provided by Embodiment 3 of the present application.
  • FIG. 4 is a schematic structural diagram of a training device for a face attribute editing model provided in Embodiment 4 of the present application;
  • FIG. 5 is a schematic structural diagram of a face attribute editing device provided in Embodiment 5 of the present application.
  • FIG. 6 is a schematic structural diagram of an electronic device provided in Embodiment 6 of the present application.
  • FIG. 1A is a flowchart of a training method for a face attribute editing model provided in Embodiment 1 of the present application.
  • various attributes in any face image can be re-edited to change the face style.
  • The training method of the face attribute editing model provided by this embodiment can be executed by the training device of the face attribute editing model provided by the embodiments of this application; the device can be realized by means of software and/or hardware and integrated into the electronic device that executes the method.
  • the method may include the following steps:
  • In the related art, a preset global loss function is usually used to analyze the difference between the output image and the input image, so as to train the network model to make accurate edits to the target attributes in the face image.
  • Because the global loss function imposes a correspondingly strong constraint between the input image and the output image, it may cause changes in the background area and non-target attributes while the target attributes are edited, making the edited output image insufficiently realistic and natural.
  • For this reason, this embodiment builds a dedicated face attribute editing model for all types of face images, so as to achieve natural and realistic editing of the target attributes within them.
  • The target attribute can be a specific keypoint feature to be edited in the face image according to the user's editing requirements. For example, when the user needs to re-edit the portrait in the face image into a bald head, the target attribute is the hair region feature in the face image.
  • This embodiment preliminarily constructs an initial face attribute editing model on the basis of face image reconstruction information, so that the initial face attribute editing model can realize the corresponding face image reconstruction. Subsequently, by adopting transfer learning, the reconstruction ability of the face attribute editing model can be gradually adapted so that the model gains the ability to accurately edit the target attribute.
  • When constructing the initial face attribute editing model, a large number of face training images are first obtained as training samples for the constructed face attribute editing model. Each face training image can then be reconstructed through an existing face image reconstruction method to determine the reconstruction parameters used during reconstruction, and the initial face attribute editing model is constructed from these reconstruction parameters, so that the initial face attribute editing model has the face image reconstruction ability needed for the subsequent transfer learning of target attribute editing.
  • This embodiment presets two loss functions in the face attribute editing model: a target adversarial loss function and a similarity loss function. The target adversarial loss function constrains the editing authenticity of the target attributes in the face training image, and the similarity loss function constrains the similarity between the non-target attributes of the face training image after passing through the face attribute editing model and the non-target attributes when the face training image is reconstructed.
  • The target adversarial loss function constrains the editing authenticity of the face attribute editing model with respect to the target attributes in the face training image, ensuring accurate editing of the target attributes;
  • the similarity loss function constrains the similarity between the non-target attributes after the face training image passes through the face attribute editing model and the non-target attributes when the face training image is reconstructed, ensuring that editing the target attributes does not affect the non-target attributes in the face training image and keeping the non-target attributes as unchanged as possible before and after editing.
  • The face attribute editing model in this embodiment can adopt a generative adversarial model (such as StyleGAN).
  • The target adversarial loss function (GAN loss) can be a WGAN-GP function with a gradient penalty, which is used to guide the face attribute editing model toward the target attribute.
  • The WGAN-GP function replaces the divergence measure between the two probability distributions in the original GAN with the Wasserstein distance, making the training process of the face attribute editing model more stable.
  • The WGAN-GP function is optimized on the basis of the original loss function by adding the regularization term GP (gradient penalty), which constrains the L2 norm of the discriminator's input gradient to stay near 1.
  • The target adversarial loss function in this embodiment can take the standard WGAN-GP form:
  • Loss_G = -E[D(G(x))]
  • Loss_D = E[D(G(x))] - E[D(x)] + λ·E[(‖∇D(x̂)‖₂ - 1)²], where x̂ is obtained by interpolating between real and generated images;
  • D is the discriminator when the face attribute editing model adopts a generative adversarial network;
  • G is the generator when the face attribute editing model adopts a generative adversarial network;
  • x is the face training image input into the generator of the face attribute editing model;
  • y = G(x) is the output image after the generator edits the target attributes in the face training image, which is input into the discriminator of the face attribute editing model for authenticity discrimination.
  • The generative adversarial network can be divided into two parts, the generator and the discriminator; Loss_G is the loss function of the generator and Loss_D is the loss function of the discriminator.
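  The WGAN-GP terms above can be sketched in PyTorch as follows. This is a minimal illustration, not the application's implementation: the function names and the gradient-penalty weight of 10 are assumptions.

```python
import torch

def generator_loss(d_fake):
    # WGAN generator term: minimize -E[D(G(x))]
    return -d_fake.mean()

def discriminator_loss(d_real, d_fake):
    # WGAN critic term: E[D(G(x))] - E[D(x)]
    return d_fake.mean() - d_real.mean()

def gradient_penalty(discriminator, real, fake, gp_weight=10.0):
    # Interpolate between real and edited images and penalize the
    # critic's input-gradient L2 norm for deviating from 1.
    alpha = torch.rand(real.size(0), 1, 1, 1)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    d_interp = discriminator(interp)
    grads = torch.autograd.grad(
        outputs=d_interp, inputs=interp,
        grad_outputs=torch.ones_like(d_interp),
        create_graph=True)[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return gp_weight * ((grad_norm - 1) ** 2).mean()
```

  In a training loop, Loss_D would be `discriminator_loss(...) + gradient_penalty(...)` for the critic step and `generator_loss(...)` for the generator step.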
  • this embodiment divides the similarity loss function into two types: a low-resolution perceptual loss function and a mask perceptual loss function.
  • The low-resolution perceptual loss function is used to constrain the similarity between the first intermediate low-resolution image output when the face training image undergoes target attribute editing through the face attribute editing model and the second intermediate low-resolution image generated when the face training image is reconstructed.
  • The mask perceptual loss function is used to constrain the similarity between the first keypoint mask feature extracted after target attribute editing of the face training image through the face attribute editing model and the second keypoint mask feature extracted after reconstruction of the face training image.
  • The low-resolution perceptual loss function (low-resolution perceptual loss) constrains the similarity of the intermediate low-resolution images during target attribute editing and during reconstruction of the face training image, so that the similarity constraint between the input image and the output image of the face attribute editing model does not become so strong that the target attribute can no longer be edited.
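  One plausible sketch of such a low-resolution perceptual loss, assuming the two intermediate images are available as tensors (the function name and use of an MSE distance are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def low_res_perceptual_loss(edit_lowres, recon_lowres):
    # Penalize the distance between the intermediate low-resolution
    # image produced during target-attribute editing and the one
    # produced during plain reconstruction. Working at low resolution
    # keeps the constraint loose enough for the target attribute to
    # still change.
    if edit_lowres.shape != recon_lowres.shape:
        recon_lowres = F.interpolate(
            recon_lowres, size=edit_lowres.shape[-2:],
            mode='bilinear', align_corners=False)
    return F.mse_loss(edit_lowres, recon_lowres)
```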
  • For the mask perceptual loss function (masked perceptual loss), this embodiment uses a face keypoint network to extract keypoint features (such as eyes, nose, and mouth) from the two images output after target attribute editing and after reconstruction of the face training image, so as to obtain the first keypoint mask feature extracted after the face attribute editing model edits the target attribute and the second keypoint mask feature extracted after the face training image is reconstructed. The mask perceptual loss function is then used to analyze the mask similarity between the same keypoints, thereby constraining the similarity between the non-target attributes represented by the keypoint mask features extracted during target attribute editing and during reconstruction of the face training image.
  • The mask perceptual loss function in this embodiment can be:
  • Loss_mask = ‖VGG(m·G_transfer(x)) - VGG(m·G_rec(x))‖
  • VGG(m·G_transfer(x)) is the first keypoint mask feature extracted from the face training image after target attribute editing by the face attribute editing model;
  • VGG(m·G_rec(x)) is the second keypoint mask feature extracted after the face training image is reconstructed;
  • G_transfer(x) is the face edited image output after target attribute editing of the face training image;
  • G_rec(x) is the face reconstructed image output after the face training image is reconstructed;
  • m is the face keypoint mask.
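  The masked perceptual comparison can be sketched as below. The feature extractor is passed in as a parameter (the application names VGG; any feature network with the same call shape would do), and the function name and L1 distance are illustrative assumptions:

```python
import torch

def masked_perceptual_loss(feature_net, mask, edited, reconstructed):
    # Keep only the facial keypoint regions (eyes, nose, mouth) via
    # the mask m, extract features from both masked images, and
    # compare them so that non-target attributes stay close to the
    # reconstruction.
    feat_edit = feature_net(mask * edited)        # VGG(m * G_transfer(x))
    feat_rec = feature_net(mask * reconstructed)  # VGG(m * G_rec(x))
    return torch.nn.functional.l1_loss(feat_edit, feat_rec)
```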
  • During training, each face training image is continuously input into the face attribute editing model; the model parameters set in the face attribute editing model perform feature processing on the input face training image, and the input face training image is simultaneously reconstructed, so as to obtain the quantities required by the target adversarial loss function and the similarity loss function set in the face attribute editing model.
  • The loss result is then back-propagated into the face attribute editing model, so as to correspondingly correct the current model parameters in the face attribute editing model.
  • In the technical solution of this embodiment, the initial face attribute editing model is constructed according to the reconstruction parameters of the face training image, and the target adversarial loss function and the similarity loss function are preset in the face attribute editing model: the target adversarial loss function constrains the editing authenticity of the target attributes in the face training image, and the similarity loss function constrains the similarity between the non-target attributes of the face training image when it passes through the face attribute editing model and the non-target attributes when the face training image is reconstructed. The face attribute editing model is jointly trained by using the target adversarial loss function and the similarity loss function, so that when editing the target attribute in a face image the model realizes mutual constraints between the target attribute and the non-target attributes, avoiding changes to the background area and the non-target attributes while the target attribute is edited and keeping the non-target attributes unchanged, thereby improving the editing accuracy of the face attribute editing model for the target attributes.
  • FIG. 2A is a flowchart of a training method for a face attribute editing model provided in Embodiment 2 of the present application;
  • FIG. 2B is a schematic diagram of the principle of the training process of the face attribute editing model provided in Embodiment 2 of the present application;
  • FIG. 2C is a schematic structural diagram of the face attribute editing model provided in Embodiment 2 of the present application.
  • The StyleGAN network has a powerful generation ability and can generate real and natural images.
  • However, because the StyleGAN network is a model that maps random noise to images, it cannot directly accept real images as input.
  • The face attribute editing model in this embodiment can therefore include three parts, namely an image encoding network, a transfer decoding network, and the latent space of the transfer decoding network, so that the StyleGAN network can be used to accurately realize target attribute editing of face images.
  • The image encoding network (the pSp encoder) in the face attribute editing model of this embodiment can adopt a feature pyramid network structure: each face training image is input to it, and it outputs the latent variable features of the face training image into the latent space of the transfer decoding network.
  • The image encoding network based on the feature pyramid structure can map feature maps carrying different semantic information into the latent space and transform features of different granularities into multiple latent variable features, so that latent variable features with different semantic information are input to different layers of the transfer decoding network (the StyleGAN decoder); the transfer decoding network then takes the latent variable features from the latent space, edits the target attributes in the face training image, and outputs the corresponding face edited image, enhancing the image reconstruction ability.
  • The basic module of the image encoding network uses a residual module, which applies the input of the image encoding network to its output in the form of a shortcut, so that during backpropagation the gradient can be transmitted directly back to the shallow-layer parameters, effectively suppressing problems such as gradient vanishing.
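  A minimal sketch of such a residual module and a pyramid-style encoder that maps feature maps of different granularities to separate latent codes. The class names, channel counts, and latent dimension are illustrative assumptions, not the pSp encoder's actual architecture:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # Shortcut connection lets gradients flow directly back to the
    # shallow layers during backpropagation, mitigating vanishing
    # gradients.
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class PyramidEncoder(nn.Module):
    # Fine / medium / coarse feature maps are each mapped to a latent
    # code, intended to feed different layers of a StyleGAN-style
    # decoder.
    def __init__(self, latent_dim=512):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), ResidualBlock(32))
        self.down1 = nn.Sequential(
            nn.Conv2d(32, 64, 3, stride=2, padding=1), ResidualBlock(64))
        self.down2 = nn.Sequential(
            nn.Conv2d(64, 128, 3, stride=2, padding=1), ResidualBlock(128))
        self.to_latent = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(c, latent_dim))
            for c in (32, 64, 128)])

    def forward(self, x):
        f0 = self.stem(x)
        f1 = self.down1(f0)
        f2 = self.down2(f1)
        return [head(f) for head, f in zip(self.to_latent, (f0, f1, f2))]
```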
  • this embodiment may include the following steps:
  • To ensure the training efficiency of the face attribute editing model, before constructing the initial face attribute editing model according to the reconstruction parameters of the face training images, this embodiment first uses a large number of face images with as much diversity as possible as training samples to pre-train a corresponding reconstruction decoding network, so that the reconstruction decoding network has the ability to reconstruct face training images. Then, when constructing the initial face attribute editing model, the network parameters of the trained reconstruction decoding network can be directly used to construct the initial transfer decoding network, so that the transfer decoding network, through transfer learning based on the reconstruction decoding network, can be continuously trained into a face attribute editing model with the ability to edit target attributes.
  • In addition, a large number of face images can be used to train the corresponding image encoding network, so that the image encoding network can accurately analyze the latent variable features in the face training images.
  • The pre-trained image encoding network, the initial transfer decoding network, and the latent space of the transfer decoding network are combined to construct the initial face attribute editing model.
  • The training of the face attribute editing model in this embodiment is mainly aimed at training the transfer decoding network.
  • During training, each face training image is input into the image encoding network in the face attribute editing model; the image encoding network analyzes the features of the face training image, outputs its latent variable features, and writes each latent variable feature into the latent space of the transfer decoding network, realizing the transformation of the face training image from a real image into latent variables, so that the transfer decoding network can subsequently take the corresponding latent variable features from the latent space as accurate input.
  • This embodiment inputs the latent variable features in the latent space into the transfer decoding network and the reconstruction decoding network respectively.
  • The transfer decoding network performs the target attribute editing operation according to the latent variable features, thereby outputting a first face image associated with target attribute editing.
  • A corresponding image reconstruction operation is performed by the reconstruction decoding network according to the latent variable features, thereby outputting a second face image associated with reconstruction.
  • The first face image and the second face image serve as the loss analysis objects of the target adversarial loss function and the similarity loss function set in the face attribute editing model, from which the model loss of the transfer decoding network can subsequently be calculated in order to train the transfer decoding network.
  • After obtaining the first face image associated with target attribute editing output by the transfer decoding network and the second face image associated with reconstruction output by the reconstruction decoding network, the first face image can be directly substituted into the target adversarial loss function to analyze the editing loss for the target attributes in the face training image.
  • Substituting the first face image and the second face image into the similarity loss function allows analysis of the similarity loss between the non-target attributes of the face training image under target attribute editing and under reconstruction.
  • The editing loss and the similarity loss are back-propagated through the transfer decoding network to correct its network parameters and realize its training.
  • Each face training image is continuously input into the image encoding network and the above steps are executed cyclically, so as to continuously train the transfer decoding network until its loss function converges; from the trained transfer decoding network, the trained face attribute editing model is obtained.
  • Since the similarity loss function in this embodiment includes two types, the low-resolution perceptual loss function and the mask perceptual loss function, in order to ensure the training accuracy of the transfer decoding network, this embodiment distinguishes the loss objects of the low-resolution perceptual loss function and the mask perceptual loss function.
  • The first face image associated with target attribute editing output by the transfer decoding network may include the first intermediate low-resolution image output when the face training image passes through the transfer decoding network for target attribute editing and the face edited image output after the target attribute editing is performed; the second face image associated with reconstruction output by the reconstruction decoding network may include the second intermediate low-resolution image generated when the face training image is reconstructed through the reconstruction decoding network and the face reconstructed image output after the reconstruction is performed.
  • Substituting the first face image and the second face image into the similarity loss function to train the transfer decoding network may include: substituting the first intermediate low-resolution image and the second intermediate low-resolution image into the low-resolution perceptual loss function, and substituting the first keypoint mask feature extracted from the face edited image and the second keypoint mask feature extracted from the face reconstructed image into the mask perceptual loss function.
  • The low-resolution perceptual loss function is used to constrain the similarity between the first intermediate low-resolution image output when the face training image undergoes target attribute editing through the face attribute editing model and the second intermediate low-resolution image generated when the face training image is reconstructed; therefore, the first intermediate low-resolution image output by the transfer decoder during the target editing operation and the second intermediate low-resolution image output by the reconstruction decoder during the image reconstruction operation can be substituted into the low-resolution perceptual loss function to analyze the corresponding low-resolution image editing loss.
  • An existing feature extractor is used to extract the first keypoint mask feature from the face edited image and the second keypoint mask feature from the face reconstructed image, and the two are respectively substituted into the mask perceptual loss function to analyze the mask loss at each keypoint. Then, combining the loss result of the target adversarial loss function, the low-resolution image editing loss of the low-resolution perceptual loss function, and the mask loss of the mask perceptual loss function at each keypoint, the transfer decoder is jointly trained to obtain the trained face attribute editing model.
  • In the technical solution of this embodiment, the initial face attribute editing model is constructed according to the reconstruction parameters of the face training image, and the target adversarial loss function and the similarity loss function are preset in the face attribute editing model: the target adversarial loss function constrains the editing authenticity of the target attributes in the face training image, and the similarity loss function constrains the similarity between the non-target attributes of the face training image when it passes through the face attribute editing model and the non-target attributes when the face training image is reconstructed. The face attribute editing model is jointly trained by using the target adversarial loss function and the similarity loss function, so that when editing the target attribute in a face image the model realizes mutual constraints between the target attribute and the non-target attributes, avoiding changes to the background area and the non-target attributes while the target attribute is edited and keeping the non-target attributes unchanged, thereby improving the editing accuracy of the face attribute editing model for the target attributes.
  • FIG. 3A is a flow chart of a face attribute editing method provided in Embodiment 3 of the present application.
  • various attributes in any face image can be re-edited to change the face style.
  • the face attribute editing method provided in this embodiment can be executed by the face attribute editing device provided in the embodiment of the present application.
  • the device can be realized by means of software and/or hardware, and integrated in the electronic device that executes the method.
  • the method may include the following steps:
  • In this embodiment, a corresponding face attribute editing model is trained by adopting the training method of the face attribute editing model provided in the above embodiments, and the face attribute editing model has the ability to accurately edit target attributes. Therefore, the current face image to be edited is input into the trained face attribute editing model, which performs target attribute editing on the current face image and outputs the corresponding face edited image. Compared with the current face image, the face edited image has re-edited target attributes, while the non-target attributes can remain unchanged.
  • This embodiment also performs a post-processing optimization operation for target attribute editing on the face edited image output by the face attribute editing model.
  • Image segmentation is used to segment the mask area where the target attribute is located from the current face image, obtaining the corresponding target-domain mask map.
  • Hair segmentation can be performed on the current face image to obtain a corresponding hair mask map (denoted as m_hair), which is used as the target-domain mask map in this embodiment.
  • The target-domain mask map can fully represent the target attribute information in the current face image. Therefore, this embodiment can use the target-domain mask map to analyze non-target attribute information, such as the background area, in the edited face image, and then use the fusion result of the current face image and the target-domain mask map for the non-target attributes to perform image restoration on the edited face image, obtaining a face image whose target attribute has been edited. In this way, the non-target attributes in the final face image and in the current face image remain consistent, so the non-target attributes are kept unchanged while the target attribute is edited.
  • When performing image restoration on the edited face image in this embodiment, the process may be as follows: perform portrait segmentation on the edited face image to obtain a corresponding target portrait mask map; fuse the target portrait mask map with the current face image to obtain a corresponding face fusion image; use the difference set of the target-domain mask map with respect to the target portrait mask map to perform image repair on the face fusion image, obtaining a face image whose target attribute has been edited. That is, by performing portrait segmentation on the edited face image to obtain the corresponding target portrait mask map, the background area in the edited face image can be eliminated, preventing the background area from changing after target attribute editing. Taking editing a face to a bald head as an example, the edited face image can be segmented to obtain a corresponding bald-head portrait mask map (denoted as m_bald), which is used as the target portrait mask map in this embodiment.
  • This embodiment fuses the target portrait mask map with the current face image, thereby retaining the non-portrait area (such as the background area) of the current face image and combining it with the portrait area of the edited face image indicated by the target portrait mask map, to obtain the corresponding face fusion image.
  • This embodiment computes the difference set between the target-domain mask map and the target portrait mask map (that is, the difference set of m_hair with respect to m_bald), and then inputs this difference set together with the face fusion image into a pretrained image inpainting model. Based on the difference set between the target-domain mask map and the target portrait mask map, the image inpainting model performs image repair on the face fusion image, so as to obtain a face image whose target attribute has been edited.
  • After the current face image to be edited is input into the face attribute editing model trained in the above manner, the corresponding edited face image can be obtained; target segmentation is performed on the current face image to obtain the corresponding target-domain mask map, and the target-domain mask map is then used to perform image restoration on the edited face image, obtaining a face image whose target attribute has been edited.
  • Performing mask-based restoration on the edited face image output by the face attribute editing model can improve the accuracy of face target attribute editing, and ensures that the target attribute edited into the current face image looks authentic and natural.
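The fusion-and-repair pipeline above can be sketched numerically as follows. This is a hypothetical NumPy illustration, not the patent's implementation: `inpaint_fn` stands in for the pretrained image inpainting model, and the function name, signature, and array shapes are assumptions.

```python
import numpy as np

def repair_edited_face(current_img, edited_img, hair_mask, bald_mask, inpaint_fn):
    """Mask-based post-processing sketch for target attribute editing.

    current_img, edited_img: (H, W, 3) float arrays in [0, 1]
    hair_mask  (m_hair): 1 where the target attribute (hair) sits in the
                         current face image, 0 elsewhere
    bald_mask  (m_bald): 1 where the portrait sits in the edited face image
    inpaint_fn: stand-in for the pretrained image inpainting model
    """
    # Fusion: keep the edited portrait, restore the original background.
    portrait = bald_mask[..., None]
    fused = portrait * edited_img + (1.0 - portrait) * current_img

    # Difference set m_hair \ m_bald: former hair pixels no longer covered
    # by the edited portrait; these holes are filled by inpainting.
    diff = np.clip(hair_mask - bald_mask, 0.0, 1.0)
    return inpaint_fn(fused, diff)
```

With an identity stub for `inpaint_fn`, pixels inside the portrait mask come from the edited image and the background comes unchanged from the current image, which is exactly the invariance the post-processing is meant to enforce.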
  • FIG. 4 is a schematic structural diagram of a training device for a face attribute editing model provided in Embodiment 4 of the present application. As shown in FIG. 4 , the device may include:
  • The model construction module 410 is configured to construct an initial face attribute editing model according to reconstruction parameters of a face training image, where a target adversarial loss function and a similarity loss function are preset in the face attribute editing model; the target adversarial loss function is used to constrain the editing authenticity of a target attribute in the face training image, and the similarity loss function is used to constrain the similarity between the non-target attributes of the face training image when it passes through the face attribute editing model and the non-target attributes when the face training image is reconstructed;
  • The model training module 420 is configured to input the face training image into the face attribute editing model, and to train the face attribute editing model using the target adversarial loss function and the similarity loss function, obtaining a trained face attribute editing model.
  • The initial face attribute editing model is constructed according to the reconstruction parameters of the face training image, with the target adversarial loss function and the similarity loss function preset in the model, so that the target adversarial loss function constrains the editing authenticity of the target attributes in the face training image, and the similarity loss function constrains the similarity between the non-target attributes of the face training image when it passes through the face attribute editing model and the non-target attributes when the face training image is reconstructed.
  • The face attribute editing model is jointly trained using the target adversarial loss function and the similarity loss function, so that when editing the target attribute in a face image the model applies mutual constraints between the target attribute and the non-target attributes. This avoids changing the background area and non-target attributes while the target attribute is edited, ensures that the non-target attributes remain unchanged, and thereby improves the editing accuracy of the face attribute editing model for the target attribute.
  • the training device for the face attribute editing model provided in this embodiment can be applied to the training method for the face attribute editing model provided in any of the above embodiments, and has corresponding functions and beneficial effects.
  • FIG. 5 is a schematic structural diagram of a face attribute editing device provided in Embodiment 5 of the present application. As shown in FIG. 5, the device may include:
  • The preliminary editing module 510 is configured to input the current face image to be edited into the face attribute editing model trained by the training method of the face attribute editing model provided in the above embodiments, to obtain a corresponding edited face image;
  • The target segmentation module 520 is configured to perform target segmentation on the current face image to obtain a corresponding target-domain mask map;
  • the editing and repairing module 530 is configured to use the target domain mask map to perform image repair on the edited face image to obtain a face image with edited target attributes.
  • After the current face image to be edited is input into the face attribute editing model trained in the above manner, the corresponding edited face image can be obtained; target segmentation is performed on the current face image to obtain the corresponding target-domain mask map, and the target-domain mask map is then used to perform image restoration on the edited face image, obtaining a face image whose target attribute has been edited.
  • Performing mask-based restoration on the edited face image output by the face attribute editing model can improve the accuracy of face target attribute editing, and ensures that the target attribute edited into the current face image looks authentic and natural.
  • the face attribute editing device provided in this embodiment can be applied to the face attribute editing method provided in any of the above embodiments, and has corresponding functions and beneficial effects.
  • FIG. 6 is a schematic structural diagram of an electronic device provided in Embodiment 6 of the present application.
  • The electronic device includes a processor 60, a storage device 61, and a communication device 62. The number of processors 60 in the electronic device may be one or more; one processor 60 is taken as an example in FIG. 6. The processor 60, storage device 61, and communication device 62 in the electronic device may be connected through a bus or in other ways; a bus connection is taken as an example in FIG. 6.
  • An electronic device provided in this embodiment can be used to execute the training method of the face attribute editing model provided in any of the above embodiments, or the face attribute editing method, and has corresponding functions and beneficial effects.
  • Embodiment 7 of the present application also provides a computer-readable storage medium, on which a computer program is stored.
  • When the program is executed by a processor, it implements the training method of the face attribute editing model in any of the above embodiments, or the face attribute editing method.
  • For the storage medium containing computer-executable instructions provided in an embodiment of the present application, the computer-executable instructions are not limited to the method operations described above, and can also perform related operations in the training method of the face attribute editing model, or in the face attribute editing method, provided in any embodiment of the present application.
  • The present application can be implemented by means of software and necessary general-purpose hardware, and of course it can also be implemented by hardware, but in many cases the former is the better implementation.
  • The essence of the embodiments of the present application, or the part that contributes over the related technology, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disc, and includes several instructions to make a computer device (which can be a personal computer, a server, or a network device, etc.) execute the methods described in the embodiments of the present application.
  • the storage medium may be a non-transitory storage medium.
  • The units and modules included are only divided according to functional logic, but the division is not limited to the above, as long as the corresponding functions can be realized; in addition, the specific names of the functional units are only for the convenience of distinguishing them from each other, and are not used to limit the protection scope of the present application.


Abstract

The present application discloses face attribute editing model training and face attribute editing methods. The training method comprises: constructing an initial face attribute editing model according to a reconstruction parameter of a face training image, the face attribute editing model having a preset target adversarial loss function and a similarity loss function; inputting the face training image into the face attribute editing model, and training the face attribute editing model using the target adversarial loss function and the similarity loss function, to obtain a trained face attribute editing model.

Description

Training of Face Attribute Editing Model and Face Attribute Editing Method

This application claims priority to the Chinese patent application with application number 202111239087.4, filed with the China Patent Office on October 25, 2021, the entire contents of which are incorporated herein by reference.
Technical Field

The embodiments of the present application relate to the technical field of image processing, and for example relate to the training of a face attribute editing model and a face attribute editing method.
Background

Face attribute editing is an important technology in the field of computer vision, widely used in content production, film making, entertainment video, and the like, for example changing a face to a bald head, changing the hairstyle, or turning a face into a child's or a celebrity's. Given an input image containing a face and a target attribute to be edited, face attribute editing transforms the input image into a target-domain face image that has the target attribute, while ensuring that the other original attribute features in the face image remain unchanged.

A generative adversarial network (GAN) is usually pretrained to implement target attribute editing of face images. In this case, a single global loss function is set for the GAN to measure the difference between its output image and input image, and the GAN is trained with this global loss function so that the face images it outputs have the specific target attribute.

However, the global loss function imposes a correspondingly strong constraint between the input and output images of the GAN. As a result, although the target attribute in the face images output by the trained GAN can be edited, editing the target attribute may also change the background area and non-target attribute areas compared with the input image, making their updates look unnatural and greatly reducing the accuracy of face target attribute editing.
Summary

The embodiments of the present application provide a method for training a face attribute editing model and a face attribute editing method.

In a first aspect, an embodiment of the present application provides a method for training a face attribute editing model, the method including:

constructing an initial face attribute editing model according to reconstruction parameters of a face training image, where a target adversarial loss function and a similarity loss function are preset in the face attribute editing model; the target adversarial loss function is set to constrain the editing authenticity of a target attribute in the face training image, and the similarity loss function is set to constrain the similarity between the non-target attributes of the face training image when it passes through the face attribute editing model and the non-target attributes when the face training image is reconstructed;

inputting the face training image into the face attribute editing model, and training the face attribute editing model using the target adversarial loss function and the similarity loss function, to obtain a trained face attribute editing model.
In a second aspect, an embodiment of the present application provides a face attribute editing method, the method including:

inputting a current face image to be edited into a face attribute editing model trained by the training method of the face attribute editing model provided in the first aspect, to obtain a corresponding edited face image;

performing target segmentation on the current face image to obtain a corresponding target-domain mask map;

performing image restoration on the edited face image using the target-domain mask map, to obtain a face image whose target attribute has been edited.
In a third aspect, an embodiment of the present application provides a training device for a face attribute editing model, the device including:

a model construction module, configured to construct an initial face attribute editing model according to reconstruction parameters of a face training image, where a target adversarial loss function and a similarity loss function are preset in the face attribute editing model; the target adversarial loss function is set to constrain the editing authenticity of a target attribute in the face training image, and the similarity loss function is set to constrain the similarity between the non-target attributes of the face training image when it passes through the face attribute editing model and the non-target attributes when the face training image is reconstructed;

a model training module, configured to input the face training image into the face attribute editing model, and to train the face attribute editing model using the target adversarial loss function and the similarity loss function, to obtain a trained face attribute editing model.
In a fourth aspect, an embodiment of the present application provides a face attribute editing device, the device including:

a preliminary editing module, configured to input a current face image to be edited into a face attribute editing model trained by the training method of the face attribute editing model provided in the first aspect, to obtain a corresponding edited face image;

a target segmentation module, configured to perform target segmentation on the current face image to obtain a corresponding target-domain mask map;

an editing and repair module, configured to perform image restoration on the edited face image using the target-domain mask map, to obtain a face image whose target attribute has been edited.
In a fifth aspect, an embodiment of the present application provides an electronic device, including:

one or more processors;

a storage device, configured to store one or more programs;

where, when the one or more programs are executed by the one or more processors, the one or more processors implement the training method of the face attribute editing model provided in the first aspect, or the face attribute editing method provided in the second aspect.

In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the training method of the face attribute editing model provided in the first aspect, or the face attribute editing method provided in the second aspect.
Description of the Drawings

FIG. 1A is a flowchart of a method for training a face attribute editing model provided in Embodiment 1 of the present application;

FIG. 1B is a schematic diagram of the training process of the face attribute editing model provided in Embodiment 1 of the present application;

FIG. 2A is a flowchart of a method for training a face attribute editing model provided in Embodiment 2 of the present application;

FIG. 2B is a schematic diagram of the training process of the face attribute editing model provided in Embodiment 2 of the present application;

FIG. 2C is a schematic structural diagram of the face attribute editing model provided in Embodiment 2 of the present application;

FIG. 3A is a flowchart of a face attribute editing method provided in Embodiment 3 of the present application;

FIG. 3B is a schematic diagram of the face attribute editing process provided in Embodiment 3 of the present application;

FIG. 4 is a schematic structural diagram of a training device for a face attribute editing model provided in Embodiment 4 of the present application;

FIG. 5 is a schematic structural diagram of a face attribute editing device provided in Embodiment 5 of the present application;

FIG. 6 is a schematic structural diagram of an electronic device provided in Embodiment 6 of the present application.
Detailed Description

The present application is described below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present application rather than the full structure. In addition, the embodiments in the present application and the features in the embodiments can be combined with each other where no conflict arises.
Embodiment 1

FIG. 1A is a flowchart of a method for training a face attribute editing model provided in Embodiment 1 of the present application. In this embodiment, various attributes in any face image can be re-edited to change the face style. The training method of the face attribute editing model provided in this embodiment can be executed by the training device for the face attribute editing model provided in the embodiments of the present application; the device can be implemented in software and/or hardware and integrated into the electronic device that executes the method.

Referring to FIG. 1A, the method may include the following steps:
S110: Construct an initial face attribute editing model according to reconstruction parameters of a face training image, where a target adversarial loss function and a similarity loss function are preset in the face attribute editing model.

For example, when training a network model for face attribute editing, a preset global loss function is usually used to analyze the difference between the output image and the input image, so as to train the network model to accurately edit the target attribute in the face image. However, because the global loss function imposes a strong constraint between the input and output images, editing the target attribute may also change the background area and non-target attributes, making the edited output image insufficiently realistic and natural.

Therefore, in order to accurately re-edit the various attributes in face images and meet users' diverse needs for face attribute editing, this embodiment builds a dedicated face attribute editing model for natural and realistic editing of the target attribute in all kinds of face images. The target attribute can be a specific key-point feature to be edited in the face image, determined according to the user's editing requirements; for example, when the user wants to re-edit the portrait in the face image into a bald head, the target attribute is the hair region feature of the face image.
It should be noted that, to ensure the training efficiency of the face attribute editing model, and considering that face image reconstruction methods are widely available and that target attribute editing of a face image also performs similar reconstruction steps before the corresponding target attribute editing, this embodiment can preliminarily construct the initial face attribute editing model on the basis of face image reconstruction information, so that the initial face attribute editing model can realize corresponding face image reconstruction. Subsequently, by means of transfer learning, the reconstruction capability of the face attribute editing model can be continuously adapted, so that the model acquires the ability to accurately edit the target attribute.

For example, when constructing the initial face attribute editing model, a large number of face training images are first obtained as training samples for the model. Then, each face training image can be reconstructed by an existing face image reconstruction method to determine the reconstruction parameters used when the face training images are reconstructed, and the initial face attribute editing model is constructed according to these reconstruction parameters, so that it has the ability of face image reconstruction for subsequent transfer learning of target attribute editing.
Moreover, to avoid the problems caused by a global loss function imposing a strong constraint between the input and output images, this embodiment presets two loss functions in the face attribute editing model: a target adversarial loss function and a similarity loss function. The target adversarial loss function constrains the editing authenticity of the target attribute in the face training image, and the similarity loss function constrains the similarity between the non-target attributes of the face training image when it passes through the face attribute editing model and the non-target attributes when the face training image is reconstructed.

That is, for the target attribute to be edited and the non-target attributes that need not be edited in the face image, this embodiment sets different loss functions in the face attribute editing model, so that during training the editing of target attributes and non-target attributes in the face training image is guided separately. The target adversarial loss function constrains the editing authenticity of the target attribute in the face training image, ensuring that the target attribute is edited accurately; the similarity loss function constrains the similarity between the non-target attributes of the face training image after passing through the face attribute editing model and the non-target attributes after reconstruction, ensuring that editing the target attribute does not affect the non-target attributes, so that they remain unchanged before and after editing as far as possible.
示例性的,本实施例中的人脸属性编辑模型可以采用生成对抗(如StyleGAN)模型。此时,目标对抗损失函数(GAN Loss)可以为带有梯度惩罚的WGAN-GP函数,用于指引人脸属性编辑模型往目标属性靠拢。其中,该WGAN-GP函数将现有GAN网络中两个概率分布的度量替换为Wasserstein距离,使得人脸属性编辑模型的训练过程更加稳定。而且,WGAN-GP函数在现有损失函数的基础上进行优化,添加了正则项GP(gradient penalty),用于约束要求对判别器输入梯度的L2范数要约束在1附近。本实施例中的目标对抗损失函数可以为:Exemplarily, the face attribute editing model in this embodiment can adopt a generative confrontation (such as StyleGAN) model. At this time, the target confrontation loss function (GAN Loss) can be a WGAN-GP function with a gradient penalty, which is used to guide the face attribute editing model to move closer to the target attribute. Among them, the WGAN-GP function replaces the measures of the two probability distributions in the existing GAN network with the Wasserstein distance, making the training process of the face attribute editing model more stable. Moreover, the WGAN-GP function is optimized on the basis of the existing loss function, and the regular term GP (gradient penalty) is added to constrain the L2 norm of the input gradient of the discriminator to be constrained near 1. The target adversarial loss function in this embodiment can be:
Loss_G = −E[D(G(x))]
Loss_D = E[D(G(x))] − E[D(x)] + λ·E[(‖∇_y D(y)‖_2 − 1)²]
where D is the discriminator of the generative adversarial network adopted by the face attribute editing model, G is its generator, x is the face training image fed to the generator in the face attribute editing model, and y is the image output by the generator after editing the target attributes in the face training image, which is fed to the discriminator in the face attribute editing model for authenticity judgment.
The generative adversarial network consists of two parts, a generator and a discriminator. Loss_G is the loss function of the generator, and Loss_D is the loss function of the discriminator.
Here, ∇_y D(y) denotes the gradient of D(y) with respect to y, and λ is a weight parameter that controls the proportion of the gradient-penalty term in the overall loss function.
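As a hedged illustration of the WGAN-GP objectives above, the sketch below uses a toy linear discriminator D(y) = w·y, whose gradient with respect to its input is exactly w, so the gradient-penalty term can be computed in closed form. The linear discriminator and all names here are illustrative assumptions; the patent's actual networks are neural networks trained with automatic differentiation.

```python
import math

# Toy sketch of the WGAN-GP losses described above, assuming a linear
# discriminator D(y) = w . y so that grad_y D(y) = w exactly.
# Real models would use neural networks and autograd instead.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def discriminator(w, y):
    return dot(w, y)

def grad_norm(w):
    # L2 norm of grad_y D(y); for a linear D it equals ||w||.
    return math.sqrt(dot(w, w))

def loss_G(w, fake):
    # Generator loss: -E[D(G(x))], here a single-sample estimate.
    return -discriminator(w, fake)

def loss_D(w, real, fake, lam=10.0):
    # Discriminator loss: E[D(y)] - E[D(x)] + lam * (||grad|| - 1)^2
    gp = (grad_norm(w) - 1.0) ** 2
    return discriminator(w, fake) - discriminator(w, real) + lam * gp

w = [0.6, 0.8]     # ||w|| = 1, so the gradient penalty vanishes here
real = [1.0, 2.0]  # stands in for a real face image x
fake = [0.5, 1.0]  # stands in for the edited image y = G(x)

# Both losses equal -1.1 here, up to float rounding.
print(loss_G(w, fake))
print(loss_D(w, real, fake))
```

When ‖w‖ drifts away from 1, the λ-weighted penalty term grows, which is exactly the pressure that keeps the discriminator 1-Lipschitz during training.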
Moreover, in one embodiment, when the face training image passes through the face attribute editing model for target attribute editing, or during image reconstruction, residual skip connections are applied over RGB images at different resolutions, so intermediate products, i.e., intermediate low-resolution images, are output at the different resolutions. To ensure that the face attribute editing model handles the non-target attributes of the face training image accurately, this embodiment divides the similarity loss function into two kinds: a low-resolution perceptual loss function and a masked perceptual loss function.
The low-resolution perceptual loss function constrains the similarity between the first intermediate low-resolution image output when the face training image undergoes target attribute editing by the face attribute editing model and the second intermediate low-resolution image generated when the face training image is reconstructed; the masked perceptual loss function constrains the similarity between the first keypoint mask feature extracted after the face training image undergoes target attribute editing by the face attribute editing model and the second keypoint mask feature extracted after the face training image is reconstructed.
For example, this embodiment uses a low-resolution perceptual loss to constrain the similarity of the intermediate low-resolution images produced during target attribute editing and during reconstruction of the face training image. This imposes a similarity constraint between the input and output of the face attribute editing model without being so strong that the target attribute can no longer be edited. The low-resolution perceptual loss of this embodiment may be: low-resolution perceptual loss = (VGG(G_rec_l(x)) − VGG(G_transfer_l(x)))², where G_transfer_l(x) is the first intermediate low-resolution image output when the face training image undergoes target attribute editing by the face attribute editing model, G_rec_l(x) is the second intermediate low-resolution image generated when the face training image is reconstructed, and VGG is a pre-trained feature extractor used to extract the features of the two intermediate low-resolution images.
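A minimal sketch of this loss, with a trivial stand-in "feature extractor" (a global average) in place of the pre-trained VGG network named in the text; the function names and toy 2×2 images are illustrative assumptions, not the patent's implementation:

```python
# Sketch of the low-resolution perceptual loss: squared distance between
# the features of the two intermediate low-resolution images.

def features(image):
    # Stand-in for VGG(.): collapse the image into one scalar feature.
    flat = [p for row in image for p in row]
    return sum(flat) / len(flat)

def low_resolution_perceptual_loss(rec_l, transfer_l):
    # (VGG(G_rec_l(x)) - VGG(G_transfer_l(x)))^2
    return (features(rec_l) - features(transfer_l)) ** 2

rec_l = [[0.0, 0.5], [0.5, 1.0]]       # second intermediate low-res image
transfer_l = [[0.0, 0.5], [0.5, 1.0]]  # first intermediate low-res image

# Identical intermediate images give zero loss...
print(low_resolution_perceptual_loss(rec_l, transfer_l))
# ...and diverging ones give a positive loss.
print(low_resolution_perceptual_loss(rec_l, [[1.0, 1.0], [1.0, 1.0]]))
```

Because the constraint is applied only at low resolution, fine-grained changes in the target attribute region survive while coarse structure is held close to the reconstruction.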
Moreover, if the final output of the face attribute editing model is not constrained with respect to the non-target attributes, the non-target attributes in the output image cannot keep the pose of the input image, causing a large deviation from the input image. Therefore, this embodiment uses a masked perceptual loss: a face keypoint network extracts keypoint features for the face keypoints (e.g., eyes, nose, mouth) in the two output images obtained after target attribute editing and after reconstruction of the face training image, yielding the first keypoint mask feature extracted after the face training image undergoes target attribute editing by the face attribute editing model and the second keypoint mask feature extracted after the face training image is reconstructed; the masked perceptual loss function then measures the mask similarity between corresponding keypoints, thereby constraining the similarity between the non-target attributes represented by the keypoint mask features extracted after target attribute editing and after reconstruction. The masked perceptual loss function in this embodiment may be:
masked perceptual loss = (VGG(m·G_rec(x)) − VGG(m·G_transfer(x)))²
where VGG(m·G_transfer(x)) is the first keypoint mask feature extracted after the face training image undergoes target attribute editing by the face attribute editing model, VGG(m·G_rec(x)) is the second keypoint mask feature extracted after the face training image is reconstructed, G_transfer(x) is the face editing image output after target attribute editing of the face training image, G_rec(x) is the face reconstruction image output after reconstruction of the face training image, and m is the face keypoint mask.
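A hedged sketch of the masked term: a binary keypoint mask m is applied elementwise before feature extraction, so only keypoint regions (eyes, nose, mouth, etc.) contribute to the loss. The mask values, images, and the averaging "extractor" are illustrative stand-ins for the keypoint network and VGG features:

```python
# Sketch of the masked perceptual loss: (VGG(m.G_rec(x)) - VGG(m.G_transfer(x)))^2

def masked(m, image):
    # m . G(x): keep only pixels under the keypoint mask.
    return [[mi * pi for mi, pi in zip(mr, pr)] for mr, pr in zip(m, image)]

def features(image):
    # Stand-in for the VGG feature extractor: global average.
    flat = [p for row in image for p in row]
    return sum(flat) / len(flat)

def masked_perceptual_loss(m, rec, transfer):
    return (features(masked(m, rec)) - features(masked(m, transfer))) ** 2

m = [[1, 0], [0, 1]]                 # keypoint mask
rec = [[0.2, 0.9], [0.9, 0.4]]       # face reconstruction image G_rec(x)
transfer = [[0.2, 0.1], [0.1, 0.4]]  # face editing image G_transfer(x)

# The two images differ only off-mask, so the masked loss is zero:
print(masked_perceptual_loss(m, rec, transfer))
```

This illustrates why the constraint does not block target attribute editing: edits that fall outside the keypoint mask (here, the off-mask pixels) incur no masked perceptual penalty.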
S120. Input the face training image into the face attribute editing model, and train the face attribute editing model using the target adversarial loss function and the similarity loss function to obtain a trained face attribute editing model.
After the initial face attribute editing model is constructed, face training images are continuously fed into it. The model parameters set in the face attribute editing model perform feature processing on each input face training image while the input image is also reconstructed, thereby obtaining the quantities required by the target adversarial loss function and the similarity loss function set in the model, from which the loss values of the two functions are computed. These loss values are then back-propagated through the face attribute editing model to correct its current model parameters. The next face training image is then fed into the corrected face attribute editing model, and its model parameters are corrected again in the same way, and so on in a loop, until the face attribute editing model can accurately edit the target attributes while keeping the non-target attributes unchanged, yielding the trained face attribute editing model.
In this embodiment, an initial face attribute editing model is constructed according to the reconstruction parameters of the face training images, and a target adversarial loss function and a similarity loss function are preset in the model: the target adversarial loss function constrains the realism of editing the target attributes in the face training image, and the similarity loss function constrains the similarity between the non-target attributes of the face training image after passing through the face attribute editing model and the non-target attributes of the face training image after reconstruction. The two loss functions are then used together to train the face attribute editing model, so that when the model edits the target attributes of a face image it jointly constrains the target and non-target attributes, avoiding changes to the background region and the non-target attributes while the target attributes are edited, and ensuring that the non-target attributes remain invariant during editing, thereby improving the model's editing accuracy for the target attributes.
Embodiment Two
FIG. 2A is a flowchart of a training method for a face attribute editing model provided in Embodiment 2 of the present application, FIG. 2B is a schematic diagram of the principle of the training process of the face attribute editing model provided in Embodiment 2, and FIG. 2C is a schematic structural diagram of the face attribute editing model provided in Embodiment 2. This embodiment is an adjustment on the basis of the above embodiments. For example, as shown in FIG. 2B, considering that the StyleGAN network has a powerful generation ability and can generate realistic, natural images, but is a model that maps random noise to images and cannot directly accept real images as input, the face attribute editing model in this embodiment may include three parts, an image encoding network, a transfer decoding network, and the latent space of the transfer decoding network, so that a StyleGAN network can be used to accurately edit the target attributes of a face image.
As shown in FIG. 2C, the image encoding network (the pSp encoder) in the face attribute editing model of this embodiment may adopt a feature-pyramid network structure: it takes each face training image as input and outputs the latent variable features of the face training image to the latent space of the transfer decoding network. The image encoding network based on the feature pyramid can map feature maps carrying different semantic information into the latent space, converting features of different granularities into multiple latent variable features, which are fed into different layers of the transfer decoding network (the StyleGAN decoder); the transfer decoding network takes the latent variable features in the latent space as input, edits the target attributes in the face training image, and outputs the corresponding face editing image, thereby enhancing the image reconstruction capability. Moreover, the basic module of the image encoding network uses a residual module, which applies the input of the image encoding network to its output as a shortcut, so that gradients can flow directly back to the shallow parameters during back-propagation, effectively suppressing problems such as vanishing gradients.
In one embodiment, as shown in FIG. 2A, the method may include the following steps:
S210. Construct an initial transfer decoding network according to the network parameters of a trained reconstruction decoding network.
To ensure the efficiency of the face attribute editing model, before constructing the initial face attribute editing model according to the reconstruction parameters of the face training images, this embodiment first uses a large number of face images with diversity as strong as possible as training samples to pre-train a corresponding reconstruction decoding network, so that the reconstruction decoding network is capable of reconstructing face training images. Then, when constructing the initial face attribute editing model, the network parameters of the trained reconstruction decoding network can be used directly to construct the initial transfer decoding network, so that the transfer decoding network can be continuously trained by transfer learning on the basis of the reconstruction decoding network to finally obtain a face attribute editing model capable of editing the target attributes.
S220. Construct an initial face attribute editing model from the pre-trained image encoding network, the initial transfer decoding network, and the latent space of the transfer decoding network.
Exemplarily, after the initial transfer decoding network is constructed, a large number of face images can be used to train the corresponding image encoding network, so that the image encoding network can accurately analyze the latent variable features of face training images. Then, as shown in FIG. 2B, the pre-trained image encoding network, the initial transfer decoding network, and the latent space of the transfer decoding network together form the initial face attribute editing model. Since the image encoding network in the face attribute editing model is pre-trained, the training of the face attribute editing model in this embodiment mainly targets the transfer decoding network; once the transfer decoding network is trained, the trained face attribute editing model is obtained.
S230. Input the face training image into the image encoding network, and output the latent variable features of the face training image to the latent space.
After the initial face attribute editing model is constructed, it is trained. As shown in FIG. 2B, each face training image is input into the image encoding network of the face attribute editing model; the image encoding network analyzes the features of the face training image, outputs its latent variable features, and then outputs the latent variable features into the latent space of the transfer decoding network, converting the face training image from a real image into latent variables, so that the transfer decoding network can subsequently read the corresponding latent variable features from the latent space, ensuring the accuracy of the transfer decoding network's input.
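The feature-pyramid encoding described above can be sketched as mapping each pyramid level to one latent code and routing the codes to different decoder layers. The averaging projection and the three tiny "feature maps" below are hypothetical stand-ins for the pSp encoder's learned mappings, not the patent's actual structure:

```python
# Sketch: each pyramid level (coarse / medium / fine semantics) is mapped
# to a latent variable feature, and each latent code feeds a different
# layer of the transfer decoding network.

def map_to_latent(feature_map):
    # Stand-in for the learned map-to-latent projection.
    flat = [v for row in feature_map for v in row]
    return sum(flat) / len(flat)

pyramid = [
    [[0.5, 0.5], [0.5, 0.5]],          # coarse semantics
    [[0.25, 0.25], [0.25, 0.25]],      # medium
    [[0.125, 0.125], [0.125, 0.125]],  # fine detail
]

latents = [map_to_latent(f) for f in pyramid]  # one code per level
print(latents)  # -> [0.5, 0.25, 0.125]

# Each latent code is routed to a different decoder layer:
routing = {f"decoder_layer_{i}": w for i, w in enumerate(latents)}
print(routing)
```

The point of the sketch is the routing, not the arithmetic: different semantic granularities end up controlling different layers of the decoder.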
S240. Input the latent variable features into the transfer decoding network to obtain a first face image associated with target attribute editing; input the latent variable features into the trained reconstruction decoding network to obtain a second face image associated with reconstruction.
Considering the objects constrained by the target adversarial loss function and the similarity loss function set in the face attribute editing model, this embodiment inputs the latent variable features in the latent space into the transfer decoding network and the reconstruction decoding network separately. The transfer decoding network performs the target attribute editing operation according to the latent variable features and outputs the first face image associated with target attribute editing; the reconstruction decoding network performs the corresponding image reconstruction operation according to the latent variable features and outputs the second face image associated with reconstruction. The first face image and the second face image then serve as the loss-analysis objects of the target adversarial loss function and the similarity loss function set in the face attribute editing model, so that the model loss of the transfer decoding network can subsequently be computed in order to train it.
S250. Substitute the first face image into the target adversarial loss function, substitute the first face image and the second face image into the similarity loss function, and train the transfer decoding network to obtain a trained face attribute editing model.
Exemplarily, after obtaining the first face image associated with target attribute editing output by the transfer decoding network and the second face image associated with reconstruction output by the reconstruction decoding network, the first face image can be substituted directly into the target adversarial loss function to analyze the editing loss on the target attributes of the face training image. Meanwhile, substituting the first face image and the second face image into the similarity loss function analyzes the similarity loss between the non-target attributes of the face training image under target attribute editing and under reconstruction. The editing loss and the similarity loss are then back-propagated together through the transfer decoding network to correct its network parameters and thereby train it. Each face training image is continuously input into the image encoding network and the above steps are executed in a loop, so that the transfer decoding network is trained until its loss function converges, yielding a fully trained transfer decoding network and thus the trained face attribute editing model.
It should be noted that since the similarity loss function in this embodiment includes both a low-resolution perceptual loss function and a masked perceptual loss function, in order to ensure the training accuracy of the transfer decoding network, this embodiment distinguishes the loss objects of the two functions. The first face image associated with target attribute editing output by the transfer decoding network may include the first intermediate low-resolution image output during target attribute editing and the face editing image output after target attribute editing; the second face image associated with reconstruction output by the reconstruction decoding network may include the second intermediate low-resolution image generated during reconstruction and the face reconstruction image output after reconstruction.
Therefore, in this embodiment, substituting the first face image and the second face image into the similarity loss function to train the transfer decoding network may include: substituting the first intermediate low-resolution image and the second intermediate low-resolution image into the low-resolution perceptual loss function, and substituting the first keypoint mask feature extracted from the face editing image and the second keypoint mask feature extracted from the face reconstruction image into the masked perceptual loss function, to train the transfer decoding network.
That is, since the low-resolution perceptual loss function constrains the similarity between the first intermediate low-resolution image output during target attribute editing and the second intermediate low-resolution image generated during reconstruction, the first intermediate low-resolution image output by the transfer decoder while performing the target editing operation and the second intermediate low-resolution image output by the reconstruction decoder while performing the image reconstruction operation can be substituted into the low-resolution perceptual loss function to analyze the corresponding low-resolution image editing loss. Meanwhile, for the face editing image output by the transfer decoder after the target editing operation and the face reconstruction image output by the reconstruction decoder after the image reconstruction operation, an existing feature extractor is used to extract the first keypoint mask feature from the face editing image and the second keypoint mask feature from the face reconstruction image, and the two are substituted into the masked perceptual loss function to analyze the mask loss at each keypoint. Then, the loss of the target adversarial loss function, the low-resolution image editing loss of the low-resolution perceptual loss function, and the mask losses of the masked perceptual loss function at the keypoints are combined to train the transfer decoder, yielding the trained face attribute editing model.
In this embodiment, an initial face attribute editing model is constructed according to the reconstruction parameters of the face training images, and a target adversarial loss function and a similarity loss function are preset in the model: the target adversarial loss function constrains the realism of editing the target attributes in the face training image, and the similarity loss function constrains the similarity between the non-target attributes of the face training image after passing through the face attribute editing model and the non-target attributes of the face training image after reconstruction. The two loss functions are then used together to train the face attribute editing model, so that when the model edits the target attributes of a face image it jointly constrains the target and non-target attributes, avoiding changes to the background region and the non-target attributes while the target attributes are edited, and ensuring that the non-target attributes remain invariant during editing, thereby improving the model's editing accuracy for the target attributes.
Embodiment Three
FIG. 3A is a flowchart of a face attribute editing method provided in Embodiment 3 of the present application. This embodiment can re-edit various attributes in any face image to change the face style. The face attribute editing method provided in this embodiment can be executed by the face attribute editing apparatus provided in the embodiments of the present application; the apparatus can be implemented in software and/or hardware and integrated into the electronic device that executes the method.
Referring to FIG. 3A, the method may include the following steps:
S310. Input the current face image to be edited into a face attribute editing model trained with the training method for a face attribute editing model provided in the above embodiments, to obtain the corresponding face editing image.
For example, a corresponding face attribute editing model is trained using the training method for a face attribute editing model provided in the above embodiments, so that the model is capable of accurately editing the target attributes. Therefore, in this embodiment, the current face image to be edited is input into the trained face attribute editing model, which performs target attribute editing on the current face image and outputs the corresponding face editing image; compared with the current face image, the target attributes in the face editing image have been re-edited while the non-target attributes remain unchanged.
S320. Perform target segmentation on the current face image to obtain the corresponding target domain mask map.
Since the target-attribute editing capability of the face attribute editing model depends on its training results, in order to further ensure that the target attribute editing of the current face image is realistic and natural, this embodiment also performs a post-processing optimization operation, for the target attribute editing, on the face editing image output by the face attribute editing model.
For example, image segmentation is first used to segment the mask region where the target attribute is located out of the current face image, yielding the corresponding target domain mask map. Taking editing to a bald head as an example of target attribute editing, hair segmentation can be performed on the current face image to obtain the corresponding hair mask map (denoted m_hair) as the target domain mask map in this embodiment.
S330. Perform image inpainting on the face editing image using the target domain mask map to obtain a face image with the target attribute editing completed.
Since the features of the face editing image in non-target attribute regions such as the face background are usually inconsistent with the current face image, while the target domain mask map can completely represent the target attribute information in the current face image, this embodiment can use the target domain mask map to analyze the non-target attribute information, such as the background region, in the face editing image, and then use the fusion of the current face image and the target domain mask map over the non-target attributes to inpaint the face editing image, obtaining a face image with the target attribute editing completed. In this way, the non-target attributes of the final face image match those of the current face image, so that the non-target attributes remain invariant while the target attributes are edited.
Exemplarily, when performing image inpainting on the face editing image, this embodiment may: perform portrait segmentation on the face editing image to obtain the corresponding target portrait mask map; fuse the target portrait mask map with the current face image to obtain the corresponding face fusion image; and use the set difference of the target domain mask map with respect to the target portrait mask map to inpaint the face fusion image, obtaining a face image with the target attribute editing completed. That is, by performing portrait segmentation on the face editing image to obtain the corresponding target portrait mask map, the background region in the face editing image can be excluded, preventing the background from changing after target attribute editing. Taking editing to a bald head as an example, portrait segmentation can be performed on the face editing image to obtain the corresponding bald-head portrait mask map (denoted m_bald) as the target portrait mask map in this embodiment.
Then, as shown in FIG. 3B, to ensure that the non-target attributes are unchanged before and after target attribute editing, this embodiment fuses the target portrait mask map with the current face image: the non-portrait region of the current face image (for example, the background region) is retained and combined with the region of the edited face image selected by the target portrait mask map, yielding the corresponding face fusion image. Finally, because some regions covered by the target-domain mask map become background after the target attribute is edited, this embodiment computes the set difference of the target-domain mask map with respect to the target portrait mask map (that is, the difference of m_hair with respect to m_bald), and then feeds this difference set together with the face fusion image into a pre-trained image inpainting model. The inpainting model repairs the face fusion image according to that difference set, producing the face image in which the target attribute has been edited.
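The fusion and set-difference steps above reduce to simple mask arithmetic. The following is a minimal NumPy sketch, not the embodiment's actual implementation; the function name is illustrative, and the pre-trained inpainting model is treated as an external component that consumes the two outputs.

```python
import numpy as np

def prepare_inpainting_inputs(current_img, edited_img, m_bald, m_hair):
    """Fuse the edited portrait with the original background and compute
    the region that the inpainting model must repair (illustrative sketch)."""
    # Face fusion image: portrait pixels (per the bald-head portrait mask
    # m_bald) come from the edited image; everything else, e.g. the
    # background, is kept from the current face image.
    fused = np.where(m_bald[..., None].astype(bool), edited_img, current_img)
    # Set difference m_hair \ m_bald: pixels that belonged to the hair
    # region of the source but are no longer portrait after editing; these
    # are the holes the inpainting model is asked to fill.
    diff = np.clip(m_hair.astype(np.int8) - m_bald.astype(np.int8), 0, 1)
    return fused, diff
```

The `fused` image and the `diff` mask would then be passed jointly to the pre-trained inpainting model, matching the data flow described above.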
In this embodiment, the current face image to be edited is input into the face attribute editing model trained in the above manner to obtain the corresponding edited face image; target segmentation is performed on the current face image to obtain the corresponding target-domain mask map; and the target-domain mask map is then used to inpaint the edited face image, yielding a face image in which the target attribute has been edited. Applying this mask-based inpainting to the output of the face attribute editing model improves the accuracy of face target attribute editing and ensures that the current face image looks real and natural after the target attribute is edited.
Embodiment Four
FIG. 4 is a schematic structural diagram of a training apparatus for a face attribute editing model provided in Embodiment Four of the present application. As shown in FIG. 4, the apparatus may include:
a model construction module 410, configured to construct an initial face attribute editing model according to reconstruction parameters of a face training image, wherein a target adversarial loss function and a similarity loss function are preset in the face attribute editing model; the target adversarial loss function constrains the editing realism of the target attribute in the face training image, and the similarity loss function constrains the similarity between the non-target attributes of the face training image after passing through the face attribute editing model and the non-target attributes of the face training image after reconstruction; and
a model training module 420, configured to input the face training image into the face attribute editing model and train the face attribute editing model using the target adversarial loss function and the similarity loss function, to obtain a trained face attribute editing model.
In this embodiment, an initial face attribute editing model is constructed according to the reconstruction parameters of the face training image, and a target adversarial loss function and a similarity loss function are preset in the model: the target adversarial loss function constrains the editing realism of the target attribute in the face training image, and the similarity loss function constrains the similarity between the non-target attributes of the face training image after passing through the model and the non-target attributes of the reconstructed face training image. The two loss functions then jointly train the face attribute editing model, so that when editing the target attribute of a face image the model constrains the target and non-target attributes together. This avoids the problem of the background region and non-target attributes changing along with the edited target attribute, ensures that the non-target attributes remain invariant while the target attribute is edited, and thereby improves the editing accuracy of the face attribute editing model for the target attribute.
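The joint objective described above can be sketched as an adversarial term plus a weighted similarity term. The concrete loss forms used below (a softplus non-saturating adversarial loss, L1 distances for the two perceptual terms, and the weight `lam`) are assumptions for illustration only; the embodiment does not fix them here.

```python
import numpy as np

def similarity_loss(lr_edit, lr_recon, mask_feat_edit, mask_feat_recon):
    # Low-resolution perceptual term: intermediate low-resolution output of
    # the editing branch vs. that of the reconstruction branch.
    l_lowres = np.abs(lr_edit - lr_recon).mean()
    # Mask perceptual term: keypoint-mask features of the edited face
    # vs. those of the reconstructed face.
    l_mask = np.abs(mask_feat_edit - mask_feat_recon).mean()
    return l_lowres + l_mask

def generator_objective(d_logits_fake, lr_edit, lr_recon,
                        mask_feat_edit, mask_feat_recon, lam=10.0):
    # Adversarial term (assumed non-saturating form): pushes the
    # discriminator to score edited faces as real, constraining realism.
    l_adv = np.logaddexp(0.0, -d_logits_fake).mean()  # softplus(-logits)
    # Similarity term: keeps non-target attributes close to the
    # reconstruction branch, as described above.
    return l_adv + lam * similarity_loss(lr_edit, lr_recon,
                                         mask_feat_edit, mask_feat_recon)
```

In a real training loop this scalar would be minimized over the migration decoding network's parameters by backpropagation; the discriminator supplying `d_logits_fake` would be trained in alternation.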
The training apparatus for a face attribute editing model provided in this embodiment is applicable to the training method for a face attribute editing model provided in any of the above embodiments, and has the corresponding functions and beneficial effects.
Embodiment Five
FIG. 5 is a schematic structural diagram of a face attribute editing apparatus provided in Embodiment Five of the present application. As shown in FIG. 5, the apparatus may include:
a preliminary editing module 510, configured to input a current face image to be edited into a face attribute editing model trained by the training method for a face attribute editing model provided in the above embodiments, to obtain a corresponding edited face image;
a target segmentation module 520, configured to perform target segmentation on the current face image, to obtain a corresponding target-domain mask map; and
an editing repair module 530, configured to perform image inpainting on the edited face image using the target-domain mask map, to obtain a face image in which the target attribute has been edited.
In this embodiment, the current face image to be edited is input into the face attribute editing model trained in the above manner to obtain the corresponding edited face image; target segmentation is performed on the current face image to obtain the corresponding target-domain mask map; and the target-domain mask map is then used to inpaint the edited face image, yielding a face image in which the target attribute has been edited. Applying this mask-based inpainting to the output of the face attribute editing model improves the accuracy of face target attribute editing and ensures that the current face image looks real and natural after the target attribute is edited.
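The three modules of the editing apparatus can be wired together as in the following sketch. The editing model, target segmentation, and inpainting are treated as opaque callables supplied by the caller; all names here are illustrative rather than taken from the embodiment.

```python
class FaceAttributeEditor:
    """Minimal wiring of the preliminary-editing, target-segmentation and
    editing-repair stages described above (illustrative sketch)."""

    def __init__(self, edit_model, segment_target, inpaint):
        self.edit_model = edit_model          # trained face attribute editing model
        self.segment_target = segment_target  # target segmentation (e.g. hair region)
        self.inpaint = inpaint                # mask-guided image repair

    def edit(self, current_face):
        edited = self.edit_model(current_face)           # preliminary editing
        target_mask = self.segment_target(current_face)  # target-domain mask map
        return self.inpaint(edited, target_mask)         # final repaired image
```

Keeping the three stages as injected callables mirrors the module split of FIG. 5 and lets each stage be replaced or tested independently.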
The face attribute editing apparatus provided in this embodiment is applicable to the face attribute editing method provided in any of the above embodiments, and has the corresponding functions and beneficial effects.
Embodiment Six
FIG. 6 is a schematic structural diagram of an electronic device provided in Embodiment Six of the present application. As shown in FIG. 6, the electronic device includes a processor 60, a storage apparatus 61, and a communication apparatus 62. The electronic device may contain one or more processors 60; one processor 60 is taken as an example in FIG. 6. The processor 60, the storage apparatus 61, and the communication apparatus 62 in the electronic device may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 6.
The electronic device provided in this embodiment can be used to execute the training method for a face attribute editing model, or the face attribute editing method, provided in any of the above embodiments, and has the corresponding functions and beneficial effects.
Embodiment Seven
Embodiment Seven of the present application further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the training method for a face attribute editing model, or the face attribute editing method, of any of the above embodiments.
For the storage medium containing computer-executable instructions provided in the embodiments of the present application, the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the training method for a face attribute editing model, or in the face attribute editing method, provided in any embodiment of the present application.
From the above description of the implementations, those skilled in the art will clearly understand that the present application can be implemented by means of software plus the necessary general-purpose hardware, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the essence of the embodiments of the present application, or the part contributing to the related art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments of the present application.
The storage medium may be a non-transitory storage medium.
It is worth noting that, in the above embodiments of the training apparatus for a face attribute editing model, or of the face attribute editing apparatus, the included units and modules are divided only according to functional logic; the division is not limited thereto, as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for ease of mutual distinction and are not used to limit the protection scope of the present application.

Claims (12)

1. A training method for a face attribute editing model, comprising:
constructing an initial face attribute editing model according to reconstruction parameters of a face training image, wherein a target adversarial loss function and a similarity loss function are preset in the face attribute editing model; wherein the target adversarial loss function is configured to constrain the editing realism of a target attribute in the face training image, and the similarity loss function is configured to constrain the similarity between non-target attributes of the face training image after passing through the face attribute editing model and non-target attributes of the face training image after reconstruction; and
inputting the face training image into the face attribute editing model, and training the face attribute editing model using the target adversarial loss function and the similarity loss function, to obtain a trained face attribute editing model.
2. The method according to claim 1, wherein the similarity loss function comprises a low-resolution perceptual loss function and a mask perceptual loss function;
wherein the low-resolution perceptual loss function is configured to constrain the similarity between a first intermediate low-resolution image output when target attribute editing is performed on the face training image by the face attribute editing model and a second intermediate low-resolution image generated when the face training image is reconstructed; and
the mask perceptual loss function is configured to constrain the similarity between a first keypoint mask feature extracted after target attribute editing is performed on the face training image by the face attribute editing model and a second keypoint mask feature extracted after the face training image is reconstructed.
3. The method according to claim 1, wherein the face attribute editing model comprises an image encoding network, a migration decoding network, and a latent space of the migration decoding network;
wherein the image encoding network is configured to take the face training image as input and output latent variable features of the face training image to the latent space; and
the migration decoding network is configured to take the latent variable features as input, edit the target attribute in the face training image, and output a corresponding edited face image.
4. The method according to claim 3, wherein constructing the initial face attribute editing model according to the reconstruction parameters of the face training image comprises:
constructing an initial migration decoding network according to network parameters of a trained reconstruction decoding network; and
constructing a pre-trained image encoding network, the initial migration decoding network, and the latent space of the migration decoding network as the initial face attribute editing model.
5. The method according to claim 3 or 4, wherein inputting the face training image into the face attribute editing model and training the face attribute editing model using the target adversarial loss function and the similarity loss function, to obtain the trained face attribute editing model, comprises:
inputting the face training image into the image encoding network, and outputting the latent variable features of the face training image to the latent space;
inputting the latent variable features into the migration decoding network, to obtain a first face image associated with target attribute editing;
inputting the latent variable features into a trained reconstruction decoding network, to obtain a second face image associated with reconstruction; and
substituting the first face image into the target adversarial loss function, and substituting the first face image and the second face image into the similarity loss function, to train the migration decoding network and obtain the trained face attribute editing model.
6. The method according to claim 5, wherein the similarity loss function comprises a low-resolution perceptual loss function and a mask perceptual loss function, the first face image comprises a first intermediate low-resolution image output while the migration decoding network performs target attribute editing and an edited face image output after the target attribute editing is performed, and the second face image comprises a second intermediate low-resolution image generated while the reconstruction decoding network performs reconstruction and a reconstructed face image output after the reconstruction is performed;
wherein substituting the first face image and the second face image into the similarity loss function to train the migration decoding network comprises:
substituting the first intermediate low-resolution image and the second intermediate low-resolution image into the low-resolution perceptual loss function, and substituting a first keypoint mask feature extracted from the edited face image and a second keypoint mask feature extracted from the reconstructed face image into the mask perceptual loss function, to train the migration decoding network.
7. A face attribute editing method, comprising:
inputting a current face image to be edited into a face attribute editing model trained by the training method for a face attribute editing model according to any one of claims 1-6, to obtain a corresponding edited face image;
performing target segmentation on the current face image, to obtain a corresponding target-domain mask map; and
performing image inpainting on the edited face image using the target-domain mask map, to obtain a face image in which the target attribute has been edited.
8. The method according to claim 7, wherein performing image inpainting on the edited face image using the target-domain mask map, to obtain the face image in which the target attribute has been edited, comprises:
performing portrait segmentation on the edited face image, to obtain a corresponding target portrait mask map;
fusing the target portrait mask map with the current face image, to obtain a corresponding face fusion image; and
performing image inpainting on the face fusion image using the set difference of the target-domain mask map with respect to the target portrait mask map, to obtain the face image in which the target attribute has been edited.
9. A training apparatus for a face attribute editing model, comprising:
a model construction module, configured to construct an initial face attribute editing model according to reconstruction parameters of a face training image, wherein a target adversarial loss function and a similarity loss function are preset in the face attribute editing model; wherein the target adversarial loss function is configured to constrain the editing realism of a target attribute in the face training image, and the similarity loss function is configured to constrain the similarity between non-target attributes of the face training image after passing through the face attribute editing model and non-target attributes of the face training image after reconstruction; and
a model training module, configured to input the face training image into the face attribute editing model, and train the face attribute editing model using the target adversarial loss function and the similarity loss function, to obtain a trained face attribute editing model.
10. A face attribute editing apparatus, comprising:
a preliminary editing module, configured to input a current face image to be edited into a face attribute editing model trained by the training method for a face attribute editing model according to any one of claims 1-6, to obtain a corresponding edited face image;
a target segmentation module, configured to perform target segmentation on the current face image, to obtain a corresponding target-domain mask map; and
an editing repair module, configured to perform image inpainting on the edited face image using the target-domain mask map, to obtain a face image in which the target attribute has been edited.
11. An electronic device, comprising:
one or more processors; and
a storage apparatus, configured to store one or more programs;
wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the training method for a face attribute editing model according to any one of claims 1-6, or implement the face attribute editing method according to claim 7 or 8.
12. A computer-readable storage medium, storing a computer program, wherein when the computer program is executed by a processor, the training method for a face attribute editing model according to any one of claims 1-6, or the face attribute editing method according to claim 7 or 8, is implemented.
PCT/CN2022/127361 2021-10-25 2022-10-25 Face attribute editing model training and face attribute editing methods WO2023072067A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111239087.4A CN113963409A (en) 2021-10-25 2021-10-25 Training of face attribute editing model and face attribute editing method
CN202111239087.4 2021-10-25

Publications (1)

Publication Number Publication Date
WO2023072067A1 true WO2023072067A1 (en) 2023-05-04

Family

ID=79466512

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/127361 WO2023072067A1 (en) 2021-10-25 2022-10-25 Face attribute editing model training and face attribute editing methods

Country Status (2)

Country Link
CN (1) CN113963409A (en)
WO (1) WO2023072067A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116363263A (en) * 2023-06-01 2023-06-30 北京邃芒科技有限公司 Image editing method, system, electronic device and storage medium
CN116629315A (en) * 2023-05-23 2023-08-22 北京百度网讯科技有限公司 Training method, device, equipment and medium of perception model
CN117115295A (en) * 2023-09-28 2023-11-24 北京数字力场科技有限公司 Face texture generation method, electronic equipment and computer storage medium
CN117765620A (en) * 2023-12-26 2024-03-26 中国信息通信研究院 Self-enhancement-based false identification method and system for deep fake image

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113963409A (en) * 2021-10-25 2022-01-21 百果园技术(新加坡)有限公司 Training of face attribute editing model and face attribute editing method
CN115082292A (en) * 2022-06-06 2022-09-20 华南理工大学 Human face multi-attribute editing method based on global attribute editing direction
CN115937009A (en) * 2022-06-10 2023-04-07 脸萌有限公司 Image processing method, image processing device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560758A (en) * 2020-12-24 2021-03-26 百果园技术(新加坡)有限公司 Face attribute editing method, system, electronic equipment and storage medium
CN112819689A (en) * 2021-02-02 2021-05-18 百果园技术(新加坡)有限公司 Training method of face attribute editing model, face attribute editing method and equipment
CN113255551A (en) * 2021-06-04 2021-08-13 广州虎牙科技有限公司 Training, face editing and live broadcasting method of face editor and related device
US20210319532A1 (en) * 2020-04-14 2021-10-14 Adobe Inc. Automatic image warping for warped image generation
CN113963409A (en) * 2021-10-25 2022-01-21 百果园技术(新加坡)有限公司 Training of face attribute editing model and face attribute editing method


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116629315A (en) * 2023-05-23 2023-08-22 北京百度网讯科技有限公司 Training method, device, equipment and medium of perception model
CN116629315B (en) * 2023-05-23 2024-02-20 北京百度网讯科技有限公司 Training method, device, equipment and medium of perception model
CN116363263A (en) * 2023-06-01 2023-06-30 北京邃芒科技有限公司 Image editing method, system, electronic device and storage medium
CN116363263B (en) * 2023-06-01 2023-10-27 北京邃芒科技有限公司 Image editing method, system, electronic device and storage medium
CN117115295A (en) * 2023-09-28 2023-11-24 北京数字力场科技有限公司 Face texture generation method, electronic equipment and computer storage medium
CN117765620A (en) * 2023-12-26 2024-03-26 中国信息通信研究院 Self-enhancement-based false identification method and system for deep fake image

Also Published As

Publication number Publication date
CN113963409A (en) 2022-01-21

Similar Documents

Publication Publication Date Title
WO2023072067A1 (en) Face attribute editing model training and face attribute editing methods
CN111489287B (en) Image conversion method, device, computer equipment and storage medium
Lu et al. Image generation from sketch constraint using contextual gan
WO2021073417A1 (en) Expression generation method and apparatus, device and storage medium
CN111858954A (en) Task-oriented text-generated image network model
US11914645B2 (en) Systems and methods for generating improved content based on matching mappings
KR102602112B1 (en) Data processing method, device, and medium for generating facial images
CN113901894A (en) Video generation method, device, server and storage medium
WO2024109374A1 (en) Training method and apparatus for face swapping model, and device, storage medium and program product
CN113781324B (en) Old photo restoration method
CN112861805B (en) Face image generation method based on content characteristics and style characteristics
Sun et al. Masked lip-sync prediction by audio-visual contextual exploitation in transformers
Huang et al. Multi-density sketch-to-image translation network
CN114972016A (en) Image processing method, image processing apparatus, computer device, storage medium, and program product
CN116912924B (en) Target image recognition method and device
CN116977903A (en) AIGC method for intelligently generating short video through text
CN113012030A (en) Image splicing method, device and equipment
WO2022252372A1 (en) Image processing method, apparatus and device, and computer-readable storage medium
CN114863000A (en) Method, device, medium and equipment for generating hairstyle
CN113129399A (en) Pattern generation
Mitsouras et al. U-Sketch: An Efficient Approach for Sketch to Image Diffusion Models
Liu et al. Prediction with Visual Evidence: Sketch Classification Explanation via Stroke-Level Attributions
Guo Researches Advanced in Generative Adversarial Networks and Their Applications for Image-Generating NFT
US20240169701A1 (en) Affordance-based reposing of an object in a scene
Peng et al. Research on Colorization of Qinghai Farmer Painting Image Based on Generative Adversarial Networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22885944

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE