WO2021083069A1 - Method and device for training face swapping model - Google Patents

Method and device for training face swapping model

Info

Publication number
WO2021083069A1
WO2021083069A1 · PCT/CN2020/123582 · CN2020123582W
Authority
WO
WIPO (PCT)
Prior art keywords
face
sample set
model
template
training
Prior art date
Application number
PCT/CN2020/123582
Other languages
French (fr)
Chinese (zh)
Inventor
徐伟
罗琨
陈晓磊
Original Assignee
上海掌门科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海掌门科技有限公司
Publication of WO2021083069A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Definitions

  • the embodiments of the present application relate to the field of computer technology, in particular to a method and device for training a face-changing model.
  • GAN: Generative Adversarial Network
  • the embodiment of the application proposes a method and device for training a face-changing model.
  • an embodiment of the present application provides a method for training a face-changing model, including: receiving a face-changing model training request sent by a user, wherein the face-changing model training request includes a face sample set before the face change provided by the user and a designated template face identifier; determining, from the pre-training model set corresponding to the template face identifier, a pre-training model that matches the face sample set before the face change, wherein the pre-training model set includes models pre-trained based on a target face sample set group and the template face sample set group corresponding to the template face identifier; determining, from the template face sample set group, a template face sample set that matches the face sample set before the face change; and using a machine learning method to train the determined pre-training model based on the face sample set before the face change and the determined template face sample set, to obtain the face-changing model.
  • determining, from the pre-training model set corresponding to the template face identifier, the pre-training model that matches the face sample set before the face change includes: if there is a pre-training model corresponding to the template face identifier in the user's historical face-changing records, determining the pre-training model corresponding to the template face identifier as the pre-training model matching the face sample set before the face change.
  • determining, from the pre-training model set corresponding to the template face identifier, the pre-training model that matches the face sample set before the face change further includes: if there is no pre-training model corresponding to the template face identifier in the user's historical face-changing records, identifying the face attribute information of the face sample set before the face change, and determining the pre-training model from the pre-training model set based on the recognized face attribute information.
  • the face attribute information includes information in at least one of the following dimensions: gender, age group, race, facial accessories, and face shape.
  • identifying the face attribute information of the face sample set before the face change includes: inputting the face sample set before the face change into a pre-trained first classification model to obtain the gender, age group, race, and facial accessories of the face sample set before the face change, where the first classification model is a classification model based on a convolutional neural network.
  • identifying the face attribute information of the face sample set before the face change includes: extracting the face classification features of the face sample set before the face change; and inputting the extracted face classification features into a pre-trained second classification model to obtain the face shape of the face sample set before the face change, where the second classification model is a classification model based on a support vector machine.
  • extracting the face classification features of the face sample set before the face change includes: extracting the face feature point information of the face sample set before the face change; calculating the face measurement parameters of the face sample set before the face change based on the extracted face feature point information; and combining the extracted face feature point information and the calculated face measurement parameters into the face classification features of the face sample set before the face change.
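  The feature-construction step above can be sketched as follows. The patent does not specify which landmark points or measurement parameters are used, so the landmark names and the two ratios below (face width-to-height and jaw-to-cheek) are illustrative assumptions:

```python
def face_classification_features(landmarks):
    """Build face classification features from facial landmark points.

    `landmarks` maps a landmark name to an (x, y) point. The measurement
    parameters below are illustrative: the patent only states that the
    feature points and derived face measurements are combined.
    """
    # Derived face measurement parameters (illustrative choices).
    face_width = abs(landmarks["right_cheek"][0] - landmarks["left_cheek"][0])
    face_height = abs(landmarks["chin"][1] - landmarks["forehead"][1])
    jaw_width = abs(landmarks["right_jaw"][0] - landmarks["left_jaw"][0])
    measurements = [
        face_width / face_height,  # width-to-height ratio
        jaw_width / face_width,    # jaw-to-cheek ratio
    ]
    # Flatten the raw feature points and append the measurement parameters.
    points = [coord for name in sorted(landmarks) for coord in landmarks[name]]
    return points + measurements
```

  The combined vector can then be fed to the SVM-based second classification model to predict face shape.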
  • determining a pre-training model from the pre-training model set based on the recognized face attribute information includes: determining, from the pre-training model set, a pre-training model subset matching the recognized face attribute information; calculating the similarity between the face sample set before the face change and the target face sample sets corresponding to the pre-training models in the pre-training model subset; and determining the pre-training model from the pre-training model subset based on the calculated similarity.
  • calculating the similarity between the face sample set before the face change and the target face sample sets corresponding to the pre-training models in the pre-training model subset includes: extracting the average face feature vector of the face sample set before the face change; and calculating the cosine similarity between the extracted average face feature vector and the average face feature vectors of the target face sample sets corresponding to the pre-training models in the pre-training model subset.
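  A minimal plain-Python sketch of the similarity computation above; the model names and feature vectors are hypothetical, and in practice the vectors would come from a face-recognition embedding network:

```python
import math

def mean_vector(vectors):
    """Average face feature vector over a sample set."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def best_pretrained_model(query_set, model_subset):
    """Pick the pre-training model whose target face sample set is most
    similar to the face sample set before the face change.

    `model_subset` maps a (hypothetical) model name to the face feature
    vectors of that model's target face sample set.
    """
    q = mean_vector(query_set)
    return max(model_subset,
               key=lambda m: cosine_similarity(q, mean_vector(model_subset[m])))
```

  Averaging first and comparing means (rather than comparing every pair of samples) keeps the selection step cheap even for large sample sets.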
  • determining, from the template face sample set group, the template face sample set that matches the face sample set before the face change includes: extracting the face richness features of the face sample set before the face change; calculating the matching degree between the extracted face richness features and the face richness features of the template face sample sets in the template face sample set group; and determining the template face sample set from the template face sample set group based on the calculated matching degree.
  • extracting the face richness features of the face sample set before the face change includes: extracting the face feature information of the face sample set before the face change; and performing histogram statistics on the face feature information to obtain the face richness features of the face sample set before the face change.
  • the facial feature information includes information in at least one of the following dimensions: facial feature points, facial angles, and facial expressions.
  • calculating the matching degree between the extracted face richness features and the face richness features of the template face sample sets in the template face sample set group includes: using a histogram matching method to calculate the matching degree between the extracted face richness features and the face richness features of the template face sample sets in the template face sample set group.
  • determining the template face sample set from the template face sample set group includes: if there is a template face sample set with a matching degree greater than a preset matching degree threshold in the template face sample set group, selecting the template face sample set with the highest matching degree from the template face sample set group; and if there is no template face sample set with a matching degree greater than the preset matching degree threshold in the template face sample set group, selecting a universal template face sample set from the template face sample set group.
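  The histogram statistics, matching-degree calculation, and universal-template fallback above can be sketched as follows. Histogram intersection is used here as one common histogram matching method; the single face-angle dimension, the bin layout, and the threshold value are illustrative assumptions:

```python
def richness_histogram(values, bins, lo, hi):
    """Histogram statistics over one face-feature dimension (e.g. one
    face angle per sample), normalised so the bins sum to 1."""
    hist = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        hist[idx] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]

def matching_degree(h1, h2):
    """Histogram intersection: 1.0 for identical distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def select_template_set(query_hist, template_hists, threshold,
                        universal="universal"):
    """Pick the template face sample set with the highest matching degree,
    falling back to a universal set when no set exceeds the threshold."""
    best = max(template_hists,
               key=lambda k: matching_degree(query_hist, template_hists[k]))
    if matching_degree(query_hist, template_hists[best]) > threshold:
        return best
    return universal
```

  In a full implementation one histogram would be built per feature dimension (feature points, face angles, facial expressions) and the per-dimension matching degrees combined.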
  • the pre-training model set is trained by the following steps: acquiring multiple target face samples; dividing the multiple target face samples into a target face sample set group according to face attributes, where the face attributes of the target face samples in the same target face sample set are similar; and, for each target face sample set in the target face sample set group, training a generative adversarial network based on the target face sample set and the template face sample set matching the target face sample set, to obtain a pre-training model.
  • the pre-training model includes a generative model and a discriminative model; and using a machine learning method to train the determined pre-training model based on the face sample set before the face change and the determined template face sample set to obtain the face-changing model includes: inputting the face sample set before the face change into the generative model of the determined pre-training model to obtain a face sample set after the face change; inputting the face sample set after the face change and the determined template face sample set into the discriminative model of the determined pre-training model to obtain a discrimination result, where the discrimination result is used to represent the probability that the face sample set after the face change and the determined template face sample set are the real sample set; and adjusting, based on the discrimination result, the parameters of the generative model and the discriminative model of the determined pre-training model.
  • adjusting the parameters of the generative model and the discriminative model of the determined pre-training model based on the discrimination result includes: determining whether the discrimination result meets the constraint condition; if the discrimination result does not meet the constraint condition, adjusting the parameters of the generative model and the discriminative model of the determined pre-training model based on the discrimination result, and training the determined pre-training model again based on the face sample set before the face change and the determined template face sample set; and if the discrimination result meets the constraint condition, determining that the face-changing model training is completed, and sending the face sample set after the face change last output by the generative model of the determined pre-training model to the user.
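  The constraint-checked fine-tuning loop above can be sketched as follows. The `generate`/`discriminate`/`update` interface, the 0.5 threshold, and the `ToyModel` stand-in are all illustrative assumptions; a real implementation would update GAN weights by gradient descent:

```python
def finetune_face_swap(model, faces_before, template_set,
                       threshold=0.5, max_steps=100):
    """Fine-tune a pre-trained generator/discriminator pair until the
    discrimination result meets the constraint condition."""
    swapped = None
    for _ in range(max_steps):
        swapped = model.generate(faces_before)             # faces after the swap
        score = model.discriminate(swapped, template_set)  # probability of being real
        if score >= threshold:                             # constraint condition met
            break
        model.update(score)                                # adjust G and D parameters
    return swapped                                         # last generated sample set

class ToyModel:
    """Stand-in for a pre-trained GAN, for illustration only."""
    def __init__(self):
        self.quality = 0.0
    def generate(self, faces):
        return [(f, self.quality) for f in faces]
    def discriminate(self, swapped, template_set):
        return self.quality
    def update(self, score):
        self.quality += 0.25  # pretend each update step improves realism
```

  Because fine-tuning starts from a pre-trained model whose target faces already resemble the user's faces, the loop typically needs far fewer steps than training from scratch, which is the efficiency gain the application claims.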
  • an embodiment of the present application provides an apparatus for training a face-changing model, including: a receiving unit configured to receive a face-changing model training request sent by a user, wherein the face-changing model training request includes a face sample set before the face change provided by the user and a designated template face identifier; a first determining unit configured to determine, from the pre-training model set corresponding to the template face identifier, a pre-training model matching the face sample set before the face change, wherein the pre-training model set includes models pre-trained based on a target face sample set group and the template face sample set group corresponding to the template face identifier; a second determining unit configured to determine, from the template face sample set group, a template face sample set that matches the face sample set before the face change; and a training unit configured to use a machine learning method to train the determined pre-training model based on the face sample set before the face change and the determined template face sample set, to obtain a face-changing model.
  • the first determining unit includes: a first determining subunit configured to, if there is a pre-training model corresponding to the template face identifier in the user's historical face-changing records, determine the pre-training model corresponding to the template face identifier as the pre-training model matching the face sample set before the face change.
  • the first determining unit further includes: a recognition subunit configured to recognize the face attribute information of the face sample set before the face change if there is no pre-training model corresponding to the template face identifier in the user's historical face-changing records; and a second determining subunit configured to determine the pre-training model from the pre-training model set based on the recognized face attribute information.
  • the face attribute information includes information in at least one of the following dimensions: gender, age group, race, facial accessories, and face shape.
  • the recognition subunit includes: a first classification module configured to input the face sample set before the face change into a pre-trained first classification model to obtain the gender, age group, race, and facial accessories of the face sample set before the face change, where the first classification model is a classification model based on a convolutional neural network.
  • the recognition subunit includes: an extraction module configured to extract the face classification features of the face sample set before the face change; and a second classification module configured to input the extracted face classification features into a pre-trained second classification model to obtain the face shape of the face sample set before the face change, where the second classification model is a classification model based on a support vector machine.
  • the extraction module is further configured to: extract the face feature point information of the face sample set before the face change; calculate the face measurement parameters of the face sample set before the face change based on the extracted face feature point information; and combine the extracted face feature point information and the calculated face measurement parameters into the face classification features of the face sample set before the face change.
  • the second determining subunit includes: a first determining module configured to determine, from the pre-training model set, a pre-training model subset matching the recognized face attribute information; a calculation module configured to calculate the similarity between the face sample set before the face change and the target face sample sets corresponding to the pre-training models in the pre-training model subset; and a second determining module configured to determine the pre-training model from the pre-training model subset based on the calculated similarity.
  • the calculation module is further configured to: extract the average face feature vector of the face sample set before the face change; and calculate the cosine similarity between the extracted average face feature vector and the average face feature vectors of the target face sample sets corresponding to the pre-training models in the pre-training model subset.
  • the second determining unit includes: an extraction subunit configured to extract the face richness features of the face sample set before the face change; a calculation subunit configured to calculate the matching degree between the extracted face richness features and the face richness features of the template face sample sets in the template face sample set group; and a third determining subunit configured to determine the template face sample set from the template face sample set group based on the calculated matching degree.
  • the extraction subunit is further configured to: extract the face feature information of the face sample set before the face change; and perform histogram statistics on the face feature information to obtain the face richness features of the face sample set before the face change.
  • the facial feature information includes information in at least one of the following dimensions: facial feature points, facial angles, and facial expressions.
  • the calculation subunit is further configured to: use a histogram matching method to calculate the matching degree between the extracted face richness features and the face richness features of the template face sample sets in the template face sample set group.
  • the third determining subunit is further configured to: if there is a template face sample set with a matching degree greater than a preset matching degree threshold in the template face sample set group, select the template face sample set with the highest matching degree from the template face sample set group; and if there is no template face sample set with a matching degree greater than the preset matching degree threshold in the template face sample set group, select a universal template face sample set from the template face sample set group.
  • the pre-training model set is trained by the following steps: acquiring multiple target face samples; dividing the multiple target face samples into a target face sample set group according to face attributes, where the face attributes of the target face samples in the same target face sample set are similar; and, for each target face sample set in the target face sample set group, training a generative adversarial network based on the target face sample set and the template face sample set matching the target face sample set, to obtain a pre-training model.
  • the pre-training model includes a generative model and a discriminative model; and the training unit includes: a generating subunit configured to input the face sample set before the face change into the generative model of the determined pre-training model to obtain a face sample set after the face change; a discrimination subunit configured to input the face sample set after the face change and the determined template face sample set into the discriminative model of the determined pre-training model to obtain a discrimination result, where the discrimination result is used to represent the probability that the face sample set after the face change and the determined template face sample set are the real sample set; and an adjustment subunit configured to adjust the parameters of the generative model and the discriminative model of the determined pre-training model based on the discrimination result.
  • the adjustment subunit is further configured to: determine whether the discrimination result meets the constraint condition; if the discrimination result does not meet the constraint condition, adjust the parameters of the generative model and the discriminative model of the determined pre-training model based on the discrimination result, and train the determined pre-training model again based on the face sample set before the face change and the determined template face sample set; and if the discrimination result meets the constraint condition, determine that the face-changing model training is completed, and send the face sample set after the face change last output by the generative model of the determined pre-training model to the user.
  • the embodiments of the present application provide a computer device, which includes: one or more processors; and a storage device on which one or more programs are stored; when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any implementation manner of the first aspect.
  • an embodiment of the present application provides a computer-readable medium on which a computer program is stored, and when the computer program is executed by a processor, the method as described in any implementation manner in the first aspect is implemented.
  • the method and device for training a face-changing model provided by the embodiments of the application first receive a face-changing model training request sent by a user; then determine, from the pre-training model set corresponding to the template face identifier in the training request, the pre-training model that matches the face sample set before the face change in the training request; then determine, from the template face sample set group corresponding to the template face identifier, the template face sample set that matches the face sample set before the face change in the training request; and finally use a machine learning method to train the determined pre-training model based on the face sample set before the face change and the determined template face sample set, to obtain the face-changing model.
  • using a pre-training model to train the face-changing model avoids training from scratch, reduces the time consumed by face-changing model training, and improves the training efficiency of the face-changing model.
  • this benefits the practical application and user experience of deep face-swapping technology.
  • Fig. 1 is an exemplary system architecture in which some embodiments of the present application can be applied;
  • Fig. 2 is a flowchart of an embodiment of a method for training a face-changing model according to the present application
  • Fig. 3 is a flowchart of another embodiment of the method for training a face-changing model according to the present application
  • Fig. 4 is a schematic structural diagram of a computer system suitable for implementing computer equipment of some embodiments of the present application.
  • Fig. 1 shows an exemplary system architecture 100 to which an embodiment of the method for training a face-changing model of the present application can be applied.
  • the system architecture 100 may include devices 101 and 102 and a network 103.
  • the network 103 is a medium used to provide a communication link between the devices 101 and 102.
  • the network 103 may include various connection types, such as wired or wireless communication links, fiber-optic cables, and so on.
  • the devices 101 and 102 may be hardware devices or software that support network connections to provide various network services.
  • the device can be various electronic devices, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, servers, and so on.
  • When implemented as a hardware device, it can be implemented as a distributed device group composed of multiple devices, or as a single device.
  • When the device is software, it can be installed in the electronic devices listed above.
  • As software, it can be implemented as multiple pieces of software or software modules for providing distributed services, or as a single piece of software or software module. There is no specific limitation here.
  • devices can provide corresponding network services by installing corresponding client applications or server applications.
  • After the device has installed a client application, it can act as a client in network communication.
  • After the device has installed a server application, it can act as a server in network communication.
  • the device 101 is embodied as a client, and the device 102 is embodied as a server.
  • the device 101 may be a client with image processing software installed, and the device 102 may be a server of the image processing software.
  • the method for training a face-changing model provided in the embodiment of the present application may be executed by the device 102.
  • FIG. 1 the number of networks and devices in FIG. 1 is merely illustrative. According to implementation needs, there can be any number of networks and devices.
  • FIG. 2 it shows a process 200 of an embodiment of the method for training a face-changing model according to the present application.
  • the method for training a face-changing model may include the following steps:
  • Step 201 Receive a face-changing model training request sent by a user.
  • the execution subject of the method for training a face-changing model may receive a face-changing model training request sent by a user.
  • the face-changing model training request may include the face sample set before the face-changing provided by the user and the designated template face identifier.
  • the face sample set before the face change may be a sample set in which the user wants to replace the face.
  • the face sample set before the face change may be one or more face images before the face change, or multiple video frames of a face video before the face change.
  • the template face may be the face that the user wants to replace.
  • the template face identifier can be composed of letters, numbers, symbols, etc., and uniquely identifies the template face.
  • image processing software may be installed on the user's terminal device (for example, the device 101 shown in FIG. 1).
  • the user can open the image processing software and enter the main page. Edit buttons can be set on the main page.
  • When the user clicks the edit button, the locally stored image list and/or video list can be displayed for the user to select.
  • If the user selects one or more images from the image list, the one or more images selected by the user can be determined as the face sample set before the face change provided by the user.
  • If the user selects a video from the video list, the video frames of the selected video can be determined as the face sample set before the face change provided by the user.
  • the user will enter the image processing page.
  • the face sample set before the face change can be displayed on the image processing page.
  • a face-changing button can also be set on the image processing page. When the user clicks the face-changing button, a list of template faces that can be replaced can be displayed. When the user selects a template face from the template face list, the template face selected by the user can be determined as the user-specified template face, and its identifier is the user-specified template face identifier.
  • the terminal device can send a face-changing model training request including the face sample set before the face-changing provided by the user and the designated template face identifier to the above-mentioned execution subject.
  • Step 202 Determine a pre-training model matching the set of face samples before the face change from the pre-training model set corresponding to the template face identifier.
  • the above-mentioned execution subject may determine a pre-training model matching the set of face samples before the face change from the set of pre-training models corresponding to the template face identifier designated by the user. For example, the above-mentioned execution subject may randomly select a pre-training model from a set of pre-training models corresponding to a template face identifier designated by the user.
  • if there is a pre-training model corresponding to the template face identifier in the user's historical face-changing records, the execution subject may determine the pre-training model corresponding to the template face identifier as the pre-training model matching the face sample set before the face change.
  • After the user performs a face change, a historical face-changing record is generated.
  • the historical face change record may record the template face identifier and the pre-training model identifier used during the historical face change process.
  • the above-mentioned execution subject may directly determine the pre-training model corresponding to the template face identifier as the pre-training model to be used this time.
  • a template face identifier corresponds to a pre-training model set.
  • the same pre-training model set can be used to train face-changing models of different face attribute information of the same template face.
  • the pre-trained model set of the same template face may include a pre-trained model based on the target face sample set group of the same target face and the template face sample set group of the same template face.
  • a pair of target face sample set and template face sample set can be used to train a pre-training model of the same face attribute information. It can be seen that the face attribute information of the target face samples in the same target face sample set is similar, and the face attributes of the template face samples in the sample set of the same template face are similar. In addition, the face attribute information of the target face sample set and the template face sample set used to train the same pre-training model are also similar.
  • face attribute information may include information of multiple dimensions.
  • face attribute information may include, but is not limited to, information of at least one of the following dimensions: gender (such as male or female), age group (such as teenager, middle-aged, or elderly), race (such as white, yellow, or black), facial accessories (such as whether facial accessories are worn), face shape (such as round, triangular, oval, or square), and so on.
  • the pre-training model set is trained through the following steps:
  • the multiple target face samples may be a batch of target face samples of the same target face.
  • the multiple target face samples are divided into target face sample set groups.
  • the face attribute information of the target face samples in the same target face sample set is similar.
  • the target face sample whose face attribute information is ⁇ male, middle-aged, yellow, no glasses, round face ⁇ belongs to a target face sample set.
  • the target face sample whose face attribute information is ⁇ male, middle-aged, yellow, wearing glasses, round face ⁇ belongs to another target face sample set.
  • each target face sample set will be marked with a corresponding label to record the corresponding face attribute information.
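The grouping described above can be sketched as follows; the attribute dimension names are illustrative assumptions, and each resulting group is one labeled target face sample set.

```python
from collections import defaultdict

def group_by_attributes(samples):
    """samples: list of (image, attributes) pairs, where attributes is a dict
    like {"gender": "male", "age": "middle-aged", ...}. Returns a mapping from
    an attribute label (a tuple) to the sample set sharing those attributes."""
    groups = defaultdict(list)
    for image, attrs in samples:
        # The label doubles as the mark recording the set's attribute info.
        label = (attrs["gender"], attrs["age"], attrs["race"],
                 attrs["glasses"], attrs["face_shape"])
        groups[label].append(image)
    return dict(groups)
```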
  • the generative adversarial network is trained based on the target face sample set and the template face sample set matching the target face sample set, to obtain a pre-training model.
  • the face attribute information of the template face samples in the same template face sample set is similar.
  • the face attribute information of the template face sample set matching the target face sample set is similar to the face attribute information of the target face sample set. For example, if the face attribute information of the target face sample set is {male, middle-aged, yellow race, no glasses, round face}, then the face attribute information of the matching template face sample set is most likely also {male, middle-aged, yellow race, no glasses, round face}.
  • Step 203 Determine a template face sample set matching the face sample set before the face change from the template face sample set group.
  • the above-mentioned execution subject may determine a template face sample set that matches the face sample set before the face change from the template face sample set group. For example, the above-mentioned execution subject can select, from the template face sample set group, a template face sample set whose face attribute information is similar to that of the face sample set before the face change, and determine it as the template face sample set matching the face sample set before the face change.
  • Step 204 Using a machine learning method, train the determined pre-training model based on the face sample set before the face change and the determined template face sample set to obtain the face change model.
  • the above-mentioned execution subject may use a machine learning method to train the determined pre-training model based on the face sample set before the face change and the determined template face sample set to obtain the face change model.
  • the above-mentioned execution subject may take the face sample set before the face change and the determined template face sample set as input, and obtain the corresponding output through the processing of the determined pre-training model. If the output does not satisfy the preset condition, the parameters of the determined pre-training model are adjusted, and the face sample set before the face change and the determined template face sample set are input again to continue training. If the output satisfies the preset condition, model training is completed.
  • the pre-training model is a trained generative adversarial network.
  • the pre-training model may include a trained generative model and a trained discriminant model.
  • the generative model mainly learns the distribution of real images so that the images it generates look more realistic, in order to fool the discriminant model.
  • the discriminant model judges whether a received image is real or generated.
  • the generative model strives to make the generated images more realistic, while the discriminant model strives to distinguish real images from generated ones. This process is equivalent to a two-player game.
  • through constant adversarial training of the generative model and the discriminant model, the two networks finally reach a dynamic equilibrium: the images generated by the generative model are close to the distribution of real images, and the discriminant model can no longer distinguish real images from fake ones.
  • the above-mentioned execution subject may train the face-changing model through the following steps:
  • the face sample set before the face change is input into the determined generation model of the pre-training model to obtain the face sample set after the face change.
  • the face sample set after the face change and the determined template face sample set are input into the determined discriminant model of the pre-training model, and the discriminant result is obtained.
  • the discrimination result can be used to characterize the probability that the face sample set after the face change and the determined template face sample set are the real sample set.
  • the parameters of the generation model and the discrimination model of the determined pre-training model are adjusted based on the discrimination result.
  • the above-mentioned execution subject will determine whether the discrimination result meets the constraint conditions. If the discrimination result does not satisfy the constraint condition, the above-mentioned execution subject may adjust the parameters of the generation model and the discrimination model of the determined pre-training model based on the discrimination result, and then train the determined pre-training model again based on the face sample set before the face change and the determined template face sample set. If the discrimination result satisfies the constraint condition, the above-mentioned execution subject may determine that the face-changing model training is completed, and send the face sample set after the face change, output by the generation model of the determined pre-training model, to the user. The face sample set last output by the generation model is the sample set in which the face before the face change has been replaced with the template face.
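A minimal, framework-free sketch of the fine-tuning loop in the steps above: generate swapped faces, score them against the template set, and adjust parameters until the discriminator's output meets the constraint (here taken, as an assumption, to be an output near 0.5, i.e. the discriminator can no longer tell real from generated). The generator, discriminator, and update rule are passed in as plain callables; a real system would use a deep generative adversarial network.

```python
def train_face_swap_model(pre_faces, template_faces, generator, discriminator,
                          update_params, target_real_prob=0.5, tol=0.05,
                          max_iters=1000):
    """Alternate: generate swapped faces, score them against the templates,
    adjust parameters until the discriminator output meets the constraint
    (close to target_real_prob, i.e. real and generated are indistinguishable)."""
    swapped = []
    for _ in range(max_iters):
        swapped = [generator(f) for f in pre_faces]
        prob_real = discriminator(swapped, template_faces)
        if abs(prob_real - target_real_prob) <= tol:
            return swapped  # constraint satisfied: training complete
        update_params(prob_real)  # adjust generator/discriminator weights
    return swapped
```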
  • in the solution described in this embodiment, a face-changing model training request sent by a user is first received; then, from the pre-training model set corresponding to the template face identifier in the request, a pre-training model matching the face sample set before the face change is determined; next, from the template face sample set group corresponding to the template face identifier, a template face sample set matching the face sample set before the face change is determined; finally, a machine learning method is used to train the determined pre-training model based on the face sample set before the face change and the determined template face sample set, to obtain the face-changing model.
  • using the pre-training model to train the face-changing model avoids training "from scratch", saves training time for the face-changing model, and improves its training efficiency.
  • this has a positive effect on the practical application and user experience of deep face-swapping technology.
  • FIG. 3 shows a process 300 of another embodiment of the method for training a face-changing model according to the present application.
  • the method for training a face-changing model may include the following steps:
  • Step 301 Receive a face-changing model training request sent by the user.
  • step 301 has been described in detail in step 201 in the embodiment shown in FIG. 2, and will not be repeated here.
  • Step 302 If there is no pre-training model corresponding to the template face identifier in the user's historical face change record, identify the face attribute information of the face sample set before the face change.
  • face attribute information of the face sample set before the face change may include information of multiple dimensions.
  • face attribute information may include, but is not limited to, information of at least one of the following dimensions: gender (such as male or female), age group (such as teenager, middle-aged, or elderly), race (such as white, yellow, or black), facial accessories (such as whether facial accessories are worn), face shape (such as round, triangular, oval, or square), and so on.
  • the above-mentioned execution subject may input the face sample set before the face change into a pre-trained first classification model to obtain information of at least one dimension among gender, age group, race, and facial accessories of the face sample set before the face change.
  • the first classification model may be obtained by training a classification model based on Convolutional Neural Networks (CNN), such as AlexNet, GoogleNet, or ResNet.
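A hedged sketch of using such a first classification model at inference time: the classifier itself is treated as an opaque callable (in practice a CNN such as a ResNet fine-tuned for these attributes), and the per-sample predictions are reduced to one attribute label for the whole sample set by majority vote. The voting step is an assumption for illustration, not stated in the text.

```python
from collections import Counter

def classify_attributes(face_samples, first_classifier):
    """first_classifier: any image -> dict of attribute predictions, e.g. a
    CNN wrapped as a callable. Returns the majority vote per dimension across
    the whole sample set, so one attribute label describes the set."""
    votes = {}
    for img in face_samples:
        for dim, value in first_classifier(img).items():
            votes.setdefault(dim, Counter())[value] += 1
    return {dim: counter.most_common(1)[0][0] for dim, counter in votes.items()}
```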
  • the above-mentioned execution subject may first extract the face shape classification features of the face sample set before the face change, and then input the extracted features into a pre-trained second classification model to obtain the face shape of the face sample set before the face change.
  • the second classification model may be obtained by training a classification model based on Support Vector Machine (SVM).
  • the face shape classification features may include facial feature point information and face measurement parameters.
  • the above-mentioned execution subject can first extract the facial feature point information of the face sample set before the face change; then, based on the extracted facial feature point information, calculate the face measurement parameters of the face sample set before the face change; and finally, combine the extracted facial feature point information and the calculated face measurement parameters into the face shape classification features of the face sample set before the face change.
  • the algorithm for extracting facial feature point information may include, but is not limited to, dlib, LBF, and so on.
  • the face measurement parameters calculated based on the facial feature point information may include, but are not limited to, face width (Wshape), mandible width (Wmandible), morphological face height (Hshape), and so on.
  • the face width can be equal to the Euclidean distance between the left and right zygomatic points.
  • the width of the mandible can be equal to the Euclidean distance between the left and right mandibular corner points.
  • the morphological face height can be equal to the Euclidean distance between the nasion point and the submental point.
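The three measurement parameters above are plain Euclidean distances between named feature points, so they can be computed directly from landmark coordinates (e.g. as returned by dlib). The landmark key names below are illustrative assumptions.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def face_measurements(landmarks):
    """landmarks: dict of named 2-D feature points. Returns the three
    measurement parameters from the text: face width, mandible width,
    and morphological face height."""
    return {
        "Wshape": euclidean(landmarks["zygomatic_left"], landmarks["zygomatic_right"]),
        "Wmandible": euclidean(landmarks["mandible_left"], landmarks["mandible_right"]),
        "Hshape": euclidean(landmarks["nasion"], landmarks["submental"]),
    }
```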
  • Step 303 Based on the recognized face attribute information, a pre-training model is determined from the pre-training model set.
  • the above-mentioned execution subject may determine the pre-training model from the pre-training model set based on the recognized face attribute information. For example, the above-mentioned execution subject may select a pre-training model that best matches the recognized face attribute information from a set of pre-training models.
  • the above-mentioned execution subject may first determine from the pre-training model set a subset of the pre-training model that matches the recognized face attribute information; and then calculate the face sample set before the face change and The similarity of the target face sample set corresponding to the pre-training model in the pre-training model subset; finally, based on the calculated similarity, the pre-training model is determined from the pre-training model subset.
  • the above-mentioned execution subject can first extract the average face feature vector of the face sample set before the face change, and then calculate the cosine similarity between the extracted average face feature vector and the average face feature vector of the target face sample set corresponding to each pre-training model in the pre-training model subset.
  • the algorithm for extracting the average face feature vector may be, for example, a face recognition algorithm (such as VggFace).
  • the target face sample set corresponding to the pre-training model is the target face sample set used when the pre-training model is pre-trained.
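The similarity computation described above can be sketched as follows: average the face feature vectors of each sample set (e.g. embeddings from a face recognition network such as VggFace), then rank candidate pre-training models by the cosine similarity between averages. The candidate representation used here is an assumed shape for illustration.

```python
import math

def average_feature(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def best_pretrained_model(pre_face_vecs, candidates):
    """candidates: list of (model_id, target_face_vectors). Pick the model
    whose target face sample set is most similar to the user's samples."""
    avg = average_feature(pre_face_vecs)
    return max(candidates,
               key=lambda c: cosine_similarity(avg, average_feature(c[1])))[0]
```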
  • Step 304 Extract the face richness features of the face sample set before the face change.
  • the above-mentioned execution subject may extract the face richness features of the face sample set before the face change.
  • the above-mentioned execution subject may first extract the face feature information of the face sample set before the face change; then perform histogram statistics on the face feature information to obtain the face sample before the face change Set of face richness characteristics.
  • the facial feature information may include, but is not limited to, information in at least one of the following dimensions: facial feature points, facial angles, facial expressions, and so on.
  • Methods for extracting facial feature information may include, but are not limited to, face detection, facial feature point extraction, facial angle recognition, facial expression recognition, and so on.
  • Step 305 Calculate the matching degree between the extracted face richness feature and the face richness feature of the template face sample set in the template face sample set group.
  • the above-mentioned execution subject may calculate the degree of matching between the extracted face richness feature and the face richness feature of the template face sample set in the template face sample set group.
  • the value of the matching degree is usually between 0 and 1, 0 means no match at all, and 1 means complete match.
  • the face richness features of the template face sample set can be extracted in advance; the extraction method is the same as that used for the face sample set before the face change, and will not be repeated here.
  • the above-mentioned execution subject may use the histogram matching method to calculate the extracted face richness features and the face richness of the template face sample set in the template face sample set group. The degree of matching of the feature.
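A sketch of the richness comparison under the stated assumptions: build a normalized histogram over one feature dimension (such as face angle), then score two histograms with histogram intersection, which yields a matching degree in [0, 1] as described above, 1 for identical distributions and 0 for disjoint ones. Binning choices are illustrative.

```python
def richness_histogram(values, bins, lo, hi):
    """Histogram statistics over one feature dimension (e.g. face angles),
    normalized so the bins sum to 1."""
    hist = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        hist[idx] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]

def histogram_match(h1, h2):
    """Histogram intersection of two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```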
  • Step 306 Determine a template face sample set from the template face sample set group based on the calculated matching degree.
  • the above-mentioned execution subject may determine the template face sample set from the template face sample set group based on the calculated matching degree. For example, the above-mentioned execution subject may select the template face sample set with the highest matching degree from the template face sample set group.
  • the above-mentioned execution subject may compare the matching degree of the template face sample set in the template face sample set group with a preset matching degree threshold (for example, 0.7). If there is a template face sample set with a matching degree greater than a preset matching degree threshold in the template face sample set group, the above-mentioned execution subject may select the template face sample set with the highest matching degree from the template face sample set group. If there is no template face sample set with a matching degree greater than the preset matching degree threshold in the template face sample set group, the above-mentioned execution subject may select a general template face sample set from the template face sample set group. Generally, a universal template face sample set is preset in the template face sample set group.
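The threshold-and-fallback selection above can be sketched directly; the threshold value and the presence of a preset universal template face sample set follow the text, while the identifier names are hypothetical.

```python
def select_template_set(match_degrees, threshold=0.7, universal_id="universal"):
    """match_degrees: dict template_set_id -> matching degree in [0, 1].
    Pick the best-matching set above the threshold, otherwise fall back to
    the preset universal template face sample set."""
    best_id = max(match_degrees, key=match_degrees.get)
    if match_degrees[best_id] > threshold:
        return best_id
    return universal_id
```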
  • Step 307 Use a machine learning method to train the determined pre-training model based on the face sample set before the face change and the determined template face sample set to obtain the face change model.
  • step 307 has been described in detail in step 204 in the embodiment shown in FIG. 2, and will not be repeated here.
  • the process 300 of the method for training a face-changing model in this embodiment highlights determining the pre-training model based on face attribute information and determining the template face sample set based on face richness features.
  • therefore, the solution described in this embodiment trains the pre-trained model whose face attribute information is most similar to that of the face sample set before the face change, which improves the face-changing effect of the trained face-changing model and makes its output more realistic.
  • FIG. 4 shows a schematic structural diagram of a computer system 400 suitable for implementing a computer device (for example, the device 102 shown in FIG. 1) of an embodiment of the present application.
  • the computer device shown in FIG. 4 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present application.
  • the computer system 400 includes a central processing unit (CPU) 401, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage section 408 into a random access memory (RAM) 403.
  • in the RAM 403, various programs and data required for the operation of the system 400 are also stored.
  • the CPU 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404.
  • An input/output (I/O) interface 405 is also connected to the bus 404.
  • the following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card or a modem. The communication section 409 performs communication processing via a network such as the Internet.
  • a drive 410 is also connected to the I/O interface 405 as needed.
  • a removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 410 as required, so that the computer program read from it is installed into the storage section 408 as required.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication part 409, and/or installed from the removable medium 411.
  • the computer-readable medium described in this application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • the computer-readable medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
  • the computer program code used to perform the operations of this application can be written in one or more programming languages or a combination thereof.
  • the programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computer, partly on the user's computer, executed as an independent software package, partly on the user's computer and partly executed on a remote computer, or entirely executed on the remote computer or electronic device.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, connected through the Internet using an Internet service provider).
  • each block in the flowchart or block diagram may represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the block may also occur in a different order from the order marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and any combination of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present application can be implemented in software or hardware.
  • the described unit may also be provided in the processor.
  • a processor includes a receiving unit, a first determining unit, a second determining unit, and a training unit.
  • the names of these units do not, in some cases, constitute a limitation on the units themselves.
  • the receiving unit can also be described as "a unit that receives a face-changing model training request sent by a user".
  • this application also provides a computer-readable medium.
  • the computer-readable medium may be included in the computer device described in the above-mentioned embodiments, or it may exist alone without being assembled into the computer device.
  • the above-mentioned computer-readable medium carries one or more programs.
  • when the above-mentioned one or more programs are executed by the computer device, the computer device: receives a face-changing model training request sent by a user, where the request includes a face sample set before the face change provided by the user and a specified template face identifier; determines, from the pre-training model set corresponding to the template face identifier, a pre-training model matching the face sample set before the face change, where the pre-training model set includes models pre-trained based on the target face sample set group and the template face sample set group corresponding to the template face identifier; determines, from the template face sample set group, a template face sample set matching the face sample set before the face change; and trains the determined pre-training model based on the face sample set before the face change and the determined template face sample set using a machine learning method, to obtain the face-changing model.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Disclosed in embodiments of the present application are a method and device for training a face swapping model. A specific embodiment of the method comprises: receiving a face swapping model training request sent by a user, wherein the face swapping model training request comprises a face sample set provided by the user before face swapping and a specified template face identifier; determining, from a pre-training model set corresponding to the template face identifier, a pre-training model matching the face sample set before face swapping, wherein the pre-training model set comprises a pre-trained model on the basis of a target face sample set group and a template face sample set group corresponding to the template face identifier; determining, from the template face sample set group, a template face sample set matching the face sample set before face swapping; and training the determined pre-training model on the basis of the face sample set before face swapping and the determined template face sample set by using a machine learning method to obtain a face swapping model. The embodiment saves the training time of the face swapping model and improves the training efficiency of the face swapping model.

Description

Method and device for training a face-changing model

Technical field
The embodiments of the present application relate to the field of computer technology, and in particular to a method and device for training a face-changing model.
Background art
In currently popular deep face-swapping frameworks, Generative Adversarial Network (GAN) techniques are usually used and can achieve satisfactory face generation results. In model training under a general generative adversarial network framework, although high-quality face generation can be guaranteed given sufficient samples and computing power, the problem of long training time remains, which affects the prospects of deep face-swapping technology in practical applications as well as the user experience.
Summary of the invention
The embodiments of the present application propose a method and device for training a face-changing model.
In a first aspect, an embodiment of the present application provides a method for training a face-changing model, including: receiving a face-changing model training request sent by a user, where the request includes a face sample set before the face change provided by the user and a specified template face identifier; determining, from a pre-training model set corresponding to the template face identifier, a pre-training model matching the face sample set before the face change, where the pre-training model set includes models pre-trained based on a target face sample set group and a template face sample set group corresponding to the template face identifier; determining, from the template face sample set group, a template face sample set matching the face sample set before the face change; and training the determined pre-training model based on the face sample set before the face change and the determined template face sample set using a machine learning method, to obtain a face-changing model.
In some embodiments, determining, from the pre-training model set corresponding to the template face identifier, the pre-training model matching the face sample set before the face change includes: if a pre-training model corresponding to the template face identifier exists in the user's historical face-change records, determining the pre-training model corresponding to the template face identifier as the pre-training model matching the face sample set before the face change.
In some embodiments, determining, from the pre-training model set corresponding to the template face identifier, the pre-training model matching the face sample set before the face change further includes: if no pre-training model corresponding to the template face identifier exists in the user's historical face-change records, identifying the face attribute information of the face sample set before the face change; and determining a pre-training model from the pre-training model set based on the identified face attribute information.
In some embodiments, the face attribute information includes information of at least one of the following dimensions: gender, age group, race, facial accessories, and face shape.
In some embodiments, identifying the face attribute information of the face sample set before the face change includes: inputting the face sample set before the face change into a pre-trained first classification model to obtain information of at least one dimension among gender, age group, race, and facial accessories of the face sample set before the face change, where the first classification model is a classification model based on a convolutional neural network.
In some embodiments, identifying the face attribute information of the face sample set before the face change includes: extracting the face shape classification features of the face sample set before the face change; and inputting the extracted face shape classification features into a pre-trained second classification model to obtain the face shape of the face sample set before the face change, where the second classification model is a classification model based on a support vector machine.
In some embodiments, extracting the face-shape classification features of the pre-swap face sample set includes: extracting facial landmark information of the pre-swap face sample set; calculating facial measurement parameters of the pre-swap face sample set based on the extracted facial landmark information; and merging the extracted facial landmark information and the calculated facial measurement parameters into the face-shape classification features of the pre-swap face sample set.
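The feature-merging step described above can be sketched as follows. This is a minimal illustration rather than part of the claimed embodiments: the landmark names and the two measurement parameters (width-to-height ratio and jaw-to-face width ratio) are assumptions, since the embodiments do not fix which measurement parameters are computed.

```python
import math

def measurement_params(landmarks):
    """Compute illustrative facial measurement parameters from 2D landmarks.

    `landmarks` maps point names to (x, y) coordinates; the keys used here
    (cheek_l, cheek_r, chin, brow_mid, jaw_l, jaw_r) are hypothetical names,
    not taken from the embodiments.
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    face_width = dist(landmarks["cheek_l"], landmarks["cheek_r"])
    face_height = dist(landmarks["brow_mid"], landmarks["chin"])
    jaw_width = dist(landmarks["jaw_l"], landmarks["jaw_r"])
    # Ratios are scale-invariant, so faces of different image sizes compare.
    return [face_width / face_height, jaw_width / face_width]

def face_shape_features(landmarks):
    # Merge the raw landmark coordinates with the measurement parameters
    # into a single face-shape classification feature vector.
    flat = [coord for point in landmarks.values() for coord in point]
    return flat + measurement_params(landmarks)
```

The merged vector would then be fed to the second classification model (the support-vector-machine classifier) to predict the face shape.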
In some embodiments, determining a pre-trained model from the pre-trained model set based on the recognized face attribute information includes: determining, from the pre-trained model set, a subset of pre-trained models matching the recognized face attribute information; calculating the similarity between the pre-swap face sample set and the target face sample sets corresponding to the pre-trained models in the subset; and determining a pre-trained model from the subset based on the calculated similarities.
In some embodiments, calculating the similarity between the pre-swap face sample set and the target face sample sets corresponding to the pre-trained models in the subset includes: extracting an average face feature vector of the pre-swap face sample set; and calculating the cosine similarity between the extracted average face feature vector and the average face feature vector of the target face sample set corresponding to each pre-trained model in the subset.
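The model-selection pipeline of the two preceding embodiments (filter by attribute information, then rank by cosine similarity of average face feature vectors) can be sketched as follows; the dictionary layout of `models` and the exact-match attribute test are illustrative assumptions.

```python
import math

def cosine_similarity(u, v):
    # Standard cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average_feature(vectors):
    # Mean face feature vector over a sample set.
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def pick_pretrained_model(models, attrs, query_vectors):
    """Pick the pre-trained model whose target face sample set is most
    similar to the pre-swap sample set.

    `models` is a list of dicts with hypothetical keys "attrs" (face
    attribute information) and "avg_vec" (average feature vector of the
    model's target face sample set).
    """
    subset = [m for m in models if m["attrs"] == attrs]
    query = average_feature(query_vectors)
    return max(subset, key=lambda m: cosine_similarity(query, m["avg_vec"]))
```

In a real system the per-face feature vectors would come from a face recognition network; here they are plain lists so the selection logic stands on its own.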
In some embodiments, determining, from the group of template face sample sets, a template face sample set matching the pre-swap face sample set includes: extracting face richness features of the pre-swap face sample set; calculating the matching degree between the extracted face richness features and the face richness features of each template face sample set in the group; and determining a template face sample set from the group based on the calculated matching degrees.
In some embodiments, extracting the face richness features of the pre-swap face sample set includes: extracting face feature information of the pre-swap face sample set; and performing histogram statistics on the face feature information to obtain the face richness features of the pre-swap face sample set.
In some embodiments, the face feature information includes information in at least one of the following dimensions: facial landmarks, face angles, and facial expressions.
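The histogram statistics described in the two preceding embodiments can be sketched as follows. The bin boundaries and the use of a single dimension (face angle) are illustrative assumptions; the embodiments allow histograms over any of the listed feature dimensions.

```python
def richness_histogram(samples, bins):
    """Histogram statistics over one dimension of face feature information.

    `samples` is a list of per-sample values (e.g., yaw angles in degrees);
    `bins` is a list of (low, high) intervals. Returns a normalized
    histogram describing how richly the sample set covers that dimension.
    """
    counts = [0] * len(bins)
    for value in samples:
        for i, (lo, hi) in enumerate(bins):
            if lo <= value < hi:
                counts[i] += 1
                break
    total = sum(counts) or 1  # avoid division by zero on an empty set
    return [c / total for c in counts]
```

A sample set containing many poses and expressions yields a flatter histogram, which is why the histogram serves as a "richness" feature.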
In some embodiments, calculating the matching degree between the extracted face richness features and the face richness features of the template face sample sets in the group includes: calculating that matching degree using a histogram matching method.
In some embodiments, determining a template face sample set from the group based on the calculated matching degrees includes: if the group contains template face sample sets whose matching degree is greater than a preset matching-degree threshold, selecting the template face sample set with the highest matching degree from the group; and if the group contains no template face sample set whose matching degree is greater than the preset threshold, selecting a general-purpose template face sample set from the group.
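The matching and selection rule above can be sketched as follows. Histogram intersection is used here as one concrete histogram matching method; the embodiments do not prescribe a particular one, and the `fallback_id` key naming the general-purpose template set is an assumption.

```python
def histogram_intersection(h1, h2):
    # A common histogram-matching measure: the intersection of two
    # normalized histograms, in [0, 1], with 1 meaning identical shapes.
    return sum(min(a, b) for a, b in zip(h1, h2))

def select_template_set(query_hist, template_sets, threshold,
                        fallback_id="generic"):
    """Select a template face sample set per the rule above: the best
    match if any exceeds the threshold, otherwise the general-purpose set.

    `template_sets` maps a set identifier to its richness histogram.
    """
    scores = {sid: histogram_intersection(query_hist, hist)
              for sid, hist in template_sets.items() if sid != fallback_id}
    best_id = max(scores, key=scores.get)
    if scores[best_id] > threshold:
        return best_id
    return fallback_id
```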
In some embodiments, the pre-trained model set is trained through the following steps: acquiring multiple target face samples; dividing the multiple target face samples into a group of target face sample sets according to face attributes, where the target face samples in the same target face sample set have similar face attributes; and, for each target face sample set in the group, training a generative adversarial network based on that target face sample set and the template face sample set matching it, to obtain a pre-trained model.
In some embodiments, the pre-trained model includes a generative model and a discriminative model; and training the determined pre-trained model using a machine learning method based on the pre-swap face sample set and the determined template face sample set to obtain the face-swapping model includes: inputting the pre-swap face sample set into the generative model of the determined pre-trained model to obtain a post-swap face sample set; inputting the post-swap face sample set and the determined template face sample set into the discriminative model of the determined pre-trained model to obtain a discrimination result, where the discrimination result represents the probability that the post-swap face sample set and the determined template face sample set are real sample sets; and adjusting parameters of the generative model and the discriminative model of the determined pre-trained model based on the discrimination result.
In some embodiments, adjusting the parameters of the generative model and the discriminative model of the determined pre-trained model based on the discrimination result includes: determining whether the discrimination result satisfies a constraint condition; if it does not, adjusting the parameters of the generative model and the discriminative model based on the discrimination result, and training the determined pre-trained model again based on the pre-swap face sample set and the determined template face sample set; and if it does, determining that training of the face-swapping model is complete, and sending the post-swap face sample set last output by the generative model of the determined pre-trained model to the user.
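The adversarial fine-tuning described in the two preceding embodiments reduces to the control flow below. The model-specific operations are injected as callables because the embodiments do not fix a framework; every name here is hypothetical, and a real system would implement the callables with a deep-learning library.

```python
def finetune_face_swap(generate, discriminate, adjust_params,
                       pre_swap_set, template_set,
                       constraint_met, max_rounds=1000):
    """Control flow of the fine-tuning loop described above.

    generate(pre_swap_set)          -> post-swap sample set
    discriminate(swapped, template) -> discrimination result (probability
                                       the inputs are real samples)
    adjust_params(result)           -> update generator/discriminator params
    constraint_met(result)          -> True when training may stop
    """
    swapped = result = None
    for _ in range(max_rounds):
        swapped = generate(pre_swap_set)
        result = discriminate(swapped, template_set)
        if constraint_met(result):
            # Training complete: the last post-swap sample set is what
            # gets sent back to the user.
            return swapped, result
        adjust_params(result)
    return swapped, result
```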
In a second aspect, an embodiment of the present application provides an apparatus for training a face-swapping model, including: a receiving unit configured to receive a face-swapping model training request sent by a user, where the request includes a pre-swap face sample set provided by the user and a designated template face identifier; a first determining unit configured to determine, from a pre-trained model set corresponding to the template face identifier, a pre-trained model matching the pre-swap face sample set, where the pre-trained model set includes models pre-trained based on a group of target face sample sets and the group of template face sample sets corresponding to the template face identifier; a second determining unit configured to determine, from the group of template face sample sets, a template face sample set matching the pre-swap face sample set; and a training unit configured to train the determined pre-trained model using a machine learning method based on the pre-swap face sample set and the determined template face sample set, to obtain the face-swapping model.
In some embodiments, the first determining unit includes: a first determining subunit configured to, if a pre-trained model corresponding to the template face identifier exists in the user's historical face-swapping records, determine that pre-trained model as the pre-trained model matching the pre-swap face sample set.
In some embodiments, the first determining unit further includes: a recognition subunit configured to recognize the face attribute information of the pre-swap face sample set if no pre-trained model corresponding to the template face identifier exists in the user's historical face-swapping records; and a second determining subunit configured to determine a pre-trained model from the pre-trained model set based on the recognized face attribute information.
In some embodiments, the face attribute information includes information in at least one of the following dimensions: gender, age group, race, facial accessories, and face shape.
In some embodiments, the recognition subunit includes: a first classification module configured to input the pre-swap face sample set into a pre-trained first classification model to obtain information in at least one of the dimensions of gender, age group, race, and facial accessories for the pre-swap face sample set, where the first classification model is a classification model based on a convolutional neural network.
In some embodiments, the recognition subunit includes: an extraction module configured to extract face-shape classification features of the pre-swap face sample set; and a second classification module configured to input the extracted face-shape classification features into a pre-trained second classification model to obtain the face shape of the pre-swap face sample set, where the second classification model is a classification model based on a support vector machine.
In some embodiments, the extraction module is further configured to: extract facial landmark information of the pre-swap face sample set; calculate facial measurement parameters of the pre-swap face sample set based on the extracted facial landmark information; and merge the extracted facial landmark information and the calculated facial measurement parameters into the face-shape classification features of the pre-swap face sample set.
In some embodiments, the second determining subunit includes: a first determining module configured to determine, from the pre-trained model set, a subset of pre-trained models matching the recognized face attribute information; a calculation module configured to calculate the similarity between the pre-swap face sample set and the target face sample sets corresponding to the pre-trained models in the subset; and a second determining module configured to determine a pre-trained model from the subset based on the calculated similarities.
In some embodiments, the calculation module is further configured to: extract an average face feature vector of the pre-swap face sample set; and calculate the cosine similarity between the extracted average face feature vector and the average face feature vector of the target face sample set corresponding to each pre-trained model in the subset.
In some embodiments, the second determining unit includes: an extraction subunit configured to extract face richness features of the pre-swap face sample set; a calculation subunit configured to calculate the matching degree between the extracted face richness features and the face richness features of each template face sample set in the group of template face sample sets; and a third determining subunit configured to determine a template face sample set from the group based on the calculated matching degrees.
In some embodiments, the extraction subunit is further configured to: extract face feature information of the pre-swap face sample set; and perform histogram statistics on the face feature information to obtain the face richness features of the pre-swap face sample set.
In some embodiments, the face feature information includes information in at least one of the following dimensions: facial landmarks, face angles, and facial expressions.
In some embodiments, the calculation subunit is further configured to calculate, using a histogram matching method, the matching degree between the extracted face richness features and the face richness features of each template face sample set in the group of template face sample sets.
In some embodiments, the third determining subunit is further configured to: if the group of template face sample sets contains template face sample sets whose matching degree is greater than a preset matching-degree threshold, select the template face sample set with the highest matching degree from the group; and if the group contains no template face sample set whose matching degree is greater than the preset threshold, select a general-purpose template face sample set from the group.
In some embodiments, the pre-trained model set is trained through the following steps: acquiring multiple target face samples; dividing the multiple target face samples into a group of target face sample sets according to face attributes, where the target face samples in the same target face sample set have similar face attributes; and, for each target face sample set in the group, training a generative adversarial network based on that target face sample set and the template face sample set matching it, to obtain a pre-trained model.
In some embodiments, the pre-trained model includes a generative model and a discriminative model; and the training unit includes: a generating subunit configured to input the pre-swap face sample set into the generative model of the determined pre-trained model to obtain a post-swap face sample set; a discriminating subunit configured to input the post-swap face sample set and the determined template face sample set into the discriminative model of the determined pre-trained model to obtain a discrimination result, where the discrimination result represents the probability that the post-swap face sample set and the determined template face sample set are real sample sets; and an adjusting subunit configured to adjust parameters of the generative model and the discriminative model of the determined pre-trained model based on the discrimination result.
In some embodiments, the adjusting subunit is further configured to: determine whether the discrimination result satisfies a constraint condition; if it does not, adjust the parameters of the generative model and the discriminative model of the determined pre-trained model based on the discrimination result, and train the determined pre-trained model again based on the pre-swap face sample set and the determined template face sample set; and if it does, determine that training of the face-swapping model is complete, and send the post-swap face sample set last output by the generative model of the determined pre-trained model to the user.
In a third aspect, an embodiment of the present application provides a computer device, including: one or more processors; and a storage apparatus storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any implementation of the first aspect.
第四方面,本申请实施例提供了一种计算机可读介质,其上存储有计算机 程序,该计算机程序被处理器执行时实现如第一方面中任一实现方式描述的方法。In the fourth aspect, an embodiment of the present application provides a computer-readable medium on which a computer program is stored, and when the computer program is executed by a processor, the method as described in any implementation manner in the first aspect is implemented.
The method and device for training a face-swapping model provided by the embodiments of this application first receive a face-swapping model training request sent by a user; then determine, from the pre-trained model set corresponding to the template face identifier in the request, a pre-trained model matching the pre-swap face sample set in the request; then determine, from the group of template face sample sets corresponding to the template face identifier, a template face sample set matching the pre-swap face sample set in the request; and finally train the determined pre-trained model using a machine learning method based on the pre-swap face sample set and the determined template face sample set, to obtain a face-swapping model. Training the face-swapping model from a pre-trained model avoids training from scratch, saves training time, and improves training efficiency, which in turn benefits deep face-swapping technology in practical applications and user experience.
Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent upon reading the detailed description of non-limiting embodiments made with reference to the following drawings:
Fig. 1 is an exemplary system architecture to which some embodiments of the present application can be applied;
Fig. 2 is a flowchart of an embodiment of the method for training a face-swapping model according to the present application;
Fig. 3 is a flowchart of another embodiment of the method for training a face-swapping model according to the present application;
Fig. 4 is a schematic structural diagram of a computer system suitable for implementing the computer device of some embodiments of the present application.
Detailed Description
The application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the related invention, not to limit it. It should also be noted that, for ease of description, only the parts related to the relevant invention are shown in the drawings.
It should be noted that the embodiments in this application and the features in the embodiments can be combined with each other as long as there is no conflict. The application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which an embodiment of the method for training a face-swapping model of the present application can be applied.
As shown in Fig. 1, the system architecture 100 may include devices 101 and 102 and a network 103. The network 103 is the medium used to provide a communication link between the devices 101 and 102, and may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
The devices 101 and 102 may be hardware devices or software that support network connections and thereby provide various network services. When a device is hardware, it may be any of various electronic devices, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, and servers. In that case, the hardware device may be implemented as a distributed device group composed of multiple devices, or as a single device. When a device is software, it may be installed in any of the electronic devices listed above. In that case, the software may be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. No specific limitation is made here.
In practice, a device can provide corresponding network services by installing a corresponding client application or server application. After installing a client application, a device can act as a client in network communication; correspondingly, after installing a server application, it can act as a server.
As an example, in Fig. 1 the device 101 acts as a client and the device 102 acts as a server. Specifically, the device 101 may be a client on which image processing software is installed, and the device 102 may be the server of the image processing software.
It should be noted that the method for training a face-swapping model provided in the embodiments of the present application may be executed by the device 102.
It should be understood that the numbers of networks and devices in Fig. 1 are merely illustrative. There may be any number of networks and devices according to implementation needs.
Continuing to refer to Fig. 2, which shows a flow 200 of an embodiment of the method for training a face-swapping model according to the present application, the method may include the following steps:
Step 201: receive a face-swapping model training request sent by a user.
In this embodiment, the execution subject of the method for training a face-swapping model (for example, the device 102 shown in Fig. 1) may receive a face-swapping model training request sent by a user. The request may include a pre-swap face sample set provided by the user and a designated template face identifier. The pre-swap face sample set is the sample set whose faces the user wants to replace; it may be one or more pre-swap face images, or multiple video frames of a pre-swap face video. The template face is the face the user wants to swap in. The template face identifier may consist of letters, digits, symbols, and so on, and uniquely identifies the template face.
Typically, image processing software may be installed on the user's terminal device (for example, the device 101 shown in Fig. 1). The user can open the image processing software and enter its main page, which may provide an edit button. When the user taps the edit button, a locally stored image list and/or video list can be displayed for selection. When the user selects one or more images from the image list, those images can be determined as the pre-swap face sample set provided by the user; when the user selects a video from the video list, the frames of that video can be determined as the pre-swap face sample set. After the user selects the pre-swap face sample set, an image processing page is entered, on which the pre-swap face sample set can be displayed. The image processing page may also provide a face-swap button. When the user taps it, a list of available template faces can be displayed. When the user selects a template face from the list, that face is determined as the user-designated template face, and its identifier as the user-designated template face identifier. Finally, after the user selects the template face, the terminal device can send the execution subject a face-swapping model training request that includes the pre-swap face sample set provided by the user and the designated template face identifier.
Step 202: determine, from the pre-trained model set corresponding to the template face identifier, a pre-trained model matching the pre-swap face sample set.
In this embodiment, the execution subject may determine, from the pre-trained model set corresponding to the user-designated template face identifier, a pre-trained model matching the pre-swap face sample set. For example, the execution subject may randomly select a pre-trained model from that set.
In some optional implementations of this embodiment, if a pre-trained model corresponding to the template face identifier exists in the user's historical face-swapping records, the execution subject may determine that pre-trained model as the one matching the pre-swap face sample set. Typically, after a user trains a face-swapping model from a pre-trained model and performs a face swap, a historical face-swapping record is generated, recording the template face identifier designated and the pre-trained model identifier used in that swap. Thus, if the user's historical records contain a pre-trained model identifier corresponding to the designated template face identifier, the user has previously trained a face-swapping model from the corresponding pre-trained model, and the execution subject can directly determine that pre-trained model as the one to use for this request.
Typically, each template face identifier corresponds to one pre-trained model set, and the same pre-trained model set can be used to train face-swapping models for different face attribute information of the same template face. The pre-trained model set of a template face may include models pre-trained on pairs of sample sets: a target face sample set from the group of sample sets of the same target face, and a template face sample set from the group of sample sets of that template face. Each such pair is used to train the pre-trained model for one kind of face attribute information. Accordingly, target face samples within the same target face sample set have similar face attribute information, template face samples within the same template face sample set have similar face attributes, and the target face sample set and template face sample set used to train the same pre-trained model also have similar face attribute information.
Generally, face attribute information may include information of multiple dimensions. For example, face attribute information may include, but is not limited to, information of at least one of the following dimensions: gender (e.g., male, female), age group (e.g., youth, middle age, old age), race (e.g., Caucasian, Asian, African), facial accessories (e.g., whether facial accessories are worn), face shape (e.g., round, triangular, oval, square), and so on.
In some optional implementations of this embodiment, the pre-trained model set is trained through the following steps:
First, multiple target face samples are obtained.
Here, the multiple target face samples may be a batch of target face samples of the same target face.
Then, the multiple target face samples are partitioned into a group of target face sample sets according to face attributes.
The face attribute information of the target face samples within one target face sample set is similar. For example, target face samples whose face attribute information is {male, middle-aged, Asian, no glasses, round face} belong to one target face sample set, while target face samples whose face attribute information is {male, middle-aged, Asian, glasses, round face} belong to another. In addition, each target face sample set is tagged with a corresponding label recording its face attribute information.
Finally, for each target face sample set in the group, a generative adversarial network (GAN) is trained based on that target face sample set and the template face sample set matching it, obtaining a pre-trained model.
The face attribute information of the template face samples within one template face sample set is similar, and the face attribute information of the template face sample set matching a target face sample set is similar to that of the target face sample set. For example, if the face attribute information of the target face sample set is {male, middle-aged, Asian, no glasses, round face}, the matching template face sample set very probably has the face attribute information {male, middle-aged, Asian, no glasses, round face} as well.
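The partitioning step above amounts to grouping samples by their attribute tuple. A minimal sketch, in which the attribute names and the sample representation are assumptions of this example:

```python
from collections import defaultdict

def partition_by_attributes(samples):
    """Group face samples into sample sets keyed by their attribute tuple,
    e.g. (gender, age_group, race, accessories, face_shape). The key doubles
    as the label recording the set's face attribute information."""
    sets = defaultdict(list)
    for sample in samples:  # each sample: {"attrs": (...), "image": ...}
        sets[sample["attrs"]].append(sample)
    return dict(sets)  # attribute tuple -> target face sample set
```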
Step 203: determine, from the group of template face sample sets, a template face sample set matching the pre-swap face sample set.
In this embodiment, the above-mentioned execution body may determine, from the group of template face sample sets, a template face sample set matching the pre-swap face sample set. For example, the execution body may select, from the group, a template face sample set whose face attribute information is similar to that of the pre-swap face sample set, and determine it as the matching template face sample set.
Step 204: using a machine learning method, train the determined pre-trained model based on the pre-swap face sample set and the determined template face sample set to obtain the face-swapping model.
In this embodiment, the above-mentioned execution body may use a machine learning method to train the determined pre-trained model based on the pre-swap face sample set and the determined template face sample set, obtaining the face-swapping model. Specifically, the execution body may take the pre-swap face sample set and the determined template face sample set as input, and obtain the corresponding output through the determined pre-trained model. If the output does not satisfy a preset condition, the parameters of the determined pre-trained model are adjusted, and the pre-swap face sample set and the determined template face sample set are input again to continue training. If the output satisfies the preset condition, model training is complete.
In practice, since the pre-trained model is a trained generative adversarial network, it may include a trained generative model and a trained discriminative model. The generative model learns the distribution of real images so that the images it generates become realistic enough to fool the discriminative model, while the discriminative model judges whether the images it receives are real or fake. Throughout the process, the generative model strives to make its generated images more realistic while the discriminative model strives to tell real images from fake ones; the process amounts to a two-player game. As training proceeds, the two models confront each other continuously until the two networks reach a dynamic equilibrium: the images produced by the generative model are close to the real image distribution, and the discriminative model can no longer distinguish real images from fake ones.
In some optional implementations of this embodiment, the above-mentioned execution body may train the face-swapping model through the following steps:
First, the pre-swap face sample set is input into the generative model of the determined pre-trained model to obtain a post-swap face sample set.
Then, the post-swap face sample set and the determined template face sample set are input into the discriminative model of the determined pre-trained model to obtain a discrimination result.
The discrimination result may be used to characterize the probability that the post-swap face sample set and the determined template face sample set are real sample sets.
Finally, the parameters of the generative model and the discriminative model of the determined pre-trained model are adjusted based on the discrimination result.
Here, each time a discrimination result is obtained, the above-mentioned execution body determines whether it satisfies a constraint condition. If the discrimination result does not satisfy the constraint condition, the execution body may adjust the parameters of the generative model and the discriminative model based on the discrimination result, and then train the determined pre-trained model again based on the pre-swap face sample set and the determined template face sample set. If the discrimination result satisfies the constraint condition, the execution body may determine that training of the face-swapping model is complete, and send the user the post-swap face sample set last output by the generative model, which is the sample set in which the pre-swap faces have been replaced with the template face.
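For illustration, the adversarial fine-tuning loop in the steps above may be sketched schematically as follows; the callables standing in for the generative model, the discriminative model, the parameter update, and the constraint check are assumptions of this example, not part of the embodiment:

```python
def finetune_face_swap(generator, discriminator, update, pre_swap, template,
                       satisfied, max_rounds=100):
    """Schematic loop over the steps above: generate post-swap samples,
    discriminate against the template set, and either stop (constraint met)
    or update the GAN parameters and repeat."""
    swapped = None
    for _ in range(max_rounds):
        swapped = generator(pre_swap)              # post-swap face sample set
        result = discriminator(swapped, template)  # probability of "real"
        if satisfied(result):                      # constraint condition met
            return swapped                         # last generator output
        update(result)                             # adjust G and D parameters
    return swapped
```

The returned value corresponds to the post-swap face sample set that the execution body sends back to the user once the constraint condition is satisfied.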
In the method for training a face-swapping model provided by the embodiments of the present application, a face-swapping model training request sent by a user is first received; a pre-trained model matching the pre-swap face sample set in the request is then determined from the pre-trained model set corresponding to the template face identifier in the request; next, a template face sample set matching the pre-swap face sample set in the request is determined from the group of template face sample sets corresponding to the template face identifier; and finally, using a machine learning method, the determined pre-trained model is trained based on the pre-swap face sample set and the determined template face sample set to obtain the face-swapping model. Training the face-swapping model from a pre-trained model avoids training from scratch, reduces the training time of the face-swapping model, and improves its training efficiency, thereby benefiting deep face-swapping technology in both practical application and user experience.
With further reference to FIG. 3, a flow 300 of another embodiment of the method for training a face-swapping model according to the present application is shown. The method for training a face-swapping model may include the following steps:
Step 301: receive a face-swapping model training request sent by a user.
In this embodiment, the specific operation of step 301 has been described in detail in step 201 of the embodiment shown in FIG. 2, and will not be repeated here.
Step 302: if no pre-trained model corresponding to the template face identifier exists in the user's historical face-swapping records, identify the face attribute information of the pre-swap face sample set.
In this embodiment, if no pre-trained model corresponding to the template face identifier exists in the user's historical face-swapping records, the execution body of the method for training the face-swapping model (for example, the device 102 shown in FIG. 1) may identify the face attribute information of the pre-swap face sample set. Generally, face attribute information may include information of multiple dimensions, for example, information of at least one of the following dimensions: gender (e.g., male, female), age group (e.g., youth, middle age, old age), race (e.g., Caucasian, Asian, African), facial accessories (e.g., whether facial accessories are worn), face shape (e.g., round, triangular, oval, square), and so on.
In some optional implementations of this embodiment, the above-mentioned execution body may input the pre-swap face sample set into a pre-trained first classification model to obtain information of at least one of the following dimensions of the pre-swap face sample set: gender, age group, race, and facial accessories. Since gender, age group, race, and facial accessories are all classification problems, the first classification model may be obtained by training a classification model based on convolutional neural networks (CNN), such as AlexNet, GoogLeNet, or ResNet.
In some optional implementations of this embodiment, the above-mentioned execution body may first extract face-shape classification features from the pre-swap face sample set, and then input the extracted face-shape classification features into a pre-trained second classification model to obtain the face shape of the pre-swap face sample set. The second classification model may be obtained by training a classification model based on a support vector machine (SVM).
In some optional implementations of this embodiment, the face-shape classification features may include facial landmark information and facial measurement parameters. In this case, the above-mentioned execution body may first extract facial landmark information from the pre-swap face sample set; then, based on the extracted landmark information, compute the facial measurement parameters of the pre-swap face sample set; and finally combine the extracted landmark information and the computed measurement parameters into the face-shape classification features of the pre-swap face sample set. Algorithms for extracting facial landmark information may include, but are not limited to, dlib, LBF, and so on. The facial measurement parameters computed from the landmark information may include, but are not limited to, face width (Wshape), mandible width (Wmandible), morphological face height (Hshape), and so on. The face width may equal the Euclidean distance between the left and right zygomatic points; the mandible width may equal the Euclidean distance between the left and right gonion (mandibular angle) points; and the morphological face height may equal the Euclidean distance between the nasion point and the menton (submental) point.
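The three facial measurement parameters above reduce to Euclidean distances between named landmarks. A minimal sketch, where the landmark names and 2D coordinates are assumptions of this example (in practice they would come from a landmark extractor such as dlib):

```python
import math

def euclidean(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def face_measurements(landmarks):
    """Compute face width, mandible width, and morphological face height
    from a dict of named 2D landmark points."""
    return {
        "Wshape": euclidean(landmarks["zygion_left"], landmarks["zygion_right"]),
        "Wmandible": euclidean(landmarks["gonion_left"], landmarks["gonion_right"]),
        "Hshape": euclidean(landmarks["nasion"], landmarks["menton"]),
    }
```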
Step 303: determine a pre-trained model from the pre-trained model set based on the identified face attribute information.
In this embodiment, the above-mentioned execution body may determine a pre-trained model from the pre-trained model set based on the identified face attribute information. For example, the execution body may select, from the pre-trained model set, the pre-trained model that best matches the identified face attribute information.
In some optional implementations of this embodiment, the above-mentioned execution body may first determine, from the pre-trained model set, a subset of pre-trained models matching the identified face attribute information; then compute the similarity between the pre-swap face sample set and the target face sample set corresponding to each pre-trained model in the subset; and finally determine a pre-trained model from the subset based on the computed similarities. Generally, the execution body may first extract the average face feature vector of the pre-swap face sample set, and then compute the cosine similarity between this vector and the average face feature vector of the target face sample set corresponding to each pre-trained model in the subset. The algorithm for extracting the average face feature vector may be, for example, a face recognition algorithm (such as VGGFace). The target face sample set corresponding to a pre-trained model is the target face sample set that was used when that model was pre-trained.
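The averaging and cosine-similarity computation above can be sketched as follows; the plain-list vector representation is an assumption of this example (the per-face embeddings would come from a recognition network such as VGGFace):

```python
import math

def average_vector(vectors):
    """Element-wise mean of a sample set's per-face feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)
```

The pre-trained model whose target face sample set yields the highest cosine similarity would then be selected from the subset.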
Step 304: extract face richness features of the pre-swap face sample set.
In this embodiment, the above-mentioned execution body may extract the face richness features of the pre-swap face sample set.
In some optional implementations of this embodiment, the above-mentioned execution body may first extract face feature information from the pre-swap face sample set, and then compute histogram statistics over the face feature information to obtain the face richness features of the pre-swap face sample set. The face feature information may include, but is not limited to, information of at least one of the following dimensions: facial landmarks, face angle, facial expression, and so on. Methods for extracting face feature information may include, but are not limited to, face detection, facial landmark extraction, face angle recognition, facial expression recognition, and so on.
Step 305: compute the matching degree between the extracted face richness features and the face richness features of each template face sample set in the group of template face sample sets.
In this embodiment, the above-mentioned execution body may compute the matching degree between the extracted face richness features and the face richness features of each template face sample set in the group. The matching degree usually takes a value between 0 and 1, where 0 indicates no match at all and 1 indicates a complete match. It should be noted that the face richness features of a template face sample set may be extracted in advance, using the same extraction method as for the pre-swap face sample set, which will not be repeated here.
In some optional implementations of this embodiment, the above-mentioned execution body may use a histogram matching method to compute the matching degree between the extracted face richness features and the face richness features of each template face sample set in the group.
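One concrete histogram matching method, used here purely as an illustration (the patent does not name a specific one), is normalized histogram intersection, which yields a matching degree in [0, 1] as described above:

```python
def histogram_match(h1, h2):
    """Normalized histogram intersection: 0 means no overlap, 1 means the
    normalized histograms are identical. h1 and h2 are equal-length lists of
    bin counts (e.g. over face-angle or expression bins)."""
    n1, n2 = sum(h1), sum(h2)
    if n1 == 0 or n2 == 0:
        return 0.0
    return sum(min(a / n1, b / n2) for a, b in zip(h1, h2))
```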
Step 306: determine a template face sample set from the group of template face sample sets based on the computed matching degrees.
In this embodiment, the above-mentioned execution body may determine a template face sample set from the group of template face sample sets based on the computed matching degrees. For example, the execution body may select the template face sample set with the highest matching degree from the group.
In some optional implementations of this embodiment, the above-mentioned execution body may compare the matching degree of each template face sample set in the group with a preset matching degree threshold (e.g., 0.7). If a template face sample set whose matching degree exceeds the threshold exists in the group, the execution body may select the template face sample set with the highest matching degree from the group. If no template face sample set in the group has a matching degree exceeding the threshold, the execution body may select a generic template face sample set from the group. Generally, one generic template face sample set is preset in the group of template face sample sets.
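The threshold-based selection with a generic fallback can be sketched as follows; the identifier-to-score mapping is an assumption of this example:

```python
def select_template_set(match_scores, generic_id, threshold=0.7):
    """match_scores: dict mapping template sample-set id -> matching degree.
    Return the best-scoring set if it exceeds the threshold, otherwise
    fall back to the preset generic template face sample set."""
    best_id = max(match_scores, key=match_scores.get)
    if match_scores[best_id] > threshold:
        return best_id
    return generic_id
```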
Step 307: using a machine learning method, train the determined pre-trained model based on the pre-swap face sample set and the determined template face sample set to obtain the face-swapping model.
In this embodiment, the specific operation of step 307 has been described in detail in step 204 of the embodiment shown in FIG. 2, and will not be repeated here.
As can be seen from FIG. 3, compared with the embodiment corresponding to FIG. 2, the flow 300 of the method for training a face-swapping model in this embodiment highlights the steps of determining the pre-trained model based on face attribute information and determining the template face sample set based on face richness features. The solution described in this embodiment therefore uses a face attribute recognition algorithm to match a pre-trained model at fine granularity and a face richness detection algorithm to select a template face sample set, so that the pre-trained model with the most similar face attribute information is trained with the template face sample set whose face richness features match best. This improves the face-swapping effect of the trained face-swapping model and makes its output more realistic.
Referring now to FIG. 4, a schematic structural diagram of a computer system 400 suitable for implementing a computer device (for example, the device 102 shown in FIG. 1) of the embodiments of the present application is shown. The computer device shown in FIG. 4 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in FIG. 4, the computer system 400 includes a central processing unit (CPU) 401, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage portion 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data required for the operation of the system 400. The CPU 401, the ROM 402, and the RAM 403 are connected to one another through a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
The following components are connected to the I/O interface 405: an input portion 406 including a keyboard, a mouse, and the like; an output portion 407 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 408 including a hard disk and the like; and a communication portion 409 including a network interface card such as a LAN card or a modem. The communication portion 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive 410 as needed, so that a computer program read therefrom is installed into the storage portion 408 as needed.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 409, and/or installed from the removable medium 411. When the computer program is executed by the central processing unit (CPU) 401, the above-mentioned functions defined in the method of the present application are executed.
It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, the computer-readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in combination with an instruction execution system, apparatus, or device. In the present application, the computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium, and may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.
The computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or electronic device. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of the systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing a specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments described in the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including a receiving unit, a first determining unit, a second determining unit, and a training unit. In some cases, the names of these units do not limit the units themselves; for example, the receiving unit may also be described as "a unit that receives a face-swapping model training request sent by a user".
As another aspect, the present application further provides a computer-readable medium. The computer-readable medium may be included in the computer device described in the above embodiments, or it may exist separately without being assembled into the computer device. The computer-readable medium carries one or more programs which, when executed by the computer device, cause the computer device to: receive a face-swapping model training request sent by a user, where the request includes a pre-swap face sample set provided by the user and a designated template face identifier; determine, from a pre-trained model set corresponding to the template face identifier, a pre-trained model matching the pre-swap face sample set, where the pre-trained model set includes models pre-trained based on a target face sample set group and a template face sample set group corresponding to the template face identifier; determine, from the template face sample set group, a template face sample set matching the pre-swap face sample set; and train, using a machine learning method, the determined pre-trained model based on the pre-swap face sample set and the determined template face sample set, to obtain the face-swapping model.
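The overall flow carried by the programs on the medium can be sketched in Python. This is an illustrative outline only: the helper names, the request shape, and the stubbed matching logic are assumptions, not code from the application, and the two matching steps are detailed separately in the claims.

```python
def match_pretrained_model(model_set, source_faces):
    # Stub: the application matches by the user's history, recognized face
    # attributes, and similarity to each model's target face sample set.
    return model_set[0]

def match_template_set(template_group, source_faces):
    # Stub: the application matches by comparing face-richness features.
    return template_group[0]

def handle_training_request(request, pretrained_models, template_groups, fine_tune):
    """Sketch of the claimed flow: match a pre-trained model and a template
    face sample set to the user's pre-swap faces, then fine-tune."""
    source_faces = request["source_faces"]
    template_id = request["template_id"]
    model = match_pretrained_model(pretrained_models[template_id], source_faces)
    templates = match_template_set(template_groups[template_id], source_faces)
    return fine_tune(model, source_faces, templates)

# Toy run with an identity fine-tune step (all values hypothetical).
result = handle_training_request(
    {"source_faces": ["f1", "f2"], "template_id": "star_a"},
    {"star_a": ["model_young_male", "model_young_female"]},
    {"star_a": [["t1", "t2"]]},
    fine_tune=lambda model, faces, templates: (model, len(faces)),
)
```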
The above description is merely a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (19)

  1. A method for training a face-swapping model, comprising:
    receiving a face-swapping model training request sent by a user, wherein the face-swapping model training request includes a pre-swap face sample set provided by the user and a designated template face identifier;
    determining, from a pre-trained model set corresponding to the template face identifier, a pre-trained model matching the pre-swap face sample set, wherein the pre-trained model set includes models pre-trained based on a target face sample set group and a template face sample set group corresponding to the template face identifier;
    determining, from the template face sample set group, a template face sample set matching the pre-swap face sample set; and
    training, using a machine learning method, the determined pre-trained model based on the pre-swap face sample set and the determined template face sample set, to obtain the face-swapping model.
  2. The method according to claim 1, wherein determining, from the pre-trained model set corresponding to the template face identifier, the pre-trained model matching the pre-swap face sample set comprises:
    if a pre-trained model corresponding to the template face identifier exists in the user's historical face-swapping records, determining the pre-trained model corresponding to the template face identifier as the pre-trained model matching the pre-swap face sample set.
  3. The method according to claim 2, wherein determining, from the pre-trained model set corresponding to the template face identifier, the pre-trained model matching the pre-swap face sample set further comprises:
    if no pre-trained model corresponding to the template face identifier exists in the user's historical face-swapping records, recognizing face attribute information of the pre-swap face sample set; and
    determining a pre-trained model from the pre-trained model set based on the recognized face attribute information.
  4. The method according to claim 3, wherein the face attribute information includes information in at least one of the following dimensions: gender, age group, race, facial accessories, and face shape.
  5. The method according to claim 4, wherein recognizing the face attribute information of the pre-swap face sample set comprises:
    inputting the pre-swap face sample set into a pre-trained first classification model to obtain information in at least one of the gender, age group, race, and facial accessory dimensions of the pre-swap face sample set, wherein the first classification model is a classification model based on a convolutional neural network.
  6. The method according to claim 4 or 5, wherein recognizing the face attribute information of the pre-swap face sample set comprises:
    extracting face-shape classification features of the pre-swap face sample set; and
    inputting the extracted face-shape classification features into a pre-trained second classification model to obtain the face shape of the pre-swap face sample set, wherein the second classification model is a classification model based on a support vector machine.
  7. The method according to claim 6, wherein extracting the face-shape classification features of the pre-swap face sample set comprises:
    extracting face feature point information of the pre-swap face sample set;
    calculating face measurement parameters of the pre-swap face sample set based on the extracted face feature point information; and
    merging the extracted face feature point information and the calculated face measurement parameters into the face-shape classification features of the pre-swap face sample set.
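One way the steps of claim 7 might look in code: derive measurement parameters from landmark points and concatenate them with the raw coordinates. The landmark names and the particular width-to-height ratio here are illustrative assumptions; real detectors typically use indexed conventions such as 68-point models.

```python
import numpy as np

def face_measurements(landmarks):
    # Hypothetical measurement parameters derived from named 2-D landmarks.
    width = np.linalg.norm(landmarks["jaw_left"] - landmarks["jaw_right"])
    height = np.linalg.norm(landmarks["brow_mid"] - landmarks["chin"])
    return np.array([width / height])

def face_shape_features(landmarks):
    # Merge raw feature point coordinates with the measured parameters.
    points = np.concatenate(list(landmarks.values()))
    return np.concatenate([points, face_measurements(landmarks)])

# Toy landmarks forming a square face outline.
lm = {
    "jaw_left": np.array([-1.0, 0.0]),
    "jaw_right": np.array([1.0, 0.0]),
    "brow_mid": np.array([0.0, 1.0]),
    "chin": np.array([0.0, -1.0]),
}
features = face_shape_features(lm)  # 8 coordinates + 1 measurement
```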
  8. The method according to claim 3, wherein determining a pre-trained model from the pre-trained model set based on the recognized face attribute information comprises:
    determining, from the pre-trained model set, a pre-trained model subset matching the recognized face attribute information;
    calculating a similarity between the pre-swap face sample set and the target face sample set corresponding to each pre-trained model in the pre-trained model subset; and
    determining a pre-trained model from the pre-trained model subset based on the calculated similarity.
  9. The method according to claim 8, wherein calculating the similarity between the pre-swap face sample set and the target face sample set corresponding to each pre-trained model in the pre-trained model subset comprises:
    extracting an average face feature vector of the pre-swap face sample set; and
    calculating a cosine similarity between the extracted average face feature vector and an average face feature vector of the target face sample set corresponding to each pre-trained model in the pre-trained model subset.
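The similarity computation of claim 9 is straightforward to sketch: average the per-sample feature vectors of each set and take the cosine of the two averages. The 128-dimensional embeddings below are random stand-ins (the dimension and the data are assumptions, not from the application).

```python
import numpy as np

def average_face_vector(embeddings):
    # Mean feature vector over all face samples in a set.
    return np.asarray(embeddings).mean(axis=0)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
source_set = rng.normal(size=(5, 128))                        # pre-swap set
near_target = source_set + rng.normal(scale=0.01, size=(5, 128))  # very similar set
far_target = rng.normal(size=(5, 128))                        # unrelated set

sim_near = cosine_similarity(average_face_vector(source_set),
                             average_face_vector(near_target))
sim_far = cosine_similarity(average_face_vector(source_set),
                            average_face_vector(far_target))
```

A nearly identical sample set scores close to 1.0, so the pre-trained model whose target set scores highest would be selected.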
  10. The method according to claim 1, wherein determining, from the template face sample set group, the template face sample set matching the pre-swap face sample set comprises:
    extracting face richness features of the pre-swap face sample set;
    calculating a matching degree between the extracted face richness features and face richness features of each template face sample set in the template face sample set group; and
    determining a template face sample set from the template face sample set group based on the calculated matching degree.
  11. The method according to claim 10, wherein extracting the face richness features of the pre-swap face sample set comprises:
    extracting face feature information of the pre-swap face sample set; and
    performing histogram statistics on the face feature information to obtain the face richness features of the pre-swap face sample set.
  12. The method according to claim 11, wherein the face feature information includes information in at least one of the following dimensions: face feature points, face angles, and facial expressions.
  13. The method according to claim 11, wherein calculating the matching degree between the extracted face richness features and the face richness features of each template face sample set in the template face sample set group comprises:
    calculating, using a histogram matching method, the matching degree between the extracted face richness features and the face richness features of each template face sample set in the template face sample set group.
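Claims 11 and 13 together can be illustrated with a minimal sketch: build a normalized histogram over one feature dimension and compare two histograms by intersection. The choice of face yaw angle as the feature, the bin count, and histogram intersection as the matching method are all assumptions; the application does not fix a particular scheme.

```python
import numpy as np

def richness_histogram(values, bins=8, value_range=(-90.0, 90.0)):
    # Histogram statistics over one feature dimension (here: face yaw angle),
    # normalized so sample sets of different sizes compare fairly.
    hist, _ = np.histogram(values, bins=bins, range=value_range)
    return hist / max(hist.sum(), 1)

def histogram_match(h1, h2):
    # Histogram intersection: 1.0 for identical normalized histograms,
    # 0.0 for fully disjoint ones.
    return float(np.minimum(h1, h2).sum())

frontal_set = richness_histogram([0, 2, -3, 1, 0])       # mostly frontal faces
varied_set = richness_histogram([-80, -40, 0, 40, 80])   # wide range of angles
```

A pre-swap set of mostly frontal faces matches itself perfectly but only partially matches a template set covering many angles.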
  14. The method according to claim 10, wherein determining the template face sample set from the template face sample set group based on the calculated matching degree comprises:
    if a template face sample set whose matching degree is greater than a preset matching-degree threshold exists in the template face sample set group, selecting the template face sample set with the highest matching degree from the template face sample set group; and
    if no template face sample set whose matching degree is greater than the preset matching-degree threshold exists in the template face sample set group, selecting a generic template face sample set from the template face sample set group.
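The selection rule of claim 14 reduces to a threshold check with a fallback. The 0.6 threshold and the set names below are illustrative assumptions.

```python
def select_template_set(match_scores, threshold=0.6, generic="generic"):
    """Pick the highest-matching template face sample set if any score exceeds
    the threshold; otherwise fall back to a generic template set."""
    best = max(match_scores, key=match_scores.get)
    return best if match_scores[best] > threshold else generic

chosen = select_template_set({"glasses": 0.82, "beard": 0.35})    # above threshold
fallback = select_template_set({"glasses": 0.41, "beard": 0.35})  # none above
```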
  15. The method according to claim 1, wherein the pre-trained model set is trained through the following steps:
    acquiring a plurality of target face samples;
    dividing the plurality of target face samples into the target face sample set group according to face attributes, wherein target face samples in a same target face sample set have similar face attributes; and
    for each target face sample set in the target face sample set group, training a generative adversarial network based on that target face sample set and a template face sample set matching that target face sample set, to obtain a pre-trained model.
  16. The method according to claim 15, wherein the pre-trained model includes a generative model and a discriminative model; and
    training, using a machine learning method, the determined pre-trained model based on the pre-swap face sample set and the determined template face sample set to obtain the face-swapping model comprises:
    inputting the pre-swap face sample set into the generative model of the determined pre-trained model to obtain a post-swap face sample set;
    inputting the post-swap face sample set and the determined template face sample set into the discriminative model of the determined pre-trained model to obtain a discrimination result, wherein the discrimination result characterizes a probability that the post-swap face sample set and the determined template face sample set are real sample sets; and
    adjusting parameters of the generative model and the discriminative model of the determined pre-trained model based on the discrimination result.
  17. The method according to claim 16, wherein adjusting the parameters of the generative model and the discriminative model of the determined pre-trained model based on the discrimination result comprises:
    determining whether the discrimination result satisfies a constraint condition;
    if the discrimination result does not satisfy the constraint condition, adjusting the parameters of the generative model and the discriminative model of the determined pre-trained model based on the discrimination result, and training the determined pre-trained model again based on the pre-swap face sample set and the determined template face sample set; and
    if the discrimination result satisfies the constraint condition, determining that training of the face-swapping model is complete, and sending the post-swap face sample set last output by the generative model of the determined pre-trained model to the user.
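The control flow of claims 16 and 17 (generate, discriminate, then either adjust and train again or stop and return the last generated set) can be sketched with a stub model. The stub, the 0.55 constraint, and the fixed-step "adjustment" are placeholders for the application's actual GAN networks and gradient updates.

```python
class StubSwapModel:
    """Hypothetical stand-in for the pre-trained generative/discriminative pair;
    the application's real models are GAN networks operating on face images."""
    def __init__(self):
        self.quality = 0.0
    def generate(self, source_faces):
        return ["swapped:" + f for f in source_faces]
    def discriminate(self, swapped, templates):
        return self.quality            # probability-like discrimination result
    def adjust(self, verdict):
        self.quality += 0.2            # one parameter-update step

def fine_tune(model, source_faces, templates,
              constraint=lambda v: v >= 0.55, max_rounds=100):
    """Loop of claims 16-17: stop when the discrimination result satisfies the
    constraint, otherwise adjust the parameters and train again."""
    for round_number in range(1, max_rounds + 1):
        swapped = model.generate(source_faces)
        verdict = model.discriminate(swapped, templates)
        if constraint(verdict):
            return swapped, round_number
        model.adjust(verdict)
    return swapped, max_rounds

result, rounds = fine_tune(StubSwapModel(), ["a", "b"], ["t"])
```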
  18. A computer device, comprising:
    one or more processors; and
    a storage device having one or more programs stored thereon,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-17.
  19. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-17.
PCT/CN2020/123582 2019-10-30 2020-10-26 Method and device for training face swapping model WO2021083069A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911043178.3 2019-10-30
CN201911043178.3A CN110796089B (en) 2019-10-30 2019-10-30 Method and apparatus for training face model

Publications (1)

Publication Number Publication Date
WO2021083069A1 true WO2021083069A1 (en) 2021-05-06

Family ID=69442013

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/123582 WO2021083069A1 (en) 2019-10-30 2020-10-26 Method and device for training face swapping model

Country Status (2)

Country Link
CN (1) CN110796089B (en)
WO (1) WO2021083069A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113379594A (en) * 2021-06-29 2021-09-10 北京百度网讯科技有限公司 Face shape transformation model training, face shape transformation method and related device
CN113486785A (en) * 2021-07-01 2021-10-08 深圳市英威诺科技有限公司 Video face changing method, device, equipment and storage medium based on deep learning
CN115358916A (en) * 2022-07-06 2022-11-18 北京健康之家科技有限公司 Face-changed image generation method and device, computer equipment and readable storage medium

Families Citing this family (5)

Publication number Priority date Publication date Assignee Title
CN110796089B (en) * 2019-10-30 2023-12-12 上海掌门科技有限公司 Method and apparatus for training face model
CN111353392B (en) * 2020-02-18 2022-09-30 腾讯科技(深圳)有限公司 Face change detection method, device, equipment and storage medium
CN111783603A (en) * 2020-06-24 2020-10-16 有半岛(北京)信息科技有限公司 Training method for generating confrontation network, image face changing method and video face changing method and device
CN113763232B (en) * 2020-08-10 2024-06-18 北京沃东天骏信息技术有限公司 Image processing method, device, equipment and computer readable storage medium
CN112734631A (en) * 2020-12-31 2021-04-30 北京深尚科技有限公司 Video image face changing method, device, equipment and medium based on fine adjustment model

Citations (3)

Publication number Priority date Publication date Assignee Title
JP2012078526A (en) * 2010-09-30 2012-04-19 Xing Inc Karaoke system
CN106534757A (en) * 2016-11-22 2017-03-22 北京金山安全软件有限公司 Face exchange method and device, anchor terminal and audience terminal
CN110796089A (en) * 2019-10-30 2020-02-14 上海掌门科技有限公司 Method and apparatus for training face-changing model

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN108520220B (en) * 2018-03-30 2021-07-09 百度在线网络技术(北京)有限公司 Model generation method and device
CN108509916A (en) * 2018-03-30 2018-09-07 百度在线网络技术(北京)有限公司 Method and apparatus for generating image
CN109409198B (en) * 2018-08-31 2023-09-05 平安科技(深圳)有限公司 AU detection method, AU detection device, AU detection equipment and AU detection medium
CN109214343B (en) * 2018-09-14 2021-03-09 北京字节跳动网络技术有限公司 Method and device for generating face key point detection model
CN110110611A (en) * 2019-04-16 2019-08-09 深圳壹账通智能科技有限公司 Portrait attribute model construction method, device, computer equipment and storage medium


Non-Patent Citations (2)

Title
ANONYMOUS: "Face Swap Face Replacement Tutorial", 17 March 2018 (2018-03-17), pages 1 - 7, XP055807484, Retrieved from the Internet <URL:https://blog.csdn.net/sinat_26918145/article/details/79591717> *
XING ENXU , WU XIAOYONG , LI YAXIAN: "Double-Layer Generative Adversarial Networks Based on Transfer Learning", COMPUTER ENGINEERING AND APPLICATIONS, vol. 55, no. 15, 8 March 2019 (2019-03-08), pages 38 - 46+103, XP055807487, ISSN: 1002-8331, DOI: 10.3778/j.issn.1002-8331.1812-0225 *


Also Published As

Publication number Publication date
CN110796089B (en) 2023-12-12
CN110796089A (en) 2020-02-14

Similar Documents

Publication Publication Date Title
WO2021083069A1 (en) Method and device for training face swapping model
CN109726624B (en) Identity authentication method, terminal device and computer readable storage medium
WO2020006961A1 (en) Image extraction method and device
US11487995B2 (en) Method and apparatus for determining image quality
WO2020024484A1 (en) Method and device for outputting data
WO2021036059A1 (en) Image conversion model training method, heterogeneous face recognition method, device and apparatus
WO2020253127A1 (en) Facial feature extraction model training method and apparatus, facial feature extraction method and apparatus, device, and storage medium
CN111476871B (en) Method and device for generating video
CN108416310B (en) Method and apparatus for generating information
WO2020019591A1 (en) Method and device used for generating information
CN106682632B (en) Method and device for processing face image
CN109993150B (en) Method and device for identifying age
CN108898185A (en) Method and apparatus for generating image recognition model
CN107679466B (en) Information output method and device
WO2022105118A1 (en) Image-based health status identification method and apparatus, device and storage medium
CN109189544B (en) Method and device for generating dial plate
WO2020062493A1 (en) Image processing method and apparatus
WO2021238410A1 (en) Image processing method and apparatus, electronic device, and medium
WO2021208601A1 (en) Artificial-intelligence-based image processing method and apparatus, and device and storage medium
WO2020124993A1 (en) Liveness detection method and apparatus, electronic device, and storage medium
WO2020124994A1 (en) Liveness detection method and apparatus, electronic device, and storage medium
WO2020238321A1 (en) Method and device for age identification
US20210295016A1 (en) Living body recognition detection method, medium and electronic device
WO2023050868A1 (en) Method and apparatus for training fusion model, image fusion method and apparatus, and device and medium
WO2023197648A1 (en) Screenshot processing method and apparatus, electronic device, and computer readable medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20881794

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20881794

Country of ref document: EP

Kind code of ref document: A1


32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27.10.2022)
