CN115423677A - Image face changing processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115423677A
CN115423677A (application CN202210879102.XA)
Authority
CN
China
Prior art keywords
image
features
facial
face
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210879102.XA
Other languages
Chinese (zh)
Inventor
秦泽奎
李强
张国鑫
刘明聪
邹倩芳
牛雪松
叶奎
郭建珠
谷继力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202210879102.XA priority Critical patent/CN115423677A/en
Publication of CN115423677A publication Critical patent/CN115423677A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20112 Image segmentation details
    • G06T2207/20132 Image cropping
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The disclosure relates to an image face-changing processing method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: acquiring an original face image and a target face image; extracting the identity features of the original face image and the accessory features of the target face image respectively; extracting facial features from the images corresponding to a plurality of preset local facial-feature regions in the original face image to obtain a plurality of local facial features; and synthesizing the plurality of local facial features, the identity features, and the accessory features into a face image by using a pre-trained synthesis model, so as to obtain a face-changing result image from the original face image to the target face image, wherein the face image features in the face-changing result image comprise the local facial features, the identity features, and the accessory features. By synthesizing the local facial features, the identity features, and the accessory features with the synthesis model, the scheme improves the facial-feature similarity between the face-changing result image and the original image and thereby optimizes the face-changing effect.

Description

Image face changing processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision, and in particular, to an image face-changing processing method and apparatus, an electronic device, and a storage medium.
Background
AI (Artificial Intelligence) face changing replaces the face in a target image with the face in an original image, thereby realizing scenes that are difficult for a user to realize in daily life, such as dressing in ancient costume or traveling around the world.
A related face-changing method works in two stages. The first stage changes the face of the target image to obtain a face-changed image, using an adaptive attention adjustment module and a module that dynamically fuses original-image information and target-image information; the second stage corrects occlusion problems in the face-changed image.
The inventor has found that this face-changing method has the defect that the facial-feature similarity between the face-changed image and the original image is not high enough, so the face-changing effect is poor.
Disclosure of Invention
The present disclosure provides an image face-changing processing method and apparatus, an electronic device, and a storage medium, to at least solve the problem in the related art that the facial-feature similarity between the face-changed image and the original image is not high. The technical solution of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided an image face changing processing method, including:
acquiring an original face image and a target face image; the target face image is a reference image for face changing;
extracting the identity features of the original face image and the accessory features of the target face image respectively to obtain identity features and accessory features; the identity features represent facial image features used for face recognition, and the accessory features are the features of the target face image other than the identity features;
extracting facial features from the images corresponding to a plurality of preset local facial-feature regions in the original face image to obtain a plurality of local facial features;
and performing face-image synthesis on the plurality of local facial features, the identity features, and the accessory features by using a pre-trained synthesis model, to obtain a face-changing result image from the original face image to the target face image, wherein the face image features in the face-changing result image comprise the local facial features, the identity features, and the accessory features.
Optionally, the synthesis model is constructed in the following manner:
acquiring a first face image sample set and a second face image sample set; the first face image sample set comprises original face sample images, and the second face image sample set comprises target face sample images; a target face sample image is a reference image used for face-changing processing of an original face sample image;
extracting facial features from the images corresponding to the plurality of preset local facial-feature regions in the original face sample image to obtain a plurality of sample local facial features;
extracting the identity features of the original face sample image as sample identity features, and extracting the accessory features of the target face sample image as sample accessory features;
inputting the plurality of sample local facial features, the sample identity features, and the sample accessory features into an initial model to obtain a sample face-changing result image output by the initial model;
determining a loss value between the sample face-changing result image and the original face sample image and the target face sample image; the loss value comprises at least a first loss value between the sample local facial features and the result local facial features of the sample face-changing result image, the result local facial features being extracted from the preset local facial-feature regions of the sample face-changing result image;
and adjusting parameters of the initial model according to the loss value, and continuing to train the initial model with the adjusted parameters until the obtained loss value is smaller than a preset threshold, whereupon the trained initial model is determined to be the synthesis model.
Optionally, the determining a loss value between the sample face-changing result image and the original face sample image and the target face sample image includes:
determining a first loss value between the plurality of sample local facial features and a plurality of result local facial features included in the sample face-changing result image; the result local facial features are the local facial features included in the sample face-changing result image;
determining a second loss value between the sample identity features and the result identity features of the sample face-changing result image; the result identity features are the identity features included in the sample face-changing result image;
determining a third loss value between the sample accessory features and the result accessory features of the sample face-changing result image; the result accessory features are the accessory features included in the sample face-changing result image;
and determining the loss value between the sample face-changing result image and the original face sample image and the target face sample image based on the first loss value, the second loss value, and the third loss value.
Optionally, the determining a first loss value between the plurality of sample local facial features and the plurality of result local facial features included in the sample face-changing result image includes:
determining cosine distances between the sample local facial features and the corresponding result local facial features respectively, to obtain a plurality of cosine distances;
acquiring the weights corresponding to the plurality of preset local facial-feature regions, to obtain a plurality of weights;
and performing a weighted summation of the plurality of cosine distances based on the plurality of weights, to obtain the first loss value between the sample local facial features and the result local facial features included in the sample face-changing result image.
Optionally, before extracting facial features from the images corresponding to the plurality of preset local facial-feature regions in the original face image, the method further includes:
acquiring, by using an activation layer of a face recognition model, a first facial-feature region in the original face image whose response value is higher than a preset threshold, and taking the first facial-feature region as a preset local facial-feature region;
or, determining a preset second facial-feature region for similarity enhancement, and taking the second facial-feature region as a preset local facial-feature region.
Optionally, after the first facial-feature region is taken as a preset local facial-feature region, or the second facial-feature region is taken as a preset local facial-feature region, the method further includes:
cropping the plurality of preset local facial-feature regions from the original face image.
Optionally, the extracting facial features from the images corresponding to the plurality of preset local facial-feature regions in the original face image includes:
if a certain preset local facial-feature region comprises at least two facial organs, extracting the feature information of each of the at least two facial organs, and extracting the feature information of the facial area between two adjacent facial organs, as the facial features of the image corresponding to that preset local facial-feature region.
According to a second aspect of the embodiments of the present disclosure, there is provided an image face-changing processing apparatus, the apparatus including:
an image acquisition module configured to acquire an original face image and a target face image; the target face image is a reference image for face-changing processing;
a first feature extraction module configured to extract the identity features of the original face image and the accessory features of the target face image respectively, to obtain identity features and accessory features; the identity features represent facial image features used for face recognition, and the accessory features are the features of the target face image other than the identity features;
a second feature extraction module configured to extract facial features from the images corresponding to a plurality of preset local facial-feature regions in the original face image, to obtain a plurality of local facial features;
and a synthesis module configured to perform face-image synthesis on the plurality of local facial features, the identity features, and the accessory features by using a pre-trained synthesis model, to obtain a face-changing result image from the original face image to the target face image, wherein the face image features in the face-changing result image comprise the local facial features, the identity features, and the accessory features.
Optionally, the apparatus further comprises a synthesis model construction module configured to perform:
acquiring a first face image sample set and a second face image sample set; the first face image sample set comprises original face sample images, and the second face image sample set comprises target face sample images; a target face sample image is a reference image used for face-changing processing of an original face sample image;
extracting facial features from the images corresponding to the plurality of preset local facial-feature regions in the original face sample image to obtain a plurality of sample local facial features;
extracting the identity features of the original face sample image as sample identity features, and extracting the accessory features of the target face sample image as sample accessory features;
inputting the plurality of sample local facial features, the sample identity features, and the sample accessory features into an initial model to obtain a sample face-changing result image output by the initial model;
determining a loss value between the sample face-changing result image and the original face sample image and the target face sample image; the loss value comprises at least a first loss value between the sample local facial features and the result local facial features of the sample face-changing result image, the result local facial features being extracted from the preset local facial-feature regions of the sample face-changing result image;
and adjusting parameters of the initial model according to the loss value, and continuing to train the initial model with the adjusted parameters until the obtained loss value is smaller than a preset threshold, whereupon the trained initial model is determined to be the synthesis model.
Optionally, the synthesis model building module is further configured to perform:
determining a first loss value between the plurality of sample local facial features and a plurality of result local facial features included in the sample face-changing result image; the result local facial features are the local facial features included in the sample face-changing result image;
determining a second loss value between the sample identity features and the result identity features of the sample face-changing result image; the result identity features are the identity features included in the sample face-changing result image;
determining a third loss value between the sample accessory features and the result accessory features of the sample face-changing result image; the result accessory features are the accessory features included in the sample face-changing result image;
determining a loss value between the sample face-change result image and the original face sample image and the target face sample image based on the first loss value, the second loss value, and the third loss value.
Optionally, the synthesis model construction module is further configured to perform:
determining cosine distances between the sample local facial features and the corresponding result local facial features respectively, to obtain a plurality of cosine distances;
acquiring the weights corresponding to the plurality of preset local facial-feature regions, to obtain a plurality of weights;
and performing a weighted summation of the cosine distances based on the weights, to obtain the first loss value between the sample local facial features and the result local facial features included in the sample face-changing result image.
Optionally, the apparatus further comprises:
a first preset module configured to acquire, by using an activation layer of a face recognition model, a first facial-feature region in the original face image whose response value is higher than a preset threshold, and to take the first facial-feature region as a preset local facial-feature region;
or, a second preset module configured to determine a preset second facial-feature region for similarity enhancement, and to take the second facial-feature region as a preset local facial-feature region.
Optionally, the apparatus further comprises:
a cropping module configured to crop the plurality of preset local facial-feature regions from the original face image.
Optionally, the second feature extraction module is further configured to perform:
if a certain preset local facial-feature region comprises at least two facial organs, extract the feature information of each of the at least two facial organs, and extract the feature information of the facial area between two adjacent facial organs, as the facial features of the image corresponding to that preset local facial-feature region.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the image face-changing processing method according to the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium in which instructions, when executed by a processor of a server, enable the server to perform the image face-changing processing method according to the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer programs/instructions, wherein the computer programs/instructions, when executed by a processor, implement the image face-changing processing method of the first aspect.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
in an embodiment of the present disclosure, an original face image and a target face image are acquired, the target face image being a reference image for face-changing processing; the identity features of the original face image and the accessory features of the target face image are extracted respectively, where the identity features represent facial image features used for face recognition and the accessory features are the features of the target face image other than the identity features; facial features are extracted from the images corresponding to a plurality of preset local facial-feature regions in the original face image to obtain a plurality of local facial features; and a pre-trained synthesis model is used to synthesize the plurality of local facial features, the identity features, and the accessory features into a face image, yielding a face-changing result image from the original face image to the target face image, in which the face image features comprise the local facial features, the identity features, and the accessory features. By extracting the local facial features of the plurality of preset local facial-feature regions of the original face image and synthesizing them with the identity features and the accessory features, the scheme keeps the facial features of the original face image in the preset local facial-feature regions of the face-changing result image, which improves the facial-feature similarity between the face-changing result image and the original face image and optimizes the face-changing effect.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a flowchart illustrating the steps of an image face-changing processing method according to an exemplary embodiment;
FIG. 2 is a flowchart illustrating the steps of a method for constructing a synthesis model according to an exemplary embodiment;
FIG. 3 is an overall framework diagram illustrating a method for constructing a synthesis model according to an exemplary embodiment;
FIG. 4 is a graph comparing the loss values of a conventional method and of the present scheme, according to an exemplary embodiment;
FIG. 5 is a flowchart illustrating a step of calculating a first loss value in accordance with an exemplary embodiment;
FIG. 6 is a schematic diagram illustrating a calculation of a first loss value in accordance with an exemplary embodiment;
FIG. 7 is a block diagram showing the configuration of an image face-changing processing apparatus according to an exemplary embodiment;
FIG. 8 is a block diagram illustrating an electronic device for image face-changing processing, according to an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in other sequences than those illustrated or described herein. The implementations described in the exemplary embodiments below do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The inventor has found that the existing face-changing method uses only the overall facial features to constrain the similarity between the face-changed image and the original image. However, the overall facial features are trained for a discrimination task and are not sufficient to provide accurate guidance for generating every detail of the face in a generative task. The image face-changing processing method of the present disclosure is therefore proposed.
Fig. 1 is a flowchart illustrating the steps of an image face-changing processing method according to an exemplary embodiment. As shown in fig. 1, the method includes the following steps.
In step S11, an original face image and a target face image are acquired; the target face image is a reference image for face changing processing.
The original face image is an image to be subjected to face changing processing, and the target face image is a reference image in the face changing processing.
The original face image and the target face image are both face images, which may include face images of a person or face images of an animal. The face image may be a face image obtained by actual shooting, or a face image manually drawn or generated by a computer.
In step S12, the identity features of the original face image and the accessory features of the target face image are respectively extracted; the identity features represent facial image features used for face recognition, and the accessory features are the features of the target face image other than the identity features.
Generally, the face changing process is to replace the identity feature in the target face image with the identity feature in the original face image and to retain the accessory features in the target face image.
The identity features refer to facial features used for face recognition, such as geometric features, statistical features, model features, neural network features and the like.
The accessory features are the features of the target face image other than the identity features, and may specifically comprise intrinsic accessory features and extrinsic accessory features. The intrinsic accessory features include the features of the face itself other than the identity features, such as facial pose features, skin features, and expression features; the extrinsic accessory features include features of lighting, hats, ornaments, and the like appearing on the face.
The identity features can be extracted by adopting a trained face recognition model, and the accessory features can be extracted by adopting a trained accessory feature extractor.
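As an illustrative sketch only (not part of the claimed method), the two extractors can be wired together as follows in PyTorch; the model file names and the TorchScript loading are assumptions standing in for whatever trained face recognition model and accessory-feature extractor are actually used:

```python
import torch

# Hypothetical pre-trained extractors; the file names are placeholders.
face_recognizer = torch.jit.load("face_recognition_model.pt").eval()  # identity branch
accessory_encoder = torch.jit.load("accessory_extractor.pt").eval()   # accessory branch

@torch.no_grad()
def extract_features(original_img: torch.Tensor, target_img: torch.Tensor):
    """original_img / target_img: (1, 3, H, W) normalized, aligned face images."""
    identity_feat = face_recognizer(original_img)    # identity features of the original face
    accessory_feat = accessory_encoder(target_img)   # accessory features of the target face
    return identity_feat, accessory_feat
```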
In step S13, facial features are extracted from the images corresponding to a plurality of preset local facial-feature regions in the original face image to obtain a plurality of local facial features.
In the embodiment of the present disclosure, in addition to the identity features of the original face image, the facial features of the original face image are also extracted.
A plurality of preset local facial-feature regions are cropped from the original face image, and the facial features of each preset local facial-feature region are extracted from the cropped images to obtain a plurality of local facial features.
For example, if the eyebrow region, eye region, nose region, mouth region, and ear region are each set as a preset local facial-feature region, the local facial features include eyebrow features, eye features, nose features, mouth features, and ear features, respectively. If the preset local facial-feature regions are combinations, such as an eyebrow-and-eye region and a nose-and-mouth region, the local facial features include: the features of the eyebrows, the eyes, and the area between them, and the features of the nose, the mouth, and the area between them.
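For concreteness, a minimal sketch of how such preset regions might be represented and cropped; the box coordinates are illustrative placeholders that would in practice come from facial-landmark detection rather than fixed values:

```python
# Illustrative preset local facial-feature regions, as (top, left, height, width)
# boxes in a 256x256 aligned face; real systems would derive them from landmarks.
PRESET_REGIONS = {
    "brow_eye": (60, 40, 60, 176),    # eyebrows, eyes, and the area between them
    "nose_mouth": (130, 78, 90, 100), # nose, mouth, and the area between them
}

def crop_regions(face_img, regions=PRESET_REGIONS):
    """face_img: (C, H, W) tensor of an aligned face; returns one crop per region."""
    return {name: face_img[:, t:t + h, l:l + w]
            for name, (t, l, h, w) in regions.items()}
```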
In step S14, a pre-trained synthesis model is used to synthesize the plurality of local facial features, the identity features, and the accessory features into a face image, so as to obtain a face-changing result image from the original face image to the target face image, where the face image features in the face-changing result image include the local facial features, the identity features, and the accessory features.
The pre-trained synthesis model can thus synthesize the plurality of local facial features, the identity features, and the accessory features, yielding a face-changing result image that has the accessory features of the target face image together with the facial features and identity features of the original face image.
Because the synthesis model is constructed based on the facial features of the preset local facial-feature regions, the facial features of the original face image are given an enhanced expression, which improves the similarity between the facial features of the face-changing result image and those of the original face image.
Compared with the prior art, which synthesizes only the identity features of the original face image and the accessory features of the target face image, the present disclosure splits the facial features of the original face image into a plurality of local facial features and synthesizes them together with the target face image. The facial features of the preset local facial-feature regions in the synthesis result are thereby further enhanced, more facial features of the original face image are retained in the face-changing result image, and the facial-feature similarity between the face-changing result image and the original face image is improved.
In conclusion, by extracting the local facial features of the plurality of preset local facial-feature regions of the original face image and synthesizing the plurality of local facial features, the identity features, and the accessory features with the synthesis model, the scheme keeps the facial features of the original face image in the preset local facial-feature regions of the face-changing result image, which improves the facial-feature similarity between the face-changing result image and the original face image and optimizes the face-changing effect.
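Putting steps S11 to S14 together, one possible end-to-end inference sketch, reusing the extract_features and crop_regions sketches above; local_encoder and synthesis_model are assumed trained networks, and none of this is the authoritative implementation:

```python
def swap_face(original_img, target_img, local_encoder, synthesis_model):
    """Hedged sketch of steps S11-S14 for a single image pair."""
    # S12: identity features from the original face, accessory features from the target
    identity_feat, accessory_feat = extract_features(original_img, target_img)

    # S13: local facial features from the preset regions of the original face
    local_feats = [local_encoder(crop.unsqueeze(0))
                   for crop in crop_regions(original_img.squeeze(0)).values()]

    # S14: the pre-trained synthesis model fuses the three groups of features
    return synthesis_model(local_feats, identity_feat, accessory_feat)
```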
Fig. 2 is a flowchart illustrating the steps of a method for constructing a synthesis model according to an exemplary embodiment. As shown in fig. 2, the method includes the following steps.
In step S21, a first face image sample set and a second face image sample set are acquired; the first set of face image samples comprises an original face sample image, and the second set of face image samples comprises a target face sample image; the target face sample image is a reference image used for face changing processing of the original face sample image.
In particular, the first set of face image samples and the second set of face image samples may be collected over the internet.
For example, a large number of images covering various genders, facial features, expressions, facial lighting, and facial backgrounds may be screened from the MS1MV2 face recognition dataset as the sample image set.
Images that can serve as original face sample images are then selected from the sample image set to form the first face image sample set, and images that can serve as target face sample images are selected to form the second face image sample set.
In step S22, facial features are extracted from the images corresponding to the plurality of preset local facial-feature regions in the original face sample image to obtain a plurality of sample local facial features.
Here, the preset local facial-feature regions used during model training are the same regions as the preset local facial-feature regions in step S13; that is, when the model is used for face changing, the same regions as those preset during model training are selected, so that the face-changing effect is better.
Similar to step S13, if the eyebrow region, eye region, nose region, mouth region, and ear region are each set as a preset local facial-feature region, the sample local facial features include eyebrow features, eye features, nose features, mouth features, and ear features, respectively. If the preset local facial-feature regions are the eyebrow-and-eye region and the nose-and-mouth region, the sample local facial features include: the features of the eyebrows, the eyes, and the area between them, and the features of the nose, the mouth, and the area between them.
In step S23, the identity features in the original face sample image are extracted as sample identity features, and the accessory features in the target face sample image are extracted as sample accessory features.
Meanwhile, the identity features of the original face sample image are extracted to obtain the sample identity features. The identity features refer to facial features used for face recognition, such as geometric features, statistical features, model features, and neural network features.
A target face sample image from the second face image sample set is taken as the target face image, and the accessory features of the target face sample image are extracted to obtain the sample accessory features. The accessory features refer to facial accessory features such as facial lighting, facial pose, facial expression, and facial background.
In step S24, the plurality of sample local facial features, the sample identity features, and the sample accessory features are input into an initial model to obtain a sample face-changing result image output by the initial model.
The model structure of the initial model may employ a ResNet18 backbone network, which includes 17 convolutional layers (conv) and 1 fully-connected layer (fc), the core of which is a residual unit.
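A hedged sketch of such an initial model: the encoder uses the torchvision ResNet-18 named in the text, while the fusion and decoding layers are pure assumptions, since the text specifies only the backbone. An encoder from make_encoder below could likewise stand in for the feature-extraction encoder of fig. 6.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def make_encoder(out_dim: int = 512) -> nn.Module:
    """ResNet-18 (17 conv layers + 1 fc, built from residual units) as an encoder."""
    net = resnet18(weights=None)
    net.fc = nn.Linear(net.fc.in_features, out_dim)
    return net

class InitialModel(nn.Module):
    """Sketch: fuse local facial, identity, and accessory features into an image."""
    def __init__(self, feat_dim: int = 512, num_regions: int = 2):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear((num_regions + 2) * feat_dim, feat_dim), nn.ReLU())
        # Placeholder decoder producing a 3x256x256 face-changing result image.
        self.decode = nn.Sequential(nn.Linear(feat_dim, 3 * 256 * 256), nn.Tanh())

    def forward(self, local_feats, identity_feat, accessory_feat):
        fused = self.fuse(
            torch.cat(list(local_feats) + [identity_feat, accessory_feat], dim=-1))
        return self.decode(fused).view(-1, 3, 256, 256)
```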
The plurality of sample local facial features, the sample identity features, and the sample accessory features are input into the initial model, and the initial model is trained according to the sample face-changing result image it outputs.
In step S25, a loss value between the sample face-changing result image and the original face sample image and the target face sample image is determined; the loss value comprises at least a first loss value between the sample local facial features and the result local facial features of the sample face-changing result image, the result local facial features being the facial features extracted from the preset local facial-feature regions of the sample face-changing result image.
The local facial features, identity features, and accessory features of the sample face-changing result image are extracted respectively, and these features are compared with the related features of the original face sample image and the target face sample image to determine the differences, thereby obtaining the loss value of this round of training.
A conventional method generally determines only the loss values corresponding to the identity features and the accessory features. In this scheme, in addition to those two loss values, the loss value between the local facial features is also determined, namely the loss value between the sample local facial features and the result local facial features. This improves the similarity between the facial features of the face-changing result image and those of the original face image.
In one possible embodiment, step S25 includes the following steps S251 to S254:
In step S251, a first loss value between the plurality of sample local facial features and the plurality of result local facial features included in the sample face-changing result image is determined.
In order to improve the similarity between the facial features output by the model and the facial features of the original face image, the difference between the sample local facial features and the result local facial features included in the sample face-changing result image is used as the first loss value of the model.
The result local facial features are the facial features extracted from the preset local facial-feature regions of the face-changing result image. The result local facial features therefore correspond to the same facial regions as the sample local facial features, so repeated training can enhance the similarity of the preset local facial-feature regions.
In addition, this step considers only the loss value of the facial features and discards attributes irrelevant to them, such as facial pose and facial expression, so the facial-feature similarity can be effectively constrained even when the facial pose and expression of the original face image and the target face image are inconsistent.
In step S252, a second loss value between the sample identity features and the result identity features of the sample face-changing result image is determined; the result identity features are the identity features included in the sample face-changing result image.
The result identity features are extracted from the sample face-changing result image and compared with the sample identity features of the original face sample image; the difference between them is taken as the second loss value.
In step S253, a third loss value between the sample accessory features and the result accessory features of the sample face-changing result image is determined; the result accessory features are the accessory features included in the sample face-changing result image.
The result accessory features are extracted from the sample face-changing result image and compared with the sample accessory features of the target face sample image; the difference between them is taken as the third loss value.
In step S254, a loss value between the sample face-changing result image and the original face sample image and the target face sample image is determined based on the first loss value, the second loss value, and the third loss value.
Weights corresponding to the first loss value, the second loss value, and the third loss value can be determined in advance, and the three loss values can then be fused by weighting to obtain the loss value between the sample face-changing result image and the original face sample image and the target face sample image.
In addition, a manner of directly adding the first loss value, the second loss value, and the third loss value to obtain a loss value may be adopted, or a manner of obtaining a loss value by processing the first loss value, the second loss value, and the third loss value with an external parameter or a calculation formula may be adopted, which is not specifically limited in this embodiment of the disclosure.
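A minimal sketch of this weighted fusion, assuming the three loss values have already been computed; the weight values themselves are illustrative hyperparameters:

```python
def total_loss(first_loss, second_loss, third_loss,
               w1: float = 1.0, w2: float = 1.0, w3: float = 1.0):
    """Weighted fusion of the facial-feature, identity, and accessory loss values.

    Setting all weights to 1.0 recovers the plain-sum variant also mentioned above.
    """
    return w1 * first_loss + w2 * second_loss + w3 * third_loss
```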
In step S26, the parameters of the initial model are adjusted according to the loss value, and the initial model continues to be trained with the adjusted parameters until the obtained loss value is smaller than a preset threshold, whereupon the trained initial model is determined to be the synthesis model.
The parameters of the initial model are adjusted according to the loss value so that the loss value between the next face-changing result image output by the model and the original face sample image and the target face sample image decreases. When the finally obtained loss value is smaller than the preset threshold, the precision of the initial model meets the preset precision requirement and training ends.
The initial model whose loss value is smaller than the preset threshold is determined to be the synthesis model.
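Steps S24 to S26 amount to a standard training loop with a threshold-based stop; a hedged sketch reusing total_loss above (the optimizer choice, learning rate, and threshold value are assumptions):

```python
import torch

def train_synthesis_model(model, batches, compute_losses,
                          threshold: float = 0.05, lr: float = 1e-4):
    """Train until the fused loss value falls below the preset threshold (S26)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    while True:
        for local_feats, id_feat, acc_feat, originals, targets in batches:
            result = model(local_feats, id_feat, acc_feat)            # S24
            l1, l2, l3 = compute_losses(result, originals, targets)   # S25
            loss = total_loss(l1, l2, l3)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                                          # adjust parameters
            if loss.item() < threshold:
                return model  # the trained initial model becomes the synthesis model
```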
In a possible embodiment, the adjusting the parameters of the initial model according to the loss value includes:
and adjusting parameters of the initial model according to the first loss value, the second loss value and the third loss value.
In the embodiment of the invention, the parameters of the initial model corresponding to each loss value can be respectively adjusted, so that the adjustment of the parameters is more comprehensive and effective.
In steps S21 to S26, the training process constrains the difference between the input original face image and the output face-changing result image to be as small as possible, i.e., the similarity to be as large as possible. Performing face-changing processing on the original face image and the target face image with the resulting synthesis model therefore improves the similarity between the facial features of the output face-changing result image and those of the original face image, optimizing the face-changing effect.
Fig. 3 is an overall framework diagram illustrating a method of constructing a synthesis model according to an exemplary embodiment.
As shown in fig. 3, the sample local facial features and the sample identity features are extracted from the original face sample image and input into the initial model, and the sample accessory features are extracted from the target face sample image and input into the initial model. The sample local facial features, the sample identity features, and the sample accessory features are synthesized in the initial model to obtain a sample face-changing result image. The initial model is trained according to the facial-feature loss value (i.e., the first loss value) and the identity-feature loss value (i.e., the second loss value) between the sample face-changing result image and the original face sample image, and the accessory-feature loss value (i.e., the third loss value) between the sample face-changing result image and the target face sample image, finally yielding the synthesis model.
Fig. 4 is a graph comparing the loss values of a conventional method and of the present scheme, according to an exemplary embodiment.
As shown in fig. 4, the dark curve represents the loss-value curve of the conventional method, and the light curve represents the loss-value curve of the present scheme. It can be seen that the dark curve is flat and oscillates little, while the light curve decreases markedly; the loss value of the present scheme decreases faster than that of the conventional method, and its final loss value is smaller. This indicates that, in the conventional method, the constraint of the identity features and the accessory features on similarity is not strong enough and leaves room for further improvement.
In summary, the method for constructing a synthesis model provided by the embodiment of the present disclosure adds the loss value of the facial features to the training process and constrains the difference between the original face image and the output face-changing result image to be as small as possible, i.e., the similarity to be as high as possible, so that the trained synthesis model can improve the similarity between the facial features of the face-changing result image and those of the original face image.
During training of the synthesis model, the weighted sum of the facial-feature differences of all dimensions is determined while taking the weight of each preset local facial-feature region into account; that is, the loss value is the difference of the facial features of all dimensions between the input original face sample image and the output face-changing result image. This accurately describes the output loss of the model, gives the direction in which the model parameters should be improved, and raises the training precision of the model.
In one possible implementation, as shown in fig. 5, the step S251 includes the following steps S2511 to S2513:
in step S2511, the cosine distances between the sample local facial features and the corresponding result local facial features are determined respectively to obtain a plurality of cosine distances.
The cosine distance is the value obtained by subtracting the cosine similarity of two vectors from 1. The cosine similarity is the cosine of the angle between the two vectors. The cosine distance can describe the relationship between two feature vectors.
If there are multiple preset local facial-feature regions, there are multiple sample local facial features and multiple result local facial features. The cosine distance between the sample local facial features and the result local facial features corresponding to each local facial-feature region is determined respectively.
For example, suppose two local facial-feature regions are preset: an eyebrow region and a nose region. The cosine distance between the sample local facial features and the result local facial features corresponding to the eyebrow region, and the cosine distance between those corresponding to the nose region, are calculated respectively, giving two cosine distances.
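In symbols, for two feature vectors u and v the cosine distance is

$$d_{\cos}(u, v) = 1 - \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert},$$

where the fraction is the cosine similarity, i.e. the cosine of the angle between u and v.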
In step S2512, the weights corresponding to the preset local facial-feature regions are acquired, giving a plurality of weights.
The weight corresponding to each preset local facial-feature region can be determined in advance according to the importance of that region within the face. Because the facial-feature regions differ in importance for face images of different poses and angles, the weights can be adjusted according to such differences between face images.
In step S2513, the cosine distances are summed with these weights to obtain the first loss value between the sample local facial features and the result local facial features included in the sample face-changing result image.
The weight of each preset local facial-feature region is multiplied by the corresponding cosine distance to obtain a weighted value for each region, and all the weighted values are summed to obtain the first loss value.
Specifically, the first loss value between the sample local facial features and the result local facial features is calculated as:

$$\tau_{id} = \sum_{k=1}^{r} w_k \left( 1 - \cos\left\langle F_k(x_s),\, F_k(Y_{s2t}) \right\rangle \right)$$

where τ_id represents the loss value, x_s represents the original face sample image, Y_{s2t} represents the sample face-changing result image, F_k(·) represents the facial-feature vector extracted from the k-th preset local facial-feature region, r represents the number of preset local facial-feature regions, k indexes the facial-feature region of the current dimension, and w_k represents the weight corresponding to the facial-feature region of the current dimension. The term 1 − cos⟨F_k(x_s), F_k(Y_{s2t})⟩ is the cosine distance between the facial-feature vector of the original face sample image and that of the sample face-changing result image. For example, if there are two preset local facial-feature regions, one for the eyebrow region and one for the nose-and-mouth region, then r = 2, k = 1 indicates the eyebrow region, and k = 2 indicates the nose-and-mouth region.
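A direct sketch of this formula in PyTorch; the encoder producing the per-region feature vectors F_k(·) is assumed to exist elsewhere:

```python
import torch.nn.functional as F

def first_loss(sample_feats, result_feats, weights):
    """Weighted sum of cosine distances over the preset local facial-feature regions.

    sample_feats / result_feats: lists of (B, D) per-region feature vectors,
    i.e. F_k(x_s) and F_k(Y_s2t); weights: the per-region weights w_k.
    """
    loss = 0.0
    for w_k, s_k, r_k in zip(weights, sample_feats, result_feats):
        cosine_distance = 1.0 - F.cosine_similarity(s_k, r_k, dim=-1)  # (B,)
        loss = loss + w_k * cosine_distance.mean()
    return loss
```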
Fig. 6 is a schematic diagram illustrating the calculation of a first loss value according to an exemplary embodiment.
As shown in fig. 6, a feature-extraction encoder is used to extract the sample local facial features from the original face sample image and the result local facial features from the sample face-changing result image; the cosine similarity between the sample local facial features and the result local facial features is calculated, and the first loss value is obtained from the cosine similarity and the corresponding weights.
With the method of steps S2511 to S2513, the weighted sum of the facial-feature differences of all dimensions, i.e., the first loss value, can be determined while taking the weight of each preset local facial-feature region into account. The first loss value is the difference of the facial features of each dimension between the input original face sample image and the output face-changing result image; it accurately describes the output loss of the model, thereby giving the direction in which the model parameters should be improved and raising the training precision of the model.
In a possible implementation, before the cropping of the plurality of preset local facial-feature regions of the original face image, the method further includes step S31 or step S32:
Step S31: acquiring, by using an activation layer of a face recognition model, a first facial-feature region in the original face image whose response value is higher than a preset threshold, and taking the first facial-feature region as a preset local facial-feature region;
or, step S32: determining a preset second facial-feature region for similarity enhancement, and taking the second facial-feature region as a preset local facial-feature region.
In step S31, the scheme determines the facial-feature regions to be enhanced according to the face recognition model. The aim of face changing is to make the face-changing result image and the original face image similar enough that a person subjectively perceives the same person and a face recognition model judges them to be the same person. What, then, is the criterion of the face recognition model?
By visualizing the activation values of the last layer of the face recognition model, it can be seen that, compared with other facial regions, the response values of facial-feature regions such as the eyes, nose, mouth, and eyebrows are higher than a preset threshold, which indicates that these are the places by which face similarity is generally judged. The scheme therefore crops the first facial-feature region, whose response value is higher than the preset threshold, as a preset local facial-feature region.
In step S32, the region of interest to the user can also be cropped flexibly and in a user-defined way; that is, the region whose similarity needs to be enhanced is set as the preset local facial-feature region. For example, the preset local facial-feature region may be set as the left-eye region; the left-eye region of the original face image is then cropped, and the facial features in the image corresponding to that region are extracted to obtain the left-eye features of the original face image. Meanwhile, a synthesis model constructed based on the left-eye features is selected, and the left-eye features of the original face image, the identity features of the original face image, and the accessory features of the target face image are synthesized, finally obtaining a face-changing result image in which only the left-eye features are enhanced.
In one possible implementation, the extracting of facial features from the images corresponding to the plurality of preset local facial-feature regions in the original face image includes: if a certain preset local facial-feature region comprises at least two facial organs, extracting the feature information of each of the at least two facial organs, and extracting the feature information of the facial area between two adjacent facial organs, as the facial features of the image corresponding to that preset local facial-feature region.
In the embodiment of the present disclosure, a preset local facial-feature region may contain one facial organ, or two or more facial organs.
Specifically, each individual facial organ may be taken as a preset local facial-feature region as required; that is, the eyebrow region, eye region, nose region, mouth region, and ear region are each set as a preset local facial-feature region.
A combination of several individual organs may also be arranged as a preset local facial-feature region. For example, the eyebrow region and the eye region are combined as one preset local facial-feature region, and the nose region and the mouth region are combined as another.
When a preset local facial-feature region includes at least two facial organs, in addition to the feature information of the two organs themselves, the feature information of the facial area between the two adjacent organs is extracted, as shown in the sketch below. For example, when the nose region and the mouth region are combined as one preset local facial-feature region, in addition to extracting the feature information of the nose region and the mouth region separately, the feature information of the nasolabial-fold area between them is extracted.
Compared with a scheme in which a preset local region contains only one facial organ, a preset local facial-feature region containing at least two facial organs adds the facial area between the two organs as an extraction region, so face-changing processing can be performed on the basis of more facial information, further improving the similarity of the facial area between the two organs in the face-changing result image.
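A sketch of the multi-organ case, using the combined nose-and-mouth region as the example; the sub-region boxes are illustrative placeholders, and the encoder is assumed:

```python
import torch

# Illustrative sub-regions of a combined nose-and-mouth region, as
# (top, left, height, width) boxes inside the region crop.
NOSE_MOUTH_SUBREGIONS = {
    "nose": (0, 20, 40, 60),
    "nasolabial": (40, 10, 15, 80),  # the facial area between the two organs
    "mouth": (55, 15, 35, 70),
}

def extract_multi_organ_features(region_crop, encoder):
    """Extract and concatenate features of each organ and the area between them."""
    feats = [encoder(region_crop[:, :, t:t + h, l:l + w])
             for (t, l, h, w) in NOSE_MOUTH_SUBREGIONS.values()]
    return torch.cat(feats, dim=-1)
```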
In a possible embodiment, after the first facial-feature region is taken as the preset local facial-feature region, or the second facial-feature region is taken as the preset local facial-feature region, the method further comprises:
cropping the plurality of preset local facial-feature regions from the original face image.
Specifically, each preset local facial-feature region is cropped from the original face image to obtain an individual image of each region, and feature extraction is then performed on the individual image of each preset local facial-feature region to obtain the corresponding features of the original face image.
Similarly, before the accessory features of the target face image are extracted, each preset local facial-feature region may also be cropped from the target face image, and accessory-feature extraction may then be performed on each cropped region image.
Cropping the preset local facial-feature regions from the original face image or the target face image before feature extraction reduces image-noise interference during feature extraction, makes the related features easier to locate, and improves the efficiency of feature extraction.
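As a usage illustration (not the claimed procedure), the pre-extraction cropping of both images can reuse the crop_regions sketch above:

```python
# Crop the preset regions from both images before running the extractors,
# narrowing each extractor's input to the relevant facial area.
original_crops = crop_regions(original_img.squeeze(0))  # from the original face image
target_crops = crop_regions(target_img.squeeze(0))      # from the target face image
```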
Fig. 7 is a block diagram illustrating the structure of an image face-changing processing apparatus according to an exemplary embodiment. As shown in Fig. 7, the image face-changing processing apparatus 40 includes:
an image acquisition module 41 configured to acquire an original face image and a target face image; the target face image is a reference image for face changing;
a first feature extraction module 42 configured to respectively extract identity features of the original face image and accessory features of the target face image; the identity features represent facial image features used for face recognition, and the accessory features are the features of the target face image other than the identity features;
a second feature extraction module 43 configured to extract facial features from the images corresponding to a plurality of preset local facial feature regions in the original face image to obtain a plurality of local facial features;
and a synthesis module 44 configured to perform face image synthesis on the plurality of local facial features, the identity features and the accessory features by using a pre-trained synthesis model, to obtain a face change result image from the original face image to the target face image, wherein the facial image features in the face change result image comprise the plurality of local facial features, the identity features and the accessory features.
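How the modules cooperate can be summarized in the short pipeline sketch below; all four callables are assumed interfaces introduced for illustration, not components defined by the present disclosure.

    def face_swap(original_image, target_image, id_extractor,
                  accessory_extractor, region_extractor, synthesis_model):
        identity = id_extractor(original_image)          # module 42: identity features
        accessory = accessory_extractor(target_image)    # module 42: accessory features
        local_feats = region_extractor(original_image)   # module 43: local facial features
        return synthesis_model(local_feats, identity, accessory)  # module 44: synthesis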
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 8 is a block diagram illustrating an electronic device for image face-changing processing according to an exemplary embodiment; its internal structure may be as shown in Fig. 8. The server or electronic device includes a processor, a memory and a network interface connected by a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface is used to communicate with an external terminal over a network connection. The computer program, when executed by the processor, implements an image face-changing processing method.
It will be appreciated by those skilled in the art that the structure shown in Fig. 8 is a block diagram of only the portion of the structure relevant to the disclosed aspects and does not limit the servers or electronic devices to which the disclosed aspects apply; a particular server or electronic device may include more or fewer components than shown, may combine certain components, or may arrange the components differently.
In an exemplary embodiment, there is also provided a server or an electronic device including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the image face-changing processing method in the embodiments of the present disclosure.
In an exemplary embodiment, there is also provided a computer-readable storage medium; when the instructions in the storage medium are executed by a processor of a server or an electronic device, the server or the electronic device is enabled to perform the image face-changing processing method in the embodiments of the present disclosure. The computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the image face-changing processing method in the embodiments of the present disclosure.
It will be understood by those skilled in the art that all or part of the processes of the methods in the embodiments described above may be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or an external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (11)

1. An image face-changing processing method, characterized in that the method comprises:
acquiring an original face image and a target face image; the target face image is a reference image for face changing;
respectively extracting identity features of the original face image and accessory features of the target face image to obtain the identity features and the accessory features; the identity features represent facial image features used for face recognition, and the accessory features are the features of the target face image other than the identity features;
extracting facial features from the images corresponding to a plurality of preset local facial feature regions in the original face image to obtain a plurality of local facial features;
and performing face image synthesis on the plurality of local facial features, the identity features and the accessory features by using a pre-trained synthesis model to obtain a face change result image from the original face image to the target face image, wherein the facial image features in the face change result image comprise the plurality of local facial features, the identity features and the accessory features.
2. The method of claim 1, wherein the synthesis model is constructed by:
acquiring a first face image sample set and a second face image sample set; the first face image sample set comprises an original face sample image, and the second face image sample set comprises a target face sample image; the target face sample image is a reference image used for face-changing processing of the original face sample image;
extracting facial features from the images corresponding to the plurality of preset local facial feature regions in the original face sample image to obtain a plurality of sample local facial features;
extracting identity features from the original face sample image as sample identity features, and extracting accessory features from the target face sample image as sample accessory features;
inputting the plurality of sample local facial features, the sample identity features and the sample accessory features into an initial model to obtain a sample face change result image output by the initial model;
determining a loss value between the sample face change result image and the original face sample image and the target face sample image; the loss value comprises at least a first loss value between the sample local facial features and the result local facial features of the sample face change result image; the result local facial features are extracted from the preset local facial feature regions of the sample face change result image;
and adjusting parameters of the initial model according to the loss value, continuing to train the initial model with the adjusted parameters until the obtained loss value is smaller than a preset threshold value, and determining the trained initial model as the synthesis model.
3. The method of claim 2, wherein the determining a loss value between the sample face change result image and the original face sample image and the target face sample image comprises:
determining a first loss value between the plurality of sample local facial features and a plurality of result local facial features included in the sample face change result image;
determining a second loss value between the sample identity features and the result identity features of the sample face change result image; the result identity features are the identity features included in the sample face change result image;
determining a third loss value between the sample accessory features and the result accessory features of the sample face change result image; the result accessory features are the accessory features included in the sample face change result image;
and determining the loss value between the sample face change result image and the original face sample image and the target face sample image based on the first loss value, the second loss value and the third loss value.
4. The method of claim 3, wherein the determining a first loss value between the plurality of sample local facial features and a plurality of result local facial features included in the sample face change result image comprises:
respectively determining the cosine distance between each sample local facial feature and the corresponding result local facial feature to obtain a plurality of cosine distances;
acquiring the weight corresponding to each preset local facial feature region to obtain a plurality of weights;
and performing a weighted sum of the plurality of cosine distances based on the plurality of weights to obtain the first loss value between the plurality of sample local facial features and the plurality of result local facial features included in the sample face change result image.
5. The method according to any one of claims 1-4, further comprising, before extracting facial features from the images corresponding to the plurality of preset local facial feature regions in the original face image:
acquiring, by using an activation layer of a face recognition model, a first facial feature region in the original face image whose activation response is higher than a preset threshold value, and taking the first facial feature region as a preset local facial feature region;
or, determining a preset second facial feature region for which similarity is to be enhanced, and taking the second facial feature region as a preset local facial feature region.
6. The method according to claim 5, wherein after the first facial feature region or the second facial feature region is taken as a preset local facial feature region, the method further comprises:
cropping the plurality of preset local facial feature regions from the original face image.
7. The method according to claim 1, wherein the extracting facial features from the images corresponding to a plurality of preset local facial feature regions in the original face image comprises:
if a preset local facial feature region comprises at least two facial organs, extracting feature information of each of the at least two facial organs, and extracting feature information of the facial region between two adjacent facial organs, to serve as the facial features of the image corresponding to that preset local facial feature region.
8. An image face-changing processing apparatus, characterized in that the apparatus comprises:
an image acquisition module configured to acquire an original face image and a target face image; the target face image is a reference image for face changing;
a first feature extraction module configured to respectively extract identity features of the original face image and accessory features of the target face image; the identity features represent facial image features used for face recognition, and the accessory features are the features of the target face image other than the identity features;
a second feature extraction module configured to extract facial features from the images corresponding to a plurality of preset local facial feature regions in the original face image to obtain a plurality of local facial features;
and a synthesis module configured to perform face image synthesis on the plurality of local facial features, the identity features and the accessory features by using a pre-trained synthesis model to obtain a face change result image from the original face image to the target face image, wherein the facial image features in the face change result image comprise the plurality of local facial features, the identity features and the accessory features.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the image face-changing processing method of any one of claims 1 to 7.
10. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of a server, enable the server to perform the image face-changing processing method of any one of claims 1 to 7.
11. A computer program product comprising computer programs/instructions, characterized in that the computer programs/instructions, when executed by a processor, implement the image face-changing processing method of any one of claims 1 to 7.
CN202210879102.XA 2022-07-25 2022-07-25 Image face changing processing method and device, electronic equipment and storage medium Pending CN115423677A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210879102.XA CN115423677A (en) 2022-07-25 2022-07-25 Image face changing processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115423677A true CN115423677A (en) 2022-12-02

Family

ID=84195774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210879102.XA Pending CN115423677A (en) 2022-07-25 2022-07-25 Image face changing processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115423677A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116580133A (en) * 2023-07-14 2023-08-11 北京大学 Image synthesis method, device, electronic equipment and storage medium
CN116580133B (en) * 2023-07-14 2023-09-22 北京大学 Image synthesis method, device, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination