CN113627404A - High-generalization face replacement method and device based on causal inference and electronic equipment - Google Patents

High-generalization face replacement method and device based on causal inference and electronic equipment

Info

Publication number
CN113627404A
CN113627404A (application CN202111185354.4A); granted as CN113627404B
Authority
CN
China
Prior art keywords
face
face image
representation
target
source
Prior art date
Legal status
Granted
Application number
CN202111185354.4A
Other languages
Chinese (zh)
Other versions
CN113627404B (en)
Inventor
赫然
黄怀波
高格格
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN202111185354.4A
Publication of CN113627404A
Application granted
Publication of CN113627404B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods


Abstract

The invention provides a high-generalization face replacement method and device based on causal inference, and electronic equipment. The method comprises: determining a source face image and a target face image; and inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model. The face replacement model determines the identity information representation of the source face image based on the causal effect of the expression posture parameters of the target face image on the identity information, and performs face replacement based on that identity information representation and the perception information representation of the target face image. The face replacement model is trained on sample source face images and sample target face images. The method, device, electronic equipment, and storage medium provided by the invention yield high-quality, realistic face replacement images, thereby improving the stability and generalization capability of the face replacement technology across different target scenes.

Description

High-generalization face replacement method and device based on causal inference and electronic equipment
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a high-generalization face replacement method and device based on causal inference and electronic equipment.
Background
Face image identity replacement is a frontier research problem in computer vision and image generation. It has extremely important application value in fields such as virtual reality, film special effects, and game production, and has attracted wide attention in both academia and industry. Face image identity replacement, i.e., "face changing", replaces the identity information of a given target face image with the identity of a source face image while keeping the other contents of the image unchanged.
At present, the main difficulty of face changing technology lies in improving its generalization capability. In difficult scenes where the source face image and the target face image differ greatly, that is, when the pose (face orientation angle), expression, and so on of the face in the target face image differ greatly from those in the source face image, the face image generated by the model can hardly show the state the source face should present under the target expression and pose, and the face changing result is usually distorted.
Disclosure of Invention
The invention provides a high-generalization face replacement method and device based on causal inference, and electronic equipment, which are used to overcome the defect of the low generalization capability of face replacement technology in the prior art and to improve the generalization capability of face replacement technology.
The invention provides a causal inference-based high-generalization face replacement method, which comprises the following steps:
determining a source face image and a target face image;
inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model;
the face replacement model determines the identity information representation of the source face image based on the causal effect of the expression posture parameters of the target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image;
the face replacement model is obtained by training based on the sample source face image and the sample target face image.
According to the high-generalization face replacement method based on causal inference, which is provided by the invention, the face replacement model comprises a first face statistical network; the first face statistical network is used for determining corresponding dense key points of the face based on the input face image;
the causal effect of the expression posture parameters of the target face image on the identity information is determined based on the following steps:
determining a causal effect of the expression posture parameters on the dense face key points based on the first face statistical network and a second face statistical network; the second face statistical network is obtained by inserting an information bottleneck layer into each intermediate layer of the first face statistical network in sequence, and the information bottleneck layers are used to compress the expression posture parameter information;
and migrating the causal effect of the expression posture parameters on dense key points of the human face based on the migration parameters in the human face replacement model to obtain the causal effect of the expression posture parameters on the identity information.
According to the high-generalization face replacement method based on causal inference, which is provided by the invention, the face replacement model comprises a face recognition network; the face recognition network is used for determining corresponding original identity representation based on the input face image;
the identity information representation of the source face image is determined based on the following steps:
inputting the source face image into the face recognition network, and extracting source feature representation of the source face image from an intermediate layer of the face recognition network;
and determining the identity information representation of the source face image based on the source feature representation and the causal effect.
According to the high-generalization face replacement method based on causal inference provided by the invention, the determining the identity information representation of the source face image based on the source feature representation and the causal effect comprises the following steps:
determining an updated source feature representation based on the source feature representation and a regression kernel in the face replacement model; the regression kernel is used for compressing information which is irrelevant to identity in the source feature representation;
inputting the updated source feature representation into an intermediate layer of the face recognition network to obtain a compact identity representation output by the face recognition network;
determining the identity information representation based on the compact identity representation and the causal effect.
According to the high-generalization face replacement method based on causal inference provided by the invention, the perception information representation of the target face image is determined based on the following steps:
inputting the target face image into the face recognition network, and extracting target feature representation of the target face image from an intermediate layer of the face recognition network;
determining a perceptual information representation of the target face image based on the target feature representation.
According to the high-generalization face replacement method based on causal inference, provided by the invention, the face replacement model further comprises a kernel regression network; the kernel regression network is used for removing specific identity information contained in the input data;
the determining a perceptual information representation of the target face image based on the target feature representation comprises:
and inputting the target feature representation into the kernel regression network to obtain the perception information representation of the target face image output by the kernel regression network.
According to the high-generalization face replacement method based on causal inference provided by the invention, the kernel regression network is trained based on the identity information representation of the sample source face image, together with the target feature representation and the original identity representation of the sample target face image as determined by the face recognition network.
The invention also provides a high-generalization face replacement device based on causal inference, which comprises:
the determining module is used for determining a source face image and a target face image;
the replacing module is used for inputting the source face image and the target face image into a face replacing model to obtain a face replacing image output by the face replacing model;
the face replacement model determines the identity information representation of the source face image based on the causal effect of the expression posture parameters of the target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image;
the face replacement model is obtained by training based on the sample source face image and the sample target face image.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of the high-generalization human face replacement method based on causal inference.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the causal inference based highly generalized face replacement method as described in any of the above.
The high-generalization face replacement method, device, and electronic equipment based on causal inference provided by the invention perform causal inference through the face replacement model to determine the causal effect of the expression posture parameters of the target face image on the identity information, thereby estimating the influence that differences between the target face image and the source face image in expression, pose, and so on exert on the source identity representation. The identity information representation of the source face image is determined based on this causal effect, while the perception information representation of the target face image is effectively extracted. Face replacement performed on this basis yields a high-quality, realistic face replacement image, thereby improving the stability and generalization capability of the face replacement technology across different target scenes.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a highly generalized face replacement method based on causal inference provided by the present invention;
FIG. 2 is a schematic flow diagram of a causal determination method provided by the present invention;
FIG. 3 is a schematic diagram of a computing framework of a face replacement model provided by the present invention;
FIG. 4 is a schematic flow chart of a face replacement model construction method provided by the invention;
FIG. 5 is a schematic structural diagram of a highly generalized face replacement device based on causal inference provided by the present invention;
fig. 6 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
The embodiment of the invention provides a high-generalization face replacement method based on causal inference. Fig. 1 is a schematic flow chart of a causal inference-based highly generalized face replacement method provided by the present invention, and as shown in fig. 1, the method includes:
step 110, determining a source face image and a target face image.
Specifically, the source face image is the face image whose identity information needs to be retained in the face replacement process; correspondingly, the image whose identity information is to be replaced and whose perception information is to be retained is the target face image. Here, the perception information may include the hair, clothing, background, lighting conditions, and the like of the target face image. The source face image and the target face image may be captured by a web crawler or other means, or acquired by image acquisition devices such as scanners, mobile phones, and cameras; this is not specifically limited in the embodiments of the present invention.
Step 120, inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model;
the face replacement model determines the identity information representation of a source face image based on the causal effect of the expression posture parameters of a target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image;
the face replacement model is obtained by training based on the sample source face image and the sample target face image.
Here, the expression posture parameters may include expression parameters and pose parameters of the target face image: the expression parameters represent the expression information of the face in the corresponding image, and the pose parameters represent the orientation-angle information of the face. The causal effect of the expression posture parameters of the target face image on the identity information follows the concept of interventional causal observation: it is the difference between the identity estimation results obtained from the target face image with and without the expression posture parameters.
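The interventional definition above can be sketched numerically: the causal effect is the difference between the identity estimates obtained with and without the expression-posture-related information. The following is a minimal illustrative sketch; the estimator, the suppression step, and all names are hypothetical stand-ins, not components specified by the patent:

```python
import numpy as np

def causal_effect(identity_estimator, target_image, remove_expo_info):
    """Interventional estimate: compare the identity estimate of the
    original target image (control group, expression-pose info present)
    with that of a version from which expression-pose-related information
    has been removed (experimental group). Their difference is the
    causal effect on the identity estimate."""
    z_control = identity_estimator(target_image)
    z_treated = identity_estimator(remove_expo_info(target_image))
    return z_treated - z_control

# Toy check with a mean-pooling "estimator" and a crude suppressor.
estimate = lambda img: img.mean(axis=(0, 1))
img = np.ones((4, 4, 3))
effect = causal_effect(estimate, img, lambda x: x * 0.5)
```

As the description explains next, the experimental-group estimate is not directly observable for real faces, which is why the patent routes the computation through a mediating variable.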
Specifically, in the prior art, in difficult scenes where the source face image and the target face image differ greatly, that is, when the pose, expression, and so on of the face in the target face image differ greatly from those in the source face image, the face changing result is usually distorted. To solve this problem, in the embodiment of the invention, after the source face image and the target face image are input into the face replacement model, the face replacement model determines the causal effect of the expression posture parameters of the target face image on the identity information and uses this causal effect to estimate the inductive bias of the source face image's identity representation in the target scene, so as to determine the identity information representation of the source face image; it also extracts a perception information representation unrelated to identity information from the target face image.
The causal effect of the expression posture parameters of the target face image on the identity information can specifically be obtained through causal inference conditioned on the target scene of the target face image; the embodiment of the invention does not specifically limit the manner of causal inference. The identity information representation of the source face image may be determined from the original identity representation obtained by performing face recognition on the source face image together with the causal effect, or from a feature representation extracted from the source face image together with the causal effect. The perception information representation of the target face image may be obtained by directly recognizing the perception information of the target face image, or determined from a feature representation extracted from the target face image.
In addition, before step 120 is executed, the face replacement model needs to be trained in advance, which can be done as follows: first, collect a large number of sample source face images and sample target face images; then, train an initial model on these samples to obtain the face replacement model. The embodiments of the present invention do not specifically limit the network type or structure of the initial model.
The method provided by the embodiment of the invention performs causal inference through the face replacement model to determine the causal effect of the expression posture parameters of the target face image on the identity information, thereby estimating the influence of the differences between the target face image and the source face image in expression, pose, and so on, on the source identity representation. The identity information representation of the source face image is determined based on this causal effect, and the perception information representation of the target face image is effectively extracted at the same time. Face replacement performed on this basis yields a high-quality, realistic face replacement image, improving the stability and generalization capability of the face replacement technology across different target scenes.
Based on any of the above embodiments, the face replacement model includes a first face statistics network; the first face statistical network is used for determining corresponding face dense key points based on the input face image;
the causal effect of the expression posture parameters of the target face image on the identity information is determined based on the following steps:
determining the causal effect of the expression posture parameters on the dense face key points based on the first face statistical network and a second face statistical network; the second face statistical network is obtained by inserting an information bottleneck layer into each intermediate layer of the first face statistical network in sequence, and the information bottleneck layers are used to compress the expression posture parameter information;
and based on the migration parameters in the face replacement model, migrating the causal effect of the expression posture parameters on dense key points of the face to obtain the causal effect of the expression posture parameters on the identity information.
Specifically, the causal effect of the expression posture parameter f_expo of the target face image on the identity information can be obtained through a controlled-variable intervention experiment on f_expo: the difference between the identity estimation results of the experimental group and the control group in this experiment is the causal effect. The identity estimate of the control group can be obtained by directly recognizing the target face image with the face recognition network, while the identity estimate z_id of the experimental group is the identity estimation result obtained free of any influence from f_expo-related information.
However, no real face image exists that carries no f_expo information at all, and the training data usually do not contain paired images of the same person under different expressions and poses to learn from; in addition, f_expo and the identity information are not estimated by the same network. Both of these issues mean that the experimental-group result cannot be obtained in the controlled-variable intervention experiment. Directly inferring the causal effect of f_expo on the identity information is therefore impossible, i.e., there is a Fundamental Problem of Causal Inference (FPCI).
To address this problem, the embodiment of the present invention first introduces the non-rigid face shape (expressed by the dense face key points f_mesh) as a mediating variable. Since f_expo and f_mesh can both be estimated from the input face image by the same face statistical network, the causal effect of f_expo on f_mesh, i.e., the difference between the f_mesh estimation results obtained before and after eliminating f_expo-related information, can be computed through the face statistical network. Causal-effect migration is then performed according to the migration parameters in the face replacement model, transferring the causal effect of f_expo on f_mesh onto the identity information and finally obtaining the causal effect of f_expo on the identity information.
Here, the causal effect of f_expo on f_mesh can be obtained by estimating f_mesh for the target face image with the original face statistical network included in the face replacement model, i.e., the first face statistical network, and with the second face statistical network, which is obtained by inserting an information bottleneck layer into each intermediate layer of the first face statistical network in sequence.
It will be appreciated that each information bottleneck layer compresses the f_expo-related information in the features it processes, limiting the flow of f_expo-related information into the computation graph, so that causal inference with the target scene information controlled is realized through the information bottleneck principle. The f_mesh estimation result obtained by the first face statistical network is the control-group result of the controlled-variable intervention experiment, the f_mesh estimation result obtained by the second face statistical network is the experimental-group result, and the difference between the two is the causal effect of f_expo on f_mesh.
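The control/experimental comparison between the two face statistical networks, followed by the migration step, can be sketched as follows. Here `first_net`, `second_net`, and the linear `migration_matrix` are hypothetical stand-ins: the patent does not specify the form of the migration parameters, so the linear map is an illustrative assumption.

```python
import numpy as np

def keypoint_causal_effect(first_net, second_net, target_image):
    """Control group: dense key points estimated by the original (first)
    face statistical network. Experimental group: key points estimated by
    the second network, whose inserted information bottleneck layers
    compress expression-pose-related information. The difference is the
    causal effect of the expression posture parameters on the key points."""
    mesh_control = first_net(target_image)
    mesh_treated = second_net(target_image)
    return mesh_treated - mesh_control

def migrate_to_identity(mesh_effect, migration_matrix):
    """Transfer the key-point-level causal effect into identity space via
    migration parameters (modeled here, illustratively, as a matrix)."""
    return migration_matrix @ mesh_effect

# Toy check with stand-in networks returning fixed key-point vectors.
first = lambda img: np.array([1.0, 2.0])
second = lambda img: np.array([0.5, 1.5])
mesh_eff = keypoint_causal_effect(first, second, None)
id_eff = migrate_to_identity(mesh_eff, 2.0 * np.eye(2))
```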
Based on any of the above embodiments, the face replacement model includes a face recognition network; the face recognition network is used for determining corresponding original identity representation based on the input face image;
the identity information representation of the source face image is determined based on the following steps:
inputting a source face image into a face recognition network, and extracting source characteristic representation of the source face image from an intermediate layer of the face recognition network;
and determining the identity information representation of the source face image based on the source feature representation and the causal effect.
Specifically, the face replacement model includes a face recognition network, and the face recognition network may perform face recognition based on an input face image to obtain an original identity representation corresponding to the face image. On the basis, the identity information representation of the source face image can be obtained by the following method: firstly, inputting a source face image into the face recognition network, and extracting source characteristic representation of the source face image from an intermediate layer of the face recognition network in the process of calculating the original identity representation of the source face image by the face recognition network; and then, determining the identity information representation of the source face image according to the source feature representation and the causal effect of the expression posture parameters of the target face image on the identity information.
Here, the identity information representation of the source face image may be determined in several ways: the source feature representation may be input into another network, and the identity information representation determined from that network's recognition result and the causal effect; or the source feature representation may be feature-transformed, the transformed representation fed back into the face recognition network, and the identity information representation determined from the recognition result and the causal effect. The embodiments of the present invention do not specifically limit this.
Based on any of the above embodiments, determining the identity information representation of the source face image based on the source feature representation and the causal effect includes:
determining an updated source feature representation based on the source feature representation and a regression kernel in the face replacement model; the regression kernel is used for compressing information which is irrelevant to the identity in the source feature representation;
inputting the updated source feature representation into an intermediate layer of the face recognition network to obtain compact identity representation output by the face recognition network;
an identity information representation is determined based on the compact identity representation and the causal effect.
Specifically, in order to obtain a more compact identity information representation of the source face image, further improve the accuracy of face replacement, and reduce its computation, the embodiment of the present invention uses the regression kernel in the face replacement model to compress the identity-irrelevant information in the extracted source feature representation. The resulting updated source feature representation is fed back into the intermediate layer of the face recognition network, which performs identity recognition on it, yielding a compact identity representation of the source face image.
Then, the identity information representation of the source face image best suited to the identity replacement task can be determined from the compact identity representation of the source face image and the causal effect of the expression posture parameters of the target face image on the identity information, i.e., the inductive bias of the source identity representation in the target scene.
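The regression-kernel pipeline described above admits a compact sketch. The names `regression_kernel` and `recognition_tail` (the remaining layers of the face recognition network after the intermediate layer), as well as the additive use of the causal effect, are illustrative assumptions rather than the patent's exact formulation:

```python
import numpy as np

def identity_representation(source_features, regression_kernel,
                            recognition_tail, causal_effect):
    """Sketch of the three steps described above:
    1) the regression kernel compresses identity-irrelevant information
       in the source feature representation;
    2) the updated features re-enter the face recognition network's
       remaining intermediate layers, producing a compact identity
       representation;
    3) the causal effect of the target's expression posture parameters
       corrects that representation (additive correction assumed)."""
    updated = regression_kernel(source_features)
    compact_identity = recognition_tail(updated)
    return compact_identity + causal_effect

# Toy check with linear stand-ins for the kernel and network tail.
feats = np.array([2.0, 4.0])
rep = identity_representation(feats, lambda f: 0.5 * f,
                              lambda f: f + 1.0, np.array([0.1, -0.1]))
```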
Based on any of the above embodiments, the perceptual information representation of the target face image is determined based on the following steps:
inputting a target face image into a face recognition network, and extracting target feature representation of the target face image from an intermediate layer of the face recognition network;
based on the target feature representation, a perceptual information representation of the target face image is determined.
Specifically, the perceptual information representation of the target face image may be obtained by: firstly, inputting a target face image into a face recognition network included in a face replacement model, and extracting target feature representation of the target face image from an intermediate layer of the face recognition network in the process of calculating the original identity representation of the target face image by the face recognition network; then, according to the target feature representation, the perception information representation of the target face image is determined.
Here, the perceptual information representation of the target face image may be determined either by performing perceptual information recognition directly on the target feature representation, or by applying a kernel regression transformation to the target feature representation to remove the specific identity information it contains; the embodiment of the present invention does not specifically limit this.
Based on any of the above embodiments, the face replacement model further includes a kernel regression network; the kernel regression network is used for removing specific identity information contained in the input data;
determining a perceptual information representation of the target face image based on the target feature representation, comprising:
and inputting the target feature representation into a kernel regression network to obtain the perception information representation of the target face image output by the kernel regression network.
Specifically, after the target feature representation of the target face image is extracted from the intermediate layer of the face recognition network, the target feature representation may be input into the kernel regression network included in the face replacement model. The kernel regression network performs a kernel regression transformation on the target feature representation, decoupling the identity information contained in it from the perceptual information and ensuring that only the decoupled identity information undergoes the regression transformation: the identity information contained in the target face image is removed, while the perceptual information it contains is left unchanged. The kernel regression network thus finally yields the perceptual information representation of the target face image.
Based on any of the above embodiments, the kernel regression network is trained based on the identity information representation of the sample source face image, and the target feature representation and the original identity representation of the sample target face image determined by the face recognition network.
Specifically, in order to remove specific identity information contained in the input data, the kernel regression network may be trained as follows: inputting the sample source face image into a face replacement model to obtain identity information representation of the sample source face image; inputting the sample target face image into a face recognition network to obtain an original identity representation of the sample target face image output by the face recognition network, and extracting a target feature representation of the sample target face image from an intermediate layer of the face recognition network; and then, training the initial kernel regression network according to the identity information representation of the sample source face image, the target feature representation and the original identity representation of the sample target face image, thereby obtaining the kernel regression network.
Based on any of the above embodiments, after a large number of sample source face images are collected, each sample source face image may be aligned by its facial key points and cropped to the fixed size required by the face replacement model, for example 512 × 512; the cropped sample source face images are then used to train the face replacement model.

Further, the facial key points may be obtained by estimating the facial key points of the sample source face image with a face statistical network, which may be a network based on a 3D Morphable Model (3DMM) of the face, denoted M_3d(·). After the facial key points of the sample source face image are estimated, a similarity transformation matrix H between this set of key points and a set of reference key points can be computed; the sample source face image is affine-transformed by this matrix and cropped to the fixed size required by the face replacement model, and the processed sample source face image can then be used for subsequent training of the face replacement model.
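The similarity transformation between the detected key points and the reference key points can be estimated by least squares. The following NumPy sketch uses the classical Umeyama method; this specific estimator is an assumption, since the patent does not fix how H is computed:

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares similarity transform (scale * rotation + translation)
    mapping key points `src` (N, 2) onto reference key points `dst` (N, 2).
    Returns the 2x3 matrix H used to warp and crop the face image."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    sc, dc = src - mu_s, dst - mu_d
    cov = dc.T @ sc / len(src)                       # cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    D = np.diag([1.0, d])                            # guard against reflection
    R = U @ D @ Vt
    var_s = (sc ** 2).sum() / len(src)
    scale = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - scale * (R @ mu_s)
    return np.hstack([scale * R, t[:, None]])
```

Applying H to the detected key points maps them onto the reference layout, after which the image itself is warped with the same matrix.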
Based on any of the above embodiments, in order to obtain the causal effect of the expression-pose parameters f_expo on the dense facial key points f_mesh, the embodiment of the present invention designs a Hierarchical Information Bottleneck (HIB) module composed of a plurality of information bottleneck layers φ^(i), each of which is an independent three-layer convolutional neural network. By successively inserting these information bottleneck layers into the intermediate layers of the first face statistical network, the compression of the information related to f_expo can be realised; the resulting network is the second face statistical network.

Here, the first face statistical network can determine the corresponding f_mesh and f_expo from an input face image, and may be implemented by the 3DMM-based network M_3d(·). The 3DMM parameters, i.e., the weighting parameters of a Principal Component Analysis (PCA), comprise three items: the face shape parameter f_shp, the head pose parameter, and the facial expression parameter, the latter two being collectively denoted f_expo. In addition, f_mesh can be determined from these three parameters.
Fig. 2 is a schematic flow chart of the causal effect determination method provided by the present invention; as shown in Fig. 2, the specific flow is as follows:
In the process of estimating f_mesh of the target face image X_t with the M_3d(·) network, the intermediate features of the network are extracted layer by layer and denoted f^(i). These intermediate features serve as the inputs of the information bottleneck layers φ^(i): from each intermediate feature f^(i), the corresponding layer predicts a channel-by-channel information mask m^(i), namely:

m^(i) = φ^(i)(f^(i))

where each element of m^(i) takes a value between 0 and 1, and m^(i) has the same spatial dimensions as f^(i).

The information mask m^(i) is then applied to the intermediate feature f^(i): guided by m^(i), random noise with the same distribution as f^(i) is injected into f^(i) to achieve information compression:

f̃^(i) = m^(i) ⊙ f^(i) + (1 − m^(i)) ⊙ ε

where ε is random Gaussian noise sampled from a Gaussian distribution with the same mean and variance as f^(i).
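The mask-guided noise injection can be sketched as follows in NumPy; the channel-wise granularity of the noise statistics is an assumption, since the patent only states that the noise shares the feature's distribution:

```python
import numpy as np

def inject_noise(feat, mask, rng=None):
    """f_tilde = m * f + (1 - m) * eps, with eps drawn from a Gaussian whose
    mean and std match the feature map channel-wise, so masked-out positions
    are replaced by same-distribution noise rather than zeros."""
    rng = rng or np.random.default_rng(0)
    mu = feat.mean(axis=(1, 2), keepdims=True)     # per-channel mean
    sigma = feat.std(axis=(1, 2), keepdims=True)   # per-channel std
    eps = rng.normal(size=feat.shape) * sigma + mu
    return mask * feat + (1.0 - mask) * eps
```

With a mask of all ones the feature passes through unchanged; with a mask of all zeros it is fully replaced by same-statistics noise.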
To guide each information bottleneck layer φ^(i) to predict its information mask m^(i) correctly, the embodiment of the present invention designs an information bottleneck trade-off equation, so that the value of each element of m^(i) corresponds to the importance of the information in f^(i) for expressing f_expo: the more relevant to f_expo the information carried by a neuron of the intermediate feature f^(i), the closer the element at the corresponding position of m^(i) is to 1; conversely, the closer it is to 0. The trade-off equation is designed as follows:
L_HIB = (1/N) · Σ_i I(f̃^(i); f^(i)) + α · D(f̃_expo, f_expo)

wherein α (> 0) is a weight parameter, the sum runs over the N information bottleneck layers, and the two terms play against each other:

(1) the first term is the average of the mutual information between each f̃^(i) and f^(i); the mutual information, denoted I(·;·), measures the degree of information compression of the noise-injected features f̃^(i);

(2) the second term, a prediction error D(f̃_expo, f_expo), measures the predictive power of the noise-injected features f̃^(i) for f_expo, so as to maximally retain the information related to f_expo; here f̃_expo denotes the expression-pose parameters computed after replacing the original intermediate features f^(i) of the network by f̃^(i).
In the plurality of information masks m^(i) learned in this way, the value of an element represents the degree of correlation between the corresponding neuron of the intermediate feature f^(i) and the representation of the f_expo information. The information bottleneck layer can therefore, through f̃^(i), replace the information irrelevant to f_expo with the noise ε, thereby achieving information compression. In addition, to avoid introducing systematic errors, ε is noise with the same distribution as f^(i). It should be noted that, to sufficiently compress the f_expo-irrelevant information, the information bottleneck layers φ^(i) are successively inserted into the 3D network M_3d(·), i.e., the compression effect of each bottleneck layer builds on that of the previous one.
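The successive insertion can be illustrated as repeated mask-guided noise injection, each stage operating on the output of the previous one. This is a simplified same-shape sketch; in the real network each stage sits at a different depth of M_3d(·):

```python
import numpy as np

def hierarchical_compress(feat, masks, rng=None):
    """Apply a sequence of information masks; stage i injects noise into the
    already-compressed output of stage i-1, so compression accumulates."""
    rng = rng or np.random.default_rng(0)
    out = feat
    for m in masks:
        mu, sigma = out.mean(), out.std()
        eps = rng.normal(mu, sigma, size=out.shape)
        out = m * out + (1.0 - m) * eps
    return out
```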
The original intermediate features f^(i) of the network are replaced by the noise-compressed features f̃^(i), 3D f_mesh estimation is carried out, and the parameter vector before the final classification fully-connected layer is taken as f̃_vec; using the original intermediate features before replacement, the corresponding parameter vector is taken as f_vec. Since the classification fully-connected layer only performs a classification function, according to the definition of causal effects in the interventionist causal view, the causal effect of f_expo on f_mesh can be determined from the variation of this parameter vector before and after the intermediate-feature replacement, denoted Δ(f_expo → f_mesh):

Δ(f_expo → f_mesh) = f_vec − f̃_vec
Then, causal effect migration is carried out according to the migration parameters in the face replacement model, transferring the causal effect of f_expo on f_mesh to the identity information and finally obtaining the causal effect of f_expo on the identity information. Here, the migration parameters can be realised by a neural-network implicit function g(·) with learnable parameters, in which case the causal effect of f_expo on the identity information can be expressed as:

Δ(f_expo → id) = g(Δ(f_expo → f_mesh)) + ε_u

where ε_u represents the influence of exogenous disturbances, such as the background and illumination in the face image, on the identity estimate.
Considering that exogenous disturbances are complex and variable and cannot be captured by a unified model, the embodiment of the present invention simulates the exogenous disturbance in the face image by random sampling from a von Mises-Fisher (vMF) distribution with mean 0 and known concentration κ. The concentration of the vMF distribution can be obtained by first recognising 5000 face images in advance to obtain their identity-estimate encoding vectors, and then inferring the concentration from the standard deviation of these vectors.
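The concentration κ can be recovered from the spread of the unit-normalised identity codes. A common closed-form approximation (due to Banerjee et al.) is sketched below; its use here is an assumption, since the patent only states that κ is inferred from the standard deviation of the codes:

```python
import numpy as np

def estimate_vmf_concentration(codes):
    """Approximate the vMF concentration kappa from identity code vectors
    (n, d) via the mean resultant length r_bar of the normalised codes:
    kappa ~= r_bar * (d - r_bar**2) / (1 - r_bar**2)."""
    unit = codes / np.linalg.norm(codes, axis=1, keepdims=True)
    r_bar = np.linalg.norm(unit.mean(axis=0))
    d = codes.shape[1]
    return r_bar * (d - r_bar ** 2) / (1.0 - r_bar ** 2)
```

Tightly clustered codes (small spread) give a large κ, widely dispersed codes a small one.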
In addition, with an inverse equation constructed analogously to f̃^(i), the mask m^(i) may also be used to compress the information related to f_expo:

f̂^(i) = (1 − m^(i)) ⊙ f^(i) + m^(i) ⊙ ε
Based on any of the above embodiments, the perceptual information representation of the target face image X_t may specifically be obtained as follows.

First, a pre-trained face recognition network M_id(·) is used to compute the original identity representation of X_t, and in the course of this computation the target feature representations h^(i) of X_t are extracted from the intermediate layers of M_id(·).

Then, the target feature representations can be input into the kernel regression network included in the face replacement model, which performs a kernel regression transformation on them. The kernel regression network may comprise a group of multiple nonlinear regressors {R^(i)}, each composed of a convolutional neural network that includes a regression kernel k^(i). The magnitude of the element values of k^(i) represents the correlation between the neurons of h^(i) and the identity representation, so the regression kernel k^(i) can act on the intermediate features h^(i) of M_id(·) as follows:

h̃^(i) = k^(i) ⊙ R^(i)(h^(i)) + (1 − k^(i)) ⊙ h^(i)

where the regression kernel k^(i) has the same size as the i-th target feature representation h^(i) of M_id(·) and its element values lie between 0 and 1. In this way the regression transformation takes effect only on the partial regions of h^(i) most relevant to the identity information, without changing the identity-irrelevant perceptual information contained in h^(i), so that the perceptual information representation r_t of X_t can be obtained.
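The kernel-gated transformation can be sketched as below, where `regressor` stands in for one of the nonlinear regressors R^(i) (a placeholder callable; the real regressor is a learned convolutional network):

```python
import numpy as np

def kernel_regression_step(feat, kernel, regressor):
    """Apply the learned transform only where the regression kernel k
    (elements in [0, 1]) marks identity-related positions; positions with
    k ~ 0 (identity-irrelevant perception information) pass through."""
    return kernel * regressor(feat) + (1.0 - kernel) * feat
```

At k = 1 the position is fully transformed, at k = 0 it is untouched, and intermediate values blend the two.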
Here, the regression kernels k^(i) are specifically obtained by establishing an optimization equation with a constraint term and learning from the identity information representation of the sample source face image X_s and from the target feature representation and original identity representation of the sample target face image X_t.

That is, an optimization equation with a constraint term is established, and the regression kernels required for the kernel regression transformation in feature space are learned in the kernel regression network; through the regression kernels it is guaranteed that the regression transformation acts only on the identity-related part of the feature representation without changing the other perceptual information. In the optimization equation, Φ_id(X; h^(i) → ·) denotes the identity representation of a given face image X computed after replacing the i-th target feature representation h^(i) of the face recognition network M_id(·) by the indicated features, z_id denotes the identity information representation of X_s, and z_t denotes the original identity representation of X_t.

If the element values of k^(i) correctly represent the degree of correlation between the neurons of h^(i) and the identity representation, then the identity-related region k^(i) ⊙ h^(i) will be changed by the transformation function, and the resulting identity representation will vary greatly relative to z_t; likewise, a transformation weighted by (1 − k^(i)) acts only on the regions weakly related to identity while preserving the identity-related regions, so the resulting identity representation should not vary significantly relative to z_t. In addition, during training, the cosine similarity between the transformed identity representation and z_id is made as large as possible, so that the identity representation of X_t is pushed closer to that of an "average face" and no longer carries identity information unique to X_t.

Through this optimization equation with a constraint term, the regression kernels k^(i) are effectively supervised during training and can learn the kernel regression transformation of the high-dimensional feature space. Computing the perceptual information representation r_t of X_t with the kernel regression network based on k^(i) decouples the identity information contained in X_t from its perceptual information, replaces the identity information unique to X_t with the non-unique identity information of an "average face", and at the same time preserves the background perceptual information required for face replacement.
In addition, in the process of computing the original identity representation of X_s with M_id(·), the source feature representations h_s^(i) of X_s are extracted from the intermediate layers of M_id(·), and the regression kernels k^(i) are then used to compress the identity-irrelevant information in h_s^(i). This is specifically realised by injecting same-distribution Gaussian noise ε into h_s^(i):

h̄_s^(i) = k^(i) ⊙ h_s^(i) + (1 − k^(i)) ⊙ ε

from which a more compact and robust identity representation, i.e., the compact identity representation z̄_s of X_s, can be obtained.

At the same time, k^(i) is required to satisfy constraints on two quantities: the mutual information I(h̄_s^(i); h_s^(i)), which measures the degree of information compression of the noise-injected features h̄_s^(i); and the identity representation computed for X_s after replacing the i-th source feature representation h_s^(i) of M_id(·) by h̄_s^(i), which should remain consistent with the original identity representation z_s of X_s. The original identity representation of X_s may specifically be determined by inputting X_s into the M_id network and taking the feature vector before the final classification fully-connected layer as the original identity representation.
On this basis, the identity information representation z_id of X_s, better suited to the identity replacement task, can be obtained from the compact identity representation z̄_s of X_s and the causal effect Δ(f_expo → id) of the expression-pose parameters of X_t on the identity information:

z_id = z̄_s + λ · Δ(f_expo → id)

where λ is a weight hyperparameter.
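A minimal sketch of this inductively biased estimate is given below; `g` is a placeholder for the learnable implicit function, `eps_u` a sampled exogenous disturbance, and the λ value is purely illustrative:

```python
import numpy as np

def biased_identity(z_compact, delta_mesh, g, eps_u, lam=0.1):
    """z_id = z_bar + lam * (g(Delta(f_expo -> f_mesh)) + eps_u): shift the
    compact source identity by the migrated causal effect of the target's
    expression-pose parameters plus an exogenous-disturbance sample."""
    return z_compact + lam * (g(delta_mesh) + eps_u)
```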
Based on any of the above embodiments, the face replacement model further includes an adaptive generation network containing multiple Adaptive Instance Normalization (AdaIN) modules, which fuses the identity information representation z_id of the source face image X_s with the perceptual information representation r_t of the target face image X_t, thereby obtaining the final face-changing result, i.e., the face replacement image Y. Specifically, two pairs of affine parameters are computed, one pair from z_id and one pair from r_t, each with the same size as the intermediate-layer result h of the generation network. At the l-th layer, a weight feature map w, computed from the previous layer's result, is used to fuse the activations normalised with the two pairs of affine parameters. The final face replacement image Y is obtained from the last layer's result by one upsampling.
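A minimal AdaIN sketch in NumPy (per-channel statistics over the spatial dimensions) shows how affine parameters derived from z_id or r_t re-style the generator's intermediate activations:

```python
import numpy as np

def adain(h, gamma, beta, eps=1e-5):
    """Normalise each channel of h (C, H, W) to zero mean / unit variance,
    then apply the representation-derived affine parameters gamma, beta."""
    mu = h.mean(axis=(1, 2), keepdims=True)
    sigma = h.std(axis=(1, 2), keepdims=True)
    return gamma * (h - mu) / (sigma + eps) + beta
```

In the model, gamma and beta would be predicted by small learned layers from z_id and r_t rather than passed in as constants.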
Based on any of the above embodiments, for the difficult scenes in which X_s and X_t differ greatly in facial expression, head pose, image background, illumination and the like, the present invention discloses a face replacement method based on inductive-bias estimation and image information decoupling, and proposes a new face replacement model, termed the EVA model.
Fig. 3 is a schematic diagram of the computing framework of the face replacement model provided by the present invention. As shown in Fig. 3, the causal effect of f_expo on f_mesh is obtained through a controlled-variable intervention experiment under the Rubin Causal Model (RCM); the causal effect is transferred by the neural-network implicit function g(·) with learnable parameters, and together with the influence ε_u of the vMF-sampled exogenous disturbance on the identity estimate, the inductive bias Δ(f_expo → id) is determined. The regression kernels k^(i) are obtained by learning with the IKE (invariant kernel regression) algorithm and are used to obtain the perceptual information representation r_t of X_t; the regression kernels k^(i) are further used to obtain the compact identity representation z̄_s of X_s. Then, inductively biased estimation is carried out: from z̄_s and the inductive bias Δ(f_expo → id), the identity information representation z_id of X_s is obtained. Finally, the generation network, i.e., generator G(·) in the figure, fuses z_id and r_t to generate the face replacement image Y.
Based on any of the above embodiments, the face replacement model can be trained end to end in an adversarial generative manner. To supervise the end-to-end training of the model, the embodiment of the present invention designs several loss terms.

A causal loss term L_causal is obtained from the information bottleneck trade-off equation and is used to supervise the learning of the inductive-bias identity estimator based on causal inference.

A kernel regression loss term L_KR is obtained from the constraint equation of the regression kernels and is used to supervise the learning of the regression kernels, so that the regression transformation in feature space acts only on the identity-related part of the target feature representations h^(i) without changing the other background perceptual information; β is a weight parameter in this loss term.
After the face replacement image Y is generated, the face recognition network M_id(·) is used again to extract the identity representation z_Y of Y, and an identity loss term L_id is constructed to supervise the retention of identity information in Y; sg[·] in this term indicates that the gradient is stopped.

A background loss term L_bg is designed to supervise the background retention of Y; it is computed from the feature representations of Y extracted from the intermediate layers of M_id(·) in the process of computing the identity representation of Y.
An expression-pose loss term is designed to supervise the expression-pose consistency of Y:

L_expo = ‖ f_expo^t − f_expo^Y ‖

where f_expo^t denotes the expression-pose parameters of the target face image X_t computed by the face statistical network M_3d(·), and f_expo^Y denotes the expression-pose parameters of Y computed by M_3d(·).
In addition, to increase the fidelity of the generated face replacement image, the embodiment of the present invention also introduces a discrimination network for adversarial training and adds an adversarial loss term L_adv, used to improve the realism of the generated result. Finally, the overall objective function is a weighted combination of the loss terms:

L = w_1 · L_causal + w_2 · L_KR + w_3 · L_id + w_4 · L_bg + w_5 · L_expo + L_adv

where (w_1, w_2, w_3, w_4, w_5) are weight hyperparameters.
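The weighted combination can be sketched as follows; the exact assignment of the five weights, and in particular the unweighted adversarial term, is an assumption based on the hyperparameters named above:

```python
def total_loss(l_causal, l_kr, l_id, l_bg, l_expo, l_adv,
               w=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Overall objective: weighted sum of the causal, kernel-regression,
    identity, background and expression-pose terms plus the adversarial term."""
    w1, w2, w3, w4, w5 = w
    return (w1 * l_causal + w2 * l_kr + w3 * l_id
            + w4 * l_bg + w5 * l_expo + l_adv)
```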
Based on any of the above embodiments, Fig. 4 is a schematic flow chart of the face replacement model construction method provided by the present invention. As shown in Fig. 4, the model construction flow is as follows. First, the training data are processed: the source face image X_s and the target face image X_t in the training data are aligned and cropped to a fixed size. Second, the compact identity representation of X_s is computed; causal inference is carried out through a controlled-variable intervention experiment, and from the expression-pose parameters of the face in X_t, the inductive bias that the identity representation of X_s should carry in the target scene is estimated, i.e., the inductive identity-bias estimator; from the compact identity representation and the inductive identity-bias estimator, the identity information representation of X_s containing the inductive bias is determined. Then, a kernel regression transformation is applied to the features of X_t to remove the identity information unique to X_t without changing the perceptual information it contains, yielding the perceptual information representation of X_t free of specific identity information. Finally, a generation network is designed, and the identity information representation of X_s and the kernel-regressed perceptual information representation of X_t undergo adaptive feature fusion to generate the face replacement image Y; end-to-end training is performed in an adversarial generative manner, finally yielding the EVA model.

In the testing (application) stage of the EVA model, the aligned and cropped X_s and X_t can be input into the trained EVA model to obtain the identity-replaced image, i.e., the face replacement image Y. Y carries the same identity information as X_s, and the same expression, pose, hair style, clothing and other identity-irrelevant perceptual information as X_t.
The EVA model adopts an adversarial auto-encoding neural network as the main body of its learning framework, effectively learns the characteristics of the sample distribution, and generates high-quality, realistic face replacement images. In constructing the inductive identity-bias estimator in conjunction with causal inference, the known information contained in X_t is used to model the source-face identity deviation required by the identity replacement task, improving the generalization capability of the model when the expression, pose, style and other information of X_s and X_t differ greatly.
Based on any of the above embodiments, the face replacement method disclosed by the present invention performs attribution to the target scene conditions through causal inference, thereby estimating the inductive bias that the identity representation of the source face image X_s should have in the target scene; the identity representation of X_s is thus changed from the deterministic point estimate commonly used in the prior art to a new estimator containing the uncertainty of the target scene. Causal inference under controlled target-scene information is realised through the information bottleneck principle. Meanwhile, an optimization model with constraint terms, i.e., the kernel regression network, is constructed to learn the kernel regression transformation of the high-dimensional feature space; it decouples the identity information contained in the target face image X_t from the perceptual information, replaces the identity information unique to X_t with the non-unique identity information of an "average face", and preserves the background perceptual information required for face replacement.

Finally, the face replacement image is generated by fusing the identity information representation of the source face image containing the inductive bias with the identity-free perceptual information representation of the target face image, realising high-fidelity face replacement that generalizes to open scenes.
The beneficial effects of the invention are as follows: the invention provides a face identity replacement technique for arbitrary identities in open scenes. According to the method of the invention, natural and realistic identity replacement can be realised on unpaired face image data of arbitrary identities; the identity information of X_s and the perceptual information of X_t are effectively retained even when X_s and X_t differ greatly in expression, pose, style and other information, greatly improving the stability and generalization capability of face identity replacement.
Based on any of the above embodiments, in order to verify the effectiveness of the method of the present invention, the EVA model is applied to a test set, quantitative evaluation indices are computed, and the results are compared with state-of-the-art existing face replacement methods, including FaceSwap, FSGAN (Face Swapping GAN), DeepFakes, and FaceShifter. Table 1 shows the results of comparing EVA with these existing face replacement methods:
First, quantitative analysis and comparison are performed. The model is used to generate face replacement images, and the face recognition network is then used to judge the identity retrieval result (Identity retrieval), including the identity accuracy (Accuracy; the larger the index, the better), the cosine similarity between the identity of the generated image and that of the source face image Xs (the larger the index, the better), and the cosine similarity between the identity of the generated image and that of the target face image Xt (the smaller the index, the better), as shown in the second column of Table 1(a). A 3D face statistics network is used to estimate the pose and expression of the generated image and to compute their differences from those of the target face image Xt, namely the Pose Error and the Expression Error, as shown in the third and fourth columns of Table 1(a) (the smaller the index, the better). (The mathematical symbols in this passage are rendered as inline images in the original publication.)
Second, a user survey is carried out: the results generated by EVA and by the other methods are shown to users, who are asked to select the best generated result in terms of identity preservation (Identity), expression and pose consistency (Pose and Expression), and image quality fidelity (Fidelity), as shown in Table 1(b) (the larger the index, the better).
TABLE 1
(Table 1 is rendered as an image in the original publication.)
As shown in Table 1, both evaluations show that the face replacement images generated by EVA have higher fidelity.
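The identity-retrieval evaluation described above relies on cosine similarity between face embeddings: a good swap should be closer to the source identity than to the target identity. A minimal sketch of such a metric follows; the function names and the toy 2-D embeddings are illustrative, not the paper's actual evaluation code or recognition network.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identity_retrieval_accuracy(gen, src, tgt):
    """Fraction of generated embeddings that are closer (by cosine
    similarity) to their source identity than to their target identity.
    Larger is better, matching the Accuracy column of Table 1(a)."""
    hits = sum(cosine_similarity(g, s) > cosine_similarity(g, t)
               for g, s, t in zip(gen, src, tgt))
    return hits / len(gen)

# Toy embeddings: both generated faces lean toward the source direction.
gen = [np.array([1.0, 0.1]), np.array([0.9, 0.2])]
src = [np.array([1.0, 0.0])] * 2
tgt = [np.array([0.0, 1.0])] * 2
print(identity_retrieval_accuracy(gen, src, tgt))  # 1.0
```

In the actual evaluation, the embeddings would come from a pretrained face recognition network applied to the generated, source, and target images.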
The following describes the causal inference-based highly-generalized face replacement device provided by the present invention, and the causal inference-based highly-generalized face replacement device described below and the causal inference-based highly-generalized face replacement method described above may be referred to correspondingly.
Based on any of the above embodiments, fig. 5 is a schematic structural diagram of a highly generalized face replacement device based on causal inference provided by the present invention, as shown in fig. 5, the device includes:
a determining module 510, configured to determine a source face image and a target face image;
a replacing module 520, configured to input the source face image and the target face image into a face replacing model, so as to obtain a face replacing image output by the face replacing model;
the face replacement model determines the identity information representation of a source face image based on the causal effect of the expression posture parameters of a target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image;
the face replacement model is obtained by training based on the sample source face image and the sample target face image.
The device provided by the embodiment of the invention performs causal inference through the face replacement model to determine the causal effect of the expression posture parameters of the target face image on the identity information, thereby estimating how differences between the target face image and the source face image in expression, posture and other aspects affect the source identity representation. The identity information representation of the source face image is determined based on this causal effect, while the perceptual information representation of the target face image is effectively extracted; face replacement is then performed on this basis to obtain a high-quality, realistic face replacement image, improving the stability and generalization capability of the face replacement technique in different target scenes.
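The inference flow of the device — causal effect from the target, identity representation from the source, perceptual representation from the target, then decoding — can be sketched at the interface level as follows. All four callables are stand-ins exercising only the control flow; the real networks are the statistical, recognition, kernel regression, and decoder modules described in the embodiments.

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class FaceReplaceModel:
    """Hypothetical interface grouping the four sub-modules."""
    causal_effect: Callable   # target image -> causal effect on identity
    identity: Callable        # source image + effect -> identity repr
    perception: Callable      # target image -> identity-free perceptual repr
    decode: Callable          # (identity repr, perceptual repr) -> output

def face_replace(x_s, x_t, model):
    ce = model.causal_effect(x_t)       # effect of X_t's expression/pose
    id_repr = model.identity(x_s, ce)   # source identity with inductive bias
    percept = model.perception(x_t)     # target perceptual information
    return model.decode(id_repr, percept)

# Toy stand-ins so the pipeline runs end to end.
toy = FaceReplaceModel(
    causal_effect=lambda x_t: x_t.mean(),
    identity=lambda x_s, ce: x_s + ce,
    perception=lambda x_t: x_t - x_t.mean(),
    decode=lambda i, p: i.mean() + p,
)
out = face_replace(np.ones(4), np.arange(4.0), toy)
print(out.shape)  # (4,)
```

The determining module 510 would supply `x_s` and `x_t`, and the replacing module 520 would call `face_replace`.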
Based on any of the above embodiments, the face replacement model includes a first face statistics network; the first face statistical network is used for determining corresponding face dense key points based on the input face image;
the causal effect of the expression posture parameters of the target face image on the identity information is determined based on the following steps:
determining a causal effect of the expression posture parameters on the dense face key points based on the first face statistical network and a second face statistical network; the second face statistical network is obtained by inserting an information bottleneck layer after each intermediate layer of the first face statistical network, the information bottleneck layers being used to compress the information of the expression posture parameters;
and based on the migration parameters in the face replacement model, migrating the causal effect of the expression posture parameters on dense key points of the face to obtain the causal effect of the expression posture parameters on the identity information.
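The two steps above can be illustrated with a minimal numerical sketch, under stated assumptions: the bottleneck is modeled as a Gaussian channel that shrinks features toward a zero-mean prior (a common information-bottleneck instantiation, not necessarily the patent's exact layer), and the causal effect is taken as the change in the keypoint prediction when expression/pose information is compressed away, migrated by a scalar parameter `gamma` (the form of the migration parameters is assumed).

```python
import numpy as np

def bottleneck(features, rng, beta=0.5):
    """Hypothetical Gaussian information bottleneck: shrink the
    expression/pose features toward a zero-mean prior and add noise,
    limiting how much of that information passes through. Returns the
    compressed code and the KL term that penalizes retained information."""
    mu = beta * features
    sigma = np.sqrt(1.0 - beta ** 2)
    z = mu + sigma * rng.standard_normal(features.shape)
    kl = 0.5 * np.sum(mu ** 2 + sigma ** 2 - 1.0 - 2.0 * np.log(sigma))
    return z, kl

def causal_effect(keypoints_full, keypoints_bottlenecked, gamma=1.0):
    """Effect of expression/pose on dense keypoints: the difference
    between the unmodified and the bottlenecked predictions, migrated
    to identity space by a scalar parameter gamma (assumed form)."""
    return gamma * (keypoints_full - keypoints_bottlenecked)

rng = np.random.default_rng(0)
z, kl = bottleneck(np.ones(8), rng)
effect = causal_effect(np.ones(8), z)
print(effect.shape, kl >= 0.0)  # (8,) True
```

In the described method, `keypoints_full` and `keypoints_bottlenecked` would be the outputs of the first and second face statistical networks on the same target image.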
Based on any of the above embodiments, the face replacement model includes a face recognition network; the face recognition network is used for determining corresponding original identity representation based on the input face image;
the identity information representation of the source face image is determined based on the following steps:
inputting a source face image into a face recognition network, and extracting source characteristic representation of the source face image from an intermediate layer of the face recognition network;
and determining the identity information representation of the source face image based on the source feature representation and the causal effect.
Based on any of the above embodiments, determining the identity information representation of the source face image based on the source feature representation and the causal effect includes:
determining an updated source feature representation based on the source feature representation and a regression kernel in the face replacement model; the regression kernel is used for compressing information which is irrelevant to the identity in the source feature representation;
inputting the updated source feature representation into an intermediate layer of the face recognition network to obtain compact identity representation output by the face recognition network;
an identity information representation is determined based on the compact identity representation and the causal effect.
Based on any of the above embodiments, the perceptual information representation of the target face image is determined based on the following steps:
inputting a target face image into a face recognition network, and extracting target feature representation of the target face image from an intermediate layer of the face recognition network;
based on the target feature representation, a perceptual information representation of the target face image is determined.
Based on any of the above embodiments, the face replacement model further includes a kernel regression network; the kernel regression network is used for removing specific identity information contained in the input data;
determining a perceptual information representation of the target face image based on the target feature representation, comprising:
and inputting the target feature representation into a kernel regression network to obtain the perception information representation of the target face image output by the kernel regression network.
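The kernel regression network's role — removing the target's specific identity while keeping its perceptual content — can be sketched under a hypothetical linear decomposition: project out the identity subspace from the target features and substitute the average face's identity component. The basis and the mean identity below are illustrative stand-ins, not learned quantities from the patent.

```python
import numpy as np

def deidentify(target_feat, identity_basis, mean_identity):
    """Remove the target's unique identity component and substitute the
    average face's identity, keeping the residual perceptual content
    (hypothetical linear decomposition of the feature space)."""
    identity_part = identity_basis @ (identity_basis.T @ target_feat)
    return target_feat - identity_part + mean_identity

rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.standard_normal((256, 32)))  # identity subspace
feat = rng.standard_normal(256)                          # target features
mean_id = basis @ (basis.T @ rng.standard_normal(256))   # average-face identity
out = deidentify(feat, basis, mean_id)
print(out.shape)  # (256,)
```

In the described device, this operation would be performed by the trained kernel regression network rather than a fixed linear projection.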
Based on any of the above embodiments, the kernel regression network is trained based on the identity information representation of the sample source face image, and the target feature representation and the original identity representation of the sample target face image determined by the face recognition network.
Fig. 6 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 6: a processor (processor)610, a communication Interface (Communications Interface)620, a memory (memory)630 and a communication bus 640, wherein the processor 610, the communication Interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a causal inference based highly generalized face replacement method comprising: determining a source face image and a target face image; inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model; the face replacement model determines the identity information representation of a source face image based on the causal effect of the expression posture parameters of a target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image; the face replacement model is obtained by training based on the sample source face image and the sample target face image.
In addition, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program codes, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention further provides a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions, when the program instructions are executed by a computer, the computer being capable of executing the method for highly generalized face replacement based on causal inference provided by the above methods, the method including: determining a source face image and a target face image; inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model; the face replacement model determines the identity information representation of a source face image based on the causal effect of the expression posture parameters of a target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image; the face replacement model is obtained by training based on the sample source face image and the sample target face image.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program, which when executed by a processor, implements a method for performing causal inference based highly generalized face replacement provided by the above methods, the method comprising: determining a source face image and a target face image; inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model; the face replacement model determines the identity information representation of a source face image based on the causal effect of the expression posture parameters of a target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image; the face replacement model is obtained by training based on the sample source face image and the sample target face image.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A high-generalization face replacement method based on causal inference is characterized by comprising the following steps:
determining a source face image and a target face image;
inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model;
the face replacement model determines the identity information representation of the source face image based on the causal effect of the expression posture parameters of the target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image;
the face replacement model is obtained by training based on the sample source face image and the sample target face image.
2. The causal inference based highly generalized face replacement method of claim 1, wherein said face replacement model comprises a first face statistical network; the first face statistical network is used for determining corresponding dense key points of the face based on the input face image;
the causal effect of the expression posture parameters of the target face image on the identity information is determined based on the following steps:
determining a causal effect of the expression posture parameters on dense key points of the human face based on the first face statistical network and the second face statistical network; the second face statistical network is obtained by sequentially inserting all information bottleneck layers into all intermediate layers of the first face statistical network, and the information bottleneck layers are used for carrying out information compression on the expression posture parameters;
and migrating the causal effect of the expression posture parameters on dense key points of the human face based on the migration parameters in the human face replacement model to obtain the causal effect of the expression posture parameters on the identity information.
3. The causal inference based highly generalized face replacement method of claim 1, wherein said face replacement model comprises a face recognition network; the face recognition network is used for determining corresponding original identity representation based on the input face image;
the identity information representation of the source face image is determined based on the following steps:
inputting the source face image into the face recognition network, and extracting source feature representation of the source face image from an intermediate layer of the face recognition network;
and determining the identity information representation of the source face image based on the source feature representation and the causal effect.
4. The causal inference based highly generalized face replacement method of claim 3, wherein said determining an identity information representation of said source face image based on said source feature representation and said causal effect comprises:
determining an updated source feature representation based on the source feature representation and a regression kernel in the face replacement model; the regression kernel is used for compressing information which is irrelevant to identity in the source feature representation;
inputting the updated source feature representation into an intermediate layer of the face recognition network to obtain a compact identity representation output by the face recognition network;
determining the identity information representation based on the compact identity representation and the causal effect.
5. The causal inference based highly generalized face replacement method of claim 3, wherein said perceptual information representation of the target face image is determined based on the following steps:
inputting the target face image into the face recognition network, and extracting target feature representation of the target face image from an intermediate layer of the face recognition network;
determining a perceptual information representation of the target face image based on the target feature representation.
6. The causal inference based highly generalized face replacement method of claim 5, wherein said face replacement model further comprises a kernel regression network; the kernel regression network is used for removing specific identity information contained in the input data;
the determining a perceptual information representation of the target face image based on the target feature representation comprises:
and inputting the target feature representation into the kernel regression network to obtain the perception information representation of the target face image output by the kernel regression network.
7. The causal inference based highly generalized face replacement method of claim 6, wherein the kernel regression network is trained based on the identity information representation of the sample source face image, and the target feature representation and original identity representation of the sample target face image determined by the face recognition network.
8. A causally-inferred highly generalized face replacement device, comprising:
the determining module is used for determining a source face image and a target face image;
the replacing module is used for inputting the source face image and the target face image into a face replacing model to obtain a face replacing image output by the face replacing model;
the face replacement model determines the identity information representation of the source face image based on the causal effect of the expression posture parameters of the target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image;
the face replacement model is obtained by training based on the sample source face image and the sample target face image.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of the causal inference based highly generalized face replacement method according to any one of claims 1 to 7.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the causal inference based highly generalized face replacement method according to any one of claims 1 to 7.
CN202111185354.4A 2021-10-12 2021-10-12 High-generalization face replacement method and device based on causal inference and electronic equipment Active CN113627404B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111185354.4A CN113627404B (en) 2021-10-12 2021-10-12 High-generalization face replacement method and device based on causal inference and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111185354.4A CN113627404B (en) 2021-10-12 2021-10-12 High-generalization face replacement method and device based on causal inference and electronic equipment

Publications (2)

Publication Number Publication Date
CN113627404A true CN113627404A (en) 2021-11-09
CN113627404B CN113627404B (en) 2022-01-14

Family

ID=78391324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111185354.4A Active CN113627404B (en) 2021-10-12 2021-10-12 High-generalization face replacement method and device based on causal inference and electronic equipment

Country Status (1)

Country Link
CN (1) CN113627404B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114220051A (en) * 2021-12-10 2022-03-22 马上消费金融股份有限公司 Video processing method, application program testing method and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027465A (en) * 2019-12-09 2020-04-17 韶鼎人工智能科技有限公司 Video face replacement method based on illumination migration
CN111275779A (en) * 2020-01-08 2020-06-12 网易(杭州)网络有限公司 Expression migration method, training method and device of image generator and electronic equipment
CN111783603A (en) * 2020-06-24 2020-10-16 有半岛(北京)信息科技有限公司 Training method for generating confrontation network, image face changing method and video face changing method and device
CN112766160A (en) * 2021-01-20 2021-05-07 西安电子科技大学 Face replacement method based on multi-stage attribute encoder and attention mechanism
WO2021180114A1 (en) * 2020-03-11 2021-09-16 广州虎牙科技有限公司 Facial reconstruction method and apparatus, computer device, and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHENG XIN et al.: "A Survey of Deep Facial Attribute Analysis", INTERNATIONAL JOURNAL OF COMPUTER VISION *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114220051A (en) * 2021-12-10 2022-03-22 马上消费金融股份有限公司 Video processing method, application program testing method and electronic equipment
CN114220051B (en) * 2021-12-10 2023-07-28 马上消费金融股份有限公司 Video processing method, application program testing method and electronic equipment

Also Published As

Publication number Publication date
CN113627404B (en) 2022-01-14

Similar Documents

Publication Publication Date Title
Zeng et al. Srnet: Improving generalization in 3d human pose estimation with a split-and-recombine approach
CN110929622B (en) Video classification method, model training method, device, equipment and storage medium
CN108780519B (en) Structural learning of convolutional neural networks
CN111754396B (en) Face image processing method, device, computer equipment and storage medium
Triastcyn et al. Generating artificial data for private deep learning
CN109934197A (en) Training method, device and the computer readable storage medium of human face recognition model
CN109711283A (en) A kind of joint doubledictionary and error matrix block Expression Recognition algorithm
CN113822953A (en) Processing method of image generator, image generation method and device
CN114724218A (en) Video detection method, device, equipment and medium
Pérez-Cabo et al. Learning to learn face-pad: a lifelong learning approach
CN110503113B (en) Image saliency target detection method based on low-rank matrix recovery
He et al. Finger vein image deblurring using neighbors-based binary-GAN (NB-GAN)
CN113627404B (en) High-generalization face replacement method and device based on causal inference and electronic equipment
CN111860056B (en) Blink-based living body detection method, blink-based living body detection device, readable storage medium and blink-based living body detection equipment
CN111259264A (en) Time sequence scoring prediction method based on generation countermeasure network
CN110765843A (en) Face verification method and device, computer equipment and storage medium
CN113657272A (en) Micro-video classification method and system based on missing data completion
Raji et al. Photo-guided exploration of volume data features
CN113011307A (en) Face recognition identity authentication method based on deep residual error network
CN111737688A (en) Attack defense system based on user portrait
CN110163049B (en) Face attribute prediction method, device and storage medium
CN116543437A (en) Occlusion face recognition method based on occlusion-feature mapping relation
Sang et al. Image recognition based on multiscale pooling deep convolution neural networks
CN116311434A (en) Face counterfeiting detection method and device, electronic equipment and storage medium
CN113449193A (en) Information recommendation method and device based on multi-classification images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant