CN113627404A - High-generalization face replacement method and device based on causal inference and electronic equipment - Google Patents

High-generalization face replacement method and device based on causal inference and electronic equipment

Info

Publication number
CN113627404A
CN113627404A (application CN202111185354.4A); granted as CN113627404B
Authority
CN
China
Prior art keywords
face
face image
representation
target
source
Prior art date
Legal status
Granted
Application number
CN202111185354.4A
Other languages
Chinese (zh)
Other versions
CN113627404B (en)
Inventor
赫然
黄怀波
高格格
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN202111185354.4A
Publication of CN113627404A
Application granted
Publication of CN113627404B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods


Abstract

The invention provides a high-generalization face replacement method and device based on causal inference, and electronic equipment. The method comprises: determining a source face image and a target face image; and inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model. The face replacement model determines the identity information representation of the source face image based on the causal effect of the expression posture parameters of the target face image on the identity information, and performs face replacement based on that identity information representation and the perception information representation of the target face image. The face replacement model is trained on sample source face images and sample target face images. The method, device, electronic equipment, and storage medium provided by the invention yield high-quality, realistic face replacement images, thereby improving the stability and generalization capability of the face replacement technology across different target scenes.

Description

High-generalization face replacement method and device based on causal inference and electronic equipment
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a high-generalization face replacement method and device based on causal inference and electronic equipment.
Background
Face image identity replacement is a frontier research problem in computer vision and image generation. It has extremely important application value in fields such as virtual reality, film special effects, and game production, and has attracted wide attention in both academia and industry. Face image identity replacement, i.e., "face changing", replaces the identity information of a given target face image with the identity of a source face image while keeping the other contents of the image unchanged.
At present, the main difficulty of face changing technology lies in improving its generalization capability. In difficult scenes where the source face image and the target face image differ greatly, that is, when the pose (face orientation angle), expression, and so on of the face in the target face image differ greatly from those in the source face image, the face image generated by the model can hardly show the state the source face should present under the target expression and pose, and the face changing result is usually distorted.
Disclosure of Invention
The invention provides a high-generalization face replacement method and device based on causal inference, and electronic equipment, which are used to overcome the defect of the low generalization capability of face replacement technology in the prior art and to improve the generalization capability of face replacement technology.
The invention provides a causal inference-based high-generalization face replacement method, which comprises the following steps:
determining a source face image and a target face image;
inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model;
the face replacement model determines the identity information representation of the source face image based on the causal effect of the expression posture parameters of the target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image;
the face replacement model is obtained by training based on the sample source face image and the sample target face image.
According to the high-generalization face replacement method based on causal inference, which is provided by the invention, the face replacement model comprises a first face statistical network; the first face statistical network is used for determining corresponding dense key points of the face based on the input face image;
the causal effect of the expression posture parameters of the target face image on the identity information is determined based on the following steps:
determining a causal effect of the expression posture parameters on the dense face key points based on the first face statistical network and a second face statistical network; the second face statistical network is obtained by inserting an information bottleneck layer into each intermediate layer of the first face statistical network in sequence, and the information bottleneck layers are used to compress the expression posture parameter information;
and migrating the causal effect of the expression posture parameters on dense key points of the human face based on the migration parameters in the human face replacement model to obtain the causal effect of the expression posture parameters on the identity information.
According to the high-generalization face replacement method based on causal inference, which is provided by the invention, the face replacement model comprises a face recognition network; the face recognition network is used for determining corresponding original identity representation based on the input face image;
the identity information representation of the source face image is determined based on the following steps:
inputting the source face image into the face recognition network, and extracting source feature representation of the source face image from an intermediate layer of the face recognition network;
and determining the identity information representation of the source face image based on the source feature representation and the causal effect.
According to the high-generalization face replacement method based on causal inference provided by the invention, the determining the identity information representation of the source face image based on the source feature representation and the causal effect comprises the following steps:
determining an updated source feature representation based on the source feature representation and a regression kernel in the face replacement model; the regression kernel is used for compressing information which is irrelevant to identity in the source feature representation;
inputting the updated source feature representation into an intermediate layer of the face recognition network to obtain a compact identity representation output by the face recognition network;
determining the identity information representation based on the compact identity representation and the causal effect.
According to the high-generalization face replacement method based on causal inference provided by the invention, the perception information representation of the target face image is determined based on the following steps:
inputting the target face image into the face recognition network, and extracting target feature representation of the target face image from an intermediate layer of the face recognition network;
determining a perceptual information representation of the target face image based on the target feature representation.
According to the high-generalization face replacement method based on causal inference, provided by the invention, the face replacement model further comprises a kernel regression network; the kernel regression network is used for removing specific identity information contained in the input data;
the determining a perceptual information representation of the target face image based on the target feature representation comprises:
and inputting the target feature representation into the kernel regression network to obtain the perception information representation of the target face image output by the kernel regression network.
According to the high-generalization face replacement method based on causal inference provided by the invention, the kernel regression network is trained based on the identity information representation of the sample source face image, together with the target feature representation and the original identity representation of the sample target face image as determined by the face recognition network.
The invention also provides a high-generalization face replacement device based on causal inference, which comprises:
the determining module is used for determining a source face image and a target face image;
the replacing module is used for inputting the source face image and the target face image into a face replacing model to obtain a face replacing image output by the face replacing model;
the face replacement model determines the identity information representation of the source face image based on the causal effect of the expression posture parameters of the target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image;
the face replacement model is obtained by training based on the sample source face image and the sample target face image.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of the high-generalization human face replacement method based on causal inference.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the causal inference based highly generalized face replacement method as described in any of the above.
The high-generalization face replacement method, device, and electronic equipment based on causal inference provided by the invention perform causal inference through the face replacement model to determine the causal effect of the expression posture parameters of the target face image on the identity information, thereby estimating the influence that differences between the target face image and the source face image in expression, pose, and so on exert on the source identity representation. The identity information representation of the source face image is determined based on this causal effect, while the perception information representation of the target face image is effectively extracted. Face replacement performed on this basis yields a high-quality, realistic face replacement image, thereby improving the stability and generalization capability of the face replacement technology across different target scenes.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a highly generalized face replacement method based on causal inference provided by the present invention;
FIG. 2 is a schematic flow diagram of a causal determination method provided by the present invention;
FIG. 3 is a schematic diagram of a computing framework of a face replacement model provided by the present invention;
FIG. 4 is a schematic flow chart of a face replacement model construction method provided by the invention;
FIG. 5 is a schematic structural diagram of a highly generalized face replacement device based on causal inference provided by the present invention;
fig. 6 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
The embodiment of the invention provides a high-generalization face replacement method based on causal inference. Fig. 1 is a schematic flow chart of a causal inference-based highly generalized face replacement method provided by the present invention, and as shown in fig. 1, the method includes:
step 110, determining a source face image and a target face image.
Specifically, the source face image is the face image whose identity information needs to be retained in the face replacement process; correspondingly, the image whose identity information is to be replaced and whose perception information is to be retained is the target face image. Here, the perception information may include the hair, clothing, background, lighting conditions, and the like of the target face image. The source face image and the target face image may be captured by a web crawler or other means, or acquired by image acquisition devices such as scanners, mobile phones, and cameras; this is not specifically limited in the embodiments of the present invention.
Step 120, inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model;
the face replacement model determines the identity information representation of a source face image based on the causal effect of the expression posture parameters of a target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image;
the face replacement model is obtained by training based on the sample source face image and the sample target face image.
Here, the expression posture parameters may include expression parameters and pose parameters of the target face image: the expression parameters represent the expression information of the face in the corresponding image, and the pose parameters represent the orientation-angle information of the face. The causal effect of the expression posture parameters of the target face image on the identity information follows the concept of interventional causal observation: it is the difference between the identity estimation results obtained from the target face image with and without the expression posture parameters.
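The interventional definition above can be sketched numerically: the causal effect is the difference between the identity estimates obtained with and without the expression-posture-related information. The following is a minimal illustrative sketch; the estimator, the suppression step, and all names are hypothetical stand-ins, not components specified by the patent:

```python
import numpy as np

def causal_effect(identity_estimator, target_image, remove_expo_info):
    """Interventional estimate: compare the identity estimate of the
    original target image (control group, expression-pose info present)
    with that of a version from which expression-pose-related information
    has been removed (experimental group). Their difference is the
    causal effect on the identity estimate."""
    z_control = identity_estimator(target_image)
    z_treated = identity_estimator(remove_expo_info(target_image))
    return z_treated - z_control

# Toy check with a mean-pooling "estimator" and a crude suppressor.
estimate = lambda img: img.mean(axis=(0, 1))
img = np.ones((4, 4, 3))
effect = causal_effect(estimate, img, lambda x: x * 0.5)
```

As the description explains next, the experimental-group estimate is not directly observable for real faces, which is why the patent routes the computation through a mediating variable.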
Specifically, in the prior art, in difficult scenes where the source face image and the target face image differ greatly, that is, when the pose, expression, and so on of the face in the target face image differ greatly from those in the source face image, the face changing result is usually distorted. To solve this problem, in the embodiment of the invention, after the source face image and the target face image are input into the face replacement model, the face replacement model determines the causal effect of the expression posture parameters of the target face image on the identity information and uses this causal effect to estimate the inductive bias of the source face image's identity representation in the target scene, so as to determine the identity information representation of the source face image; it also extracts a perception information representation unrelated to identity information from the target face image.
The causal effect of the expression posture parameters of the target face image on the identity information can specifically be obtained through causal inference conditioned on the target scene of the target face image; the embodiment of the invention does not specifically limit the manner of causal inference. The identity information representation of the source face image may be determined from the original identity representation obtained by performing face recognition on the source face image together with the causal effect, or from a feature representation extracted from the source face image together with the causal effect. The perception information representation of the target face image may be obtained by directly recognizing the perception information of the target face image, or determined from a feature representation extracted from the target face image.
In addition, before step 120 is executed, the face replacement model needs to be trained in advance, which can be done as follows: first, collect a large number of sample source face images and sample target face images; then, train an initial model on these samples to obtain the face replacement model. The embodiments of the present invention do not specifically limit the network type or structure of the initial model.
The method provided by the embodiment of the invention performs causal inference through the face replacement model to determine the causal effect of the expression posture parameters of the target face image on the identity information, thereby estimating the influence of the differences between the target face image and the source face image in expression, pose, and so on, on the source identity representation. The identity information representation of the source face image is determined based on this causal effect, and the perception information representation of the target face image is effectively extracted at the same time. Face replacement performed on this basis yields a high-quality, realistic face replacement image, improving the stability and generalization capability of the face replacement technology across different target scenes.
Based on any of the above embodiments, the face replacement model includes a first face statistics network; the first face statistical network is used for determining corresponding face dense key points based on the input face image;
the causal effect of the expression posture parameters of the target face image on the identity information is determined based on the following steps:
determining the causal effect of the expression posture parameters on the dense face key points based on the first face statistical network and a second face statistical network; the second face statistical network is obtained by inserting an information bottleneck layer into each intermediate layer of the first face statistical network in sequence, and the information bottleneck layers are used to compress the expression posture parameter information;
and based on the migration parameters in the face replacement model, migrating the causal effect of the expression posture parameters on dense key points of the face to obtain the causal effect of the expression posture parameters on the identity information.
Specifically, the causal effect of the expression posture parameter f_expo of the target face image on the identity information can be obtained through a controlled-variable intervention experiment on f_expo: the difference between the identity estimation results of the experimental group and the control group in this experiment is the causal effect. The identity estimate of the control group can be obtained by directly recognizing the target face image with the face recognition network, while the identity estimate z_id of the experimental group is the identity estimation result obtained free of any influence from f_expo-related information.
However, no real face image exists that carries no f_expo information at all, and the training data usually do not contain paired images of the same person under different expressions and poses to learn from; in addition, f_expo and the identity information are not estimated by the same network. Both of these issues mean that the experimental-group result cannot be obtained in the controlled-variable intervention experiment. Directly inferring the causal effect of f_expo on the identity information is therefore impossible, i.e., there is a Fundamental Problem of Causal Inference (FPCI).
To address this problem, the embodiment of the present invention first introduces the non-rigid face shape (expressed by the dense face key points f_mesh) as a mediating variable. Since f_expo and f_mesh can both be estimated from the input face image by the same face statistical network, the causal effect of f_expo on f_mesh, i.e., the difference between the f_mesh estimation results obtained before and after eliminating f_expo-related information, can be computed through the face statistical network. Causal-effect migration is then performed according to the migration parameters in the face replacement model, transferring the causal effect of f_expo on f_mesh onto the identity information and finally obtaining the causal effect of f_expo on the identity information.
Here, the causal effect of f_expo on f_mesh can be obtained by estimating f_mesh for the target face image with the original face statistical network included in the face replacement model, i.e., the first face statistical network, and with the second face statistical network, which is obtained by inserting an information bottleneck layer into each intermediate layer of the first face statistical network in sequence.
It will be appreciated that each information bottleneck layer compresses the f_expo-related information in the features it processes, limiting the flow of f_expo-related information into the computation graph, so that causal inference with the target scene information controlled is realized through the information bottleneck principle. The f_mesh estimation result obtained by the first face statistical network is the control-group result of the controlled-variable intervention experiment, the f_mesh estimation result obtained by the second face statistical network is the experimental-group result, and the difference between the two is the causal effect of f_expo on f_mesh.
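The control/experimental comparison between the two face statistical networks, followed by the migration step, can be sketched as follows. Here `first_net`, `second_net`, and the linear `migration_matrix` are hypothetical stand-ins: the patent does not specify the form of the migration parameters, so the linear map is an illustrative assumption.

```python
import numpy as np

def keypoint_causal_effect(first_net, second_net, target_image):
    """Control group: dense key points estimated by the original (first)
    face statistical network. Experimental group: key points estimated by
    the second network, whose inserted information bottleneck layers
    compress expression-pose-related information. The difference is the
    causal effect of the expression posture parameters on the key points."""
    mesh_control = first_net(target_image)
    mesh_treated = second_net(target_image)
    return mesh_treated - mesh_control

def migrate_to_identity(mesh_effect, migration_matrix):
    """Transfer the key-point-level causal effect into identity space via
    migration parameters (modeled here, illustratively, as a matrix)."""
    return migration_matrix @ mesh_effect

# Toy check with stand-in networks returning fixed key-point vectors.
first = lambda img: np.array([1.0, 2.0])
second = lambda img: np.array([0.5, 1.5])
mesh_eff = keypoint_causal_effect(first, second, None)
id_eff = migrate_to_identity(mesh_eff, 2.0 * np.eye(2))
```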
Based on any of the above embodiments, the face replacement model includes a face recognition network; the face recognition network is used for determining corresponding original identity representation based on the input face image;
the identity information representation of the source face image is determined based on the following steps:
inputting a source face image into a face recognition network, and extracting source characteristic representation of the source face image from an intermediate layer of the face recognition network;
and determining the identity information representation of the source face image based on the source feature representation and the causal effect.
Specifically, the face replacement model includes a face recognition network, and the face recognition network may perform face recognition based on an input face image to obtain an original identity representation corresponding to the face image. On the basis, the identity information representation of the source face image can be obtained by the following method: firstly, inputting a source face image into the face recognition network, and extracting source characteristic representation of the source face image from an intermediate layer of the face recognition network in the process of calculating the original identity representation of the source face image by the face recognition network; and then, determining the identity information representation of the source face image according to the source feature representation and the causal effect of the expression posture parameters of the target face image on the identity information.
Here, the identity information representation of the source face image may be determined in several ways: the source feature representation may be input into another network, and the identity information representation determined from that network's recognition result and the causal effect; or the source feature representation may be feature-transformed, the transformed representation fed back into the face recognition network, and the identity information representation determined from the recognition result and the causal effect. The embodiments of the present invention do not specifically limit this.
Based on any of the above embodiments, determining the identity information representation of the source face image based on the source feature representation and the causal effect includes:
determining an updated source feature representation based on the source feature representation and a regression kernel in the face replacement model; the regression kernel is used for compressing information which is irrelevant to the identity in the source feature representation;
inputting the updated source feature representation into an intermediate layer of the face recognition network to obtain compact identity representation output by the face recognition network;
an identity information representation is determined based on the compact identity representation and the causal effect.
Specifically, in order to obtain a more compact identity information representation of the source face image, further improve the accuracy of face replacement, and reduce its computation, the embodiment of the present invention uses the regression kernel in the face replacement model to compress the identity-irrelevant information in the extracted source feature representation. The resulting updated source feature representation is fed back into the intermediate layer of the face recognition network, which performs identity recognition on it, yielding a compact identity representation of the source face image.
Then, the identity information representation of the source face image best suited to the identity replacement task can be determined from the compact identity representation of the source face image and the causal effect of the expression posture parameters of the target face image on the identity information, i.e., the inductive bias of the source identity representation in the target scene.
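The regression-kernel pipeline described above admits a compact sketch. The names `regression_kernel` and `recognition_tail` (the remaining layers of the face recognition network after the intermediate layer), as well as the additive use of the causal effect, are illustrative assumptions rather than the patent's exact formulation:

```python
import numpy as np

def identity_representation(source_features, regression_kernel,
                            recognition_tail, causal_effect):
    """Sketch of the three steps described above:
    1) the regression kernel compresses identity-irrelevant information
       in the source feature representation;
    2) the updated features re-enter the face recognition network's
       remaining intermediate layers, producing a compact identity
       representation;
    3) the causal effect of the target's expression posture parameters
       corrects that representation (additive correction assumed)."""
    updated = regression_kernel(source_features)
    compact_identity = recognition_tail(updated)
    return compact_identity + causal_effect

# Toy check with linear stand-ins for the kernel and network tail.
feats = np.array([2.0, 4.0])
rep = identity_representation(feats, lambda f: 0.5 * f,
                              lambda f: f + 1.0, np.array([0.1, -0.1]))
```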
Based on any of the above embodiments, the perceptual information representation of the target face image is determined based on the following steps:
inputting a target face image into a face recognition network, and extracting target feature representation of the target face image from an intermediate layer of the face recognition network;
based on the target feature representation, a perceptual information representation of the target face image is determined.
Specifically, the perceptual information representation of the target face image may be obtained by: firstly, inputting a target face image into a face recognition network included in a face replacement model, and extracting target feature representation of the target face image from an intermediate layer of the face recognition network in the process of calculating the original identity representation of the target face image by the face recognition network; then, according to the target feature representation, the perception information representation of the target face image is determined.
Here, the perceptual information representation of the target face image may be determined either by performing perceptual information recognition directly on the target feature representation, or by applying a kernel regression transformation to the target feature representation to remove the specific identity information it contains; the embodiment of the present invention does not specifically limit this.
Based on any of the above embodiments, the face replacement model further includes a kernel regression network; the kernel regression network is used for removing specific identity information contained in the input data;
determining a perceptual information representation of the target face image based on the target feature representation, comprising:
and inputting the target feature representation into a kernel regression network to obtain the perception information representation of the target face image output by the kernel regression network.
Specifically, after the target feature representation of the target face image is extracted from the intermediate layer of the face recognition network, the target feature representation may be input into the kernel regression network included in the face replacement model. The kernel regression network performs a kernel regression transformation on the target feature representation, decoupling the identity information contained in it from the perceptual information and ensuring that only the decoupled identity information undergoes the regression transformation: the identity information contained in the target face image is removed, while the perceptual information it contains is left unchanged. The kernel regression network thus finally yields the perceptual information representation of the target face image.
Based on any of the above embodiments, the kernel regression network is trained based on the identity information representation of the sample source face image, and the target feature representation and the original identity representation of the sample target face image determined by the face recognition network.
Specifically, in order to remove specific identity information contained in the input data, the kernel regression network may be trained as follows: inputting the sample source face image into a face replacement model to obtain identity information representation of the sample source face image; inputting the sample target face image into a face recognition network to obtain an original identity representation of the sample target face image output by the face recognition network, and extracting a target feature representation of the sample target face image from an intermediate layer of the face recognition network; and then, training the initial kernel regression network according to the identity information representation of the sample source face image, the target feature representation and the original identity representation of the sample target face image, thereby obtaining the kernel regression network.
Based on any of the above embodiments, after a large number of sample source face images are collected, each sample source face image may be aligned by its facial key points and cropped to the fixed size required by the face replacement model, for example 512 × 512; the cropped sample source face images are then used to train the face replacement model.

Further, the facial key points may be obtained by estimating the facial key points of the sample source face image with a face statistical network, which may be a network based on a 3D Morphable Model (3DMM) of the face, denoted M_3d(·). After the facial key points of the sample source face image are estimated, a similarity transformation matrix H between this set of key points and a set of reference key points can be computed; the sample source face image is affine-transformed by this matrix and cropped to the fixed size required by the face replacement model, and the processed sample source face image can then be used for subsequent training of the face replacement model.
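The similarity transformation between the detected key points and the reference key points can be estimated by least squares. The following NumPy sketch uses the classical Umeyama method; this specific estimator is an assumption, since the patent does not fix how H is computed:

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares similarity transform (scale * rotation + translation)
    mapping key points `src` (N, 2) onto reference key points `dst` (N, 2).
    Returns the 2x3 matrix H used to warp and crop the face image."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    sc, dc = src - mu_s, dst - mu_d
    cov = dc.T @ sc / len(src)                       # cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    D = np.diag([1.0, d])                            # guard against reflection
    R = U @ D @ Vt
    var_s = (sc ** 2).sum() / len(src)
    scale = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - scale * (R @ mu_s)
    return np.hstack([scale * R, t[:, None]])
```

Applying H to the detected key points maps them onto the reference layout, after which the image itself is warped with the same matrix.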
Based on any of the above embodiments, in order to obtain the causal effect of the expression-pose parameters f_expo on the dense facial key points f_mesh, the embodiment of the present invention designs a Hierarchical Information Bottleneck (HIB) module composed of a plurality of information bottleneck layers φ^(i), each of which is an independent three-layer convolutional neural network. By successively inserting these information bottleneck layers into the intermediate layers of the first face statistical network, the compression of the information related to f_expo can be realised; the resulting network is the second face statistical network.

Here, the first face statistical network can determine the corresponding f_mesh and f_expo from an input face image, and may be implemented by the 3DMM-based network M_3d(·). The 3DMM parameters, i.e., the weighting parameters of a Principal Component Analysis (PCA), comprise three items: the face shape parameter f_shp, the head pose parameter, and the facial expression parameter, the latter two being collectively denoted f_expo. In addition, f_mesh can be determined from these three parameters.
Fig. 2 is a schematic flow chart of the causal effect determination method provided by the present invention; as shown in Fig. 2, the specific flow is as follows:
In the process of estimating f_mesh of the target face image X_t with the M_3d(·) network, the intermediate features of the network are extracted layer by layer and denoted f^(i). These intermediate features serve as the inputs of the information bottleneck layers φ^(i): from each intermediate feature f^(i), the corresponding layer predicts a channel-by-channel information mask m^(i), namely:

m^(i) = φ^(i)(f^(i))

where each element of m^(i) takes a value between 0 and 1, and m^(i) has the same spatial dimensions as f^(i).

The information mask m^(i) is then applied to the intermediate feature f^(i): guided by m^(i), random noise with the same distribution as f^(i) is injected into f^(i) to achieve information compression:

f̃^(i) = m^(i) ⊙ f^(i) + (1 − m^(i)) ⊙ ε

where ε is random Gaussian noise sampled from a Gaussian distribution with the same mean and variance as f^(i).
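The mask-guided noise injection can be sketched as follows in NumPy; the channel-wise granularity of the noise statistics is an assumption, since the patent only states that the noise shares the feature's distribution:

```python
import numpy as np

def inject_noise(feat, mask, rng=None):
    """f_tilde = m * f + (1 - m) * eps, with eps drawn from a Gaussian whose
    mean and std match the feature map channel-wise, so masked-out positions
    are replaced by same-distribution noise rather than zeros."""
    rng = rng or np.random.default_rng(0)
    mu = feat.mean(axis=(1, 2), keepdims=True)     # per-channel mean
    sigma = feat.std(axis=(1, 2), keepdims=True)   # per-channel std
    eps = rng.normal(size=feat.shape) * sigma + mu
    return mask * feat + (1.0 - mask) * eps
```

With a mask of all ones the feature passes through unchanged; with a mask of all zeros it is fully replaced by same-statistics noise.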
To guide each information bottleneck layer φ^(i) to predict its information mask m^(i) correctly, the embodiment of the present invention designs an information bottleneck trade-off equation, so that the value of each element of m^(i) corresponds to the importance of the information in f^(i) for expressing f_expo: the more relevant to f_expo the information carried by a neuron of the intermediate feature f^(i), the closer the element at the corresponding position of m^(i) is to 1; conversely, the closer it is to 0. The trade-off equation is designed as follows:
L_HIB = (1/N) · Σ_i I(f̃^(i); f^(i)) + α · D(f̃_expo, f_expo)

wherein α (> 0) is a weight parameter, the sum runs over the N information bottleneck layers, and the two terms play against each other:

(1) the first term is the average of the mutual information between each f̃^(i) and f^(i); the mutual information, denoted I(·;·), measures the degree of information compression of the noise-injected features f̃^(i);

(2) the second term, a prediction error D(f̃_expo, f_expo), measures the predictive power of the noise-injected features f̃^(i) for f_expo, so as to maximally retain the information related to f_expo; here f̃_expo denotes the expression-pose parameters computed after replacing the original intermediate features f^(i) of the network by f̃^(i).
In the plurality of information masks m^(i) learned in this way, the value of an element represents the degree of correlation between the corresponding neuron of the intermediate feature f^(i) and the representation of the f_expo information. The information bottleneck layer can therefore, through f̃^(i), replace the information irrelevant to f_expo with the noise ε, thereby achieving information compression. In addition, to avoid introducing systematic errors, ε is noise with the same distribution as f^(i). It should be noted that, to sufficiently compress the f_expo-irrelevant information, the information bottleneck layers φ^(i) are successively inserted into the 3D network M_3d(·), i.e., the compression effect of each bottleneck layer builds on that of the previous one.
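The successive insertion can be illustrated as repeated mask-guided noise injection, each stage operating on the output of the previous one. This is a simplified same-shape sketch; in the real network each stage sits at a different depth of M_3d(·):

```python
import numpy as np

def hierarchical_compress(feat, masks, rng=None):
    """Apply a sequence of information masks; stage i injects noise into the
    already-compressed output of stage i-1, so compression accumulates."""
    rng = rng or np.random.default_rng(0)
    out = feat
    for m in masks:
        mu, sigma = out.mean(), out.std()
        eps = rng.normal(mu, sigma, size=out.shape)
        out = m * out + (1.0 - m) * eps
    return out
```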
The original intermediate features f^(i) of the network are replaced by the noise-compressed features f̃^(i), 3D f_mesh estimation is carried out, and the parameter vector before the final classification fully-connected layer is taken as f̃_vec; using the original intermediate features before replacement, the corresponding parameter vector is taken as f_vec. Since the classification fully-connected layer only performs a classification function, according to the definition of causal effects in the interventionist causal view, the causal effect of f_expo on f_mesh can be determined from the variation of this parameter vector before and after the intermediate-feature replacement, denoted Δ(f_expo → f_mesh):

Δ(f_expo → f_mesh) = f_vec − f̃_vec
Then, causal effect migration is carried out according to the migration parameters in the face replacement model, transferring the causal effect of f_expo on f_mesh to the identity information and finally obtaining the causal effect of f_expo on the identity information. Here, the migration parameters can be realised by a neural-network implicit function g(·) with learnable parameters, in which case the causal effect of f_expo on the identity information can be expressed as:

Δ(f_expo → id) = g(Δ(f_expo → f_mesh)) + ε_u

where ε_u represents the influence of exogenous disturbances, such as the background and illumination in the face image, on the identity estimate.
Considering that exogenous disturbances are complex and variable and cannot be captured by a unified model, the embodiment of the present invention simulates the exogenous disturbance in the face image by random sampling from a von Mises-Fisher (vMF) distribution with mean 0 and known concentration κ. The concentration of the vMF distribution can be obtained by first recognising 5000 face images in advance to obtain their identity-estimate encoding vectors, and then inferring the concentration from the standard deviation of these vectors.
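The concentration κ can be recovered from the spread of the unit-normalised identity codes. A common closed-form approximation (due to Banerjee et al.) is sketched below; its use here is an assumption, since the patent only states that κ is inferred from the standard deviation of the codes:

```python
import numpy as np

def estimate_vmf_concentration(codes):
    """Approximate the vMF concentration kappa from identity code vectors
    (n, d) via the mean resultant length r_bar of the normalised codes:
    kappa ~= r_bar * (d - r_bar**2) / (1 - r_bar**2)."""
    unit = codes / np.linalg.norm(codes, axis=1, keepdims=True)
    r_bar = np.linalg.norm(unit.mean(axis=0))
    d = codes.shape[1]
    return r_bar * (d - r_bar ** 2) / (1.0 - r_bar ** 2)
```

Tightly clustered codes (small spread) give a large κ, widely dispersed codes a small one.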
In addition, with an inverse equation constructed analogously to f̃^(i), the mask m^(i) may also be used to compress the information related to f_expo:

f̂^(i) = (1 − m^(i)) ⊙ f^(i) + m^(i) ⊙ ε
Based on any of the above embodiments, the perceptual information representation of the target face image X_t may specifically be obtained as follows.

First, a pre-trained face recognition network M_id(·) is used to compute the original identity representation of X_t, and in the course of this computation the target feature representations h^(i) of X_t are extracted from the intermediate layers of M_id(·).

Then, the target feature representations can be input into the kernel regression network included in the face replacement model, which performs a kernel regression transformation on them. The kernel regression network may comprise a group of multiple nonlinear regressors {R^(i)}, each composed of a convolutional neural network that includes a regression kernel k^(i). The magnitude of the element values of k^(i) represents the correlation between the neurons of h^(i) and the identity representation, so the regression kernel k^(i) can act on the intermediate features h^(i) of M_id(·) as follows:

h̃^(i) = k^(i) ⊙ R^(i)(h^(i)) + (1 − k^(i)) ⊙ h^(i)

where the regression kernel k^(i) has the same size as the i-th target feature representation h^(i) of M_id(·) and its element values lie between 0 and 1. In this way the regression transformation takes effect only on the partial regions of h^(i) most relevant to the identity information, without changing the identity-irrelevant perceptual information contained in h^(i), so that the perceptual information representation r_t of X_t can be obtained.
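The kernel-gated transformation can be sketched as below, where `regressor` stands in for one of the nonlinear regressors R^(i) (a placeholder callable; the real regressor is a learned convolutional network):

```python
import numpy as np

def kernel_regression_step(feat, kernel, regressor):
    """Apply the learned transform only where the regression kernel k
    (elements in [0, 1]) marks identity-related positions; positions with
    k ~ 0 (identity-irrelevant perception information) pass through."""
    return kernel * regressor(feat) + (1.0 - kernel) * feat
```

At k = 1 the position is fully transformed, at k = 0 it is untouched, and intermediate values blend the two.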
Here, the regression kernels k^(i) are specifically obtained by establishing an optimization equation with a constraint term and learning from the identity information representation of the sample source face image X_s and from the target feature representation and original identity representation of the sample target face image X_t.

That is, an optimization equation with a constraint term is established, and the regression kernels required for the kernel regression transformation in feature space are learned in the kernel regression network; through the regression kernels it is guaranteed that the regression transformation acts only on the identity-related part of the feature representation without changing the other perceptual information. In the optimization equation, Φ_id(X; h^(i) → ·) denotes the identity representation of a given face image X computed after replacing the i-th target feature representation h^(i) of the face recognition network M_id(·) by the indicated features, z_id denotes the identity information representation of X_s, and z_t denotes the original identity representation of X_t.

If the element values of k^(i) correctly represent the degree of correlation between the neurons of h^(i) and the identity representation, then the identity-related region k^(i) ⊙ h^(i) will be changed by the transformation function, and the resulting identity representation will vary greatly relative to z_t; likewise, a transformation weighted by (1 − k^(i)) acts only on the regions weakly related to identity while preserving the identity-related regions, so the resulting identity representation should not vary significantly relative to z_t. In addition, during training, the cosine similarity between the transformed identity representation and z_id is made as large as possible, so that the identity representation of X_t is pushed closer to that of an "average face" and no longer carries identity information unique to X_t.

Through this optimization equation with a constraint term, the regression kernels k^(i) are effectively supervised during training and can learn the kernel regression transformation of the high-dimensional feature space. Computing the perceptual information representation r_t of X_t with the kernel regression network based on k^(i) decouples the identity information contained in X_t from its perceptual information, replaces the identity information unique to X_t with the non-unique identity information of an "average face", and at the same time preserves the background perceptual information required for face replacement.
In addition, in the process of computing the original identity representation of X_s with M_id(·), the source feature representations h_s^(i) of X_s are extracted from the intermediate layers of M_id(·), and the regression kernels k^(i) are then used to compress the identity-irrelevant information in h_s^(i). This is specifically realised by injecting same-distribution Gaussian noise ε into h_s^(i):

h̄_s^(i) = k^(i) ⊙ h_s^(i) + (1 − k^(i)) ⊙ ε

from which a more compact and robust identity representation, i.e., the compact identity representation z̄_s of X_s, can be obtained.

At the same time, k^(i) is required to satisfy constraints on two quantities: the mutual information I(h̄_s^(i); h_s^(i)), which measures the degree of information compression of the noise-injected features h̄_s^(i); and the identity representation computed for X_s after replacing the i-th source feature representation h_s^(i) of M_id(·) by h̄_s^(i), which should remain consistent with the original identity representation z_s of X_s. The original identity representation of X_s may specifically be determined by inputting X_s into the M_id network and taking the feature vector before the final classification fully-connected layer as the original identity representation.
On this basis, the identity information representation z_id of X_s, better suited to the identity replacement task, can be obtained from the compact identity representation z̄_s of X_s and the causal effect Δ(f_expo → id) of the expression-pose parameters of X_t on the identity information:

z_id = z̄_s + λ · Δ(f_expo → id)

where λ is a weight hyperparameter.
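A minimal sketch of this inductively biased estimate is given below; `g` is a placeholder for the learnable implicit function, `eps_u` a sampled exogenous disturbance, and the λ value is purely illustrative:

```python
import numpy as np

def biased_identity(z_compact, delta_mesh, g, eps_u, lam=0.1):
    """z_id = z_bar + lam * (g(Delta(f_expo -> f_mesh)) + eps_u): shift the
    compact source identity by the migrated causal effect of the target's
    expression-pose parameters plus an exogenous-disturbance sample."""
    return z_compact + lam * (g(delta_mesh) + eps_u)
```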
Based on any of the above embodiments, the face replacement model further includes an adaptive generation network containing multiple Adaptive Instance Normalization (AdaIN) modules, which fuses the identity information representation z_id of the source face image X_s with the perceptual information representation r_t of the target face image X_t, thereby obtaining the final face-changing result, i.e., the face replacement image Y. Specifically, two pairs of affine parameters are computed, one pair from z_id and one pair from r_t, each with the same size as the intermediate-layer result h of the generation network. At the l-th layer, a weight feature map w, computed from the previous layer's result, is used to fuse the activations normalised with the two pairs of affine parameters. The final face replacement image Y is obtained from the last layer's result by one upsampling.
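A minimal AdaIN sketch in NumPy (per-channel statistics over the spatial dimensions) shows how affine parameters derived from z_id or r_t re-style the generator's intermediate activations:

```python
import numpy as np

def adain(h, gamma, beta, eps=1e-5):
    """Normalise each channel of h (C, H, W) to zero mean / unit variance,
    then apply the representation-derived affine parameters gamma, beta."""
    mu = h.mean(axis=(1, 2), keepdims=True)
    sigma = h.std(axis=(1, 2), keepdims=True)
    return gamma * (h - mu) / (sigma + eps) + beta
```

In the model, gamma and beta would be predicted by small learned layers from z_id and r_t rather than passed in as constants.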
Based on any of the above embodiments, for the difficult scenes in which X_s and X_t differ greatly in facial expression, head pose, image background, illumination and the like, the present invention discloses a face replacement method based on inductive-bias estimation and image information decoupling, and proposes a new face replacement model, termed the EVA model.
Fig. 3 is a schematic diagram of the computing framework of the face replacement model provided by the present invention. As shown in Fig. 3, the causal effect of f_expo on f_mesh is obtained through a controlled-variable intervention experiment under the Rubin Causal Model (RCM); the causal effect is transferred by the neural-network implicit function g(·) with learnable parameters, and together with the influence ε_u of the vMF-sampled exogenous disturbance on the identity estimate, the inductive bias Δ(f_expo → id) is determined. The regression kernels k^(i) are obtained by learning with the IKE (invariant kernel regression) algorithm and are used to obtain the perceptual information representation r_t of X_t; the regression kernels k^(i) are further used to obtain the compact identity representation z̄_s of X_s. Then, inductively biased estimation is carried out: from z̄_s and the inductive bias Δ(f_expo → id), the identity information representation z_id of X_s is obtained. Finally, the generation network, i.e., generator G(·) in the figure, fuses z_id and r_t to generate the face replacement image Y.
Based on any of the above embodiments, the face replacement model can be trained end to end in an adversarial generative manner. To supervise the end-to-end training of the model, the embodiment of the present invention designs several loss terms.

A causal loss term L_causal is obtained from the information bottleneck trade-off equation and is used to supervise the learning of the inductive-bias identity estimator based on causal inference.

A kernel regression loss term L_KR is obtained from the constraint equation of the regression kernels and is used to supervise the learning of the regression kernels, so that the regression transformation in feature space acts only on the identity-related part of the target feature representations h^(i) without changing the other background perceptual information; β is a weight parameter in this loss term.
After the face replacement image Y is generated, the face recognition network M_id(·) is used again to extract the identity representation z_Y of Y, and an identity loss term L_id is constructed to supervise the retention of identity information in Y; sg[·] in this term indicates that the gradient is stopped.

A background loss term L_bg is designed to supervise the background retention of Y; it is computed from the feature representations of Y extracted from the intermediate layers of M_id(·) in the process of computing the identity representation of Y.
An expression-pose loss term is designed to supervise the expression-pose consistency of Y:

L_expo = ‖ f_expo^t − f_expo^Y ‖

where f_expo^t denotes the expression-pose parameters of the target face image X_t computed by the face statistical network M_3d(·), and f_expo^Y denotes the expression-pose parameters of Y computed by M_3d(·).
In addition, to increase the fidelity of the generated face replacement image, the embodiment of the present invention also introduces a discrimination network for adversarial training and adds an adversarial loss term L_adv, used to improve the realism of the generated result. Finally, the overall objective function is a weighted combination of the loss terms:

L = w_1 · L_causal + w_2 · L_KR + w_3 · L_id + w_4 · L_bg + w_5 · L_expo + L_adv

where (w_1, w_2, w_3, w_4, w_5) are weight hyperparameters.
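The weighted combination can be sketched as follows; the exact assignment of the five weights, and in particular the unweighted adversarial term, is an assumption based on the hyperparameters named above:

```python
def total_loss(l_causal, l_kr, l_id, l_bg, l_expo, l_adv,
               w=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Overall objective: weighted sum of the causal, kernel-regression,
    identity, background and expression-pose terms plus the adversarial term."""
    w1, w2, w3, w4, w5 = w
    return (w1 * l_causal + w2 * l_kr + w3 * l_id
            + w4 * l_bg + w5 * l_expo + l_adv)
```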
Based on any of the above embodiments, Fig. 4 is a schematic flow chart of the face replacement model construction method provided by the present invention. As shown in Fig. 4, the model construction flow is as follows. First, the training data are processed: the source face image X_s and the target face image X_t in the training data are aligned and cropped to a fixed size. Second, the compact identity representation of X_s is computed; causal inference is carried out through a controlled-variable intervention experiment, and from the expression-pose parameters of the face in X_t, the inductive bias that the identity representation of X_s should carry in the target scene is estimated, i.e., the inductive identity-bias estimator; from the compact identity representation and the inductive identity-bias estimator, the identity information representation of X_s containing the inductive bias is determined. Then, a kernel regression transformation is applied to the features of X_t to remove the identity information unique to X_t without changing the perceptual information it contains, yielding the perceptual information representation of X_t free of specific identity information. Finally, a generation network is designed, and the identity information representation of X_s and the kernel-regressed perceptual information representation of X_t undergo adaptive feature fusion to generate the face replacement image Y; end-to-end training is performed in an adversarial generative manner, finally yielding the EVA model.

In the testing (application) stage of the EVA model, the aligned and cropped X_s and X_t can be input into the trained EVA model to obtain the identity-replaced image, i.e., the face replacement image Y. Y carries the same identity information as X_s, and the same expression, pose, hair style, clothing and other identity-irrelevant perceptual information as X_t.
The EVA model adopts an adversarial auto-encoding neural network as the main body of its learning framework, effectively learns the characteristics of the sample distribution, and generates high-quality, realistic face replacement images. In constructing the inductive identity-bias estimator in conjunction with causal inference, the known information contained in X_t is used to model the source-face identity deviation required by the identity replacement task, improving the generalization capability of the model when the expression, pose, style and other information of X_s and X_t differ greatly.
Based on any of the above embodiments, the face replacement method disclosed by the present invention performs attribution to the target scene conditions through causal inference, thereby estimating the inductive bias that the identity representation of the source face image X_s should have in the target scene; the identity representation of X_s is thus changed from the deterministic point estimate commonly used in the prior art to a new estimator containing the uncertainty of the target scene. Causal inference under controlled target-scene information is realised through the information bottleneck principle. Meanwhile, an optimization model with constraint terms, i.e., the kernel regression network, is constructed to learn the kernel regression transformation of the high-dimensional feature space; it decouples the identity information contained in the target face image X_t from the perceptual information, replaces the identity information unique to X_t with the non-unique identity information of an "average face", and preserves the background perceptual information required for face replacement.

Finally, the face replacement image is generated by fusing the identity information representation of the source face image containing the inductive bias with the identity-free perceptual information representation of the target face image, realising high-fidelity face replacement that generalizes to open scenes.
The beneficial effects of the invention are as follows: the invention provides a face identity replacement technique for arbitrary identities in open scenes. According to the method of the invention, natural and realistic identity replacement can be realised on unpaired face image data of arbitrary identities; the identity information of X_s and the perceptual information of X_t are effectively retained even when X_s and X_t differ greatly in expression, pose, style and other information, greatly improving the stability and generalization capability of face identity replacement.
Based on any of the above embodiments, in order to verify the effectiveness of the method of the present invention, the EVA model is applied to a test set, quantitative evaluation indices are computed, and the results are compared with state-of-the-art existing face replacement methods, including FaceSwap, FSGAN (Face Swapping GAN), DeepFakes, and FaceShifter. Table 1 shows the results of comparing EVA with these existing face replacement methods:
First, quantitative analysis and comparison are performed. The model is used to generate face replacement images, and the face recognition network is then used to judge the identity retrieval result (Identity retrieval), including the identity accuracy (Accuracy; the larger the index, the better), the cosine similarity between the identity of the generated image and that of the source face image Xs (the larger the index, the better), and the cosine similarity between the identity of the generated image and that of the target face image Xt (the smaller the index, the better), as shown in the second column of Table 1(a). A 3D face statistics network is used to estimate the pose and expression of the generated image and to compute their differences from those of the target face image Xt, namely the Pose Error and the Expression Error, as shown in the third and fourth columns of Table 1(a) (the smaller the index, the better). (The mathematical symbols in this passage are rendered as inline images in the original publication.)
Second, a user survey is carried out: the results generated by EVA and by the other methods are shown to users, who are asked to select the best generated result in terms of identity preservation (Identity), expression and pose consistency (Pose and Expression), and image quality fidelity (Fidelity), as shown in Table 1(b) (the larger the index, the better).
TABLE 1
(Table 1 is rendered as an image in the original publication.)
As shown in Table 1, both evaluations show that the face replacement images generated by EVA have higher fidelity.
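The identity-retrieval evaluation described above relies on cosine similarity between face embeddings: a good swap should be closer to the source identity than to the target identity. A minimal sketch of such a metric follows; the function names and the toy 2-D embeddings are illustrative, not the paper's actual evaluation code or recognition network.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identity_retrieval_accuracy(gen, src, tgt):
    """Fraction of generated embeddings that are closer (by cosine
    similarity) to their source identity than to their target identity.
    Larger is better, matching the Accuracy column of Table 1(a)."""
    hits = sum(cosine_similarity(g, s) > cosine_similarity(g, t)
               for g, s, t in zip(gen, src, tgt))
    return hits / len(gen)

# Toy embeddings: both generated faces lean toward the source direction.
gen = [np.array([1.0, 0.1]), np.array([0.9, 0.2])]
src = [np.array([1.0, 0.0])] * 2
tgt = [np.array([0.0, 1.0])] * 2
print(identity_retrieval_accuracy(gen, src, tgt))  # 1.0
```

In the actual evaluation, the embeddings would come from a pretrained face recognition network applied to the generated, source, and target images.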
The following describes the causal inference-based highly-generalized face replacement device provided by the present invention, and the causal inference-based highly-generalized face replacement device described below and the causal inference-based highly-generalized face replacement method described above may be referred to correspondingly.
Based on any of the above embodiments, fig. 5 is a schematic structural diagram of a highly generalized face replacement device based on causal inference provided by the present invention, as shown in fig. 5, the device includes:
a determining module 510, configured to determine a source face image and a target face image;
a replacing module 520, configured to input the source face image and the target face image into a face replacing model, so as to obtain a face replacing image output by the face replacing model;
the face replacement model determines the identity information representation of a source face image based on the causal effect of the expression posture parameters of a target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image;
the face replacement model is obtained by training based on the sample source face image and the sample target face image.
The device provided by the embodiment of the invention performs causal inference through the face replacement model to determine the causal effect of the expression posture parameters of the target face image on the identity information, thereby estimating how differences between the target face image and the source face image in expression, posture and other aspects affect the source identity representation. The identity information representation of the source face image is determined based on this causal effect, while the perceptual information representation of the target face image is effectively extracted; face replacement is then performed on this basis to obtain a high-quality, realistic face replacement image, improving the stability and generalization capability of the face replacement technique in different target scenes.
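The inference flow of the device — causal effect from the target, identity representation from the source, perceptual representation from the target, then decoding — can be sketched at the interface level as follows. All four callables are stand-ins exercising only the control flow; the real networks are the statistical, recognition, kernel regression, and decoder modules described in the embodiments.

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class FaceReplaceModel:
    """Hypothetical interface grouping the four sub-modules."""
    causal_effect: Callable   # target image -> causal effect on identity
    identity: Callable        # source image + effect -> identity repr
    perception: Callable      # target image -> identity-free perceptual repr
    decode: Callable          # (identity repr, perceptual repr) -> output

def face_replace(x_s, x_t, model):
    ce = model.causal_effect(x_t)       # effect of X_t's expression/pose
    id_repr = model.identity(x_s, ce)   # source identity with inductive bias
    percept = model.perception(x_t)     # target perceptual information
    return model.decode(id_repr, percept)

# Toy stand-ins so the pipeline runs end to end.
toy = FaceReplaceModel(
    causal_effect=lambda x_t: x_t.mean(),
    identity=lambda x_s, ce: x_s + ce,
    perception=lambda x_t: x_t - x_t.mean(),
    decode=lambda i, p: i.mean() + p,
)
out = face_replace(np.ones(4), np.arange(4.0), toy)
print(out.shape)  # (4,)
```

The determining module 510 would supply `x_s` and `x_t`, and the replacing module 520 would call `face_replace`.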
Based on any of the above embodiments, the face replacement model includes a first face statistics network; the first face statistical network is used for determining corresponding face dense key points based on the input face image;
the causal effect of the expression posture parameters of the target face image on the identity information is determined based on the following steps:
determining a causal effect of the expression posture parameters on the dense face key points based on the first face statistical network and a second face statistical network; the second face statistical network is obtained by inserting an information bottleneck layer after each intermediate layer of the first face statistical network, the information bottleneck layers being used to compress the information of the expression posture parameters;
and based on the migration parameters in the face replacement model, migrating the causal effect of the expression posture parameters on dense key points of the face to obtain the causal effect of the expression posture parameters on the identity information.
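The two steps above can be illustrated with a minimal numerical sketch, under stated assumptions: the bottleneck is modeled as a Gaussian channel that shrinks features toward a zero-mean prior (a common information-bottleneck instantiation, not necessarily the patent's exact layer), and the causal effect is taken as the change in the keypoint prediction when expression/pose information is compressed away, migrated by a scalar parameter `gamma` (the form of the migration parameters is assumed).

```python
import numpy as np

def bottleneck(features, rng, beta=0.5):
    """Hypothetical Gaussian information bottleneck: shrink the
    expression/pose features toward a zero-mean prior and add noise,
    limiting how much of that information passes through. Returns the
    compressed code and the KL term that penalizes retained information."""
    mu = beta * features
    sigma = np.sqrt(1.0 - beta ** 2)
    z = mu + sigma * rng.standard_normal(features.shape)
    kl = 0.5 * np.sum(mu ** 2 + sigma ** 2 - 1.0 - 2.0 * np.log(sigma))
    return z, kl

def causal_effect(keypoints_full, keypoints_bottlenecked, gamma=1.0):
    """Effect of expression/pose on dense keypoints: the difference
    between the unmodified and the bottlenecked predictions, migrated
    to identity space by a scalar parameter gamma (assumed form)."""
    return gamma * (keypoints_full - keypoints_bottlenecked)

rng = np.random.default_rng(0)
z, kl = bottleneck(np.ones(8), rng)
effect = causal_effect(np.ones(8), z)
print(effect.shape, kl >= 0.0)  # (8,) True
```

In the described method, `keypoints_full` and `keypoints_bottlenecked` would be the outputs of the first and second face statistical networks on the same target image.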
Based on any of the above embodiments, the face replacement model includes a face recognition network; the face recognition network is used for determining corresponding original identity representation based on the input face image;
the identity information representation of the source face image is determined based on the following steps:
inputting a source face image into a face recognition network, and extracting source characteristic representation of the source face image from an intermediate layer of the face recognition network;
and determining the identity information representation of the source face image based on the source feature representation and the causal effect.
Based on any of the above embodiments, determining the identity information representation of the source face image based on the source feature representation and the causal effect includes:
determining an updated source feature representation based on the source feature representation and a regression kernel in the face replacement model; the regression kernel is used for compressing information which is irrelevant to the identity in the source feature representation;
inputting the updated source feature representation into an intermediate layer of the face recognition network to obtain compact identity representation output by the face recognition network;
an identity information representation is determined based on the compact identity representation and the causal effect.
Based on any of the above embodiments, the perceptual information representation of the target face image is determined based on the following steps:
inputting a target face image into a face recognition network, and extracting target feature representation of the target face image from an intermediate layer of the face recognition network;
based on the target feature representation, a perceptual information representation of the target face image is determined.
Based on any of the above embodiments, the face replacement model further includes a kernel regression network; the kernel regression network is used for removing specific identity information contained in the input data;
determining a perceptual information representation of the target face image based on the target feature representation, comprising:
and inputting the target feature representation into a kernel regression network to obtain the perception information representation of the target face image output by the kernel regression network.
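The kernel regression network's role — removing the target's specific identity while keeping its perceptual content — can be sketched under a hypothetical linear decomposition: project out the identity subspace from the target features and substitute the average face's identity component. The basis and the mean identity below are illustrative stand-ins, not learned quantities from the patent.

```python
import numpy as np

def deidentify(target_feat, identity_basis, mean_identity):
    """Remove the target's unique identity component and substitute the
    average face's identity, keeping the residual perceptual content
    (hypothetical linear decomposition of the feature space)."""
    identity_part = identity_basis @ (identity_basis.T @ target_feat)
    return target_feat - identity_part + mean_identity

rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.standard_normal((256, 32)))  # identity subspace
feat = rng.standard_normal(256)                          # target features
mean_id = basis @ (basis.T @ rng.standard_normal(256))   # average-face identity
out = deidentify(feat, basis, mean_id)
print(out.shape)  # (256,)
```

In the described device, this operation would be performed by the trained kernel regression network rather than a fixed linear projection.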
Based on any of the above embodiments, the kernel regression network is trained based on the identity information representation of the sample source face image, and the target feature representation and the original identity representation of the sample target face image determined by the face recognition network.
Fig. 6 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 6: a processor (processor)610, a communication Interface (Communications Interface)620, a memory (memory)630 and a communication bus 640, wherein the processor 610, the communication Interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a causal inference based highly generalized face replacement method comprising: determining a source face image and a target face image; inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model; the face replacement model determines the identity information representation of a source face image based on the causal effect of the expression posture parameters of a target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image; the face replacement model is obtained by training based on the sample source face image and the sample target face image.
In addition, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program codes, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention further provides a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions, when the program instructions are executed by a computer, the computer being capable of executing the method for highly generalized face replacement based on causal inference provided by the above methods, the method including: determining a source face image and a target face image; inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model; the face replacement model determines the identity information representation of a source face image based on the causal effect of the expression posture parameters of a target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image; the face replacement model is obtained by training based on the sample source face image and the sample target face image.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program, which when executed by a processor, implements a method for performing causal inference based highly generalized face replacement provided by the above methods, the method comprising: determining a source face image and a target face image; inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model; the face replacement model determines the identity information representation of a source face image based on the causal effect of the expression posture parameters of a target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image; the face replacement model is obtained by training based on the sample source face image and the sample target face image.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A high-generalization face replacement method based on causal inference is characterized by comprising the following steps:
determining a source face image and a target face image;
inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model;
the face replacement model determines the identity information representation of the source face image based on the causal effect of the expression posture parameters of the target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image;
the face replacement model is obtained by training based on the sample source face image and the sample target face image.
2. The causal inference based highly generalized face replacement method of claim 1, wherein said face replacement model comprises a first face statistical network; the first face statistical network is used for determining corresponding dense key points of the face based on the input face image;
the causal effect of the expression posture parameters of the target face image on the identity information is determined based on the following steps:
determining a causal effect of the expression posture parameters on dense key points of the human face based on the first face statistical network and the second face statistical network; the second face statistical network is obtained by sequentially inserting all information bottleneck layers into all intermediate layers of the first face statistical network, and the information bottleneck layers are used for carrying out information compression on the expression posture parameters;
and migrating the causal effect of the expression posture parameters on dense key points of the human face based on the migration parameters in the human face replacement model to obtain the causal effect of the expression posture parameters on the identity information.
3. The causal inference based highly generalized face replacement method of claim 1, wherein said face replacement model comprises a face recognition network; the face recognition network is used for determining corresponding original identity representation based on the input face image;
the identity information representation of the source face image is determined based on the following steps:
inputting the source face image into the face recognition network, and extracting source feature representation of the source face image from an intermediate layer of the face recognition network;
and determining the identity information representation of the source face image based on the source feature representation and the causal effect.
4. The causal inference based highly generalized face replacement method of claim 3, wherein said determining an identity information representation of said source face image based on said source feature representation and said causal effect comprises:
determining an updated source feature representation based on the source feature representation and a regression kernel in the face replacement model; the regression kernel is used for compressing information which is irrelevant to identity in the source feature representation;
inputting the updated source feature representation into an intermediate layer of the face recognition network to obtain a compact identity representation output by the face recognition network;
determining the identity information representation based on the compact identity representation and the causal effect.
5. The causal inference based highly generalized face replacement method of claim 3, wherein said perceptual information representation of the target face image is determined based on the following steps:
inputting the target face image into the face recognition network, and extracting target feature representation of the target face image from an intermediate layer of the face recognition network;
determining a perceptual information representation of the target face image based on the target feature representation.
6. The causal inference based highly generalized face replacement method of claim 5, wherein said face replacement model further comprises a kernel regression network; the kernel regression network is used for removing specific identity information contained in the input data;
the determining a perceptual information representation of the target face image based on the target feature representation comprises:
and inputting the target feature representation into the kernel regression network to obtain the perception information representation of the target face image output by the kernel regression network.
7. The causal inference based highly generalized face replacement method of claim 6, wherein the kernel regression network is trained based on the identity information representation of the sample source face image, and the target feature representation and original identity representation of the sample target face image determined by the face recognition network.
8. A causally-inferred highly generalized face replacement device, comprising:
the determining module is used for determining a source face image and a target face image;
the replacing module is used for inputting the source face image and the target face image into a face replacing model to obtain a face replacing image output by the face replacing model;
the face replacement model determines the identity information representation of the source face image based on the causal effect of the expression posture parameters of the target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image;
the face replacement model is obtained by training based on the sample source face image and the sample target face image.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of the causal inference based highly generalized face replacement method according to any one of claims 1 to 7.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the causal inference based highly generalized face replacement method according to any one of claims 1 to 7.
CN202111185354.4A 2021-10-12 2021-10-12 High-generalization face replacement method and device based on causal inference and electronic equipment Active CN113627404B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111185354.4A CN113627404B (en) 2021-10-12 2021-10-12 High-generalization face replacement method and device based on causal inference and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111185354.4A CN113627404B (en) 2021-10-12 2021-10-12 High-generalization face replacement method and device based on causal inference and electronic equipment

Publications (2)

Publication Number Publication Date
CN113627404A true CN113627404A (en) 2021-11-09
CN113627404B CN113627404B (en) 2022-01-14

Family

ID=78391324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111185354.4A Active CN113627404B (en) 2021-10-12 2021-10-12 High-generalization face replacement method and device based on causal inference and electronic equipment

Country Status (1)

Country Link
CN (1) CN113627404B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114220051A (en) * 2021-12-10 2022-03-22 马上消费金融股份有限公司 Video processing method, application program testing method and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027465A (en) * 2019-12-09 2020-04-17 韶鼎人工智能科技有限公司 Video face replacement method based on illumination migration
CN111275779A (en) * 2020-01-08 2020-06-12 网易(杭州)网络有限公司 Expression migration method, training method and device of image generator and electronic equipment
CN111783603A (en) * 2020-06-24 2020-10-16 有半岛(北京)信息科技有限公司 Training method for generating confrontation network, image face changing method and video face changing method and device
CN112766160A (en) * 2021-01-20 2021-05-07 西安电子科技大学 Face replacement method based on multi-stage attribute encoder and attention mechanism
WO2021180114A1 (en) * 2020-03-11 2021-09-16 广州虎牙科技有限公司 Facial reconstruction method and apparatus, computer device, and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHENG XIN et al.: "A Survey of Deep Facial Attribute Analysis", INTERNATIONAL JOURNAL OF COMPUTER VISION *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114220051A (en) * 2021-12-10 2022-03-22 马上消费金融股份有限公司 Video processing method, application program testing method and electronic equipment
CN114220051B (en) * 2021-12-10 2023-07-28 马上消费金融股份有限公司 Video processing method, application program testing method and electronic equipment

Also Published As

Publication number Publication date
CN113627404B (en) 2022-01-14

Similar Documents

Publication Publication Date Title
Zeng et al. Srnet: Improving generalization in 3d human pose estimation with a split-and-recombine approach
CN110929622B (en) Video classification method, model training method, device, equipment and storage medium
CN108780519B (en) Structural learning of convolutional neural networks
CN111754396B (en) Face image processing method, device, computer equipment and storage medium
Triastcyn et al. Generating artificial data for private deep learning
CN109934197A (en) Training method, device and the computer readable storage medium of human face recognition model
CN109711283A (en) A kind of joint doubledictionary and error matrix block Expression Recognition algorithm
CN113822953A (en) Processing method of image generator, image generation method and device
CN114724218A (en) Video detection method, device, equipment and medium
Pérez-Cabo et al. Learning to learn face-pad: a lifelong learning approach
CN110503113B (en) Image saliency target detection method based on low-rank matrix recovery
He et al. Finger vein image deblurring using neighbors-based binary-GAN (NB-GAN)
CN113627404B (en) High-generalization face replacement method and device based on causal inference and electronic equipment
CN111860056B (en) Blink-based living body detection method, blink-based living body detection device, readable storage medium and blink-based living body detection equipment
CN111259264A (en) Time sequence scoring prediction method based on generation countermeasure network
CN110765843A (en) Face verification method and device, computer equipment and storage medium
CN113657272A (en) Micro-video classification method and system based on missing data completion
Raji et al. Photo-guided exploration of volume data features
CN113011307A (en) Face recognition identity authentication method based on deep residual error network
CN111737688A (en) Attack defense system based on user portrait
CN110163049B (en) Face attribute prediction method, device and storage medium
CN116543437A (en) Occlusion face recognition method based on occlusion-feature mapping relation
Sang et al. Image recognition based on multiscale pooling deep convolution neural networks
CN116311434A (en) Face counterfeiting detection method and device, electronic equipment and storage medium
CN113449193A (en) Information recommendation method and device based on multi-classification images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant