CN113627404A - High-generalization face replacement method and device based on causal inference and electronic equipment - Google Patents
- Publication number
- CN113627404A (application number CN202111185354.4A)
- Authority
- CN
- China
- Prior art keywords
- face
- face image
- representation
- target
- source
- Prior art date
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides a causal-inference-based high-generalization face replacement method and device and an electronic device. The method comprises the following steps: determining a source face image and a target face image; and inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model. The face replacement model determines an identity information representation of the source face image based on the causal effect of the expression and pose parameters of the target face image on the identity information, and performs face replacement based on the identity information representation and a perceptual information representation of the target face image. The face replacement model is trained on sample source face images and sample target face images. The method, device, electronic equipment and storage medium provided by the invention produce high-quality, realistic face replacement images, thereby improving the stability and generalization ability of face replacement technology across different target scenes.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a high-generalization face replacement method and device based on causal inference, and an electronic device.
Background
Face image identity replacement is a frontier research problem at the intersection of computer vision and image generation. It has significant application value in fields such as virtual reality, film special effects, and game production, and has attracted wide attention from both academia and industry. Face image identity replacement, i.e., "face swapping", replaces the identity information of a given target face image with the identity of a source face image while keeping the other content of the image unchanged.
At present, the main difficulty of face-swapping technology lies in its generalization ability. In hard scenarios where the source and target face images differ greatly — that is, when the pose (face orientation angle), expression, and other attributes of the face in the target face image differ substantially from those in the source face image — the face generated by the model can hardly show how the source face should appear under the target expression and pose, and the swapping result is typically distorted.
Disclosure of Invention
The invention provides a high-generalization face replacement method and device based on causal inference, and an electronic device, to address the limited generalization ability of face replacement techniques in the prior art and to improve the generalization ability of face replacement technology.
The invention provides a causal inference-based high-generalization face replacement method, which comprises the following steps:
determining a source face image and a target face image;
inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model;
the face replacement model determines an identity information representation of the source face image based on the causal effect of the expression and pose parameters of the target face image on the identity information, and performs face replacement based on the identity information representation and a perceptual information representation of the target face image;
the face replacement model is trained on sample source face images and sample target face images.
According to the causal-inference-based high-generalization face replacement method provided by the invention, the face replacement model comprises a first face statistical network, which determines corresponding dense face key points from an input face image;
the causal effect of the expression and pose parameters of the target face image on the identity information is determined based on the following steps:
determining the causal effect of the expression and pose parameters on the dense face key points based on the first face statistical network and a second face statistical network; the second face statistical network is obtained by inserting an information bottleneck layer into each intermediate layer of the first face statistical network in sequence, the information bottleneck layers being used to compress the information of the expression and pose parameters;
and migrating, based on the migration parameters in the face replacement model, the causal effect of the expression and pose parameters on the dense face key points to obtain the causal effect of the expression and pose parameters on the identity information.
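The migration step above can be illustrated with a minimal numpy sketch. The patent does not disclose the exact form of the migration parameters; a learned linear transfer matrix mapping key-point space to identity-representation space is assumed here purely for illustration, and all dimensions are hypothetical.

```python
import numpy as np

def migrate_causal_effect(effect_mesh, transfer_matrix):
    """Map a causal effect measured on dense face key points into
    identity-representation space via learned migration parameters.
    The linear form is an assumption, not the patent's parameterization."""
    return transfer_matrix @ effect_mesh

rng = np.random.default_rng(0)
effect_mesh = rng.normal(size=468 * 3)       # hypothetical effect on 468 3-D key points
W = rng.normal(size=(512, 468 * 3)) * 0.01   # hypothetical learned migration parameters
effect_id = migrate_causal_effect(effect_mesh, W)
print(effect_id.shape)  # (512,)
```

In practice the migration parameters would be learned jointly with the rest of the face replacement model rather than sampled at random.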
According to the causal-inference-based high-generalization face replacement method provided by the invention, the face replacement model comprises a face recognition network, which determines a corresponding original identity representation from an input face image;
the identity information representation of the source face image is determined based on the following steps:
inputting the source face image into the face recognition network, and extracting a source feature representation of the source face image from an intermediate layer of the face recognition network;
and determining the identity information representation of the source face image based on the source feature representation and the causal effect.
According to the high-generalization face replacement method based on causal inference provided by the invention, the determining the identity information representation of the source face image based on the source feature representation and the causal effect comprises the following steps:
determining an updated source feature representation based on the source feature representation and a regression kernel in the face replacement model; the regression kernel is used for compressing information which is irrelevant to identity in the source feature representation;
inputting the updated source feature representation into an intermediate layer of the face recognition network to obtain a compact identity representation output by the face recognition network;
determining the identity information representation based on the compact identity representation and the causal effect.
According to the causal-inference-based high-generalization face replacement method provided by the invention, the perceptual information representation of the target face image is determined based on the following steps:
inputting the target face image into the face recognition network, and extracting a target feature representation of the target face image from an intermediate layer of the face recognition network;
determining the perceptual information representation of the target face image based on the target feature representation.
According to the causal-inference-based high-generalization face replacement method provided by the invention, the face replacement model further comprises a kernel regression network, which removes person-specific identity information contained in its input data;
the determining of the perceptual information representation of the target face image based on the target feature representation comprises:
inputting the target feature representation into the kernel regression network to obtain the perceptual information representation of the target face image output by the kernel regression network.
According to the causal-inference-based high-generalization face replacement method provided by the invention, the kernel regression network is trained based on the identity information representation of the sample source face image, together with the target feature representation and the original identity representation of the sample target face image determined by the face recognition network.
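One plausible reading of the kernel regression network's role — sketched below with numpy under assumptions not stated in the patent — is a Nadaraya-Watson-style kernel regression that estimates the identity component of the target feature from a bank of identity prototypes and subtracts it, leaving an approximately identity-free perceptual representation. The prototype bank, bandwidth, and dimensions are all illustrative.

```python
import numpy as np

def kernel_regress_out_identity(target_feat, identity_bank, bandwidth=8.0):
    """Estimate the identity component of a feature as a kernel-weighted
    average over identity prototypes, then subtract it, leaving an
    (approximately) identity-free perceptual part."""
    d2 = ((identity_bank - target_feat) ** 2).sum(axis=1)  # squared distances to prototypes
    w = np.exp(-d2 / (2 * bandwidth ** 2))
    w = w / w.sum()                                        # normalized kernel weights
    identity_part = w @ identity_bank                      # regressed identity component
    return target_feat - identity_part

rng = np.random.default_rng(1)
bank = rng.normal(size=(16, 64))   # 16 hypothetical identity prototypes of dimension 64
feat = rng.normal(size=64)         # target feature from the recognition network
perceptual = kernel_regress_out_identity(feat, bank)
print(perceptual.shape)  # (64,)
```

In the patent the kernel regression network is trained end to end; the closed-form regression here is only a stand-in for that learned behavior.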
The invention also provides a high-generalization face replacement device based on causal inference, which comprises:
the determining module is used for determining a source face image and a target face image;
the replacing module is used for inputting the source face image and the target face image into a face replacing model to obtain a face replacing image output by the face replacing model;
the face replacement model determines an identity information representation of the source face image based on the causal effect of the expression and pose parameters of the target face image on the identity information, and performs face replacement based on the identity information representation and a perceptual information representation of the target face image;
the face replacement model is trained on sample source face images and sample target face images.
The invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the steps of the causal-inference-based high-generalization face replacement method described above.
The present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the causal-inference-based high-generalization face replacement method described in any of the above.
The high-generalization face replacement method, device and electronic equipment based on causal inference provided by the invention perform causal inference through the face replacement model to determine the causal effect of the expression and pose parameters of the target face image on the identity information, thereby estimating how differences in expression, pose and the like between the target and source face images affect the source identity representation. The identity information representation of the source face image is determined based on this causal effect, while the perceptual information representation of the target face image is effectively extracted; face replacement performed on this basis yields a high-quality, realistic face replacement image, thereby improving the stability and generalization ability of face replacement technology across different target scenes.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or of the prior art, the drawings required for the description of the embodiments or of the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a highly generalized face replacement method based on causal inference provided by the present invention;
FIG. 2 is a schematic flow diagram of a causal effect determination method provided by the present invention;
FIG. 3 is a schematic diagram of a computing framework of a face replacement model provided by the present invention;
FIG. 4 is a schematic flow chart of a face replacement model construction method provided by the invention;
FIG. 5 is a schematic structural diagram of a highly generalized face replacement device based on causal inference provided by the present invention;
fig. 6 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a high-generalization face replacement method based on causal inference. Fig. 1 is a schematic flow chart of the causal-inference-based high-generalization face replacement method provided by the present invention; as shown in Fig. 1, the method includes:
step 110, determining a source face image and a target face image;
step 120, inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model.
Specifically, the source face image is a face image that needs to retain identity information in the face replacement process, and correspondingly, the image that needs to be replaced with the identity information and retains the perception information in the face replacement process is a target face image. Here, the perception information may include hair, clothing, background, lighting conditions, and the like in the target face image. The source face image and the target face image may be captured by a web crawler or other means, or may be acquired by an image acquisition device such as a scanner, a mobile phone, a camera, and the like, which is not specifically limited in this embodiment of the present invention.
the face replacement model determines the identity information representation of the source face image based on the causal effect of the expression and pose parameters of the target face image on the identity information, and performs face replacement based on the identity information representation and the perceptual information representation of the target face image;
the face replacement model is obtained by training based on the sample source face image and the sample target face image.
Here, the expression and pose parameters may include expression parameters and pose parameters of the target face image: the expression parameters represent the expression of the face in the corresponding image, and the pose parameters represent the orientation angle of that face. The causal effect of the expression and pose parameters of the target face image on the identity information follows the notion of interventional causal inference: it is the difference between the identity estimates obtained from the target face image with and without the expression and pose parameters.
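In interventional notation, this difference can be written as follows (a hedged formalization introduced here for exposition; the patent gives only the verbal definition above, and the symbols $\tau_{id}$ and $f^{*}$ are not its own):

```latex
\tau_{id} \;=\; \mathbb{E}\!\left[\, z_{id} \mid do(f_{expo} = f^{*}) \,\right] \;-\; \mathbb{E}\!\left[\, z_{id} \mid do(f_{expo} = \varnothing) \,\right]
```

where $z_{id}$ is the identity estimate, $f^{*}$ the target image's expression and pose parameters, and $\varnothing$ the (counterfactual) absence of any expression/pose information.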
Specifically, in the prior art, in hard scenarios where the source and target face images differ greatly — that is, when the pose, expression and the like of the face in the target image differ substantially from those in the source image — the swapping result is typically distorted. To solve this problem, in the embodiment of the invention, after the source and target face images are input into the face replacement model, the model determines the causal effect of the expression and pose parameters of the target face image on the identity information, and uses this causal effect to estimate the inductive bias of the source face image's identity representation in the target scene. It thereby determines the identity information representation of the source face image, and extracts from the target face image a perceptual information representation that is independent of the identity information.
The causal effect of the expression and pose parameters of the target face image on the identity information can be obtained by causal inference conditioned on the target scene of the target face image; the embodiment of the invention does not limit the specific manner of causal inference. The identity information representation of the source face image may be obtained by performing face recognition on the source face image to produce an original identity representation and combining it with the causal effect, or by extracting a feature representation from the source face image and determining the identity representation from that feature representation and the causal effect. The perceptual information representation of the target face image may be obtained by directly recognizing the perceptual information of the target face image, or by extracting a feature representation from the target face image and determining the perceptual representation from it.
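The data flow described above can be summarized in a minimal numpy skeleton. Every component below — the projections `W_id`, `W_tau`, `W_pc`, `W_dec` and all dimensions — is a toy stand-in for the corresponding learned network, not the patent's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(2)
D_ID, D_PERCEP, D_IMG = 512, 256, 1024   # illustrative dimensions

# Random linear maps standing in for the learned sub-networks.
W_id = rng.normal(size=(D_ID, D_IMG)) * 0.01     # identity extractor (recognition net)
W_tau = rng.normal(size=(D_ID, D_IMG)) * 0.01    # causal-effect estimator
W_pc = rng.normal(size=(D_PERCEP, D_IMG)) * 0.01 # perceptual extractor (kernel regression)
W_dec = rng.normal(size=(D_IMG, D_ID + D_PERCEP)) * 0.01  # generator/decoder

def face_swap(src_img, tgt_img):
    """Skeleton of the inference flow: adjust the source identity by the
    causal effect of the target's expression/pose, then fuse it with the
    target's identity-free perceptual representation."""
    z_id = W_id @ src_img        # original identity representation (source)
    tau = W_tau @ tgt_img        # causal effect of target expression/pose on identity
    z_id_adj = z_id + tau        # identity representation under the target scene
    z_percep = W_pc @ tgt_img    # identity-free perceptual representation (target)
    return W_dec @ np.concatenate([z_id_adj, z_percep])

out = face_swap(rng.normal(size=D_IMG), rng.normal(size=D_IMG))
print(out.shape)  # (1024,)
```

The additive adjustment `z_id + tau` is one simple way to apply an estimated causal effect; the patent leaves the exact fusion operator to the trained model.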
In addition, before step 120 is executed, a face replacement model needs to be trained in advance, and the face replacement model can be trained specifically in the following manner: first, a large number of sample source face images and sample target face images are collected. And then, training the initial model based on the sample source face image and the sample target face image, thereby obtaining a face replacement model. Here, the network type and structure of the initial model are not particularly limited in the embodiments of the present invention.
The method provided by the embodiment of the invention performs causal inference through the face replacement model to determine the causal effect of the expression and pose parameters of the target face image on the identity information, thereby estimating how differences in expression, pose and the like between the target and source face images affect the source identity representation. It determines the identity information representation of the source face image based on this causal effect, while effectively extracting the perceptual information representation of the target face image, and performs face replacement on this basis, obtaining a high-quality, realistic face replacement image and thus improving the stability and generalization ability of face replacement technology across different target scenes.
Based on any of the above embodiments, the face replacement model includes a first face statistical network, which determines corresponding dense face key points from an input face image;
the causal effect of the expression and pose parameters of the target face image on the identity information is determined based on the following steps:
determining the causal effect of the expression and pose parameters on the dense face key points based on the first face statistical network and a second face statistical network; the second face statistical network is obtained by inserting an information bottleneck layer into each intermediate layer of the first face statistical network in sequence, the information bottleneck layers being used to compress the information of the expression and pose parameters;
and migrating, based on the migration parameters in the face replacement model, the causal effect of the expression and pose parameters on the dense face key points to obtain the causal effect of the expression and pose parameters on the identity information.
Specifically, the causal effect of the expression and pose parameters f_expo of the target face image on the identity information could in principle be obtained through a controlled intervention experiment with f_expo as the controlled variable: the difference between the identity estimates of the experimental group and the control group in this intervention experiment is the causal effect. The identity estimate of the control group can be obtained by directly recognizing the target face image with the face recognition network, while the identity estimate z_id of the experimental group is the identity estimate obtained without the influence of any information related to f_expo.
In reality, however, no face image exists that carries no f_expo information at all, and the training data usually contain no image pairs of the same person under different expressions and poses; in addition, f_expo and the identity information are not estimated by the same network. Both issues mean that the result of the experimental group in the controlled intervention experiment cannot be obtained. Directly inferring the causal effect of f_expo on the identity information is therefore impossible; that is, there is a Fundamental Problem of Causal Inference (FPCI).
To address this problem, the embodiment of the invention first introduces the non-rigid shape of the face (expressed by the dense face key points f_mesh) as a mediator variable. Since f_expo and f_mesh can both be estimated from the input face image by the same face statistical network, the causal effect of f_expo on f_mesh can be computed through that network, namely as the difference between the f_mesh estimates obtained before and after eliminating the information related to f_expo. Causal effect migration is then performed according to the migration parameters in the face replacement model, transferring the causal effect of f_expo on f_mesh to the identity information, and finally yielding the causal effect of f_expo on the identity information.
Here, the causal effect of f_expo on f_mesh can be obtained from the original face statistical network included in the face replacement model, i.e., the first face statistical network, together with a second face statistical network obtained by inserting an information bottleneck layer into each intermediate layer of the first network in sequence, by estimating f_mesh for the target face image with each of the two networks.
It will be appreciated that each information bottleneck layer compresses the information related to f_expo during the forward pass, restricting the flow of f_expo-related information into the computation graph; causal inference with the target scene information controlled is thus realized through the information bottleneck principle. The f_mesh estimate produced by the first face statistical network is the control-group result of the controlled intervention experiment, the f_mesh estimate produced by the second face statistical network is the experimental-group result, and the difference between the two is the causal effect of f_expo on f_mesh.
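The control-group/experimental-group construction can be mimicked with a toy numpy network: the same weights are run twice on the target features, once normally and once with an inserted "bottleneck" that fully compresses the channels assumed to carry f_expo information; the difference of the two dense-key-point estimates plays the role of the causal effect. The one-layer network, the channel mask, and all sizes are illustrative assumptions, not the patent's architecture.

```python
import numpy as np

def mesh_net(x, W, bottleneck_mask=None):
    """Toy one-layer 'face statistical network' predicting dense key
    points. If bottleneck_mask is given, it zeroes (fully compresses)
    the hidden channels assumed to carry expression/pose information,
    mimicking the inserted information bottleneck layers."""
    h = np.tanh(W["hidden"] @ x)
    if bottleneck_mask is not None:
        h = h * bottleneck_mask
    return W["out"] @ h

rng = np.random.default_rng(3)
W = {"hidden": rng.normal(size=(32, 64)) * 0.1,
     "out": rng.normal(size=(468 * 3, 32)) * 0.1}
x = rng.normal(size=64)                  # target face features

mask = np.ones(32)
mask[:8] = 0.0                           # assume the first 8 channels carry f_expo info
mesh_control = mesh_net(x, W)            # control group: full information
mesh_treated = mesh_net(x, W, mask)      # experimental group: f_expo compressed
causal_effect_on_mesh = mesh_control - mesh_treated
print(causal_effect_on_mesh.shape)  # (1404,)
```

In the patent the bottleneck layers are trained to compress f_expo-correlated information softly rather than hard-masking channels; the binary mask here only makes the two-pass difference explicit.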
Based on any of the above embodiments, the face replacement model includes a face recognition network, which determines a corresponding original identity representation from an input face image;
the identity information representation of the source face image is determined based on the following steps:
inputting the source face image into the face recognition network, and extracting a source feature representation of the source face image from an intermediate layer of the face recognition network;
and determining the identity information representation of the source face image based on the source feature representation and the causal effect.
Specifically, the face replacement model includes a face recognition network, which performs face recognition on an input face image to obtain the original identity representation corresponding to that image. On this basis, the identity information representation of the source face image can be obtained as follows: first, the source face image is input into the face recognition network, and while the network computes the original identity representation of the source face image, a source feature representation of the source face image is extracted from an intermediate layer of the network; then, the identity information representation of the source face image is determined from the source feature representation and the causal effect of the expression and pose parameters of the target face image on the identity information.
Here, the identity information representation of the source face image may be determined by inputting the source feature representation into another network and combining that network's recognition result with the causal effect, or by applying a feature transformation to the source feature representation, feeding the transformed representation back into the face recognition network, and combining the recognition result with the causal effect; the embodiment of the present invention does not specifically limit this.
Based on any of the above embodiments, determining the identity information representation of the source face image based on the source feature representation and the causal effect includes:
determining an updated source feature representation based on the source feature representation and a regression kernel in the face replacement model; the regression kernel is used for compressing information which is irrelevant to the identity in the source feature representation;
inputting the updated source feature representation into an intermediate layer of the face recognition network to obtain compact identity representation output by the face recognition network;
an identity information representation is determined based on the compact identity representation and the causal effect.
Specifically, in order to obtain a more compact representation of the identity information of the source face image, further improve the accuracy of face replacement, and reduce the amount of calculation of face replacement, the embodiment of the present invention compresses the extracted information irrelevant to the identity in the source feature representation by using the regression kernel in the face replacement model, and re-inputs the obtained updated source feature representation into the intermediate layer of the face recognition network, and performs identity recognition on the updated source feature representation by the face recognition network, thereby obtaining a compact identity representation of the source face image.
Then, the identity information representation of the source face image, better suited to the identity replacement task, can be determined from the compact identity representation of the source face image and the causal effect of the expression posture parameters of the target face image on the identity information, i.e. the inductive bias of the identity representation of the source face image in the target scene.
Based on any of the above embodiments, the perceptual information representation of the target face image is determined based on the following steps:
inputting a target face image into a face recognition network, and extracting target feature representation of the target face image from an intermediate layer of the face recognition network;
based on the target feature representation, a perceptual information representation of the target face image is determined.
Specifically, the perceptual information representation of the target face image may be obtained by: firstly, inputting a target face image into a face recognition network included in a face replacement model, and extracting target feature representation of the target face image from an intermediate layer of the face recognition network in the process of calculating the original identity representation of the target face image by the face recognition network; then, according to the target feature representation, the perception information representation of the target face image is determined.
Here, the determination method of the perception information representation of the target face image may specifically be to directly perform perception information identification on the target feature representation to obtain the perception information representation of the target face image, or may also be to perform kernel regression transformation on the target feature representation to remove specific identity information included therein to obtain the perception information representation of the target face image, which is not specifically limited in this embodiment of the present invention.
Based on any of the above embodiments, the face replacement model further includes a kernel regression network; the kernel regression network is used for removing specific identity information contained in the input data;
determining a perceptual information representation of the target face image based on the target feature representation, comprising:
and inputting the target feature representation into a kernel regression network to obtain the perception information representation of the target face image output by the kernel regression network.
Specifically, after the target feature representation of the target face image is extracted from the intermediate layer of the face recognition network, the target feature representation may be input into a kernel regression network included in the face replacement model, the kernel regression network performs kernel regression transformation on the target feature representation, the identity information included in the target feature representation is decoupled from the sensing information, and it is ensured that only the decoupled identity information is subjected to regression transformation, the identity information included in the target face image is removed, but the sensing information included in the target face image is not changed, and the kernel regression network may finally obtain the sensing information representation of the target face image.
Based on any of the above embodiments, the kernel regression network is trained based on the identity information representation of the sample source face image, and the target feature representation and the original identity representation of the sample target face image determined by the face recognition network.
Specifically, in order to remove specific identity information contained in the input data, the kernel regression network may be trained as follows: inputting the sample source face image into a face replacement model to obtain identity information representation of the sample source face image; inputting the sample target face image into a face recognition network to obtain an original identity representation of the sample target face image output by the face recognition network, and extracting a target feature representation of the sample target face image from an intermediate layer of the face recognition network; and then, training the initial kernel regression network according to the identity information representation of the sample source face image, the target feature representation and the original identity representation of the sample target face image, thereby obtaining the kernel regression network.
Based on any of the above embodiments, after a large number of sample source face images are collected, the sample source face images may be aligned based on facial key points and cropped to the fixed size required by the face replacement model, for example 512 × 512, and the cropped sample source face images are then used for training the face replacement model;
further, the facial key points may be obtained by estimating the facial key points of the sample source face image with a face statistical network, and the face statistical network may be a network based on a 3D face Morphable statistical Model (3D Morphable Model, 3DMM), denoted as M_3d(·). After the facial key points of the sample source face image are estimated, a similarity transformation matrix H between this group of key points and the reference key points can be calculated; the sample source face image is affine-transformed with this matrix and cropped to the fixed size required by the face replacement model, and the processed sample source face image can then be used for the subsequent training of the face replacement model.
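A hedged sketch of the alignment step follows: a similarity transform H is fitted between detected key points and reference key points, then used to warp the image. The Umeyama-style closed form below is one standard way to fit H; the key-point coordinates are purely illustrative.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform (scale, rotation, translation)
    mapping src points to dst points; Umeyama-style closed form."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    s_c, d_c = src - mu_s, dst - mu_d
    cov = d_c.T @ s_c / len(src)          # cross-covariance of the point sets
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U @ Vt))    # guard against reflections
    D = np.diag([1.0, d])
    R = U @ D @ Vt
    scale = np.trace(np.diag(S) @ D) / s_c.var(axis=0).sum()
    t = mu_d - scale * R @ mu_s
    H = np.eye(3)
    H[:2, :2] = scale * R
    H[:2, 2] = t
    return H

ref = np.array([[192., 240.], [320., 240.], [256., 320.]])  # reference points
kps = ref * 1.5 + np.array([10., -5.])                      # detected points
H = fit_similarity(kps, ref)
warped = (H[:2, :2] @ kps.T).T + H[:2, 2]                   # maps kps onto ref
```

In practice the resulting H would drive the affine warp and crop of the whole image, not just the key points.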
Based on any one of the above embodiments, in order to obtain the causal effect of the expression posture parameters f_expo on the face dense key points f_mesh, the embodiment of the invention designs a Hierarchical Information Bottleneck module (HIB), which is composed of a plurality of information bottleneck layers, each of which is an independent three-layer convolutional neural network. By successively inserting these information bottleneck layers into the intermediate layers of the first face statistical network, compression of the f_expo-related information can be realized; the resulting network is the second face statistical network.
Here, the first face statistical network may determine the corresponding f_mesh and f_expo based on the input face image, and may be implemented with the 3DMM-based network M_3d(·). The parameters of the 3DMM, i.e. the weighting parameters of Principal Component Analysis (PCA), include three items: the face shape parameter f_shp, the head pose parameter, and the facial expression parameter, the latter two being collectively denoted as f_expo. In addition, f_mesh can also be determined from these three parameters.
Fig. 2 is a schematic flow chart of the causal effect determination method provided by the present invention; as shown in Fig. 2, the specific flow is as follows:
in the process of estimating f_mesh of the target face image X_t with the M_3d(·) network, the intermediate features of the network are extracted layer by layer; these intermediate features serve as the inputs of the respective information bottleneck layers, each of which predicts a channel-by-channel information mask from its intermediate feature, namely:
Here, every element of the information mask lies between 0 and 1, and the mask has the same spatial dimensions as the corresponding intermediate feature.
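A minimal stand-in for this mask-prediction step (the real bottleneck layer is a three-layer CNN; the single per-channel weight below is purely illustrative) shows how a sigmoid keeps every mask element strictly between 0 and 1 at the same spatial size as the feature:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_mask(feature, channel_weight):
    # feature: (C, H, W); channel_weight: per-channel scalar "conv" stand-in
    # for the three-layer convolutional network of the actual module
    return sigmoid(channel_weight[:, None, None] * feature)

rng = np.random.default_rng(0)
feature = rng.normal(size=(4, 5, 5))
mask = predict_mask(feature, rng.normal(size=4))
```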
Then, the information mask is applied to the intermediate feature, and random noise following the same distribution as the feature is injected at the positions the mask marks, achieving the purpose of information compression:
Here, the injected noise is random Gaussian noise, sampled from a Gaussian distribution with the same mean and variance as the intermediate feature.
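One plausible sketch of the compression step, assuming mask values near 1 mark the f_expo-relevant positions that get replaced by noise drawn from the feature's own mean and variance (the "same distribution" requirement above):

```python
import numpy as np

def bottleneck_compress(feature, mask, rng):
    # noise matched to the feature's own mean/variance avoids systematic error
    noise = rng.normal(feature.mean(), feature.std(), size=feature.shape)
    # mask ~1 marks f_expo-relevant positions: these are replaced by noise,
    # blocking f_expo-related information; mask ~0 leaves positions untouched
    return (1.0 - mask) * feature + mask * noise

rng = np.random.default_rng(0)
feature = rng.normal(2.0, 1.0, size=(4, 16))
mask = np.zeros_like(feature)
mask[:, :4] = 1.0        # pretend channels 0-3 carry f_expo information
compressed = bottleneck_compress(feature, mask, rng)
```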
To guide each information bottleneck layer to correctly predict its information mask, the embodiment of the invention designs an information bottleneck tradeoff equation, so that the value of each element of the mask corresponds to the importance of that position for expressing f_expo: the more relevant a neuron of the intermediate feature is to f_expo, the closer the value of the element at the corresponding position of the information mask is to 1; conversely, the closer it is to 0. The tradeoff equation is designed as follows:
Here, α (> 0) is a weight parameter, and the two terms of the equation play against each other:
(1) The first term is the average of the mutual information between each intermediate feature and its noise-injected counterpart, mutual information being denoted I(·;·); it measures the degree of information compression of the features after the injection of noise:
(2) The second term measures the predictive power of the noise-injected features for f_expo, governing how much f_expo-related information survives the compression:
Here, the predicted expression posture parameters are those obtained by replacing the original intermediate features of the network with their noise-injected counterparts and recomputing the network's output.
In the plurality of information masks thus learned, the value of an element represents the degree to which the corresponding intermediate-feature neuron carries f_expo-related information; through the masks, the information bottleneck layers can therefore replace the f_expo-related information with unrelated noise, achieving the purpose of information compression. In addition, to avoid introducing systematic errors, the injected noise follows the same distribution as the features it replaces. It should be noted that, in order to compress the f_expo-related information sufficiently, the information bottleneck layers are inserted successively into the 3D network M_3d(·), i.e. the compression effect of each bottleneck layer builds on that of the previous one.
The noise-injected features are used to replace the original intermediate features of the network, the 3D f_mesh estimation is carried out again, and the parameter vector before the final classification fully-connected layer is taken as f̃_vec; with the original intermediate features before replacement, the 3D f_mesh estimation likewise yields a parameter vector before the final classification fully-connected layer, taken as f_vec. Since the classification fully-connected layer only performs a classification function, according to the definition of causal effect in the interventionist causal view, the causal effect of f_expo on f_mesh can be determined from the change of the parameter vector before and after the intermediate-feature replacement, denoted as Δ(f_expo → f_mesh), i.e. the difference between f̃_vec and f_vec.
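In code, the interventionist definition amounts to a simple difference of the two parameter vectors (the vectors and the sign convention below are illustrative placeholders, not values from the patent):

```python
import numpy as np

# f_vec: parameter vector before the classification layer, original features;
# f_vec_tilde: the same vector after the intermediate features were replaced
# by their bottleneck-compressed versions. All numbers are placeholders.
f_vec = np.array([0.80, -0.20, 0.50, 0.10])
f_vec_tilde = np.array([0.30, -0.20, 0.10, 0.25])
delta_expo_to_mesh = f_vec_tilde - f_vec   # Δ(f_expo → f_mesh), sign assumed
```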
Then, causal effect migration is carried out according to the migration parameters in the face replacement model: the causal effect of f_expo on f_mesh is transferred to the identity information, finally obtaining the causal effect of f_expo on the identity information. Here, the migration parameters can be realized by a neural network implicit function g(·) with learnable parameters, in which case the causal effect of f_expo on the identity information can be expressed as the sum of g(Δ(f_expo → f_mesh)) and an exogenous perturbation term ε, where ε represents the influence of exogenous disturbances on identity estimation, such as the background and illumination in the face image.
Considering that the exogenous disturbance is complex and variable, so that no unified model can be established to estimate it, the embodiment of the invention simulates the exogenous disturbance in the face image by random sampling from a von Mises-Fisher (vMF) distribution with mean 0 and known concentration κ. The concentration κ can be obtained by recognizing 5000 face images in advance to obtain their identity-estimation encoding vectors, and then inferring the concentration from the standard deviation of these vectors.
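A hedged sketch of the sampling step: for large concentration κ, a vMF sample can be approximated by perturbing the mean direction with isotropic Gaussian noise of scale about 1/√κ and re-normalizing. This is an approximation, not the exact (e.g. Wood's) vMF sampling algorithm, and the mean direction and κ below are illustrative.

```python
import numpy as np

def approx_vmf_sample(mu, kappa, rng):
    # high-concentration approximation: small Gaussian jitter on the sphere
    x = mu + rng.normal(scale=1.0 / np.sqrt(kappa), size=mu.shape)
    return x / np.linalg.norm(x)

rng = np.random.default_rng(0)
mu = np.zeros(64)
mu[0] = 1.0                    # unit mean direction (illustrative)
sample = approx_vmf_sample(mu, kappa=1000.0, rng=rng)
```

In the patent's setting, κ itself would be inferred from the standard deviation of the pre-computed identity-estimation encoding vectors.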
In addition, an inverse equation constructed from the complement of the information mask can likewise be used to compress the f_expo-related information:
based on any of the above embodiments, the target face image XtThe perceptual information representation of (a) may be obtained in particular by:
first, a pre-trained face recognition network M_id(·) is used to calculate the original identity representation of X_t, and in this calculation the target feature representations of X_t are extracted from the intermediate layers of M_id(·). The target feature representations can then be input into the kernel regression network included in the face replacement model, which performs the kernel regression transformation on them; the kernel regression network may comprise a group of multiple nonlinear regressors.
Each nonlinear regressor is composed of a convolutional neural network and includes a regression kernel k^(i); the magnitude of each element of k^(i) represents the correlation of the corresponding neuron in M_id(·) with the identity representation. The regression kernel k^(i) can therefore be applied to the i-th target feature representation in M_id(·) as follows:
Here, the regression kernel k^(i) has the same size as the i-th target feature representation in M_id(·), and its element values lie between 0 and 1. It makes the regression transformation act only on the partial regions most relevant to the identity information, without changing the identity-irrelevant perception information contained in the feature, so that the perception information representation of X_t can be obtained.
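A toy version of this masked transformation (the region marked identity-relevant and the transform itself are arbitrary stand-ins) makes the locality explicit:

```python
import numpy as np

def kernel_regress(feature, k, transform):
    # k in [0,1]: ~1 where the neuron is identity-relevant, ~0 elsewhere.
    # Only the identity-relevant region is transformed; everything else
    # passes through unchanged, preserving identity-irrelevant perception
    # information such as pose and background.
    return k * transform(feature) + (1.0 - k) * feature

rng = np.random.default_rng(1)
feature = rng.normal(size=(8, 8))
k = np.zeros_like(feature)
k[2:5, 2:5] = 1.0                         # toy identity-relevant region
out = kernel_regress(feature, k, lambda f: np.zeros_like(f))
```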
Here, the regression kernel k^(i) is obtained by establishing an optimization equation with a constraint term and learning from the identity information representation of the sample source face image X_s together with the target feature representation and the original identity representation of the sample target face image X_t:
and establishing an optimization equation with a constraint term, learning a regression kernel required by kernel regression transformation in a feature space in a kernel regression network, and ensuring that the regression transformation only acts on a part related to the identity information in the feature representation without changing other perception information through the regression kernel. The optimization equation is designed as follows:
Here, the first symbol denotes the identity representation computed for a given face image X after the i-th target feature representation in the face recognition network M_id(·) is replaced; the second denotes the identity information representation of X_s, and the third denotes the original identity representation of X_t.
If the element values of k^(i) correctly represent the degree of correlation of each neuron with the identity representation, then the regions closely related to identity will be changed by the transformation function, and the resulting identity representation will differ greatly from the original one; likewise, when the transformation acts only on regions weakly related to identity while the identity-critical regions are preserved, the resulting identity representation should show no significant change relative to the original. In addition, during training, the cosine similarity between the transformed identity representation of X_t and the identity representation of the average face is made as large as possible, so that the identity representation of X_t moves closer to that of the "average face" and no longer carries the unique identity information of X_t.
Through this optimization equation with a constraint term, the regression kernel k^(i) is effectively supervised during training and can learn the kernel regression transformation of the high-dimensional feature space. Computing the perception information representation of X_t with the kernel regression network based on k^(i) decouples the identity information contained in X_t from the perception information, replaces the unique identity information in X_t with the non-unique identity information of the average face, and at the same time retains the background perception information required for face replacement.
In addition, in the process of computing the original identity representation of X_s with M_id(·), the source feature representations of X_s are extracted from the intermediate layers of M_id(·); the regression kernel k^(i) is then further used to compress the identity-irrelevant information in the source feature representations, realized specifically by injecting identically distributed Gaussian noise into the identity-irrelevant positions, so that a more compact and robust identity representation, i.e. the compact identity representation of X_s, can be obtained:
At the same time, k^(i) is required to satisfy the following constraint:
Here, I(·;·) denotes mutual information, which measures the degree of information compression of the features after the injection of noise; the second symbol denotes the identity representation computed for a given face image X after the i-th source feature representation in M_id(·) is replaced, and the last denotes the original identity representation of X_s. The original identity representation of X_s may be determined by inputting X_s into the M_id network and taking the feature vector before the final classification fully-connected layer as the original identity representation.
On this basis, according to the compact identity representation of X_s and the causal effect of the expression posture parameters of X_t on the identity information, an identity information representation of X_s more suitable for the identity replacement task can be obtained:
Here, λ is a weight hyperparameter.
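Numerically, the resulting estimator is just the compact identity shifted by the weighted causal effect; the vectors and the value of λ below are placeholders for illustration only:

```python
import numpy as np

# compact identity of X_s plus the weighted causal effect of X_t's
# expression posture parameters on identity; all values are placeholders
compact_identity = np.array([0.6, 0.1, -0.3])
causal_effect = np.array([0.05, -0.02, 0.08])
lam = 0.5  # weight hyperparameter lambda
identity_repr = compact_identity + lam * causal_effect
```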
Based on any of the above embodiments, the face replacement model further includes an adaptive generation network, which contains multiple Adaptive Instance Normalization (AdaIN) modules and is used to fuse the identity information representation of the source face image X_s with the perception information representation of the target face image X_t, thereby obtaining the final face-swapping result, i.e. the face replacement image. The specific process is as follows:
Here, the two pairs of affine parameters are calculated from the identity information representation and from the perception information representation respectively, and have the same size as the intermediate-layer result of the generation network. Each layer of the generation network is calculated as follows:
Here, the weight feature map is calculated from the previous layer and is used to fuse the two activations. The final face replacement image is obtained by performing one upsampling on the output of the last layer.
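A minimal AdaIN sketch shows the fusion mechanics: the perception feature is normalized per channel and re-modulated with affine parameters. In the actual model those parameters come from the identity and perception representations; the values below are illustrative stand-ins.

```python
import numpy as np

def adain(content, gamma, beta, eps=1e-5):
    # per-channel instance normalization followed by affine re-modulation
    mu = content.mean(axis=(1, 2), keepdims=True)
    sigma = content.std(axis=(1, 2), keepdims=True)
    return gamma[:, None, None] * (content - mu) / (sigma + eps) + beta[:, None, None]

rng = np.random.default_rng(0)
perception = rng.normal(size=(3, 4, 4))  # C x H x W target perception feature
gamma = np.array([2.0, 1.0, 0.5])        # scale derived from identity repr
beta = np.array([0.1, 0.0, -0.1])        # shift derived from identity repr
fused = adain(perception, gamma, beta)
```

After AdaIN, each channel's statistics follow the injected identity-derived parameters while the spatial layout of the perception feature is preserved.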
Based on any of the above embodiments, for the difficult scenes in which X_s and X_t differ greatly in facial expression, head posture, image background, illumination and the like, the present invention discloses a face replacement method based on inductive-bias estimation and image information decoupling, and provides a new face replacement model, namely the EVA model.
Fig. 3 is a schematic diagram of the computing framework of the face replacement model provided by the present invention. As shown in Fig. 3, the causal effect of f_expo on f_mesh is obtained through the controlled-variable intervention experiment under the counterfactual model (Rubin Causal Model, RCM), and is migrated by the neural network implicit function g(·) with learnable parameters; together with the influence on identity estimation of the exogenous perturbation sampled from the vMF distribution, this determines the inductive bias. The regression kernel k^(i) is learned through the IKE (invariant kernel regression) algorithm; with k^(i), the perception information representation of X_t is obtained, and further the compact identity representation of X_s is obtained. Then inductive-bias estimation is carried out: based on the compact identity representation and the inductive bias, the identity information representation of X_s is obtained. Finally, the generation network, i.e. generator G(·) in the figure, fuses the identity information representation with the perception information representation to generate the face replacement image.
Based on any of the above embodiments, the face replacement model can be trained end to end in an adversarial generation manner:
to supervise the end-to-end training of the model, the embodiment of the invention designs several loss terms. A causal loss term is obtained from the information bottleneck tradeoff equation and is used to supervise the learning of the causal-inference-based inductive-bias identity estimator:
A kernel regression loss term is obtained from the constraint equation of the regression kernel and is used to supervise the learning of the regression kernel, so that the regression transformation in the feature space acts only on the identity-related part of the target feature representation without changing other perception information:
where β is a weight parameter.
After the face replacement image is generated, the face recognition network M_id(·) is used again to extract its identity representation, and an identity loss term is constructed to supervise how well the identity information is retained in the face replacement image:
Here, sg[·] denotes the stop-gradient operation.
Here, the symbol denotes the feature representation extracted from the intermediate layers of M_id(·) in the process of computing the identity representation of the face replacement image.
Here, the two symbols denote the expression posture parameters of the target face image X_t computed by the face statistical network M_3d(·), and the expression posture parameters of the face replacement image computed by M_3d(·), respectively.
In addition, in order to increase the fidelity of the generated face replacement image, the embodiment of the invention also introduces a discrimination network for adversarial training and adds an adversarial loss term for improving the fidelity of the generated result. Finally, the overall objective function is:
Here, (w_1, w_2, w_3, w_4, w_5) are weight hyperparameters.
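The overall objective is simply a weighted sum of the individual loss terms; all numbers below are placeholders for illustration only:

```python
# Sketch of the overall objective: a weighted sum of the causal, kernel
# regression, identity, expression-posture, and adversarial loss terms,
# with the weights standing in for (w1..w5). Values are placeholders.
losses = {"causal": 0.8, "kernel": 0.3, "identity": 1.2, "expo": 0.5, "adv": 0.9}
weights = {"causal": 1.0, "kernel": 0.5, "identity": 2.0, "expo": 1.0, "adv": 0.1}
total_loss = sum(weights[name] * value for name, value in losses.items())
```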
Based on any of the above embodiments, Fig. 4 is a schematic flow chart of the face replacement model construction method provided by the present invention. As shown in Fig. 4, the model construction flow includes: first, the training data are processed, and the source face image X_s and the target face image X_t in the training data are aligned and cropped to a fixed size; second, the compact identity representation of X_s is calculated; causal inference is carried out through the controlled-variable intervention experiment, and according to the expression posture parameters of the face in X_t, the inductive bias that the identity representation of X_s should have in the target scene is estimated, i.e. the inductive identity bias estimator; according to the compact identity representation and the inductive identity bias estimator, the identity information representation of X_s containing the inductive bias is determined; then, kernel regression transformation is performed on the target feature representation of X_t to remove the unique identity information contained in X_t without changing the perception information contained therein, obtaining the perception information representation of X_t free of specific identity information; finally, a generation network is designed to perform adaptive feature fusion of the identity information representation of X_s and the kernel-regressed perception information representation of X_t to generate the face replacement image, and end-to-end training is performed in an adversarial generation manner, finally obtaining the EVA model.
In the testing (application) stage of the EVA model, the aligned and cropped X_s and X_t can be input into the trained EVA model to obtain the identity-replaced image, i.e. the face replacement image. The face replacement image has the same identity information as X_s and the same identity-irrelevant perception information as X_t, such as expression, posture, hair style and clothing.
The EVA model adopts an adversarial auto-encoding neural network as the main body of its learning framework, effectively learns the characteristics of the sample distribution, and generates high-quality, realistic face replacement images. When constructing the inductive identity bias estimator, causal inference is combined with the known information contained in X_t to model the source-face identity bias required by the identity replacement task, which improves the generalization capability of the model when the expression, posture, style and other information of X_s and X_t differ greatly.
Based on any of the above embodiments, the face replacement method disclosed by the invention performs attribution on the target scene conditions through causal inference, thereby estimating the inductive bias that the identity representation of the source face image X_s should have in the target scene; the identity representation of X_s is thus changed from the deterministic point estimate commonly used in the prior art to a new estimator that contains the uncertainty of the target scene. Causal inference under the condition that the target scene information is controlled is realized through the information bottleneck principle. Meanwhile, an optimization model with constraint terms, i.e. the kernel regression network, is constructed to learn the kernel regression transformation of the high-dimensional feature space; it decouples the identity information contained in the target face image X_t from the perception information, replaces the unique identity information in X_t with the non-unique identity information of the average face, and at the same time retains the background perception information required for face replacement.
Finally, the process of generating the face replacement image fuses the identity information representation of the source face image, which contains the inductive bias, with the identity-free perception information representation of the target face image, thereby realizing face replacement with high fidelity that generalizes to open scenes.
The beneficial effects of the invention are as follows: the invention provides a face identity replacement technology for arbitrary identities in open scenes. According to the method of the invention, natural and realistic identity replacement can be realized on unpaired face image data of arbitrary identities; even when the expression, posture, style and other information of X_s and X_t differ greatly, the identity information of X_s and the perception information of X_t are effectively kept, which greatly improves the stability and generalization capability of the face identity replacement method.
Based on any of the above embodiments, in order to verify the effectiveness of the method of the present invention, the EVA model is applied to a test set, quantitative evaluation indexes are calculated, and the results are compared with the most advanced existing face replacement methods, including FaceSwap, FSGAN (Face Swapping Generative Adversarial Network), DeepFakes, and FaceShifter. Table 1 shows the results of comparing EVA with the existing face replacement methods:
first, quantitative analysis and comparison are carried out: after each model generates its face replacement images, the identity retrieval results are judged with the face recognition network, including the identity accuracy (Accuracy; larger is better), the cosine similarity between the replacement result and the source face (larger is better), and the cosine similarity between the replacement result and the target face (smaller is better), as shown in the second column of Table 1 (a); the pose and expression of the replacement result are estimated with the 3D face statistical network and compared against those of the target face image X_t, yielding the Pose Error and the Expression Error (smaller is better), as shown in the third and fourth columns of Table 1 (a);
second, a user study is carried out: the EVA results and the results generated by the other methods are shown to users, who are asked to select the best generated result in terms of identity retention (Identity), pose and expression consistency (Pose and Expression), and image fidelity (Fidelity), as shown in Table 1 (b) (larger is better).
TABLE 1
As shown in the table, both evaluations show that the face replacement images generated by EVA have higher fidelity.
The following describes the causal inference-based highly-generalized face replacement device provided by the present invention, and the causal inference-based highly-generalized face replacement device described below and the causal inference-based highly-generalized face replacement method described above may be referred to correspondingly.
Based on any of the above embodiments, fig. 5 is a schematic structural diagram of a highly generalized face replacement device based on causal inference provided by the present invention, as shown in fig. 5, the device includes:
a determining module 510, configured to determine a source face image and a target face image;
a replacing module 520, configured to input the source face image and the target face image into a face replacing model, so as to obtain a face replacing image output by the face replacing model;
the face replacement model determines the identity information representation of a source face image based on the causal effect of the expression posture parameters of a target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image;
the face replacement model is obtained by training based on the sample source face image and the sample target face image.
The device provided by the embodiment of the present invention performs causal inference through the face replacement model to determine the causal effect of the expression posture parameters of the target face image on the identity information, thereby estimating how differences between the target face image and the source face image in expression, posture and the like affect the source identity representation. The identity information representation of the source face image is determined based on this causal effect, while the perception information representation of the target face image is effectively extracted; face replacement is then performed on this basis to obtain a high-quality, realistic face replacement image, improving the stability and generalization capability of face replacement technology across different target scenes.
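The inference flow the device implements can be sketched as follows. Every method name on `model` is a hypothetical placeholder standing in for the sub-networks named in the embodiments; this is a sketch of the data flow, not the patent's actual API:

```python
def face_swap(source_img, target_img, model):
    """Sketch of the inference flow: causal effect -> corrected source
    identity representation -> target perception representation -> output."""
    # 1. Causal effect of the target's expression/pose parameters on identity
    effect = model.causal_effect(target_img)
    # 2. Identity representation of the source, corrected by that effect
    identity_rep = model.identity_representation(source_img, effect)
    # 3. Perception (expression, pose, background) representation of the target
    perception_rep = model.perception_representation(target_img)
    # 4. Generate the face replacement image from both representations
    return model.generate(identity_rep, perception_rep)
```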
Based on any of the above embodiments, the face replacement model includes a first face statistics network; the first face statistical network is used for determining corresponding face dense key points based on the input face image;
the causal effect of the expression posture parameters of the target face image on the identity information is determined based on the following steps:
determining a causal effect of the expression posture parameters on the dense key points of the face based on the first face statistical network and a second face statistical network; the second face statistical network is obtained by inserting information bottleneck layers into the intermediate layers of the first face statistical network in sequence, and the information bottleneck layers are used for performing information compression on the expression posture parameters;
and based on the migration parameters in the face replacement model, migrating the causal effect of the expression posture parameters on dense key points of the face to obtain the causal effect of the expression posture parameters on the identity information.
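A minimal sketch of the two steps above, under two simplifying assumptions not stated in the embodiments: the causal effect is approximated as the change in predicted dense key points when the expression posture parameters are information-compressed, and the migration parameter is a single learned scalar weight:

```python
def causal_effect_on_keypoints(keypoints_full, keypoints_bottleneck):
    """Estimated causal effect of expression/pose on dense key points:
    the difference between the first network's prediction and the
    bottlenecked second network's prediction (a simplifying assumption)."""
    return [a - b for a, b in zip(keypoints_full, keypoints_bottleneck)]

def migrate_effect(effect_keypoints, migration_weight):
    """Transfer the keypoint-level effect to identity space via a learned
    migration parameter (modeled here as a scalar, an assumption)."""
    return [migration_weight * e for e in effect_keypoints]
```

In the real model the two key point sets would be the outputs of the first and second face statistical networks on the same target image.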
Based on any of the above embodiments, the face replacement model includes a face recognition network; the face recognition network is used for determining corresponding original identity representation based on the input face image;
the identity information representation of the source face image is determined based on the following steps:
inputting a source face image into a face recognition network, and extracting source characteristic representation of the source face image from an intermediate layer of the face recognition network;
and determining the identity information representation of the source face image based on the source feature representation and the causal effect.
Based on any of the above embodiments, determining the identity information representation of the source face image based on the source feature representation and the causal effect includes:
determining an updated source feature representation based on the source feature representation and a regression kernel in the face replacement model; the regression kernel is used for compressing information which is irrelevant to the identity in the source feature representation;
inputting the updated source feature representation into an intermediate layer of the face recognition network to obtain compact identity representation output by the face recognition network;
an identity information representation is determined based on the compact identity representation and the causal effect.
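Under the simplifying assumptions that the regression kernel acts as an element-wise gate on the source features and that the causal effect is removed from the compact identity representation by subtraction (neither form is stated explicitly in the embodiments), the two combination steps might look like:

```python
def updated_source_features(source_features, regression_kernel):
    """Element-wise gating that suppresses identity-irrelevant channels
    of the source feature representation (gating form is an assumption)."""
    return [k * f for k, f in zip(regression_kernel, source_features)]

def identity_representation(compact_identity, causal_effect):
    """Correct the compact identity representation by removing the
    estimated causal effect of the target's expression and pose."""
    return [c - e for c, e in zip(compact_identity, causal_effect)]
```

In the patent's pipeline the gated features are fed back into the face recognition network's intermediate layer to produce the compact identity representation before this correction is applied.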
Based on any of the above embodiments, the perceptual information representation of the target face image is determined based on the following steps:
inputting a target face image into a face recognition network, and extracting target feature representation of the target face image from an intermediate layer of the face recognition network;
based on the target feature representation, a perceptual information representation of the target face image is determined.
Based on any of the above embodiments, the face replacement model further includes a kernel regression network; the kernel regression network is used for removing specific identity information contained in the input data;
determining a perceptual information representation of the target face image based on the target feature representation, comprising:
and inputting the target feature representation into a kernel regression network to obtain the perception information representation of the target face image output by the kernel regression network.
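One classical way a kernel regression can strip instance-specific detail is Nadaraya-Watson smoothing over a bank of prototype features: the output keeps the expression/pose structure shared with the prototypes while averaging away identity-specific detail. This is only an illustrative stand-in for the patent's kernel regression network, and the prototype bank is an assumption:

```python
import math

def gaussian_kernel(u, v, bandwidth=1.0):
    """Gaussian similarity between two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2 * bandwidth ** 2))

def kernel_regression(target_features, prototypes, bandwidth=1.0):
    """Nadaraya-Watson smoothing: a kernel-weighted average of prototype
    features, which removes detail specific to the input identity."""
    weights = [gaussian_kernel(target_features, p, bandwidth) for p in prototypes]
    total = sum(weights)
    dim = len(target_features)
    return [sum(w * p[i] for w, p in zip(weights, prototypes)) / total
            for i in range(dim)]
```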
Based on any of the above embodiments, the kernel regression network is trained based on the identity information representation of the sample source face image, and the target feature representation and the original identity representation of the sample target face image determined by the face recognition network.
Fig. 6 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 6, the electronic device may include: a processor 610, a communications interface 620, a memory 630 and a communication bus 640, wherein the processor 610, the communications interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a causal-inference-based highly generalized face replacement method, comprising: determining a source face image and a target face image; inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model; the face replacement model determines the identity information representation of the source face image based on the causal effect of the expression posture parameters of the target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image; the face replacement model is obtained by training based on a sample source face image and a sample target face image.
In addition, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
In another aspect, the present invention further provides a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions, when the program instructions are executed by a computer, the computer being capable of executing the method for highly generalized face replacement based on causal inference provided by the above methods, the method including: determining a source face image and a target face image; inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model; the face replacement model determines the identity information representation of a source face image based on the causal effect of the expression posture parameters of a target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image; the face replacement model is obtained by training based on the sample source face image and the sample target face image.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program, which when executed by a processor, implements a method for performing causal inference based highly generalized face replacement provided by the above methods, the method comprising: determining a source face image and a target face image; inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model; the face replacement model determines the identity information representation of a source face image based on the causal effect of the expression posture parameters of a target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image; the face replacement model is obtained by training based on the sample source face image and the sample target face image.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A high-generalization face replacement method based on causal inference is characterized by comprising the following steps:
determining a source face image and a target face image;
inputting the source face image and the target face image into a face replacement model to obtain a face replacement image output by the face replacement model;
the face replacement model determines the identity information representation of the source face image based on the causal effect of the expression posture parameters of the target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image;
the face replacement model is obtained by training based on the sample source face image and the sample target face image.
2. The causal inference based highly generalized face replacement method of claim 1, wherein said face replacement model comprises a first face statistical network; the first face statistical network is used for determining corresponding dense key points of the face based on the input face image;
the causal effect of the expression posture parameters of the target face image on the identity information is determined based on the following steps:
determining a causal effect of the expression posture parameters on the dense key points of the face based on the first face statistical network and a second face statistical network; the second face statistical network is obtained by inserting information bottleneck layers into the intermediate layers of the first face statistical network in sequence, and the information bottleneck layers are used for performing information compression on the expression posture parameters;
and migrating the causal effect of the expression posture parameters on dense key points of the human face based on the migration parameters in the human face replacement model to obtain the causal effect of the expression posture parameters on the identity information.
3. The causal inference based highly generalized face replacement method of claim 1, wherein said face replacement model comprises a face recognition network; the face recognition network is used for determining corresponding original identity representation based on the input face image;
the identity information representation of the source face image is determined based on the following steps:
inputting the source face image into the face recognition network, and extracting source feature representation of the source face image from an intermediate layer of the face recognition network;
and determining the identity information representation of the source face image based on the source feature representation and the causal effect.
4. The causal inference based highly generalized face replacement method of claim 3, wherein said determining an identity information representation of said source face image based on said source feature representation and said causal effect comprises:
determining an updated source feature representation based on the source feature representation and a regression kernel in the face replacement model; the regression kernel is used for compressing information which is irrelevant to identity in the source feature representation;
inputting the updated source feature representation into an intermediate layer of the face recognition network to obtain a compact identity representation output by the face recognition network;
determining the identity information representation based on the compact identity representation and the causal effect.
5. The causal inference based highly generalized face replacement method of claim 3, wherein said perceptual information representation of the target face image is determined based on the following steps:
inputting the target face image into the face recognition network, and extracting target feature representation of the target face image from an intermediate layer of the face recognition network;
determining a perceptual information representation of the target face image based on the target feature representation.
6. The causal inference based highly generalized face replacement method of claim 5, wherein said face replacement model further comprises a kernel regression network; the kernel regression network is used for removing specific identity information contained in the input data;
the determining a perceptual information representation of the target face image based on the target feature representation comprises:
and inputting the target feature representation into the kernel regression network to obtain the perception information representation of the target face image output by the kernel regression network.
7. The causal inference based highly generalized face replacement method of claim 6, wherein the kernel regression network is trained based on the identity information representation of the sample source face image, and the target feature representation and original identity representation of the sample target face image determined by the face recognition network.
8. A highly generalized face replacement device based on causal inference, characterized by comprising:
the determining module is used for determining a source face image and a target face image;
the replacing module is used for inputting the source face image and the target face image into a face replacing model to obtain a face replacing image output by the face replacing model;
the face replacement model determines the identity information representation of the source face image based on the causal effect of the expression posture parameters of the target face image on the identity information, and performs face replacement based on the identity information representation and the perception information representation of the target face image;
the face replacement model is obtained by training based on the sample source face image and the sample target face image.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of the causal inference based highly generalized face replacement method according to any one of claims 1 to 7.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the causal inference based highly generalized face replacement method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111185354.4A CN113627404B (en) | 2021-10-12 | 2021-10-12 | High-generalization face replacement method and device based on causal inference and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113627404A true CN113627404A (en) | 2021-11-09 |
CN113627404B CN113627404B (en) | 2022-01-14 |
Family
ID=78391324
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111185354.4A Active CN113627404B (en) | 2021-10-12 | 2021-10-12 | High-generalization face replacement method and device based on causal inference and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113627404B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114220051A (en) * | 2021-12-10 | 2022-03-22 | 马上消费金融股份有限公司 | Video processing method, application program testing method and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111027465A (en) * | 2019-12-09 | 2020-04-17 | 韶鼎人工智能科技有限公司 | Video face replacement method based on illumination migration |
CN111275779A (en) * | 2020-01-08 | 2020-06-12 | 网易(杭州)网络有限公司 | Expression migration method, training method and device of image generator and electronic equipment |
CN111783603A (en) * | 2020-06-24 | 2020-10-16 | 有半岛(北京)信息科技有限公司 | Training method for generating confrontation network, image face changing method and video face changing method and device |
CN112766160A (en) * | 2021-01-20 | 2021-05-07 | 西安电子科技大学 | Face replacement method based on multi-stage attribute encoder and attention mechanism |
WO2021180114A1 (en) * | 2020-03-11 | 2021-09-16 | 广州虎牙科技有限公司 | Facial reconstruction method and apparatus, computer device, and storage medium |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111027465A (en) * | 2019-12-09 | 2020-04-17 | 韶鼎人工智能科技有限公司 | Video face replacement method based on illumination migration |
CN111275779A (en) * | 2020-01-08 | 2020-06-12 | 网易(杭州)网络有限公司 | Expression migration method, training method and device of image generator and electronic equipment |
WO2021180114A1 (en) * | 2020-03-11 | 2021-09-16 | 广州虎牙科技有限公司 | Facial reconstruction method and apparatus, computer device, and storage medium |
CN111783603A (en) * | 2020-06-24 | 2020-10-16 | 有半岛(北京)信息科技有限公司 | Training method for generating confrontation network, image face changing method and video face changing method and device |
CN112766160A (en) * | 2021-01-20 | 2021-05-07 | 西安电子科技大学 | Face replacement method based on multi-stage attribute encoder and attention mechanism |
Non-Patent Citations (1)
Title |
---|
ZHENG XIN et al.: "A Survey of Deep Facial Attribute Analysis", INTERNATIONAL JOURNAL OF COMPUTER VISION *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114220051A (en) * | 2021-12-10 | 2022-03-22 | 马上消费金融股份有限公司 | Video processing method, application program testing method and electronic equipment |
CN114220051B (en) * | 2021-12-10 | 2023-07-28 | 马上消费金融股份有限公司 | Video processing method, application program testing method and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN113627404B (en) | 2022-01-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zeng et al. | Srnet: Improving generalization in 3d human pose estimation with a split-and-recombine approach | |
CN110929622B (en) | Video classification method, model training method, device, equipment and storage medium | |
CN108780519B (en) | Structural learning of convolutional neural networks | |
CN111754396B (en) | Face image processing method, device, computer equipment and storage medium | |
Triastcyn et al. | Generating artificial data for private deep learning | |
CN109934197A (en) | Training method, device and the computer readable storage medium of human face recognition model | |
CN109711283A (en) | A kind of joint doubledictionary and error matrix block Expression Recognition algorithm | |
CN113822953A (en) | Processing method of image generator, image generation method and device | |
CN114724218A (en) | Video detection method, device, equipment and medium | |
Pérez-Cabo et al. | Learning to learn face-pad: a lifelong learning approach | |
CN110503113B (en) | Image saliency target detection method based on low-rank matrix recovery | |
He et al. | Finger vein image deblurring using neighbors-based binary-GAN (NB-GAN) | |
CN113627404B (en) | High-generalization face replacement method and device based on causal inference and electronic equipment | |
CN111860056B (en) | Blink-based living body detection method, blink-based living body detection device, readable storage medium and blink-based living body detection equipment | |
CN111259264A (en) | Time sequence scoring prediction method based on generation countermeasure network | |
CN110765843A (en) | Face verification method and device, computer equipment and storage medium | |
CN113657272A (en) | Micro-video classification method and system based on missing data completion | |
Raji et al. | Photo-guided exploration of volume data features | |
CN113011307A (en) | Face recognition identity authentication method based on deep residual error network | |
CN111737688A (en) | Attack defense system based on user portrait | |
CN110163049B (en) | Face attribute prediction method, device and storage medium | |
CN116543437A (en) | Occlusion face recognition method based on occlusion-feature mapping relation | |
Sang et al. | Image recognition based on multiscale pooling deep convolution neural networks | |
CN116311434A (en) | Face counterfeiting detection method and device, electronic equipment and storage medium | |
CN113449193A (en) | Information recommendation method and device based on multi-classification images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||