CN115631285B - Face rendering method, device, equipment and storage medium based on unified driving

Face rendering method, device, equipment and storage medium based on unified driving

Info

Publication number
CN115631285B
CN115631285B
Authority
CN
China
Prior art keywords
face
initial
model
rendering
face image
Prior art date
Legal status
Active
Application number
CN202211487137.5A
Other languages
Chinese (zh)
Other versions
CN115631285A (en)
Inventor
Wang Wenlan (王文斓)
Current Assignee
Beijing Hongmian Xiaoice Technology Co Ltd
Original Assignee
Beijing Hongmian Xiaoice Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Hongmian Xiaoice Technology Co Ltd
Priority to CN202211487137.5A
Publication of CN115631285A
Application granted
Publication of CN115631285B
Legal status: Active

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a face rendering method, device, equipment and storage medium based on unified driving, and relates to the technical field of artificial intelligence. The method comprises the following steps: obtaining target source data, wherein the target source data at least comprises an initial face image; inputting the initial face image into a preset driving model, and outputting three-dimensional face parameters corresponding to the initial face image, wherein the three-dimensional face parameters are obtained by the driving model extracting features from the initial face image and converting them; and rendering the three-dimensional face parameters according to a preset rendering strategy to generate a target face image. The embodiments provided by the invention not only improve driving precision but also decouple driving from rendering.

Description

Face rendering method, device, equipment and storage medium based on unified driving
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a face rendering method, device, equipment and storage medium based on unified driving.
Background
When artificial intelligence is used for three-dimensional face reconstruction, the existing AI-driven technical schemes can be analyzed in terms of their intermediate representation, and they mainly fall into the following categories:
First, end-to-end training with a latent variable (latent code) as the intermediate representation. For example, ID, voice and head pose are decoupled through contrastive learning, so that the generated driving video can acquire ID, mouth shape and head pose from different driving sources and be driven controllably, with the image generated through a GAN framework; or a person-specific rendered image is generated by decoupling eye blink and mouth within the expression, applying contrastive learning between mouth images and audio, and driving a radiance field with head pose, expression and mouth-shape features acquired from different driving sources. However, because the whole pipeline is trained end to end, driving and rendering are coupled: the learned 2D latent space is strongly tied to the generator used in the training stage, which makes it difficult to transfer the latent code to other generators. Moreover, since end-to-end training imposes no constraint or supervision on the learned latent code, even with the same generator and network structure, latent codes learned from different training data cannot be used interchangeably, so the coupling problem is severe.
Second, a latent code is again used as the intermediate representation with end-to-end training, but whereas the first kind of prior art models the virtual person from pictures, the second is expressed with the mesh and texture commonly used in computer graphics; that is, the latent code contains information about 3D geometric changes. For example, Deep Appearance Models represents geometry with the 3D coordinates of 7,306 vertices and texture with a 2D texture map, trains a person-specific end-to-end VAE so that its latent space encodes expression information, and decodes the geometry and texture corresponding to a latent code with separate geometry and texture decoders; driving is implemented by changing the latent code. Authentic Volumetric Avatars uses a position map instead of vertices to represent 3D geometry, and adds an ID encoder and a gaze encoder for explicit control of ID and gaze, supporting general (not person-specific) driving; the intermediate representation is still mainly expression, and the geometry and appearance decoders (conditioned on ID) decode the geometry and texture corresponding to the latent code. This scheme has the same problems as the first. In addition, because the training data introduce 3D geometric coordinates as input besides images (textures), the solution space stores more information than the first type and texture and geometry can be decoded separately, but the early-stage registration cost and the workload of constructing the training data are huge.
Third, a parameterized model is used as the intermediate representation: 3DMM (three-dimensional face statistical model) parameters of each face frame are first extracted from the driving source and the driven source respectively, then the parameters of the driven source are replaced with those of the driving source to obtain the driven 3DMM parameters, and a driving image is then decoded through a conditional decoder.
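The driving step of this third scheme can be sketched in a few lines; the data class and field names below are illustrative assumptions rather than the patent's actual interfaces:

```python
# Minimal sketch of parameter-swap driving: keep the driven subject's
# identity, take expression and pose from the driving source. The 3DMM
# field names here are assumed for illustration.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class FaceParams3DMM:
    identity: tuple    # shape (identity) coefficients of one frame
    expression: tuple  # expression coefficients of one frame
    pose: tuple        # head pose (e.g., rotation and translation)

def drive_frame(driven: FaceParams3DMM, driving: FaceParams3DMM) -> FaceParams3DMM:
    # The driven 3DMM parameters feed a conditional decoder downstream.
    return replace(driven, expression=driving.expression, pose=driving.pose)

# Per-frame swap over aligned driving/driven sequences.
driven_seq = [FaceParams3DMM((0.1,), (0.0,), (0.0,))]
driving_seq = [FaceParams3DMM((0.9,), (0.7,), (0.2,))]
driven_out = [drive_frame(a, b) for a, b in zip(driven_seq, driving_seq)]
```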
Disclosure of Invention
The invention provides a face rendering method, device, equipment and storage medium based on unified driving, which not only improve driving precision but also decouple driving from rendering.
In a first aspect, the present invention provides a face rendering method based on unified driving, the method comprising:
obtaining target source data, wherein the target source data at least comprises: an initial face image;
inputting the initial face image into a preset driving model, and outputting three-dimensional face parameters corresponding to the initial face image, wherein the three-dimensional face parameters are obtained by extracting and converting the initial face image by the driving model;
and rendering the three-dimensional face parameters according to a preset rendering strategy to generate a target face image.
Preferably, according to the face rendering method based on unified driving provided by the invention,
the step of obtaining the target source data at least comprises:
acquiring a plurality of initial data;
according to the attribute of each initial data, classifying, extracting and processing a plurality of initial data according to a preset attribute classification strategy to obtain a plurality of sub-data sets;
and screening out a target data set from the sub-data set, and screening out the target source data from the target data set.
Preferably, according to the face rendering method based on unified driving provided by the invention,
the driving model at least comprises: a latent variable extraction model and a latent variable conversion model;
the step of inputting the initial face image into a preset driving model and outputting three-dimensional face parameters corresponding to the initial face image comprises the following steps:
inputting the initial face image into the latent variable extraction model, and extracting face latent variable features corresponding to the initial face image;
and converting the human face latent variable characteristics into corresponding three-dimensional human face parameters by using the latent variable conversion model.
Preferably, according to the face rendering method based on unified driving provided by the invention,
the face latent variable feature at least comprises: a first face feature, a face action feature;
inputting the initial face image into the latent variable extraction model, extracting the face latent variable characteristics corresponding to the initial face image, and comprising the following steps:
inputting the initial face image into a preset face recognition model, and performing first extraction processing on the initial face image by using a first encoder in the face recognition model to obtain the corresponding first face feature; and
performing second extraction processing on the initial face image by using a second encoder in the face recognition model to obtain corresponding second face features;
and decoupling the second face features to generate a plurality of face action features.
Preferably, according to the face rendering method based on unified driving provided by the invention,
the step of converting the face latent variable feature into a corresponding three-dimensional face parameter by using the latent variable conversion model comprises the following steps:
acquiring a plurality of initial face parameters of a three-dimensional face statistical model;
constructing a mapping relation between each initial face parameter and the first face feature and the face action feature respectively;
and converting, based on the mapping relation, the first face feature and the face action feature respectively into the three-dimensional face parameters by using the multi-layer perceptron of the latent variable conversion model.
Preferably, according to the face rendering method based on unified driving provided by the invention,
rendering the three-dimensional face parameters according to a preset rendering strategy to generate a target face image, wherein the method comprises the following steps:
according to a preset rendering strategy, invoking an image rendering model corresponding to the rendering strategy;
and rendering the three-dimensional face parameters by using the image rendering model to generate the target face image.
In a second aspect, the present invention further provides a face rendering device based on unified driving, where the device includes:
the acquisition module is used for acquiring target source data, wherein the target source data at least comprises: an initial face image;
the identification module is used for inputting the initial face image into a preset driving model and outputting three-dimensional face parameters corresponding to the initial face image, wherein the three-dimensional face parameters are obtained by extracting and converting the initial face image by the driving model;
and the rendering module is used for rendering the three-dimensional face parameters according to a preset rendering strategy to generate a target face image.
In a third aspect, the present invention further provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the face rendering method based on unified driving as described in any one of the above when executing the program.
In a fourth aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the face rendering method based on unified driving as described in any one of the above.
In a fifth aspect, the present invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of a face rendering method based on unified driving as described in any one of the above.
The invention provides a face rendering method, device, equipment and storage medium based on unified driving, which are used for obtaining target source data, wherein the target source data at least comprises an initial face image; inputting the initial face image into a preset driving model, and outputting three-dimensional face parameters corresponding to the initial face image, wherein the three-dimensional face parameters are obtained by the driving model extracting and converting the initial face image; and rendering the three-dimensional face parameters according to a preset rendering strategy to generate a target face image. This not only improves driving precision but also decouples driving from rendering.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a face rendering method based on unified driving provided by the invention;
FIG. 2 is a second flow chart of a face rendering method based on unified driving provided by the invention;
FIG. 3 is a schematic flow chart of step S200 in FIG. 1 according to the present invention;
FIG. 4 is a third flow chart of a face rendering method based on unified driving according to the present invention;
fig. 5 is a schematic structural diagram of a face rendering device based on unified driving provided by the invention;
fig. 6 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The face rendering method, device, equipment and storage medium based on unified driving of the present invention are described below with reference to fig. 1 to 6.
As shown in fig. 1, which is a schematic flow chart of an implementation of a face rendering method based on unified driving according to an embodiment of the present invention, the face rendering method based on unified driving may include, but is not limited to, steps S100 to S300.
S100, obtaining target source data, wherein the target source data at least comprises: an initial face image;
S200, inputting the initial face image into a preset driving model, and outputting three-dimensional face parameters corresponding to the initial face image, wherein the three-dimensional face parameters are obtained by extracting and converting the initial face image by the driving model;
and S300, rendering the three-dimensional face parameters according to a preset rendering strategy to generate a target face image.
In step S100 of some embodiments, target source data is acquired.
It will be appreciated that the specific implementation steps may be: the computer program firstly acquires a plurality of initial data; then, according to the attribute of each initial data, classifying, extracting and processing a plurality of initial data according to a preset attribute classification strategy to obtain a plurality of sub-data sets; and screening out a target data set from the sub-data set, and screening out the target source data from the target data set.
The target source data at least includes: an initial face image.
In step S200 of some embodiments, the initial face image is input into a preset driving model, and three-dimensional face parameters corresponding to the initial face image are output.
It will be appreciated that, after the step of acquiring the target source data in step S100 is performed, the specific performing steps may be: inputting the initial face image into a preset face recognition model, and performing first extraction processing on the initial face image by using a first encoder in the face recognition model to obtain the corresponding first face feature; performing second extraction processing on the initial face image by using a second encoder in the face recognition model to obtain corresponding second face features; and decoupling the second face features to generate a plurality of face action features.
Acquiring a plurality of initial face parameters of the three-dimensional face statistical model; constructing a mapping relation between each initial face parameter and the first face feature and the face action feature respectively; and respectively converting the first face feature and the face action feature into the three-dimensional face parameters by using the multi-layer perceptron of the latent variable conversion model based on the mapping relation.
It should be noted that the three-dimensional face parameter is obtained by extracting and converting the initial face image by the driving model.
In step S300 of some embodiments, the three-dimensional face parameter is rendered according to a preset rendering policy, so as to generate a target face image.
It may be understood that after the step S200 of inputting the initial face image into a preset driving model and outputting the three-dimensional face parameters corresponding to the initial face image, the specific implementation steps may be: according to a preset rendering strategy, invoking an image rendering model corresponding to the rendering strategy; and rendering the three-dimensional face parameters by using the image rendering model to generate the target face image.
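Taken together, steps S100 to S300 reduce to the following skeleton. This is a sketch only: the callables stand in for the trained driving model and the renderers, whose exact interfaces are not fixed by the text.

```python
# Top-level flow of S100-S300. `driving_model` maps an image to 3D face
# parameters; `renderers` maps each preset rendering strategy to an image
# rendering model (both are placeholder callables).
def render_face(target_source_data, driving_model, rendering_strategy, renderers):
    initial_face_image = target_source_data["initial_face_image"]  # S100
    face_params_3d = driving_model(initial_face_image)             # S200
    renderer = renderers[rendering_strategy]                       # S300
    return renderer(face_params_3d)
```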
In some embodiments of the present invention, the step of acquiring the target source data at least includes:
acquiring a plurality of initial data;
according to the attribute of each initial data, classifying, extracting and processing a plurality of initial data according to a preset attribute classification strategy to obtain a plurality of sub-data sets;
and screening out a target data set from the sub-data set, and screening out the target source data from the target data set.
It can be understood that the computer program firstly obtains a plurality of initial data from a preset database, and then performs classification extraction processing on the obtained plurality of initial data according to a preset attribute classification strategy according to the attribute of each initial data to obtain a plurality of sub-data sets. And screening out a target data set from the plurality of sub-data sets according to the target attribute, and screening out target source data from the target data set.
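A minimal sketch of that classify-then-screen step follows; the attribute key and values ("structure", "unstructured"/"structured") are assumptions for illustration:

```python
# Group initial data into sub-datasets by a preset attribute, then screen
# out the target data set and the target source data.
from collections import defaultdict

def build_sub_datasets(initial_data):
    sub_datasets = defaultdict(list)
    for item in initial_data:
        sub_datasets[item["structure"]].append(item)  # attribute classification
    return sub_datasets

initial_data = [
    {"structure": "unstructured", "payload": "frame_0001.png"},
    {"structure": "structured", "payload": "keypoints_0001.json"},
]
subs = build_sub_datasets(initial_data)
target_dataset = subs["unstructured"]  # screen out the target data set
target_source = target_dataset[0]      # screen out the target source data
```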
In some embodiments, referring to fig. 2, the classification extraction process is performed on a plurality of initial data to obtain a plurality of sub-data sets, where the initial data at least includes, but is not limited to: non-semantic image data without semantic information, non-semantic video data, non-semantic audio data, semantic image data with semantic information, semantic video data, semantic audio data, etc.
The computer program performs classification extraction according to the structure attributes respectively corresponding to the plurality of initial data to obtain a plurality of sub-data sets.
It should be noted that the plurality of sub-data sets at least includes, but is not limited to: sub-dataset A, an unstructured driving source dataset (Unstructured Driving Source); and sub-dataset B, a structured driving source dataset (Structured Driving Source). Sub-dataset A is screened out of the plurality of sub-data sets as the target data set, and the target source data is screened out of the target data set. The target source data at least includes, but is not limited to, an initial face image.
The initial face image at least includes but is not limited to: an ID Image (face identity image), a Driving Image (face motion image), and Driving Audio (voice-driven motion data).
It should be further noted that sub-dataset B, the structured driving source dataset (Structured Driving Source), may be obtained in ways that include, but are not limited to: using a cluster analysis algorithm, or using conventional computer vision or signal processing methods.
More specifically, the step of obtaining sub-dataset B may be to extract semantic features from the unstructured driving source dataset to form sub-dataset B: for example, key points, semantic segmentation masks, depth maps, optical flow maps and motion trajectories extracted from semantic image or video data; or structured representations such as MFCC features, phonemes and pitch extracted from semantic audio data, which together form a structured driving source dataset.
Alternatively, structured driving source datasets may also be obtained from user portraits or business data, which may include, but are not limited to: user personality data, age data, gender data, and the like.
It should be noted that cluster analysis is the process of grouping a collection of physical or abstract objects into classes of similar objects: data are divided into classes or clusters such that objects within the same cluster are highly similar while objects in different clusters differ greatly. Cluster analysis is exploratory: no classification standard needs to be given in advance, since the classes are derived automatically from the sample data.
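As a concrete (toy) illustration, k-means from scikit-learn is one such cluster analysis algorithm that needs no labels given in advance:

```python
# Two obvious groups of 2D points; k-means recovers them without any
# predefined classification standard.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
print(labels)  # e.g., [0 0 1 1]: similar points fall into the same cluster
```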
It will be appreciated that sub-dataset B, the structured driving source dataset (Structured Driving Source), may include Driving Signal 1 through Driving Signal n, with corresponding Signal 1 Encoder through Signal n Encoder extracting Signal 1 latent through Signal n latent respectively.
As shown in fig. 3, in some embodiments of the present invention, the driving model at least includes: a latent variable extraction model and a latent variable conversion model, and the step S200 may include, but is not limited to, steps S310 to S320.
S310, inputting the initial face image into the latent variable extraction model, and extracting face latent variable features corresponding to the initial face image;
S320, converting the face latent variable features into corresponding three-dimensional face parameters by using the latent variable conversion model.
In step S310 of some embodiments, the initial face image is input into the latent variable extraction model, and a face latent variable feature corresponding to the initial face image is extracted.
It will be appreciated that the specific implementation steps may be: inputting the initial face image into a preset face recognition model by a computer program, and performing first extraction processing on the initial face image by using a first encoder in the face recognition model to obtain the corresponding first face feature; performing second extraction processing on the initial face image by using a second encoder in the face recognition model to obtain corresponding second face features; and decoupling the second face features to generate a plurality of face action features.
In step S320 of some embodiments, the latent variable feature of the face is converted into a corresponding three-dimensional face parameter using the latent variable conversion model.
It may be understood that, after the step of inputting the initial face image into the latent variable extraction model and extracting the face latent variable feature corresponding to the initial face image in step S310 is performed, the specific performing steps may be: firstly, acquiring a plurality of initial face parameters of a three-dimensional face statistical model; constructing a mapping relation between each initial face parameter and the first face feature and the face action feature respectively; and respectively converting the first face feature and the face action feature into the three-dimensional face parameters by using the multi-layer perceptron of the latent variable conversion model based on the mapping relation.
In some embodiments of the present invention, the face latent variable feature includes at least: a first face feature, a face action feature;
inputting the initial face image into the latent variable extraction model, extracting the face latent variable characteristics corresponding to the initial face image, and comprising the following steps:
inputting the initial face image into a preset face recognition model, and performing first extraction processing on the initial face image by using a first encoder in the face recognition model to obtain the corresponding first face feature; and
performing second extraction processing on the initial face image by using a second encoder in the face recognition model to obtain corresponding second face features;
and decoupling the second face features to generate a plurality of face action features.
It can be understood that the initial face image is input into a preset face recognition model, and a first extraction process is performed on the initial face image by using a first encoder in the face recognition model, so as to obtain the corresponding first face feature.
As shown in fig. 2, in some embodiments, the driving model at least includes: a latent variable extraction model (Latent Extractor) and a latent variable conversion model (Latent Converter). In the latent variable extraction model, the first encoder may be an ID Encoder, and the first face feature may be an ID latent variable feature (ID Latent).
It should be noted that face recognition is a biometric technology that identifies a person based on facial feature information. It covers a series of related technologies, commonly also called image recognition or face recognition, in which a camera captures images or video streams containing faces, the faces are automatically detected and tracked in the images, and recognition is then performed on the detected faces.
Face feature extraction, also known as face characterization, is the process of feature modeling of a face. Methods for extracting face features with a face recognition model generally fall into two kinds: knowledge-based characterization methods, and characterization methods based on algebraic features or statistical learning. The embodiments of the present application do not particularly limit the choice.
It should be further noted that the second encoder may be a Non-ID Encoder, and the second face feature may be a non-ID latent variable feature (Non-ID Latent).
After performing the second extraction processing on the initial face image by using the second encoder in the face recognition model to obtain the corresponding second face features, the second face features are decoupled to generate a plurality of face action features.
It can be appreciated that the face action features may at least be: head pose, eye blink, gaze (eye orientation), and expression.
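A minimal sketch of such a two-encoder extractor is given below; the backbone, feature dimensions and head names are illustrative assumptions (in practice the ID encoder would be a pretrained face-recognition backbone):

```python
# Latent extraction sketch: an ID encoder for the first face feature and a
# non-ID encoder whose output is decoupled into per-action latents.
import torch
import torch.nn as nn

class LatentExtractor(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        def backbone():  # stand-in for a CNN face encoder
            return nn.Sequential(nn.Flatten(),
                                 nn.Linear(3 * 64 * 64, 256), nn.ReLU(),
                                 nn.Linear(256, feat_dim))
        self.id_encoder = backbone()      # first encoder  -> ID Latent
        self.non_id_encoder = backbone()  # second encoder -> Non-ID Latent
        # Decoupling: one small head per face action feature.
        self.heads = nn.ModuleDict({
            name: nn.Linear(feat_dim, 32)
            for name in ("head_pose", "eye_blink", "gaze", "expression")})

    def forward(self, image):
        id_latent = self.id_encoder(image)
        non_id_latent = self.non_id_encoder(image)
        actions = {n: head(non_id_latent) for n, head in self.heads.items()}
        return id_latent, actions

extractor = LatentExtractor()
id_latent, actions = extractor(torch.randn(1, 3, 64, 64))
```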
It should be further noted that a pretrained Audio Encoder and the Non-ID Encoder may be used with contrastive learning to extract an audio latent variable feature (Audio Latent).
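The patent does not spell out the contrastive loss; one plausible formulation, stated here as an assumption, is an InfoNCE-style objective that aligns each frame's audio latent with the non-ID (mouth) latent of the same frame:

```python
# InfoNCE-style alignment: matching (audio, mouth) pairs sit on the
# diagonal of the similarity matrix and act as positives.
import torch
import torch.nn.functional as F

def info_nce(audio_latents, mouth_latents, temperature=0.07):
    a = F.normalize(audio_latents, dim=-1)
    m = F.normalize(mouth_latents, dim=-1)
    logits = a @ m.t() / temperature        # (B, B) cosine similarities
    targets = torch.arange(a.size(0))       # positives on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
```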
In some embodiments of the present invention, the converting the face latent variable feature into a corresponding three-dimensional face parameter using the latent variable conversion model includes:
acquiring a plurality of initial face parameters of a three-dimensional face statistical model;
constructing a mapping relation between each initial face parameter and the first face feature and the face action feature respectively;
and respectively converting the first face feature and the face action feature into the three-dimensional face parameters by using the multi-layer perceptron of the latent variable conversion model based on the mapping relation.
It can be understood that in the latent variable conversion model (Latent Converter), a plurality of initial face parameters of a three-dimensional face statistical model (3DMM, 3D Morphable Model) are first obtained, and the mapping between the decoupled features and the corresponding parameters is then learned through a multi-layer perceptron (MLP); that is, the multi-layer perceptron constructs a mapping relation between each initial face parameter and the first face feature and the face action features, and, based on this mapping relation, the multi-layer perceptron of the latent variable conversion model converts the first face feature and the face action features into the three-dimensional face parameters respectively.
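A sketch of such a converter, with one small MLP per 3DMM parameter group, is shown below; the input and output dimensions are illustrative assumptions (typical 3DMM shape and expression bases have tens to a few hundred coefficients):

```python
# Latent-to-3DMM conversion sketch: each MLP realizes the learned mapping
# from a decoupled latent feature to one group of 3DMM parameters.
import torch
import torch.nn as nn

class LatentConverter(nn.Module):
    def __init__(self, in_dim=128,
                 out_dims=(("shape", 100), ("pose", 6), ("expression", 50))):
        super().__init__()
        self.mlps = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                nn.Linear(64, dim))
            for name, dim in out_dims})

    def forward(self, latent):
        return {name: mlp(latent) for name, mlp in self.mlps.items()}

converter = LatentConverter()
params = converter(torch.randn(1, 128))  # {'shape', 'pose', 'expression'}
```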
Optionally, as shown in fig. 2, the three-dimensional face parameters at least include but are not limited to Shape β (shape three-dimensional face parameters), Pose θ (pose three-dimensional face parameters), and Expression ψ (expression three-dimensional face parameters).
It should be noted that the multi-layer perceptron (MLP, Multilayer Perceptron) is a feedforward artificial neural network model that maps a set of input vectors to a set of output vectors.
It should be noted that the three-dimensional face statistical model is a general three-dimensional face model that represents faces with a fixed number of points. Its key idea is that faces can be placed in dense one-to-one correspondence in three-dimensional space, so that any face can be obtained as a weighted linear combination of an orthogonal basis derived from a set of other faces.
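In the standard linear formulation (the notation is assumed, matching the β and ψ of fig. 2), this weighted combination reads:

```latex
% \bar{S}: mean face; s_i: identity (shape) basis vectors; e_j: expression
% basis vectors; \beta, \psi: the shape and expression coefficients above.
S(\beta, \psi) = \bar{S} + \sum_i \beta_i \, s_i + \sum_j \psi_j \, e_j
```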
In some embodiments of the present invention, the rendering the three-dimensional face parameter according to a preset rendering policy to generate a target face image includes:
according to a preset rendering strategy, invoking an image rendering model corresponding to the rendering strategy;
and rendering the three-dimensional face parameters by using the image rendering model to generate the target face image.
It can be understood that, in the rendering layer, according to the preset rendering strategy, the image rendering model corresponding to the rendering strategy is called, and the three-dimensional face parameters are then rendered by using the image rendering model to generate the target face image.
It should be noted that the image rendering model may at least be: a 3D CG model, neural rendering, or a GAN/diffusion model.
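The rendering layer can therefore be pictured as a registry keyed by the preset rendering strategy; the keys and placeholder renderers below are assumptions for illustration:

```python
# Strategy-to-renderer dispatch: the same 3DMM parameters can be handed to
# a 3D CG renderer, a neural renderer, or a GAN/diffusion model.
RENDERERS = {
    "3d_cg": lambda params: f"CG mesh render of {sorted(params)}",
    "neural": lambda params: f"neural render of {sorted(params)}",
    "gan_diffusion": lambda params: f"generative render of {sorted(params)}",
}

def render(face_params_3d, strategy="neural"):
    renderer = RENDERERS[strategy]  # invoke the model for this strategy
    return renderer(face_params_3d)

image = render({"shape": [0.1], "pose": [0.0], "expression": [0.2]},
               strategy="3d_cg")
```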
As shown in fig. 4, which is the third flow chart of the face rendering method based on unified driving provided by the invention: an initial face image is first obtained; the latent variable extraction model of the driving model extracts a plurality of latent variable features from the initial face image; the latent variable conversion model then converts the latent variable features into the corresponding three-dimensional face parameters through the multi-layer perceptron; finally, the image rendering model corresponding to a preset rendering strategy is invoked, and the three-dimensional face parameters are rendered by the image rendering model to generate the target face image.
Wherein the latent variable features at least include: ID, head pose, eye blink/gaze, expression, and audio/mouth (mouth opening).
The three-dimensional face parameters include, but are not limited to, Shape (shape three-dimensional face parameters), Pose (pose three-dimensional face parameters), and Expression (expression three-dimensional face parameters).
Pose at least includes, but is not limited to, neck pose and jaw pose (mandibular pose).
The face rendering based on unified driving provided by the invention is realized by obtaining target source data, wherein the target source data at least comprises an initial face image; inputting the initial face image into a preset driving model and outputting the three-dimensional face parameters corresponding to the initial face image, wherein the three-dimensional face parameters are obtained by the driving model extracting and converting the initial face image; and rendering the three-dimensional face parameters according to a preset rendering strategy to generate a target face image. This not only improves driving precision but also decouples driving from rendering.
The invention provides a hybrid unified driving framework, which decouples driving from rendering while keeping driving precision consistent with, or even better than, latent-code schemes. The driving framework provided by the invention therefore needs only one end-to-end training; any renderer or rendering mode can then be substituted and used directly, with only the rendering part needing to be trained. This saves repeatedly retraining the driving model, guarantees consistency of the driving effect (the driving model is trained once and used many times), achieves the purpose of unified driving, and improves the expressive power of the 3D parameterization.
The invention is mainly described with respect to the human face, but in practice it may be applied to other components, including but not limited to the body and limbs (e.g., using the parameterized representation SMPL) and the hands (e.g., using the parameterized representation MANO). The components of a unified whole-body model such as SMPL-X, which unifies the head (FLAME), body (SMPL) and hands (MANO), can form a driving engine with a unified representation, although the driving mode may differ from component to component.
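A sketch of such a per-component driving engine follows; the dispatch table and converter interface are illustrative assumptions (only the component/model pairing comes from the text):

```python
# One driving engine, several parameterized components: each component is
# driven in its own way but emits parameters of its parameterized model.
COMPONENT_MODELS = {
    "head": "FLAME",  # facial shape/expression parameters
    "body": "SMPL",   # body pose/shape parameters
    "hand": "MANO",   # hand articulation parameters
}

def drive_component(component, latents, converters):
    model = COMPONENT_MODELS[component]
    return {"model": model, "params": converters[component](latents)}

converters = {c: (lambda z: {"theta": z}) for c in COMPONENT_MODELS}
out = drive_component("head", [0.1, 0.2], converters)
```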
The face rendering device based on the unified driving provided by the invention is described below, and the face rendering device based on the unified driving described below and the face rendering method based on the unified driving described above can be correspondingly referred to each other.
Referring to fig. 5, a schematic structural diagram of a face rendering device based on unified driving according to the present invention is shown, where the device includes:
an obtaining module 510, configured to obtain target source data, where the target source data at least includes: an initial face image;
the recognition module 520 is configured to input the initial face image into a preset driving model, and output three-dimensional face parameters corresponding to the initial face image, where the three-dimensional face parameters are obtained by extracting and converting the initial face image by the driving model;
The rendering module 530 is configured to render the three-dimensional face parameter according to a preset rendering policy, and generate a target face image.
Optionally, according to the face rendering device based on unified driving provided by the present invention, the obtaining module 510 is configured to obtain a plurality of initial data; according to the attribute of each initial data, classifying, extracting and processing a plurality of initial data according to a preset attribute classification strategy to obtain a plurality of sub-data sets; and screening out a target data set from the sub-data set, and screening out the target source data from the target data set.
Optionally, the face rendering device based on unified driving according to the present invention, the driving model at least includes: a latent variable extraction model and a latent variable conversion model, and a recognition module 520, configured to input the initial face image into the latent variable extraction model, and extract a face latent variable feature corresponding to the initial face image;
and converting the human face latent variable characteristics into corresponding three-dimensional human face parameters by using the latent variable conversion model.
Optionally, according to the face rendering device based on unified driving provided by the invention, the face latent variable feature at least includes: the recognition module 520 is configured to input the initial face image into a preset face recognition model, and perform a first extraction process on the initial face image by using a first encoder in the face recognition model to obtain a corresponding first face feature; performing second extraction processing on the initial face image by using a second encoder in the face recognition model to obtain corresponding second face features; and decoupling the second face features to generate a plurality of face action features.
Optionally, according to the face rendering device based on unified driving provided by the present invention, the recognition module 520 is configured to obtain a plurality of initial face parameters of the three-dimensional face statistical model; constructing a mapping relation between each initial face parameter and the first face feature and the face action feature respectively; and respectively converting the first face feature and the face action feature into the three-dimensional face parameters by using the multi-layer perceptron of the latent variable conversion model based on the mapping relation.
Optionally, according to the face rendering device based on unified driving provided by the present invention, the rendering module 530 is configured to invoke, according to a preset rendering policy, an image rendering model corresponding to the rendering policy; and rendering the three-dimensional face parameters by using the image rendering model to generate the target face image.
The invention provides a face rendering device based on unified driving, which obtains target source data, wherein the target source data at least comprises an initial face image; inputs the initial face image into a preset driving model and outputs three-dimensional face parameters corresponding to the initial face image, wherein the three-dimensional face parameters are obtained by the driving model extracting and converting the initial face image; and renders the three-dimensional face parameters according to a preset rendering strategy to generate a target face image. This not only improves driving precision but also decouples driving from rendering.
Fig. 6 illustrates a physical schematic diagram of an electronic device, as shown in fig. 6, which may include: processor 610, communication interface (Communications Interface) 620, memory 630, and communication bus 640, wherein processor 610, communication interface 620, and memory 630 communicate with each other via communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a unified drive-based face rendering method comprising: obtaining target source data, wherein the target source data at least comprises: an initial face image; inputting the initial face image into a preset driving model, and outputting three-dimensional face parameters corresponding to the initial face image, wherein the three-dimensional face parameters are obtained by extracting and converting the initial face image by the driving model; and rendering the three-dimensional face parameters according to a preset rendering strategy to generate a target face image.
Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product, where the computer program product includes a computer program, where the computer program can be stored on a non-transitory computer readable storage medium, and when the computer program is executed by a processor, the computer can execute the face rendering method based on unified driving provided by the above methods, and the method includes: obtaining target source data, wherein the target source data at least comprises: an initial face image; inputting the initial face image into a preset driving model, and outputting three-dimensional face parameters corresponding to the initial face image, wherein the three-dimensional face parameters are obtained by extracting and converting the initial face image by the driving model; and rendering the three-dimensional face parameters according to a preset rendering strategy to generate a target face image.
In still another aspect, the present invention further provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the method for face rendering based on unified driving provided by the above methods, the method comprising: obtaining target source data, wherein the target source data at least comprises: an initial face image; inputting the initial face image into a preset driving model, and outputting three-dimensional face parameters corresponding to the initial face image, wherein the three-dimensional face parameters are obtained by extracting and converting the initial face image by the driving model; and rendering the three-dimensional face parameters according to a preset rendering strategy to generate a target face image.
The apparatus embodiments described above are merely illustrative; units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without inventive effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general hardware platform, or of course by means of hardware. Based on this understanding, the foregoing technical solution, in essence, or the part of it contributing to the prior art, may be embodied in the form of a software product stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disk, comprising several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in each embodiment or in some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A face rendering method based on unified driving, characterized by comprising the following steps:
obtaining target source data, wherein the target source data at least comprises: an initial face image;
inputting the initial face image into a preset driving model, and outputting three-dimensional face parameters corresponding to the initial face image, wherein the three-dimensional face parameters are obtained by extracting and converting the initial face image by the driving model; wherein the driving model at least comprises: a latent variable extraction model and a latent variable conversion model;
the step of inputting the initial face image into a preset driving model and outputting three-dimensional face parameters corresponding to the initial face image comprises the following steps:
inputting the initial face image into the latent variable extraction model, and extracting face latent variable features corresponding to the initial face image, wherein the face latent variable features at least comprise: a first face feature and a face action feature;
converting the latent variable characteristics of the human face into corresponding three-dimensional human face parameters by utilizing the latent variable conversion model;
the step of inputting the initial face image into the latent variable extraction model and extracting the face latent variable features corresponding to the initial face image comprises the following steps:
inputting the initial face image into a preset face recognition model, and performing first extraction processing on the initial face image by using a first encoder in the face recognition model to obtain the corresponding first face feature; and
performing second extraction processing on the initial face image by using a second encoder in the face recognition model to obtain corresponding second face features;
decoupling the second face features to generate a plurality of face action features;
the step of converting the face latent variable feature into the corresponding three-dimensional face parameter by using the latent variable conversion model comprises the following steps:
acquiring a plurality of initial face parameters of a three-dimensional face statistical model;
constructing a mapping relation between each initial face parameter and the first face feature and the face action feature respectively;
based on the mapping relation, converting the first face feature and the face action feature into the three-dimensional face parameters by using the multi-layer perceptron of the latent variable conversion model;
and rendering the three-dimensional face parameters according to a preset rendering strategy to generate a target face image.
2. The face rendering method based on unified driving of claim 1, wherein,
the step of obtaining the target source data at least comprises:
acquiring a plurality of initial data;
according to the attribute of each initial data, classifying, extracting and processing a plurality of initial data according to a preset attribute classification strategy to obtain a plurality of sub-data sets;
and screening out a target data set from the plurality of sub-data sets, and screening out the target source data from the target data set.
3. The face rendering method based on unified driving of claim 2, wherein,
rendering the three-dimensional face parameters according to a preset rendering strategy to generate a target face image, wherein the method comprises the following steps:
According to a preset rendering strategy, invoking an image rendering model corresponding to the rendering strategy;
and rendering the three-dimensional face parameters by using the image rendering model to generate the target face image.
4. A face rendering device based on unified driving, the device comprising:
the acquisition module is used for acquiring target source data, wherein the target source data at least comprises: an initial face image;
the identification module is used for inputting the initial face image into a preset driving model and outputting three-dimensional face parameters corresponding to the initial face image, wherein the three-dimensional face parameters are obtained by extracting and converting the initial face image by the driving model; wherein the driving model at least comprises: a latent variable extraction model and a latent variable conversion model; the step of inputting the initial face image into a preset driving model and outputting three-dimensional face parameters corresponding to the initial face image is used for inputting the initial face image into the latent variable extraction model and extracting face latent variable features corresponding to the initial face image; wherein, the human face latent variable characteristics at least comprise: a first face feature, a face action feature; converting the latent variable characteristics of the human face into corresponding three-dimensional human face parameters by utilizing the latent variable conversion model; the step of inputting the initial face image into the latent variable extraction model and extracting the face latent variable features corresponding to the initial face image is used for: inputting the initial face image into a preset face recognition model, and performing first extraction processing on the initial face image by using a first encoder in the face recognition model to obtain the corresponding first face feature; performing second extraction processing on the initial face image by using a second encoder in the face recognition model to obtain corresponding second face features; decoupling the second face features to generate a plurality of face action features; the step of converting the face latent variable feature into a corresponding three-dimensional face parameter by using the latent variable conversion model is used for: acquiring a plurality of initial face parameters of a three-dimensional face statistical model; constructing a mapping relation between each initial face parameter and the first face feature and the face action feature respectively; based on the mapping relation, converting the first face feature and the face action feature into the three-dimensional face parameters by using the multi-layer perceptron of the latent variable conversion model;
and the rendering module is used for rendering the three-dimensional face parameters according to a preset rendering strategy to generate a target face image.
5. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the face rendering method based on unified driving according to any one of claims 1 to 3 when executing the program.
6. A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the face rendering method based on unified driving according to any one of claims 1 to 3.
CN202211487137.5A 2022-11-25 2022-11-25 Face rendering method, device, equipment and storage medium based on unified driving Active CN115631285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211487137.5A CN115631285B (en) 2022-11-25 2022-11-25 Face rendering method, device, equipment and storage medium based on unified driving

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211487137.5A CN115631285B (en) 2022-11-25 2022-11-25 Face rendering method, device, equipment and storage medium based on unified driving

Publications (2)

Publication Number Publication Date
CN115631285A CN115631285A (en) 2023-01-20
CN115631285B (en) 2023-05-02

Family

ID=84911015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211487137.5A Active CN115631285B (en) 2022-11-25 2022-11-25 Face rendering method, device, equipment and storage medium based on unified driving

Country Status (1)

Country Link
CN (1) CN115631285B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020140832A1 (en) * 2019-01-04 2020-07-09 北京达佳互联信息技术有限公司 Three-dimensional facial reconstruction method and apparatus, and electronic device and storage medium
CN113313085A (en) * 2021-07-28 2021-08-27 北京奇艺世纪科技有限公司 Image processing method and device, electronic equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10565305B2 (en) * 2016-11-18 2020-02-18 Salesforce.Com, Inc. Adaptive attention model for image captioning
CN114282895A (en) * 2021-12-22 2022-04-05 中国农业银行股份有限公司 Data processing method and device, electronic equipment and storage medium
CN115205949B (en) * 2022-09-05 2022-12-06 腾讯科技(深圳)有限公司 Image generation method and related device
CN115356953B (en) * 2022-10-21 2023-02-03 北京红棉小冰科技有限公司 Virtual robot decision method, system and electronic equipment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020140832A1 (en) * 2019-01-04 2020-07-09 北京达佳互联信息技术有限公司 Three-dimensional facial reconstruction method and apparatus, and electronic device and storage medium
CN113313085A (en) * 2021-07-28 2021-08-27 北京奇艺世纪科技有限公司 Image processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN115631285A (en) 2023-01-20

Similar Documents

Publication Publication Date Title
US20210174072A1 (en) Microexpression-based image recognition method and apparatus, and related device
Seow et al. A comprehensive overview of Deepfake: Generation, detection, datasets, and opportunities
WO2024051445A1 (en) Image generation method and related device
WO2022052530A1 (en) Method and apparatus for training face correction model, electronic device, and storage medium
US20220398797A1 (en) Enhanced system for generation of facial models and animation
CN110796593A (en) Image processing method, device, medium and electronic equipment based on artificial intelligence
US11887232B2 (en) Enhanced system for generation of facial models and animation
US20220398795A1 (en) Enhanced system for generation of facial models and animation
CN113807265A (en) Diversified human face image synthesis method and system
CN115565238A (en) Face-changing model training method, face-changing model training device, face-changing model training apparatus, storage medium, and program product
CN116129013A (en) Method, device and storage medium for generating virtual person animation video
CN113657272B (en) Micro video classification method and system based on missing data completion
CN112200236B (en) Training method of face parameter identification model and face parameter identification method
Li et al. End-to-end training for compound expression recognition
CN113542758A (en) Generating antagonistic neural network assisted video compression and broadcast
CN115914505B (en) Video generation method and system based on voice-driven digital human model
CN115457374B (en) Deep pseudo-image detection model generalization evaluation method and device based on reasoning mode
CN115631285B (en) Face rendering method, device, equipment and storage medium based on unified driving
CN112990123B (en) Image processing method, apparatus, computer device and medium
CN116385606A (en) Speech signal driven personalized three-dimensional face animation generation method and application thereof
CN115690276A (en) Video generation method and device of virtual image, computer equipment and storage medium
CN114694074A (en) Method, device and storage medium for generating video by using image
CN114529785A (en) Model training method, video generation method and device, equipment and medium
CN113542759A (en) Generating antagonistic neural network assisted video reconstruction
Singh et al. Facial emotion detection using action units

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Wang Wenlan; Wang Duomin; Wang Baoyuan
Inventor before: Wang Wenlan