CN113744129A - Semantic neural rendering-based face image generation method and system - Google Patents


Info

Publication number
CN113744129A
Authority
CN
China
Prior art keywords
image
network
face image
deformation
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111050013.6A
Other languages
Chinese (zh)
Inventor
陈元祺
任俞睿
龙仕强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute Of Intelligent Video Audio Technology Longgang Shenzhen
Original Assignee
Institute Of Intelligent Video Audio Technology Longgang Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute Of Intelligent Video Audio Technology Longgang Shenzhen
Priority to CN202111050013.6A priority Critical patent/CN113744129A/en
Publication of CN113744129A publication Critical patent/CN113744129A/en
Pending legal-status Critical Current


Classifications

    • G06T3/04
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/207Analysis of motion for motion estimation over a hierarchy of resolutions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Abstract

A face image generation method based on semantic neural rendering comprises the following steps: S1, a mapping network generates a hidden vector from a target facial motion descriptor; S2, under the guidance of the hidden vector, a deformation network estimates the accurate deformation between the source face image and the desired target image, and warps the source face image with the estimated deformation parameters to generate a coarse deformed image; and S3, an editing network generates the final fine image from the coarse deformed image. The method generates images with more accurate motion: it produces more realistic results and precise movement while still preserving the identity information of the source face image. It can generate not only realistic images with the correct global pose but also vivid micro-expressions, such as pouting and raising eyebrows. In addition, motion-irrelevant information in the source face image is well preserved.

Description

Semantic neural rendering-based face image generation method and system
Technical Field
The invention relates to face image generation and neural rendering, in particular to a face image generation method and a face image generation system based on semantic neural rendering.
Background
A face image is one of the most important types of photographic content, widely used in daily life. Editing portrait images by modifying the pose and expression of a given face is an important task with a variety of application scenarios. However, achieving such editing is extremely challenging, as it requires automatically perceiving the 3D geometry of any given face so that the result remains authentic. At the same time, the acuity of the human visual system to portrait images requires algorithms to generate realistic faces and backgrounds, which makes the task more difficult.
To achieve intuitive control, the motion descriptors should be semantically meaningful, which requires representing facial expressions, head rotations and translations as completely decoupled variables. Parametric face modeling methods provide a powerful tool for describing 3D faces with semantic parameters; they allow the shape, expression and other characteristics of a 3D face to be controlled through parameters. Combined with the priors of these techniques, one may hope to control the generation of realistic face images in a way similar to a graphics rendering process. Currently, some model-based methods combine rendered images of a three-dimensional morphable face model (3DMM) and edit portrait images by modifying expression or pose parameters. These methods achieve impressive results, but they are specific to a target person, which means they cannot be applied to arbitrary portraits.
In 3DMM, the 3D shape S of a face is parameterized as:

S = S̄ + B_id · α + B_exp · β

where S̄ is the average 3D face shape, and B_id and B_exp are the identity and expression bases obtained by principal component analysis (PCA) over 200 scanned faces. The parameters α and β are 80-dimensional and 64-dimensional, respectively, and describe the identity and expression features of the face. The rotation and translation of the face are expressed as R ∈ SO(3) and t ∈ R³. Thus, the motion information of a human face can be fully expressed by (β, R, t) in 3DMM.
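The parameterization above can be illustrated numerically. A minimal numpy sketch, assuming toy random bases in place of real PCA bases (only the 80/64 parameter dimensions come from the text; the vertex count and all values are hypothetical):

```python
import numpy as np

def assemble_3dmm_shape(S_mean, B_id, B_exp, alpha, beta, R, t):
    """Assemble a 3DMM face shape: S = S_mean + B_id @ alpha + B_exp @ beta,
    then apply the rigid motion (R, t). Shapes are stored as (3N,) vectors."""
    S = S_mean + B_id @ alpha + B_exp @ beta   # identity + expression offsets
    V = S.reshape(-1, 3)                       # N vertices in 3D
    return V @ R.T + t                         # rotate and translate

rng = np.random.default_rng(0)
N = 1000                                       # toy vertex count (assumption)
S_mean = rng.normal(size=3 * N)                # mean face shape
B_id = rng.normal(size=(3 * N, 80))            # identity basis (80-dim alpha)
B_exp = rng.normal(size=(3 * N, 64))           # expression basis (64-dim beta)
alpha = rng.normal(size=80)
beta = rng.normal(size=64)
R = np.eye(3)                                  # identity rotation in SO(3)
t = np.array([0.0, 0.0, 0.1])                  # small translation

V = assemble_3dmm_shape(S_mean, B_id, B_exp, alpha, beta, R, t)
print(V.shape)  # (1000, 3)
```

With fixed identity α, only (β, R, t) change the rendered face, which is why they serve as the motion descriptor.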
Disclosure of Invention
The invention provides a face image generation method and system based on semantic neural rendering, which can generate an image with more accurate action.
The technical scheme of the invention is as follows:
according to one aspect of the invention, a face image generation method based on semantic neural rendering is provided, which comprises the following steps: s1, a mapping network generates a hidden vector from a target face motion descriptor; s2, under the guidance of the hidden vector, a deformation network estimates accurate deformation between a source face image and a required target image, and deforms the source face image by using the estimated deformation parameters to generate a rough deformed image; and S3, generating a final fine image from the roughly deformed image by the editing network.
Preferably, in the above face image generation method based on semantic neural rendering, in step S1, the target facial motion descriptor includes the expression, rotation and translation information of the target face; after obtaining the target facial motion descriptor, the mapping network generates a hidden vector from it.
Preferably, in the above method for generating a face image based on semantic neural rendering, in step S2, under the guidance of the hidden vector z, the deformation network estimates an accurate deformation between the source face image and the desired target image to obtain an optical flow field, and deforms the source face image by using the estimated optical flow field to generate a rough deformed image.
Preferably, in the above method for generating a face image based on semantic neural rendering, in step S3, the editing network receives the coarse deformed image obtained in the previous step, and combines the source face image and the hidden vector to obtain a final fine image.
According to another aspect of the invention, a face image generation system based on semantic neural rendering is provided, which comprises a mapping network, a deformation network and an editing network, wherein the mapping network is used for mapping an object motion descriptor to a hidden vector; the deformation network is used for estimating the accurate deformation between the source face image and the required target image under the guidance of the hidden vector, and deforming the source face image by using the estimated deformation parameters to generate a rough deformed image; and an editing network for generating a clear image with rich details by editing the coarse morphed image, and generating a final fine image from the coarse morphed image.
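The three-part decomposition described above can be sketched as a data-flow skeleton. This is not the trained model: the three networks are replaced by hypothetical numpy stand-ins, and all dimensions (descriptor size, hidden-vector size, image resolution) are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def mapping_network(p):
    """Map motion descriptor p = (beta, R, t) to a hidden vector z (stand-in MLP)."""
    W = rng.normal(size=(256, p.size)) / np.sqrt(p.size)
    return np.tanh(W @ p)

def deformation_network(I_s, z):
    """Estimate a dense flow field w from the source image under guidance of z,
    then warp I_s. Stand-in: a zero flow, so the 'warp' returns the source."""
    H, W_, _ = I_s.shape
    w = np.zeros((H, W_, 2))              # (dx, dy) per pixel
    return w, I_s.copy()                  # flow field, coarse deformed image

def editing_network(I_coarse, I_s, z):
    """Refine the coarse result into the final fine image (stand-in: a blend)."""
    return 0.9 * I_coarse + 0.1 * I_s

p = rng.normal(size=70)                   # e.g. 64-dim beta + rotation + translation
I_s = rng.random((64, 64, 3))             # source face image
z = mapping_network(p)                    # S1: descriptor -> hidden vector
w, I_coarse = deformation_network(I_s, z) # S2: estimate flow, warp source
I_fine = editing_network(I_coarse, I_s, z)  # S3: edit coarse into fine image
print(I_fine.shape)  # (64, 64, 3)
```

The point is the interface: S1 consumes only the motion descriptor, S2 consumes the source image plus z and yields both the flow and the coarse image, and S3 refines the coarse image with access to the source and z.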
According to the technical scheme of the invention, the beneficial effects are as follows:
the semantic neural rendering-based face image generation method and system can generate images with more accurate actions, can generate more vivid results and accurate movement, and simultaneously still retain the identity information of the source face image. Not only can a realistic image with the correct global pose be generated, but also vivid micro-presentations, such as pounding mouth and raising eyebrows, can be generated. In addition, information in irrelevant source face images is well preserved.
For a better understanding and appreciation of the concepts, principles of operation, and effects of the invention, reference will now be made in detail to the following examples, taken in conjunction with the accompanying drawings, in which:
drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below.
FIG. 1 is a flow chart of a semantic neural rendering based face image generation method of the present invention;
FIG. 2 is a network overall frame diagram of the semantic neural rendering-based face image generation method of the present invention;
FIG. 3 is a qualitative comparison graph of the present invention and other algorithms on the task of intuitive face image control;
fig. 4 is an effect diagram of the indirect human face image editing task according to the present invention.
Detailed Description
In order to make the objects, technical means and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific examples. These examples are merely illustrative and not restrictive of the invention.
A face image generation method and system based on semantic neural rendering relate to a novel neural rendering model: given a source face image and target 3DMM parameters, the model can generate realistic results with accurate target motion. The proposed system model can be divided into three parts: a mapping network, a deformation network, and an editing network. The mapping network generates hidden vectors from the motion descriptors. Under the guidance of the hidden vector, the deformation network estimates the accurate deformation between the source face image and the desired target image, and warps the source face image with the estimated deformation parameters to generate a coarse result. Finally, the editing network generates the final fine image from the coarse image.
Fig. 1 is a flowchart of a semantic-neural-rendering-based face image generation method of the present invention, and fig. 2 is an overall framework diagram of a semantic-neural-rendering-based face image generation system of the present invention, which is described with reference to fig. 1 and fig. 2, and includes the following steps:
S1. The mapping network generates a hidden vector from the target facial motion descriptor (as shown in Fig. 2). In this step, the target facial motion descriptor p includes the expression, rotation and translation information of the target face. After the target facial motion descriptor p is obtained, the mapping network generates a hidden vector z from p.
S2. Under the guidance of the hidden vector z, the deformation network estimates the accurate deformation between the source face image I_s and the desired target image, obtaining an optical flow field w, and warps I_s with the estimated w to generate the coarse deformed image Î_w.
S3. The editing network generates the final fine image from the coarse deformed image. In this step, the editing network receives the coarse deformed image Î_w obtained in the previous step and combines it with the source face image I_s and the hidden vector z to obtain the final fine image Î, i.e. the generated image in Fig. 2.
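Warping a source image by an estimated flow field, as in step S2, can be sketched as backward sampling. Real systems use differentiable bilinear sampling; this illustration uses nearest-neighbor sampling and a hypothetical constant flow:

```python
import numpy as np

def warp_by_flow(img, flow):
    """Warp img (H, W, C) by a backward flow field (H, W, 2):
    output[y, x] = img[y + flow[y, x, 1], x + flow[y, x, 0]],
    with nearest-neighbor sampling and border clamping."""
    H, W, _ = img.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, W - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, H - 1)
    return img[src_y, src_x]

img = np.zeros((8, 8, 1))
img[2, 3, 0] = 1.0                 # single bright pixel at (y=2, x=3)
flow = np.full((8, 8, 2), 1.0)     # sample one pixel down-right everywhere
warped = warp_by_flow(img, flow)
print(np.argwhere(warped[..., 0] == 1.0))  # the bright pixel moves to (1, 2)
```

In the actual network the flow is predicted per pixel, so the warp can move the mouth, eyes and head independently rather than shifting the whole image.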
Fig. 2 is the overall network framework diagram of the semantic-neural-rendering-based face image generation method of the present invention. Given the source face image (source image I_s in Fig. 2) and the target facial motion descriptor, the output of the model is a face image with accurate target motion that retains the other information of the source face image, such as identity, lighting, and background. As shown in Fig. 2, the face image generation system model based on semantic neural rendering of the present invention can be divided into three parts: a mapping network, a deformation network, and an editing network. First, the target motion descriptor is mapped to a hidden vector; then a coarse image is generated by the deformation network; finally, the editing network is responsible for generating a sharp image with rich details by editing the coarse result (i.e., the generated image Î).
The invention also provides a face image generation system based on semantic neural rendering, which comprises a mapping network, a deformation network and an editing network, wherein the mapping network is used for mapping the target motion descriptor to the hidden vector; the deformation network is used for estimating the accurate deformation between the source face image and the required target image under the guidance of the hidden vector, and deforming the source face image by using the estimated deformation parameters to generate a rough deformed image; and an editing network for generating a clear image with rich details by editing the coarse morphed image, and generating a final fine image from the coarse morphed image.
Fig. 3 shows a qualitative comparison of the present invention (labeled as the present model in Fig. 3) with other algorithms on the task of intuitive face image control. The compared StyleRig model produces impressive results with realistic details. However, it tends to generate images with a conservative strategy: face motions far from the distribution center are attenuated or ignored in exchange for better image quality. Meanwhile, some factors unrelated to facial motion (such as glasses and clothes) are changed during modification. Although the proposed system was not trained on the FFHQ dataset, it still achieves impressive results when tested on it. The system model of the present invention can generate not only realistic images with correct global poses but also vivid micro-expressions, such as pouting and raising eyebrows. In addition, motion-irrelevant information in the source face image is well preserved.
Compared with existing face image generation methods, the method provided by the invention has two advantages: better generation quality and higher accuracy of the facial motion. The two concepts, generation quality and facial motion accuracy, and their evaluation metrics are explained below:
the quality of generation: and measuring whether the generated face image has higher image quality. On the evaluation index, the evaluation is divided into objective evaluation and subjective evaluation. Fraich perceptual distance is a commonly used objective assessment method of production quality. To calculate the Frey's perception distance of a face image generation model, a batch of face images is first generated using the model, and a batch of images is sampled from the data set for comparison. Then, the characteristics of the two batches of images are extracted, the statistical characteristics of the two batches of images are calculated, and the difference of distribution between the generated image and the real image is measured based on the statistical characteristics to serve as the evaluation of the quality of the generated image.
Face motion accuracy: and measuring whether the generated face image has the target face motion characteristics.
Specifically, facial motion accuracy is measured by computing the average distances between the expression and pose parameters in the 3DMM fits of the generated and target images, referred to as the average expression distance and average pose distance, respectively.
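These motion-accuracy metrics can be sketched directly, assuming the 3DMM expression (or pose) parameters of the generated and target images have already been extracted by some face reconstruction model (names and values below are hypothetical):

```python
import numpy as np

def mean_param_distance(gen_params, tgt_params):
    """Average Euclidean distance between per-image 3DMM parameter vectors:
    use expression betas for the average expression distance,
    rotation/translation parameters for the average pose distance."""
    gen = np.asarray(gen_params)
    tgt = np.asarray(tgt_params)
    return float(np.linalg.norm(gen - tgt, axis=1).mean())

rng = np.random.default_rng(3)
tgt_beta = rng.normal(size=(10, 64))                     # target expression coefficients
gen_beta = tgt_beta + 0.01 * rng.normal(size=(10, 64))   # near-perfect motion transfer
aed = mean_param_distance(gen_beta, tgt_beta)
print(aed)  # small value: generated expressions closely match the targets
```

A perfect motion transfer gives a distance of zero; a conservative generator that attenuates extreme expressions inflates this number even when its images look realistic.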
Table 1 shows a quantitative comparison of the present invention with other algorithms on the task of intuitive face image control. As can be seen from Table 1, by using a style-based generative adversarial network (StyleGAN) model as the final generator, the StyleRig model is able to generate more realistic images, resulting in a lower Fréchet Inception Distance (FID) score. However, its higher average expression distance and average pose distance indicate that it may not faithfully reconstruct the target facial motion. Unlike the StyleRig model, the method and system model provided by the invention can generate images with more accurate motion.
Table 1. Quantitative comparison of the present invention with other algorithms on the intuitive face image control task

                  FID      Mean expression distance   Mean pose distance
StyleRig model    47.37    0.316                      0.0919
Present model     65.97    0.257                      0.0252
Fig. 4 is an effect diagram of the indirect face image editing task according to the present invention. It can be seen that the system model proposed by the invention generates more realistic results and accurate motion while still preserving the identity information of the source face image.
In summary, in order to realize controllable face image generation, the invention provides a novel neural rendering model. Given the source face image and the target 3DMM parameters, the model will produce a realistic result with accurate target motion. The proposed model can be divided into three parts: mapping networks, morphing networks, and editing networks. The mapping network generates hidden vectors from the motion descriptors. Under the guidance of the implicit vector, the deformation network estimates the accurate deformation between the source face image and the required target image, and deforms the source face image by using the estimated deformation parameters to generate a rough result. Finally, the editing network generates a final fine image from the coarse image.
Experiments demonstrate the superiority and versatility of the proposed model. They show that the model not only enables intuitive image control through user-specified facial motions, but also generates realistic results in the indirect portrait editing task (also known as face reenactment), whose goal is to mimic another person's facial motions.
The foregoing description is of the preferred embodiment of the concepts and principles of operation in accordance with the invention. The above-described embodiments should not be construed as limiting the scope of the claims, and other embodiments and combinations of implementations according to the inventive concept are within the scope of the invention.

Claims (5)

1. A face image generation method based on semantic neural rendering is characterized by comprising the following steps:
s1, a mapping network generates a hidden vector from a target face motion descriptor;
s2, under the guidance of the hidden vector, a deformation network estimates accurate deformation between a source face image and a required target image, and deforms the source face image by using an estimated deformation parameter to generate a rough deformed image; and
and S3, generating a final fine image from the rough deformed image by the editing network.
2. The method for generating a face image based on semantic neural rendering of claim 1, wherein in step S1, the target facial motion descriptor comprises the expression, rotation and translation information of a target face, and after obtaining the target facial motion descriptor, the mapping network generates the hidden vector from the target facial motion descriptor.
3. The semantic neural rendering-based face image generation method according to claim 1, wherein in step S2, under the guidance of the implicit vector z, the deformation network estimates an accurate deformation between the source face image and the desired target image, obtains an optical flow field, and generates a rough deformed image by deforming the source face image using the estimated optical flow field.
4. The method for generating a facial image based on semantic neural rendering of claim 1, wherein in step S3, the editing network receives the coarse deformed image obtained in the previous step, and combines the source facial image and the hidden vector to obtain a final fine image.
5. A face image generation system based on semantic neural rendering is characterized by comprising a mapping network, a deformation network and an editing network, wherein,
a mapping network for mapping the target motion descriptor to the hidden vector;
the deformation network is used for estimating the accurate deformation between the source face image and the required target image under the guidance of the hidden vector, and deforming the source face image by using the estimated deformation parameters to generate a rough deformed image; and
and the editing network is used for generating a clear image with rich details by editing the rough deformed image and generating a final fine image from the rough deformed image.
CN202111050013.6A 2021-09-08 2021-09-08 Semantic neural rendering-based face image generation method and system Pending CN113744129A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111050013.6A CN113744129A (en) 2021-09-08 2021-09-08 Semantic neural rendering-based face image generation method and system


Publications (1)

Publication Number Publication Date
CN113744129A true CN113744129A (en) 2021-12-03

Family

ID=78737158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111050013.6A Pending CN113744129A (en) 2021-09-08 2021-09-08 Semantic neural rendering-based face image generation method and system

Country Status (1)

Country Link
CN (1) CN113744129A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114648613A (en) * 2022-05-18 2022-06-21 杭州像衍科技有限公司 Three-dimensional head model reconstruction method and device based on deformable nerve radiation field

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563323A (en) * 2017-08-30 2018-01-09 华中科技大学 A kind of video human face characteristic point positioning method
US20180046854A1 (en) * 2015-02-16 2018-02-15 University Of Surrey Three dimensional modelling
CN109961507A (en) * 2019-03-22 2019-07-02 腾讯科技(深圳)有限公司 A kind of Face image synthesis method, apparatus, equipment and storage medium
CN110660076A (en) * 2019-09-26 2020-01-07 北京紫睛科技有限公司 Face exchange method
CN110717418A (en) * 2019-09-25 2020-01-21 北京科技大学 Method and system for automatically identifying favorite emotion
GB202007052D0 (en) * 2020-05-13 2020-06-24 Facesoft Ltd Facial re-enactment
CN111971713A (en) * 2018-06-14 2020-11-20 英特尔公司 3D face capture and modification using image and time tracking neural networks
CN113239857A (en) * 2021-05-27 2021-08-10 京东科技控股股份有限公司 Video synthesis method and device
CN113343761A (en) * 2021-05-06 2021-09-03 武汉理工大学 Real-time facial expression migration method based on generation confrontation




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20211203