CN117974867A - Monocular face avatar generation method based on Gaussian point rendering - Google Patents

Monocular face avatar generation method based on Gaussian point rendering

Info

Publication number
CN117974867A
CN117974867A
Authority
CN
China
Prior art keywords
gaussian
points
point
space
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410381197.1A
Other languages
Chinese (zh)
Other versions
CN117974867B (en)
Inventor
张盛平
陈宇凡
柳青林
孟权令
吕晓倩
王晨阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology Weihai
Original Assignee
Harbin Institute of Technology Weihai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology Weihai filed Critical Harbin Institute of Technology Weihai
Priority to CN202410381197.1A priority Critical patent/CN117974867B/en
Publication of CN117974867A publication Critical patent/CN117974867A/en
Application granted granted Critical
Publication of CN117974867B publication Critical patent/CN117974867B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

A monocular face avatar generation method based on Gaussian point rendering comprises the following steps: extracting the expression parameters and pose parameters of FLAME from a monocular portrait video; defining an initialization space, a standard space and a deformation space; obtaining the Gaussian parameters of points in the deformation space from the position information of the points in the deformation space and the initialization space; inputting the Gaussian parameters of the points in the deformation space into a renderer and rendering an image; computing an image loss between the rendered image and the input monocular portrait video and training by minimizing this loss; applying a point addition and deletion strategy at each training iteration to grow the point set; and driving the trained avatar of a specific character with a driving video. The invention designs an iterative optimization strategy and a point addition and deletion strategy for the Gaussian point cloud, exploits the rendering speed and rendering quality of a Gaussian splatting renderer, and guides the training of the Gaussian parameter network and the point deformation network through a pre-trained linear blend skinning function, thereby improving the generation quality of the portrait avatar.

Description

Monocular face avatar generation method based on Gaussian point rendering
Technical Field
The invention relates to the technical field of image processing and pattern recognition, in particular to a monocular face avatar generation method based on Gaussian point rendering.
Background
Monocular avatar generation aims at generating a face avatar of a specific character that performs specified actions and expressions, and has wide application in fields such as human-computer interaction, virtual reality, and augmented reality. Most existing methods adopt implicit networks to solve this problem, but such networks require long training and rendering times, and the generated results have poor geometry. In recent years, the emergence of point rendering has injected new vitality into this field: by rendering points onto a two-dimensional image and computing a loss against the ground truth, high-precision rendered portraits can be obtained. In particular, the recently proposed Gaussian splatting rendering achieves faster rendering than point rendering while maintaining high rendering quality. However, existing Gaussian splatting rendering can only handle static scenes and cannot generate dynamic sequences according to the user's requirements. In addition, existing methods based on Gaussian splatting rendering are optimized iteratively per scene, cannot generate a specific face avatar according to the user's requirements, and cannot generate an action sequence from an input driving signal.
Disclosure of Invention
The invention aims to provide a monocular face avatar generation method based on Gaussian point rendering, which uses the high-fidelity image rendering capability of Gaussian splatting to generate a face avatar driven by FLAME parameters, and uses a Gaussian deformation field and a Gaussian parameter prediction network, together with a pre-trained linear blend skinning function, to establish the relation between point offsets and FLAME parameters, so as to improve the rendering quality, geometric quality, and driving quality of the face avatar.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a monocular face avatar generation method based on Gaussian point rendering comprises the following steps:
extracting the expression parameters and pose parameters of FLAME from the monocular portrait video;
Further, inputting the data of the monocular portrait training set into a FLAME fitting network, and predicting the facial expression parameters and pose parameters of the corresponding images.
Initializing parameters of points, and defining an initialization space according to the parameters;
further, the number of initialization points is set to 400, the positions of the initialization points are randomly set in the space, and this space is defined as an initialization space.
Predicting the Gaussian parameters and the offsets of the points from the spatial positions of the initialized points to obtain new positions, and defining a standard space;
Further, the Gaussian parameters and the offset of each point are predicted from the spatial position of the initialized point, and the offset is added to the position of the corresponding point to obtain a new position; the space formed by the new positions is defined as the standard space.
Deforming points in the standard space and defining a deformation space;
Further, the positions of the points in the standard space are input into a pre-trained linear blend skinning function to obtain the deformed points, and this stage is defined as the deformation space.
Acquiring Gaussian parameters of points in a deformation space from position information of the points in the deformation space and an initialization space;
Further, the difference between the position of a point in the deformation space and its position in the initialization space is computed and input into a Gaussian deformation field to obtain the deformation of the Gaussian parameters, thereby obtaining the Gaussian parameters of the point in the deformation space.
Inputting Gaussian parameters of points in a deformation space into a renderer, and rendering an image;
Further, the Gaussian parameters of the points in the deformation space and the positions of the points are input into a Gaussian splatting renderer to obtain the rendered image.
Computing an image loss between the rendered image and the input monocular portrait video, combining it with the FLAME loss and the image perceptual loss, and training by minimizing the weighted sum of the image loss, the FLAME loss, and the image perceptual loss;
Further, an image loss is computed between the rendered image and the input monocular video and combined with the FLAME loss and the image perceptual loss; the implicit network is trained by minimizing the weighted sum of the image loss, the FLAME loss, and the image perceptual loss, thereby learning the model parameters.
Applying a point addition and deletion strategy at each training iteration to grow the point set;
Further, at the end of each training period, points that do not meet the requirements are deleted and one point is randomly added around each remaining point. For example, once the target number of points has grown to 100,000, points that do not meet the requirements are deleted at the end of each training period, one point is randomly added around each remaining point, and the total number of points is replenished to 100,000.
And driving the trained specific character avatar through the driving video.
Further, the FLAME parameters extracted from the driving video are input into the trained network to obtain the point positions and their corresponding Gaussian parameters, which are input into a Gaussian splatting renderer to obtain the driven actions of the specific character avatar and the corresponding rendered images.
The effects provided in the summary of the invention are merely effects of embodiments, not all effects of the invention, and one of the above technical solutions has the following advantages or beneficial effects:
The invention provides a monocular face avatar generation method based on Gaussian point rendering, which solves the problem that existing methods cannot drive and render a face avatar in real time, and overcomes the problem that Gaussian splatting rendering cannot render dynamic scenes. An iterative optimization strategy and a point addition and deletion strategy for the Gaussian point cloud are designed to make full use of the rendering speed and rendering quality of the Gaussian splatting renderer, and the training of the Gaussian parameter network and the point deformation network is guided by a pre-trained linear blend skinning function, so as to improve the generation quality of the portrait avatar.
Drawings
FIG. 1 is a flow chart of a method for generating a monocular face avatar based on Gaussian point rendering.
Detailed Description
As shown in FIG. 1, a monocular face avatar generation method based on Gaussian point rendering includes the following steps:
S1, extracting the expression parameters and pose parameters of FLAME from the monocular portrait video;
S2, initializing the parameters of points, and defining an initialization space according to the parameters;
S3, predicting the Gaussian parameters and the offsets of the points from the spatial positions of the initialized points to obtain new positions, and defining a standard space;
S4, deforming the points in the standard space, and defining a deformation space;
S5, obtaining the Gaussian parameters of the points in the deformation space from the position information of the points in the deformation space and the initialization space;
S6, inputting the Gaussian parameters of the points in the deformation space into a renderer, and rendering an image;
S7, computing an image loss between the rendered image and the input monocular portrait video, combining it with the FLAME loss and the image perceptual loss, and training by minimizing the weighted sum of the image loss, the FLAME loss, and the image perceptual loss;
S8, applying a point addition and deletion strategy at each training iteration to grow the point set;
S9, driving the trained specific character avatar through the driving video.
In step S1, the data of the monocular portrait training set are input into a FLAME fitting network, and the facial expression parameters and pose parameters of the corresponding image are predicted: for a given portrait video, an existing facial landmark detection method is combined with camera parameter estimation for FLAME, and the FLAME facial expression parameters and pose parameters are optimized so that the FLAME model fits the faces in the input images.
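Purely as an illustration of the fitting loop described above, the following sketch assumes a differentiable FLAME layer, a 2D landmark detector, and a camera projection function are available as callables; the names (flame_layer, detect_landmarks, project) and the parameter dimensions are placeholders, not components specified in the patent.

```python
import torch
import torch.nn.functional as F

def fit_flame(frames, flame_layer, detect_landmarks, project, n_iters=200, lr=1e-2):
    """Optimize FLAME expression/pose (and a simple camera) to match detected 2D landmarks."""
    expr = torch.zeros(len(frames), 50, requires_grad=True)   # expression coefficients (dimension illustrative)
    pose = torch.zeros(len(frames), 6, requires_grad=True)    # pose coefficients (dimension illustrative)
    cam = torch.tensor([5.0, 0.0, 0.0], requires_grad=True)   # scale + 2D translation of a weak-perspective camera
    target = torch.stack([detect_landmarks(f) for f in frames])  # (T, 68, 2) detected landmarks
    opt = torch.optim.Adam([expr, pose, cam], lr=lr)
    for _ in range(n_iters):
        opt.zero_grad()
        lmk3d = flame_layer(expression=expr, pose=pose)          # (T, 68, 3) model landmarks
        loss = F.mse_loss(project(lmk3d, cam), target)           # 2D reprojection loss
        loss.backward()
        opt.step()
    return expr.detach(), pose.detach(), cam.detach()
```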
In step S2, an initialization space is defined, in which the initial positions of the points are set: 400 points are randomly sampled on the surface of a sphere whose radius is 0.5 times the image size, and their positions are defined as the initial positions of the points.
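A minimal sketch of this initialization step, assuming uniform sampling over the sphere surface; the point count and radius follow the figures above, and the use of the radius directly as a length unit is an assumption.

```python
import numpy as np

def init_points_on_sphere(num_points: int = 400, radius: float = 0.5, seed: int = 0) -> np.ndarray:
    """Randomly sample initial point positions on a sphere surface.

    Directions are drawn from an isotropic Gaussian and normalized, which
    yields a uniform distribution over the sphere surface.
    """
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=(num_points, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    return radius * directions  # (400, 3) positions defining the initialization space

initial_positions = init_points_on_sphere()
print(initial_positions.shape)  # (400, 3)
```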
In step S3, according to the positions of the points in the initialization space, the Gaussian parameters of each point are predicted by a multi-layer perceptron network. The Gaussian parameters of a point are defined as $G_{init}=\{\mu_{init}, r_{init}, s_{init}, \alpha_{init}, c_{init}\}$, where $\mu_{init}$ is the position of the Gaussian point in the initialization space, $r_{init}$ is the rotation coefficient of the Gaussian point in the initialization space, $s_{init}$ is its scaling coefficient, $\alpha_{init}$ is its visibility coefficient, and $c_{init}$ is its color parameter. The prediction process is defined as $\{r_{init}, s_{init}, \alpha_{init}, c_{init}\} = F_{g}(\mu_{init})$.
On this basis, a learnable offset is added to each point, and the position information of the Gaussian points is converted into the standard space: $\mu_{can} = \mu_{init} + F_{offset}(\mu_{init})$, where $F_{g}$ and $F_{offset}$ are multi-layer perceptrons.
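A minimal PyTorch-style sketch of the two multi-layer perceptrons described above; the hidden sizes, depths, activation choice, and the quaternion/scale/opacity/color output split are illustrative assumptions rather than values specified in the patent.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Simple multi-layer perceptron used for both prediction heads."""
    def __init__(self, in_dim: int, out_dim: int, hidden: int = 128, depth: int = 3):
        super().__init__()
        layers, dim = [], in_dim
        for _ in range(depth):
            layers += [nn.Linear(dim, hidden), nn.ReLU()]
            dim = hidden
        layers.append(nn.Linear(dim, out_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# Per-point Gaussian parameters: rotation (4, quaternion), scale (3), opacity (1), color (3)
gaussian_head = MLP(in_dim=3, out_dim=4 + 3 + 1 + 3)   # F_g
offset_head = MLP(in_dim=3, out_dim=3)                 # F_offset: learnable offset to the standard space

mu_init = torch.randn(400, 3)                           # positions in the initialization space
rot, scale, opacity, color = gaussian_head(mu_init).split([4, 3, 1, 3], dim=-1)
mu_can = mu_init + offset_head(mu_init)                 # positions in the standard space
```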
In step S4, the positions of the points in the standard space are input into a pre-trained linear blend skinning function to obtain the deformed points:

$\mu_{def} = \mathrm{LBS}\big(\mu_{can} + B_P(\theta;\mathcal{P}) + B_E(\psi;\mathcal{E}),\ \theta,\ \mathcal{W}\big)$

where LBS is the pre-trained linear blend skinning function, $\mathcal{P}$ and $\mathcal{E}$ are the pose basis and the expression basis of the human head model, $\mathcal{W}$ are the blend skinning weights, $B_P$ and $B_E$ are the output linear blendshape offsets for the pose and the expression, and $\theta$ and $\psi$ are the pose coefficients and the expression coefficients.
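A sketch of how the linear blend skinning step might be applied to the standard-space points, assuming per-point skinning weights and per-joint rigid transforms are available; the function and variable names are illustrative, not taken from the patent.

```python
import torch

def linear_blend_skinning(points, weights, joint_transforms):
    """Deform standard-space points with per-joint rigid transforms.

    points:            (N, 3)    standard-space positions (after adding pose/expression blendshapes)
    weights:           (N, J)    blend skinning weights, rows sum to 1
    joint_transforms:  (J, 4, 4) rigid transform of each joint for the current pose
    returns:           (N, 3)    deformed positions
    """
    # Blend the joint transforms per point: (N, 4, 4)
    blended = torch.einsum('nj,jab->nab', weights, joint_transforms)
    points_h = torch.cat([points, torch.ones_like(points[:, :1])], dim=-1)  # homogeneous coordinates
    deformed_h = torch.einsum('nab,nb->na', blended, points_h)
    return deformed_h[:, :3]
```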
In step S5, the difference between the position of a point in the deformation space and its position in the initialization space is input into the Gaussian deformation field to obtain the deformation of the Gaussian parameters, thereby obtaining the Gaussian parameters in the deformation space. This process outputs the Gaussian parameters of the Gaussian points in the deformation space by feeding the positions of the points in the deformation space and in the initialization space to a multi-layer perceptron:

$\Delta G = F_{def}(\mu_{def} - \mu_{init})$

where $F_{def}$ is the Gaussian deformation field composed of multi-layer perceptrons, $\Delta G = \{\Delta r, \Delta s, \Delta \alpha, \Delta c\}$ is the predicted Gaussian deformation, and $G_{def}$ denotes the Gaussian parameters of a Gaussian point in the deformation space. The Gaussian points in the deformation space are expressed as $G_{def} = \{\mu_{def},\ r_{init}+\Delta r,\ s_{init}+\Delta s,\ \alpha_{init}+\Delta\alpha,\ c_{init}+\Delta c\}$.
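A minimal sketch of the Gaussian deformation field, continuing the step-S3 sketch (it reuses the MLP class and the initialization-space parameters defined there); adding the predicted deformations to the initialization-space parameters follows the reconstruction above and is an assumption about the exact composition.

```python
deform_field = MLP(in_dim=3, out_dim=4 + 3 + 1 + 3)  # Gaussian deformation field F_def

def deform_gaussians(mu_def, mu_init, rot, scale, opacity, color):
    """Predict parameter deformations from the per-point displacement and apply them."""
    displacement = mu_def - mu_init                                       # (N, 3)
    d_rot, d_scale, d_op, d_col = deform_field(displacement).split([4, 3, 1, 3], dim=-1)
    return rot + d_rot, scale + d_scale, opacity + d_op, color + d_col
```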
In step S6, the Gaussian points $G_{def}$ in the deformation space are used as input and rendered by a Gaussian splatting renderer to obtain the predicted image:

$C = \sum_{i\in N} c_i\,\alpha_i\,T_i,\qquad T_i = \prod_{j=1}^{i-1}(1-\alpha_j)$

where $G(x)=\exp\!\big(-\tfrac{1}{2}x^{T}\Sigma'^{-1}x\big)$ is the standard Gaussian function; $\Sigma$ is the covariance matrix of a Gaussian point, constructed from the scaling matrix $S$ and the rotation matrix $R$ as $\Sigma = R\,S\,S^{T}R^{T}$; the viewing transformation matrix $W$ and the Jacobian $J$ of the projective transformation convert the covariance matrix from three-dimensional world coordinates to two-dimensional camera coordinates, i.e. $\Sigma' = J\,W\,\Sigma\,W^{T}J^{T}$; $\alpha_i$ is the influence of each Gaussian point on the pixel; and $T_i$ is the transmittance term.
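A small NumPy sketch of the covariance construction and projection described above ($\Sigma = R S S^{T} R^{T}$ and $\Sigma' = J W \Sigma W^{T} J^{T}$); the quaternion-to-rotation helper and the matrix shapes follow standard 3D Gaussian splatting conventions and are not details taken from the patent text.

```python
import numpy as np

def quat_to_rotmat(q: np.ndarray) -> np.ndarray:
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def covariance_3d(quat: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Sigma = R S S^T R^T for one Gaussian point."""
    R = quat_to_rotmat(quat)
    S = np.diag(scale)
    return R @ S @ S.T @ R.T

def covariance_2d(sigma3d: np.ndarray, W: np.ndarray, J: np.ndarray) -> np.ndarray:
    """Project the 3D covariance to 2D camera coordinates: Sigma' = J W Sigma W^T J^T."""
    return J @ W @ sigma3d @ W.T @ J.T
```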
In step S7, an image loss is computed between the rendered image and the input monocular video, combined with the FLAME loss and the image perceptual loss, and training is performed by minimizing the weighted sum of the image loss, the FLAME loss, and the image perceptual loss:

$\mathcal{L} = \lambda_{rgb}\,\mathcal{L}_{rgb} + \lambda_{perc}\,\mathcal{L}_{perc} + \lambda_{flame}\,\mathcal{L}_{flame} + \lambda_{ssim}\,\mathcal{L}_{ssim}$

where $I$ and $\hat{I}$ are the rendered image and the ground-truth image, and $\mathcal{L}_{rgb}=\lVert I-\hat{I}\rVert_1$ is the L1 loss between them; $\Phi$ denotes the features output by the first four layers of a pre-trained VGG network, and $\mathcal{L}_{perc}=\lVert \Phi(I)-\Phi(\hat{I})\rVert_1$ is obtained by extracting the features of the rendered image and the ground-truth image and taking their L1 loss; the FLAME loss $\mathcal{L}_{flame}$ is the sum of the L2 losses between the expression, the pose, and the skinning weights and their corresponding pseudo ground-truth values derived from the FLAME vertices; $\mathcal{L}_{ssim}$ is the structural similarity loss; and $\lambda_{rgb}$, $\lambda_{perc}$, $\lambda_{flame}$, and $\lambda_{ssim}$ are the weights of the respective losses in the final loss.
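A hedged PyTorch sketch of how the weighted loss could be assembled; the weight values, the VGG feature extraction, and the helper names are illustrative assumptions rather than values specified in the patent.

```python
import torch
import torch.nn.functional as F

def total_loss(render, gt, vgg_feats, expr, pose, skin_w,
               expr_gt, pose_gt, skin_w_gt, ssim_fn,
               w_rgb=1.0, w_perc=0.1, w_flame=1.0, w_ssim=0.2):
    """Weighted sum of image, perceptual, FLAME, and SSIM losses."""
    l_rgb = F.l1_loss(render, gt)
    # Perceptual loss: L1 over features from the first four VGG blocks (vgg_feats returns a list).
    l_perc = sum(F.l1_loss(f_r, f_g) for f_r, f_g in zip(vgg_feats(render), vgg_feats(gt)))
    # FLAME loss: L2 against pseudo ground truth for expression, pose, and skinning weights.
    l_flame = (F.mse_loss(expr, expr_gt) + F.mse_loss(pose, pose_gt)
               + F.mse_loss(skin_w, skin_w_gt))
    l_ssim = 1.0 - ssim_fn(render, gt)
    return w_rgb * l_rgb + w_perc * l_perc + w_flame * l_flame + w_ssim * l_ssim
```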
In step S8, a point addition and deletion strategy is applied at each training iteration to grow the point set: a rendering radius and a sampling radius that decrease with the training period are defined, together with a target point count that increases with the period; at each period, points with transmittance less than 0.1 are deleted and the point set is replenished to the target point count; the point count, rendering radius, and sampling radius are updated every 5 periods.
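A simple sketch of the pruning-and-replenishing step under the schedule described above; the transmittance threshold of 0.1 follows the text, while the Gaussian jitter used to spawn new points around survivors is an assumption made for illustration.

```python
import numpy as np

def prune_and_replenish(points, transmittance, target_count, sample_radius, rng=None):
    """Delete low-transmittance points and add new ones near the survivors.

    points:        (N, 3) current point positions
    transmittance: (N,)   per-point transmittance accumulated during rendering
    target_count:  desired total number of points for this period
    sample_radius: scale of the random offset used when spawning new points
    """
    rng = rng or np.random.default_rng()
    keep = transmittance >= 0.1                      # prune points below the threshold
    points = points[keep]
    if len(points) == 0:
        raise ValueError("all points were pruned; re-initialize instead")
    while len(points) < target_count:
        n_new = min(target_count - len(points), len(points))
        parents = points[rng.choice(len(points), n_new, replace=False)]
        offsets = rng.normal(scale=sample_radius, size=(n_new, 3))
        points = np.concatenate([points, parents + offsets], axis=0)
    return points
```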
In step S9, the expression parameters and pose parameters of the driving video are extracted through the FLAME fitting network and input into the trained avatar network of the specific character; the displacement of the Gaussian points and the change of the Gaussian parameters are thereby realized, and the driven avatar is rendered.
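A high-level sketch of the driving stage, assuming the trained modules from the earlier sketches; flame_fit, avatar_network, and splat_render are hypothetical callables standing in for the FLAME fitting network, the trained avatar network, and the Gaussian splatting renderer.

```python
def drive_avatar(driving_frames, flame_fit, avatar_network, splat_render, camera):
    """Re-enact a trained avatar frame by frame from a driving video."""
    rendered = []
    for frame in driving_frames:
        expr, pose = flame_fit(frame)                     # expression / pose parameters per frame
        gaussians = avatar_network(expr, pose)            # point positions + Gaussian parameters
        rendered.append(splat_render(gaussians, camera))  # image of the driven avatar
    return rendered
```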
While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.

Claims (10)

1. A monocular face avatar generation method based on Gaussian point rendering, characterized by comprising the following steps:
step one, extracting the expression parameters and pose parameters of FLAME from a monocular portrait video;
step two, initializing the parameters of points, and defining an initialization space according to the parameters;
step three, predicting the Gaussian parameters and the offsets of the points from the spatial positions of the initialized points to obtain new positions, and defining a standard space;
step four, deforming the points in the standard space, and defining a deformation space;
step five, obtaining the Gaussian parameters of the points in the deformation space from the position information of the points in the deformation space and the initialization space;
step six, inputting the Gaussian parameters of the points in the deformation space into a renderer, and rendering an image;
step seven, computing an image loss between the rendered image and the input monocular portrait video, combining it with the FLAME loss and the image perceptual loss, and training by minimizing the weighted sum of the image loss, the FLAME loss, and the image perceptual loss;
step eight, applying a point addition and deletion strategy at each training iteration to grow the point set;
step nine, driving the trained specific character avatar through a driving video.
2. The method for generating a monocular face avatar based on Gaussian point rendering according to claim 1, wherein the first step is specifically as follows: inputting the data of the monocular portrait training set into a FLAME fitting network, and predicting the facial expression parameters and pose parameters of the corresponding image.
3. The method for generating a monocular face avatar based on gaussian point rendering according to claim 1, wherein the step two is specifically as follows: the number of initialization points is set, the positions of the initialization points are randomly set in a space, and this space is defined as an initialization space.
4. The method for generating a monocular face avatar based on Gaussian point rendering according to claim 1, wherein said step three specifically comprises: the Gaussian parameters and the offsets of the points are predicted from the spatial positions of the initialized points, and the offset is added to the position of the corresponding point to obtain a new position; this stage is defined as the standard space.
5. The method for generating a monocular face avatar based on Gaussian point rendering according to claim 1, wherein said step four specifically comprises: the positions of the points in the standard space are input into a pre-trained linear blend skinning function to obtain the deformed points, and this stage is defined as the deformation space.
6. The method for generating a monocular face avatar based on Gaussian point rendering according to claim 1, wherein the fifth step is specifically as follows: the difference between the position of a point in the deformation space and its position in the initialization space is input into a Gaussian deformation field to obtain the deformation of the Gaussian parameters, thereby obtaining the Gaussian parameters of the point in the deformation space.
7. The method for generating a monocular face avatar based on Gaussian point rendering according to claim 1, wherein said step six specifically comprises: inputting the Gaussian parameters of the points in the deformation space and the positions of the points into a Gaussian splatting renderer to obtain the rendered image.
8. The method for generating a monocular face avatar based on Gaussian point rendering according to claim 1, wherein said step seven specifically comprises: computing an image loss between the rendered image and the input monocular portrait video, combining it with the FLAME loss and the image perceptual loss, training the implicit network by minimizing the weighted sum of the image loss, the FLAME loss, and the image perceptual loss, and learning the model parameters.
9. The method for generating a monocular face avatar based on Gaussian point rendering according to claim 1, wherein the step eight is specifically as follows: at the end of each training period, points that do not meet the requirements are deleted, and one point is randomly added around each remaining point; once the number of points has grown to N, at the end of each training period the points that do not meet the requirements are deleted, one point is randomly added around each remaining point, and the total number of points is replenished to N.
10. The method for generating a monocular face avatar based on Gaussian point rendering according to claim 1, wherein said step nine specifically comprises: inputting the FLAME parameters extracted from the driving video into the trained network to obtain the point positions and their corresponding Gaussian parameters, and inputting them into a Gaussian splatting renderer to obtain the driven actions of the specific character avatar and the corresponding rendered images.
CN202410381197.1A 2024-04-01 2024-04-01 Monocular face avatar generation method based on Gaussian point rendering Active CN117974867B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410381197.1A CN117974867B (en) 2024-04-01 2024-04-01 Monocular face avatar generation method based on Gaussian point rendering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410381197.1A CN117974867B (en) 2024-04-01 2024-04-01 Monocular face avatar generation method based on Gaussian point rendering

Publications (2)

Publication Number Publication Date
CN117974867A true CN117974867A (en) 2024-05-03
CN117974867B CN117974867B (en) 2024-06-21

Family

ID=90864895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410381197.1A Active CN117974867B (en) 2024-04-01 2024-04-01 Monocular face avatar generation method based on Gaussian point rendering

Country Status (1)

Country Link
CN (1) CN117974867B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101465973A (en) * 2008-11-04 2009-06-24 新奥特(北京)视频技术有限公司 Method for rendering subtitling based on curved profile closed loop domain and pixel mask matrix
CN104574432A (en) * 2015-02-15 2015-04-29 四川川大智胜软件股份有限公司 Three-dimensional face reconstruction method and three-dimensional face reconstruction system for automatic multi-view-angle face auto-shooting image
US20210065434A1 (en) * 2019-09-02 2021-03-04 Disney Enterprises, Inc. Techniques for performing point-based inverse rendering
CN112070896A (en) * 2020-09-07 2020-12-11 哈尔滨工业大学(威海) Portrait automatic slimming method based on 3D modeling
CN114332136A (en) * 2022-03-15 2022-04-12 南京甄视智能科技有限公司 Face attribute data labeling method, computer equipment and storage medium
CN116385577A (en) * 2023-02-24 2023-07-04 北京邮电大学 Virtual viewpoint image generation method and device
CN116777765A (en) * 2023-05-26 2023-09-19 深圳市瑞云科技股份有限公司 Method, device and storage medium for realizing offline image rendering based on Diffusion
CN117218300A (en) * 2023-11-08 2023-12-12 腾讯科技(深圳)有限公司 Three-dimensional model construction method, three-dimensional model construction training method and device
CN117315211A (en) * 2023-11-29 2023-12-29 苏州元脑智能科技有限公司 Digital human synthesis and model training method, device, equipment and storage medium thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FENG, Y. et al.: "Gaussian Splashing: Dynamic Fluid Synthesis with Gaussian Splatting", ARXIV, 15 February 2024 (2024-02-15) *
WANG, Han; XIA, Shihong: "Automatic Reconstruction of a Face Shape with Geometric Details from a Single Image", Journal of Computer-Aided Design & Computer Graphics, no. 07, 15 July 2017 (2017-07-15) *

Also Published As

Publication number Publication date
CN117974867B (en) 2024-06-21

Similar Documents

Publication Publication Date Title
CN109063301B (en) Single image indoor object attitude estimation method based on thermodynamic diagram
CN110728219B (en) 3D face generation method based on multi-column multi-scale graph convolution neural network
CN108830913B (en) Semantic level line draft coloring method based on user color guidance
Li et al. Vox-surf: Voxel-based implicit surface representation
JPH07509081A (en) Method and device for computer graphic processing using memory
CN113255457A (en) Animation character facial expression generation method and system based on facial expression recognition
CN113421328B (en) Three-dimensional human body virtual reconstruction method and device
CN111462274A (en) Human body image synthesis method and system based on SMP L model
CN117422829A (en) Face image synthesis optimization method based on nerve radiation field
CN116134491A (en) Multi-view neuro-human prediction using implicit differentiable renderers for facial expression, body posture morphology, and clothing performance capture
CN115018989A (en) Three-dimensional dynamic reconstruction method based on RGB-D sequence, training device and electronic equipment
CN117315211B (en) Digital human synthesis and model training method, device, equipment and storage medium thereof
CN116385667B (en) Reconstruction method of three-dimensional model, training method and device of texture reconstruction model
CN117974867B (en) Monocular face avatar generation method based on Gaussian point rendering
Di et al. Multi-agent reinforcement learning of 3d furniture layout simulation in indoor graphics scenes
CN114783039B (en) Motion migration method driven by 3D human body model
Yang et al. EnNeRFACE: improving the generalization of face reenactment with adaptive ensemble neural radiance fields
CN114758205A (en) Multi-view feature fusion method and system for 3D human body posture estimation
CN113763536A (en) Three-dimensional reconstruction method based on RGB image
CN117292040B (en) Method, apparatus and storage medium for new view synthesis based on neural rendering
CN116071831B (en) Human body image generation method based on UV space transformation
CN117011493B (en) Three-dimensional face reconstruction method, device and equipment based on symbol distance function representation
Dang et al. Generalizable Dynamic Radiance Fields For Talking Head Synthesis With Few-shot
Liu et al. Report on Methods and Applications for Crafting 3D Humans
Griffiths et al. Curiosity-driven 3D object detection without labels

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant