CN113223188B - Video face fat and thin editing method - Google Patents

Video face fat and thin editing method

Info

Publication number
CN113223188B
Authority
CN
China
Prior art keywords
face
dimensional
video
parameters
dimensional face
Prior art date
Legal status
Active
Application number
CN202110538233.7A
Other languages
Chinese (zh)
Other versions
CN113223188A (en)
Inventor
唐祥峻
孙文欣
金小刚
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202110538233.7A priority Critical patent/CN113223188B/en
Publication of CN113223188A publication Critical patent/CN113223188A/en
Application granted granted Critical
Publication of CN113223188B publication Critical patent/CN113223188B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/04 Indexing scheme for image data processing or generation, in general involving 3D image data

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a video face fat-thin editing method, which comprises the following steps: reconstructing a three-dimensional face model from the face video and outputting the three-dimensional face shape parameters together with the face expression and face pose parameters of each video frame; adjusting the three-dimensional face model with a three-dimensional face fat-thin adjustment algorithm and transferring the adjustment result to each frame to generate a deformed three-dimensional face model per frame; establishing a dense mapping of the face boundary before and after deformation on the two-dimensional plane using a directed distance field, and adjusting the dense mapping based on the structure of the three-dimensional face model; and deforming the face video frames based on the dense mapping, using energy optimization to reduce the background distortion caused by the deformation, obtaining the deformed face video frames, and replacing the corresponding frames of the original face video. The invention automatically generates a face video conforming to a given fat-thin scale, and still obtains satisfactory results when the face is occluded, has long hair, wears glasses, and in similar conditions.

Description

Video face fat and thin editing method
Technical Field
The invention relates to the technical field of portrait editing, in particular to a video face fat-thin editing method.
Background
With the rapid development of social networks and media, more and more people actively share personal videos and pictures on the network. Image editing techniques are commonly used to create special face effects such as face exaggeration, beautification, etc. The current research focus is mainly on editing the color, texture and shape of the face.
"Deep Shapely portals" (In MM' 20: The28th ACM International Conference on multimedia.2020.1800-1808) discloses an image-based automatic fat-thin editing method that automatically identifies The most appropriate fat-thin scale for a particular person using a neural network and morphs an image using a rendering fusion technique. This technique is not temporally stable and does not deal well with side faces.
"Motion-aware Domain Video Composition" (IEEE Transactions on Image Processing22,7(2013), 2532) and 2544) disclose the importance of stable boundary fusion and propose a method of fusing the original Video and the target Video gradient.
The specification with publication number CN 112348937 A discloses a face image processing method, which comprises the following steps: an electronic device obtains a two-dimensional image to be processed, builds a three-dimensional mesh model corresponding to the image according to a preset reference mesh, obtains a texture map of the three-dimensional mesh model according to the shooting parameters of the image, and determines boundary points and the control points corresponding to the boundary points according to the visible boundary of the face of the reference mesh; the electronic device then deforms the three-dimensional mesh model according to a preset deformation requirement using the correspondence between the boundary points and the control points, renders the texture map onto the deformed three-dimensional mesh model, and generates the processed image from the rendered three-dimensional mesh model.
With the development of deep learning, many deep-learning-based face editing methods have appeared in recent years, but most of them edit the expression, makeup or pose of a face. For example, "Real-time Expression Transfer for Facial Reenactment" (ACM Transactions on Graphics (TOG) 34, 6 (2015), 183:1-183:14) discloses a method of transferring the expression parameters of one person to another face; "PIE: Portrait Image Embedding for Semantic Control" (ACM Transactions on Graphics 39, 6 (2020), 1-14) discloses a GAN-based method for editing image semantics, where the editable attributes include face pose, expression and illumination.
The specification with publication number CN 112308957 a discloses an automatic generation method of an optimal fat-thin portrait image based on deep learning, which comprises the following steps: generating a three-dimensional face model, face related parameters and texture mapping of a two-dimensional face in a face portrait image, wherein the face related parameters comprise face posture parameters; inputting the face portrait image into a trained optimal face fat-thin estimation model based on deep learning, and outputting an optimal face fat-thin scale; adjusting the three-dimensional face model by taking the output optimal fat-thin scale as input according to a three-dimensional face fat-thin adjustment algorithm to generate an optimal fat-thin three-dimensional face model; projecting the optimal fat-thin three-dimensional model with textures onto a two-dimensional plane according to the human face posture parameters to obtain an optimal fat-thin human face; and through a front-back background fusion algorithm, the optimal fat-thin face is seamlessly embedded into the face portrait image to obtain the optimal fat-thin face portrait image.
Monocular face reconstruction is an ill-posed problem, and prior knowledge of face shape and expression is required to obtain a good solution. "A Morphable Model for the Synthesis of 3D Faces" (In Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ACM, 187-194) first discloses the three-dimensional morphable model (3DMM), in which the representation space of the face shape is obtained by applying principal component analysis to three-dimensional scanned models, yielding prior knowledge of the face shape.
"Practice and Theory of Blendshape Facial Models" (int 35 by annual consensus on European Association for computer graphics.199-218) discloses a hybrid model (blendshapes) that expresses Facial expressions as the difference between an expressive face and a naturally expressed face. On the basis of 3DMM, more researches are carried out subsequently to improve the precision and the applicability of the face representation model on different layers.
A multiresolution 3DMM is disclosed by "A Multiresolution 3D Morphable Face Model and Fitting Framework" (In the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP), Vol. 4, 79-86), which can reconstruct three-dimensional face meshes at different resolutions.
However, there is considerably less work on video-based face reconstruction than on image-based monocular face reconstruction. Although a video contains more information than a single image, reconstructing the shape, expression and pose of a face from video remains a very challenging problem, and simply adding more constraints does not lead to satisfactory results.
"Real-time Expression Transfer for social interaction" (ACMTransactioninggraphics (TOG)34,6(2015),183: 1-183: 14) discloses optimization in conjunction with multiframes.
"Stabilized Real-time Face Tracking via a Learned Dynamic Rigidypior" (ACMTransectionon graphics (TOG)37,6(2018), 1-11) discloses a Face Tracking method based on Dynamic rigid prior to further improve reconstruction quality.
However, the above video-based reconstruction methods cannot obtain a unique set of face shape parameters; that is, the face shape parameters obtained from the first frame differ from those obtained from the last frame.
Disclosure of Invention
The invention aims to provide a video face fat-thin editing method that, for any given face video and fat-thin scale, automatically generates a face video conforming to that scale, and that still obtains satisfactory results when the face is occluded, has long hair, wears glasses, and in similar conditions.
A video face fat and thin editing method comprises the following steps:
(1) reconstructing a three-dimensional face model based on the face video, and generating three-dimensional face shape parameters in the face video and face expression parameters and face posture parameters of each video frame;
(2) adjusting the three-dimensional face model based on a three-dimensional face fat-thin adjustment algorithm, transferring an adjustment result to each video frame, and generating a deformed three-dimensional face model of each video frame;
(3) establishing dense mapping of face boundaries before and after deformation on a two-dimensional plane by using a directed distance field, and adjusting the dense mapping based on the structure of a three-dimensional face model;
(4) deforming the face video frames of the step (1) based on the dense mapping of the step (3), using energy optimization to reduce the background distortion caused by the deformation, obtaining the deformed face video frames, and replacing the corresponding frames of the original face video with them.
In the step (1), a three-dimensional face model is reconstructed to generate three-dimensional face shape parameters in a face video and face expression parameters and face posture parameters of each video frame, and the specific steps are as follows:
(1-1) reconstructing a three-dimensional face model based on a monocular-vision three-dimensional face reconstruction algorithm, and calculating the face pose parameters of each video frame in the face video; the monocular-vision three-dimensional face reconstruction algorithm adopts the lowest-resolution level of the multiresolution three-dimensional face model disclosed by "A Multiresolution 3D Morphable Face Model and Fitting Framework" (In the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP), Vol. 4, 79-86).
(1-2) finding out continuous k frames which are suitable for representing the shape of the face in the face video according to the face posture parameters, and performing joint optimization to obtain three-dimensional face shape parameters, wherein the k value is less than 10;
(1-3) taking the three-dimensional face shape parameters as known conditions, and obtaining the face expression parameters of each frame according to the monocular-vision three-dimensional face reconstruction algorithm.
In the step (1-1), a three-dimensional face model is reconstructed based on a monocular vision three-dimensional face reconstruction algorithm, and a face pose parameter of each video frame in a face video is calculated, which specifically comprises the following steps:
(1-1-1) reconstructing the three-dimensional face model, and defining an optical flow constraint in the time domain of the face video according to formula (I); the optical flow constraint uses optical flow information to correct feature point detection errors in the three-dimensional face model so that the model conforms to the detected optical flow changes, and the constraint is defined on the region of the face boundary:

E_optic = Σ_{i∈L_b} || Π(r, t, α, β)_i - Π(r', t', α, β')_i - U_i ||²    (I)

wherein r and t are the face rotation and translation parameters of the current frame, r' and t' are the face rotation and translation parameters of the previous frame, α is the estimated three-dimensional face shape parameter, β and β' are the estimated face expression parameters of the current and previous frames; Π is the projection operator that projects the face onto the two-dimensional plane according to the face parameters; L_b is the set of feature points on the face boundary; U_i is the optical flow value at the ith point on the two-dimensional plane; and E_optic is the optical flow constraint energy defined at the face boundary.
(1-1-2) fixing the estimated face expression parameters and the estimated three-dimensional face shape parameters, and solving the energy E_pose of formula (II) for the optimal solution to obtain the face pose parameters:

E_pose = λ_land E_land + λ_temp E_temp + λ_optic E_optic    (II)

wherein E_land is the energy matching the projections of the three-dimensional face feature points with the two-dimensional face feature points, E_temp is the energy keeping the face poses of two adjacent frames continuous, and λ_land, λ_temp and λ_optic are the weights of the three energy terms;

E_land = Σ_{i∈L} || Π(r, t, α, β)_i - p_i ||²

wherein p_i is the two-dimensional feature point corresponding to the ith point and L is the set of three-dimensional face feature points;

E_temp = || t - t' ||² + γ || r - r' ||²

wherein γ is a parameter that balances the translational and rotational effects.
In the step (1-2), the specific steps of finding the continuous k frames suitable for representing the face shape according to the face pose parameters and obtaining the three-dimensional face shape parameters by joint optimization are as follows:

(1-2-1) traversing the video frames based on the face pose parameters obtained in the step (1-1) to obtain continuous k frames in which the face faces the camera; if the k value is greater than or equal to 10, cutting off the redundant frames;

(1-2-2) according to the energy equation of formula (III), jointly solving the energy equation over the continuous k frames to obtain the three-dimensional face shape parameters:

E_identity = Σ_{i=1}^{k} [ λ_align (E_align)_i + λ_optic (E_optic)_i + λ_temp (E_temp)_i + (E_prior)_i ]    (III)

wherein E_align is the energy matching the three-dimensional face feature points and three-dimensional face boundary points with the two-dimensional face feature points and two-dimensional face boundary points, E_temp is the energy keeping the pose and expression of the faces of two adjacent frames continuous, E_prior is the energy matching the prior knowledge of the three-dimensional face shape parameters and expression parameters, and λ_align, λ_optic, λ_temp are the weights of the corresponding energy terms;

E_align = Σ_{i∈L} || Π(r, t, α, β)_i - p_i ||² + σ Σ_{i∈L_b} || Π(r, t, α, β)_i - p_i ||²

wherein σ is a balance parameter between the boundary energy term and the feature point energy term.

E_align differs from E_land in that E_land uses a fixed correspondence between the image feature points and the three-dimensional face model, whereas the correspondence of E_align changes with the face pose; because the boundary of the three-dimensional face projected onto the two-dimensional image differs for different face poses, E_align can only be applied once the face pose parameters have been computed.

E_prior = w_prior ( ||α||² + ||β||² )    (VIII)

wherein w_prior is the weight of the energy term E_prior.
In the step (1-3), the face expression parameters E_expr of each frame are obtained according to the monocular-vision three-dimensional face reconstruction algorithm as:

E_expr = λ_align (E_align)_i + λ_optic (E_optic)_i + λ_temp (E_temp)_i + (E_prior)_i    (IX)
In the step (2), the three-dimensional face fat-thin adjustment algorithm adopts the reshaping algorithm disclosed by "Deep Shapely Portraits" (In MM '20: The 28th ACM International Conference on Multimedia, 2020, 1800-1808), which deforms the three-dimensional face model according to the input fat-thin scale.
In the step (2), the three-dimensional face model is adjusted based on the three-dimensional face fat-thin adjustment algorithm, and the specific steps of generating the deformed three-dimensional face model of each frame are as follows:
(2-1) solving a three-dimensional average face model under the natural expression by using the natural expression parameters and the three-dimensional face shape parameters;
(2-2) obtaining a deformed face model under a natural expression according to the fat-thin scale input by the user;
(2-3) adding the face expression parameters to the deformed face model under the natural expression for each frame of the video, and outputting the deformed three-dimensional face model with the expression.
In the step (3), a directed distance field is used to establish dense mapping of face boundaries before and after deformation on a two-dimensional plane, and the mapping is adjusted based on a three-dimensional face structure, which specifically comprises the following steps:
(3-1) projecting the three-dimensional face to a two-dimensional image plane by adopting a three-dimensional average face model and face posture parameters, and extracting the projected boundary of the two-dimensional face image;
(3-2) extracting a video frame in the face video, back-projecting each pixel point in the face boundary region of the video frame on the three-dimensional face model reconstructed in the step (1), and finding the corresponding nearest vertex on the reconstructed three-dimensional face model for each pixel point; projecting the vertex of the deformed three-dimensional face model obtained in the step (2) to the plane of a video frame, so that the original pixel point before back projection is converted to a new position, and the original position and the new position are mapped to be used as initial mapping;
(3-3) constructing a directional distance field of a face boundary in the deformed three-dimensional face model in real time by using a GPU;
(3-4) moving the pixel points of the initial mapping obtained in the step (3-2) to the zero-value region of the directed distance field by a gradient optimization method to form the dense mapping;
(3-5) projecting the densely mapped pixel points back onto the deformed three-dimensional face model, and excluding the corresponding mapping relation if a projected pixel point is not located on the cheeks of the deformed three-dimensional face model.
In the step (4), each video frame in the face video is deformed based on dense mapping, and the background distortion of the video frame caused by deformation is reduced by using energy optimization, and the method specifically comprises the following steps:
(4-1) establishing a checkerboard grid near the face region on each video frame before the face model is deformed, finding the grid point on the checkerboard closest to each densely mapped pixel point, and moving those grid points according to the dense mapping relation by the moving least squares deformation method; the moving least squares deformation method is disclosed in "Image Deformation Using Moving Least Squares" (ACM SIGGRAPH 2006 Papers, 2006: 533-540);
(4-2) fixing the grid points closest to the densely mapped pixel points, and moving the other grid points to minimize the energy E, so that the background distortion of the video frame caused by the deformation is minimized:

E = w_l E_l + w_r E_r    (X)

wherein E_l is the energy constraining the grid lines to remain straight and E_r is the energy constraining the area around each grid point to remain constant, defined by formulas (XI) and (XII) respectively (given as images in the original publication); w_l and w_r are the weights of the corresponding energy terms; v_i is the ith grid point, N(i) is the set of all neighboring points of the ith grid point, and e_ij is the unit vector along the v_i - v_j direction.
Compared with the prior art, the invention has the advantages that:
1. the method adjusts the fatness and thinness of the face portrait in a video with high quality;
2. the method adopts a multi-stage optimized three-dimensional face reconstruction algorithm, so that the three-dimensional face reconstruction is efficient and stable;
3. the present invention provides a method of video morphing based on directed distance fields and three-dimensional face structures that does not produce noticeable artifacts.
Drawings
Fig. 1 is a flowchart of a video face fat editing method according to an embodiment of the present invention.
Detailed Description
As shown in fig. 1, the video face fat-thin editing method includes the following steps:
(1) reconstructing a three-dimensional face model based on the face video, and generating the three-dimensional face shape parameters of the face video and the face expression parameters and face pose parameters of each video frame.
(1-1) reconstructing a three-dimensional face model based on a monocular vision three-dimensional face reconstruction algorithm, and calculating a face pose parameter of each video frame in a face video; this step performs rigid pose estimation on the face video.
The monocular-vision three-dimensional face reconstruction algorithm adopts the method disclosed by "A Multiresolution 3D Morphable Face Model and Fitting Framework" (In the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP), Vol. 4, 79-86), and comprises the following steps:
(1-1-1) reconstructing the three-dimensional face model, and defining an optical flow constraint in the time domain of the face video according to formula (I); the optical flow constraint uses optical flow information to correct feature point detection errors in the three-dimensional face model so that the reconstructed model conforms to the detected optical flow changes, and the constraint is defined on the face boundary region:

E_optic = Σ_{i∈L_b} || Π(r, t, α, β)_i - Π(r', t', α, β')_i - U_i ||²    (I)

wherein r and t are the face rotation and translation parameters of the current frame, r' and t' are the face rotation and translation parameters of the previous frame, α is the three-dimensional face shape parameter, β and β' are the face expression parameters of the current and previous frames; Π is the projection operator that projects the face onto the two-dimensional plane according to the face parameters; L_b is the set of feature points on the face boundary; U_i is the optical flow value at the ith point on the two-dimensional plane; and E_optic is the optical flow constraint energy defined at the face boundary.
(1-1-2) fixing the face expression parameters and the three-dimensional face shape parameters, and solving the energy E_pose of formula (II) for the optimal solution to obtain the face pose parameters:

E_pose = λ_land E_land + λ_temp E_temp + λ_optic E_optic    (II)

wherein E_land is the energy matching the projections of the three-dimensional face feature points with the two-dimensional face feature points, E_temp is the energy keeping the face poses of two adjacent frames continuous, and λ_land, λ_temp and λ_optic are the weights of the three energy terms.

E_land = Σ_{i∈L} || Π(r, t, α, β)_i - p_i ||²

wherein p_i is the two-dimensional feature point corresponding to the ith point and L is the set of three-dimensional face feature points.

E_temp = || t - t' ||² + γ || r - r' ||²
Where γ is a parameter that balances the translational and rotational effects.
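To make the structure of this pose solve concrete, the following is a minimal sketch (not the disclosed implementation): with α and β fixed, it minimizes a weighted sum of landmark, temporal and optical-flow terms over the rotation and translation of one frame. The weak-perspective projection, the synthetic data and the weights are illustrative assumptions only.

    # Minimal sketch of the per-frame pose solve (step (1-1-2)), assuming a
    # weak-perspective camera and toy data; the weights, the projection model
    # and the landmark set are illustrative assumptions, not the patent's values.
    import numpy as np
    from scipy.optimize import minimize
    from scipy.spatial.transform import Rotation

    def project(pts3d, r, t, focal=1.0):
        # Rotate (axis-angle r), translate (t) and project to 2D.
        cam = pts3d @ Rotation.from_rotvec(r).as_matrix().T + t
        return focal * cam[:, :2]

    def pose_energy(x, pts3d, lm2d, flow, r_prev, t_prev,
                    w_land=1.0, w_temp=0.1, w_optic=0.5, gamma=1.0):
        r, t = x[:3], x[3:]
        proj = project(pts3d, r, t)
        proj_prev = project(pts3d, r_prev, t_prev)
        e_land = np.sum((proj - lm2d) ** 2)                                      # E_land
        e_temp = np.sum((t - t_prev) ** 2) + gamma * np.sum((r - r_prev) ** 2)   # E_temp
        e_optic = np.sum((proj - proj_prev - flow) ** 2)                         # E_optic
        return w_land * e_land + w_temp * e_temp + w_optic * e_optic

    # Toy data: 20 model points, a known pose to recover, and consistent optical flow.
    rng = np.random.default_rng(0)
    pts3d = rng.normal(size=(20, 3))
    r_true, t_true = np.array([0.10, -0.05, 0.02]), np.array([0.01, 0.02, 0.0])
    r_prev, t_prev = np.zeros(3), np.zeros(3)
    lm2d = project(pts3d, r_true, t_true)
    flow = lm2d - project(pts3d, r_prev, t_prev)

    res = minimize(pose_energy, np.zeros(6),
                   args=(pts3d, lm2d, flow, r_prev, t_prev))
    print("recovered r:", res.x[:3], "t:", res.x[3:])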
(1-2) finding the continuous k frames that best represent the face shape according to the face pose parameters, and performing joint optimization to obtain the three-dimensional face shape parameters, with the following specific steps:
(1-2-1) traversing all video frames in the face video based on the face pose parameters obtained in the step (1-1), finding continuous k frames in which the face faces the camera or is as close to frontal as possible, and cutting off redundant frames so that the k value is less than 10;
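The following is a small sketch of how such a run of near-frontal frames could be selected from the per-frame pose parameters; the axis-angle pose representation and the 10-degree yaw threshold are assumptions for illustration, not values taken from the patent.

    # Sketch of step (1-2-1): pick the longest run of consecutive near-frontal
    # frames (yaw below a threshold) and truncate it so that k stays below 10.
    import numpy as np
    from scipy.spatial.transform import Rotation

    def select_frontal_run(rotvecs, k_max=9, yaw_thresh_deg=10.0):
        yaw = np.array([abs(Rotation.from_rotvec(r).as_euler("yxz", degrees=True)[0])
                        for r in rotvecs])
        frontal = yaw < yaw_thresh_deg
        best, start = (0, 0), None
        for i, ok in enumerate(np.append(frontal, False)):   # sentinel closes the last run
            if ok and start is None:
                start = i
            elif not ok and start is not None:
                if i - start > best[1] - best[0]:
                    best = (start, i)
                start = None
        s, e = best
        return s, min(e, s + k_max)   # cut redundant frames, keeping k < 10

    # Toy poses: 30 frames, nearly frontal between frames 12 and 24.
    rng = np.random.default_rng(1)
    rotvecs = rng.normal(scale=0.5, size=(30, 3))
    rotvecs[12:25] *= 0.05
    print(select_frontal_run(rotvecs))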
(1-2-2) estimating the three-dimensional face shape according to the energy equation of formula (V), jointly optimizing over the continuous k frames that best represent the face shape to obtain the three-dimensional face shape parameters:

E_identity = Σ_{i=1}^{k} [ λ_align (E_align)_i + λ_optic (E_optic)_i + λ_temp (E_temp)_i + (E_prior)_i ]    (V)

wherein E_align is the energy matching the three-dimensional face feature points and three-dimensional face boundary points with the two-dimensional face feature points and two-dimensional face boundary points, E_temp is the energy keeping the pose and expression of the faces of two adjacent frames continuous, E_prior is the energy matching the prior knowledge of the three-dimensional face shape parameters and expression parameters, and λ_align, λ_optic, λ_temp are the weights of the corresponding energy terms.

E_align = Σ_{i∈L} || Π(r, t, α, β)_i - p_i ||² + σ Σ_{i∈L_b} || Π(r, t, α, β)_i - p_i ||²

wherein σ is a balance parameter between the boundary energy term and the feature point energy term.

E_align differs from E_land in that E_land uses a fixed correspondence between the image feature points and the three-dimensional face model, whereas the correspondence of E_align changes with the face pose; because the boundary of the three-dimensional face projected onto the two-dimensional image differs for different face poses, E_align can only be applied once the face pose parameters have been obtained.

E_prior = w_prior ( ||α||² + ||β||² )    (VIII)

wherein w_prior is the weight of the energy term E_prior.
(1-3) taking the three-dimensional face shape parameters as known conditions, the face expression parameters E_expr of each frame are obtained according to the monocular-vision three-dimensional face reconstruction algorithm as:

E_expr = λ_align (E_align)_i + λ_optic (E_optic)_i + λ_temp (E_temp)_i + (E_prior)_i    (IX)
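As an illustration of the staged optimization in steps (1-2) and (1-3), the sketch below fits a toy linear face model (mean plus shape basis times α plus expression basis times β): the shared shape parameters α are solved jointly over k frames first, and the per-frame expression parameters β are then solved with α fixed. The linear model, its dimensions and the use of plain least squares are simplifying assumptions; the patent's energies additionally contain the boundary, temporal, optical-flow and prior terms described above.

    # Toy two-stage fit: one shared alpha over k frames, then per-frame betas.
    import numpy as np

    rng = np.random.default_rng(2)
    n_vert, n_shape, n_expr, k = 50, 4, 3, 6
    mean = rng.normal(size=n_vert * 3)
    shape_basis = rng.normal(size=(n_vert * 3, n_shape))
    expr_basis = rng.normal(size=(n_vert * 3, n_expr))

    # Synthetic observations: one alpha for the clip, one beta per frame, plus noise.
    alpha_true = rng.normal(size=n_shape)
    beta_true = rng.normal(scale=0.3, size=(k, n_expr))
    obs = np.stack([mean + shape_basis @ alpha_true + expr_basis @ b
                    + rng.normal(scale=0.01, size=n_vert * 3) for b in beta_true])

    # Stage 1 (cf. E_identity): solve alpha jointly over all k frames, treating the
    # per-frame betas as additional unknowns of the same linear system.
    A = np.hstack([np.tile(shape_basis, (k, 1)), np.kron(np.eye(k), expr_basis)])
    sol, *_ = np.linalg.lstsq(A, (obs - mean).ravel(), rcond=None)
    alpha = sol[:n_shape]

    # Stage 2 (cf. E_expr): with alpha fixed, solve each frame's beta independently.
    betas = np.array([np.linalg.lstsq(expr_basis, obs[i] - mean - shape_basis @ alpha,
                                      rcond=None)[0] for i in range(k)])

    print("max alpha error:", np.abs(alpha - alpha_true).max())
    print("max beta  error:", np.abs(betas - beta_true).max())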
(2) adjusting the three-dimensional face model based on the three-dimensional face fat-thin adjustment algorithm, transferring the adjustment result to each video frame, and generating the deformed three-dimensional face model of each video frame, with the following specific steps:

The three-dimensional face fat-thin adjustment algorithm adopts the reshaping algorithm disclosed by "Deep Shapely Portraits" (In MM '20: The 28th ACM International Conference on Multimedia, 2020, 1800-1808), which deforms the three-dimensional face model according to the input fat-thin scale.
(2-1) solving a three-dimensional average face model under the natural expression by using the natural expression parameters and the three-dimensional face shape parameters;
(2-2) obtaining a deformed face model under a natural expression according to the fat-thin scale input by the user;
(2-3) adding the face expression parameters to the deformed face model under the natural expression for each frame of the video, and outputting the deformed three-dimensional face model with the expression.
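The data flow of step (2) can be sketched as follows: the neutral face is computed once from α, deformed once according to the fat-thin scale, and the per-frame expression offsets are then added back. The cheek-scaling placeholder used below merely stands in for the fat-thin adjustment algorithm cited above; the linear model and its dimensions are likewise illustrative assumptions.

    # Sketch of step (2): deform the neutral face once, then re-apply each
    # frame's expression offset to obtain a deformed mesh per frame.
    import numpy as np

    rng = np.random.default_rng(3)
    n_vert, n_shape, n_expr, k = 100, 5, 4, 8
    mean = rng.normal(size=(n_vert, 3))
    shape_basis = rng.normal(scale=0.1, size=(n_shape, n_vert, 3))
    expr_basis = rng.normal(scale=0.05, size=(n_expr, n_vert, 3))

    alpha = rng.normal(size=n_shape)        # shared shape parameters from step (1)
    betas = rng.normal(size=(k, n_expr))    # per-frame expression parameters

    def neutral_face(alpha):
        # (2-1) three-dimensional face under the natural (zero) expression.
        return mean + np.tensordot(alpha, shape_basis, axes=1)

    def reshape_fatness(verts, scale):
        # (2-2) placeholder deformation: scale vertices horizontally with the
        # fat-thin scale; a stand-in for the cited reshaping algorithm.
        out = verts.copy()
        out[:, 0] *= 1.0 + 0.1 * scale
        return out

    neutral = neutral_face(alpha)
    deformed_neutral = reshape_fatness(neutral, scale=-1.0)   # user-chosen scale

    # (2-3) add each frame's expression back onto the deformed neutral face.
    deformed_meshes = [deformed_neutral + np.tensordot(b, expr_basis, axes=1)
                       for b in betas]
    print(len(deformed_meshes), deformed_meshes[0].shape)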
(3) establishing the dense mapping of the face boundary before and after deformation on the two-dimensional plane using a directed distance field, and adjusting the dense mapping based on the structure of the three-dimensional face model, with the following specific steps:
and (3-1) projecting the three-dimensional face to a two-dimensional image plane by adopting the three-dimensional average face model and the face posture parameters obtained in the step (2-1), and extracting the projected boundary of the two-dimensional face image.
(3-2) extracting the original image of each video frame, back-projecting every pixel point in the face boundary region of the original image onto the three-dimensional face model reconstructed in the step (1), finding the nearest vertex on the three-dimensional face model before deformation for each pixel point, and re-projecting the corresponding vertex of the deformed three-dimensional face model onto the plane of the original image, so that each original pixel point is moved to a new position; the mapping from the original positions to the new positions is taken as the initial mapping.
If the found vertex does not fall on the cheek but on the nose, mouth or another region (for a side face, the nose may occlude the original cheek boundary), the vertex is not taken into account in the subsequent calculation.
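The construction of the initial mapping in step (3-2) can be sketched as follows. For brevity the nearest vertex is found in the image plane with a KD-tree rather than by true back-projection onto the mesh, and the orthographic projection and toy data are assumptions for illustration.

    # Sketch of the initial mapping (step (3-2)): each boundary-region pixel is
    # paired with its nearest mesh vertex and mapped to where that same vertex
    # projects after deformation.
    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(4)
    verts_orig = rng.uniform(-1, 1, size=(200, 3))            # reconstructed mesh
    verts_def = verts_orig + np.array([0.05, 0.0, 0.0])       # deformed mesh

    def project(verts, scale=100.0, center=(128.0, 128.0)):
        # Simple orthographic projection of mesh vertices into pixel coordinates.
        return verts[:, :2] * scale + np.asarray(center)

    proj_orig, proj_def = project(verts_orig), project(verts_def)

    # Pixels sampled from the face-boundary region of the video frame (toy data).
    pixels = rng.uniform(28, 228, size=(500, 2))

    # Nearest original vertex per pixel, then the same vertex after deformation.
    _, idx = cKDTree(proj_orig).query(pixels)
    initial_mapping = np.stack([pixels, proj_def[idx]], axis=1)   # (old_xy, new_xy)
    print(initial_mapping.shape)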
(3-3) transmitting the positions of the deformed face boundary points to the GPU; in a compute shader, the pixel points are processed in parallel, and for each pixel the face boundary points are traversed to find and store the nearest distance to that pixel.
(3-4) moving the pixel points of the initial mapping obtained in the step (3-2) to the zero-value region of the directed distance field by a gradient optimization method to form the dense mapping, with the following specific steps:

(3-4-1) computing, for each pixel point in the initial mapping, the gradient of the directed distance field at the same position, and shifting the point slightly along the gradient direction;

(3-4-2) iterating the step (3-4-1) until the gradient is 0 or the point reaches the zero-value region of the directed distance field, so that every pixel point in the initial mapping is moved onto the deformed boundary line.
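Steps (3-3) and (3-4) can be sketched as below: a brute-force distance field to the deformed boundary (computed per pixel, as the compute shader would do, but in plain NumPy here) and a few gradient steps that move each initially mapped pixel onto the zero-distance region. The unsigned distance to a boundary polyline and the elliptical toy boundary are simplifying assumptions standing in for the directed distance field described above.

    # Sketch of steps (3-3)/(3-4): brute-force distance field to the deformed
    # boundary and gradient descent that projects points onto its zero set.
    import numpy as np

    # Deformed face boundary as a dense polyline (toy data: an ellipse).
    t = np.linspace(0.0, 2.0 * np.pi, 400)
    boundary = np.stack([120 + 60 * np.cos(t), 128 + 80 * np.sin(t)], axis=1)

    def distance_field(points):
        # Distance from each point to the nearest boundary sample (per point,
        # mirroring the per-pixel loop of the compute shader).
        d = np.linalg.norm(points[:, None, :] - boundary[None, :, :], axis=-1)
        return d.min(axis=1)

    def project_to_boundary(points, step=0.5, iters=200, eps=1e-2, h=1e-2):
        # Move points along the negative numerical gradient of the distance
        # field until the distance is (close to) zero.
        pts = points.astype(float).copy()
        for _ in range(iters):
            d = distance_field(pts)
            if d.max() < eps:
                break
            gx = (distance_field(pts + [h, 0]) - distance_field(pts - [h, 0])) / (2 * h)
            gy = (distance_field(pts + [0, h]) - distance_field(pts - [0, h])) / (2 * h)
            pts -= step * d[:, None] * np.stack([gx, gy], axis=1)
        return pts

    rng = np.random.default_rng(5)
    mapped_pixels = rng.uniform(40, 216, size=(100, 2))
    dense = project_to_boundary(mapped_pixels)
    print("max residual distance:", distance_field(dense).max())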
(3-5) projecting the densely mapped pixel points back onto the deformed three-dimensional face model, and excluding the corresponding mapping relation if a projected pixel point is not located on the cheeks of the deformed three-dimensional face model.
(4) deforming the video frames of the face video of the step (1) based on the dense mapping, using energy optimization to reduce the background distortion of the video frames caused by the deformation, obtaining the deformed video frames, and replacing the corresponding frames of the original face video with them, with the following specific steps:
(4-1) establishing a checkerboard grid near the face region in each video frame of the face video of the step (1), finding the grid point on the checkerboard closest to each densely mapped pixel point, and moving those grid points according to the dense mapping relation by the moving least squares deformation method; the moving least squares deformation method is disclosed in "Image Deformation Using Moving Least Squares" (ACM SIGGRAPH 2006 Papers, 2006: 533-540);
(4-2) fixing the grid point closest to the densely mapped pixel point, and moving other grid points to minimize the energy E, so that the background distortion influence of the video frame caused by deformation is minimized:
E = w_l E_l + w_r E_r    (X)

wherein E_l is the energy constraining the grid lines to remain straight and E_r is the energy constraining the area around each grid point to remain constant, defined by formulas (XI) and (XII) respectively (given as images in the original publication); w_l and w_r are the weights of the corresponding energy terms; v_i is the ith grid point, N(i) is the set of all neighboring points of the ith grid point, and e_ij is the unit vector along the v_i - v_j direction.
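To illustrate the structure of the optimization in step (4-2), the sketch below fixes the grid points nearest to the densely mapped pixels at their new positions and solves for the remaining points by least squares. The concrete residuals used here (a second-difference straightness term standing in for E_l and an edge-length preservation term standing in for E_r) are assumptions in place of formulas (XI) and (XII), and the grid size, constraints and weights are illustrative only.

    # Sketch of step (4-2): constrained grid points keep their mapped positions;
    # the free points minimize a weighted sum of a line-straightness term and an
    # edge-length preservation term (assumed stand-ins for E_l and E_r).
    import numpy as np
    from scipy.optimize import least_squares

    H, W = 8, 8                                        # checkerboard resolution
    ys, xs = np.mgrid[0:H, 0:W]
    rest = np.stack([xs, ys], axis=-1).astype(float).reshape(-1, 2) * 20.0

    # Toy constraints: the two central columns are pushed inward ("thinner" face).
    fixed_mask = np.zeros(H * W, dtype=bool)
    fixed_pos = rest.copy()
    for col, dx in ((W // 2 - 1, +5.0), (W // 2, -5.0)):
        idx = np.arange(H) * W + col
        fixed_mask[idx] = True
        fixed_pos[idx, 0] += dx

    edges = [(r * W + c, r * W + c + 1) for r in range(H) for c in range(W - 1)] + \
            [(r * W + c, (r + 1) * W + c) for r in range(H - 1) for c in range(W)]
    triples = [(r * W + c - 1, r * W + c, r * W + c + 1)
               for r in range(H) for c in range(1, W - 1)] + \
              [((r - 1) * W + c, r * W + c, (r + 1) * W + c)
               for r in range(1, H - 1) for c in range(W)]
    rest_len = {e: np.linalg.norm(rest[e[0]] - rest[e[1]]) for e in edges}
    free_idx = np.flatnonzero(~fixed_mask)

    def residuals(x, w_l=1.0, w_r=0.5):
        v = fixed_pos.copy()
        v[free_idx] = x.reshape(-1, 2)
        r_l = [np.sqrt(w_l) * (v[a] - 2 * v[b] + v[c]) for a, b, c in triples]
        r_r = [np.sqrt(w_r) * (np.linalg.norm(v[a] - v[b]) - rest_len[(a, b)])
               for a, b in edges]
        return np.concatenate([np.concatenate(r_l), np.asarray(r_r)])

    sol = least_squares(residuals, rest[free_idx].ravel())
    warped = fixed_pos.copy()
    warped[free_idx] = sol.x.reshape(-1, 2)
    print("largest grid-point displacement:", np.abs(warped - rest).max())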

Claims (7)

1. A video face fat and thin editing method is characterized by comprising the following steps:
(1) reconstructing a three-dimensional face model based on the face video, and generating three-dimensional face shape parameters in the face video and face expression parameters and face posture parameters of each video frame;
(2) adjusting a three-dimensional face model based on a three-dimensional face fat-thin adjustment algorithm, transferring an adjustment result to each video frame, and generating a deformed three-dimensional face model of each video frame, wherein the method comprises the following specific steps of:
(2-1) solving a three-dimensional average face model under the natural expression by using the natural expression parameters and the three-dimensional face shape parameters;
(2-2) obtaining a deformed face model under a natural expression according to the fat-thin scale input by the user;
(2-3) adding the facial expression parameters to the deformed facial model under the natural expression aiming at each video frame of the video, and outputting the deformed three-dimensional facial model with the expression;
(3) the method comprises the following steps of establishing dense mapping of human face boundaries before and after deformation on a two-dimensional plane by using a directed distance field, and adjusting the dense mapping based on the structure of a three-dimensional human face model, wherein the method specifically comprises the following steps:
(3-1) projecting the three-dimensional face to a two-dimensional image plane by adopting a three-dimensional average face model and face posture parameters, and extracting the projected boundary of the two-dimensional face image;
(3-2) extracting a video frame in the face video, back-projecting each pixel point in the face boundary region of the video frame on the three-dimensional face model reconstructed in the step (1), finding the corresponding nearest vertex on the reconstructed three-dimensional face model for each pixel point, projecting the vertex of the deformed three-dimensional face model obtained in the step (2) to the plane of the video frame, so that the original pixel point before back projection is converted to a new position, and taking the mapping of the original position and the new position as initial mapping;
(3-3) constructing a directional distance field of a face boundary in the deformed three-dimensional face model in real time by using a GPU;
(3-4) moving the pixel points of the initial mapping obtained in the step (3-2) to a region with a median of 0 in the directed distance field by adopting a gradient optimization method to form dense mapping;
(3-5) projecting the densely mapped pixel points back to the deformed three-dimensional face model, and if the projected pixel points are not positioned on the cheeks of the deformed three-dimensional face model, excluding the corresponding mapping relation;
(4) deforming the face video frames of the step (1) based on the dense mapping of the step (3), using energy optimization to reduce the background distortion of the video frames caused by the deformation to obtain the deformed face video frames, and replacing the corresponding frames of the original face video with them.
2. The video face fat-thin editing method of claim 1, wherein in the step (1), the three-dimensional face model is reconstructed to generate the three-dimensional face shape parameters of the face video and the face expression parameters and face pose parameters of each video frame, with the following specific steps:
(1-1) reconstructing a three-dimensional face model based on a monocular vision three-dimensional face reconstruction algorithm, and calculating face posture parameters of each video frame in a face video;
(1-2) finding out continuous k frames which are suitable for representing the shape of the face in the face video according to the face posture parameters, and performing joint optimization to obtain three-dimensional face shape parameters, wherein the k value is less than 10;
(1-3) taking the three-dimensional face shape parameters as known conditions, and obtaining the face expression parameters of each video frame according to the monocular-vision three-dimensional face reconstruction algorithm.
3. The video face fat-thin editing method of claim 2, wherein in the step (1-1), the three-dimensional face model is reconstructed based on a monocular-vision three-dimensional face reconstruction algorithm, and the face pose parameters of each video frame in the face video are calculated, with the following specific steps:
(1-1-1) reconstructing a three-dimensional face model, defining an optical flow constraint in a time domain of a face video, wherein the optical flow constraint utilizes optical flow information to correct errors of feature point detection in the three-dimensional face model so that the three-dimensional face model conforms to detected optical flow changes, and the optical flow constraint is defined in a face boundary area:
E_optic = Σ_{i∈L_b} || Π(r, t, α, β)_i - Π(r', t', α, β')_i - U_i ||²

wherein r and t are the face rotation and translation parameters of the current frame, r' and t' are the face rotation and translation parameters of the previous frame, α is the estimated three-dimensional face shape parameter, β and β' are the estimated face expression parameters of the current and previous frames; Π is the projection operator that projects the face onto the two-dimensional plane according to the face parameters; L_b is the set of feature points on the face boundary; U_i is the optical flow value at the ith point on the two-dimensional plane; and E_optic is the optical flow constraint energy defined at the face boundary;
(1-1-2) fixing the estimated face expression parameters and the estimated three-dimensional face shape parameters, and solving for the optimal solution to obtain the face pose parameters:

E_pose = λ_land E_land + λ_temp E_temp + λ_optic E_optic

wherein E_land is the energy matching the projections of the three-dimensional face feature points with the two-dimensional face feature points, E_temp is the energy keeping the face poses of two adjacent frames continuous, and λ_land, λ_temp and λ_optic are the weights of the three energy terms.
4. The video face fat-thin editing method of claim 3, wherein in the step (1-2), the three-dimensional face shape parameters are obtained by finding the continuous k frames suitable for representing the face shape according to the face pose parameters and performing joint optimization, with the following specific steps:
(1-2-1) traversing the video frames based on the face pose parameters obtained in the step (1-1) to obtain continuous k frames in which the face faces the camera, and cutting off redundant frames if the k value is greater than or equal to 10;
(1-2-2) according to the energy equation of formula (III), jointly solving the energy equation over the continuous k frames to obtain the three-dimensional face shape parameters:

E_identity = Σ_{i=1}^{k} [ λ_align (E_align)_i + λ_optic (E_optic)_i + λ_temp (E_temp)_i + (E_prior)_i ]    (III)

wherein E_align is the energy matching the three-dimensional face feature points and three-dimensional face boundary points with the two-dimensional face feature points and two-dimensional face boundary points, E_temp is the energy keeping the pose and expression of the faces of two adjacent frames continuous, E_prior is the energy matching the prior knowledge of the three-dimensional face shape parameters and expression parameters, E_optic is the optical flow constraint energy defined at the face boundary, and λ_align, λ_optic, λ_temp are the weights of the corresponding energy terms.
5. The video face fat-thin editing method of claim 4, wherein the energy E_align matching the three-dimensional face feature points and three-dimensional face boundary points with the two-dimensional face feature points and two-dimensional face boundary points is:

E_align = Σ_{i∈L} || Π(r, t, α, β)_i - p_i ||² + σ Σ_{i∈L_b} || Π(r, t, α, β)_i - p_i ||²

wherein r and t are the face rotation and translation parameters of the current frame, α is the estimated three-dimensional face shape parameter, and β is the estimated face expression parameter of the current frame; σ is a balance parameter between the boundary energy term and the feature point energy term, p_i is the feature point corresponding to the ith point, L is the set of three-dimensional face feature points, and L_b is the set of feature points on the face boundary.
6. The video face fat-thin editing method of claim 2, wherein in the step (1-3), the face expression parameters E_expr of each frame are obtained according to the monocular-vision three-dimensional face reconstruction algorithm as:

E_expr = λ_align (E_align)_i + λ_optic (E_optic)_i + λ_temp (E_temp)_i + (E_prior)_i    (IX)

wherein E_expr denotes the face expression parameters, E_align is the energy matching the three-dimensional face feature points and three-dimensional face boundary points with the two-dimensional face feature points and two-dimensional face boundary points, E_temp is the energy keeping the pose and expression of the faces of two adjacent frames continuous, E_prior is the energy matching the prior knowledge of the three-dimensional face shape parameters and expression parameters, E_optic is the optical flow constraint energy defined at the face boundary, and λ_align, λ_optic, λ_temp are the weights of the corresponding energy terms.
7. The video face fat-thin editing method of claim 1, wherein in the step (4), each video frame in the face video is deformed based on the dense mapping, and energy optimization is used to reduce the background distortion of the video frame caused by the deformation, with the following specific steps:
(4-1) establishing a checkerboard grid near the face region on each video frame before the face model is deformed, finding out a grid point on the checkerboard closest to a densely mapped pixel point, and moving the grid point by adopting a moving least square deformation method according to the dense mapping relation;
(4-2) fixing the grid point closest to the densely mapped pixel point, and moving other grid points to minimize the energy E, so that the background distortion influence of the video frame caused by deformation is minimized:
E = w_l E_l + w_r E_r    (X)

wherein E_l is the energy constraining the grid lines to remain straight and E_r is the energy constraining the area around each grid point to remain constant, defined by formulas (XI) and (XII) respectively (given as images in the original publication); w_l and w_r are the weights of the corresponding energy terms; v_i is the ith grid point, N(i) is the set of all neighboring points of the ith grid point, and e_ij is the unit vector along the v_i - v_j direction.
CN202110538233.7A 2021-05-18 2021-05-18 Video face fat and thin editing method Active CN113223188B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110538233.7A CN113223188B (en) 2021-05-18 2021-05-18 Video face fat and thin editing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110538233.7A CN113223188B (en) 2021-05-18 2021-05-18 Video face fat and thin editing method

Publications (2)

Publication Number Publication Date
CN113223188A CN113223188A (en) 2021-08-06
CN113223188B true CN113223188B (en) 2022-05-27

Family

ID=77093047

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110538233.7A Active CN113223188B (en) 2021-05-18 2021-05-18 Video face fat and thin editing method

Country Status (1)

Country Link
CN (1) CN113223188B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103208133A (en) * 2013-04-02 2013-07-17 浙江大学 Method for adjusting face plumpness in image

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7755619B2 (en) * 2005-10-13 2010-07-13 Microsoft Corporation Automatic 3D face-modeling from video
CN107944367B (en) * 2017-11-16 2021-06-01 北京小米移动软件有限公司 Face key point detection method and device
CN108596827B (en) * 2018-04-18 2022-06-17 太平洋未来科技(深圳)有限公司 Three-dimensional face model generation method and device and electronic equipment
CN112348937A (en) * 2019-08-09 2021-02-09 华为技术有限公司 Face image processing method and electronic equipment
CN110796083B (en) * 2019-10-29 2023-07-04 腾讯科技(深圳)有限公司 Image display method, device, terminal and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103208133A (en) * 2013-04-02 2013-07-17 浙江大学 Method for adjusting face plumpness in image

Also Published As

Publication number Publication date
CN113223188A (en) 2021-08-06

Similar Documents

Publication Publication Date Title
CN109584353B (en) Method for reconstructing three-dimensional facial expression model based on monocular video
Achenbach et al. Fast generation of realistic virtual humans
US10089522B2 (en) Head-mounted display with facial expression detecting capability
CN109377557B (en) Real-time three-dimensional face reconstruction method based on single-frame face image
Thies et al. Real-time expression transfer for facial reenactment.
CN106920274B (en) Face modeling method for rapidly converting 2D key points of mobile terminal into 3D fusion deformation
JP4733318B2 (en) Method and system for animating facial features and method and system for facial expression transformation
CN110096156B (en) Virtual reloading method based on 2D image
US10311624B2 (en) Single shot capture to animated vr avatar
US6047078A (en) Method for extracting a three-dimensional model using appearance-based constrained structure from motion
WO2016029768A1 (en) 3d human face reconstruction method and apparatus
WO2019219012A1 (en) Three-dimensional reconstruction method and device uniting rigid motion and non-rigid deformation
JP4950787B2 (en) Image processing apparatus and method
JP2011170891A (en) Facial image processing method and system
US11494963B2 (en) Methods and systems for generating a resolved threedimensional (R3D) avatar
CN113628327A (en) Head three-dimensional reconstruction method and equipment
WO2021063271A1 (en) Human body model reconstruction method and reconstruction system, and storage medium
CN111950430A (en) Color texture based multi-scale makeup style difference measurement and migration method and system
CN111028354A (en) Image sequence-based model deformation human face three-dimensional reconstruction scheme
Song et al. A generic framework for efficient 2-D and 3-D facial expression analogy
CN106909904B (en) Human face obverse method based on learnable deformation field
CN111640172A (en) Attitude migration method based on generation of countermeasure network
CN113223188B (en) Video face fat and thin editing method
CN110648394B (en) Three-dimensional human body modeling method based on OpenGL and deep learning
CN112116699A (en) Real-time real-person virtual trial sending method based on 3D face tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant