CN113223188B - Video face fat and thin editing method - Google Patents

Video face fat and thin editing method

Info

Publication number
CN113223188B
Authority
CN
China
Prior art keywords
face
dimensional
video
parameters
dimensional face
Prior art date
Legal status
Active
Application number
CN202110538233.7A
Other languages
Chinese (zh)
Other versions
CN113223188A (en)
Inventor
唐祥峻
孙文欣
金小刚
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202110538233.7A priority Critical patent/CN113223188B/en
Publication of CN113223188A publication Critical patent/CN113223188A/en
Application granted granted Critical
Publication of CN113223188B publication Critical patent/CN113223188B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/04 Indexing scheme for image data processing or generation, in general involving 3D image data

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a video face fat-thin editing method, which comprises the following steps: reconstructing a three-dimensional face model from the face video and outputting the three-dimensional face shape parameters together with the face expression and face pose parameters of each video frame; adjusting the three-dimensional face model with a three-dimensional face fat-thin adjustment algorithm and transferring the adjustment result to each frame to generate a deformed three-dimensional face model per frame; establishing a dense mapping of the face boundary before and after deformation on the two-dimensional plane using a directed distance field, and adjusting the dense mapping based on the structure of the three-dimensional face model; and deforming the face video frames based on the dense mapping, using energy optimization to reduce the background distortion caused by the deformation, obtaining the deformed face video frames, and replacing the corresponding frames of the original face video. The invention automatically generates a face video conforming to a given fat-thin scale, and still obtains satisfactory results when the face is occluded, has long hair, wears glasses, and in similar conditions.

Description

Video face fat and thin editing method
Technical Field
The invention relates to the technical field of portrait editing, in particular to a video face fat-thin editing method.
Background
With the rapid development of social networks and media, more and more people actively share personal videos and pictures on the network. Image editing techniques are commonly used to create special face effects such as face exaggeration, beautification, etc. The current research focus is mainly on editing the color, texture and shape of the face.
"Deep Shapely portals" (In MM' 20: The28th ACM International Conference on multimedia.2020.1800-1808) discloses an image-based automatic fat-thin editing method that automatically identifies The most appropriate fat-thin scale for a particular person using a neural network and morphs an image using a rendering fusion technique. This technique is not temporally stable and does not deal well with side faces.
"Motion-aware Domain Video Composition" (IEEE Transactions on Image Processing22,7(2013), 2532) and 2544) disclose the importance of stable boundary fusion and propose a method of fusing the original Video and the target Video gradient.
The specification with publication number CN 112348937 A discloses a face image processing method, which comprises the following steps: an electronic device obtains a two-dimensional image to be processed, builds a three-dimensional mesh model corresponding to the image according to a preset reference mesh, obtains a texture map of the three-dimensional mesh model according to the shooting parameters of the image, and determines boundary points and the control points corresponding to the boundary points according to the visible boundary of the face of the reference mesh; the electronic device then deforms the three-dimensional mesh model according to a preset deformation requirement using the correspondence between the boundary points and the control points, renders the texture map onto the deformed three-dimensional mesh model, and generates the processed image from the rendered three-dimensional mesh model.
With the development of deep learning, many deep-learning-based face editing methods have appeared in recent years, but most of them edit the expression, makeup or pose of a face. For example, "Real-time Expression Transfer for Facial Reenactment" (ACM Transactions on Graphics (TOG) 34, 6 (2015), 183:1-183:14) discloses a method of transferring the expression parameters of one person to another face; "PIE: Portrait Image Embedding for Semantic Control" (ACM Transactions on Graphics 39, 6 (2020), 1-14) discloses a GAN-based method for editing image semantics, where the editable attributes include face pose, expression and illumination.
The specification with publication number CN 112308957 a discloses an automatic generation method of an optimal fat-thin portrait image based on deep learning, which comprises the following steps: generating a three-dimensional face model, face related parameters and texture mapping of a two-dimensional face in a face portrait image, wherein the face related parameters comprise face posture parameters; inputting the face portrait image into a trained optimal face fat-thin estimation model based on deep learning, and outputting an optimal face fat-thin scale; adjusting the three-dimensional face model by taking the output optimal fat-thin scale as input according to a three-dimensional face fat-thin adjustment algorithm to generate an optimal fat-thin three-dimensional face model; projecting the optimal fat-thin three-dimensional model with textures onto a two-dimensional plane according to the human face posture parameters to obtain an optimal fat-thin human face; and through a front-back background fusion algorithm, the optimal fat-thin face is seamlessly embedded into the face portrait image to obtain the optimal fat-thin face portrait image.
Monocular face reconstruction is an ill-posed problem, and prior knowledge of face shape and expression is required to obtain a good solution. "A Morphable Model for the Synthesis of 3D Faces" (In Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ACM, 187-194) first discloses the three-dimensional morphable model (3DMM), in which the representation space of the face shape is obtained by applying principal component analysis to three-dimensional scanned models, yielding prior knowledge of the face shape.
"Practice and Theory of Blendshape Facial Models" (int 35 by annual consensus on European Association for computer graphics.199-218) discloses a hybrid model (blendshapes) that expresses Facial expressions as the difference between an expressive face and a naturally expressed face. On the basis of 3DMM, more researches are carried out subsequently to improve the precision and the applicability of the face representation model on different layers.
A multiresolution 3DMM is disclosed by "A Multiresolution 3D Morphable Face Model and Fitting Framework" (In the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP), Vol. 4, 79-86), which can reconstruct three-dimensional face meshes at different resolutions.
However, there is considerably less work on video-based face reconstruction than on image-based monocular face reconstruction. Although a video contains more information than a single image, reconstructing the shape, expression and pose of a face from video remains a very challenging problem, and simply adding more constraints does not lead to satisfactory results.
"Real-time Expression Transfer for social interaction" (ACMTransactioninggraphics (TOG)34,6(2015),183: 1-183: 14) discloses optimization in conjunction with multiframes.
"Stabilized Real-time Face Tracking via a Learned Dynamic Rigidypior" (ACMTransectionon graphics (TOG)37,6(2018), 1-11) discloses a Face Tracking method based on Dynamic rigid prior to further improve reconstruction quality.
However, the above video-based reconstruction methods cannot obtain a unique set of face shape parameters; that is, the face shape parameters obtained from the first frame differ from those obtained from the last frame.
Disclosure of Invention
The invention aims to provide a video face fat-thin editing method that, for any given face video and fat-thin scale, automatically generates a face video conforming to that scale, and that still obtains satisfactory results when the face is occluded, has long hair, wears glasses, and in similar conditions.
A video face fat and thin editing method comprises the following steps:
(1) reconstructing a three-dimensional face model based on the face video, and generating three-dimensional face shape parameters in the face video and face expression parameters and face posture parameters of each video frame;
(2) adjusting the three-dimensional face model based on a three-dimensional face fat-thin adjustment algorithm, transferring an adjustment result to each video frame, and generating a deformed three-dimensional face model of each video frame;
(3) establishing dense mapping of face boundaries before and after deformation on a two-dimensional plane by using a directed distance field, and adjusting the dense mapping based on the structure of a three-dimensional face model;
(4) deforming the face video frames of the step (1) based on the dense mapping of the step (3), using energy optimization to reduce the background distortion caused by the deformation, obtaining the deformed face video frames, and replacing the corresponding frames of the original face video with them.
In the step (1), a three-dimensional face model is reconstructed to generate three-dimensional face shape parameters in a face video and face expression parameters and face posture parameters of each video frame, and the specific steps are as follows:
(1-1) reconstructing a three-dimensional face model based on a monocular-vision three-dimensional face reconstruction algorithm, and calculating the face pose parameters of each video frame in the face video; the monocular-vision three-dimensional face reconstruction algorithm adopts the lowest-resolution level of the multiresolution three-dimensional face model disclosed by "A Multiresolution 3D Morphable Face Model and Fitting Framework" (In the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP), Vol. 4, 79-86).
(1-2) finding out continuous k frames which are suitable for representing the shape of the face in the face video according to the face posture parameters, and performing joint optimization to obtain three-dimensional face shape parameters, wherein the k value is less than 10;
(1-3) taking the three-dimensional face shape parameters as known conditions, and obtaining the face expression parameters of each frame according to the monocular-vision three-dimensional face reconstruction algorithm.
In the step (1-1), a three-dimensional face model is reconstructed based on a monocular vision three-dimensional face reconstruction algorithm, and a face pose parameter of each video frame in a face video is calculated, which specifically comprises the following steps:
(1-1-1) reconstructing the three-dimensional face model, and defining an optical flow constraint in the time domain of the face video according to formula (I); the optical flow constraint uses optical flow information to correct feature point detection errors in the three-dimensional face model so that the model conforms to the detected optical flow changes, and the constraint is defined on the region of the face boundary:

E_optic = Σ_{i∈L_b} || Π(r, t, α, β)_i - Π(r', t', α, β')_i - U_i ||²    (I)

wherein r and t are the face rotation and translation parameters of the current frame, r' and t' are the face rotation and translation parameters of the previous frame, α is the estimated three-dimensional face shape parameter, β and β' are the estimated face expression parameters of the current and previous frames; Π is the projection operator that projects the face onto the two-dimensional plane according to the face parameters; L_b is the set of feature points on the face boundary; U_i is the optical flow value at the ith point on the two-dimensional plane; and E_optic is the optical flow constraint energy defined at the face boundary.
(1-1-2) fixing the estimated face expression parameters and the estimated three-dimensional face shape parameters, and solving the energy E_pose of formula (II) for the optimal solution to obtain the face pose parameters:

E_pose = λ_land E_land + λ_temp E_temp + λ_optic E_optic    (II)

wherein E_land is the energy matching the projections of the three-dimensional face feature points with the two-dimensional face feature points, E_temp is the energy keeping the face poses of two adjacent frames continuous, and λ_land, λ_temp and λ_optic are the weights of the three energy terms;

E_land = Σ_{i∈L} || Π(r, t, α, β)_i - p_i ||²

wherein p_i is the two-dimensional feature point corresponding to the ith point and L is the set of three-dimensional face feature points;

E_temp = || t - t' ||² + γ || r - r' ||²

wherein γ is a parameter that balances the translational and rotational effects.
In the step (1-2), the specific steps of finding the continuous k frames suitable for representing the face shape according to the face pose parameters and obtaining the three-dimensional face shape parameters by joint optimization are as follows:

(1-2-1) traversing the video frames based on the face pose parameters obtained in the step (1-1) to obtain continuous k frames in which the face faces the camera; if the k value is greater than or equal to 10, cutting off the redundant frames;

(1-2-2) according to the energy equation of formula (III), jointly solving the energy equation over the continuous k frames to obtain the three-dimensional face shape parameters:

E_identity = Σ_{i=1}^{k} [ λ_align (E_align)_i + λ_optic (E_optic)_i + λ_temp (E_temp)_i + (E_prior)_i ]    (III)

wherein E_align is the energy matching the three-dimensional face feature points and three-dimensional face boundary points with the two-dimensional face feature points and two-dimensional face boundary points, E_temp is the energy keeping the pose and expression of the faces of two adjacent frames continuous, E_prior is the energy matching the prior knowledge of the three-dimensional face shape parameters and expression parameters, and λ_align, λ_optic, λ_temp are the weights of the corresponding energy terms;

E_align = Σ_{i∈L} || Π(r, t, α, β)_i - p_i ||² + σ Σ_{i∈L_b} || Π(r, t, α, β)_i - p_i ||²

wherein σ is a balance parameter between the boundary energy term and the feature point energy term.

E_align differs from E_land in that E_land uses a fixed correspondence between the image feature points and the three-dimensional face model, whereas the correspondence of E_align changes with the face pose; because the boundary of the three-dimensional face projected onto the two-dimensional image differs for different face poses, E_align can only be applied once the face pose parameters have been computed.

E_prior = w_prior ( ||α||² + ||β||² )    (VIII)

wherein w_prior is the weight of the energy term E_prior.
In the step (1-3), the face expression parameters E_expr of each frame are obtained according to the monocular-vision three-dimensional face reconstruction algorithm as:

E_expr = λ_align (E_align)_i + λ_optic (E_optic)_i + λ_temp (E_temp)_i + (E_prior)_i    (IX)
In the step (2), the three-dimensional face fat-thin adjustment algorithm adopts the reshaping algorithm disclosed by "Deep Shapely Portraits" (In MM '20: The 28th ACM International Conference on Multimedia, 2020, 1800-1808), which deforms the three-dimensional face model according to the input fat-thin scale.
In the step (2), the three-dimensional face model is adjusted based on the three-dimensional face fat-thin adjustment algorithm, and the specific steps of generating the deformed three-dimensional face model of each frame are as follows:
(2-1) solving a three-dimensional average face model under the natural expression by using the natural expression parameters and the three-dimensional face shape parameters;
(2-2) obtaining a deformed face model under a natural expression according to the fat-thin scale input by the user;
(2-3) adding the face expression parameters to the deformed face model under the natural expression for each frame of the video, and outputting the deformed three-dimensional face model with the expression.
In the step (3), a directed distance field is used to establish dense mapping of face boundaries before and after deformation on a two-dimensional plane, and the mapping is adjusted based on a three-dimensional face structure, which specifically comprises the following steps:
(3-1) projecting the three-dimensional face to a two-dimensional image plane by adopting a three-dimensional average face model and face posture parameters, and extracting the projected boundary of the two-dimensional face image;
(3-2) extracting a video frame in the face video, back-projecting each pixel point in the face boundary region of the video frame on the three-dimensional face model reconstructed in the step (1), and finding the corresponding nearest vertex on the reconstructed three-dimensional face model for each pixel point; projecting the vertex of the deformed three-dimensional face model obtained in the step (2) to the plane of a video frame, so that the original pixel point before back projection is converted to a new position, and the original position and the new position are mapped to be used as initial mapping;
(3-3) constructing a directional distance field of a face boundary in the deformed three-dimensional face model in real time by using a GPU;
(3-4) moving the pixel points of the initial mapping obtained in the step (3-2) to the zero-value region of the directed distance field by a gradient optimization method to form the dense mapping;
(3-5) projecting the densely mapped pixel points back onto the deformed three-dimensional face model, and excluding the corresponding mapping relation if a projected pixel point is not located on the cheeks of the deformed three-dimensional face model.
In the step (4), each video frame in the face video is deformed based on dense mapping, and the background distortion of the video frame caused by deformation is reduced by using energy optimization, and the method specifically comprises the following steps:
(4-1) establishing a checkerboard grid near the face region on each video frame before the face model is deformed, finding the grid point on the checkerboard closest to each densely mapped pixel point, and moving those grid points according to the dense mapping relation by the moving least squares deformation method; the moving least squares deformation method is disclosed in "Image Deformation Using Moving Least Squares" (ACM SIGGRAPH 2006 Papers, 2006: 533-540);
(4-2) fixing the grid points closest to the densely mapped pixel points, and moving the other grid points to minimize the energy E, so that the background distortion of the video frame caused by the deformation is minimized:

E = w_l E_l + w_r E_r    (X)

wherein E_l is the energy constraining the grid lines to remain straight and E_r is the energy constraining the area around each grid point to remain constant, defined by formulas (XI) and (XII) respectively (given as images in the original publication); w_l and w_r are the weights of the corresponding energy terms; v_i is the ith grid point, N(i) is the set of all neighboring points of the ith grid point, and e_ij is the unit vector along the v_i - v_j direction.
Compared with the prior art, the invention has the advantages that:
1. the method adjusts the fatness and thinness of the face portrait in a video with high quality;
2. the method adopts a multi-stage optimized three-dimensional face reconstruction algorithm, so that the three-dimensional face reconstruction is efficient and stable;
3. the present invention provides a method of video morphing based on directed distance fields and three-dimensional face structures that does not produce noticeable artifacts.
Drawings
Fig. 1 is a flowchart of a video face fat editing method according to an embodiment of the present invention.
Detailed Description
As shown in fig. 1, the video face fat-thin editing method includes the following steps:
(1) reconstructing a three-dimensional face model based on the face video, and generating the three-dimensional face shape parameters of the face video and the face expression parameters and face pose parameters of each video frame.
(1-1) reconstructing a three-dimensional face model based on a monocular vision three-dimensional face reconstruction algorithm, and calculating a face pose parameter of each video frame in a face video; this step performs rigid pose estimation on the face video.
The monocular-vision three-dimensional face reconstruction algorithm adopts the method disclosed by "A Multiresolution 3D Morphable Face Model and Fitting Framework" (In the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP), Vol. 4, 79-86), and comprises the following steps:
(1-1-1) reconstructing the three-dimensional face model, and defining an optical flow constraint in the time domain of the face video according to formula (I); the optical flow constraint uses optical flow information to correct feature point detection errors in the three-dimensional face model so that the reconstructed model conforms to the detected optical flow changes, and the constraint is defined on the face boundary region:

E_optic = Σ_{i∈L_b} || Π(r, t, α, β)_i - Π(r', t', α, β')_i - U_i ||²    (I)

wherein r and t are the face rotation and translation parameters of the current frame, r' and t' are the face rotation and translation parameters of the previous frame, α is the three-dimensional face shape parameter, β and β' are the face expression parameters of the current and previous frames; Π is the projection operator that projects the face onto the two-dimensional plane according to the face parameters; L_b is the set of feature points on the face boundary; U_i is the optical flow value at the ith point on the two-dimensional plane; and E_optic is the optical flow constraint energy defined at the face boundary.
(1-1-2) fixing the face expression parameters and the three-dimensional face shape parameters, and solving the energy E_pose of formula (II) for the optimal solution to obtain the face pose parameters:

E_pose = λ_land E_land + λ_temp E_temp + λ_optic E_optic    (II)

wherein E_land is the energy matching the projections of the three-dimensional face feature points with the two-dimensional face feature points, E_temp is the energy keeping the face poses of two adjacent frames continuous, and λ_land, λ_temp and λ_optic are the weights of the three energy terms.

E_land = Σ_{i∈L} || Π(r, t, α, β)_i - p_i ||²

wherein p_i is the two-dimensional feature point corresponding to the ith point and L is the set of three-dimensional face feature points.

E_temp = || t - t' ||² + γ || r - r' ||²
Where γ is a parameter that balances the translational and rotational effects.
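To make the structure of this pose solve concrete, the following is a minimal sketch (not the disclosed implementation): with α and β fixed, it minimizes a weighted sum of landmark, temporal and optical-flow terms over the rotation and translation of one frame. The weak-perspective projection, the synthetic data and the weights are illustrative assumptions only.

    # Minimal sketch of the per-frame pose solve (step (1-1-2)), assuming a
    # weak-perspective camera and toy data; the weights, the projection model
    # and the landmark set are illustrative assumptions, not the patent's values.
    import numpy as np
    from scipy.optimize import minimize
    from scipy.spatial.transform import Rotation

    def project(pts3d, r, t, focal=1.0):
        # Rotate (axis-angle r), translate (t) and project to 2D.
        cam = pts3d @ Rotation.from_rotvec(r).as_matrix().T + t
        return focal * cam[:, :2]

    def pose_energy(x, pts3d, lm2d, flow, r_prev, t_prev,
                    w_land=1.0, w_temp=0.1, w_optic=0.5, gamma=1.0):
        r, t = x[:3], x[3:]
        proj = project(pts3d, r, t)
        proj_prev = project(pts3d, r_prev, t_prev)
        e_land = np.sum((proj - lm2d) ** 2)                                      # E_land
        e_temp = np.sum((t - t_prev) ** 2) + gamma * np.sum((r - r_prev) ** 2)   # E_temp
        e_optic = np.sum((proj - proj_prev - flow) ** 2)                         # E_optic
        return w_land * e_land + w_temp * e_temp + w_optic * e_optic

    # Toy data: 20 model points, a known pose to recover, and consistent optical flow.
    rng = np.random.default_rng(0)
    pts3d = rng.normal(size=(20, 3))
    r_true, t_true = np.array([0.10, -0.05, 0.02]), np.array([0.01, 0.02, 0.0])
    r_prev, t_prev = np.zeros(3), np.zeros(3)
    lm2d = project(pts3d, r_true, t_true)
    flow = lm2d - project(pts3d, r_prev, t_prev)

    res = minimize(pose_energy, np.zeros(6),
                   args=(pts3d, lm2d, flow, r_prev, t_prev))
    print("recovered r:", res.x[:3], "t:", res.x[3:])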
(1-2) finding the continuous k frames that best represent the face shape according to the face pose parameters, and performing joint optimization to obtain the three-dimensional face shape parameters, with the following specific steps:
(1-2-1) traversing all video frames in the face video based on the face pose parameters obtained in the step (1-1), finding continuous k frames in which the face faces the camera or is as close to frontal as possible, and cutting off redundant frames so that the k value is less than 10;
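The following is a small sketch of how such a run of near-frontal frames could be selected from the per-frame pose parameters; the axis-angle pose representation and the 10-degree yaw threshold are assumptions for illustration, not values taken from the patent.

    # Sketch of step (1-2-1): pick the longest run of consecutive near-frontal
    # frames (yaw below a threshold) and truncate it so that k stays below 10.
    import numpy as np
    from scipy.spatial.transform import Rotation

    def select_frontal_run(rotvecs, k_max=9, yaw_thresh_deg=10.0):
        yaw = np.array([abs(Rotation.from_rotvec(r).as_euler("yxz", degrees=True)[0])
                        for r in rotvecs])
        frontal = yaw < yaw_thresh_deg
        best, start = (0, 0), None
        for i, ok in enumerate(np.append(frontal, False)):   # sentinel closes the last run
            if ok and start is None:
                start = i
            elif not ok and start is not None:
                if i - start > best[1] - best[0]:
                    best = (start, i)
                start = None
        s, e = best
        return s, min(e, s + k_max)   # cut redundant frames, keeping k < 10

    # Toy poses: 30 frames, nearly frontal between frames 12 and 24.
    rng = np.random.default_rng(1)
    rotvecs = rng.normal(scale=0.5, size=(30, 3))
    rotvecs[12:25] *= 0.05
    print(select_frontal_run(rotvecs))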
(1-2-2) estimating the three-dimensional face shape according to the energy equation of formula (V), jointly optimizing over the continuous k frames that best represent the face shape to obtain the three-dimensional face shape parameters:

E_identity = Σ_{i=1}^{k} [ λ_align (E_align)_i + λ_optic (E_optic)_i + λ_temp (E_temp)_i + (E_prior)_i ]    (V)

wherein E_align is the energy matching the three-dimensional face feature points and three-dimensional face boundary points with the two-dimensional face feature points and two-dimensional face boundary points, E_temp is the energy keeping the pose and expression of the faces of two adjacent frames continuous, E_prior is the energy matching the prior knowledge of the three-dimensional face shape parameters and expression parameters, and λ_align, λ_optic, λ_temp are the weights of the corresponding energy terms.

E_align = Σ_{i∈L} || Π(r, t, α, β)_i - p_i ||² + σ Σ_{i∈L_b} || Π(r, t, α, β)_i - p_i ||²

wherein σ is a balance parameter between the boundary energy term and the feature point energy term.

E_align differs from E_land in that E_land uses a fixed correspondence between the image feature points and the three-dimensional face model, whereas the correspondence of E_align changes with the face pose; because the boundary of the three-dimensional face projected onto the two-dimensional image differs for different face poses, E_align can only be applied once the face pose parameters have been obtained.

E_prior = w_prior ( ||α||² + ||β||² )    (VIII)

wherein w_prior is the weight of the energy term E_prior.
(1-3) taking the three-dimensional face shape parameters as known conditions, the face expression parameters E_expr of each frame are obtained according to the monocular-vision three-dimensional face reconstruction algorithm as:

E_expr = λ_align (E_align)_i + λ_optic (E_optic)_i + λ_temp (E_temp)_i + (E_prior)_i    (IX)
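As an illustration of the staged optimization in steps (1-2) and (1-3), the sketch below fits a toy linear face model (mean plus shape basis times α plus expression basis times β): the shared shape parameters α are solved jointly over k frames first, and the per-frame expression parameters β are then solved with α fixed. The linear model, its dimensions and the use of plain least squares are simplifying assumptions; the patent's energies additionally contain the boundary, temporal, optical-flow and prior terms described above.

    # Toy two-stage fit: one shared alpha over k frames, then per-frame betas.
    import numpy as np

    rng = np.random.default_rng(2)
    n_vert, n_shape, n_expr, k = 50, 4, 3, 6
    mean = rng.normal(size=n_vert * 3)
    shape_basis = rng.normal(size=(n_vert * 3, n_shape))
    expr_basis = rng.normal(size=(n_vert * 3, n_expr))

    # Synthetic observations: one alpha for the clip, one beta per frame, plus noise.
    alpha_true = rng.normal(size=n_shape)
    beta_true = rng.normal(scale=0.3, size=(k, n_expr))
    obs = np.stack([mean + shape_basis @ alpha_true + expr_basis @ b
                    + rng.normal(scale=0.01, size=n_vert * 3) for b in beta_true])

    # Stage 1 (cf. E_identity): solve alpha jointly over all k frames, treating the
    # per-frame betas as additional unknowns of the same linear system.
    A = np.hstack([np.tile(shape_basis, (k, 1)), np.kron(np.eye(k), expr_basis)])
    sol, *_ = np.linalg.lstsq(A, (obs - mean).ravel(), rcond=None)
    alpha = sol[:n_shape]

    # Stage 2 (cf. E_expr): with alpha fixed, solve each frame's beta independently.
    betas = np.array([np.linalg.lstsq(expr_basis, obs[i] - mean - shape_basis @ alpha,
                                      rcond=None)[0] for i in range(k)])

    print("max alpha error:", np.abs(alpha - alpha_true).max())
    print("max beta  error:", np.abs(betas - beta_true).max())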
(2) adjusting the three-dimensional face model based on the three-dimensional face fat-thin adjustment algorithm, transferring the adjustment result to each video frame, and generating the deformed three-dimensional face model of each video frame, with the following specific steps:

The three-dimensional face fat-thin adjustment algorithm adopts the reshaping algorithm disclosed by "Deep Shapely Portraits" (In MM '20: The 28th ACM International Conference on Multimedia, 2020, 1800-1808), which deforms the three-dimensional face model according to the input fat-thin scale.
(2-1) solving a three-dimensional average face model under the natural expression by using the natural expression parameters and the three-dimensional face shape parameters;
(2-2) obtaining a deformed face model under a natural expression according to the fat-thin scale input by the user;
(2-3) adding the face expression parameters to the deformed face model under the natural expression for each frame of the video, and outputting the deformed three-dimensional face model with the expression.
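The data flow of step (2) can be sketched as follows: the neutral face is computed once from α, deformed once according to the fat-thin scale, and the per-frame expression offsets are then added back. The cheek-scaling placeholder used below merely stands in for the fat-thin adjustment algorithm cited above; the linear model and its dimensions are likewise illustrative assumptions.

    # Sketch of step (2): deform the neutral face once, then re-apply each
    # frame's expression offset to obtain a deformed mesh per frame.
    import numpy as np

    rng = np.random.default_rng(3)
    n_vert, n_shape, n_expr, k = 100, 5, 4, 8
    mean = rng.normal(size=(n_vert, 3))
    shape_basis = rng.normal(scale=0.1, size=(n_shape, n_vert, 3))
    expr_basis = rng.normal(scale=0.05, size=(n_expr, n_vert, 3))

    alpha = rng.normal(size=n_shape)        # shared shape parameters from step (1)
    betas = rng.normal(size=(k, n_expr))    # per-frame expression parameters

    def neutral_face(alpha):
        # (2-1) three-dimensional face under the natural (zero) expression.
        return mean + np.tensordot(alpha, shape_basis, axes=1)

    def reshape_fatness(verts, scale):
        # (2-2) placeholder deformation: scale vertices horizontally with the
        # fat-thin scale; a stand-in for the cited reshaping algorithm.
        out = verts.copy()
        out[:, 0] *= 1.0 + 0.1 * scale
        return out

    neutral = neutral_face(alpha)
    deformed_neutral = reshape_fatness(neutral, scale=-1.0)   # user-chosen scale

    # (2-3) add each frame's expression back onto the deformed neutral face.
    deformed_meshes = [deformed_neutral + np.tensordot(b, expr_basis, axes=1)
                       for b in betas]
    print(len(deformed_meshes), deformed_meshes[0].shape)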
(3) establishing the dense mapping of the face boundary before and after deformation on the two-dimensional plane using a directed distance field, and adjusting the dense mapping based on the structure of the three-dimensional face model, with the following specific steps:
and (3-1) projecting the three-dimensional face to a two-dimensional image plane by adopting the three-dimensional average face model and the face posture parameters obtained in the step (2-1), and extracting the projected boundary of the two-dimensional face image.
(3-2) extracting the original image of each video frame, back-projecting every pixel point in the face boundary region of the original image onto the three-dimensional face model reconstructed in the step (1), finding the nearest vertex on the three-dimensional face model before deformation for each pixel point, and re-projecting the corresponding vertex of the deformed three-dimensional face model onto the plane of the original image, so that each original pixel point is moved to a new position; the mapping from the original positions to the new positions is taken as the initial mapping.
If the found vertex does not fall on the cheek but on the nose, mouth or another region (for a side face, the nose may occlude the original cheek boundary), the vertex is not taken into account in the subsequent calculation.
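The construction of the initial mapping in step (3-2) can be sketched as follows. For brevity the nearest vertex is found in the image plane with a KD-tree rather than by true back-projection onto the mesh, and the orthographic projection and toy data are assumptions for illustration.

    # Sketch of the initial mapping (step (3-2)): each boundary-region pixel is
    # paired with its nearest mesh vertex and mapped to where that same vertex
    # projects after deformation.
    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(4)
    verts_orig = rng.uniform(-1, 1, size=(200, 3))            # reconstructed mesh
    verts_def = verts_orig + np.array([0.05, 0.0, 0.0])       # deformed mesh

    def project(verts, scale=100.0, center=(128.0, 128.0)):
        # Simple orthographic projection of mesh vertices into pixel coordinates.
        return verts[:, :2] * scale + np.asarray(center)

    proj_orig, proj_def = project(verts_orig), project(verts_def)

    # Pixels sampled from the face-boundary region of the video frame (toy data).
    pixels = rng.uniform(28, 228, size=(500, 2))

    # Nearest original vertex per pixel, then the same vertex after deformation.
    _, idx = cKDTree(proj_orig).query(pixels)
    initial_mapping = np.stack([pixels, proj_def[idx]], axis=1)   # (old_xy, new_xy)
    print(initial_mapping.shape)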
(3-3) transmitting the positions of the deformed face boundary points to the GPU; in a compute shader, the pixel points are processed in parallel, and for each pixel the face boundary points are traversed to find and store the nearest distance to that pixel.
(3-4) moving the pixel points of the initial mapping obtained in the step (3-2) to the zero-value region of the directed distance field by a gradient optimization method to form the dense mapping, with the following specific steps:

(3-4-1) computing, for each pixel point in the initial mapping, the gradient of the directed distance field at the same position, and shifting the point slightly along the gradient direction;

(3-4-2) iterating the step (3-4-1) until the gradient is 0 or the point reaches the zero-value region of the directed distance field, so that every pixel point in the initial mapping is moved onto the deformed boundary line.
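Steps (3-3) and (3-4) can be sketched as below: a brute-force distance field to the deformed boundary (computed per pixel, as the compute shader would do, but in plain NumPy here) and a few gradient steps that move each initially mapped pixel onto the zero-distance region. The unsigned distance to a boundary polyline and the elliptical toy boundary are simplifying assumptions standing in for the directed distance field described above.

    # Sketch of steps (3-3)/(3-4): brute-force distance field to the deformed
    # boundary and gradient descent that projects points onto its zero set.
    import numpy as np

    # Deformed face boundary as a dense polyline (toy data: an ellipse).
    t = np.linspace(0.0, 2.0 * np.pi, 400)
    boundary = np.stack([120 + 60 * np.cos(t), 128 + 80 * np.sin(t)], axis=1)

    def distance_field(points):
        # Distance from each point to the nearest boundary sample (per point,
        # mirroring the per-pixel loop of the compute shader).
        d = np.linalg.norm(points[:, None, :] - boundary[None, :, :], axis=-1)
        return d.min(axis=1)

    def project_to_boundary(points, step=0.5, iters=200, eps=1e-2, h=1e-2):
        # Move points along the negative numerical gradient of the distance
        # field until the distance is (close to) zero.
        pts = points.astype(float).copy()
        for _ in range(iters):
            d = distance_field(pts)
            if d.max() < eps:
                break
            gx = (distance_field(pts + [h, 0]) - distance_field(pts - [h, 0])) / (2 * h)
            gy = (distance_field(pts + [0, h]) - distance_field(pts - [0, h])) / (2 * h)
            pts -= step * d[:, None] * np.stack([gx, gy], axis=1)
        return pts

    rng = np.random.default_rng(5)
    mapped_pixels = rng.uniform(40, 216, size=(100, 2))
    dense = project_to_boundary(mapped_pixels)
    print("max residual distance:", distance_field(dense).max())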
(3-5) projecting the densely mapped pixel points back onto the deformed three-dimensional face model, and excluding the corresponding mapping relation if a projected pixel point is not located on the cheeks of the deformed three-dimensional face model.
(4) deforming the video frames of the face video of the step (1) based on the dense mapping, using energy optimization to reduce the background distortion of the video frames caused by the deformation, obtaining the deformed video frames, and replacing the corresponding frames of the original face video with them, with the following specific steps:
(4-1) establishing a checkerboard grid near the face region in each video frame of the face video of the step (1), finding the grid point on the checkerboard closest to each densely mapped pixel point, and moving those grid points according to the dense mapping relation by the moving least squares deformation method; the moving least squares deformation method is disclosed in "Image Deformation Using Moving Least Squares" (ACM SIGGRAPH 2006 Papers, 2006: 533-540);
(4-2) fixing the grid point closest to the densely mapped pixel point, and moving other grid points to minimize the energy E, so that the background distortion influence of the video frame caused by deformation is minimized:
E = w_l E_l + w_r E_r    (X)

wherein E_l is the energy constraining the grid lines to remain straight and E_r is the energy constraining the area around each grid point to remain constant, defined by formulas (XI) and (XII) respectively (given as images in the original publication); w_l and w_r are the weights of the corresponding energy terms; v_i is the ith grid point, N(i) is the set of all neighboring points of the ith grid point, and e_ij is the unit vector along the v_i - v_j direction.
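To illustrate the structure of the optimization in step (4-2), the sketch below fixes the grid points nearest to the densely mapped pixels at their new positions and solves for the remaining points by least squares. The concrete residuals used here (a second-difference straightness term standing in for E_l and an edge-length preservation term standing in for E_r) are assumptions in place of formulas (XI) and (XII), and the grid size, constraints and weights are illustrative only.

    # Sketch of step (4-2): constrained grid points keep their mapped positions;
    # the free points minimize a weighted sum of a line-straightness term and an
    # edge-length preservation term (assumed stand-ins for E_l and E_r).
    import numpy as np
    from scipy.optimize import least_squares

    H, W = 8, 8                                        # checkerboard resolution
    ys, xs = np.mgrid[0:H, 0:W]
    rest = np.stack([xs, ys], axis=-1).astype(float).reshape(-1, 2) * 20.0

    # Toy constraints: the two central columns are pushed inward ("thinner" face).
    fixed_mask = np.zeros(H * W, dtype=bool)
    fixed_pos = rest.copy()
    for col, dx in ((W // 2 - 1, +5.0), (W // 2, -5.0)):
        idx = np.arange(H) * W + col
        fixed_mask[idx] = True
        fixed_pos[idx, 0] += dx

    edges = [(r * W + c, r * W + c + 1) for r in range(H) for c in range(W - 1)] + \
            [(r * W + c, (r + 1) * W + c) for r in range(H - 1) for c in range(W)]
    triples = [(r * W + c - 1, r * W + c, r * W + c + 1)
               for r in range(H) for c in range(1, W - 1)] + \
              [((r - 1) * W + c, r * W + c, (r + 1) * W + c)
               for r in range(1, H - 1) for c in range(W)]
    rest_len = {e: np.linalg.norm(rest[e[0]] - rest[e[1]]) for e in edges}
    free_idx = np.flatnonzero(~fixed_mask)

    def residuals(x, w_l=1.0, w_r=0.5):
        v = fixed_pos.copy()
        v[free_idx] = x.reshape(-1, 2)
        r_l = [np.sqrt(w_l) * (v[a] - 2 * v[b] + v[c]) for a, b, c in triples]
        r_r = [np.sqrt(w_r) * (np.linalg.norm(v[a] - v[b]) - rest_len[(a, b)])
               for a, b in edges]
        return np.concatenate([np.concatenate(r_l), np.asarray(r_r)])

    sol = least_squares(residuals, rest[free_idx].ravel())
    warped = fixed_pos.copy()
    warped[free_idx] = sol.x.reshape(-1, 2)
    print("largest grid-point displacement:", np.abs(warped - rest).max())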

Claims (7)

1. A video face fat and thin editing method is characterized by comprising the following steps:
(1) reconstructing a three-dimensional face model based on the face video, and generating three-dimensional face shape parameters in the face video and face expression parameters and face posture parameters of each video frame;
(2) adjusting a three-dimensional face model based on a three-dimensional face fat-thin adjustment algorithm, transferring an adjustment result to each video frame, and generating a deformed three-dimensional face model of each video frame, wherein the method comprises the following specific steps of:
(2-1) solving a three-dimensional average face model under the natural expression by using the natural expression parameters and the three-dimensional face shape parameters;
(2-2) obtaining a deformed face model under a natural expression according to the fat-thin scale input by the user;
(2-3) adding the facial expression parameters to the deformed facial model under the natural expression aiming at each video frame of the video, and outputting the deformed three-dimensional facial model with the expression;
(3) the method comprises the following steps of establishing dense mapping of human face boundaries before and after deformation on a two-dimensional plane by using a directed distance field, and adjusting the dense mapping based on the structure of a three-dimensional human face model, wherein the method specifically comprises the following steps:
(3-1) projecting the three-dimensional face to a two-dimensional image plane by adopting a three-dimensional average face model and face posture parameters, and extracting the projected boundary of the two-dimensional face image;
(3-2) extracting a video frame in the face video, back-projecting each pixel point in the face boundary region of the video frame on the three-dimensional face model reconstructed in the step (1), finding the corresponding nearest vertex on the reconstructed three-dimensional face model for each pixel point, projecting the vertex of the deformed three-dimensional face model obtained in the step (2) to the plane of the video frame, so that the original pixel point before back projection is converted to a new position, and taking the mapping of the original position and the new position as initial mapping;
(3-3) constructing a directional distance field of a face boundary in the deformed three-dimensional face model in real time by using a GPU;
(3-4) moving the pixel points of the initial mapping obtained in the step (3-2) to a region with a median of 0 in the directed distance field by adopting a gradient optimization method to form dense mapping;
(3-5) projecting the densely mapped pixel points back to the deformed three-dimensional face model, and if the projected pixel points are not positioned on the cheeks of the deformed three-dimensional face model, excluding the corresponding mapping relation;
(4) deforming the face video frames of the step (1) based on the dense mapping of the step (3), using energy optimization to reduce the background distortion of the video frames caused by the deformation to obtain the deformed face video frames, and replacing the corresponding frames of the original face video with them.
2. The video face fat-thin editing method of claim 1, wherein in the step (1), the three-dimensional face model is reconstructed to generate the three-dimensional face shape parameters of the face video and the face expression parameters and face pose parameters of each video frame, with the following specific steps:
(1-1) reconstructing a three-dimensional face model based on a monocular vision three-dimensional face reconstruction algorithm, and calculating face posture parameters of each video frame in a face video;
(1-2) finding out continuous k frames which are suitable for representing the shape of the face in the face video according to the face posture parameters, and performing joint optimization to obtain three-dimensional face shape parameters, wherein the k value is less than 10;
(1-3) taking the three-dimensional face shape parameters as known conditions, and obtaining the face expression parameters of each video frame according to the monocular-vision three-dimensional face reconstruction algorithm.
3. The video face fat-thin editing method of claim 2, wherein in the step (1-1), the three-dimensional face model is reconstructed based on a monocular-vision three-dimensional face reconstruction algorithm, and the face pose parameters of each video frame in the face video are calculated, with the following specific steps:
(1-1-1) reconstructing a three-dimensional face model, defining an optical flow constraint in a time domain of a face video, wherein the optical flow constraint utilizes optical flow information to correct errors of feature point detection in the three-dimensional face model so that the three-dimensional face model conforms to detected optical flow changes, and the optical flow constraint is defined in a face boundary area:
E_optic = Σ_{i∈L_b} || Π(r, t, α, β)_i - Π(r', t', α, β')_i - U_i ||²

wherein r and t are the face rotation and translation parameters of the current frame, r' and t' are the face rotation and translation parameters of the previous frame, α is the estimated three-dimensional face shape parameter, β and β' are the estimated face expression parameters of the current and previous frames; Π is the projection operator that projects the face onto the two-dimensional plane according to the face parameters; L_b is the set of feature points on the face boundary; U_i is the optical flow value at the ith point on the two-dimensional plane; and E_optic is the optical flow constraint energy defined at the face boundary;
(1-1-2) fixing the estimated face expression parameters and the estimated three-dimensional face shape parameters, and solving for the optimal solution to obtain the face pose parameters:

E_pose = λ_land E_land + λ_temp E_temp + λ_optic E_optic

wherein E_land is the energy matching the projections of the three-dimensional face feature points with the two-dimensional face feature points, E_temp is the energy keeping the face poses of two adjacent frames continuous, and λ_land, λ_temp and λ_optic are the weights of the three energy terms.
4. The video face fat-thin editing method of claim 3, wherein in the step (1-2), the three-dimensional face shape parameters are obtained by finding the continuous k frames suitable for representing the face shape according to the face pose parameters and performing joint optimization, with the following specific steps:
(1-2-1) traversing the video frames based on the face pose parameters obtained in the step (1-1) to obtain continuous k frames in which the face faces the camera, and cutting off redundant frames if the k value is greater than or equal to 10;
(1-2-2) according to the energy equation of formula (III), jointly solving the energy equation over the continuous k frames to obtain the three-dimensional face shape parameters:

E_identity = Σ_{i=1}^{k} [ λ_align (E_align)_i + λ_optic (E_optic)_i + λ_temp (E_temp)_i + (E_prior)_i ]    (III)

wherein E_align is the energy matching the three-dimensional face feature points and three-dimensional face boundary points with the two-dimensional face feature points and two-dimensional face boundary points, E_temp is the energy keeping the pose and expression of the faces of two adjacent frames continuous, E_prior is the energy matching the prior knowledge of the three-dimensional face shape parameters and expression parameters, E_optic is the optical flow constraint energy defined at the face boundary, and λ_align, λ_optic, λ_temp are the weights of the corresponding energy terms.
5. The video face fat-thin editing method of claim 4, wherein the energy E_align matching the three-dimensional face feature points and three-dimensional face boundary points with the two-dimensional face feature points and two-dimensional face boundary points is:

E_align = Σ_{i∈L} || Π(r, t, α, β)_i - p_i ||² + σ Σ_{i∈L_b} || Π(r, t, α, β)_i - p_i ||²

wherein r and t are the face rotation and translation parameters of the current frame, α is the estimated three-dimensional face shape parameter, and β is the estimated face expression parameter of the current frame; σ is a balance parameter between the boundary energy term and the feature point energy term, p_i is the feature point corresponding to the ith point, L is the set of three-dimensional face feature points, and L_b is the set of feature points on the face boundary.
6. The video face fat-thin editing method of claim 2, wherein in the step (1-3), the face expression parameters E_expr of each frame are obtained according to the monocular-vision three-dimensional face reconstruction algorithm as:

E_expr = λ_align (E_align)_i + λ_optic (E_optic)_i + λ_temp (E_temp)_i + (E_prior)_i    (IX)

wherein E_expr denotes the face expression parameters, E_align is the energy matching the three-dimensional face feature points and three-dimensional face boundary points with the two-dimensional face feature points and two-dimensional face boundary points, E_temp is the energy keeping the pose and expression of the faces of two adjacent frames continuous, E_prior is the energy matching the prior knowledge of the three-dimensional face shape parameters and expression parameters, E_optic is the optical flow constraint energy defined at the face boundary, and λ_align, λ_optic, λ_temp are the weights of the corresponding energy terms.
7. The video face fat-thin editing method of claim 1, wherein in the step (4), each video frame in the face video is deformed based on the dense mapping, and energy optimization is used to reduce the background distortion of the video frame caused by the deformation, with the following specific steps:
(4-1) establishing a checkerboard grid near the face region on each video frame before the face model is deformed, finding out a grid point on the checkerboard closest to a densely mapped pixel point, and moving the grid point by adopting a moving least square deformation method according to the dense mapping relation;
(4-2) fixing the grid point closest to the densely mapped pixel point, and moving other grid points to minimize the energy E, so that the background distortion influence of the video frame caused by deformation is minimized:
E = w_l E_l + w_r E_r    (X)

wherein E_l is the energy constraining the grid lines to remain straight and E_r is the energy constraining the area around each grid point to remain constant, defined by formulas (XI) and (XII) respectively (given as images in the original publication); w_l and w_r are the weights of the corresponding energy terms; v_i is the ith grid point, N(i) is the set of all neighboring points of the ith grid point, and e_ij is the unit vector along the v_i - v_j direction.
CN202110538233.7A 2021-05-18 2021-05-18 Video face fat and thin editing method Active CN113223188B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110538233.7A CN113223188B (en) 2021-05-18 2021-05-18 Video face fat and thin editing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110538233.7A CN113223188B (en) 2021-05-18 2021-05-18 Video face fat and thin editing method

Publications (2)

Publication Number Publication Date
CN113223188A CN113223188A (en) 2021-08-06
CN113223188B true CN113223188B (en) 2022-05-27

Family

ID=77093047

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110538233.7A Active CN113223188B (en) 2021-05-18 2021-05-18 Video face fat and thin editing method

Country Status (1)

Country Link
CN (1) CN113223188B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103208133A (en) * 2013-04-02 2013-07-17 浙江大学 Method for adjusting face plumpness in image

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7755619B2 (en) * 2005-10-13 2010-07-13 Microsoft Corporation Automatic 3D face-modeling from video
CN107944367B (en) * 2017-11-16 2021-06-01 北京小米移动软件有限公司 Face key point detection method and device
CN108596827B (en) * 2018-04-18 2022-06-17 太平洋未来科技(深圳)有限公司 Three-dimensional face model generation method and device and electronic equipment
CN112348937A (en) * 2019-08-09 2021-02-09 华为技术有限公司 Face image processing method and electronic equipment
CN110796083B (en) * 2019-10-29 2023-07-04 腾讯科技(深圳)有限公司 Image display method, device, terminal and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103208133A (en) * 2013-04-02 2013-07-17 浙江大学 Method for adjusting face plumpness in image

Also Published As

Publication number Publication date
CN113223188A (en) 2021-08-06

Similar Documents

Publication Publication Date Title
CN109584353B (en) Method for reconstructing three-dimensional facial expression model based on monocular video
Achenbach et al. Fast generation of realistic virtual humans
US10089522B2 (en) Head-mounted display with facial expression detecting capability
CN109377557B (en) Real-time three-dimensional face reconstruction method based on single-frame face image
Thies et al. Real-time expression transfer for facial reenactment.
CN106920274B (en) Face modeling method for rapidly converting 2D key points of mobile terminal into 3D fusion deformation
JP4733318B2 (en) Method and system for animating facial features and method and system for facial expression transformation
CN110096156B (en) Virtual reloading method based on 2D image
US10311624B2 (en) Single shot capture to animated vr avatar
US6047078A (en) Method for extracting a three-dimensional model using appearance-based constrained structure from motion
WO2016029768A1 (en) 3d human face reconstruction method and apparatus
WO2019219012A1 (en) Three-dimensional reconstruction method and device uniting rigid motion and non-rigid deformation
JP4950787B2 (en) Image processing apparatus and method
JP2011170891A (en) Facial image processing method and system
US11494963B2 (en) Methods and systems for generating a resolved threedimensional (R3D) avatar
CN113628327A (en) Head three-dimensional reconstruction method and equipment
WO2021063271A1 (en) Human body model reconstruction method and reconstruction system, and storage medium
CN111950430A (en) Color texture based multi-scale makeup style difference measurement and migration method and system
CN111028354A (en) Image sequence-based model deformation human face three-dimensional reconstruction scheme
Song et al. A generic framework for efficient 2-D and 3-D facial expression analogy
CN106909904B (en) Human face obverse method based on learnable deformation field
CN111640172A (en) Attitude migration method based on generation of countermeasure network
CN113223188B (en) Video face fat and thin editing method
CN110648394B (en) Three-dimensional human body modeling method based on OpenGL and deep learning
CN112116699A (en) Real-time real-person virtual trial sending method based on 3D face tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant