CN109035388B - Three-dimensional face model reconstruction method and device

Info

Publication number: CN109035388B
Application number: CN201810690747.2A
Authority: CN (China)
Prior art keywords: model, dimensional, coefficients, dimensional face, point
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN109035388A
Inventor: 户磊
Current assignee: Hefei Dilusense Technology Co Ltd
Original assignee: Hefei Dilusense Technology Co Ltd
Application filed by Hefei Dilusense Technology Co Ltd
Priority to CN201810690747.2A
Publication of CN109035388A
Application granted
Publication of CN109035388B

Classifications

    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T15/04 Texture mapping
    • G06T2207/10024 Color image
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/30201 Face
    • Y02T10/40 Engine management systems

Abstract

The embodiment of the application discloses a three-dimensional face model reconstruction method and device, which can improve the accuracy and speed of three-dimensional face model reconstruction. The method comprises the following steps: S1, acquiring a face depth map and a face color map to be processed, inputting the face depth map and the face color map to be processed into a pre-trained coarse learning network to obtain parameterized three-dimensional face model coefficients, and determining an initial three-dimensional face model according to the parameterized three-dimensional face model coefficients; S2, converting the initial three-dimensional face model and the brightness values on the face color map into a 4-channel UV image, and inputting the UV image into a pre-trained fine learning network to obtain the offset of each grid point of the three-dimensional face model to be reconstructed along its normal direction; S3, reconstructing the three-dimensional face model according to the parameterized three-dimensional face model coefficients and the offsets of the grid points along their normal directions.

Description

Three-dimensional face model reconstruction method and device
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to a three-dimensional face model reconstruction method and device.
Background
There are three main methods for three-dimensional face reconstruction: manual modeling, instrument-based acquisition, and image-based modeling with little or no user interaction. Manual modeling, the earliest three-dimensional modeling approach, is still widely used today. It generally requires an experienced professional working with software such as Autodesk Maya or 3DMax. Because manual modeling requires a great deal of manpower and time, three-dimensional face modeling instruments have long been studied and developed as a more convenient method. Typical examples are precision three-dimensional acquisition instruments based on structured light and laser scanning techniques, and depth cameras. The precision of three-dimensional models acquired by such precise instruments can reach the millimeter level; these models are true three-dimensional data of an object and can provide an evaluation database for image-based modeling methods. However, the equipment is generally expensive, requires professional training, and is not suitable for the consumer market. Recently, depth cameras such as Microsoft Kinect, Intel RealSense and PrimeSense have appeared on the market, and researchers can reconstruct three-dimensional models using the depth information they acquire. Depth cameras are cheaper and easier to use than precision acquisition instruments, but they are still less common than RGB cameras. Image-based modeling techniques reconstruct a three-dimensional face model from multiple face images or a single face image. Compared with face modeling instruments, image-based modeling only needs face pictures acquired by a conventional RGB camera, so its application scenarios are much wider.
Because the human face has many commonalities, such as a fixed number of eyes, mouths, noses and ears with essentially fixed relative positions, a parameterized model of the human face can be built, and a complex three-dimensional human face can be parameterized into a low-dimensional space. Traditional image-based three-dimensional face modeling techniques generally take a parameterized model as a prior and optimize the coefficients of the parameterized model using face key point information and color information. However, these methods have some problems: optimization based on key point information uses only sparse key points, so the three-dimensional reconstruction precision is low; color-based optimization is computationally time-consuming and relatively sensitive to illumination.
Disclosure of Invention
Aiming at the defects and shortcomings existing in the prior art, the embodiment of the application provides a three-dimensional face model reconstruction method and device.
In one aspect, an embodiment of the present application provides a method for reconstructing a three-dimensional face model, including:
s1, acquiring a face depth map and a face color map to be processed, inputting the face depth map and the face color map to be processed into a pre-trained coarse learning network to obtain parameterized three-dimensional face model coefficients, and determining an initial three-dimensional face model according to the parameterized three-dimensional face model coefficients;
s2, converting the brightness values on the initial three-dimensional face model and the face color image into 4-channel UV images, and inputting the UV images into a pre-trained fine learning network to obtain the offset of each grid point of the three-dimensional face model to be reconstructed in the direction of the normal line;
s3, reconstructing a three-dimensional face model according to the parameterized three-dimensional face model coefficient and the offset of each grid point in the normal direction.
On the other hand, an embodiment of the present application provides a three-dimensional face model reconstruction device, including:
the first input unit is used for acquiring a face depth map and a face color map to be processed, inputting the face depth map and the face color map to be processed into a pre-trained coarse learning network to obtain parameterized three-dimensional face model coefficients, and determining an initial three-dimensional face model according to the parameterized three-dimensional face model coefficients;
the second input unit is used for converting the brightness values on the initial three-dimensional face model and the face color image into 4-channel UV images, inputting the UV images into a pre-trained fine learning network, and obtaining the offset of each grid point of the three-dimensional face model to be reconstructed in the direction of the normal line of each grid point;
and the reconstruction unit is used for reconstructing the three-dimensional face model according to the parameterized three-dimensional face model coefficient and the offset of each grid point in the normal direction.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory, a bus, and a computer program stored on the memory and executable on the processor;
the processor and the memory complete communication with each other through the bus;
the processor implements the above method when executing the computer program.
In a fourth aspect, embodiments of the present application provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described method.
According to the three-dimensional face model reconstruction method and device, the face depth map and the face color map to be processed are obtained and input into a pre-trained coarse learning network to obtain parameterized three-dimensional face model coefficients, and an initial three-dimensional face model is determined according to the parameterized three-dimensional face model coefficients; the initial three-dimensional face model and the brightness values on the face color map are converted into a 4-channel UV image, and the UV image is input into a pre-trained fine learning network to obtain the offset of each grid point of the three-dimensional face model to be reconstructed along its normal direction; and the three-dimensional face model is reconstructed according to the parameterized three-dimensional face model coefficients and the offsets of the grid points along their normal directions. In this way, the complete face image information is used, a complicated and time-consuming optimization process is avoided, and key point information of the input face image does not need to be detected in advance, so the accuracy and speed of three-dimensional face model reconstruction can be improved.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of a three-dimensional face model reconstruction method according to the present application;
FIG. 2 is a schematic structural diagram of an embodiment of a three-dimensional face model reconstruction device according to the present application;
fig. 3 is a schematic entity structure diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the embodiments of the present disclosure.
Referring to fig. 1, the embodiment discloses a three-dimensional face model reconstruction method, which includes:
s1, acquiring a face depth map and a face color map to be processed, inputting the face depth map and the face color map to be processed into a pre-trained coarse learning network to obtain parameterized three-dimensional face model coefficients, and determining an initial three-dimensional face model according to the parameterized three-dimensional face model coefficients;
s2, converting a model xyz coordinate obtained by the parameterized three-dimensional face model and a brightness value on a corresponding color image into a 4-channel UV image, inputting the UV image into a pre-trained fine learning network, and obtaining the offset of each grid point of the three-dimensional face model to be reconstructed in the direction of the normal line of each grid point;
s3, reconstructing a three-dimensional face model according to the parameterized three-dimensional face model coefficient and the offset of each grid point in the normal direction.
According to the three-dimensional face model reconstruction method provided by the embodiment of the application, the face depth map and the face color map to be processed are obtained and input into a pre-trained coarse learning network to obtain parameterized three-dimensional face model coefficients, and an initial three-dimensional face model is determined according to the parameterized three-dimensional face model coefficients; the initial three-dimensional face model and the brightness values on the face color map are converted into a 4-channel UV image, and the UV image is input into a pre-trained fine learning network to obtain the offset of each grid point of the three-dimensional face model to be reconstructed along its normal direction; and the three-dimensional face model is reconstructed according to the parameterized three-dimensional face model coefficients and the offsets of the grid points along their normal directions. In this way, the complete face image information is used, a complicated and time-consuming optimization process is avoided, and key point information of the input face image does not need to be detected in advance, so the accuracy and speed of three-dimensional face model reconstruction can be improved.
On the basis of the foregoing method embodiment, before S1, the method may further include:
training the coarse learning network, wherein the loss function of the coarse learning network is:
E_loss = E_geo + w_col × E_col + w_lan × E_lan + w_reg × E_reg + w_flo × E_flo + w_sam × E_sam,
The first term E_geo makes the geometric information of the reconstructed model consistent with the depth map; the second term E_col makes the synthesized texture consistent with the actual texture; the third term E_lan makes the key points on the reconstructed model, after projection onto the color map, close to the actually detected key points; the fourth term E_reg prevents the regressed face coefficients from producing implausible shapes and reflectances; the fifth term E_flo makes the motion of the model between two adjacent frames consistent with the optical flow between the color images; and the last term E_sam makes the identity coefficients and reflectance coefficients of any two frames of the same person identical. During training, a pair of adjacent-frame pictures and a pair of non-adjacent-frame pictures of the same person are sampled simultaneously. w_col, w_lan, w_reg, w_flo and w_sam are weights for adjusting the corresponding terms.
Each of the above formulas is described in detail below.
E_geo(χ) = w_pp × E_pp(χ) + w_ps × E_ps(χ),
Here χ = {α_id, α_exp, α_alb, z, pitch, yaw, roll, t, γ} represents the parameterized three-dimensional face model coefficients. α_id refers to the identity base coefficients, α_exp to the expression base coefficients, and α_alb to the reflectance base coefficients; pitch represents the Euler angle of rotation about the X axis, yaw the Euler angle of rotation about the Y axis, and roll the Euler angle of rotation about the Z axis; t = (t_x, t_y, t_z) is the translation vector, where t_x, t_y and t_z are the translations along the X, Y and Z axes, respectively; and γ = (γ_r, γ_g, γ_b), where γ_r, γ_g and γ_b are the illumination coefficients of the picture on the r, g and b channels, respectively.
Record the depth value z at coordinate (m, n) of the depth map. The pixel point is converted from picture coordinates (m, n) to point cloud coordinates (wx, wy, wz) according to: wx = (n - cx)/fx · z, wy = -(m - cy)/fy · z, wz = -z,
where fx and fy are the focal lengths of the depth camera in the x and y directions, respectively, and cx and cy are the optical centers of the depth camera in the x and y directions, respectively.
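For illustration only (not part of the claimed method), a minimal numpy sketch of this pixel-to-point-cloud conversion is given below; the intrinsic values fx, fy, cx, cy are placeholders that would come from the actual depth camera calibration.

```python
import numpy as np

def depth_map_to_point_cloud(depth, fx, fy, cx, cy):
    """Convert a depth map (H x W, depth value z at row m, col n) to an (H*W) x 3
    point cloud using wx = (n - cx)/fx * z, wy = -(m - cy)/fy * z, wz = -z."""
    h, w = depth.shape
    m, n = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    z = depth.astype(np.float64)
    wx = (n - cx) / fx * z
    wy = -(m - cy) / fy * z
    wz = -z
    return np.stack([wx, wy, wz], axis=-1).reshape(-1, 3)

# Hypothetical usage with assumed intrinsics:
# cloud = depth_map_to_point_cloud(depth, fx=580.0, fy=580.0, cx=320.0, cy=240.0)
```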
The process of calculating the three-dimensional model V from the parameterized model coefficients is:
V = R · (V̄ + b_id · α_id + b_exp · α_exp + b_ez · z) + t,
where V̄ is the average geometry, b_id, b_exp and b_ez are the identity bases, expression bases and local bases, respectively, R is the rotation matrix obtained from the Euler angles pitch, yaw and roll, and t is the translation vector.
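A minimal sketch, assuming the bases are stored as (3N × K) matrices and the mean geometry as a flattened (3N,) vector; these array layouts and helper names are assumptions for illustration, not part of the patent.

```python
import numpy as np

def build_face_model(mean_shape, b_id, b_exp, b_ez, alpha_id, alpha_exp, z, R, t):
    """Linear parameterized shape: offset the mean geometry by identity, expression
    and local bases, then apply the rigid transform (R, t). Returns an (N, 3) mesh."""
    shape = mean_shape + b_id @ alpha_id + b_exp @ alpha_exp + b_ez @ z   # (3N,)
    verts = shape.reshape(-1, 3)                                          # (N, 3)
    return verts @ R.T + t                                                # rotate, then translate
```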
The correspondence between the pixel points of the face region on the picture and the points on the three-dimensional model is determined as follows (a code sketch is given after this list):
1) A point (w_x, w_y, w_z) on the three-dimensional model is perspectively projected onto the picture; the corresponding picture coordinates (m, n) are: m = cx - w_x · fx/w_z, n = cy + w_y · fy/w_z, where fx and fy are the focal lengths of the depth camera in the x and y directions, respectively, and cx and cy are the optical centers of the depth camera in the x and y directions, respectively;
2) For a triangular patch f on the three-dimensional model, denote its three vertices as V1, V2 and V3. First, the projection points P1, P2 and P3 of V1, V2 and V3 on the picture are calculated according to 1). For a pixel point P inside the triangle formed by P1, P2 and P3 on the picture, its barycentric coordinates (c1, c2, c3) with respect to the patch f are calculated from P = c1·P1 + c2·P2 + c3·P3, and the corresponding three-dimensional point is taken as c1·V1 + c2·V2 + c3·V3; if several three-dimensional points correspond to the same pixel point, the three-dimensional point with the closer z coordinate (the one nearer the camera) is taken.
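A sketch of this projection and barycentric lookup, under the assumption that the vertices and intrinsics are already available (the helper names are hypothetical):

```python
import numpy as np

def project_vertex(v, fx, fy, cx, cy):
    """Perspective projection of a model point (wx, wy, wz) to picture coordinates
    (m, n), following m = cx - wx*fx/wz, n = cy + wy*fy/wz."""
    wx, wy, wz = v
    return cx - wx * fx / wz, cy + wy * fy / wz

def barycentric_lookup(P, P1, P2, P3, V1, V2, V3):
    """Given pixel P inside the triangle (P1, P2, P3), solve P = c1*P1 + c2*P2 + c3*P3
    and return the interpolated three-dimensional point c1*V1 + c2*V2 + c3*V3."""
    A = np.array([[P1[0], P2[0], P3[0]],
                  [P1[1], P2[1], P3[1]],
                  [1.0,   1.0,   1.0 ]])
    c1, c2, c3 = np.linalg.solve(A, np.array([P[0], P[1], 1.0]))
    return c1 * np.asarray(V1) + c2 * np.asarray(V2) + c3 * np.asarray(V3)
```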
E_pp(χ) calculates the point-to-point distance between the reconstructed model and the point cloud, E_ps(χ) calculates the point-to-plane distance between the reconstructed model and the point cloud, and w_pp and w_ps are weights for adjustment.
Specifically:
Here, F denotes the face region determined by projecting the three-dimensional model onto the picture, p_syn(m) is the coordinate of the three-dimensional grid point corresponding to pixel point m (obtained by projecting the three-dimensional model patch by patch and interpolating with the barycentric coordinates), p_real(m') is the three-dimensional coordinate of the point in the point cloud closest to p_syn(m), with m' denoting the position of that point on the depth map, and n_{m'} is the unit normal vector at m' on the depth map.
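As a non-authoritative illustration of these two geometric terms, the sketch below evaluates point-to-point and point-to-plane residuals over already-corresponded points; the exact summation and weighting used in the patent may differ.

```python
import numpy as np

def geometric_losses(p_syn, p_real, normals):
    """p_syn, p_real: (M, 3) corresponded points from the reconstructed model and the
    depth point cloud; normals: (M, 3) unit normals at the point-cloud points.
    Returns (point-to-point, point-to-plane) squared-distance losses."""
    diff = p_syn - p_real
    e_pp = np.sum(diff ** 2)                              # point-to-point
    e_ps = np.sum(np.sum(diff * normals, axis=1) ** 2)    # point-to-plane
    return e_pp, e_ps
```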
Here, I_syn(m) denotes the synthesized texture at pixel point m, and I_real(m) denotes the actual texture at pixel point m on the color map.
The process of calculating the synthetic texture from the parameterized model coefficients is:
1) Skin reflectivity T = T̄ + b_alb · α_alb, where T̄ is the average reflectance and b_alb is the reflectivity basis;
2) Illumination L = b_sh · γ, where b_sh is the illumination basis (constructed from the normals according to the spherical harmonic functions) and γ is the illumination coefficient;
3) The synthesized texture at each pixel point:
For a pixel point P on the picture, once the barycentric coordinates (c1, c2, c3) with respect to the corresponding triangular patch f on the mesh (whose three vertices are V1, V2 and V3, with reflectances T1, T2 and T3, respectively) have been obtained, the reflectance at the pixel point is expressed as T = c1·T1 + c2·T2 + c3·T3; then, after the illumination basis b_sh of the point is calculated from the patch normal, the synthesized texture at that point is I_syn = b_sh · γ · T.
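An illustrative shading sketch, assuming a second-order (9-term) real spherical-harmonic illumination basis per color channel; the actual basis order and channel handling are not specified by the text above.

```python
import numpy as np

def sh_basis(normal):
    """Second-order (9-term) real spherical harmonics evaluated at a unit normal.
    This particular basis order is an assumption for illustration."""
    x, y, z = normal
    return np.array([
        1.0,
        y, z, x,
        x * y, y * z, 3.0 * z * z - 1.0, x * z, x * x - y * y,
    ])

def synthesize_texture(normal, reflectance, gamma):
    """I_syn = (b_sh . gamma) * T : shade the interpolated reflectance with the
    spherical-harmonic illumination coefficients gamma (one 9-vector per channel)."""
    b_sh = sh_basis(normal)        # (9,)
    shading = gamma @ b_sh         # per-channel scalar shading, gamma: (3, 9)
    return shading * reflectance   # reflectance: (3,) rgb
```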
Here, L denotes the set of indices of all currently visible key points, q_i denotes the image coordinates of the i-th key point detected on the color map, p_i denotes the three-dimensional coordinates of the three-dimensional key point on the reconstructed model corresponding to q_i, R is the rotation matrix, t is the translation vector, and Π denotes perspective projection.
α_id,j and σ_id,j are the coefficient of the j-th identity base and the eigenvalue of the j-th identity base, respectively, and J is the number of identity bases; α_alb,k and σ_alb,k are the coefficient of the k-th reflectivity base and the eigenvalue of the k-th reflectivity base, respectively, and K is the number of reflectivity bases; α_exp,m and σ_exp,m are the coefficient of the m-th expression base and the eigenvalue of the m-th expression base, respectively, and M is the number of expression bases.
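A sketch of a statistical regularizer consistent with these definitions; whether the patent sums squared coefficient-to-eigenvalue ratios in exactly this form is an assumption.

```python
import numpy as np

def regularization_loss(alpha_id, sigma_id, alpha_alb, sigma_alb, alpha_exp, sigma_exp):
    """Penalize coefficients relative to the eigenvalues of their bases so the
    regressed shape and reflectance stay close to the statistical prior."""
    return (np.sum((alpha_id / sigma_id) ** 2)
            + np.sum((alpha_alb / sigma_alb) ** 2)
            + np.sum((alpha_exp / sigma_exp) ** 2))
```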
Here, p denotes the relative position information on the three-dimensional model of the three-dimensional point corresponding to pixel point m (namely the patch index on the three-dimensional model corresponding to pixel m and the corresponding barycentric coordinates). Proj_n(p) is the position of p in the image plane obtained by computing the three-dimensional model shape from (α_id, α_exp, z) in the n-th frame parameterized model coefficients χ_n and then projecting it with the pose information (pitch, yaw, roll, t) in χ_n. f(m) is the optical flow of pixel point m from its position on the (n-1)-th frame picture to its position on the n-th frame picture. This loss increases the stability between previous and subsequent frames, thereby reducing jitter of the model.
Here, χ_n1 and χ_n2 denote the parameterized model coefficients corresponding to the n1-th frame photo and the n2-th frame photo of the same person, respectively; α_id,n1 and α_id,n2 denote the identity base coefficients corresponding to the n1-th frame photo and the n2-th frame photo of the same person, respectively; and α_alb,n1 and α_alb,n2 denote the reflectivity base coefficients corresponding to the n1-th frame photo and the n2-th frame photo of the same person, respectively. This loss keeps the identity base coefficients and reflectivity base coefficients obtained by the same person in different frames consistent.
In this embodiment, the trained coarse learning network may regress parameterized three-dimensional face model coefficients from the color map and the depth map, where the parameterized three-dimensional face model coefficients may include: identity base coefficients, expression base coefficients, local base coefficients, reflectivity base coefficients, euler angles, translation coefficients, and illumination coefficients.
The three-dimensional face model adopts the representation of a three-dimensional morphable model. The shape of any three-dimensional face model is expressed linearly by an average shape, identity bases, expression bases and local bases. The identity bases can be obtained by performing principal component analysis on three-dimensional face models with neutral expressions; the expression bases are obtained by performing principal component analysis on the results of subtracting the neutral-expression three-dimensional face model of the corresponding identity from all three-dimensional face models with expressions. Once the identity base coefficients, expression base coefficients and local base coefficients are determined, the shape of the three-dimensional face model to be reconstructed can be determined.
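For instance, identity bases could be derived by principal component analysis roughly as sketched below; the data layout, component count and the use of SVD are illustrative assumptions.

```python
import numpy as np

def pca_basis(shapes, n_components):
    """shapes: (S, 3N) flattened neutral-expression face meshes of S subjects.
    Returns the mean shape, the basis (3N, n_components) and the eigenvalues."""
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # SVD of the centered data gives the principal directions and singular values.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components].T
    eigenvalues = (s[:n_components] ** 2) / (shapes.shape[0] - 1)
    return mean, basis, eigenvalues

# Expression bases (sketch): run the same PCA on deltas of the form
# (expressive shape - neutral shape of the same identity).
```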
The skin reflectivity of any three-dimensional face model is expressed by the mean reflectivity and the reflectivity bases. After the reflectivity base coefficients are determined, the skin reflectivity of the three-dimensional face model to be reconstructed can be determined.
For the projection model, a standard pinhole camera model can be used to render the three-dimensional face model; the process can be expressed as q_i = Π(R · v_i + t), where v_i and q_i are the positions of grid point i in the world coordinate system and in the image plane, respectively, R is the rotation matrix, t is the translation vector, and Π denotes perspective projection. After the Euler angles and translation coefficients are determined, the rotation matrix and translation vector can be determined, the coordinates of each grid point of the three-dimensional face model to be reconstructed can be determined, and then the normal direction of each grid point can be determined.
For the colors of the grid points of the three-dimensional face model to be reconstructed, the face is assumed to be a Lambertian surface, and the color of each grid point is determined by its normal direction, the skin reflectivity and the illumination coefficients. After the illumination coefficients, the normal direction of each grid point and the skin reflectivity are determined, the colors of the grid points of the three-dimensional face model to be reconstructed can be determined.
In this embodiment, expression and illumination conditions are taken into account when the training data are collected, and data augmentation is performed in the training stage, so the trained coarse learning network adapts well to complex expressions and complex illumination conditions; the training stage also takes inter-frame constraints and fixed-identity constraints into account, so the trained system performs relatively stably on various test sets.
On the basis of the foregoing method embodiment, before S2, the method may further include:
training the fine learning network, wherein a loss function of the fine learning network is:
E_loss = E_sh(d) + E_sm(d) + E_cl(d_n, d_{n-1}),
The first term makes the brightness values of the synthesized texture close to the brightness values of the actual texture and keeps the shading information of the synthesized texture consistent with the shading of the actual texture; the second term makes the reconstructed model after the offsets smoother and keeps it, as a whole, close to the model output by the coarse network; the third term makes the patch normals of adjacent frames close.
Each of the above formulas is described in detail below.
Here, T denotes the set of triangular patches on the three-dimensional model, E denotes the set of mesh edges on the three-dimensional model, I(n_l | b_l, γ) is the brightness value of the synthesized texture at pixel point l, n_l is the normal direction at pixel point l, b_l is the reflectance at pixel point l, γ is the illumination coefficient, c_l is the brightness value at pixel point l, and w_face and w_edge are weights for adjustment. Here the reflectance and illumination coefficients are fixed; after the normal directions are updated and the illumination basis is then updated, a new synthesized texture is computed.
Here, V denotes all grid points on the reconstructed model, d denotes the coordinate offsets of the grid vertices of the reconstructed model, p_v denotes the coordinate of the v-th grid point of the reconstructed model after the offset d is added, Δp_v is the Laplacian vector of the v-th grid point on the reconstructed model, and w_sm and w_mi are weights for adjustment.
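A sketch of a smoothness term of this kind, using a uniform-weight Laplacian; the exact Laplacian weighting and the precise role of w_mi are assumptions here.

```python
import numpy as np

def laplacian_smoothness(vertices, neighbors, offsets, w_sm, w_mi):
    """vertices: (V, 3) coarse-model grid points; neighbors: list of neighbor index
    lists; offsets: (V, 3) per-vertex displacements d. Penalizes the Laplacian of the
    displaced mesh (smoothness) plus the magnitude of the offsets (stay close to the
    coarse model)."""
    p = vertices + offsets
    lap = np.zeros_like(p)
    for v, nbrs in enumerate(neighbors):
        lap[v] = p[v] - p[nbrs].mean(axis=0)   # uniform Laplacian vector at vertex v
    return w_sm * np.sum(lap ** 2) + w_mi * np.sum(offsets ** 2)
```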
Here, d_n denotes the offsets of all grid vertex coordinates on the reconstructed model of the n-th frame, Q denotes the set of patches on the reconstructed model, n_{n,q} denotes the normal of the q-th patch on the reconstructed model of the n-th frame, and w_cl is a weight for adjustment.
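An illustrative sketch of the inter-frame normal consistency term; the squared-difference form over patch normals is an assumption.

```python
import numpy as np

def temporal_normal_consistency(normals_n, normals_prev, w_cl):
    """normals_n, normals_prev: (Q, 3) unit patch normals of the reconstructed model
    in frame n and frame n-1; encourages the deformed surfaces of adjacent frames to
    keep similar orientation."""
    return w_cl * np.sum((normals_n - normals_prev) ** 2)
```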
In this embodiment, a self-learning method is used to train the coarse learning network and the fine learning network, and a real three-dimensional model corresponding to each frame depth map and color map in the training set does not need to be constructed.
When the coarse learning network and the fine learning network are trained, loss terms among different frames are calculated, and the three-dimensional model obtained by the adjacent frames through network learning can be ensured not to change greatly.
Referring to fig. 2, the present embodiment discloses a three-dimensional face model reconstruction device, which includes:
the first input unit 1 is used for acquiring a face depth map and a face color map to be processed, inputting the face depth map and the face color map to be processed into a pre-trained coarse learning network to obtain parameterized three-dimensional face model coefficients, and determining an initial three-dimensional face model according to the parameterized three-dimensional face model coefficients;
the second input unit 2 is configured to convert the brightness values on the initial three-dimensional face model and the face color chart into 4-channel UV images, and input the UV images into a pre-trained fine learning network to obtain offsets of grid points of the three-dimensional face model to be reconstructed in respective normal directions;
a reconstruction unit 3, configured to reconstruct a three-dimensional face model according to the parameterized three-dimensional face model coefficient and the offset of each grid point in the normal direction.
Specifically, the first input unit 1 obtains a face depth map and a face color map to be processed, inputs the face depth map and the face color map to be processed into a pre-trained coarse learning network to obtain parameterized three-dimensional face model coefficients, and determines an initial three-dimensional face model according to the parameterized three-dimensional face model coefficients; the second input unit 2 converts the brightness values on the initial three-dimensional face model and the face color image into 4-channel UV images, and inputs the UV images into a pre-trained fine learning network to obtain the offset of each grid point of the three-dimensional face model to be reconstructed in the normal direction; the reconstruction unit 3 reconstructs a three-dimensional face model from the parameterized three-dimensional face model coefficients and the deviations of the respective grid points in the respective normal directions.
According to the three-dimensional face model reconstruction device provided by the embodiment of the application, the face depth map and the face color map to be processed are obtained and input into a pre-trained coarse learning network to obtain parameterized three-dimensional face model coefficients, and an initial three-dimensional face model is determined according to the parameterized three-dimensional face model coefficients; the initial three-dimensional face model and the brightness values on the face color map are converted into a 4-channel UV image, and the UV image is input into a pre-trained fine learning network to obtain the offset of each grid point of the three-dimensional face model to be reconstructed along its normal direction; and the three-dimensional face model is reconstructed according to the parameterized three-dimensional face model coefficients and the offsets of the grid points along their normal directions. In this way, the complete face image information is used, a complicated and time-consuming optimization process is avoided, and key point information of the input face image does not need to be detected in advance, so the accuracy and speed of three-dimensional face model reconstruction can be improved.
On the basis of the foregoing apparatus embodiment, the apparatus may further include:
the first training unit is used for training the coarse learning network before the first input unit works, wherein the loss function of the coarse learning network is as follows:
E_loss = E_geo + w_col × E_col + w_lan × E_lan + w_reg × E_reg + w_flo × E_flo + w_sam × E_sam,
In the above formula, w_col, w_lan, w_reg, w_flo and w_sam are weights for adjusting the corresponding terms,
E_geo(χ) = w_pp × E_pp(χ) + w_ps × E_ps(χ),
χ = {α_id, α_exp, α_alb, z, pitch, yaw, roll, t, γ} represents the parameterized three-dimensional face model coefficients, where α_id refers to the identity base coefficients, α_exp to the expression base coefficients, and α_alb to the reflectance base coefficients; pitch represents the Euler angle of rotation about the X axis, yaw the Euler angle of rotation about the Y axis, and roll the Euler angle of rotation about the Z axis; t = (t_x, t_y, t_z) is the translation vector, where t_x, t_y and t_z are the translations along the X, Y and Z axes, respectively; and γ = (γ_r, γ_g, γ_b), where γ_r, γ_g and γ_b are the illumination coefficients of the picture on the r, g and b channels, respectively,
w_pp and w_ps are weights for adjustment,
F denotes the face region determined after the three-dimensional model is projected onto the picture, p_syn(m) is the coordinate of the three-dimensional grid point corresponding to pixel point m, p_real(m') is the three-dimensional coordinate of the point in the point cloud closest to p_syn(m), m' denotes the position of that point on the depth map, and n_{m'} is the unit normal vector at m' on the depth map,
I_syn(m) denotes the synthesized texture at pixel point m, and I_real(m) denotes the actual texture at pixel point m on the color map,
L denotes the set of indices of all currently visible key points, q_i denotes the image coordinates of the i-th key point detected on the color map, p_i denotes the three-dimensional coordinates of the three-dimensional key point on the reconstructed model corresponding to q_i, R is the rotation matrix, and Π denotes perspective projection,
α_id,j and σ_id,j are the coefficient and eigenvalue of the j-th identity base, respectively, and J is the number of identity bases; α_alb,k and σ_alb,k are the coefficient and eigenvalue of the k-th reflectivity base, respectively, and K is the number of reflectivity bases; α_exp,m and σ_exp,m are the coefficient and eigenvalue of the m-th expression base, respectively, and M is the number of expression bases,
p denotes the relative position information on the three-dimensional model of the three-dimensional point corresponding to pixel point m, Proj_n(p) is the position in the image plane calculated by computing the three-dimensional model shape from (α_id, α_exp, z) in the n-th frame parameterized model coefficients χ_n and projecting it with the pose information (pitch, yaw, roll, t) in χ_n, and f(m) is the optical flow of pixel point m from its position on the (n-1)-th frame picture to its position on the n-th frame picture.
Here, χ_n1 and χ_n2 denote the parameterized model coefficients corresponding to the n1-th frame photo and the n2-th frame photo of the same person, respectively,
α_id,n1 and α_id,n2 denote the identity base coefficients corresponding to the n1-th frame photo and the n2-th frame photo of the same person, respectively, and α_alb,n1 and α_alb,n2 denote the reflectivity base coefficients corresponding to the n1-th frame photo and the n2-th frame photo of the same person, respectively.
On the basis of the foregoing apparatus embodiment, the apparatus may further include:
The second training unit is used for training the fine learning network before the second input unit works, wherein the loss function of the fine learning network is:
E_loss = E_sh(d) + E_sm(d) + E_cl(d_n, d_{n-1}),
where T denotes the set of triangular patches on the three-dimensional model, E denotes the set of mesh edges on the three-dimensional model, I(n_l | b_l, γ) is the brightness value of the synthesized texture at pixel point l, n_l is the normal direction at pixel point l, b_l is the reflectance at pixel point l, γ is the illumination coefficient, c_l is the brightness value at pixel point l, and w_face and w_edge are weights for adjustment,
where V denotes all grid points on the reconstructed model, d denotes the coordinate offsets of the grid vertices of the reconstructed model, p_v denotes the coordinate of the v-th grid point of the reconstructed model after the offset d is added, Δp_v is the Laplacian vector of the v-th grid point on the reconstructed model, and w_sm and w_mi are weights for adjustment,
where d_n denotes the offsets of all grid vertex coordinates on the reconstructed model of the n-th frame, Q denotes the set of patches on the reconstructed model, n_{n,q} denotes the normal of the q-th patch on the reconstructed model of the n-th frame, and w_cl is a weight for adjustment.
On the basis of the foregoing apparatus embodiment, the parameterized three-dimensional face model coefficient may include: identity base coefficients, expression base coefficients, local base coefficients, reflectivity base coefficients, euler angles, translation coefficients, and illumination coefficients.
The three-dimensional face model reconstruction device of the embodiment can be used for executing the technical scheme of the foregoing method embodiment, and its implementation principle and technical effects are similar, and will not be repeated here.
FIG. 3 is a schematic entity structure diagram of an electronic device provided by an embodiment of the present application. As shown in FIG. 3, the electronic device may include: a processor 11, a memory 12, a bus 13, and a computer program stored on the memory 12 and executable on the processor 11;
wherein the processor 11 and the memory 12 complete the communication with each other through the bus 13;
The processor 11, when executing the computer program, implements the method provided by the foregoing method embodiments, for example, comprising: acquiring a face depth map and a face color map to be processed, inputting the face depth map and the face color map to be processed into a pre-trained coarse learning network to obtain parameterized three-dimensional face model coefficients, and determining an initial three-dimensional face model according to the parameterized three-dimensional face model coefficients; converting the initial three-dimensional face model and the brightness values on the face color map into a 4-channel UV image, and inputting the UV image into a pre-trained fine learning network to obtain the offset of each grid point of the three-dimensional face model to be reconstructed along its normal direction; and reconstructing the three-dimensional face model according to the parameterized three-dimensional face model coefficients and the offsets of the grid points along their normal directions.
Embodiments of the present application provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the methods provided by the above-described method embodiments, for example, comprising: acquiring a face depth map and a face color map to be processed, inputting the face depth map and the face color map to be processed into a pre-trained coarse learning network to obtain parameterized three-dimensional face model coefficients, and determining an initial three-dimensional face model according to the parameterized three-dimensional face model coefficients; converting the brightness values on the initial three-dimensional face model and the face color image into 4-channel UV images, and inputting the UV images into a pre-trained fine learning network to obtain the offset of each grid point of the three-dimensional face model to be reconstructed in the direction of the normal line of each grid point; reconstructing a three-dimensional face model according to the parameterized three-dimensional face model coefficients and the deviations of the grid points in the normal directions.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. The orientation or positional relationship indicated by the terms "upper", "lower", etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of description and to simplify the description, and are not indicative or implying that the apparatus or elements in question must have a specific orientation, be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present application. Unless specifically stated or limited otherwise, the terms "mounted," "connected," and "coupled" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art according to the specific circumstances.
In the description of the present application, numerous specific details are set forth. It may be evident, however, that the embodiments of the present application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description. Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting the intention: i.e., the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The present application is not limited to any single aspect, nor to any single embodiment, nor to any combination and/or permutation of these aspects and/or embodiments. Moreover, each aspect and/or embodiment of the application may be used alone or in combination with one or more other aspects and/or embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application, and are intended to be included within the scope of the appended claims and description.

Claims (6)

1. The three-dimensional face model reconstruction method is characterized by comprising the following steps of:
s1, training a coarse learning network, obtaining a face depth map and a face color map to be processed, inputting the face depth map and the face color map to be processed into the coarse learning network trained in advance to obtain parameterized three-dimensional face model coefficients, and determining an initial three-dimensional face model according to the parameterized three-dimensional face model coefficients; the loss function of the coarse learning network is as follows:
E_loss = E_geo + w_col × E_col + w_lan × E_lan + w_reg × E_reg + w_flo × E_flo + w_sam × E_sam,
in the above formula, w_col, w_lan, w_reg, w_flo and w_sam are weights for adjusting the terms, E_geo(χ) = w_pp × E_pp(χ) + w_ps × E_ps(χ), χ = {α_id, α_exp, α_alb, z, pitch, yaw, roll, t, γ} represents the parameterized three-dimensional face model coefficients, where α_id refers to the identity base coefficients, α_exp to the expression base coefficients, and α_alb to the reflectance base coefficients, pitch represents the Euler angle of rotation about the X axis, yaw represents the Euler angle of rotation about the Y axis, roll represents the Euler angle of rotation about the Z axis, t = (t_x, t_y, t_z) refers to the translation vector, where t_x, t_y and t_z represent the translations along the X, Y and Z axes, respectively, γ = (γ_r, γ_g, γ_b), where γ_r, γ_g and γ_b represent the illumination coefficients of the picture on the r, g and b channels, respectively, and w_pp and w_ps are weights for adjustment,
F represents the face region determined after the three-dimensional model is projected onto the picture, p_syn(m) is the coordinate of the three-dimensional grid point corresponding to pixel point m, p_real(m') is the three-dimensional coordinate of the point in the point cloud closest to p_syn(m), m' represents the position of that point on the depth map, and n_{m'} is the unit normal vector at m' on the depth map,
I_syn(m) represents the synthesized texture at pixel point m, and I_real(m) represents the actual texture at pixel point m on the color map,
L represents the set of indices of all currently visible key points, q_i represents the image coordinates of the i-th key point detected on the color map, p_i represents the three-dimensional coordinates of the three-dimensional key point on the reconstructed model corresponding to q_i, R is the rotation matrix, and Π represents perspective projection,
α_id,j and σ_id,j are the coefficient of the j-th identity base and the eigenvalue of the j-th identity base, respectively, J is the number of identity bases, α_alb,k and σ_alb,k are the coefficient of the k-th reflectivity base and the eigenvalue of the k-th reflectivity base, respectively, K is the number of reflectivity bases, α_exp,m and σ_exp,m are the coefficient of the m-th expression base and the eigenvalue of the m-th expression base, respectively, and M is the number of expression bases,
p represents the relative position information on the three-dimensional model of the three-dimensional point corresponding to pixel point m, Proj_n(p) is the position in the image plane calculated by computing the three-dimensional model shape from (α_id, α_exp, z) in the n-th frame parameterized model coefficients χ_n and projecting it with the pose information (pitch, yaw, roll, t) in χ_n, and f(m) is the optical flow of pixel point m moving from its position on the (n-1)-th frame picture to its position on the n-th frame picture,
here χ_n1 and χ_n2 respectively represent the parameterized model coefficients corresponding to the n1-th frame photo and the n2-th frame photo of the same person, α_id,n1 and α_id,n2 respectively represent the identity base coefficients corresponding to the n1-th frame photo and the n2-th frame photo of the same person, and α_alb,n1 and α_alb,n2 respectively represent the reflectivity base coefficients corresponding to the n1-th frame photo and the n2-th frame photo of the same person;
s2, training a fine learning network, converting the brightness values on the initial three-dimensional face model and the face color image into 4-channel UV images, and inputting the UV images into the fine learning network trained in advance to obtain the offset of each grid point of the three-dimensional face model to be reconstructed in the normal direction; the loss function of the fine learning network is as follows:
E_loss = E_sh(d) + E_sm(d) + E_cl(d_n, d_{n-1}),
wherein T represents the set of triangular patches on the three-dimensional model, E represents the set of mesh edges on the three-dimensional model, I(n_l | b_l, γ) is the brightness value of the synthesized texture at pixel point l, n_l is the normal direction at pixel point l, b_l is the reflectance at pixel point l, γ is the illumination coefficient, c_l is the brightness value at pixel point l, and w_face and w_edge are weights for adjustment,
wherein V represents all grid points on the reconstructed model, d represents the coordinate offsets of the grid vertices of the reconstructed model, p_v represents the coordinate of the v-th grid point of the reconstructed model after the offset d is added, Δp_v is the Laplacian vector of the v-th grid point on the reconstructed model, and w_sm and w_mi are weights for adjustment,
wherein d_n represents the offsets of all grid vertex coordinates on the reconstructed model of the n-th frame, Q represents the set of patches on the reconstructed model, n_{n,q} represents the normal of the q-th patch on the reconstructed model of the n-th frame, and w_cl is a weight for adjustment;
s3, reconstructing a three-dimensional face model according to the parameterized three-dimensional face model coefficient and the offset of each grid point in the normal direction.
2. The method of claim 1, wherein parameterizing three-dimensional face model coefficients comprises: identity base coefficients, expression base coefficients, local base coefficients, reflectivity base coefficients, euler angles, translation coefficients, and illumination coefficients.
3. A three-dimensional face model reconstruction device, comprising:
the first training unit is used for training the coarse learning network before the first input unit works, wherein the loss function of the coarse learning network is as follows:
E loss =E geo +w col ×E col +w lan ×E lan +w reg ×E reg +w flo ×E flo +w sam ×E sam
w in the above col ,w lan ,w reg ,w flo ,w sam For adjusting the weights of the items, E geo (χ)=W pp ×E pp (χ)+W ps ×E ps (χ),χ={α id ,α exp ,α alb Z, pitch, yw, roll, t, γ represent parameterized three-dimensional face model coefficients, where α id Refers to the identity base coefficient, alpha exp Expression base coefficient, alpha alb Referring to the reflectance base coefficient, pitch represents the euler angle of rotation about the X-axis, yaw represents the euler angle of rotation about the Y-axis, roll represents the euler angle of rotation about the Z-axis, t= (t) x ,t y ,t z ) Refer to translation vector, t x 、t y And t z Represents the translation amounts in the X axis, Y axis and Z axis, respectively, and gamma= (gamma) r ,γ g ,γ b ),γ r 、γ g And gamma b The illumination coefficients of the pictures on the r, g and b channels are shown respectively,
W pp and W is ps For the purpose of adjusting the weight of the weight,
f represents a face area determined after the three-dimensional model is projected on a picture, and p syn (m) is the three-dimensional grid point coordinates corresponding to the pixel point m, p real (m') is the distance p from the point cloud syn (m) three-dimensional coordinates of the nearest point, m' representing the position of the point on the depth map,is the unit normal vector at m' on the depth map,
I syn (m) represents the synthesized texture at pixel point m, I real (m) represents the actual texture at pixel point m on the color map,
l represents the set of all currently visible keypoint numbers, q i Image coordinates, p, representing the ith key point detected on the color map i Representing the corresponding q on the reconstructed model i Is a rotation matrix, pi represents perspective projection,
α id,j sum sigma id,j The coefficient of the jth identity base and the characteristic value of the jth identity base are respectively, J is the number of the identity bases, alpha alb,k Sum sigma alb,k The coefficient of the kth reflectivity base and the characteristic value of the kth reflectivity base are respectively, K is the number of the reflectivity bases, alpha exp,m Sum sigma exp,m Coefficients of the jth expression group and characteristics of the jth expression groupThe sign value, M is the number of expression bases,
p denotes the relative position information, on the three-dimensional model, of the three-dimensional point corresponding to pixel point m; proj_n(p) is the position in the image plane obtained by computing the three-dimensional model shape from (α_id, α_exp, Z) within the n-th-frame parameterized model coefficients χ_n and then projecting it with the pose information (pitch, yaw, roll, t) in χ_n; and f(m) is the optical flow of pixel point m from its position on the (n-1)-th frame picture to its position on the n-th frame picture,
wherein χ_n1 and χ_n2 denote the parameterized model coefficients corresponding to the n1-th frame photo and the n2-th frame photo of the same person, respectively,
α_id,n1 and α_id,n2 denote the identity base coefficients corresponding to the n1-th frame photo and the n2-th frame photo of the same person, respectively, and α_alb,n1 and α_alb,n2 denote the reflectivity base coefficients corresponding to the n1-th frame photo and the n2-th frame photo of the same person, respectively (a code sketch of these coarse-loss terms follows claim 3);
the first input unit is used for acquiring a face depth map and a face color map to be processed, inputting the face depth map and the face color map to be processed into a pre-trained coarse learning network to obtain parameterized three-dimensional face model coefficients, and determining an initial three-dimensional face model according to the parameterized three-dimensional face model coefficients;
the second training unit is used for training the fine learning network before the second input unit operates, wherein the loss function of the fine learning network is:
E_loss = E_sh(d) + E_sm(d) + E_cl(d_n, d_{n-1}),
wherein T denotes the set of triangle patches on the three-dimensional model, E denotes the set of grid edges on the three-dimensional model, I(n_l | b_l, γ) is the luminance value of the synthesized texture at pixel point l, n_l is the normal direction at pixel point l, b_l is the reflectivity at pixel point l, γ is the illumination coefficient, c_l is the luminance value at pixel point l, and W_face and W_edge are adjustment weights,
wherein V denotes the set of all grid points on the reconstructed model, d denotes the coordinate offset of each grid vertex of the reconstructed model, p_v denotes the coordinate of the v-th grid point of the reconstructed model after the offset d is applied, Δp_v is the Laplacian vector of the v-th grid point on the reconstructed model, and W_sm and W_mi are adjustment weights,
wherein d_n denotes the offsets of all grid vertex coordinates on the reconstructed model of the n-th frame, Q denotes the set of patches on the reconstructed model, n_{n,q} is the normal direction of the q-th patch on the n-th frame reconstructed model, and W_cl is an adjustment weight;
the second input unit is used for converting the initial three-dimensional face model and the brightness values of the face color map into a 4-channel UV map, and inputting the UV map into the pre-trained fine learning network to obtain the offset of each grid point of the three-dimensional face model to be reconstructed in the normal direction;
and the reconstruction unit is used for reconstructing the three-dimensional face model according to the parameterized three-dimensional face model coefficients and the offset of each grid point in the normal direction.
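For readers who prefer code to claim language, the sketch below shows one hedged way the coarse-network loss terms named in claim 3 could be assembled. The projection Π is written as a pinhole model, the regularizer uses the standard coefficient-over-eigenvalue form, and every helper name (perspective_project, E_lan, E_reg, E_sam, total_coarse_loss) is an assumption made for illustration, not the patent's implementation.

import numpy as np

def perspective_project(points_3d, R, t, fx, fy, cx, cy):
    # Rigid transform followed by a pinhole perspective projection Π.
    cam = points_3d @ R.T + t
    x = fx * cam[:, 0] / cam[:, 2] + cx
    y = fy * cam[:, 1] / cam[:, 2] + cy
    return np.stack([x, y], axis=1)

def E_lan(p_3d, q_2d, R, t, intrinsics):
    # Landmark term: projected model key points p_i vs. detected key points q_i.
    proj = perspective_project(p_3d, R, t, *intrinsics)
    return np.sum((proj - q_2d) ** 2)

def E_reg(alpha_id, sigma_id, alpha_alb, sigma_alb, alpha_exp, sigma_exp):
    # Statistical regularizer: each basis coefficient is penalized relative to the
    # eigenvalue of its basis, keeping the fit close to the model prior.
    return (np.sum((alpha_id / sigma_id) ** 2)
            + np.sum((alpha_alb / sigma_alb) ** 2)
            + np.sum((alpha_exp / sigma_exp) ** 2))

def E_sam(chi_n1, chi_n2):
    # Same-person term: identity and reflectivity coefficients of two photos of
    # the same person should agree.
    return (np.sum((chi_n1["alpha_id"] - chi_n2["alpha_id"]) ** 2)
            + np.sum((chi_n1["alpha_alb"] - chi_n2["alpha_alb"]) ** 2))

def total_coarse_loss(e_geo, e_col, e_lan, e_reg, e_flo, e_sam,
                      w_col=1.0, w_lan=1.0, w_reg=1.0, w_flo=1.0, w_sam=1.0):
    # Weighted sum matching E_loss = E_geo + w_col·E_col + w_lan·E_lan + w_reg·E_reg + w_flo·E_flo + w_sam·E_sam.
    return e_geo + w_col*e_col + w_lan*e_lan + w_reg*e_reg + w_flo*e_flo + w_sam*e_sam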
4. The apparatus of claim 3, wherein the parameterized three-dimensional face model coefficients comprise: identity base coefficients, expression base coefficients, local base coefficients, reflectivity base coefficients, euler angles, translation coefficients, and illumination coefficients.
5. An electronic device, comprising: a processor, a memory, a bus, and a computer program stored on the memory and executable on the processor;
the processor and the memory complete communication with each other through the bus;
the processor, when executing the computer program, implements the method of any of claims 1-2.
6. A non-transitory computer readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when executed by a processor, implements the method according to any of claims 1-2.
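Taken together, claims 1-6 describe a two-stage pipeline. The following is a hedged pseudo-pipeline of that flow; the network objects and the build_mesh / to_uv_map helpers are hypothetical stand-ins, not APIs defined by the patent.

from typing import Callable
import numpy as np

def reconstruct_face(depth_map: np.ndarray,
                     color_map: np.ndarray,
                     coarse_net: Callable,
                     fine_net: Callable,
                     build_mesh: Callable,
                     to_uv_map: Callable) -> np.ndarray:
    # 1. Coarse stage: regress the parameterized coefficients chi from depth + color input.
    chi = coarse_net(depth_map, color_map)

    # 2. Determine the initial three-dimensional face model from the coefficients.
    verts, vert_normals = build_mesh(chi)

    # 3. Fine stage: build the 4-channel UV map (model plus color-map brightness)
    #    and predict one offset per grid point along its normal.
    uv = to_uv_map(verts, vert_normals, color_map)
    offsets = fine_net(uv)                      # shape (num_vertices,)

    # 4. Reconstruction: displace each grid point along its normal by its offset.
    return verts + offsets[:, None] * vert_normals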
CN201810690747.2A 2018-06-28 2018-06-28 Three-dimensional face model reconstruction method and device Active CN109035388B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810690747.2A CN109035388B (en) 2018-06-28 2018-06-28 Three-dimensional face model reconstruction method and device


Publications (2)

Publication Number Publication Date
CN109035388A (en) 2018-12-18
CN109035388B (en) 2023-12-05

Family

ID=65520731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810690747.2A Active CN109035388B (en) 2018-06-28 2018-06-28 Three-dimensional face model reconstruction method and device

Country Status (1)

Country Link
CN (1) CN109035388B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111382618B (en) * 2018-12-28 2021-02-05 广州市百果园信息技术有限公司 Illumination detection method, device, equipment and storage medium for face image
CN109978989B (en) * 2019-02-26 2023-08-01 腾讯科技(深圳)有限公司 Three-dimensional face model generation method, three-dimensional face model generation device, computer equipment and storage medium
CN109920049B (en) * 2019-02-26 2021-05-04 清华大学 Edge information assisted fine three-dimensional face reconstruction method and system
CN110008873B (en) * 2019-04-25 2021-06-22 北京华捷艾米科技有限公司 Facial expression capturing method, system and equipment
CN110059660A (en) * 2019-04-26 2019-07-26 北京迈格威科技有限公司 Mobile terminal platform 3D face registration method and device
CN110288705B (en) * 2019-07-02 2023-08-04 北京字节跳动网络技术有限公司 Method and device for generating three-dimensional model
CN110458924B (en) * 2019-07-23 2021-03-12 腾讯科技(深圳)有限公司 Three-dimensional face model establishing method and device and electronic equipment
CN112767300A (en) * 2019-10-18 2021-05-07 宏达国际电子股份有限公司 Method for automatically generating labeling data of hand and method for calculating skeleton length
CN111028330B (en) 2019-11-15 2023-04-07 腾讯科技(深圳)有限公司 Three-dimensional expression base generation method, device, equipment and storage medium
CN111275799B (en) * 2020-01-20 2021-03-23 北京字节跳动网络技术有限公司 Animation generation method and device and electronic equipment
CN111325851B (en) * 2020-02-28 2023-05-05 腾讯科技(深圳)有限公司 Image processing method and device, electronic equipment and computer readable storage medium
CN111402403B (en) * 2020-03-16 2023-06-20 中国科学技术大学 High-precision three-dimensional face reconstruction method
CN111460937B (en) * 2020-03-19 2023-12-19 深圳市新镜介网络有限公司 Facial feature point positioning method and device, terminal equipment and storage medium
CN111489435B (en) * 2020-03-31 2022-12-27 天津大学 Self-adaptive three-dimensional face reconstruction method based on single image
CN111524216B (en) * 2020-04-10 2023-06-27 北京百度网讯科技有限公司 Method and device for generating three-dimensional face data
CN112241933A (en) * 2020-07-15 2021-01-19 北京沃东天骏信息技术有限公司 Face image processing method and device, storage medium and electronic equipment
CN111968169B (en) * 2020-08-19 2024-01-19 北京拙河科技有限公司 Dynamic human body three-dimensional reconstruction method, device, equipment and medium
CN112002014B (en) * 2020-08-31 2023-12-15 中国科学院自动化研究所 Fine structure-oriented three-dimensional face reconstruction method, system and device
CN112734895A (en) * 2020-12-30 2021-04-30 科大讯飞股份有限公司 Three-dimensional face processing method and electronic equipment
CN113052953B (en) * 2021-04-16 2023-11-24 南京大学 Face essential parameter determining method and system based on variable light source face image
CN113610971A (en) * 2021-09-13 2021-11-05 杭州海康威视数字技术股份有限公司 Fine-grained three-dimensional model construction method and device and electronic equipment


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016110005A1 (en) * 2015-01-07 2016-07-14 深圳市唯特视科技有限公司 Gray level and depth information based multi-layer fusion multi-modal face recognition device and method
CN107358648A (en) * 2017-07-17 2017-11-17 中国科学技术大学 Real-time full-automatic high quality three-dimensional facial reconstruction method based on individual facial image
CN107680158A (en) * 2017-11-01 2018-02-09 长沙学院 A kind of three-dimensional facial reconstruction method based on convolutional neural networks model
CN107958444A (en) * 2017-12-28 2018-04-24 江西高创保安服务技术有限公司 A kind of face super-resolution reconstruction method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Learning Detailed Face Reconstruction from a Single Image; Elad Richardson et al.; 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017-12-31; full text *

Also Published As

Publication number Publication date
CN109035388A (en) 2018-12-18

Similar Documents

Publication Publication Date Title
CN109035388B (en) Three-dimensional face model reconstruction method and device
CN109872397B (en) Three-dimensional reconstruction method of airplane parts based on multi-view stereo vision
CN112258390B (en) High-precision microscopic virtual learning resource generation method
CN106067190B (en) A kind of generation of fast face threedimensional model and transform method based on single image
WO2021028728A1 (en) Method and system for remotely selecting garments
CN110349251A (en) A kind of three-dimensional rebuilding method and device based on binocular camera
JP4284664B2 (en) Three-dimensional shape estimation system and image generation system
Brostow et al. Video normals from colored lights
CN107358648A (en) Real-time full-automatic high quality three-dimensional facial reconstruction method based on individual facial image
US20060056732A1 (en) Method and apparatus for determining offsets of a part from a digital image
CN110310285B (en) Accurate burn area calculation method based on three-dimensional human body reconstruction
JP2002133446A (en) Face image processing method and system
CN105869160A (en) Method and system for implementing 3D modeling and holographic display by using Kinect
CN113012293A (en) Stone carving model construction method, device, equipment and storage medium
CN112734890B (en) Face replacement method and device based on three-dimensional reconstruction
CN113111861A (en) Face texture feature extraction method, 3D face reconstruction method, device and storage medium
JP2003115042A (en) Method for evaluating three-dimensional shape model and method and device for generating the model
Kersten et al. Automatic texture mapping of architectural and archaeological 3d models
CN110717978A (en) Three-dimensional head reconstruction method based on single image
CN115861525A (en) Multi-view face reconstruction method based on parameterized model
Wu et al. 3D film animation image acquisition and feature processing based on the latest virtual reconstruction technology
Nguyen et al. High-definition texture reconstruction for 3D image-based modeling
JP2004252603A5 (en)
CN116797733A (en) Real-time three-dimensional object dynamic reconstruction method
Santoši et al. Reconstruction of 3D models of cast sculptures using close-range photogrammetry

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20220706
Address after: Room 611-217, R & D center building, China (Hefei) international intelligent voice Industrial Park, 3333 Xiyou Road, high tech Zone, Hefei City, Anhui Province
Applicant after: Hefei lushenshi Technology Co.,Ltd.
Address before: 100000 room 3032, gate 6, building B, 768 Creative Industry Park, No. 5, Xueyuan Road, Haidian District, Beijing
Applicant before: BEIJING DILUSENSE TECHNOLOGY CO.,LTD.
GR01 Patent grant