WO2023001095A1 - Face key point interpolation method and apparatus, computer device, and storage medium - Google Patents


Info

Publication number
WO2023001095A1
PCT/CN2022/106211, CN2022106211W
Authority
WO
WIPO (PCT)
Prior art keywords
human face
target
matrix
face
target area
Prior art date
Application number
PCT/CN2022/106211
Other languages
French (fr)
Chinese (zh)
Inventor
陈文喻
刘更代
王志勇
Original Assignee
百果园技术(新加坡)有限公司
刘更代
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百果园技术(新加坡)有限公司 and 刘更代
Publication of WO2023001095A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects

Definitions

  • The embodiments of the present application relate to the technical field of computer vision, for example, to a face key point interpolation method, apparatus, computer device and storage medium.
  • Augmented reality (AR), face-driven virtual characters (such as dolls or animals), and other image processing are all based on the user's face data.
  • These face data are two-dimensional projections of a three-dimensional face.
  • If the three-dimensional face model and the three-dimensional face pose are to be calculated accurately from the face keypoints, the fitting process is relatively complicated and computationally expensive, so it is difficult to process in real time on devices with relatively scarce resources.
  • The embodiments of the present application propose a face key point interpolation method, apparatus, computer device and storage medium, to address the problem of reducing the amount of calculation needed to fit face keypoints.
  • An embodiment of the present application provides a face key point interpolation method, including: acquiring two-dimensional first face data, the first face data having two-dimensional first face keypoints; fitting three-dimensional second face data according to the first face keypoints, the second face data having three-dimensional second face keypoints; selecting a local area in the second face data as a target area; and linearly deforming the target area, so that when the second face keypoints in the target area are perspective-projected to two-dimensional third face keypoints, the third face keypoints overlap the first face keypoints corresponding to the second face keypoints in the target area.
  • An embodiment of the present application also provides a face key point interpolation apparatus, including: a two-dimensional face data acquisition module, configured to acquire two-dimensional first face data, the first face data having two-dimensional first face keypoints; a three-dimensional face data fitting module, configured to fit three-dimensional second face data according to the first face keypoints, the second face data having three-dimensional second face keypoints; a target area selection module, configured to select a local area in the second face data as a target area; and a target area deformation module, configured to linearly deform the target area, so that when the second face keypoints in the target area are perspective-projected to two-dimensional third face keypoints, the third face keypoints overlap the first face keypoints corresponding to the second face keypoints in the target area.
  • An embodiment of the present application also provides a computer device, including: at least one processor; and a memory configured to store at least one program which, when executed by the at least one processor, causes the at least one processor to implement the face key point interpolation method described above.
  • An embodiment of the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the face key point interpolation method described above.
  • Fig. 1 is a flow chart of a face key point interpolation method provided in Embodiment 1 of the present application;
  • FIG. 2 is an example diagram of a target area provided in Embodiment 1 of the present application.
  • FIG. 3A to FIG. 3E are example diagrams of a reprojection provided in Embodiment 1 of the present application.
  • FIG. 4 is a schematic structural diagram of an interpolation device for key points of a face provided in Embodiment 3 of the present application;
  • FIG. 5 is a schematic structural diagram of a computer device provided in Embodiment 4 of the present application.
  • FIG. 1 is a flow chart of a face key point interpolation method provided in Embodiment 1 of the present application.
  • This embodiment is applicable to situations in which face keypoints are interpolated in a linear manner. The method can be performed by a face key point interpolation apparatus, which can be implemented in software and/or hardware and configured in a computer device, for example a mobile terminal (such as a mobile phone or tablet computer) or a wearable device (such as smart glasses or a smart watch).
  • the method includes the following steps.
  • Step 101: obtain two-dimensional first face data.
  • Operating systems such as Android, iOS and HarmonyOS can be installed on the computer device, and users can install the applications they require on these operating systems, such as live-streaming applications, short video applications, beauty applications, conference applications, and so on.
  • The computer device may be configured with one or more cameras, which may be installed on the front of the computer device (front cameras) or on the back of the computer device (rear cameras).
  • These applications can call the camera facing the user to collect image data, perform face detection on the image data, and detect the user's two-dimensional face data in the image data.
  • The face data are represented by two-dimensional face keypoints.
  • The two-dimensional face data are recorded as the first face data, and the two-dimensional face keypoints are recorded as the first face keypoints; that is, the two-dimensional first face data contain two-dimensional first face keypoints.
  • Face detection, also known as face keypoint detection, positioning, or face alignment, refers to locating the key areas of given face data, including the eyebrows, eyes, nose, mouth, facial contour, and so on.
  • Face detection can be performed in the following ways:
  • Hand-crafted features such as Haar features can be extracted, used to train classifiers, and the classifiers used for face detection.
  • Convolutional neural networks with cascaded structures can be used, for example Cascade CNN and Multi-task Cascaded Convolutional Networks (MTCNN).
  • the number of face key points can be set by those skilled in the art according to the actual situation.
  • When real-time requirements are low, relatively dense face keypoints can be detected, such as 1000.
  • When real-time requirements are high, relatively sparse face keypoints can be detected, such as 68, 81 or 106, locating the more salient and important feature points of the face, such as eye keypoints, eyebrow keypoints, nose keypoints, mouth keypoints, contour keypoints, and so on.
  • The camera facing the user is called to collect video data, which has multiple frames of image data.
  • The two-dimensional first face data are tracked across the multiple frames of image data by methods such as Kalman filtering and the optical flow method.
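  • The frame-to-frame tracking mentioned above can be illustrated with a minimal sketch. The following Python snippet (an illustrative assumption, not the patent's implementation) runs one predict/update step of a constant-velocity Kalman filter on a single keypoint coordinate:

```python
import numpy as np

def kalman_step(x, P, z, dt=1.0, q=1e-3, r=1e-1):
    """One predict/update cycle of a constant-velocity Kalman filter
    for a single keypoint coordinate (state = [position, velocity])."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity transition
    H = np.array([[1.0, 0.0]])              # we observe position only
    Q = q * np.eye(2)                       # process noise
    # Predict.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the measured position z.
    S = H @ P @ H.T + r                     # innovation covariance (1x1)
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain (2x1)
    x = x + K @ (z - H @ x)
    P = (np.eye(2) - K @ H) @ P
    return x, P

x, P = np.zeros(2), np.eye(2)
for _ in range(20):                         # feed a constant measurement
    x, P = kalman_step(x, P, 1.0)
```

  • After a few frames of a constant measurement, the position estimate settles on the measured value and the velocity estimate decays toward zero, which is the smoothing behavior a tracker relies on.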
  • Let P be the pose of the face data in the image data, consisting of a rotation matrix R, whose X-, Y- and Z-axis components are R 0 , R 1 and R 2 , and a translation vector T, whose X-, Y- and Z-axis components are T 0 , T 1 and T 2 .
  • For a three-dimensional point v, the perspective projection is defined by formula (1).
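  • The exact form of formula (1) is given in the original filing; as a sketch, a standard pinhole perspective projection of a point v under a pose (R, T) might be implemented as follows, with illustrative (assumed) camera intrinsics:

```python
import numpy as np

def perspective_project(v, R, T, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Standard pinhole projection of a 3D point v under pose (R, T).
    The intrinsics fx, fy, cx, cy are illustrative placeholders."""
    x, y, z = R @ v + T                     # rigid transform into camera space
    return np.array([fx * x / z + cx, fy * y / z + cy])

p = perspective_project(np.array([1.0, 2.0, 0.0]),
                        np.eye(3), np.array([0.0, 0.0, 10.0]))
```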
  • The first face data F can be expressed by formula (2), where C 0 is the user's expressionless neutral face data, C exp is the user's expression blendshape deformer, and the remaining parameters are the expression coefficient of the face data and the user's identity vector.
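  • As a sketch of this blendshape representation, with toy dimensions and random placeholder data (the identity vector is omitted and the coefficient name alpha is an assumption), the face data can be assembled as a neutral face plus an expression basis times its coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
n_coords, n_exp = 12, 2                         # toy sizes: 4 vertices, 2 blendshapes
C0 = rng.standard_normal(n_coords)              # neutral, expressionless face
C_exp = rng.standard_normal((n_coords, n_exp))  # expression blendshape deformer
alpha = np.array([0.3, 0.7])                    # expression coefficients (assumed name)

F = C0 + C_exp @ alpha                          # face data under this expression
```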
  • The multiple frames of image data in the video data involve the following quantities: Q is the first face keypoints; P is the pose of the face data; the expression coefficient describes the expression of the face data; π P is the perspective projection defined by formula (1); j indexes the j-th first face keypoint; the P and expression coefficient of the previous frame of image data appear in a smoothing term; a regularization term constrains P and the expression coefficient; and w j is the weight of the j-th first face keypoint.
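  • The shape of this fitting objective can be sketched schematically; the projection function, coefficient names and weights below are stand-ins, not the patent's formula (3):

```python
import numpy as np

def fit_energy(params, prev_params, keypoints_2d, project_fn, weights,
               lam_smooth=0.1, lam_reg=0.01):
    """Schematic tracking objective: a weighted reprojection data term,
    a smoothing term against the previous frame, and a regularizer.
    project_fn(params, j) stands in for the perspective projection of
    the j-th model keypoint; the coefficient names are assumptions."""
    data = sum(w * np.sum((project_fn(params, j) - q) ** 2)
               for j, (q, w) in enumerate(zip(keypoints_2d, weights)))
    smooth = lam_smooth * np.sum((params - prev_params) ** 2)
    reg = lam_reg * np.sum(params ** 2)
    return data + smooth + reg

# Toy check: one keypoint at the origin and a trivial "projection".
E = fit_energy(np.array([1.0, 0.0]), np.array([1.0, 0.0]),
               [np.zeros(2)], lambda p, j: p, [1.0])
```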
  • Step 102: fit three-dimensional second face data according to the first face keypoints.
  • The first face keypoints can be used to fit three-dimensional face data through methods such as the 3D Morphable Face Model (3DMM) or end-to-end 3D face reconstruction, for example Virtual Reality Network (VRNet), Position Map Regression Network (PRNet), and 2D-Assisted Self-Supervised Learning (2DASL). The fitted data are recorded as the second face data.
  • The second face data have three-dimensional face keypoints, which are recorded as the second face keypoints.
  • The second face data can be represented in the form of a mesh (such as triangles) with multiple three-dimensional vertices, some of which are the second face keypoints.
  • The three-dimensional second face keypoints are projected to two-dimensional third face keypoints, so that all of the two-dimensional third face keypoints approach all of the two-dimensional first face keypoints.
  • Step 103: select a local area in the second face data as a target area.
  • After the second face data are projected onto the face data, the third face keypoints are obtained, which fit the entire face as a whole; however, interpolation of all the second face keypoints cannot be guaranteed, and some third face keypoints will deviate from the corresponding first face keypoints. Here, interpolation means that when a three-dimensional second face keypoint is projected to a two-dimensional third face keypoint, the third face keypoint overlaps the corresponding first face keypoint.
  • For example, the third face keypoint 302 deviates from the corresponding first face keypoint 301.
  • The target area covers the facial features, such as the eyes, mouth and eyebrows.
  • The target area can be pre-recorded in a configuration file; when the application program performs business operations, it loads the configuration file and reads from it the target area to be deformed.
  • Step 104: perform linear deformation on the target area, so that when the second face keypoints in the target area are perspective-projected to two-dimensional third face keypoints, the third face keypoints overlap the first face keypoints corresponding to the second face keypoints in the target area.
  • Formula (3) can fit the first face keypoints Q i overall, but cannot achieve complete interpolation.
  • Therefore, the local area (i.e., the target area) of the face data containing the second face keypoints is deformed.
  • Laplacian deformation is a process of encoding and decoding the local detail features of a mesh. Encoding refers to converting the Euclidean-space coordinates of the mesh vertices to Laplacian coordinates.
  • The Laplacian coordinates contain the local details of the mesh, so Laplacian deformation can well preserve those local details.
  • Recovering the Euclidean-space coordinates is essentially a process of solving a linear system, so the Laplacian deformation algorithm is efficient and robust.
  • L is the Laplacian matrix, and the corresponding vector contains the Laplacian coordinates of each vertex.
  • π P is the perspective projection defined by formula (1).
  • One constraint keeps the Laplacian coordinates unchanged before and after deformation.
  • Perspective projection is applied to the second face keypoints, and perspective projection is itself a nonlinear operation. Optimization problem (4) is therefore a nonlinear optimization problem; solving it requires a large amount of calculation and is difficult to do in real time on devices with relatively scarce resources.
  • Fig. 2 defines the eyes as the target area in the face data. The vertices 202 on the boundary of the target area remain unchanged during the linear deformation, while the coordinates of the vertices inside the target area are updated so that these vertices preserve their Laplacian coordinates as far as possible.
  • The third face keypoints obtained after the second face keypoints 203 are perspective-projected onto the face data overlap the corresponding first face keypoints. The vertices 201 outside the target area are used only to calculate the Laplacian coordinates of the boundary vertices 202 and do not participate in the linear deformation.
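  • The "encoding" step can be sketched with uniform weights (the weighting scheme is an assumption; the patent does not spell it out): each vertex is encoded as its position minus the mean of its one-ring neighbors.

```python
import numpy as np

def laplacian_coords(vertices, neighbors):
    """Encode each vertex as its position minus the mean of its one-ring
    neighbors (uniform weights, an assumed scheme). These differential
    coordinates capture the local surface detail around each vertex."""
    L = np.empty_like(vertices)
    for i, nbrs in enumerate(neighbors):
        L[i] = vertices[i] - vertices[nbrs].mean(axis=0)
    return L

verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
adj = [[1, 2], [0, 2], [0, 1]]              # one-ring neighbors per vertex
delta = laplacian_coords(verts, adj)
```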
  • The Laplacian vector after linear deformation is expressed in terms of the following matrices.
  • W is the Laplacian matrix of all vertices in the face data, of size 3m × 3m. W 2 is the second sub-Laplacian matrix, i.e., the columns of W corresponding to the first n vertices, those on the boundary of the target area, of size 3m × 3n. W 1 is the first sub-Laplacian matrix, i.e., the matrix remaining after those boundary columns are removed from W, of size 3m × 3(m−n).
  • U = (u 0,0 , u 0,1 , ..., u 0,n-1 ) is the array of vertices on the boundary of the target area, and V = (v 0,n , v 0,n+1 , ..., v 0,m-1 ) is the array of vertices inside the target area.
  • L 0 is a vector of vertex transformations in the target region.
  • The first reference matrix K i,0 can be defined as follows.
  • Π( ) is the perspective projection based on the parameter K of the camera. Let s 0,i , s ∈ {u, v}, be a vertex before the linear transformation, and s 1,i , s ∈ {u, v}, the corresponding vertex after the linear transformation.
  • the goal of interpolation is to minimize the error of reprojection, that is, to minimize the following function:
  • v 0,i = (v 0,i,x , v 0,i,y , v 0,i,z ) is the vertex after linear deformation.
  • Formula (7) is a nonlinear optimization problem.
  • It is assumed that the Z-axis component of the second face keypoints changes little before and after the Laplacian deformation, that is:
  • Let u 0,i = (u 0,i,x , u 0,i,y , u 0,i,z ) be the second face keypoint before the linear transformation.
  • v 1,i,z = R 2 v 0,i + T 2 is the Z-axis component of the linearly transformed vertex.
  • Formula (8) and formula (9) are then minimized simultaneously.
  • step 104 includes the following steps.
  • Step 1041: set the deformation of the target area, and map the vertices inside the target area to three-dimensional target points.
  • Step 1042: calculate the difference between a first vector and a second vector as a vector difference.
  • The vertices in the target area may be linearly transformed, and the difference between the first vector and the second vector is calculated in the vector space after the linear transformation as the vector difference.
  • The first vector is the vector obtained by transforming the vertices in the target area, and the second vector is the vector obtained by transforming the target points.
  • The first vector L 0 can be obtained as follows.
  • W is the Laplacian matrix of all vertices in the face data.
  • W 1 is the first sub-Laplacian matrix, i.e., the matrix remaining after the columns of the first n vertices, those on the boundary of the target area, are removed from W.
  • W 2 is the second sub-Laplacian matrix, i.e., the columns of W corresponding to the first n vertices on the boundary of the target area.
  • The Laplacian matrix W, the first sub-Laplacian matrix W 1 and the second sub-Laplacian matrix W 2 are initialized once before face tracking starts, based on the vertices of the neutral face. This avoids recalculating W, W 1 and W 2 for each frame of image data, thereby speeding up the linear deformation.
  • Laplacian deformation is performed on the target points to obtain the second vector W 2 U + W 1 V.
  • U = (u 0,0 , u 0,1 , ..., u 0,n-1 ) are the vertices on the boundary of the target area, and V = (v 0,n , v 0,n+1 , ..., v 0,m-1 ) are the vertices inside the target area.
  • The first vector is subtracted from the second vector, giving the vector difference W 1 V − (L 0 − W 2 U).
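  • The block structure above can be sketched with dense stand-in matrices (a real implementation would use sparse storage; the sizes and random values here are placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 2                        # m vertices in total, n on the boundary
W = rng.standard_normal((m, m))    # stand-in Laplacian matrix (dense toy version)
W2, W1 = W[:, :n], W[:, n:]        # boundary columns / interior columns

U = rng.standard_normal(n)         # fixed boundary vertices
V0 = rng.standard_normal(m - n)    # interior vertices before deformation
L0 = W2 @ U + W1 @ V0              # first vector: Laplacian of the undeformed pose

V = rng.standard_normal(m - n)     # candidate interior target points
diff = W1 @ V - (L0 - W2 @ U)      # vector difference W1 V - (L0 - W2 U)
```

  • Because L 0 already contains the fixed boundary contribution W 2 U, the difference reduces to (W 2 U + W 1 V) − L 0 , i.e., the second vector minus the first.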
  • Step 1043: calculate the difference between the first face keypoints and the third face keypoints corresponding to the second face keypoints in the target area, as a reprojection difference.
  • The target points are projected into the face data to obtain face keypoints, recorded as the third face keypoints; that is, a third face keypoint is the two-dimensional perspective projection of a target point.
  • The reprojection difference is used as a constraint term to control the interpolation of the second face keypoints.
  • The first reference matrix K i,0 can be calculated. The first reference matrix is the product of an identity matrix for the first face keypoints and a parameter of the camera, where the camera is used to collect the first face data.
  • Let u 0,i = (u 0,i,x , u 0,i,y , u 0,i,z ) be the second face keypoint before the linear transformation.
  • R is the rotation matrix
  • T is the translation vector; both the rotation matrix R and the translation vector T are used for the linear transformation.
  • u 1,i,z = R 2 u 0,i + T 2 is the Z-axis component of the second face keypoint after the linear transformation.
  • The first target matrix J i,0 and the second target matrix J i,1 can then be calculated.
  • The second target matrix is subtracted from the product of the first target matrix and the target point, giving the reprojection difference J i,0 v 0,i − J i,1 .
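  • The role of the target matrices, making the projection linear by freezing the depth at its pre-deformation value, can be sketched as follows; the 2×3 intrinsic matrix and the sign conventions are assumptions for illustration, not the patent's definitions of J i,0 and J i,1 :

```python
import numpy as np

def project(v_cam, K):
    """Exact pinhole projection of a camera-space point."""
    return (K @ v_cam) / v_cam[2]

def linearized_project(v, R, T, K, z_fixed):
    """Projection with the depth frozen at z_fixed, which turns the
    nonlinear projection into a linear map J0 @ v + j1 of the vertex v."""
    J0 = (K @ R) / z_fixed      # linear part
    j1 = (K @ T) / z_fixed      # constant part
    return J0 @ v + j1

K = np.array([[500.0, 0.0, 320.0],          # 2x3 intrinsics (placeholder values)
              [0.0, 500.0, 240.0]])
R, T = np.eye(3), np.array([0.0, 0.0, 10.0])
v = np.array([1.0, 2.0, 0.0])
z = (R @ v + T)[2]                          # depth before deformation
p_lin = linearized_project(v, R, T, K, z)
```

  • When the frozen depth equals the true depth, the linearized projection reproduces the exact one, which is why the small-Z-change assumption of formulas (8) and (9) makes the approximation tight.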
  • Step 1044: calculate the distance the target point moves in the Z-axis direction.
  • The distance the target point moves in the Z-axis direction can be calculated and used as a constraint term to limit the movement of the mesh in the Z-axis direction.
  • The product of the Z-axis component R 2 of the rotation matrix and the target point v 0,i is calculated as a first intermediate value.
  • The difference between the Z-axis component u 1,i,z of the linearly transformed second face keypoint and the Z-axis component T 2 of the translation vector is calculated as a second intermediate value.
  • The difference between the first intermediate value and the second intermediate value is the distance the target point moves in the Z-axis direction: R 2 v 0,i − (u 1,i,z − T 2 ).
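  • This constraint term is a direct evaluation; the numeric values below are placeholders chosen only to exercise the expression:

```python
import numpy as np

R2 = np.array([0.0, 0.0, 1.0])     # Z-axis row of the rotation matrix
T2 = 5.0                           # Z component of the translation vector
u0 = np.array([1.0, 2.0, 3.0])     # second face keypoint before deformation
v0 = np.array([1.1, 2.0, 3.05])    # candidate target point (placeholder)

first = R2 @ v0                    # first intermediate value
u1_z = R2 @ u0 + T2                # Z component after the linear transform
second = u1_z - T2                 # second intermediate value
dz = first - second                # movement along the camera Z axis
```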
  • Step 1045: linearly fuse the vector difference, the reprojection difference and the distance into an objective function.
  • The vector difference, the reprojection difference and the distance can be linearly fused to form the objective function.
  • A first weight is configured for the reprojection difference, and a second weight is configured for the distance.
  • The sum of the vector difference, the reprojection difference scaled by the first weight, and the distance scaled by the second weight is calculated as the objective function, where the two scaling coefficients are the first weight and the second weight.
  • Step 1046: take minimization of the objective function as the goal, and solve for the target points.
  • Optimization problem (7) can be approximated by the method of least squares, minimizing the objective function and thereby solving for the coordinates of the target points.
  • A first sparse matrix A and a second sparse matrix b can be constructed.
  • The first sparse matrix A consists of the first sub-Laplacian matrix, the product of the first weight and the first target matrix J i,0 , and the product of the second weight and the Z-axis component R 2 of the rotation matrix.
  • The second sparse matrix b consists of the difference between the first vector L 0 and the product of the second sub-Laplacian matrix W 2 and the boundary vertices U, the product of the first weight and the second target matrix J i,1 , and the product of the second weight and the second intermediate value.
  • The target points are solved from the target relationship by sparse solver methods such as SimplicialLDLT (the direct LDLT-decomposition solver built into Eigen) and the conjugate gradient method (ConjugateGradient).
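  • The conjugate gradient option can be sketched with a minimal hand-rolled CG on the normal equations of a stacked least-squares system (Eigen's SimplicialLDLT and ConjugateGradient are C++ solvers; this Python analogue and its random toy system are illustrative assumptions):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Minimal conjugate gradient for a symmetric positive-definite A,
    analogous to Eigen's ConjugateGradient solver named in the text."""
    x = np.zeros_like(b)
    r = b - A @ x                  # initial residual
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:  # converged
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Normal equations (A_ls^T A_ls) x = A_ls^T b_ls of a toy system.
rng = np.random.default_rng(0)
A_ls = rng.standard_normal((20, 5))
b_ls = rng.standard_normal(20)
x = conjugate_gradient(A_ls.T @ A_ls, A_ls.T @ b_ls)
```

  • For a small symmetric positive-definite system like this, CG converges in at most as many iterations as there are unknowns, which matches the timing advantage over a direct factorization reported below.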
  • The vertices inside the target area can be set as the initial values of the target points.
  • An iterative algorithm can be used to solve the target relationship (11): starting from the initial values, the target points are iteratively updated until the difference between the third face keypoints and the first face keypoints corresponding to the second face keypoints in the target area is less than a preset threshold.
  • Using a preset threshold can improve the calculation speed and reduce the calculation time.
  • This embodiment solves with ConjugateGradient, adopting the initial value V 0 and a threshold of 1e-5. As the table below shows, this strategy takes half the time of the SimplicialLDLT algorithm, with no obvious difference in the result.
  • After testing, this embodiment performs Laplacian deformation on the two target areas of the left eye and the right eye in 1 ms on a mobile phone, which ensures real-time performance.
  • In this embodiment, two-dimensional first face data with two-dimensional first face keypoints are acquired; three-dimensional second face data with three-dimensional second face keypoints are fitted according to the first face keypoints; a local area in the second face data is selected as the target area; and the target area is linearly deformed, so that when the second face keypoints in the target area are perspective-projected to two-dimensional third face keypoints, the third face keypoints overlap the first face keypoints corresponding to the second face keypoints in the target area. The optimization problem arising when fitting the face data is thus turned into a linear optimization problem.
  • The linear optimization problem is relatively simple to handle and its calculation amount is low, which greatly reduces the calculation time, so the method can run in real time on devices with relatively scarce resources.
  • FIG. 4 is a structural block diagram of an interpolation device for key points of a face provided by Embodiment 2 of the present application.
  • the device may include the following modules.
  • The two-dimensional face data acquisition module 401 is configured to acquire two-dimensional first face data, the first face data having two-dimensional first face keypoints.
  • The three-dimensional face data fitting module 402 is configured to fit three-dimensional second face data according to the first face keypoints, the second face data having three-dimensional second face keypoints.
  • The target area selection module 403 is configured to select a local area in the second face data as the target area.
  • The target area deformation module 404 is configured to perform linear deformation on the target area, so that when the second face keypoints in the target area are perspective-projected to two-dimensional third face keypoints, the third face keypoints overlap the first face keypoints corresponding to the second face keypoints in the target area.
  • The second face data are represented in the form of a mesh with a plurality of three-dimensional vertices, some of which are the second face keypoints.
  • The target area deformation module 404 includes: a vertex mapping module, configured to deform the target area and map the vertices inside the target area to three-dimensional target points; a vector difference calculation module, configured to calculate the difference between a first vector and a second vector as a vector difference, where the first vector is the vector converted from the vertices in the target area and the second vector is the vector converted from the target points; a reprojection difference calculation module, configured to calculate the difference between the third face keypoints and the first face keypoints corresponding to the second face keypoints in the target area as a reprojection difference, where a third face keypoint is the two-dimensional face keypoint obtained by perspective projection of a target point; a moving distance calculation module, configured to calculate the distance the target points move in the Z-axis direction; a linear fusion module, configured to linearly fuse the vector difference, the reprojection difference and the distance into an objective function; and a target point solving module, configured to take minimization of the objective function as the goal and solve for the target points.
  • The vector difference calculation module is configured to: perform a Laplacian transformation on the vertices to obtain the first vector; perform a Laplacian transformation on the target points to obtain the second vector; and subtract the first vector from the second vector to obtain the vector difference.
  • The reprojection difference calculation module is configured to: calculate a first reference matrix, the first reference matrix being the product of an identity matrix for the first face keypoints corresponding to the second face keypoints in the target area and a parameter of the camera, the camera being configured to collect the first face data; calculate a second reference matrix, the second reference matrix being the ratio of the first reference matrix to the Z-axis component of the linearly transformed target point; calculate a first target matrix and a second target matrix, the first target matrix being the product of the second reference matrix and the rotation matrix, and the second target matrix being the inverse of the product of the second reference matrix and the translation vector, both of which are used for the linear transformation; and subtract the second target matrix from the product of the first target matrix and the target point to obtain the reprojection difference.
  • The moving distance calculation module is configured to: calculate the product of the Z-axis component of the rotation matrix and the target point as a first intermediate value; calculate the difference between the Z-axis component of the linearly transformed second face keypoint in the target area and the Z-axis component of the translation vector as a second intermediate value; and calculate the difference between the first intermediate value and the second intermediate value as the distance the target point moves in the Z-axis direction.
  • The linear fusion module is configured to: configure a first weight for the reprojection difference; configure a second weight for the distance; and calculate the sum of the vector difference, the reprojection difference with the first weight, and the distance with the second weight as the objective function.
  • The target point solving module is configured to construct a first sparse matrix A and a second sparse matrix b. The first sparse matrix includes the first sub-Laplacian matrix, the product of the first weight and the first target matrix J i,0 , and the product of the second weight and the Z-axis component of the rotation matrix, where the first sub-Laplacian matrix is the matrix remaining after the columns of the vertices on the boundary of the target area are removed from the Laplacian matrix of all vertices. The second sparse matrix includes the difference between the first vector and the product of the second sub-Laplacian matrix and the vertices on the boundary of the target area, the product of the first weight and the second target matrix J i,1 , and the product of the second weight and the second intermediate value, where the second sub-Laplacian matrix is the first n columns of the Laplacian matrix of all vertices, corresponding to the vertices on the boundary of the target area.
  • The target point solving module is configured to: set the vertices inside the target area as the initial values of the target points; and iteratively update the target points until the difference between the third face keypoints and the corresponding first face keypoints is less than a preset threshold.
  • The two-dimensional face data acquisition module 401 includes: a video data acquisition module, configured to call a camera to collect video data, the video data having multiple frames of image data; and a face tracking module, configured to track the two-dimensional first face data in the multiple frames of image data.
  • the face key point interpolation apparatus provided in the embodiments of the present application can execute the face key point interpolation method provided in any embodiment of the present application, and has functional modules corresponding to the execution of the method.
  • FIG. 5 is a schematic structural diagram of a computer device provided in Embodiment 3 of the present application.
  • FIG. 5 shows a block diagram of an exemplary computer device 12 suitable for implementing embodiments of the present application.
  • the computer device 12 shown in FIG. 5 is only an example, and should not limit the functions and scope of use of this embodiment of the present application.
  • computer device 12 takes the form of a general-purpose computing device.
  • Components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the various system components (including the system memory 28 and the processing unit 16).
  • Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
  • these architectures include but are not limited to Industry Standard Architecture (Industry Standard Architecture, ISA) bus, Micro Channel Architecture (Micro Channel Architecture, MCA) bus, Enhanced ISA bus, Video Electronics Standards Association (Video Electronics Standards Association, VESA) local bus and peripheral component interconnection (Peripheral Component Interconnection, PCI) bus.
  • Computer device 12 includes a variety of computer system readable media. These media can be any available media that can be accessed by computer device 12 and include both volatile and nonvolatile media, removable and non-removable media.
  • System memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32 .
  • Computer device 12 may also include other removable/non-removable, volatile/nonvolatile computer system storage media.
  • storage system 34 may be configured to read from and write to non-removable, non-volatile magnetic media (not shown in Figure 5, commonly referred to as a "hard drive").
  • disk drives for reading from and writing to removable non-volatile magnetic disks (such as "floppy disks"), and optical disc drives for reading from and writing to removable non-volatile optical discs (such as Compact Disc Read-Only Memory (CD-ROM), Digital Video Disc Read-Only Memory (DVD-ROM), or other optical media) may also be provided.
  • each drive may be connected to bus 18 via one or more data media interfaces.
  • Memory 28 may include at least one program product having a set (eg, at least one) of program modules configured to perform the functions of various embodiments of the present application.
  • a program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.
  • the program modules 42 generally perform the functions and/or methods of the embodiments described herein.
  • the computer device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (e.g., a network card, a modem, etc.) that enables the computer device 12 to communicate with one or more other computing devices. Such communication can be performed through an input/output (I/O) interface 22.
  • the computer device 12 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 20.
  • as shown in FIG. 5, the network adapter 20 communicates with the other modules of the computer device 12 via the bus 18. It should be appreciated that, although not shown in FIG. 5, other hardware and/or software modules may be used in conjunction with the computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.
  • the processing unit 16 executes a variety of functional applications and data processing by running the programs stored in the system memory 28 , such as realizing the interpolation method of key points of the face provided by the embodiment of the present application.
  • Embodiment 4 of the present application also provides a computer-readable storage medium.
  • a computer program is stored on the computer-readable storage medium.
  • when the computer program is executed by a processor, the multiple processes of the above interpolation method for face key points are implemented. To avoid repetition, details are not described here again.
  • a computer-readable storage medium may include, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (non-exhaustive list) of computer-readable storage media include: electrical connections with one or more leads, portable computer disks, hard disks, RAM, Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM) or flash memory, optical fiber, CD-ROM, optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.


Abstract

A face key point interpolation method and apparatus, a computer device, and a storage medium. The method comprises: acquiring two-dimensional first face data (101), the first face data comprising two-dimensional first face key points; fitting three-dimensional second face data according to the first face key points (102), the second face data comprising three-dimensional second face key points; selecting a local area in the second face data as a target area (103); and performing linear deformation on the target area, so that when the second face key points in the target area are subjected to perspective projection to two-dimensional third face key points, the third face key points overlap with the first face key points corresponding to the second face key points in the target area (104).

Description

Interpolation Method and Apparatus for Face Key Points, Computer Device, and Storage Medium
This application claims priority to the Chinese patent application No. 202110836320.0 filed with the China Patent Office on July 23, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the technical field of computer vision, and for example to an interpolation method and apparatus for face key points, a computer device, and a storage medium.
Background
Image processing such as augmented reality (AR) (e.g., letting a user try on a hat or glasses, adding a beard, etc.) and driving virtual characters (e.g., dolls, animals) with a human face is performed on the basis of the user's face data.
Such face data is a two-dimensional projection of a three-dimensional face. To meet business needs, a three-dimensional face model and a three-dimensional face pose are accurately calculated from the face key points in the face data. This is a process of fitting face key points; it is relatively complicated to handle and computationally expensive, making real-time processing difficult on devices with scarce resources.
Summary
The embodiments of the present application provide an interpolation method and apparatus for face key points, a computer device, and a storage medium, to solve the problem of how to reduce the amount of computation for fitting face key points.
An embodiment of the present application provides an interpolation method for face key points, including: acquiring two-dimensional first face data, the first face data containing two-dimensional first face key points; fitting three-dimensional second face data according to the first face key points, the second face data containing three-dimensional second face key points; selecting a local area in the second face data as a target area; and performing linear deformation on the target area, so that when the second face key points in the target area are perspective-projected to two-dimensional third face key points, the third face key points overlap with the first face key points corresponding to the second face key points in the target area.
An embodiment of the present application further provides an interpolation apparatus for face key points, including: a two-dimensional face data acquisition module, configured to acquire two-dimensional first face data, the first face data containing two-dimensional first face key points; a three-dimensional face data fitting module, configured to fit three-dimensional second face data according to the first face key points, the second face data containing three-dimensional second face key points; a target area selection module, configured to select a local area in the second face data as a target area; and a target area deformation module, configured to perform linear deformation on the target area, so that when the second face key points in the target area are perspective-projected to two-dimensional third face key points, the third face key points overlap with the first face key points corresponding to the second face key points in the target area.
An embodiment of the present application further provides a computer device, including: at least one processor; and a memory configured to store at least one program which, when executed by the at least one processor, causes the at least one processor to implement the interpolation method for face key points described above.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the interpolation method for face key points described above.
Description of Drawings
FIG. 1 is a flowchart of an interpolation method for face key points provided in Embodiment 1 of the present application;
FIG. 2 is an example diagram of a target area provided in Embodiment 1 of the present application;
FIG. 3A to FIG. 3E are example diagrams of reprojection provided in Embodiment 1 of the present application;
FIG. 4 is a schematic structural diagram of an interpolation apparatus for face key points provided in Embodiment 3 of the present application;
FIG. 5 is a schematic structural diagram of a computer device provided in Embodiment 4 of the present application.
Detailed Description
The present application is described below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments described here are only intended to explain the present application, not to limit it. In addition, for ease of description, the drawings show only the parts relevant to the present application rather than the entire structure.
Embodiment One
FIG. 1 is a flowchart of an interpolation method for face key points provided in Embodiment 1 of the present application. This embodiment is applicable to the case of interpolating face key points in a linear manner. The method may be performed by an interpolation apparatus for face key points, which may be implemented by software and/or hardware and configured in a computer device, for example, a mobile terminal (such as a mobile phone or a tablet computer) or a wearable device (such as smart glasses or a smart watch). The method includes the following steps.
Step 101: Acquire two-dimensional first face data.
A computer device may run an operating system such as Android, iOS, or HarmonyOS, on which users can install the applications they need, for example, live-streaming applications, short-video applications, beauty applications, and conference applications.
The computer device may be equipped with one or more cameras, which may be installed on the front of the computer device (also called front cameras) or on the back of the computer device (also called rear cameras).
In business operations such as AR and face-driven virtual characters, these applications can call a camera facing the user to collect image data and perform face detection on the image data, thereby detecting the user's two-dimensional face data in the image data. The face data is represented by two-dimensional face key points. For ease of distinction, the two-dimensional face data is recorded as first face data, and the two-dimensional face key points are recorded as first face key points; that is, the two-dimensional first face data contains two-dimensional first face key points.
Face detection, also known as face key point detection, localization, or face alignment, refers to locating the key regions of a face in given face data, including the eyebrows, eyes, nose, mouth, facial contour, and so on.
Face detection can be performed in the following ways:
1. Manually extract features, such as Haar features, train a classifier with the features, and use the classifier for face detection.
2. Inherit face detection from general object detection algorithms; for example, use Faster R-CNN to detect faces.
3. Use convolutional neural networks with a cascade structure, for example, Cascade CNN and Multi-task Cascaded Convolutional Networks (MTCNN).
It should be noted that the number of face key points can be set by those skilled in the art according to the actual situation. For static image processing, the real-time requirement is low, and relatively dense face key points (e.g., 1000) can be detected; besides locating the important feature points of the face, they can also accurately describe the contours of the facial features. For live streaming and the like, the real-time requirement is high, and relatively sparse face key points (e.g., 68, 81, or 106) can be detected to locate the more obvious and important feature points on the face (such as eye key points, eyebrow key points, nose key points, mouth key points, and contour key points), so as to reduce the processing load and processing time. The embodiments of the present application do not limit this.
Exemplarily, a camera facing the user is called to collect video data containing multiple frames of image data, and the two-dimensional first face data is tracked in the multiple frames of image data by methods such as Kalman filtering and optical flow.
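As a minimal sketch of the per-frame tracking step, the following replaces the Kalman-filter or optical-flow tracker with simple exponential smoothing of detected key points; the function name, smoothing coefficient, and coordinates are illustrative assumptions, not from the disclosure:

```python
import numpy as np

def smooth_keypoints(frames, alpha=0.6):
    """Exponentially smooth per-frame 2D key points: a simplified stand-in
    for the Kalman-filter / optical-flow tracking mentioned above."""
    smoothed, prev = [], None
    for pts in frames:
        pts = np.asarray(pts, dtype=float)
        prev = pts if prev is None else alpha * pts + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed

# Two key points tracked over three frames (illustrative coordinates).
frames = [[[10.0, 10.0], [20.0, 20.0]],
          [[12.0, 10.0], [21.0, 20.0]],
          [[14.0, 11.0], [22.0, 21.0]]]
tracked = smooth_keypoints(frames)
```

The first frame passes through unchanged; later frames blend the new detection with the running estimate, which damps detector jitter between frames.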
For perspective projection, let P be the pose of the face data in the image data. Under perspective projection, the pose of the face data is:

P = {R, T}

where R is the rotation matrix

R = (R_0; R_1; R_2)

with R_0, R_1, and R_2 the X-axis, Y-axis, and Z-axis components (rows) of the rotation matrix, and T = (T_0, T_1, T_2) is the translation vector, with T_0, T_1, and T_2 the X-axis, Y-axis, and Z-axis components of the translation vector.
Given the parameters K of the camera (e.g., intrinsic and extrinsic parameters), a three-dimensional point v is perspective-projected as:

Π_P(v) = ( f_x · (R_0 v + T_0) / (R_2 v + T_2) + c_x , f_y · (R_1 v + T_1) / (R_2 v + T_2) + c_y )   (1)

where f_x and f_y are the focal lengths and (c_x, c_y) is the principal point in K.
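The projection of formula (1) can be sketched as follows, assuming a standard pinhole intrinsic matrix K with focal lengths f_x, f_y and principal point (c_x, c_y); all numeric values are illustrative:

```python
import numpy as np

def perspective_project(K, R, T, v):
    """Pi_P(v) from formula (1): rigid transform by the pose P = {R, T},
    then pinhole division by the Z component R_2 v + T_2."""
    p = R @ v + T                          # (R_0 v + T_0, R_1 v + T_1, R_2 v + T_2)
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    return np.array([fx * p[0] / p[2] + cx,
                     fy * p[1] / p[2] + cy])

K = np.array([[500.0,   0.0, 320.0],       # assumed pinhole intrinsics
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                              # identity rotation for the example
T = np.array([0.0, 0.0, 5.0])
uv = perspective_project(K, R, T, np.array([0.1, -0.2, 0.0]))
```

With an identity rotation and a depth of 5, the sample point lands at pixel (330, 220).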
In the process of face tracking, the first face data F can be expressed as:

F = F(α, δ) = C_0 + C_exp δ   (2)

where C_0 is the user's neutral, expressionless face data, C_exp is the user's expression shape fusion deformer (blendshape basis), δ is the expression of the face data, and α is the user's identity vector.
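Formula (2) is a linear blendshape evaluation, which can be sketched as follows; the toy dimensions and coefficients are illustrative:

```python
import numpy as np

def face_from_expression(C0, Cexp, delta):
    """Formula (2): F = C_0 + C_exp * delta, i.e. the neutral face plus a
    linear combination of expression blendshape offsets."""
    return C0 + Cexp @ delta

# Toy face of 2 vertices (6 stacked coordinates) and 3 expression blendshapes.
C0 = np.zeros(6)
Cexp = np.eye(6)[:, :3]           # each blendshape moves one coordinate by 1
delta = np.array([1.0, 0.5, 0.0])
F = face_from_expression(C0, Cexp, delta)
```

Because the mapping from δ to F is linear, evaluating a new expression is a single matrix-vector product.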
In the process of face tracking, given the user's identity vector α, the multiple frames of image data in the video data contain the following data:

{Q | P, δ}

where Q denotes the first face key points, P is the pose of the face data, and δ is the expression of the face data.
In the process of face tracking, the coordinate descent method can be used to solve the following optimization equation to obtain P and δ:

(P, δ) = argmin( Σ_j w_j ‖Π_P(C_0 + C_exp δ)_j − Q_j‖² + γ_0 ‖(P, δ) − (P_{−1}, δ_{−1})‖ + γ ‖(P, δ)‖ )   (3)

where Π_P is the perspective projection defined by formula (1), j indexes the j-th first face key point, (P_{−1}, δ_{−1}) are the P and δ of the previous frame of image data, γ_0 ‖(P, δ) − (P_{−1}, δ_{−1})‖ is the smoothing term, γ ‖(P, δ)‖ is the regularization term, and w_j is the weight of the j-th first face key point.
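As a sketch of one coordinate-descent step for formula (3): with the pose P held fixed and the projection linearized around the current estimate, the δ update reduces to regularized linear least squares. The matrices below are illustrative stand-ins for the linearized key-point term, and squared penalties are used for the smoothing and regularization terms so the step is closed-form (the formula above writes unsquared norms):

```python
import numpy as np

def expression_step(A, b, delta_prev, gamma0=0.1, gamma=0.01):
    """One delta update with the pose fixed: minimize
    ||A d - b||^2 + gamma0 ||d - delta_prev||^2 + gamma ||d||^2
    by solving its normal equations."""
    n = A.shape[1]
    lhs = A.T @ A + (gamma0 + gamma) * np.eye(n)
    rhs = A.T @ b + gamma0 * delta_prev
    return np.linalg.solve(lhs, rhs)

A = np.array([[1.0, 0.0],         # illustrative linearized key-point Jacobian
              [0.0, 2.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 2.0])     # illustrative key-point residuals
delta = expression_step(A, b, delta_prev=np.zeros(2))
```

Alternating such a δ step with a pose step (δ fixed) is the coordinate-descent pattern the tracking stage relies on.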
Step 102: Fit three-dimensional second face data according to the first face key points.
In this embodiment, the first face key points can be used to fit three-dimensional face data by methods such as the 3D Morphable Face Model (3DMM) and end-to-end 3D face reconstruction (e.g., Virtual Reality Network (VRNet), Position Map Regression Network (PRNet), and 2D-Assisted Self-Supervised Learning (2DASL)). The fitted face data is recorded as second face data, and it has three-dimensional face key points, recorded as second face key points.
Generally, as shown in FIG. 2, the second face data can be represented in the form of a mesh (e.g., of triangles). The mesh has multiple three-dimensional vertices, some of which are second face key points.
In the process of fitting the second face data, the three-dimensional second face key points are projected to two-dimensional third face key points, so that all the two-dimensional third face key points approach all the two-dimensional first face key points.
Step 103: Select a local area in the second face data as a target area.
In the process of fitting the second face data, the third face key points are obtained after the second face data is projected onto the face data. The whole face can be fitted as a whole; however, interpolation of all the second face key points cannot be guaranteed, and some third face key points will deviate from the corresponding first face key points. Here, interpolation means that when a three-dimensional second face key point is projected to a two-dimensional third face key point, the third face key point overlaps with the corresponding first face key point.
For example, as shown in FIG. 3A, for the face key points representing the facial contour, when the three-dimensional second face key points are projected to the two-dimensional third face key points 302, the third face key points 302 deviate from the corresponding first face key points 301.
In order to interpolate this part of the second face key points, a local area in the second face data can be selected according to business requirements as a target area to be deformed. That is, after fitting the second face data, one more local deformation is performed, so that the specified second face key points can coincide with the corresponding first face key points under perspective projection.
Generally, the target area covers the facial features, for example, the eyes, mouth, and eyebrows. The target area can be recorded in a configuration file in advance; when the application performs business operations, the configuration file is loaded and the target area to be deformed is read from it.
Step 104: Perform linear deformation on the target area, so that when the second face key points in the target area are perspective-projected to two-dimensional third face key points, the third face key points overlap with the first face key points corresponding to the second face key points in the target area.
Formula (3) can fit the first face key points as a whole, but cannot achieve complete interpolation, that is:

Π_P(C_0 + C_exp δ)_j ≠ Q_j
In the process of face tracking, it is sometimes desired to interpolate some first face key points, such as the first face key points of the eyes, so that three-dimensional special effects can be applied to the eyes.
Suppose J is the set of second face key points to be interpolated. Appropriately increasing the weights w_j, j ∈ J, of these second face key points can improve their fitting degree, but interpolation still cannot be guaranteed. If the weights w_j (j ∈ J) of the second face key points are set too large, the result becomes unstable and distorted face data is reconstructed.
In order to interpolate the second face key points J, the local area of the face data containing the second face key points J (i.e., the target area) is deformed.
Taking Laplacian deformation as an example, Laplacian deformation is a process of encoding and decoding the local detail features of a mesh. Encoding refers to the conversion of the Euclidean space coordinates of the mesh vertices to Laplacian coordinates; the Laplacian coordinates contain the local detail features of the mesh, so Laplacian deformation can well preserve the local details of the mesh. Decoding refers to recovering the Euclidean space coordinates from the differential coordinates, which is essentially a process of solving a linear system; therefore, the Laplacian deformation algorithm is efficient and robust.
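The encoding step described above can be sketched with a uniform (umbrella) Laplacian on a toy mesh; the mesh connectivity and weighting are illustrative assumptions:

```python
import numpy as np

def uniform_laplacian(n, edges):
    """Uniform (umbrella) Laplacian L = I - D^-1 A: L @ V gives each vertex's
    offset from the average of its neighbors, i.e. its Laplacian coordinate."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.eye(n) - np.diag(1.0 / A.sum(axis=1)) @ A

# A tiny 4-vertex path "mesh": encoding captures the local bump at vertex 1.
edges = [(0, 1), (1, 2), (2, 3)]
V = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.5, 0.0],    # raised above the line through its neighbors
              [2.0, 0.0, 0.0],
              [3.0, 0.0, 0.0]])
L = uniform_laplacian(4, edges)
delta = L @ V                     # Laplacian coordinates (the encoding step)
```

Decoding is the reverse: fix some vertices, keep delta as the target, and solve the resulting linear system for the remaining coordinates.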
Suppose V is the set of all vertices in the target area. In the process of deforming the target area by Laplacian deformation, the Laplacian coordinates are kept unchanged while ensuring that the second face key points J are interpolated:
argmin_V ( ‖LV − Δ‖ + γ Σ_{j∈J} ‖Π_P(C_0 + C_exp δ)_j − Q_j‖² )   (4)

where L is the Laplacian matrix, Δ contains the Laplacian coordinates corresponding to each vertex, and Π_P is the perspective projection defined by formula (1).
The first optimization term, ‖LV − Δ‖, keeps the Laplacian coordinates unchanged before and after the deformation; the second optimization term, Σ_{j∈J} ‖Π_P(C_0 + C_exp δ)_j − Q_j‖², performs the interpolation of the second face key points. Here the second face key points undergo perspective projection, and perspective projection itself is a nonlinear operation. Therefore, optimization problem (4) is a nonlinear optimization problem; solving it requires a large amount of computation, which is difficult to perform in real time on devices with scarce resources.
In this embodiment, when interpolating the second face key points, a linear optimization method is used to deform the target area.
Exemplarily, FIG. 2 defines the eyes as the target area in the face data. The vertices 202 on the boundary of the target area remain unchanged during the linear deformation. The vertices inside the target area have their coordinates updated during the linear deformation, so that they keep their Laplacian coordinates as much as possible while the third face key points obtained by projecting the second face key points 203 onto the face data overlap with the corresponding first face key points. The vertices 201 outside the target area are used to compute the Laplacian coordinates of the vertices 202 on the boundary of the target area and do not participate in the linear deformation.
By definition, the Laplacian vector after the linear deformation is:

W_2 U + W_1 V
where W is the Laplacian matrix of all vertices in the face data, of size 3m×3m; W_1 is the first sub-Laplacian matrix, that is, the matrix remaining after removing, from the Laplacian matrix W of all vertices, the first n columns corresponding to the vertices on the boundary of the target area, of size 3m×3(m−n); and W_2 is the second sub-Laplacian matrix, that is, the first n columns of the Laplacian matrix W of all vertices corresponding to the vertices on the boundary of the target area, of size 3m×3n.
U = (u_{0,0}, u_{0,1}, …, u_{0,n−1}) is the array of vertices on the boundary of the target area, and V = (v_{0,n}, v_{0,n+1}, …, v_{0,m−1}) is the array of vertices inside the target area.
In order to keep the Laplacian vector unchanged while interpolating the second face key points, the following needs to be optimized:

min ‖W_1 V − (L_0 − W_2 U)‖   (5)

where L_0 is the vector converted from the vertices in the target area (the Laplacian vector before the deformation).
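A toy instance of minimization (5): the boundary vertices U are held fixed, and the interior vertices V are solved in least squares so that the deformed Laplacian vector stays close to L_0. The graph, its Laplacian weights, and the coordinates are illustrative, and the example is one-dimensional for brevity:

```python
import numpy as np

# 4-vertex path graph; vertices 0 and 3 form the fixed boundary array U,
# vertices 1 and 2 form the free interior array V.
W = np.array([[ 1.0, -1.0,  0.0,  0.0],
              [-0.5,  1.0, -0.5,  0.0],
              [ 0.0, -0.5,  1.0, -0.5],
              [ 0.0,  0.0, -1.0,  1.0]])   # uniform Laplacian of the path
V0 = np.array([0.0, 1.0, 2.0, 3.0])        # coordinates before deformation
L0 = W @ V0                                 # Laplacian vector to be preserved

U = np.array([0.0, 3.5])                    # boundary after moving vertex 3
W2 = W[:, [0, 3]]                           # columns of the boundary vertices
W1 = W[:, [1, 2]]                           # columns of the interior vertices

# Solve min || W1 V - (L0 - W2 U) ||, i.e. formula (5).
V, *_ = np.linalg.lstsq(W1, L0 - W2 @ U, rcond=None)
```

Moving one boundary vertex pulls the interior vertices along in a way that best preserves the original Laplacian coordinates, which is exactly the behavior wanted for the target area.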
Suppose B is the index set of the first face key points, and q_i = (q_{i,x}, q_{i,y}), i ∈ B, are the known first face key points. With the parameters of the camera

K = ( f_x, 0, c_x ; 0, f_y, c_y ; 0, 0, 1 )

the following first reference matrix K_{i,0} can be defined:

K_{i,0} = ( f_x, 0, c_x − q_{i,x} ; 0, f_y, c_y − q_{i,y} )
According to formula (1), Π(·) is the perspective projection based on the parameters K of the camera. Let s_{0,i}, s ∈ {u, v}, be a vertex before the linear transformation, and apply the transformation to obtain the vertex s_{1,i} = R s_{0,i} + T, s ∈ {u, v}; then

Π(s_{0,i}) = ( f_x · s_{1,i,x} / s_{1,i,z} + c_x , f_y · s_{1,i,y} / s_{1,i,z} + c_y )
对于i∈B,s∈{u,v},结合第一参考矩阵K i,0,重新将第二人脸关键点进行投射投影(即重投影)的误差可以改写为: For i∈B,s∈{u,v}, combined with the first reference matrix K i,0 , the error of re-projecting the second face key points (that is, reprojection) can be rewritten as:
Figure PCTCN2022106211-appb-000008
Figure PCTCN2022106211-appb-000008
The goal of the interpolation is to minimize the reprojection error, that is, to minimize the following function:

min Σ_{i∈B} ‖Π(v_{1,i}) − q_i‖   (6)

v_{0,i}=(v_{0,i,x}, v_{0,i,y}, v_{0,i,z}) is the vertex after the linear (Laplacian) deformation, and v_{1,i}=R v_{0,i}+T is its position after the linear transformation.
Combining formula (5) and formula (6), the goal of the interpolation is to solve the following function:

min ‖W_1 V − (L_0 − W_2 U)‖ + Σ_{i∈B} ‖Π(v_{1,i}) − q_i‖   (7)
Formula (7) is a nonlinear optimization problem. To keep the reprojection error of the mesh obtained by face tracking small enough, it can be assumed that the Z-axis component of each second face key point changes very little before and after the Laplacian deformation, that is:

v_{1,i,z} ≈ u_{1,i,z}
Before the linear deformation, let u_{0,i}=(u_{0,i,x}, u_{0,i,y}, u_{0,i,z}) be a second face key point before the linear transformation, and u_{1,i}=(u_{1,i,x}, u_{1,i,y}, u_{1,i,z})=R u_{0,i}+T be the second face key point after the linear transformation. Let v_{0,i}=(v_{0,i,x}, v_{0,i,y}, v_{0,i,z}) be a vertex before the deformation, v_{1,i}=R v_{0,i}+T=(v_{1,i,x}, v_{1,i,y}, v_{1,i,z}) be the vertex after the linear transformation, and v_{1,i,z}=R_2 v_{0,i}+T_2 be the Z-axis component of the vertex after the linear transformation, where R_2 is the Z (third) row of the rotation matrix R and T_2 is the Z component of the translation vector T.
Likewise, u_{1,i,z}=R_2 u_{0,i}+T_2 is the Z-axis component of the second face key point after the linear transformation.
If formula (7) is approximated in the following way, a distorted result is obtained:

Π(v_{1,i}) − q_i ≈ K_{i,0}(R v_{0,i} + T)   (8)

min ‖W_1 V − (L_0 − W_2 U)‖ + Σ_{i∈B} ‖K_{i,0}(R v_{0,i} + T)‖   (9)
According to formula (6):

K_{i,0}(R v_{0,i} + T) = v_{1,i,z}(Π(v_{1,i}) − q_i)

so minimizing ‖K_{i,0}(R v_{0,i}+T)‖ is minimizing ‖v_{1,i,z}(Π(v_{1,i})−q_i)‖.
Therefore, formula (8) and formula (9) minimize ‖Π(v_{1,i})−q_i‖ and ‖v_{1,i,z}‖ simultaneously; that is, while making the reprojected third face key points coincide with the first face key points corresponding to the second face key points in the target area, they also minimize ‖v_{1,i,z}‖, which causes the distortion.
As shown in Figure 3A, if the second face key points of the eye are interpolated in this way, so that the reprojected third face key points coincide with the corresponding first face key points, both the frontal view of the right eye shown in Figure 3C and the top view of the right eye shown in Figure 3E linearly approximate the right eye incorrectly, resulting in distortion.
In an embodiment of the present application, step 104 includes the following steps.
Step 1041: set a deformation of the target area that maps the vertices inside the target area to three-dimensional target points.
In this embodiment, the target area in the face data can be deformed, e.g., by Laplacian deformation, so that in three-dimensional space the vertices inside the target area are mapped to target points v_{0,i}=(v_{0,i,x}, v_{0,i,y}, v_{0,i,z}); the target points are the unknowns to be solved.
Step 1042: calculate the difference between the first vector and the second vector as the vector difference.
In this embodiment, a linear transformation can be applied to the vertices in the target area, and the difference between the first vector and the second vector is calculated in the vector space after the linear transformation as the vector difference.
The first vector is the vector converted from the vertices in the target area, and the second vector is the vector converted from the target points.
Exemplarily, applying Laplacian deformation to the vertices in the target area gives the first vector L_0:

L_0 = WP = W_2 P_2 + W_1 P_1

W is the Laplacian matrix of all vertices in the face data; W_1 is the first sub-Laplacian matrix, i.e., the matrix that remains after removing from W the first n columns, which correspond to the vertices on the boundary of the target area; W_2 is the second sub-Laplacian matrix, i.e., those first n columns of W.
P=(p_0, p_1, …, p_{m−1}) are the vertices of the face data in the target area, where the first n vertices are those located on the boundary of the target area; P_1=(p_n, p_{n+1}, …, p_{m−1}) are the vertices located inside the target area, and P_2=(p_0, p_1, …, p_{n−1}) are the vertices located on the boundary of the target area.
The Laplacian matrix W, the first sub-Laplacian matrix W_1 and the second sub-Laplacian matrix W_2 are initialized before the whole face tracking starts, based on the vertices of the neutral face. This avoids recomputing W, W_1 and W_2 for every frame of image data and thus speeds up the linear deformation.
Applying Laplacian deformation to the target points gives the second vector W_2 U + W_1 V.
U=(u_{0,0}, u_{0,1}, …, u_{0,n−1}) are the vertices located on the boundary of the target area, and V=(v_{0,n}, v_{0,n+1}, …, v_{0,m−1}) are the vertices located inside the target area.
Subtracting the second vector from the first vector gives the vector difference W_1 V − (L_0 − W_2 U).
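As a minimal numeric sketch of steps 1041 and 1042 (numpy, with a random stand-in Laplacian and toy vertex counts rather than the patent's actual face mesh), the column split of W and the vector difference can be written as:

```python
import numpy as np

def split_laplacian(W, n):
    """Split the full Laplacian W (3m x 3m) into W2 (first 3n columns,
    boundary vertices) and W1 (remaining columns, interior vertices)."""
    W2 = W[:, :3 * n]
    W1 = W[:, 3 * n:]
    return W1, W2

# Toy target area: m = 4 vertices, of which n = 1 lies on the boundary.
m, n = 4, 1
rng = np.random.default_rng(0)
W = rng.standard_normal((3 * m, 3 * m))   # stand-in Laplacian, built once before tracking
W1, W2 = split_laplacian(W, n)

# Tracked vertex coordinates, boundary block P2 first, interior block P1 after.
P2 = rng.standard_normal(3 * n)
P1 = rng.standard_normal(3 * (m - n))
P = np.concatenate([P2, P1])

L0 = W @ P                        # first vector: Laplacian of the tracked vertices
U, V = P2, P1                     # boundary stays fixed; V is the interior candidate
diff = W1 @ V - (L0 - W2 @ U)     # vector difference of formula (5)
```

When V equals the tracked interior block itself, the difference is zero; the optimization then seeks the V that keeps this Laplacian as unchanged as possible while also satisfying the reprojection constraints.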
Step 1043: calculate the difference between the first face key points corresponding to the second face key points in the target area and the third face key points, as the reprojection difference.
In this embodiment, projecting the target points into the face data yields face key points, denoted third face key points; that is, a third face key point is the two-dimensional face key point obtained by perspective projection of a target point. The difference between the first face key points corresponding to the second face key points in the target area and the third face key points is calculated and denoted the reprojection difference. The reprojection difference serves as a constraint term used to control the interpolation of the second face key points.
In implementation, let q_i=(q_{i,x}, q_{i,y}), i∈B, be the first face key points; the first reference matrix K_{i,0} can then be calculated. The first reference matrix is the product of the identity matrix augmented with the first face key point and the parameters of the camera, where the camera is used to collect the first face data:

K_{i,0} = [1 0 −q_{i,x}; 0 1 −q_{i,y}] K

where [1 0 −q_{i,x}; 0 1 −q_{i,y}] is the identity matrix augmented with the key point, and K is the matrix of camera parameters.
The second reference matrix K_i is then calculated; the second reference matrix is the ratio between the first reference matrix and the Z-axis component of the linearly transformed target point:

K_i = K_{i,0} / u_{1,i,z}

(by the assumption above, v_{1,i,z} ≈ u_{1,i,z}, so the known component u_{1,i,z} is used).
Before the linear deformation, let u_{0,i}=(u_{0,i,x}, u_{0,i,y}, u_{0,i,z}) be a second face key point before the linear transformation, and u_{1,i}=(u_{1,i,x}, u_{1,i,y}, u_{1,i,z})=R u_{0,i}+T be the second face key point after the linear transformation, where R is the rotation matrix and T is the translation vector, i.e., both R and T are used for the linear transformation; u_{1,i,z}=R_2 u_{0,i}+T_2 is the Z-axis component of the second face key point after the linear transformation.
At this point, the first target matrix J_{i,0} and the second target matrix J_{i,1} can be calculated.
The first target matrix is the product of the second reference matrix and the rotation matrix, J_{i,0}=K_i R, and the second target matrix is the negative of the product of the second reference matrix and the translation vector, J_{i,1}=−K_i T.
Subtracting the second target matrix from the product of the first target matrix and the target point gives the reprojection difference J_{i,0} v_{0,i} − J_{i,1}.
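The matrices of step 1043 can be sketched as follows. The pinhole camera matrix K (focal lengths and principal point) and the [I | −q_i] construction of the first reference matrix are illustrative assumptions standing in for the patent's camera parameters:

```python
import numpy as np

def reference_and_target_matrices(q, K, R, T, u0):
    """Build K_{i,0}, K_i, J_{i,0}, J_{i,1} for one key point q."""
    # First reference matrix: identity augmented with -q, times camera matrix K.
    Iq = np.array([[1.0, 0.0, -q[0]],
                   [0.0, 1.0, -q[1]]])
    Ki0 = Iq @ K
    # Z component of the linearly transformed key point: u_{1,z} = R_2 u0 + T_2.
    u1z = R[2] @ u0 + T[2]
    # Second reference matrix: ratio of K_{i,0} to that Z component.
    Ki = Ki0 / u1z
    # Target matrices: J_{i,0} = K_i R, J_{i,1} = -K_i T.
    return Ki0, Ki, Ki @ R, -(Ki @ T)

# Illustrative camera intrinsics, pose and points.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R, T = np.eye(3), np.array([0.0, 0.0, 5.0])
u0 = np.array([0.1, -0.2, 1.0])          # key point before the linear transformation
q = np.array([330.0, 250.0])             # matching 2D first face key point
Ki0, Ki, Ji0, Ji1 = reference_and_target_matrices(q, K, R, T, u0)

v0 = np.array([0.05, -0.1, 1.2])         # a candidate target point
reproj_diff = Ji0 @ v0 - Ji1             # equals K_i (R v0 + T)
```

The first assertion below checks the rewriting of the reprojection error, K_{i,0} v_1 = v_{1,z}(Π(v_1) − q); the second checks that J_{i,0} v_0 − J_{i,1} is indeed K_i(R v_0 + T).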
Step 1044: calculate the distance the target point moves along the Z axis.
In this embodiment, the distance the target point moves along the Z axis can be calculated and used as a constraint term to limit the movement of the mesh along the Z axis.
In implementation, the product of the Z-axis component R_2 of the rotation matrix and the target point v_{0,i} is calculated as the first intermediate value; the difference between the Z-axis component u_{1,i,z} of the linearly transformed second face key point and the Z-axis component T_2 of the translation vector is calculated as the second intermediate value; and the difference between the first intermediate value and the second intermediate value is the distance the target point moves along the Z axis, R_2 v_{0,i} − (u_{1,i,z} − T_2).
Step 1045: linearly fuse the vector difference, the reprojection difference and the distance into the objective function.
In this embodiment, the vector difference, the reprojection difference and the distance can be linearly fused and set as the objective function.
In implementation, a first weight is assigned to the reprojection difference and a second weight to the distance, and the sum of the vector difference, the weighted reprojection difference and the weighted distance is calculated as the objective function, expressed as follows:

min ‖W_1 V − (L_0 − W_2 U)‖ + α Σ_{i∈B} ‖J_{i,0} v_{0,i} − J_{i,1}‖ + β Σ_{i∈B} ‖R_2 v_{0,i} − (u_{1,i,z} − T_2)‖

α is the first weight, and β is the second weight.
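The linear fusion of step 1045 can be sketched as a single function (a minimal sketch with hypothetical weights; the inputs are assumed to come from the previous steps, and the keypoint quantities are passed per point for clarity):

```python
import numpy as np

def objective(V, W1, L0, W2, U, terms, R, T, alpha=1.0, beta=0.5):
    """Linear fusion of step 1045: vector difference plus weighted reprojection
    difference plus weighted Z-axis distance. `terms` is a list of
    (Ji0, Ji1, v0, u1z) tuples, one per key point."""
    lap = np.linalg.norm(W1 @ V - (L0 - W2 @ U))                       # vector difference
    reproj = sum(np.linalg.norm(Ji0 @ v0 - Ji1) for Ji0, Ji1, v0, _ in terms)
    zdist = sum(abs(R[2] @ v0 - (u1z - T[2])) for _, _, v0, u1z in terms)
    return lap + alpha * reproj + beta * zdist

# Sanity check: a perfectly satisfied system gives objective value 0.
W1, L0, W2, U = np.eye(3), np.zeros(3), np.zeros((3, 3)), np.zeros(3)
terms = [(np.zeros((2, 3)), np.zeros(2), np.zeros(3), 0.0)]
val = objective(np.zeros(3), W1, L0, W2, U, terms, np.eye(3), np.zeros(3))
```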
Step 1046: solve for the target points with the goal of minimizing the objective function.
In this embodiment, the optimization problem (7) can be approximated by the least-squares method, minimizing the objective function to solve for the coordinates of the target points. Let the array of vertices (target points) be V=(v_{0,n}, v_{0,n+1}, …, v_{0,m−1}); the solution process is then expressed as:

V = argmin ( ‖W_1 V − (L_0 − W_2 U)‖ + α Σ_{i∈B} ‖J_{i,0} v_{0,i} − J_{i,1}‖ + β Σ_{i∈B} ‖R_2 v_{0,i} − (u_{1,i,z} − T_2)‖ )
As shown in Figure 3A, when the second face key points of the eye are interpolated in this way, so that the reprojected third face key points coincide with the corresponding first face key points, both the frontal view of the right eye shown in Figure 3C and the top view of the right eye shown in Figure 3E linearly approximate the right eye correctly, without producing distortion.
In implementation, a first sparse matrix A and a second sparse matrix b can be constructed.
The first sparse matrix A comprises the first sub-Laplacian matrix, the products of the first weight α and the first target matrices J_{i,0}, and the products of the second weight β and the Z-axis component R_2 of the rotation matrix; A is expressed as the block-row stack:

A = [ W_1 ; α J_{i,0}, i∈B ; β R_2, i∈B ]
The second sparse matrix b comprises the difference obtained by subtracting from the first vector L_0 the product of the second sub-Laplacian matrix W_2 and the boundary vertices U, the products of the first weight α and the second target matrices J_{i,1}, and the products of the second weight β and the second intermediate values (u_{1,i,z} − T_2); b is expressed as the corresponding stack:

b = [ L_0 − W_2 U ; α J_{i,1}, i∈B ; β (u_{1,i,z} − T_2), i∈B ]
A target relationship is set; the target relationship is a linear equation in which the product of the first sparse matrix and the target points equals the second sparse matrix, i.e., AV = b (11).
The target points can then be solved based on the target relationship by a sparse solver such as SimplicialLDLT (the built-in direct solver provided by Eigen for direct LDLT decomposition) or the conjugate gradient method (ConjugateGradient).
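A minimal sketch of assembling the stacked system AV=b and solving it with a sparse solver. SciPy's sparse `spsolve` on the normal equations plays the role of Eigen's SimplicialLDLT here, and the toy block sizes are illustrative:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def solve_target_points(A, b):
    """Least-squares solve of the stacked system A V = b through the normal
    equations A^T A V = A^T b (the role a direct LDLT solver plays)."""
    AtA = (A.T @ A).tocsc()
    Atb = A.T @ b
    return spla.spsolve(AtA, Atb)

# Toy stacked system: Laplacian rows, reprojection rows, Z-distance row.
rng = np.random.default_rng(1)
W1_rows = sp.csr_matrix(rng.standard_normal((6, 3)))   # W_1 block
J_rows = sp.csr_matrix(rng.standard_normal((2, 3)))    # alpha * J_{i,0} block
z_row = sp.csr_matrix(rng.standard_normal((1, 3)))     # beta * R_2 block
A = sp.vstack([W1_rows, J_rows, z_row]).tocsr()

V_true = np.array([0.3, -0.7, 1.1])
b = A @ V_true                      # consistent right-hand side
V = solve_target_points(A, b)       # recovers V_true exactly in this case
```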
Considering that the interior vertices V_0=(u_{0,n}, u_{0,n+1}, …, u_{0,m−1}) of the target area obtained by tracking the face data are already close to the target points, the vertices V_0 inside the target area can be set as the initial values of the target points. The goal of the linear deformation in this embodiment is to make the reprojected third face key points coincide with the corresponding first face key points, i.e., Π(v_{1,i})=q_i, i∈B. In practice, it is sufficient on the face data to keep a certain error (e.g., within 1 pixel) between Π(v_{1,i}) and q_i. That is, an iterative algorithm can be used to solve the target relationship (11) with a relatively large threshold, iteratively updating the target points on the basis of the initial values until the difference between the third face key points and the first face key points corresponding to the second face key points in the target area is smaller than the preset threshold, thereby increasing the computation speed and reducing the computation time.
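The warm start with V_0 and a loose stopping threshold can be sketched with SciPy's conjugate-gradient solver on the normal equations (an illustrative stand-in for Eigen's ConjugateGradient; the toy system below is hypothetical):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def solve_with_warm_start(A, b, V0):
    """Conjugate-gradient solve of A^T A V = A^T b, warm-started from the
    tracked interior vertices V0 (scipy's default stopping tolerance is
    1e-5, matching the threshold used in the embodiment)."""
    AtA = (A.T @ A).tocsr()
    Atb = A.T @ b
    V, info = spla.cg(AtA, Atb, x0=V0)
    return V, info              # info == 0 signals convergence

rng = np.random.default_rng(2)
A = sp.csr_matrix(rng.standard_normal((9, 3)))
V_true = np.array([0.2, 0.5, -0.4])
b = A @ V_true
V0 = V_true + 0.01 * rng.standard_normal(3)   # tracked vertices near the optimum
V, info = solve_with_warm_start(A, b, V0)
```

Starting close to the solution lets the iteration terminate after very few steps, which is the source of the speed-up reported in the table below.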
Exemplarily, this embodiment uses ConjugateGradient for the solution, with the initial value V_0 and a threshold of 1e-5. As can be seen from the table below, this strategy takes half the time of the SimplicialLDLT algorithm, and there is no obvious difference in the results.
Figure PCTCN2022106211-appb-000024 (table comparing the time consumption of SimplicialLDLT and ConjugateGradient)
Tests show that performing Laplacian deformation on the two target areas of the left eye and the right eye in this embodiment takes 1 ms on a mobile phone, which guarantees real-time performance.
In this embodiment, two-dimensional first face data is acquired, the first face data having two-dimensional first face key points; three-dimensional second face data is fitted according to the first face key points, the second face data having three-dimensional second face key points; a local region of the second face data is selected as the target area; and the target area is linearly deformed so that, when the second face key points in the target area are perspective-projected to two-dimensional third face key points, the third face key points overlap the first face key points corresponding to the second face key points in the target area. The optimization problem arising when fitting the face data is thereby adjusted into a linear optimization problem, which is simpler to handle and has a lower computational cost; this greatly reduces the computation time and enables real-time processing on devices with relatively scarce resources.
It should be noted that, for simplicity of description, the method embodiments are all expressed as series of action combinations. However, those skilled in the art should know that the embodiments of the present application are not limited by the described order of actions, because according to the embodiments of the present application some steps may be performed in other orders or simultaneously. Those skilled in the art should also know that the embodiments described in the specification are all optional embodiments, and the actions involved are not necessarily required by the embodiments of the present application.
Embodiment Two
Figure 4 is a structural block diagram of an apparatus for interpolating face key points provided in Embodiment 2 of the present application. The apparatus may include the following modules.
The two-dimensional face data acquisition module 401 is configured to acquire two-dimensional first face data, the first face data having two-dimensional first face key points.
The three-dimensional face data fitting module 402 is configured to fit three-dimensional second face data according to the first face key points, the second face data having three-dimensional second face key points.
The target area selection module 403 is configured to select a local region of the second face data as the target area.
The target area deformation module 404 is configured to linearly deform the target area so that, when the second face key points in the target area are perspective-projected to two-dimensional third face key points, the third face key points overlap the first face key points corresponding to the second face key points in the target area.
In an embodiment of the present application, the second face data is represented in the form of a mesh, the mesh having a plurality of three-dimensional vertices, some of which are the second face key points.
The target area deformation module 404 includes: a vertex mapping module, configured to set a deformation of the target area that maps the vertices inside the target area to three-dimensional target points; a vector difference calculation module, configured to calculate the difference between a first vector and a second vector as the vector difference, the first vector being the vector converted from the vertices in the target area and the second vector being the vector converted from the target points; a reprojection difference calculation module, configured to calculate the difference between the first face key points corresponding to the second face key points in the target area and the third face key points as the reprojection difference, a third face key point being the two-dimensional face key point obtained by perspective projection of a target point; a movement distance calculation module, configured to calculate the distance the target point moves along the Z axis; a linear fusion module, configured to linearly fuse the vector difference, the reprojection difference and the distance into the objective function; and a target point solving module, configured to solve for the target points with the goal of minimizing the objective function.
In an embodiment of the present application, the vector difference calculation module is configured to: apply Laplacian deformation to the vertices to obtain the first vector; apply Laplacian deformation to the target points to obtain the second vector; and subtract the second vector from the first vector to obtain the vector difference.
In an embodiment of the present application, the reprojection difference calculation module is configured to: calculate the first reference matrix, the first reference matrix being the product of the identity matrix augmented with the first face key point corresponding to the second face key point in the target area and the parameters of the camera, the camera being configured to collect the first face data; calculate the second reference matrix, the second reference matrix being the ratio between the first reference matrix and the Z-axis component of the linearly transformed target point; calculate the first target matrix and the second target matrix, the first target matrix being the product of the second reference matrix and the rotation matrix, and the second target matrix being the negative of the product of the second reference matrix and the translation vector, both the rotation matrix and the translation vector being used for the linear transformation; and subtract the second target matrix from the product of the first target matrix and the target point to obtain the reprojection difference.
In an embodiment of the present application, the movement distance calculation module is configured to: calculate the product of the Z-axis component of the rotation matrix and the target point as the first intermediate value; calculate the difference between the Z-axis component of the linearly transformed second face key point in the target area and the Z-axis component of the translation vector as the second intermediate value; and calculate the difference between the first intermediate value and the second intermediate value as the distance the target point moves along the Z axis.
In an embodiment of the present application, the linear fusion module is configured to: assign a first weight to the reprojection difference; assign a second weight to the distance; and calculate the sum of the vector difference, the reprojection difference weighted by the first weight and the distance weighted by the second weight as the objective function.
In an embodiment of the present application, the target point solving module is configured to: construct a first sparse matrix A and a second sparse matrix b, the first sparse matrix comprising the first sub-Laplacian matrix, the products of the first weight and the first target matrices J_{i,0}, and the products of the second weight and the Z-axis component of the rotation matrix, where the first sub-Laplacian matrix is the matrix that remains after removing from the Laplacian matrix of all vertices the first n columns corresponding to the vertices on the boundary of the target area, and the second sparse matrix comprising the differences obtained by subtracting from the first vector the products of the second sub-Laplacian matrix and the vertices on the boundary of the target area, the products of the first weight and the second target matrices J_{i,1}, and the products of the second weight and the second intermediate values, where the second sub-Laplacian matrix is the matrix formed by the first n columns of the Laplacian matrix of all vertices, corresponding to the vertices on the boundary of the target area; set a target relationship, the target relationship being that the product of the first sparse matrix and the target points equals the second sparse matrix; and solve for the target points based on the target relationship.
In an embodiment of the present application, the target point solving module is configured to: set the vertices inside the target area as the initial values of the target points; and iteratively update the target points on the basis of the initial values until the difference between the third face key points and the first face key points corresponding to the third face key points is smaller than a preset threshold.
In an embodiment of the present application, the two-dimensional face data acquisition module 401 includes: a video data collection module, configured to invoke a camera to collect video data, the video data containing multiple frames of image data; and a face tracking module, configured to track the two-dimensional first face data in the multiple frames of image data.
The apparatus for interpolating face key points provided in the embodiments of the present application can execute the method for interpolating face key points provided in any embodiment of the present application, and has the functional modules corresponding to executing the method.
Embodiment Three
Figure 5 is a schematic structural diagram of a computer device provided in Embodiment 3 of the present application. Figure 5 shows a block diagram of an exemplary computer device 12 suitable for implementing the embodiments of the present application. The computer device 12 shown in Figure 5 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in Figure 5, the computer device 12 takes the form of a general-purpose computing device. The components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the various system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer device 12 includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the computer device 12, including volatile and non-volatile media, and removable and non-removable media.
The system memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The computer device 12 may also include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, the storage system 34 may be configured to read from and write to a non-removable, non-volatile magnetic medium (not shown in Figure 5, commonly called a "hard disk drive"). Although not shown in Figure 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disc drive for reading from and writing to a removable non-volatile optical disc (e.g., a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM) or other optical media) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data-media interfaces. The memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the embodiments of the present application.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described in the present application.
Computer device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any device (e.g., a network card, a modem, etc.) that enables computer device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. Computer device 12 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in FIG. 5, network adapter 20 communicates with the other modules of computer device 12 via bus 18. It should be understood that, although not shown in FIG. 5, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.
Processing unit 16 executes a variety of functional applications and data processing by running the programs stored in system memory 28, for example, implementing the method for interpolating face key points provided by the embodiments of the present application.
Embodiment 4
Embodiment 4 of the present application further provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the computer program implements the processes of the above method for interpolating face key points; to avoid repetition, details are not described here again.
The computer-readable storage medium may include, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a RAM, a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device.

Claims (11)

  1. A method for interpolating face key points, comprising:
    acquiring two-dimensional first face data, the first face data containing two-dimensional first face key points;
    fitting three-dimensional second face data according to the first face key points, the second face data having three-dimensional second face key points;
    selecting a local area in the second face data as a target area;
    performing linear deformation on the target area such that, when the second face key points in the target area are perspective-projected onto two-dimensional third face key points, the third face key points overlap the first face key points corresponding to the second face key points in the target area.
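The overlap condition in claim 1 can be checked with a standard pinhole projection: a deformed 3D key point is perspective-projected and compared with its 2D counterpart. A minimal sketch in NumPy; the intrinsics `K`, rotation `R`, and translation `t` are illustrative values, not taken from the patent:

```python
import numpy as np

def perspective_project(points_3d, K, R, t):
    """Project Nx3 points through a pinhole camera: uv = K(Rp + t) / z."""
    cam = points_3d @ R.T + t        # camera-space coordinates
    uvw = cam @ K.T                  # homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide by depth (Z)

K = np.array([[500.0, 0.0, 160.0],
              [0.0, 500.0, 120.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 5.0])
pts = np.array([[0.0, 0.0, 0.0],     # a deformed 3D key point
                [0.1, -0.2, 0.3]])
uv = perspective_project(pts, K, R, t)
# The deformation succeeds when uv overlaps the 2D first face key points.
```

The projected `uv` of the deformed target area plays the role of the "third face key points" in the claim.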
  2. The method according to claim 1, wherein the second face data is represented as a mesh, the mesh has a plurality of three-dimensional vertices, and some of the vertices are the second face key points;
    the performing linear deformation on the target area such that, when the second face key points in the target area are perspective-projected onto two-dimensional third face key points, the third face key points overlap the first face key points corresponding to the second face key points in the target area comprises:
    setting a deformation of the target area that maps the vertices inside the target area to three-dimensional target points;
    computing a difference between a first vector and a second vector as a vector difference, the first vector being a vector transformed from the vertices in the target area, and the second vector being a vector transformed from the target points;
    computing a difference between the first face key points corresponding to the second face key points in the target area and the third face key points as a reprojection difference, the third face key points being the two-dimensional face key points obtained by perspective-projecting the target points;
    computing a distance by which the target points move in the Z-axis direction;
    linearly fusing the vector difference, the reprojection difference, and the distance into an objective function;
    solving for the target points with the goal of minimizing the objective function.
  3. The method according to claim 2, wherein the computing a difference between a first vector and a second vector as a vector difference comprises:
    performing Laplacian deformation on the vertices in the target area to obtain the first vector;
    performing Laplacian deformation on the target points to obtain the second vector;
    subtracting the second vector from the first vector to obtain the vector difference.
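The vector difference in claim 3 compares Laplacian (differential) coordinates before and after deformation. A toy sketch with a uniform-weight graph Laplacian; the patent does not specify the weighting scheme, so umbrella weights are an assumption (cotangent weights would be a common alternative for meshes):

```python
import numpy as np

def graph_laplacian(adj):
    """Uniform-weight graph Laplacian L = D - A (assumed weighting)."""
    return np.diag(adj.sum(axis=1)) - adj

adj = np.array([[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]], dtype=float)     # a single triangle
verts = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])          # vertices in the target area
targets = verts + np.array([0.0, 0.0, 0.1])  # candidate target points

L = graph_laplacian(adj)
vector_difference = L @ verts - L @ targets  # first vector minus second vector
# A pure translation changes no local shape, so the difference is zero here.
```

Zero vector difference means the deformation preserved the local surface detail, which is exactly what the Laplacian term rewards.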
  4. The method according to claim 2, wherein the computing a difference between the first face key points corresponding to the second face key points in the target area and the third face key points as a reprojection difference comprises:
    computing a first reference matrix, the first reference matrix being the product of an identity matrix containing the first face key points corresponding to the second face key points in the target area and the parameters of a camera, the camera being configured to capture the first face data;
    computing a second reference matrix, the second reference matrix being the ratio of the first reference matrix to the Z-axis component of the linearly transformed target points;
    computing a first target matrix and a second target matrix, the first target matrix being the product of the second reference matrix and a rotation matrix, and the second target matrix being the negation of the product of the second reference matrix and a translation vector, where both the rotation matrix and the translation vector are used for the linear transformation;
    subtracting the second target matrix from the product of the first target matrix and the target points to obtain the reprojection difference.
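One plausible reading of claim 4 is the standard linearized reprojection residual K(Rp + t)/z compared with the observed 2D key point. The claim folds the observed key point into the first reference matrix; the sketch below subtracts it explicitly for clarity, which is an interpretive choice, and all numeric values are illustrative:

```python
import numpy as np

K = np.array([[500.0, 0.0, 160.0],
              [0.0, 500.0, 120.0],
              [0.0, 0.0, 1.0]])            # camera parameters (assumed)
R, t = np.eye(3), np.array([0.0, 0.0, 5.0])
p = np.array([0.0, 0.0, 0.0])             # candidate target point
x_obs = np.array([160.0, 120.0])          # corresponding 2D first key point

z = (R @ p + t)[2]      # Z-axis component of the linearly transformed point
M = K[:2] / z           # "second reference matrix": camera rows over depth
M1 = M @ R              # first target matrix
M2 = -(M @ t)           # second target matrix: negated product with t
reproj_diff = (M1 @ p - M2) - x_obs   # claim 4's expression vs. the 2D point
```

Dividing by the current depth `z` turns the perspective projection into a linear expression in `p`, which is what makes the sparse solve of claim 7 possible.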
  5. The method according to claim 2, wherein the computing a distance by which the target points move in the Z-axis direction comprises:
    computing the product of the Z-axis component of a rotation matrix and the target points as a first intermediate value;
    computing the difference between the Z-axis component of the linearly transformed second face key points in the target area and the Z-axis component of a translation vector as a second intermediate value;
    computing the difference between the first intermediate value and the second intermediate value as the distance by which the target points move in the Z-axis direction.
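After the translation cancels, claim 5's distance reduces to the gap between the rotated depth of the target point and that of the original key point. A sketch under assumed `R` and `t`:

```python
import numpy as np

R, t = np.eye(3), np.array([0.0, 0.0, 5.0])
v = np.array([0.1, 0.2, 0.3])    # original second face key point (3D)
p = np.array([0.1, 0.2, 0.45])   # candidate target point

first = R[2] @ p                 # Z row of the rotation matrix times the target
second = (R @ v + t)[2] - t[2]   # transformed depth minus the translation's Z
z_distance = first - second      # how far the point moved along camera Z
```

Penalizing this quantity keeps the deformation from sliding the face along the camera axis, where a perspective projection alone would not constrain it.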
  6. The method according to claim 2, wherein the linearly fusing the vector difference, the reprojection difference, and the distance into an objective function comprises:
    assigning a first weight to the reprojection difference;
    assigning a second weight to the distance;
    computing the sum of the vector difference, the reprojection difference weighted by the first weight, and the distance weighted by the second weight as the objective function.
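Claim 6's linear fusion is a weighted sum of the three terms. The squared-norm penalty below is an assumption — the claim fixes only the weighting, not the norm:

```python
import numpy as np

def objective(vector_diff, reproj_diff, z_dist, w1, w2):
    """E = ||vector_diff||^2 + w1 * ||reproj_diff||^2 + w2 * ||z_dist||^2."""
    return (np.sum(np.square(vector_diff))
            + w1 * np.sum(np.square(reproj_diff))
            + w2 * np.sum(np.square(z_dist)))

# Toy values: zero shape term, unit reprojection error, half-unit depth shift.
E = objective(np.zeros(3), np.array([1.0, -1.0]), np.array([0.5]),
              w1=2.0, w2=4.0)
```

Raising `w1` pulls the projection harder onto the detected 2D key points; raising `w2` anchors the depth, at the cost of shape fidelity from the Laplacian term.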
  7. The method according to any one of claims 2-6, wherein the solving for the target points with the goal of minimizing the objective function comprises:
    constructing a first sparse matrix and a second sparse matrix, wherein the first sparse matrix includes a first sub-Laplacian matrix, the product of the first weight and the first target matrix, and the product of the second weight and the Z-axis component of the rotation matrix, the first sub-Laplacian matrix being the matrix remaining after removing, from the Laplacian matrix of all vertices, the first n columns corresponding to the vertices on the boundary of the target area; and the second sparse matrix includes the difference obtained by subtracting, from the first vector, the product of a second sub-Laplacian matrix and the vertices on the boundary of the target area, the product of the first weight and the second target matrix, and the product of the second weight and the second intermediate value, the second sub-Laplacian matrix being the first n columns, corresponding to the vertices on the boundary of the target area, of the Laplacian matrix of all vertices;
    setting a target relationship, the target relationship being that the product of the first sparse matrix and the objective function equals the second sparse matrix;
    solving for the target points based on the target relationship.
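Claim 7 stacks the three terms into one sparse system A x = b (the "target relationship") and solves for the unknown target-point coordinates x in least squares. The tiny matrices below are placeholders for the sub-Laplacian, weighted-projection, and weighted-depth blocks, not the patent's actual assembly:

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack
from scipy.sparse.linalg import lsqr

L_free = csr_matrix([[2.0, -1.0],
                     [-1.0, 2.0]])     # first sub-Laplacian block (interior)
A_rep = csr_matrix([[1.5, 0.0]])       # first weight times a projection row
A_z = csr_matrix([[0.0, 0.5]])         # second weight times a rotation-Z row
A = vstack([L_free, A_rep, A_z]).tocsr()   # first sparse matrix
b = np.array([1.0, 0.0, 1.5, 0.25])        # stacked right-hand side

x = lsqr(A, b)[0]                   # least-squares solution for the unknowns
residual_grad = A.T @ (A @ x - b)   # normal-equation residual, ~0 at optimum
```

Sparse least squares keeps the solve tractable even when the target area contains thousands of vertices, since each Laplacian row touches only a vertex's neighbors.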
  8. The method according to claim 7, wherein the solving for the target points based on the target relationship comprises:
    setting the vertices inside the target area as initial values of the target points;
    iteratively updating the target points on the basis of the initial values until the difference between the third face key points and the first face key points corresponding to the second face key points in the target area is less than a preset threshold.
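Claim 8's loop — initialize the target point at the interior vertex, then update until the projection lands within a threshold of the 2D key point — might look like the sketch below. The Gauss-Newton-style step rule here is illustrative; the patent's actual per-iteration update is the sparse solve of claim 7:

```python
import numpy as np

K = np.array([[500.0, 0.0, 160.0],
              [0.0, 500.0, 120.0],
              [0.0, 0.0, 1.0]])           # assumed camera parameters
R, t = np.eye(3), np.array([0.0, 0.0, 5.0])
observed = np.array([170.0, 110.0])       # first (2D) face key point

def project(p):
    cam = R @ p + t
    return (K @ cam)[:2] / cam[2]

p = np.array([0.0, 0.0, 0.0])   # initial value: the interior vertex itself
threshold = 1e-4
for _ in range(50):
    err = project(p) - observed
    if np.linalg.norm(err) < threshold:   # claim 8's stopping criterion
        break
    depth = (R @ p + t)[2]
    p[:2] -= err * depth / K[0, 0]   # Gauss-Newton step in x, y (fx == fy)
final_error = np.linalg.norm(project(p) - observed)
```

Because the projection is re-linearized at each iterate, a few iterations normally suffice to drive the reprojection error below the threshold.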
  9. An apparatus for interpolating face key points, comprising:
    a two-dimensional face data acquisition module, configured to acquire two-dimensional first face data, the first face data containing two-dimensional first face key points;
    a three-dimensional face data fitting module, configured to fit three-dimensional second face data according to the first face key points, the second face data having three-dimensional second face key points;
    a target area selection module, configured to select a local area in the second face data as a target area;
    a target area deformation module, configured to perform linear deformation on the target area such that, when the second face key points in the target area are perspective-projected onto two-dimensional third face key points, the third face key points overlap the first face key points corresponding to the second face key points in the target area.
  10. A computer device, comprising:
    at least one processor; and
    a memory configured to store at least one program,
    wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the method for interpolating face key points according to any one of claims 1-8.
  11. A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the method for interpolating face key points according to any one of claims 1-8 is implemented.
PCT/CN2022/106211 2021-07-23 2022-07-18 Face key point interpolation method and apparatus, computer device, and storage medium WO2023001095A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110836320.0 2021-07-23
CN202110836320.0A CN113362231A (en) 2021-07-23 2021-07-23 Interpolation method and device for key points of human face, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023001095A1 true WO2023001095A1 (en) 2023-01-26

Family

ID=77540217

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/106211 WO2023001095A1 (en) 2021-07-23 2022-07-18 Face key point interpolation method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN113362231A (en)
WO (1) WO2023001095A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117593493A (en) * 2023-09-27 2024-02-23 书行科技(北京)有限公司 Three-dimensional face fitting method, three-dimensional face fitting device, electronic equipment and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113362231A (en) * 2021-07-23 2021-09-07 百果园技术(新加坡)有限公司 Interpolation method and device for key points of human face, computer equipment and storage medium
CN114037814B (en) * 2021-11-11 2022-12-23 北京百度网讯科技有限公司 Data processing method, device, electronic equipment and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764180A (en) * 2018-05-31 2018-11-06 Oppo广东移动通信有限公司 Face identification method, device, electronic equipment and readable storage medium
CN108765351A (en) * 2018-05-31 2018-11-06 Oppo广东移动通信有限公司 Image processing method, device, electronic equipment and storage medium
CN110533777A (en) * 2019-08-01 2019-12-03 北京达佳互联信息技术有限公司 Three-dimensional face images modification method, device, electronic equipment and storage medium
US20210089836A1 (en) * 2019-09-24 2021-03-25 Toyota Research Institute, Inc. Systems and methods for training a neural keypoint detection network
CN113362231A (en) * 2021-07-23 2021-09-07 百果园技术(新加坡)有限公司 Interpolation method and device for key points of human face, computer equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9881203B2 (en) * 2013-08-29 2018-01-30 Nec Corporation Image processing device, image processing method, and program
CN109146769A (en) * 2018-07-24 2019-01-04 北京市商汤科技开发有限公司 Image processing method and device, image processing equipment and storage medium
CN109685873B (en) * 2018-12-14 2023-09-05 广州市百果园信息技术有限公司 Face reconstruction method, device, equipment and storage medium


Also Published As

Publication number Publication date
CN113362231A (en) 2021-09-07

Similar Documents

Publication Publication Date Title
WO2023001095A1 (en) Face key point interpolation method and apparatus, computer device, and storage medium
CN111028330B (en) Three-dimensional expression base generation method, device, equipment and storage medium
Jeni et al. Dense 3D face alignment from 2D video for real-time use
Yu et al. Direct, dense, and deformable: Template-based non-rigid 3d reconstruction from rgb video
US11348314B2 (en) Fast and deep facial deformations
US9792479B2 (en) Geometry tracking
CN111161395B (en) Facial expression tracking method and device and electronic equipment
CN113327278B (en) Three-dimensional face reconstruction method, device, equipment and storage medium
EP3992919B1 (en) Three-dimensional facial model generation method and apparatus, device, and medium
US8854376B1 (en) Generating animation from actor performance
CN113313085B (en) Image processing method and device, electronic equipment and storage medium
JP6207210B2 (en) Information processing apparatus and method
WO2020207177A1 (en) Image augmentation and neural network training method and apparatus, device and storage medium
JP2013242757A (en) Image processing apparatus, image processing method, and computer program
CN111899159B (en) Method, device, apparatus and storage medium for changing hairstyle
Dinev et al. User‐guided lip correction for facial performance capture
CN112714337A (en) Video processing method and device, electronic equipment and storage medium
Kryvonos et al. Information technology for the analysis of mimic expressions of human emotional states
CN110008873B (en) Facial expression capturing method, system and equipment
US11734889B2 (en) Method of gaze estimation with 3D face reconstructing
CN113628322A (en) Image processing method, AR display live broadcast method, AR display equipment, AR display live broadcast equipment and storage medium
Jian et al. Realistic face animation generation from videos
Peng et al. Geometrical consistency modeling on b-spline parameter domain for 3d face reconstruction from limited number of wild images
US20230237753A1 (en) Dynamic facial hair capture of a subject
WO2023169023A1 (en) Expression model generation method and apparatus, device, and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22845260

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE