CN107122705B - Face key point detection method based on three-dimensional face model - Google Patents

Face key point detection method based on three-dimensional face model

Info

Publication number
CN107122705B
CN107122705B · Application CN201710159215.1A
Authority
CN
China
Prior art keywords
dimensional
parameter
face
human face
face model
Prior art date
Legal status
Active
Application number
CN201710159215.1A
Other languages
Chinese (zh)
Other versions
CN107122705A (en)
Inventor
朱翔昱
雷震
刘浩
李子青
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201710159215.1A
Publication of CN107122705A
Application granted
Publication of CN107122705B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G06V40/171: Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/64: Three-dimensional objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a face key point detection method based on a three-dimensional face model, comprising the following steps: step 01, acquiring a face image and initial parameters of a three-dimensional face model from a face training sample; step 02, generating a pose-adaptive feature and a normalized coordinate code from the face image and the initial parameters; step 03, respectively transforming and fusing the pose-adaptive feature and the normalized coordinate code with a convolutional neural network to obtain the parameter residual of the initial parameters, regressed toward the true residual; step 04, updating the initial parameters according to the parameter residual, and returning to step 02 until the parameter residual reaches a preset threshold; and step 05, updating the three-dimensional face model with the parameter residual that reaches the preset threshold, and collecting the face key points on the three-dimensional face model. The invention realizes face key point detection under full pose.

Description

Face key point detection method based on three-dimensional face model
Technical Field
The invention belongs to the technical field of image processing and pattern recognition, and particularly relates to a face key point detection method based on a three-dimensional face model.
Background
Face key points are a series of points with fixed semantics on the face, such as the eye corners, nose tip and mouth corners, and detecting them is an important preprocessing step in face-understanding computer vision. Most face analysis systems must first perform key point detection to accurately know the layout of the facial features, so that features can then be extracted at designated positions on the face. However, most existing key point detection methods can only handle faces in a medium pose or less, i.e., with a yaw angle below 45 degrees, and detecting key points on large-pose faces (where the yaw angle can reach 90 degrees) has always been a difficulty.
The challenges are mainly threefold. First, conventional key point detection algorithms assume that every key point has a stable appearance and can therefore be detected; under a large pose, however, some key points inevitably become invisible due to self-occlusion, their appearance information is occluded, and the conventional methods fail. Second, face appearance changes far more drastically under large poses, ranging from frontal to profile, which requires the localization algorithm to be robust enough to understand face appearance under different poses. Finally, on the training data side, calibrating key points on large-pose faces is difficult because the positions of invisible key points must be guessed; most faces in existing databases are in medium pose, the few databases containing large-pose faces label only the visible key points, and it is therefore hard to design a key point algorithm that handles arbitrary pose.
One possible solution in the prior art is to fit a three-dimensional face model directly from the image, generally using a cascaded convolutional neural network to transform the input image and regress the parameters of the three-dimensional face model. However, this technique has the following drawbacks. First, it expresses face rotation with Euler angles, which become ambiguous under large poses due to gimbal lock. Second, it uses only image-view input features, i.e., the raw image is fed directly into the convolutional neural network, whereas the intermediate results of the cascade could be used to progressively rectify the image and further improve fitting accuracy. Finally, it does not effectively model the priorities of the model parameters when training the convolutional neural network, so the fitting capacity of the network is dispersed over secondary parameters.
Disclosure of Invention
In order to solve the above problems in the prior art, the invention provides a face key point detection method based on a three-dimensional face model, realizing face key point detection under full pose.
The method comprises the following steps:
step 01, extracting a face image and initial parameters of a three-dimensional face model from a face training sample;
step 02, generating a pose-adaptive feature and a normalized coordinate code from the face image and the initial parameters;
step 03, respectively transforming and fusing the pose-adaptive feature and the normalized coordinate code with a convolutional neural network to obtain the parameter residual of the initial parameters, regressed toward the true residual;
step 04, updating the initial parameters according to the parameter residual, and returning to step 02 until the parameter residual reaches a preset threshold;
and step 05, updating the three-dimensional face model with the parameter residual that reaches the preset threshold, and collecting the face key points on the three-dimensional face model.
Preferably, in step 02, the three-dimensional face model is projected when the pose-adaptive feature is generated, and the projection formula includes:

V(p) = f * Pr * R * (S̄ + A_id*α_id + A_exp*α_exp) + t_2d

where V(p) is the function that constructs and projects the three-dimensional face model, yielding the two-dimensional image coordinates of each key point on the three-dimensional model; S̄ denotes the mean face shape; A_id denotes the principal component axes of a PCA extracted from three-dimensional faces with neutral expression, and α_id denotes the shape parameters; A_exp denotes the principal component axes of a PCA extracted from the differences between expressive and neutral faces, and α_exp denotes the expression parameters; f is a scale factor; Pr is the orthographic projection matrix; R is the rotation matrix, constructed from the quaternion [q0, q1, q2, q3]; and t_2d is the translation vector. The fitting target parameters are [f, R, t_2d, α_id, α_exp], and the fitting target parameter set is [f, q0, q1, q2, q3, t_2d, α_id, α_exp].
Preferably, the formula for constructing the rotation matrix from the quaternion [q0, q1, q2, q3] is:

R = [ q0²+q1²−q2²−q3²   2(q1q2−q0q3)      2(q1q3+q0q2)
      2(q1q2+q0q3)      q0²−q1²+q2²−q3²   2(q2q3−q0q1)
      2(q1q3−q0q2)      2(q2q3+q0q1)      q0²−q1²−q2²+q3² ]
preferably, the generating of the pose adaptive feature in the step 02 includes:
calculating the cylindrical coordinates of each vertex of the three-dimensional face model, and sampling n x n anchor points at equal intervals on an azimuth axis and a height axis; in the fitting process, the anchor points are deformed, scaled, rotated and translated by using the parameters of the current model to obtain the positions of the anchor points on the image, and the posture self-adaptive feature is generated.
Preferably, the generating of the normalized coordinate code in step 02 involves the following formulas:

PNCC(I, p) = I & ZBuffer(V_3d(p), NCC)

NCC_d = (S̄_d − min(S̄_d)) / (max(S̄_d) − min(S̄_d)),  d ∈ {x, y, z}

where PNCC is the projected normalized coordinate code, NCC is the normalized coordinate code, I is the input face image, p is the current parameter, & is the stacking operation along the channel dimension, ZBuffer is a function that renders a three-dimensional mesh with a texture to generate a two-dimensional image, and V_3d(p) is the three-dimensional face after scaling, rotation, translation and deformation; the image generated by the stacking is the projected normalized coordinate code.
Preferably, step 03 specifically comprises:
transforming the pose-adaptive feature and the normalized coordinate code respectively with two parallel convolutional neural networks, fusing the transformed features with an additional fully connected layer, and regressing the fusion result to obtain the parameter residual.
Preferably, the parameter residual in step 03 is calculated as:

Δp_k = Net_k(PAF(p_k, I), PNCC(p_k, I))

where p_k is the current parameter, I is the input image, Δp_k is the residual between the current parameter and the true parameter, PAF is the pose-adaptive feature, PNCC is the projected normalized coordinate code, and Net_k is the two-way parallel convolutional neural network.
Preferably, step 03 further comprises training the convolutional neural network, the true residual being weighted during training with the formulas:

E_owpdc = (Δp − (p_g − p_0))^T diag(w*) (Δp − (p_g − p_0))

w* = argmin_{0 ≤ w ≤ 1} ||V(p_c + diag(w)*(p_g − p_c)) − V(p_g)||² + λ*||diag(w)*(p_g − p_c)||²

where p_c = p_0 + Δp, 0 ≤ w ≤ 1, w is the parameter weight vector, Δp is the output of the convolutional neural network, p_g is the ground-truth parameter, p_0 is the input parameter of the current iteration, p_c is the current parameter, V(p) is the deformation and weak perspective projection function, and diag denotes the construction of a diagonal matrix.
Preferably, in step 04, updating the initial parameters according to the parameter residual specifically means adding the parameter residual to the initial parameters.
Compared with the prior art, the invention has at least the following advantages:
the human face key point detection method based on the three-dimensional human face model realizes the human face key point detection under the full posture.
Drawings
FIG. 1 is a schematic flow chart of the face key point detection method based on a three-dimensional face model according to the present invention;
FIG. 2 is a schematic diagram of the processing flow of the two-way parallel convolutional neural network provided by the present invention.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
The invention discloses a face key point detection method based on a three-dimensional face model which, as shown in FIG. 1, comprises the following steps:
and 00, constructing a three-dimensional variable human face model.
Obtaining a three-dimensional face point cloud sample through a three-dimensional scanner, and constructing a three-dimensional variable model by using Principal Component Analysis (PCA):
Figure GDA0002234280830000041
wherein S represents a three-dimensional face of a person,
Figure GDA0002234280830000042
representing the average shape of a human face, Aidprincipal component axis, alpha, of PCA extracted from three-dimensional face with neutral expressionidDenotes a shape parameter, Aexprepresenting the principal component axis, α, of PCA extracted from the difference between expressive and neutral facesexpRepresenting an expression parameter.
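For illustration, the construction of S can be sketched in a few lines of NumPy (a minimal sketch under assumed array shapes; the function name construct_face is illustrative, not from the patent):

    import numpy as np

    def construct_face(S_mean, A_id, alpha_id, A_exp, alpha_exp):
        # S = S_mean + A_id * alpha_id + A_exp * alpha_exp, with the face
        # stored as a flattened (3N,) vector of N vertices.
        S = S_mean + A_id @ alpha_id + A_exp @ alpha_exp
        return S.reshape(-1, 3)  # one (x, y, z) row per vertex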
After the three-dimensional face model is constructed, it is projected onto the image plane with a weak perspective projection:

V(p) = f * Pr * R * (S̄ + A_id*α_id + A_exp*α_exp) + t_2d

where V(p) is the function that constructs and projects the face model, yielding the two-dimensional image coordinates of each point on the three-dimensional model; f is a scale factor, Pr is the orthographic projection matrix, R is the rotation matrix, and t_2d is the translation vector. The fitting target parameters are then [f, R, t_2d, α_id, α_exp].
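A weak perspective projection of this form can be sketched in NumPy as follows (a minimal sketch; the array shapes and the name project are assumptions):

    import numpy as np

    def project(S, f, R, t2d):
        # V(p) = f * Pr * R * S + t_2d, where the orthographic projection
        # Pr = [[1, 0, 0], [0, 1, 0]] simply drops the z coordinate.
        Pr = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
        return f * (S @ R.T) @ Pr.T + t2d  # (N, 2) image coordinates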
Traditionally, the pose of a face is expressed with Euler angles, comprising pitch, yaw and roll. However, when the yaw angle approaches 90°, i.e., the pose approaches profile, gimbal lock makes the Euler angles ambiguous: two different sets of Euler angles may correspond to the same rotation matrix. Therefore the quaternion [q0, q1, q2, q3] is adopted to represent the rotation matrix, and the scale factor f is integrated into this matrix; the resulting model parameter set is:
[f, q0, q1, q2, q3, t_2d, α_id, α_exp].
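For illustration, the standard homogeneous quaternion-to-rotation formula can be written as follows (a sketch; folding the scale factor into the squared quaternion norm is an assumption consistent with the text, not necessarily the patent's exact construction):

    import numpy as np

    def quat_to_rotation(q0, q1, q2, q3):
        # Homogeneous quaternion-to-rotation formula; for a non-unit
        # quaternion the matrix is scaled by |q|^2, one way of folding
        # the scale factor into the matrix.
        return np.array([
            [q0*q0 + q1*q1 - q2*q2 - q3*q3, 2*(q1*q2 - q0*q3),             2*(q1*q3 + q0*q2)],
            [2*(q1*q2 + q0*q3),             q0*q0 - q1*q1 + q2*q2 - q3*q3, 2*(q2*q3 - q0*q1)],
            [2*(q1*q3 - q0*q2),             2*(q2*q3 + q0*q1),             q0*q0 - q1*q1 - q2*q2 + q3*q3],
        ])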
a three-dimensional variable face model is used as a fitting target. Manually calibrating the face key points to serve as basic training samples (or using a public face key point data set as a basic training sample), and performing out-of-plane rotation on the face by using a face sidedness technology on the basis to generate a face training sample set with a larger variable angle and richness.
Step 01, extracting the face image and the initial parameters.
Step 02, generating the pose-adaptive feature and the normalized coordinate code.
The following describes the convolutional neural network based three-dimensional face model fitting algorithm, i.e., how to estimate the pose, shape and expression parameters of a face with a convolutional neural network. Two input features are designed for the convolutional neural network: the pose-adaptive feature and the projected normalized coordinate code.
First, a Pose Adaptive Feature (PAF) is explained.
In a convolutional neural network, a conventional convolution layer convolves pixel by pixel along the two-dimensional image axes, whereas the PAF performs convolution at fixed semantic locations of the face. These locations are obtained as follows: considering that a face can be roughly approximated by a cylinder, the two-dimensional cylindrical coordinates of each vertex of the three-dimensional face model are computed, and n×n anchor points are sampled at equal intervals along the azimuth and height axes. In the fitting process, given the current model parameter p, the three-dimensional face model is projected and the positions of the anchor points on the two-dimensional image are obtained; these serve as the positions where the PAF convolution is performed. The convolution responses at the anchor points form an n×n feature map, on which conventional convolution operations can then be performed. To reduce the influence of features in occluded regions, the responses in occluded regions are divided by 2 when generating the pose-adaptive feature.
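A NumPy sketch of the anchor sampling follows (the axis conventions and the name sample_anchors are assumptions; each grid cell is mapped to its nearest mesh vertex):

    import numpy as np

    def sample_anchors(vertices, n=8):
        # Approximate the face by a cylinder: azimuth around the vertical
        # axis, height along it (axis conventions are an assumption).
        azimuth = np.arctan2(vertices[:, 0], vertices[:, 2])
        height = vertices[:, 1]
        az_grid = np.linspace(azimuth.min(), azimuth.max(), n)
        h_grid = np.linspace(height.min(), height.max(), n)
        anchors = []
        for h in h_grid:
            for az in az_grid:
                # nearest vertex to each (azimuth, height) grid cell
                d = ((azimuth - az) / np.ptp(azimuth)) ** 2 \
                    + ((height - h) / np.ptp(height)) ** 2
                anchors.append(int(np.argmin(d)))
        return np.array(anchors).reshape(n, n)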
The Projected Normalized Coordinate Code (PNCC) is described below. This input feature relies on a new coordinate encoding: first, the three-dimensional mean face is normalized to [0, 1] in three-dimensional space:

NCC_d = (S̄_d − min(S̄_d)) / (max(S̄_d) − min(S̄_d)),  d ∈ {x, y, z}
after normalization, the points on the three-dimensional model are uniquely distributed on [0,0,0] to [1,1,1], so that the three-dimensional model can be regarded as a three-dimensional coordinate code which is called normalized coordinate code. Unlike commonly used numbering (e.g., 0,1, …, n), normalized coordinate encodings are continuous in three-dimensional space. In the fitting process, given the current model parameters p, we use the ZBuffer algorithm to render the projected three-dimensional face with normalized coordinate encoding:
PNCC(I, p) = I & ZBuffer(V_3d(p), NCC)

NCC_d = (S̄_d − min(S̄_d)) / (max(S̄_d) − min(S̄_d)),  d ∈ {x, y, z}

where PNCC is the projected normalized coordinate code, NCC is the normalized coordinate code, I is the input face image, p is the current parameter, & is the stacking operation along the channel dimension, ZBuffer is a function that renders a three-dimensional mesh with a texture to generate a two-dimensional image, and V_3d(p) is the three-dimensional face after scaling, rotation, translation and deformation. The image generated by the stacking is the projected normalized coordinate code, which is fed into the convolutional neural network.
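A NumPy sketch of the NCC and the channel-stacking step is given below (the function names are assumptions; the z-buffer rendering itself, which needs a rasterizer, is left abstract):

    import numpy as np

    def normalized_coordinate_code(S_mean):
        # Min-max normalize the mean face to [0, 1] independently per axis.
        mn, mx = S_mean.min(axis=0), S_mean.max(axis=0)
        return (S_mean - mn) / (mx - mn)   # (N, 3), usable as RGB vertex colors

    def stack_pncc(image, rendered_ncc):
        # The '&' operation: stack the input image and the z-buffer rendering
        # of the NCC-colored face along the channel dimension.
        return np.concatenate([image, rendered_ncc], axis=2)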
The two generated features are complementary. The projected normalized coordinate code is an image-view feature, characterized by feeding the raw image directly into the convolutional neural network; the pose-adaptive feature is a model-view feature, characterized by rectifying the image with intermediate fitting results. The projected normalized coordinate code contains the whole face image, so its image context is richer; it suits face localization and coarse fitting, and matters most in the first few iterations. The pose-adaptive feature, because its convolution operates at the anchor points, in effect localizes and rectifies the face in the image with the current model parameters, progressively simplifying the fitting task; it suits fitting the details, and matters most in the final few iterations.
Step 03, transforming and fusing to obtain the parameter residual.
As can be seen, the two features are complementary. To make full use of their respective advantages, a two-way parallel convolutional neural network structure is used for K iterations. In the k-th iteration, given the initial parameter p_k, the pose-adaptive feature and the projected normalized coordinate code are generated from p_k, and the two-way parallel convolutional neural network shown in FIG. 2 is trained. The pose-adaptive feature branch comprises a pose-adaptive convolution layer, three ordinary convolution layers, three pooling layers and a fully connected layer; the projected normalized coordinate code branch comprises five convolution layers, four pooling layers and a fully connected layer. The network transforms the two features with the two parallel streams and fuses them with a fully connected layer. The fused features are used to regress the residual between the current parameters and the target parameters:
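The following PyTorch sketch illustrates the two-stream layout just described (the channel counts, layer sizes and the class name TwoStreamNet are illustrative assumptions; only the overall structure, two parallel streams fused by a fully connected layer that regresses Δp, follows the text):

    import torch
    import torch.nn as nn

    class TwoStreamNet(nn.Module):
        # One stream per input feature; a fully connected layer fuses the
        # two streams and regresses the parameter residual Δp.
        def __init__(self, n_params):
            super().__init__()
            def stream(in_ch):
                return nn.Sequential(
                    nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.AdaptiveAvgPool2d(4), nn.Flatten(),
                    nn.Linear(64 * 4 * 4, 256), nn.ReLU())
            self.paf_stream = stream(in_ch=3)    # pose-adaptive feature map
            self.pncc_stream = stream(in_ch=6)   # image (3) stacked with rendered NCC (3)
            self.fuse = nn.Linear(256 + 256, n_params)

        def forward(self, paf, pncc):
            return self.fuse(torch.cat(
                [self.paf_stream(paf), self.pncc_stream(pncc)], dim=1))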
Δp_k = Net_k(PAF(p_k, I), PNCC(p_k, I))

where p_k is the current parameter, I is the input image, Δp_k is the residual between the current parameter and the true parameter, PAF is the pose-adaptive feature, PNCC is the projected normalized coordinate code, and Net_k is the two-way parallel convolutional neural network.
The basic idea for training the convolutional neural network is to make the regressed parameter residual close to the true parameter residual. However, the parameters of the face model differ in importance: a few parameters (such as the pose) matter far more than the rest, so the loss on each parameter needs to be weighted during training. In conventional algorithms the weights are mutually independent and are usually determined manually or from the loss caused by misestimating a given parameter. In reality, the weights of the parameters are correlated; for example, before the pose parameters are accurate enough, estimating the expression parameters is meaningless. The invention obtains the weights of all parameters jointly by optimizing an energy function, and designs the following Optimized Weighted Parameter Distance Cost (OWPDC):

E_owpdc = (Δp − (p_g − p_0))^T diag(w*) (Δp − (p_g − p_0))

w* = argmin_{0 ≤ w ≤ 1} ||V(p_c + diag(w)*(p_g − p_c)) − V(p_g)||² + λ*||diag(w)*(p_g − p_c)||²

where p_c = p_0 + Δp, 0 ≤ w ≤ 1, w is the parameter weight vector, Δp is the output of the convolutional neural network, p_g is the ground-truth parameter, p_0 is the input parameter of the current iteration, p_c is the current parameter, V(p) is the deformation and weak perspective projection function, and diag denotes the construction of a diagonal matrix.
As the formula shows, the weighted true residual diag(w)*(p_g − p_c) is added to the current parameter p_c, with the expectation that the three-dimensional face constructed from the updated parameters comes closer to the real face V(p_g). Meanwhile, since the fitting capacity of the neural network is limited, λ*||diag(w)*(p_g − p_c)||² models the pressure that fitting the current parameters exerts on the network; adding it as a loss term encourages the network to assign weight to the most cost-effective parameters.
During training, finding the optimal w for each sample in this way is too expensive, so V(p_c + diag(w)*(p_g − p_c)) is expanded with a Taylor series at p_g, which gives:

||V′(p_g) * diag(w − 1) * Δp_c||² + λ*||diag(w) * Δp_c||²

where Δp_c = p_g − p_c and V′(p_g) is the Jacobian of V(p) at p_g. Expanding this expression and removing the constant terms gives:

w^T (diag(Δp_c) V′(p_g)^T V′(p_g) diag(Δp_c)) w − 2*1^T (diag(Δp_c) V′(p_g)^T V′(p_g) diag(Δp_c)) w + λ*w^T diag(Δp_c .* Δp_c) w

Letting H = V′(p_g) * diag(Δp_c), the original optimization problem can be written as:

min_w  w^T (H^T H + λ*diag(Δp_c .* Δp_c)) w − 2*1^T H^T H w,  subject to 0 ≤ w ≤ 1
the above formula is a standard quadratic programming problem, which can be solved quickly by interior point method. However, the computation of H in this loss function is very time consuming, recalculating H while training each sample makes training time unacceptable. The experiment shows that the only non-constant term of H is V' (p)g) For each training sample V' (p)g) Is stationary. Thus, before training, V' (p) of each sample can be comparedg) Calculated and stored, and directly read during training. The weight value obtained, i.e., the weight lost by each parameter in the OWPDC, may describe the priority of each parameter.
Step 04, updating the initial parameters according to the parameter residual.
The input parameter and the parameter residual are added to obtain a better parameter, p_{k+1} = p_k + Δp_k, and the next iteration, comprising input feature construction and parameter estimation by the convolutional neural network, is performed. After the K iterations reach the preset threshold, V(p_K) yields the position of each point of the three-dimensional face on the image.
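The iteration can be summarized in a short sketch (schematic; build_paf and build_pncc are hypothetical stand-ins for the feature constructors sketched above, and K is the number of cascade stages):

    def fit_parameters(image, p0, networks, K=3):
        # Cascaded fitting: each iteration rebuilds the two input features
        # from the current parameters and adds the regressed residual,
        # p_{k+1} = p_k + Δp_k.
        p = p0
        for net in networks[:K]:
            delta_p = net(build_paf(image, p), build_pncc(image, p))
            p = p + delta_p
        return p  # feed into V(p) to read off keypoint positions on the image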
Step 05, collecting the face key points on the three-dimensional face model.
Because existing face key point training samples usually lie within the medium pose range, the invention generates large-pose training samples by rotating existing samples out of plane, specifically as follows:
Given a training sample comprising a face image and manually calibrated key points, a three-dimensional model of the face in the image can be obtained by key-point-based three-dimensional face model fitting. Some anchor points are then sampled uniformly over the background region, and the depth of each anchor point is estimated from the point on the three-dimensional face model closest to it. After the depths of all anchor points are obtained, triangulation is used to group the anchor points into a series of triangular patches. These patches, together with the fitted three-dimensional face, constitute the depth information of the image. This virtual depth image can be rotated out of plane in three-dimensional space and rendered at any angle, generating the appearance of the face in the image under different poses. Using 5 degrees of yaw as the step size, a series of virtual samples is generated by progressively enlarging the angle until it reaches 90 degrees.
The method overcomes the inability of conventional key point detection algorithms to locate self-occluded key points: it fits the three-dimensional face model directly from the image and samples the key points from the fitted three-dimensional face. During face fitting, in addition to the projected normalized coordinate code, an image-view feature, a dedicated model-view feature, the pose-adaptive feature, is designed; this feature uses intermediate fitting results to implicitly rectify the face in the image, simplifying the fitting task step by step and further improving fitting accuracy. Because the image-view and model-view features are complementary, the two input features are transformed and fused simultaneously by a two-way parallel convolutional neural network to combine their advantages, and the fused features are finally used for model parameter regression. When training the convolutional neural network, the priorities of the face model parameters are taken into account, so that fitting concentrates on the few important parameters and the fitting accuracy is further improved. Finally, the invention realizes face key point detection under full pose.
So far, the technical solutions of the present invention have been described with reference to the preferred embodiments shown in the drawings, but those skilled in the art will readily understand that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of the related technical features can be made without departing from the principle of the invention, and the technical solutions after such changes or substitutions will fall within the protection scope of the invention.

Claims (8)

1. A face key point detection method based on a three-dimensional face model is characterized by comprising the following steps:
step 01, acquiring a face image and initial parameters of a three-dimensional face model from a face training sample;
step 02, generating a pose-adaptive feature and a normalized coordinate code from the face image and the initial parameters;
step 03, respectively transforming and fusing the pose-adaptive feature and the normalized coordinate code with a convolutional neural network to obtain the parameter residual of the initial parameters, regressed toward the true residual;
step 04, updating the initial parameters according to the parameter residual, and returning to step 02 until the parameter residual reaches a preset threshold;
step 05, updating the three-dimensional face model with the parameter residual that reaches the preset threshold, and collecting the face key points on the three-dimensional face model;
the step 03 specifically includes:
and transforming the attitude self-adaptive feature and the normalized coordinate code respectively according to two parallel convolutional neural networks, fusing the transformed features by using an additional full-connection layer, and regressing a fusion result to obtain a parameter residual error.
2. The face key point detection method based on a three-dimensional face model according to claim 1, wherein in step 02 the three-dimensional face model is projected when the pose-adaptive feature is generated, and the projection formula includes:

V(p) = f * Pr * R * (S̄ + A_id*α_id + A_exp*α_exp) + t_2d

where V(p) is the function that constructs and projects the three-dimensional face model, yielding the two-dimensional image coordinates of each key point on the three-dimensional model; S̄ denotes the mean face shape; A_id denotes the principal component axes of a PCA extracted from three-dimensional faces with neutral expression, and α_id denotes the shape parameters; A_exp denotes the principal component axes of a PCA extracted from the differences between expressive and neutral faces, and α_exp denotes the expression parameters; f is a scale factor; Pr is the orthographic projection matrix; R is the rotation matrix, constructed from the quaternion [q0, q1, q2, q3]; and t_2d is the translation vector. The fitting target parameters are [f, R, t_2d, α_id, α_exp], and the fitting target parameter set is [f, q0, q1, q2, q3, t_2d, α_id, α_exp].
3. The face key point detection method based on a three-dimensional face model according to claim 2, wherein the formula for constructing the rotation matrix from the quaternion [q0, q1, q2, q3] is:

R = [ q0²+q1²−q2²−q3²   2(q1q2−q0q3)      2(q1q3+q0q2)
      2(q1q2+q0q3)      q0²−q1²+q2²−q3²   2(q2q3−q0q1)
      2(q1q3−q0q2)      2(q2q3+q0q1)      q0²−q1²−q2²+q3² ]
4. the method for detecting the key points of the human face based on the three-dimensional human face model according to claim 3, wherein the generating of the pose adaptive feature in the step 02 comprises:
calculating the cylindrical coordinates of each vertex of the three-dimensional face model, and sampling n x n anchor points at equal intervals on an azimuth axis and a height axis; in the fitting process, the anchor points are deformed, scaled, rotated and translated by using the parameters of the current model to obtain the positions of the anchor points on the image, and the posture self-adaptive feature is generated.
5. The face key point detection method based on a three-dimensional face model according to claim 3, wherein the generating of the normalized coordinate code in step 02 involves the following formulas:

PNCC(I, p) = I & ZBuffer(V_3d(p), NCC)

NCC_d = (S̄_d − min(S̄_d)) / (max(S̄_d) − min(S̄_d)),  d ∈ {x, y, z}

where PNCC is the projected normalized coordinate code, NCC is the normalized coordinate code, I is the input face image, p is the current parameter, & is the stacking operation along the channel dimension, ZBuffer is a function that renders a three-dimensional mesh with a texture to generate a two-dimensional image, and V_3d(p) is the three-dimensional face after scaling, rotation, translation and deformation; the image generated by the stacking is the projected normalized coordinate code.
6. The face key point detection method based on a three-dimensional face model according to claim 1, wherein the parameter residual in step 03 is calculated as:

Δp_k = Net_k(PAF(p_k, I), PNCC(p_k, I))

where p_k is the current parameter, I is the input image, Δp_k is the residual between the current parameter and the true parameter, PAF is the pose-adaptive feature, PNCC is the projected normalized coordinate code, and Net_k is the two-way parallel convolutional neural network.
7. The face key point detection method based on a three-dimensional face model according to any one of claims 1 to 6, wherein step 03 further comprises training the convolutional neural network, the true residual being weighted during training with the formulas:

E_owpdc = (Δp − (p_g − p_0))^T diag(w*) (Δp − (p_g − p_0))

w* = argmin_{0 ≤ w ≤ 1} ||V(p_c + diag(w)*(p_g − p_c)) − V(p_g)||² + λ*||diag(w)*(p_g − p_c)||²

where p_c = p_0 + Δp, 0 ≤ w ≤ 1, w is the parameter weight vector, Δp is the output of the convolutional neural network, p_g is the ground-truth parameter, p_0 is the input parameter of the current iteration, p_c is the current parameter, V(p) is the deformation and weak perspective projection function, and diag denotes the construction of a diagonal matrix.
8. The face key point detection method based on a three-dimensional face model according to any one of claims 1 to 6, wherein updating the initial parameters according to the parameter residual in step 04 means adding the parameter residual to the initial parameters.
CN201710159215.1A 2017-03-17 2017-03-17 Face key point detection method based on three-dimensional face model Active CN107122705B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710159215.1A CN107122705B (en) 2017-03-17 2017-03-17 Face key point detection method based on three-dimensional face model


Publications (2)

Publication Number Publication Date
CN107122705A CN107122705A (en) 2017-09-01
CN107122705B true CN107122705B (en) 2020-05-19

Family

ID=59717971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710159215.1A Active CN107122705B (en) 2017-03-17 2017-03-17 Face key point detection method based on three-dimensional face model

Country Status (1)

Country Link
CN (1) CN107122705B (en)

Families Citing this family (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729904A (en) * 2017-10-09 2018-02-23 广东工业大学 A kind of face pore matching process based on the limitation of 3 D deformation face
CN109726613B (en) * 2017-10-27 2021-09-10 虹软科技股份有限公司 Method and device for detection
CN107944367B (en) * 2017-11-16 2021-06-01 北京小米移动软件有限公司 Face key point detection method and device
CN107967454B (en) * 2017-11-24 2021-10-15 武汉理工大学 Double-path convolution neural network remote sensing classification method considering spatial neighborhood relationship
CN108229313B (en) * 2017-11-28 2021-04-16 北京市商汤科技开发有限公司 Face recognition method and apparatus, electronic device, computer program, and storage medium
CN113688737A (en) * 2017-12-15 2021-11-23 北京市商汤科技开发有限公司 Face image processing method, face image processing device, electronic apparatus, storage medium, and program
CN108320274A (en) * 2018-01-26 2018-07-24 东华大学 It is a kind of to recycle the infrared video colorization method for generating confrontation network based on binary channels
CN108876894B (en) * 2018-02-01 2022-07-15 北京旷视科技有限公司 Three-dimensional human face model and three-dimensional human head model generation method and generation device
CN108764048B (en) * 2018-04-28 2021-03-16 中国科学院自动化研究所 Face key point detection method and device
CN108898556A (en) * 2018-05-24 2018-11-27 麒麟合盛网络技术股份有限公司 A kind of image processing method and device of three-dimensional face
CN111819568A (en) * 2018-06-01 2020-10-23 华为技术有限公司 Method and device for generating face rotation image
CN109035338B (en) * 2018-07-16 2020-11-10 深圳辰视智能科技有限公司 Point cloud and picture fusion method, device and equipment based on single-scale features
CN109299643B (en) * 2018-07-17 2020-04-14 深圳职业技术学院 Face recognition method and system based on large-posture alignment
CN109087379B (en) * 2018-08-09 2020-01-17 北京华捷艾米科技有限公司 Facial expression migration method and facial expression migration device
CN109191584B (en) 2018-08-16 2020-09-18 Oppo广东移动通信有限公司 Three-dimensional model processing method and device, electronic equipment and readable storage medium
WO2020037676A1 (en) * 2018-08-24 2020-02-27 太平洋未来科技(深圳)有限公司 Three-dimensional face image generation method and apparatus, and electronic device
WO2020041934A1 (en) * 2018-08-27 2020-03-05 华为技术有限公司 Data processing device and data processing method
CN109300114A (en) * 2018-08-30 2019-02-01 西南交通大学 The minimum target components of high iron catenary support device hold out against missing detection method
CN109448007B (en) * 2018-11-02 2020-10-09 北京迈格威科技有限公司 Image processing method, image processing apparatus, and storage medium
CN109726692A (en) * 2018-12-29 2019-05-07 重庆集诚汽车电子有限责任公司 High-definition camera 3D object detection system based on deep learning
CN109902616B (en) * 2019-02-25 2020-12-01 清华大学 Human face three-dimensional feature point detection method and system based on deep learning
CN109934196A (en) * 2019-03-21 2019-06-25 厦门美图之家科技有限公司 Human face posture parameter evaluation method, apparatus, electronic equipment and readable storage medium storing program for executing
CN110136243B (en) * 2019-04-09 2023-03-17 五邑大学 Three-dimensional face reconstruction method, system, device and storage medium thereof
CN110059602B (en) * 2019-04-10 2022-03-15 武汉大学 Forward projection feature transformation-based overlook human face correction method
CN110008873B (en) * 2019-04-25 2021-06-22 北京华捷艾米科技有限公司 Facial expression capturing method, system and equipment
CN112215050A (en) * 2019-06-24 2021-01-12 北京眼神智能科技有限公司 Nonlinear 3DMM face reconstruction and posture normalization method, device, medium and equipment
CN110298319B (en) * 2019-07-01 2021-10-08 北京字节跳动网络技术有限公司 Image synthesis method and device
CN110348406B (en) * 2019-07-15 2021-11-02 广州图普网络科技有限公司 Parameter estimation method and device
CN110705355A (en) * 2019-08-30 2020-01-17 中国科学院自动化研究所南京人工智能芯片创新研究院 Face pose estimation method based on key point constraint
CN110866962B (en) * 2019-11-20 2023-06-16 成都威爱新经济技术研究院有限公司 Virtual portrait and expression synchronization method based on convolutional neural network
CN111062266B (en) * 2019-11-28 2022-07-15 东华理工大学 Face point cloud key point positioning method based on cylindrical coordinates
CN111145166B (en) * 2019-12-31 2023-09-01 北京深测科技有限公司 Security monitoring method and system
CN111222469B (en) * 2020-01-09 2022-02-15 浙江工业大学 Coarse-to-fine human face posture quantitative estimation method
CN111401157A (en) * 2020-03-02 2020-07-10 中国电子科技集团公司第五十二研究所 Face recognition method and system based on three-dimensional features
CN111489435B (en) * 2020-03-31 2022-12-27 天津大学 Self-adaptive three-dimensional face reconstruction method based on single image
CN111898552B (en) * 2020-07-31 2022-12-27 成都新潮传媒集团有限公司 Method and device for distinguishing person attention target object and computer equipment
CN112002014B (en) * 2020-08-31 2023-12-15 中国科学院自动化研究所 Fine structure-oriented three-dimensional face reconstruction method, system and device
CN112307899A (en) * 2020-09-27 2021-02-02 中国科学院宁波材料技术与工程研究所 Facial posture detection and correction method and system based on deep learning
CN112287820A (en) * 2020-10-28 2021-01-29 广州虎牙科技有限公司 Face detection neural network, face detection neural network training method, face detection method and storage medium
CN112800882A (en) * 2021-01-15 2021-05-14 南京航空航天大学 Mask face posture classification method based on weighted double-flow residual error network
CN113643366B (en) * 2021-07-12 2024-03-05 中国科学院自动化研究所 Multi-view three-dimensional object attitude estimation method and device
CN113838134B (en) * 2021-09-26 2024-03-12 广州博冠信息科技有限公司 Image key point detection method, device, terminal and storage medium
CN113870227B (en) * 2021-09-29 2022-12-23 赛诺威盛科技(北京)股份有限公司 Medical positioning method and device based on pressure distribution, electronic equipment and storage medium
CN114125273B (en) * 2021-11-05 2023-04-07 维沃移动通信有限公司 Face focusing method and device and electronic equipment
CN114360031B (en) * 2022-03-15 2022-06-21 南京甄视智能科技有限公司 Head pose estimation method, computer device, and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102081733A (en) * 2011-01-13 2011-06-01 西北工业大学 Multi-modal information combined pose-varied three-dimensional human face five-sense organ marking point positioning method
CN105005755A (en) * 2014-04-25 2015-10-28 北京邮电大学 Three-dimensional face identification method and system
CN105354531A (en) * 2015-09-22 2016-02-24 成都通甲优博科技有限责任公司 Marking method for facial key points
CN105678284A (en) * 2016-02-18 2016-06-15 浙江博天科技有限公司 Fixed-position human behavior analysis method
CN106022228A (en) * 2016-05-11 2016-10-12 东南大学 Three-dimensional face recognition method based on vertical and horizontal local binary pattern on the mesh

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Face Alignment Across Large Poses: A 3D Solution"; Xiangyu Zhu et al.; 2016 IEEE Conference on Computer Vision and Pattern Recognition; 2016-12-31; abstract lines 13-17, Sections 3.1 and 3.2, Equation 10 in Section 5.1, Section 4.1, and Figures 2 and 4 *

Also Published As

Publication number Publication date
CN107122705A (en) 2017-09-01

Similar Documents

Publication Publication Date Title
CN107122705B (en) Face key point detection method based on three-dimensional face model
CN111795704B (en) Method and device for constructing visual point cloud map
CN106679648B (en) Visual inertia combination SLAM method based on genetic algorithm
CN105843223B (en) A kind of mobile robot three-dimensional based on space bag of words builds figure and barrier-avoiding method
JP4785880B2 (en) System and method for 3D object recognition
CN109671120A (en) A kind of monocular SLAM initial method and system based on wheel type encoder
CN105844276A (en) Face posture correction method and face posture correction device
CN114782691A (en) Robot target identification and motion detection method based on deep learning, storage medium and equipment
CN110288638B (en) Broken bone model rough registration method and system and broken bone model registration method
CN104395932A (en) Method for registering data
CN111862299A (en) Human body three-dimensional model construction method and device, robot and storage medium
CN113108773A (en) Grid map construction method integrating laser and visual sensor
CN113327275B (en) Point cloud double-view-angle fine registration method based on multi-constraint point to local curved surface projection
CN113470084B (en) Point set registration method based on outline rough matching
CN110097584A (en) The method for registering images of combining target detection and semantic segmentation
CN108597016B (en) Torr-M-Estimators basis matrix robust estimation method based on joint entropy
CN109425348A (en) A kind of while positioning and the method and apparatus for building figure
CN107016319A (en) A kind of key point localization method and device
CN111998862A (en) Dense binocular SLAM method based on BNN
Pilu et al. Training PDMs on models: the case of deformable superellipses
CN103700135B (en) A kind of three-dimensional model local spherical mediation feature extracting method
CN111598995B (en) Prototype analysis-based self-supervision multi-view three-dimensional human body posture estimation method
CN114565728A (en) Map construction method, pose determination method, related device and equipment
CN105488491A (en) Human body sleep posture detection method based on pyramid matching histogram intersection kernel
CN114332070A (en) Meteor crater detection method based on intelligent learning network model compression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant