CN108446672B - Face alignment method based on coarse-to-fine face shape estimation - Google Patents

Face alignment method based on coarse-to-fine face shape estimation

Info

Publication number
CN108446672B
Authority
CN
China
Prior art keywords
face
shape
human face
estimation
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810358918.1A
Other languages
Chinese (zh)
Other versions
CN108446672A (en)
Inventor
李晶
万俊
常军
吴玉佳
肖雅夫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN201810358918.1A
Publication of CN108446672A
Application granted
Publication of CN108446672B
Expired - Fee Related
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G06V40/172 - Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face alignment method based on coarse-to-fine face shape estimation. For any input face picture, an initial face shape is estimated first and then gradually refined toward the true shape of the face: a multitask deep learning framework estimates the positions of the main facial feature points and the facial expression; a head pose classification model based on a convolutional neural network is constructed to accurately estimate and classify the head pose; and the head pose classification result is combined with the estimated expression and main feature point positions to obtain a more accurate initial shape. Based on this initial shape, separate regressors are trained according to the pose and expression classification results, and the face shape is updated to approach the ground-truth shape. By constructing a more accurate initial face shape and adopting a higher-level cascaded regression framework, the invention improves robustness to differences in facial expression, head pose, illumination and occlusion.

Description

Face alignment method based on coarse-to-fine face shape estimation
Technical Field
The invention belongs to the technical field of computer vision, and in particular relates to a face alignment method based on coarse-to-fine face shape estimation for face recognition in digital images.
Background
Face alignment provides accurate face shape information with specific semantics and supports geometric image normalization and feature extraction. It is therefore an indispensable component of face recognition, facial pose and expression analysis, human-computer interaction and three-dimensional face modeling, and is widely used in security, public security monitoring, intelligent access control, human-computer interaction, driver assistance, film and television production, video conferencing and other fields. In practice, face alignment still faces great challenges due to differences in facial expression, head pose and illumination conditions as well as partial occlusion. How to better solve face alignment under these unconstrained conditions is therefore the main trend of current research on the face alignment problem.
In recent years, with the wide application of the cascaded regression framework to face alignment, research on the problem has progressed rapidly. The main reason for the success of cascaded shape regression is that a strong regressor is built by cascading weak regressors. This structure greatly enhances the generalization ability and accuracy of face alignment algorithms, avoids solving Hessian and Jacobian matrices, and greatly increases the speed of the algorithms.
Early optimization-based face alignment algorithms (ASM [document 1], AAM [document 2]-[document 4], CLM [document 5]-[document 7]) achieve alignment by optimizing an error equation, and their performance depends on how well the error equation is designed and how effectively it is optimized. These algorithms treat face alignment as a nonlinear optimization problem, for which the most effective, reliable and fastest solvers are second-order descent methods. In computer vision, however, second-order descent methods have two major disadvantages: 1) the objective function may not be differentiable, so numerical approximation cannot be applied; 2) the Hessian matrix is very large and may not be positive definite. Because of these problems, the optimization becomes too costly or even unsolvable.
Face alignment algorithms based on cascaded shape regression [document 8]-[document 14] start from an initialized shape and gradually estimate shape increments to approach the ground-truth shape, so neither the Hessian nor the Jacobian matrix needs to be computed. Shape-regression-based face alignment performs well in both speed and accuracy and has become the mainstream approach in the field. These algorithms [document 8]-[document 11] require an initialized shape, usually the average face. They first extract features in the neighborhoods of the reference points of the average face, concatenate the features of all reference points into a feature vector, and directly estimate the mapping R between the difference between the average face and the ground-truth shape and the corresponding feature vector. In the testing stage, the average face is used as the initialization and is refined toward the true shape using the mapping R estimated during training.
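For illustration, the following is a minimal sketch of this generic cascaded shape regression scheme (initialize with the average face, then repeatedly regress shape increments from shape-indexed features). It is a sketch under assumptions rather than the method of any cited work: the ridge-regression solver, the extract_features callback and all names are introduced here for clarity.

```python
import numpy as np

def train_cascade(images, gt_shapes, mean_shape, extract_features, T=5, lam=1e-3):
    """Train T cascaded linear regressors mapping shape-indexed features to
    shape increments (a generic sketch of cascaded shape regression)."""
    shapes = np.tile(mean_shape, (len(images), 1))   # initialize every sample with the average face
    regressors = []
    for t in range(T):
        X = np.stack([extract_features(img, s) for img, s in zip(images, shapes)])
        Y = gt_shapes - shapes                       # residual toward the ground-truth shape
        # ridge-regularized least squares: R = (X^T X + lam*I)^-1 X^T Y
        R = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
        regressors.append(R)
        shapes = shapes + X @ R                      # update the current shape estimates
    return regressors

def apply_cascade(image, mean_shape, regressors, extract_features):
    """Refine the average face toward the true shape using the trained regressors."""
    shape = mean_shape.copy()
    for R in regressors:
        shape = shape + extract_features(image, shape) @ R
    return shape
```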
SDM [document 8] first proposed solving the face alignment problem with a cascaded regression framework, using SIFT features [document 15] and multiple cascaded regressions to improve robustness to differences in facial expression, head pose and illumination. Cao et al. [document 9] proposed a nonparametric shape model in ESR, observing that the final regressed shape of each face can be regarded as a linear combination of the initialized shape and all training face shape vectors, and that an accurate model can be learned quickly with shape-indexed features and a correlation-based feature selection method. Burgos-Artizzu et al. [document 10] proposed RCPR, which detects occlusion information while estimating reference point positions and selects non-occluded shape-indexed features accordingly to handle face alignment under occlusion. Ren et al. [document 11] proposed efficient local binary features that are extremely fast to compute and used random forests for regression, reaching a speed of 3000 fps. Zhu et al. [document 16] divided face alignment into a coarse search stage and a fine search stage in CCFS: the coarse stage first constructs a shape space containing many candidate face shapes, then determines a promising subspace and hands it to the fine stage while discarding the other subspaces that differ too much from the ground-truth shape; the fine stage keeps narrowing this space until it converges to a very small subspace in which the final face shape is determined.
Current face alignment algorithms handle faces with small variations in expression, head pose and illumination well. For example, on the common subset of the 300-W data set [document 17], where such variations are relatively small, the best error of a face alignment algorithm [document 18] is 4%, while the best error [document 19] on the COFW data set [document 10], which suffers from severe occlusion, is 6.5%. Face alignment under unconstrained conditions therefore remains an urgent open problem in the field.
[document 1] Cootes T F, Taylor C J, Cooper D H, Graham J. Active shape models - their training and application. Computer Vision and Image Understanding, 1995, 61(1): 38-59.
[document 2] Matthews I, Baker S. Active appearance models revisited. International Journal of Computer Vision, 2004, 60(2): 135-164.
[document 3] Sauer P, Cootes T F, Taylor C J. Accurate regression procedures for active appearance models. // Proceedings of the British Machine Vision Conference. Dundee, Scotland, 2011.
[document 4] Cootes T F, Edwards G J, Taylor C J. Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23(6): 681-685.
[document 5] Asthana A, Zafeiriou S, Cheng S, Pantic M. Robust discriminative response map fitting with constrained local models. // IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA, 2013: 3444-3451.
[document 6] Cristinacce D, Cootes T. Feature detection and tracking with constrained local models. // Proceedings of the British Machine Vision Conference. Edinburgh, UK, 2006: 929-938.
[document 7] Asthana A, Zafeiriou S, Cheng Shi-yang, Pantic M. Incremental face alignment in the wild. // IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 1859-1866.
[document 8] Xiong Xue-han, De la Torre F. Supervised descent method and its applications to face alignment. // IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA, 2013: 532-539.
[document 9] Cao Xu-dong, Wei Yi-chen, Wen Fang, Sun Jian. Face alignment by explicit shape regression. International Journal of Computer Vision, 2014, 107(2): 177-190.
[document 10] Burgos-Artizzu X P, Perona P, Dollár P. Robust face landmark estimation under occlusion. // IEEE International Conference on Computer Vision. Sydney, Australia, 2013: 1513-1520.
[document 11] Ren Shao-qing, Cao Xu-dong, Wei Yi-chen, Sun Jian. Face alignment at 3000 fps via regressing local binary features. // IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 1685-1692.
[document 12] Dollár P, Welinder P, Perona P. Cascaded pose regression. // IEEE Conference on Computer Vision and Pattern Recognition. San Francisco, USA, 2010: 1078-1085.
[document 13] Tzimiropoulos G, Pantic M. Gauss-Newton deformable part models for face alignment in-the-wild. // IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 1851-1858.
[document 14] Smith B M, Brandt J, Lin Z, Zhang L. Nonparametric context modeling of local appearance for pose- and expression-robust facial landmark localization. // IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 1741-1748.
[document 15] Lowe D G. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004, 60(2): 91-110.
[document 16] Zhu Shi-zhan, Li Cheng, Loy Chen-change, Tang Xiao-ou. Face alignment by coarse-to-fine shape searching. // IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA, 2015: 4998-5006.
[document 17] Sagonas C, Tzimiropoulos G, Zafeiriou S, Pantic M. A semi-automatic methodology for facial landmark annotation. // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013: 896-903.
[document 18] Xiao S, Feng J, Xing J, Lai H, Yan S, Kassim A. Robust facial landmark detection via recurrent attentive-refinement networks. // European Conference on Computer Vision, 2016: 57-72.
[document 19] Zhang J, Kan M, Shan S, Chen X. Occlusion-free face alignment: Deep regression networks coupled with de-corrupt autoencoders. // IEEE Conference on Computer Vision and Pattern Recognition, 2016: 3428-3437.
[document 20] Blanz V, Vetter T. Face recognition based on fitting a 3D morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(9): 1063-1074.
[document 21] Paysan P, Knothe R, Amberg B, Romdhani S, Vetter T. A 3D face model for pose and illumination invariant face recognition. // Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS '09), 2009: 296-301.
[document 22] Cao C, Weng Y, Zhou S, Tong Y, Zhou K. FaceWarehouse: A 3D facial expression database for visual computing. IEEE Transactions on Visualization and Computer Graphics, 2014, 20(3): 413-425.
[document 23] Zhu X, Lei Z, Liu X, Shi H, Li S Z. Face alignment across large poses: A 3D solution. // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 146-155.
Disclosure of Invention
To solve the above technical problems, the invention provides a face alignment method based on coarse-to-fine face shape estimation, which mainly addresses the low accuracy of face alignment under differences in facial expression, head pose and illumination conditions and under partial occlusion.
The technical scheme adopted by the invention is a face alignment method based on coarse-to-fine face shape estimation: for any input face picture, an initialized face shape is estimated first, and the true shape of the face is then gradually approached. The method comprises the following steps,
step 1, estimating the positions of the main feature points of the face and the facial expression by using a multitask deep learning framework;
step 2, constructing a head pose classification model based on a convolutional neural network to accurately estimate and classify the head pose of the face;
step 3, obtaining a more accurate initialized shape by using the head pose classification result obtained in step 2 and the estimates of the facial expression and main feature point positions obtained in step 1;
step 4, based on the initialized shape obtained in step 3, training separate regressors according to the pose and expression classification results, and updating the face shape to approach the ground-truth (standard) shape.
In step 1, within the multitask deep learning framework, the multitask learning comprises the main task of estimating the main feature points of the face and several subtasks. The main feature points are the left and right mouth corners, the nose tip and the centers of the left and right eyes; the subtasks are the estimation of head pose, gender, eye state and mouth state.
In step 2, the head pose classification model based on a convolutional neural network first performs three-dimensional modeling of the face to obtain the angle parameters pitch, yaw and roll, and the face pictures are then classified according to the value ranges of these angle parameters; for the classified pictures, additional pictures are generated by a face profiling (side-face synthesis) method to obtain a new picture set; this new picture set is used as the training set of the convolutional-neural-network-based head pose classification model to complete the training of the model.
In step 3, for a given picture, the head pose classification model based on the convolutional neural network outputs a class c, and the corresponding average face shape S̄_c^γ is selected; the positions of the main feature points are then adjusted so that the error between the main feature points of S̄_c^γ and the detected main feature points of the face is minimized, yielding the initialized face shape S_i of the picture.
In step 4, a higher-level cascaded regression framework is adopted. The optimization space of the face alignment problem is first divided into domains so that the face shapes contained in each domain are similar and share the same gradient descent direction during regressor training, and each domain trains its own regressor. When the face shape is updated, the domain to which the current face shape belongs is determined first, and the regressor of that domain is then used to update the face shape.
The technical scheme provided by the invention is a simple face alignment method with good robustness. The coarse-to-fine face shape estimation based on a convolutional neural network can accurately select a face shape with similar additional attributes as the initialization shape, which reduces the dependence of the initialization on the average face, enhances the robustness of the algorithm to differences in head pose, facial expression, occlusion and illumination, and improves the alignment result. By constructing a more accurate initial face shape and adopting a higher-level cascaded regression framework, the invention improves the robustness of the algorithm to differences in facial expression, head pose, illumination and occlusion.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 is a head pose classification model based on a convolutional neural network constructed in an embodiment of the present invention.
Fig. 3 is a schematic comparison of conventional face alignment methods and the method of the present invention on pictures with exaggerated head poses and expressions.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The face alignment method based on coarse-to-fine face shape estimation is a simple face alignment method with good robustness. The coarse-to-fine face shape estimation based on a convolutional neural network can accurately select a face shape with similar additional attributes as the initialization shape, which reduces the dependence of the initialization on the average face, enhances the robustness of the algorithm to differences in head pose, facial expression, occlusion and illumination, and improves its performance.
Referring to Fig. 1, the face alignment method based on coarse-to-fine face shape estimation provided by the present invention first estimates an initialized face shape for any input face picture and then gradually approaches the true shape of the face. The specific implementation comprises the following steps:
Step 1: estimating the positions of the main feature points of the face and the facial expression by using a multitask deep learning framework.
The multitask learning comprises the main task of estimating the main feature points of the face and several subtasks, where the subtasks are the estimation of head pose, gender, eye state and mouth state. The main task and the subtasks use a least-squares loss and a cross-entropy loss, respectively.
In the embodiment, the multitask learning in step 1 consists of the main task of estimating the main feature points of the face (left and right mouth corners, nose tip, left and right eye centers) and the other subtasks. The corresponding training labels are

{(x_i, y_i^r, y_i^pose, y_i^gender, y_i^eye, y_i^mouth)}, i = 1, ..., N,

where i is the index of a training picture and N is the number of pictures in the training set. y_i^r is the label of the main feature point detection task, and the remaining labels belong to the additional attribute tasks (head pose, gender, eyes, mouth). y_i^r holds the coordinates of the 5 main feature points and is a 10-dimensional vector. y_i^pose represents 5 different face poses (0°, ±30°, ±60°) divided by yaw angle. y_i^gender is a binary label representing male or female. y_i^eye distinguishes wearing glasses, eyes open and eyes closed, and y_i^mouth distinguishes smiling, grinning, mouth closed and mouth open. The objective function of the multitask learning can be expressed as:
min over W^r, {W^a} of

Σ_{i=1}^{N} ||y_i^r - F(x_i; W^r)||² - Σ_{a∈A} λ^a Σ_{i=1}^{N} log p(y_i^a | x_i; W^a) + Σ_{t=1}^{T} ||W^t||²,

where {x_i} denotes the set of all feature vectors; F(x_i; W^r) = (W^r)^T x_i is a linear function that computes the positions of the main feature points of the face from the i-th feature x_i and the trained mapping W^r, and W^r is the mapping from the feature x_i to the ground-truth main feature points y_i^r.
p(y_i^a = m | x_i; W^a) = exp((W_m^a)^T x_i) / Σ_j exp((W_j^a)^T x_i) is the posterior probability expressed by the softmax function, where W_j^a denotes the j-th column of the matrix W^a; W^a is the mapping from the feature x_i to the label y_i^a of subtask a and is the parameter of the maximum likelihood estimate; m denotes a possible label of subtask a (for example, when a is the gender estimation subtask, m can be 0 or 1).
For the estimation of subtask a, the probabilities of the different label values m are computed by maximum likelihood estimation and the softmax function; for example, when a is the gender estimation subtask, the probabilities of m = 0 and m = 1, i.e. male and female, are obtained. W_m^a is the column of parameters corresponding to label m; for example, W_1^a corresponds to m = 1.
The last term Σ_{t=1}^{T} ||W^t||² is a regularization (penalty) term over the parameters W = {W^r, {W^a}}, where a ∈ A and A denotes the set of all additional attribute tasks excluding feature point detection; the sum runs over all T tasks, i.e. the main task (main feature point detection) together with the subtasks in A.
λ^a is the weight of subtask a in the overall objective function (the weight of the main feature point estimation task is 1).
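As an illustration only, the following NumPy sketch evaluates an objective of this form: a least-squares term for the landmark coordinates, a softmax cross-entropy term per attribute subtask weighted by λ^a, and an L2 penalty over all task parameters. The linear model, array shapes and function names are assumptions made for the sketch, not the patent's implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def multitask_objective(X, y_r, y_attrs, W_r, W_attrs, lambdas):
    """X: (N, d) features; y_r: (N, 10) landmark coordinates; y_attrs[a]: (N,)
    integer labels of subtask a; W_r: (d, 10); W_attrs[a]: (d, n_classes_a)."""
    # main task: least-squares landmark regression loss
    loss = np.sum((y_r - X @ W_r) ** 2)
    # subtasks: softmax cross-entropy (negative log-likelihood), weighted by lambda^a
    for a, W_a in W_attrs.items():
        p = softmax(X @ W_a)                                   # p(y^a = m | x)
        rows = np.arange(X.shape[0])
        loss += lambdas[a] * np.sum(-np.log(p[rows, y_attrs[a]] + 1e-12))
    # L2 regularization over all task parameters
    loss += np.sum(W_r ** 2) + sum(np.sum(W_a ** 2) for W_a in W_attrs.values())
    return loss
```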
Step 2: constructing a head pose classification model based on a convolutional neural network to accurately estimate and classify the head pose of the face.
The purpose of the head pose classification model based on a convolutional neural network is to finely estimate and classify the head pose of any input face picture.
The model first performs three-dimensional modeling of the face to obtain the angle parameters of the face (pitch, yaw and roll), and the face pictures are then classified according to the value ranges of these angle parameters. For the classified pictures, pictures of the other classes are generated by a face profiling (side-face synthesis) method, yielding a new picture set. This new picture set is used as the training set of the convolutional-neural-network-based head pose classification model to complete its training.
Training the head pose classification model requires a large training set, and the existing picture set (300-W) is too small, so it needs to be expanded. The embodiment enlarges the training set by synthesizing face pictures of different poses with the face profiling method: the pictures are first 3D-modeled and classified, new pictures of different poses (classes) are then synthesized by face profiling, and severely distorted synthetic pictures are removed.
the embodiment is realized as follows:
1) The head pose classification model based on a convolutional neural network first performs three-dimensional modeling of the face in each picture of the 300-W picture set [document 17] and obtains the angle parameters of the face (pitch, yaw and roll).
For three-dimensional modeling of a face from a two-dimensional picture, the 3DMM (3D morphable face model) proposed by Blanz et al. [document 20] is adopted, and PCA dimensionality reduction is used to describe the 3D face shape space:

V = V̄ + A_id a_id + A_exp a_exp,

where V denotes the three-dimensional face, V̄ the three-dimensional average face, A_id and A_exp the shape and expression principal components of the three-dimensional face shape space, taken from the BFM model [document 21] and FaceWarehouse [document 22], and a_id and a_exp the shape and expression parameters, respectively. The three-dimensional face is then projected onto the two-dimensional plane by weak perspective projection according to the correspondence between the two-dimensional face reference points and the reference points of the three-dimensional average face, and the angle parameters of the face pose are estimated:

S = f · P · R(pitch, yaw, roll) · V + t_2d,

where R is the rotation matrix determined by the angle parameters pitch, yaw and roll, t_2d is the translation vector, f is the scaling factor, and P = [1 0 0; 0 1 0] is the orthographic projection matrix. S is the 2D shape (a point cloud, i.e. a matrix of face feature point coordinates arranged in a fixed order) obtained by projecting the 3D shape. A more accurate rotation matrix R, and hence the pitch, yaw and roll angles, is obtained by iterating several times and matching feature points.
2) The face pictures of the 300-W picture set are classified according to the value ranges of the angle parameters, and the average face shape of each class is computed.
The 300-W picture set is classified according to the estimated angle parameters of the face pose. The face pictures are first divided into 3 classes by pitch angle ([-45°, -15°], [-15°, 15°], [15°, 45°]); each of these classes is then divided evenly into 5 classes by yaw angle over [-50°, 50°], giving 15 classes; and each of these is divided evenly into 5 classes by roll angle over [-50°, 50°], giving 75 classes in total. A class is denoted by c, {S_c} denotes the set of class-c pictures, and a face shape subspace is constructed for each class. Under this classification rule, every group of 5 adjacent classes c = 5j+1, ..., 5j+5 (with j = 0, ..., 24) has similar pitch and yaw angles but rather different roll angles. Most training pictures have only small angle changes and small roll angles, so class c_3 = 5j+3 contains relatively many pictures while c_1, c_2, c_4 and c_5 contain few. Therefore the average face shape of class c_3 is computed first, and the average shapes of the other four classes (c_1, c_2, c_4, c_5) are obtained by rotating it by the corresponding angles; this avoids unrepresentative average shapes, and hence larger initialization errors, caused by computing averages independently from too few pictures. In addition, when computing the average face shape of class c_3, the pictures are further divided according to the face attributes y^eye and y^mouth, i.e. eyes open (including wearing glasses) or closed and mouth open (laughing) or closed (grinning), so that each class has four average shapes. If a certain subset contains no pictures, the average shape with the smallest error among the four is used instead; for instance, if a class has no closed-eye or closed-mouth pictures, the open-eye or open-mouth average shape is used. In this way 300 face subspaces are formed, and the average face shape computed in each subspace is denoted S̄_c^γ, where γ ∈ {0, 1, 2, 3}.
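A small sketch of the class assignment just described (3 pitch bins x 5 yaw bins x 5 roll bins = 75 classes). The exact ordering of class indices is an assumption; the text only specifies the bin ranges.

```python
import numpy as np

PITCH_EDGES = [-45.0, -15.0, 15.0, 45.0]   # 3 pitch bins
N_YAW, N_ROLL = 5, 5                       # 5 yaw bins and 5 roll bins over [-50, 50] degrees

def pose_class(pitch, yaw, roll):
    """Map (pitch, yaw, roll) in degrees to one of the 3*5*5 = 75 pose classes."""
    p = int(np.clip(np.digitize(pitch, PITCH_EDGES[1:-1]), 0, 2))   # 0..2
    y = int(np.clip((yaw + 50.0) // 20.0, 0, N_YAW - 1))            # 0..4, 20-degree bins
    r = int(np.clip((roll + 50.0) // 20.0, 0, N_ROLL - 1))          # 0..4
    return p * (N_YAW * N_ROLL) + y * N_ROLL + r                    # class index in [0, 74]
```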
3) Using the 3D face models constructed above, new pictures of different poses (classes) are synthesized with the face profiling method [document 23] to enlarge the training set and train the head pose classification model.
In practice it is not feasible to estimate the face angle parameters of every picture and then classify it with the above procedure, as this consumes a great deal of time and space. The embodiment therefore constructs a head pose classification model based on a convolutional neural network that directly outputs the classification result for any input picture. Training this model requires a large set of training pictures, so the embodiment enlarges the training set by synthesizing face pictures of different poses with the face profiling method, removes severely distorted synthetic pictures, and labels about 1000 pictures per class, 75000 pictures in total, of which 67500 are used as the training set and the remaining 7500 as the validation set. All training pictures are resized to 96 × 96 as the input of the convolutional neural network, see Fig. 2. The convolution kernel of the first convolutional layer is relatively large (kernel size 11) in order to filter out noise quickly and extract useful information. The kernels of the second and third convolutional layers are gradually reduced, since the filtered feature information needs to be processed several times to obtain more accurate features. A dropout strategy is added to the fully connected layer: during training the weights of some hidden nodes are randomly disabled, and the weights of the disabled nodes are kept for later inputs; this serves as a strategy against overfitting when training samples are few. The process of training the convolutional neural network can be represented as

net = cnn({(I_k, c_k)}), k = 1, ..., N_2,

where c_k is the class label of the k-th picture I_k after expansion, {I_k} is the expanded picture set, N_2 is the number of pictures after expansion, cnn(·) denotes the head pose classification model before training, and net denotes the trained convolutional neural network parameters. The forward computation of the convolutional neural network in the testing phase can be expressed as

c = net(I),

where c is the classification result predicted by the trained network net, so that in the testing stage a picture can be classified without knowing the coordinates of its ground-truth face shape.
Step 3: a more accurate initialization shape is obtained by using the head pose classification result, the facial expression (which determines the shape of the face) and the positions of the main feature points (which assist face localization).
According to the classification result of the convolutional-neural-network-based head pose classification model, combined with the positions of the main feature points of the face obtained in step 1 and the estimates of the other subtasks, the initialized shape of the picture is constructed.
The specific implementation of step 3 is as follows:
The picture is preprocessed and fed to the neural network to obtain the output class c, and the corresponding average face shape S̄_c^γ is selected according to the detection result of the main feature points of the face in step 1. S̄_c^γ is then adjusted (rotated, scaled and translated) according to the positions of the main feature points so that the error between its 5 main feature points and the detected main feature points of the face is minimized, which yields the initialized face shape S_i of the picture:

S_i = f · R(θ) · S̄_c^γ + t_2d,

where R(θ) = [cos θ, -sin θ; sin θ, cos θ] is the rotation matrix, θ is the rotation angle, t_2d is the translation vector, and f is the scaling factor.
From S̄_c^γ, the mean coordinates of the left-eye and right-eye landmarks and the coordinates of the nose tip and the left and right mouth corners are taken to form a 5-point vector ȳ, and the transform parameters are chosen so that the error between the transformed ȳ and the detection result y^r of the main feature points of the face from step 1 is minimal:

(f, θ, t_2d) = argmin ||y^r - (f · R(θ) · ȳ + t_2d)||².
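The text states only that the transform parameters (f, θ, t_2d) are chosen to minimize this 5-point error; the NumPy sketch below uses the standard closed-form least-squares (Procrustes) solution for a 2D similarity transform as one way to do that. Function names and array layouts are assumptions.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform (scale f, rotation R, translation t)
    mapping the 5 source points onto the 5 detected points; src, dst: (5, 2)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    s, d = src - mu_s, dst - mu_d
    a = (s * d).sum()                                   # sum of dot products
    b = (s[:, 0] * d[:, 1] - s[:, 1] * d[:, 0]).sum()   # sum of cross products
    theta = np.arctan2(b, a)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    f = np.hypot(a, b) / (s ** 2).sum()
    t = mu_d - f * (R @ mu_s)
    return f, R, t

def build_initial_shape(mean_shape, anchor_points, detected5):
    """Align the class average shape (n, 2) to the detected 5 main feature points,
    given the 5 anchor points derived from it (eye centers, nose tip, mouth corners)."""
    f, R, t = fit_similarity(anchor_points, detected5)
    return f * (mean_shape @ R.T) + t
```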
Step 4: training separate regressors according to the pose and expression classification results and updating the face shape to approach the ground-truth shape.
In the embodiment, the specific implementation of step 4 is as follows:
The initialized shape is processed with a higher-level cascaded regression framework so that it gradually approaches the shape of the real face.
First, the optimization space of the face alignment problem is divided into domains so that the face shapes contained in each domain are similar and share the same gradient descent direction during regressor training; each domain trains its own regressor. The objective function for training the regressor of a domain is:

W_m^t = argmin_W Σ_{i∈Ω_m} ||ΔS_i^t - W · Φ^t(I_i, S_i^{t-1})||² + η ||W||²,

where W_m^t denotes the regressor of the m-th domain at stage t, i.e. the regression matrix of stage t; Ω_m denotes the set of pictures assigned to the m-th domain; Φ^t(I_i, S_i^{t-1}) denotes the global binary feature extracted for picture I_i at stage t according to the face shape S_i^{t-1} of stage t-1; the second term η ||W||² is the regularization term, with η controlling the regularization strength; and ΔS_i^t denotes the error between S_i^{t-1} and the ground-truth face shape.
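A minimal NumPy sketch of fitting one ridge-regularized regression matrix per domain for a single stage, following the objective above. The feature extraction is assumed to be given, and the closed-form ridge solution is just one way to solve the objective; the patent does not fix a solver.

```python
import numpy as np

def train_domain_regressors(features, residuals, domains, n_domains, eta=1e-3):
    """features[i]: phi^t(I_i, S_i^{t-1}); residuals[i]: Delta S_i^t;
    domains[i]: domain index m of picture i (defines Omega_m)."""
    W = []
    for m in range(n_domains):
        idx = [i for i, dom in enumerate(domains) if dom == m]   # Omega_m
        X = np.stack([features[i] for i in idx])                 # (|Omega_m|, d)
        Y = np.stack([residuals[i] for i in idx])                # (|Omega_m|, 2L)
        # W_m^t = argmin sum ||Delta S - W phi||^2 + eta ||W||^2
        Wm = np.linalg.solve(X.T @ X + eta * np.eye(X.shape[1]), X.T @ Y)
        W.append(Wm.T)                                           # stored so the update is W @ phi
    return W
```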
When the face shape is updated, the domain to which the current face shape belongs is determined first, and the regressor of that domain is then used to update the face shape. The update process of the face shape is:

S_i^t = S_i^{t-1} + W_m^t · Φ^t(I_i, S_i^{t-1}), t = 1, ..., T,

where S_i^t denotes the face shape newly estimated at stage t, S_i^{t-1} denotes the face shape estimated at the previous stage, and T is the total number of cascaded regression rounds; the preferred suggested value is T = 8.
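To make the update rule concrete, here is a sketch of the test-time refinement loop. assign_domain and extract_lbf stand for the domain assignment and the global binary feature extraction, which are not spelled out here, so they are assumptions of the sketch.

```python
import numpy as np

def refine_shape(image, init_shape, regressors, assign_domain, extract_lbf, T=8):
    """Cascaded update: at each stage pick the domain of the current shape and
    apply that domain's regression matrix W_m^t to the stage-t features."""
    S = init_shape.copy()
    for t in range(T):
        m = assign_domain(S)                    # which domain the current shape falls in
        phi = extract_lbf(image, S, stage=t)    # global binary feature phi^t(I, S^{t-1})
        S = S + regressors[t][m] @ phi          # S^t = S^{t-1} + W_m^t * phi
    return S
```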
The training phase follows steps 1 to 4 above, and regressors (regression matrices) for the different domains are obtained by training. In the testing phase, a new initialized face shape is first constructed according to steps 1 to 3, and the face shape is then updated with the regressor of the corresponding domain (obtained in the training phase), see Fig. 1.
Compared with currently popular face alignment methods, the results of this embodiment improve accuracy to a certain extent. Referring to Fig. 3, Fig. 3(a) shows the face alignment results of this embodiment (CFSE) and the currently popular face alignment algorithms LBF [document 11], ESR [document 9] and SDM [document 8] on the same test pictures. Fig. 3(b) shows the results of this embodiment under actual surveillance conditions (the first four) and on different data sets (the last line corresponds to 194 feature points of the Helen data set, the rest to pictures with 68 feature points from the ibug data set).
In a specific implementation, the above processes can be run automatically using computer software.
It should be understood that the above examples are only for illustrating the present invention and are not intended to limit the scope of the present invention. Furthermore, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art without departing from the spirit and scope of the technical solution of the present invention after reading the teaching of the present invention, and all of them should be covered in the scope of the claims of the present invention.

Claims (4)

1. A face alignment method based on coarse-to-fine face shape estimation, characterized in that: for any input face picture, an initialized face shape is estimated first and the real shape of the face is then gradually approached, comprising the following steps,
step 1, estimating the positions of the main feature points of the face and the facial expression by using a multitask deep learning framework;
step 2, constructing a head pose classification model based on a convolutional neural network to accurately estimate and classify the head pose of the face;
step 3, obtaining a more accurate initialized shape by using the head pose classification result obtained in step 2 and the estimates of the facial expression and main feature point positions obtained in step 1;
the implementation mode is that corresponding output class c is obtained by a head posture classification model based on a convolutional neural network for the picture, and corresponding average human face shape is selected
Figure FDA0003260842480000011
Then, the position of the main characteristic point is adjusted so as to ensure that
Figure FDA0003260842480000012
The error between the main characteristic point and the detected main characteristic point of the face is minimum, and the initialized shape S of the face of the picture is obtainedi
and step 4, based on the initialized shape obtained in step 3, training separate regressors according to the pose and expression classification results, and updating the face shape to approach the standard shape.
2. The face alignment method based on coarse-to-fine face shape estimation of claim 1, characterized in that: in step 1, within the multitask deep learning framework, the multitask learning comprises the main task of estimating the main feature points of the face and other subtasks, wherein the main feature points of the face comprise the left and right mouth corners, the nose tip and the centers of the left and right eyes, and the subtasks comprise the estimation of head pose, gender, eye state and mouth state.
3. The face alignment method based on coarse-to-fine face shape estimation of claim 1, characterized in that: in step 2, the head pose classification model based on a convolutional neural network first performs three-dimensional modeling of the face to obtain the angle parameters pitch, yaw and roll of the face, and the face pictures are then classified according to the value ranges of these angle parameters; for the classified pictures, additional pictures are generated by a face profiling method to obtain a new picture set; this new picture set is used as the training set of the convolutional-neural-network-based head pose classification model to complete the training of the model.
4. The face alignment method based on coarse-to-fine face shape estimation of claim 1, 2 or 3, characterized in that: in step 4, a higher-level cascaded regression framework is adopted; the optimization space of the face alignment problem is first divided into domains so that the face shapes contained in each domain are similar and share the same gradient descent direction during regressor training, and each domain trains its own regressor; when the face shape is updated, the domain to which the current face shape belongs is determined first, and the regressor of that domain is then used to update the face shape.
CN201810358918.1A 2018-04-20 2018-04-20 Face alignment method based on coarse-to-fine face shape estimation Expired - Fee Related CN108446672B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810358918.1A CN108446672B (en) Face alignment method based on coarse-to-fine face shape estimation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810358918.1A CN108446672B (en) Face alignment method based on coarse-to-fine face shape estimation

Publications (2)

Publication Number Publication Date
CN108446672A CN108446672A (en) 2018-08-24
CN108446672B true CN108446672B (en) 2021-12-17

Family

ID=63201089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810358918.1A Expired - Fee Related CN108446672B (en) Face alignment method based on coarse-to-fine face shape estimation

Country Status (1)

Country Link
CN (1) CN108446672B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902716B (en) * 2019-01-22 2021-01-29 厦门美图之家科技有限公司 Training method for alignment classification model and image classification method
CN109934129B (en) * 2019-02-27 2023-05-30 嘉兴学院 Face feature point positioning method, device, computer equipment and storage medium
CN111444787B (en) * 2020-03-12 2023-04-07 江西赣鄱云新型智慧城市技术研究有限公司 Fully intelligent facial expression recognition method and system with gender constraint
CN111951175A (en) * 2020-06-28 2020-11-17 中国电子科技网络信息安全有限公司 Face image normalization method based on self-coding network
CN112307899A (en) * 2020-09-27 2021-02-02 中国科学院宁波材料技术与工程研究所 Facial posture detection and correction method and system based on deep learning
CN112417991B (en) * 2020-11-02 2022-04-29 武汉大学 Double-attention face alignment method based on hourglass capsule network
CN112270308B (en) * 2020-11-20 2021-07-16 江南大学 Face feature point positioning method based on double-layer cascade regression model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1794265A (en) * 2005-12-31 2006-06-28 北京中星微电子有限公司 Method and device for distinguishing face expression based on video frequency
CN104598936A (en) * 2015-02-28 2015-05-06 北京畅景立达软件技术有限公司 Human face image face key point positioning method
CN104657713A (en) * 2015-02-09 2015-05-27 浙江大学 Three-dimensional face calibrating method capable of resisting posture and facial expression changes
CN105512638A (en) * 2015-12-24 2016-04-20 黄江 Fused featured-based face detection and alignment method
CN107038429A (en) * 2017-05-03 2017-08-11 四川云图睿视科技有限公司 A kind of multitask cascade face alignment method based on deep learning
CN107563323A (en) * 2017-08-30 2018-01-09 华中科技大学 A kind of video human face characteristic point positioning method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8798374B2 (en) * 2008-08-26 2014-08-05 The Regents Of The University Of California Automated facial action coding system
US9633250B2 (en) * 2015-09-21 2017-04-25 Mitsubishi Electric Research Laboratories, Inc. Method for estimating locations of facial landmarks in an image of a face using globally aligned regression

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1794265A (en) * 2005-12-31 2006-06-28 北京中星微电子有限公司 Method and device for distinguishing face expression based on video frequency
CN104657713A (en) * 2015-02-09 2015-05-27 浙江大学 Three-dimensional face calibrating method capable of resisting posture and facial expression changes
CN104598936A (en) * 2015-02-28 2015-05-06 北京畅景立达软件技术有限公司 Human face image face key point positioning method
CN105512638A (en) * 2015-12-24 2016-04-20 黄江 Fused featured-based face detection and alignment method
CN107038429A (en) * 2017-05-03 2017-08-11 四川云图睿视科技有限公司 A kind of multitask cascade face alignment method based on deep learning
CN107563323A (en) * 2017-08-30 2018-01-09 华中科技大学 A kind of video human face characteristic point positioning method

Also Published As

Publication number Publication date
CN108446672A (en) 2018-08-24

Similar Documents

Publication Publication Date Title
CN108446672B (en) Face alignment method based on coarse-to-fine face shape estimation
Zadeh et al. Convolutional experts constrained local model for 3d facial landmark detection
Alp Guler et al. Densereg: Fully convolutional dense shape regression in-the-wild
US9117111B2 (en) Pattern processing apparatus and method, and program
Tuzel et al. Global-local face upsampling network
Tewari et al. Learning complete 3d morphable face models from images and videos
Yang et al. Facial shape tracking via spatio-temporal cascade shape regression
US20140043329A1 (en) Method of augmented makeover with 3d face modeling and landmark alignment
Tang et al. Facial landmark detection by semi-supervised deep learning
US20080219516A1 (en) Image matching apparatus, image matching method, computer program and computer-readable storage medium
JP6207210B2 (en) Information processing apparatus and method
CN111178208A (en) Pedestrian detection method, device and medium based on deep learning
CN102622589A (en) Multispectral face detection method based on graphics processing unit (GPU)
CN112530019A (en) Three-dimensional human body reconstruction method and device, computer equipment and storage medium
CN102654903A (en) Face comparison method
Uřičář et al. Real-time multi-view facial landmark detector learned by the structured output SVM
Yu et al. A video-based facial motion tracking and expression recognition system
Wu et al. Privacy leakage of sift features via deep generative model based image reconstruction
Yan et al. A survey of deep facial landmark detection
CN118052723A (en) Intelligent design system for face replacement
Yamashita et al. Cost-alleviative learning for deep convolutional neural network-based facial part labeling
Liu et al. Human action recognition using manifold learning and hidden conditional random fields
CN112183155B (en) Method and device for establishing action posture library, generating action posture and identifying action posture
Xie et al. Towards Hardware-Friendly and Robust Facial Landmark Detection Method
Li et al. Face Recognition Model Optimization Research Based on Embedded Platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20211217