CN108446672B - Face alignment method based on shape estimation of coarse face to fine face - Google Patents
Face alignment method based on shape estimation of coarse face to fine face
- Publication number
- CN108446672B (application CN201810358918.1A)
- Authority
- CN
- China
- Prior art keywords
- face
- shape
- human face
- estimation
- picture
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Links
- 238000000034 method Methods 0.000 title claims abstract description 50
- 238000012549 training Methods 0.000 claims abstract description 36
- 230000036544 posture Effects 0.000 claims abstract description 32
- 238000013527 convolutional neural network Methods 0.000 claims abstract description 24
- 238000013145 classification model Methods 0.000 claims abstract description 21
- 230000008921 facial expression Effects 0.000 claims abstract description 16
- 230000014509 gene expression Effects 0.000 claims abstract description 9
- 238000013135 deep learning Methods 0.000 claims abstract description 6
- 238000005457 optimization Methods 0.000 claims description 7
- 238000005286 illumination Methods 0.000 abstract description 8
- 210000003128 head Anatomy 0.000 description 33
- 238000004422 calculation algorithm Methods 0.000 description 20
- 239000011159 matrix material Substances 0.000 description 12
- 238000003909 pattern recognition Methods 0.000 description 9
- 230000008569 process Effects 0.000 description 9
- 238000001514 detection method Methods 0.000 description 8
- 230000006870 function Effects 0.000 description 8
- 239000013598 vector Substances 0.000 description 7
- 238000012360 testing method Methods 0.000 description 6
- 238000007476 Maximum Likelihood Methods 0.000 description 5
- 230000000694 effects Effects 0.000 description 5
- 238000013507 mapping Methods 0.000 description 4
- 230000003993 interaction Effects 0.000 description 3
- 238000012545 processing Methods 0.000 description 3
- 238000013519 translation Methods 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 238000013459 approach Methods 0.000 description 2
- 238000013528 artificial neural network Methods 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 230000001815 facial effect Effects 0.000 description 2
- 230000006978 adaptation Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 238000010195 expression analysis Methods 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 238000012886 linear function Methods 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000007781 pre-processing Methods 0.000 description 1
- 238000007637 random forest analysis Methods 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000010187 selection method Methods 0.000 description 1
- 230000002194 synthesizing effect Effects 0.000 description 1
- 238000010200 validation analysis Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
- 238000012800 visualization Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a face alignment method based on coarse-to-fine face shape estimation. For any input face picture, an initialized face shape is estimated first and then gradually refined toward the true shape of the face. A multi-task deep learning framework is used to estimate the positions of the main facial feature points and the facial expression; a head pose classification model based on a convolutional neural network is constructed to accurately estimate and classify the head pose of the face; and the head pose classification result is combined with the estimates of the facial expression and the positions of the main feature points to obtain a more accurate initialized shape. Based on this initialized shape, respective regressors are trained according to the pose and expression classification results, and the face shape is updated to approach the standard (ground-truth) shape. By constructing a more accurate initialized face shape and adopting a higher-level cascaded regression framework, the invention improves robustness to differences in facial expression, head pose, illumination and occlusion.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a face alignment method based on coarse-to-fine face shape estimation in the field of face recognition in digital images.
Background
Face alignment can provide accurate face shape information with specific semantics and helps to realize geometric image normalization and feature extraction. It is therefore an indispensable component of face recognition, facial pose and expression analysis, human-computer interaction and three-dimensional face modeling, and is widely applied in security, public-security surveillance and control, intelligent access control, human-computer interaction, assisted driving, film and television production, video conferencing and other fields. In practice, the face alignment problem still faces great challenges because of differences in facial expression, head pose, illumination conditions and partial occlusion. How to better solve face alignment under these unconstrained conditions is therefore a main direction of current research on the face alignment problem.
In recent years, with the wide application of the cascaded regression framework to face alignment, research on the face alignment problem has made rapid progress. The main reason for the success of cascaded shape regression is that a strong regressor is built by cascading weak regressors. This structure greatly enhances the generalization ability and accuracy of face alignment algorithms, avoids computing Hessian and Jacobian matrices, and greatly improves the speed of the algorithms.
Early face alignment algorithms based on optimization (ASM [Document 1], AAM [Document 2]-[Document 4], CLM [Document 5]-[Document 7]) achieve alignment by optimizing an error function, and their performance depends on the quality of the error function's design and on how well it can be optimized. These algorithms treat face alignment as a nonlinear optimization problem, for which second-order descent methods are the most effective, reliable and fastest solvers. When applied to computer vision problems, however, second-order descent has two major drawbacks: 1) the objective function may not be differentiable, so numerical approximation cannot be applied; 2) the Hessian matrix is very large and may not be positive definite. Because of these problems, solving the problem is too costly or even infeasible.
Face alignment algorithms based on cascaded shape regression [Document 8]-[Document 14] start from an initialized shape and progressively estimate shape increments to approach the ground-truth shape, so no Hessian or Jacobian matrix needs to be computed. Shape-regression-based face alignment achieves good timeliness and accuracy and has become the mainstream approach in the field. These algorithms [Document 8]-[Document 11] require an initialized shape, which is usually the average face. The algorithm first extracts features in the neighborhoods of the reference points of the average face; the features of all reference points form a feature vector, and the algorithm directly estimates the mapping R between the difference between the average face and the ground-truth shape and the corresponding feature vector. In the testing stage, the average face is used as the initialization and is refined toward the true shape using the mapping R estimated in the training stage.
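By way of illustration only, the generic cascaded shape regression scheme just described (initialize with the average face, extract features around the current reference points, regress shape increments) can be sketched as follows; the feature extractor, the ridge-regression solver, the number of stages and all names are assumptions made for exposition rather than the exact procedure of any cited algorithm.

```python
import numpy as np

def train_cascade(images, gt_shapes, mean_shape, extract_features, n_stages=5, reg=1e-3):
    """Train a cascade of linear regressors mapping shape-indexed features to shape increments."""
    shapes = np.tile(mean_shape, (len(images), 1))      # every sample starts from the average face
    regressors = []
    for _ in range(n_stages):
        feats = np.stack([extract_features(img, s) for img, s in zip(images, shapes)])
        deltas = gt_shapes - shapes                      # residual toward the ground-truth shapes
        # ridge regression: R = (F^T F + reg*I)^-1 F^T deltas
        R = np.linalg.solve(feats.T @ feats + reg * np.eye(feats.shape[1]), feats.T @ deltas)
        regressors.append(R)
        shapes = shapes + feats @ R                      # update all shapes for the next stage
    return regressors

def apply_cascade(image, mean_shape, regressors, extract_features):
    """Refine the average face on a test image with the trained cascade."""
    shape = mean_shape.copy()
    for R in regressors:
        shape = shape + extract_features(image, shape) @ R
    return shape
```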
SDM [Document 8] first proposed solving the face alignment problem with a cascaded regression framework, using SIFT features [Document 15] and multiple cascaded regressors to enhance robustness to differences in facial expression, head pose and illumination. Cao et al. [Document 9] propose a nonparametric shape model in ESR, in which the final regressed shape of each face is regarded as a linear combination of the initialization shape and all training face shape vectors, and an accurate model can be learned quickly using shape-indexed features and a correlation-based feature selection method. Burgos-Artizzu et al. [Document 10] propose RCPR, which detects occlusion while estimating the positions of the reference points and selects unoccluded shape-indexed features according to the occlusion information to handle face alignment under occlusion. Ren et al. [Document 11] propose efficient local binary features that are extremely fast to compute and use random forests for classification and regression, reaching a speed of 3000 fps. Zhu et al. [Document 16] divide face alignment into a coarse search stage and a fine search stage in CFSS: the coarse stage first constructs a shape space containing several candidate face shapes, then determines a promising subspace and hands it to the fine stage while discarding the other subspaces that differ too much from the ground-truth shape; the fine stage keeps narrowing this space until it converges to a very small subspace in which the final face shape can be determined.
Face alignment algorithms at the present stage can already handle faces with small variations in expression, head pose and illumination. For example, the face pictures of the common subset of the 300-W data set [Document 17] have relatively small variations in expression, head pose and illumination, and the best error of a face alignment algorithm [Document 18] on this subset is 4%; on the COFW data set [Document 10], which suffers from severe occlusion, the best error of a face alignment algorithm [Document 19] is 6.5%. Face alignment under unconstrained conditions is therefore an urgent problem in the face alignment field at the present stage.
[Document 1] Cootes T F, Taylor C J, Cooper D H, Graham J. Active shape models - their training and application. Computer Vision and Image Understanding, 1995, 61(1): 38-59.
[Document 2] Matthews I, Baker S. Active appearance models revisited. International Journal of Computer Vision, 2004, 60(2): 135-164.
[Document 3] Sauer P, Cootes T F, Taylor C J. Accurate regression procedures for active appearance models. Proceedings of the British Machine Vision Conference, Dundee, Scotland, 2011.
[Document 4] Cootes T F, Edwards G J, Taylor C J. Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23(6): 681-685.
[Document 5] Asthana A, Zafeiriou S, Cheng S, Pantic M. Robust discriminative response map fitting with constrained local models. IEEE Conference on Computer Vision and Pattern Recognition, Portland, USA, 2013: 3444-3451.
[Document 6] Cristinacce D, Cootes T. Feature detection and tracking with constrained local models. Proceedings of the British Machine Vision Conference, Edinburgh, UK, 2006: 929-938.
[Document 7] Asthana A, Zafeiriou S, Cheng S, Pantic M. Incremental face alignment in the wild. IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 1859-1866.
[Document 8] Xiong X, De la Torre F. Supervised descent method and its applications to face alignment. IEEE Conference on Computer Vision and Pattern Recognition, Portland, USA, 2013: 532-539.
[Document 9] Cao X, Wei Y, Wen F, Sun J. Face alignment by explicit shape regression. International Journal of Computer Vision, 2014, 107(2): 177-190.
[Document 10] Burgos-Artizzu X P, Perona P, Dollár P. Robust face landmark estimation under occlusion. IEEE International Conference on Computer Vision, Sydney, Australia, 2013: 1513-1520.
[Document 11] Ren S, Cao X, Wei Y, Sun J. Face alignment at 3000 fps via regressing local binary features. IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 1685-1692.
[Document 12] Dollár P, Welinder P, Perona P. Cascaded pose regression. IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, USA, 2010: 1078-1085.
[Document 13] Tzimiropoulos G, Pantic M. Gauss-Newton deformable part models for face alignment in-the-wild. IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 1851-1858.
[Document 14] Smith B M, Brandt J, Lin Z, Zhang L. Nonparametric context modeling of local appearance for pose- and expression-robust facial landmark localization. IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 1741-1748.
[Document 15] Lowe D G. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004, 60(2): 91-110.
[Document 16] Zhu S, Li C, Loy C C, Tang X. Face alignment by coarse-to-fine shape searching. IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 4998-5006.
[Document 17] Sagonas C, Tzimiropoulos G, Zafeiriou S, Pantic M. A semi-automatic methodology for facial landmark annotation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013: 896-903.
[Document 18] Xiao S, Feng J, Xing J, Lai H, Yan S, Kassim A. Robust facial landmark detection via recurrent attentive-refinement networks. European Conference on Computer Vision, 2016: 57-72.
[Document 19] Zhang J, Kan M, Shan S, Chen X. Occlusion-free face alignment: deep regression networks coupled with de-corrupt autoencoders. IEEE Conference on Computer Vision and Pattern Recognition, 2016: 3428-3437.
[Document 20] Blanz V, Vetter T. Face recognition based on fitting a 3D morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(9): 1063-1074.
[Document 21] Paysan P, Knothe R, Amberg B, Romdhani S, Vetter T. A 3D face model for pose and illumination invariant face recognition. Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2009: 296-301.
[Document 22] Cao C, Weng Y, Zhou S, Tong Y, Zhou K. FaceWarehouse: a 3D facial expression database for visual computing. IEEE Transactions on Visualization and Computer Graphics, 2014, 20(3): 413-425.
[Document 23] Zhu X, Lei Z, Liu X, Shi H, Li S Z. Face alignment across large poses: a 3D solution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 146-155.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a face alignment method based on coarse-to-fine face shape estimation, which mainly addresses the low accuracy of face alignment under differences in facial expression, head pose and illumination conditions and under partial occlusion.
The technical scheme adopted by the invention is a face alignment method based on coarse-to-fine face shape estimation: for any input face picture, an initialized face shape is estimated first and then gradually refined toward the true shape of the face. The method comprises the following steps,
step 1, estimating the positions of the main facial feature points and the facial expression by using a multi-task deep learning framework;
step 2, constructing a head pose classification model based on a convolutional neural network to accurately estimate and classify the head pose of the face;
step 3, obtaining a more accurate initialized shape by using the head pose classification result obtained in step 2 and the estimates of the facial expression and the positions of the main feature points obtained in step 1;
and step 4, based on the initialized shape obtained in step 3, training respective regressors according to the pose and expression classification results, and updating the face shape to approach the ground-truth shape.
In step 1, within the multi-task deep learning framework, the multi-task learning comprises the main task, estimation of the main facial feature points, and several other subtasks; the main feature points are the left and right mouth corners, the nose tip and the centers of the left and right eyes, and the subtasks are estimation of head pose, gender, eye state and mouth state.
In step 2, the head pose classification model based on a convolutional neural network first performs three-dimensional modeling of the face to obtain the angle parameters pitch, yaw and roll, and then classifies the face pictures according to the value ranges of these angle parameters; for the classified pictures, pictures of other classes are generated with a face profiling method to obtain a new picture set; the new picture set is used as the training set of the head pose classification model based on the convolutional neural network to complete the training of the model.
In step 3, the picture is input to the head pose classification model based on the convolutional neural network to obtain the corresponding output class c, and the corresponding average face shape S̄_{c,γ} is selected; it is then adjusted according to the positions of the main feature points so that the error between its main feature points and the detected main facial feature points is minimal, which gives the initialized face shape S_i of the picture.
In step 4, a higher-level cascaded regression framework is adopted: the optimization space of the face alignment problem is first divided into domains so that the face shapes contained in each domain are similar and share the same gradient descent direction during regressor training, and each domain trains its own regressor; when the face shape is updated, the domain to which it belongs is determined first, and then the regressor of the corresponding domain is used to update it.
The technical scheme provided by the invention is a simple face alignment method with good robustness. The coarse-to-fine face shape estimation based on a convolutional neural network can select, with high accuracy, a face shape whose additional attributes are close to those of the input as the initialized shape, which reduces the dependence of the initialization on the average face, enhances the robustness of the algorithm to differences in head pose, facial expression, occlusion and illumination, and improves the alignment result. By constructing a more accurate initialized face shape and adopting a higher-level cascaded regression framework, the invention improves the robustness of the algorithm to differences in facial expression, head pose, illumination and occlusion.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 is a head pose classification model based on a convolutional neural network constructed in an embodiment of the present invention.
Fig. 3 is a schematic comparison between conventional face alignment methods and the method of the present invention on pictures with exaggerated head poses and facial expressions.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The face alignment method based on coarse-to-fine face shape estimation is a simple face alignment method with good robustness. The coarse-to-fine face shape estimation based on a convolutional neural network can select, with high accuracy, a face shape whose additional attributes are close to those of the input as the initialized shape, which reduces the dependence of the initialization on the average face, enhances the robustness of the algorithm to differences in head pose, facial expression, occlusion and illumination, and improves its performance.
Referring to Fig. 1, the face alignment method based on coarse-to-fine face shape estimation provided by the present invention first estimates an initialized face shape for any input face picture and then gradually approximates the true shape of the face; the specific implementation comprises the following steps:
Step 1: estimating the positions of the main facial feature points and the facial expression by using a multi-task deep learning framework;
The multi-task learning comprises the main task, estimation of the main facial feature points, and several other subtasks: estimation of head pose, gender, eye state and mouth state. The main task and the subtasks use a least-squares loss and a cross-entropy loss as their loss functions, respectively.
In the embodiment, the multi-task learning in step 1 is defined as the main task, estimation of the main facial feature points (left and right mouth corners, nose tip, left and right eye centers), together with the other subtasks. The corresponding labels are written as {(x_i, y_i^r, {y_i^a | a ∈ A})}, i = 1, ..., N, where i is the index of a training picture and N is the number of pictures in the training set; y_i^r is the label of the main-feature-point detection task, and the remaining labels y_i^a correspond to the additional-attribute tasks (head pose, gender, eyes, mouth). y_i^r contains the coordinates of the 5 feature points, represented as a 10-dimensional vector; the head pose label distinguishes 5 face poses (0°, ±30°, ±60°) divided by yaw angle; the gender label is binary (male or female); the eye label indicates wearing glasses, eyes open or eyes closed; and the mouth label indicates smiling, grinning, mouth closed or mouth open. The objective function of the multi-task learning can be expressed as

min_{W^r, {W^a}} Σ_{i=1..N} ||y_i^r - F(x_i; W^r)||² + Σ_{a∈A} λ^a Σ_{i=1..N} -log p(y_i^a | x_i; W^a) + (η/2) Σ_{t=1..T} ||W^t||²,

where x_i belongs to the set of all feature vectors; F(x_i; W^r) = (W^r)^T x_i is a linear function that computes the positions of the main facial feature points from the i-th feature x_i and the mapping W^r obtained by training, W^r being the mapping from the feature x_i to the ground-truth main feature points y_i^r.
p(y_i^a = m | x_i; W^a) = exp((W_m^a)^T x_i) / Σ_j exp((W_j^a)^T x_i) is the posterior probability expressed by the softmax function, where W_j^a denotes the j-th column of the matrix W^a, W^a is the mapping from the feature x_i to the label y_i^a of subtask a (i.e. the parameters of the maximum-likelihood estimation equation), and m denotes a possible label of subtask a; for example, when a is the gender estimation subtask, m may be 0 or 1.
For the estimation of subtask a, maximum-likelihood estimation is used to compute the probabilities of the different predicted label values m; for example, when a is the gender estimation subtask, m may be 0 or 1, i.e. the probabilities of male and female are computed respectively. These probabilities are obtained from the maximum-likelihood estimate and the softmax function, and W_m^a is the parameter of the maximum-likelihood estimate corresponding to m; for example, W_1^a is the parameter of the maximum-likelihood estimate when m = 1.
(η/2) Σ_{t=1..T} ||W^t||² is the regularization term on the parameters W = {W^r, {W^a}}, i.e. the penalty term, where a ∈ A and A is the set of all additional-attribute detection tasks excluding feature point detection; T counts all tasks, including the main task (main feature point detection) and those in A, and t is the task index.
λ^a is the weight of subtask a in the overall objective function (the weight of the main-feature-point estimation task is 1).
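As a non-authoritative illustration of this kind of combined objective (least-squares loss for the landmark task, weighted softmax cross-entropy for the attribute subtasks, plus an L2 penalty on the parameters), a minimal sketch is given below; the feature dimension, the subtask dictionary, the weights lambdas and all function names are placeholder assumptions rather than the embodiment's implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def multitask_loss(x, y_r, y_a, W_r, W_a, lambdas, eta=1e-4):
    """x: feature vector (d,); y_r: 10-dim landmark target; y_a: dict of subtask labels.
    W_r: (d, 10) landmark regression matrix; W_a: dict of (d, n_classes) subtask matrices."""
    # main task: squared error of the five landmark coordinates, F(x; W^r) = (W^r)^T x
    loss = np.sum((y_r - W_r.T @ x) ** 2)
    # subtasks: weighted negative log-likelihood under a softmax posterior
    for a, label in y_a.items():
        p = softmax(W_a[a].T @ x)
        loss += lambdas[a] * -np.log(p[label] + 1e-12)
    # L2 penalty over all parameters W = {W^r, {W^a}}
    loss += eta / 2 * (np.sum(W_r ** 2) + sum(np.sum(W ** 2) for W in W_a.values()))
    return loss
```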
Step 2: constructing a head pose classification model based on a convolutional neural network to accurately estimate and classify the head pose of the face;
A head pose classification model based on a convolutional neural network is constructed in order to finely estimate and classify the head pose of any input face picture.
To build this model, the face is first modeled in three dimensions to obtain its angle parameters (pitch, yaw, roll), and the face pictures are then classified according to the value ranges of these parameters. For the classified pictures, pictures of other classes are generated with a face profiling method, yielding a new picture set, which is used as the training set of the head pose classification model based on the convolutional neural network to complete its training.
Training the head pose classification model based on the convolutional neural network requires a large training set, and the existing picture set (300-W) is too small, so it has to be expanded. The embodiment enlarges the training set by synthesizing face pictures of different poses with a face profiling method: the pictures are first 3D-modeled and classified, a number of new pictures of different poses (classes) are then synthesized by face profiling, and severely distorted synthetic pictures are removed.
the embodiment is realized as follows:
1) Three-dimensional modeling is first performed on each face picture of the 300-W picture set [Document 17] to obtain the angle parameters of the face (pitch, yaw and roll angles).
For the three-dimensional modeling of a face in a two-dimensional picture, the 3DMM (3D morphable face model) proposed by Blanz et al. [Document 20] is adopted, and the 3D face shape space is described with PCA dimensionality reduction:

V = V̄ + A_id a_id + A_exp a_exp,

where V is a three-dimensional face, V̄ is the three-dimensional average face, A_id and A_exp are the shape principal components and expression principal components of the three-dimensional face shape space, taken from the BFM model [Document 21] and FaceWarehouse [Document 22] respectively, and a_id and a_exp are the shape parameters and the expression parameters. The three-dimensional face is then projected onto the two-dimensional plane by weak perspective projection according to the correspondence between the two-dimensional face reference points and the reference points of the three-dimensional average face, and the angle parameters of the face pose are estimated:

S = f P R(pitch, yaw, roll) V + t_2d,

where R is the rotation matrix determined by the angle parameters pitch, yaw and roll, t_2d is the translation vector, f is the scaling factor, and P = [1 0 0; 0 1 0] is the orthographic projection matrix. S is the 2D picture or point cloud obtained by projecting the 3D one (generally, the face shape: a matrix formed by arranging the coordinates of the face feature points in a fixed order). A more accurate R matrix, i.e. the pitch, yaw and roll angles, can be obtained by iterating several times and matching the feature points.
2) The face pictures of the 300-W picture set are classified according to the value ranges of the angle parameters, and the average face shape of each class is computed.
The 300-W picture set is classified according to the estimated angle parameters of the face pose. The face pictures are first divided into 3 classes according to the range of the pitch angle ([-45°, -15°], [-15°, 15°], [15°, 45°]); each of these 3 classes is then divided into 5 classes according to the yaw angle ([-50°, 50°] divided evenly into 5 ranges), giving 15 classes; finally the pictures are divided according to the roll angle ([-50°, 50°] divided evenly into 5 ranges), for a total of 75 classes. Let c denote a class and {S_c} the set of pictures of class c; a face shape subspace is constructed for each class of face pictures. Under this classification rule, every group of 5 adjacent classes (c = 5j+1, ..., c = 5j+5, with j = 0, ..., 14) shares similar pitch and yaw angles but differs in roll angle. Most training pictures have only small angle changes and small roll angles, so the class c_3 = 5j+3 contains relatively many pictures while classes c_1, c_2, c_4 and c_5 contain few. Therefore the average face shape of class c_3 is computed first, and the average shapes of the other four classes (c_1, c_2, c_4, c_5) are obtained by rotating it by the corresponding angles; this avoids the situation where the average shape of a class computed independently is unrepresentative because the class contains too few pictures, which would lead to a larger initialization error. In addition, when computing the average face shape of a class-c_3 picture set, the pictures are further divided according to the additional face attributes, i.e. eyes open (or wearing glasses), eyes closed, mouth open (laughing) and mouth closed (grinning), so each class of pictures has four average shapes; if the number of pictures with a certain attribute in a class is 0, the average shape with the smallest error among the four is used instead (for example, if a class contains no closed-eye or closed-mouth pictures, the average shape of the open-eye or open-mouth pictures is preferably used). In this way 300 face subspaces are formed, and the average face shape is computed in each subspace and denoted S̄_{c,γ}, where γ ∈ {0, 1, 2, 3}.
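The 3 x 5 x 5 = 75-class binning described above can be sketched as follows; 0-based class indices are used here, whereas the text numbers the classes from 1, and the helper name is an assumption.

```python
import numpy as np

PITCH_BINS = np.array([-45.0, -15.0, 15.0, 45.0])   # 3 pitch ranges
YAW_BINS = np.linspace(-50.0, 50.0, 6)              # 5 equal yaw ranges
ROLL_BINS = np.linspace(-50.0, 50.0, 6)             # 5 equal roll ranges

def pose_to_class(pitch, yaw, roll):
    """Map (pitch, yaw, roll) in degrees to one of the 75 pose classes (roll varies fastest)."""
    p = int(np.clip(np.digitize(pitch, PITCH_BINS) - 1, 0, 2))
    y = int(np.clip(np.digitize(yaw, YAW_BINS) - 1, 0, 4))
    r = int(np.clip(np.digitize(roll, ROLL_BINS) - 1, 0, 4))
    return p * 25 + y * 5 + r                        # class index c in [0, 74]
```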
3) Using the 3D face models constructed above, new pictures of different poses (classes) are synthesized with the face profiling method [Document 23] to expand the training set, and the head pose classification model is trained.
In practice it is not feasible to estimate the face angle parameters of every picture and then classify it by the above method, which would consume much time and space. The embodiment therefore constructs a head pose classification model based on a convolutional neural network that directly outputs the classification result for any input picture. Training the model requires a large training picture set; the embodiment enlarges the training set by synthesizing face pictures of different poses with the face profiling method, removes some severely distorted synthetic pictures, and labels about 1000 pictures per class, about 75000 pictures in total, of which 67500 are used as the training set and the remaining 7500 as the validation set. All pictures of the training set are resized to 96x96 as the input of the convolutional neural network, see Fig. 2. The convolution kernel of the first convolutional layer is relatively large (kernel size 11) in order to filter out noise more quickly and extract useful information; the kernels of the second and third convolutional layers become gradually smaller, since the filtered feature information needs to be processed several more times to obtain more accurate features. A dropout strategy is added to the fully connected layer: during training the weights of some hidden nodes are randomly deactivated, and the weights of the deactivated nodes are kept for later sample inputs; this serves as a strategy against overfitting when the training samples are few. The training of the convolutional neural network can be represented as

net = cnn({(Î_k, c_k)}, k = 1, ..., N_2),

where c_k is the classification result of the k-th expanded picture Î_k, {Î_k} is the expanded picture set, N_2 is the number of pictures after expansion, cnn(·) denotes the head pose classification model before training, and net denotes the trained convolutional neural network parameters. The forward computation of the convolutional neural network in the testing stage can be expressed as

c = net(I),

where c is the classification result predicted by the network net for the input picture I, so that in the testing stage a picture can be classified without knowing the coordinates of its ground-truth face shape.
Step 3: obtaining a more accurate initialized shape using the head pose classification result, the facial expression (which determines the face shape) and the positions of the main feature points (which assist face localization);
The initialized shape of the picture is constructed from the classification result of the head pose classification model based on the convolutional neural network, combined with the positions of the main facial feature points and the estimates of the other subtasks obtained in step 1.
The specific implementation process in the step 3 is as follows:
The picture is preprocessed and fed as the input of the neural network to obtain the corresponding output class c; then, according to the detection results of the main facial feature points in step 1, the corresponding average face shape S̄_{c,γ} is selected and adjusted (rotated and translated) according to the positions of the main feature points, so that the error between its 5 main feature points and the detected main facial feature points is minimal, which yields the initialized face shape S_i of the picture:

S_i = f R(θ) S̄_{c,γ} + t_2d,

where R(θ) is a rotation matrix, θ is the rotation angle, t_2d is the translation vector and f is the scaling factor.
In S̄_{c,γ}, the averages of the coordinates of the left-eye and right-eye landmarks, together with the coordinates of the nose tip and the left and right mouth corners, form a vector of 5 key points denoted y_{c,γ}; f, θ and t_2d are chosen so that the error between the transformed y_{c,γ} and the detection result y^r of the main facial feature points from step 1 is minimal:

(f, θ, t_2d) = argmin || f R(θ) y_{c,γ} + t_2d - y^r ||².
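A minimal sketch of this similarity alignment (scale, rotation and translation chosen by least squares over the 5 key points and then applied to the whole class average shape) is given below; the closed-form Umeyama-style solver is only one possible choice and is an assumption, since the embodiment does not prescribe a particular solver.

```python
import numpy as np

def similarity_align(src_pts, dst_pts):
    """Least-squares scale f, rotation R and translation t mapping src_pts onto dst_pts (both (n, 2))."""
    mu_s, mu_d = src_pts.mean(axis=0), dst_pts.mean(axis=0)
    src_c, dst_c = src_pts - mu_s, dst_pts - mu_d
    cov = dst_c.T @ src_c / len(src_pts)
    U, D, Vt = np.linalg.svd(cov)
    S = np.diag([1.0, np.sign(np.linalg.det(U @ Vt))])   # guard against reflections
    R = U @ S @ Vt
    f = np.trace(np.diag(D) @ S) / src_c.var(axis=0).sum()
    t = mu_d - f * (R @ mu_s)
    return f, R, t

def build_initial_shape(class_mean_shape, key_idx, detected_5pts):
    """Warp the class average shape so that its 5 key points best match the detected ones."""
    f, R, t = similarity_align(class_mean_shape[key_idx], detected_5pts)
    return f * (class_mean_shape @ R.T) + t
```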
and 4, step 4: and training respective regressors according to the classification results of the postures and the expressions to update the face shape to approach a standard shape.
In the embodiment, the specific implementation process of step 4 is as follows:
and processing the initialized shape by adopting a higher-level cascade regression framework to gradually approximate to the shape of the real face.
First, the optimization space of the face alignment problem is divided into domains so that the face shapes contained in each domain are similar and therefore share the same gradient descent direction during regressor training; each domain trains its own regressor. The objective function of the training process of the regressor of the m-th domain at stage t is

R_m^t = argmin_R Σ_{i∈Ω_m} || ΔS_i^t - R Φ^t(I_i, S_i^{t-1}) ||² + η ||R||²,

where R_m^t denotes the regressor of the m-th domain at stage t, corresponding to the regression matrix of stage t; Ω_m denotes the set of pictures assigned to the m-th domain; Φ^t(I_i, S_i^{t-1}) denotes the global binary features extracted at stage t from picture I_i according to the face shape S_i^{t-1} of stage t-1; the second part η ||R||² is the regularization term, with η controlling the regularization strength; and ΔS_i^t = Ŝ_i - S_i^{t-1} denotes the error between the shape estimated at stage t-1 and the actual face shape Ŝ_i.
When the face shape is updated, the domain to which it belongs is determined first, and then the regressor of the corresponding domain is used to update it. The update of the face shape is

S_i^t = S_i^{t-1} + R_m^t Φ^t(I_i, S_i^{t-1}),

where S_i^t is the face shape newly estimated at stage t, S_i^{t-1} is the face shape estimated at the previous stage, and T is the total number of rounds of the cascaded regression, with a preferred suggested value of 8.
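For illustration, a sketch of one cascade stage with domain-specific ridge regressors and the corresponding shape update is given below; the dense feature matrices, the ridge solver and all names are assumptions standing in for the binary features Φ^t and the actual training procedure.

```python
import numpy as np

def train_domain_regressors(features, gt_shapes, cur_shapes, domains, n_domains, eta=1e-3):
    """One cascade stage: fit a ridge regressor per domain on the samples assigned to it.
    features[i] plays the role of Phi^t(I_i, S_i^{t-1}) flattened into a vector."""
    regressors = []
    for m in range(n_domains):
        idx = np.where(domains == m)[0]
        F = features[idx]                              # (n_m, d)
        D = gt_shapes[idx] - cur_shapes[idx]           # shape residuals Delta S for domain m
        R = np.linalg.solve(F.T @ F + eta * np.eye(F.shape[1]), F.T @ D)
        regressors.append(R)
    return regressors

def update_shapes(features, cur_shapes, domains, regressors):
    """S^t = S^{t-1} + R_m^t * Phi^t, using each sample's own domain regressor."""
    new_shapes = cur_shapes.copy()
    for i, m in enumerate(domains):
        new_shapes[i] = cur_shapes[i] + features[i] @ regressors[m]
    return new_shapes
```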
The training stage is carried out according to steps 1 to 4 above, and the regressors (regression matrices) of the different domains are obtained by training. In the testing stage, a new initialized face shape is first constructed according to steps 1 to 3, and the face shape is then updated with the regressor of the corresponding domain obtained in the training stage, see Fig. 1.
Compared with currently popular face alignment methods, this embodiment improves accuracy to a certain extent. Referring to Fig. 3, Fig. 3(a) shows the face alignment results of this embodiment (CFSE) and of the currently popular face alignment algorithms LBF [Document 11], ESR [Document 9] and SDM [Document 8] on the same test pictures. Fig. 3(b) shows the results of this embodiment in real surveillance scenes (the first four images) and on different data sets (the last row corresponds to the 194 feature points of the Helen data set, the rest to the 68 feature points of pictures from the iBUG data set).
In specific implementation, the above processes can be automatically operated by adopting a computer software technology.
It should be understood that the above examples are only intended to illustrate the present invention and not to limit its scope. Furthermore, after reading the teaching of the present invention, those skilled in the art may make various changes or modifications without departing from the spirit and scope of the technical solution of the invention, and all of these shall fall within the scope of the claims of the present invention.
Claims (4)
1. A face alignment method based on coarse-to-fine face shape estimation, characterized in that: for any input face picture, an initialized face shape is estimated first and then gradually refined toward the true shape of the face, the method comprising the following steps,
step 1, estimating the positions of the main facial feature points and the facial expression by using a multi-task deep learning framework;
step 2, constructing a head pose classification model based on a convolutional neural network to accurately estimate and classify the head pose of the face;
step 3, obtaining a more accurate initialized shape by using the head pose classification result obtained in step 2 and the estimates of the facial expression and the positions of the main feature points obtained in step 1;
the implementation being that the picture is input to the head pose classification model based on the convolutional neural network to obtain the corresponding output class c, the corresponding average face shape is selected, and it is then adjusted according to the positions of the main feature points so that the error between its main feature points and the detected main facial feature points is minimal, which gives the initialized face shape S_i of the picture;
and step 4, based on the initialized shape obtained in step 3, training respective regressors according to the pose and expression classification results, and updating the face shape to approach the standard shape.
2. The face alignment method based on coarse-to-fine face shape estimation of claim 1, characterized in that: in step 1, within the multi-task deep learning framework, the multi-task learning comprises the main task, estimation of the main facial feature points, and other subtasks, wherein the main facial feature points comprise the left and right mouth corners, the nose tip and the centers of the left and right eyes, and the subtasks comprise estimation of head pose, gender, eye state and mouth state.
3. The face alignment method based on coarse-to-fine face shape estimation of claim 1, characterized in that: in step 2, the head pose classification model based on a convolutional neural network first performs three-dimensional modeling of the face to obtain the angle parameters pitch, yaw and roll of the face, and the face pictures are then classified according to the value ranges of these angle parameters; for the classified pictures, pictures of other classes are generated with a face profiling method to obtain a new picture set; and the new picture set is used as the training set of the head pose classification model based on the convolutional neural network to complete the training of the model.
4. The face alignment method based on coarse-to-fine face shape estimation according to claim 1, 2 or 3, characterized in that: in step 4, a higher-level cascaded regression framework is adopted, and the optimization space of the face alignment problem is first divided into domains so that the face shapes contained in each domain are similar and share the same gradient descent direction during regressor training, each domain training its own regressor; when the face shape is updated, the domain to which the face shape belongs is determined first, and then the regressor of the corresponding domain is used to update the face shape.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810358918.1A CN108446672B (en) | 2018-04-20 | 2018-04-20 | Face alignment method based on shape estimation of coarse face to fine face |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810358918.1A CN108446672B (en) | 2018-04-20 | 2018-04-20 | Face alignment method based on shape estimation of coarse face to fine face |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108446672A CN108446672A (en) | 2018-08-24 |
CN108446672B true CN108446672B (en) | 2021-12-17 |
Family
ID=63201089
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810358918.1A Expired - Fee Related CN108446672B (en) | 2018-04-20 | 2018-04-20 | Face alignment method based on shape estimation of coarse face to fine face |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108446672B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109902716B (en) * | 2019-01-22 | 2021-01-29 | 厦门美图之家科技有限公司 | Training method for alignment classification model and image classification method |
CN109934129B (en) * | 2019-02-27 | 2023-05-30 | 嘉兴学院 | Face feature point positioning method, device, computer equipment and storage medium |
CN111444787B (en) * | 2020-03-12 | 2023-04-07 | 江西赣鄱云新型智慧城市技术研究有限公司 | Fully intelligent facial expression recognition method and system with gender constraint |
CN111951175A (en) * | 2020-06-28 | 2020-11-17 | 中国电子科技网络信息安全有限公司 | Face image normalization method based on self-coding network |
CN112307899A (en) * | 2020-09-27 | 2021-02-02 | 中国科学院宁波材料技术与工程研究所 | Facial posture detection and correction method and system based on deep learning |
CN112417991B (en) * | 2020-11-02 | 2022-04-29 | 武汉大学 | Double-attention face alignment method based on hourglass capsule network |
CN112270308B (en) * | 2020-11-20 | 2021-07-16 | 江南大学 | Face feature point positioning method based on double-layer cascade regression model |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1794265A (en) * | 2005-12-31 | 2006-06-28 | 北京中星微电子有限公司 | Method and device for distinguishing face expression based on video frequency |
CN104598936A (en) * | 2015-02-28 | 2015-05-06 | 北京畅景立达软件技术有限公司 | Human face image face key point positioning method |
CN104657713A (en) * | 2015-02-09 | 2015-05-27 | 浙江大学 | Three-dimensional face calibrating method capable of resisting posture and facial expression changes |
CN105512638A (en) * | 2015-12-24 | 2016-04-20 | 黄江 | Fused featured-based face detection and alignment method |
CN107038429A (en) * | 2017-05-03 | 2017-08-11 | 四川云图睿视科技有限公司 | A kind of multitask cascade face alignment method based on deep learning |
CN107563323A (en) * | 2017-08-30 | 2018-01-09 | 华中科技大学 | A kind of video human face characteristic point positioning method |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8798374B2 (en) * | 2008-08-26 | 2014-08-05 | The Regents Of The University Of California | Automated facial action coding system |
US9633250B2 (en) * | 2015-09-21 | 2017-04-25 | Mitsubishi Electric Research Laboratories, Inc. | Method for estimating locations of facial landmarks in an image of a face using globally aligned regression |
-
2018
- 2018-04-20 CN CN201810358918.1A patent/CN108446672B/en not_active Expired - Fee Related
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1794265A (en) * | 2005-12-31 | 2006-06-28 | 北京中星微电子有限公司 | Method and device for distinguishing face expression based on video frequency |
CN104657713A (en) * | 2015-02-09 | 2015-05-27 | 浙江大学 | Three-dimensional face calibrating method capable of resisting posture and facial expression changes |
CN104598936A (en) * | 2015-02-28 | 2015-05-06 | 北京畅景立达软件技术有限公司 | Human face image face key point positioning method |
CN105512638A (en) * | 2015-12-24 | 2016-04-20 | 黄江 | Fused featured-based face detection and alignment method |
CN107038429A (en) * | 2017-05-03 | 2017-08-11 | 四川云图睿视科技有限公司 | A kind of multitask cascade face alignment method based on deep learning |
CN107563323A (en) * | 2017-08-30 | 2018-01-09 | 华中科技大学 | A kind of video human face characteristic point positioning method |
Also Published As
Publication number | Publication date |
---|---|
CN108446672A (en) | 2018-08-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108446672B (en) | Face alignment method based on shape estimation of coarse face to fine face | |
Zadeh et al. | Convolutional experts constrained local model for 3d facial landmark detection | |
Alp Guler et al. | Densereg: Fully convolutional dense shape regression in-the-wild | |
US9117111B2 (en) | Pattern processing apparatus and method, and program | |
Tuzel et al. | Global-local face upsampling network | |
Tewari et al. | Learning complete 3d morphable face models from images and videos | |
Yang et al. | Facial shape tracking via spatio-temporal cascade shape regression | |
US20140043329A1 (en) | Method of augmented makeover with 3d face modeling and landmark alignment | |
Tang et al. | Facial landmark detection by semi-supervised deep learning | |
US20080219516A1 (en) | Image matching apparatus, image matching method, computer program and computer-readable storage medium | |
JP6207210B2 (en) | Information processing apparatus and method | |
CN111178208A (en) | Pedestrian detection method, device and medium based on deep learning | |
CN102622589A (en) | Multispectral face detection method based on graphics processing unit (GPU) | |
CN112530019A (en) | Three-dimensional human body reconstruction method and device, computer equipment and storage medium | |
CN102654903A (en) | Face comparison method | |
Uřičář et al. | Real-time multi-view facial landmark detector learned by the structured output SVM | |
Yu et al. | A video-based facial motion tracking and expression recognition system | |
Wu et al. | Privacy leakage of sift features via deep generative model based image reconstruction | |
Yan et al. | A survey of deep facial landmark detection | |
CN118052723A (en) | Intelligent design system for face replacement | |
Yamashita et al. | Cost-alleviative learning for deep convolutional neural network-based facial part labeling | |
Liu et al. | Human action recognition using manifold learning and hidden conditional random fields | |
CN112183155B (en) | Method and device for establishing action posture library, generating action posture and identifying action posture | |
Xie et al. | Towards Hardware-Friendly and Robust Facial Landmark Detection Method | |
Li et al. | Face Recognition Model Optimization Research Based on Embedded Platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | |
Granted publication date: 20211217 |