CN108446672B - Face alignment method based on coarse-to-fine face shape estimation - Google Patents

Face alignment method based on coarse-to-fine face shape estimation

Info

Publication number
CN108446672B
Authority
CN
China
Prior art keywords
face
shape
human face
estimation
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810358918.1A
Other languages
Chinese (zh)
Other versions
CN108446672A (en)
Inventor
李晶
万俊
常军
吴玉佳
肖雅夫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN201810358918.1A
Publication of CN108446672A
Application granted
Publication of CN108446672B
Expired - Fee Related
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G06V40/172 - Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face alignment method based on coarse-to-fine face shape estimation. For any input face picture, an initial face shape is estimated first and then gradually refined toward the true shape of the face: a multitask deep learning framework estimates the positions of the main facial feature points and the facial expression; a head pose classification model based on a convolutional neural network is constructed to accurately estimate and classify the head pose; and the head pose classification result is combined with the estimated expression and main feature point positions to obtain a more accurate initial shape. Based on this initial shape, separate regressors are trained according to the pose and expression classification results, and the face shape is updated to approach the ground-truth shape. By constructing a more accurate initial face shape and adopting a higher-level cascaded regression framework, the invention improves robustness to differences in facial expression, head pose, illumination and occlusion.

Description

Face alignment method based on coarse-to-fine face shape estimation
Technical Field
The invention belongs to the technical field of computer vision, and in particular relates to a face alignment method based on coarse-to-fine face shape estimation for face recognition in digital images.
Background
Face alignment provides accurate face shape information with specific semantics and supports geometric image normalization and feature extraction. It is therefore an indispensable component of face recognition, facial pose and expression analysis, human-computer interaction and three-dimensional face modeling, and is widely used in security, public security monitoring, intelligent access control, human-computer interaction, driver assistance, film and television production, video conferencing and other fields. In practice, face alignment still faces great challenges due to differences in facial expression, head pose and illumination conditions as well as partial occlusion. How to better solve face alignment under these unconstrained conditions is therefore the main trend of current research on the face alignment problem.
In recent years, with the wide application of the cascaded regression framework to face alignment, research on the problem has progressed rapidly. The main reason for the success of cascaded shape regression is that a strong regressor is built by cascading weak regressors. This structure greatly enhances the generalization ability and accuracy of face alignment algorithms, avoids solving Hessian and Jacobian matrices, and greatly increases the speed of the algorithms.
Early optimization-based face alignment algorithms (ASM [document 1], AAM [document 2]-[document 4], CLM [document 5]-[document 7]) achieve alignment by optimizing an error equation, and their performance depends on how well the error equation is designed and how effectively it is optimized. These algorithms treat face alignment as a nonlinear optimization problem, for which the most effective, reliable and fastest solvers are second-order descent methods. In computer vision, however, second-order descent methods have two major disadvantages: 1) the objective function may not be differentiable, so numerical approximation cannot be applied; 2) the Hessian matrix is very large and may not be positive definite. Because of these problems, the optimization becomes too costly or even unsolvable.
Face alignment algorithms based on cascaded shape regression [document 8]-[document 14] start from an initialized shape and gradually estimate shape increments to approach the ground-truth shape, so neither the Hessian nor the Jacobian matrix needs to be computed. Shape-regression-based face alignment performs well in both speed and accuracy and has become the mainstream approach in the field. These algorithms [document 8]-[document 11] require an initialized shape, usually the average face. They first extract features in the neighborhoods of the reference points of the average face, concatenate the features of all reference points into a feature vector, and directly estimate the mapping R between the difference between the average face and the ground-truth shape and the corresponding feature vector. In the testing stage, the average face is used as the initialization and is refined toward the true shape using the mapping R estimated during training.
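For illustration, the following is a minimal sketch of this generic cascaded shape regression scheme (initialize with the average face, then repeatedly regress shape increments from shape-indexed features). It is a sketch under assumptions rather than the method of any cited work: the ridge-regression solver, the extract_features callback and all names are introduced here for clarity.

```python
import numpy as np

def train_cascade(images, gt_shapes, mean_shape, extract_features, T=5, lam=1e-3):
    """Train T cascaded linear regressors mapping shape-indexed features to
    shape increments (a generic sketch of cascaded shape regression)."""
    shapes = np.tile(mean_shape, (len(images), 1))   # initialize every sample with the average face
    regressors = []
    for t in range(T):
        X = np.stack([extract_features(img, s) for img, s in zip(images, shapes)])
        Y = gt_shapes - shapes                       # residual toward the ground-truth shape
        # ridge-regularized least squares: R = (X^T X + lam*I)^-1 X^T Y
        R = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
        regressors.append(R)
        shapes = shapes + X @ R                      # update the current shape estimates
    return regressors

def apply_cascade(image, mean_shape, regressors, extract_features):
    """Refine the average face toward the true shape using the trained regressors."""
    shape = mean_shape.copy()
    for R in regressors:
        shape = shape + extract_features(image, shape) @ R
    return shape
```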
SDM [document 8] first proposed solving the face alignment problem with a cascaded regression framework, using SIFT features [document 15] and multiple cascaded regressions to improve robustness to differences in facial expression, head pose and illumination. Cao et al. [document 9] proposed a nonparametric shape model in ESR, observing that the final regressed shape of each face can be regarded as a linear combination of the initialized shape and all training face shape vectors, and that an accurate model can be learned quickly with shape-indexed features and a correlation-based feature selection method. Burgos-Artizzu et al. [document 10] proposed RCPR, which detects occlusion information while estimating reference point positions and selects non-occluded shape-indexed features accordingly to handle face alignment under occlusion. Ren et al. [document 11] proposed efficient local binary features that are extremely fast to compute and used random forests for regression, reaching a speed of 3000 fps. Zhu et al. [document 16] divided face alignment into a coarse search stage and a fine search stage in CCFS: the coarse stage first constructs a shape space containing many candidate face shapes, then determines a promising subspace and hands it to the fine stage while discarding the other subspaces that differ too much from the ground-truth shape; the fine stage keeps narrowing this space until it converges to a very small subspace in which the final face shape is determined.
Current face alignment algorithms handle faces with small variations in expression, head pose and illumination well. For example, on the common subset of the 300-W data set [document 17], where such variations are relatively small, the best error of a face alignment algorithm [document 18] is 4%, while the best error [document 19] on the COFW data set [document 10], which suffers from severe occlusion, is 6.5%. Face alignment under unconstrained conditions therefore remains an urgent open problem in the field.
[document 1] Cootes T F, Taylor C J, Cooper D H, Graham J. Active shape models - their training and application. Computer Vision and Image Understanding, 1995, 61(1): 38-59.
[document 2] Matthews I, Baker S. Active appearance models revisited. International Journal of Computer Vision, 2004, 60(2): 135-164.
[document 3] Sauer P, Cootes T F, Taylor C J. Accurate regression procedures for active appearance models. // Proceedings of the British Machine Vision Conference. Dundee, Scotland, 2011.
[document 4] Cootes T F, Edwards G J, Taylor C J. Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23(6): 681-685.
[document 5] Asthana A, Zafeiriou S, Cheng S, Pantic M. Robust discriminative response map fitting with constrained local models. // IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA, 2013: 3444-3451.
[document 6] Cristinacce D, Cootes T. Feature detection and tracking with constrained local models. // Proceedings of the British Machine Vision Conference. Edinburgh, UK, 2006: 929-938.
[document 7] Asthana A, Zafeiriou S, Cheng Shi-yang, Pantic M. Incremental face alignment in the wild. // IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 1859-1866.
[document 8] Xiong Xue-han, De la Torre F. Supervised descent method and its applications to face alignment. // IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA, 2013: 532-539.
[document 9] Cao Xu-dong, Wei Yi-chen, Wen Fang, Sun Jian. Face alignment by explicit shape regression. International Journal of Computer Vision, 2014, 107(2): 177-190.
[document 10] Burgos-Artizzu X P, Perona P, Dollár P. Robust face landmark estimation under occlusion. // IEEE International Conference on Computer Vision. Sydney, Australia, 2013: 1513-1520.
[document 11] Ren Shao-qing, Cao Xu-dong, Wei Yi-chen, Sun Jian. Face alignment at 3000 fps via regressing local binary features. // IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 1685-1692.
[document 12] Dollár P, Welinder P, Perona P. Cascaded pose regression. // IEEE Conference on Computer Vision and Pattern Recognition. San Francisco, USA, 2010: 1078-1085.
[document 13] Tzimiropoulos G, Pantic M. Gauss-Newton deformable part models for face alignment in-the-wild. // IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 1851-1858.
[document 14] Smith B M, Brandt J, Lin Z, Zhang L. Nonparametric context modeling of local appearance for pose- and expression-robust facial landmark localization. // IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 1741-1748.
[document 15] Lowe D G. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004, 60(2): 91-110.
[document 16] Zhu Shi-zhan, Li Cheng, Loy Chen-change, Tang Xiao-ou. Face alignment by coarse-to-fine shape searching. // IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA, 2015: 4998-5006.
[document 17] Sagonas C, Tzimiropoulos G, Zafeiriou S, Pantic M. A semi-automatic methodology for facial landmark annotation. // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013: 896-903.
[document 18] Xiao S, Feng J, Xing J, Lai H, Yan S, Kassim A. Robust facial landmark detection via recurrent attentive-refinement networks. // European Conference on Computer Vision, 2016: 57-72.
[document 19] Zhang J, Kan M, Shan S, Chen X. Occlusion-free face alignment: Deep regression networks coupled with de-corrupt autoencoders. // IEEE Conference on Computer Vision and Pattern Recognition, 2016: 3428-3437.
[document 20] Blanz V, Vetter T. Face recognition based on fitting a 3D morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(9): 1063-1074.
[document 21] Paysan P, Knothe R, Amberg B, Romdhani S, Vetter T. A 3D face model for pose and illumination invariant face recognition. // Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS '09), 2009: 296-301.
[document 22] Cao C, Weng Y, Zhou S, Tong Y, Zhou K. FaceWarehouse: A 3D facial expression database for visual computing. IEEE Transactions on Visualization and Computer Graphics, 2014, 20(3): 413-425.
[document 23] Zhu X, Lei Z, Liu X, Shi H, Li S Z. Face alignment across large poses: A 3D solution. // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 146-155.
Disclosure of Invention
To solve the above technical problems, the invention provides a face alignment method based on coarse-to-fine face shape estimation, which mainly addresses the low accuracy of face alignment under differences in facial expression, head pose and illumination conditions and under partial occlusion.
The technical scheme adopted by the invention is a face alignment method based on coarse-to-fine face shape estimation: for any input face picture, an initialized face shape is estimated first, and the true shape of the face is then gradually approached. The method comprises the following steps,
step 1, estimating the positions of the main feature points of the face and the facial expression by using a multitask deep learning framework;
step 2, constructing a head pose classification model based on a convolutional neural network to accurately estimate and classify the head pose of the face;
step 3, obtaining a more accurate initialized shape by using the head pose classification result obtained in step 2 and the estimates of the facial expression and main feature point positions obtained in step 1;
step 4, based on the initialized shape obtained in step 3, training separate regressors according to the pose and expression classification results, and updating the face shape to approach the ground-truth (standard) shape.
In step 1, within the multitask deep learning framework, the multitask learning comprises the main task of estimating the main feature points of the face and several subtasks. The main feature points are the left and right mouth corners, the nose tip and the centers of the left and right eyes; the subtasks are the estimation of head pose, gender, eye state and mouth state.
In step 2, the head pose classification model based on a convolutional neural network first performs three-dimensional modeling of the face to obtain the angle parameters pitch, yaw and roll, and the face pictures are then classified according to the value ranges of these angle parameters; for the classified pictures, additional pictures are generated by a face profiling (side-face synthesis) method to obtain a new picture set; this new picture set is used as the training set of the convolutional-neural-network-based head pose classification model to complete the training of the model.
In step 3, for a given picture, the head pose classification model based on the convolutional neural network outputs a class c, and the corresponding average face shape S̄_c^γ is selected; the positions of the main feature points are then adjusted so that the error between the main feature points of S̄_c^γ and the detected main feature points of the face is minimized, yielding the initialized face shape S_i of the picture.
In step 4, a higher-level cascaded regression framework is adopted. The optimization space of the face alignment problem is first divided into domains so that the face shapes contained in each domain are similar and share the same gradient descent direction during regressor training, and each domain trains its own regressor. When the face shape is updated, the domain to which the current face shape belongs is determined first, and the regressor of that domain is then used to update the face shape.
The technical scheme provided by the invention is a simple face alignment method with good robustness. The coarse-to-fine face shape estimation based on a convolutional neural network can accurately select a face shape with similar additional attributes as the initialization shape, which reduces the dependence of the initialization on the average face, enhances the robustness of the algorithm to differences in head pose, facial expression, occlusion and illumination, and improves the alignment result. By constructing a more accurate initial face shape and adopting a higher-level cascaded regression framework, the invention improves the robustness of the algorithm to differences in facial expression, head pose, illumination and occlusion.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 is a head pose classification model based on a convolutional neural network constructed in an embodiment of the present invention.
Fig. 3 is a schematic comparison of conventional face alignment methods and the method of the present invention on pictures with exaggerated head poses and expressions.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The face alignment method based on coarse-to-fine face shape estimation is a simple face alignment method with good robustness. The coarse-to-fine face shape estimation based on a convolutional neural network can accurately select a face shape with similar additional attributes as the initialization shape, which reduces the dependence of the initialization on the average face, enhances the robustness of the algorithm to differences in head pose, facial expression, occlusion and illumination, and improves its performance.
Referring to Fig. 1, the face alignment method based on coarse-to-fine face shape estimation provided by the present invention first estimates an initialized face shape for any input face picture and then gradually approaches the true shape of the face. The specific implementation comprises the following steps:
Step 1: estimating the positions of the main feature points of the face and the facial expression by using a multitask deep learning framework.
The multitask learning comprises the main task of estimating the main feature points of the face and several subtasks, where the subtasks are the estimation of head pose, gender, eye state and mouth state. The main task and the subtasks use a least-squares loss and a cross-entropy loss, respectively.
In the embodiment, the multitask learning in step 1 consists of the main task of estimating the main feature points of the face (left and right mouth corners, nose tip, left and right eye centers) and the other subtasks. The corresponding training labels are

{(x_i, y_i^r, y_i^pose, y_i^gender, y_i^eye, y_i^mouth)}, i = 1, ..., N,

where i is the index of a training picture and N is the number of pictures in the training set. y_i^r is the label of the main feature point detection task, and the remaining labels belong to the additional attribute tasks (head pose, gender, eyes, mouth). y_i^r holds the coordinates of the 5 main feature points and is a 10-dimensional vector. y_i^pose represents 5 different face poses (0°, ±30°, ±60°) divided by yaw angle. y_i^gender is a binary label representing male or female. y_i^eye distinguishes wearing glasses, eyes open and eyes closed, and y_i^mouth distinguishes smiling, grinning, mouth closed and mouth open. The objective function of the multitask learning can be expressed as:
min over W^r, {W^a} of

Σ_{i=1}^{N} ||y_i^r - F(x_i; W^r)||² - Σ_{a∈A} λ^a Σ_{i=1}^{N} log p(y_i^a | x_i; W^a) + Σ_{t=1}^{T} ||W^t||²,

where {x_i} denotes the set of all feature vectors; F(x_i; W^r) = (W^r)^T x_i is a linear function that computes the positions of the main feature points of the face from the i-th feature x_i and the trained mapping W^r, and W^r is the mapping from the feature x_i to the ground-truth main feature points y_i^r.
p(y_i^a = m | x_i; W^a) = exp((W_m^a)^T x_i) / Σ_j exp((W_j^a)^T x_i) is the posterior probability expressed by the softmax function, where W_j^a denotes the j-th column of the matrix W^a; W^a is the mapping from the feature x_i to the label y_i^a of subtask a and is the parameter of the maximum likelihood estimate; m denotes a possible label of subtask a (for example, when a is the gender estimation subtask, m can be 0 or 1).
For the estimation of subtask a, the probabilities of the different label values m are computed by maximum likelihood estimation and the softmax function; for example, when a is the gender estimation subtask, the probabilities of m = 0 and m = 1, i.e. male and female, are obtained. W_m^a is the column of parameters corresponding to label m; for example, W_1^a corresponds to m = 1.
The last term Σ_{t=1}^{T} ||W^t||² is a regularization (penalty) term over the parameters W = {W^r, {W^a}}, where a ∈ A and A denotes the set of all additional attribute tasks excluding feature point detection; the sum runs over all T tasks, i.e. the main task (main feature point detection) together with the subtasks in A.
λ^a is the weight of subtask a in the overall objective function (the weight of the main feature point estimation task is 1).
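As an illustration only, the following NumPy sketch evaluates an objective of this form: a least-squares term for the landmark coordinates, a softmax cross-entropy term per attribute subtask weighted by λ^a, and an L2 penalty over all task parameters. The linear model, array shapes and function names are assumptions made for the sketch, not the patent's implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def multitask_objective(X, y_r, y_attrs, W_r, W_attrs, lambdas):
    """X: (N, d) features; y_r: (N, 10) landmark coordinates; y_attrs[a]: (N,)
    integer labels of subtask a; W_r: (d, 10); W_attrs[a]: (d, n_classes_a)."""
    # main task: least-squares landmark regression loss
    loss = np.sum((y_r - X @ W_r) ** 2)
    # subtasks: softmax cross-entropy (negative log-likelihood), weighted by lambda^a
    for a, W_a in W_attrs.items():
        p = softmax(X @ W_a)                                   # p(y^a = m | x)
        rows = np.arange(X.shape[0])
        loss += lambdas[a] * np.sum(-np.log(p[rows, y_attrs[a]] + 1e-12))
    # L2 regularization over all task parameters
    loss += np.sum(W_r ** 2) + sum(np.sum(W_a ** 2) for W_a in W_attrs.values())
    return loss
```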
Step 2: constructing a head pose classification model based on a convolutional neural network to accurately estimate and classify the head pose of the face.
The purpose of the head pose classification model based on a convolutional neural network is to finely estimate and classify the head pose of any input face picture.
The model first performs three-dimensional modeling of the face to obtain the angle parameters of the face (pitch, yaw and roll), and the face pictures are then classified according to the value ranges of these angle parameters. For the classified pictures, pictures of the other classes are generated by a face profiling (side-face synthesis) method, yielding a new picture set. This new picture set is used as the training set of the convolutional-neural-network-based head pose classification model to complete its training.
Training the head pose classification model requires a large training set, and the existing picture set (300-W) is too small, so it needs to be expanded. The embodiment enlarges the training set by synthesizing face pictures of different poses with the face profiling method: the pictures are first 3D-modeled and classified, new pictures of different poses (classes) are then synthesized by face profiling, and severely distorted synthetic pictures are removed.
the embodiment is realized as follows:
1) The head pose classification model based on a convolutional neural network first performs three-dimensional modeling of the face in each picture of the 300-W picture set [document 17] and obtains the angle parameters of the face (pitch, yaw and roll).
For three-dimensional modeling of a face from a two-dimensional picture, the 3DMM (3D morphable face model) proposed by Blanz et al. [document 20] is adopted, and PCA dimensionality reduction is used to describe the 3D face shape space:

V = V̄ + A_id a_id + A_exp a_exp,

where V denotes the three-dimensional face, V̄ the three-dimensional average face, A_id and A_exp the shape and expression principal components of the three-dimensional face shape space, taken from the BFM model [document 21] and FaceWarehouse [document 22], and a_id and a_exp the shape and expression parameters, respectively. The three-dimensional face is then projected onto the two-dimensional plane by weak perspective projection according to the correspondence between the two-dimensional face reference points and the reference points of the three-dimensional average face, and the angle parameters of the face pose are estimated:

S = f · P · R(pitch, yaw, roll) · V + t_2d,

where R is the rotation matrix determined by the angle parameters pitch, yaw and roll, t_2d is the translation vector, f is the scaling factor, and P = [1 0 0; 0 1 0] is the orthographic projection matrix. S is the 2D shape (a point cloud, i.e. a matrix of face feature point coordinates arranged in a fixed order) obtained by projecting the 3D shape. A more accurate rotation matrix R, and hence the pitch, yaw and roll angles, is obtained by iterating several times and matching feature points.
2) The face pictures of the 300-W picture set are classified according to the value ranges of the angle parameters, and the average face shape of each class is computed.
The 300-W picture set is classified according to the estimated angle parameters of the face pose. The face pictures are first divided into 3 classes by pitch angle ([-45°, -15°], [-15°, 15°], [15°, 45°]); each of these classes is then divided evenly into 5 classes by yaw angle over [-50°, 50°], giving 15 classes; and each of these is divided evenly into 5 classes by roll angle over [-50°, 50°], giving 75 classes in total. A class is denoted by c, {S_c} denotes the set of class-c pictures, and a face shape subspace is constructed for each class. Under this classification rule, every group of 5 adjacent classes c = 5j+1, ..., 5j+5 (with j = 0, ..., 24) has similar pitch and yaw angles but rather different roll angles. Most training pictures have only small angle changes and small roll angles, so class c_3 = 5j+3 contains relatively many pictures while c_1, c_2, c_4 and c_5 contain few. Therefore the average face shape of class c_3 is computed first, and the average shapes of the other four classes (c_1, c_2, c_4, c_5) are obtained by rotating it by the corresponding angles; this avoids unrepresentative average shapes, and hence larger initialization errors, caused by computing averages independently from too few pictures. In addition, when computing the average face shape of class c_3, the pictures are further divided according to the face attributes y^eye and y^mouth, i.e. eyes open (including wearing glasses) or closed and mouth open (laughing) or closed (grinning), so that each class has four average shapes. If a certain subset contains no pictures, the average shape with the smallest error among the four is used instead; for instance, if a class has no closed-eye or closed-mouth pictures, the open-eye or open-mouth average shape is used. In this way 300 face subspaces are formed, and the average face shape computed in each subspace is denoted S̄_c^γ, where γ ∈ {0, 1, 2, 3}.
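A small sketch of the class assignment just described (3 pitch bins x 5 yaw bins x 5 roll bins = 75 classes). The exact ordering of class indices is an assumption; the text only specifies the bin ranges.

```python
import numpy as np

PITCH_EDGES = [-45.0, -15.0, 15.0, 45.0]   # 3 pitch bins
N_YAW, N_ROLL = 5, 5                       # 5 yaw bins and 5 roll bins over [-50, 50] degrees

def pose_class(pitch, yaw, roll):
    """Map (pitch, yaw, roll) in degrees to one of the 3*5*5 = 75 pose classes."""
    p = int(np.clip(np.digitize(pitch, PITCH_EDGES[1:-1]), 0, 2))   # 0..2
    y = int(np.clip((yaw + 50.0) // 20.0, 0, N_YAW - 1))            # 0..4, 20-degree bins
    r = int(np.clip((roll + 50.0) // 20.0, 0, N_ROLL - 1))          # 0..4
    return p * (N_YAW * N_ROLL) + y * N_ROLL + r                    # class index in [0, 74]
```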
3) Using the 3D face models constructed above, new pictures of different poses (classes) are synthesized with the face profiling method [document 23] to enlarge the training set and train the head pose classification model.
In practice it is not feasible to estimate the face angle parameters of every picture and then classify it with the above procedure, as this consumes a great deal of time and space. The embodiment therefore constructs a head pose classification model based on a convolutional neural network that directly outputs the classification result for any input picture. Training this model requires a large set of training pictures, so the embodiment enlarges the training set by synthesizing face pictures of different poses with the face profiling method, removes severely distorted synthetic pictures, and labels about 1000 pictures per class, 75000 pictures in total, of which 67500 are used as the training set and the remaining 7500 as the validation set. All training pictures are resized to 96 × 96 as the input of the convolutional neural network, see Fig. 2. The convolution kernel of the first convolutional layer is relatively large (kernel size 11) in order to filter out noise quickly and extract useful information. The kernels of the second and third convolutional layers are gradually reduced, since the filtered feature information needs to be processed several times to obtain more accurate features. A dropout strategy is added to the fully connected layer: during training the weights of some hidden nodes are randomly disabled, and the weights of the disabled nodes are kept for later inputs; this serves as a strategy against overfitting when training samples are few. The process of training the convolutional neural network can be represented as

net = cnn({(I_k, c_k)}), k = 1, ..., N_2,

where c_k is the class label of the k-th picture I_k after expansion, {I_k} is the expanded picture set, N_2 is the number of pictures after expansion, cnn(·) denotes the head pose classification model before training, and net denotes the trained convolutional neural network parameters. The forward computation of the convolutional neural network in the testing phase can be expressed as

c = net(I),

where c is the classification result predicted by the trained network net, so that in the testing stage a picture can be classified without knowing the coordinates of its ground-truth face shape.
Step 3: a more accurate initialization shape is obtained by using the head pose classification result, the facial expression (which determines the shape of the face) and the positions of the main feature points (which assist face localization).
According to the classification result of the convolutional-neural-network-based head pose classification model, combined with the positions of the main feature points of the face obtained in step 1 and the estimates of the other subtasks, the initialized shape of the picture is constructed.
The specific implementation of step 3 is as follows:
The picture is preprocessed and fed to the neural network to obtain the output class c, and the corresponding average face shape S̄_c^γ is selected according to the detection result of the main feature points of the face in step 1. S̄_c^γ is then adjusted (rotated, scaled and translated) according to the positions of the main feature points so that the error between its 5 main feature points and the detected main feature points of the face is minimized, which yields the initialized face shape S_i of the picture:

S_i = f · R(θ) · S̄_c^γ + t_2d,

where R(θ) = [cos θ, -sin θ; sin θ, cos θ] is the rotation matrix, θ is the rotation angle, t_2d is the translation vector, and f is the scaling factor.
From S̄_c^γ, the mean coordinates of the left-eye and right-eye landmarks and the coordinates of the nose tip and the left and right mouth corners are taken to form a 5-point vector ȳ, and the transform parameters are chosen so that the error between the transformed ȳ and the detection result y^r of the main feature points of the face from step 1 is minimal:

(f, θ, t_2d) = argmin ||y^r - (f · R(θ) · ȳ + t_2d)||².
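The text states only that the transform parameters (f, θ, t_2d) are chosen to minimize this 5-point error; the NumPy sketch below uses the standard closed-form least-squares (Procrustes) solution for a 2D similarity transform as one way to do that. Function names and array layouts are assumptions.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform (scale f, rotation R, translation t)
    mapping the 5 source points onto the 5 detected points; src, dst: (5, 2)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    s, d = src - mu_s, dst - mu_d
    a = (s * d).sum()                                   # sum of dot products
    b = (s[:, 0] * d[:, 1] - s[:, 1] * d[:, 0]).sum()   # sum of cross products
    theta = np.arctan2(b, a)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    f = np.hypot(a, b) / (s ** 2).sum()
    t = mu_d - f * (R @ mu_s)
    return f, R, t

def build_initial_shape(mean_shape, anchor_points, detected5):
    """Align the class average shape (n, 2) to the detected 5 main feature points,
    given the 5 anchor points derived from it (eye centers, nose tip, mouth corners)."""
    f, R, t = fit_similarity(anchor_points, detected5)
    return f * (mean_shape @ R.T) + t
```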
Step 4: training separate regressors according to the pose and expression classification results and updating the face shape to approach the ground-truth shape.
In the embodiment, the specific implementation of step 4 is as follows:
The initialized shape is processed with a higher-level cascaded regression framework so that it gradually approaches the shape of the real face.
First, the optimization space of the face alignment problem is divided into domains so that the face shapes contained in each domain are similar and share the same gradient descent direction during regressor training; each domain trains its own regressor. The objective function for training the regressor of a domain is:

W_m^t = argmin_W Σ_{i∈Ω_m} ||ΔS_i^t - W · Φ^t(I_i, S_i^{t-1})||² + η ||W||²,

where W_m^t denotes the regressor of the m-th domain at stage t, i.e. the regression matrix of stage t; Ω_m denotes the set of pictures assigned to the m-th domain; Φ^t(I_i, S_i^{t-1}) denotes the global binary feature extracted for picture I_i at stage t according to the face shape S_i^{t-1} of stage t-1; the second term η ||W||² is the regularization term, with η controlling the regularization strength; and ΔS_i^t denotes the error between S_i^{t-1} and the ground-truth face shape.
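A minimal NumPy sketch of fitting one ridge-regularized regression matrix per domain for a single stage, following the objective above. The feature extraction is assumed to be given, and the closed-form ridge solution is just one way to solve the objective; the patent does not fix a solver.

```python
import numpy as np

def train_domain_regressors(features, residuals, domains, n_domains, eta=1e-3):
    """features[i]: phi^t(I_i, S_i^{t-1}); residuals[i]: Delta S_i^t;
    domains[i]: domain index m of picture i (defines Omega_m)."""
    W = []
    for m in range(n_domains):
        idx = [i for i, dom in enumerate(domains) if dom == m]   # Omega_m
        X = np.stack([features[i] for i in idx])                 # (|Omega_m|, d)
        Y = np.stack([residuals[i] for i in idx])                # (|Omega_m|, 2L)
        # W_m^t = argmin sum ||Delta S - W phi||^2 + eta ||W||^2
        Wm = np.linalg.solve(X.T @ X + eta * np.eye(X.shape[1]), X.T @ Y)
        W.append(Wm.T)                                           # stored so the update is W @ phi
    return W
```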
When the face shape is updated, the domain to which the current face shape belongs is determined first, and the regressor of that domain is then used to update the face shape. The update process of the face shape is:

S_i^t = S_i^{t-1} + W_m^t · Φ^t(I_i, S_i^{t-1}), t = 1, ..., T,

where S_i^t denotes the face shape newly estimated at stage t, S_i^{t-1} denotes the face shape estimated at the previous stage, and T is the total number of cascaded regression rounds; the preferred suggested value is T = 8.
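To make the update rule concrete, here is a sketch of the test-time refinement loop. assign_domain and extract_lbf stand for the domain assignment and the global binary feature extraction, which are not spelled out here, so they are assumptions of the sketch.

```python
import numpy as np

def refine_shape(image, init_shape, regressors, assign_domain, extract_lbf, T=8):
    """Cascaded update: at each stage pick the domain of the current shape and
    apply that domain's regression matrix W_m^t to the stage-t features."""
    S = init_shape.copy()
    for t in range(T):
        m = assign_domain(S)                    # which domain the current shape falls in
        phi = extract_lbf(image, S, stage=t)    # global binary feature phi^t(I, S^{t-1})
        S = S + regressors[t][m] @ phi          # S^t = S^{t-1} + W_m^t * phi
    return S
```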
The training phase follows steps 1 to 4 above, and regressors (regression matrices) for the different domains are obtained by training. In the testing phase, a new initialized face shape is first constructed according to steps 1 to 3, and the face shape is then updated with the regressor of the corresponding domain (obtained in the training phase), see Fig. 1.
Compared with currently popular face alignment methods, the results of this embodiment improve accuracy to a certain extent. Referring to Fig. 3, Fig. 3(a) shows the face alignment results of this embodiment (CFSE) and the currently popular face alignment algorithms LBF [document 11], ESR [document 9] and SDM [document 8] on the same test pictures. Fig. 3(b) shows the results of this embodiment under actual surveillance conditions (the first four) and on different data sets (the last line corresponds to 194 feature points of the Helen data set, the rest to pictures with 68 feature points from the ibug data set).
In a specific implementation, the above processes can be run automatically using computer software.
It should be understood that the above examples are only for illustrating the present invention and are not intended to limit the scope of the present invention. Furthermore, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art without departing from the spirit and scope of the technical solution of the present invention after reading the teaching of the present invention, and all of them should be covered in the scope of the claims of the present invention.

Claims (4)

1. A face alignment method based on coarse-to-fine face shape estimation, characterized in that: for any input face picture, an initialized face shape is estimated first and the real shape of the face is then gradually approached, comprising the following steps,
step 1, estimating the positions of the main feature points of the face and the facial expression by using a multitask deep learning framework;
step 2, constructing a head pose classification model based on a convolutional neural network to accurately estimate and classify the head pose of the face;
step 3, obtaining a more accurate initialized shape by using the head pose classification result obtained in step 2 and the estimates of the facial expression and main feature point positions obtained in step 1;
the implementation mode is that corresponding output class c is obtained by a head posture classification model based on a convolutional neural network for the picture, and corresponding average human face shape is selected
Figure FDA0003260842480000011
Then, the position of the main characteristic point is adjusted so as to ensure that
Figure FDA0003260842480000012
The error between the main characteristic point and the detected main characteristic point of the face is minimum, and the initialized shape S of the face of the picture is obtainedi
and step 4, based on the initialized shape obtained in step 3, training separate regressors according to the pose and expression classification results, and updating the face shape to approach the standard shape.
2. The face alignment method based on coarse-to-fine face shape estimation of claim 1, characterized in that: in step 1, within the multitask deep learning framework, the multitask learning comprises the main task of estimating the main feature points of the face and other subtasks, wherein the main feature points of the face comprise the left and right mouth corners, the nose tip and the centers of the left and right eyes, and the subtasks comprise the estimation of head pose, gender, eye state and mouth state.
3. The face alignment method based on coarse-to-fine face shape estimation of claim 1, characterized in that: in step 2, the head pose classification model based on a convolutional neural network first performs three-dimensional modeling of the face to obtain the angle parameters pitch, yaw and roll of the face, and the face pictures are then classified according to the value ranges of these angle parameters; for the classified pictures, additional pictures are generated by a face profiling method to obtain a new picture set; this new picture set is used as the training set of the convolutional-neural-network-based head pose classification model to complete the training of the model.
4. The face alignment method based on coarse-to-fine face shape estimation of claim 1, 2 or 3, characterized in that: in step 4, a higher-level cascaded regression framework is adopted; the optimization space of the face alignment problem is first divided into domains so that the face shapes contained in each domain are similar and share the same gradient descent direction during regressor training, and each domain trains its own regressor; when the face shape is updated, the domain to which the current face shape belongs is determined first, and the regressor of that domain is then used to update the face shape.
CN201810358918.1A 2018-04-20 2018-04-20 Face alignment method based on coarse-to-fine face shape estimation Expired - Fee Related CN108446672B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810358918.1A CN108446672B (en) Face alignment method based on coarse-to-fine face shape estimation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810358918.1A CN108446672B (en) Face alignment method based on coarse-to-fine face shape estimation

Publications (2)

Publication Number Publication Date
CN108446672A CN108446672A (en) 2018-08-24
CN108446672B true CN108446672B (en) 2021-12-17

Family

ID=63201089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810358918.1A Expired - Fee Related CN108446672B (en) Face alignment method based on coarse-to-fine face shape estimation

Country Status (1)

Country Link
CN (1) CN108446672B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902716B (en) * 2019-01-22 2021-01-29 厦门美图之家科技有限公司 Training method for alignment classification model and image classification method
CN109934129B (en) * 2019-02-27 2023-05-30 嘉兴学院 Face feature point positioning method, device, computer equipment and storage medium
CN111444787B (en) * 2020-03-12 2023-04-07 江西赣鄱云新型智慧城市技术研究有限公司 Fully intelligent facial expression recognition method and system with gender constraint
CN111951175A (en) * 2020-06-28 2020-11-17 中国电子科技网络信息安全有限公司 Face image normalization method based on self-coding network
CN112307899A (en) * 2020-09-27 2021-02-02 中国科学院宁波材料技术与工程研究所 Facial posture detection and correction method and system based on deep learning
CN112417991B (en) * 2020-11-02 2022-04-29 武汉大学 Double-attention face alignment method based on hourglass capsule network
CN112270308B (en) * 2020-11-20 2021-07-16 江南大学 Face feature point positioning method based on double-layer cascade regression model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1794265A (en) * 2005-12-31 2006-06-28 北京中星微电子有限公司 Method and device for distinguishing face expression based on video frequency
CN104598936A (en) * 2015-02-28 2015-05-06 北京畅景立达软件技术有限公司 Human face image face key point positioning method
CN104657713A (en) * 2015-02-09 2015-05-27 浙江大学 Three-dimensional face calibrating method capable of resisting posture and facial expression changes
CN105512638A (en) * 2015-12-24 2016-04-20 黄江 Fused featured-based face detection and alignment method
CN107038429A (en) * 2017-05-03 2017-08-11 四川云图睿视科技有限公司 A kind of multitask cascade face alignment method based on deep learning
CN107563323A (en) * 2017-08-30 2018-01-09 华中科技大学 A kind of video human face characteristic point positioning method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8798374B2 (en) * 2008-08-26 2014-08-05 The Regents Of The University Of California Automated facial action coding system
US9633250B2 (en) * 2015-09-21 2017-04-25 Mitsubishi Electric Research Laboratories, Inc. Method for estimating locations of facial landmarks in an image of a face using globally aligned regression

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1794265A (en) * 2005-12-31 2006-06-28 北京中星微电子有限公司 Method and device for distinguishing face expression based on video frequency
CN104657713A (en) * 2015-02-09 2015-05-27 浙江大学 Three-dimensional face calibrating method capable of resisting posture and facial expression changes
CN104598936A (en) * 2015-02-28 2015-05-06 北京畅景立达软件技术有限公司 Human face image face key point positioning method
CN105512638A (en) * 2015-12-24 2016-04-20 黄江 Fused featured-based face detection and alignment method
CN107038429A (en) * 2017-05-03 2017-08-11 四川云图睿视科技有限公司 A kind of multitask cascade face alignment method based on deep learning
CN107563323A (en) * 2017-08-30 2018-01-09 华中科技大学 A kind of video human face characteristic point positioning method

Also Published As

Publication number Publication date
CN108446672A (en) 2018-08-24

Similar Documents

Publication Publication Date Title
CN108446672B (en) Face alignment method based on coarse-to-fine face shape estimation
Zadeh et al. Convolutional experts constrained local model for 3d facial landmark detection
Alp Guler et al. Densereg: Fully convolutional dense shape regression in-the-wild
US9117111B2 (en) Pattern processing apparatus and method, and program
Tuzel et al. Global-local face upsampling network
Tewari et al. Learning complete 3d morphable face models from images and videos
Yang et al. Facial shape tracking via spatio-temporal cascade shape regression
US20140043329A1 (en) Method of augmented makeover with 3d face modeling and landmark alignment
Tang et al. Facial landmark detection by semi-supervised deep learning
US20080219516A1 (en) Image matching apparatus, image matching method, computer program and computer-readable storage medium
JP6207210B2 (en) Information processing apparatus and method
CN111178208A (en) Pedestrian detection method, device and medium based on deep learning
CN102622589A (en) Multispectral face detection method based on graphics processing unit (GPU)
CN112530019A (en) Three-dimensional human body reconstruction method and device, computer equipment and storage medium
CN102654903A (en) Face comparison method
Uřičář et al. Real-time multi-view facial landmark detector learned by the structured output SVM
Yu et al. A video-based facial motion tracking and expression recognition system
Wu et al. Privacy leakage of sift features via deep generative model based image reconstruction
Yan et al. A survey of deep facial landmark detection
CN118052723A (en) Intelligent design system for face replacement
Yamashita et al. Cost-alleviative learning for deep convolutional neural network-based facial part labeling
Liu et al. Human action recognition using manifold learning and hidden conditional random fields
CN112183155B (en) Method and device for establishing action posture library, generating action posture and identifying action posture
Xie et al. Towards Hardware-Friendly and Robust Facial Landmark Detection Method
Li et al. Face Recognition Model Optimization Research Based on Embedded Platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20211217