CN108446672B - Face alignment method based on shape estimation of coarse face to fine face - Google Patents

Face alignment method based on shape estimation of coarse face to fine face Download PDF

Info

Publication number
CN108446672B
CN108446672B (application CN201810358918.1A)
Authority
CN
China
Prior art keywords
face
shape
estimation
head pose
feature points
Legal status
Expired - Fee Related
Application number
CN201810358918.1A
Other languages
Chinese (zh)
Other versions
CN108446672A (en)
Inventor
李晶
万俊
常军
吴玉佳
肖雅夫
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
Filing date
Publication date
Application filed by Wuhan University (WHU)
Priority to CN201810358918.1A
Publication of CN108446672A
Application granted
Publication of CN108446672B
Status: Expired - Fee Related
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172: Classification, e.g. identification
    • G06V 40/168: Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face alignment method based on coarse-to-fine face shape estimation. For any input face image, an initial face shape is first estimated and then progressively refined toward the true face shape. The method uses a multi-task deep learning framework to estimate the positions of the main facial feature points and the facial expression, and builds a head pose classification model based on a convolutional neural network to accurately estimate and classify the head pose. The head pose classification result, together with the estimates of facial expression and main feature point positions, yields a more accurate initial shape. Based on this initial shape, separate regressors are trained according to the pose and expression classes to update the face shape toward the ground-truth shape. By constructing a more accurate initial face shape and adopting a more advanced cascaded regression framework, the invention improves robustness to variations in facial expression, head pose, illumination, and occlusion.

Description

A Face Alignment Method Based on Coarse-to-Fine Face Shape Estimation

Technical Field

The present invention belongs to the technical field of computer vision, and in particular relates to a face alignment method based on coarse-to-fine face shape estimation in the field of face recognition in digital images.

Background Art

Face alignment provides accurate, semantically meaningful face shape information and helps with geometric image normalization and feature extraction. It is therefore an indispensable component of face recognition, facial pose and expression analysis, human-computer interaction, and 3D face modeling, and is widely used in security, public-security surveillance, intelligent access control, human-computer interaction, driver assistance, film and television production, video conferencing, and other fields. In practice, face alignment still faces major challenges due to variations in facial expression, head pose, and illumination, and the presence of partial occlusion. How to better solve face alignment under such unconstrained conditions is therefore the main trend of current research on the problem.

In recent years, with the wide application of cascaded regression frameworks to face alignment, research on the problem has progressed rapidly. The main reason for the success of cascaded shape regression is that cascading several weak regressors builds a much stronger regressor. This architecture greatly improves the generalization ability and accuracy of face alignment algorithms while avoiding the computation of Hessian and Jacobian matrices, which greatly increases their speed.

Early optimization-based face alignment algorithms (ASM [1], AAM [2]-[4], CLM [5]-[7]) achieve alignment by optimizing an error function, so their performance depends on how well the error function is designed and how effectively it is optimized. These algorithms treat face alignment as a nonlinear optimization problem, and the most effective, reliable, and fastest methods for such problems are currently second-order descent methods. For computer vision problems, however, second-order descent has two major drawbacks: 1) the objective function may be non-differentiable, so the idea of numerical approximation cannot be applied; 2) the Hessian matrix may be very high-dimensional and not positive definite. These issues make the problem too expensive, or even impossible, to solve.

Face alignment algorithms based on cascaded shape regression [8]-[14] start from an initial shape and progressively estimate shape increments that approach the ground-truth shape, so no Hessian or Jacobian computation is needed. These algorithms achieve good results in both speed and accuracy and have become the mainstream in the field. The methods in [8]-[11] all require an initial shape, usually the mean face. They first extract features in the neighborhoods of the mean face's landmarks and concatenate the features of all landmarks into a feature vector; the algorithm then directly estimates the mapping R between this feature vector and the difference between the mean face and the ground-truth shape. At test time, the mean face serves as the initialization and the learned R is applied as a regression to refine it toward the true shape.
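The generic cascaded shape regression scheme just described can be sketched as follows. This is an illustrative toy, not any of the cited algorithms: `extract_features` stands in for the shape-indexed features (SIFT, pixel differences) used in the literature, each stage is a plain linear least-squares regressor, and all names are placeholders.

```python
import numpy as np

def extract_features(image, shape):
    """Toy shape-indexed features: image intensities sampled at the
    current landmark positions, plus the shape itself and a bias."""
    h, w = image.shape
    idx = np.clip(np.round(shape).astype(int), 0, [w - 1, h - 1])
    vals = image[idx[:, 1], idx[:, 0]]
    return np.concatenate([vals, shape.ravel(), [1.0]])

def train_cascade(images, gt_shapes, mean_shape, n_stages=3):
    """Each stage t fits a linear map R_t from features extracted at the
    current shape estimates to the remaining increment toward the
    ground-truth shapes, then applies it to update the estimates."""
    shapes = [mean_shape.copy() for _ in images]
    cascade = []
    for _ in range(n_stages):
        X = np.stack([extract_features(im, s) for im, s in zip(images, shapes)])
        Y = np.stack([(g - s).ravel() for g, s in zip(gt_shapes, shapes)])
        R = np.linalg.lstsq(X, Y, rcond=None)[0]  # least-squares regressor
        cascade.append(R)
        shapes = [s + (extract_features(im, s) @ R).reshape(s.shape)
                  for im, s in zip(images, shapes)]
    return cascade

def align(image, mean_shape, cascade):
    """Start from the mean face and apply each stage's update in turn."""
    s = mean_shape.copy()
    for R in cascade:
        s = s + (extract_features(image, s) @ R).reshape(s.shape)
    return s
```

Note that no Hessian or Jacobian appears anywhere: each stage only needs feature extraction and a linear solve, which is exactly the property the text credits for the speed of this family of methods.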

SDM [8] was the first to propose solving face alignment with a cascaded regression framework, using SIFT features [15] and multiple cascaded regressions to improve robustness to variations in facial expression, head pose, and illumination. Cao et al. [9] proposed a non-parametric shape model in ESR, viewing the final regressed shape of each face as a linear combination of the initial shape and all training shape vectors; shape-indexed features and an associated feature selection method allow an accurate model to be learned quickly. Burgos-Artizzu et al. [10] proposed RCPR, which detects occlusion while estimating landmark positions and selects unoccluded shape-indexed features according to the occlusion information to handle alignment under occlusion. Ren et al. [11] proposed effective and extremely fast local binary features combined with random-forest classification and regression, reaching a speed of 3000 fps. Zhu et al. [16] divided face alignment in CFSS into a coarse search stage and a fine search stage: the coarse stage first builds a shape space containing many candidate face shapes and selects a subspace to hand to the fine stage, discarding other unpromising subspaces that differ greatly from the ground-truth shape; the fine stage then keeps shrinking this space until it converges to a tiny subspace that determines the final face shape.

Current face alignment algorithms handle faces with small variations in expression, head pose, and illumination well. For example, on the common subset of the 300-W dataset [17], where such variations are relatively small, the best error of a face alignment algorithm [18] is 4%; on the COFW dataset [10], where occlusion is severe, the best error of a face alignment algorithm [19] is only 6.5%. Face alignment under unconstrained conditions is therefore an urgent open problem in the field.

[1] Cootes T F, Taylor C J, Cooper D H, Graham J. Active shape models - their training and application. Computer Vision and Image Understanding, 1995, 61(1): 38-59.

[2] Matthews I, Baker S. Active appearance models revisited. International Journal of Computer Vision, 2004, 60(2): 135-164.

[3] Sauer P, Cootes T F, Taylor C J. Accurate regression procedures for active appearance models. // Proceedings of the British Machine Vision Conference. Dundee, Scotland, 2011: 681-685.

[4] Cootes T F, Edwards G J, Taylor C J. Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23(6): 581-585.

[5] Asthana A, Zafeiriou S, Cheng S, Pantic M. Robust discriminative response map fitting with constrained local models. // IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA, 2013: 3444-3451.

[6] Cristinacce D, Cootes T. Feature detection and tracking with constrained local models. // Proceedings of the British Machine Vision Conference. Edinburgh, UK, 2006: 929-938.

[7] Asthana A, Zafeiriou S, Cheng S, Pantic M. Incremental face alignment in the wild. // IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 1859-1867.

[8] Xiong X, De la Torre F. Supervised descent method and its applications to face alignment. // IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA, 2013: 532-539.

[9] Cao X, Wei Y, Wen F, Sun J. Face alignment by explicit shape regression. International Journal of Computer Vision, 2014, 107(2): 177-190.

[10] Burgos-Artizzu X P, Perona P, Dollar P. Robust face landmark estimation under occlusion. // IEEE International Conference on Computer Vision. Sydney, Australia, 2013: 1513-1520.

[11] Ren S, Cao X, Wei Y, Sun J. Face alignment at 3000 fps via regressing local binary features. // IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 1685-1692.

[12] Dollar P, Welinder P, Perona P. Cascaded pose regression. // IEEE Conference on Computer Vision and Pattern Recognition. San Francisco, USA, 2010: 1078-1085.

[13] Tzimiropoulos G, Pantic M. Gauss-Newton deformable part models for face alignment in-the-wild. // IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 1851-1858.

[14] Smith B M, Brandt J, Lin Z, Zhang L. Nonparametric context modeling of local appearance for pose- and expression-robust facial landmark localization. // IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 1741-1748.

[15] Lowe D G. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004, 60(2): 91-110.

[16] Zhu S, Li C, Loy C C, Tang X. Face alignment by coarse-to-fine shape searching. // IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA, 2015: 4998-5006.

[17] Sagonas C, Tzimiropoulos G, Zafeiriou S, Pantic M. A semi-automatic methodology for facial landmark annotation. // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013: 896-903.

[18] Xiao S, Feng J, Xing J, Lai H, Yan S, Kassim A. Robust facial landmark detection via recurrent attentive-refinement networks. // European Conference on Computer Vision, 2016: 57-72.

[19] Zhang J, Kan M, Shan S, Chen X. Occlusion-free face alignment: deep regression networks coupled with de-corrupt auto-encoders. // IEEE Conference on Computer Vision and Pattern Recognition, 2016: 3428-3437.

[20] Blanz V, Vetter T. Face recognition based on fitting a 3D morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(9): 1063-1074.

[21] Paysan P, Knothe R, Amberg B, Romdhani S, Vetter T. A 3D face model for pose and illumination invariant face recognition. // Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2009: 296-301.

[22] Cao C, Weng Y, Zhou S, Tong Y, Zhou K. FaceWarehouse: a 3D facial expression database for visual computing. IEEE Transactions on Visualization and Computer Graphics, 2014, 20(3): 413-425.

[23] Zhu X, Lei Z, Liu X, Shi H, Li S Z. Face alignment across large poses: a 3D solution. // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 146-155.

Summary of the Invention

To solve the above technical problems, the present invention proposes a face alignment method based on coarse-to-fine face shape estimation, which mainly addresses the low accuracy of face alignment under variations in facial expression, head pose, and illumination, and in the presence of partial occlusion.

The technical solution adopted by the present invention is a face alignment method based on coarse-to-fine face shape estimation. For any input face image, an initial face shape is first estimated and then progressively refined toward the true face shape. The method includes the following steps:

Step 1: use a multi-task deep learning framework to estimate the positions of the main facial feature points and the facial expression;

Step 2: build a head pose classification model based on a convolutional neural network to accurately estimate and classify the head pose;

Step 3: use the head pose classification result of step 2 together with the estimates of facial expression and main feature point positions of step 1 to obtain a more accurate initial shape;

Step 4: based on the initial shape of step 3, train separate regressors according to the pose and expression classes, and update the face shape toward the ground-truth shape.

Furthermore, in step 1, the multi-task learning in the multi-task deep learning framework consists of a main task, estimating the main facial feature points, and other subtasks. The main feature points are the left and right mouth corners, the tip of the nose, and the centers of the left and right eyes; the subtasks estimate head pose, gender, eye shape, and mouth shape.

Furthermore, in step 2, the CNN-based head pose classification model first builds a 3D model of the face to obtain the pose angle parameters pitch, yaw, and roll, and then classifies the face images according to the value ranges of these angles. For the classified images, a face profiling method generates images of the other classes, producing a new image set; this new image set is used as the training set to complete the training of the CNN-based head pose classification model.
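A minimal sketch of the pose-binning idea: the patent divides head poses by yaw angle into five classes (0°, ±30°, ±60°) in its multi-task setup, but the exact bin edges below are illustrative assumptions, since the text does not list the value ranges.

```python
def pose_class(yaw_deg):
    """Map a yaw angle (degrees) to one of five head-pose classes
    centred on 0, +/-30 and +/-60 degrees; bin edges are assumptions."""
    bins = [(-90, -45, "yaw-60"), (-45, -15, "yaw-30"),
            (-15, 15, "frontal"), (15, 45, "yaw+30"), (45, 90, "yaw+60")]
    for lo, hi, label in bins:
        if lo <= yaw_deg < hi:
            return label
    raise ValueError("yaw outside the modelled range")
```

In the method itself this classification is produced by the trained CNN rather than by thresholding, and pitch and roll are handled analogously; the thresholds above only illustrate how angle ranges induce classes when building the training set.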

Furthermore, in step 3, the image is passed through the CNN-based head pose classification model to obtain the corresponding output class $c$, and the corresponding mean face shape $\bar{S}_c$ is selected. $\bar{S}_c$ is then adjusted according to the positions of the main feature points so that the error between its main feature points and the detected main feature points of the face is minimized, yielding the initial shape $S_i$ of the face in the image.
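The adjustment of the selected mean shape to the detected main feature points can be sketched as a least-squares 2D similarity fit (a standard Procrustes-style alignment; the patent does not spell out the transform class, so restricting it to scale, rotation, and translation is an assumption, and all names are placeholders):

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares 2D similarity transform mapping src onto dst.
    src, dst: (n, 2) arrays of corresponding landmarks.
    Solves for (a, b, tx, ty) with x' = a*x - b*y + tx, y' = b*x + a*y + ty."""
    n = src.shape[0]
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1], src[:, 0], np.zeros(n), np.ones(n)])
    return np.linalg.lstsq(A, dst.reshape(-1), rcond=None)[0]

def apply_similarity(params, pts):
    a, b, tx, ty = params
    x, y = pts[:, 0], pts[:, 1]
    return np.column_stack([a * x - b * y + tx, b * x + a * y + ty])

def initial_shape(class_mean_shape, class_mean_main_points, detected_main_points):
    """Warp the class mean shape so that its main feature points
    best match the detected ones, giving the initial shape."""
    params = fit_similarity(class_mean_main_points, detected_main_points)
    return apply_similarity(params, class_mean_shape)
```

Minimizing the landmark error over this four-parameter family is what "adjusting according to the positions of the main feature points" amounts to in this sketch.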

Furthermore, in step 4, a more advanced cascaded regression framework is adopted. The optimization space of the face alignment problem is first partitioned into domains such that the face shapes within each domain are similar and therefore share the same gradient-descent direction during regressor training; each domain trains its own regressor. When updating a face shape, the method first determines which domain the shape belongs to and then updates it with that domain's regressor.
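A minimal sketch of the domain-partitioned update (the domain-assignment function and per-domain regressors are placeholders; in the patent the domains are the pose and expression classes, each with its own trained regressor):

```python
import numpy as np

class DomainCascade:
    """Holds one linear regressor per domain; a shape is updated by the
    regressor of whichever domain it currently falls in."""
    def __init__(self, assign_domain, regressors):
        self.assign_domain = assign_domain  # callable: shape -> domain id
        self.regressors = regressors        # dict: domain id -> matrix R

    def update(self, shape, features):
        """One regression step: pick the domain, regress the increment."""
        d = self.assign_domain(shape)
        delta = features @ self.regressors[d]
        return shape + delta.reshape(shape.shape)
```

Because all shapes in a domain are similar, the regression targets within a domain share a consistent descent direction, which is the rationale the text gives for partitioning before training.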

The technical solution provided by the present invention is a simple yet robust face alignment method. The CNN-based coarse-to-fine face shape estimation selects, with high accuracy, a face shape whose additional attributes are close to those of the input as the initial shape. This reduces the dependence of the initialization on the mean face, strengthens the algorithm's robustness to variations in head pose, facial expression, occlusion, and illumination, and improves the alignment results. By constructing a more accurate initial face shape and adopting a more advanced cascaded regression framework, the invention improves the algorithm's robustness to variations in facial expression, head pose, illumination, and occlusion.

Description of the Drawings

Figure 1 is a flowchart of an embodiment of the present invention.

Figure 2 shows the CNN-based head pose classification model constructed in an embodiment of the present invention.

Figure 3 compares a traditional face alignment method with the present invention on images with large head poses and exaggerated expressions.

Detailed Description

To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.

The face alignment method based on coarse-to-fine face shape estimation is a simple yet robust method. Using CNN-based coarse-to-fine face shape estimation, a face shape with similar additional attributes can be selected as the initial shape with high accuracy, which reduces the dependence of the initialization on the mean face, strengthens the algorithm's robustness to variations in head pose, facial expression, occlusion, and illumination, and improves its results.

Referring to Figure 1, the face alignment method based on coarse-to-fine face shape estimation provided by the present invention first estimates an initial face shape for any input face image and then progressively refines it toward the true shape. Its specific implementation includes the following steps:

Step 1: use a multi-task deep learning framework to estimate the positions of the main facial feature points and the facial expression.

The multi-task learning consists of the main task, estimating the main facial feature points, and the other subtasks: head pose, gender, eye state, and mouth state. The main task uses a least-squares loss function and the subtasks use cross-entropy loss functions.

In the embodiment, the multi-task learning of step 1 is defined as the estimation of the main facial feature points (left and right mouth corners, nose tip, left and right eye centers) as the main task, plus the estimation of the other subtasks. The corresponding labels are $\{x_i, y_i^r, y_i^p, y_i^g, y_i^e, y_i^m\}_{i=1}^{N}$, where $i$ is the index of a training image and $N$ is the number of images in the training set. $y_i^r$ denotes the label of the main feature point detection task, and the remaining labels denote the additional attribute tasks (head pose, gender, eyes, mouth). $y_i^r$ contains the coordinates of the 5 feature points, represented as a 10-dimensional vector; $y_i^p$ denotes the 5 head pose classes partitioned by yaw angle (0°, ±30°, ±60°); $y_i^g$ is binary, male or female; $y_i^e$ distinguishes wearing glasses, eyes open, and eyes closed; and $y_i^m$ distinguishes smiling, grinning, mouth closed, and mouth open. The objective function of the multi-task learning can then be expressed as:

$$\min_{W} \; \sum_{i=1}^{N} \left\| y_i^r - F(x_i; W^r) \right\|_2^2 \;-\; \sum_{a \in A} \lambda^a \sum_{i=1}^{N} \log p\left(y_i^a \mid x_i; W^a\right) \;+\; \sum_{t \in T} \left\| W^t \right\|_2^2$$

其中,in,

Figure BDA00016354462600000610
表示所有的特征向量的集合,F(xi;Wr)=(Wr)Txi是一个线性函数,表示根据第i个特征xi和训练得到的映射关系Wr对人脸主要特征点的位置进行计算的过程,其中,Wr表示从特征xi到真实的人脸主要特征点
Figure BDA00016354462600000611
之间的映射关系。
Figure BDA00016354462600000610
Represents the set of all feature vectors, F(x i ; W r )=(W r ) T x i is a linear function, which represents the main feature of the face according to the i-th feature x i and the mapping relationship W r obtained by training The process of calculating the position of the point, where W r represents the main feature point from the feature xi to the real face
Figure BDA00016354462600000611
the mapping relationship between them.

p(y_i^a = m | x_i; W^a) = exp((W_m^a)^T·x_i) / Σ_j exp((W_j^a)^T·x_i) is the posterior probability expressed with the softmax function, where W_j^a denotes the j-th column of the matrix W^a, and W^a denotes the mapping from the feature x_i to the label y_i^a of subtask a, i.e., the parameters of the maximum-likelihood estimation equation. m denotes a particular label of subtask a; for example, when a is the gender-estimation subtask, m can be 0 or 1.

For the estimation of subtask a, maximum-likelihood estimation is used to obtain the probability of each candidate label (each value of m). For example, when a is the gender-estimation subtask, m can be 0 or 1, i.e., the probabilities of male and female are computed separately. These probabilities are obtained from the maximum-likelihood estimate through the softmax function.
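As a minimal sketch, the softmax posterior over the labels of one subtask can be computed as below (numpy; the feature dimension and label count are illustrative, not taken from the patent):

```python
import numpy as np

def subtask_posterior(x, W_a):
    """Softmax posterior p(y^a = m | x; W^a) for every label m of subtask a.

    x   : feature vector of length d
    W_a : d x M matrix; column j holds the maximum-likelihood
          parameters for label m = j.
    """
    scores = W_a.T @ x                      # (W_j^a)^T x for each label j
    scores -= scores.max()                  # numerical stability
    e = np.exp(scores)
    return e / e.sum()

# Toy gender subtask: M = 2 labels (0 and 1), d = 4 features.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
W_a = rng.normal(size=(4, 2))
p = subtask_posterior(x, W_a)
```

The returned vector contains one probability per label of the subtask, summing to 1.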

W_m^a holds the maximum-likelihood parameters corresponding to label m; for example, W_1^a holds the maximum-likelihood parameters for m = 1.

Σ_{t∈T} ||W^t|| is the regularization term, penalizing the parameters W = {W^r, {W^a}}; a denotes a task belonging to A, the set of all additional attribute detection tasks excluding feature-point detection; T denotes all tasks, i.e., the main task (main feature-point detection) together with A; and t is the task index.

λ_a denotes the weight of each subtask in the overall objective function (the weight of the main feature-point estimation task is 1).
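Under the stated formulation (linear landmark regressor, softmax subtask classifiers), evaluating the overall multi-task objective can be sketched as follows; all dimensions and the single "gender" subtask are illustrative, not from the patent:

```python
import numpy as np

def multitask_loss(X, Yr, Ya, Wr, Was, lambdas):
    """Value of the multi-task objective for one batch.

    X       : N x d features
    Yr      : N x 10 landmark labels (5 points, 10-dim vectors)
    Ya      : dict task -> N integer labels
    Wr      : d x 10 landmark regression matrix
    Was     : dict task -> d x M_a softmax weight matrices
    lambdas : dict task -> weight lambda_a (main task weight is 1)
    """
    # main task: squared error of the linear landmark regressor F(x; Wr)
    loss = ((Yr - X @ Wr) ** 2).sum()
    # subtasks: weighted negative log-likelihood under softmax posteriors
    for a, Wa in Was.items():
        scores = X @ Wa
        scores -= scores.max(axis=1, keepdims=True)      # stability
        logp = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        loss -= lambdas[a] * logp[np.arange(len(X)), Ya[a]].sum()
    # L2 penalty over all task parameters W = {Wr, {Wa}}
    loss += (Wr ** 2).sum() + sum((Wa ** 2).sum() for Wa in Was.values())
    return loss

rng = np.random.default_rng(1)
N, d = 8, 6
X = rng.normal(size=(N, d))
Yr = rng.normal(size=(N, 10))
Ya = {"gender": rng.integers(0, 2, size=N)}
Wr = rng.normal(size=(d, 10))
Was = {"gender": rng.normal(size=(d, 2))}
val = multitask_loss(X, Yr, Ya, Wr, Was, {"gender": 0.5})
```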

Step 2: Build a head pose classification model based on a convolutional neural network to accurately estimate and classify the head pose of the face.

The purpose of building the CNN-based head pose classification model is that, for any input face image, the model finely estimates and classifies the head pose of the face.

In this step, the CNN-based head pose classification model first builds a 3D model of the face to obtain its angle parameters (pitch, yaw, roll), then classifies the face images according to the value ranges of these parameters. For the classified images, the face profiling method generates images of the other classes, yielding a new image set, which serves as the training set of the CNN-based head pose classification model.

Training the CNN-based head pose classification model requires a large training set, and the existing image set (300-W) is too small, so it must be expanded. In this embodiment the training set is enlarged by synthesizing face images of different poses with the face profiling method: the images are first 3D-modeled and classified, new images of other poses (classes) are then synthesized by face profiling, and severely distorted synthetic images are removed.

The specific implementation of the embodiment is as follows:

1) The CNN-based head pose classification model first builds a 3D model of each face in the 300-W image set [17] to obtain the angle parameters of the face: pitch, yaw, and roll.

For the 3D modeling of a 2D face image, the 3DMM (3D morphable face model) proposed by Blanz et al. [20] is adopted, which uses PCA dimensionality reduction to describe the 3D face shape space:

V = V̄ + A_id·a_id + A_exp·a_exp

where V denotes a 3D face, V̄ denotes the 3D mean face, A_id and A_exp are the shape and expression principal components of the 3D face shape space (taken from the BFM model [21] and FaceWarehouse [22], respectively), and a_id and a_exp are the shape and expression parameters. The 3D face is then projected onto the 2D plane by weak perspective projection, according to the correspondence between the 2D face landmarks and the landmarks of the 3D mean face, and the angle parameters of the face pose are estimated:

S = f·Pr·R·V + t_2d

where R is the rotation matrix constructed from the pitch, yaw, and roll angle parameters, t_2d is the translation vector, f is the scaling factor, and Pr = [1 0 0; 0 1 0] is the orthographic projection matrix. S is the 2D shape obtained by projecting the 3D face (generally representing the face shape as a matrix of feature-point coordinates arranged in a fixed order). A more accurate R, i.e., more accurate pitch, yaw, and roll angles, can be obtained through several iterations and feature-point matching.
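The weak perspective projection of a 3D face onto the 2D plane can be sketched as below; the Euler-angle convention used to build R is an assumption, since the patent does not fix one:

```python
import numpy as np

def rotation_matrix(pitch, yaw, roll):
    """3x3 rotation from Euler angles in radians (x-y-z convention assumed)."""
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cz, sz = np.cos(roll), np.sin(roll)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def weak_perspective(V, f, R, t2d):
    """S = f * Pr * R * V + t_2d for a 3 x n matrix V of 3-D landmarks."""
    Pr = np.array([[1.0, 0, 0], [0, 1.0, 0]])   # orthographic projection
    return f * (Pr @ R @ V) + t2d.reshape(2, 1)

# Two toy 3-D points projected with identity rotation, scale 2, shift (5, 5).
V = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])
S = weak_perspective(V, f=2.0, R=rotation_matrix(0, 0, 0),
                     t2d=np.array([5.0, 5.0]))
```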

2) Classify the face images of the 300-W set according to the value ranges of the angle parameters, and compute the average face shape of each class.

The 300-W image set is classified according to the estimated angle parameters of the face pose. The face images are first divided into 3 classes by the pitch angle ([−45°, −15°], [−15°, 15°], [15°, 45°]); each of these is then divided into 5 classes by the yaw angle ([−50°, 50°] split evenly into 5 intervals), producing 15 classes; and each of those is finally divided into 5 classes by the roll angle ([−50°, 50°] split evenly into 5 intervals), for a total of 75 classes. Let c denote the class; {S_c} denotes the set of images of class c, and a face shape subspace is built for the face images of each class. Under these classification rules, every 5 adjacent classes (c = 5j+1 … 5j+5, where j = 0 … 14) have similar pitch and yaw angles and differ mainly in roll angle. Moreover, most training images exhibit only small angle variation, and their roll angles are likewise small, so class c3 = 5j+3 contains relatively many images while classes c1, c2, c4, and c5 contain few. Therefore the average face shape of the c3 images is computed first and then rotated by the appropriate angle to obtain the average face shapes of the other four classes (c1, c2, c4, c5); this avoids the situation where a class with too few images yields an unrepresentative average shape and hence a large initialization error. In addition, when computing the average face shape of the c3 images, the images are further divided by the additional face attributes y_i^eye and y_i^mouth, i.e., eyes open (wearing glasses) or closed, and mouth open (laughing) or closed (grinning), so that each class has four average shapes. If the image set of some subdivision is empty, the average shape with the smallest error among the four is selected for it; for instance, if a class contains no images with eyes closed and mouth closed, the average shape with eyes open and mouth closed is preferred. This produces 300 face subspaces, and the average face shape of each subspace is denoted S̄_γ^c, where γ ∈ {0, 1, 2, 3}.
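The 75-class binning by (pitch, yaw, roll) ranges can be sketched as a small lookup function; the class ordering below (pitch-major, then yaw, then roll) is an assumption for illustration:

```python
import numpy as np

def pose_class(pitch, yaw, roll):
    """Map (pitch, yaw, roll) in degrees to one of the 75 pose classes.

    Binning follows the ranges in the text: pitch in [-45, 45] split into 3,
    yaw and roll in [-50, 50] each split evenly into 5 intervals.
    """
    p = min(int((pitch + 45) // 30), 2)          # 3 pitch bins of 30 degrees
    y = min(int((yaw + 50) // 20), 4)            # 5 yaw bins of 20 degrees
    r = min(int((roll + 50) // 20), 4)           # 5 roll bins of 20 degrees
    return p * 25 + y * 5 + r                    # class index in [0, 74]

# Sampling one angle per bin should reach all 75 classes.
classes = {pose_class(p, y, r)
           for p in (-40, 0, 40)
           for y in (-45, -25, 0, 25, 45)
           for r in (-45, -25, 0, 25, 45)}
```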

3) Using the 3D face model constructed above and the face profiling method [23], synthesize new images of different poses (classes) to expand the training set and train the head pose classification model.

In practice it is infeasible to estimate the face angle parameters of every image with the above method and then classify it, as this would consume a great deal of time and space. This embodiment therefore builds a CNN-based head pose classification model that directly outputs the classification result for any input image. Training this model requires a large image set, so the face profiling method is used to synthesize face images of different poses; severely distorted synthetic images are removed, leaving about 1,000 images per class, 75,000 images in total, labeled {I_k^c}. Of these, 67,500 are selected as the training set and the remaining 7,500 as the validation set. All training images are resized to 96×96 as the input of the convolutional neural network; see Figure 2. The kernel of convolution layer 1 is relatively large (kernel size 11) in order to filter out noise and extract useful information quickly. The kernels of convolution layers 2 and 3 shrink gradually, because the filtered feature information needs to be processed repeatedly to obtain more accurate features. The fully connected layer uses a dropout strategy: during training, the weights of some randomly chosen hidden nodes are disabled, and the weights of the disabled nodes are temporarily stored for later sample inputs; this serves as a strategy against overfitting when training samples are few. The process of training the convolutional neural network can be expressed as

net = argmin_net Σ_{k=1…N2} ||c_k − cnn(I_k^c)||²

where c_k denotes the classification result of the k-th image I_k^c after expansion, {I_k^c} denotes the expanded image set, N2 is the number of images in the expanded set, cnn() denotes the head pose classification model before training, and net denotes the trained convolutional neural network parameters. The forward computation of the convolutional neural network in the test phase can be expressed as:

c = cnn(I; net)

where I denotes a test image and c is the classification result predicted by the network net; in this way, the image can be classified in the test phase without the coordinates of its ground-truth face shape.
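The dropout ("drop") strategy described for the fully connected layer can be sketched as inverted dropout, a common formulation; the rate and layer sizes below are illustrative, not from the patent:

```python
import numpy as np

def dropout(h, rate, rng, train=True):
    """Randomly disable hidden activations during training (inverted dropout).

    At test time the layer is the identity, so no rescaling is needed there;
    kept activations are rescaled by 1/(1-rate) during training instead.
    """
    if not train:
        return h
    mask = rng.random(h.shape) >= rate          # kept nodes
    return h * mask / (1.0 - rate)              # rescale to keep expectation

rng = np.random.default_rng(0)
h = np.ones((4, 8))                             # toy hidden activations
h_train = dropout(h, rate=0.5, rng=rng, train=True)
h_test = dropout(h, rate=0.5, rng=rng, train=False)
```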

Step 3: Obtain a more accurate initialization shape from the head pose classification result, the facial expression (which determines the face shape), and the positions of the main feature points (which assist face localization).

In this step, the initialization shape of the image is constructed from the classification result of the CNN-based head pose classification model, combined with the positions of the main facial feature points and the estimates of the other subtasks obtained in step 1.

The specific implementation of step 3 is as follows:

After preprocessing, the image is fed to the network to obtain its output class c. Then, according to the detection results of the main facial feature points in step 1, the corresponding average face shape S̄_γ^c is selected and adjusted (rotated and translated) according to the positions of the main feature points, so that the error between the 5 main feature points of S̄_γ^c and the detected main feature points of the face is minimized. This yields the initialization shape S_i of the face in the image:

S_i = f·R·S̄_γ^c + t_2d

where R = [cos θ, −sin θ; sin θ, cos θ] is the rotation matrix, θ denotes the rotation angle, t_2d is the translation vector, and f is the scaling factor.

From S̄_γ^c, the averaged coordinates of the left and right eyes, together with the coordinates of the nose tip and the left and right mouth corners, are taken and denoted by the vector y_cγ, so that its error with respect to the detection result y_r of the main facial feature points from step 1 is minimized:

(f, R, t_2d) = argmin ||y_r − (f·R·y_cγ + t_2d)||²
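Fitting the scale, rotation, and translation that best align the selected average shape's key points to the detected points y_r is a 2D similarity Procrustes problem; a least-squares sketch (array shapes illustrative):

```python
import numpy as np

def fit_similarity(src, dst):
    """Find f, R, t minimizing ||dst - (f * R @ src + t)||^2.

    src, dst : 2 x n arrays of corresponding 2-D points.
    Classic Procrustes solution via SVD of the cross-covariance
    (a proper rotation with det +1 is assumed for the clean-data case).
    """
    mu_s = src.mean(axis=1, keepdims=True)
    mu_d = dst.mean(axis=1, keepdims=True)
    s, d = src - mu_s, dst - mu_d
    U, sig, Vt = np.linalg.svd(d @ s.T)
    R = U @ Vt                                   # optimal rotation
    f = sig.sum() / (s ** 2).sum()               # optimal scale
    t = mu_d - f * R @ mu_s                      # optimal translation
    return f, R, t

# Sanity check: recover a known similarity transform from 5 toy points.
rng = np.random.default_rng(2)
src = rng.normal(size=(2, 5))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
dst = 1.5 * R_true @ src + np.array([[2.0], [1.0]])
f, R, t = fit_similarity(src, dst)
```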

Step 4: The face shape constructed above serves as the final initialization shape; according to the pose and expression classification results, separate regressors are trained, and the face shape is updated to approximate the ground-truth shape.

In the embodiment, the specific implementation of step 4 is as follows:

A more advanced cascaded regression framework is adopted to process the initialization shape and progressively approach the true face shape.

First, the optimization space of the face alignment problem is partitioned into domains so that the face shapes within each domain are similar and therefore share the same gradient-descent direction during regressor training; each domain trains its own regressor. The objective function for training the regressor of a domain is:

W_t^m = argmin Σ_{i∈Ω_m} ||ΔS_i^t − W_t^m·Φ^t(I_i, S_i^{t−1})||² + η||W_t^m||²

where W_t^m denotes the regressor of the m-th domain at stage t, corresponding to the regression matrix of stage t; Ω_m denotes the set of images assigned to the m-th domain; Φ^t(I_i, S_i^{t−1}) denotes the global binary features extracted at stage t from image I_i according to the face shape S_i^{t−1} of stage t−1; the second part is the L2 regularization term on W_t^m, with η controlling the regularization strength; and ΔS_i^t denotes the error between S_i^{t−1} and the true face shape.
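Training each domain regressor is a ridge regression from features to shape residuals; a closed-form sketch (dense features for illustration, where the patent uses binary features):

```python
import numpy as np

def train_domain_regressor(Phi, dS, eta):
    """Solve W = argmin ||dS - Phi @ W||^2 + eta ||W||^2 in closed form.

    Phi : N x F feature matrix (rows: images of the domain Omega_m)
    dS  : N x 2L target shape residuals (true shape minus current shape)
    Solves the regularized normal equations (Phi^T Phi + eta I) W = Phi^T dS.
    """
    F = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + eta * np.eye(F), Phi.T @ dS)

# Toy check: with residuals generated by a known matrix and tiny eta,
# the recovered regressor should match it closely.
rng = np.random.default_rng(3)
Phi = (rng.random((50, 12)) > 0.5).astype(float)   # toy binary features
W_true = rng.normal(size=(12, 10))
dS = Phi @ W_true
W = train_domain_regressor(Phi, dS, eta=1e-8)
```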

When updating a face shape, first determine which domain the shape belongs to, then update it with the regressor of the corresponding domain. The update process of the face shape is:

S_i^t = S_i^{t−1} + W_t^m·Φ^t(I_i, S_i^{t−1})

where S_i^t denotes the face shape newly estimated at stage t, S_i^{t−1} denotes the face shape estimated at the previous stage, and T is the total number of cascade stages (a recommended value is 8).
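The T-stage cascaded update can be sketched end-to-end as follows, with a hypothetical feature extractor and domain-assignment function standing in for the patent's global binary features and domain partition:

```python
import numpy as np

def cascade_align(image, S0, regressors, extract_features, assign_domain, T=8):
    """Refine an initial shape S0 over T cascade stages.

    regressors[t][m]            : regression matrix of domain m at stage t
    extract_features(image, S)  : feature vector for the current shape S
    assign_domain(S)            : index of the domain the shape falls in
    """
    S = S0.copy()
    for t in range(T):
        m = assign_domain(S)                     # pick the matching domain
        phi = extract_features(image, S)
        S = S + phi @ regressors[t][m]           # S^t = S^{t-1} + W_t^m Phi^t
    return S

# Toy run: one domain, constant features, constant 0.1 regression weights.
F, L2 = 4, 10
regs = [{0: np.full((F, L2), 0.1)} for _ in range(8)]
S_final = cascade_align(
    image=None,
    S0=np.zeros(L2),
    regressors=regs,
    extract_features=lambda img, S: np.ones(F),
    assign_domain=lambda S: 0,
)
```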

The training phase follows steps 1 to 4 above, and the regressors (regression matrices) of the different domains are obtained through training. In the test phase, a new initialization face shape is first constructed following steps 1 to 3, and the face shape is then updated with the regressor of the corresponding domain (obtained in the training phase); see Figure 1.

Compared with currently popular face alignment methods, this embodiment achieves improved accuracy; see Figure 3. Figure 3(a) shows the face alignment results of this embodiment (CFSE) and of the popular algorithms LBF [11], ESR [9], and SDM [8] on the same test images. Figure 3(b) shows the results of this embodiment under real surveillance conditions (first four images) and on different datasets (the last row corresponds to the 194 feature points of the Helen dataset, the rest to the 68 feature points of images from the iBUG dataset).

In specific implementations, the above process can be run automatically by means of computer software.

It should be understood that the above embodiments are intended only to illustrate the present invention, not to limit its scope. It should further be understood that, after reading the teachings of the present invention, those skilled in the art may make various changes or modifications without departing from the spirit and scope of its technical solutions, all of which fall within the scope of the claims of the present invention.

Claims (4)

1. A face alignment method based on coarse-to-fine face shape estimation, characterized in that: for any input face image, an initialization face shape is first estimated and then progressively refined toward the true face shape, comprising the following steps:
Step 1, using a multi-task deep learning framework to estimate the positions of the main facial feature points and the facial expression;
Step 2, building a head pose classification model based on a convolutional neural network to accurately estimate and classify the head pose of the face;
Step 3, obtaining a more accurate initialization shape from the head pose classification result of step 2 together with the estimates of the facial expression and the positions of the main feature points from step 1;
implemented by feeding the image to the CNN-based head pose classification model to obtain its output class c, selecting the corresponding average face shape, and adjusting it according to the positions of the main feature points so that the error between its main feature points and the detected main feature points of the face is minimized, thereby obtaining the initialization shape S_i of the face in the image;
Step 4, based on the initialization shape obtained in step 3, training separate regressors according to the pose and expression classification results, and updating the face shape to approximate the ground-truth shape.
2. The face alignment method based on coarse-to-fine face shape estimation according to claim 1, characterized in that: in step 1, the multi-task learning of the multi-task deep learning framework comprises a main task, the estimation of the main facial feature points, and the estimation of other subtasks, wherein the main facial feature points comprise the left and right mouth corners, the nose tip, and the centers of the left and right eyes, and the subtasks comprise the estimation of head pose, gender, eye shape, and mouth shape.
3. The face alignment method based on coarse-to-fine face shape estimation according to claim 1, characterized in that: in step 2, the CNN-based head pose classification model first builds a 3D model of the face to obtain its angle parameters pitch, yaw, and roll, then classifies the face images according to the value ranges of these parameters; for the classified images, the face profiling method generates images of the other classes, yielding a new image set; this new image set serves as the training set of the CNN-based head pose classification model to complete its training.
4. The face alignment method based on coarse-to-fine face shape estimation according to claim 1, 2, or 3, characterized in that: in step 4, a more advanced cascaded regression framework is adopted: the optimization space of the face alignment problem is first partitioned into domains so that the face shapes within each domain are similar and share the same gradient-descent direction during regressor training, and each domain trains its own regressor; when updating a face shape, its domain is first determined and the regressor of the corresponding domain is then applied.
CN201810358918.1A 2018-04-20 2018-04-20 Face alignment method based on shape estimation of coarse face to fine face Expired - Fee Related CN108446672B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810358918.1A CN108446672B (en) 2018-04-20 2018-04-20 Face alignment method based on shape estimation of coarse face to fine face

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810358918.1A CN108446672B (en) 2018-04-20 2018-04-20 Face alignment method based on shape estimation of coarse face to fine face

Publications (2)

Publication Number Publication Date
CN108446672A CN108446672A (en) 2018-08-24
CN108446672B true CN108446672B (en) 2021-12-17

Family

ID=63201089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810358918.1A Expired - Fee Related CN108446672B (en) 2018-04-20 2018-04-20 Face alignment method based on shape estimation of coarse face to fine face

Country Status (1)

Country Link
CN (1) CN108446672B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902716B (en) * 2019-01-22 2021-01-29 厦门美图之家科技有限公司 Training method for alignment classification model and image classification method
CN109934129B (en) * 2019-02-27 2023-05-30 嘉兴学院 Face feature point positioning method, device, computer equipment and storage medium
CN111444787B (en) * 2020-03-12 2023-04-07 江西赣鄱云新型智慧城市技术研究有限公司 Fully intelligent facial expression recognition method and system with gender constraint
CN111951175A (en) * 2020-06-28 2020-11-17 中国电子科技网络信息安全有限公司 Face image normalization method based on self-coding network
CN112307899A (en) * 2020-09-27 2021-02-02 中国科学院宁波材料技术与工程研究所 Facial posture detection and correction method and system based on deep learning
CN112417991B (en) * 2020-11-02 2022-04-29 武汉大学 A dual-attention face alignment method based on hourglass capsule network
CN112270308B (en) * 2020-11-20 2021-07-16 江南大学 A facial feature point localization method based on two-layer cascaded regression model
CN113837932A (en) * 2021-09-28 2021-12-24 深圳市商汤科技有限公司 Face generation method, face recognition method and device

Citations (6)

Publication number Priority date Publication date Assignee Title
CN1794265A (en) * 2005-12-31 2006-06-28 北京中星微电子有限公司 Method and device for distinguishing face expression based on video frequency
CN104598936A (en) * 2015-02-28 2015-05-06 北京畅景立达软件技术有限公司 Human face image face key point positioning method
CN104657713A (en) * 2015-02-09 2015-05-27 浙江大学 Three-dimensional face calibrating method capable of resisting posture and facial expression changes
CN105512638A (en) * 2015-12-24 2016-04-20 黄江 Fused featured-based face detection and alignment method
CN107038429A (en) * 2017-05-03 2017-08-11 四川云图睿视科技有限公司 A kind of multitask cascade face alignment method based on deep learning
CN107563323A (en) * 2017-08-30 2018-01-09 华中科技大学 A kind of video human face characteristic point positioning method

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US8798374B2 (en) * 2008-08-26 2014-08-05 The Regents Of The University Of California Automated facial action coding system
US9633250B2 (en) * 2015-09-21 2017-04-25 Mitsubishi Electric Research Laboratories, Inc. Method for estimating locations of facial landmarks in an image of a face using globally aligned regression

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
CN1794265A (en) * 2005-12-31 2006-06-28 北京中星微电子有限公司 Method and device for distinguishing face expression based on video frequency
CN104657713A (en) * 2015-02-09 2015-05-27 浙江大学 Three-dimensional face calibrating method capable of resisting posture and facial expression changes
CN104598936A (en) * 2015-02-28 2015-05-06 北京畅景立达软件技术有限公司 Human face image face key point positioning method
CN105512638A (en) * 2015-12-24 2016-04-20 黄江 Fused featured-based face detection and alignment method
CN107038429A (en) * 2017-05-03 2017-08-11 四川云图睿视科技有限公司 A kind of multitask cascade face alignment method based on deep learning
CN107563323A (en) * 2017-08-30 2018-01-09 华中科技大学 A kind of video human face characteristic point positioning method

Also Published As

Publication number Publication date
CN108446672A (en) 2018-08-24

Similar Documents

Publication Publication Date Title
CN108446672B (en) Face alignment method based on shape estimation of coarse face to fine face
CN112800903B (en) Dynamic expression recognition method and system based on space-time diagram convolutional neural network
Cheng et al. Fast and accurate online video object segmentation via tracking parts
CN109359526B (en) Human face posture estimation method, device and equipment
CN109359608B (en) A face recognition method based on deep learning model
Zhang et al. Content-adaptive sketch portrait generation by decompositional representation learning
Tang et al. Facial landmark detection by semi-supervised deep learning
Zheng et al. Attention-based spatial-temporal multi-scale network for face anti-spoofing
CN101499128A (en) Three-dimensional human face action detecting and tracing method based on video stream
Yang et al. CNN based 3D facial expression recognition using masking and landmark features
CN112232184B (en) Multi-angle face recognition method based on deep learning and space conversion network
CN104392241B (en) A kind of head pose estimation method returned based on mixing
CN107808129A (en) A kind of facial multi-characteristic points localization method based on single convolutional neural networks
CN105719285A (en) Pedestrian detection method based on directional chamfering distance characteristics
CN102654903A (en) Face comparison method
CN107066916A (en) Scene Semantics dividing method based on deconvolution neutral net
Uřičář et al. Real-time multi-view facial landmark detector learned by the structured output SVM
CN110569724A (en) A Face Alignment Method Based on Residual Hourglass Network
CN106557750A (en) It is a kind of based on the colour of skin and the method for detecting human face of depth y-bend characteristics tree
CN110909778B (en) An image semantic feature matching method based on geometric consistency
CN110852327A (en) Image processing method, device, electronic device and storage medium
CN110880010A (en) Visual SLAM closed loop detection algorithm based on convolutional neural network
CN112036260A (en) An expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN111160119B (en) A multi-task deep discriminative metric learning model building method for makeup face verification
Chen et al. A multi-scale fusion convolutional neural network for face detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20211217
