CN112270308A - A facial feature point localization method based on two-layer cascaded regression model - Google Patents
- Publication number: CN112270308A (application CN202011305067.8A)
- Authority: CN (China)
- Prior art keywords: shape, face shape, face, regression model, feature points
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
Abstract
Description
Technical Field
The present invention relates to the technical field of computer image processing, and in particular to a facial feature point localization method based on a two-layer cascaded regression model.
Background
The goal of facial feature point localization is to locate, on top of completed face detection, a more specific face shape: the eyebrows, eyes, nose, mouth, contour, and so on. Facial feature point localization is an important step in face image processing tasks and plays a vital role in subsequent work such as face recognition, facial expression analysis, and 3D face reconstruction. Most current facial feature point localization algorithms already achieve satisfactory results on frontal face images, but for unconstrained face images with large variations in expression and pose, high-precision facial feature point localization remains challenging.
Algorithms for facial feature point localization fall mainly into two categories. (1) Methods based on generative models. A representative algorithm is the active appearance model (AAM) proposed by Cootes et al., which first uses principal component analysis (PCA) to build parametric models in the shape and texture feature spaces, and then optimizes the parameters to match the model to the face image. Although this type of method has received various improvements, the expressive power of a parametric model is ultimately limited and cannot handle subtle shape changes. (2) Methods based on discriminative models. The goal of such methods is to map directly from extracted image features to facial feature point coordinates. The cascaded regression model is currently the most widely used model in this field; it first appeared in the cascade pose regression (CPR) method proposed by Dollár et al. Algorithms based on cascaded regression treat facial feature point localization as a nonlinear optimization problem between image texture features and face shape, and gradually update an initial shape toward the final shape by learning a sequence of feature-to-shape mappings.
In addition to cascaded regression models, methods based on deep networks have also received extensive attention in recent years. The earliest application of deep networks to facial feature point localization is the algorithm based on deep convolutional neural networks (DCNN) proposed by Sun et al., which likewise uses a cascaded, coarse-to-fine scheme. In recent years, algorithms that locate facial feature points with the help of 3D face models have also appeared. Such methods fit the model to the face image by adjusting its parameters and then project the fitting result onto the 2D plane to obtain the localization result. Because 3D face models have many parameters, most of these algorithms also rely on deep networks. Although deep networks have greatly improved the performance of facial feature point localization algorithms, many excellent cascaded-regression algorithms continue to emerge; with relatively low training cost, they can in some cases even outperform deep-learning-based algorithms.
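The cascaded-regression idea described above — learn a sequence of regressors, each mapping shape-indexed features to a shape update — can be illustrated with a small synthetic sketch. Everything below is a toy stand-in (no real images, landmarks, or PHOG features): the "image evidence" is a noisy copy of the true shape, and the hypothetical feature extractor samples it relative to the current estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cascaded regression: each stage regresses the shape residual from
# features sampled relative to the current shape estimate, then applies
# the predicted update.
N, D_shape, D_feat, T = 200, 28, 32, 5   # 14 points x 2 coords = 28
true_shapes = rng.normal(size=(N, D_shape))
obs = true_shapes + 0.1 * rng.normal(size=(N, D_shape))  # the "image" evidence
B = 0.1 * rng.normal(size=(D_shape, D_feat))

def shape_indexed_features(cur):
    # hypothetical feature extractor: image content sampled around the
    # current shape estimate (tanh keeps it mildly nonlinear)
    return np.tanh((obs - cur) @ B)

current = np.tile(true_shapes.mean(axis=0), (N, 1))  # mean-shape initialization
initial_err = np.abs(true_shapes - current).mean()

for t in range(T):
    X = shape_indexed_features(current)
    Y = true_shapes - current                      # residual to regress
    W = np.linalg.solve(X.T @ X + 1e-3 * np.eye(D_feat), X.T @ Y)  # ridge regressor
    current = current + X @ W                      # cascade update

final_err = np.abs(true_shapes - current).mean()
print(final_err < initial_err)   # the cascade shrinks the shape error
```

Because each stage re-extracts features at the updated shape, later stages see progressively easier residuals — the same property that makes the coarse-to-fine structure work in CPR-style methods.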
Shape initialization is the first step of cascaded-regression facial feature point localization. Many traditional cascaded regression algorithms take the mean of the ground-truth shapes of all training samples, or a frontal neutral face shape, as the initial shape, and then iteratively update it to reach the final prediction. With such an initial shape, faces with large pose and expression variations are usually hard to localize accurately: when the initial shape differs greatly from the true shape, the cascaded regression model may fail because of the limited number of iterations, or may fall into a local optimum, yielding an unsatisfactory final result. Xiong et al. proposed dividing the samples into domains of homogeneous descent (DHD) and building a separate regression model for each, so as to keep the regression from entering a local optimum; however, their partitioning method requires the ground-truth face shape at computation time, which is unrealistic during testing. Coarse-to-fine shape searching (CFSS) explores the ground-truth shapes of all samples as a solution space; it removes the limitation of the initial shape but also reduces the computation speed.
Summary of the Invention
The purpose of this section is to outline some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section and in the abstract and title of the application to avoid obscuring their purpose; such simplifications or omissions cannot be used to limit the scope of the invention.
The present invention has been made in view of the above existing problems.
Therefore, the technical problems solved by the present invention are: the large pose variation of unconstrained face images during facial feature point localization, and the limitation that the initial shape imposes on the regression result.
To solve the above technical problems, the present invention provides the following technical solution, comprising: augmenting the samples by randomly selecting face shapes from the training sample set as initial shapes;
From the complete 68-point face shape, selecting the 14 points that play a key role in facial feature point localization to form a simplified face shape;
Using a subspace division method to divide the samples into multiple subsets and training a regressor on each subset, all regressors together forming the first-layer cascaded regression model, which predicts the simplified face shape;
Using 3D fitting to project the simplified face shape predicted by the first layer to the complete face shape, generating a rough initial complete face shape;
Taking the rough complete face shape as the initial shape and training a regressor on all training samples to form the second-layer cascaded regression model; the two layers together form the two-layer cascaded regression model, which predicts the complete face shape.
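The first two solution steps above — augmenting the training set with randomly drawn initial shapes, and reducing the 68-point shape to 14 points — amount to simple array operations. The sketch below uses hypothetical landmark indices, since the patent fixes only the count (14) and names a few members of the set (eye centers, nose tip, mouth corners).

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
gt_shapes = rng.normal(size=(N, 68, 2))   # ground-truth 68-point training shapes

# Step 1: augment the training set by drawing initial shapes at random from
# the ground truth of *other* samples instead of always using the mean shape.
def sample_initial_shapes(i, m=3):
    others = np.array([j for j in range(N) if j != i])
    picks = rng.choice(others, size=m, replace=False)
    return gt_shapes[picks]

# Step 2: keep 14 of the 68 points as the simplified shape. The indices below
# are a hypothetical choice, not the patent's actual selection.
SIMPLIFIED_IDX = [0, 4, 8, 12, 16, 17, 26, 36, 45, 30, 33, 48, 54, 57]

inits = sample_initial_shapes(0)
simplified = gt_shapes[:, SIMPLIFIED_IDX, :]
print(inits.shape, simplified.shape)   # (3, 68, 2) (50, 14, 2)
```

Drawing initial shapes from other samples both multiplies the effective training set and exposes each regressor to realistic initialization error, which is the point of the augmentation step.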
As a preferred scheme of the facial feature point localization method based on the two-layer cascaded regression model of the present invention, the steps of constructing the two-layer cascaded regression model include:
First, extracting the PHOG-based shape-indexed features, projecting them into the fusion subspace with the projection matrix P obtained during training, and dividing the samples into K subsets accordingly;
Then, training a feature mapping function and a regressor separately for each subset, and obtaining the simplified face shape after T regression iterations;
Then, obtaining the affine transformation parameters with the 3D fitting method and projecting to obtain the rough complete face shape s0;
Finally, training the feature extraction function Φ and the regressor W on all training samples and the rough complete face shape obtained in the previous step, and achieving the fine regression after T iterations.
As a preferred scheme of the facial feature point localization method based on the two-layer cascaded regression model of the present invention, the steps of selecting the optimal simplified face shape include:
Based on the 5 most representative contour feature points, adding some feature points at the eyes, nose, mouth, and contour, respectively, to pick out 6 candidate simplified face shapes;
The 5 most representative contour feature points include the centers of the two eyes, the nose tip, and the mouth corners.
As a preferred scheme of the facial feature point localization method based on the two-layer cascaded regression model of the present invention, the steps of selecting the optimal simplified face shape further include:
Selecting the most suitable simplified face shape, including computing the localization error and the fitting error;
Comparing the localization error and the fitting error, and selecting the 14-point shape for which both are smallest as the simplified face shape.
As a preferred scheme of the facial feature point localization method based on the two-layer cascaded regression model of the present invention, the localization error is computed by training a single-layer cascaded regression model on each candidate simplified face shape and computing each model's normalized standard error on the test-set samples;
The fitting error is computed, from the localization result obtained for the localization error and the complete face shape generated by the 3D fitting method, as the mean normalized error between the generated complete face shape and the sample's ground-truth face shape.
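Both error measures above reduce to a mean point-to-point distance divided by a normalizer. The patent does not spell out its normalizer; normalizing by the inter-ocular distance (here, the outer eye corners at iBUG indices 36 and 45) is the common convention for 68-point benchmarks and is assumed in this sketch.

```python
import numpy as np

def normalized_error(pred, gt, l_idx=36, r_idx=45):
    """Mean point-to-point distance normalized by inter-ocular distance
    (the choice of normalizer is an assumption, not the patent's definition)."""
    iod = np.linalg.norm(gt[l_idx] - gt[r_idx])
    return np.linalg.norm(pred - gt, axis=1).mean() / iod

rng = np.random.default_rng(2)
gt = 50.0 * rng.normal(size=(68, 2))     # a ground-truth shape at roughly pixel scale
pred = gt + rng.normal(size=(68, 2))     # predictions roughly one unit off

print(normalized_error(gt, gt) == 0.0)   # perfect prediction -> zero error
err = normalized_error(pred, gt)
print(err > 0.0)
```

The same function serves for both the localization error (predicted vs. ground-truth simplified shape) and the fitting error (3D-projected complete shape vs. ground-truth complete shape), averaged over the evaluation set.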
As a preferred scheme of the facial feature point localization method based on the two-layer cascaded regression model of the present invention, the fusion subspace comprises:
Z = [z_{i,j}], i = 1, …, N, j = 1, …, r,
where r is the column dimension of the fusion subspace and z_{i,j} is the projection of the shape-indexed feature of the i-th sample onto the j-th dimension of the subspace.
As a preferred scheme of the facial feature point localization method based on the two-layer cascaded regression model of the present invention, the step of using the fusion subspace to divide the samples into K subsets includes:
Defining the face images in the training set and the corresponding set of ground-truth shapes {(Ii, si*)}, i = 1, …, N,
where N is the number of samples, and Ii and si* are the i-th face image and its ground-truth shape, respectively;
Matching the mean simplified face shape to the bounding boxes of all training samples, the matching result being denoted si0;
Computing the shape residuals Δsi = si* − si0,
as well as the PHOG-based shape-indexed features φi;
Analyzing the shape residuals and shape-indexed features with canonical correlation analysis, obtaining the corresponding projection matrices P and Q;
Dividing the subsets according to the signs of each sample's fusion-subspace coordinates (z_{i,1}, …, z_{i,r}).
As a preferred scheme of the facial feature point localization method based on the two-layer cascaded regression model of the present invention: if the i-th and the g-th samples have the same sign in every dimension of the fusion subspace, the two samples belong to the same subset Uk.
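The sign-based partition rule can be sketched directly: project the features into the fusion subspace, then group samples by the sign pattern of their r coordinates, so at most 2^r subsets occur. In the method the projection matrix P comes from the CCA step; a random matrix stands in for it here, since the grouping logic is the same either way.

```python
import numpy as np

rng = np.random.default_rng(3)
N, D, r = 100, 40, 3                 # r fusion-subspace dimensions -> at most 2**r subsets

features = rng.normal(size=(N, D))   # stand-in for PHOG shape-indexed features
P = rng.normal(size=(D, r))          # stand-in for the CCA-learned projection matrix

Z = features @ P                     # fusion-subspace coordinates, one row per sample

# Samples sharing the sign of every one of the r coordinates share a subset U_k;
# each sign pattern is encoded as an integer key.
keys = ((Z > 0).astype(int) * (2 ** np.arange(r))).sum(axis=1)
subsets = {k: np.flatnonzero(keys == k) for k in np.unique(keys)}

print(len(subsets) <= 2 ** r)                        # K <= 2**r subsets occur
print(sum(len(v) for v in subsets.values()) == N)    # the subsets partition all samples
```

At test time the same keys are computed from the projected features alone, which is exactly why the method discards the shape-residual projection Q and keeps only the feature-side projection P.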
As a preferred scheme of the facial feature point localization method based on the two-layer cascaded regression model of the present invention, the steps of using the 3D fitting method to generate the initial shape for the second-layer model include:
Using a 3DMM model consisting of a dense set of 3D coordinate points, extracting a 14-point coordinate set and a 68-point coordinate set, denoted the 3D simplified face shape and the 3D complete face shape, respectively;
Projecting the 3D simplified face shape onto the 2D plane by weak perspective projection, and fitting s3d to the face image, i.e.,
s = f · Po · R(α, β, γ) · s3d + t3d,
where f is the scaling factor, Po is the orthographic projection matrix, R(α, β, γ) is the 3×3 rotation matrix composed of the pitch angle α, yaw angle β, and roll angle γ, and t3d is the displacement vector; s is the projection on the 2D plane;
Denoting the 2D simplified face shape output by the first-layer cascaded regression model as s2d, determining suitable values of the parameters f, R, and t3d by minimizing the Euclidean distance between the projection of the 3D simplified face shape and s2d, thereby fitting the 3D face shape to the face image;
After the affine transformation parameters are computed by the least-squares method, replacing the 3D simplified face shape in the above weak perspective projection formula with the 3D complete face shape yields the projection, on the 2D plane, of the 3D complete face shape fitted to the face image.
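The fitting step just described — estimate the weak-perspective camera from the 14-point correspondence by least squares, then reuse it to project the full 68-point 3D shape — can be sketched with synthetic data. The 2×3 matrix M below absorbs the product f·Po·R, so the rotation (and hence the yaw β) is not factored out explicitly; that simplification is mine, not the patent's.

```python
import numpy as np

rng = np.random.default_rng(4)

def fit_weak_perspective(pts3d, pts2d):
    """Least-squares fit of a weak-perspective (affine) camera:
    pts2d ~ pts3d @ M.T + t, where the 2x3 matrix M absorbs f * Po * R."""
    n = len(pts3d)
    A = np.hstack([pts3d, np.ones((n, 1))])            # [X | 1]
    sol, *_ = np.linalg.lstsq(A, pts2d, rcond=None)    # (4, 2) solution block
    M, t = sol[:3].T, sol[3]
    return M, t

def project(pts3d, M, t):
    return pts3d @ M.T + t

# Synthetic check: generate a ground-truth camera, fit it from the 14 simplified
# points, then reuse it to project a 68-point shape (mirroring the patent's step).
s3d_simpl = rng.normal(size=(14, 3))
s3d_full = rng.normal(size=(68, 3))
M_true = rng.normal(size=(2, 3))
t_true = rng.normal(size=2)
obs2d = project(s3d_simpl, M_true, t_true)             # plays the role of s2d

M, t = fit_weak_perspective(s3d_simpl, obs2d)
full_proj = project(s3d_full, M, t)
print(np.allclose(full_proj, project(s3d_full, M_true, t_true), atol=1e-6))
```

With 14 noise-free correspondences the linear system is overdetermined and the least-squares solution recovers the camera exactly; with real first-layer predictions the fit is approximate, which is why the result serves only as a rough initial shape for the second layer.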
When the face pose changes, the feature points marked at the contour of the 3D face shape become occluded by the cheek, whereas the true contour feature points should lie on the cheek boundary.
The yaw angle β obtained in the above fitting process is used to judge whether the face turns to the left or to the right.
As a preferred scheme of the facial feature point localization method based on the two-layer cascaded regression model of the present invention, the step of using the 3D fitting method to generate the initial shape for the second-layer model further includes:
When the face turns to the left, each of the 8 contour feature points on the left side searches its corresponding subset of candidate coordinate points for the point with the smallest x coordinate, which becomes the new contour feature point;
When the face turns to the right, each of the 8 contour feature points on the right side searches its corresponding subset of candidate coordinate points for the point with the largest x coordinate, which is marked as the new contour feature point;
The 8 contour feature points are the 5 contour feature points plus 3 additional contour feature points.
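The contour-point correction above is a per-landmark extreme-x search over each landmark's parallel candidate subset. A minimal sketch follows; which sign of the yaw β corresponds to "left" is an assumption here (the patent only says β decides the side), and the candidate subsets are random stand-ins for the projected 3D point subsets.

```python
import numpy as np

rng = np.random.default_rng(5)

def update_contour(projected_subsets, yaw):
    """Replace each contour landmark with the extreme-x candidate from its
    parallel subset: smallest x when the face turns left (hypothetically
    yaw < 0), largest x when it turns right. Each element of
    projected_subsets is an (m, 2) array of 2D candidate projections."""
    pick = np.argmin if yaw < 0 else np.argmax
    return np.array([pts[pick(pts[:, 0])] for pts in projected_subsets])

subsets = [rng.normal(size=(10, 2)) for _ in range(8)]   # 8 contour candidate subsets
left = update_contour(subsets, yaw=-0.4)
right = update_contour(subsets, yaw=+0.4)
print(all(left[i, 0] <= right[i, 0] for i in range(8)))  # min-x never exceeds max-x
```

In the actual method the search would run only over the subsets on the occluded side (8 left or 8 right landmarks), moving each landmark out to the visible cheek boundary.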
Beneficial effects of the present invention: the present invention provides a facial feature point localization method based on a two-layer cascaded regression model; by combining the subspace division method and the 3D fitting method with the two-layer cascaded regression structure, the pose robustness of the facial feature point localization method is improved, which in turn improves the overall localization accuracy.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from them without creative effort. In the drawings:
Fig. 1 is a schematic flowchart of the facial feature point localization method based on the two-layer cascaded regression model;
Fig. 2 shows the localization errors and fitting errors of the different candidate simplified face shapes;
Fig. 3 shows the distribution of face images in a two-dimensional subspace;
Fig. 4 shows the influence of the value of K on the localization of the simplified face shape on different data sets;
Fig. 5 shows the influence of two subspace division methods on the localization of the simplified face shape on the 300-W data set;
Fig. 6 shows the influence of the value of K on the localization of the complete face shape on different data sets;
Fig. 7 shows the CED curves of the compared algorithms on the full 300-W set;
Fig. 8 shows the localization results of different prior-art algorithms on the 300-W test set;
Fig. 9 shows some localization results of the proposed algorithm on the 300-W test set.
Detailed Description
To make the above objects, features, and advantages of the present invention more apparent and understandable, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
Many specific details are set forth in the following description to facilitate a full understanding of the present invention, but the present invention can also be implemented in ways other than those described here, and those skilled in the art can make similar generalizations without departing from the spirit of the present invention; the present invention is therefore not limited by the specific embodiments disclosed below.
Secondly, reference herein to "one embodiment" or "an embodiment" means a particular feature, structure, or characteristic that may be included in at least one implementation of the present invention. The appearances of "in one embodiment" in various places in this specification do not all refer to the same embodiment, nor are they separate or alternative embodiments mutually exclusive of other embodiments.
The present invention is described in detail with reference to schematic diagrams. When describing the embodiments in detail, for convenience of explanation, the cross-sectional views showing the device structure may be partially enlarged out of scale; the schematic diagrams are only examples and should not limit the protection scope of the present invention. In addition, the three-dimensional dimensions of length, width, and depth should be included in actual manufacture.
Meanwhile, in the description of the present invention, it should be noted that orientation or positional terms such as "upper, lower, inner, and outer" are based on the orientations or positional relationships shown in the drawings, are only for convenience and simplification of description, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they therefore cannot be understood as limiting the present invention. Furthermore, the terms "first, second, or third" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance.
Unless otherwise expressly specified and limited in the present invention, the terms "mounted, connected, and coupled" should be understood broadly: for example, the connection may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical or an electrical connection; it may be a direct connection, an indirect connection through an intermediate medium, or internal communication between two elements. For those of ordinary skill in the art, the specific meanings of the above terms in the present invention can be understood according to the specific situation.
Embodiment 1
Referring to Fig. 1, a first embodiment of the present invention provides a facial feature point localization method based on a two-layer cascaded regression model, comprising: augmenting the samples by randomly selecting face shapes from the training sample set as initial shapes; selecting, from the complete 68-point face shape, the 14 points that play a key role in facial feature point localization to form a simplified face shape; constructing a two-layer cascaded regression model, the first layer predicting the simplified face shape and the second layer predicting the complete face shape; to improve pose robustness, using the subspace division method in the first-layer cascaded regression model to divide the samples into multiple subsets, each of which trains its own regressor; between the two layers, using 3D fitting to project the simplified face shape to the complete face shape and generate a rough initial complete face shape; the first-layer model uses a neutral simplified face shape as its initial shape, and the second-layer model uses the 3D fitting result as its initial shape.
Specifically, the facial feature point localization method based on a two-layer cascaded regression model includes a process of dividing sample subsets using a fusion subspace. Define the face images in the training set and the corresponding set of ground-truth shapes as {(I_i, ŝ_i)}, i = 1, …, N, where N is the number of samples and I_i and ŝ_i are the i-th face image and its ground-truth shape, respectively. First, the mean simplified face shape is matched to the bounding box of every training sample, and the matching result is denoted s̄_i. Next, the shape residual Δs_i = ŝ_i − s̄_i and the PHOG-based shape-indexed feature f_i are computed. Canonical correlation analysis (CCA) is applied to the shape residuals and the shape-indexed features to obtain their corresponding projection matrices P and Q. Since the shape residual is unknown at test time, only the projection of the shape-indexed feature is used, and the subspace of this projection is named the fusion subspace, z_i = (z_{i,1}, …, z_{i,r}), where r is the column dimension of the fusion subspace and z_{i,j} is the projection of the i-th sample's shape-indexed feature onto the j-th dimension of the subspace. If the i-th and g-th samples have the same sign in every dimension of the fusion subspace, the two samples belong to the same subset U_k.
The facial feature point localization method further includes generating an initial shape for the second-layer model by 3D fitting. This embodiment uses the neutral face shape of the 3DMM model and, according to the coordinate points required for facial feature point localization, extracts a 14-point set and a 68-point set from the dense set of 3D coordinate points, denoted the 3D simplified face shape s3d and the 3D complete face shape S3d, respectively. To fit s3d to the face image, this embodiment first projects the 3D simplified face shape onto the 2D plane by weak perspective projection. Denoting the 2D simplified face shape output by the first-layer cascaded regression model as s*, the affine transformation parameters of the weak perspective projection can be determined by minimizing the Euclidean distance between the projection of s3d and s*, thereby fitting the 3D face shape to the face image. This embodiment computes the affine transformation parameters by the least squares method; applying the same affine transformation to the 3D complete face shape S3d yields the projection of the fitted 3D complete face shape onto the 2D plane. When the face pose changes, the contour feature points marked on the 3D face shape may be occluded by the cheek, whereas the true contour feature points should lie on the cheek boundary. This embodiment therefore establishes, for 16 contour feature points (excluding the bottommost one), subsets of coordinate points parallel to themselves. The yaw angle β obtained in the above fitting process indicates whether the face is turned to the left or to the right. When the face is turned to the left, each of the 8 contour feature points on the left searches its corresponding coordinate-point subset for the point with the smallest x coordinate and takes it as the new contour feature point; when the face is turned to the right, each of the 8 contour feature points on the right searches its subset for the point with the largest x coordinate and marks it as the new contour feature point.
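The least-squares fitting step can be sketched as below. As a simplifying assumption, the projection is modeled as a general 2x3 affine map plus translation solved in one linear least-squares step; a strict weak perspective projection would further constrain the 2x3 part to scaled, orthogonal rotation rows, and the yaw-based contour correction is omitted.

```python
import numpy as np

def fit_weak_perspective(pts3d, pts2d):
    """pts3d: (L, 3) 3D simplified shape; pts2d: (L, 2) 2D shape from layer one.
    Returns (A, t) with pts2d ~= pts3d @ A.T + t, fitted by least squares."""
    X = np.hstack([pts3d, np.ones((len(pts3d), 1))])   # homogeneous coords (L, 4)
    M, *_ = np.linalg.lstsq(X, pts2d, rcond=None)      # (4, 2) stacked [A.T; t]
    A, t = M[:3].T, M[3]
    return A, t

def project(pts3d, A, t):
    """Apply the fitted transformation, e.g. to the 3D complete face shape."""
    return pts3d @ A.T + t
```

After fitting on the 14-point simplified shapes, `project` would be applied to the 68-point 3D complete shape to obtain the rough initial complete shape for the second layer.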
Preferably, this embodiment combines the above two techniques and constructs a two-layer cascaded regression model on the basis of the traditional cascaded regression model. First, the PHOG-based shape-indexed features are extracted and projected into the fusion subspace using the projection matrix P obtained during training, and K sample subsets are divided accordingly; then a feature mapping function and a regressor are trained separately on each subset, and the simplified face shape is obtained after T iterations of regression. The affine transformation parameters are then obtained by the 3D fitting method described above, and a rough complete face shape s0 is obtained by projection. Finally, using all training samples and the rough complete face shape from the previous step, the feature extraction function Φ and the regressor W are trained, and fine regression is achieved after T iterations.
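The T-iteration shape-increment regression shared by both layers can be sketched as a toy skeleton. Linear regressors stand in for the patent's feature mapping function plus random-forest regressor (an assumption made purely for a runnable illustration), and the per-subset training and the 3D projection between layers are omitted.

```python
import numpy as np

def train_cascade(features, shapes, init, T=7):
    """One cascaded-regression layer: learn T stages, each regressing the
    remaining shape increment. features: (N, d); shapes, init: (N, p)."""
    stages, current = [], init.copy()
    for _ in range(T):
        residual = shapes - current                         # target increment
        W, *_ = np.linalg.lstsq(features, residual, rcond=None)
        stages.append(W)
        current = current + features @ W                    # update the estimate
    return stages

def apply_cascade(features, init, stages):
    """Run a trained layer from a given initial shape."""
    current = init.copy()
    for W in stages:
        current = current + features @ W
    return current
```

In the patent's model, the first layer would run per subset from the neutral simplified shape, and the second layer would start from the 3D-fitted rough complete shape s0.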
Example 2
Referring to Figs. 2 to 9, a second embodiment of the present invention differs from the first in that the experimental samples come from two widely used facial feature point localization datasets: the HELEN dataset and the 300-W dataset. Both contain unconstrained face images with pose deflection, occlusion, and illumination changes, and are therefore challenging. During training, all samples are first flipped horizontally, and then 10 face shapes are randomly sampled from other training samples as initial shapes for each sample, expanding the total number of samples by a factor of 20. During testing, the first-layer cascaded regression model still uses the mean face shape as the initial shape, while the second-layer cascaded regression model uses the 3D fitting result as the initial shape.
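The 20x augmentation described above can be sketched as follows. This is a simplified illustration: horizontal flipping of a landmark shape would in practice also require swapping symmetric landmark indices (left eye with right eye, etc.), which is omitted here.

```python
import numpy as np

def augment(images, shapes, n_init=10, rng=None):
    """images: list of (H, W) arrays; shapes: list of (L, 2) landmark arrays.
    Returns (image, ground_truth_shape, initial_shape) triples, 2 * n_init
    per original sample: flip doubles the set, random inits multiply by n_init."""
    rng = rng if rng is not None else np.random.default_rng(0)
    augmented = []
    for img, shp in zip(images, shapes):
        flipped = img[:, ::-1]                    # horizontal flip of the image
        shp_f = shp.copy()
        shp_f[:, 0] = img.shape[1] - 1 - shp_f[:, 0]   # mirror x coordinates
        for cur_img, cur_shp in ((img, shp), (flipped, shp_f)):
            for _ in range(n_init):
                j = rng.integers(len(shapes))     # initial shape from another sample
                augmented.append((cur_img, cur_shp, shapes[j].copy()))
    return augmented
```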
Parameter settings: the first-layer and second-layer cascaded regression models use essentially the same parameters: the number of decision trees in each random forest is G = 10, the tree depth is D = 5, and the number of iterations is T = 7. Since the shape variation space of the second-layer cascaded regression model is smaller than that of the first, the two layers use different ranges when randomly sampling pixel-difference features: the sampling radius of the first layer decreases from 0.4 to 0.08 over the iterations, and that of the second layer decreases from 0.3 to 0.06 (all distances normalized by the face bounding box).
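The decreasing sampling radius can be written as a simple schedule. The text gives only the endpoints, so the linear decay law here is an assumption; any monotone schedule between the stated endpoints would be consistent with the description.

```python
def sampling_radius(t, T=7, r0=0.4, rT=0.08):
    """Bounding-box-normalized sampling radius at iteration t in [0, T-1],
    decreasing linearly from r0 (first iteration) to rT (last iteration)."""
    return r0 + (rT - r0) * t / (T - 1)
```

With the second layer's parameters, the same function would be called as `sampling_radius(t, r0=0.3, rT=0.06)`.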
Evaluation criterion: as in most current facial feature point localization algorithms, the normalized mean error (NME) is used to measure accuracy. The NME is computed as

NME = (1/N) Σ_{i=1..N} ( Σ_{l=1..L} ||s_{i,l} − ŝ_{i,l}||_2 ) / (L · d_ipd),

where N is the total number of samples, L is the number of feature points in the face shape, s_i and ŝ_i are the predicted and ground-truth face shapes of the i-th sample, respectively, and d_ipd is the Euclidean distance between the two eye centers of the i-th sample.
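The normalized mean error defined above maps directly onto a few lines of array code; a minimal sketch:

```python
import numpy as np

def nme(pred, gt, d_ipd):
    """pred, gt: (N, L, 2) predicted and ground-truth landmarks;
    d_ipd: (N,) inter-pupil distances. Returns the normalized mean error."""
    per_point = np.linalg.norm(pred - gt, axis=2)    # (N, L) point-wise distances
    per_sample = per_point.mean(axis=1) / d_ipd      # normalize each sample by d_ipd
    return per_sample.mean()                         # average over all samples
```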
Based on the above, the fitting effect of the simplified face shapes is evaluated first. Before localizing the simplified face shape, it must be determined which feature points constitute it. Starting from the 5-point shape comprising the two eye centers, the nose tip, and the mouth corners, this embodiment adds feature points at the eyes, nose, mouth, and contour, and selects a total of 6 candidate simplified face shapes. To choose the most suitable one, the embodiment evaluates them in two respects: (1) a single-layer cascaded regression model is trained on each simplified face shape, and the normalized mean error of each model on the test set is computed and recorded as the localization error; (2) using the localization results of (1), a complete face shape is generated by the 3D fitting method proposed in this embodiment, and the mean normalized error between the generated complete face shape and the sample's ground-truth face shape is computed and recorded as the fitting error. Fig. 2 shows the localization and fitting errors of the 6 simplified face shapes on the HELEN dataset under these criteria. To demonstrate the advantage of the simplified face shapes, the figure also marks the error of directly localizing the complete face shape, as well as the error of the complete face shape newly generated by 3D fitting from that localization result.
From the localization error and fitting error curves in Fig. 2, it can be seen that the 8-point and 16-point shapes add 3 and 2 contour points relative to the 5-point and 14-point shapes, respectively, and the localization error rises accordingly, showing that contour regions are not well reflected by texture features. However, the fitting error of the 8-point shape is much lower than that of the 5-point shape, showing that adding a small number of contour feature points constrains the overall shape and reduces the fitting error. Compared with its predecessor, the 8-point, 10-point, 12-point, and 14-point shapes respectively add 3 contour feature points, replace the 2 eye-center feature points with 4 eye-corner feature points, add 2 nose-wing feature points, and add 2 upper- and lower-lip feature points; the fitting error decreases gradually with these refinements, showing that adding feature points on the facial organs yields better fitting. Considering the computational cost, however, this embodiment does not continue to add further feature points to the facial organs of the simplified face shape.
Fig. 2 also shows that the localization error of the complete 68-point face shape is higher than that of all the simplified face shapes, indicating that a large number of non-key feature points increases the difficulty of localization. Among the 6 candidate simplified face shapes, the 14-point shape has both the smallest localization error and the smallest fitting error; its fitting error is 9.22%, whereas directly localizing and fitting the complete face shape yields a fitting error of 9.63%. The simplified face shape therefore both improves localization accuracy and prevents the 3D model from underfitting due to too many mutually constraining feature points. The 14-point shape is used as the simplified face shape in all subsequent experiments.
Preferably, the effect of the subspace division is evaluated next. To demonstrate the effectiveness of the fusion subspace division method proposed in this embodiment, it is evaluated in two respects: (1) the correlation between the fusion subspace and the face pose; (2) the influence of the subspace division result on the localization performance of the first-layer cascaded regression model.
Since face images exhibit a manifold distribution, an effective feature subspace should also exhibit this property. Following the subspace division method proposed in this embodiment, features are extracted from the samples of the 300-W dataset and projected into the fusion subspace; the distribution of some face images over the first two dimensions of the fusion subspace is shown in Fig. 3. Along the horizontal direction of the figure, the yaw angle of the samples changes, gradually transitioning from facing left to facing right; along the vertical direction, the roll angle changes, gradually transitioning from tilting left to tilting right. Samples near the center of the figure have small pose deflection angles and are mainly frontal faces, while the deflection angle of the surrounding samples grows gradually along the horizontal and vertical trends. This demonstrates that the fusion subspace satisfies the manifold distribution of faces and can be used to divide sample subsets of different poses.
To evaluate the influence of the number of sample subsets K on the localization performance of the first-layer cascaded regression model, this embodiment computes the normalized mean error of simplified face shape localization on different datasets for different values of K; the results are shown in Fig. 4, where K = 1 means no subspace division is performed and a single regression model is trained on all samples. On the HELEN dataset, the error decreases from K = 1 to K = 4, dropping from 4.46% to 4.32%, which shows that the subspace division method can effectively reduce the localization error. At K = 8 and K = 16, however, the error rises again, even exceeding the no-division case at K = 16. There are two reasons for this: first, as the number of subsets grows, the number of training samples in each subset shrinks, reducing the robustness of the model; second, the HELEN dataset has little pose variation, so too many subsets easily produce a confused division. On the 300-W challenge set, which has rich pose variation, the subspace division method greatly reduces the localization error: from K = 1 to K = 8, the error drops from 13.83% to 11.48%. At K = 16 the error rises slightly, for reasons similar to those on the HELEN dataset. Based on these experimental results, when comparing with other algorithms, this embodiment sets K to 4 on the HELEN dataset and 8 on the 300-W dataset.
To demonstrate that the fusion subspace division method is superior to PCA subspace division, this embodiment compares the influence of the two methods on the localization error of the simplified face shape on the 300-W challenge set for K = 2, 4, and 8, as shown in Fig. 5. The dashed line in the figure marks the localization error without subspace division. Both division methods reduce the localization error, but the fusion-subspace-based division is clearly better than the PCA-subspace-based one. Since the PCA-subspace-based division method cannot be applied to static faces, it is implemented in this experiment using only the shape-indexed features. That approach ignores the correlation between the shape-indexed features and the shape residuals, so its division result is relatively confused; the method of this embodiment fuses the characteristics of both, divides the poses more accurately, and achieves a lower alignment error for every value of K.
Further, to evaluate the two-layer cascaded regression model, the localization results on the complete face shape are compared on the HELEN and 300-W datasets for the unimproved LBF algorithm, the two-layer cascaded regression model without subspace division (K = 1), and the two-layer cascaded regression model with different values of K.
Since some parameters used in this embodiment differ from those in the original LBF paper, the experimental data on the LBF algorithm in Table 1 differ somewhat from the data reported there. Table 1 shows that the two-layer cascaded regression model without subspace division (K = 1) outperforms the unimproved LBF algorithm on all datasets, demonstrating that the two-layer model is superior to the single-layer model. After the subspace division improvement is added, the error on the HELEN dataset and the 300-W challenge set decreases gradually as K grows, reaching its minimum at K = 4 and K = 8, respectively, 4% and 23.55% lower than the original algorithm. This corresponds to the localization results for the simplified face shape, which is why K is only taken up to 4 and 8, respectively. On the 300-W common set, the errors of the two-layer cascaded regression models differ very little across values of K; the likely reason is that the common-set samples, whose pose variation is small, are densely distributed at the center of the subspace and close to the division boundaries. After subspace division, some frontal face samples may be assigned to unsuitable subsets, so this improvement brings no obvious gain on frontal face images. Nevertheless, after averaging with the localization results on the challenge set, the localization error on the 300-W full set decreases as K grows, 11.24% lower than the original algorithm.
As shown in Fig. 6, where the result of the 7th iteration corresponds to the data in Table 1, an accurate initial shape gives the second-layer cascaded regression model an advantage from the first iteration, and this advantage is maintained through the last iteration, yielding better localization.
Table 1. Complete face shape localization error on different datasets (%)
Based on the above, to compare with current state-of-the-art methods, this embodiment uses the normalized mean error of each algorithm on the different datasets, the CED curves, and their concrete localization results as evaluation criteria.
Table 2 shows the normalized mean errors of different algorithms on the HELEN and 300-W datasets, all taken from the algorithms' original papers or related publications. ESR, SDM, LBF, CFSS, and MCO are cascaded-regression-based algorithms; CFAN, 3DDFA, and PIFA-S are deep-learning-based algorithms.
The table shows that on the HELEN dataset and the 300-W common set, which have small pose variation, the algorithm of this embodiment has the smallest error among all the compared algorithms, outperforming even the three deep-learning-based ones. The likely reason is that CFAN only replaces the feature mapping function with a deep autoencoder network and remains in essence a single-layer cascaded regression, while 3DDFA and PIFA-S are based on the 3DMM model, and 3D face alignment results are not at an advantage under 2D face alignment evaluation criteria. This result also demonstrates that the coarse-to-fine regression process of the two-layer cascaded regression model substantially improves the accuracy of the algorithm. On the 300-W challenge set, which has rich pose variation, the algorithm of this embodiment ranks second among the cascaded-regression-based algorithms, behind only CFSS, and is also competitive with the two deep-learning-based algorithms. CFSS selects candidate shapes by computing a probability matrix and fuses the update results of multiple candidate shapes to generate a new shape, improving localization accuracy at the cost of a huge computational load; 3DDFA and PIFA-S both use convolutional neural networks to optimize the 3DMM parameters and achieve 3D face alignment, and are highly robust to pose and occlusion changes. After averaging the results on the 300-W common and challenge sets, the result of this embodiment's algorithm is second only to CFSS and surpasses all the other cascaded-regression-based algorithms as well as the two deep-learning-based ones; the algorithm therefore compares favorably with current state-of-the-art algorithms.
Table 2. Normalized mean error on the 300-W dataset (%)
As shown in Fig. 7, the normalized mean error of the SDM algorithm reproduced in this embodiment is 7.04%, better than the 7.50% of the original paper; that of the reproduced LBF algorithm is 6.59%, slightly worse than the original 6.32%, because, to keep the feature dimension and hence the memory consumption from growing too large, this embodiment reduced the number of decision trees from 1200 in the original to 680 when reproducing it; the normalized mean error of the reproduced CFSS algorithm is 6.00%, using the MATLAB code provided by the original authors on GitHub. It can be seen that the algorithm of this embodiment holds a certain advantage over the other algorithms.
As shown in Fig. 8, the first row shows the ground-truth shapes of the samples, and the second through fifth rows show the localization results of this embodiment's algorithm, CFSS, LBF, and SDM, respectively. In the first to fourth columns, this embodiment's algorithm obtains results very close to the ground-truth shapes, while visible errors appear in the other three algorithms. In the fifth to eighth columns, this embodiment's results contain some error; SDM and LBF essentially fail; CFSS outperforms this embodiment's algorithm in the fifth and sixth columns but fails in the seventh and eighth columns, where the pitch angle is larger, with results inferior to this embodiment's. These results show that the algorithm of this embodiment has a certain robustness to pose.
As shown in Fig. 9, the samples marked with feature points represent easy cases, difficult but successfully predicted cases, and failure cases, respectively. The main cause of failure is exaggerated facial expressions in the samples. These results again show that the proposed method is pose-robust.
It should be appreciated that embodiments of the present invention may be realized by computer hardware, a combination of hardware and software, or computer instructions stored in non-transitory computer-readable memory. The methods may be implemented in computer programs using standard programming techniques, including a non-transitory computer-readable storage medium configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and figures described in the specific embodiments. Each program may be implemented in a high-level procedural or object-oriented programming language to communicate with a computer system. If desired, however, the programs may be implemented in assembly or machine language. In any case, the language may be a compiled or interpreted language. Furthermore, the programs can be run on programmed application-specific integrated circuits for this purpose.
Furthermore, the operations of the processes described herein may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or by combinations thereof. The computer programs include a plurality of instructions executable by one or more processors.
Further, the methods may be implemented in any type of suitable computing platform operably connected thereto, including but not limited to a personal computer, minicomputer, mainframe, workstation, networked or distributed computing environment, or a separate or integrated computer platform, or in communication with a charged-particle tool or other imaging device, and so on. Aspects of the present invention may be realized as machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, an optically readable and/or writable storage medium, RAM, ROM, and the like, so that it is readable by a programmable computer; when the storage medium or device is read by the computer, the code can be used to configure and operate the computer to perform the processes described herein. Furthermore, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other various types of non-transitory computer-readable storage media when such media contain instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention also includes the computer itself when programmed according to the methods and techniques described herein. A computer program can be applied to input data to perform the functions described herein, thereby transforming the input data to generate output data stored in non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the present invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on the display.
As used in this application, the terms "component", "module", "system", and the like are intended to refer to a computer-related entity, which may be hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device itself may be components. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer-readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes, such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system or a distributed system, and/or interacting with other systems by way of the signal across a network such as the Internet).
It should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and all such modifications shall fall within the scope of the claims of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011305067.8A CN112270308B (en) | 2020-11-20 | 2020-11-20 | A facial feature point localization method based on two-layer cascaded regression model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112270308A true CN112270308A (en) | 2021-01-26 |
CN112270308B CN112270308B (en) | 2021-07-16 |
Family
ID=74340309
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011305067.8A Active CN112270308B (en) | 2020-11-20 | 2020-11-20 | A facial feature point localization method based on two-layer cascaded regression model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112270308B (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110243431A1 (en) * | 2010-02-19 | 2011-10-06 | MindTree Limited | Detecting Objects Of Interest In Still Images |
CN103824089A (en) * | 2014-02-17 | 2014-05-28 | 北京旷视科技有限公司 | Cascade regression-based face 3D pose recognition method |
US20160373437A1 (en) * | 2015-02-15 | 2016-12-22 | Beijing Kuangshi Technology Co., Ltd. | Method and system for authenticating liveness face, and computer program product thereof |
CN106682598A (en) * | 2016-12-14 | 2017-05-17 | 华南理工大学 | Multi-pose facial feature point detection method based on cascade regression |
CN107644203A (en) * | 2017-09-12 | 2018-01-30 | 江南大学 | A kind of feature point detecting method of form adaptive classification |
US20180181840A1 (en) * | 2016-12-25 | 2018-06-28 | Facebook, Inc. | Robust shape prediction for face alignment |
CN108446672A (en) * | 2018-04-20 | 2018-08-24 | 武汉大学 | A kind of face alignment method based on the estimation of facial contours from thick to thin |
CN109002758A (en) * | 2018-06-06 | 2018-12-14 | 武汉理工大学 | Man face characteristic point positioning method, device, equipment and storage medium |
US20190122329A1 (en) * | 2017-10-24 | 2019-04-25 | Vmaxx, Inc. | Face Replacement and Alignment |
CN109740429A (en) * | 2017-11-30 | 2019-05-10 | 沈阳工业大学 | Smile face recognition method based on average change of mouth corner coordinates |
CN110443885A (en) * | 2019-07-18 | 2019-11-12 | 西北工业大学 | Three-dimensional number of people face model reconstruction method based on random facial image |
CN110543845A (en) * | 2019-08-29 | 2019-12-06 | 四川大学 | A face cascade regression model training method and reconstruction method for three-dimensional face |
CN110705437A (en) * | 2019-09-26 | 2020-01-17 | 中国科学技术大学 | A face key point detection method and system based on dynamic cascade regression |
Non-Patent Citations (3)
Title |
---|
QINGSHAN LIU 等: "Adaptive Cascade Regression Model for Robust Face Alignment", 《IEEE TRANSACTIONS ON IMAGE PROCESSING》 * |
矫慧文 等: "稀疏混合字典学习的人脸鉴别算法", 《小型微型计算机系统》 * |
贾项南 等: "多视角级联回归模型人脸特征点定位", 《计算机工程与应用》 * |
Also Published As
Publication number | Publication date |
---|---|
CN112270308B (en) | 2021-07-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108549873B (en) | Three-dimensional face recognition method and three-dimensional face recognition system | |
Zhou et al. | 3D face recognition: a survey | |
CN109359526B (en) | Human face posture estimation method, device and equipment | |
Zhang et al. | Combining data-driven and model-driven methods for robust facial landmark detection | |
Sun et al. | Tracking vertex flow and model adaptation for three-dimensional spatiotemporal face analysis | |
CN105868716B (en) | A kind of face identification method based on facial geometric feature | |
Gao et al. | Line segment Hausdorff distance on face matching | |
CN104700076B (en) | Facial image virtual sample generation method | |
CN102013011B (en) | Front-face-compensation-operator-based multi-pose human face recognition method | |
Yu et al. | Face landmark fitting via optimized part mixtures and cascaded deformable model | |
CN104050628B (en) | Image processing method and image processing device | |
Ren et al. | Facial expression recognition based on AAM–SIFT and adaptive regional weighting | |
CN108596193A (en) | A kind of method and system for building the deep learning network structure for ear recognition | |
CN111950430A (en) | Multi-scale makeup style difference measurement and migration method and system based on color texture | |
CN105184281A (en) | Face feature library building method based on high-dimensional manifold learning | |
Samad et al. | Frenet frame-based generalized space curve representation for pose-invariant classification and recognition of 3-D face | |
Mahpod et al. | Facial landmarks localization using cascaded neural networks | |
CN104732247B (en) | A kind of human face characteristic positioning method | |
Wang et al. | Joint head pose and facial landmark regression from depth images | |
Zeng et al. | Deep context-sensitive facial landmark detection with tree-structured modeling | |
Tong et al. | Automatic facial landmark labeling with minimal supervision | |
Zhang et al. | Multiview facial landmark localization in RGB-D images via hierarchical regression with binary patterns | |
Yu et al. | Explicit occlusion detection based deformable fitting for facial landmark localization | |
Zhang et al. | Face alignment based on fusion subspace and 3D fitting | |
CN112270308A (en) | A facial feature point localization method based on two-layer cascaded regression model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||