CN105224935A - Real-time face keypoint localization method based on the Android platform - Google Patents
Real-time face keypoint localization method based on the Android platform
- Publication number
- CN105224935A (application CN201510713055.1A)
- Authority
- CN
- China
- Prior art keywords
- face
- training
- regression
- key point
- shape
- Prior art date: 2015-10-28
- Legal status: Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
Abstract
The invention discloses a real-time face keypoint localization method based on the Android platform, belonging to the field of computer vision. The method comprises the following steps: collect a set of face training images and annotate their keypoints; randomly select n samples from the training set as the initial shape of each training sample; compute the normalized target for each training sample; extract shape-indexed features at each keypoint; select a suitable number of features by correlation analysis; adopt a two-level boosted regression structure (an outer level and an inner level); learn the regressor of each stage; at test time, estimate the face window with a face detector and predict the keypoint positions with the trained regression model. Existing methods have high computational complexity and run too slowly on mobile platforms; they are also sensitive to noise and localize with low accuracy. The invention constrains the shape as a linear combination of training samples and applies a regression-based method, improving both the accuracy and the efficiency of face keypoint localization.
Description
Technical Field
The invention relates to the field of computer vision, and in particular to a method for localizing facial keypoints.
Background
Face detection and keypoint localization are key technologies in computer vision research and are now widely used in intelligent surveillance, identity recognition, expression analysis, and other applications. Face keypoint localization means precisely locating specific facial organs such as the eyes, mouth, and nose in a face image and obtaining their geometric parameters, thereby providing accurate information for tasks such as expression analysis and face recognition.
At present, face keypoint localization is well established on desktop computers, with high real-time performance and accuracy. Representative work includes the Active Shape Model (ASM), the Bayesian Tangent Shape Model (BTSM), the Active Appearance Model (AAM), and the Constrained Local Model (CLM). However, these algorithms have high computational complexity and are rarely deployed on mobile platforms with limited computing and storage capacity.
Today there are hundreds of millions of smart-terminal users (mainly mobile phones and tablets) in China, and smart terminals have gradually become people's primary information carriers. Mobile-platform performance has also improved greatly, making real-time localization and tracking of facial feature points on mobile devices feasible. However, existing facial feature point localization and tracking algorithms generally have high computational complexity, large memory consumption, and slow processing speed, and are therefore difficult to port directly to mobile platforms.
Summary of the Invention
To solve the above problems, the invention discloses an efficient, high-accuracy "explicit shape regression" method for face keypoint localization, which predicts all keypoints of the face by directly learning a vectorial regression function. The intrinsic shape constraint is naturally encoded into the cascaded learning framework and applied coarse-to-fine at test time. The invention improves the algorithm on several fronts, most notably through substantial changes to feature selection and model training, greatly increasing runtime efficiency while preserving localization accuracy, so that facial feature points can be detected and localized in real time on a mobile platform.
The invention comprises a training stage and a testing stage. The training stage learns the regression model. Unlike most regression approaches, the method does not use a shape model with fixed parameters; instead it directly learns a vectorial regression function to predict all keypoints of the face, explicitly minimizing the localization error on the training set.
To make the regression efficient, we use simple pixel-difference features: the intensity difference between two pixels in the image. Such a feature is extremely cheap to compute: given a keypoint position and an offset, read the pixel value at that location, then take the difference of two such pixel values, yielding a shape-indexed feature. The method indexes pixels in local rather than global coordinates, which greatly enhances the robustness of the features.
To keep the pixel-difference features geometrically invariant, pixels are indexed relative to the currently estimated shape. A pixel is indexed in local coordinates as $\Delta^l = (\Delta x^l, \Delta y^l)$, where $l$ is a keypoint on the normalized face. Indexing this way is invariant to scale, rotation, and similar changes, and makes the algorithm more robust. It also lets us enumerate more useful features around relatively stable points (for example, a point at the center of an eye is darker than the tip of the nose, while the centers of the two eyes look alike). In the actual implementation, we transform the local coordinates back into the global coordinate system of the original image to obtain the shape-indexed pixels, and only then compute the pixel-difference features; this makes the testing stage run faster. Let $S$ be the estimated shape of a sample. The position of the $l$-th keypoint is obtained as $\pi_l \circ S$, where $\pi_l$ extracts the $x, y$ coordinates of the $l$-th keypoint from the shape vector. With $\Delta^l$ in the local coordinate system, the corresponding global position in the original image can be expressed as $\pi_l \circ S + M^{-1} \circ \Delta^l$, where $M$ is the similarity transform normalizing $S$ to the mean shape. $\Delta^l$ is identical across samples, but the global coordinates used to read the underlying pixels adjust accordingly, guaranteeing geometric invariance.
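To make the indexing concrete, the following Python/NumPy sketch samples the P shape-indexed pixels for one image. It is an illustration rather than the patent's implementation; all names (`sample_shape_indexed_pixels`, the `M_inv` argument, the clipping at the image border) are assumptions of the sketch:

```python
import numpy as np

def sample_shape_indexed_pixels(image, shape, landmark_ids, local_offsets, M_inv):
    """Read P shape-indexed pixels: each is a landmark position plus a
    local offset mapped from the normalized-face frame back to the image.

    image         : 2-D grayscale array
    shape         : (L, 2) currently estimated landmark coordinates
    landmark_ids  : (P,) landmark index each offset is attached to
    local_offsets : (P, 2) offsets in the normalized-face frame
    M_inv         : (2, 2) linear part of the inverse normalizing transform
    """
    h, w = image.shape
    pts = shape[landmark_ids] + local_offsets @ M_inv.T   # back to image frame
    xs = np.clip(np.rint(pts[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.rint(pts[:, 1]).astype(int), 0, h - 1)
    return image[ys, xs].astype(np.int32)

# All P^2 pixel-difference features follow from one broadcast subtraction:
# vals = sample_shape_indexed_pixels(...); diffs = vals[:, None] - vals[None, :]
```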
The invention uses a two-level boosted regression scheme: the basic face keypoint localization framework forms the outer level, and a cascade of stage regressors $R^t$ forms the inner level. The inner-level stage regressors are also called primitive regressors. In this structure the outer level has $T = 10$ stages and the inner level has $K = 500$: each node of the outer level is a cascade of 500 weak learners, i.e. one inner-level regressor, and the input of each outer-level node is the output of the previous one, with features always extracted at the keypoints estimated by the preceding stage. A key difference between the outer and inner levels is that the shape-indexed features are fixed within inner-level regression: the features correspond only to the previously estimated shape $S^{t-1}$ and remain unchanged until the current primitive regressor has been learned. Holding the shape-indexed features fixed within the inner level saves training time and makes the learned inner-level regressors more stable. The inner-level regressors use the so-called fern structure: each fern is a combination of $F = 5$ features and thresholds, partitioning the feature space into $2^F$ bins.
Each bin $b$ corresponds to a regression output $y_b$. The invention uses ferns as the primitive regressors: a fern is a combination of $F$ features and thresholds that partitions the feature space, and with it all training samples, into $2^F$ bins. Let $\hat{y}_i$ denote the regression target of training sample $i$; the output of a bin minimizes the mean squared distance from the regression targets of all training samples falling into that bin:

$$y_b = \arg\min_{y} \sum_{i \in \Omega_b} \left\| \hat{y}_i - y \right\|^2,$$

where $\Omega_b$ is the set of samples in the $b$-th bin. The optimal solution is the mean of all regression targets in the bin:

$$y_b = \frac{1}{|\Omega_b|} \sum_{i \in \Omega_b} \hat{y}_i.$$

To prevent overfitting during training, a shrinkage factor $\beta$ is introduced:

$$y_b = \frac{1}{1 + \beta / |\Omega_b|} \cdot \frac{\sum_{i \in \Omega_b} \hat{y}_i}{|\Omega_b|}.$$

When the number of samples in a bin is large enough, $\beta$ has little effect; otherwise it shrinks the magnitude of the estimate.
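A minimal sketch of fitting one fern under the shrinkage rule above (an illustration under assumed names; `beta` plays the role of β, and the thresholds are supplied by the caller):

```python
import numpy as np

def fit_fern(features, targets, thresholds, beta=1000.0):
    """Fit one fern: F thresholded features split the samples into 2^F bins,
    and each bin stores the shrunken mean of its regression targets.

    features   : (N, F) selected pixel-difference features
    targets    : (N, D) current regression targets (flattened shape residuals)
    thresholds : (F,) one threshold per feature
    """
    N, F = features.shape
    # Pack the F binary tests into a bin index in [0, 2^F)
    bins = (features > thresholds).astype(int) @ (1 << np.arange(F))
    outputs = np.zeros((1 << F, targets.shape[1]))
    for b in range(1 << F):
        idx = np.flatnonzero(bins == b)
        if idx.size:
            # y_b = mean of targets in the bin, shrunk by 1/(1 + beta/|Omega_b|)
            outputs[b] = targets[idx].mean(axis=0) / (1.0 + beta / idx.size)
    return outputs, bins
```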
The complete operational flow of the patented method is as follows:
Step 1. Collect a training image set and annotate keypoints on the training face images;
Step 2. Compute the normalized target for each training sample: find the similarity transform $M$ minimizing the $L_2$ distance between $M \circ S$ and the mean shape $\bar{S}$; $M \circ S$ is the normalized target sample;
Step 3. Assemble the training sample set $\{(I_i, \hat{S}_i, S_i^0)\}_{i=1}^{N}$: each sample contains a training image, a ground-truth shape, and an initial shape; the initial shape of each training sample is obtained by randomly selecting n shapes from the training set;
Step 4. Extract shape-indexed features according to the position of each keypoint;
Step 5. Select a suitable number of features by correlation analysis as the final features;
Step 6. Set up the regression structure: adopt the two-level boosted regression structure as the basic face alignment framework (outer level), and within each outer stage set an internal stage regressor $R^t$ (inner level);
Step 7. Learn the regressor $R^t$ of each stage:

$$R^t = \arg\min_{R} \sum_{i=1}^{N} \left\| y_i - R\left(I_i, S_i^{t-1}\right) \right\|^2, \qquad y_i = M_i \circ \left(\hat{S}_i - S_i^{t-1}\right);$$
Step 8. Estimate the face window with a face detector, and predict the face keypoint positions with the trained regression model by iterating

$$S^t = S^{t-1} + M^{-1} \circ R^t\left(I, S^{t-1}\right), \qquad t = 1, \dots, T.$$
Brief Description of the Drawings
Figure 1 is a flowchart of the real-time face keypoint localization method based on the Android platform according to the invention.
Detailed Description
The technical solution of the invention is described in detail below with reference to the accompanying drawing. The invention discloses a real-time face keypoint localization method based on the Android platform.
As shown in Figure 1, the invention comprises two stages, training and testing, and specifically includes the following steps:
Step 1. Annotation and normalization of training-set feature points: the facial landmark points used in the invention are selected according to the MPEG-4 standard. Since the 84 feature points defined by the Facial Definition Parameters (FDP) are overly complex, we select 68 of them as needed, keeping only eight points on the eye contours (four points, upper/lower/left/right, around each eye) to improve computational efficiency. The training library is taken from MUCT together with self-annotated face images, about four thousand face images in total; mirroring these images yields a final training set of eight thousand face images.
Step 2. Normalize the sample shapes: because the pose of a face shape in an image is described by scale, rotation angle, and position parameters, the pose parameters of each shape (size, angle, position, etc.) must be adjusted appropriately so that all images become as consistent as possible. For example, a scale transform fixes the distance between landmark points, a rotation transform fixes the orientation of the lines joining landmark points, and a translation fixes the center of the shape. First we compute the mean face $\bar{S}$ over all training samples, then find the similarity transform $M$ that minimizes the $L_2$ distance $\left\| M \circ S - \bar{S} \right\|_2$ for the input sample $S$; $M \circ S$ is the normalized shape.
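The alignment in this step is a standard least-squares 2-D similarity fit; the following sketch (names are assumptions) computes the transform $M$ that maps a shape onto the mean shape:

```python
import numpy as np

def similarity_to_mean(shape, mean_shape):
    """Least-squares similarity transform M (scale, rotation, translation)
    such that M∘S = shape @ A.T + t best matches mean_shape.
    shape, mean_shape : (L, 2) arrays of landmark coordinates."""
    mu_s, mu_m = shape.mean(axis=0), mean_shape.mean(axis=0)
    s, m = shape - mu_s, mean_shape - mu_m
    denom = (s ** 2).sum()
    a = (s * m).sum() / denom                                   # scale * cos
    b = (s[:, 0] * m[:, 1] - s[:, 1] * m[:, 0]).sum() / denom   # scale * sin
    A = np.array([[a, -b], [b, a]])
    t = mu_m - mu_s @ A.T
    return A, t
```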
Step 3. Assemble the training sample set $\{(I_i, \hat{S}_i, S_i^0)\}_{i=1}^{N}$: each sample contains a training image, a ground-truth shape, and an initial shape. Ten shapes are randomly drawn from the training set as the initial shapes of each training sample. Generating multiple initial shapes per image implicitly multiplies the number of training samples and also improves the generalization of training.
Step 4. Establish a local coordinate system $\Delta^l = (\Delta x^l, \Delta y^l)$ at a particular point on the normalized face. The local coordinates $\Delta^l$ are the same for all training samples. Pixels are indexed by the keypoint position in local coordinates; the superscript $l$ denotes the specific landmark the pixel is attached to. The local coordinates are then transformed back into the global coordinate system of the original image, where the pixel-difference features are extracted; $\pi_l$ extracts the $x, y$ coordinates of the $l$-th landmark from the shape vector. The features of each weak learner $R^t$ are based on the image $I$ and the previously estimated shape $S^{t-1}$. In each stage regressor $R^t$, $P$ local coordinates $\{\Delta^{l_\alpha}\}_{\alpha=1}^{P}$ are generated at random, defining $P$ shape-indexed pixels: each local coordinate is generated by first randomly choosing a landmark (the $l_\alpha$-th landmark) and then drawing X- and Y-offsets uniformly at random. Together the $P$ pixels produce $P^2$ pixel-difference features.
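A sketch of generating the P shape-indexed pixel definitions for one outer stage; the offset bound `radius` is an assumed parameter of the sketch, not a value given in the patent:

```python
import numpy as np

def sample_local_offsets(num_landmarks, P, radius, rng):
    """For each of P pixels: pick a random landmark l_alpha, then draw
    X- and Y-offsets uniformly in the normalized-face frame."""
    landmark_ids = rng.integers(0, num_landmarks, size=P)
    local_offsets = rng.uniform(-radius, radius, size=(P, 2))
    return landmark_ids, local_offsets
```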
Step 5. Select a suitable number of features as the final features by correlation analysis.
Generating $P^2$ features raises a new challenge: quickly selecting effective features from a huge feature space. "A suitable number" means selecting far fewer than $P^2$ features, which keeps the computational complexity small and improves the efficiency of the algorithm. The invention selects $F$ of the $P^2$ features to build a good fern regressor, exploiting the correlation between features and regression targets. Two principles govern the selection: each feature in a fern should be highly correlated with the regression target, and the features should have low correlation with one another, i.e. be complementary. Let $Y$ denote the regression targets, an $N \times N_{fp}$ matrix ($N$ samples, $N_{fp}$ landmark coordinates), and let $X$ denote the pixel-difference features, an $N \times P^2$ matrix. The goal is to pick, from the $P^2$ columns of $X$, the $F$ columns most highly correlated with $Y$. Since $Y$ is a matrix, we project it onto a column vector $Y_{proj}$ using a projection vector $\upsilon$ drawn from a unit Gaussian: $Y_{proj} = Y\upsilon$. The feature most correlated (by Pearson correlation) with the projected target is then selected:

$$(m, n) = \arg\max_{m, n} \operatorname{corr}\left(Y_{proj}, \rho_m - \rho_n\right).$$

Repeating this with $F$ different projections yields $F$ suitable features, which form the desired selection. The correlation between the regression target and a pixel-difference feature is computed as

$$\operatorname{corr}\left(Y_{proj}, \rho_m - \rho_n\right) = \frac{\operatorname{cov}\left(Y_{proj}, \rho_m\right) - \operatorname{cov}\left(Y_{proj}, \rho_n\right)}{\sqrt{\sigma\left(Y_{proj}\right)\,\sigma\left(\rho_m - \rho_n\right)}},$$

where $\sigma(\rho_m - \rho_n) = \operatorname{cov}(\rho_m, \rho_m) + \operatorname{cov}(\rho_n, \rho_n) - 2\operatorname{cov}(\rho_m, \rho_n)$. The correlation thus consists of two parts: target-pixel covariances and pixel-pixel covariances. Because the shape-indexed features are fixed within the inner-level cascaded regression, the pixel-pixel covariances can be computed in advance and reused in every inner stage; for each primitive regressor only the target-pixel covariances need to be computed, which is linear in the number of pixel features. The complexity of the correlation computation therefore drops from $O(NP^2)$ to $O(NP)$.
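The selection and the covariance reuse can be sketched as follows (illustrative only; names are assumptions, and for brevity all $P^2$ pairs are scored at once):

```python
import numpy as np

def select_fern_features(pix, targets, F, rng):
    """Pick F pixel-difference features, each maximizing the Pearson
    correlation with a fresh random projection of the targets.

    pix     : (N, P) shape-indexed pixel intensities
    targets : (N, D) regression targets Y
    Returns a list of (m, n) index pairs defining features rho_m - rho_n.
    """
    N, P = pix.shape
    cov_pp = np.cov(pix, rowvar=False)        # pixel-pixel covariance, precomputed
    var_p = np.diag(cov_pp)
    sigma_diff = var_p[:, None] + var_p[None, :] - 2.0 * cov_pp
    pairs = []
    for _ in range(F):
        v = rng.standard_normal(targets.shape[1])       # unit-Gaussian projection
        y = targets @ v                                 # Y_proj = Y v
        yc = y - y.mean()
        cov_yp = (pix - pix.mean(axis=0)).T @ yc / (N - 1)   # target-pixel, O(NP)
        num = cov_yp[:, None] - cov_yp[None, :]
        den = np.sqrt(np.maximum(y.var(ddof=1) * sigma_diff, 1e-12))
        m, n = np.unravel_index(np.argmax(np.abs(num / den)), (P, P))
        pairs.append((int(m), int(n)))
    return pairs
```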
Step 6. Set up the regression structure: adopt the two-level boosted regression structure as the basic face alignment framework (outer level), and within each outer stage set an internal stage regressor $R^t$ (inner level). The weak learners of the inner level are called primitive regressors. The inner and outer regressors are similar in form, but they differ in that the shape-indexed features are held fixed across all stages of the inner-level regression.
Step 7. Learn the internal cascaded regression, which contains $K$ primitive regressors $\{r_1, \dots, r_K\}$, i.e. ferns. These primitive regressors greedily fit the regression targets in sequence: each primitive regressor handles the residual left by the previous one, and at every iteration these residuals become the targets for learning the next regressor. The regressor of each stage is

$$R^t = \arg\min_{R} \sum_{i=1}^{N} \left\| y_i - R\left(I_i, S_i^{t-1}\right) \right\|^2.$$
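Combining the sketches above, the inner cascade can be illustrated as follows (the uniform random choice of fern thresholds is an assumption of the sketch):

```python
import numpy as np

def train_internal_cascade(pix, targets, K, F, rng):
    """Greedily fit K ferns to the fixed shape-indexed pixels.
    After each fern its prediction is subtracted, so the next fern
    is trained on the remaining residual.

    pix     : (N, P) shape-indexed pixel intensities for this outer stage
    targets : (N, 2L) flattened normalized shape residuals
    """
    residual = targets.astype(float)
    ferns = []
    for _ in range(K):
        pairs = select_fern_features(pix, residual, F, rng)
        feats = np.stack([pix[:, m] - pix[:, n] for m, n in pairs], axis=1)
        thresholds = rng.uniform(feats.min(), feats.max(), size=F)
        outputs, bins = fit_fern(feats, residual, thresholds)
        residual -= outputs[bins]            # next fern fits what is left
        ferns.append((pairs, thresholds, outputs))
    return ferns
```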
Step 8. Estimate the face window with a face detector; predict the face keypoint positions with the trained regression model by iterating

$$S^t = S^{t-1} + M^{-1} \circ R^t\left(I, S^{t-1}\right), \qquad t = 1, \dots, T.$$
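Finally, a sketch of the test-time loop over the $T$ outer stages, tying together the helper sketches above (`stages` is an assumed container for what training produced):

```python
import numpy as np

def predict_shape(image, init_shape, mean_shape, stages):
    """Iterate S^t = S^{t-1} + M^{-1} ∘ R^t(I, S^{t-1}) over T outer stages.
    stages : list of (landmark_ids, local_offsets, ferns) tuples."""
    S = init_shape.astype(float)
    for landmark_ids, local_offsets, ferns in stages:
        A, _ = similarity_to_mean(S, mean_shape)       # normalizing transform M
        M_inv = np.linalg.inv(A)
        pix = sample_shape_indexed_pixels(image, S, landmark_ids,
                                          local_offsets, M_inv)
        delta = np.zeros(S.size)                       # R^t output: summed ferns
        for pairs, thresholds, outputs in ferns:       # K inner ferns
            feats = np.array([pix[m] - pix[n] for m, n in pairs])
            b = int((feats > thresholds).astype(int) @ (1 << np.arange(feats.size)))
            delta += outputs[b]
        S = S + delta.reshape(-1, 2) @ M_inv.T         # residual back to image frame
    return S
```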
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510713055.1A CN105224935B (en) | 2015-10-28 | 2015-10-28 | Real-time face keypoint localization method based on the Android platform |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510713055.1A CN105224935B (en) | 2015-10-28 | 2015-10-28 | Real-time face keypoint localization method based on the Android platform |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105224935A true CN105224935A (en) | 2016-01-06 |
CN105224935B CN105224935B (en) | 2018-08-24 |
Family
ID=54993895
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510713055.1A Active CN105224935B (en) | 2015-10-28 | 2015-10-28 | Real-time face keypoint localization method based on the Android platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105224935B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105469081A (en) * | 2016-01-15 | 2016-04-06 | 成都品果科技有限公司 | Face key point positioning method and system used for beautifying |
CN106096560A (en) * | 2016-06-15 | 2016-11-09 | 广州尚云在线科技有限公司 | A kind of face alignment method |
CN106960203A (en) * | 2017-04-28 | 2017-07-18 | 北京搜狐新媒体信息技术有限公司 | A kind of facial feature tracking method and system |
CN107463865A (en) * | 2016-06-02 | 2017-12-12 | 北京陌上花科技有限公司 | Face datection model training method, method for detecting human face and device |
CN109635752A (en) * | 2018-12-12 | 2019-04-16 | 腾讯科技(深圳)有限公司 | Localization method, face image processing process and the relevant apparatus of face key point |
CN109800635A (en) * | 2018-12-11 | 2019-05-24 | 天津大学 | A kind of limited local facial critical point detection and tracking based on optical flow method |
CN110781765A (en) * | 2019-09-30 | 2020-02-11 | 腾讯科技(深圳)有限公司 | A human body gesture recognition method, device, equipment and storage medium |
CN111667403A (en) * | 2020-07-02 | 2020-09-15 | 北京爱笔科技有限公司 | Method and device for generating face image with shielding |
WO2020248789A1 (en) * | 2019-06-11 | 2020-12-17 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method and system for facial landmark detection using facial component-specific local refinement |
CN112115845A (en) * | 2020-09-15 | 2020-12-22 | 中山大学 | Active shape model parameterization method for face key point detection |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8837839B1 (en) * | 2010-11-03 | 2014-09-16 | Hrl Laboratories, Llc | Method for recognition and pose estimation of multiple occurrences of multiple objects in visual images |
CN103824049A (en) * | 2014-02-17 | 2014-05-28 | 北京旷视科技有限公司 | Cascaded neural network-based face key point detection method |
CN104063700B (en) * | 2014-07-04 | 2017-08-18 | 武汉工程大学 | A Method for Eye Center Location in Frontal Face Images with Natural Illumination |
CN104899575A (en) * | 2015-06-19 | 2015-09-09 | 南京大学 | Human body assembly dividing method based on face detection and key point positioning |
- 2015-10-28: Application CN201510713055.1A filed; granted as patent CN105224935B (status: active)
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105469081B (en) * | 2016-01-15 | 2019-03-22 | 成都品果科技有限公司 | A kind of face key independent positioning method and system for U.S. face |
CN105469081A (en) * | 2016-01-15 | 2016-04-06 | 成都品果科技有限公司 | Face key point positioning method and system used for beautifying |
CN107463865B (en) * | 2016-06-02 | 2020-11-13 | 北京陌上花科技有限公司 | Face detection model training method, face detection method and device |
CN107463865A (en) * | 2016-06-02 | 2017-12-12 | 北京陌上花科技有限公司 | Face datection model training method, method for detecting human face and device |
CN106096560A (en) * | 2016-06-15 | 2016-11-09 | 广州尚云在线科技有限公司 | A kind of face alignment method |
CN106960203A (en) * | 2017-04-28 | 2017-07-18 | 北京搜狐新媒体信息技术有限公司 | A kind of facial feature tracking method and system |
CN109800635A (en) * | 2018-12-11 | 2019-05-24 | 天津大学 | A kind of limited local facial critical point detection and tracking based on optical flow method |
CN109635752A (en) * | 2018-12-12 | 2019-04-16 | 腾讯科技(深圳)有限公司 | Localization method, face image processing process and the relevant apparatus of face key point |
WO2020248789A1 (en) * | 2019-06-11 | 2020-12-17 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method and system for facial landmark detection using facial component-specific local refinement |
US20220092294A1 (en) * | 2019-06-11 | 2022-03-24 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method and system for facial landmark detection using facial component-specific local refinement |
CN110781765A (en) * | 2019-09-30 | 2020-02-11 | 腾讯科技(深圳)有限公司 | A human body gesture recognition method, device, equipment and storage medium |
CN110781765B (en) * | 2019-09-30 | 2024-02-09 | 腾讯科技(深圳)有限公司 | A human body posture recognition method, device, equipment and storage medium |
CN111667403A (en) * | 2020-07-02 | 2020-09-15 | 北京爱笔科技有限公司 | Method and device for generating face image with shielding |
CN112115845A (en) * | 2020-09-15 | 2020-12-22 | 中山大学 | Active shape model parameterization method for face key point detection |
CN112115845B (en) * | 2020-09-15 | 2023-12-29 | 中山大学 | Active shape model parameterization method for face key point detection |
Also Published As
Publication number | Publication date |
---|---|
CN105224935B (en) | 2018-08-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105224935B (en) | Real-time face keypoint localization method based on the Android platform | |
US11842487B2 (en) | Detection model training method and apparatus, computer device and storage medium | |
Wang et al. | Appearance-based gaze estimation using deep features and random forest regression | |
US20230237771A1 (en) | Self-supervised learning method and apparatus for image features, device, and storage medium | |
WO2020260936A1 (en) | Medical image segmentation using an integrated edge guidance module and object segmentation network | |
Zhou et al. | A lightweight hand gesture recognition in complex backgrounds | |
CN106296734B (en) | Method for tracking target based on extreme learning machine and boosting Multiple Kernel Learnings | |
CN107633226A (en) | A kind of human action Tracking Recognition method and system | |
CN109685830B (en) | Target tracking method, device and equipment and computer storage medium | |
Demirkus et al. | Hierarchical temporal graphical model for head pose estimation and subsequent attribute classification in real-world videos | |
CN112598031A (en) | Vegetable disease detection method and system | |
Pérez-Villar et al. | Spacecraft pose estimation based on unsupervised domain adaptation and on a 3d-guided loss combination | |
Wu et al. | A deep residual convolutional neural network for facial keypoint detection with missing labels | |
Zhang et al. | Multi-local-task learning with global regularization for object tracking | |
Azaza et al. | Context proposals for saliency detection | |
Wang et al. | Multistage model for robust face alignment using deep neural networks | |
Park et al. | Unified convolutional neural network for direct facial keypoints detection | |
Al Jibory et al. | Age estimation utilizing deep learning Convolutional Neural Network | |
Zhang et al. | A dynamic detection and data association method based on probabilistic models for visual SLAM | |
Abid et al. | Computationally intelligent real-time security surveillance system in the education sector using deep learning | |
Lu et al. | Visual tracking via probabilistic hypergraph ranking | |
CN102156879A (en) | Human target matching method based on weighted terrestrial motion distance | |
Deng et al. | HAD-YOLO: An Accurate and Effective Weed Detection Model Based on Improved YOLOV5 Network | |
Fang et al. | MR-CapsNet: a deep learning algorithm for image-based head pose estimation on CapsNet | |
Wang et al. | Mitigating imbalances in heterogeneous feature fusion for multi-class 6D pose estimation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CP02 | Change in the address of a patent holder | Address after: 210000 No. 219 Ning six road, Jiangbei new district, Nanjing, Jiangsu. Patentee after: NANJING University OF INFORMATION SCIENCE & TECHNOLOGY. Address before: 210000 No. 69 Olympic Sports street, Jianye District, Jiangsu, Nanjing. Patentee before: NANJING University OF INFORMATION SCIENCE & TECHNOLOGY |