CN112115845B - Active shape model parameterization method for face key point detection - Google Patents

Publication number: CN112115845B (granted; published as application CN112115845A)
Application number: CN202010968975.9A
Country: China (CN)
Inventors: 苟超 (Gou Chao), 卓莹 (Zhuo Ying)
Applicant/Assignee: Sun Yat-sen University
Legal status: Active

Classifications

    • G06V40/161 — Human faces: detection; localisation; normalisation
    • G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/2411 — Classification based on the proximity to a decision surface, e.g. support vector machines
    • G06V40/168 — Human faces: feature extraction; face representation
    • G06V40/172 — Human faces: classification, e.g. identification


Abstract

The invention provides an active shape model parameterization method for face key point detection. The face key points are encoded with active shape model parameters, so that their positions can be predicted in a new space and the solution for the key point positions becomes compact; a cascade regression method then updates the active shape model parameters from local features, which improves running efficiency and detection accuracy and strengthens the robustness of the model. The method offers fast detection, accurate matching and strong robustness, effectively addresses both the low running efficiency of parameterized methods such as the active shape model and the over-fitting of non-parameterized methods such as cascade regression, and has high practical value.

Description

Active shape model parameterization method for face key point detection
Technical Field
The invention relates to the field of computer vision, in particular to an active shape model parameterization method for face key point detection.
Background
Face key points are salient, discriminative facial features that can serve as anchor points, such as the eye corners, mouth corners and nose tip. Face key point detection plays an important role in face-related problems such as face recognition, three-dimensional face reconstruction, expression analysis and gaze estimation, and is a precondition and key step for this line of research. Face key point detection methods can be divided into holistic methods, locally constrained methods and regression-based methods. Holistic methods, exemplified by the active appearance model, detect face key points mainly from the overall facial appearance and the overall face shape. Locally constrained methods estimate the key point positions from a global face shape model together with independent local appearance information around each key point. Regression-based methods do not build an explicit face model, but directly learn the mapping from image appearance to key point locations; they can be further divided into direct regression, cascade regression and deep-learning-based regression. Direct regression needs no initialization and infers the key point positions in a single step; cascade regression initializes the key point positions and then, in each iteration, learns a regression model that maps the local discriminative features around the key points toward their true positions; deep-learning-based regression mainly applies deep learning within a direct-regression or cascade-regression framework.
When classified by parameterization, face key point detection methods fall into parameterized and non-parameterized methods: the active shape model, the active appearance model and locally constrained methods are parameterized, while cascade regression, deep learning and the like are non-parameterized.
The active shape model is a parameterized face key point detection method: it parameterizes the face key points into a face model, uses principal component analysis to reduce the model's dimensionality and constrain its shape variation, and then adjusts the parameters to match the key point locations of a given face image according to detected local features. The active shape model is simple to understand, and its clear framework is convenient to apply. Several studies have improved the generalization ability of active shape models, for example by introducing local binary pattern (LBP) features, enhanced wavelet features or mutual-information features. However, the near-exhaustive search used to localize key points in the active shape model limits its running efficiency. For this reason, some work has used regression to learn the relationship between the local neighborhood appearance and the actual key point locations in place of the original local feature detection: Seise et al. in "Learning active shape models for bifurcating contours" update the face key point positions with a relevance vector machine (RVM) regression model, and Wimmer et al. in "Learning robust objective functions for model fitting in image understanding applications" train model trees to regress from local Haar wavelet features to an objective function that peaks at the true key point locations.
As a non-parameterized method, cascade regression learns a regression model in each iteration that maps the local discriminative features around the key points toward their true positions. Improvements to cascade regression have mainly concerned the choice of local features and of the regression model used at each iteration. The supervised descent method (SDM) introduced by Xiong et al. in "Supervised descent method and its applications to face alignment" and the local binary feature regression method (LBF) introduced by Ren et al. in "Face alignment at 3000 fps via regressing local binary features" both use linear regression models at each iteration: the supervised descent method directly uses shape-indexed SIFT features as input, while the local binary feature method uses sparse binary neighborhood features learned with a random forest regression model. The discriminative response map fitting method (DRMF) proposed by Asthana et al. in "Robust discriminative response map fitting with constrained local models" uses histogram of oriented gradients (HOG) features and a support vector regression (SVR) model.
In recent years, deep learning methods built on the cascade regression framework have developed rapidly in face key point detection. For example, Zhou et al. in "Extensive facial landmark localization with coarse-to-fine convolutional network cascade" combine a coarse-to-fine cascade framework with geometric refinement and propose deep convolutional neural networks (DCNN) to detect 68 face key points. Zhang et al. in "Coarse-to-fine auto-encoder networks (CFAN) for real-time face alignment" propose coarse-to-fine auto-encoder networks (CFAN) that cascade four stacked auto-encoder networks (SANs), each SAN learning the mapping from face image to face shape at a different image scale, based on the shape estimate of the previous SAN. In "DeCaFA: Deep convolutional cascade for face alignment in the wild", Dapogny et al. introduce an end-to-end deep convolutional cascade that incorporates key point attention maps and intermediate supervision for face key point detection. As non-parameterized methods, regression-based methods are more flexible than parameterized methods such as the active shape model, but parameterized methods generally consume fewer computational resources, whereas the resource consumption of non-parameterized methods grows with the number of samples. Moreover, for deep-learning-based regression, key point detection performance depends heavily on the scale of the training set, and the model is more prone to over-fitting.
The patent specification with application number 201910986156.4 discloses a face key point detection method and system based on deep learning: a face key point detection network built on the MobileNet V1 architecture takes a face image as input and outputs face key point coordinates; the network is trained on the Kaggle Facial Keypoints Detection data set; an image of the face to be detected is collected; the face region is obtained from the image with an OpenCV cascade classifier; the face region is fed into the trained network to obtain the left-eye and nose-tip coordinates; and the coordinates of the right eye and of the left and right corners in the image are then computed and the key points marked. However, that patent cannot predict the face key point positions in a new space, so the solution for the key point positions does not become compact and the running efficiency is not improved.
Disclosure of Invention
The invention provides an active shape model parameterization method for face key point detection that predicts the face key point positions in a new space, making the solution for the key point positions compact and improving running efficiency.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
an active shape model parameterization method for face key point detection, comprising the following steps:
s1: initializing active shape model parameters by using sample data of a face key point data set;
s2: updating the active shape model parameters by using a cascade regression method until the iteration number reaches the maximum value;
s3: decode the final active shape model parameters from step S2 to obtain the corresponding face key point positions.
Further, the specific process of the step S1 is:
s11: taking the pose of the first face in the face key point data set as the reference, sequentially read the pose-related parameters of each face in the data set and align it to the first face, eliminating the influence of face pose within the data set;
s12: apply principal component analysis to the aligned face images from step S11 to obtain the 136-dimensional average face x̄ of the face key point data set, together with the retained eigenvalues λ and eigenvectors Λ; the dimension of λ is determined by the energy proportion retained in the principal component analysis; with 95% of the energy retained, λ is a 7-dimensional vector and Λ has dimension 136 × 7;
s13: initialize the shape parameter d as a zero vector and, from the average face x̄ and the eigenvector matrix Λ obtained in step S12, initialize the face key point positions x_0 as the current key point estimate x_temp;
S14: from the key point ground truth x* in the face key point data set and the current estimate x_temp, solve the pose parameters {θ, s, t_x, t_y}, where θ is the rotation angle, s the scaling factor, and t_x, t_y the horizontal and vertical translations;
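The pose parameters {θ, s, t_x, t_y} of step S14 can be solved in closed form as a 2-D similarity (Procrustes) fit between the current estimate and the ground-truth keypoints. The numpy sketch below (function name and test data are illustrative) recovers a known transform exactly from noise-free correspondences.

```python
import numpy as np

def solve_pose(src, dst):
    """Closed-form least-squares similarity transform: find {theta, s, tx, ty}
    such that s * R(theta) @ p + t best maps src points onto dst points.
    src, dst: (N, 2) arrays of corresponding keypoints."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, ys = (src - mu_s).T
    xd, yd = (dst - mu_d).T
    norm = (xs**2 + ys**2).sum()
    a = (xs * xd + ys * yd).sum() / norm        # a = s * cos(theta)
    b = (xs * yd - ys * xd).sum() / norm        # b = s * sin(theta)
    s = np.hypot(a, b)
    theta = np.arctan2(b, a)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    t = mu_d - s * R @ mu_s
    return theta, s, t[0], t[1]

# Recover a known transform from synthetic 68-point correspondences.
rng = np.random.default_rng(1)
pts = rng.normal(size=(68, 2))
theta0, s0, t0 = 0.3, 1.4, np.array([5.0, -2.0])
R0 = np.array([[np.cos(theta0), -np.sin(theta0)],
               [np.sin(theta0),  np.cos(theta0)]])
moved = s0 * pts @ R0.T + t0
theta, s, tx, ty = solve_pose(pts, moved)
```

Because the correspondences are exact, the estimate equals the generating transform; on real data the same formulas give the least-squares pose.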
s15: from the key point ground truth x* of the face key point data set, the average face x̄, the eigenvector matrix Λ and the pose parameters {θ, s, t_x, t_y} obtained in step S14, compute the shape parameter d;
s16: combine the pose parameters {θ, s, t_x, t_y} from step S14 and the shape parameter d from step S15 into the active shape model parameter vector p of the corresponding face in the face key point data set:
p = {θ, s, t_x, t_y, d}
s17: from the active shape model parameters p obtained in step S16 and the average face x̄ and eigenvector matrix Λ obtained in step S12, re-estimate the current key point positions x_temp and compute their error GTE against the key point ground truth x*:
GTE = (1/N) Σ_{i=1}^{N} ||x_temp,i − x*,i|| / e
where N is the number of key points and e is the pupil distance of the corresponding face image;
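The error GTE of step S17 is the mean Euclidean key point error normalized by the pupil distance e, a standard normalized landmark error. A minimal numpy sketch (illustrative function name and toy data):

```python
import numpy as np

def gte(x_est, x_true, e):
    """Mean Euclidean keypoint error normalized by the pupil distance e.
    x_est, x_true: (N, 2) arrays of estimated / ground-truth keypoints."""
    return np.linalg.norm(x_est - x_true, axis=1).mean() / e

# Three toy keypoints; the first estimate is off by a 3-4-5 pixel offset.
x_true = np.array([[10.0, 10.0], [50.0, 10.0], [30.0, 40.0]])
x_est = x_true + np.array([[3.0, 4.0], [0.0, 0.0], [0.0, 0.0]])
err = gte(x_est, x_true, e=40.0)   # pupil distance of 40 px assumed
```

With one point off by 5 px out of three and e = 40, the error is 5 / (3 · 40) ≈ 0.0417, i.e. about 4% of the interpupillary distance.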
s18: repeat steps S14-S17 until the error GTE between the current estimate x_temp and the ground truth x* is below the threshold GTE_thresh; the active shape model parameters at that point are the parameters p* corresponding to the face image;
S19: average the active shape model parameters p* of all face images in the face key point data set to obtain the initial active shape model parameter value p_0.
Further, the active shape model parameters are updated with the cascade regression method until the number of iterations reaches its maximum; for a face image whose key point positions are to be predicted, the cascade regression method repeatedly updates the initial active shape model parameters to obtain the parameters p_pre corresponding to that image.
Further, the specific process of step S2 is:
s21: read the face image and convert it to grey scale; scale the image using the centre of the detected face region as anchor point and crop it to a grey image of uniform width 241, denoted I;
s22: read the initial active shape model parameter value p_0 obtained in step S1 as the current parameters p_temp;
S23: from the current parameters p_temp and the average face x̄ and eigenvector matrix Λ obtained in step S12, estimate the current face key points x_temp;
S24: from the current key point estimate x_temp of step S23 and the uniformly scaled and cropped grey image I of step S21, extract the local features Φ near each current key point; the extracted local features are SIFT features;
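Step S24 extracts SIFT descriptors around each key point (in practice via an off-the-shelf SIFT implementation such as OpenCV's). As a dependency-free illustration of "local feature near a keypoint", the numpy sketch below computes a simplified gradient-orientation histogram over a 16×16 patch at each keypoint and concatenates them into Φ; it is a stand-in for SIFT, not an implementation of it, and all names are illustrative.

```python
import numpy as np

def patch_descriptor(img, x, y, half=8, bins=8):
    """Simplified local descriptor: a magnitude-weighted histogram of gradient
    orientations over a (2*half x 2*half) patch centred on keypoint (x, y),
    L2-normalized. A crude stand-in for the SIFT features of step S24."""
    x, y = int(round(x)), int(round(y))
    patch = img[y - half:y + half, x - half:x + half].astype(float)
    gy, gx = np.gradient(patch)                  # row (y) and column (x) gradients
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)  # orientations in [0, 2*pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist

# Feature vector Phi: the descriptors of all current keypoints, concatenated.
rng = np.random.default_rng(2)
I = rng.integers(0, 256, size=(241, 241))        # stand-in 241-wide grey image
keypoints = np.array([[60.0, 80.0], [120.0, 90.0], [180.0, 150.0]])
Phi = np.concatenate([patch_descriptor(I, px, py) for px, py in keypoints])
```

Concatenating per-keypoint descriptors into one long vector is exactly the shape-indexed feature layout that the cascade's regressor consumes in step S25.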
s25: from the local features Φ near the face key points obtained in step S24, compute the increment Δp of the current active shape model parameters p_temp with the regression model R, here a linear model:
Δp = R(Φ(x_temp, I))
s26: add the increment Δp obtained in step S25 to update the current parameters p_temp, and repeat step S23 to re-estimate the current key points x_temp; iterate steps S23-S26 until the cascade regression reaches the maximum number of iterations T; the parameters p_temp of the last iteration are the final active shape model parameters p_T of the face image whose key points are to be predicted; T is set to 6.
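One cascade stage (steps S25-S26) can be sketched as a least-squares-trained linear regressor mapping features to parameter increments. The synthetic, noise-free data below uses illustrative dimensions (an 11-dim p = {θ, s, t_x, t_y, d} with a 7-dim d, 40-dim features) so that the fit recovers the generating matrix exactly.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic training set: feature vectors Phi and the parameter increments
# Delta-p that would move each sample's parameters onto the ground truth.
n, feat_dim, p_dim = 500, 40, 11           # p = {theta, s, tx, ty, d (7-dim)}
Phi_train = rng.normal(size=(n, feat_dim))
R_true = rng.normal(size=(p_dim, feat_dim))
dp_train = Phi_train @ R_true.T            # exact linear relation, no noise

# Step S25: fit the linear regression model R by least squares, mapping
# local features to parameter increments (bias term omitted for brevity).
sol, *_ = np.linalg.lstsq(Phi_train, dp_train, rcond=None)
R_fit = sol.T                              # shape (p_dim, feat_dim)

# Step S26: apply the predicted increment to the current parameters.
p_temp = np.zeros(p_dim)
phi = rng.normal(size=feat_dim)
p_temp = p_temp + R_fit @ phi              # one cascade update
```

In the real method a separate R is trained per iteration (T = 6 of them), each on features recomputed at the keypoint estimates produced by the previous stage.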
Further, the specific process of step S3 is as follows:
the final active shape model parameters p_T from step S2 are decoded to obtain the predicted face key point positions x_pre:
x_pre = M(s, θ)[x̄ + P d] + t
where P is the orthogonal basis of the principal component space obtained by principal component analysis in step S12, t = {t_x, t_y} is the translation of the face image whose key points are to be predicted in the horizontal and vertical directions, and the M() function applies the scaling and in-plane rotation to the face shape.
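The decoding of step S3 can be sketched directly in numpy; the toy mean shape, basis and parameter values below are illustrative, chosen so the result can be checked by hand.

```python
import numpy as np

def decode_asm(p, x_bar, P):
    """Decode active shape model parameters p = {theta, s, tx, ty, d} into
    2-D keypoints: x_pre = M(s, theta)[x_bar + P d] + t."""
    theta, s, tx, ty = p[:4]
    d = p[4:]
    shape = (x_bar + P @ d).reshape(-1, 2)          # keypoints in model space
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return s * shape @ R.T + np.array([tx, ty])     # M(), then translation t

# Toy model: 3 keypoints (6-dim shape vectors), 2 principal modes that
# perturb the first two coordinates only.
x_bar = np.array([0.0, 0.0, 2.0, 0.0, 1.0, 2.0])
P = np.eye(6)[:, :2]
p = np.array([0.0, 2.0, 10.0, 20.0, 0.5, -0.5])     # theta=0, s=2, t=(10, 20)
x_pre = decode_asm(p, x_bar, P)
```

With θ = 0 the rotation is the identity, so each model-space point is simply doubled and shifted by (10, 20), giving keypoints (11, 19), (14, 20) and (12, 24).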
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention uses the active shape model parameter to encode the key point of the human face, so that the position of the key point of the human face can be predicted in a new space, and the solving of the position of the key point of the human face becomes compact; the cascade regression method is used for updating the active shape model parameters by utilizing local characteristics, so that the operation efficiency and the detection precision are improved, and the robustness of the model is enhanced. The method has the advantages of high detection speed, high matching precision, strong robustness and the like, effectively solves the problems of low operation efficiency of parameterization methods such as an active shape model and the like and the over-fitting problem of non-parameterization methods such as cascade regression and the like, and has higher practical application value.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a calculation flow of corresponding active shape model parameters obtained from a face image;
FIG. 3 shows the face key point estimates of a sample at each cascade regression iteration;
FIG. 4 is a flow chart of cascade regression updating of active shape model parameters;
fig. 5 is a comparison of ground-truth and predicted face key points.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions;
it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
As shown in fig. 1, the present invention provides an active shape model parameterization method for face key point detection, which obtains the face key point positions by decoding the active shape model parameters. The specific implementation steps of this embodiment are as follows:
step 1, initializing active shape model parameters by using sample data of a face key point data set;
the face key point data set is from a part of the disclosed 300-W data set, and comprises 811 face pictures of a training set in the LFPW data set, 2000 face pictures of the training set in the Helen data set and 337 face pictures in the AFW data set, which are 3148 in total. And labeling 68 face key points for each face image in the face key point data set.
The initialization process of the active shape model parameters specifically comprises the following steps:
and step 11, taking the gesture of the first face in the face key point data set as the reference, sequentially reading parameters related to the gesture in each face in the data set, aligning the first face, and eliminating the influence of the gesture of the face in the data set.
Step 12, using principal component analysis method to the face images aligned in step 11, returning to the average face of 136-dimensional vector in the face key point data setAnd the eigenvalue λ and eigenvector Λ reserved. Special purposeThe dimension of the eigenvalue λ is determined by the energy proportion retained in the principal component analysis. The energy ratio remained in this embodiment is 95%, the corresponding eigenvalue λ is a 7-dimensional vector, and the dimension of the eigenvector Λ is 136×7;
Step 13: initialize the shape parameter d as a zero vector and, from the average face x̄ and the eigenvector matrix Λ obtained in step 12, initialize the face key point positions x_0 as the current key point estimate x_temp;
Step 14: from the key point ground truth x* in the face key point data set and the current estimate x_temp, solve the pose parameters {θ, s, t_x, t_y}, where θ is the rotation angle, s the scaling factor, and t_x, t_y the horizontal and vertical translations;
Step 15: from the key point ground truth x*, the average face x̄, the eigenvector matrix Λ and the pose parameters {θ, s, t_x, t_y} obtained in step 14, compute the shape parameter d;
Step 16: combine the pose parameters {θ, s, t_x, t_y} from step 14 and the shape parameter d from step 15 into the active shape model parameter vector p of the corresponding face in the data set:
p = {θ, s, t_x, t_y, d}
Step 17: from the active shape model parameters p obtained in step 16 and the average face x̄ and eigenvector matrix Λ obtained in step 12, re-estimate the current key point positions x_temp and compute their error GTE against the key point ground truth x*:
GTE = (1/N) Σ_{i=1}^{N} ||x_temp,i − x*,i|| / e
where N is the number of key points and e is the pupil distance of the corresponding face image;
Step 18: repeat steps 14-17 until the error GTE between the current estimate x_temp and the ground truth x* is below the threshold GTE_thresh. The active shape model parameters at that point are the parameters p* corresponding to the face image;
Step 19: the flow for obtaining the active shape model parameters from a face image is shown in fig. 2. Average the parameters p* of all face images in the face key point data set to obtain the initial active shape model parameter value p_0.
Step 2, update the active shape model parameters with the cascade regression method until the number of iterations reaches its maximum;
for a face image whose key point positions are to be predicted, the cascade regression method repeatedly updates the initial active shape model parameters to obtain the parameters p_pre corresponding to that image. The procedure is as follows:
Step 21: read the face image and convert it to grey scale. Then scale the image using the centre of the detected face region as anchor point and crop it to a grey image of uniform width 241, denoted I;
Step 22: read the initial active shape model parameter value p_0 obtained in step 1 as the current parameters p_temp;
Step 23: from the current parameters p_temp and the average face x̄ and eigenvector matrix Λ obtained in step 12, estimate the current face key points x_temp;
Step 24: from the current key point estimate x_temp of step 23 and the uniformly scaled and cropped grey image I of step 21, extract the local features Φ near each current key point. The local features extracted here are SIFT features;
Step 25: from the local features Φ near the face key points obtained in step 24, compute the increment Δp of the current active shape model parameters p_temp with the regression model R, here a linear model:
Δp = R(Φ(x_temp, I))
Step 26: add the increment Δp obtained in step 25 to update the current parameters p_temp, and repeat step 23 to re-estimate the current key points x_temp;
Step 27: iterate steps 23-26 until the cascade regression reaches the maximum number of iterations T. The parameters p_temp of the last iteration are the final active shape model parameters p_T of the face image whose key points are to be predicted. In this embodiment T is 6. The face key point predictions corresponding to the active shape model parameters obtained by cascade regression are shown in fig. 3.
The process of updating the active shape model parameters using the cascade regression method in step 2 is shown in fig. 4.
Step 3, decode the final active shape model parameters p_T of step 2 to obtain the predicted face key point positions x_pre:
x_pre = M(s, θ)[x̄ + P d] + t
where P is the orthogonal basis of the principal component space obtained by principal component analysis in step 12, t = {t_x, t_y} is the translation of the face image whose key points are to be predicted in the horizontal and vertical directions, and the M() function applies the scaling and in-plane rotation to the face shape. The comparison of predicted and ground-truth face key points is shown in fig. 5, where the green points are the predicted key point positions and the white points the ground-truth positions.
The same or similar reference numerals correspond to the same or similar components;
the positional relationship depicted in the drawings is for illustrative purposes only and is not to be construed as limiting the present patent;
it is to be understood that the above examples of the present invention are provided by way of illustration only and not by way of limitation of the embodiments of the present invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is not necessary here nor is it exhaustive of all embodiments. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are desired to be protected by the following claims.

Claims (8)

1. An active shape model parameterization method for face key point detection is characterized by comprising the following steps:
s1: initializing active shape model parameters by using sample data of a face key point data set;
s2: updating the active shape model parameters by using a cascade regression method until the iteration number reaches the maximum value;
s3: decoding the final result of the active shape model parameters in the step S2 to obtain the corresponding key point positions of the face;
the specific process of the step S1 is as follows:
s11: taking the pose of the first face in the face key point data set as the reference, sequentially read the pose-related parameters of each face in the data set and align it to the first face, eliminating the influence of face pose within the data set;
s12: apply principal component analysis to the face images aligned in step S11 to obtain the average face x̄ of the face images, together with the retained eigenvalues λ and eigenvectors Λ; the dimension of λ is determined by the energy proportion retained in the principal component analysis;
s13: initializing the shape parameter d to be zero vector, and according to the average face obtained in step S12And feature vector lambda, initializing the position x of key points of the human face 0 As the current face key point estimated value x temp
S14: according to key point true value x in face key point data set * And the current face key point estimated value x temp Solving the attitude parameters { theta, s, t } x ,t y Wherein θ represents a rotation angle, s is a scaling ratio, t x ,t y The translation amounts in the horizontal direction and the vertical direction respectively;
S15: computing the shape parameter d from the key point ground truth x* of the face key point data set, the average face x̄ and the eigenvectors Λ obtained in step S12, and the pose parameters {θ, s, t_x, t_y} obtained in step S14;
S16: combining the pose parameters {θ, s, t_x, t_y} obtained in step S14 with the shape parameter d obtained in step S15 to form the active shape model parameter p of the corresponding face in the face key point data set:

p = {θ, s, t_x, t_y, d}
S17: re-estimating the current face key point positions x_temp from the active shape model parameter p obtained in step S16 and the average face x̄ and eigenvectors Λ obtained in step S12, and computing the error GTE between the estimate and the corresponding key point ground truth x*:

GTE = (1/N) Σ_{i=1}^{N} ||x_temp^(i) − x*^(i)||₂ / e

where N is the number of key points and e is the inter-pupil distance of the corresponding face image;
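The normalized error above can be sketched as follows (an illustrative NumPy implementation of the assumed mean point-to-point form, not code from the patent; array shapes are assumptions):

```python
import numpy as np

def gte(x_est, x_true, e):
    # Mean Euclidean distance over the N key points (rows of (N, 2)
    # arrays), normalized by the inter-pupil distance e of the image.
    return np.mean(np.linalg.norm(x_est - x_true, axis=1)) / e
```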
S18: repeating steps S14–S17 until the error GTE between the current face key point estimate x_temp and the key point ground truth x* is less than the error threshold GTE_thresh; the active shape model parameters at that point are the active shape model parameters p* corresponding to the face image;
S19: averaging the active shape model parameters p* corresponding to all face images in the face key point data set to obtain the initial value p_0 of the active shape model parameters;
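The pose solve of step S14 and the shape-parameter solve of step S15 can be sketched with a standard 2-D similarity Procrustes fit (an illustrative formulation under assumed conventions: row-vector points, orthonormal PCA basis P of size 2N×k; function names are not from the patent):

```python
import numpy as np

def solve_pose(x_star, x_temp):
    # Least-squares similarity transform (theta, s, t) mapping the
    # current estimate x_temp onto the ground truth x_star, both (N, 2).
    mu_t, mu_s = x_temp.mean(axis=0), x_star.mean(axis=0)
    a, b = x_temp - mu_t, x_star - mu_s
    # 2-D Procrustes in complex form: b ~ s * e^{i*theta} * a
    za = a[:, 0] + 1j * a[:, 1]
    zb = b[:, 0] + 1j * b[:, 1]
    w = np.vdot(za, zb) / np.vdot(za, za)
    s, theta = np.abs(w), np.angle(w)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    t = mu_s - s * mu_t @ R.T
    return theta, s, t

def solve_shape(x_star, mean_vec, P, theta, s, t):
    # Undo the similarity transform, then project onto the PCA basis P
    # to obtain the shape parameter d (step S15).
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    x_model = (x_star - t) @ R / s   # inverse rotation and scale
    return P.T @ (x_model.ravel() - mean_vec)
```

Iterating these two solves until the GTE drops below the threshold corresponds to the S14–S18 loop.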
The specific process of step S2 is:
S21: reading a face image, converting it to a grayscale image, scaling the image about the center point of the detected face region as the anchor, and cropping it to a grayscale image of uniform width, denoted I;
S22: reading the initial value p_0 of the active shape model parameters obtained in step S1 as the current active shape model parameter p_temp;
S23: estimating the current face key points x_temp from the current active shape model parameter p_temp and the average face x̄ and eigenvectors Λ obtained in step S12;
S24: mapping the current face key point estimate x_temp obtained in step S23 onto the uniformly scaled and cropped grayscale image I of step S21, and extracting the local feature Φ near each current face key point, the extracted local features being SIFT features;
S25: computing the increment Δp of the current active shape model parameter p_temp from the local features Φ near the face key points obtained in step S24, using the regression model R, which here is a linear model:

Δp = R(Φ(x_temp, I))
S26: adding the increment Δp obtained in step S25 to update the current active shape model parameter p_temp, and repeating step S23 to estimate the corresponding current face key points x_temp.
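The cascaded update of steps S23–S26 can be sketched as follows (an illustrative loop; the decoder, feature extractor, and pre-trained per-stage regressors are stand-ins passed in as callables, not part of the claim):

```python
import numpy as np

def cascade_fit(I, p0, regressors, decode, extract_features):
    # Steps S23-S26: repeatedly decode key points from the current
    # parameters, extract local features around them, and apply a
    # stage-specific linear regressor to increment the parameters.
    p = np.array(p0, dtype=float)
    for R_t in regressors:                 # T stages, e.g. T = 6
        x_temp = decode(p)                 # S23: current key points
        phi = extract_features(I, x_temp)  # S24: e.g. SIFT descriptors
        p = p + R_t @ phi                  # S25-S26: p += delta p
    return p
```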
2. The method according to claim 1, wherein in step S12 the returned average face x̄ of the face images is specifically a 136-dimensional vector computed over the face image key point data set.
3. The method according to claim 2, wherein in step S12 the retained energy proportion is 95%, the corresponding eigenvalue vector λ is 7-dimensional, and the eigenvector matrix Λ has dimensions 136×7.
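The retained-energy rule of claim 3 (95% retained energy yielding a 7-dimensional λ) amounts to picking the smallest number of leading eigenvalues whose cumulative sum reaches the target proportion; a minimal sketch:

```python
import numpy as np

def n_components_for_energy(eigvals, keep=0.95):
    # Smallest k such that the k largest eigenvalues account for at
    # least `keep` of the total variance (retained energy).
    vals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    ratios = np.cumsum(vals) / vals.sum()
    return int(np.searchsorted(ratios, keep) + 1)
```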
4. The active shape model parameterization method for face key point detection according to claim 3, wherein the active shape model parameters are updated by the cascade regression method until the number of iterations reaches the maximum value; for a face image whose face key point positions are to be predicted, the initial values of the active shape model parameters are iteratively updated by the cascade regression method to obtain the active shape model parameters p_pre corresponding to that image.
5. The active shape model parameterization method for face key point detection according to claim 1, wherein steps S23–S26 are iterated until the number of cascade regression iterations reaches the maximum value T; the current active shape model parameter p_temp of the last iteration is the final active shape model parameter p_T of the face image whose key point positions are to be predicted.
6. The method for parameterizing an active shape model for face keypoint detection of claim 5, wherein T has a value of 6.
7. The method for parameterizing an active shape model for face keypoint detection according to claim 6, wherein the specific process of step S3 is:
final result p of active shape model parameters in step S2 T Decoding to obtain predicted key point position x of human face pre
Wherein P is an orthogonal base of the principal component space obtained by the principal component analysis method in step S12; t= { t x ,t y The M () function represents the scaling and planar rotation of the selected face shape, representing the amount of translation of the face image in the horizontal and vertical directions of the key point position to be predicted.
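The decoding x_pre = M(s, θ)[x̄ + P d] + t can be sketched as follows (an illustrative implementation assuming row-vector points and a dict of parameters; not code from the patent):

```python
import numpy as np

def decode_keypoints(p, mean_vec, P):
    # Rebuild the model-frame shape from the PCA basis, then apply the
    # similarity transform M(s, theta) and the translation t.
    theta, s = p["theta"], p["s"]
    shape = (mean_vec + P @ p["d"]).reshape(-1, 2)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return s * shape @ R.T + np.array([p["tx"], p["ty"]])
```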
8. The method according to claim 7, wherein in step S21 the face image is uniformly cropped to a width of 241.
CN202010968975.9A 2020-09-15 2020-09-15 Active shape model parameterization method for face key point detection Active CN112115845B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010968975.9A CN112115845B (en) 2020-09-15 2020-09-15 Active shape model parameterization method for face key point detection


Publications (2)

Publication Number Publication Date
CN112115845A CN112115845A (en) 2020-12-22
CN112115845B true CN112115845B (en) 2023-12-29

Family

ID=73802688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010968975.9A Active CN112115845B (en) 2020-09-15 2020-09-15 Active shape model parameterization method for face key point detection

Country Status (1)

Country Link
CN (1) CN112115845B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224935A (en) * 2015-10-28 2016-01-06 南京信息工程大学 A kind of real-time face key point localization method based on Android platform
CN105787448A (en) * 2016-02-28 2016-07-20 南京信息工程大学 Facial shape tracking method based on space-time cascade shape regression
CN106096560A (en) * 2016-06-15 2016-11-09 广州尚云在线科技有限公司 A kind of face alignment method
CN106682598A (en) * 2016-12-14 2017-05-17 华南理工大学 Multi-pose facial feature point detection method based on cascade regression
WO2018107979A1 (en) * 2016-12-14 2018-06-21 华南理工大学 Multi-pose human face feature point detection method based on cascade regression
CN108399373A (en) * 2018-02-06 2018-08-14 北京达佳互联信息技术有限公司 The model training and its detection method and device of face key point
CN110991258A (en) * 2019-11-11 2020-04-10 华南理工大学 Face fusion feature extraction method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Boosted Regression Active Shape Models; David Cristinacce et al.; BMVC - Proc. Br. Mach. Vis. Conf.; sections 1–3 and 5 *
New Parameterized Statistical Models and Their Applications: Face Image Understanding as an Example; Feng Zhenhua; CNKI dissertations; section 2.1 *
Cue-Guided Facial Motion Capture and Virtual Data Reuse; Zhang Chao et al.; Computer Science, vol. 45, no. 6A; section 3.2 *


Similar Documents

Publication Publication Date Title
Wang et al. Facial feature point detection: A comprehensive survey
Ozuysal et al. Fast keypoint recognition using random ferns
Feng et al. Adaptive unsupervised multi-view feature selection for visual concept recognition
Ghiasi et al. Occlusion coherence: Localizing occluded faces with a hierarchical deformable part model
Gao et al. 3-D object retrieval and recognition with hypergraph analysis
Wu et al. Robust facial landmark detection under significant head poses and occlusion
Wang et al. Shape matching and classification using height functions
AU734055B2 (en) Principle component analysis of image/control-point location coupling for the automatic location of control points
Xiong et al. Supervised descent method for solving nonlinear least squares problems in computer vision
Li et al. A comprehensive survey on 3D face recognition methods
JP5345109B2 (en) Method for normalizing displaceable features of objects in images
WO2007053469A2 (en) Discriminative motion modeling for human motion tracking
US20100066760A1 (en) Systems and methods for enhancing symmetry in 2d and 3d objects
Igual et al. Continuous generalized procrustes analysis
Cheung et al. On deformable models for visual pattern recognition
Lu et al. 3D articulated skeleton extraction using a single consumer-grade depth camera
CN111275005B (en) Drawn face image recognition method, computer-readable storage medium and related device
Fan et al. Smoothness-driven consensus based on compact representation for robust feature matching
Zarbakhsh et al. Low-rank sparse coding and region of interest pooling for dynamic 3D facial expression recognition
CN110991326A (en) Gait recognition method and system based on Gabor filter and improved extreme learning machine
Bąk et al. Deep deformable patch metric learning for person re-identification
CN112115845B (en) Active shape model parameterization method for face key point detection
Abid et al. Dynamic hand gesture recognition from Bag-of-Features and local part model
Lu et al. Active shape model and its application to face alignment
He et al. Topological map learning from outdoor image sequences

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant