CN110096965A - A kind of face identification method based on head pose - Google Patents
A kind of face identification method based on head pose
- Publication number
- CN110096965A (application No. CN201910279003.6A)
- Authority
- CN
- China
- Prior art keywords
- face
- image
- head pose
- head
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Abstract
The present invention provides a face recognition method based on head pose. An end-to-end artificial neural network is designed for multi-task head pose estimation and face recognition, and an improved cosine classification method is used in the classification step. The method extracts head pose information and a face feature vector from a face image with a network model, then judges the extracted head pose: faces that do not meet the pose requirement are deleted directly, while for face images that do meet it, the feature vectors are compared with an improved cosine classifier to obtain the face-comparison similarity. The method is robust to illumination variation, maintains a higher recognition rate than traditional methods, and satisfies the low false-recognition-rate demands of practical applications, giving it broad application prospects.
Description
Technical Field
The invention belongs to the technical field of image processing and pattern recognition, and particularly relates to a face recognition method based on head pose.
Background
Under unconstrained conditions, factors such as pose[1], illumination[2], expression, background, distance, age, and clothing mean that achieving consistently high face recognition performance remains a challenging problem. Therefore, improving the accuracy of face recognition in real, complex scenes while reducing the corresponding false recognition rate is one of the central topics in computer vision.
In recent years, many methods for face recognition and pose estimation have appeared. Schroff et al.[3] use a Siamese CNN to extract face features and then classify them with classifiers such as SVMs, departing from the traditional pipeline. Chan et al.[4] propose an unsupervised feature extraction algorithm that does not require repeated iterative training like a CNN; simple nearest-neighbor classification[5] then already achieves good results. The DeepID series by Sun et al.[6] supervises face verification through the L2 distance between features of convolution layers at different levels, and finds that moderate sparsity and binarization yield strong robustness to occlusion. Wang et al.[7] design an end-to-end neural network for face detection, alignment, and recognition, introduce an interleaved operation to accelerate feature extraction, and finally classify the features with an SVM. For pose estimation, Beyer et al.[8] predict continuous head pose with a VGG-style network of 6 convolution layers (with 24, 24, 48, 48, 64, and 64 channels, including regularization) followed by a 512-unit hidden layer and a periodic direction-regression loss function. Sankha et al.[9] treat the problem as classification of the gaze direction: a deep neural network learns a head classification into 8 classes, and the classification and regression models are then combined to map the 8 classes to an approximate regression confidence in [0, 1].
Bao et al.[10] treat head pose estimation as a classification problem with a convolutional neural network, dividing each of the roll, pitch, and yaw angles into classes over 0-180 degrees, with each class representing one degree of the pose angle.
The premise of the invention is that face recognition is not an isolated problem but is often entangled with other tasks. Analysis of existing face recognition schemes shows that controlling the face pose has a positive effect on reducing the false recognition rate and improving the recognition rate. Therefore, an end-to-end artificial neural network for multi-task head pose estimation and face recognition is designed, and an improved cosine classification method is adopted in the classification process.
Compared with the current approach of first estimating pose and then extracting features based on pose-aware face detection, the face recognition method based on head pose provided by the invention chooses the more time-efficient order of extracting face features before pose estimation and face recognition; measured precisely, head pose estimation and face recognition are completed almost simultaneously. This better matches real-world scenarios, for example ID-card-based real-name verification, which has strict latency requirements. Although directly deleting faces that do not meet the pose requirement raises the false rejection rate, it greatly reduces the more important false recognition rate, which is advantageous in practical applications.
Reference to the literature
[1] Y. Seike and J. Yamaguchi. Face recognition in unrestricted posture using invariant image information[C]//Proceedings of the 9th Symposium on Sensing via Image Information, F-5, 2003: 323-328.
[2] Cui R, Zhang Y N, Hu Y N, et al. Lighting-variant face recognition based on illumination categorization[J]. Jisuanji Gongcheng yu Yingyong (Computer Engineering and Applications), 2010, 46(28): 185-188.
[3] Schroff F, Kalenichenko D, Philbin J. FaceNet: A unified embedding for face recognition and clustering[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 815-823.
[4] Chan T H, Jia K, Gao S, et al. PCANet: A simple deep learning baseline for image classification?[J]. IEEE Transactions on Image Processing, 2015, 24(12): 5017-5032.
[5] Li S Z, Lu J. Face recognition using the nearest feature line method[J]. IEEE Transactions on Neural Networks, 1999, 10(2): 439-443.
[6] Sun Y, Wang X, Tang X. Deeply learned face representations are sparse, selective, and robust[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 2892-2900.
[7] Liu Z, Luo P, Wang X, et al. Deep learning face attributes in the wild[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 3730-3738.
[8] Beyer L, Hermans A, Leibe B. Biternion nets: Continuous head pose regression from discrete training labels[C]//German Conference on Pattern Recognition. Springer, Cham, 2015: 157-168.
[9] Xu X, Kakadiaris I A. Joint head pose estimation and face alignment framework using global and local CNN features[C]//Automatic Face & Gesture Recognition (FG 2017), 2017 12th IEEE International Conference on. IEEE, 2017: 642-649.
[10] Lawrence S, Giles C L, Tsoi A C, et al. Face recognition: A convolutional neural-network approach[J]. IEEE Transactions on Neural Networks, 1997, 8(1): 98-113.
Disclosure of Invention
In order to overcome technical problems of the prior art such as a high false recognition rate and a low recognition rate in real, complex scenes, the invention provides a face recognition method based on head pose. The technical scheme is as follows: the images in the international public face library CASIA-WebFace (nearly 500,000 images) are screened with the head pose estimation library Dlib to obtain frontal face images with pose angles within pitch ±35 degrees and yaw ±30 degrees; a 64-layer residual multi-task convolutional neural network is then designed for classifying the face images and estimating their poses; finally, the convolutional neural network is trained on this large image dataset to obtain a head-pose-aware face recognition network of practical value.
The invention provides a face recognition method based on head pose: a network model extracts head pose information and a face feature vector from a face image; the extracted head pose is then judged, and faces that do not meet the pose requirement are deleted directly; for face images meeting the pose requirement, the feature vectors are compared with an improved cosine classifier to obtain the face-comparison similarity.
The "network model" refers to a neural network of 64 layers of residual modules that combines face feature extraction with head pose angle regression.
The "face image" refers to a face photo aligned by MTCNN and normalized to 96 × 112 size by similarity transformation.
The "head pose information" refers to the pitch, yaw, and roll angles of the face image.
The "face feature vector" refers to a 512-dimensional vector obtained by calculating the "face image" through the model.
"Not meeting the pose requirement" means the pitch angle of the face image is outside ±35 degrees or the yaw angle is outside ±30 degrees; "meeting the pose requirement" means the pitch angle is within ±35 degrees and the yaw angle is within ±30 degrees.
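The pose gate defined above is a simple threshold test; a minimal Python sketch (the threshold values come from the text, the function name is our own):

```python
# Pose gate: keep only faces whose estimated pitch is within +/-35 degrees
# and yaw within +/-30 degrees, as specified in the text. Roll is not gated.
PITCH_LIMIT = 35.0
YAW_LIMIT = 30.0

def meets_pose_requirement(pitch: float, yaw: float) -> bool:
    """Return True if the head pose satisfies the frontal-pose gate."""
    return abs(pitch) <= PITCH_LIMIT and abs(yaw) <= YAW_LIMIT
```

Faces failing this predicate are deleted before any feature comparison, which is what trades a higher false rejection rate for the lower false recognition rate discussed in the Background section.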
The purpose of the invention is realized as follows:
The invention relates to a face recognition method based on head pose, which comprises the following steps:
the method comprises the following steps: carrying out face detection and alignment on images of a face data set (international public face library CASIA-Webface);
step two: according to the five-point coordinates aligned in step one, perform similarity-transformation size normalization to the corresponding coordinates at width 96 and height 112; the five points are: the two eyes, the nose tip, and the two mouth corners;
step three: dividing the face data set image into N types as labels according to different identity IDs;
step four: for the 64-layer residual face-feature-extraction and head-pose-angle regression model, normalize the head pose angles of each image (pitch, yaw, and roll) to the range [-1, 1] as labels;
step five: training a depth model for face recognition and posture estimation by using the finally preprocessed image obtained in the step four;
step six: extracting head posture information and a human face feature vector from the image by using the built depth model, judging the extracted head posture, and directly deleting the human face which does not meet the posture requirement; and comparing the characteristic vectors of the face images meeting the posture requirements by adopting an improved cosine classifier to obtain the similarity of face comparison.
The model architecture of the invention is as shown in Fig. 1. After the training samples are aligned, 127.5 is subtracted from each of the three channel values and the result is divided by 256, and horizontal mirroring is applied with probability 0.5. The model is trained with batch size 64 and learning rate 0.05 for 11 epochs in total, with the learning rate updated in steps: it is first reduced to 1/10 after 4 epochs, and then reduced to 1/10 again every 3 epochs.
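The step schedule described here (base rate 0.05, first drop after 4 epochs, then every 3 epochs over 11 epochs) can be sketched as a small helper function; the function itself is our illustration, not part of the Caffe solver:

```python
def learning_rate(epoch: int, base_lr: float = 0.05) -> float:
    """Step learning-rate schedule from the text: multiply by 0.1
    after epoch 4, then again every 3 epochs (i.e. at epochs 7 and 10)."""
    if epoch < 4:
        return base_lr
    drops = 1 + (epoch - 4) // 3  # one drop at epoch 4, another every 3 epochs
    return base_lr * (0.1 ** drops)
```

Over the 11 training epochs this yields 0.05 for epochs 0-3, 0.005 for 4-6, 0.0005 for 7-9, and 0.00005 for epoch 10.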
Because the deep convolutional neural network structures of the prior art cannot distinguish samples of the various classes well, an angular classification loss function (L_angle) and a pose-angle loss function (L_pose) are added in the last layer. The classification loss is

L_angle = -(1/N) * Σ_{i=1}^{N} log( exp(||x_i|| ψ(θ_{y_i,i})) / ( exp(||x_i|| ψ(θ_{y_i,i})) + Σ_{j≠y_i} exp(||x_i|| cos θ_{j,i}) ) )

where N is the batch size, x_i denotes the feature of the i-th input image, m is a positive integer greater than or equal to 1, y_i denotes the class to which the feature belongs, and θ_{j,i} is the angle between the input feature and the feature vector of class j. The angle is restricted to [0, π/m], because outside this range cos(mθ) is no longer monotonically decreasing (so the loss term for class y_i would not decrease monotonically), yet cos(mθ_1) > cos(θ_2) can still occur when the loss uses cosine angles directly. To avoid this problem, the function

ψ(θ) = (-1)^k cos(mθ) - 2k,  θ ∈ [kπ/m, (k+1)π/m],  k ∈ [0, m-1]

is designed to replace cos(mθ), making the loss monotonically decreasing over the whole interval [0, π]. The pose-angle loss L_pose is a Euclidean regression loss on the normalized pose angles,

L_pose = (1/2N) * Σ_{i=1}^{N} ||p̂_i - p_i||²

where p̂_i is the predicted (pitch, yaw, roll) vector and p_i its label.
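The piecewise monotonic function ψ(θ) can be sketched numerically; this assumes the standard angular-margin form with piece index k ∈ [0, m-1]:

```python
import math

def psi(theta: float, m: int = 4) -> float:
    """Monotonically decreasing replacement for cos(m*theta) on [0, pi]:
    psi(theta) = (-1)^k * cos(m*theta) - 2k
    for theta in [k*pi/m, (k+1)*pi/m], k = 0 .. m-1."""
    k = min(int(theta * m / math.pi), m - 1)  # piece index, clamped at the top
    return (-1) ** k * math.cos(m * theta) - 2 * k
```

With m = 4, ψ decreases continuously from ψ(0) = 1 down to ψ(π) = -(2m-1) = -7, whereas cos(mθ) alone would oscillate over [0, π].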
According to the invention, the angular loss function is added to the network structure and jointly supervised with the pose loss to train a deep-learning face model, so that the features extracted by the trained model are more cohesive within a class and the distance between different classes is larger.
The invention combines an existing metric-learning deep residual network architecture with a multi-task scheme of angular metric learning and pose-angle regression. However, the basic cosine method does not account for the per-dimension average bias introduced into the training features by the model or the data; because of this bias, dimensions with larger bias tend to dominate the similarity calculation. The invention therefore improves the cosine similarity by subtracting the mean of each dimension from the corresponding feature value, as in Equation 2:

sim(x, y) = Σ_i (x_i - m_i)(y_i - m_i) / ( sqrt(Σ_i (x_i - m_i)²) · sqrt(Σ_i (y_i - m_i)²) )    (Equation 2)

wherein x_i, y_i are the i-th dimension feature values of x and y, and m_i is the mean of the i-th dimension feature.
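A minimal NumPy sketch of this adjusted cosine similarity (Equation 2); the per-dimension means m_i would in practice be estimated from the training features, here they are passed in explicitly:

```python
import numpy as np

def adjusted_cosine(x: np.ndarray, y: np.ndarray, mean: np.ndarray) -> float:
    """Cosine similarity after subtracting the per-dimension mean,
    so dimensions with a large average bias no longer dominate."""
    xc, yc = x - mean, y - mean
    return float(np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc)))
```

Like plain cosine similarity, the result lies in [-1, 1], and identical vectors score 1.0 regardless of the mean vector used.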
Compared with the prior art, the invention has the advantages that: acquiring face features and head pose features from a single image reduces the false recognition rate and improves the recognition rate; the method is robust to illumination changes throughout the day and can meet the practicality and timeliness requirements of real, complex scenes.
Drawings
FIG. 1 is a schematic diagram of a model structure of a 64-layer residual multi-task convolutional neural network according to the present invention.
Fig. 2 is a schematic flow chart of the working process of the face recognition method based on the head pose.
Fig. 3 is a running diagram of an embodiment of a face recognition method based on head pose.
Detailed Description
The invention is further described in detail with reference to the following specific examples and the accompanying drawings. The procedures, conditions, experimental methods and the like for carrying out the present invention are general knowledge and common general knowledge in the art except for the contents specifically mentioned below, and the present invention is not particularly limited.
As shown in fig. 1 and 2, a face recognition method based on head pose includes the following steps:
the method comprises the following steps: carrying out face detection and alignment on the face data set image to obtain five-point coordinates after alignment; wherein the five-point coordinates comprise coordinates of two eyes, a nose tip and two mouth corners;
CASIA-WebFace is used as the face dataset of the invention, containing 494,414 images of 10,575 individuals. To keep the poses in the library pure, the invention estimates the pitch and yaw angles of the head with Dlib and retains only the 389,945 photos whose pose angles lie within pitch ±35 degrees and yaw ±30 degrees;
the method for detecting and aligning the face of the image comprises the following steps:
step 1.1) first, a convolutional network P-Net is applied to image I to generate candidate boxes and bounding-box regression vectors; the candidate boxes are corrected with bounding-box regression and merged with a non-maximum suppression algorithm;
step 1.2) R-Net then refines the candidates: the candidate boxes passing P-Net are fed into R-Net, which further deletes windows that do not meet the conditions, and bounding-box regression and non-maximum suppression continue to be used to merge the remaining candidates;
step 1.3) finally, O-Net outputs the final rectangular box and the facial landmark positions, giving image I'; the aligned 5-point coordinates are:
(eye_lx,eye_ly,eye_rx,eye_ry,nouse_x,nouse_y,mounth_lx,mounth_ly,mounth_rx,m ounth_ry)。
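The non-maximum suppression used to merge candidate boxes in steps 1.1 and 1.2 can be sketched as a plain greedy IoU-threshold procedure (the threshold value 0.5 is illustrative, not taken from the patent):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < thresh]
    return keep
```

In the MTCNN cascade, this suppression is applied after each stage so that overlapping detections of the same face are merged before the next, more expensive network runs.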
step two: according to the five-point coordinates, similarity-transformation size normalization is carried out. The image sample is normalized with a similarity transformation method; the transformation matrix, with scale s, rotation angle α, and translation (t_x, t_y), is:

T = [ s·cos α   -s·sin α   t_x ;  s·sin α   s·cos α   t_y ]

The image is normalized to 96 × 112 size.
Step three: dividing the image into a plurality of categories according to personal identity, and respectively setting different labels (integers) for different categories;
step four: the attitude angle of the head of the image is normalized to an interval of [ -1,1] according to the pitch, yaw and inclination angle through the model, and the interval serves as a label (floating point number vector).
Step five: designing a depth model network architecture, wherein an angle loss function and a posture loss function are added in the last layer of the depth convolution neural network structure, and the network structure is shown as table 1;
table 1: network structure and network layer parameter configuration of the invention under a caffe framework
Because prior-art deep convolutional neural network structures cannot distinguish samples of the various classes well, an angular classification loss function (L_angle) and a pose-angle loss function (L_pose) are added in the last layer:

L_angle = -(1/N) * Σ_{i=1}^{N} log( exp(||x_i|| ψ(θ_{y_i,i})) / ( exp(||x_i|| ψ(θ_{y_i,i})) + Σ_{j≠y_i} exp(||x_i|| cos θ_{j,i}) ) ),
ψ(θ) = (-1)^k cos(mθ) - 2k,  θ ∈ [kπ/m, (k+1)π/m],  k ∈ [0, m-1]

where θ_{j,i} is the angle between the input feature and the class feature vector, with range [0, π/m]; N is the batch size, x_i denotes the feature of the i-th input image, and m is a positive integer greater than or equal to 1.
According to the invention, the angular loss function is added to the network structure and jointly supervised with the pose loss to train a deep-learning face model, so that the features extracted by the trained model are more cohesive within a class and the distance between different classes is larger.
In the invention, the framework used for training the depth model is Caffe, and the hyper-parameters are set as follows: batch_size (batch size) 64; base_lr (base learning rate) 0.05; momentum (forgetting factor) 0.9; weight_decay (weight decay term) 0.0005; lr_policy (training strategy) "step"; gamma (learning-rate change weight) 0.1.
After training is finished, the weights and feature maps of each layer are visualized to inspect the network's behavior; if the feature maps show noise, irregular patterns, or other abnormalities, the network and its parameters are adjusted and training continues.
Then, using the finally preprocessed images obtained in step four, model training is performed with the Caffe framework to obtain the depth model for face recognition and pose estimation.
Step six: extracting head pose information and a human face feature vector from a human face image by using a network model, judging the extracted head pose, and directly deleting the human face which does not meet the pose requirement; and comparing the characteristic vectors of the face images meeting the posture requirements by adopting an improved cosine classifier to obtain the similarity of face comparison.
Because the basic cosine method does not consider the per-dimension average bias introduced into the training features by the model or the data, dimensions with larger bias dominate the similarity calculation. To solve this problem, the invention subtracts the mean of each dimension from the corresponding feature value, as shown in Equation 2:

sim(x, y) = Σ_i (x_i - m_i)(y_i - m_i) / ( sqrt(Σ_i (x_i - m_i)²) · sqrt(Σ_i (y_i - m_i)²) )    (Equation 2)
in the face recognition method based on the head pose provided by the invention, 1: face detection and alignment is performed using MTCNN. MTCNN is a face detection deep learning model of multitask cascade CNN, and face boundary regression and face key point detection are comprehensively considered. 2: size normalization: in terms of image size normalization, a similarity transformation is used: the similarity transformation is used to refer to geometric similarity, or a matrix transformation for generating similarity. 3: and (4) feature extraction, namely obtaining a feature representative vector through a feature extraction model. 4: and (5) comparing the features, obtaining the similarity through an improved cosine classifier, and giving a face recognition result according to the similarity sequence.
On top of a traditional single-feature CNN framework, the invention combines deep CNN features with head pose features to screen differently imaged samples, and, on the basis of a deep convolutional neural network algorithm, trains from small sample data a face detection network that is robust to frontal poses and varying illumination, thereby reducing the false detection rate and increasing the detection response speed.
The invention defines a complete flow for the head-pose-based face recognition method: image acquisition, face detection and alignment, face geometric normalization, face feature extraction, head pose judgment, face feature comparison, and comparison-result output, as shown in Fig. 2. First, an image containing a face is obtained from the camera data stream; the face detection module then screens out face photos with high confidence; after alignment and normalization, face features are obtained through the model; finally, the comparison module produces the face similarity result.
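The flow of Fig. 2 can be sketched as an orchestration function; the detector, feature model, and gallery here are stand-in stubs for the trained components described in the text, and the threshold values are the pose gate from the Disclosure:

```python
def recognize(image, detect_align, extract, gallery, mean,
              pitch_limit=35.0, yaw_limit=30.0):
    """Head-pose-gated recognition flow: detect/align -> extract feature
    and pose -> gate on pose -> rank gallery by adjusted cosine similarity.
    Returns (best_id, similarity), or None if the pose gate rejects the face."""
    face = detect_align(image)                # aligned 96x112 crop
    feat, (pitch, yaw, roll) = extract(face)  # 512-d feature + pose angles
    if abs(pitch) > pitch_limit or abs(yaw) > yaw_limit:
        return None                           # delete non-frontal face outright

    def sim(x, y):
        # adjusted cosine: subtract the per-dimension mean first (Equation 2)
        xc = [a - m for a, m in zip(x, mean)]
        yc = [a - m for a, m in zip(y, mean)]
        num = sum(p * q for p, q in zip(xc, yc))
        den = (sum(p * p for p in xc) ** 0.5) * (sum(q * q for q in yc) ** 0.5)
        return num / den

    return max(((pid, sim(feat, g)) for pid, g in gallery.items()),
               key=lambda t: t[1])
```

The stubs make the control flow explicit: a rejected pose never reaches the comparison module, matching the claim that false recognitions are traded for false rejections.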
The protection of the present invention is not limited to the above embodiments. Variations and advantages that may occur to those skilled in the art may be incorporated into the invention without departing from the spirit and scope of the inventive concept, and the scope of the appended claims is intended to be protected.
Claims (8)
1. A face recognition method based on head pose, characterized by comprising the following steps:
the method comprises the following steps: carrying out face detection and alignment on the face data set image to obtain five-point coordinates after alignment; wherein the five-point coordinates comprise coordinates of two eyes, a nose tip and two mouth corners;
step two: according to the five-point coordinates, carrying out similarity transformation size normalization;
step three: dividing the face data set image into N types as labels according to different identity IDs;
step four: normalizing the attitude angle of the head of the image to an interval of [ -1,1] according to the pitch, the yaw and the inclination angle by using a model to serve as a label;
step five: using the finally preprocessed image obtained in the fourth step, realizing model training through a caffe framework, and training a depth model for face recognition and posture estimation;
step six: extracting head posture information and a human face feature vector from the image by using the built depth model, judging the extracted head posture, and directly deleting the human face which does not meet the posture requirement; and comparing the characteristic vectors of the face images meeting the posture requirements by adopting an improved cosine classifier to obtain the similarity of face comparison.
2. The method of claim 1, wherein the face data set image is a frontal face image with a pose angle within pitch ±35 degrees and yaw ±30 degrees.
3. The method for recognizing a human face based on a head pose as claimed in claim 1, wherein in the first step, the method for detecting and aligning the human face of the image is as follows:
step 1.1) firstly, training a convolution network through P-Net for generating regression vectors of a candidate frame and a boundary frame for an image I, correcting the candidate frame by using a boundary frame regression method, and merging the candidate frames by using a non-maximum suppression algorithm;
step 1.2) R-Net further refines the candidate frames: the candidate frames passing P-Net are taken as input to R-Net, which further deletes windows that do not meet the conditions, and bounding-box regression and the non-maximum suppression algorithm continue to be used to merge the remaining candidate frames;
and step 1.3) finally, outputting the positions of the final rectangular frame and the face characteristic points by using O-Net to obtain an image I', wherein the aligned 5-point coordinates are respectively as follows:
(eye_lx,eye_ly,eye_rx,eye_ry,nouse_x,nouse_y,mounth_lx,mounth_ly,mounth_rx,mounth_ry)。
4. The method for recognizing a face based on head pose as claimed in claim 1, wherein in the second step, the image sample is subjected to size normalization by a similarity transformation method whose transformation matrix, with scale s, rotation angle α, and translation (t_x, t_y), is:
T = [ s·cos α   -s·sin α   t_x ;  s·sin α   s·cos α   t_y ]
and the image is normalized to 96 × 112 size.
5. The method according to claim 1, wherein in the fourth step, the model is a 64-layer residual multi-task convolutional neural network model.
6. The method for recognizing a human face based on head pose as claimed in claim 1, wherein in the fifth step, the method for training the depth model for face recognition and pose estimation is as follows:
firstly, a depth-model network architecture is designed; a cosine angular loss function (L_angle) is added in the last face-recognition layer of the deep convolutional neural network structure, and a Euclidean-distance loss function (L_pose) is added in the last layer of the pose regression; wherein
ψ(θ) = (-1)^k cos(mθ) - 2k,  θ ∈ [kπ/m, (k+1)π/m],  k ∈ [0, m-1],
N is the batch size, x_i denotes the feature of the i-th input image, m is a positive integer greater than or equal to 1, y_i denotes the class to which the feature belongs, and θ_{y_i,i} is the angle between the input feature and the class feature vector, with range [0, π/m].
7. The method according to claim 1, wherein in the fifth step, the framework used for training the depth model is Caffe, and the hyper-parameters are set as follows: the learning-rate policy (training strategy) is step; the learning-rate change weight is 0.1; the batch size is 64; the base learning rate is 0.05; the momentum (forgetting factor) is 0.9; the weight decay term is 0.0005.
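The hyper-parameters of claim 7 map directly onto a Caffe solver file. A sketch of such a `solver.prototxt` (the `stepsize`, `max_iter`, and file names are assumptions, as the claim does not state them; the batch size of 64 is set in the data layer of the net definition, not in the solver):

```protobuf
# solver.prototxt — sketch from the claimed hyper-parameters
net: "train.prototxt"         # hypothetical net definition (batch_size: 64 set there)
base_lr: 0.05                 # basic learning rate
lr_policy: "step"             # step-shaped training strategy
gamma: 0.1                    # learning-rate change weight
momentum: 0.9                 # "forgetting factor" in the claim
weight_decay: 0.0005          # weight decay term
stepsize: 20000               # assumption: step interval not given in the claim
max_iter: 100000              # assumption
snapshot_prefix: "face_pose"  # assumption
```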
8. The method according to claim 1, wherein in the sixth step, in the modified cosine classifier, the mean of each dimension is subtracted from the corresponding dimension of the feature value, as shown in formula 1:

cos(x, y) = Σ_i (x_i − m_i)(y_i − m_i) / ( √(Σ_i (x_i − m_i)²) · √(Σ_i (y_i − m_i)²) )    (1)

wherein x_i and y_i are the i-th dimension feature values of x and y, and m_i is the mean of the i-th dimension feature.
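The mean-subtracted cosine comparison of claim 8 is a few lines of NumPy; here `mean` stands for the per-dimension average m_i (computing it over a gallery of enrolled features is an assumption, as the claim does not say where the mean comes from):

```python
import numpy as np

def adjusted_cosine(x, y, mean):
    """Modified cosine similarity: subtract the per-dimension mean
    before measuring the angle between the two feature vectors."""
    xc, yc = x - mean, y - mean
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))
```

Centering removes the component shared by all identities (e.g. pose- or illumination-driven bias), so the remaining angle reflects identity-specific variation.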
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910279003.6A CN110096965A (en) | 2019-04-09 | 2019-04-09 | A kind of face identification method based on head pose |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110096965A true CN110096965A (en) | 2019-08-06 |
Family
ID=67444504
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910279003.6A Pending CN110096965A (en) | 2019-04-09 | 2019-04-09 | A kind of face identification method based on head pose |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110096965A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107464291A (en) * | 2017-08-22 | 2017-12-12 | 广州魔发科技有限公司 | A kind of processing method and processing device of face image |
WO2018071424A1 (en) * | 2016-10-10 | 2018-04-19 | University Of Maryland, College Park | All-in-one convolutional neural network for face analysis |
CN108256459A (en) * | 2018-01-10 | 2018-07-06 | 北京博睿视科技有限责任公司 | Library algorithm is built in detector gate recognition of face and face based on multiple-camera fusion automatically |
CN108986094A (en) * | 2018-07-20 | 2018-12-11 | 南京开为网络科技有限公司 | For the recognition of face data automatic update method in training image library |
Non-Patent Citations (3)
Title |
---|
KAIPENG ZHANG ET AL.: "Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks", 《ARXIV:1604.02878[CS.CV]》 * |
RAJEEV RANJAN ET AL.: "An All-In-One Convolutional Neural Network for Face Analysis", 《2017 IEEE 12TH INTERNATIONAL CONFERENCE ON AUTOMATIC FACE & GESTURE RECOGNITION》 * |
WEIYANG LIU ET AL.: "Large-Margin Softmax Loss for Convolutional Neural Networks", 《ARXIV:1612.02295V4 [STAT.ML]》 * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111178129A (en) * | 2019-11-25 | 2020-05-19 | 浙江工商大学 | Multi-modal personnel identification method based on face and posture |
CN111178228A (en) * | 2019-12-26 | 2020-05-19 | 中云智慧(北京)科技有限公司 | Face recognition method based on deep learning |
WO2021203718A1 (en) * | 2020-04-10 | 2021-10-14 | 嘉楠明芯(北京)科技有限公司 | Method and system for facial recognition |
CN111539351A (en) * | 2020-04-27 | 2020-08-14 | 广东电网有限责任公司广州供电局 | Multi-task cascaded face frame selection comparison method |
CN111539351B (en) * | 2020-04-27 | 2023-11-03 | 广东电网有限责任公司广州供电局 | Multi-task cascading face frame selection comparison method |
CN111639582B (en) * | 2020-05-26 | 2023-10-10 | 清华大学 | Living body detection method and equipment |
CN111639582A (en) * | 2020-05-26 | 2020-09-08 | 清华大学 | Living body detection method and apparatus |
CN111814613A (en) * | 2020-06-24 | 2020-10-23 | 浙江大华技术股份有限公司 | Face recognition method, face recognition equipment and computer readable storage medium |
CN111860274A (en) * | 2020-07-14 | 2020-10-30 | 清华大学 | Traffic police command gesture recognition method based on head orientation and upper half body skeleton characteristics |
CN112149517A (en) * | 2020-08-31 | 2020-12-29 | 三盟科技股份有限公司 | Face attendance checking method and system, computer equipment and storage medium |
CN112215280A (en) * | 2020-10-12 | 2021-01-12 | 西安交通大学 | Small sample image classification method based on meta-backbone network |
CN112329684A (en) * | 2020-11-16 | 2021-02-05 | 常州大学 | Pedestrian road crossing intention identification method based on gaze detection and traffic scene identification |
CN112329684B (en) * | 2020-11-16 | 2024-04-30 | 常州大学 | Pedestrian crossing road intention recognition method based on gaze detection and traffic scene recognition |
CN113537305A (en) * | 2021-06-29 | 2021-10-22 | 复旦大学 | Image classification method based on matching network less-sample learning |
CN113537115A (en) * | 2021-07-26 | 2021-10-22 | 东软睿驰汽车技术(沈阳)有限公司 | Method and device for acquiring driving state of driver and electronic equipment |
CN113869186A (en) * | 2021-09-24 | 2021-12-31 | 北京的卢深视科技有限公司 | Model training method and device, electronic equipment and computer readable storage medium |
CN115690934A (en) * | 2023-01-05 | 2023-02-03 | 武汉利楚商务服务有限公司 | Master and student attendance card punching method and device based on batch face recognition |
CN116386108A (en) * | 2023-03-27 | 2023-07-04 | 南京理工大学 | Fairness face recognition method based on instance consistency |
CN116386108B (en) * | 2023-03-27 | 2023-09-19 | 南京理工大学 | Fairness face recognition method based on instance consistency |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110096965A (en) | A kind of face identification method based on head pose | |
CN110263774B (en) | A kind of method for detecting human face | |
Garcia et al. | Convolutional face finder: A neural architecture for fast and robust face detection | |
Yang et al. | Face detection and gesture recognition for human-computer interaction | |
US6671391B1 (en) | Pose-adaptive face detection system and process | |
CN102375970B (en) | A kind of identity identifying method based on face and authenticate device | |
Yao et al. | Fast human detection from videos using covariance features | |
Jun et al. | Robust real-time face detection using face certainty map | |
Yang | Recent advances in face detection | |
CN111046732A (en) | Pedestrian re-identification method based on multi-granularity semantic analysis and storage medium | |
Manyam et al. | Two faces are better than one: Face recognition in group photographs | |
Gualdi et al. | Contextual information and covariance descriptors for people surveillance: an application for safety of construction workers | |
Xia et al. | Face occlusion detection using deep convolutional neural networks | |
Wang et al. | Pig face recognition model based on a cascaded network | |
Barr et al. | Detecting questionable observers using face track clustering | |
Brimblecombe | Face detection using neural networks | |
Bakheet | A fuzzy framework for real-time gesture spotting and recognition | |
Belle | Detection and recognition of human faces using random forests for a mobile robot | |
Puhalanthi et al. | Effective multiple person recognition in random video sequences using a convolutional neural network | |
Zhang et al. | Human interaction recognition in the wild: analyzing trajectory clustering from multiple-instance-learning perspective | |
Lee | Human Face Detection Techniques: A Comprehensive Review and Future Research Directions | |
Abdirahman et al. | Enhancing Facemask Detection using Deep learning Models | |
Kumar et al. | CANNY EDGE DETECTION AND CONTRAST STRETCHING FOR FACIAL EXPRESSION DETECTION AND RECOGNITION USING MACHINE LEARNING | |
Bradley et al. | Cross-modal facial attribute recognition with geometric features | |
Hasan et al. | Human Face Detection Techniques: A Comprehensive Review and Future Research Directions. Electronics 2021, 10, 2354 | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20190806 |