CN110263774A - A face detection method - Google Patents
A face detection method
- Publication number: CN110263774A (application CN201910761999.4A)
- Authority: CN (China)
- Prior art keywords: face, indicate, network, prediction result, sample
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

- G06N3/08 — Learning methods (neural networks, computing arrangements based on biological models)
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045 — Combinations of networks
- G06T7/74 — Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
- G06V10/242 — Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
- G06V10/32 — Normalisation of the pattern dimensions
- G06V10/462 — Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/764 — Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06V40/1347 — Fingerprints or palmprints: preprocessing; feature extraction
- G06V40/1365 — Fingerprints or palmprints: matching; classification
- G06V40/161 — Human faces: detection; localisation; normalisation
- G06V40/164 — Detection; localisation; normalisation using holistic features
- G06V40/171 — Local features and components; facial parts; occluding parts, e.g. glasses; geometrical relationships
- G06V40/172 — Classification, e.g. identification
- G06T2210/12 — Bounding box (indexing scheme for image generation or computer graphics)
Abstract
The invention discloses a face detection method comprising the following steps. 1. The input image is first scaled to different sizes in a fixed ratio through an image pyramid, then passed through the first-stage network in a sliding-window manner, which predicts rough face coordinates, a face confidence score, and the orientation of the face; most negative samples are then filtered out by confidence ranking, and the remaining image patches are fed into the second-stage network. 2. The second-stage network further filters out non-face samples, regresses more accurate position coordinates, and gives its own prediction of the face orientation. 3. An angle arbitration mechanism combines the predictions of the first two networks to make a final decision on each sample's rotation angle. 4. Each image patch is rotated upright according to the arbitrated result and fed into the third-stage network for fine adjustment and key point prediction. The invention thereby aligns faces of arbitrary rotation angle to the canonical upright face position.
Description
Technical field
The present invention relates to face detection technology in the field of computer vision, and in particular to a face detection method.
Background technique
Face datection has a wide range of applications in fields such as authentication, security protection, media and amusements, and Face datection problem rises
The committed step for realizing recognition of face derived from recognition of face, especially in open scene, due to face posture, illumination,
The diversity of scale etc. brings huge challenge to face and its critical point detection, in past ten years, computer view
Feel field emerges a large amount of method to improve the ability of machine detection face, and traditional method for detecting human face is according to realization mechanism
The method based on geometrical characteristic can be divided into, the method based on complexion model and the method based on statistical theory, wherein based on several
The method of what feature mainly realizes Face datection using the geometrical characteristic that human face's organ embodies;Side based on complexion model
Method thinks that the colour of skin of face and non-face region have significant difference;Method based on statistical theory is to utilize statistical analysis and machine
The method of device study searches out face sample and the respective statistical nature of non-face sample, reuses respective feature construction and divides
Class device, such methods include subspace method, neural network method, support vector machine method, hidden markov model approach and
Boosting method, with the unprecedented increase for calculating power and data in recent years, the method based on CNN has surmounted tradition above-mentioned comprehensively
The problem of method, many methods are proposed to solve the detection of unconstrained scene human face.
The present invention focuses on rotation-invariant face and key point detection. Compared with pitched and profile faces, an in-plane rotated face carries the same semantic information as a frontal face, so solving this problem is of great significance for subsequent work such as face recognition and face analysis. To achieve rotation-invariant face detection, Huang et al. (Huang C, Ai H, Li Y, et al. High-Performance Rotation Invariant Multiview Face Detection [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(4): 671-686.) adopted a divide-and-conquer strategy: different detectors are used for faces at different angles, each detector is robust only to a small range of rotation, and their combined outputs form the final prediction. STN (Jaderberg M, Simonyan K, Zisserman A, et al. Spatial Transformer Networks [J]. 2015.) achieves rotation invariance by learning a rotation matrix during training, but this method is effective only for a single target at a time. More recently, Shi et al. (Shi X, Shan S, Kan M, et al. Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks [J]. 2018.) proposed a cascaded method that learns the rotation angle coarse-to-fine to achieve rotation-invariant face detection, but its detection results still require additional key point information to realise face alignment.
Sun et al. (Sun Y, Wang X, Tang X. Deep Convolutional Network Cascade for Facial Point Detection [C]// Computer Vision and Pattern Recognition, 2013 IEEE Conference on. IEEE, 2013.) first introduced deep learning to the facial key point detection task. TCDCN (Zhang Z, Luo P, Loy C C, et al. Facial Landmark Detection by Deep Multi-task Learning [C]// European Conference on Computer Vision. Springer, Cham, 2014.) uses attributes closely related to key points, such as expression and gender, to improve the robustness of key point detection. However, these methods are separated from face detection, so they depend heavily on the results of an upstream face detector. HyperFace (Ranjan R, Patel V M, Chellappa R. HyperFace: A Deep Multi-task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2018, PP(99): 1-1.) adds multiple attribute labels to the training task and improves key point regression accuracy through multi-task learning; however, the extra learning tasks bring larger computation and longer running time, which is an obvious limitation for face detection, a task with high real-time requirements.
The cascade idea was already widely used in traditional methods, such as AdaBoost-style methods. With the rise of CNNs, multi-stage cascaded CNN methods also came into being. Compared with single-stage methods such as SSD (Liu W, Anguelov D, Erhan D, et al. SSD: Single Shot MultiBox Detector [J]. 2015.) and YOLO (Redmon J, Divvala S, Girshick R, et al. You Only Look Once: Unified, Real-Time Object Detection [J]. 2015.), a cascade structure can greatly improve a method's running speed with little loss in accuracy. The principle is that in a cascaded network most negative samples are filtered out by the front networks, so that the later networks can focus on improving classification of hard samples; this strategy greatly saves network parameters and computation.
Summary of the invention
In view of the deficiencies of the prior art, the object of the present invention is to provide a face detection method that predicts the in-plane rotation angle of a face while detecting it, rotates the face upright according to that angle, and on this basis regresses the key points of the facial features.
To achieve the purpose of the present invention, the following technical solution is adopted: a face detection method comprising the following steps.

Step 1: the input image is first scaled to different sizes in a fixed ratio through an image pyramid, then passed through the first-stage network in a sliding-window manner, which predicts rough face coordinates, a face confidence score, and the orientation of the face (the orientation can be: up, down, left, or right); most negative samples are then filtered out by confidence ranking, and the remaining image patches are fed into the second-stage network.

Step 2: the second-stage network further filters out non-face samples, regresses more accurate position coordinates, and gives its own prediction of the face orientation.

Step 3: the angle arbitration mechanism combines the predictions of the first two networks to make a final decision on each sample's rotation angle.

Step 4: each image patch is rotated upright according to the arbitrated result and fed into the third-stage network for final fine adjustment, so as to predict the positions of the key points.
The first-stage and second-stage networks are trained on three tasks: face/non-face classification, face bounding box regression, and angle classification. The third-stage network is trained on face/non-face classification, face bounding box regression, and face key point regression.
The face classification loss L_f is defined as the cross-entropy loss:

  L_f = -(y_f · log(p_f) + (1 - y_f) · log(1 - p_f)),

where y_f ∈ {0, 1} is the class annotation of the training sample (the subscript f denotes the face classification task), with y_f = 1 when the input is a positive sample and y_f = 0 otherwise; p_f is the predicted face probability; and log is the natural logarithm. The angle classification loss L_a is defined as:

  L_a = -Σ_{j=1}^{4} y_j · log(p_j),

where y_j is the rotation direction annotation of the training data: y_j = 1 when the rotation angle of the input sample falls near the j-th rotation angle and y_j = 0 otherwise; in training, j = 1, …, 4 indexes the four different rotation angles, and p_j is the network's predicted probability that the input sample falls at the j-th angle. Bounding box regression uses a Euclidean distance loss, and its regression targets are the following four relative displacements of the box coordinates:

  Δx1 = (x1 - x̂1) / ŵ,  Δy1 = (y1 - ŷ1) / ĥ,
  Δx2 = (x2 - x̂2) / ŵ,  Δy2 = (y2 - ŷ2) / ĥ,

where Δx1 and Δy1 are the relative displacements of the top-left corner's abscissa and ordinate, Δx2 and Δy2 are those of the bottom-right corner, ŵ and ĥ are the predicted width and height, (x1, y1) and (x2, y2) are the top-left and bottom-right coordinates of each face box in the training data, and (x̂1, ŷ1) and (x̂2, ŷ2) are the top-left and bottom-right coordinates of the box predicted by the network.
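The bounding box encoding can be illustrated numerically. Normalising the corner offsets by the predicted box's width and height follows the symbol definitions above; the helper names are illustrative, not from the patent:

```python
import numpy as np

def encode_box(gt, pred):
    """Regression targets: ground-truth corner offsets normalised by the
    predicted box's width and height."""
    x1, y1, x2, y2 = gt
    px1, py1, px2, py2 = pred
    w, h = px2 - px1, py2 - py1
    return np.array([(x1 - px1) / w, (y1 - py1) / h,
                     (x2 - px2) / w, (y2 - py2) / h])

def decode_box(pred, deltas):
    """Invert encode_box: apply predicted offsets to a candidate box."""
    px1, py1, px2, py2 = pred
    w, h = px2 - px1, py2 - py1
    d1, d2, d3, d4 = deltas
    return (px1 + d1 * w, py1 + d2 * h, px2 + d3 * w, py2 + d4 * h)

def bbox_loss(deltas_pred, deltas_gt):
    # Euclidean (squared L2) distance loss, as stated in the text
    return float(np.sum((np.asarray(deltas_pred) - np.asarray(deltas_gt)) ** 2))
```

Encoding followed by decoding round-trips exactly, which is what allows the network's offset predictions to refine candidate boxes at test time.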
The key point positions of the face are trained with the following key point loss L_kp:

  L_kp = Σ_{n=1}^{N} w_n · Σ_{m=1}^{M} ‖d_nm‖₂²,

where N is the size of each batch during training, M is the number of key points per face, θ_n is the angle between the line joining the two eyes of the n-th face and the horizontal axis of the picture (cos denotes cosine), d_nm is the distance between the predicted and true values of the m-th key point of the n-th face, ‖·‖₂ is the two-norm, and w_n is the large-pose penalty term, computed as follows: 1) the four key points other than the nose are connected to form a four-sided boundary; 2) the relative distance d from the nose to its nearest boundary line is computed; 3) it is determined whether the nose lies outside this boundary; 4) if the nose is within the boundary, w_n = 1 - d; otherwise, w_n = 1.
The angle arbitration mechanism presets a threshold. When the second-stage network's orientation prediction is above the threshold, or the highest-confidence orientation of the second-stage network coincides with the highest-confidence orientation of the first-stage network, that orientation prediction is taken as the final prediction. Otherwise, the two highest-confidence orientation predictions of the first-stage network and the two highest-confidence orientation predictions of the second-stage network are examined for an intersection; if one exists, the intersection is taken as the final prediction.
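The arbitration rule can be sketched as a small function. The threshold value and the fallback when the two top-2 sets are disjoint are assumptions not fixed by the text:

```python
def arbitrate_angle(probs1, probs2, threshold=0.8):
    """Combine the two stages' 4-way orientation predictions.
    probs1/probs2: per-direction probabilities from stage 1 and stage 2.
    threshold (0.8 here) is a free parameter; the patent only says
    'predefined'.  Returns the arbitrated direction index (0..3)."""
    top1 = sorted(range(4), key=lambda i: probs1[i], reverse=True)[:2]
    top2 = sorted(range(4), key=lambda i: probs2[i], reverse=True)[:2]
    # Trust stage 2 when it is confident, or when the two stages agree.
    if probs2[top2[0]] > threshold or top2[0] == top1[0]:
        return top2[0]
    # Otherwise look for overlap among each stage's two best directions.
    common = [d for d in top2 if d in top1]
    if common:
        return common[0]
    return top2[0]  # fallback for disjoint top-2 sets (not specified)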
The key technical problem to be solved by the invention is the detection of faces, and their key points, at arbitrary rotation angles in open scenes. In an unconstrained open scene, because the relative position of the imaging device and the imaged face is random, a face image may have an arbitrary rotation angle. The diversity of rotation brings diversity of facial feature appearance, accompanied by complex background noise, which poses a huge challenge to detection and to the key point localisation built upon it. The invention aims to predict the in-plane rotation angle of a face while detecting it, rotate the face upright according to that angle, and on this basis regress the key points of the facial features.
The advantages of the present invention: it uses a cascaded convolutional neural network structure, fuses face detection and key point localisation under rotated scenes, and combines angle prediction with the face detection task, realising rotation angle estimation, face classification, face bounding box regression, and key point localisation simultaneously. From the output of the invention, a face of arbitrary rotation can be aligned to the canonical face position by a simple similarity transformation. At the same time, while keeping the model small, the method achieves real-time running speed on a general-purpose CPU, which is of important practical significance for mobile computing applications.
Description of the drawings
Fig. 1 is an example flowchart of the overall framework of the present invention.
Fig. 2 shows comparison results of the present invention on the AFLW data set.
Fig. 3 shows example detection results of the invention.
Specific embodiment
Embodiment
The present invention is further illustrated with reference to a specific embodiment.
For open application scenarios, the present invention combines deep learning with the cascade idea to propose a rotation-robust face and key point detector. Deep learning has been proven by many methods to have advantages in feature extraction that other approaches cannot match, especially in unconstrained scenes, where deep methods can better extract features from massive training samples. The cascade, a way of thinking traceable to conventional machine learning, has in recent years been widely applied in deep learning, especially in face detection and key point detection. In addition, the rotation angle of the face is predicted by means of angle arbitration, and a pose penalty is introduced into the loss function to improve the method's predictive ability on hard samples.
Here, the overall rotation-invariant scheme is introduced first. The global framework consists of three cascaded sub-networks that improve face detection accuracy coarse-to-fine, as shown in Figure 1. Specifically, during testing the input image is first scaled to different sizes in a fixed ratio through an image pyramid, then passed through the first-stage network in a sliding-window manner, which predicts rough face coordinates, a face confidence score, and the face orientation (e.g. up, down, left, or right). Most negative samples are then filtered out by confidence ranking, and the remaining image patches are fed into the second-stage network, which further filters out non-face samples and regresses more accurate position coordinates while likewise predicting the face orientation. The angle arbitration mechanism then combines the predictions of the first two networks to make the final decision on each sample's rotation angle. Finally, each image patch is rotated upright according to the arbitrated result and fed into the last network for final fine adjustment and key point prediction. Notably, non-maximum suppression is used as a post-processing operation at each stage to merge highly overlapping candidate boxes.
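The non-maximum suppression step mentioned above is the standard greedy algorithm; a minimal sketch:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop boxes overlapping it above iou_threshold, repeat.
    boxes: (N,4) list/array of (x1,y1,x2,y2); returns kept indices."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]  # indices by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the kept box against all remaining candidates
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]
    return keep
```

The 0.5 IoU threshold is a conventional default; the patent does not state the value used at each stage.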
The invention decomposes the prediction of rotated faces and their key points into several simple tasks, which guarantees real-time speed while remaining rotation-robust and is of great significance for practical application. In the first-stage and second-stage networks, angle classification is learned jointly with face/non-face binary classification and bounding box regression. Introducing the rotation angle classification task on the one hand improves the recall of rotated face detection, and on the other hand improves the regression accuracy of the bounding box, because samples within each small angle range are more homogeneous. The method divides the full 360° plane into four parts, and the first two networks each predict which of these four classes the face's rotation angle belongs to. Compared with binary classification or a finer eight-way classification, four-way classification keeps the parameter count small while guaranteeing relatively high accuracy. The first-stage sub-network adopts a fully convolutional structure; its main tasks are to extract candidate boxes from the original image, perform preliminary learning of the confidence that each candidate belongs to a face, and regress the four bounding box coordinates. The second-stage sub-network takes as input those samples from the previous stage whose face confidence exceeds a certain threshold; these samples still include many negatives, and this stage aims to raise the confidence of the positive samples while lowering that of the negatives, thereby further removing negative samples. In addition, this network re-predicts the rotation direction of each input sample. After the first two stages, most negative samples have been removed, and each retained sample carries two groups of rotation direction predictions; the angle arbitration mechanism combines these two outputs to give the final rotation angle prediction.
The training process of this method involves four tasks: face/non-face classification, face bounding box regression, angle classification, and face key point regression. These tasks are combined with different weights at each stage and act jointly in each network. The face classification loss L_f is defined as the cross-entropy loss:

  L_f = -(y_f · log(p_f) + (1 - y_f) · log(1 - p_f)),

where y_f ∈ {0, 1} is the class annotation of the training sample (the subscript f denotes the face classification task), with y_f = 1 when the input is a positive sample and y_f = 0 otherwise; p_f is the predicted face probability; and log is the natural logarithm. The angle classification loss L_a is defined as:

  L_a = -Σ_{j=1}^{4} y_j · log(p_j),

where y_j is the rotation direction annotation of the training data: y_j = 1 when the rotation angle of the input sample falls near the j-th rotation angle and y_j = 0 otherwise; in training, j = 1, …, 4 indexes the four different rotation angles, and p_j is the network's predicted probability that the input sample falls at the j-th angle. Bounding box regression uses a Euclidean distance loss, and its regression targets are the following four relative displacements of the box coordinates:

  Δx1 = (x1 - x̂1) / ŵ,  Δy1 = (y1 - ŷ1) / ĥ,
  Δx2 = (x2 - x̂2) / ŵ,  Δy2 = (y2 - ŷ2) / ĥ,

where Δx1 and Δy1 are the relative displacements of the top-left corner's abscissa and ordinate, Δx2 and Δy2 are those of the bottom-right corner, ŵ and ĥ are the predicted width and height, (x1, y1) and (x2, y2) are the top-left and bottom-right coordinates of each face box in the training data, and (x̂1, ŷ1) and (x̂2, ŷ2) are the top-left and bottom-right coordinates of the box predicted by the network.
It is worth noting that in the key point regression task the method adds, on top of the traditional Euclidean distance, a penalty term for large-pose faces. This is because large-pose faces are usually under-represented in existing training data, causing the model to pay insufficient attention to such samples and producing larger prediction errors on them. At the same time, from the relative positions of the annotated coordinates in existing training data (e.g. left and right eyes, nose, and left and right mouth corners), the larger-pose face samples can be identified. The invention therefore constructs the following key point loss L_kp to train face key point localisation:

  L_kp = Σ_{n=1}^{N} w_n · Σ_{m=1}^{M} ‖d_nm‖₂²,

where N is the size of each batch during training, M is the number of key points per face, θ_n is the angle between the line joining the two eyes of the n-th face and the horizontal axis of the picture (cos denotes cosine), d_nm is the distance between the predicted and true values of the m-th key point of the n-th face, ‖·‖₂ is the two-norm, and w_n is the large-pose penalty term for the n-th training sample, computed as follows: 1) the four key points other than the nose are connected to form a four-sided boundary; 2) the relative distance d from the nose to its nearest boundary line is computed; 3) it is determined whether the nose lies outside this boundary; 4) if the nose is within the boundary, w_n = 1 - d; otherwise, w_n = 1. This re-weighting strategy makes the network place more attention on large-pose samples.
The angle arbitration mechanism is used to integrate the first two networks' predictions of the face rotation angle. In a cascaded network structure, erroneous predictions also propagate in cascade, so an error in an early stage cannot be recovered in later stages. In this method, the angle classification tasks of the first two networks are identical, predicting over the same four orientations; the difference is that the second-stage network's input contains more positive samples and therefore yields more reliable predictions. The angle arbitration mechanism combines the two angle predictions by setting a predefined threshold: specifically, when the second network's prediction exceeds the threshold, or the two networks' highest-confidence predictions coincide, the invention takes the second-stage network's prediction as the final result; otherwise, the two most confident predictions of each network are examined for an intersection, and if one exists, their intersection is taken as the prediction result.
(1) Data sets used by the present invention.

FDDB (Vidit Jain and Erik Learned-Miller. FDDB: A Benchmark for Face Detection in Unconstrained Settings. Technical Report UM-CS-2010-009, University of Massachusetts, Amherst, 2010.) contains 2,845 pictures of natural scenes annotated with 5,171 face boxes and is a common benchmark for face detection; however, most of its faces are in typical pose, i.e. with small rotation angles. To test the rotation invariance of the method of the invention, the pictures of the original data set are rotated counter-clockwise by 90°, 180°, and 270°. Combined with the rotation angles already present in the data, the data after rotation augmentation covers essentially all angles of the plane. The invention uses this data set to evaluate face box detection.
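The 90°/180°/270° augmentation can be sketched as follows; the face box coordinate transform is the standard one for axis-aligned rotations (the patent does not give it explicitly):

```python
import numpy as np

def rotate_sample(image, box, k):
    """Rotate an image and its face box by k*90 degrees counter-clockwise.
    box = (x1, y1, x2, y2) in pixel coordinates (x = column, y = row);
    returns the rotated image and the transformed box."""
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    rot = np.rot90(image, k)  # np.rot90 rotates counter-clockwise
    k = k % 4
    if k == 0:
        new_box = (x1, y1, x2, y2)
    elif k == 1:   # 90 deg CCW: (x, y) -> (y, w - x)
        new_box = (y1, w - x2, y2, w - x1)
    elif k == 2:   # 180 deg:    (x, y) -> (w - x, h - y)
        new_box = (w - x2, h - y2, w - x1, h - y1)
    else:          # 270 deg CCW: (x, y) -> (h - y, x)
        new_box = (h - y2, x1, h - y1, x2)
    return rot, new_box
```

Applying this with k = 1, 2, 3 to every FDDB image and box quadruples the data set while keeping annotations consistent.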
AFLW (Martin Köstinger, Wohlhart P, Roth P M, et al. Annotated Facial Landmarks in the Wild: A large-scale, real-world database for facial landmark localization [C]// IEEE International Conference on Computer Vision Workshops, ICCV 2011 Workshops, Barcelona, Spain, November 6-13, 2011. IEEE, 2011.) contains 25,993 faces with diversity in pose and occlusion. The invention uses this data set to test its key point detection performance.
(2) test process;
Both training and testing of the invention use the Caffe deep learning framework; optimization during training uses stochastic gradient descent. Specifically, the training batch sizes of the three sub-networks are set to 400, 300 and 200 respectively; the initial learning rate is set to 0.01 and is reduced to one tenth of its value after every 20,000 iterations, for a total of 200,000 iterations; the weight decay parameter is set to 5 × 10⁻⁴ and the momentum parameter to 0.9. PReLU is used as the activation function after each convolution operation and fully connected operation.
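The learning-rate schedule described above (divide by ten every 20,000 iterations) corresponds to Caffe's "step" policy and can be sketched as:

```python
def learning_rate(iteration, base_lr=0.01, gamma=0.1, step=20000):
    """Step learning-rate schedule: start at base_lr and multiply by
    gamma (here 1/10) after every `step` iterations, as described in
    the training setup above."""
    return base_lr * gamma ** (iteration // step)
```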
The training data comes from multiple sources. The data for face detection and angle classification comes from typical-pose samples in WIDER FACE, whose in-plane deflection angles lie within ±30°; the training data for face key points comes mainly from the CelebA data set. For the first network, candidate frames are randomly cropped from the original images as training data; these candidate frames are divided into positive, negative and part classes according to their intersection over union (IoU) with the ground-truth annotations. Specifically, samples with IoU > 0.7 are positive, samples with IoU < 0.3 are negative, and samples with 0.4 < IoU < 0.7 are part samples. The positive and negative classes are used to train the face/non-face binary classification task; the positive and part classes are used to train the face candidate frame regression and face rotation angle classification tasks.
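The IoU-based partition rule above can be sketched as follows (function names are illustrative; boxes are (x1, y1, x2, y2) tuples; samples whose IoU falls in the gaps 0.3–0.4 and exactly at the thresholds are treated as unused):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def sample_class(candidate, ground_truth):
    """Partition rule from the text: IoU > 0.7 -> positive,
    IoU < 0.3 -> negative, 0.4 < IoU < 0.7 -> part; other IoU
    values are discarded (returns None)."""
    v = iou(candidate, ground_truth)
    if v > 0.7:
        return "positive"
    if v < 0.3:
        return "negative"
    if 0.4 < v < 0.7:
        return "part"
    return None
```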
The training data of the second network uses the same partition strategy, but the data comes from the prediction output of the first network on the original data set. For the third-level network, images containing key points are cropped out of the CelebA data set using the first two networks and serve as training samples. During training, the ratio of positive, negative, part and key point data is set to 2:3:1:2. In addition, to guarantee a balanced distribution of the rotation angle classification training data, the invention designs a Random-Rotation layer that dynamically and randomly rotates the input face images during training and transforms their labels accordingly, guaranteeing that the angle classes account for equal quantities in every training batch. Note that the Random-Rotation layer only rotates the input image by 0°, 90°, 180° or 270°; because the frontal face data themselves carry small rotation angles, the training data after introducing the Random-Rotation layer can cover all rotation angles in the plane. The introduction of this layer also greatly reduces data preparation time and memory usage during training.
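A minimal sketch of the Random-Rotation layer's behavior. Nested-list images and 0–3 orientation labels (for 0°/90°/180°/270°) are assumptions about the interface; the actual layer operates on framework tensors during training:

```python
import random

def rot90_ccw(img):
    """Rotate an H x W nested-list image 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]

def random_rotation(img, label, k=None, rng=random):
    """Rotate the input face by k * 90 degrees (k drawn uniformly when
    not given) and shift its orientation label by the same amount modulo
    four, so that every training batch covers the four angle classes
    evenly, as the Random-Rotation layer above does."""
    if k is None:
        k = rng.randrange(4)
    for _ in range(k):
        img = rot90_ccw(img)
    return img, (label + k) % 4
```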
(3) Test results;
To assess the effectiveness of the invention, face detection and key point localization were tested on the data sets mentioned above, and the method of the invention was compared against current mainstream face detection methods. For the face detection task, the invention selected the general object detection algorithms SSD (Liu W, Anguelov D, Erhan D, et al. SSD: Single Shot MultiBox Detector [J]. 2015.) and Faster R-CNN (Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [J]. 2015.), which have higher complexity and stronger feature representation ability, as well as other popular methods, for comparison tests on the FDDB data set. The results show that this method keeps a higher recall rate at a fixed false detection rate across different rotation angles; in particular, compared with other cascaded neural networks such as PCN (Shi X, Shan S, Kan M, et al. Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks [J]. 2018.), the method of the invention is 1.8 percentage points higher on the same test set. In the key point localization evaluation, several key point detection methods were likewise compared on the same test set; the results are shown in Fig. 2. They show that the method of the invention maintains a lower normalized mean error and achieves lower error rates on key point test sets of different rotation angles, demonstrating its rotation-robustness advantage.
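The normalized mean error used in this evaluation is commonly defined as the mean landmark error divided by a normalizing distance; the following sketch assumes inter-ocular normalization (the patent does not specify the normalizer, so this definition and the eye-index parameters are assumptions):

```python
import math

def normalized_mean_error(pred, truth, left_eye_idx=0, right_eye_idx=1):
    """Mean Euclidean landmark error divided by the inter-ocular
    distance of the ground truth. pred/truth: lists of (x, y) points."""
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    iod = dist(truth[left_eye_idx], truth[right_eye_idx])
    return sum(dist(p, t) for p, t in zip(pred, truth)) / (len(truth) * iod)
```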
In addition, to verify the validity of joint learning, ablation tests were carried out, comparing models with and without joint training of face detection and angle classification, and with and without joint training of face detection and key point localization. The tests show that adding both the angle classification task and the key point localization task helps improve the face detection effect. This can be interpreted as the features of the two kinds of tasks being shared: interrelated tasks improve individual task performance during learning through sharing at the feature and weight levels. To verify the validity of the big posture punishment loss function, the mean error on AFLW of a model trained with that loss function was compared with one trained with the ordinary L2 loss function; the tests show that the key point mean error drops from 7.9% before its introduction to 7.5% after. The inference speed of the method was also measured on a general-purpose CPU and GPU, reaching 23 FPS on the CPU and 60 FPS on the GPU.
The invention proposes a novel rotation-robust face and face key point detection method that realizes rotation angle prediction, face detection and key point localization simultaneously through three mutually cascaded convolutional neural networks; the test effect is shown in Fig. 3. The introduced angle arbitration mechanism improves the accuracy of angle prediction, and the big posture punishment loss function improves the localization of key points on large-pose faces.
The above detailed description illustrates possible embodiments of the present invention; the embodiments do not limit the patent scope of the present invention, and all equivalent implementations or changes made without departing from the present invention are intended to be included within the patent scope of this case.
Claims (3)
1. A face detection method, characterized by comprising the following steps:
Step 1: pass the input image through an image pyramid, scaling it proportionally into images of different sizes, and feed these through the first-level network in sliding-window fashion to roughly predict the coordinates of faces, the face confidence, and the first-level network's face orientation prediction; filter out negative samples according to confidence ranking, then send the remaining image block samples to the second-level network;
Step 2: the second-level network further filters out non-face samples and regresses more accurate position coordinates, while giving the second-level network's face orientation prediction;
Step 3: an angle arbitration mechanism combines the face orientation prediction of the first-level network with that of the second-level network to make a final arbitration on the rotation angle of each image block sample;
Step 4: according to the rotation angle arbitrated by the angle arbitration mechanism, rotate each image block sample upright and send it to the third-level network for fine adjustment, so as to predict the positions of the face key points;
the first-level network and the second-level network each comprise the training tasks of face/non-face classification, face bounding box regression and angle classification; the third-level network comprises the training tasks of face/non-face classification, face bounding box regression and face key point regression;
The face classification loss function L_f is defined as the cross-entropy loss function:

L_f = −( y_f · log(p_f) + (1 − y_f) · log(1 − p_f) )

wherein y_f denotes the classification annotation of the training sample and the subscript f denotes the face classification task; when the input is a positive sample, y_f = 1, otherwise y_f = 0; p_f is the face classification prediction result, and log denotes the natural logarithm. The angle classification loss function L_a is defined as:
L_a = −Σ_{i=1..T} y_i · log(p_i)

wherein y_i denotes the rotation-direction annotation of the training data; when the rotation angle of the input sample falls in the i-th rotation angle class, y_i = 1, otherwise y_i = 0; in training, T = 4, representing four different rotation angles; p_i denotes the network-predicted probability that the input sample falls in the i-th angle class, and log denotes the natural logarithm. The face bounding box regression uses the Euclidean distance loss function; the regression targets of the bounding box comprise the following four quantities, which respectively represent the relative displacements of the four coordinates:

t_{x1} = (x1 − x̂1)/ŵ,  t_{y1} = (y1 − ŷ1)/ĥ,  t_{x2} = (x2 − x̂2)/ŵ,  t_{y2} = (y2 − ŷ2)/ĥ

wherein t_{x1} denotes the relative displacement of the upper-left abscissa, t_{y1} the relative displacement of the upper-left ordinate, t_{x2} the relative displacement of the lower-right abscissa, and t_{y2} the relative displacement of the lower-right ordinate; ŵ and ĥ denote the width and height of the prediction; (x1, y1) and (x2, y2) respectively denote the upper-left and lower-right coordinates of each face frame in the training data; and (x̂1, ŷ1) and (x̂2, ŷ2) respectively denote the upper-left and lower-right coordinates of the network-predicted frame;
The key point positions of the face are trained through the following loss function L_l:

L_l = (1/N) Σ_{n=1..N} w_n · cos(θ_n) · Σ_{m=1..M} ‖d_n^m‖₂²

wherein N denotes the size of each batch in the training process, M denotes the number of key points on each face, θ_n denotes the angle between the line connecting the two eyes of the n-th face in training and the horizontal axis of the picture, cos denotes the cosine, d_n^m denotes the distance between the predicted value and the true value of the m-th key point of the n-th face, ‖·‖₂ denotes the two-norm, and w_n denotes the big posture penalty term for the n-th training sample, whose specific calculation process is as follows: 1) interconnect the four key points other than the nose to form four boundary lines; 2) compute the relative distance d from the nose to its nearest boundary line; 3) judge whether the nose is beyond the boundary; 4) if the nose is within the boundary, then w_n = 1 − d; otherwise, w_n = 1.
2. The face detection method according to claim 1, characterized in that the angle arbitration mechanism presets a threshold; when the prediction confidence of the second-level network is higher than the threshold, or when the highest-confidence face orientation predicted by the second-level network is the same as the highest-confidence face orientation predicted by the first-level network, that face orientation prediction is taken as the final prediction result; otherwise, the two highest-confidence face orientation predictions of the first-level network and the two highest-confidence face orientation predictions of the second-level network are examined for an intersection, and if an intersection exists, it is taken as the final prediction result.
3. The face detection method according to claim 1, characterized in that the orientations of the face are: upward, downward, leftward, or rightward.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910761999.4A CN110263774B (en) | 2019-08-19 | 2019-08-19 | A kind of method for detecting human face |
US16/726,961 US10984224B2 (en) | 2019-08-19 | 2019-12-26 | Face detection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910761999.4A CN110263774B (en) | 2019-08-19 | 2019-08-19 | A kind of method for detecting human face |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110263774A true CN110263774A (en) | 2019-09-20 |
CN110263774B CN110263774B (en) | 2019-11-22 |
Family
ID=67912054
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910761999.4A Active CN110263774B (en) | 2019-08-19 | 2019-08-19 | A kind of method for detecting human face |
Country Status (2)
Country | Link |
---|---|
US (1) | US10984224B2 (en) |
CN (1) | CN110263774B (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11386609B2 (en) * | 2020-10-27 | 2022-07-12 | Microsoft Technology Licensing, Llc | Head position extrapolation based on a 3D model and image data |
CN113011492B (en) * | 2021-03-17 | 2022-12-09 | 西安邮电大学 | Feature multiplexing multi-knowledge learning target detection method |
US11619993B2 (en) * | 2021-04-19 | 2023-04-04 | Microsoft Technology Licensing, Llc | Systems and methods for gaze-tracking |
CN113326763B (en) * | 2021-05-25 | 2023-04-18 | 河南大学 | Remote sensing target detection method based on boundary frame consistency |
CN113469994A (en) * | 2021-07-16 | 2021-10-01 | 科大讯飞(苏州)科技有限公司 | Pantograph detection method, pantograph detection device, electronic apparatus, and storage medium |
CN113313082B (en) * | 2021-07-28 | 2021-10-29 | 北京电信易通信息技术股份有限公司 | Target detection method and system based on multitask loss function |
CN113705404A (en) * | 2021-08-18 | 2021-11-26 | 南京邮电大学 | Face detection method facing embedded hardware |
CN113673425B (en) * | 2021-08-19 | 2022-03-15 | 清华大学 | Multi-view target detection method and system based on Transformer |
US20230281863A1 (en) * | 2022-03-07 | 2023-09-07 | Microsoft Technology Licensing, Llc | Model fitting using keypoint regression |
CN114677362B (en) * | 2022-04-08 | 2023-09-12 | 四川大学 | Surface defect detection method based on improved YOLOv5 |
CN115239720A (en) * | 2022-09-22 | 2022-10-25 | 安徽省儿童医院(安徽省新华医院、安徽省儿科医学研究所、复旦大学附属儿科医院安徽医院) | Classical Graf-based DDH ultrasonic image artificial intelligence diagnosis system and method |
CN116416672B (en) * | 2023-06-12 | 2023-08-29 | 南昌大学 | Lightweight face and face key point detection method based on GhostNetV2 |
CN116884034A (en) * | 2023-07-10 | 2023-10-13 | 中电金信软件有限公司 | Object identification method and device |
CN116895047B (en) * | 2023-07-24 | 2024-01-30 | 北京全景优图科技有限公司 | Rapid people flow monitoring method and system |
CN116758429B (en) * | 2023-08-22 | 2023-11-07 | 浙江华是科技股份有限公司 | Ship detection method and system based on positive and negative sample candidate frames for dynamic selection |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104036237A (en) * | 2014-05-28 | 2014-09-10 | 南京大学 | Detection method of rotating human face based on online prediction |
CN106682598A (en) * | 2016-12-14 | 2017-05-17 | 华南理工大学 | Multi-pose facial feature point detection method based on cascade regression |
CN107871134A (en) * | 2016-09-23 | 2018-04-03 | 北京眼神科技有限公司 | A kind of method for detecting human face and device |
CN109508654A (en) * | 2018-10-26 | 2019-03-22 | 中国地质大学(武汉) | Merge the human face analysis method and system of multitask and multiple dimensioned convolutional neural networks |
CN109800648A (en) * | 2018-12-18 | 2019-05-24 | 北京英索科技发展有限公司 | Face datection recognition methods and device based on the correction of face key point |
CN109858466A (en) * | 2019-03-01 | 2019-06-07 | 北京视甄智能科技有限公司 | A kind of face critical point detection method and device based on convolutional neural networks |
CN110020620A (en) * | 2019-03-29 | 2019-07-16 | 中国科学院深圳先进技术研究院 | Face identification method, device and equipment under a kind of big posture |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8824808B2 (en) * | 2011-08-19 | 2014-09-02 | Adobe Systems Incorporated | Methods and apparatus for automated facial feature localization |
US10095917B2 (en) * | 2013-11-04 | 2018-10-09 | Facebook, Inc. | Systems and methods for facial representation |
US9881234B2 (en) * | 2015-11-25 | 2018-01-30 | Baidu Usa Llc. | Systems and methods for end-to-end object detection |
US10032067B2 (en) * | 2016-05-28 | 2018-07-24 | Samsung Electronics Co., Ltd. | System and method for a unified architecture multi-task deep learning machine for object recognition |
CN110490177A (en) * | 2017-06-02 | 2019-11-22 | 腾讯科技(深圳)有限公司 | A kind of human-face detector training method and device |
US10380788B2 (en) * | 2017-10-12 | 2019-08-13 | Ohio State Innovation Foundation | Fast and precise object alignment and 3D shape reconstruction from a single 2D image |
EP3698268A4 (en) * | 2017-11-22 | 2021-02-17 | Zhejiang Dahua Technology Co., Ltd. | Methods and systems for face recognition |
CN108073910B (en) * | 2017-12-29 | 2021-05-07 | 百度在线网络技术(北京)有限公司 | Method and device for generating human face features |
CN108509862B (en) * | 2018-03-09 | 2022-03-25 | 华南理工大学 | Rapid face recognition method capable of resisting angle and shielding interference |
US10949649B2 (en) * | 2019-02-22 | 2021-03-16 | Image Metrics, Ltd. | Real-time tracking of facial features in unconstrained video |
2019
- 2019-08-19: CN application CN201910761999.4A, patent CN110263774B (Active)
- 2019-12-26: US application US16/726,961, patent US10984224B2 (Active)
Non-Patent Citations (7)
Title |
---|
FENG WANG ET AL: "Additive Margin Softmax for Face Verification", 《IEEE SIGNAL PROCESSING LETTERS》 * |
XUEPENG SHI ET AL: "Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks", 《CVPR 2018》 * |
YUAN CHEN ET AL: "Multi-angle Face detection with Step-by-Step Adjustment Networks", 《2018 3RD INTERNATIONAL CONFERENCE ON MECHANICAL, CONTROL AND COMPUTER ENGINEERING》 * |
JING CHANGXING ET AL: "Research on facial key point localization with cascaded neural networks", 《Journal of China Jiliang University》 * |
YU FEI ET AL: "Face detection with multi-cascaded convolutional neural networks", 《Journal of Wuyi University (Natural Science Edition)》 * |
WU XIAOPING ET AL: "Multi-pose face recognition based on facial key points and incremental clustering", 《Laser & Optoelectronics Progress》 * |
YAO SHUCHUN ET AL: "Multi-scale rotated face detection method based on cascaded regression networks", 《Journal of Electronic Measurement and Instrumentation》 * |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110717424A (en) * | 2019-09-26 | 2020-01-21 | 南昌大学 | Real-time tiny face detection method based on preprocessing mechanism |
CN110717424B (en) * | 2019-09-26 | 2023-06-30 | 南昌大学 | Real-time minimum face detection method based on pretreatment mechanism |
CN111027382A (en) * | 2019-11-06 | 2020-04-17 | 华中师范大学 | Attention mechanism-based lightweight face detection method and model |
CN112825118A (en) * | 2019-11-20 | 2021-05-21 | 北京眼神智能科技有限公司 | Rotation invariance face detection method and device, readable storage medium and equipment |
CN112825118B (en) * | 2019-11-20 | 2024-05-03 | 北京眼神智能科技有限公司 | Rotation invariance face detection method, device, readable storage medium and equipment |
CN111158563A (en) * | 2019-12-11 | 2020-05-15 | 青岛海信移动通信技术股份有限公司 | Electronic terminal and picture correction method |
CN113051960A (en) * | 2019-12-26 | 2021-06-29 | 深圳市光鉴科技有限公司 | Depth map face detection method, system, device and storage medium |
CN111427448A (en) * | 2020-03-05 | 2020-07-17 | 融信信息科技有限公司 | Portrait marking method and device and computer readable storage medium |
CN111427448B (en) * | 2020-03-05 | 2023-07-28 | 融信信息科技有限公司 | Portrait marking method and device and computer readable storage medium |
CN111428657A (en) * | 2020-03-27 | 2020-07-17 | 杭州趣维科技有限公司 | Real-time rotation invariant face key point detection method |
CN111739070A (en) * | 2020-05-28 | 2020-10-02 | 复旦大学 | Real-time multi-pose face detection algorithm based on progressive calibration type network |
CN111739070B (en) * | 2020-05-28 | 2022-07-22 | 复旦大学 | Real-time multi-pose face detection algorithm based on progressive calibration type network |
CN111709407B (en) * | 2020-08-18 | 2020-11-13 | 眸芯科技(上海)有限公司 | Method and device for improving video target detection performance in monitoring edge calculation |
CN111709407A (en) * | 2020-08-18 | 2020-09-25 | 眸芯科技(上海)有限公司 | Method and device for improving video target detection performance in monitoring edge calculation |
CN112287977B (en) * | 2020-10-06 | 2024-02-09 | 武汉大学 | Target detection method based on bounding box key point distance |
CN112287977A (en) * | 2020-10-06 | 2021-01-29 | 武汉大学 | Target detection method based on key point distance of bounding box |
CN112381127A (en) * | 2020-11-03 | 2021-02-19 | 浙江工业大学 | Pearl sorting method based on human bifurcation intervention |
CN112836566A (en) * | 2020-12-01 | 2021-05-25 | 北京智云视图科技有限公司 | Multitask neural network face key point detection method for edge equipment |
CN112733700A (en) * | 2021-01-05 | 2021-04-30 | 风变科技(深圳)有限公司 | Face key point detection method and device, computer equipment and storage medium |
CN112767019A (en) * | 2021-01-12 | 2021-05-07 | 珠海亿智电子科技有限公司 | Advertisement putting method, device, equipment and storage medium |
CN112861875A (en) * | 2021-01-20 | 2021-05-28 | 西南林业大学 | Method for distinguishing different wood products |
CN112861875B (en) * | 2021-01-20 | 2022-10-04 | 西南林业大学 | Method for distinguishing different wood products |
CN115273180B (en) * | 2022-07-01 | 2023-08-15 | 南通大学 | Online examination invigilating method based on random forest |
CN115273180A (en) * | 2022-07-01 | 2022-11-01 | 南通大学 | Online examination invigilating method based on random forest |
WO2024011859A1 (en) * | 2022-07-13 | 2024-01-18 | 天翼云科技有限公司 | Neural network-based face detection method and device |
Also Published As
Publication number | Publication date |
---|---|
US20210056293A1 (en) | 2021-02-25 |
CN110263774B (en) | 2019-11-22 |
US10984224B2 (en) | 2021-04-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110263774B (en) | A kind of method for detecting human face | |
CN107194341B (en) | Face recognition method and system based on fusion of Maxout multi-convolution neural network | |
Zhang et al. | Pedestrian detection method based on Faster R-CNN | |
CN108875600A (en) | A kind of information of vehicles detection and tracking method, apparatus and computer storage medium based on YOLO | |
CN109117876A (en) | A kind of dense small target deteection model building method, model and detection method | |
US20140328512A1 (en) | System and method for suspect search | |
Sahbi et al. | A Hierarchy of Support Vector Machines for Pattern Detection. | |
CN106407958B (en) | Face feature detection method based on double-layer cascade | |
CN105447532A (en) | Identity authentication method and device | |
CN110263712A (en) | A kind of coarse-fine pedestrian detection method based on region candidate | |
CN107341505B (en) | Scene classification method based on image significance and Object Bank | |
CN103279768A (en) | Method for identifying faces in videos based on incremental learning of face partitioning visual representations | |
Zakaria et al. | Face detection using combination of Neural Network and Adaboost | |
CN104156690B (en) | A kind of gesture identification method based on image space pyramid feature bag | |
Andiani et al. | Face recognition for work attendance using multitask convolutional neural network (MTCNN) and pre-trained facenet | |
CN109740429A (en) | Smiling face's recognition methods based on corners of the mouth coordinate mean variation | |
CN113496260A (en) | Grain depot worker non-standard operation detection method based on improved YOLOv3 algorithm | |
Kapsouras et al. | Action recognition by fusing depth video and skeletal data information | |
Kapsouras et al. | Feature comparison and feature fusion for traditional dances recognition | |
Chen et al. | Slender Flexible Object Segmentation Based on Object Correlation Module and Loss Function Optimization | |
CN106599815A (en) | Mark distribution based head posture estimation method solving problem of class deletion | |
Ravidas et al. | Deep learning for pose-invariant face detection in unconstrained environment | |
CN110363164A (en) | A kind of unified approach based on LSTM time consistency video analysis | |
Yue et al. | Plant leaf recognition based on naive Bayesian classification and linear discriminant analysis model | |
CN107679528A (en) | A kind of pedestrian detection method based on AdaBoost SVM Ensemble Learning Algorithms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||