CN111739070B - Real-time multi-pose face detection algorithm based on progressive calibration type network - Google Patents

Real-time multi-pose face detection algorithm based on progressive calibration type network

Info

Publication number
CN111739070B
CN111739070B CN202010471082.3A
Authority
CN
China
Prior art keywords
face
stage
image
degrees
detection
Prior art date
Legal status
Active
Application number
CN202010471082.3A
Other languages
Chinese (zh)
Other versions
CN111739070A (en)
Inventor
吴渊
金城
李雨晴
Current Assignee
Fudan University
Original Assignee
Fudan University
Priority date
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN202010471082.3A
Publication of CN111739070A
Application granted
Publication of CN111739070B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/30 - Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 - Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/0002 - Inspection of images, e.g. flaw detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30196 - Human being; Person
    • G06T 2207/30201 - Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention belongs to the technical field of face detection, and particularly relates to a real-time multi-pose face detection algorithm based on a progressive calibration network. Aiming at the problems of rotation and occlusion in face detection, the invention handles face rotation in three stages: each stage calibrates the face within a certain angle range, and the last stage performs the final detection and judges the precise angle of the calibrated face image. On the basis of the CASIA-WebFace dataset, a rotated and occluded face dataset, SORF (Synthetic Occluded and Rotated Faces), is constructed through a series of data-augmentation steps (occluder annotation, face keypoint localization, occluded-face synthesis, and image rotation) and is used to train and test the network model. The invention can effectively detect rotated and occluded faces while preserving real-time detection, and achieves excellent results on common public datasets.

Description

Real-time multi-pose face detection algorithm based on progressive calibration type network
Technical Field
The invention belongs to the technical field of face detection, and particularly relates to a real-time multi-pose face detection algorithm based on a progressive calibration type network.
Background
With the continuing development of the information society, face detection and recognition, a technology that is contactless, non-intrusive, intuitive, and convenient, has become ubiquitous in daily life. Related products and applications have been successfully deployed in many consumer and public-security scenarios, such as face unlocking on mobile phones, face-triggered photo capture, self-service face-scan payment, beautification filters, camera surveillance, hotel check-in identity verification, and face-scan ticket gates at railway stations. In practical application scenarios, rotation and occlusion of the face are hard to avoid during image capture: an angular offset between the camera and the face rotates the face in the image, and what a person wears (a mask, sunglasses, bangs, etc.) occludes part of the facial information in the image. Both rotation and occlusion of the face image degrade face detection.
Mainstream face detection algorithms process an image as a two-dimensional pixel matrix and declare a region a face when it satisfies specific computed criteria. During detection, rotation destroys the orientational characteristics of the face and occlusion destroys its geometric characteristics, so the detection algorithm can no longer judge faces accurately and quickly, and the face detection result suffers. Reducing or even eliminating the negative impact of rotation and occlusion on face detection algorithms is therefore imperative. Solving these problems improves the robustness of face detection, helps the technology land in real-world applications, provides a better foundation for subsequent face-related applications, and offers ideas for solving similar problems in the future, with lasting influence on the development and application of object detection algorithms.
Disclosure of Invention
The invention aims to provide a real-time multi-pose face detection algorithm based on a progressive calibration network that can be applied in a face detection system to mitigate the negative impact of rotation and occlusion on face detection.
The invention addresses face rotation and occlusion in face detection by splitting the rotation problem into three stages: each stage calibrates the face within a certain angle range, and the last stage performs the final detection and judges the precise angle of the calibrated face image. On the basis of the CASIA-WebFace dataset, a rotated and occluded face dataset SORF is constructed through a series of data-augmentation steps (occluder annotation, face keypoint localization, occluded-face synthesis, and image rotation) and is used to train and test the network model. The invention can effectively detect rotated and occluded faces while preserving real-time detection, and achieves excellent results on common public datasets. The technical scheme of the invention is specified as follows.
The invention provides a real-time multi-pose face detection algorithm based on a progressive calibration network, comprising the following specific steps:
1) on the basis of the CASIA-WebFace dataset, performing occluder annotation and face keypoint localization, then occluded-face synthesis and image rotation in sequence, and constructing the occluded and rotated face dataset SORF;
2) processing the images in the SORF dataset obtained in step 1): detecting the resulting face candidate boxes in three coarse-to-fine stages, calibrating the face angle within a certain range at each stage, and performing the final detection and precise angle judgment on the calibrated face image in the last stage.
In the invention, in step 1), the face keypoints are obtained on the face image by the Dlib face keypoint detection method.
In the invention, in step 1), when an occluder is composited with a face, the occluder image is deformed by the thin-plate spline (TPS) method so that it matches the designated position in the target face image; default values in the deformed image are filled by nonlinear interpolation.
After the occluder is deformed, it is composited with the face image at the corresponding position as shown in formula (10):
Output = Occlusion × Opacity + Input × (1 - Opacity) (10)
wherein Occlusion is the pixel value of the occluder, Input is the pixel value of the input face image, and Opacity is the transparency of the occluder.
In the invention, in step 1), the initial deflection angle α of the face is computed from the coordinates of the left and right eyes, and a rotated, occluded face image is then obtained by rotating a specified angle; wherein the eye-landmark coordinates (x_i, y_i) are averaged per eye to obtain two points (x_L, y_L) and (x_R, y_R) representing the left and right eyes, and the angle of the line connecting the two points is taken as the initial deflection angle α of the face, as shown in formula (11):
α = arctan((y_R - y_L) / (x_R - x_L)) (11)
The rotated-face detection problem is solved by a deep-learning model based on a progressive calibration network; the occluded-face detection problem is solved by constructing an occluded-face dataset, i.e., by expanding the training set. Compared with the prior art, the invention has the following beneficial effects:
(1) the invention provides a method for detecting rotated and occluded faces, and its face detection meets the effectiveness, robustness, and real-time requirements of practical applications;
(2) on the basis of an existing public dataset, the method constructs the rotated and occluded face dataset SORF by occluder synthesis, so that a network model trained on this dataset detects rotated and occluded faces better and generalizes better to occluded-face detection;
(3) the method proceeds in multiple stages, performing face detection and angle calibration on the rotated face image step by step, which reduces the difficulty of predicting the face rotation angle and improves the accuracy of face detection and angle calibration;
(4) using a cascaded network model, the face image is processed from coarse to fine, which greatly improves face detection efficiency.
Drawings
FIG. 1 is a block flow diagram of the algorithm of the present invention.
Fig. 2 is a schematic diagram of key points of sunglasses and masks.
Fig. 3 shows the position of the sunglasses and mask (blue is sunglasses, green is mask).
FIG. 4 is a schematic diagram of occluded face synthesis.
Fig. 5 is a schematic view of a rotated image.
Fig. 6 is a first stage network architecture.
Fig. 7 is a second stage network architecture.
Fig. 8 is a third stage network structure.
FIG. 9 is a schematic diagram of an algorithm face detection process in the embodiment of the present invention.
Detailed Description
Fig. 1 is a block diagram of the overall system flow of the real-time multi-pose face detection algorithm based on the progressive calibration network according to the invention. The method comprises two main parts: construction of the SORF dataset and the network model.
I. SORF dataset construction
On the basis of the CASIA-WebFace dataset, the method adopts occluder synthesis: by localizing face keypoints and occluder keypoints (sunglasses and masks), the sunglasses or mask is composited with the face to obtain an occluded face image; meanwhile, the initial deflection angle of the face is computed from the face keypoints, and the final rotated, occluded face image is obtained by rotating a specified angle. In this way the occluded and rotated face dataset SORF (Synthetic Occluded and Rotated Faces) is constructed.
Dataset construction comprises four stages: occluder keypoint annotation, face keypoint localization, occluded-face synthesis, and image rotation.
1. Occluder keypoint annotation
Sunglasses and masks serve as the occluders composited with face images, simulating occluded faces in real scenes. First, a number of images of standalone sunglasses and masks, i.e., sunglasses and masks not worn on a face, were collected manually from the Internet, and their keypoints were annotated by hand: 10 keypoints are chosen for sunglasses and 12 for masks, as shown in fig. 2.
2. Face keypoint localization
The invention uses the Dlib face keypoint detection method to obtain 68 keypoints on each face image. A subset of these 68 keypoints is used to compute the paste positions of the sunglasses and the mask in the face image, as shown in fig. 3.
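For concreteness, a minimal sketch of this landmark-extraction step follows. It assumes the standard pre-trained Dlib model file shape_predictor_68_face_landmarks.dat and the usual 68-point index convention; the file path and the chosen landmark subsets are illustrative, not values fixed by the patent.

```python
# Hedged sketch: extracting the 68 Dlib landmarks used to anchor the occluders.
# Assumes the standard pre-trained "shape_predictor_68_face_landmarks.dat" model.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = cv2.imread("face.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

for rect in detector(gray, 1):                 # coarse face boxes
    shape = predictor(gray, rect)              # 68 landmarks for this face
    pts = [(p.x, p.y) for p in shape.parts()]
    left_eye = pts[36:42]                      # indices 36-41: left eye
    right_eye = pts[42:48]                     # indices 42-47: right eye
    lower_face = pts[1:16] + pts[31:36]        # jawline + nose base, plausible mask anchors
```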
3. Occluded-face synthesis
The invention deforms the occluder image by the thin-plate spline (TPS) method so that it matches the designated position in the target face image, and fills default values in the deformed image by nonlinear interpolation.
After the occluder is deformed, it is composited with the face image at the corresponding position as shown in formula (10):
Output = Occlusion × Opacity + Input × (1 - Opacity) (10)
wherein Occlusion is the pixel value of the occluder, Input is the pixel value of the input face image, and Opacity is the transparency of the occluder. The synthesized occluded face image is thus obtained, as shown in fig. 4.
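This synthesis step can be sketched as follows with OpenCV's TPS transformer (shipped in opencv-contrib-python). The function name paste_occluder, the point arrays, and the uniform opacity value are illustrative assumptions; the patent's own nonlinear interpolation of default values is not reproduced here.

```python
# Hedged sketch of occluded-face synthesis: TPS-warp the occluder toward the
# landmark-derived target points, then blend per formula (10):
# Output = Occlusion*Opacity + Input*(1 - Opacity).
import cv2
import numpy as np

def paste_occluder(face, occluder, occ_pts, face_pts, opacity=0.9):
    # occluder should live on a canvas the same size as `face` so the warp
    # lands in face coordinates (pad/copy it there beforehand).
    tps = cv2.createThinPlateSplineShapeTransformer()
    matches = [cv2.DMatch(i, i, 0.0) for i in range(len(occ_pts))]
    src = np.asarray(occ_pts, np.float32).reshape(1, -1, 2)
    dst = np.asarray(face_pts, np.float32).reshape(1, -1, 2)
    tps.estimateTransformation(dst, src, matches)   # map occluder pts onto face pts
    warped = tps.warpImage(occluder)
    # crude occluder mask: any non-black warped pixel, scaled by opacity
    mask = (warped.sum(axis=2, keepdims=True) > 0).astype(np.float32) * opacity
    out = warped * mask + face * (1.0 - mask)       # formula (10), per pixel
    return out.astype(np.uint8)
```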
4. Image rotation
The invention computes with the face keypoints: the coordinates of the left-eye and right-eye landmarks are averaged per eye to obtain two points representing the left and right eyes, and the angle of the line connecting the two points is taken as the initial deflection angle of the face, as shown in formula (11):
α = arctan((y_R - y_L) / (x_R - x_L)) (11)
After the deflection angle α of the face is obtained, the image is rotated by θ - α to obtain the final occluded face image at rotation angle θ, as shown in fig. 5.
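A small sketch of this rotation step is given below; the image-centre pivot and the function name are assumptions (the patent does not specify a pivot), and the sign convention may need flipping for a y-down image coordinate system.

```python
# Hedged sketch of step 4 / formula (11): average each eye's landmarks, take the
# eye-line angle alpha, then rotate by (theta - alpha) so the synthesized face
# ends up at the target in-plane angle theta.
import cv2
import numpy as np

def rotate_to_angle(img, left_eye, right_eye, theta_deg):
    xL, yL = np.mean(left_eye, axis=0)
    xR, yR = np.mean(right_eye, axis=0)
    alpha = np.degrees(np.arctan2(yR - yL, xR - xL))   # initial deflection angle
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), theta_deg - alpha, 1.0)
    return cv2.warpAffine(img, M, (w, h))
```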
II. Network model
The invention collects initial face candidate boxes on the image with a sliding window: the image is scaled to different sizes with an image pyramid while the sliding-window size stays fixed, so that regions of different scales in the original image can be collected. The boxes are then merged by non-maximum suppression (NMS): the large set of collected candidate boxes is predicted and scored, and among the candidate boxes representing the same detection target the one with the highest score is kept as the final result while the remaining non-maximal boxes are discarded.
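This candidate-collection pipeline can be sketched as below; the stride, pyramid factor, and IoU threshold are illustrative values, not parameters stated in the patent. In practice the crops from pyramid_windows would be scored by the stage networks before nms keeps one box per target.

```python
# Hedged sketch: fixed-size sliding window over an image pyramid, plus greedy
# non-maximum suppression over scored square boxes (x, y, w) in original coords.
import cv2
import numpy as np

def pyramid_windows(img, win=24, stride=8, factor=0.79):
    scale = 1.0
    while min(img.shape[:2]) >= win:
        h, w = img.shape[:2]
        for y in range(0, h - win + 1, stride):
            for x in range(0, w - win + 1, stride):
                # crop at this scale; box mapped back to original coordinates
                yield img[y:y + win, x:x + win], (x / scale, y / scale, win / scale)
        scale *= factor
        img = cv2.resize(img, (int(w * factor), int(h * factor)))

def nms(boxes, scores, iou_thr=0.3):
    boxes, scores = np.asarray(boxes), np.asarray(scores)
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        xi, yi, wi = boxes[i]
        xj, yj, wj = boxes[order[1:]].T
        ix = np.maximum(0, np.minimum(xi + wi, xj + wj) - np.maximum(xi, xj))
        iy = np.maximum(0, np.minimum(yi + wi, yj + wj) - np.maximum(yi, yj))
        iou = ix * iy / (wi * wi + wj * wj - ix * iy)
        order = order[1:][iou <= iou_thr]   # drop boxes covering the same target
    return keep
```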
The network model of the invention is divided into three stages and adopts a coarse-to-fine face detection strategy.
1) The first-stage detection network adopts a simple convolutional structure. It roughly screens the large number of candidate face boxes and simultaneously performs a first angle discrimination on them, so that all faces are calibrated toward the upright direction (angle range [-90°, 90°]). The specific network parameters are shown in Table 1 and the network structure in fig. 6.
TABLE 1 First-stage network model parameters (layer sequence as recited in claim 1)
Network type       Kernel size    Stride
Input (24*24*3)    /              /
Conv1              3*3            2
Conv2              3*3            2
Conv3              3*3            2
FC                 /              /
The input images of the first-stage network model are obtained by sliding a window over the original image; windows of different sizes are all scaled to 24*24 for processing. For each window image, the first stage has 3 detection tasks: face/non-face classification, bounding-box regression, and rotation-angle classification, as shown in formula (1):
[f, t, g] = F1(x) (1)
wherein F1 is the first-stage face detector; f is the face confidence score (the higher, the more likely the window is a face), and setting a threshold on f removes low-confidence candidate boxes, achieving the screening purpose; t is the feature vector representing the face bounding box; and g is the face-direction score, i.e., the probability that the face is upright.
1) Face/non-face classification task: the candidate boxes are classified as face or non-face. The classification task uses the softmax loss function, as shown in formula (2):
L_cls = y·log f + (1 - y)·log(1 - f) (2)
wherein y = 1 if the sample is a face and y = 0 otherwise.
2) Face bounding-box regression task: the target is to regress an accurate face bounding box (a square of equal width and height). The feature vector t of the face bounding box consists of three parameters, the top-left coordinates (a, b) and the width w of the box, computed as shown in formula (3):
t_a = (a* + 0.5·w* - a - 0.5·w) / w,  t_b = (b* + 0.5·w* - b - 0.5·w) / w,  t_w = w* / w (3)
wherein a starred variable denotes the corresponding ground-truth value.
For the regression of the bounding-box feature vector t, the loss uses the smooth L1 function, as shown in formula (4):
L_reg(t, t*) = S(t - t*),  where S(x) = 0.5·x² if |x| < 1 and S(x) = |x| - 0.5 otherwise (4)
3) Rotation-angle classification task: in the first stage, the invention coarsely classifies the face direction into two categories, face up or face down, with the loss function shown in formula (5):
L_cal = y·log g + (1 - y)·log(1 - g) (5)
wherein y = 1 if the face is up and y = 0 if it is down.
From the losses of the three tasks, the total loss of the whole first-stage detection task is obtained as shown in formula (6):
min L = L_cls + λ_reg·L_reg + λ_cal·L_cal (6)
wherein λ_reg and λ_cal are balance factors that adjust the weight of each loss term.
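A hedged PyTorch sketch of this first-stage objective follows; the head names, label conventions, and default balance-factor values are assumptions for illustration, not values given in the patent.

```python
# Hedged sketch of the first-stage multi-task loss, formula (6):
# L = L_cls + lambda_reg * L_reg + lambda_cal * L_cal.
import torch
import torch.nn.functional as F

def stage1_loss(f_logits, t_pred, g_logits, y_face, t_true, y_up,
                lam_reg=0.5, lam_cal=0.5):
    # formula (2): face / non-face softmax loss over all windows
    l_cls = F.cross_entropy(f_logits, y_face)
    pos = y_face == 1                      # box and angle terms only on face windows
    # formula (4): smooth-L1 regression of the box vector t = (t_a, t_b, t_w)
    l_reg = F.smooth_l1_loss(t_pred[pos], t_true[pos])
    # formula (5): binary up/down orientation loss
    l_cal = F.cross_entropy(g_logits[pos], y_up[pos])
    return l_cls + lam_reg * l_reg + lam_cal * l_cal   # formula (6)
```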
By iteratively optimizing this loss, the first-stage network model learns to filter out a large number of non-face windows, and the remaining windows are output as the face candidate boxes of this stage. Meanwhile, according to the face-direction score g, the face angle θ1 of this stage is determined as shown in formula (7):
θ1 = 0°, if g ≥ 0.5;  θ1 = 180°, if g < 0.5 (7)
where 0° means the face is up and 180° means it is down. A face judged to be facing down is simply rotated by 180°, so after this stage every face is calibrated upward, i.e., the range of the face rotation angle shrinks from [-180°, 180°] to [-90°, 90°].
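Applied to a single candidate crop, this stage-1 decision rule can be sketched as follows; g_up stands for the network's upright-face score and is an illustrative interface.

```python
# Hedged sketch of the stage-1 calibration, formula (7): a crop scored as
# facing down is rotated 180°, shrinking the RIP range to [-90°, 90°].
import numpy as np

def stage1_calibrate(crop, g_up):
    theta1 = 0.0 if g_up >= 0.5 else 180.0
    if theta1 == 180.0:
        crop = np.rot90(crop, 2).copy()    # lossless 180° rotation
    return crop, theta1
```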
2) Specific network parameters in the second stage are shown in table 2, and a network structure is shown in fig. 7.
TABLE 2 second stage network model parameters
Network type    Kernel size    Stride    Output size
Input           /              /         24*24*3
Conv1           3*3            1         22*22*20
MP1             3*3            2         11*11*20
Conv2           3*3            1         9*9*40
MP2             3*3            2         4*4*40
Conv3           2*2            1         3*3*70
FC              /              /         140
The face rotation angles in the second stage are divided into 3 classes. After the initial calibration of the first stage, the face rotation angle already lies within [-90°, 90°], so the second stage subdivides this range into [-90°, -45°], [-45°, 45°], and [45°, 90°]; the corresponding 3-class prediction of the face rotation angle is shown in formula (8):
θ2 = -90°, if argmax_i g_i = 0;  θ2 = 0°, if argmax_i g_i = 1;  θ2 = 90°, if argmax_i g_i = 2 (8)
wherein g_i is the class probability of the rotation angle after the softmax calculation. As in the first stage, faces in the two classes with non-zero angle are rotationally calibrated again: faces within [-90°, -45°] are rotated by 90° and faces within [45°, 90°] by -90°, so that all faces fall within the upright angle range [-45°, 45°].
3) The third stage performs the accurate face detection, bounding-box regression, and computation of the exact face deflection angle on these candidate face boxes. The specific network parameters are shown in Table 3 and the network structure in fig. 8.
TABLE 3 third stage network model parameters
Network type Size of nucleus Step size Output size
Input the method / / 48*48*3
Conv1 3*3 1 46*46*24
MP1 3*3 2 23*23*24
Conv2 3*3 1 21*21*48
MP2 3*3 2 10*10*48
Conv3 3*3 1 8*8*96
MP3 2*2 2 4*4*96
Conv4 2*2 1 3*3*192
FC / / 384
By processing the candidate face boxes with the third-stage network model, the precise predicted rotation angle θ3 of this stage is obtained; combined with the calibration angles θ1 and θ2 of the first two stages, the complete in-plane rotation angle θ_RIP of the face is computed as shown in formula (9):
θ_RIP = θ1 + θ2 + θ3 (9)
This calculation follows the idea of cascade regression: the quantities computed at each stage are added together to form the final result. The face detection flow of the algorithm is shown in fig. 9.
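To make the cascade concrete, here is a hedged end-to-end sketch of formula (9); stage1, stage2, and stage3 stand in for the three trained networks, and their interfaces are assumptions rather than the patent's API.

```python
# Hedged sketch of the progressive calibration cascade, formula (9): each stage
# rotates the crop by its predicted calibration angle, and the per-stage angles
# accumulate into the full in-plane rotation theta_RIP.
import cv2
import numpy as np

def rotate(img, deg):
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), deg, 1.0)
    return cv2.warpAffine(img, M, (w, h))

def full_rip_angle(crop, stage1, stage2, stage3):
    g_up = stage1(crop)                          # upright-face score
    theta1 = 0.0 if g_up >= 0.5 else 180.0       # formula (7)
    crop = rotate(crop, -theta1)                 # calibrate into [-90°, 90°]
    g3 = stage2(crop)                            # 3-way scores, formula (8)
    theta2 = (-90.0, 0.0, 90.0)[int(np.argmax(g3))]
    crop = rotate(crop, -theta2)                 # calibrate into [-45°, 45°]
    theta3 = stage3(crop)                        # fine residual angle
    return theta1 + theta2 + theta3              # theta_RIP, formula (9)
```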
Example 1
The datasets used by the invention include FDDB, WIDER FACE, AR, and the rotated and occluded face dataset SORF constructed by the invention.
To evaluate the quality of the constructed SORF dataset, the existing SSH algorithm was trained separately on three different training sets, WIDER FACE, CASIA-WebFace, and SORF, and the detection performance of the resulting models was compared on the SORF test set and the AR dataset; the results are shown in Table 4.
Table 4SORF data set quality assessment experiments: experiment on the effect of training set on SSH algorithm (unit:%)
The model trained on the SORF training set reaches 93.5% detection accuracy on the test set, improvements of 25.3 and 17.9 percentage points over the other two configurations, showing that using the SORF dataset effectively improves the robustness of the algorithm to rotated and occluded faces.
On the constructed SORF dataset, multiple groups of comparison experiments were run between the proposed algorithm and mainstream face detectors such as MTCNN, SSH, and PCN. Each detector's model was trained on the training split of the SORF dataset; the SORF test and training splits do not overlap. The experiments measured multiple accuracy/recall operating points by adjusting the detection threshold and computed the corresponding F1 scores; the results are shown in Table 5.
TABLE 5 SORF data set comparative experimental results (unit:%)
As the data in the table show, the proposed algorithm performs significantly better on the SORF dataset than the other three algorithms, with an F1 score 2 to 3 percentage points higher than that of the runner-up PCN algorithm. Its detection of rotated and occluded face images improves markedly over the other algorithms, consistent with strong robustness.
The proposed algorithm was also compared with other face detection algorithms on the two currently mainstream face detection datasets, FDDB and WIDER FACE, to verify that its effectiveness and robustness carry over to public datasets. Results on FDDB are shown in Table 6 and results on WIDER FACE in Table 7.
The FDDB protocol evaluates a detector by its recall at a fixed number of false positives (FP, the number of non-faces in the returned detections). The WIDER FACE dataset comprises Easy, Medium, and Hard test subsets of successively increasing difficulty.
TABLE 6 FDDB data set comparison of experimental results (unit:%)
TABLE 7 WIDER FACE data set comparison of experimental results (unit: %)
Algorithm Easy Medium Hard
MTCNN 84.8 82.5 59.8
SSH 93.1 92.1 84.5
PCN 87.2 84.6 72.1
The invention 90.8 88.6 80.5
The proposed algorithm is slightly behind SSH, the best-performing detector, but the gap is small, essentially within 4%. Although the algorithm does not reach the best results on the public datasets, its performance remains satisfactory and close to the mainstream detectors. Together with the preceding experiments, this shows that the algorithm improves markedly on rotated and occluded faces while still generalizing to the face images in public datasets.
A detection-speed comparison of the different face detection algorithms was run on 3 image datasets of different sizes; the results are shown in Table 8.
TABLE 8 test speed comparison test results
The proposed algorithm builds on the PCN algorithm and adopts tighter angle-detection ranges than PCN, so its detection speed drops slightly, yet it remains clearly faster than the other algorithms, demonstrating the real-time capability of the detector.
To find the best choice of training set and model configuration and thus improve the face detection model in practical use, ablation experiments examine the influence of the training-set composition ratio and of the first- and second-stage angle-calibration ranges on the results.
SORF and CASIA-WebFace are mixed as the training set, and experiments are run with different proportions of the two; the face detection accuracy of each trained model is then tested on both the SORF and CASIA-WebFace datasets. The results are shown in Table 9.
TABLE 9 training set constitutes the results of the ratio experiments (unit:%)
As Table 9 shows, although the model trained on the SORF dataset alone achieves good detection results on the unoccluded task, mixing the SORF dataset with the unoccluded original dataset achieves excellent results on both the occluded and unoccluded tasks. Comparing the influence of the mixing ratio, the invention uses SORF and CASIA-WebFace data at 1:1 in the training set: this slightly lowers detection accuracy on occluded face images but greatly improves detection of unoccluded faces.
To examine the influence of the first- and second-stage calibration angle ranges on the detection results: in Table 10, the angle range denotes the range of face rotation angles to be processed at that stage, and the calibration-interval division denotes how that range is split into sub-intervals and the angular width of each. As shown in Table 10, 4 groups of comparison experiments were run. Taking the first group as an example: a first-stage angle range of 360° means that face images anywhere in the full 360° range are calibrated in the first stage, with two 180° calibration intervals, face up and face down; a second-stage angle range of 180° means that face images within a 180° range are calibrated in that stage, with three intervals, 45° facing left, 45° facing right, and 90° upright, after which the face is calibrated into the upright angle range.
TABLE 10 Angle calibration Range experiment
As the data in the table show, dividing the first-stage range more finely improves accuracy to some extent but also lowers detection speed. The accuracy of the fourth group is clearly higher than that of the first three. The fourth group differs from the first in that, in the first group, the angle range calibrated by the second stage matches the range remaining after first-stage calibration, whereas in the fourth group the range after first-stage calibration is 90° yet the second stage still calibrates a 180° range. Faces misjudged in the first stage because they lie at the boundary between two intervals can thus be detected a second time, which amounts to a one-shot error correction and slightly improves accuracy. Although the FPS of the fourth group drops slightly, it still meets the real-time requirement, and in practice the higher accuracy clearly matters more. The algorithm therefore uses this group of parameters in the actual model.

Claims (5)

1. A real-time multi-pose face detection algorithm based on a progressive calibration type network is characterized by comprising the following specific steps:
(1) on the basis of the CASIA-WebFace dataset, performing occluder annotation and face keypoint localization, then occluded-face synthesis and image rotation in sequence, and constructing the occluded and rotated face dataset SORF;
(2) processing the images in the SORF dataset obtained in step (1): detecting the resulting face candidate boxes in three coarse-to-fine stages, calibrating the face angle within a certain range at each stage, and performing the final detection and precise angle judgment on the calibrated face image in the last stage; specifically:
first) in the first stage, roughly screening a large number of candidate face boxes through a first-stage network model consisting of an input layer, a first convolutional layer with convolution kernel size 3 × 3 and stride 2, a second convolutional layer with convolution kernel size 3 × 3 and stride 2, a third convolutional layer with convolution kernel size 3 × 3 and stride 2, and a fully connected layer, while performing a first angle discrimination on the face candidate boxes, so that all faces are calibrated toward the upright direction, i.e., into the angle range [-90°, 90°];
the input image of the first stage network model is obtained by using a sliding window mode for an original image, and for each image obtained by the sliding window, there are 3 detection tasks in the first stage, namely face/non-face classification, bounding box regression and rotation angle classification, as shown in formula (1):
[f,t,g]=F1(x) (1)
wherein F1 is the first-stage face detector, f is a face confidence score (candidate face boxes with low confidence can be removed by setting a threshold on f), t is the feature vector representing the face bounding box, and g is the face-direction score, i.e., the probability that the face is upright;
1) face/non-face classification task: performing face/non-face prediction classification on the candidate boxes, with the softmax loss function used as the loss of the classification task, as shown in formula (2):
L_cls = y·log f + (1 - y)·log(1 - f) (2)
wherein y = 1 for a face sample and y = 0 otherwise;
2) face bounding-box regression task: the target is to regress an accurate face bounding box, namely a square of equal width and height; the feature vector t of the bounding box consists of three parameters, the top-left coordinates (a, b) and the width w of the box, computed as shown in formula (3):
t_a = (a* + 0.5·w* - a - 0.5·w) / w,  t_b = (b* + 0.5·w* - b - 0.5·w) / w,  t_w = w* / w (3)
wherein a starred variable denotes the corresponding ground-truth value;
for the regression task of the feature vector t of the bounding box, the loss function uses the smooth L1 loss function, as shown in formula (4):
L_reg(t, t*) = S(t - t*),  where S(x) = 0.5·x² if |x| < 1 and S(x) = |x| - 0.5 otherwise (4)
3) rotation-angle classification task: in the first stage, the face direction is simply classified into two categories, face up or face down, with the loss function shown in formula (5):
L_cal = y·log g + (1 - y)·log(1 - g) (5)
wherein y = 1 if the face is up and y = 0 if it is down;
the total loss function of the whole first-stage detection task, obtained from the loss functions of the three tasks, is shown in formula (6):
min L = L_cls + λ_reg·L_reg + λ_cal·L_cal (6)
wherein λ_reg and λ_cal are balance factors used to adjust the weight of each loss term;
by iteratively optimizing the loss function, the first-stage network model can filter out a large number of non-face windows, and the remaining windows are output as the face candidate boxes of this stage; meanwhile, according to the face-direction score g, the face angle θ1 of this stage is determined as shown in formula (7):
θ1 = 0°, if g ≥ 0.5;  θ1 = 180°, if g < 0.5 (7)
0 degrees represents that the face faces upwards, and 180 degrees represents that the face faces downwards;
through a first stage, the face judged to be downward is rotated by 180 degrees, so that the face is calibrated to be upward, namely the range of the rotation angle of the face is reduced from [ -180 degrees, 180 degrees ] to [ -90 degrees, 90 degrees ];
second) in the second stage, the second-stage network model consists of an input layer, a first convolutional layer with convolution kernel size 3 × 3 and stride 1, a first pooling layer with kernel size 3 × 3 and stride 2, a second convolutional layer with convolution kernel size 3 × 3 and stride 1, a second pooling layer with kernel size 3 × 3 and stride 2, a third convolutional layer with convolution kernel size 2 × 2 and stride 1, and a fully connected layer;
the face rotation angles in the second stage are divided into 3 classes, namely the three ranges [-90°, -45°], [-45°, 45°], and [45°, 90°], and the corresponding 3-class prediction of the face rotation angle is shown in formula (8):
θ2 = -90°, if argmax_i g_i = 0;  θ2 = 0°, if argmax_i g_i = 1;  θ2 = 90°, if argmax_i g_i = 2 (8)
wherein g_i is the class probability of the rotation angle after the softmax calculation;
after the second stage, for the two classes whose rotation angle is not 0, the faces are rotationally calibrated again: faces within [-90°, -45°] are rotated by 90° and faces within [45°, 90°] by -90°, so that all faces lie within the upright angle range [-45°, 45°];
and third) in the third stage, a third-stage network model consisting of an input layer, a first convolutional layer with convolution kernel size 3 × 3 and stride 1, a first pooling layer with kernel size 3 × 3 and stride 2, a second convolutional layer with convolution kernel size 3 × 3 and stride 1, a second pooling layer with kernel size 3 × 3 and stride 2, a third convolutional layer with convolution kernel size 2 × 2 and stride 1, a third pooling layer with kernel size 2 × 2 and stride 2, a fourth convolutional layer with convolution kernel size 2 × 2 and stride 1, and a fully connected layer performs the accurate face detection, bounding-box regression, and computation of the exact face deflection angle on the candidate face boxes; processing the candidate face boxes with the third-stage network model yields the precise predicted rotation angle θ3 of this stage, which is combined with the calibration angles θ1 and θ2 of the first two stages to compute the complete in-plane rotation angle θ_RIP of the face, as shown in formula (9):
θ_RIP = θ1 + θ2 + θ3 (9).
2. The real-time multi-pose face detection algorithm according to claim 1, wherein in step (1), the face keypoints are obtained on the face image by the Dlib face keypoint detection method.
3. The real-time multi-pose face detection algorithm according to claim 1, characterized in that in step (1), when the occlusion object is synthesized with the face, a thin plate spline deformation TPS method is adopted to deform the occlusion object image so as to make it coincide with the designated position in the face image to be mapped; filling default values in the deformed image by using a nonlinear interpolation method; after the barrier is deformed, the image synthesis of the position corresponding to the face image is shown as the formula (10):
Output=Occlusion×Opacity+Input×(1-Opacity) (10)
wherein Occlusion is the pixel value of the obstruction, Input is the pixel value of the Input face image, and Opacity is the transparency of the obstruction.
4. The real-time multi-pose face detection algorithm according to claim 1, wherein in step (1), the initial deflection angle α of the face is computed using the coordinates of the left and right eyes, and a rotated, occluded face image is then obtained by rotating a specified angle; wherein the eye-landmark coordinates (x_i, y_i) are averaged per eye to obtain two points (x_L, y_L) and (x_R, y_R) representing the left and right eyes, and the angle of the line connecting the two points is taken as the initial deflection angle α of the face, as shown in formula (11):
α = arctan((y_R - y_L) / (x_R - x_L)) (11).
5. The real-time multi-pose face detection algorithm according to claim 1, wherein in step (2), the method for processing the images in the SORF dataset obtained in step (1) is as follows: acquiring initial face candidate boxes by a sliding-window candidate-box collection mode; scaling the image with an image pyramid while keeping the sliding-window size fixed, so that regions of different scales in the original image are collected; and merging by non-maximum suppression (NMS): the large set of collected candidate boxes is predicted and scored, and for each detection target the candidate box with the highest score is selected as the final result.
CN202010471082.3A 2020-05-28 2020-05-28 Real-time multi-pose face detection algorithm based on progressive calibration type network Active CN111739070B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010471082.3A CN111739070B (en) 2020-05-28 2020-05-28 Real-time multi-pose face detection algorithm based on progressive calibration type network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010471082.3A CN111739070B (en) 2020-05-28 2020-05-28 Real-time multi-pose face detection algorithm based on progressive calibration type network

Publications (2)

Publication Number Publication Date
CN111739070A CN111739070A (en) 2020-10-02
CN111739070B true CN111739070B (en) 2022-07-22

Family

ID=72646742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010471082.3A Active CN111739070B (en) 2020-05-28 2020-05-28 Real-time multi-pose face detection algorithm based on progressive calibration type network

Country Status (1)

Country Link
CN (1) CN111739070B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418195B (en) * 2021-01-22 2021-04-09 电子科技大学中山学院 Face key point detection method and device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145867A (en) * 2017-05-09 2017-09-08 电子科技大学 Face and facial-occluder detection method based on multi-task deep learning
CN107748858A (en) * 2017-06-15 2018-03-02 华南理工大学 A multi-pose eye localization method based on cascaded convolutional neural networks
CN107886074A (en) * 2017-11-13 2018-04-06 苏州科达科技股份有限公司 A face detection method and face detection system
CN108805040A (en) * 2018-05-24 2018-11-13 复旦大学 A block-based occluded face recognition algorithm
CN110263774A (en) * 2019-08-19 2019-09-20 珠海亿智电子科技有限公司 A face detection method
CN110458005A (en) * 2019-07-02 2019-11-15 重庆邮电大学 A multi-task progressive-calibration-network-based rotation-invariant face detection method
CN110580445A (en) * 2019-07-12 2019-12-17 西北工业大学 A face keypoint detection method based on GIoU and improved weighted NMS

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145867A (en) * 2017-05-09 2017-09-08 电子科技大学 Face and facial-occluder detection method based on multi-task deep learning
CN107748858A (en) * 2017-06-15 2018-03-02 华南理工大学 A multi-pose eye localization method based on cascaded convolutional neural networks
CN107886074A (en) * 2017-11-13 2018-04-06 苏州科达科技股份有限公司 A face detection method and face detection system
CN108805040A (en) * 2018-05-24 2018-11-13 复旦大学 A block-based occluded face recognition algorithm
CN110458005A (en) * 2019-07-02 2019-11-15 重庆邮电大学 A multi-task progressive-calibration-network-based rotation-invariant face detection method
CN110580445A (en) * 2019-07-12 2019-12-17 西北工业大学 A face keypoint detection method based on GIoU and improved weighted NMS
CN110263774A (en) * 2019-08-19 2019-09-20 珠海亿智电子科技有限公司 A face detection method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Key Frame Extraction with Face Biometric Features in Multi-shot Human Re-identification System; Agus Gunawan et al.; IEEE; 2020-02-06; entire document *
Rotation-invariant face detection with cascaded networks and pyramid optical flow; Sun Rui et al.; 光电工程 (Opto-Electronic Engineering); 2020-01-15; Vol. 47, No. 1; entire document *

Also Published As

Publication number Publication date
CN111739070A (en) 2020-10-02

Similar Documents

Publication Publication Date Title
CN111414887B (en) Secondary detection mask face recognition method based on YOLOV3 algorithm
CN103927520B (en) A kind of backlight environment servant's face detecting method
JP6549797B2 (en) Method and system for identifying head of passerby
CN111967393A (en) Helmet wearing detection method based on improved YOLOv4
CN104036236B (en) A kind of face gender identification method based on multiparameter exponential weighting
CN105678811A (en) Motion-detection-based human body abnormal behavior detection method
CN106874894A (en) A kind of human body target detection method based on the full convolutional neural networks in region
CN105160297B (en) Masked man's event automatic detection method based on features of skin colors
CN111209892A (en) Crowd density and quantity estimation method based on convolutional neural network
CN108596087B (en) Driving fatigue degree detection regression model based on double-network result
CN106778633B (en) Pedestrian identification method based on region segmentation
CN105893946A (en) Front face image detection method
CN107273799A (en) A kind of indoor orientation method and alignment system
CN112364778A (en) Power plant safety behavior information automatic detection method based on deep learning
CN108074234A (en) A kind of large space flame detecting method based on target following and multiple features fusion
CN111882586A (en) Multi-actor target tracking method oriented to theater environment
CN112183472A (en) Method for detecting whether test field personnel wear work clothes or not based on improved RetinaNet
CN111739070B (en) Real-time multi-pose face detection algorithm based on progressive calibration type network
CN113111722A (en) Automatic driving target identification method based on improved Mask R-CNN
CN104598914A (en) Skin color detecting method and device
CN103607558A (en) Video monitoring system, target matching method and apparatus thereof
CN112613359B (en) Construction method of neural network for detecting abnormal behaviors of personnel
CN106815567A (en) A kind of flame detecting method and device based on video
CN113361370A (en) Abnormal behavior detection method based on deep learning
CN103077403B (en) pedestrian counting method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant