CN111739070B - Real-time multi-pose face detection algorithm based on progressive calibration type network - Google Patents

Real-time multi-pose face detection algorithm based on progressive calibration type network

Info

Publication number
CN111739070B
CN111739070B CN202010471082.3A
Authority
CN
China
Prior art keywords
face
stage
image
degrees
detection
Prior art date
Legal status
Active
Application number
CN202010471082.3A
Other languages
Chinese (zh)
Other versions
CN111739070A (en)
Inventor
吴渊
金城
李雨晴
Current Assignee
Fudan University
Original Assignee
Fudan University
Priority date
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN202010471082.3A
Publication of CN111739070A
Application granted
Publication of CN111739070B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/30 - Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 - Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/0002 - Inspection of images, e.g. flaw detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30196 - Human being; Person
    • G06T 2207/30201 - Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention belongs to the technical field of face detection, and particularly relates to a real-time multi-pose face detection algorithm based on a progressive calibration network. Aiming at the problems of rotation and occlusion in face detection, the invention handles face rotation in three stages: each stage calibrates the face within a certain angle range, and the last stage performs the final detection and judges the precise angle of the calibrated face image. On the basis of the CASIA-WebFace dataset, a rotated and occluded face dataset, SORF (Synthetic Occluded and Rotated Faces), is constructed through a series of data-augmentation steps (occluder annotation, face keypoint localization, occluded-face synthesis, and image rotation) and is used to train and test the network model. The invention can effectively detect rotated and occluded faces while preserving real-time detection, and achieves excellent results on common public datasets.

Description

Real-time multi-pose face detection algorithm based on progressive calibration type network
Technical Field
The invention belongs to the technical field of face detection, and particularly relates to a real-time multi-pose face detection algorithm based on a progressive calibration type network.
Background
With the continuing development of the information society, face detection and recognition, a technology that is contactless, non-intrusive, intuitive, and convenient, has become ubiquitous in daily life. Related products and applications have been successfully deployed in many consumer and public-security scenarios, such as face unlocking on mobile phones, face-triggered photo capture, self-service face-scan payment, beautification filters, camera surveillance, hotel check-in identity verification, and face-scan ticket gates at railway stations. In practical application scenarios, rotation and occlusion of the face are hard to avoid during image capture: an angular offset between the camera and the face rotates the face in the image, and what a person wears (a mask, sunglasses, bangs, etc.) occludes part of the facial information in the image. Both rotation and occlusion of the face image degrade face detection.
Mainstream face detection algorithms process an image as a two-dimensional pixel matrix and declare a region a face when it satisfies specific computed criteria. During detection, rotation destroys the orientational characteristics of the face and occlusion destroys its geometric characteristics, so the detection algorithm can no longer judge faces accurately and quickly, and the face detection result suffers. Reducing or even eliminating the negative impact of rotation and occlusion on face detection algorithms is therefore imperative. Solving these problems improves the robustness of face detection, helps the technology land in real-world applications, provides a better foundation for subsequent face-related applications, and offers ideas for solving similar problems in the future, with lasting influence on the development and application of object detection algorithms.
Disclosure of Invention
The invention aims to provide a real-time multi-pose face detection algorithm based on a progressive calibration network that can be applied in a face detection system to mitigate the negative impact of rotation and occlusion on face detection.
The invention addresses face rotation and occlusion in face detection by splitting the rotation problem into three stages: each stage calibrates the face within a certain angle range, and the last stage performs the final detection and judges the precise angle of the calibrated face image. On the basis of the CASIA-WebFace dataset, a rotated and occluded face dataset SORF is constructed through a series of data-augmentation steps (occluder annotation, face keypoint localization, occluded-face synthesis, and image rotation) and is used to train and test the network model. The invention can effectively detect rotated and occluded faces while preserving real-time detection, and achieves excellent results on common public datasets. The technical scheme of the invention is specified as follows.
The invention provides a real-time multi-pose face detection algorithm based on a progressive calibration network, comprising the following specific steps:
1) on the basis of the CASIA-WebFace dataset, performing occluder annotation and face keypoint localization, then occluded-face synthesis and image rotation in sequence, and constructing the occluded and rotated face dataset SORF;
2) processing the images in the SORF dataset obtained in step 1): detecting the resulting face candidate boxes in three coarse-to-fine stages, calibrating the face angle within a certain range at each stage, and performing the final detection and precise angle judgment on the calibrated face image in the last stage.
In the invention, in step 1), the face keypoints are obtained on the face image by the Dlib face keypoint detection method.
In the invention, in step 1), when an occluder is composited with a face, the occluder image is deformed by the thin-plate spline (TPS) method so that it matches the designated position in the target face image; default values in the deformed image are filled by nonlinear interpolation.
After the occluder is deformed, it is composited with the face image at the corresponding position as shown in formula (10):
Output = Occlusion × Opacity + Input × (1 - Opacity) (10)
wherein Occlusion is the pixel value of the occluder, Input is the pixel value of the input face image, and Opacity is the transparency of the occluder.
In the invention, in step 1), the initial deflection angle α of the face is computed from the coordinates of the left and right eyes, and a rotated, occluded face image is then obtained by rotating a specified angle; wherein the eye-landmark coordinates (x_i, y_i) are averaged per eye to obtain two points (x_L, y_L) and (x_R, y_R) representing the left and right eyes, and the angle of the line connecting the two points is taken as the initial deflection angle α of the face, as shown in formula (11):
α = arctan((y_R - y_L) / (x_R - x_L)) (11)
The rotated-face detection problem is solved by a deep-learning model based on a progressive calibration network; the occluded-face detection problem is solved by constructing an occluded-face dataset, i.e., by expanding the training set. Compared with the prior art, the invention has the following beneficial effects:
(1) the invention provides a method for detecting rotated and occluded faces, and its face detection meets the effectiveness, robustness, and real-time requirements of practical applications;
(2) on the basis of an existing public dataset, the method constructs the rotated and occluded face dataset SORF by occluder synthesis, so that a network model trained on this dataset detects rotated and occluded faces better and generalizes better to occluded-face detection;
(3) the method proceeds in multiple stages, performing face detection and angle calibration on the rotated face image step by step, which reduces the difficulty of predicting the face rotation angle and improves the accuracy of face detection and angle calibration;
(4) using a cascaded network model, the face image is processed from coarse to fine, which greatly improves face detection efficiency.
Drawings
FIG. 1 is a block flow diagram of the algorithm of the present invention.
Fig. 2 is a schematic diagram of key points of sunglasses and masks.
Fig. 3 shows the position of the sunglasses and mask (blue is sunglasses, green is mask).
FIG. 4 is a schematic diagram of occluded face synthesis.
Fig. 5 is a schematic view of a rotated image.
Fig. 6 is a first stage network architecture.
Fig. 7 is a second stage network architecture.
Fig. 8 is a third stage network structure.
FIG. 9 is a schematic diagram of an algorithm face detection process in the embodiment of the present invention.
Detailed Description
Fig. 1 is a block diagram of the overall system flow of the real-time multi-pose face detection algorithm based on the progressive calibration network according to the invention. The method comprises two main parts: construction of the SORF dataset and the network model.
I. SORF dataset construction
On the basis of the CASIA-WebFace dataset, the method adopts occluder synthesis: by localizing face keypoints and occluder keypoints (sunglasses and masks), the sunglasses or mask is composited with the face to obtain an occluded face image; meanwhile, the initial deflection angle of the face is computed from the face keypoints, and the final rotated, occluded face image is obtained by rotating a specified angle. In this way the occluded and rotated face dataset SORF (Synthetic Occluded and Rotated Faces) is constructed.
Dataset construction comprises four stages: occluder keypoint annotation, face keypoint localization, occluded-face synthesis, and image rotation.
1. Occluder keypoint annotation
Sunglasses and masks serve as the occluders composited with face images, simulating occluded faces in real scenes. First, a number of images of standalone sunglasses and masks, i.e., sunglasses and masks not worn on a face, were collected manually from the Internet, and their keypoints were annotated by hand: 10 keypoints are chosen for sunglasses and 12 for masks, as shown in fig. 2.
2. Face keypoint localization
The invention uses the Dlib face keypoint detection method to obtain 68 keypoints on each face image. A subset of these 68 keypoints is used to compute the paste positions of the sunglasses and the mask in the face image, as shown in fig. 3.
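For concreteness, a minimal sketch of this landmark-extraction step follows. It assumes the standard pre-trained Dlib model file shape_predictor_68_face_landmarks.dat and the usual 68-point index convention; the file path and the chosen landmark subsets are illustrative, not values fixed by the patent.

```python
# Hedged sketch: extracting the 68 Dlib landmarks used to anchor the occluders.
# Assumes the standard pre-trained "shape_predictor_68_face_landmarks.dat" model.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = cv2.imread("face.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

for rect in detector(gray, 1):                 # coarse face boxes
    shape = predictor(gray, rect)              # 68 landmarks for this face
    pts = [(p.x, p.y) for p in shape.parts()]
    left_eye = pts[36:42]                      # indices 36-41: left eye
    right_eye = pts[42:48]                     # indices 42-47: right eye
    lower_face = pts[1:16] + pts[31:36]        # jawline + nose base, plausible mask anchors
```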
3. Occluded-face synthesis
The invention deforms the occluder image by the thin-plate spline (TPS) method so that it matches the designated position in the target face image, and fills default values in the deformed image by nonlinear interpolation.
After the occluder is deformed, it is composited with the face image at the corresponding position as shown in formula (10):
Output = Occlusion × Opacity + Input × (1 - Opacity) (10)
wherein Occlusion is the pixel value of the occluder, Input is the pixel value of the input face image, and Opacity is the transparency of the occluder. The synthesized occluded face image is thus obtained, as shown in fig. 4.
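This synthesis step can be sketched as follows with OpenCV's TPS transformer (shipped in opencv-contrib-python). The function name paste_occluder, the point arrays, and the uniform opacity value are illustrative assumptions; the patent's own nonlinear interpolation of default values is not reproduced here.

```python
# Hedged sketch of occluded-face synthesis: TPS-warp the occluder toward the
# landmark-derived target points, then blend per formula (10):
# Output = Occlusion*Opacity + Input*(1 - Opacity).
import cv2
import numpy as np

def paste_occluder(face, occluder, occ_pts, face_pts, opacity=0.9):
    # occluder should live on a canvas the same size as `face` so the warp
    # lands in face coordinates (pad/copy it there beforehand).
    tps = cv2.createThinPlateSplineShapeTransformer()
    matches = [cv2.DMatch(i, i, 0.0) for i in range(len(occ_pts))]
    src = np.asarray(occ_pts, np.float32).reshape(1, -1, 2)
    dst = np.asarray(face_pts, np.float32).reshape(1, -1, 2)
    tps.estimateTransformation(dst, src, matches)   # map occluder pts onto face pts
    warped = tps.warpImage(occluder)
    # crude occluder mask: any non-black warped pixel, scaled by opacity
    mask = (warped.sum(axis=2, keepdims=True) > 0).astype(np.float32) * opacity
    out = warped * mask + face * (1.0 - mask)       # formula (10), per pixel
    return out.astype(np.uint8)
```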
4. Image rotation
The invention computes with the face keypoints: the coordinates of the left-eye and right-eye landmarks are averaged per eye to obtain two points representing the left and right eyes, and the angle of the line connecting the two points is taken as the initial deflection angle of the face, as shown in formula (11):
α = arctan((y_R - y_L) / (x_R - x_L)) (11)
After the deflection angle α of the face is obtained, the image is rotated by θ - α to obtain the final occluded face image at rotation angle θ, as shown in fig. 5.
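A small sketch of this rotation step is given below; the image-centre pivot and the function name are assumptions (the patent does not specify a pivot), and the sign convention may need flipping for a y-down image coordinate system.

```python
# Hedged sketch of step 4 / formula (11): average each eye's landmarks, take the
# eye-line angle alpha, then rotate by (theta - alpha) so the synthesized face
# ends up at the target in-plane angle theta.
import cv2
import numpy as np

def rotate_to_angle(img, left_eye, right_eye, theta_deg):
    xL, yL = np.mean(left_eye, axis=0)
    xR, yR = np.mean(right_eye, axis=0)
    alpha = np.degrees(np.arctan2(yR - yL, xR - xL))   # initial deflection angle
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), theta_deg - alpha, 1.0)
    return cv2.warpAffine(img, M, (w, h))
```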
II. Network model
The invention collects initial face candidate boxes on the image with a sliding window: the image is scaled to different sizes with an image pyramid while the sliding-window size stays fixed, so that regions of different scales in the original image can be collected. The boxes are then merged by non-maximum suppression (NMS): the large set of collected candidate boxes is predicted and scored, and among the candidate boxes representing the same detection target the one with the highest score is kept as the final result while the remaining non-maximal boxes are discarded.
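This candidate-collection pipeline can be sketched as below; the stride, pyramid factor, and IoU threshold are illustrative values, not parameters stated in the patent. In practice the crops from pyramid_windows would be scored by the stage networks before nms keeps one box per target.

```python
# Hedged sketch: fixed-size sliding window over an image pyramid, plus greedy
# non-maximum suppression over scored square boxes (x, y, w) in original coords.
import cv2
import numpy as np

def pyramid_windows(img, win=24, stride=8, factor=0.79):
    scale = 1.0
    while min(img.shape[:2]) >= win:
        h, w = img.shape[:2]
        for y in range(0, h - win + 1, stride):
            for x in range(0, w - win + 1, stride):
                # crop at this scale; box mapped back to original coordinates
                yield img[y:y + win, x:x + win], (x / scale, y / scale, win / scale)
        scale *= factor
        img = cv2.resize(img, (int(w * factor), int(h * factor)))

def nms(boxes, scores, iou_thr=0.3):
    boxes, scores = np.asarray(boxes), np.asarray(scores)
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        xi, yi, wi = boxes[i]
        xj, yj, wj = boxes[order[1:]].T
        ix = np.maximum(0, np.minimum(xi + wi, xj + wj) - np.maximum(xi, xj))
        iy = np.maximum(0, np.minimum(yi + wi, yj + wj) - np.maximum(yi, yj))
        iou = ix * iy / (wi * wi + wj * wj - ix * iy)
        order = order[1:][iou <= iou_thr]   # drop boxes covering the same target
    return keep
```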
The network model of the invention is divided into three stages and adopts a coarse-to-fine face detection strategy.
1) The first-stage detection network adopts a simple convolutional structure. It roughly screens the large number of candidate face boxes and simultaneously performs a first angle discrimination on them, so that all faces are calibrated toward the upright direction (angle range [-90°, 90°]). The specific network parameters are shown in Table 1 and the network structure in fig. 6.
TABLE 1 First-stage network model parameters (layer sequence as recited in claim 1)
Network type       Kernel size    Stride
Input (24*24*3)    /              /
Conv1              3*3            2
Conv2              3*3            2
Conv3              3*3            2
FC                 /              /
The input images of the first-stage network model are obtained by sliding a window over the original image; windows of different sizes are all scaled to 24*24 for processing. For each window image, the first stage has 3 detection tasks: face/non-face classification, bounding-box regression, and rotation-angle classification, as shown in formula (1):
[f, t, g] = F1(x) (1)
wherein F1 is the first-stage face detector; f is the face confidence score (the higher, the more likely the window is a face), and setting a threshold on f removes low-confidence candidate boxes, achieving the screening purpose; t is the feature vector representing the face bounding box; and g is the face-direction score, i.e., the probability that the face is upright.
1) Face/non-face classification task: the candidate boxes are classified as face or non-face. The classification task uses the softmax loss function, as shown in formula (2):
L_cls = y·log f + (1 - y)·log(1 - f) (2)
wherein y = 1 if the sample is a face and y = 0 otherwise.
2) Face bounding-box regression task: the target is to regress an accurate face bounding box (a square of equal width and height). The feature vector t of the face bounding box consists of three parameters, the top-left coordinates (a, b) and the width w of the box, computed as shown in formula (3):
t_a = (a* + 0.5·w* - a - 0.5·w) / w,  t_b = (b* + 0.5·w* - b - 0.5·w) / w,  t_w = w* / w (3)
wherein a starred variable denotes the corresponding ground-truth value.
For the regression of the bounding-box feature vector t, the loss uses the smooth L1 function, as shown in formula (4):
L_reg(t, t*) = S(t - t*),  where S(x) = 0.5·x² if |x| < 1 and S(x) = |x| - 0.5 otherwise (4)
3) Rotation-angle classification task: in the first stage, the invention coarsely classifies the face direction into two categories, face up or face down, with the loss function shown in formula (5):
L_cal = y·log g + (1 - y)·log(1 - g) (5)
wherein y = 1 if the face is up and y = 0 if it is down.
From the losses of the three tasks, the total loss of the whole first-stage detection task is obtained as shown in formula (6):
min L = L_cls + λ_reg·L_reg + λ_cal·L_cal (6)
wherein λ_reg and λ_cal are balance factors that adjust the weight of each loss term.
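A hedged PyTorch sketch of this first-stage objective follows; the head names, label conventions, and default balance-factor values are assumptions for illustration, not values given in the patent.

```python
# Hedged sketch of the first-stage multi-task loss, formula (6):
# L = L_cls + lambda_reg * L_reg + lambda_cal * L_cal.
import torch
import torch.nn.functional as F

def stage1_loss(f_logits, t_pred, g_logits, y_face, t_true, y_up,
                lam_reg=0.5, lam_cal=0.5):
    # formula (2): face / non-face softmax loss over all windows
    l_cls = F.cross_entropy(f_logits, y_face)
    pos = y_face == 1                      # box and angle terms only on face windows
    # formula (4): smooth-L1 regression of the box vector t = (t_a, t_b, t_w)
    l_reg = F.smooth_l1_loss(t_pred[pos], t_true[pos])
    # formula (5): binary up/down orientation loss
    l_cal = F.cross_entropy(g_logits[pos], y_up[pos])
    return l_cls + lam_reg * l_reg + lam_cal * l_cal   # formula (6)
```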
By iteratively optimizing this loss, the first-stage network model learns to filter out a large number of non-face windows, and the remaining windows are output as the face candidate boxes of this stage. Meanwhile, according to the face-direction score g, the face angle θ1 of this stage is determined as shown in formula (7):
θ1 = 0°, if g ≥ 0.5;  θ1 = 180°, if g < 0.5 (7)
where 0° means the face is up and 180° means it is down. A face judged to be facing down is simply rotated by 180°, so after this stage every face is calibrated upward, i.e., the range of the face rotation angle shrinks from [-180°, 180°] to [-90°, 90°].
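Applied to a single candidate crop, this stage-1 decision rule can be sketched as follows; g_up stands for the network's upright-face score and is an illustrative interface.

```python
# Hedged sketch of the stage-1 calibration, formula (7): a crop scored as
# facing down is rotated 180°, shrinking the RIP range to [-90°, 90°].
import numpy as np

def stage1_calibrate(crop, g_up):
    theta1 = 0.0 if g_up >= 0.5 else 180.0
    if theta1 == 180.0:
        crop = np.rot90(crop, 2).copy()    # lossless 180° rotation
    return crop, theta1
```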
2) Specific network parameters in the second stage are shown in table 2, and a network structure is shown in fig. 7.
TABLE 2 second stage network model parameters
Network type    Kernel size    Stride    Output size
Input           /              /         24*24*3
Conv1           3*3            1         22*22*20
MP1             3*3            2         11*11*20
Conv2           3*3            1         9*9*40
MP2             3*3            2         4*4*40
Conv3           2*2            1         3*3*70
FC              /              /         140
The face rotation angles in the second stage are divided into 3 classes. After the initial calibration of the first stage, the face rotation angle already lies within [-90°, 90°], so the second stage subdivides this range into [-90°, -45°], [-45°, 45°], and [45°, 90°]; the corresponding 3-class prediction of the face rotation angle is shown in formula (8):
θ2 = -90°, if argmax_i g_i = 0;  θ2 = 0°, if argmax_i g_i = 1;  θ2 = 90°, if argmax_i g_i = 2 (8)
wherein g_i is the class probability of the rotation angle after the softmax calculation. As in the first stage, faces in the two classes with non-zero angle are rotationally calibrated again: faces within [-90°, -45°] are rotated by 90° and faces within [45°, 90°] by -90°, so that all faces fall within the upright angle range [-45°, 45°].
3) The third stage performs the accurate face detection, bounding-box regression, and computation of the exact face deflection angle on these candidate face boxes. The specific network parameters are shown in Table 3 and the network structure in fig. 8.
TABLE 3 third stage network model parameters
Network type Size of nucleus Step size Output size
Input the method / / 48*48*3
Conv1 3*3 1 46*46*24
MP1 3*3 2 23*23*24
Conv2 3*3 1 21*21*48
MP2 3*3 2 10*10*48
Conv3 3*3 1 8*8*96
MP3 2*2 2 4*4*96
Conv4 2*2 1 3*3*192
FC / / 384
By processing the candidate face boxes with the third-stage network model, the precise predicted rotation angle θ3 of this stage is obtained; combined with the calibration angles θ1 and θ2 of the first two stages, the complete in-plane rotation angle θ_RIP of the face is computed as shown in formula (9):
θ_RIP = θ1 + θ2 + θ3 (9)
This calculation follows the idea of cascade regression: the quantities computed at each stage are added together to form the final result. The face detection flow of the algorithm is shown in fig. 9.
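To make the cascade concrete, here is a hedged end-to-end sketch of formula (9); stage1, stage2, and stage3 stand in for the three trained networks, and their interfaces are assumptions rather than the patent's API.

```python
# Hedged sketch of the progressive calibration cascade, formula (9): each stage
# rotates the crop by its predicted calibration angle, and the per-stage angles
# accumulate into the full in-plane rotation theta_RIP.
import cv2
import numpy as np

def rotate(img, deg):
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), deg, 1.0)
    return cv2.warpAffine(img, M, (w, h))

def full_rip_angle(crop, stage1, stage2, stage3):
    g_up = stage1(crop)                          # upright-face score
    theta1 = 0.0 if g_up >= 0.5 else 180.0       # formula (7)
    crop = rotate(crop, -theta1)                 # calibrate into [-90°, 90°]
    g3 = stage2(crop)                            # 3-way scores, formula (8)
    theta2 = (-90.0, 0.0, 90.0)[int(np.argmax(g3))]
    crop = rotate(crop, -theta2)                 # calibrate into [-45°, 45°]
    theta3 = stage3(crop)                        # fine residual angle
    return theta1 + theta2 + theta3              # theta_RIP, formula (9)
```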
Example 1
The datasets used by the invention include FDDB, WIDER FACE, AR, and the rotated and occluded face dataset SORF constructed by the invention.
To evaluate the quality of the constructed SORF dataset, the existing SSH algorithm was trained separately on three different training sets, WIDER FACE, CASIA-WebFace, and SORF, and the detection performance of the resulting models was compared on the SORF test set and the AR dataset; the results are shown in Table 4.
Table 4SORF data set quality assessment experiments: experiment on the effect of training set on SSH algorithm (unit:%)
The model trained on the SORF training set reaches 93.5% detection accuracy on the test set, improvements of 25.3 and 17.9 percentage points over the other two configurations, showing that using the SORF dataset effectively improves the robustness of the algorithm to rotated and occluded faces.
On the constructed SORF dataset, multiple groups of comparison experiments were run between the proposed algorithm and mainstream face detectors such as MTCNN, SSH, and PCN. Each detector's model was trained on the training split of the SORF dataset; the SORF test and training splits do not overlap. The experiments measured multiple accuracy/recall operating points by adjusting the detection threshold and computed the corresponding F1 scores; the results are shown in Table 5.
TABLE 5 SORF data set comparative experimental results (unit:%)
As the data in the table show, the proposed algorithm performs significantly better on the SORF dataset than the other three algorithms, with an F1 score 2 to 3 percentage points higher than that of the runner-up PCN algorithm. Its detection of rotated and occluded face images improves markedly over the other algorithms, consistent with strong robustness.
The proposed algorithm was also compared with other face detection algorithms on the two currently mainstream face detection datasets, FDDB and WIDER FACE, to verify that its effectiveness and robustness carry over to public datasets. Results on FDDB are shown in Table 6 and results on WIDER FACE in Table 7.
The FDDB protocol evaluates a detector by its recall at a fixed number of false positives (FP, the number of non-faces in the returned detections). The WIDER FACE dataset comprises Easy, Medium, and Hard test subsets of successively increasing difficulty.
TABLE 6 FDDB data set comparison of experimental results (unit:%)
TABLE 7 WIDER FACE data set comparison of experimental results (unit: %)
Algorithm Easy Medium Hard
MTCNN 84.8 82.5 59.8
SSH 93.1 92.1 84.5
PCN 87.2 84.6 72.1
The invention 90.8 88.6 80.5
The proposed algorithm is slightly behind SSH, the best-performing detector, but the gap is small, essentially within 4%. Although the algorithm does not reach the best results on the public datasets, its performance remains satisfactory and close to the mainstream detectors. Together with the preceding experiments, this shows that the algorithm improves markedly on rotated and occluded faces while still generalizing to the face images in public datasets.
A detection-speed comparison of the different face detection algorithms was run on 3 image datasets of different sizes; the results are shown in Table 8.
TABLE 8 test speed comparison test results
The proposed algorithm builds on the PCN algorithm and adopts tighter angle-detection ranges than PCN, so its detection speed drops slightly, yet it remains clearly faster than the other algorithms, demonstrating the real-time capability of the detector.
To find the best choice of training set and model configuration and thus improve the face detection model in practical use, ablation experiments examine the influence of the training-set composition ratio and of the first- and second-stage angle-calibration ranges on the results.
SORF and CASIA-WebFace are mixed as the training set, and experiments are run with different proportions of the two; the face detection accuracy of each trained model is then tested on both the SORF and CASIA-WebFace datasets. The results are shown in Table 9.
TABLE 9 training set constitutes the results of the ratio experiments (unit:%)
As Table 9 shows, although the model trained on the SORF dataset alone achieves good detection results on the unoccluded task, mixing the SORF dataset with the unoccluded original dataset achieves excellent results on both the occluded and unoccluded tasks. Comparing the influence of the mixing ratio, the invention uses SORF and CASIA-WebFace data at 1:1 in the training set: this slightly lowers detection accuracy on occluded face images but greatly improves detection of unoccluded faces.
To examine the influence of the first- and second-stage calibration angle ranges on the detection results: in Table 10, the angle range denotes the range of face rotation angles to be processed at that stage, and the calibration-interval division denotes how that range is split into sub-intervals and the angular width of each. As shown in Table 10, 4 groups of comparison experiments were run. Taking the first group as an example: a first-stage angle range of 360° means that face images anywhere in the full 360° range are calibrated in the first stage, with two 180° calibration intervals, face up and face down; a second-stage angle range of 180° means that face images within a 180° range are calibrated in that stage, with three intervals, 45° facing left, 45° facing right, and 90° upright, after which the face is calibrated into the upright angle range.
TABLE 10 Angle calibration Range experiment
As the data in the table show, dividing the first-stage range more finely improves accuracy to some extent but also lowers detection speed. The accuracy of the fourth group is clearly higher than that of the first three. The fourth group differs from the first in that, in the first group, the angle range calibrated by the second stage matches the range remaining after first-stage calibration, whereas in the fourth group the range after first-stage calibration is 90° yet the second stage still calibrates a 180° range. Faces misjudged in the first stage because they lie at the boundary between two intervals can thus be detected a second time, which amounts to a one-shot error correction and slightly improves accuracy. Although the FPS of the fourth group drops slightly, it still meets the real-time requirement, and in practice the higher accuracy clearly matters more. The algorithm therefore uses this group of parameters in the actual model.

Claims (5)

1. A real-time multi-pose face detection algorithm based on a progressive calibration type network is characterized by comprising the following specific steps:
(1) on the basis of the CASIA-WebFace dataset, performing occluder annotation and face keypoint localization, then occluded-face synthesis and image rotation in sequence, and constructing the occluded and rotated face dataset SORF;
(2) processing the images in the SORF dataset obtained in step (1): detecting the resulting face candidate boxes in three coarse-to-fine stages, calibrating the face angle within a certain range at each stage, and performing the final detection and precise angle judgment on the calibrated face image in the last stage; specifically:
first) in the first stage, roughly screening a large number of candidate face boxes through a first-stage network model consisting of an input layer, a first convolutional layer with convolution kernel size 3 × 3 and stride 2, a second convolutional layer with convolution kernel size 3 × 3 and stride 2, a third convolutional layer with convolution kernel size 3 × 3 and stride 2, and a fully connected layer, while performing a first angle discrimination on the face candidate boxes, so that all faces are calibrated toward the upright direction, i.e., into the angle range [-90°, 90°];
the input image of the first stage network model is obtained by using a sliding window mode for an original image, and for each image obtained by the sliding window, there are 3 detection tasks in the first stage, namely face/non-face classification, bounding box regression and rotation angle classification, as shown in formula (1):
[f,t,g]=F1(x) (1)
wherein F1 is the first-stage face detector, f is a face confidence score (candidate face boxes with low confidence can be removed by setting a threshold on f), t is the feature vector representing the face bounding box, and g is the face-direction score, i.e., the probability that the face is upright;
1) face/non-face classification task: performing face/non-face prediction classification on the candidate boxes, with the softmax loss function used as the loss of the classification task, as shown in formula (2):
L_cls = y·log f + (1 - y)·log(1 - f) (2)
wherein y = 1 for a face sample and y = 0 otherwise;
2) face bounding-box regression task: the target is to regress an accurate face bounding box, namely a square of equal width and height; the feature vector t of the bounding box consists of three parameters, the top-left coordinates (a, b) and the width w of the box, computed as shown in formula (3):
t_a = (a* + 0.5·w* - a - 0.5·w) / w,  t_b = (b* + 0.5·w* - b - 0.5·w) / w,  t_w = w* / w (3)
wherein a starred variable denotes the corresponding ground-truth value;
for the regression task of the feature vector t of the bounding box, the loss function uses the smooth L1 loss function, as shown in formula (4):
L_reg(t, t*) = S(t - t*),  where S(x) = 0.5·x² if |x| < 1 and S(x) = |x| - 0.5 otherwise (4)
3) rotation-angle classification task: in the first stage, the face direction is simply classified into two categories, face up or face down, with the loss function shown in formula (5):
L_cal = y·log g + (1 - y)·log(1 - g) (5)
wherein y = 1 if the face is up and y = 0 if it is down;
the total loss function of the whole first-stage detection task, obtained from the loss functions of the three tasks, is shown in formula (6):
min L = L_cls + λ_reg·L_reg + λ_cal·L_cal (6)
wherein λ_reg and λ_cal are balance factors used to adjust the weight of each loss term;
by iteratively optimizing the loss function, the first-stage network model can filter out a large number of non-face windows, and the remaining windows are output as the face candidate boxes of this stage; meanwhile, according to the face-direction score g, the face angle θ1 of this stage is determined as shown in formula (7):
θ1 = 0°, if g ≥ 0.5;  θ1 = 180°, if g < 0.5 (7)
0 degrees represents that the face faces upwards, and 180 degrees represents that the face faces downwards;
through a first stage, the face judged to be downward is rotated by 180 degrees, so that the face is calibrated to be upward, namely the range of the rotation angle of the face is reduced from [ -180 degrees, 180 degrees ] to [ -90 degrees, 90 degrees ];
second) in the second stage, the second-stage network model consists of an input layer, a first convolutional layer with convolution kernel size 3 × 3 and stride 1, a first pooling layer with kernel size 3 × 3 and stride 2, a second convolutional layer with convolution kernel size 3 × 3 and stride 1, a second pooling layer with kernel size 3 × 3 and stride 2, a third convolutional layer with convolution kernel size 2 × 2 and stride 1, and a fully connected layer;
the face rotation angles in the second stage are divided into 3 classes, namely the three ranges [-90°, -45°], [-45°, 45°], and [45°, 90°], and the corresponding 3-class prediction of the face rotation angle is shown in formula (8):
θ2 = -90°, if argmax_i g_i = 0;  θ2 = 0°, if argmax_i g_i = 1;  θ2 = 90°, if argmax_i g_i = 2 (8)
wherein g_i is the class probability of the rotation angle after the softmax calculation;
after the second stage, for the two classes whose rotation angle is not 0, the faces are rotationally calibrated again: faces within [-90°, -45°] are rotated by 90° and faces within [45°, 90°] by -90°, so that all faces lie within the upright angle range [-45°, 45°];
and third) in the third stage, a third-stage network model consisting of an input layer, a first convolutional layer with convolution kernel size 3 × 3 and stride 1, a first pooling layer with kernel size 3 × 3 and stride 2, a second convolutional layer with convolution kernel size 3 × 3 and stride 1, a second pooling layer with kernel size 3 × 3 and stride 2, a third convolutional layer with convolution kernel size 2 × 2 and stride 1, a third pooling layer with kernel size 2 × 2 and stride 2, a fourth convolutional layer with convolution kernel size 2 × 2 and stride 1, and a fully connected layer performs the accurate face detection, bounding-box regression, and computation of the exact face deflection angle on the candidate face boxes; processing the candidate face boxes with the third-stage network model yields the precise predicted rotation angle θ3 of this stage, which is combined with the calibration angles θ1 and θ2 of the first two stages to compute the complete in-plane rotation angle θ_RIP of the face, as shown in formula (9):
θ_RIP = θ1 + θ2 + θ3 (9).
2. The real-time multi-pose face detection algorithm according to claim 1, wherein in step (1), the face keypoints are obtained on the face image by the Dlib face keypoint detection method.
3. The real-time multi-pose face detection algorithm according to claim 1, characterized in that in step (1), when the occlusion object is synthesized with the face, a thin plate spline deformation TPS method is adopted to deform the occlusion object image so as to make it coincide with the designated position in the face image to be mapped; filling default values in the deformed image by using a nonlinear interpolation method; after the barrier is deformed, the image synthesis of the position corresponding to the face image is shown as the formula (10):
Output=Occlusion×Opacity+Input×(1-Opacity) (10)
wherein Occlusion is the pixel value of the obstruction, Input is the pixel value of the Input face image, and Opacity is the transparency of the obstruction.
4. The real-time multi-pose face detection algorithm according to claim 1, wherein in step (1), the initial deflection angle α of the face is computed using the coordinates of the left and right eyes, and a rotated, occluded face image is then obtained by rotating a specified angle; wherein the eye-landmark coordinates (x_i, y_i) are averaged per eye to obtain two points (x_L, y_L) and (x_R, y_R) representing the left and right eyes, and the angle of the line connecting the two points is taken as the initial deflection angle α of the face, as shown in formula (11):
α = arctan((y_R - y_L) / (x_R - x_L)) (11).
5. The real-time multi-pose face detection algorithm according to claim 1, wherein in step (2), the method for processing the images in the SORF dataset obtained in step (1) is as follows: acquiring initial face candidate boxes by a sliding-window candidate-box collection mode; scaling the image with an image pyramid while keeping the sliding-window size fixed, so that regions of different scales in the original image are collected; and merging by non-maximum suppression (NMS): the large set of collected candidate boxes is predicted and scored, and for each detection target the candidate box with the highest score is selected as the final result.
CN202010471082.3A 2020-05-28 2020-05-28 Real-time multi-pose face detection algorithm based on progressive calibration type network Active CN111739070B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010471082.3A CN111739070B (en) 2020-05-28 2020-05-28 Real-time multi-pose face detection algorithm based on progressive calibration type network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010471082.3A CN111739070B (en) 2020-05-28 2020-05-28 Real-time multi-pose face detection algorithm based on progressive calibration type network

Publications (2)

Publication Number Publication Date
CN111739070A CN111739070A (en) 2020-10-02
CN111739070B true CN111739070B (en) 2022-07-22

Family

ID=72646742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010471082.3A Active CN111739070B (en) 2020-05-28 2020-05-28 Real-time multi-pose face detection algorithm based on progressive calibration type network

Country Status (1)

Country Link
CN (1) CN111739070B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418195B (en) * 2021-01-22 2021-04-09 电子科技大学中山学院 Face key point detection method and device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145867A (en) * 2017-05-09 2017-09-08 电子科技大学 Face and facial-occluder detection method based on multi-task deep learning
CN107748858A (en) * 2017-06-15 2018-03-02 华南理工大学 A multi-pose eye localization method based on cascaded convolutional neural networks
CN107886074A (en) * 2017-11-13 2018-04-06 苏州科达科技股份有限公司 A face detection method and face detection system
CN108805040A (en) * 2018-05-24 2018-11-13 复旦大学 A block-based occluded face recognition algorithm
CN110263774A (en) * 2019-08-19 2019-09-20 珠海亿智电子科技有限公司 A face detection method
CN110458005A (en) * 2019-07-02 2019-11-15 重庆邮电大学 A multi-task progressive-calibration-network-based rotation-invariant face detection method
CN110580445A (en) * 2019-07-12 2019-12-17 西北工业大学 A face keypoint detection method based on GIoU and improved weighted NMS

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145867A (en) * 2017-05-09 2017-09-08 电子科技大学 Face and facial-occluder detection method based on multi-task deep learning
CN107748858A (en) * 2017-06-15 2018-03-02 华南理工大学 A multi-pose eye localization method based on cascaded convolutional neural networks
CN107886074A (en) * 2017-11-13 2018-04-06 苏州科达科技股份有限公司 A face detection method and face detection system
CN108805040A (en) * 2018-05-24 2018-11-13 复旦大学 A block-based occluded face recognition algorithm
CN110458005A (en) * 2019-07-02 2019-11-15 重庆邮电大学 A multi-task progressive-calibration-network-based rotation-invariant face detection method
CN110580445A (en) * 2019-07-12 2019-12-17 西北工业大学 A face keypoint detection method based on GIoU and improved weighted NMS
CN110263774A (en) * 2019-08-19 2019-09-20 珠海亿智电子科技有限公司 A face detection method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Key Frame Extraction with Face Biometric Features in Multi-shot Human Re-identification System; Agus Gunawan et al.; IEEE; 2020-02-06; entire document *
Rotation-invariant face detection with cascaded networks and pyramid optical flow; Sun Rui et al.; 光电工程 (Opto-Electronic Engineering); 2020-01-15; Vol. 47, No. 1; entire document *

Also Published As

Publication number Publication date
CN111739070A (en) 2020-10-02

Similar Documents

Publication Publication Date Title
CN111414887B (en) Secondary detection mask face recognition method based on YOLOV3 algorithm
CN103927520B (en) A kind of backlight environment servant's face detecting method
JP6549797B2 (en) Method and system for identifying head of passerby
CN111967393A (en) Helmet wearing detection method based on improved YOLOv4
CN104036236B (en) A kind of face gender identification method based on multiparameter exponential weighting
CN105678811A (en) Motion-detection-based human body abnormal behavior detection method
CN106874894A (en) A kind of human body target detection method based on the full convolutional neural networks in region
CN105160297B (en) Masked man's event automatic detection method based on features of skin colors
CN111209892A (en) Crowd density and quantity estimation method based on convolutional neural network
CN108596087B (en) Driving fatigue degree detection regression model based on double-network result
CN106778633B (en) Pedestrian identification method based on region segmentation
CN105893946A (en) Front face image detection method
CN107273799A (en) A kind of indoor orientation method and alignment system
CN112364778A (en) Power plant safety behavior information automatic detection method based on deep learning
CN108074234A (en) A kind of large space flame detecting method based on target following and multiple features fusion
CN111882586A (en) Multi-actor target tracking method oriented to theater environment
CN112183472A (en) Method for detecting whether test field personnel wear work clothes or not based on improved RetinaNet
CN111739070B (en) Real-time multi-pose face detection algorithm based on progressive calibration type network
CN113111722A (en) Automatic driving target identification method based on improved Mask R-CNN
CN104598914A (en) Skin color detecting method and device
CN103607558A (en) Video monitoring system, target matching method and apparatus thereof
CN112613359B (en) Construction method of neural network for detecting abnormal behaviors of personnel
CN106815567A (en) A kind of flame detecting method and device based on video
CN113361370A (en) Abnormal behavior detection method based on deep learning
CN103077403B (en) pedestrian counting method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant