CN110458005A

CN110458005A - It is a kind of based on the progressive invariable rotary method for detecting human face with pseudo-crystalline lattice of multitask

Info

Publication number: CN110458005A
Application number: CN201910590187.8A
Authority: CN
Inventors: 周丽芳; 谷雨; 雷帮军; 李伟生
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Chongqing University of Post and Telecommunications
Priority date: 2019-07-02
Filing date: 2019-07-02
Publication date: 2019-11-15
Anticipated expiration: 2039-07-02
Also published as: CN110458005B

Abstract

The invention discloses a kind of based on the progressive invariable rotary method for detecting human face with pseudo-crystalline lattice of multitask, belongs to computer vision field.The method mainly comprises the steps that image preprocessing, constructs and trains cascade multilayer convolutional neural networks；Input test image generates the image collection of different resolution using the mode of image pyramid, is then fed into the cascade multilayer convolutional neural networks and starts to detect；Every level-one network filtering falls the non-face window in part, adjusts candidate frame position according to frame regression result, while predicting the rotation angle of face；Then it is registrated according to the rotation angle predicted by flipped image operation.In the present invention, it by the progressive registration network method of multitask, realizes Face datection real-time, that rotation is adaptive, achieves good effect in accuracy and speed.

Description

Rotation-invariant face detection method based on multitask progressive registration network

Technical Field

The invention belongs to the technical field of image processing, and particularly relates to a rotation invariant face detection method based on a convolutional neural network.

Background

The image containing the face is indispensable to human-computer interaction based on intelligent vision, and the face detection provides abundant visual information for the intelligent analysis of the target, and can be used for identifying an interested object in the image. Meanwhile, the research on human face detection also becomes a fundamental problem which is difficult to avoid in the fields of image processing, computer vision and pattern recognition, and is widely concerned by researchers. The progress made in face detection plays an important supporting role in many problems in the fields of computer vision and pattern recognition, such as face recognition, video tracking, head pose estimation, gender recognition, and the like.

Research on human face detection of human targets by computer vision means has been developed for decades, but the performance of many face detection algorithms is not sufficient to meet the requirements in practical applications. Compared with a controlled environment, the human face in a real scene has different appearances in the picture: the human face is basically in a right-side-up state under the controlled environment, and only the head has slight geometric deformation; the human face pose under the real scene is more complex, and the biggest characteristic is that an uncertain plane rotation angle exists between a human face target and imaging equipment. An important disadvantage of the existing typical DCNN face detection network is that the robustness of the existing DCNN face detection network to image rotation change, scale change and the like is poor.

Disclosure of Invention

In view of the above disadvantages in the prior art, the present invention provides a face detection method that is robust to changes in rotation angles in a plane.

The technical scheme adopted by the invention for realizing the aim is as follows: a rotation invariant face detection method based on a multitask progressive registration network comprises the following steps:

s1, preprocessing the image, and constructing and training a cascaded multilayer convolutional neural network;

s2, inputting a test image, generating an image set with different resolutions in an image pyramid mode, and sending the image set into the cascaded multilayer convolutional neural network for starting detection;

s3, filtering partial non-face windows by each layer of convolutional neural network, adjusting the positions of candidate frames according to the frame regression result, and predicting the rotation angle of the face;

and S4, registering through image overturning operation according to the predicted rotation angle, and judging the registered image as a human face image.

Further, the image preprocessing comprises:

a1, rotating the WIDER FACE data set image according to any angle to generate a large number of face images containing rotation angle changes in a plane, and correspondingly rotating and changing face position information;

and A2, randomly rotating the LFW data set image according to any angle to generate a large number of face images containing the change of the rotation angle in the plane, and correspondingly rotating and changing the position information of the key points of the face.

Further, the cascaded multilayer convolutional neural network adopts a three-layer cascaded structure, the first layer comprises 4 convolutional layers and 1 maximum pooling layer, the second layer comprises 3 convolutional layers, 2 maximum pooling layers and 2 full-connection layers, and the third layer comprises 4 convolutional layers, 3 maximum pooling layers and 2 full-connection layers.

The invention has the following advantages and beneficial effects:

the invention mainly aims at the defect that the existing popular face detection method based on the deep convolutional neural network lacks robustness on image rotation change, and designs a rotation-invariant face detection method based on a multitask progressive registration network. In the actual scene, the situation that the face area cannot be detected due to the fact that the face target and the imaging device have uncertain plane rotation angles may occur. And the knowledge is transferred among the face detection task, the face key point detection task and the angle registration task by adopting a multi-task learning mode, so that the effective learning among all related tasks is realized, and the practical face detector with high efficiency, strong discriminative power and high robustness is obtained. In addition, the in-plane face image rotation angle information contained in the position coordinates of the key points of the face is fully considered, and in order to improve the robustness of key point detection when the face angle changes, the regression loss function of the key points of the face is redefined, and the tolerance capability of the algorithm to the in-plane rotation angle change is effectively improved. The method obtains better detection effect.

Drawings

Fig. 1 is a flow chart of implementation of rotation-invariant face detection provided in an embodiment of the present invention;

FIG. 2 is a diagram of a first stage network structure of a multi-cascaded convolutional neural network provided in an embodiment of the present invention;

FIG. 3 is a diagram of a second level network structure of a multi-cascaded convolutional neural network provided by an embodiment of the present invention;

FIG. 4 is a diagram of a third-level network structure of a multi-cascaded convolutional neural network provided in an embodiment of the present invention;

FIG. 5 is a diagram illustrating the effect of rotation invariant face detection provided by an embodiment of the present invention;

fig. 6 is a flowchart illustrating a specific implementation of the method S4 for rotation-invariant face detection according to an embodiment of the present invention;

fig. 7 is a tag position display diagram of feature points on a face image.

Detailed Description

The embodiment of the invention is realized based on cascaded multilayer convolutional neural networks, the image to be detected passes through the multilayer convolutional neural networks at all levels at one time, and each multilayer convolutional neural network executes the tasks of face classification, face candidate frame regression, face key point detection and angle identification. And finally, registering through image overturning operation according to the predicted rotation angle, and judging the registered image as a human face image.

In order to explain the technical solution of the present invention, the following description is made with reference to the accompanying drawings and specific examples.

Fig. 1 shows an implementation process of rotation-invariant face detection provided in an embodiment of the present invention, which is detailed as follows:

s1, constructing and training a cascaded multilayer convolutional neural network;

s2, inputting a test image, generating an image set with different resolutions in an image pyramid mode, and sending the image set to the cascaded multilayer convolutional neural network for starting detection;

s3, filtering partial non-face windows in each level of network, adjusting the positions of candidate frames according to the frame regression result, and predicting the rotation angle of the face;

and S4, performing registration through image overturning operation according to the predicted rotation angle.

The cascaded multilayer convolutional neural network adopts a three-layer cascaded architecture design, each stage is composed of a shallow convolutional neural network, the human face detection, the angle recognition and the key point positioning tasks are completed simultaneously, and good effects are achieved in the aspects of speed and precision.

Further, step S1 is to construct and train a multitask convolutional neural network based on a three-layer cascade architecture by using the relevance among multitasks and combining with face detection, angle recognition and key point positioning tasks, and the specific implementation steps are as follows:

the network structure diagrams of the first-level network, the second-level network and the third-level network are shown in fig. 2, fig. 3 and fig. 4, respectively. The rotation invariant face detection is decomposed into a face/non-face binary classification problem, a face angle identification problem and a face candidate frame regression problem, namely whether an input image is a face or not is judged, and an output result of a detection frame is enabled to be infinitely close to a true value of the input image. Specifically, the method comprises the following steps:

A. as shown in fig. 2, the network structure of the first-level network is, from top to bottom: the first layer, convolutional layer, with convolutional kernel size of 3 × 3 and convolutional kernel number of 16; the second layer is a maximum pooling layer, and the pooling interval is 2 multiplied by 2; the third layer, convolution kernel size is 3 x 3, convolution kernel number is 32; the fourth layer, convolution kernel size is 3 x 3, convolution kernel number is 64; the fifth layer is divided into four sublayers which are respectively connected with the fourth layer in series, the four sublayers are convolution layers, the convolution kernel is 1 multiplied by 1, and the used supervision information is respectively as follows: face and non-face two-classification information, face position information and face key point position information;

B. as shown in fig. 3, the network structure of the second-level network is, from top to bottom: the first layer, convolutional layer, with convolutional kernel size of 3 × 3 and convolutional kernel number of 24; the second layer is a maximum pooling layer, and the pooling interval is 3 multiplied by 3; the third layer, convolution kernel size is 3 x 3, convolution kernel number is 48; the fourth layer is a maximum pooling layer, and the pooling interval is 3 multiplied by 3; the fifth layer, convolution kernel size is 2 x 2, convolution kernel number is 96; the sixth layer, the full junction layer, the number of neurons is 196; the seventh layer is divided into four sublayers which are respectively connected with the sixth layer in series, the four sublayers are all full-connection layers, and the used supervision information is respectively as follows: face and non-face two-classification information, face position information and face key point position information;

C. as shown in fig. 4, the network structure of the third-level network is, from top to bottom: the first layer, convolutional layer, with convolutional kernel size of 3 × 3 and convolutional kernel number of 24; the second layer is a maximum pooling layer, and the pooling interval is 3 multiplied by 3; the third layer, convolution kernel size is 3 x 3, convolution kernel number is 48; the fourth layer is a maximum pooling layer, and the pooling interval is 3 multiplied by 3; the fifth layer, convolution kernel size is 2 x 2, convolution kernel number is 96; the sixth layer is a maximum pooling layer, and the pooling interval is 2 multiplied by 2; a seventh layer, convolutional layer, with convolutional kernel size of 2 × 2 and number of convolutional kernels of 192; the eighth layer is a full connection layer, and the number of the neurons is 254; the ninth layer is divided into three sublayers which are respectively connected with the eighth layer in series, the four sublayers are all connected layers, and the used supervision information is respectively as follows: face and non-face two-classification information, face position information and face key point position information;

D. in the testing stage, the first-level network and the second-level network only output judgment results f of the human face and the non-human face, the displacement t of the human face candidate frame and the human face direction g, and the third-level network only outputs the judgment results f of the human face and the non-human face, the displacement t of the human face candidate frame and the human face key point position p;

E. when the convolutional neural network is trained, a random gradient descent algorithm is used, a cross entropy function is used for calculating the loss of a face/non-face two-classification task, and the loss is shown in a calculation formula (1):

L_cls＝ylog f+(1-y)log(1-f) (1)

where y represents the true face classification result.

Likewise, the loss of the angle identification task is calculated using a cross entropy function, as shown in equation (2):

L_cal＝xlog g+(1-x)log(1-g) (2)

where x represents the true angle classification result.

The regression task of the face candidate frame uses an Euclidean distance function, and the calculation formula is as follows:

wherein,and representing the coordinate value of the real face position.

Finally, the rotation angle information of the face image in the plane, which is contained in the position coordinates of the key points of the face, is fully considered, the robustness of key point detection in the method is improved when the face angle changes, so that the regression loss function of the key points of the face is redefined, and the calculation formula is as follows:

n in the formula is the total number of training samples participating in a human face key point task, d is the Euclidean distance between a prediction point and a real point, and theta is the rotation angle value of the sample and meets the condition that theta belongs to [ -45 degrees and 45 degrees ].

F. In the present example, the public face data set WIDER FACE and LFW are used as the training set. WIDER FACE contains 32203 images and 393703 face detection frame position markers. Of these, 50% of the face data was used to train the face classification and candidate box regression tasks, 40% was used as the test set, and the remaining 10% was used as the verification set. The LFW dataset is used to train the angle recognition task person and face alignment task.

Further, step S2 is to input the image into the cascaded multilayer convolutional neural network, and output and generate a face candidate frame displacement, a candidate frame score, a keypoint position, and a face rotation angle, and the specific implementation steps are as follows:

A. the image to be tested is firstly scaled to generate an image pyramid. The input to the first level network is 12 × 12 × 3, where 3 represents an input image with 3 color channels, i.e., an RGB image. The output of the input image generated by the first-level network is the face candidate frame displacement t, the candidate frame score f and the face direction g. At the moment, the face angle recognition task is regarded as a two-classification task, namely the face direction is upward and the face direction is downward, and the two classification tasks are respectively marked as 1 and 0;

training labeled values of samples used in first-layer network angle recognitionTheta is the rotation angle value of the sample;

let the label value f₁Samples of 0 and 1 participate in training the first layer network angle identification.

B. The input of the second-level network is 24 multiplied by 3, and the output of the input image generated by the second-level network is the displacement t of the face candidate frame, the score f of the candidate frame and the face direction g; at the moment, the face angle recognition task is regarded as a three-classification task, namely that the face direction faces upwards, the face direction faces left and the face direction faces right, and the three-classification task is respectively marked as 0, 1 and 2;

training labeled values of samples used in angle recognition of layer two networkTheta is the rotation angle value of the sample;

let the label value f₂Samples of 0, 1 and 2 participate in training the second-tier network angle recognition task.

Further, in step S3, the rotation angle of the human face output by the network is registered by flipping the image, and the specific implementation steps are as follows:

A. in step S2, after the image to be measured passes through the first-level network, a face direction score g is generated, and the calculation formula of the rotation angle in the corresponding plane isWherein 0 ° represents face up, and 180 ° represents face down.

B. When the rotation angle p is 0 degrees, the image is not turned over; when the rotation angle p is 180 degrees, the image is turned over for 180 degrees; at this time, the range of the face rotation angle in the plane is narrowed from [ -180 °,180 ° ] to [ -90 °,90 ° ]. The image turning operation is simple, the calculation cost is low, and efficient and rapid face image registration in a plane can be realized;

C. in step S2, after the image to be detected passes through the second-level network, a face direction score g is generated, and the face direction score is converted into a direction label according to formula (5):

id＝argmax g_i,i∈[0,1,2] (5)

wherein id represents a directional tag, g₀，g₁，g₂Respectively representing the scores of the directions of the human face towards the left, the upper and the right.

The calculation formula of the rotation angle in the corresponding plane isWherein 0 ° represents the front face facing up, 90 ° represents the front face facing left, and-90 ° represents the front face facing right.

D. When the rotation angle p is 0 degrees, the image is not turned over; when the rotation angle p is 90 degrees, the image rotates rightwards by 90 degrees; when the rotation angle p is-90 deg., the image is rotated 90 deg. to the left. At this time, the range of the face rotation angle in plane is narrowed from [ -90 °,90 ° ] to [ -45 °,45 ° ].

Further, step S2 inputs the image into the last-stage multi-task convolutional neural network, and outputs a generated face candidate frame position t, a candidate frame score f, and a keypoint position p, which includes the following steps:

A. the input of the last level network is 48 multiplied by 3, and the output of the input image generated by the third level network is the face candidate frame displacement t, the candidate frame score f and the face key point position p, which is different from the first level network and the second level network.

B. The purpose of the image to be measured passing through the first stage network is to rapidly generate candidate windows using a full convolution network to predict the image rotation angle in a relatively coarse manner. The purpose of the second-stage network is to continuously refine the candidate window generated in the first stage by using a complex convolutional neural network, discard a large number of overlapping windows and predict the image rotation angle.

Further, step S4 calculates a rotation angle by using the geometric information between the position of the candidate frame of the human face and the key points of the human face output in step S2, and performs registration by flipping the image to obtain a detected human face image, which includes the following specific implementation steps:

A. as shown in fig. 6(a), the image to be measured is subjected to face detection and key point positioning through a third-level network, the output image of the third-level network is determined to be a face image, and the range of the rotation angle of the face in the plane is [ -45 °,45 ° ].

B. As shown in fig. 6(b), the distance from the human eye to the top of the head is known to be closer than the distance from other key points to the top of the head. Based on this a priori knowledge, the forward direction of the face detection box is first determined by calculating the distances between the left and right eyes to the four bounding boxes.

C. As shown in fig. 7, in a standard face-up image, the line between the left eye and the tip of the nose forms an angle α equal to the angle β formed by the line between the tip of the nose and the top of the head. As shown in fig. 6(d), the geometric relationship between the face detection frame and the key point is used to calculate the face image rotation angle as θ ═ α - β ÷ 2.

The rotation-invariant face detection effect provided by the embodiment of the invention is shown in fig. 5.

Claims

1. A rotation invariant face detection method based on a multitask progressive registration network is characterized by comprising the following steps:

2. The rotation-invariant face detection method based on the multitask progressive registration network according to claim 1, wherein the rotation-invariant face detection method comprises the following steps: the image preprocessing comprises:

a1, randomly rotating the face image to any angle to generate a large number of face images containing the change of the rotation angle in the plane, and correspondingly rotating and changing the face position information;

and A2, randomly rotating the face key point images to any angle to generate a large number of face key point images containing rotation angle changes in a plane, and correspondingly rotating and changing the position information of the face key point.

3. The rotation-invariant face detection method based on the multitask progressive registration network according to claim 1 or 2, characterized in that: the cascaded multilayer convolutional neural network adopts a three-layer cascaded structure, wherein the first layer comprises 4 convolutional layers and 1 maximum pooling layer, the second layer comprises 3 convolutional layers, 2 maximum pooling layers and 2 full-connection layers, and the third layer comprises 4 convolutional layers, 3 maximum pooling layers and 2 full-connection layers.

4. The rotation-invariant face detection method based on the multitask progressive registration network according to claim 3, wherein the rotation-invariant face detection method comprises the following steps: in the three-layer cascaded multilayer convolutional neural network,

let the label value f₁The samples of 0 and 1 participate in training the first layer of network angle recognition;

5. The rotation-invariant face detection method based on the multitask progressive registration network according to claim 1, 2 or 4, wherein: the frame regression result is embodied through the regression loss of the key points of the human face;

face key point regression loss isD in the formula is the Euclidean distance between the predicted point and the real point, theta is the rotation angle value of the sample and satisfies theta epsilon [ -45 DEG, 45 DEG]。