CN109409303A - Cascaded multi-task face detection and registration method based on deep learning - Google Patents
Cascaded multi-task face detection and registration method based on deep learning
- Publication number
- CN109409303A (application CN201811287109.2A)
- Authority
- CN
- China
- Prior art keywords
- face
- Net network
- image size
- registration
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
- G06T7/344—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Abstract
The present invention provides a cascaded multi-task face detection and registration method based on deep learning, comprising the following steps. Step 1: adjust the picture size to form an image pyramid. Step 2: feed the images of different sizes obtained in step 1 into a P-Net network, which predicts face candidate boxes in the original image. Step 3: resize the image inside every face box obtained in step 2 to n × n and input it into an R-Net network, which makes a face/non-face judgement and performs bounding-box regression on the face boxes. Step 4: resize the image inside every face box obtained in step 3 to m × m and input it into an O-Net network, which makes a face/non-face judgement, performs bounding-box regression, and simultaneously outputs the coordinates of the left and right eye centres, the nose tip, and the left and right mouth corners. Step 5: resize the image inside every face box obtained in step 4 to k × k and input it into an L-Net network, finally obtaining the three-dimensional head-pose angles of the original image and the positions of several feature points.
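The five steps above form a coarse-to-fine cascade. A minimal Python sketch of the control flow follows; the `p_net`, `r_net`, `o_net` and `l_net` callables stand in for the four trained networks, and every name here is illustrative rather than taken from the patent:

```python
def cascade_detect(image, p_net, r_net, o_net, l_net, build_pyramid, crop_resize):
    """Coarse-to-fine cascade: each stage consumes the previous stage's boxes."""
    pyramid = build_pyramid(image)                  # step 1: multi-scale copies
    boxes = p_net(pyramid)                          # step 2: coarse candidate boxes
    boxes = r_net(crop_resize(image, boxes))        # step 3: reject non-faces, refine boxes
    boxes, pts5 = o_net(crop_resize(image, boxes))  # step 4: refine + 5 landmarks
    pose, pts68 = l_net(crop_resize(image, boxes))  # step 5: head pose + 68 landmarks
    return boxes, pose, pts68
```

Each later, heavier network only ever sees the crops that survived the previous stage, which is what keeps the cascade cheap to run.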
Description
Technical field
The invention belongs to the technical field of face detection, and in particular relates to a cascaded multi-task face detection and registration method based on deep learning.
Background technique
Face detection and registration are vital to many face applications, such as face editing, face recognition and facial expression analysis. In the real world, however, factors such as illumination, scale and pose variation make face detection and registration difficult.
In face detection, a classical approach is the Viola-Jones (VJ) face detector proposed by Paul Viola and Michael Jones. It introduced the integral image to compute Haar-like features quickly, and used the AdaBoost learning algorithm for feature selection and classifier training, combining weak classifiers into a strong classifier.
Face registration methods mostly adopt the idea of regression; a representative example is the Supervised Descent Method. It belongs to a class of methods for solving non-linear minimisation problems: starting from initialised feature points, it regresses over the SIFT features of those points to obtain new feature-point positions, then regresses over the newly obtained points again, until the positions closest to the true feature points are obtained.
Summary of the invention
The object of the invention is to overcome the drawbacks and problems of the prior art by providing a cascaded multi-task face detection and registration method based on deep learning, which has the advantages of a small model, high speed, and good robustness to external factors such as lighting and pose variation.
The technical scheme of the invention is as follows. A cascaded multi-task face detection and registration method based on deep learning comprises the following steps. Step 1: adjust the picture size to form an image pyramid. Step 2: feed the images of different sizes obtained in step 1 into a P-Net network, which predicts face candidate boxes in the original image. Step 3: resize the image inside every face box obtained in step 2 to n × n and input it into an R-Net network, which makes a face/non-face judgement and performs bounding-box regression on the face boxes, where n is a positive integer. Step 4: resize the image inside every face box obtained in step 3 to m × m and input it into an O-Net network, which makes a face/non-face judgement, performs bounding-box regression, and simultaneously outputs the coordinates of the left and right eye centres, the nose tip, and the left and right mouth corners, where m is a positive integer. Step 5: resize the image inside every face box obtained in step 4 to k × k and input it into an L-Net network, finally obtaining the three-dimensional head-pose angles of the original image and the positions of several feature points, where k is a positive integer.
Preferably, the P-Net network in step 2 is a fully convolutional neural network.
Preferably, the three-dimensional head-pose angles in step 5 are the yaw, pitch and roll angles, which represent left-right rotation, up-down rotation and in-plane rotation respectively.
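As a worked illustration of these three angles, the sketch below composes yaw, pitch and roll into a single 3 × 3 head-rotation matrix. The axis convention (yaw about the vertical axis, pitch about the horizontal axis, roll in the image plane) and the Rz·Rx·Ry composition order are assumptions for illustration; the patent does not fix a convention.

```python
import math

def head_rotation(yaw, pitch, roll):
    """Compose yaw, pitch, roll (radians) into a 3x3 rotation matrix.
    Illustrative convention: yaw about y, pitch about x, roll about z,
    composed as Rz @ Rx @ Ry."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    Ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]   # left-right rotation
    Rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]   # up-down rotation
    Rz = [[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]]   # in-plane rotation
    def mul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]
    return mul(Rz, mul(Rx, Ry))
```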
The technical solution provided by the invention has the following beneficial effects:
1. The method exploits the inner link between face detection and face registration: a single network simultaneously outputs the face location and the feature-point positions, which improves prediction performance.
2. The method exploits the inner link between the three-dimensional head-pose angles and the facial feature points: a single network simultaneously outputs the head-pose angles and the feature-point positions, which improves prediction performance.
3. The method uses four shallow neural networks, cascaded to predict the face-box size, the feature-point positions and the three-dimensional head-pose angles from coarse to fine; the four trained models are therefore very small in volume, and prediction is fast.
Detailed description of the invention
Fig. 1 is a flow chart of the cascaded multi-task face detection and registration method based on deep learning provided by an embodiment of the invention;
Fig. 2 is a structure diagram of the P-Net network in the method shown in Fig. 1;
Fig. 3 is a structure diagram of the R-Net network in the method shown in Fig. 1;
Fig. 4 is a structure diagram of the O-Net network in the method shown in Fig. 1;
Fig. 5 is a structure diagram of the L-Net network in the method shown in Fig. 1.
Specific embodiment
In order to make the objectives, technical solutions and advantages of the present invention clearer, the invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be appreciated that the specific embodiments described here merely illustrate the invention and are not intended to limit it.
Although the steps of the invention are numbered, the numbering is not intended to limit their order: unless an order is explicitly specified, or the execution of one step depends on other steps, the relative order of the steps is adjustable. The term "and/or" used here covers any one of, and any and all possible combinations of, the associated listed items.
As shown in Fig. 1, the cascaded multi-task face detection and registration method based on deep learning provided by an embodiment of the invention comprises the following steps.
Step 1: original-image preprocessing.
For example, given the size of the original image, a reduction factor of 0.709 is chosen and the image is scaled down repeatedly until it reaches roughly 12 × 12, forming an image pyramid.
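A small sketch of this pyramid construction, assuming the loop stops once the shorter side would fall below 12 pixels (the exact stopping rule is not spelled out above):

```python
def pyramid_scales(width, height, factor=0.709, min_size=12):
    """Scale factors for the image pyramid of step 1: shrink by `factor`
    until the shorter side would drop below `min_size` pixels."""
    scales, scale = [], 1.0
    while min(width, height) * scale >= min_size:
        scales.append(scale)
        scale *= factor
    return scales
```

For a 100 × 100 input this yields seven scales, the last one around 0.127 (a shorter side of about 12.7 pixels).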
Step 2: predicting face candidate boxes in the original image.
The pictures of different sizes obtained in step 1 are fed into the P-Net network, which outputs candidate box positions in the original image. The P-Net network is a fully convolutional neural network; as shown in Fig. 2, Conv in the P-Net network denotes convolution with stride 1, and MP denotes max pooling with stride 2.
During training, face classification uses the cross-entropy loss function and bounding-box regression computes its loss with the Euclidean distance; the two terms are combined at a 2:1 ratio to form the total P-Net loss.
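The 2:1 combination can be sketched as follows; the variable names and the use of a squared Euclidean distance for the box term are illustrative assumptions:

```python
import math

def pnet_loss(face_prob, face_label, box_pred, box_true, w_cls=2.0, w_box=1.0):
    """P-Net training loss: binary cross-entropy for face/non-face plus a
    Euclidean (squared-distance) term for box regression, weighted 2:1."""
    eps = 1e-12  # guards log(0)
    cls = -(face_label * math.log(face_prob + eps)
            + (1 - face_label) * math.log(1 - face_prob + eps))
    box = sum((p - t) ** 2 for p, t in zip(box_pred, box_true))
    return w_cls * cls + w_box * box
```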
Step 3: judging face/non-face and fine-tuning the face-box positions.
The image inside every face box obtained in step 2 is resized to n × n and input into the R-Net network, which makes a face/non-face judgement and performs bounding-box regression on the face boxes, where n is a positive integer.
For example, the images inside all face boxes obtained in step 2 are resized to 24 × 24 and input into the R-Net network for the face/non-face judgement and bounding-box regression. The R-Net network is a fully convolutional neural network; as shown in Fig. 3, Conv in the R-Net network denotes convolution with stride 1, and MP denotes max pooling with stride 2.
During training, face classification uses the cross-entropy loss function and bounding-box regression computes its loss with the Euclidean distance; the two terms are combined at a 2:1 ratio to form the total R-Net loss.
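Resizing the crop inside each box to the 24 × 24 R-Net input can be sketched with a nearest-neighbour resize over a plain 2-D pixel grid; a real implementation would use an image library, and the names here are illustrative:

```python
def crop_resize(image, box, size=24):
    """Crop box = (x1, y1, x2, y2) from a 2-D pixel grid and
    nearest-neighbour resize it to size x size."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return [[image[y1 + (j * h) // size][x1 + (i * w) // size]
             for i in range(size)]
            for j in range(size)]
```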
Step 4: further judging face/non-face, fine-tuning the boxes and predicting several feature points.
The image inside every face box obtained in step 3 is resized to m × m and input into the O-Net network, which makes a face/non-face judgement, performs bounding-box regression, and simultaneously outputs the coordinates of the left and right eye centres, the nose tip, and the left and right mouth corners, where m is a positive integer.
For example, the images inside all face boxes obtained in step 3 are resized to 48 × 48 and input into the O-Net network for the face/non-face judgement and bounding-box regression, while the coordinates of the five points (left and right eye centres, nose tip, left and right mouth corners) are output. The O-Net network is a fully convolutional neural network; as shown in Fig. 4, Conv in the O-Net network denotes convolution with stride 1, and MP denotes max pooling with stride 2.
During training, face classification uses the cross-entropy loss function, while bounding-box regression and feature-point localisation both compute their losses with the Euclidean distance; the three terms are combined at a 2:1:2 ratio to form the total O-Net loss.
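A sketch of the 2:1:2 combination for O-Net; the squared-distance form of the landmark term and all names are illustrative assumptions:

```python
def landmark_loss(pred, true):
    """Squared Euclidean distance summed over (x, y) landmark pairs."""
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(pred, true))

def onet_loss(cls_loss, box_loss, pred_pts, true_pts):
    """Total O-Net loss, weighting classification, box regression and the
    5-point landmark term at the 2:1:2 ratio stated in the description."""
    return 2.0 * cls_loss + 1.0 * box_loss + 2.0 * landmark_loss(pred_pts, true_pts)
```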
Step 5: outputting the three-dimensional head-pose angles and the positions of several feature points.
The image inside every face box obtained in step 4 is resized to k × k and input into the L-Net network, finally obtaining the three-dimensional head-pose angles of the original image and the positions of several feature points, where k is a positive integer.
For example, the images inside all face boxes obtained in step 4 are resized to 48 × 48 and input into the L-Net network, which outputs the three-dimensional head-pose angles of the original image and the positions of 68 feature points. The L-Net network is a fully convolutional neural network; as shown in Fig. 5, Conv in the L-Net network denotes convolution with stride 1, and MP denotes max pooling with stride 2.
During training, facial feature-point localisation and head-pose estimation both compute their losses with the Euclidean distance; to obtain more accurate feature-point localisation, the two terms are combined at a 100:1 ratio to form the total L-Net loss.
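A sketch of the 100:1 combination for L-Net, again with squared Euclidean terms as an illustrative assumption:

```python
def euclid(pred, true):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, true))

def lnet_loss(lmk_pred, lmk_true, pose_pred, pose_true):
    """Total L-Net loss: the 68-point landmark term is weighted 100x the
    (yaw, pitch, roll) pose term, per the description."""
    return 100.0 * euclid(lmk_pred, lmk_true) + 1.0 * euclid(pose_pred, pose_true)
```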
It is obvious to a person skilled in the art that the invention is not limited to the details of the above exemplary embodiments and can be realised in other specific forms without departing from its spirit or essential attributes. The embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the above description; all changes that fall within the meaning and range of equivalency of the claims are intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claims involved.
In addition, although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution; this manner of description is adopted merely for clarity. Those skilled in the art should consider the specification as a whole; the technical solutions of the various embodiments may be suitably combined to form other embodiments that can be understood by those skilled in the art.
Claims (3)
1. A cascaded multi-task face detection and registration method based on deep learning, characterised by comprising the following steps:
Step 1: adjusting the picture size to form an image pyramid;
Step 2: feeding the images of different sizes obtained in step 1 into a P-Net network, which predicts face candidate boxes in the original image;
Step 3: resizing the image inside every face box obtained in step 2 to n × n and inputting it into an R-Net network, which makes a face/non-face judgement and performs bounding-box regression on the face boxes, where n is a positive integer;
Step 4: resizing the image inside every face box obtained in step 3 to m × m and inputting it into an O-Net network, which makes a face/non-face judgement, performs bounding-box regression on the face boxes, and simultaneously outputs the coordinates of the left and right eye centres, the nose tip, and the left and right mouth corners, where m is a positive integer;
Step 5: resizing the image inside every face box obtained in step 4 to k × k and inputting it into an L-Net network, finally obtaining the three-dimensional head-pose angles of the original image and the positions of several feature points, where k is a positive integer.
2. The cascaded multi-task face detection and registration method based on deep learning according to claim 1, characterised in that the P-Net network in step 2 is a fully convolutional neural network.
3. The cascaded multi-task face detection and registration method based on deep learning according to claim 1, characterised in that the three-dimensional head-pose angles in step 5 are the yaw, pitch and roll angles, which represent left-right rotation, up-down rotation and in-plane rotation respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811287109.2A CN109409303A (en) | 2018-10-31 | 2018-10-31 | Cascaded multi-task face detection and registration method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811287109.2A CN109409303A (en) | 2018-10-31 | 2018-10-31 | Cascaded multi-task face detection and registration method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109409303A true CN109409303A (en) | 2019-03-01 |
Family
ID=65470723
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811287109.2A Pending CN109409303A (en) | 2018-10-31 | 2018-10-31 | Cascaded multi-task face detection and registration method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109409303A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110175504A (en) * | 2019-04-08 | 2019-08-27 | 杭州电子科技大学 | Target detection and alignment method based on a multi-task cascaded convolutional network |
CN110458005A (en) * | 2019-07-02 | 2019-11-15 | 重庆邮电大学 | Rotation-invariant face detection method based on a multi-task progressive registration network |
CN111652020A (en) * | 2019-04-16 | 2020-09-11 | 上海铼锶信息技术有限公司 | Method for identifying rotation angle of human face around Z axis |
CN111738934A (en) * | 2020-05-15 | 2020-10-02 | 西安工程大学 | MTCNN-based red eye automatic repairing method |
WO2024050827A1 (en) * | 2022-09-09 | 2024-03-14 | Intel Corporation | Enhanced image and video object detection using multi-stage paradigm |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107330251A (en) * | 2017-06-10 | 2017-11-07 | 华南理工大学 | A kind of wind power prediction method based on Retrieval method |
CN107895150A (en) * | 2016-11-30 | 2018-04-10 | 奥瞳系统科技有限公司 | Face detection and head-pose-angle assessment based on a small-scale convolutional neural network module for embedded systems |
CN108564029A (en) * | 2018-04-12 | 2018-09-21 | 厦门大学 | Facial attribute recognition method based on a cascaded multi-task learning deep neural network |
-
2018
- 2018-10-31 CN CN201811287109.2A patent/CN109409303A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107895150A (en) * | 2016-11-30 | 2018-04-10 | 奥瞳系统科技有限公司 | Face detection and head-pose-angle assessment based on a small-scale convolutional neural network module for embedded systems |
CN107330251A (en) * | 2017-06-10 | 2017-11-07 | 华南理工大学 | A kind of wind power prediction method based on Retrieval method |
CN108564029A (en) * | 2018-04-12 | 2018-09-21 | 厦门大学 | Facial attribute recognition method based on a cascaded multi-task learning deep neural network |
Non-Patent Citations (2)
Title |
---|
HAO WU et al.: "Simultaneous Face Detection and Pose Estimation Using Convolutional Neural Network Cascade", DOI 10.1109/ACCESS.2018.2869465 *
KAIPENG ZHANG et al.: "Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks", IEEE Xplore *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110175504A (en) * | 2019-04-08 | 2019-08-27 | 杭州电子科技大学 | Target detection and alignment method based on a multi-task cascaded convolutional network |
CN111652020A (en) * | 2019-04-16 | 2020-09-11 | 上海铼锶信息技术有限公司 | Method for identifying rotation angle of human face around Z axis |
CN111652020B (en) * | 2019-04-16 | 2023-07-11 | 上海铼锶信息技术有限公司 | Face rotation angle identification method around Z axis |
CN110458005A (en) * | 2019-07-02 | 2019-11-15 | 重庆邮电大学 | Rotation-invariant face detection method based on a multi-task progressive registration network |
CN110458005B (en) * | 2019-07-02 | 2022-12-27 | 重庆邮电大学 | Rotation-invariant face detection method based on multitask progressive registration network |
CN111738934A (en) * | 2020-05-15 | 2020-10-02 | 西安工程大学 | MTCNN-based red eye automatic repairing method |
CN111738934B (en) * | 2020-05-15 | 2024-04-02 | 西安工程大学 | Automatic red eye repairing method based on MTCNN |
WO2024050827A1 (en) * | 2022-09-09 | 2024-03-14 | Intel Corporation | Enhanced image and video object detection using multi-stage paradigm |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109409303A (en) | Cascaded multi-task face detection and registration method based on deep learning | |
CN105868716B (en) | A kind of face identification method based on facial geometric feature | |
Liu et al. | Recognizing human actions using multiple features | |
WO2018107979A1 (en) | Multi-pose human face feature point detection method based on cascade regression | |
CN107610209A (en) | Human face countenance synthesis method, device, storage medium and computer equipment | |
CN108171133B (en) | Dynamic gesture recognition method based on characteristic covariance matrix | |
CN103971112B (en) | Image characteristic extracting method and device | |
CN102938065A (en) | Facial feature extraction method and face recognition method based on large-scale image data | |
Ashwin et al. | An e-learning system with multifacial emotion recognition using supervised machine learning | |
KR102138809B1 (en) | 2d landmark feature synthesis and facial expression strength determination for micro-facial expression detection | |
CN107871107A (en) | Face authentication method and device | |
CN107704848A (en) | A kind of intensive face alignment method based on multi-constraint condition convolutional neural networks | |
CN103544478A (en) | All-dimensional face detection method and system | |
Tang et al. | Facial expression recognition using AAM and local facial features | |
Banerjee et al. | Learning unseen emotions from gestures via semantically-conditioned zero-shot perception with adversarial autoencoders | |
Larochelle | Few-shot learning | |
Song et al. | A design for integrated face and facial expression recognition | |
Yao et al. | Dynamicbev: Leveraging dynamic queries and temporal context for 3d object detection | |
Patil et al. | Emotion recognition from 3D videos using optical flow method | |
Luo et al. | Dynamic face recognition system in recognizing facial expressions for service robotics | |
CN105574494B (en) | Multi-classifier gesture recognition method and device | |
Güney et al. | Cross-pose facial expression recognition | |
Yun et al. | Head pose classification by multi-class AdaBoost with fusion of RGB and depth images | |
Cortés et al. | A new bag of visual words encoding method for human action recognition | |
Zientara et al. | Drones as collaborative sensors for image recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 210044 No. 219 Ningliu Road, Jiangbei New District, Nanjing City, Jiangsu Province; Applicant after: Nanjing University of Information Science and Technology. Address before: 211500 Yuting Square, 59 Wangqiao Road, Liuhe District, Nanjing City, Jiangsu Province; Applicant before: Nanjing University of Information Science and Technology |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190301 |