CN110490170A - Face candidate frame extraction method - Google Patents

Face candidate frame extraction method

Info

Publication number
CN110490170A
Authority
CN
China
Prior art keywords
frame
anchor frame
face
candidate
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910793249.5A
Other languages
Chinese (zh)
Other versions
CN110490170B (en)
Inventor
刘中秋
陈高曙
祝江威
张伟
唐松鹤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Miaxis Biometrics Co Ltd
Original Assignee
Miaxis Biometrics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Miaxis Biometrics Co Ltd filed Critical Miaxis Biometrics Co Ltd
Priority to CN201910793249.5A
Publication of CN110490170A
Application granted
Publication of CN110490170B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation

Abstract

The face candidate frame extraction method provided by the invention mainly comprises two steps: step 1 is a candidate network training process; step 2 is a candidate frame extraction process. Step 1 comprises preparing 3 classes of training data and training the candidate network with these data as training samples, the training data being, respectively, slice images, labels of the anchor frames corresponding to the slice images, and true frame regression vectors. Step 2 performs face candidate frame extraction on an image to be detected based on the candidate network trained in step 1; the candidate network is a fully convolutional neural network. By combining multiple anchor frames with an image pyramid, the invention reduces the number of image pyramid layers and greatly improves running speed while guaranteeing accuracy, balancing accuracy and real-time performance, so that the invention has good practicability.

Description

Face candidate frame extraction method
[technical field]
The present invention relates to the field of face detection technology, and more particularly to a face candidate frame extraction method.
[background technique]
Face detection is a basic step in computer vision applications and plays a decisive role in the performance of subsequent related applications. Face detection is a computer technique that finds, in arbitrary input information (such as images or video), information about faces such as their position, size and pose; among these, the position and size of a face are the most basic face information that detection must obtain, and they are usually expressed with a face candidate frame.
Common face detection methods currently proceed as follows: first, an image pyramid operation is performed on the original image to obtain multi-layer pyramid images; second, a candidate network is used to extract possible face candidate frames in every pyramid layer; finally, one or more stages of refinement networks screen the candidate frames and correct their coordinates to obtain the final face candidate frames. In such methods, candidate frame extraction must process every pyramid layer, which is computationally heavy, takes a long time, extracts slowly, and lacks practicability.
[summary of the invention]
The present invention aims to provide a face candidate frame extraction method that effectively reduces the amount of computation and improves the extraction speed of face candidate frames.
The purpose of the present invention is achieved through the following technical solution:
A face candidate frame extraction method mainly comprises two steps. Step 1: a candidate network training process. Step 2: a candidate frame extraction process, in which face candidate frames are extracted from an image to be detected based on the candidate network trained in step 1.
Step 1: prepare 3 classes of training data, namely slice images, labels of the anchor frames corresponding to the slice images, and true frame regression vectors. The corresponding data are obtained as follows:
(1) Slice images:
A square slice whose side length is not less than A is randomly taken from an original image of the training dataset, and its side length is uniformly scaled to A, finally obtaining a slice image of size A*A, as sketched below.
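For illustration only, a minimal sketch of this slicing step, assuming OpenCV and numpy are available and the input image is larger than A on both sides (the helper name and library choice are assumptions, not part of the patent):

```python
import random

import cv2  # assumed here for cropping/resizing; any image library would do
import numpy as np

def random_square_slice(image: np.ndarray, A: int = 24) -> np.ndarray:
    """Randomly crop a square slice with side length >= A, then rescale it to A x A."""
    h, w = image.shape[:2]
    side = random.randint(A, min(h, w))   # side length not less than A
    x = random.randint(0, w - side)       # random top-left corner of the slice
    y = random.randint(0, h - side)
    slice_img = image[y:y + side, x:x + side]
    return cv2.resize(slice_img, (A, A))  # uniform scaling to A x A
```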
(2) Labels of the anchor frames corresponding to slice images:
1) Anchor frames corresponding to a slice image:
On the original image corresponding to a slice image, k square anchor frames centered on the slice image's center point are chosen, with side lengths A*scale^i respectively, where scale is the zoom factor with value range 0 to 1, i takes the values 0, 1, ..., k-1, and k is a positive integer whose value satisfies the stated constraint; the preferred value of k is the maximum value satisfying that constraint. The anchor generation is sketched below.
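As an illustration, a minimal sketch generating these k anchor frames (the corner-tuple format and function name are assumptions for illustration):

```python
def slice_anchors(cx: float, cy: float, A: float = 24, scale: float = 0.7937, k: int = 4):
    """Return k square anchor frames centered at the slice center (cx, cy)
    on the original image; the i-th anchor has side length A * scale**i."""
    anchors = []
    for i in range(k):
        s = A * scale ** i
        # anchor stored as (x1, y1, x2, y2) corners
        anchors.append((cx - s / 2, cy - s / 2, cx + s / 2, cy + s / 2))
    return anchors
```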
2) Anchor frame labels:
The IOU value of each anchor frame with the real face frames in the corresponding original image is calculated as IOU = S_inter / S_union, where S_inter is the area of the intersection of the anchor frame and a real face frame and S_union is the area of their union. Each anchor frame is assigned a label according to its maximum IOU value; the label classes are positive sample labels, part sample labels and negative sample labels. The IOU computation is sketched below.
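A minimal sketch of this IOU computation for two axis-aligned frames in (x1, y1, x2, y2) corner format (the format and name are illustrative assumptions):

```python
def iou(box_a, box_g):
    """IOU = S_inter / S_union of an anchor frame and a real face frame."""
    x1, y1 = max(box_a[0], box_g[0]), max(box_a[1], box_g[1])
    x2, y2 = min(box_a[2], box_g[2]), min(box_a[3], box_g[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)           # S_inter
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    union = area_a + area_g - inter                         # S_union
    return inter / union if union > 0 else 0.0
```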
(3) True frame regression vectors:
For each anchor frame assigned a positive sample label or a part sample label in (2) above, the true frame regression vector of the real face frame with the maximum IOU value relative to that anchor frame is calculated as follows:

t_x = (x_g - x_a) / w_a
t_y = (y_g - y_a) / h_a
t_w = log(w_g / w_a)
t_h = log(h_g / h_a)

where x_g, y_g are the center point coordinates of the real face frame, x_a, y_a are the center point coordinates of the corresponding anchor frame, w_g, h_g are the width and height of the real face frame, and w_a, h_a are the width and height of the corresponding anchor frame. A helper computing this target is sketched below.
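A small helper computing this target, under the assumption that both frames are given as (center x, center y, width, height) tuples and that the standard center/size parameterisation written above applies (the patent's original formula is an image not reproduced here; this is the standard form consistent with the variables it defines):

```python
import math

def regression_target(gt, anchor):
    """True frame regression vector of a real face frame w.r.t. an anchor frame.
    Both frames are (cx, cy, w, h) tuples."""
    xg, yg, wg, hg = gt
    xa, ya, wa, ha = anchor
    return ((xg - xa) / wa,     # t_x
            (yg - ya) / ha,     # t_y
            math.log(wg / wa),  # t_w
            math.log(hg / ha))  # t_h
```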
The 3 classes of training data obtained above are used as training samples to train the candidate network; the training termination condition is reaching the set number of iterations or loss convergence.
Step 2: the candidate frame extraction process; its detailed flow is as follows:
(a) Image pyramid processing is performed on the original image to be detected, obtaining multi-layer pyramid images;
(b) Starting from the k-th pyramid layer, every j-th pyramid image is input into the trained candidate network for computation, obtaining a face confidence feature map and a predicted frame regression vector feature map, where j is a positive integer less than or equal to k;
(c) The pixels whose face confidence in the face confidence feature map of (b) is greater than or equal to a threshold T are selected;
(d) For each pixel selected in (c), its corresponding anchor frame on the original image, i.e. a primary anchor frame, is obtained;
(e) For the primary anchor frames obtained in (d), redundant anchor frames are first removed using non-maximum suppression, and the coordinates of the remaining anchor frames are then corrected using the corresponding predicted frame regression vectors, finally obtaining the face candidate frames; a sketch of this flow is given below.
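A minimal end-to-end sketch of steps (a) to (e), assuming the caller supplies a build_pyramid helper, a candidate_net callable returning the two feature maps, and a pixel_to_anchor mapping from feature-map pixels back to anchor frames on the original image (all hypothetical interfaces; the patent does not specify them). It reuses the iou helper sketched earlier:

```python
import numpy as np

def apply_regression(anchor, t):
    """Refine an (x1, y1, x2, y2) anchor frame by inverting the
    center/size parameterisation used for the regression targets."""
    xa, ya = (anchor[0] + anchor[2]) / 2.0, (anchor[1] + anchor[3]) / 2.0
    wa, ha = anchor[2] - anchor[0], anchor[3] - anchor[1]
    xg, yg = xa + t[0] * wa, ya + t[1] * ha
    wg, hg = wa * np.exp(t[2]), ha * np.exp(t[3])
    return (xg - wg / 2, yg - hg / 2, xg + wg / 2, yg + hg / 2)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression; returns indices of the kept frames."""
    order = list(np.argsort(scores)[::-1])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < thresh]
    return keep

def extract_candidates(image, build_pyramid, candidate_net, pixel_to_anchor,
                       k=4, j=3, T=0.6, nms_thresh=0.5):
    pyramid = build_pyramid(image)                      # (a) multi-layer pyramid
    boxes, scores, regs = [], [], []
    for layer in range(k - 1, len(pyramid), j):         # (b) every j-th layer from the k-th
        conf_map, reg_map = candidate_net(pyramid[layer])
        ys, xs = np.where(conf_map >= T)                # (c) confident pixels
        for y, x in zip(ys, xs):
            boxes.append(pixel_to_anchor(x, y, layer))  # (d) primary anchor frame
            scores.append(conf_map[y, x])
            regs.append(reg_map[:, y, x])               # reg_map assumed shape (4, H, W)
    keep = nms(boxes, scores, nms_thresh)               # (e) remove redundant frames...
    return [apply_regression(boxes[i], regs[i]) for i in keep]  # ...and correct coordinates
```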
The present invention combines multiple anchor frames with an image pyramid, so that only a fraction of the initial pyramid images needs to be processed; this effectively reduces the number of image pyramid layers and greatly improves running speed while guaranteeing accuracy, balancing accuracy and real-time performance, so that the invention has good practicability.
[Description of the drawings]
Fig. 1 is a schematic flow diagram of the candidate network training process of an embodiment of the present invention.
Fig. 2 is a schematic flow diagram of the candidate frame extraction process of an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of the candidate network of an embodiment of the present invention.
[specific embodiment]
To make the purpose, technical solutions and effects of the present invention clearer and more definite, the present invention is further described below with reference to the drawings and an illustrative example.
The face candidate frame extraction method provided by the invention mainly comprises two steps: step 1 is the candidate network training process, as shown in Fig. 1; step 2 is the candidate frame extraction process, as shown in Fig. 2. Step 2 performs face candidate frame extraction on the image to be detected based on the candidate network trained in step 1.
The candidate network is a fully convolutional neural network; its specific network structure is shown in Fig. 3.
The candidate network training process of this embodiment is specifically as follows:
(1) A square slice whose side length is not less than 24 pixels is randomly taken from an original image of the training dataset, and its side length is uniformly scaled to 24, finally obtaining a 24*24-pixel slice image, which serves as one of the training inputs of the candidate network;
(2) With the center point of the slice image obtained in (1) above as the center, k square anchor frames are obtained by scaling, where the empirical value of the zoom factor scale is 0.7937; the side lengths of the square anchor frames are therefore 24*scale^i, with i taking the values 0, 1, ..., k-1 in turn and k equal to 4, giving side lengths of approximately 24, 19.05, 15.12 and 12 pixels;
(3) The IOU value of each anchor frame in (2) with the real face frames in the corresponding original image is calculated as IOU = S_inter / S_union, where S_inter is the area of the intersection of the anchor frame and a real face frame and S_union is the area of their union. Each anchor frame is assigned a label according to its maximum IOU value;
The labeling criteria are as follows: a maximum IOU value greater than or equal to 0.5 assigns the corresponding anchor frame a positive sample label; a maximum IOU value less than 0.5 and greater than or equal to 0.3 assigns the corresponding anchor frame a part sample label; a maximum IOU value less than 0.2 assigns the corresponding anchor frame a negative sample label; the remaining anchor frames are given no label. Positive sample anchor frames may be marked 1 and negative sample anchor frames 0, as sketched below;
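A minimal sketch of this labeling rule (the marks 1 and 0 follow the embodiment; the 'part' and None return values are illustrative stand-ins for part-sample and unlabeled anchors):

```python
def anchor_label(max_iou: float):
    """Label an anchor frame from its maximum IOU with the real face frames."""
    if max_iou >= 0.5:
        return 1        # positive sample label
    if max_iou >= 0.3:
        return "part"   # part sample label
    if max_iou < 0.2:
        return 0        # negative sample label
    return None         # 0.2 <= IOU < 0.3: no label, excluded from training
```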
(4) For each anchor frame assigned a positive sample label or a part sample label in (3), the true frame regression vector of the real face frame with the maximum IOU value relative to that anchor frame is calculated as follows:

t_x = (x_g - x_a) / w_a
t_y = (y_g - y_a) / h_a
t_w = log(w_g / w_a)
t_h = log(h_g / h_a)

where x_g, y_g are the center point coordinates of the real face frame, x_a, y_a are the center point coordinates of the anchor frame, w_g, h_g are the width and height of the real face frame, and w_a, h_a are the width and height of the anchor frame.
(5) The trained candidate network outputs two classes of data: face confidences and predicted frame regression vectors. Accordingly, for face confidence, training uses the anchor frame labels of all positive sample and negative sample anchor frames together with the slice images corresponding to those anchor frames; for the predicted frame regression vectors, training uses the true frame regression vectors of all positive sample and part sample anchor frames together with the slice images corresponding to those anchor frames; this sample selection is sketched below. The training termination condition is reaching the set number of iterations or loss convergence, finally yielding the trained candidate network.
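The following numpy sketch illustrates the sample selection only; the cross-entropy and squared-error terms are illustrative choices (the patent does not name its loss functions), and all identifiers are hypothetical:

```python
import numpy as np

def masked_losses(labels, conf_pred, reg_pred, reg_target):
    """Confidence loss over positive/negative samples only;
    regression loss over positive/part samples only.

    labels: per-sample label, one of 1, 0, 'part', None.
    conf_pred: (N,) predicted face confidences in (0, 1).
    reg_pred, reg_target: (N, 4) regression vectors.
    """
    cls_mask = np.array([l in (0, 1) for l in labels])        # positives + negatives
    reg_mask = np.array([l in (1, "part") for l in labels])   # positives + part samples

    y = np.array([1.0 if l == 1 else 0.0 for l in labels])
    p = np.clip(conf_pred, 1e-7, 1 - 1e-7)
    cls_loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))[cls_mask].mean()

    reg_loss = ((reg_pred - reg_target) ** 2).sum(axis=1)[reg_mask].mean()
    return cls_loss, reg_loss
```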
The candidate frame extraction process of this embodiment can be described as follows:
An original image to be detected with a resolution of 1920*1080 is acquired from the intelligent front end. Image pyramid processing is first performed on the original image, yielding 18 pyramid layers. Then, starting from the 4th pyramid layer, every 3rd pyramid image is input into the trained candidate network, which outputs a face confidence feature map and a predicted frame regression vector feature map. Pixels whose face confidence in the face confidence feature map is greater than or equal to 0.6 are selected, and their corresponding anchor frames on the original image are found as the primary anchor frames. Redundant anchor frames among the primary anchor frames are then removed using non-maximum suppression, and the coordinates of the remaining anchor frames are corrected using the corresponding predicted frame regression vectors, finally obtaining the face candidate frames.
This embodiment combines multiple anchor frames with the image pyramid, reducing the pyramid images to be processed from the initial 18 layers to 6 layers, effectively reducing the number of image pyramid layers.
In tests on an ordinary personal computer, face candidate frame extraction on a 1920*1080 picture takes about 255 ms with the MTCNN face detection method, whereas face candidate frame extraction in this embodiment of the present application takes about 33 ms, greatly improving the extraction speed of face candidate frames and giving good practicability.
The above embodiments are only preferred embodiments of the present invention and are not intended to limit the invention. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and replacements do not depart the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (5)

1. A face candidate frame extraction method, characterized in that it mainly comprises two steps: step 1: a candidate network training process; step 2: a candidate frame extraction process; the step 2 performs face candidate frame extraction on an image to be detected based on the candidate network trained in step 1;
The step 1 comprises: a. preparing 3 classes of training data; b. training the candidate network with the prepared data as training samples; the training data are, respectively, slice images, labels of the anchor frames corresponding to the slice images, and true frame regression vectors, obtained as follows:
(1) Slice images:
A square slice whose side length is not less than A is randomly taken from an original image of the training dataset, and its side length is uniformly scaled to A, finally obtaining a slice image of size A*A;
(2) Labels of the anchor frames corresponding to slice images:
1) Anchor frames corresponding to a slice image:
On the original image corresponding to a slice image, k square anchor frames centered on the slice image's center point are chosen, with side lengths A*scale^i respectively; wherein scale is the zoom factor with value range 0 to 1; i takes the values 0, 1, ..., k-1; and k is a positive integer whose value satisfies the stated condition;
2) Anchor frame labels:
The IOU value of each anchor frame with the real face frames in the corresponding original image is calculated as IOU = S_inter / S_union, wherein S_inter is the area of the intersection of the anchor frame and a real face frame and S_union is the area of their union; each anchor frame is assigned a label according to its maximum IOU value, the label classes being positive sample labels, part sample labels and negative sample labels;
(3) True frame regression vectors:
For each anchor frame assigned a positive sample label or a part sample label in (2) above, the true frame regression vector of the real face frame with the maximum IOU value relative to that anchor frame is calculated as follows:

t_x = (x_g - x_a) / w_a
t_y = (y_g - y_a) / h_a
t_w = log(w_g / w_a)
t_h = log(h_g / h_a)

wherein x_g, y_g are the center point coordinates of the real face frame; x_a, y_a are the center point coordinates of the corresponding anchor frame; w_g, h_g are the width and height of the real face frame; and w_a, h_a are the width and height of the corresponding anchor frame;
The detailed flow of the step 2 is as follows:
(a) performing image pyramid processing on the original image to be detected to obtain multi-layer pyramid images;
(b) starting from the k-th pyramid layer, inputting every j-th pyramid image into the trained candidate network for computation to obtain a face confidence feature map and a predicted frame regression vector feature map, wherein j is a positive integer less than or equal to k;
(c) selecting the pixels whose face confidence in the face confidence feature map of (b) is greater than or equal to a threshold T;
(d) for each pixel selected in (c), obtaining its corresponding anchor frame on the original image, namely a primary anchor frame;
(e) for the primary anchor frames obtained in (d), first removing redundant anchor frames using non-maximum suppression, then correcting the coordinates of the remaining anchor frames using the corresponding predicted frame regression vectors, finally obtaining the face candidate frames.
2. The face candidate frame extraction method according to claim 1, characterized in that the candidate network is a fully convolutional neural network.
3. The face candidate frame extraction method according to claim 1, characterized in that k is the maximum value satisfying the condition stated in claim 1.
4. The face candidate frame extraction method according to claim 1, characterized in that the threshold T is 0.6.
5. The face candidate frame extraction method according to claim 1, characterized in that a maximum IOU value greater than or equal to 0.5 assigns the corresponding anchor frame a positive sample label; a maximum IOU value less than 0.5 and greater than or equal to 0.3 assigns the corresponding anchor frame a part sample label; a maximum IOU value less than 0.2 assigns the corresponding anchor frame a negative sample label; and the remaining anchor frames are not labeled.
CN201910793249.5A 2019-08-27 2019-08-27 Face candidate frame extraction method Active CN110490170B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910793249.5A CN110490170B (en) 2019-08-27 2019-08-27 Face candidate frame extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910793249.5A CN110490170B (en) 2019-08-27 2019-08-27 Face candidate frame extraction method

Publications (2)

Publication Number Publication Date
CN110490170A true CN110490170A (en) 2019-11-22
CN110490170B CN110490170B (en) 2023-01-06

Family

ID=68554292

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910793249.5A Active CN110490170B (en) 2019-08-27 2019-08-27 Face candidate frame extraction method

Country Status (1)

Country Link
CN (1) CN110490170B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046792A (en) * 2019-12-11 2020-04-21 三一重工股份有限公司 Face detection method and device, electronic equipment and computer readable storage medium
CN114299283A (en) * 2022-03-07 2022-04-08 科大天工智能装备技术(天津)有限公司 Image-based displacement measurement method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170206431A1 (en) * 2016-01-20 2017-07-20 Microsoft Technology Licensing, Llc Object detection and classification in images
CN107871134A (en) * 2016-09-23 2018-04-03 北京眼神科技有限公司 A kind of method for detecting human face and device
CN109919097A (en) * 2019-03-08 2019-06-21 中国科学院自动化研究所 Face and key point combined detection system, method based on multi-task learning

Also Published As

Publication number Publication date
CN110490170B (en) 2023-01-06

Similar Documents

Publication Publication Date Title
CN106960195B (en) Crowd counting method and device based on deep learning
CN105608456B (en) A kind of multi-direction Method for text detection based on full convolutional network
WO2019140767A1 (en) Recognition system for security check and control method thereof
US20180018503A1 (en) Method, terminal, and storage medium for tracking facial critical area
CN110287960A (en) The detection recognition method of curve text in natural scene image
CN110287826B (en) Video target detection method based on attention mechanism
CN110263712B (en) Coarse and fine pedestrian detection method based on region candidates
CN105844621A (en) Method for detecting quality of printed matter
CN109583340A (en) A kind of video object detection method based on deep learning
CN107624189A (en) Method and apparatus for generating forecast model
WO2009109127A1 (en) Real-time body segmentation system
CN110298297A (en) Flame identification method and device
CN105261021B (en) Remove the method and device of foreground detection result shade
CN111027377B (en) Double-flow neural network time sequence action positioning method
JP2006524394A (en) Delineation of human contours in images
CN113536972B (en) Self-supervision cross-domain crowd counting method based on target domain pseudo label
CN110349167A (en) A kind of image instance dividing method and device
EP3073443A1 (en) 3D Saliency map
CN110399882A (en) A kind of character detecting method based on deformable convolutional neural networks
CN109360191B (en) Image significance detection method based on variational self-encoder
CN109800756A (en) A kind of text detection recognition methods for the intensive text of Chinese historical document
CN113706481A (en) Sperm quality detection method, sperm quality detection device, computer equipment and storage medium
Zhu et al. Towards automatic wild animal detection in low quality camera-trap images using two-channeled perceiving residual pyramid networks
CN113221956A (en) Target identification method and device based on improved multi-scale depth model
CN110490170A (en) A kind of face candidate frame extracting method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant