CN106874868B - Face detection method and system based on three-level convolutional neural network

Info

Publication number
CN106874868B
CN106874868B
Authority
CN
China
Prior art keywords
face
feature vector
training
network
level
Prior art date
Legal status
Active
Application number
CN201710078431.3A
Other languages
Chinese (zh)
Other versions
CN106874868A (en)
Inventor
王鲁许
白洪亮
董远
Current Assignee
Beijing Feisou Technology Co ltd
Original Assignee
Beijing Feisou Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Feisou Technology Co ltd filed Critical Beijing Feisou Technology Co ltd
Priority to CN201710078431.3A
Publication of CN106874868A
Application granted
Publication of CN106874868B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face detection method and a face detection system based on a three-level convolutional neural network. The method has the following beneficial effects: in the training process, the training results of the first n levels are added as input to the following level, which avoids the loss of training data between levels, improves the accuracy and recall rate of face detection, and improves the performance of the whole network. Face feature points are added to the training samples, and the classification of faces and the positioning precision of the face rectangular frame are improved through the face feature points, so that the performance of the network approaches its upper limit and the recall rate and accuracy of face detection are further improved. Regression correction of image classification is performed only through the classification offset in the calculated first (or second) offset, which ensures that correctly classified parts are not regression-corrected again, improves the speed of face detection, and further exploits the performance of the network. The system has the same beneficial effects as the detection method.

Description

Face detection method and system based on three-level convolutional neural network
Technical Field
The invention relates to the technical field of face detection, in particular to a face detection method and a face detection system based on a three-level convolutional neural network.
Background
Since the beginning of the twenty-first century, computer technology has developed vigorously and been widely applied in many fields; with this development, face detection technology has advanced rapidly and is continuously iterated and updated. Face detection means that, for any image set, a certain strategy is adopted to search the image set so as to determine which images contain a face.
Face detection is a key link in automatic face recognition systems. Early face recognition research mainly targeted face images captured under strong constraints (such as images without background), and usually assumed that the face position was known or easily obtained, so the face detection problem was not considered separately.
With the development of applications such as electronic commerce, face recognition has become one of the most promising means of biometric identity authentication. This application background requires that an automatic face recognition system have a certain recognition capability on ordinary images, so the series of problems faced by face recognition systems has attracted researchers' attention as an independent subject. Today, the application background of face detection extends far beyond face recognition systems, and face detection has important application value in content-based retrieval, digital video processing, video surveillance, face modeling, face tracking and other areas.
Face detection technology generally adopts search strategies such as decision trees, logistic regression, naive Bayes and three-level convolutional neural networks, among which the face detection method/system based on a three-level convolutional neural network is fast in detection, high in recognition accuracy, and quick to iterate and update. In the prior art, a face detection method based on a three-level convolutional neural network comprises the following steps: 1) training step by step through a cascade of networks whose performance is enhanced level by level, where the candidate frames judged as faces by the previous level are passed to the next level as training samples for learning; 2) at each level, judging through face classification and a face-frame regression network; 3) feeding back all the corrected data directly, regardless of whether the classification is correct.
The prior art has the following defects: because the performance of the earlier-level network is poor, some faces cannot be judged correctly and are not passed on as face candidate frames to the next level, which causes losses and poor overall performance; with only face classification and face-frame regression the network performance cannot reach its upper limit, so room for improvement remains; and because all data are fed back, the network does not learn deeply enough and its performance cannot be fully exploited.
Disclosure of Invention
The invention aims to provide a face detection method and a face detection system based on a three-level convolutional neural network, so as to solve the problems that the overall performance is poor; that the performance of the network cannot reach its upper limit through face classification and face-frame correction alone; and that correctly classified parts are still subjected to regression correction.
In order to achieve the above purpose, the invention provides the following technical scheme:
a face detection method based on a three-level convolutional neural network comprises the following steps:
acquiring a training sample and a detection picture; the training sample at least comprises a face picture marked with a face frame and face characteristic points;
inputting the training samples into a three-level convolutional neural network for training step by step, wherein the training process comprises the following steps:
performing dimensionality reduction after prediction according to the training samples and the training results of the previous n levels to obtain corresponding two-dimensional feature vectors, and calculating to obtain a first offset according to the corresponding two-dimensional feature vectors;
performing regression correction on the two-dimensional feature vector through the first offset to obtain a corresponding training result;
and inputting the detection picture into the trained three-level convolutional neural network for carrying out face detection step by step to obtain a face rectangular frame.
According to the face detection method based on the three-level convolutional neural network, the face picture in the training sample also contains the picture classification label and the uniquely determined face frame.
In the above human face detection method based on the three-level convolutional neural network, the obtaining of the two-dimensional feature vector includes the following steps:
obtaining an m-dimensional feature vector according to the training sample and the training result of the previous n levels;
and performing dimensionality reduction processing on the m-dimensional feature vector through a full convolution layer/full connection layer to obtain the two-dimensional feature vector.
In the face detection method based on the three-level convolutional neural network, the three-level convolutional neural network comprises a first-level network, a second-level network and a third-level network; the third-level network includes a first branch, a second branch and a third branch, the second-level network includes the first branch and the second branch, and the first branch is the same as the first-level network.
In the above human face detection method based on the three-level convolutional neural network, in the three-level network, the obtaining of the m-dimensional feature vector includes the following steps:
inputting the training sample and the training result of the previous stage into the first branch to obtain a first feature vector, inputting the first feature vector into the second branch to obtain a second feature vector, and inputting the second feature vector into the third branch to obtain a third feature vector;
and splicing the first feature vector, the second feature vector and the third feature vector to obtain an m-dimensional feature vector.
In the above face detection method based on the three-level convolutional neural network, the obtaining of the first offset includes the following steps:
inputting the two-dimensional feature vector into a SoftmaxWithLoss layer, and calculating to obtain a classification offset;
and inputting the two-dimensional feature vector into an Euclidean Loss layer, and calculating to obtain the offset of a face frame and the offset of the face feature point.
In the above face detection method based on the three-level convolutional neural network, the calculation of the classification offset includes the following steps:
defining the two-dimensional feature vector as Z = {z1, z2};
classifying through a softmax function into two classes, with p_i = e^{z_i} / (e^{z_1} + e^{z_2}) for i = 1, 2;
calculating the difference between the predicted two-dimensional feature vector and the training sample through a loss function;
computing the classification offset from the loss function and correcting the two-dimensional feature vector by the offset scaled by a coefficient α.
In the above human face detection method based on the three-level convolutional neural network, the obtaining of the human face rectangular frame includes the following steps:
inputting the detection picture into the primary network, and screening, regression-correcting and merging the resulting candidate frames to obtain a first face candidate frame;
inputting the first face candidate frame into the secondary network, and screening, regression-correcting and merging to obtain a second face candidate frame;
and inputting the second face candidate frame into the third-level network, and screening, regression-correcting and merging to obtain the face rectangular frame.
The face detection method based on the three-level convolutional neural network comprises the following steps of screening, regression correction and combination:
screening out the face candidate frames larger than a set probability threshold value according to the detected picture, the first face candidate frame, the second face candidate frame and the corresponding face probability;
calculating to obtain a second offset according to the face candidate frame obtained after screening, and performing regression correction on the second offset;
and combining the face candidate frames obtained after correction through a non-maximum suppression algorithm to obtain a first face candidate frame/a second face candidate frame/a face rectangular frame.
The face detection method based on the three-level convolutional neural network has the following beneficial effects:
1) in the training process, the training results of the first n levels are added as input to the following level, which avoids the loss of training data between levels, improves the accuracy and recall rate of face detection, and improves the performance of the whole network;
2) face feature points are added to the training samples, and the classification of faces and the positioning precision of the face rectangular frame are improved through the face feature points, so that the performance of the network approaches its upper limit and the recall rate and accuracy of face detection are further improved;
3) regression correction of image classification is performed only through the classification offset in the calculated first (or second) offset, which ensures that correctly classified parts are not regression-corrected again, improves the speed of face detection, and further exploits the performance of the network.
A face detection system based on a three-level convolutional neural network, comprising the three-level convolutional neural network, wherein the three-level convolutional neural network comprises:
the acquisition unit is used for acquiring a training sample and a detection picture; the training sample at least comprises a face picture marked with face characteristic points;
the network training unit is used for inputting the training samples into a three-level convolutional neural network for step-by-step training;
it includes: a feature vector module and a regression correction module,
the feature vector module is used for predicting and reducing the dimension according to the training samples and the training results of the previous n levels to obtain corresponding two-dimensional feature vectors, and calculating to obtain a first offset according to the two-dimensional feature vectors;
the regression correction module is used for performing regression correction on the two-dimensional feature vector through the first offset to obtain a corresponding training result;
and the face detection unit is used for inputting the detection picture into the trained three-level convolutional neural network to carry out face detection step by step so as to obtain a face rectangular frame.
The face detection system based on the three-level convolutional neural network has the following beneficial effects:
1) the secondary network and the tertiary network in the network training unit 2 (or the face detection unit 3) compensate for the poor performance of the preceding network, so that the accuracy of picture classification is improved, the recall rate and accuracy of face detection are improved, and the performance of the whole network is improved;
2) face feature points are added to the face pictures in the training samples of the acquisition unit 1, and the classification of faces and the positioning precision of the face rectangular frame are improved through the face feature points, so that the performance of the network approaches its upper limit and the recall rate and accuracy of face detection are further improved;
3) regression correction of image classification is performed only through the classification offset obtained by the cooperation of the feature vector module 21 and the regression correction module 22, which ensures that correctly classified parts no longer need correction, improves the speed of face detection, and further exploits the performance of the network.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art from these drawings.
Fig. 1 is a structural block diagram of a face detection method based on a three-level convolutional neural network according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a face detection method based on a three-level convolutional neural network according to a preferred embodiment of the present invention;
fig. 3 is a schematic flow chart of a face detection method based on a three-level convolutional neural network according to a preferred embodiment of the present invention;
fig. 4 is a schematic flow chart of a face detection method based on a three-level convolutional neural network according to a preferred embodiment of the present invention;
fig. 5 is a schematic flow chart of a face detection method based on a three-level convolutional neural network according to a preferred embodiment of the present invention;
fig. 6 is a schematic flow chart of a face detection method based on a three-level convolutional neural network according to a preferred embodiment of the present invention;
fig. 7 is a schematic flowchart of a face detection method based on a three-level convolutional neural network according to a preferred embodiment of the present invention;
fig. 8 is a schematic structural diagram of a face detection system based on a three-level convolutional neural network according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a primary network according to a preferred embodiment of the present invention;
fig. 10 is a schematic structural diagram of a secondary network according to a preferred embodiment of the present invention;
fig. 11 is a schematic structural diagram of a three-stage network according to a preferred embodiment of the present invention.
Description of reference numerals:
1. an acquisition unit; 2. a network training unit; 21. a feature vector module; 22. a regression correction module; 3. a face detection unit.
Detailed Description
In order to make the technical solutions of the present invention better understood, the present invention is described in further detail below with reference to the accompanying drawings.
As shown in fig. 1-7 and 9-11, the face detection method based on the three-level convolutional neural network according to the embodiment of the present invention includes the following steps:
s101, obtaining a training sample and a detection picture; the training sample at least comprises a face picture marked with a face frame and face characteristic points;
as shown in fig. 9-11, further, the three-level convolutional neural network includes a first-level network, a second-level network and a third-level network, the third-level network includes a first branch, a second branch and a third branch, the second-level network includes the first branch and the second branch, and the first branch is identical to the first-level network. The network structure of the first branch is completely the same as that of the primary network, so that the first branch is easy to distinguish, and 12-net represents the primary network, 24-net represents the secondary network, and 48-net represents the tertiary network in the figure; namely, the 24-net comprises a 12-net branch and a 24-net branch, the 48-net comprises a 12-net branch, a 24-net branch and a 48-net branch, and the 12-net, the 24-net and the 48-net are connected step by step, so that training samples can be selected step by step, other pictures without faces are eliminated, and accurate face pictures and corresponding more accurate face frames (face position determination) are obtained.
Furthermore, the face pictures in the training samples also contain picture classification labels. Specifically, the training samples consist of face pictures and other pictures, where each face picture carries a classification label, a uniquely determined face frame and labelled face feature point information. Picture classification training can be carried out through the classification labels, that is, the training samples are divided into a labelled face picture set and a set of other pictures. The rectangular area of the face in a face picture can be determined through the face frame, so framing this area determines the face position. The face feature points (landmark points) are salient parts such as the nose, eyes, mouth, forehead and face contour line, through which differences between faces can easily be judged. Because the face frame alone only roughly determines the face position, the face can be positioned more accurately through the face feature points: by enlarging or shrinking the face frame so that the face feature points fall within its range, the positioning precision of the face frame is improved. The detection pictures are a set of face pictures, environment pictures and other arbitrary pictures; after training is finished, face detection can be performed on the detection pictures. A training sample can be acquired, for example, by taking face pictures from an existing face library in the prior art or obtaining them by means such as 3D printing, adding the classification label and the uniquely determined face frame, labelling the face feature points, and mixing them with other pictures.
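As a concrete, purely illustrative picture of such a training sample, the record below stores the classification label, the face frame and the labelled feature points; the field names and the choice of five landmark points are assumptions, not part of the patent.

```python
# Illustrative only: one possible record layout for a training sample.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class TrainingSample:
    image_path: str
    is_face: int                                           # classification label: 1 face, 0 other
    face_box: Optional[Tuple[int, int, int, int]] = None   # uniquely determined face frame (x, y, w, h)
    landmarks: List[Tuple[int, int]] = field(default_factory=list)  # e.g. eyes, nose, mouth corners

positive = TrainingSample("faces/0001.jpg", 1, (32, 40, 96, 96),
                          [(60, 70), (100, 70), (80, 95), (65, 115), (95, 115)])
negative = TrainingSample("other/0042.jpg", 0)             # no frame or landmarks on non-face pictures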
S102, inputting the training samples into a three-level convolution neural network for gradual training;
the step-by-step training means that a three-level convolutional neural network is trained according to the sequence of a first-level network, a second-level network and a third-level network in sequence, the three-level convolutional neural network has learning capacity, a picture classification mode can be learned after training, corresponding positions can be found in pictures and framed by rectangular frames, and even the positions of the rectangular frames can be further corrected by introducing human face characteristic points, so that when a large number of different pictures are input, human face classification and positioning can be achieved through the trained three-level convolutional neural network.
In step S102, the training further includes the steps of:
s1021, performing post-prediction dimensionality reduction according to the training samples and the training results of the previous n levels to obtain corresponding two-dimensional feature vectors, and calculating to obtain a first offset according to the corresponding two-dimensional feature vectors;
the training result refers to a result obtained after network prediction, dimension reduction and regression correction of each level in the three-level convolutional neural network; when the training sample is input into the first-level network, the training result of the first n levels is 'null', when the training sample is input into the second-level network, the training result of the first n levels is 'the training result of the first-level network', and when the training sample is input into the third-level network, the training result of the previous level is 'the training result of the first-level network' and 'the training result of the second-level network'; the prediction and dimension reduction means that in the training process, input training samples are classified, the positions of human faces are predicted, and the input training samples are converted into two-dimensional feature vectors convenient for operation; the first offset is the difference of a two-dimensional feature vector obtained after prediction and dimensionality reduction relative to a training sample (mainly the difference of a predicted value, a classification label in the training sample, a uniquely determined face frame and a labeled face feature point) in the training process, namely the difference between the predicted value and a pre-predicted value; preferably, the calculation between the two is performed by a loss function. The defect of poor performance of the previous network (the training result and the training sample of the previous network are both input into the next network) is compensated through the next network, so that the accuracy of picture classification is improved, the recall rate and the accuracy of face detection are improved, and the performance of the whole network is improved.
In step S1021, the obtaining of the two-dimensional feature vector comprises the steps of:
s201, obtaining an m-dimensional feature vector according to the training sample and the training result of the previous n levels;
the method comprises the steps that a prediction structure is arranged in front of a full convolution layer/a full connection layer, m-dimensional characteristic vectors of all networks are obtained through the structure prediction, and the m-dimensional characteristic vectors are different because the structures of a primary network, a secondary network and a tertiary network are different and pictures input into the networks for training are also different; the second-level network corrects the error part of the first-level network prediction, and the third-level network corrects the second-level network; the main point of correction is that the situation that the picture which is not classified to the face picture set but contains the label or the picture which does not contain the label but is classified to the face picture set can occur in the result obtained by the primary/secondary network prediction; the probability of the occurrence of the above situation can be greatly reduced through the two-stage/three-stage network, so that the three-stage convolutional neural network has the self-purification capability.
In the three-level network in step S201, the obtaining of the m-dimensional feature vector includes the steps of:
s301, inputting a training sample and a training result of a previous stage into a first branch to obtain a first feature vector, inputting the first feature vector into a second branch to obtain a second feature vector, and inputting the second feature vector into a third branch to obtain a third feature vector;
s302, splicing the first feature vector, the second feature vector and the third feature vector to obtain an m-dimensional feature vector.
The prediction structures in all levels of the network have a splicing function. In the third-level network, each branch operates separately to obtain a different feature vector; the dimensionalities of these feature vectors (namely the first, second and third feature vectors) differ, and they are spliced to obtain the m-dimensional feature vector. In the second-level network the splicing works in the same way but one branch is omitted, so there is no third feature vector; in the first-level network there is only one branch, so the spliced result is simply the result of that branch. This prepares for conversion into the two-dimensional feature vector and expresses the face in vector form, which makes calculation more convenient. Specifically, the corresponding training data are input into the three branches respectively. The first branch is identical to 12-net and yields an m-dimensional (for example 16-dimensional) feature vector before the full convolution; the second branch yields an n-dimensional (for example 128-dimensional) face feature vector after the layer before the 24-net full connection layer; and the third branch yields a p-dimensional (for example 256-dimensional) face feature vector after the layer before the 48-net full connection layer. The three feature vectors are then spliced. Suppose X1 is the feature vector of 12-net, X2 is the feature vector of 24-net and X3 is the feature vector of 48-net; splicing the three vectors gives a 400-dimensional ((m + n + p)-dimensional) vector X4 = [X1, X2, X3], and X4 is passed through the full connection layer.
S202, dimension reduction processing is carried out on the m-dimensional feature vector through a full convolution layer/full connection layer, and the two-dimensional feature vector is obtained.
A prediction structure for prediction is arranged before the full convolution layer. Through this structure, the face pictures from the training sample are by default regarded as one class and the other pictures as another class; the predicted face frame and predicted face feature points of the face picture set are obtained and expressed in the form of an m-dimensional feature vector. The role of the full convolution layer is to reduce the multi-dimensional feature vector to two dimensions, so the m-dimensional feature vector is passed through the full convolution layer to obtain the two-dimensional feature vector, which facilitates calculating the offset between the predicted value and the training sample.
S1022, performing regression correction on the two-dimensional feature vector through the first offset to obtain a corresponding training result;
a feedback structure is arranged behind the full convolution/connection layer in each level of network, and regression correction is carried out on the predicted value through the feedback structure; the regression correction is to compensate the predicted value through the first offset, correct the offset generated by classification, the offset generated by a face frame and the offset generated by a face characteristic point, so that the face classification and the face positioning are more accurate, the finally obtained face frame is more accurate, the regression correction of classifying the part of the network which is classified correctly is not performed, the network performance is further mined, and the detection speed is ensured.
In step S1022, the obtaining of the first offset amount includes the following steps:
s401, inputting the two-dimensional feature vector into a SoftmaxWithLoss layer, and calculating to obtain a classification offset;
After the two-dimensional feature vector is obtained, the classification offset is calculated by the SoftmaxWithLoss layer and fed back to the calculated weight W and bias term b; that is, regression correction of the classification can be carried out through the classification offset, improving the recall rate and accuracy of the classification.
In step S401, the calculation of the classification offset includes the steps of:
s501, defining the two-dimensional feature vector;
is defined as Z ═ { Z ═ Z1,z2Therein of
Figure BDA0001225162080000085
S502, classifying through a softmax function; the method is divided into two types, and is characterized in that:
Figure BDA0001225162080000091
s503, calculating the difference between the predicted two-dimensional feature vector and the training sample through a loss function;
the loss function is:
Figure BDA0001225162080000092
wherein
Figure BDA0001225162080000093
Computing
Figure BDA0001225162080000094
Correction
Figure BDA0001225162080000095
Where α is a coefficient.
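A numeric reading of steps S501 to S503 is sketched below in NumPy. It assumes that the SoftmaxWithLoss layer computes the usual two-class cross-entropy and that the correction subtracts the α-scaled gradient; both are plausible interpretations of the formulas that appear only as figures in the original text, not statements of the patent itself.

```python
# Sketch (assumptions noted above): two-class softmax, cross-entropy loss,
# and an alpha-scaled correction of the 2-D feature vector Z = {z1, z2}.
import numpy as np

def softmax2(z):
    e = np.exp(z - z.max())                # subtract the max for numerical stability
    return e / e.sum()                     # p_i = e^{z_i} / (e^{z_1} + e^{z_2})

def classification_offset(z, label, alpha=0.1):
    """label: 1 for a face picture, 0 for any other picture."""
    p = softmax2(z)
    target = np.array([1.0 - label, float(label)])
    loss = -np.sum(target * np.log(p))     # difference between prediction and training sample
    grad = p - target                      # dL/dz for softmax + cross-entropy
    z_corrected = z - alpha * grad         # correction scaled by the coefficient alpha
    return loss, grad, z_corrected

loss, offset, z_new = classification_offset(np.array([0.3, 1.2]), label=1)
```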
S402, inputting the two-dimensional feature vector into an Euclidean Loss layer, and calculating to obtain a human face frame offset and a human face feature point offset.
Regression correction of the face-frame offset and the face feature point offset is carried out in each level of the network through a Euclidean-distance loss function, so that the finally obtained face rectangular frame is corrected and the face recognition rate is further improved on the premise of ensuring the face recognition speed.
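A minimal sketch of the Euclidean Loss computation for the face-frame and feature-point offsets follows; the particular offset parameterisation shown in the example values is an assumption, not something the patent specifies.

```python
# Sketch: Euclidean (L2) loss between predicted and labelled offsets.
import numpy as np

def euclidean_loss(pred, target):
    diff = pred - target
    return 0.5 * np.dot(diff, diff)        # 1/2 * ||pred - target||^2, EuclideanLoss-style

# face-frame offset, e.g. (dx, dy, dw, dh); feature-point offsets, (x, y) per landmark
box_loss = euclidean_loss(np.array([0.05, -0.02, 0.10, 0.08]),
                          np.array([0.04, -0.01, 0.12, 0.07]))
lmk_loss = euclidean_loss(np.random.rand(10), np.random.rand(10))
```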
S103, inputting the detection picture into the three-level convolutional neural network for gradual face detection to obtain a face rectangular frame.
The detection results are obtained by classifying the input detection pictures through the networks at each level of the three-level convolutional neural network; a detection result is the collective name for a face position and its face feature points, i.e. the face candidate frame obtained by each network. Corresponding to the three networks there are three detection results; each is screened, regression-corrected and merged and then input into the next level for detection, and finally the face rectangular frame is obtained. The face rectangular frame is the rectangular frame obtained by first screening through a specific procedure, then correcting through the combination of the face feature point offsets and the face frame offsets, and then merging identical or similar face frames; information such as the face position can be determined from the face rectangular frame.
In step S103, the obtaining of the face rectangular frame includes the following steps:
S601, inputting the detection picture into the primary network, and screening, regression-correcting and merging the resulting candidate frames to obtain a first face candidate frame;
S602, inputting the first face candidate frame into the secondary network, and screening, regression-correcting and merging to obtain a second face candidate frame;
and S603, inputting the second face candidate frame into the third-level network, and screening, regression-correcting and merging to obtain the face rectangular frame.
The first-level network detection yields the first face candidate frame, the second-level network detection yields the second face candidate frame, and the third-level network detection yields the face rectangular frame (the three candidate frames correspond to the three detection results in step S103); the first two detection results are screened, regression-corrected and merged to obtain the second face candidate frame and the final face rectangular frame respectively. Further, after the first face candidate frame is obtained, the corresponding region is cut from the original image, resized to 24 × 24 px and input into the second-level network for detection; after the second face candidate frame is obtained, the corresponding region is cut from the original image, resized to 48 × 48 px and input into the third-level network for detection; after this detection, the face rectangular frame is obtained through screening, regression correction and merging. Detecting level by level in this way yields an accurate face rectangular frame (face position) and further improves the recall rate and accuracy of detection.
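The crop-and-resize hand-off between levels might look like the sketch below; the OpenCV resizing call and the helper names are assumptions, and `stage_net` stands in for the trained second-level or third-level network.

```python
# Sketch: pass each surviving candidate frame to the next level at its input size.
import cv2
import numpy as np

def crop_and_resize(image, box, size):
    """box = (x, y, w, h) in the original image; size = 24 or 48 (no boundary clipping here)."""
    x, y, w, h = [int(round(v)) for v in box]
    crop = image[y:y + h, x:x + w]
    return cv2.resize(crop, (size, size))

def run_stage(stage_net, image, boxes, size):
    crops = np.stack([crop_and_resize(image, b, size) for b in boxes])
    return stage_net(crops)                # returns the refined candidate frames (hypothetical)
```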
In step S103, the screening, regression correction, and combination includes the following steps:
s701, screening out a face candidate frame larger than a set probability threshold value according to the detected picture, the first face candidate frame, the second face candidate frame and the corresponding face probability;
s702, calculating to obtain a second offset according to the face candidate frame obtained after screening, and performing regression correction on the second offset;
and S703, merging the face candidate frames obtained after correction through a non-maximum suppression algorithm to obtain a first face candidate frame/a second face candidate frame/a face rectangular frame.
The face probability refers to the probability that a picture classified into the face picture set actually contains a face. The face probability is compared with a set probability threshold, and candidate frames whose probability is below the set value are deleted, giving the screened face candidate frames. The second offset is calculated through the SoftmaxWithLoss layer and the Euclidean Loss layer and comprises the picture classification offset, the detected face-frame offset and the detected face feature point offset generated in the detection process; regression correction is performed on the screened face candidate frames through these offsets to obtain corrected face candidate frames. The corrected face candidate frames are then merged through a non-maximum suppression algorithm: the face frames are sorted by face probability, the frame with the maximum probability is picked out and its degree of coincidence with the other frames is calculated, and frames whose coincidence exceeds a certain threshold are deleted, thereby achieving frame merging and obtaining the first face candidate frame/the second face candidate frame/the face rectangular frame. Screening, regression correction and frame merging further improve the recall rate and accuracy of face detection while ensuring detection speed.
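The screening and frame-merging step can be sketched as follows; the 0.6/0.7 thresholds are illustrative, and intersection-over-union is used here as the degree of coincidence, which is the usual choice for non-maximum suppression rather than something the patent fixes.

```python
# Sketch: keep frames above a probability threshold, then merge near-duplicates
# by non-maximum suppression (overlap measured as intersection-over-union).
def iou(a, b):                              # boxes as (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def screen_and_merge(boxes, probs, prob_thresh=0.6, overlap_thresh=0.7):
    keep = [i for i, p in enumerate(probs) if p > prob_thresh]       # screening
    order = sorted(keep, key=lambda i: probs[i], reverse=True)       # sort by face probability
    merged = []
    while order:
        best = order.pop(0)                                          # highest-probability frame
        merged.append(boxes[best])
        order = [i for i in order if iou(boxes[best], boxes[i]) <= overlap_thresh]
    return merged
```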
The face detection method based on the three-level convolutional neural network has the following beneficial effects:
1) in the training process, the training results of the first n levels are added as input to the following level, which avoids the loss of training data between levels, improves the accuracy and recall rate of face detection, and improves the performance of the whole network;
2) face feature points are added to the training samples, and the classification of faces and the positioning precision of the face rectangular frame are improved through the face feature points, so that the performance of the network approaches its upper limit and the recall rate and accuracy of face detection are further improved;
3) regression correction of image classification is performed only through the classification offset in the calculated first (or second) offset, which ensures that correctly classified parts are not regression-corrected again, improves the speed of face detection, and further exploits the performance of the network.
As shown in fig. 8, an embodiment of the present invention further provides a face detection system based on a three-level convolutional neural network, including the three-level convolutional neural network, where the three-level convolutional neural network includes:
the device comprises an acquisition unit 1, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a training sample and a detection picture; the training sample at least comprises a face picture marked with face characteristic points;
the network training unit 2 is used for inputting the training samples into a three-level convolutional neural network for step-by-step training;
it includes: a feature vector module and a regression correction module,
the feature vector module 21 is configured to perform post-prediction dimensionality reduction according to the training samples and the training results of the previous n levels to obtain corresponding two-dimensional feature vectors, and calculate a first offset according to the corresponding two-dimensional feature vectors;
the regression correction module 22 is configured to perform regression correction on the two-dimensional feature vector through the first offset to obtain a corresponding training result;
and the face detection unit 3 is used for inputting the detection picture into the trained three-level convolutional neural network to carry out face detection step by step so as to obtain a face rectangular frame.
The face detection system based on the three-level convolutional neural network has the following beneficial effects:
1) the secondary network and the tertiary network in the network training unit 2 (or the face detection unit 3) compensate for the poor performance of the preceding network, so that the accuracy of picture classification is improved, the recall rate and accuracy of face detection are improved, and the performance of the whole network is improved;
2) face feature points are added to the face pictures in the training samples of the acquisition unit 1, and the classification of faces and the positioning precision of the face rectangular frame are improved through the face feature points, so that the performance of the network approaches its upper limit and the recall rate and accuracy of face detection are further improved;
3) regression correction of image classification is performed only through the classification offset obtained by the cooperation of the feature vector module 21 and the regression correction module 22, which ensures that correctly classified parts no longer need correction, improves the speed of face detection, and further exploits the performance of the network.
While certain exemplary embodiments of the present invention have been described above by way of illustration only, it will be apparent to those of ordinary skill in the art that the described embodiments may be modified in various different ways without departing from the spirit and scope of the invention. Accordingly, the drawings and description are illustrative in nature and should not be construed as limiting the scope of the invention.

Claims (7)

1. A face detection method based on a three-level convolutional neural network is characterized by comprising the following steps:
acquiring a training sample and a detection picture; the training sample at least comprises a face picture marked with a face frame and face characteristic points;
inputting the training samples into a three-level convolutional neural network for training step by step, wherein the training process comprises the following steps:
performing dimensionality reduction after prediction according to the training samples and the training results of the previous n levels to obtain corresponding two-dimensional feature vectors, and calculating to obtain a first offset according to the corresponding two-dimensional feature vectors;
performing regression correction on the two-dimensional feature vector through the first offset to obtain a corresponding training result;
the obtaining of the two-dimensional feature vector comprises the following steps:
obtaining an m-dimensional feature vector according to the training sample and the training result of the previous n levels;
performing dimensionality reduction processing on the m-dimensional feature vector through a full convolution layer/full connection layer to obtain the two-dimensional feature vector;
the three-level convolutional neural network comprises a first-level network, a second-level network and a third-level network, wherein the third-level network comprises a first branch, a second branch and a third branch, the second-level network comprises the first branch and the second branch, and the first branch is the same as the first-level network;
in a three-level network, the obtaining of the m-dimensional feature vector comprises the following steps:
inputting the training sample and the training result of the previous stage into the first branch to obtain a first feature vector, inputting the first feature vector into the second branch to obtain a second feature vector, and inputting the second feature vector into the third branch to obtain a third feature vector;
splicing the first feature vector, the second feature vector and the third feature vector to obtain an m-dimensional feature vector;
and inputting the detection picture into the trained three-level convolutional neural network for carrying out face detection step by step to obtain a face rectangular frame.
2. The method of claim 1, wherein the face images in the training samples further comprise image classification labels.
3. The face detection method according to claim 1, wherein the obtaining of the first offset comprises the following steps:
inputting the two-dimensional feature vector into a SoftmaxWithLoss layer, and calculating to obtain a classification offset;
and inputting the two-dimensional feature vector into an Euclidean Loss layer, and calculating to obtain the offset of a face frame and the offset of the face feature point.
4. The face detection method of claim 3, wherein the calculation of the classification offset comprises the steps of:
defining the two-dimensional feature vector as Z = {z1, z2};
classifying through a softmax function into two classes, with p_i = e^{z_i} / (e^{z_1} + e^{z_2}) for i = 1, 2;
calculating the difference between the predicted two-dimensional feature vector and the training sample through a loss function;
computing the classification offset from the loss function and correcting the two-dimensional feature vector by the offset scaled by a coefficient α.
5. The face detection method of claim 1, wherein the obtaining of the face rectangular frame comprises the following steps:
inputting the detection picture into the primary network, and screening, regression-correcting and merging the resulting candidate frames to obtain a first face candidate frame;
inputting the first face candidate frame into the secondary network, and screening, regression-correcting and merging to obtain a second face candidate frame;
and inputting the second face candidate frame into the third-level network, and screening, regression-correcting and merging to obtain the face rectangular frame.
6. The face detection method of claim 5, wherein the steps of screening, regression correcting and merging comprise:
screening out the face candidate frames larger than a set probability threshold value according to the detected picture, the first face candidate frame, the second face candidate frame and the corresponding face probability;
calculating to obtain a second offset according to the face candidate frame obtained after screening, and performing regression correction on the second offset;
and combining the face candidate frames obtained after correction through a non-maximum suppression algorithm to obtain a first face candidate frame/a second face candidate frame/a face rectangular frame.
7. A face detection system based on a three-level convolutional neural network is characterized by comprising the three-level convolutional neural network, wherein the three-level convolutional neural network comprises:
the acquisition unit is used for acquiring a training sample and a detection picture; the training sample at least comprises a face picture marked with face characteristic points;
the network training unit is used for inputting the training samples into a three-level convolutional neural network for step-by-step training;
it includes: a feature vector module and a regression correction module,
the feature vector module is used for predicting and reducing the dimension according to the training samples and the training results of the previous n levels to obtain corresponding two-dimensional feature vectors, and calculating to obtain a first offset according to the two-dimensional feature vectors;
the obtaining of the two-dimensional feature vector comprises the following steps:
obtaining an m-dimensional feature vector according to the training sample and the training result of the previous n levels;
performing dimensionality reduction processing on the m-dimensional feature vector through a full convolution layer/full connection layer to obtain the two-dimensional feature vector;
the three-level convolutional neural network comprises a first-level network, a second-level network and a third-level network, wherein the third-level network comprises a first branch, a second branch and a third branch, the second-level network comprises the first branch and the second branch, and the first branch is the same as the first-level network;
in a three-level network, the obtaining of the m-dimensional feature vector comprises the following steps:
inputting the training sample and the training result of the previous stage into the first branch to obtain a first feature vector, inputting the first feature vector into the second branch to obtain a second feature vector, and inputting the second feature vector into the third branch to obtain a third feature vector;
splicing the first feature vector, the second feature vector and the third feature vector to obtain an m-dimensional feature vector;
the regression correction module is used for performing regression correction on the two-dimensional feature vector through the first offset to obtain a corresponding training result;
and the face detection unit is used for inputting the detection picture into the trained three-level convolutional neural network to carry out face detection step by step so as to obtain a face rectangular frame.
CN201710078431.3A 2017-02-14 2017-02-14 Face detection method and system based on three-level convolutional neural network Active CN106874868B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710078431.3A CN106874868B (en) 2017-02-14 2017-02-14 Face detection method and system based on three-level convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710078431.3A CN106874868B (en) 2017-02-14 2017-02-14 Face detection method and system based on three-level convolutional neural network

Publications (2)

Publication Number Publication Date
CN106874868A CN106874868A (en) 2017-06-20
CN106874868B (en) 2020-09-18

Family

ID=59167030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710078431.3A Active CN106874868B (en) 2017-02-14 2017-02-14 Face detection method and system based on three-level convolutional neural network

Country Status (1)

Country Link
CN (1) CN106874868B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679450A (en) * 2017-08-25 2018-02-09 珠海多智科技有限公司 Obstruction conditions servant's face recognition method based on deep learning
CN107688786A (en) * 2017-08-30 2018-02-13 南京理工大学 A kind of method for detecting human face based on concatenated convolutional neutral net
CN107665355B (en) * 2017-09-27 2020-09-29 重庆邮电大学 Agricultural pest detection method based on regional convolutional neural network
CN107784288B (en) * 2017-10-30 2020-01-14 华南理工大学 Iterative positioning type face detection method based on deep neural network
CN107808142A (en) * 2017-11-09 2018-03-16 北京小米移动软件有限公司 Eyeglass detection method and device
CN107886074B (en) * 2017-11-13 2020-05-19 苏州科达科技股份有限公司 Face detection method and face detection system
CN107784294B (en) * 2017-11-15 2021-06-11 武汉烽火众智数字技术有限责任公司 Face detection and tracking method based on deep learning
CN108363957A (en) * 2018-01-19 2018-08-03 成都考拉悠然科技有限公司 Road traffic sign detection based on cascade network and recognition methods
CN108509940B (en) * 2018-04-20 2019-11-05 北京达佳互联信息技术有限公司 Facial image tracking, device, computer equipment and storage medium
CN108960064A (en) * 2018-06-01 2018-12-07 重庆锐纳达自动化技术有限公司 A kind of Face datection and recognition methods based on convolutional neural networks
CN108921131B (en) * 2018-07-26 2022-05-24 中国银联股份有限公司 Method and device for generating face detection model and three-dimensional face image
CN109344740A (en) * 2018-09-12 2019-02-15 上海了物网络科技有限公司 Face identification system, method and computer readable storage medium
CN109376693A (en) * 2018-11-22 2019-02-22 四川长虹电器股份有限公司 Method for detecting human face and system
CN109635693B (en) * 2018-12-03 2023-03-31 武汉烽火众智数字技术有限责任公司 Front face image detection method and device
CN109389105B (en) * 2018-12-20 2022-02-08 北京万里红科技有限公司 Multitask-based iris detection and visual angle classification method
CN111382297B (en) * 2018-12-29 2024-05-17 杭州海康存储科技有限公司 User side user data reporting method and device
CN109753931A (en) * 2019-01-04 2019-05-14 广州广电卓识智能科技有限公司 Convolutional neural networks training method, system and facial feature points detection method
CN110263852B (en) * 2019-06-20 2021-10-08 北京字节跳动网络技术有限公司 Data processing method and device and electronic equipment
CN110717481B (en) * 2019-12-12 2020-04-07 浙江鹏信信息科技股份有限公司 Method for realizing face detection by using cascaded convolutional neural network
CN111209819A (en) * 2019-12-30 2020-05-29 新大陆数字技术股份有限公司 Rotation-invariant face detection method, system equipment and readable storage medium
CN112232215B (en) * 2020-10-16 2021-04-06 哈尔滨市科佳通用机电股份有限公司 Railway wagon coupler yoke key joist falling fault detection method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295476A (en) * 2015-05-29 2017-01-04 腾讯科技(深圳)有限公司 Face key point localization method and device
CN105740758A (en) * 2015-12-31 2016-07-06 上海极链网络科技有限公司 Internet video face recognition method based on deep learning
CN106096535A (en) * 2016-06-07 2016-11-09 广东顺德中山大学卡内基梅隆大学国际联合研究院 A kind of face verification method based on bilinearity associating CNN
CN106228137A (en) * 2016-07-26 2016-12-14 广州市维安科技股份有限公司 A kind of ATM abnormal human face detection based on key point location

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"A Convolutional Neural Network Cascade for Face Detection";Haoxiang Li.et al;《2015 IEEE Conference on Computer Vision and Pattern Recognition》;20150612;期刊第1-4节,图1,2 *

Also Published As

Publication number Publication date
CN106874868A (en) 2017-06-20

Similar Documents

Publication Publication Date Title
CN106874868B (en) Face detection method and system based on three-level convolutional neural network
CN111709339B (en) Bill image recognition method, device, equipment and storage medium
TWI803472B (en) Method, computer program product and device for training a neural network
CN110348579B (en) Domain self-adaptive migration feature method and system
CN111144364B (en) Twin network target tracking method based on channel attention updating mechanism
CN111160407B (en) Deep learning target detection method and system
CN111931864B (en) Method and system for multiple optimization of target detector based on vertex distance and cross-over ratio
AU2016273851A1 (en) Accurate tag relevance prediction for image search
CN112149722A (en) Automatic image annotation method based on unsupervised domain adaptation
CN102165486B (en) Image characteristic amount extraction device
CN109685765B (en) X-ray film pneumonia result prediction device based on convolutional neural network
CN111160212B (en) Improved tracking learning detection system and method based on YOLOv3-Tiny
CN110991321B (en) Video pedestrian re-identification method based on tag correction and weighting feature fusion
CN111783767B (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN114022904B (en) Noise robust pedestrian re-identification method based on two stages
WO2021031704A1 (en) Object tracking method and apparatus, computer device, and storage medium
CN116089648B (en) File management system and method based on artificial intelligence
Zhu et al. Self-supervised universal domain adaptation with adaptive memory separation
CN117152587A (en) Anti-learning-based semi-supervised ship detection method and system
CN114926742A (en) Loop detection and optimization method based on second-order attention mechanism
CN111539417A (en) Text recognition training optimization method based on deep neural network
CN112487927B (en) Method and system for realizing indoor scene recognition based on object associated attention
CN113159071B (en) Cross-modal image-text association anomaly detection method
CN115546801A (en) Method for extracting paper image data features of test document
CN112989869B (en) Optimization method, device, equipment and storage medium of face quality detection model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant