CN106874868B - Face detection method and system based on three-level convolutional neural network

Info

Publication number
CN106874868B
CN106874868B
Authority
CN
China
Prior art keywords
face
feature vector
training
network
level
Prior art date
Legal status
Active
Application number
CN201710078431.3A
Other languages
Chinese (zh)
Other versions
CN106874868A (en)
Inventor
王鲁许
白洪亮
董远
Current Assignee
Beijing Feisou Technology Co ltd
Original Assignee
Beijing Feisou Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Feisou Technology Co ltd filed Critical Beijing Feisou Technology Co ltd
Priority to CN201710078431.3A
Publication of CN106874868A
Application granted
Publication of CN106874868B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face detection method and a face detection system based on a three-level convolutional neural network. The method has the following beneficial effects: in the training process, the training results of the first n levels are added as input to the following level, which avoids the loss of training data between levels, improves the accuracy and recall rate of face detection, and improves the performance of the whole network. Face feature points are added to the training samples, and the classification of faces and the positioning precision of the face rectangular frame are improved through the face feature points, so that the performance of the network approaches its upper limit and the recall rate and accuracy of face detection are further improved. Regression correction of image classification is performed only through the classification offset in the calculated first (or second) offset, which ensures that correctly classified parts are not regression-corrected again, improves the speed of face detection, and further exploits the performance of the network. The system has the same beneficial effects as the detection method.

Description

Face detection method and system based on three-level convolutional neural network
Technical Field
The invention relates to the technical field of face detection, in particular to a face detection method and a face detection system based on a three-level convolutional neural network.
Background
Since the beginning of the twenty-first century, computer technology has developed vigorously and been widely applied in many fields; with this development, face detection technology has advanced rapidly and is continuously iterated and updated. Face detection means that, for any image set, a certain strategy is adopted to search the image set so as to determine which images contain a face.
Face detection is a key link in automatic face recognition systems. Early face recognition research mainly targeted face images captured under strong constraints (such as images without background), and usually assumed that the face position was known or easily obtained, so the face detection problem was not considered separately.
With the development of applications such as electronic commerce, face recognition has become one of the most promising means of biometric identity authentication. This application background requires that an automatic face recognition system have a certain recognition capability on ordinary images, so the series of problems faced by face recognition systems has attracted researchers' attention as an independent subject. Today, the application background of face detection extends far beyond face recognition systems, and face detection has important application value in content-based retrieval, digital video processing, video surveillance, face modeling, face tracking and other areas.
Face detection technology generally adopts search strategies such as decision trees, logistic regression, naive Bayes and three-level convolutional neural networks, among which the face detection method/system based on a three-level convolutional neural network is fast in detection, high in recognition accuracy, and quick to iterate and update. In the prior art, a face detection method based on a three-level convolutional neural network comprises the following steps: 1) training step by step through a cascade of networks whose performance is enhanced level by level, where the candidate frames judged as faces by the previous level are passed to the next level as training samples for learning; 2) at each level, judging through face classification and a face-frame regression network; 3) feeding back all the corrected data directly, regardless of whether the classification is correct.
The prior art has the following defects: because the performance of the earlier-level network is poor, some faces cannot be judged correctly and are not passed on as face candidate frames to the next level, which causes losses and poor overall performance; with only face classification and face-frame regression the network performance cannot reach its upper limit, so room for improvement remains; and because all data are fed back, the network does not learn deeply enough and its performance cannot be fully exploited.
Disclosure of Invention
The invention aims to provide a face detection method and a face detection system based on a three-level convolutional neural network, so as to solve the problems that the overall performance is poor; that the performance of the network cannot reach its upper limit through face classification and face-frame correction alone; and that correctly classified parts are still subjected to regression correction.
In order to achieve the above purpose, the invention provides the following technical scheme:
a face detection method based on a three-level convolutional neural network comprises the following steps:
acquiring a training sample and a detection picture; the training sample at least comprises a face picture marked with a face frame and face characteristic points;
inputting the training samples into a three-level convolutional neural network for training step by step, wherein the training process comprises the following steps:
performing dimensionality reduction after prediction according to the training samples and the training results of the previous n levels to obtain corresponding two-dimensional feature vectors, and calculating to obtain a first offset according to the corresponding two-dimensional feature vectors;
performing regression correction on the two-dimensional feature vector through the first offset to obtain a corresponding training result;
and inputting the detection picture into the trained three-level convolutional neural network for carrying out face detection step by step to obtain a face rectangular frame.
According to the face detection method based on the three-level convolutional neural network, the face picture in the training sample also contains the picture classification label and the uniquely determined face frame.
In the above human face detection method based on the three-level convolutional neural network, the obtaining of the two-dimensional feature vector includes the following steps:
obtaining an m-dimensional feature vector according to the training sample and the training result of the previous n levels;
and performing dimensionality reduction processing on the m-dimensional feature vector through a full convolution layer/full connection layer to obtain the two-dimensional feature vector.
In the face detection method based on the three-level convolutional neural network, the three-level convolutional neural network comprises a first-level network, a second-level network and a third-level network; the third-level network includes a first branch, a second branch and a third branch, the second-level network includes the first branch and the second branch, and the first branch is the same as the first-level network.
In the above human face detection method based on the three-level convolutional neural network, in the three-level network, the obtaining of the m-dimensional feature vector includes the following steps:
inputting the training sample and the training result of the previous stage into the first branch to obtain a first feature vector, inputting the first feature vector into the second branch to obtain a second feature vector, and inputting the second feature vector into the third branch to obtain a third feature vector;
and splicing the first feature vector, the second feature vector and the third feature vector to obtain an m-dimensional feature vector.
In the above face detection method based on the three-level convolutional neural network, the obtaining of the first offset includes the following steps:
inputting the two-dimensional feature vector into a SoftmaxWithLoss layer, and calculating to obtain a classification offset;
and inputting the two-dimensional feature vector into an Euclidean Loss layer, and calculating to obtain the offset of a face frame and the offset of the face feature point.
In the above face detection method based on the three-level convolutional neural network, the calculation of the classification offset includes the following steps:
defining the two-dimensional feature vector as Z = {z1, z2};
classifying through a softmax function into two classes, with p_i = e^{z_i} / (e^{z_1} + e^{z_2}) for i = 1, 2;
calculating the difference between the predicted two-dimensional feature vector and the training sample through a loss function;
computing the classification offset from the loss function and correcting the two-dimensional feature vector by the offset scaled by a coefficient α.
In the above human face detection method based on the three-level convolutional neural network, the obtaining of the human face rectangular frame includes the following steps:
inputting the detection picture into the primary network, and screening, regression-correcting and merging the resulting candidate frames to obtain a first face candidate frame;
inputting the first face candidate frame into the secondary network, and screening, regression-correcting and merging to obtain a second face candidate frame;
and inputting the second face candidate frame into the third-level network, and screening, regression-correcting and merging to obtain the face rectangular frame.
The face detection method based on the three-level convolutional neural network comprises the following steps of screening, regression correction and combination:
screening out the face candidate frames larger than a set probability threshold value according to the detected picture, the first face candidate frame, the second face candidate frame and the corresponding face probability;
calculating to obtain a second offset according to the face candidate frame obtained after screening, and performing regression correction on the second offset;
and combining the face candidate frames obtained after correction through a non-maximum suppression algorithm to obtain a first face candidate frame/a second face candidate frame/a face rectangular frame.
The face detection method based on the three-level convolutional neural network has the following beneficial effects:
1) in the training process, the training results of the first n levels are added as input to the following level, which avoids the loss of training data between levels, improves the accuracy and recall rate of face detection, and improves the performance of the whole network;
2) face feature points are added to the training samples, and the classification of faces and the positioning precision of the face rectangular frame are improved through the face feature points, so that the performance of the network approaches its upper limit and the recall rate and accuracy of face detection are further improved;
3) regression correction of image classification is performed only through the classification offset in the calculated first (or second) offset, which ensures that correctly classified parts are not regression-corrected again, improves the speed of face detection, and further exploits the performance of the network.
A face detection system based on a three-level convolutional neural network, comprising the three-level convolutional neural network, wherein the three-level convolutional neural network comprises:
the acquisition unit is used for acquiring a training sample and a detection picture; the training sample at least comprises a face picture marked with face characteristic points;
the network training unit is used for inputting the training samples into a three-level convolutional neural network for step-by-step training;
it includes: a feature vector module and a regression correction module,
the feature vector module is used for predicting and reducing the dimension according to the training samples and the training results of the previous n levels to obtain corresponding two-dimensional feature vectors, and calculating to obtain a first offset according to the two-dimensional feature vectors;
the regression correction module is used for performing regression correction on the two-dimensional feature vector through the first offset to obtain a corresponding training result;
and the face detection unit is used for inputting the detection picture into the trained three-level convolutional neural network to carry out face detection step by step so as to obtain a face rectangular frame.
The face detection system based on the three-level convolutional neural network has the following beneficial effects:
1) the secondary network and the tertiary network in the network training unit 2 (or the face detection unit 3) compensate for the poor performance of the preceding network, so that the accuracy of picture classification is improved, the recall rate and accuracy of face detection are improved, and the performance of the whole network is improved;
2) face feature points are added to the face pictures in the training samples of the acquisition unit 1, and the classification of faces and the positioning precision of the face rectangular frame are improved through the face feature points, so that the performance of the network approaches its upper limit and the recall rate and accuracy of face detection are further improved;
3) regression correction of image classification is performed only through the classification offset obtained by the cooperation of the feature vector module 21 and the regression correction module 22, which ensures that correctly classified parts no longer need correction, improves the speed of face detection, and further exploits the performance of the network.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art from these drawings.
Fig. 1 is a structural block diagram of a face detection method based on a three-level convolutional neural network according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a face detection method based on a three-level convolutional neural network according to a preferred embodiment of the present invention;
fig. 3 is a schematic flow chart of a face detection method based on a three-level convolutional neural network according to a preferred embodiment of the present invention;
fig. 4 is a schematic flow chart of a face detection method based on a three-level convolutional neural network according to a preferred embodiment of the present invention;
fig. 5 is a schematic flow chart of a face detection method based on a three-level convolutional neural network according to a preferred embodiment of the present invention;
fig. 6 is a schematic flow chart of a face detection method based on a three-level convolutional neural network according to a preferred embodiment of the present invention;
fig. 7 is a schematic flowchart of a face detection method based on a three-level convolutional neural network according to a preferred embodiment of the present invention;
fig. 8 is a schematic structural diagram of a face detection system based on a three-level convolutional neural network according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a primary network according to a preferred embodiment of the present invention;
fig. 10 is a schematic structural diagram of a secondary network according to a preferred embodiment of the present invention;
fig. 11 is a schematic structural diagram of a three-stage network according to a preferred embodiment of the present invention.
Description of reference numerals:
1. an acquisition unit; 2. a network training unit; 21. a feature vector module; 22. a regression correction module; 3. a face detection unit.
Detailed Description
In order to make the technical solutions of the present invention better understood, the present invention is described in further detail below with reference to the accompanying drawings.
As shown in fig. 1-7 and 9-11, the face detection method based on the three-level convolutional neural network according to the embodiment of the present invention includes the following steps:
s101, obtaining a training sample and a detection picture; the training sample at least comprises a face picture marked with a face frame and face characteristic points;
as shown in fig. 9-11, further, the three-level convolutional neural network includes a first-level network, a second-level network and a third-level network, the third-level network includes a first branch, a second branch and a third branch, the second-level network includes the first branch and the second branch, and the first branch is identical to the first-level network. The network structure of the first branch is completely the same as that of the primary network, so that the first branch is easy to distinguish, and 12-net represents the primary network, 24-net represents the secondary network, and 48-net represents the tertiary network in the figure; namely, the 24-net comprises a 12-net branch and a 24-net branch, the 48-net comprises a 12-net branch, a 24-net branch and a 48-net branch, and the 12-net, the 24-net and the 48-net are connected step by step, so that training samples can be selected step by step, other pictures without faces are eliminated, and accurate face pictures and corresponding more accurate face frames (face position determination) are obtained.
Furthermore, the face pictures in the training samples also contain picture classification labels. Specifically, the training samples consist of face pictures and other pictures, where each face picture carries a classification label, a uniquely determined face frame and labelled face feature point information. Picture classification training can be carried out through the classification labels, that is, the training samples are divided into a labelled face picture set and a set of other pictures. The rectangular area of the face in a face picture can be determined through the face frame, so framing this area determines the face position. The face feature points (landmark points) are salient parts such as the nose, eyes, mouth, forehead and face contour line, through which differences between faces can easily be judged. Because the face frame alone only roughly determines the face position, the face can be positioned more accurately through the face feature points: by enlarging or shrinking the face frame so that the face feature points fall within its range, the positioning precision of the face frame is improved. The detection pictures are a set of face pictures, environment pictures and other arbitrary pictures; after training is finished, face detection can be performed on the detection pictures. A training sample can be acquired, for example, by taking face pictures from an existing face library in the prior art or obtaining them by means such as 3D printing, adding the classification label and the uniquely determined face frame, labelling the face feature points, and mixing them with other pictures.
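As a concrete, purely illustrative picture of such a training sample, the record below stores the classification label, the face frame and the labelled feature points; the field names and the choice of five landmark points are assumptions, not part of the patent.

```python
# Illustrative only: one possible record layout for a training sample.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class TrainingSample:
    image_path: str
    is_face: int                                           # classification label: 1 face, 0 other
    face_box: Optional[Tuple[int, int, int, int]] = None   # uniquely determined face frame (x, y, w, h)
    landmarks: List[Tuple[int, int]] = field(default_factory=list)  # e.g. eyes, nose, mouth corners

positive = TrainingSample("faces/0001.jpg", 1, (32, 40, 96, 96),
                          [(60, 70), (100, 70), (80, 95), (65, 115), (95, 115)])
negative = TrainingSample("other/0042.jpg", 0)             # no frame or landmarks on non-face pictures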
S102, inputting the training samples into a three-level convolution neural network for gradual training;
the step-by-step training means that a three-level convolutional neural network is trained according to the sequence of a first-level network, a second-level network and a third-level network in sequence, the three-level convolutional neural network has learning capacity, a picture classification mode can be learned after training, corresponding positions can be found in pictures and framed by rectangular frames, and even the positions of the rectangular frames can be further corrected by introducing human face characteristic points, so that when a large number of different pictures are input, human face classification and positioning can be achieved through the trained three-level convolutional neural network.
In step S102, the training further includes the steps of:
s1021, performing post-prediction dimensionality reduction according to the training samples and the training results of the previous n levels to obtain corresponding two-dimensional feature vectors, and calculating to obtain a first offset according to the corresponding two-dimensional feature vectors;
the training result refers to a result obtained after network prediction, dimension reduction and regression correction of each level in the three-level convolutional neural network; when the training sample is input into the first-level network, the training result of the first n levels is 'null', when the training sample is input into the second-level network, the training result of the first n levels is 'the training result of the first-level network', and when the training sample is input into the third-level network, the training result of the previous level is 'the training result of the first-level network' and 'the training result of the second-level network'; the prediction and dimension reduction means that in the training process, input training samples are classified, the positions of human faces are predicted, and the input training samples are converted into two-dimensional feature vectors convenient for operation; the first offset is the difference of a two-dimensional feature vector obtained after prediction and dimensionality reduction relative to a training sample (mainly the difference of a predicted value, a classification label in the training sample, a uniquely determined face frame and a labeled face feature point) in the training process, namely the difference between the predicted value and a pre-predicted value; preferably, the calculation between the two is performed by a loss function. The defect of poor performance of the previous network (the training result and the training sample of the previous network are both input into the next network) is compensated through the next network, so that the accuracy of picture classification is improved, the recall rate and the accuracy of face detection are improved, and the performance of the whole network is improved.
In step S1021, the obtaining of the two-dimensional feature vector comprises the steps of:
s201, obtaining an m-dimensional feature vector according to the training sample and the training result of the previous n levels;
the method comprises the steps that a prediction structure is arranged in front of a full convolution layer/a full connection layer, m-dimensional characteristic vectors of all networks are obtained through the structure prediction, and the m-dimensional characteristic vectors are different because the structures of a primary network, a secondary network and a tertiary network are different and pictures input into the networks for training are also different; the second-level network corrects the error part of the first-level network prediction, and the third-level network corrects the second-level network; the main point of correction is that the situation that the picture which is not classified to the face picture set but contains the label or the picture which does not contain the label but is classified to the face picture set can occur in the result obtained by the primary/secondary network prediction; the probability of the occurrence of the above situation can be greatly reduced through the two-stage/three-stage network, so that the three-stage convolutional neural network has the self-purification capability.
In the three-level network in step S201, the obtaining of the m-dimensional feature vector includes the steps of:
s301, inputting a training sample and a training result of a previous stage into a first branch to obtain a first feature vector, inputting the first feature vector into a second branch to obtain a second feature vector, and inputting the second feature vector into a third branch to obtain a third feature vector;
s302, splicing the first feature vector, the second feature vector and the third feature vector to obtain an m-dimensional feature vector.
The prediction structures in all levels of the network have a splicing function. In the third-level network, each branch operates separately to obtain a different feature vector; the dimensionalities of these feature vectors (namely the first, second and third feature vectors) differ, and they are spliced to obtain the m-dimensional feature vector. In the second-level network the splicing works in the same way but one branch is omitted, so there is no third feature vector; in the first-level network there is only one branch, so the spliced result is simply the result of that branch. This prepares for conversion into the two-dimensional feature vector and expresses the face in vector form, which makes calculation more convenient. Specifically, the corresponding training data are input into the three branches respectively. The first branch is identical to 12-net and yields an m-dimensional (for example 16-dimensional) feature vector before the full convolution; the second branch yields an n-dimensional (for example 128-dimensional) face feature vector after the layer before the 24-net full connection layer; and the third branch yields a p-dimensional (for example 256-dimensional) face feature vector after the layer before the 48-net full connection layer. The three feature vectors are then spliced. Suppose X1 is the feature vector of 12-net, X2 is the feature vector of 24-net and X3 is the feature vector of 48-net; splicing the three vectors gives a 400-dimensional ((m + n + p)-dimensional) vector X4 = [X1, X2, X3], and X4 is passed through the full connection layer.
S202, dimension reduction processing is carried out on the m-dimensional feature vector through a full convolution layer/full connection layer, and the two-dimensional feature vector is obtained.
A prediction structure for prediction is arranged before the full convolution layer. Through this structure, the face pictures from the training sample are by default regarded as one class and the other pictures as another class; the predicted face frame and predicted face feature points of the face picture set are obtained and expressed in the form of an m-dimensional feature vector. The role of the full convolution layer is to reduce the multi-dimensional feature vector to two dimensions, so the m-dimensional feature vector is passed through the full convolution layer to obtain the two-dimensional feature vector, which facilitates calculating the offset between the predicted value and the training sample.
S1022, performing regression correction on the two-dimensional feature vector through the first offset to obtain a corresponding training result;
a feedback structure is arranged behind the full convolution/connection layer in each level of network, and regression correction is carried out on the predicted value through the feedback structure; the regression correction is to compensate the predicted value through the first offset, correct the offset generated by classification, the offset generated by a face frame and the offset generated by a face characteristic point, so that the face classification and the face positioning are more accurate, the finally obtained face frame is more accurate, the regression correction of classifying the part of the network which is classified correctly is not performed, the network performance is further mined, and the detection speed is ensured.
In step S1022, the obtaining of the first offset amount includes the following steps:
s401, inputting the two-dimensional feature vector into a SoftmaxWithLoss layer, and calculating to obtain a classification offset;
After the two-dimensional feature vector is obtained, the classification offset is calculated by the SoftmaxWithLoss layer and fed back to the calculated weight W and bias term b; that is, regression correction of the classification can be carried out through the classification offset, improving the recall rate and accuracy of the classification.
In step S401, the calculation of the classification offset includes the steps of:
s501, defining the two-dimensional feature vector;
is defined as Z ═ { Z ═ Z1,z2Therein of
Figure BDA0001225162080000085
S502, classifying through a softmax function; the method is divided into two types, and is characterized in that:
Figure BDA0001225162080000091
s503, calculating the difference between the predicted two-dimensional feature vector and the training sample through a loss function;
the loss function is:
Figure BDA0001225162080000092
wherein
Figure BDA0001225162080000093
Computing
Figure BDA0001225162080000094
Correction
Figure BDA0001225162080000095
Where α is a coefficient.
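A numeric reading of steps S501 to S503 is sketched below in NumPy. It assumes that the SoftmaxWithLoss layer computes the usual two-class cross-entropy and that the correction subtracts the α-scaled gradient; both are plausible interpretations of the formulas that appear only as figures in the original text, not statements of the patent itself.

```python
# Sketch (assumptions noted above): two-class softmax, cross-entropy loss,
# and an alpha-scaled correction of the 2-D feature vector Z = {z1, z2}.
import numpy as np

def softmax2(z):
    e = np.exp(z - z.max())                # subtract the max for numerical stability
    return e / e.sum()                     # p_i = e^{z_i} / (e^{z_1} + e^{z_2})

def classification_offset(z, label, alpha=0.1):
    """label: 1 for a face picture, 0 for any other picture."""
    p = softmax2(z)
    target = np.array([1.0 - label, float(label)])
    loss = -np.sum(target * np.log(p))     # difference between prediction and training sample
    grad = p - target                      # dL/dz for softmax + cross-entropy
    z_corrected = z - alpha * grad         # correction scaled by the coefficient alpha
    return loss, grad, z_corrected

loss, offset, z_new = classification_offset(np.array([0.3, 1.2]), label=1)
```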
S402, inputting the two-dimensional feature vector into an Euclidean Loss layer, and calculating to obtain a human face frame offset and a human face feature point offset.
Regression correction of the face-frame offset and the face feature point offset is carried out in each level of the network through a Euclidean-distance loss function, so that the finally obtained face rectangular frame is corrected and the face recognition rate is further improved on the premise of ensuring the face recognition speed.
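A minimal sketch of the Euclidean Loss computation for the face-frame and feature-point offsets follows; the particular offset parameterisation shown in the example values is an assumption, not something the patent specifies.

```python
# Sketch: Euclidean (L2) loss between predicted and labelled offsets.
import numpy as np

def euclidean_loss(pred, target):
    diff = pred - target
    return 0.5 * np.dot(diff, diff)        # 1/2 * ||pred - target||^2, EuclideanLoss-style

# face-frame offset, e.g. (dx, dy, dw, dh); feature-point offsets, (x, y) per landmark
box_loss = euclidean_loss(np.array([0.05, -0.02, 0.10, 0.08]),
                          np.array([0.04, -0.01, 0.12, 0.07]))
lmk_loss = euclidean_loss(np.random.rand(10), np.random.rand(10))
```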
S103, inputting the detection picture into the three-level convolutional neural network for gradual face detection to obtain a face rectangular frame.
The detection results are obtained by classifying the input detection pictures through the networks at each level of the three-level convolutional neural network; a detection result is the collective name for a face position and its face feature points, i.e. the face candidate frame obtained by each network. Corresponding to the three networks there are three detection results; each is screened, regression-corrected and merged and then input into the next level for detection, and finally the face rectangular frame is obtained. The face rectangular frame is the rectangular frame obtained by first screening through a specific procedure, then correcting through the combination of the face feature point offsets and the face frame offsets, and then merging identical or similar face frames; information such as the face position can be determined from the face rectangular frame.
In step S103, the obtaining of the face rectangular frame includes the following steps:
S601, inputting the detection picture into the primary network, and screening, regression-correcting and merging the resulting candidate frames to obtain a first face candidate frame;
S602, inputting the first face candidate frame into the secondary network, and screening, regression-correcting and merging to obtain a second face candidate frame;
and S603, inputting the second face candidate frame into the third-level network, and screening, regression-correcting and merging to obtain the face rectangular frame.
The first-level network detection yields the first face candidate frame, the second-level network detection yields the second face candidate frame, and the third-level network detection yields the face rectangular frame (the three candidate frames correspond to the three detection results in step S103); the first two detection results are screened, regression-corrected and merged to obtain the second face candidate frame and the final face rectangular frame respectively. Further, after the first face candidate frame is obtained, the corresponding region is cut from the original image, resized to 24 × 24 px and input into the second-level network for detection; after the second face candidate frame is obtained, the corresponding region is cut from the original image, resized to 48 × 48 px and input into the third-level network for detection; after this detection, the face rectangular frame is obtained through screening, regression correction and merging. Detecting level by level in this way yields an accurate face rectangular frame (face position) and further improves the recall rate and accuracy of detection.
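The crop-and-resize hand-off between levels might look like the sketch below; the OpenCV resizing call and the helper names are assumptions, and `stage_net` stands in for the trained second-level or third-level network.

```python
# Sketch: pass each surviving candidate frame to the next level at its input size.
import cv2
import numpy as np

def crop_and_resize(image, box, size):
    """box = (x, y, w, h) in the original image; size = 24 or 48 (no boundary clipping here)."""
    x, y, w, h = [int(round(v)) for v in box]
    crop = image[y:y + h, x:x + w]
    return cv2.resize(crop, (size, size))

def run_stage(stage_net, image, boxes, size):
    crops = np.stack([crop_and_resize(image, b, size) for b in boxes])
    return stage_net(crops)                # returns the refined candidate frames (hypothetical)
```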
In step S103, the screening, regression correction, and combination includes the following steps:
s701, screening out a face candidate frame larger than a set probability threshold value according to the detected picture, the first face candidate frame, the second face candidate frame and the corresponding face probability;
s702, calculating to obtain a second offset according to the face candidate frame obtained after screening, and performing regression correction on the second offset;
and S703, merging the face candidate frames obtained after correction through a non-maximum suppression algorithm to obtain a first face candidate frame/a second face candidate frame/a face rectangular frame.
The face probability refers to the probability that a picture classified into the face picture set actually contains a face. The face probability is compared with a set probability threshold, and candidate frames whose probability is below the set value are deleted, giving the screened face candidate frames. The second offset is calculated through the SoftmaxWithLoss layer and the Euclidean Loss layer and comprises the picture classification offset, the detected face-frame offset and the detected face feature point offset generated in the detection process; regression correction is performed on the screened face candidate frames through these offsets to obtain corrected face candidate frames. The corrected face candidate frames are then merged through a non-maximum suppression algorithm: the face frames are sorted by face probability, the frame with the maximum probability is picked out and its degree of coincidence with the other frames is calculated, and frames whose coincidence exceeds a certain threshold are deleted, thereby achieving frame merging and obtaining the first face candidate frame/the second face candidate frame/the face rectangular frame. Screening, regression correction and frame merging further improve the recall rate and accuracy of face detection while ensuring detection speed.
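The screening and frame-merging step can be sketched as follows; the 0.6/0.7 thresholds are illustrative, and intersection-over-union is used here as the degree of coincidence, which is the usual choice for non-maximum suppression rather than something the patent fixes.

```python
# Sketch: keep frames above a probability threshold, then merge near-duplicates
# by non-maximum suppression (overlap measured as intersection-over-union).
def iou(a, b):                              # boxes as (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def screen_and_merge(boxes, probs, prob_thresh=0.6, overlap_thresh=0.7):
    keep = [i for i, p in enumerate(probs) if p > prob_thresh]       # screening
    order = sorted(keep, key=lambda i: probs[i], reverse=True)       # sort by face probability
    merged = []
    while order:
        best = order.pop(0)                                          # highest-probability frame
        merged.append(boxes[best])
        order = [i for i in order if iou(boxes[best], boxes[i]) <= overlap_thresh]
    return merged
```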
The face detection method based on the three-level convolutional neural network has the following beneficial effects:
1) in the training process, the training results of the first n levels are added as input to the following level, which avoids the loss of training data between levels, improves the accuracy and recall rate of face detection, and improves the performance of the whole network;
2) face feature points are added to the training samples, and the classification of faces and the positioning precision of the face rectangular frame are improved through the face feature points, so that the performance of the network approaches its upper limit and the recall rate and accuracy of face detection are further improved;
3) regression correction of image classification is performed only through the classification offset in the calculated first (or second) offset, which ensures that correctly classified parts are not regression-corrected again, improves the speed of face detection, and further exploits the performance of the network.
As shown in fig. 8, an embodiment of the present invention further provides a face detection system based on a three-level convolutional neural network, including the three-level convolutional neural network, where the three-level convolutional neural network includes:
the device comprises an acquisition unit 1, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a training sample and a detection picture; the training sample at least comprises a face picture marked with face characteristic points;
the network training unit 2 is used for inputting the training samples into a three-level convolutional neural network for step-by-step training;
it includes: a feature vector module and a regression correction module,
the feature vector module 21 is configured to perform post-prediction dimensionality reduction according to the training samples and the training results of the previous n levels to obtain corresponding two-dimensional feature vectors, and calculate a first offset according to the corresponding two-dimensional feature vectors;
the regression correction module 22 is configured to perform regression correction on the two-dimensional feature vector through the first offset to obtain a corresponding training result;
and the face detection unit 3 is used for inputting the detection picture into the trained three-level convolutional neural network to carry out face detection step by step so as to obtain a face rectangular frame.
The face detection system based on the three-level convolutional neural network has the following beneficial effects:
1) the secondary network and the tertiary network in the network training unit 2 (or the face detection unit 3) compensate for the poor performance of the preceding network, so that the accuracy of picture classification is improved, the recall rate and accuracy of face detection are improved, and the performance of the whole network is improved;
2) face feature points are added to the face pictures in the training samples of the acquisition unit 1, and the classification of faces and the positioning precision of the face rectangular frame are improved through the face feature points, so that the performance of the network approaches its upper limit and the recall rate and accuracy of face detection are further improved;
3) regression correction of image classification is performed only through the classification offset obtained by the cooperation of the feature vector module 21 and the regression correction module 22, which ensures that correctly classified parts no longer need correction, improves the speed of face detection, and further exploits the performance of the network.
While certain exemplary embodiments of the present invention have been described above by way of illustration only, it will be apparent to those of ordinary skill in the art that the described embodiments may be modified in various different ways without departing from the spirit and scope of the invention. Accordingly, the drawings and description are illustrative in nature and should not be construed as limiting the scope of the invention.

Claims (7)

1. A face detection method based on a three-level convolutional neural network is characterized by comprising the following steps:
acquiring a training sample and a detection picture; the training sample at least comprises a face picture marked with a face frame and face characteristic points;
inputting the training samples into a three-level convolutional neural network for training step by step, wherein the training process comprises the following steps:
performing dimensionality reduction after prediction according to the training samples and the training results of the previous n levels to obtain corresponding two-dimensional feature vectors, and calculating to obtain a first offset according to the corresponding two-dimensional feature vectors;
performing regression correction on the two-dimensional feature vector through the first offset to obtain a corresponding training result;
the obtaining of the two-dimensional feature vector comprises the following steps:
obtaining an m-dimensional feature vector according to the training sample and the training result of the previous n levels;
performing dimensionality reduction processing on the m-dimensional feature vector through a full convolution layer/full connection layer to obtain the two-dimensional feature vector;
the three-level convolutional neural network comprises a first-level network, a second-level network and a third-level network, wherein the third-level network comprises a first branch, a second branch and a third branch, the second-level network comprises the first branch and the second branch, and the first branch is the same as the first-level network;
in a three-level network, the obtaining of the m-dimensional feature vector comprises the following steps:
inputting the training sample and the training result of the previous stage into the first branch to obtain a first feature vector, inputting the first feature vector into the second branch to obtain a second feature vector, and inputting the second feature vector into the third branch to obtain a third feature vector;
splicing the first feature vector, the second feature vector and the third feature vector to obtain an m-dimensional feature vector;
and inputting the detection picture into the trained three-level convolutional neural network for carrying out face detection step by step to obtain a face rectangular frame.
2. The method of claim 1, wherein the face images in the training samples further comprise image classification labels.
3. The face detection method according to claim 1, wherein the obtaining of the first offset comprises the following steps:
inputting the two-dimensional feature vector into a SoftmaxWithLoss layer, and calculating to obtain a classification offset;
and inputting the two-dimensional feature vector into an Euclidean Loss layer, and calculating to obtain the offset of a face frame and the offset of the face feature point.
4. The face detection method of claim 3, wherein the calculation of the classification offset comprises the steps of:
defining the two-dimensional feature vector as Z = {z1, z2};
classifying through a softmax function into two classes, with p_i = e^{z_i} / (e^{z_1} + e^{z_2}) for i = 1, 2;
calculating the difference between the predicted two-dimensional feature vector and the training sample through a loss function;
computing the classification offset from the loss function and correcting the two-dimensional feature vector by the offset scaled by a coefficient α.
5. The face detection method of claim 1, wherein the obtaining of the face rectangular frame comprises the following steps:
inputting the detection picture into the primary network, and screening, regression-correcting and merging the resulting candidate frames to obtain a first face candidate frame;
inputting the first face candidate frame into the secondary network, and screening, regression-correcting and merging to obtain a second face candidate frame;
and inputting the second face candidate frame into the third-level network, and screening, regression-correcting and merging to obtain the face rectangular frame.
6. The face detection method of claim 5, wherein the steps of screening, regression correcting and merging comprise:
screening out the face candidate frames larger than a set probability threshold value according to the detected picture, the first face candidate frame, the second face candidate frame and the corresponding face probability;
calculating to obtain a second offset according to the face candidate frame obtained after screening, and performing regression correction on the second offset;
and combining the face candidate frames obtained after correction through a non-maximum suppression algorithm to obtain a first face candidate frame/a second face candidate frame/a face rectangular frame.
7. A face detection system based on a three-level convolutional neural network is characterized by comprising the three-level convolutional neural network, wherein the three-level convolutional neural network comprises:
the acquisition unit is used for acquiring a training sample and a detection picture; the training sample at least comprises a face picture marked with face characteristic points;
the network training unit is used for inputting the training samples into a three-level convolutional neural network for step-by-step training;
it includes: a feature vector module and a regression correction module,
the feature vector module is used for predicting and reducing the dimension according to the training samples and the training results of the previous n levels to obtain corresponding two-dimensional feature vectors, and calculating to obtain a first offset according to the two-dimensional feature vectors;
the obtaining of the two-dimensional feature vector comprises the following steps:
obtaining an m-dimensional feature vector according to the training sample and the training result of the previous n levels;
performing dimensionality reduction processing on the m-dimensional feature vector through a full convolution layer/full connection layer to obtain the two-dimensional feature vector;
the three-level convolutional neural network comprises a first-level network, a second-level network and a third-level network, wherein the third-level network comprises a first branch, a second branch and a third branch, the second-level network comprises the first branch and the second branch, and the first branch is the same as the first-level network;
in a three-level network, the obtaining of the m-dimensional feature vector comprises the following steps:
inputting the training sample and the training result of the previous stage into the first branch to obtain a first feature vector, inputting the first feature vector into the second branch to obtain a second feature vector, and inputting the second feature vector into the third branch to obtain a third feature vector;
splicing the first feature vector, the second feature vector and the third feature vector to obtain an m-dimensional feature vector;
the regression correction module is used for performing regression correction on the two-dimensional feature vector through the first offset to obtain a corresponding training result;
and the face detection unit is used for inputting the detection picture into the trained three-level convolutional neural network to carry out face detection step by step so as to obtain a face rectangular frame.
CN201710078431.3A 2017-02-14 2017-02-14 Face detection method and system based on three-level convolutional neural network Active CN106874868B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710078431.3A CN106874868B (en) 2017-02-14 2017-02-14 Face detection method and system based on three-level convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710078431.3A CN106874868B (en) 2017-02-14 2017-02-14 Face detection method and system based on three-level convolutional neural network

Publications (2)

Publication Number Publication Date
CN106874868A CN106874868A (en) 2017-06-20
CN106874868B (en) 2020-09-18

Family

ID=59167030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710078431.3A Active CN106874868B (en) 2017-02-14 2017-02-14 Face detection method and system based on three-level convolutional neural network

Country Status (1)

Country Link
CN (1) CN106874868B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679450A (en) * 2017-08-25 2018-02-09 珠海多智科技有限公司 Obstruction conditions servant's face recognition method based on deep learning
CN107688786A (en) * 2017-08-30 2018-02-13 南京理工大学 A kind of method for detecting human face based on concatenated convolutional neutral net
CN107665355B (en) * 2017-09-27 2020-09-29 重庆邮电大学 Agricultural pest detection method based on regional convolutional neural network
CN107784288B (en) * 2017-10-30 2020-01-14 华南理工大学 Iterative positioning type face detection method based on deep neural network
CN107808142A (en) * 2017-11-09 2018-03-16 北京小米移动软件有限公司 Eyeglass detection method and device
CN107886074B (en) * 2017-11-13 2020-05-19 苏州科达科技股份有限公司 Face detection method and face detection system
CN107784294B (en) * 2017-11-15 2021-06-11 武汉烽火众智数字技术有限责任公司 Face detection and tracking method based on deep learning
CN108363957A (en) * 2018-01-19 2018-08-03 成都考拉悠然科技有限公司 Road traffic sign detection based on cascade network and recognition methods
CN108509940B (en) * 2018-04-20 2019-11-05 北京达佳互联信息技术有限公司 Facial image tracking, device, computer equipment and storage medium
CN108960064A (en) * 2018-06-01 2018-12-07 重庆锐纳达自动化技术有限公司 A kind of Face datection and recognition methods based on convolutional neural networks
CN108921131B (en) * 2018-07-26 2022-05-24 中国银联股份有限公司 Method and device for generating face detection model and three-dimensional face image
CN109344740A (en) * 2018-09-12 2019-02-15 上海了物网络科技有限公司 Face identification system, method and computer readable storage medium
CN109376693A (en) * 2018-11-22 2019-02-22 四川长虹电器股份有限公司 Method for detecting human face and system
CN109635693B (en) * 2018-12-03 2023-03-31 武汉烽火众智数字技术有限责任公司 Front face image detection method and device
CN109389105B (en) * 2018-12-20 2022-02-08 北京万里红科技有限公司 Multitask-based iris detection and visual angle classification method
CN111382297B (en) * 2018-12-29 2024-05-17 杭州海康存储科技有限公司 User side user data reporting method and device
CN109753931A (en) * 2019-01-04 2019-05-14 广州广电卓识智能科技有限公司 Convolutional neural networks training method, system and facial feature points detection method
CN110263852B (en) * 2019-06-20 2021-10-08 北京字节跳动网络技术有限公司 Data processing method and device and electronic equipment
CN110717481B (en) * 2019-12-12 2020-04-07 浙江鹏信信息科技股份有限公司 Method for realizing face detection by using cascaded convolutional neural network
CN111209819A (en) * 2019-12-30 2020-05-29 新大陆数字技术股份有限公司 Rotation-invariant face detection method, system equipment and readable storage medium
CN112232215B (en) * 2020-10-16 2021-04-06 哈尔滨市科佳通用机电股份有限公司 Railway wagon coupler yoke key joist falling fault detection method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295476A (en) * 2015-05-29 2017-01-04 腾讯科技(深圳)有限公司 Face key point localization method and device
CN105740758A (en) * 2015-12-31 2016-07-06 上海极链网络科技有限公司 Internet video face recognition method based on deep learning
CN106096535A (en) * 2016-06-07 2016-11-09 广东顺德中山大学卡内基梅隆大学国际联合研究院 A kind of face verification method based on bilinearity associating CNN
CN106228137A (en) * 2016-07-26 2016-12-14 广州市维安科技股份有限公司 A kind of ATM abnormal human face detection based on key point location

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"A Convolutional Neural Network Cascade for Face Detection";Haoxiang Li.et al;《2015 IEEE Conference on Computer Vision and Pattern Recognition》;20150612;期刊第1-4节,图1,2 *

Also Published As

Publication number Publication date
CN106874868A (en) 2017-06-20

Similar Documents

Publication Publication Date Title
CN106874868B (en) Face detection method and system based on three-level convolutional neural network
CN111709339B (en) Bill image recognition method, device, equipment and storage medium
TWI803472B (en) Method, computer program product and device for training a neural network
CN110348579B (en) Domain self-adaptive migration feature method and system
CN111144364B (en) Twin network target tracking method based on channel attention updating mechanism
CN111160407B (en) Deep learning target detection method and system
CN111931864B (en) Method and system for multiple optimization of target detector based on vertex distance and cross-over ratio
AU2016273851A1 (en) Accurate tag relevance prediction for image search
CN112149722A (en) Automatic image annotation method based on unsupervised domain adaptation
CN102165486B (en) Image characteristic amount extraction device
CN109685765B (en) X-ray film pneumonia result prediction device based on convolutional neural network
CN111160212B (en) Improved tracking learning detection system and method based on YOLOv3-Tiny
CN110991321B (en) Video pedestrian re-identification method based on tag correction and weighting feature fusion
CN111783767B (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN114022904B (en) Noise robust pedestrian re-identification method based on two stages
WO2021031704A1 (en) Object tracking method and apparatus, computer device, and storage medium
CN116089648B (en) File management system and method based on artificial intelligence
Zhu et al. Self-supervised universal domain adaptation with adaptive memory separation
CN117152587A (en) Anti-learning-based semi-supervised ship detection method and system
CN114926742A (en) Loop detection and optimization method based on second-order attention mechanism
CN111539417A (en) Text recognition training optimization method based on deep neural network
CN112487927B (en) Method and system for realizing indoor scene recognition based on object associated attention
CN113159071B (en) Cross-modal image-text association anomaly detection method
CN115546801A (en) Method for extracting paper image data features of test document
CN112989869B (en) Optimization method, device, equipment and storage medium of face quality detection model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant