Summary of the Invention
The invention provides a face detection method and a face detection device based on a multi-task cascaded convolutional neural network, to solve the technical problem in the prior art that computational complexity and detection accuracy are difficult to balance.
The technical solution adopted by the present invention is as follows:
A face detection method based on a multi-task cascaded convolutional neural network comprises: establishing cascaded multi-stage convolutional neural networks; training the multi-stage convolutional neural networks, using face positive samples, face negative samples, partial faces and face key point samples as training samples, to learn the tasks of face classification, face region position regression and face key point localization; and performing face detection on an image to be detected using the trained multi-stage convolutional neural networks; wherein, in the training stage, an online mode and an offline mode are used in combination to mine hard face negative samples as training samples.
Further, the online mode is: in the gradient backpropagation stage of each training batch, gradients are computed only for those face negative samples whose loss exceeds a set threshold, and the remaining face negative samples are ignored. The offline mode is: the trained model is used to obtain misclassified face negative samples, a batch whose scores fall within a set range is selected from them and added, with a predetermined probability, to the face negative samples currently used for training, and training of the model continues.
Further, each stage of the multi-stage convolutional neural networks includes, connected in sequence, multiple convolutional layers, at least one max pooling layer and one fully connected layer; in each later stage, the feature output by that stage's fully connected layer is combined with the feature output by the fully connected layer of the preceding stage, and the combined feature is then passed through different fully connected sublayers, each corresponding to the learning of a different task.
Further, the multi-stage convolutional neural networks include three stages of convolutional neural networks. The first-stage convolutional neural network takes an image of size 10*10*3 as input and includes multiple convolutional layers, a max pooling layer and one fully connected layer; after convolution and pooling, the image is transformed from 10*10*3 into a feature of size 1*1*32, which is then passed through two different fully connected sublayers corresponding to different tasks. The second-stage convolutional neural network takes an image of size 24*24*3 as input and includes multiple convolutional layers, two max pooling layers and one fully connected layer with an output length of 128; the feature output by this stage's fully connected layer is spliced with the 1*1*32 feature output by the first-stage network to form a feature of length 160, which is then passed through two different fully connected sublayers corresponding to the learning of different tasks. The third-stage convolutional neural network takes an image of size 48*48*3 as input and includes multiple convolutional layers, three max pooling layers and one fully connected layer with an output length of 256; the feature output by this stage's fully connected layer is spliced with the length-160 feature output by the second-stage network to form a feature of length 416, which is then passed through three different fully connected sublayers corresponding to the learning of different tasks.
Further, in the training stage, for the cascaded multi-stage convolutional neural networks, the input samples of the earlier stages are face positive samples, face negative samples and partial face samples; face key point samples are added in the last stage; and the ratio of face positive samples to face negative samples is increased in each later stage.
Further, in the training stage, the face classification task uses a cross-entropy loss function, the face region position regression and face key point localization tasks use a Euclidean loss function, the face classification task additionally uses a center loss function, and the cross-entropy, Euclidean and center loss functions are combined into the output with different weights.
Further, in the detection stage, after the image has been processed by each stage of the convolutional neural networks, the regions identified as the same face are clustered and merged using a non-maximum suppression method, in which the K face regions with the highest sample scores are retained and the regions ranked after the K-th are merged into them, and the face region positions are adjusted. When the non-maximum suppression method ranks samples by score, the weighted score of the multi-stage convolutional neural networks is used as the sample score. Where the multi-stage convolutional neural networks include three stages, the weighted score is computed as follows: let the classification scores of the first-stage, second-stage and third-stage networks be S1, S2 and S3 respectively; then, in order, the score of a sample after the first-stage network is T1=S1, its score after the second-stage network is T2=S2+S1*0.5, and its score after the third-stage network is T3=S3+T2*0.5.
According to another aspect of the present invention, a face detection device based on a multi-task cascaded convolutional neural network is also provided, comprising: an establishing module, configured to establish cascaded multi-stage convolutional neural networks; a training module, configured to train the multi-stage convolutional neural networks, using face positive samples, face negative samples, partial faces and face key point samples as training samples, to learn the tasks of face classification, face region position regression and face key point localization; and a detection module, configured to perform face detection on an image to be detected using the trained multi-stage convolutional neural networks; wherein, in the training stage, the training module uses an online mode and an offline mode in combination to mine hard face negative samples as training samples.
Further, the online mode is: in the gradient backpropagation stage of each training batch, gradients are computed only for those face negative samples whose loss exceeds a set threshold, and the remaining face negative samples are ignored; the offline mode is: the trained model is used to obtain misclassified face negative samples, a batch whose scores fall within a set range is selected from them and added, with a predetermined probability, to the face negative samples currently used for training, and training of the model continues.
Further, in the multi-stage convolutional neural networks established by the establishing module, each stage includes, connected in sequence, multiple convolutional layers, at least one max pooling layer and one fully connected layer; in each later stage, the feature output by that stage's fully connected layer is combined with the feature output by the fully connected layer of the preceding stage, and the combined feature is then passed through different fully connected sublayers corresponding to the learning of different tasks.
The invention has the following beneficial effects: being based on cascaded multi-stage convolutional neural networks, the invention can learn more robust features; by mining hard face negative samples in both an online mode and an offline mode, it improves the classification ability of the network, and thereby the detection ability and accuracy of the network, while maintaining the running speed of the method in actual products.
In addition to the objects, features and advantages described above, the present invention has other objects, features and advantages. The present invention is described in further detail below with reference to the accompanying drawings.
Detailed Description of the Embodiments
It should be noted that, provided they do not conflict, the embodiments in this application and the features in the embodiments can be combined with one another. The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Referring to Fig. 1, the face detection method of the present invention based on a multi-task cascaded convolutional neural network includes the following steps:
Step S100: establish cascaded multi-stage convolutional neural networks;
Step S200: train the multi-stage convolutional neural networks, using face positive samples, face negative samples, partial faces and face key point samples as training samples, to learn the tasks of face classification, face region position regression and face key point localization;
Step S300: perform face detection on an image to be detected using the trained multi-stage convolutional neural networks.
In a preferred embodiment of the present invention, face detection and face key point localization are realized by establishing and training three cascaded, differentiated convolutional neural networks. The present invention first ensures the running speed of the method in actual products; on that basis, an image pyramid is constructed from the input image and three cascaded shallow convolutional neural networks are applied, which greatly improves the performance of the algorithm. The first-stage network achieves a high face recall rate while maintaining speed, and the second-stage and third-stage networks further obtain accurate face region positions and face key point positions. Each stage classifies whether the input image is a face and at the same time corrects the position of the face region, further improving the accuracy of face detection.
Specifically, the first-stage convolutional neural network takes an image of size 10*10*3 as input. In the training stage of the first-stage network, five input sizes were tested: 8*8, 9*9, 10*10, 11*11 and 12*12. The experimental results show that using a 10*10*3 image as input (compared with 12*12*3) substantially increases the speed of the first-stage network while losing very little precision. The structure of the first-stage network is shown in Fig. 2. It includes, connected in sequence, a 1*3*3 convolutional layer (C), a 3*1*10 convolutional layer, a 2*2 max pooling layer (MP), a 1*3*10 convolutional layer, a 3*1*16 convolutional layer, a 1*3*10 convolutional layer and one fully connected layer. After convolution and pooling, the image is transformed from 10*10*3 into a feature map of size 1*1*32, which is then passed through two fully connected sublayers corresponding to different tasks: face classification and face region position regression (bounding box regression).
To further optimize speed, the present invention splits a convolutional layer with a 3*3*10 kernel into one convolutional layer with a 1*3*3 kernel and one with a 3*1*10 kernel, which further improves speed with almost no loss of performance. The same technique is also applied to optimize the networks in the two later stages.

In the detection stage, the fully connected layer of the trained first-stage network is rearranged into the form of a convolutional layer with a 1*1 kernel, so that the size of the input image is not restricted. The whole image is fed in, the feature map produced by the first-stage network is obtained, and the positions that are face regions are determined from this feature map.
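The saving from the kernel split described above can be estimated by counting weights. The following is a minimal sketch, assuming that the notation h*w*n denotes an h-by-w kernel with n output channels and that the first layer receives a 3-channel input; the function name `conv_weights` is hypothetical and biases are ignored.

```python
def conv_weights(kernel_h, kernel_w, in_channels, out_channels):
    """Number of weights in one convolutional layer, ignoring biases."""
    return kernel_h * kernel_w * in_channels * out_channels


IN_CHANNELS = 3  # assumption: RGB input to the first layer

# Unsplit layer: one 3*3 kernel producing 10 output channels.
full = conv_weights(3, 3, IN_CHANNELS, 10)

# Split version: a 1*3 layer with 3 outputs followed by a 3*1 layer with 10 outputs.
split = conv_weights(1, 3, IN_CHANNELS, 3) + conv_weights(3, 1, 3, 10)

print(full, split)  # 270 117
```

Under these assumptions the split pair needs fewer than half the weights of the unsplit layer, which is consistent with the speed gain described in the text.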
The input of the second-stage convolutional neural network is 24*24*3. As shown in Fig. 3, its structure includes, connected in sequence, a 1*3*3 convolutional layer, a 3*1*28 convolutional layer, a 3*3 max pooling layer, a 1*3*28 convolutional layer, a 3*1*48 convolutional layer and a 3*3 max pooling layer, followed by a convolutional layer with a 2*2 kernel and one fully connected layer with an output length of 128. This length-128 feature is spliced with the 1*1*32 feature output by the first-stage network to form a feature of length 160, on the basis of which two fully connected sublayers correspond to the learning of different tasks: face classification and face region position regression. Experiments show that adding the first-stage feature improves the classification ability of the second-stage network.
As shown in Fig. 4, the input of the third-stage convolutional neural network is 48*48*3. The input layer is followed, in sequence, by a 1*3*3 convolutional layer, a 3*1*32 convolutional layer, a 3*3 max pooling layer, a 1*3*32 convolutional layer, a 3*1*64 convolutional layer, a 3*3 max pooling layer, a convolutional layer with a 3*3 kernel, a 2*2 max pooling layer and a convolutional layer with a 2*2 kernel, immediately followed by one fully connected layer with an output length of 256. As with the second-stage network, the present invention splices the length-160 feature obtained from the first two stages with this stage's length-256 feature; the spliced feature has length 416 and is then passed through three different fully connected sublayers corresponding to the learning of different tasks: face classification, face region position regression and face key point localization (landmark localization).
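The feature lengths quoted for the three stages follow from simple concatenation. A minimal sketch, assuming the feature carried over from the earlier stage is placed before the current stage's feature (the order is not stated in the text) and using the hypothetical helper `splice`:

```python
def splice(own_feature, previous_feature=None):
    """Concatenate a stage's fully connected output with the feature
    carried over from the preceding stage (if any)."""
    if previous_feature is None:
        return list(own_feature)
    return list(previous_feature) + list(own_feature)


# Placeholder feature values; only the lengths matter here.
stage1 = splice([0.0] * 32)           # 1*1*32 feature of the first stage
stage2 = splice([0.0] * 128, stage1)  # 128 + 32
stage3 = splice([0.0] * 256, stage2)  # 256 + 160

lengths = [len(stage1), len(stage2), len(stage3)]
print(lengths)  # [32, 160, 416]
```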
The present invention uses WIDER FACE and CelebA as training data. In the training stage, training samples are divided, according to the overlap between the selected image region and the annotated ground-truth face region, into face positive samples (positives, overlap greater than 0.7), face negative samples (negatives, overlap less than 0.3), partial faces (part faces, overlap greater than 0.5 and less than 0.7) and face key point samples (landmarks, face positive samples with key point annotations). Positives and negatives are used for the face classification task; positives and part faces are used for the face region position regression task; landmarks are used for the face key point localization task. When training the first-stage network, the ratio of positives, negatives and part faces is controlled at 1:3:1; when training the second-stage network, the ratio is controlled at 1:2:1; when training the third-stage network, landmarks are added and the ratio positives:negatives:part faces:landmarks is controlled at 1:1:1:1. Increasing the ratio of face positive samples to face negative samples when training the second-stage and third-stage networks effectively improves the classification ability of these two stages.
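The sample division by overlap can be sketched as follows. This is an illustration only: it assumes that "overlap" means intersection over union (IoU) of axis-aligned boxes given as (x1, y1, x2, y2), and the function names `iou` and `sample_type` are hypothetical. Regions whose overlap falls outside the ranges stated in the text are left unassigned.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sample_type(region, ground_truth):
    """Assign a training sample type using the thresholds from the text."""
    overlap = iou(region, ground_truth)
    if overlap > 0.7:
        return "positive"
    if 0.5 < overlap < 0.7:
        return "part face"
    if overlap < 0.3:
        return "negative"
    return None  # overlap not covered by any stated range


face = (0, 0, 10, 10)
print(sample_type((0, 0, 10, 10), face))    # positive
print(sample_type((0, 0, 10, 6), face))     # part face
print(sample_type((20, 20, 30, 30), face))  # negative
```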
When training the first-stage network, image blocks of size 10x10 are randomly cropped from the labeled full-size images as its input. When training the second-stage and third-stage networks, the previously trained networks are run for face detection on the labeled data to obtain the input data for the corresponding next stage.
For the three different tasks of face classification, face region position regression and face key point localization, the present invention uses different loss functions. Face classification uses a cross-entropy loss function, in which p_i denotes the probability that the sample is a positive sample (a face), and the label takes the value 0 or 1, indicating whether the sample's ground-truth label is face or non-face.
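The cross-entropy formula itself is not reproduced in this text. Assuming the standard binary cross-entropy form, and writing the 0/1 label as y_i (a symbol chosen here for illustration), it would read:

```latex
\[
L_i^{cls} = -\bigl( y_i \log p_i + (1 - y_i) \log (1 - p_i) \bigr)
\]
```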
Face region position regression and face key point localization both use a Euclidean loss function.
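The Euclidean loss formula is likewise not reproduced in this text. A common form, given here as an assumption, uses a hypothetical symbol \hat{t}_i for the network's predicted box coordinates or key point coordinates of sample i and t_i for the corresponding ground-truth values:

```latex
\[
L_i^{reg} = \left\lVert \hat{t}_i - t_i \right\rVert_2^2
\]
```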
To further improve face classification performance, the present invention adopts a center loss function (center loss) to further increase the distance between face samples and non-face samples in the high-dimensional feature space. Experiments also show that adding center loss to the face classification task of every stage further improves classification ability.
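The center loss formula is not reproduced in this text. In its commonly published form, assumed here, x_i is the feature vector of sample i, c_{y_i} is the learned center of that sample's class, and m is the number of samples in the batch; all three symbols are introduced for illustration:

```latex
\[
L^{center} = \frac{1}{2} \sum_{i=1}^{m} \left\lVert x_i - c_{y_i} \right\rVert_2^2
\]
```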
To improve the face recall rate, during training the present invention combines the above losses into the output with different weights, where α_i denotes the weight of each loss and β_i corresponds to the sample type.
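The combined formula is not reproduced in this text. One reading consistent with the symbols described above, offered only as an assumption, treats i as an index over the individual loss terms L_i (cross-entropy, Euclidean and center loss) for a given sample, with β_i equal to 1 when the sample's type contributes to loss i and 0 otherwise. Note that under this reading i indexes loss terms, not samples:

```latex
\[
L = \sum_{i} \alpha_i \, \beta_i \, L_i , \qquad \beta_i \in \{0, 1\}
\]
```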
In face detection training, mining hard face negative samples (hard negative mining) as training samples has always been an effective way to improve detection ability. The present invention uses two face negative sample mining modes at the same time: an online mode (online) and an offline mode (offline). The online mode is implemented as follows: in the gradient backpropagation stage of each training batch (batch), gradients are computed only for those face negative samples whose loss exceeds a set threshold, and the remaining face negative samples are ignored. Preferably, the present invention computes gradients only for the 70% of face negative samples with the largest losses and ignores the remaining 30% of easy face negative samples. In the offline stage, the present invention uses a hard sample strategy: the trained model is used to obtain misclassified face negative samples, and a batch with relatively low scores is selected from them and added to the training samples with a predetermined probability. Taking 0 as the threshold, the present invention preferably selects, from the misclassified face negative samples, the batch with scores between 0 and 0.5, adds it with a probability of 40% to the face negative samples currently used for training, and continues training the model. Applied when training the first-stage network, this method substantially improves the network's classification ability, enabling the first-stage network to filter out most face negative samples, which effectively reduces the number of samples processed by the second-stage and third-stage networks and improves both detection ability and running speed. Experiments show that both of the above methods improve the classification ability of the network. The probabilities and numerical values above are merely preferred choices, and the present invention is not limited to them.
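The two mining modes can be sketched as below. This is a minimal illustration, not the implementation of the invention: the function names, the 0/1 mask representation and the convention that a negative sample with a score above 0 counts as misclassified are assumptions; only the 70% ratio, the 0 to 0.5 score range and the 40% probability come from the text.

```python
import random


def online_hard_negative_mask(negative_losses, keep_ratio=0.7):
    """Return a 0/1 mask over the negatives of one batch. Only the
    keep_ratio fraction with the largest loss would contribute gradients."""
    n_keep = round(len(negative_losses) * keep_ratio)
    order = sorted(range(len(negative_losses)),
                   key=lambda i: negative_losses[i], reverse=True)
    keep = set(order[:n_keep])
    return [1 if i in keep else 0 for i in range(len(negative_losses))]


def offline_hard_negatives(negatives, scores, low=0.0, high=0.5,
                           add_prob=0.4, rng=None):
    """Select misclassified negatives scored in (low, high] and add each
    one to the training set with probability add_prob."""
    rng = rng or random.Random()
    selected = []
    for sample, score in zip(negatives, scores):
        misclassified = score > low  # assumption: score above 0 means "face"
        if misclassified and score <= high and rng.random() < add_prob:
            selected.append(sample)
    return selected


losses = [0.1, 0.9, 0.5, 0.3, 0.8, 0.2, 0.7, 0.4, 0.6, 1.0]
mask = online_hard_negative_mask(losses)
print(mask)  # [0, 1, 1, 0, 1, 0, 1, 1, 1, 1]
```

In a training loop, the mask would typically be multiplied into the per-sample classification loss before backpropagation, so that the ignored easy negatives produce no gradient.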
After training is complete, the trained multi-stage convolutional neural networks are used to perform face detection on an image to be detected. The whole flow of the detection stage is as follows:

For each image to be detected, an image pyramid is first built (resize), and each level of the pyramid is used as input to the first-stage network. In the first-stage network, the fully connected layer is converted so that the network becomes fully convolutional; it can therefore accept input of any scale and directly output a confidence map of face region positions, from which the corresponding face region positions are computed.
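The pyramid construction can be sketched by listing the resize scales. The text does not state the scale step or the stopping rule, so both are assumptions here: each level shrinks the image by a hypothetical factor of 0.709, and levels are generated while the shorter side is still at least the 10-pixel input size of the first-stage network.

```python
def pyramid_scales(width, height, net_input=10, factor=0.709):
    """List the resize scales of an image pyramid.

    factor (the per-level shrink ratio) and the stopping rule are
    illustrative assumptions, not values taken from the text.
    """
    scales = []
    scale = 1.0
    min_side = min(width, height)
    while min_side * scale >= net_input:
        scales.append(scale)
        scale *= factor
    return scales


scales = pyramid_scales(40, 30)
print(len(scales))  # 4
```

Each scale would then be used to resize the image before it is passed to the fully convolutional first-stage network.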
The face regions output by the first-stage network are then clustered and merged. The present invention uses an improved non-maximum suppression method, top K NMS, for this clustering and merging: for regions identified as the same face, the K highest-scoring regions are retained each time, and the lower-scoring regions, namely those ranked after the K-th, are merged into them, which further safeguards the face recall rate. The clustered and merged face regions are position-corrected and then scaled to the input size of the second-stage network.
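One possible reading of top K NMS is sketched below. It is an interpretation, not the method as claimed: regions are grouped around the highest-scoring remaining region by an IoU threshold (the value 0.5 and the use of IoU are assumptions), the K best regions of each group are kept, and the rest of the group is dropped. The names `top_k_nms` and `iou` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def top_k_nms(boxes, scores, k=3, iou_threshold=0.5):
    """Return indices of kept boxes: up to k per cluster of overlapping boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        seed = order[0]
        # Cluster: the seed plus every remaining box that overlaps it enough.
        cluster = [seed] + [i for i in order[1:]
                            if iou(boxes[seed], boxes[i]) > iou_threshold]
        kept.extend(cluster[:k])  # cluster is already sorted by score
        members = set(cluster)
        order = [i for i in order if i not in members]
    return kept


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (0, 1, 10, 11),
         (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
print(top_k_nms(boxes, scores, k=2))  # [0, 1, 4]
```

With k=1 this reduces to ordinary greedy non-maximum suppression; a larger k keeps more candidates per face, which is in line with the recall-oriented motivation given in the text.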
After the scaled images are processed by the second-stage network, the positions of the face regions that pass the threshold are adjusted, and top K NMS is then applied. The third-stage network follows the same procedure as the second-stage network, except that the input size is changed to 48*48*3 and face key point information is obtained at the same time.
When the non-maximum suppression method ranks samples by score, the present invention considers not only the score of the current network but also accumulates the scores of the preceding stages. Specifically, the score is computed as follows: let the classification scores of the first-stage, second-stage and third-stage networks be S1, S2 and S3 respectively; then, in order, the score of a sample after the first-stage network is T1=S1, its score after the second-stage network is T2=S2+S1*0.5, and its score after the third-stage network is T3=S3+T2*0.5. Because the weighted scores of all three stages are taken into account, experiments show that this method improves the accuracy of face detection.
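The score accumulation above is a short recurrence and can be written directly. A minimal sketch, with the hypothetical function name `cascade_scores` and the 0.5 carry-over weight taken from the text:

```python
def cascade_scores(s1, s2, s3, carry=0.5):
    """Accumulate classification scores across the three stages:
    T1 = S1, T2 = S2 + 0.5 * T1, T3 = S3 + 0.5 * T2."""
    t1 = s1
    t2 = s2 + carry * t1
    t3 = s3 + carry * t2
    return t1, t2, t3


t1, t2, t3 = cascade_scores(0.8, 0.6, 0.9)
print(t1, t2, t3)
```

For S1=0.8, S2=0.6 and S3=0.9 this gives T1=0.8, T2=0.6+0.4=1.0 and T3=0.9+0.5=1.4 (up to floating-point rounding), so a region that scored well in earlier stages ranks higher than one with the same final-stage score alone.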
According to another aspect of the present invention, a face detection device based on a multi-task cascaded convolutional neural network is also provided. For the device of this embodiment, refer to the method embodiment above. Specifically, referring to Fig. 5, it includes:
an establishing module 400, configured to establish cascaded multi-stage convolutional neural networks;
a training module 500, configured to train the multi-stage convolutional neural networks, using face positive samples, face negative samples, partial faces and face key point samples as training samples, to learn the tasks of face classification, face region position regression and face key point localization, wherein, in the training stage, the training module 500 uses an online mode and an offline mode in combination to mine hard face negative samples as training samples;
a detection module 600, configured to perform face detection on an image to be detected using the trained multi-stage convolutional neural networks.
The online mode is implemented as follows: in the gradient backpropagation stage of each training batch, gradients are computed only for the 70% of face negative samples with the largest losses, and the remaining 30% of easy face negative samples are ignored. The offline mode is specifically: the trained model is used to obtain misclassified face negative samples, the batch with scores between 0 and 0.5 is selected from them and added, with a probability of 40%, to the face negative samples currently used for training, and training of the model continues.
In the multi-stage convolutional neural networks established by the establishing module 400, each stage includes, connected in sequence, multiple convolutional layers, at least one max pooling layer and one fully connected layer; in each later stage, the feature output by that stage's fully connected layer is combined with the feature output by the fully connected layer of the preceding stage, and the combined feature is then passed through different fully connected sublayers corresponding to the learning of different tasks.
The face detection method and device based on a multi-task cascaded convolutional neural network of the present invention have the following advantages:
(1) The input size and network structure of the first-stage convolutional neural network are adjusted, which greatly improves running speed.
(2) Hard face negative samples are mined in both online and offline modes, which strengthens the first-stage network's ability to eliminate face negative samples, effectively reduces the number of samples processed by the second-stage and third-stage networks, and improves detection ability and running speed.
(3) A center loss constraint is added to the face classification task during training, which widens the inter-class distance between face and non-face samples and helps improve classification ability.
(4) Features of different scales are combined, which improves the expressive power of the features. Specifically, the second-stage network adds the feature representation of the first-stage network, and the third-stage network adds the feature representations of the first-stage and second-stage networks, which enhances robustness across scales and improves the classification ability of the second-stage and third-stage networks.
(5) After processing by each stage, the face region positions are clustered and merged using the top K NMS non-maximum suppression method and adjusted, which further improves the accuracy of face detection. During non-maximum suppression, the strategy of combining top K NMS with weighted scores from every stage improves the face recall rate.
The foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit it. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall fall within its scope of protection.