CN107239736A

CN107239736A - Face detection method and detection device based on a multi-task cascaded convolutional neural network

Info

Publication number: CN107239736A
Application number: CN201710291289.0A
Authority: CN
Inventors: 丁建华
Original assignee: Athena Eyes Science & Technology Co Ltd
Current assignee: Athena Eyes Science & Technology Co Ltd
Priority date: 2017-04-28
Filing date: 2017-04-28
Publication date: 2017-10-10

Abstract

The invention discloses a face detection method and detection device based on a multi-task cascaded convolutional neural network. The method includes: establishing cascaded multi-stage convolutional neural networks; training the multi-stage convolutional neural networks, using face positive samples, face negative samples, partial faces, and facial landmark samples as training samples, to learn the tasks of face classification, face region position regression, and facial landmark localization; and performing face detection on an image to be detected using the trained multi-stage convolutional neural networks. In the training stage, hard face negative samples are mined using an online mode and an offline mode in combination and used as training samples. Because the invention is based on cascaded multi-stage convolutional neural networks, it can learn more robust features. Mining hard negative samples in both the online and the offline mode improves the classification ability of the networks, and thereby their detection ability and accuracy, while preserving the running speed of the method in actual products.

Description

Face detection method and detection device based on a multi-task cascaded convolutional neural network

Technical field

The present invention relates to the field of face detection, and in particular to a face detection method and detection device based on a multi-task cascaded convolutional neural network.

Background art

Face detection is the foundation of all face-related technologies (face recognition, face alignment, facial expression recognition, facial landmark prediction, and so on). As face detection is applied in more and more scenarios, especially surveillance environments where users do not cooperate, face scale, pose angle, and lighting conditions vary widely, which places ever higher demands on the accuracy and speed of face detection.

Deep learning does not differ in essence from the artificial neural networks of earlier years. Since 2012, benefiting from larger datasets and greater computing power, deep learning has developed rapidly. The convolutional neural network (CNN) is one kind of deep learning network; in the image domain, CNN methods offer a qualitative improvement over traditional methods in solving a wide range of problems.

In recent years, face detection methods have fallen into two classes according to whether they use deep learning. Among the algorithms that do not use deep learning, the better-performing ones include Joint Cascade Face Detection and Alignment (JDA) and Normalized Pixel Difference (NPD). The JDA method combines face detection with facial landmark detection and builds its framework on a relatively simple pixel-difference feature and random forests. Because the feature is simple, the algorithm is not robust to illumination: under backlighting or uneven illumination, its detection performance is poor. The NPD method has an overall framework similar to that of JDA; the main difference is that NPD normalizes the pixel-difference feature, which noticeably improves results under backlighting. On the authoritative face detection dataset FDDB (Face Detection Data Set and Benchmark), NPD performs slightly worse than JDA and is also slower.

These two detection algorithms are among the best of the traditional (non-deep-learning) methods, yet at the same number of false detections their recall is 5 to 10 points lower than that of face detection algorithms based on deep learning. Several well-performing deep learning face detection algorithms are described next. "Face Detection with the Faster R-CNN" applies the VGG network (16 layers) and the Faster R-CNN framework to face detection. Another method, "Bootstrapping Face Detection with Hard Negative Examples", goes further and applies a 50-layer residual network to face detection. Although these methods achieve good results, their model files are too large and they run too slowly, so they are very difficult to use in actual products.

In outdoor surveillance environments, traditional methods do not achieve sufficient face recall. Algorithms based on deep learning, meanwhile, use networks that are too deep, with large model files and high computational complexity, which makes them difficult to deploy in actual products.

Summary of the invention

The invention provides a face detection method and detection device based on a multi-task cascaded convolutional neural network, to solve the technical problem that the prior art has difficulty balancing computational complexity and detection accuracy.

The technical solution adopted by the present invention is as follows:

A face detection method based on a multi-task cascaded convolutional neural network, including: establishing cascaded multi-stage convolutional neural networks; training the multi-stage convolutional neural networks, using face positive samples, face negative samples, partial faces, and facial landmark samples as training samples, to learn the tasks of face classification, face region position regression, and facial landmark localization; and performing face detection on an image to be detected using the trained multi-stage convolutional neural networks. In the training stage, hard face negative samples are mined using an online mode and an offline mode in combination and used as training samples.

Further, the online mode is: in the gradient backpropagation stage of each training batch, gradients are computed only for those face negative samples whose loss exceeds a set threshold, and the remaining face negative samples are ignored. The offline mode is: the trained model is used to obtain misclassified face negative samples; from these, a batch whose scores fall within a set range is selected and added, with a predetermined probability, to the face negative samples currently used for training, and training of the model continues.

Further, each stage of the multi-stage convolutional neural networks includes, connected in sequence, multiple convolutional layers, at least one max-pooling layer, and one fully connected layer. In each later stage, the feature output by its fully connected layer is combined with the feature output by the fully connected layer of the preceding stage, and different fully connected sub-layers then correspond to the learning of different tasks.

Further, the multi-stage convolutional neural networks comprise three stages. The first-stage network takes an image of size 10*10*3 as input and includes multiple convolutional layers, a max-pooling layer, and one fully connected layer; after convolution and pooling, the 10*10*3 image becomes a feature of size 1*1*32, which two different fully connected sub-layers then map to different tasks. The second-stage network takes an image of size 24*24*3 as input and includes multiple convolutional layers, two max-pooling layers, and a fully connected layer with output length 128; the feature output by this fully connected layer is concatenated with the 1*1*32 feature output by the first-stage network to form a feature of length 160, and two different fully connected sub-layers then correspond to the learning of different tasks. The third-stage network takes an image of size 48*48*3 as input and includes multiple convolutional layers, three max-pooling layers, and a fully connected layer with output length 256; the feature output by this fully connected layer is concatenated with the length-160 feature output by the second-stage network to form a feature of length 416, and three different fully connected sub-layers then correspond to the learning of different tasks.

Further, in the training stage, the earlier stages of the cascade take face positive samples, face negative samples, and partial face samples as input; facial landmark samples are added at the last stage; and the ratio of face positive samples to face negative samples is raised in each later stage.

Further, in the training stage, the face classification task uses a cross-entropy loss function, and the face region position regression and facial landmark localization tasks use a Euclidean loss function. The face classification task additionally uses a center loss function. The cross-entropy loss, the Euclidean loss, and the center loss are combined into the output as a weighted sum with different weights.

Further, in the detection stage, after the image has been processed by each stage, regions identified as the same face are clustered and merged using non-maximum suppression: the K highest-scoring face regions are retained, the regions ranked after the K-th are merged into them, and the face region position is adjusted. When sample scores are ranked for non-maximum suppression, the weighted score of the multi-stage networks is used as the sample score. With three stages, the weighted score is computed as follows. Let S1, S2, and S3 be the classification scores of the first-, second-, and third-stage networks. In order, the score used after the first stage is T1 = S1, the score after the second stage is T2 = S2 + S1*0.5, and the score after the third stage is T3 = S3 + T2*0.5.

According to another aspect of the invention, a face detection device based on a multi-task cascaded convolutional neural network is also provided, including: an establishing module, for establishing cascaded multi-stage convolutional neural networks; a training module, for training the multi-stage convolutional neural networks, using face positive samples, face negative samples, partial faces, and facial landmark samples as training samples, to learn the tasks of face classification, face region position regression, and facial landmark localization; and a detection module, for performing face detection on an image to be detected using the trained multi-stage convolutional neural networks. In the training stage, the training module mines hard face negative samples using an online mode and an offline mode in combination and uses them as training samples.

Further, the online mode is: in the gradient backpropagation stage of each training batch, gradients are computed only for those face negative samples whose loss exceeds a set threshold, and the remaining face negative samples are ignored. The offline mode is: the trained model is used to obtain misclassified face negative samples; from these, a batch whose scores fall within a set range is selected and added, with a predetermined probability, to the face negative samples currently used for training, and training of the model continues.

Further, in the multi-stage convolutional neural networks established by the establishing module, each stage includes, connected in sequence, multiple convolutional layers, at least one max-pooling layer, and one fully connected layer. In each later stage, the feature output by its fully connected layer is combined with the feature output by the fully connected layer of the preceding stage, and different fully connected sub-layers then correspond to the learning of different tasks.

The invention has the following beneficial effects: because it is based on cascaded multi-stage convolutional neural networks, it can learn more robust features; mining hard face negative samples in both the online and the offline mode improves the classification ability of the networks, and thereby their detection ability and accuracy, while preserving the running speed of the method in actual products.

In addition to the objects, features, and advantages described above, the invention has other objects, features, and advantages. The invention is described in further detail below with reference to the accompanying drawings.

Brief description of the drawings

The accompanying drawings, which form a part of this application, are provided for further understanding of the invention. The illustrative embodiments of the invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:

Fig. 1 is a flowchart of the face detection method based on a multi-task cascaded convolutional neural network according to a preferred embodiment of the invention;

Fig. 2 is a structural diagram of the first-stage convolutional neural network of the preferred embodiment of the invention;

Fig. 3 is a structural diagram of the second-stage convolutional neural network of the preferred embodiment of the invention;

Fig. 4 is a structural diagram of the third-stage convolutional neural network of the preferred embodiment of the invention;

Fig. 5 is a structural block diagram of the face detection device based on a multi-task cascaded convolutional neural network according to the preferred embodiment of the invention.

Detailed description of the embodiments

It should be noted that, where no conflict arises, the embodiments in this application and the features in those embodiments may be combined with one another. The invention is described in detail below with reference to the drawings and in conjunction with the embodiments.

Referring to Fig. 1, the face detection method of the invention based on a multi-task cascaded convolutional neural network proceeds as follows:

Step S100: establish cascaded multi-stage convolutional neural networks;

Step S200: train the multi-stage convolutional neural networks, using face positive samples, face negative samples, partial faces, and facial landmark samples as training samples, to learn the tasks of face classification, face region position regression, and facial landmark localization;

Step S300: perform face detection on an image to be detected using the trained multi-stage convolutional neural networks.

In a preferred embodiment, the invention establishes and trains three cascaded, differentiated convolutional neural networks to perform face detection and facial landmark localization. The invention first ensures the running speed of the method in actual products. On that basis, it builds an image pyramid from the input image and then applies three cascaded shallow convolutional neural networks, which greatly improves the performance of the algorithm. The first-stage network ensures high face recall while maintaining speed; the second- and third-stage networks then obtain accurate face region positions and facial landmark positions. While classifying whether its input image is a face, each stage also corrects the position of the face region, which further improves detection accuracy.

Specifically, the first-stage network takes an image of size 10*10*3 as input. During training of the first-stage network, five input sizes were tested: 8*8, 9*9, 10*10, 11*11, and 12*12. The experiments show that a 10*10*3 input (compared with 12*12*3) substantially increases the speed of the first-stage network at the cost of very little accuracy. The structure of the first-stage network is shown in Fig. 2. It includes, connected in sequence, a 1*3*3 convolutional layer (C), a 3*1*10 convolutional layer, a 2*2 max-pooling layer (MP), a 1*3*10 convolutional layer, a 3*1*16 convolutional layer, a 1*3*10 convolutional layer, and one fully connected layer. After convolution and pooling, the 10*10*3 image becomes a feature map of size 1*1*32, which two fully connected sub-layers then map to different tasks: face classification and face region position regression (bounding box regression).

To optimize speed further, the invention splits a convolutional layer with a 3*3*10 kernel into one convolutional layer with a 1*3*3 kernel and one with a 3*1*10 kernel, which improves speed with almost no loss of performance. The same technique is applied to optimize the two later-stage networks. In the detection stage, the fully connected layer of the trained first-stage network is rearranged as a convolutional layer with a 1*1 kernel, so that the size of the input image is not restricted. The whole image is fed in, the first-stage network produces a feature map, and the positions of face regions are determined from that feature map.
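The effect of the kernel split can be checked with a little arithmetic. The sketch below is only illustrative: it assumes unpadded, stride-1 convolutions, and for the weight count it assumes 28 input, 28 intermediate, and 48 output channels (one reading of the second-stage layer sizes; the text does not state the intermediate channel count). The helper names are hypothetical.

```python
def valid_conv_out(height, width, kernel_h, kernel_w, stride=1):
    """Output spatial size of an unpadded ("valid") convolution."""
    return ((height - kernel_h) // stride + 1,
            (width - kernel_w) // stride + 1)


def conv_weights(kernel_h, kernel_w, c_in, c_out):
    """Number of weights in a convolutional layer, ignoring biases."""
    return kernel_h * kernel_w * c_in * c_out


# Spatial size: a 1x3 layer followed by a 3x1 layer covers the same
# window as a single 3x3 layer, so the output size is unchanged.
h, w = 10, 10
full = valid_conv_out(h, w, 3, 3)
split = valid_conv_out(*valid_conv_out(h, w, 1, 3), 3, 1)

# Weight count. c_mid is an ASSUMPTION; it is not given in the text.
c_in, c_mid, c_out = 28, 28, 48
full_weights = conv_weights(3, 3, c_in, c_out)
split_weights = (conv_weights(1, 3, c_in, c_mid)
                 + conv_weights(3, 1, c_mid, c_out))

print(full, split, full_weights, split_weights)
```

Whether the split actually saves work depends on the intermediate channel count; with a very small number of input channels and a wide intermediate layer it can cost more, so the figures above should not be read as the patent's own measurements.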

The second-stage network takes a 24*24*3 input. As shown in Fig. 3, its structure includes, connected in sequence, a 1*3*3 convolutional layer, a 3*1*28 convolutional layer, a 3*3 max-pooling layer, a 1*3*28 convolutional layer, a 3*1*48 convolutional layer, and a 3*3 max-pooling layer, followed by a convolutional layer with a 2*2 kernel and a fully connected layer whose output length is 128. This length-128 feature is concatenated with the 1*1*32 feature output by the first-stage network to form a feature of length 160, on which two fully connected sub-layers correspond to the learning of different tasks: face classification and face region position regression. Experiments show that adding the first-stage feature improves the classification ability of the second-stage network.

As shown in Fig. 4, the third-stage network takes a 48*48*3 input. The input layer is followed in sequence by a 1*3*3 convolutional layer, a 3*1*32 convolutional layer, a 3*3 max-pooling layer, a 1*3*32 convolutional layer, a 3*1*64 convolutional layer, a 3*3 max-pooling layer, a convolutional layer with a 3*3 kernel, a 2*2 max-pooling layer, and a convolutional layer with a 2*2 kernel, and then by a fully connected layer with output length 256. As in the second-stage network, the length-160 feature obtained from the two preceding stages is concatenated with this stage's length-256 feature, giving a feature of length 416. Three different fully connected sub-layers then correspond to the learning of different tasks: face classification, face region position regression, and facial landmark localization.
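The feature lengths 160 and 416 follow directly from concatenation, as the following minimal sketch shows. The feature values are random stand-ins, and the task-head output sizes (2 for classification, 4 for the face region, 10 for landmarks) are assumptions made for illustration; the text does not specify them.

```python
import random

random.seed(0)


def random_vector(length):
    """Random stand-in for a learned feature or weight row."""
    return [random.gauss(0.0, 1.0) for _ in range(length)]


def linear(weights, feature):
    """Apply one fully connected sub-layer (no bias) to a feature vector."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


stage1_feat = random_vector(32)    # 1*1*32 feature, flattened
stage2_fc = random_vector(128)     # second-stage fully connected output
stage3_fc = random_vector(256)     # third-stage fully connected output

stage2_feat = stage2_fc + stage1_feat    # 128 + 32 = 160
stage3_feat = stage3_fc + stage2_feat    # 256 + 160 = 416

# Hypothetical task heads on the shared length-416 feature.
# Output sizes are ASSUMED, not taken from the text.
head_sizes = {"face_classification": 2,
              "region_regression": 4,
              "landmark_localization": 10}
outputs = {
    name: linear([random_vector(len(stage3_feat)) for _ in range(size)],
                 stage3_feat)
    for name, size in head_sizes.items()
}
print(len(stage2_feat), len(stage3_feat))
```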

The invention uses WIDER FACE and CelebA as training data. In the training stage, training samples are divided according to the overlap between the selected image region and the annotated ground-truth face region: face positive samples (positives, overlap greater than 0.7), face negative samples (negatives, overlap less than 0.3), partial faces (part faces, overlap greater than 0.5 and less than 0.7), and facial landmark samples (landmarks, face positive samples with landmark annotations). Positives and negatives are used for the face classification task; positives and part faces are used for the face region position regression task; landmarks are used for the facial landmark localization task. When training the first-stage network, the ratio of positives, negatives, and part faces is controlled at 1:3:1; when training the second-stage network, the ratio is controlled at 1:2:1; and when training the third-stage network, landmarks are added and the ratio positives:negatives:part faces:landmarks is controlled at 1:1:1:1. Raising the ratio of face positive samples to face negative samples during training of the second- and third-stage networks effectively improves the classification ability of these two stages.
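The sample categories can be sketched as a small labeling function. This is a minimal illustration that assumes "overlap" means intersection over union (IoU) between a candidate crop and the annotated face box, both given as (x1, y1, x2, y2); the function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sample_type(crop, ground_truth):
    """Label a crop using the overlap thresholds stated in the text."""
    overlap = iou(crop, ground_truth)
    if overlap > 0.7:
        return "positive"
    if 0.5 < overlap < 0.7:
        return "part_face"
    if overlap < 0.3:
        return "negative"
    return None  # overlaps outside the stated ranges are not used


# Sample ratios per stage, as stated in the text.
STAGE_RATIOS = {
    1: {"positive": 1, "negative": 3, "part_face": 1},
    2: {"positive": 1, "negative": 2, "part_face": 1},
    3: {"positive": 1, "negative": 1, "part_face": 1, "landmark": 1},
}
```

Crops whose overlap falls between 0.3 and 0.5 match none of the stated ranges, so this sketch returns `None` for them rather than guessing a category.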

When training the first-stage network, image blocks of size 10x10 are cropped at random from the large labeled images and used as its input. When training the second- and third-stage networks, the previously trained networks are run as face detectors on the labeled data to obtain the input data for the next stage.

For the three different tasks of face classification, face region position regression, and facial landmark localization, the invention uses different loss functions. Face classification uses the cross-entropy loss function, whose formula is as follows:

where p_i denotes the probability that the sample is a positive sample (a face), and the ground-truth label takes the value 0 or 1, indicating whether the sample's actual label is face or non-face.
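For reference, the standard binary cross-entropy consistent with these symbol definitions can be written as below. This is an assumed form, not a reproduction of the patent's own formula; the label symbol y_i and the convention that y_i = 1 marks a face are assumptions.

```latex
L_i^{\mathrm{cls}} = -\bigl( y_i \log p_i + (1 - y_i) \log (1 - p_i) \bigr),
\qquad y_i \in \{0, 1\}
```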

Face region position regression and facial landmark localization both use the Euclidean loss function, whose formula is as follows:

To further improve face classification, the invention uses a center loss function (center loss) to further separate face samples from non-face samples in the high-dimensional feature space. Experiments also confirm that adding center loss to the face classification task of each stage further improves classification ability. The center loss formula is as follows:

To improve face recall, during training the invention combines the above losses into the output with different weights:

where α_i denotes the weight of each loss and β_i corresponds to the sample type.
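The weighted combination can be illustrated with the standard forms of these three losses. The sketch below is an assumption-laden illustration: the exact formulas, the weight values in `alpha`, and the use of `beta` as a 0/1 indicator of which losses apply to a sample type are all assumed rather than taken from the text, and every name is hypothetical.

```python
import math


def cross_entropy(p_face, is_face):
    """Binary cross-entropy for one sample; p_face is the face probability."""
    eps = 1e-12  # guards against log(0)
    p = min(max(p_face, eps), 1.0 - eps)
    return -(is_face * math.log(p) + (1 - is_face) * math.log(1.0 - p))


def euclidean(predicted, target):
    """Squared Euclidean distance between predicted and target vectors."""
    return sum((a - b) ** 2 for a, b in zip(predicted, target))


def center_loss(feature, class_center):
    """Half the squared distance from a feature to its class center."""
    return 0.5 * euclidean(feature, class_center)


def total_loss(losses, alpha, beta):
    """Weighted sum: alpha[k] weights loss k; beta[k] (0 or 1) switches
    loss k on or off for the current sample type."""
    return sum(alpha[k] * beta[k] * losses[k] for k in losses)


losses = {
    "cls": cross_entropy(0.9, 1),
    "box": euclidean([0.1, 0.0, -0.1, 0.2], [0.0, 0.0, 0.0, 0.0]),
    "center": center_loss([1.0, 2.0], [1.0, 1.0]),
}
alpha = {"cls": 1.0, "box": 0.5, "center": 0.01}    # ASSUMED weights
beta_positive = {"cls": 1, "box": 1, "center": 1}   # positives use all three
beta_negative = {"cls": 1, "box": 0, "center": 1}   # negatives skip regression

total_positive = total_loss(losses, alpha, beta_positive)
total_negative = total_loss(losses, alpha, beta_negative)
```

Switching the regression term off for negatives mirrors the statement that only positives and part faces feed the face region position regression task.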

In training a face detector, mining hard face negative samples (hard negative mining) for use as training samples has long been an effective way to improve detection ability. The invention uses two ways of mining face negative samples at the same time: an online mode and an offline mode. The online mode is implemented as follows: in the gradient backpropagation stage of each training batch, gradients are computed only for those face negative samples whose loss exceeds a set threshold, and the remaining face negative samples are ignored. Preferably, the invention computes gradients only for the 70% of face negative samples with the largest loss and ignores the remaining 30% of easy face negative samples. In the offline stage, the invention uses a hard-sample strategy: the trained model is used to obtain misclassified face negative samples, and from these a batch with relatively low scores is selected and added to the training samples with a predetermined probability. Taking 0 as the threshold, the invention preferably selects from the misclassified face negative samples the batch scoring between 0 and 0.5 and adds it, with a probability of 40%, to the face negative samples currently used for training, after which training of the model continues. When training the first-stage network, this method substantially improves the classification ability of the network, so that the first-stage network filters out most face negative samples. This effectively reduces the number of samples that the second- and third-stage networks must process and improves both detection ability and running speed. Experiments show that both methods improve the classification ability of the networks. The probabilities and numerical values above are only preferred choices; the invention is not limited to them.
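The online mode amounts to masking out the easiest negatives before backpropagation. A minimal sketch, assuming per-sample losses are already available for the negatives in a batch and that the 70% cut is made by rank; the function name is hypothetical.

```python
def online_hard_negative_mask(negative_losses, keep_ratio=0.7):
    """Return a 0/1 mask that keeps the keep_ratio fraction of negative
    samples with the largest loss; masked-out samples contribute no
    gradient."""
    count = len(negative_losses)
    keep = int(round(count * keep_ratio))
    # Indices of the negatives, hardest (largest loss) first.
    order = sorted(range(count), key=lambda i: negative_losses[i],
                   reverse=True)
    mask = [0] * count
    for index in order[:keep]:
        mask[index] = 1
    return mask


batch_losses = [0.1, 0.9, 0.5, 0.3, 0.8, 0.2, 0.7, 0.05, 0.6, 0.4]
mask = online_hard_negative_mask(batch_losses)
# The three smallest losses (0.05, 0.1, 0.2) are ignored.
print(mask)
```

In a training framework the mask would multiply the per-sample loss before it is summed, so that ignored samples produce zero gradient.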

After training is complete, face detection is performed on the image to be detected using the trained multi-stage convolutional neural networks. The overall flow of the detection stage is as follows:

For each image to be detected, an image pyramid is first built by resizing, and each level of the pyramid is used as input to the first-stage network. In the first-stage network, the fully connected layer is converted so that the network becomes fully convolutional; it can then accept input of any scale and directly output a confidence map of face region positions, from which the corresponding face region positions are computed.
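One common way to choose the pyramid levels is to shrink the image by a fixed factor until it is smaller than the network's input window. The sketch below follows that approach under stated assumptions: the minimum face size of 20 pixels and the scale factor of 0.709 are illustrative defaults that do not appear in the text, and only the 10-pixel window comes from the first-stage input size.

```python
def pyramid_scales(height, width, min_face=20, window=10, factor=0.709):
    """Scales at which to resize an image so that faces of at least
    min_face pixels map onto the window-sized network input.

    min_face and factor are ASSUMED values, not taken from the text.
    """
    scales = []
    scale = window / min_face               # maps min_face pixels to window
    min_side = min(height, width) * scale
    while min_side >= window:               # stop when the image is too small
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


for s in pyramid_scales(480, 640):
    print(round(s, 4), int(480 * s), int(640 * s))
```

Each printed row gives a scale and the resized image height and width that would be fed to the first-stage network.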

The face regions output by the first-stage network are then clustered and merged. The invention uses an improved non-maximum suppression method, top-K NMS, for this step: among regions identified as the same face, the K highest-scoring regions are retained each time and the lower-scoring regions, those ranked after the K-th, are merged into them, which further safeguards face recall. The merged face regions are position-corrected and then scaled to the input size of the second-stage network.
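One possible reading of top-K NMS is a greedy non-maximum suppression that keeps up to K boxes per cluster of overlapping detections instead of one. The sketch below implements that reading only; treating "merged" as "absorbed and dropped", the IoU threshold of 0.5, and the default K are assumptions, and the function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def top_k_nms(boxes, scores, k=2, iou_threshold=0.5):
    """Greedy NMS variant: for each cluster of boxes overlapping the
    current best box, keep the k highest-scoring members and drop the
    rest. Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        best, rest = order[0], order[1:]
        cluster = [best] + [i for i in rest
                            if iou(boxes[best], boxes[i]) > iou_threshold]
        kept.extend(cluster[:k])            # cluster is already score-sorted
        members = set(cluster)
        order = [i for i in rest if i not in members]
    return kept


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (0, 1, 10, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7, 0.6]
print(top_k_nms(boxes, scores, k=2))
```

With `k=1` the function reduces to ordinary greedy NMS; larger values pass more candidates per face to the next stage, which is consistent with the stated aim of protecting recall.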

After the scaled images are processed by the second-stage network, the face regions that pass the threshold have their positions adjusted and are then processed with top-K NMS. The third-stage network follows the same procedure as the second-stage network except that its input size is 48*48*3, and it additionally outputs the facial landmark information.

When sample scores are ranked for non-maximum suppression, the invention considers not only the score of the current network but also accumulates the score of the preceding stage. Specifically, the score is computed as follows. Let S1, S2, and S3 be the classification scores of the first-, second-, and third-stage networks. In order, the score used after the first stage is T1 = S1, the score after the second stage is T2 = S2 + S1*0.5, and the score after the third stage is T3 = S3 + T2*0.5. This fully takes into account the weighted scores of all three stages, and experiments show that it improves the accuracy of face detection.
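The score accumulation is a short recurrence, sketched below with a hypothetical function name. Expanding it gives T3 = S3 + 0.5*S2 + 0.25*S1, so earlier stages contribute with geometrically decreasing weight.

```python
def cascade_scores(s1, s2, s3, carry=0.5):
    """Weighted scores after each stage: T1 = S1, T2 = S2 + 0.5*T1,
    T3 = S3 + 0.5*T2."""
    t1 = s1
    t2 = s2 + carry * t1
    t3 = s3 + carry * t2
    return t1, t2, t3


t1, t2, t3 = cascade_scores(1.0, 0.5, 0.25)
print(t1, t2, t3)
```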

According to another aspect of the invention, a face detection device based on a multi-task cascaded convolutional neural network is also provided. The device of this embodiment corresponds to the method embodiment above. Specifically, referring to Fig. 5, it includes:

an establishing module 400, for establishing cascaded multi-stage convolutional neural networks;

a training module 500, for training the multi-stage convolutional neural networks, where the training module 500 uses face positive samples, face negative samples, partial faces, and facial landmark samples as training samples to train the networks to learn the tasks of face classification, face region position regression, and facial landmark localization, and where, in the training stage, the training module 500 mines hard face negative samples using an online mode and an offline mode in combination and uses them as training samples;

a detection module 600, for performing face detection on an image to be detected using the trained multi-stage convolutional neural networks.

The online mode is implemented as follows: in the gradient backpropagation stage of each training batch, gradients are computed only for the 70% of face negative samples with the largest loss, and the remaining 30% of easy face negative samples are ignored. The offline mode is specifically: the trained model is used to obtain misclassified face negative samples; from these, the batch scoring between 0 and 0.5 is selected and added, with a probability of 40%, to the face negative samples currently used for training, and training of the model continues.
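The offline selection can be sketched as a filter followed by random sampling. The sketch assumes that a negative sample counts as misclassified when its score exceeds the lower bound (the decision threshold of 0 mentioned in the description) and that the 40% probability is applied independently to each selected sample; both are interpretations, and the function name is hypothetical.

```python
import random


def offline_hard_negatives(scores, low=0.0, high=0.5,
                           add_probability=0.4, rng=None):
    """Return indices of negative samples to add to the training set.

    scores: classifier scores for negative samples. A score above `low`
    is treated as a misclassification (ASSUMED), and only those no
    higher than `high` are candidates. Each candidate is then kept with
    probability add_probability (applied per sample, ASSUMED).
    """
    rng = rng or random.Random()
    candidates = [i for i, s in enumerate(scores) if low < s <= high]
    return [i for i in candidates if rng.random() < add_probability]


negative_scores = [-0.3, 0.1, 0.45, 0.8, 0.0, 0.5]
picked = offline_hard_negatives(negative_scores, rng=random.Random(0))
print(picked)
```

Passing a seeded `random.Random` makes the selection repeatable, which is convenient when comparing training runs.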

In the multi-stage convolutional neural networks established by the establishing module 400, each stage includes, connected in sequence, multiple convolutional layers, at least one max-pooling layer, and one fully connected layer. In each later stage, the feature output by its fully connected layer is combined with the feature output by the fully connected layer of the preceding stage, and different fully connected sub-layers then correspond to the learning of different tasks.

The method for detecting human face and device based on multitask concatenated convolutional neutral net of the present invention, with advantages below：

(1) the input size and network structure of first order convolutional neural networks are adjusted, the speed of service greatly improved.

(2) while by the way of online and offline two kinds are excavated difficult face negative sample, increasing first order convolutional Neural Network is effectively reduced at second level convolutional neural networks and third level convolutional neural networks to the elimination ability of face negative sample The number of samples of reason, improves detectability and the speed of service.

(3) face classification task in the training process adds center loss constraint, widens face and non-face Class spacing, be conducive to improve classification capacity.

(4) different scale feature is combined, improves the ability to express of feature.Specifically, second level convolutional neural networks The feature representation of first order convolutional neural networks is added, third level convolutional neural networks add first order convolutional neural networks With the feature representation of second level convolutional neural networks, the robustness of different scale is enhanced, second level convolutional neural networks are improved With the ability of the classification of third level convolutional neural networks.

(5) After processing by each stage of convolutional neural network, the present invention uses a top-K non-maximum suppression (top-K NMS) method to cluster and merge face region positions and to adjust the face region positions, which further increases the accuracy of face detection. During non-maximum suppression, the strategy of combining top-K NMS with the weighted scores of the convolutional neural networks at each stage improves the recall rate of faces.
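As a non-limiting illustration of item (5), the following Python sketch shows one possible reading of top-K NMS. The text does not spell out the merging rule, so the rule used here is an assumption: boxes are clustered greedily by IoU, and each cluster is replaced by the score-weighted average of its K highest-scoring boxes. The function names `iou` and `top_k_nms`, the `[x1, y1, x2, y2]` box layout, and the default values of `iou_threshold` and `k` are likewise hypothetical.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2].

    Assumes every box has positive area.
    """
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def top_k_nms(boxes, scores, iou_threshold=0.5, k=3):
    """Greedy IoU clustering; each cluster is merged from its top-k boxes.

    Assumes all scores are positive (they are used as averaging weights).
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)  # indices sorted by descending score
    merged_boxes, merged_scores = [], []
    while order.size > 0:
        best = order[0]
        overlaps = iou(boxes[best], boxes[order])
        # Cluster = all remaining boxes overlapping the best one (best included).
        cluster = order[overlaps >= iou_threshold]
        top = cluster[:k]  # already sorted by score
        weights = scores[top] / scores[top].sum()
        # Adjusted position: score-weighted average of the top-k boxes.
        merged_boxes.append((boxes[top] * weights[:, None]).sum(axis=0))
        merged_scores.append(scores[best])
        order = order[overlaps < iou_threshold]
    return np.array(merged_boxes), np.array(merged_scores)

example_boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
example_scores = [0.9, 0.6, 0.8]
merged, kept_scores = top_k_nms(example_boxes, example_scores)
```

In this sketch the `scores` argument can be any per-box score; the weighted scores of the cascaded stages could be passed in directly.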

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the invention shall fall within the scope of protection of the present invention.

Claims

1. A face detection method based on a multi-task cascaded convolutional neural network, characterized by comprising:

establishing a cascaded multi-stage convolutional neural network;

training the multi-stage convolutional neural network, using face positive samples, face negative samples, partial-face samples and face key-point samples as training samples, to learn the tasks of face classification, face region position regression and face key-point localization;

performing face detection on an image to be detected using the trained multi-stage convolutional neural network;

wherein, in the training stage, hard face negative samples mined by a combination of an online mode and an offline mode are used as the training samples.

2. The face detection method based on a multi-task cascaded convolutional neural network according to claim 1, characterized in that the online mode is: in the gradient back-propagation stage of each training batch, computing only the gradients of those face negative samples whose loss is greater than a set threshold, and ignoring the remaining face negative samples.
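As a non-limiting illustration of the online mode, the following Python sketch builds a per-sample mask for one training batch. The names `online_hard_negative_mask` and `masked_mean_loss` and the threshold value 0.5 are hypothetical. The sketch also assumes that samples which are not face negative samples always keep their gradients, since the claim only restricts the face negative samples.

```python
import numpy as np

def online_hard_negative_mask(losses, is_negative, loss_threshold):
    """Return a 0/1 mask selecting the samples whose gradients are kept.

    losses: per-sample classification loss, shape (N,)
    is_negative: boolean array, True for face negative samples, shape (N,)
    loss_threshold: hypothetical value; the text only says "a set threshold"
    """
    losses = np.asarray(losses, dtype=float)
    is_negative = np.asarray(is_negative, dtype=bool)
    hard_negative = is_negative & (losses > loss_threshold)
    # Assumption: non-negative samples are always kept.
    keep = hard_negative | ~is_negative
    return keep.astype(float)

def masked_mean_loss(losses, mask):
    """Average the loss over kept samples only; masked samples get zero gradient."""
    losses = np.asarray(losses, dtype=float)
    kept = mask.sum()
    return float((losses * mask).sum() / kept) if kept > 0 else 0.0

batch_losses = np.array([0.05, 1.20, 0.30, 0.90])
batch_is_negative = np.array([True, True, False, True])
mask = online_hard_negative_mask(batch_losses, batch_is_negative, loss_threshold=0.5)
batch_loss = masked_mean_loss(batch_losses, mask)
```

Here the first sample is an easy face negative sample (loss below the threshold), so it is masked out and contributes nothing to the batch loss.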

3. The face detection method based on a multi-task cascaded convolutional neural network according to claim 1, characterized in that the offline mode is: obtaining misclassified face negative samples by means of a trained model, selecting from the misclassified face negative samples a batch whose scores fall within a set range, and adding the selected samples, with a predetermined probability, to the face negative samples currently used for training so as to continue training the model.
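As a non-limiting illustration of the offline mode, the following Python sketch filters face negative samples that have already been scored by a trained model. The function name `select_offline_hard_negatives`, the `(sample_id, face_score)` pair layout, and the `decision_threshold` of 0.5 used to decide that a negative sample was misclassified as a face are all assumptions; the text gives no concrete score range or probability.

```python
import random

def select_offline_hard_negatives(negatives, score_low, score_high,
                                  add_probability, decision_threshold=0.5,
                                  rng=None):
    """Pick misclassified negatives whose score lies in [score_low, score_high].

    negatives: list of (sample_id, face_score) pairs from a trained model
    add_probability: chance that each qualifying sample is added
    """
    if rng is None:
        rng = random.Random()
    selected = []
    for sample_id, face_score in negatives:
        misclassified = face_score >= decision_threshold  # negative judged a face
        in_range = score_low <= face_score <= score_high
        # rng.random() is in [0, 1), so probability 1.0 always adds.
        if misclassified and in_range and rng.random() < add_probability:
            selected.append(sample_id)
    return selected

scored_negatives = [("a", 0.20), ("b", 0.60), ("c", 0.95), ("d", 0.80)]
always_added = select_offline_hard_negatives(scored_negatives, 0.5, 0.9, 1.0)
never_added = select_offline_hard_negatives(scored_negatives, 0.5, 0.9, 0.0)
```

The selected identifiers would then be appended to the pool of face negative samples used for the next round of training.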

4. The face detection method based on a multi-task cascaded convolutional neural network according to claim 1, characterized in that each stage of the multi-stage convolutional neural network includes a plurality of convolutional layers, at least one max pooling layer and one fully connected layer connected in sequence, and the feature output by the fully connected layer of a later-stage convolutional neural network is combined with the feature output by the fully connected layer of the preceding-stage convolutional neural network and then mapped to the learning of different tasks through different fully connected sub-layers.

5. The face detection method based on a multi-task cascaded convolutional neural network according to claim 4, characterized in that the multi-stage convolutional neural network includes three stages of convolutional neural networks, wherein:

the first-stage convolutional neural network uses an image of size 10*10*3 as input and includes a plurality of convolutional layers, one max pooling layer and one fully connected layer; after convolution and pooling, the 10*10*3 image becomes a feature of size 1*1*32, which is then mapped to different tasks through two different fully connected sub-layers;

the second-stage convolutional neural network uses an image of size 24*24*3 as input and includes a plurality of convolutional layers, two max pooling layers and one fully connected layer with an output length of 128; the feature output by the fully connected layer of this stage is concatenated with the 1*1*32 feature output by the first-stage convolutional neural network to form a feature of length 160, which is then mapped to the learning of different tasks through two different fully connected sub-layers;

the third-stage convolutional neural network uses an image of size 48*48*3 as input and includes a plurality of convolutional layers, three max pooling layers and one fully connected layer with an output length of 256; the feature output by the fully connected layer of this stage is concatenated with the feature of length 160 output by the second-stage convolutional neural network to form a feature of length 416, which is then mapped to the learning of different tasks through three different fully connected sub-layers.
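The feature lengths recited in claim 5 can be checked with a short Python sketch. Only the lengths 32, 128 and 256 come from the text; the helper name `concat_stage_features` and the random placeholder features are hypothetical and stand in for real network outputs.

```python
import numpy as np

# Fully connected output lengths recited for the three stages.
STAGE1_LEN, STAGE2_LEN, STAGE3_LEN = 32, 128, 256

def concat_stage_features(own_fc_output, previous_feature):
    """Flatten both features and join them into one vector."""
    return np.concatenate([np.ravel(own_fc_output), np.ravel(previous_feature)])

rng = np.random.default_rng(0)  # placeholder values, not real activations
stage1_feature = rng.standard_normal((1, 1, STAGE1_LEN))        # 1*1*32
stage2_feature = concat_stage_features(
    rng.standard_normal(STAGE2_LEN), stage1_feature)            # 128 + 32
stage3_feature = concat_stage_features(
    rng.standard_normal(STAGE3_LEN), stage2_feature)            # 256 + 160

print(stage2_feature.shape, stage3_feature.shape)
```

Because the third stage concatenates the already combined second-stage feature, its input to the task sub-layers carries the representations of both earlier stages.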

6. The face detection method based on a multi-task cascaded convolutional neural network according to claim 1, characterized in that, in the training stage of the cascaded multi-stage convolutional neural network, the input samples of the earlier stages are face positive samples, face negative samples and partial-face samples, face key-point samples are added at the last-stage convolutional neural network, and the ratio of face positive samples to face negative samples is increased at each later stage of convolutional neural network.

7. The face detection method based on a multi-task cascaded convolutional neural network according to claim 1, characterized in that, in the training stage, the face classification task uses a cross-entropy loss function, the face region position regression and face key-point localization tasks use a Euclidean loss function, the face classification task additionally uses a center loss function, and the cross-entropy loss function, the Euclidean loss function and the center loss function are combined for output with different weights.
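As a non-limiting illustration of the weighted combination in claim 7, the following Python sketch evaluates the three kinds of loss and sums them with per-task weights. The function names, the 0.5 scaling of the squared-distance losses, and the weight values are assumptions; the text states only that different weights are used.

```python
import numpy as np

def cross_entropy_loss(probs, labels):
    """Mean negative log-probability of the true class (face / non-face)."""
    probs = np.asarray(probs, dtype=float)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def euclidean_loss(pred, target):
    """Half the mean squared L2 distance, for box or key-point regression."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(0.5 * np.mean(np.sum(diff ** 2, axis=1)))

def center_loss(features, labels, centers):
    """Half the mean squared distance of each feature to its class center."""
    diff = np.asarray(features, dtype=float) - np.asarray(centers, dtype=float)[labels]
    return float(0.5 * np.mean(np.sum(diff ** 2, axis=1)))

def total_loss(cls, box, landmark, center, weights):
    """Weighted sum of the task losses; the weights are hypothetical."""
    return (weights["cls"] * cls + weights["box"] * box
            + weights["landmark"] * landmark + weights["center"] * center)

example_weights = {"cls": 1.0, "box": 0.5, "landmark": 0.5, "center": 0.1}
combined = total_loss(1.0, 2.0, 3.0, 4.0, example_weights)
```

The center loss pulls the features of each class toward that class's center, which is how it widens the distance between the face and non-face classes.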

8. The face detection method based on a multi-task cascaded convolutional neural network according to claim 1, characterized in that, in the detection stage, after the image is processed by each stage of convolutional neural network, for the same identified face region, a non-maximum suppression method is used to cluster and merge the face regions ranked in the top K by sample score with the face regions ranked after the K-th, and the face region position is adjusted.

9. The face detection method based on a multi-task cascaded convolutional neural network according to claim 8, characterized in that, when sample scores are sorted in the non-maximum suppression method, the weighted score of the multi-stage convolutional neural network is used as the sample score.

10. The face detection method based on a multi-task cascaded convolutional neural network according to claim 9, characterized in that the multi-stage convolutional neural network includes three stages of convolutional neural networks, and the weighted score is computed as follows: with the classification score of the first-stage convolutional neural network denoted S1, that of the second-stage convolutional neural network denoted S2, and that of the third-stage convolutional neural network denoted S3, the score used after a sample passes through the first-stage convolutional neural network is T1 = S1, the score after the second-stage convolutional neural network is T2 = S2 + S1*0.5, and the score after the third-stage convolutional neural network is T3 = S3 + T2*0.5.
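The score recurrence of claim 10 can be written as a short Python sketch. The function name `cascaded_scores` and the parameter name `carry` are hypothetical; the factor 0.5 is taken from the claim.

```python
def cascaded_scores(s1, s2, s3, carry=0.5):
    """Return (T1, T2, T3) given the per-stage classification scores."""
    t1 = s1
    t2 = s2 + carry * t1  # T2 = S2 + S1*0.5
    t3 = s3 + carry * t2  # T3 = S3 + T2*0.5
    return t1, t2, t3

t1, t2, t3 = cascaded_scores(0.8, 0.6, 0.9)
print(t1, t2, t3)
```

Expanding the recurrence gives T3 = S3 + 0.5*S2 + 0.25*S1, so the influence of an earlier stage on the final score is halved at each later stage.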

11. A face detection device based on a multi-task cascaded convolutional neural network, characterized by comprising:

an establishing module, configured to establish a cascaded multi-stage convolutional neural network;

a training module, configured to train the multi-stage convolutional neural network, wherein the training module uses face positive samples, face negative samples, partial-face samples and face key-point samples as training samples to train the multi-stage convolutional neural network to learn the tasks of face classification, face region position regression and face key-point localization;

a detection module, configured to perform face detection on an image to be detected using the trained multi-stage convolutional neural network;

wherein, in the training stage, the training module uses hard face negative samples mined by a combination of an online mode and an offline mode as the training samples.

12. The face detection device based on a multi-task cascaded convolutional neural network according to claim 11, characterized in that the online mode is: in the gradient back-propagation stage of each training batch, computing only the gradients of those face negative samples whose loss is greater than a set threshold, and ignoring the remaining face negative samples; and the offline mode is: obtaining misclassified face negative samples by means of a trained model, selecting from the misclassified face negative samples a batch whose scores fall within a set range, and adding the selected samples, with a predetermined probability, to the face negative samples currently used for training so as to continue training the model.

13. The face detection device based on a multi-task cascaded convolutional neural network according to claim 11, characterized in that, in the multi-stage convolutional neural network established by the establishing module, each stage of convolutional neural network includes a plurality of convolutional layers, at least one max pooling layer and one fully connected layer connected in sequence, and the feature output by the fully connected layer of a later-stage convolutional neural network is combined with the feature output by the fully connected layer of the preceding-stage convolutional neural network and then mapped to the learning of different tasks through different fully connected sub-layers.