CN109086660A - Training method, device, and storage medium for a multi-task learning deep network - Google Patents

Training method, device, and storage medium for a multi-task learning deep network

Info

Publication number
CN109086660A
Authority
CN
China
Prior art keywords
task
task learning
layer
deep network
face region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810615755.0A
Other languages
Chinese (zh)
Inventor
李千目
练智超
侯君
朱虹
李良
宋佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Bowei Chuangsheng Technology Co Ltd
Original Assignee
Shenzhen Bowei Chuangsheng Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Bowei Chuangsheng Technology Co Ltd filed Critical Shenzhen Bowei Chuangsheng Technology Co Ltd
Priority to CN201810615755.0A priority Critical patent/CN109086660A/en
Publication of CN109086660A publication Critical patent/CN109086660A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification

Abstract

Embodiments of the invention disclose a training method, device, and storage medium for a multi-task learning deep network. In the embodiments, the multi-task learning deep network is trained simultaneously on four mutually related tasks: a feature point localization task, a feature point visibility prediction task, a face detection task, and a gender recognition task. By performing these mutually related tasks simultaneously within one multi-task learning deep network, the interaction between the related tasks is exploited so that the tasks improve each other's detection accuracy, improving the performance of the multi-task learning deep network.

Description

Training method, device, and storage medium for a multi-task learning deep network
Technical field
Embodiments of the present invention relate to the field of biometric recognition, and more particularly to a training method, training device, and storage medium for a multi-task learning deep network.
Background art
Face recognition is a major problem in computer vision, and its more important aspects include face detection, facial feature point recognition, and facial feature point localization. Many visual tasks depend on accurate facial feature point localization, such as face recognition, facial expression analysis, and facial animation. Although the problem has been studied extensively in recent years with a degree of success, factors such as partial occlusion, illumination changes, large head rotations, and exaggerated expressions make facial images complex and diverse, so facial feature point localization still faces many problems and challenges.
In the prior art, facial feature point localization methods fall roughly into two categories: conventional methods and methods based on deep learning. Typical conventional methods include model-based methods and regression-based methods. Model-based methods learn shape increments given an average initial shape; for example, the Active Shape Model (ASM) and the Active Appearance Model (AAM) use statistical models such as Principal Component Analysis (PCA) to capture shape and appearance variation respectively. However, because a single linear model can hardly describe the complex nonlinear variation of real-world scene data, model-based conventional methods cannot obtain accurate shapes for facial images with large head pose changes or exaggerated expressions. Regression-based conventional methods instead predict key point positions by training an appearance model. Some researchers predict shape increments by linear regression on Scale-Invariant Feature Transform (SIFT) features. Other researchers propose learning a cascade of shapes by gradual refinement, using pixel intensity differences as features to learn a sequence of random ferns, regressing all parameters simultaneously so as to exploit shape constraints effectively. In short, regression-based methods iteratively refine the predicted feature point positions starting from an initial estimate, so the final result depends heavily on the initialization.
Several deep-learning-based methods currently exist. Sun et al. proposed a facial feature point localization method using a three-level cascaded convolutional neural network architecture, in which a convolutional neural network (CNN) regresses 5 facial feature points (the left and right eyes, the nose, and the left and right mouth corners), while CNNs at different levels jointly refine the feature points. Zhang et al. proposed a coarse-to-fine deep nonlinear feature point localization method, Coarse-to-Fine Auto-encoder Networks (CFAN), which implements a nonlinear regression model with successive auto-encoder networks. Both methods use multiple deep networks to locate feature points gradually through cascading: they search for the optimal feature point positions of each image from coarse to fine and show higher precision than previous feature point localization methods, yet they cannot handle occlusion effectively. In addition, because multiple convolutional neural network structures are used, the time cost of locating all points grows with the number of facial feature points. In real unconstrained environments, facial feature point localization is in fact not an isolated task: it is also disturbed by various factors, such as head swing and gender difference, all of which affect the accuracy of feature point localization.
Summary of the invention
The technical problem mainly solved by the embodiments of the present invention is to provide a training method for a multi-task learning deep network that can improve the performance of the multi-task learning deep network.
To solve the above technical problem, one technical solution adopted by the embodiments of the present invention is to provide a training method for a multi-task learning deep network, the training method comprising:
inputting a training set into the multi-task learning deep network for multi-task learning, and outputting the prediction results of the multi-task learning, wherein the multi-task learning comprises a feature point localization task, a feature point visibility prediction task, a face detection task, and a gender recognition task;
comparing the prediction results with the label results in the training set, and obtaining loss values corresponding to the multi-task learning according to the comparison results;
feeding the loss values back into the multi-task learning deep network to correct the multi-task learning deep network.
To solve the above technical problem, another technical solution adopted by the embodiments of the present invention is to provide a training device for a multi-task learning deep network, the training device comprising:
a memory and a processor connected to each other;
the memory stores the training set, the constructed multi-task learning deep network, and program data;
the processor is configured to execute the above training method according to the program data, so as to train the multi-task learning deep network with the training set.
To solve the above technical problem, a further technical solution adopted by the embodiments of the present invention is to provide a storage medium storing program data, the program data being executable to implement the above training method for a multi-task learning deep network.
The beneficial effects of the embodiments of the present invention are as follows: in the training method of the multi-task learning deep network of the embodiments, a training set is input into the multi-task learning deep network for multi-task learning, and the prediction results of the multi-task learning are output; the prediction results are compared with the label results in the training set, and loss values corresponding to the multi-task learning are obtained according to the comparison results; the loss values are fed back into the multi-task learning deep network to correct it. Here, the multi-task learning comprises a feature point localization task, a feature point visibility prediction task, a face detection task, and a gender recognition task. By performing these mutually related tasks simultaneously in the multi-task learning deep network, this embodiment exploits the interaction between the related tasks so that they improve each other's detection accuracy, improving the performance of the multi-task learning deep network.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a first embodiment of the training method for the multi-task learning deep network of the present invention;
Fig. 2 is a schematic flowchart of an embodiment of step S101 in Fig. 1;
Fig. 3 is a schematic flowchart of another embodiment of step S101 in Fig. 1;
Fig. 4 is a schematic structural diagram of an embodiment of the multi-task learning deep network of the present invention;
Fig. 5 is a schematic flowchart of a second embodiment of the training method for the multi-task learning deep network of the present invention;
Fig. 6 is a schematic flowchart of a third embodiment of the training method for the multi-task learning deep network of the present invention;
Fig. 7 is a schematic flowchart of a further embodiment of step S101 in Fig. 6;
Fig. 8 is a schematic structural diagram of an embodiment of the training device for the multi-task learning deep network of the present invention;
Fig. 9 is a schematic structural diagram of another embodiment of the training device for the multi-task learning deep network of the present invention;
Fig. 10 is a schematic flowchart of a first embodiment of the test method for the multi-task learning deep network of the present invention;
Fig. 11 is a schematic flowchart of an embodiment of step S201 in Fig. 10;
Fig. 12 is a schematic structural diagram of an embodiment of the first-level neural network of the two-level cascaded convolutional neural network of the present invention;
Fig. 13 is a schematic structural diagram of an embodiment of the second-level neural network of the two-level cascaded convolutional neural network of the present invention;
Fig. 14 is a schematic flowchart of a second embodiment of the test method for the multi-task learning deep network of the present invention;
Fig. 15 is a schematic flowchart of a third embodiment of the test method for the multi-task learning deep network of the present invention;
Fig. 16 is a schematic flowchart of a fourth embodiment of the test method for the multi-task learning deep network of the present invention;
Fig. 17 is a schematic structural diagram of an embodiment of the test device for the multi-task learning deep network of the present invention;
Fig. 18 is a schematic structural diagram of an embodiment of the storage medium of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of a first embodiment of the training method for the multi-task learning deep network of the present invention. As shown in Fig. 1, the training method of the multi-task learning deep network of this embodiment comprises at least the following steps:
In step S101, a training set is input into the multi-task learning deep network for multi-task learning, and the prediction results of the multi-task learning are output.
In this embodiment, the images of the training set serve as the data source for training and are input into the preliminarily constructed multi-task learning deep network, which performs multi-task learning on the images contained in the training set and produces the prediction results of the multi-task learning.
In this embodiment, the multi-task learning comprises a feature point localization task, a feature point visibility prediction task, a face detection task, and a gender recognition task. The preliminarily constructed multi-task learning deep network can therefore output, for the faces in the images of the training set, the corresponding feature point localization results, feature point visibility prediction results, face detection results, and gender recognition results.
In this embodiment, the AFLW data set is taken as the training set with which the preliminarily constructed multi-task learning network is trained. The AFLW data set mostly contains facial images captured under natural conditions and carries a very large amount of information; every face in the AFLW data set is annotated with 21 feature points, and in addition the face bounding box, head pose, and gender are also annotated. The AFLW data set contains 25,993 manually annotated facial images, of which 41% are male and 59% female; most images are colour images and only a small fraction are grayscale. In this embodiment, most of the images in the AFLW data set are used as the training set of the multi-task learning deep network, while the small remaining fraction is reserved for testing the trained multi-task learning deep network, in order to judge whether the trained network meets the required accuracy.
In step S102, the prediction results are compared with the label results in the training set, and loss values corresponding to the multi-task learning are obtained according to the comparison results.
Step S101 yields the prediction result of each task performed by the preliminarily constructed multi-task learning deep network: the feature point localization task, the feature point visibility prediction task, the face detection task, and the gender recognition task. In this step, the obtained prediction results are compared with the label results on the images in the training set, thereby obtaining the loss value corresponding to the execution of each task of the multi-task learning.
In step S103, the loss values are fed back into the multi-task learning deep network to correct the multi-task learning deep network.
The loss value corresponding to each task characterizes the accuracy of that task in the multi-task learning. These loss values take part in backpropagation, giving the error of each preceding layer of the multi-task learning deep network; the network is corrected accordingly, and the corrected network is the trained multi-task learning deep network.
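To illustrate steps S101 to S103, the sketch below expresses one training iteration in PyTorch style; the framework choice, the function and tensor names, and the dict-based batch layout are assumptions for illustration and are not taken from the patent (the per-task loss functions are spelled out later in this description).

```python
import torch

def train_step(net, optimizer, images, targets, task_losses, lam):
    """One gradient step of the multi-task training loop (a sketch)."""
    optimizer.zero_grad()
    # S101: forward the training batch, one prediction per task
    preds = net(images)                      # dict: task name -> prediction
    # S102: compare predictions with the label results to get loss values
    losses = {t: task_losses[t](preds[t], targets[t]) for t in preds}
    total = sum(lam[t] * losses[t] for t in losses)
    # S103: feed the loss back to correct the network's weights
    total.backward()
    optimizer.step()
    return {t: v.item() for t, v in losses.items()}
```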
Referring further to Fig. 2, Fig. 2 is a schematic flowchart of an embodiment of step S101 in Fig. 1. As shown in Fig. 2, step S101 may comprise at least the following steps:
In step S1011, the training set is input stage by stage into the several cascaded convolutional and pooling layers of the multi-task learning deep network, and corresponding first operation results are output respectively from multiple convolutional layers and/or multiple pooling layers among those layers.
In this embodiment, the preliminarily constructed multi-task learning deep network is built by improving on the network structure of the AlexNet network.
The multi-task learning deep network contains several cascaded convolutional and pooling layers that apply convolution or pooling operations to their input data; in this embodiment, the pooling operation of the pooling layers is max pooling. Each convolutional layer and each pooling layer produces a corresponding operation result through its operation, and these operation results are the corresponding feature information. In this embodiment, some convolutional layers and/or pooling layers are selected from the cascade as output layers of operation results: the corresponding operation results of multiple convolutional layers and/or multiple pooling layers are extracted from the cascade, and the extracted operation results are taken as the first operation results.
Further, since the feature information contained in the operation result output by each convolutional and pooling layer is not identical, extracting the corresponding operation results of multiple convolutional layers and/or pooling layers ensures that the resulting first operation results satisfy the amount of information required by the multi-task learning.
In the multi-task learning deep network, the operation results output by the shallower convolutional and pooling layers contain more edge information and corner information, which benefits the learning of the feature point localization task; the operation results output by the deeper convolutional and pooling layers contain more global information, which better suits the learning of more complex tasks such as face detection and gender recognition. Hence, in this embodiment, the selected multiple convolutional layers and/or pooling layers include at least some shallower convolutional and/or pooling layers and some deeper ones, so that the obtained first operation results contain enough edge and corner information as well as a certain amount of global information, allowing the extracted information to serve the multi-task learning well. The exact layers to extract are adjusted according to the final prediction results, so as to avoid the first operation results carrying an excessive amount of information.
In step S1012, the first operation results are input into a feature-fusion fully connected layer, which outputs feature-fusion data.
After the corresponding first operation results have been extracted, the amount of information carried by the multiple convolutional and/or pooling layers is too large for direct multi-task learning; the multiple extracted first operation results need feature fusion, which maps them to a subspace and thereby improves the network performance.
In this embodiment, the multiple corresponding first operation results obtained in step S1011 are output to the feature-fusion fully connected layer of the multi-task learning deep network, which performs feature fusion on the input first operation results and outputs the feature-fusion data.
In step S1013, the feature-fusion data is input into the fully connected layer corresponding to each task of the multi-task learning, the learning of each task is carried out respectively, and the prediction result of each task is output respectively.
The feature-fusion data obtained after feature fusion is further input into the fully connected layer corresponding to each task of the multi-task learning deep network; the task-specific fully connected layers classify the features of the input feature-fusion data and link to the branch corresponding to each task, thereby producing the prediction result of each task.
Referring further to Fig. 3, as shown in Fig. 3, in another embodiment of step S101, after step S1011 obtains the corresponding first operation results output respectively by the multiple convolutional layers and/or pooling layers, the following steps may also be included:
In step S1014, at least part of the first operation results are input into corresponding sub-convolutional layers respectively, which output corresponding second operation results of identical dimensions.
In this embodiment, the sizes of the feature data (i.e. the feature maps) output by the cascaded convolutional and pooling layers of the multi-task learning deep network differ, so the corresponding first operation results output respectively from the multiple convolutional layers and/or multiple pooling layers in step S1011 differ in size and cannot be connected directly. Therefore, in this embodiment, at least part of the corresponding first operation results output by the convolutional and/or pooling layers obtained in step S1011 are input into corresponding sub-convolutional layers, where the kernel size of each sub-convolutional layer matches the size of the first operation result it receives, so as to obtain second operation results of identical dimensions.
It will be understood that the operation results output by the shallower convolutional and pooling layers are larger than those output by the deeper layers, and the shallower the layer, the larger the size of its operation result. Hence the size of the first operation result output by the deepest tapped convolutional or pooling layer can serve as the standard size to which the others are adjusted, and the outputs of the preceding convolutional or pooling layers are adjusted to that standard size. For example, if the first operation result output by the deepest convolutional or pooling layer has size 6x6x256, the first operation results output by the convolutional or pooling layers preceding it are adjusted to 6x6x256.
In step S1015, the second operation results of identical dimensions are input into a full convolutional layer, which outputs third operation results after dimension reduction.
In this embodiment, the second operation results of identical dimensions are output into one full convolutional layer whose kernel is 1x1; this layer reduces the dimensionality of the second operation results and outputs the third operation results after dimension reduction. The third operation results are then input, as the first operation results, into the feature-fusion fully connected layer, and steps S1012 and S1013 are continued.
Further, referring to Fig. 4, Fig. 4 is a schematic structural diagram of an embodiment of the multi-task learning deep network of the present invention. As shown in Fig. 4, the multi-task learning deep network of this embodiment (inside the dashed box) first comprises several cascaded convolutional and pooling layers; in this embodiment, each pooling layer can additionally pass through regularization. Following the cascade order, they are denoted the first convolutional layer (conv1), the first pooling layer (pool1), the second convolutional layer (conv2), the second pooling layer (pool2), and so on; this embodiment cascades up to the fifth pooling layer (pool5). The training set is input into the above cascaded convolutional and pooling layers, and the corresponding first pooling operation result, first convolution operation result, and second pooling operation result are output from the first pooling layer, the third convolutional layer, and the fifth pooling layer. In this embodiment, the operation kernels of the first pooling layer, the third convolutional layer, and the fifth pooling layer are 3x3; correspondingly, the first pooling operation result, the first convolution operation result, and the second pooling operation result have sizes 27x27x96, 13x13x384, and 6x6x256 respectively. Taking the size of the second pooling operation result output by the fifth pooling layer (6x6x256) as the standard for size adjustment, the 27x27x96 first pooling operation result is input into a sub-convolutional layer (conv1a) with a 4x4 kernel, and the 13x13x384 first convolution operation result is input into a sub-convolutional layer (conv3a) with a 2x2 kernel; conv1a and conv3a thus adjust the first pooling operation result and the first convolution operation result to 6x6x256, and the adjusted operation results of identical dimensions are taken as the second operation results. Still referring to Fig. 4, the second operation results are input into a full convolutional layer (conv_all) with a 1x1 kernel, which reduces their dimensionality and yields a third operation result of size 6x6x192. The third operation result is then input into the feature-fusion fully connected layer (fc_full) of a 3072-dimensional feature vector, which is in turn linked to the fully connected layer corresponding to each task (the feature point localization task, the feature point visibility prediction task, the face detection task, and the gender recognition task); the dimension of each task's fully connected layer is 512, and the learning and training of each task is carried out there.
Through the above multi-task learning deep network, the present invention learns the feature point localization task, the feature point visibility prediction task, the face detection task, and the gender recognition task respectively. On one hand, adding to the multi-task learning deep network the feature point visibility prediction, face detection, and gender recognition tasks that are related to feature point localization improves the precision of feature point localization while enabling the execution of the other tasks. On the other hand, the multi-task learning deep network of this embodiment adopts feature fusion: the feature maps output by multiple convolutional and/or pooling layers are fused, thereby obtaining enough data for the feature point localization task. The multi-task learning deep network of this embodiment is robust to complex conditions in images such as pose variation, extreme illumination, exaggerated expressions, and partial occlusion, performs excellently, and achieves high precision and good performance.
Further, in the training method of the multi-task learning deep network of this embodiment, a nonlinear activation function is added after every convolutional layer and fully connected layer; this embodiment uses the Rectified Linear Unit (ReLU) as the activation function. Furthermore, the multi-task learning deep network of this embodiment adds no pooling operation in the fusion part of the network, because the features extracted by pooling are locally scale-invariant, a property that the feature point localization task does not need.
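The fusion architecture just described can be written down concretely. The sketch below follows Fig. 4 in PyTorch; the AlexNet trunk hyper-parameters (11x11 and 5x5 kernels, strides, channel counts of conv4/conv5) and the 256-channel outputs of conv1a/conv3a are assumptions where the text is silent, while the tap points, the 4x4/2x2/1x1 kernels, the 6x6x192 fused map, the 3072-dimensional fc_full layer, and the 512-dimensional task heads come from the description.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Sketch of the fused multi-task network of Fig. 4 (input 227x227x3)."""
    def __init__(self):
        super().__init__()
        # AlexNet-style trunk, conv1 .. pool5; ReLU after every conv layer
        self.conv1 = nn.Sequential(nn.Conv2d(3, 96, 11, stride=4), nn.ReLU())
        self.pool1 = nn.MaxPool2d(3, stride=2)                       # -> 27x27x96
        self.conv2 = nn.Sequential(nn.Conv2d(96, 256, 5, padding=2), nn.ReLU())
        self.pool2 = nn.MaxPool2d(3, stride=2)                       # -> 13x13x256
        self.conv3 = nn.Sequential(nn.Conv2d(256, 384, 3, padding=1), nn.ReLU())
        self.conv4 = nn.Sequential(nn.Conv2d(384, 384, 3, padding=1), nn.ReLU())
        self.conv5 = nn.Sequential(nn.Conv2d(384, 256, 3, padding=1), nn.ReLU())
        self.pool5 = nn.MaxPool2d(3, stride=2)                       # -> 6x6x256
        # sub-convolutions that bring the shallow taps to the 6x6 standard size
        self.conv1a = nn.Sequential(nn.Conv2d(96, 256, 4, stride=4), nn.ReLU())
        self.conv3a = nn.Sequential(nn.Conv2d(384, 256, 2, stride=2), nn.ReLU())
        # 1x1 full convolution reduces the fused map to 6x6x192 (no pooling here)
        self.conv_all = nn.Sequential(nn.Conv2d(256 * 3, 192, 1), nn.ReLU())
        self.fc_full = nn.Sequential(nn.Linear(6 * 6 * 192, 3072), nn.ReLU())

        def head(out_dim):       # one 512-d fully connected head per task
            return nn.Sequential(nn.Linear(3072, 512), nn.ReLU(),
                                 nn.Linear(512, out_dim))
        self.detection = head(2)    # face / non-face
        self.landmarks = head(42)   # 21 points x (a_i, b_i)
        self.visibility = head(21)  # one visibility factor per point
        self.gender = head(2)       # (p0, p1)

    def forward(self, x):
        p1 = self.pool1(self.conv1(x))                   # shallow tap
        c3 = self.conv3(self.pool2(self.conv2(p1)))      # middle tap
        p5 = self.pool5(self.conv5(self.conv4(c3)))      # deep tap
        fused = torch.cat([self.conv1a(p1), self.conv3a(c3), p5], dim=1)
        f = self.fc_full(self.conv_all(fused).flatten(1))
        return (self.detection(f), self.landmarks(f),
                self.visibility(f), self.gender(f))

# usage: det, pts, vis, gen = MultiTaskNet()(torch.randn(1, 3, 227, 227))
```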
Further, referring to Fig. 5, Fig. 5 is a schematic flowchart of a second embodiment of the training method for the multi-task learning deep network of the present invention; this embodiment is an improvement on the first training method embodiment shown in Figs. 1 to 3, and the structure of its multi-task learning deep network is as shown in Fig. 4. As shown in Fig. 5, before step S101, this embodiment may further comprise the following steps:
In step S104, an AlexNet network is trained on the face detection task, obtaining weights corresponding to the face detection task.
In this embodiment, before the multi-task learning deep network is trained, it needs to be initialized; the weights used for initialization are obtained by having an existing AlexNet network perform the face detection task. The AlexNet network is a neural network structure model proposed in 2012.
In step S105, the multi-task learning deep network is initialized with those weights.
In this embodiment, the multi-task learning deep network proposed by the present invention can be initialized according to the weights obtained in step S104.
When a deep network is trained from random starting values, its hidden-layer neurons may fall into a saturated state; a small adjustment of the weights then brings only an extremely faint change to the activation values of the hidden-layer neurons, this faint change also affects the remaining neurons of the network, and the resulting change of the cost function is correspondingly small, so that when the network runs the gradient descent algorithm these weights learn very slowly. Initializing the network by changing the distribution of the weights improves the network.
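A minimal sketch of steps S104 and S105, assuming the MultiTaskNet sketch above and a state dict from an AlexNet already trained on face detection; the name-and-shape matching scheme is an assumption for illustration.

```python
import torch

def init_from_face_alexnet(net, alexnet_state_dict):
    """Copy face-detection AlexNet weights into every matching layer."""
    own = net.state_dict()
    for name, tensor in alexnet_state_dict.items():
        if name in own and own[name].shape == tensor.shape:
            own[name].copy_(tensor)   # reuse the pretrained trunk weights
    net.load_state_dict(own)          # fusion and head layers keep their
                                      # randomly initialized values
```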
Further, referring to Fig. 6, Fig. 6 is a schematic flowchart of a third embodiment of the training method for the multi-task learning deep network of the present invention; this embodiment is an improvement on the first training method embodiment shown in Figs. 1 to 3. As shown in Fig. 6, before step S101, this embodiment may further comprise the following step:
In step S106, predicted face regions of the images in the training set are calculated.
In this embodiment, before the training set is input into the multi-task learning deep network, predicted face regions are calculated for the images of the training set through an RCNN network. In this embodiment, the algorithm used to calculate the predicted face regions is the selective search algorithm.
This embodiment can also be combined with the second training method embodiment shown in Fig. 5; it should be noted that there is no necessary order between step S106 and steps S104 and S105.
Further, referring to Fig. 7, on the basis of the third training method embodiment shown in Fig. 6, the multi-task learning performed in step S101 after the training set is input into the network may further comprise the following steps:
In step S1016, the training set is input into the multi-task learning deep network, and the predicted face regions are compared with the labelled face regions annotated on the images of the training set, obtaining comparison results.
The training set is input into the multi-task learning deep network; as explained above regarding the training set, the images it contains have had their face regions manually annotated, and the manually annotated face regions serve as the labelled face regions. In this embodiment, after the training set is input into the multi-task learning deep network, the predicted face regions calculated in step S106 need to be compared with the labelled face regions when each task of the multi-task learning is learned; the comparison results thus obtained are used to filter out of the predicted face regions those that meet the preset condition corresponding to each task.
In this embodiment, the comparison result is the degree of overlap between a predicted face region and the labelled face region; the degree of overlap reflects how well the predicted face region agrees with the labelled face region.
In step S1017, according to the comparison results, the predicted face regions meeting the preset condition are selected as detection face regions.
Step S1016 yields the degree of overlap between each predicted face region and the corresponding labelled face region. In this embodiment, a corresponding preset condition is set for each task of the multi-task learning; that is, the fully connected layer corresponding to each task learns its task only on the predicted face regions that meet the preset condition.
In this embodiment, the predicted face regions meeting the preset condition are taken as detection face regions. Since the preset condition corresponding to each task may differ, the detection face regions filtered out for each task may differ as well. (An overlap test is sketched below.)
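A sketch of the overlap test behind steps S1016 and S1017: the degree of overlap is computed as intersection-over-union between a predicted face region and the labelled face region, and each task keeps only the proposals that satisfy its preset condition. The corner-based box representation is an assumption for illustration.

```python
def iou(a, b):
    """Degree of overlap between boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def detection_regions(predicted, labelled, threshold):
    """S1017: keep proposals above the task's overlap threshold
    (0.35 for landmarks/visibility, 0.5 for detection/gender)."""
    return [p for p in predicted if iou(p, labelled) > threshold]
```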
In step S1018, multi-task learning is performed on the detection face regions.
After the detection face regions corresponding to each task are obtained according to step S1017, the fully connected layer corresponding to each task performs the learning of its task on the detection face regions filtered out for it.
It will be understood that, in this embodiment, the above steps are executed in step S101 before the task-specific fully connected layers of the embodiments shown in Figs. 2 and 3 perform the learning of their corresponding tasks.
The training of each task in the multi-task learning of the multi-task learning deep network of the present invention is explained below:
For the face detection task, the corresponding preset condition is that the degree of overlap between the predicted face region and the labelled face region is greater than 0.5, or less than 0.35. In other words, in this embodiment the face detection task is performed with the predicted face regions whose overlap with the labelled face region is greater than 0.5 or less than 0.35. The detection face regions whose overlap with the labelled face region is greater than 0.5 are taken as positive samples, and those whose overlap with the labelled face region is less than 0.35 as negative samples. The formula is as follows:
loss_D = -(1 - l)·log(1 - p) - l·log(p);
where loss_D is the loss function; in this embodiment loss_D is a softmax function. For a positive sample, the value of l is 1; for a negative sample, the value of l is 0; and p denotes the probability that the detection face region is a face. In this embodiment, a face probability threshold can be set: the calculated p value is compared with the face probability threshold, the detection face regions whose p value is greater than or equal to the threshold are considered faces, and those whose p value is less than the threshold are considered non-faces; the face detection task is learned in this way.
For the feature point localization task, this embodiment uses the 21 annotated facial feature points of the AFLW data set. The preset condition corresponding to the feature point localization task is that the overlap between the predicted face region and the labelled face region is greater than 0.35; that is, the predicted face regions whose overlap with the labelled face region is greater than 0.35 serve as the detection face regions on which the feature point localization task is learned. A detection face region is denoted {x, y, w, h}, where (x, y) is the coordinate of the centre of the detection face region and w and h are its width and height. Each feature point is expressed as an offset from the centre (x, y) of the detection face region, and its coordinates are normalized by (w, h):
(a_i, b_i) = ((x_i - x) / w, (y_i - y) / h);
where (x_i, y_i) is the coordinate of a facial feature point and (a_i, b_i) is the relative value of that feature point's coordinate after normalization.
In this embodiment, the coordinates of invisible feature points are set to (0, 0), while the visible feature points are learned for the feature point localization task with a predetermined loss function. The formula is as follows:
loss_L = (1 / (2N)) · Σ_{i=1..N} v_i · ((â_i - a_i)² + (b̂_i - b_i)²);
where loss_L is the loss function, which in this embodiment is a Euclidean function; N is the number of feature points (in the AFLW data set, the number of feature points is 21); and (â_i, b̂_i) is the relative coordinate obtained after normalization of the corresponding predicted feature point coordinate. v_i denotes the visibility factor of a feature point: if v_i equals 1, the feature point is visible in the detection face region; if v_i equals 0, it is invisible. In this embodiment, invisible feature points do not take part in backpropagation.
With the above two formulas, the coordinate values of the feature points are finally calculated from the normalized relative coordinates of the predicted feature points, the number of feature points, and the coordinates, width, and height of the detection face region.
For feature point visibility, this embodiment predicts whether a feature point is visible by learning its visibility factor. The preset condition corresponding to the feature point visibility prediction task is that the overlap between the predicted face region and the labelled face region is greater than 0.35; that is, the predicted face regions whose overlap with the labelled face region is greater than 0.35 serve as the detection face regions on which the feature point visibility prediction task is learned. The formula is as follows:
loss_V = (1 / N) · Σ_{i=1..N} (v̂_i - v_i)²;
where loss_V is the loss function, which in this embodiment is a Euclidean function, and N is the number of feature points (21 in the AFLW data set). If a feature point is visible, its visibility factor v_i is 1; if it is invisible, the visibility factor is 0. The predicted visibility value v̂_i of each feature point is calculated in this way.
For the gender recognition task, the corresponding preset condition is that the overlap between the predicted face region and the labelled face region is greater than 0.5; that is, the predicted face regions whose overlap with the labelled face region is greater than 0.5 serve as the detection face regions on which the gender recognition task is learned. The formula is as follows:
loss_G = -(1 - g)·log(p_0) - g·log(p_1);
where loss_G is the loss function; this embodiment can use the cross-entropy loss function. (p_0, p_1) is a two-dimensional probability vector computed by the network; g = 0 if the gender is male, and g = 1 if the gender is female.
Further, the global loss function of the multi-task learning deep network of this embodiment is the weighted sum of the individual loss values of the tasks, calculated as follows:
loss = Σ_t λ_t · loss_t;
where loss_t is the loss value of the t-th task, and the weight parameter λ_t is determined by the importance of each task in the total loss. In this embodiment, λ_D = 1, λ_L = 5, λ_V = 0.5, and λ_G = 2, denoting the face detection task, the facial feature point localization task, the feature point visibility prediction task, and the gender recognition task respectively.
It will be understood that the learning of each of the above tasks is carried out in its corresponding fully connected layer: the fully connected layer corresponding to each task links to the corresponding loss function and learns only that task.
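Transcribing the formulas above into PyTorch gives the sketch below; the batch-first tensor shapes and the elementwise framing are assumptions, while the loss forms, the visibility masking, and the weights λ_D = 1, λ_L = 5, λ_V = 0.5, λ_G = 2 follow the description.

```python
import torch

def loss_d(p, l):                  # face detection: l = 1 positive, 0 negative
    return (-(1 - l) * torch.log(1 - p) - l * torch.log(p)).mean()

def loss_l(a_hat, b_hat, a, b, v): # landmarks over N = 21 normalized points
    n = a.shape[-1]
    per_point = v * ((a_hat - a) ** 2 + (b_hat - b) ** 2)
    return per_point.sum(dim=-1).mean() / (2 * n)   # invisible points masked

def loss_v(v_hat, v):              # visibility: squared error on the factors
    return ((v_hat - v) ** 2).sum(dim=-1).mean() / v.shape[-1]

def loss_g(p0, p1, g):             # gender: g = 0 male, 1 female
    return (-(1 - g) * torch.log(p0) - g * torch.log(p1)).mean()

def loss_total(d, l, v, g):        # weighted global loss
    return 1.0 * d + 5.0 * l + 0.5 * v + 2.0 * g
```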
Further, referring to Fig. 8, Fig. 8 is a schematic structural diagram of an embodiment of the training device for the multi-task learning deep network of the present invention. As shown in Fig. 8, the training device 100 of this embodiment comprises a memory 101 and a processor 102 connected to each other, where the memory 101 stores the constructed multi-task learning deep network and the corresponding program data; in addition, the memory 101 can also store the training set used to train the multi-task learning deep network. The processor 102 is configured to execute, according to the program data, any one of the first to third embodiments of the training method shown in Figs. 1 to 7, completing the training of the multi-task learning deep network.
Further, as shown in Fig. 9, in another embodiment the training device 200 may also comprise a communication circuit 103 connected with the memory 101 and/or the processor 102 through a bus; the communication circuit 103 is used to obtain the training set and input it to the processor, in which case the training set need not be stored in the memory 101.
Further, the present invention also provides a test method for the multi-task learning deep network. Referring to Fig. 10, Fig. 10 is a schematic flowchart of a first embodiment of the test method for the multi-task learning deep network of the present invention. As shown in Fig. 10, the test method of the multi-task learning deep network of this embodiment comprises at least the following steps:
In step S201, a test image is input into a two-level cascaded convolutional neural network, which outputs the first face regions to be tested contained in the test image.
In this embodiment, the test image can be an image of the training set that was not used for training the multi-task learning deep network, or an image from another data set. For example, with the above AFLW data set, 25,000 of its images were used to train the multi-task learning deep network, so the 993 images not used for training can serve as test images.
Further, before the test image is input into the trained multi-task learning deep network, this embodiment first processes the input test image through the two-level cascaded convolutional neural network to obtain the first face regions to be tested contained in it. It is worth noting that the first face regions to be tested are obtained by the two-level cascaded convolutional neural network during the test process of the multi-task learning deep network; they are not the same as the predicted face regions calculated during training in step S106 shown in Fig. 6.
In step S202, the first face regions to be tested are input into the multi-task learning deep network, second face regions to be tested that meet the preset condition are selected from the first face regions, and the test results of face detection, feature point localization, feature point visibility prediction, and gender recognition performed on the second face regions are output.
The test image carrying the first face regions to be tested is input into the trained multi-task learning deep network, which selects from the first face regions the second face regions to be tested that meet the preset condition, then tests each task of the multi-task learning on the second face regions, and finally outputs the test results of face detection, feature point localization, feature point visibility prediction, and gender recognition.
Further, in this embodiment, after the multi-task learning deep network receives the first face regions to be tested produced by the two-level cascaded convolutional neural network, it computes on the first face regions to obtain the detection score of each one, and filters out the second face regions to be tested according to those detection scores. The filtering compares the detection score of each first face region with a preset score threshold and keeps the first face regions whose detection score is greater than the threshold; the kept first face regions serve as the second face regions to be tested that are input into the multi-task learning deep network. The preset score threshold can be adjusted according to actual needs; in this embodiment, it can be 0.4, 0.5, or 0.6.
In this embodiment, after the test image carrying the first face regions is input into the trained multi-task learning deep network, the tests of the tasks of the multi-task learning performed on the second face regions are similar to what the network executes during its training process; for the details of the multi-task detection performed after the test image carrying the first face regions is input into the trained network, please refer to any one of the first to third embodiments of the training method shown in Figs. 1 to 6.
Further, when the feature point localization and feature point visibility prediction tasks are tested, the coordinates of the feature points need to be transformed back into coordinates in the original image. The transformation formula used is as follows:
(x_i, y_i) = (â_i · w + x, b̂_i · h + y);
where (â_i, b̂_i) is the predicted relative position of the i-th feature point, and {x, y, w, h} is the detected face region as defined above.
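A one-line sketch of that transformation, using the detection face region {x, y, w, h} (centre, width, height) defined earlier:

```python
def to_image_coords(a_hat, b_hat, x, y, w, h):
    """Map a predicted relative landmark back to original-image coordinates."""
    return a_hat * w + x, b_hat * h + y
```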
When testing the trained multi-task learning deep network, this embodiment adds the two-level cascaded convolutional neural network before the multi-task learning deep network; the cascaded network determines the detection face regions in the input test image and obtains the first face regions to be tested, so that the multi-task learning deep network can perform better-prepared multi-task detection from the first face regions, improving the detection precision of every task in the multi-task learning. The multi-task learning deep network of this embodiment is robust to complex conditions in images such as pose variation, extreme illumination, exaggerated expressions, and partial occlusion, performs excellently, and achieves high precision and good performance.
Further, Figure 11 is please referred to, Figure 11 is the flow diagram of an embodiment of step S201 in Figure 10.Such as Figure 11 Shown, step S201 may include following steps:
In step S2011, by the first order neural network of testing image input two-level concatenation convolutional neural networks, output It is respectively labeled as several couple candidate detection windows in human face region and non-face region.
In the present embodiment, which is carried out by the first order neural network in two-level concatenation convolution depth network.It is to be measured Image inputs in the first order neural network of two-level concatenation convolutional neural networks, which includes several cascade Convolutional layer and pond layer, each convolutional layer and pond layer gradually carry out corresponding operation to testing image, finally by the figure of output As being divided into two classes, and two class images are marked, if the label exported respectively is human face region and non-face region Dry couple candidate detection window, which, which can be entered in the neural network of the second level, carries out subsequent processing.
Figure 12 is please referred to, as shown in figure 12, several cascade convolutional layers that the first order neural network of the present embodiment includes It may include: first layer convolutional layer (conv1), second layer pond layer (pool1), third layer convolutional layer (conv2) with pond layer And the 4th layer of convolutional layer (conv3).Wherein, the convolution kernel of first layer convolutional layer is having a size of 3x3, due to other classification and more Object detection task is compared, and determines that face candidate region is substantially a challenging two-value classification task, therefore every Layer may need less convolution kernel number, therefore can reduce calculation amount using the convolution kernel of 3x3 size, while adding nerve The depth of network, and then the performance of neural network is made to be further improved.The size of the Chi Huahe of second layer pond layer is 2x2 is operated using maximum pondization.The size of the convolution kernel of third layer convolutional layer is 3x3.The ruler of the convolution kernel of 4th layer of convolutional layer Very little is 1x1, and convolution kernel, which is sized to 1x1, can enable that neural network can complete information exchange across channel and information is whole It closes, and dimensionality reduction can be carried out convolution kernel port number and/or rise dimension processing.
In other embodiments, the first order neural network of two-level concatenation convolutional neural networks is face in output token It, can be with output boundary frame regression vector while several couple candidate detection windows in region and non-face region.
In step S2012, several couple candidate detection windows are inputted into the second level in two-level concatenation convolutional neural networks Neural network is abandoned the couple candidate detection window for being labeled as non-face region by second level neural network, and is to label The couple candidate detection window in region carries out bounding box recurrence processing, output boundary frame returns that treated the first candidate face region, Using the first candidate face region as the first human face region to be measured.
In the present embodiment, which is carried out by the second level neural network of two-level concatenation convolutional neural networks.By step Several couple candidate detection windows obtained in S2011 input second level neural network, at this time several couple candidate detection window indicias There are human face region and non-face region, is labeled as at this point, second level neural network is then abandoned from several couple candidate detection windows The couple candidate detection window in non-face region retains the couple candidate detection window for being labeled as human face region.Further, to couple candidate detection window Mouthful bounding box recurrence processing is carried out, further obtains bounding box and return treated the first candidate face region, and first is waited Human face region is selected to input in multi-task learning depth network as the first human face region to be measured, to carry out multi-task learning depth net The test of network.In the present embodiment, the first candidate face region of output includes the location information of the region in the picture.
It is understood that the human face region that several are marked obtained in first order neural network, it may be to same Face can mark several or even tens or more human face regions, then in the neural network of the second level, to the same person Multiple human face regions of face carry out bounding box recurrence, reduce the human face region to the same face, and improve obtained face area The matching precision of face in domain and image carries out bounding box at this time and returns usable first order neural network output when processing Bounding box regression vector.
Figure 13 is please referred to, as shown in figure 13, second level neural network equally may include several cascade convolutional layers and pond Layer, such as cascade first layer convolutional layer, second layer pond layer, third layer convolutional layer, the 4th layer of pond layer, layer 5 convolutional layer And full linking layer.Wherein, the size of the convolution kernel of first layer convolutional layer and third layer convolutional layer is 3 × 3;Layer 5 convolutional layer Convolution kernel size be 2 × 2;The size of the convolution kernel of second layer pond layer and the 4th layer of pond layer is 3 × 3, and is all made of Maximum pondization operation;Full linking layer is the full linking layer of 128 dimensional feature vectors.
As can be seen from Figures 12 and 13, the first-level and second-level neural networks accept input images of different sizes. Therefore, before the test image is input into the first-level neural network and before the first candidate face regions are input into the second-level neural network, the test image and the first candidate face regions must each be resized accordingly.
Further, referring to Figure 14, Figure 14 is a flow diagram of a second embodiment of the test method of the multi-task learning depth network of the present invention. As shown in Figure 14, in this embodiment, after step S201 of Figure 10, the following steps may also be included:
In step S203, the first face regions to be tested whose mutual overlap exceeds a preset overlap threshold are merged, obtaining the final face region to be tested after merging.
It can be understood that among the first face regions to be tested obtained by the two-level cascaded convolutional neural network, several, tens, or even more regions may be obtained for the same face. In this embodiment, first face regions with high mutual overlap are therefore considered to have been obtained from the same face; merging them reduces the number of first face regions to be tested and improves detection accuracy.
Further, in this embodiment the first face regions obtained by the two-level cascaded convolutional neural network are compared against one another to obtain their mutual overlap, and two or more first face regions whose overlap exceeds the preset overlap threshold are merged, yielding the final face region to be tested. The resulting final face region is input into the multi-task learning depth network for the subsequent test steps.
In this embodiment, the first face regions to be tested can be merged by the non-maximum suppression (NMS) algorithm. NMS selects the highest-scoring region from the first face regions to be tested, discards all other regions whose overlap with it exceeds a specific threshold, and scales the selected region to a preset size, which in this embodiment is 227x227. In addition, the preset overlap threshold can be adjusted according to actual needs.
In another embodiment, step S203 may also be executed after step S1012; that is, after the first candidate face regions processed by bounding-box regression are obtained in step S1012, the first candidate face regions are merged by the non-maximum suppression (NMS) algorithm. In this embodiment, NMS obtains the final face region to be tested by repeatedly selecting the highest-scoring candidate region M and resetting the scores of the remaining candidate regions b_i with a score reset function s_i, which takes the following hard-threshold form:

    s_i = s_i,  if IoU(M, b_i) < N_t
    s_i = 0,    if IoU(M, b_i) >= N_t

In the above formula, in order to decide whether an adjacent first candidate face region can be retained, NMS uses the hard-threshold method: any candidate region whose overlap (IoU) with the selected region M reaches the threshold N_t has its score set to zero and is discarded. This finally yields the merged final face region to be tested.
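A minimal NumPy sketch of this hard-threshold NMS merging follows, with boxes given as (x, y, width, height) four-tuples and the overlap threshold standing in for the preset degree of overlap.

import numpy as np

def iou(box, boxes):
    """Intersection-over-union of one box against an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[0] + box[2], boxes[:, 0] + boxes[:, 2])
    y2 = np.minimum(box[1] + box[3], boxes[:, 1] + boxes[:, 3])
    inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
    union = box[2] * box[3] + boxes[:, 2] * boxes[:, 3] - inter
    return inter / union

def nms(boxes, scores, overlap_threshold=0.5):
    """Keep the highest-scoring box, discard (score reset to zero) every box
    whose IoU with it reaches the threshold, and repeat on the remainder."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        overlaps = iou(boxes[i], boxes[order[1:]])
        order = order[1:][overlaps < overlap_threshold]
    return keep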
Further, referring to Figure 15, Figure 15 is a flow diagram of a third embodiment of the test method of the multi-task learning depth network of the present invention. As shown in Figure 15, in this embodiment, after step S201 of Figure 10, the following steps may also be included:
In step S204, the size of the first face regions to be tested is adjusted to the preset face region size allowed by the multi-task learning depth network.
Because the multi-task learning depth network places requirements on the size of the input face regions to be tested, in this embodiment, after the two-level cascaded convolutional neural network produces the first face regions to be tested, the first face regions are resized to the preset face region size allowed by the multi-task learning depth network. In this case, the merging of the first face regions is performed on the first face regions that have already been resized.
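A minimal sketch of this resizing step, assuming OpenCV and the 227x227 preset size mentioned above:

import cv2

def resize_face_region(image, box, preset_size=(227, 227)):
    # box is a (x, y, width, height) four-tuple locating the face region
    x, y, w, h = [int(v) for v in box]
    region = image[y:y + h, x:x + w]
    # scale the cropped region to the preset size the network allows
    return cv2.resize(region, preset_size)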
Further, step S204 may also be executed after step S203; that is, the final face region to be tested obtained by merging is resized to the preset face region size allowed by the multi-task learning depth network.
Further, referring to Figure 16, Figure 16 is a flow diagram of a fourth embodiment of the test method of the multi-task learning depth network of the present invention. As shown in Figure 16, in this embodiment, before step S201 of Figure 10, the following steps may also be included:
In step S205, the size of the test image is adjusted to a test image size allowed by the two-level cascaded convolutional neural network.
In this embodiment, before the test image is input into the first-level neural network of the two-level cascaded convolutional neural network, the test image is rescaled to a series of different sizes. The original zoom scale is 12/S, where S is the minimum size of the first face region to be tested and 12 is the minimum size of the first face region acceptable to the first-level neural network. In this embodiment, the test image is then processed at a sequence of successively smaller scales starting from this original zoom scale, forming an image pyramid whose levels are each fed to the first-level neural network.
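A sketch of this image pyramid under the above conventions follows; the per-step shrink factor of 0.709 is a common choice and is an assumption here, since this embodiment does not state the factor.

import cv2

def pyramid_scales(image_min_side, min_face_size, min_input=12, factor=0.709):
    scale = min_input / min_face_size   # original zoom scale: 12 / S
    scales = []
    # keep shrinking until the image would fall below the network's 12-pixel minimum
    while image_min_side * scale >= min_input:
        scales.append(scale)
        scale *= factor
    return scales

def image_pyramid(image, min_face_size):
    h, w = image.shape[:2]
    for s in pyramid_scales(min(h, w), min_face_size):
        yield cv2.resize(image, (int(w * s), int(h * s)))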
The loss function is divided into two parts, concerning face classification and face region regression respectively. The cross-entropy loss function is used to classify each label as a face or a non-face region; for each sample x_i, the formula is as follows:

    L_i^det = -( y_i^det * log(p_i) + (1 - y_i^det) * log(1 - p_i) )

where y_i^det denotes the ground-truth label (with background labeled as non-face), and p_i denotes the probability that sample x_i is a face.
The bounding-box regression is trained with a squared loss function; in practice the regression loss is computed as a Euclidean distance, with the following formula:

    L_i^box = || ŷ_i^box - y_i^box ||_2^2

where ŷ_i^box represents the coordinates predicted by the network and y_i^box represents the actual ground-truth coordinates. y^box is a four-tuple formed by the abscissa of the top-left corner, the ordinate of the top-left corner, the length, and the width.
The multi-task learning depth network of this embodiment can in fact be regarded as a three-level network. Its loss function comprises the two parts of face classification and bounding-box regression, so the two loss functions must both be trained, and each loss function is assigned a different weight to form the final objective function. The final objective function of this embodiment is as follows:

    min Σ_{i=1}^{N} Σ_{j ∈ {det, box}} α_j * y_i^j * L_i^j

The entire training process of the loss function is essentially the process of minimizing the above function, where α_j represents the importance of the corresponding task and N denotes the number of training samples; in the first-level and second-level neural networks, α_det = 1 and α_box = 0.5. y_i^j denotes the sample label.
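A minimal PyTorch sketch of this two-part objective follows; the binary cross-entropy and squared Euclidean terms follow the formulas above, and the face label is used to gate the regression term in line with the y_i^j factor, which is an interpretive assumption.

import torch
import torch.nn.functional as F

def multitask_loss(p_face, y_det, box_pred, box_gt,
                   alpha_det=1.0, alpha_box=0.5):
    # L_det = -(y log p + (1 - y) log(1 - p)), averaged over the batch
    loss_det = F.binary_cross_entropy(p_face, y_det.float())
    # L_box = || box_pred - box_gt ||_2^2; the face label gates the term so
    # that only face samples contribute to the regression loss
    loss_box = (y_det.float() * ((box_pred - box_gt) ** 2).sum(dim=1)).mean()
    # weighted sum with alpha_det = 1 and alpha_box = 0.5 from the text
    return alpha_det * loss_det + alpha_box * loss_box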
Further, the detection accuracy of the face detection, feature point localization, feature point visibility prediction, and gender recognition performed by the multi-task learning depth network of the present invention is described below:
The detection accuracy of face detection in this embodiment is mainly assessed on the Face Detection Data Set and Benchmark (FDDB). The FDDB database consists of 2845 images with 5171 labeled faces and is provided by the University of Massachusetts; for fairness, FDDB provides unified evaluation code. According to the test results, when the number of false detections is 100, the multi-task learning depth network of this embodiment reaches a detection accuracy of 86.61%, only slightly lower than the best accuracy of 88.53%, obtained by the Deep Pyramid Deformable Parts Model for Face Detection (DP2MFD). As the number of false detections increases, the face detection accuracy of the multi-task learning depth network of this embodiment rises accordingly; when the number of false detections is 250, the detection accuracy reaches 90.1%. For face detection by the multi-task learning depth network, the FDDB data set is extremely challenging because it contains many small and blurry faces; moreover, resizing the images to the 227x227 input size can distort faces and lower the detection scores. Despite these problems, the multi-task learning depth network of this embodiment still achieves relatively good test results.
The feature point localization performance of the multi-task learning depth network of this embodiment is assessed on the AFLW data set. The AFLW data used here consist of 1000 pictures containing 1132 face samples. Only when the overlap exceeds a preset threshold (which may be set to 0.5 in this embodiment) is a region used as face test data, and the mean positions of the predicted feature points corresponding to the face regions to be tested are computed. A subset of 450 samples is created at random from the AFLW data set and divided by deflection angle into the three groups [0°, 30°], [30°, 60°], and [60°, 90°], each accounting for 1/3. Localization accuracy is assessed with the normalized mean error; however, since the method of the present invention involves the visibility of feature points, the evaluation error is averaged over the visible feature points only, as shown below:

    NME = (1/N_t) Σ_{i=1}^{N_t} (1/|v_i|_1) Σ_j v_i(j) * || Û_i(:, j) - U_i(:, j) ||_2 / d_i
where U_i represents the actual feature point coordinates, v_i is the corresponding visibility of the feature points, Û_i is the predicted feature point coordinates, and N_t denotes the number of test pictures. |v_i|_1 is the number of visible feature points in the i-th picture, U_i(:, j) is the j-th column of U_i, and d_i is the square root of the face bounding box size. It is worth noting that when a face image is close to frontal, d_i is in most cases taken as the inter-pupil distance; however, since the AFLW data set includes invisible feature points, d_i here uses the face bounding box size. Relative to existing methods, the test method of the multi-task learning depth network of this embodiment still obtains better results.
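A minimal NumPy sketch of this visibility-weighted normalized mean error follows. U_hat and U are (num_images, 2, num_points) arrays of predicted and actual feature point coordinates, v is a (num_images, num_points) 0/1 visibility mask, and d holds the per-image face bounding box size used for normalization.

import numpy as np

def normalized_mean_error(U_hat, U, v, d):
    per_point = np.linalg.norm(U_hat - U, axis=1)            # Euclidean error per feature point
    per_image = (v * per_point).sum(axis=1) / v.sum(axis=1)  # average over visible points only
    return (per_image / d).mean()                            # normalize by box size, average over images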
Gender recognition is assessed on the CelebA and LFWA data sets, both of which contain gender information; they respectively comprise labeled images selected from the CelebFaces and LFW data sets. The CelebA data set includes 10000 identities with a total of 200,000 images; the LFWA data set includes 5327 identities with a total of 13233 images. The multi-task learning depth network of this embodiment achieves an accuracy of 97% on the CelebA data set and an accuracy of 93% on the LFWA data set.
Further, referring to Figure 17, Figure 17 is a structural schematic diagram of one embodiment of the test equipment of the multi-task learning depth network of the present invention. As shown in Figure 17, the test equipment 300 of the multi-task learning depth network of this embodiment may at least include a memory 301, a communication circuit 303, and a processor 302 that are interconnected. The memory 301 stores the two-level cascaded convolutional neural network, the multi-task learning depth network, and program data; the communication circuit 303 is used to obtain the test image; and the processor 302 is used to execute, according to the program data, the test method of the multi-task learning depth network described above, performing face detection, feature point localization, feature point visibility prediction, and gender recognition on the test image by means of the two-level cascaded convolutional neural network and the multi-task learning depth network. In another embodiment, the training set may also be stored directly in the memory 301.
On the other hand, referring to Figure 18, Figure 18 is a structural schematic diagram of one embodiment of the storage medium of the present invention. As shown in Figure 18, the storage medium 400 of this embodiment stores at least one program or instruction 401, and the program or instruction 401 is used to execute any of the first through third embodiments of the training method of the multi-task learning depth network shown in Fig. 1 to Fig. 7 and/or any of the first through third embodiments of the test method of the multi-task learning depth network shown in Figure 10 to Figure 16.
In one embodiment, the storage medium 400 may be the memory in Fig. 8, Fig. 9, or Figure 17. The storage medium 400 of this embodiment may be a memory chip, a hard disk, a mobile hard disk, a flash drive, an optical disc, or another readable and writable storage device; in addition, the storage medium may also be a server or the like.
The above are only implementations of the present invention and are not intended to limit the scope of the invention. All equivalent structures or equivalent process transformations made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, are likewise included within the scope of patent protection of the present invention.

Claims (10)

1. A training method of a multi-task learning depth network, characterized by comprising:
inputting a training set into the multi-task learning depth network for multi-task learning, and outputting a prediction result of the multi-task learning, wherein the multi-task learning includes a feature point localization task, a feature point visibility prediction task, a face detection task, and a gender recognition task;
comparing the prediction result with the label results in the training set, and obtaining, according to the comparison result, loss values corresponding to the multi-task learning;
feeding the loss values back into the multi-task learning depth network to correct the multi-task learning depth network.
2. The training method according to claim 1, characterized in that the step of inputting the training set into the multi-task learning depth network for multi-task learning and outputting the prediction result of the multi-task learning comprises:
inputting the training set stage by stage into several cascaded convolutional layers and pooling layers of the multi-task learning depth network, and outputting corresponding first operation results respectively from multiple convolutional layers and/or multiple pooling layers among the several convolutional layers and pooling layers;
inputting the first operation results into a feature fusion fully connected layer, and outputting feature fusion data;
inputting the feature fusion data respectively into the fully connected layers corresponding to each task of the multi-task learning for the learning of each task, and outputting the prediction result corresponding to each task respectively.
3. The training method according to claim 2, characterized in that, after the step of outputting corresponding first operation results respectively from multiple convolutional layers and/or multiple pooling layers among the several convolutional layers and pooling layers, the method further comprises:
inputting at least part of the first operation results respectively into corresponding sub-convolutional layers, and outputting corresponding second operation results with the same dimensions.
4. The training method according to claim 3, characterized in that, after the step of inputting at least part of the first operation results respectively into the corresponding sub-convolutional layers and outputting the corresponding second operation results with the same dimensions, the method further comprises:
inputting the second operation results with the same dimensions into a full convolutional layer, and outputting a third operation result after dimensionality reduction.
5. The training method according to claim 2, characterized in that the step of outputting corresponding first operation results respectively from multiple convolutional layers and/or multiple pooling layers among the several convolutional layers and pooling layers comprises:
outputting a corresponding first pooling operation result, first convolution operation result, and second pooling operation result respectively from the first pooling layer, the third convolutional layer, and the fifth pooling layer among the several convolutional layers and pooling layers, and taking the first pooling operation result, the first convolution operation result, and the second pooling operation result as the first operation results.
6. The training method according to claim 1, characterized in that, before the step of inputting the training set into the multi-task learning depth network for multi-task learning, the method further comprises:
training the face detection task with an AlexNet network to obtain weights corresponding to the face detection task;
initializing the multi-task learning depth network with the weights.
7. The training method according to claim 1, characterized in that, before the step of inputting the training set into the multi-task learning depth network for multi-task learning, the method further comprises:
calculating predicted face regions of the images in the training set;
the step of inputting the training set into the multi-task learning depth network for multi-task learning comprising:
inputting the training set into the multi-task learning depth network, and comparing the predicted face regions with the labeled face regions marked on the images in the training set to obtain comparison results;
selecting, according to the comparison results, the predicted face regions that satisfy a preset condition as detection face regions;
performing multi-task learning on the detection face regions.
8. The training method according to claim 1, characterized in that the loss function of the face detection task is a softmax function; the loss functions of the feature point localization task and the feature point visibility prediction task are Euclidean functions; and the loss function of the gender recognition task is a cross-entropy loss function.
9. A training equipment of a multi-task learning depth network, characterized by comprising a memory and a processor that are interconnected;
the memory storing a training set, a constructed multi-task learning depth network, and program data;
the processor being used to execute, according to the program data, the training method according to any one of claims 1-8, training the multi-task learning depth network with the training set.
10. A storage medium, characterized in that the storage medium stores program data, and the program data can be executed to realize the training method of the multi-task learning depth network according to any one of claims 1-8.