CN109086660A - Training method, device and storage medium for a multi-task learning deep network - Google Patents
Training method, device and storage medium for a multi-task learning deep network
- Publication number
- CN109086660A CN109086660A CN201810615755.0A CN201810615755A CN109086660A CN 109086660 A CN109086660 A CN 109086660A CN 201810615755 A CN201810615755 A CN 201810615755A CN 109086660 A CN109086660 A CN 109086660A
- Authority
- CN
- China
- Prior art keywords
- task
- task learning
- layer
- depth network
- face region
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Abstract
Embodiments of the present invention disclose a training method, device and storage medium for a multi-task learning deep network. In the embodiments, the multi-task learning deep network is trained simultaneously on four mutually related tasks: a feature point localization task, a feature point visibility prediction task, a face detection task and a gender recognition task. By performing these related tasks simultaneously within one multi-task learning deep network, the mutual influence between the related tasks is exploited to improve the detection accuracy of each task and thereby the performance of the multi-task learning deep network.
Description
Technical field
Embodiments of the present invention relate to the field of biometric recognition, and in particular to a training method, training device and storage medium for a multi-task learning network.
Background technique
Face recognition is an important problem in computer vision; its key sub-problems include face detection, facial feature point recognition and facial feature point localization. Many visual tasks, such as face recognition, facial expression analysis and facial animation, depend on accurate facial feature point localization. Although the problem has been widely studied in recent years and a degree of success has been achieved, factors such as partial occlusion, illumination changes, large head rotations and exaggerated expressions make face images complex and diverse, so facial feature point localization still faces many problems and challenges.
In the prior art, facial feature point localization methods can be roughly divided into two categories: traditional methods and deep-learning-based methods. Typical traditional methods include model-based methods and regression-based methods. Model-based methods learn shape increments given an average initial shape; examples are the Active Shape Model (ASM) and the Active Appearance Model (AAM), which use statistical models such as Principal Component Analysis (PCA) to capture shape and appearance variation respectively. However, a single linear model has difficulty describing the complex nonlinear variation of real-world scene data, so model-based traditional methods cannot obtain accurate shapes for face images with large head pose changes or exaggerated facial expressions. Regression-based traditional methods instead predict key point positions by training an appearance model. Some researchers predict shape increments with linear regression on Scale-Invariant Feature Transform (SIFT) features. Others have proposed regressors that learn a sequence of random ferns from pixel intensity differences and progressively learn a cascaded shape, regressing all parameters simultaneously to exploit shape constraints effectively. In short, regression-based methods iteratively refine the predicted feature point positions from an initial estimate, so the final result depends heavily on the initialization.
Several deep-learning-based methods already exist. Sun et al. proposed a facial feature point localization method using a three-stage cascade of convolutional neural networks (CNNs), regressing five facial feature points of the face (the left and right eyes, the nose, and the left and right mouth corners) and jointly refining the points with CNNs at different stages. In addition, Zhang et al. proposed a coarse-to-fine deep nonlinear feature point localization method (Coarse-to-Fine Auto-encoder Networks, CFAN) that implements a nonlinear regression model with successive auto-encoder networks. Both methods locate feature points gradually through a cascade of several deep networks, searching from coarse to fine for the optimal feature point positions in each image. They achieve higher precision than earlier feature point localization methods, but they cannot handle the occlusion problem effectively. Moreover, because several CNN structures are used, the time needed to locate all points grows with the number of facial feature points. In real unconstrained environments, facial feature point localization is not actually an isolated task: it is also disturbed by factors such as head pose and gender differences, all of which affect localization accuracy.
Summary of the invention
The technical problem mainly solved by the embodiments of the present invention is to provide a training method for a multi-task learning deep network that can improve the performance of the multi-task learning deep network.
To solve the above technical problem, one technical solution adopted by the embodiments of the present invention is to provide a training method for a multi-task learning deep network, the training method including:
inputting a training set into the multi-task learning deep network for multi-task learning and outputting the prediction results of the multi-task learning, where the multi-task learning includes a feature point localization task, a feature point visibility prediction task, a face detection task and a gender recognition task;
comparing the prediction results with the label results in the training set, and obtaining loss values corresponding to the multi-task learning according to the comparison results;
feeding the loss values back into the multi-task learning deep network to correct the multi-task learning deep network.
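The three claimed steps (predict, compare with labels, feed the loss back) can be sketched as a generic training iteration in pure Python. Everything below is illustrative: the toy one-weight-per-task "network", the squared-error loss and the gradient step are assumptions, not the patent's actual implementation.

```python
def train_step(network, images, labels, learning_rate=0.05):
    """One iteration of the claimed method: predict, compare, feed back."""
    # Step 1: multi-task learning -> one prediction per task.
    predictions = {task: network[task] * images for task in network}
    # Step 2: compare predictions with the label results -> per-task loss.
    losses = {task: (predictions[task] - labels[task]) ** 2
              for task in network}
    # Step 3: feed the losses back to correct the network (gradient step).
    for task in network:
        grad = 2 * (predictions[task] - labels[task]) * images
        network[task] -= learning_rate * grad
    return losses

tasks = ["landmarks", "visibility", "face", "gender"]
net = {t: 0.0 for t in tasks}
for _ in range(200):
    losses = train_step(net, images=1.0, labels={t: 1.0 for t in tasks})
# After repeated correction, every task's weight approaches its target.
print(round(net["landmarks"], 2))  # → 1.0
```

The point of the sketch is only the control flow: all four tasks produce a loss in the same iteration, and all four losses correct the same shared parameters.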
To solve the above technical problem, another technical solution adopted by the embodiments of the present invention is to provide a training device for a multi-task learning deep network, the training device including:
a memory and a processor connected to each other;
the memory stores a training set, the constructed multi-task learning deep network and program data;
the processor is configured to execute the above training method according to the program data, training the multi-task learning deep network with the training set.
To solve the above technical problem, yet another technical solution adopted by the embodiments of the present invention is to provide a storage medium storing program data, where the program data can be executed to implement the above training method for the multi-task learning deep network.
The beneficial effects of the embodiments of the present invention are as follows. In the training method of the multi-task learning deep network, a training set is input into the multi-task learning deep network for multi-task learning and the prediction results of the multi-task learning are output; the prediction results are compared with the label results in the training set, and loss values corresponding to the multi-task learning are obtained according to the comparison results; the loss values are fed back into the multi-task learning deep network to correct it. The multi-task learning includes the feature point localization task, the feature point visibility prediction task, the face detection task and the gender recognition task. By performing these mutually related tasks simultaneously within one multi-task learning deep network, this embodiment exploits the mutual influence between related tasks to improve the detection accuracy of each task and thereby the performance of the multi-task learning deep network.
Detailed description of the invention
Fig. 1 is a schematic flowchart of a first embodiment of the training method for the multi-task learning deep network of the present invention;
Fig. 2 is a schematic flowchart of an embodiment of step S101 in Fig. 1;
Fig. 3 is a schematic flowchart of another embodiment of step S101 in Fig. 1;
Fig. 4 is a schematic structural diagram of an embodiment of the multi-task learning deep network of the present invention;
Fig. 5 is a schematic flowchart of a second embodiment of the training method for the multi-task learning deep network of the present invention;
Fig. 6 is a schematic flowchart of a third embodiment of the training method for the multi-task learning deep network of the present invention;
Fig. 7 is a schematic flowchart of yet another embodiment of step S101 in Fig. 6;
Fig. 8 is a schematic structural diagram of an embodiment of the training device for the multi-task learning deep network of the present invention;
Fig. 9 is a schematic structural diagram of another embodiment of the training device for the multi-task learning deep network of the present invention;
Fig. 10 is a schematic flowchart of a first embodiment of the test method for the multi-task learning deep network of the present invention;
Fig. 11 is a schematic flowchart of an embodiment of step S201 in Fig. 10;
Fig. 12 is a schematic structural diagram of an embodiment of the first-level neural network of the two-level cascaded convolutional neural network of the present invention;
Fig. 13 is a schematic structural diagram of an embodiment of the second-level neural network of the two-level cascaded convolutional neural network of the present invention;
Fig. 14 is a schematic flowchart of a second embodiment of the test method for the multi-task learning deep network of the present invention;
Fig. 15 is a schematic flowchart of a third embodiment of the test method for the multi-task learning deep network of the present invention;
Fig. 16 is a schematic flowchart of a fourth embodiment of the test method for the multi-task learning deep network of the present invention;
Fig. 17 is a schematic structural diagram of an embodiment of the test device for the multi-task learning deep network of the present invention;
Fig. 18 is a schematic structural diagram of an embodiment of the storage medium of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of the first embodiment of the training method for the multi-task learning deep network of the present invention. As shown in Fig. 1, the training method of the multi-task learning deep network of this embodiment may include at least the following steps.

In step S101, a training set is input into the multi-task learning deep network for multi-task learning, and the prediction results of the multi-task learning are output.

In this embodiment, the images of the training set serve as the input data source to the initially constructed multi-task learning deep network; the initially constructed network performs multi-task learning on the images contained in the training set and obtains the prediction results of the multi-task learning.

In this embodiment, the multi-task learning includes a feature point localization task, a feature point visibility prediction task, a face detection task and a gender recognition task. The initially constructed multi-task learning deep network can therefore output, for the faces in the images of the training set, the corresponding feature point localization results, feature point visibility prediction results, face detection results and gender recognition results.
In this embodiment, the AFLW data set is taken as an example of the training set for training the initially constructed multi-task learning network. The AFLW data set consists mostly of face images taken in natural conditions and carries a very large amount of information: every face in the data set is annotated with 21 feature points, and face bounding boxes, head pose and gender information are also labeled. The AFLW data set contains 25,993 manually annotated face images, of which 41% are male and 59% female; most images are color images and only a small portion are grayscale images. In this embodiment, most of the images in the AFLW data set are used as the training set of the multi-task learning deep network, while a small portion is held out to test the trained multi-task learning deep network and judge whether it meets the required accuracy.
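A hold-out split of the kind described above can be sketched in a few lines; the 10% hold-out fraction and the fixed seed are assumptions, since the patent only says "most" images train and a "small portion" tests.

```python
import random

def split_dataset(num_images, holdout_fraction=0.1, seed=0):
    """Shuffle image indices and hold out a small test portion."""
    indices = list(range(num_images))
    random.Random(seed).shuffle(indices)  # fixed seed -> reproducible split
    cut = int(num_images * (1 - holdout_fraction))
    return indices[:cut], indices[cut:]

# AFLW contains 25,993 manually annotated face images.
train_ids, test_ids = split_dataset(25993)
print(len(train_ids), len(test_ids))  # → 23393 2600
```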
In step S102, the prediction results are compared with the label results in the training set, and loss values corresponding to the multi-task learning are obtained according to the comparison results.

Step S101 yields the prediction result of each task (the feature point localization task, the feature point visibility prediction task, the face detection task and the gender recognition task) produced by the initially constructed multi-task learning deep network. This step compares the obtained prediction results with the label results on the images of the training set, thereby obtaining the loss value corresponding to the execution of each task in the multi-task learning.

In step S103, the loss values are fed back into the multi-task learning deep network to correct the multi-task learning deep network.

The loss value of each task characterizes the accuracy of that task in the multi-task learning. These loss values participate in back-propagation, which yields the error of each preceding layer in the multi-task learning deep network; the network is corrected accordingly, finally producing the corrected multi-task learning deep network, i.e. the trained multi-task learning deep network.
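The per-task loss values in step S102 could, for example, be a masked squared error for the landmark coordinates and a cross-entropy for the binary tasks. These specific loss forms (and masking occluded points by visibility) are assumptions for illustration; the patent only states that each task contributes its own loss value.

```python
import math

def landmark_loss(pred, target, visible):
    """Mean squared error over feature points, counting only visible
    points (visibility masking is an assumed design choice here)."""
    terms = [(p - t) ** 2 for p, t, v in zip(pred, target, visible) if v]
    return sum(terms) / max(1, len(terms))

def cross_entropy(prob, label):
    """Binary cross-entropy, e.g. for face detection or gender."""
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))

lm = landmark_loss([0.5, 0.2, 0.9], [0.5, 0.3, 0.0], visible=[1, 1, 0])
# Only the two visible points contribute: (0.0**2 + 0.1**2) / 2
print(round(lm, 4))  # → 0.005
```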
Referring further to Fig. 2, Fig. 2 is a schematic flowchart of an embodiment of step S101 in Fig. 1. As shown in Fig. 2, step S101 may include at least the following steps.

In step S1011, the training set is input stage by stage into the several cascaded convolutional layers and pooling layers of the multi-task learning deep network, and corresponding first operation results are output from multiple convolutional layers and/or multiple pooling layers among the cascaded layers.

In this embodiment, the initially constructed multi-task learning deep network is built by improving upon the network structure of the AlexNet network.

The multi-task learning deep network contains several cascaded convolutional layers and pooling layers that perform convolution or pooling operations on the data; in this embodiment, the pooling operation of the pooling layers is max pooling. Each convolutional layer and each pooling layer obtains its own operation result through its operation, and these operation results are the corresponding feature information. In this embodiment, some convolutional layers and/or pooling layers are selected from the cascade as output layers for the operation results: the corresponding operation results of multiple convolutional layers and/or multiple pooling layers are extracted, and the extracted operation results serve as the first operation results.

Further, since the feature information contained in the operation results output by each convolutional layer and each pooling layer is not identical, extracting the corresponding operation results of multiple convolutional layers and/or multiple pooling layers allows the obtained first operation results to satisfy the amount of information required by the multi-task learning.

In the multi-task learning deep network, the operation results output by shallower convolutional and pooling layers contain more edge and corner information, which benefits learning of the feature point localization task; the operation results output by deeper convolutional and pooling layers contain more global information, which benefits learning of more complex tasks such as face detection and gender recognition. Accordingly, in this embodiment, the multiple convolutional layers and/or pooling layers include at least several shallower convolutional and/or pooling layers and several deeper convolutional and/or pooling layers, so that the obtained first operation results contain enough edge and corner information as well as some global information, making the extracted information better suited to multi-task learning. The exact number of layers to extract must be tuned according to the final prediction results, to avoid the first operation results containing too much information.
In step S1012, the first operation results are input into a feature fusion fully connected layer, which outputs feature fusion data.

After the corresponding first operation results are extracted, the amount of information contained by the multiple convolutional and/or pooling layers is too large for direct multi-task learning; the extracted corresponding first operation results need feature fusion, which maps them into a subspace and thereby improves network performance.

In this embodiment, the multiple corresponding first operation results obtained in step S1011 are output to the feature fusion fully connected layer of the multi-task learning deep network; the feature fusion fully connected layer performs feature fusion on the input first operation results and outputs the feature fusion data.

In step S1013, the feature fusion data is input into the fully connected layer corresponding to each task in the multi-task learning, each task is learned separately, and the prediction result of each task is output separately.

The feature fusion data obtained after feature fusion is further input into the fully connected layer corresponding to each task of the multi-task learning deep network; the fully connected layers corresponding to the tasks classify the features of the input feature fusion data and link to the branch corresponding to each task, thereby obtaining the prediction result of each task.
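The shared-trunk/per-task-head arrangement of steps S1012 and S1013 can be sketched as one fused feature vector feeding several heads. The toy linear heads and their weight values below are illustrative assumptions; only the names mirror the four tasks.

```python
def multi_task_heads(fused_features, heads):
    """Feed one shared fused feature vector to every per-task head.
    Each head is a toy linear map (dot product) for illustration."""
    return {task: sum(w * f for w, f in zip(weights, fused_features))
            for task, weights in heads.items()}

fused = [0.5, -1.0, 2.0]          # stand-in for the fc_full output
heads = {
    "landmarks":  [1.0, 0.0, 0.0],
    "visibility": [0.0, 1.0, 0.0],
    "face":       [0.0, 0.0, 1.0],
    "gender":     [1.0, 1.0, 1.0],
}
out = multi_task_heads(fused, heads)
print(out["gender"])  # → 1.5
```

The design point is that every head sees the same fused features, so gradients from all four tasks flow back into the shared trunk.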
Referring further to Fig. 3, as shown in Fig. 3, in another embodiment of step S101, after step S1011 obtains the corresponding first operation results output by the multiple convolutional layers and/or pooling layers, the following steps may also be included.

In step S1014, at least part of the first operation results are respectively input into corresponding sub-convolutional layers, which output corresponding second operation results of identical dimensions.

In this embodiment, the feature data (i.e. feature maps) output by the cascaded convolutional and pooling layers of the multi-task learning deep network differ in size, so the corresponding first operation results output in step S1011 by the multiple convolutional layers and/or pooling layers differ in size and cannot be connected directly. Therefore, in this embodiment, at least part of the corresponding first operation results output by each convolutional layer and/or pooling layer in step S1011 are input into corresponding sub-convolutional layers, where the convolution kernel size of each sub-convolutional layer matches the size of the first operation result fed to it, so as to obtain second operation results of identical dimensions.

It can be understood that deeper convolutional and pooling layers output operation results of smaller size than shallower ones: the shallower the layer, the larger the size of the operation result it outputs. Thus the size of the first operation result output by the deepest convolutional or pooling layer among the output layers can serve as the standard size for the adjustment, and the output sizes of the preceding convolutional or pooling layers are adjusted to this standard size. For example, if the first operation result output by the deepest convolutional or pooling layer has a size of 6x6x256, the first operation results output by the convolutional or pooling layers before it are adjusted to 6x6x256.
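The size adjustment above follows the usual convolution output-size formula. The strides below are assumptions chosen so that, consistent with the Fig. 4 example, a 4x4 kernel maps 27x27 down to 6x6 and a 2x2 kernel maps 13x13 down to 6x6; the patent states the kernel sizes but not the strides.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# conv1a: 27x27 input, 4x4 kernel; stride 4 is an assumed choice.
print(conv_out(27, kernel=4, stride=4))  # → 6
# conv3a: 13x13 input, 2x2 kernel; stride 2 is an assumed choice.
print(conv_out(13, kernel=2, stride=2))  # → 6
```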
In step S1015, the second operation results of identical dimensions are input into a full convolutional layer, which outputs dimension-reduced third operation results.

In this embodiment, the second operation results of identical dimensions are output into a full convolutional layer whose convolution kernel is 1x1, which reduces the dimensionality of the second operation results and outputs the dimension-reduced third operation results; the third operation results obtained after dimension reduction are then input to the feature fusion fully connected layer as the first operation results, and steps S1012 and S1013 continue.
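A 1x1 convolution reduces the channel dimension at a fraction of the parameter cost of larger kernels. The arithmetic below assumes the three 6x6x256 second operation results are concatenated channel-wise (256 * 3 = 768 channels) before the conv_all layer; the patent shows the 6x6x192 output but does not spell out the concatenation.

```python
def conv_params(kernel, in_ch, out_ch, bias=True):
    """Parameter count of a conv layer: k*k*Cin*Cout (+ Cout biases)."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)

# 1x1 conv_all (768 -> 192 channels) vs. a hypothetical 3x3 alternative:
p_1x1 = conv_params(1, 768, 192)
p_3x3 = conv_params(3, 768, 192)
print(p_1x1, p_3x3)  # → 147648 1327296
```

The 1x1 kernel acts as a per-pixel linear map across channels, here roughly 9x cheaper than a 3x3 kernel for the same channel reduction.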
Further, referring to Fig. 4, Fig. 4 is a schematic structural diagram of an embodiment of the multi-task learning deep network of the present invention. As shown in Fig. 4, the multi-task learning deep network of this embodiment (inside the dashed box) first contains several cascaded convolutional and pooling layers; in this embodiment, each pooling layer may also be followed by regularization. In cascade order the layers are defined as the first convolutional layer (conv1), the first pooling layer (pool1), the second convolutional layer (conv2), the second pooling layer (pool2), and so on; this embodiment takes a cascade up to the fifth pooling layer (pool5) as an example. The training set is input into these cascaded convolutional and pooling layers, and a corresponding first pooling operation result, first convolution operation result and second pooling operation result are output from the first pooling layer, the third convolutional layer and the fifth pooling layer respectively. In this embodiment, the operation kernels of the first pooling layer, the third convolutional layer and the fifth pooling layer have a size of 3x3; correspondingly, the sizes of the first pooling operation result, the first convolution operation result and the second pooling operation result are 27x27x96, 13x13x384 and 6x6x256 respectively. The size of the second pooling operation result output by the fifth pooling layer (6x6x256) is taken as the standard for size adjustment: the first pooling operation result of size 27x27x96 is input into a sub-convolutional layer (conv1a) whose convolution kernel is 4x4, and the first convolution operation result of size 13x13x384 is input into a sub-convolutional layer (conv3a) whose convolution kernel is 2x2; through the sub-convolutional layers conv1a and conv3a, the sizes of the first pooling operation result and the first convolution operation result are adjusted to 6x6x256, and the adjusted operation results of identical dimensions serve as the second operation results. Looking further at Fig. 4, the second operation results are input into a full convolutional layer (conv_all) whose convolution kernel is 1x1, which reduces their dimensionality and yields a third operation result of size 6x6x192. The third operation result is then input into the feature fusion fully connected layer (fc_full) with a 3072-dimensional feature vector, which is in turn linked to the fully connected layer corresponding to each task (the feature point localization task, the feature point visibility prediction task, the face detection task and the gender recognition task); the dimension of the fully connected layer corresponding to each task is 512, and each task is trained through it.
Through the above multi-task learning deep network, the present invention learns the feature point localization task, the feature point visibility prediction task, the face detection task and the gender recognition task together. On the one hand, adding the feature point visibility prediction task, the face detection task and the gender recognition task, which are related to feature point localization, to the multi-task learning deep network improves the precision of feature point localization while also allowing the other tasks to be executed. On the other hand, the multi-task learning deep network of this embodiment uses feature fusion: the feature maps output by multiple convolutional layers and/or pooling layers are fused, thereby obtaining enough data for the feature point localization task. The multi-task learning deep network of this embodiment is highly robust in complex situations such as pose changes, extreme illumination, exaggerated expressions and partial occlusion in images, performs excellently, and achieves higher precision and better performance.

Further, in the training method of the multi-task learning deep network of this embodiment, a nonlinear activation function is added after all convolutional layers and fully connected layers; this embodiment takes the Rectified Linear Unit (ReLU) as an example of the activation function. Further, the multi-task learning deep network of this embodiment does not add any pooling operation in the fusion network, because the features extracted by pooling are scale-invariant with respect to local information, and this property is not wanted by the feature point localization task.
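The ReLU named above is a simple element-wise nonlinearity:

```python
def relu(x):
    """Rectified Linear Unit: passes positive values, zeroes negatives."""
    return x if x > 0 else 0.0

print([relu(v) for v in (-2.0, 0.0, 3.5)])  # → [0.0, 0.0, 3.5]
```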
Further, referring to Fig. 5, Fig. 5 is the training method second embodiment of multi-task learning depth network of the present invention
Flow diagram, the present embodiment are to improve to obtain on the basis of Fig. 1 to training method first embodiment shown in Fig. 3,
The structure of its multi-task learning depth network is as shown in Figure 4.As shown in figure 5, the present embodiment may also include before step S101
Following steps:
In step S104, the training of Face datection task is carried out using AlexNet network, is obtained and Face datection task
Corresponding weight.
In the present embodiment, before being trained multi-task learning depth network, need to initialize the network,
Initializing the weight used then has existing AlexNet network progress Face datection task to obtain.Wherein, AlexNet net
Network is the neural network structure model being suggested in 2012.
In step S105, the multi-task learning depth network is initialized with the weights.
In the present embodiment, the multi-task learning depth network proposed by the present invention can be initialized according to the weights obtained in step S104.
When a depth network is trained from random starting values, the hidden layer neurons in the depth network may be in a saturated state; at this point, a small adjustment of the weights brings only an extremely weak change to the activation values of the hidden layer neurons, and this weak change also affects the remaining neurons in the network and in turn the cost function. The final result is that these weights learn very slowly when the network carries out the gradient descent algorithm. By changing the distribution of the weights, i.e., initializing the network with the pretrained weights, the network can be improved.
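The initialization described above can be sketched as copying matching pretrained weights and falling back to small random values elsewhere; the layer names and dict-of-lists representation here are hypothetical, not the patent's data structures:

```python
import random

def init_from_pretrained(network, pretrained, fallback_std=0.01):
    """Initialize a network (dict: layer name -> weight list) from pretrained
    AlexNet-style weights where the layer names and shapes match; layers with
    no pretrained counterpart fall back to small random values, avoiding the
    saturated starting point described above."""
    initialized = {}
    for name, weights in network.items():
        if name in pretrained and len(pretrained[name]) == len(weights):
            initialized[name] = list(pretrained[name])       # copy pretrained weights
        else:
            initialized[name] = [random.gauss(0.0, fallback_std)
                                 for _ in weights]           # small random init
    return initialized

net = {"conv1": [0.0] * 4, "fc_gender": [0.0] * 2}   # hypothetical layer names
pre = {"conv1": [0.1, -0.2, 0.3, 0.05]}
w = init_from_pretrained(net, pre)
```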
Further, referring to Fig. 6, Fig. 6 is a flow diagram of the third embodiment of the training method of the multi-task learning depth network of the present invention. The present embodiment is obtained by improving on the basis of the first embodiment of the training method shown in Fig. 1 to Fig. 3. As shown in Fig. 6, the present embodiment may also include the following step before step S101:
In step S106, the predicted face regions of the images in the training set are calculated.
In the present embodiment, before the training set is input into the multi-task learning depth network, predicted face regions are calculated for the images in the training set through an RCNN network. In the present embodiment, the algorithm used to calculate the predicted face regions is the selective search algorithm.
The present embodiment can also be combined with the second embodiment of the training method of the multi-task learning depth network shown in Fig. 5; it should be noted that there is no necessary order between step S106 and steps S104 and S105.
Further, referring to Fig. 7, on the basis of the third embodiment of the training method of the multi-task learning depth network shown in Fig. 6, inputting the training set into the multi-task learning depth network for multi-task learning in step S101 may further include the following steps:
In step S1016, the training set is input into the multi-task learning depth network, the predicted face regions are compared with the labeled face regions marked on the images in the training set, and comparison results are obtained.
The training set is input into the multi-task learning depth network. According to the above description of the training set, the face regions of the images in the training set have been marked manually, and the manually marked face regions are taken as the labeled face regions. In the present embodiment, after the training set is input into the multi-task learning depth network, when each task of multi-task learning is learned, the predicted face regions calculated in step S106 need to be compared with the labeled face regions to obtain comparison results, so that the predicted face regions meeting the preset condition corresponding to each task can be selected from the predicted face regions according to the comparison results.
In the present embodiment, the comparison result is the degree of overlap between a predicted face region and the labeled face region; the degree of overlap reflects how well the predicted face region agrees with the labeled face region.
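The degree of overlap between two regions is commonly computed as intersection-over-union; a minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corners (the text itself does not fix a representation):

```python
def overlap_degree(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

print(overlap_degree((0, 0, 10, 10), (0, 0, 10, 10)))  # identical boxes -> 1.0
```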
In step S1017, according to the comparison results, the predicted face regions meeting the preset conditions are selected as detection face regions.
The degree of overlap between each predicted face region and the corresponding labeled face region can be calculated in step S1016. In the present embodiment a corresponding preset condition is set for each task of multi-task learning, i.e., the fully connected layer corresponding to each task only learns its task on the predicted face regions meeting the preset condition.
In the present embodiment, the predicted face regions meeting the preset conditions are taken as detection face regions; since the preset condition corresponding to each task may differ, the detection face regions selected for each task may also differ.
In step S1018, multi-task learning is carried out on the detection face regions.
After the detection face regions corresponding to each task are obtained in step S1017, the fully connected layer corresponding to each task can be made to learn its task on the corresponding selected detection face regions.
It can be understood that, in the present embodiment, step S101 executes these steps before the step, shown in Fig. 2 and Fig. 3, in which the fully connected layer corresponding to each task learns its task.
The training of each task in the multi-task learning of the multi-task learning depth network of the invention is illustrated below:
For the face detection task, the corresponding preset condition is that the degree of overlap between the predicted face region and the labeled face region is greater than 0.5, or that the degree of overlap between the predicted face region and the labeled face region is less than 0.35; in other words, in the present embodiment the face detection task is carried out on predicted face regions whose degree of overlap with the labeled face region is greater than 0.5 or less than 0.35. Detection face regions whose degree of overlap with the labeled face region is greater than 0.5 are taken as positive samples, and detection face regions whose degree of overlap with the labeled face region is less than 0.35 are taken as negative samples. The formula is as follows:

loss_D = -(1 - l)·log(1 - p) - l·log(p);
where loss_D is the loss function, computed in the present embodiment on the softmax output; for a positive sample the value of l is 1, and for a negative sample the value of l is 0; p indicates the probability that the detection face region belongs to a face. In the present embodiment a face probability threshold can be set and the calculated p value compared with it: a detection face region whose p value is greater than and/or equal to the face probability threshold is considered a face, and a detection face region whose p value is less than the face probability threshold is considered non-face, thereby carrying out the learning of the face detection task.
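A minimal sketch of the detection loss and thresholding described above, assuming l = 1 for positive samples and l = 0 for negative samples:

```python
import math

def loss_d(p, l, eps=1e-12):
    """Cross-entropy detection loss: l = 1 for a positive sample (overlap > 0.5),
    l = 0 for a negative sample (overlap < 0.35); p is the predicted probability
    that the detection face region is a face."""
    return -(1 - l) * math.log(1 - p + eps) - l * math.log(p + eps)

def is_face(p, threshold=0.5):
    # A region is considered a face when p reaches the face probability threshold.
    return p >= threshold
```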
For the feature point localization task, the present embodiment uses the 21 labeled facial feature points in the AFLW data set. In the present embodiment, the preset condition corresponding to the feature point localization task is that the degree of overlap between the predicted face region and the labeled face region is greater than 0.35, i.e., the predicted face regions whose degree of overlap with the labeled face region is greater than 0.35 are taken as the detection face regions for learning the feature point localization task. A detection face region is denoted {x, y, w, h}, where (x, y) is the coordinate of the center of the detection face region, and w and h are respectively the width and height of the detection face region. Each feature point is offset relative to the center (x, y) of the detection face region, and its coordinate is normalized by (w, h):

a_i = (x_i - x) / w, b_i = (y_i - y) / h;

where (x_i, y_i) represents the coordinate of a facial feature point, and (a_i, b_i) indicates the relative value after the coordinate of the facial feature point is normalized.
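The normalization step can be sketched as follows, with the region given as {x, y, w, h} and (x, y) its center, as in the text:

```python
def normalize_points(points, region):
    """Normalize feature point coordinates relative to the center (x, y) of a
    detection face region {x, y, w, h}: a_i = (x_i - x) / w, b_i = (y_i - y) / h."""
    x, y, w, h = region
    return [((xi - x) / w, (yi - y) / h) for xi, yi in points]

# A point 10 px right of and 10 px above a 20x40 region's center:
print(normalize_points([(60, 40)], (50, 50, 20, 40)))  # -> [(0.5, -0.25)]
```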
In the present embodiment the coordinates of invisible feature points are set to (0, 0), and the visible feature points are used to learn the feature point localization task with a predetermined loss function. The formula is as follows:

loss_L = (1 / 2N) · Σ_{i=1..N} v_i · ((x̂_i - a_i)² + (ŷ_i - b_i)²);

where loss_L is the loss function, in the present embodiment the Euclidean function; N is the number of feature points (in the AFLW data set, the number of feature points is 21); (x̂_i, ŷ_i) is the relative coordinate after the corresponding predicted feature point coordinate is normalized; v_i indicates the visibility factor of the feature point: if v_i equals 1, the feature point is visible in the detection face region, and if v_i equals 0, the feature point is invisible in the detection face region. In the present embodiment, invisible feature points do not participate in backpropagation.
With the above two calculation formulas, the coordinate values of the feature points can finally be calculated from the normalized relative coordinates of the corresponding predicted feature points, the number of feature points, and the coordinates, width and height of the detection face region.
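A sketch of the Euclidean feature point loss with the visibility mask described above; the 1/(2N) scaling is an assumption, not stated explicitly in the text:

```python
def loss_l(pred, truth, vis):
    """Visibility-masked Euclidean loss over N normalized feature points:
    invisible points (v_i = 0) contribute nothing, so they do not propagate
    gradients.  pred/truth are lists of (a_i, b_i) relative coordinates."""
    n = len(truth)
    total = 0.0
    for (pa, pb), (ta, tb), v in zip(pred, truth, vis):
        total += v * ((pa - ta) ** 2 + (pb - tb) ** 2)
    return total / (2.0 * n)
```

Note that an arbitrarily wrong prediction for an invisible point (second point below) is ignored entirely:

```python
print(loss_l([(0.5, 0.5), (9, 9)], [(0.5, 0.5), (0, 0)], [1, 0]))  # -> 0.0
```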
For feature point visibility, the present embodiment can predict whether a feature point is visible by learning the visibility factor of the feature point. In the present embodiment the preset condition corresponding to the feature point visibility prediction task is that the degree of overlap between the predicted face region and the labeled face region is greater than 0.35, i.e., the predicted face regions whose degree of overlap with the labeled face region is greater than 0.35 are taken as the detection face regions for learning the feature point visibility prediction task. The formula is as follows:

loss_V = (1 / N) · Σ_{i=1..N} (v̂_i - v_i)²;

where loss_V is the loss function, in the present embodiment the Euclidean function; N is the number of feature points (in the AFLW data set, the number of feature points is 21); if a feature point is visible, its visibility factor v_i is 1, and if it is invisible, the visibility factor is 0; v̂_i is the predicted visibility of the feature point calculated in this way.
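The visibility loss can be sketched as a mean squared error over the N visibility factors; the 1/N scaling is an assumption:

```python
def loss_v(pred_vis, true_vis):
    """Euclidean visibility loss over the N feature points: true_vis holds the
    0/1 visibility factors, pred_vis the network's predicted visibilities."""
    n = len(true_vis)
    return sum((p - v) ** 2 for p, v in zip(pred_vis, true_vis)) / float(n)
```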
For the gender recognition task, in the present embodiment the corresponding preset condition is that the degree of overlap between the predicted face region and the labeled face region is greater than 0.5, i.e., the predicted face regions whose degree of overlap with the labeled face region is greater than 0.5 are taken as the detection face regions for learning the gender recognition task. The formula is as follows:

loss_G = -(1 - g)·log(p_0) - g·log(p_1);

where loss_G is the loss function; the present embodiment can use the cross-entropy loss function; (p_0, p_1) is a two-dimensional probability vector obtained by network calculation; if the gender is male, g = 0, and if the gender is female, g = 1.
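A sketch of the two-class gender cross-entropy, with (p0, p1) the network's probability vector and g the label as in the text:

```python
import math

def loss_g(p, g, eps=1e-12):
    """Two-class cross-entropy for gender: p = (p0, p1) is the network's
    probability vector, g = 0 for male and g = 1 for female."""
    p0, p1 = p
    return -(1 - g) * math.log(p0 + eps) - g * math.log(p1 + eps)
```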
Further, the global loss function of the multi-task learning depth network of the present embodiment is the weighted sum of the individual loss values of the tasks. The calculation formula is as follows:

loss = Σ_t λ_t · loss_t;

where loss_t is the loss value of the corresponding t-th task, and the weight parameter λ_t is determined by the importance of each task in the total loss. In the present embodiment λ_D = 1, λ_L = 5, λ_V = 0.5 and λ_G = 2, corresponding respectively to the face detection task, the facial feature point localization task, the feature point visibility prediction task and the gender recognition task.
It can be understood that the learning of each of the above tasks is carried out in its corresponding fully connected layer; the fully connected layer corresponding to each task is linked to the corresponding loss function and only carries out the learning of that task.
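The weighted global loss can be sketched directly from the stated weights:

```python
LAMBDA = {"D": 1.0, "L": 5.0, "V": 0.5, "G": 2.0}  # per-task weights from the text

def global_loss(task_losses, weights=LAMBDA):
    """Weighted sum of the individual task losses (detection, landmark
    localization, visibility, gender)."""
    return sum(weights[t] * task_losses[t] for t in task_losses)

print(global_loss({"D": 1.0, "L": 1.0, "V": 1.0, "G": 1.0}))  # -> 8.5
```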
Further, referring to Fig. 8, Fig. 8 is a structural schematic diagram of an embodiment of the training equipment of the multi-task learning depth network of the present invention. As shown in Fig. 8, the training equipment 100 of the multi-task learning depth network of the present embodiment includes a memory 101 and a processor 102 connected to each other, wherein the memory 101 stores the built multi-task learning depth network and corresponding program data; in addition, the memory 101 can also store the training set for training the multi-task learning depth network. The processor 102 is used for executing, according to the program data, any one of the first to third embodiments of the training method of the multi-task learning depth network shown in Fig. 1 to Fig. 7, to complete the training of the multi-task learning depth network.
Further, as shown in Fig. 9, in another embodiment the training equipment 200 may also include a communication circuit 103 connected to the memory 101 and/or the processor 102 through a bus; the communication circuit 103 is used for obtaining the training set and inputting it to the processor, in which case the training set need not be stored in the memory 101.
Further, the invention also provides a test method of the multi-task learning depth network. Referring to Fig. 10, Fig. 10 is a flow diagram of the first embodiment of the test method of the multi-task learning depth network of the invention. As shown in Fig. 10, the test method of the multi-task learning depth network of the present embodiment may include at least the following steps:
In step S201, the image to be tested is input into a two-level cascaded convolutional neural network, and the first face regions to be tested contained in the image to be tested are output.
In the present embodiment, the image to be tested can be an image in the training set that was not used for training the multi-task learning depth network, or an image in another data set. For example, when the multi-task learning depth network is trained with the above AFLW data set, 25000 images in the AFLW data set are used, and the 993 images not used for training are taken as images to be tested.
Further, in the present embodiment, before the image to be tested is input into the trained multi-task learning depth network, the input image to be tested is first processed by the two-level cascaded convolutional neural network to obtain the first face regions to be tested contained in the image to be tested. It is worth noting that the first face regions to be tested are obtained by the two-level cascaded convolutional neural network during the test process of the multi-task learning depth network, and are not identical to the predicted face regions calculated by the multi-task learning depth network during training in step S106 shown in Fig. 6.
In step S202, the first face regions to be tested are input into the multi-task learning depth network, second face regions to be tested meeting the preset conditions are selected from the first face regions to be tested, and the test results of face detection, feature point localization, feature point visibility prediction and gender recognition on the second face regions to be tested are output.
The image to be tested with the first face regions to be tested is input into the trained multi-task learning depth network, and the multi-task learning depth network is made to select, from the first face regions to be tested, the second face regions to be tested meeting the preset conditions, and then to test each task of the multiple tasks on the second face regions to be tested, finally outputting the test results of face detection, feature point localization, feature point visibility prediction and gender recognition.
Further, in the present embodiment, after the multi-task learning depth network receives the first face regions to be tested obtained by the two-level cascaded convolutional neural network, the first face regions to be tested are evaluated to obtain the detection score corresponding to each first face region to be tested, and the second face regions to be tested are selected from the first face regions to be tested according to the detection scores. The selection compares the detection score of each first face region to be tested with a preset score threshold and selects the first face regions to be tested whose detection scores are greater than the preset score threshold; the selected first face regions to be tested are taken as the second face regions to be tested input into the multi-task learning depth network. The preset score threshold can be adjusted according to actual needs; in the present embodiment the preset score threshold can be 0.4, 0.5 or 0.6.
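The score-threshold selection of the second face regions to be tested can be sketched as:

```python
def select_second_regions(regions, score_threshold=0.5):
    """Keep the first face regions to be tested whose detection score exceeds
    the preset score threshold (0.4, 0.5 or 0.6 in the embodiment); each region
    is a (box, score) pair."""
    return [(box, s) for box, s in regions if s > score_threshold]

regions = [((0, 0, 1, 1), 0.9), ((2, 2, 3, 3), 0.2)]
print(select_second_regions(regions))  # -> [((0, 0, 1, 1), 0.9)]
```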
In the present embodiment, after the image to be tested with the first face regions to be tested is input into the trained multi-task learning depth network, each task of the multiple tasks is tested on the second face regions to be tested; the content executed by the multi-task learning depth network is similar to that executed in its training process. For the specific content of inputting the image to be tested with the first face regions to be tested into the trained multi-task learning depth network for multi-task detection, please refer to any one of the first to third embodiments of the training method of the multi-task learning depth network shown in Fig. 1 to Fig. 6.
Further, when the feature point localization and feature point visibility prediction tasks are tested, the coordinates of the feature points need to be transformed into coordinates in the original image. The transformation formula used is as follows:

x_i = x + x̂_i·w, y_i = y + ŷ_i·h;

where (x̂_i, ŷ_i) is the relative position of the predicted i-th feature point, and {x, y, w, h} is the detection face region as defined above.
When testing the trained multi-task learning depth network, the present embodiment adds the two-level cascaded convolutional neural network before the multi-task learning depth network; the two-level cascaded convolutional neural network determines detection face regions on the input image to be tested and obtains the first face regions to be tested, so that the multi-task learning depth network can perform the multi-task detection more accurately according to the first face regions to be tested, improving the detection precision of each task of the multiple tasks. The multi-task learning depth network of the present embodiment has high robustness to complicated situations in images such as pose variation, extreme illumination, exaggerated expression and partial occlusion; it performs well, achieving higher precision and better performance.
Further, referring to Fig. 11, Fig. 11 is a flow diagram of an embodiment of step S201 in Fig. 10. As shown in Fig. 11, step S201 may include the following steps:
In step S2011, the image to be tested is input into the first-level neural network of the two-level cascaded convolutional neural network, and several candidate detection windows respectively labeled as face region and non-face region are output.
In the present embodiment, this step is carried out by the first-level neural network of the two-level cascaded convolutional neural network. The image to be tested is input into the first-level neural network, which includes several cascaded convolutional layers and pooling layers; each convolutional layer and pooling layer gradually carries out its corresponding operation on the image to be tested, the output image regions are finally divided into two classes and marked, and several candidate detection windows whose labels are respectively face region and non-face region are output. The candidate detection windows can then be input into the second-level neural network for subsequent processing.
Referring to Fig. 12, as shown in Fig. 12, the several cascaded convolutional layers and pooling layers included in the first-level neural network of the present embodiment may include: a first convolutional layer (conv1), a second pooling layer (pool1), a third convolutional layer (conv2) and a fourth convolutional layer (conv3). The convolution kernel size of the first convolutional layer is 3x3; since, compared with other classification and multi-object detection tasks, determining face candidate regions is essentially a challenging binary classification task, each layer may need fewer convolution kernels, so using 3x3 convolution kernels can reduce the amount of calculation while adding depth to the neural network, which further improves the performance of the neural network. The pooling kernel size of the second pooling layer is 2x2, and max pooling is used. The convolution kernel size of the third convolutional layer is 3x3. The convolution kernel size of the fourth convolutional layer is 1x1; setting the convolution kernel size to 1x1 enables the neural network to complete cross-channel information exchange and integration, and allows dimensionality reduction and/or dimensionality increase of the number of convolution kernel channels.
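The cross-channel mixing of a 1x1 convolution can be sketched as a per-pixel matrix multiplication (a NumPy sketch, not the patent's implementation):

```python
import numpy as np

def conv1x1(feature_map, kernels):
    """A 1x1 convolution mixes information across channels at every spatial
    position without looking at neighbours: it is a per-pixel matrix multiply.
    feature_map: (H, W, C_in); kernels: (C_in, C_out)."""
    h, w, c_in = feature_map.shape
    out = feature_map.reshape(-1, c_in) @ kernels   # (H*W, C_out)
    return out.reshape(h, w, kernels.shape[1])

fmap = np.ones((4, 4, 8))        # 8 input channels
k = np.ones((8, 2)) / 8.0        # reduce to 2 channels (dimensionality reduction)
out = conv1x1(fmap, k)
```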
In other embodiments, the first-level neural network of the two-level cascaded convolutional neural network can also output bounding box regression vectors while outputting the several candidate detection windows labeled as face regions and non-face regions.
In step S2012, the several candidate detection windows are input into the second-level neural network of the two-level cascaded convolutional neural network; the second-level neural network abandons the candidate detection windows labeled as non-face regions and carries out bounding box regression processing on the candidate detection windows labeled as face regions, outputting the first candidate face regions after bounding box regression processing, which are taken as the first face regions to be tested.
In the present embodiment, this step is carried out by the second-level neural network of the two-level cascaded convolutional neural network. The several candidate detection windows obtained in step S2011 are input into the second-level neural network; at this time the several candidate detection windows are marked as face regions and non-face regions, so the second-level neural network abandons, from the several candidate detection windows, the candidate detection windows labeled as non-face regions, and retains the candidate detection windows labeled as face regions. Further, bounding box regression processing is carried out on the candidate detection windows to obtain the first candidate face regions after bounding box regression, and the first candidate face regions are input into the multi-task learning depth network as the first face regions to be tested, so as to test the multi-task learning depth network. In the present embodiment, the output first candidate face regions include the location information of the regions in the image.
It can be understood that, among the several marked face regions obtained in the first-level neural network, several, tens or even more face regions may be marked for the same face; in the second-level neural network, bounding box regression is carried out on the multiple face regions of the same face, reducing the face regions for the same face and improving the matching precision between the obtained face regions and the faces in the image. At this time the bounding box regression vectors output by the first-level neural network can be used in the bounding box regression processing.
Referring to Fig. 13, as shown in Fig. 13, the second-level neural network may likewise include several cascaded convolutional layers and pooling layers, such as a cascaded first convolutional layer, second pooling layer, third convolutional layer, fourth pooling layer, fifth convolutional layer and a fully connected layer. The convolution kernel sizes of the first convolutional layer and the third convolutional layer are 3×3; the convolution kernel size of the fifth convolutional layer is 2×2; the pooling kernel sizes of the second pooling layer and the fourth pooling layer are 3×3, and both use max pooling; the fully connected layer is a fully connected layer of 128-dimensional feature vectors.
According to Fig. 12 and Fig. 13, since the sizes of the images input to the first-level neural network and the second-level neural network differ, before the image to be tested is input into the first-level neural network and the first candidate face regions are input into the second-level neural network, size adjustment needs to be carried out on the image to be tested and the first candidate face regions respectively.
Further, referring to Fig. 14, Fig. 14 is a flow diagram of the second embodiment of the test method of the multi-task learning depth network of the present invention. As shown in Fig. 14, in the present embodiment the following step may also be included after step S201 of Fig. 10:
In step S203, the first face regions to be tested whose degree of overlap is higher than a preset degree of overlap are merged, and the merged final face regions to be tested are obtained.
It can be understood that, among the first face regions to be tested obtained by the two-level cascaded convolutional neural network, several, tens or even more first face regions to be tested may be obtained for the same face. Therefore, in the present embodiment, it can be considered that the first face regions to be tested with a higher degree of overlap among those obtained by the two-level cascaded convolutional neural network are obtained from the same face; the first face regions to be tested with a higher degree of overlap can therefore be merged, to reduce the number of first face regions to be tested and improve the detection precision.
Further, in the present embodiment the first face regions to be tested obtained by the two-level cascaded convolutional neural network are compared with each other to obtain the mutual degrees of overlap among the multiple first face regions to be tested; two or more first face regions to be tested whose degree of overlap is higher than the preset degree of overlap are merged, obtaining the final face regions to be tested. The obtained final face regions to be tested are input into the multi-task learning depth network for the subsequent test steps.
In the present embodiment, the first face regions to be tested can be merged by the non-maximum suppression algorithm (non maximum suppression, NMS); the non-maximum suppression algorithm selects the highest-scoring region from the first face regions to be tested, abandons all other regions whose overlap with it is greater than a specific threshold, and scales the selected region to a preset size, which in the present embodiment is 227x227. In addition, the preset degree of overlap can be adjusted according to actual needs.
In another embodiment, step S203 can also be executed after step S2012, i.e., after the first candidate face regions after bounding box regression processing are obtained in step S2012, the first candidate face regions are merged by the non-maximum suppression algorithm (non maximum suppression, NMS). In the present embodiment the process flow of obtaining the final face regions to be tested by NMS is as follows:
In the above process flow, the score reset function s_i is as follows:

s_i = s_i, if iou(M, b_i) < N_t; s_i = 0, if iou(M, b_i) ≥ N_t;

where M is the currently highest-scoring first candidate face region, b_i an adjacent region, and N_t the overlap threshold. In the above formula, to decide whether an adjacent first candidate face region can be retained, NMS uses a hard threshold method, finally obtaining the merged final face regions to be tested.
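A hard-threshold NMS along the lines described above can be sketched as follows (the greedy variant; the patent's exact process flow is shown only in its figure):

```python
def nms(regions, iou_threshold=0.5):
    """Hard-threshold non-maximum suppression: repeatedly keep the highest-
    scoring region and drop every remaining region whose overlap with it
    reaches the threshold.  regions: list of (box, score) pairs with
    box = (x1, y1, x2, y2)."""
    def iou(a, b):
        ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = ix * iy
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / float(area(a) + area(b) - inter)

    kept = []
    rest = sorted(regions, key=lambda r: r[1], reverse=True)
    while rest:
        best = rest.pop(0)                  # highest remaining score
        kept.append(best)
        rest = [r for r in rest if iou(best[0], r[0]) < iou_threshold]
    return kept

boxes = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.7)]
merged = nms(boxes)   # the two overlapping boxes collapse into one
```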
Further, referring to Fig. 15, Fig. 15 is a flow diagram of the third embodiment of the test method of the multi-task learning depth network of the present invention. As shown in Fig. 15, in the present embodiment the following step may also be included after step S201 of Fig. 10:
In step S204, the size of the first face regions to be tested is adjusted to the preset face region size allowed by the multi-task learning depth network.
Since the multi-task learning depth network has requirements on the size of the input face regions to be tested, in the present embodiment, after the two-level cascaded convolutional neural network obtains the first face regions to be tested, the first face regions to be tested are size-adjusted to the preset face region size allowed by the multi-task learning depth network. At this point, the first face regions to be tested that are merged are the first face regions to be tested after size adjustment.
Further, step S204 can be executed after step S203, i.e., the size of the merged final face regions to be tested is adjusted to the preset face region size allowed by the multi-task learning depth network.
Further, referring to Fig. 16, Fig. 16 is a flow diagram of the fourth embodiment of the test method of the multi-task learning depth network of the present invention. As shown in Fig. 16, in the present embodiment the following step may also be included before step S201 of Fig. 10:
In step S205, the size of the image to be tested is adjusted to the image size allowed by the two-level cascaded convolutional neural network.
Before the image to be tested is input into the first-level neural network of the two-level cascaded convolutional neural network, the present embodiment carries out different size changes on the image to be tested. The original zoom scale is 12/S, where S is the minimum size of the first face regions to be tested and 12 is the minimum size of the first face regions to be tested acceptable to the first-level neural network. In the present embodiment the processing of the image to be tested can be carried out as follows:
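The multi-scale resizing above can be sketched as building an image pyramid; the 0.709 shrink factor between pyramid levels is an assumption, not stated in the text:

```python
def pyramid_scales(min_face_size, img_min_side, net_input=12, factor=0.709):
    """Image pyramid for the first-level network: the original zoom scale is
    net_input / S (S = minimum face size to detect), and the image is then
    shrunk repeatedly by `factor` while its smaller side still covers the
    network input size."""
    scales = []
    scale = net_input / float(min_face_size)
    side = img_min_side * scale
    while side >= net_input:
        scales.append(scale)
        scale *= factor
        side *= factor
    return scales

s = pyramid_scales(24, 240)   # detect faces >= 24 px in a 240 px image
```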
The loss function is divided into two parts, concerning respectively face classification and face region regression. The cross-entropy loss function classifies each candidate as a face region or a non-face region; for each sample x_i, the formula is as follows:

L_i^det = -(y_i^det·log(p_i) + (1 - y_i^det)·log(1 - p_i));

where y_i^det indicates the ground-truth label of the sample, and p_i indicates the probability that sample x_i is a face.
The bounding box regression processing uses a quadratic loss function; the regression loss is actually solved with the Euclidean distance, and the formula is as follows:

L_i^box = ||ŷ_i^box - y_i^box||²₂;

where ŷ_i^box represents the coordinates predicted by the network, and y_i^box indicates the actual ground-truth coordinates; y^box is a four-tuple composed of the abscissa of the upper-left corner, the ordinate of the upper-left corner, the height and the width.
The multi-task learning depth network of the present embodiment can in fact be considered a three-level network; the loss function contains the two parts of face classification and bounding box regression, so the two loss functions need to be trained, and a different weight is distributed to each loss function to form the final objective function. The final objective function of the present embodiment is as follows:

min Σ_{i=1..N} Σ_j α_j·β_i^j·L_i^j;

The entire training process is essentially the process of minimizing the above function, where α_j represents the importance of the corresponding task and N indicates the number of training samples; in the first-level and second-level neural networks α_det = 1 and α_box = 0.5; β_i^j indicates the sample label, i.e., whether sample i participates in task j.
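The final objective can be sketched as a doubly weighted sum; the interpretation of the sample label as a 0/1 per-task indicator is an assumption:

```python
def cascade_objective(samples, alpha={"det": 1.0, "box": 0.5}):
    """Final objective for the first/second-level networks: each sample i
    contributes alpha_j * beta_i_j * L_i_j summed over tasks j, with beta the
    0/1 sample-type indicator (whether the sample is labeled for that task)."""
    total = 0.0
    for losses, beta in samples:            # losses/beta: dicts keyed by task
        for j, l in losses.items():
            total += alpha[j] * beta.get(j, 0) * l
    return total

samples = [({"det": 1.0, "box": 2.0}, {"det": 1, "box": 1}),
           ({"det": 4.0}, {"det": 1})]     # second sample has no box label
print(cascade_objective(samples))          # 1*1 + 0.5*2 + 1*4 -> 6.0
```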
Further, the detection precision of the face detection, feature point localization, feature point visibility prediction and gender recognition of the multi-task learning depth network of the invention is illustrated:
The detection precision of the face detection of the present embodiment is mainly assessed with the Face Detection Data Set and Benchmark (FDDB). The FDDB database consists of 2845 images with 5171 labeled faces, collected and provided by the University of Massachusetts; for fairness, FDDB provides unified evaluation code. According to the test results, when the number of false detections is 100, the measured precision of the multi-task learning depth network of the present embodiment can reach 86.61%, only slightly lower than the optimal precision of 88.53% (obtained by testing the Deep Pyramid Deformable Parts Model for Face Detection, the DP2MFD model). As the number of false detections increases, the measured face detection precision of the multi-task learning depth network of the present embodiment also rises accordingly; when the number of false detections is 250, the measured precision can be as high as 90.1%. For the face detection of the multi-task learning depth network, the FDDB data set is extremely challenging, because it contains many small and blurry faces; first of all, adjusting the image to the 227x227 input size can distort the faces and reduce the detection scores. Despite these problems, the multi-task learning depth network of the present embodiment still obtains relatively good test results.
The facial landmark localization performance of the multi-task learning depth network of this embodiment is assessed on the AFLW data set. The AFLW data set consists of 1000 pictures containing 1132 face samples. Only a predicted region whose overlap with the labeled region is greater than a preset threshold (which can be set to 0.5 in this embodiment) is used as face test data, and the mean position of the predicted landmarks corresponding to the face region under test is calculated. A subset of 450 samples is created at random from the AFLW data set and divided by deflection angle into three groups, [0°, 30°], [30°, 60°] and [60°, 90°], each accounting for 1/3. Localization accuracy is assessed with the normalized mean error; since the method of the invention involves landmark visibility, the evaluation error of the visible landmarks is normalized and averaged, as follows:

NME = (1 / N_t) · Σ_{i=1}^{N_t} [ 1 / (d_i · |v_i|_1) · Σ_j v_i(j) · ‖ Û_i(:, j) − U_i(:, j) ‖_2 ]

where U_i represents the ground-truth landmark coordinates, v_i the corresponding landmark visibilities, Û_i the predicted landmark coordinates, and N_t the number of test pictures; |v_i|_1 is the number of visible landmarks in the i-th picture, U_i(:, j) is the j-th column of U_i, and d_i is the square root of the face bounding-box size. It is worth noting that when the face image is close to frontal, d_i in most cases uses the inter-pupil distance; however, since the AFLW data set contains invisible landmarks, d_i here uses the face bounding-box size. Relative to existing methods, the test method of the multi-task learning depth network of this embodiment still obtains preferable results.
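A minimal NumPy sketch of the visible-landmark normalized mean error described above (an illustration under the stated definitions, not the patent's evaluation code):

```python
import numpy as np

def normalized_mean_error(pred, gt, vis, d):
    """Visible-landmark normalized mean error.

    pred, gt : arrays of shape (N_t, 2, K) - predicted / ground-truth
               landmark coordinates (columns are landmarks)
    vis      : array of shape (N_t, K), 1 for visible landmarks, else 0
    d        : array of shape (N_t,) - per-image normalizer (square
               root of the face bounding-box size)
    """
    n_t = pred.shape[0]
    err = 0.0
    for i in range(n_t):
        # Euclidean error of each landmark column, masked by visibility
        per_pt = np.linalg.norm(pred[i] - gt[i], axis=0)  # shape (K,)
        n_vis = vis[i].sum()                              # |v_i|_1
        err += (per_pt * vis[i]).sum() / (d[i] * n_vis)
    return err / n_t
```

For instance, with one test image whose single visible landmark is off by a Euclidean distance of 5 and a normalizer d = 5, the NME is 1.0.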
Gender recognition is assessed on the CelebA and LFWA data sets, both of which contain gender information. The CelebA and LFWA data sets respectively comprise labeled images selected from the CelebFaces and LFW data sets. The CelebA data set includes 10,000 identities and 200,000 images in total; the LFWA data set includes 5,327 identities and 13,233 images in total. The multi-task learning depth network of this embodiment achieves an accuracy of 97% on the CelebA data set and 93% on the LFWA data set.
Further, please refer to Figure 17, a structural schematic diagram of an embodiment of the test equipment of the multi-task learning depth network of the present invention. As shown in Figure 17, the test equipment 300 of the multi-task learning depth network of this embodiment at least may include a memory 301, a communication circuit 303 and a processor 302 connected to one another. The memory 301 stores a two-level cascaded convolutional neural network, the multi-task learning depth network and program data; the communication circuit 303 is used to obtain the image under test; the processor 302 is used to execute, according to the program data, the test method of the above multi-task learning depth network, using the two-level cascaded convolutional neural network and the multi-task learning depth network to perform face detection, landmark localization, landmark visibility prediction and gender recognition on the image under test. In another embodiment, the training set can also be stored directly in the memory 301.
On the other hand, please refer to Figure 18, a structural schematic diagram of an embodiment of the storage medium of the present invention. As shown in Figure 18, at least one program or instruction 401 is stored in the storage medium 400 of this embodiment; the program or instruction 401 is used to execute any one of the first to third embodiments of the training method of the multi-task learning depth network shown in Figs. 1 to 7 and/or any one of the first to third embodiments of the test method of the multi-task learning depth network shown in Figs. 10 to 16.
In one embodiment, the storage medium 400 may be the memory in Fig. 8, Fig. 9 or Fig. 17. The storage medium 400 of this embodiment may be a storage chip, a hard disk, a removable hard disk, a flash drive, an optical disc or another readable and writable storage device; in addition, the storage medium may also be a server, etc.
The above is only an implementation of the present invention and is not intended to limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the description and drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the protection scope of the present invention.
Claims (10)
1. A training method of a multi-task learning depth network, characterized by comprising:
inputting a training set into the multi-task learning depth network to perform multi-task learning, and outputting a prediction result of the multi-task learning, wherein the multi-task learning comprises a landmark localization task, a landmark visibility prediction task, a face detection task and a gender recognition task;
comparing the prediction result with a label result in the training set, and obtaining, according to the comparison result, a loss value corresponding to the multi-task learning;
feeding the loss value back into the multi-task learning depth network to correct the multi-task learning depth network.
2. The training method according to claim 1, characterized in that the step of inputting the training set into the multi-task learning depth network to perform multi-task learning and outputting the prediction result of the multi-task learning comprises:
inputting the training set stage by stage into several cascaded convolutional layers and pooling layers of the multi-task learning depth network, and outputting corresponding first operation results respectively from multiple convolutional layers and/or multiple pooling layers among the several convolutional layers and pooling layers;
inputting the first operation results into a feature-fusion fully connected layer, and outputting feature fusion data;
inputting the feature fusion data respectively into fully connected layers corresponding to each task in the multi-task learning to learn each task, and outputting the prediction result corresponding to each task respectively.
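The fuse-then-branch structure of claim 2 (intermediate features concatenated into a fusion fully connected layer, then one head per task) might be sketched as follows. This is a hypothetical PyTorch illustration with made-up layer sizes, not the claimed network:

```python
import torch
import torch.nn as nn

class MultiTaskSketch(nn.Module):
    """Hypothetical sketch: cascaded conv/pool layers, a feature-fusion
    fully connected layer, then one fully connected head per task.
    All layer sizes are illustrative only."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                                   nn.ReLU(), nn.MaxPool2d(2))
        self.conv2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1),
                                   nn.ReLU(), nn.MaxPool2d(2))
        # feature-fusion FC over the concatenated intermediate features
        self.fusion = nn.Linear(16 * 16 * 16 + 32 * 8 * 8, 128)
        # one fully connected head per task
        self.detect = nn.Linear(128, 2)      # face / non-face
        self.landmarks = nn.Linear(128, 42)  # 21 points x (x, y)
        self.visibility = nn.Linear(128, 21)
        self.gender = nn.Linear(128, 2)

    def forward(self, x):                    # x: (B, 3, 32, 32)
        f1 = self.conv1(x)                   # first operation result 1
        f2 = self.conv2(f1)                  # first operation result 2
        fused = torch.relu(self.fusion(
            torch.cat([f1.flatten(1), f2.flatten(1)], dim=1)))
        return (self.detect(fused), self.landmarks(fused),
                self.visibility(fused), self.gender(fused))
```

The key design point mirrored here is that each task head sees the same fused representation, so gradients from all four tasks update the shared layers.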
3. The training method according to claim 2, characterized in that after the step of outputting corresponding first operation results respectively from multiple convolutional layers and/or multiple pooling layers among the several convolutional layers and pooling layers, the method further comprises:
inputting at least part of the first operation results respectively into corresponding sub-convolutional layers, and outputting corresponding second operation results having the same dimension.
4. The training method according to claim 3, characterized in that after the step of inputting at least part of the first operation results respectively into the corresponding sub-convolutional layers and outputting the corresponding second operation results having the same dimension, the method further comprises:
inputting the second operation results having the same dimension into a full convolutional layer, and outputting a third operation result after dimensionality reduction.
5. The training method according to claim 2, characterized in that the step of outputting corresponding first operation results respectively from multiple convolutional layers and/or multiple pooling layers among the several convolutional layers and pooling layers comprises:
outputting a first pooling operation result, a first convolution operation result and a second pooling operation result respectively from the first pooling layer, the third convolutional layer and the fifth pooling layer among the several convolutional layers and pooling layers, and taking the first pooling operation result, the first convolution operation result and the second pooling operation result as the first operation results.
6. The training method according to claim 1, characterized in that before the step of inputting the training set into the multi-task learning depth network to perform multi-task learning, the method further comprises:
performing training of the face detection task using an AlexNet network to obtain weights corresponding to the face detection task;
initializing the multi-task learning depth network with the weights.
7. The training method according to claim 1, characterized in that before the step of inputting the training set into the multi-task learning depth network to perform multi-task learning, the method further comprises:
calculating predicted face regions of the images in the training set;
and the inputting the training set into the multi-task learning depth network to perform multi-task learning comprises:
inputting the training set into the multi-task learning depth network, and comparing the predicted face regions with the labeled face regions marked on the images in the training set to obtain a comparison result;
selecting, according to the comparison result, the predicted face regions that satisfy a preset condition as detected face regions;
performing multi-task learning on the detected face regions.
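The "preset condition" in claim 7 is typically an overlap test between the predicted and labeled regions; a minimal sketch using intersection-over-union is shown below (the IoU criterion is an assumption, consistent with the 0.5 overlap threshold mentioned in the description):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, h, w),
    with (x, y) the top-left corner."""
    ax, ay, ah, aw = box_a
    bx, by, bh, bw = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = ah * aw + bh * bw - inter
    return inter / union if union > 0 else 0.0

def select_face_regions(predicted, labeled, threshold=0.5):
    """Keep predicted regions whose overlap with the labeled region
    exceeds the preset threshold (claim 7's 'preset condition')."""
    return [p for p in predicted if iou(p, labeled) > threshold]
```

Identical boxes give an IoU of 1.0; disjoint boxes give 0.0 and are filtered out.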
8. The training method according to claim 1, characterized in that the loss function of the face detection task is a softmax function; the loss functions of the landmark localization task and the landmark visibility prediction task are Euclidean functions; and the loss function of the gender recognition task is a cross-entropy loss function.
9. A training device of a multi-task learning depth network, characterized by comprising a memory and a processor connected to each other;
the memory stores a training set, a constructed multi-task learning depth network and program data;
the processor is configured to execute, according to the program data, the training method of any one of claims 1-8, and to train the multi-task learning depth network using the training set.
10. A storage medium, characterized in that program data is stored therein, the program data being executable to implement the training method of the multi-task learning depth network of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810615755.0A CN109086660A (en) | 2018-06-14 | 2018-06-14 | Training method, equipment and the storage medium of multi-task learning depth network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109086660A (en) | 2018-12-25 |
Family
ID=64839640
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810615755.0A Pending CN109086660A (en) | 2018-06-14 | 2018-06-14 | Training method, equipment and the storage medium of multi-task learning depth network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109086660A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017015390A1 (en) * | 2015-07-20 | 2017-01-26 | University Of Maryland, College Park | Deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition |
CN106503669A (en) * | 2016-11-02 | 2017-03-15 | 重庆中科云丛科技有限公司 | A kind of based on the training of multitask deep learning network, recognition methods and system |
CN107194346A (en) * | 2017-05-19 | 2017-09-22 | 福建师范大学 | A kind of fatigue drive of car Forecasting Methodology |
Non-Patent Citations (1)
Title |
---|
RAJEEV RANJAN et al.: "HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition", arXiv preprint: https://arxiv.org/abs/1603.01249v3 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110110729A (en) * | 2019-03-20 | 2019-08-09 | 中国地质大学(武汉) | Construction example mask extracting method based on U-shaped CNN model realization remote sensing images |
CN110070047A (en) * | 2019-04-23 | 2019-07-30 | 杭州智趣智能信息技术有限公司 | A kind of face control methods, system and electronic equipment and storage medium |
CN110070047B (en) * | 2019-04-23 | 2021-03-26 | 杭州智趣智能信息技术有限公司 | Face comparison method and system, electronic equipment and storage medium |
CN110334735B (en) * | 2019-05-31 | 2022-07-08 | 北京奇艺世纪科技有限公司 | Multitask network generation method and device, computer equipment and storage medium |
CN110334735A (en) * | 2019-05-31 | 2019-10-15 | 北京奇艺世纪科技有限公司 | Multitask network generation method, device, computer equipment and storage medium |
CN110188780A (en) * | 2019-06-03 | 2019-08-30 | 电子科技大学中山学院 | Method and device for constructing deep learning model for positioning multi-target feature points |
CN110348416A (en) * | 2019-07-17 | 2019-10-18 | 北方工业大学 | Multi-task face recognition method based on multi-scale feature fusion convolutional neural network |
CN111027428A (en) * | 2019-11-29 | 2020-04-17 | 北京奇艺世纪科技有限公司 | Training method and device of multi-task model and electronic equipment |
CN111027428B (en) * | 2019-11-29 | 2024-03-08 | 北京奇艺世纪科技有限公司 | Training method and device for multitasking model and electronic equipment |
CN111292801A (en) * | 2020-01-21 | 2020-06-16 | 西湖大学 | Method for evaluating thyroid nodule by combining protein mass spectrum with deep learning |
CN112488003A (en) * | 2020-12-03 | 2021-03-12 | 深圳市捷顺科技实业股份有限公司 | Face detection method, model creation method, device, equipment and medium |
CN113239885A (en) * | 2021-06-04 | 2021-08-10 | 新大陆数字技术股份有限公司 | Face detection and recognition method and system |
CN113591573A (en) * | 2021-06-28 | 2021-11-02 | 北京百度网讯科技有限公司 | Training and target detection method and device for multi-task learning deep network model |
CN117079337A (en) * | 2023-10-17 | 2023-11-17 | 成都信息工程大学 | High-precision face attribute feature recognition device and method |
CN117079337B (en) * | 2023-10-17 | 2024-02-06 | 成都信息工程大学 | High-precision face attribute feature recognition device and method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109086660A (en) | Training method, equipment and the storage medium of multi-task learning depth network | |
CN109033953A (en) | Training method, equipment and the storage medium of multi-task learning depth network | |
CN109101869A (en) | Test method, equipment and the storage medium of multi-task learning depth network | |
CN110532920A (en) | Smallest number data set face identification method based on FaceNet method | |
CN108564049A (en) | A kind of fast face detection recognition method based on deep learning | |
CN101739712B (en) | Video-based 3D human face expression cartoon driving method | |
CN103258204B (en) | A kind of automatic micro-expression recognition method based on Gabor and EOH feature | |
US11531876B2 (en) | Deep learning for characterizing unseen categories | |
Cevikalp et al. | Efficient object detection using cascades of nearest convex model classifiers | |
CN104680144B (en) | Based on the lip reading recognition methods and device for projecting very fast learning machine | |
CN102136024B (en) | Biometric feature identification performance assessment and diagnosis optimizing system | |
CN107463920A (en) | A kind of face identification method for eliminating partial occlusion thing and influenceing | |
CN102136075B (en) | Multiple-viewing-angle human face detecting method and device thereof under complex scene | |
CN107273845A (en) | A kind of facial expression recognizing method based on confidence region and multiple features Weighted Fusion | |
CN105184260B (en) | A kind of image characteristic extracting method and pedestrian detection method and device | |
CN106096557A (en) | A kind of semi-supervised learning facial expression recognizing method based on fuzzy training sample | |
CN108647654A (en) | The gesture video image identification system and method for view-based access control model | |
CN106651915B (en) | The method for tracking target of multi-scale expression based on convolutional neural networks | |
CN109359608A (en) | A kind of face identification method based on deep learning model | |
CN101739555A (en) | Method and system for detecting false face, and method and system for training false face model | |
CN107871107A (en) | Face authentication method and device | |
CN110321862B (en) | Pedestrian re-identification method based on compact ternary loss | |
CN112784763A (en) | Expression recognition method and system based on local and overall feature adaptive fusion | |
CN108921011A (en) | A kind of dynamic hand gesture recognition system and method based on hidden Markov model | |
CN109977887A (en) | A kind of face identification method of anti-age interference |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20181225 |