CN105469041A

CN105469041A - Facial point detection system based on multi-task regularization and a layer-by-layer supervised neural network

Info

Publication number: CN105469041A
Application number: CN201510807796.6A
Authority: CN
Inventors: 熊红凯; 倪赛杰
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2015-11-19
Filing date: 2015-11-19
Publication date: 2016-04-06
Anticipated expiration: 2035-11-19
Also published as: CN105469041B

Abstract

The invention discloses a facial point detection system based on multi-task regularization and a layer-by-layer supervised neural network. The system comprises a multi-task regularization module and a layer-by-layer supervised network module. The multi-task regularization module includes a main task and related tasks. The main task and the related tasks are learned jointly to obtain a common feature space, and the auxiliary labels of the related tasks then provide an additional regularization term that enhances the generalization ability of the network. Unlike a traditional convolutional neural network, which optimizes only the objective function of the output layer, the layer-by-layer supervised network module introduces a supervision objective function at every intermediate layer, thereby enhancing the saliency of the features learned by the intermediate layers. The system can therefore effectively address the overfitting and the uncertain feature robustness of traditional convolutional neural networks.

Description

Facial point detection system based on multi-task regularization and a layer-by-layer supervised neural network

Technical field

The present invention relates to a facial point detection method in the field of computer vision, and specifically to a facial point detection system based on multi-task regularization and a layer-by-layer supervised neural network.

Background

In the field of computer vision, detecting facial points such as the eyes, nose and mouth is a very basic and important problem, and it underlies subsequent face recognition, face tracking and 3D face modeling. Although a great deal of research has been devoted to it, facial point detection in unconstrained environments remains challenging because of head pose variation and partial occlusion of faces in images.

Existing facial point detection methods fall mainly into two classes: template-fitting methods and regression-based methods. Regression-based methods first extract features from the input image and then map the learned features to the space of facial feature points. Convolutional neural networks take the original image as input and use multiple linear filters to compute high-level feature representations automatically; they have achieved remarkable results in practical feature extraction applications.

In "Deep convolutional network cascade for facial point detection", published by Y. Sun et al. at the 2013 IEEE Conference on Computer Vision and Pattern Recognition (IEEE CVPR), a facial point detection method based on a cascade of multiple convolutional neural networks is proposed. The face is divided into several parts in advance, and a separate convolutional neural network performs coarse-to-fine feature point detection on each part. However, the cascade multiplies the number of network parameters, which makes training difficult and incurs a large computational cost.

In "Facial landmark detection by deep multi-task learning", published by Z. Zhang et al. at the 2014 European Conference on Computer Vision (ECCV), a multi-task learning method is proposed. It builds the convolutional neural network model by exploiting the correlation between other facial attributes and the feature points, so as to improve the main task, facial point detection. This method reduces model complexity, but it does not consider the actual relationship between the main task and the related tasks.

Summary of the invention

To overcome the defects of the prior art, the present invention provides a facial point detection system based on multi-task regularization and a layer-by-layer supervised neural network, which can effectively solve the overfitting and uncertain feature robustness of traditional convolutional neural networks.

The present invention is achieved by the following technical solutions:

The facial point detection system based on multi-task regularization and a layer-by-layer supervised neural network of the present invention comprises two parts, a multi-task regularization module and a layer-by-layer supervised network module, wherein:

The layer-by-layer supervised network module extracts features from the input image based on its pixel values. Unlike a traditional convolutional neural network, which optimizes only the objective function of the output layer, this module introduces a supervision objective function at every intermediate layer, thereby enhancing the saliency of the features learned by the intermediate layers. The output features are then fed into the multi-task regularization module and the signal is back-propagated; this is repeated until the network converges;

The multi-task regularization module comprises a main task and related tasks. The main task and the related tasks jointly learn the parameters of the layer-by-layer supervised network module to obtain a feature space shared by all tasks; the auxiliary labels of the related tasks then provide an additional regularization term that enhances the generalization ability of the network. Finally, the module outputs the predicted coordinate values of the main task.

Preferably, the multi-task regularization module comprises a main task submodule and a related task submodule, wherein:

The main task submodule detects 5 feature points in the input face image, namely the left eye, the right eye, the nose, the left mouth corner and the right mouth corner, and outputs the predicted coordinate values of each point as the final output.

The related task submodule performs pose estimation, smile detection, glasses detection and gender prediction on the input face image, predicting the label value of each classification task in order to improve the prediction accuracy of the main task.
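To make the split between the main task and the related tasks concrete, the following Python sketch shows one way an annotated training sample could be organized. It is a minimal illustration under stated assumptions, not the data format of the invention: the class `FaceSample`, its field names and the integer class encodings for pose, smile, glasses and gender are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Order of the five main-task landmarks, as listed in the text.
LANDMARKS = ("left_eye", "right_eye", "nose", "left_mouth_corner", "right_mouth_corner")


@dataclass
class FaceSample:
    """One hypothetical training sample: image, main-task and related-task labels."""

    pixels: List[List[float]]          # grayscale image as rows of pixel values
    points: List[Tuple[float, float]]  # main task: (x, y) per landmark, in LANDMARKS order
    pose: int                          # related task: head-pose class index (assumed encoding)
    smiling: int                       # related task: 1 if smiling, else 0
    glasses: int                       # related task: 1 if wearing glasses, else 0
    gender: int                        # related task: gender class index (assumed encoding)

    def main_target(self) -> List[float]:
        """Flatten the 5 landmark coordinates into a 10-value regression target."""
        if len(self.points) != len(LANDMARKS):
            raise ValueError("expected exactly 5 landmarks")
        return [coord for point in self.points for coord in point]

    def related_targets(self) -> List[int]:
        """Class labels of the 4 related tasks, in a fixed order."""
        return [self.pose, self.smiling, self.glasses, self.gender]
```

Keeping the main-task target as a flat vector of 10 numbers matches a regression output layer, while each related task keeps a single class index for its classification loss.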

More preferably, the main purpose of the multi-task regularization module is to produce the objective function to be optimized, namely the difference between the predicted values and the true values; this objective function is minimized so that the predicted values approach the true values as closely as possible.

More preferably, the optimization objective function of the multi-task regularization module is a linear combination of the main task loss function and the related task loss functions.

More preferably, the main task loss function and the related task loss functions are expressed by a squared-error regression function and a cross-entropy function, respectively.
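The linear combination of a squared-error main-task loss and cross-entropy related-task losses can be sketched in a few lines of Python. This is an illustrative, framework-free version assuming the network has already produced predicted coordinates for the main task and class probabilities for each related task; the function names and the example weights are hypothetical, since the text does not give values for the task weights.

```python
import math
from typing import Sequence, Tuple


def squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Main-task loss: squared Euclidean distance between predicted and true coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Related-task loss: negative log of the probability assigned to the true class."""
    return -math.log(probs[label])


def multitask_objective(
    pred_points: Sequence[float],
    true_points: Sequence[float],
    related: Sequence[Tuple[Sequence[float], int]],
    lambdas: Sequence[float],
) -> float:
    """Main-task loss plus the lambda^a-weighted loss of each related task a."""
    loss = squared_error(pred_points, true_points)
    for (probs, label), lam in zip(related, lambdas):
        loss += lam * cross_entropy(probs, label)
    return loss
```

For example, with one related task whose true class receives probability 0.5 and a weight of 1.0, the related-task term adds ln 2 to the main-task squared error. The weights control how strongly the auxiliary labels regularize the shared features.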

Preferably, the layer-by-layer supervised network module adds a regression supervision function after each intermediate convolutional layer, and the signal is back-propagated together with the objective function to be optimized in the multi-task regularization module.

Preferably, in the layer-by-layer supervised network module, the regression supervision function is the squared difference between the coordinate values output by that convolutional layer and the true coordinate values.
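The text does not specify how an intermediate convolutional layer is turned into coordinate values. The sketch below assumes, purely for illustration, a linear (fully connected) regression head applied to the flattened feature map; `regression_head`, `companion_loss` and the weight layout are hypothetical names, and in practice the head's weights would be learned together with the network.

```python
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # indexed [channel][row][col]


def flatten(feature_map: FeatureMap) -> List[float]:
    """Flatten a feature map into a single feature vector."""
    return [v for channel in feature_map for row in channel for v in row]


def regression_head(features: Sequence[float], weights: Sequence[Sequence[float]],
                    bias: Sequence[float]) -> List[float]:
    """Hypothetical linear head: one output per coordinate (10 outputs for 5 points)."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]


def companion_loss(feature_map: FeatureMap, weights: Sequence[Sequence[float]],
                   bias: Sequence[float], true_coords: Sequence[float]) -> float:
    """Regression supervision after one intermediate conv layer: squared difference
    between the coordinates predicted from this layer and the true coordinates."""
    pred = regression_head(flatten(feature_map), weights, bias)
    return sum((p - t) ** 2 for p, t in zip(pred, true_coords))
```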

Preferably, the layer-by-layer supervised network module supervises only the main task and does not supervise the related tasks, so as to preserve the priority of the main task.

Preferably, in the layer-by-layer supervised network module, the back-propagation of the layer-by-layer supervised neural network alleviates the vanishing gradient problem of traditional convolutional neural networks.

Compared with the prior art, the present invention has the following beneficial effects:

The above technical solution of the present invention proposes an improved method that addresses the problems of traditional convolutional neural networks. The present invention adds a supervision term to every layer of a traditional convolutional neural network, which enhances the transparency of the features and alleviates the vanishing gradient problem. The four related tasks of the present invention (pose detection, smile detection, glasses detection and gender prediction) share a feature space with the main task, facial point detection, which improves the accuracy of the main task and also enhances the overall generalization ability of the network.

Description of the drawings

Other features, objects and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the drawings:

Fig. 1 is a structural block diagram of an embodiment of the system of the present invention;

Fig. 2 is a schematic diagram of the layer-by-layer supervised network in the method of the present invention.

Embodiment

The present invention is described in detail below in conjunction with specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit it in any form. It should be noted that those skilled in the art can make several variations and improvements without departing from the concept of the present invention; these all fall within the protection scope of the present invention.

To address the problems of traditional convolutional neural networks, the present invention proposes a facial point detection system based on multi-task regularization and a layer-by-layer supervised neural network. In the multi-task regularization part, the system tackles the overfitting problem of traditional convolutional neural networks by exploiting the related-task labels to learn a common feature representation for the high-level recognition task. In the layer-by-layer supervised learning part, the system tackles the vanishing gradients of traditional convolutional neural networks and the insufficient saliency of the learned features by adding a supervision layer at each intermediate layer of the network, which strengthens the gradient signal back-propagated from the output layer. The present invention applies this system to facial point detection, which validates the effectiveness of multi-task regularization and of the layer-by-layer supervised neural network.

As shown in Fig. 1, a structural block diagram of an embodiment of the system of the present invention, the system comprises a multi-task regularization module and a layer-by-layer supervised network module.

In this embodiment, the main task and several related tasks in the multi-task regularization module jointly learn a shared feature space, and the auxiliary labels of the related tasks then provide an additional regularization term that enhances the generalization ability of the network.

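In this embodiment, the optimization objective function is a linear combination of the main task loss function and the related task loss functions:

$$\min_{w}\; \ell^{r}\bigl(y^{r}, f(x^{r}; w)\bigr) + \sum_{a=1}^{T-1} \lambda^{a}\, \ell^{a}\bigl(y^{a}, f(x^{a}; w)\bigr) \qquad (1)$$

where $\lambda^{a}$ is the weight of the a-th related task, $\ell^{r}$ is the loss function of the main task (denoted by the superscript r), $\ell^{a}$ is the loss function of the a-th related task, T is the total number of tasks, and w denotes the parameters of each layer of the neural network to be learned.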

The main task in this embodiment is the detection of 5 facial coordinate points; the related tasks are pose detection, smile detection, glasses detection and gender prediction.

Consider a set of training samples $\{(x_i^{t}, y_i^{t})\}$, $i = 1, \dots, N$, $t = 1, \dots, T$, where N and T are the total numbers of samples and tasks, $x_i^{t}$ is the raw input of the t-th task and $y_i^{t}$ is the corresponding ground-truth label; the superscript r denotes the main task. Detecting the left eye, right eye, nose, left mouth corner and right mouth corner is a regression task, so the targets are the coordinate values of these points. The main task loss is the squared error $\ell^{r} = \|y^{r} - f(x^{r}; w)\|^{2}$, where $f(x; w)$ is the vector of predicted coordinates of the 5 points and $\|\cdot\|^{2}$ is the squared difference. The 4 related tasks use the cross-entropy loss $\ell^{a} = -y^{a} \log p(y^{a} \mid x^{a})$, where the posterior probability $p(y^{a} \mid x^{a})$ is modeled with the softmax function.
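The softmax posterior and the cross-entropy loss of a related task can be illustrated as follows. This is a minimal pure-Python sketch assuming each related task outputs one raw score (logit) per class and has a single integer ground-truth class, so the one-hot cross-entropy reduces to the negative log-probability of that class.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Posterior p(y^a | x^a) over the classes of one related task."""
    m = max(logits)  # shift by the largest score for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def related_task_loss(logits: List[float], label: int) -> float:
    """Cross-entropy with a one-hot target: -log p(y^a = label | x^a)."""
    return -math.log(softmax(logits)[label])
```

Subtracting the largest logit before exponentiating leaves the softmax output unchanged but avoids overflow for large scores.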

Therefore, corresponding to formula (1), the final optimization objective function is:

$$\min_{w}\; \|y^{r} - f(x^{r}; w)\|^{2} + \sum_{a=1}^{T-1} \lambda^{a} \bigl(-y^{a} \log p(y^{a} \mid x^{a})\bigr) \qquad (2)$$

In this embodiment, the layer-by-layer supervised network module differs from a traditional convolutional neural network, which optimizes only the objective function of the output layer: as shown in Fig. 2, it introduces a supervision objective function at every intermediate layer, thereby enhancing the saliency of the features learned by the intermediate layers. A convolutional neural network alternates K convolutional layers and pooling layers to extract hierarchical features, which can be expressed by the following recursion:

$$Z_k = \mathrm{pool}\bigl(Z_{k-1} * W_k + b_k\bigr) \qquad (3)$$

where $Z_k$ is the feature map of the k-th convolutional layer, $Z_{k-1}$ is the feature map of the (k-1)-th convolutional layer, $W_k$ are the filter weights to be learned, $b_k$ is the bias term, and $*$ denotes convolution.
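A single step of recursion (3) can be written out directly. The sketch below is a single-channel, pure-Python illustration under assumptions not stated in the text: the filter is applied as "valid" cross-correlation (what most CNN libraries compute under the name convolution), pooling is non-overlapping 2×2 max pooling, and no nonlinearity is added because formula (3) shows none.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, w: Matrix, b: float) -> Matrix:
    """'Valid' 2-D filtering (cross-correlation) of x with kernel w, plus bias b."""
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(x[i + di][j + dj] * w[di][dj] for di in range(kh) for dj in range(kw)) + b
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2x2(x: Matrix) -> Matrix:
    """Non-overlapping 2x2 max pooling; an odd trailing row/column is dropped."""
    return [
        [max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
         for j in range(0, len(x[0]) - 1, 2)]
        for i in range(0, len(x) - 1, 2)
    ]


def layer_forward(z_prev: Matrix, w_k: Matrix, b_k: float) -> Matrix:
    """One step of formula (3): Z_k = pool(Z_{k-1} * W_k + b_k), single channel."""
    return max_pool2x2(conv2d_valid(z_prev, w_k, b_k))
```

Stacking `layer_forward` K times gives the hierarchical features; a real network would use many filters per layer and a vectorized library.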

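The present invention adopts deep supervision: a regression supervision term is added after the response of each intermediate convolutional layer so that formula (2) is solved more accurately. The overall objective takes the form

$$\min_{w,\,w_k}\; \mathcal{P}(w) + \sum_{k=1}^{K} \alpha_k\, \mathcal{Q}_k(w_k)$$

where $\mathcal{P}(w)$ is the objective function of the final output layer, namely formula (2), and $\mathcal{Q}_k(w_k)$ is the companion supervision objective attached to the output of the k-th layer. Therefore:

$$\min_{w,\,w_k}\; \|y^{r} - f(x^{r}; w)\|^{2} + \sum_{a=1}^{T-1} \lambda^{a} \bigl(-y^{a} \log p(y^{a} \mid x^{a})\bigr) + \sum_{k=1}^{K} \alpha_k \|y^{r} - f_k(x^{r}; w_k)\|^{2} \qquad (4)$$

where $w$ and $w_k$ denote the filter parameters of the final layer and of the k-th intermediate layer respectively, $f_k(x^{r}; w_k)$ is the coordinate prediction made from the output of the k-th convolutional layer, K is the total number of convolutional layers, and $\alpha_k$ is the weight of the regression supervision function of the k-th convolutional layer. Note that, to preserve the priority of the main task, the supervision terms are applied only to the main task.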
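Formula (4) can be evaluated for a single sample as in the following sketch. It assumes the network has already produced the final-layer coordinates, the related-task probabilities and one coordinate prediction per supervised convolutional layer; the function and argument names are hypothetical. As in the text, the companion terms compare only main-task coordinates, so the related tasks receive no layer-wise supervision.

```python
import math
from typing import Sequence, Tuple


def sq_err(pred: Sequence[float], target: Sequence[float]) -> float:
    """Squared difference between two coordinate vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def deeply_supervised_objective(
    y_main: Sequence[float],
    final_pred: Sequence[float],
    related: Sequence[Tuple[Sequence[float], int]],
    lambdas: Sequence[float],
    layer_preds: Sequence[Sequence[float]],
    alphas: Sequence[float],
) -> float:
    """Single-sample value of formula (4).

    y_main      : true landmark coordinates y^r
    final_pred  : f(x; w), coordinates from the output layer
    related     : (class probabilities, true class) for each related task
    lambdas     : related-task weights lambda^a
    layer_preds : f_k(x; w_k), coordinates predicted after each conv layer k
    alphas      : supervision weights alpha_k
    """
    if len(layer_preds) != len(alphas):
        raise ValueError("need one alpha_k per supervised convolutional layer")
    # Output-layer objective, formula (2): main-task squared error + weighted cross-entropy.
    loss = sq_err(final_pred, y_main)
    loss += sum(lam * -math.log(probs[label]) for (probs, label), lam in zip(related, lambdas))
    # Companion terms: main task only, one per supervised convolutional layer.
    loss += sum(a * sq_err(pred, y_main) for pred, a in zip(layer_preds, alphas))
    return loss
```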

The final problem to be optimized, formula (4), i.e. the output function of the multi-task regularization module, is solved by stochastic gradient descent: a shared feature representation is first learned in a forward pass, the signal is then back-propagated to refine this representation, and these two steps are repeated until the network converges.
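The forward-then-backward loop of stochastic gradient descent is shown below on a deliberately tiny stand-in model. This is an illustrative sketch only: a one-dimensional linear model $y \approx wx + b$ with hand-derived gradients replaces the convolutional network, and the learning rate, tolerance and convergence test (stop when the epoch loss stops changing) are assumptions, not values from the text.

```python
import random
from typing import List, Tuple


def sgd_fit(data: List[Tuple[float, float]], lr: float = 0.1, tol: float = 1e-10,
            max_epochs: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Fit y ~ w*x + b by per-sample stochastic gradient descent on the squared error."""
    rng = random.Random(seed)
    data = list(data)  # copy so shuffling does not mutate the caller's list
    w, b = 0.0, 0.0
    prev_loss = float("inf")
    for _ in range(max_epochs):
        rng.shuffle(data)
        for x, y in data:
            pred = w * x + b              # forward pass
            grad = 2.0 * (pred - y)       # d/d(pred) of (pred - y)^2
            w -= lr * grad * x            # backward pass: chain rule, then update
            b -= lr * grad
        loss = sum((w * x + b - y) ** 2 for x, y in data) / len(data)
        if abs(prev_loss - loss) < tol:   # flat epoch loss treated as convergence
            break
        prev_loss = loss
    return w, b
```

In the actual system the forward pass would evaluate formula (4) and back-propagation would compute gradients for every layer, but the alternation of the two steps is the same.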

Implementation results

The system was implemented following the steps described in the summary of the invention. The training data consist of 10,000 images in total, taken from the LFW dataset and from the web; each image is annotated with 5 points: left eye, right eye, nose, left mouth corner and right mouth corner. All annotated values are normalized to [0, 1] according to the image size. The test data come from the AFLW, AFW and LFPW datasets. The present invention uses three convolutional layers with 5×5 filters, each followed by a pooling layer and a regression supervision layer; the fourth layer is a fully connected layer with 64 neurons, and the last layer is the multi-task layer containing the main task (facial point detection) and 4 related facial attributes. The system of this example was compared with a traditional convolutional neural network regression net, a layer-by-layer supervised network and a multi-task regularization network; the mean errors measured over the 5 points were 2.14% for the system of this example and 5.18%, 2.80% and 2.71% for the three compared networks, respectively. The experiments show that the proposed system based on multi-task learning and a layer-by-layer supervised neural network performs well on the facial point detection problem.
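The preprocessing and evaluation described above can be sketched as follows. Normalizing coordinates by image size follows the text; the error definition is an assumption, because the text reports mean errors as percentages without stating how they are computed. Here the error is taken as the mean Euclidean distance between predicted and true normalized points, multiplied by 100.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def normalize_points(points: List[Point], width: float, height: float) -> List[Point]:
    """Scale pixel coordinates into [0, 1] by the image width and height."""
    return [(x / width, y / height) for x, y in points]


def mean_error_percent(pred: List[Point], truth: List[Point]) -> float:
    """Mean Euclidean distance between predicted and true normalized points, in percent.
    (Assumed definition; the text does not state how its error is normalized.)"""
    dists = [math.hypot(px - tx, py - ty) for (px, py), (tx, ty) in zip(pred, truth)]
    return 100.0 * sum(dists) / len(dists)
```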

Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to these particular embodiments; those skilled in the art can make various variations or modifications within the scope of the claims without affecting the substance of the present invention.

Claims

1. A facial point detection system based on multi-task regularization and a layer-by-layer supervised neural network, characterized by comprising a multi-task regularization module and a layer-by-layer supervised network module, wherein:

2. The facial point detection system based on multi-task regularization and a layer-by-layer supervised neural network according to claim 1, characterized in that the multi-task regularization module comprises a main task submodule and a related task submodule, wherein:

the main task submodule detects 5 feature points in the input face image, namely the left eye, the right eye, the nose, the left mouth corner and the right mouth corner, and outputs the predicted coordinate values of each point as the final output;

3. The facial point detection system based on multi-task regularization and a layer-by-layer supervised neural network according to claim 2, characterized in that the main purpose of the multi-task regularization module is to produce the objective function to be optimized, namely the difference between the predicted values and the true values, and this objective function is minimized so that the predicted values approach the true values as closely as possible.

4. The facial point detection system based on multi-task regularization and a layer-by-layer supervised neural network according to claim 3, characterized in that the optimization objective function of the multi-task regularization module is a linear combination of the main task loss function and the related task loss functions.

5. The facial point detection system based on multi-task regularization and a layer-by-layer supervised neural network according to claim 4, characterized in that the main task loss function and the related task loss functions are expressed by a squared-error regression function and a cross-entropy function, respectively.

6. The facial point detection system based on multi-task regularization and a layer-by-layer supervised neural network according to any one of claims 1-5, characterized in that the layer-by-layer supervised network module adds a regression supervision function after each intermediate convolutional layer, and the signal is back-propagated together with the objective function to be optimized in the multi-task regularization module.

7. The facial point detection system based on multi-task regularization and a layer-by-layer supervised neural network according to claim 6, characterized in that, in the layer-by-layer supervised network module, the regression supervision function is the squared difference between the coordinate values output by the convolutional layer and the true coordinate values.

8. The facial point detection system based on multi-task regularization and a layer-by-layer supervised neural network according to claim 6, characterized in that, in the layer-by-layer supervised network module, the back-propagation of the regression supervision functions alleviates the vanishing gradient problem of traditional convolutional neural networks.

9. The facial point detection system based on multi-task regularization and a layer-by-layer supervised neural network according to any one of claims 1-5, characterized in that the layer-by-layer supervised network module supervises only the main task and does not supervise the related tasks, so as to preserve the priority of the main task.