CN106575367A - A method and a system for facial landmark detection based on multi-task - Google Patents

A method and a system for facial landmark detection based on multi-task

Info

Publication number
CN106575367A
Authority
CN
China
Prior art keywords
face
training
convolutional neural
neural networks
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201480081241.1A
Other languages
Chinese (zh)
Other versions
CN106575367B (en)
Inventor
汤晓鸥
张展鹏
罗平
吕健勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Publication of CN106575367A publication Critical patent/CN106575367A/en
Application granted granted Critical
Publication of CN106575367B publication Critical patent/CN106575367B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/165Detection; Localisation; Normalisation using facial parts and geometric relationships

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present application discloses a method and a system for detecting facial landmarks of a face image. The method may comprise: extracting multiple feature maps from at least one facial region of the face image and/or the whole face image; generating a shared facial feature vector from the extracted feature maps; and predicting facial landmark locations of the face image from the generated shared facial feature vector. With the present method and system, facial landmark detection can be optimized jointly with heterogeneous but subtly related tasks, so that detection robustness can be improved through multi-task learning.

Description

Method and system for multi-task-based facial landmark detection
Technical field
The present application relates to face alignment and, more particularly, to a method and a system for facial landmark detection.
Background technology
Facial landmark detection is a fundamental component of many face analysis tasks (e.g., facial attribute inference, face verification, and face recognition), but it has long been hampered by problems of severe occlusion and pose variation.
Accurate facial landmark detection can be performed using cascaded CNNs (convolutional neural networks), in which the face is pre-partitioned into different parts and each part is processed by a separate deep CNN. The resulting outputs are then averaged and passed to separate cascaded layers to process each facial landmark individually.
Moreover, facial landmark detection is not an isolated problem; its estimation can be influenced by a number of heterogeneous but subtly related factors. For example, when a child is smiling, his or her mouth opens widely. Effectively discovering and exploiting such an intrinsic correlation with facial attributes helps to detect the mouth corners more accurately. Likewise, for a face with large yaw rotation, the inter-ocular distance becomes smaller. Such pose information can serve as an additional source of information to constrain the solution space of landmark estimation. Given a rich and reasonable set of related tasks, treating facial landmark detection in isolation can therefore be counterproductive.
However, different tasks are inherently different in learning difficulty and have different convergence rates. Moreover, some tasks may over-fit earlier than others, thereby harming the convergence of the whole model when they are learned simultaneously.
Summary of the invention
In one aspect of the present application, a method for detecting facial landmarks of a face image is disclosed. The method may comprise: extracting multiple feature maps from at least one facial region of the face image; generating a shared facial feature vector from the extracted feature maps; and predicting facial landmark locations of the face image from the generated shared facial feature vector.
In another aspect of the present application, a system for detecting facial landmarks of a face image is disclosed. The system may comprise a feature extractor and a predictor. The feature extractor may extract multiple feature maps from at least one facial region of the face image and generate a shared facial feature vector from the extracted feature maps. The predictor may predict facial landmark locations of the face image from the shared facial feature vector generated by the feature extractor.
The present application also relates to a method for simultaneously training a convolutional neural network for performing facial landmark detection together with at least one associated auxiliary task. The method may comprise: 1) sampling, from a predetermined training set, a training face image, its ground-truth landmark locations, and its ground-truth target for each auxiliary task; 2) comparing the predicted facial landmark locations with the ground-truth landmark locations to generate a landmark error; 3) comparing, for each auxiliary task, the target prediction with the corresponding ground-truth target to generate at least one training-task error; 4) back-propagating the generated landmark error and all training-task errors through the convolutional neural network to adjust the weights of the connections between the neurons of the convolutional neural network; 5) sampling, from a predetermined validation set, a validation face image and its ground-truth target for each auxiliary task; 6) comparing the target prediction with the ground-truth target to generate a validation-task error; and 7) determining whether the generated training-task error is less than a first predetermined value and whether the generated validation-task error is less than a second predetermined value. If so, the training of the convolutional neural network is terminated; otherwise, steps 1) to 7) are repeated.
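As a concrete illustration of steps 1) to 7), the following sketch runs the same loop on a toy linear stand-in for the convolutional network: least squares for the landmark error and cross entropy for a single binary auxiliary task. The data, dimensions, learning rate, and stopping thresholds are all invented for the example and are not part of the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 16-D shared features, 10-D landmark targets (5 points),
# and one binary auxiliary task label derived from the first feature.
d = 16
W_true = rng.standard_normal((d, 10))

def sample(n):
    X = rng.standard_normal((n, d))
    return X, X @ W_true, (X[:, 0] > 0).astype(int)

Xtr, Ytr, Atr = sample(200)   # training set
Xv, Yv, Av = sample(50)       # validation set

W_r = np.zeros((d, 10))       # landmark regressor
W_a = np.zeros((d, 2))        # auxiliary-task classifier
eta = 0.01
for step in range(2000):
    i = rng.integers(len(Xtr))                    # 1) sample a training image
    x, y_r, y_a = Xtr[i], Ytr[i], Atr[i]
    resid = y_r - W_r.T @ x                       # 2) landmark error
    logits = W_a.T @ x
    p = np.exp(logits - logits.max()); p /= p.sum()
    # 3)-4) training-task error, "back-propagated" here as gradient steps
    W_r += eta * np.outer(x, resid)
    grad = p.copy(); grad[y_a] -= 1.0             # cross-entropy gradient
    W_a -= eta * np.outer(x, grad)
    # 5)-7) periodically check training and validation errors, stop when small
    if step % 200 == 199:
        tr_err = np.mean((Ytr - Xtr @ W_r) ** 2)
        val_acc = np.mean((Xv @ W_a).argmax(axis=1) == Av)
        if tr_err < 0.05 and val_acc > 0.9:
            break

val_mse = float(np.mean((Yv - Xv @ W_r) ** 2))
val_acc = float(np.mean((Xv @ W_a).argmax(axis=1) == Av))
print(val_mse < 0.2, val_acc > 0.8)
```

In the actual embodiments the gradient step is replaced by back-propagation through the CNN, but the control flow (sample, compare, propagate, validate, stop) is the same.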
The present invention further provides a computer-readable medium for storing instructions that are executable by one or more processors to perform the method described above.
Compared with conventional methods, facial landmark detection can be optimized jointly with heterogeneous but subtly related auxiliary tasks, so that detection robustness can be improved through multi-task learning, especially when handling faces with severe occlusion and pose variation.
According to the present application, only a single CNN is used, which reduces the complexity of the required system/device. Neither pre-partitioning of the face nor cascaded convolutional layers is needed, which greatly reduces model complexity while still achieving comparable or even better accuracy.
As training proceeds, some related tasks stop being beneficial to the main task once they reach their best performance, and their training can therefore be halted. According to the present application, the CNN is trained with an "early stopping" scheme that halts the learning of related tasks that begin to over-fit the training set and thereby harm the main task, so as to facilitate learning convergence.
Description of the drawings
Exemplary non-limiting embodiments of the present invention are described below with reference to the accompanying drawings. The drawings are illustrative and are generally not drawn to exact scale. The same or similar elements in different figures are referenced with identical reference numerals.
Fig. 1 is a schematic diagram illustrating a system for facial landmark detection according to some disclosed embodiments.
Fig. 2 is a schematic diagram illustrating the training unit shown in Fig. 1 according to some disclosed embodiments.
Fig. 3 is a schematic diagram illustrating an example of the system for facial landmark detection according to some disclosed embodiments, in which an example of the convolutional neural network is shown.
Fig. 4 is a schematic diagram illustrating the system for facial landmark detection according to some disclosed embodiments when implemented in software.
Fig. 5 is a schematic flowchart illustrating a method for facial landmark detection according to some disclosed embodiments.
Fig. 6 is an exemplary flowchart illustrating the training process of the multi-task convolutional neural network according to some disclosed embodiments.
Detailed description of the embodiments
Exemplary embodiments are described in detail in this section, and examples of these embodiments are illustrated in the accompanying drawings. Where appropriate, the same reference numerals refer to the same or similar parts throughout the drawings.
Fig. 1 is a schematic diagram illustrating an exemplary system 1000 for facial landmark detection according to some disclosed embodiments. In system 1000, facial landmark detection (hereinafter also referred to as the main task) is optimized jointly with at least one related/auxiliary task. Facial landmark detection refers to detecting the 2D positions, i.e., the 2D coordinates (x and y), of facial landmarks in a face image. Examples of facial landmarks may include, but are not limited to, the centers of the left and right eyes, the nose, and the left and right mouth corners of the face image. Examples of auxiliary tasks may include, but are not limited to, head pose estimation, demographic classification (e.g., gender classification), age estimation, facial expression recognition (e.g., smiling), or facial attribute inference (e.g., wearing glasses). It will be appreciated that the number or type of auxiliary tasks is not limited to those mentioned herein.
Referring again to Fig. 1, when implemented in hardware, system 1000 may include a feature extractor 100, a training unit 200, and a predictor 300. The feature extractor 100 may extract multiple feature maps from at least one facial region of the face image and/or from the whole face image. A shared facial feature vector may then be generated by the feature extractor 100 from the extracted feature maps.
The predictor 300 may predict the facial landmark locations of the face image from the shared facial feature vector extracted by the feature extractor 100. In addition, the predictor 300 may further predict, from the shared facial feature vector, the corresponding target of at least one auxiliary task associated with the facial landmark detection. In system 1000, facial landmark detection can thus be optimized jointly with the auxiliary tasks.
According to an embodiment, the feature extractor 100 may include a convolutional neural network. The network may include multiple convolution-pooling layers and a fully connected layer. In the network, each convolution-pooling layer may perform convolution and max-pooling operations, and the feature maps extracted by a preceding convolution-pooling layer are fed into the next convolution-pooling layer to extract feature maps different from the previously extracted ones. The fully connected layer may generate the shared facial feature vector from all of the extracted feature maps.
An example of the network is shown in Fig. 3, in which the convolutional neural network includes an input layer, multiple (for example, three) convolution-pooling layers, a convolutional layer, and a fully connected layer, where each convolution-pooling layer includes one or more convolutional layers and one or more pooling layers. It should be noted that the network shown is only an example, and the convolutional neural network in the feature extractor is not limited thereto. As shown in Fig. 3, a (for example) 40×40 gray-scale face image is fed into the input layer. The first convolution-pooling layer extracts feature maps from the input image. The second convolution-pooling layer then takes the output of the first layer as its input to generate different feature maps. This process continues through all three convolution-pooling layers. Finally, the multiple layers of feature maps are used by the fully connected layer to generate the shared facial feature vector. In other words, the shared facial feature vector is generated by performing multiple convolution and max-pooling operations. Each layer contains multiple neurons with local or global receptive fields, and the weights of the connections between the neurons of the convolutional neural network can be adjusted to train the network accordingly.
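For orientation, the forward pass of such a stack (convolution-pooling layers followed by a fully connected layer) can be sketched in plain NumPy. The kernel sizes, filter counts, tanh activation, and 100-dimensional feature size below are invented for the example and do not reflect the exact architecture of Fig. 3:

```python
import numpy as np

def conv2d(x, filters):
    """Valid 2-D convolution: x is (H, W, C_in), filters is (kh, kw, C_in, C_out)."""
    kh, kw, cin, cout = filters.shape
    H, W, _ = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1, cout))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw, :]
            out[i, j, :] = np.tensordot(patch, filters, axes=([0, 1, 2], [0, 1, 2]))
    return out

def max_pool(x, s=2):
    """Non-overlapping s x s max pooling."""
    H, W, C = x.shape
    H2, W2 = H // s, W // s
    return x[:H2 * s, :W2 * s, :].reshape(H2, s, W2, s, C).max(axis=(1, 3))

def forward(image, conv_filters, fc_weight):
    """Stack of convolution-pooling layers followed by a fully connected layer."""
    a = image
    for f in conv_filters:
        a = np.tanh(conv2d(a, f))     # nonlinearity after each convolution
        a = max_pool(a)               # max pooling after each convolution
    flat = a.reshape(-1)
    return np.tanh(fc_weight @ flat)  # shared facial feature vector

rng = np.random.default_rng(0)
img = rng.standard_normal((40, 40, 1))                # 40x40 gray-scale input
filters = [rng.standard_normal((5, 5, 1, 4)) * 0.1,   # sizes: 40 -> 36 -> 18
           rng.standard_normal((3, 3, 4, 8)) * 0.1,   #        18 -> 16 -> 8
           rng.standard_normal((3, 3, 8, 8)) * 0.1]   #         8 ->  6 -> 3
fc_W = rng.standard_normal((100, 3 * 3 * 8)) * 0.1
shared = forward(img, filters, fc_W)
print(shared.shape)  # (100,)
```

Every task later reads from this one `shared` vector, which is the mechanism by which the tasks constrain one another.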
According to an embodiment, system 1000 may further include a training unit 200. The training unit 200 may train the feature extractor with a predetermined training set so as to adjust the weights of the connections between the neurons of the convolutional neural network, such that the trained feature extractor can extract the shared facial feature vector. According to the embodiment of the present application shown in Fig. 2, the training unit 200 may include a sampler 201, a comparator 202, and a back-propagator 203.
As shown in Fig. 2, the sampler 201 may sample, from a predetermined training set, a training face image, its ground-truth landmark locations, and its ground-truth target for each auxiliary task. According to an embodiment, five ground-truth landmarks (i.e., the centers of the eyes, the nose, and the mouth corners) may be annotated directly on each training face image. According to another embodiment, the ground-truth target for each auxiliary task may be labeled by hand. For example, for gender classification, the ground-truth target may be labeled as female (F) or male (M). For facial attribute inference, e.g., wearing glasses, the ground-truth target may be labeled as wearing (Y) or not wearing (N). For head pose estimation, one of (0°, ±30°, ±60°) may be labeled, and for expression recognition, e.g., smiling, yes/no may be labeled accordingly.
The comparator 202 may compare the predicted facial landmark locations with the ground-truth landmark locations to generate a landmark error. The landmark error may be obtained by, for example, a least-squares method. The comparator 202 may also compare, for each auxiliary task, the target prediction with the corresponding ground-truth target, to generate at least one training-task error. According to another embodiment, the training-task error may be obtained by, for example, a cross-entropy method.
The back-propagator 203 may back-propagate the generated landmark error and all training-task errors through the convolutional neural network, so as to adjust the weights of the connections between the neurons of the convolutional neural network.
According to an embodiment, the training unit 200 may further include a determiner 204. The determiner 204 may determine whether the training process of the facial landmark detection has converged. According to another embodiment, the determiner 204 may further determine whether the training process of each task has converged, as will be discussed later.
Hereinafter, the components of the training unit 200 mentioned above will be discussed in detail. For purposes of illustration, an embodiment in which T tasks are trained jointly by the training unit 200 will be described. Among the T tasks, facial landmark detection (i.e., the main task) is denoted as r, and each of the at least one related/auxiliary task is denoted as a, where a ∈ A.
For each task, the training data is denoted as (x_i, y_i^t), with t = {1, ..., T} and i = {1, ..., N}, where N denotes the number of training samples. Specifically, for the facial landmark detection r, the training data is denoted as (x_i, y_i^r), where y_i^r is the vector of 2D coordinates of the five landmarks. For a task a, the training data is denoted as (x_i, y_i^a). In an embodiment, four auxiliary tasks p, g, w, and s are illustrated, which denote the inference of 'pose', 'gender', 'wearing glasses', and 'smiling', respectively. Accordingly, y_i^p takes one of five different poses (0°, ±30°, ±60°), while y_i^g, y_i^w, and y_i^s are binary attributes denoting female/male, not wearing/wearing glasses, and not smiling/smiling, respectively. Different weights are assigned to the main task r and to each auxiliary task a, denoted as W^r and {W^a}.
The objective function over all tasks is then formulated as follows, so as to optimize the main task r and the auxiliary tasks a jointly:
arg min_{W^r, {W^a}} Σ_{i=1}^{N} l^r(y_i^r, f(x_i; W^r)) + Σ_{a∈A} λ^a Σ_{i=1}^{N} l^a(y_i^a, f(x_i; W^a))    (1)
where f(x_i; W^t) is a linear function of x_i with weight vector W^t;
l(·) denotes a loss function;
λ^a denotes the importance coefficient of the error of the a-th task; and
x_i denotes the shared facial feature vector.
According to an embodiment, the least-squares function and the cross-entropy function are used as the loss functions l(·) of the main task r and of each auxiliary task a, respectively, to generate the corresponding landmark error and training-task errors. The objective function above can therefore be rewritten as follows:
arg min_{W^r, {W^a}} Σ_{i=1}^{N} ||y_i^r − (W^r)^T x_i||^2 − Σ_{a∈A} λ^a Σ_{i=1}^{N} y_i^a log p(y_i^a | x_i; W^a) + Σ_t ||W^t||_2^2    (2)
In equation (2), the first term ||y_i^r − (W^r)^T x_i||^2 is a least-squares term on the linear function (W^r)^T x_i. The second term is built on the posterior probability function p(y^a = j | x; W^a) = exp((w_j^a)^T x) / Σ_l exp((w_l^a)^T x), where w_j^a denotes the j-th column of the weight matrix W^a of task a. The third term penalizes large weights W = {W^r, {W^a}}.
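A minimal numeric reading of the objective in equation (2), assuming the shared feature vectors are given, might look as follows. The explicit `weight_decay` coefficient, the task names, and all shapes are illustrative choices, not part of the disclosure:

```python
import numpy as np

def multitask_loss(X, Y_r, Y_a, W_r, W_a, lambdas, weight_decay=1e-4):
    """Value of the joint objective: least squares for the landmarks,
    weighted cross entropy for each auxiliary task, plus a weight penalty.

    X: (N, d) shared feature vectors; Y_r: (N, 10) ground-truth landmarks;
    Y_a: dict task -> (N,) integer class labels; W_a: dict task -> (d, C).
    """
    ls = np.sum((Y_r - X @ W_r) ** 2)           # landmark least-squares term
    ce = 0.0
    for task, labels in Y_a.items():
        logits = X @ W_a[task]                  # (N, C)
        logits -= logits.max(axis=1, keepdims=True)
        logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        ce -= lambdas[task] * logp[np.arange(len(labels)), labels].sum()
    penalty = weight_decay * (np.sum(W_r ** 2)
                              + sum(np.sum(w ** 2) for w in W_a.values()))
    return ls + ce + penalty

rng = np.random.default_rng(2)
N, d = 8, 20
X = rng.standard_normal((N, d))
Y_r = rng.standard_normal((N, 10))
Y_a = {"gender": rng.integers(0, 2, N), "pose": rng.integers(0, 5, N)}
W_r = rng.standard_normal((d, 10)) * 0.1
W_a = {"gender": rng.standard_normal((d, 2)) * 0.1,
       "pose": rng.standard_normal((d, 5)) * 0.1}
loss = multitask_loss(X, Y_r, Y_a, W_r, W_a, {"gender": 1.0, "pose": 1.0})
print(loss > 0)
```

Note how the λ^a coefficients scale each auxiliary cross-entropy term, mirroring their role as importance coefficients in equation (2).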
According to an embodiment, the weights of all tasks are updated accordingly. Specifically, the weight matrix of the facial landmark detection is updated by gradient descent, W^r ← W^r − η ∂E^r/∂W^r, where η denotes the learning rate (e.g., η = 0.003) and E^r denotes the least-squares term of equation (2). In addition, the weight matrix of each task a can be computed in a similar fashion as W^a ← W^a − η λ^a ∂E^a/∂W^a.
The generated landmark error and training-task errors can then be back-propagated by the back-propagator 203 layer by layer through the convolutional neural network down to the lowest layer, so as to adjust the weights of the connections between the neurons of the convolutional neural network.
According to an embodiment of the present application, the errors can be back-propagated through the convolutional neural network by the following back-propagation strategy:
ε^{l−1} = σ′(x^{l−1}) ∘ ((W^l)^T ε^l)    (3)
In equation (3), ε^l denotes the errors at layer l, where l = 1, ..., L.
For example, ε^1 denotes the error at the lowest layer, and ε^2 denotes the error at the second lowest layer. The error at a lower layer is computed from the layer above it according to equation (3), where σ′(·) is the gradient of the activation function σ(·) of the network and ∘ denotes element-wise multiplication.
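For fully connected layers, the recursion of equation (3) can be sketched directly (the layer sizes and the tanh activation below are illustrative; convolutional layers would use the corresponding transposed-convolution form):

```python
import numpy as np

def backprop_errors(top_error, weights, activations, sigma_prime):
    """Propagate the top-layer error down through fully connected layers
    following eq. (3): eps^{l-1} = sigma'(x^{l-1}) * (W^l)^T eps^l.

    weights[l-1] maps layer l-1 to layer l; activations[l] = x^l.
    Returns [eps^0, ..., eps^L] with eps^L = top_error.
    """
    L = len(weights)
    eps = [None] * (L + 1)
    eps[L] = top_error
    for l in range(L, 0, -1):
        eps[l - 1] = sigma_prime(activations[l - 1]) * (weights[l - 1].T @ eps[l])
    return eps

tanh_prime = lambda x: 1.0 - np.tanh(x) ** 2   # gradient of the tanh activation

rng = np.random.default_rng(3)
sizes = [72, 100, 10]   # e.g. flattened maps -> shared features -> output
Ws = [rng.standard_normal((sizes[i + 1], sizes[i])) * 0.1 for i in range(2)]
xs = [rng.standard_normal(s) for s in sizes]   # per-layer activations
top = rng.standard_normal(10)                  # error at the top layer
errs = backprop_errors(top, Ws, xs, tanh_prime)
print([e.shape[0] for e in errs])  # [72, 100, 10]
```

Because the landmark error and all task errors enter at the top, the summed error signal reaching each lower layer is what couples the tasks during weight adjustment.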
The above training process is repeated until the determiner 204 determines that the training process of the facial landmark detection has converged. In other words, if the error is below a predetermined value, the training process is deemed to have converged. Through the above training process, the feature extractor 100 can extract a shared facial feature vector from a given face image. According to an embodiment, for any face image x^0, the trained feature extractor 100 extracts the shared feature vector x^l. The landmark locations are then predicted by (W^r)^T x^l, and the target predictions of the auxiliary tasks are obtained by p(y^a | x^l; W^a).
During the above training process, the at least one auxiliary task is trained simultaneously. However, different tasks have different loss functions and learning difficulties, and therefore different convergence rates. According to another embodiment, the determiner 204 may further determine whether the training process of an auxiliary task has converged.
Specifically, let E_val^a and E_tr^a denote the values of the loss function of task a on the validation set and on the training set, respectively. If the measure of a task exceeds a threshold ε, as follows, the task is stopped:
(k · med_{j=t−k}^{t} E_tr^a(j)) / (Σ_{j=t−k}^{t} E_tr^a(j) − k · min_{j=t−k}^{t} E_tr^a(j)) · (E_val^a(t) − min_{j=1}^{t} E_val^a(j)) / (λ^a · min_{j=1}^{t} E_val^a(j)) > ε    (4)
In equation (4), t denotes the current iteration, k denotes the training length (a window of recent iterations), and λ^a denotes the importance coefficient of the error of the a-th task. 'med' denotes the function computing the median. The first factor in equation (4) represents the trend of the training-task error of task a. If the training error drops rapidly within the window of length k, the value of the first factor is small, indicating that the training of the task can continue, because the task is still valuable; otherwise, the first factor is large and the task is more likely to be stopped. The second factor measures how far the current validation error has risen above its best value, scaled by λ^a. During training, an auxiliary task can therefore be switched off before it over-fits, i.e., "stopped early", before it starts to over-fit the training set and thereby harm the main task.
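Under the same notation (a training-loss window of length k, the coefficient λ^a, and a threshold ε), the task-wise early-stopping test of equation (4) might be sketched as follows; the loss histories are fabricated to show both outcomes:

```python
import numpy as np

def should_stop(E_tr, E_val, lam, k, eps):
    """Early-stopping measure in the spirit of eq. (4) for one auxiliary task.

    E_tr, E_val: lists of training/validation losses up to the current
    iteration t (E[j] is the loss at iteration j, 0-indexed here).
    """
    t = len(E_tr) - 1
    if t < k:
        return False                          # not enough history yet
    win = np.asarray(E_tr[t - k:t + 1])       # last k+1 iterations
    trend = k * np.median(win) / (win.sum() - k * win.min())
    gen = (E_val[-1] - min(E_val)) / (lam * min(E_val))
    return trend * gen > eps

# A task whose training loss keeps falling and whose validation loss is at
# its minimum should continue training:
falling = [1.0, 0.8, 0.6, 0.45, 0.35, 0.3]
val_ok = [1.1, 0.9, 0.7, 0.55, 0.45, 0.4]
print(should_stop(falling, val_ok, lam=1.0, k=4, eps=0.5))   # False

# A task whose training loss has plateaued while its validation loss rebounds
# (i.e., it is over-fitting) should be stopped:
flat = [1.0, 0.40, 0.39, 0.39, 0.38, 0.38]
val_bad = [1.1, 0.50, 0.45, 0.50, 0.60, 0.70]
print(should_stop(flat, val_bad, lam=1.0, k=4, eps=0.5))     # True
```

A larger λ^a makes the second factor smaller, so tasks deemed more important to the main task survive longer before being switched off.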
Through the above training process, the feature extractor 100 can extract a shared facial feature vector from any face image. For example, a face image x^0 is fed into the input layer of the convolutional neural network, for example, as shown in Fig. 3. Each convolutional layer of the CNN contains multiple sets of convolution filters and an activation function applied to the face image, and they are applied successively to project the face image to higher levels. In other words, the shared facial feature vector x^l is obtained, and the face image is projected to a higher level step by step, by learning a sequence of nonlinear mappings as follows:
x^l = σ((W^l)^T x^{l−1})    (5)
Here, σ(·) and W^l denote the nonlinear activation function and the filters to be learned in layer l of the CNN, respectively. Referring again to Fig. 3, at the estimation stage, the shared facial feature vector can be used for landmark detection and for the auxiliary/related tasks simultaneously.
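At the estimation stage, the joint prediction from the shared vector x^l — (W^r)^T x^l for the landmarks and a softmax posterior per auxiliary task — might look as follows. The feature dimension and the task names/class counts are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def predict(x_l, W_r, aux_weights):
    """Joint prediction from a shared feature vector x_l: landmarks via the
    linear regressor (W^r)^T x^l, each auxiliary task via its softmax
    posterior p(y^a | x^l; W^a)."""
    landmarks = W_r.T @ x_l   # 10 values: (x, y) of the 5 landmarks
    aux = {name: softmax(W_a.T @ x_l) for name, W_a in aux_weights.items()}
    return landmarks, aux

rng = np.random.default_rng(1)
x_l = rng.standard_normal(100)                   # shared facial feature vector
W_r = rng.standard_normal((100, 10))             # landmark regressor
aux_W = {"pose": rng.standard_normal((100, 5)),  # 5 pose bins (0, ±30, ±60 deg)
         "gender": rng.standard_normal((100, 2)),
         "glasses": rng.standard_normal((100, 2)),
         "smile": rng.standard_normal((100, 2))}
lm, aux = predict(x_l, W_r, aux_W)
print(lm.shape, {k: int(np.argmax(v)) for k, v in aux.items()})
```

All five outputs are produced in a single forward pass over the same features, which is what removes the need for cascaded per-landmark networks.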
It will be appreciated that system 1000 may be implemented with certain hardware, software, or a combination thereof. In addition, embodiments of the present invention may be adapted to a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical memory, and the like) containing computer program code.
In the case where system 1000 is implemented with software, system 1000 may comprise a general-purpose computer, a computer cluster, a mainframe computer, a computing device dedicated to providing online content, or a computer network comprising a group of computers operating in a centralized or distributed fashion. As shown in Fig. 4, system 1000 may include one or more processors (processors 102, 104, 106, etc.), a memory 112, a storage device 116, and a bus to facilitate the exchange of information among the various components of system 1000. Processors 102 to 106 may include a central processing unit ("CPU"), a graphics processing unit ("GPU"), or other suitable information processing devices. Depending on the type of hardware being used, processors 102 to 106 may include one or more printed circuit boards and/or one or more microprocessor chips. Processors 102 to 106 can execute sequences of computer program instructions to perform the various methods explained in greater detail below.
Memory 112 may include, among other things, a random access memory ("RAM") and a read-only memory ("ROM"). Computer program instructions can be stored in, accessed from, and read from memory 112 for execution by one or more of processors 102 to 106. For example, memory 112 may store one or more software applications. Further, memory 112 may store an entire software application or only a part of a software application that is executable by one or more of processors 102 to 106. It is noted that, although only one block is shown in Fig. 4, memory 112 may include multiple physical devices installed on a central computing device or on different computing devices.
A system for facial landmark detection has been described above. A method for facial landmark detection is described below with reference to Fig. 5 and Fig. 6.
Fig. 5 shows a schematic flowchart for facial landmark detection, and Fig. 6 shows a schematic flowchart of the training process of the multi-task convolutional neural network performed by the training unit 200.
In Fig. 5 and Fig. 6, methods 500 and 600 each comprise a series of steps that may be performed by one or more of processors 102 to 106, or by the respective modules/units of system 1000, to implement a data processing operation. For purposes of description, the following discussion is made with reference to the case in which each module/unit of system 1000 is made of hardware or of a combination of hardware and software. Those skilled in the art will appreciate that other suitable devices or systems are also applicable to carry out the following process, and that system 1000 is used merely as an illustration of the process.
As shown in Fig. 5, in step S501, the feature extractor 100 extracts multiple feature maps from at least one facial region of a face image. In another embodiment, in step S501, the feature maps may be extracted from the whole face image. Then, in step S502, a shared facial feature vector is generated from the feature maps extracted in step S501. In step S503, the facial landmark locations of the face image are predicted from the shared facial feature vector generated in step S502. According to another embodiment, the shared facial feature vector may be used to predict the corresponding target of at least one auxiliary task associated with the facial landmark detection, so that the target predictions of all the auxiliary tasks are obtained simultaneously.
According to an embodiment of the present application, the feature extractor comprises a convolutional neural network including multiple convolution-pooling layers and a fully connected layer. Each of the convolution-pooling layers is configured to perform convolution and max-pooling operations. In this embodiment, in step S501, the feature maps may be continuously extracted by the multiple convolution-pooling layers, with the feature maps extracted by a preceding convolution-pooling layer being input to the next convolution-pooling layer, so as to extract feature maps different from the previously extracted ones. In step S502, the shared facial feature vector may be generated by the fully connected layer from all of the feature maps extracted in step S501.
In this embodiment, method 500 further comprises a training step (not shown in Fig. 5), which will be discussed with reference to Fig. 6.
As shown in Fig. 6, in step S601, a training face image, its ground-truth landmark locations, and its ground-truth target for each auxiliary task are sampled from a predetermined training set. For the training face image, in step S602, its facial landmark prediction and the target predictions of all auxiliary tasks may be obtained accordingly from the predictor 300. Then, in step S603, the predicted facial landmark locations are compared with the ground-truth landmark locations to generate a landmark error. In step S604, for each auxiliary task, the target prediction is compared with the corresponding ground-truth target to generate at least one training-task error. Then, in step S605, the generated landmark error and all training-task errors are back-propagated through the convolutional neural network to adjust the weights of the connections between the neurons of the convolutional neural network. In step S606, it is determined whether one of the auxiliary tasks has converged. If "No", process 600 returns to step S601. If "Yes", the training process of that task is stopped in step S607 and the process proceeds to step S608. In step S608, it is determined whether the training process of the facial landmark detection has converged. If "Yes", process 600 ends; otherwise, process 600 returns to step S601.
Facial landmark detection can thus be optimized together with heterogeneous but subtly related tasks.
Although the preferred examples of the present invention have been described, those skilled in the art can make variations or modifications to these examples upon understanding the basic inventive concept. The appended claims are intended to be construed to include the preferred examples and all such variations or modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make variations or modifications to the present invention without departing from the spirit and scope of the present invention. As such, if these variations or modifications belong to the scope of the claims and their equivalents, they also fall within the scope of the present invention.

Claims (20)

1. A method for detecting facial landmarks of a face image, comprising:
extracting a plurality of feature maps from at least one face region of the face image;
generating a shared facial feature vector from the extracted feature maps; and
predicting the facial landmark positions of the face image from the generated shared facial feature vector.
2. The method according to claim 1, wherein the facial landmarks comprise at least one selected from the group consisting of: eye centers, nose tip, and mouth corners of the face image.
3. The method according to claim 1, wherein, in the step of predicting, the shared facial feature vector is used to predict corresponding targets of at least one auxiliary task associated with the facial landmark detection, so as to simultaneously obtain target predictions for all of the auxiliary tasks.
4. The method according to claim 3, wherein the auxiliary tasks comprise at least one selected from the group consisting of: head pose estimation, gender classification, age estimation, facial expression recognition, and facial attribute inference.
5. The method according to claim 4, wherein the extracting and the generating are performed by a convolutional neural network comprising a plurality of convolution-pooling layers configured to perform convolution and max-pooling operations, and
wherein the step of extracting further comprises:
extracting the plurality of feature maps successively through the plurality of convolution-pooling layers, wherein the feature maps extracted by a preceding one of the convolution-pooling layers are input into a following one of the convolution-pooling layers so as to extract feature maps different from the previously extracted ones.
6. The method according to claim 5, wherein the convolutional neural network further comprises a fully connected layer, and, in the step of generating, the shared facial feature vector is generated by the fully connected layer from all of the extracted feature maps.
7. The method according to claim 6, wherein each layer of the convolutional neural network has a plurality of neurons, and wherein the method further comprises:
training the convolutional neural network on a predetermined training set so as to adjust the weights of the connections between the neurons of the convolutional neural network, such that the shared facial feature vector is generated by the convolutional neural network with the adjusted weights.
8. The method according to claim 7, wherein the step of training further comprises:
sampling, from the predetermined training set, a training face image, its ground-truth landmark positions, and its ground-truth targets for each of the auxiliary tasks;
comparing the predicted landmark positions with the ground-truth landmark positions to generate a landmark error;
comparing the target prediction for each auxiliary task with the corresponding ground-truth target to generate at least one training task error; and
back-propagating the generated landmark error and the generated training task errors through the convolutional neural network so as to adjust the weights of the connections between the neurons of the convolutional neural network;
repeating the sampling, the comparing, and the back-propagating until the generated landmark error is less than a first predetermined value and the generated training task errors are less than a second predetermined value.
9. The method according to claim 8, wherein the comparing to generate the landmark error is performed according to a least-squares procedure, and the comparing to generate the training task errors is performed according to a cross-entropy procedure.
10. The method according to claim 8, wherein, for each auxiliary task, the step of training further comprises:
sampling, from a predetermined validation set, a validation face image and its ground-truth targets for each of the auxiliary tasks;
comparing the target prediction with the ground-truth target to generate a validation task error; and
repeating the sampling and the comparing until the generated training task error is less than a third predetermined value and the generated validation task error is less than a fourth predetermined value.
11. The method according to claim 1, wherein, in the step of predicting, the predicted facial landmark positions of the face image are determined according to (W^r)^T x^l,
wherein W^r denotes the weights assigned to the facial landmark detection, x^l denotes the shared facial feature vector, and T denotes a transposition.
12. A system for detecting facial landmarks of a face image, comprising:
a feature extractor configured to:
extract a plurality of feature maps from at least one face region of the face image; and
generate a shared facial feature vector from the extracted feature maps; and
a predictor configured to predict the facial landmark positions of the face image from the shared facial feature vector generated by the feature extractor.
13. The system according to claim 12, wherein the predictor is further configured to simultaneously obtain, by using the shared facial feature vector, target predictions of at least one auxiliary task associated with the facial landmark detection.
14. The system according to claim 12, wherein the feature extractor further comprises a convolutional neural network, the convolutional neural network comprising:
a plurality of convolution-pooling layers configured to perform convolution and max-pooling operations, wherein the feature maps extracted by a preceding one of the convolution-pooling layers are input into a following one of the convolution-pooling layers so as to extract feature maps different from the previously extracted ones; and
a fully connected layer configured to generate the shared facial feature vector from all of the extracted feature maps.
15. The system according to claim 13, wherein each layer of the convolutional neural network has a plurality of neurons, and wherein the system further comprises:
a training unit configured to train the convolutional neural network on a predetermined training set so as to adjust the weights of the connections between the neurons of the convolutional neural network, such that the trained convolutional neural network is able to extract the shared facial feature vector.
16. The system according to claim 15, wherein the training unit further comprises:
a sampler configured to sample, from the predetermined training set, a training face image, its ground-truth landmark positions, and its ground-truth targets for each of the auxiliary tasks;
a comparator configured to compare the predicted landmark positions with the ground-truth landmark positions to generate a landmark error, and to compare the target prediction for each auxiliary task with the corresponding ground-truth target to generate at least one training task error; and
a back-propagator configured to back-propagate the generated landmark error and the training task errors through the convolutional neural network so as to adjust the weights of the connections between the neurons of the convolutional neural network.
17. The system according to claim 15, wherein the training unit further comprises:
a determiner configured to determine whether the training process of the facial landmark detection has converged and whether the training process of each task has converged.
18. The system according to claim 12, wherein the facial landmarks comprise at least one selected from the group consisting of: eye centers, nose tip, and mouth corners of the face image.
19. The system according to claim 13, wherein the auxiliary tasks comprise at least one selected from the group consisting of: head pose estimation, gender classification, age estimation, facial expression recognition, and facial attribute inference.
20. A method for training a convolutional neural network, the convolutional neural network performing facial landmark detection and being associated with at least one auxiliary task, the method comprising:
1) sampling, from a predetermined training set, a training face image, its ground-truth landmark positions, and its ground-truth targets for each of the auxiliary tasks;
2) comparing the predicted landmark positions with the ground-truth landmark positions to generate a landmark error;
3) comparing the target prediction for each auxiliary task with the corresponding ground-truth target to generate at least one training task error;
4) back-propagating the generated landmark error and all of the training task errors through the convolutional neural network so as to adjust the weights of the connections between the neurons of the convolutional neural network;
5) sampling, from a predetermined validation set, a validation face image and its ground-truth targets for each of the auxiliary tasks;
6) comparing the target prediction with the ground-truth target to generate a validation task error; and
7) determining whether the generated training task error is less than a first predetermined value and whether the generated validation task error is less than a second predetermined value; and, if so, terminating the training of the convolutional neural network, otherwise repeating steps 1) to 7).
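Claims 11 and 13 describe the two kinds of heads applied to the shared facial feature vector: a linear regressor (W^r)^T x^l for the landmark positions and, for each auxiliary task, a classifier reusing the same vector. A small NumPy sketch of that computation follows; the 100-dimensional feature size, the 5-landmark output, and the softmax auxiliary head are illustrative assumptions not fixed by the claims:

```python
import numpy as np

rng = np.random.default_rng(0)

x_l = rng.standard_normal(100)        # shared facial feature vector x^l
W_r = rng.standard_normal((100, 10))  # landmark-detection weights W^r

# Claim 11: predicted landmark positions are (W^r)^T x^l,
# here 5 landmarks as (x, y) coordinate pairs.
landmarks = W_r.T @ x_l

# Claim 13: an auxiliary task (e.g. gender classification) reuses the
# same shared vector with its own weights and a softmax over its classes.
W_a = rng.standard_normal((100, 2))

def softmax(z):
    e = np.exp(z - z.max())           # subtract max for numerical stability
    return e / e.sum()

posterior = softmax(W_a.T @ x_l)
```

Because every head is a linear map on the same x^l, training any one task's weights also shapes the shared trunk that produces x^l, which is the mechanism behind the joint optimization described in the specification.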
CN201480081241.1A 2014-08-21 2014-08-21 Method and system for the face critical point detection based on multitask Active CN106575367B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/000769 WO2016026063A1 (en) 2014-08-21 2014-08-21 A method and a system for facial landmark detection based on multi-task

Publications (2)

Publication Number Publication Date
CN106575367A true CN106575367A (en) 2017-04-19
CN106575367B CN106575367B (en) 2018-11-06

Family

ID=55350056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201480081241.1A Active CN106575367B (en) 2014-08-21 2014-08-21 Method and system for the face critical point detection based on multitask

Country Status (2)

Country Link
CN (1) CN106575367B (en)
WO (1) WO2016026063A1 (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6750854B2 (en) 2016-05-25 2020-09-02 キヤノン株式会社 Information processing apparatus and information processing method
CN105957095B (en) * 2016-06-15 2018-06-08 电子科技大学 A kind of Spiking angular-point detection methods based on gray level image
US10289825B2 (en) * 2016-07-22 2019-05-14 Nec Corporation Login access control for secure/private data
US10467459B2 (en) 2016-09-09 2019-11-05 Microsoft Technology Licensing, Llc Object detection based on joint feature extraction
CN107871106B (en) * 2016-09-26 2021-07-06 北京眼神科技有限公司 Face detection method and device
JP6692271B2 (en) * 2016-09-28 2020-05-13 日本電信電話株式会社 Multitask processing device, multitask model learning device, and program
US10198626B2 (en) 2016-10-19 2019-02-05 Snap Inc. Neural networks for facial modeling
US10460153B2 (en) * 2016-11-15 2019-10-29 Futurewei Technologies, Inc. Automatic identity detection
CN106951840A (en) * 2017-03-09 2017-07-14 北京工业大学 A kind of facial feature points detection method
CN108073910B (en) * 2017-12-29 2021-05-07 百度在线网络技术(北京)有限公司 Method and device for generating human face features
US20210056292A1 (en) * 2018-05-17 2021-02-25 Hewlett-Packard Development Company, L.P. Image location identification
CN109145798B (en) * 2018-08-13 2021-10-22 浙江零跑科技股份有限公司 Driving scene target identification and travelable region segmentation integration method
US11954881B2 (en) 2018-08-28 2024-04-09 Apple Inc. Semi-supervised learning using clustering as an additional constraint
CN109635750A (en) * 2018-12-14 2019-04-16 广西师范大学 A kind of compound convolutional neural networks images of gestures recognition methods under complex background
CN110163080A (en) 2019-04-02 2019-08-23 腾讯科技(深圳)有限公司 Face critical point detection method and device, storage medium and electronic equipment
CN110163098A (en) * 2019-04-17 2019-08-23 西北大学 Based on the facial expression recognition model construction of depth of seam division network and recognition methods
WO2021036726A1 (en) * 2019-08-29 2021-03-04 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method, system, and computer-readable medium for using face alignment model based on multi-task convolutional neural network-obtained data
CN111191675B (en) * 2019-12-03 2023-10-24 深圳市华尊科技股份有限公司 Pedestrian attribute identification model realization method and related device
WO2022003982A1 (en) * 2020-07-03 2022-01-06 日本電気株式会社 Detection device, learning device, detection method, and storage medium
CN112820382A (en) * 2021-02-04 2021-05-18 上海小芃科技有限公司 Breast cancer postoperative intelligent rehabilitation training method, device, equipment and storage medium
US11776323B2 (en) 2022-02-15 2023-10-03 Ford Global Technologies, Llc Biometric task network


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1352436A (en) * 2000-11-15 2002-06-05 星创科技股份有限公司 Real-time face identification system
CN101673340A (en) * 2009-08-13 2010-03-17 重庆大学 Method for identifying human ear by colligating multi-direction and multi-dimension and BP neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080317354A1 (en) * 2001-05-31 2008-12-25 Fumiyuki Shiratani Image selection support system for supporting selection of well-photographed image from plural images
CN102831382A (en) * 2011-06-15 2012-12-19 北京三星通信技术研究有限公司 Face tracking apparatus and method
CN103824054A (en) * 2014-02-17 2014-05-28 北京旷视科技有限公司 Cascaded depth neural network-based face attribute recognition method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
顾佳玲 (GU, Jialing) et al.: "Incremental Convolutional Neural Network and Its Application in Face Detection" (增长式卷积神经网络及其在人脸检测中的应用), Journal of System Simulation (《系统仿真学报》) *

Cited By (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145857A (en) * 2017-04-29 2017-09-08 深圳市深网视界科技有限公司 Face character recognition methods, device and method for establishing model
CN107038429A (en) * 2017-05-03 2017-08-11 四川云图睿视科技有限公司 A kind of multitask cascade face alignment method based on deep learning
CN106951888A (en) * 2017-05-09 2017-07-14 安徽大学 The relative coordinate constrained procedure and localization method of human face characteristic point
CN106951888B (en) * 2017-05-09 2020-12-01 安徽大学 Relative coordinate constraint method and positioning method of human face characteristic point
CN107358149A (en) * 2017-05-27 2017-11-17 深圳市深网视界科技有限公司 A kind of human body attitude detection method and device
CN107358149B (en) * 2017-05-27 2020-09-22 深圳市深网视界科技有限公司 Human body posture detection method and device
CN107578055A (en) * 2017-06-20 2018-01-12 北京陌上花科技有限公司 A kind of image prediction method and apparatus
CN107578055B (en) * 2017-06-20 2020-04-14 北京陌上花科技有限公司 Image prediction method and device
CN108229288A (en) * 2017-06-23 2018-06-29 北京市商汤科技开发有限公司 Neural metwork training and clothes method for detecting color, device, storage medium, electronic equipment
CN108229288B (en) * 2017-06-23 2020-08-11 北京市商汤科技开发有限公司 Neural network training and clothes color detection method and device, storage medium and electronic equipment
CN107563279B (en) * 2017-07-22 2020-12-22 复旦大学 Model training method for adaptive weight adjustment aiming at human body attribute classification
CN107563279A (en) * 2017-07-22 2018-01-09 复旦大学 The model training method adjusted for the adaptive weighting of human body attributive classification
US11341631B2 (en) 2017-08-09 2022-05-24 Shenzhen Keya Medical Technology Corporation System and method for automatically detecting a physiological condition from a medical image of a patient
CN107423727A (en) * 2017-08-14 2017-12-01 河南工程学院 Face complex expression recognition methods based on neutral net
CN107704848A (en) * 2017-10-27 2018-02-16 深圳市唯特视科技有限公司 A kind of intensive face alignment method based on multi-constraint condition convolutional neural networks
CN108196535A (en) * 2017-12-12 2018-06-22 清华大学苏州汽车研究院(吴江) Automated driving system based on enhancing study and Multi-sensor Fusion
CN107992864A (en) * 2018-01-15 2018-05-04 武汉神目信息技术有限公司 A kind of vivo identification method and device based on image texture
CN110060296A (en) * 2018-01-18 2019-07-26 北京三星通信技术研究有限公司 Estimate method, electronic equipment and the method and apparatus for showing virtual objects of posture
CN108399373B (en) * 2018-02-06 2019-05-10 北京达佳互联信息技术有限公司 The model training and its detection method and device of face key point
CN108399373A (en) * 2018-02-06 2018-08-14 北京达佳互联信息技术有限公司 The model training and its detection method and device of face key point
CN110232304A (en) * 2018-03-06 2019-09-13 德韧营运有限责任公司 Isomery convolutional neural networks for more problem solvings
CN108416314B (en) * 2018-03-16 2022-03-08 中山大学 Picture important face detection method
CN108416314A (en) * 2018-03-16 2018-08-17 中山大学 The important method for detecting human face of picture
CN108615016B (en) * 2018-04-28 2020-06-19 北京华捷艾米科技有限公司 Face key point detection method and face key point detection device
CN108615016A (en) * 2018-04-28 2018-10-02 北京华捷艾米科技有限公司 Face critical point detection method and face critical point detection device
CN109147940A (en) * 2018-07-05 2019-01-04 北京昆仑医云科技有限公司 From the device and system of the medical image automatic Prediction physiological status of patient
CN109522910A (en) * 2018-12-25 2019-03-26 浙江商汤科技开发有限公司 Critical point detection method and device, electronic equipment and storage medium
CN109522910B (en) * 2018-12-25 2020-12-11 浙江商汤科技开发有限公司 Key point detection method and device, electronic equipment and storage medium
CN109829431A (en) * 2019-01-31 2019-05-31 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN109829431B (en) * 2019-01-31 2021-02-12 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN111563397B (en) * 2019-02-13 2023-04-18 阿里巴巴集团控股有限公司 Detection method, detection device, intelligent equipment and computer storage medium
CN111563397A (en) * 2019-02-13 2020-08-21 阿里巴巴集团控股有限公司 Detection method, detection device, intelligent equipment and computer storage medium
CN109902641A (en) * 2019-03-06 2019-06-18 中国科学院自动化研究所 Face critical point detection method, system, device based on semanteme alignment
CN109902641B (en) * 2019-03-06 2021-03-02 中国科学院自动化研究所 Semantic alignment-based face key point detection method, system and device
CN110136828A (en) * 2019-05-16 2019-08-16 杭州健培科技有限公司 A method of medical image multitask auxiliary diagnosis is realized based on deep learning
CN110705419A (en) * 2019-09-24 2020-01-17 新华三大数据技术有限公司 Emotion recognition method, early warning method, model training method and related device
TWI753588B (en) * 2019-09-30 2022-01-21 大陸商深圳市商湯科技有限公司 Face attribute recognition method, electronic device and computer-readable storage medium
KR20220066661A (en) * 2020-11-16 2022-05-24 상명대학교산학협력단 Device and method for landmark detection using artificial intelligence
KR102538804B1 (en) * 2020-11-16 2023-06-01 상명대학교 산학협력단 Device and method for landmark detection using artificial intelligence
CN112488003A (en) * 2020-12-03 2021-03-12 深圳市捷顺科技实业股份有限公司 Face detection method, model creation method, device, equipment and medium

Also Published As

Publication number Publication date
WO2016026063A1 (en) 2016-02-25
CN106575367B (en) 2018-11-06

Similar Documents

Publication Publication Date Title
CN106575367B (en) Method and system for the face critical point detection based on multitask
US9530047B1 (en) Method and system for face image recognition
US20220392234A1 (en) Training neural networks for vehicle re-identification
JP6832504B2 (en) Object tracking methods, object tracking devices and programs
CN106358444B (en) Method and system for face verification
Anand Thoutam et al. Yoga pose estimation and feedback generation using deep learning
CN109902546A (en) Face identification method, device and computer-readable medium
CN106415594A (en) A method and a system for face verification
Khan et al. Human Gait Analysis: A Sequential Framework of Lightweight Deep Learning and Improved Moth‐Flame Optimization Algorithm
Purnapatra et al. Face liveness detection competition (livdet-face)-2021
Zhai et al. Face verification across aging based on deep convolutional networks and local binary patterns
Tran et al. Baby learning with vision transformer for face recognition
Guntor et al. Convolutional neural network (CNN) based gait recognition system using microsoft kinect skeleton features
Farag et al. Inductive Conformal Prediction for Harvest-Readiness Classification of Cauliflower Plants: A Comparative Study of Uncertainty Quantification Methods
CN111382712A (en) Palm image recognition method, system and equipment
Ma et al. Is a picture worth 1000 votes? Analyzing the sentiment of election related social photos
CN117292421B (en) GRU-based continuous vision estimation deep learning method
Pabiasz et al. SOM vs FCM vs PCA in 3D face recognition
Shashidhar et al. An Efficient method for Recognition of Occluded Faces from Images
Mudasar Azeem et al. MASK DETECTION USING DEEP LEARNING METHODS
Pham Gabor filter initialization and parameterization strategies in convolutional neural networks
Purbanugraha et al. Improvement Accuracy Identification and Learning Speed of Offline Signatures Based on SqueezeNet with ADAM Backpropagation
Hosamani et al. Data science: prediction and analysis of data using multiple classifier system
Hossain Detecting and Mitigating Adversarial Attack
Liu et al. Face Detection with Structural Coordinates for the Estimation of Patterns Using Machine Learning Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant