CN106575367B

CN106575367B - Method and system for the face critical point detection based on multitask

Info

Publication number: CN106575367B
Application number: CN201480081241.1A
Authority: CN
Inventors: 汤晓鸥; 张展鹏; 罗平; 吕健勤
Original assignee: Beijing Sensetime Technology Development Co Ltd
Current assignee: Beijing Sensetime Technology Development Co Ltd
Priority date: 2014-08-21
Filing date: 2014-08-21
Publication date: 2018-11-06
Anticipated expiration: 2034-08-21
Also published as: WO2016026063A1; CN106575367A

Abstract

The application discloses a kind of method and system for detecting the face key point of facial image.The method may include：From extracting multiple characteristic patterns at least one human face region and/or entire facial image of the facial image；Shared face feature vector is generated from the multiple characteristic patterns extracted；And the face key point position of the facial image is predicted from the shared face feature vector generated.Method and system through the invention, the face critical point detection can optimize together with isomery but delicate relevant task, so as to improve detection reliability by multi-task learning.

Description

Method and system for the face critical point detection based on multitask

Technical field

This application involves face alignment, exactly, be related to for face key point (landmark) detection method and System.

Background technology

Face critical point detection is that many human face analysis tasks (such as, infer, face verification and face are known by face character Substance not), but it is constantly subjected to the obstruction of light masking (occlusion) and postural change problem.

Cascade CNN (convolutional neural networks) can be used to execute for accurate face critical point detection, and wherein face is advance Subregion is divided into different parts, and each part is by individual depth CNN processing.Obtained output then carries out mean deviation And individual cascading layers are transferred to, to handle each face key point respectively.

In addition, face critical point detection is not question of independence, its estimation can by many isomeries but it is delicate it is associated because The influence of element.For example, when child is when smiling, his/her face opens very big.Effectively find and utilize such internal association Face character will be helpful to more accurately detect the corners of the mouth.In addition, in rotating left and right larger face, the distance between two is more It is small.This pose information can be used as additional information source, with the solution space of the crucial point estimation of constraint.In given abundant reasonable phase In the case of the set of pass task, handling face critical point detection in isolation can run counter to desire.

However, different task can be inherently different in terms of learning difficult point, and there is different rates of convergence.In addition, Certain tasks may be received in overfitting, the study to endanger entire model earlier than other tasks when learning at the same time It holds back.

Invention content

In the one side of the application, the method that discloses the face key point for detecting facial image.The method can It is executed including the use of convolutional neural networks：Multiple characteristic patterns are extracted from least one human face region of facial image；From being carried Shared face feature vector is generated in the multiple characteristic patterns taken；And it is predicted from the shared face feature vector generated The face key point position of facial image, and it is related to face critical point detection to predict using shared face feature vector The correspondence target of at least one nonproductive task of connection, to obtain the target prediction of all nonproductive tasks simultaneously, wherein the volume Product neural network is trained using predetermined training set, and the step of the training includes：1) to training face from predetermined training set Image, it demarcates true key point position and it is used for the calibration real goal of each nonproductive task and is sampled；2) compare institute Difference between the face key point position and the true key point position of calibration of prediction, to generate crucial point tolerance；3) compare respectively The difference between calibration real goal compared with target prediction and for each nonproductive task, is missed with generating at least one training mission Difference；And 4) by the crucial point tolerance generated and the training mission error back propagation that is generated by convolutional neural networks, The weight of the connection between neuron to adjust convolutional neural networks；Repeat step 1) -4), until the crucial point tolerance of generation Less than first predetermined value and the training mission error of generation is less than second predetermined value.

In one embodiment, face key point includes selected from least one of the group being made up of：Face figure Eyes central point, nose and the corners of the mouth of picture.

In one embodiment, nonproductive task includes selected from least one of the group being made up of：Head pose Estimation, Gender Classification, age estimation, facial expression recognition or face character are inferred.

In one embodiment, convolutional neural networks include being configured to execute the multiple of convolution sum maximum pondization operation Convolution-pond layer, and wherein, from least one human face region of facial image extracting multiple characteristic patterns further includes：By more A convolution-pond layer continuously extracts multiple characteristic patterns, wherein being inputted by the characteristic pattern of the preceding layer extraction in convolution-pond layer To next layer of convolution-pond layer, to extract the characteristic pattern different from the characteristic pattern previously extracted.

In one embodiment, convolutional neural networks further include full articulamentum, and from the multiple features extracted When generating shared face feature vector in figure, generated from all multiple characteristic patterns extracted by full articulamentum shared Face feature vector.

In one embodiment, each layer of convolutional neural networks has multiple neurons, and wherein method is also wrapped It includes：Carry out training convolutional neural networks using predetermined training set, the connection between neuron to adjust convolutional neural networks it is every A weight, so that generating shared face feature vector by the convolutional neural networks with the weight being adjusted.

In one embodiment, compare and executed according to Least Square in Processing with generating crucial point tolerance, and compare It is executed compared with to generate training mission error according to cross entropy processing.

In one embodiment, for each nonproductive task, carry out training convolutional neural networks also using predetermined training set Include the following steps：5) from predetermined authentication concentrate to verification facial image and its be used for the true mesh of calibration of each nonproductive task Mark is sampled；6) compare target prediction and demarcate the difference between real goal, to generate validation task error；Repeat step 5) and 6), until the validation task error that the training mission error of generation is less than third predetermined value and is generated is less than the 4th in advance Definite value.

In one embodiment, in the face pass for predicting facial image from the shared face feature vector generated When key point position, the face key point position of facial image is according to (W^r)^Tx^lIt determines, wherein W^rRepresentative is assigned to face key The weight of point detection, and x^lIt represents shared face characteristic vector and T represents transposition.

In the another aspect of the application, the system that discloses the face key point for detecting facial image.The system It may include feature extractor and fallout predictor.Feature extractor can utilize convolutional neural networks to execute from at least one of facial image Multiple characteristic patterns are extracted in human face region, and generate shared face feature vector from the multiple characteristic patterns extracted.In advance The face key point position of facial image can be predicted from the shared face feature vector generated by feature extractor by surveying device, with And at least one nonproductive task associated with face critical point detection is obtained by using shared face feature vector Target prediction, wherein each layer of convolutional neural networks has multiple neurons, and system further includes training unit, is used for Training convolutional neural networks, so that the convolutional neural networks trained can extract shared face feature vector, training Unit includes：Sampler, be configured to from predetermined training set to training facial image, its demarcate true key point position and It is used for the calibration real goal of each nonproductive task and is sampled；Comparator is configured to compare predicted face pass Difference between key point position and the true key point position of calibration, to generate crucial point tolerance, and is respectively compared target prediction With for each nonproductive task calibration real goal between difference, to generate at least one training mission error；And it is anti- To transmission device, it is configured to the crucial point tolerance that will be generated and training mission error back propagation passes through convolutional Neural net Network, the weight of the connection between neuron to adjust convolutional neural networks.

In one embodiment, convolutional neural networks include：Multiple convolution ponds layer is configured to execute convolution sum Maximum pondization operation, and the next of convolution pond layer is wherein input to by the characteristic pattern of the preceding layer extraction in the layer of convolution pond In layer, to extract the characteristic pattern different from the characteristic pattern previously extracted；And full articulamentum, it is configured to from all extractions Multiple characteristic patterns in generate shared face feature vector.

In one embodiment, training unit further includes：Determiner is configured to determine face critical point detection Whether training process restrains and whether the training process of each task restrains.

In one embodiment, face key point includes selected from least one of the group being made up of：Face figure The center of the eyes of picture, nose, the corners of the mouth.

The application also have training for be performed simultaneously face critical point detection and at least one associated nonproductive task with The method for obtaining the convolution character network of the target prediction of nonproductive task.The method may include：1) right from predetermined training set Training facial image, its true (ground-truth) key point position of calibration and its to be used for the calibration of each nonproductive task true Target is sampled；2) compare different to generate between predicted face key point position and the true key point position of calibration Crucial point tolerance；3) it is respectively compared target prediction and the difference demarcated between real goal for each nonproductive task, with life At at least one training mission error；4) the crucial point tolerance generated and all training mission error back propagations are passed through Convolutional neural networks, the weight of the connection between neuron to adjust convolutional neural networks；5) it is concentrated from predetermined authentication to testing Witness's face image is used for the calibration real goal of each nonproductive task with it and is sampled；6) it is true with calibration to compare target prediction Difference between target, to generate validation task error；And 7) determine whether training mission error is less than first predetermined value simultaneously And whether validation task error is less than second predetermined value.If so, terminating training convolutional neural networks, otherwise, step will be repeated 1) to 7).

Present invention also provides a kind of computer-readable medium, the computer-readable medium for storing instruction, the instruction It can be executed by one or more processors to implement above-mentioned method.

Compared with the conventional method, face critical point detection can be optimized together with the task of isomery but delicate auxiliary, so as to Detection reliability can be improved by multi-task learning, especially there is in processing the face of the masking of notable light and postural change In the case of.

According to the application, a single CNN is only used, therefore, the complexity of required systems/devices can be reduced.Neither It needs the advance subregion of face also not need concatenated convolutional nervous layer, to greatly reduce model complexity, and still realizes simultaneously Quite or even preferably accuracy.

With trained progress, certain inter-related tasks are just no longer beneficial to main task when reaching optimum performance, therefore can Stop to their training process.According to the application, training for CNN is executed using " stopping (early stopping) in advance " Journey endangers the inter-related task of main task to stop those because overfitting (over-fit) training set is started, to promote Study convergence.

Description of the drawings

The exemplary non-limiting embodiments of the present invention are described referring to the attached drawing below.Attached drawing is illustrative, and generally not It is actual precise proportions.Same or like element on different figures quotes identical drawing reference numeral.

Fig. 1 is the schematic diagram for showing the system for face critical point detection according to some disclosed embodiments.

Fig. 2 is the schematic diagram for showing the training unit as shown in Figure 1 according to some disclosed embodiments.

Fig. 3 is the signal for the example for showing the system for face critical point detection according to some disclosed embodiments Figure, there is shown with the examples of convolutional neural networks.

Fig. 4 is when showing to be executed in software according to the system for face critical point detection of some disclosed embodiments Schematic diagram.

Fig. 5 is the schematic flow diagram for showing the method for face critical point detection according to some disclosed embodiments.

Fig. 6 is the exemplary flow for the training process for showing the multitask convolutional neural networks according to some disclosed embodiments Figure.

Specific implementation mode

This part will be explained in illustrative embodiments, and the example of these embodiments will be illustrated in the drawings.Suitable When working as, identical drawing reference numeral always shows same or similar part in attached drawing.

Fig. 1 is to show showing according to the exemplary systems 1000 for face critical point detection of some disclosed embodiments It is intended to.According to system 1000, face critical point detection (being hereinafter also referred to as main task) and at least one related/nonproductive task It optimizes jointly.Face critical point detection refers to the positions detection 2D, that is, the 2D coordinates (x of the human face region of facial image And y).The example of face key point may include but be not limited to, the left eye center and right eye center of facial image, nose, the left corners of the mouth With the right corners of the mouth.The example of nonproductive task may include but be not limited to, head pose estimation, demography (such as, Gender Classification), Age estimation, facial expression recognition (such as, smiling) or face character infer (such as, wearing glasses).It will be appreciated that auxiliary is appointed The quantity or type of business are not limited to those mentioned herein.

Referring again to FIGS. 1, may include feature extractor 100,200 and of training unit when system 1000 is implemented by hardware Fallout predictor 300.Feature extractor 100 can be extracted from least one human face region and/or entire facial image of facial image Multiple characteristic patterns.Then, can be generated from the multiple characteristic patterns extracted from feature extractor 100 shared face characteristic to Amount.

Fallout predictor 300 can predict facial image from the shared face feature vector extracted by feature extractor 100 Face key point position.Meanwhile fallout predictor 300 can further prediction be examined with face key point from shared face feature vector Survey the correspondence target of associated at least one nonproductive task.According to system 1000, face critical point detection can be with nonproductive task It optimizes jointly.

According to embodiment, feature extractor 100 may include convolutional neural networks.The network may include multiple convolution-ponds Layer and full articulamentum.In a network, each of multiple convolution-pond layer can perform convolution sum maximum pondization and operate, and by The characteristic pattern of preceding layer extraction in convolution-pond layer is input in next layer of convolution pond layer, with extraction and previously extraction The different characteristic pattern of characteristic pattern.Full articulamentum can generate shared face characteristic from all multiple characteristic patterns extracted Vector.

The example of network is shown, wherein convolutional neural networks include input layer, multiple (for example, three) convolution-pond in Fig. 3 Change layer, a convolutional layer and a full articulamentum, wherein convolution-pond layer includes one or more (for example, three) convolution Layer and one or more (for example, three) pond layers.It should be noted that shown network is for example, and in feature extractor Convolutional neural networks are without being limited thereto.As shown in figure 3, the facial image of 40 × 40 (for example) gray scales inputs in input layer.First Convolution-pond layer extracts characteristic pattern from the image of input.Then, the second convolution-pond layer is using the output of first layer as defeated Enter, to generate different characteristic patterns.This process is continued by using all three convolution-pond layers.Finally, feature Multiple layers of figure are used for generating shared face feature vector by full articulamentum.In other words, by executing multiple convolution sum most Great Chiization operates to generate shared face feature vector.Multiple neuron of each layer containing band locally or globally receptive field, And the weight of the connection between the neuron of convolutional neural networks can be adjusted correspondingly to train network.

According to embodiment, system 1000 can further include training unit 200.Predetermined training set can be used in training unit 200 Carry out training characteristics extractor, the weight of the connection between neuron to adjust convolutional neural networks, so that trained Feature extractor can extract shared d face feature vectors.Embodiments herein according to Fig.2, training unit 200 can Including sampler 201, comparator 202 and backpropagation device 203.

As shown in Fig. 2, sampler 201 can be from predetermined training set to training facial image, its true key point of calibration It sets and its calibration real goal for each nonproductive task is sampled.According to embodiment, five true key points of calibration (that is, the center of eyes, nose, corners of the mouth) can be annotated directly on each trained facial image.According to another embodiment, it is used for The calibration real goal of each nonproductive task can hand labeled.For example, being directed to Gender Classification, calibration real goal can be marked as Female (F) is male (M).Infer for face character, such as, wear glasses, calibration real goal can be marked as wearing (Y) or not wear (N).Estimate for head pose, can mark (0 °, ± 30 °, ± 60 °), and be directed to Expression Recognition, such as, smiles, it can be corresponding Ground marks Yes/No.

Difference between comparable the predicted face key point position of comparator 202 and the true key point position of calibration, To generate crucial point tolerance.Crucial point tolerance can be obtained by using such as least square method.Comparator 203 can also compare respectively The difference between calibration real goal compared with target prediction and for each nonproductive task, is missed with generating at least one training mission Difference.According to another embodiment, training mission error can be obtained by using such as Cross-Entropy Method.

The crucial point tolerance generated and all training mission error back propagations can be passed through volume by backpropagation device 203 Product neural network, the weight of the connection between neuron to adjust convolutional neural networks.

According to embodiment, training unit 200 may also include determining that device 204.Determiner 204 can determine that face key point is examined Whether the training process of survey restrains.According to another embodiment, determiner 204 may further determine that each task training process whether Convergence, this will be discussed later.

Hereinafter, it will be discussed in detail the component in training unit 200 as mentioned above.For purposes of illustration, will The embodiment of T task is trained in description by training unit 200 jointly.For T task, face critical point detection is (that is, director Business) it is expressed as r, and one at least one correlation/nonproductive task is expressed as a, wherein a ∈ A.

For each of task, training data is expressed as (x_i,y_i ^t), t={ 1 ..., T } and i={ 1 ..., N }, Middle N represents the quantity of training data.Specifically, being directed to face critical point detection r, training data is expressed as (x_i,y_i ^r), whereinIt is the 2D coordinates of five key points.For task a, training data is expressed as (x_i,y_i ^a).In embodiment, it shows Four tasks p, g, w and s, and they respectively represent deduction ' posture ', ' gender ', ' wearing glasses ' and ' smile '.Therefore,Five different postures ((0 °, ± 30 °, ± 60 °)) are represented, andIt is binary system category It property and respectively represents female's/man does not wear glasses/and wears glasses and there is no smile/smile.Different weights be assigned to main task r and Each nonproductive task a, and it is expressed as W^r{ W^a}。

Then, the object function of all tasks is formulated as follows to optimize main task r and nonproductive task a：

Wherein, f (x^t；w^t) it is x^tWith weight vectors w^tLinear function；

L () represents loss function；

λ^aRepresent the important coefficient of the error of a-th of task；And

x_iIt represents and shares face feature vector.

According to embodiment, least square function is used separately as the loss of main task r and nonproductive task a with entropy function is intersected Function l (), to generate corresponding crucial point tolerance and training mission error.Therefore, above-mentioned object function is rewritable as follows：

In equation (2), the f (x in first item_i；W^r)=(W^r)^Tx_iIt is linear function.Section 2 is posterior probability functionWhereinThe jth of the weight matrix of expression task a, w arranges.Section 3 punishment is big Weight W={ (W^r,{W^a}})。

According to embodiment, the weight of all tasks can update accordingly.Specifically, the weight of face critical point detection Matrix byUpdate, wherein η represent learning rate (such as, η=0.003) andIn addition, the weight matrix of each task a can be calculated as in a similar fashion

Then, the crucial point tolerance and training mission error generated can be by the backpropagation layer by layer of backpropagation device 203 By convolutional neural networks until lowermost layer, the weight of the connection between neuron to adjust convolutional neural networks.

According to presently filed embodiment, error can counter-propagate through convolution god by following backpropagation strategy Through network：

In equation (3), ε^lAll errors in layer l are represented, wherein

For example, ε¹Represent the error of lowermost layer, and ε²Represent the error of the second low layer.The error of lower level is according to equation (3) it is calculated.For example,WhereinIt is the gradient of the activation primitive σ () of network.

Above-mentioned training process is repeated, until determiner 204 determines that the training process of face critical point detection is convergence.It changes Yan Zhi, if error is less than predetermined value, training process will be confirmed as restraining.Pass through above-mentioned training process, feature extraction Device 100 can extract shared face feature vector from given facial image.According to embodiment, for any facial image x⁰, the shared feature vector x of the extraction of feature extractor 100 trained^l.Then, pass through (W^r)^Tx^lPredict key point position, And pass through p (y^a|x^l；W^a) obtain the prediction target of nonproductive task.

During above-mentioned training process, while at least one nonproductive task of training.However, different tasks is with different Loss function and study difficult point, therefore there is different rates of convergence.According to another embodiment, determiner 204 may further determine that Whether the training process of nonproductive task restrains.

Specifically,WithRespectively represent the value of verification collection and the loss function of the task a on training set.If The measured value of one task exceeds threshold epsilon, as follows, then the task will stop：

In equation (4), t represents current iteration, and k represents training length, and λ^aRepresent the weight of the error of a-th of task Want property coefficient.' med ' indicates the function for calculating median.The training mission that first item in equation (4) represents task a is missed The trend of difference.If training error declines rapidly in a segment length k, the value of first item is smaller, to show task Training process can continue, because the task is still valuable.Otherwise, first item is larger, then the task is more likely to stop.Therefore, During training process, nonproductive task can close before its overfitting, so that the task can start overfitting instruction at it Practice collection thus endanger main task before " stop in advance ".

By above-mentioned training process, feature extractor 100 can extract shared face characteristic from any facial image Vector.For example, facial image x⁰It is inputted in the input layer of convolutional neural networks, for example, as shown in Figure 3.Each volume in CNN There are multigroup convolution filter and applied to the activation primitive of facial image in lamination, and they are successively applied with by face Image projects higher level.In other words, by learn a series of Nonlinear Mapping (as follows) with obtain arch face characteristic to Measure x^lAnd facial image is projected into higher level step by step：

Herein, σ () and W^slIt represents and is applied to facial imageNonlinear activation function and The filter learnt is needed in the layer l of CNN.For example,.Referring again to FIGS. 3, in estimation stages, shared face feature vector can It is used for critical point detection and auxiliary/inter-related task simultaneously.

It will be appreciated that a certain hardware, software or combination thereof can be used to implement for system 1000.In addition, the reality of the present invention It applies example and may be adapted to computer program product, the computer program product is embodied in one or more containing computer program code (include but not limited to, magnetic disk storage, CD-ROM, optical memory etc.) on a computer readable storage medium.

With software implementation system 1000, system 1000 may include all-purpose computer, computer cluster, mainstream Computer is exclusively used in providing the computing device or computer network of online content, and the computer network includes one group to collect In or distribution mode operation computer.As shown in figure 4, system 1000 may include one or more processors (processor 102, 104,106 etc.), the information exchange between the various parts of memory 112, storage device 116 and promotion system 1000 is total Line.Processor 102 to 106 may include central processing unit (" CPU "), graphics processing unit (" GPU ") or other are suitable Information processing unit.According to the type of used hardware, processor 102 to 106 may include one or more printed circuit boards And/or one or more microprocessors chip.102 to 106 executable computer program of processor instruction sequence, with execute by The various methods illustrated in further detail below.

Memory 112 can especially include random access memory (" RAM ") and read-only memory (" ROM ").Computer journey Sequence instruction can be stored by memory 112, access and read from the memory, so as to by one in processor 102 to 106 or Multiple processors execute.For example, memory 112 can store one or more software applications.In addition, memory 112 can store it is whole The one of the software application that a software application or only storage can be executed by one or more of processor 102 to 106 processor Part.It should be noted that although only showing a frame in Fig. 4, memory 112 may include being mounted on central processing unit or different meters Calculate multiple physical units on device.

Described above is the systems for face critical point detection.It describes to close for face hereinafter with reference to Fig. 5 and Fig. 6 The method of key point detection.

Fig. 5 shows the schematic flow diagram for face critical point detection, and Fig. 6 shows that training unit 200 executes more The schematic flow diagram of the training process of task convolutional neural networks.

In fig. 5 and fig., method 500 and 600 include can be by one or more of processor 102 to 106 processor The series of steps that each module/unit execute or system 1000 executes, to implement data processing operation.For description Purpose, for being formed by the combination of hardware or hardware and software by each module/unit of system 1000 It is described.It will be understood by one of ordinary skill in the art that other suitable devices or system are applicable to implement following process, and And system 1000 is intended merely as implementing the illustration of the process.

As shown in figure 5, in step S501, feature extractor 100 is carried from least one human face region of facial image Take multiple characteristic patterns.In another embodiment, in step S501, multiple characteristic patterns can be extracted from entire facial image.With Afterwards, in step S502, shared face feature vector is generated in the multiple characteristic patterns extracted from step S501.In step In rapid S503, the face key point of facial image is predicted in the shared face feature vector generated from step S502 It sets.According to another embodiment, shared face feature vector can be used to predict associated with face critical point detection at least one The correspondence target of a nonproductive task.Then, while the target predictions of all nonproductive tasks is obtained.

According to presently filed embodiment, feature extractor includes convolutional neural networks, which includes more A convolution-pond layer and full articulamentum.Each of convolution-pond layer is configured to execute the operation of convolution sum maximum pondization.? In the embodiment, in step S501, can multiple characteristic patterns continuously be extracted by multiple convolution-pond layer, wherein by convolution- The characteristic pattern of preceding layer extraction in the layer of pond is input to next layer of convolution-pond layer, with the feature extracted with previously extracted Scheme different characteristic patterns.In step S502, all multiple characteristic patterns that can be extracted from step S501 by full articulamentum It is middle to generate shared face feature vector.

In this embodiment, method 500 further includes training step (being not shown in Fig. 5), the training step will with reference to figure 6 into Row is discussed.

As shown in fig. 6, in step s 601, to training facial image, its true key point of calibration from predetermined training set Position is used for the calibration real goal of each nonproductive task with it and is sampled.For training facial image, in step S602, The target prediction of its face key point prediction and all nonproductive tasks can be correspondingly obtained from fallout predictor 300.Then, in step In rapid S603, compare the difference between predicted face key point position and the true key point position of calibration, to generate key Point tolerance.In step s 604, be respectively compared target prediction and between the calibration real goal of each nonproductive task not Together, to generate at least one training mission error.Then, in step s 605, by the crucial point tolerance generated and all Training mission error back propagation is by convolutional neural networks, the power of the connection between neuron to adjust convolutional neural networks Weight.In step S606, determine whether one in nonproductive task restrained.If "No", process 600 returns to step S606.If "Yes", the training process of the task stops in step S607 and proceeds to step S608.In step S608 In, determine whether the training process of face critical point detection restrains.If "Yes", process 600 terminates, and otherwise, process 600 is returned Return to step S601.

Therefore, face critical point detection can optimize together with isomery but delicate relevant task.

Although the preferred embodiment of the present invention has been described, after understanding basic conception of the present invention, the technology of fields Personnel can be changed or change to these examples.The appended claims are intended to preferred including falling within the scope of the present invention Example and all changes or change.

Obviously, without departing from the spirit and scope of the present invention, those skilled in the art can be to the present invention It is changed or changes.Therefore, if these variations or change belong to the range of claims and equivalence techniques, they Also it can fall within the scope of the present invention.

Claims

1. a kind of method for detecting the face key point of facial image is executed including the use of convolutional neural networks：

Multiple characteristic patterns are extracted from least one human face region of the facial image；

Shared face feature vector is generated from the multiple characteristic patterns extracted；And

The face key point position of the facial image is predicted from the shared face feature vector generated, and utilizes institute State shared face feature vector predict with the corresponding target of the associated at least one nonproductive task of face critical point detection, To obtain the target prediction of all nonproductive tasks simultaneously,

Wherein, the convolutional neural networks are trained using predetermined training set, and the step of the training includes：

1) to training, facial image, it demarcates true key point position and it is used for each auxiliary and appoints from the predetermined training set The calibration real goal of business is sampled；

2) compare the difference between predicted face key point position and the true key point position of calibration, to generate key Point tolerance；

3) it is respectively compared the target prediction and the difference demarcated between real goal for each nonproductive task, to generate extremely A few training mission error；And

4) by the crucial point tolerance generated and the training mission error back propagation that is generated by the convolutional neural networks, The weight of the connection between neuron to adjust the convolutional neural networks；

Repeat step 1) -4), until the training that the crucial point tolerance of the generation is less than first predetermined value and the generation is appointed Error of being engaged in is less than second predetermined value.

2. according to the method described in claim 1, the wherein described face key point include in the group being made up of extremely It is one few：Eyes central point, nose and the corners of the mouth of facial image.

3. according to the method described in claim 1, the wherein described nonproductive task include in the group being made up of at least One：Head pose estimation, Gender Classification, age estimation, facial expression recognition or face character are inferred.

4. according to the method described in claim 3, wherein, the convolutional neural networks include being configured to execute convolution sum maximum Multiple convolution-pond layer of pondization operation, and

Wherein, extracting multiple characteristic patterns from least one human face region of the facial image further includes：

The multiple characteristic pattern is continuously extracted by the multiple convolution-pond layer, wherein by before in the convolution-pond layer The characteristic pattern of one layer of extraction is input to next layer of the convolution-pond layer, not with extraction and the characteristic pattern previously extracted Same characteristic pattern.

5. according to the method described in claim 4, the wherein described convolutional neural networks further include full articulamentum, and from being carried When generating shared face feature vector in the multiple characteristic patterns taken, by the full articulamentum from all multiple spies extracted The shared face feature vector is generated in sign figure.

6. according to the method described in claim 5, each layer of the wherein described convolutional neural networks have multiple neurons, and Wherein the method further includes：

The convolutional neural networks are trained using predetermined training set, with adjust the convolutional neural networks the neuron it Between connection each weight so that generating the shared people by the convolutional neural networks with the weight being adjusted Face feature vector.

7. being held according to Least Square in Processing according to the method described in claim 1, wherein comparing with generating crucial point tolerance Row, and compare and executed with generating training mission error according to cross entropy processing.

8. according to the method described in claim 1, being wherein directed to each nonproductive task, the step of the training further includes：

5) from predetermined authentication concentrate to verification facial image and its be used for the calibration real goal of each nonproductive task and take Sample；

6) difference between the target prediction and the calibration real goal, to generate validation task error；

Step 5) and 6) is repeated, until the training mission error of the generation is less than third predetermined value and the verification generated is appointed Error of being engaged in is less than the 4th predetermined value.

9. according to the method described in claim 1, wherein, the people is being predicted from the shared face feature vector generated When the face key point position of face image, the face key point position of the facial image is according to (W^r)^Tx^lIt determines,

Wherein W^rRepresent the weight for being assigned to the face critical point detection, and x^lThe shared face characteristic vector is represented, And T represents transposition.

10. a kind of system for detecting the face key point of facial image, including：

Feature extractor is configured to execute using convolutional neural networks：

Multiple characteristic patterns are extracted from least one human face region of the facial image；And

Fallout predictor is configured to from the shared face feature vector generated by the feature extractor described in prediction The face key point position of facial image, and obtained and face key point by using the shared face feature vector The target prediction of associated at least one nonproductive task is detected,

Wherein, each layer of the convolutional neural networks has multiple neurons, and the system also includes training units, use In the training convolutional neural networks, so that the convolutional neural networks trained can extract the shared face characteristic Vector, the training unit include：

Sampler is configured to from predetermined training set to training facial image, it demarcates true key point position and it is used It is sampled in the calibration real goal of each nonproductive task；

Comparator is configured to compare between predicted face key point position and the true key point position of calibration Difference, to generate crucial point tolerance, and be respectively compared the target prediction and for each nonproductive task the calibration it is true Difference between real target, to generate at least one training mission error；And

Backpropagation device, is configured to the crucial point tolerance that will be generated and the training mission error back propagation passes through institute Convolutional neural networks are stated, the weight of the connection between the neuron to adjust the convolutional neural networks.

11. system according to claim 10, wherein the convolutional neural networks include：

Multiple convolution ponds layer is configured to execute the operation of convolution sum maximum pondization, and wherein by convolution pond layer In the characteristic pattern of preceding layer extraction be input in next layer of convolution pond layer, with extraction and the previous spy that had extracted Sign schemes different characteristic patterns；And

Full articulamentum, be configured to generate from multiple characteristic patterns of all extractions the shared face characteristic to Amount.

12. system according to claim 10, wherein the training unit further includes：

Determiner, whether the training process for being configured to determine the face critical point detection restrains and the instruction of each task Practice whether process restrains.

13. system according to claim 10, wherein the face key point includes in the group being made up of It is at least one：The center of the eyes of facial image, nose, the corners of the mouth.

14. system according to claim 10, wherein the nonproductive task include in the group being made up of extremely It is one few：Head pose estimation, Gender Classification, age estimation, facial expression recognition or face character are inferred.

15. a kind of method for training convolutional neural networks, the convolutional neural networks are performed simultaneously face critical point detection With at least one associated nonproductive task to obtain the target prediction of the nonproductive task, the described method comprises the following steps：

1) to training, facial image, it demarcates true key point position and it is used for each nonproductive task from predetermined training set Calibration real goal is sampled；

3) target prediction is respectively compared and for the difference between the calibration real goal of each nonproductive task, with life At at least one training mission error；

4) the crucial point tolerance generated and all training mission error back propagations are passed through into the convolutional Neural net Network, the weight of the connection between neuron to adjust the convolutional neural networks；

7) determine whether the training mission error is less than first predetermined value and whether the validation task error is less than second Predetermined value；And

If so, terminating for training the convolutional neural networks, otherwise, by repeating said steps 1) to 7).