CN106575367B - Method and system for multi-task based facial keypoint detection - Google Patents
Method and system for multi-task based facial keypoint detection Download PDF Info
- Publication number
- CN106575367B CN106575367B CN201480081241.1A CN201480081241A CN106575367B CN 106575367 B CN106575367 B CN 106575367B CN 201480081241 A CN201480081241 A CN 201480081241A CN 106575367 B CN106575367 B CN 106575367B
- Authority
- CN
- China
- Prior art keywords
- face
- training
- convolutional neural networks
- keypoint
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/165—Detection; Localisation; Normalisation using facial parts and geometric relationships
Abstract
The application discloses a method and system for detecting facial keypoints of a face image. The method may include: extracting a plurality of feature maps from at least one face region and/or the entire face image; generating a shared facial feature vector from the extracted feature maps; and predicting the facial keypoint positions of the face image from the generated shared facial feature vector. With the method and system of the invention, facial keypoint detection can be optimized jointly with heterogeneous but subtly related tasks, so that detection reliability can be improved through multi-task learning.
Description
Technical field
This application relates to face alignment and, more particularly, to a method and system for facial keypoint (landmark) detection.
Background technology
Facial keypoint detection is fundamental to many face analysis tasks (such as face attribute inference, face verification, and face recognition), but it has long been hampered by occlusion and pose-variation problems.
Accurate facial keypoint detection can be performed with cascaded CNNs (convolutional neural networks), in which the face is pre-partitioned into different parts and each part is processed by a separate deep CNN. The resulting outputs are then averaged and passed to separate cascade layers that handle each facial keypoint individually.
Moreover, facial keypoint detection is not an isolated problem; its estimation can be influenced by many heterogeneous but subtly correlated factors. For example, when a child is smiling, his or her mouth opens wide. Effectively discovering and exploiting such an intrinsically correlated facial attribute helps detect the mouth corners more accurately. Likewise, in a face with large yaw rotation, the distance between the two eyes is smaller. Such pose information can serve as an additional source of information to constrain the solution space of keypoint estimation. Given a rich set of reasonably related tasks, treating facial keypoint detection in isolation can be counterproductive.
However, different tasks can be inherently different in learning difficulty and can have different convergence rates. In addition, when learned simultaneously, certain tasks may over-fit earlier than others, thereby harming the convergence of the entire model.
Summary of the invention
In one aspect of the application, a method for detecting facial keypoints of a face image is disclosed. The method may be executed using a convolutional neural network and may include: extracting a plurality of feature maps from at least one face region of the face image; generating a shared facial feature vector from the extracted feature maps; predicting the facial keypoint positions of the face image from the generated shared facial feature vector; and predicting, using the shared facial feature vector, the corresponding targets of at least one auxiliary task associated with the facial keypoint detection, so that the target predictions of all auxiliary tasks are obtained simultaneously. The convolutional neural network is trained with a predetermined training set, and the training includes the following steps: 1) sampling, from the predetermined training set, a training face image, its ground-truth keypoint positions, and its ground-truth target for each auxiliary task; 2) comparing the predicted facial keypoint positions with the ground-truth keypoint positions to generate a keypoint error; 3) comparing, for each auxiliary task, the target prediction with the ground-truth target to generate at least one training task error; and 4) back-propagating the generated keypoint error and the generated training task errors through the convolutional neural network to adjust the weights of the connections between neurons of the convolutional neural network. Steps 1)-4) are repeated until the generated keypoint error is less than a first predetermined value and the generated training task errors are less than a second predetermined value.
In one embodiment, the facial keypoints include at least one selected from the group consisting of: the eye centers, the nose tip, and the mouth corners of the face image.
In one embodiment, the auxiliary tasks include at least one selected from the group consisting of: head pose estimation, gender classification, age estimation, facial expression recognition, and facial attribute inference.
In one embodiment, the convolutional neural network includes a plurality of convolution-pooling layers configured to perform convolution and max-pooling operations, and extracting the plurality of feature maps from the at least one face region of the face image further includes: extracting the feature maps successively through the plurality of convolution-pooling layers, wherein the feature maps extracted by a preceding convolution-pooling layer are input to the next convolution-pooling layer so as to extract feature maps different from those previously extracted.
In one embodiment, the convolutional neural network further includes a fully connected layer, and when the shared facial feature vector is generated from the extracted feature maps, it is generated by the fully connected layer from all of the extracted feature maps.
In one embodiment, each layer of the convolutional neural network has a plurality of neurons, and the method further includes: training the convolutional neural network with a predetermined training set to adjust each weight of the connections between neurons of the convolutional neural network, so that the shared facial feature vector is generated by the convolutional neural network with the adjusted weights.
In one embodiment, the comparison that generates the keypoint error is performed by a least-squares process, and the comparison that generates the training task errors is performed by a cross-entropy process.
In one embodiment, for each auxiliary task, training the convolutional neural network with the predetermined training set further includes the following steps: 5) sampling, from a predetermined validation set, a validation face image and its ground-truth target for each auxiliary task; and 6) comparing the target prediction with the ground-truth target to generate a validation task error. Steps 5) and 6) are repeated until the generated training task error is less than a third predetermined value and the generated validation task error is less than a fourth predetermined value.
In one embodiment, when the facial keypoint positions of the face image are predicted from the generated shared facial feature vector, the facial keypoint positions are determined according to (W^r)^T x^l, where W^r represents the weights assigned to the facial keypoint detection, x^l represents the shared facial feature vector, and T represents the transpose.
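A minimal sketch of this prediction step in plain Python (the toy dimensions and numbers below are invented for illustration; in practice W^r would have one column per output coordinate and as many rows as the shared feature vector has dimensions):

```python
def predict_keypoints(W_r, x_l):
    """Predict keypoint coordinates as (W^r)^T x^l.

    W_r: weight matrix given as a list of columns, one column per output
         coordinate; each column has the dimensionality of x_l.
    x_l: shared facial feature vector.
    Returns a flat list of predicted coordinates, e.g. [x1, y1, ..., x5, y5].
    """
    return [sum(w_j * f_j for w_j, f_j in zip(col, x_l)) for col in W_r]

# Toy example: 3-dimensional shared feature vector, 2 output coordinates.
W_r = [[1.0, 0.0, 0.5],   # column producing the first coordinate
       [0.0, 2.0, 0.0]]   # column producing the second coordinate
x_l = [2.0, 3.0, 4.0]
coords = predict_keypoints(W_r, x_l)
print(coords)  # [4.0, 6.0]
```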
In another aspect of the application, a system for detecting facial keypoints of a face image is disclosed. The system may include a feature extractor and a predictor. The feature extractor may use a convolutional neural network to extract a plurality of feature maps from at least one face region of the face image and to generate a shared facial feature vector from the extracted feature maps. The predictor may predict the facial keypoint positions of the face image from the shared facial feature vector generated by the feature extractor, and may obtain, by using the shared facial feature vector, the target predictions of at least one auxiliary task associated with the facial keypoint detection. Each layer of the convolutional neural network has a plurality of neurons, and the system further includes a training unit for training the convolutional neural network so that the trained convolutional neural network can extract the shared facial feature vector. The training unit includes: a sampler configured to sample, from a predetermined training set, a training face image, its ground-truth keypoint positions, and its ground-truth target for each auxiliary task; a comparator configured to compare the predicted facial keypoint positions with the ground-truth keypoint positions to generate a keypoint error, and to compare, for each auxiliary task, the target prediction with the ground-truth target to generate at least one training task error; and a back-propagator configured to back-propagate the generated keypoint error and training task errors through the convolutional neural network to adjust the weights of the connections between neurons of the convolutional neural network.
In one embodiment, the convolutional neural network includes: a plurality of convolution-pooling layers configured to perform convolution and max-pooling operations, wherein the feature maps extracted by a preceding convolution-pooling layer are input to the next convolution-pooling layer so as to extract feature maps different from those previously extracted; and a fully connected layer configured to generate the shared facial feature vector from all of the extracted feature maps.
In one embodiment, the training unit further includes a determiner configured to determine whether the training process of the facial keypoint detection converges and whether the training process of each task converges.
In one embodiment, the facial keypoints include at least one selected from the group consisting of: the eye centers, the nose tip, and the mouth corners of the face image.
In one embodiment, the auxiliary tasks include at least one selected from the group consisting of: head pose estimation, gender classification, age estimation, facial expression recognition, and facial attribute inference.
The application also provides a method of training a convolutional neural network to perform facial keypoint detection and at least one associated auxiliary task simultaneously, so as to obtain the target predictions of the auxiliary tasks. The method may include: 1) sampling, from a predetermined training set, a training face image, its ground-truth keypoint positions, and its ground-truth target for each auxiliary task; 2) comparing the predicted facial keypoint positions with the ground-truth keypoint positions to generate a keypoint error; 3) comparing, for each auxiliary task, the target prediction with the ground-truth target to generate at least one training task error; 4) back-propagating the generated keypoint error and all training task errors through the convolutional neural network to adjust the weights of the connections between neurons of the convolutional neural network; 5) sampling, from a predetermined validation set, a validation face image and its ground-truth target for each auxiliary task; 6) comparing the target prediction with the ground-truth target to generate a validation task error; and 7) determining whether the training task errors are less than a first predetermined value and whether the validation task errors are less than a second predetermined value. If so, training of the convolutional neural network ends; otherwise, steps 1) to 7) are repeated.
The present invention also provides a computer-readable medium for storing instructions, the instructions being executable by one or more processors to implement the method described above.
Compared with conventional methods, facial keypoint detection can be optimized jointly with heterogeneous but subtly related auxiliary tasks, so that detection reliability can be improved through multi-task learning, especially when dealing with faces with severe occlusion and pose variation.
According to the application, only a single CNN is used, so the complexity of the required system/device can be reduced. Neither a pre-partitioning of the face nor cascaded convolutional neural layers are needed, which greatly reduces model complexity while still achieving comparable or even better accuracy.
As training proceeds, certain related tasks stop being beneficial to the main task once they reach their best performance, so their training processes can be halted. According to the application, the training of the CNN is performed with "early stopping" to halt those related tasks that begin to over-fit the training set and thereby harm the main task, thus facilitating learning convergence.
Description of the drawings
Exemplary non-limiting embodiments of the invention are described below with reference to the accompanying drawings. The drawings are illustrative and are generally not drawn to exact scale. The same or similar elements in different figures are referenced with the same reference numerals.
Fig. 1 is a schematic diagram showing a system for facial keypoint detection according to some disclosed embodiments.
Fig. 2 is a schematic diagram showing the training unit shown in Fig. 1 according to some disclosed embodiments.
Fig. 3 is a schematic diagram showing an example of the system for facial keypoint detection according to some disclosed embodiments, in which an example of the convolutional neural network is shown.
Fig. 4 is a schematic diagram showing the system for facial keypoint detection according to some disclosed embodiments when implemented in software.
Fig. 5 is a schematic flow diagram showing a method for facial keypoint detection according to some disclosed embodiments.
Fig. 6 is an exemplary flow diagram showing the training process of the multi-task convolutional neural network according to some disclosed embodiments.
Detailed description
Exemplary embodiments are explained in detail in this section, and examples of these embodiments are illustrated in the drawings. Wherever appropriate, the same reference numerals refer to the same or similar parts throughout the drawings.
Fig. 1 is a schematic diagram showing an exemplary system 1000 for facial keypoint detection according to some disclosed embodiments. In system 1000, facial keypoint detection (hereinafter also referred to as the main task) and at least one related/auxiliary task are optimized jointly. Facial keypoint detection refers to 2D position detection, that is, detection of the 2D coordinates (x and y) within the face region of a face image. Examples of facial keypoints may include, but are not limited to, the left and right eye centers, the nose tip, and the left and right mouth corners of a face image. Examples of auxiliary tasks may include, but are not limited to, head pose estimation, demographic classification (such as gender classification), age estimation, facial expression recognition (such as smiling), and facial attribute inference (such as wearing glasses). It will be appreciated that the number or types of auxiliary tasks are not limited to those mentioned herein.
Referring again to Fig. 1, when system 1000 is implemented in hardware, it may include a feature extractor 100, a training unit 200, and a predictor 300. The feature extractor 100 may extract a plurality of feature maps from at least one face region and/or the entire face image, and may then generate a shared facial feature vector from the extracted feature maps.
The predictor 300 may predict the facial keypoint positions of the face image from the shared facial feature vector extracted by the feature extractor 100. Meanwhile, the predictor 300 may further predict, from the shared facial feature vector, the corresponding targets of at least one auxiliary task associated with the facial keypoint detection. In system 1000, facial keypoint detection can thus be optimized jointly with the auxiliary tasks.
According to an embodiment, the feature extractor 100 may include a convolutional neural network. The network may include a plurality of convolution-pooling layers and a fully connected layer. In the network, each of the convolution-pooling layers performs convolution and max-pooling operations, and the feature maps extracted by a preceding convolution-pooling layer are input to the next convolution-pooling layer to extract feature maps different from those previously extracted. The fully connected layer may generate the shared facial feature vector from all of the extracted feature maps.
Fig. 3 shows an example of the network, in which the convolutional neural network includes an input layer, a plurality of (for example, three) convolution-pooling layers, one convolutional layer, and one fully connected layer, wherein the convolution-pooling layers include one or more (for example, three) convolutional layers and one or more (for example, three) pooling layers. It should be noted that the network shown is an example, and the convolutional neural network in the feature extractor is not limited thereto. As shown in Fig. 3, a (for example) 40 × 40 grayscale face image is fed into the input layer. The first convolution-pooling layer extracts feature maps from the input image. The second convolution-pooling layer then takes the output of the first layer as its input to generate different feature maps. This process continues through all three convolution-pooling layers. Finally, the feature maps of the multiple layers are used by the fully connected layer to generate the shared facial feature vector. In other words, the shared facial feature vector is generated by performing multiple convolution and max-pooling operations. Each layer contains a plurality of neurons with local or global receptive fields, and the weights of the connections between the neurons of the convolutional neural network can be adjusted accordingly to train the network.
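The spatial bookkeeping of such a stack can be sketched as follows. The kernel and pooling sizes used here (5 × 5, 3 × 3, 3 × 3, and 2 × 2 convolutions with 2 × 2 max-pooling, no padding) are illustrative assumptions, not values fixed by the application; the sketch only traces how a 40 × 40 input shrinks layer by layer before the fully connected layer:

```python
def conv_out(size, kernel):
    # Valid (unpadded) convolution with stride 1.
    return size - kernel + 1

def pool_out(size, window=2):
    # Non-overlapping max-pooling.
    return size // window

def trace_shapes(size=40, kernels=(5, 3, 3), final_conv=2):
    """Trace the spatial size through three convolution-pooling layers
    and a final convolutional layer, as in the example network of Fig. 3."""
    sizes = [size]
    for k in kernels:               # three convolution-pooling layers
        size = pool_out(conv_out(size, k))
        sizes.append(size)
    sizes.append(conv_out(size, final_conv))  # last convolutional layer
    return sizes

# 40 -> 18 -> 8 -> 3 -> 2 under these assumed kernel sizes; the final
# small maps are then flattened into the shared facial feature vector.
print(trace_shapes())  # [40, 18, 8, 3, 2]
```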
According to an embodiment, the system 1000 may further include the training unit 200. The training unit 200 may use a predetermined training set to train the feature extractor, adjusting the weights of the connections between neurons of the convolutional neural network so that the trained feature extractor can extract the shared facial feature vector. According to the embodiment shown in Fig. 2, the training unit 200 may include a sampler 201, a comparator 202, and a back-propagator 203.
As shown in Fig. 2, the sampler 201 may sample, from the predetermined training set, a training face image, its ground-truth keypoint positions, and its ground-truth target for each auxiliary task. According to an embodiment, the five ground-truth keypoints (that is, the eye centers, the nose tip, and the mouth corners) can be annotated directly on each training face image. According to another embodiment, the ground-truth target for each auxiliary task can be labeled manually. For example, for gender classification, the ground-truth target can be labeled as female (F) or male (M). For facial attribute inference, such as wearing glasses, the ground-truth target can be labeled as wearing (Y) or not wearing (N). For head pose estimation, five poses (0°, ±30°, ±60°) can be labeled, and for expression recognition, such as smiling, yes/no can be labeled accordingly.
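A minimal encoding of these annotations might look like the sketch below; the field names and integer codes are illustrative assumptions, not part of the application:

```python
# Assumed integer encodings for the auxiliary-task ground truths.
GENDER = {"F": 0, "M": 1}
GLASSES = {"N": 0, "Y": 1}
SMILE = {"no": 0, "yes": 1}
POSE = {-60: 0, -30: 1, 0: 2, 30: 3, 60: 4}  # five head poses in degrees

def encode_sample(keypoints, gender, glasses, smile, pose):
    """Bundle one training sample: ten keypoint coordinates
    (x1, y1, ..., x5, y5) plus one class index per auxiliary task."""
    assert len(keypoints) == 10, "five (x, y) keypoints expected"
    return {
        "keypoints": list(keypoints),
        "gender": GENDER[gender],
        "glasses": GLASSES[glasses],
        "smile": SMILE[smile],
        "pose": POSE[pose],
    }

sample = encode_sample([14, 17, 26, 17, 20, 24, 15, 30, 25, 30],
                       gender="F", glasses="N", smile="yes", pose=30)
print(sample["pose"])  # 3
```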
The comparator 202 may compare the predicted facial keypoint positions with the ground-truth keypoint positions to generate a keypoint error. The keypoint error can be obtained by using, for example, the least-squares method. The comparator 202 may also compare, for each auxiliary task, the target prediction with the ground-truth target to generate at least one training task error. According to another embodiment, the training task error can be obtained by using, for example, the cross-entropy method.
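The two error types can be sketched in plain Python as follows; this assumes the auxiliary-task prediction is already a probability distribution over classes, and the numbers are invented for illustration:

```python
import math

def keypoint_error(pred, truth):
    """Least-squares keypoint error: sum of squared coordinate differences."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth))

def task_error(probs, true_class):
    """Cross-entropy error for one auxiliary task: -log p(true class)."""
    return -math.log(probs[true_class])

kp_err = keypoint_error([10.0, 12.0], [10.0, 14.0])  # 0^2 + (-2)^2 = 4.0
t_err = task_error([0.25, 0.75], true_class=1)       # -log(0.75)
print(kp_err)  # 4.0
```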
The back-propagator 203 may back-propagate the generated keypoint error and all training task errors through the convolutional neural network to adjust the weights of the connections between neurons of the convolutional neural network.
According to an embodiment, the training unit 200 may further include a determiner 204. The determiner 204 can determine whether the training process of the facial keypoint detection converges. According to another embodiment, the determiner 204 may further determine whether the training process of each task converges, as will be discussed later.
Hereinafter, the components of the training unit 200 mentioned above are discussed in detail. For purposes of illustration, an embodiment in which T tasks are trained jointly by the training unit 200 will be described. Among the T tasks, the facial keypoint detection (that is, the main task) is denoted r, and each of the at least one related/auxiliary task is denoted a, where a ∈ A.
For each task, the training data are denoted (x_i, y_i^t), with t = 1, ..., T and i = 1, ..., N, where N represents the number of training samples. Specifically, for the facial keypoint detection r, the training data are denoted (x_i, y_i^r), where y_i^r ∈ R^10 holds the 2D coordinates of the five keypoints. For a task a, the training data are denoted (x_i, y_i^a). In the embodiment, four tasks p, g, w, and s are shown, which respectively represent the inference of 'pose', 'gender', 'wearing glasses', and 'smiling'. Accordingly, y_i^p ∈ {0°, ±30°, ±60°} represents one of five different poses, while y_i^g, y_i^w, y_i^s ∈ {0, 1} are binary attributes that respectively represent female/male, not wearing/wearing glasses, and absence/presence of a smile. Different weights are assigned to the main task r and each auxiliary task a, denoted W^r and {W^a}.
The objective function over all tasks, which jointly optimizes the main task r and the auxiliary tasks a, is then formulated as:
argmin_{{w^t}} Σ_{t=1..T} λ^t Σ_{i=1..N} l(y_i^t, f(x_i; w^t))    (1)
where f(x_i; w^t) is a linear function of x_i with weight vector w^t; l(·) represents a loss function; λ^a represents the importance coefficient of the error of the a-th task; and x_i represents the shared facial feature vector.
According to an embodiment, the least-squares function and the cross-entropy function are used as the loss functions l(·) of the main task r and of the auxiliary tasks a, respectively, to generate the corresponding keypoint error and training task errors. The objective function above can therefore be rewritten as:
E = Σ_{i=1..N} ||y_i^r − (W^r)^T x_i||^2 − Σ_{a∈A} λ^a Σ_{i=1..N} log p(y_i^a | x_i; W^a) + Σ_t ||W^t||^2    (2)
In equation (2), f(x_i; W^r) = (W^r)^T x_i in the first term is a linear function. The second term involves the posterior probability function p(y^a = j | x; W^a) = exp((w_j^a)^T x) / Σ_l exp((w_l^a)^T x), where w_j^a denotes the j-th column of the weight matrix of task a. The third term penalizes large weights W = {W^r, {W^a}}.
According to an embodiment, the weights of all tasks can be updated accordingly. Specifically, the weight matrix of the facial keypoint detection is updated by W^r = W^r − η ∂E/∂W^r, where η represents the learning rate (for example, η = 0.003) and ∂E/∂W^r is the gradient of equation (2) with respect to W^r. In addition, the weight matrix of each task a can be updated in a similar fashion as W^a = W^a − η ∂E/∂W^a.
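A toy version of this update for the linear landmark regressor (one feature dimension and one coordinate, so the weight matrix reduces to a scalar; the data points below are invented for illustration) shows the squared error shrinking under repeated steps W^r ← W^r − η ∂E/∂W^r:

```python
def sgd_linear(samples, w=0.0, eta=0.003, steps=200):
    """Gradient descent on E = sum_i (y_i - w * x_i)^2 for a scalar weight."""
    for _ in range(steps):
        grad = sum(-2 * x * (y - w * x) for x, y in samples)
        w -= eta * grad
    return w

def squared_error(samples, w):
    return sum((y - w * x) ** 2 for x, y in samples)

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]  # roughly y = 2x
w = sgd_linear(data)
print(squared_error(data, w) < squared_error(data, 0.0))  # True
```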
The generated keypoint error and training task errors can then be back-propagated, layer by layer, by the back-propagator 203 through the convolutional neural network down to the lowest layer, so as to adjust the weights of the connections between neurons of the convolutional neural network.
According to an embodiment of the application, the errors can be back-propagated through the convolutional neural network with the following back-propagation rule:
ε^l = σ'(x^l) ∘ (W^{l+1} ε^{l+1})    (3)
In equation (3), ε^l represents all the errors in layer l, where l = 1, ..., L indexes the layers. For example, ε^1 represents the error of the lowest layer, and ε^2 represents the error of the second-lowest layer. The error of a lower layer is calculated from that of the layer above it according to equation (3), where σ'(·) is the gradient of the activation function σ(·) of the network and ∘ denotes the element-wise product.
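A small numeric sketch of this rule for a sigmoid activation; the weights, activations, and upper-layer error below are invented for illustration:

```python
def sigmoid_grad(x):
    # For sigma(z) = 1 / (1 + exp(-z)), sigma'(z) = x * (1 - x)
    # when x = sigma(z) is the layer's activation value.
    return x * (1 - x)

def backprop_error(W_next, eps_next, x_l):
    """eps_l = sigma'(x_l) o (W_{l+1} eps_{l+1}); W_next is given as
    the rows of the next layer's weight matrix."""
    propagated = [sum(row[j] * eps_next[j] for j in range(len(eps_next)))
                  for row in W_next]
    return [sigmoid_grad(x) * p for x, p in zip(x_l, propagated)]

W_next = [[0.5, -1.0],    # 2 neurons in layer l, 2 neurons in layer l+1
          [1.0, 0.0]]
eps_next = [0.2, 0.1]     # error in layer l+1
x_l = [0.5, 0.5]          # activations of layer l
eps_l = backprop_error(W_next, eps_next, x_l)
print(eps_l)  # [0.0, 0.05]
```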
The training process above is repeated until the determiner 204 determines that the training process of the facial keypoint detection has converged. In other words, if the error is less than a predetermined value, the training process is deemed to have converged. Through the training process above, the feature extractor 100 can extract the shared facial feature vector from a given face image. According to an embodiment, for any face image x^0, the trained feature extractor 100 extracts the shared feature vector x^l. The keypoint positions are then predicted by (W^r)^T x^l, and the target predictions of the auxiliary tasks are obtained by p(y^a | x^l; W^a).
During the training process described above, the at least one auxiliary task is trained at the same time. However, different tasks have different loss functions and learning difficulties, and therefore different convergence rates. According to another embodiment, the determiner 204 may further determine whether the training process of each auxiliary task converges.
Specifically, let E_val^a and E_tr^a respectively represent the values of the loss function of task a on the validation set and on the training set. If the measure below for a task exceeds a threshold ε, that task is stopped:
(k · med_{j∈(t−k, t]} E_tr^a(j)) / (Σ_{j∈(t−k, t]} E_tr^a(j) − k · min_{j∈(t−k, t]} E_tr^a(j)) × λ^a (E_val^a(t) − min_{j≤t} E_val^a(j)) / min_{j≤t} E_val^a(j) > ε    (4)
In equation (4), t represents the current iteration, k represents the training length, and λ^a represents the importance coefficient of the error of the a-th task. 'med' denotes the function that computes the median. The first term in equation (4) represents the trend of the training error of task a. If the training error drops rapidly within a window of length k, the value of the first term is small, indicating that the training of the task can continue because the task is still valuable. Otherwise, the first term is large and the task is more likely to be stopped. The second term grows with the gap between the current validation error and the best validation error observed so far. During the training process, an auxiliary task can therefore be switched off before it over-fits, so that the task is "stopped early", before it begins to over-fit the training set and thereby harm the main task.
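The stopping measure can be sketched as follows. This is a plain-Python reading of the criterion with an invented error history; the window length k, weight λ^a, and threshold ε are illustrative, and the sketch assumes the training error is not constant over the window (otherwise the denominator would be zero):

```python
from statistics import median

def should_stop(E_tr, E_val, k, lam, eps):
    """Early-stopping measure for one auxiliary task: a training-error
    trend term times a generalization term, compared against eps."""
    window = E_tr[-k:]
    trend = k * median(window) / (sum(window) - k * min(window))
    best_val = min(E_val)
    general = lam * (E_val[-1] - best_val) / best_val
    return trend * general > eps

# Flat training error and rising validation error: large measure, stop.
stop_tr = [1.00, 0.99, 0.99, 0.98, 0.98]
stop_val = [1.00, 0.95, 0.96, 0.99, 1.02]
# Rapidly falling training error, validation error at its minimum: continue.
go_tr = [1.0, 0.8, 0.6, 0.4, 0.2]
go_val = [1.0, 0.8, 0.6, 0.5, 0.4]
print(should_stop(stop_tr, stop_val, k=5, lam=1.0, eps=2.0))  # True
print(should_stop(go_tr, go_val, k=5, lam=1.0, eps=2.0))      # False
```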
Through the training process described above, the feature extractor 100 can extract the shared facial feature vector from any face image. For example, a face image x^0 is fed into the input layer of the convolutional neural network, for example, as shown in Fig. 3. Each convolutional layer of the CNN contains multiple sets of convolution filters and an activation function applied to the face image, and they are applied successively to project the face image to a higher level. In other words, the shared facial feature vector x^l is obtained, and the face image is projected to higher levels step by step, by learning a sequence of nonlinear mappings:
x^l = σ((W^l)^T x^{l−1}),  l = 1, ..., L    (5)
Here, σ(·) and W^l respectively denote the nonlinear activation function applied to the face image and the filters to be learned in layer l of the CNN. Referring again to Fig. 3, in the estimation stage, the shared facial feature vector can be used simultaneously for keypoint detection and the auxiliary/related tasks.
It will be appreciated that system 1000 can be implemented with certain hardware, software, or a combination thereof. In addition, embodiments of the invention may be adapted to a computer program product embodied on one or more computer-readable storage media containing computer program code (including, but not limited to, disk storage, CD-ROM, optical memory, and the like).
When system 1000 is implemented in software, it may include a general-purpose computer, a computer cluster, a mainstream computer, a computing device dedicated to providing online content, or a computer network including a group of computers operating in a centralized or distributed fashion. As shown in Fig. 4, system 1000 may include one or more processors (processors 102, 104, 106, and so on), a memory 112, a storage device 116, and a bus facilitating information exchange between the various components of system 1000. The processors 102-106 may include a central processing unit ("CPU"), a graphics processing unit ("GPU"), or other suitable information-processing devices. Depending on the type of hardware used, the processors 102-106 may include one or more printed circuit boards and/or one or more microprocessor chips. The processors 102-106 can execute sequences of computer program instructions to perform the various methods explained in greater detail below.
The memory 112 may include, among other things, random access memory ("RAM") and read-only memory ("ROM"). Computer program instructions can be stored in, accessed from, and read from the memory 112 for execution by one or more of the processors 102-106. For example, the memory 112 may store one or more software applications. Furthermore, the memory 112 may store an entire software application or only the part of a software application that is executable by one or more of the processors 102-106. It should be noted that although only one block is shown in Fig. 4, the memory 112 may include multiple physical devices mounted on a central computing device or on different computing devices.
The system for facial landmark detection has been described above. A method for facial landmark detection is described below with reference to Fig. 5 and Fig. 6.
Fig. 5 shows a schematic flowchart for facial landmark detection, and Fig. 6 shows a schematic flowchart of the process by which the training unit 200 trains the multi-task convolutional neural network.
In Fig. 5 and Fig. 6, methods 500 and 600 comprise a series of steps that may be performed by one or more of processors 102-106, by each module/unit of system 1000, or by system 1000 as a whole, to implement a data processing operation. For purposes of description, the following discussion assumes that each module/unit of system 1000 is formed by hardware or by a combination of hardware and software. Those skilled in the art will appreciate that other suitable devices or systems are likewise capable of carrying out the following processes, and that system 1000 is used only as an illustration for carrying them out.
As shown in Fig. 5, at step S501 the feature extractor 100 extracts multiple feature maps from at least one face region of a face image. In another embodiment, at step S501 multiple feature maps may be extracted from the entire face image. Then, at step S502, a shared facial feature vector is generated from the multiple feature maps extracted at step S501. At step S503, the facial landmark positions of the face image are predicted from the shared facial feature vector generated at step S502. According to another embodiment, the shared facial feature vector may also be used to predict the corresponding target of at least one auxiliary task associated with facial landmark detection, so that the target predictions of all auxiliary tasks are obtained simultaneously.
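The flow of steps S501-S503 can be sketched as follows. This is an illustrative toy model, not the patent's actual network: the feature dimension, the weight matrices `W_landmark`, `W_pose`, and `W_gender`, and the choice of auxiliary heads are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a 64-d shared facial feature vector (step S502 output),
# 5 landmarks as (x, y) pairs, and two auxiliary heads (5-way pose, 2-way gender).
FEAT_DIM = 64
W_landmark = rng.normal(size=(FEAT_DIM, 10))  # 5 landmarks * (x, y)
W_pose = rng.normal(size=(FEAT_DIM, 5))
W_gender = rng.normal(size=(FEAT_DIM, 2))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(shared_feature):
    """Step S503: predict the landmark positions and, from the same shared
    feature vector, the targets of all auxiliary tasks simultaneously."""
    landmarks = shared_feature @ W_landmark       # regression output
    pose = softmax(shared_feature @ W_pose)       # classification heads
    gender = softmax(shared_feature @ W_gender)
    return landmarks, pose, gender

x = rng.normal(size=FEAT_DIM)                     # stand-in shared feature vector
landmarks, pose, gender = predict(x)
print(landmarks.shape, pose.shape, gender.shape)  # (10,) (5,) (2,)
```

The point of the sketch is that every head reads the same shared feature vector, which is what couples the landmark task to its auxiliary tasks.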
According to an embodiment of the application, the feature extractor comprises a convolutional neural network that includes multiple convolution-pooling layers and a fully connected layer. Each convolution-pooling layer is configured to perform convolution and max-pooling operations. In this embodiment, at step S501 the multiple feature maps may be extracted successively by the multiple convolution-pooling layers, where the feature maps extracted by one convolution-pooling layer are input to the next convolution-pooling layer, which extracts feature maps different from those previously extracted. At step S502, the shared facial feature vector may be generated by the fully connected layer from all of the feature maps extracted at step S501.
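As a hedged illustration of this architecture (not the patent's actual network: the image size, kernel size, layer count, and the single random kernel standing in for learned filters are all made up), successive convolution + max-pooling feature extraction can be sketched in plain numpy:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation) of a single channel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; trims edges that do not fill a window."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    return fmap[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
face_region = rng.normal(size=(16, 16))  # stand-in grayscale face crop
kernel = rng.normal(size=(3, 3))         # one filter (random here, learned in practice)

# Layer 1 extracts a feature map; layer 2 consumes layer 1's output,
# so each convolution-pooling layer sees maps different from the previous ones.
fmap1 = max_pool(np.maximum(conv2d(face_region, kernel), 0))
fmap2 = max_pool(np.maximum(conv2d(fmap1, kernel), 0))
shared_feature = fmap2.flatten()         # input to the fully connected layer
print(fmap1.shape, fmap2.shape, shared_feature.shape)  # (7, 7) (2, 2) (4,)
```

Flattening the last feature map is where the fully connected layer would take over to produce the shared facial feature vector of step S502.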
In this embodiment, method 500 further includes a training step (not shown in Fig. 5), which is discussed with reference to Fig. 6.
As shown in Fig. 6, at step S601 a training face image, its ground-truth landmark positions, and its ground-truth target for each auxiliary task are sampled from a predetermined training set. For the training face image, at step S602, its facial landmark predictions and the target predictions of all auxiliary tasks may be obtained correspondingly from the predictor 300. Then, at step S603, the predicted landmark positions are compared with the ground-truth landmark positions to generate a landmark error. At step S604, the target prediction for each auxiliary task is compared with the corresponding ground-truth target, to generate at least one training task error. Then, at step S605, the generated landmark error and all training task errors are back-propagated through the convolutional neural network, to adjust the weights of the connections between the neurons of the convolutional neural network. At step S606, it is determined whether one of the auxiliary tasks has converged. If "No", the process 600 returns to step S606. If "Yes", the training process of that task stops at step S607 and the process proceeds to step S608. At step S608, it is determined whether the training process for facial landmark detection has converged. If "Yes", the process 600 ends; otherwise, the process 600 returns to step S601.
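A minimal sketch of the Fig. 6 training loop, under loud assumptions: a linear model stands in for the convolutional neural network, `X` stands in for shared features, the data and convergence thresholds are invented, and each task head is updated from its own error. It only illustrates how a least-squares landmark error and a cross-entropy auxiliary-task error are generated, back-propagated as gradient updates, and monitored per task so an auxiliary task can stop early:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step S601 stand-in: random features with ground-truth landmark positions
# and one binary auxiliary target (all synthetic).
X = rng.normal(size=(200, 16))
true_W_lm = rng.normal(size=(16, 4))
Y_lm = X @ true_W_lm                          # ground-truth landmark positions
y_aux = (X[:, 0] > 0).astype(float)           # ground-truth auxiliary label

W_lm = np.zeros((16, 4))
w_aux = np.zeros(16)
aux_active, lr = True, 0.1

for step in range(500):
    # Step S602: predictions for the landmark task and the auxiliary task.
    pred_lm = X @ W_lm
    pred_aux = 1.0 / (1.0 + np.exp(-(X @ w_aux)))          # sigmoid head
    # Steps S603/S604: landmark error (least squares), task error (cross-entropy).
    lm_err = np.mean((pred_lm - Y_lm) ** 2)
    aux_err = -np.mean(y_aux * np.log(pred_aux + 1e-9)
                       + (1 - y_aux) * np.log(1 - pred_aux + 1e-9))
    # Step S605: propagate both errors back as weight updates.
    W_lm -= lr * (2 / len(X)) * X.T @ (pred_lm - Y_lm)
    if aux_active:
        w_aux -= lr * X.T @ (pred_aux - y_aux) / len(X)
        # Steps S606/S607: freeze the auxiliary task once it has converged.
        if aux_err < 0.2:
            aux_active = False
    # Step S608: stop once landmark detection itself has converged.
    if lm_err < 1e-3:
        break

print("landmark task converged:", lm_err < 1e-3)
```

The per-task convergence flags mirror the patent's determiner: heterogeneous tasks converge at different speeds, so each is stopped on its own schedule while the landmark task keeps training.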
In this way, facial landmark detection can be jointly optimized with heterogeneous but subtly related tasks.
Although preferred examples of the present invention have been described, those skilled in the art, having grasped the basic inventive concept, can make variations or modifications to these examples. The appended claims are intended to cover the preferred examples and all variations or modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make variations or modifications to the present invention without departing from its spirit and scope. Accordingly, if such variations or modifications fall within the scope of the claims and their technical equivalents, they too fall within the scope of the present invention.
Claims (15)
1. A method for detecting facial landmarks of a face image, comprising performing, with a convolutional neural network:
extracting multiple feature maps from at least one face region of the face image;
generating a shared facial feature vector from the extracted feature maps; and
predicting the facial landmark positions of the face image from the generated shared facial feature vector, and using the shared facial feature vector to predict the corresponding target of at least one auxiliary task associated with facial landmark detection, so as to obtain the target predictions of all auxiliary tasks simultaneously,
wherein the convolutional neural network is trained with a predetermined training set, the training comprising the steps of:
1) sampling, from the predetermined training set, a training face image, its ground-truth landmark positions, and its ground-truth target for each auxiliary task;
2) comparing the predicted landmark positions with the ground-truth landmark positions to generate a landmark error;
3) comparing the target prediction for each auxiliary task with the corresponding ground-truth target to generate at least one training task error; and
4) back-propagating the generated landmark error and the generated training task errors through the convolutional neural network, to adjust the weights of the connections between the neurons of the convolutional neural network;
repeating steps 1)-4) until the generated landmark error is less than a first predetermined value and the generated training task error is less than a second predetermined value.
2. The method according to claim 1, wherein the facial landmarks include at least one of the group consisting of: the eye centers, the nose, and the mouth corners of the face image.
3. The method according to claim 1, wherein the auxiliary tasks include at least one of the group consisting of: head pose estimation, gender classification, age estimation, facial expression recognition, and facial attribute inference.
4. The method according to claim 3, wherein the convolutional neural network comprises multiple convolution-pooling layers configured to perform convolution and max-pooling operations, and
wherein extracting multiple feature maps from at least one face region of the face image further comprises:
extracting the multiple feature maps successively by the multiple convolution-pooling layers, wherein the feature maps extracted by one convolution-pooling layer are input to the next convolution-pooling layer, to extract feature maps different from those previously extracted.
5. The method according to claim 4, wherein the convolutional neural network further comprises a fully connected layer, and wherein, when the shared facial feature vector is generated from the extracted feature maps, the shared facial feature vector is generated by the fully connected layer from all of the extracted feature maps.
6. The method according to claim 5, wherein each layer of the convolutional neural network has multiple neurons, and wherein the method further comprises:
training the convolutional neural network with a predetermined training set, to adjust each weight of the connections between the neurons of the convolutional neural network, such that the shared facial feature vector is generated by the convolutional neural network with the adjusted weights.
7. The method according to claim 1, wherein the comparison to generate the landmark error is performed by least-squares processing, and the comparison to generate the training task errors is performed by cross-entropy processing.
8. The method according to claim 1, wherein, for each auxiliary task, the training further comprises the steps of:
5) sampling, from a predetermined validation set, a validation face image and its ground-truth target for each auxiliary task;
6) comparing the target prediction with the ground-truth target to generate a validation task error;
repeating steps 5) and 6) until the generated training task error is less than a third predetermined value and the generated validation task error is less than a fourth predetermined value.
9. The method according to claim 1, wherein, when the facial landmark positions of the face image are predicted from the generated shared facial feature vector, the facial landmark positions of the face image are determined according to (W^r)^T x^l, where W^r denotes the weights assigned to facial landmark detection, x^l denotes the shared facial feature vector, and T denotes transposition.
10. A system for detecting facial landmarks of a face image, comprising:
a feature extractor configured to perform, with a convolutional neural network:
extracting multiple feature maps from at least one face region of the face image; and
generating a shared facial feature vector from the extracted feature maps; and
a predictor configured to predict the facial landmark positions of the face image from the shared facial feature vector generated by the feature extractor, and to obtain, by using the shared facial feature vector, the target predictions of at least one auxiliary task associated with facial landmark detection,
wherein each layer of the convolutional neural network has multiple neurons, and the system further comprises a training unit for training the convolutional neural network, so that the trained convolutional neural network can extract the shared facial feature vector, the training unit comprising:
a sampler configured to sample, from a predetermined training set, a training face image, its ground-truth landmark positions, and its ground-truth target for each auxiliary task;
a comparator configured to compare the predicted landmark positions with the ground-truth landmark positions to generate a landmark error, and to compare the target prediction for each auxiliary task with the corresponding ground-truth target to generate at least one training task error; and
a back-propagator configured to back-propagate the generated landmark error and the training task errors through the convolutional neural network, to adjust the weights of the connections between the neurons of the convolutional neural network.
11. The system according to claim 10, wherein the convolutional neural network comprises:
multiple convolution-pooling layers configured to perform convolution and max-pooling operations, wherein the feature maps extracted by one convolution-pooling layer are input to the next convolution-pooling layer, to extract feature maps different from those previously extracted; and
a fully connected layer configured to generate the shared facial feature vector from all of the extracted feature maps.
12. The system according to claim 10, wherein the training unit further comprises:
a determiner configured to determine whether the training process of facial landmark detection has converged and whether the training process of each task has converged.
13. The system according to claim 10, wherein the facial landmarks include at least one of the group consisting of: the eye centers, the nose, and the mouth corners of the face image.
14. The system according to claim 10, wherein the auxiliary tasks include at least one of the group consisting of: head pose estimation, gender classification, age estimation, facial expression recognition, and facial attribute inference.
15. A method for training a convolutional neural network that simultaneously performs facial landmark detection and at least one associated auxiliary task to obtain the target prediction of the auxiliary task, the method comprising the steps of:
1) sampling, from a predetermined training set, a training face image, its ground-truth landmark positions, and its ground-truth target for each auxiliary task;
2) comparing the predicted landmark positions with the ground-truth landmark positions to generate a landmark error;
3) comparing the target prediction for each auxiliary task with the corresponding ground-truth target, to generate at least one training task error;
4) back-propagating the generated landmark error and all the training task errors through the convolutional neural network, to adjust the weights of the connections between the neurons of the convolutional neural network;
5) sampling, from a predetermined validation set, a validation face image and its ground-truth target for each auxiliary task;
6) comparing the target prediction with the ground-truth target to generate a validation task error;
7) determining whether the training task error is less than a first predetermined value and whether the validation task error is less than a second predetermined value; and
if so, ending the training of the convolutional neural network; otherwise, repeating steps 1) to 7).
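The quantities named in claims 7 and 9 can be written out concretely. A hedged sketch with invented dimensions and random stand-in values: landmark positions are computed as (W^r)^T x^l per claim 9 and scored by least squares, while an auxiliary classification task is scored by cross-entropy, per claim 7:

```python
import numpy as np

rng = np.random.default_rng(0)

x_l = rng.normal(size=8)        # shared facial feature vector (claim 9's x^l)
W_r = rng.normal(size=(8, 4))   # landmark-detection weights (claim 9's W^r)
landmarks = W_r.T @ x_l         # claim 9: positions determined as (W^r)^T x^l

# Claim 7: the landmark error is a least-squares comparison with ground truth...
truth = rng.normal(size=4)
landmark_error = np.sum((landmarks - truth) ** 2)

# ...while each auxiliary-task error is a cross-entropy comparison
# between predicted class probabilities and a one-hot ground-truth target.
logits = rng.normal(size=3)
probs = np.exp(logits) / np.exp(logits).sum()
label = np.array([0.0, 1.0, 0.0])
task_error = -np.sum(label * np.log(probs))

print(landmarks.shape, landmark_error >= 0, task_error > 0)  # (4,) True True
```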
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2014/000769 WO2016026063A1 (en) | 2014-08-21 | 2014-08-21 | A method and a system for facial landmark detection based on multi-task |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106575367A CN106575367A (en) | 2017-04-19 |
CN106575367B true CN106575367B (en) | 2018-11-06 |
Family
ID=55350056
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201480081241.1A Active CN106575367B (en) | 2014-08-21 | 2014-08-21 | Method and system for the face critical point detection based on multitask |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN106575367B (en) |
WO (1) | WO2016026063A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107145857B (en) * | 2017-04-29 | 2021-05-04 | 深圳市深网视界科技有限公司 | Face attribute recognition method and device and model establishment method |
Families Citing this family (47)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6750854B2 (en) * | 2016-05-25 | 2020-09-02 | キヤノン株式会社 | Information processing apparatus and information processing method |
CN105957095B (en) * | 2016-06-15 | 2018-06-08 | 电子科技大学 | A kind of Spiking angular-point detection methods based on gray level image |
US10289822B2 (en) * | 2016-07-22 | 2019-05-14 | Nec Corporation | Liveness detection for antispoof face recognition |
US10467459B2 (en) | 2016-09-09 | 2019-11-05 | Microsoft Technology Licensing, Llc | Object detection based on joint feature extraction |
CN107871106B (en) * | 2016-09-26 | 2021-07-06 | 北京眼神科技有限公司 | Face detection method and device |
JP6692271B2 (en) * | 2016-09-28 | 2020-05-13 | 日本電信電話株式会社 | Multitask processing device, multitask model learning device, and program |
US10198626B2 (en) * | 2016-10-19 | 2019-02-05 | Snap Inc. | Neural networks for facial modeling |
US10460153B2 (en) | 2016-11-15 | 2019-10-29 | Futurewei Technologies, Inc. | Automatic identity detection |
CN106951840A (en) * | 2017-03-09 | 2017-07-14 | 北京工业大学 | A kind of facial feature points detection method |
CN107038429A (en) * | 2017-05-03 | 2017-08-11 | 四川云图睿视科技有限公司 | A kind of multitask cascade face alignment method based on deep learning |
CN106951888B (en) * | 2017-05-09 | 2020-12-01 | 安徽大学 | Relative coordinate constraint method and positioning method of human face characteristic point |
CN107358149B (en) * | 2017-05-27 | 2020-09-22 | 深圳市深网视界科技有限公司 | Human body posture detection method and device |
CN107578055B (en) * | 2017-06-20 | 2020-04-14 | 北京陌上花科技有限公司 | Image prediction method and device |
CN108229288B (en) * | 2017-06-23 | 2020-08-11 | 北京市商汤科技开发有限公司 | Neural network training and clothes color detection method and device, storage medium and electronic equipment |
CN107563279B (en) * | 2017-07-22 | 2020-12-22 | 复旦大学 | Model training method for adaptive weight adjustment aiming at human body attribute classification |
US11341631B2 (en) | 2017-08-09 | 2022-05-24 | Shenzhen Keya Medical Technology Corporation | System and method for automatically detecting a physiological condition from a medical image of a patient |
CN107423727B (en) * | 2017-08-14 | 2018-07-10 | 河南工程学院 | Face complex expression recognition methods based on neural network |
CN107704848A (en) * | 2017-10-27 | 2018-02-16 | 深圳市唯特视科技有限公司 | A kind of intensive face alignment method based on multi-constraint condition convolutional neural networks |
CN108196535B (en) * | 2017-12-12 | 2021-09-07 | 清华大学苏州汽车研究院(吴江) | Automatic driving system based on reinforcement learning and multi-sensor fusion |
CN108073910B (en) * | 2017-12-29 | 2021-05-07 | 百度在线网络技术(北京)有限公司 | Method and device for generating human face features |
CN107992864A (en) * | 2018-01-15 | 2018-05-04 | 武汉神目信息技术有限公司 | A kind of vivo identification method and device based on image texture |
CN110060296A (en) * | 2018-01-18 | 2019-07-26 | 北京三星通信技术研究有限公司 | Estimate method, electronic equipment and the method and apparatus for showing virtual objects of posture |
CN108399373B (en) * | 2018-02-06 | 2019-05-10 | 北京达佳互联信息技术有限公司 | The model training and its detection method and device of face key point |
EP3537348A1 (en) * | 2018-03-06 | 2019-09-11 | Dura Operating, LLC | Heterogeneous convolutional neural network for multi-problem solving |
CN108416314B (en) * | 2018-03-16 | 2022-03-08 | 中山大学 | Picture important face detection method |
CN108615016B (en) * | 2018-04-28 | 2020-06-19 | 北京华捷艾米科技有限公司 | Face key point detection method and face key point detection device |
US20210056292A1 (en) * | 2018-05-17 | 2021-02-25 | Hewlett-Packard Development Company, L.P. | Image location identification |
CN109147940B (en) * | 2018-07-05 | 2021-05-25 | 科亚医疗科技股份有限公司 | Apparatus and system for automatically predicting physiological condition from medical image of patient |
CN109145798B (en) * | 2018-08-13 | 2021-10-22 | 浙江零跑科技股份有限公司 | Driving scene target identification and travelable region segmentation integration method |
US11954881B2 (en) | 2018-08-28 | 2024-04-09 | Apple Inc. | Semi-supervised learning using clustering as an additional constraint |
CN109635750A (en) * | 2018-12-14 | 2019-04-16 | 广西师范大学 | A kind of compound convolutional neural networks images of gestures recognition methods under complex background |
CN109522910B (en) * | 2018-12-25 | 2020-12-11 | 浙江商汤科技开发有限公司 | Key point detection method and device, electronic equipment and storage medium |
CN109829431B (en) * | 2019-01-31 | 2021-02-12 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating information |
CN111563397B (en) * | 2019-02-13 | 2023-04-18 | 阿里巴巴集团控股有限公司 | Detection method, detection device, intelligent equipment and computer storage medium |
CN109902641B (en) * | 2019-03-06 | 2021-03-02 | 中国科学院自动化研究所 | Semantic alignment-based face key point detection method, system and device |
CN110163080A (en) | 2019-04-02 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Face critical point detection method and device, storage medium and electronic equipment |
CN110163098A (en) * | 2019-04-17 | 2019-08-23 | 西北大学 | Based on the facial expression recognition model construction of depth of seam division network and recognition methods |
CN110136828A (en) * | 2019-05-16 | 2019-08-16 | 杭州健培科技有限公司 | A method of medical image multitask auxiliary diagnosis is realized based on deep learning |
WO2021036726A1 (en) * | 2019-08-29 | 2021-03-04 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method, system, and computer-readable medium for using face alignment model based on multi-task convolutional neural network-obtained data |
CN110705419A (en) * | 2019-09-24 | 2020-01-17 | 新华三大数据技术有限公司 | Emotion recognition method, early warning method, model training method and related device |
CN111339813B (en) * | 2019-09-30 | 2022-09-27 | 深圳市商汤科技有限公司 | Face attribute recognition method and device, electronic equipment and storage medium |
CN111191675B (en) * | 2019-12-03 | 2023-10-24 | 深圳市华尊科技股份有限公司 | Pedestrian attribute identification model realization method and related device |
WO2022003982A1 (en) * | 2020-07-03 | 2022-01-06 | 日本電気株式会社 | Detection device, learning device, detection method, and storage medium |
KR102538804B1 (en) * | 2020-11-16 | 2023-06-01 | 상명대학교 산학협력단 | Device and method for landmark detection using artificial intelligence |
CN112488003A (en) * | 2020-12-03 | 2021-03-12 | 深圳市捷顺科技实业股份有限公司 | Face detection method, model creation method, device, equipment and medium |
CN112820382A (en) * | 2021-02-04 | 2021-05-18 | 上海小芃科技有限公司 | Breast cancer postoperative intelligent rehabilitation training method, device, equipment and storage medium |
US11776323B2 (en) | 2022-02-15 | 2023-10-03 | Ford Global Technologies, Llc | Biometric task network |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102831382A (en) * | 2011-06-15 | 2012-12-19 | 北京三星通信技术研究有限公司 | Face tracking apparatus and method |
CN103824054A (en) * | 2014-02-17 | 2014-05-28 | 北京旷视科技有限公司 | Cascaded depth neural network-based face attribute recognition method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1352436A (en) * | 2000-11-15 | 2002-06-05 | 星创科技股份有限公司 | Real-time face identification system |
JP4778158B2 (en) * | 2001-05-31 | 2011-09-21 | オリンパス株式会社 | Image selection support device |
CN101673340A (en) * | 2009-08-13 | 2010-03-17 | 重庆大学 | Method for identifying human ear by colligating multi-direction and multi-dimension and BP neural network |
2014
- 2014-08-21: CN application CN201480081241.1A filed; granted as patent CN106575367B (active)
- 2014-08-21: WO application PCT/CN2014/000769 filed; published as WO2016026063A1 (application filing)
Non-Patent Citations (1)
Title |
---|
Incremental Convolutional Neural Networks and Their Application in Face Detection; Gu Jialing et al.; Journal of System Simulation (《系统仿真学报》); 2009-04-30; Vol. 21, No. 8; pp. 2441-2445 *
Also Published As
Publication number | Publication date |
---|---|
WO2016026063A1 (en) | 2016-02-25 |
CN106575367A (en) | 2017-04-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||