CN106575367B - Method and system for multi-task facial landmark detection - Google Patents

Method and system for multi-task facial landmark detection

Info

Publication number
CN106575367B
CN106575367B (application CN201480081241.1A)
Authority
CN
China
Prior art keywords
face
training
convolutional neural networks
landmark (key point)
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201480081241.1A
Other languages
Chinese (zh)
Other versions
CN106575367A (en)
Inventor
汤晓鸥
张展鹏
罗平
吕健勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd
Publication of CN106575367A
Application granted
Publication of CN106575367B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/165Detection; Localisation; Normalisation using facial parts and geometric relationships

Abstract

The application discloses a method and system for detecting facial landmarks (key points) of a face image. The method may comprise: extracting multiple feature maps from at least one face region and/or the entire face image; generating a shared facial feature vector from the extracted feature maps; and predicting the facial landmark positions of the face image from the generated shared facial feature vector. With the method and system of the invention, facial landmark detection can be optimized jointly with heterogeneous but subtly correlated tasks, so that detection reliability can be improved through multi-task learning.

Description

Method and system for multi-task facial landmark detection
Technical field
This application relates to face alignment and, more precisely, to a method and system for facial landmark (key point) detection.
Background technology
Facial landmark detection is a fundamental component of many face analysis tasks (such as facial attribute inference, face verification, and face recognition), but it has long been hampered by problems of occlusion and pose variation.
Accurate facial landmark detection can be performed with cascaded CNNs (convolutional neural networks), in which the face is pre-partitioned into different parts, each processed by a separate deep CNN. The resulting outputs are then averaged and passed on to separate cascaded layers that process each facial landmark individually.
Moreover, facial landmark detection is not an isolated problem; its estimation can be influenced by many heterogeneous but subtly correlated factors. For example, when a child is smiling, his or her mouth opens widely. Effectively discovering and exploiting such an intrinsically correlated facial attribute helps detect the mouth corners more accurately. Likewise, in a face with a large left-right rotation, the distance between the two eyes is smaller. Such pose information can serve as an additional source of information to constrain the solution space of landmark estimation. Given a rich set of plausibly related tasks, tackling facial landmark detection in isolation can be counterproductive.
However, different tasks can be inherently different in learning difficulty and can have different convergence rates. In addition, when learned simultaneously, certain tasks may start to overfit earlier than others, harming the convergence of the model as a whole.
Summary of the invention
In one aspect of the application, a method for detecting facial landmarks of a face image is disclosed. The method may be carried out with a convolutional neural network and may include: extracting multiple feature maps from at least one face region of the face image; generating a shared facial feature vector from the extracted feature maps; and predicting, from the generated shared facial feature vector, the facial landmark positions of the face image as well as the corresponding targets of at least one auxiliary task associated with the landmark detection, so that the target predictions of all auxiliary tasks are obtained simultaneously. The convolutional neural network is trained with a predetermined training set, the training including the steps of: 1) sampling, from the predetermined training set, a training face image, its ground-truth landmark positions, and its ground-truth target for each auxiliary task; 2) comparing the predicted landmark positions with the ground-truth landmark positions to generate a landmark error; 3) comparing, for each auxiliary task, the target prediction with the ground-truth target to generate at least one training task error; and 4) back-propagating the generated landmark error and training task errors through the convolutional neural network so as to adjust the weights of the connections between its neurons. Steps 1)-4) are repeated until the generated landmark error is less than a first predetermined value and the generated training task errors are less than a second predetermined value.
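The training steps 1) to 4) above can be sketched as a toy loop, with a plain linear model standing in for the convolutional network and synthetic data; every name, dimension, learning rate, and threshold here is an assumption for illustration, not the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "shared feature vectors" and ground truths: 10 landmark
# coordinates (main task r) and one binary auxiliary label per sample.
X = rng.normal(size=(200, 16))
true_Wr = rng.normal(size=(16, 10))
Y_r = X @ true_Wr                      # ground-truth landmark positions
y_a = (X[:, 0] > 0).astype(float)      # ground-truth auxiliary target (e.g. smiling)

Wr = np.zeros((16, 10))                # landmark regression weights
wa = np.zeros(16)                      # auxiliary logistic weights
eta = 0.01                             # learning rate (assumed)

for step in range(2000):
    # 1) sample a training image's feature vector and its ground truths
    i = rng.integers(len(X))
    x, yr, ya = X[i], Y_r[i], y_a[i]
    # 2) landmark error: least-squares residual against the ground truth
    err_r = Wr.T @ x - yr
    # 3) task error: logistic prediction vs. the true auxiliary label
    p = 1.0 / (1.0 + np.exp(-wa @ x))
    # 4) "back-propagate": gradient step on both errors (only the output
    #    weights are updated here; a real CNN updates every layer)
    Wr -= eta * np.outer(x, err_r)
    wa -= eta * (p - ya) * x

# In the patent, the loop repeats until both errors fall below
# predetermined thresholds; here we just measure them once at the end.
key_err = np.mean((X @ Wr - Y_r) ** 2)
p_all = 1.0 / (1.0 + np.exp(-X @ wa))
task_err = np.mean(-y_a * np.log(p_all + 1e-9)
                   - (1 - y_a) * np.log(1 - p_all + 1e-9))
```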
In one embodiment, the facial landmarks include at least one selected from the group consisting of: the eye centers, the nose tip, and the mouth corners of the face image.
In one embodiment, the auxiliary tasks include at least one selected from the group consisting of: head pose estimation, gender classification, age estimation, facial expression recognition, and facial attribute inference.
In one embodiment, the convolutional neural network includes multiple convolution-pooling layers configured to perform convolution and max-pooling operations, and extracting multiple feature maps from at least one face region of the face image further includes: extracting the feature maps successively through the multiple convolution-pooling layers, wherein the feature maps extracted by a preceding convolution-pooling layer are fed into the next convolution-pooling layer so as to extract feature maps different from those previously extracted.
In one embodiment, the convolutional neural network further includes a fully connected layer, and when the shared facial feature vector is generated from the extracted feature maps, it is generated by the fully connected layer from all of the extracted feature maps.
In one embodiment, each layer of the convolutional neural network has multiple neurons, and the method further includes: training the convolutional neural network with a predetermined training set so as to adjust every weight of the connections between its neurons, such that the shared facial feature vector is generated by the convolutional neural network with the adjusted weights.
In one embodiment, the comparison that generates the landmark error is carried out by least-squares processing, and the comparison that generates the training task errors is carried out by cross-entropy processing.
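As a rough sketch, these two loss choices can be written out directly (the helper names and example values are hypothetical, not from the patent):

```python
import numpy as np

def landmark_error(pred, truth):
    """Least-squares landmark error for the main task."""
    return 0.5 * np.sum((pred - truth) ** 2)

def task_error(probs, truth_index):
    """Cross-entropy training-task error for one auxiliary task."""
    return -np.log(probs[truth_index])

# Toy values: a predicted vs. ground-truth landmark coordinate pair,
# and a softmax distribution over (say) three pose classes.
pred = np.array([0.4, 0.6])
truth = np.array([0.5, 0.5])
e_r = landmark_error(pred, truth)      # 0.5 * (0.01 + 0.01)

probs = np.array([0.7, 0.2, 0.1])
e_a = task_error(probs, 0)             # -log(0.7)
```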
In one embodiment, for each auxiliary task, training the convolutional neural network with the predetermined training set further includes the steps of: 5) sampling, from a predetermined validation set, a validation face image and its ground-truth target for each auxiliary task; 6) comparing the target prediction with the ground-truth target to generate a validation task error; and repeating steps 5) and 6) until the generated training task error is less than a third predetermined value and the generated validation task error is less than a fourth predetermined value.

In one embodiment, when the facial landmark positions of the face image are predicted from the generated shared facial feature vector, they are determined according to (W^r)^T x^l, where W^r denotes the weights assigned to facial landmark detection, x^l denotes the shared facial feature vector, and T denotes transposition.
In another aspect of the application, a system for detecting facial landmarks of a face image is disclosed. The system may include a feature extractor and a predictor. The feature extractor can use a convolutional neural network to extract multiple feature maps from at least one face region of the face image and to generate a shared facial feature vector from the extracted feature maps. The predictor can predict the facial landmark positions of the face image from the shared facial feature vector generated by the feature extractor, and can use the shared facial feature vector to obtain the target predictions of at least one auxiliary task associated with the landmark detection. Each layer of the convolutional neural network has multiple neurons, and the system further includes a training unit for training the convolutional neural network so that the trained network can extract the shared facial feature vector. The training unit includes: a sampler configured to sample, from a predetermined training set, a training face image, its ground-truth landmark positions, and its ground-truth target for each auxiliary task; a comparator configured to compare the predicted landmark positions with the ground-truth landmark positions to generate a landmark error, and to compare, for each auxiliary task, the target prediction with the ground-truth target to generate at least one training task error; and a back-propagator configured to back-propagate the generated landmark error and training task errors through the convolutional neural network so as to adjust the weights of the connections between its neurons.
In one embodiment, the convolutional neural network includes: multiple convolution-pooling layers configured to perform convolution and max-pooling operations, wherein the feature maps extracted by a preceding convolution-pooling layer are fed into the next convolution-pooling layer so as to extract feature maps different from those previously extracted; and a fully connected layer configured to generate the shared facial feature vector from all of the extracted feature maps.
In one embodiment, the training unit further includes a determiner configured to determine whether the training process of the facial landmark detection converges and whether the training process of each task converges.
In one embodiment, the facial landmarks include at least one selected from the group consisting of: the eye centers, the nose tip, and the mouth corners of the face image.
In one embodiment, the auxiliary tasks include at least one selected from the group consisting of: head pose estimation, gender classification, age estimation, facial expression recognition, and facial attribute inference.
The application also provides a method of training a convolutional neural network to perform facial landmark detection and at least one associated auxiliary task simultaneously so as to obtain the target predictions of the auxiliary tasks. The method may include: 1) sampling, from a predetermined training set, a training face image, its ground-truth landmark positions, and its ground-truth target for each auxiliary task; 2) comparing the predicted landmark positions with the ground-truth landmark positions to generate a landmark error; 3) comparing, for each auxiliary task, the target prediction with the ground-truth target to generate at least one training task error; 4) back-propagating the generated landmark error and all training task errors through the convolutional neural network so as to adjust the weights of the connections between its neurons; 5) sampling, from a predetermined validation set, a validation face image and its ground-truth target for each auxiliary task; 6) comparing the target prediction with the ground-truth target to generate a validation task error; and 7) determining whether the training task error is less than a first predetermined value and whether the validation task error is less than a second predetermined value. If so, training of the convolutional neural network ends; otherwise, steps 1) to 7) are repeated.
The present invention also provides a computer-readable medium storing instructions executable by one or more processors to implement the method described above.
Compared with conventional methods, facial landmark detection can be optimized jointly with heterogeneous but subtly correlated auxiliary tasks, so that detection reliability can be improved through multi-task learning, especially when handling faces with significant occlusion and pose variation.
According to the application, only a single CNN is used, so the complexity of the required system/device can be reduced. Neither pre-partitioning of the face nor cascaded convolutional layers are needed, which greatly reduces model complexity while still achieving comparable or even better accuracy.
As training proceeds, certain related tasks reach their best performance and are no longer beneficial to the main task, at which point their training can be halted. According to the application, the CNN is trained with an "early stopping" scheme that halts those related tasks that begin to overfit the training set and thereby harm the main task, which facilitates learning convergence.
Brief description of the drawings
Exemplary, non-limiting embodiments of the present invention are described below with reference to the accompanying drawings. The drawings are illustrative and generally not drawn to exact scale. The same or similar elements in different figures are referenced with identical reference numerals.
Fig. 1 is a schematic diagram showing a system for facial landmark detection according to some disclosed embodiments.

Fig. 2 is a schematic diagram showing the training unit of Fig. 1 according to some disclosed embodiments.

Fig. 3 is a schematic diagram showing an example of the system for facial landmark detection according to some disclosed embodiments, in which an example of the convolutional neural network is shown.

Fig. 4 is a schematic diagram showing the system for facial landmark detection according to some disclosed embodiments when implemented in software.

Fig. 5 is a schematic flow diagram showing a method for facial landmark detection according to some disclosed embodiments.

Fig. 6 is an exemplary flow diagram showing the training process of the multi-task convolutional neural network according to some disclosed embodiments.
Detailed description
Exemplary embodiments are explained in detail in this section, and examples of these embodiments are illustrated in the drawings. Where appropriate, identical reference numerals denote the same or similar parts throughout the drawings.
Fig. 1 is a schematic diagram showing an exemplary system 1000 for facial landmark detection according to some disclosed embodiments. In system 1000, facial landmark detection (hereinafter also referred to as the main task) and at least one related/auxiliary task are optimized jointly. Facial landmark detection refers to 2D position detection, i.e., detecting the 2D coordinates (x and y) within the face region of a face image. Examples of facial landmarks include, but are not limited to, the left and right eye centers, the nose tip, and the left and right mouth corners of the face image. Examples of auxiliary tasks include, but are not limited to, head pose estimation, demographic classification (such as gender classification), age estimation, facial expression recognition (such as smiling), and facial attribute inference (such as wearing glasses). It will be appreciated that the number and type of auxiliary tasks are not limited to those mentioned herein.
Referring again to Fig. 1, when implemented in hardware, system 1000 may include a feature extractor 100, a training unit 200, and a predictor 300. The feature extractor 100 can extract multiple feature maps from at least one face region and/or the entire face image, and then generate a shared facial feature vector from the extracted feature maps.
The predictor 300 can predict the facial landmark positions of the face image from the shared facial feature vector extracted by the feature extractor 100. Meanwhile, the predictor 300 can further predict, from the shared facial feature vector, the corresponding targets of at least one auxiliary task associated with the landmark detection. In system 1000, facial landmark detection can thus be optimized jointly with the auxiliary tasks.
According to an embodiment, the feature extractor 100 may include a convolutional neural network. The network may include multiple convolution-pooling layers and a fully connected layer. In the network, each of the convolution-pooling layers performs convolution and max-pooling operations, and the feature maps extracted by a preceding convolution-pooling layer are fed into the next convolution-pooling layer to extract feature maps different from those previously extracted. The fully connected layer generates the shared facial feature vector from all of the extracted feature maps.
Fig. 3 shows an example of the network, in which the convolutional neural network includes an input layer, multiple (for example, three) convolution-pooling layers, one convolutional layer, and one fully connected layer, where each convolution-pooling layer includes one or more convolutional layers and one or more pooling layers. It should be noted that the network shown is an example, and the convolutional neural network in the feature extractor is not limited thereto. As shown in Fig. 3, a (for example) 40 × 40 grayscale face image is fed into the input layer. The first convolution-pooling layer extracts feature maps from the input image. The second convolution-pooling layer then takes the output of the first layer as input to generate different feature maps. This process continues through all three convolution-pooling layers. Finally, the feature maps of the last layer are used by the fully connected layer to generate the shared facial feature vector. In other words, the shared facial feature vector is generated by performing multiple convolution and max-pooling operations. Each layer contains multiple neurons with local or global receptive fields, and the weights of the connections between the neurons of the convolutional neural network can be adjusted accordingly to train the network.
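The 40 × 40 pipeline described above can be traced with a small shape calculator; the kernel sizes, pooling windows, and map count below are assumptions for illustration, since the text does not specify them:

```python
def conv_out(size, kernel):
    """Spatial size after a valid convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, window):
    """Spatial size after non-overlapping max pooling."""
    return size // window

size = 40                        # 40x40 grayscale input image
for kernel in (5, 3, 3):         # three convolution-pooling layers (assumed kernels)
    size = pool_out(conv_out(size, kernel), 2)
size = conv_out(size, 2)         # the extra convolutional layer (assumed 2x2)

# The fully connected layer then flattens the final maps into the
# shared facial feature vector.
n_maps = 64                      # assumed number of feature maps in the last layer
shared_dim = n_maps * size * size
```

With these assumed sizes the maps shrink 40 → 18 → 8 → 3 → 2, so the fully connected layer sees a 256-dimensional flattened input.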
According to an embodiment, system 1000 may further include a training unit 200. The training unit 200 can train the feature extractor with a predetermined training set so as to adjust the weights of the connections between the neurons of the convolutional neural network, such that the trained feature extractor can extract the shared facial feature vector. According to the embodiment shown in Fig. 2, the training unit 200 may include a sampler 201, a comparator 202, and a back-propagator 203.
As shown in Fig. 2, the sampler 201 can sample, from the predetermined training set, a training face image, its ground-truth landmark positions, and its ground-truth target for each auxiliary task. According to an embodiment, the five ground-truth landmarks (i.e., the eye centers, the nose tip, and the mouth corners) can be annotated directly on each training face image. According to another embodiment, the ground-truth target for each auxiliary task can be labeled by hand. For example, for gender classification, the ground-truth target can be labeled female (F) or male (M). For facial attribute inference, such as wearing glasses, the ground-truth target can be labeled wearing (Y) or not wearing (N). For head pose estimation, the poses (0°, ±30°, ±60°) can be labeled, and for expression recognition, such as smiling, yes/no can be labeled correspondingly.
The comparator 202 can compare the predicted landmark positions with the ground-truth landmark positions to generate a landmark error. The landmark error can be obtained using, for example, the least-squares method. The comparator 202 can also compare, for each auxiliary task, the target prediction with the ground-truth target to generate at least one training task error. According to another embodiment, the training task error can be obtained using, for example, the cross-entropy method.
The back-propagator 203 can back-propagate the generated landmark error and all training task errors through the convolutional neural network so as to adjust the weights of the connections between its neurons.
According to an embodiment, the training unit 200 may further include a determiner 204. The determiner 204 can determine whether the training process of the facial landmark detection converges. According to another embodiment, the determiner 204 can further determine whether the training process of each task converges, as will be discussed later.
In the following, the components of the training unit 200 mentioned above are discussed in detail. For purposes of illustration, an embodiment in which T tasks are trained jointly by the training unit 200 is described. Among the T tasks, facial landmark detection (i.e., the main task) is denoted r, and each of the at least one related/auxiliary task is denoted a, where a ∈ A.
For each task, the training data are denoted (x_i, y_i^t), t = {1, ..., T} and i = {1, ..., N}, where N denotes the number of training samples. Specifically, for facial landmark detection r, the training data are denoted (x_i, y_i^r), where y_i^r holds the 2D coordinates of the five landmarks. For a task a, the training data are denoted (x_i, y_i^a). In an embodiment, four tasks p, g, w, and s are shown, respectively denoting inference of 'pose', 'gender', 'wearing glasses', and 'smiling'. Accordingly, y_i^p takes one of five different poses (0°, ±30°, ±60°), while y_i^g, y_i^w, and y_i^s are binary attributes respectively denoting female/male, not wearing/wearing glasses, and not smiling/smiling. Different weights are assigned to the main task r and each auxiliary task a, denoted W^r and {W^a}.
The objective function over all tasks, jointly optimizing the main task r and the auxiliary tasks a, is then formulated as follows (the equation is reconstructed here from the surrounding definitions):

arg min_{W^r, {W^a}} Σ_{i=1}^{N} l(y_i^r, f(x_i; W^r)) + Σ_{a∈A} λ^a Σ_{i=1}^{N} l(y_i^a, f(x_i; W^a))    (1)

where f(x^t; W^t) is a linear function of x^t with weight vector W^t; l(·) denotes a loss function; λ^a denotes the importance coefficient of the error of the a-th task; and x_i denotes the shared facial feature vector.

According to an embodiment, the least-squares function and the cross-entropy function are used as the loss function l(·) of the main task r and of the auxiliary tasks a, respectively, to generate the corresponding landmark error and training task errors. The objective function above can therefore be rewritten as:

arg min_{W^r, {W^a}} Σ_{i=1}^{N} ||y_i^r − (W^r)^T x_i||^2 − Σ_{a∈A} λ^a Σ_{i=1}^{N} y_i^a log p(y_i^a | x_i; W^a) + Σ_t ||W^t||^2    (2)

In equation (2), f(x_i; W^r) = (W^r)^T x_i in the first term is a linear function. The second term contains the posterior probability function p(y^a = j | x_i; W^a) = exp((w_j^a)^T x_i) / Σ_{j'} exp((w_{j'}^a)^T x_i), where w_j^a denotes the j-th column of the weight matrix W^a of task a. The third term penalizes large weights W = {W^r, {W^a}}.
According to an embodiment, the weights of all tasks can be updated accordingly. Specifically, the weight matrix of the facial landmark detection is updated by W^r = W^r − η ∂E^r/∂W^r, where η denotes the learning rate (e.g., η = 0.003) and E^r = Σ_{i=1}^{N} ||y_i^r − (W^r)^T x_i||^2 is the least-squares error of the main task. The weight matrix of each task a can be computed in a similar fashion.
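This gradient update, W^r ← W^r − η ∂E^r/∂W^r with the least-squares error and η = 0.003, can be sketched on a single synthetic sample (the feature dimension and the iteration count are assumptions for illustration):

```python
import numpy as np

eta = 0.003                              # learning rate from the text

rng = np.random.default_rng(1)
x = rng.normal(size=8)                   # shared facial feature vector (dim assumed)
y_r = rng.normal(size=10)                # ground-truth 2D coords of 5 landmarks
W_r = np.zeros((8, 10))                  # landmark detection weights

for _ in range(5000):
    # E^r = ||y^r - (W^r)^T x||^2, so dE^r/dW^r = -2 x (y^r - (W^r)^T x)^T
    residual = y_r - W_r.T @ x
    W_r -= eta * (-2.0 * np.outer(x, residual))

final_err = np.sum((y_r - W_r.T @ x) ** 2)
```

Because the system is noiseless, repeated gradient steps drive the least-squares error toward zero, mirroring the convergence check performed by the determiner.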
The generated landmark error and training task errors can then be back-propagated by the back-propagator 203 layer by layer through the convolutional neural network down to the lowest layer, so as to adjust the weights of the connections between its neurons.
According to an embodiment of the application, the errors can be back-propagated through the convolutional neural network by the following back-propagation rule (reconstructed here from the surrounding description):

ε^l = σ'(z^l) ⊙ (W^{l+1} ε^{l+1})    (3)

In equation (3), ε^l denotes the errors at layer l, where l = 1, ..., L. For example, ε^1 denotes the errors of the lowest layer and ε^2 the errors of the second-lowest layer. The errors of each lower layer are computed from those of the layer above according to equation (3), where σ'(·) is the gradient of the activation function σ(·) of the network and z^l is the pre-activation input of layer l.
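The layer-wise back-propagation of errors described above can be sketched for a two-layer stack (the tanh activation, all shapes, and the helper names are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

def sigma(z):
    """Assumed activation function sigma(.)"""
    return np.tanh(z)

def sigma_grad(z):
    """Its derivative, sigma'(.)"""
    return 1.0 - np.tanh(z) ** 2

# Forward pass through two layers: x^l = sigma((W^l)^T x^{l-1})
W1 = rng.normal(scale=0.1, size=(6, 5))
W2 = rng.normal(scale=0.1, size=(5, 4))
x0 = rng.normal(size=6)
z1 = W1.T @ x0
x1 = sigma(z1)
z2 = W2.T @ x1
x2 = sigma(z2)

# Backward pass: errors flow from the top layer down,
# eps^l = sigma'(z^l) * (W^{l+1} eps^{l+1})
eps2 = x2 - rng.normal(size=4)       # top-layer error (target assumed random)
eps1 = sigma_grad(z1) * (W2 @ eps2)  # error propagated to the lower layer
```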
The training process above is repeated until the determiner 204 determines that the training process of the facial landmark detection has converged; in other words, the training process is deemed converged if the error is less than a predetermined value. Through this training process, the feature extractor 100 can extract the shared facial feature vector from a given face image. According to an embodiment, for any face image x^0, the trained feature extractor 100 extracts the shared feature vector x^l. The landmark positions are then predicted by (W^r)^T x^l, and the target predictions of the auxiliary tasks are obtained by p(y^a | x^l; W^a).
During the training process above, the at least one auxiliary task is trained simultaneously. However, different tasks have different loss functions and learning difficulties, and therefore different convergence rates. According to another embodiment, the determiner 204 can further determine whether the training process of an auxiliary task has converged.
Specifically, E_val^a and E_tr^a respectively denote the values of the loss function of task a on the validation set and on the training set. A task is stopped if its measure exceeds a threshold ε, as follows (the algebraic form is reconstructed here from the surrounding description):

[ k · med{E_tr^a(j)}_{j=t−k}^{t} / ( Σ_{j=t−k}^{t} E_tr^a(j) − k · min{E_tr^a(j)}_{j=t−k}^{t} ) ] × [ ( E_val^a(t) − min_{j≤t} E_val^a(j) ) / ( λ^a · min_{j≤t} E_val^a(j) ) ] > ε    (4)

In equation (4), t denotes the current iteration, k denotes the training length (window), and λ^a denotes the importance coefficient of the error of the a-th task; 'med' denotes the median function. The first factor in equation (4) represents the trend of the training error of task a. If the training error drops rapidly within a window of length k, the value of the first factor is small, indicating that training of the task can continue because the task is still valuable; otherwise the first factor is large and the task is more likely to be stopped. The second factor measures the generalization gap of task a relative to its best validation error so far. During training, an auxiliary task can thus be closed before it overfits, i.e., it is "stopped early" before it begins to overfit the training set and thereby harm the main task.
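The stopping test described above (a training-error trend over a window of length k, multiplied by a validation-based generalization measure) might be coded as follows; the exact algebraic form, threshold, and window are assumptions reconstructed from the text:

```python
import numpy as np

def should_stop(E_tr, E_val, lam=1.0, k=5, eps=0.5):
    """Early-stopping test for an auxiliary task a.

    E_tr, E_val: per-iteration training/validation losses of the task.
    Returns True when the task should be halted.
    """
    recent = np.asarray(E_tr[-k:])
    # First factor: trend of the training error over the last k iterations.
    trend = k * np.median(recent) / (np.sum(recent) - k * np.min(recent))
    # Second factor: generalization gap relative to the best validation loss.
    best = np.min(E_val)
    gap = (E_val[-1] - best) / (lam * best)
    return bool(trend * gap > eps)

# A task whose training loss has flattened while its validation loss rises:
flat_tr = [1.0, 0.99, 0.985, 0.984, 0.983, 0.982]
rising_val = [0.9, 0.85, 0.84, 0.86, 0.95, 1.1]
stop = should_stop(flat_tr, rising_val)

# A task still improving on both sets:
falling_tr = [1.0, 0.7, 0.5, 0.35, 0.25, 0.18]
falling_val = [1.0, 0.75, 0.55, 0.4, 0.3, 0.22]
go = should_stop(falling_tr, falling_val)
```

The flattened-and-overfitting task triggers the stop; the still-improving task does not, matching the behavior the text attributes to the determiner.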
Through the training process above, the feature extractor 100 can extract the shared facial feature vector from any face image. For example, a face image x^0 is fed into the input layer of the convolutional neural network, for example as shown in Fig. 3. Each convolutional layer of the CNN contains multiple sets of convolution filters and an activation function applied to the face image, and they are applied in sequence to project the face image to progressively higher levels. In other words, the shared facial feature vector x^l is obtained, and the face image projected to a higher level step by step, by learning a sequence of nonlinear mappings (reconstructed here from the surrounding description):

x^l = σ((W^l)^T x^{l−1}), l = 1, ..., L

Here, σ(·) and W^l respectively denote the nonlinear activation function applied to the face image x^{l−1} and the filters to be learned at layer l of the CNN; for example, σ(·) can be a hyperbolic tangent or rectified linear function. Referring again to Fig. 3, at the estimation stage, the shared facial feature vector can be used simultaneously for landmark detection and the auxiliary/related tasks.
It will be appreciated that system 1000 can be implemented with certain hardware, software, or a combination thereof. In addition, embodiments of the present invention may be adapted to a computer program product embodied on one or more computer-readable storage media (including, but not limited to, magnetic disk storage, CD-ROM, optical memory, and the like) containing computer program code.
When system 1000 is implemented in software, it may include a general-purpose computer, a computer cluster, a mainstream computer, a computing device dedicated to providing online content, or a computer network comprising a group of computers operating in a centralized or distributed fashion. As shown in Fig. 4, system 1000 may include one or more processors (processors 102, 104, 106, etc.), a memory 112, a storage device 116, and a bus facilitating information exchange among the various components of system 1000. Processors 102-106 may include a central processing unit ("CPU"), a graphics processing unit ("GPU"), or other suitable information processing devices. Depending on the type of hardware used, processors 102-106 may include one or more printed circuit boards and/or one or more microprocessor chips. Processors 102-106 can execute sequences of computer program instructions to perform the various methods explained in greater detail below.
The memory 112 may include, among other things, a random access memory ("RAM") and a read-only memory ("ROM"). Computer program instructions can be stored in, accessed from, and read out of the memory 112 for execution by one or more of the processors 102 to 106. For example, the memory 112 may store one or more software applications. Furthermore, the memory 112 may store an entire software application or only the part of a software application that is executable by one or more of the processors 102 to 106. It should be noted that, although only one block is shown in Fig. 4, the memory 112 may include multiple physical units installed on a central computing device or on different computing devices.
The system for facial landmark detection has been described above. A method for facial landmark detection is described below with reference to Fig. 5 and Fig. 6.
Fig. 5 shows a schematic flow diagram of facial landmark detection, and Fig. 6 shows a schematic flow diagram of the training process of the multi-task convolutional neural network performed by the training unit 200.
In Fig. 5 and Fig. 6, the methods 500 and 600 comprise a series of steps that may be executed by one or more of the processors 102 to 106, or by the respective modules/units of the system 1000, to carry out the data processing operations. For the purpose of description, the following discussion refers to the respective modules/units of the system 1000 as being formed by hardware or by a combination of hardware and software. Those of ordinary skill in the art will understand that other suitable devices or systems are applicable for carrying out the processes below, and that the system 1000 serves merely as an illustration.
As shown in Fig. 5, at step S501 the feature extractor 100 extracts a plurality of feature maps from at least one face region of a face image. In another embodiment, at step S501 the plurality of feature maps may be extracted from the entire face image. Subsequently, at step S502, a shared facial feature vector is generated from the feature maps extracted at step S501. At step S503, the facial landmark positions of the face image are predicted from the shared facial feature vector generated at step S502. According to another embodiment, the shared facial feature vector may also be used to predict the corresponding target of at least one auxiliary task associated with the facial landmark detection, such that the target predictions of all auxiliary tasks are obtained simultaneously.
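The prediction flow of steps S501 to S503 — a single shared feature vector feeding the landmark regressor and every auxiliary-task head at once — might be sketched as follows. All names, dimensions, and the use of softmax heads for the auxiliary tasks are illustrative assumptions, not details fixed by the embodiment.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logits vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_multitask(x_shared, W_r, task_heads):
    """From one shared feature vector, predict landmark coordinates (linear
    regression head) and all auxiliary-task targets simultaneously."""
    landmarks = W_r.T @ x_shared  # e.g. 5 landmarks -> 10 flattened coordinates
    aux = {name: softmax(W.T @ x_shared) for name, W in task_heads.items()}
    return landmarks, aux

rng = np.random.default_rng(1)
x_shared = rng.standard_normal(16)            # shared facial feature vector
W_r = rng.standard_normal((16, 10))           # landmark-detection head
heads = {"gender": rng.standard_normal((16, 2)),   # assumed auxiliary heads
         "pose": rng.standard_normal((16, 5))}
landmarks, aux = predict_multitask(x_shared, W_r, heads)
```

The key design point mirrored here is that all heads read the same vector, so the auxiliary tasks can shape the shared representation during training without adding per-task feature extractors at inference time.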
According to an embodiment of the present application, the feature extractor comprises a convolutional neural network that includes a plurality of convolution-pooling layers and a fully connected layer. Each of the convolution-pooling layers is configured to perform convolution and max-pooling operations. In this embodiment, at step S501, the plurality of feature maps may be extracted successively by the plurality of convolution-pooling layers, wherein the feature maps extracted by a preceding convolution-pooling layer are input to the next convolution-pooling layer, which extracts feature maps different from those previously extracted. At step S502, the shared facial feature vector may be generated by the fully connected layer from all of the feature maps extracted at step S501.
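A minimal sketch of one convolution-pooling stage follows, assuming a single-channel input, "valid" convolution, ReLU activation, and non-overlapping 2x2 max pooling — none of which are mandated by the embodiment.

```python
import numpy as np

def conv2d_valid(img, kern):
    """Naive 'valid' 2-D convolution (implemented as cross-correlation)."""
    H, W = img.shape
    kh, kw = kern.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kern)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling over size x size windows."""
    H, W = fmap.shape
    H, W = H - H % size, W - W % size
    f = fmap[:H, :W].reshape(H // size, size, W // size, size)
    return f.max(axis=(1, 3))

rng = np.random.default_rng(2)
face = rng.standard_normal((12, 12))                 # toy single-channel face region
kernels = [rng.standard_normal((3, 3)) for _ in range(4)]  # one set of filters
feature_maps = [max_pool(np.maximum(conv2d_valid(face, k), 0.0))
                for k in kernels]                    # ReLU then max-pool
```

Stacking such stages, with each stage consuming the previous stage's maps, is what lets later layers extract feature maps different from those previously extracted.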
In this embodiment, the method 500 further includes a training step (not shown in Fig. 5), which will be discussed with reference to Fig. 6.
As shown in Fig. 6, at step S601, a training face image, its ground-truth landmark positions, and its ground-truth targets for each auxiliary task are sampled from a predetermined training set. For the training face image, at step S602, the facial landmark predictions and the target predictions of all auxiliary tasks can be obtained accordingly from the predictor 300. Then, at step S603, the predicted facial landmark positions are compared with the ground-truth landmark positions to generate a landmark error. At step S604, the target prediction for each auxiliary task is compared with the corresponding ground-truth target to generate at least one training task error. Then, at step S605, the generated landmark error and all of the training task errors are back-propagated through the convolutional neural network so as to adjust the weights of the connections between the neurons of the convolutional neural network. At step S606, it is determined whether one of the auxiliary tasks has converged. If "No", the process 600 proceeds directly to step S608. If "Yes", the training of that task is halted at step S607 before the process proceeds to step S608. At step S608, it is determined whether the training process of the facial landmark detection has converged. If "Yes", the process 600 ends; otherwise, the process 600 returns to step S601.
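The training loop of steps S601 to S608 can be sketched on a toy tanh model standing in for the CNN. The learning rate, the squared-gradient convergence test, the synthetic data, and all shapes are illustrative assumptions; only the control flow — back-propagating the landmark error together with the task error, and halting an auxiliary task once it converges — follows the description above.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_pts, n_cls = 8, 10, 2                     # feature dim, landmark coords, classes
W_shared = 0.1 * rng.standard_normal((d, d))   # shared layer (stands in for the CNN)
W_r = 0.1 * rng.standard_normal((d, n_pts))    # landmark-detection head
W_a = 0.1 * rng.standard_normal((d, n_cls))    # one auxiliary-task head
task_active = True                             # becomes False at step S607
lr = 0.05

for step in range(200):
    # S601: sample a training face image with its ground truths (toy data here)
    x = rng.standard_normal(d)
    y_pts = 0.1 * rng.standard_normal(n_pts)
    y_cls = int(rng.integers(n_cls))
    # S602: obtain landmark and auxiliary-task predictions
    h = np.tanh(W_shared.T @ x)                # shared facial feature vector
    pred_pts = W_r.T @ h
    logits = W_a.T @ h
    p = np.exp(logits - logits.max()); p /= p.sum()
    # S603/S604: landmark error (least squares) and task error (cross entropy)
    g_pts = pred_pts - y_pts                   # gradient of the landmark error
    g_cls = p.copy(); g_cls[y_cls] -= 1.0      # gradient of the task error
    # S605: back-propagate both errors through the shared weights
    g_h = W_r @ g_pts + (W_a @ g_cls if task_active else 0.0)
    W_r -= lr * np.outer(h, g_pts)
    if task_active:
        W_a -= lr * np.outer(h, g_cls)
        # S606/S607: halt the auxiliary task once it (roughly) converges
        if float(g_cls @ g_cls) < 1e-3:
            task_active = False
    W_shared -= lr * np.outer(x, (1.0 - h ** 2) * g_h)
```

Once `task_active` is cleared, the auxiliary head stops contributing to the shared-weight gradient, which is the early-stopping behavior of steps S606 and S607.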
In this way, facial landmark detection can be optimized jointly with heterogeneous but subtly correlated tasks.
Although preferred examples of the present invention have been described, those skilled in the art, upon grasping the basic inventive concept, can make variations or modifications to these examples. The appended claims are intended to cover the preferred examples and all variations or modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make variations or modifications to the present invention without departing from its spirit and scope. Accordingly, if such variations or modifications fall within the scope of the claims and their technical equivalents, they are also intended to fall within the scope of the present invention.

Claims (15)

1. A method for detecting facial landmarks of a face image, comprising performing, with a convolutional neural network:
extracting a plurality of feature maps from at least one face region of the face image;
generating a shared facial feature vector from the extracted feature maps; and
predicting the facial landmark positions of the face image from the generated shared facial feature vector, and predicting, with the shared facial feature vector, the corresponding target of at least one auxiliary task associated with the facial landmark detection, so as to obtain the target predictions of all auxiliary tasks simultaneously,
wherein the convolutional neural network is trained with a predetermined training set, and the training comprises the steps of:
1) sampling, from the predetermined training set, a training face image, its ground-truth landmark positions, and its ground-truth target for each auxiliary task;
2) comparing the predicted facial landmark positions with the ground-truth landmark positions to generate a landmark error;
3) comparing, for each auxiliary task, the target prediction with the ground-truth target to generate at least one training task error; and
4) back-propagating the generated landmark error and the generated training task errors through the convolutional neural network so as to adjust the weights of the connections between the neurons of the convolutional neural network;
and repeating steps 1)-4) until the generated landmark error is less than a first predetermined value and the generated training task errors are less than a second predetermined value.
2. The method according to claim 1, wherein the facial landmarks include at least one of the group consisting of: the eye centers, the nose, and the mouth corners of the face image.
3. The method according to claim 1, wherein the auxiliary tasks include at least one of the group consisting of: head pose estimation, gender classification, age estimation, facial expression recognition, and facial attribute inference.
4. The method according to claim 3, wherein the convolutional neural network comprises a plurality of convolution-pooling layers configured to perform convolution and max-pooling operations, and
wherein extracting a plurality of feature maps from at least one face region of the face image further comprises:
extracting the plurality of feature maps successively by the plurality of convolution-pooling layers, wherein the feature maps extracted by a preceding convolution-pooling layer are input to the next convolution-pooling layer so as to extract feature maps different from those previously extracted.
5. The method according to claim 4, wherein the convolutional neural network further comprises a fully connected layer, and wherein, in generating the shared facial feature vector from the extracted feature maps, the shared facial feature vector is generated by the fully connected layer from all of the extracted feature maps.
6. The method according to claim 5, wherein each layer of the convolutional neural network has a plurality of neurons, and wherein the method further comprises:
training the convolutional neural network with a predetermined training set to adjust each weight of the connections between the neurons of the convolutional neural network, such that the shared facial feature vector is generated by the convolutional neural network with the adjusted weights.
7. The method according to claim 1, wherein the comparison for generating the landmark error is performed by a least-squares process, and the comparison for generating the training task errors is performed by a cross-entropy process.
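For illustration only, the two comparisons named in claim 7 might be sketched as follows, assuming a flattened landmark-coordinate vector and raw logits for an auxiliary classification task (both assumptions, not claim limitations):

```python
import numpy as np

def landmark_error(pred, truth):
    """Least-squares comparison of predicted and ground-truth landmark positions."""
    return 0.5 * np.sum((pred - truth) ** 2)

def task_error(logits, label):
    """Cross-entropy comparison of an auxiliary-task prediction with its label."""
    p = np.exp(logits - logits.max())   # stable softmax
    p /= p.sum()
    return -np.log(p[label])
```

The least-squares term suits the continuous landmark coordinates, while cross entropy suits the discrete auxiliary-task labels, which is why the two comparisons differ.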
8. The method according to claim 1, wherein, for each auxiliary task, the training further comprises the steps of:
5) sampling, from a predetermined validation set, a validation face image and its ground-truth target for each auxiliary task;
6) comparing the target prediction with the ground-truth target to generate a validation task error; and
repeating steps 5) and 6) until the generated training task error is less than a third predetermined value and the generated validation task error is less than a fourth predetermined value.
9. The method according to claim 1, wherein, in predicting the facial landmark positions of the face image from the generated shared facial feature vector, the facial landmark positions of the face image are determined according to (W^r)^T x^l,
wherein W^r denotes the weights assigned to the facial landmark detection, x^l denotes the shared facial feature vector, and T denotes transposition.
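The determination in claim 9 is a single linear map; a toy sketch with assumed dimensions (a 16-dimensional shared feature and five landmarks stored as ten flattened coordinates) follows:

```python
import numpy as np

rng = np.random.default_rng(4)
xl = rng.standard_normal(16)         # shared facial feature vector x^l (toy values)
Wr = rng.standard_normal((16, 10))   # weights W^r assigned to landmark detection
landmark_positions = Wr.T @ xl       # (W^r)^T x^l: five (x, y) pairs, flattened
```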
10. A system for detecting facial landmarks of a face image, comprising:
a feature extractor configured to perform, with a convolutional neural network:
extracting a plurality of feature maps from at least one face region of the face image; and
generating a shared facial feature vector from the extracted feature maps; and
a predictor configured to predict the facial landmark positions of the face image from the shared facial feature vector generated by the feature extractor, and to obtain, by using the shared facial feature vector, the target predictions of at least one auxiliary task associated with the facial landmark detection,
wherein each layer of the convolutional neural network has a plurality of neurons, and the system further comprises a training unit for training the convolutional neural network, such that the trained convolutional neural network is able to extract the shared facial feature vector, the training unit comprising:
a sampler configured to sample, from a predetermined training set, a training face image, its ground-truth landmark positions, and its ground-truth target for each auxiliary task;
a comparator configured to compare the predicted facial landmark positions with the ground-truth landmark positions to generate a landmark error, and to compare, for each auxiliary task, the target prediction with the ground-truth target to generate at least one training task error; and
a back-propagator configured to back-propagate the generated landmark error and the training task errors through the convolutional neural network so as to adjust the weights of the connections between the neurons of the convolutional neural network.
11. The system according to claim 10, wherein the convolutional neural network comprises:
a plurality of convolution-pooling layers configured to perform convolution and max-pooling operations, wherein the feature maps extracted by a preceding convolution-pooling layer are input to the next convolution-pooling layer so as to extract feature maps different from those previously extracted; and
a fully connected layer configured to generate the shared facial feature vector from all of the extracted feature maps.
12. The system according to claim 10, wherein the training unit further comprises:
a determiner configured to determine whether the training process of the facial landmark detection has converged and whether the training process of each task has converged.
13. The system according to claim 10, wherein the facial landmarks include at least one of the group consisting of: the eye centers, the nose, and the mouth corners of the face image.
14. The system according to claim 10, wherein the auxiliary tasks include at least one of the group consisting of: head pose estimation, gender classification, age estimation, facial expression recognition, and facial attribute inference.
15. A method for training a convolutional neural network that simultaneously performs facial landmark detection and at least one associated auxiliary task so as to obtain the target predictions of the auxiliary tasks, the method comprising the following steps:
1) sampling, from a predetermined training set, a training face image, its ground-truth landmark positions, and its ground-truth target for each auxiliary task;
2) comparing the predicted facial landmark positions with the ground-truth landmark positions to generate a landmark error;
3) comparing, for each auxiliary task, the target prediction with the ground-truth target to generate at least one training task error;
4) back-propagating the generated landmark error and all of the training task errors through the convolutional neural network so as to adjust the weights of the connections between the neurons of the convolutional neural network;
5) sampling, from a predetermined validation set, a validation face image and its ground-truth target for each auxiliary task;
6) comparing the target prediction with the ground-truth target to generate a validation task error; and
7) determining whether the training task error is less than a first predetermined value and whether the validation task error is less than a second predetermined value; and
if so, terminating the training of the convolutional neural network; otherwise, repeating steps 1) to 7).
CN201480081241.1A 2014-08-21 2014-08-21 Method and system for facial landmark detection based on multi-task Active CN106575367B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/000769 WO2016026063A1 (en) 2014-08-21 2014-08-21 A method and a system for facial landmark detection based on multi-task

Publications (2)

Publication Number Publication Date
CN106575367A CN106575367A (en) 2017-04-19
CN106575367B true CN106575367B (en) 2018-11-06

Family

ID=55350056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201480081241.1A Active CN106575367B (en) Method and system for facial landmark detection based on multi-task

Country Status (2)

Country Link
CN (1) CN106575367B (en)
WO (1) WO2016026063A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145857B (en) * 2017-04-29 2021-05-04 深圳市深网视界科技有限公司 Face attribute recognition method and device and model establishment method

Families Citing this family (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6750854B2 (en) * 2016-05-25 2020-09-02 キヤノン株式会社 Information processing apparatus and information processing method
CN105957095B (en) * 2016-06-15 2018-06-08 电子科技大学 A kind of Spiking angular-point detection methods based on gray level image
US10289822B2 (en) * 2016-07-22 2019-05-14 Nec Corporation Liveness detection for antispoof face recognition
US10467459B2 (en) 2016-09-09 2019-11-05 Microsoft Technology Licensing, Llc Object detection based on joint feature extraction
CN107871106B (en) * 2016-09-26 2021-07-06 北京眼神科技有限公司 Face detection method and device
JP6692271B2 (en) * 2016-09-28 2020-05-13 日本電信電話株式会社 Multitask processing device, multitask model learning device, and program
US10198626B2 (en) * 2016-10-19 2019-02-05 Snap Inc. Neural networks for facial modeling
US10460153B2 (en) 2016-11-15 2019-10-29 Futurewei Technologies, Inc. Automatic identity detection
CN106951840A (en) * 2017-03-09 2017-07-14 北京工业大学 A kind of facial feature points detection method
CN107038429A (en) * 2017-05-03 2017-08-11 四川云图睿视科技有限公司 A kind of multitask cascade face alignment method based on deep learning
CN106951888B (en) * 2017-05-09 2020-12-01 安徽大学 Relative coordinate constraint method and positioning method of human face characteristic point
CN107358149B (en) * 2017-05-27 2020-09-22 深圳市深网视界科技有限公司 Human body posture detection method and device
CN107578055B (en) * 2017-06-20 2020-04-14 北京陌上花科技有限公司 Image prediction method and device
CN108229288B (en) * 2017-06-23 2020-08-11 北京市商汤科技开发有限公司 Neural network training and clothes color detection method and device, storage medium and electronic equipment
CN107563279B (en) * 2017-07-22 2020-12-22 复旦大学 Model training method for adaptive weight adjustment aiming at human body attribute classification
US11341631B2 (en) 2017-08-09 2022-05-24 Shenzhen Keya Medical Technology Corporation System and method for automatically detecting a physiological condition from a medical image of a patient
CN107423727B (en) * 2017-08-14 2018-07-10 河南工程学院 Face complex expression recognition methods based on neural network
CN107704848A (en) * 2017-10-27 2018-02-16 深圳市唯特视科技有限公司 A kind of intensive face alignment method based on multi-constraint condition convolutional neural networks
CN108196535B (en) * 2017-12-12 2021-09-07 清华大学苏州汽车研究院(吴江) Automatic driving system based on reinforcement learning and multi-sensor fusion
CN108073910B (en) * 2017-12-29 2021-05-07 百度在线网络技术(北京)有限公司 Method and device for generating human face features
CN107992864A (en) * 2018-01-15 2018-05-04 武汉神目信息技术有限公司 A kind of vivo identification method and device based on image texture
CN110060296A (en) * 2018-01-18 2019-07-26 北京三星通信技术研究有限公司 Estimate method, electronic equipment and the method and apparatus for showing virtual objects of posture
CN108399373B (en) * 2018-02-06 2019-05-10 北京达佳互联信息技术有限公司 The model training and its detection method and device of face key point
EP3537348A1 (en) * 2018-03-06 2019-09-11 Dura Operating, LLC Heterogeneous convolutional neural network for multi-problem solving
CN108416314B (en) * 2018-03-16 2022-03-08 中山大学 Picture important face detection method
CN108615016B (en) * 2018-04-28 2020-06-19 北京华捷艾米科技有限公司 Face key point detection method and face key point detection device
US20210056292A1 (en) * 2018-05-17 2021-02-25 Hewlett-Packard Development Company, L.P. Image location identification
CN109147940B (en) * 2018-07-05 2021-05-25 科亚医疗科技股份有限公司 Apparatus and system for automatically predicting physiological condition from medical image of patient
CN109145798B (en) * 2018-08-13 2021-10-22 浙江零跑科技股份有限公司 Driving scene target identification and travelable region segmentation integration method
US11954881B2 (en) 2018-08-28 2024-04-09 Apple Inc. Semi-supervised learning using clustering as an additional constraint
CN109635750A (en) * 2018-12-14 2019-04-16 广西师范大学 A kind of compound convolutional neural networks images of gestures recognition methods under complex background
CN109522910B (en) * 2018-12-25 2020-12-11 浙江商汤科技开发有限公司 Key point detection method and device, electronic equipment and storage medium
CN109829431B (en) * 2019-01-31 2021-02-12 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN111563397B (en) * 2019-02-13 2023-04-18 阿里巴巴集团控股有限公司 Detection method, detection device, intelligent equipment and computer storage medium
CN109902641B (en) * 2019-03-06 2021-03-02 中国科学院自动化研究所 Semantic alignment-based face key point detection method, system and device
CN110163080A (en) 2019-04-02 2019-08-23 腾讯科技(深圳)有限公司 Face critical point detection method and device, storage medium and electronic equipment
CN110163098A (en) * 2019-04-17 2019-08-23 西北大学 Based on the facial expression recognition model construction of depth of seam division network and recognition methods
CN110136828A (en) * 2019-05-16 2019-08-16 杭州健培科技有限公司 A method of medical image multitask auxiliary diagnosis is realized based on deep learning
WO2021036726A1 (en) * 2019-08-29 2021-03-04 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method, system, and computer-readable medium for using face alignment model based on multi-task convolutional neural network-obtained data
CN110705419A (en) * 2019-09-24 2020-01-17 新华三大数据技术有限公司 Emotion recognition method, early warning method, model training method and related device
CN111339813B (en) * 2019-09-30 2022-09-27 深圳市商汤科技有限公司 Face attribute recognition method and device, electronic equipment and storage medium
CN111191675B (en) * 2019-12-03 2023-10-24 深圳市华尊科技股份有限公司 Pedestrian attribute identification model realization method and related device
WO2022003982A1 (en) * 2020-07-03 2022-01-06 日本電気株式会社 Detection device, learning device, detection method, and storage medium
KR102538804B1 (en) * 2020-11-16 2023-06-01 상명대학교 산학협력단 Device and method for landmark detection using artificial intelligence
CN112488003A (en) * 2020-12-03 2021-03-12 深圳市捷顺科技实业股份有限公司 Face detection method, model creation method, device, equipment and medium
CN112820382A (en) * 2021-02-04 2021-05-18 上海小芃科技有限公司 Breast cancer postoperative intelligent rehabilitation training method, device, equipment and storage medium
US11776323B2 (en) 2022-02-15 2023-10-03 Ford Global Technologies, Llc Biometric task network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831382A (en) * 2011-06-15 2012-12-19 北京三星通信技术研究有限公司 Face tracking apparatus and method
CN103824054A (en) * 2014-02-17 2014-05-28 北京旷视科技有限公司 Cascaded depth neural network-based face attribute recognition method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1352436A (en) * 2000-11-15 2002-06-05 星创科技股份有限公司 Real-time face identification system
JP4778158B2 (en) * 2001-05-31 2011-09-21 オリンパス株式会社 Image selection support device
CN101673340A (en) * 2009-08-13 2010-03-17 重庆大学 Method for identifying human ear by colligating multi-direction and multi-dimension and BP neural network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Growing Convolutional Neural Network and Its Application in Face Detection," Gu Jialing et al., Journal of System Simulation, vol. 21, no. 8, pp. 2441-2445, April 2009 *


Also Published As

Publication number Publication date
WO2016026063A1 (en) 2016-02-25
CN106575367A (en) 2017-04-19

Similar Documents

Publication Publication Date Title
CN106575367B (en) Method and system for facial landmark detection based on multi-task
Zhang et al. Fruit classification by biogeography‐based optimization and feedforward neural network
EP3074918B1 (en) Method and system for face image recognition
Hemanth et al. Performance improved iteration-free artificial neural networks for abnormal magnetic resonance brain image classification
Currie et al. Intelligent imaging in nuclear medicine: the principles of artificial intelligence, machine learning and deep learning
CN109902546A (en) Face identification method, device and computer-readable medium
CN108427921A (en) A kind of face identification method based on convolutional neural networks
CN114781272A (en) Carbon emission prediction method, device, equipment and storage medium
Khan et al. Human gait analysis: A sequential framework of lightweight deep learning and improved moth-flame optimization algorithm
CN111611877A (en) Age interference resistant face recognition method based on multi-temporal-spatial information fusion
Pranav et al. Detection and identification of COVID-19 based on chest medical image by using convolutional neural networks
Abd-Ellah et al. Parallel deep CNN structure for glioma detection and classification via brain MRI Images
Farahmand-Tabar et al. Steel Plate Fault Detection Using the Fitness-Dependent Optimizer and Neural Networks
McLaughlin et al. 3-D human pose estimation using iterative conditional squeeze and excitation networks
Zakeri et al. DragNet: Learning-based deformable registration for realistic cardiac MR sequence generation from a single frame
Trottier et al. Multi-task learning by deep collaboration and application in facial landmark detection
Yentrapragada Deep features based convolutional neural network to detect and automatic classification of white blood cells
Jahnavi et al. Detection of COVID-19 using ResNet50, VGG19, mobilenet, and forecasting; using logistic regression, prophet, and SEIRD Model
Cárdenas-Peña et al. Supervised kernel approach for automated learning using General Stochastic Networks
Goetschalckx et al. Computing a human-like reaction time metric from stable recurrent vision models
Kamabattula et al. Identifying the Training Stop Point with Noisy Labeled Data
Bhattacharjee et al. Active learning for imbalanced domains: the ALOD and ALOD-RE algorithms
Wu et al. Multi-rater Prism: Learning self-calibrated medical image segmentation from multiple raters
Farag et al. Inductive Conformal Prediction for Harvest-Readiness Classification of Cauliflower Plants: A Comparative Study of Uncertainty Quantification Methods
Hosamani et al. Data science: prediction and analysis of data using multiple classifier system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant