WO2017084098A1 - System and method for face alignment - Google Patents
- Publication number
- WO2017084098A1 (application PCT/CN2015/095197)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/755—Deformable models or variational models, e.g. snakes or active contours
- G06V10/7557—Deformable models or variational models, e.g. snakes or active contours based on appearance, e.g. active appearance models [AAM]
Abstract
The present invention discloses a system and method for face alignment. The method for face alignment may comprise: extracting a feature of a face image based on a predetermined face shape in the face image, estimating a shape residual for each of a plurality of predetermined domains by applying a regressor to the extracted feature, computing a regressed shape for each of the plurality of predetermined domains by adding the shape residual to the face shape, obtaining a feature for each domain based on the regressed shape, predicting a composition vector by using the obtained features, weighting the regressed shapes by using the predicted composition vector, and compositing the weighted regressed shapes to output a compositional shape.
Description
The present application relates to the technical field of pattern recognition, more particularly to a system and a method for face alignment.
Background of the Application
Face alignment aims to automatically localize facial parts, which are essential for many subsequent processing modules, e.g., face recognition, attributes prediction, and robust face frontalisation.
The study of face alignment has made rapid progress in recent years. Unconstrained face alignment beyond frontally biased faces is an emerging research topic. However, existing methods cannot properly handle faces with unconstrained variations.
For example, the supervised descent method (SDM) is a representative method among the mainstream approaches. As shown in Fig. 1(a), even when the approach is retrained on the AFLW dataset, which provides a good example of images typically found in unconstrained scenarios, its effective scope is confined to frontally biased faces, and it has difficulty covering the enlarged shape parameter space caused by large head rotations and face deformations due to rich expressions. Xiong and De la Torre made the same observation: a cascaded regressor such as the SDM is only effective within a specific domain of homogeneous descent (DHD) (see X. Xiong and F. De la Torre. Global supervised descent method. In CVPR, 2015).
An intuitive multi-view approach has also been proposed, in which head poses are first estimated, followed by face alignment for the corresponding view. Although it yields a performance improvement, as shown in Fig. 1(b), the heuristic partitioning with respect to head pose alone is still suboptimal because it neglects other shape deformations or appearance variations, e.g. a large mouth, a large face scale, or sunglasses. Moreover, this approach assumes independence between the different view models without considering their inter-complementary and regularizing roles. Hence, errors in head pose estimation can easily be propagated and amplified into the final shape estimate, reducing the overall robustness.
The above approaches demonstrate the difficulties of covering a wider range of shape and appearance variations beyond frontal faces, both with a single model and multiple models.
There is therefore a need for a practical approach to address the problems of unconstrained face alignment.
Summary of the Application
The present application intends to provide an effective and efficient approach for unconstrained face alignment. It does not rely on 3D face modelling or 3D annotations, and makes no assumption about the pose range. It can comfortably deal with arbitrary view poses and rich expressions in the full AFLW dataset. In addition, the alignment is achieved on a single image without the need for a temporal prior. The present application achieves this by using cascaded compositional learning.
One aspect of the present application discloses a method for face alignment which may comprise: extracting a feature of a face image based on a predetermined face shape in the face image, estimating a shape residual for each of a plurality of predetermined domains by applying a regressor to the extracted feature, computing a regressed shape for each of the plurality of predetermined domains by adding the shape residual to the face shape, obtaining a feature for each domain based on the regressed shape, predicting a composition vector by using the obtained features, weighting the regressed shapes by using the predicted composition vector, and compositing the weighted regressed shapes to output a compositional shape.
According to an embodiment of the present application, extracting the feature may comprise: traversing a region surrounding each of at least one landmark of the predetermined face shape to each tree of a predetermined decision forest until a leaf node is reached for each tree, obtaining a vector for each of the landmarks, the vector indicating the reached leaf node of the tree, and combining the vectors for the landmarks to output the extracted feature.
According to an embodiment of the present application, obtaining the feature for each domain may comprise: using the vector for each of the landmarks to obtain the feature for each domain.
According to an embodiment of the present application, predicting the composition vector may comprise: predicting the composition vector by inputting the obtained feature into a predetermined composition forest.
According to an embodiment of the present application, the method may further comprise training the predetermined decision forest by using a Hough forest approach to minimize a structured loss of the predetermined decision forest.
According to an embodiment of the present application, the structured loss of the predetermined decision forest is minimized by regressing the difference between the predetermined face shape and a preset shape for each of the at least one landmark of the predetermined face shape.
According to an embodiment of the present application, the method may further comprise training the regressor by linear regression learning.
According to an embodiment of the present application, the method may further comprise training the predetermined composition forest by minimizing a discrepancy between the compositional shape and a preset shape.
According to an embodiment of the present application, a domain is excluded if the composition vector is zero for the domain.
Another aspect of the present application discloses an apparatus for face alignment which may comprise an extracting means for extracting a feature of a face image based on a predetermined face shape in the face image, an estimating means for
estimating a shape residual for each of a plurality of predetermined domains by applying a regressor to the extracted feature, a computing means for computing a regressed shape for each of the plurality of predetermined domains by adding the shape residual to the face shape, an obtaining means for obtaining a feature for each domain based on the regressed shape, a predicting means for predicting a composition vector by using the obtained features, a weighting means for weighting the regressed shapes by using the predicted composition vector, and a compositing means for compositing the weighted regressed shapes to output a compositional shape.
According to an embodiment of the present application, the extracting means may comprise: a traversing sub-means for traversing a region surrounding each of at least one landmark of the predetermined face shape to each tree of a predetermined decision forest until a leaf node is reached for each tree, an obtaining sub-means for obtaining a vector for each of the landmarks, the vector indicating the reached leaf node of the tree, and a combining sub-means for combining the vector for each of the landmarks to output the extracted feature.
According to an embodiment of the present application, the obtaining sub-means may use the vector for each of the landmarks to obtain the feature for each domain.
According to an embodiment of the present application, the predicting means may predict the composition vector by inputting the obtained feature into a predetermined composition forest.
According to an embodiment of the present application, the apparatus may further comprise a decision forest training means for training the predetermined decision forest by using a Hough forest approach to minimize a structured loss of the predetermined decision forest.
According to an embodiment of the present application, the structured loss of the predetermined decision forest may be minimized by regressing the difference
between the predetermined face shape and a preset shape for each of the at least one landmark of the predetermined face shape.
According to an embodiment of the present application, the apparatus may further comprise a regressor training means for training the regressor by linear regression learning.
According to an embodiment of the present application, the apparatus may further comprise a composition forest training means for training the predetermined composition forest by minimizing a discrepancy between the compositional shape and a preset shape.
Yet another aspect of the present application discloses a system for face alignment which may comprise a processor and a memory, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to: extract a feature of a face image based on a predetermined face shape in the face image, estimate a shape residual for each of a plurality of predetermined domains by applying a regressor to the extracted feature, compute a regressed shape for each of the plurality of predetermined domains by adding the shape residual to the face shape, obtain a feature for each domain based on the regressed shape, predict a composition vector by using the obtained features, weight the regressed shapes by using the predicted composition vector, and composite the weighted regressed shapes to output a compositional shape.
Still another aspect of the present application discloses a non-volatile computer storage medium, storing computer-readable instructions which when executed by a processor, cause the processor to: extract a feature of a face image based on a predetermined face shape in the face image, estimate a shape residual for each of a plurality of predetermined domains by applying a regressor to the extracted feature, compute a regressed shape for each of the plurality of predetermined domains by adding the shape residual to the face shape, obtain a feature for each domain based on the regressed shape, predict a composition vector by using the obtained features, weight the
regressed shapes by using the predicted composition vector, and composite the weighted regressed shapes to output a compositional shape.
Brief Description of the Drawing
Other features, objects and advantages of the present application will become more apparent from a reading of the detailed description of the non-limiting embodiments, said description being given in relation to the accompanying drawings, among which:
Fig. 1 illustrates test error distributions of two existing approaches on the AFLW dataset, in which two factors, yaw and mouth size, are selected to visualize the distributions, with representative facial images in five regions (I-V);
Fig. 2 illustrates an exemplary flow chart of a method for face alignment according to an embodiment of the present application;
Fig. 3 illustrates an exemplary flowchart of extracting a feature for a face image according to an embodiment of the present application;
Fig. 4 illustrates an exemplary flowchart of obtaining a regressed domain specific shape according to an embodiment of the present application;
Fig. 5 illustrates an exemplary flowchart of predicting a compositional shape according to an embodiment of the present application;
Fig. 6 illustrates a schematic block diagram of an apparatus for face alignment according to an embodiment of the present application; and
Fig. 7 illustrates a schematic structural diagram of a computer system that is adapted for implementing the method and the apparatus for face alignment according to an embodiment of the present application.
The present application will be further described in detail in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are provided to illustrate the present invention,
instead of limiting the present invention. It also should be noted that only parts related to the present invention are shown in the figures for convenience of description.
It should be noted that the embodiments of the present application and the features therein may, on a non-conflict basis, be combined with each other. The present application will be further described in detail below in conjunction with the accompanying drawings and embodiments.
Fig. 2 illustrates an exemplary flow chart of a method for face alignment according to an embodiment of the present application.
At step 100, a feature of a face image is extracted. In a non-limiting example, a binary feature is obtained for each landmark on the face image. The binary features for all the landmarks are subsequently combined to form the feature of the face image.
At step 200, regressed domain specific shapes of the face image are obtained. For each domain, an estimated shape residual is obtained by using the feature of the face image. The estimated shape residual is added to the predetermined shape s of the face image to compute the regressed domain specific shapes.
At step 300, a compositional shape for the face image is predicted. For each domain, a feature is obtained by using the extracted feature of step 100. The feature for each domain is inputted into a composition forest to predict a composition vector. The domain specific shape for each domain is then weighted by the composition vector. All the weighted domain specific shapes are aggregated to obtain a compositional shape of the face image.
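The data flow of the three steps above can be sketched as a single function. This is an illustrative Python sketch, not part of the disclosure; `align_face`, `predict_composition`, and the array shapes are hypothetical stand-ins for the trained components:

```python
import numpy as np

def align_face(phi, s, regressors, predict_composition):
    """One pass of the pipeline sketched above.

    phi:    extracted feature of the face image (step 100), a 1-D array
    s:      predetermined face shape, flattened landmark coordinates
    regressors: one linear regressor (matrix) per predetermined domain
    predict_composition: stand-in for the trained composition forest
    """
    # Step 200: regressed domain-specific shapes, one per domain.
    domain_shapes = [s + phi @ w_k for w_k in regressors]
    # Step 300: predict the composition vector, weight, and aggregate.
    p = predict_composition(phi, domain_shapes)
    return sum(p_k * s_k for p_k, s_k in zip(p, domain_shapes))

# Toy run: zero regressors and a uniform composition leave the shape unchanged.
phi = np.ones(3)
s = np.array([1.0, 2.0])
regressors = [np.zeros((3, 2)), np.zeros((3, 2))]
out = align_face(phi, s, regressors, lambda phi, shapes: [0.5, 0.5])
print(out)  # → [1. 2.]
```

With non-trivial regressors and a learned composition forest, the same data flow produces the compositional shape of step 300.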
Fig. 3 illustrates an exemplary flowchart of extracting a feature for a face image according to an embodiment of the present application.
At step 110, a sample, i.e. a region surrounding each landmark l, is traversed through each tree of a predetermined decision forest until a leaf node is reached for each tree, to obtain a binary vector which indicates whether each leaf node of the tree is reached (1 when reached and 0 otherwise). The dimensionality of the binary vector equals the total number of leaves in the decision forest, and the number of 1s in the vector equals the total number of trees in the forest.
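The traversal of step 110 can be illustrated with a toy decision forest. This is a hedged sketch: the tuple-based tree encoding and the names `traverse` and `landmark_feature` are hypothetical, and a real implementation would use the trained forest's own node tests:

```python
import numpy as np

# A toy tree node is either ("split", a, b, thr, left, right), which compares
# two pixel intensities of the patch, or ("leaf", leaf_id).
def traverse(tree, patch):
    """Route the patch through the tree and return the reached leaf id."""
    node = tree
    while node[0] == "split":
        _, a, b, thr, left, right = node
        node = left if patch[a] - patch[b] < thr else right
    return node[1]

def landmark_feature(patch, forest, leaves_per_tree):
    """Binary leaf-indicator vector for one landmark: exactly one 1 per tree,
    so the number of 1s equals the number of trees and the vector length
    equals the total number of leaves in the forest."""
    vec = np.zeros(len(forest) * leaves_per_tree, dtype=np.uint8)
    for t, tree in enumerate(forest):
        vec[t * leaves_per_tree + traverse(tree, patch)] = 1
    return vec

# Toy forest: two depth-1 trees with two leaves each.
tree = ("split", 0, 1, 0.0, ("leaf", 0), ("leaf", 1))
patch = np.array([0.2, 0.5])            # pixel intensities around a landmark
vec = landmark_feature(patch, [tree, tree], leaves_per_tree=2)
print(vec)  # → [1 0 1 0]
```

Concatenating such vectors over all landmarks yields the face-level feature of step 120.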
For each landmark, the decision forest can be trained using a Hough forest approach to minimize a structured loss by simultaneously minimizing a landmark regression residual and classifying the facial part against the background. The landmark regression residual is defined as the difference between the predetermined face shape s and a ground-truth shape s* for each landmark. The ground-truth shape s* is preset.
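The regression part of such a structured split criterion can be sketched as choosing, for a scalar pixel feature, the threshold that most reduces the squared landmark-offset residuals in the two children. This is an illustrative sketch only; a Hough forest additionally mixes in a classification term, which is omitted here:

```python
import numpy as np

def best_split_threshold(values, residuals):
    """Greedy split selection: pick the threshold on a scalar feature that
    minimises the sum of squared deviations of the landmark regression
    residuals within the two resulting children."""
    order = np.argsort(values)
    v, r = values[order], residuals[order]
    best_sse, best_thr = np.inf, None
    for i in range(1, len(v)):
        left, right = r[:i], r[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_thr = sse, (v[i - 1] + v[i]) / 2.0
    return best_thr

# Two clusters of residuals are perfectly separated at threshold 6.0.
values = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
residuals = np.array([0.0, 0.0, 0.0, 5.0, 5.0, 5.0])
print(best_split_threshold(values, residuals))  # → 6.0
```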
At step 120, the binary vectors for all the landmarks are combined, i.e. concatenated, to form the extracted feature for the face image.
Fig. 4 illustrates an exemplary flowchart of obtaining a regressed domain specific shape according to an embodiment of the present application.
At step 210, for each domain k, a shape residual Δsk is estimated by applying a domain-specific regressor ωk to the extracted feature. The K domains may be defined by partitioning all training samples into K subsets. For example, all samples may be partitioned according to the principal components of shape and local appearance. Each component halves the samples, and hence K is always a power of 2. It is worth pointing out that head pose is not the only underlying factor for the partition. By observing the mean face of each domain, it has been observed that some domains are dominated by a shape deformation or appearance property, e.g. a wide-open mouth, large facial scale, large face contour, or faces with sunglasses. All domains share the same feature mapping.
For each domain k, the domain-specific regressor ωk may be learned by linear regression learning.
At step 220, the regressed domain specific shape sk is computed by adding the shape residual Δsk to the predetermined face shape s, i.e., sk = s + Δsk (k = 1, …, K).
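Steps 210 and 220 can be sketched with synthetic data. This is illustrative only; `Phi`, `true_W`, and the use of plain least squares are assumptions, since the disclosure only states that each ωk is learned by linear regression learning:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, shape_dim, K = 200, 32, 10, 4

Phi = rng.random((n, d))                        # stand-in for extracted features
true_W = [rng.normal(size=(d, shape_dim)) for _ in range(K)]
s = rng.normal(size=shape_dim)                  # predetermined face shape

# Learn each domain-specific regressor by ordinary least squares on that
# domain's residual targets (here synthetic, exactly linear in the feature).
regressors = []
for k in range(K):
    targets = Phi @ true_W[k]                   # residuals for domain k
    w_k, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    regressors.append(w_k)

# Step 220: the regressed shape of every domain is s plus its residual.
shapes = [s + Phi[0] @ w_k for w_k in regressors]
```

Because the synthetic targets are exactly linear in the features, least squares recovers each regressor, and every `shapes[k]` equals s plus that domain's residual.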
Fig. 5 illustrates an exemplary flowchart of predicting a compositional shape according to an embodiment of the present application.
At step 310, a feature for each domain k is obtained. The previously learned feature mapping is used to obtain the feature for each domain k.
At step 320, the regressed domain specific shape sk and the feature for the domain are inputted into a predetermined composition forest f' to predict a composition vector p.
The predetermined composition forest f' may be trained by minimizing the discrepancy between the compositional shape s' and the ground-truth shape s*. The composition vector p is a meaningful quantitative description of the domains. For example, two incompatible domains (e.g. the left and right profile-view domains) should not co-occur in a composition. Each composition weight is also non-negative, so that it provides a valid shape contribution. The composition vector p is estimated after Δsk so that it can directly exploit the local appearance. This makes it possible to handle faces in the unconstrained scenario while still only extracting fast pixel features throughout an embodiment of the present application.
At step 330, the domain specific shape sk is weighted by the composition vector p.
At step 340, the weighted domain specific shapes sk are aggregated to output the compositional shape s', i.e., s' = Σk pk·sk.
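Steps 330 and 340 amount to a weighted sum of the domain-specific shapes. A minimal sketch follows; in the described method the weights p come from the composition forest rather than being hand-set as here:

```python
import numpy as np

def compose_shape(domain_shapes, p):
    """Weight each regressed domain-specific shape sk by its (non-negative)
    composition weight pk and sum them. A domain whose weight is zero is
    effectively excluded from the composition."""
    return np.asarray(p, dtype=float) @ np.stack(domain_shapes)

# Two compatible domains contribute equally; an incompatible third domain
# (e.g. the opposite profile view) receives weight 0 and drops out.
s1 = np.array([1.0, 0.0])
s2 = np.array([0.0, 1.0])
s3 = np.array([9.0, 9.0])
print(compose_shape([s1, s2, s3], [0.5, 0.5, 0.0]))  # → [0.5 0.5]
```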
Fig. 6 illustrates a schematic block diagram of an apparatus for face alignment according to an embodiment of the present application.
As shown in Fig. 6, the apparatus for face alignment 2000 comprises a feature extraction unit 2100, a domain specific regression unit 2200 and a composition prediction unit 2300.
The feature extraction unit 2100 is used for extracting a feature of a face image. The face image and a predetermined shape of the face image are inputted into the feature extraction unit 2100, and the feature of the face image is outputted. In the feature extraction unit 2100, a sample, i.e. a region surrounding each landmark l, is traversed through each tree of a predetermined decision forest until a leaf node is reached for each tree, to obtain a binary vector which indicates whether each leaf node of the tree is reached (1 for reached and 0 otherwise). The dimensionality of the binary vector equals the total number of leaves in the decision forest, and the number of 1s in the vector equals the total number of trees in the forest. The decision forest can be trained as described above. The feature extraction unit 2100 combines the binary vectors for all the landmarks to form the extracted feature for the face image.
The domain specific regression unit 2200 is used for obtaining regressed domain specific shapes of the face image. The extracted feature of the face image is inputted into the domain specific regression unit 2200, and the regressed domain specific shapes are outputted. In the domain specific regression unit 2200, a shape residual Δsk is estimated for each domain k by applying a domain-specific regressor ωk. The K domains may be defined by partitioning all training samples into K subsets. The domain specific regression unit 2200 then computes the regressed domain specific shape sk by adding the shape residual Δsk to the predetermined face shape s.
The composition prediction unit 2300 is used for predicting a compositional shape for the face image. The regressed domain specific shapes are inputted into the composition prediction unit 2300, and the compositional shape for the face image is outputted. In the composition prediction unit 2300, a feature for each domain k is obtained. The feature mapping may be determined in the feature extraction unit 2100. The composition prediction unit 2300 then inputs the regressed domain specific shape sk and the feature for the domain into a predetermined composition forest f' to predict a composition vector p. The predetermined composition forest f' may be trained by minimizing the discrepancy between the compositional shape s' and the ground-truth shape s*. The composition prediction unit 2300 weights the domain specific shapes sk by using the composition vector p and aggregates the weighted shapes to output the compositional shape s'.
It should be understood that the units or sub-units described in the apparatus for face alignment 2000 correspond to the steps of the method described above with reference to the flow chart. Therefore, the operations and characteristics described above with reference to the method also apply to the apparatus for face alignment 2000 and the units thereof, and thus will not be repeated herein.
Referring now to Fig. 7, a schematic structural diagram of a computer system 3000 that is adapted for implementing the method and the apparatus for face alignment according to an embodiment of the present application is shown.
As shown in Fig. 7, the computer system 3000 comprises a central processing unit (CPU) 3001, which may perform a variety of appropriate actions and processes according to a program stored in a read only memory (ROM) 3002 or a program loaded to a random access memory (RAM) 3003 from a storage part 3008. RAM 3003 also stores various programs and data required by operations of the system 3000. CPU 3001, ROM 3002 and RAM 3003 are connected to each other via a bus 3004. An input/output (I/O) interface 3005 is also connected to the bus 3004.
The following components are connected to the I/O interface 3005: an input part 3006 comprising a keyboard, a mouse and the like; an output part 3007 comprising a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker and the like; the storage part 3008 comprising a hard disk and the like; and a communication part 3009 comprising a network interface card, such as a LAN card, a modem and the like. The communication part 3009 performs communication processes via a network,
such as the Internet. A driver 3010 is also connected to the I/O interface 3005 as required. A removable medium 3011, such as a magnetic disk, an optical disk, a magneto-optical disk and a semiconductor memory, may be installed onto the driver 3010 as required, so as to install a computer program read therefrom to the storage part 3008 as needed.
In particular, according to an embodiment of the present disclosure, the method described above with reference to Figs. 2 to 5 may be implemented as a computer software program. For example, an embodiment of the present disclosure comprises a computer program product, which comprises a computer program tangibly embodied in a machine-readable medium. The computer program comprises program code for executing the method shown in Figs. 2 to 5. In such an embodiment, the computer program may be downloaded from a network via the communication part 3009 and installed, and/or installed from the removable medium 3011.
The flow charts and the block diagrams in the figures illustrate the system architectures, functions, and operations that may be achieved by the systems, devices, methods, and computer program products according to various embodiments of the present application. In this regard, each block of the flow charts or the block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions denoted in the blocks may occur in a different sequence from that marked in the figures. For example, two blocks denoted in succession may be performed substantially in parallel, or in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or the flow charts, and any combination thereof, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units or modules involved in the embodiments of the present application may be implemented in hardware or software. The described units or
modules may also be provided in a processor. The names of these units or modules do not limit the units or modules themselves.
As another aspect, the present application further provides a computer readable storage medium, which may be a computer readable storage medium contained in the device described in the above embodiments, or a computer readable storage medium that exists separately without being fitted into any terminal apparatus. One or more computer programs may be stored on the computer readable storage medium, and the programs are executed by one or more processors to perform the method for face alignment described in the present application.
The above description covers only the preferred embodiments of the present application and an explanation of the principles of the applied techniques. It will be appreciated by those skilled in the art that the scope of the solutions claimed in the present application is not limited to solutions consisting of the particular combinations of features described above, but also covers other solutions formed by any combination of the foregoing features or their equivalents without departing from the inventive concept, for example, a solution formed by replacing one or more of the features discussed above with one or more features having similar functions disclosed (but not limited to) in the present application.
Claims (28)
- A method for face alignment, comprising:
extracting a feature of a face image based on a predetermined face shape in the face image;
estimating a shape residual for each of a plurality of predetermined domains by applying a regressor to the extracted feature;
computing a regressed shape for each of the plurality of predetermined domains by adding the shape residuals to the face shape;
obtaining a feature for each domain based on the regressed shape;
predicting a composition vector by using the obtained features;
weighting the regressed shapes by using the predicted composition vector; and
compositing the weighted regressed shapes to output a compositional shape.
- The method of claim 1, wherein extracting the feature comprises: traversing a region surrounding each of at least one landmark of the predetermined face shape to each tree of a predetermined decision forest until a leaf node is reached for each tree; obtaining a vector for each of the landmarks, the vector indicating the reached leaf node of the tree; and combining the vector for each of the landmarks to output the extracted feature.
- The method of claim 2, wherein obtaining the feature for each domain comprises: using the vector for each of the landmarks to obtain the feature for each domain.
- The method of claim 1, wherein predicting the composition vector comprises: predicting the composition vector by inputting the obtained feature into a predetermined composition forest.
- The method of claim 1, further comprising training the predetermined decision forest by using a Hough forest approach to minimize a structured loss of the predetermined decision forest.
- The method of claim 5, wherein the structured loss of the predetermined decision forest is minimized by regressing the difference between the predetermined face shape and a preset shape for each of the at least one landmark of the predetermined face shape.
- The method of claim 1, further comprising training the regressor by linear regression learning.
- The method of claim 4, further comprising training the predetermined composition forest by minimizing a discrepancy between the compositional shape and a preset shape.
- The method of claim 1, wherein a domain is excluded if the composition vector is zero for the domain.
- An apparatus for face alignment, comprising:
an extracting means for extracting a feature of a face image based on a predetermined face shape in the face image;
an estimating means for estimating a shape residual for each of a plurality of predetermined domains by applying a regressor to the extracted feature;
a computing means for computing a regressed shape for each of the plurality of predetermined domains by adding the shape residual to the face shape;
an obtaining means for obtaining a feature for each domain based on the regressed shape;
a predicting means for predicting a composition vector by using the obtained features;
a weighting means for weighting the regressed shapes by using the predicted composition vector; and
a compositing means for compositing the weighted regressed shapes to output a compositional shape.
- The apparatus of claim 10, wherein the extracting means comprises:a traversing sub-means for traversing a region surrounding each of at least one landmark of the predetermined face shape to each tree of a predetermined decision forest until a leaf node is reached for each tree;an obtaining sub-means for obtaining a vector for each of the landmarks, the vector indicating the reached leaf node of the tree; anda combining sub-means for combining the vector for each of the landmarks to output the extracted feature.
- The apparatus of claim 11, wherein the obtaining sub-means uses the vector for each of the landmarks to obtain the feature for each domain.
- The apparatus of claim 10, wherein the predicting means predicts the composition vector by inputting the obtained feature into a predetermined composition forest.
- The apparatus of claim 10, further comprising a decision forest training means for training the predetermined decision forest by using a Hough forest approach to minimize a structured loss of the predetermined decision forest.
- The apparatus of claim 14, wherein the structured loss of the predetermined decision forest is minimized by regressing the difference between the predetermined face shape and a preset shape for each of the at least one landmark of the predetermined face shape.
- The apparatus of claim 10, further comprising a regressor training means for training the regressor by linear regression learning.
- The apparatus of claim 13, further comprising a composition forest training means for training the predetermined composition forest by minimizing a discrepancy between the compositional shape and a preset shape.
- The apparatus of claim 10, wherein a domain is excluded if the composition vector is zero for the domain.
- A system for face alignment, comprising: a processor; and a memory; the memory storing computer-readable instructions which when executed by the processor, cause the processor to: extract a feature of a face image based on a predetermined face shape in the face image; estimate a shape residual for each of a plurality of predetermined domains by applying a regressor to the extracted feature; compute a regressed shape for each of the plurality of predetermined domains by adding the shape residual to the face shape; obtain a feature for each domain based on the regressed shape; predict a composition vector by using the obtained features; weight the regressed shapes by using the predicted composition vector; and composite the weighted regressed shapes to output a compositional shape.
- The system of claim 19, wherein extracting the feature comprises: traversing a region surrounding each of at least one landmark of the predetermined face shape to each tree of a predetermined decision forest until a leaf node is reached for each tree; obtaining a vector for each of the landmarks, the vector indicating the reached leaf node of the tree; and combining the vector for each of the landmarks to output the extracted feature.
- The system of claim 20, wherein obtaining the feature for each domain comprises: using the vector for each of the landmarks to obtain the feature for each domain.
- The system of claim 19, wherein predicting the composition vector comprises: predicting the composition vector by inputting the obtained feature into a predetermined composition forest.
- The system of claim 19, wherein the processor is further configured to train the predetermined decision forest by using a Hough forest approach to minimize a structured loss of the predetermined decision forest.
- The system of claim 23, wherein the structured loss of the predetermined decision forest is minimized by regressing the difference between the predetermined face shape and a preset shape for each of the at least one landmark of the predetermined face shape.
- The system of claim 19, wherein the processor is further configured to train the regressor by linear regression learning.
- The system of claim 22, wherein the processor is further configured to train the predetermined composition forest by minimizing a discrepancy between the compositional shape and a preset shape.
- The system of claim 19, wherein the processor is further configured to exclude a domain if the composition vector is zero for the domain.
- A non-volatile computer storage medium, storing computer-readable instructions which when executed by a processor, cause the processor to: extract a feature of a face image based on a predetermined face shape in the face image; estimate a shape residual for each of a plurality of predetermined domains by applying a regressor to the extracted feature; compute a regressed shape for each of the plurality of predetermined domains by adding the shape residual to the face shape; obtain a feature for each domain based on the regressed shape; predict a composition vector by using the obtained features; weight the regressed shapes by using the predicted composition vector; and composite the weighted regressed shapes to output a compositional shape.
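The method, apparatus, system, and medium claims all recite the same regression-and-composition pipeline: extract a feature, regress a per-domain shape residual, weight each regressed shape by a predicted composition vector (excluding zero-weight domains, as in claim 9), and composite the result. As a rough illustration only, with all callables and data as hypothetical stand-ins rather than the claimed implementation, the flow might be sketched as:

```python
# Illustrative sketch of the claimed pipeline. The feature extractor,
# per-domain regressors, and composition predictor are stand-in lambdas,
# not the disclosed decision/composition forests.
import numpy as np

def align(face_shape, extract, regressors, predict_composition):
    feature = extract(face_shape)
    # one regressed shape per predetermined domain: shape + residual
    regressed = [face_shape + reg(feature) for reg in regressors]
    weights = predict_composition(regressed)          # composition vector
    # composite the weighted shapes, excluding zero-weight domains
    kept = [(w, s) for w, s in zip(weights, regressed) if w != 0]
    total = sum(w for w, _ in kept)
    return sum(w * s for w, s in kept) / total

# Toy usage: 3 domains over a 4-point shape.
shape = np.zeros((4, 2))
extract = lambda s: s.ravel()
regressors = [lambda f, d=d: np.full((4, 2), d) for d in (0.1, 0.2, 0.3)]
predict = lambda shapes: np.array([0.5, 0.0, 0.5])   # middle domain excluded
out = align(shape, extract, regressors, predict)
print(out)  # every point at (0.2, 0.2): average of the 0.1 and 0.3 shapes
```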
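The leaf-indicator feature extraction recited in claims 2, 11, and 20 can be illustrated with a toy forest of depth-1 trees: each landmark's surrounding patch is routed down every tree until a leaf is reached, the reached leaf is encoded as a one-hot vector, and the per-landmark vectors are concatenated. The pixel-difference split test and patch encoding below are assumptions for the sketch, not taken from the disclosure:

```python
# Hypothetical sketch of the leaf-indicator feature of claims 2/11/20.
import numpy as np

class Stump:
    """A depth-1 tree: one pixel-difference split, two leaves."""
    def __init__(self, offset_a, offset_b, threshold):
        self.offset_a = offset_a      # (row, col) offsets inside the patch
        self.offset_b = offset_b
        self.threshold = threshold

    def leaf_index(self, patch):
        return 0 if patch[self.offset_a] - patch[self.offset_b] < self.threshold else 1

def extract_feature(patches, forest, leaves_per_tree=2):
    """Concatenate one-hot leaf indicators over all landmarks and trees."""
    parts = []
    for patch in patches:             # one patch per landmark
        for tree in forest:
            one_hot = np.zeros(leaves_per_tree)
            one_hot[tree.leaf_index(patch)] = 1.0
            parts.append(one_hot)
    return np.concatenate(parts)

# Toy usage: 2 landmarks, a forest of 2 stumps.
forest = [Stump((0, 0), (1, 1), 0.0), Stump((0, 1), (1, 0), 0.0)]
patches = [np.array([[0.9, 0.2], [0.1, 0.4]]),
           np.array([[0.3, 0.8], [0.7, 0.5]])]
feature = extract_feature(patches, forest)
print(feature.shape)  # (8,) = 2 landmarks x 2 trees x 2 leaves
```

Such sparse indicator vectors are what makes a fast linear regressor on top of the forest practical: applying the regressor reduces to summing a few columns of its weight matrix.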
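Claims 7, 16, and 25 train the shape-residual regressor by linear regression learning. A minimal sketch with synthetic data, where the dimensions and the ridge regularization term are illustrative assumptions rather than details from the disclosure, could look like:

```python
# Minimal linear-regression training sketch for the shape-residual
# regressor (claims 7/16/25): fit a linear map W from features to
# ground-truth residuals via regularized least squares on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n_samples, feat_dim, shape_dim = 200, 32, 10   # e.g. 5 landmarks x (x, y)

X = rng.normal(size=(n_samples, feat_dim))          # extracted features
W_true = rng.normal(size=(feat_dim, shape_dim))
Y = X @ W_true + 0.01 * rng.normal(size=(n_samples, shape_dim))  # residuals

lam = 1e-3  # small ridge term keeps the normal equations well conditioned
W = np.linalg.solve(X.T @ X + lam * np.eye(feat_dim), X.T @ Y)

def estimate_residual(feature, W=W):
    """Apply the learned regressor to a feature vector (claim 1's step)."""
    return feature @ W

err = np.linalg.norm(W - W_true) / np.linalg.norm(W_true)
print(f"relative weight error: {err:.4f}")
```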
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201580085696.5A CN108701206B (en) | 2015-11-20 | 2015-11-20 | System and method for facial alignment |
PCT/CN2015/095197 WO2017084098A1 (en) | 2015-11-20 | 2015-11-20 | System and method for face alignment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2015/095197 WO2017084098A1 (en) | 2015-11-20 | 2015-11-20 | System and method for face alignment |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017084098A1 true WO2017084098A1 (en) | 2017-05-26 |
Family
ID=58717266
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2015/095197 WO2017084098A1 (en) | 2015-11-20 | 2015-11-20 | System and method for face alignment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108701206B (en) |
WO (1) | WO2017084098A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3340109A1 (en) * | 2016-12-25 | 2018-06-27 | Facebook, Inc. | Shape prediction for face alignment |
US10019651B1 (en) | 2016-12-25 | 2018-07-10 | Facebook, Inc. | Robust shape prediction for face alignment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140185924A1 (en) * | 2012-12-27 | 2014-07-03 | Microsoft Corporation | Face Alignment by Explicit Shape Regression |
CN104318264A (en) * | 2014-10-14 | 2015-01-28 | 武汉科技大学 | Facial feature point tracking method based on human eye preferential fitting |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5294343B2 (en) * | 2008-06-10 | 2013-09-18 | 国立大学法人東京工業大学 | Image alignment processing device, area expansion processing device, and image quality improvement processing device |
CN104050628B (en) * | 2013-03-11 | 2017-04-12 | 佳能株式会社 | Image processing method and image processing device |
- 2015-11-20 CN CN201580085696.5A patent/CN108701206B/en active Active
- 2015-11-20 WO PCT/CN2015/095197 patent/WO2017084098A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN108701206B (en) | 2022-04-12 |
CN108701206A (en) | 2018-10-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108229296B (en) | Face skin attribute identification method and device, electronic equipment and storage medium | |
US11200424B2 (en) | Space-time memory network for locating target object in video content | |
CN111192292B (en) | Target tracking method and related equipment based on attention mechanism and twin network | |
CN110969250B (en) | Neural network training method and device | |
US11443445B2 (en) | Method and apparatus for depth estimation of monocular image, and storage medium | |
US20190279014A1 (en) | Method and apparatus for detecting object keypoint, and electronic device | |
US20200151849A1 (en) | Visual style transfer of images | |
CN108399383B (en) | Expression migration method, device storage medium, and program | |
US8958630B1 (en) | System and method for generating a classifier for semantically segmenting an image | |
WO2020119458A1 (en) | Facial landmark detection method and apparatus, computer device and storage medium | |
US20210192271A1 (en) | Method and Apparatus for Pose Planar Constraining on the Basis of Planar Feature Extraction | |
EP4322056A1 (en) | Model training method and apparatus | |
CN108230354B (en) | Target tracking method, network training method, device, electronic equipment and storage medium | |
Murtaza et al. | Face recognition using adaptive margin fisher’s criterion and linear discriminant analysis | |
CN110570435B (en) | Method and device for carrying out damage segmentation on vehicle damage image | |
CN113343982B (en) | Entity relation extraction method, device and equipment for multi-modal feature fusion | |
CN113128478B (en) | Model training method, pedestrian analysis method, device, equipment and storage medium | |
CN112861659B (en) | Image model training method and device, electronic equipment and storage medium | |
EP2927864A1 (en) | Image processing device and image processing method | |
EP2851867A2 (en) | Method and apparatus for filtering an image | |
WO2022152104A1 (en) | Action recognition model training method and device, and action recognition method and device | |
CN115861462B (en) | Training method and device for image generation model, electronic equipment and storage medium | |
CN109255382B (en) | Neural network system, method and device for picture matching positioning | |
CN113112518A (en) | Feature extractor generation method and device based on spliced image and computer equipment | |
CN113505797A (en) | Model training method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15908594 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 15908594 Country of ref document: EP Kind code of ref document: A1 |