WO2016183834A1 - An apparatus and a method for locating facial landmarks of face image

An apparatus and a method for locating facial landmarks of face image

Info

Publication number
WO2016183834A1
Authority
WO
WIPO (PCT)
Prior art keywords
face image
region
shapes
unit
sub
Application number
PCT/CN2015/079429
Other languages
French (fr)
Inventor
Xiaoou Tang
Shizhan ZHU
Cheng Li
Chen Change Loy
Original Assignee
Xiaoou Tang
Application filed by Xiaoou Tang
Priority to PCT/CN2015/079429 (WO2016183834A1)
Priority to CN201580080396.8A (CN107615295B)
Publication of WO2016183834A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/755 Deformable models or variational models, e.g. snakes or active contours
    • G06V10/7557 Deformable models or variational models, e.g. snakes or active contours based on appearance, e.g. active appearance models [AAM]


Abstract

The present invention discloses a system, an apparatus and a method for locating facial landmarks of a face image. The method for locating facial landmarks of a face image comprises retrieving a set of candidate shapes respectively from a predetermined shape region, each of the candidate shapes having labeled facial landmarks; aligning each of the retrieved candidate shapes with the face image to obtain corresponding aligned shapes; determining, according to the aligned shapes obtained in a current stage of two or more stages, a sub-region of the shape region to select a set of candidate shapes therefrom to be retrieved at a next stage following the current stage; and repeating the retrieving, the aligning and the determining for the stages to locate the facial landmarks of the face image. With the present method and system, the final solution can be prevented from being trapped in local optima due to the poor initialization encountered by cascaded regression approaches, and the robustness in coping with large pose variations can be improved.

Description

AN APPARATUS AND A METHOD FOR LOCATING FACIAL LANDMARKS OF FACE IMAGE
Technical Field
The disclosures relate to face alignment, in particular, to a method, an apparatus and a system for locating facial landmarks of a face image.
Background
Face alignment aims at locating facial key points automatically. Among the many different approaches for face alignment, the cascaded regression approach has emerged as one of the most popular. The algorithm typically starts from an initial shape, e.g., a mean shape of training samples, and refines the shape through sequentially trained regressors.
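For contrast with the coarse-to-fine search described later, the following is a minimal sketch of the conventional cascaded-regression loop described above; the names (mean_shape, regressors, extract_features) are illustrative assumptions rather than anything defined in this document.

```python
import numpy as np

def cascaded_regression(image, mean_shape, regressors, extract_features):
    """Conventional cascade: start from a mean shape and refine it with
    sequentially trained regressors, each mapping appearance features
    (indexed by the current shape) to an additive shape update."""
    shape = np.array(mean_shape, dtype=float)   # (2n,) initial shape
    for reg in regressors:                      # sequentially trained regressors
        features = extract_features(image, shape)
        shape = shape + reg(features)           # refine the shape estimate
    return shape
```

Because every update is relative to the current estimate, a poor starting shape propagates through the whole cascade, which is the shortcoming discussed next.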
However, the cascaded regression approach has a widely acknowledged shortcoming: its dependence on initialization. In particular, if the initialized shape is far from the target shape, it is unlikely that the discrepancy will be completely rectified by subsequent iterations in the cascade. As a consequence, the final solution may be trapped in local optima. Existing methods often circumvent this problem by adopting heuristic assumptions or strategies that mitigate the problem to a certain extent but do not fully resolve it.
All the aforementioned methods assume the initial shape is provided in some form, typically a mean shape. The mean shape is used under the assumption that the test samples are distributed close to the mean pose of the training samples. This assumption does not always hold, especially for faces with large pose variations. Cao et al. propose to run the algorithm several times using different initializations and take the median of all predictions as the final output. Burgos-Artizzu et al. improve the strategy with a smart restart method, but it requires cross-validation to determine a threshold and the number of runs. In general, these strategies mitigate the problem to some extent but still do not fully eliminate the dependence on shape initialization. Zhang et al. propose to obtain the initialization by predicting a rough estimation from the global image patch, still followed by sequentially trained auto-encoder regression networks.
Summary
The application aims to address at least one or more of the above problems of face alignment. The method according to the present application begins with a coarse search over a shape space that contains diverse shapes, and employs the coarse solution to constrain the subsequent finer search of shapes, that is, a “coarse-to-fine” approach. The unique stage-by-stage progressive and adaptive search can i) prevent the final solution from being trapped in local optima due to poor initialization, a common problem encountered by cascaded regression approaches; and ii) improve the robustness in coping with large pose variations.
In addition, the apparatus according to the present application proposes a hybrid feature setting to achieve practical speed. Owing to the unique error tolerance of the coarse-to-fine searching mechanism, the apparatus is capable of switching between different types of regression features in different optimization stages without sacrificing much accuracy.
In an aspect, disclosed is a method for locating facial landmarks of a face image. The method may comprise: retrieving a set of candidate shapes respectively from a predetermined shape region, each of the candidate shapes having labeled facial landmarks; aligning each of the retrieved candidate shapes with the face image to obtain corresponding aligned shapes; determining, according to the aligned shapes obtained in a current stage of two or more stages, a sub-region of the shape region to select a set of candidate shapes therefrom to be retrieved at a next stage following the current stage; and repeating the retrieving, the aligning and the determining for the stages to locate the facial landmarks of the face image.
In another aspect, disclosed is an apparatus for locating facial landmarks of a face image. The apparatus may comprise: a retrieving unit for retrieving a set of candidate shapes from a predetermined shape region in one or more sequential stages, each of the candidate shapes having pre-labeled facial landmarks; an aligning unit being electronically communicated with the retrieving unit and aligning each of the retrieved candidate shapes with the face image to obtain corresponding aligned shapes; and a determining unit being electronically communicated with the aligning unit and determining, according to the aligned shapes obtained in a current stage of the stages, a sub-region of the shape region to select a set of candidate shapes therefrom to be retrieved at a next stage following the current stage.
In another aspect, disclosed is a system for locating facial landmarks of a face image. The system may comprise: a memory for storing executable components and a processor being electrically coupled to the memory to execute the executable components to perform operations of the system, wherein the executable components comprise a retrieving component for retrieving a set of candidate shapes from a predetermined shape region in one or more sequential stages, each of the candidate shapes having pre-labeled facial landmarks; an aligning component for aligning each of the retrieved candidate shapes with the face image to obtain corresponding aligned shapes; and a determining component for determining, according to the aligned shapes obtained in a current stage of the stages, a sub-region of the shape region to select a set of candidate shapes therefrom to be retrieved at a next stage following the current stage.
Brief Description of the Drawing
Exemplary non-limiting embodiments of the present invention are described below with reference to the attached drawings. The drawings are illustrative and generally not to an exact scale. The same or similar elements on different figures are referenced with the same reference numbers.
Fig. 1 illustrates an apparatus for locating facial landmarks of a face image according to one embodiment of the present application.
Fig. 2 illustrates a schematic block diagram of the determining unit according to an embodiment of the present application.
Fig. 3 illustrates a method for locating facial landmarks of a face image according to one embodiment of the present application.
Fig. 4 illustrates a schematic flow of the determining step of the method according to one embodiment of the present application.
Fig. 5 is a diagram illustrating a process for selecting sub-regions in three stages, which is visualized in 2D, according to one embodiment of the present application.
Fig. 6 is an example in which the method for locating facial landmarks is  performed during three stages according to one embodiment of the present application.
Fig. 7 illustrates a system for locating facial landmarks of a face image according to one embodiment of the present application, in which the functions of the present invention are carried out by the software.
Detailed Description
Reference will now be made in detail to some specific embodiments of the invention including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the invention is described in conjunction with these specific embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. In other instances, well-known process operations have not been described in detail in order not to unnecessarily obscure the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a" , "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising, " when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Hereinafter, shape space refers to a 2n dimensional linear space, where n refers to the number of landmarks. Shapes in the shape space represent (x, y) coordinates of the n facial landmarks. Sub-region refers to a subset of the shape space, instead of the spatial notion of face region.
Fig. 1 illustrates an apparatus 1000 for locating facial landmarks of a face image according to one embodiment of the present application. As shown, the apparatus 1000 comprises a retrieving unit 100, an aligning unit 200 and a determining unit 300. With the apparatus according to the present application, the locations of facial landmarks of a face, such as eye pupils or mouth corners, can be automatically detected.
As shown in Fig. 1, the retrieving unit 100 may be configured to retrieve a set of candidate shapes from a predetermined shape region in one or more sequential stages, each of the candidate shapes having pre-labeled facial landmarks. In an embodiment, the candidate shapes are obtained from a set pre-processed by Procrustes analysis. The shape space S is fixed throughout the whole process.
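A rough numpy sketch of the Procrustes pre-processing mentioned above, building a fixed candidate shape space S from training shapes; the choice of reference shape and the similarity-alignment details are assumptions for illustration, not specifics taken from this document.

```python
import numpy as np

def procrustes_align(shape, reference):
    """Align one shape (n, 2) to a reference (n, 2): remove translation
    and scale, then apply the optimal rotation (Procrustes / Kabsch)."""
    a = np.asarray(shape, dtype=float)
    b = np.asarray(reference, dtype=float)
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    a /= np.linalg.norm(a)
    b /= np.linalg.norm(b)
    u, _, vt = np.linalg.svd(a.T @ b)
    return a @ (u @ vt)                      # rotation R = U V^T

def build_shape_space(training_shapes):
    """Return the candidate shape space S as an (N, 2n) array, with every
    training shape Procrustes-normalized against the mean shape."""
    ref = np.mean(np.asarray(training_shapes, dtype=float), axis=0)
    aligned = [procrustes_align(s, ref) for s in training_shapes]
    return np.stack(aligned).reshape(len(aligned), -1)
```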
The aligning unit 200 may be electronically communicated with the retrieving unit 100. The aligning unit 200 may be configured to align each of the retrieved candidate shapes with the face image to obtain corresponding aligned shapes. In an embodiment, the aligning unit 200 may further extract facial features from the face image and map the extracted facial features to a shape residual by using at least one regressor, so that the aligned shapes are obtained by using the shape residual. Different numbers and different types of facial features can be extracted in different stages. For example, the SIFT (Scale Invariant Feature Transform) feature is used in all stages to obtain the best accuracy. In an implementation, the BRIEF (Binary Robust Independent Elementary Features) feature is used in the first two stages and the SIFT feature is used in the last stage. It is understood that the present application is not limited thereto; the features can be any known features.
The determining unit 300 may be electronically communicated with the aligning unit 200 and configured to determine, according to the aligned shapes obtained in a current stage of the stages, a sub-region of the shape region to select a set of candidate shapes therefrom to be retrieved at a next stage following the current stage. According to an embodiment, the determining unit 300 may further comprise a center inferring unit 301 and a suitability inferring unit 302, which are shown in Fig. 2 and will be described later in detail.
Fig. 3 illustrates a method 2000 for locating facial landmarks of a face image according to one embodiment of the present application. Fig. 4 illustrates a schematic flow of the determining performed by the determining unit 300. The configurations and functions of the elements of the apparatus 1000 and the processes of the method 2000 will be described in detail with reference to Figs. 1-4.
As shown in Fig. 3, at step S100, a set of candidate shapes may be retrieved respectively from a predetermined shape region, each of the candidate shapes having labeled facial landmarks. At step S200, each of the retrieved candidate shapes may be aligned with the face image to obtain corresponding aligned shapes. At S300, according to the aligned shapes obtained in a current stage of two or more stages at step S200, a sub-region of the shape region is determined to select a set of candidate shapes therefrom to be retrieved at a next stage following the current stage.
Then, at S400, it is determined whether steps S100 to S300 have finished for all the stages. In an embodiment, the process finishes when a predetermined number of stages have been completed. Note that the present application is not limited thereto; any known method in the art may be used. If yes at step S400, the process ends and the center of the sub-region inferred at a last stage of the stages is determined as the located facial landmarks of the face image, which will be described later. If no, the process proceeds to step S100. The method 2000 begins with a coarse search over the shape space that contains diverse shapes and employs the coarse results to constrain the subsequent finer search of shapes. With the method 2000, the facial landmarks of the face image can be located accurately.
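Putting steps S100 to S400 together, a high-level sketch of the stage loop; retrieve_candidates, align_shapes and determine_subregion are assumed stand-ins for the units 100 to 300 described above.

```python
def coarse_to_fine_search(image, shape_space, num_stages,
                          retrieve_candidates, align_shapes, determine_subregion):
    """Stage-by-stage search: retrieve (S100), align (S200), determine the
    next sub-region (S300), repeat until all stages finish (S400)."""
    sub_region = shape_space                    # the first stage searches the whole space
    center = None
    for stage in range(num_stages):
        candidates = retrieve_candidates(sub_region, stage)       # S100
        aligned = align_shapes(image, candidates, stage)          # S200
        center, sub_region = determine_subregion(aligned, image)  # S300
    return center     # center of the last sub-region = located landmarks
```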
Hereinafter, an example in which N candidate shapes are retrieved from the shape space and denoted as S = {s1, s2, ..., sN} (N>>2n) during l = 1, ..., L stages will be described. Fig. 6 illustrates an exemplary embodiment in which the method 2000 is performed during three stages according to one embodiment. From Fig. 6, it can be seen that the problem of the landmarks on the nose and mouth being trapped in local optima due to poor initialization in the prior art can be overcome by the coarse-to-fine shape searching method. In an implementation of the method according to the present application, 35 fps real-time performance is achieved on a single core of an i5-4590. Compared with the conventional cascaded regression, the estimation error is only 12.04. It is understood that the embodiment is only exemplary and the present application is not limited thereto.
The candidate shapes in S are obtained from the predetermined shape space. At the first stage, a set of Nl candidate shapes sj (l) , j=1, 2, ..., Nl, is retrieved from the shape space S randomly, for example based on a uniform distribution.
The aligning unit 200 may align the Nl candidate shapes with the face image over several iterations. For iteration k=1, 2, ..., K, local appearance patterns φ (I, x̂j (l, k-1) ) are computed as a feature f. Then, the feature f is mapped to a shape residual Δx=Mreg (k) (f) by using the Kl regressors reg (k) . With K iterations, the aligned shape x̂j (l) , j=1, 2, ..., is obtained by
x̂j (l, k) = x̂j (l, k-1) + Mreg (k) (φ (I, x̂j (l, k-1) ) ) , with x̂j (l, 0) = sj (l) and x̂j (l) = x̂j (l, K) .
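A sketch of the per-candidate alignment iterations just described; extract_features and the trained regressors stand in for φ and Mreg (k) and are assumed interfaces rather than anything specified here.

```python
import numpy as np

def align_candidates(image, candidates, regressors, extract_features):
    """Align each retrieved candidate shape with the face image by K
    regression steps: x_hat <- x_hat + M_reg(phi(image, x_hat))."""
    aligned = []
    for s in candidates:                         # s: (2n,) candidate shape
        x_hat = np.array(s, dtype=float)
        for reg in regressors:                   # the K regressors of this stage
            f = extract_features(image, x_hat)   # local appearance patterns
            x_hat = x_hat + reg(f)               # add the predicted shape residual
        aligned.append(x_hat)
    return np.stack(aligned)                     # (N_l, 2n) aligned shapes
```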
After the aligned shapes are obtained, the center inferring unit 301 may infer the center of the sub-region of the shape space. In the lth stage, the sub-region of the shape space is represented by the pair (x (l) , P (l) ) , where x (l) represents the center of the sub-region and P (l) = {pi} represents the suitability probability that defines the scope of the sub-region around the center x (l) .
According to one embodiment, the center of the sub-region is determined by combining linearly all the aligned shapes for collectively inferring the sub-region center as below:
x (l) = Σj wj x̂j (l)     (1)
In the equation (1) , a weight vector w is used. The weight vector may be determined by adopting a dominant set approach. More precisely, an undirected graph G = {V, E} is constructed, where the weight of each edge in E is represented by an affinity defined as below:
apq = exp (-‖x̂p (l) - x̂q (l) ‖² / σ²)     (2)
An affinity matrix A is formed by arranging all the elements apq in matrix form, and the diagonal elements of A are set to zero to avoid self-loops. The weight vector is then obtained by the following replicator dynamics iteration, for t=1, ..., T:
w (t+1) = (w (t) ο (A w (t) ) ) / ( (w (t) ) ᵀ A w (t) )     (3)
where ο denotes element-wise vector multiplication, and w (t) denotes the weight vector at the tth iteration.
From this, the weight vector can be determined. Unlike the conventional approach in which all the aligned shapes are averaged with fixed weights, the susceptibility to a small number of erroneous aligned shapes caused by local optima can be suppressed.
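A numpy sketch of the dominant-set weighting of equations (1) to (3); sigma and the number of replicator iterations are assumed hyper-parameters, not values taken from this document.

```python
import numpy as np

def infer_subregion_center(aligned_shapes, sigma=1.0, num_iters=20):
    """Weight the aligned shapes with replicator dynamics on their affinity
    graph and return the weighted combination as the sub-region center."""
    X = np.asarray(aligned_shapes, dtype=float)            # (N_l, 2n)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-d2 / sigma ** 2)                           # affinities a_pq, eq. (2)
    np.fill_diagonal(A, 0.0)                               # no self-loops
    w = np.full(len(X), 1.0 / len(X))                      # uniform start
    for _ in range(num_iters):                             # replicator dynamics, eq. (3)
        w = w * (A @ w)
        w /= w.sum() + 1e-12
    return w @ X                                           # center x (l), eq. (1)
```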
After the center inferring unit 301 infers the center of sub-region by the equation (1) as above, the determining unit 300 can determine the sub-region accordingly, so that a set of candidate shapes will be retrieved from the sub-region according to the suitability probability.
According to another embodiment, the suitability inferring unit 302 may infer, according to the inferred center of the sub-region and the local appearance patterns of the face image, a suitability probability of each candidate shape being suitable to the face image, to determine the sub-region of the shape region. In an embodiment, the suitability inferring unit 302 is further configured to calculate, according to the determined center of the sub-region, an adjustable probability that delineates the scope to be searched near the center; to calculate, according to the local appearance patterns of the face image, a facial part similarity probability for a plurality of facial parts of the face image; and to obtain the suitability by multiplying the adjustable probability and the facial part similarity probability.
In particular, for the center of sub-region x (l) and the shape space {si} , the adjustable probability pi is calculated by the following equation:
pi ∝ exp (- (si - x (l) ) ᵀ Σ⁻¹ (si - x (l) ) / 2)     (4)
The adjustable probability aims to approximately delineate the retrieving scope near x (l) and typically the suitability is more concentrated for the later stages.
In addition, the facial part similarity probability pi is calculated based on local appearance patterns φ extracted from the face image by the following equation:
pi ∝ Πr p (si, r | φr)     (5)
where si, r denotes the landmarks of the facial part r in the candidate shape si and φr denotes the local appearance patterns of that part. The latter term of equation (5) is represented by a discriminative mapping (Hough regression voting) , and the shape is divided into different facial parts r. The facial part similarity probability aims to guide shapes to move towards a more plausible shape region by separately considering the local appearance of each facial part.
Then, the suitability is calculated by multiplying the adjustable probability inferred by equation (4) and the facial part similarity probability inferred by equation (5) as above.
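A sketch of combining equations (4) and (5) into the suitability and using it to retrieve the next stage's candidates; part_similarity stands in for the discriminative (Hough voting) term and sigma_diag for the learned diagonal covariance, both assumed interfaces.

```python
import numpy as np

def suitability(shape_space, center, sigma_diag, part_similarity):
    """p_i = adjustable probability (eq. 4) * facial part similarity (eq. 5),
    normalized over all candidate shapes s_i in the shape space."""
    S = np.asarray(shape_space, dtype=float)                 # (N, 2n)
    diff = S - center
    p_adjust = np.exp(-0.5 * np.sum(diff ** 2 / sigma_diag, axis=1))
    p_part = np.array([part_similarity(s) for s in S])       # product over facial parts
    p = p_adjust * p_part
    return p / p.sum()

def retrieve_next_stage(shape_space, p, num_candidates, rng=None):
    """Select the next stage's candidate shapes from the sub-region
    according to the suitability probability p."""
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(shape_space), size=num_candidates, replace=False, p=p)
    return np.asarray(shape_space)[idx]
```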
After these processes have been repeated through all the L stages, the center of the sub-region at the last stage is determined as the final shape; that is, the coordinates of the facial landmarks of the face image are determined accurately.
In the above, the method for locating the facial landmarks of the face image has been described with reference to Figs. 1-4. The processes of inferring the center of the sub-region x (l) and inferring the suitability P (l) may be trained by a training algorithm, which is summarized in Table 1.
Table 1 - Training algorithm of the coarse-to-fine shape searching
In a training procedure, the center of the sub-region x (l) for the lth stage is trained with the suitability probability given. In particular, each candidate shape sj (l) , j=1, 2, ..., is regressed to a shape closer to a ground-truth shape x* . For iteration k=1, 2, ..., K, the local appearance information φ (I, x̂j (l, k-1) ) is first computed as the feature; then, the regressors Mreg (k) are trained by:
Mreg (k) = argminM Σj ‖ (x* - x̂j (l, k-1) ) - M (φ (I, x̂j (l, k-1) ) ) ‖²     (6)
finally, x̂j (l, k-1) is updated by
x̂j (l, k) = x̂j (l, k-1) + Mreg (k) (φ (I, x̂j (l, k-1) ) )
to obtain the aligned shapes x̂j (l) , j=1, 2, ....
Then, for the ith training sample, the sub-region center xi (l) is trained by the following equation:
xi (l) = Σj wi, j x̂i, j (l)     (7)
For the weight vector wi, an undirected graph is constructed whose vertices are the aligned shapes. Each edge in the edge set is weighted by an affinity defined as in equation (2) , and the affinities form the affinity matrix Ai. Then, the weight vector wi is optimized by the following equation:
wi = argmaxw wᵀ Ai w, subject to Σj wj = 1 and wj ≥ 0     (8)
which is the dominant set formulation and can be solved by the replicator dynamics iteration of equation (3) .
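A sketch of fitting one stage regressor of equation (6) with regularized linear least squares on shape residuals; the feature layout and the ridge parameter are assumptions, and in practice each of the K regressors is fit on the shapes produced by the previous one.

```python
import numpy as np

def train_stage_regressor(features, current_shapes, gt_shapes, ridge=1.0):
    """Fit a linear regressor mapping features to shape residuals x* - x_hat.
    features: (M, d); current_shapes, gt_shapes: (M, 2n)."""
    F = np.asarray(features, dtype=float)
    targets = np.asarray(gt_shapes, dtype=float) - np.asarray(current_shapes, dtype=float)
    # ridge-regularized least squares: W = (F^T F + ridge * I)^-1 F^T targets
    W = np.linalg.solve(F.T @ F + ridge * np.eye(F.shape[1]), F.T @ targets)
    return lambda f: np.asarray(f, dtype=float) @ W     # feature vector -> delta x
```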
In another training procedure, the suitability P (l) is trained given the center of the sub-region x (l) for the lth stage.
For the adjustable probability pi as represented by equation (4) , the covariance matrix is learned from the ground-truth shape x* and the center of the sub-region x (l) : Σ is the covariance matrix of x (l) - x* over all training samples and is restricted to be diagonal.
For the facial part similarity probability pi as represented by equation (5) , the shape is divided into different facial parts; for a facial part r, the discriminative mapping p (si, r | φr) is learned by Hough regression voting.
Then, the suitability probability P (l) = {pi} is trained by combining the two terms as in the testing stage:
pi ∝ exp (- (si - x (l) ) ᵀ Σ⁻¹ (si - x (l) ) / 2) · Πr p (si, r | φr)     (9)
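A small sketch of the covariance learning for the adjustable probability: the diagonal of Σ is estimated from x (l) - x* over the training samples, as stated above; the eps floor is an added assumption to keep the variances positive.

```python
import numpy as np

def learn_diagonal_covariance(stage_centers, gt_shapes, eps=1e-6):
    """Estimate the diagonal covariance of x(l) - x* over all training
    samples; only per-coordinate variances are kept (Sigma is diagonal)."""
    diff = np.asarray(stage_centers, dtype=float) - np.asarray(gt_shapes, dtype=float)
    return diff.var(axis=0) + eps                 # (2n,) diagonal of Sigma
```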
As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment or an embodiment combining software and hardware aspects, which may all generally be referred to herein as a “unit” , “circuit” , “module” or “system” . Much of the inventive functionality and many of the inventive principles, when implemented, are best supported with or in integrated circuits (ICs) , such as a digital signal processor and software therefor, or application specific ICs. It is expected that one of ordinary skill, notwithstanding possibly significant effort and many design choices motivated by, for example, available time, current technology, and economic considerations, when guided by the concepts and principles disclosed herein will be readily capable of generating ICs with minimal experimentation. Therefore, in the interest of brevity and minimization of any risk of obscuring the principles and concepts according to the present invention, further discussion of such software and ICs, if any, will be limited to the essentials with respect to the principles and concepts used by the preferred embodiments.
In addition, the present invention may take the form of an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware. Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer-usable program code embodied in the medium. Fig. 7 illustrates a system 3000 for locating facial landmarks of a face image according to one embodiment of the present application, in which the functions of the present invention are carried out by the software. Referring to Fig. 7, the system 3000 comprises a memory 3001 that stores executable components and a processor 3002, electrically coupled to the memory 3001 to execute the executable components to perform operations of the system 3000. The executable components may comprise: a retrieving component 3003 for retrieving a set of candidate shapes from a predetermined shape region in one or more sequential stages, each of the candidate shapes having pre-labeled facial landmarks; an aligning component 3004 for aligning each of the retrieved candidate shapes with the face image to obtain corresponding aligned shapes; and a determining component 3005 for determining, according to the aligned shapes obtained in a current stage of the stages, a sub-region of the shape region to select a set of candidate shapes therefrom to be retrieved at a next stage following the current stage. The functions of the components 3003 to 3005 are similar to those of the units 100 to 300, respectively, and thus the detailed descriptions thereof are omitted herein.
Although the preferred examples of the present invention have been described, those skilled in the art can make variations or modifications to these examples upon knowing the basic inventive concept. The appended claims are intended to be considered as comprising the preferred examples and all the variations or modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make variations or modifications to the present invention without departing from the spirit and scope of the present invention. As such, if these variations or modifications belong to the scope of the claims and equivalent techniques, they are also intended to fall within the scope of the present invention.

Claims (20)

  1. A method for locating facial landmarks of a face image, comprising:
    retrieving a set of candidate shapes respectively from a predetermined shape region, each of the candidate shapes having labeled facial landmarks;
    aligning each of the retrieved candidate shapes with the face image to obtain corresponding aligned shapes;
    determining, according to the aligned shapes obtained in a current stage of two or more stages, a sub-region of the shape region to select a set of candidate shapes therefrom to be retrieved at a next stage following the current stage; and
    repeating the retrieving, the aligning and the determining for the stages to locate the facial landmarks of the face image.
  2. The method according to claim 1, wherein the determining further comprises:
    inferring a center of the sub-region, according to the aligned shapes obtained in the current stage and local appearance patterns of the face image.
  3. The method according to claim 2, wherein the located facial landmarks of the face image are determined from the center of the sub-region inferred at a last stage of the stages.
  4. The method according to claim 2, wherein the determining further comprises:
    inferring, according to the inferred center of the sub-region and the local appearance patterns of the face image, a suitability probability of each candidate shape suitable to the face image, to determine the sub-region of the shape region.
  5. The method according to claim 4, wherein the inferring of the suitability probability is further performed by
    calculating, according to the determined center of the sub-region, an adjustable probability of scopes to be adjusted around the center; and
    calculating, according to the local appearance patterns of the face image, facial part similarity probability of facial parts of the face image, to obtain the suitability probability by multiplying the adjustable probability and the facial part similarity probability.
  6. The method according to claim 1, wherein the aligning further comprises:
    extracting facial features from the face image; and
    mapping the extracted facial features to a shape residual by using at least one regressor, so that the aligned shapes are obtained by using the shape residual.
  7. The method according to claim 6, wherein different numbers and different types of facial features can be extracted in different stages.
  8. The method according to claim 7, wherein the facial features extracted in the first stages are SIFT features and those extracted in the other stages are SIFT and BRIEF features.
  9. An apparatus for locating facial landmarks of a face image, comprising:
    a retrieving unit for retrieving a set of candidate shapes from a predetermined shape region in one or more sequential stages, each of the candidate shapes having pre-labeled facial landmarks;
    an aligning unit being electronically communicated with the retrieving unit and aligning each of the retrieved candidate shapes with the face image to obtain corresponding aligned shapes; and
    a determining unit being electronically communicated with the aligning unit and determining, according to the aligned shapes obtained in a current stage of the stages, a sub-region of the shape region to select a set of candidate shapes therefrom to be retrieved at a next stage following the current stage.
  10. The apparatus according to claim 9, wherein the determining unit further comprises:
    a center inferring unit for inferring a center of the sub-region, according to the aligned  shapes obtained in the current stage and local appearance patterns of the face image.
  11. The apparatus according to claim 10, wherein the located facial landmarks of the face image are determined from the center of the sub-region inferred at a last stage of the stages.
  12. The apparatus according to claim 10, wherein the determining unit further comprises:
    a suitability inferring unit for inferring, according to the inferred center of the sub-region and the local appearance patterns of the face image, a suitability probability of each candidate shape suitable to the face image, to determine the sub-region of the shape region.
  13. The apparatus according to claim 12, wherein the suitability inferring unit is further configured to
    calculate, according to the determined center of the sub-region, an adjustable probability of scopes to be adjusted around the center; and
    calculate, according to the local appearance patterns of the face image, facial part similarity probability of facial parts of the face image, to obtain the suitability probability by multiplying the adjustable probability and the facial part similarity probability.
  14. The apparatus according to claim 9, wherein the aligning unit is further configured to
    extract facial features from the face image; and
    map the extracted facial features to a shape residual by using at least one regressor, so that the aligned shapes are obtained by using the shape residual.
  15. The apparatus according to claim 14, wherein different numbers and different types of facial features can be extracted in different stages.
  16. The apparatus according to claim 15, wherein the features extracted in the first two stages are BRIEF features and those extracted in the other stages are SIFT features.
  17. A system for locating facial landmarks of a face image, comprising:
    an image capturing unit for capturing the face image;
    a retrieving unit for retrieving a set of candidate shapes from a predetermined shape region in one or more sequential stages, each of the candidate shapes having pre-labeled facial landmarks;
    an aligning unit being electronically communicated with the retrieving unit and aligning each of the retrieved candidate shapes with the face image to obtain corresponding aligned shapes; and
    a determining unit being electronically communicated with the aligning unit and determining, according to the aligned shapes obtained in a current stage of the stages, a sub-region of the shape region to select a set of candidate shapes therefrom to be retrieved at a next stage following the current stage.
  18. The system according to claim 17, wherein the determining unit further comprises:
    a center inferring unit for inferring a center of the sub-region, according to the aligned shapes obtained in the current stage and local appearance patterns of the face image; and
    a suitability inferring unit for inferring, according to the inferred center of the sub-region and the local appearance patterns of the face image, a suitability probability of each candidate shape suitable to the face image, to determine the sub-region of the shape region.
  19. The system according to claim 18, further comprising:
    a training unit for training the center inferring unit with a given suitability and training the suitability inferring unit with a given center of the sub-region, so as to modify parameters used by the determining unit.
  20. The system according to claim 19, wherein the located facial landmarks of the face image are determined from the center of the sub-region inferred at a last stage of the stages.
PCT/CN2015/079429 2015-05-21 2015-05-21 An apparatus and a method for locating facial landmarks of face image WO2016183834A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2015/079429 WO2016183834A1 (en) 2015-05-21 2015-05-21 An apparatus and a method for locating facial landmarks of face image
CN201580080396.8A CN107615295B (en) 2015-05-21 2015-05-21 Apparatus and method for locating key features of face image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2015/079429 WO2016183834A1 (en) 2015-05-21 2015-05-21 An apparatus and a method for locating facial landmarks of face image

Publications (1)

Publication Number Publication Date
WO2016183834A1 true WO2016183834A1 (en) 2016-11-24

Family

ID=57319095

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/079429 WO2016183834A1 (en) 2015-05-21 2015-05-21 An apparatus and a method for locating facial landmarks of face image

Country Status (2)

Country Link
CN (1) CN107615295B (en)
WO (1) WO2016183834A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220092294A1 (en) * 2019-06-11 2022-03-24 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and system for facial landmark detection using facial component-specific local refinement

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8027521B1 (en) * 2008-03-25 2011-09-27 Videomining Corporation Method and system for robust human gender recognition using facial feature localization
CN103824050A (en) * 2014-02-17 2014-05-28 北京旷视科技有限公司 Cascade regression-based face key point positioning method
US20140147022A1 (en) * 2012-11-27 2014-05-29 Adobe Systems Incorporated Facial Landmark Localization By Exemplar-Based Graph Matching
US20140185924A1 (en) * 2012-12-27 2014-07-03 Microsoft Corporation Face Alignment by Explicit Shape Regression

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877055A (en) * 2009-12-07 2010-11-03 北京中星微电子有限公司 Method and device for positioning key feature point
CN103377382A (en) * 2012-04-27 2013-10-30 通用电气公司 Optimum gradient pursuit for image alignment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8027521B1 (en) * 2008-03-25 2011-09-27 Videomining Corporation Method and system for robust human gender recognition using facial feature localization
US20140147022A1 (en) * 2012-11-27 2014-05-29 Adobe Systems Incorporated Facial Landmark Localization By Exemplar-Based Graph Matching
US20140185924A1 (en) * 2012-12-27 2014-07-03 Microsoft Corporation Face Alignment by Explicit Shape Regression
CN103824050A (en) * 2014-02-17 2014-05-28 北京旷视科技有限公司 Cascade regression-based face key point positioning method

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220092294A1 (en) * 2019-06-11 2022-03-24 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and system for facial landmark detection using facial component-specific local refinement

Also Published As

Publication number Publication date
CN107615295A (en) 2018-01-19
CN107615295B (en) 2020-09-25


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15892213

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15892213

Country of ref document: EP

Kind code of ref document: A1