CN108022308A - Face alignment method based on three-dimensional face model fitting - Google Patents

Face alignment method based on three-dimensional face model fitting

Info

Publication number
CN108022308A
CN108022308A (application CN201711238476.9A)
Authority
CN
China
Prior art keywords
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201711238476.9A
Other languages
Chinese (zh)
Inventor
夏春秋 (Xia Chunqiu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Vision Technology Co Ltd
Original Assignee
Shenzhen Vision Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Vision Technology Co Ltd filed Critical Shenzhen Vision Technology Co Ltd
Priority to CN201711238476.9A
Publication of CN108022308A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G06T 19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Multimedia (AREA)
  • Architecture (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Image Analysis (AREA)

Abstract

The present invention proposes a face alignment method based on three-dimensional face model fitting. Its main contents include: the 3D face representation, the structure of the convolutional neural network, the landmark fitting constraint, the contour fitting constraint, and the scale-invariant feature transform (SIFT) pairing constraint. The process is as follows: a convolutional neural network is trained to fit a dense 3D face shape to a single input face image; the network learns a nonlinear mapping function from the input image to the corresponding projection parameters and shape parameters; the estimated parameters can be used to construct the dense 3D face shape, and multiple constraints are then applied using the dense 3D shape representation. The three-dimensional face model fitting algorithm used by the present invention adopts multiple constraints and exploits multiple data sets: it not only aligns a limited number of facial landmarks, but also conforms to the facial contour and SIFT feature points, thereby improving face alignment accuracy, reducing computation cost, and greatly improving the efficiency of face alignment.

Description

Face alignment method based on three-dimensional face model fitting
Technical Field
The invention relates to the field of face alignment, in particular to a face alignment method based on three-dimensional face model fitting.
Background
With the development of computer technology, human biological characteristics have come to be used for identification. Compared with other biometric technologies, face recognition has advantages such as convenient feature acquisition and a short recognition cycle. A face recognition system can be divided into the following steps: face detection, face alignment, feature extraction, and recognition. The alignment of faces across different images in a face image sequence affects the accuracy of face recognition and has become an important problem for face recognition systems. Face alignment can be used for facial organ localization and tracking, where accurately locating each part of the face allows the corresponding part features to be extracted. After faces are aligned, the aligned face shapes can be used to analyze expression states, with applications in fields such as analysis of children's points of interest, user satisfaction surveys, and expression-based lie detection. Face alignment can also be used to generate face cartoons and sketches, producing pictures required by a user in combination with a mobile phone photo editor. In addition, face alignment can be used for 3D cartoon simulation, gender identification, age inference from facial aging or rejuvenation, virtual reality and augmented reality, and so on. However, many face alignment methods cannot exploit multiple data sets because each data set has different annotations, and face alignment is computationally expensive and difficult to apply efficiently.
The invention proposes a face alignment method based on three-dimensional face model fitting. A convolutional neural network is trained to fit a dense 3D face shape to a single input face image; the network learns a nonlinear mapping function from the input image to the corresponding projection parameters and shape parameters; the estimated parameters can be used to construct the dense 3D face shape, and multiple constraints are then applied using the dense three-dimensional shape representation. The three-dimensional face model fitting algorithm adopts multiple constraints and exploits multiple data sets: it not only aligns a limited number of facial landmarks, but also conforms to the face contour and scale-invariant feature transform feature points, thereby improving face alignment accuracy, reducing computation cost, and greatly improving the efficiency of face alignment.
Disclosure of Invention
In view of the problems that multiple data sets cannot be utilized and that the computation cost is high, the invention aims to provide a face alignment method based on three-dimensional face model fitting: a convolutional neural network is trained to fit a dense 3D face shape to a single input face image; the network learns a nonlinear mapping function from the input image to the corresponding projection parameters and shape parameters; the estimated parameters can be used to construct the dense 3D face shape, and multiple constraints are then applied using the dense three-dimensional shape representation.
In order to solve the above problems, the present invention provides a face alignment method based on three-dimensional face model fitting, which mainly comprises:
(I) the 3D face representation;
(II) the structure of the convolutional neural network (CNN);
(III) the landmark fitting constraint (LFC);
(IV) the contour fitting constraint (CFC);
(V) the scale-invariant feature transform (SIFT) pairing constraint (SPC).
Wherein, the face alignment method trains a convolutional neural network (CNN) to fit a dense 3D face shape to a single input face image; multiple constraints, namely the landmark fitting constraint, the contour fitting constraint, and the SIFT pairing constraint, are imposed using the dense three-dimensional shape representation.
Wherein, the 3D face representation expresses the dense three-dimensional shape of the face as S, containing the three-dimensional positions of Q vertices:

$$S=\begin{pmatrix} x_{1} & x_{2} & \cdots & x_{Q} \\ y_{1} & y_{2} & \cdots & y_{Q} \\ z_{1} & z_{2} & \cdots & z_{Q} \end{pmatrix} \quad (1)$$

To compute S for a face, S is represented in terms of a basis of 3D model shapes:

$$S=\bar{S}+\sum_{i=1}^{N_{id}} p_{id}^{i} S_{id}^{i}+\sum_{i=1}^{N_{exp}} p_{exp}^{i} S_{exp}^{i} \quad (2)$$

where the face shape S is the sum of the mean shape $\bar{S}$ and the weighted principal component analysis shape bases: identity bases $S_{id}^{i}$ and expression bases $S_{exp}^{i}$ with corresponding weights $p_{id}^{i}$ and $p_{exp}^{i}$. The $N_{id}=199$ identity bases represent variations such as tall or short, light or heavy, and male or female; the $N_{exp}=29$ expression bases represent expression changes such as mouth opening, smiling, and kissing. Each basis has Q = 53215 vertices, which correspond one-to-one with the vertices of all other bases.
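As an illustration of equations (1) and (2), here is a minimal numpy sketch; the function name, array names, and the toy dimensions in the usage example are assumptions for illustration, not data shipped with the patent.

```python
import numpy as np

def construct_shape(mean_shape, id_bases, exp_bases, p_id, p_exp):
    """Equation (2): S = S_bar + sum_i p_id^i S_id^i + sum_i p_exp^i S_exp^i.

    mean_shape: (3, Q) mean face shape S_bar
    id_bases:   (N_id, 3, Q) identity shape bases
    exp_bases:  (N_exp, 3, Q) expression shape bases
    p_id, p_exp: weight vectors of length N_id and N_exp
    Returns the dense 3D shape S of shape (3, Q), as in equation (1).
    """
    S = mean_shape.copy()
    S += np.tensordot(p_id, id_bases, axes=1)    # weighted identity bases
    S += np.tensordot(p_exp, exp_bases, axes=1)  # weighted expression bases
    return S

# Hypothetical usage with tiny random placeholder bases (the patent's model
# has Q = 53215 vertices, N_id = 199 and N_exp = 29 bases):
rng = np.random.default_rng(0)
q, n_id, n_exp = 100, 5, 3
S = construct_shape(rng.normal(size=(3, q)),
                    rng.normal(size=(n_id, 3, q)),
                    rng.normal(size=(n_exp, 3, q)),
                    rng.normal(size=n_id), rng.normal(size=n_exp))
print(S.shape)  # (3, 100)
```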
Further, for the dense three-dimensional shape, a subset of N vertices of the dense 3D face corresponds to the positions of 2D landmarks on the image; their projections are collected as

$$U=\begin{pmatrix} u_{1} & u_{2} & \cdots & u_{N} \\ v_{1} & v_{2} & \cdots & v_{N} \end{pmatrix} \quad (3)$$

The dense shape of the 2D face can be estimated from the 3D face shape under a weak perspective projection; the projection matrix has 6 degrees of freedom, modeling scale, the rotation angles (pitch α, yaw β, and roll γ), and the translation $(t_x, t_y)$. The transformed dense face shape A can be expressed as

$$A = M \cdot \begin{pmatrix} S \\ \mathbf{1}^{\top} \end{pmatrix} \quad (4)$$

where $M \in \mathbb{R}^{3\times 4}$ is the projection matrix with entries $m = [m_1, \ldots, m_{12}]$; A can then be orthographically projected onto the 2D plane to obtain U:

$$U = \mathrm{Pr} \cdot A \quad (5)$$

Since the z translation $(m_{12})$ falls outside the range of interest, it is defined as 0; the orthographic projection can be represented as the matrix

$$\mathrm{Pr}=\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$$

Further, for the orthographic projection, given the properties of the projection matrix, the normalized third row of the projection matrix can be expressed as the cross product of its normalized first two rows:

$$\left[\bar{m}_{9}, \bar{m}_{10}, \bar{m}_{11}\right] = \left[\bar{m}_{1}, \bar{m}_{2}, \bar{m}_{3}\right] \times \left[\bar{m}_{4}, \bar{m}_{5}, \bar{m}_{6}\right] \quad (6)$$

Thus the dense shape of any 2D face can be determined from the first two rows of the projection parameters, $m = [m_1, \ldots, m_8]$, and the shape coefficients $p = [p_{id}, p_{exp}]$; the learning of dense three-dimensional shapes is thereby converted into the learning of m and p, which is far easier to manage in terms of dimensionality.
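The projection pipeline of equations (4)–(6) can be sketched as follows; the exact layout of m inside the 3×4 matrix is an assumption chosen to be consistent with equation (6), since the patent does not spell it out.

```python
import numpy as np

def build_projection_matrix(m):
    """Assemble the 3x4 weak perspective matrix from m = [m1..m12].

    Assumed layout (consistent with equation (6), whose rotation rows are
    m1..m3 and m4..m6): the first two rows carry the scaled rotation plus
    translations (t_x, t_y) = (m7, m8); the z translation m12 is fixed to 0.
    """
    m = np.asarray(m, dtype=float)
    return np.array([[m[0], m[1], m[2],  m[6]],
                     [m[3], m[4], m[5],  m[7]],
                     [m[8], m[9], m[10], 0.0]])   # m12 := 0

def project_shape(S, m):
    """Equations (4)-(5): A = M [S; 1], U = Pr A."""
    M = build_projection_matrix(m)
    A = M @ np.vstack([S, np.ones(S.shape[1])])   # (4): transformed 3D shape
    Pr = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])              # orthographic projection
    return Pr @ A                                 # (5): dense 2D shape

def third_row(m):
    """Equation (6): the normalized third row is the cross product of the
    normalized first two rotation rows, so only m1..m8 need be learned."""
    r1 = np.asarray(m[0:3], float)
    r2 = np.asarray(m[3:6], float)
    return np.cross(r1 / np.linalg.norm(r1), r2 / np.linalg.norm(r2))
```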
Wherein, for the structure of the convolutional neural network (CNN), the network learns a nonlinear mapping function f(Θ) from the input image I to the corresponding projection parameters m and shape parameters p; the estimated parameters can then be used to construct the dense 3D face shape.
The CNN has two branches, one for predicting m and the other for predicting p. The first three convolution blocks are shared by the two branches; after the third block, two independent convolution blocks extract task-specific features, and two fully connected layers map those features to the final outputs. Each convolution block is a stack of two convolution layers and a max pooling layer, and each convolution or fully connected layer is followed by a batch normalization layer and a rectified linear unit (ReLU) layer.
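A minimal PyTorch sketch of the described two-branch network follows; the channel widths, feature dimensions, and input size are illustrative assumptions, as the patent does not specify them.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    """One convolution block: two conv layers, each followed by batch
    normalization and ReLU (as described), then a max pooling layer."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(),
        nn.MaxPool2d(2))

class DenseFittingCNN(nn.Module):
    """Two-branch CNN: three shared conv blocks, then one task-specific
    conv block and two fully connected layers per branch, predicting the
    projection parameters m (first two rows, 8-dim) and the shape
    parameters p (199 identity + 29 expression coefficients)."""
    def __init__(self, n_m=8, n_p=199 + 29):
        super().__init__()
        self.shared = nn.Sequential(conv_block(3, 32), conv_block(32, 64),
                                    conv_block(64, 128))
        def head(n_out):  # task-specific block + two FC layers
            return nn.Sequential(conv_block(128, 128), nn.Flatten(),
                                 nn.LazyLinear(256), nn.BatchNorm1d(256),
                                 nn.ReLU(), nn.Linear(256, n_out))
        self.head_m, self.head_p = head(n_m), head(n_p)

    def forward(self, img):
        feat = self.shared(img)
        return self.head_m(feat), self.head_p(feat)

# Hypothetical usage on a batch of 100x100 face crops:
m_hat, p_hat = DenseFittingCNN()(torch.randn(2, 3, 100, 100))
```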
To improve CNN learning, a loss function containing multiple constraints is used: the parameter constraint (PC) $J_{pr}$ minimizes the difference between the estimated parameters and the calibrated ground-truth parameters; the landmark fitting constraint (LFC) $J_{lm}$ reduces the alignment error of the 2D landmarks; the contour fitting constraint (CFC) $J_c$ matches the contour of the estimated three-dimensional shape to the contour pixels of the input image; and the SIFT pairing constraint (SPC) $J_s$ encourages matched SIFT feature points of two face images to correspond to the same 3D vertices.
The overall loss function is defined as

$$\arg\min_{\hat{m},\hat{p}} J = J_{pr} + \lambda_{lm} J_{lm} + \lambda_{c} J_{c} + \lambda_{s} J_{s} \quad (7)$$

where the parameter constraint (PC) loss function is the squared distance between the estimated and ground-truth parameters:

$$J_{pr} = \left\| \left[\hat{m}, \hat{p}\right] - \left[m, p\right] \right\|^{2} \quad (8)$$
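A short PyTorch sketch of equations (7) and (8); the λ weights are placeholders, since the patent does not state their values, and equation (8) is the reconstructed squared-distance form described above.

```python
import torch

def parameter_constraint(m_hat, p_hat, m_gt, p_gt):
    """Equation (8) (reconstructed): squared distance between the estimated
    and ground-truth projection/shape parameters."""
    return ((m_hat - m_gt) ** 2).sum() + ((p_hat - p_gt) ** 2).sum()

def total_loss(J_pr, J_lm, J_c, J_s, lam_lm=1.0, lam_c=1.0, lam_s=1.0):
    """Equation (7): J = J_pr + lam_lm*J_lm + lam_c*J_c + lam_s*J_s.
    The lambda weights here are illustrative placeholders."""
    return J_pr + lam_lm * J_lm + lam_c * J_c + lam_s * J_s
```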
Wherein, the landmark fitting constraint (LFC) aims to minimize the difference between the estimated 2D landmarks and the ground-truth 2D landmarks $U_{lm}$. Given 2D face images with labeled landmarks, the indices of the 3D face vertices that correspond anatomically to these landmarks are first labeled manually; the set of these indices is denoted $i_{lm}$. With the shape A computed from the estimates $\hat{m}$ and $\hat{p}$ according to formula (4), the 3D landmarks can be extracted from A as $A(:, i_{lm})$; projecting $A(:, i_{lm})$ onto the 2D plane, the LFC loss function is defined as

$$J_{lm} = \frac{1}{L} \left\| \mathrm{Pr}\, A(:, i_{lm}) - U_{lm} \right\|_{F}^{2} \quad (9)$$

where the subscript F denotes the Frobenius norm and L is the number of predefined landmarks.
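Equation (9) translates directly into code; a sketch, assuming A comes from the projection step above and that the index set i_lm is a precomputed tensor:

```python
import torch

def landmark_fitting_loss(A, lm_idx, U_lm):
    """Equation (9): mean squared Frobenius distance between the projected
    3D landmark vertices and the ground-truth 2D landmarks.

    A:      (3, Q) transformed dense shape from equation (4)
    lm_idx: length-L tensor of manually labeled landmark vertex indices i_lm
    U_lm:   (2, L) ground-truth 2D landmark positions
    """
    proj = A[:2, lm_idx]            # Pr A(:, i_lm): orthographic projection
    L = U_lm.shape[1]
    return ((proj - U_lm) ** 2).sum() / L
```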
Wherein, the contour fitting constraint (CFC) aims to minimize the error between the projected outer contour of the dense 3D shape and the corresponding contour pixels in the input face image; when the 3D shape is rendered onto the 2D plane, the outer contour can be regarded as the boundary between the background and the 3D face.
Further, using the contour fitting constraint requires the following three steps (a sketch of step (3) is given after these steps):
(1) Detect the true contour $U_c$ in the 2D face image: first, an off-the-shelf edge detector is used to detect contours on the face image; the detected edges are then refined by retaining only the edges within a narrow band defined by the contour landmarks. This preprocessing step is completed offline before training begins.
(2) Describe the silhouette vertices on the estimated three-dimensional shape A: the contour on the estimated shape A, computed from the estimated parameters $\hat{m}$ and $\hat{p}$, can be described as a set of boundary vertices. Representing shape A with a Delaunay triangulation, an edge of a triangle is defined as a boundary edge if the z components of the surface normals of its two adjacent faces have opposite signs; the vertices associated with such edges are defined as boundary vertices, and their index set is denoted $i_c$.
(3) Determine the correspondence between the true contour and the estimated contour, and back-propagate the fitting error: evaluating this constraint requires a point-to-point correspondence between $U_c$ and $A(:, i_c)$. Each contour pixel on the 2D image is matched to the closest point on the projected 3D shape contour, and the minimum distance is computed; the sum of all minimum distances is the CFC error, as shown in formula (10). To make the CFC loss differentiable, formula (10) is rewritten in terms of the vertex index of the nearest contour projection point, $k_0 = \arg\min_{k \in i_c} \left\| \mathrm{Pr}\, A(:,k) - U_c(:,j) \right\|^2$; once $k_0$ is determined, the CFC loss takes a form similar to equation (9):

$$J_c = \frac{1}{L} \sum_j \min_{k \in i_c} \left\| \mathrm{Pr}\, A(:,k) - U_c(:,j) \right\|^2 = \frac{1}{L} \sum_j \left\| \mathrm{Pr}\, A\!\left(:, \arg\min_{k \in i_c} \left\| \mathrm{Pr}\, A(:,k) - U_c(:,j) \right\|^2\right) - U_c(:,j) \right\|^2 \quad (10)$$

Although $i_c$ depends on the current estimates of {m, p}, for simplicity $i_c$ is treated as constant when performing back-propagation.
The scale-invariant feature transform (SIFT) pairing constraint (SPC) is characterized in that, given a pair of face images i and j, the SIFT points on the two face images are first detected and matched; the matched SIFT points are denoted $U_s^i$ and $U_s^j$.
For a perfectly dense face alignment, the matched SIFT points would overlap with exactly the same vertices of the estimated 3D face shapes, denoted $A^i$ and $A^j$. For each matched SIFT point, the 3D vertex whose projection coincides with the 2D SIFT point is found; collecting these indices over the $L_{ij}$ matches gives $i_s^i$:

$$i_s^i = \arg\min_{k} \left\| \mathrm{Pr}\, A^i(:,k) - U_s^i \right\|_{F}^{2} \quad (11)$$

On this basis, the loss function for the SPC is defined as

$$J_s\!\left(\hat{m}^j, \hat{p}^j, \hat{m}^i, \hat{p}^i\right) = \frac{1}{L_{ij}} \left( \left\| \mathrm{Pr}\, A^i(:, i_s^j) - U_s^i \right\|_{F}^{2} + \left\| \mathrm{Pr}\, A^j(:, i_s^i) - U_s^j \right\|_{F}^{2} \right) \quad (12)$$

where $A^i$ is computed from $\{m^i, p^i\}$ and $L_{ij}$ is the number of matched SIFT pairs; in effect, the SIFT points of one face are mapped to dense-shape vertices, and the distance between those vertices' projections and the matched SIFT points on the other face is computed.
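A PyTorch sketch of equations (11) and (12); it assumes the matched SIFT points were produced offline and that shapes are compared after orthographic projection:

```python
import torch

def nearest_vertex_indices(A, U_s):
    """Equation (11): for each 2D SIFT point, the index of the dense-shape
    vertex whose orthographic projection lies closest to it."""
    proj = A[:2]                                                 # Pr A, (2, Q)
    d2 = ((U_s.T[:, None, :] - proj.T[None, :, :]) ** 2).sum(-1) # (L_ij, Q)
    return d2.argmin(dim=1)                                      # (L_ij,)

def sift_pairing_loss(A_i, A_j, U_s_i, U_s_j):
    """Equation (12): matched SIFT points should correspond to the same
    3D vertices on both faces, so the indices found on one face are applied
    to the other face's shape and compared with that face's SIFT points."""
    L_ij = U_s_i.shape[1]
    idx_i = nearest_vertex_indices(A_i, U_s_i)      # i_s^i
    idx_j = nearest_vertex_indices(A_j, U_s_j)      # i_s^j
    term_i = ((A_i[:2, idx_j] - U_s_i) ** 2).sum()  # ||Pr A^i(:,i_s^j) - U_s^i||^2
    term_j = ((A_j[:2, idx_i] - U_s_j) ** 2).sum()  # ||Pr A^j(:,i_s^i) - U_s^j||^2
    return (term_i + term_j) / L_ij
```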
Drawings
Fig. 1 is a system framework diagram of the face alignment method based on three-dimensional face model fitting according to the present invention.
Fig. 2 shows the structure of the convolutional neural network of the face alignment method based on three-dimensional face model fitting.
Detailed Description
It should be noted that the embodiments in the present application, and the features of those embodiments, can be combined with each other without conflict. The present invention is further described in detail below with reference to the drawings and specific embodiments.
Fig. 1 is a system framework diagram of the face alignment method based on three-dimensional face model fitting according to the present invention. The method mainly comprises the 3D face representation, the structure of the convolutional neural network, the landmark fitting constraint, the contour fitting constraint, and the scale-invariant feature transform pairing constraint.
The face alignment method trains a convolutional neural network (CNN) to fit a dense 3D face shape to a single input face image; multiple constraints, namely the landmark fitting constraint, the contour fitting constraint, and the SIFT pairing constraint, are imposed using the dense three-dimensional shape representation.
The 3D face representation expresses the dense three-dimensional shape of the face as S, containing the three-dimensional positions of Q vertices:

$$S=\begin{pmatrix} x_{1} & x_{2} & \cdots & x_{Q} \\ y_{1} & y_{2} & \cdots & y_{Q} \\ z_{1} & z_{2} & \cdots & z_{Q} \end{pmatrix} \quad (1)$$

To compute S for a face, S is represented in terms of a basis of 3D model shapes:

$$S=\bar{S}+\sum_{i=1}^{N_{id}} p_{id}^{i} S_{id}^{i}+\sum_{i=1}^{N_{exp}} p_{exp}^{i} S_{exp}^{i} \quad (2)$$

where the face shape S is the sum of the mean shape $\bar{S}$ and the weighted principal component analysis shape bases: identity bases $S_{id}^{i}$ and expression bases $S_{exp}^{i}$ with corresponding weights $p_{id}^{i}$ and $p_{exp}^{i}$. The $N_{id}=199$ identity bases represent variations such as tall or short, light or heavy, and male or female; the $N_{exp}=29$ expression bases represent expression changes such as mouth opening, smiling, and kissing. Each basis has Q = 53215 vertices, which correspond one-to-one with the vertices of all other bases.
A subset of N vertices of the dense 3D face corresponds to the positions of 2D landmarks on the image; their projections are collected as

$$U=\begin{pmatrix} u_{1} & u_{2} & \cdots & u_{N} \\ v_{1} & v_{2} & \cdots & v_{N} \end{pmatrix} \quad (3)$$

The dense shape of the 2D face can be estimated from the 3D face shape under a weak perspective projection; the projection matrix has 6 degrees of freedom, modeling scale, the rotation angles (pitch α, yaw β, and roll γ), and the translation $(t_x, t_y)$. The transformed dense face shape A can be expressed as

$$A = M \cdot \begin{pmatrix} S \\ \mathbf{1}^{\top} \end{pmatrix} \quad (4)$$

where $M \in \mathbb{R}^{3\times 4}$ is the projection matrix with entries $m = [m_1, \ldots, m_{12}]$; A can then be orthographically projected onto the 2D plane to obtain U:

$$U = \mathrm{Pr} \cdot A \quad (5)$$

Since the z translation $(m_{12})$ falls outside the range of interest, it is defined as 0; the orthographic projection can be represented as the matrix

$$\mathrm{Pr}=\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$$

Given the properties of the projection matrix, the normalized third row of the projection matrix can be expressed as the cross product of its normalized first two rows:

$$\left[\bar{m}_{9}, \bar{m}_{10}, \bar{m}_{11}\right] = \left[\bar{m}_{1}, \bar{m}_{2}, \bar{m}_{3}\right] \times \left[\bar{m}_{4}, \bar{m}_{5}, \bar{m}_{6}\right] \quad (6)$$

Thus the dense shape of any 2D face can be determined from the first two rows of the projection parameters, $m = [m_1, \ldots, m_8]$, and the shape coefficients $p = [p_{id}, p_{exp}]$; the learning of dense three-dimensional shapes is thereby converted into the learning of m and p, which is far easier to manage in terms of dimensionality.
Fig. 2 shows the structure of the convolutional neural network of the face alignment method based on three-dimensional face model fitting. The convolutional neural network learns a nonlinear mapping function f(Θ) from the input image I to the corresponding projection parameters m and shape parameters p; the estimated parameters can then be used to construct the dense 3D face shape.
The CNN has two branches, one for predicting m and the other for predicting p. The first three convolution blocks are shared by the two branches; after the third block, two independent convolution blocks extract task-specific features, and two fully connected layers map those features to the final outputs. Each convolution block is a stack of two convolution layers and a max pooling layer, and each convolution or fully connected layer is followed by a batch normalization layer and a rectified linear unit (ReLU) layer.
To improve CNN learning, a loss function containing multiple constraints is used: the parameter constraint (PC) $J_{pr}$ minimizes the difference between the estimated parameters and the calibrated ground-truth parameters; the landmark fitting constraint (LFC) $J_{lm}$ reduces the alignment error of the 2D landmarks; the contour fitting constraint (CFC) $J_c$ matches the contour of the estimated three-dimensional shape to the contour pixels of the input image; and the SIFT pairing constraint (SPC) $J_s$ encourages matched SIFT feature points of two face images to correspond to the same 3D vertices.
The overall loss function is defined as

$$\arg\min_{\hat{m},\hat{p}} J = J_{pr} + \lambda_{lm} J_{lm} + \lambda_{c} J_{c} + \lambda_{s} J_{s} \quad (7)$$

where the parameter constraint (PC) loss function is the squared distance between the estimated and ground-truth parameters:

$$J_{pr} = \left\| \left[\hat{m}, \hat{p}\right] - \left[m, p\right] \right\|^{2} \quad (8)$$
The landmark fitting constraint (LFC) aims to minimize the difference between the estimated 2D landmarks and the ground-truth 2D landmarks $U_{lm}$. Given 2D face images with labeled landmarks, the indices of the 3D face vertices that correspond anatomically to these landmarks are first labeled manually; the set of these indices is denoted $i_{lm}$. With the shape A computed from the estimates $\hat{m}$ and $\hat{p}$ according to formula (4), the 3D landmarks can be extracted from A as $A(:, i_{lm})$; projecting $A(:, i_{lm})$ onto the 2D plane, the LFC loss function is defined as

$$J_{lm} = \frac{1}{L} \left\| \mathrm{Pr}\, A(:, i_{lm}) - U_{lm} \right\|_{F}^{2} \quad (9)$$

where the subscript F denotes the Frobenius norm and L is the number of predefined landmarks.
The contour fitting constraint (CFC) aims to minimize the error between the projected outer contour of the dense 3D shape and the corresponding contour pixels in the input face image; when the 3D shape is rendered onto the 2D plane, the outer contour can be regarded as the boundary between the background and the 3D face.
The specific steps of the contour fitting constraint are as follows (a sketch of step (2) is given after these steps):
(1) Detect the true contour $U_c$ in the 2D face image: first, an off-the-shelf edge detector is used to detect contours on the face image; the detected edges are then refined by retaining only the edges within a narrow band defined by the contour landmarks. This preprocessing step is completed offline before training begins.
(2) Describe the silhouette vertices on the estimated three-dimensional shape A: the contour on the estimated shape A, computed from the estimated parameters $\hat{m}$ and $\hat{p}$, can be described as a set of boundary vertices. Representing shape A with a Delaunay triangulation, an edge of a triangle is defined as a boundary edge if the z components of the surface normals of its two adjacent faces have opposite signs; the vertices associated with such edges are defined as boundary vertices, and their index set is denoted $i_c$.
(3) Determine the correspondence between the true contour and the estimated contour, and back-propagate the fitting error: evaluating this constraint requires a point-to-point correspondence between $U_c$ and $A(:, i_c)$. Each contour pixel on the 2D image is matched to the closest point on the projected 3D shape contour, and the minimum distance is computed; the sum of all minimum distances is the CFC error, as shown in formula (10). To make the CFC loss differentiable, formula (10) is rewritten in terms of the vertex index of the nearest contour projection point, $k_0 = \arg\min_{k \in i_c} \left\| \mathrm{Pr}\, A(:,k) - U_c(:,j) \right\|^2$; once $k_0$ is determined, the CFC loss takes a form similar to equation (9):

$$J_c = \frac{1}{L} \sum_j \min_{k \in i_c} \left\| \mathrm{Pr}\, A(:,k) - U_c(:,j) \right\|^2 = \frac{1}{L} \sum_j \left\| \mathrm{Pr}\, A\!\left(:, \arg\min_{k \in i_c} \left\| \mathrm{Pr}\, A(:,k) - U_c(:,j) \right\|^2\right) - U_c(:,j) \right\|^2 \quad (10)$$

Although $i_c$ depends on the current estimates of {m, p}, for simplicity $i_c$ is treated as constant when performing back-propagation.
For the scale-invariant feature transform (SIFT) pairing constraint (SPC), given a pair of face images i and j, the SIFT points on the two face images are first detected and matched; the matched SIFT points are denoted $U_s^i$ and $U_s^j$.
For a perfectly dense face alignment, the matched SIFT points would overlap with exactly the same vertices of the estimated 3D face shapes, denoted $A^i$ and $A^j$. For each matched SIFT point, the 3D vertex whose projection coincides with the 2D SIFT point is found; collecting these indices over the $L_{ij}$ matches gives $i_s^i$:

$$i_s^i = \arg\min_{k} \left\| \mathrm{Pr}\, A^i(:,k) - U_s^i \right\|_{F}^{2} \quad (11)$$

On this basis, the loss function for the SPC is defined as

$$J_s\!\left(\hat{m}^j, \hat{p}^j, \hat{m}^i, \hat{p}^i\right) = \frac{1}{L_{ij}} \left( \left\| \mathrm{Pr}\, A^i(:, i_s^j) - U_s^i \right\|_{F}^{2} + \left\| \mathrm{Pr}\, A^j(:, i_s^i) - U_s^j \right\|_{F}^{2} \right) \quad (12)$$

where $A^i$ is computed from $\{m^i, p^i\}$ and $L_{ij}$ is the number of matched SIFT pairs; in effect, the SIFT points of one face are mapped to dense-shape vertices, and the distance between those vertices' projections and the matched SIFT points on the other face is computed.
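The offline SIFT detection and matching that the SPC presupposes could look as follows; OpenCV and Lowe's ratio test are assumed tooling choices, not named by the patent (images are expected as 8-bit grayscale arrays):

```python
import cv2
import numpy as np

def matched_sift_points(img_i, img_j, ratio=0.75):
    """Detect and match SIFT points between two face images, returning
    U_s^i and U_s^j as (2, L_ij) arrays of matched point positions."""
    sift = cv2.SIFT_create()
    kp_i, des_i = sift.detectAndCompute(img_i, None)
    kp_j, des_j = sift.detectAndCompute(img_j, None)
    matcher = cv2.BFMatcher()
    # Lowe's ratio test keeps only distinctive matches:
    good = [m for m, n in matcher.knnMatch(des_i, des_j, k=2)
            if m.distance < ratio * n.distance]
    U_i = np.array([kp_i[m.queryIdx].pt for m in good]).T
    U_j = np.array([kp_j[m.trainIdx].pt for m in good]).T
    return U_i, U_j
```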
It will be appreciated by persons skilled in the art that the invention is not limited to the details of the foregoing embodiments, and that the invention can be embodied in other specific forms without departing from its spirit or scope. In addition, various modifications and alterations of this invention may be made by those skilled in the art without departing from the spirit and scope of this invention, and such modifications and alterations should also be regarded as falling within the scope of this invention. It is therefore intended that the appended claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

Claims (10)

1. A face alignment method based on three-dimensional face model fitting, characterized by mainly comprising: a 3D face representation (I); the structure of a convolutional neural network (CNN) (II); a landmark fitting constraint (LFC) (III); a contour fitting constraint (CFC) (IV); and a scale-invariant feature transform (SIFT) pairing constraint (SPC) (V).
2. The face alignment method as claimed in claim 1, characterized in that a convolutional neural network (CNN) is trained to fit dense 3D face shapes to a single input face image; multiple constraints, namely a landmark fitting constraint, a contour fitting constraint, and a SIFT pairing constraint, are imposed using the dense three-dimensional shape representation.
3. The 3D face representation (I) as claimed in claim 1, characterized in that the dense three-dimensional shape of the face is represented as S, containing the three-dimensional positions of Q vertices:

$$S=\begin{pmatrix} x_{1} & x_{2} & \cdots & x_{Q} \\ y_{1} & y_{2} & \cdots & y_{Q} \\ z_{1} & z_{2} & \cdots & z_{Q} \end{pmatrix} \quad (1)$$

to compute S for a face, S is represented in terms of a basis of 3D model shapes:

$$S=\bar{S}+\sum_{i=1}^{N_{id}} p_{id}^{i} S_{id}^{i}+\sum_{i=1}^{N_{exp}} p_{exp}^{i} S_{exp}^{i} \quad (2)$$

where the face shape S is the sum of the mean shape $\bar{S}$ and the weighted principal component analysis shape bases $S_{id}^{i}$ and $S_{exp}^{i}$ with corresponding weights $p_{id}^{i}$ and $p_{exp}^{i}$; the $N_{id}=199$ identity bases represent variations such as tall or short, light or heavy, and male or female; the $N_{exp}=29$ expression bases represent expression changes such as mouth opening, smiling, and kissing; each basis has Q = 53215 vertices, which correspond to the vertices on all other bases.
4. The dense three-dimensional shape as claimed in claim 3, characterized in that a subset of N vertices of the dense 3D face corresponds to the positions of 2D landmarks on the image:

$$U=\begin{pmatrix} u_{1} & u_{2} & \cdots & u_{N} \\ v_{1} & v_{2} & \cdots & v_{N} \end{pmatrix} \quad (3)$$

the dense shape of the 2D face can be estimated from the 3D face shape under a weak perspective projection, the projection matrix having 6 degrees of freedom that model scale, the rotation angles (pitch α, yaw β, and roll γ), and the translation $(t_x, t_y)$; the transformed dense face shape A can be expressed as

$$A = M \cdot \begin{pmatrix} S \\ \mathbf{1}^{\top} \end{pmatrix} \quad (4)$$

where $M \in \mathbb{R}^{3\times 4}$ is the projection matrix with entries $m = [m_1, \ldots, m_{12}]$, and A can be orthographically projected onto the 2D plane to obtain U:

$$U = \mathrm{Pr} \cdot A \quad (5)$$

since the z translation $(m_{12})$ falls outside the range of interest, it is defined as 0; the orthographic projection can be represented as the matrix

$$\mathrm{Pr}=\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$$
5. The orthographic projection as claimed in claim 4, characterized in that, given the properties of the projection matrix, the normalized third row of the projection matrix is expressed as the cross product of its normalized first two rows:

$$\left[\bar{m}_{9}, \bar{m}_{10}, \bar{m}_{11}\right] = \left[\bar{m}_{1}, \bar{m}_{2}, \bar{m}_{3}\right] \times \left[\bar{m}_{4}, \bar{m}_{5}, \bar{m}_{6}\right] \quad (6)$$

thus the dense shape of any 2D face can be determined from the first two rows of the projection parameters, $m = [m_1, \ldots, m_8]$, and the shape coefficients $p = [p_{id}, p_{exp}]$; the learning of dense three-dimensional shapes is converted into the learning of m and p, which is far easier to manage in terms of dimensionality.
6. The structure (II) of the convolutional neural network (CNN) as claimed in claim 1, characterized in that a nonlinear mapping function f(Θ), with corresponding projection parameters m and shape parameters p, is learned from an input image I using the convolutional neural network; the estimated parameters can be used to construct the dense 3D face shape;
the CNN has two branches, one for predicting m and the other for predicting p; the first three convolution blocks are shared by the two branches; after the third block, two independent convolution blocks extract task-specific features, and two fully connected layers map those features to the final outputs; each convolution block is a stack of two convolution layers and a max pooling layer, and each convolution or fully connected layer is followed by a batch normalization layer and a rectified linear unit (ReLU) layer;
to improve CNN learning, a loss function containing multiple constraints is used: the parameter constraint (PC) $J_{pr}$ minimizes the difference between the estimated parameters and the calibrated ground-truth parameters; the landmark fitting constraint (LFC) $J_{lm}$ reduces the alignment error of the 2D landmarks; the contour fitting constraint (CFC) $J_c$ matches the contour of the estimated three-dimensional shape to the contour pixels of the input image; the SIFT pairing constraint (SPC) $J_s$ encourages matched SIFT feature points of two face images to correspond to the same 3D vertices;
the overall loss function is defined as

$$\arg\min_{\hat{m},\hat{p}} J = J_{pr} + \lambda_{lm} J_{lm} + \lambda_{c} J_{c} + \lambda_{s} J_{s} \quad (7)$$

where the parameter constraint (PC) loss function is the squared distance between the estimated and ground-truth parameters:

$$J_{pr} = \left\| \left[\hat{m}, \hat{p}\right] - \left[m, p\right] \right\|^{2} \quad (8)$$
7. The landmark fitting constraint (LFC) (III) as claimed in claim 1, characterized in that the LFC aims to minimize the difference between the estimated 2D landmarks and the ground-truth 2D landmarks $U_{lm}$; given 2D face images with labeled landmarks, the indices of the 3D face vertices that correspond anatomically to these landmarks are first labeled manually; the set of these indices is denoted $i_{lm}$; with the shape A computed from the estimates $\hat{m}$ and $\hat{p}$ according to formula (4), the 3D landmarks can be extracted from A as $A(:, i_{lm})$; projecting $A(:, i_{lm})$ onto the 2D plane, the LFC loss function is defined as

$$J_{lm} = \frac{1}{L} \left\| \mathrm{Pr}\, A(:, i_{lm}) - U_{lm} \right\|_{F}^{2} \quad (9)$$

where the subscript F denotes the Frobenius norm and L is the number of predefined landmarks.
8. The contour fitting constraint (CFC) (IV) as claimed in claim 1, characterized in that the CFC aims to minimize the error between the projected outer contour of the dense 3D shape and the corresponding contour pixels in the input face image; when the 3D shape is rendered onto the 2D plane, the outer contour can be regarded as the boundary between the background and the 3D face.
9. The specific steps of the contour fitting constraint as claimed in claim 8, characterized in that using the contour fitting constraint requires the following three steps:
(1) detecting the true contour $U_c$ in the 2D face image: first, an off-the-shelf edge detector is used to detect contours on the face image; the detected edges are then refined by retaining only the edges within a narrow band defined by the contour landmarks; this preprocessing step is completed offline before training begins;
(2) describing the silhouette vertices on the estimated three-dimensional shape A: the contour on the estimated shape A, computed from the estimated parameters $\hat{m}$ and $\hat{p}$, can be described as a set of boundary vertices; representing shape A with a Delaunay triangulation, an edge of a triangle is defined as a boundary edge if the z components of the surface normals of its two adjacent faces have opposite signs; the vertices associated with such edges are defined as boundary vertices, and their index set is denoted $i_c$;
(3) determining the correspondence between the true contour and the estimated contour, and back-propagating the fitting error: evaluating this constraint requires a point-to-point correspondence between $U_c$ and $A(:, i_c)$; each contour pixel on the 2D image is matched to the closest point on the projected 3D shape contour, and the minimum distance is computed; the sum of all minimum distances is the CFC error, as shown in formula (10); to make the CFC loss differentiable, formula (10) is rewritten in terms of the vertex index of the nearest contour projection point, $k_0 = \arg\min_{k \in i_c} \left\| \mathrm{Pr}\, A(:,k) - U_c(:,j) \right\|^2$; once $k_0$ is determined, the CFC loss takes a form similar to equation (9):

$$J_c = \frac{1}{L} \sum_j \min_{k \in i_c} \left\| \mathrm{Pr}\, A(:,k) - U_c(:,j) \right\|^2 = \frac{1}{L} \sum_j \left\| \mathrm{Pr}\, A\!\left(:, \arg\min_{k \in i_c} \left\| \mathrm{Pr}\, A(:,k) - U_c(:,j) \right\|^2\right) - U_c(:,j) \right\|^2 \quad (10)$$

although $i_c$ depends on the current estimates of {m, p}, for simplicity $i_c$ is treated as constant when performing back-propagation.
10. The scale-invariant feature transform (SIFT) pairing constraint (SPC) (V) as claimed in claim 1, characterized in that, given a pair of face images i and j, the SIFT points on the two face images are first detected and matched; the matched SIFT points are denoted $U_s^i$ and $U_s^j$;
for a perfectly dense face alignment, the matched SIFT points would overlap with exactly the same vertices of the estimated 3D face shapes, denoted $A^i$ and $A^j$; for each matched SIFT point, the 3D vertex whose projection coincides with the 2D SIFT point is found, and collecting these indices over the $L_{ij}$ matches gives $i_s^i$:

$$i_s^i = \arg\min_{k} \left\| \mathrm{Pr}\, A^i(:,k) - U_s^i \right\|_{F}^{2} \quad (11)$$

on this basis, the loss function for the SPC is defined as

$$J_s\!\left(\hat{m}^j, \hat{p}^j, \hat{m}^i, \hat{p}^i\right) = \frac{1}{L_{ij}} \left( \left\| \mathrm{Pr}\, A^i(:, i_s^j) - U_s^i \right\|_{F}^{2} + \left\| \mathrm{Pr}\, A^j(:, i_s^i) - U_s^j \right\|_{F}^{2} \right) \quad (12)$$

where $A^i$ is computed from $\{m^i, p^i\}$ and $L_{ij}$ is the number of matched SIFT pairs; the SIFT points of one face are mapped to dense-shape vertices, and the distance between those vertices' projections and the matched SIFT points on the other face is computed.
CN201711238476.9A 2017-11-30 2017-11-30 Face alignment method based on three-dimensional face model fitting Withdrawn CN108022308A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711238476.9A CN108022308A (en) 2017-11-30 2017-11-30 Face alignment method based on three-dimensional face model fitting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711238476.9A CN108022308A (en) 2017-11-30 2017-11-30 Face alignment method based on three-dimensional face model fitting

Publications (1)

Publication Number Publication Date
CN108022308A true CN108022308A (en) 2018-05-11

Family

ID=62077788

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711238476.9A CN108022308A (en) 2017-11-30 2017-11-30 Face alignment method based on three-dimensional face model fitting

Country Status (1)

Country Link
CN (1) CN108022308A (en)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YAOJIE LIU et al.: "Dense Face Alignment", arXiv:1709.01442v1 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008911A (en) * 2019-04-10 2019-07-12 北京旷视科技有限公司 Image processing method, device, electronic equipment and computer readable storage medium
CN110321822A (en) * 2019-06-24 2019-10-11 深圳爱莫科技有限公司 Face alignment initial method and device, storage medium based on closest retrieval
CN112001268A (en) * 2020-07-31 2020-11-27 中科智云科技有限公司 Face calibration method and device
CN112001268B (en) * 2020-07-31 2024-01-12 中科智云科技有限公司 Face calibration method and equipment

Similar Documents

Publication Publication Date Title
CN112766244B (en) Target object detection method and device, computer equipment and storage medium
Liu et al. Dense face alignment
US10380413B2 (en) System and method for pose-invariant face alignment
Prisacariu et al. Simultaneous monocular 2D segmentation, 3D pose recovery and 3D reconstruction
Sun et al. Depth estimation of face images using the nonlinear least-squares model
WO2014205768A1 (en) Feature and model mutual matching face tracking method based on increment principal component analysis
Sung et al. Pose-Robust Facial Expression Recognition Using View-Based 2D + 3D AAM
CN108022308A (en) Face alignment method based on three-dimensional face model fitting
CN106570460A (en) Single-image human face posture estimation method based on depth value
Wang et al. Joint head pose and facial landmark regression from depth images
CN116385660A (en) Indoor single view scene semantic reconstruction method and system
CN105678833A (en) Point cloud geometrical data automatic splicing algorithm based on multi-view image three-dimensional modeling
Li et al. Sparse-to-local-dense matching for geometry-guided correspondence estimation
CN102592309B (en) Modeling method of nonlinear three-dimensional face
CN114283265A (en) Unsupervised face correcting method based on 3D rotation modeling
Lladó et al. Non-rigid metric reconstruction from perspective cameras
Koo et al. Recovering the 3D shape and poses of face images based on the similarity transform
Chen et al. Learning shape priors for single view reconstruction
Muhle et al. The probabilistic normal epipolar constraint for frame-to-frame rotation optimization under uncertain feature positions
CN116681743A (en) One-stage point cloud registration method based on complex network theory, electronic equipment and storage medium
Gao et al. Estimation of 3D category-specific object structure: Symmetry, Manhattan and/or multiple images
Mai et al. Projective reconstruction of ellipses from multiple images
Lee Geometric optimization for computer vision
Cai et al. Two-view curve reconstruction based on the snake model
Bartoli On the non-linear optimization of projective motion using minimal parameters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20180511