CN109671108B - Method for estimating the pose of a single multi-view face image with arbitrary in-plane rotation


Info

Publication number
CN109671108B
Authority
CN
China
Prior art keywords
dimensional
human face
face image
angle
gamma
Prior art date
Legal status
Expired - Fee Related
Application number
CN201811550656.5A
Other languages
Chinese (zh)
Other versions
CN109671108A (en)
Inventor
傅由甲
Current Assignee
Tuoerte Intelligent Technology (Wuhan) Co., Ltd.
Original Assignee
Chongqing University of Technology
Priority date: 2018-12-18
Filing date: 2018-12-18
Publication date: 2020-07-28
Application filed by Chongqing University of Technology
Priority to CN201811550656.5A
Publication of CN109671108A: 2019-04-23
Application granted: 2020-07-28
Publication of CN109671108B: 2020-07-28
Legal status: Expired - Fee Related

Classifications

    • G06T7/344 Image analysis: determination of transform parameters for the alignment of images (image registration) using feature-based methods involving models
    • G06V40/161 Recognition of human faces: detection; localisation; normalisation
    • G06V40/172 Recognition of human faces: classification, e.g. identification
    • G06T2207/30201 Indexing scheme for image analysis: subject of image, human face

Abstract

The invention discloses a method for estimating the pose of a single multi-view face image with arbitrary in-plane rotation, which comprises the following steps. Step 1: establish a set of estimated values of the γ angle from the face image to be detected, where the γ angle denotes the deflection angle about the Z axis. Step 2: traverse the set of estimated γ values and compute the α and β angles corresponding to each γ estimate by solving for the α and β angles that satisfy an objective function under the specified γ angle, where α denotes the deflection angle about the X axis and β the deflection angle about the Y axis. Step 3: collect the objective-function values corresponding to the γ estimates into a candidate set, select the minimum objective-function value, and take the γ estimate and the α and β angles corresponding to that minimum as the face pose of the face image to be detected.

Description

Method for estimating the pose of a single multi-view face image with arbitrary in-plane rotation
Technical Field
The invention relates to the technical field of face recognition, and in particular to a face pose recognition model and method.
Background
In face recognition, face images at various angles are collected in advance, and the face image to be detected is then aligned with these pre-collected images. In practice, however, the face to be detected does not always squarely face the camera; in most cases it is deflected in three-dimensional space. The face pose (the deflection angles of the face about the coordinate axes of three-dimensional space) in the image to be detected must therefore be recognized before face alignment can be performed.
In addition, pose estimation yields the head rotation direction and the gaze position, which form the basis of human-computer interaction and visual monitoring in a multi-view environment; correcting the face pose by means of the estimated pose also improves the accuracy of multi-view face recognition and analysis.
A face image records the face information in a two-dimensional space (the XY plane). It is therefore relatively easy to estimate the in-plane pose angle γ (the deflection angle about the Z axis, perpendicular to the XY plane), which is generally taken as the angle between the line joining the two eyes and the X axis (the horizontal direction).
For a frontal face, the in-plane rotation angle γ can be calculated from the inclination of the line joining the centers of the two eyes. For a face that is strongly affected by perspective and deflected sideways, however, the line joining the eye centers is inclined even when there is no in-plane rotation, as shown in FIG. 1; the inclination of the eye-center line therefore cannot simply be adopted as the way to compute the rotation angle γ.
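Measuring the eye-line inclination θ itself is straightforward. The following minimal sketch (illustrative, not part of the patent) assumes eye-center pixel coordinates obtained from any face landmark detector:

```python
import math

def eye_line_angle(left_eye, right_eye):
    """Angle theta between the line joining the eye centers and the
    horizontal axis of the image, in degrees."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

print(eye_line_angle((100.0, 120.0), (160.0, 132.0)))  # ~11.3 degrees
```

As FIG. 1 illustrates, this θ is only a starting point: for a laterally deflected face it can differ from the true in-plane rotation γ, which is why the method below searches around θ rather than using it directly.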
The generic three-dimensional sparse face model is a parameterized 3D model originally developed for model-based face coding. Because it has only a small number of matching points and triangular faces, building it requires very little computation time, and it is widely used in video animation and transmission. FIG. 2 shows the standard wireframe structure of the generic three-dimensional sparse face model.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a method for estimating the pose of a single multi-view face image with arbitrary in-plane rotation. It solves the technical problem that prior-art methods rely on prior conditions to detect the target face pose: the present method estimates the target face pose from a single image to be detected alone, without prior training or learning.
To solve the above technical problems, the invention provides the following technical scheme. A method for estimating the pose of a single multi-view face image with arbitrary in-plane rotation comprises the following steps:
step 1: establish a set of estimated values of the γ angle from the face image to be detected, where the γ angle denotes the deflection angle about the Z axis;
step 2: traverse the set of estimated γ values and compute the α and β angles corresponding to each γ estimate by solving for the α and β angles that satisfy an objective function under the specified γ angle, where α denotes the deflection angle about the X axis and β denotes the deflection angle about the Y axis;
step 3: collect the objective-function values corresponding to the γ estimates into a candidate set, select the minimum objective-function value, and take the γ estimate and the α and β angles corresponding to that minimum as the face-pose estimation parameters of the face image to be detected.
Preferably, in step 1 a search algorithm is used to establish the set of estimated γ values, as follows:
step 101: detect the included angle θ between the line joining the centers of the two eyes and the horizontal line on the face image to be detected;
step 102: determine the estimation range of the γ angle: γ ∈ [θ − Δ, θ + Δ], where Δ is the search range;
step 103: search stepwise according to γ = θ ± t, where t is the current search count, t = 0, 1, ..., Δ; each search proceeds by the step t in both the positive and negative directions, and every γ estimate obtained is stored in the set of estimated γ values;
step 104: after the search finishes, the set of estimated γ values is established.
Preferably, in step 2 the α and β angles corresponding to each γ estimate are calculated in parallel, one computation per γ estimate.
Preferably, the method of calculating the α and β angles under a specified γ angle in step 2 comprises the following steps:
step 201: establish the generic three-dimensional sparse face model M; the model M is a generic 3D sparse face model with no deflection about the X, Y or Z axis;
step 202: establish the three-dimensional coordinate matrix V3D of the designated points on the model M, V3D = [v_1^3D, v_2^3D, ..., v_n^3D], where the designated points include a reference point and v_i^3D = (x_i, y_i, z_i)^T denotes the i-th designated point on M;
step 203: establish the two-dimensional coordinate matrix V2D of the matching points on the face image to be detected that correspond to the designated points on the model M, V2D = [v_1^2D, v_2^2D, ..., v_n^2D]; the matching points do not include the reference point; here v_i^2D = (x_i, y_i)^T denotes the i-th matching point, and n is the number of face matching points on the face image to be detected;
step 204: take the vector X = (s, α, β) as the independent variable and initialize the values of s, α and β, where s denotes the scaling coefficient, α the deflection angle about the X axis, and β the deflection angle about the Y axis;
step 205: rotate the generic 3D sparse face model M about the Z axis by the specified γ angle;
step 206: rotate the generic 3D sparse face model M about the X axis by the angle α and about the Y axis by the angle β, then scale the rotated model by the scaling factor s;
step 207: orthographically project the scaled generic 3D sparse face model onto the XY plane to obtain the two-dimensional projection model Mp;
step 208: translate the two-dimensional projection model Mp so that the reference point on Mp coincides with the reference point on the face image to be detected; the reference point is contained among the matching points;
step 209: establish the objective function from the sum of squared distances between the designated points on the two-dimensional projection model Mp and the corresponding points on the face image to be detected;
step 2010: update the vector X = (s, α, β) with a search algorithm, repeating steps 206 to 208 each time X is updated, until the optimal solution of the objective function is found;
step 2011: store the optimal solution of the objective function in the candidate set.
Preferably, the designated points include the subnasal point, the eye corner points, the nose tip point and the mouth corner points, and the reference point is the subnasal point among the matching points.
Preferably, the objective function min F(X) is constructed by the interior penalty function method as follows:
min F(X) = min[f(X) + r/s];
where r is the barrier factor, r > 0, and s is the scaling factor;
f(X) denotes the sum of squared distances between the designated points on the two-dimensional projection model Mp and the corresponding points on the face image to be detected:
f(X) = Σ (i = 1..n) || P3D·S3D·R3D·v_i^3D + T2D − v_i^2D ||^2
where n denotes the number of designated points on the two-dimensional projection model Mp;
R3D denotes the rotation matrix of the generic 3D sparse face model M;
P3D denotes the orthographic projection matrix of M;
S3D denotes the scaling matrix of M;
T2D denotes the translation matrix of the two-dimensional projection model Mp, T2D = v_ref^2D − v_ref^Mp, where v_ref^2D denotes the reference point on the face image to be detected and v_ref^Mp denotes the reference point on Mp.
Preferably, in step 2010 the optimal solution of the objective function min F(X) is calculated by a modified Newton method, as follows:
step 111: let the current iteration count be k, k an integer greater than or equal to zero, and let the current gradient vector be g_k = ∇F(X_k), where X_k denotes the current vector;
step 112: initialize k = 0, compute g_0 = ∇F(X_0), and jump to step 116;
step 113: construct the Newton direction: compute the search direction P_k of the current function F(X_k) according to P_k = −G_k^(−1)·g_k, where G_k is the Hessian matrix of F(X_k), G_k = ∇²F(X_k);
step 114: perform a one-dimensional search: compute the current iteration step size t_k by the golden-section method, update the current vector X_k to X_(k+1) = X_k + t_k·P_k, and compute F(X_(k+1));
step 115: let k = k + 1 and r_(k+1) = c·r_k, where c is the barrier-factor reduction coefficient, c = 0.1; go to step 116;
step 116: judge whether ||g_k|| ≤ ε for the given termination limit ε; if yes, go to step 117; if not, return to step 113;
step 117: judge whether r_k/s ≤ ε holds; if yes, stop the iteration and output the current vector X_k as the optimal solution; if not, return to step 113.
Compared with the prior art, the invention has the following advantages:
1. The method performs pose estimation of the target face directly on a single image, without knowing the camera parameters of the image, without other pose pictures of the target face as references, and without prior training or learning; it is simple to implement.
2. The method separates the estimation of the γ angle from that of the α and β angles, which avoids the problems of the prior art caused by solving the rotation angles in all three directions simultaneously: complex independent variables, a large amount of computation, and sensitivity of the objective function to the initial iteration values. After the γ angle is estimated, the α and β angles corresponding to each γ estimate are computed in parallel, which greatly improves the computation speed.
3. Starting from the included angle θ between the line joining the eye centers and the horizontal line, the invention searches a certain range around θ, finding for each candidate the optimal out-of-plane face deflection angles α and β, and thereby obtains the in-plane rotation angle γ of the multi-view face; this avoids the errors in computing the in-plane rotation angle caused by large parallax and face deflection.
4. The invention coincides the generic 3D sparse face model, after rotation, scaling, projection and translation, with the face image to be detected, thereby converting the α and β angles, which cannot be detected directly on a two-dimensional image, into the pose of the generic 3D sparse face model to be solved; in this way the face pose about the X and Y axes is detected on a two-dimensional image (under a specified γ angle).
5. The model-coincidence process and the solving process of the objective function are synchronous: the vector X_k is updated continuously while solving for the optimal solution, and each time X_k is updated, the generic 3D sparse face model is rotated and scaled with the updated s, α and β and then projected and coincided; the solving process therefore visually shows the degree of coincidence between the model and the face image to be detected.
6. To ensure that s, α and β are calculated under the condition s > 0, an interior penalty function is used to construct the augmented objective function of f(X).
7. The objective function is computed from point-to-point distances in the two-dimensional plane, with no computation in three-dimensional space; this reduces the dimensionality and improves the computation speed.
8. The designated points are the subnasal point, the eye corner points, the nose tip point and the mouth corner points. These features are prominent and easy to locate, and only 8 designated points are used, which greatly improves the computation speed.
9. In the method for estimating the deflection of the face about the X and Y axes, the translation of the model is determined by making the subnasal points coincide, so that the independent variables of the objective function are reduced to s, α and β; by adding the nose tip as an alignment point, the model is constrained to rotate only about the subnasal point, which reduces the complexity of the algorithm and its dependence on the initial point, so that the α and β estimation is completed in only a few iterations.
Drawings
FIG. 1 is a schematic illustration of the line connecting the centers of the eyes;
FIG. 2 is a standard wireframe structure diagram of a general human face three-dimensional sparse model;
FIG. 3 is a flow chart for calculating α, β angles at a specified γ angle;
FIG. 4 shows test results of the method of this embodiment for estimating the pose of single multi-view face images with arbitrary in-plane rotation.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and preferred embodiments.
A method for estimating the pose of a single multi-view face image with arbitrary in-plane rotation comprises the following steps:
step 1: establish a set of estimated values of the γ angle from the face image to be detected, where the γ angle denotes the deflection angle about the Z axis;
step 2: traverse the set of estimated γ values and compute the α and β angles corresponding to each γ estimate by solving for the α and β angles that satisfy an objective function under the specified γ angle, where α denotes the deflection angle about the X axis and β denotes the deflection angle about the Y axis;
step 3: collect the objective-function values corresponding to the γ estimates into a candidate set, select the minimum objective-function value, and take the γ estimate and the α and β angles corresponding to that minimum as the estimation parameters of the face pose of the face image to be detected.
In this embodiment, in step 2 the α and β angles corresponding to each γ estimate are calculated in parallel, one computation per γ estimate.
In this embodiment, a search algorithm is used in step 1 to establish the set of estimated γ values, as follows:
step 101: detect the included angle θ between the line joining the centers of the two eyes and the horizontal line on the face image to be detected;
step 102: determine the estimation range of the γ angle: γ ∈ [θ − Δ, θ + Δ], where Δ is the search range;
step 103: search stepwise according to γ = θ ± t, where t is the current search count, t = 0, 1, ..., Δ; each search proceeds by the step t in both the positive and negative directions, and every γ estimate obtained is stored in the set of estimated γ values;
step 104: after the search finishes, the set of estimated γ values is established.
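A minimal sketch of steps 101 to 104 follows; the search range Δ (`delta`) and the one-degree granularity of t are illustrative assumptions, since the embodiment leaves them as parameters:

```python
def gamma_candidates(theta, delta=30):
    """Steps 102-104: build the set of gamma estimates theta +/- t
    for t = 0, 1, ..., delta (in degrees)."""
    candidates = {theta}                  # t = 0
    for t in range(1, delta + 1):         # positive and negative search
        candidates.add(theta + t)
        candidates.add(theta - t)
    return sorted(candidates)

gammas = gamma_candidates(theta=11.3, delta=30)   # 61 candidate angles
```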
In this embodiment, as shown in FIG. 3, the method of calculating the α and β angles under a specified γ angle in step 2 comprises the following steps:
step 201: establish the generic three-dimensional sparse face model M; the model M is a generic 3D sparse face model with no deflection about the X, Y or Z axis; the generic 3D sparse face model may be, for example, the Candide-3 sparse face mesh model or a similar model;
step 202: establish the three-dimensional coordinate matrix V3D of the designated points on the model M, V3D = [v_1^3D, v_2^3D, ..., v_n^3D], where the designated points include a reference point and v_i^3D = (x_i, y_i, z_i)^T denotes the i-th designated point on M;
step 203: establish the two-dimensional coordinate matrix V2D of the matching points on the face image to be detected that correspond to the designated points on the model M, V2D = [v_1^2D, v_2^2D, ..., v_n^2D]; the matching points do not include the reference point; here v_i^2D = (x_i, y_i)^T denotes the i-th matching point, and n is the number of face matching points on the face image to be detected;
step 204: take the vector X = (s, α, β) as the independent variable and initialize the values of s, α and β, where s denotes the scaling coefficient, α the deflection angle about the X axis, and β the deflection angle about the Y axis;
step 205: rotate the generic 3D sparse face model M about the Z axis by the specified γ angle;
step 206: rotate the generic 3D sparse face model M about the X axis by the angle α and about the Y axis by the angle β, then scale the rotated model by the scaling factor s;
step 207: orthographically project the scaled generic 3D sparse face model onto the XY plane to obtain the two-dimensional projection model Mp;
step 208: translate the two-dimensional projection model Mp so that the reference point on Mp coincides with the reference point on the face image to be detected; the reference point is contained among the matching points;
step 209: establish the objective function from the sum of squared distances between the designated points on the two-dimensional projection model Mp and the corresponding points on the face image to be detected;
step 2010: update the vector X = (s, α, β) with a search algorithm, repeating steps 206 to 208 each time X is updated, until the optimal solution of the objective function is found;
step 2011: store the optimal solution of the objective function in the candidate set.
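The chain of transforms in steps 205 to 208 is a fixed sequence of linear maps. The numpy sketch below is an illustrative implementation under the notation above, not the patent's own code; it assumes angles in radians, `V3D` as a 3 x n matrix of designated-point columns, and `img_ref` as the reference point (subnasal point) on the image:

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]], dtype=float)

def rot_y(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]], dtype=float)

def rot_z(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)

P3D = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])  # orthographic projection onto the XY plane

def project_model(V3D, s, alpha, beta, gamma, img_ref, ref=0):
    """Steps 205-208: rotate about Z (gamma), X (alpha), Y (beta),
    scale by s, project orthographically, then translate so the model's
    reference point (column `ref`) lands on the image reference point."""
    R = rot_y(beta) @ rot_x(alpha) @ rot_z(gamma)  # Z applied first
    V2 = P3D @ (s * (R @ V3D))                     # scaled projection, 2 x n
    T = np.asarray(img_ref, dtype=float) - V2[:, ref]
    return V2 + T[:, None]                         # projected model Mp
```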
In this embodiment, the designated points include the subnasal point, the eye corner points, the nose tip point and the mouth corner points, and the reference point is the subnasal point among the matching points.
In this embodiment, the objective function min F(X) is constructed by the interior penalty function method as follows:
min F(X) = min[f(X) + r/s];
where r is the barrier factor, r > 0, and s is the scaling factor;
f(X) denotes the sum of squared distances between the designated points on the two-dimensional projection model Mp and the corresponding points on the face image to be detected:
f(X) = Σ (i = 1..n) || P3D·S3D·R3D·v_i^3D + T2D − v_i^2D ||^2
where n denotes the number of designated points on the two-dimensional projection model Mp;
R3D denotes the rotation matrix of the generic 3D sparse face model M;
P3D denotes the orthographic projection matrix of M;
S3D denotes the scaling matrix of M;
T2D denotes the translation matrix of the two-dimensional projection model Mp, T2D = v_ref^2D − v_ref^Mp, where v_ref^2D denotes the reference point on the face image to be detected and v_ref^Mp denotes the reference point on Mp.
Of course, min f(X) could also be used directly as the objective function, but convergence is then slow and negative solutions of s can occur.
In this embodiment, in step 204 the initial value s_0 of s is the ratio of the interpupillary distance on the face image to be detected to the interpupillary distance on the two-dimensional projection model Mp; the initial values α_0 and β_0 of α and β are both 0 degrees; and the initial value of r is r_0 = 100.
In this embodiment, in step 2010 a modified Newton method is used to calculate the optimal solution of the objective function min F(X), as follows:
step 111: let the current iteration count be k, k an integer greater than or equal to zero, and let the current gradient vector be g_k = ∇F(X_k), where X_k denotes the current vector;
step 112: initialize k = 0, compute g_0 = ∇F(X_0), and jump to step 116;
step 113: construct the Newton direction: compute the search direction P_k of the current function F(X_k) according to P_k = −G_k^(−1)·g_k, where G_k is the Hessian matrix of F(X_k), G_k = ∇²F(X_k);
step 114: perform a one-dimensional search: compute the current iteration step size t_k by the golden-section method, update the current vector X_k to X_(k+1) = X_k + t_k·P_k, and compute F(X_(k+1));
step 115: let k = k + 1 and r_(k+1) = c·r_k, where c is the barrier-factor reduction coefficient, c = 0.1; go to step 116;
step 116: judge whether ||g_k|| ≤ ε for the given termination limit ε; if yes, go to step 117; if not, return to step 113;
step 117: judge whether r_k/s ≤ ε holds; if yes, stop the iteration and output the current vector X_k as the optimal solution; if not, return to step 113.
In this embodiment, f(X) is expanded and simplified into a closed form whose coefficients c_0 to c_12 are all constants computed from the coordinates of the designated points and the matching points; n denotes the number of matching points on the two-dimensional projection model Mp.
Experiments on virtual 3D faces and real face images show that the average estimation error of the invention is 6.5 degrees for the α angle and 4.6 degrees for the β angle, and that estimating the deflection about the X and Y axes for a specified in-plane rotation angle takes less than 1 ms. An independent thread is created for the X- and Y-axis deflection estimation of each γ candidate, giving a parallel computation mode, and the average computation time of the whole algorithm is less than 5 ms. Some of the test results are shown in FIG. 4.
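The per-γ parallelism described above can be sketched with standard Python threads; `solve_ab` is an assumed wrapper that runs the modified Newton sketch for one γ candidate and returns `(F_min, s, alpha, beta)`:

```python
from concurrent.futures import ThreadPoolExecutor

def estimate_pose(gammas, solve_ab):
    """Steps 2-3: optimize alpha and beta for each gamma candidate in
    parallel, then keep the pose with the smallest objective value."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(solve_ab, gammas))  # one task per gamma
    i = min(range(len(results)), key=lambda k: results[k][0])
    F_min, s, alpha, beta = results[i]
    return gammas[i], alpha, beta                   # estimated face pose
```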

Claims (9)

1. A method for estimating the pose of a single multi-view face image with arbitrary in-plane rotation, characterized in that the method comprises the following steps:
step 1: establish a set of estimated values of the γ angle from the face image to be detected, where the γ angle denotes the deflection angle about the Z axis;
step 2: traverse the set of estimated γ values and compute the α and β angles corresponding to each γ estimate by solving for the α and β angles that satisfy an objective function under the specified γ angle, where α denotes the deflection angle about the X axis and β denotes the deflection angle about the Y axis;
step 3: collect the objective-function values corresponding to the γ estimates into a candidate set, select the minimum objective-function value, and take the γ estimate and the α and β angles corresponding to that minimum as the face-pose estimation parameters of the face image to be detected;
wherein in step 1 a search algorithm is used to establish the set of estimated γ values, comprising the following steps:
step 101: detect the included angle θ between the line joining the centers of the two eyes and the horizontal line on the face image to be detected;
step 102: determine the estimation range of the γ angle: γ ∈ [θ − Δ, θ + Δ], where Δ is the search range;
step 103: search stepwise according to γ = θ ± t, where t is the current search count, t = 0, 1, ..., Δ; each search proceeds by the step t in both the positive and negative directions, and every γ estimate obtained is stored in the set of estimated γ values;
step 104: after the search finishes, the set of estimated γ values is established.
2. The method for estimating the pose of a single multi-view face image with arbitrary in-plane rotation according to claim 1, characterized in that the method of calculating the α and β angles under a specified γ angle in step 2 comprises the following steps:
step 201: establish the generic three-dimensional sparse face model M; the model M is a generic 3D sparse face model with no deflection about the X, Y or Z axis;
step 202: establish the three-dimensional coordinate matrix V3D of the designated points on the model M, V3D = [v_1^3D, v_2^3D, ..., v_n^3D], where the designated points include a reference point and v_i^3D = (x_i, y_i, z_i)^T denotes the i-th designated point on M;
step 203: establish the two-dimensional coordinate matrix V2D of the matching points on the face image to be detected that correspond to the designated points on the model M, V2D = [v_1^2D, v_2^2D, ..., v_n^2D]; the matching points do not include the reference point; here v_i^2D = (x_i, y_i)^T denotes the i-th matching point, and n is the number of face matching points on the face image to be detected;
step 204: take the vector X = (s, α, β) as the independent variable and initialize the values of s, α and β, where s denotes the scaling coefficient, α the deflection angle about the X axis, and β the deflection angle about the Y axis;
step 205: rotate the generic 3D sparse face model M about the Z axis by the specified γ angle;
step 206: rotate the generic 3D sparse face model M about the X axis by the angle α and about the Y axis by the angle β, then scale the rotated model by the scaling factor s;
step 207: orthographically project the scaled generic 3D sparse face model onto the XY plane to obtain the two-dimensional projection model Mp;
step 208: translate the two-dimensional projection model Mp so that the reference point on Mp coincides with the reference point on the face image to be detected; the reference point is contained among the matching points;
step 209: establish the objective function from the sum of squared distances between the designated points on the two-dimensional projection model Mp and the corresponding points on the face image to be detected;
step 2010: update the vector X = (s, α, β) with a search algorithm, repeating steps 206 to 208 each time X is updated, until the optimal solution of the objective function is found;
step 2011: store the optimal solution of the objective function in the candidate set.
3. The method for estimating the pose of a single multi-view face image with arbitrary in-plane rotation according to claim 2, characterized in that the designated points comprise the subnasal point, the eye corner points, the nose tip point and the mouth corner points, and the reference point is the subnasal point.
4. The method for estimating the pose of a single multi-view face image with arbitrary in-plane rotation according to claim 2, characterized in that the objective function min f(X) is as follows:
f(X) = Σ (i = 1..n) || P3D·S3D·R3D·v_i^3D + T2D − v_i^2D ||^2
wherein n denotes the number of matching points on the two-dimensional projection model Mp;
R3D denotes the rotation matrix of the generic 3D sparse face model M;
P3D denotes the orthographic projection matrix of M;
S3D denotes the scaling matrix of M;
T2D denotes the translation matrix of the two-dimensional projection model Mp, T2D = v_ref^2D − v_ref^Mp, wherein v_ref^2D denotes the reference point on the face image to be detected and v_ref^Mp denotes the reference point on Mp.
5. The method for estimating the pose of a single multi-view face image with arbitrary in-plane rotation according to claim 2, characterized in that the objective function min F(X) is constructed by the interior penalty function method as follows:
min F(X) = min[f(X) + r/s];
wherein r is the barrier factor, r > 0, and s is the scaling factor;
f(X) denotes the sum of squared distances between the designated points on the two-dimensional projection model Mp and the corresponding points on the face image to be detected:
f(X) = Σ (i = 1..n) || P3D·S3D·R3D·v_i^3D + T2D − v_i^2D ||^2
wherein n denotes the number of designated points on the two-dimensional projection model Mp;
R3D denotes the rotation matrix of the generic 3D sparse face model M;
P3D denotes the orthographic projection matrix of M;
S3D denotes the scaling matrix of M;
T2D denotes the translation matrix of the two-dimensional projection model Mp, T2D = v_ref^2D − v_ref^Mp, wherein v_ref^2D denotes the reference point on the face image to be detected and v_ref^Mp denotes the reference point on Mp.
6. The method for estimating the pose of a single multi-view face image with arbitrary in-plane rotation according to claim 5, characterized in that in step 204 the initial value s_0 of s is the ratio of the interpupillary distance on the face image to be detected to the interpupillary distance on the two-dimensional projection model Mp, the initial values α_0 and β_0 of α and β are both 0 degrees, and the initial value of r is r_0 = 100.
7. The method for estimating the pose of a single multi-view face image with arbitrary in-plane rotation according to claim 5, characterized in that in step 2010 the optimal solution of the objective function min F(X) is calculated by a modified Newton method, as follows:
step 111: let the current iteration count be k, k an integer greater than or equal to zero, and let the current gradient vector be g_k = ∇F(X_k), where X_k denotes the current vector;
step 112: initialize k = 0, compute g_0 = ∇F(X_0), and jump to step 116;
step 113: construct the Newton direction: compute the search direction P_k of the current function F(X_k) according to P_k = −G_k^(−1)·g_k, where G_k is the Hessian matrix of F(X_k), G_k = ∇²F(X_k);
step 114: perform a one-dimensional search: compute the current iteration step size t_k by the golden-section method, update the current vector X_k to X_(k+1) = X_k + t_k·P_k, and compute F(X_(k+1));
step 115: let k = k + 1 and r_(k+1) = c·r_k, where c is the barrier-factor reduction coefficient, c = 0.1; go to step 116;
step 116: judge whether ||g_k|| ≤ ε for the given termination limit ε; if yes, go to step 117; if not, return to step 113;
step 117: judge whether r_k/s ≤ ε holds; if yes, stop the iteration and output the current vector X_k as the optimal solution; if not, return to step 113.
8. The method for estimating the pose of a single multi-view face image with arbitrary in-plane rotation according to claim 5, characterized in that f(X) is expanded and simplified into a closed form whose coefficients c_0 to c_12 are all constants computed from the coordinates of the designated points and the matching points, wherein n denotes the number of matching points on the two-dimensional projection model Mp.
9. The method for estimating the pose of a single multi-view face image with arbitrary in-plane rotation according to claim 1, characterized in that in step 2 the α and β angles corresponding to each γ estimate are calculated in parallel, one computation per γ estimate.
CN201811550656.5A 2018-12-18 2018-12-18 Method for estimating the pose of a single multi-view face image with arbitrary in-plane rotation Expired - Fee Related CN109671108B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811550656.5A CN109671108B (en) 2018-12-18 2018-12-18 Method for estimating the pose of a single multi-view face image with arbitrary in-plane rotation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811550656.5A CN109671108B (en) 2018-12-18 2018-12-18 Method for estimating the pose of a single multi-view face image with arbitrary in-plane rotation

Publications (2)

Publication Number Publication Date
CN109671108A CN109671108A (en) 2019-04-23
CN109671108B true CN109671108B (en) 2020-07-28

Family

ID=66143987

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811550656.5A Expired - Fee Related CN109671108B (en) 2018-12-18 2018-12-18 Method for estimating the pose of a single multi-view face image with arbitrary in-plane rotation

Country Status (1)

Country Link
CN (1) CN109671108B (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6937745B2 (en) * 2001-12-31 2005-08-30 Microsoft Corporation Machine vision system and method for estimating and tracking facial pose
CN101561710B (en) * 2009-05-19 2011-02-09 重庆大学 Man-machine interaction method based on estimation of human face posture
CN102054291A (en) * 2009-11-04 2011-05-11 厦门市美亚柏科信息股份有限公司 Method and device for reconstructing three-dimensional face based on single face image
US9437011B2 (en) * 2012-06-11 2016-09-06 Samsung Electronics Co., Ltd. Method and apparatus for estimating a pose of a head for a person
CN104036546B (en) * 2014-06-30 2017-01-11 清华大学 Method for carrying out face three-dimensional reconstruction at any viewing angle on basis of self-adaptive deformable model
CN105678241B (en) * 2015-12-30 2019-02-26 四川川大智胜软件股份有限公司 A kind of cascade two dimensional image face pose estimation

Also Published As

Publication number Publication date
CN109671108A (en) 2019-04-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 2021-07-07

Address after: Room 821, K3/F, Fuxing Huiyu Shuian International, Sanjiao Road, Wuchang District, Wuhan, Hubei Province, 430061

Patentee after: Tuoerte Intelligent Technology (Wuhan) Co., Ltd.

Address before: No. 69 Hongguang Avenue, Lijiatuo, Banan District, Chongqing 400054

Patentee before: Chongqing University of Technology

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 2020-07-28