CN108363973B - Unconstrained 3D expression migration method - Google Patents
- Publication number: CN108363973B (application CN201810124168.1A)
- Authority: CN (China)
- Prior art keywords: face, expression, model, shape, image
- Legal status: Active (an assumption by Google Patents, not a legal conclusion)
Classifications
- G06V40/174—Facial expression recognition
- G06F18/2135—Feature extraction based on approximation criteria, e.g. principal component analysis
- G06V40/168—Feature extraction; Face representation
- G06V40/172—Classification, e.g. identification
Abstract
The invention discloses an unconstrained 3D expression migration method based on computer vision and probability statistics. First, the face region is detected using Haar features with an AdaBoost classifier; next, geometric face features are extracted within that region using the Constrained Local Model (CLM) method; facial expression parameters are then extracted with Support Vector Regression (SVR); finally, the expression parameters are input to control the BlendShape of a 3D face model, synthesizing expression animation and realizing unconstrained 3D expression migration.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an unconstrained 3D expression migration method.
Background
Facial expressions play an important role in interpersonal communication: compared with media such as text and voice, they express human emotion more intuitively and accurately. This emotional mode of interaction is used in fields such as virtual reality, digital entertainment, video conferencing and human-computer interaction, where it is more expressive and more natural than traditional interaction modes such as voice or control panels. Expression migration methods roughly comprise three aspects: capturing the facial expression, extracting facial expression parameters, and synthesizing the parameterized target facial animation.
Current facial expression capture technologies can be divided into marker-based and marker-free approaches. Marker-based capture records accurate expression details, but it usually requires complex auxiliary hardware, and the user's face must be marked (e.g. with a marker pen) during capture, which is somewhat invasive. Marker-free capture places fewer constraints on hardware and on the user, extracting facial expression information from 2D face images.
Most expression-feature parameterization methods use machine learning: a model is trained on large datasets to learn the mapping between captured expression information and expression control parameters. However, such methods are limited by individual differences between users; the classifier's output depends to a great extent on the training data and on its ability to detect the face in a natural state. To address this, some researchers learn the expression features of the current user through an extra initialization step, which effectively weakens the influence of individual differences but increases the algorithm's complexity.
Animation synthesis methods can be divided, by the type of target face, into 2D face animation and 3D-model face animation. 2D face animation is image-based and can achieve high realism, but in the synthesized animation it is difficult to change the illumination or head pose, and difficult to splice the result seamlessly into a 3D scene. Among 3D-model approaches, muscle-model driving makes it hard to obtain control parameters through computer vision algorithms, and building and manually controlling such models is very costly; blend-sample (BlendShape) face animation requires a preset library of basic expressions that must be orthogonal and comprehensive, and building such a library is also labor-intensive.
The invention uses a marker-free expression capture algorithm, applies machine learning with comprehensive training data to automatically extract parameterized expression features based on the universal characteristics of facial expressions, and finally synthesizes the facial animation with an orthogonalized blend-sample model.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an unconstrained 3D expression migration method: user expression parameters are extracted from the features of the user's face image and mapped onto a virtual 3D model to synthesize facial animation, realizing unconstrained 3D expression migration.
In order to achieve the above object, the present invention provides an unconstrained 3D expression migration method, comprising:
(1) off-line training of the face shape model and obtaining relevant parameters in the model
(1.1) downloading a face image with characteristic point labels from a face database to serve as a face image set;
(1.2) constructing a face shape model: x = x̄ + P·B, where x̄ represents the average face shape, P = [P1, P2, ..., Pk] is the matrix of principal components of face-shape variation, and B = [b1, b2, ..., bk]^T is the weight vector of face-shape change;
(1.3) calculating relevant parameters of the face shape model by using the face image set
Let the face image set contain M face images, each with N feature points, the coordinate of the ith feature point being (xi, yi), i = 1, 2, ..., N;
The feature-point vector of the jth face image is denoted x^(j) = [x1 y1 x2 y2 ... xN yN]^T, and the average face shape x̄ is x̄ = (1/M) Σ_{j=1}^{M} x^(j);
Subtracting the average face shape from each image's feature-point vector gives the zero-mean shape-change matrix D = [x^(1) - x̄, ..., x^(M) - x̄];
Principal component analysis (PCA) is then used to extract the eigenvectors P_c and corresponding eigenvalues λ_c of D, c = 1, 2, ..., min(M, N); the first k eigenvectors, arranged as columns, form the matrix P of principal components of face-shape variation;
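The shape-model construction in steps (1.2)-(1.3) can be sketched in NumPy; this is an illustrative reconstruction (function and variable names are my own, not from the patent), computing the mean shape, the zero-mean shape-change matrix, and the first k principal components via SVD:

```python
import numpy as np

def build_shape_model(shapes, k):
    """Build a PCA face-shape model x = mean + P @ b from training shapes.

    shapes: (M, 2N) array, one row per image, [x1 y1 ... xN yN].
    Returns (mean_shape, P, eigvals), where the columns of P are the first
    k principal components of face-shape variation.
    """
    mean_shape = shapes.mean(axis=0)            # average face shape
    D = shapes - mean_shape                     # zero-mean shape-change matrix
    # PCA via SVD of the centered data: rows of Vt are eigenvectors of
    # D.T @ D; squared singular values give the variances lambda_c.
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    P = Vt[:k].T                                # (2N, k) principal components
    eigvals = (s[:k] ** 2) / (len(shapes) - 1)  # eigenvalues lambda_c
    return mean_shape, P, eigvals

# Toy usage: 20 "faces" of 5 landmarks (10 coordinates) each
rng = np.random.default_rng(0)
base = rng.normal(size=10)
shapes = base + 0.1 * rng.normal(size=(20, 10))
mean_shape, P, lam = build_shape_model(shapes, k=3)
# Project one shape into the model and rebuild it: x = mean + P @ b
recon = mean_shape + P @ (P.T @ (shapes[0] - mean_shape))
```

Because the columns of P are orthonormal, the weight vector B for a given shape is simply P^T applied to the centered shape, as in the projection above.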
(1.4) model within an m × m neighborhood centered at each feature point's coordinate: train a scoring model with a Support Vector Machine (SVM), use the trained model to score the m × m points in the neighborhood, and arrange the m × m scores into a scoring response map for each feature point;
Fit each scoring response map with the quadratic function r(x, y) = a(x - x0)^2 + b(y - y0)^2 + c, where (x0, y0) is the position of the center of the response map (i.e. the feature-point coordinate), a, b and c are fitting parameters, and (x, y) are the positions of the remaining points in the map;
Substitute the positions of the remaining points into the quadratic function, and solve for a, b and c by minimizing Σ (R(x, y) - r(x, y))^2, where R(x, y) is the score at (x, y) in the response map;
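The least-squares fit of the quadratic r(x, y) in step (1.4) can be sketched as follows; the synthetic response map and the grid size are illustrative assumptions:

```python
import numpy as np

def fit_quadratic_response(R, x0, y0):
    """Fit r(x,y) = a*(x-x0)^2 + b*(y-y0)^2 + c to a scoring response map R
    by least squares, i.e. by minimizing sum (R(x,y) - r(x,y))^2."""
    h, w = R.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Design matrix: one column per unknown (a, b, c)
    A = np.stack([(xs - x0).ravel() ** 2,
                  (ys - y0).ravel() ** 2,
                  np.ones(h * w)], axis=1)
    coef, *_ = np.linalg.lstsq(A, R.ravel(), rcond=None)
    a, b, c = coef
    return a, b, c

# Sanity check on a synthetic 9x9 response map with known parameters
ys, xs = np.mgrid[0:9, 0:9]
R = -0.5 * (xs - 4) ** 2 - 0.3 * (ys - 4) ** 2 + 2.0
a, b, c = fit_quadratic_response(R, 4, 4)
```

On noiseless data the fit recovers a = -0.5, b = -0.3, c = 2.0 exactly; on real SVM scores it smooths the response map into a form whose maximum can be located analytically.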
(2) acquiring an image containing the face of a user in real time, and marking the image as a source image;
(3) detecting the face region of the user in the source image with a Haar-feature + AdaBoost face detection algorithm;
(4) using the parameters solved in the training phase (x̄, P, λ_c, a, b, c), detect the feature points of the current face region:
(4.1) according to the face shape model x = x̄ + P·B and the parameters x̄ and P, use the shape-change weight vector B = [b1, b2, ..., bk]^T to obtain all initial feature points in the face region;
(4.2) obtaining a scoring response graph of each initial characteristic point according to the method in the step (1.4);
(4.3) take f(B) = Σ_{p=1}^{N} r(x_p, y_p) - β Σ_{c=1}^{k} b_c^2/λ_c as the objective function, where (x_p, y_p) is the position of the p-th point in its scoring response map and β is the weight of the shape constraint; make f converge to its maximum through iteration, and substitute the corresponding shape-change weight vector into the face shape model to obtain all feature points in the face region;
(4.4) take the updated shape-change weight vector B as the new shape-model parameter, repeat steps (4.2)-(4.3) until B no longer changes, and output the face feature-point positions corresponding to the final face shape model;
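The iterative fitting of steps (4.1)-(4.4) can be illustrated with a simplified NumPy sketch. This is not the patent's exact optimization: a closed-form regularized least-squares update stands in for maximizing the response-based objective (the eigenvalue-weighted penalty plays the role of the shape constraint β Σ b_c^2/λ_c), and the per-point "local search" is stubbed out with a fixed target shape:

```python
import numpy as np

def fit_shape(mean_shape, P, lam, find_targets, beta=1e-3, iters=20):
    """Alternate a local best-match search with a shape-regularized update
    of the weight vector b, stopping once b is essentially unchanged."""
    k = P.shape[1]
    b = np.zeros(k)
    reg = beta * np.diag(1.0 / lam)            # shape-constraint term
    for _ in range(iters):
        x = mean_shape + P @ b                 # current feature points
        y = find_targets(x)                    # best matches in neighborhoods
        # Regularized projection of the targets back into the shape space
        b_new = np.linalg.solve(P.T @ P + reg, P.T @ (y - mean_shape))
        if np.allclose(b_new, b, atol=1e-10):  # converged: B unchanged
            b = b_new
            break
        b = b_new
    return mean_shape + P @ b, b

# Toy run: the "local search" simply returns a fixed ground-truth shape
rng = np.random.default_rng(1)
mean_s = np.zeros(8)
P = np.linalg.qr(rng.normal(size=(8, 3)))[0]   # orthonormal shape basis
lam = np.array([3.0, 2.0, 1.0])
truth = mean_s + P @ np.array([0.5, -0.2, 0.1])
fitted, b = fit_shape(mean_s, P, lam, lambda x: truth)
```

With a small β the fitted shape lands essentially on the target; a larger β pulls the solution toward the mean shape, which is exactly the trade-off the shape-constraint weight controls.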
(5) parameterizing the expression characteristics of the user by adopting an Action Unit (AU) detection algorithm with normalized personal characteristics according to the positions of the facial feature points
(5.1) extracting apparent features of the human face
Using face image sets with AU parameter labels from multiple databases, scale the face region of each face image to a fixed size, extract a histogram of oriented gradients for each image, apply principal component analysis (PCA) to the gradient features, and select the first k components as the apparent face features;
(5.2) extracting geometric features of human face
Extracting human face feature points from the step (4) as human face geometric features;
(5.3) extracting the expression in the natural state
Using the sequences of natural-state face images in the SEMAINE dataset, compute the average of the apparent and geometric features over the sequence images as a natural-expression descriptor, then normalize the descriptor to generate the natural-expression features;
(5.4) according to the apparent features, the geometric features and the natural expression features extracted from the face image sets in the multiple databases, training an AU detection model by using a Support Vector Regression (SVR) method of a linear kernel;
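Step (5.4) can be sketched with scikit-learn's linear support-vector regressor; the synthetic features and the single-AU setup are illustrative assumptions (in practice one regressor per AU is trained on AU-labelled face data, with appearance, geometric and natural-expression features concatenated):

```python
import numpy as np
from sklearn.svm import LinearSVR

# Synthetic stand-in for concatenated appearance+geometry+neutral features
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))                  # feature vectors
true_w = rng.normal(size=40)
y = X @ true_w * 0.1 + 0.5                      # AU intensity target

# Linear-kernel SVR, as named in the patent for the AU detection model
au_model = LinearSVR(C=1.0, epsilon=0.01, max_iter=10000).fit(X, y)

# AU expression parameters for 5 face feature vectors, clipped to [0, 1]
pred = np.clip(au_model.predict(X[:5]), 0.0, 1.0)
```

The epsilon-insensitive loss makes the regressor ignore small deviations, which suits noisy AU intensity labels; the clip to [0, 1] is my convention for interpreting the output as an AU activation weight.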
(5.5) extracting apparent characteristics, geometric characteristics and natural expression characteristics of the user face area in the source image, inputting the extracted apparent characteristics, geometric characteristics and natural expression characteristics into an AU detection model, and acquiring AU expression parameters of the user face in the current source image;
(6) inputting the AU expression parameters into a 3D model containing a shape-difference model (BlendShape) and driving the model's expression with the Unity3D engine, realizing unconstrained 3D expression migration of the user.
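The BlendShape synthesis that the AU parameters drive in step (6) amounts to a linear combination of preset expression offsets. A minimal NumPy sketch (in the patent this blending runs inside the Unity3D engine; the tiny mesh and weights here are invented for illustration):

```python
import numpy as np

def blend(base, targets, weights):
    """Synthesize a mesh as base + sum_i w_i * (target_i - base): the
    linear BlendShape fusion driven by the expression parameters."""
    offsets = np.stack([t - base for t in targets])   # per-expression deltas
    return base + np.tensordot(weights, offsets, axes=1)

base = np.zeros((4, 3))                    # tiny 4-vertex "face" mesh
smile = base.copy(); smile[0] = [0, 1, 0]  # preset expression states
frown = base.copy(); frown[1] = [0, -1, 0]
# Half-strength smile, no frown
mesh = blend(base, [smile, frown], np.array([0.5, 0.0]))
```

Each AU parameter maps to one blend weight, so driving the animation per frame is just re-evaluating this weighted sum with the latest regression outputs.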
The objects of the invention are achieved as follows:
the invention relates to an unconstrained 3D expression migration method, which is realized by using a computer vision and probability statistics based method; firstly, detecting a face region by using the adaboost + harr characteristics, then extracting face geometric characteristics in the face region according to a Constrained Local Model (CLM) method, then extracting face expression parameters by using Support Vector Regression (SVR), finally inputting the expression parameters to control the face BlendShape of the 3D Model, synthesizing expression animation, and realizing the unconstrained 3D expression migration method.
Meanwhile, the unconstrained 3D expression migration method provided by the invention also has the following beneficial effects:
(1) Expression migration methods currently applied in 3D film production need additional auxiliary hardware to capture expression information, making the system complex and hard to popularize; the present method uses only an ordinary webcam and still achieves good migration performance;
(2) Many methods require specific positions on the user's face to be marked before expression parameters can be extracted, which is invasive to the user; the present method is marker-free;
(3) The invention offers low hardware cost, good real-time performance, fast response, and no constraints on the user.
drawings
FIG. 1 is a flow chart of an unconstrained 3D expression migration method of the present invention;
FIG. 2 is a flow chart of detecting facial region features;
FIG. 3 is a flow chart of expression parameter extraction;
FIG. 4 is a diagram illustrating the expression state of the prefabricated 3D model.
Detailed Description
The following describes specific embodiments of the invention with reference to the accompanying drawings, so that those skilled in the art can better understand it. It should be expressly noted that detailed descriptions of known functions and designs are omitted below where they might obscure the subject matter of the invention.
Examples
FIG. 1 is a flow chart of an unconstrained 3D expression migration method according to the present invention.
In this embodiment, as shown in fig. 1, the unconstrained 3D expression migration method of the invention comprises the following steps:
S1: training the face shape model offline and obtaining its parameters
S1.1: training the face shape model requires face images whose key-point coordinates have been manually calibrated; in this embodiment, face images with feature-point labels are downloaded from the AFLW database to serve as the face image set;
S1.2: constructing the face shape model x = x̄ + P·B, where x̄ represents the average face shape, P = [P1, P2, ..., Pk] is the matrix of principal components of face-shape variation, and B = [b1, b2, ..., bk]^T is the weight vector of face-shape change;
S1.3: using the face image set as input, comparing the ground-truth feature points labeled in the set with the model output, and iteratively updating the face-model parameters toward the true values:
Let the face image set contain M face images, each with N feature points, the coordinate of the ith feature point being (xi, yi), i = 1, 2, ..., N;
The feature-point vector of the jth face image is denoted x^(j) = [x1 y1 x2 y2 ... xN yN]^T, and the average face shape x̄ is x̄ = (1/M) Σ_{j=1}^{M} x^(j);
Subtracting the average face shape from each image's feature-point vector gives the zero-mean shape-change matrix D = [x^(1) - x̄, ..., x^(M) - x̄];
Principal component analysis (PCA) is then used to extract the eigenvectors P_c and corresponding eigenvalues λ_c of D, c = 1, 2, ..., min(M, N); the first k eigenvectors, arranged as columns, form the matrix P of principal components of face-shape variation;
S1.4: modeling within an m × m neighborhood centered at each feature point's coordinate: a scoring model is trained with a Support Vector Machine (SVM), the trained model scores the m × m points in the neighborhood, and the m × m scores form a scoring response map for each feature point;
Each scoring response map is fitted with the quadratic function r(x, y) = a(x - x0)^2 + b(y - y0)^2 + c, where (x0, y0) is the position of the center of the response map (i.e. the feature-point coordinate), a, b and c are fitting parameters, and (x, y) are the positions of the remaining points in the map;
The positions of the remaining points are substituted into the quadratic function, and a, b and c are solved by minimizing Σ (R(x, y) - r(x, y))^2, where R(x, y) is the score at (x, y) in the response map;
S2: acquiring an image containing the user's face in real time from a camera as the data source, and marking it as the source image;
S3: detecting the user's face region in the source image with a Haar-feature + AdaBoost face detection algorithm;
S4: using the shape-model parameters solved in the training phase (x̄, P, λ_c, a, b, c) together with the face-region image information, the initialized key-point positions are input to the model, whose output gives the feature points of the current face region; the specific detection process is described below with reference to fig. 2.
According to the constructed face shape model, a face shape can be initialized on the detected face region, and each point then searches for its best matching point in a neighborhood around the position predicted by the shape model. The specific steps are:
S4.1: according to the face shape model x = x̄ + P·B and the parameters x̄ and P, use the shape-change weight vector B = [b1, b2, ..., bk]^T to obtain all initial feature points in the face region;
S4.2: obtain the scoring response map of each initial feature point by the method of step S1.4;
S4.3: take f(B) = Σ_{p=1}^{N} r(x_p, y_p) - β Σ_{c=1}^{k} b_c^2/λ_c as the objective function, where (x_p, y_p) is the position of the p-th point in its scoring response map and β is the weight of the shape constraint; make f converge to its maximum through iteration, with the point of maximum response being the optimal match found within each neighborhood, and substitute the corresponding shape-change weight vector into the face shape model to obtain all feature points in the face region;
S4.4: take the updated shape-change weight vector B as the new shape-model parameter, repeat steps S4.2-S4.3 until B is essentially unchanged, and output the face feature-point positions corresponding to the final face shape model.
S5: parameterizing the user's expression features with a personally normalized Action Unit (AU) detection algorithm according to the facial feature-point positions; the specific flowchart is shown in fig. 3.
After all feature points in the user's face region have been acquired, they must be further converted into parametric information under a fixed rule before expression animation can be synthesized. The invention parameterizes facial expressions with the commonly used facial Action Unit (AU) scheme. Because of the large variability between individual users, the practical accuracy of an AU classifier depends heavily on its training data; the invention therefore combines the geometric and apparent features of the user's facial expression and trains an AU classifier across databases. The specific process is as follows:
S5.1: extracting the apparent face features
Using face image sets with AU parameter labels from multiple databases, the face region of each image is scaled to 112 × 112 and its histogram of oriented gradients (HOG) is extracted; specifically, 2 × 2-cell blocks with 8 × 8 pixels per cell are used, yielding a 4464-dimensional vector describing the face. To reduce the HOG dimensionality, principal component analysis (PCA) is then applied to these gradient features, and the first k components are selected as the apparent face features; in this embodiment, data from multiple databases are used so that the PCA dimensionality reduction yields apparent features with stronger generalization;
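The HOG extraction of S5.1 can be sketched with scikit-image; the 112 × 112 input, 8 × 8-pixel cells and 2 × 2-cell blocks follow the text, but skimage's dense block stride is an assumption on my part, so the resulting dimensionality (6084) differs from the 4464 quoted above:

```python
import numpy as np
from skimage.feature import hog

# Stand-in for a face region already scaled to 112x112 grayscale
face = np.random.default_rng(0).random((112, 112))

# 9 orientation bins, 8x8-pixel cells, 2x2-cell blocks
features = hog(face, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)
# 112/8 = 14 cells per side -> 13x13 block positions
# 13 * 13 * (2*2 cells) * 9 orientations = 6084 dims
```

PCA would then be fitted on these vectors across the training databases, with only the first k projected components kept as the apparent features.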
s5.2, extracting geometric features of human face
Extracting human face feature points from the step S4 to be used as human face geometric features;
s5.3, extracting expression in natural state
Because of individual differences, some faces appear to smile or frown even in a relaxed state, and it is very difficult to recognize certain expressions when the individual's natural expression is unknown; therefore, after the apparent and geometric features are acquired, the expression in the natural state must be extracted. The method relies on the fact that the faces in the sequence images are essentially in a natural state: in this embodiment, the average of the apparent and geometric features over the natural-state image sequences of the SEMAINE dataset is computed as a natural-expression descriptor, which is then normalized to generate the natural-expression features;
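The natural-expression descriptor of S5.3 can be sketched as an average over near-neutral frames; the patent does not specify the normalization, so L2 normalization is an assumption here:

```python
import numpy as np

def natural_expression_descriptor(sequence_features):
    """Average the per-frame appearance+geometry descriptors of a sequence
    whose faces are assumed to be in a natural state, then L2-normalize."""
    mean_desc = sequence_features.mean(axis=0)
    norm = np.linalg.norm(mean_desc)
    return mean_desc / norm if norm > 0 else mean_desc

rng = np.random.default_rng(2)
frames = rng.normal(loc=1.0, size=(50, 16))   # 50 near-neutral frames
neutral = natural_expression_descriptor(frames)
# Later descriptors can be referenced against this baseline, e.g.:
personalized = rng.normal(size=16) - neutral
```

Subtracting (or concatenating) this baseline gives the AU regressor a per-person reference, which is what weakens the influence of individual resting-face differences.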
s5.4, training an AU detection model by using a Support Vector Regression (SVR) method of a linear kernel according to the apparent characteristics, the geometric characteristics and the natural expression characteristics extracted from the face image set in the multiple databases;
s5.5, respectively extracting the apparent characteristics, the geometric characteristics and the natural expression characteristics of the user face area of the source image according to S5.1-S5.3, inputting the extracted apparent characteristics, the geometric characteristics and the natural expression characteristics into an AU detection model, and acquiring AU expression parameters of the user face of the current source image;
S6: animation synthesis methods include muscle-model driving, shape-difference-model driving and others. Because facial expressions involve far more complex deformation than limb motion, a bone-driven approach requires many skeleton nodes and its driving pipeline becomes quite complex; the shape-difference model, by contrast, is simple and effective. As shown in fig. 4, only some expression states need to be preset, and the final expression animation is synthesized by linear fusion. The method therefore inputs the AU expression parameters into a 3D model containing a shape-difference model (BlendShape) and drives the model's expression with the Unity3D engine, realizing unconstrained 3D expression migration of the user.
Although illustrative embodiments of the invention have been described above to help those skilled in the art understand it, the invention is not limited to the scope of those embodiments. To those skilled in the art, various changes within the spirit and scope of the invention as defined by the appended claims are apparent, and all matter utilizing the inventive concept is protected.
Claims (1)
1. An unconstrained 3D expression migration method is characterized by comprising the following steps:
(1) off-line training a face shape model and obtaining relevant parameters in the model;
(1.1) downloading a face image with characteristic point labels from a face database to serve as a face image set;
(1.2) constructing a face shape model: x = x̄ + P·B, where x̄ represents the average face shape, P = [P1, P2, ..., Pk] is the matrix of principal components of face-shape variation, and B = [b1, b2, ..., bk]^T is the weight vector of face-shape change;
(1.3) calculating relevant parameters of the face shape model by using the face image set;
setting M face images in the face image set, wherein each face image has N feature points, and the coordinate of the ith feature point is recorded as (x)i,yi),i=1,2,…,N;
The feature point vector composed by jth face image uses x(j)=[x1 y1 x2 y2...xN yN]TIndicate, average face shapeComprises the following steps:
subtracting the average face shape from the feature point vector formed by each face image to obtain a shape change matrix with the average value of 0
Then, PCA extraction matrix is used for principal component analysisCharacteristic vector P ofcAnd a corresponding characteristic value lambdacC is 1,2, …, min (M, N), and then the first k eigenvectors are selected to form a matrix P consisting of main components of the face shape change in a column discharge mode;
respectively modeling in a neighborhood range of m multiplied by m by taking the coordinate position of each feature point as a center, then training a scoring model by using a Support Vector Machine (SVM), scoring m multiplied by m points in the neighborhood range by using the trained scoring model to obtain m multiplied by m scoring results, and forming a scoring response graph of each feature point by using the m multiplied by m scoring results;
fitting each scored response graph to a quadratic function r (x, y) ═ a (x-x)0)2+b(y-y0)2+ c, wherein (x)0,y0) The position coordinates of the center point of the response diagram, namely the coordinates of each characteristic point, a, b and c are fitting parameters, and (x, y) the position coordinates of the rest points except the center point in the response diagram;
the position coordinates of the other points are respectively substituted into the quadratic function to obtain the maximum value R (x, y) in the response diagram, and then sigma (R (x, y) -R (x, y))2Solving the values of a, b and c;
(2) acquiring an image containing the face of a user in real time, and marking the image as a source image;
(3) detecting a face region of a user in the source image by using a haar feature + adaboost face detection algorithm;
(4) using the parameters solved in the training phase (x̄, P, λ_c, a, b, c), detect the feature points of the current face region;
(4.1) according to the face shape model x = x̄ + P·B and the parameters x̄ and P, use the shape-change weight vector B = [b1, b2, ..., bk]^T to obtain all initial feature points in the face region;
(4.2) obtaining the scoring response map of each initial feature point according to the method in step (1.3);
(4.3) taking f(B) = Σ_p r(x_p, y_p) − β‖B‖² as the objective function, where (x_p, y_p) are the position coordinates of the p-th point in the scoring response map and β is the weight of the shape constraint; making f converge to its maximum through iteration, and substituting the corresponding shape change weight vector into the face shape model to obtain all feature points in the face region;
(4.4) taking the updated shape change weight vector B as the new shape model parameter, repeating steps (4.2)-(4.3) until B no longer changes, and outputting the face feature point positions corresponding to the final face shape model;
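The iterative fitting loop of steps (4.1)-(4.4) can be sketched as follows; this is a simplified illustration that reads precomputed response-map optima instead of maximizing the full objective:

```python
import numpy as np

def fit_shape(mean_shape, P, peak_positions, n_iter=20):
    """Iterate steps (4.2)-(4.4): the full method rescores response maps
    around the current shape and maximizes the objective; this sketch
    jumps straight to the response-map optima (peak_positions) and
    projects them onto the PCA shape subspace via the weight vector B."""
    B = np.zeros(P.shape[1])
    for _ in range(n_iter):
        # With orthonormal P (as produced by PCA), P.T @ d gives the
        # least-squares shape weights for the displacement d.
        B_new = P.T @ (peak_positions - mean_shape)
        if np.allclose(B_new, B):        # (4.4): stop once B is unchanged
            break
        B = B_new
    return mean_shape + P @ B, B         # fitted shape and final weights

# toy check: with orthonormal P, the target shape is recovered exactly
mean = np.zeros(6)
P = np.eye(6)[:, :2]
target = mean + P @ np.array([1.0, -2.0])
fitted, B = fit_shape(mean, P, target)
```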
(5) adopting an expression parameter extraction algorithm based on an Action Unit (AU) to parameterize the expression characteristics of the user according to the positions of the facial feature points;
(5.1) extracting the apparent features of the human face;
using face image sets with AU parameter labels from multiple databases, scaling the face region of each face image to a fixed size, then extracting the gradient histogram of each face image, extracting gradient features from the histogram using Principal Component Analysis (PCA), and selecting the first k features as the face apparent features;
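Step (5.1) can be approximated as below; the single-histogram gradient feature is a deliberately simplified stand-in for a full HOG descriptor, and all names and toy data are illustrative:

```python
import numpy as np

def hog_like_features(gray, n_bins=9):
    """Simplified gradient-histogram feature for a size-normalized face
    crop: one orientation histogram over the whole image (a full HOG
    descriptor would additionally pool over cells and blocks)."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)       # unsigned orientation
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)             # normalized histogram

def pca_reduce(features, k):
    """Keep the first k principal components of a feature matrix."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

rng = np.random.default_rng(1)
imgs = rng.normal(size=(6, 32, 32))               # stand-ins for face crops
feats = np.stack([hog_like_features(im) for im in imgs])
reduced = pca_reduce(feats, k=2)
```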
(5.2) extracting geometric features of the human face;
using the face feature points extracted in step (4) as the face geometric features;
(5.3) extracting the expression in a natural state;
calculating the average values of the apparent features and geometric features over the neutral-state face image sequences in the SEMAINE data set as the natural expression descriptor, and then normalizing the descriptor to generate the natural expression feature;
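A minimal sketch of step (5.3); the patent does not specify the normalization operation, so L2 normalization is assumed here:

```python
import numpy as np

def natural_expression_feature(descriptors):
    """Average the per-frame apparent+geometric descriptors of the
    neutral-state frames, then L2-normalize (normalization scheme is an
    assumption; the patent only says 'normalization operation')."""
    mean_desc = np.mean(descriptors, axis=0)
    norm = np.linalg.norm(mean_desc)
    return mean_desc / norm if norm > 0 else mean_desc

descs = np.array([[3.0, 0.0], [0.0, 4.0]])   # two toy neutral-frame descriptors
nat = natural_expression_feature(descs)      # mean (1.5, 2.0), unit-normalized
```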
(5.4) according to the apparent features, geometric features and natural expression features extracted from the face image sets in the multiple databases, training an AU detection model using linear-kernel Support Vector Regression (SVR);
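Step (5.4) fits a linear model from features to AU intensities. The patent uses linear-kernel SVR; the dependency-free sketch below substitutes ordinary linear least squares (same linear predictor, different loss), so it illustrates the idea rather than the patented training procedure:

```python
import numpy as np

def train_au_regressor(X, y):
    """Fit a linear model from the concatenated apparent, geometric and
    natural expression features to one AU intensity. (Plain least squares
    stands in for linear-kernel SVR to keep the sketch dependency-free.)"""
    Xb = np.column_stack([X, np.ones(len(X))])   # append bias term
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_au(w, x):
    """Predict the AU intensity for one feature vector."""
    return np.append(x, 1.0) @ w

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))                      # toy feature vectors
y = X @ np.array([0.5, -1.0, 2.0, 0.0]) + 3.0     # noiseless linear target
w = train_au_regressor(X, y)
pred = predict_au(w, X[0])
```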
(5.5) extracting the apparent features, geometric features and natural expression features of the user's face region in the source image, inputting them into the AU detection model, and obtaining the AU expression parameters of the user's face in the current source image;
(6) inputting the AU expression parameters into a 3D model containing a shape difference model (BlendShape), and driving the model expression by using a Unity3D engine to realize the unconstrained 3D model expression migration of the user.
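At its core, step (6)'s BlendShape deformation is a weighted sum of per-expression vertex offsets, with the AU parameters supplying the weights; the NumPy sketch below shows the geometry only and omits the Unity3D engine integration:

```python
import numpy as np

def apply_blendshapes(base_vertices, deltas, au_weights):
    """Deform the neutral mesh by a weighted sum of per-expression vertex
    offsets, using the AU expression parameters as weights. Engine-side
    details (e.g. Unity3D's SkinnedMeshRenderer.SetBlendShapeWeight) are
    omitted from this sketch.

    base_vertices: (V, 3) neutral mesh
    deltas:        (K, V, 3) vertex offsets of the K blend shapes
    au_weights:    (K,) AU-derived weights
    """
    return base_vertices + np.tensordot(au_weights, deltas, axes=1)

base = np.zeros((2, 3))                               # toy 2-vertex mesh
deltas = np.stack([np.ones((2, 3)), 2 * np.ones((2, 3))])
deformed = apply_blendshapes(base, deltas, np.array([0.5, 0.25]))
```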
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810124168.1A CN108363973B (en) | 2018-02-07 | 2018-02-07 | Unconstrained 3D expression migration method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108363973A CN108363973A (en) | 2018-08-03 |
CN108363973B true CN108363973B (en) | 2022-03-25 |
Family
ID=63005138
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810124168.1A Active CN108363973B (en) | 2018-02-07 | 2018-02-07 | Unconstrained 3D expression migration method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108363973B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109493403A (en) * | 2018-11-13 | 2019-03-19 | 北京中科嘉宁科技有限公司 | A method of human face animation is realized based on moving cell Expression Mapping |
CN110197167B (en) * | 2019-06-05 | 2021-03-26 | 清华大学深圳研究生院 | Video motion migration method |
CN110647636B (en) * | 2019-09-05 | 2021-03-19 | 深圳追一科技有限公司 | Interaction method, interaction device, terminal equipment and storage medium |
CN110728193B (en) * | 2019-09-16 | 2022-10-04 | 连尚(新昌)网络科技有限公司 | Method and device for detecting richness characteristics of face image |
CN110889894A (en) * | 2019-10-25 | 2020-03-17 | 中国科学院深圳先进技术研究院 | Three-dimensional face reconstruction method and device and terminal equipment |
CN111460945A (en) * | 2020-03-25 | 2020-07-28 | 亿匀智行(深圳)科技有限公司 | Algorithm for acquiring 3D expression in RGB video based on artificial intelligence |
CN112489689B (en) * | 2020-11-30 | 2024-04-30 | 东南大学 | Cross-database voice emotion recognition method and device based on multi-scale difference countermeasure |
CN112541445B (en) * | 2020-12-16 | 2023-07-18 | 中国联合网络通信集团有限公司 | Facial expression migration method and device, electronic equipment and storage medium |
CN114677739A (en) * | 2022-03-30 | 2022-06-28 | 北京字跳网络技术有限公司 | Facial expression capturing method and device, computer equipment and storage medium |
CN115908655B (en) * | 2022-11-10 | 2023-07-14 | 北京鲜衣怒马文化传媒有限公司 | Virtual character facial expression processing method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104951743A (en) * | 2015-03-04 | 2015-09-30 | 苏州大学 | Active-shape-model-algorithm-based method for analyzing face expression |
CN106096557A (en) * | 2016-06-15 | 2016-11-09 | 浙江大学 | A kind of semi-supervised learning facial expression recognizing method based on fuzzy training sample |
Non-Patent Citations (4)
Title |
---|
Markerless 3D Facial Motion Capture System; Hwang, Y. et al.; Engineering Reality of Virtual Reality 2012; 2012-12-31; vol. 8289; full text *
Spontaneous facial micro-expression detection based on deep learning; Xiao Hong et al.; IEEE 13th International Conference on Signal Processing (ICSP); 2016-12-31; pp. 1131-1133, Section 3 *
Research on face feature point tracking methods; Zhou Fan; China Masters' Theses Full-text Database, Information Science and Technology; 2017-03-15 (No. 3); p. 18, Section 2.6, pp. 21-27, Section 3.2 *
Research progress on realistic 3D facial expression synthesis technology; Wan Xianmei et al.; Journal of Computer-Aided Design & Computer Graphics; 2014-02-28; vol. 26, no. 2; full text *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||