CN114463815A - Facial expression capturing method based on face key points - Google Patents

Facial expression capturing method based on face key points

Info

Publication number
CN114463815A
CN114463815A
Authority
CN
China
Prior art keywords
expression
key points
face
facial
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210102416.9A
Other languages
Chinese (zh)
Inventor
杨帆
郝强
潘鑫淼
胡建国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Zhenshi Intelligent Technology Co Ltd
Original Assignee
Nanjing Zhenshi Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Zhenshi Intelligent Technology Co Ltd filed Critical Nanjing Zhenshi Intelligent Technology Co Ltd
Priority to CN202210102416.9A priority Critical patent/CN114463815A/en
Publication of CN114463815A publication Critical patent/CN114463815A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects

Abstract

The invention discloses a facial expression capturing method based on face key points, comprising the following steps: 1) bind expressions to the selected 3DMM model using a facial expression binding algorithm; 2) generate random 3D faces, normalize the recorded face key points to obtain normalized key point sets, and form a training data set from the expression coefficients and the corresponding normalized key point sets; 3) build a multilayer perceptron model and iteratively optimize it by gradient descent to obtain an expression recognition model; 4) detect key points on the face image to be recognized, screen out the relevant key points, and normalize them to obtain normalized key points; 5) input the normalized face key points into the expression recognition model and predict the current degree of change of the facial expression. With this method, fine facial expressions can be analyzed using only a visible-light camera and a machine learning model; the cost is low, the real-time performance is good, and large-scale application is facilitated.

Description

Facial expression capturing method based on face key points
Technical Field
The invention relates to the technical field of facial expression capture, in particular to a facial expression capture method based on face key points.
Background
Existing facial expression capture methods require placing sensors on the face or using a structured-light camera; the equipment is expensive, and wearing it is uncomfortable and interferes with facial movement. Computer-vision-based methods, constrained by the difficulty of data acquisition, can recognize only a few coarse expressions such as happiness and sadness, and struggle with fine expressions such as frowning or raising the corners of the mouth. A facial expression capture method based on face key points is therefore needed, so that fine facial expressions can be analyzed in real time with only a visible-light camera and a machine learning model.
Disclosure of Invention
The invention provides a facial expression capturing method based on face key points, aiming to solve the problems described in the background above.
To this end, an embodiment of the present invention provides a facial expression capturing method based on face key points, comprising the following steps:
1) binding expressions to the selected 3DMM model using a facial expression binding algorithm;
2) generating random 3D faces, normalizing the recorded face key points to obtain normalized key point sets, and forming a training data set from the expression coefficients and the corresponding normalized key point sets;
3) building a multilayer perceptron model and iteratively optimizing it by gradient descent to obtain an expression recognition model;
4) detecting key points on the face image to be recognized, screening out the relevant key points, and normalizing them to obtain normalized key points;
5) inputting the normalized face key points into the expression recognition model and predicting the current degree of change of the facial expression.
Further, step 1) includes selecting a 3DMM model and binding the required expressions to it using a facial expression binding algorithm, where the degree of change of each expression is represented by a value from 0 to 1.
Further, the 3DMM model may be a BFM model or an LSFM model.
Further, step 2) comprises:
randomly adjusting the identity coefficients, expression coefficients, and rotation angle to generate random 3D faces, with multiple random groups of expression coefficients corresponding to multiple 3D faces;
projecting the 3D faces onto a two-dimensional plane, recording the face key point coordinates on the plane, and normalizing all key points to obtain normalized key point sets; and
forming a training data set from the groups of expression coefficients and the corresponding normalized key point sets.
Further, step 3) comprises:
building a multilayer perceptron model whose input is the horizontal and vertical coordinates of the key points and whose output is the predicted expression; and
computing the L1 loss between the predicted and true expressions of each batch of training data and iteratively optimizing the multilayer perceptron by gradient descent to obtain the expression recognition model.
Further, step 4) comprises:
detecting key points on the face image to be recognized with a face key point detection tool, screening out the key points corresponding to the training data, and normalizing the screened key point coordinates to obtain normalized key points.
Further, step 5) comprises:
inputting the normalized face key points into the expression recognition model and predicting the current degree of change of the facial expression.
Furthermore, the expressions include raising the eyebrows, furrowing the brow, closing the eyes, and opening the mouth.
The embodiment of the present invention further provides a terminal device, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the steps of any one of the above facial expression capturing methods based on face key points.
The embodiment of the present invention further provides a computer-readable storage medium, which includes a stored computer program, wherein the computer program, when run, controls a device on which the computer-readable storage medium is located to execute the steps of any one of the above facial expression capturing methods based on face key points.
Beneficial effects: a facial expression capturing method based on face key points is provided, with which fine facial expressions can be analyzed using only a visible-light camera and a machine learning model. The method is low-cost, runs in real time, and is suitable for large-scale deployment.
Drawings
FIG. 1 is a schematic flow chart of a facial expression capturing method based on face key points according to the present invention;
FIG. 2 is a schematic diagram of the coordinates of the 42 key points on the two-dimensional projection of the face in the facial expression capturing method based on face key points provided by the present invention;
fig. 3 is a schematic structural diagram of a preferred embodiment of a terminal device provided in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating a facial expression capturing method based on face key points according to a preferred embodiment of the present invention. The facial expression capturing method based on the face key points comprises the following steps:
s1, binding blenshapes (expression) for the 3DMM model: the Face 3DMM (3D portable Models) is a parameterized Face Model, and can generate a specific 3D Face shape by adjusting identity parameters, expression parameters, and rotation angles, and common Face 3DMM Models include a BFM (base Face Model), an LSFM (Large Scale Face Model), and the like. Selecting a proper 3DMM model, using a facial expression binding algorithm-Example-based facial profiling, binding M (M is a positive integer) expression blenshapes required by the 3DMM model, wherein the expression can be decoupled fine expressions such as eyebrow lifting, eyebrow wrinkling, eye closing, mouth opening and the like, each blenshape is a value from 0 to 1 and represents the change degree of the expression, and the 3DMM model after the blenshapes are calibrated can control the fine expression of the 3D face by adjusting the blenshapes coefficient;
S2, generating training data: randomly adjust the identity coefficients, blendshape coefficients, and rotation angle to generate random 3D faces. Randomly sample N groups (N a positive integer greater than 100) of blendshape coefficients $(B_0, B_1, \ldots, B_{N-1})$, corresponding to N 3D faces $(\mathrm{Face}_0, \mathrm{Face}_1, \ldots, \mathrm{Face}_{N-1})$, where $B_n = (b_n^0, b_n^1, \ldots, b_n^{M-1})$ is the nth group of random expression coefficients (n an integer from 0 to N-1) and $b_n^m$ is the mth expression coefficient component ($b_n^m \in [0, 1]$, m an integer from 0 to M-1). Project the 3D faces $(\mathrm{Face}_0, \mathrm{Face}_1, \ldots, \mathrm{Face}_{N-1})$ onto a two-dimensional plane and record the coordinates of the 42 key points near the eyebrows (10), eyes (12), nose (9), lips (8), and chin (3) of each projected face, as shown in fig. 2. The set of projected key points corresponding to the nth 3D face $\mathrm{Face}_n$ is $lmks_n = \{lmk_n^k = (x_n^k, y_n^k)\}$, where k is an integer from 0 to 41 and $x_n^k$, $y_n^k$ are the abscissa and ordinate of the kth key point. Normalize the key points: within the key point set $lmks_n$, denote the abscissa of the leftmost key point by $left_n$, the abscissa of the rightmost key point by $right_n$, the ordinate of the uppermost key point by $top_n$, and the ordinate of the lowermost key point by $bottom_n$. The normalized nth key point set is $\widehat{lmks}_n = \{\widehat{lmk}_n^k = (\hat{x}_n^k, \hat{y}_n^k)\}$, where the kth key point coordinates are

$$\hat{x}_n^k = \frac{x_n^k - left_n}{right_n - left_n}, \qquad \hat{y}_n^k = \frac{y_n^k - top_n}{bottom_n - top_n}.$$

The N groups of blendshape coefficients and the corresponding normalized key point sets form the training data set $\{(B_n, \widehat{lmks}_n) \mid n = 0, 1, \ldots, N-1\}$.
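A hedged sketch of this data-generation step follows. The stand-in keypoint model, the toy projection, and the coefficient ranges are illustrative assumptions; a real pipeline would project the 42 designated keypoint vertices of the rigged 3DMM from step S1.

```python
import numpy as np

N, M, K = 1000, 16, 42                 # samples, blendshapes, keypoints (illustrative)
rng = np.random.default_rng(1)

# Stand-in 3D keypoint model: mean positions plus one random delta per blendshape.
mean_kp = rng.standard_normal((K, 3))
exp_deltas = rng.standard_normal((M, K, 3)) * 0.1

def project(kp3d, yaw):
    """Toy orthographic projection after a rotation about the y-axis."""
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return (kp3d @ R.T)[:, :2]         # drop the depth coordinate

def normalize(lmks):
    """Min-max normalization using the left/right/top/bottom extremes, as in S2."""
    left, right = lmks[:, 0].min(), lmks[:, 0].max()
    top, bottom = lmks[:, 1].min(), lmks[:, 1].max()
    out = np.empty_like(lmks)
    out[:, 0] = (lmks[:, 0] - left) / (right - left)
    out[:, 1] = (lmks[:, 1] - top) / (bottom - top)
    return out

B = rng.uniform(0.0, 1.0, size=(N, M))           # N random groups of coefficients
X = np.empty((N, K, 2))
for n in range(N):
    kp3d = mean_kp + np.tensordot(B[n], exp_deltas, axes=1)
    X[n] = normalize(project(kp3d, yaw=rng.uniform(-0.5, 0.5)))
# Training pairs: inputs X[n].ravel() (84 values), targets B[n] (M values).
```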
S3, training the expression recognition model: build a multilayer perceptron with four fully connected layers (the stated neuron counts are 84, 256, and M). The model takes the horizontal and vertical coordinates of the 42 key points (84 values) as input and outputs the M predicted expression blendshapes. Compute the L1 loss between the predicted and true blendshapes for each batch of training data and optimize the model iteratively by gradient descent; the trained model can then accurately predict the expression blendshapes from the input key points.
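The training loop might look like the following PyTorch sketch. The description names four fully connected layers but lists only the sizes 84, 256, and M, so the remaining hidden widths are assumed here to be 256; the final sigmoid (keeping predictions in [0, 1]) is likewise an assumption.

```python
import torch
import torch.nn as nn

M, N = 16, 1000                          # blendshape and sample counts (illustrative)

model = nn.Sequential(
    nn.Linear(84, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),      # assumed hidden width
    nn.Linear(256, 256), nn.ReLU(),      # assumed hidden width
    nn.Linear(256, M), nn.Sigmoid(),     # sigmoid keeps outputs in [0, 1] (assumption)
)

# Stand-ins for the S2 training pairs: 84 normalized coordinates -> M coefficients.
x = torch.rand(N, 84)
y = torch.rand(N, M)

opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.L1Loss()                    # L1 loss, as in the description

for epoch in range(100):
    perm = torch.randperm(N)
    for i in range(0, N, 64):            # mini-batch gradient descent
        idx = perm[i:i + 64]
        loss = loss_fn(model(x[idx]), y[idx])
        opt.zero_grad(); loss.backward(); opt.step()
```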
S4, detecting face key points: use a face key point detection tool (e.g. Dlib or face-alignment) to detect key points on the face image to be recognized and screen out the 42 key points $(lmk_0, lmk_1, \ldots, lmk_{41})$ corresponding to the training data, where $lmk_k = (x_k, y_k)$ is the kth key point and $x_k$, $y_k$ are its abscissa and ordinate. Normalize the key point coordinates: among the 42 key points, denote the abscissa of the leftmost key point by $left$, the abscissa of the rightmost by $right$, the ordinate of the uppermost by $top$, and the ordinate of the lowermost by $bottom$. The 42 normalized key points are $\widehat{lmks} = \{\widehat{lmk}_k = (\hat{x}_k, \hat{y}_k)\}$, where the kth key point coordinates are

$$\hat{x}_k = \frac{x_k - left}{right - left}, \qquad \hat{y}_k = \frac{y_k - top}{bottom - top}.$$
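As a sketch of this detection-and-normalization step, the snippet below uses the face-alignment package (one of the tools named above) to detect the 68 standard landmarks and keep 42 of them. The particular index subset is an illustrative assumption; in practice it must match exactly the keypoints used in the training data.

```python
import numpy as np
import face_alignment

# Older face-alignment releases spell the enum LandmarksType._2D instead of TWO_D.
fa = face_alignment.FaceAlignment(face_alignment.LandmarksType.TWO_D, device="cpu")

def detect_normalized_keypoints(image):
    """Detect 68 landmarks, keep 42 (assumed subset), and min-max normalize them."""
    lmks68 = fa.get_landmarks(image)[0]              # (68, 2), first detected face
    keep = (list(range(17, 27))                      # eyebrows: 10 points
            + list(range(36, 48))                    # eyes: 12 points
            + list(range(27, 36))                    # nose: 9 points
            + [48, 51, 54, 57, 60, 62, 64, 66]       # lips: 8 points
            + [7, 8, 9])                             # chin: 3 points
    lmks = lmks68[keep].astype(np.float64)
    left, right = lmks[:, 0].min(), lmks[:, 0].max()
    top, bottom = lmks[:, 1].min(), lmks[:, 1].max()
    lmks[:, 0] = (lmks[:, 0] - left) / (right - left)
    lmks[:, 1] = (lmks[:, 1] - top) / (bottom - top)
    return lmks                                      # (42, 2) normalized coordinates
```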
S5, capturing the facial expression: input the normalized face key points $\widehat{lmks}$ into the trained model and predict the degree of change of each of the M facial expressions at that moment.
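Putting S4 and S5 together, an inference call might look like the sketch below; `detect_normalized_keypoints` and `model` are assumed to come from the two preceding sketches, and the image path is illustrative.

```python
import cv2
import numpy as np
import torch

image = np.ascontiguousarray(cv2.imread("face.jpg")[:, :, ::-1])  # BGR -> RGB
lmks = detect_normalized_keypoints(image)                         # (42, 2)
with torch.no_grad():
    coeffs = model(torch.from_numpy(lmks.reshape(1, -1)).float())[0]
for m, v in enumerate(coeffs):
    print(f"expression {m}: degree of change {float(v):.2f}")
```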
Referring to fig. 3, fig. 3 is a schematic structural diagram of a terminal device according to a preferred embodiment of the present invention. The terminal device comprises a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, wherein the processor executes the computer program to implement the steps of the facial expression capture method based on the face key points according to any one of the embodiments.
Preferably, the computer program may be divided into one or more modules/units (e.g., computer program 1, computer program 2, …) that are stored in the memory and executed by the processor to implement the invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program in the terminal device.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like; a general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor is the control center of the terminal device and connects the various parts of the terminal device through various interfaces and lines.
The memory mainly includes a program storage area and a data storage area, where the program storage area may store an operating system, an application program required for at least one function, and the like, and the data storage area may store related data and the like. In addition, the memory may be a high-speed random access memory or a non-volatile memory, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, or other non-volatile solid-state memory devices.
It should be noted that the terminal device may include, but is not limited to, a processor and a memory, and those skilled in the art will understand that the structural diagram of fig. 3 is only an example of the terminal device and does not constitute a limitation of the terminal device, and may include more or less components than those shown, or combine some components, or different components.
The embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, where when the computer program runs, a device on which the computer-readable storage medium is located is controlled to execute the steps of the facial expression capture method based on the facial keypoints according to any one of the above embodiments.
The embodiment of the invention provides a facial expression capturing method based on human face key points, which can realize the analysis of the fine expressions of the human face only by combining a visible light camera with a machine learning model.
It should be noted that the above-described system embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the system provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A facial expression capturing method based on face key points, characterized by comprising the following steps:
1) binding expressions to the selected 3DMM model using a facial expression binding algorithm;
2) generating random 3D faces, normalizing the recorded face key points to obtain normalized key point sets, and forming a training data set from the expression coefficients and the corresponding normalized key point sets;
3) building a multilayer perceptron model and iteratively optimizing it by gradient descent to obtain an expression recognition model;
4) detecting key points on the face image to be recognized, screening out the relevant key points, and normalizing them to obtain normalized key points;
5) inputting the normalized face key points into the expression recognition model and predicting the current degree of change of the facial expression.
2. The facial expression capturing method based on face key points according to claim 1, wherein step 1) further comprises selecting a 3DMM model and binding the required expressions to the 3DMM model using a facial expression binding algorithm, each expression adopting a value from 0 to 1 to represent its degree of change.
3. The facial expression capturing method based on face key points according to claim 1, wherein the 3DMM model comprises a BFM model or an LSFM model.
4. The facial expression capturing method based on face key points according to claim 1, wherein step 2) further comprises:
randomly adjusting the identity coefficients, expression coefficients, and rotation angle to generate random 3D faces, with multiple random groups of expression coefficients corresponding to multiple 3D faces;
projecting the 3D faces onto a two-dimensional plane, recording the face key point coordinates on the plane, and normalizing all key points to obtain normalized key point sets; and
forming a training data set from the groups of expression coefficients and the corresponding normalized key point sets.
5. The facial expression capturing method based on face key points according to claim 1, wherein step 3) further comprises:
building a multilayer perceptron model whose input is the horizontal and vertical coordinates of the key points and whose output is the predicted expression; and
computing the L1 loss between the predicted and true expressions of each batch of training data and iteratively optimizing the multilayer perceptron by gradient descent to obtain the expression recognition model.
6. The facial expression capturing method based on face key points according to claim 5, wherein step 4) further comprises:
detecting key points on the face image to be recognized with a face key point detection tool, screening out the key points corresponding to the training data, and normalizing the screened key point coordinates to obtain normalized key points.
7. The facial expression capturing method based on face key points according to claim 6, wherein step 5) further comprises:
inputting the normalized face key points into the expression recognition model and predicting the current degree of change of the facial expression.
8. The facial expression capturing method based on face key points according to claim 1, wherein the expressions comprise raising the eyebrows, furrowing the brow, closing the eyes, and opening the mouth.
9. A terminal device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the steps of the facial expression capturing method based on face key points according to any one of claims 1 to 8.
10. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when run, controls a device on which the computer-readable storage medium is located to perform the steps of the facial expression capturing method based on face key points according to any one of claims 1 to 8.
CN202210102416.9A 2022-01-27 2022-01-27 Facial expression capturing method based on face key points Pending CN114463815A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210102416.9A CN114463815A (en) 2022-01-27 2022-01-27 Facial expression capturing method based on face key points

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210102416.9A CN114463815A (en) 2022-01-27 2022-01-27 Facial expression capturing method based on face key points

Publications (1)

Publication Number Publication Date
CN114463815A (en) 2022-05-10

Family

ID=81410905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210102416.9A Pending CN114463815A (en) 2022-01-27 2022-01-27 Facial expression capturing method based on face key points

Country Status (1)

Country Link
CN (1) CN114463815A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114627218A (en) * 2022-05-16 2022-06-14 成都市谛视无限科技有限公司 Human face fine expression capturing method and device based on virtual engine
CN116612512A (en) * 2023-02-02 2023-08-18 北京甲板智慧科技有限公司 Facial expression image processing method and device based on monocular RGB camera

Similar Documents

Publication Publication Date Title
CN109325437B (en) Image processing method, device and system
Maung Real-time hand tracking and gesture recognition system using neural networks
AU2012227166B2 (en) Face feature vector construction
CN111160269A (en) Face key point detection method and device
WO2020103700A1 (en) Image recognition method based on micro facial expressions, apparatus and related device
CN114463815A (en) Facial expression capturing method based on face key points
CN106687989A (en) Method and system of facial expression recognition using linear relationships within landmark subsets
CN111967392A (en) Face recognition neural network training method, system, equipment and storage medium
CN109145766A (en) Model training method, device, recognition methods, electronic equipment and storage medium
CN106778474A (en) 3D human body recognition methods and equipment
CN110598638A (en) Model training method, face gender prediction method, device and storage medium
CN106651915A (en) Target tracking method of multi-scale expression based on convolutional neural network
CN110222780A (en) Object detecting method, device, equipment and storage medium
CN110956082A (en) Face key point detection method and detection system based on deep learning
CN115050064A (en) Face living body detection method, device, equipment and medium
CN111680544B (en) Face recognition method, device, system, equipment and medium
WO2021042544A1 (en) Facial verification method and apparatus based on mesh removal model, and computer device and storage medium
CN110163095B (en) Loop detection method, loop detection device and terminal equipment
CN111382791A (en) Deep learning task processing method, image recognition task processing method and device
Forczmański et al. Comparative analysis of simple facial features extractors
Kim et al. Convolutional neural network architectures for gaze estimation on mobile devices
CN114360031B (en) Head pose estimation method, computer device, and storage medium
CN113362249B (en) Text image synthesis method, text image synthesis device, computer equipment and storage medium
CN114782592A (en) Cartoon animation generation method, device and equipment based on image and storage medium
CN107194980A (en) Faceform's construction method, device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination