CN111178255B - Tensor decomposition-based multi-feature fusion 4D expression identification method - Google Patents
- Publication number: CN111178255B
- Application number: CN201911384458.0A
- Authority: CN (China)
- Prior art keywords: face, expression, emo, normal vector, data
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
Abstract
A 4D expression recognition method based on tensor decomposition and multi-feature fusion first acquires 4D facial expression data; after preprocessing, it computes the three normal-vector components, the shape index and the depth map of the 4D facial expression data; it then applies tensor decomposition to each of these features of the 4D face data and extracts the dynamic facial expression information; finally, a dynamic image network classifies the dynamic expression information, and score fusion of the classification results yields the final result. The method makes full use of the information in the 4D face data: computing the three normal-vector components, the shape index and the depth map of the sequence face data exploits the 3D geometric information of the face, so the features are more representative and discriminative across different people, and the accuracy of face recognition and expression recognition is higher.
Description
Technical Field
The invention relates to an expression recognition method, in particular to a tensor decomposition-based multi-feature fusion 4D expression recognition method.
Background
With the development of artificial intelligence and computer technology, expression recognition and face recognition are attracting increasing attention, and their applications in daily life are gradually becoming widespread. Many facial expression recognition methods currently exist, for example extracting features from 2D pictures or videos with a deep neural network and then classifying them; expression classification using 3D face data is also available. In practice, 2D-based facial expression recognition is susceptible to illumination and scene changes. 3D-based expression recognition can overcome the influence of illumination and pose, but because different people express the same expression in different ways and to different degrees, even the same expression differs between individuals. For the expression recognition problem, the identity information of the person therefore acts as a disturbance.
Disclosure of Invention
The invention aims to provide a tensor decomposition-based multi-feature fusion 4D expression recognition method.
In order to achieve the purpose, the invention adopts the following technical scheme:
a tensor decomposition-based multi-feature fusion 4D expression recognition method comprises the following steps:
(1) acquiring 4D facial expression data;
(2) preprocessing the 4D facial expression data and then calculating to obtain three components of a normal vector, a shape index and a depth map of the 4D facial expression data;
(3) carrying out tensor decomposition on three components of a normal vector, a shape index and a depth map of the 4D face data respectively, and extracting dynamic face expression information;
(4) and classifying the dynamic facial expression information by using a dynamic image network, and performing score fusion on the classified results to obtain a final classified result.
A further improvement of the present invention is that in step (1), the 4D facial expression data is S = {F_1, F_2, …, F_l}, where F_i represents 3D facial expression data, i = 1…l, and l represents the number of frames of the 4D face.
The further improvement of the invention is that in the step (2), the specific process of preprocessing the 4D expression data of the human face is as follows: and denoising the 4D facial expression data.
The further improvement of the invention is that in the step (2), the specific process of calculating the three components of the normal vector of the 4D face data is as follows:
1) firstly, calculating the normal vector of a single 3D face; the specific process is as follows: for 3D face data, a point P_j on the face is first selected together with its nearest points to form a neighborhood δ = {P_i(x_i, y_i, z_i) | i = 1, 2, …, k}, with k = 5, and the plane to be fitted is:
Ax + By + Cz + D = 0
subject to A^2 + B^2 + C^2 = 1;
solving the plane-fitting problem by the least-squares method and the Lagrange multiplier method yields the normal vector at the point P_j on the face; estimating all points on the 3D face gives the normal vectors of the 3D face;
2) respectively projecting the normal vector of the 3D face to YZ, XZ and XY planes to obtain an X component diagram, a Y component diagram and a Z component diagram of the normal vector of the 3D face;
3) finally, performing step 1) and step 2) on each 3D face in the 4D face data to obtain a corresponding normal vector component image, and then overlapping corresponding normal vector X component images calculated by all 3D faces in the 4D face data together to obtain a normal vector X component image of the 4D face; overlapping the corresponding normal vector Y component images calculated by all 3D faces of the 4D face data to obtain a normal vector Y component image of the 4D face; and overlapping the normal vector Z component images calculated by all the 3D faces of the 4D face data together to obtain a normal vector Z component image of the 4D face.
The invention is further improved in that the concrete process for solving the plane fitting problem is as follows: the normal vector is the unit eigenvector corresponding to the minimum eigenvalue of the covariance matrix Σ;
the covariance matrix Σ is of the form
Σ = Σ_{i=1}^{k} (P_i - P̄)(P_i - P̄)^T
where P̄ is the centroid of the neighborhood δ.
a further improvement of the present invention is that the specific process of calculating the shape index map is as follows:
firstly, calculating a shape index graph of a 3D face:
for a certain point of the human face, the point and its surrounding area are assumed to form a discrete parametric surface whose parameters A, B, C, D, E, F and G are fitted from the vertex coordinates of the 3D face; the matrix (the Weingarten matrix) built from these parameters is then decomposed into its characteristic roots, the maximum characteristic root being the maximum principal curvature K_1 and the minimum characteristic root being the minimum principal curvature K_2; substituting the maximum and minimum principal curvatures at a vertex into the shape index (ShapeIndex) calculation formula
ShapeIndex = 1/2 - (1/π)·arctan((K_1 + K_2)/(K_1 - K_2))
gives the shape index at that vertex;
calculating the shape index ShapeIndex for each vertex of the 3D face yields the shape index map of the 3D face;
and stacking the shape index images of each 3D face of the 4D face to obtain the shape index image of the 4D face.
The further improvement of the invention is that the specific process of calculating the depth map of the 4D face is as follows:
first, the depth map of a 3D face is calculated; for a point P_j(x_j, y_j, z_j) of a 3D face F_i, the corresponding depth-map pixel value Dep_j is calculated as:
Dep_j = (z_j - z_min)/(z_max - z_min)
where z_max and z_min represent the maximum and minimum values of the Z coordinate of the points of the face F_i;
and then stacking the depth maps of all the 3D faces of the 4D face to obtain the depth map of the 4D face.
The further improvement of the present invention is that, in the step (3), the specific process of tensor decomposition of the depth map of the 4D face data is as follows:
1) establishing a model;
depth map Dep ∈ R^{H×W×L} of the 4D face, where H represents the height of the depth map, W the width of the depth map, and L the sequence length of the 4D face; assume the expression information is Emo ∈ R^{H×W×L} and the identity information is ID ∈ R^{H×W×L}; then the 4D facial expression-identity information separation model is established:
min_{Emo, ID, e} λ‖f‖_1 + ‖e‖_1  s.t.  Dep = ID + Emo + e,  f = DEmo    (1)
where λ represents a weight coefficient, e represents noise, and DEmo represents the modeling of the dynamic expression information;
‖DEmo‖_1 = ‖D_h Emo‖_1 + ‖D_v Emo‖_1 + ‖D_t Emo‖_1
D_h Emo = vec(Emo(i, j+1, k) - Emo(i, j, k))
D_v Emo = vec(Emo(i+1, j, k) - Emo(i, j, k))
D_t Emo = vec(Emo(i, j, k+1) - Emo(i, j, k))    (2)
where D_h is the difference operator in the horizontal direction, D_v the difference operator in the vertical direction, and D_t the difference operator in the time-domain direction;
the static person identity ID is modeled as follows:
ID = G ×_1 U_1 ×_2 U_2 ×_3 U_3    (3)
where G represents the core tensor of the Tucker decomposition and U_1, U_2, U_3 represent the factor matrices of each mode in the Tucker decomposition; ×_1, ×_2 and ×_3 respectively represent the product of the tensor with the matrix of each mode;
2) solving the model:
and solving the established 4D facial expression-identity information separation model through iterative optimization.
The invention is further improved in that the specific process of solving by iterative optimization is as follows:
the first step: updating the static identity information, whose estimate ÎD is obtained as the Tucker approximation
ÎD = G ×_1 U_1 ×_2 U_2 ×_3 U_3 ≈ Dep - Emo - e + λ_Dep/β_Dep
where λ_Dep is a Lagrange multiplier vector, β_Dep is a positive penalty parameter, ÎD is the estimated static identity information of the person, Dep is the depth map of the 4D face, Emo is the dynamic expression information of the 4D face data, and e is noise;
the second step: updating the noise e:
e = soft(Dep - ÎD - Emo + λ_Dep/β_Dep, 1/β_Dep)
where λ_Dep is a Lagrange multiplier vector, β_Dep is a positive penalty parameter, Dep is the depth map of the 4D face, and Emo is the dynamic expression information of the 4D face data;
the third step: updating the dynamic expression information Emo:
Emo = ifftn( fftn( β_Dep(Dep - ÎD - e) + λ_Dep + D*(β_f f + λ_f) ) / ( β_Dep + β_f Σ|fftn(D)|^2 ) )
where fftn and ifftn represent the fast 3D Fourier transform and its inverse, respectively, β_Dep and β_f are positive penalty parameters, λ_f is a Lagrange multiplier vector, |·|^2 is the element-wise squaring operation, and D* represents the adjoint (companion) matrix of D;
updating the tensor f:
f = soft(DEmo - λ_f/β_f, λ/β_f)
where λ is a weight coefficient, λ_f is the Lagrange multiplier vector, β_f is a positive penalty parameter, and soft is the function defined as soft(a, τ) := sgn(a)·max(|a| - τ, 0);
updating the Lagrange multiplier vectors λ_f and λ_Dep and the positive penalty parameters β_f and β_Dep of the 4D facial expression-identity information separation model:
λ_f ← λ_f + β_f(DEmo - f)
λ_Dep ← λ_Dep + β_Dep(Dep - ÎD - Emo - e)
β_f ← c_1·β_f and β_Dep ← c_1·β_Dep when the residual Res exceeds γ·Res_pre, otherwise β_f ← c_2·β_f and β_Dep ← c_2·β_Dep;
where Res_pre is the residual value of the last iteration, Dep is the depth map of the 4D face, Emo is the dynamic expression information of the 4D face data, e is noise, and γ is a parameter related to model convergence; c_1 and c_2 are both coefficients;
and sending the dynamic expression information Emo extracted in the third step into a dynamic image network to extract information of expression motions, and further realizing the classification of expressions.
A further improvement of the invention is that the bottom layer of the dynamic image network is a deep neural network, and a rank pooling layer is added before the fully connected layer of the network; the calculation process of the rank pooling layer is as follows:
the rank pooling layer ranks the frame features V_1, …, V_T, where a^{(m)} represents the output of the m-th layer of the dynamic image network and μ_t denotes a parameter to be learned by the network, V_1, …, V_T representing the features output by the dynamic image network; to facilitate network back-propagation, the following approximation is made:
d ≈ Σ_{t=1}^{T} α_t V_t
where α_t is a parameter to be learned by the network and V_t denotes the features of the upper network layer.
Compared with the prior art, the invention has the following beneficial effects:
(1) Using 4D face data for expression recognition and face recognition can overcome the defect that 2D face recognition is strongly affected by factors such as illumination and pose. With 4D data, stable results can be obtained in expression recognition and face recognition across different scenes and environments.
(2) The method makes full use of the information in the 4D face data: computing the three normal-vector components, the shape index and the depth map of the sequence face data exploits the 3D geometric information of the face, so the features are more representative and discriminative across different people, and the accuracy of face recognition and expression recognition is higher.
(3) The 4D face data are decomposed by a tensor decomposition method to obtain dynamic expression information and static face identity information. The dynamic expression information is used for expression recognition with the interference of person identity removed, so the expression recognition result is more stable and accurate.
Drawings
FIG. 1 is a detailed flow chart of the present invention.
Fig. 2 shows the three normal-vector components, the shape index and the depth map of the 4D face of the present invention.
Fig. 3 is a dynamic expression information graph extracted by tensor decomposition of three components of a normal vector, a shape index and a depth map of a 4D face according to the invention.
Fig. 4 is a network structure diagram of the present invention for performing expression recognition on dynamic expression information extracted from a shape index using a dynamic image network.
FIG. 5 is a diagram of the network architecture for multi-feature fusion expression recognition using a dynamic image network in accordance with the present invention.
Detailed Description
The present invention will be described in detail below with reference to examples.
Referring to fig. 1, the present invention comprises the steps of:
(1) acquiring 4D facial expression data;
(2) preprocessing the data, and calculating the three components of the normal vector, the shape index and the depth map of the 4D face data; these five feature maps deeply reflect the geometric shape characteristics of the face at each moment;
(3) carrying out tensor decomposition on three components of a normal vector of the 4D face data, the shape index and the depth map respectively, and extracting dynamic face expression information and static identity information;
(4) and classifying the dynamic facial expression information by using a dynamic image network, and performing score fusion on the classified results to obtain a final classified result.
Specifically, referring to fig. 1, the present invention comprises the following steps:
step 101:
and acquiring 4D expression data of the human face, wherein the 4D expression data refer to a series of 3D human face data video sequences. Some cameras used firstly, such as Intel RealSense SR300 and the like, can capture the depth information of the face, can easily obtain 3D facial expression data by means of a structured light model, and continuously acquire the 3D facial expression data to obtain 4D facial data.
Assume that 4D facial expression data is S ═ F1,F2,…FlIn which Fi(i ═ 1 … l) represents 3D facial expression data, and l represents the number of frames of a 4D face.
Step 102:
The 4D data are preprocessed: 4D face data obtained by a camera often contain noise, holes and the like, so the 4D faces need to be preprocessed before the corresponding normal-vector components, shape index and depth maps are computed. In particular:
step 2.1: the 4D face data are subjected to hole filling, which can be realized by a template-face hole-filling method, a common processing method for 3D and 4D data.
Step 103:
The three components of the normal vector of the 4D face data are calculated, as well as the shape index and the depth map. This step extracts the corresponding geometric features from the 4D face. As shown in fig. 2, an example of calculating the three normal-vector components, the shape index and the depth map from the public 4D expression database BU4D is given. In this example, 5 frames of faces are selected; from top to bottom the rows show the depth map, the normal vector X component map, the normal vector Y component map, the normal vector Z component map and the shape index map, and from left to right the images show the same expression of the same face at different times. Specifically, calculating the three components of the normal vector, the shape index and the depth map comprises the following 3 steps.
(1) Calculating the normal vector component of the 4D face:
1) Firstly, the normal vector of a single 3D face is calculated; for 3D face data such as a point cloud, the normal vector at a certain point is estimated by fitting a plane to that point and several points around it; generally, the plane is fitted using the point and its 5 nearest points. For example, for the normal vector of a point P_j on the face, P_j and its nearest points are first selected to form the neighborhood δ = {P_i(x_i, y_i, z_i) | i = 1, 2, …, k}, where k is taken as 5, to fit the plane:
Ax + By + Cz + D = 0
subject to A^2 + B^2 + C^2 = 1;
The plane fitting problem is solved by the least-squares method and the Lagrange multiplier method, finally yielding the normal vector at the point P_j on the face. The specific process of solving the plane fitting problem is as follows: the normal vector is the unit eigenvector corresponding to the minimum eigenvalue of the covariance matrix Σ;
the covariance matrix Σ is of the form:
Σ = Σ_{i=1}^{k} (P_i - P̄)(P_i - P̄)^T, with P̄ the centroid of the neighborhood δ.
Estimating all points on the 3D face to obtain a normal vector of the 3D face;
2) next, three components of the normal vector are calculated, specifically, after the normal vector of a certain 3D face is obtained, the normal vector is projected on YZ, XZ, and XY planes, and an X component map, a Y component map, and a Z component map of the normal vector of the 3D face are obtained in this order.
3) And finally, performing step 1) and step 2) on each 3D face in the 4D face data to obtain a corresponding normal vector component image, overlapping corresponding normal vector X component images calculated by all 3D faces of the 4D face data to obtain a normal vector X component image of the 4D face, overlapping corresponding normal vector Y component images calculated by all 3D faces of the 4D face data to obtain a normal vector Y component image of the 4D face, and overlapping corresponding normal vector Z component images calculated by all 3D faces of the 4D face data to obtain a normal vector Z component image of the 4D face. The resulting 4D normal component map is actually a video of normal components.
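The per-point plane fit described above has a standard closed-form solution: the normal is the eigenvector of the neighborhood covariance matrix with the smallest eigenvalue. A minimal NumPy sketch (the function name `plane_normal` and the sample neighborhood are illustrative, not from the patent):

```python
import numpy as np

def plane_normal(neighborhood):
    """Estimate the unit normal of the plane Ax+By+Cz+D=0 fitted in the
    least-squares sense to a k x 3 array of neighborhood points.
    The solution is the eigenvector of the covariance matrix belonging
    to its smallest eigenvalue."""
    pts = np.asarray(neighborhood, dtype=float)
    centered = pts - pts.mean(axis=0)        # subtract the centroid
    cov = centered.T @ centered              # 3x3 (unnormalized) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    n = eigvecs[:, 0]                        # smallest-eigenvalue eigenvector
    return n / np.linalg.norm(n)             # satisfies A^2 + B^2 + C^2 = 1

# a point and 5 neighbors lying in the plane z = 0; the normal is +-(0, 0, 1)
delta = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0],
                  [-1, 0, 0], [0, -1, 0], [1, 1, 0]])
n = plane_normal(delta)
```

Repeating this per vertex gives the 3D normal field whose X, Y and Z projections form the component maps described above.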
(2) The shape index of the 4D face measures the normalized combination of the two principal curvatures of the facial surface and can be regarded as a second-order differential property. Specifically, the steps of calculating the shape index of the 4D face are as follows. Firstly, the shape index map of a 3D face is calculated:
for a certain point of the human face, the point and its surrounding area are assumed to form a discrete parametric surface whose parameters A, B, C, D, E, F and G are fitted from the vertex coordinates of the 3D face. The matrix (the Weingarten matrix) built from these parameters is then decomposed into characteristic roots, the maximum characteristic root being the maximum principal curvature K_1 and the minimum characteristic root the minimum principal curvature K_2. The maximum and minimum principal curvatures at a vertex are substituted into the shape index (ShapeIndex) calculation formula:
ShapeIndex = 1/2 - (1/π)·arctan((K_1 + K_2)/(K_1 - K_2))
to obtain the shape index at that vertex. Calculating the shape index ShapeIndex for each vertex of the 3D face yields the shape index map of the 3D face;
and stacking the shape index images of each 3D face of the 4D face to obtain the shape index image of the 4D face.
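The shape index formula above reduces to a simple per-vertex computation once the principal curvatures are known. A minimal NumPy sketch, assuming K_1 and K_2 have already been obtained from the fitted surface (the function name `shape_index` is illustrative, not from the patent):

```python
import numpy as np

def shape_index(k1, k2):
    """Shape index from maximum (k1) and minimum (k2) principal curvatures:
    SI = 1/2 - (1/pi) * arctan((k1 + k2) / (k1 - k2)).
    arctan2 handles the k1 == k2 limit; values lie in [0, 1]."""
    return 0.5 - (1.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2)

# a perfect saddle (k1 = -k2) sits exactly at the middle of the scale
si_saddle = shape_index(1.0, -1.0)   # -> 0.5
si_ridge = shape_index(2.0, 1.0)
```

Applying this element-wise to the curvature maps of each frame and stacking the results gives the 4D shape index map described above.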
(3) The depth map of the 4D face is calculated; the gray value of each pixel of the depth map represents, for each point of the face, the distance from that point to the camera, and thus encodes the geometric shape of the face. The steps of calculating the depth map of the 4D face are as follows:
first, the depth map of a 3D face is calculated; for a point P_j(x_j, y_j, z_j) of a 3D face F_i, the corresponding depth-map pixel value Dep_j is calculated as:
Dep_j = (z_j - z_min)/(z_max - z_min)
where z_max and z_min represent the maximum and minimum values of the Z coordinate of the points of the face F_i.
And then stacking the depth maps of all the 3D faces of the 4D face to obtain the depth map of the 4D face.
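The per-face normalization above is a one-liner in practice. A minimal sketch, assuming per-face Z coordinates are available as an array (the helper name `depth_map` is illustrative):

```python
import numpy as np

def depth_map(z):
    """Normalize the Z coordinates of one 3D face into [0, 1]:
    Dep_j = (z_j - z_min) / (z_max - z_min)."""
    z = np.asarray(z, dtype=float)
    z_min, z_max = z.min(), z.max()
    return (z - z_min) / (z_max - z_min)

dep = depth_map([10.0, 12.0, 14.0])   # dep == [0.0, 0.5, 1.0]
```

Stacking one such map per frame along a third axis yields the H×W×L depth tensor Dep used by the separation model.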
Step 104:
Tensor decomposition is carried out on the normal-vector components, the shape index and the depth map obtained in step 103 respectively, to obtain static person identity information and dynamic expression change information. The depth map is taken as an example below; the tensor decomposition of the normal-vector components and the shape index proceeds similarly.
(1) A model is established. For the depth map Dep ∈ R^{H×W×L} of the 4D face, H denotes the height of the depth map, W the width of the depth map, and L the sequence length of the 4D face. For a 3D face sequence, the dynamic part is the expression and the static part is the identity information, so expression and identity can be considered independently distributed and can be separated. Suppose the expression information is Emo ∈ R^{H×W×L} and the identity information is ID ∈ R^{H×W×L}; the following 4D facial expression-identity information separation model can then be established:
min_{Emo, ID, e} λ‖f‖_1 + ‖e‖_1  s.t.  Dep = ID + Emo + e,  f = DEmo    (1)
where λ represents a weight coefficient that measures the proportion between f and e, e represents noise, and DEmo represents the modeling of the dynamic expression information:
‖DEmo‖_1 = ‖D_h Emo‖_1 + ‖D_v Emo‖_1 + ‖D_t Emo‖_1
D_h Emo = vec(Emo(i, j+1, k) - Emo(i, j, k))
D_v Emo = vec(Emo(i+1, j, k) - Emo(i, j, k))
D_t Emo = vec(Emo(i, j, k+1) - Emo(i, j, k))    (2)
where D_h is the difference operator in the horizontal direction, D_v the difference operator in the vertical direction, and D_t the difference operator in the time-domain direction.
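The three difference operators of equation (2) are plain forward differences on the H×W×L tensor. A minimal NumPy sketch (function names are illustrative; the anisotropic sum of absolute differences is the ‖DEmo‖_1 term):

```python
import numpy as np

def d_h(emo):
    """Horizontal forward difference: Emo(i, j+1, k) - Emo(i, j, k)."""
    return emo[:, 1:, :] - emo[:, :-1, :]

def d_v(emo):
    """Vertical forward difference: Emo(i+1, j, k) - Emo(i, j, k)."""
    return emo[1:, :, :] - emo[:-1, :, :]

def d_t(emo):
    """Temporal forward difference: Emo(i, j, k+1) - Emo(i, j, k)."""
    return emo[:, :, 1:] - emo[:, :, :-1]

def tv_norm(emo):
    """Anisotropic 3D total-variation term ||D Emo||_1 of equation (2)."""
    return sum(np.abs(d(emo)).sum() for d in (d_h, d_v, d_t))

emo = np.arange(8.0).reshape(2, 2, 2)   # tiny 2x2x2 example tensor
tv = tv_norm(emo)
```

Penalizing this term keeps the extracted expression component smooth except at genuine spatio-temporal changes.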
Formula (2) depicts the dynamic variation information of the facial expression. The static person identity ID is modeled as follows:
ID = G ×_1 U_1 ×_2 U_2 ×_3 U_3    (3)
Formula (3) is in fact a Tucker decomposition of the identity information of the 4D face: G represents the core tensor of the Tucker decomposition and U_1, U_2, U_3 represent the factor matrices of each mode. In formula (3), ×_1, ×_2 and ×_3 respectively denote the product of the tensor with the matrix of each mode. The modeling of formula (3) reflects the characteristic that the face identity information remains unchanged across different expressions.
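The Tucker model of formula (3) can be sketched with a truncated higher-order SVD (HOSVD), one standard way to compute a Tucker decomposition; the patent does not specify its exact algorithm, so this is an illustrative NumPy implementation:

```python
import numpy as np

def unfold(t, mode):
    """Mode-n unfolding of a 3-way tensor into a matrix."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def mode_mult(t, m, mode):
    """Mode-n product t x_n m: multiply matrix m into the given mode."""
    moved = np.moveaxis(t, mode, 0)
    res = (m @ moved.reshape(moved.shape[0], -1))
    return np.moveaxis(res.reshape((m.shape[0],) + moved.shape[1:]), 0, mode)

def tucker_hosvd(t, ranks):
    """Truncated HOSVD: t ≈ core x1 U1 x2 U2 x3 U3 as in formula (3)."""
    factors = [np.linalg.svd(unfold(t, mode), full_matrices=False)[0][:, :r]
               for mode, r in enumerate(ranks)]
    core = t
    for mode, u in enumerate(factors):
        core = mode_mult(core, u.T, mode)   # core = t x1 U1^T x2 U2^T x3 U3^T
    return core, factors

def tucker_reconstruct(core, factors):
    t = core
    for mode, u in enumerate(factors):
        t = mode_mult(t, u, mode)
    return t

# sanity check on a synthetic tensor of known multilinear rank (2, 2, 2)
rng = np.random.default_rng(0)
true_core = rng.standard_normal((2, 2, 2))
true_factors = [np.linalg.qr(rng.standard_normal((4, 2)))[0] for _ in range(3)]
t = tucker_reconstruct(true_core, true_factors)
core, factors = tucker_hosvd(t, (2, 2, 2))
```

Because the synthetic tensor has exact multilinear rank (2, 2, 2), the truncated HOSVD reconstructs it to machine precision, illustrating how a low-rank core captures the static identity component.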
(2) The model is solved: the 4D facial expression-identity information separation model established in formula (1) is solved by iterative optimization. For such multivariate optimization problems, the Alternating Direction Method of Multipliers (ADMM) is usually used. The parameters are first initialized and then iteratively updated as follows:
the first step: the static identity information is updated by the Tucker approximation
ÎD = G ×_1 U_1 ×_2 U_2 ×_3 U_3 ≈ Dep - Emo - e + λ_Dep/β_Dep
where λ_Dep is the Lagrange multiplier vector, β_Dep is a positive penalty parameter, ÎD is the estimated static identity information of the person, Dep is the depth map of the 4D face, Emo is the dynamic expression information of the 4D face data, and e is noise;
the second step: the noise e is updated:
e = soft(Dep - ÎD - Emo + λ_Dep/β_Dep, 1/β_Dep)
where λ_Dep is the Lagrange multiplier vector, β_Dep is the positive penalty parameter, Dep is the depth map of the 4D face, and Emo is the dynamic expression information of the 4D face data.
the third step: the dynamic expression information Emo is updated:
Emo = ifftn( fftn( β_Dep(Dep - ÎD - e) + λ_Dep + D*(β_f f + λ_f) ) / ( β_Dep + β_f Σ|fftn(D)|^2 ) )
where fftn and ifftn respectively represent the fast 3D Fourier transform and its inverse, β_Dep and β_f are positive penalty parameters, λ_f is a Lagrange multiplier vector, |·|^2 is the element-wise squaring operation, D* represents the adjoint (companion) matrix of D, D_h, D_v, D_t represent the difference operators in the horizontal, vertical and temporal directions respectively, and f is defined as f = DEmo.
the tensor f is updated:
f = soft(DEmo - λ_f/β_f, λ/β_f)
where λ is the weight coefficient, λ_f is the Lagrange multiplier vector, β_f is a positive penalty parameter, and soft is the function defined as soft(a, τ) := sgn(a)·max(|a| - τ, 0).
The Lagrange multiplier vectors λ_f and λ_Dep and the positive penalty parameters β_f and β_Dep of the 4D facial expression-identity information separation model are updated:
λ_f ← λ_f + β_f(DEmo - f)
λ_Dep ← λ_Dep + β_Dep(Dep - ÎD - Emo - e)
β_f ← c_1·β_f and β_Dep ← c_1·β_Dep when the residual Res exceeds γ·Res_pre, otherwise β_f ← c_2·β_f and β_Dep ← c_2·β_Dep;
where Res_pre is the residual value of the last iteration, Dep is the depth map of the 4D face, Emo is the dynamic expression information of the 4D face data, e is noise, and γ is a parameter related to model convergence; c_1 and c_2 are both coefficients, taken as 1.15 and 0.95 respectively.
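The soft-thresholding operator used in the noise and f updates above has the closed form stated in the text. A minimal sketch (β_f and the input vector v are illustrative values, not parameters from the patent):

```python
import numpy as np

def soft(a, tau):
    """soft(a, tau) := sgn(a) * max(|a| - tau, 0), applied element-wise.
    This is the proximal operator of the l1 norm, used in the ADMM updates."""
    return np.sign(a) * np.maximum(np.abs(a) - tau, 0.0)

# e.g. shrinking a gradient field toward zero with threshold 1/beta_f:
beta_f = 2.0
v = np.array([-1.0, 0.2, 3.0])
f = soft(v, 1.0 / beta_f)   # -> [-0.5, 0.0, 2.5]
```

Small entries are zeroed and large ones shrunk by the threshold, which is what keeps the extracted expression component sparse in its spatio-temporal differences.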
As shown in fig. 3, the dynamic expression information maps extracted by tensor decomposition from the three normal-vector components, the shape index and the depth map of the 4D face are displayed. From top to bottom the rows are: the depth map, the normal vector X component map, the normal vector Y component map, the normal vector Z component map and the shape index map; from left to right the images show the dynamic expression information of the same expression of the same face at different times, extracted by tensor decomposition.
Step 105:
The dynamic expression information Emo extracted in the third step is sent into a dynamic image network to extract the information of expression motion and thereby classify the expressions. Expressions can generally be divided into six categories: happiness, anger, sadness, surprise, disgust and fear. A dynamic image network is a network that extracts dynamic images; the bottom layer of the network is a general deep neural network, such as the VGGNet16 network, with a rank pooling layer added before the fully connected layer. The rank pooling layer turns the features of a video sequence into a single image that encodes the dynamic characteristics of each frame of the sequence. As shown in fig. 4, the network structure diagram of the dynamic image network, the calculation flow of the rank pooling layer is as follows:
the rank pooling layer ranks the frame features V_1, …, V_T, where a^{(m)} represents the output of the m-th layer of the dynamic image network and μ_t represents a parameter to be learned by the network, V_1, …, V_T being the features output by the dynamic image network. To facilitate network back-propagation, the following approximation is made:
d ≈ Σ_{t=1}^{T} α_t V_t
where α_t is a parameter to be learned by the network and V_t denotes the features of the upper network layer.
Step 106:
Score fusion is performed on the classification results of the different feature data obtained in step 105, finally yielding the expression recognition result of the model, which is output. Fig. 5 shows the network structure for multi-feature fusion expression recognition, in which a dynamic image network processes the dynamic expression information extracted from the normal-vector components, the shape index and the depth map.
The invention operates on geometric feature images of 4D face data, namely: the three components of the normal vector, the shape index and the depth map. Tensor decomposition separates the dynamic expression information from the static person identity information in each geometric feature image; expression recognition is performed separately on the extracted dynamic expression information of the 4D face data, and score fusion of the expression recognition results of the different geometric feature images yields the final expression recognition result.
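Score fusion of the per-feature classifier outputs can be sketched as a weighted average of softmax scores (equal weights and this label ordering are assumptions; the invention names score fusion without fixing the rule):

```python
import numpy as np

# Six basic expression categories (ordering assumed for illustration)
LABELS = ["happy", "angry", "sad", "surprise", "disgust", "fear"]

def fuse_scores(score_list, weights=None):
    """Average per-feature softmax scores and pick the top class.

    Equal weights by default; the patent only names 'score fusion',
    so the averaging rule here is illustrative.
    """
    scores = np.asarray(score_list, dtype=np.float64)   # (n_features, 6)
    if weights is None:
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    fused = np.average(scores, axis=0, weights=weights)
    return LABELS[int(np.argmax(fused))], fused

# Hypothetical scores from e.g. normal-vector, shape-index and depth branches
branch_scores = [
    [0.5, 0.1, 0.1, 0.1, 0.1, 0.1],
    [0.3, 0.2, 0.1, 0.2, 0.1, 0.1],
    [0.6, 0.1, 0.1, 0.1, 0.05, 0.05],
]
label, fused = fuse_scores(branch_scores)
```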
Claims (9)
1. A tensor decomposition-based multi-feature fusion 4D expression recognition method is characterized by comprising the following steps:
(1) acquiring 4D facial expression data;
(2) preprocessing the 4D facial expression data and then calculating to obtain three components of a normal vector, a shape index and a depth map of the 4D facial expression data;
(3) carrying out tensor decomposition on three components of a normal vector, a shape index and a depth map of the 4D face data respectively, and extracting dynamic face expression information; the specific process of tensor decomposition of the depth map of the 4D face data is as follows:
1) establishing a model;
for the depth map Dep ∈ R^{H×W×L} of a 4D face, where H represents the height of the depth map, W represents the width of the depth map, and L represents the sequence length of the 4D face, assume the expression information is Emo ∈ R^{H×W×L} and the identity information is ID ∈ R^{H×W×L}, and then establish the 4D facial expression-identity information separation model:
min_{Emo, ID, e} ‖DEmo‖₁ + λ‖e‖²  s.t.  Dep = ID + Emo + e (1)

where λ represents a weight coefficient, e represents noise, DEmo represents the modeling of the dynamic expression information, and f = DEmo is an auxiliary variable introduced for the later iterative solution;
‖DEmo‖₁ = ‖D_h Emo‖₁ + ‖D_v Emo‖₁ + ‖D_t Emo‖₁

D_h Emo = vec(Emo(i, j+1, k) − Emo(i, j, k))

D_v Emo = vec(Emo(i+1, j, k) − Emo(i, j, k))

D_t Emo = vec(Emo(i, j, k+1) − Emo(i, j, k)) (2)
where D_h represents the difference operator in the horizontal direction, D_v represents the difference operator in the vertical direction, and D_t represents the difference operator in the time-domain direction;
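Under stated assumptions (forward differences with boundary entries simply dropped; the patent does not specify boundary handling), the operators D_h, D_v, D_t and the norm ‖DEmo‖₁ of Eq. (2) can be sketched as:

```python
import numpy as np

def difference_operators(Emo):
    """Forward differences of a tensor Emo of shape (H, W, L) along the
    horizontal (j), vertical (i) and temporal (k) axes, vectorised as in
    Eq. (2). Boundary rows/columns/frames are dropped (an assumption)."""
    Dh = (Emo[:, 1:, :] - Emo[:, :-1, :]).ravel()  # Emo(i, j+1, k) - Emo(i, j, k)
    Dv = (Emo[1:, :, :] - Emo[:-1, :, :]).ravel()  # Emo(i+1, j, k) - Emo(i, j, k)
    Dt = (Emo[:, :, 1:] - Emo[:, :, :-1]).ravel()  # Emo(i, j, k+1) - Emo(i, j, k)
    return Dh, Dv, Dt

def total_variation_l1(Emo):
    """‖DEmo‖₁ = ‖D_h Emo‖₁ + ‖D_v Emo‖₁ + ‖D_t Emo‖₁."""
    return sum(np.abs(d).sum() for d in difference_operators(Emo))

# A single impulse produces two horizontal, two vertical and one temporal jump
Emo = np.zeros((3, 3, 2))
Emo[1, 1, 1] = 1.0
tv = total_variation_l1(Emo)
```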
the static person identity ID is modeled with a Tucker decomposition as follows:

ID = G ×₁ U₁ ×₂ U₂ ×₃ U₃

where G represents the core tensor of the Tucker decomposition, U₁, U₂, U₃ represent the matrices of each mode in the Tucker decomposition, and ×₁, ×₂ and ×₃ respectively represent the product of the tensor with the matrix of each mode;
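A minimal sketch of this Tucker-form identity model via mode-n products, in plain NumPy (a library such as TensorLy offers the same operation; the shapes below are illustrative):

```python
import numpy as np

def mode_n_product(T, M, n):
    """Mode-n product T ×ₙ M: contract matrix M into mode n of tensor T."""
    # tensordot puts M's free axis first; moveaxis restores it to position n
    return np.moveaxis(np.tensordot(M, T, axes=(1, n)), 0, n)

def tucker_reconstruct(G, U1, U2, U3):
    """ID = G ×₁ U₁ ×₂ U₂ ×₃ U₃ (Tucker form of the static identity)."""
    ID = mode_n_product(G, U1, 0)
    ID = mode_n_product(ID, U2, 1)
    ID = mode_n_product(ID, U3, 2)
    return ID

rng = np.random.default_rng(0)
G = rng.random((2, 2, 2))                       # core tensor
U1, U2, U3 = rng.random((4, 2)), rng.random((5, 2)), rng.random((3, 2))
ID = tucker_reconstruct(G, U1, U2, U3)          # shape (4, 5, 3)
```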
2) solving the model:
solving the established 4D facial expression-identity information separation model through iterative optimization;
(4) classifying the dynamic facial expression information by using a dynamic image network, and performing score fusion on the classification results to obtain the final classification result.
2. The tensor decomposition-based multi-feature fusion 4D expression recognition method as recited in claim 1, wherein in the step (1), the 4D facial expression data S = {F₁, F₂, …, F_l}, where F_i represents 3D facial expression data, i = 1 … l, and l represents the number of frames of the 4D face.
3. The tensor decomposition-based multi-feature fusion 4D expression recognition method as recited in claim 1, wherein in the step (2), the specific process of preprocessing the 4D expression data of the human face is as follows: and denoising the 4D facial expression data.
4. The tensor decomposition-based multi-feature fusion 4D expression recognition method as recited in claim 1, wherein in the step (2), the specific process of calculating three components of the normal vector of the 4D face data is as follows:
1) firstly, calculating the normal vector of a single 3D face; the specific process is as follows: for the 3D face data, first select a point P_j on the face and form its neighborhood δ = {P_i(x_i, y_i, z_i) | i = 1, 2, …, k}, with k = 5; the plane to be fitted is:
Ax+By+Cz+D=0
which satisfies A² + B² + C² = 1;

by the least-squares and Lagrange multiplier methods, the plane fitting problem is solved to obtain the normal vector at the point P_j on the face; estimating at all points on the 3D face yields the normal vector of the 3D face;
2) respectively projecting the normal vector of the 3D face to YZ, XZ and XY planes to obtain an X component diagram, a Y component diagram and a Z component diagram of the normal vector of the 3D face;
3) finally, performing step 1) and step 2) on each 3D face in the 4D face data to obtain a corresponding normal vector component image, and overlapping corresponding normal vector X component images calculated by all 3D faces in the 4D face data to obtain a normal vector X component image of the 4D face; overlapping the corresponding normal vector Y component images calculated by all 3D faces of the 4D face data to obtain a normal vector Y component image of the 4D face; and overlapping the normal vector Z component images calculated by all the 3D faces of the 4D face data together to obtain a normal vector Z component image of the 4D face.
5. The tensor decomposition-based multi-feature fusion 4D expression recognition method as claimed in claim 4, wherein the specific process of solving the plane fitting problem is as follows: the normal vector is the normalized eigenvector corresponding to the minimum eigenvalue of the covariance matrix Σ;
the covariance matrix Σ has the form Σ = Σ_{i=1}^{k} (P_i − P̄)(P_i − P̄)^T, where P̄ is the centroid of the neighborhood δ.
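The eigenvector route of claim 5 can be sketched as follows, assuming the covariance is taken over the centered neighborhood points:

```python
import numpy as np

def point_normal(neighborhood):
    """Unit normal at a face point from its k-neighborhood (k = 5 in the
    claim): the normalized eigenvector of the covariance matrix Σ belonging
    to the smallest eigenvalue solves the constrained least-squares plane fit.
    """
    P = np.asarray(neighborhood, dtype=np.float64)   # (k, 3) points
    centered = P - P.mean(axis=0)
    Sigma = centered.T @ centered                    # 3x3 covariance (unnormalised)
    w, V = np.linalg.eigh(Sigma)                     # eigenvalues in ascending order
    n = V[:, 0]                                      # eigenvector of the smallest one
    return n / np.linalg.norm(n)

# Points lying on the plane z = 0 -> normal along ±Z
pts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (0.5, 0.5, 0)]
n = point_normal(pts)
```

Projecting such normals onto the YZ, XZ and XY planes then gives the X, Y and Z component maps described in claim 4.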
6. the tensor decomposition-based multi-feature fusion 4D expression recognition method as recited in claim 1, wherein the specific process of calculating the shape index map is as follows:
firstly, calculating a shape index graph of a 3D face:
for a certain point of the face, assume that the point and its surrounding area form a discrete parametric surface; the parameters A, B, C, D, E, F and G are fitted according to the vertex coordinates of the 3D face, and a matrix is then obtained; eigen-decomposition is performed on this matrix, whose maximum eigenvalue is the maximum principal curvature K₁ and whose minimum eigenvalue is the minimum principal curvature K₂; substituting the maximum and minimum principal curvatures at a vertex into the shape index (ShapeIndex) calculation formula gives the shape index at that vertex;
calculating a shape index Shapeindex for each vertex of the 3D face to obtain a shape index graph of the 3D face;
and stacking the shape index images of each 3D face of the 4D face to obtain the shape index image of the 4D face.
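A sketch of the per-vertex shape index, assuming the standard Koenderink-style [0, 1] convention SI = 1/2 − (1/π)·arctan((K₁+K₂)/(K₁−K₂)) (an assumption; the claim does not reproduce the formula):

```python
import numpy as np

def shape_index(k1, k2):
    """Shape index from principal curvatures, mapped to [0, 1]:
    SI = 1/2 - (1/pi) * arctan((k1 + k2) / (k1 - k2)).
    The exact convention is an assumption; several variants exist."""
    k1, k2 = np.maximum(k1, k2), np.minimum(k1, k2)  # enforce k1 >= k2
    # arctan2 handles the umbilic case k1 == k2 without dividing by zero
    return 0.5 - (1.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2)

si_saddle = shape_index(1.0, -1.0)   # symmetric saddle -> 0.5
si_convex = shape_index(1.0, 1.0)    # one umbilic extreme
si_concave = shape_index(-1.0, -1.0) # the opposite umbilic extreme
```

Evaluating this at every vertex of a 3D face yields its shape index map; stacking per-frame maps gives the 4D shape index map of the claim.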
7. The tensor decomposition-based multi-feature fusion 4D expression recognition method as recited in claim 1, wherein the specific process of calculating the depth map of the 4D face is as follows:
first, a depth map of a 3D face is calculated: for a point P_j(x_j, y_j, z_j) of a 3D face F_i, the pixel value Dep_j of its corresponding depth map is calculated as:

Dep_j = (z_j − z_min) / (z_max − z_min)
where z_max and z_min represent the maximum and minimum values of the Z coordinates of the points of the face F_i;
and then stacking the depth maps of all the 3D faces of the 4D face to obtain the depth map of the 4D face.
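A minimal sketch of the depth-map rendering, assuming linear min-max normalization of the Z coordinate and nearest-pixel scattering by (x, y) position (both assumptions; the claim's exact formula and rasterization are not reproduced):

```python
import numpy as np

def depth_map(vertices, H=64, W=64):
    """Render a 3D face's z-values into an H×W depth image.

    Each occupied pixel holds (z - z_min) / (z_max - z_min); vertices are
    scattered to their nearest pixel (an illustrative rasterization).
    """
    V = np.asarray(vertices, dtype=np.float64)       # (N, 3) points
    z_min, z_max = V[:, 2].min(), V[:, 2].max()
    dep = (V[:, 2] - z_min) / (z_max - z_min)        # normalized depth per vertex
    # map (x, y) coordinates onto the pixel grid
    x = np.clip(((V[:, 0] - V[:, 0].min()) / np.ptp(V[:, 0]) * (W - 1)).astype(int), 0, W - 1)
    y = np.clip(((V[:, 1] - V[:, 1].min()) / np.ptp(V[:, 1]) * (H - 1)).astype(int), 0, H - 1)
    img = np.zeros((H, W))
    img[y, x] = dep
    return img

verts = [(0, 0, 0.0), (1, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0)]
img = depth_map(verts, H=2, W=2)
```

Stacking one such image per frame then produces the 4D depth map of the claim.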
8. The tensor decomposition-based multi-feature fusion 4D expression recognition method as recited in claim 1, wherein the specific process of solving through iterative optimization is as follows:
the first step: updating the static identity information;

where λ_Dep is a Lagrange multiplier vector, β_Dep is a positive penalty parameter, the hatted quantity is the estimated static identity information of the person, Dep is the depth map of the 4D face, Emo is the dynamic expression information of the 4D face data, and e is noise;
the second step: updating the noise e;
where the hatted quantity represents the corresponding unfolded (vectorized) form, λ_Dep is a Lagrange multiplier vector, β_Dep is a positive penalty parameter, Dep is the depth map of the 4D face, and Emo is the dynamic expression information of the 4D face data;
the third step: updating the dynamic expression information Emo;
where fftn and ifftn represent the fast 3D Fourier transform and its inverse, respectively, β_Dep and β_f are positive penalty parameters, λ_f is a Lagrange multiplier vector, |·|² is the element-wise squaring operation, and D* denotes the adjoint matrix of D;
update tensor f:
where λ is the weight coefficient, λ_f is the Lagrange multiplier vector, β_f is a positive penalty parameter, and soft is the function defined as: soft(a, τ) = sgn(a) · max(|a| − τ, 0);
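The soft function defined above is the standard element-wise soft-thresholding operator (the proximal operator of the ℓ1 norm used in the f-update); a one-line sketch:

```python
import numpy as np

def soft(a, tau):
    """Element-wise soft-thresholding: soft(a, τ) = sgn(a) · max(|a| − τ, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - tau, 0.0)

x = np.array([-2.0, -0.3, 0.0, 0.5, 3.0])
y = soft(x, 1.0)   # shrinks toward zero, zeroing entries with |a| <= τ
```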
the Lagrange multiplier vector λ_f and the positive penalty parameters β_f and β_Dep of the 4D facial expression-identity information separation model are updated:
where the superscript "pre" denotes the value from the last iteration, Dep is the depth map of the 4D face, Emo is the dynamic expression information of the 4D face data, e is noise, λ is a parameter related to model convergence, and c₁, c₂ are coefficients;
and sending the dynamic expression information Emo extracted in the third step into a dynamic image network to extract information of expression motions, and further realizing the classification of expressions.
9. The tensor decomposition-based multi-feature fusion 4D expression recognition method as claimed in claim 8, wherein the bottom layer of the dynamic image network is a deep neural network, a rank pooling layer is added before the network's fully connected layer, and the rank pooling layer of the network is calculated as follows:
where a^(m) represents the output of the m-th layer of the dynamic image network, μ_t represents a parameter to be learned by the network, and V_1, …, V_T represent the features output by the dynamic image network; the following approximation is made to facilitate network back-propagation:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911384458.0A CN111178255B (en) | 2019-12-28 | 2019-12-28 | Tensor decomposition-based multi-feature fusion 4D expression identification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111178255A CN111178255A (en) | 2020-05-19 |
CN111178255B true CN111178255B (en) | 2022-07-12 |
Family
ID=70658234
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911384458.0A Active CN111178255B (en) | 2019-12-28 | 2019-12-28 | Tensor decomposition-based multi-feature fusion 4D expression identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111178255B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113903063A (en) * | 2021-09-27 | 2022-01-07 | 山东师范大学 | Facial expression recognition method and system based on deep spatiotemporal network decision fusion |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107679515A (en) * | 2017-10-24 | 2018-02-09 | 西安交通大学 | A kind of three-dimensional face identification method based on curved surface mediation shape image depth representing |
CN110516557A (en) * | 2019-08-01 | 2019-11-29 | 电子科技大学 | Multisample facial expression recognizing method based on low-rank tensor resolution |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8923392B2 (en) * | 2011-09-09 | 2014-12-30 | Adobe Systems Incorporated | Methods and apparatus for face fitting and editing applications |
US9341728B2 (en) * | 2013-07-29 | 2016-05-17 | Westerngeco L.L.C. | Methods of analyzing seismic data |
Non-Patent Citations (3)
Title |
---|
Automatic 3D Facial Expression Recognition using Geometric Scattering Representation; Xudong Yang et al.; 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition; 2015-07-23; Section 3 of the text *
Automatic 4D Facial Expression Recognition using Dynamic Geometrical Image Network; Weijian Li et al.; 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition; 2018-06-07; Sections 2-3 of the text *
Multi-pose and multi-expression face synthesis method based on tensor description; Lü Xuan et al.; Journal of Computer Applications; 2012-01-01; Vol. 32, No. 1; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112887698B (en) | High-quality face voice driving method based on neural radiance field | |
US7876931B2 (en) | Face recognition system and method | |
CN101916454B (en) | Method for reconstructing high-resolution human face based on grid deformation and continuous optimization | |
JP6207210B2 (en) | Information processing apparatus and method | |
CN112766160A (en) | Face replacement method based on multi-stage attribute encoder and attention mechanism | |
CN110728209A (en) | Gesture recognition method and device, electronic equipment and storage medium | |
CN112800903A (en) | Dynamic expression recognition method and system based on space-time diagram convolutional neural network | |
WO2022184133A1 (en) | Vision-based facial expression recognition method | |
CN110796593A (en) | Image processing method, device, medium and electronic equipment based on artificial intelligence | |
CN111754637B (en) | Large-scale three-dimensional face synthesis system with suppressed sample similarity | |
CN109325994B (en) | Method for enhancing data based on three-dimensional face | |
CN111028319A (en) | Three-dimensional non-photorealistic expression generation method based on facial motion unit | |
CN111754622B (en) | Face three-dimensional image generation method and related equipment | |
CN111640172A (en) | Attitude migration method based on generation of countermeasure network | |
CN109522865A (en) | A kind of characteristic weighing fusion face identification method based on deep neural network | |
CN111178255B (en) | Tensor decomposition-based multi-feature fusion 4D expression identification method | |
KR20230081378A (en) | Multi-view semi-supervised learning for 3D human pose estimation | |
Zeng et al. | Video‐driven state‐aware facial animation | |
Fan et al. | Full face-and-head 3D model with photorealistic texture | |
CN111739168B (en) | Large-scale three-dimensional face synthesis method with suppressed sample similarity | |
CN113468923B (en) | Human-object interaction behavior detection method based on fine-grained multi-modal common representation | |
Tang et al. | Global alignment for dynamic 3d morphable model construction | |
CN107423665A (en) | Three-dimensional face analysis method and its analysis system based on BP neural network | |
CN113242419A (en) | 2D-to-3D method and system based on static building | |
Ramnath et al. | Increasing the density of active appearance models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||