CN111178255B - Tensor decomposition-based multi-feature fusion 4D expression identification method - Google Patents


Publication number
CN111178255B
CN111178255B (application CN201911384458.0A)
Authority
CN
China
Prior art keywords
face
expression
emo
normal vector
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911384458.0A
Other languages
Chinese (zh)
Other versions
CN111178255A (en)
Inventor
黄义妨
张明
岳江北
李慧斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN201911384458.0A priority Critical patent/CN111178255B/en
Publication of CN111178255A publication Critical patent/CN111178255A/en
Application granted granted Critical
Publication of CN111178255B publication Critical patent/CN111178255B/en
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/30 Noise filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Optimization (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A 4D expression recognition method based on tensor decomposition and multi-feature fusion obtains 4D expression data of a human face; preprocesses the 4D facial expression data and then computes the three normal vector components, the shape index and the depth map of the 4D facial expression data; carries out tensor decomposition on the three normal vector components, the shape index and the depth map of the 4D face data respectively, extracting dynamic facial expression information; and classifies the dynamic facial expression information using a dynamic image network, performing score fusion on the classification results to obtain the final classification result. The method makes full use of the information in the 4D face data: computing the three normal vector components, the shape index and the depth map for the sequential face data fully exploits the 3D geometric information of the face, so the extracted features are more representative and discriminative across different people, and the accuracy of face recognition and expression recognition is higher.

Description

Tensor decomposition-based multi-feature fusion 4D expression identification method
Technical Field
The invention relates to an expression recognition method, in particular to a tensor decomposition-based multi-feature fusion 4D expression recognition method.
Background
With the development and progress of artificial intelligence and computer technology, expression recognition and face recognition are receiving more and more attention, and their applications in daily life are gradually becoming widespread. There are many current facial expression recognition methods, for example: extracting features from 2D pictures or videos with a deep neural network and then classifying them; expression classification using 3D face data is also available. In practice, 2D-based facial expression recognition is susceptible to illumination and scene changes. 3D-based expression recognition can overcome the influence of illumination and pose, but because different people express the same expression in different ways and to different degrees, even the same expression differs between individuals. Therefore, for the expression recognition problem, the identity information of the person acts as a disturbance.
Disclosure of Invention
The invention aims to provide a tensor decomposition-based multi-feature fusion 4D expression recognition method.
In order to achieve the purpose, the invention adopts the following technical scheme:
a tensor decomposition-based multi-feature fusion 4D expression recognition method comprises the following steps:
(1) acquiring 4D facial expression data;
(2) preprocessing the 4D facial expression data and then calculating to obtain three components of a normal vector, a shape index and a depth map of the 4D facial expression data;
(3) carrying out tensor decomposition on three components of a normal vector, a shape index and a depth map of the 4D face data respectively, and extracting dynamic face expression information;
(4) classifying the dynamic facial expression information by using a dynamic image network, and performing score fusion on the classified results to obtain a final classification result.
A further improvement of the present invention is that in step (1), the 4D facial expression data is S = {F_1, F_2, …, F_l}, where F_i represents 3D facial expression data, i = 1 … l, and l represents the number of frames of the 4D face.
The further improvement of the invention is that in step (2), the specific process of preprocessing the 4D expression data of the human face is as follows: denoising the 4D facial expression data.
The further improvement of the invention is that in step (2), the specific process of calculating the three normal vector components of the 4D face data is as follows:
1) firstly, calculating the normal vector of a single 3D face; the specific process is as follows: for 3D face data, first a point P_j on the face and its nearest neighbors are selected to form a neighborhood δ = {P_i(x_i, y_i, z_i) | i = 1, 2, …, k}, with k = 5, and the plane to be fitted is:
Ax + By + Cz + D = 0
subject to A² + B² + C² = 1;
The plane fitting problem is solved by the least squares method and the Lagrange multiplier method to obtain the normal vector at the point P_j on the face; estimating at all points on the 3D face yields the normal vector of the 3D face;
2) respectively projecting the normal vector of the 3D face to YZ, XZ and XY planes to obtain an X component diagram, a Y component diagram and a Z component diagram of the normal vector of the 3D face;
3) finally, performing step 1) and step 2) on each 3D face in the 4D face data to obtain a corresponding normal vector component image, and then overlapping corresponding normal vector X component images calculated by all 3D faces in the 4D face data together to obtain a normal vector X component image of the 4D face; overlapping the corresponding normal vector Y component images calculated by all 3D faces of the 4D face data to obtain a normal vector Y component image of the 4D face; and overlapping the normal vector Z component images calculated by all the 3D faces of the 4D face data together to obtain a normal vector Z component image of the 4D face.
The invention is further improved in that the specific process for solving the plane fitting problem is as follows: the normal vector n = (A, B, C)^T is the unit eigenvector corresponding to the minimum eigenvalue of the covariance matrix Σ, which has the form
Σ = ∑_{i=1}^{k} (P_i − P̄)(P_i − P̄)^T
wherein P̄ = (1/k) ∑_{i=1}^{k} P_i is the centroid of the neighborhood δ.
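The plane-fitting normal estimation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's implementation: the function name, the brute-force nearest-neighbor search, and the default k = 5 are assumptions of the sketch.

```python
import numpy as np

def estimate_normals(points, k=5):
    """Estimate a unit normal at every point of a 3D point cloud by fitting
    a plane to the point and its k nearest neighbors: the normal is the
    eigenvector of the neighborhood covariance matrix Sigma that corresponds
    to the smallest eigenvalue."""
    points = np.asarray(points, dtype=float)
    normals = np.zeros_like(points)
    for j, p in enumerate(points):
        d = np.linalg.norm(points - p, axis=1)
        nbrs = points[np.argsort(d)[:k + 1]]    # P_j plus its k nearest neighbors
        centered = nbrs - nbrs.mean(axis=0)     # subtract the centroid P-bar
        cov = centered.T @ centered             # covariance matrix Sigma
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        normals[j] = eigvecs[:, 0]              # eigenvector of the minimum eigenvalue
    return normals
```

Projecting these normals onto the YZ, XZ and XY planes then yields the X, Y and Z component maps of step 2).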
a further improvement of the present invention is that the specific process of calculating the shape index map is as follows:
firstly, calculating a shape index graph of a 3D face:
for a certain point of the human face, the point and the surrounding area are assumed to be a discrete parametric surface
Figure BDA0002343171920000034
Figure BDA0002343171920000035
Parameters A, B, C, D, E, F and G are fitted according to the vertex coordinates of the 3D face, and then a matrix is obtained
Figure BDA0002343171920000036
The characteristic root decomposition is carried out on the matrix, and the maximum characteristic root is the maximum principal curvature K1The minimum feature root is the minimum principal curvature K2(ii) a Substituting the maximum principal curvature and the minimum principal curvature at a vertex into a shape index Shapeindex calculation formula to obtain the shape index at the vertex;
Figure BDA0002343171920000037
calculating a shape index Shapeindex for each vertex of the 3D face to obtain a shape index graph of the 3D face;
and stacking the shape index images of each 3D face of the 4D face to obtain the shape index image of the 4D face.
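As a worked illustration of the ShapeIndex formula, the following sketch computes the shape index from given principal curvatures; writing the formula with `arctan2` is an implementation choice assumed here (not stated in the source) that keeps the umbilic case K_1 = K_2 well defined.

```python
import numpy as np

def shape_index(k1, k2):
    """ShapeIndex = 1/2 - (1/pi) * arctan((k1 + k2) / (k1 - k2)),
    with k1 the maximum and k2 the minimum principal curvature."""
    return 0.5 - (1.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2)
```

With this convention a flat patch (k1 = k2 = 0) and a perfect saddle (k1 = -k2) both map to 0.5.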
The further improvement of the invention is that the specific process of calculating the depth map of the 4D face is as follows:
first, the depth map of a 3D face is calculated; for a point P_j(x_j, y_j, z_j) of a 3D face F_i, the corresponding depth-map pixel value Dep_j is calculated as:
Dep_j = (z_j − z_min) / (z_max − z_min)
wherein z_max and z_min represent the maximum and minimum values of the z coordinates of the points of face F_i;
and then stacking the depth maps of all the 3D faces of the 4D face to obtain the depth map of the 4D face.
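The Dep_j formula amounts to a per-frame min-max normalization of the z coordinates, and the 4D depth map is the stack of the per-frame results. A short sketch (function names are assumptions for illustration):

```python
import numpy as np

def depth_map(z_values):
    """Depth value per vertex: Dep_j = (z_j - z_min) / (z_max - z_min)."""
    z = np.asarray(z_values, dtype=float)
    z_min, z_max = z.min(), z.max()
    return (z - z_min) / (z_max - z_min)

def depth_video(frames):
    """Stack the per-frame depth maps along a trailing time axis,
    giving the depth map of the 4D face."""
    return np.stack([depth_map(f) for f in frames], axis=-1)
```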
The further improvement of the present invention is that, in the step (3), the specific process of tensor decomposition of the depth map of the 4D face data is as follows:
1) establishing a model;
For the depth map Dep ∈ R^(H×W×L) of the 4D face, H represents the height of the depth map, W represents the width of the depth map, and L represents the sequence length of the 4D face; the expression information is assumed to be Emo ∈ R^(H×W×L) and the identity information ID ∈ R^(H×W×L); then a 4D facial expression-identity information separation model is established:
min ||f||_1 + λ||e||_F²   s.t.   Dep = ID + Emo + e,   f = DEmo   (1)
wherein λ represents a weight coefficient, e represents noise, and DEmo represents the modeling of dynamic expression information;
||DEmo||_1 = ||D_h Emo||_1 + ||D_v Emo||_1 + ||D_t Emo||_1
D_h Emo = vec(Emo(i, j+1, k) − Emo(i, j, k))
D_v Emo = vec(Emo(i+1, j, k) − Emo(i, j, k))
D_t Emo = vec(Emo(i, j, k+1) − Emo(i, j, k))  (2)
D_h denotes the difference operator in the horizontal direction, D_v the difference operator in the vertical direction, and D_t the difference operator in the time-domain direction;
The static person identity ID is modeled as follows:
ID = G ×_1 U_1 ×_2 U_2 ×_3 U_3  (3)
wherein G represents the core tensor of the Tucker decomposition and U_1, U_2, U_3 represent the factor matrices of each mode in the Tucker decomposition; ×_1, ×_2 and ×_3 respectively represent the product of the tensor with the matrix of each mode;
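A Tucker decomposition such as the one in the identity model can be sketched in plain NumPy via mode-n products and a truncated higher-order SVD (HOSVD). HOSVD is one standard way to compute such a decomposition and is assumed here purely for illustration; the patent's own solver is the iterative scheme it describes.

```python
import numpy as np

def mode_product(T, M, mode):
    """Mode-n product of tensor T with matrix M along axis `mode`."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

def hosvd(T, ranks):
    """Truncated HOSVD: returns a core tensor G and factor matrices U_1..U_3
    such that T is approximated by G x_1 U_1 x_2 U_2 x_3 U_3."""
    Us = []
    for mode, r in enumerate(ranks):
        # unfold T along `mode` and keep the leading r left singular vectors
        unfolding = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        U, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        Us.append(U[:, :r])
    G = T
    for mode, U in enumerate(Us):
        G = mode_product(G, U.T, mode)  # project onto each mode subspace
    return G, Us
```

With full ranks the reconstruction G ×_1 U_1 ×_2 U_2 ×_3 U_3 is exact.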
2) solving the model:
The established 4D facial expression-identity information separation model is solved through iterative optimization.
The invention is further improved in that the specific process of solving by iterative optimization is as follows:
The first step: update the core tensor G of the Tucker decomposition and the factor matrices U_1, U_2, U_3:
[three update formulas: equation images]
wherein λ_Dep is a Lagrange multiplier vector, β_Dep is a positive penalty parameter, the estimated static identity information of the person is given by the Tucker decomposition, Dep is the depth map of the 4D face, Emo is the dynamic expression information of the 4D face data, and e is noise;
The second step: update the noise e:
[update formula: equation image]
wherein vec(·) represents the expansion of a tensor into a vector, λ_Dep is the Lagrange multiplier vector, β_Dep is the positive penalty parameter, Dep is the depth map of the 4D face, and Emo is the dynamic expression information of the 4D face data;
The third step: update the dynamic expression information Emo:
[update formula: equation image]
wherein fftn and ifftn respectively represent the fast 3D Fourier transform and its inverse, β_Dep and β_f are positive penalty parameters, λ_f is a Lagrange multiplier vector, |·|² is the element-wise squaring operation, and D* represents the adjoint operator of D;
Update the tensor f:
[update formula: equation image]
wherein λ is a weight coefficient, λ_f is the Lagrange multiplier vector, β_f is a positive penalty parameter, and soft is the soft-thresholding function defined as soft(a, τ) := sgn(a) · max(|a| − τ, 0);
Update the Lagrange multiplier vectors λ_f and λ_Dep and the positive penalty parameters β_f and β_Dep of the 4D facial expression-identity information separation model:
[update formulas: equation images]
wherein nRes_pre is the value of the last iteration, Dep is the depth map of the 4D face, Emo is the dynamic expression information of the 4D face data, e is noise, γ is a parameter related to model convergence, and c_1, c_2 are coefficients;
The dynamic expression information Emo extracted in the third step is then sent into a dynamic image network to extract the information of expression motion, thereby realizing the classification of expressions.
The further improvement of the invention is that the bottom layer of the dynamic image network is a deep neural network, with a rank pooling layer added before the fully connected layer of the network; the calculation process of the network's rank pooling layer is as follows:
The network is updated as follows:
[update formula: equation image]
wherein a^(m) represents the output of the m-th layer of the dynamic image network, μ_t represents a parameter to be learned by the network, and V_1, …, V_T represent the features output by the dynamic image network; to facilitate back-propagation through the network, the following approximation is made:
[approximation formula: equation image]
wherein α_t is a parameter to be learned by the network, applied to the features of the previous network layer.
Compared with the prior art, the invention has the following beneficial effects:
(1) Using 4D face data for expression recognition and face recognition overcomes the defect that 2D face recognition is strongly affected by factors such as illumination and pose. With 4D data, stable results can be obtained in expression recognition and face recognition across different scenes and environments.
(2) The method makes full use of the information in the 4D face data: the three normal vector components, the shape index and the depth map are calculated for the sequential face data, fully exploiting the 3D geometric information of the face, so the features extracted for different people are more representative and discriminative, and the accuracy of face recognition and expression recognition is higher.
(3) The 4D face data is decomposed by a tensor decomposition method to obtain dynamic expression information and static face identity information. The dynamic expression information is used for expression recognition with the interference of the person's identity removed, making the expression recognition result more stable and accurate.
Drawings
FIG. 1 is a detailed flow chart of the present invention.
Fig. 2 shows the three normal vector components, the shape index and the depth map of a 4D face according to the present invention.
Fig. 3 shows the dynamic expression information maps extracted by tensor decomposition from the three normal vector components, the shape index and the depth map of a 4D face according to the invention.
Fig. 4 is a network structure diagram of the present invention for performing expression recognition on dynamic expression information extracted from a shape index using a dynamic image network.
FIG. 5 is a diagram of the network architecture for multi-feature fusion expression recognition using a dynamic image network in accordance with the present invention.
Detailed Description
The present invention will be described in detail below with reference to examples.
Referring to fig. 1, the present invention comprises the steps of:
(1) acquiring 4D facial expression data;
(2) preprocessing the data, and calculating the three normal vector components, the shape index and the depth map of the 4D face data; these five feature maps deeply reflect the geometric shape characteristics of the face at each moment;
(3) carrying out tensor decomposition on the three normal vector components, the shape index and the depth map of the 4D face data respectively, and extracting dynamic facial expression information and static identity information;
(4) classifying the dynamic facial expression information by using a dynamic image network, and performing score fusion on the classified results to obtain a final classification result.
Specifically, referring to fig. 1, the present invention comprises the following steps:
step 101:
and acquiring 4D expression data of the human face, wherein the 4D expression data refer to a series of 3D human face data video sequences. Some cameras used firstly, such as Intel RealSense SR300 and the like, can capture the depth information of the face, can easily obtain 3D facial expression data by means of a structured light model, and continuously acquire the 3D facial expression data to obtain 4D facial data.
Assume that 4D facial expression data is S ═ F1,F2,…FlIn which Fi(i ═ 1 … l) represents 3D facial expression data, and l represents the number of frames of a 4D face.
Step 102:
the 4D data is preprocessed, 4D face data obtained by a camera often contains noise, holes and the like, the 4D face needs to be preprocessed, and normal vector components, shape indexes and depth maps corresponding to the 4D face are further calculated. In particular, the present invention relates to a method for producing,
step 2.1: the 4D face data is subjected to hole filling processing, and the hole filling processing can be realized by a template face hole filling method, which are common processing methods for 3D and 4D data.
Step 103:
three components of a normal vector of the 4D face data are calculated, as well as a shape index and a depth map. This step is to extract the corresponding geometric features from the 4D face. As shown in fig. 2, an example of calculating the three components of the normal vector, as well as the shape index and depth map, from the disclosed 4D expression public database BU4D is given. In this example, 5 frames of faces are selected, and the images respectively include, from top to bottom: the depth map comprises a normal vector X component map, a normal vector Y component map, a normal vector Z component map and a shape index map. The images show the same expression condition of the same face at different times from left to right. Specifically, calculating the three components of the normal vector of the 4D face, and the shape index and depth map comprises the following 3 steps.
(1) Calculating the normal vector component of the 4D face:
1) Firstly, the normal vector of a single 3D face is calculated. For 3D face data such as a point cloud, the normal vector at a point is estimated by fitting a plane to that point and several surrounding points; generally, the plane is fitted using the point and its 5 nearest neighbors. For example, to compute the normal vector of a point P_j on the face, first the nearest neighbors of P_j are selected to form a neighborhood δ = {P_i(x_i, y_i, z_i) | i = 1, 2, …, k}, where k is taken as 5, to fit the plane:
Ax + By + Cz + D = 0
subject to A² + B² + C² = 1;
The plane fitting problem is solved by the least squares method and the Lagrange multiplier method, finally obtaining the normal vector at the point P_j on the face. Specifically, the normal vector n = (A, B, C)^T is the unit eigenvector corresponding to the minimum eigenvalue of the covariance matrix Σ, which has the form:
Σ = ∑_{i=1}^{k} (P_i − P̄)(P_i − P̄)^T
wherein P̄ = (1/k) ∑_{i=1}^{k} P_i is the centroid of the neighborhood, and so on.
Estimating all points on the 3D face to obtain a normal vector of the 3D face;
2) Next, the three components of the normal vector are calculated: after the normal vector of a 3D face is obtained, it is projected onto the YZ, XZ and XY planes, giving the X component map, the Y component map and the Z component map of the normal vector of the 3D face in this order.
3) Finally, steps 1) and 2) are performed for each 3D face in the 4D face data to obtain the corresponding normal vector component maps; the normal vector X component maps computed from all 3D faces of the 4D face data are stacked together to obtain the normal vector X component map of the 4D face, and likewise for the Y and Z component maps. The resulting 4D normal component map is actually a video of normal components.
(2) The shape index of the 4D face measures a normalized combination of the two principal curvatures of the facial surface and can be regarded as a second-order differential property. Specifically, the steps for calculating the shape index of the 4D face are as follows: firstly, the shape index map of a 3D face is calculated:
For a given point of the human face, the point and its surrounding region are treated as a discrete parametric surface, fitted by a quadric patch; the parameters A, B, C, D, E, F and G of the patch are fitted from the vertex coordinates of the 3D face, after which the shape-operator (Weingarten) matrix is obtained.
Eigen-decomposition of this matrix gives the maximum eigenvalue as the maximum principal curvature K_1 and the minimum eigenvalue as the minimum principal curvature K_2. Substituting the maximum and minimum principal curvatures at a vertex into the shape index (ShapeIndex) formula
ShapeIndex = 1/2 − (1/π) · arctan((K_1 + K_2) / (K_1 − K_2))
gives the shape index at that vertex. Calculating the shape index ShapeIndex for each vertex of the 3D face yields the shape index map of the 3D face;
and stacking the shape index images of each 3D face of the 4D face to obtain the shape index image of the 4D face.
(3) The depth map of the 4D face is calculated; the gray value of each pixel of the depth map represents, for each point of the face, the distance from that point to the camera, and hence the geometric shape of the face. The steps for calculating the depth map of the 4D face are as follows:
First, the depth map of a 3D face is calculated; for a point P_j(x_j, y_j, z_j) of a 3D face F_i, the corresponding depth-map pixel value Dep_j is calculated as:
Dep_j = (z_j − z_min) / (z_max − z_min)
wherein z_max and z_min represent the maximum and minimum values of the z coordinates of the points of face F_i.
And then stacking the depth maps of all the 3D faces of the 4D face to obtain the depth map of the 4D face.
Step 104:
and (4) carrying out tensor decomposition on the normal vector component, the shape index and the depth map obtained in the step 103 respectively to obtain static person identity information and dynamic expression change information. Specifically, taking a depth map as an example, the components of the normal vector and the process of tensor decomposition of the shape index are similarly obtained.
(1) Establishing the model: for the depth map Dep ∈ R^(H×W×L) of the 4D face, H denotes the height of the depth map, W the width of the depth map, and L the sequence length of the 4D face. Considering that, for a 3D face sequence, the dynamic part is the expression and the static part is the identity information, expression and identity can be regarded as independently distributed and can therefore be separated. Suppose the expression information is Emo ∈ R^(H×W×L) and the identity information is ID ∈ R^(H×W×L); the following 4D facial expression-identity information separation model can then be established:
min ||f||_1 + λ||e||_F²   s.t.   Dep = ID + Emo + e,   f = DEmo   (1)
wherein λ represents a weight coefficient that measures the balance between f and e, e represents noise, and DEmo represents the modeling of dynamic expression information:
||DEmo||_1 = ||D_h Emo||_1 + ||D_v Emo||_1 + ||D_t Emo||_1
D_h Emo = vec(Emo(i, j+1, k) − Emo(i, j, k))
D_v Emo = vec(Emo(i+1, j, k) − Emo(i, j, k))
D_t Emo = vec(Emo(i, j, k+1) − Emo(i, j, k))  (2)
D_h denotes the difference operator in the horizontal direction, D_v the difference operator in the vertical direction, and D_t the difference operator in the time-domain direction.
Formula (2) depicts the dynamic transformation information of the facial expression. The static person identity ID is modeled as follows:
ID = G ×_1 U_1 ×_2 U_2 ×_3 U_3  (3)
Formula (3) is in fact a Tucker decomposition of the identity information of the 4D face: G represents the core tensor of the Tucker decomposition, and U_1, U_2, U_3 represent the factor matrices of each mode. In formula (3), ×_1, ×_2 and ×_3 respectively represent the product of the tensor with the matrix of each mode. The modeling of formula (3) reflects the characteristic that the face identity information remains unchanged across different expressions.
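The difference operators of formula (2) are plain forward differences along the two spatial axes and the temporal axis of the H x W x L tensor; a minimal sketch (function names are assumptions):

```python
import numpy as np

def diff_operators(emo):
    """Forward differences of an H x W x L tensor: D_h (horizontal),
    D_v (vertical) and D_t (temporal), each flattened by vec(.)."""
    dh = np.diff(emo, axis=1).ravel()  # Emo(i, j+1, k) - Emo(i, j, k)
    dv = np.diff(emo, axis=0).ravel()  # Emo(i+1, j, k) - Emo(i, j, k)
    dt = np.diff(emo, axis=2).ravel()  # Emo(i, j, k+1) - Emo(i, j, k)
    return dh, dv, dt

def demo_l1(emo):
    """||DEmo||_1 = ||D_h Emo||_1 + ||D_v Emo||_1 + ||D_t Emo||_1."""
    dh, dv, dt = diff_operators(emo)
    return np.abs(dh).sum() + np.abs(dv).sum() + np.abs(dt).sum()
```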
(2) Solving the model: the 4D facial expression-identity information separation model established by formula (1) is solved by iterative optimization. For multivariate optimization problems of this kind, iterative optimization is usually carried out with the Alternating Direction Method of Multipliers (ADMM). The parameters are first initialized and then iteratively updated as follows:
The first step: update the core tensor G of the Tucker decomposition and the factor matrices U_1, U_2, U_3:
[three update formulas: equation images]
wherein λ_Dep is the Lagrange multiplier vector, β_Dep is a positive penalty parameter, the estimated static identity information of the person is given by the Tucker decomposition, Dep is the depth map of the 4D face, Emo is the dynamic expression information of the 4D face data, and e is noise;
The second step: update the noise e:
[update formula: equation image]
wherein vec(·) represents the expansion of a tensor into a vector, λ_Dep is the Lagrange multiplier vector, β_Dep is the positive penalty parameter, Dep is the depth map of the 4D face, and Emo is the dynamic expression information of the 4D face data.
The third step: update the dynamic expression information Emo:
[update formula: equation image]
wherein fftn and ifftn respectively represent the fast 3D Fourier transform and its inverse, β_Dep and β_f are positive penalty parameters, λ_f is a Lagrange multiplier vector, |·|² is the element-wise squaring operation, D* represents the adjoint operator of D, D_h, D_v and D_t respectively denote the difference operators in the horizontal, vertical and temporal directions, and f is defined as f = DEmo.
Update the tensor f:
[update formula: equation image]
wherein λ is the weight coefficient, λ_f is the Lagrange multiplier vector, β_f is a positive penalty parameter, and soft is the soft-thresholding function defined as soft(a, τ) := sgn(a) · max(|a| − τ, 0).
Update the Lagrange multiplier vectors λ_f and λ_Dep and the positive penalty parameters β_f and β_Dep of the 4D facial expression-identity information separation model:
[update formulas: equation images]
wherein nRes_pre is the value of the last iteration, Dep is the depth map of the 4D face, Emo is the dynamic expression information of the 4D face data, e is noise, γ is a parameter related to model convergence, and c_1, c_2 are coefficients, taken as 1.15 and 0.95 respectively.
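The soft-thresholding function used in the f update has a direct element-wise form; a one-line sketch:

```python
import numpy as np

def soft(a, tau):
    """soft(a, tau) = sgn(a) * max(|a| - tau, 0), applied element-wise."""
    return np.sign(a) * np.maximum(np.abs(a) - tau, 0.0)
```

This is the proximal operator of the l1 norm, which is why it appears in the update of the sparsity variable f.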
As shown in fig. 3, the dynamic expression maps extracted after tensor decomposition of the three normal vector components, the shape index and the depth map of the 4D face are displayed. From top to bottom, the rows are: the depth map, the normal vector X component map, the normal vector Y component map, the normal vector Z component map and the shape index map. From left to right, the dynamic expression information maps extracted by tensor decomposition for the same expression of the same face at different times are displayed.
Step 105:
and (4) sending the dynamic expression information Emo extracted in the step three into a dynamic image network to extract the information of expression motion, and further realizing the classification of expressions. General expressions can be divided into six categories: happy, angry, sadness, surprise, dislike and fear. A moving picture network is a network that extracts moving pictures. The bottom layer of the network is a general deep neural network such as: the VGGNet16 network is added with a rank posing layer before the network full connection layer. Wherein the rank posing layer functions to change a view sequence feature into a graph. This picture implies the dynamic characteristics of each frame of a video sequence. As shown in fig. 4, which is a network structure diagram of a dynamic image network, the calculation flow of the rank posing layer of the network is as follows:
the network is updated as follows:
Figure BDA0002343171920000135
wherein a(m) represents the output of the m-th layer of the dynamic image network, μt represents the parameters to be learned by the network, and V1, …, VT represent the features output by the dynamic image network. The following approximation is made to facilitate back propagation:
Figure BDA0002343171920000141
wherein αt is a parameter to be learned by the network, and
Figure BDA0002343171920000142
denotes the features of the previous layer of the network.
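The rank pooling layer described above can be sketched with the standard approximate rank pooling closed form, in which frame t receives the weight αt = 2t − T − 1 (this closed form is an assumption drawn from the usual dynamic image formulation, not stated explicitly here):

```python
import numpy as np

def approximate_rank_pooling(frames):
    """Collapse T frame features into one 'dynamic image' using the
    approximate rank pooling weights alpha_t = 2t - T - 1 (t = 1..T):
    later frames get large positive weights, earlier frames negative
    ones, so the result encodes the temporal evolution of the sequence.

    frames: array of shape (T, ...)."""
    T = frames.shape[0]
    t = np.arange(1, T + 1)
    alpha = 2 * t - T - 1            # e.g. T = 4 -> [-3, -1, 1, 3]
    return np.tensordot(alpha, frames, axes=(0, 0))

# Toy usage: 4 frames of a 2x2 feature map.
seq = np.arange(16, dtype=float).reshape(4, 2, 2)
dyn = approximate_rank_pooling(seq)
```

The resulting single image can then be fed to an ordinary image classifier, which is the role of the backbone network here.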
Step 106:
Score fusion is performed on the classification results of the different feature data obtained in step four to obtain the final expression recognition result of the model, which is then output. Fig. 5 shows the network structure for multi-feature fusion expression recognition, in which a dynamic image network processes the dynamic expression information extracted from the normal vector components, the shape index and the depth map.
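The score fusion of the different feature streams can be sketched as an equal-weight average of per-class scores (the equal weights are an assumption; the patent does not specify the fusion weights):

```python
import numpy as np

def fuse_scores(score_list):
    """Average per-class scores from several feature streams (e.g.
    normal vector X/Y/Z, shape index, depth map) and return the fused
    scores together with the index of the winning expression class."""
    fused = np.mean(np.stack(score_list, axis=0), axis=0)
    return fused, int(np.argmax(fused))

# Toy per-class scores over the six expression classes, three streams.
streams = [
    np.array([0.10, 0.20, 0.10, 0.40, 0.10, 0.10]),
    np.array([0.20, 0.10, 0.10, 0.50, 0.05, 0.05]),
    np.array([0.10, 0.10, 0.20, 0.40, 0.10, 0.10]),
]
fused, label = fuse_scores(streams)   # class 3 wins after fusion
```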
The invention operates on geometric feature images of 4D face data, namely the three components of the normal vector, the shape index and the depth map. Through tensor decomposition, the dynamic expression information and the static identity information of these geometric feature images are separated; expression recognition is performed on the extracted dynamic expression information of each feature image, and score fusion of the recognition results of the different geometric feature images yields the final expression recognition result.

Claims (9)

1. A tensor decomposition-based multi-feature fusion 4D expression recognition method is characterized by comprising the following steps:
(1) acquiring 4D facial expression data;
(2) preprocessing the 4D facial expression data and then calculating to obtain three components of a normal vector, a shape index and a depth map of the 4D facial expression data;
(3) carrying out tensor decomposition on three components of a normal vector, a shape index and a depth map of the 4D face data respectively, and extracting dynamic face expression information; the specific process of tensor decomposition of the depth map of the 4D face data is as follows:
1) establishing a model;
For the depth map Dep ∈ R^(H×W×L) of the 4D face, where H represents the height of the depth map, W its width and L the sequence length of the 4D face, assume the expression information is Emo ∈ R^(H×W×L) and the identity information is ID ∈ R^(H×W×L); then establish the 4D facial expression-identity information separation model:
Figure FDA0003635070780000011
f=DEmo
Figure FDA0003635070780000012
wherein λ represents a weight coefficient, e represents noise, and DEmo represents the modeling of the dynamic expression information;
‖DEmo‖1=‖DhEmo‖1+‖DvEmo‖1+‖DtEmo‖1
DhEmo=vec(Emo(i,j+1,k)-Emo(i,j,k))
DvEmo=vec(Emo(i+1,j,k)-Emo(i,j,k))
DtEmo=vec(Emo(i,j,k+1)-Emo(i,j,k)) (2)
wherein Dh is the difference operator in the horizontal direction, Dv is the difference operator in the vertical direction, and Dt is the difference operator in the time-domain direction;
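The three difference operators Dh, Dv and Dt of equation (2) can be sketched as forward differences along the three axes of the H×W×L expression tensor (an illustrative NumPy sketch, not part of the claim):

```python
import numpy as np

def total_variation_norm(emo):
    """||DEmo||_1 of equation (2): forward differences of the H x W x L
    expression tensor along the horizontal (W), vertical (H) and
    temporal (L) axes, summed with absolute values."""
    dh = np.diff(emo, axis=1)   # Emo(i, j+1, k) - Emo(i, j, k)
    dv = np.diff(emo, axis=0)   # Emo(i+1, j, k) - Emo(i, j, k)
    dt = np.diff(emo, axis=2)   # Emo(i, j, k+1) - Emo(i, j, k)
    return np.abs(dh).sum() + np.abs(dv).sum() + np.abs(dt).sum()

emo = np.zeros((2, 2, 2))
emo[1, 1, 1] = 1.0              # a single "moving" voxel
tv = total_variation_norm(emo)  # one unit jump along each axis
```

Penalizing this quantity encourages the expression component to vary smoothly in space and time.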
The static identity information ID is modeled as follows:
Figure FDA0003635070780000013
wherein,
Figure FDA0003635070780000014
denotes the core tensor of the Tucker decomposition, and U1, U2, U3 denote the mode matrices of the Tucker decomposition; ×1, ×2 and ×3 denote the mode-1, mode-2 and mode-3 products of the tensor with the matrices;
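The Tucker model for the static identity information (core tensor multiplied by a matrix along each mode) can be sketched with mode-n products in plain NumPy (a minimal illustration; practical implementations typically rely on a tensor library):

```python
import numpy as np

def mode_n_product(tensor, matrix, mode):
    """Multiply `matrix` into `tensor` along axis `mode` (the x_n
    operation of the Tucker decomposition)."""
    t = np.moveaxis(tensor, mode, 0)
    shape = t.shape
    t = matrix @ t.reshape(shape[0], -1)
    return np.moveaxis(t.reshape((matrix.shape[0],) + shape[1:]), 0, mode)

def tucker_reconstruct(core, U1, U2, U3):
    """ID = core x1 U1 x2 U2 x3 U3, the static identity model."""
    out = mode_n_product(core, U1, 0)
    out = mode_n_product(out, U2, 1)
    return mode_n_product(out, U3, 2)

core = np.random.rand(2, 2, 2)                    # small core tensor
U1, U2, U3 = [np.random.rand(3, 2) for _ in range(3)]
ident = tucker_reconstruct(core, U1, U2, U3)      # shape (3, 3, 3)
```

A small core with low mode ranks is what enforces the low-rank structure of the identity component.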
2) solving the model:
solving the established 4D facial expression-identity information separation model through iterative optimization;
(4) and classifying the dynamic facial expression information by using a dynamic image network, and performing score fusion on the classified results to obtain a final classified result.
2. The tensor decomposition-based multi-feature fusion 4D expression recognition method as recited in claim 1, wherein in the step (1), the 4D facial expression data is S = {F1, F2, …, Fl}, wherein Fi represents 3D facial expression data, i = 1 … l, and l represents the number of frames of the 4D face.
3. The tensor decomposition-based multi-feature fusion 4D expression recognition method as recited in claim 1, wherein in the step (2), the specific process of preprocessing the 4D facial expression data is: denoising the 4D facial expression data.
4. The tensor decomposition-based multi-feature fusion 4D expression recognition method as recited in claim 1, wherein in the step (2), the specific process of calculating three components of the normal vector of the 4D face data is as follows:
1) First, calculate the normal vector of a single 3D face. The specific process is as follows: for the 3D face data, first select a point Pj on the face and form its neighborhood δ = {Pi(xi, yi, zi) | i = 1, 2, …, k}, with k = 5; the plane to be fitted is:
Ax + By + Cz + D = 0
which satisfies A^2 + B^2 + C^2 = 1;
by the least-squares and Lagrange multiplier methods, solve this plane-fitting problem to obtain the normal vector at the point Pj on the face; estimate all points on the 3D face in this way to obtain the normal vectors of the 3D face;
2) respectively projecting the normal vector of the 3D face to YZ, XZ and XY planes to obtain an X component diagram, a Y component diagram and a Z component diagram of the normal vector of the 3D face;
3) finally, performing step 1) and step 2) on each 3D face in the 4D face data to obtain a corresponding normal vector component image, and overlapping corresponding normal vector X component images calculated by all 3D faces in the 4D face data to obtain a normal vector X component image of the 4D face; overlapping the corresponding normal vector Y component images calculated by all 3D faces of the 4D face data to obtain a normal vector Y component image of the 4D face; and overlapping the normal vector Z component images calculated by all the 3D faces of the 4D face data together to obtain a normal vector Z component image of the 4D face.
5. The tensor decomposition-based multi-feature fusion 4D expression recognition method as claimed in claim 4, wherein the specific process of solving the plane-fitting problem is as follows: the normal vector is the normalized eigenvector corresponding to the minimum eigenvalue of the covariance matrix Σ
Figure FDA0003635070780000031
The covariance matrix Σ is of the form:
Figure FDA0003635070780000032
wherein,
Figure FDA0003635070780000033
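The plane-fitting solution of claim 5, where the normal is the eigenvector of the neighborhood covariance matrix with the smallest eigenvalue, can be sketched as:

```python
import numpy as np

def point_normal(neighborhood):
    """Estimate the unit normal at a 3D point from its k-neighborhood
    (a k x 3 array) as the eigenvector of the neighborhood covariance
    matrix associated with the smallest eigenvalue."""
    centered = neighborhood - neighborhood.mean(axis=0)
    cov = centered.T @ centered / len(neighborhood)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues ascending
    return eigvecs[:, 0]                    # smallest-eigenvalue vector

# Toy usage: five points on the z = 0 plane; the normal is the z axis
# (up to sign).
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                [1.0, 1.0, 0.0], [0.5, 0.5, 0.0]])
n = point_normal(pts)
```

The sign of the returned vector is arbitrary; in practice normals are usually flipped to point toward the viewer.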
6. the tensor decomposition-based multi-feature fusion 4D expression recognition method as recited in claim 1, wherein the specific process of calculating the shape index map is as follows:
firstly, calculating a shape index graph of a 3D face:
for a certain point of the human face, the point and the surrounding area are assumed to be a discrete parametric surface
Figure FDA0003635070780000034
Figure FDA0003635070780000035
Parameters A, B, C, D, E, F and G are fitted according to the vertex coordinates of the 3D face, and then a matrix is obtained
Figure FDA0003635070780000036
Perform eigenvalue decomposition on this matrix; the maximum eigenvalue is the maximum principal curvature K1 and the minimum eigenvalue is the minimum principal curvature K2. Substitute the maximum and minimum principal curvatures at a vertex into the shape index (ShapeIndex) calculation formula to obtain the shape index at that vertex:
Figure FDA0003635070780000037
Calculate the shape index ShapeIndex for each vertex of the 3D face to obtain the shape index map of the 3D face;
and stacking the shape index images of each 3D face of the 4D face to obtain the shape index image of the 4D face.
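Given the principal curvatures K1 ≥ K2, the shape index can be sketched with the standard formula SI = 1/2 − (1/π)·arctan((K1 + K2)/(K1 − K2)), mapped to [0, 1]; since the claim's equation image is not reproduced here, this particular scaling is an assumption:

```python
import numpy as np

def shape_index(k1, k2):
    """Shape index from principal curvatures k1 >= k2, mapped to
    [0, 1]; a symmetric saddle (k1 = -k2) maps to 0.5 and the two
    umbilic extremes (k1 = k2) map to the ends of the interval."""
    k1 = np.asarray(k1, dtype=float)
    k2 = np.asarray(k2, dtype=float)
    # arctan2 handles the umbilic case k1 == k2 (zero denominator).
    return 0.5 - (1.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2)

si_saddle = shape_index(1.0, -1.0)   # symmetric saddle -> 0.5
si_umbilic = shape_index(1.0, 1.0)   # umbilic point (k1 == k2)
```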
7. The tensor decomposition-based multi-feature fusion 4D expression recognition method as recited in claim 1, wherein the specific process of calculating the depth map of the 4D face is as follows:
First, calculate the depth map of a 3D face: for a point Pj(xj, yj, zj) of a 3D face Fi, the pixel value Depj of its corresponding depth map is calculated as:
Figure FDA0003635070780000041
wherein zmax and zmin represent the maximum and minimum values of the Z coordinates of the points of the face Fi;
and then stacking the depth maps of all the 3D faces of the 4D face to obtain the depth map of the 4D face.
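Since the equation image for Depj is not reproduced, a plausible reading of claim 7 is a min-max normalization of the Z coordinates by zmax and zmin; a sketch under that assumption:

```python
import numpy as np

def depth_values(z):
    """Min-max normalize the Z coordinates of one 3D face so the
    depth values lie in [0, 1] (zmin -> 0, zmax -> 1)."""
    z = np.asarray(z, dtype=float)
    z_min, z_max = z.min(), z.max()
    return (z - z_min) / (z_max - z_min)

dep = depth_values([2.0, 4.0, 6.0])   # -> [0.0, 0.5, 1.0]
```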
8. The tensor decomposition-based multi-feature fusion 4D expression recognition method as recited in claim 1, wherein the specific process of solving through iterative optimization is as follows:
The first step: update the core tensor of the Tucker decomposition
Figure FDA0003635070780000042
and the matrices U1, U2, U3:
Figure FDA0003635070780000043
Figure FDA0003635070780000044
Figure FDA0003635070780000045
wherein λDep is a Lagrange multiplier vector, βDep is a positive penalty parameter,
Figure FDA0003635070780000046
denotes the estimated static identity information of the person, Dep is the depth map of the 4D face, Emo is the dynamic expression information of the 4D face data, e is noise, and
Figure FDA0003635070780000047
denotes the tensor form of
Figure FDA0003635070780000048
;
the second step: updating the noise e;
Figure FDA0003635070780000049
wherein,
Figure FDA00036350707800000410
Figure FDA00036350707800000411
denotes the expansion vector of
Figure FDA00036350707800000412
, λDep is a Lagrange multiplier vector, βDep is a positive penalty parameter, Dep is the depth map of the 4D face, and Emo is the dynamic expression information of the 4D face data;
the third step: updating the dynamic expression information Emo;
Figure FDA0003635070780000051
wherein fftn and ifftn represent the 3D fast Fourier transform and its inverse, respectively, βDep and βf are positive penalty parameters, λf is a Lagrange multiplier vector, |·|^2 is the element-wise squaring operation, and D* denotes the adjoint matrix of D;
update tensor f:
Figure FDA0003635070780000052
wherein λ is the weight coefficient, λf is the Lagrange multiplier vector, βf is a positive penalty parameter, and soft is the function defined as: soft(a, τ) = sgn(a) · max(|a| − τ, 0);
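The soft-thresholding function soft(a, τ) = sgn(a)·max(|a| − τ, 0) used in the f update is the proximal operator of the ℓ1 norm; an illustrative sketch:

```python
import numpy as np

def soft(a, tau):
    """Element-wise soft thresholding sgn(a) * max(|a| - tau, 0),
    the proximal operator of tau * ||.||_1."""
    a = np.asarray(a, dtype=float)
    return np.sign(a) * np.maximum(np.abs(a) - tau, 0.0)

out = soft(np.array([-2.0, -0.5, 0.0, 0.5, 2.0]), 1.0)
# shrinks toward zero: [-1.0, 0.0, 0.0, 0.0, 1.0]
```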
update the Lagrange multiplier vector λf and the positive penalty parameters βf and βDep of the 4D facial expression-identity information separation model:
Figure FDA0003635070780000053
Wherein,
Figure FDA0003635070780000054
nRespre is the value from the last iteration, Dep is the depth map of the 4D face, Emo is the dynamic expression information of the 4D face data,
Figure FDA0003635070780000055
e is noise, and γ is a parameter related to model convergence; c1 and c2 are both coefficients;
and the dynamic expression information Emo extracted in the third step is sent into a dynamic image network to extract the information of expression motion, thereby realizing the classification of expressions.
9. The tensor decomposition-based multi-feature fusion 4D expression recognition method as claimed in claim 8, wherein the bottom layer of the dynamic image network is a deep neural network, a rank pooling layer is added before the fully connected layer of the network, and the rank pooling layer of the network is calculated as follows:
the network is updated as follows:
Figure FDA0003635070780000056
wherein a(m) represents the output of the m-th layer of the dynamic image network, μt represents the parameter to be learned by the network, and V1, …, VT represent the features output by the dynamic image network; the following approximation is made to facilitate back propagation:
Figure FDA0003635070780000061
wherein αt is a parameter to be learned by the network, and
Figure FDA0003635070780000062
denotes the features of the previous layer of the network.
CN201911384458.0A 2019-12-28 2019-12-28 Tensor decomposition-based multi-feature fusion 4D expression identification method Active CN111178255B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911384458.0A CN111178255B (en) 2019-12-28 2019-12-28 Tensor decomposition-based multi-feature fusion 4D expression identification method

Publications (2)

Publication Number Publication Date
CN111178255A CN111178255A (en) 2020-05-19
CN111178255B true CN111178255B (en) 2022-07-12

Family

ID=70658234


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113903063A (en) * 2021-09-27 2022-01-07 山东师范大学 Facial expression recognition method and system based on deep spatiotemporal network decision fusion

Citations (2)

Publication number Priority date Publication date Assignee Title
CN107679515A (en) * 2017-10-24 2018-02-09 西安交通大学 A kind of three-dimensional face identification method based on curved surface mediation shape image depth representing
CN110516557A (en) * 2019-08-01 2019-11-29 电子科技大学 Multisample facial expression recognizing method based on low-rank tensor resolution

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US8923392B2 (en) * 2011-09-09 2014-12-30 Adobe Systems Incorporated Methods and apparatus for face fitting and editing applications
US9341728B2 (en) * 2013-07-29 2016-05-17 Westerngeco L.L.C. Methods of analyzing seismic data


Non-Patent Citations (3)

Title
Automatic 3D Facial Expression Recognition using Geometric Scattering Representation; Xudong Yang et al.; 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition; 2015-07-23; Section 3 of the text *
Automatic 4D Facial Expression Recognition using Dynamic Geometrical Image Network; Weijian Li et al.; 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition; 2018-06-07; Sections 2-3 of the text *
Multi-pose and multi-expression face synthesis method under tensor description; Lv Xuan et al.; Journal of Computer Applications; 2012-01-01; Vol. 32, No. 1; full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant