CN111178255B - Tensor decomposition-based multi-feature fusion 4D expression identification method - Google Patents
- Publication number: CN111178255B
- Application number: CN201911384458.0A
- Authority: CN (China)
- Prior art keywords: face, expression, emo, normal vector, data
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
Abstract
A 4D expression recognition method based on tensor decomposition and multi-feature fusion first acquires 4D facial expression data; after preprocessing, it computes the three normal-vector components, the shape index and the depth map of the 4D facial expression data; it then applies tensor decomposition to each of these features of the 4D face data and extracts the dynamic facial expression information; finally, a dynamic image network classifies the dynamic expression information, and score fusion of the classification results yields the final result. The method makes full use of the information in the 4D face data: computing the three normal-vector components, the shape index and the depth map of the sequence face data exploits the 3D geometric information of the face, so the features are more representative and discriminative across different people, and the accuracy of face recognition and expression recognition is higher.
Description
Technical Field
The invention relates to an expression recognition method, in particular to a tensor decomposition-based multi-feature fusion 4D expression recognition method.
Background
With the development of artificial intelligence and computer technology, expression recognition and face recognition are attracting increasing attention, and their applications in daily life are gradually becoming widespread. Many facial expression recognition methods currently exist, for example extracting features from 2D pictures or videos with a deep neural network and then classifying them; expression classification using 3D face data is also available. In practice, 2D-based facial expression recognition is susceptible to illumination and scene changes. 3D-based expression recognition can overcome the influence of illumination and pose, but because different people express the same expression in different ways and to different degrees, even the same expression differs between individuals. For the expression recognition problem, the identity information of the person therefore acts as a disturbance.
Disclosure of Invention
The invention aims to provide a tensor decomposition-based multi-feature fusion 4D expression recognition method.
In order to achieve the purpose, the invention adopts the following technical scheme:
a tensor decomposition-based multi-feature fusion 4D expression recognition method comprises the following steps:
(1) acquiring 4D facial expression data;
(2) preprocessing the 4D facial expression data and then calculating to obtain three components of a normal vector, a shape index and a depth map of the 4D facial expression data;
(3) carrying out tensor decomposition on three components of a normal vector, a shape index and a depth map of the 4D face data respectively, and extracting dynamic face expression information;
(4) and classifying the dynamic facial expression information by using a dynamic image network, and performing score fusion on the classified results to obtain a final classified result.
A further improvement of the present invention is that in step (1), the 4D facial expression data is S = {F_1, F_2, …, F_l}, where F_i represents 3D facial expression data, i = 1…l, and l represents the number of frames of the 4D face.
The further improvement of the invention is that in the step (2), the specific process of preprocessing the 4D expression data of the human face is as follows: and denoising the 4D facial expression data.
The further improvement of the invention is that in the step (2), the specific process of calculating the three components of the normal vector of the 4D face data is as follows:
1) firstly, calculating the normal vector of a single 3D face; the specific process is as follows: for 3D face data, a point P_j on the face is first selected together with its nearest points to form a neighborhood δ = {P_i(x_i, y_i, z_i) | i = 1, 2, …, k}, with k = 5, and the plane to be fitted is:
Ax + By + Cz + D = 0
subject to A^2 + B^2 + C^2 = 1;
solving the plane-fitting problem by the least-squares method and the Lagrange multiplier method yields the normal vector at the point P_j on the face; estimating all points on the 3D face gives the normal vectors of the 3D face;
2) respectively projecting the normal vector of the 3D face to YZ, XZ and XY planes to obtain an X component diagram, a Y component diagram and a Z component diagram of the normal vector of the 3D face;
3) finally, performing step 1) and step 2) on each 3D face in the 4D face data to obtain a corresponding normal vector component image, and then overlapping corresponding normal vector X component images calculated by all 3D faces in the 4D face data together to obtain a normal vector X component image of the 4D face; overlapping the corresponding normal vector Y component images calculated by all 3D faces of the 4D face data to obtain a normal vector Y component image of the 4D face; and overlapping the normal vector Z component images calculated by all the 3D faces of the 4D face data together to obtain a normal vector Z component image of the 4D face.
The invention is further improved in that the concrete process for solving the plane fitting problem is as follows: the normal vector is the unit eigenvector corresponding to the minimum eigenvalue of the covariance matrix Σ;
the covariance matrix Σ is of the form
Σ = Σ_{i=1}^{k} (P_i - P̄)(P_i - P̄)^T
where P̄ is the centroid of the neighborhood δ.
a further improvement of the present invention is that the specific process of calculating the shape index map is as follows:
firstly, calculating a shape index graph of a 3D face:
for a certain point of the human face, the point and its surrounding area are assumed to form a discrete parametric surface whose parameters A, B, C, D, E, F and G are fitted from the vertex coordinates of the 3D face; the matrix (the Weingarten matrix) built from these parameters is then decomposed into its characteristic roots, the maximum characteristic root being the maximum principal curvature K_1 and the minimum characteristic root being the minimum principal curvature K_2; substituting the maximum and minimum principal curvatures at a vertex into the shape index (ShapeIndex) calculation formula
ShapeIndex = 1/2 - (1/π)·arctan((K_1 + K_2)/(K_1 - K_2))
gives the shape index at that vertex;
calculating the shape index ShapeIndex for each vertex of the 3D face yields the shape index map of the 3D face;
and stacking the shape index images of each 3D face of the 4D face to obtain the shape index image of the 4D face.
The further improvement of the invention is that the specific process of calculating the depth map of the 4D face is as follows:
first, the depth map of a 3D face is calculated; for a point P_j(x_j, y_j, z_j) of a 3D face F_i, the corresponding depth-map pixel value Dep_j is calculated as:
Dep_j = (z_j - z_min)/(z_max - z_min)
where z_max and z_min represent the maximum and minimum values of the Z coordinate of the points of the face F_i;
and then stacking the depth maps of all the 3D faces of the 4D face to obtain the depth map of the 4D face.
The further improvement of the present invention is that, in the step (3), the specific process of tensor decomposition of the depth map of the 4D face data is as follows:
1) establishing a model;
depth map Dep ∈ R^{H×W×L} of the 4D face, where H represents the height of the depth map, W the width of the depth map, and L the sequence length of the 4D face; assume the expression information is Emo ∈ R^{H×W×L} and the identity information is ID ∈ R^{H×W×L}; then the 4D facial expression-identity information separation model is established:
min_{Emo, ID, e} λ‖f‖_1 + ‖e‖_1  s.t.  Dep = ID + Emo + e,  f = DEmo    (1)
where λ represents a weight coefficient, e represents noise, and DEmo represents the modeling of the dynamic expression information;
‖DEmo‖_1 = ‖D_h Emo‖_1 + ‖D_v Emo‖_1 + ‖D_t Emo‖_1
D_h Emo = vec(Emo(i, j+1, k) - Emo(i, j, k))
D_v Emo = vec(Emo(i+1, j, k) - Emo(i, j, k))
D_t Emo = vec(Emo(i, j, k+1) - Emo(i, j, k))    (2)
where D_h is the difference operator in the horizontal direction, D_v the difference operator in the vertical direction, and D_t the difference operator in the time-domain direction;
the static person identity ID is modeled as follows:
ID = G ×_1 U_1 ×_2 U_2 ×_3 U_3    (3)
where G represents the core tensor of the Tucker decomposition and U_1, U_2, U_3 represent the factor matrices of each mode in the Tucker decomposition; ×_1, ×_2 and ×_3 respectively represent the product of the tensor with the matrix of each mode;
2) solving the model:
and solving the established 4D facial expression-identity information separation model through iterative optimization.
The invention is further improved in that the specific process of solving by iterative optimization is as follows:
the first step: updating the static identity information, whose estimate ÎD is obtained as the Tucker approximation
ÎD = G ×_1 U_1 ×_2 U_2 ×_3 U_3 ≈ Dep - Emo - e + λ_Dep/β_Dep
where λ_Dep is a Lagrange multiplier vector, β_Dep is a positive penalty parameter, ÎD is the estimated static identity information of the person, Dep is the depth map of the 4D face, Emo is the dynamic expression information of the 4D face data, and e is noise;
the second step: updating the noise e:
e = soft(Dep - ÎD - Emo + λ_Dep/β_Dep, 1/β_Dep)
where λ_Dep is a Lagrange multiplier vector, β_Dep is a positive penalty parameter, Dep is the depth map of the 4D face, and Emo is the dynamic expression information of the 4D face data;
the third step: updating the dynamic expression information Emo:
Emo = ifftn( fftn( β_Dep(Dep - ÎD - e) + λ_Dep + D*(β_f f + λ_f) ) / ( β_Dep + β_f Σ|fftn(D)|^2 ) )
where fftn and ifftn represent the fast 3D Fourier transform and its inverse, respectively, β_Dep and β_f are positive penalty parameters, λ_f is a Lagrange multiplier vector, |·|^2 is the element-wise squaring operation, and D* represents the adjoint (companion) matrix of D;
updating the tensor f:
f = soft(DEmo - λ_f/β_f, λ/β_f)
where λ is a weight coefficient, λ_f is the Lagrange multiplier vector, β_f is a positive penalty parameter, and soft is the function defined as soft(a, τ) := sgn(a)·max(|a| - τ, 0);
updating the Lagrange multiplier vectors λ_f and λ_Dep and the positive penalty parameters β_f and β_Dep of the 4D facial expression-identity information separation model:
λ_f ← λ_f + β_f(DEmo - f)
λ_Dep ← λ_Dep + β_Dep(Dep - ÎD - Emo - e)
β_f ← c_1·β_f and β_Dep ← c_1·β_Dep when the residual Res exceeds γ·Res_pre, otherwise β_f ← c_2·β_f and β_Dep ← c_2·β_Dep;
where Res_pre is the residual value of the last iteration, Dep is the depth map of the 4D face, Emo is the dynamic expression information of the 4D face data, e is noise, and γ is a parameter related to model convergence; c_1 and c_2 are both coefficients;
and sending the dynamic expression information Emo extracted in the third step into a dynamic image network to extract information of expression motions, and further realizing the classification of expressions.
A further improvement of the invention is that the bottom layer of the dynamic image network is a deep neural network, and a rank pooling layer is added before the fully connected layer of the network; the calculation process of the rank pooling layer is as follows:
the rank pooling layer ranks the frame features V_1, …, V_T, where a^{(m)} represents the output of the m-th layer of the dynamic image network and μ_t denotes a parameter to be learned by the network, V_1, …, V_T representing the features output by the dynamic image network; to facilitate network back-propagation, the following approximation is made:
d ≈ Σ_{t=1}^{T} α_t V_t
where α_t is a parameter to be learned by the network and V_t denotes the features of the upper network layer.
Compared with the prior art, the invention has the following beneficial effects:
(1) Using 4D face data for expression recognition and face recognition can overcome the defect that 2D face recognition is strongly affected by factors such as illumination and pose. With 4D data, stable results can be obtained in expression recognition and face recognition across different scenes and environments.
(2) The method makes full use of the information in the 4D face data: computing the three normal-vector components, the shape index and the depth map of the sequence face data exploits the 3D geometric information of the face, so the features are more representative and discriminative across different people, and the accuracy of face recognition and expression recognition is higher.
(3) The 4D face data are decomposed by a tensor decomposition method to obtain dynamic expression information and static face identity information. The dynamic expression information is used for expression recognition with the interference of person identity removed, so the expression recognition result is more stable and accurate.
Drawings
FIG. 1 is a detailed flow chart of the present invention.
Fig. 2 shows the three normal-vector components, the shape index and the depth map of the 4D face of the present invention.
Fig. 3 is a dynamic expression information graph extracted by tensor decomposition of three components of a normal vector, a shape index and a depth map of a 4D face according to the invention.
Fig. 4 is a network structure diagram of the present invention for performing expression recognition on dynamic expression information extracted from a shape index using a dynamic image network.
FIG. 5 is a diagram of the network architecture for multi-feature fusion expression recognition using a dynamic image network in accordance with the present invention.
Detailed Description
The present invention will be described in detail below with reference to examples.
Referring to fig. 1, the present invention comprises the steps of:
(1) acquiring 4D facial expression data;
(2) preprocessing the data, and calculating the three components of the normal vector, the shape index and the depth map of the 4D face data; these five feature maps deeply reflect the geometric shape characteristics of the face at each moment;
(3) carrying out tensor decomposition on three components of a normal vector of the 4D face data, the shape index and the depth map respectively, and extracting dynamic face expression information and static identity information;
(4) and classifying the dynamic facial expression information by using a dynamic image network, and performing score fusion on the classified results to obtain a final classified result.
Specifically, referring to fig. 1, the present invention comprises the following steps:
step 101:
and acquiring 4D expression data of the human face, wherein the 4D expression data refer to a series of 3D human face data video sequences. Some cameras used firstly, such as Intel RealSense SR300 and the like, can capture the depth information of the face, can easily obtain 3D facial expression data by means of a structured light model, and continuously acquire the 3D facial expression data to obtain 4D facial data.
Assume that 4D facial expression data is S ═ F1,F2,…FlIn which Fi(i ═ 1 … l) represents 3D facial expression data, and l represents the number of frames of a 4D face.
Step 102:
The 4D data are preprocessed: 4D face data obtained by a camera often contain noise, holes and the like, so the 4D faces need to be preprocessed before the corresponding normal-vector components, shape index and depth maps are computed. In particular:
step 2.1: the 4D face data are subjected to hole filling, which can be realized by a template-face hole-filling method, a common processing method for 3D and 4D data.
Step 103:
The three components of the normal vector of the 4D face data are calculated, as well as the shape index and the depth map. This step extracts the corresponding geometric features from the 4D face. As shown in fig. 2, an example of calculating the three normal-vector components, the shape index and the depth map from the public 4D expression database BU4D is given. In this example, 5 frames of faces are selected; from top to bottom the rows show the depth map, the normal vector X component map, the normal vector Y component map, the normal vector Z component map and the shape index map, and from left to right the images show the same expression of the same face at different times. Specifically, calculating the three components of the normal vector, the shape index and the depth map comprises the following 3 steps.
(1) Calculating the normal vector component of the 4D face:
1) Firstly, the normal vector of a single 3D face is calculated; for 3D face data such as a point cloud, the normal vector at a certain point is estimated by fitting a plane to that point and several points around it; generally, the plane is fitted using the point and its 5 nearest points. For example, for the normal vector of a point P_j on the face, P_j and its nearest points are first selected to form the neighborhood δ = {P_i(x_i, y_i, z_i) | i = 1, 2, …, k}, where k is taken as 5, to fit the plane:
Ax + By + Cz + D = 0
subject to A^2 + B^2 + C^2 = 1;
The plane fitting problem is solved by the least-squares method and the Lagrange multiplier method, finally yielding the normal vector at the point P_j on the face. The specific process of solving the plane fitting problem is as follows: the normal vector is the unit eigenvector corresponding to the minimum eigenvalue of the covariance matrix Σ;
the covariance matrix Σ is of the form:
Σ = Σ_{i=1}^{k} (P_i - P̄)(P_i - P̄)^T, with P̄ the centroid of the neighborhood δ.
Estimating all points on the 3D face to obtain a normal vector of the 3D face;
2) next, three components of the normal vector are calculated, specifically, after the normal vector of a certain 3D face is obtained, the normal vector is projected on YZ, XZ, and XY planes, and an X component map, a Y component map, and a Z component map of the normal vector of the 3D face are obtained in this order.
3) And finally, performing step 1) and step 2) on each 3D face in the 4D face data to obtain a corresponding normal vector component image, overlapping corresponding normal vector X component images calculated by all 3D faces of the 4D face data to obtain a normal vector X component image of the 4D face, overlapping corresponding normal vector Y component images calculated by all 3D faces of the 4D face data to obtain a normal vector Y component image of the 4D face, and overlapping corresponding normal vector Z component images calculated by all 3D faces of the 4D face data to obtain a normal vector Z component image of the 4D face. The resulting 4D normal component map is actually a video of normal components.
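The per-point plane fit described above has a standard closed-form solution: the normal is the eigenvector of the neighborhood covariance matrix with the smallest eigenvalue. A minimal NumPy sketch (the function name `plane_normal` and the sample neighborhood are illustrative, not from the patent):

```python
import numpy as np

def plane_normal(neighborhood):
    """Estimate the unit normal of the plane Ax+By+Cz+D=0 fitted in the
    least-squares sense to a k x 3 array of neighborhood points.
    The solution is the eigenvector of the covariance matrix belonging
    to its smallest eigenvalue."""
    pts = np.asarray(neighborhood, dtype=float)
    centered = pts - pts.mean(axis=0)        # subtract the centroid
    cov = centered.T @ centered              # 3x3 (unnormalized) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    n = eigvecs[:, 0]                        # smallest-eigenvalue eigenvector
    return n / np.linalg.norm(n)             # satisfies A^2 + B^2 + C^2 = 1

# a point and 5 neighbors lying in the plane z = 0; the normal is +-(0, 0, 1)
delta = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0],
                  [-1, 0, 0], [0, -1, 0], [1, 1, 0]])
n = plane_normal(delta)
```

Repeating this per vertex gives the 3D normal field whose X, Y and Z projections form the component maps described above.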
(2) The shape index of the 4D face measures the normalized combination of the two principal curvatures of the facial surface and can be regarded as a second-order differential property. Specifically, the steps of calculating the shape index of the 4D face are as follows. Firstly, the shape index map of a 3D face is calculated:
for a certain point of the human face, the point and its surrounding area are assumed to form a discrete parametric surface whose parameters A, B, C, D, E, F and G are fitted from the vertex coordinates of the 3D face. The matrix (the Weingarten matrix) built from these parameters is then decomposed into characteristic roots, the maximum characteristic root being the maximum principal curvature K_1 and the minimum characteristic root the minimum principal curvature K_2. The maximum and minimum principal curvatures at a vertex are substituted into the shape index (ShapeIndex) calculation formula:
ShapeIndex = 1/2 - (1/π)·arctan((K_1 + K_2)/(K_1 - K_2))
to obtain the shape index at that vertex. Calculating the shape index ShapeIndex for each vertex of the 3D face yields the shape index map of the 3D face;
and stacking the shape index images of each 3D face of the 4D face to obtain the shape index image of the 4D face.
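The shape index formula above reduces to a simple per-vertex computation once the principal curvatures are known. A minimal NumPy sketch, assuming K_1 and K_2 have already been obtained from the fitted surface (the function name `shape_index` is illustrative, not from the patent):

```python
import numpy as np

def shape_index(k1, k2):
    """Shape index from maximum (k1) and minimum (k2) principal curvatures:
    SI = 1/2 - (1/pi) * arctan((k1 + k2) / (k1 - k2)).
    arctan2 handles the k1 == k2 limit; values lie in [0, 1]."""
    return 0.5 - (1.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2)

# a perfect saddle (k1 = -k2) sits exactly at the middle of the scale
si_saddle = shape_index(1.0, -1.0)   # -> 0.5
si_ridge = shape_index(2.0, 1.0)
```

Applying this element-wise to the curvature maps of each frame and stacking the results gives the 4D shape index map described above.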
(3) The depth map of the 4D face is calculated; the gray value of each pixel of the depth map represents, for each point of the face, the distance from that point to the camera, and thus encodes the geometric shape of the face. The steps of calculating the depth map of the 4D face are as follows:
first, the depth map of a 3D face is calculated; for a point P_j(x_j, y_j, z_j) of a 3D face F_i, the corresponding depth-map pixel value Dep_j is calculated as:
Dep_j = (z_j - z_min)/(z_max - z_min)
where z_max and z_min represent the maximum and minimum values of the Z coordinate of the points of the face F_i.
And then stacking the depth maps of all the 3D faces of the 4D face to obtain the depth map of the 4D face.
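The per-face normalization above is a one-liner in practice. A minimal sketch, assuming per-face Z coordinates are available as an array (the helper name `depth_map` is illustrative):

```python
import numpy as np

def depth_map(z):
    """Normalize the Z coordinates of one 3D face into [0, 1]:
    Dep_j = (z_j - z_min) / (z_max - z_min)."""
    z = np.asarray(z, dtype=float)
    z_min, z_max = z.min(), z.max()
    return (z - z_min) / (z_max - z_min)

dep = depth_map([10.0, 12.0, 14.0])   # dep == [0.0, 0.5, 1.0]
```

Stacking one such map per frame along a third axis yields the H×W×L depth tensor Dep used by the separation model.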
Step 104:
Tensor decomposition is carried out on the normal-vector components, the shape index and the depth map obtained in step 103 respectively, to obtain static person identity information and dynamic expression change information. The depth map is taken as an example below; the tensor decomposition of the normal-vector components and the shape index proceeds similarly.
(1) A model is established. For the depth map Dep ∈ R^{H×W×L} of the 4D face, H denotes the height of the depth map, W the width of the depth map, and L the sequence length of the 4D face. For a 3D face sequence, the dynamic part is the expression and the static part is the identity information, so expression and identity can be considered independently distributed and can be separated. Suppose the expression information is Emo ∈ R^{H×W×L} and the identity information is ID ∈ R^{H×W×L}; the following 4D facial expression-identity information separation model can then be established:
min_{Emo, ID, e} λ‖f‖_1 + ‖e‖_1  s.t.  Dep = ID + Emo + e,  f = DEmo    (1)
where λ represents a weight coefficient that measures the proportion between f and e, e represents noise, and DEmo represents the modeling of the dynamic expression information:
‖DEmo‖_1 = ‖D_h Emo‖_1 + ‖D_v Emo‖_1 + ‖D_t Emo‖_1
D_h Emo = vec(Emo(i, j+1, k) - Emo(i, j, k))
D_v Emo = vec(Emo(i+1, j, k) - Emo(i, j, k))
D_t Emo = vec(Emo(i, j, k+1) - Emo(i, j, k))    (2)
where D_h is the difference operator in the horizontal direction, D_v the difference operator in the vertical direction, and D_t the difference operator in the time-domain direction.
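The three difference operators of equation (2) are plain forward differences on the H×W×L tensor. A minimal NumPy sketch (function names are illustrative; the anisotropic sum of absolute differences is the ‖DEmo‖_1 term):

```python
import numpy as np

def d_h(emo):
    """Horizontal forward difference: Emo(i, j+1, k) - Emo(i, j, k)."""
    return emo[:, 1:, :] - emo[:, :-1, :]

def d_v(emo):
    """Vertical forward difference: Emo(i+1, j, k) - Emo(i, j, k)."""
    return emo[1:, :, :] - emo[:-1, :, :]

def d_t(emo):
    """Temporal forward difference: Emo(i, j, k+1) - Emo(i, j, k)."""
    return emo[:, :, 1:] - emo[:, :, :-1]

def tv_norm(emo):
    """Anisotropic 3D total-variation term ||D Emo||_1 of equation (2)."""
    return sum(np.abs(d(emo)).sum() for d in (d_h, d_v, d_t))

emo = np.arange(8.0).reshape(2, 2, 2)   # tiny 2x2x2 example tensor
tv = tv_norm(emo)
```

Penalizing this term keeps the extracted expression component smooth except at genuine spatio-temporal changes.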
Formula (2) depicts the dynamic variation information of the facial expression. The static person identity ID is modeled as follows:
ID = G ×_1 U_1 ×_2 U_2 ×_3 U_3    (3)
Formula (3) is in fact a Tucker decomposition of the identity information of the 4D face: G represents the core tensor of the Tucker decomposition and U_1, U_2, U_3 represent the factor matrices of each mode. In formula (3), ×_1, ×_2 and ×_3 respectively denote the product of the tensor with the matrix of each mode. The modeling of formula (3) reflects the characteristic that the face identity information remains unchanged across different expressions.
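The Tucker model of formula (3) can be sketched with a truncated higher-order SVD (HOSVD), one standard way to compute a Tucker decomposition; the patent does not specify its exact algorithm, so this is an illustrative NumPy implementation:

```python
import numpy as np

def unfold(t, mode):
    """Mode-n unfolding of a 3-way tensor into a matrix."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def mode_mult(t, m, mode):
    """Mode-n product t x_n m: multiply matrix m into the given mode."""
    moved = np.moveaxis(t, mode, 0)
    res = (m @ moved.reshape(moved.shape[0], -1))
    return np.moveaxis(res.reshape((m.shape[0],) + moved.shape[1:]), 0, mode)

def tucker_hosvd(t, ranks):
    """Truncated HOSVD: t ≈ core x1 U1 x2 U2 x3 U3 as in formula (3)."""
    factors = [np.linalg.svd(unfold(t, mode), full_matrices=False)[0][:, :r]
               for mode, r in enumerate(ranks)]
    core = t
    for mode, u in enumerate(factors):
        core = mode_mult(core, u.T, mode)   # core = t x1 U1^T x2 U2^T x3 U3^T
    return core, factors

def tucker_reconstruct(core, factors):
    t = core
    for mode, u in enumerate(factors):
        t = mode_mult(t, u, mode)
    return t

# sanity check on a synthetic tensor of known multilinear rank (2, 2, 2)
rng = np.random.default_rng(0)
true_core = rng.standard_normal((2, 2, 2))
true_factors = [np.linalg.qr(rng.standard_normal((4, 2)))[0] for _ in range(3)]
t = tucker_reconstruct(true_core, true_factors)
core, factors = tucker_hosvd(t, (2, 2, 2))
```

Because the synthetic tensor has exact multilinear rank (2, 2, 2), the truncated HOSVD reconstructs it to machine precision, illustrating how a low-rank core captures the static identity component.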
(2) The model is solved: the 4D facial expression-identity information separation model established in formula (1) is solved by iterative optimization. For such multivariate optimization problems, the Alternating Direction Method of Multipliers (ADMM) is usually used. The parameters are first initialized and then iteratively updated as follows:
the first step: the static identity information is updated by the Tucker approximation
ÎD = G ×_1 U_1 ×_2 U_2 ×_3 U_3 ≈ Dep - Emo - e + λ_Dep/β_Dep
where λ_Dep is the Lagrange multiplier vector, β_Dep is a positive penalty parameter, ÎD is the estimated static identity information of the person, Dep is the depth map of the 4D face, Emo is the dynamic expression information of the 4D face data, and e is noise;
the second step: the noise e is updated:
e = soft(Dep - ÎD - Emo + λ_Dep/β_Dep, 1/β_Dep)
where λ_Dep is the Lagrange multiplier vector, β_Dep is the positive penalty parameter, Dep is the depth map of the 4D face, and Emo is the dynamic expression information of the 4D face data.
the third step: the dynamic expression information Emo is updated:
Emo = ifftn( fftn( β_Dep(Dep - ÎD - e) + λ_Dep + D*(β_f f + λ_f) ) / ( β_Dep + β_f Σ|fftn(D)|^2 ) )
where fftn and ifftn respectively represent the fast 3D Fourier transform and its inverse, β_Dep and β_f are positive penalty parameters, λ_f is a Lagrange multiplier vector, |·|^2 is the element-wise squaring operation, D* represents the adjoint (companion) matrix of D, D_h, D_v, D_t represent the difference operators in the horizontal, vertical and temporal directions respectively, and f is defined as f = DEmo.
the tensor f is updated:
f = soft(DEmo - λ_f/β_f, λ/β_f)
where λ is the weight coefficient, λ_f is the Lagrange multiplier vector, β_f is a positive penalty parameter, and soft is the function defined as soft(a, τ) := sgn(a)·max(|a| - τ, 0).
The Lagrange multiplier vectors λ_f and λ_Dep and the positive penalty parameters β_f and β_Dep of the 4D facial expression-identity information separation model are updated:
λ_f ← λ_f + β_f(DEmo - f)
λ_Dep ← λ_Dep + β_Dep(Dep - ÎD - Emo - e)
β_f ← c_1·β_f and β_Dep ← c_1·β_Dep when the residual Res exceeds γ·Res_pre, otherwise β_f ← c_2·β_f and β_Dep ← c_2·β_Dep;
where Res_pre is the residual value of the last iteration, Dep is the depth map of the 4D face, Emo is the dynamic expression information of the 4D face data, e is noise, and γ is a parameter related to model convergence; c_1 and c_2 are both coefficients, taken as 1.15 and 0.95 respectively.
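The soft-thresholding operator used in the noise and f updates above has the closed form stated in the text. A minimal sketch (β_f and the input vector v are illustrative values, not parameters from the patent):

```python
import numpy as np

def soft(a, tau):
    """soft(a, tau) := sgn(a) * max(|a| - tau, 0), applied element-wise.
    This is the proximal operator of the l1 norm, used in the ADMM updates."""
    return np.sign(a) * np.maximum(np.abs(a) - tau, 0.0)

# e.g. shrinking a gradient field toward zero with threshold 1/beta_f:
beta_f = 2.0
v = np.array([-1.0, 0.2, 3.0])
f = soft(v, 1.0 / beta_f)   # -> [-0.5, 0.0, 2.5]
```

Small entries are zeroed and large ones shrunk by the threshold, which is what keeps the extracted expression component sparse in its spatio-temporal differences.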
As shown in fig. 3, the dynamic expression information maps extracted by tensor decomposition from the three normal-vector components, the shape index and the depth map of the 4D face are displayed. From top to bottom the rows are: the depth map, the normal vector X component map, the normal vector Y component map, the normal vector Z component map and the shape index map; from left to right the images show the dynamic expression information of the same expression of the same face at different times, extracted by tensor decomposition.
Step 105:
The dynamic expression information Emo extracted in the third step is sent into a dynamic image network to extract the information of expression motion and thereby classify the expressions. Expressions can generally be divided into six categories: happiness, anger, sadness, surprise, disgust and fear. A dynamic image network is a network that extracts dynamic images; the bottom layer of the network is a general deep neural network, such as the VGGNet16 network, with a rank pooling layer added before the fully connected layer. The rank pooling layer turns the features of a video sequence into a single image that encodes the dynamic characteristics of each frame of the sequence. As shown in fig. 4, the network structure diagram of the dynamic image network, the calculation flow of the rank pooling layer is as follows:
the rank pooling layer ranks the frame features V_1, …, V_T, where a^{(m)} represents the output of the m-th layer of the dynamic image network and μ_t represents a parameter to be learned by the network, V_1, …, V_T being the features output by the dynamic image network. To facilitate network back-propagation, the following approximation is made:
d ≈ Σ_{t=1}^{T} α_t V_t
where α_t is a parameter to be learned by the network and V_t denotes the features of the upper network layer.
Step 106:
Score fusion is performed on the classification results of the different feature data obtained in step 105, finally yielding the expression recognition result of the model, which is output. Fig. 5 shows the network structure for multi-feature fusion expression recognition, in which a dynamic image network processes the dynamic expression information extracted from the normal-vector components, the shape index and the depth map.
The invention operates on geometric feature images of 4D face data, namely: the three components of the normal vector, the shape index and the depth map. Tensor decomposition separates the dynamic expression information from the static person identity information in each geometric feature image; expression recognition is performed separately on the extracted dynamic expression information of the 4D face data, and score fusion of the expression recognition results of the different geometric feature images yields the final expression recognition result.
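Score fusion of the per-feature classifier outputs can be sketched as a weighted average of softmax scores (equal weights and this label ordering are assumptions; the invention names score fusion without fixing the rule):

```python
import numpy as np

# Six basic expression categories (ordering assumed for illustration)
LABELS = ["happy", "angry", "sad", "surprise", "disgust", "fear"]

def fuse_scores(score_list, weights=None):
    """Average per-feature softmax scores and pick the top class.

    Equal weights by default; the patent only names 'score fusion',
    so the averaging rule here is illustrative.
    """
    scores = np.asarray(score_list, dtype=np.float64)   # (n_features, 6)
    if weights is None:
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    fused = np.average(scores, axis=0, weights=weights)
    return LABELS[int(np.argmax(fused))], fused

# Hypothetical scores from e.g. normal-vector, shape-index and depth branches
branch_scores = [
    [0.5, 0.1, 0.1, 0.1, 0.1, 0.1],
    [0.3, 0.2, 0.1, 0.2, 0.1, 0.1],
    [0.6, 0.1, 0.1, 0.1, 0.05, 0.05],
]
label, fused = fuse_scores(branch_scores)
```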
Claims (9)
1. A tensor decomposition-based multi-feature fusion 4D expression recognition method is characterized by comprising the following steps:
(1) acquiring 4D facial expression data;
(2) preprocessing the 4D facial expression data and then calculating to obtain three components of a normal vector, a shape index and a depth map of the 4D facial expression data;
(3) carrying out tensor decomposition on three components of a normal vector, a shape index and a depth map of the 4D face data respectively, and extracting dynamic face expression information; the specific process of tensor decomposition of the depth map of the 4D face data is as follows:
1) establishing a model;
for the depth map Dep ∈ R^{H×W×L} of a 4D face, where H represents the height of the depth map, W represents the width of the depth map, and L represents the sequence length of the 4D face, assume the expression information is Emo ∈ R^{H×W×L} and the identity information is ID ∈ R^{H×W×L}, and then establish the 4D facial expression-identity information separation model:
min_{Emo, ID, e} ‖DEmo‖₁ + λ‖e‖²  s.t.  Dep = ID + Emo + e (1)

where λ represents a weight coefficient, e represents noise, DEmo represents the modeling of the dynamic expression information, and f = DEmo is an auxiliary variable introduced for the later iterative solution;
‖DEmo‖₁ = ‖D_h Emo‖₁ + ‖D_v Emo‖₁ + ‖D_t Emo‖₁

D_h Emo = vec(Emo(i, j+1, k) − Emo(i, j, k))

D_v Emo = vec(Emo(i+1, j, k) − Emo(i, j, k))

D_t Emo = vec(Emo(i, j, k+1) − Emo(i, j, k)) (2)
where D_h represents the difference operator in the horizontal direction, D_v represents the difference operator in the vertical direction, and D_t represents the difference operator in the time-domain direction;
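Under stated assumptions (forward differences with boundary entries simply dropped; the patent does not specify boundary handling), the operators D_h, D_v, D_t and the norm ‖DEmo‖₁ of Eq. (2) can be sketched as:

```python
import numpy as np

def difference_operators(Emo):
    """Forward differences of a tensor Emo of shape (H, W, L) along the
    horizontal (j), vertical (i) and temporal (k) axes, vectorised as in
    Eq. (2). Boundary rows/columns/frames are dropped (an assumption)."""
    Dh = (Emo[:, 1:, :] - Emo[:, :-1, :]).ravel()  # Emo(i, j+1, k) - Emo(i, j, k)
    Dv = (Emo[1:, :, :] - Emo[:-1, :, :]).ravel()  # Emo(i+1, j, k) - Emo(i, j, k)
    Dt = (Emo[:, :, 1:] - Emo[:, :, :-1]).ravel()  # Emo(i, j, k+1) - Emo(i, j, k)
    return Dh, Dv, Dt

def total_variation_l1(Emo):
    """‖DEmo‖₁ = ‖D_h Emo‖₁ + ‖D_v Emo‖₁ + ‖D_t Emo‖₁."""
    return sum(np.abs(d).sum() for d in difference_operators(Emo))

# A single impulse produces two horizontal, two vertical and one temporal jump
Emo = np.zeros((3, 3, 2))
Emo[1, 1, 1] = 1.0
tv = total_variation_l1(Emo)
```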
the static person identity ID is modeled with a Tucker decomposition as follows:

ID = G ×₁ U₁ ×₂ U₂ ×₃ U₃

where G represents the core tensor of the Tucker decomposition, U₁, U₂, U₃ represent the matrices of each mode in the Tucker decomposition, and ×₁, ×₂ and ×₃ respectively represent the product of the tensor with the matrix of each mode;
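A minimal sketch of this Tucker-form identity model via mode-n products, in plain NumPy (a library such as TensorLy offers the same operation; the shapes below are illustrative):

```python
import numpy as np

def mode_n_product(T, M, n):
    """Mode-n product T ×ₙ M: contract matrix M into mode n of tensor T."""
    # tensordot puts M's free axis first; moveaxis restores it to position n
    return np.moveaxis(np.tensordot(M, T, axes=(1, n)), 0, n)

def tucker_reconstruct(G, U1, U2, U3):
    """ID = G ×₁ U₁ ×₂ U₂ ×₃ U₃ (Tucker form of the static identity)."""
    ID = mode_n_product(G, U1, 0)
    ID = mode_n_product(ID, U2, 1)
    ID = mode_n_product(ID, U3, 2)
    return ID

rng = np.random.default_rng(0)
G = rng.random((2, 2, 2))                       # core tensor
U1, U2, U3 = rng.random((4, 2)), rng.random((5, 2)), rng.random((3, 2))
ID = tucker_reconstruct(G, U1, U2, U3)          # shape (4, 5, 3)
```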
2) solving the model:
solving the established 4D facial expression-identity information separation model through iterative optimization;
(4) classifying the dynamic facial expression information by using a dynamic image network, and performing score fusion on the classification results to obtain the final classification result.
2. The tensor decomposition-based multi-feature fusion 4D expression recognition method as recited in claim 1, wherein in the step (1), the 4D facial expression data S = {F₁, F₂, …, F_l}, where F_i represents 3D facial expression data, i = 1 … l, and l represents the number of frames of the 4D face.
3. The tensor decomposition-based multi-feature fusion 4D expression recognition method as recited in claim 1, wherein in the step (2), the specific process of preprocessing the 4D expression data of the human face is as follows: and denoising the 4D facial expression data.
4. The tensor decomposition-based multi-feature fusion 4D expression recognition method as recited in claim 1, wherein in the step (2), the specific process of calculating three components of the normal vector of the 4D face data is as follows:
1) firstly, calculating the normal vector of a single 3D face; the specific process is as follows: for the 3D face data, first select a point P_j on the face and form its neighborhood δ = {P_i(x_i, y_i, z_i) | i = 1, 2, …, k}, with k = 5; the plane to be fitted is:
Ax+By+Cz+D=0
which satisfies A² + B² + C² = 1;

by the least-squares and Lagrange multiplier methods, the plane fitting problem is solved to obtain the normal vector at the point P_j on the face; estimating at all points on the 3D face yields the normal vector of the 3D face;
2) respectively projecting the normal vector of the 3D face to YZ, XZ and XY planes to obtain an X component diagram, a Y component diagram and a Z component diagram of the normal vector of the 3D face;
3) finally, performing step 1) and step 2) on each 3D face in the 4D face data to obtain a corresponding normal vector component image, and overlapping corresponding normal vector X component images calculated by all 3D faces in the 4D face data to obtain a normal vector X component image of the 4D face; overlapping the corresponding normal vector Y component images calculated by all 3D faces of the 4D face data to obtain a normal vector Y component image of the 4D face; and overlapping the normal vector Z component images calculated by all the 3D faces of the 4D face data together to obtain a normal vector Z component image of the 4D face.
5. The tensor decomposition-based multi-feature fusion 4D expression recognition method as claimed in claim 4, wherein the specific process of solving the plane fitting problem is as follows: the normal vector is the normalized eigenvector corresponding to the minimum eigenvalue of the covariance matrix Σ;
the covariance matrix Σ has the form Σ = Σ_{i=1}^{k} (P_i − P̄)(P_i − P̄)^T, where P̄ is the centroid of the neighborhood δ.
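The eigenvector route of claim 5 can be sketched as follows, assuming the covariance is taken over the centered neighborhood points:

```python
import numpy as np

def point_normal(neighborhood):
    """Unit normal at a face point from its k-neighborhood (k = 5 in the
    claim): the normalized eigenvector of the covariance matrix Σ belonging
    to the smallest eigenvalue solves the constrained least-squares plane fit.
    """
    P = np.asarray(neighborhood, dtype=np.float64)   # (k, 3) points
    centered = P - P.mean(axis=0)
    Sigma = centered.T @ centered                    # 3x3 covariance (unnormalised)
    w, V = np.linalg.eigh(Sigma)                     # eigenvalues in ascending order
    n = V[:, 0]                                      # eigenvector of the smallest one
    return n / np.linalg.norm(n)

# Points lying on the plane z = 0 -> normal along ±Z
pts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (0.5, 0.5, 0)]
n = point_normal(pts)
```

Projecting such normals onto the YZ, XZ and XY planes then gives the X, Y and Z component maps described in claim 4.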
6. the tensor decomposition-based multi-feature fusion 4D expression recognition method as recited in claim 1, wherein the specific process of calculating the shape index map is as follows:
firstly, calculating a shape index graph of a 3D face:
for a certain point of the face, assume that the point and its surrounding area form a discrete parametric surface; the parameters A, B, C, D, E, F and G are fitted according to the vertex coordinates of the 3D face, and a matrix is then obtained; eigen-decomposition is performed on this matrix, whose maximum eigenvalue is the maximum principal curvature K₁ and whose minimum eigenvalue is the minimum principal curvature K₂; substituting the maximum and minimum principal curvatures at a vertex into the shape index (ShapeIndex) calculation formula gives the shape index at that vertex;
calculating a shape index Shapeindex for each vertex of the 3D face to obtain a shape index graph of the 3D face;
and stacking the shape index images of each 3D face of the 4D face to obtain the shape index image of the 4D face.
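A sketch of the per-vertex shape index, assuming the standard Koenderink-style [0, 1] convention SI = 1/2 − (1/π)·arctan((K₁+K₂)/(K₁−K₂)) (an assumption; the claim does not reproduce the formula):

```python
import numpy as np

def shape_index(k1, k2):
    """Shape index from principal curvatures, mapped to [0, 1]:
    SI = 1/2 - (1/pi) * arctan((k1 + k2) / (k1 - k2)).
    The exact convention is an assumption; several variants exist."""
    k1, k2 = np.maximum(k1, k2), np.minimum(k1, k2)  # enforce k1 >= k2
    # arctan2 handles the umbilic case k1 == k2 without dividing by zero
    return 0.5 - (1.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2)

si_saddle = shape_index(1.0, -1.0)   # symmetric saddle -> 0.5
si_convex = shape_index(1.0, 1.0)    # one umbilic extreme
si_concave = shape_index(-1.0, -1.0) # the opposite umbilic extreme
```

Evaluating this at every vertex of a 3D face yields its shape index map; stacking per-frame maps gives the 4D shape index map of the claim.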
7. The tensor decomposition-based multi-feature fusion 4D expression recognition method as recited in claim 1, wherein the specific process of calculating the depth map of the 4D face is as follows:
first, a depth map of a 3D face is calculated: for a point P_j(x_j, y_j, z_j) of a 3D face F_i, the pixel value Dep_j of its corresponding depth map is calculated as:

Dep_j = (z_j − z_min) / (z_max − z_min)
where z_max and z_min represent the maximum and minimum values of the Z coordinates of the points of the face F_i;
and then stacking the depth maps of all the 3D faces of the 4D face to obtain the depth map of the 4D face.
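A minimal sketch of the depth-map rendering, assuming linear min-max normalization of the Z coordinate and nearest-pixel scattering by (x, y) position (both assumptions; the claim's exact formula and rasterization are not reproduced):

```python
import numpy as np

def depth_map(vertices, H=64, W=64):
    """Render a 3D face's z-values into an H×W depth image.

    Each occupied pixel holds (z - z_min) / (z_max - z_min); vertices are
    scattered to their nearest pixel (an illustrative rasterization).
    """
    V = np.asarray(vertices, dtype=np.float64)       # (N, 3) points
    z_min, z_max = V[:, 2].min(), V[:, 2].max()
    dep = (V[:, 2] - z_min) / (z_max - z_min)        # normalized depth per vertex
    # map (x, y) coordinates onto the pixel grid
    x = np.clip(((V[:, 0] - V[:, 0].min()) / np.ptp(V[:, 0]) * (W - 1)).astype(int), 0, W - 1)
    y = np.clip(((V[:, 1] - V[:, 1].min()) / np.ptp(V[:, 1]) * (H - 1)).astype(int), 0, H - 1)
    img = np.zeros((H, W))
    img[y, x] = dep
    return img

verts = [(0, 0, 0.0), (1, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0)]
img = depth_map(verts, H=2, W=2)
```

Stacking one such image per frame then produces the 4D depth map of the claim.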
8. The tensor decomposition-based multi-feature fusion 4D expression recognition method as recited in claim 1, wherein the specific process of solving through iterative optimization is as follows:
the first step: updating the static identity information;

where λ_Dep is a Lagrange multiplier vector, β_Dep is a positive penalty parameter, the hatted quantity is the estimated static identity information of the person, Dep is the depth map of the 4D face, Emo is the dynamic expression information of the 4D face data, and e is noise;
the second step: updating the noise e;
where the hatted quantity represents the corresponding unfolded (vectorized) form, λ_Dep is a Lagrange multiplier vector, β_Dep is a positive penalty parameter, Dep is the depth map of the 4D face, and Emo is the dynamic expression information of the 4D face data;
the third step: updating the dynamic expression information Emo;
where fftn and ifftn represent the fast 3D Fourier transform and its inverse, respectively, β_Dep and β_f are positive penalty parameters, λ_f is a Lagrange multiplier vector, |·|² is the element-wise squaring operation, and D* denotes the adjoint matrix of D;
update tensor f:
where λ is the weight coefficient, λ_f is the Lagrange multiplier vector, β_f is a positive penalty parameter, and soft is the function defined as: soft(a, τ) = sgn(a) · max(|a| − τ, 0);
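The soft function defined above is the standard element-wise soft-thresholding operator (the proximal operator of the ℓ1 norm used in the f-update); a one-line sketch:

```python
import numpy as np

def soft(a, tau):
    """Element-wise soft-thresholding: soft(a, τ) = sgn(a) · max(|a| − τ, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - tau, 0.0)

x = np.array([-2.0, -0.3, 0.0, 0.5, 3.0])
y = soft(x, 1.0)   # shrinks toward zero, zeroing entries with |a| <= τ
```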
the Lagrange multiplier vector λ_f and the positive penalty parameters β_f and β_Dep of the 4D facial expression-identity information separation model are updated:
where the superscript "pre" denotes the value from the last iteration, Dep is the depth map of the 4D face, Emo is the dynamic expression information of the 4D face data, e is noise, λ is a parameter related to model convergence, and c₁, c₂ are coefficients;
and sending the dynamic expression information Emo extracted in the third step into a dynamic image network to extract information of expression motions, and further realizing the classification of expressions.
9. The tensor decomposition-based multi-feature fusion 4D expression recognition method as claimed in claim 8, wherein the bottom layer of the dynamic image network is a deep neural network, a rank pooling layer is added before the network's fully connected layer, and the rank pooling layer of the network is calculated as follows:
where a^(m) represents the output of the m-th layer of the dynamic image network, μ_t represents a parameter to be learned by the network, and V_1, …, V_T represent the features output by the dynamic image network; the following approximation is made to facilitate network back-propagation:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911384458.0A CN111178255B (en) | 2019-12-28 | 2019-12-28 | Tensor decomposition-based multi-feature fusion 4D expression identification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111178255A CN111178255A (en) | 2020-05-19 |
CN111178255B true CN111178255B (en) | 2022-07-12 |
Family
ID=70658234
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911384458.0A Active CN111178255B (en) | 2019-12-28 | 2019-12-28 | Tensor decomposition-based multi-feature fusion 4D expression identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111178255B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113903063A (en) * | 2021-09-27 | 2022-01-07 | 山东师范大学 | Facial expression recognition method and system based on deep spatiotemporal network decision fusion |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107679515A (en) * | 2017-10-24 | 2018-02-09 | 西安交通大学 | A kind of three-dimensional face identification method based on curved surface mediation shape image depth representing |
CN110516557A (en) * | 2019-08-01 | 2019-11-29 | 电子科技大学 | Multisample facial expression recognizing method based on low-rank tensor resolution |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8923392B2 (en) * | 2011-09-09 | 2014-12-30 | Adobe Systems Incorporated | Methods and apparatus for face fitting and editing applications |
US9341728B2 (en) * | 2013-07-29 | 2016-05-17 | Westerngeco L.L.C. | Methods of analyzing seismic data |
Non-Patent Citations (3)
Title |
---|
Automatic 3D Facial Expression Recognition using Geometric Scattering Representation; Xudong Yang et al.; 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition; 2015-07-23; Section 3 of the text *
Automatic 4D Facial Expression Recognition using Dynamic Geometrical Image Network; Weijian Li et al.; 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition; 2018-06-07; Sections 2-3 of the text *
Multi-pose and multi-expression face synthesis method based on tensor description; Lü Xuan et al.; Journal of Computer Applications; 2012-01-01; Vol. 32, No. 1; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112887698B (en) | High-quality face voice driving method based on neural radiance field | |
US7876931B2 (en) | Face recognition system and method | |
CN101916454B (en) | Method for reconstructing high-resolution human face based on grid deformation and continuous optimization | |
JP6207210B2 (en) | Information processing apparatus and method | |
CN112766160A (en) | Face replacement method based on multi-stage attribute encoder and attention mechanism | |
CN110728209A (en) | Gesture recognition method and device, electronic equipment and storage medium | |
CN112800903A (en) | Dynamic expression recognition method and system based on space-time diagram convolutional neural network | |
WO2022184133A1 (en) | Vision-based facial expression recognition method | |
CN110796593A (en) | Image processing method, device, medium and electronic equipment based on artificial intelligence | |
CN111754637B (en) | Large-scale three-dimensional face synthesis system with suppressed sample similarity | |
CN109325994B (en) | Method for enhancing data based on three-dimensional face | |
CN111028319A (en) | Three-dimensional non-photorealistic expression generation method based on facial motion unit | |
CN111754622B (en) | Face three-dimensional image generation method and related equipment | |
CN111640172A (en) | Attitude migration method based on generation of countermeasure network | |
CN109522865A (en) | A kind of characteristic weighing fusion face identification method based on deep neural network | |
CN111178255B (en) | Tensor decomposition-based multi-feature fusion 4D expression identification method | |
KR20230081378A (en) | Multi-view semi-supervised learning for 3D human pose estimation | |
Zeng et al. | Video‐driven state‐aware facial animation | |
Fan et al. | Full face-and-head 3D model with photorealistic texture | |
CN111739168B (en) | Large-scale three-dimensional face synthesis method with suppressed sample similarity | |
CN113468923B (en) | Human-object interaction behavior detection method based on fine-grained multi-modal common representation | |
Tang et al. | Global alignment for dynamic 3d morphable model construction | |
CN107423665A (en) | Three-dimensional face analysis method and its analysis system based on BP neural network | |
CN113242419A (en) | 2D-to-3D method and system based on static building | |
Ramnath et al. | Increasing the density of active appearance models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||