CN112036281A - Facial expression recognition method based on improved capsule network

Facial expression recognition method based on improved capsule network

Info

Publication number
CN112036281A
Authority
CN
China
Prior art keywords
capsule
layer
expression
network
picture
Prior art date
Legal status
Granted
Application number
CN202010860025.4A
Other languages
Chinese (zh)
Other versions
CN112036281B (en)
Inventor
张会焱
敖文刚
刘宗敏
Current Assignee
Chongqing Technology and Business University
Original Assignee
Chongqing Technology and Business University
Priority date
Filing date
Publication date
Application filed by Chongqing Technology and Business University filed Critical Chongqing Technology and Business University
Publication of CN112036281A
Application granted
Publication of CN112036281B
Active (legal status)
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention provides a facial expression recognition method based on an improved capsule network, which comprises the following steps: inputting sample pictures into the improved capsule network for training; and inputting a live-action picture into the improved capsule network for recognition, and extracting the facial expression in the live-action picture. Training the improved capsule network on the sample pictures specifically comprises: S1, extracting a face region from a picture through a multi-task convolutional neural network; S2, labelling the extracted face region to obtain the expression and the head pose of the face region; S3, inputting the expression and the head pose of the face region into a generative adversarial network, which generates a face region with the expression; and S4, inputting the face region with the expression into the improved capsule network to train it. The method can accurately recognize facial expressions under different poses without considering the pose of the subject, so that recognition accuracy is ensured and recognition efficiency is effectively improved.

Description

Facial expression recognition method based on improved capsule network
Technical Field
The invention relates to a facial expression recognition method, in particular to a facial expression recognition method based on an improved capsule network.
Background
Facial expression recognition is widely used in modern production and daily life. Existing facial expression recognition methods mainly rely on deep convolutional neural network frameworks. Although deep neural networks can recognize faces to a certain extent, they struggle to recognize facial expressions under different poses, and the face angle must be adjusted during recognition, which reduces recognition efficiency. On the other hand, facial expression recognition depends on the individual features of organs such as the eyes, nose and mouth; existing methods cannot capture the relative positions between these organs, so recognition accuracy is low.
Therefore, a technical means for solving the above problems is needed.
Disclosure of Invention
In view of the above, the present invention aims to provide a facial expression recognition method based on an improved capsule network that can accurately recognize facial expressions under different poses without considering the pose of the subject, thereby ensuring recognition accuracy while effectively improving recognition efficiency.
The invention provides a facial expression recognition method based on an improved capsule network, which comprises the following steps:
inputting sample pictures into the improved capsule network for training;
inputting a live-action picture into the improved capsule network for recognition, and extracting the facial expression in the live-action picture;
wherein training the improved capsule network on the sample pictures specifically comprises the following steps:
S1, extracting a face region from a picture through a multi-task convolutional neural network;
S2, labelling the extracted face region to obtain the expression and the head pose of the face region;
S3, inputting the expression and the head pose of the face region into a generative adversarial network, the generative adversarial network generating a face region with the expression;
and S4, inputting the face region with the expression into the improved capsule network to train the improved capsule network.
Further, in step S3, generating the face region with the expression by the generative adversarial network specifically comprises:
the generative adversarial network comprises an encoder and a decoder;
inputting the expression and the head pose of the face region into the encoder for processing, the encoder outputting the face picture features, the expression and the pose;
inputting the face picture features, the expression and the pose into the decoder for processing, the decoder outputting the face picture with the expression;
constructing the objective function of the generative adversarial network:

$$\min_G\max_D\;\mathbb{E}_{x,y\sim p_d(x,y)}\left[\log D(x,y)\right]+\mathbb{E}_{x,y\sim p_d(x,y)}\left[\log\left(1-D(G(x,y),y)\right)\right]\qquad(1)$$

wherein x is the face region, y denotes the expression label and the pose label, D(x, y) is the output of the discriminator, which is either true or false; G(x, y) is the output of the generator, namely a generated face picture; D(G(x, y), y) is the result of the discriminator D judging the face picture generated by the generator G; p_d(x, y) is the joint probability of x and y; and E_{x,y~p_d(x,y)} denotes the expectation with respect to p_d(x, y);
and judging the expressive face picture output by the decoder with the objective function of the generative adversarial network, and outputting the expressive face picture whose judgment result is real.
Further, step S4 specifically comprises:
the improved capsule network has a relu convolution layer, an initial capsule layer prim_cap, a first convolutional capsule layer conv_cap1, a second convolutional capsule layer conv_cap2 and a classification capsule layer class_cap;
inputting the expressive face picture generated by the generative adversarial network into the relu convolution layer for processing, and outputting the local features of the face picture;
the initial capsule layer prim_cap processes the local features of the face picture output by the relu convolution layer and outputs 32 capsules;
the first convolutional capsule layer conv_cap1 processes the 32 capsules output by the initial capsule layer prim_cap and outputs 32 capsules;
the second convolutional capsule layer conv_cap2 processes the 32 capsules output by the first convolutional capsule layer conv_cap1 and outputs 32 capsules;
the classification capsule layer class_cap processes the 32 capsules output by the second convolutional capsule layer conv_cap2 and outputs 7 capsules, the 7 capsules corresponding to the 7 facial expression categories.
Further, the 32 capsules output by the initial capsule layer prim_cap are passed in turn through the first convolutional capsule layer conv_cap1, the second convolutional capsule layer conv_cap2 and the classification capsule layer class_cap by means of T-EM routing, which specifically comprises the following steps:

determining the voting matrix V_ij from a lower-layer capsule i to a higher-layer capsule j:

$$V_{ij}=P_i\cdot W_{ij}$$

wherein P_i is the pose matrix of the lower-layer capsule i, and W_ij is the viewpoint-invariant matrix from the lower-layer capsule i to the higher-layer capsule j; the k-th element of the voting matrix V_ij is denoted V_ij^k;

whether the element V_ij^k belongs to the higher-layer capsule j is determined by the T distribution:

$$p_j^k\left(V_{ij}^k\right)=\frac{\Gamma\left(\frac{\nu_j^k+1}{2}\right)}{\Gamma\left(\frac{\nu_j^k}{2}\right)\sqrt{\pi\nu_j^k}\,\sigma_j^k}\left(1+\frac{\left(d_{ij}^k\right)^2}{\nu_j^k}\right)^{-\frac{\nu_j^k+1}{2}}\qquad(2)$$

wherein Γ(·) is the gamma function, d_ij^k is the Mahalanobis distance from the element V_ij^k to the mean μ_j^k, μ_j^k is the expectation of the T distribution, ν_j^k is the degree of freedom of the T distribution, (σ_j^k)^2 is the variance of the T distribution, and π is the circumference ratio; wherein

$$d_{ij}^k=\frac{\left|V_{ij}^k-\mu_j^k\right|}{\sigma_j^k}$$

the loss function C for classifying the I lower-layer capsules into the J higher-layer capsules is:

$$C=-\sum_{i=1}^{I}\sum_{j=1}^{J}R_{ij}\,\ln p_j\left(V_{ij}\right)\qquad(3)$$

wherein R_ij is the weight with which the i-th lower-layer capsule belongs to the j-th higher-layer capsule;

the pose matrix P_j and the activation matrix a_j of a higher-layer capsule are obtained from the pose matrices P_i and the activation matrices a_i of the lower-layer capsules by minimizing formula (3) through the T-EM routing process, which specifically comprises:

initializing the parameters:

$$R_{ij}=\frac{1}{J}$$

wherein J is the number of higher-layer capsules, and the parameters of the T distributions are given initial values;

M step:

$$R_{ij}=R_{ij}\times a_i,\quad i=1,\dots,I;$$

$$\mu_j^k=\frac{\sum_i R_{ij}V_{ij}^k}{\sum_i R_{ij}};\qquad\left(\sigma_j^k\right)^2=\frac{\sum_i R_{ij}\left(V_{ij}^k-\mu_j^k\right)^2}{\sum_i R_{ij}};$$

$$cost^k=\left(\beta_v+\ln\sigma_j^k\right)\sum_i R_{ij};\qquad a_j=\mathrm{logistic}\left(\lambda\left(\beta_a-\sum_k cost^k\right)\right)$$

wherein β_a and β_v are trainable variables, and λ is a temperature coefficient with the value 0.01; the degree of freedom ν_j^k of the T distribution is updated by solving an equation that involves the degree of freedom of the T distribution from the previous calculation;

E step: the routes are determined on the basis of the t distribution with the parameters calculated in the M step:

$$p_{ij}=\prod_k p_j^k\left(V_{ij}^k\right);\qquad R_{ij}=\frac{a_j\,p_{ij}}{\sum_{j'}a_{j'}\,p_{ij'}}$$

wherein ∏_k denotes the product over the elements k;

after iterating the M step and the E step a set number of times, the pose matrix P_j of the higher-layer capsule is obtained, the pose matrix P_j being composed of the means μ_j^k of the elements V_ij^k of the voting matrices V_ij.
Further, the capsule network is trained through a propagation loss function, wherein the propagation loss function when the t-th higher-layer capsule is the one to be activated by the lower-layer capsules is:

$$L=\sum_{i\neq t}\left(\max\left(0,\,m-\left(a_t-a_i\right)\right)\right)^2$$

wherein m is a variable margin with an initial value of 0.2 and a maximum value of 0.9, a_t is the activation value of the activated parent capsule, and a_i is the activation value of an inactive parent capsule.
The invention has the following beneficial effects: the expressions of human faces under different poses can be accurately recognized without considering the pose of the subject, so that recognition accuracy is ensured and recognition efficiency is effectively improved.
Drawings
The invention is further described below with reference to the following figures and examples:
FIG. 1 is a flow chart of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings of the specification:
the invention provides a facial expression recognition method based on an improved capsule network, which comprises the following steps:
inputting the sample picture into an improved capsule network for training;
inputting the live-action picture into an improved capsule network for recognition, and extracting the facial expression in the live-action picture;
the training of inputting the sample picture into the improved capsule network specifically comprises the following steps:
s1, extracting a face region from a picture through a multitask convolutional neural network;
s2, marking the extracted face area to obtain the expression and the head posture of the face area;
s3, inputting the expression and the head posture of the face area into a generation countermeasure network, and generating the face area with the expression for the countermeasure network;
s4, inputting the face area with the expression into an improved capsule network to train the improved capsule network; according to the invention, the expressions of the human faces under different postures can be accurately recognized without considering the posture condition of the human body, so that the recognition accuracy can be ensured, and the recognition efficiency can be effectively improved.
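The four steps above amount to a short training loop. The sketch below is only an illustration: every component (the multi-task CNN face extractor, the labelling step, the conditional GAN and the improved capsule network) is passed in as a callable with a placeholder name, and a PyTorch-style optimizer and loss are assumed; none of these names come from the patent.

```python
def train_improved_capsnet(sample_pictures, extract_face, label_face,
                           gan_generate, capsnet, loss_fn, optimizer):
    """One pass over the sample pictures following steps S1-S4.

    All arguments are placeholders for the components described in the text:
    extract_face is the multi-task CNN (S1), label_face returns the expression
    and head-pose labels (S2), gan_generate is the conditional GAN (S3), and
    capsnet / loss_fn / optimizer train the improved capsule network (S4).
    """
    for picture in sample_pictures:
        face_region = extract_face(picture)                                # S1
        expression, head_pose = label_face(face_region)                    # S2
        synthetic_face = gan_generate(face_region, expression, head_pose)  # S3
        logits = capsnet(synthetic_face)                                   # S4
        loss = loss_fn(logits, expression)          # propagation (spread) loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return capsnet
```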
In this embodiment, in step S3, generating the face region with the expression by the generative adversarial network specifically comprises:
the generative adversarial network comprises an encoder and a decoder;
inputting the expression and the head pose of the face region into the encoder for processing, the encoder outputting the face picture features, the expression and the pose; the encoder takes a 224 × 224 × 3 face picture as input and outputs a face picture feature f of length 50, and is composed of five convolution layers and one fully connected layer, wherein each convolution layer has a 5 × 5 kernel and a relu activation function, and the fully connected layer has a tanh activation function.
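As a concrete illustration, a minimal PyTorch sketch of such an encoder follows: five 5 × 5 convolutions with relu and a tanh fully connected layer mapping a 224 × 224 × 3 face image to a 50-dimensional feature f. The channel counts and the stride of 2 are assumptions, since the source does not specify them.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Encoder sketch: five 5x5 convolutions with ReLU, then one fully
    connected layer with tanh, mapping a 224x224x3 face image to a feature
    vector f of length 50. Channel counts and the stride of 2 are assumptions."""

    def __init__(self, feature_len: int = 50):
        super().__init__()
        chans = [3, 32, 64, 128, 256, 256]            # assumed channel progression
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=5, stride=2, padding=2),
                       nn.ReLU(inplace=True)]
        self.conv = nn.Sequential(*layers)            # 224 -> 112 -> 56 -> 28 -> 14 -> 7
        self.fc = nn.Linear(256 * 7 * 7, feature_len)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.conv(x).flatten(1)
        return torch.tanh(self.fc(h))                 # fully connected layer uses tanh

if __name__ == "__main__":
    f = Encoder()(torch.randn(1, 3, 224, 224))
    print(f.shape)   # torch.Size([1, 50])
```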
Inputting the face picture features, the expression and the pose into the decoder for processing, the decoder outputting the face picture with the expression; the decoder is composed of seven deconvolution layers with 5 × 5 kernels, the first six deconvolution layers having relu activation functions and the last deconvolution layer having a tanh activation function.
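A matching sketch of the decoder follows. The seven 5 × 5 transposed convolutions, the relu activations on the first six layers, the tanh on the last layer and the 224 × 224 × 3 output are taken from the text; the label sizes, channel counts, strides and the 7 × 7 tiling of the conditioned feature vector are assumptions.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Decoder sketch: seven 5x5 transposed convolutions, ReLU on the first six
    and tanh on the last, turning the 50-d feature plus expression and pose
    labels into a 224x224x3 expressive face image. Label sizes, channel counts,
    strides and the 7x7 tiling of the input are assumptions."""

    def __init__(self, feature_len=50, n_expr=7, n_pose=5):
        super().__init__()
        d = feature_len + n_expr + n_pose
        chans = [d, 256, 256, 128, 64, 32, 16, 3]     # assumed channel progression
        strides = [1, 1, 2, 2, 2, 2, 2]               # 7x7 -> 7 -> 7 -> 14 -> ... -> 224
        layers = []
        for i, (c_in, c_out) in enumerate(zip(chans[:-1], chans[1:])):
            s = strides[i]
            layers.append(nn.ConvTranspose2d(c_in, c_out, kernel_size=5, stride=s,
                                             padding=2, output_padding=s - 1))
            layers.append(nn.Tanh() if i == 6 else nn.ReLU(inplace=True))
        self.deconv = nn.Sequential(*layers)

    def forward(self, f, expr_onehot, pose):
        z = torch.cat([f, expr_onehot, pose], dim=1)          # condition the feature on labels
        z = z.view(z.size(0), -1, 1, 1).repeat(1, 1, 7, 7)    # tile to a 7x7 map (assumption)
        return self.deconv(z)                                  # (B, 3, 224, 224)

if __name__ == "__main__":
    img = Decoder()(torch.randn(2, 50), torch.randn(2, 7), torch.randn(2, 5))
    print(img.shape)   # torch.Size([2, 3, 224, 224])
```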
constructing the objective function of the generative adversarial network:

$$\min_G\max_D\;\mathbb{E}_{x,y\sim p_d(x,y)}\left[\log D(x,y)\right]+\mathbb{E}_{x,y\sim p_d(x,y)}\left[\log\left(1-D(G(x,y),y)\right)\right]\qquad(1)$$

wherein x is the face region, y denotes the expression label and the pose label, D(x, y) is the output of the discriminator, which is either true or false; G(x, y) is the output of the generator, namely a generated face picture; D(G(x, y), y) is the result of the discriminator D judging the face picture generated by the generator G; p_d(x, y) is the joint probability of x and y; and E_{x,y~p_d(x,y)} denotes the expectation with respect to p_d(x, y);
judging the expressive face picture output by the decoder with the objective function of the generative adversarial network, and outputting the expressive face picture whose judgment result is real; in this way, the facial expressions in the sample pictures can be accurately extracted, which facilitates subsequent training and final recognition.
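To make the roles of D(x, y) and D(G(x, y), y) in the objective function concrete, the sketch below computes the usual discriminator loss and a non-saturating generator loss for such a conditional GAN. D and G are assumed to be callables returning a probability in (0, 1) and an image respectively; this is an illustration, not the patent's exact training procedure.

```python
import torch

def gan_losses(D, G, x, y):
    """Adversarial losses for the conditional GAN described above.

    The discriminator is pushed towards 'real' on genuine face regions D(x, y)
    and towards 'fake' on generated ones D(G(x, y), y); the generator is
    trained so that its output is judged real.
    """
    eps = 1e-8
    real_score = D(x, y)                       # D(x, y)
    fake_img = G(x, y)                         # G(x, y)
    fake_score = D(fake_img.detach(), y)       # D(G(x, y), y), no gradient into G
    d_loss = -(torch.log(real_score + eps) + torch.log(1.0 - fake_score + eps)).mean()
    g_loss = -torch.log(D(fake_img, y) + eps).mean()   # non-saturating generator loss
    return d_loss, g_loss
```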
In this embodiment, step S4 specifically includes:
the improved capsule network has a relu convolution layer, an initial capsule layer prim_cap, a first convolutional capsule layer conv_cap1, a second convolutional capsule layer conv_cap2 and a classification capsule layer class_cap; the relu convolution layer conv_relu takes a 28 × 28 × 3 facial expression picture as input and outputs 14 × 14 × 32 local features, and consists of a 5 × 5 convolution layer, a BatchNorm layer and a relu layer.
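A sketch of this front end; stride 2 and padding 2 are assumptions chosen so that a single 5 × 5 convolution halves the spatial size from 28 to 14, as the text states.

```python
import torch
import torch.nn as nn

# conv_relu front end: one 5x5 convolution + BatchNorm + ReLU mapping a
# 28x28x3 expression image to 14x14x32 local features. Stride 2 / padding 2
# are assumed so that the spatial size is halved as described.
conv_relu = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
)

if __name__ == "__main__":
    out = conv_relu(torch.randn(1, 3, 28, 28))
    print(out.shape)   # torch.Size([1, 32, 14, 14])
```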
Inputting the expressive face picture generated by the generative adversarial network into the relu convolution layer for processing, and outputting the local features of the face picture;
the initial capsule layer prim_cap processes the local features of the face picture output by the relu convolution layer and outputs 32 capsules; it is composed of two 1 × 1 convolution layers with stride 1, which respectively form the pose matrices and the activation matrices of the output capsules.
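A sketch of such a primary capsule layer. The 4 × 4 pose-matrix size follows the usual EM-routing capsule convention and is an assumption; the source only states that the two 1 × 1, stride-1 convolutions form the pose matrices and the activation matrices of the 32 capsules.

```python
import torch
import torch.nn as nn

class PrimaryCaps(nn.Module):
    """prim_cap sketch: two 1x1, stride-1 convolutions over the 14x14x32 local
    features, one producing the pose matrices and one the activations of 32
    capsule types. The 4x4 pose size is an assumed convention."""

    def __init__(self, in_ch=32, n_caps=32, pose_hw=4):
        super().__init__()
        self.n_caps, self.pose_hw = n_caps, pose_hw
        self.pose_conv = nn.Conv2d(in_ch, n_caps * pose_hw * pose_hw, kernel_size=1, stride=1)
        self.act_conv = nn.Conv2d(in_ch, n_caps, kernel_size=1, stride=1)

    def forward(self, x):
        b, _, h, w = x.shape
        pose = self.pose_conv(x).view(b, self.n_caps, self.pose_hw, self.pose_hw, h, w)
        act = torch.sigmoid(self.act_conv(x))        # activation matrix, values in (0, 1)
        return pose, act

if __name__ == "__main__":
    pose, act = PrimaryCaps()(torch.randn(1, 32, 14, 14))
    print(pose.shape, act.shape)   # (1, 32, 4, 4, 14, 14) and (1, 32, 14, 14)
```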
The first convolutional capsule layer conv_cap1 processes the 32 capsules output by the initial capsule layer prim_cap and outputs 32 capsules;
the second convolutional capsule layer conv_cap2 processes the 32 capsules output by the first convolutional capsule layer conv_cap1 and outputs 32 capsules;
the classification capsule layer class_cap processes the 32 capsules output by the second convolutional capsule layer conv_cap2 and outputs 7 capsules, the 7 capsules corresponding to the 7 facial expression categories.
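Once the classification capsule layer has produced its 7 activations, the recognized expression is simply the category of the most strongly activated capsule. The snippet below illustrates this readout; the seven category names are the common basic-expression set and are an assumption, since the source only states that the 7 capsules correspond to 7 facial expressions.

```python
import torch

# Assumed basic-expression categories for the 7 class_cap capsules.
EXPRESSIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def predict_expression(class_cap_activations: torch.Tensor) -> str:
    """class_cap_activations: (7,) activation values of the classification capsules."""
    return EXPRESSIONS[int(torch.argmax(class_cap_activations))]

if __name__ == "__main__":
    print(predict_expression(torch.rand(7)))
```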
Specifically, the 32 capsules output by the initial capsule layer prim_cap are passed in turn through the first convolutional capsule layer conv_cap1, the second convolutional capsule layer conv_cap2 and the classification capsule layer class_cap by means of T-EM routing, which specifically comprises the following steps:

determining the voting matrix V_ij from a lower-layer capsule i to a higher-layer capsule j:

$$V_{ij}=P_i\cdot W_{ij}$$

wherein P_i is the pose matrix of the lower-layer capsule i, and W_ij is the viewpoint-invariant matrix from the lower-layer capsule i to the higher-layer capsule j; the k-th element of the voting matrix V_ij is denoted V_ij^k;

whether the element V_ij^k belongs to the higher-layer capsule j is determined by the T distribution:

$$p_j^k\left(V_{ij}^k\right)=\frac{\Gamma\left(\frac{\nu_j^k+1}{2}\right)}{\Gamma\left(\frac{\nu_j^k}{2}\right)\sqrt{\pi\nu_j^k}\,\sigma_j^k}\left(1+\frac{\left(d_{ij}^k\right)^2}{\nu_j^k}\right)^{-\frac{\nu_j^k+1}{2}}\qquad(2)$$

wherein Γ(·) is the gamma function, d_ij^k is the Mahalanobis distance from the element V_ij^k to the mean μ_j^k, μ_j^k is the expectation of the T distribution, ν_j^k is the degree of freedom of the T distribution, (σ_j^k)^2 is the variance of the T distribution, and π is the circumference ratio; wherein

$$d_{ij}^k=\frac{\left|V_{ij}^k-\mu_j^k\right|}{\sigma_j^k}$$

the loss function C for classifying the I lower-layer capsules into the J higher-layer capsules is:

$$C=-\sum_{i=1}^{I}\sum_{j=1}^{J}R_{ij}\,\ln p_j\left(V_{ij}\right)\qquad(3)$$

wherein R_ij is the weight with which the i-th lower-layer capsule belongs to the j-th higher-layer capsule;

the pose matrix P_j and the activation matrix a_j of a higher-layer capsule are obtained from the pose matrices P_i and the activation matrices a_i of the lower-layer capsules by minimizing formula (3) through the T-EM routing process, which specifically comprises:

initializing the parameters:

$$R_{ij}=\frac{1}{J}$$

wherein J is the number of higher-layer capsules, and the parameters of the T distributions are given initial values;

M step:

$$R_{ij}=R_{ij}\times a_i,\quad i=1,\dots,I;$$

$$\mu_j^k=\frac{\sum_i R_{ij}V_{ij}^k}{\sum_i R_{ij}};\qquad\left(\sigma_j^k\right)^2=\frac{\sum_i R_{ij}\left(V_{ij}^k-\mu_j^k\right)^2}{\sum_i R_{ij}};$$

$$cost^k=\left(\beta_v+\ln\sigma_j^k\right)\sum_i R_{ij};\qquad a_j=\mathrm{logistic}\left(\lambda\left(\beta_a-\sum_k cost^k\right)\right)$$

wherein β_a and β_v are trainable variables, and λ is a temperature coefficient with the value 0.01; the degree of freedom ν_j^k of the T distribution is updated by solving an equation that involves the degree of freedom of the T distribution from the previous calculation;

E step: the routes are determined on the basis of the t distribution with the parameters calculated in the M step:

$$p_{ij}=\prod_k p_j^k\left(V_{ij}^k\right);\qquad R_{ij}=\frac{a_j\,p_{ij}}{\sum_{j'}a_{j'}\,p_{ij'}}$$

wherein ∏_k denotes the product over the elements k;

after iterating the M step and the E step a set number of times, the pose matrix P_j of the higher-layer capsule is obtained, the pose matrix P_j being composed of the means μ_j^k of the elements V_ij^k of the voting matrices V_ij. In this way each capsule can be trained, so that the accuracy of the final recognition is ensured.
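The sketch below runs one pass of such T-EM routing over a set of votes: the E step weights each vote with a Student-t density and the M step recomputes means, variances and activations. It deliberately simplifies the procedure described above in two ways: the degree of freedom ν is held fixed instead of being re-solved at each iteration, and the cost/activation update uses the standard EM-routing form suggested by the trainable β_a and β_v. Treat it as an illustration under those assumptions, not as the patent's exact method.

```python
import math
import torch

def t_log_pdf(v, mu, sigma, nu):
    """Log density of a Student-t with mean mu, scale sigma and degrees of
    freedom nu, evaluated element-wise at the votes v."""
    d2 = ((v - mu) / sigma) ** 2                       # squared Mahalanobis distance
    return (torch.lgamma((nu + 1) / 2) - torch.lgamma(nu / 2)
            - 0.5 * torch.log(math.pi * nu) - torch.log(sigma)
            - (nu + 1) / 2 * torch.log1p(d2 / nu))

def t_em_routing(V, a_in, beta_a, beta_v, iters=3, lam=0.01, nu=5.0):
    """Route I lower-layer capsules to J higher-layer capsules.

    V: (I, J, K) votes, a_in: (I,) lower-capsule activations,
    beta_a / beta_v: trainable scalars, lam: temperature, nu: fixed d.o.f.
    Returns the higher-capsule poses (means) and activations.
    """
    I, J, K = V.shape
    R = torch.full((I, J), 1.0 / J)                    # initialise R_ij = 1/J
    nu_t = torch.as_tensor(nu)
    for _ in range(iters):
        # ---- M step ----
        Rw = R * a_in.unsqueeze(1)                     # R_ij <- R_ij * a_i
        s = Rw.sum(dim=0, keepdim=True) + 1e-8         # (1, J)
        mu = (Rw.unsqueeze(2) * V).sum(0, keepdim=True) / s.unsqueeze(2)           # means
        var = (Rw.unsqueeze(2) * (V - mu) ** 2).sum(0, keepdim=True) / s.unsqueeze(2)
        sigma = var.sqrt() + 1e-8
        cost = (beta_v + torch.log(sigma)) * s.unsqueeze(2)
        a_out = torch.sigmoid(lam * (beta_a - cost.sum(2))).squeeze(0)              # (J,)
        # ---- E step ----
        log_p = t_log_pdf(V, mu, sigma, nu_t).sum(2)   # product over k as a sum of logs
        R = a_out.unsqueeze(0) * torch.exp(log_p)
        R = R / (R.sum(dim=1, keepdim=True) + 1e-8)    # R_ij = a_j p_ij / sum_j' a_j' p_ij'
    return mu.squeeze(0), a_out

if __name__ == "__main__":
    poses, acts = t_em_routing(torch.randn(32, 7, 16), torch.rand(32),
                               beta_a=torch.zeros(()), beta_v=torch.zeros(()))
    print(poses.shape, acts.shape)   # torch.Size([7, 16]) torch.Size([7])
```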
In this embodiment, the capsule network is trained through a propagation loss function, and the trainable variables β_a and β_v are obtained during this training; the propagation loss function when the t-th higher-layer capsule is the one to be activated by the lower-layer capsules is:

$$L=\sum_{i\neq t}\left(\max\left(0,\,m-\left(a_t-a_i\right)\right)\right)^2$$

wherein m is a variable margin with an initial value of 0.2 and a maximum value of 0.9, a_t is the activation value of the activated parent capsule, and a_i is the activation value of an inactive parent capsule.
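A sketch of this propagation (spread) loss for a batch of class-capsule activations. The schedule that anneals the margin m from 0.2 towards 0.9 during training is not specified here, so m is simply passed in as a parameter.

```python
import torch

def spread_loss(a, target, m=0.2):
    """L = sum_{i != t} max(0, m - (a_t - a_i))^2 for each sample in the batch.

    a: (B, 7) activations of the classification capsules,
    target: (B,) index t of the capsule that should be activated,
    m: margin, annealed from 0.2 towards 0.9 over training.
    """
    a_t = a.gather(1, target.unsqueeze(1))                   # activation of the target capsule
    margin = torch.clamp(m - (a_t - a), min=0.0) ** 2        # (B, 7), includes the i == t column
    margin = margin.scatter(1, target.unsqueeze(1), 0.0)     # drop the i == t term
    return margin.sum(dim=1).mean()

if __name__ == "__main__":
    activations = torch.rand(4, 7)
    labels = torch.randint(0, 7, (4,))
    print(spread_loss(activations, labels))
```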
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and all such modifications and substitutions are intended to be covered by the claims of the present invention.

Claims (5)

1. A facial expression recognition method based on an improved capsule network, characterized by comprising the following steps:
inputting sample pictures into the improved capsule network for training;
inputting a live-action picture into the improved capsule network for recognition, and extracting the facial expression in the live-action picture;
wherein training the improved capsule network on the sample pictures specifically comprises the following steps:
S1, extracting a face region from a picture through a multi-task convolutional neural network;
S2, labelling the extracted face region to obtain the expression and the head pose of the face region;
S3, inputting the expression and the head pose of the face region into a generative adversarial network, the generative adversarial network generating a face region with the expression;
and S4, inputting the face region with the expression into the improved capsule network to train the improved capsule network.
2. The facial expression recognition method based on the improved capsule network as claimed in claim 1, wherein in step S3, generating the face region with the expression by the generative adversarial network specifically comprises:
the generative adversarial network comprises an encoder and a decoder;
inputting the expression and the head pose of the face region into the encoder for processing, the encoder outputting the face picture features, the expression and the pose;
inputting the face picture features, the expression and the pose into the decoder for processing, the decoder outputting the face picture with the expression;
constructing the objective function of the generative adversarial network:

$$\min_G\max_D\;\mathbb{E}_{x,y\sim p_d(x,y)}\left[\log D(x,y)\right]+\mathbb{E}_{x,y\sim p_d(x,y)}\left[\log\left(1-D(G(x,y),y)\right)\right]\qquad(1)$$

wherein x is the face region, y denotes the expression label and the pose label, D(x, y) is the output of the discriminator, which is either true or false; G(x, y) is the output of the generator, namely a generated face picture; D(G(x, y), y) is the result of the discriminator D judging the face picture generated by the generator G; p_d(x, y) is the joint probability of x and y; and E_{x,y~p_d(x,y)} denotes the expectation with respect to p_d(x, y);
and judging the expressive face picture output by the decoder with the objective function of the generative adversarial network, and outputting the expressive face picture whose judgment result is real.
3. The facial expression recognition method based on the improved capsule network as claimed in claim 2, wherein step S4 specifically comprises:
the improved capsule network has a relu convolution layer, an initial capsule layer prim_cap, a first convolutional capsule layer conv_cap1, a second convolutional capsule layer conv_cap2 and a classification capsule layer class_cap;
inputting the expressive face picture generated by the generative adversarial network into the relu convolution layer for processing, and outputting the local features of the face picture;
the initial capsule layer prim_cap processes the local features of the face picture output by the relu convolution layer and outputs 32 capsules;
the first convolutional capsule layer conv_cap1 processes the 32 capsules output by the initial capsule layer prim_cap and outputs 32 capsules;
the second convolutional capsule layer conv_cap2 processes the 32 capsules output by the first convolutional capsule layer conv_cap1 and outputs 32 capsules;
the classification capsule layer class_cap processes the 32 capsules output by the second convolutional capsule layer conv_cap2 and outputs 7 capsules, the 7 capsules corresponding to the 7 facial expression categories.
4. The facial expression recognition method based on the improved capsule network as claimed in claim 3, wherein the 32 capsules output by the initial capsule layer prim_cap are passed in turn through the first convolutional capsule layer conv_cap1, the second convolutional capsule layer conv_cap2 and the classification capsule layer class_cap by means of T-EM routing, which specifically comprises the following steps:

determining the voting matrix V_ij from a lower-layer capsule i to a higher-layer capsule j:

$$V_{ij}=P_i\cdot W_{ij}$$

wherein P_i is the pose matrix of the lower-layer capsule i, and W_ij is the viewpoint-invariant matrix from the lower-layer capsule i to the higher-layer capsule j; the k-th element of the voting matrix V_ij is denoted V_ij^k;

whether the element V_ij^k belongs to the higher-layer capsule j is determined by the T distribution:

$$p_j^k\left(V_{ij}^k\right)=\frac{\Gamma\left(\frac{\nu_j^k+1}{2}\right)}{\Gamma\left(\frac{\nu_j^k}{2}\right)\sqrt{\pi\nu_j^k}\,\sigma_j^k}\left(1+\frac{\left(d_{ij}^k\right)^2}{\nu_j^k}\right)^{-\frac{\nu_j^k+1}{2}}\qquad(2)$$

wherein Γ(·) is the gamma function, d_ij^k is the Mahalanobis distance from the element V_ij^k to the mean μ_j^k, μ_j^k is the expectation of the T distribution, ν_j^k is the degree of freedom of the T distribution, (σ_j^k)^2 is the variance of the T distribution, and π is the circumference ratio; wherein

$$d_{ij}^k=\frac{\left|V_{ij}^k-\mu_j^k\right|}{\sigma_j^k}$$

the loss function C for classifying the I lower-layer capsules into the J higher-layer capsules is:

$$C=-\sum_{i=1}^{I}\sum_{j=1}^{J}R_{ij}\,\ln p_j\left(V_{ij}\right)\qquad(3)$$

wherein R_ij is the weight with which the i-th lower-layer capsule belongs to the j-th higher-layer capsule;

the pose matrix P_j and the activation matrix a_j of a higher-layer capsule are obtained from the pose matrices P_i and the activation matrices a_i of the lower-layer capsules by minimizing formula (3) through the T-EM routing process, which specifically comprises:

initializing the parameters:

$$R_{ij}=\frac{1}{J}$$

wherein J is the number of higher-layer capsules, and the parameters of the T distributions are given initial values;

M step:

$$R_{ij}=R_{ij}\times a_i,\quad i=1,\dots,I;$$

$$\mu_j^k=\frac{\sum_i R_{ij}V_{ij}^k}{\sum_i R_{ij}};\qquad\left(\sigma_j^k\right)^2=\frac{\sum_i R_{ij}\left(V_{ij}^k-\mu_j^k\right)^2}{\sum_i R_{ij}};$$

$$cost^k=\left(\beta_v+\ln\sigma_j^k\right)\sum_i R_{ij};\qquad a_j=\mathrm{logistic}\left(\lambda\left(\beta_a-\sum_k cost^k\right)\right)$$

wherein β_a and β_v are trainable variables and λ is a temperature coefficient; the degree of freedom ν_j^k of the T distribution is updated by solving an equation that involves the degree of freedom of the T distribution from the previous calculation;

E step: the routes are determined on the basis of the t distribution with the parameters calculated in the M step:

$$p_{ij}=\prod_k p_j^k\left(V_{ij}^k\right);\qquad R_{ij}=\frac{a_j\,p_{ij}}{\sum_{j'}a_{j'}\,p_{ij'}}$$

wherein ∏_k denotes the product over the elements k;

after iterating the M step and the E step a set number of times, the pose matrix P_j of the higher-layer capsule is obtained, the pose matrix P_j being composed of the means μ_j^k of the elements V_ij^k of the voting matrices V_ij.
5. The facial expression recognition method based on the improved capsule network as claimed in claim 4, wherein the capsule network is trained through a propagation loss function, and the propagation loss function when the t-th higher-layer capsule is the one to be activated by the lower-layer capsules is:

$$L=\sum_{i\neq t}\left(\max\left(0,\,m-\left(a_t-a_i\right)\right)\right)^2$$

wherein m is a variable margin with an initial value of 0.2 and a maximum value of 0.9, a_t is the activation value of the activated parent capsule, and a_i is the activation value of an inactive parent capsule.
CN202010860025.4A 2020-07-29 2020-08-24 Facial expression recognition method based on improved capsule network Active CN112036281B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2020107460730 2020-07-29
CN202010746073 2020-07-29

Publications (2)

Publication Number Publication Date
CN112036281A true CN112036281A (en) 2020-12-04
CN112036281B CN112036281B (en) 2023-06-09

Family

ID=73581053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010860025.4A Active CN112036281B (en) 2020-07-29 2020-08-24 Facial expression recognition method based on improved capsule network

Country Status (1)

Country Link
CN (1) CN112036281B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446609A (en) * 2018-03-02 2018-08-24 南京邮电大学 A kind of multi-angle human facial expression recognition method based on generation confrontation network
US20190303742A1 (en) * 2018-04-02 2019-10-03 Ca, Inc. Extension of the capsule network
CN108764031A (en) * 2018-04-17 2018-11-06 平安科技(深圳)有限公司 Identify method, apparatus, computer equipment and the storage medium of face
CN109063724A (en) * 2018-06-12 2018-12-21 中国科学院深圳先进技术研究院 A kind of enhanced production confrontation network and target sample recognition methods
CN109934116A (en) * 2019-02-19 2019-06-25 华南理工大学 A kind of standard faces generation method based on generation confrontation mechanism and attention mechanism
CN110197125A (en) * 2019-05-05 2019-09-03 上海资汇信息科技有限公司 Face identification method under unconfined condition
CN110533004A (en) * 2019-09-07 2019-12-03 哈尔滨理工大学 A kind of complex scene face identification system based on deep learning
CN111241958A (en) * 2020-01-06 2020-06-05 电子科技大学 Video image identification method based on residual error-capsule network

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
Geoffrey E. Hinton et al.: "Matrix Capsules with EM Routing", ICLR 2018 *
Inyoung Paik et al.: "Capsule Networks Need an Improved Routing Algorithm", arXiv *
Kanako Marusaki et al.: "Capsule GAN Using Capsule Network for Generator Architecture", arXiv *
Sara Sabour et al.: "Dynamic Routing Between Capsules", arXiv *
Yao Naiming et al.: "Robust Facial Expression Recognition Based on Generative Adversarial Networks", Acta Automatica Sinica *
Yao Yuqian: "Research on Facial Expression Feature Extraction and Recognition Algorithms Based on Capsule Networks", China Master's Theses Full-text Database, Information Science and Technology *
Yang Jucheng et al.: "A Survey of Capsule Network Models", Journal of Shandong University (Engineering Science) *
Luo Jia et al.: "A Survey of Generative Adversarial Networks", Chinese Journal of Scientific Instrument *
Chen Lin et al.: "Facial Expression Recognition Based on GAN", Electronic Technology & Software Engineering *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507916A (en) * 2020-12-16 2021-03-16 苏州金瑞阳信息科技有限责任公司 Face detection method and system based on facial expression
CN112507916B (en) * 2020-12-16 2021-07-27 苏州金瑞阳信息科技有限责任公司 Face detection method and system based on facial expression

Also Published As

Publication number Publication date
CN112036281B (en) 2023-06-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant