CN110163098A - Facial expression recognition model construction and recognition method based on a hierarchical deep network - Google Patents
- Publication number
- CN110163098A CN110163098A CN201910307984.0A CN201910307984A CN110163098A CN 110163098 A CN110163098 A CN 110163098A CN 201910307984 A CN201910307984 A CN 201910307984A CN 110163098 A CN110163098 A CN 110163098A
- Authority
- CN
- China
- Prior art keywords
- recognition
- expression
- face
- label
- facial
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
- G06V40/175—Static expression
Abstract
The invention discloses a facial expression recognition model construction and recognition method based on a hierarchical deep network. A two-level tree classifier is proposed that combines one facial-expression-recognition classifier with multiple face-recognition classifiers: the high-level learning task focuses on recognizing the facial expression, while each low-level learning task focuses on recognizing faces that share the same expression. Furthermore, the layered structure can be used to determine the number of top-level nodes (facial expressions) in each learning task and the assignment of low-level nodes (faces), where faces with the same facial expression are assigned to the same learning task. The method provided by the invention achieves higher accuracy both in facial-paralysis expression recognition and when extended to conventional facial expression recognition problems.
Description
Technical field
The present invention relates to facial image recognition methods, and in particular to a facial expression recognition model construction and recognition method based on a hierarchical deep network.
Background technique
With the development of related disciplines such as computer technology and artificial intelligence, the degree of automation across society keeps rising, and the demand for human-computer interaction that resembles person-to-person communication grows ever stronger. If computers and robots could understand and display emotion as humans do, the relationship between people and computers would change fundamentally, and computers could serve people far better. Expression recognition is the basis of affective understanding and the premise for a computer to understand human emotion. Enabling computers to understand and recognize facial expressions would fundamentally change the relationship between people and computers, and would therefore be of great significance for future human-computer interaction.
Most existing expression recognition methods based on deep neural networks focus on the single task of expression recognition and do not consider the influence of differences in facial form. In facial-paralysis expression recognition, for example, the goal is to ignore the paralysis component and identify the expression component; conversely, to identify facial paralysis itself, the expression component should be ignored and the paralysis emphasized. In the real world, however, facial-paralysis expressions are interwoven with many factors such as the facial form, head pose, and external illumination of different individuals, and differences in facial form themselves cause the paralysis expression to vary.
Expression recognition methods in the prior art can therefore mostly only identify the facial expression, and cannot identify the face and the expression at the same time.
Summary of the invention
The purpose of the present invention is to provide a facial expression recognition model construction and recognition method based on a hierarchical deep network, so as to solve the inability of prior-art expression recognition methods to identify the face and the expression simultaneously.
To achieve the above task, the invention adopts the following technical scheme:
A facial expression recognition model construction method based on a hierarchical deep network, executed according to the following steps:
Step 1, obtain several images containing faces, yielding an original image set;
Step 2, for every image in the original image set, crop the face region and then normalize the image size, yielding a facial expression image set;
Step 3, obtain the label corresponding to each image in the facial expression image set, yielding a label set; each label comprises an expression label and a face label;
Step 4, train a hierarchical deep network with the facial expression image set as input and the label set as output, yielding the facial expression recognition model;
the hierarchical deep network comprises a sequence of convolutional layers, a sequence of fully connected layers, and a two-level tree classifier;
the two-level tree classifier comprises an expression-recognition classification layer, which outputs the expression label, and a face-recognition classification layer, which outputs the face label.
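The two-level structure above can be sketched as two softmax classification layers reading shared deep features. This is a minimal illustration only; the feature dimension, batch size, and random weights below are assumptions for the sketch, not values fixed by the invention.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def two_level_heads(x, W, b_e, V, b_f):
    """Shared feature x feeds two classification layers: an
    expression head (W, b_e) and a face head (V, b_f)."""
    expr_probs = softmax(x @ W.T + b_e)   # over M expression labels
    face_probs = softmax(x @ V.T + b_f)   # over N face labels
    return expr_probs, face_probs

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4096))           # batch of fc-layer features (illustrative dim)
W = rng.normal(size=(7, 4096)) * 0.01    # 7 expression labels
V = rng.normal(size=(2, 4096)) * 0.01    # 2 face labels (facial paralysis / non-paralysis)
pe, pf = two_level_heads(x, W, np.zeros(7), V, np.zeros(2))
```

Each head yields a proper probability distribution over its own label set, so the two recognition results are produced simultaneously from one forward pass.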
Further, in Step 4 the hierarchical deep network is trained by error back-propagation to obtain the facial expression recognition model.
Further, when training the hierarchical deep network by error back-propagation, the partial derivative of the expression-recognition classification layer weight parameters and the partial derivative of the face-recognition classification layer weight parameters are computed with Formula I and Formula II respectively:
where ∂£/∂W is the partial derivative with respect to the expression-recognition classification layer weight parameters W, £ is the loss function, x_e is the feature vector input to the expression-recognition classification layer; ∂£/∂V is the partial derivative with respect to the face-recognition classification layer weight parameters V, and x_f is the feature vector input to the face-recognition classification layer;
The loss function £ is obtained with Formula III:

£ = − Σ_{m=1}^{M} log( exp(W_m·x_e + b_e) / Σ_{i=1}^{M} exp(W_i·x_e + b_e) ) − Σ_{n=1}^{N} log( exp(V_n·x_f + b_f) / Σ_{j=1}^{N} exp(V_j·x_f + b_f) )

where V_n denotes the weight parameters of the n-th face label of the face-recognition classification layer, V_j those of the j-th face label, j = 1, 2, …, N; n = 1, 2, …, N; N is the total number of face labels, a positive integer; b_f is the face-recognition offset parameter;
W_m denotes the weight parameters of the m-th expression label of the expression-recognition classification layer, W_i those of the i-th expression label, i = 1, 2, …, M; m = 1, 2, …, M; M is the total number of expression labels, a positive integer; b_e is the expression-recognition offset parameter.
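For a softmax classification layer trained with a cross-entropy term of this kind, the partial derivative with respect to the layer weights has a well-known closed form (softmax output minus one-hot target, outer the feature vector). The sketch below verifies that analytic form against a central finite difference; it is a generic illustration of what Formulas I and II compute, not the patent's exact expressions, and the label count and feature dimension are assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_loss(W, x, target):
    # cross-entropy of the target label: -log softmax(W x)[target]
    return -np.log(softmax(W @ x)[target])

def ce_grad(W, x, target):
    # analytic partial derivative: (softmax(W x) - one_hot(target)) outer x
    p = softmax(W @ x)
    p[target] -= 1.0
    return np.outer(p, x)

rng = np.random.default_rng(1)
W = rng.normal(size=(7, 16)) * 0.1   # 7 expression labels, 16-dim feature (illustrative)
x = rng.normal(size=16)
g = ce_grad(W, x, target=3)

# central finite-difference check of one weight entry
eps = 1e-5
Wp, Wm = W.copy(), W.copy()
Wp[3, 5] += eps
Wm[3, 5] -= eps
num = (ce_loss(Wp, x, 3) - ce_loss(Wm, x, 3)) / (2 * eps)
```

The numeric and analytic derivatives agree to within the finite-difference error, which is the property back-propagation relies on when updating W and V.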
Further, the hierarchical deep network in Step 4 comprises 5 successive convolutional layers, 3 fully connected layers, and 1 two-level tree classifier.
Further, the face labels in Step 3 comprise a facial-paralysis label and a non-facial-paralysis label.
A facial expression recognition method based on a hierarchical deep network processes the facial image to be recognized with Steps A and B to obtain the recognition result:
Step A, process the facial image to be recognized with the method of Step 2, obtaining a facial expression image;
Step B, recognize the facial expression image obtained in Step A with the facial expression recognition model, obtaining the recognition result.
Compared with the prior art, the present invention has the following technical effects:
1. The facial expression recognition model construction and recognition method based on a hierarchical deep network replaces the flat softmax classifier of a deep neural network with a two-level tree classifier, overcoming the influence of individual differences in facial form on facial expression recognition. The proposed hierarchical deep network uses expression labels and face labels jointly, recognizes the expression and the face simultaneously, and improves recognition accuracy.
2. The designed loss function jointly optimizes the weight coefficients of the multi-level classifier, so that each iteration maximizes the probability of correctly predicting both the expression and the face.
Detailed description of the invention
Fig. 1 is a schematic diagram of the hierarchical deep network structure provided in one embodiment of the present invention.
Specific embodiment
The technical terms used in the present invention are first explained, to help better understand the technical content of the application:
Two-level tree classifier: a classifier with two classification layers. In the present invention a two-level tree structure embodies the relationship between the face recognition and facial expression recognition tasks: the first level is expression classification and the second level is face classification. The input image is trained jointly with the face label and the facial expression label, so that more discriminative deep features are learned.
Loss function: a function that maps a random event, or the value of a related random variable, to a non-negative real number expressing the "risk" or "loss" of that event. In this application the loss function serves as the learning criterion and is coupled to an optimization problem: the model is solved and assessed by minimizing the loss function.
Neural network weight coefficients: the weight coefficients of the connections between neurons.
Partial derivatives of the weight coefficients: key quantities when the weight coefficients are updated by gradient descent.
Expression label: a label expressing a human emotion or expression, for example happy, sad, or calm.
Face label: a label representing a facial characteristic, for example a label concerning age, skin color, or facial form.
Error back-propagation algorithm: its purpose is to find, by gradient descent, a set of weights that reduces the error as much as possible. The procedure of the error back-propagation algorithm is as follows: first, initialize the neural network; second, feed in a training sample pair and compute each layer's output; then compute the network output error; then compute each layer's error signal; then adjust each layer's weights by gradient descent. When adjusting each layer's weights by gradient descent, the gradient is obtained by computing the partial derivatives of each layer's weight coefficients; along the opposite direction of the gradient vector the loss decreases fastest, so the minimizing weight values are found most easily. Then check whether the overall network error meets the required precision: if it does, training ends; if not, return to feeding in training sample pairs and computing each layer's output, and repeat the loop.
Embodiment one
In the real world, expressions are interwoven with many factors such as the facial form, head pose, and external illumination of different individuals, and differences in facial form also cause expressions to vary. To weaken the influence of facial form on expression recognition, this embodiment discloses a facial expression recognition model construction method based on a hierarchical deep network.
Starting from an existing deep convolutional neural network model, the method constructs a two-level tree classifier to replace the flat softmax classifier of the output layer, forming a deep multi-task learning framework. By learning from expression labels and face labels jointly, more discriminative deep features are obtained and knowledge is transferred from the related face recognition task, thereby weakening the influence of facial form on expression recognition and improving expression recognition performance.
Specifically, the method provided by the invention executes according to the following steps:
Step 1, obtain several images containing faces, yielding an original image set;
Step 2, for every image in the original image set, crop the face region and then normalize the image size, yielding a facial expression image set;
In this embodiment, the face region of each original image is cropped out and normalized to a size of 224×224.
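A minimal sketch of this preprocessing, assuming the face bounding box is already known (the patent does not specify a face detector) and using plain nearest-neighbour resampling in place of whatever interpolation the implementation used:

```python
import numpy as np

def preprocess(img, bbox, size=224):
    """Crop the face region given bbox = (top, left, height, width),
    resize to size x size by nearest-neighbour sampling, scale to [0, 1]."""
    t, l, h, w = bbox
    face = img[t:t + h, l:l + w]
    rows = np.arange(size) * h // size      # source row for each output row
    cols = np.arange(size) * w // size      # source column for each output column
    resized = face[rows][:, cols]
    return resized.astype(np.float32) / 255.0

# illustrative grayscale image with an assumed face bounding box
img = (np.arange(300 * 400, dtype=np.uint32) % 256).astype(np.uint8).reshape(300, 400)
out = preprocess(img, bbox=(50, 80, 180, 160))
```

Whatever the original face size, the output is a fixed 224×224 array in [0, 1], which is what the fixed-shape convolutional stack downstream requires.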
Step 3, obtain the label corresponding to each image in the facial expression image set, yielding a label set; each label comprises an expression label and a face label;
In this embodiment, the expression label may be any label expressing a human emotion, such as happy, sad, or calm; the face label may be any label representing a facial characteristic, such as age, skin color, or facial form. Because the method of this embodiment is applied to facial-paralysis expression recognition, the face labels comprise a facial-paralysis label and a non-facial-paralysis label. Alternatively, if the method is applied to image-based fatigued-driving recognition, the face labels comprise a fatigued label and a non-fatigued label. In other words, the facial expression recognition method of this embodiment extends to multiple application environments, and unlike a traditional neural network that yields only one recognition result, the recognition method of this embodiment yields two recognition results at once.
In this embodiment, the facial expression recognition method provided by the invention is applied to facial-paralysis expression recognition, where the expression labels cover 7 expressions (closing the eyes, smiling, raising the eyebrows, frowning, wrinkling the nose, showing the teeth, and puffing the cheeks) and the face labels comprise a facial-paralysis label and a non-facial-paralysis label. A facial expression image therefore has 7 × 2 = 14 possible labels, for example [facial paralysis, closing the eyes] or [non-facial-paralysis, smiling].
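Enumerating the label space described above (the expression names are translated here; the pairing itself is from the text):

```python
expressions = ["close eyes", "smile", "raise eyebrows", "frown",
               "wrinkle nose", "show teeth", "puff cheeks"]
face_labels = ["facial paralysis", "non-facial-paralysis"]

# each image carries one face label and one expression label
label_pairs = [(f, e) for f in face_labels for e in expressions]
```

The Cartesian product yields the 7 × 2 = 14 possible labels stated in the embodiment.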
Step 4, train a hierarchical deep network with the facial expression image set as input and the label set as output, yielding the facial expression recognition model;
the hierarchical deep network comprises a sequence of convolutional layers, a sequence of fully connected layers, and a two-level tree classifier;
the two-level tree classifier comprises an expression-recognition classification layer, which outputs the expression label, and a face-recognition classification layer, which outputs the face label.
In this embodiment the hierarchical deep network is built on the deep neural network model AlexNet, as shown in Fig. 1. As a preferred embodiment, the hierarchical deep network comprises 5 successive convolutional layers, 3 fully connected layers, and the two-level tree classifier.
The 1st hidden layer is a convolutional layer using 96 kernels of 11×11×3; node count: 55 (W) × 55 (H) × 48 (C) × 2 = 290,400. Following parameters of a convolutional layer = kernel size × kernel count + number of offsets (= kernel count), this layer has (11×11×3×96) + 96 = 34,944 parameters.
The 2nd hidden layer is a convolutional layer using 256 kernels of 5×5×48, connected only to the previous layer on the same GPU; node count: 27×27×128×2 = 186,624; parameter count: (5×5×48×128 + 128) × 2 = 307,456, where the final "× 2" arises because the layer is split evenly across two GPUs: the parameters on a single GPU are computed first and then multiplied by the GPU count of 2.
The 3rd hidden layer is a convolutional layer using 384 kernels of 3×3×256; node count: 13×13×192×2 = 64,896; parameter count: 3×3×256×384 + 384 = 885,120.
The 4th hidden layer is a convolutional layer using 384 kernels of 3×3×192, connected only to the previous layer on the same GPU; node count: 13×13×192×2 = 64,896; parameter count: (3×3×192×192 + 192) × 2 = 663,936.
The 5th hidden layer is a convolutional layer using 256 kernels of 3×3×192, connected only to the previous layer on the same GPU; node count: 13×13×128×2 = 43,264; parameter count: (3×3×192×128 + 128) × 2 = 442,624.
The 6th hidden layer is a fully connected layer with 4,096 nodes. Following parameters of a fully connected layer = previous-layer node count (after pooling) × next-layer node count + number of offsets (= next-layer node count), its parameter count is (6×6×128×2) × 4096 + 4096 = 37,752,832, far more than all the preceding convolutional layers combined; that is, the parameters of AlexNet are concentrated in the later fully connected layers.
The 7th hidden layer is a fully connected layer whose node count equals the number of expressions.
The 8th hidden layer is a fully connected layer; each branch has a node count equal to the number of faces for the corresponding expression.
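The parameter counts above follow directly from the two counting rules the text states; the arithmetic can be checked as below (note that with the 96 offsets included, the first convolutional layer counts 34,944 parameters):

```python
def conv_params(k_h, k_w, c_in, n_kernels):
    # convolutional layer: kernel size x kernel count + one offset per kernel
    return k_h * k_w * c_in * n_kernels + n_kernels

def fc_params(n_in, n_out):
    # fully connected layer: input nodes x output nodes + one offset per output
    return n_in * n_out + n_out

conv1 = conv_params(11, 11, 3, 96)
conv2 = conv_params(5, 5, 48, 128) * 2   # layer split across two GPUs
conv3 = conv_params(3, 3, 256, 384)
conv4 = conv_params(3, 3, 192, 192) * 2
conv5 = conv_params(3, 3, 192, 128) * 2
fc6 = fc_params(6 * 6 * 128 * 2, 4096)
```

The check also confirms the text's observation that the first fully connected layer alone outweighs all five convolutional layers together.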
In this embodiment, the hierarchical deep network may be trained by methods such as the conjugate gradient method or the error back-propagation algorithm.
To improve the accuracy and efficiency of the recognition method, preferably the hierarchical deep network in Step 4 is trained by error back-propagation, yielding the facial expression recognition model.
In this embodiment a two-level tree classifier is designed in which the expression-recognition classification layer focuses on facial expression recognition, and each classification result of that layer corresponds to one face-recognition classification layer. In addition, the layered structure can be used to determine the number of top-level nodes (facial expressions) in each learning task and the assignment of low-level nodes (faces), where faces with the same facial expression should be assigned to the same learning task.
The purpose of the back-propagation algorithm is to find, by gradient descent, a set of weights that reduces the error as much as possible. The partial derivatives of the weight coefficients constitute the gradient of the loss function, which is a function of the weights. The gradient is obtained by computing the partial derivatives of each layer's weight coefficients; along the opposite direction of the gradient vector the loss decreases fastest, so the minimum is found most easily. It is therefore necessary to take partial derivatives with respect to the weight coefficients and train repeatedly along the opposite direction of the gradient vector, obtaining weights that reduce the error.
Optionally, when training the hierarchical deep network by error back-propagation, the partial derivative of the expression-recognition classification layer weight coefficients and the partial derivative of the face-recognition classification layer weight coefficients are computed with Formula I and Formula II respectively:
where ∂£/∂W is the partial derivative with respect to the expression-recognition classification layer weight coefficients W, £ is the loss function, x_e is the feature vector input to the expression-recognition classification layer; ∂£/∂V is the partial derivative with respect to the face-recognition classification layer weight coefficients V, and x_f is the feature vector input to the face-recognition classification layer.
The loss function £ is obtained with Formula III, the sum of the softmax cross-entropy terms of the two classification layers:
where V_n denotes the weight coefficients of the n-th face label of the face-recognition classification layer, V_j those of the j-th face label, j = 1, 2, …, N; n = 1, 2, …, N; N is the total number of face labels, a positive integer; b_f is the face-recognition offset parameter, updated together with the weight coefficients V;
W_m denotes the weight coefficients of the m-th expression label of the expression-recognition classification layer, W_i those of the i-th expression label, i = 1, 2, …, M; m = 1, 2, …, M; M is the total number of expression labels, a positive integer; b_e is the expression-recognition offset parameter, updated together with the weight coefficients W.
In this embodiment, because variations of facial expression affect the accuracy of face recognition, task-specific deep features should be used for the different classification tasks.
In this embodiment the computation of the partial derivatives is explained concretely for facial-paralysis expression recognition, where the face labels are facial paralysis and non-facial-paralysis and the expression labels are closing the eyes, smiling, raising the eyebrows, frowning, wrinkling the nose, showing the teeth, and puffing the cheeks, so M = 7 and N = 2.
First, the loss function £ is computed with Formula III.
Computing the expression-recognition part of the loss function: M is the total number of expression labels, here M = 7, where m = 1 denotes closing the eyes, m = 2 smiling, m = 3 raising the eyebrows, m = 4 frowning, m = 5 wrinkling the nose, m = 6 showing the teeth, and m = 7 puffing the cheeks; the index i runs over the same seven expressions. In this embodiment x_e is the output vector of the last fully connected layer, i.e. the deep feature vector of the facial expression obtained by the first 8 layers of the network.
The sum below the fraction bar is the sum over all expression labels of (weight coefficients · feature vector + offset parameter); the numerator is (weight coefficients · feature vector + offset parameter) for a single expression label; the whole fraction is thus the proportion that one expression label's score takes in the sum over all expression labels. After computing the 7 labels' proportions, taking the logarithm of each, summing the 7 logarithms, and negating, one obtains the expression-recognition part of the loss function.
Computing the face-recognition part of the loss function: in this embodiment, for facial-paralysis expression recognition, N = 2, where n = 1 denotes facial paralysis and n = 2 non-facial-paralysis; the index j likewise. Here x_f is the feature vector output by the expression-recognition classification layer and the last fully connected layer, i.e. the deep feature vector for face recognition.
Consistently with the computation of the expression-recognition part, the 2 face labels' proportions are computed, the logarithm of each is taken, and the sum of the 2 logarithms is negated, yielding the face-recognition part of the loss function.
Adding the expression-recognition part and the face-recognition part of the loss function gives the loss function £.
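Read as a softmax cross-entropy on the true label of each head, which is the standard form of such a loss and what the `nn.CrossEntropyLoss` calls later in this embodiment compute, the combined loss of one sample can be sketched as follows. The feature dimension and random weights are illustrative assumptions.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max()                      # subtract max for numerical stability
    return z - np.log(np.exp(z).sum())

def joint_loss(W, b_e, V, b_f, x_e, x_f, expr_target, face_target):
    # expression-head cross-entropy term plus face-head cross-entropy term
    return (-log_softmax(W @ x_e + b_e)[expr_target]
            - log_softmax(V @ x_f + b_f)[face_target])

rng = np.random.default_rng(2)
W, b_e = rng.normal(size=(7, 8)), np.zeros(7)   # M = 7 expression labels
V, b_f = rng.normal(size=(2, 8)), np.zeros(2)   # N = 2 face labels
x = rng.normal(size=8)                          # illustrative shared feature
loss = joint_loss(W, b_e, V, b_f, x, x, expr_target=1, face_target=0)
```

Because the two terms share one backward pass, minimizing this sum pushes both classification layers toward correct predictions at every iteration, which is the joint-optimization property claimed for the designed loss.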
Second, the expression-recognition classification layer weight coefficients W are updated with Formula I, comprising:
Step I, compute ∂£/∂W, the partial derivative of the overall error with respect to the expression-recognition classification layer weight coefficients W;
Step II, update W as W ← W − η·∂£/∂W, taking the learning coefficient η = 0.5.
The learning coefficient η is a factor that keeps the weight values changing smoothly and avoids overshooting the minimum when close to it; this factor is commonly called the learning rate, with 0 < η < 1.
In practical computation, partial derivatives are replaced by finite differences: the partial derivatives of the corresponding parameters are obtained from the differences of each layer's inputs and outputs, and the coefficients W are thereby updated.
Finally, the face-recognition classification layer weight coefficients V are updated with Formula II, comprising:
Step 1, compute ∂£/∂V, the partial derivative of the overall error with respect to V;
Step 2, update V as V ← V − μ·∂£/∂V, taking the learning coefficient μ = 0.5.
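The two update rules above are plain gradient descent; a toy one-dimensional example of the η = 0.5 step (a generic illustration, not the patent's network update):

```python
def sgd_step(param, grad, lr=0.5):
    # move against the gradient, scaled by the learning coefficient
    return param - lr * grad

# minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3);
# with lr = 0.5 the update w - 0.5 * 2 * (w - 3) lands exactly on w = 3
w = 0.0
for _ in range(5):
    w = sgd_step(w, 2 * (w - 3), lr=0.5)
```

For this particular quadratic, lr = 0.5 happens to reach the minimum in one step; in general, rates closer to 0 trade speed for the smooth, overshoot-free convergence the text describes.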
In the specific implementation of this embodiment, W, V, b_e, and b_f are computed and updated on the PyTorch platform, which obtains the partial derivative of each parameter from the differences of each layer's inputs and outputs and then updates all of the parameters W, V, b_e, and b_f.
The main computation statements are as follows:
ecriterion = nn.CrossEntropyLoss().cuda(args.gpu)  # expression classifier loss function
fcriterion = nn.CrossEntropyLoss().cuda(args.gpu)  # face classifier loss function
e_output, f_output = model(imag)  # forward pass on the input image data imag, producing the two head outputs
loss = ecriterion(e_output, e_target) + fcriterion(f_output, f_target_var)  # sum the losses between the two classifier outputs and their targets
loss.backward()  # backward pass: compute the gradient of the loss with respect to each parameter
optimizer.step()  # update the parameters with the computed gradients
Unlike traditional back-propagation, the loss function provided in this embodiment gives priority to the relevant learning tasks in each iteration: it considers only the associated weight coefficients in the layered tree classifier together with the prediction probabilities of the other learning tasks, preventing the training process from straying far from the global optimum.
Embodiment two
This embodiment discloses a facial expression recognition method based on a deep hierarchical network, which obtains a recognition result for a face image to be recognized through steps A and B:
Step A: process the face image to be recognized using the method of step 2 in embodiment one to obtain a facial expression image;
Step B: recognize the facial expression image obtained in step A using the facial expression recognition model of embodiment one to obtain the recognition result.
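The step-A preprocessing (face-region interception followed by size normalization, per step 2 of embodiment one) can be sketched in pure Python; the face box coordinates are hypothetical, the image is a nested list standing in for pixel data, and nearest-neighbour sampling stands in for whatever resampling the embodiment actually uses:

```python
def crop(image, top, left, height, width):
    """Intercept the face region [top:top+height, left:left+width]."""
    return [row[left:left + width] for row in image[top:top + height]]

def resize_nearest(image, out_h, out_w):
    """Normalize the image size by nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)] for r in range(out_h)]

# 6x6 toy grayscale image; hypothetical face box at (1, 1) of size 4x4
img = [[10 * r + c for c in range(6)] for r in range(6)]
face = crop(img, 1, 1, 4, 4)
normalized = resize_nearest(face, 2, 2)  # fixed network input size (2x2 here for illustration)
```

In practice the face box would come from a face detector and the target size would match the network's input layer; only the two-stage crop-then-normalize flow is taken from the source.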
In this embodiment, the tests were run on an HP workstation configured with an Intel i5-7400 CPU, an Nvidia GTX1080i graphics card with 8 GB of video memory, 16 GB of RAM and a 512 GB SSD. All experiments were conducted on the PyTorch platform.
The experiments use a facial paralysis expression database recording seven facial paralysis expressions of 49 facial paralysis patients (closing the eyes, smiling, raising the eyebrows, frowning, wrinkling the nose, showing the teeth and puffing the cheeks). Because the available facial paralysis data are limited, with only one picture per person for each expression action, only face-independent experiments could be carried out. In total, 268 facial paralysis expression samples were selected as training samples and 40 samples as test samples.
The facial paralysis expression database was recognized using both the recognition method provided by the invention and conventional deep learning networks; the accuracy comparison is shown in Table 1.
Table 1. Accuracy of different methods on the facial paralysis expression database
As Table 1 shows, the hierarchical structure model proposed by the invention achieves a clear improvement in facial paralysis expression recognition: accuracy improves by 5% relative to VGG-finetune, by 2.5% relative to Inception V3, and by 5% relative to ResNet 34.
Claims (6)
1. A facial expression recognition model building method based on a deep hierarchical network, characterized in that it is executed according to the following steps:
Step 1: obtain several images containing faces, obtaining an original image set;
Step 2: for every original image in the original image set, perform face-region interception preprocessing and then image size normalization, obtaining a facial expression image set;
Step 3: obtain the label corresponding to each image in the facial expression image set, obtaining a label set; the labels comprise expression labels and face labels;
Step 4: train a deep hierarchical network with the facial expression image set as input and the label set as output, obtaining a facial expression recognition model;
the deep hierarchical network comprises sequentially arranged multiple convolutional layers, multiple fully connected layers and a double-layer tree classifier;
the double-layer tree classifier comprises an expression recognition classification layer and a face recognition classification layer; the expression recognition classification layer outputs the expression label, and the face recognition classification layer outputs the face label.
2. The facial expression recognition model building method based on a deep hierarchical network according to claim 1, characterized in that in step 4 the deep hierarchical network is trained using the error back-propagation method to obtain the facial expression recognition model.
3. The facial expression recognition model building method based on a deep hierarchical network according to claim 2, characterized in that when the deep hierarchical network is trained using the error back-propagation method, Formula I and Formula II are used to calculate the partial derivative of the expression recognition classification layer weight parameter and the partial derivative of the face recognition classification layer weight parameter, respectively:

∂£/∂W_i = ( exp(W_i·x_e + b_e) / Σ_{i'=1}^{M} exp(W_{i'}·x_e + b_e) − [i = m] ) x_e    (Formula I)

∂£/∂V_j = ( exp(V_j·x_f + b_f) / Σ_{j'=1}^{N} exp(V_{j'}·x_f + b_f) − [j = n] ) x_f    (Formula II)

wherein ∂£/∂W is the partial derivative of the expression recognition classification layer weight parameter, £ is the loss function, W is the expression recognition classification layer weight parameter, x_e is the input feature vector of the expression recognition classification layer; ∂£/∂V is the partial derivative of the face recognition classification layer weight parameter, V is the face recognition classification layer weight parameter, and x_f is the feature vector input to the face recognition classification layer; [·] denotes the indicator of the true label;

the loss function £ is obtained using Formula III:

£ = −ln( exp(W_m·x_e + b_e) / Σ_{i=1}^{M} exp(W_i·x_e + b_e) ) − ln( exp(V_n·x_f + b_f) / Σ_{j=1}^{N} exp(V_j·x_f + b_f) )    (Formula III)

wherein V_n denotes the weight parameter of the n-th face label of the face recognition classification layer, V_j denotes the weight parameter of the j-th face label of the face recognition classification layer, j = 1, 2, …, N, n = 1, 2, …, N, N denotes the total number of face labels and is a positive integer, and b_f denotes the face recognition offset parameter, a natural number;
W_m denotes the weight parameter of the m-th expression label of the expression recognition classification layer, W_i denotes the weight parameter of the i-th expression label of the expression recognition classification layer, i = 1, 2, …, M, m = 1, 2, …, M, M denotes the total number of expression labels and is a positive integer, and b_e denotes the expression recognition offset parameter, a natural number.
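The analytic partial derivatives of Formulas I and II can be checked numerically against the Formula III loss. The NumPy sketch below follows the same symbol conventions (M expression labels with weights W, N face labels with weights V), assuming Formula III is the summed softmax cross-entropy over the two classification layers; the label counts, feature size and random values are hypothetical:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def loss(W, b_e, x_e, m, V, b_f, x_f, n):
    """Formula III as a summed softmax cross-entropy (our assumption)."""
    return (-np.log(softmax(W @ x_e + b_e)[m])
            - np.log(softmax(V @ x_f + b_f)[n]))

M, N, D = 3, 2, 4                        # hypothetical label counts and feature size
rng = np.random.default_rng(1)
W, V = rng.normal(size=(M, D)), rng.normal(size=(N, D))
b_e, b_f = np.zeros(M), np.zeros(N)
x_e, x_f = rng.normal(size=D), rng.normal(size=D)
m, n = 2, 0                              # true expression / face labels

# analytic gradient of Formula I: (softmax - onehot) outer feature
dW = np.outer(softmax(W @ x_e + b_e) - np.eye(M)[m], x_e)

# numerical check of one entry of dW by central differences
eps = 1e-6
Wp, Wm_ = W.copy(), W.copy()
Wp[0, 0] += eps
Wm_[0, 0] -= eps
num = (loss(Wp, b_e, x_e, m, V, b_f, x_f, n)
       - loss(Wm_, b_e, x_e, m, V, b_f, x_f, n)) / (2 * eps)
```

The same check applies symmetrically to ∂£/∂V of Formula II, since the two terms of Formula III are independent.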
4. The facial expression recognition model building method based on a deep hierarchical network according to claim 1, characterized in that the deep hierarchical network in step 4 comprises 5 sequentially arranged convolutional layers, 3 fully connected layers and 1 double-layer tree classifier.
5. The facial expression recognition model building method based on a deep hierarchical network according to claim 1, characterized in that the face labels in step 3 comprise a facial paralysis label and a non-facial paralysis label.
6. A facial expression recognition method based on a deep hierarchical network, characterized in that a face image to be recognized is processed using steps A and B to obtain a recognition result:
Step A: process the face image to be recognized using the method of step 2 in claim 1 to obtain a facial expression image;
Step B: recognize the facial expression image obtained in step A using the facial expression recognition model according to any one of claims 1-5 to obtain the recognition result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910307984.0A CN110163098A (en) | 2019-04-17 | 2019-04-17 | Based on the facial expression recognition model construction of depth of seam division network and recognition methods |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110163098A true CN110163098A (en) | 2019-08-23 |
Family
ID=67639391
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910307984.0A Pending CN110163098A (en) | 2019-04-17 | 2019-04-17 | Based on the facial expression recognition model construction of depth of seam division network and recognition methods |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110163098A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111126180A (en) * | 2019-12-06 | 2020-05-08 | 四川大学 | Facial paralysis severity automatic detection system based on computer vision |
CN112001213A (en) * | 2020-04-25 | 2020-11-27 | 深圳德技创新实业有限公司 | Accurate facial paralysis degree evaluation method and device based on 3D point cloud segmentation |
CN112768065A (en) * | 2021-01-29 | 2021-05-07 | 北京大学口腔医学院 | Facial paralysis grading diagnosis method and device based on artificial intelligence |
CN116051682A (en) * | 2022-11-30 | 2023-05-02 | 四川省中车铁投轨道交通有限公司 | Intelligent tramcar chassis fault detection method and system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016026063A1 (en) * | 2014-08-21 | 2016-02-25 | Xiaoou Tang | A method and a system for facial landmark detection based on multi-task |
KR20160037423A (en) * | 2014-09-29 | 2016-04-06 | 동명대학교산학협력단 | A Hybrid Method based on Dynamic Compensatory Fuzzy Neural Network Algorithm for Face Recognition |
CN106803069A (en) * | 2016-12-29 | 2017-06-06 | 南京邮电大学 | Crowd's level of happiness recognition methods based on deep learning |
CN109376692A (en) * | 2018-11-22 | 2019-02-22 | 河海大学常州校区 | Migration convolution neural network method towards facial expression recognition |
CN109583322A (en) * | 2018-11-09 | 2019-04-05 | 长沙小钴科技有限公司 | A kind of recognition of face depth network training method and system |
Non-Patent Citations (1)
Title |
---|
Peng Xianlin et al., "Face/facial-paralysis expression recognition method based on a multi-task deep convolutional neural network", Journal of Northwest University (Natural Science Edition) * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110321833B (en) | Human body behavior identification method based on convolutional neural network and cyclic neural network | |
US20190042952A1 (en) | Multi-task Semi-Supervised Online Sequential Extreme Learning Method for Emotion Judgment of User | |
EP4242923A2 (en) | Compact language-free facial expression embedding and novel triplet training scheme | |
Kadam et al. | CNN model for image classification on MNIST and fashion-MNIST dataset | |
CN110163098A (en) | Based on the facial expression recognition model construction of depth of seam division network and recognition methods | |
Wu et al. | Enhancing TripleGAN for semi-supervised conditional instance synthesis and classification | |
CN109522925A (en) | A kind of image-recognizing method, device and storage medium | |
Ali et al. | Facial emotion detection using neural network | |
Ocquaye et al. | Dual exclusive attentive transfer for unsupervised deep convolutional domain adaptation in speech emotion recognition | |
CN113392766A (en) | Attention mechanism-based facial expression recognition method | |
CN108446676A (en) | Facial image age method of discrimination based on orderly coding and multilayer accidental projection | |
Jiang et al. | Heterogenous-view occluded expression data recognition based on cycle-consistent adversarial network and K-SVD dictionary learning under intelligent cooperative robot environment | |
CN109508640A (en) | A kind of crowd's sentiment analysis method, apparatus and storage medium | |
CN110414626A (en) | A kind of pig variety ecotype method, apparatus and computer readable storage medium | |
Depuru et al. | Convolutional neural network based human emotion recognition system: A deep learning approach | |
CN114492634B (en) | Fine granularity equipment picture classification and identification method and system | |
CN116258990A (en) | Cross-modal affinity-based small sample reference video target segmentation method | |
He et al. | Global and local fusion ensemble network for facial expression recognition | |
Wang et al. | Fusion network for face-based age estimation | |
CN111783852B (en) | Method for adaptively generating image description based on deep reinforcement learning | |
Halkias et al. | Sparse penalty in deep belief networks: using the mixed norm constraint | |
Tripathi | Facial emotion recognition using convolutional neural network | |
CN109583406B (en) | Facial expression recognition method based on feature attention mechanism | |
He et al. | Dual multi-task network with bridge-temporal-attention for student emotion recognition via classroom video | |
Mallet et al. | Hybrid Deepfake Detection Utilizing MLP and LSTM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190823 ||