CN111382684B - Angle-robust personalized facial expression recognition method based on adversarial learning - Google Patents
Angle-robust personalized facial expression recognition method based on adversarial learning
- Publication number
- CN111382684B (granted publication of application CN202010136966.3A, published as CN111382684A)
- Authority
- CN
- China
- Prior art keywords
- expression
- sample
- domain
- angle
- source domain
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06V40/174—Facial expression recognition (G Physics › G06 Computing; Calculating or Counting › G06V Image or Video Recognition or Understanding › G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data › G06V40/10 Human or animal bodies › G06V40/16 Human faces)
- G06F18/214—Generating training patterns; bootstrap methods, e.g. bagging or boosting (G › G06 › G06F Electric Digital Data Processing › G06F18/00 Pattern recognition › G06F18/20 Analysing › G06F18/21 Design or setup of recognition systems or techniques; extraction of features in feature space; blind source separation)
- G06N3/045—Combinations of networks (G › G06 › G06N Computing Arrangements Based on Specific Computational Models › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks › G06N3/04 Architecture, e.g. interconnection topology)
- G06V40/168—Feature extraction; face representation (G › G06 › G06V › G06V40/00 › G06V40/10 › G06V40/16)
Abstract
The invention discloses an angle-robust personalized facial expression recognition method based on adversarial learning, comprising the following steps: 1. preprocessing the images of a database containing N classes of facial expressions; 2. constructing a feature-decoupling and domain-adaptive network model based on adversarial learning; 3. training the constructed network model by alternating iterative optimization; 4. predicting the facial image to be recognized with the trained model to classify its facial expression. The invention simultaneously overcomes the negative influence that viewing angle and inter-individual differences exert on facial expression recognition, thereby achieving accurate recognition of facial expressions.
Description
Technical Field
The invention relates to the technical field of computer vision, and in particular to an angle-robust personalized facial expression recognition method based on adversarial learning.
Background
Facial expression recognition is an important research topic in computer vision, with wide application in human-computer interaction, fatigue detection, criminal investigation and medical care. Most current facial expression recognition methods assume a frontal face image, but in practical scenarios the user's position relative to the camera is not fixed and scenes vary, so only facial expression recognition that works under multi-angle conditions can meet practical requirements. In recent years, researchers have therefore proposed methods to handle the influence of viewing angle on facial expression recognition. Depending on how angle changes are handled, these methods fall into three categories: view-specific classifier methods, single-classifier methods, and angle normalization methods. View-specific classifier methods are intuitive: a separate classifier is trained for the samples of each angle; however, limited by scarce training samples, a robust classifier cannot be learned for every angle. Single-classifier methods attempt to learn one more robust classifier from a large number of samples; thanks to generative adversarial networks and variational autoencoders, sample generation can supply richer and more diverse training samples for classifier learning. However, generating high-quality samples is hard to guarantee, and low-quality generated samples instead introduce noise into classifier learning and degrade its performance. Angle normalization methods map face samples or feature representations of arbitrary angle to frontal-face samples or feature representations, preserving the identity of the individual and the invariance of the expression content during the mapping. However, such methods rely on paired training samples: for each non-frontal sample of an individual, a corresponding frontal sample of that individual must exist, which severely restricts their practical use.
Besides viewing angle, inter-individual differences are another important factor affecting facial expression recognition. Owing to differences in face shape, personality, appearance and so on, different individuals express the same emotion very differently, which seriously harms recognition. For example, for "happiness", some people tend to laugh openly while others only smile faintly; although both belong to the expression "happy", their pixel-level appearances differ greatly, making feature learning difficult. In addition, individuals differ in appearance, which further challenges expression analysis. Individual-robust facial expression recognition can be addressed by subject-specific methods, i.e. personalized facial expression recognition. Subject-specific methods aim to build a dedicated classifier for a particular individual, so that the learned classifier concentrates on a single individual and avoids the bias introduced by learning from other individuals. However, limited by the sample size available for a single individual, it is difficult to learn a well-performing facial expression classifier this way.
Disclosure of Invention
The invention provides an angle-robust personalized facial expression recognition method based on adversarial learning, aiming to overcome the influence of viewing angle and inter-individual differences in facial expression recognition and thereby improve the recognition rate.
In order to achieve the purpose, the invention adopts the following technical scheme:
The angle-robust personalized facial expression recognition method based on adversarial learning of the invention is characterized by comprising the following steps:
Step 1, preprocess the images of a database containing N classes of facial expressions:
Perform face detection and alignment on all facial expression images in the database with the MTCNN (Multi-task Cascaded Convolutional Networks) algorithm to obtain a normalized face image data set, which serves as the sample set;
Randomly divide the sample set, using the individuals in the database as the division criterion, into a source domain data set S and a target domain data set T; let any sample in the source domain data set S be x_s, let the expression label of x_s be y_s, and let the angle label of x_s be p_s; let any sample in the target domain data set T be x_t;
Step 2, construct a feature-decoupling and domain-adaptive network model based on adversarial learning, comprising: a source domain feature extractor E_s and a target domain feature extractor E_t, an angle classifier D_p and an expression classifier R, an angle domain discriminator D_dp and an expression domain discriminator D_de, and a source domain image generator G_s and a target domain image generator G_t;
The source domain feature extractor E_s and the target domain feature extractor E_t share the same network structure, consisting in sequence of an input convolutional layer, M downsampling convolutional layers, Q residual convolutional layers, and two branches of W convolutional layers each; every convolutional layer is followed by an instance normalization layer and a ReLU activation function;
The angle classifier D_p, the expression classifier R, the angle domain discriminator D_dp and the expression domain discriminator D_de are each an H-layer fully connected network;
The source domain image generator G_s and the target domain image generator G_t share the same network structure, consisting in sequence of an input convolutional layer, J upsampling deconvolution (transposed convolution) layers, and an output convolutional layer; every convolutional layer before the output layer is followed by an instance normalization layer and a ReLU activation function, and the output convolutional layer is followed by a Tanh activation function;
Initialize the weights of all convolutional, deconvolution and fully connected layers in the adversarial-learning-based feature-decoupling and domain-adaptive network model with a Gaussian distribution;
Step 3, train the adversarial-learning-based feature-decoupling and domain-adaptive network model with four learning strategies: a supervised learning strategy, an adversarial domain-adaptive learning strategy, a cross-adversarial feature-decoupling learning strategy, and an image reconstruction learning strategy;
Step 3.1, supervised learning strategy:
Step 3.1.1, input any sample x_s in the source domain into the source domain feature extractor E_s to obtain two feature vectors f_s^e and f_s^p, where f_s^e denotes the expression-related feature of sample x_s and f_s^p denotes its angle-related feature;
Step 3.1.2, input the angle-related feature f_s^p of x_s into the angle classifier D_p for angle recognition to obtain the angle category of x_s;
Establish the angle recognition loss function l_p(E_s, D_p) by formula (1):
In formula (1), Sup(·) denotes a supervised loss function;
Step 3.1.3, input the expression-related feature f_s^e of x_s into the expression classifier R for expression recognition to obtain the expression category of x_s;
Establish the expression recognition loss function l_e(E_s, R) by formula (2):
Step 3.2, adversarial domain-adaptive learning strategy:
Step 3.2.1, input any sample x_t in the target domain into the target domain feature extractor E_t to obtain two feature vectors f_t^e and f_t^p, where f_t^e denotes the expression-related feature of sample x_t and f_t^p denotes its angle-related feature;
Step 3.2.2, input the angle-related feature f_s^p of the source-domain sample x_s, or the angle-related feature f_t^p of the target-domain sample x_t, into the angle domain discriminator D_dp to obtain the result of discriminating the angle-related feature f_s^p as real or the angle-related feature f_t^p as fake;
Step 3.2.3, input the expression-related feature f_s^e of the source-domain sample x_s, or the expression-related feature f_t^e of the target-domain sample x_t, into the expression domain discriminator D_de to obtain the result of discriminating the expression-related feature f_s^e as real or the expression-related feature f_t^e as fake;
Step 3.2.4, establish the adversarial learning loss function l_adv(E_s, E_t, D_dp, D_de) by formula (3):
Step 3.3, cross-adversarial feature-decoupling learning strategy:
Step 3.3.1, input the angle-related feature f_s^p of the source-domain sample x_s into the expression classifier R to obtain its expression classification result;
Input the expression-related feature f_s^e of x_s into the angle classifier D_p to obtain its angle classification result;
Step 3.3.2, establish by formula (4) the classification loss function of the expression classifier R on the angle-related feature f_s^p and of the angle classifier D_p on the expression-related feature f_s^e:
Step 3.4, image reconstruction learning strategy:
Step 3.4.1, combine the angle-related feature f_s^p of the source-domain sample x_s with the expression-related feature f_t^e of the target-domain sample x_t and input them into the source domain image generator G_s to generate a reconstructed image in the source domain;
Step 3.4.2, combine the angle-related feature f_t^p of the target-domain sample x_t with the expression-related feature f_s^e of the source-domain sample x_s and input them into the target domain image generator G_t to generate a reconstructed image in the target domain;
Step 3.4.3, establish the reconstructed-image constraint l_clc(E_s, E_t, G_s, G_t) by formula (5):
In formula (5), x_s' denotes another sample in the source domain data set S that has the same angle label as sample x_s and the same expression label as sample x_t; x_t' denotes another sample in the target domain data set T that has the same angle label as sample x_t and the same expression label as sample x_s;
Step 4, construct the overall loss function and train the adversarial-learning-based feature-decoupling and domain-adaptive network model by alternating iterative optimization to obtain the optimal facial expression recognition model:
Step 4.1, construct the overall objective function by formula (6):
In formula (6), α, β, η and λ are all weighting factors;
Step 4.2, set the total number of training steps to K_1, with current total step count k_1;
Set the numbers of optimization steps at the three inner optimization stages to K_2, K_3 and K_4, with corresponding current step counts k_2, k_3 and k_4;
Set the number of samples drawn in each training batch to B;
Initialize k_1, k_2, k_3 and k_4 all to 0;
Step 4.3, at the k_2-th inner iteration of the k_1-th outer iteration, randomly draw B samples from each of the source domain data set S and the target domain data set T as the source domain and target domain training samples for this iteration;
Step 4.4, optimize the source domain feature extractor E_s and the expression classifier R by formula (7) to obtain the corresponding gradient of this iteration;
Step 4.5, optimize the source domain feature extractor E_s by formula (8) to obtain the corresponding gradient of this iteration;
Step 4.6, optimize the source domain feature extractor E_s and the target domain feature extractor E_t by formula (9) to obtain the corresponding gradient of this iteration;
Step 4.7, assign k_2 + 1 to k_2, then judge whether k_2 ≥ K_2 holds; if so, execute step 4.8, otherwise return to step 4.3 and continue in sequence;
Step 4.8, at the k_3-th inner iteration of the k_1-th outer iteration, randomly draw B samples from each of the source domain data set S and the target domain data set T as the source domain and target domain training samples for this iteration;
Step 4.9, optimize the source domain feature extractor E_s, the target domain feature extractor E_t, the source domain image generator G_s and the target domain image generator G_t by formula (10) to obtain the corresponding gradient of this iteration;
Step 4.10, assign k_3 + 1 to k_3, then judge whether k_3 ≥ K_3 holds; if so, execute step 4.11, otherwise return to step 4.8 and continue in sequence;
Step 4.11, at the k_4-th inner iteration of the k_1-th outer iteration, randomly draw B samples from each of the source domain data set S and the target domain data set T as the source domain and target domain training samples for this iteration;
Step 4.12, optimize the source domain feature extractor E_s and the angle classifier D_p by formula (11) to obtain the corresponding gradient of this iteration;
Step 4.13, optimize the expression domain discriminator D_de and the angle domain discriminator D_dp by formula (12) to obtain the corresponding gradient of this iteration;
Step 4.14, assign k_4 + 1 to k_4, then judge whether k_4 ≥ K_4 holds; if so, execute step 4.15, otherwise return to step 4.11 and continue in sequence;
Step 4.15, assign k_1 + 1 to k_1, then judge whether k_1 ≥ K_1 holds or the algorithm has converged; if so, training ends and the optimal facial expression recognition model is obtained for classifying facial expressions; otherwise return to step 4.3 and continue in sequence.
Compared with the prior art, the invention has the following beneficial effects:
1. By proposing the cross-adversarial feature-decoupling learning strategy, the invention decouples expression-related features from angle-related features, so that the expression-related features contain no angle information irrelevant to expression recognition and the angle-related features contain no expression information irrelevant to angle recognition. This overcomes the limitations of existing angle-robust facial expression recognition methods, such as being constrained by sample diversity or depending on high-quality face image generation, and achieves more angle-robust facial expression recognition.
2. By proposing the adversarial domain-adaptive learning strategy, the invention effectively transfers source domain information to the target domain, benefiting the target-domain facial expression recognition task and overcoming the shortcoming of traditional personalized methods, which are limited by the small number of target-domain samples. The strategy requires no expression or angle annotation of the target domain, which improves usability in practical environments and effectively copes with the influence of inter-individual differences on recognition.
3. By proposing the reconstruction learning strategy, the invention further improves the performance of cross-adversarial feature-decoupling learning and adversarial domain-adaptive learning, and thereby further improves the facial expression recognition effect.
4. The invention designs an alternating iterative optimization method that carries out supervised learning, cross-adversarial feature-decoupling learning, adversarial domain-adaptive learning and reconstruction learning simultaneously, realizing end-to-end training and prediction, reducing manual intervention, letting the learning strategies complement each other, jointly learning angle- and individual-robust features, and optimizing the learning of expression-related features.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a model block diagram of the present invention;
FIG. 3 shows reconstruction results of the invention on the Multi-PIE and BU-3DFE databases.
Detailed Description
In this embodiment, as shown in FIG. 1, an angle-robust personalized facial expression recognition method based on adversarial learning proceeds as follows:
Step 1, preprocess the images of a database containing N classes of facial expressions:
Perform face detection and alignment on all facial expression images in the database with the MTCNN (Multi-task Cascaded Convolutional Networks) algorithm to obtain a normalized face image data set, which serves as the sample set; in this embodiment, all normalized face images are 128 × 128 pixels;
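For illustration only (the patent does not name a concrete implementation), the detection-and-alignment step might look like the following sketch, which assumes the third-party facenet-pytorch package, JPEG inputs, and the hypothetical folder names raw_faces / normalized_faces:

```python
from pathlib import Path
from PIL import Image
from facenet_pytorch import MTCNN

# image_size=128 matches the 128 x 128 normalization used in this embodiment
mtcnn = MTCNN(image_size=128, margin=0, post_process=False)

def preprocess(src_dir: str, dst_dir: str) -> None:
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for img_path in sorted(Path(src_dir).glob("*.jpg")):
        img = Image.open(img_path).convert("RGB")
        # passing save_path makes MTCNN write the aligned 128 x 128 crop to disk;
        # images in which no face is detected are skipped (the call returns None)
        mtcnn(img, save_path=str(Path(dst_dir) / img_path.name))

preprocess("raw_faces", "normalized_faces")
```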
Randomly divide the sample set, using the individuals in the database as the division criterion, into a source domain data set S and a target domain data set T; let any sample in the source domain data set S be x_s, let the expression label of x_s be y_s, and let the angle label of x_s be p_s; let any sample in the target domain data set T be x_t; the target domain samples carry no expression or angle annotation;
In this embodiment, as shown in FIG. 3, the Multi-PIE and BU-3DFE facial expression databases are used. The Multi-PIE facial expression database contains 755,370 face images collected from 337 volunteers at 13 angles, from −90° to 90° in 15° intervals, with the expressions labeled: smile, surprise, squint, disgust, scream and neutral. The BU-3DFE facial expression database contains 100 3D models, from 56 female and 44 male subjects; samples at arbitrary angles can be obtained by rotating the 3D models, and the expressions are labeled: anger, disgust, fear, happiness, neutral, sadness and surprise.
Step 2, as shown in FIG. 2, construct a feature-decoupling and domain-adaptive network model based on adversarial learning, comprising: a source domain feature extractor E_s and a target domain feature extractor E_t, an angle classifier D_p and an expression classifier R, an angle domain discriminator D_dp and an expression domain discriminator D_de, and a source domain image generator G_s and a target domain image generator G_t;
The source domain feature extractor E_s and the target domain feature extractor E_t share the same network structure, consisting in sequence of an input convolutional layer (kernel size 7 × 7, 3 filters, stride 2, padding 3), M downsampling convolutional layers (here M = 4; kernel size 4 × 4, stride 2, padding 1, with 64, 32, 16 and 8 filters respectively), Q residual convolutional layers (here Q = 3; kernel size 3 × 3, 8 filters, stride 2, padding 1), and two branches of W convolutional layers each (here W = 2; kernel size 3 × 3, 8 filters, stride 2, padding 1); every convolutional layer is followed by an instance normalization layer and a ReLU activation function;
The angle classifier D_p, the expression classifier R, the angle domain discriminator D_dp and the expression domain discriminator D_de are each an H-layer fully connected network with input length 512 (here H is set to 3);
The source domain image generator G_s and the target domain image generator G_t share the same network structure, consisting in sequence of an input convolutional layer (kernel size 7 × 7, 8 filters, stride 1, padding 3), J upsampling deconvolution layers (here J = 4; kernel size 4 × 4, stride 2, padding 1, with 8, 16, 32 and 64 filters respectively), and an output convolutional layer (kernel size 7 × 7, 3 filters, stride 1, padding 3); every convolutional layer before the output layer is followed by an instance normalization layer and a ReLU activation function, and the output convolutional layer is followed by a Tanh activation function;
Initialize the weights of all convolutional, deconvolution and fully connected layers in the adversarial-learning-based feature-decoupling and domain-adaptive network model with a Gaussian distribution N(0, 0.02);
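As a non-authoritative PyTorch sketch of the modules just described: the layer counts, kernel sizes, strides and filter numbers follow this embodiment, while the residual skip connection, the hidden widths of the fully connected heads, and the stride-1 input convolution (chosen so that a 128-pixel input yields the 8 × 8 × 8 feature maps quoted below; the text says stride 2) are assumptions:

```python
import torch
import torch.nn as nn

def conv_in_relu(c_in, c_out, k, s, p):
    """Convolution followed by instance normalization and ReLU, as specified."""
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, s, p),
                         nn.InstanceNorm2d(c_out), nn.ReLU(inplace=True))

class ResBlock(nn.Module):
    """Residual convolutional layer (the skip connection is an assumption)."""
    def __init__(self, c):
        super().__init__()
        self.conv = conv_in_relu(c, c, 3, 1, 1)
    def forward(self, x):
        return x + self.conv(x)

class FeatureExtractor(nn.Module):
    """E_s / E_t: input conv, M=4 downsampling convs, Q=3 residual layers,
    then two W=2-layer branches for the expression (f^e) and angle (f^p) features."""
    def __init__(self):
        super().__init__()
        self.stem = conv_in_relu(3, 3, 7, 1, 3)          # 128 -> 128
        chans, downs = 3, []
        for n in (64, 32, 16, 8):                        # M = 4, stride 2 each
            downs.append(conv_in_relu(chans, n, 4, 2, 1))
            chans = n
        self.down = nn.Sequential(*downs)                # 128 -> 8 spatial
        self.res = nn.Sequential(*[ResBlock(8) for _ in range(3)])   # Q = 3
        self.branch_e = nn.Sequential(*[conv_in_relu(8, 8, 3, 1, 1) for _ in range(2)])
        self.branch_p = nn.Sequential(*[conv_in_relu(8, 8, 3, 1, 1) for _ in range(2)])
    def forward(self, x):
        h = self.res(self.down(self.stem(x)))
        return self.branch_e(h), self.branch_p(h)        # f^e, f^p: 8 x 8 x 8 each

class MLPHead(nn.Module):
    """D_p / R / D_dp / D_de: H=3 fully connected layers on the 512-d input."""
    def __init__(self, n_out):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(512, 256), nn.ReLU(inplace=True),
                                 nn.Linear(256, 128), nn.ReLU(inplace=True),
                                 nn.Linear(128, n_out))
    def forward(self, f):
        return self.net(f)

class Generator(nn.Module):
    """G_s / G_t: input conv, J=4 upsampling deconvs, output conv with Tanh."""
    def __init__(self):
        super().__init__()
        layers, chans = [conv_in_relu(16, 8, 7, 1, 3)], 8   # depth-concatenated input
        for n in (8, 16, 32, 64):                            # J = 4, stride 2 each
            layers += [nn.ConvTranspose2d(chans, n, 4, 2, 1),
                       nn.InstanceNorm2d(n), nn.ReLU(inplace=True)]
            chans = n
        layers += [nn.Conv2d(64, 3, 7, 1, 3), nn.Tanh()]     # 8 -> 128 spatial
        self.net = nn.Sequential(*layers)
    def forward(self, f_p, f_e):
        return self.net(torch.cat([f_p, f_e], dim=1))

def init_weights(m):
    """Gaussian N(0, 0.02) initialization of conv/deconv/linear weights."""
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
```

A model would then be initialized with, e.g., `FeatureExtractor().apply(init_weights)`.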
Step 3, train the adversarial-learning-based feature-decoupling and domain-adaptive network model with four learning strategies: a supervised learning strategy, an adversarial domain-adaptive learning strategy, a cross-adversarial feature-decoupling learning strategy, and an image reconstruction learning strategy;
Step 3.1, supervised learning strategy:
Step 3.1.1, input any sample x_s in the source domain into the source domain feature extractor E_s to obtain two feature vectors f_s^e and f_s^p, where f_s^e denotes the expression-related feature of sample x_s and f_s^p denotes its angle-related feature; both features are obtained by flattening the feature maps output by the convolutional branches and have dimension 512;
Step 3.1.2, input the angle-related feature f_s^p of x_s into the angle classifier D_p for angle recognition to obtain the angle category of x_s;
Establish the angle recognition loss function l_p(E_s, D_p) by formula (1):
In formula (1), Sup(·) denotes a supervised loss function; squared loss, Softmax loss, cross-entropy loss and the like can be used;
Step 3.1.3, input the expression-related feature f_s^e of x_s into the expression classifier R for expression recognition to obtain the expression category of x_s;
Establish the expression recognition loss function l_e(E_s, R) by formula (2):
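Assuming Sup(·) is instantiated as cross-entropy (one of the options listed above), formulas (1) and (2) could be computed as follows, reusing the modules from the previous sketch:

```python
import torch.nn.functional as F

def supervised_losses(E_s, D_p, R, x_s, y_s, p_s):
    """Formulas (1) and (2) with Sup(.) taken as cross-entropy (an assumption)."""
    f_e, f_p = E_s(x_s)                         # 8 x 8 x 8 feature maps
    f_e, f_p = f_e.flatten(1), f_p.flatten(1)   # -> 512-d vectors
    l_p = F.cross_entropy(D_p(f_p), p_s)        # formula (1): angle recognition loss
    l_e = F.cross_entropy(R(f_e), y_s)          # formula (2): expression recognition loss
    return l_p, l_e
```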
Step 3.2, adversarial domain-adaptive learning strategy:
Step 3.2.1, input any sample x_t in the target domain into the target domain feature extractor E_t to obtain two feature vectors f_t^e and f_t^p, where f_t^e denotes the expression-related feature of sample x_t and f_t^p denotes its angle-related feature; likewise, both features are obtained by flattening the feature maps output by the convolutional branches and have dimension 512;
Step 3.2.2, introduce the adversarial domain-adaptive learning strategy to reduce the inter-domain distribution discrepancy between f_s^p and f_t^p. Specifically, input the angle-related feature f_s^p of the source-domain sample x_s, or the angle-related feature f_t^p of the target-domain sample x_t, into the angle domain discriminator D_dp to obtain the result of discriminating f_s^p as real or f_t^p as fake. While the angle domain discriminator D_dp tries its best to distinguish f_s^p from f_t^p, the source domain feature extractor E_s and the target domain feature extractor E_t try their best to produce f_s^p and f_t^p that D_dp cannot tell apart; thus E_s and E_t form an adversarial relationship with D_dp.
Step 3.2.3, likewise introduce the adversarial domain-adaptive learning strategy to reduce the inter-domain distribution discrepancy between f_s^e and f_t^e. Specifically, input the expression-related feature f_s^e of the source-domain sample x_s, or the expression-related feature f_t^e of the target-domain sample x_t, into the expression domain discriminator D_de to obtain the result of discriminating f_s^e as real or f_t^e as fake. While the expression domain discriminator D_de tries its best to distinguish f_s^e from f_t^e, E_s and E_t try their best to produce f_s^e and f_t^e that D_de cannot tell apart; thus E_s and E_t form an adversarial relationship with D_de.
Step 3.2.4, establish the adversarial learning loss function l_adv(E_s, E_t, D_dp, D_de) by formula (3):
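The text does not reproduce formula (3), so the following sketch instantiates it as the standard binary cross-entropy GAN objective, which matches the real/fake discrimination just described; the exact form used in the patent may differ:

```python
import torch
import torch.nn.functional as F

def _bce(logits, target_val):
    target = torch.full_like(logits, target_val)
    return F.binary_cross_entropy_with_logits(logits, target)

def discriminator_loss(D_dp, D_de, f_s_p, f_t_p, f_s_e, f_t_e):
    # D_dp / D_de learn to score source-domain features as real (1), target as fake (0)
    return (_bce(D_dp(f_s_p.detach()), 1.0) + _bce(D_dp(f_t_p.detach()), 0.0) +
            _bce(D_de(f_s_e.detach()), 1.0) + _bce(D_de(f_t_e.detach()), 0.0))

def extractor_adv_loss(D_dp, D_de, f_t_p, f_t_e):
    # E_s / E_t are updated so that target features look "real" to the discriminators
    return _bce(D_dp(f_t_p), 1.0) + _bce(D_de(f_t_e), 1.0)
```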
Step 3.3, countermeasure characteristic decoupling learning strategy:
step 3.3.1, sample x in Source Domain s Angle-dependent characteristic f of s p Inputting the expression into a classifier R to obtain a sample x in a source domain s The expression classification result of (1);
sample x in the source domain s Expression-related feature f of s e Input angle classifier D p In (1), obtain a sample x in the source domain s The angle classification result of (1);
step 3.3.2, establishing expression classifier R for angle correlation characteristic f by using formula (4) s p And an angle classifier D p For expression-related feature f s e Is classified as a loss function
By optimizing this loss, the expression classifier R cannot correlate features f to angles s p Recognizing expression information and enabling an angle classifier D p Inability to characterize the situational related features f s e Identifying angle information so that the angle-related feature f s p There is no expression information independent of angle, so that the expression-related feature f s e Angle information irrelevant to the expression does not exist, and the decoupling of the angle and the expression information is realized;
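Formula (4) is likewise not reproduced here; one common way to realize the described behavior, shown below purely as an assumption, is to drive the cross predictions toward the uniform distribution by maximizing their entropy:

```python
import torch
import torch.nn.functional as F

def cross_decoupling_loss(R, D_p, f_s_p, f_s_e):
    """A hypothetical realization of formula (4): R applied to the angle
    feature and D_p applied to the expression feature should both be
    maximally uninformative, enforced here via negative entropy."""
    probs_expr = F.softmax(R(f_s_p), dim=1)
    probs_angle = F.softmax(D_p(f_s_e), dim=1)
    entropy = lambda p: -(p * torch.log(p.clamp_min(1e-8))).sum(dim=1).mean()
    # minimizing the negative entropy pushes both predictions toward uniform
    return -(entropy(probs_expr) + entropy(probs_angle))
```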
Step 3.4, image reconstruction learning strategy:
Step 3.4.1, combine the angle-related feature f_s^p of the source-domain sample x_s with the expression-related feature f_t^e of the target-domain sample x_t and input them into the source domain image generator G_s to generate a reconstructed image in the source domain; here the features f_s^p and f_t^e are the feature maps output by the convolutional layers, not flattened, of size 8 × 8 × 8; they are concatenated directly along the depth dimension to give an 8 × 8 × 16 feature map;
Step 3.4.2, combine the angle-related feature f_t^p of the target-domain sample x_t with the expression-related feature f_s^e of the source-domain sample x_s and input them into the target domain image generator G_t to generate a reconstructed image in the target domain; likewise the features f_t^p and f_s^e are the unflattened 8 × 8 × 8 feature maps, concatenated along the depth dimension to give an 8 × 8 × 16 feature map;
Step 3.4.3, establish the reconstructed-image constraint l_clc(E_s, E_t, G_s, G_t) by formula (5):
In formula (5), x_s' denotes another sample in the source domain data set S that has the same angle label as sample x_s and the same expression label as sample x_t; x_t' denotes another sample in the target domain data set T that has the same angle label as sample x_t and the same expression label as sample x_s. This requires the angle and expression annotation of sample x_t, but the target domain data set has no expression or angle labels, so the angle and expression information of x_t is obtained through pseudo-labels, i.e. the pseudo angle label and pseudo expression label predicted for x_t;
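Under the assumption of an L1 pixel penalty (the text does not state the norm), the constraint of formula (5) could be sketched as:

```python
import torch.nn.functional as F

def reconstruction_loss(G_s, G_t, f_s_p, f_s_e, f_t_p, f_t_e, x_s_ref, x_t_ref):
    """Formula (5) with an assumed L1 penalty; x_s_ref = x_s' and
    x_t_ref = x_t' are the reference samples described above."""
    rec_s = G_s(f_s_p, f_t_e)   # source angle + target expression -> source-domain image
    rec_t = G_t(f_t_p, f_s_e)   # target angle + source expression -> target-domain image
    return F.l1_loss(rec_s, x_s_ref) + F.l1_loss(rec_t, x_t_ref)
```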
Step 4, construct the overall loss function and train the adversarial-learning-based feature-decoupling and domain-adaptive network model by alternating iterative optimization to obtain the optimal facial expression recognition model:
Step 4.1, construct the overall objective function by formula (6):
In formula (6), α, β, η and λ are all weighting factors; in this example the four weighting factors are set to 2.0, 3.0, 0.2 and 0.1 respectively;
Step 4.2, set the total number of training steps to K_1, with current total step count k_1;
Set the numbers of optimization steps at the three inner optimization stages to K_2, K_3 and K_4, with corresponding current step counts k_2, k_3 and k_4;
Set the number of samples drawn in each training batch to B;
Initialize k_1, k_2, k_3 and k_4 all to 0;
Set the learning rate to l_rate. In this example, K_1 is set to 30; K_2, K_3 and K_4 are set to 1, 3 and 1 respectively; B is set to 32; and the initial learning rate l_rate is set to 0.001.
Step 4.3, at the k_2-th inner iteration of the k_1-th outer iteration, randomly draw B samples from each of the source domain data set S and the target domain data set T as the source domain and target domain training samples for this iteration;
Step 4.4, optimize the source domain feature extractor E_s and the expression classifier R by formula (7) to obtain the corresponding gradient of this iteration;
Step 4.5, optimize the source domain feature extractor E_s by formula (8) to obtain the corresponding gradient of this iteration;
Step 4.6, optimize the source domain feature extractor E_s and the target domain feature extractor E_t by formula (9) to obtain the corresponding gradient of this iteration;
Step 4.7, assign k_2 + 1 to k_2, then judge whether k_2 ≥ K_2 holds; if so, execute step 4.8, otherwise return to step 4.3 and continue in sequence;
Step 4.8, at the k_3-th inner iteration of the k_1-th outer iteration, randomly draw B samples from each of the source domain data set S and the target domain data set T as the source domain and target domain training samples for this iteration;
Step 4.9, optimize the source domain feature extractor E_s, the target domain feature extractor E_t, the source domain image generator G_s and the target domain image generator G_t by formula (10) to obtain the corresponding gradient of this iteration;
Step 4.10, assign k_3 + 1 to k_3, then judge whether k_3 ≥ K_3 holds; if so, execute step 4.11, otherwise return to step 4.8 and continue in sequence;
Step 4.11, at the k_4-th inner iteration of the k_1-th outer iteration, randomly draw B samples from each of the source domain data set S and the target domain data set T as the source domain and target domain training samples for this iteration;
Step 4.12, optimize the source domain feature extractor E_s and the angle classifier D_p by formula (11) to obtain the corresponding gradient of this iteration;
Step 4.13, optimize the expression domain discriminator D_de and the angle domain discriminator D_dp by formula (12) to obtain the corresponding gradient of this iteration;
Step 4.14, assign k_4 + 1 to k_4, then judge whether k_4 ≥ K_4 holds; if so, execute step 4.15, otherwise return to step 4.11 and continue in sequence;
Step 4.15, assign k_1 + 1 to k_1, then make two judgments. First judge whether 20 < k_1 < K_1 holds; if so, update the learning rate with a linear decay, i.e. l_rate = l_rate − γ × l_rate, where γ is a decay factor, set to 0.1 in this example. Then judge whether k_1 ≥ K_1 holds or the algorithm has converged; if so, training ends and the optimal facial expression recognition model is obtained for classifying facial expressions; the final model is the composition of the target domain feature extractor E_t and the expression classifier R, i.e. R∘E_t. Otherwise return to step 4.3 and continue in sequence.
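Putting the pieces together, the alternation of steps 4.3-4.15 could be sketched as below. It reuses the model and loss sketches above; sample_batch is a hypothetical loader that also returns the reference samples x_s' and x_t' (obtained via pseudo-labels for the target domain), Adam is an assumed optimizer choice, and the exact grouping of loss terms in formulas (7)-(12) is simplified:

```python
import itertools
import torch
import torch.nn.functional as F

E_s, E_t = FeatureExtractor(), FeatureExtractor()
G_s, G_t = Generator(), Generator()
R, D_p = MLPHead(n_out=6), MLPHead(n_out=13)        # 6 expressions, 13 angles (Multi-PIE)
D_dp, D_de = MLPHead(n_out=1), MLPHead(n_out=1)
for m in (E_s, E_t, G_s, G_t, R, D_p, D_dp, D_de):
    m.apply(init_weights)

opt_task  = torch.optim.Adam(itertools.chain(E_s.parameters(), E_t.parameters(),
                                             R.parameters()), lr=0.001)
opt_recon = torch.optim.Adam(itertools.chain(E_s.parameters(), E_t.parameters(),
                                             G_s.parameters(), G_t.parameters()), lr=0.001)
opt_dec   = torch.optim.Adam(itertools.chain(E_s.parameters(), D_p.parameters()), lr=0.001)
opt_disc  = torch.optim.Adam(itertools.chain(D_dp.parameters(), D_de.parameters()), lr=0.001)

alpha, beta, eta, lam = 2.0, 3.0, 0.2, 0.1
K1, K2, K3, K4, B = 30, 1, 3, 1, 32

def feats(E, x):
    f_e, f_p = E(x)
    return f_e, f_p, f_e.flatten(1), f_p.flatten(1)

for k1 in range(K1):
    for _ in range(K2):   # steps 4.3-4.7: supervised + adversarial terms
        x_s, y_s, p_s, x_t, x_s_ref, x_t_ref = sample_batch(B)   # hypothetical loader
        _, _, fse, fsp = feats(E_s, x_s)
        _, _, fte, ftp = feats(E_t, x_t)
        loss = (F.cross_entropy(R(fse), y_s) + alpha * F.cross_entropy(D_p(fsp), p_s)
                + beta * extractor_adv_loss(D_dp, D_de, ftp, fte))
        opt_task.zero_grad(); loss.backward(); opt_task.step()
    for _ in range(K3):   # steps 4.8-4.10: image reconstruction
        x_s, y_s, p_s, x_t, x_s_ref, x_t_ref = sample_batch(B)
        fse_m, fsp_m, _, _ = feats(E_s, x_s)    # unflattened 8 x 8 x 8 maps
        fte_m, ftp_m, _, _ = feats(E_t, x_t)
        loss = eta * reconstruction_loss(G_s, G_t, fsp_m, fse_m, ftp_m, fte_m,
                                         x_s_ref, x_t_ref)
        opt_recon.zero_grad(); loss.backward(); opt_recon.step()
    for _ in range(K4):   # steps 4.11-4.14: decoupling, then discriminators
        x_s, y_s, p_s, x_t, x_s_ref, x_t_ref = sample_batch(B)
        _, _, fse, fsp = feats(E_s, x_s)
        _, _, fte, ftp = feats(E_t, x_t)
        loss = F.cross_entropy(D_p(fsp), p_s) + lam * cross_decoupling_loss(R, D_p, fsp, fse)
        opt_dec.zero_grad(); loss.backward(); opt_dec.step()
        d_loss = discriminator_loss(D_dp, D_de, fsp, ftp, fse, fte)  # detaches internally
        opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()
    if 20 < k1 + 1 < K1:  # step 4.15: linear learning-rate decay after 20 outer steps
        for opt in (opt_task, opt_recon, opt_dec, opt_disc):
            for g in opt.param_groups:
                g["lr"] *= 0.9   # l_rate = l_rate - 0.1 * l_rate
```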
The test results of the invention are further described with reference to the following tables:
To verify the contribution of each learning strategy to the final facial expression recognition effect, comparative experiments were carried out covering four settings: (1) the supervised learning strategy only; (2) supervised learning combined with adversarial domain-adaptive learning; (3) supervised learning, adversarial domain-adaptive learning and cross-adversarial learning combined; (4) all learning strategies. The experimental results are shown in Tables 1 and 2.
TABLE 1 Recognition rates (%) of different learning strategies on the Multi-PIE database
TABLE 2 Recognition rates (%) of different learning strategies on the BU-3DFE database
As the experimental results in Tables 1 and 2 show, the results improve markedly as more of the proposed learning strategies are used, and good facial expression recognition is still achieved at large deviations from the frontal view, demonstrating the effectiveness of the invention.
Claims (1)
1. An angle-robust personalized facial expression recognition method based on adversarial learning, characterized by comprising the following steps:
Step 1, preprocess the images of a database containing N classes of facial expressions:
Perform face detection and alignment on all facial expression images in the database with the MTCNN (Multi-task Cascaded Convolutional Networks) algorithm to obtain a normalized face image data set, which serves as the sample set;
Randomly divide the sample set, using the individuals in the database as the division criterion, into a source domain data set S and a target domain data set T; let any sample in the source domain data set S be x_s, let the expression label of x_s be y_s, and let the angle label of x_s be p_s; let any sample in the target domain data set T be x_t;
Step 2, construct a feature-decoupling and domain-adaptive network model based on adversarial learning, comprising: a source domain feature extractor E_s and a target domain feature extractor E_t, an angle classifier D_p and an expression classifier R, an angle domain discriminator D_dp and an expression domain discriminator D_de, and a source domain image generator G_s and a target domain image generator G_t;
The source domain feature extractor E_s and the target domain feature extractor E_t share the same network structure, consisting in sequence of an input convolutional layer, M downsampling convolutional layers, Q residual convolutional layers, and two branches of W convolutional layers each; every convolutional layer is followed by an instance normalization layer and a ReLU activation function;
The angle classifier D_p, the expression classifier R, the angle domain discriminator D_dp and the expression domain discriminator D_de are each an H-layer fully connected network;
The source domain image generator G_s and the target domain image generator G_t share the same network structure, consisting in sequence of an input convolutional layer, J upsampling deconvolution layers, and an output convolutional layer; every convolutional layer before the output layer is followed by an instance normalization layer and a ReLU activation function, and the output convolutional layer is followed by a Tanh activation function;
Initialize the weights of all convolutional, deconvolution and fully connected layers in the adversarial-learning-based feature-decoupling and domain-adaptive network model with a Gaussian distribution;
Step 3, train the adversarial-learning-based feature-decoupling and domain-adaptive network model with four learning strategies: a supervised learning strategy, an adversarial domain-adaptive learning strategy, a cross-adversarial feature-decoupling learning strategy, and an image reconstruction learning strategy;
Step 3.1, supervised learning strategy:
Step 3.1.1, input any sample x_s in the source domain into the source domain feature extractor E_s to obtain two feature vectors f_s^e and f_s^p, where f_s^e denotes the expression-related feature of sample x_s and f_s^p denotes its angle-related feature;
Step 3.1.2, input the angle-related feature f_s^p of x_s into the angle classifier D_p for angle recognition to obtain the angle category of x_s;
Establish the angle recognition loss function l_p(E_s, D_p) by formula (1):
In formula (1), Sup(·) denotes a supervised loss function;
Step 3.1.3, input the expression-related feature f_s^e of x_s into the expression classifier R for expression recognition to obtain the expression category of x_s;
Establish the expression recognition loss function l_e(E_s, R) by formula (2):
Step 3.2, adversarial domain-adaptive learning strategy:
Step 3.2.1, input any sample x_t in the target domain into the target domain feature extractor E_t to obtain two feature vectors f_t^e and f_t^p, where f_t^e denotes the expression-related feature of sample x_t and f_t^p denotes its angle-related feature;
Step 3.2.2, input the angle-related feature f_s^p of the source-domain sample x_s, or the angle-related feature f_t^p of the target-domain sample x_t, into the angle domain discriminator D_dp to obtain the result of discriminating f_s^p as real or f_t^p as fake;
Step 3.2.3, input the expression-related feature f_s^e of the source-domain sample x_s, or the expression-related feature f_t^e of the target-domain sample x_t, into the expression domain discriminator D_de to obtain the result of discriminating f_s^e as real or f_t^e as fake;
Step 3.2.4, establish the adversarial learning loss function l_adv(E_s, E_t, D_dp, D_de) by formula (3):
Step 3.3, cross-adversarial feature-decoupling learning strategy:
Step 3.3.1, input the angle-related feature f_s^p of the source-domain sample x_s into the expression classifier R to obtain its expression classification result;
Input the expression-related feature f_s^e of x_s into the angle classifier D_p to obtain its angle classification result;
Step 3.3.2, establish by formula (4) the classification loss function of the expression classifier R on the angle-related feature f_s^p and of the angle classifier D_p on the expression-related feature f_s^e:
Step 3.4, image reconstruction learning strategy:
Step 3.4.1, combine the angle-related feature f_s^p of the source-domain sample x_s with the expression-related feature f_t^e of the target-domain sample x_t and input them into the source domain image generator G_s to generate a reconstructed image in the source domain;
Step 3.4.2, combine the angle-related feature f_t^p of the target-domain sample x_t with the expression-related feature f_s^e of the source-domain sample x_s and input them into the target domain image generator G_t to generate a reconstructed image in the target domain;
Step 3.4.3, establish the reconstructed-image constraint l_clc(E_s, E_t, G_s, G_t) by formula (5):
In formula (5), x_s' denotes another sample in the source domain data set S that has the same angle label as sample x_s and the same expression label as sample x_t; x_t' denotes another sample in the target domain data set T that has the same angle label as sample x_t and the same expression label as sample x_s;
Step 4, construct the overall loss function and train the adversarial-learning-based feature-decoupling and domain-adaptive network model by alternating iterative optimization to obtain the optimal facial expression recognition model:
Step 4.1, construct the overall objective function by formula (6):
In formula (6), α, β, η and λ are all weighting factors;
Step 4.2, set the total number of training steps to K_1, with current total step count k_1;
Set the numbers of optimization steps at the three inner optimization stages to K_2, K_3 and K_4, with corresponding current step counts k_2, k_3 and k_4;
Set the number of samples drawn in each training batch to B;
Initialize k_1, k_2, k_3 and k_4 all to 0;
Step 4.3, at the k_2-th inner iteration of the k_1-th outer iteration, randomly draw B samples from each of the source domain data set S and the target domain data set T as the source domain and target domain training samples for this iteration;
Step 4.4, optimize the source domain feature extractor E_s and the expression classifier R by formula (7) to obtain the corresponding gradient of this iteration;
Step 4.5, optimize the source domain feature extractor E_s by formula (8) to obtain the corresponding gradient of this iteration;
Step 4.6, optimize the source domain feature extractor E_s and the target domain feature extractor E_t by formula (9) to obtain the corresponding gradient of this iteration;
Step 4.7, assign k_2 + 1 to k_2, then judge whether k_2 ≥ K_2 holds; if so, execute step 4.8, otherwise return to step 4.3 and continue in sequence;
Step 4.8, at the k_3-th inner iteration of the k_1-th outer iteration, randomly draw B samples from each of the source domain data set S and the target domain data set T as the source domain and target domain training samples for this iteration;
Step 4.9, optimize the source domain feature extractor E_s, the target domain feature extractor E_t, the source domain image generator G_s and the target domain image generator G_t by formula (10) to obtain the corresponding gradient of this iteration;
Step 4.10, assign k_3 + 1 to k_3, then judge whether k_3 ≥ K_3 holds; if so, execute step 4.11, otherwise return to step 4.8 and continue in sequence;
Step 4.11, at the k_4-th inner iteration of the k_1-th outer iteration, randomly draw B samples from each of the source domain data set S and the target domain data set T as the source domain and target domain training samples for this iteration;
Step 4.12, optimize the source domain feature extractor E_s and the angle classifier D_p by formula (11) to obtain the corresponding gradient of this iteration;
Step 4.13, optimize the expression domain discriminator D_de and the angle domain discriminator D_dp by formula (12) to obtain the corresponding gradient of this iteration;
Step 4.14, assign k_4 + 1 to k_4, then judge whether k_4 ≥ K_4 holds; if so, execute step 4.15, otherwise return to step 4.11 and continue in sequence;
Step 4.15, assign k_1 + 1 to k_1, then judge whether k_1 ≥ K_1 holds or the algorithm has converged; if so, training ends and the optimal facial expression recognition model is obtained for classifying facial expressions; otherwise return to step 4.3 and continue in sequence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202010136966.3A (CN111382684B) | 2020-03-02 | 2020-03-02 | Angle-robust personalized facial expression recognition method based on adversarial learning
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202010136966.3A (CN111382684B) | 2020-03-02 | 2020-03-02 | Angle-robust personalized facial expression recognition method based on adversarial learning
Publications (2)
Publication Number | Publication Date |
---|---|
CN111382684A (en) | 2020-07-07
CN111382684B (en) | 2022-09-06
Family
ID=71218531
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010136966.3A | Angle-robust personalized facial expression recognition method based on adversarial learning (CN111382684B, active) | 2020-03-02 | 2020-03-02
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111382684B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112101241A (en) * | 2020-09-17 | 2020-12-18 | 西南科技大学 | Lightweight expression recognition method based on deep learning |
CN112133311B (en) * | 2020-09-18 | 2023-01-17 | 科大讯飞股份有限公司 | Speaker recognition method, related device and readable storage medium |
CN114998973A (en) * | 2022-06-30 | 2022-09-02 | 南京邮电大学 | Micro-expression recognition method based on domain adaptation |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9323980B2 (en) * | 2011-05-13 | 2016-04-26 | Microsoft Technology Licensing, Llc | Pose-robust recognition |
- 2020-03-02: application CN202010136966.3A filed in China; granted as CN111382684B (active)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108446609A (en) * | 2018-03-02 | 2018-08-24 | Multi-angle facial expression recognition method based on generative adversarial networks
CN109508669A (en) * | 2018-11-09 | 2019-03-22 | Facial expression recognition method based on generative adversarial networks
CN110188656A (en) * | 2019-05-27 | 2019-08-30 | Generation and recognition method for multi-orientation facial expression images
CN110348330A (en) * | 2019-06-24 | 2019-10-18 | Face pose virtual view generation method based on VAE-ACGAN
Non-Patent Citations (2)
Title |
---|
"Identity- and Pose-Robust Facial Expression Recognition through Adversarial Feature Learning";Can Wang et al.;《Affective Computing & Facial Analytics》;20191025;第238-246页 * |
"基于生成式对抗网络的鲁棒人脸表情识别";姚乃明 等;《自动化学报》;20180531;第44卷(第5期);第865-877页 * |
Also Published As
Publication number | Publication date |
---|---|
CN111382684A (en) | 2020-07-07 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |