CN111382684B - Angle-robust personalized facial expression recognition method based on adversarial learning - Google Patents
Angle-robust personalized facial expression recognition method based on adversarial learning
- Publication number
- CN111382684B (granted publication of application CN202010136966.3A, published as CN111382684A)
- Authority
- CN
- China
- Prior art keywords
- expression
- sample
- domain
- angle
- source domain
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06V40/174—Facial expression recognition (G Physics › G06 Computing; Calculating or Counting › G06V Image or Video Recognition or Understanding › G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data › G06V40/10 Human or animal bodies › G06V40/16 Human faces)
- G06F18/214—Generating training patterns; bootstrap methods, e.g. bagging or boosting (G › G06 › G06F Electric Digital Data Processing › G06F18/00 Pattern recognition › G06F18/20 Analysing › G06F18/21 Design or setup of recognition systems or techniques; extraction of features in feature space; blind source separation)
- G06N3/045—Combinations of networks (G › G06 › G06N Computing Arrangements Based on Specific Computational Models › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks › G06N3/04 Architecture, e.g. interconnection topology)
- G06V40/168—Feature extraction; face representation (G › G06 › G06V › G06V40/00 › G06V40/10 › G06V40/16)
Abstract
The invention discloses an angle-robust personalized facial expression recognition method based on adversarial learning, comprising the following steps: 1. preprocessing the images of a database containing N classes of facial expressions; 2. constructing a feature-decoupling and domain-adaptive network model based on adversarial learning; 3. training the constructed network model by alternating iterative optimization; 4. predicting the facial image to be recognized with the trained model to classify its facial expression. The invention simultaneously overcomes the negative influence that viewing angle and inter-individual differences exert on facial expression recognition, thereby achieving accurate recognition of facial expressions.
Description
Technical Field
The invention relates to the technical field of computer vision, and in particular to an angle-robust personalized facial expression recognition method based on adversarial learning.
Background
Facial expression recognition is an important research topic in computer vision, with wide application in human-computer interaction, fatigue detection, criminal investigation and medical care. Most current facial expression recognition methods assume a frontal face image, but in practical scenarios the user's position relative to the camera is not fixed and scenes vary, so only facial expression recognition that works under multi-angle conditions can meet practical requirements. In recent years, researchers have therefore proposed methods to handle the influence of viewing angle on facial expression recognition. Depending on how angle changes are handled, these methods fall into three categories: view-specific classifier methods, single-classifier methods, and angle normalization methods. View-specific classifier methods are intuitive: a separate classifier is trained for the samples of each angle; however, limited by scarce training samples, a robust classifier cannot be learned for every angle. Single-classifier methods attempt to learn one more robust classifier from a large number of samples; thanks to generative adversarial networks and variational autoencoders, sample generation can supply richer and more diverse training samples for classifier learning. However, generating high-quality samples is hard to guarantee, and low-quality generated samples instead introduce noise into classifier learning and degrade its performance. Angle normalization methods map face samples or feature representations of arbitrary angle to frontal-face samples or feature representations, preserving the identity of the individual and the invariance of the expression content during the mapping. However, such methods rely on paired training samples: for each non-frontal sample of an individual, a corresponding frontal sample of that individual must exist, which severely restricts their practical use.
Besides viewing angle, inter-individual differences are another important factor affecting facial expression recognition. Owing to differences in face shape, personality, appearance and so on, different individuals express the same emotion very differently, which seriously harms recognition. For example, for "happiness", some people tend to laugh openly while others only smile faintly; although both belong to the expression "happy", their pixel-level appearances differ greatly, making feature learning difficult. In addition, individuals differ in appearance, which further challenges expression analysis. Individual-robust facial expression recognition can be addressed by subject-specific methods, i.e. personalized facial expression recognition. Subject-specific methods aim to build a dedicated classifier for a particular individual, so that the learned classifier concentrates on a single individual and avoids the bias introduced by learning from other individuals. However, limited by the sample size available for a single individual, it is difficult to learn a well-performing facial expression classifier this way.
Disclosure of Invention
The invention provides an angle-robust personalized facial expression recognition method based on adversarial learning, aiming to overcome the influence of viewing angle and inter-individual differences in facial expression recognition and thereby improve the recognition rate.
In order to achieve the purpose, the invention adopts the following technical scheme:
The angle-robust personalized facial expression recognition method based on adversarial learning of the invention is characterized by comprising the following steps:
Step 1, preprocess the images of a database containing N classes of facial expressions:
Perform face detection and alignment on all facial expression images in the database with the MTCNN (Multi-task Cascaded Convolutional Networks) algorithm to obtain a normalized face image data set, which serves as the sample set;
Randomly divide the sample set, using the individuals in the database as the division criterion, into a source domain data set S and a target domain data set T; let any sample in the source domain data set S be x_s, let the expression label of x_s be y_s, and let the angle label of x_s be p_s; let any sample in the target domain data set T be x_t;
Step 2, construct a feature-decoupling and domain-adaptive network model based on adversarial learning, comprising: a source domain feature extractor E_s and a target domain feature extractor E_t, an angle classifier D_p and an expression classifier R, an angle domain discriminator D_dp and an expression domain discriminator D_de, and a source domain image generator G_s and a target domain image generator G_t;
The source domain feature extractor E_s and the target domain feature extractor E_t share the same network structure, consisting in sequence of an input convolutional layer, M downsampling convolutional layers, Q residual convolutional layers, and two branches of W convolutional layers each; every convolutional layer is followed by an instance normalization layer and a ReLU activation function;
The angle classifier D_p, the expression classifier R, the angle domain discriminator D_dp and the expression domain discriminator D_de are each an H-layer fully connected network;
The source domain image generator G_s and the target domain image generator G_t share the same network structure, consisting in sequence of an input convolutional layer, J upsampling deconvolution (transposed convolution) layers, and an output convolutional layer; every convolutional layer before the output layer is followed by an instance normalization layer and a ReLU activation function, and the output convolutional layer is followed by a Tanh activation function;
Initialize the weights of all convolutional, deconvolution and fully connected layers in the adversarial-learning-based feature-decoupling and domain-adaptive network model with a Gaussian distribution;
Step 3, train the adversarial-learning-based feature-decoupling and domain-adaptive network model with four learning strategies: a supervised learning strategy, an adversarial domain-adaptive learning strategy, a cross-adversarial feature-decoupling learning strategy, and an image reconstruction learning strategy;
Step 3.1, supervised learning strategy:
Step 3.1.1, input any sample x_s in the source domain into the source domain feature extractor E_s to obtain two feature vectors f_s^e and f_s^p, where f_s^e denotes the expression-related feature of sample x_s and f_s^p denotes its angle-related feature;
Step 3.1.2, input the angle-related feature f_s^p of x_s into the angle classifier D_p for angle recognition to obtain the angle category of x_s;
Establish the angle recognition loss function l_p(E_s, D_p) by formula (1):
In formula (1), Sup(·) denotes a supervised loss function;
Step 3.1.3, input the expression-related feature f_s^e of x_s into the expression classifier R for expression recognition to obtain the expression category of x_s;
Establish the expression recognition loss function l_e(E_s, R) by formula (2):
Step 3.2, adversarial domain-adaptive learning strategy:
Step 3.2.1, input any sample x_t in the target domain into the target domain feature extractor E_t to obtain two feature vectors f_t^e and f_t^p, where f_t^e denotes the expression-related feature of sample x_t and f_t^p denotes its angle-related feature;
Step 3.2.2, input the angle-related feature f_s^p of the source-domain sample x_s, or the angle-related feature f_t^p of the target-domain sample x_t, into the angle domain discriminator D_dp to obtain the result of discriminating the angle-related feature f_s^p as real or the angle-related feature f_t^p as fake;
Step 3.2.3, input the expression-related feature f_s^e of the source-domain sample x_s, or the expression-related feature f_t^e of the target-domain sample x_t, into the expression domain discriminator D_de to obtain the result of discriminating the expression-related feature f_s^e as real or the expression-related feature f_t^e as fake;
Step 3.2.4, establish the adversarial learning loss function l_adv(E_s, E_t, D_dp, D_de) by formula (3):
Step 3.3, cross-adversarial feature-decoupling learning strategy:
Step 3.3.1, input the angle-related feature f_s^p of the source-domain sample x_s into the expression classifier R to obtain its expression classification result;
Input the expression-related feature f_s^e of x_s into the angle classifier D_p to obtain its angle classification result;
Step 3.3.2, establish by formula (4) the classification loss function of the expression classifier R on the angle-related feature f_s^p and of the angle classifier D_p on the expression-related feature f_s^e:
Step 3.4, image reconstruction learning strategy:
Step 3.4.1, combine the angle-related feature f_s^p of the source-domain sample x_s with the expression-related feature f_t^e of the target-domain sample x_t and input them into the source domain image generator G_s to generate a reconstructed image in the source domain;
Step 3.4.2, combine the angle-related feature f_t^p of the target-domain sample x_t with the expression-related feature f_s^e of the source-domain sample x_s and input them into the target domain image generator G_t to generate a reconstructed image in the target domain;
Step 3.4.3, establish the reconstructed-image constraint l_clc(E_s, E_t, G_s, G_t) by formula (5):
In formula (5), x_s' denotes another sample in the source domain data set S that has the same angle label as sample x_s and the same expression label as sample x_t; x_t' denotes another sample in the target domain data set T that has the same angle label as sample x_t and the same expression label as sample x_s;
Step 4, construct the overall loss function and train the adversarial-learning-based feature-decoupling and domain-adaptive network model by alternating iterative optimization to obtain the optimal facial expression recognition model:
Step 4.1, construct the overall objective function by formula (6):
In formula (6), α, β, η and λ are all weighting factors;
Step 4.2, set the total number of training steps to K_1, with current total step count k_1;
Set the numbers of optimization steps at the three inner optimization stages to K_2, K_3 and K_4, with corresponding current step counts k_2, k_3 and k_4;
Set the number of samples drawn in each training batch to B;
Initialize k_1, k_2, k_3 and k_4 all to 0;
Step 4.3, at the k_2-th inner iteration of the k_1-th outer iteration, randomly draw B samples from each of the source domain data set S and the target domain data set T as the source domain and target domain training samples for this iteration;
Step 4.4, optimize the source domain feature extractor E_s and the expression classifier R by formula (7) to obtain the corresponding gradient of this iteration;
Step 4.5, optimize the source domain feature extractor E_s by formula (8) to obtain the corresponding gradient of this iteration;
Step 4.6, optimize the source domain feature extractor E_s and the target domain feature extractor E_t by formula (9) to obtain the corresponding gradient of this iteration;
Step 4.7, assign k_2 + 1 to k_2, then judge whether k_2 ≥ K_2 holds; if so, execute step 4.8, otherwise return to step 4.3 and continue in sequence;
Step 4.8, at the k_3-th inner iteration of the k_1-th outer iteration, randomly draw B samples from each of the source domain data set S and the target domain data set T as the source domain and target domain training samples for this iteration;
Step 4.9, optimize the source domain feature extractor E_s, the target domain feature extractor E_t, the source domain image generator G_s and the target domain image generator G_t by formula (10) to obtain the corresponding gradient of this iteration;
Step 4.10, assign k_3 + 1 to k_3, then judge whether k_3 ≥ K_3 holds; if so, execute step 4.11, otherwise return to step 4.8 and continue in sequence;
Step 4.11, at the k_4-th inner iteration of the k_1-th outer iteration, randomly draw B samples from each of the source domain data set S and the target domain data set T as the source domain and target domain training samples for this iteration;
Step 4.12, optimize the source domain feature extractor E_s and the angle classifier D_p by formula (11) to obtain the corresponding gradient of this iteration;
Step 4.13, optimize the expression domain discriminator D_de and the angle domain discriminator D_dp by formula (12) to obtain the corresponding gradient of this iteration;
Step 4.14, assign k_4 + 1 to k_4, then judge whether k_4 ≥ K_4 holds; if so, execute step 4.15, otherwise return to step 4.11 and continue in sequence;
Step 4.15, assign k_1 + 1 to k_1, then judge whether k_1 ≥ K_1 holds or the algorithm has converged; if so, training ends and the optimal facial expression recognition model is obtained for classifying facial expressions; otherwise return to step 4.3 and continue in sequence.
Compared with the prior art, the invention has the following beneficial effects:
1. By proposing the cross-adversarial feature-decoupling learning strategy, the invention decouples expression-related features from angle-related features, so that the expression-related features contain no angle information irrelevant to expression recognition and the angle-related features contain no expression information irrelevant to angle recognition. This overcomes the limitations of existing angle-robust facial expression recognition methods, such as being constrained by sample diversity or depending on high-quality face image generation, and achieves more angle-robust facial expression recognition.
2. By proposing the adversarial domain-adaptive learning strategy, the invention effectively transfers source domain information to the target domain, benefiting the target-domain facial expression recognition task and overcoming the shortcoming of traditional personalized methods, which are limited by the small number of target-domain samples. The strategy requires no expression or angle annotation of the target domain, which improves usability in practical environments and effectively copes with the influence of inter-individual differences on recognition.
3. By proposing the reconstruction learning strategy, the invention further improves the performance of cross-adversarial feature-decoupling learning and adversarial domain-adaptive learning, and thereby further improves the facial expression recognition effect.
4. The invention designs an alternating iterative optimization method that carries out supervised learning, cross-adversarial feature-decoupling learning, adversarial domain-adaptive learning and reconstruction learning simultaneously, realizing end-to-end training and prediction, reducing manual intervention, letting the learning strategies complement each other, jointly learning angle- and individual-robust features, and optimizing the learning of expression-related features.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a model block diagram of the present invention;
FIG. 3 shows reconstruction results of the invention on the Multi-PIE and BU-3DFE databases.
Detailed Description
In this embodiment, as shown in FIG. 1, an angle-robust personalized facial expression recognition method based on adversarial learning proceeds as follows:
Step 1, preprocess the images of a database containing N classes of facial expressions:
Perform face detection and alignment on all facial expression images in the database with the MTCNN (Multi-task Cascaded Convolutional Networks) algorithm to obtain a normalized face image data set, which serves as the sample set; in this embodiment, all normalized face images are 128 × 128 pixels;
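For illustration only (the patent does not name a concrete implementation), the detection-and-alignment step might look like the following sketch, which assumes the third-party facenet-pytorch package, JPEG inputs, and the hypothetical folder names raw_faces / normalized_faces:

```python
from pathlib import Path
from PIL import Image
from facenet_pytorch import MTCNN

# image_size=128 matches the 128 x 128 normalization used in this embodiment
mtcnn = MTCNN(image_size=128, margin=0, post_process=False)

def preprocess(src_dir: str, dst_dir: str) -> None:
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for img_path in sorted(Path(src_dir).glob("*.jpg")):
        img = Image.open(img_path).convert("RGB")
        # passing save_path makes MTCNN write the aligned 128 x 128 crop to disk;
        # images in which no face is detected are skipped (the call returns None)
        mtcnn(img, save_path=str(Path(dst_dir) / img_path.name))

preprocess("raw_faces", "normalized_faces")
```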
Randomly divide the sample set, using the individuals in the database as the division criterion, into a source domain data set S and a target domain data set T; let any sample in the source domain data set S be x_s, let the expression label of x_s be y_s, and let the angle label of x_s be p_s; let any sample in the target domain data set T be x_t; the target domain samples carry no expression or angle annotation;
In this embodiment, as shown in FIG. 3, the Multi-PIE and BU-3DFE facial expression databases are used. The Multi-PIE facial expression database contains 755,370 face images collected from 337 volunteers at 13 angles, from −90° to 90° in 15° intervals, with the expressions labeled: smile, surprise, squint, disgust, scream and neutral. The BU-3DFE facial expression database contains 100 3D models, from 56 female and 44 male subjects; samples at arbitrary angles can be obtained by rotating the 3D models, and the expressions are labeled: anger, disgust, fear, happiness, neutral, sadness and surprise.
Step 2, as shown in FIG. 2, construct a feature-decoupling and domain-adaptive network model based on adversarial learning, comprising: a source domain feature extractor E_s and a target domain feature extractor E_t, an angle classifier D_p and an expression classifier R, an angle domain discriminator D_dp and an expression domain discriminator D_de, and a source domain image generator G_s and a target domain image generator G_t;
The source domain feature extractor E_s and the target domain feature extractor E_t share the same network structure, consisting in sequence of an input convolutional layer (kernel size 7 × 7, 3 filters, stride 2, padding 3), M downsampling convolutional layers (here M = 4; kernel size 4 × 4, stride 2, padding 1, with 64, 32, 16 and 8 filters respectively), Q residual convolutional layers (here Q = 3; kernel size 3 × 3, 8 filters, stride 2, padding 1), and two branches of W convolutional layers each (here W = 2; kernel size 3 × 3, 8 filters, stride 2, padding 1); every convolutional layer is followed by an instance normalization layer and a ReLU activation function;
The angle classifier D_p, the expression classifier R, the angle domain discriminator D_dp and the expression domain discriminator D_de are each an H-layer fully connected network with input length 512 (here H is set to 3);
The source domain image generator G_s and the target domain image generator G_t share the same network structure, consisting in sequence of an input convolutional layer (kernel size 7 × 7, 8 filters, stride 1, padding 3), J upsampling deconvolution layers (here J = 4; kernel size 4 × 4, stride 2, padding 1, with 8, 16, 32 and 64 filters respectively), and an output convolutional layer (kernel size 7 × 7, 3 filters, stride 1, padding 3); every convolutional layer before the output layer is followed by an instance normalization layer and a ReLU activation function, and the output convolutional layer is followed by a Tanh activation function;
Initialize the weights of all convolutional, deconvolution and fully connected layers in the adversarial-learning-based feature-decoupling and domain-adaptive network model with a Gaussian distribution N(0, 0.02);
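As a non-authoritative PyTorch sketch of the modules just described: the layer counts, kernel sizes, strides and filter numbers follow this embodiment, while the residual skip connection, the hidden widths of the fully connected heads, and the stride-1 input convolution (chosen so that a 128-pixel input yields the 8 × 8 × 8 feature maps quoted below; the text says stride 2) are assumptions:

```python
import torch
import torch.nn as nn

def conv_in_relu(c_in, c_out, k, s, p):
    """Convolution followed by instance normalization and ReLU, as specified."""
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, s, p),
                         nn.InstanceNorm2d(c_out), nn.ReLU(inplace=True))

class ResBlock(nn.Module):
    """Residual convolutional layer (the skip connection is an assumption)."""
    def __init__(self, c):
        super().__init__()
        self.conv = conv_in_relu(c, c, 3, 1, 1)
    def forward(self, x):
        return x + self.conv(x)

class FeatureExtractor(nn.Module):
    """E_s / E_t: input conv, M=4 downsampling convs, Q=3 residual layers,
    then two W=2-layer branches for the expression (f^e) and angle (f^p) features."""
    def __init__(self):
        super().__init__()
        self.stem = conv_in_relu(3, 3, 7, 1, 3)          # 128 -> 128
        chans, downs = 3, []
        for n in (64, 32, 16, 8):                        # M = 4, stride 2 each
            downs.append(conv_in_relu(chans, n, 4, 2, 1))
            chans = n
        self.down = nn.Sequential(*downs)                # 128 -> 8 spatial
        self.res = nn.Sequential(*[ResBlock(8) for _ in range(3)])   # Q = 3
        self.branch_e = nn.Sequential(*[conv_in_relu(8, 8, 3, 1, 1) for _ in range(2)])
        self.branch_p = nn.Sequential(*[conv_in_relu(8, 8, 3, 1, 1) for _ in range(2)])
    def forward(self, x):
        h = self.res(self.down(self.stem(x)))
        return self.branch_e(h), self.branch_p(h)        # f^e, f^p: 8 x 8 x 8 each

class MLPHead(nn.Module):
    """D_p / R / D_dp / D_de: H=3 fully connected layers on the 512-d input."""
    def __init__(self, n_out):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(512, 256), nn.ReLU(inplace=True),
                                 nn.Linear(256, 128), nn.ReLU(inplace=True),
                                 nn.Linear(128, n_out))
    def forward(self, f):
        return self.net(f)

class Generator(nn.Module):
    """G_s / G_t: input conv, J=4 upsampling deconvs, output conv with Tanh."""
    def __init__(self):
        super().__init__()
        layers, chans = [conv_in_relu(16, 8, 7, 1, 3)], 8   # depth-concatenated input
        for n in (8, 16, 32, 64):                            # J = 4, stride 2 each
            layers += [nn.ConvTranspose2d(chans, n, 4, 2, 1),
                       nn.InstanceNorm2d(n), nn.ReLU(inplace=True)]
            chans = n
        layers += [nn.Conv2d(64, 3, 7, 1, 3), nn.Tanh()]     # 8 -> 128 spatial
        self.net = nn.Sequential(*layers)
    def forward(self, f_p, f_e):
        return self.net(torch.cat([f_p, f_e], dim=1))

def init_weights(m):
    """Gaussian N(0, 0.02) initialization of conv/deconv/linear weights."""
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
```

A model would then be initialized with, e.g., `FeatureExtractor().apply(init_weights)`.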
Step 3, train the adversarial-learning-based feature-decoupling and domain-adaptive network model with four learning strategies: a supervised learning strategy, an adversarial domain-adaptive learning strategy, a cross-adversarial feature-decoupling learning strategy, and an image reconstruction learning strategy;
Step 3.1, supervised learning strategy:
Step 3.1.1, input any sample x_s in the source domain into the source domain feature extractor E_s to obtain two feature vectors f_s^e and f_s^p, where f_s^e denotes the expression-related feature of sample x_s and f_s^p denotes its angle-related feature; both features are obtained by flattening the feature maps output by the convolutional branches and have dimension 512;
Step 3.1.2, input the angle-related feature f_s^p of x_s into the angle classifier D_p for angle recognition to obtain the angle category of x_s;
Establish the angle recognition loss function l_p(E_s, D_p) by formula (1):
In formula (1), Sup(·) denotes a supervised loss function; squared loss, Softmax loss, cross-entropy loss and the like can be used;
Step 3.1.3, input the expression-related feature f_s^e of x_s into the expression classifier R for expression recognition to obtain the expression category of x_s;
Establish the expression recognition loss function l_e(E_s, R) by formula (2):
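Assuming Sup(·) is instantiated as cross-entropy (one of the options listed above), formulas (1) and (2) could be computed as follows, reusing the modules from the previous sketch:

```python
import torch.nn.functional as F

def supervised_losses(E_s, D_p, R, x_s, y_s, p_s):
    """Formulas (1) and (2) with Sup(.) taken as cross-entropy (an assumption)."""
    f_e, f_p = E_s(x_s)                         # 8 x 8 x 8 feature maps
    f_e, f_p = f_e.flatten(1), f_p.flatten(1)   # -> 512-d vectors
    l_p = F.cross_entropy(D_p(f_p), p_s)        # formula (1): angle recognition loss
    l_e = F.cross_entropy(R(f_e), y_s)          # formula (2): expression recognition loss
    return l_p, l_e
```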
Step 3.2, adversarial domain-adaptive learning strategy:
Step 3.2.1, input any sample x_t in the target domain into the target domain feature extractor E_t to obtain two feature vectors f_t^e and f_t^p, where f_t^e denotes the expression-related feature of sample x_t and f_t^p denotes its angle-related feature; likewise, both features are obtained by flattening the feature maps output by the convolutional branches and have dimension 512;
Step 3.2.2, introduce the adversarial domain-adaptive learning strategy to reduce the inter-domain distribution discrepancy between f_s^p and f_t^p. Specifically, input the angle-related feature f_s^p of the source-domain sample x_s, or the angle-related feature f_t^p of the target-domain sample x_t, into the angle domain discriminator D_dp to obtain the result of discriminating f_s^p as real or f_t^p as fake. While the angle domain discriminator D_dp tries its best to distinguish f_s^p from f_t^p, the source domain feature extractor E_s and the target domain feature extractor E_t try their best to produce f_s^p and f_t^p that D_dp cannot tell apart; thus E_s and E_t form an adversarial relationship with D_dp.
Step 3.2.3, likewise introduce the adversarial domain-adaptive learning strategy to reduce the inter-domain distribution discrepancy between f_s^e and f_t^e. Specifically, input the expression-related feature f_s^e of the source-domain sample x_s, or the expression-related feature f_t^e of the target-domain sample x_t, into the expression domain discriminator D_de to obtain the result of discriminating f_s^e as real or f_t^e as fake. While the expression domain discriminator D_de tries its best to distinguish f_s^e from f_t^e, E_s and E_t try their best to produce f_s^e and f_t^e that D_de cannot tell apart; thus E_s and E_t form an adversarial relationship with D_de.
Step 3.2.4, establish the adversarial learning loss function l_adv(E_s, E_t, D_dp, D_de) by formula (3):
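The text does not reproduce formula (3), so the following sketch instantiates it as the standard binary cross-entropy GAN objective, which matches the real/fake discrimination just described; the exact form used in the patent may differ:

```python
import torch
import torch.nn.functional as F

def _bce(logits, target_val):
    target = torch.full_like(logits, target_val)
    return F.binary_cross_entropy_with_logits(logits, target)

def discriminator_loss(D_dp, D_de, f_s_p, f_t_p, f_s_e, f_t_e):
    # D_dp / D_de learn to score source-domain features as real (1), target as fake (0)
    return (_bce(D_dp(f_s_p.detach()), 1.0) + _bce(D_dp(f_t_p.detach()), 0.0) +
            _bce(D_de(f_s_e.detach()), 1.0) + _bce(D_de(f_t_e.detach()), 0.0))

def extractor_adv_loss(D_dp, D_de, f_t_p, f_t_e):
    # E_s / E_t are updated so that target features look "real" to the discriminators
    return _bce(D_dp(f_t_p), 1.0) + _bce(D_de(f_t_e), 1.0)
```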
Step 3.3, countermeasure characteristic decoupling learning strategy:
step 3.3.1, sample x in Source Domain s Angle-dependent characteristic f of s p Inputting the expression into a classifier R to obtain a sample x in a source domain s The expression classification result of (1);
sample x in the source domain s Expression-related feature f of s e Input angle classifier D p In (1), obtain a sample x in the source domain s The angle classification result of (1);
step 3.3.2, establishing expression classifier R for angle correlation characteristic f by using formula (4) s p And an angle classifier D p For expression-related feature f s e Is classified as a loss function
By optimizing this loss, the expression classifier R cannot correlate features f to angles s p Recognizing expression information and enabling an angle classifier D p Inability to characterize the situational related features f s e Identifying angle information so that the angle-related feature f s p There is no expression information independent of angle, so that the expression-related feature f s e Angle information irrelevant to the expression does not exist, and the decoupling of the angle and the expression information is realized;
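Formula (4) is likewise not reproduced here; one common way to realize the described behavior, shown below purely as an assumption, is to drive the cross predictions toward the uniform distribution by maximizing their entropy:

```python
import torch
import torch.nn.functional as F

def cross_decoupling_loss(R, D_p, f_s_p, f_s_e):
    """A hypothetical realization of formula (4): R applied to the angle
    feature and D_p applied to the expression feature should both be
    maximally uninformative, enforced here via negative entropy."""
    probs_expr = F.softmax(R(f_s_p), dim=1)
    probs_angle = F.softmax(D_p(f_s_e), dim=1)
    entropy = lambda p: -(p * torch.log(p.clamp_min(1e-8))).sum(dim=1).mean()
    # minimizing the negative entropy pushes both predictions toward uniform
    return -(entropy(probs_expr) + entropy(probs_angle))
```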
Step 3.4, image reconstruction learning strategy:
Step 3.4.1, combine the angle-related feature f_s^p of the source-domain sample x_s with the expression-related feature f_t^e of the target-domain sample x_t and input them into the source domain image generator G_s to generate a reconstructed image in the source domain; here the features f_s^p and f_t^e are the feature maps output by the convolutional layers, not flattened, of size 8 × 8 × 8; they are concatenated directly along the depth dimension to give an 8 × 8 × 16 feature map;
Step 3.4.2, combine the angle-related feature f_t^p of the target-domain sample x_t with the expression-related feature f_s^e of the source-domain sample x_s and input them into the target domain image generator G_t to generate a reconstructed image in the target domain; likewise the features f_t^p and f_s^e are the unflattened 8 × 8 × 8 feature maps, concatenated along the depth dimension to give an 8 × 8 × 16 feature map;
Step 3.4.3, establish the reconstructed-image constraint l_clc(E_s, E_t, G_s, G_t) by formula (5):
In formula (5), x_s' denotes another sample in the source domain data set S that has the same angle label as sample x_s and the same expression label as sample x_t; x_t' denotes another sample in the target domain data set T that has the same angle label as sample x_t and the same expression label as sample x_s. This requires the angle and expression annotation of sample x_t, but the target domain data set has no expression or angle labels, so the angle and expression information of x_t is obtained through pseudo-labels, i.e. the pseudo angle label and pseudo expression label predicted for x_t;
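Under the assumption of an L1 pixel penalty (the text does not state the norm), the constraint of formula (5) could be sketched as:

```python
import torch.nn.functional as F

def reconstruction_loss(G_s, G_t, f_s_p, f_s_e, f_t_p, f_t_e, x_s_ref, x_t_ref):
    """Formula (5) with an assumed L1 penalty; x_s_ref = x_s' and
    x_t_ref = x_t' are the reference samples described above."""
    rec_s = G_s(f_s_p, f_t_e)   # source angle + target expression -> source-domain image
    rec_t = G_t(f_t_p, f_s_e)   # target angle + source expression -> target-domain image
    return F.l1_loss(rec_s, x_s_ref) + F.l1_loss(rec_t, x_t_ref)
```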
Step 4, construct the overall loss function and train the adversarial-learning-based feature-decoupling and domain-adaptive network model by alternating iterative optimization to obtain the optimal facial expression recognition model:
Step 4.1, construct the overall objective function by formula (6):
In formula (6), α, β, η and λ are all weighting factors; in this example the four weighting factors are set to 2.0, 3.0, 0.2 and 0.1 respectively;
Step 4.2, set the total number of training steps to K_1, with current total step count k_1;
Set the numbers of optimization steps at the three inner optimization stages to K_2, K_3 and K_4, with corresponding current step counts k_2, k_3 and k_4;
Set the number of samples drawn in each training batch to B;
Initialize k_1, k_2, k_3 and k_4 all to 0;
Set the learning rate to l_rate. In this example, K_1 is set to 30; K_2, K_3 and K_4 are set to 1, 3 and 1 respectively; B is set to 32; and the initial learning rate l_rate is set to 0.001.
Step 4.3, at the k_2-th inner iteration of the k_1-th outer iteration, randomly draw B samples from each of the source domain data set S and the target domain data set T as the source domain and target domain training samples for this iteration;
Step 4.4, optimize the source domain feature extractor E_s and the expression classifier R by formula (7) to obtain the corresponding gradient of this iteration;
Step 4.5, optimize the source domain feature extractor E_s by formula (8) to obtain the corresponding gradient of this iteration;
Step 4.6, optimize the source domain feature extractor E_s and the target domain feature extractor E_t by formula (9) to obtain the corresponding gradient of this iteration;
Step 4.7, assign k_2 + 1 to k_2, then judge whether k_2 ≥ K_2 holds; if so, execute step 4.8, otherwise return to step 4.3 and continue in sequence;
Step 4.8, at the k_3-th inner iteration of the k_1-th outer iteration, randomly draw B samples from each of the source domain data set S and the target domain data set T as the source domain and target domain training samples for this iteration;
Step 4.9, optimize the source domain feature extractor E_s, the target domain feature extractor E_t, the source domain image generator G_s and the target domain image generator G_t by formula (10) to obtain the corresponding gradient of this iteration;
Step 4.10, assign k_3 + 1 to k_3, then judge whether k_3 ≥ K_3 holds; if so, execute step 4.11, otherwise return to step 4.8 and continue in sequence;
Step 4.11, at the k_4-th inner iteration of the k_1-th outer iteration, randomly draw B samples from each of the source domain data set S and the target domain data set T as the source domain and target domain training samples for this iteration;
Step 4.12, optimize the source domain feature extractor E_s and the angle classifier D_p by formula (11) to obtain the corresponding gradient of this iteration;
Step 4.13, optimize the expression domain discriminator D_de and the angle domain discriminator D_dp by formula (12) to obtain the corresponding gradient of this iteration;
Step 4.14, assign k_4 + 1 to k_4, then judge whether k_4 ≥ K_4 holds; if so, execute step 4.15, otherwise return to step 4.11 and continue in sequence;
Step 4.15, assign k_1 + 1 to k_1, then make two judgments. First judge whether 20 < k_1 < K_1 holds; if so, update the learning rate with a linear decay, i.e. l_rate = l_rate − γ × l_rate, where γ is a decay factor, set to 0.1 in this example. Then judge whether k_1 ≥ K_1 holds or the algorithm has converged; if so, training ends and the optimal facial expression recognition model is obtained for classifying facial expressions; the final model is the composition of the target domain feature extractor E_t and the expression classifier R, i.e. R∘E_t. Otherwise return to step 4.3 and continue in sequence.
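Putting the pieces together, the alternation of steps 4.3-4.15 could be sketched as below. It reuses the model and loss sketches above; sample_batch is a hypothetical loader that also returns the reference samples x_s' and x_t' (obtained via pseudo-labels for the target domain), Adam is an assumed optimizer choice, and the exact grouping of loss terms in formulas (7)-(12) is simplified:

```python
import itertools
import torch
import torch.nn.functional as F

E_s, E_t = FeatureExtractor(), FeatureExtractor()
G_s, G_t = Generator(), Generator()
R, D_p = MLPHead(n_out=6), MLPHead(n_out=13)        # 6 expressions, 13 angles (Multi-PIE)
D_dp, D_de = MLPHead(n_out=1), MLPHead(n_out=1)
for m in (E_s, E_t, G_s, G_t, R, D_p, D_dp, D_de):
    m.apply(init_weights)

opt_task  = torch.optim.Adam(itertools.chain(E_s.parameters(), E_t.parameters(),
                                             R.parameters()), lr=0.001)
opt_recon = torch.optim.Adam(itertools.chain(E_s.parameters(), E_t.parameters(),
                                             G_s.parameters(), G_t.parameters()), lr=0.001)
opt_dec   = torch.optim.Adam(itertools.chain(E_s.parameters(), D_p.parameters()), lr=0.001)
opt_disc  = torch.optim.Adam(itertools.chain(D_dp.parameters(), D_de.parameters()), lr=0.001)

alpha, beta, eta, lam = 2.0, 3.0, 0.2, 0.1
K1, K2, K3, K4, B = 30, 1, 3, 1, 32

def feats(E, x):
    f_e, f_p = E(x)
    return f_e, f_p, f_e.flatten(1), f_p.flatten(1)

for k1 in range(K1):
    for _ in range(K2):   # steps 4.3-4.7: supervised + adversarial terms
        x_s, y_s, p_s, x_t, x_s_ref, x_t_ref = sample_batch(B)   # hypothetical loader
        _, _, fse, fsp = feats(E_s, x_s)
        _, _, fte, ftp = feats(E_t, x_t)
        loss = (F.cross_entropy(R(fse), y_s) + alpha * F.cross_entropy(D_p(fsp), p_s)
                + beta * extractor_adv_loss(D_dp, D_de, ftp, fte))
        opt_task.zero_grad(); loss.backward(); opt_task.step()
    for _ in range(K3):   # steps 4.8-4.10: image reconstruction
        x_s, y_s, p_s, x_t, x_s_ref, x_t_ref = sample_batch(B)
        fse_m, fsp_m, _, _ = feats(E_s, x_s)    # unflattened 8 x 8 x 8 maps
        fte_m, ftp_m, _, _ = feats(E_t, x_t)
        loss = eta * reconstruction_loss(G_s, G_t, fsp_m, fse_m, ftp_m, fte_m,
                                         x_s_ref, x_t_ref)
        opt_recon.zero_grad(); loss.backward(); opt_recon.step()
    for _ in range(K4):   # steps 4.11-4.14: decoupling, then discriminators
        x_s, y_s, p_s, x_t, x_s_ref, x_t_ref = sample_batch(B)
        _, _, fse, fsp = feats(E_s, x_s)
        _, _, fte, ftp = feats(E_t, x_t)
        loss = F.cross_entropy(D_p(fsp), p_s) + lam * cross_decoupling_loss(R, D_p, fsp, fse)
        opt_dec.zero_grad(); loss.backward(); opt_dec.step()
        d_loss = discriminator_loss(D_dp, D_de, fsp, ftp, fse, fte)  # detaches internally
        opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()
    if 20 < k1 + 1 < K1:  # step 4.15: linear learning-rate decay after 20 outer steps
        for opt in (opt_task, opt_recon, opt_dec, opt_disc):
            for g in opt.param_groups:
                g["lr"] *= 0.9   # l_rate = l_rate - 0.1 * l_rate
```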
The test results of the invention are further described with reference to the following tables:
To verify the contribution of each learning strategy to the final facial expression recognition effect, comparative experiments were carried out covering four settings: (1) the supervised learning strategy only; (2) supervised learning combined with adversarial domain-adaptive learning; (3) supervised learning, adversarial domain-adaptive learning and cross-adversarial learning combined; (4) all learning strategies. The experimental results are shown in Tables 1 and 2.
TABLE 1 Recognition rates (%) of different learning strategies on the Multi-PIE database
TABLE 2 Recognition rates (%) of different learning strategies on the BU-3DFE database
As the experimental results in Tables 1 and 2 show, the results improve markedly as more of the proposed learning strategies are used, and good facial expression recognition is still achieved at large deviations from the frontal view, demonstrating the effectiveness of the invention.
Claims (1)
1. An angle-robust personalized facial expression recognition method based on adversarial learning, characterized by comprising the following steps:
Step 1, preprocess the images of a database containing N classes of facial expressions:
Perform face detection and alignment on all facial expression images in the database with the MTCNN (Multi-task Cascaded Convolutional Networks) algorithm to obtain a normalized face image data set, which serves as the sample set;
Randomly divide the sample set, using the individuals in the database as the division criterion, into a source domain data set S and a target domain data set T; let any sample in the source domain data set S be x_s, let the expression label of x_s be y_s, and let the angle label of x_s be p_s; let any sample in the target domain data set T be x_t;
Step 2, construct a feature-decoupling and domain-adaptive network model based on adversarial learning, comprising: a source domain feature extractor E_s and a target domain feature extractor E_t, an angle classifier D_p and an expression classifier R, an angle domain discriminator D_dp and an expression domain discriminator D_de, and a source domain image generator G_s and a target domain image generator G_t;
The source domain feature extractor E_s and the target domain feature extractor E_t share the same network structure, consisting in sequence of an input convolutional layer, M downsampling convolutional layers, Q residual convolutional layers, and two branches of W convolutional layers each; every convolutional layer is followed by an instance normalization layer and a ReLU activation function;
The angle classifier D_p, the expression classifier R, the angle domain discriminator D_dp and the expression domain discriminator D_de are each an H-layer fully connected network;
The source domain image generator G_s and the target domain image generator G_t share the same network structure, consisting in sequence of an input convolutional layer, J upsampling deconvolution layers, and an output convolutional layer; every convolutional layer before the output layer is followed by an instance normalization layer and a ReLU activation function, and the output convolutional layer is followed by a Tanh activation function;
Initialize the weights of all convolutional, deconvolution and fully connected layers in the adversarial-learning-based feature-decoupling and domain-adaptive network model with a Gaussian distribution;
Step 3, train the adversarial-learning-based feature-decoupling and domain-adaptive network model with four learning strategies: a supervised learning strategy, an adversarial domain-adaptive learning strategy, a cross-adversarial feature-decoupling learning strategy, and an image reconstruction learning strategy;
Step 3.1, supervised learning strategy:
Step 3.1.1, input any sample x_s in the source domain into the source domain feature extractor E_s to obtain two feature vectors f_s^e and f_s^p, where f_s^e denotes the expression-related feature of sample x_s and f_s^p denotes its angle-related feature;
Step 3.1.2, input the angle-related feature f_s^p of x_s into the angle classifier D_p for angle recognition to obtain the angle category of x_s;
Establish the angle recognition loss function l_p(E_s, D_p) by formula (1):
In formula (1), Sup(·) denotes a supervised loss function;
Step 3.1.3, input the expression-related feature f_s^e of x_s into the expression classifier R for expression recognition to obtain the expression category of x_s;
Establish the expression recognition loss function l_e(E_s, R) by formula (2):
Step 3.2, adversarial domain-adaptive learning strategy:
Step 3.2.1, input any sample x_t in the target domain into the target domain feature extractor E_t to obtain two feature vectors f_t^e and f_t^p, where f_t^e denotes the expression-related feature of sample x_t and f_t^p denotes its angle-related feature;
Step 3.2.2, input the angle-related feature f_s^p of the source-domain sample x_s, or the angle-related feature f_t^p of the target-domain sample x_t, into the angle domain discriminator D_dp to obtain the result of discriminating f_s^p as real or f_t^p as fake;
Step 3.2.3, input the expression-related feature f_s^e of the source-domain sample x_s, or the expression-related feature f_t^e of the target-domain sample x_t, into the expression domain discriminator D_de to obtain the result of discriminating f_s^e as real or f_t^e as fake;
Step 3.2.4, establish the adversarial learning loss function l_adv(E_s, E_t, D_dp, D_de) by formula (3):
Step 3.3, cross-adversarial feature-decoupling learning strategy:
Step 3.3.1, input the angle-related feature f_s^p of the source-domain sample x_s into the expression classifier R to obtain its expression classification result;
Input the expression-related feature f_s^e of x_s into the angle classifier D_p to obtain its angle classification result;
Step 3.3.2, establish by formula (4) the classification loss function of the expression classifier R on the angle-related feature f_s^p and of the angle classifier D_p on the expression-related feature f_s^e:
Step 3.4, image reconstruction learning strategy:
Step 3.4.1, combine the angle-related feature f_s^p of the source-domain sample x_s with the expression-related feature f_t^e of the target-domain sample x_t and input them into the source domain image generator G_s to generate a reconstructed image in the source domain;
Step 3.4.2, combine the angle-related feature f_t^p of the target-domain sample x_t with the expression-related feature f_s^e of the source-domain sample x_s and input them into the target domain image generator G_t to generate a reconstructed image in the target domain;
Step 3.4.3, establish the reconstructed-image constraint l_clc(E_s, E_t, G_s, G_t) by formula (5):
In formula (5), x_s' denotes another sample in the source domain data set S that has the same angle label as sample x_s and the same expression label as sample x_t; x_t' denotes another sample in the target domain data set T that has the same angle label as sample x_t and the same expression label as sample x_s;
Step 4, construct the overall loss function and train the adversarial-learning-based feature-decoupling and domain-adaptive network model by alternating iterative optimization to obtain the optimal facial expression recognition model:
Step 4.1, construct the overall objective function by formula (6):
In formula (6), α, β, η and λ are all weighting factors;
Step 4.2, set the total number of training steps to K_1, with current total step count k_1;
Set the numbers of optimization steps at the three inner optimization stages to K_2, K_3 and K_4, with corresponding current step counts k_2, k_3 and k_4;
Set the number of samples drawn in each training batch to B;
Initialize k_1, k_2, k_3 and k_4 all to 0;
Step 4.3, at the k_2-th inner iteration of the k_1-th outer iteration, randomly draw B samples from each of the source domain data set S and the target domain data set T as the source domain and target domain training samples for this iteration;
Step 4.4, optimize the source domain feature extractor E_s and the expression classifier R by formula (7) to obtain the corresponding gradient of this iteration;
Step 4.5, optimize the source domain feature extractor E_s by formula (8) to obtain the corresponding gradient of this iteration;
Step 4.6, optimize the source domain feature extractor E_s and the target domain feature extractor E_t by formula (9) to obtain the corresponding gradient of this iteration;
Step 4.7, assign k_2 + 1 to k_2, then judge whether k_2 ≥ K_2 holds; if so, execute step 4.8, otherwise return to step 4.3 and continue in sequence;
Step 4.8, at the k_3-th inner iteration of the k_1-th outer iteration, randomly draw B samples from each of the source domain data set S and the target domain data set T as the source domain and target domain training samples for this iteration;
Step 4.9, optimize the source domain feature extractor E_s, the target domain feature extractor E_t, the source domain image generator G_s and the target domain image generator G_t by formula (10) to obtain the corresponding gradient of this iteration;
Step 4.10, assign k_3 + 1 to k_3, then judge whether k_3 ≥ K_3 holds; if so, execute step 4.11, otherwise return to step 4.8 and continue in sequence;
Step 4.11, at the k_4-th inner iteration of the k_1-th outer iteration, randomly draw B samples from each of the source domain data set S and the target domain data set T as the source domain and target domain training samples for this iteration;
Step 4.12, optimize the source domain feature extractor E_s and the angle classifier D_p by formula (11) to obtain the corresponding gradient of this iteration;
Step 4.13, optimize the expression domain discriminator D_de and the angle domain discriminator D_dp by formula (12) to obtain the corresponding gradient of this iteration;
Step 4.14, assign k_4 + 1 to k_4, then judge whether k_4 ≥ K_4 holds; if so, execute step 4.15, otherwise return to step 4.11 and continue in sequence;
Step 4.15, assign k_1 + 1 to k_1, then judge whether k_1 ≥ K_1 holds or the algorithm has converged; if so, training ends and the optimal facial expression recognition model is obtained for classifying facial expressions; otherwise return to step 4.3 and continue in sequence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202010136966.3A (CN111382684B) | 2020-03-02 | 2020-03-02 | Angle-robust personalized facial expression recognition method based on adversarial learning
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202010136966.3A (CN111382684B) | 2020-03-02 | 2020-03-02 | Angle-robust personalized facial expression recognition method based on adversarial learning
Publications (2)
Publication Number | Publication Date |
---|---|
CN111382684A (en) | 2020-07-07
CN111382684B (en) | 2022-09-06
Family
ID=71218531
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010136966.3A | Angle-robust personalized facial expression recognition method based on adversarial learning (CN111382684B, active) | 2020-03-02 | 2020-03-02
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111382684B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112101241A (en) * | 2020-09-17 | 2020-12-18 | 西南科技大学 | Lightweight expression recognition method based on deep learning |
CN112133311B (en) * | 2020-09-18 | 2023-01-17 | 科大讯飞股份有限公司 | Speaker recognition method, related device and readable storage medium |
CN114998973A (en) * | 2022-06-30 | 2022-09-02 | 南京邮电大学 | Micro-expression recognition method based on domain adaptation |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9323980B2 (en) * | 2011-05-13 | 2016-04-26 | Microsoft Technology Licensing, Llc | Pose-robust recognition |
- 2020-03-02: application CN202010136966.3A filed in China; granted as CN111382684B (active)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108446609A (en) * | 2018-03-02 | 2018-08-24 | Multi-angle facial expression recognition method based on generative adversarial networks
CN109508669A (en) * | 2018-11-09 | 2019-03-22 | Facial expression recognition method based on generative adversarial networks
CN110188656A (en) * | 2019-05-27 | 2019-08-30 | Generation and recognition method for multi-orientation facial expression images
CN110348330A (en) * | 2019-06-24 | 2019-10-18 | Face pose virtual view generation method based on VAE-ACGAN
Non-Patent Citations (2)
Title |
---|
"Identity- and Pose-Robust Facial Expression Recognition through Adversarial Feature Learning";Can Wang et al.;《Affective Computing & Facial Analytics》;20191025;第238-246页 * |
"基于生成式对抗网络的鲁棒人脸表情识别";姚乃明 等;《自动化学报》;20180531;第44卷(第5期);第865-877页 * |
Also Published As
Publication number | Publication date |
---|---|
CN111382684A (en) | 2020-07-07 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |