CN112818764A - Low-resolution image facial expression recognition method based on feature reconstruction model - Google Patents
- Publication number: CN112818764A
- Application number: CN202110055946.8A
- Authority: CN (China)
- Prior art keywords: feature, matrix, low, image, resolution
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V 40/174 — Facial expression recognition
- G06F 18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F 18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N 3/047 — Probabilistic or stochastic networks
- G06N 3/084 — Backpropagation, e.g. using gradient descent
- G06T 3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06V 10/462 — Salient features, e.g. scale invariant feature transforms [SIFT]
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biomedical Technology (AREA)
- Probability & Statistics with Applications (AREA)
- Multimedia (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Biology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a low-resolution image facial expression recognition method based on a feature reconstruction model, belonging to the field of facial image expression recognition. The method first constructs training and test data sets. It then trains a facial expression recognition model built on the feature reconstruction model: a feature extraction network with fixed parameters extracts image expression features; the model is trained in the generative adversarial network manner to obtain an expression feature generator FSRG and a feature discriminator FSRD; and FSRG reconstructs the features of the input image to yield F_SR. A classifier consisting of a fully connected network and a softmax function layer classifies the feature F_SR, and the sample loss is reweighted using the probability value that the softmax layer outputs for each sample's correct category. The method is insensitive to the resolution of the input image, improves recognition accuracy at lower resolutions, and has a more stable recognition effect at every resolution.
Description
Technical Field
The invention belongs to the field of facial image expression recognition, and particularly relates to a low-resolution image facial expression recognition method based on a feature reconstruction model.
Background
Facial expressions are one of the most direct, natural signals by which humans express emotion. Facial expression recognition is a hot topic in research areas such as natural human-computer interaction, computer vision, affective computing and image processing, and is widely applied in fields such as human-computer interaction, remote education, security, intelligent robot development, medical treatment and animation production.
Across different scenes, changes in equipment and environment together with the pinhole-camera imaging principle mean that face images captured in multi-person photographic scenes have varying resolutions, and images are further compressed during network transmission and storage, reducing their quality and resolution. The recognition accuracy of expression recognition algorithms drops severely in low-resolution scenarios, so reducing the influence of resolution changes is necessary for recognizing expressions accurately. With the development of technologies such as deep learning and image super-resolution, a common approach for low-resolution input images is to perform super-resolution reconstruction of the image first and then recognize the expression. This image-reconstruction approach has two disadvantages. First, although it improves on recognizing expressions directly from the low-resolution image, it incurs a large amount of computation and an unstable effect. Second, since the target of expression recognition is a human face, reconstructing a high-resolution face image easily raises privacy-disclosure concerns, a point that is receiving increasing attention in international research.
Disclosure of Invention
The invention aims to overcome the disadvantages of large computation cost and easy privacy disclosure associated with reconstructing face images, and provides a low-resolution image facial expression recognition method based on a feature reconstruction model.
To achieve this purpose, the invention adopts the following technical scheme:
a low-resolution image facial expression recognition method based on a feature reconstruction model comprises the following steps:
1) collecting facial expression images with resolution of at least 100x100 pixels and labeling their expression categories; these serve as the original images I_HR. Down-sampling each original image by integer factors of 2 to 8 to obtain corresponding low-resolution images, whose expression category labels are kept consistent with the original image; dividing the original images and the corresponding low-resolution images into a training set and a test set;
2) training a neural network model in the generative adversarial network manner;
inputting the original image and the low-resolution images of each magnification into a feature extractor E, which extracts and computes the feature matrix F_HR of the original image and the feature matrices F_LR of the low-resolution images of each magnification;
inputting the low-resolution image feature matrix F_LR into an expression feature generator FSRG, which outputs the generated reconstruction feature matrix F_SR;
inputting the feature matrix F_HR of the original image and the corresponding reconstructed feature matrix F_SR of the low-resolution image into a feature discriminator FSRD, comparing their difference in the distribution space, and optimizing the feature generator FSRG through back propagation;
inputting the reconstructed expression features F_SR into a two-layer fully connected expression classifier C for classification; the expression classifier C computes the probability of each sample being assigned to each category, and a weight coefficient computed from the probability value of each sample being correctly classified reweights the sample loss, accelerating convergence of the neural network;
repeating the training process until a trained neural network model is obtained;
3) inputting the facial image whose expression is to be recognized into the trained neural network model: the feature extractor E extracts the feature matrix F of the input image, the feature generator FSRG generates the reconstructed feature matrix F_SR, and the classifier C computes and outputs the category label of the recognition result.
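The alternating training of step 2) can be sketched as follows. All five callables are illustrative stand-ins (the actual networks E, FSRG, FSRD and C are specified later in the description), and the update functions are stubs rather than real optimizer steps.

```python
def train_epoch(batches, extract, fsrg, fsrd_step, fsrg_step, clf_step):
    """One pass of the adversarial training scheme of step 2)."""
    for img_hr, img_lr, label in batches:
        f_hr = extract(img_hr)      # fixed-parameter feature extractor E
        f_lr = extract(img_lr)
        f_sr = fsrg(f_lr)           # reconstructed feature matrix F_SR
        fsrd_step(f_sr, f_hr)       # update the feature discriminator FSRD
        fsrg_step(f_sr, f_hr)       # update the feature generator FSRG
        clf_step(f_sr, label)       # update classifier C with the reweighted loss
```

The loop mirrors the order stated in the text: the extractor stays fixed while the discriminator, generator and classifier take turns updating on each batch.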
Further, the feature extractor E in step 2) is composed of several convolutional layers and nonlinear activation layers, and is the feature extraction part of an expression recognition model pre-trained on the original image data set.
Further, the feature extraction process in the feature extractor E in step 2) is as follows:
extracting a three-dimensional feature tensor T from the input image I, where the size of T is w x h x n, w and h are the length and width of the feature tensor, and n is the number of channels;
computing the covariance matrix M of the feature tensor T:
M = (1/(w·h)) Σ_{i=1}^{w·h} (f_i − f̄)(f_i − f̄)^T (1)
where f_i ∈ R^n is the feature vector formed by the n channel values at the i-th spatial position of the feature tensor T, f̄ is the mean of these vectors, M ∈ R^{n×n}, and n is the number of channels of the feature tensor T;
correcting the eigenvalues of the covariance matrix M to obtain the corrected covariance matrix M+:
M+ = M + λ·trace(M)·I (2)
where λ is a coefficient greater than zero, I is the identity matrix, and trace(M) is the trace of the matrix M;
performing a pooling operation on the corrected covariance matrix M+ and taking the logarithm of its eigenvalues to obtain the feature matrix F.
Further, the process of performing the pooling operation on the corrected covariance matrix M+ and taking the logarithm of the eigenvalues to obtain the feature matrix is as follows:
F_cov = W M+ W^T (3)
performing eigenvalue decomposition and eigenvalue correction on F_cov to obtain the matrix F+; the specific operations are:
F_cov = U_1 Σ_1 U_1^T (4)
F+ = U_1 max(εI, Σ_1) U_1^T (5)
where max() takes the element-wise maximum of the two matrices;
performing eigenvalue decomposition on F+ and taking the logarithm of the eigenvalues to obtain the feature matrix F, specifically:
F+ = U_2 Σ_2 U_2^T (6)
F = U_2 log(Σ_2) U_2^T (7)
where log(Σ_2) denotes taking the logarithm of the eigenvalue matrix Σ_2.
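A numpy sketch of Eqs. (1)-(7), under the reading that the covariance is taken over spatial positions so that M is n x n. The pooling matrix W is a random stand-in for the learned one, λ = 10⁻⁴ matches the value given later in the description, and ε is an assumed small constant.

```python
import numpy as np

def covariance_feature_matrix(T, lam=1e-4, eps=1e-5, d=32, seed=0):
    """Second-order feature pooling: covariance, correction, pooling, log-eig."""
    w, h, n = T.shape
    X = T.reshape(-1, n)                          # one n-dim vector per spatial position
    Xc = X - X.mean(axis=0, keepdims=True)
    M = Xc.T @ Xc / X.shape[0]                    # channel covariance, Eq. (1)
    M_plus = M + lam * np.trace(M) * np.eye(n)    # eigenvalue correction, Eq. (2)
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((d, n)) / np.sqrt(n)  # stand-in for the learned pooling W
    F_cov = W @ M_plus @ W.T                      # Eq. (3)
    vals, U1 = np.linalg.eigh(F_cov)              # Eq. (4)
    F_plus = U1 @ np.diag(np.maximum(vals, eps)) @ U1.T   # eigenvalue clipping, Eq. (5)
    vals2, U2 = np.linalg.eigh(F_plus)            # Eq. (6)
    return U2 @ np.diag(np.log(vals2)) @ U2.T     # matrix logarithm, Eq. (7)
```

Because the clipping in Eq. (5) forces all eigenvalues to be at least ε, the logarithm in Eq. (7) is always finite and the result is a symmetric d x d matrix.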
Further, the feature generator FSRG in step 2) is a fully convolutional network composed of convolutional layers and nonlinear activation layers. The process of reconstructing the feature matrix by FSRG is as follows:
with the feature matrix F_LR of the low-resolution image as input, output the reconstructed feature matrix F_SR; the dimensions of the matrix before and after reconstruction are consistent.
Further, the feature discriminator FSRD in step 2) compares the difference of the two feature matrices in the distribution space, specifically:
the feature discriminator FSRD takes the feature matrices F_SR and F_HR corresponding to the same image as input and outputs a score for each; the absolute value of the difference between the two scores represents the Wasserstein distance between them in the feature space.
Further, in the training process of step 2), the loss function of the feature generator FSRG is composed of the adversarial loss L_GAN, the perceptual loss L_P between the feature matrices F_SR and F_HR, and the two-norm loss L_2;
the adversarial loss L_GAN is:
L_GAN = −(1/b) Σ_{i=1}^{b} FSRD(F_SR^(i)) (8)
where b is the size of the data batch;
the feature perceptual loss L_P is:
L_P = (1/b) Σ_{i=1}^{b} ||C_FC(F_SR^(i)) − C_FC(F_HR^(i))||_2^2 (9)
where C_FC() denotes the output of the last fully connected layer of the classifier C;
the two-norm loss L_2 is:
L_2 = (1/b) Σ_{i=1}^{b} ||F_SR^(i) − F_HR^(i)||_2^2 (10)
The loss of the feature generator FSRG is a weighted linear sum of the three:
L_FSRG = L_GAN + λ_1·L_P + λ_2·L_2 (11)
where λ_1 and λ_2 are both weight coefficients greater than zero.
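A toy numpy version of the generator loss of Eq. (11). The discriminator scores and classifier outputs are passed in as plain arrays, and λ_1 = λ_2 = 0.1 mirrors the values chosen later in the description.

```python
import numpy as np

def fsrg_loss(d_scores_sr, c_sr, c_hr, f_sr, f_hr, lam1=0.1, lam2=0.1):
    """Generator loss L_FSRG = L_GAN + lam1 * L_P + lam2 * L_2, Eq. (11)."""
    b = len(d_scores_sr)
    l_gan = -np.mean(d_scores_sr)             # adversarial loss, Eq. (8)
    l_p = np.sum((c_sr - c_hr) ** 2) / b      # perceptual loss, Eq. (9)
    l_2 = np.sum((f_sr - f_hr) ** 2) / b      # two-norm loss, Eq. (10)
    return l_gan + lam1 * l_p + lam2 * l_2
```

The adversarial term rewards reconstructed features the discriminator scores highly, while the perceptual and two-norm terms pull F_SR toward F_HR in the classifier's feature space and directly.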
Further, in the training process of step 2), the loss function of the feature discriminator FSRD is:
L_FSRD = (1/b) Σ_{i=1}^{b} [FSRD(F_SR^(i)) − FSRD(F_HR^(i))] + k·||∇_F̂ FSRD(F̂)||^p (12)
where F̂ = θ·F_HR + (1−θ)·F_SR, with θ a random number between 0 and 1 ensuring that F̂ in each batch is a linear interpolation of F_SR and F_HR; p and k are respectively the exponent parameter and the coefficient parameter of the gradient-penalty term.
Further, the expression classifier C in step 2) uses softmax to compute the probability that a sample belongs to each class Class_i, i = 1...N, and reweights the loss of each sample using the probability value corresponding to its true category:
w = (σ − logit)^r (13)
where logit is the probability output by the softmax function for the sample's true category, and the parameters σ and r are set to 1.5 and 2, respectively.
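The reweighting rule of Eq. (13) can be checked with a small numpy sketch; `softmax` here is a plain implementation, and σ = 1.5, r = 2 are the values the claim prescribes.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sample_weight(logits, true_idx, sigma=1.5, r=2):
    """w = (sigma - logit)^r, Eq. (13), where 'logit' is the softmax
    probability assigned to the sample's true category."""
    p_true = softmax(logits)[true_idx]
    return (sigma - p_true) ** r
```

Since the true-class probability lies in (0, 1), w ranges between (σ−1)^r = 0.25 and σ^r = 2.25, so hard samples (low true-class probability) contribute up to nine times the loss of easy ones.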
Compared with the prior art, the invention has the following beneficial effects:
the low-resolution image facial expression recognition method based on the feature reconstruction model comprises the steps of constructing a training and testing data set, carrying out different-magnification down-sampling on a high-resolution facial expression image to generate a plurality of high-low-resolution image pairs with multiple magnifications, and simultaneously keeping a category label; then training a facial expression recognition model of the feature reconstruction model, and extracting high-resolution image expression features F by using a feature extraction network with fixed parametersHRAnd corresponding low resolution image expressive features FLR(ii) a Then, an expression feature generator FSRG and a feature discriminator FSRD are obtained by adopting a training model in a mode of generating an antagonistic network, and the FSRG is used for reconstructing features of an input image to obtain FSR(ii) a Classifier C consisting of fully connected network and softmax function layer pairs feature FSRClassifying, and re-weighting the sample loss by using the probability value of the correct category corresponding to the sample output by the softmax layer, so as to accelerate the convergence of the model; identification processComprises the following steps: the model extracts a feature matrix F of the input image, and then a feature generator FSRG generates a reconstructed feature matrix FSRAnd calculating and outputting a class label of the recognition result by using the classifier C obtained by training. The invention provides a method for generating a network by combining deep learning countermeasure and reconstructing image characteristics to identify facial expressions. 
Compared with the traditional method, the method is insensitive to the resolution of the input image, and the identification accuracy under the lower resolution is improved; compared with a method for reconstructing an image, the method has more stable identification effect on each resolution, can avoid the problems of increased calculated amount and possible privacy disclosure caused by the reconstructed image, and has great industrial application value.
Drawings
FIG. 1 is an overall network of a low-resolution image facial expression recognition method based on a feature reconstruction model according to the present invention;
FIG. 2 is a network structure of a feature extractor of the present invention;
FIG. 3 is a network structure of the feature generation part of the present invention, wherein FIG. 3(a) is a feature generator network structure and FIG. 3(b) is a dense connection block structure;
fig. 4 is a network structure of the feature discriminator of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention provides a method that combines deep learning with generative adversarial networks and reconstructs image features to recognize facial expressions. Compared with traditional methods, the disclosed method is insensitive to the resolution of the input image and improves recognition accuracy at lower resolutions; compared with image-reconstruction methods, it has a more stable recognition effect at every resolution, avoids the increased computation and potential privacy disclosure caused by reconstructed images, and has great industrial application value in fields such as educational analysis and management and entertainment.
The invention is described in further detail below with reference to the accompanying drawings:
referring to fig. 1, fig. 1 is an overall network of the low-resolution image facial expression recognition method based on the feature reconstruction model of the present invention; the network comprises four main parts, namely a feature extractor, a feature generator, a feature discriminator and an expression classifier.
Referring to fig. 2, fig. 2 shows the network structure of the feature extractor of the present invention. The feature extractor contains six convolutional layers (Conv Layer) with 3x3 convolution kernels and stride 1. The numbers of output feature channels of the convolutional layers are 64, 96, 128 and 256 in sequence; each convolutional layer is followed by an activation layer using the ReLU activation function. After each of the first, second and fourth activation layers there is a pooling layer using max pooling (Max Pooling Layer) with a 2x2 pooling window and stride 2.
Referring to fig. 3, fig. 3(a) shows the feature generator network structure and fig. 3(b) the dense connection block structure. The feature generator is composed of a cascade of three dense connection blocks plus a residual connection between input and output. The structure of a single dense block is shown in fig. 3(b): it comprises five convolutional layer-batch normalization (BN) layer combinations, with dense connections between the groups and an LReLU function added as the activation layer.
Referring to fig. 4, fig. 4 shows the network structure of the feature discriminator of the present invention, which is a cascade of five convolution blocks and two fully connected layers; the numbers of output channels of the convolution blocks are 8, 16, 32, 64 and 64 in sequence. Each convolution block consists of two convolutional layers alternating with two activation layers: the first convolutional layer has a 3x3 kernel and stride 1, the second a 5x5 kernel and stride 2. The output dimensions of the last two fully connected layers are 100 and 1, respectively.
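Under the assumption that the 3x3 stride-1 convolutions use padding 1 and the 5x5 stride-2 convolutions use padding 2 (the paddings are not stated in the text), the spatial size of the discriminator's input shrinks by roughly half per block, which a short size-tracing helper makes explicit:

```python
def fsrd_spatial_size(s, blocks=5):
    """Trace the feature-map side length through the five convolution blocks."""
    for _ in range(blocks):
        s = (s + 2 * 1 - 3) // 1 + 1      # 3x3 conv, stride 1, padding 1: size kept
        s = (s + 2 * 2 - 5) // 2 + 1      # 5x5 conv, stride 2, padding 2: size halved
    return s
```

With these assumed paddings, a 64x64 feature map reaches the two fully connected layers as a 2x2 map.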
The invention discloses a low-resolution image facial expression recognition method based on a feature reconstruction model, which comprises the following implementation processes:
a model training part:
step 1: collecting facial expression images with resolution ratio more than or equal to 100x100 pixels and labeling expression types as original images IHR(ii) a Adopting bicubic interpolation mode to make 2-8 times integer multiplying power down-sampling for original image (the length and width of image are changed into original resolution ratio)To) Obtaining a plurality of low resolution images (I)LR-2To ILR-8) (ii) a The expression category label of the low-resolution image is consistent with the original image;
step 2: feature extractor E using fixed parameter pre-training extracts feature matrix F of original resolution imageHRFeature matrix F of low-resolution image corresponding to each magnificationLRThe feature extractor E includes a convolutional layer and a nonlinear active layer. The method comprises the steps that a high-low resolution image pair is input at one time, for one image I, a corresponding three-dimensional feature tensor T is extracted by using a feature extractor, the size of the feature tensor T is w x h x n, w and h are the length and the width of the corresponding feature tensor, and n is the number of channels;
and step 3: calculating the covariance matrix of the respective feature tensors T:
wherein ,fiOne channel representing the characteristic tensor T,for the mean value of the channels of the feature tensor, M ∈n*nAnd n is the number of channels of the feature tensor T.
Step 4: To ensure positive definiteness of the matrix, perform eigenvalue correction on each covariance matrix:
M+ = M + λ·trace(M)·I (2)
where λ is a coefficient greater than zero; since the covariance matrix is symmetric and positive semi-definite, λ is set to 0.0001 to guarantee positive definiteness while minimizing the influence of this operation on the feature matrix; I is the identity matrix.
Step 5: Perform a pooling operation on the corrected covariance matrix M+ and take the logarithm of its eigenvalues to obtain the feature matrix; the specific operations are:
F_cov = W M+ W^T (3)
where W is the pooling parameter matrix, whose specific parameters are optimized by back-propagation learning.
Step 6: Perform eigenvalue decomposition on the matrix F_cov and obtain the matrix F+ by the following operations:
F_cov = U_1 Σ_1 U_1^T (4)
F+ = U_1 max(εI, Σ_1) U_1^T (5)
where max() takes the element-wise maximum of the two matrices.
Step 7: Take the logarithm of the eigenvalues of the matrix F+ to obtain the final feature matrix F; the specific operations are:
F+ = U_2 Σ_2 U_2^T (6)
F = U_2 log(Σ_2) U_2^T (7)
where log(Σ_2) denotes taking the logarithm of the eigenvalue matrix Σ_2.
Step 8: Model structure initialization.
The feature generator FSRG is a fully convolutional network, implemented in the invention with ResNet-50; it takes the feature matrix F_LR of the low-resolution image as input and outputs the reconstructed feature matrix F_SR. The dimensions of the input and output feature matrices must stay consistent, so the original pooling operations in ResNet are removed. The feature discriminator FSRD adopts a VGG-16 network and takes as input the original image feature matrix F_HR and the reconstructed feature matrix F_SR of the low-resolution image of the corresponding magnification, respectively. The expression classifier C consists of two fully connected layers and a softmax function layer and outputs the probability of each expression class.
Step 9: Set the loss functions.
During training, the loss function of the feature generator FSRG is composed of the adversarial loss L_GAN, the perceptual loss L_P between the feature matrices F_SR and F_HR, and the L2 distance loss L_2. L_GAN is:
L_GAN = −(1/b) Σ_{i=1}^{b} FSRD(F_SR^(i)) (8)
where b is the size of the data batch; L_GAN constrains the feature generator FSRG and the feature discriminator FSRD. The feature perceptual loss L_P is:
L_P = (1/b) Σ_{i=1}^{b} ||C_FC(F_SR^(i)) − C_FC(F_HR^(i))||_2^2 (9)
where C_FC() denotes the output of the last fully connected layer of the classifier C.
The loss of the feature generator FSRG is a weighted linear sum of the three:
L_FSRG = L_GAN + λ_1·L_P + λ_2·L_2 (10)
where λ_1 and λ_2 are both adjustable weight coefficients greater than zero; both coefficients are set to 0.1 in the present invention.
The loss function of the feature discriminator FSRD is computed as:
L_FSRD = (1/b) Σ_{i=1}^{b} [FSRD(F_SR^(i)) − FSRD(F_HR^(i))] + k·||∇_F̂ FSRD(F̂)||^p (11)
where F̂ = θ·F_HR + (1−θ)·F_SR, with θ a random number between 0 and 1 ensuring that F̂ in each batch is a linear interpolation of F_SR and F_HR; p and k are respectively the exponent parameter and the coefficient parameter of the gradient-penalty term. The best effect in the experiments is obtained with p = 6 and k = 2.
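The discriminator objective can be sanity-checked with a linear critic D(F) = F·a, whose gradient with respect to any input (including the interpolate F̂) is the constant vector a, so the penalty term k·||∇D||^p needs no autograd; all arrays below are toy values, not trained quantities.

```python
import numpy as np

def fsrd_loss_linear_critic(a, f_sr, f_hr, theta, p=6, k=2):
    """FSRD loss for the linear critic D(F) = F @ a."""
    d = lambda F: F @ a
    # batchwise linear interpolation F_hat = theta*F_HR + (1-theta)*F_SR,
    # the point at which the gradient penalty is evaluated
    f_hat = theta[:, None] * f_hr + (1.0 - theta[:, None]) * f_sr
    grad_norm = np.linalg.norm(a)     # gradient of a linear critic is a everywhere
    wasserstein_term = np.mean(d(f_sr) - d(f_hr))
    return wasserstein_term + k * grad_norm ** p
```

With p = 6 and k = 2 the penalty heavily punishes critics whose gradient norm exceeds 1, which keeps the score difference an estimate of the Wasserstein distance.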
The specific operations of computing the probability that a sample belongs to each category with softmax and reweighting the sample loss are:
w = (σ − logit)^r (12)
where logit is the probability output by the softmax function for the sample's true category, and the parameters σ and r are set to 1.5 and 2, respectively; the loss function of the classifier C is set to the cross-entropy loss.
Step 10: Model training.
The gradients are updated with an Adam optimizer; the learning rate is set to 0.00002, Adam's first-order momentum parameter to 0.1 and its second-order momentum parameter to 0.999. The number of training iterations over the data set (epochs) is set to 400 and the data batch size to 16.
Model use:
The feature extractor E extracts the image feature tensor T and computes the feature matrix; the feature generator FSRG then performs feature reconstruction to obtain the corresponding reconstructed features F_SR; finally, the classifier C computes the probability that the sample belongs to each class and assigns the sample to the class with the highest probability.
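The use-time pass can be sketched end to end with stub modules; only the argmax decision rule comes from the text, while the label set and the three callables are illustrative placeholders.

```python
import numpy as np

EXPRESSIONS = ["happy", "sad", "neutral"]  # illustrative label set

def recognize(img, extract, fsrg, classify, labels=EXPRESSIONS):
    f = extract(img)              # feature matrix F from extractor E
    f_sr = fsrg(f)                # reconstructed features F_SR from generator FSRG
    probs = classify(f_sr)        # per-class probabilities from classifier C
    return labels[int(np.argmax(probs))]
```

At inference the discriminator FSRD is no longer needed: only the extractor, generator and classifier participate in the forward pass.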
Referring to table 1, table 1 shows the average accuracy of expression recognition on the face image downsampled at each magnification of the RAF-DB data set by different methods, and the method provided by the present invention is significantly improved compared with a method for directly performing Bicubic interpolation amplification on a low-resolution image. Compared with a super-resolution method RCAN and Meta-SR for reconstructing images, the method has better effect on images with lower resolution and higher average identification accuracy of images with various scales. The method provided by the invention is obviously improved compared with a method for directly carrying out Bicubic interpolation amplification on the low-resolution image. Compared with the super-resolution method RCAN and Meta-SR of the reconstructed image, the method has better effect on the image with lower resolution and higher average identification accuracy of the image with each scale.
TABLE 1 Average accuracy of expression recognition by different methods on face images of the RAF-DB data set downsampled at each magnification
The above contents merely illustrate the technical idea of the present invention, and the protection scope of the present invention is not limited thereby; any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.
Claims (9)
1. A low-resolution image facial expression recognition method based on a feature reconstruction model is characterized by comprising the following steps:
1) collecting facial expression images with a resolution greater than or equal to 100x100 pixels and labeling their expression categories, these being the original images IHR; down-sampling each original image by integer factors of 2-8 to obtain corresponding low-resolution images, whose expression category labels are consistent with the original image; dividing the original images and the corresponding low-resolution images into a training set and a test set;
2) training a neural network model by adopting a generative confrontation network method;
inputting the original image and the low-resolution images at the respective magnifications into a feature extractor E, which extracts and calculates the feature matrix FHR of the original image and the feature matrix FLR of the low-resolution image at each magnification;
inputting the low-resolution image feature matrix FLR into an expression feature generator FSRG, which outputs the generated reconstructed feature matrix FSR;
inputting the feature matrix FHR of the original image and the corresponding reconstructed feature matrix FSR of the low-resolution image into a feature discriminator FSRD, which compares the difference between the two in the distribution space; the feature generator FSRG is optimized through back propagation;
inputting the reconstructed expression features FSR into a double-layer fully-connected expression classifier C for classification; the expression classifier C calculates the probability of the sample being classified into each category, and a weight coefficient computed from the probability value of each sample being correctly classified is used to re-weight the sample loss, accelerating the convergence of the neural network;
repeating the training process until a trained neural network model is obtained;
3) inputting a facial image with an expression to be recognized into the trained neural network model; the feature extractor E extracts the feature matrix F of the input image, the feature generator FSRG generates the reconstructed feature matrix FSR, and the classifier C calculates and outputs the class label of the recognition result.
2. The feature reconstruction model-based low-resolution image facial expression recognition method as claimed in claim 1, wherein the feature extractor E in step 2) is formed by combining a plurality of convolution layers and nonlinear activation layers and is a feature extraction part of an expression recognition model pre-trained by an original image data set.
3. The method for recognizing the facial expression of the low-resolution image based on the feature reconstruction model as claimed in claim 1, wherein the feature extraction process in the feature extractor E in the step 2) is as follows:
extracting a three-dimensional feature tensor T for the input image I, wherein the size of the feature tensor T is w x h x n, w and h are the length and width of the feature tensor, and n is the number of channels;
calculating a covariance matrix M of the feature tensor T:
M(i,j) = (fi - f̄)^T (fj - f̄), i, j = 1, 2, …, n (1)
wherein fi denotes the i-th channel of the feature tensor T flattened into a vector, f̄ is the mean of the n channels, M ∈ R^(n×n), and n is the number of channels of the feature tensor T;
correcting the eigenvalues of the covariance matrix M to obtain a corrected covariance matrix M+:
M+ = M + λ*trace(M)*I (2)
where λ is a coefficient greater than zero, I is the identity matrix, and trace(M) is the trace of the matrix M;
performing a pooling operation on the corrected covariance matrix M+ and taking the logarithm of its eigenvalues to obtain the feature matrix F.
4. The feature reconstruction model-based low-resolution image facial expression recognition method of claim 3, characterized in that the process of performing the pooling operation on the corrected covariance matrix M+ and taking the logarithm of its eigenvalues to obtain the feature matrix is as follows:
Fcov = W M+ W^T (3)
to Fcov, eigenvalue decomposition and eigenvalue correction are performed to obtain a matrix F+, the concrete operations being:
Fcov = U1 Σ1 U1^T (4)
F+ = U1 max(εI, Σ1) U1^T (5)
wherein max() takes the element-wise maximum of the two matrices and ε is a small positive constant;
to F+, eigenvalue decomposition is performed and the logarithm of the eigenvalues is taken to obtain the feature matrix F, specifically:
F+ = U2 Σ2 U2^T (6)
F = U2 log(Σ2) U2^T (7)
wherein log(Σ2) denotes taking the logarithm of the eigenvalue matrix Σ2.
5. The method for recognizing facial expressions of low-resolution images based on a feature reconstruction model according to claim 1, wherein the feature generator FSRG in step 2) is a fully convolutional network composed of convolutional layers and nonlinear activation layers, and the process of reconstructing the feature matrix by the feature generator FSRG is as follows:
the low-resolution image feature matrix FLR is taken as input and the reconstructed feature matrix FSR is output; the dimensions of the matrix before and after reconstruction are consistent.
6. The feature reconstruction model-based low-resolution image facial expression recognition method according to claim 1, wherein the feature discriminator FSRD in step 2) compares the difference between the two in the distribution space, specifically:
the feature discriminator FSRD takes the feature matrices FSR and FHR corresponding to the same image as input and outputs corresponding scores; the absolute value of the difference between the scores represents the Wasserstein distance between the two in the feature space.
7. The method for recognizing facial expressions of low-resolution images based on a feature reconstruction model as claimed in claim 1, wherein in the training process of step 2), the loss function of the feature generator FSRG consists of the adversarial loss LGAN, the perceptual loss LP between the feature matrices FSR and FHR, and the two-norm loss L2;
the adversarial loss LGAN is:
LGAN = -(1/b) * Σ(i=1..b) FSRD(FSR(i)) (8)
wherein b is the size of the data batch;
the feature perceptual loss LP is:
LP = (1/b) * Σ(i=1..b) ||CFC(FSR(i)) - CFC(FHR(i))||2 (9)
wherein CFC() represents the output of the last fully connected layer of the classifier C;
the two-norm loss L2 is:
L2 = (1/b) * Σ(i=1..b) ||FSR(i) - FHR(i)||2 (10)
the loss of the feature generator FSRG is a linear combination of the three:
LFSRG = LGAN + λ1*LP + λ2*L2 (11)
wherein λ1 and λ2 are both weight coefficients greater than zero.
8. The method for recognizing the facial expressions of the low-resolution images based on the feature reconstruction model as claimed in claim 1, wherein in the training process of step 2), the loss function of the feature discriminator FSRD is:
LFSRD = (1/b) * Σ(i=1..b) [FSRD(FSR(i)) - FSRD(FHR(i))] (12)
9. The feature reconstruction model-based low-resolution image facial expression recognition method as claimed in claim 1, wherein the expression classifier C in step 2) uses softmax to calculate the probability that a sample belongs to each class Classi, i = 1, 2, …, and re-weights the sample loss using the probability value corresponding to the true class:
w = (σ - logit)^r (13)
wherein logit is the probability output by the softmax function for the sample's true class, and the parameters σ and r are set to 1.5 and 2, respectively.
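The second-order feature pipeline of claims 3-4 (Eqs. (1)-(7)) can be sketched in numpy as follows; the pooling matrix W is assumed to be the identity, and λ and ε are illustrative values, since the claims do not fix them:

```python
import numpy as np

def covariance_feature(T, lam=1e-3, eps=1e-5):
    """Feature matrix F from a w*h*n feature tensor T via covariance pooling."""
    w, h, n = T.shape
    X = T.reshape(-1, n).T                      # row i is channel f_i, flattened
    Xc = X - X.mean(axis=0, keepdims=True)      # subtract the channel mean (f̄)
    M = Xc @ Xc.T                               # Eq. (1): n x n channel covariance
    M = M + lam * np.trace(M) * np.eye(n)       # Eq. (2): eigenvalue correction
    # Pooling (Eq. (3)) with W assumed identity; eigenvalue clamp (Eqs. (4)-(5)):
    vals, U = np.linalg.eigh(M)
    vals = np.maximum(vals, eps)
    # Logarithm of the eigenvalues (Eqs. (6)-(7)) gives the feature matrix F:
    F = U @ np.diag(np.log(vals)) @ U.T
    return F
```

The resulting F is symmetric and n x n, matching the dimensions expected by the generator FSRG in claim 5.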
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110055946.8A CN112818764B (en) | 2021-01-15 | 2021-01-15 | Low-resolution image facial expression recognition method based on feature reconstruction model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112818764A true CN112818764A (en) | 2021-05-18 |
CN112818764B CN112818764B (en) | 2023-05-02 |
Family
ID=75869434
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110055946.8A Active CN112818764B (en) | 2021-01-15 | 2021-01-15 | Low-resolution image facial expression recognition method based on feature reconstruction model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112818764B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110007174A1 (en) * | 2009-05-20 | 2011-01-13 | Fotonation Ireland Limited | Identifying Facial Expressions in Acquired Digital Images |
CN107154023A (en) * | 2017-05-17 | 2017-09-12 | 电子科技大学 | Face super-resolution reconstruction method based on generation confrontation network and sub-pix convolution |
WO2019015466A1 (en) * | 2017-07-17 | 2019-01-24 | 广州广电运通金融电子股份有限公司 | Method and apparatus for verifying person and certificate |
US20190303717A1 (en) * | 2018-03-28 | 2019-10-03 | Kla-Tencor Corporation | Training a neural network for defect detection in low resolution images |
CN110084119A (en) * | 2019-03-26 | 2019-08-02 | 安徽艾睿思智能科技有限公司 | Low-resolution face image recognition methods based on deep learning |
CN110211045A (en) * | 2019-05-29 | 2019-09-06 | 电子科技大学 | Super-resolution face image method based on SRGAN network |
CN111931805A (en) * | 2020-06-23 | 2020-11-13 | 西安交通大学 | Knowledge-guided CNN-based small sample similar abrasive particle identification method |
CN111784581A (en) * | 2020-07-03 | 2020-10-16 | 苏州兴钊防务研究院有限公司 | SAR image super-resolution reconstruction method based on self-normalization generation countermeasure network |
CN112070058A (en) * | 2020-09-18 | 2020-12-11 | 深延科技(北京)有限公司 | Face and face composite emotional expression recognition method and system |
Non-Patent Citations (2)
Title |
---|
XINTAO WANG ET AL.: "ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks", arXiv:1809.00219v2 *
YAO Naiming et al.: "Robust facial expression recognition based on generative adversarial networks", Acta Automatica Sinica *
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113255517A (en) * | 2021-05-24 | 2021-08-13 | 中国科学技术大学 | Privacy-protecting expression recognition model training method and expression recognition method and device |
CN113255517B (en) * | 2021-05-24 | 2023-10-24 | 中国科学技术大学 | Expression recognition model training method for protecting privacy and expression recognition method and device |
CN113344110A (en) * | 2021-06-26 | 2021-09-03 | 浙江理工大学 | Fuzzy image classification method based on super-resolution reconstruction |
CN113344110B (en) * | 2021-06-26 | 2024-04-05 | 浙江理工大学 | Fuzzy image classification method based on super-resolution reconstruction |
CN113486842A (en) * | 2021-07-23 | 2021-10-08 | 北京达佳互联信息技术有限公司 | Expression editing model training method and device and expression editing method and device |
CN113887371A (en) * | 2021-09-26 | 2022-01-04 | 华南理工大学 | Data enhancement method for low-resolution face recognition |
CN113887371B (en) * | 2021-09-26 | 2024-05-28 | 华南理工大学 | Data enhancement method for low-resolution face recognition |
CN113902010A (en) * | 2021-09-30 | 2022-01-07 | 北京百度网讯科技有限公司 | Training method of classification model, image classification method, device, equipment and medium |
CN114863164A (en) * | 2022-04-02 | 2022-08-05 | 华中科技大学 | Target identification model construction method for small-target super-resolution reconstructed image |
CN114648803A (en) * | 2022-05-20 | 2022-06-21 | 中国科学技术大学 | Method, system, equipment and storage medium for recognizing facial expressions in natural scene |
CN114648803B (en) * | 2022-05-20 | 2022-09-06 | 中国科学技术大学 | Method, system, equipment and storage medium for recognizing facial expressions in natural scene |
CN115511748A (en) * | 2022-09-30 | 2022-12-23 | 北京航星永志科技有限公司 | Image high-definition processing method and device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN112818764B (en) | 2023-05-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112818764B (en) | Low-resolution image facial expression recognition method based on feature reconstruction model | |
CN111091045B (en) | Sign language identification method based on space-time attention mechanism | |
Rahman et al. | A new benchmark on american sign language recognition using convolutional neural network | |
Liu | Feature extraction and image recognition with convolutional neural networks | |
CN108537743B (en) | Face image enhancement method based on generation countermeasure network | |
CN107341452B (en) | Human behavior identification method based on quaternion space-time convolution neural network | |
CN112446476A (en) | Neural network model compression method, device, storage medium and chip | |
Teow | Understanding convolutional neural networks using a minimal model for handwritten digit recognition | |
Huynh et al. | Convolutional neural network models for facial expression recognition using bu-3dfe database | |
CN112070768B (en) | Anchor-Free based real-time instance segmentation method | |
CN106326843B (en) | A kind of face identification method | |
CN113379655B (en) | Image synthesis method for generating antagonistic network based on dynamic self-attention | |
CN114821050B (en) | Method for dividing reference image based on transformer | |
CN111967361A (en) | Emotion detection method based on baby expression recognition and crying | |
CN115966010A (en) | Expression recognition method based on attention and multi-scale feature fusion | |
CN108229432A (en) | Face calibration method and device | |
CN114463759A (en) | Lightweight character detection method and device based on anchor-frame-free algorithm | |
CN112668486A (en) | Method, device and carrier for identifying facial expressions of pre-activated residual depth separable convolutional network | |
CN109508640A (en) | Crowd emotion analysis method and device and storage medium | |
Teow | A minimal convolutional neural network for handwritten digit recognition | |
CN115238796A (en) | Motor imagery electroencephalogram signal classification method based on parallel DAMSCN-LSTM | |
Piat et al. | Image classification with quantum pre-training and auto-encoders | |
CN110688966A (en) | Semantic-guided pedestrian re-identification method | |
CN114492634A (en) | Fine-grained equipment image classification and identification method and system | |
CN107133579A (en) | Based on CSGF (2D)2The face identification method of PCANet convolutional networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |