CN112818764B - Low-resolution image facial expression recognition method based on feature reconstruction model - Google Patents
- Publication number: CN112818764B (application CN202110055946.8A)
- Authority: CN (China)
- Legal status: Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformation in the plane of the image
- G06T3/40—Scaling the whole image or part thereof
- G06T3/4053—Super resolution, i.e. output image resolution higher than sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
Abstract
The invention discloses a low-resolution image facial expression recognition method based on a feature reconstruction model, and belongs to the field of facial image expression recognition. The method first constructs training and testing data sets. It then trains a facial expression recognition model built on a feature reconstruction model: a feature extraction network with fixed parameters extracts image expression features; an expression feature generator and a feature discriminator are obtained by training the model in the manner of a generative adversarial network; and the generator FSRG reconstructs the features of the input image to obtain F_SR. A classifier consisting of a fully connected network and a softmax function layer classifies the feature F_SR, and the sample loss is re-weighted with the probability value that the softmax layer outputs for the sample's correct category. The invention is insensitive to the resolution of the input image, improves recognition accuracy at lower resolutions, and gives a more stable recognition effect across resolutions.
Description
Technical Field
The invention belongs to the field of facial image expression recognition, and particularly relates to a low-resolution image facial expression recognition method based on a feature reconstruction model.
Background
Facial expression is one of the most direct and natural signals by which humans express emotion. Facial expression recognition is a hot research topic in human-computer natural interaction, computer vision, affective computing, image processing and related fields, and has wide application in human-computer interaction, distance education, security, intelligent robotics, medical treatment, animation production and other fields.
In different scenes, because of changes in equipment and environment and the imaging principle of the pinhole camera, face images in a multi-person photographing scene suffer from the "near-large, far-small" problem of varying resolution, and images may also be compressed during network transmission and storage, reducing their quality and resolution. Recognition accuracy can be severely degraded in low-resolution scenarios, so to recognize expressions more accurately it is necessary to reduce the influence of resolution changes. With the development of deep learning, image super-resolution and related technologies, most existing methods handle a low-resolution input image by first performing super-resolution reconstruction on the image and then recognizing the expression. Such image-reconstruction methods have two disadvantages. First, although they improve expression recognition compared with using the low-resolution image directly, they greatly increase the amount of computation and their effect is unstable. Second, because the object of expression recognition is a human face, high-resolution reconstruction of a face image easily causes privacy leakage, a problem that is receiving growing international attention.
Disclosure of Invention
The invention aims to overcome the defects of the large calculation amount and the privacy-leakage risk of reconstructing face images, and provides a low-resolution image facial expression recognition method based on a feature reconstruction model.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
a low-resolution image facial expression recognition method based on a feature reconstruction model comprises the following steps:
1) Collecting facial expression images with a resolution of at least 100x100 pixels and labeling their expression categories to serve as the original images I_HR; downsampling each original image by integer magnifications of 2 to 8 to obtain the corresponding low-resolution images, whose expression category labels are consistent with the original image; dividing the original images and the corresponding low-resolution images into a training set and a test set;
2) Training a neural network model by the method of generative adversarial networks;
inputting the original image and the low-resolution image of each magnification into a feature extractor E, which extracts and calculates the feature matrix F_HR of the original image and the feature matrix F_LR of the low-resolution image at each magnification;
inputting the low-resolution image feature matrix F_LR into the expression feature generator FSRG, which outputs the generated reconstructed feature matrix F_SR;
inputting the feature matrix F_HR of the original image and the corresponding reconstructed feature matrix F_SR of the low-resolution image into the feature discriminator FSRD, which compares the difference between the two in the feature distribution space, and optimizing the feature generator FSRG through back propagation;
inputting the reconstructed expression feature F_SR into a double-layer fully connected expression classifier C for classification; the expression classifier C calculates the probability of classifying the sample into each category, and the loss of each sample is re-weighted with a weight coefficient computed from the probability that the sample is correctly classified, so as to accelerate convergence of the neural network;
repeating the training process until a trained neural network model is obtained;
3) Inputting a face image of the expression to be recognized into the trained neural network model: the feature extractor E extracts the feature matrix F of the input image, the feature generator FSRG generates the reconstructed feature matrix F_SR, and the classifier C calculates and outputs the category label of the recognition result.
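The three-stage recognition flow of step 3) can be sketched as follows. The extractor, generator and classifier here are hypothetical stand-ins (simple lambdas), not the trained networks E, FSRG and C of the invention; the sketch only shows how the three components chain together.

```python
import numpy as np

def recognize(image, extractor, generator, classifier):
    """Return the predicted expression class index for one input image."""
    F = extractor(image)        # feature matrix of the input image
    F_SR = generator(F)         # reconstructed (super-resolved) feature matrix
    probs = classifier(F_SR)    # per-class probabilities
    return int(np.argmax(probs))

# Toy stand-ins: a trivial extractor, an identity generator and a fixed classifier.
E = lambda img: img.mean(axis=0)
FSRG = lambda F: F
C = lambda F: np.array([0.1, 0.7, 0.2])

label = recognize(np.zeros((3, 4)), E, FSRG, C)
```

With the fixed toy classifier above, the predicted label is simply the argmax of its output vector.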
Further, the feature extractor E in step 2) is formed by combining a plurality of convolutional layers and nonlinear activation layers, and is the feature extraction part of an expression recognition model pre-trained on the original-image dataset.
Further, the feature extraction process in the feature extractor E in step 2) is as follows:
for an input image I, extracting a three-dimensional characteristic tensor T, wherein the size of the characteristic tensor T is w x h x n, w and h are the length and width of the characteristic tensor, and n is the channel number;
calculating the covariance matrix M of the feature tensor T:
M = (1/(w*h)) * (X - m*1^T)(X - m*1^T)^T (1)
where X ∈ R^(n×wh) is the feature tensor T flattened so that its i-th row f_i represents one channel of T, m ∈ R^n is the vector of per-channel means of the feature tensor, M ∈ R^(n×n), and n is the number of channels of the feature tensor T;
correcting the eigenvalues of the covariance matrix M to obtain the corrected covariance matrix M+:
M+ = M + λ*trace(M)*I (2)
where λ is a coefficient greater than zero, I is the identity matrix, and trace(M) is the trace of the matrix M;
performing a pooling operation on the corrected covariance matrix M+ and taking the logarithm of its eigenvalues to obtain the feature matrix F.
Further, for the corrected covariance matrix M+, the process of the pooling operation and of taking the logarithm of the eigenvalues to obtain the feature matrix is as follows:
F_cov = W M+ W^T (3)
where W is the pooling parameter matrix;
performing eigenvalue decomposition and eigenvalue correction on F_cov to obtain the matrix F+; the specific operation is:
F_cov = U1 Σ1 U1^T (4)
F+ = U1 max(εI, Σ1) U1^T (5)
where max(·,·) takes the element-wise maximum of the two matrices;
performing eigenvalue decomposition on F+ and taking the logarithm of the eigenvalues to obtain the feature matrix F; the specific operation is:
F+ = U2 Σ2 U2^T (6)
F = U2 log(Σ2) U2^T (7)
where log(Σ2) denotes taking the logarithm of each element of the eigenvalue matrix Σ2.
Further, the feature generator FSRG in step 2) is a fully convolutional network composed of convolutional layers and nonlinear activation layers; the process of reconstructing the feature matrix by the feature generator FSRG is as follows:
the low-resolution image feature matrix F_LR is the input and the reconstructed feature matrix F_SR is the output, and the matrices before and after reconstruction remain dimensionally consistent.
Further, the feature discriminator FSRD in step 2) compares the difference between the two feature matrices in the distribution space, specifically:
the feature discriminator FSRD takes the feature matrices F_SR and F_HR corresponding to the same image as inputs and outputs corresponding scores; the absolute value of the difference between the scores represents the Wasserstein distance between the two in the feature space.
Further, during the training of step 2), the loss function of the feature generator FSRG consists of the adversarial loss L_GAN, the perceptual loss L_P between the feature matrices F_SR and F_HR, and the two-norm loss L_2;
the adversarial loss L_GAN is:
L_GAN = -(1/b) Σ_{i=1}^{b} FSRD(F_SR^(i)) (8)
where b is the size of the data batch;
the feature perceptual loss L_P is:
L_P = (1/b) Σ_{i=1}^{b} ||C_FC(F_SR^(i)) - C_FC(F_HR^(i))||_2 (9)
where C_FC(·) represents the output of the last fully connected layer of the classifier C;
the two-norm loss L_2 is:
L_2 = (1/b) Σ_{i=1}^{b} ||F_SR^(i) - F_HR^(i)||_2 (10)
the loss of the feature generator FSRG is a linear sum of the three:
L_FSRG = L_GAN + λ1*L_P + λ2*L_2 (11)
where λ1 and λ2 are weight coefficients greater than zero.
Further, in the training process of step 2), the loss function of the feature discriminator FSRD is:
L_FSRD = (1/b) Σ_{i=1}^{b} [ FSRD(F_SR^(i)) - FSRD(F_HR^(i)) + k*(||∇FSRD(F_hat^(i))||_2 - 1)^p ] (12)
where F_hat = θ*F_HR + (1-θ)*F_SR, and θ is a random number between 0 and 1, ensuring that each batch item F_hat is a linear interpolation of F_SR and F_HR; p and k are respectively the exponent parameter and the coefficient parameter of the gradient-penalty term.
Further, the expression classifier C in step 2) uses softmax to calculate the probability that a sample belongs to each class Class_i, i = 1, ..., z, where z is the total number of classes, and re-weights the sample's loss with the probability value corresponding to its real class; the specific operation is:
w = (σ - logit)^r (13)
where logit is the probability, output by the softmax function, that the sample belongs to its true class, and the parameters σ and r are set to 1.5 and 2, respectively.
Compared with the prior art, the invention has the following beneficial effects:
according to the low-resolution image facial expression recognition method based on the feature reconstruction model, a training and testing data set is constructed, different multiplying power downsampling is carried out on high-resolution facial expression images to generate high-low resolution image pairs with multiple multiplying power, and category labels are reserved; then training a facial expression recognition model of the feature reconstruction model, and extracting high-resolution image expression features F by using a feature extraction network with fixed parameters HR And corresponding low resolution image expressive features F LR The method comprises the steps of carrying out a first treatment on the surface of the Then training a model by adopting a mode of generating an countermeasure network to obtain an expression feature generator FSRG and a feature discriminator FSRD, and reconstructing features by using the FSRG as an input image to obtain F SR The method comprises the steps of carrying out a first treatment on the surface of the Classifier C pair feature F composed of fully connected network and softmax function layer SR Classifying, re-weighting the sample loss by using the probability value of the correct category corresponding to the sample output by the softmax layer, and accelerating model convergence; the identification process is as follows: the model extracts a feature matrix F of the input image, and then a feature generator FSRG generates a reconstructed feature matrix F SR And calculating and outputting the class labels of the recognition results by using the classifier C obtained through training. The invention provides a method for reconstructing image features by combining a deep learning countermeasure generation network to recognize facial expressions. 
Compared with the traditional method, the invention is insensitive to the resolution of the input image, and improves the lower resolutionThe recognition accuracy under the rate; compared with the method for reconstructing the image, the method has the advantages that the identification effect on each resolution is more stable, the problems of increased calculation amount and possible privacy leakage caused by reconstructing the image can be avoided, and the method has great industrial application value.
Drawings
FIG. 1 is an overall network of a low resolution image facial expression recognition method based on a feature reconstruction model of the present invention;
FIG. 2 is a network architecture of the feature extractor of the present invention;
FIG. 3 is a network structure of a signature generating part of the present invention, wherein FIG. 3 (a) is a signature generator network structure and FIG. 3 (b) is a dense connection block structure;
fig. 4 is a network structure of the feature discriminator of the invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solution in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the present invention without inventive effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention provides a method for recognizing facial expressions by reconstructing image features with a deep-learning generative adversarial network. Compared with traditional methods, the method is insensitive to the resolution of the input image and improves recognition accuracy at lower resolutions; compared with methods that reconstruct the image, its recognition effect is more stable across resolutions, it avoids the increased computation and potential privacy leakage caused by reconstructing the image, and it has great industrial application value in the fields of education analysis, management and entertainment.
The invention is described in further detail below with reference to the attached drawing figures:
referring to fig. 1, fig. 1 is an overall network of a low-resolution image facial expression recognition method based on a feature reconstruction model according to the present invention; the network comprises four main parts, namely a feature extractor, a feature generator, a feature discriminator and an expression classifier.
Referring to fig. 2, fig. 2 is the network structure of the feature extractor of the present invention. The feature extractor comprises six convolutional layers (Conv Layer), each with a 3x3 convolution kernel and a stride of 1. The numbers of output feature channels of the convolutional layers are 64, 96, 128 and 256 in turn; each convolutional layer is followed by an activation layer whose activation function is the ReLU function. A pooling layer follows each of the first, second and fourth activation layers, using max pooling with a 2x2 pooling window and a stride of 2.
Referring to fig. 3, fig. 3 (a) is the feature generator network structure and fig. 3 (b) is the dense connection block structure. The structure of a single dense block, shown in fig. 3 (b), comprises five convolutional layer-batch normalization layer (Batch Normalization, BN) combinations, with dense connections between the groups and an LReLU function added as the activation layer.
Referring to fig. 4, fig. 4 is the network structure of the feature discriminator of the invention, which is composed of five convolution blocks and two fully connected layers; the numbers of output channels of the convolution blocks are 8, 16, 32, 64 and 64 in turn. Each convolution block consists of two convolutional layers alternating with two activation layers; the former convolutional layer has a 3x3 kernel and a stride of 1, and the latter has a 5x5 kernel and a stride of 2. The output dimensions of the last two fully connected layers are 100 and 1, respectively.
The invention discloses a low-resolution image facial expression recognition method based on a feature reconstruction model, which comprises the following implementation processes:
model training part:
step 1: collecting facial expression images with resolution ratio of 100x100 pixels or more, and labeling expression types to serve as an original image I HR The method comprises the steps of carrying out a first treatment on the surface of the Downsampling the original image by 2-8 times of integer multiplying power (the length and width of the image are changed into original resolution) by bicubic interpolationTo->) Obtaining a plurality of low resolution images (I LR-2 To I LR-8 ) The method comprises the steps of carrying out a first treatment on the surface of the The expression category label of the low-resolution image is consistent with the original image;
step 2: feature extractor E, pre-trained using fixed parameters, extracts feature matrix F of original resolution image HR Feature matrix F of low-resolution image corresponding to each magnification LR The feature extractor E comprises a convolution layer and a nonlinear activation layer. One input is a high-low resolution image pair, for one image I, a feature extractor is used for extracting a corresponding three-dimensional feature tensor T, the size of the feature tensor T is w x h x n, w and h are the length and the width of the corresponding feature tensor, and n is the channel number;
step 3: calculating covariance matrices of the respective feature tensors T:
wherein ,fi Representing one channel of the feature tensor T,m E is the average value of each channel of the characteristic tensor n*n N is the number of channels of the feature tensor T.
Step 4: to ensure the positive nature of the matrix, eigenvalue correction is performed on each covariance matrix:
M + =M+λ*trace(M)*I (2)
wherein lambda is a coefficient larger than zero, and as the covariance matrix is symmetrically semi-positive, the lambda takes a value of 0.0001 in order to reduce the influence of the operation on the feature matrix and ensure positive quality; i is the identity matrix.
Step 5: for covariance matrix M + Carrying out pooling operation on the characteristic values and taking logarithms of the characteristic values to obtain a characteristic matrix, wherein the specific operation is as follows:
F cov =WM + W T (3)
wherein ,to pool the parameter matrix, the specific parameters are optimized by back propagation learning, matrix +.>
Step 6: for matrix F cov Decomposing the eigenvalue and performing the following operation to obtain a matrix F + The specific operation is as follows:
F cov =U 1 Σ 1 U 1 T (4)
F + =U 1 max(εI,Σ 1 )U 1 T (5)
where max () is the maximum value of two matrices element by element.
Step 7: f to matrix + Taking the logarithm of the eigenvalue to obtain a final eigenvector F, wherein the specific operation is as follows:
F cov =U 2 Σ 2 U 2 T (6)
F=U 2 log(Σ 2 )U 2 T (7)
wherein log (Σ 2 ) Finger pair feature matrix Σ 2 The operation of taking the logarithm of each element of (a).
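Steps 3 to 7 (covariance pooling, eigenvalue correction and the logarithmic mapping) can be sketched with NumPy as follows. The learned pooling matrix W of equation (3) is replaced by the identity, and the values of lam and eps are illustrative, so this is a sketch of the computation rather than the trained model.

```python
import numpy as np

def spd_feature(T, lam=1e-4, eps=1e-5):
    """Sketch of Steps 3-7: covariance pooling of a feature tensor T (w, h, n),
    eigenvalue correction, and the eigenvalue-logarithm mapping."""
    w, h, n = T.shape
    X = T.reshape(w * h, n).T                    # rows = channels f_i, shape (n, w*h)
    Xc = X - X.mean(axis=1, keepdims=True)       # subtract the per-channel mean m
    M = Xc @ Xc.T / (w * h)                      # covariance matrix, eq. (1)
    M_plus = M + lam * np.trace(M) * np.eye(n)   # eigenvalue correction, eq. (2)
    F_cov = M_plus                               # W = I stand-in for eq. (3)
    vals, U = np.linalg.eigh(F_cov)              # eq. (4)
    F_plus = U @ np.diag(np.maximum(eps, vals)) @ U.T   # eq. (5): clamp eigenvalues
    vals2, U2 = np.linalg.eigh(F_plus)           # eq. (6)
    return U2 @ np.diag(np.log(vals2)) @ U2.T    # eq. (7): log of each eigenvalue

F = spd_feature(np.random.default_rng(0).normal(size=(6, 6, 4)))
```

The output F is an n-by-n symmetric matrix, matching the feature-matrix shape implied by equations (1)-(7).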
Step 7: model structure initialization
The feature generator FSRG is a full convolution network, and is implemented by ResNet-50 in the invention, and features matrix F of low-resolution image is adopted LR For inputting and outputting reconstructed characteristic matrix F SR The dimension of the input and output feature matrix is consistent, so that the original pooling operation in ResNet is removed; the feature discriminator FSRD is adopted in the invention that the VGG-16 network respectively uses the original image feature matrix F HR Feature matrix F of low-resolution image of magnification corresponding thereto SR Is input; the expression classifier C consists of two full-connection layers and a softmax function layer, and outputs the probability of each classification list.
Step 8: setting a loss function
During training, the loss function of the feature generator FSRG is determined by countering the loss L GAN Feature matrix F SR and FHR Perceptual loss L between P And L2 distance loss L 2 Composition, wherein L GAN The method comprises the following steps:
where b is the size of the data batch, L GAN Is to constrain the feature perception loss of the feature generator FSRG and the feature discriminator FSRD, L P The method comprises the following steps:
wherein ,CFC () Representing the output of the last fully connected layer of classifier C.
The loss of the feature generator FSRG is a linear sum of the three:
L FSRG =L GAN +λ 1 L P +λ 2 L 2 (10)
wherein ,λ1 and λ2 All are adjustable weight coefficients greater than zero, and in the invention, both coefficients are set to 0.1.
The loss function of the feature discriminator FSRD is calculated as:
L_FSRD = (1/b) Σ_{i=1}^{b} [ FSRD(F_SR^(i)) - FSRD(F_HR^(i)) + k*(||∇FSRD(F_hat^(i))||_2 - 1)^p ] (11)
where F_hat = θ*F_HR + (1-θ)*F_SR, and θ is a random number between 0 and 1, ensuring that each batch item F_hat is a linear interpolation of F_SR and F_HR; p and k are respectively the exponent parameter and the coefficient parameter of the gradient-penalty term; p = 6 and k = 2 gave the best results in the experiments.
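A minimal numerical sketch of a discriminator loss of this Wasserstein-with-gradient-penalty form, for a single sample: the critic here is a toy linear function whose gradient is known in closed form, not the VGG-16 discriminator of the invention.

```python
import numpy as np

def fsrd_loss(D, grad_D, F_SR, F_HR, theta, p=6, k=2):
    """Single-sample sketch: Wasserstein critic terms plus a gradient penalty of
    exponent p and coefficient k on the interpolation F_hat."""
    F_hat = theta * F_HR + (1 - theta) * F_SR
    gp = k * (np.linalg.norm(grad_D(F_hat)) - 1.0) ** p
    return D(F_SR) - D(F_HR) + gp

a = np.array([0.6, 0.8])          # toy linear critic D(F) = a . F, with ||a|| = 1
D = lambda F: float(a @ F)
grad_D = lambda F: a              # the gradient of a linear critic is the constant a

loss = fsrd_loss(D, grad_D, np.array([1.0, 0.0]), np.array([0.0, 1.0]), theta=0.5)
```

Because the toy critic already has unit gradient norm, the penalty term vanishes and the loss reduces to the critic-score difference.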
The specific operation of calculating the probability that a sample belongs to each class using softmax and re-weighting the sample loss is:
w = (σ - logit)^r (12)
where logit is the probability, output by the softmax function, that the sample belongs to its true class, and the parameters σ and r are set to 1.5 and 2, respectively; the loss function of the classifier C is set to the cross-entropy loss.
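The re-weighting rule can be sketched as follows; multiplying the weight w into the cross-entropy loss of the classifier C is the assumed usage, since the patent sets the classifier loss to cross entropy.

```python
import numpy as np

def reweighted_loss(logits, true_class, sigma=1.5, r=2):
    """Sketch of the re-weighted cross-entropy: the softmax probability of the
    true class (the "logit" in the patent's wording) sets w = (sigma - p)^r."""
    e = np.exp(logits - logits.max())   # numerically stable softmax
    probs = e / e.sum()
    p_true = probs[true_class]
    w = (sigma - p_true) ** r           # weight of eq. (12)
    return w * (-np.log(p_true)), w     # (re-weighted loss, weight)

loss, w = reweighted_loss(np.array([2.0, 1.0, 0.1]), true_class=0)
```

Confidently classified samples (p_true near 1) receive a small weight, so hard samples dominate the gradient and, per the patent, convergence is accelerated.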
Step 9: model training
The gradient was updated using Adam optimizer, the learning rate was set to 0.00002, the Adam's one-order momentum parameter was 0.1, and the second-order momentum parameter was 0.999. The data set training iteration number (Epoch) was set to 400 and the data batch size (batch size) was set to 16.
Model use part:
extracting an image feature tensor T by using a feature extractor E, and carrying out feature reconstruction by using a feature generator FSRG to obtain a corresponding reconstructed feature F SR The probability that the sample belongs to each class is then calculated by the classifier C, classifying the sample into the class with the highest probability.
Referring to Table 1, Table 1 shows the average expression recognition accuracy of different methods on the face images of the RAF-DB dataset downsampled at each magnification. Compared with directly enlarging the low-resolution images by bicubic interpolation, the method of the invention shows an obvious improvement. Compared with the super-resolution methods RCAN and Meta-SR, which reconstruct the image, the method achieves a better effect on images of lower resolution and a higher average recognition accuracy across image scales.
TABLE 1 average accuracy of expression recognition on down-sampled face images for each magnification of RAF-DB dataset by different methods
The above is only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited by this, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the claims of the present invention.
Claims (8)
1. The low-resolution image facial expression recognition method based on the feature reconstruction model is characterized by comprising the following steps of:
1) Collecting facial expression images with a resolution of 100x100 pixels or above and labeling their expression classes, to serve as the original images I_HR; scaling both the length and width of each image down to fractions of the original resolution at several downsampling magnifications to obtain corresponding low-resolution images, whose expression class labels are consistent with those of the original images; taking one part of the original images and the corresponding low-resolution images as a training set, and the other part as a test set;
2) Training a neural network model by a generative adversarial network (GAN) method;
inputting the original image and the low-resolution image of each magnification into a feature extractor E, which extracts and calculates the original-image feature matrix F_HR and the low-resolution image feature matrix F_LR of each magnification;
inputting the low-resolution image feature matrix F_LR into an expression feature generator FSRG, which outputs the reconstructed feature matrix F_SR;
inputting the original-image feature matrix F_HR and the corresponding reconstructed feature matrix F_SR of the low-resolution image into a feature discriminator FSRD, which compares their difference in the feature distribution space, and optimizing the feature generator FSRG through back-propagation;
inputting the reconstructed expression feature F_SR into a two-layer fully-connected expression classifier C for classification; the expression classifier C calculates the probability of each class for the sample, and the loss of each sample is re-weighted by a weight coefficient computed from the probability of the sample being correctly classified, so as to accelerate convergence of the neural network;
repeating the training process until a trained neural network model is obtained;
during the training of step 2), the loss function of the feature generator FSRG consists of the adversarial loss L_GAN, the perceptual loss L_P between the feature matrices F_SR and F_HR, and the two-norm loss L_2;
the adversarial loss L_GAN is:

L_GAN = -(1/b) Σ_{i=1}^{b} FSRD(F_SR^(i))  (8)

where b is the size of the data batch;
the feature perceptual loss L_P is:

L_P = (1/b) Σ_{i=1}^{b} ||C_FC(F_SR^(i)) - C_FC(F_HR^(i))||_2^2  (9)

where C_FC() denotes the output of the last fully-connected layer of the classifier C;
the two-norm loss L_2 is:

L_2 = (1/b) Σ_{i=1}^{b} ||F_SR^(i) - F_HR^(i)||_2^2  (10)
the loss of the feature generator FSRG is a linear combination of the three:

L_FSRG = L_GAN + λ_1·L_P + λ_2·L_2  (11)

where λ_1 and λ_2 are both weight coefficients greater than zero;
3) Inputting a face image of the expression to be recognized into the trained neural network model: the feature extractor E extracts the feature matrix F of the input image, the feature generator FSRG generates the reconstructed feature matrix F_SR, and the classifier C calculates and outputs the class label of the recognition result.
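The generator loss of equation (11) combines three terms. The following is a minimal, pure-Python sketch of that combination (a Wasserstein-style adversarial term from the critic's scores, a perceptual term over classifier features, and a two-norm term over the feature matrices); all inputs here are plain lists standing in for batched network outputs, and the weight values in the usage are illustrative.

```python
def l2_sq(a, b):
    """Squared two-norm distance between two flat vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def generator_loss(d_scores_sr, c_fc_sr, c_fc_hr, f_sr, f_hr, lam1, lam2):
    """L_FSRG = L_GAN + lam1*L_P + lam2*L_2, averaged over a batch of size b."""
    b = len(d_scores_sr)
    l_gan = -sum(d_scores_sr) / b                                   # eq. (8)
    l_p = sum(l2_sq(p, q) for p, q in zip(c_fc_sr, c_fc_hr)) / b    # eq. (9)
    l_2 = sum(l2_sq(p, q) for p, q in zip(f_sr, f_hr)) / b          # eq. (10)
    return l_gan + lam1 * l_p + lam2 * l_2                          # eq. (11)
```

In training, `d_scores_sr` would be the discriminator's scores on the reconstructed features, and `c_fc_sr`/`c_fc_hr` the last fully-connected outputs of the classifier C on F_SR and F_HR.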
2. The low resolution image facial expression recognition method based on a feature reconstruction model according to claim 1, wherein the feature extractor E in step 2) is composed of a plurality of convolution layers and a nonlinear activation layer, and is a feature extraction portion of an expression recognition model pre-trained by an original image dataset.
3. The low-resolution image facial expression recognition method based on the feature reconstruction model according to claim 1, wherein the feature extraction process in the feature extractor E in step 2) is as follows:
for an input image I, extracting a three-dimensional feature tensor T of size w×h×n, where w and h are the length and width of the feature tensor and n is the number of channels;
calculating the covariance matrix M of the feature tensor T:

M = (1/n) Σ_{i=1}^{n} (f_i - f̄)(f_i - f̄)^T  (1)

where f_i denotes one channel of the feature tensor T, f̄ is the mean of the channels of the feature tensor, f̄ = (1/n) Σ_{i=1}^{n} f_i, and n is the number of channels of the feature tensor T;
correcting the eigenvalues of the covariance matrix M to obtain the corrected covariance matrix M_+:

M_+ = M + λ·trace(M)·I  (2)

where λ is a coefficient greater than zero, I is the identity matrix, and trace(M) is the trace of the matrix M;
applying a pooling operation to the corrected covariance matrix M_+ and taking the logarithm of its eigenvalues to obtain the feature matrix F.
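Equations (1) and (2) of claim 3 can be sketched in a few lines of pure Python. The channel vectors and the λ value below are illustrative toy inputs, not from the patent; a real implementation would operate on the w·h-length flattened channels of the extractor's feature tensor.

```python
def covariance(channels):
    """Eq. (1): channels is a list of n flattened channel vectors f_i,
    each of length w*h; returns the (w*h) x (w*h) covariance matrix M."""
    n, d = len(channels), len(channels[0])
    mean = [sum(ch[k] for ch in channels) / n for k in range(d)]   # channel mean
    cent = [[ch[k] - mean[k] for k in range(d)] for ch in channels]
    # M = (1/n) * sum_i (f_i - mean)(f_i - mean)^T
    return [[sum(c[a] * c[b] for c in cent) / n for b in range(d)]
            for a in range(d)]

def trace_correct(M, lam):
    """Eq. (2): M_+ = M + lam * trace(M) * I."""
    t = sum(M[i][i] for i in range(len(M)))
    return [[M[i][j] + (lam * t if i == j else 0.0) for j in range(len(M))]
            for i in range(len(M))]
```

The trace correction keeps M_+ well-conditioned (strictly positive-definite) before the subsequent eigenvalue operations.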
4. A low-resolution image facial expression recognition method based on a feature reconstruction model as claimed in claim 3, wherein the process of applying the pooling operation to the corrected covariance matrix M_+ and taking the logarithm of the eigenvalues to obtain the feature matrix is as follows:

F_cov = W M_+ W^T  (3)
performing eigenvalue decomposition and eigenvalue correction on F_cov to obtain the matrix F_+; the specific operations are:

F_cov = U_1 Σ_1 U_1^T  (4)

F_+ = U_1 max(εI, Σ_1) U_1^T  (5)

where max() takes the maximum of the corresponding elements of the two matrices;
performing eigenvalue decomposition on F_+ and taking the logarithm of its eigenvalues; the specific operations are:

F_+ = U_2 Σ_2 U_2^T  (6)

F = U_2 log(Σ_2) U_2^T  (7)

where log(Σ_2) denotes taking the logarithm of each element of the eigenvalue matrix Σ_2.
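The heart of equations (5) and (7) acts only on the eigenvalue spectrum: values below ε are clamped up to ε, then the logarithm is taken elementwise (the orthogonal U matrices are unchanged by either operation). A minimal sketch, with an assumed ε, operating directly on a list of eigenvalues:

```python
import math

def clamp_and_log(eigvals, eps=1e-6):
    """Eqs. (5)+(7) on the spectrum: max(eps, lambda), then elementwise log.
    eps is an assumed small positive constant, not specified in the text."""
    return [math.log(max(eps, lam)) for lam in eigvals]
```

A full implementation would obtain `eigvals` (and U) from a symmetric eigendecomposition of F_cov, e.g. `numpy.linalg.eigh`, and reassemble F = U·diag(clamped logs)·U^T.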
5. The method for recognizing facial expressions of low-resolution images based on a feature reconstruction model according to claim 1, wherein the feature generator FSRG in step 2) is a fully convolutional network composed of convolutional neural network layers and nonlinear activation layers, and the process by which the feature generator FSRG reconstructs the feature matrix is as follows:
taking the low-resolution image feature matrix F_LR as input and outputting the reconstructed feature matrix F_SR, the matrices before and after reconstruction remaining dimensionally consistent.
6. The method for recognizing facial expressions of low-resolution images based on a feature reconstruction model according to claim 1, wherein the feature discriminator FSRD in step 2) compares the difference between the two feature matrices in the distribution space, specifically:
the feature discriminator FSRD takes the feature matrices F_SR and F_HR corresponding to the same image as input respectively and outputs corresponding scores; the absolute value of the difference between the scores represents the Wasserstein distance between the two in the feature space.
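The scoring scheme of claim 6 reduces to a one-liner once a critic is available. The critic here is a stand-in callable (the patent's FSRD is a trained network), so this only illustrates how the two scores are compared.

```python
def feature_distance(critic, f_sr, f_hr):
    """|critic(F_SR) - critic(F_HR)|: the Wasserstein-distance estimate of
    claim 6, with the critic scoring each feature matrix as a scalar."""
    return abs(critic(f_sr) - critic(f_hr))
```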
7. The method for recognizing facial expressions of low-resolution images based on a feature reconstruction model according to claim 1, wherein the training process of step 2) includes the following steps:
8. The facial expression recognition method of a low-resolution image based on a feature reconstruction model as set forth in claim 1, wherein the expression classifier C in step 2) uses softmax to calculate the probability that a sample belongs to each class Class_i, i = 1, ..., z, where z is the total number of classes:

p_i = exp(o_i) / Σ_{j=1}^{z} exp(o_j)  (12)

where o_i is the classifier's output for class i; the loss of the sample is re-weighted by the probability value corresponding to its true class, specifically:

w = (σ - logit)^r  (13)

where logit is the probability, output by the softmax function, that the sample belongs to its true class, and the parameters σ and r are set to 1.5 and 2, respectively.
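The re-weighting of equation (13) is a single expression; the sketch below uses the σ=1.5 and r=2 values stated above. Hard samples (low true-class probability) receive a larger weight, confident correct samples a smaller one.

```python
def reweight(logit, sigma=1.5, r=2):
    """Eq. (13): w = (sigma - logit)^r, with logit the softmax probability
    the sample assigns to its true class."""
    return (sigma - logit) ** r
```

For example, a perfectly classified sample (logit = 1.0) gets weight 0.25, while a completely misclassified one (logit = 0.0) gets weight 2.25, a 9x emphasis on hard samples.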
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110055946.8A CN112818764B (en) | 2021-01-15 | 2021-01-15 | Low-resolution image facial expression recognition method based on feature reconstruction model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112818764A CN112818764A (en) | 2021-05-18 |
CN112818764B true CN112818764B (en) | 2023-05-02 |
Family
ID=75869434
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112818764B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113255517B (en) * | 2021-05-24 | 2023-10-24 | 中国科学技术大学 | Expression recognition model training method for protecting privacy and expression recognition method and device |
CN113344110B (en) * | 2021-06-26 | 2024-04-05 | 浙江理工大学 | Fuzzy image classification method based on super-resolution reconstruction |
CN113486842A (en) * | 2021-07-23 | 2021-10-08 | 北京达佳互联信息技术有限公司 | Expression editing model training method and device and expression editing method and device |
CN113887371A (en) * | 2021-09-26 | 2022-01-04 | 华南理工大学 | Data enhancement method for low-resolution face recognition |
CN114648803B (en) * | 2022-05-20 | 2022-09-06 | 中国科学技术大学 | Method, system, equipment and storage medium for recognizing facial expressions in natural scene |
CN115511748A (en) * | 2022-09-30 | 2022-12-23 | 北京航星永志科技有限公司 | Image high-definition processing method and device and electronic equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107154023A (en) * | 2017-05-17 | 2017-09-12 | 电子科技大学 | Face super-resolution reconstruction method based on generation confrontation network and sub-pix convolution |
WO2019015466A1 (en) * | 2017-07-17 | 2019-01-24 | 广州广电运通金融电子股份有限公司 | Method and apparatus for verifying person and certificate |
CN110084119A (en) * | 2019-03-26 | 2019-08-02 | 安徽艾睿思智能科技有限公司 | Low-resolution face image recognition methods based on deep learning |
CN110211045A (en) * | 2019-05-29 | 2019-09-06 | 电子科技大学 | Super-resolution face image method based on SRGAN network |
CN111784581A (en) * | 2020-07-03 | 2020-10-16 | 苏州兴钊防务研究院有限公司 | SAR image super-resolution reconstruction method based on self-normalization generation countermeasure network |
CN111931805A (en) * | 2020-06-23 | 2020-11-13 | 西安交通大学 | Knowledge-guided CNN-based small sample similar abrasive particle identification method |
CN112070058A (en) * | 2020-09-18 | 2020-12-11 | 深延科技(北京)有限公司 | Face and face composite emotional expression recognition method and system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8488023B2 (en) * | 2009-05-20 | 2013-07-16 | DigitalOptics Corporation Europe Limited | Identifying facial expressions in acquired digital images |
US10599951B2 (en) * | 2018-03-28 | 2020-03-24 | Kla-Tencor Corp. | Training a neural network for defect detection in low resolution images |
Non-Patent Citations (2)
Title |
---|
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks; Xintao Wang et al.; arXiv:1809.00219v2; 2018-09-17; full text *
Robust Facial Expression Recognition Based on Generative Adversarial Networks; Yao Naiming et al.; Acta Automatica Sinica, vol. 44, no. 5; May 2018; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112818764B (en) | Low-resolution image facial expression recognition method based on feature reconstruction model | |
CN108537743B (en) | Face image enhancement method based on generation countermeasure network | |
CN111091045B (en) | Sign language identification method based on space-time attention mechanism | |
CN108717568B (en) | A kind of image characteristics extraction and training method based on Three dimensional convolution neural network | |
CN107977932B (en) | Face image super-resolution reconstruction method based on discriminable attribute constraint generation countermeasure network | |
CN112949565B (en) | Single-sample partially-shielded face recognition method and system based on attention mechanism | |
CN107341452B (en) | Human behavior identification method based on quaternion space-time convolution neural network | |
CN109522857B (en) | People number estimation method based on generation type confrontation network model | |
CN110348330B (en) | Face pose virtual view generation method based on VAE-ACGAN | |
CN108648197B (en) | Target candidate region extraction method based on image background mask | |
CN109886881B (en) | Face makeup removal method | |
CN107085704A (en) | Fast face expression recognition method based on ELM own coding algorithms | |
Huynh et al. | Convolutional neural network models for facial expression recognition using bu-3dfe database | |
CN109389171B (en) | Medical image classification method based on multi-granularity convolution noise reduction automatic encoder technology | |
CN115484410B (en) | Event camera video reconstruction method based on deep learning | |
CN110728629A (en) | Image set enhancement method for resisting attack | |
CN112766165B (en) | Falling pre-judging method based on deep neural network and panoramic segmentation | |
CN112184582B (en) | Attention mechanism-based image completion method and device | |
CN116168067B (en) | Supervised multi-modal light field depth estimation method based on deep learning | |
CN112257741B (en) | Method for detecting generative anti-false picture based on complex neural network | |
CN114463759A (en) | Lightweight character detection method and device based on anchor-frame-free algorithm | |
CN105550712B (en) | Aurora image classification method based on optimization convolution autocoding network | |
CN109977989A (en) | A kind of processing method of image tensor data | |
CN111967361A (en) | Emotion detection method based on baby expression recognition and crying | |
CN112686817A (en) | Image completion method based on uncertainty estimation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||