CN112818764A - Low-resolution image facial expression recognition method based on feature reconstruction model - Google Patents

Low-resolution image facial expression recognition method based on feature reconstruction model

Info

Publication number
CN112818764A
CN112818764A
Authority
CN
China
Prior art keywords
feature
matrix
low
image
resolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110055946.8A
Other languages
Chinese (zh)
Other versions
CN112818764B (en)
Inventor
田锋
经纬
南方
洪振鑫
郑庆华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202110055946.8A priority Critical patent/CN112818764B/en
Publication of CN112818764A publication Critical patent/CN112818764A/en
Application granted granted Critical
Publication of CN112818764B publication Critical patent/CN112818764B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a low-resolution image facial expression recognition method based on a feature reconstruction model, and belongs to the field of facial image expression recognition. The method constructs training and testing data sets; it then trains a facial expression recognition model built around the feature reconstruction model: a feature extraction network with fixed parameters extracts image expression features, training in the generative adversarial network manner yields an expression feature generator FSRG and a feature discriminator, and FSRG reconstructs the features of the input image to obtain F_SR. A classifier consisting of a fully connected network and a softmax function layer classifies the feature F_SR, and the sample loss is reweighted with the probability value that the softmax layer outputs for the sample's correct category. The method is insensitive to the resolution of the input image, improves recognition accuracy at lower resolutions, and recognizes more stably across resolutions.

Description

Low-resolution image facial expression recognition method based on feature reconstruction model
Technical Field
The invention belongs to the field of facial image expression recognition, and particularly relates to a low-resolution image facial expression recognition method based on a feature reconstruction model.
Background
Facial expressions are one of the most direct and natural signals by which humans express emotion. Facial expression recognition is a hot topic in research areas such as natural human-computer interaction, computer vision, affective computing and image processing, and is widely applied in fields such as human-computer interaction, distance education, security, intelligent robot development, medical treatment and animation production.
Across different scenes, changes in equipment and environment and the imaging principle of the pinhole camera mean that face images captured in multi-person photographic scenes come at varying resolutions, and images are further compressed during network transmission and storage, which degrades both quality and resolution. Recognition accuracy can be severely impacted in low-resolution scenarios, so reducing the influence of resolution change is necessary for accurately recognizing expressions. With the development of deep learning and image super-resolution, low-resolution inputs are commonly handled by first performing super-resolution reconstruction on the image and then recognizing on the reconstruction. This image-reconstruction approach has two disadvantages. First, although it improves on recognizing expressions directly from the low-resolution image, it brings a large amount of computation and an unstable effect. Second, since the recognition target is a human face, high-resolution reconstruction of the face image easily causes privacy disclosure, a concern receiving increasing attention in international research.
Disclosure of Invention
The invention aims to overcome the defects of the image-reconstruction approach, namely its large computational cost and the risk of privacy disclosure posed by reconstructed face images, and provides a low-resolution image facial expression recognition method based on a feature reconstruction model.
To achieve this purpose, the invention adopts the following technical scheme:
a low-resolution image facial expression recognition method based on a feature reconstruction model comprises the following steps:
1) collecting facial expression images with resolution not less than 100x100 pixels and labeling their expression categories, these serving as the original images I_HR; carrying out integer-factor down-sampling from 2 to 8 times on each original image to obtain the corresponding low-resolution images, whose expression category labels are consistent with the original image; dividing the original images and the corresponding low-resolution images into a training set and a test set;
2) training a neural network model by adopting the generative adversarial network method:
inputting the original image and the low-resolution images at each magnification into a feature extractor E, the feature extractor E extracting and calculating the feature matrix F_HR of the original image and the feature matrix F_LR of the low-resolution image at each magnification;
inputting the low-resolution image feature matrix F_LR into the expression feature generator FSRG, which outputs the generated reconstructed feature matrix F_SR;
inputting the feature matrix F_HR of the original image and the corresponding reconstructed feature matrix F_SR of the low-resolution image into the feature discriminator FSRD, which compares their difference in the distribution space; the feature generator FSRG is optimized through back propagation;
inputting the reconstructed expression feature F_SR into a two-layer fully connected expression classifier C for classification; the expression classifier C calculates the probability of each sample belonging to each category, and a weight coefficient computed from the probability value of each sample's correct class reweights the sample's loss, accelerating convergence of the neural network;
repeating the training process until a trained neural network model is obtained;
3) inputting the facial image whose expression is to be recognized into the trained neural network model; the feature extractor E extracts the feature matrix F of the input image, the feature generator FSRG generates the reconstructed feature matrix F_SR, and the classifier C calculates and outputs the class label of the recognition result.
Further, the feature extractor E in step 2) is formed by combining a plurality of convolutional layers and nonlinear activation layers, and is the feature extraction part of an expression recognition model pre-trained on the original image data set.
Further, the process of extracting features inside the feature extractor E in step 2) is as follows:
extracting a three-dimensional feature tensor T for the input image I, wherein the size of T is w x h x n, w and h being the length and width of the feature tensor and n the number of channels;
calculating the covariance matrix M of the feature tensor T:
M = (1/(w·h)) Σ_{i=1}^{w·h} (f_i − f̄)(f_i − f̄)^T (1)
wherein f_i ∈ R^n collects the responses of all n channels at the i-th spatial position of T, f̄ is the mean of these vectors (the channel-wise mean), and M ∈ R^{n×n}, n being the number of channels of the feature tensor T;
correcting the eigenvalues of the covariance matrix M to obtain the corrected covariance matrix M+:
M+ = M + λ·trace(M)·I (2)
wherein λ is a coefficient greater than zero, I is the identity matrix, and trace(M) is the trace of the matrix M;
applying the pooling operation to the corrected covariance matrix M+ and taking the logarithm of its eigenvalues to obtain the feature matrix F.
Further, the process of applying the pooling operation to the corrected covariance matrix M+ and taking the logarithm of the eigenvalues to obtain the feature matrix is as follows:
F_cov = W M+ W^T (3)
wherein W is a learnable pooling parameter matrix;
performing eigenvalue decomposition and eigenvalue correction on F_cov to obtain the matrix F+, the specific operations being:
F_cov = U_1 Σ_1 U_1^T (4)
F+ = U_1 max(εI, Σ_1) U_1^T (5)
wherein max() takes the element-wise maximum of the two matrices;
performing eigenvalue decomposition on F+ and taking the logarithm of the eigenvalues to obtain the feature matrix F, specifically:
F+ = U_2 Σ_2 U_2^T (6)
F = U_2 log(Σ_2) U_2^T (7)
wherein log(Σ_2) denotes the element-wise logarithm of the eigenvalue matrix Σ_2.
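As an illustrative sketch (not part of the claimed subject matter), equations (1)-(7) can be summarized in NumPy. The channel-covariance reading of equation (1), the function name and the default values standing in for λ and ε are assumptions; W is the learned pooling matrix.

```python
import numpy as np

def covariance_log_feature(T, W, lam=1e-4, eps=1e-5):
    """Map a w x h x n feature tensor T to the log-Euclidean feature matrix F."""
    w, h, n = T.shape
    X = T.reshape(w * h, n)                       # rows: spatial positions
    Xc = X - X.mean(axis=0, keepdims=True)        # subtract channel-wise mean
    M = Xc.T @ Xc / (w * h)                       # covariance matrix        (1)
    M_plus = M + lam * np.trace(M) * np.eye(n)    # eigenvalue correction    (2)
    F_cov = W @ M_plus @ W.T                      # bilinear pooling         (3)
    s, U = np.linalg.eigh(F_cov)                  # eigendecomposition       (4)
    s = np.maximum(s, eps)                        # F+ = U max(eps*I, S) U^T (5)
    return (U * np.log(s)) @ U.T                  # F  = U log(S) U^T     (6)-(7)
```

Because the eigenvectors of F+ in (5) are the same U as in (4), the second decomposition of (6)-(7) is folded into the final line.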
Further, the feature generator FSRG in step 2) is a fully convolutional network composed of a convolutional neural network and nonlinear activation layers; the process of reconstructing the feature matrix by the feature generator FSRG is as follows: taking the feature matrix F_LR of a low-resolution image as input, it outputs the reconstructed feature matrix F_SR, and the matrix dimensions before and after reconstruction are consistent.
Further, the feature discriminator FSRD in step 2) compares the difference between the two feature matrices in the distribution space, specifically: the feature discriminator FSRD takes the feature matrices F_SR and F_HR corresponding to the same image as input and outputs the corresponding scores; the absolute value of the difference between the scores represents the Wasserstein distance between the two in the feature space.
Further, in the training process of step 2), the loss function of the feature generator FSRG is composed of the adversarial loss L_GAN, the perceptual loss L_P between the feature matrices F_SR and F_HR, and the two-norm loss L_2.
The adversarial loss L_GAN is:
L_GAN = −(1/b) Σ_{i=1}^{b} FSRD(F_SR^(i)) (8)
wherein b is the size of the data batch;
the feature perceptual loss L_P is:
L_P = (1/b) Σ_{i=1}^{b} ‖C_FC(F_SR^(i)) − C_FC(F_HR^(i))‖_2^2 (9)
wherein C_FC() denotes the output of the last fully connected layer of the classifier C;
the two-norm loss L_2 is:
L_2 = (1/b) Σ_{i=1}^{b} ‖F_SR^(i) − F_HR^(i)‖_2^2 (10)
The loss of the feature generator FSRG is a linear combination of the three:
L_FSRG = L_GAN + λ_1·L_P + λ_2·L_2 (11)
wherein λ_1 and λ_2 are weight coefficients greater than zero.
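A hedged PyTorch sketch of the composite generator loss (8)-(11) follows; treating the batch terms as means and the callables fsrd and c_fc as the critic and the classifier's last fully connected layer are assumptions.

```python
import torch

def fsrg_loss(fsrd, c_fc, F_SR, F_HR, lam1=0.1, lam2=0.1):
    l_gan = -fsrd(F_SR).mean()                        # adversarial loss    (8)
    l_p = (c_fc(F_SR) - c_fc(F_HR)).pow(2).mean()     # perceptual loss     (9)
    l_2 = (F_SR - F_HR).pow(2).mean()                 # two-norm loss      (10)
    return l_gan + lam1 * l_p + lam2 * l_2            # linear combination (11)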
Further, in the training process of step 2), the loss function of the feature discriminator FSRD is:
L_FSRD = (1/b) Σ_{i=1}^{b} [ FSRD(F_SR^(i)) − FSRD(F_HR^(i)) + k·(‖∇_{F̂^(i)} FSRD(F̂^(i))‖_2 − 1)^p ] (12)
wherein F̂ = θ·F_HR + (1 − θ)·F_SR, θ being a random number between 0 and 1 drawn for each data batch so that F̂ is a linear interpolation of F_SR and F_HR; p and k are the exponent parameter and the coefficient parameter of the gradient penalty term.
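Equation (12) resembles a WGAN-GP critic objective; the sketch below implements that reading. The penalty form k·(‖∇‖_2 − 1)^p and the per-sample θ are assumptions.

```python
import torch

def fsrd_loss(fsrd, F_SR, F_HR, p=6, k=2):
    shape = [F_HR.size(0)] + [1] * (F_HR.dim() - 1)   # broadcast theta per sample
    theta = torch.rand(shape, device=F_HR.device)
    F_hat = (theta * F_HR + (1 - theta) * F_SR).requires_grad_(True)
    grad, = torch.autograd.grad(fsrd(F_hat).sum(), F_hat, create_graph=True)
    penalty = k * (grad.flatten(1).norm(2, dim=1) - 1).pow(p).mean()
    return fsrd(F_SR).mean() - fsrd(F_HR).mean() + penalty
```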
Further, the expression classifier C in step 2) uses softmax to calculate the probability that a sample belongs to each class Class_i, i = 1, ..., N, and reweights the sample's loss with the probability value corresponding to its true class:
w = (σ − logit)^r (13)
L = w·L_CE (14)
wherein logit is the probability, output by the softmax function, that the sample belongs to its true class, L_CE is the cross-entropy loss of the sample, and the parameters σ and r are set to 1.5 and 2, respectively.
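A minimal sketch of the re-weighting of equations (13)-(14); reading (14) as a weighted cross entropy and detaching the weight from the gradient are assumptions (the latter a common design choice, not stated in the text).

```python
import torch
import torch.nn.functional as F

def reweighted_ce(logits, target, sigma=1.5, r=2):
    probs = torch.softmax(logits, dim=1)
    p_true = probs.gather(1, target.unsqueeze(1)).squeeze(1)  # 'logit' in the text
    w = (sigma - p_true).pow(r)                               # w = (sigma - logit)^r (13)
    ce = F.cross_entropy(logits, target, reduction="none")
    return (w.detach() * ce).mean()                           # weighted loss        (14)
```

Samples classified correctly with high confidence receive a small weight, so hard samples dominate the gradient and convergence is accelerated.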
Compared with the prior art, the invention has the following beneficial effects:
the low-resolution image facial expression recognition method based on the feature reconstruction model comprises the steps of constructing a training and testing data set, carrying out different-magnification down-sampling on a high-resolution facial expression image to generate a plurality of high-low-resolution image pairs with multiple magnifications, and simultaneously keeping a category label; then training a facial expression recognition model of the feature reconstruction model, and extracting high-resolution image expression features F by using a feature extraction network with fixed parametersHRAnd corresponding low resolution image expressive features FLR(ii) a Then, an expression feature generator FSRG and a feature discriminator FSRD are obtained by adopting a training model in a mode of generating an antagonistic network, and the FSRG is used for reconstructing features of an input image to obtain FSR(ii) a Classifier C consisting of fully connected network and softmax function layer pairs feature FSRClassifying, and re-weighting the sample loss by using the probability value of the correct category corresponding to the sample output by the softmax layer, so as to accelerate the convergence of the model; identification processComprises the following steps: the model extracts a feature matrix F of the input image, and then a feature generator FSRG generates a reconstructed feature matrix FSRAnd calculating and outputting a class label of the recognition result by using the classifier C obtained by training. The invention provides a method for generating a network by combining deep learning countermeasure and reconstructing image characteristics to identify facial expressions. Compared with the traditional method, the method is insensitive to the resolution of the input image, and the identification accuracy under the lower resolution is improved; compared with a method for reconstructing an image, the method has more stable identification effect on each resolution, can avoid the problems of increased calculated amount and possible privacy disclosure caused by the reconstructed image, and has great industrial application value.
Drawings
FIG. 1 is an overall network of a low-resolution image facial expression recognition method based on a feature reconstruction model according to the present invention;
FIG. 2 is a network structure of a feature extractor of the present invention;
FIG. 3 is a network structure of the feature generation part of the present invention, wherein FIG. 3(a) is a feature generator network structure and FIG. 3(b) is a dense connection block structure;
fig. 4 is a network structure of the feature discriminator of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention recognizes facial expressions by combining deep adversarial generative networks with image feature reconstruction. Compared with traditional methods, it is insensitive to the resolution of the input image and improves recognition accuracy at lower resolutions; compared with methods that reconstruct the image, its recognition is more stable across resolutions, it avoids the extra computation and potential privacy disclosure of reconstructed images, and it has great industrial application value in fields such as education analytics, management, and entertainment.
The invention is described in further detail below with reference to the accompanying drawings:
referring to fig. 1, fig. 1 is an overall network of the low-resolution image facial expression recognition method based on the feature reconstruction model of the present invention; the network comprises four main parts, namely a feature extractor, a feature generator, a feature discriminator and an expression classifier.
Referring to fig. 2, fig. 2 shows the network structure of the feature extractor of the present invention. The feature extractor contains six convolutional layers (Conv Layer) with 3x3 convolution kernels and stride 1. The numbers of output feature channels of the convolutional layers are 64, 96, 128 and 256 in sequence; each convolutional layer is followed by an activation layer with the ReLU activation function. A pooling layer follows each of the first, second and fourth activation layers, using max pooling (Max Pooling Layer) with a 2x2 pooling window and stride 2.
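A sketch of the Fig. 2 extractor trunk in PyTorch; since the text lists four channel counts for six convolutions, the per-layer assignment below and the single-channel input are assumptions.

```python
import torch.nn as nn

def build_feature_extractor():
    chans = [1, 64, 64, 96, 128, 128, 256]   # assumed assignment of the listed counts
    layers = []
    for i in range(6):
        layers += [nn.Conv2d(chans[i], chans[i + 1], 3, stride=1, padding=1),
                   nn.ReLU(inplace=True)]
        if i in (0, 1, 3):                   # pool after the 1st, 2nd and 4th ReLU
            layers.append(nn.MaxPool2d(2, 2))
    return nn.Sequential(*layers)
```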
Referring to fig. 3, fig. 3(a) shows the feature generator network structure and fig. 3(b) the dense connection block structure. The generator is composed of three cascaded dense connection blocks with a residual connection between input and output. The structure of a single dense block, shown in fig. 3(b), comprises five convolutional-layer/batch-normalization-layer (BN) combinations, with dense connections between the groups and an LReLU function added as the activation layer.
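The dense block and three-block generator of Fig. 3 might be sketched as follows; the kernel size, growth rate, LeakyReLU slope and the treatment of the feature matrix as a one-channel map are assumptions.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Fig. 3(b): five Conv-BN-LReLU groups with dense connections."""
    def __init__(self, channels=1, growth=32):
        super().__init__()
        self.groups = nn.ModuleList()
        in_ch = channels
        for i in range(5):
            out_ch = channels if i == 4 else growth   # last group restores width
            self.groups.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.LeakyReLU(0.2, inplace=True)))
            in_ch += out_ch                           # dense connections grow the input

    def forward(self, x):
        feats = [x]
        for g in self.groups[:-1]:
            feats.append(g(torch.cat(feats, dim=1)))
        return self.groups[-1](torch.cat(feats, dim=1))

class FSRG(nn.Module):
    """Fig. 3(a): three cascaded dense blocks plus an input-output residual."""
    def __init__(self, channels=1):
        super().__init__()
        self.blocks = nn.Sequential(*[DenseBlock(channels) for _ in range(3)])

    def forward(self, x):
        return x + self.blocks(x)
```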
Referring to fig. 4, fig. 4 shows the network structure of the feature discriminator of the present invention, formed by cascading five convolution blocks and two fully connected layers; the numbers of output channels of the convolution blocks are 8, 16, 32, 64 and 64 in sequence. Each convolution block consists of two convolutional layers alternating with two activation layers: the first convolutional layer has a 3x3 kernel and stride 1, the second a 5x5 kernel and stride 2. The output dimensions of the two fully connected layers are 100 and 1 in turn.
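A corresponding sketch of the Fig. 4 discriminator; the activation choice, paddings and the lazily sized fully connected layers are assumptions, since the input size is not fixed by the text.

```python
import torch.nn as nn

def build_fsrd():
    chans, layers, in_ch = [8, 16, 32, 64, 64], [], 1
    for out_ch in chans:                       # five conv blocks: 3x3/s1 then 5x5/s2
        layers += [nn.Conv2d(in_ch, out_ch, 3, stride=1, padding=1),
                   nn.LeakyReLU(0.2, inplace=True),
                   nn.Conv2d(out_ch, out_ch, 5, stride=2, padding=2),
                   nn.LeakyReLU(0.2, inplace=True)]
        in_ch = out_ch
    layers += [nn.Flatten(), nn.LazyLinear(100),
               nn.LeakyReLU(0.2, inplace=True), nn.LazyLinear(1)]
    return nn.Sequential(*layers)
```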
The invention discloses a low-resolution image facial expression recognition method based on a feature reconstruction model, which comprises the following implementation processes:
a model training part:
step 1: collecting facial expression images with resolution ratio more than or equal to 100x100 pixels and labeling expression types as original images IHR(ii) a Adopting bicubic interpolation mode to make 2-8 times integer multiplying power down-sampling for original image (the length and width of image are changed into original resolution ratio)
Figure BDA0002900616070000081
To
Figure BDA0002900616070000082
) Obtaining a plurality of low resolution images (I)LR-2To ILR-8) (ii) a The expression category label of the low-resolution image is consistent with the original image;
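An illustrative sketch of this step; the helper name, the grayscale conversion and the use of PIL are assumptions, while the bicubic integer-factor downsampling (2x-8x) and label inheritance follow the text.

```python
from PIL import Image

def make_low_res_pairs(image_path, label):
    """Return (image, label) pairs: the original I_HR plus I_LR-2 .. I_LR-8."""
    hr = Image.open(image_path).convert("L")        # original image I_HR
    assert min(hr.size) >= 100, "the method collects images >= 100x100 pixels"
    pairs = [(hr, label)]
    for factor in range(2, 9):                      # integer factors 2..8
        w, h = hr.size
        lr = hr.resize((w // factor, h // factor), Image.BICUBIC)
        pairs.append((lr, label))                   # label kept consistent
    return pairs
```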
step 2: feature extractor E using fixed parameter pre-training extracts feature matrix F of original resolution imageHRFeature matrix F of low-resolution image corresponding to each magnificationLRThe feature extractor E includes a convolutional layer and a nonlinear active layer. The method comprises the steps that a high-low resolution image pair is input at one time, for one image I, a corresponding three-dimensional feature tensor T is extracted by using a feature extractor, the size of the feature tensor T is w x h x n, w and h are the length and the width of the corresponding feature tensor, and n is the number of channels;
and step 3: calculating the covariance matrix of the respective feature tensors T:
Figure BDA0002900616070000091
wherein ,fiOne channel representing the characteristic tensor T,
Figure BDA0002900616070000092
for the mean value of the channels of the feature tensor, M ∈n*nAnd n is the number of channels of the feature tensor T.
Step 4: to ensure positive definiteness of the matrix, perform eigenvalue correction on each covariance matrix:
M+ = M + λ·trace(M)·I (2)
wherein λ is a coefficient greater than zero; since the covariance matrix is symmetric and positive semi-definite, λ is set to 0.0001 so as to guarantee positive definiteness while keeping the influence of this operation on the feature matrix small; I is the identity matrix.
Step 5: apply the pooling operation to the corrected covariance matrix M+ and take the logarithm of the eigenvalues to obtain the feature matrix; the pooling operation is:
F_cov = W M+ W^T (3)
wherein W is the pooling parameter matrix, whose parameters are optimized by back-propagation learning.
Step 6: perform eigenvalue decomposition on the matrix F_cov and obtain the matrix F+ by the following operations:
F_cov = U_1 Σ_1 U_1^T (4)
F+ = U_1 max(εI, Σ_1) U_1^T (5)
wherein max() takes the element-wise maximum of the two matrices.
Step 7: take the logarithm of the eigenvalues of the matrix F+ to obtain the final feature matrix F; the specific operations are:
F+ = U_2 Σ_2 U_2^T (6)
F = U_2 log(Σ_2) U_2^T (7)
wherein log(Σ_2) denotes the element-wise logarithm of the eigenvalue matrix Σ_2.
Step 8: model structure initialization.
The feature generator FSRG is a fully convolutional network, implemented in the invention with ResNet-50; it takes the feature matrix F_LR of a low-resolution image as input and outputs the reconstructed feature matrix F_SR. To keep the dimensions of the input and output feature matrices consistent, the pooling operations originally present in ResNet are removed. The feature discriminator FSRD adopts a VGG-16 network and takes as input the original-image feature matrix F_HR and the reconstructed feature matrix F_SR of the corresponding-magnification low-resolution image, respectively. The expression classifier C consists of two fully connected layers and a softmax function layer and outputs the probability of each class.
Step 9: set the loss functions.
During training, the loss function of the feature generator FSRG is composed of the adversarial loss L_GAN, the perceptual loss L_P between the feature matrices F_SR and F_HR, and the L2 distance loss L_2, where L_GAN is:
L_GAN = −(1/b) Σ_{i=1}^{b} FSRD(F_SR^(i)) (8)
wherein b is the size of the data batch; L_GAN is the adversarial term constraining the feature generator FSRG and the feature discriminator FSRD. The feature perceptual loss L_P is:
L_P = (1/b) Σ_{i=1}^{b} ‖C_FC(F_SR^(i)) − C_FC(F_HR^(i))‖_2^2 (9)
wherein C_FC() denotes the output of the last fully connected layer of the classifier C.
The loss of the feature generator FSRG is a linear combination of the three:
L_FSRG = L_GAN + λ_1·L_P + λ_2·L_2 (10)
wherein λ_1 and λ_2 are adjustable weight coefficients greater than zero; both are set to 0.1 in the present invention.
The loss function of the feature discriminator FSRD is computed as:
L_FSRD = (1/b) Σ_{i=1}^{b} [ FSRD(F_SR^(i)) − FSRD(F_HR^(i)) + k·(‖∇_{F̂^(i)} FSRD(F̂^(i))‖_2 − 1)^p ] (11)
wherein F̂ = θ·F_HR + (1 − θ)·F_SR and θ is a random number between 0 and 1, ensuring that in each data batch F̂ is a linear interpolation of F_SR and F_HR; p and k are the exponent parameter and coefficient parameter of the gradient penalty term, and p = 6 with k = 2 gave the best effect in experiments.
The concrete operations of calculating the probability of a sample belonging to each category with softmax and re-weighting the sample loss are as follows:
w = (σ − logit)^r (12)
L = w·L_CE (13)
wherein logit is the probability, output by the softmax function, that the sample belongs to its true class; the parameters σ and r are set to 1.5 and 2, respectively; and the loss function L_CE of the classifier C is set to the cross-entropy loss.
Step 10: model training.
Gradients are updated with the Adam optimizer. The learning rate is set to 0.00002, Adam's first-order momentum parameter to 0.1, and its second-order momentum parameter to 0.999. The number of training iterations over the data set (Epoch) is set to 400 and the data batch size (batch size) to 16.
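The stated hyper-parameters map directly onto an Adam configuration; the unusual first-moment value of 0.1 is quoted verbatim from the text.

```python
import torch

def make_optimizer(params):
    """Step 10 settings: lr 0.00002, betas (0.1, 0.999)."""
    return torch.optim.Adam(params, lr=0.00002, betas=(0.1, 0.999))

EPOCHS, BATCH_SIZE = 400, 16   # 400 training epochs, data batches of 16
```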
The model use part:
The feature extractor E extracts the image feature tensor T and computes the corresponding feature matrix; the feature generator FSRG then performs feature reconstruction to obtain the reconstructed feature F_SR; finally, the classifier C calculates the probability that the sample belongs to each class and assigns the sample to the class with the highest probability.
Referring to table 1, table 1 reports the average accuracy of expression recognition by different methods on RAF-DB face images downsampled at each magnification. The method provided by the invention improves markedly over directly enlarging the low-resolution image with bicubic interpolation; compared with the super-resolution methods RCAN and Meta-SR, which reconstruct the image, it performs better on lower-resolution images and attains a higher average recognition accuracy across scales.
TABLE 1. Average accuracy of expression recognition by different methods on RAF-DB face images downsampled at each magnification (the numerical results appear only as an image in the original publication)
The above-mentioned contents are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims (9)

1. A low-resolution image facial expression recognition method based on a feature reconstruction model is characterized by comprising the following steps:
1) collecting facial expression images with resolution not less than 100x100 pixels and labeling their expression categories, these serving as the original images I_HR; carrying out integer-factor down-sampling from 2 to 8 times on each original image to obtain the corresponding low-resolution images, whose expression category labels are consistent with the original image; dividing the original images and the corresponding low-resolution images into a training set and a test set;
2) training a neural network model by adopting the generative adversarial network method:
inputting the original image and the low-resolution images at each magnification into a feature extractor E, the feature extractor E extracting and calculating the feature matrix F_HR of the original image and the feature matrix F_LR of the low-resolution image at each magnification;
inputting the low-resolution image feature matrix F_LR into an expression feature generator FSRG, which outputs the generated reconstructed feature matrix F_SR;
inputting the feature matrix F_HR of the original image and the corresponding reconstructed feature matrix F_SR of the low-resolution image into a feature discriminator FSRD, comparing their difference in the distribution space, and optimizing the feature generator FSRG through back propagation;
inputting the reconstructed expression feature F_SR into a two-layer fully connected expression classifier C for classification, the expression classifier C calculating the probability of each sample belonging to each category, and a weight coefficient computed from the probability value of each sample's correct class being used to reweight the sample's loss and accelerate convergence of the neural network;
repeating the training process until a trained neural network model is obtained;
3) inputting a facial image with an expression to be recognized into the trained neural network model, the feature extractor E extracting the feature matrix F of the input image, the feature generator FSRG generating the reconstructed feature matrix F_SR, and the classifier C calculating and outputting the class label of the recognition result.
2. The feature reconstruction model-based low-resolution image facial expression recognition method as claimed in claim 1, wherein the feature extractor E in step 2) is formed by combining a plurality of convolutional layers and nonlinear activation layers and is the feature extraction part of an expression recognition model pre-trained on the original image data set.
3. The method for recognizing the facial expression of a low-resolution image based on a feature reconstruction model as claimed in claim 1, wherein the feature extraction process in the feature extractor E in step 2) is as follows:
extracting a three-dimensional feature tensor T for the input image I, wherein the size of T is w x h x n, w and h being the length and width of the feature tensor and n the number of channels;
calculating the covariance matrix M of the feature tensor T:
M = (1/(w·h)) Σ_{i=1}^{w·h} (f_i − f̄)(f_i − f̄)^T (1)
wherein f_i ∈ R^n collects the responses of all n channels at the i-th spatial position of T, f̄ is the channel-wise mean vector, and M ∈ R^{n×n}, n being the number of channels of the feature tensor T;
correcting the eigenvalues of the covariance matrix M to obtain the corrected covariance matrix M+:
M+ = M + λ·trace(M)·I (2)
wherein λ is a coefficient greater than zero, I is the identity matrix, and trace(M) is the trace of the matrix M;
applying the pooling operation to the corrected covariance matrix M+ and taking the logarithm of its eigenvalues to obtain the feature matrix F.
4. The feature reconstruction model-based low-resolution image facial expression recognition method as claimed in claim 3, wherein applying the pooling operation to the corrected covariance matrix M+ and taking the logarithm of the eigenvalues to obtain the feature matrix proceeds as follows:
F_cov = W M+ W^T (3)
wherein W is a learnable pooling parameter matrix;
performing eigenvalue decomposition and eigenvalue correction on F_cov to obtain the matrix F+, the specific operations being:
F_cov = U_1 Σ_1 U_1^T (4)
F+ = U_1 max(εI, Σ_1) U_1^T (5)
wherein max() takes the element-wise maximum of the two matrices;
performing eigenvalue decomposition on F+ and taking the logarithm of the eigenvalues to obtain the feature matrix F, specifically:
F+ = U_2 Σ_2 U_2^T (6)
F = U_2 log(Σ_2) U_2^T (7)
wherein log(Σ_2) denotes the element-wise logarithm of the eigenvalue matrix Σ_2.
5. The method for recognizing facial expressions of low-resolution images based on a feature reconstruction model as claimed in claim 1, wherein the feature generator FSRG in step 2) is a fully convolutional network composed of a convolutional neural network and nonlinear activation layers, and the process of reconstructing the feature matrix by the feature generator FSRG is as follows: taking the feature matrix F_LR of a low-resolution image as input, it outputs the reconstructed feature matrix F_SR, the matrix dimensions before and after reconstruction being consistent.
6. The feature reconstruction model-based low-resolution image facial expression recognition method as claimed in claim 1, wherein the feature discriminator FSRD in step 2) compares the difference between the two feature matrices in the distribution space, specifically: the feature discriminator FSRD takes the feature matrices F_SR and F_HR corresponding to the same image as input and outputs the corresponding scores, the absolute value of the difference between the scores representing the Wasserstein distance between the two in the feature space.
7. The method for recognizing facial expressions of low-resolution images based on a feature reconstruction model as claimed in claim 1, wherein in the training process of step 2), the loss function of the feature generator FSRG is composed of the adversarial loss L_GAN, the perceptual loss L_P between the feature matrices F_SR and F_HR, and the two-norm loss L_2;
the adversarial loss L_GAN is:
L_GAN = −(1/b) Σ_{i=1}^{b} FSRD(F_SR^(i)) (8)
wherein b is the size of the data batch;
the feature perceptual loss L_P is:
L_P = (1/b) Σ_{i=1}^{b} ‖C_FC(F_SR^(i)) − C_FC(F_HR^(i))‖_2^2 (9)
wherein C_FC() denotes the output of the last fully connected layer of the classifier C;
the two-norm loss L_2 is:
L_2 = (1/b) Σ_{i=1}^{b} ‖F_SR^(i) − F_HR^(i)‖_2^2 (10)
the loss of the feature generator FSRG is a linear combination of the three:
L_FSRG = L_GAN + λ_1·L_P + λ_2·L_2 (11)
wherein λ_1 and λ_2 are weight coefficients greater than zero.
8. The method for recognizing the facial expressions of low-resolution images based on a feature reconstruction model as claimed in claim 1, wherein in the training process of step 2), the loss function of the feature discriminator FSRD is:
L_FSRD = (1/b) Σ_{i=1}^{b} [ FSRD(F_SR^(i)) − FSRD(F_HR^(i)) + k·(‖∇_{F̂^(i)} FSRD(F̂^(i))‖_2 − 1)^p ] (12)
wherein F̂ = θ·F_HR + (1 − θ)·F_SR, θ being a random number between 0 and 1 drawn for each data batch so that F̂ is a linear interpolation of F_SR and F_HR; p and k are the exponent parameter and the coefficient parameter of the gradient penalty term.
9. The feature reconstruction model-based low-resolution image facial expression recognition method as claimed in claim 1, wherein the expression classifier C in step 2) uses softmax to calculate the probability that a sample belongs to each class Class_i, i = 1, ..., N, and reweights the sample's loss with the probability value corresponding to its true class:
w = (σ − logit)^r (13)
L = w·L_CE (14)
wherein logit is the probability, output by the softmax function, that the sample belongs to its true class, L_CE is the cross-entropy loss of the sample, and the parameters σ and r are set to 1.5 and 2, respectively.
CN202110055946.8A 2021-01-15 2021-01-15 Low-resolution image facial expression recognition method based on feature reconstruction model Active CN112818764B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110055946.8A CN112818764B (en) 2021-01-15 2021-01-15 Low-resolution image facial expression recognition method based on feature reconstruction model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110055946.8A CN112818764B (en) 2021-01-15 2021-01-15 Low-resolution image facial expression recognition method based on feature reconstruction model

Publications (2)

Publication Number Publication Date
CN112818764A true CN112818764A (en) 2021-05-18
CN112818764B CN112818764B (en) 2023-05-02

Family

ID=75869434

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110055946.8A Active CN112818764B (en) 2021-01-15 2021-01-15 Low-resolution image facial expression recognition method based on feature reconstruction model

Country Status (1)

Country Link
CN (1) CN112818764B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110007174A1 (en) * 2009-05-20 2011-01-13 Fotonation Ireland Limited Identifying Facial Expressions in Acquired Digital Images
CN107154023A (en) * 2017-05-17 2017-09-12 电子科技大学 Face super-resolution reconstruction method based on generation confrontation network and sub-pix convolution
WO2019015466A1 (en) * 2017-07-17 2019-01-24 广州广电运通金融电子股份有限公司 Method and apparatus for verifying person and certificate
US20190303717A1 (en) * 2018-03-28 2019-10-03 Kla-Tencor Corporation Training a neural network for defect detection in low resolution images
CN110084119A (en) * 2019-03-26 2019-08-02 安徽艾睿思智能科技有限公司 Low-resolution face image recognition methods based on deep learning
CN110211045A (en) * 2019-05-29 2019-09-06 电子科技大学 Super-resolution face image method based on SRGAN network
CN111931805A (en) * 2020-06-23 2020-11-13 西安交通大学 Knowledge-guided CNN-based small sample similar abrasive particle identification method
CN111784581A (en) * 2020-07-03 2020-10-16 苏州兴钊防务研究院有限公司 SAR image super-resolution reconstruction method based on self-normalization generation countermeasure network
CN112070058A (en) * 2020-09-18 2020-12-11 深延科技(北京)有限公司 Face and face composite emotional expression recognition method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Xintao Wang et al.: "ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks", arXiv:1809.00219v2 *
Yao Naiming et al.: "Robust Facial Expression Recognition Based on Generative Adversarial Networks", Acta Automatica Sinica *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255517A (en) * 2021-05-24 2021-08-13 中国科学技术大学 Privacy-protecting expression recognition model training method and expression recognition method and device
CN113255517B (en) * 2021-05-24 2023-10-24 中国科学技术大学 Expression recognition model training method for protecting privacy and expression recognition method and device
CN113344110A (en) * 2021-06-26 2021-09-03 浙江理工大学 Fuzzy image classification method based on super-resolution reconstruction
CN113344110B (en) * 2021-06-26 2024-04-05 浙江理工大学 Fuzzy image classification method based on super-resolution reconstruction
CN113486842A (en) * 2021-07-23 2021-10-08 北京达佳互联信息技术有限公司 Expression editing model training method and device and expression editing method and device
CN113887371A (en) * 2021-09-26 2022-01-04 华南理工大学 Data enhancement method for low-resolution face recognition
CN113887371B (en) * 2021-09-26 2024-05-28 华南理工大学 Data enhancement method for low-resolution face recognition
CN113902010A (en) * 2021-09-30 2022-01-07 北京百度网讯科技有限公司 Training method of classification model, image classification method, device, equipment and medium
CN114863164A (en) * 2022-04-02 2022-08-05 华中科技大学 Target identification model construction method for small-target super-resolution reconstructed image
CN114648803A (en) * 2022-05-20 2022-06-21 中国科学技术大学 Method, system, equipment and storage medium for recognizing facial expressions in natural scene
CN114648803B (en) * 2022-05-20 2022-09-06 中国科学技术大学 Method, system, equipment and storage medium for recognizing facial expressions in natural scene
CN115511748A (en) * 2022-09-30 2022-12-23 北京航星永志科技有限公司 Image high-definition processing method and device and electronic equipment

Also Published As

Publication number Publication date
CN112818764B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
CN112818764B (en) Low-resolution image facial expression recognition method based on feature reconstruction model
CN111091045B (en) Sign language identification method based on space-time attention mechanism
Rahman et al. A new benchmark on american sign language recognition using convolutional neural network
Liu Feature extraction and image recognition with convolutional neural networks
CN108537743B (en) Face image enhancement method based on generation countermeasure network
CN107341452B (en) Human behavior identification method based on quaternion space-time convolution neural network
CN112446476A (en) Neural network model compression method, device, storage medium and chip
Teow Understanding convolutional neural networks using a minimal model for handwritten digit recognition
Huynh et al. Convolutional neural network models for facial expression recognition using bu-3dfe database
CN112070768B (en) Anchor-Free based real-time instance segmentation method
CN106326843B (en) A kind of face identification method
CN113379655B (en) Image synthesis method for generating antagonistic network based on dynamic self-attention
CN114821050B (en) Method for dividing reference image based on transformer
CN111967361A (en) Emotion detection method based on baby expression recognition and crying
CN115966010A (en) Expression recognition method based on attention and multi-scale feature fusion
CN108229432A (en) Face calibration method and device
CN114463759A (en) Lightweight character detection method and device based on anchor-frame-free algorithm
CN112668486A (en) Method, device and carrier for identifying facial expressions of pre-activated residual depth separable convolutional network
CN109508640A (en) Crowd emotion analysis method and device and storage medium
Teow A minimal convolutional neural network for handwritten digit recognition
CN115238796A (en) Motor imagery electroencephalogram signal classification method based on parallel DAMSCN-LSTM
Piat et al. Image classification with quantum pre-training and auto-encoders
CN110688966A (en) Semantic-guided pedestrian re-identification method
CN114492634A (en) Fine-grained equipment image classification and identification method and system
CN107133579A (en) Based on CSGF (2D)2The face identification method of PCANet convolutional networks

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant