CN112818764B - Low-resolution image facial expression recognition method based on feature reconstruction model - Google Patents


Info

Publication number
CN112818764B
CN112818764B (application CN202110055946.8A; published also as CN112818764A)
Authority
CN
China
Prior art keywords
feature
matrix
image
low
resolution image
Prior art date
Legal status
Active
Application number
CN202110055946.8A
Other languages
Chinese (zh)
Other versions
CN112818764A (en)
Inventor
田锋 (Tian Feng)
经纬 (Jing Wei)
南方 (Nan Fang)
洪振鑫 (Hong Zhenxin)
郑庆华 (Zheng Qinghua)
Current Assignee
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202110055946.8A priority Critical patent/CN112818764B/en
Publication of CN112818764A publication Critical patent/CN112818764A/en
Application granted granted Critical
Publication of CN112818764B publication Critical patent/CN112818764B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174: Facial expression recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/047: Probabilistic or stochastic networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformation in the plane of the image
    • G06T3/40: Scaling the whole image or part thereof
    • G06T3/4053: Super resolution, i.e. output image resolution higher than sensor resolution
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]

Abstract

The invention discloses a low-resolution facial expression recognition method based on a feature reconstruction model, belonging to the field of facial image expression recognition. The method first constructs training and testing data sets. A facial expression recognition model built around feature reconstruction is then trained: a feature extraction network with fixed parameters extracts image expression features; an expression feature generator (FSRG) and a feature discriminator are obtained by training in a generative adversarial manner, the FSRG reconstructing the feature matrix F_SR from the input image's features. A classifier consisting of a fully connected network and a softmax function layer classifies F_SR, and the loss of each sample is re-weighted by the softmax probability of its true class. The invention is insensitive to the resolution of the input image, improves recognition accuracy at lower resolutions, and gives a more stable recognition result across resolutions.

Description

Low-resolution image facial expression recognition method based on feature reconstruction model
Technical Field
The invention belongs to the field of facial image expression recognition, and particularly relates to a low-resolution image facial expression recognition method based on a feature reconstruction model.
Background
Facial expression is one of the most direct and natural signals by which humans express emotion. Facial expression recognition is a hot research topic in natural human-computer interaction, computer vision, affective computing and image processing, with wide application in human-computer interaction, distance education, security, intelligent robotics, medical care and animation production.
Across different scenes, changes in equipment and environment, together with the pinhole-camera imaging principle, mean that face images in a multi-person photographing scene suffer the "near is large, far is small" problem and thus vary in resolution; images are further compressed during network transmission and storage, reducing both quality and resolution. Recognition accuracy can be severely degraded in low-resolution scenarios, so reducing the influence of resolution changes is necessary for accurate expression recognition. With the development of deep learning and image super-resolution, most existing approaches to low-resolution input first perform super-resolution reconstruction of the image and then recognize the expression. Reconstructing the image has two drawbacks. First, although it improves expression recognition compared with using the low-resolution image directly, it greatly increases the computational cost and its benefit is unstable. Second, because the object of expression recognition is a human face, high-resolution reconstruction of face images easily leads to privacy leakage, a problem receiving growing international attention.
Disclosure of Invention
The invention aims to overcome the drawbacks of heavy computation and potential privacy leakage in reconstructing face images, and provides a low-resolution facial expression recognition method based on a feature reconstruction model.
To achieve this purpose, the invention adopts the following technical scheme:
a low-resolution image facial expression recognition method based on a feature reconstruction model comprises the following steps:
1) Collect facial expression images with resolution of at least 100x100 pixels and label their expression categories; these serve as the original images I_HR. Downsample each original image by an integer factor between 2 and 8 to obtain the corresponding low-resolution images; each low-resolution image keeps the expression category label of its original. Divide the original images and their low-resolution counterparts into a training set and a test set.
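As an illustrative sketch (not the patent's implementation), the pair construction in step 1) could look like the following; plain block averaging is used as a simple stand-in for the bicubic interpolation the embodiment uses, and `make_low_res_pairs` is a hypothetical helper name:

```python
import numpy as np

def make_low_res_pairs(img_hr, factors=(2, 4, 8)):
    """Build {factor: low-res image} pairs by integer-factor downsampling.

    The patent downsamples I_HR by integer factors of 2-8; block averaging
    here merely stands in for bicubic interpolation.
    """
    h, w = img_hr.shape[:2]
    pairs = {}
    for s in factors:
        assert h % s == 0 and w % s == 0, "image size must divide the factor"
        # average each s x s block into one low-resolution pixel
        lr = img_hr.reshape(h // s, s, w // s, s, -1).mean(axis=(1, 3))
        pairs[s] = lr.squeeze()
    return pairs
```

Each low-resolution copy keeps the expression label of its high-resolution original, so no new annotation is needed.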
2) Train the neural network model with a generative adversarial approach.
Input the original image and the low-resolution image of each downsampling factor into a feature extractor E, which extracts and computes the original-image feature matrix F_HR and the low-resolution feature matrix F_LR for each factor.
Feed F_LR into the expression feature generator FSRG, which outputs the reconstructed feature matrix F_SR.
Input F_HR and the corresponding reconstructed F_SR of the low-resolution image into the feature discriminator FSRD, which measures the difference between the two feature distributions; optimize the generator FSRG by back propagation.
Input the reconstructed expression feature F_SR into a two-layer fully connected expression classifier C, which computes the probability of each category; re-weight each sample's loss with a weight coefficient computed from the probability of its correct class, to accelerate convergence of the network.
Repeat this training process until a trained neural network model is obtained.
3) Input the face image whose expression is to be recognized into the trained neural network model: the feature extractor E extracts the feature matrix F of the input image, the feature generator FSRG generates the reconstructed feature matrix F_SR, and the classifier C computes and outputs the class label of the recognition result.
Further, the feature extractor E in step 2) is composed of several convolution layers and nonlinear activation layers, and is the feature extraction part of an expression recognition model pre-trained on the original image dataset.
Further, feature extraction inside the feature extractor E in step 2) proceeds as follows:
For an input image I, extract a three-dimensional feature tensor T of size w × h × n, where w and h are the spatial width and height of the tensor and n is the number of channels.
Compute the covariance matrix M of the feature tensor T:

    M(j,k) = (1/(w·h)) · (f_j − m_j·1)ᵀ(f_k − m_k·1),  j, k = 1, …, n   (1)

where f_j ∈ R^(w·h) is the j-th channel of the feature tensor T flattened into a vector, m_j is the mean of channel j, and M ∈ R^(n×n), with n the number of channels of T.
Correct the eigenvalues of the covariance matrix M to obtain the corrected covariance matrix M⁺:

    M⁺ = M + λ · trace(M) · I   (2)

where λ is a coefficient greater than zero, I is the identity matrix, and trace(M) is the trace of M.
Apply a pooling operation to the corrected covariance matrix M⁺ and take the logarithm of its eigenvalues to obtain the feature matrix F.
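A minimal numpy sketch of the covariance pooling and trace-based correction (Eqs. 1-2); `covariance_pool` is an illustrative name, and λ = 1e-4 follows the value given later in the embodiment:

```python
import numpy as np

def covariance_pool(T, lam=1e-4):
    """Second-order pooling of a w x h x n feature tensor (Eqs. 1-2).

    Each spatial position contributes one n-dimensional sample; M is the
    n x n channel covariance, then regularised by lam * trace(M) * I so
    the result is strictly positive definite.
    """
    w, h, n = T.shape
    X = T.reshape(w * h, n)                       # wh samples, n channels
    Xc = X - X.mean(axis=0)                       # subtract per-channel mean
    M = Xc.T @ Xc / (w * h)                       # Eq. (1): channel covariance
    M_plus = M + lam * np.trace(M) * np.eye(n)    # Eq. (2): trace correction
    return M_plus
```

Because trace(M) > 0 for any non-constant tensor, every eigenvalue of M⁺ is strictly positive, which the later matrix-logarithm step requires.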
Further, the pooling operation on the corrected covariance matrix M⁺ and the logarithm of the eigenvalues that yield the feature matrix F are computed as follows:

    F_cov = W M⁺ Wᵀ   (3)

where W is a learnable pooling parameter matrix and F_cov is the pooled matrix.
Perform eigenvalue decomposition and eigenvalue correction on F_cov to obtain the matrix F⁺:

    F_cov = U₁ Σ₁ U₁ᵀ   (4)
    F⁺ = U₁ max(εI, Σ₁) U₁ᵀ   (5)

where max(·,·) takes the element-wise maximum of the two matrices and ε is a small positive constant.
Then take the logarithm of the eigenvalues of F⁺:

    F⁺ = U₂ Σ₂ U₂ᵀ   (6)
    F = U₂ log(Σ₂) U₂ᵀ   (7)

where log(Σ₂) takes the logarithm of each diagonal element of the eigenvalue matrix Σ₂.
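Eqs. (3)-(7) can be sketched with numpy's symmetric eigendecomposition. Here `W` is passed in as a fixed array, whereas in the patent it is a parameter learned by back-propagation, and `eps` stands in for ε:

```python
import numpy as np

def spd_log_map(M_plus, W, eps=1e-5):
    """Parametric pooling then log-Euclidean mapping (Eqs. 3-7).

    Eigenvalues are clamped to at least eps before the matrix logarithm
    so log() stays finite.
    """
    F_cov = W @ M_plus @ W.T                     # Eq. (3): pooled matrix
    vals, U1 = np.linalg.eigh(F_cov)             # Eq. (4): eigendecomposition
    vals = np.maximum(vals, eps)                 # Eq. (5): max(eps*I, Sigma1)
    F_plus = U1 @ np.diag(vals) @ U1.T
    vals2, U2 = np.linalg.eigh(F_plus)           # Eq. (6)
    F = U2 @ np.diag(np.log(vals2)) @ U2.T       # Eq. (7): matrix logarithm
    return F
```

The clamp in Eq. (5) guarantees every eigenvalue fed to the logarithm is at least ε, so the mapping is numerically safe even for nearly singular F_cov.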
Further, the feature generator FSRG in step 2) is a fully convolutional network composed of convolution layers and nonlinear activation layers. It takes the low-resolution feature matrix F_LR as input and outputs the reconstructed feature matrix F_SR; the matrices before and after reconstruction have the same dimensions.
Further, in step 2) the feature discriminator FSRD compares the difference between the two feature distributions, specifically:
The feature discriminator FSRD takes the feature matrices F_SR and F_HR of the same image as inputs and outputs a score for each; the absolute difference of the two scores represents the Wasserstein distance between them in feature space.
Further, during training in step 2), the loss function of the feature generator FSRG consists of the adversarial loss L_GAN, the perceptual loss L_P between the feature matrices F_SR and F_HR, and the two-norm loss L_2.
The adversarial loss L_GAN is:

    L_GAN = −(1/b) · Σ_{i=1}^{b} FSRD(F_SR^(i))   (8)

where b is the size of the data batch.
The feature perceptual loss L_P is:

    L_P = (1/b) · Σ_{i=1}^{b} ‖C_FC(F_SR^(i)) − C_FC(F_HR^(i))‖²₂   (9)

where C_FC(·) denotes the output of the last fully connected layer of the classifier C.
The two-norm loss L_2 is:

    L_2 = (1/b) · Σ_{i=1}^{b} ‖F_SR^(i) − F_HR^(i)‖²₂   (10)

The loss of the feature generator FSRG is a linear combination of the three:

    L_FSRG = L_GAN + λ₁·L_P + λ₂·L_2   (11)

where λ₁ and λ₂ are weight coefficients greater than zero.
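A numpy sketch of the generator's composite loss (Eqs. 8-11). The exact norms and batch averaging are assumptions chosen to match the standard WGAN generator term plus squared-error perceptual and feature terms; the critic scores and classifier outputs are taken as precomputed arrays:

```python
import numpy as np

def fsrg_loss(d_scores_sr, feat_sr, feat_hr, logits_sr, logits_hr,
              lam1=0.1, lam2=0.1):
    """Generator loss L_FSRG = L_GAN + lam1*L_P + lam2*L_2 (Eqs. 8-11).

    d_scores_sr  : critic scores FSRD(F_SR), shape (b,)
    feat_sr/hr   : reconstructed / original feature matrices, shape (b, ...)
    logits_sr/hr : last fully-connected outputs C_FC(.) of classifier C
    lam1, lam2 = 0.1 follow the values given in the embodiment.
    """
    b = d_scores_sr.shape[0]
    l_gan = -d_scores_sr.mean()                                        # Eq. (8)
    l_p = ((logits_sr - logits_hr) ** 2).sum(axis=1).mean()            # Eq. (9)
    l_2 = ((feat_sr - feat_hr) ** 2).reshape(b, -1).sum(axis=1).mean() # Eq. (10)
    return l_gan + lam1 * l_p + lam2 * l_2                             # Eq. (11)
```

Minimizing −FSRD(F_SR) pushes the generator toward features the critic scores like real high-resolution features, while L_P and L_2 anchor the reconstruction to the paired F_HR.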
Further, during training in step 2), the loss function of the feature discriminator FSRD is:

    L_FSRD = (1/b) · Σ_{i=1}^{b} [ FSRD(F_SR^(i)) − FSRD(F_HR^(i)) + k·(‖∇ FSRD(F̂^(i))‖₂ − 1)^p ]   (12)

where F̂ = θ·F_HR + (1−θ)·F_SR is a linear interpolation of F_SR and F_HR, θ is a random number between 0 and 1 drawn for each batch of data, and p and k are the exponent and coefficient of the gradient-penalty term.
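The critic loss and the interpolation that defines F̂ can be sketched as follows. Computing ‖∇FSRD(F̂)‖ itself requires an autograd framework, so the gradient norms are assumed precomputed here; both function names are illustrative:

```python
import numpy as np

def fsrd_loss(d_sr, d_hr, grad_norms, k=2.0, p=6.0):
    """Critic loss sketch for FSRD (Eq. 12), WGAN-GP style.

    d_sr, d_hr : critic scores on reconstructed / original features, (b,)
    grad_norms : ||grad FSRD(F_hat)||_2 at interpolated points, assumed
                 precomputed by an autograd framework
    k, p       : penalty coefficient and exponent (k=2, p=6 per the patent)
    """
    penalty = k * (grad_norms - 1.0) ** p
    return (d_sr - d_hr + penalty).mean()

def interpolate(f_sr, f_hr, rng):
    """F_hat = theta*F_HR + (1-theta)*F_SR, theta ~ U(0,1) per sample."""
    theta = rng.uniform(size=(f_sr.shape[0],) + (1,) * (f_sr.ndim - 1))
    return theta * f_hr + (1.0 - theta) * f_sr
```

The penalty drives the critic's gradient norm toward 1 along the line between real and reconstructed features, the usual Lipschitz surrogate in WGAN-GP; only the exponent p and coefficient k differ from the common p=2, k=10 defaults.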
Further, the expression classifier C in step 2) uses softmax to compute the probability that a sample belongs to each class Class_i, i = 1, …, z, where z is the total number of classes, and re-weights the loss with the probability value of the true class, specifically:

    w = (σ − logit)^r   (13)
    L_C = −(1/b) · Σ_{i=1}^{b} w^(i) · log(logit^(i))   (14)

where logit is the softmax probability of a sample's true class, and the parameters σ and r are set to 1.5 and 2 respectively.
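A small numpy sketch of the re-weighted classification loss (Eqs. 13-14). Treating Eq. (14) as a weighted cross-entropy averaged over the batch is an assumption, but it is consistent with the embodiment, which sets the classifier loss to cross-entropy:

```python
import numpy as np

def reweighted_ce(probs, labels, sigma=1.5, r=2.0):
    """Probability-reweighted cross-entropy (Eqs. 13-14).

    probs  : softmax outputs, shape (b, z)
    labels : integer class labels, shape (b,)
    Each sample's loss is scaled by w = (sigma - logit)^r, where logit is
    the softmax probability of the true class, so confidently correct
    samples are down-weighted; sigma=1.5, r=2 as in the patent.
    """
    b = probs.shape[0]
    logit = probs[np.arange(b), labels]        # p(true class) per sample
    w = (sigma - logit) ** r                   # Eq. (13)
    return (-w * np.log(logit)).mean()         # Eq. (14), assumed averaging
```

With σ = 1.5 the weight stays positive for any logit in (0, 1], shrinking toward 0.25 as the model grows confident on a sample, which is what accelerates convergence on the remaining hard samples.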
Compared with the prior art, the invention has the following beneficial effects:
according to the low-resolution image facial expression recognition method based on the feature reconstruction model, a training and testing data set is constructed, different multiplying power downsampling is carried out on high-resolution facial expression images to generate high-low resolution image pairs with multiple multiplying power, and category labels are reserved; then training a facial expression recognition model of the feature reconstruction model, and extracting high-resolution image expression features F by using a feature extraction network with fixed parameters HR And corresponding low resolution image expressive features F LR The method comprises the steps of carrying out a first treatment on the surface of the Then training a model by adopting a mode of generating an countermeasure network to obtain an expression feature generator FSRG and a feature discriminator FSRD, and reconstructing features by using the FSRG as an input image to obtain F SR The method comprises the steps of carrying out a first treatment on the surface of the Classifier C pair feature F composed of fully connected network and softmax function layer SR Classifying, re-weighting the sample loss by using the probability value of the correct category corresponding to the sample output by the softmax layer, and accelerating model convergence; the identification process is as follows: the model extracts a feature matrix F of the input image, and then a feature generator FSRG generates a reconstructed feature matrix F SR And calculating and outputting the class labels of the recognition results by using the classifier C obtained through training. The invention provides a method for reconstructing image features by combining a deep learning countermeasure generation network to recognize facial expressions. 
Compared with traditional methods, the invention is insensitive to the resolution of the input image and improves recognition accuracy at lower resolutions. Compared with methods that reconstruct the image itself, its recognition is more stable across resolutions, and it avoids the increased computation and potential privacy leakage that image reconstruction entails, giving it substantial industrial application value.
Drawings
FIG. 1 is an overall network of a low resolution image facial expression recognition method based on a feature reconstruction model of the present invention;
FIG. 2 is a network architecture of the feature extractor of the present invention;
FIG. 3 shows the network structure of the feature generator of the present invention, where FIG. 3 (a) is the feature generator network structure and FIG. 3 (b) is the dense connection block structure;
FIG. 4 is the network structure of the feature discriminator of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention provides a method that recognizes facial expressions by reconstructing image features with a deep generative adversarial network. Compared with traditional methods, it is insensitive to the resolution of the input image and improves recognition accuracy at lower resolutions; compared with image-reconstruction methods, its recognition is more stable across resolutions, avoids the extra computation and potential privacy leakage of reconstructing the image, and has substantial industrial value in fields such as education analytics, management and entertainment.
The invention is described in further detail below with reference to the attached drawing figures:
referring to fig. 1, fig. 1 is an overall network of a low-resolution image facial expression recognition method based on a feature reconstruction model according to the present invention; the network comprises four main parts, namely a feature extractor, a feature generator, a feature discriminator and an expression classifier.
Referring to FIG. 2, FIG. 2 shows the network structure of the feature extractor of the present invention. The feature extractor comprises six convolution layers (Conv Layer), each with a 3x3 kernel and stride 1. The numbers of output feature channels of the convolution layers are 64, 96, 128 and 256 in turn; each convolution layer is followed by an activation layer using the ReLU activation function. A max-pooling layer with a 2x2 window and stride 2 follows each of the first, second and fourth activation layers.
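As a quick sanity check on the extractor geometry, the spatial size after the three stride-2 pools can be computed as below; 'same' padding for the stride-1 convolutions and floor division in the pools are assumptions, since the patent does not state them:

```python
def extractor_output_size(s=100):
    """Spatial size after the feature extractor sketched above.

    Assumes each 3x3, stride-1 convolution preserves spatial size
    ('same' padding, an assumption), while each of the three 2x2,
    stride-2 max-pools halves the map with floor division.
    """
    for _ in range(3):     # pools after activation layers 1, 2 and 4
        s = s // 2
    return s
```

Under these assumptions a 100x100 input yields a 12x12 feature map (100 → 50 → 25 → 12) with 256 channels.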
Referring to FIG. 3, FIG. 3 (a) shows the feature generator network structure and FIG. 3 (b) the dense connection block structure. A single dense block, shown in FIG. 3 (b), contains five convolution-layer / batch-normalization (BatchNormalization, BN) combinations, with dense connections between the groups and LeakyReLU activation layers.
Referring to FIG. 4, FIG. 4 shows the network structure of the feature discriminator of the invention, which consists of five convolution blocks followed by two fully connected layers; the numbers of output channels of the convolution blocks are 8, 16, 32, 64 and 64 in turn. Each convolution block alternates two convolution layers with two activation layers: the first convolution layer has a 3x3 kernel and stride 1, the second a 5x5 kernel and stride 2. The output dimensions of the two fully connected layers are 100 and 1 respectively.
The invention discloses a low-resolution image facial expression recognition method based on a feature reconstruction model, which comprises the following implementation processes:
model training part:
step 1: collecting facial expression images with resolution ratio of 100x100 pixels or more, and labeling expression types to serve as an original image I HR The method comprises the steps of carrying out a first treatment on the surface of the Downsampling the original image by 2-8 times of integer multiplying power (the length and width of the image are changed into original resolution) by bicubic interpolation
Figure BDA0002900616070000081
To->
Figure BDA0002900616070000082
) Obtaining a plurality of low resolution images (I LR-2 To I LR-8 ) The method comprises the steps of carrying out a first treatment on the surface of the The expression category label of the low-resolution image is consistent with the original image;
step 2: feature extractor E, pre-trained using fixed parameters, extracts feature matrix F of original resolution image HR Feature matrix F of low-resolution image corresponding to each magnification LR The feature extractor E comprises a convolution layer and a nonlinear activation layer. One input is a high-low resolution image pair, for one image I, a feature extractor is used for extracting a corresponding three-dimensional feature tensor T, the size of the feature tensor T is w x h x n, w and h are the length and the width of the corresponding feature tensor, and n is the channel number;
step 3: calculating covariance matrices of the respective feature tensors T:
Figure BDA0002900616070000091
wherein ,fi Representing one channel of the feature tensor T,
Figure BDA0002900616070000092
m E is the average value of each channel of the characteristic tensor n*n N is the number of channels of the feature tensor T.
Step 4: to ensure the positive nature of the matrix, eigenvalue correction is performed on each covariance matrix:
M + =M+λ*trace(M)*I (2)
wherein lambda is a coefficient larger than zero, and as the covariance matrix is symmetrically semi-positive, the lambda takes a value of 0.0001 in order to reduce the influence of the operation on the feature matrix and ensure positive quality; i is the identity matrix.
Step 5: for covariance matrix M + Carrying out pooling operation on the characteristic values and taking logarithms of the characteristic values to obtain a characteristic matrix, wherein the specific operation is as follows:
F cov =WM + W T (3)
wherein ,
Figure BDA0002900616070000093
to pool the parameter matrix, the specific parameters are optimized by back propagation learning, matrix +.>
Figure BDA0002900616070000094
Step 6: for matrix F cov Decomposing the eigenvalue and performing the following operation to obtain a matrix F + The specific operation is as follows:
F cov =U 1 Σ 1 U 1 T (4)
F + =U 1 max(εI,Σ 1 )U 1 T (5)
where max () is the maximum value of two matrices element by element.
Step 7: f to matrix + Taking the logarithm of the eigenvalue to obtain a final eigenvector F, wherein the specific operation is as follows:
F cov =U 2 Σ 2 U 2 T (6)
F=U 2 log(Σ 2 )U 2 T (7)
wherein log (Σ 2 ) Finger pair feature matrix Σ 2 The operation of taking the logarithm of each element of (a).
Step 7: model structure initialization
The feature generator FSRG is a full convolution network, and is implemented by ResNet-50 in the invention, and features matrix F of low-resolution image is adopted LR For inputting and outputting reconstructed characteristic matrix F SR The dimension of the input and output feature matrix is consistent, so that the original pooling operation in ResNet is removed; the feature discriminator FSRD is adopted in the invention that the VGG-16 network respectively uses the original image feature matrix F HR Feature matrix F of low-resolution image of magnification corresponding thereto SR Is input; the expression classifier C consists of two full-connection layers and a softmax function layer, and outputs the probability of each classification list.
Step 8: setting a loss function
During training, the loss function of the feature generator FSRG is determined by countering the loss L GAN Feature matrix F SR and FHR Perceptual loss L between P And L2 distance loss L 2 Composition, wherein L GAN The method comprises the following steps:
Figure BDA0002900616070000101
where b is the size of the data batch, L GAN Is to constrain the feature perception loss of the feature generator FSRG and the feature discriminator FSRD, L P The method comprises the following steps:
Figure BDA0002900616070000102
wherein ,CFC () Representing the output of the last fully connected layer of classifier C.
The loss of the feature generator FSRG is a linear sum of the three:
L FSRG =L GAN1 L P2 L 2 (10)
wherein ,λ1 and λ2 All are adjustable weight coefficients greater than zero, and in the invention, both coefficients are set to 0.1.
The loss function of the feature discriminator FSRD is computed as:

    L_FSRD = (1/b) · Σ_{i=1}^{b} [ FSRD(F_SR^(i)) − FSRD(F_HR^(i)) + k·(‖∇ FSRD(F̂^(i))‖₂ − 1)^p ]   (11)

where F̂ = θ·F_HR + (1−θ)·F_SR is a linear interpolation of F_SR and F_HR, θ is a random number between 0 and 1 drawn for each batch of data, and p and k are the exponent and coefficient of the gradient-penalty term; p = 6 and k = 2 gave the best results in the experiments.
The probability that a sample belongs to each class is computed with softmax, and the sample losses are re-weighted as follows:

    w = (σ − logit)^r   (12)
    L_C = −(1/b) · Σ_{i=1}^{b} w^(i) · log(logit^(i))   (13)

where logit is the softmax probability of a sample's true class, and the parameters σ and r are set to 1.5 and 2 respectively; the loss function of the classifier C is set to cross-entropy.
Step 9: model training
The gradient was updated using Adam optimizer, the learning rate was set to 0.00002, the Adam's one-order momentum parameter was 0.1, and the second-order momentum parameter was 0.999. The data set training iteration number (Epoch) was set to 400 and the data batch size (batch size) was set to 16.
Model use:
The feature extractor E extracts the image feature tensor T; the feature generator FSRG performs feature reconstruction to obtain the corresponding reconstructed feature F_SR; the classifier C then computes the probability that the sample belongs to each class and assigns the sample to the class with the highest probability.
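The inference path can be sketched as a composition of three callables standing in for the trained E, FSRG and C (all names hypothetical; any functions with compatible shapes can be plugged in):

```python
import numpy as np

def recognize(image, extractor, generator, classifier):
    """Inference sketch: E -> FSRG -> C.

    extractor, generator and classifier are placeholders for the trained
    feature extractor E, feature generator FSRG and classifier C.
    """
    F = extractor(image)        # feature matrix of the input image
    F_sr = generator(F)         # reconstructed expression features
    probs = classifier(F_sr)    # class probabilities from the softmax layer
    return int(np.argmax(probs))   # label of the most probable class
```

Note that no high-resolution image is ever reconstructed at inference time, which is the source of the method's computational and privacy advantages.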
Referring to Table 1, Table 1 reports the average expression recognition accuracy of different methods on face images of the RAF-DB dataset downsampled at each factor. Compared with directly enlarging the low-resolution image by bicubic interpolation, the proposed method is clearly better. Compared with the super-resolution methods RCAN and Meta-SR, which reconstruct the image, the proposed method performs better on lower-resolution images and achieves a higher average recognition accuracy across scales.
TABLE 1 average accuracy of expression recognition on down-sampled face images for each magnification of RAF-DB dataset by different methods
The above is only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited by this, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims (8)

1. The low-resolution image facial expression recognition method based on the feature reconstruction model is characterized by comprising the following steps of:
1) Collect facial expression images with resolution of at least 100x100 pixels and label their expression categories; these serve as the original images I_HR. Downsample each original image so that its width and height become 1/2 to 1/8 of the original resolution, obtaining the corresponding low-resolution images; each low-resolution image keeps the expression category label of its original. One part of the original images and their corresponding low-resolution images is used as the training set, and the other part as the test set;
2) Training a neural network model by adopting a generated countermeasure network method;
inputting the original image and the low resolution image of each magnification into a feature extractor E, and the feature extractor E extracts and calculates a feature matrix F of the original image HR And a low-resolution image feature matrix F of each magnification LR
inputting the low-resolution image feature matrix F_LR into the expression feature generator FSRG, which outputs the generated reconstructed feature matrix F_SR;
Feature matrix F of original image HR And a corresponding reconstructed feature matrix F of the low resolution image SR Inputting the difference in the distribution space of the two characteristics into a characteristic discriminator FSRD, and optimizing the characteristic discriminator FSRG through back propagation;
inputting the reconstructed expression features F_SR into a two-layer fully-connected expression classifier C for classification; the classifier C calculates the probability that each sample belongs to each category, and the probability of each sample being correctly classified is used to compute a weight coefficient that re-weights that sample's loss, so as to accelerate convergence of the neural network;
repeating the training process until a trained neural network model is obtained;
during the training of step 2), the loss function of the feature generator FSRG consists of the adversarial loss L_GAN, the perceptual loss L_P between the feature matrices F_SR and F_HR, and the two-norm loss L_2;
the adversarial loss L_GAN is:

L_GAN = -(1/b) Σ_{i=1}^{b} FSRD(F_SR^(i))  (8)

where b is the size of the data batch;
the feature perceptual loss L_P is:

L_P = (1/b) Σ_{i=1}^{b} ||C_FC(F_HR^(i)) - C_FC(F_SR^(i))||_2^2  (9)

where C_FC(·) denotes the output of the last fully-connected layer of the classifier C;
the two-norm loss L_2 is:

L_2 = (1/b) Σ_{i=1}^{b} ||F_HR^(i) - F_SR^(i)||_2^2  (10)
the loss of the feature generator FSRG is a linear sum of the three:
L_FSRG = L_GAN + λ1·L_P + λ2·L_2  (11)
where λ1 and λ2 are both weight coefficients greater than zero;
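A minimal NumPy sketch of the combined generator loss of equation (11). The WGAN-style adversarial term and the squared-error forms of L_P and L_2 are assumptions (the patent renders equations (8)-(10) as images), and `fsrd`, `c_fc`, and the default weights are placeholders:

```python
import numpy as np

def generator_loss(F_SR, F_HR, fsrd, c_fc, lam1=0.1, lam2=1.0):
    """L_FSRG = L_GAN + lam1*L_P + lam2*L_2, as in equation (11).

    F_SR, F_HR : (b, d) batches of reconstructed / original features.
    fsrd       : critic scoring each feature row (stand-in for FSRD).
    c_fc       : last fully-connected layer of classifier C (stand-in).
    """
    L_GAN = -fsrd(F_SR).mean()                                   # adversarial
    L_P = ((c_fc(F_HR) - c_fc(F_SR)) ** 2).sum(axis=1).mean()    # perceptual
    L_2 = ((F_HR - F_SR) ** 2).sum(axis=1).mean()                # two-norm
    return L_GAN + lam1 * L_P + lam2 * L_2
```

When F_SR equals F_HR the perceptual and two-norm terms vanish, leaving only the adversarial score, which matches the intended role of L_P and L_2 as reconstruction constraints.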
3) Inputting a face image of the expression to be recognized into the trained neural network model: the feature extractor E extracts the feature matrix F of the input image, the feature generator FSRG generates the reconstructed feature matrix F_SR, and the classifier C calculates and outputs the class label of the recognition result.
2. The low-resolution image facial expression recognition method based on a feature reconstruction model according to claim 1, wherein the feature extractor E in step 2) is composed of a plurality of convolutional layers and nonlinear activation layers, and is the feature extraction part of an expression recognition model pre-trained on the original image dataset.
3. The low-resolution image facial expression recognition method based on the feature reconstruction model according to claim 1, wherein the feature extraction process in the feature extractor E in step 2) is as follows:
for an input image I, extracting a three-dimensional feature tensor T, whose size is w×h×n, where w and h are the length and width of the feature tensor and n is the number of channels;
calculating the covariance matrix M of the feature tensor T:

M = (1/n) Σ_{i=1}^{n} (f_i - f̄)(f_i - f̄)^T  (1)

where f_i denotes one channel of the feature tensor T flattened into a vector, f̄ = (1/n) Σ_{i=1}^{n} f_i is the mean of the channels of the feature tensor, and n is the number of channels of the feature tensor T;
correcting the eigenvalues of the covariance matrix M to obtain the corrected covariance matrix M+:

M+ = M + λ·trace(M)·I  (2)

where λ is a coefficient greater than zero, I is the identity matrix, and trace(M) is the trace of the matrix M;
covariance matrix M for correction + And carrying out pooling operation and taking logarithm of the characteristic value to obtain a characteristic matrix F.
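The covariance computation of claim 3 with the eigenvalue correction of equation (2) can be sketched in NumPy as follows; the flattening of each channel f_i into a vector and the 1/n normalization are assumptions about the tensor layout:

```python
import numpy as np

def corrected_covariance(T, lam=1e-4):
    """Covariance matrix of claim 3 plus the correction of equation (2):
    M+ = M + lam * trace(M) * I, which guarantees strictly positive
    eigenvalues whenever trace(M) > 0."""
    w, h, n = T.shape
    channels = T.reshape(w * h, n).T          # n rows, one per channel f_i
    mean = channels.mean(axis=0)              # channel mean f_bar
    D = channels - mean
    M = D.T @ D / n                           # covariance over channels
    return M + lam * np.trace(M) * np.eye(M.shape[0])
```

The correction matters because the subsequent log-eigenvalue step in claim 4 requires a positive-definite matrix.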
4. A low-resolution image facial expression recognition method based on a feature reconstruction model as claimed in claim 3, wherein the process of performing the pooling operation on the corrected covariance matrix M+ and taking the logarithm of its eigenvalues to obtain the feature matrix is as follows:

F_cov = W·M+·W^T  (3)

where W is the pooling parameter matrix and F_cov is the output matrix;
performing eigenvalue decomposition and eigenvalue correction on F_cov to obtain the matrix F+, with the specific operations:

F_cov = U1 Σ1 U1^T  (4)

F+ = U1 max(εI, Σ1) U1^T  (5)

where max(·,·) takes the element-wise maximum of the two matrices and ε is a small positive constant;
performing eigenvalue decomposition on F+ and taking the logarithm of its eigenvalues, with the specific operations:

F+ = U2 Σ2 U2^T  (6)

F = U2 log(Σ2) U2^T  (7)

where log(Σ2) denotes taking the logarithm of each element of the eigenvalue matrix Σ2.
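Equations (4)-(7) amount to clamping the eigenvalues of F_cov from below and then applying the matrix logarithm through the eigenbasis. A NumPy sketch (the value of ε is a placeholder):

```python
import numpy as np

def log_eig_map(F_cov, eps=1e-5):
    """Eigenvalue correction and logarithm map of equations (4)-(7)."""
    vals, U1 = np.linalg.eigh(F_cov)                       # eq. (4)
    F_plus = U1 @ np.diag(np.maximum(vals, eps)) @ U1.T    # eq. (5)
    vals2, U2 = np.linalg.eigh(F_plus)                     # eq. (6)
    return U2 @ np.diag(np.log(vals2)) @ U2.T              # eq. (7)
```

The log map flattens the curved space of positive-definite matrices so the resulting feature matrix F can be compared with Euclidean losses, which is consistent with the two-norm loss used later in training.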
5. The method for recognizing the facial expression of a low-resolution image based on a feature reconstruction model according to claim 1, wherein the feature generator FSRG in step 2) is a fully convolutional network composed of convolutional layers and nonlinear activation layers, and reconstructs the feature matrix as follows: it takes the low-resolution image feature matrix F_LR as input and outputs the reconstructed feature matrix F_SR, and the matrices before and after reconstruction remain dimensionally consistent.
6. The method for recognizing the facial expression of a low-resolution image based on a feature reconstruction model according to claim 1, wherein the feature discriminator FSRD in step 2) compares the difference between the two feature matrices in the distribution space, specifically: the feature discriminator FSRD takes the feature matrices F_SR and F_HR corresponding to the same image as input and outputs corresponding scores, and the absolute value of the difference between the scores represents the Wasserstein distance between the two in the feature space.
7. The method for recognizing facial expressions of low-resolution images based on a feature reconstruction model according to claim 1, wherein during the training process of step 2), the loss of the feature discriminator FSRD is:

L_FSRD = (1/b) Σ_{i=1}^{b} [ FSRD(F_SR^(i)) - FSRD(F_HR^(i)) + k·(||∇_F̂ FSRD(F̂^(i))||_2 - 1)^p ]  (12)

where F̂ = θ·F_SR + (1-θ)·F_HR, θ is a random number between 0 and 1, ensuring that F̂ is a linear interpolation of F_SR and F_HR; p and k are the exponent parameter and the coefficient parameter of the gradient-penalty term.
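The discriminator training of claim 7 interpolates between reconstructed and original features with a per-sample θ and penalizes the critic's gradient norm there. A NumPy sketch; since no autodiff framework is assumed, the critic's gradient is supplied explicitly, and the default k and p are placeholders:

```python
import numpy as np

def gradient_penalty(F_SR, F_HR, critic_grad, k=10.0, p=2, rng=None):
    """Gradient-penalty term of claim 7: sample theta ~ U(0, 1) per sample,
    form F_hat = theta*F_SR + (1-theta)*F_HR, and penalize
    k * (||grad FSRD(F_hat)||_2 - 1)^p, averaged over the batch.

    critic_grad: callable returning the critic's gradient row-wise; in a
    deep-learning framework this would come from automatic differentiation.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    theta = rng.uniform(size=(F_SR.shape[0], 1))
    F_hat = theta * F_SR + (1 - theta) * F_HR       # linear interpolation
    norms = np.linalg.norm(critic_grad(F_hat), axis=1)
    return k * ((norms - 1) ** p).mean()
```

A critic whose gradient already has unit norm everywhere incurs zero penalty, which is the fixed point this term drives training toward.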
8. The facial expression recognition method of a low-resolution image based on a feature reconstruction model as set forth in claim 1, wherein the expression classifier C in step 2) uses softmax to calculate the probability that a sample belongs to each class Class_i, i = 1, ..., z, where z is the total number of categories, and re-weights its loss by the probability value corresponding to the true class, specifically:

w = (σ - logit)^r  (13)

L_C = -w·log(logit)  (14)

where logit is the probability, output by the softmax function, that the sample belongs to its true class, and the parameters σ and r are set to 1.5 and 2, respectively.
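The re-weighting of claim 8 down-weights samples the classifier already handles confidently. A NumPy sketch of equation (13) applied to a cross-entropy term; attaching w multiplicatively to the cross-entropy is an assumption about the final weighted form:

```python
import numpy as np

def reweighted_loss(logits, labels, sigma=1.5, r=2):
    """Re-weighted classification loss of claim 8: w = (sigma - logit)^r
    (equation (13)), where `logit` is the softmax probability of the
    true class, then the weight scales the cross-entropy of each sample."""
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = z / z.sum(axis=1, keepdims=True)          # softmax over classes
    p_true = probs[np.arange(len(labels)), labels]    # probability of true class
    w = (sigma - p_true) ** r                         # equation (13)
    return float((-w * np.log(p_true)).mean())
```

With σ = 1.5 and r = 2, a confidently correct sample (p_true near 1) gets weight about 0.25 while an uncertain one (p_true near 1/z) gets weight near 2, which is the convergence-accelerating effect described in step 2).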
CN202110055946.8A 2021-01-15 2021-01-15 Low-resolution image facial expression recognition method based on feature reconstruction model Active CN112818764B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110055946.8A CN112818764B (en) 2021-01-15 2021-01-15 Low-resolution image facial expression recognition method based on feature reconstruction model

Publications (2)

Publication Number Publication Date
CN112818764A CN112818764A (en) 2021-05-18
CN112818764B true CN112818764B (en) 2023-05-02

Family

ID=75869434

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110055946.8A Active CN112818764B (en) 2021-01-15 2021-01-15 Low-resolution image facial expression recognition method based on feature reconstruction model

Country Status (1)

Country Link
CN (1) CN112818764B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255517B (en) * 2021-05-24 2023-10-24 中国科学技术大学 Expression recognition model training method for protecting privacy and expression recognition method and device
CN113344110B (en) * 2021-06-26 2024-04-05 浙江理工大学 Fuzzy image classification method based on super-resolution reconstruction
CN113486842A (en) * 2021-07-23 2021-10-08 北京达佳互联信息技术有限公司 Expression editing model training method and device and expression editing method and device
CN113887371A (en) * 2021-09-26 2022-01-04 华南理工大学 Data enhancement method for low-resolution face recognition
CN114648803B (en) * 2022-05-20 2022-09-06 中国科学技术大学 Method, system, equipment and storage medium for recognizing facial expressions in natural scene
CN115511748A (en) * 2022-09-30 2022-12-23 北京航星永志科技有限公司 Image high-definition processing method and device and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107154023A (en) * 2017-05-17 2017-09-12 电子科技大学 Face super-resolution reconstruction method based on generation confrontation network and sub-pix convolution
WO2019015466A1 (en) * 2017-07-17 2019-01-24 广州广电运通金融电子股份有限公司 Method and apparatus for verifying person and certificate
CN110084119A (en) * 2019-03-26 2019-08-02 安徽艾睿思智能科技有限公司 Low-resolution face image recognition methods based on deep learning
CN110211045A (en) * 2019-05-29 2019-09-06 电子科技大学 Super-resolution face image method based on SRGAN network
CN111784581A (en) * 2020-07-03 2020-10-16 苏州兴钊防务研究院有限公司 SAR image super-resolution reconstruction method based on self-normalization generation countermeasure network
CN111931805A (en) * 2020-06-23 2020-11-13 西安交通大学 Knowledge-guided CNN-based small sample similar abrasive particle identification method
CN112070058A (en) * 2020-09-18 2020-12-11 深延科技(北京)有限公司 Face and face composite emotional expression recognition method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8488023B2 (en) * 2009-05-20 2013-07-16 DigitalOptics Corporation Europe Limited Identifying facial expressions in acquired digital images
US10599951B2 (en) * 2018-03-28 2020-03-24 Kla-Tencor Corp. Training a neural network for defect detection in low resolution images


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks; Xintao Wang et al.; arXiv:1809.00219v2; 2018-09-17; full text *
Robust Facial Expression Recognition Based on Generative Adversarial Networks; Yao Naiming et al.; Acta Automatica Sinica; May 2018; Vol. 44, No. 5; full text *

Also Published As

Publication number Publication date
CN112818764A (en) 2021-05-18

Similar Documents

Publication Publication Date Title
CN112818764B (en) Low-resolution image facial expression recognition method based on feature reconstruction model
CN108537743B (en) Face image enhancement method based on generation countermeasure network
CN111091045B (en) Sign language identification method based on space-time attention mechanism
CN108717568B (en) A kind of image characteristics extraction and training method based on Three dimensional convolution neural network
CN107977932B (en) Face image super-resolution reconstruction method based on discriminable attribute constraint generation countermeasure network
CN112949565B (en) Single-sample partially-shielded face recognition method and system based on attention mechanism
CN107341452B (en) Human behavior identification method based on quaternion space-time convolution neural network
CN109522857B (en) People number estimation method based on generation type confrontation network model
CN110348330B (en) Face pose virtual view generation method based on VAE-ACGAN
CN108648197B (en) Target candidate region extraction method based on image background mask
CN109886881B (en) Face makeup removal method
CN107085704A (en) Fast face expression recognition method based on ELM own coding algorithms
Huynh et al. Convolutional neural network models for facial expression recognition using bu-3dfe database
CN109389171B (en) Medical image classification method based on multi-granularity convolution noise reduction automatic encoder technology
CN115484410B (en) Event camera video reconstruction method based on deep learning
CN110728629A (en) Image set enhancement method for resisting attack
CN112766165B (en) Falling pre-judging method based on deep neural network and panoramic segmentation
CN112184582B (en) Attention mechanism-based image completion method and device
CN116168067B (en) Supervised multi-modal light field depth estimation method based on deep learning
CN112257741B (en) Method for detecting generative anti-false picture based on complex neural network
CN114463759A (en) Lightweight character detection method and device based on anchor-frame-free algorithm
CN105550712B (en) Aurora image classification method based on optimization convolution autocoding network
CN109977989A (en) A kind of processing method of image tensor data
CN111967361A (en) Emotion detection method based on baby expression recognition and crying
CN112686817A (en) Image completion method based on uncertainty estimation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant