CN113869221B - Expression recognition method based on multistage deep neural network - Google Patents
Expression recognition method based on multistage deep neural network
- Publication number
- CN113869221B CN113869221B CN202111148260.XA CN202111148260A CN113869221B CN 113869221 B CN113869221 B CN 113869221B CN 202111148260 A CN202111148260 A CN 202111148260A CN 113869221 B CN113869221 B CN 113869221B
- Authority
- CN
- China
- Prior art keywords
- expression
- layer
- convolution
- expression recognition
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention relates to the technical field of expression recognition, and particularly discloses an expression recognition method based on a multistage deep neural network. First, the four expression labels with higher recognition complexity (angry, nausea, fear and contempt) are relabelled as "other", and a first expression recognition network model is trained to distinguish happy, surprised, sad and other. Next, the dataset labeled as angry, nausea, fear and contempt is fed into the feature extraction network model, which outputs feature vectors for these data. These feature vectors are then processed through a standardized flow model. Finally, the processed feature vectors are fed into a multi-layer perceptron (MLP). Through continuous learning and training, the MLP can successfully identify the four expressions of angry, nausea, fear and contempt. Therefore, the seven basic expressions can be identified with higher precision by the trained multistage neural network model.
Description
Technical Field
The invention relates to the technical field of expression recognition, in particular to an expression recognition method based on a multistage deep neural network.
Background
Expressions carry very important information in our daily communication, often reinforcing what is being communicated. Psychologist A. Mehrabian notes in An Approach to Environmental Psychology that in daily human communication the information conveyed by language accounts for only 7% of the total, while the information conveyed by facial expression accounts for 55%. Meanwhile, with the development of machine learning in recent years, face recognition technology has received a great deal of attention; facial expression recognition in particular has gained wide attention in fields such as security, robot manufacturing, automation, automatic driving and human-machine interaction. Human beings have at least 21 expressions, seven of which are basic: happy, surprised, sad, angry, nausea, fear and contempt. They are all composed of basic expression units, i.e. one or more actions and states of the muscles of various parts of the face. However, the accuracy of current expression recognition is not high, and for expressions whose features partially overlap the recognition effect is especially poor.
Disclosure of Invention
The invention provides an expression recognition method based on a multistage deep neural network, which solves the technical problems that: how to improve the recognition effect of the expression with the overlapped partial characteristics.
In order to solve the technical problems, the invention provides an expression recognition method based on a multistage deep neural network, which comprises the following steps:
S1, preprocessing a training data set containing seven expression labels; wherein,
the seven expression labels are happy, surprised, sad, angry, nausea, fear and contempt, respectively;
The preprocessing is to change the labels of the picture data of the 4 expression labels with higher recognition complexity, namely angry, nausea, fear and contempt, into an "other" label, leave the labels of the remaining picture data unchanged, and cut all the picture data to a size of B×B;
S2, sending the data of the four expression labels happy, surprised, sad and other obtained after the preprocessing in step S1 into a first expression recognition network model for training, so as to fix the weight data of the first expression recognition network model;
S3, cutting data of 4 expression labels with higher recognition complexity, namely, angry, nausea, fear and contempt, into a B multiplied by B size, and then sending the data into a feature extraction network model to obtain corresponding feature data;
S4, sending the characteristic data obtained in the step S3 into a standardized flow model for processing, so that the data is subjected to Gaussian distribution;
S5, sending the data obtained in the step S4 into a multi-layer perceptron for training, and storing the trained parameters;
S6, testing the trained expression recognition model consisting of the first expression recognition network model, the feature extraction network model, the standardized flow model and the multi-layer perceptron;
S7, using the expression recognition model passing the test for recognizing the unknown expression image, wherein the recognition process comprises the following steps:
Cutting the unknown expression image to B×B and sending it into the first expression recognition network model passing the test; if it is judged to be one of the expressions other than "other", directly outputting the recognition result; otherwise sending it into a second expression recognition network model consisting of the feature extraction network model, the standardized flow model and the multi-layer perceptron, and outputting the classification result with the maximum probability.
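The two-stage routing described above can be sketched as follows; `stage1` and `stage2` are hypothetical stand-ins for the trained first and second expression recognition network models, and all names are assumptions for illustration, not the patent's code.

```python
# Hypothetical sketch of the two-stage cascade in step S7.
def recognize(image, stage1, stage2):
    """stage1 returns one of {"happy", "surprise", "sad", "other"};
    stage2 returns a probability per hard class (angry/nausea/fear/contempt)."""
    label = stage1(image)
    if label != "other":
        return label                      # easy expression: decided at stage 1
    probs = stage2(image)                 # ResNet(2) -> DNF -> MLP
    return max(probs, key=probs.get)      # classification with maximum probability
```

Only images the first model routes to "other" ever pay the cost of the second, heavier pipeline.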
Specifically, the first expression recognition network model is built based on the ResNet-18 network model, and comprises a first convolution module and a fully connected module;
the feature extraction network model is built based on the ResNet-18 network model, including a second convolution module.
This has the advantage that the extracted expression features are more spatially aggregated, since the standardized flow model (DNF) requires a certain degree of aggregation in the data.
Specifically, the first convolution module includes:
a first block: a convolution layer consisting of 64 7 x 7 convolution kernels, with a step size of 2;
and a second block: is composed of a 3 x 3 maximum pooling layer and a convolution layer composed of two 64 3 x 3 convolution kernels;
third block: a convolution layer consisting of two layers of 128 3 x3 convolution kernels;
fourth block: a convolution layer consisting of two layers of 256 3 x3 convolution kernels;
fifth block: a convolution layer consisting of two layers of 512 3 x3 convolution kernels;
the fully connected module comprises:
An averaging pooling layer, a full connectivity layer, and a Softmax layer;
A Dropout strategy was added before the fully connected layer and 50% of neurons were randomly inactivated.
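As a check on the five-block layout above, the spatial sizes can be traced with the standard convolution output-size formula; the padding values below are the usual ResNet-18 choices and are an assumption, since the text does not list them.

```python
import math

def conv_out(size, kernel, stride, pad):
    # floor((n + 2p - k) / s) + 1: output size of a conv or pooling layer
    return math.floor((size + 2 * pad - kernel) / stride) + 1

size = conv_out(224, 7, 2, 3)       # block 1: 64 7x7 convs, stride 2 -> 112
size = conv_out(size, 3, 2, 1)      # block 2: 3x3 max pooling, stride 2 -> 56
for _ in range(3):                  # blocks 3-5 each halve the spatial size
    size = conv_out(size, 3, 2, 1)  # 28 -> 14 -> 7
# global average pooling over the final 7x7 maps leaves a 512-d vector
```

The 224×224 input thus ends as a 7×7×512 tensor before average pooling, matching the usual ResNet-18 progression.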
Specifically, the first expression recognition network model uses cross entropy as its loss function, with the formula:

L = -Σ_{i=1}^{N} y^(i) log ŷ^(i)  (1)

In formula (1), N represents the number of categories, y^(i) indicates whether the output category is the same as the label (1 if the same, otherwise 0), and ŷ^(i) represents the predicted value.
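Formula (1) can be checked numerically; a minimal sketch with a one-hot label and illustrative softmax probabilities (the numbers are assumptions):

```python
import math

def cross_entropy(y_true, y_pred):
    """Formula (1): L = -sum_i y(i) * log(yhat(i)) over the N categories.
    y_true is one-hot (1 where the output category matches the label),
    y_pred holds the Softmax-layer probabilities."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t)

loss = cross_entropy([0, 1, 0, 0], [0.1, 0.7, 0.1, 0.1])  # -log(0.7)
```

The loss only depends on the probability assigned to the true class, so it decreases as that probability approaches 1.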
Specifically, the second convolution module includes:
a first block: a convolution layer consisting of 64 7 x 7 convolution kernels, with a step size of 2;
and a second block: is composed of a 3 x 3 maximum pooling layer and a convolution layer composed of two 64 3 x 3 convolution kernels;
third block: a convolution layer consisting of two layers of 128 3 x3 convolution kernels;
fourth block: a convolution layer consisting of two layers of 256 3 x3 convolution kernels;
fifth block: a convolution layer consisting of two layers of 512 3 x3 convolution kernels;
plus the final average pooling layer.
Specifically, the training objective function of the standardized flow model is:

Θ* = argmax_Θ Σ_i [ log N(Z_i; μ_{y(x_i)}, Σ) + log |det(∂f^{-1}(x_i)/∂x_i)| ]  (2)

In formula (2), Θ = {{μ_y}, Σ, θ} represents all parameters, where y represents a class, μ_y the mean of class y, Σ the covariance, and θ the parameters of the flow network; y(x_i) represents the class of the i-th sample, and Z_i = f^{-1}(x_i) represents the sample after DNF processing, which conforms to a Gaussian distribution; N(·; μ_y, Σ) represents the distribution of each class y. After training, the standardized flow model creates a normalized space for Z in which each class conforms to a Gaussian distribution. det represents the determinant of the Jacobian matrix, and x_i = f(Z_i), where f is a composition of T inverse autoregressive transforms, expressed as:

f = f_T ∘ f_{T-1} ∘ … ∘ f_0  (3)

wherein each f_t is a structured neural network, and 0 ≤ t ≤ T.
Specifically, the standardized flow model is composed of 10 masked autoregressive flow (MAF) blocks. Each block is implemented by a three-layer fully connected neural network and realizes an inverse autoregressive transformation z_j^(i) = u_j^(i) · exp(α_j^(i)) + μ_j^(i), where z_j^(i) is the j-th output of the i-th masked autoregressive flow block, μ_j^(i) = f_μ(u_1^(i), …, u_{j-1}^(i)) and α_j^(i) = f_α(u_1^(i), …, u_{j-1}^(i)); {f_μ, f_α} are unconstrained functions, and exp denotes the exponential function with base e. In the experiments, the model was trained with the Adam optimizer, with batch size set to 300 and learning rate set to 0.003.
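A single masked autoregressive transform of this kind can be sketched as below; the toy `f_mu`/`f_alpha` callables merely stand in for the three-layer fully connected networks and are assumptions for illustration.

```python
import numpy as np

def maf_block(u, f_mu, f_alpha):
    """One masked autoregressive flow step:
    z_j = u_j * exp(alpha_j) + mu_j, where mu_j and alpha_j are computed
    only from u_1..u_{j-1} (the autoregressive mask)."""
    z = np.empty_like(u, dtype=float)
    for j in range(len(u)):
        prefix = u[:j]                                   # earlier dims only
        z[j] = u[j] * np.exp(f_alpha(prefix)) + f_mu(prefix)
    return z

# Toy stand-ins for the fully connected networks (assumptions):
z = maf_block(np.array([1.0, 2.0, 3.0]),
              f_mu=lambda p: float(p.sum()),
              f_alpha=lambda p: 0.0)
```

Because each output depends only on earlier inputs, the Jacobian is triangular and its determinant in formula (2) is just the product of the exp(α_j) terms.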
Specifically, the multi-layer perceptron comprises an input layer, a hidden layer and an output layer; the input layer has 5 nodes, the hidden layer has 6 nodes, the output layer has 4 nodes, and the connection between the nodes of adjacent layers has weight.
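A forward pass through the 5-6-4 perceptron can be sketched as follows; the random weights and the ReLU/softmax choices are illustrative assumptions, not the trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
# Shapes follow the 5-6-4 layout in the text.
W1, b1 = rng.standard_normal((6, 5)), np.zeros(6)   # input(5) -> hidden(6)
W2, b2 = rng.standard_normal((4, 6)), np.zeros(4)   # hidden(6) -> output(4)

def mlp(x):
    h = np.maximum(W1 @ x + b1, 0.0)   # weighted edges + ReLU hidden layer
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()                 # probabilities over the 4 expressions

probs = mlp(rng.standard_normal(5))
```

Training would adjust W1, W2, b1, b2 so the output distribution concentrates on the correct one of the four hard expressions.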
Preferably, B×B = 224×224.
Specifically, during testing, a camera first collects a photo of a human face; after the 68 facial key points are identified, the photo containing the 68 key points is cut to 224×224 and sent into the expression recognition model.
According to the expression recognition method based on the multistage deep neural network, the labels of the picture data of the 4 expression labels with higher recognition complexity, namely angry, nausea, fear and contempt, are uniformly modified into the "other" label, and the data of the four expression labels happy, surprise, sadness and other are then sent into the first expression recognition network model for training; through continuous learning and training, the first expression recognition network model can successfully recognize the 4 classifications other, happy, surprise and sadness. Then, the dataset consisting of the 4 expression labels angry, nausea, fear and contempt is sent into the feature extraction network model, which outputs feature vectors for these data. These feature vectors are then processed through the standardized flow model to enlarge the inter-class spacing and reduce the intra-class spacing, which facilitates later training and recognition. Finally, the processed feature vectors are fed into a multi-layer perceptron (MLP). Through continuous learning and training, the MLP can successfully identify the 4 expressions angry, nausea, fear and contempt. Through this complete flow, the seven basic expressions can be identified with higher precision by the trained multistage neural network model (the first expression recognition network model, the feature extraction network model, the standardized flow model and the multi-layer perceptron).
Compared with a single-stage neural network, the multistage network lets different modules distinguish expressions of different complexity, so complex expressions are better resolved and all expressions are recognized with higher precision.
In a specific experiment, a dataset containing only Asian faces was used. On this dataset, VGG16 was first used for training and testing, and the final accuracy reached only 38.72%. Then a single-stage ResNet was used, reaching 42.58%. With the method of the invention, the accuracy reaches 86.32%.
Drawings
Fig. 1 is a step flowchart of an expression recognition method based on a multi-level deep neural network according to an embodiment of the present invention;
FIG. 2 is a block diagram of a multi-stage deep neural network according to an embodiment of the present invention;
Fig. 3 is a frame structure diagram of a first expression recognition network model ResNet (1) provided in an embodiment of the present invention;
FIG. 4 is a block diagram of a feature extraction network model ResNet (2) provided by an embodiment of the invention;
FIG. 5 is a block diagram of a standardized flow model (DNF model) provided by an embodiment of the present invention;
FIG. 6 is a block diagram of a multi-layer perceptron (MLP) provided by an embodiment of the invention;
FIG. 7 is a block diagram of a mask autoregressive flow block provided by an embodiment of the invention.
Detailed Description
The following examples are given for the purpose of illustration only and are not to be construed as limiting the invention; the drawings are for reference and description only and do not limit the scope of the invention, since many variations are possible without departing from its spirit and scope.
The expression recognition method based on the multistage deep neural network provided by the embodiment of the invention, as shown in figures 1 and 2, comprises the following steps:
s1, preprocessing a training data set containing seven-big expression labels.
The seven expression labels are Happy ("Happy"), Surprised ("Surprise"), Sad ("Sad"), Angry ("Angry"), Nausea ("Disgust"), Fear ("Fear") and Contempt ("Contempt"), respectively.
Here, preprocessing refers to changing the labels of the picture data of the 4 expression labels with higher recognition complexity, namely angry, nausea, fear and contempt, to "Other", leaving the labels of the remaining picture data unchanged, and cutting all the picture data to a B×B size. That is, preprocessing modifies the labels of the 4 expressions angry, nausea, fear and contempt into the other label ("Other"), yielding data with the 4 labels other, happy, surprise and sadness. This example uniformly clips the data to a 224×224 size, i.e. B=224. Of course, the size of B can be determined according to actual requirements.
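The relabelling half of step S1 can be sketched as below; the `(label, image)` record format is an assumption, while the label names follow the text.

```python
# Sketch of the S1 relabelling: the four harder expressions collapse into
# "Other"; the three easy ones keep their labels.
HARD = {"Angry", "Disgust", "Fear", "Contempt"}

def relabel(dataset):
    """dataset is a list of (label, image) pairs (format assumed)."""
    return [("Other" if label in HARD else label, img)
            for label, img in dataset]
```

Cropping to 224×224 would follow as a separate image-processing step on each `img`.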
S2, sending the data of the happy, surprised, sad and other four expression labels obtained after the pretreatment in the step S1 into a first expression recognition network model for training so as to fix weight data of the first expression recognition network model.
As shown in fig. 2, the first expression recognition network model is built based on the ResNet-18 network model, abbreviated as ResNet (1) in this example; it comprises a first convolution module and a fully connected module, and its input is an image of size B×B = 224×224.
As shown in fig. 3, the first convolution module includes:
a first block: a convolution layer consisting of 64 7 x 7 convolution kernels, with a step size of 2;
And a second block: is composed of a3 x 3 max pooling layer (Maxpool) and a convolution layer composed of two layers of 64 3 x 3 convolution kernels;
third block: a convolution layer consisting of two layers of 128 3 x3 convolution kernels;
fourth block: a convolution layer consisting of two layers of 256 3 x3 convolution kernels;
fifth block: a convolution layer consisting of two layers of 512 3 x3 convolution kernels;
the fully connected module comprises:
an averaging pooling layer (Avg Pooling), a fully-connected layer (FC) and a Softmax layer;
A Dropout strategy was added before the fully connected layer and 50% of neurons were randomly inactivated.
S3, cutting the data of the 4 expression labels with higher recognition complexity, namely angry, nausea, fear and contempt, into a B×B size, and then sending the data into the feature extraction network model to obtain corresponding feature data.
Here, as shown in fig. 2, the feature extraction network model is also built based on the ResNet-18 network model, referred to as the ResNet (2) model in this example, but it includes only the second convolution module; its input is a picture of size B×B = 224×224. The purpose is to make the extracted expression features more spatially aggregated, since a certain degree of aggregation in the data is required when the standardized flow model is applied afterwards. As shown in fig. 4, the second convolution module includes:
a first block: a convolution layer consisting of 64 7 x 7 convolution kernels, with a step size of 2;
and a second block: is composed of a 3 x 3 maximum pooling layer and a convolution layer composed of two 64 3 x 3 convolution kernels;
third block: a convolution layer consisting of two layers of 128 3 x3 convolution kernels;
fourth block: a convolution layer consisting of two layers of 256 3 x3 convolution kernels;
fifth block: a convolution layer consisting of two layers of 512 3 x3 convolution kernels;
plus the final average pooling layer.
And S4, sending the characteristic data obtained in the step S3 into a standardized flow model (DNF model) for processing, so that the data is subjected to Gaussian distribution.
The input to the DNF model is the feature vectors from the ResNet (2) model, and the output is a normalized feature-vector space. The DNF model transforms the original feature data with smaller inter-class spacing into data with larger inter-class spacing and makes it subject to a Gaussian distribution, as shown in fig. 5. The training objective function of the DNF model is:

Θ* = argmax_Θ Σ_i [ log N(Z_i; μ_{y(x_i)}, Σ) + log |det(∂f^{-1}(x_i)/∂x_i)| ]  (2)

In formula (2), Θ = {{μ_y}, Σ, θ} represents all parameters, where y represents a class, μ_y the mean of class y, Σ the covariance, and θ the parameters of the flow network; y(x_i) represents the class of the i-th sample, and Z_i = f^{-1}(x_i) represents the sample after DNF processing, which conforms to a Gaussian distribution; N(·; μ_y, Σ) represents the distribution of each class y. After training, the standardized flow model creates a normalized space for Z in which each class conforms to a Gaussian distribution. det represents the determinant of the Jacobian matrix, and x_i = f(Z_i), where f is a composition of T inverse autoregressive transforms, expressed as:

f = f_T ∘ f_{T-1} ∘ … ∘ f_0  (3)

wherein each f_t is a structured neural network, and 0 ≤ t ≤ T.
The standardized flow model consists of 10 masked autoregressive flow (MAF) blocks, each implemented by a three-layer fully connected neural network realizing an inverse autoregressive transformation z_j^(i) = u_j^(i) · exp(α_j^(i)) + μ_j^(i), where z_j^(i) is the j-th output of the i-th masked autoregressive flow block, μ_j^(i) = f_μ(u_1^(i), …, u_{j-1}^(i)) and α_j^(i) = f_α(u_1^(i), …, u_{j-1}^(i)); {f_μ, f_α} are unconstrained functions, and exp denotes the exponential function with base e. In the experiments, the model was trained with the Adam optimizer, with batch size set to 300 and learning rate set to 0.003. The structure of the MAF block is shown in fig. 7.
S5, sending the data obtained in the step S4 into a multi-layer perceptron (MLP) for training, and storing the trained parameters.
The data input to the MLP are the vectors after DNF processing. In this vector space the inter-class spacing is larger and the intra-class spacing is smaller. The MLP model comprises an input layer, a hidden layer and an output layer: the input layer has 5 nodes, the hidden layer has 6 nodes, and the output layer has 4 nodes. The connections between nodes of adjacent layers are weighted; through training, these edges are assigned the correct weights. The network structure of the MLP is shown in fig. 6.
S6, testing the trained expression recognition model (multistage deep neural network) composed of the first expression recognition network model, the feature extraction network model, the standardized flow model and the multi-layer perceptron.
During testing, a camera first collects a photo of a human face; after the 68 facial key points are identified, the photo containing the 68 key points is cut to 224×224 and sent into the expression recognition model.
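The landmark-centered crop can be sketched as below; landmark detection itself (e.g. a 68-point facial landmark predictor) is assumed to have already run, and the `(row, col)` landmark format is an assumption.

```python
import numpy as np

def crop_around_landmarks(image, landmarks, size=224):
    """Center a size x size crop on the mean of the 68 landmark points,
    clamping so the crop stays inside the image bounds."""
    cy, cx = np.mean(landmarks, axis=0).astype(int)  # landmark centroid
    h, w = image.shape[:2]
    y = min(max(cy - size // 2, 0), max(h - size, 0))
    x = min(max(cx - size // 2, 0), max(w - size, 0))
    return image[y:y + size, x:x + size]
```

The clamping matters near image borders, where a naive centered crop would otherwise run out of pixels.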
S7, using the expression recognition model passing the test for recognizing the unknown expression image, wherein the recognition process comprises the following steps:
The unknown expression image is cut and sent into the first expression recognition network model passing the test; if it is judged to be an expression other than "other", the recognition result is output directly; otherwise it is sent into the second expression recognition network model consisting of the feature extraction network model, the standardized flow model and the multi-layer perceptron, and the classification result with the maximum probability is output.
According to the expression recognition method based on the multistage deep neural network, the labels of the 4 expression labels with higher recognition complexity ("angry", "nausea", "fear" and "contempt") are uniformly modified into "other", and all the data are sent into the first expression recognition network model ResNet (1) for training; through continuous learning and training, ResNet (1) can successfully recognize "other" and the remaining 3 classifications ("happy", "surprise", "sad"). Next, the dataset consisting of the 4 labels ("angry", "nausea", "fear" and "contempt") is fed into the feature extraction network model ResNet (2), which outputs feature vectors for these data. Then, these feature vectors are processed by the standardized flow model (DNF model) to enlarge the inter-class spacing and reduce the intra-class spacing, facilitating later training and recognition. Finally, the processed feature vectors are fed into the multi-layer perceptron (MLP). Through continuous learning and training, the MLP can successfully identify the 4 expressions "angry", "nausea", "fear" and "contempt". Through this complete flow, the seven basic expressions can be identified with higher precision by the trained multistage neural network model (first expression recognition network model, feature extraction network model, standardized flow model and multi-layer perceptron).
Compared with a single-stage neural network, the multistage network lets different modules distinguish expressions of different complexity, so complex expressions are better resolved and all expressions are recognized with higher precision.
In a specific experiment, a dataset containing only Asian faces was employed. The dataset contains 40005 Asian face pictures, 5715 for each basic expression. Of these, 16002 pictures are used for training, 12001 for verification and 12002 for testing. During training, the batch size was set to 64, the learning rate to 0.01, and the number of iterations to 50. For testing, the accuracy is computed as:

Accuracy = T / (T + F)

where T refers to the total number of correctly judged expressions and F refers to the total number of incorrectly judged expressions. For this dataset, VGG16 was first used for training and testing, and the final accuracy reached only 38.72%. Then a single-stage ResNet was used, reaching 42.58%. With the method of the invention, the accuracy reaches 86.32%.
The above examples are preferred embodiments of the present invention, but the embodiments are not limited to them; any change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included in the protection scope of the present invention.
Claims (10)
1. The expression recognition method based on the multistage deep neural network is characterized by comprising the following steps of:
S1, preprocessing a training data set containing seven expression labels;
wherein the seven big expression labels are happy, surprised, sad, angry, nausea, fear and contempt respectively;
The preprocessing is to change the labels of the picture data of the 4 expression labels with higher recognition complexity, namely angry, nausea, fear and contempt, into an other label, leave the labels of the remaining picture data unchanged, and cut all the picture data to a size of B×B;
S2, sending the data of the four expression labels happy, surprised, sad and other obtained after the preprocessing in step S1 into a first expression recognition network model for training, so as to fix the weight data of the first expression recognition network model;
S3, cutting data of 4 expression labels with higher recognition complexity, namely, angry, nausea, fear and contempt, into a B multiplied by B size, and then sending the data into a feature extraction network model to obtain corresponding feature data;
S4, sending the characteristic data obtained in the step S3 into a standardized flow model for processing, so that the data is subjected to Gaussian distribution;
S5, sending the data obtained in the step S4 into a multi-layer perceptron for training, and storing the trained parameters;
S6, testing the trained expression recognition model consisting of the first expression recognition network model, the feature extraction network model, the standardized flow model and the multi-layer perceptron;
S7, using the expression recognition model passing the test for recognizing the unknown expression image, wherein the recognition process comprises the following steps:
Cutting the unknown expression image to B×B and sending it into the first expression recognition network model passing the test; if it is judged to be one of the expressions other than "other", directly outputting the recognition result; otherwise sending it into a second expression recognition network model consisting of the feature extraction network model, the standardized flow model and the multi-layer perceptron, and outputting the classification result with the maximum probability.
2. The expression recognition method based on the multistage deep neural network according to claim 1, wherein:
The first expression recognition network model is built on the ResNet network model and comprises a first convolution module and a fully connected module;
the feature extraction network model is built on the ResNet-18 network model and comprises a second convolution module.
3. The expression recognition method based on the multistage deep neural network according to claim 2, wherein the first convolution module comprises:
a first block: a convolution layer consisting of 64 7×7 convolution kernels, with a stride of 2;
a second block: a 3×3 max pooling layer followed by a convolution layer consisting of two layers of 64 3×3 convolution kernels;
a third block: a convolution layer consisting of two layers of 128 3×3 convolution kernels;
a fourth block: a convolution layer consisting of two layers of 256 3×3 convolution kernels;
a fifth block: a convolution layer consisting of two layers of 512 3×3 convolution kernels;
the fully connected module comprises:
an average pooling layer, a fully connected layer and a Softmax layer;
a Dropout strategy is applied before the fully connected layer, randomly deactivating 50% of the neurons.
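Under the standard ResNet-18 strides and padding (the claim states only the stride of the first convolution; the remaining values below are assumptions), the spatial size of a 224×224 input can be traced through the five blocks with the usual convolution output-size formula:

```python
def trace_sizes(size=224):
    """Spatial size after each size-changing operation of the five blocks.

    Assumed (kernel, stride, padding) follow the standard ResNet-18 design:
    the 7x7 conv, the 3x3 max pool, then the first 3x3 conv of blocks 3-5
    (the stride-1 convolutions do not change the spatial size).
    """
    sizes = [size]
    for k, st, p in [(7, 2, 3), (3, 2, 1), (3, 2, 1), (3, 2, 1), (3, 2, 1)]:
        size = (size + 2 * p - k) // st + 1
        sizes.append(size)
    return sizes
```

The final 7×7 map is what the average pooling layer reduces before the fully connected layer.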
4. The expression recognition method based on the multistage deep neural network according to claim 3, wherein the first expression recognition network model uses cross entropy as its loss function, and the formula is:

Loss = -Σ_{i=1}^{N} y^(i) log(ŷ^(i))   (1)

In formula (1), N represents the number of categories, y^(i) equals 1 if the output category is the same as the label and 0 otherwise, and ŷ^(i) represents the predicted value.
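A minimal NumPy sketch of formula (1), with the label given as a one-hot vector:

```python
import numpy as np

def cross_entropy(y_onehot, y_pred, eps=1e-12):
    """Formula (1): loss = -sum_{i=1}^{N} y(i) * log(yhat(i)).

    eps guards against log(0); y_onehot holds 1 for the true class, else 0.
    """
    y_onehot = np.asarray(y_onehot, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(-np.sum(y_onehot * np.log(y_pred + eps)))

# True class is index 1; the loss reduces to -log of its predicted probability.
loss = cross_entropy([0, 1, 0, 0], [0.1, 0.7, 0.1, 0.1])
```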
5. The expression recognition method based on the multistage deep neural network according to claim 2, wherein the second convolution module comprises:
a first block: a convolution layer consisting of 64 7×7 convolution kernels, with a stride of 2;
a second block: a 3×3 max pooling layer followed by a convolution layer consisting of two layers of 64 3×3 convolution kernels;
a third block: a convolution layer consisting of two layers of 128 3×3 convolution kernels;
a fourth block: a convolution layer consisting of two layers of 256 3×3 convolution kernels;
a fifth block: a convolution layer consisting of two layers of 512 3×3 convolution kernels;
and a final average pooling layer.
6. The expression recognition method based on the multistage deep neural network according to claim 1, wherein the training objective function of the normalizing flow model is:

Θ* = argmax_Θ Σ_i [ log N(Z_i; μ_{y(x_i)}, Σ) + log |det(∂f⁻¹(x_i)/∂x_i)| ]   (2)

In formula (2), Θ = {{μ_y}, Σ, θ} represents all the parameters, where y denotes a class, μ_y denotes the mean of class y, Σ denotes the covariance, and θ denotes the parameters of the model; y(x_i) denotes the class of the i-th sample, Z_i denotes a sample conforming to a Gaussian distribution after DNF processing, and N(μ_y, Σ) represents the distribution of class y. After training, the normalizing flow model creates a normalized space for Z in which each class conforms to a Gaussian distribution; det denotes the determinant of a square matrix, and x_i = f(Z_i), where f is a composition of T inverse autoregressive transformations, expressed as:
f = f_T · f_{T-1} · … · f_0   (3)
wherein each f_t is a structured neural network, and 0 ≤ t ≤ T.
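The role of the log-determinant term in such a change-of-variables objective can be checked on a toy one-dimensional affine flow (all values below are illustrative, not from the patent): the density computed through the flow must agree with the closed-form density of the transformed Gaussian.

```python
import numpy as np

def gauss_logpdf(z, mu, var):
    """Log density of a 1-D Gaussian N(mu, var)."""
    return -0.5 * (np.log(2 * np.pi * var) + (z - mu) ** 2 / var)

# Toy flow x = f(z) = a*z + b, with latent z ~ N(mu, var).
a, b = 2.0, 1.0
mu, var = 0.0, 1.0
x = 3.0

# Change of variables: log p(x) = log N(f^{-1}(x); mu, var) + log|d f^{-1}/dx|
z = (x - b) / a
logp_flow = gauss_logpdf(z, mu, var) + np.log(abs(1.0 / a))

# Closed form: x ~ N(a*mu + b, a^2 * var)
logp_direct = gauss_logpdf(x, a * mu + b, a * a * var)
```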
7. The expression recognition method based on the multistage deep neural network according to claim 6, wherein: the normalizing flow model consists of 10 masked autoregressive flow blocks, each of which is implemented by a three-layer fully connected neural network and performs an inverse autoregressive transformation z_j^(i) = u_j^(i) · exp(α_j^(i)) + μ_j^(i), where z_j^(i) refers to the j-th output of the i-th masked autoregressive flow block, μ_j^(i) = f_μ(u_1^(i), …, u_{j-1}^(i)), α_j^(i) = f_α(u_1^(i), …, u_{j-1}^(i)), {f_μ, f_α} are unconstrained functions, and exp denotes the exponential function with the natural constant e as its base.
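A sketch of one such affine inverse autoregressive block, with simple scalar functions standing in for the three-layer networks f_μ and f_α, shows why it is sequentially invertible: each μ_j and α_j depends only on the already-recovered inputs u_{1:j-1}.

```python
import numpy as np

def maf_forward(u, f_mu, f_alpha):
    """One block: z_j = u_j * exp(alpha_j) + mu_j, conditioned on u_{1:j-1}."""
    z = np.empty_like(u)
    for j in range(len(u)):
        mu, alpha = f_mu(u[:j]), f_alpha(u[:j])
        z[j] = u[j] * np.exp(alpha) + mu
    return z

def maf_inverse(z, f_mu, f_alpha):
    """Sequential inversion: u_j = (z_j - mu_j) * exp(-alpha_j)."""
    u = np.empty_like(z)
    for j in range(len(z)):
        mu, alpha = f_mu(u[:j]), f_alpha(u[:j])  # prefixes already recovered
        u[j] = (z[j] - mu) * np.exp(-alpha)
    return u

# Toy unconstrained functions standing in for the fully connected networks
f_mu = lambda prefix: 0.1 * np.sum(prefix)
f_alpha = lambda prefix: 0.05 * np.sum(prefix)

u = np.array([0.5, -1.0, 0.3, 2.0, -0.7])
z = maf_forward(u, f_mu, f_alpha)
```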
8. The expression recognition method based on the multistage deep neural network according to claim 1, wherein: the multi-layer perceptron comprises an input layer, a hidden layer and an output layer; the input layer has 5 nodes, the hidden layer has 6 nodes, the output layer has 4 nodes, and the connections between nodes of adjacent layers carry weights.
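A minimal forward pass of such a 5-6-4 perceptron; the weights below are random placeholders for the trained parameters, and the ReLU hidden activation and Softmax output are assumptions, as the claim does not state the activations:

```python
import numpy as np

rng = np.random.default_rng(42)

# Weight shapes fixed by the claimed 5-6-4 topology (values are placeholders)
W1, b1 = rng.normal(size=(6, 5)), np.zeros(6)
W2, b2 = rng.normal(size=(4, 6)), np.zeros(4)

def softmax(v):
    e = np.exp(v - v.max())  # shift for numerical stability
    return e / e.sum()

def mlp_forward(x):
    h = np.maximum(W1 @ x + b1, 0.0)  # hidden layer, ReLU assumed
    return softmax(W2 @ h + b2)       # 4-way class probabilities

p = mlp_forward(rng.normal(size=5))
```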
9. The expression recognition method based on the multistage deep neural network according to any one of claims 1 to 8, wherein: b×b=224×224.
10. The expression recognition method based on the multistage deep neural network according to claim 9, wherein: during testing, a camera first collects a photograph of a human face; after the 68 key points of the face are identified, the photograph, cropped to 224×224 and containing the 68 key points, is sent into the expression recognition model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111148260.XA CN113869221B (en) | 2021-09-29 | 2021-09-29 | Expression recognition method based on multistage deep neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113869221A CN113869221A (en) | 2021-12-31 |
CN113869221B true CN113869221B (en) | 2024-05-24 |
Family
ID=78992218
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111148260.XA Active CN113869221B (en) | 2021-09-29 | 2021-09-29 | Expression recognition method based on multistage deep neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113869221B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110705379A (en) * | 2019-09-12 | 2020-01-17 | 广州大学 | Expression recognition method of convolutional neural network based on multi-label learning |
CN111639544A (en) * | 2020-05-07 | 2020-09-08 | 齐齐哈尔大学 | Expression recognition method based on multi-branch cross-connection convolutional neural network |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109684911B (en) * | 2018-10-30 | 2021-05-11 | 百度在线网络技术(北京)有限公司 | Expression recognition method and device, electronic equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
Facial Expression Recognition Based on Multi-Feature Fusion Convolutional Neural Network; Wang Jianxia; Chen Huiping; Li Jiaze; Zhang Xiaoming; Journal of Hebei University of Science and Technology; 2020-01-03 (Issue 06); full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Coşkun et al. | Face recognition based on convolutional neural network | |
Chen | Deep learning with nonparametric clustering | |
CN111652066A (en) | Medical behavior identification method based on multi-self-attention mechanism deep learning | |
Sun et al. | Facial expression recognition based on a hybrid model combining deep and shallow features | |
CN112732916B (en) | BERT-based multi-feature fusion fuzzy text classification system | |
Prakash et al. | Face recognition with convolutional neural network and transfer learning | |
CN113407660B (en) | Unstructured text event extraction method | |
CN112199536A (en) | Cross-modality-based rapid multi-label image classification method and system | |
CN107491729B (en) | Handwritten digit recognition method based on cosine similarity activated convolutional neural network | |
CN112733866A (en) | Network construction method for improving text description correctness of controllable image | |
Elleuch et al. | Towards unsupervised learning for Arabic handwritten recognition using deep architectures | |
KR100729273B1 (en) | A method of face recognition using pca and back-propagation algorithms | |
CN114818703B (en) | Multi-intention recognition method and system based on BERT language model and TextCNN model | |
Qiao et al. | A face recognition system based on convolution neural network | |
Pandey et al. | Face Recognition Using Machine Learning | |
CN110490028A (en) | Recognition of face network training method, equipment and storage medium based on deep learning | |
Deeb et al. | Human facial emotion recognition using improved black hole based extreme learning machine | |
Dan et al. | Pf-vit: Parallel and fast vision transformer for offline handwritten chinese character recognition | |
CN114170659A (en) | Facial emotion recognition method based on attention mechanism | |
CN113869221B (en) | Expression recognition method based on multistage deep neural network | |
CN116775880A (en) | Multi-label text classification method and system based on label semantics and transfer learning | |
CN113159071B (en) | Cross-modal image-text association anomaly detection method | |
Zeghina et al. | Face Recognition Based on Harris Detector and Convolutional Neural Networks | |
Luqin | A survey of facial expression recognition based on convolutional neural network | |
Khaliluzzaman et al. | Automatic facial expression recognition using shallow convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right |
Effective date of registration: 2023-09-14
Address after: No. 20, East Road, University City, Shapingba District, Chongqing
Applicant after: Chongqing Daipu Technology Co., Ltd.
Address before: No. A022, Floor 8, No. 142 Yunhan Avenue, Shuitu Street, Liangjiang New Area, Yubei District, Chongqing 400799
Applicant before: Jiuziyuan (Chongqing) Intelligent Technology Co., Ltd.
GR01 | Patent grant |