CN113869221B - Expression recognition method based on multistage deep neural network - Google Patents

Expression recognition method based on multistage deep neural network

Info

Publication number
CN113869221B
CN113869221B CN202111148260.XA CN202111148260A CN113869221B CN 113869221 B CN113869221 B CN 113869221B CN 202111148260 A CN202111148260 A CN 202111148260A CN 113869221 B CN113869221 B CN 113869221B
Authority
CN
China
Prior art keywords
expression
layer
convolution
expression recognition
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111148260.XA
Other languages
Chinese (zh)
Other versions
CN113869221A (en)
Inventor
利节
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Daipu Technology Co ltd
Original Assignee
Chongqing Daipu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Daipu Technology Co ltd filed Critical Chongqing Daipu Technology Co ltd
Priority to CN202111148260.XA priority Critical patent/CN113869221B/en
Publication of CN113869221A publication Critical patent/CN113869221A/en
Application granted granted Critical
Publication of CN113869221B publication Critical patent/CN113869221B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of expression recognition, and particularly discloses an expression recognition method based on a multistage deep neural network. First, the labels of the four expressions that are more complex to recognize, namely angry, disgust, fear and contempt, are uniformly changed to "other", and the relabeled data are fed into a first expression recognition network model, which learns to recognize the four classes other, happy, surprised and sad. Next, the dataset labeled as angry, disgust, fear and contempt is fed into a feature extraction network model, which outputs feature vectors of these data. These feature vectors are then processed through a standardized flow model. Finally, the processed feature vectors are fed into a multi-layer perceptron (MLP). Through continuous learning and training, the MLP can successfully identify the four expressions angry, disgust, fear and contempt. Therefore, the seven basic expressions can be identified with high precision by the trained multi-level neural network model.

Description

Expression recognition method based on multistage deep neural network
Technical Field
The invention relates to the technical field of expression recognition, in particular to an expression recognition method based on a multistage deep neural network.
Background
Expression is a very important channel of information in our daily communication. In actual communication, expressions often enhance the effect of communication between people. The psychologist A. Mehrabian notes in An Approach to Environmental Psychology that, in everyday communication, information conveyed by language accounts for only 7% of the total information, while information conveyed by facial expression accounts for 55%. Meanwhile, with the development of machine learning technology in recent years, face recognition technology has received a great deal of attention. Facial expression recognition in particular has gained wide attention in fields such as security, robotics, automation, autonomous driving and human-computer interaction. Humans have at least 21 expressions, seven of which are basic: happy, surprised, sad, angry, disgust, fear and contempt. They are all composed of basic expression units, i.e. one or more actions and states of the muscles of various parts of the face. However, the accuracy of current expression recognition is not high, and the recognition effect is especially poor for expressions whose features partially overlap.
Disclosure of Invention
The invention provides an expression recognition method based on a multistage deep neural network, which solves the technical problem of how to improve the recognition effect for expressions whose features partially overlap.
In order to solve the technical problems, the invention provides an expression recognition method based on a multistage deep neural network, which comprises the following steps:
S1, preprocessing a training data set containing seven expression labels; wherein,
the seven expression labels are happy, surprised, sad, angry, disgust, fear and contempt, respectively;
the preprocessing changes the labels of the picture data of the 4 expression labels with higher recognition complexity, namely angry, disgust, fear and contempt, into the label "other", leaves the labels of the remaining picture data unchanged, and crops all picture data to a size of B×B;
S2, sending the data of the four expression labels happy, surprised, sad and other, obtained after the preprocessing in step S1, into a first expression recognition network model for training, so as to fix the weight data of the first expression recognition network model;
S3, cropping the data of the 4 expression labels with higher recognition complexity, namely angry, disgust, fear and contempt, to a size of B×B, and then sending the data into a feature extraction network model to obtain corresponding feature data;
S4, sending the feature data obtained in step S3 into a standardized flow model for processing, so that the data follows a Gaussian distribution;
S5, sending the data obtained in step S4 into a multi-layer perceptron for training, and saving the trained parameters;
S6, testing the trained expression recognition model consisting of the first expression recognition network model, the feature extraction network model, the standardized flow model and the multi-layer perceptron;
S7, using the expression recognition model that passes the test to recognize an unknown expression image, wherein the recognition process comprises the following steps:
cropping the unknown expression image to B×B and sending it into the tested first expression recognition network model; if it is judged to be one of the expressions other than "other", the recognition result is output directly; otherwise, the image is sent into a second expression recognition network model consisting of the feature extraction network model, the standardized flow model and the multi-layer perceptron, and the classification result with the maximum probability is output.
Specifically, the first expression recognition network model is built based on the ResNet network model and comprises a first convolution module and a fully connected module;
the feature extraction network model is built based on the ResNet-18 network model and includes a second convolution module.
The advantage of this is that the extracted expression features are more spatially aggregated, since the standardized flow model (DNF) requires the data to have a certain degree of aggregation.
Specifically, the first convolution module includes:
First block: a convolution layer consisting of 64 7×7 convolution kernels, with a stride of 2;
Second block: a 3×3 max pooling layer followed by a convolution layer consisting of two layers of 64 3×3 convolution kernels;
Third block: a convolution layer consisting of two layers of 128 3×3 convolution kernels;
Fourth block: a convolution layer consisting of two layers of 256 3×3 convolution kernels;
Fifth block: a convolution layer consisting of two layers of 512 3×3 convolution kernels;
the fully connected module comprises:
an average pooling layer, a fully connected layer and a Softmax layer;
a Dropout strategy is applied before the fully connected layer, randomly deactivating 50% of the neurons.
Specifically, the first expression recognition network model uses cross entropy as its loss function, with the formula:
Loss = -\sum_{i=1}^{N} y^{(i)} \log \hat{y}^{(i)}   (1)
In formula (1), N represents the number of categories, y^{(i)} indicates whether the output category is the same as the label (1 if the same, 0 otherwise), and \hat{y}^{(i)} represents the predicted value.
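As an illustration (not part of the patent text), a minimal sketch of this cross-entropy loss in PyTorch, assuming one-hot labels y and predicted class probabilities y_hat:

```python
import torch

def cross_entropy_loss(y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Cross entropy of formula (1): -sum_i y_i * log(y_hat_i).

    y_hat: predicted class probabilities (e.g. Softmax output), shape (N,)
    y:     one-hot label vector, shape (N,)
    """
    return -(y * torch.log(y_hat.clamp_min(1e-12))).sum()

# Hypothetical usage with N = 4 categories
y_hat = torch.tensor([0.7, 0.1, 0.1, 0.1])
y = torch.tensor([1.0, 0.0, 0.0, 0.0])
print(cross_entropy_loss(y_hat, y))  # ~0.3567
```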
Specifically, the second convolution module includes:
First block: a convolution layer consisting of 64 7×7 convolution kernels, with a stride of 2;
Second block: a 3×3 max pooling layer followed by a convolution layer consisting of two layers of 64 3×3 convolution kernels;
Third block: a convolution layer consisting of two layers of 128 3×3 convolution kernels;
Fourth block: a convolution layer consisting of two layers of 256 3×3 convolution kernels;
Fifth block: a convolution layer consisting of two layers of 512 3×3 convolution kernels;
plus a final average pooling layer.
Specifically, the training objective function of the standardized flow model is:
\Theta^{*} = \arg\max_{\Theta} \sum_{i} \left[ \log \mathcal{N}(Z_i;\, \mu_{y(x_i)}, \Sigma) + \log \left| \det \frac{\partial f^{-1}(x_i)}{\partial x_i} \right| \right]   (2)
In formula (2), \Theta = \{\{\mu_y\}, \Sigma, \theta\} represents all parameters, where y represents a class, \mu_y represents the mean of class y, \Sigma represents the covariance, and \theta represents the parameters of the flow model; y(x_i) represents the class of the i-th sample, Z_i = f^{-1}(x_i) represents the sample after DNF processing that conforms to a Gaussian distribution, and \mathcal{N}(Z;\, \mu_y, \Sigma) represents the distribution of each class y. After training, the standardized flow model creates a normalized space for Z in which each class conforms to a Gaussian distribution; det represents the determinant of the (Jacobian) matrix, and x_i = f(Z_i), where f is a composition of T inverse autoregressive transforms, expressed as:
f = f_T \circ f_{T-1} \circ \dots \circ f_0   (3)
where each f_t is a structured neural network and 0 \le t \le T.
Specifically, the standardized flow model is composed of 10 masked autoregressive flow blocks; each masked autoregressive flow block is realized by a three-layer fully connected neural network and implements an inverse autoregressive transformation z_i^{(j)} = u_i^{(j)} \exp(\alpha_i^{(j)}) + \mu_i^{(j)}, where z_i^{(j)} refers to the j-th output of the i-th masked autoregressive flow block, \mu_i^{(j)} = f_\mu(z_i^{(1:j-1)}) and \alpha_i^{(j)} = f_\alpha(z_i^{(1:j-1)}), \{f_\mu, f_\alpha\} are unconstrained functions, and exp denotes the exponential function with base the natural constant e. In the experiment, the model was trained using the Adam optimizer, with the batch size set to 300 and the learning rate set to 0.003.
Specifically, the multi-layer perceptron comprises an input layer, a hidden layer and an output layer; the input layer has 5 nodes, the hidden layer has 6 nodes, the output layer has 4 nodes, and the connections between nodes of adjacent layers carry weights.
Preferably, B×B = 224×224.
Specifically, during testing, a camera first collects a photo of the human face; after the 68 facial key points are detected, the photo, cropped to 224×224 and containing the 68 key points, is sent into the expression recognition model.
According to the expression recognition method based on the multistage deep neural network, the labels of the picture data of the 4 expression labels with higher recognition complexity, namely angry, disgust, fear and contempt, are uniformly modified to "other", and the data of the four expression labels happy, surprised, sad and other are then sent into the first expression recognition network model for training; through continuous learning and training, the first expression recognition network model can successfully recognize the 4 classes other, happy, surprised and sad. Next, the dataset consisting of the 4 expression labels angry, disgust, fear and contempt is sent into the feature extraction network model, which outputs the feature vectors of these data. These feature vectors are then processed by the standardized flow model to enlarge the inter-class distance and reduce the intra-class distance, which facilitates later training and recognition. Finally, the processed feature vectors are fed into a multi-layer perceptron (MLP). Through continuous learning and training, the MLP can successfully identify the 4 expressions angry, disgust, fear and contempt. Through this complete flow, the seven basic expressions can be identified with high precision by the trained multi-level neural network model (first expression recognition network model, feature extraction network model, standardized flow model and multi-layer perceptron).
Compared with a single-stage neural network, the multi-stage neural network uses different modules to distinguish expressions of different complexity, so that complex expressions can be resolved better and all expressions can be recognized with higher precision.
In a specific experiment, a dataset containing only Asian faces was used. On this dataset, VGG16 was first used for training and testing, and the final accuracy only reached 38.72%. Then, a single-stage ResNet was used for training and testing, and the accuracy reached 42.58%. After applying the invention, the accuracy reached 86.32%.
Drawings
Fig. 1 is a step flowchart of an expression recognition method based on a multi-level deep neural network according to an embodiment of the present invention;
FIG. 2 is a block diagram of a multi-stage deep neural network according to an embodiment of the present invention;
Fig. 3 is a frame structure diagram of a first expression recognition network model ResNet (1) provided in an embodiment of the present invention;
FIG. 4 is a block diagram of a feature extraction network model ResNet (2) provided by an embodiment of the invention;
FIG. 5 is a block diagram of a standardized flow model (DNF model) provided by an embodiment of the present invention;
FIG. 6 is a block diagram of a multi-layer perceptron (MLP) provided by an embodiment of the invention;
FIG. 7 is a block diagram of a mask autoregressive flow block provided by an embodiment of the invention.
Detailed Description
The following examples are given for the purpose of illustration only and are not to be construed as limiting the invention; the drawings are for reference and description only and are not to be construed as limiting the scope of the invention, as many variations are possible without departing from its spirit and scope.
The expression recognition method based on the multistage deep neural network provided by the embodiment of the invention, as shown in figures 1 and 2, comprises the following steps:
S1, preprocessing a training data set containing the seven expression labels.
The seven expression labels are "Happy", "Surprise", "Sad", "Angry", "Disgust", "Fear" and "Contempt", respectively.
Here, preprocessing refers to changing the labels of the picture data of the 4 expression labels with higher recognition complexity, namely "Angry", "Disgust", "Fear" and "Contempt", to "Other", leaving the labels of the remaining picture data unchanged, and cropping all picture data to a size of B×B. That is, preprocessing modifies the labels of the 4 expressions angry, disgust, fear and contempt to "Other", yielding data with the 4 labels Other, Happy, Surprise and Sad. This example uniformly crops the data to a size of 224×224, i.e. B = 224. Of course, the size of B can be determined according to actual requirements.
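As an illustrative sketch of this step S1 (not part of the patent text; the sample layout, label names and the use of a plain resize in place of the B×B crop are assumptions), the relabeling and cropping could look like this in Python:

```python
from PIL import Image

# The four harder expressions are collapsed into "other" for the first stage
COMPLEX_LABELS = {"angry", "disgust", "fear", "contempt"}
B = 224  # crop size used in this example

def preprocess(samples):
    """samples: list of (image_path, label) pairs with the seven basic labels.

    Returns (stage1_samples, stage2_samples):
      stage1: happy/surprise/sad kept, complex labels replaced by "other"
      stage2: only the four complex-label samples, original labels kept
    """
    stage1, stage2 = [], []
    for path, label in samples:
        # resize used here as a simple stand-in for the B x B crop
        img = Image.open(path).convert("RGB").resize((B, B))
        if label in COMPLEX_LABELS:
            stage1.append((img, "other"))
            stage2.append((img, label))
        else:
            stage1.append((img, label))
    return stage1, stage2
```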
S2, sending the data of the four expression labels happy, surprised, sad and other, obtained after the preprocessing in step S1, into a first expression recognition network model for training, so as to fix the weight data of the first expression recognition network model.
As shown in fig. 2, the first expression recognition network model is built based on the ResNet network model, abbreviated as ResNet (1) in this example; it comprises a first convolution module and a fully connected module and takes as input an image of size B×B = 224×224. A code sketch of this model follows the block list below.
As shown in fig. 3, the first convolution module includes:
First block: a convolution layer consisting of 64 7×7 convolution kernels, with a stride of 2;
Second block: a 3×3 max pooling layer (Maxpool) followed by a convolution layer consisting of two layers of 64 3×3 convolution kernels;
Third block: a convolution layer consisting of two layers of 128 3×3 convolution kernels;
Fourth block: a convolution layer consisting of two layers of 256 3×3 convolution kernels;
Fifth block: a convolution layer consisting of two layers of 512 3×3 convolution kernels;
the fully connected module comprises:
an average pooling layer (Avg Pooling), a fully connected layer (FC) and a Softmax layer;
a Dropout strategy is applied before the fully connected layer, randomly deactivating 50% of the neurons.
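A minimal sketch of how such a model could be assembled in PyTorch (not the authoritative implementation; the residual connections and layer details are assumed to follow the standard ResNet-18 design from torchvision):

```python
import torch.nn as nn
from torchvision.models import resnet18

def build_resnet1(num_classes: int = 4) -> nn.Module:
    """First expression recognition network: ResNet-18 backbone,
    Dropout(0.5) before the fully connected layer, Softmax output
    over the 4 classes: other, happy, surprise, sad."""
    model = resnet18(weights=None)
    in_features = model.fc.in_features  # 512 after the average pooling layer
    model.fc = nn.Sequential(
        nn.Dropout(p=0.5),                    # randomly deactivate 50% of neurons
        nn.Linear(in_features, num_classes),
        nn.Softmax(dim=1),                    # class probabilities
    )
    return model
```

If training were done with PyTorch's built-in nn.CrossEntropyLoss, the Softmax layer would typically be omitted from the module and applied only at inference, since that loss expects raw logits.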
S3, cropping the data of the 4 expression labels with higher recognition complexity, namely angry, disgust, fear and contempt, to a size of B×B, and then sending the data into a feature extraction network model to obtain corresponding feature data.
Here, as shown in fig. 2, the feature extraction network model is also built based on the ResNet-18 network model, referred to as the ResNet (2) model in this example, but it contains only the second convolution module; it takes as input a picture of size B×B = 224×224. The purpose of this is to make the extracted expression features more spatially aggregated, since the standardized flow model used afterwards requires the data to have a certain degree of aggregation. As shown in fig. 4 (and sketched in code after the list below), the second convolution module includes:
First block: a convolution layer consisting of 64 7×7 convolution kernels, with a stride of 2;
Second block: a 3×3 max pooling layer followed by a convolution layer consisting of two layers of 64 3×3 convolution kernels;
Third block: a convolution layer consisting of two layers of 128 3×3 convolution kernels;
Fourth block: a convolution layer consisting of two layers of 256 3×3 convolution kernels;
Fifth block: a convolution layer consisting of two layers of 512 3×3 convolution kernels;
plus a final average pooling layer.
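A minimal sketch of such a feature extractor (assuming the standard torchvision ResNet-18 layout; the fully connected head is simply dropped so the output is the 512-dimensional vector after average pooling):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def build_resnet2() -> nn.Module:
    """Feature extraction network: ResNet-18 convolution blocks plus the
    final average pooling layer, without any fully connected classifier."""
    backbone = resnet18(weights=None)
    modules = list(backbone.children())[:-1]  # drop the fc layer, keep avgpool
    return nn.Sequential(*modules)

# Hypothetical usage: a 224x224 RGB face crop -> a 512-d feature vector
extractor = build_resnet2()
features = extractor(torch.randn(1, 3, 224, 224)).flatten(1)  # shape (1, 512)
```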
S4, sending the feature data obtained in step S3 into a standardized flow model (DNF model) for processing, so that the data follows a Gaussian distribution.
The input of the DNF model is the feature vectors from the ResNet (2) model, and the output is a normalized feature vector space. The DNF model transforms the original feature data, whose class spacing is small, into data with larger inter-class spacing and makes it follow a Gaussian distribution, as shown in fig. 5. The training objective function of the DNF model is:
\Theta^{*} = \arg\max_{\Theta} \sum_{i} \left[ \log \mathcal{N}(Z_i;\, \mu_{y(x_i)}, \Sigma) + \log \left| \det \frac{\partial f^{-1}(x_i)}{\partial x_i} \right| \right]   (2)
In formula (2), \Theta = \{\{\mu_y\}, \Sigma, \theta\} represents all parameters, where y represents a class, \mu_y represents the mean of class y, \Sigma represents the covariance, and \theta represents the parameters of the flow model; y(x_i) represents the class of the i-th sample, Z_i = f^{-1}(x_i) represents the sample after DNF processing that conforms to a Gaussian distribution, and \mathcal{N}(Z;\, \mu_y, \Sigma) represents the distribution of each class y. After training, the standardized flow model creates a normalized space for Z in which each class conforms to a Gaussian distribution; det represents the determinant of the (Jacobian) matrix, and x_i = f(Z_i), where f is a composition of T inverse autoregressive transforms, expressed as:
f = f_T \circ f_{T-1} \circ \dots \circ f_0   (3)
where each f_t is a structured neural network and 0 \le t \le T.
The standardized flow model consists of 10 masked autoregressive flow blocks; each masked autoregressive flow block (MAF) is implemented by a three-layer fully connected neural network and implements an inverse autoregressive transformation z_i^{(j)} = u_i^{(j)} \exp(\alpha_i^{(j)}) + \mu_i^{(j)}, where z_i^{(j)} refers to the j-th output of the i-th masked autoregressive flow block, \mu_i^{(j)} = f_\mu(z_i^{(1:j-1)}) and \alpha_i^{(j)} = f_\alpha(z_i^{(1:j-1)}), \{f_\mu, f_\alpha\} are unconstrained functions, and exp denotes the exponential function with base the natural constant e. In the experiment, the model was trained using the Adam optimizer, with the batch size set to 300 and the learning rate set to 0.003. The structure of the MAF block is shown in fig. 7.
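A simplified sketch of one masked autoregressive flow block and its Adam training setup (an illustrative reconstruction, not the patent's code; the three-layer networks f_mu and f_alpha, the per-dimension loop in place of weight masks, and the hidden size are assumptions consistent with standard MAF practice):

```python
import torch
import torch.nn as nn

class MAFBlock(nn.Module):
    """One masked autoregressive flow block: for each dimension j,
    z[j] = u[j] * exp(alpha[j]) + mu[j], where mu and alpha depend only
    on the previously produced dimensions z[:j].  A plain loop is used
    here for clarity; a real implementation would use masked layers."""

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        # Three-layer fully connected networks for mu and alpha
        self.f_mu = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, dim))
        self.f_alpha = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden), nn.ReLU(),
                                     nn.Linear(hidden, dim))
        self.dim = dim

    def forward(self, u: torch.Tensor):
        zs = []
        log_det = torch.zeros(u.shape[0], device=u.device)
        for j in range(self.dim):
            prev = torch.cat(zs, dim=1) if zs else u.new_zeros(u.shape[0], 0)
            context = torch.cat([prev, u.new_zeros(u.shape[0], self.dim - j)], dim=1)
            mu = self.f_mu(context)[:, j]
            alpha = self.f_alpha(context)[:, j]
            zs.append((u[:, j] * torch.exp(alpha) + mu).unsqueeze(1))
            log_det = log_det + alpha  # log|dz_j/du_j| = alpha_j
        return torch.cat(zs, dim=1), log_det

# Hypothetical setup matching the stated hyperparameters (10 blocks, Adam,
# learning rate 0.003; batch size 300 would be set in the data loader):
flow = nn.ModuleList([MAFBlock(dim=8) for _ in range(10)])
optimizer = torch.optim.Adam(flow.parameters(), lr=0.003)
```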
S5, sending the data obtained in the step S4 into a multi-layer perceptron (MLP) for training, and storing the trained parameters.
The data input to the MLP are the vectors produced by the DNF model. In this vector space, the inter-class spacing is larger and the intra-class spacing is smaller. The MLP model comprises an input layer, a hidden layer and an output layer, where the input layer has 5 nodes, the hidden layer has 6 nodes and the output layer has 4 nodes. The connections between nodes of adjacent layers carry weights, and training assigns the correct weights to these edges. The network structure of the MLP is shown in fig. 6.
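A minimal sketch of this 5-6-4 perceptron (the text does not specify activation functions, so the ReLU on the hidden layer is an assumption):

```python
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(5, 6),    # input layer (5 nodes) -> hidden layer (6 nodes)
    nn.ReLU(),          # assumed hidden activation
    nn.Linear(6, 4),    # hidden layer -> output layer (4 expression classes)
    nn.Softmax(dim=1),  # probabilities for angry, disgust, fear, contempt
)
```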
S6, testing the trained expression recognition model (the multi-level deep neural network) consisting of the first expression recognition network model, the feature extraction network model, the standardized flow model and the multi-layer perceptron.
During testing, a camera first collects a photo of the human face; after the 68 facial key points are detected, the photo, cropped to 224×224 and containing the 68 key points, is sent into the expression recognition model.
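An illustrative sketch of this test-time capture step (the use of OpenCV for capture, dlib's 68-point predictor, and the model file name are assumptions; the patent does not name a specific landmark detector):

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
# Hypothetical path to a standard 68-point shape predictor file
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def capture_face_crop(size: int = 224):
    """Grab one frame from the camera, locate the face and its 68 key points,
    and return a size x size crop that contains all key points."""
    ok, frame = cv2.VideoCapture(0).read()
    if not ok:
        return None
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    xs = [shape.part(i).x for i in range(68)]
    ys = [shape.part(i).y for i in range(68)]
    crop = frame[min(ys):max(ys), min(xs):max(xs)]  # bounding box of the key points
    return cv2.resize(crop, (size, size))
```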
S7, using the expression recognition model that passes the test to recognize an unknown expression image, wherein the recognition process comprises the following steps:
The unknown expression image is cropped to B×B and sent into the tested first expression recognition network model. If it is judged to be one of the expressions other than "other", the recognition result is output directly; otherwise, the image is sent into the second expression recognition network model consisting of the feature extraction network model, the standardized flow model and the multi-layer perceptron, and the classification result with the maximum probability is output.
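A sketch of this two-stage decision (the model objects and class orderings are hypothetical; `resnet1`, `resnet2`, `flow` and `mlp` stand for the four trained components described above):

```python
import torch

STAGE1_CLASSES = ["other", "happy", "surprise", "sad"]
STAGE2_CLASSES = ["angry", "disgust", "fear", "contempt"]

@torch.no_grad()
def recognize(image: torch.Tensor, resnet1, resnet2, flow, mlp) -> str:
    """image: a (1, 3, 224, 224) face crop.  Returns the predicted expression."""
    probs1 = resnet1(image)                      # first-stage class probabilities
    label1 = STAGE1_CLASSES[probs1.argmax(dim=1).item()]
    if label1 != "other":
        return label1                            # happy / surprise / sad
    features = resnet2(image).flatten(1)         # feature extraction network
    z = flow(features)                           # standardized (normalizing) flow
    probs2 = mlp(z)                              # second-stage classifier
    return STAGE2_CLASSES[probs2.argmax(dim=1).item()]
```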
According to the expression recognition method based on the multistage deep neural network, the labels of the 4 expression labels with higher recognition complexity ("angry", "disgust", "fear" and "contempt") are uniformly modified to "other", and all the data are sent into the first expression recognition network model ResNet (1) for training; through continuous learning and training, ResNet (1) can successfully recognize "other" and the remaining 3 classes ("happy", "surprise", "sad"). Next, the dataset consisting of the 4 labels ("angry", "disgust", "fear" and "contempt") is fed into the feature extraction network model ResNet (2), which outputs feature vectors for these data. These feature vectors are then processed by the standardized flow model (DNF model) to enlarge the inter-class spacing and reduce the intra-class spacing, which facilitates later training and recognition. Finally, the processed feature vectors are fed into the multi-layer perceptron (MLP). Through continuous learning and training, the MLP can successfully identify the 4 expressions. Through this complete flow, the seven basic expressions can be identified with high precision by the trained multi-level neural network model (first expression recognition network model, feature extraction network model, standardized flow model and multi-layer perceptron).
Compared with a single-stage neural network, the multi-stage neural network uses different modules to distinguish expressions of different complexity, so that complex expressions can be resolved better and all expressions can be recognized with high precision.
In specific experiments, a dataset containing only Asian faces was used. The dataset contains 40005 Asian face pictures, with 5715 pictures per basic expression. Among them, 16002 pictures are used for training, 12001 pictures for verification, and 12002 pictures for testing. During training, the batch size was set to 64, the learning rate to 0.01, and the number of iterations to 50. At the end of the test, the accuracy is computed as:
Accuracy = T / (T + F)
where T is the total number of correctly judged expressions and F is the total number of incorrectly judged expressions. On this dataset, VGG16 was first used for training and testing, and the final accuracy only reached 38.72%. Then, a single-stage ResNet was used for training and testing, and the accuracy reached 42.58%. After applying the invention, the accuracy reached 86.32%.
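A trivial sketch of this accuracy computation (the counts in the usage line are hypothetical):

```python
def accuracy(correct: int, wrong: int) -> float:
    """Accuracy = T / (T + F), where T is the number of correctly judged
    expressions and F the number of incorrectly judged ones."""
    return correct / (correct + wrong)

print(accuracy(86, 14))  # 0.86
```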
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above examples; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention should be regarded as an equivalent replacement and is included within the protection scope of the present invention.

Claims (10)

1. The expression recognition method based on the multistage deep neural network is characterized by comprising the following steps of:
S1, preprocessing a training data set containing seven expression labels;
wherein the seven expression labels are happy, surprised, sad, angry, disgust, fear and contempt, respectively;
the preprocessing changes the labels of the picture data of the 4 expression labels with higher recognition complexity, namely angry, disgust, fear and contempt, into the label "other", leaves the labels of the remaining picture data unchanged, and crops all picture data to a size of B×B;
S2, sending the data of the four expression labels happy, surprised, sad and other, obtained after the preprocessing in step S1, into a first expression recognition network model for training, so as to fix the weight data of the first expression recognition network model;
S3, cropping the data of the 4 expression labels with higher recognition complexity, namely angry, disgust, fear and contempt, to a size of B×B, and then sending the data into a feature extraction network model to obtain corresponding feature data;
S4, sending the feature data obtained in step S3 into a standardized flow model for processing, so that the data follows a Gaussian distribution;
S5, sending the data obtained in step S4 into a multi-layer perceptron for training, and saving the trained parameters;
S6, testing the trained expression recognition model consisting of the first expression recognition network model, the feature extraction network model, the standardized flow model and the multi-layer perceptron;
S7, using the expression recognition model that passes the test to recognize an unknown expression image, wherein the recognition process comprises the following steps:
cropping the unknown expression image to B×B and sending it into the tested first expression recognition network model; if it is judged to be one of the expressions other than "other", outputting the recognition result directly; otherwise, sending the image into a second expression recognition network model consisting of the feature extraction network model, the standardized flow model and the multi-layer perceptron, and outputting the classification result with the maximum probability.
2. The expression recognition method based on the multistage deep neural network according to claim 1, wherein:
The first expression recognition network model is built based on the ResNet network model and comprises a first convolution module and a fully connected module;
the feature extraction network model is built based on the ResNet-18 network model and includes a second convolution module.
3. The expression recognition method based on the multi-level deep neural network according to claim 2, wherein the first convolution module comprises:
First block: a convolution layer consisting of 64 7×7 convolution kernels, with a stride of 2;
Second block: a 3×3 max pooling layer followed by a convolution layer consisting of two layers of 64 3×3 convolution kernels;
Third block: a convolution layer consisting of two layers of 128 3×3 convolution kernels;
Fourth block: a convolution layer consisting of two layers of 256 3×3 convolution kernels;
Fifth block: a convolution layer consisting of two layers of 512 3×3 convolution kernels;
and the fully connected module comprises:
an average pooling layer, a fully connected layer and a Softmax layer;
a Dropout strategy is applied before the fully connected layer, randomly deactivating 50% of the neurons.
4. The expression recognition method based on a multi-level deep neural network according to claim 3, wherein the first expression recognition network model uses cross entropy as its loss function, with the formula:
Loss = -\sum_{i=1}^{N} y^{(i)} \log \hat{y}^{(i)}   (1)
In formula (1), N represents the number of categories, y^{(i)} indicates whether the output category is the same as the label (1 if the same, 0 otherwise), and \hat{y}^{(i)} represents the predicted value.
5. The expression recognition method based on the multi-level deep neural network according to claim 2, wherein the second convolution module comprises:
First block: a convolution layer consisting of 64 7×7 convolution kernels, with a stride of 2;
Second block: a 3×3 max pooling layer followed by a convolution layer consisting of two layers of 64 3×3 convolution kernels;
Third block: a convolution layer consisting of two layers of 128 3×3 convolution kernels;
Fourth block: a convolution layer consisting of two layers of 256 3×3 convolution kernels;
Fifth block: a convolution layer consisting of two layers of 512 3×3 convolution kernels;
plus a final average pooling layer.
6. The expression recognition method based on the multistage deep neural network according to claim 1, wherein the training objective function of the standardized flow model is:
\Theta^{*} = \arg\max_{\Theta} \sum_{i} \left[ \log \mathcal{N}(Z_i;\, \mu_{y(x_i)}, \Sigma) + \log \left| \det \frac{\partial f^{-1}(x_i)}{\partial x_i} \right| \right]   (2)
In formula (2), \Theta = \{\{\mu_y\}, \Sigma, \theta\} represents all parameters, where y represents a class, \mu_y represents the mean of class y, \Sigma represents the covariance, and \theta represents the parameters of the flow model; y(x_i) represents the class of the i-th sample, Z_i = f^{-1}(x_i) represents the sample after DNF processing that conforms to a Gaussian distribution, and \mathcal{N}(Z;\, \mu_y, \Sigma) represents the distribution of each class y; after training, the standardized flow model creates a normalized space for Z in which each class conforms to a Gaussian distribution; det represents the determinant of the (Jacobian) matrix, and x_i = f(Z_i), where f is a composition of T inverse autoregressive transforms, expressed as:
f = f_T \circ f_{T-1} \circ \dots \circ f_0   (3)
where each f_t is a structured neural network and 0 \le t \le T.
7. The expression recognition method based on the multistage deep neural network according to claim 6, wherein: the standardized flow model consists of 10 masked autoregressive flow blocks, each of which is realized by a three-layer fully connected neural network and implements an inverse autoregressive transformation z_i^{(j)} = u_i^{(j)} \exp(\alpha_i^{(j)}) + \mu_i^{(j)}, where z_i^{(j)} refers to the j-th output of the i-th masked autoregressive flow block, \mu_i^{(j)} = f_\mu(z_i^{(1:j-1)}) and \alpha_i^{(j)} = f_\alpha(z_i^{(1:j-1)}), \{f_\mu, f_\alpha\} are unconstrained functions, and exp denotes the exponential function with base the natural constant e.
8. The expression recognition method based on the multistage deep neural network according to claim 1, wherein: the multi-layer perceptron comprises an input layer, a hidden layer and an output layer; the input layer has 5 nodes, the hidden layer has 6 nodes, the output layer has 4 nodes, and the connections between nodes of adjacent layers carry weights.
9. The expression recognition method based on the multistage deep neural network according to any one of claims 1 to 8, wherein: B×B = 224×224.
10. The expression recognition method based on the multistage deep neural network according to claim 9, wherein: during testing, a camera first collects a photo of the human face; after the 68 facial key points are detected, the photo, cropped to 224×224 and containing the 68 key points, is sent into the expression recognition model.
CN202111148260.XA 2021-09-29 2021-09-29 Expression recognition method based on multistage deep neural network Active CN113869221B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111148260.XA CN113869221B (en) 2021-09-29 2021-09-29 Expression recognition method based on multistage deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111148260.XA CN113869221B (en) 2021-09-29 2021-09-29 Expression recognition method based on multistage deep neural network

Publications (2)

Publication Number Publication Date
CN113869221A CN113869221A (en) 2021-12-31
CN113869221B true CN113869221B (en) 2024-05-24

Family

ID=78992218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111148260.XA Active CN113869221B (en) 2021-09-29 2021-09-29 Expression recognition method based on multistage deep neural network

Country Status (1)

Country Link
CN (1) CN113869221B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705379A (en) * 2019-09-12 2020-01-17 广州大学 Expression recognition method of convolutional neural network based on multi-label learning
CN111639544A (en) * 2020-05-07 2020-09-08 齐齐哈尔大学 Expression recognition method based on multi-branch cross-connection convolutional neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684911B (en) * 2018-10-30 2021-05-11 百度在线网络技术(北京)有限公司 Expression recognition method and device, electronic equipment and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705379A (en) * 2019-09-12 2020-01-17 广州大学 Expression recognition method of convolutional neural network based on multi-label learning
CN111639544A (en) * 2020-05-07 2020-09-08 齐齐哈尔大学 Expression recognition method based on multi-branch cross-connection convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Facial expression recognition based on multi-feature fusion convolutional neural network; Wang Jianxia; Chen Huiping; Li Jiaze; Zhang Xiaoming; Journal of Hebei University of Science and Technology; 2020-01-03 (Issue 06); full text *

Also Published As

Publication number Publication date
CN113869221A (en) 2021-12-31

Similar Documents

Publication Publication Date Title
Coşkun et al. Face recognition based on convolutional neural network
Chen Deep learning with nonparametric clustering
CN111652066A (en) Medical behavior identification method based on multi-self-attention mechanism deep learning
Sun et al. Facial expression recognition based on a hybrid model combining deep and shallow features
CN112732916B (en) BERT-based multi-feature fusion fuzzy text classification system
Prakash et al. Face recognition with convolutional neural network and transfer learning
CN113407660B (en) Unstructured text event extraction method
CN112199536A (en) Cross-modality-based rapid multi-label image classification method and system
CN107491729B (en) Handwritten digit recognition method based on cosine similarity activated convolutional neural network
CN112733866A (en) Network construction method for improving text description correctness of controllable image
Elleuch et al. Towards unsupervised learning for Arabic handwritten recognition using deep architectures
KR100729273B1 (en) A method of face recognition using pca and back-propagation algorithms
CN114818703B (en) Multi-intention recognition method and system based on BERT language model and TextCNN model
Qiao et al. A face recognition system based on convolution neural network
Pandey et al. Face Recognition Using Machine Learning
CN110490028A (en) Recognition of face network training method, equipment and storage medium based on deep learning
Deeb et al. Human facial emotion recognition using improved black hole based extreme learning machine
Dan et al. Pf-vit: Parallel and fast vision transformer for offline handwritten chinese character recognition
CN114170659A (en) Facial emotion recognition method based on attention mechanism
CN113869221B (en) Expression recognition method based on multistage deep neural network
CN116775880A (en) Multi-label text classification method and system based on label semantics and transfer learning
CN113159071B (en) Cross-modal image-text association anomaly detection method
Zeghina et al. Face Recognition Based on Harris Detector and Convolutional Neural Networks
Luqin A survey of facial expression recognition based on convolutional neural network
Khaliluzzaman et al. Automatic facial expression recognition using shallow convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230914

Address after: No. 20, East Road, University City, Chongqing, Shapingba District, Chongqing

Applicant after: Chongqing Daipu Technology Co.,Ltd.

Address before: 400799 No. A022, floor 8, No. 142, Yunhan Avenue, Shuitu street, Liangjiang New Area, Yubei District, Chongqing

Applicant before: Jiuziyuan (Chongqing) Intelligent Technology Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant