CN112070058A - Face and face composite emotional expression recognition method and system

Face and face composite emotional expression recognition method and system

Info

Publication number
CN112070058A
CN112070058A
Authority
CN
China
Prior art keywords
face
feature vector
image
layer
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010985959.0A
Other languages
Chinese (zh)
Inventor
陈海波
罗志鹏
张治广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyan Technology Beijing Co ltd
Original Assignee
Shenyan Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyan Technology Beijing Co ltd filed Critical Shenyan Technology Beijing Co ltd
Priority to CN202010985959.0A priority Critical patent/CN112070058A/en
Publication of CN112070058A publication Critical patent/CN112070058A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/174 - Facial expression recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 - Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 - Feature extraction; Face representation
    • G06V 40/171 - Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 - Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for recognizing compound emotional facial expressions, which comprises the following steps: performing face detection on an image and extracting facial feature key points; calculating distance metrics between the key points to obtain a geometric representation vector of the face; constructing a dual-branch face detection network, in which the face image passes through a first branch network structure to obtain a first feature vector, and the obtained geometric representation vector of the face passes through a second branch network structure to obtain a second feature vector; concatenating the first feature vector with the second feature vector to obtain a third feature vector, from which the expression category confidence of the current face image is obtained; and constructing a multi-class loss function for the face detection network, solving it by optimization, and predicting the expression category. The method achieves high recognition accuracy for compound emotional expression categories conveyed by faces in high-resolution images, and the proposed model is robust and also classifies facial micro-expressions well.

Description

Face and face composite emotional expression recognition method and system
Technical Field
The invention belongs to the technical field of image processing and computer vision, and particularly relates to a method and a system for recognizing face and face composite emotional expressions.
Background
In recent years, with the continuous upgrading of intelligent devices and the continuous evolution of machine learning and deep learning algorithms, face recognition technology has matured and is widely applied across major application platforms and in daily life. Meanwhile, as an important branch of the face recognition field, Facial Expression Recognition (FER) has drawn the attention of ever more researchers. Facial expression recognition has gained wide attention in many fields, such as human-computer interaction, driver fatigue monitoring, intelligent robots, smart medicine, and the like. Humans display at least 21 facial expressions: in addition to the 6 common ones (happiness, surprise, sadness, anger, disgust and fear), there are 15 distinguishable compound expressions such as happily surprised and angrily disgusted, and the facial expression categories can of course be further refined according to different criteria.
Generally, a facial expression recognition algorithm comprises four main steps: acquiring the face image, detecting the face, extracting facial features, and classifying those features. Broadly, facial expression recognition algorithms can be divided into traditional methods and deep-learning-based methods. In traditional methods, facial feature extraction and classification are treated as two independent parts: first, mathematical and computational techniques are used to process the facial expression image and extract expression features; then a classifier assigns the features to an expression category. Traditional feature extraction algorithms mainly include principal component analysis, linear discriminant analysis, independent component analysis, and the like; a representative comparison method [1] (Support vector discriminant analysis and its application in facial expression recognition [J]. Acta Electronica Sinica, 2008, 36(4): 725-) belongs to this family. Traditional feature classification algorithms fall mainly into distance-metric-based methods and Bayesian-network-based methods. The former complete the classification task by computing distance metrics between data; typical algorithms include the nearest-neighbor method and the SVM algorithm. The nearest-neighbor method classifies by comparing the distance between the sample to be predicted and already-predicted samples, the distance determining whether the two belong to the same class. The SVM algorithm optimizes its objective function by finding a hyperplane that maximizes the margin between samples of different classes. Bayesian-network-based classification infers the probability of an unknown expression by analyzing known expression information.
Facial expression recognition methods based on deep learning generally integrate facial feature extraction and classification into a single network. Deep networks have strong feature extraction capability on images, and the extracted features carry rich semantic information, avoiding the laborious process of hand-crafting features. A deep facial expression recognition network usually extracts features from the face image through several convolutional layers and then attaches fully connected layers to realize nonlinear classification. The number of final neurons is determined by the number of facial expression categories, and the probability of each category is finally obtained through a softmax function. A comparison method [2] (Huiyuan Yang, Umur Ciftci, Lijun Yin. Facial Expression Recognition by De-Expression Residue Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 2168-2177.) proposes a residual expression recognition algorithm based on cGAN (conditional GAN) and expression element filtering: the neutral elements of a face image are filtered out by a cGAN network, and the residual expression elements are processed with an MLP, achieving high-accuracy recognition of facial expressions.
However, that algorithm only recognizes the basic seven expression classes; for compound expressions, whose inter-class similarity is much higher, its effectiveness has not been verified. Moreover, because it adopts a GAN, the training difficulty of the model increases.
Disclosure of Invention
1. Objects of the invention
A method and a system for recognizing compound emotional facial expressions are provided, aiming at the problem that facial expressions are difficult to recognize in high-resolution images under compound emotions.
2. The technical scheme adopted by the invention
A face and face composite emotional expression recognition method comprises the following steps:
s01: carrying out face detection on the image, and extracting key points of facial features;
s02: calculating distance measurement between the key points to obtain a geometric representation vector of the face;
s03: constructing a double-branch face detection network, wherein the double-branch face detection network comprises a first branch network structure and a second branch network structure, and a face image is subjected to the first branch network structure to obtain a first feature vector; obtaining a second feature vector by the obtained geometric representation vector of the face through a second branch network structure; the first feature vector and the second feature vector have the same size, and the first feature vector and the second feature vector are connected to obtain a third feature vector, so that the expression category confidence of the current face image is obtained;
s04: and constructing a multi-classification loss function of the face detection network to perform optimization solution, and predicting the expression category.
In a preferred technical solution, before the step S02, the method further includes marking the extracted key points, and performing a preprocessing operation on the image.
In a preferred embodiment, the method for detecting a human face in step S01 includes:
s11: taking an image containing a human face as a positive sample, taking an image not containing the human face as a negative sample, respectively extracting directional gradient histogram features from a certain number of positive and negative samples, and obtaining a directional gradient histogram feature descriptor;
s12: training the positive and negative samples by using a support vector machine algorithm to obtain a trained model for realizing secondary classification;
s13: carrying out hard-to-separate sample mining on the trained model, wherein the hard-to-separate sample mining comprises the steps of scaling negative sample data in a training set, matching with a template, and carrying out searching matching through a template sliding window; and if the false detection occurs, intercepting the false detection face area and adding the false detection face area into the negative sample data.
In a preferred technical scheme, the preprocessing operation comprises a first layer of regression training and a second layer of regression training;
the first layer of regression training comprises the following steps:
representing the data organization form in the first layer regression training as
(I_πi, Ŝ_i^(t), ΔS_i^(t))
wherein I_πi is a face image in the training data set, Ŝ_i^(t) is the predicted key-point position at the t-th layer of the first-layer regression, and ΔS_i^(t) is the difference between the predicted value and the true value at the t-th layer, and the iterative formulas are:
Ŝ^(t+1) = Ŝ^(t) + γ_t(I, Ŝ^(t))
ΔS_i^(t+1) = S_πi - Ŝ_i^(t+1)
wherein I represents the input of each layer in the iterative process;
iterating continuously in this manner, when the number of cascaded layers of the first-layer regression is set to K, K regressors γ_1, γ_2, …, γ_K are generated, namely the regression models obtained through training;
the second-layer regression training comprises taking the error ΔS_i^(t) after each first-layer regression as the input of each second-layer regression, and determining each regressor γ_t by a gradient boosting tree algorithm.
In a preferred technical solution, the obtaining of the geometric representation vector of the face in step S02 includes:
s21: calculating the distance between each feature key point and the feature key point at the nose:
l′(i) = l(i) - l(30)
wherein l is the key-point vector value, i is the feature key-point index, and l(30) is the feature key point at the nose;
s22: the average key-point face l_m(i) is then used in place of the original face image, with the formula:
l_m(i) = (1/N) Σ_{j=1}^{N} l′_j(i)
wherein N is the number of samples of each face image, and j is the sample index;
s23: obtaining the geometric representation vector of the face:
l_r(i) = l′(i) - l_m(i)
in a preferred embodiment, in step S03, the first branch network structure is designed based on an AlexNet network structure, the last two full connection layers of the AlexNet structure are removed from the first branch network structure, other structures remain unchanged, and batch normalization operation is added after each convolution layer to obtain a first feature vector with a size of 256 dimensions.
In a preferred embodiment, in step S03, the second branch network structure is formed by a full connection layer without a bias term, the geometric representation vector obtains a 256-dimensional second feature vector through the second branch network structure, and sends the obtained third feature vector to the last full connection layer to obtain a feature vector F with an output size of 512 dimensions.
In a preferred technical solution, the face detection network multi-classification loss function constructed in step S04 is composed of two parts, where the first part of the loss function predicts the probability that an expression belongs to each category using a softmax function, and the formula is as follows:
p_j = exp(z_j) / Σ_{k=1}^{K} exp(z_k)
wherein p_j represents the probability that sample x is predicted as class j, y is an indicator variable, z_i and z_k represent the prediction scores of the i-th and k-th classes, and K represents the number of expression classes;
calculating the uncertainty between the predicted output value and the true label value using a cross entropy loss function, the formula is as follows:
l_ce = - Σ_{i=1}^{C} y_i log(p_i)
wherein C represents the number of predicted expression categories;
and the second part of the loss function optimizes the distribution of the features among different classes by using the triplet loss function, and the formula is as follows:
l_tri = [α + d_p - d_n]_+
wherein d_p is the feature distance of the positive sample pair, d_n is the feature distance of the negative sample pair, α is the minimum margin between the two distances, and [z]_+ represents the function max(z, 0);
and adding the two loss functions to obtain the overall network loss function.
The invention also discloses a face and face composite emotional expression recognition system, which comprises:
the face detection extraction module is used for carrying out face detection on the image and extracting key points of facial features;
the face geometric representation module is used for calculating distance measurement between the key points to obtain a geometric representation vector of the face;
the double-branch face prediction module is used for constructing a double-branch face detection network, comprises a first branch network structure and a second branch network structure, and obtains a first feature vector from a face image through the first branch network structure; obtaining a second feature vector by the obtained geometric representation vector of the face through a second branch network structure; the first feature vector and the second feature vector have the same size, and the first feature vector and the second feature vector are connected to obtain a third feature vector, so that the expression category confidence of the current face image is obtained;
and the category prediction module is used for constructing a face detection network multi-classification loss function to carry out optimization solution and predicting the expression category.
In a preferred technical solution, the system further comprises an image preprocessing module, configured to mark the extracted key points and perform preprocessing operation on the image.
3. Advantageous effects of the invention
(1) A robust network structure is designed for realizing the function of recognizing facial expressions, the feature information of key points of the human face is used as one input of the network, the spatial geometric information of the image is used for assisting in recognition, and meanwhile, the other branch network extracts rich image texture information. Through a large number of case tests, the method has higher identification precision for the composite emotional expression categories expressed by the human face in the high-resolution image, and the proposed model has stronger robustness and has good identification effect on the classification of the human face micro-expression.
(2) The method adopts the Dlib face detection algorithm to perform face detection on the image and extract facial feature key points, which serve as the basis of the subsequent recognition process; the extracted key points are marked using a Face Alignment algorithm, and an image cropping algorithm reduces the image size, thereby preprocessing the high-resolution image; distance metrics between the key points are calculated, the average key-point face is used in place of the original face image, and the spatial geometric feature information of the face image is computed to assist the whole recognition process. A dual-branch face detection network is designed: the face imaging branch is designed based on the AlexNet network and is mainly used to extract rich texture feature information from the face image; the facial feature point branch consists of a fully connected layer and assists recognition with the face key-point feature information. A Cross Entropy Loss and a Triplet Loss are adopted to design the multi-class loss function of the face detection network, so that samples of the same class are drawn closer and samples of different classes are pushed farther apart.
Drawings
FIG. 1 is a flow chart of a method for recognizing facial compound emotional expressions according to the present invention;
FIG. 2 is a schematic diagram of a face alignment algorithm in the present embodiment;
fig. 3 is a schematic diagram of a network structure in the present embodiment;
FIG. 4 is a diagram of the architecture of the facial compound emotional expression recognition system of the present invention.
Detailed Description
The technical solutions in the examples of the present invention are clearly and completely described below with reference to the drawings in the examples of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without inventive step, are within the scope of the present invention.
The present invention will be described in further detail with reference to the accompanying drawings.
Example 1
As shown in fig. 1, a method for recognizing compound emotional expressions of human faces and faces includes the following steps:
s01: carrying out face detection on the image, and extracting key points of facial features;
s02: calculating distance measurement between the key points to obtain a geometric representation vector of the face;
s03: constructing a double-branch face detection network, wherein the double-branch face detection network comprises a first branch network structure and a second branch network structure, a first feature vector is obtained by a face image through the first branch network structure, and texture features of the face image are extracted; obtaining a second feature vector by the obtained geometric representation vector of the face through a second branch network structure; the first feature vector and the second feature vector have the same size, and the first feature vector and the second feature vector are connected to obtain a third feature vector, so that the expression category confidence of the current face image is obtained;
s04: and constructing a multi-classification loss function of the face detection network to perform optimization solution, and predicting the expression category.
In a preferred embodiment, after step S01 and before step S02, the method further includes marking the extracted key points and performing a preprocessing operation on the image.
In a preferred embodiment, the face detection method in step S01 includes the following steps:
s11: taking images containing a human face as positive samples and images not containing a human face as negative samples, extracting Histogram of Oriented Gradients (HOG) features from a certain number of positive and negative samples respectively, and obtaining the HOG feature descriptors; in particular, the amount of negative sample data is much larger than that of positive sample data, and more negative data can be obtained by randomly cropping the negative samples.
S12: training the positive and negative samples with a Support Vector Machine (SVM) algorithm to obtain a trained model for binary classification;
s13: carrying out hard-to-separate sample mining on the trained model, wherein the hard-to-separate sample mining comprises the steps of scaling negative sample data in a training set, matching with a template, and carrying out searching matching through a template sliding window; and if the false detection occurs, intercepting the false detection face area and adding the false detection face area into the negative sample data.
The finally trained model is obtained through the above steps. For detection, face pictures of different sizes are scanned with a sliding window, HOG features are extracted from each window in turn, and the windows are classified by the trained classifier. If the classification result is a face, the face is marked. If the same face is marked multiple times after one round of sliding scanning, the redundant detections are removed by non-maximum suppression (NMS).
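As an illustration of the HOG-plus-linear-SVM pipeline with hard negative mining described above, the following Python sketch uses scikit-image and scikit-learn; the 64×64 window size, the sliding step and the grayscale-patch assumption are illustrative choices, not values given in the patent:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

WIN = (64, 64)  # hypothetical detection window size

def hog_desc(patch):
    # Histogram of Oriented Gradients descriptor for one grayscale window
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

def train_detector(pos_patches, neg_patches):
    # Positive samples contain a face, negative samples do not (S11, S12)
    X = [hog_desc(p) for p in pos_patches] + [hog_desc(p) for p in neg_patches]
    y = [1] * len(pos_patches) + [0] * len(neg_patches)
    clf = LinearSVC(C=0.01)
    clf.fit(np.array(X), y)
    return clf

def mine_hard_negatives(clf, face_free_images, step=16):
    # S13: slide the window over face-free images; any "face" hit is a
    # false detection, so that region is kept as a new negative sample.
    hard = []
    for img in face_free_images:
        for ys in range(0, img.shape[0] - WIN[0], step):
            for xs in range(0, img.shape[1] - WIN[1], step):
                patch = img[ys:ys + WIN[0], xs:xs + WIN[1]]
                if clf.predict([hog_desc(patch)])[0] == 1:
                    hard.append(patch)
    return hard
```

In keeping with step S13, the mined patches would then be appended to the negative set and the SVM retrained, optionally over several mining rounds and image scales.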
In a preferred embodiment, as shown in fig. 2, the step of marking the extracted key points and performing a preprocessing operation on the image includes:
a mathematical model was built using two-layer regression. The iterative formula of the first-layer regression is as follows:
Figure BDA0002689279810000071
wherein, S is a shape vector and stores the position information of all key feature points of the face. Where I denotes the input, γ, for each layer in the iterative processtIs one layerAnd the input quantity of the regressor is the current shape variable and the training image corresponding to the shape variable, and the output quantity of the regressor is the position updating quantity of all the shape variables on the training image. Therefore, in the cascade regressor in the first layer, the positions of all key feature points in the training image are updated once every time the cascade regressor passes through the first-level cascade regressor, so that a more correct position is achieved. Gamma raytThe inner part is also the first regression, i.e. the second layer regression. The target of the second-order regression is the difference between the current predicted value and the true value.
The first layer of regression training process is described below. First, a training data set is represented as (I)1,S1),(I2,S2),…,(In,Sn) Wherein, IiRepresenting the ith image, SiIndicating the location of the corresponding feature keypoints in the image. The data organization form in the first layer regression training can be expressed as
Figure BDA0002689279810000072
Wherein, IπiIs an image of a human face in the training data set,
Figure BDA0002689279810000081
for predicted keypoint locations, Δ S, of the t-th layer in the first layer regressioni (t)Is the difference between the predicted value and the true value of the t-th layer.
Figure BDA0002689279810000082
The iterative formula is shown in formula (1), Δ Si (t)The iteration formula is specifically as follows:
Figure BDA0002689279810000083
the iteration is continuously carried out according to the iteration mode shown above, and when the regression cascade layer number of the first layer is set to be K layers, gamma is generated12,…,γkAnd the regressors are regression models obtained by training.
The second layer of regression training process determines each gammatHow to train the training is achieved, the method is realized by adopting a Gradient Boosting Tree Algorithm (Gradient Boosting Tree Algorithm). The error Delta S after each first layer regression is completedi (t)As input to each second-level regression, each regressor gamma is determined by a gradient lifting tree algorithmt
Through the above steps, a plurality of feature key points are detected from each face image, and the number of the feature key points may be preset, and in this embodiment, the number is 68.
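A minimal sketch of this two-layer cascade follows, under stated assumptions: scikit-learn gradient-boosting trees stand in for each second-layer regressor γ_t, and feature extraction at the current shape is reduced to a hypothetical pixel-sampling helper (the actual ERT implementation in Dlib samples pixel intensities indexed relative to the current shape estimate):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

K = 10          # number of first-layer cascade stages
N_POINTS = 68   # feature key points per face, as in this embodiment

def shape_features(img, shape):
    # Hypothetical stand-in: grayscale pixel values sampled at the
    # current (N_POINTS, 2) shape estimate, clipped to the image bounds.
    idx = shape.astype(int).clip(0, np.array(img.shape[:2])[::-1] - 1)
    return img[idx[:, 1], idx[:, 0]].ravel()

def train_cascade(images, true_shapes, mean_shape):
    regressors = []
    shapes = [mean_shape.copy() for _ in images]  # initial estimate S^(0)
    for t in range(K):
        X = np.stack([shape_features(im, s) for im, s in zip(images, shapes)])
        # Second-layer target: residual ΔS^(t) = S_true - S^(t)
        Y = np.stack([(ts - s).ravel() for ts, s in zip(true_shapes, shapes)])
        gamma_t = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=50))
        gamma_t.fit(X, Y)
        # First-layer update: S^(t+1) = S^(t) + γ_t(I, S^(t))
        for i in range(len(shapes)):
            delta = gamma_t.predict(X[i:i + 1])[0].reshape(N_POINTS, 2)
            shapes[i] = shapes[i] + delta
        regressors.append(gamma_t)
    return regressors
```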
In a preferred embodiment, the method for obtaining the geometric representation vector of the human face in step S02 includes the following steps:
s21: calculating the distance between each feature key point and the feature key point at the nose:
l′(i) = l(i) - l(30)
wherein l is the key-point vector value, i is the feature key-point index (0 ≤ i ≤ 68), and l(30) is the feature key point at the nose;
s22: the average key-point face l_m(i) is then used in place of the original face image, with the formula:
l_m(i) = (1/N) Σ_{j=1}^{N} l′_j(i)
wherein N is the number of samples of each face image (250 in this embodiment), and j is the sample index;
s23: obtaining the geometric representation vector of the face:
l_r(i) = l′(i) - l_m(i)
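The geometric representation of steps S21 to S23 can be sketched compactly in NumPy; the (68, 2) landmark array layout and the stack of N sampled landmark sets are assumed input formats, with index 30 taken as the nose key point per the text:

```python
import numpy as np

NOSE = 30  # index of the feature key point at the nose

def geometric_representation(landmarks, sample_landmarks):
    """landmarks: (68, 2) key points of the current face.
    sample_landmarks: (N, 68, 2) key points of N sampled faces."""
    # S21: distance of every key point to the nose key point
    l_prime = landmarks - landmarks[NOSE]
    # S22: average key-point face l_m over the N samples (nose-centered)
    centered = sample_landmarks - sample_landmarks[:, NOSE:NOSE + 1, :]
    l_mean = centered.mean(axis=0)
    # S23: geometric representation vector l_r of the face
    return (l_prime - l_mean).ravel()
```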
in a preferred embodiment, as shown in FIG. 3, the network structure at this stage consists of two branches B1,B2And (4) forming. Wherein, B1The branch is an imaging branch which is designed based on an AlexNet network structure. The network structure of the original AlexNet consists of five convolutional layers (Conv)1,Conv2,Conv3,Conv4,Conv5) And three full connection layers (FC)1_1,FC1_2,FC1_3) The imaging branch in the invention removes the last two full connection layers (FC) of the AlexNet structure1_2,FC1_3) Other structures remain unchanged and are on each convolutional layer (Conv)1,Conv2,Conv3,Conv4,Conv5) The batch normalization (batch normalization) operation was added later. Inputting the original face Image into imaging branch to obtain 256-dimensional characteristic vector V1And extracting the texture features of the face image. The main function of the Imaging branch is to capture richer face image semantic information as much as possible.
In the home network B2The branch structure is a landworks branch, and the input of the branch is a geometric representation vector of the facial expression, namely the geometric representation obtained in the last step. Landmarks branches are formed by a fully-connected layer (FC) without bias terms2_1) And (4) forming. After the geometric representation variable passes through the branch structure, the output characteristic vector V with the size of 256 dimensions is obtained2. Finally, the output characteristic vectors (V) with the same size obtained by the Imaging branch and the Landmarks branch are divided1,V2) Concatenated (concatenated) together to form a new vector V3And forming the newly formed vector V3Into the last full connection layer (FC)final) And obtaining a feature vector F with the output size of 512 dimensions. Via a full connection layer FCfinalAnd then, obtaining the expression category confidence of the current face image.
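A PyTorch sketch of this dual-branch layout follows; the convolutional configuration mirrors AlexNet with batch normalization inserted after each convolution as the text specifies, while the 256-dimensional FC1_1 output, the classifier head after FC_final and the exact layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DualBranchNet(nn.Module):
    """Sketch of the Imaging (B1) + Landmarks (B2) network."""
    def __init__(self, landmark_dim=68 * 2, num_classes=50):
        super().__init__()
        # (in_ch, out_ch, kernel, stride, padding) per AlexNet conv layer
        cfg = [(3, 64, 11, 4, 2), (64, 192, 5, 1, 2), (192, 384, 3, 1, 1),
               (384, 256, 3, 1, 1), (256, 256, 3, 1, 1)]
        layers = []
        for i, (cin, cout, k, s, p) in enumerate(cfg):
            layers += [nn.Conv2d(cin, cout, k, s, p),
                       nn.BatchNorm2d(cout),  # BN added after each conv
                       nn.ReLU(inplace=True)]
            if i in (0, 1, 4):
                layers.append(nn.MaxPool2d(3, 2))
        self.imaging = nn.Sequential(*layers, nn.AdaptiveAvgPool2d((6, 6)),
                                     nn.Flatten(),
                                     nn.Linear(256 * 6 * 6, 256))  # FC1_1 -> V1
        # Landmarks branch: one fully connected layer without bias -> V2
        self.landmarks = nn.Linear(landmark_dim, 256, bias=False)
        self.fc_final = nn.Linear(512, 512)            # V3 -> F
        self.classifier = nn.Linear(512, num_classes)  # class confidences

    def forward(self, image, geom_vec):
        v1 = self.imaging(image)          # 256-d texture features
        v2 = self.landmarks(geom_vec)     # 256-d geometric features
        v3 = torch.cat([v1, v2], dim=1)   # concatenated 512-d vector V3
        f = self.fc_final(v3)             # 512-d feature vector F
        return f, self.classifier(f)      # F for the loss; logits for confidence
```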
In this embodiment, a multi-classification loss function is designed in the network. Expressions in the face image are divided into 50 category labels. The number of category labels may be predetermined. The specific labeling method may adopt an existing labeling method, and this embodiment is not described in detail.
The face detection network multi-classification loss function is composed of two parts, wherein the first part of the loss function predicts the probability that an expression belongs to each class by using a softmax function, and the formula is as follows:
p_j = exp(z_j) / Σ_{k=1}^{K} exp(z_k)
wherein p_j represents the probability that sample x is predicted as class j, y is an indicator variable, z_i and z_k represent the prediction scores of the i-th and k-th classes, and K represents the number of expression classes;
calculating the uncertainty between the predicted output value and the true label value using a cross entropy loss function, the formula is as follows:
l_ce = - Σ_{i=1}^{C} y_i log(p_i)
wherein C represents the number of predicted expression categories;
and the second part of the loss function optimizes the distribution of the features among different classes by using the triplet loss function, and the formula is as follows:
l_tri = [α + d_p - d_n]_+
wherein d_p is the feature distance of the positive sample pair, d_n is the feature distance of the negative sample pair, α is the minimum margin between the two distances, and [z]_+ represents the function max(z, 0);
and adding the two loss functions to obtain the overall network loss function.
For each mini-batch, the batch size is set to P × K; in this embodiment P = 32 and K = 2. Data enhancement is performed by horizontally flipping each image together with its corresponding key points. The model is trained with Stochastic Gradient Descent (SGD), and after each epoch of training the temporary model parameters are saved as a checkpoint file.
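A condensed training-loop sketch under these settings, reusing the model and combined_loss sketched above; the loader (assumed to yield P×K batches of flip-augmented images, geometric vectors and labels), epoch count and learning rate are illustrative assumptions:

```python
import torch

def train(model, loader, epochs=30, lr=0.01, ckpt="model_epoch{}.pt"):
    # loader yields P*K-sized batches: P = 32 classes, K = 2 images each
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for epoch in range(epochs):
        for images, geom_vecs, labels in loader:
            feats, logits = model(images, geom_vecs)
            loss = combined_loss(feats, logits, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Save the temporary model parameters as a checkpoint each epoch
        torch.save(model.state_dict(), ckpt.format(epoch))
```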
To verify the effectiveness of the method, this experimental example compares it with existing recognition methods on the public micro-expression data set CASME2 and others. The results show that the method achieves higher recognition accuracy for compound emotional expression categories conveyed by faces in high-resolution images, and that the proposed model is robust and classifies facial micro-expressions well.
In another embodiment, a facial complex emotional expression recognition system is provided, which corresponds to the facial complex emotional expression recognition method in the above embodiments one to one, as shown in fig. 4, the facial complex emotional expression recognition system includes a face detection and extraction module 10, an image preprocessing module 20, a face geometric representation module 30, a dual-branch face prediction module 40, and a category prediction module 50. The functional modules are explained in detail as follows:
the face detection extraction module 10 is used for carrying out face detection on the image and extracting key points of facial features;
the face geometric representation module 30 calculates the distance measurement between the key points to obtain a geometric representation vector of the face;
the double-branch face prediction module 40 is used for constructing a double-branch face detection network, and comprises a first branch network structure and a second branch network structure, wherein the face image obtains a first feature vector through the first branch network structure, and the texture feature of the face image is extracted; obtaining a second feature vector by the obtained geometric representation vector of the face through a second branch network structure; the first feature vector and the second feature vector have the same size, and the first feature vector and the second feature vector are connected to obtain a third feature vector, so that the expression category confidence of the current face image is obtained;
and the category prediction module 50 is used for constructing a face detection network multi-classification loss function to carry out optimization solution and predicting the expression category.
And the image preprocessing module 20 is configured to mark the extracted key points and perform preprocessing operation on the image.
The specific implementation method of each module may refer to the face-face composite emotional expression recognition method in the above embodiment, and details are not repeated in this embodiment.
The facial compound emotional expression recognition system provided by the embodiment of the invention is applied in a client-and-server environment, where the client and the server communicate through a network, to solve the problem that facial expression information in an image cannot be accurately acquired. The client, also called the user side, refers to the program that corresponds to the server and provides local services for the user. The client may be installed on, but is not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices. The server can be implemented as an independent server or as a server cluster composed of multiple servers.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A face and face composite emotional expression recognition method is characterized by comprising the following steps:
s01: carrying out face detection on the image, and extracting key points of facial features;
s02: calculating distance measurement between the key points to obtain a geometric representation vector of the face;
s03: constructing a double-branch face detection network, wherein the double-branch face detection network comprises a first branch network structure and a second branch network structure, and a face image is subjected to the first branch network structure to obtain a first feature vector; obtaining a second feature vector by the obtained geometric representation vector of the face through a second branch network structure; the first feature vector and the second feature vector have the same size, and the first feature vector and the second feature vector are connected to obtain a third feature vector, so that the expression category confidence of the current face image is obtained;
s04: and constructing a multi-classification loss function of the face detection network to perform optimization solution, and predicting the expression category.
2. The method for recognizing composite emotional expression of human faces and faces according to claim 1, wherein the step S02 is preceded by labeling the extracted key points and performing a preprocessing operation on the images.
3. The method for recognizing composite emotional expression of human face according to claim 1, wherein the method for detecting human face in step S01 comprises:
s11: taking an image containing a human face as a positive sample, taking an image not containing the human face as a negative sample, respectively extracting directional gradient histogram features from a certain number of positive and negative samples, and obtaining a directional gradient histogram feature descriptor;
s12: training the positive and negative samples by using a support vector machine algorithm to obtain a trained model for realizing secondary classification;
s13: carrying out hard-to-separate sample mining on the trained model, wherein the hard-to-separate sample mining comprises the steps of scaling negative sample data in a training set, matching with a template, and carrying out searching matching through a template sliding window; and if the false detection occurs, intercepting the false detection face area and adding the false detection face area into the negative sample data.
4. The method for recognizing composite emotional expression of human faces and faces according to claim 2, wherein the preprocessing operation comprises a first layer of regression training and a second layer of regression training;
the first layer of regression training comprises the following steps:
representing the data organization form in the first layer regression training as
(I_πi, Ŝ_i^(t), ΔS_i^(t))
wherein I_πi is a face image in the training data set, Ŝ_i^(t) is the predicted key-point position at the t-th layer of the first-layer regression, and ΔS_i^(t) is the difference between the predicted value and the true value at the t-th layer, and the iterative formulas are:
Ŝ^(t+1) = Ŝ^(t) + γ_t(I, Ŝ^(t))
ΔS_i^(t+1) = S_πi - Ŝ_i^(t+1)
wherein I represents the input of each layer in the iterative process;
iterating continuously in this manner, when the number of cascaded layers of the first-layer regression is set to K, K regressors γ_1, γ_2, …, γ_K are generated, namely the regression models obtained through training;
the second-layer regression training comprises taking the error ΔS_i^(t) after each first-layer regression as the input of each second-layer regression, and determining each regressor γ_t by a gradient boosting tree algorithm.
5. The method for recognizing composite emotional expression of human face according to claim 1, wherein the step S02 of obtaining geometric expression vector of human face includes:
s21: calculating the distance between each feature key point and the feature key point at the nose:
l′(i) = l(i) - l(30)
wherein l is the key-point vector value, i is the feature key-point index, and l(30) is the feature key point at the nose;
s22: the average key-point face l_m(i) is then used in place of the original face image, with the formula:
l_m(i) = (1/N) Σ_{j=1}^{N} l′_j(i)
wherein N is the number of samples of each face image, and j is the sample index;
s23: obtaining the geometric representation vector of the face:
l_r(i) = l′(i) - l_m(i)
6. the method for recognizing composite emotional expression of human faces and faces according to claim 1, wherein in step S03, the first branch network structure is designed based on an AlexNet network structure, the last two fully connected layers of the AlexNet structure are removed from the first branch network structure, other structures remain unchanged, and batch normalization operation is added after each convolution layer to obtain a first feature vector with 256 dimensions.
7. The method for recognizing composite emotional expression of human faces and faces according to claim 1 or 6, wherein in step S03, the second branch network structure is composed of a fully connected layer without bias terms, the geometric representation vector obtains a 256-dimensional second feature vector through the second branch network structure, and the obtained third feature vector is sent to the last fully connected layer to obtain a feature vector F with an output size of 512 dimensions.
8. The method for recognizing composite emotional expression of human face according to claim 1, wherein the human face detection network multi-classification loss function constructed in step S04 is composed of two parts, the first part loss function predicts probability of the expression belonging to each category by using softmax function, and the formula is as follows:
p_j = exp(z_j) / Σ_{k=1}^{K} exp(z_k)
wherein p_j represents the probability that sample x is predicted as class j, y is an indicator variable, z_i and z_k represent the prediction scores of the i-th and k-th classes, and K represents the number of expression classes;
calculating the uncertainty between the predicted output value and the true label value using a cross entropy loss function, the formula is as follows:
l_ce = - Σ_{i=1}^{C} y_i log(p_i)
wherein C represents the number of predicted expression categories;
and the second part of the loss function optimizes the distribution of the features among different classes by using the triplet loss function, and the formula is as follows:
l_tri = [α + d_p - d_n]_+
wherein d_p is the feature distance of the positive sample pair, d_n is the feature distance of the negative sample pair, α is the minimum margin between the two distances, and [z]_+ represents the function max(z, 0);
and adding the two loss functions to obtain the overall network loss function.
9. A facial composite emotional expression recognition system, comprising:
the face detection extraction module is used for carrying out face detection on the image and extracting key points of facial features;
the face geometric representation module is used for calculating distance measurement between the key points to obtain a geometric representation vector of the face;
the double-branch face prediction module is used for constructing a double-branch face detection network, comprises a first branch network structure and a second branch network structure, and obtains a first feature vector from a face image through the first branch network structure; obtaining a second feature vector by the obtained geometric representation vector of the face through a second branch network structure; the first feature vector and the second feature vector have the same size, and the first feature vector and the second feature vector are connected to obtain a third feature vector, so that the expression category confidence of the current face image is obtained;
and the category prediction module is used for constructing a face detection network multi-classification loss function to carry out optimization solution and predicting the expression category.
10. The system of claim 9, further comprising an image preprocessing module for labeling the extracted key points and preprocessing the image.
CN202010985959.0A 2020-09-18 2020-09-18 Face and face composite emotional expression recognition method and system Pending CN112070058A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010985959.0A CN112070058A (en) 2020-09-18 2020-09-18 Face and face composite emotional expression recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010985959.0A CN112070058A (en) 2020-09-18 2020-09-18 Face and face composite emotional expression recognition method and system

Publications (1)

Publication Number Publication Date
CN112070058A true CN112070058A (en) 2020-12-11

Family

ID=73682368

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010985959.0A Pending CN112070058A (en) 2020-09-18 2020-09-18 Face and face composite emotional expression recognition method and system

Country Status (1)

Country Link
CN (1) CN112070058A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112800941A (en) * 2021-01-26 2021-05-14 中科人工智能创新技术研究院(青岛)有限公司 Face anti-fraud method and system based on asymmetric auxiliary information embedded network
CN112818764A (en) * 2021-01-15 2021-05-18 西安交通大学 Low-resolution image facial expression recognition method based on feature reconstruction model
CN113076916A (en) * 2021-04-19 2021-07-06 山东大学 Dynamic facial expression recognition method and system based on geometric feature weighted fusion
CN113239833A (en) * 2021-05-20 2021-08-10 厦门大学 Facial expression recognition method based on double-branch interference separation network
CN113239727A (en) * 2021-04-03 2021-08-10 国家计算机网络与信息安全管理中心 Person detection and identification method
CN113283376A (en) * 2021-06-10 2021-08-20 泰康保险集团股份有限公司 Face living body detection method, face living body detection device, medium and equipment
CN113420709A (en) * 2021-07-07 2021-09-21 内蒙古科技大学 Cattle face feature extraction model training method and system and cattle insurance method and system
CN113887538A (en) * 2021-11-30 2022-01-04 北京的卢深视科技有限公司 Model training method, face recognition method, electronic device and storage medium
WO2022236647A1 (en) * 2021-05-11 2022-11-17 Huawei Technologies Co., Ltd. Methods, devices, and computer readable media for training a keypoint estimation network using cgan-based data augmentation

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107038429A (en) * 2017-05-03 2017-08-11 四川云图睿视科技有限公司 A kind of multitask cascade face alignment method based on deep learning
CN109711258A (en) * 2018-11-27 2019-05-03 哈尔滨工业大学(深圳) Lightweight face critical point detection method, system and storage medium based on convolutional network
CN109919097A (en) * 2019-03-08 2019-06-21 中国科学院自动化研究所 Face and key point combined detection system, method based on multi-task learning
CN110135251A (en) * 2019-04-09 2019-08-16 上海电力学院 A kind of group's image Emotion identification method based on attention mechanism and hybrid network
CN110197099A (en) * 2018-02-26 2019-09-03 腾讯科技(深圳)有限公司 The method and apparatus of across age recognition of face and its model training
CN110321845A (en) * 2019-07-04 2019-10-11 北京奇艺世纪科技有限公司 A kind of method, apparatus and electronic equipment for extracting expression packet from video
CN110532920A (en) * 2019-08-21 2019-12-03 长江大学 Smallest number data set face identification method based on FaceNet method
US20200151489A1 (en) * 2018-11-13 2020-05-14 Nvidia Corporation Determining associations between objects and persons using machine learning models
CN111414862A (en) * 2020-03-22 2020-07-14 西安电子科技大学 Expression recognition method based on neural network fusion key point angle change
US20200242153A1 (en) * 2019-01-29 2020-07-30 Samsung Electronics Co., Ltd. Method, apparatus, electronic device and computer readable storage medium for image searching
CN111611849A (en) * 2020-04-08 2020-09-01 广东工业大学 Face recognition system for access control equipment
CN111611934A (en) * 2020-05-22 2020-09-01 北京华捷艾米科技有限公司 Face detection model generation and face detection method, device and equipment

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107038429A (en) * 2017-05-03 2017-08-11 四川云图睿视科技有限公司 A kind of multitask cascade face alignment method based on deep learning
CN110197099A (en) * 2018-02-26 2019-09-03 腾讯科技(深圳)有限公司 The method and apparatus of across age recognition of face and its model training
US20200151489A1 (en) * 2018-11-13 2020-05-14 Nvidia Corporation Determining associations between objects and persons using machine learning models
CN109711258A (en) * 2018-11-27 2019-05-03 哈尔滨工业大学(深圳) Lightweight face critical point detection method, system and storage medium based on convolutional network
US20200242153A1 (en) * 2019-01-29 2020-07-30 Samsung Electronics Co., Ltd. Method, apparatus, electronic device and computer readable storage medium for image searching
CN109919097A (en) * 2019-03-08 2019-06-21 中国科学院自动化研究所 Face and key point combined detection system, method based on multi-task learning
CN110135251A (en) * 2019-04-09 2019-08-16 上海电力学院 A kind of group's image Emotion identification method based on attention mechanism and hybrid network
CN110321845A (en) * 2019-07-04 2019-10-11 北京奇艺世纪科技有限公司 A kind of method, apparatus and electronic equipment for extracting expression packet from video
CN110532920A (en) * 2019-08-21 2019-12-03 长江大学 Smallest number data set face identification method based on FaceNet method
CN111414862A (en) * 2020-03-22 2020-07-14 西安电子科技大学 Expression recognition method based on neural network fusion key point angle change
CN111611849A (en) * 2020-04-08 2020-09-01 广东工业大学 Face recognition system for access control equipment
CN111611934A (en) * 2020-05-22 2020-09-01 北京华捷艾米科技有限公司 Face detection model generation and face detection method, device and equipment

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
LIWEI WANG等: "Learning Two-Branch Neural Networks for Image-Text Matching Tasks", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》, vol. 41, no. 2, 28 February 2019 (2019-02-28), pages 394 - 407, XP011696229, DOI: 10.1109/TPAMI.2018.2797921 *
YANCHENG BAI等: "Multi-Branch Fully Convolutional Network for Face Detection", 《HTTPS://ARXIV.ORG/ABS/1707.06330》, 20 July 2017 (2017-07-20), pages 4321 - 4329 *
YUCHI LIU等: "A Neural Micro-Expression Recognizer", 《2019 14TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE & GESTURE RECOGNITION (FG 2019)》, 11 July 2019 (2019-07-11), pages 1 - 4 *
张燕红: "基于卷积神经网络的人脸识别研究", 《中国博士学位论文全文数据库 (信息科技辑)》, no. 06, 15 June 2020 (2020-06-15), pages 138 - 86 *
曹雯静: "多姿态表情的人脸识别算法研究", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》, 31 December 2018 (2018-12-31), pages 138 - 1396 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818764A (en) * 2021-01-15 2021-05-18 西安交通大学 Low-resolution image facial expression recognition method based on feature reconstruction model
CN112818764B (en) * 2021-01-15 2023-05-02 西安交通大学 Low-resolution image facial expression recognition method based on feature reconstruction model
CN112800941A (en) * 2021-01-26 2021-05-14 中科人工智能创新技术研究院(青岛)有限公司 Face anti-fraud method and system based on asymmetric auxiliary information embedded network
CN113239727A (en) * 2021-04-03 2021-08-10 国家计算机网络与信息安全管理中心 Person detection and identification method
CN113076916A (en) * 2021-04-19 2021-07-06 山东大学 Dynamic facial expression recognition method and system based on geometric feature weighted fusion
WO2022236647A1 (en) * 2021-05-11 2022-11-17 Huawei Technologies Co., Ltd. Methods, devices, and computer readable media for training a keypoint estimation network using cgan-based data augmentation
CN113239833A (en) * 2021-05-20 2021-08-10 厦门大学 Facial expression recognition method based on double-branch interference separation network
CN113239833B (en) * 2021-05-20 2023-08-29 厦门大学 Facial expression recognition method based on double-branch interference separation network
CN113283376A (en) * 2021-06-10 2021-08-20 泰康保险集团股份有限公司 Face living body detection method, face living body detection device, medium and equipment
CN113283376B (en) * 2021-06-10 2024-02-09 泰康保险集团股份有限公司 Face living body detection method, face living body detection device, medium and equipment
CN113420709A (en) * 2021-07-07 2021-09-21 内蒙古科技大学 Cattle face feature extraction model training method and system and cattle insurance method and system
CN113887538A (en) * 2021-11-30 2022-01-04 北京的卢深视科技有限公司 Model training method, face recognition method, electronic device and storage medium

Similar Documents

Publication Publication Date Title
CN112070058A (en) Face and face composite emotional expression recognition method and system
US20220027603A1 (en) Fast, embedded, hybrid video face recognition system
CN109800648B (en) Face detection and recognition method and device based on face key point correction
Zafar et al. Face recognition with Bayesian convolutional networks for robust surveillance systems
CN107133569B (en) Monitoring video multi-granularity labeling method based on generalized multi-label learning
CN109993100B (en) Method for realizing facial expression recognition based on deep feature clustering
CN113255557B (en) Deep learning-based video crowd emotion analysis method and system
Li et al. Experimental evaluation of FLIR ATR approaches—a comparative study
CN113657267A (en) Semi-supervised pedestrian re-identification model, method and device
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN117197904A (en) Training method of human face living body detection model, human face living body detection method and human face living body detection device
Wang et al. A novel multiface recognition method with short training time and lightweight based on ABASNet and H-softmax
Okokpujie et al. An improved age invariant face recognition using data augmentation
De Rosa et al. Online Action Recognition via Nonparametric Incremental Learning.
US11403875B2 (en) Processing method of learning face recognition by artificial intelligence module
Jeyanthi et al. An efficient automatic overlapped fingerprint identification and recognition using ANFIS classifier
CN111950592B (en) Multi-modal emotion feature fusion method based on supervised least square multi-class kernel canonical correlation analysis
CN111401440B (en) Target classification recognition method and device, computer equipment and storage medium
Ma et al. Bottleneck feature extraction-based deep neural network model for facial emotion recognition
Srininvas et al. A framework to recognize the sign language system for deaf and dumb using mining techniques
Dar Neural networks (CNNs) and VGG on real time face recognition system
CN113591607B (en) Station intelligent epidemic situation prevention and control system and method
Bhattacharya et al. Simplified face quality assessment (sfqa)
Huang et al. Skew correction of handwritten Chinese character based on ResNet
Guzzi et al. Distillation of a CNN for a high accuracy mobile face recognition system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination