CN107808146B - Multi-mode emotion recognition and classification method

Info

Publication number
CN107808146B
Authority
CN
China
Prior art keywords
space
time
image
face
probability
Prior art date
Legal status
Active
Application number
CN201711144196.1A
Other languages
Chinese (zh)
Other versions
CN107808146A (en)
Inventor
孙波
何珺
余乐军
曹斯铭
Current Assignee
Beijing Normal University
Original Assignee
Beijing Normal University
Priority date
Filing date
Publication date
Application filed by Beijing Normal University
Priority to CN201711144196.1A
Publication of CN107808146A
Application granted
Publication of CN107808146B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G06V40/172 - Classification, e.g. identification
    • G06V40/174 - Facial expression recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multi-modal emotion recognition and classification method. A video containing the face to be detected and a video containing body actions recorded over the same time period are processed and converted into image time sequences composed of image frames; temporal and spatial features are extracted from these sequences; the resulting multi-layer deep spatio-temporal features are fused at the feature level; and the classification results are fused at the decision level, so that the emotion category of the person in the video to be detected is recognized from multiple modalities.

Description

Multi-mode emotion recognition and classification method
Technical Field
The invention relates to the technical field of computer processing, in particular to a multi-mode emotion recognition and classification method.
Background
Emotion recognition is an emerging research field that spans multiple disciplines, including computer science, cognitive science, psychology, brain science and neuroscience. Its research goal is to enable computers to learn and understand human emotional expression and, ultimately, to give computers the ability to recognize and understand emotion as humans do. As a highly challenging interdisciplinary subject, emotion recognition has therefore become a research hotspot in pattern recognition, computer vision, big-data mining and artificial intelligence both at home and abroad, and it has important research value and broad application prospects.
In existing emotion recognition research, two clear trends can be observed: on the one hand, the data have expanded from static images to dynamic image sequences; on the other hand, the research has extended from single-modal to multi-modal emotion recognition. Emotion recognition based on static images has already achieved many good results, but it ignores the temporal dynamics of human expressions, and the overall accuracy of analysis on video data still requires further research compared with picture-based emotion recognition. In addition, psychological research shows that emotion recognition is inherently a multi-modal problem: judging emotional states jointly from body posture and facial expression gives better results than using single-modal information, and recognition based on multi-modal information fusion is more accurate and reliable than recognition from a single modality. Multi-modal information fusion has therefore also become a research hotspot in the field of emotion recognition.
In the prior art, the fusion of the facial-expression and body-posture modalities adopts only a single fusion mode, selecting either feature-level fusion or decision-level fusion according to some strategy. Existing methods cannot extract effective spatio-temporal features from video data for emotion recognition; moreover, whether early (feature-level) fusion or late (decision-level) fusion is adopted, such fusion methods are model-independent, do not make full use of the effective information present in each modality, and generally suffer from low fusion efficiency.
Disclosure of Invention
In order to solve the problems in the prior art that effective spatio-temporal features cannot be extracted from video data for emotion recognition, and that both early fusion and late fusion are model-independent, fail to make full use of the effective information in each modality and generally have low fusion efficiency, a multi-modal emotion recognition and classification method is provided.
According to one aspect of the invention, the multi-modal emotion recognition classification method comprises the following steps:
s1, receiving data to be detected, wherein the data to be detected comprises a video containing a face and a corresponding video containing a body motion at the same time, and preprocessing the video containing the face and the corresponding video containing the body motion to obtain a face image time sequence containing the face and a body image time sequence containing the body motion;
s2, sequentially inputting the face image time sequence into a convolution neural network based on Alexnet and a circulation neural network based on BLSTM, taking out output data as a first face image space-time characteristic, sequentially inputting the body image time sequence into the convolution neural network based on Alexnet and the circulation neural network based on BLSTM, and taking out the output data as a first body image space-time characteristic;
s3, serially inputting the first face image space-time feature and the first body image space-time feature into a fully-connected neural network, obtaining a probability matrix belonging to different emotion types after the first face image space-time feature and the first body image space-time feature are fused, marking the probability matrix as a first probability matrix, simultaneously serially inputting the first face image space-time feature and the first body image space-time feature into a support vector machine, obtaining a probability matrix belonging to different emotion types after the first face image space-time feature and the first body image space-time feature are serially connected, and marking the probability matrix as a second probability matrix;
s4, inputting the first face image space-time feature into a support vector machine, obtaining probability matrixes of the first face image space-time feature belonging to different emotion types, marking the probability matrixes as third probability matrixes, inputting the first body image feature into the support vector machine, obtaining probability matrixes of the first body image space-time feature belonging to different emotion types, marking the probability matrixes as fourth probability matrixes, performing decision fusion on the first probability matrixes, the second probability matrixes, the third probability matrixes and the fourth probability matrixes, obtaining first fusion probability matrixes, and taking the highest probability emotion type in the first fusion probability matrixes as an emotion recognition result.
Wherein, before the step S1, the method further includes: and training the Alexnet-based convolutional neural network, the BLSTM-based cyclic neural network, the fully-connected neural network and the support vector machine.
In step S1, the preprocessing the video including the face and the corresponding video including the body motion specifically includes:
carrying out face detection and alignment processing on each frame of image in the video containing the face, and arranging the processed image frames according to a time sequence to obtain a face image time sequence;
and carrying out normalization processing on each frame image in the video containing the body movement, and arranging the processed image frames according to a time sequence to obtain a body image time sequence.
Wherein the step S1 further includes:
reading the mark of each image frame in the video containing the face, extracting the image frames marked as beginning, vertex and disappearance to form a face image time sequence;
reading the mark of each image frame in the video containing the body action, extracting the image frames marked as beginning, vertex and disappearance to form a body image time sequence;
wherein the markers of the image frame include a plateau, a start, a vertex, and a vanishing.
Wherein, the step S2 specifically includes:
s21, inputting the face image time sequence into a convolution neural network based on Alexnet, taking out data of the first two full connection layers of the three full connection layers as face space initial features, carrying out principal component analysis on the face space initial features so as to realize space conversion and dimensionality reduction, obtaining first face image space features, inputting the body image time sequence into the convolution neural network based on Alexnet, taking out data of the first two full connection layers of the three full connection layers as body space initial features, carrying out principal component analysis on the body space initial features so as to realize space conversion and dimensionality reduction, and obtaining first body image space features;
s22, inputting the first human face image space characteristic into a BLSTM-based recurrent neural network, taking out the data of the first two full connected layers in the three full connected layers as the human face space-time initial characteristic, carrying out principal component analysis on the human face space-time initial characteristic to realize space conversion and dimensionality reduction, obtaining the first human face image space-time characteristic, inputting the first human body image space characteristic into the BLSTM-based recurrent neural network, taking out the data of the first two full connected layers in the three full connected layers as the human body space-time initial characteristic, carrying out principal component analysis on the human body space-time initial characteristic, realizing space conversion and dimensionality reduction, and obtaining the first human body image space-time characteristic.
Wherein, the step S1 further includes:
and cutting the face image time sequence and the body image time sequence according to the preset length of the sliding window to obtain a face image time subsequence group consisting of a plurality of face image time sequence segments and a body image time subsequence group consisting of a plurality of body image time sequence segments.
Wherein the step S2 further includes:
sequentially inputting a plurality of face image time sequence segments in the face image time subsequence group into a convolution neural network based on Alexnet and a circulation neural network based on BLSTM, and taking out output data as second face image space-time characteristics;
and sequentially inputting a plurality of body image time sequence segments in the body image time subsequence group into a convolution neural network based on Alexnet and a circulation neural network based on BLSTM, and taking out output data as the space-time characteristics of the second body image.
Wherein, the step S2 further includes:
inputting a plurality of face image time sequences in the face image time subsequence group into a convolution neural network based on Alexnet, taking out data of the first two full connection layers in the three full connection layers as second face space initial features, performing principal component analysis on the second face space initial features to realize space conversion and dimension reduction, obtaining second face image space features, inputting a plurality of body image time sequences in the body image time subsequence group into the convolution neural network based on Alexnet, taking out data of the first two full connection layers in the three full connection layers as second body space initial features, and performing principal component analysis on the second body space initial features to realize space conversion and dimension reduction, so as to obtain second body image space features;
inputting the space characteristics of the second face image into a BLSTM-based recurrent neural network, taking out the data of the first two full connection layers in the three full connection layers as the space-time initial characteristics of the second face, performing principal component analysis on the space-time initial characteristics of the face to realize space conversion and dimensionality reduction, obtaining the space-time characteristics of the second face image, inputting the space characteristics of the second body image into the BLSTM-based recurrent neural network, taking out the data of the first two full connection layers in the three full connection layers as the space-time initial characteristics of the second body, performing principal component analysis on the space-time initial characteristics of the body to realize space conversion and dimensionality reduction, and obtaining the space-time characteristics of the second body image.
Wherein the step S3 further includes:
inputting the second face image space-time characteristic and the second body image space-time characteristic into a fully-connected neural network in series, inputting an output result into a support vector machine, obtaining probability matrixes which belong to different emotion types after the second face image space-time characteristic and the second body image space-time characteristic are fused, marking the probability matrixes as fifth probability matrixes, simultaneously inputting the second face image space-time characteristic and the second body image space-time characteristic into the support vector machine in series, obtaining probability matrixes which belong to different emotion types after the second face image space-time characteristic and the second body image space-time characteristic are fused, and marking the probability matrixes as sixth probability matrixes.
Wherein the step S4 further includes:
inputting the second face image space-time characteristics into a support vector machine to obtain probability matrixes of the second face image space-time characteristics belonging to different emotion types, marking the probability matrixes as seventh probability matrixes, inputting the second body image space-time characteristics into the support vector machine to obtain probability matrixes of the second body image space-time characteristics belonging to different emotion types, marking the probability matrixes as eighth probability matrixes, and performing decision fusion on the fifth probability matrixes, the sixth probability matrixes, the seventh probability matrixes and the eighth probability matrixes to obtain second fusion probability matrixes;
and performing decision fusion on the first fusion probability matrix and the second fusion probability matrix to obtain a third fusion probability matrix, and taking the emotion type with the highest probability in the third fusion probability matrix as an emotion recognition result.
The method provided by the invention adopts a multi-mode combined emotion recognition method, fully utilizes effective information of various modes in the video to be detected, improves the fusion efficiency, and simultaneously improves the accuracy of emotion recognition.
Drawings
Fig. 1 is a flowchart of a multi-modal emotion recognition classification method according to an embodiment of the present invention;
FIG. 2 is a comparison graph of emotion recognition rates based on time series and using different fusion strategies in the multi-modal emotion recognition classification method provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a neural network structure for extracting spatiotemporal features in a multi-modal emotion recognition classification method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating segmentation of a time sequence by using a sliding window in a multi-modal emotion recognition classification method according to an embodiment of the present invention;
FIG. 5 is a comparison graph of emotion recognition rates based on time sequence segments and using different fusion strategies in the multi-modal emotion recognition classification method provided by an embodiment of the present invention;
fig. 6 is an emotion recognition rate comparison diagram obtained by fusing a time sequence and a time sequence segment according to the multi-modal emotion recognition classification method provided by the embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Referring to fig. 1, fig. 1 is a flowchart of a multi-modal emotion recognition and classification method according to an embodiment of the present invention, where the method includes:
s1, receiving data to be detected, wherein the data to be detected comprises a video containing a human face and a corresponding video containing a body action, and preprocessing the video containing the human face and the corresponding video containing the body action to obtain a human face image time sequence and a body image time sequence.
Specifically, a video containing a person's facial expressions and a video containing body movements recorded over the same time period are received and preprocessed; the face video and the body-movement video are then each arranged frame by frame, yielding a face image time sequence and a body image time sequence composed of the image frames in the videos.
In this way, the video data are converted into sequences of image frames, which makes the data easier to manipulate and to process in subsequent steps.
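For illustration only (the patent does not prescribe any particular tooling), a minimal Python sketch of this video-to-frame-sequence conversion using OpenCV is given below; the file names and the 227x227 frame size are assumptions.

```python
import cv2  # OpenCV for video decoding


def video_to_frames(video_path, size=(227, 227)):
    """Decode a video file into a time-ordered list of RGB frames."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame_bgr = cap.read()
        if not ok:  # end of the video
            break
        frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        frames.append(cv2.resize(frame_rgb, size))
    cap.release()
    return frames  # image time sequence: list of H x W x 3 arrays


# Hypothetical usage: the two modalities recorded over the same time span
face_sequence = video_to_frames("face_clip.avi")
body_sequence = video_to_frames("body_clip.avi")
```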
And S2, sequentially inputting the face image time sequence into a convolutional neural network based on Alexnet and a cyclic neural network based on BLSTM, taking out output data as a first face image space-time characteristic, sequentially inputting the body image time sequence into the convolutional neural network based on Alexnet and the cyclic neural network based on BLSTM, and taking out the output data as a first body image space-time characteristic.
Specifically, the face image time sequence and the body image time sequence obtained in S1 are each input into a trained Alexnet-based convolutional neural network followed by a BLSTM-based recurrent neural network. The convolutional neural network extracts the spatial features of the image time sequence, and the recurrent neural network then extracts temporal information from those spatial features, so that spatio-temporal features of the image time sequence are obtained. In this embodiment, feeding the face image time sequence and the body image time sequence through the trained networks yields the spatio-temporal features of the face image sequence, i.e. the first face image spatio-temporal features, and the spatio-temporal features of the body image sequence, i.e. the first body image spatio-temporal features.
In this way, a deep network combining an Alexnet-based convolutional neural network and a BLSTM-based recurrent neural network is constructed to extract local and global spatio-temporal features, so that the face image time sequence and the body image time sequence can be classified according to the acquired multi-layer deep spatio-temporal features.
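A minimal PyTorch sketch of this kind of CNN-plus-BLSTM structure is shown below, assuming torchvision's stock AlexNet as the spatial backbone and a bidirectional LSTM as the BLSTM stage; the hidden size, the number of emotion classes and the use of the last time step for classification are illustrative assumptions, not details taken from the patent.

```python
import torch.nn as nn
from torchvision import models


class SpatioTemporalNet(nn.Module):
    """AlexNet extracts per-frame spatial features; a bidirectional LSTM
    (BLSTM) then models the temporal dynamics across the frame sequence."""

    def __init__(self, num_classes=10, hidden=512):
        super().__init__()
        alexnet = models.alexnet(weights=None)
        # Keep everything up to (but excluding) AlexNet's final classification layer
        self.cnn = nn.Sequential(alexnet.features, alexnet.avgpool, nn.Flatten(),
                                 *list(alexnet.classifier)[:-1])
        self.blstm = nn.LSTM(input_size=4096, hidden_size=hidden,
                             batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, clip):  # clip: (batch, time, 3, 224, 224)
        b, t = clip.shape[:2]
        spatial = self.cnn(clip.flatten(0, 1))  # (batch*time, 4096)
        spatial = spatial.view(b, t, -1)        # (batch, time, 4096)
        temporal, _ = self.blstm(spatial)       # (batch, time, 2*hidden)
        return self.fc(temporal[:, -1])         # per-clip class scores
```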
And S3, serially inputting the first face image space-time characteristic and the first body image space-time characteristic into a fully-connected neural network, inputting an output result into a support vector machine, obtaining probability matrixes belonging to different emotion types after the first face image space-time characteristic and the first body image space-time characteristic are fused, marking the probability matrixes as first probability matrixes, simultaneously serially inputting the first face image space-time characteristic and the first body image space-time characteristic into the support vector machine, obtaining probability matrixes belonging to different emotion types after the first face image space-time characteristic and the first body image space-time characteristic are serially connected, and marking the probability matrixes as second probability matrixes.
Specifically, the first face image spatio-temporal features and the first body image spatio-temporal features are concatenated and input into a trained fully-connected neural network, and the output is fed into a trained support vector machine. From the combination of the two modalities, the probability that the combined features belong to each emotion category is obtained, and a first classification probability matrix is constructed.
Preferably, the output of the penultimate fully-connected layer of the fully-connected neural network is taken and reduced in dimensionality by principal component analysis, and the processed data are then input into the trained support vector machine to obtain a more precise probability classification result.
On the other hand, the first face image spatio-temporal features and the first body image spatio-temporal features are concatenated and the concatenated features are input directly into a trained support vector machine, which yields the probability that the concatenated features belong to each emotion category; from these probabilities a second classification probability matrix is constructed.
When the first face image spatio-temporal features and the first body image spatio-temporal features are concatenated, the dimensionality of the concatenated features can first be reduced by principal component analysis before the reduced features are input into the trained support vector machine to obtain the probability output.
In this way, the facial features and the body-action features are fused at the feature level using different fusion strategies, including a neural-network fusion strategy and a feature-concatenation fusion strategy, and a probability matrix of the video data belonging to the different emotion categories is obtained for each strategy.
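As a sketch of the feature-concatenation fusion strategy only, the snippet below concatenates the two modalities' spatio-temporal features, reduces them with principal component analysis and trains a probability-output support vector machine with scikit-learn; the array shapes, the number of PCA components and the linear kernel are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC


def concat_fusion_probability_matrix(face_feats, body_feats, labels, n_components=256):
    """face_feats, body_feats: (samples, dims) spatio-temporal features per modality.
    Returns the probability matrix (samples x emotion categories)."""
    fused = np.concatenate([face_feats, body_feats], axis=1)  # serial concatenation
    pca = PCA(n_components=n_components).fit(fused)           # dimensionality reduction
    svm = SVC(kernel="linear", probability=True)
    svm.fit(pca.transform(fused), labels)                     # train on labelled data
    return svm.predict_proba(pca.transform(fused))            # probability matrix
```

In practice the fitted PCA and SVM would be applied to the features of the data to be detected rather than to the training features shown here.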
S4, inputting the first face image space-time feature into a support vector machine, obtaining probability matrixes of the first face image space-time feature belonging to different emotion types, marking the probability matrixes as third probability matrixes, inputting the first body image feature into the support vector machine, obtaining probability matrixes of the first body image space-time feature belonging to different emotion types, marking the probability matrixes as fourth probability matrixes, performing decision fusion on the first probability matrixes, the second probability matrixes, the third probability matrixes and the fourth probability matrixes, obtaining first fusion probability matrixes, and taking the highest probability emotion type in the first fusion probability matrixes as an emotion recognition result.
Specifically, the first face image spatio-temporal features are input on their own into the trained support vector machine to obtain the probability matrix of the first face image spatio-temporal features belonging to the different emotion categories, from which a third probability matrix is constructed; likewise, the first body image spatio-temporal features are input on their own into the trained support vector machine to obtain the probability matrix of the first body image spatio-temporal features belonging to the different emotion categories, from which a fourth probability matrix is constructed.
Referring to fig. 2, which compares the emotion recognition rates obtained with different fusion strategies based on the full time sequences, decision fusion is then performed on the four probability matrices obtained above to produce a new fused probability matrix. This matrix contains the probabilities that the data to be detected belong to each emotion category, and the emotion category with the highest probability in this set is selected as the final recognition result.
According to the method, the facial expressions of a person and the body actions in the same time period are combined, the spatio-temporal features of the data to be detected are extracted with deep neural networks, and the features are classified by the support vector machine under different fusion strategies, finally realizing multi-modal emotion recognition. The effective information in each modality is fully utilized, and the accuracy of emotion recognition is improved.
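The decision-level fusion itself can be sketched as a weighted average of the probability matrices followed by an argmax; the patent only states that the matrices are fused and the most probable category is selected, so the equal-weight averaging used here is an assumption.

```python
import numpy as np


def decision_fusion(prob_matrices, weights=None):
    """Fuse several (samples x categories) probability matrices into one and
    pick the most probable emotion category for every sample."""
    stacked = np.stack(prob_matrices)                       # (n_matrices, N, C)
    if weights is None:
        weights = np.ones(len(prob_matrices))               # equal-weight assumption
    weights = np.asarray(weights, dtype=float) / np.sum(weights)
    fused = np.tensordot(weights, stacked, axes=1)          # (N, C) fused matrix
    return fused, fused.argmax(axis=1)                      # matrix + predicted labels


# e.g. first_fused, predictions = decision_fusion([p1, p2, p3, p4])
```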
On the basis of the above embodiment, the step S1 is preceded by: and training the Alexnet-based convolutional neural network, the BLSTM-based cyclic neural network, the fully-connected neural network and the support vector machine.
Specifically, 127 videos in the FABO database are used for training a convolutional neural network based on Alexnet, a cyclic neural network based on BLSTM, a fully-connected neural network, and a support vector machine.
The feature extraction models are obtained by training the Alexnet-based convolutional neural network and the BLSTM-based recurrent neural network on image sequences containing facial and body variations and adjusting the network parameters. The resulting spatio-temporal features of the different facial activities and the spatio-temporal features of the body postures are then input into a support vector machine to obtain the emotion classification model.
On the basis of the foregoing embodiment, the preprocessing the video including the human face and the corresponding video including the body motion in step S1 specifically includes: carrying out face detection and alignment processing on each frame of image in the video containing the face, and arranging the processed image frames according to a time sequence to obtain a face image time sequence; and carrying out normalization processing on each frame of image in the video containing the body movement, and arranging the processed image frames according to a time sequence to obtain a body image time sequence.
Specifically, face detection and alignment are performed on each image frame of the video containing the face, and the processed frames are arranged in temporal order to obtain the face image time sequence. At the same time, the image frames of the video containing the body movements are normalized so that all frames have a consistent format, and the processed frames are arranged in temporal order to form the body image time sequence.
In this way, every frame in the face image time sequence and in the body image time sequence has the same format, which facilitates subsequent operations such as feature extraction.
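For illustration, the sketch below implements this preprocessing with OpenCV's bundled Haar cascade face detector; the patent does not name a particular detector or alignment procedure, so the Haar cascade, the largest-face heuristic and the fixed 227x227 output size are all assumptions.

```python
import cv2

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")


def preprocess_face_frame(frame_bgr, size=(227, 227)):
    """Detect the face in one frame and return a cropped, resized face image."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    boxes = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(boxes) == 0:
        return None  # no face found in this frame
    x, y, w, h = max(boxes, key=lambda b: b[2] * b[3])  # keep the largest face
    return cv2.resize(frame_bgr[y:y + h, x:x + w], size)


def preprocess_body_frame(frame_bgr, size=(227, 227)):
    """Normalize a body-movement frame to a fixed size and value range."""
    return cv2.resize(frame_bgr, size).astype("float32") / 255.0
```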
On the basis of the above embodiment, the step S1 further includes: reading the mark of each image frame in the video containing the face, extracting the image frames marked as beginning, vertex and disappearance to form a face image time sequence; and reading the mark of each image frame in the video containing the body motion, and extracting the image frames marked as beginning, vertex and disappearance to form a body image time sequence. Wherein the markers of the image frame include a plateau, a start, a vertex, and a vanishing.
Specifically, in the database of data to be detected, every frame of each video is labelled: all image frames in the initial stage of an expressive action are marked as "start", the frames in the period when the expressive action reaches its maximum are marked as "vertex", the frames in the period when the expressive action fades away are marked as "disappear", and the remaining expressionless frames are marked as "plateau".
When performing emotion recognition with the face image time sequence and the body image time sequence, either the time sequence composed of all image frames can be used, or a sequence composed only of the frames in the period when the expressive action reaches its maximum. Preferably, the frames before the expressive action starts and after it has finished are discarded, and only the frames from the start of the expressive action to its disappearance, i.e. the frames marked "start", "vertex" and "disappear", are extracted to form the time sequence used for classification; this improves the overall recognition accuracy. Table 1 shows the emotion recognition results obtained from the face videos with different frame extraction methods, and Table 2 shows the corresponding results obtained from the body actions.
TABLE 1
Time series screening method MAA(%) ACC(%)
Vertex sequence 55.90 56.84
Start-vertex-vanish sequence 57.56 61.11
All sequences of the whole cycle 51.67 53.85
TABLE 2
Time series screening method MAA(%) ACC(%)
Vertex sequence 45.88 50.60
Start-vertex-vanish sequence 48.98 51.70
All sequences of the whole cycle 44.50 49.77
As can be seen from tables 1 and 2, the emotion recognition performed when the image frames marked as "start", "vertex" and "disappear" in the video are selected to form the time sequence has a higher recognition rate than other schemes. Wherein MAA represents the macro average accuracy, ACC represents the overall accuracy, and the calculation formula specifically comprises:
MAA = (1/s) * Σ_{i=1..s} Pi
Pi = TPi / (TPi + FPi)
ACC = Σ_{i=1..s} TPi / Σ_{i=1..s} (TPi + FPi)
wherein s is the number of emotion categories, Pi is the accuracy of the i-th emotion category, TPi is the number of correct classifications in class i, and FPi is the number of misclassifications in class i.
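The two metrics can be computed directly from the true and predicted labels; the sketch below is a straightforward implementation of the formulas above and is not code from the patent.

```python
import numpy as np


def maa_acc(y_true, y_pred, num_classes):
    """Macro average accuracy (MAA) and overall accuracy (ACC)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class = []
    for i in range(num_classes):
        predicted_i = (y_pred == i)
        tp = np.sum(predicted_i & (y_true == i))  # correct classifications in class i
        fp = np.sum(predicted_i & (y_true != i))  # misclassifications in class i
        per_class.append(tp / (tp + fp) if (tp + fp) > 0 else 0.0)  # Pi
    maa = float(np.mean(per_class))               # macro average accuracy
    acc = float(np.mean(y_true == y_pred))        # overall accuracy
    return maa, acc
```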
On the basis of the foregoing embodiment, the step S2 specifically includes:
s21, inputting the face image time sequence into a convolution neural network based on Alexnet, taking out data of the first two full connection layers of the three full connection layers as face space initial features, carrying out principal component analysis on the face space initial features so as to realize space conversion and dimensionality reduction, obtaining first face image space features, inputting the body image time sequence into the convolution neural network based on Alexnet, taking out data of the first two full connection layers of the three full connection layers as body space initial features, carrying out principal component analysis on the body space initial features so as to realize space conversion and dimensionality reduction, and obtaining first body image space features;
s22, inputting the first human face image space characteristic into a BLSTM-based recurrent neural network, taking out the data of the first two full connection layers in the three full connection layers as the human face space-time initial characteristic, carrying out principal component analysis on the human face space-time initial characteristic, realizing space conversion and dimensionality reduction, obtaining the first human face image space-time characteristic, inputting the first human body image space characteristic into the BLSTM-based recurrent neural network, taking out the data of the first two full connection layers in the three full connection layers as the human body space-time initial characteristic, carrying out principal component analysis on the human body space-time initial characteristic, realizing space conversion and dimensionality reduction, and obtaining the first human body image space-time characteristic.
Specifically, referring to fig. 3, obtaining the multi-layer deep spatio-temporal features in the face image time sequence and the body image time sequence requires a convolutional neural network to extract features in image space and a recurrent neural network to further extract the temporal information in the image sequence. In this embodiment, the spatial features of the face image time sequence and of the body image time sequence are extracted with an Alexnet-based convolutional neural network. Preferably, the last three layers of this network are fully-connected layers whose output feature dimensions are 1024, 512 and 10, respectively; the output data of the first two of these three fully-connected layers are taken as the initial spatial features, giving 1536 dimensions in total. Principal component analysis is performed on the 1536-dimensional features to realize spatial transformation and dimensionality reduction, so that the dimensionality meets the input requirement of the BLSTM-based recurrent neural network. In the recurrent neural network, the output data of the first two of its last three fully-connected layers are likewise taken as the initial spatio-temporal features, which are also 1536-dimensional; principal component analysis is performed on these 1536-dimensional features to realize spatial transformation and dimensionality reduction, finally yielding the spatio-temporal features. In this step, the face image time sequence is passed successively through the trained Alexnet-based convolutional neural network and the trained BLSTM-based recurrent neural network to obtain the face image spatio-temporal features, and the body image time sequence is processed in the same way to obtain the body image spatio-temporal features; these are labelled the first face image spatio-temporal features and the first body image spatio-temporal features.
In this way, both the spatial features and the temporal features of the image time sequences are extracted.
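To show how the intermediate fully-connected outputs might be collected in practice, the sketch below registers forward hooks on the first two fully-connected layers of a torchvision AlexNet and reduces the concatenated activations with principal component analysis; note that the stock torchvision layer sizes (4096 and 4096) differ from the 1024- and 512-dimensional layers described in the patent, so the dimensions here are purely illustrative.

```python
import torch
from torchvision import models
from sklearn.decomposition import PCA

alexnet = models.alexnet(weights=None)   # stock fully-connected sizes: 4096, 4096, 1000
captured = {}


def save_output(name):
    def hook(module, inputs, output):
        captured[name] = output.detach()
    return hook


# Hook the first two of the three fully-connected layers in AlexNet's classifier
alexnet.classifier[1].register_forward_hook(save_output("fc1"))
alexnet.classifier[4].register_forward_hook(save_output("fc2"))

with torch.no_grad():
    alexnet(torch.randn(8, 3, 224, 224))  # 8 dummy frames stand in for a sequence

initial_spatial = torch.cat([captured["fc1"], captured["fc2"]], dim=1).numpy()
reduced = PCA(n_components=4).fit_transform(initial_spatial)  # spatial transform + reduction
```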
On the basis of the foregoing embodiments, the step S1 further includes: and cutting the face image time sequence and the body image time sequence according to the preset length of the sliding window to obtain a face image time subsequence group consisting of a plurality of face image time sequence segments and a body image time subsequence group consisting of a plurality of body image time sequence segments.
Specifically, after the face image time sequence and the body image time sequence have been obtained, each sequence is cut with a sliding window of preset length. As shown in fig. 4, a face image time sequence of length 15 that contains 5 frames marked "start", 5 frames marked "vertex" and 5 frames marked "disappear" is cut with a sliding window of length 6 and a sliding step of 1, which yields 10 face image time sequence segments of length 6; these segments form the face image time subsequence group. The sliding window length is chosen, as far as possible, so that every segment contains frames of at least two of the three types "start", "vertex" and "disappear". The body image time sequence is cut in the same way, and the body image time sequence segments obtained after cutting form the body image time subsequence group.
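A sketch of the sliding-window segmentation itself, matching the numbers in the example above (a length-15 sequence, window length 6, step 1, giving 10 segments); the helper function is an illustrative assumption rather than code from the patent.

```python
def sliding_window_segments(frames, window=6, stride=1):
    """Cut an image time sequence into overlapping fixed-length segments."""
    return [frames[start:start + window]
            for start in range(0, len(frames) - window + 1, stride)]


# A length-15 sequence with window length 6 and step 1 yields 15 - 6 + 1 = 10 segments.
segments = sliding_window_segments(list(range(15)))
assert len(segments) == 10
```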
Table 3 shows the emotion recognition results based on the face image time series at different sliding window lengths, and table 4 shows the emotion recognition results based on the body image time series at different sliding window lengths.
TABLE 3
Sliding window length t 6 7 8 9 10
MAA(%) 58.61 60.45 67.09 58.48 56.13
ACC(%) 59.00 61.25 66.46 59.03 57.21
TABLE 4
Sliding window length t 6 7 8 9 10
MAA(%) 43.66 55.00 50.20 47.33 45.81
ACC(%) 44.85 55.98 51.83 48.76 46.00
As can be seen from tables 3 and 4, when a suitable sliding window length is chosen, the recognition accuracy is higher than that of the schemes in tables 1 and 2, which use the whole time sequence without segmentation.
On the basis of the foregoing embodiments, the step S2 further includes: sequentially inputting a plurality of face image time sequence segments in the face image time subsequence group into a convolution neural network based on Alexnet and a circulation neural network based on BLSTM to obtain a second face image space-time characteristic; and sequentially inputting a plurality of body image time sequence segments in the body image time subsequence group into a convolution neural network based on Alexnet and a circulation neural network based on BLSTM to obtain the space-time characteristics of a second body image.
Specifically, the face image time sequence segments in the face image time subsequence group and the body image time sequence segments in the body image time subsequence group are input into the trained Alexnet-based convolutional neural network and the trained BLSTM-based recurrent neural network, which yields the spatio-temporal features of all the segments in the face image time subsequence group and of all the segments in the body image time subsequence group; these are labelled the second face image spatio-temporal features and the second body image spatio-temporal features.
In this way, features are extracted from the segmented time sequence segments, producing new face image spatio-temporal features and body image spatio-temporal features that the classifiers then use for classification.
On the basis of the foregoing embodiments, the step S2 further includes:
inputting a plurality of face image time sequences in the face image time subsequence group into a convolution neural network based on Alexnet, taking out data of the first two full connection layers in the three full connection layers as second face space initial features, performing principal component analysis on the second face space initial features to realize space conversion and dimension reduction, obtaining second face image space features, inputting a plurality of body image time sequences in the body image time subsequence group into the convolution neural network based on Alexnet, taking out data of the first two full connection layers in the three full connection layers as second body space initial features, and performing principal component analysis on the second body space initial features to realize space conversion and dimension reduction, so as to obtain second body image space features;
inputting the space characteristics of the second face image into a BLSTM-based recurrent neural network, taking out the data of the first two full connection layers in the three full connection layers as the space-time initial characteristics of the second face, performing principal component analysis on the space-time initial characteristics of the face to realize space conversion and dimensionality reduction, obtaining the space-time characteristics of the second face image, inputting the space characteristics of the second body image into the BLSTM-based recurrent neural network, taking out the data of the first two full connection layers in the three full connection layers as the space-time initial characteristics of the second body, performing principal component analysis on the space-time initial characteristics of the body to realize space conversion and dimensionality reduction, and obtaining the space-time characteristics of the second body image.
Specifically, consistent with the method for extracting the first face spatiotemporal feature and the first body spatiotemporal feature in the foregoing embodiment, in order to obtain spatiotemporal features of multiple depths in a face image time sequence and a body image time sequence, feature extraction in an image space needs to be realized by means of a convolutional neural network, and then time information in an image needs to be further extracted by using a cyclic neural network. Here, the extraction manner of the features in the neural network is the same as that in the above embodiment, and the details are not described here.
On the basis of the foregoing embodiments, the step S3 further includes: inputting the second face image space-time characteristic and the second body image space-time characteristic into a fully-connected neural network in series, inputting an output result into a support vector machine, obtaining probability matrixes which belong to different emotion types after the second face image space-time characteristic and the second body image space-time characteristic are fused, marking the probability matrixes as fifth probability matrixes, simultaneously inputting the second face image space-time characteristic and the second body image space-time characteristic into the support vector machine in series, obtaining probability matrixes which belong to different emotion types after the second face image space-time characteristic and the second body image space-time characteristic are connected in series, and marking the probability matrixes as sixth probability matrixes.
Specifically, the second face image space-time feature and the second body image space-time feature are connected in series and input into a trained fully-connected neural network, data of a last but one fully-connected layer in the fully-connected neural network is used as output data, after principal component analysis is carried out, the output data is input into a trained support vector machine, so that the probability that the second face image space-time feature and the second body image space-time feature belong to different emotion categories is obtained according to two mode combinations of the second face image space-time feature and the second body image space-time feature, and a fifth classification probability matrix is constructed.
On the other hand, the second face image space-time feature and the second body image space-time feature are connected in series, and then the features after being connected in series are input into a trained support vector machine, so that the probabilities that the second face image space-time feature and the second body image space-time feature after being connected in series belong to different emotion categories can be obtained, and the probabilities are combined to construct a sixth classification probability matrix.
On the basis of the above embodiment, the step S4 further includes: inputting the second face image space-time characteristics into a support vector machine to obtain probability matrixes of the second face image space-time characteristics belonging to different emotion types, marking the probability matrixes as seventh probability matrixes, inputting the second body image space-time characteristics into the support vector machine to obtain probability matrixes of the second body image space-time characteristics belonging to different emotion types, marking the probability matrixes as eighth probability matrixes, and performing decision fusion on the fifth probability matrixes, the sixth probability matrixes, the seventh probability matrixes and the eighth probability matrixes to obtain second fusion probability matrixes; and performing decision fusion on the first fusion probability matrix and the second fusion probability matrix to obtain a third fusion probability matrix, and taking the emotion type with the highest probability in the third fusion probability matrix as an emotion recognition result.
Specifically, the second face image space-time feature is separately input into a trained support vector machine, so that a probability matrix that the second face image space-time feature belongs to different emotion classes can be obtained, and the probability matrix is marked as a seventh probability matrix, on the other hand, the second body image space-time feature is separately input into the trained support vector machine, so that a probability matrix that the second body image space-time feature belongs to different emotion classes can be obtained, and the probability matrix is marked as an eighth probability matrix.
Referring to fig. 5, fig. 5 compares the emotion recognition rates obtained with the fifth probability matrix, the sixth probability matrix, the seventh probability matrix and the eighth probability matrix individually; performing decision fusion on these four probability matrices produces the second fusion probability matrix, whose recognition rate is shown as Multi4-2 in fig. 5.
Finally, decision-level fusion is performed on the first fusion probability matrix and the second fusion probability matrix according to the probability decision to obtain a third fusion probability matrix, and the emotion category with the highest probability in this matrix is selected as the final recognition result. Referring to fig. 6, which compares the emotion recognition rates of the first fusion probability matrix, the second fusion probability matrix and the third fusion probability matrix, performing emotion recognition on the whole time sequences and on the sliding-window segment groups separately and then fusing the recognition results yields an emotion recognition accuracy of 99% or more.
By the method, the emotion recognition method combined with multiple modes is adopted, effective information of various modes in the video to be detected is fully utilized, the fusion efficiency is improved, and meanwhile the emotion recognition accuracy is improved.
Finally, the above is only a preferred embodiment of the present application and is not intended to limit the scope of the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (8)

1. A multi-modal emotion recognition classification method is characterized by comprising the following steps:
s1, receiving data to be detected, wherein the data to be detected comprises a video containing a face and a corresponding video containing a body motion at the same time, and preprocessing the video containing the face and the corresponding video containing the body motion to obtain a face image time sequence containing the face and a body image time sequence containing the body motion;
s2, sequentially inputting the face image time sequence into a convolution neural network based on Alexnet and a circulation neural network based on BLSTM, taking out output data as a first face image space-time characteristic, sequentially inputting the body image time sequence into the convolution neural network based on Alexnet and the circulation neural network based on BLSTM, and taking out the output data as a first body image space-time characteristic;
s3, serially inputting the first face image space-time feature and the first body image space-time feature into a fully-connected neural network, inputting an output result into a support vector machine, obtaining probability matrixes belonging to different emotion types after the first face image space-time feature and the first body image space-time feature are fused, marking the probability matrixes as first probability matrixes, simultaneously serially inputting the first face image space-time feature and the first body image space-time feature into the support vector machine, obtaining probability matrixes belonging to different emotion types after the first face image space-time feature and the first body image space-time feature are serially connected, and marking the probability matrixes as second probability matrixes;
s4, inputting the first face image space-time feature into a support vector machine, obtaining probability matrixes of the first face image space-time feature belonging to different emotion types, marking the probability matrixes as third probability matrixes, inputting the first body image feature into the support vector machine, obtaining probability matrixes of the first body image space-time feature belonging to different emotion types, marking the probability matrixes as fourth probability matrixes, performing decision fusion on the first probability matrix, the second probability matrix, the third probability matrix and the fourth probability matrixes, obtaining first fusion probability matrixes, and taking the highest probability emotion type in the first fusion probability matrixes as an emotion recognition result;
in step S1, the preprocessing the video including the face and the corresponding video including the body motion specifically includes:
carrying out face detection and alignment processing on each frame of image in the video containing the face, and arranging the processed image frames according to a time sequence to obtain a face image time sequence;
normalizing each frame image in the video containing the body movement, and arranging the processed image frames according to a time sequence to obtain a body image time sequence;
wherein the step S1 further includes:
reading the mark of each image frame in the video containing the face, extracting the image frames marked as beginning, vertex and disappearance to form a face image time sequence;
reading the mark of each image frame in the video containing the body action, extracting the image frames marked as beginning, vertex and disappearance to form a body image time sequence;
wherein the markers of the image frame include a plateau, a start, a vertex, and a vanishing.
2. The method according to claim 1, wherein the step S1 is preceded by: and training the Alexnet-based convolutional neural network, the BLSTM-based cyclic neural network, the fully-connected neural network and the support vector machine.
3. The method according to claim 1, wherein the step S2 specifically includes:
s21, inputting the face image time sequence into a convolution neural network based on Alexnet, taking out data of the first two full connection layers of the three full connection layers as face space initial features, carrying out principal component analysis on the face space initial features so as to realize space conversion and dimensionality reduction, obtaining first face image space features, inputting the body image time sequence into the convolution neural network based on Alexnet, taking out data of the first two full connection layers of the three full connection layers as body space initial features, carrying out principal component analysis on the body space initial features so as to realize space conversion and dimensionality reduction, and obtaining first body image space features;
s22, inputting the first human face image space characteristic into a BLSTM-based recurrent neural network, taking out the data of the first two full connected layers in the three full connected layers as the human face space-time initial characteristic, carrying out principal component analysis on the human face space-time initial characteristic to realize space conversion and dimensionality reduction, obtaining the first human face image space-time characteristic, inputting the first human body image space characteristic into the BLSTM-based recurrent neural network, taking out the data of the first two full connected layers in the three full connected layers as the human body space-time initial characteristic, carrying out principal component analysis on the human body space-time initial characteristic, realizing space conversion and dimensionality reduction, and obtaining the first human body image space-time characteristic.
4. The method according to any one of claims 1 to 3, wherein the step S1 further comprises:
and cutting the face image time sequence and the body image time sequence according to the preset length of the sliding window to obtain a face image time subsequence group consisting of a plurality of face image time sequence segments and a body image time subsequence group consisting of a plurality of body image time sequence segments.
5. The method according to claim 4, wherein the step S2 further comprises:
sequentially inputting the plurality of face image time sequence segments in the face image time-subsequence group into the AlexNet-based convolutional neural network and the BLSTM-based recurrent neural network, and taking the output data as the second face image space-time features;
sequentially inputting the plurality of body image time sequence segments in the body image time-subsequence group into the AlexNet-based convolutional neural network and the BLSTM-based recurrent neural network, and taking the output data as the second body image space-time features.
6. The method according to claim 5, wherein the step S2 further comprises:
inputting the plurality of face image time sequence segments in the face image time-subsequence group into the AlexNet-based convolutional neural network, taking the outputs of the first two of its three fully connected layers as the second initial face spatial features, and applying principal component analysis to the second initial face spatial features for space transformation and dimensionality reduction to obtain the second face image spatial features; inputting the plurality of body image time sequence segments in the body image time-subsequence group into the AlexNet-based convolutional neural network, taking the outputs of the first two of its three fully connected layers as the second initial body spatial features, and applying principal component analysis to the second initial body spatial features for space transformation and dimensionality reduction to obtain the second body image spatial features;
inputting the second face image spatial features into the BLSTM-based recurrent neural network, taking the outputs of the first two of its three fully connected layers as the second initial face space-time features, and applying principal component analysis to the second initial face space-time features for space transformation and dimensionality reduction to obtain the second face image space-time features; inputting the second body image spatial features into the BLSTM-based recurrent neural network, taking the outputs of the first two of its three fully connected layers as the second initial body space-time features, and applying principal component analysis to the second initial body space-time features for space transformation and dimensionality reduction to obtain the second body image space-time features.
7. The method according to claim 6, wherein the step S3 further comprises:
concatenating the second face image space-time features and the second body image space-time features and inputting the result into the fully connected neural network, then inputting the output into the support vector machine to obtain a probability matrix of the fused features belonging to different emotion types, recorded as the fifth probability matrix; and inputting the concatenation of the second face image space-time features and the second body image space-time features directly into the support vector machine to obtain a probability matrix of the concatenated features belonging to different emotion types, recorded as the sixth probability matrix.
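Note (illustrative only, not part of the claims): a sketch of the two fusion branches of claim 7, assuming PyTorch for the fully connected fusion network and scikit-learn's SVC for the probability outputs. The layer sizes, the six emotion classes and the random toy features and labels are assumptions, and the models are trained only on this toy data to keep the example self-contained.

    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.svm import SVC

    N_CLASSES = 6                            # assumed number of emotion types
    face_feat = torch.randn(60, 256)         # toy second face image space-time features
    body_feat = torch.randn(60, 256)         # toy second body image space-time features
    labels = np.arange(60) % N_CLASSES       # toy emotion labels, 10 per class

    fused_in = torch.cat([face_feat, body_feat], dim=1)   # concatenation of both modalities, (60, 512)

    # Feature-level fusion through a fully connected network, then an SVM head.
    fusion_net = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 64))
    with torch.no_grad():
        fused_out = fusion_net(fused_in).numpy()

    svm_fused = SVC(probability=True).fit(fused_out, labels)
    svm_concat = SVC(probability=True).fit(fused_in.numpy(), labels)
    fifth_probs = svm_fused.predict_proba(fused_out)          # "fifth probability matrix", (60, 6)
    sixth_probs = svm_concat.predict_proba(fused_in.numpy())  # "sixth probability matrix", (60, 6)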
8. The method according to claim 7, wherein the step S4 further comprises:
inputting the first face image space-time features into the support vector machine to obtain a probability matrix of the first face image space-time features belonging to different emotion types, recorded as the seventh probability matrix; inputting the first body image space-time features into the support vector machine to obtain a probability matrix of the first body image space-time features belonging to different emotion types, recorded as the eighth probability matrix; and performing decision-level fusion on the fifth, sixth, seventh and eighth probability matrices to obtain a second fused probability matrix;
performing decision-level fusion on the first fused probability matrix and the second fused probability matrix to obtain a third fused probability matrix, and taking the emotion type with the highest probability in the third fused probability matrix as the emotion recognition result.
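Note (illustrative only, not part of the claims): the claims do not fix a particular decision-fusion rule; the sketch below assumes simple averaging of class-probability matrices followed by a per-sample argmax.

    import numpy as np

    def fuse(*probability_matrices: np.ndarray) -> np.ndarray:
        """Average several (n_samples, n_classes) probability matrices."""
        return np.mean(np.stack(probability_matrices, axis=0), axis=0)

    # Example: third fused matrix from the first and second fused matrices;
    # the recognised emotion is the per-sample argmax of the result.
    first_fused = np.array([[0.1, 0.7, 0.2]])
    second_fused = np.array([[0.2, 0.5, 0.3]])
    third_fused = fuse(first_fused, second_fused)             # [[0.15, 0.6, 0.25]]
    emotion_index = int(np.argmax(third_fused, axis=1)[0])    # -> 1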
CN201711144196.1A 2017-11-17 2017-11-17 Multi-mode emotion recognition and classification method Active CN107808146B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711144196.1A CN107808146B (en) 2017-11-17 2017-11-17 Multi-mode emotion recognition and classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711144196.1A CN107808146B (en) 2017-11-17 2017-11-17 Multi-mode emotion recognition and classification method

Publications (2)

Publication Number Publication Date
CN107808146A CN107808146A (en) 2018-03-16
CN107808146B true CN107808146B (en) 2020-05-05

Family

ID=61589748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711144196.1A Active CN107808146B (en) 2017-11-17 2017-11-17 Multi-mode emotion recognition and classification method

Country Status (1)

Country Link
CN (1) CN107808146B (en)

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491720B (en) * 2018-03-20 2023-07-14 腾讯科技(深圳)有限公司 Application identification method, system and related equipment
CN108491880B (en) * 2018-03-23 2021-09-03 西安电子科技大学 Object classification and pose estimation method based on neural network
CN108596039B (en) * 2018-03-29 2020-05-05 南京邮电大学 Bimodal emotion recognition method and system based on 3D convolutional neural network
CN109101999B (en) * 2018-07-16 2021-06-25 华东师范大学 Support vector machine-based cooperative neural network credible decision method
CN110795973A (en) * 2018-08-03 2020-02-14 北京大学 Multi-mode fusion action recognition method and device and computer readable storage medium
CN109190514B (en) * 2018-08-14 2021-10-01 电子科技大学 Face attribute recognition method and system based on bidirectional long-short term memory network
CN109325457B (en) * 2018-09-30 2022-02-18 合肥工业大学 Emotion analysis method and system based on multi-channel data and recurrent neural network
CN109359599A (en) * 2018-10-19 2019-02-19 昆山杜克大学 Human facial expression recognition method based on combination learning identity and emotion information
CN109684911B (en) * 2018-10-30 2021-05-11 百度在线网络技术(北京)有限公司 Expression recognition method and device, electronic equipment and storage medium
CN109522945B (en) * 2018-10-31 2020-09-25 中国科学院深圳先进技术研究院 Group emotion recognition method and device, intelligent device and storage medium
CN109766759A (en) * 2018-12-12 2019-05-17 成都云天励飞技术有限公司 Emotion identification method and Related product
CN109783684B (en) * 2019-01-25 2021-07-06 科大讯飞股份有限公司 Video emotion recognition method, device and equipment and readable storage medium
CN110020596B (en) * 2019-02-21 2021-04-30 北京大学 Video content positioning method based on feature fusion and cascade learning
CN110037693A (en) * 2019-04-24 2019-07-23 中央民族大学 A kind of mood classification method based on facial expression and EEG
CN110378335B (en) * 2019-06-17 2021-11-19 杭州电子科技大学 Information analysis method and model based on neural network
CN110287912A (en) * 2019-06-28 2019-09-27 广东工业大学 Method, apparatus and medium are determined based on the target object affective state of deep learning
CN110234018B (en) * 2019-07-09 2022-05-31 腾讯科技(深圳)有限公司 Multimedia content description generation method, training method, device, equipment and medium
CN110472506B (en) * 2019-07-11 2023-05-26 广东工业大学 Gesture recognition method based on support vector machine and neural network optimization
CN110693508A (en) * 2019-09-02 2020-01-17 中国航天员科研训练中心 Multi-channel cooperative psychophysiological active sensing method and service robot
CN110765839B (en) * 2019-09-02 2022-02-22 合肥工业大学 Multi-channel information fusion and artificial intelligence emotion monitoring method for visible light facial image
CN110598608B (en) * 2019-09-02 2022-01-14 中国航天员科研训练中心 Non-contact and contact cooperative psychological and physiological state intelligent monitoring system
CN111242155A (en) * 2019-10-08 2020-06-05 台州学院 Bimodal emotion recognition method based on multimode deep learning
CN111476217A (en) * 2020-05-27 2020-07-31 上海乂学教育科技有限公司 Intelligent learning system and method based on emotion recognition
CN111914742A (en) * 2020-07-31 2020-11-10 辽宁工业大学 Attendance checking method, system, terminal equipment and medium based on multi-mode biological characteristics
CN112418034A (en) * 2020-11-12 2021-02-26 元梦人文智能国际有限公司 Multi-modal emotion recognition method and device, electronic equipment and storage medium
CN112784730B (en) * 2021-01-20 2022-03-29 东南大学 Multi-modal emotion recognition method based on time domain convolutional network
CN116682168B (en) * 2023-08-04 2023-10-17 阳光学院 Multi-modal expression recognition method, medium and system
CN117351575B (en) * 2023-12-05 2024-02-27 北京师范大学珠海校区 Nonverbal behavior recognition method and nonverbal behavior recognition device based on text-generated graph data enhancement model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529504B (en) * 2016-12-02 2019-05-31 合肥工业大学 A kind of bimodal video feeling recognition methods of compound space-time characteristic
CN107273876B (en) * 2017-07-18 2019-09-10 山东大学 A kind of micro- expression automatic identifying method of ' the macro micro- transformation model of to ' based on deep learning

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968643A (en) * 2012-11-16 2013-03-13 华中科技大学 Multi-mode emotion recognition method based on Lie group theory

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Multi View Facial Action Unit Detection Based on CNN and BLSTM-RNN"; He Jun; 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition; 2017-06-29; pp. 848-853 *
"Bimodal Emotion Recognition from Facial Expression and Posture" (in Chinese); Yan Jingjie et al.; Journal of Image and Graphics; 2013-09-30; Vol. 18, No. 9; pp. 1101-1106 *

Also Published As

Publication number Publication date
CN107808146A (en) 2018-03-16

Similar Documents

Publication Publication Date Title
CN107808146B (en) Multi-mode emotion recognition and classification method
Marrero Fernandez et al. Feratt: Facial expression recognition with attention net
Zhang et al. Ppr-fcn: Weakly supervised visual relation detection via parallel pairwise r-fcn
Kim et al. Deep generative-contrastive networks for facial expression recognition
Tang et al. 3D facial expression recognition based on automatically selected features
CN106778796B (en) Human body action recognition method and system based on hybrid cooperative training
Yuan et al. Facial expression feature extraction using hybrid PCA and LBP
CN106203356B (en) A kind of face identification method based on convolutional network feature extraction
CN106548149A (en) The recognition methods of the micro- facial expression image sequence of face in monitor video sequence
Hossain et al. Multimodal feature learning for gait biometric based human identity recognition
CN115862120B (en) Face action unit identification method and equipment capable of decoupling separable variation from encoder
CN103577804B (en) Based on SIFT stream and crowd's Deviant Behavior recognition methods of hidden conditional random fields
Vadlapati et al. Facial recognition using the OpenCV Libraries of Python for the pictures of human faces wearing face masks during the COVID-19 pandemic
Bai et al. Collaborative attention mechanism for multi-view action recognition
Nasir et al. ENGA: elastic net-based genetic algorithm for human action recognition
CN113111797B (en) Cross-view gait recognition method combining self-encoder and view transformation model
Kong et al. A hierarchical model for human interaction recognition
Seyedarabi et al. Automatic lip tracking and action units classification using two-step active contours and probabilistic neural networks
CN113076905A (en) Emotion recognition method based on context interaction relationship
CN112508121A (en) Method and system for sensing outside by industrial robot
Carvajal et al. Multi-action recognition via stochastic modelling of optical flow and gradients
CN114241573A (en) Facial micro-expression recognition method and device, electronic equipment and storage medium
Verma et al. Facial expression recognition: A review
CN111553202A (en) Training method, detection method and device of neural network for detecting living body
Ptucha et al. Fusion of static and temporal predictors for unconstrained facial expression recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant