CN108596039A - Bimodal emotion recognition method and system based on 3D convolutional neural networks - Google Patents
Bimodal emotion recognition method and system based on 3D convolutional neural networks
- Publication number
- CN108596039A CN108596039A CN201810267991.8A CN201810267991A CN108596039A CN 108596039 A CN108596039 A CN 108596039A CN 201810267991 A CN201810267991 A CN 201810267991A CN 108596039 A CN108596039 A CN 108596039A
- Authority
- CN
- China
- Prior art keywords
- layer
- expression
- posture
- neural networks
- convolutional neural
- Prior art date: 2018-03-29
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Multimedia (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Human Computer Interaction (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Neurology (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a bimodal emotion recognition method and system based on 3D convolutional neural networks. The method first constructs two 3D convolutional neural networks, one for facial-expression emotion recognition and one for body-posture emotion recognition, and optimizes the parameters of each network model using the training set and validation set of a bimodal emotion video library. The two optimized networks are then tested on the test set of the bimodal emotion video library to obtain an expression emotion recognition confusion matrix and a posture emotion recognition confusion matrix. Finally, using the prior knowledge carried by these two confusion matrices, the recognition results for the two modalities of a newly input expression video sequence and posture video sequence are fused to obtain the bimodal emotion classification result. By combining 3D convolutional neural networks with a bimodal fusion algorithm, the method avoids the subjectivity of hand-crafted features, overcomes the limitations of single-modality emotion recognition, and effectively improves the accuracy and robustness of emotion recognition.
Description
Technical field
The invention belongs to the fields of machine learning and pattern recognition and relates to a video emotion recognition method and system, in particular to a bimodal emotion recognition method and system based on 3D convolutional neural networks.
Background art
With the rapid development of science and technology, human reliance on computers keeps growing, and human-computer interaction capability has drawn increasing attention from researchers. One of the important goals of computer science is to make computers more human-like, which has become a hot research topic in the field. A key problem that must be solved in human-computer interaction is giving computers the ability to recognize emotion.

Emotion recognition is an important aspect of computer intelligence: it reflects a computer's ability to judge the affective state of an operator or interlocutor from the information it acquires. Through research on emotion recognition technology, machines can identify and understand human emotion, and people can build friendlier, more harmonious human-computer interaction environments. Emotion recognition technology has broad application prospects in human-computer interaction, medical care, security, education, entertainment, and other fields. As emotion recognition research deepens and the emotion recognition capability of computers keeps improving, human quality of life will be greatly improved.

At present, most emotion recognition research is carried out on a single modality such as facial expression, speech, or EEG signals. Compared with a single modality, two or more modalities carry more emotion information, and humans themselves express emotion in a multimodal way. Therefore, deeply mining and fusing signals from multiple modalities is an effective way to further improve emotion recognition performance.
Chinese patent application "Bimodal video emotion recognition method based on compound spatio-temporal features" (application No. 201611096937.9, publication No. CN106529504A) extracts temporal-spatial local ternary pattern moment (TSLTPM) histogram features and 3D histogram-of-oriented-gradients (3DHOG) histogram features from upper-body posture samples and facial expression samples, combines them into compound spatio-temporal features for the upper-body posture and facial expression modalities, and finally classifies the compound spatio-temporal feature test set with a Dempster-Shafer (D-S) evidence theory decision rule to obtain the emotion recognition result. This method relies on hand-crafted features, so feature extraction is cumbersome and computationally complex. In addition, when fusion is performed with the D-S evidence theory decision rule, small changes in the basic probability assignment function can make the fusion result completely different, rendering fusion unstable, and counter-intuitive results are produced when the evidence is completely or highly conflicting.
Chinese patent application "Natural human emotion recognition method combining the expression and behavior modalities" (application No. 201610654684.6, publication No. CN106295568A) adopts a two-stage classification framework: the extracted torso motion features are first matched against a pre-built torso motion feature library to obtain a coarse emotion classification; the extracted facial expression features are then matched against a pre-built facial expression feature library to output a fine-grained emotion classification. The biggest problems with this method are that effective torso motion features are hard to extract, and that effective torso motion and facial expression feature libraries are difficult to build.
Summary of the invention
Object of the invention: In view of the deficiencies of the prior art, the present invention aims to provide a bimodal emotion recognition method and system based on 3D convolutional neural networks that simplifies feature extraction through the networks' powerful feature learning and classification capability and improves the accuracy and robustness of emotion recognition.
Technical solution: To achieve the above object, the present invention adopts the following technical scheme:
A bimodal emotion recognition method based on 3D convolutional neural networks comprises the following steps:
(1) Simultaneously acquire facial expression video clips and body posture video clips for each subject, trim each video clip into a frame sequence of equal length, build an expression-and-posture bimodal emotion video library with emotion category labels, and split the samples of the bimodal emotion video library into a training set, a validation set, and a test set;
(2) Train the constructed first 3D convolutional neural network and second 3D convolutional neural network with the expression video sequences and posture video sequences of the training and validation sets, respectively, and optimize the network model parameters. The training set is used for network training; after every preset number of training iterations, one test is run on the validation set to check whether the chosen network parameters are reasonable. The first 3D convolutional neural network and the second 3D convolutional neural network each comprise:

a data input layer for inputting a video sequence and normalizing every frame image in the video sequence;

at least two composite modules of a convolutional layer and a pooling layer, wherein the convolutional layer convolves the output of the previous layer with several 3D convolution kernels and the pooling layer down-samples the output of the convolutional layer;

a fully connected layer that fully connects the output of the last pooling layer to the output neurons of this layer and outputs a feature vector;

and a classification layer that fully connects the feature vector output by the fully connected layer to output nodes representing emotion categories and outputs an n-dimensional vector, where n is the number of emotion categories.
Preferably, the first 3D convolutional neural network comprises, connected in sequence, one data input layer, at least two composite modules of a convolutional layer and a pooling layer, one fully connected layer, and one Softmax classification layer.

The data input layer is the first layer; its input is an expression video sequence, and it normalizes every frame image in the video sequence. The length of the expression video sequence is 16, 24, or 32 frames.

Each composite module of a convolutional layer and a pooling layer comprises one convolutional layer and one pooling layer. The convolutional layer includes a ReLU nonlinear activation function layer and convolves the output of the previous layer with m1 3D convolution kernels of size d1×k1×k1, where d1 and k1 are chosen from the values 3, 5, and 7, and m1 is chosen from the values 32, 64, 128, 256, and 512. The pooling layer down-samples the output of the previous convolutional layer with a pooling kernel of size d2×k2×k2, where d2 and k2 are chosen from the values 1, 2, and 3.

The fully connected layer fully connects the output of the last pooling layer to its c output neurons and outputs a c-dimensional feature vector, where c is chosen from the values 256, 512, and 1024.

The Softmax classification layer fully connects the feature vector output by the last fully connected layer to n output nodes and, after Softmax regression, outputs an n-dimensional vector [p1 p2 p3 … pn]^T, where the value of each dimension is the probability that the emotion category of the input video sequence belongs to the corresponding class, and n is the number of emotion categories.
Preferably, the second 3D convolutional neural network comprises, connected in sequence, one data input layer, at least two composite modules of a convolutional layer and a pooling layer, one fully connected layer, and one Softmax classification layer.

The data input layer is the first layer; its input is a posture video sequence, and it normalizes every frame image in the video sequence. The length of the posture video sequence is 16, 24, or 32 frames.

Each composite module of a convolutional layer and a pooling layer comprises one convolutional layer and one pooling layer. The convolutional layer includes a ReLU nonlinear activation function layer and convolves the output of the previous layer with m2 3D convolution kernels of size d3×k3×k3, where d3 and k3 are chosen from the values 3, 5, and 7, and m2 is chosen from the values 32, 64, 128, 256, and 512. The pooling layer down-samples the output of the previous convolutional layer with a pooling kernel of size d4×k4×k4, where d4 and k4 are chosen from the values 1, 2, and 3.

The fully connected layer fully connects the output of the last pooling layer to its c output neurons and outputs a c-dimensional feature vector, where c is chosen from the values 256, 512, and 1024.

The Softmax classification layer fully connects the feature vector output by the last fully connected layer to n output nodes and, after Softmax regression, outputs an n-dimensional vector [q1 q2 q3 … qn]^T, where the value of each dimension is the probability that the emotion category of the input video sequence belongs to the corresponding class.
(3) Use the optimized first 3D convolutional neural network to classify the emotion of each expression video sequence sample in the test set: the network outputs an n-dimensional vector, and the class corresponding to the dimension with the largest value is taken as the emotion category of the sample. Repeating this test for all expression video sequence samples in the test set and tallying the classification results yields the expression emotion classification confusion matrix E, in which the entry in row i and column j records how often a test sample of true emotion class i is recognized as class j.

Similarly, use the optimized second 3D convolutional neural network to classify the emotion of each posture video sequence sample in the test set: the network outputs an n-dimensional vector, and the class corresponding to the dimension with the largest value is taken as the emotion category of the sample. Repeating this test for all posture video sequence samples in the test set and tallying the classification results yields the posture emotion classification confusion matrix G of the same form.
(4) Use the optimized first and second 3D convolutional neural networks to classify the emotion of a newly input expression video sequence and posture video sequence, respectively, obtaining emotion classification results for the expression and posture modalities;
(5) Using the prior knowledge carried by the expression emotion classification confusion matrix E and the posture emotion classification confusion matrix G obtained in step (3), fuse the emotion classification results of the two modalities obtained in step (4) by decision-level weighting to obtain the bimodal emotion classification result. The specific steps are as follows:

(5.1) Normalize the values of the elements on the main diagonal of the expression emotion classification confusion matrix E to obtain the expression weight of each emotion class;

(5.2) Normalize the values of the elements on the main diagonal of the posture emotion classification confusion matrix G to obtain the posture weight of each emotion class;

(5.3) Weight and fuse the emotion classification results of the expression and posture modalities to obtain a new n-dimensional vector V, then compare the values of the dimensions of V; the class corresponding to the dimension with the largest value is the emotion category of the input video sequences.
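The normalization and fusion formulas of steps (5.1)-(5.3) appear as figures in the original filing. One plausible reconstruction, assuming that for each class i the diagonal entries e_ii of E and g_ii of G are normalized pairwise so that the two modality weights sum to one, is:

```latex
a_i = \frac{e_{ii}}{e_{ii} + g_{ii}}, \qquad
b_i = \frac{g_{ii}}{e_{ii} + g_{ii}}, \qquad i = 1, \dots, n,
\qquad
V = \begin{bmatrix} a_1 p_1 + b_1 q_1 & a_2 p_2 + b_2 q_2 & \cdots & a_n p_n + b_n q_n \end{bmatrix}^{T}
```

Here [p1 p2 … pn]^T and [q1 q2 … qn]^T are the classification-layer outputs of the two networks, as stated in claim 4; the pairwise form of the normalization is our assumption.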
Another aspect of the present invention provides a bimodal emotion recognition system based on 3D convolutional neural networks, comprising:

a preprocessing module for simultaneously acquiring facial expression video clips and body posture video clips for each subject, trimming each video clip into a frame sequence of equal length, building an expression-and-posture bimodal emotion video library with emotion category labels, and splitting the samples of the bimodal emotion video library into a training set, a validation set, and a test set;

a network model training module for training the constructed first 3D convolutional neural network and second 3D convolutional neural network with the expression video sequences and posture video sequences of the training and validation sets, respectively, and optimizing the network model parameters, wherein the first and second 3D convolutional neural networks each comprise: a data input layer for inputting a video sequence and normalizing the images in the video sequence; at least two composite modules of a convolutional layer and a pooling layer, wherein the convolutional layer convolves the output of the previous layer with several 3D convolution kernels and the pooling layer down-samples the output of the convolutional layer; a fully connected layer that fully connects the output of the last pooling layer to the output neurons of this layer and outputs a feature vector; and a classification layer that fully connects the feature vector output by the fully connected layer to output nodes representing emotion categories and outputs an n-dimensional vector, where n is the number of emotion categories;

a confusion matrix acquisition module for classifying the emotion of the expression video sequence samples and posture video sequence samples in the test set with the optimized first and second 3D convolutional neural networks, respectively, and tallying the classification results to obtain an n × n expression emotion classification confusion matrix and an n × n posture emotion classification confusion matrix;

an expression and posture emotion classification module for classifying the emotion of a newly input expression video sequence and posture video sequence with the optimized first and second 3D convolutional neural networks, respectively, to obtain emotion classification results for the expression and posture modalities;

and a decision module for weighting and fusing, at the decision level, the emotion classification results of the two modalities obtained by the expression and posture emotion classification module, using the prior knowledge of the expression emotion classification confusion matrix and posture emotion classification confusion matrix obtained by the confusion matrix acquisition module, to obtain the bimodal emotion classification result.
Advantageous effects: Compared with the prior art, the present invention has the following technical effects:

(1) The invention uses 3D convolutional neural networks to extract the temporal and spatial features of video clips, extending feature extraction from still images to image sequences. The network parameters are adjusted adaptively by training, so the networks can autonomously extract dynamic features that reflect temporal information. The extracted affective features characterize the changes of facial expression and body posture better than traditional hand-crafted features and have stronger representation and generalization ability, which ultimately improves classification accuracy.

(2) The invention performs emotion classification by fusing the information of the facial expression and body posture modalities, overcoming the limitations of single-modality emotion classification.

(3) When weighting and fusing the recognition results of the expression and posture modalities at the decision level, the invention determines the weights from the prior knowledge of the emotion classification confusion matrices of the two modalities. This avoids the instability of D-S evidence theory fusion, where small changes in the basic probability assignment function can make the fusion result completely different, as well as the counter-intuitive results it produces under completely or highly conflicting evidence, and can effectively improve the accuracy and robustness of emotion recognition.
Description of the drawings
Fig. 1 is a flowchart of the bimodal emotion recognition method based on 3D convolutional neural networks of the present invention;

Fig. 2 is a basic framework diagram of the bimodal emotion recognition method based on 3D convolutional neural networks of the present invention;

Fig. 3 shows video screenshots from the FABO database: (a)-(c) are screenshots of different facial expression videos, and (d)-(f) are screenshots of different body posture videos.
Detailed description of the embodiments
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, the bimodal emotion recognition method based on 3D convolutional neural networks provided by an embodiment of the present invention mainly comprises the following steps:
Step 1: Simultaneously acquire facial expression video clips and body posture video clips for each subject, trim each video clip into a frame sequence of equal length, build an expression-and-posture bimodal emotion video library with emotion category labels, and split the samples of the bimodal emotion video library into a training set, a validation set, and a test set according to a certain ratio.

In this embodiment, the FABO (A Bimodal Face and Body Gesture Database) bimodal emotion video database is chosen. In practice, other video databases can also be used, or facial expression videos and body posture videos can be captured with two cameras to build an expression-and-posture bimodal emotion video library with emotion category labels. The FABO samples used in this embodiment cover 23 subjects, each with 9 different emotion categories: anger, anxiety, boredom, disgust, fear, sadness, surprise, happiness, and uncertainty. Since the FABO database contains too few samples of the "sadness" and "surprise" categories, samples of the 7 categories anger, anxiety, boredom, disgust, fear, happiness, and uncertainty were selected and labeled 1 to 7, respectively. The video samples in the database are preprocessed: samples are selected arbitrarily in a 4:1:1 ratio as the training set, validation set, and test set, each video clip is cut into a 16-frame sequence, and the video sequences and labels of each sample set are stored as lst files. In practical applications, the frame length can be chosen from the values 16, 24, and 32.
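A minimal sketch of this preprocessing step is given below. The OpenCV-based frame reading, the even sampling of frames across each clip, and the helper names are our assumptions for illustration; the patent does not prescribe an implementation.

```python
import random
import cv2  # OpenCV, assumed here for reading video frames

def clip_to_sequence(video_path, length=16, size=(112, 112)):
    """Trim one video clip into a fixed-length frame sequence."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.resize(frame, size))
    cap.release()
    # Sample `length` frames evenly across the clip (one possible trimming scheme)
    step = max(len(frames) // length, 1)
    return frames[::step][:length]

def split_samples(samples, ratio=(4, 1, 1)):
    """Arbitrarily split labeled samples into train/validation/test sets at 4:1:1."""
    random.shuffle(samples)
    total = sum(ratio)
    n_train = len(samples) * ratio[0] // total
    n_val = len(samples) * ratio[1] // total
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])
```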
Step 2: Construct two 3D convolutional neural networks, where the first 3D convolutional neural network is used for facial expression emotion recognition and the second 3D convolutional neural network is used for body posture emotion recognition.

Construct the first 3D convolutional neural network, comprising, connected in sequence, one data input layer, at least two composite modules of a convolutional layer and a pooling layer, one fully connected layer, and one Softmax classification layer.

The data input layer is the first layer; its input is an expression video sequence, and it normalizes every frame image in the video sequence.

Each composite module of a convolutional layer and a pooling layer comprises one convolutional layer and one pooling layer. The convolutional layer includes a ReLU nonlinear activation function layer and convolves the output of the previous layer with m1 3D convolution kernels of size d1×k1×k1, where m1, d1, and k1 are positive integers, d1 and k1 are chosen from the values 3, 5, and 7, and m1 is chosen from the values 32, 64, 128, 256, and 512. The pooling layer down-samples the output of the previous convolutional layer with a pooling kernel of size d2×k2×k2, where d2 and k2 are positive integers chosen from the values 1, 2, and 3.

The fully connected layer fully connects the output of the last pooling layer to its c output neurons and outputs a c-dimensional feature vector, where c is a positive integer chosen from the values 256, 512, and 1024.

The Softmax classification layer fully connects the feature vector output by the last fully connected layer to n output nodes and, after Softmax regression, outputs an n-dimensional vector, where the value of each dimension is the probability that the emotion category of the input video sequence belongs to the corresponding class.

Construct the second 3D convolutional neural network, comprising, connected in sequence, one data input layer, at least two composite modules of a convolutional layer and a pooling layer, one fully connected layer, and one Softmax classification layer.

The data input layer is the first layer; its input is a posture video sequence, and it normalizes every frame image in the video sequence.

Each composite module of a convolutional layer and a pooling layer comprises one convolutional layer and one pooling layer. The convolutional layer includes a ReLU nonlinear activation function layer and convolves the output of the previous layer with m2 3D convolution kernels of size d3×k3×k3, where m2, d3, and k3 are positive integers, d3 and k3 are chosen from the values 3, 5, and 7, and m2 is chosen from the values 32, 64, 128, 256, and 512. The pooling layer down-samples the output of the previous convolutional layer with a pooling kernel of size d4×k4×k4, where d4 and k4 are positive integers chosen from the values 1, 2, and 3.

The fully connected layer fully connects the output of the last pooling layer to its c output neurons and outputs a c-dimensional feature vector, where c is a positive integer chosen from the values 256, 512, and 1024.

The Softmax classification layer fully connects the feature vector output by the last fully connected layer to n output nodes and, after Softmax regression, outputs an n-dimensional vector, where the value of each dimension is the probability that the emotion category of the input video sequence belongs to the corresponding class.
Based on the database used in this embodiment, two 3D convolutional neural networks that are identical in structure but differ in model parameters can be built, as shown in Fig. 2. The concrete structure is as follows:

The first layer is the data input layer, which normalizes each frame image of the input 16-frame video sequence to 112 × 112 pixels.

The second layer is convolutional layer 1: 64 3D convolution kernels of size 3 × 3 × 3 convolve the feature map group output by the data input layer with stride 1 and zero padding of width 1; after convolution, a rectified linear unit (ReLU) function applies a nonlinear mapping. The layer outputs 64 feature map groups, each containing 16 feature maps of size 112 × 112.

The third layer is pooling layer 1: a 1 × 2 × 2 pooling kernel down-samples the feature map groups output by convolutional layer 1 with stride 2 in the spatial dimensions, outputting 64 feature map groups, each containing 16 feature maps of size 56 × 56.

The fourth layer is convolutional layer 2: 128 3D convolution kernels of size 3 × 3 × 3 convolve the feature map groups output by pooling layer 1 with stride 1 and zero padding of width 1, followed by a ReLU nonlinear mapping. The layer outputs 128 feature map groups, each containing 16 feature maps of size 56 × 56.

The fifth layer is pooling layer 2: a 2 × 2 × 2 pooling kernel down-samples the feature map groups output by convolutional layer 2 with stride 2, outputting 128 feature map groups, each containing 8 feature maps of size 28 × 28.

The sixth layer is convolutional layer 3: 256 3D convolution kernels of size 3 × 3 × 3 convolve the feature map groups output by pooling layer 2 with stride 1 and zero padding of width 1, followed by a ReLU nonlinear mapping. The layer outputs 256 feature map groups, each containing 8 feature maps of size 28 × 28.

The seventh layer is pooling layer 3: a 2 × 2 × 2 pooling kernel down-samples the feature map groups output by convolutional layer 3 with stride 2, outputting 256 feature map groups, each containing 4 feature maps of size 14 × 14.

The eighth layer is convolutional layer 4: 256 3D convolution kernels of size 3 × 3 × 3 convolve the feature map groups output by pooling layer 3 with stride 1 and zero padding of width 1, followed by a ReLU nonlinear mapping. The layer outputs 256 feature map groups, each containing 4 feature maps of size 14 × 14.

The ninth layer is pooling layer 4: a 2 × 2 × 2 pooling kernel down-samples the feature map groups output by convolutional layer 4 with stride 2, outputting 256 feature map groups, each containing 2 feature maps of size 7 × 7.

The tenth layer is the fully connected layer: the output of pooling layer 4 is fully connected to the 512 output neurons of this layer, producing a 512-dimensional feature vector that passes through a ReLU nonlinear transformation, after which the Dropout method is applied to the connection weights; the number of fully connected outputs is 512.

The eleventh layer is the classification layer: using a Softmax classifier, the feature vector output by the tenth (fully connected) layer is fully connected to 7 output nodes, and a 7-dimensional vector is obtained after Softmax regression, where the value of each dimension is the probability that the emotion category of the input video sequence belongs to the corresponding class.
After the above two 3D convolutional neural networks are built, the expression video sequences and posture video sequences in the bimodal emotion video library are used as input to train the corresponding networks, and the model parameters of both networks are optimized with the backpropagation algorithm.
Step 3: Train the first 3D convolutional neural network with the expression video sequences of the training and validation sets and the second 3D convolutional neural network with the posture video sequences of the training and validation sets, and optimize the network model parameters. The training set is used for network training; after every preset number of training iterations, one test is run on the validation set to check whether the chosen network parameters are reasonable.
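A minimal training loop of this kind might look as follows. The SGD optimizer, learning rate, epoch count, and validation interval are illustrative assumptions; the patent only requires backpropagation training with periodic validation. Since the Emotion3DCNN sketch above already outputs softmax probabilities, the loss here is negative log-likelihood on log-probabilities.

```python
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=30, val_every=100):
    """Backpropagation training with a validation check every `val_every` iterations."""
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    nll = nn.NLLLoss()
    step = 0
    for epoch in range(epochs):
        model.train()
        for clips, labels in train_loader:      # clips: (B, 3, 16, 112, 112)
            optimizer.zero_grad()
            probs = model(clips)                # softmax probabilities
            loss = nll(torch.log(probs + 1e-8), labels)
            loss.backward()                     # backpropagation
            optimizer.step()
            step += 1
            if step % val_every == 0:           # validate after a preset number of iterations
                model.eval()
                correct = total = 0
                with torch.no_grad():
                    for v_clips, v_labels in val_loader:
                        pred = model(v_clips).argmax(dim=1)
                        correct += (pred == v_labels).sum().item()
                        total += v_labels.numel()
                print(f"step {step}: validation accuracy {correct / total:.3f}")
                model.train()
```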
Step 4: Use the optimized first 3D convolutional neural network to classify the emotion of each expression video sequence sample in the test set: the network outputs a 7-dimensional vector, and the class corresponding to the dimension with the largest value is taken as the emotion category of the sample. Repeating this test for all expression video sequence samples in the test set and tallying the classification results yields the expression emotion classification confusion matrix E.

Similarly, use the optimized second 3D convolutional neural network to classify the emotion of each posture video sequence sample in the test set: the network outputs a 7-dimensional vector, and the class corresponding to the dimension with the largest value is taken as the emotion category of the sample. Repeating this test for all posture video sequence samples in the test set and tallying the classification results yields the posture emotion classification confusion matrix G.
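The concrete matrices E and G are given as figures in the original filing; generically, entry (i, j) records how often a test sample of true class i is predicted as class j. A sketch of accumulating such a matrix over the test set follows; normalizing each row to per-class recognition rates is a common convention that we assume here.

```python
import torch

def confusion_matrix(model, test_loader, num_classes=7):
    """Accumulate an n x n confusion matrix over the test set.
    Row i, column j counts samples of true class i predicted as class j;
    rows are then normalized to per-class recognition rates (assumed convention)."""
    M = torch.zeros(num_classes, num_classes)
    model.eval()
    with torch.no_grad():
        for clips, labels in test_loader:
            preds = model(clips).argmax(dim=1)   # class with the largest probability
            for t, p in zip(labels, preds):
                M[t, p] += 1
    return M / M.sum(dim=1, keepdim=True).clamp(min=1)
```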
Step 5: Use the optimized first and second 3D convolutional neural networks to classify the emotion of a newly input expression video sequence and posture video sequence, respectively, obtaining emotion classification results for the expression and posture modalities.
Step 6: Using the prior knowledge carried by the expression emotion classification confusion matrix E and the posture emotion classification confusion matrix G obtained in Step 4, fuse the emotion classification results of the two modalities obtained in Step 5 by decision-level weighting to obtain the bimodal emotion classification result. The specific steps are as follows:

(6.1) Normalize the values of the elements on the main diagonal of the expression emotion classification confusion matrix E to obtain the expression weight of each emotion class;

(6.2) Normalize the values of the elements on the main diagonal of the posture emotion classification confusion matrix G to obtain the posture weight of each emotion class;

(6.3) Weight and fuse the emotion classification results of the expression and posture modalities to obtain a new 7-dimensional vector V, then compare the values of the dimensions of V; the class corresponding to the dimension with the largest value is the emotion category of the input video sequences.
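Putting steps (6.1)-(6.3) together, under the same pairwise-normalization assumption stated for steps (5.1)-(5.3) above (the original formulas are figures in the filing):

```python
import torch

def fuse(p, q, E, G):
    """Decision-level weighted fusion of the two modality outputs.
    p, q: n-dim probability vectors from the expression and posture networks.
    E, G: n x n confusion matrices obtained in the test phase."""
    e = torch.diag(E)        # per-class expression recognition rates
    g = torch.diag(G)        # per-class posture recognition rates
    a = e / (e + g)          # (6.1) normalized expression weights (assumed form)
    b = g / (e + g)          # (6.2) normalized posture weights (assumed form)
    V = a * p + b * q        # (6.3) weighted fusion vector V
    return int(V.argmax())   # class of the dimension with the largest value
```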
Compared with traditional bimodal emotion recognition methods, the bimodal emotion recognition method based on 3D convolutional neural networks proposed by this embodiment extracts affective features with stronger representation and generalization ability than hand-crafted features, which ultimately improves classification accuracy. In addition, when weighting and fusing the recognition results of the expression and posture modalities at the decision level, the weights are determined from the prior knowledge of the emotion classification confusion matrices of the two modalities. This avoids the instability of D-S evidence theory fusion, where small changes in the basic probability assignment function can make the fusion result completely different, as well as the counter-intuitive results it produces under completely or highly conflicting evidence, and can effectively improve the accuracy and robustness of emotion recognition.
Another embodiment of the present invention provides a bimodal emotion recognition system based on 3D convolutional neural networks, comprising: a preprocessing module for simultaneously acquiring facial expression video clips and body posture video clips for each subject, trimming each video clip into a frame sequence of equal length, building an expression-and-posture bimodal emotion video library with emotion category labels, and splitting the samples of the bimodal emotion video library into a training set, a validation set, and a test set; a network model training module for training the constructed first and second 3D convolutional neural networks with the expression video sequences and posture video sequences of the training and validation sets, respectively, and optimizing the network model parameters, wherein the first and second 3D convolutional neural networks each comprise a data input layer for inputting a video sequence and normalizing the images in the video sequence, at least two composite modules of a convolutional layer and a pooling layer, wherein the convolutional layer convolves the output of the previous layer with several 3D convolution kernels and the pooling layer down-samples the output of the convolutional layer, a fully connected layer that fully connects the output of the last pooling layer to the output neurons of this layer and outputs a feature vector, and a classification layer that fully connects the feature vector output by the fully connected layer to output nodes representing emotion categories; a confusion matrix acquisition module for classifying the emotion of the expression video sequence samples and posture video sequence samples in the test set with the optimized first and second 3D convolutional neural networks, respectively, and tallying the classification results to obtain an expression emotion classification confusion matrix and a posture emotion classification confusion matrix; an expression and posture emotion classification module for classifying the emotion of a newly input expression video sequence and posture video sequence with the optimized first and second 3D convolutional neural networks, respectively, to obtain emotion classification results for the expression and posture modalities; and a decision module for weighting and fusing, at the decision level, the emotion classification results of the two modalities obtained by the expression and posture emotion classification module, using the prior knowledge of the expression emotion classification confusion matrix and posture emotion classification confusion matrix obtained by the confusion matrix acquisition module, to obtain the bimodal emotion classification result.
The above embodiment of the bimodal emotion recognition system based on 3D convolutional neural networks can be used to execute the above embodiment of the bimodal emotion recognition method based on 3D convolutional neural networks; their technical principles, the technical problems they solve, and the technical effects they produce are similar. For the specific working process of, and related explanations about, the bimodal emotion recognition system based on 3D convolutional neural networks described above, reference can be made to the corresponding processes in the foregoing method embodiment, which are not repeated here.
Those skilled in the art will understand that the modules in the embodiments can be adaptively changed and arranged in one or more systems different from the embodiments. The modules, units, or components in the embodiments can be combined into one module, unit, or component, and can likewise be divided into multiple sub-modules, sub-units, or sub-components.
The above are only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any person familiar with the art can readily conceive of transformations or replacements within the technical scope disclosed by the invention, and all such transformations or replacements shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the invention shall be subject to the scope of protection specified in the claims.
Claims (5)
1. A bimodal emotion recognition method based on 3D convolutional neural networks, characterized by comprising the following steps:

(1) simultaneously acquiring facial expression video clips and body posture video clips for each subject, trimming each video clip into a frame sequence of equal length, building an expression-and-posture bimodal emotion video library with emotion category labels, and splitting the samples of the bimodal emotion video library into a training set, a validation set, and a test set;

(2) training a constructed first 3D convolutional neural network and second 3D convolutional neural network with the expression video sequences and posture video sequences of the training and validation sets, respectively, and optimizing the network model parameters, wherein the first 3D convolutional neural network and the second 3D convolutional neural network each comprise:

a data input layer for inputting a video sequence and normalizing every frame image in the video sequence;

at least two composite modules of a convolutional layer and a pooling layer, wherein the convolutional layer convolves the output of the previous layer with several 3D convolution kernels and the pooling layer down-samples the output of the convolutional layer;

a fully connected layer that fully connects the output of the last pooling layer to the output neurons of this layer and outputs a feature vector;

and a classification layer that fully connects the feature vector output by the fully connected layer to output nodes representing emotion categories and outputs an n-dimensional vector, where n is the number of emotion categories;

(3) classifying the emotion of the expression video sequence samples and posture video sequence samples in the test set with the optimized first and second 3D convolutional neural networks, respectively, and tallying the classification results to obtain an n × n expression emotion classification confusion matrix E and an n × n posture emotion classification confusion matrix G;

(4) classifying the emotion of a newly input expression video sequence and posture video sequence with the optimized first and second 3D convolutional neural networks, respectively, to obtain emotion classification results for the expression and posture modalities;

(5) using the prior knowledge of the expression emotion classification confusion matrix E and posture emotion classification confusion matrix G obtained in step (3), weighting and fusing, at the decision level, the emotion classification results of the two modalities obtained in step (4) to obtain the bimodal emotion classification result.
2. The bimodal emotion recognition method based on 3D convolutional neural networks according to claim 1, characterized in that the first 3D convolutional neural network comprises, connected in sequence, one data input layer, at least two composite modules of a convolutional layer and a pooling layer, one fully connected layer, and one Softmax classification layer;

the data input layer is the first layer; its input is an expression video sequence, and it normalizes every frame image in the video sequence; the length of the expression video sequence is 16, 24, or 32 frames;

each composite module of a convolutional layer and a pooling layer comprises one convolutional layer and one pooling layer, wherein the convolutional layer includes a ReLU nonlinear activation function layer and convolves the output of the previous layer with m1 3D convolution kernels of size d1×k1×k1, d1 and k1 being chosen from the values 3, 5, and 7, and m1 from the values 32, 64, 128, 256, and 512, and the pooling layer down-samples the output of the previous convolutional layer with a pooling kernel of size d2×k2×k2, d2 and k2 being chosen from the values 1, 2, and 3;

the fully connected layer fully connects the output of the last pooling layer to its c output neurons and outputs a c-dimensional feature vector, c being chosen from the values 256, 512, and 1024;

the Softmax classification layer fully connects the feature vector output by the last fully connected layer to n output nodes and, after Softmax regression, outputs an n-dimensional vector [p1 p2 p3 … pn]^T, where the value of each dimension is the probability that the emotion category of the input expression video sequence belongs to the corresponding class, and n is the number of emotion categories.
3. The bimodal emotion recognition method based on 3D convolutional neural networks according to claim 1, characterized in that the second 3D convolutional neural network comprises, connected in sequence, one data input layer, at least two composite modules of a convolutional layer and a pooling layer, one fully connected layer, and one Softmax classification layer;

the data input layer is the first layer; its input is a posture video sequence, and it normalizes every frame image in the video sequence; the length of the posture video sequence is 16, 24, or 32 frames;

each composite module of a convolutional layer and a pooling layer comprises one convolutional layer and one pooling layer, wherein the convolutional layer includes a ReLU nonlinear activation function layer and convolves the output of the previous layer with m2 3D convolution kernels of size d3×k3×k3, d3 and k3 being chosen from the values 3, 5, and 7, and m2 from the values 32, 64, 128, 256, and 512, and the pooling layer down-samples the output of the previous convolutional layer with a pooling kernel of size d4×k4×k4, d4 and k4 being chosen from the values 1, 2, and 3;

the fully connected layer fully connects the output of the last pooling layer to its c output neurons and outputs a c-dimensional feature vector, c being chosen from the values 256, 512, and 1024;

the Softmax classification layer fully connects the feature vector output by the last fully connected layer to n output nodes and, after Softmax regression, outputs an n-dimensional vector [q1 q2 q3 … qn]^T, where the value of each dimension is the probability that the emotion category of the input posture video sequence belongs to the corresponding class, and n is the number of emotion categories.
4. The bimodal emotion recognition method based on 3D convolutional neural networks according to claim 1, characterized in that step (5) comprises:

(5.1) normalizing the values of the elements on the main diagonal of the expression emotion classification confusion matrix E to obtain the expression weight of each emotion class;

(5.2) normalizing the values of the elements on the main diagonal of the posture emotion classification confusion matrix G to obtain the posture weight of each emotion class;

(5.3) weighting and fusing the emotion classification results of the expression and posture modalities to obtain a new n-dimensional vector V, and comparing the values of the dimensions of V, wherein the class corresponding to the dimension with the largest value is the emotion category of the input video sequences, and [p1 p2 p3 … pn]^T and [q1 q2 q3 … qn]^T are the recognition result vectors output by the classification layers of the first 3D convolutional neural network and the second 3D convolutional neural network, respectively.
5. A bimodal emotion recognition system based on 3D convolutional neural networks, characterized by comprising:

a preprocessing module for simultaneously acquiring facial expression video clips and body posture video clips for each subject, trimming each video clip into a frame sequence of equal length, building an expression-and-posture bimodal emotion video library with emotion category labels, and splitting the samples of the bimodal emotion video library into a training set, a validation set, and a test set;

a network model training module for training a constructed first 3D convolutional neural network and second 3D convolutional neural network with the expression video sequences and posture video sequences of the training and validation sets, respectively, and optimizing the network model parameters, wherein the first 3D convolutional neural network and the second 3D convolutional neural network each comprise: a data input layer for inputting a video sequence and normalizing the images in the video sequence; at least two composite modules of a convolutional layer and a pooling layer, wherein the convolutional layer convolves the output of the previous layer with several 3D convolution kernels and the pooling layer down-samples the output of the convolutional layer; a fully connected layer that fully connects the output of the last pooling layer to the output neurons of this layer and outputs a feature vector; and a classification layer that fully connects the feature vector output by the fully connected layer to output nodes representing emotion categories and outputs an n-dimensional vector, where n is the number of emotion categories;

a confusion matrix acquisition module for classifying the emotion of the expression video sequence samples and posture video sequence samples in the test set with the optimized first and second 3D convolutional neural networks, respectively, and tallying the classification results to obtain an n × n expression emotion classification confusion matrix and an n × n posture emotion classification confusion matrix;

an expression and posture emotion classification module for classifying the emotion of a newly input expression video sequence and posture video sequence with the optimized first and second 3D convolutional neural networks, respectively, to obtain emotion classification results for the expression and posture modalities;

and a decision module for weighting and fusing, at the decision level, the emotion classification results of the two modalities obtained by the expression and posture emotion classification module, using the prior knowledge of the expression emotion classification confusion matrix and posture emotion classification confusion matrix obtained by the confusion matrix acquisition module, to obtain the bimodal emotion classification result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810267991.8A CN108596039B (en) | 2018-03-29 | 2018-03-29 | Bimodal emotion recognition method and system based on 3D convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810267991.8A CN108596039B (en) | 2018-03-29 | 2018-03-29 | Bimodal emotion recognition method and system based on 3D convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108596039A true CN108596039A (en) | 2018-09-28 |
CN108596039B CN108596039B (en) | 2020-05-05 |
Family
ID=63623893
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810267991.8A Active CN108596039B (en) | 2018-03-29 | 2018-03-29 | Bimodal emotion recognition method and system based on 3D convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108596039B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102968643B (en) * | 2012-11-16 | 2016-02-24 | 华中科技大学 | A kind of multi-modal emotion identification method based on the theory of Lie groups |
WO2017164478A1 (en) * | 2016-03-25 | 2017-09-28 | 한국과학기술원 | Method and apparatus for recognizing micro-expressions through deep learning analysis of micro-facial dynamics |
CN106250855A (en) * | 2016-08-02 | 2016-12-21 | 南京邮电大学 | A kind of multi-modal emotion identification method based on Multiple Kernel Learning |
CN107220591A (en) * | 2017-04-28 | 2017-09-29 | 哈尔滨工业大学深圳研究生院 | Multi-modal intelligent mood sensing system |
CN107808146A (en) * | 2017-11-17 | 2018-03-16 | 北京师范大学 | A kind of multi-modal emotion recognition and classification method |
Non-Patent Citations (2)
Title |
---|
JINGJIE YAN et al.: "Integrating Facial Expression and Body Gesture in Videos for Emotion Recognition", IEICE Trans. Inf. & Syst. *
LU Guanming et al.: "A convolutional neural network for facial expression recognition", Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition) *
Cited By (57)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109472269A (en) * | 2018-10-17 | 2019-03-15 | 深圳壹账通智能科技有限公司 | Image feature configuration and verification method and device, computer equipment and medium |
CN109508644A (en) * | 2018-10-19 | 2019-03-22 | 陕西大智慧医疗科技股份有限公司 | Facial paralysis grade assessment system based on deep video data analysis |
CN109614895A (en) * | 2018-10-29 | 2019-04-12 | 山东大学 | A multi-modal emotion recognition method based on attention feature fusion |
CN109522945A (en) * | 2018-10-31 | 2019-03-26 | 中国科学院深圳先进技术研究院 | A kind of group emotion recognition method, device, smart device and storage medium |
CN109460737A (en) * | 2018-11-13 | 2019-03-12 | 四川大学 | A kind of multi-modal speech emotion recognition method based on an enhanced residual neural network |
CN109766765A (en) * | 2018-12-18 | 2019-05-17 | 深圳壹账通智能科技有限公司 | Audio data pushing method and device, computer equipment and storage medium |
CN111506697A (en) * | 2019-01-30 | 2020-08-07 | 北京入思技术有限公司 | Cross-modal emotion knowledge graph construction method and device |
CN110084266A (en) * | 2019-03-11 | 2019-08-02 | 中国地质大学(武汉) | A kind of dynamic emotion recognition method based on deep fusion of audio-visual features |
CN109934293A (en) * | 2019-03-15 | 2019-06-25 | 苏州大学 | Image recognition method, device, medium and fuzzy-perception convolutional neural network |
CN110033029A (en) * | 2019-03-22 | 2019-07-19 | 五邑大学 | A kind of emotion identification method and device based on multi-modal emotion model |
CN110147548A (en) * | 2019-04-15 | 2019-08-20 | 浙江工业大学 | Emotion recognition method based on bidirectional gated recurrent unit network and novel network initialization |
CN110147548B (en) * | 2019-04-15 | 2023-01-31 | 浙江工业大学 | Emotion recognition method based on bidirectional gated recurrent unit network and novel network initialization |
CN111860064A (en) * | 2019-04-30 | 2020-10-30 | 杭州海康威视数字技术股份有限公司 | Video-based target detection method, device, equipment and storage medium |
CN111860064B (en) * | 2019-04-30 | 2023-10-20 | 杭州海康威视数字技术股份有限公司 | Video-based target detection method, device, equipment and storage medium |
CN110163145A (en) * | 2019-05-20 | 2019-08-23 | 西安募格网络科技有限公司 | A kind of video teaching emotion feedback system based on convolutional neural networks |
CN110188706A (en) * | 2019-06-03 | 2019-08-30 | 南京邮电大学 | Neural network training method and detection method for facial expressions in video based on a generative adversarial network |
CN110188706B (en) * | 2019-06-03 | 2022-04-19 | 南京邮电大学 | Neural network training method and detection method for facial expressions in video based on a generative adversarial network |
CN111401116B (en) * | 2019-08-13 | 2022-08-26 | 南京邮电大学 | Bimodal emotion recognition method based on enhanced convolution and space-time LSTM network |
CN111401116A (en) * | 2019-08-13 | 2020-07-10 | 南京邮电大学 | Bimodal emotion recognition method based on enhanced convolution and space-time LSTM network |
CN111401117B (en) * | 2019-08-14 | 2022-08-26 | 南京邮电大学 | Neonatal pain expression recognition method based on two-stream convolutional neural network |
CN111401117A (en) * | 2019-08-14 | 2020-07-10 | 南京邮电大学 | Neonatal pain expression recognition method based on two-stream convolutional neural network |
CN111292765A (en) * | 2019-11-21 | 2020-06-16 | 台州学院 | Bimodal emotion recognition method fusing multiple deep learning models |
CN113383345A (en) * | 2019-12-17 | 2021-09-10 | 索尼互动娱乐有限责任公司 | Method and system for defining emotion machine |
CN111414839A (en) * | 2020-03-16 | 2020-07-14 | 清华大学 | Emotion recognition method and device based on gestures |
CN111414839B (en) * | 2020-03-16 | 2023-05-23 | 清华大学 | Emotion recognition method and device based on gesture |
CN111523462A (en) * | 2020-04-22 | 2020-08-11 | 南京工程学院 | Video sequence expression recognition system and method based on self-attention enhanced CNN |
CN111523462B (en) * | 2020-04-22 | 2024-02-09 | 南京工程学院 | Video sequence expression recognition system and method based on self-attention enhanced CNN |
CN111680550A (en) * | 2020-04-28 | 2020-09-18 | 平安科技(深圳)有限公司 | Emotion information identification method and device, storage medium and computer equipment |
CN111680550B (en) * | 2020-04-28 | 2024-06-04 | 平安科技(深圳)有限公司 | Emotion information identification method and device, storage medium and computer equipment |
CN114170540B (en) * | 2020-08-21 | 2023-06-13 | 四川大学 | Individual emotion recognition method integrating expression and gesture |
CN114170540A (en) * | 2020-08-21 | 2022-03-11 | 四川大学 | Expression and gesture fused individual emotion recognition method |
CN112329648B (en) * | 2020-11-09 | 2023-08-08 | 东北大学 | Interpersonal relationship behavior pattern recognition method based on facial expression interaction |
CN112329648A (en) * | 2020-11-09 | 2021-02-05 | 东北大学 | Interpersonal relationship behavior pattern recognition method based on facial expression interaction |
CN112529054A (en) * | 2020-11-27 | 2021-03-19 | 华中师范大学 | Multi-dimensional convolution neural network learner modeling method for multi-source heterogeneous data |
CN112800894A (en) * | 2021-01-18 | 2021-05-14 | 南京邮电大学 | Dynamic expression recognition method and system based on attention mechanism between space and time streams |
CN112800894B (en) * | 2021-01-18 | 2022-08-26 | 南京邮电大学 | Dynamic expression recognition method and system based on attention mechanism between space and time streams |
CN112784730A (en) * | 2021-01-20 | 2021-05-11 | 东南大学 | Multi-modal emotion recognition method based on time domain convolutional network |
CN112784730B (en) * | 2021-01-20 | 2022-03-29 | 东南大学 | Multi-modal emotion recognition method based on time domain convolutional network |
CN112784798B (en) * | 2021-02-01 | 2022-11-08 | 东南大学 | Multi-modal emotion recognition method based on feature-time attention mechanism |
CN112784798A (en) * | 2021-02-01 | 2021-05-11 | 东南大学 | Multi-modal emotion recognition method based on feature-time attention mechanism |
CN112800979B (en) * | 2021-02-01 | 2022-08-26 | 南京邮电大学 | Dynamic expression recognition method and system based on characterization flow embedded network |
CN112800979A (en) * | 2021-02-01 | 2021-05-14 | 南京邮电大学 | Dynamic expression recognition method and system based on characterization flow embedded network |
CN113326868A (en) * | 2021-05-06 | 2021-08-31 | 南京邮电大学 | Decision layer fusion method for multi-modal emotion classification |
CN113326868B (en) * | 2021-05-06 | 2022-07-15 | 南京邮电大学 | Decision layer fusion method for multi-modal emotion classification |
CN113326781B (en) * | 2021-05-31 | 2022-09-02 | 合肥工业大学 | Non-contact anxiety recognition method and device based on face video |
CN113326781A (en) * | 2021-05-31 | 2021-08-31 | 合肥工业大学 | Non-contact anxiety recognition method and device based on face video |
CN113505719A (en) * | 2021-07-21 | 2021-10-15 | 山东科技大学 | Gait recognition model compression system and method based on local-integral joint knowledge distillation algorithm |
CN113505719B (en) * | 2021-07-21 | 2023-11-24 | 山东科技大学 | Gait recognition model compression system and method based on local-integral combined knowledge distillation algorithm |
CN113780091A (en) * | 2021-08-12 | 2021-12-10 | 西安交通大学 | Video emotion recognition method based on body posture change expression |
CN113780091B (en) * | 2021-08-12 | 2023-08-22 | 西安交通大学 | Video emotion recognition method based on body posture change representation |
CN113935435A (en) * | 2021-11-17 | 2022-01-14 | 南京邮电大学 | Multi-modal emotion recognition method based on space-time feature fusion |
WO2023151289A1 (en) * | 2022-02-09 | 2023-08-17 | 苏州浪潮智能科技有限公司 | Emotion identification method, training method, apparatus, device, storage medium and product |
CN115206297A (en) * | 2022-05-19 | 2022-10-18 | 重庆邮电大学 | Variable-length speech emotion recognition method based on space-time multiple fusion network |
CN115206297B (en) * | 2022-05-19 | 2024-10-01 | 重庆邮电大学 | Variable-length voice emotion recognition method based on space-time multiple fusion network |
CN116682168B (en) * | 2023-08-04 | 2023-10-17 | 阳光学院 | Multi-modal expression recognition method, medium and system |
CN116682168A (en) * | 2023-08-04 | 2023-09-01 | 阳光学院 | Multi-modal expression recognition method, medium and system |
CN117315755A (en) * | 2023-10-09 | 2023-12-29 | 北京大学 | Abnormal image identification method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108596039B (en) | 2020-05-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108596039A (en) | A kind of bimodal emotion recognition method and system based on 3D convolutional neural networks | |
CN110532900B (en) | Facial expression recognition method based on U-Net and LS-CNN | |
CN106650806B (en) | A kind of collaborative deep network model method for pedestrian detection | |
CN110046671A (en) | A kind of text classification method based on capsule network | |
CN107085704A (en) | Fast facial expression recognition method based on ELM autoencoder algorithms | |
CN106599797A (en) | Infrared face identification method based on local parallel neural network | |
CN104679863A (en) | Method and system for searching images by images based on deep learning | |
CN106503654A (en) | A kind of face emotion identification method based on deep sparse autoencoder network | |
CN110399821A (en) | Customer satisfaction acquisition methods based on facial expression recognition | |
CN110188615A (en) | A kind of facial expression recognizing method, device, medium and system | |
CN109389171B (en) | Medical image classification method based on multi-granularity convolution noise reduction automatic encoder technology | |
CN106096641A (en) | A kind of multi-modal affective characteristics fusion method based on genetic algorithm | |
Liu et al. | Visual question answering with dense inter- and intra-modality interactions | |
CN109086802A (en) | A kind of image classification method based on biquaternion convolutional neural networks | |
CN106971145A (en) | A kind of multi-view action recognition method and device based on extreme learning machine | |
CN101169830A (en) | Human face portrait automatic generation method based on embedded hidden Markov model and selective ensemble | |
CN106909938A (en) | View-independent activity recognition method based on deep learning network | |
CN115966010A (en) | Expression recognition method based on attention and multi-scale feature fusion | |
CN106503616A (en) | A kind of motor imagery EEG signal classification method based on hierarchical extreme learning machine | |
Xu et al. | Face expression recognition based on convolutional neural network | |
CN110135244A (en) | A kind of expression recognition method based on brain-machine collaborative intelligence | |
CN116150747A (en) | Intrusion detection method and device based on CNN and SLTM | |
CN111401116A (en) | Bimodal emotion recognition method based on enhanced convolution and space-time LSTM network | |
Jin et al. | MiniExpNet: A small and effective facial expression recognition network based on facial local regions | |
CN111259264B (en) | Time sequence scoring prediction method based on generative adversarial network | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||