CN107808146A - A multi-modal emotion recognition and classification method - Google Patents
A multi-modal emotion recognition and classification method
- Publication number: CN107808146A (application number CN201711144196.1A)
- Authority: CN (China)
- Prior art keywords: space, time, probability matrix, facial image, image space
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS; G06—COMPUTING, CALCULATING OR COUNTING; G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING; G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data; G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands; G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
- G06V40/168—Feature extraction; Face representation
- G06V40/174—Facial expression recognition
Abstract
The present invention provides a multi-modal emotion recognition and classification method. The method processes a video containing a face to be detected together with a temporally aligned video containing body movements, converts each into an image time series composed of image frames, and extracts the temporal and spatial features of those sequences. Based on the multi-layer deep spatio-temporal features obtained, it performs feature-level fusion across the extracted features and decision-level fusion across the classification results, thereby identifying the emotion type of the video under test from multiple modalities. The method makes full use of the effective information present in each modality and improves the emotion recognition rate.
Description
Technical field
The present invention relates to the field of computer processing technology, and in particular to a multi-modal emotion recognition and classification method.
Background technology
Emotion recognition is an emerging research field at the intersection of computer science, cognitive science, psychology, brain science and neuroscience. Its goal is to teach computers to understand human emotional expression and, ultimately, to recognize and understand emotion as humans do. As a challenging interdisciplinary subject, emotion recognition has become a research hotspot in pattern recognition, computer vision, big data mining and artificial intelligence, both at home and abroad, with important research value and application prospects.
Existing emotion recognition research shows two clear trends. On the one hand, the data have expanded from emotion recognition based on still images to emotion recognition based on dynamic image sequences; on the other hand, the field has moved from single-modality to multi-modal emotion recognition. Research on still-image emotion recognition has already produced a batch of good results; however, methods based on static pictures ignore the temporal dynamics of human expression. Compared with picture-based recognition, the accuracy of video-based analysis still requires further study. In addition, psychological studies show that emotion recognition is inherently a multi-modal problem: judging affective state jointly from body posture and facial expression works better than using a single modality. Relative to a single modality, fusing multi-modal information makes emotion recognition more accurate and reliable, which has made multi-modal information fusion another research hotspot in the field.
In the prior art, methods for fusing the facial-expression and body-posture modalities use only a single fusion mode, selecting either feature-level fusion or decision-level fusion according to some strategy. The prior art cannot extract effective spatio-temporal features from video data for emotion recognition; moreover, whether early or late fusion is used, such fusion methods are model-agnostic, do not make full use of the effective information present in each modality, and generally suffer from low fusion efficiency.
Summary of the invention
To solve the problems in the prior art that effective spatio-temporal features cannot be extracted from video data for emotion recognition, and that the fusion methods used, whether early or late, are model-agnostic, do not make full use of the effective information in each modality, and generally have low fusion efficiency, a multi-modal emotion recognition and classification method is provided.
According to an aspect of the present invention, a multi-modal emotion recognition and classification method comprises:
S1: receive data to be tested, the data comprising a video containing a face and a temporally aligned video containing body movements; pre-process the two videos to obtain a facial image time series containing the face and a body image time series containing the body movements;
S2: input the facial image time series, in order, into an Alexnet-based convolutional neural network and a BLSTM-based recurrent neural network, and take the output data as the first facial-image spatio-temporal feature; input the body image time series, in order, into the Alexnet-based convolutional neural network and the BLSTM-based recurrent neural network, and take the output data as the first body-image spatio-temporal feature;
S3: concatenate the first facial-image spatio-temporal feature and the first body-image spatio-temporal feature and input them into a fully connected neural network to obtain the probability matrix, over the different emotion types, of the fused first facial-image and first body-image spatio-temporal features, and label it the first probability matrix; at the same time, concatenate the first facial-image spatio-temporal feature and the first body-image spatio-temporal feature and input them into a support vector machine to obtain the probability matrix, over the different emotion types, of the concatenated features, and label it the second probability matrix;
S4: input the first facial-image spatio-temporal feature into a support vector machine to obtain its probability matrix over the different emotion types, and label it the third probability matrix; input the first body-image spatio-temporal feature into a support vector machine to obtain its probability matrix over the different emotion types, and label it the fourth probability matrix; perform decision fusion on the first, second, third and fourth probability matrices to obtain a first fused probability matrix, and take the emotion type with the highest probability in the first fused probability matrix as the emotion recognition result.
Before step S1, the method also includes: training the Alexnet-based convolutional neural network, the BLSTM-based recurrent neural network, the fully connected neural network and the support vector machines.
In step S1, pre-processing the video containing the face and the corresponding video containing body movements specifically includes:
performing face detection and alignment on each image frame of the video containing the face, and arranging the processed image frames in order to obtain the facial image time series;
normalizing each image frame of the video containing body movements, and arranging the processed image frames in chronological order to obtain the body image time series.
Step S1 further comprises:
reading the label of each image frame in the video containing the face, and extracting the frames labeled onset, apex and offset to form the facial image time series;
reading the label of each image frame in the video containing body movements, and extracting the frames labeled onset, apex and offset to form the body image time series;
wherein the image-frame labels include neutral, onset, apex and offset.
Step S2 specifically includes:
S21: input the facial image time series into the Alexnet-based convolutional neural network, take the data of the first two of its three fully connected layers as the initial facial spatial features, and apply principal component analysis to those features to achieve space conversion and dimensionality reduction, obtaining the first facial-image spatial features; input the body image time series into the Alexnet-based convolutional neural network, take the data of the first two of its three fully connected layers as the initial body spatial features, and apply principal component analysis to achieve space conversion and dimensionality reduction, obtaining the first body-image spatial features;
S22: input the first facial-image spatial features into the BLSTM-based recurrent neural network, take the data of the first two of its three fully connected layers as the initial facial spatio-temporal features, and apply principal component analysis to achieve space conversion and dimensionality reduction, obtaining the first facial-image spatio-temporal feature; input the first body-image spatial features into the BLSTM-based recurrent neural network, take the data of the first two of its three fully connected layers as the initial body spatio-temporal features, and apply principal component analysis to achieve space conversion and dimensionality reduction, obtaining the first body-image spatio-temporal feature.
Step S1 also includes:
splitting the facial image time series and the body image time series by a preset sliding-window length, obtaining a facial-image time subsequence group composed of multiple facial-image time series segments and a body-image time subsequence group composed of multiple body-image time series segments.
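The sliding-window split above can be sketched as follows; the window length and stride are placeholder constants, since the patent leaves the preset length unspecified.

```python
import numpy as np

def split_by_window(frames, win_len, stride):
    """Cut an image time series into fixed-length segments.

    frames: (T, H, W, C) array; win_len and stride are preset
    constants (placeholder values, not taken from the patent).
    """
    segments = []
    for start in range(0, len(frames) - win_len + 1, stride):
        segments.append(frames[start:start + win_len])
    return segments

# Toy series: 12 frames of 8x8 single-channel images.
series = np.zeros((12, 8, 8, 1))
subseqs = split_by_window(series, win_len=5, stride=3)
print(len(subseqs))  # segments starting at frames 0, 3 and 6
```

Each segment in the resulting subsequence group is processed by the same feature extractors as the full series.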
Step S2 further comprises:
inputting the multiple facial-image time series segments in the facial-image time subsequence group, in order, into the Alexnet-based convolutional neural network and the BLSTM-based recurrent neural network, and taking the output data as the second facial-image spatio-temporal feature;
inputting the multiple body-image time series segments in the body-image time subsequence group, in order, into the Alexnet-based convolutional neural network and the BLSTM-based recurrent neural network, and taking the output data as the second body-image spatio-temporal feature.
Step S2 also includes:
inputting the multiple facial-image time series in the facial-image time subsequence group into the Alexnet-based convolutional neural network, taking the data of the first two of its three fully connected layers as the initial second facial spatial features, and applying principal component analysis to achieve space conversion and dimensionality reduction, obtaining the second facial-image spatial features; inputting the multiple body-image time series in the body-image time subsequence group into the Alexnet-based convolutional neural network, taking the data of the first two of its three fully connected layers as the initial second body spatial features, and applying principal component analysis to achieve space conversion and dimensionality reduction, obtaining the second body-image spatial features;
inputting the second facial-image spatial features into the BLSTM-based recurrent neural network, taking the data of the first two of its three fully connected layers as the initial second facial spatio-temporal features, and applying principal component analysis to achieve space conversion and dimensionality reduction, obtaining the second facial-image spatio-temporal feature; inputting the second body-image spatial features into the BLSTM-based recurrent neural network, taking the data of the first two of its three fully connected layers as the initial second body spatio-temporal features, and applying principal component analysis to achieve space conversion and dimensionality reduction, obtaining the second body-image spatio-temporal feature.
Step S3 further comprises:
concatenating the second facial-image spatio-temporal feature and the second body-image spatio-temporal feature and inputting them into the fully connected neural network, then inputting the output into a support vector machine to obtain the probability matrix, over the different emotion types, of the fused second facial-image and second body-image spatio-temporal features, labeled the fifth probability matrix; at the same time, concatenating the second facial-image spatio-temporal feature and the second body-image spatio-temporal feature and inputting them into a support vector machine to obtain the probability matrix, over the different emotion types, of the fused features, labeled the sixth probability matrix.
Step S4 further comprises:
inputting the second facial-image spatio-temporal feature into a support vector machine to obtain its probability matrix over the different emotion types, labeled the seventh probability matrix; inputting the second body-image spatio-temporal feature into a support vector machine to obtain its probability matrix over the different emotion types, labeled the eighth probability matrix; performing decision fusion on the fifth, sixth, seventh and eighth probability matrices to obtain a second fused probability matrix;
performing decision fusion on the first fused probability matrix and the second fused probability matrix to obtain a third fused probability matrix, and taking the emotion type with the highest probability in the third fused probability matrix as the emotion recognition result.
The method provided by the invention uses multi-modal combined emotion recognition, making full use of the effective information of the various modalities in the video under test, improving fusion efficiency and, at the same time, improving the accuracy of emotion recognition.
Brief description of the drawings
Fig. 1 is a flow chart of a multi-modal emotion recognition and classification method provided by an embodiment of the invention;
Fig. 2 compares the emotion recognition rates of different fusion strategies based on the full time series, in the method provided by an embodiment of the invention;
Fig. 3 is a schematic diagram of the neural network structure used for spatio-temporal feature extraction in the method provided by an embodiment of the invention;
Fig. 4 is a schematic diagram of splitting a time series with a sliding window in the method provided by an embodiment of the invention;
Fig. 5 compares the emotion recognition rates of different fusion strategies based on time series segments, in the method provided by an embodiment of the invention;
Fig. 6 compares the emotion recognition rates obtained by fusing the full time series and the time series segments, in the method provided by an embodiment of the invention.
Detailed description of the embodiments
The embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples illustrate the invention but do not limit its scope.
Referring to Fig. 1, a flow chart of a multi-modal emotion recognition and classification method provided by an embodiment of the invention, the method includes:
S1: receive data to be tested, the data comprising a video containing a face and a corresponding video containing body movements; pre-process the two videos to obtain a facial image time series and a body image time series.
Specifically, a video containing a person's facial expression and a video containing body movements over the same time period are received. After pre-processing, the face video and the body-movement video are each arranged by image frame, yielding a facial image time series and a body image time series composed of the frames of the videos.
This converts the video data into sequences of image frames, which makes the data easier to operate on and convenient for subsequent processing.
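As an illustration of the pre-processing in S1, the sketch below builds a facial image time series from video frames. The fixed face box and the nearest-neighbour resize are stand-ins: a real system would use an actual face detector and a landmark-based alignment step.

```python
import numpy as np

def align_face(frame, box):
    """Crop the face region; `box` is a stub standing in for the
    output of a real face detector and alignment step."""
    top, left, size = box
    return frame[top:top + size, left:left + size]

def normalize(frame, out=64):
    """Nearest-neighbour resize so every frame has the same format,
    with pixel values scaled to [0, 1]."""
    h, w = frame.shape[:2]
    rows = np.arange(out) * h // out
    cols = np.arange(out) * w // out
    return frame[rows][:, cols] / 255.0

def to_time_series(frames, boxes):
    """Detect/align each frame, normalize it, and stack the results
    in chronological order to form the image time series."""
    return np.stack([normalize(align_face(f, b))
                     for f, b in zip(frames, boxes)])

# Toy video: 4 uniform 128x128 frames, face assumed at a fixed box.
video = [np.full((128, 128), 200.0) for _ in range(4)]
series = to_time_series(video, boxes=[(10, 10, 96)] * 4)
print(series.shape)  # (4, 64, 64)
```

The same normalization (without the face crop) would be applied to the body-movement video to form the body image time series.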
S2: input the facial image time series, in order, into the Alexnet-based convolutional neural network and the BLSTM-based recurrent neural network, and take the output data as the first facial-image spatio-temporal feature; input the body image time series, in order, into the same networks, and take the output data as the first body-image spatio-temporal feature.
Specifically, the facial image time series and the body image time series obtained in S1 are input, respectively, into the trained Alexnet-based convolutional neural network and the trained BLSTM-based recurrent neural network. The Alexnet-based convolutional network extracts the spatial features of the image sequence from the time series, and the recurrent network then extracts spatio-temporal features from those spatial features. In this embodiment, by inputting the facial image time series and the body image time series separately into the trained networks, the spatio-temporal feature of the facial image sequence (the first facial-image spatio-temporal feature) and the spatio-temporal feature of the body image sequence (the first body-image spatio-temporal feature) are obtained.
In this way, a deep network combining an Alexnet-based convolutional neural network with a BLSTM-based recurrent neural network is built to extract local and global spatio-temporal features, so that the facial image time series and the body image time series can be classified according to the multi-layer deep spatio-temporal features obtained.
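The two-stage extraction in S2 can be illustrated schematically. The fixed linear per-frame map and the toy bidirectional recurrence below are stand-ins for the trained Alexnet-based CNN and the BLSTM; all dimensions are placeholders, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the Alexnet-based CNN: one fixed linear map per frame
# (a real system would use the trained convolutional network).
W_cnn = rng.normal(size=(64 * 64, 128)) / 64.0

def cnn_spatial_features(frames):
    """Per-frame spatial features, as in S21."""
    flat = frames.reshape(len(frames), -1)
    return np.tanh(flat @ W_cnn)            # (T, 128)

# Stand-in for the BLSTM: a simple bidirectional recurrence whose
# final forward and backward states are concatenated.
W_in = rng.normal(size=(128, 32)) / 16.0
W_rec = rng.normal(size=(32, 32)) / 16.0

def rnn_pass(xs):
    h = np.zeros(32)
    for x in xs:
        h = np.tanh(x @ W_in + h @ W_rec)
    return h

def blstm_spatiotemporal_feature(spatial):
    """Sequence-level spatio-temporal feature, as in S22: run the
    recurrence forward and backward over the spatial features."""
    return np.concatenate([rnn_pass(spatial), rnn_pass(spatial[::-1])])

face_series = rng.normal(size=(10, 64, 64))   # toy facial image time series
feat = blstm_spatiotemporal_feature(cnn_spatial_features(face_series))
print(feat.shape)  # (64,)
```

The body image time series would pass through the same two stages to yield the first body-image spatio-temporal feature.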
S3: concatenate the first facial-image spatio-temporal feature and the first body-image spatio-temporal feature and input them into the fully connected neural network; input the output into a support vector machine to obtain the probability matrix, over the different emotion types, of the fused features, labeled the first probability matrix. At the same time, concatenate the two features and input them into a support vector machine to obtain the probability matrix, over the different emotion types, of the concatenated features, labeled the second probability matrix.
Specifically, the first facial-image spatio-temporal feature and the first body-image spatio-temporal feature are concatenated and input into the trained fully connected neural network, and the output is input into the trained support vector machine. From the combination of the two modalities, the probability that the combined feature belongs to each emotion class is obtained, building the first class-probability matrix.
Within the output of the fully connected network, it is preferable to apply principal component analysis to the data of the penultimate fully connected layer for dimensionality reduction, and then to input the processed data into the trained support vector machine, so as to obtain a more precise probabilistic classification result.
On the other hand, the first facial-image spatio-temporal feature and the first body-image spatio-temporal feature are concatenated, and the concatenated feature is input into the trained support vector machine, yielding the probability that the combined feature belongs to each emotion class and building the second class-probability matrix.
During the concatenation of the two features, principal component analysis may be applied to the concatenated feature for dimensionality reduction before it is input into the trained support vector machine to obtain the probability output.
In this way, feature-level fusion combines the facial features with the body-movement features. Using the different fusion strategies, namely the neural-network fusion strategy and the feature-concatenation fusion strategy, the probability matrices of the video data over the different emotion classes can be obtained separately.
S4: input the first facial-image spatio-temporal feature into a support vector machine to obtain its probability matrix over the different emotion types, labeled the third probability matrix; input the first body-image spatio-temporal feature into a support vector machine to obtain its probability matrix over the different emotion types, labeled the fourth probability matrix; perform decision fusion on the first, second, third and fourth probability matrices to obtain the first fused probability matrix, and take the emotion type with the highest probability in the first fused probability matrix as the emotion recognition result.
Specifically, the first facial-image spatio-temporal feature is input on its own into the trained support vector machine, yielding the probability matrix of that feature over the different emotion classes, from which the third probability matrix is built. Likewise, the first body-image spatio-temporal feature is input on its own into the trained support vector machine, yielding its probability matrix over the different emotion classes, from which the fourth probability matrix is built.
Referring to Fig. 2, which compares the emotion recognition rates of different fusion strategies based on the full time series, the four probability matrices obtained are decision-fused into a new fused probability matrix. This matrix contains the set of probabilities that the data under test belongs to each emotion class, and the emotion class with the highest probability in this set is selected as the final recognition result.
In this way, a person's facial expression and the body movements of the same period are combined: deep neural networks extract the spatio-temporal features of the data under test, and support vector machines classify those features under the different fusion strategies, finally realizing multi-modal emotion recognition that makes full use of the effective information in each modality and improves emotion recognition accuracy.
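The decision fusion of S4 can be sketched as follows. The passage does not fix the fusion rule, so simple averaging of the probability matrices is used here as one plausible choice, and the label set is an example, not the patent's.

```python
import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "happiness",
            "sadness", "surprise"]          # example label set

def decision_fuse(prob_matrices):
    """Decision-level fusion: combine the class-probability outputs of
    the different classifiers (here by averaging) and pick the emotion
    class with the highest fused probability."""
    fused = np.mean(prob_matrices, axis=0)
    return fused, EMOTIONS[int(np.argmax(fused))]

# Toy probability outputs standing in for the first to fourth
# probability matrices produced in S3 and S4.
p1 = np.array([0.1, 0.1, 0.1, 0.5, 0.1, 0.1])
p2 = np.array([0.2, 0.1, 0.1, 0.4, 0.1, 0.1])
p3 = np.array([0.1, 0.2, 0.1, 0.3, 0.2, 0.1])
p4 = np.array([0.1, 0.1, 0.2, 0.4, 0.1, 0.1])
fused, label = decision_fuse([p1, p2, p3, p4])
print(label)  # happiness
```

The same routine applied to the fifth to eighth matrices gives the second fused probability matrix, and fusing the two fused matrices gives the third.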
On the basis of the above embodiment, before step S1 the method also includes: training the Alexnet-based convolutional neural network, the BLSTM-based recurrent neural network, the fully connected neural network and the support vector machines.
Specifically, 127 videos from the FABO database are used to train the Alexnet-based convolutional neural network, the BLSTM-based recurrent neural network, the fully connected neural network and the support vector machines.
The Alexnet-based convolutional neural network and the BLSTM-based recurrent neural network are trained on image sequences in which the person's face and body change, and the network parameters are adjusted to obtain the feature extraction model. The spatio-temporal features of the different facial activities and the spatio-temporal features of the body postures are input into the support vector machines to obtain the emotion classification models.
On the basis of the above embodiment, pre-processing the video containing the face and the corresponding video containing body movements in step S1 specifically includes: performing face detection and alignment on each image frame of the video containing the face and arranging the processed frames in order to obtain the facial image time series; normalizing each image frame of the video containing body movements and arranging the processed frames in chronological order to obtain the body image time series.
Specifically, face detection and alignment are applied to each image frame of the video containing the face, and the processed frames are then arranged in chronological order to obtain the facial image time series. At the same time, the image frames of the video containing body movements are normalized so that all frames have a consistent format, and the processed frames are arranged in chronological order to form the body image time series.
In this way, every image frame in the facial image time series and the body image time series has the same format, which is convenient for subsequent operations such as feature extraction.
On the basis of the above embodiment, step S1 further comprises: reading the label of each image frame in the video containing the face and extracting the frames labeled onset, apex and offset to form the facial image time series; reading the label of each image frame in the video containing body movements and extracting the frames labeled onset, apex and offset to form the body image time series. The image-frame labels include neutral, onset, apex and offset.
Specifically, in the database of the data under test, every video frame is labeled: all image frames in the stage where an expression begins are labeled "onset", the period in which the expression reaches its maximum is labeled "apex", all image frames in the period where the expression is released are labeled "offset", and the remaining frames, in which no expression is shown, are labeled "neutral".
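The label-based frame selection can be sketched directly; integer frame indices stand in for the image frames here.

```python
def extract_expression_frames(frames, labels):
    """Keep only the frames whose label is onset, apex or offset,
    discarding the neutral frames, as in the preferred scheme."""
    keep = {"onset", "apex", "offset"}
    return [f for f, lab in zip(frames, labels) if lab in keep]

labels = ["neutral", "neutral", "onset", "onset", "apex", "apex",
          "offset", "neutral"]
frames = list(range(len(labels)))   # indices standing in for images
selected = extract_expression_frames(frames, labels)
print(selected)  # [2, 3, 4, 5, 6]
```

The selected frames, in their original order, compose the time series passed to the feature extractors.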
When performing emotion recognition with the facial image time series and the body image time series, the time series may be composed of all image frames, or only the frames of the period in which the expression reaches its maximum may be used. Preferably, the frames before the expression starts and after it finishes are discarded, and only the frames between the start and the end of the expression are used for classification: the frames labeled "onset", "apex" and "offset" are extracted to compose the time series, which improves the overall recognition precision. Table 1 shows the results of emotion recognition from the face videos under the different frame-extraction methods, and Table 2 shows the corresponding results for body movements.
Table 1
| Time series screening method | MAA (%) | ACC (%) |
|---|---|---|
| Apex sequence | 55.90 | 56.84 |
| Onset-apex-offset sequence | 57.56 | 61.11 |
| All frames of the whole cycle | 51.67 | 53.85 |
Table 2
| Time series screening method | MAA (%) | ACC (%) |
|---|---|---|
| Apex sequence | 45.88 | 50.60 |
| Onset-apex-offset sequence | 48.98 | 51.70 |
| All frames of the whole cycle | 44.50 | 49.77 |
As can be seen from Tables 1 and 2, selecting the frames labelled "onset", "apex" and "offset" to compose the time series yields a higher recognition rate than the other schemes. Here MAA denotes the macro average accuracy and ACC the overall accuracy, computed as:
Pi = TPi / (TPi + FPi),  MAA = (1/s) · Σ Pi
where s is the number of emotion categories, Pi is the per-class rate of the i-th emotion, TPi is the number of correctly classified samples of class i, and FPi is the number of misclassified samples of class i; ACC is the ratio of all correctly classified samples to the total number of samples.
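The two metrics can be computed directly from a confusion matrix. A minimal sketch with a toy 2-class matrix (the patent uses more emotion categories):

```python
import numpy as np

def maa_acc(conf: np.ndarray):
    """conf[i, j] = number of class-i samples predicted as class j.
    MAA is the unweighted mean of the per-class rates Pi = TPi / (TPi + FPi),
    where FPi counts the class-i samples that were misclassified, following
    the patent's wording; ACC is the overall fraction classified correctly."""
    tp = np.diag(conf).astype(float)
    per_class = tp / conf.sum(axis=1)   # Pi for each of the s classes
    maa = per_class.mean()              # (1/s) * sum of Pi
    acc = tp.sum() / conf.sum()
    return maa, acc

conf = np.array([[8, 2],
                 [4, 6]])
maa, acc = maa_acc(conf)   # per-class rates 0.8 and 0.6
```

Note that MAA weights every class equally, while ACC is dominated by the more frequent classes.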
On the basis of the above embodiment, step S2 specifically comprises:
S21, the facial image time series is input into the AlexNet-based convolutional neural network; the outputs of the first two of its three fully connected layers are taken as the initial face spatial features, and principal component analysis is applied to them to perform space transformation and dimensionality reduction, yielding the first facial image spatial features. The body image time series is input into the AlexNet-based convolutional neural network in the same way; the outputs of the first two of the three fully connected layers are taken as the initial body spatial features and reduced by principal component analysis, yielding the first body image spatial features;
S22, the first facial image spatial features are input into the BLSTM-based recurrent neural network; the outputs of the first two of its three fully connected layers are taken as the initial face spatio-temporal features and reduced by principal component analysis, yielding the first facial image spatio-temporal features. The first body image spatial features are input into the BLSTM-based recurrent neural network in the same way; the outputs of the first two of the three fully connected layers are taken as the initial body spatio-temporal features and reduced by principal component analysis, yielding the first body image spatio-temporal features.
Specifically, with reference to Fig. 3, in order to obtain deeper spatio-temporal features from the facial image time series and the body image time series, the spatial features of the images are first extracted with a convolutional neural network, and the temporal information of the image sequence is then extracted with a recurrent neural network. In this embodiment an AlexNet-based convolutional neural network extracts the spatial features of the facial image time series and of the body image time series separately. Preferably, the last three layers of the AlexNet-based network are all fully connected, with output dimensions of 1024, 512 and 10; the outputs of the first two of these layers are taken as the initial spatial features, giving 1536 dimensions in total. Principal component analysis is applied to this 1536-dimensional feature to perform space transformation and dimensionality reduction, so that its dimensionality matches the input of the BLSTM-based recurrent neural network. The outputs of the first two of the BLSTM network's three fully connected layers, again 1536-dimensional, are then extracted as the initial spatio-temporal features and likewise reduced by principal component analysis, producing the final spatio-temporal features. In this step, the facial image time series is passed successively through the trained AlexNet-based convolutional neural network and the trained BLSTM-based recurrent neural network to obtain the facial image spatio-temporal features; in the same way, the body image time series is passed through the two trained networks to obtain the body image spatio-temporal features. These are labelled the first facial image spatio-temporal features and the first body image spatio-temporal features.
With this method, both the spatial features and the temporal features of the image time series are extracted.
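The PCA step between the two networks can be sketched with plain numpy. This is an illustration only: random vectors stand in for the 1024+512 = 1536-dimensional fc-layer outputs, 200 pooled frames and a 128-dimensional target are assumed values not taken from the patent.

```python
import numpy as np

def pca_reduce(features: np.ndarray, out_dim: int) -> np.ndarray:
    """Project frame features onto their top principal components
    (space transformation and dimensionality reduction via SVD)."""
    centred = features - features.mean(axis=0)
    # rows of vt are the principal axes, ordered by explained variance
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:out_dim].T

# Stand-in fc features: 200 frames x 1536 dims (1024 + 512 retained layers),
# reduced here to a 128-dim representation for the BLSTM stage.
rng = np.random.default_rng(0)
fc_features = rng.standard_normal((200, 1536))
reduced = pca_reduce(fc_features, 128)
```

In the method itself the same reduction is applied twice: once to the CNN's initial spatial features and once to the BLSTM's initial spatio-temporal features.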
On the basis of the above embodiments, step S1 also includes: cutting the facial image time series and the body image time series with a sliding window of preset length, obtaining a facial image time sub-sequence group composed of multiple facial image time series fragments and a body image time sub-sequence group composed of multiple body image time series fragments.
Specifically, after the facial image time series and the body image time series are obtained, they are cut with a sliding window of preset length. As shown in Fig. 4, a facial image time series of length 15 contains 5 frames labelled "onset", 5 frames labelled "apex" and 5 frames labelled "offset". Cutting it with a sliding window of length 6 and step 1 yields 10 facial image time series fragments of length 6, which form the facial image time sub-sequence group. The window length should be chosen so that each resulting fragment contains frames of at least two of the three label types "onset", "apex" and "offset". The body image time series is cut in the same way, and the resulting body image time series fragments form the body image time sub-sequence group.
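The cutting described above is a standard overlapping sliding window; a minimal sketch reproducing the length-15 example from Fig. 4:

```python
def slide(seq, win=6, step=1):
    """Cut a frame sequence into overlapping fixed-length fragments."""
    return [seq[i:i + win] for i in range(0, len(seq) - win + 1, step)]

# 15 labelled frames: 5 onset, 5 apex, 5 offset, as in the example above
labels = ["onset"] * 5 + ["apex"] * 5 + ["offset"] * 5
fragments = slide(labels, win=6, step=1)   # 15 - 6 + 1 = 10 fragments
# with this window length every fragment spans at least two label types
assert all(len(set(f)) >= 2 for f in fragments)
```

With window length 6 over 5-frame label blocks, no fragment can lie entirely inside a single phase, which is exactly the constraint stated above.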
Table 3 shows the emotion recognition results obtained from the facial image time series under different sliding-window lengths, and Table 4 shows the corresponding results obtained from the body image time series.
Table 3
| Window length t | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|
| MAA (%) | 58.61 | 60.45 | 67.09 | 58.48 | 56.13 |
| ACC (%) | 59.00 | 61.25 | 66.46 | 59.03 | 57.21 |
Table 4
| Window length t | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|
| MAA (%) | 43.66 | 55.00 | 50.20 | 47.33 | 45.81 |
| ACC (%) | 44.85 | 55.98 | 51.83 | 48.76 | 46.00 |
As can be seen from Tables 3 and 4, with a suitable sliding-window length the recognition accuracy is higher than that of the schemes in Tables 1 and 2, which use the whole time series without any cutting.
On the basis of the above embodiments, step S2 further comprises: inputting the multiple facial image time series fragments of the facial image time sub-sequence group successively into the AlexNet-based convolutional neural network and the BLSTM-based recurrent neural network to obtain the second facial image spatio-temporal features; and inputting the multiple body image time series fragments of the body image time sub-sequence group successively into the AlexNet-based convolutional neural network and the BLSTM-based recurrent neural network to obtain the second body image spatio-temporal features.
Specifically, the fragments of the facial image time sub-sequence group and of the body image time sub-sequence group are also fed into the trained AlexNet-based convolutional neural network and the trained BLSTM-based recurrent neural network, yielding the spatio-temporal features of all time series fragments in the facial image time sub-sequence group and in the body image time sub-sequence group; these are labelled the second facial image spatio-temporal features and the second body image spatio-temporal features.
In this way feature extraction is performed on the multiple time series fragments obtained by cutting, producing new facial image spatio-temporal features and new body image spatio-temporal features for classification.
On the basis of the above embodiments, step S2 also includes:
inputting the multiple facial image time series fragments of the facial image time sub-sequence group into the AlexNet-based convolutional neural network, taking the outputs of the first two of its three fully connected layers as the second initial face spatial features, and applying principal component analysis to them to perform space transformation and dimensionality reduction, obtaining the second facial image spatial features; inputting the multiple body image time series fragments of the body image time sub-sequence group into the AlexNet-based convolutional neural network, taking the outputs of the first two of the three fully connected layers as the second initial body spatial features, and reducing them by principal component analysis, obtaining the second body image spatial features;
inputting the second facial image spatial features into the BLSTM-based recurrent neural network, taking the outputs of the first two of its three fully connected layers as the second initial face spatio-temporal features, and reducing them by principal component analysis, obtaining the second facial image spatio-temporal features; inputting the second body image spatial features into the BLSTM-based recurrent neural network, taking the outputs of the first two of the three fully connected layers as the second initial body spatio-temporal features, and reducing them by principal component analysis, obtaining the second body image spatio-temporal features.
Specifically, this follows the same procedure as the extraction of the first face and first body spatio-temporal features in the embodiment above: to obtain deeper spatio-temporal features from the facial image time series and the body image time series, the spatial features are extracted with the convolutional neural network and the temporal information with the recurrent neural network. In this embodiment the AlexNet-based convolutional neural network and the BLSTM-based recurrent neural network extract the spatio-temporal features of all time series fragments produced by the sliding window, yielding the second face spatio-temporal features and the second body spatio-temporal features. The feature extraction inside the networks is the same as in the previous embodiments and is not repeated here.
On the basis of the above embodiments, step S3 further comprises: concatenating the second facial image spatio-temporal features and the second body image spatio-temporal features and inputting them into the fully connected neural network, then inputting its output into the support vector machine to obtain the probability matrix over the emotion types after fusion of the two features; this probability matrix is labelled the fifth probability matrix. At the same time, the concatenation of the second facial image spatio-temporal features and the second body image spatio-temporal features is input directly into the support vector machine to obtain the probability matrix over the emotion types for the concatenated features; this probability matrix is labelled the sixth probability matrix.
Specifically, the second facial image spatio-temporal features and the second body image spatio-temporal features are concatenated and input into the trained fully connected neural network. The output of its penultimate fully connected layer is taken, reduced by principal component analysis, and input into the trained support vector machine, so that the combination of the two modalities yields the probabilities with which the features belong to each emotion category, building the fifth classification probability matrix.
On the other hand, the concatenated second facial image spatio-temporal features and second body image spatio-temporal features are input directly into the trained support vector machine, which yields the probabilities with which the concatenated features belong to each emotion category; these probabilities build the sixth classification probability matrix.
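The two classification paths above share the same concatenated input. A minimal numpy sketch: random vectors stand in for the second face and body spatio-temporal features, random linear-plus-softmax classifiers stand in for the trained fully connected network and support vector machine, and the six-category, 128-dimension sizes are assumed for illustration only.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
n_samples, n_classes = 4, 6                     # six emotion categories assumed
face = rng.standard_normal((n_samples, 128))    # second face spatio-temporal features
body = rng.standard_normal((n_samples, 128))    # second body spatio-temporal features
fused = np.concatenate([face, body], axis=1)    # concatenation of the two modalities

# Stand-in classifiers (random weights here; trained FC net + SVM in the method)
w_fc = rng.standard_normal((fused.shape[1], n_classes))
w_svm = rng.standard_normal((fused.shape[1], n_classes))
p5 = softmax(fused @ w_fc)    # fifth probability matrix (FC-net path)
p6 = softmax(fused @ w_svm)   # sixth probability matrix (direct-SVM path)
```

Each row of p5 and p6 is a probability distribution over the emotion categories, which is what the later decision-fusion step consumes.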
On the basis of the above embodiment, step S4 further comprises: inputting the second facial image spatio-temporal features into the support vector machine to obtain the probability matrix with which they belong to each emotion type, labelled the seventh probability matrix; inputting the second body image spatio-temporal features into the support vector machine to obtain the probability matrix with which they belong to each emotion type, labelled the eighth probability matrix; performing decision fusion on the fifth, sixth, seventh and eighth probability matrices to obtain the second fusion probability matrix; and performing decision fusion on the first fusion probability matrix and the second fusion probability matrix to obtain the third fusion probability matrix, the emotion type with the highest probability in the third fusion probability matrix being taken as the emotion recognition result.
Specifically, the second facial image spatio-temporal features are input on their own into the trained support vector machine, yielding the probability matrix with which they belong to each emotion category; this probability matrix is labelled the seventh probability matrix. Likewise, the second body image spatio-temporal features are input on their own into the trained support vector machine, yielding the probability matrix with which they belong to each emotion category; this probability matrix is labelled the eighth probability matrix.
With reference to Fig. 5, the figure compares the recognition rates obtained with the fifth, sixth, seventh and eighth probability matrices individually. Performing decision fusion on the fifth, sixth, seventh and eighth probability matrices generates the second fusion probability matrix, which reaches the emotion recognition rate shown as Multi4-2 in Fig. 5.
Finally, decision-level fusion is performed on the first fusion probability matrix and the second fusion probability matrix to obtain the third fusion probability matrix, and the emotion category with the highest probability in this matrix is selected as the final recognition result. With reference to Fig. 6, the figure shows the emotion recognition rates of the first fusion probability matrix, of the second fusion probability matrix, and of the third fusion probability matrix, in which the recognition results obtained from the whole time series and from the sliding-window fragment groups are fused once more; this final fusion attains a higher accuracy than either individual recognition result.
With this method, the multi-modal emotion recognition approach makes full use of the effective information of the several modalities in the video under test, improving fusion efficiency as well as the accuracy of emotion recognition.
Finally, the method of the present application is only a preferred embodiment and is not intended to limit the scope of the present invention. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall be included within its scope of protection.
Claims (10)
- 1. A multi-modal emotion recognition and classification method, characterized by comprising: S1, receiving test data comprising a video containing a face and a video containing body motion over the same period, and pre-processing the two videos to obtain a facial image time series containing the face and a body image time series containing the body motion; S2, inputting the facial image time series successively into an AlexNet-based convolutional neural network and a BLSTM-based recurrent neural network and taking the output data as first facial image spatio-temporal features, and inputting the body image time series successively into the AlexNet-based convolutional neural network and the BLSTM-based recurrent neural network and taking the output data as first body image spatio-temporal features; S3, concatenating the first facial image spatio-temporal features and the first body image spatio-temporal features and inputting them into a fully connected neural network, inputting its output into a support vector machine, and obtaining the probability matrix over the emotion types after fusion of the two features, labelled the first probability matrix; at the same time inputting the concatenation of the first facial image spatio-temporal features and the first body image spatio-temporal features into the support vector machine and obtaining the probability matrix over the emotion types for the concatenated features, labelled the second probability matrix; S4, inputting the first facial image spatio-temporal features into the support vector machine to obtain the probability matrix with which they belong to each emotion type, labelled the third probability matrix; inputting the first body image spatio-temporal features into the support vector machine to obtain the probability matrix with which they belong to each emotion type, labelled the fourth probability matrix; performing decision fusion on the first, second, third and fourth probability matrices to obtain a first fusion probability matrix; and taking the emotion type with the highest probability in the first fusion probability matrix as the emotion recognition result.
- 2. The method according to claim 1, characterized in that before step S1 the method further comprises: training the AlexNet-based convolutional neural network, the BLSTM-based recurrent neural network, the fully connected neural network and the support vector machine.
- 3. The method according to claim 1, characterized in that in step S1 the pre-processing of the video containing a face and the corresponding video containing body motion specifically comprises: performing face detection and alignment on each frame of the video containing a face and arranging the processed frames in chronological order to obtain the facial image time series; and normalizing each frame of the video containing body motion and arranging the processed frames in chronological order to obtain the body image time series.
- 4. The method according to claim 3, characterized in that step S1 further comprises: reading the label of each frame in the video containing a face, extracting the frames labelled onset, apex and offset, and forming the facial image time series; reading the label of each frame in the video containing body motion, extracting the frames labelled onset, apex and offset, and forming the body image time series; wherein each frame is labelled as one of neutral, onset, apex and offset.
- 5. The method according to claim 1, characterized in that step S2 specifically comprises: S21, inputting the facial image time series into the AlexNet-based convolutional neural network, taking the outputs of the first two of its three fully connected layers as initial face spatial features, and applying principal component analysis to them to perform space transformation and dimensionality reduction, obtaining first facial image spatial features; inputting the body image time series into the AlexNet-based convolutional neural network, taking the outputs of the first two of the three fully connected layers as initial body spatial features, and reducing them by principal component analysis, obtaining first body image spatial features; S22, inputting the first facial image spatial features into the BLSTM-based recurrent neural network, taking the outputs of the first two of its three fully connected layers as initial face spatio-temporal features, and reducing them by principal component analysis, obtaining the first facial image spatio-temporal features; inputting the first body image spatial features into the BLSTM-based recurrent neural network, taking the outputs of the first two of the three fully connected layers as initial body spatio-temporal features, and reducing them by principal component analysis, obtaining the first body image spatio-temporal features.
- 6. The method according to any one of claims 1-5, characterized in that step S1 also includes: cutting the facial image time series and the body image time series with a sliding window of preset length, obtaining a facial image time sub-sequence group composed of multiple facial image time series fragments and a body image time sub-sequence group composed of multiple body image time series fragments.
- 7. The method according to claim 6, characterized in that step S2 further comprises: inputting the multiple facial image time series fragments of the facial image time sub-sequence group successively into the AlexNet-based convolutional neural network and the BLSTM-based recurrent neural network and taking the output data as second facial image spatio-temporal features; and inputting the multiple body image time series fragments of the body image time sub-sequence group successively into the AlexNet-based convolutional neural network and the BLSTM-based recurrent neural network and taking the output data as second body image spatio-temporal features.
- 8. The method according to claim 7, characterized in that step S2 also includes: inputting the multiple facial image time series fragments of the facial image time sub-sequence group into the AlexNet-based convolutional neural network, taking the outputs of the first two of its three fully connected layers as second initial face spatial features, and applying principal component analysis to them to perform space transformation and dimensionality reduction, obtaining second facial image spatial features; inputting the multiple body image time series fragments of the body image time sub-sequence group into the AlexNet-based convolutional neural network, taking the outputs of the first two of the three fully connected layers as second initial body spatial features, and reducing them by principal component analysis, obtaining second body image spatial features; inputting the second facial image spatial features into the BLSTM-based recurrent neural network, taking the outputs of the first two of its three fully connected layers as second initial face spatio-temporal features, and reducing them by principal component analysis, obtaining the second facial image spatio-temporal features; inputting the second body image spatial features into the BLSTM-based recurrent neural network, taking the outputs of the first two of the three fully connected layers as second initial body spatio-temporal features, and reducing them by principal component analysis, obtaining the second body image spatio-temporal features.
- 9. The method according to claim 8, characterized in that step S3 further comprises: concatenating the second facial image spatio-temporal features and the second body image spatio-temporal features and inputting them into the fully connected neural network, inputting its output into the support vector machine, and obtaining the probability matrix over the emotion types after fusion of the two features, labelled the fifth probability matrix; at the same time inputting the concatenation of the second facial image spatio-temporal features and the second body image spatio-temporal features into the support vector machine and obtaining the probability matrix over the emotion types for the concatenated features, labelled the sixth probability matrix.
- 10. The method according to claim 9, characterized in that step S4 further comprises: inputting the second facial image spatio-temporal features into the support vector machine to obtain the probability matrix with which they belong to each emotion type, labelled the seventh probability matrix; inputting the second body image spatio-temporal features into the support vector machine to obtain the probability matrix with which they belong to each emotion type, labelled the eighth probability matrix; performing decision fusion on the fifth, sixth, seventh and eighth probability matrices to obtain a second fusion probability matrix; and performing decision fusion on the first fusion probability matrix and the second fusion probability matrix to obtain a third fusion probability matrix, the emotion type with the highest probability in the third fusion probability matrix being taken as the emotion recognition result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711144196.1A CN107808146B (en) | 2017-11-17 | 2017-11-17 | Multi-mode emotion recognition and classification method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711144196.1A CN107808146B (en) | 2017-11-17 | 2017-11-17 | Multi-mode emotion recognition and classification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107808146A true CN107808146A (en) | 2018-03-16 |
CN107808146B CN107808146B (en) | 2020-05-05 |
Family
ID=61589748
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711144196.1A Active CN107808146B (en) | 2017-11-17 | 2017-11-17 | Multi-mode emotion recognition and classification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107808146B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102968643A (en) * | 2012-11-16 | 2013-03-13 | 华中科技大学 | Multi-mode emotion recognition method based on Lie group theory |
CN106529504A (en) * | 2016-12-02 | 2017-03-22 | 合肥工业大学 | Dual-mode video emotion recognition method with composite spatial-temporal characteristic |
CN107273876A (en) * | 2017-07-18 | 2017-10-20 | 山东大学 | Automatic micro-expression recognition method based on a deep learning "macro-to-micro" transformation model |
- 2017-11-17: CN201711144196.1A granted as patent CN107808146B (status: Active)
Non-Patent Citations (2)
Title |
---|
HE JUN: "Multi-View Facial Action Unit Detection Based on CNN and BLSTM-RNN", 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition * |
YAN Jingjie et al.: "Bimodal Emotion Recognition Based on Facial Expression and Body Posture", Journal of Image and Graphics * |
Cited By (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108491720A (en) * | 2018-03-20 | 2018-09-04 | 腾讯科技(深圳)有限公司 | Application identification method, system and related equipment |
CN108491720B (en) * | 2018-03-20 | 2023-07-14 | 腾讯科技(深圳)有限公司 | Application identification method, system and related equipment |
CN108491880A (en) * | 2018-03-23 | 2018-09-04 | 西安电子科技大学 | Object classification and pose estimation method based on neural networks |
CN108596039A (en) * | 2018-03-29 | 2018-09-28 | 南京邮电大学 | Bimodal emotion recognition method and system based on 3D convolutional neural network |
CN108596039B (en) * | 2018-03-29 | 2020-05-05 | 南京邮电大学 | Bimodal emotion recognition method and system based on 3D convolutional neural network |
CN109101999A (en) * | 2018-07-16 | 2018-12-28 | 华东师范大学 | Support vector machine-based cooperative neural network credible decision method |
CN109101999B (en) * | 2018-07-16 | 2021-06-25 | 华东师范大学 | Support vector machine-based cooperative neural network credible decision method |
CN110795973A (en) * | 2018-08-03 | 2020-02-14 | 北京大学 | Multi-mode fusion action recognition method and device and computer readable storage medium |
CN109190514A (en) * | 2018-08-14 | 2019-01-11 | 电子科技大学 | Face attribute recognition method and system based on bidirectional long-short term memory network |
CN109190514B (en) * | 2018-08-14 | 2021-10-01 | 电子科技大学 | Face attribute recognition method and system based on bidirectional long-short term memory network |
CN109325457A (en) * | 2018-09-30 | 2019-02-12 | 合肥工业大学 | Sentiment analysis method and system based on multi-channel data and recurrent neural networks |
CN109359599A (en) * | 2018-10-19 | 2019-02-19 | 昆山杜克大学 | Facial expression recognition method based on jointly learning identity and emotion information |
CN109684911A (en) * | 2018-10-30 | 2019-04-26 | 百度在线网络技术(北京)有限公司 | Expression recognition method, device, electronic equipment and storage medium |
US11151363B2 (en) | 2018-10-30 | 2021-10-19 | Baidu Online Network Technology (Beijing) Co., Ltd. | Expression recognition method, apparatus, electronic device, and storage medium |
CN109522945A (en) * | 2018-10-31 | 2019-03-26 | 中国科学院深圳先进技术研究院 | Group emotion recognition method, apparatus, smart device and storage medium |
CN109766759A (en) * | 2018-12-12 | 2019-05-17 | 成都云天励飞技术有限公司 | Emotion recognition method and related product |
CN109783684A (en) * | 2019-01-25 | 2019-05-21 | 科大讯飞股份有限公司 | Video emotion recognition method, apparatus, device and readable storage medium |
CN110020596A (en) * | 2019-02-21 | 2019-07-16 | 北京大学 | Video content localization method based on feature fusion and cascade learning |
CN110037693A (en) * | 2019-04-24 | 2019-07-23 | 中央民族大学 | Emotion classification method based on facial expression and EEG |
CN110378335A (en) * | 2019-06-17 | 2019-10-25 | 杭州电子科技大学 | Neural network-based information analysis method and model |
CN110287912A (en) * | 2019-06-28 | 2019-09-27 | 广东工业大学 | Deep learning-based method, apparatus and medium for determining target object emotional state |
CN110234018A (en) * | 2019-07-09 | 2019-09-13 | 腾讯科技(深圳)有限公司 | Multimedia content description generation method, training method, device, equipment and medium |
CN110472506A (en) * | 2019-07-11 | 2019-11-19 | 广东工业大学 | Gesture recognition method based on support vector machine and neural network optimization |
CN110765839B (en) * | 2019-09-02 | 2022-02-22 | 合肥工业大学 | Multi-channel information fusion and artificial intelligence emotion monitoring method for visible light facial image |
CN110598608A (en) * | 2019-09-02 | 2019-12-20 | 中国航天员科研训练中心 | Non-contact and contact cooperative psychological and physiological state intelligent monitoring system |
CN110693508A (en) * | 2019-09-02 | 2020-01-17 | 中国航天员科研训练中心 | Multi-channel cooperative psychophysiological active sensing method and service robot |
CN110598608B (en) * | 2019-09-02 | 2022-01-14 | 中国航天员科研训练中心 | Non-contact and contact cooperative psychological and physiological state intelligent monitoring system |
CN110765839A (en) * | 2019-09-02 | 2020-02-07 | 合肥工业大学 | Multi-channel information fusion and artificial intelligence emotion monitoring method for visible light facial image |
CN111242155A (en) * | 2019-10-08 | 2020-06-05 | 台州学院 | Bimodal emotion recognition method based on multimode deep learning |
CN111476217A (en) * | 2020-05-27 | 2020-07-31 | 上海乂学教育科技有限公司 | Intelligent learning system and method based on emotion recognition |
CN111914742A (en) * | 2020-07-31 | 2020-11-10 | 辽宁工业大学 | Attendance checking method, system, terminal device and medium based on multi-modal biometric features |
CN112418034A (en) * | 2020-11-12 | 2021-02-26 | 元梦人文智能国际有限公司 | Multi-modal emotion recognition method and device, electronic equipment and storage medium |
CN112784730A (en) * | 2021-01-20 | 2021-05-11 | 东南大学 | Multi-modal emotion recognition method based on time domain convolutional network |
CN112784730B (en) * | 2021-01-20 | 2022-03-29 | 东南大学 | Multi-modal emotion recognition method based on time domain convolutional network |
CN116682168A (en) * | 2023-08-04 | 2023-09-01 | 阳光学院 | Multi-modal expression recognition method, medium and system |
CN116682168B (en) * | 2023-08-04 | 2023-10-17 | 阳光学院 | Multi-modal expression recognition method, medium and system |
CN117351575A (en) * | 2023-12-05 | 2024-01-05 | 北京师范大学珠海校区 | Nonverbal behavior recognition method and nonverbal behavior recognition device based on text-generated graph data enhancement model |
CN117351575B (en) * | 2023-12-05 | 2024-02-27 | 北京师范大学珠海校区 | Nonverbal behavior recognition method and nonverbal behavior recognition device based on text-generated graph data enhancement model |
Also Published As
Publication number | Publication date |
---|---|
CN107808146B (en) | 2020-05-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107808146A (en) | Multi-modal emotion recognition and classification method | |
CN108596039B (en) | Bimodal emotion recognition method and system based on 3D convolutional neural network | |
CN106203395B (en) | Face attribute recognition method based on multitask deep learning | |
CN104679863B (en) | Deep learning-based image-to-image search method and system | |
US20180150719A1 (en) | Automatically computing emotions aroused from images through shape modeling | |
CN105999670A (en) | Kinect-based shadow-boxing movement judging and guiding system and its guiding method | |
CN107609572A (en) | Multi-modal emotion recognition method and system based on neural networks and transfer learning | |
CN106203356B (en) | Face recognition method based on convolutional network feature extraction | |
CN108764065A (en) | Pedestrian re-identification feature fusion assisted learning method | |
CN110464366A (en) | Emotion recognition method, system and storage medium | |
CN108491077A (en) | Surface electromyography gesture recognition method based on multi-stream divide-and-conquer convolutional neural networks | |
CN106845513B (en) | Human hand detector and method based on conditional random forest | |
CN112906631B (en) | Dangerous driving behavior detection method and detection system based on video | |
CN107330393A (en) | Neonatal pain expression recognition method based on video analysis | |
CN112036276A (en) | Artificial intelligence video question-answering method | |
CN105956570A (en) | Lip characteristic and deep learning based smiling face recognition method | |
Lyu et al. | Spontaneous facial expression database of learners’ academic emotions in online learning with hand occlusion | |
CN115862120A (en) | Separable variation self-encoder decoupled face action unit identification method and equipment | |
CN113486752A (en) | Emotion identification method and system based on electrocardiosignals | |
CN106529453A (en) | Expression-based lie detection method combining reinforcement patch and multi-label learning | |
CN108511064A (en) | System for automatically analyzing health data based on deep learning | |
KR20110098286A (en) | Self health diagnosis system of oriental medicine using fuzzy inference method | |
CN113506274B (en) | Detection system for human cognitive condition based on visual saliency difference map | |
Alam et al. | An Autism Detection Architecture with Fusion of Feature Extraction and Classification | |
Javed et al. | Behavior-based risk detection of autism spectrum disorder through child-robot interaction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |